Cerebrium

Scalable serverless GPU infrastructure for building and deploying machine learning models, with high performance, cost-effectiveness, and ease of use.

Cerebrium offers a serverless GPU infrastructure designed to make it easy to build and deploy machine learning models. Running GPUs serverlessly means you only pay for the compute you actually use, which can be significantly cheaper than keeping always-on instances running on alternatives like AWS or GCP.

Cerebrium is designed for heavy use, with high performance and scalability. Some of the key stats include:

  • Cold Starts: 3.4s
  • Requests Per Second: 5000
  • Uptime: 99.99%
  • SOC 2 Compliance: Audited security and reliability controls

The platform is designed to be easy for engineers to use:

  • GPU Variety: Supports multiple GPU types, including H100s, A100s, A5000s, and more.
  • Infrastructure as Code: Define environments in code for easy creation.
  • Volume Storage: Store and mount files or model weights directly into your code's environment, with no S3 buckets needed.
  • Secrets: Store credentials securely and use third-party frameworks and platforms without hard-coding keys.
  • Hot Reload: Edit code and see changes live on GPU containers.
  • Streaming Endpoints: Stream output back to users in real-time.
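
The infrastructure-as-code workflow above can be sketched as a small config file. The file name and the section and key names below are illustrative assumptions, not the verified Cerebrium schema, so check the official docs for the exact format:

```toml
# Illustrative deployment config; key names are assumptions for
# illustration, not the confirmed Cerebrium schema.
[deployment]
name = "my-model"
python_version = "3.11"

[hardware]
gpu = "A100"        # one of the supported GPU types
cpu = 4             # vCPU cores
memory = 16.0       # GB of RAM
gpu_count = 1

[scaling]
min_replicas = 0    # scale to zero when idle (pay-as-you-go)
max_replicas = 5
```

Defining the environment in code like this makes deployments reproducible: the same file recreates the same GPU, CPU, and memory configuration on every deploy.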

Real-time logging and monitoring help with debugging and performance analysis:

  • Realtime Logs: Logs for builds and requests as they happen.
  • Cost Breakdowns: Per-model, per-minute cost breakdowns tied to resource usage.
  • Alerts: Notifications for unhealthy model states or elevated error rates.
  • Resource Utilization: Monitor resource usage and performance over time.
  • Performance Profiling: Monitor cold starts, runtime, and response times.
  • Status Codes: Customizable status codes for user feedback.

Cerebrium is designed to scale easily, making it suitable for both startups and large enterprises:

  • Negligible Latency: Adds less than 60ms latency per request.
  • Redundancy: Distributed architecture across three regions for minimal downtime.
  • Minimal Failure Rates: 99.99% uptime and less than 0.01% request failure.
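
As a quick sanity check on what the 99.99% uptime figure means in practice, a short calculation (plain arithmetic, not a Cerebrium API):

```python
# Convert a 99.99% uptime guarantee into expected downtime per 30-day month.
uptime = 0.9999
minutes_per_month = 30 * 24 * 60                 # 43,200 minutes
downtime_minutes = minutes_per_month * (1 - uptime)
print(f"{downtime_minutes:.2f} minutes of downtime per month")  # about 4.32
```

In other words, 99.99% uptime allows for roughly four and a half minutes of downtime in a typical month.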

Cerebrium is pay-as-you-go, so you only pay for what you use:

  • Hobby: $0 per month, with 3 users, 3 deployed models, 5 GPU concurrency, and 1-day log retention.
  • Standard: $100 per month, with 10 users, unlimited models, 30 GPU concurrency, and 30-day log retention.
  • Enterprise: Custom pricing for large organizations, with dedicated support and unlimited features.
  • GPU Compute: Varies by GPU model, from $0.000282 to $0.001553 per second.
  • CPU Compute: $0.00005324 per core per second.
  • Memory: $0.00000659 per GB per second.
  • Storage: $0.30 per GB per month.
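
Using the published per-second rates, the hourly cost of a single running replica can be estimated with simple arithmetic. The GPU rate below uses the low end of the quoted range, and the core count and memory size are hypothetical choices for illustration:

```python
# Estimate hourly cost for one replica: GPU + CPU + memory.
# Rates come from the pricing list above; $0.000282/s is the low end of
# the quoted GPU range, and 4 cores / 16 GB are illustrative choices.
GPU_PER_SEC = 0.000282         # cheapest quoted GPU rate, per second
CPU_PER_CORE_SEC = 0.00005324  # per core per second
MEM_PER_GB_SEC = 0.00000659    # per GB per second

cores, mem_gb, seconds = 4, 16, 3600
cost = (GPU_PER_SEC + cores * CPU_PER_CORE_SEC + mem_gb * MEM_PER_GB_SEC) * seconds
print(f"${cost:.2f} per hour")  # roughly $2.16
```

Because billing is pay-as-you-go, this rate only applies while the replica is actually serving requests; an idle, scaled-to-zero deployment costs nothing beyond storage.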

Cerebrium can be deployed in a variety of ways, including the ability to use your own AWS/GCP credits or deploy on your own infrastructure for better data privacy.

If you're curious about what Cerebrium can do, there are plenty of community-contributed models and examples to get you started. You can also check the documentation and guides for deploying models like SDXL, Langchain, and Mistral 7B.

Check out the Cerebrium website for more information and to start a project today.

Published on June 14, 2024
