If you're looking for a low-cost way to run AI/ML models at scale, Salad is worth a look. This cloud-based service lets you run and manage AI/ML production models on thousands of consumer GPUs around the world. Its scalability, on-demand elasticity and multi-cloud support can cut costs dramatically: Salad claims savings of up to 90% compared to traditional providers. With a user-friendly interface and solid tooling, it's a good option for GPU-hungry workloads like computer vision and language models, with prices starting at $0.02 per hour.
Another good option is Together, which is geared toward making it easier to develop and deploy generative AI models. It uses optimizations like Cocktail SGD and FlashAttention 2 to accelerate model training and inference. The service supports a wide range of models and offers scalable inference to handle heavy traffic. Together also provides collaborative tools for fine-tuning and deploying AI solutions, and the company claims big cost savings: up to 117x compared to AWS and 4x compared to other suppliers.
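To give a feel for what calling a hosted model on Together might look like, here's a minimal sketch using its OpenAI-compatible chat-completions endpoint. The endpoint path and model id below are assumptions, and an actual call requires a `TOGETHER_API_KEY`; without one, the snippet just builds the request payload.

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible endpoint; verify against Together's current docs.
API_URL = "https://api.together.xyz/v1/chat/completions"

# Example request payload; the model id here is illustrative only.
payload = {
    "model": "meta-llama/Llama-3-8b-chat-hf",
    "messages": [{"role": "user", "content": "Summarize FlashAttention in one sentence."}],
    "max_tokens": 64,
}

api_key = os.environ.get("TOGETHER_API_KEY")
if api_key:
    # Send the request only when a key is available.
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
else:
    print("Set TOGETHER_API_KEY to send the request; payload keys:", sorted(payload))
```

Because the API mirrors the OpenAI schema, swapping in a different hosted model is usually just a matter of changing the `model` field.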
Anyscale is another powerful service for developing, deploying and scaling AI applications. It's built on the open-source Ray framework, supports a variety of AI models and runs across multiple clouds as well as on-premise systems. Anyscale's features include smart instance management, heterogeneous node control and reported cost savings of up to 50% on spot instances. It's geared toward enterprises, with a free tier plus custom pricing and volume discounts.
If you need a globally distributed GPU cloud, RunPod is worth a look. It lets you spin up GPU pods instantly and supports a variety of GPU workloads. RunPod bills by the minute with no egress or ingress charges, offers serverless ML inference and has a variety of preconfigured templates for frameworks like PyTorch and TensorFlow. Its pricing is based on GPU instance usage, making it a good option for running AI models at scale.