LastMile AI is a full-featured developer platform that helps engineers productionize generative AI apps. Its features include Auto-Eval for detecting hallucinations automatically, RAG Debugger for optimizing performance, and Consult AI Expert for getting help. The service supports many AI models, comes with a notebook-like environment for prototyping and app building, and is designed to make deploying production apps easy.
Another good option is Humanloop, a service for coordinating and optimizing Large Language Model (LLM) development. It tackles problems like suboptimal workflows and manual evaluation with a collaborative playground and an evaluation and monitoring suite. Humanloop supports several LLM providers and offers software development kits (SDKs) for easy integration, so it suits product teams and developers who want to work more efficiently and collaborate better.
For a more specialized service, check out HoneyHive, an AI evaluation and testing service. It provides a single LLMOps environment for collaboration, testing, and evaluation, with features like automated continuous integration testing, observability, and prompt management. HoneyHive supports a broad range of models and offers several pricing tiers, including a free developer plan, which makes it a good fit for solo developers and researchers.