Another option is PlayHT, an AI text-to-speech service with more than 600 realistic voices. It supports custom pronunciation, voice inflections and real-time voice cloning, which makes it a good fit for live streams, games and conversational AI. PlayHT emphasizes ethics and safety, and it offers several pricing tiers and extensive documentation.
If you need more flexibility in generating and editing audio, Audiobox is a Meta research model that generates voices and sound effects from natural language text prompts. It can also perform noise cancellation and audio editing, so it's good for creative projects. With Audiobox you can add new voice styles to your audio and modify audio samples based on text prompts.
Also worth a look is Inworld, an AI voice generator and text-to-speech API built on recent machine learning models. It offers customizable speech synthesis and high-volume request support, which suits gaming, audiobooks and chatbots. Inworld has a developer-focused portal and extensive documentation, so it's a good option if you want to build generated speech into your own application.