If you want to build your own chatbot that communicates with users through images and voice, there are a few good options. One of the most powerful is Deep Chat, a framework-agnostic chat component that you can configure to handle media uploads, camera and microphone input, speech-to-text input, and text-to-speech output. It's open source, you can install it with npm or load it from a content delivery network, and it's flexible enough for anything from a simple chatbot to a full-fledged multimedia interface.
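To give a rough idea of what enabling those media features might look like, here is a minimal TypeScript sketch. It assumes the npm package is named `deep-chat`, that it registers a `<deep-chat>` custom element, and that the element accepts `images`, `camera`, `speechToText` and `textToSpeech` attributes; check those names against the documentation for the version you install. The `renderDeepChatTag` helper and the `MediaOptions` type are hypothetical and exist only for this illustration.

```typescript
// Assumed setup (not run here): `npm install deep-chat`, then
// `import "deep-chat";` in the browser bundle to register the element.

// Hypothetical option bag; the keys mirror attribute names that are
// assumed to exist on the <deep-chat> element.
type MediaOptions = {
  images?: boolean;       // image uploads
  camera?: boolean;       // camera capture
  speechToText?: boolean; // voice input transcribed to text
  textToSpeech?: boolean; // replies read aloud
};

// Hypothetical helper: returns markup for a <deep-chat> element with
// one attribute per enabled feature. Disabled or missing keys are skipped.
function renderDeepChatTag(options: MediaOptions): string {
  const attributes = Object.entries(options)
    .filter(([, enabled]) => enabled === true)
    .map(([name]) => `${name}="true"`);
  return ["<deep-chat", ...attributes].join(" ") + "></deep-chat>";
}

const markup = renderDeepChatTag({
  images: true,
  camera: true,
  speechToText: true,
  textToSpeech: true,
});
console.log(markup);
// <deep-chat images="true" camera="true" speechToText="true" textToSpeech="true"></deep-chat>
```

In a real page you would more likely write the tag by hand or set the properties on the element directly; the helper only makes the feature-to-attribute mapping explicit. The sketch also leaves out the connection to a backend or AI service, which you would need to configure separately.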
Another good option is EmbedAI, an AI chatbot builder powered by ChatGPT. It lets you build custom chatbots from your own data, including files, websites and YouTube videos, and it supports more than 100 languages. EmbedAI offers several ways to share your chatbot, and you can connect it to other apps through its API or Zapier integration. It's well suited to e-commerce customer service and educational sites.
If you want something more straightforward, Botpress offers a user-friendly interface and is powered by OpenAI's large language models. It can integrate with more than 100 channels, such as WhatsApp and Telegram, and its API and SDK let you customize it further. Botpress also has a capable free tier, so it's a good option for personal projects.