For training and iterating on AI models with large volumes of unstructured data from many sources, Dataloop is a strong option. This AI development platform includes data management for exploring and analyzing large datasets, automated preprocessing, and embeddings for similarity detection. It also provides model management, pipeline orchestration and human feedback integration to support collaboration and AI application development. It handles many types of unstructured data, such as images, video and text, and is designed to meet high security requirements.
Another strong contender is Nanonets, an AI-powered automation platform that extracts insights from unstructured data drawn from many sources. It covers ingestion, AI-powered data extraction, data enrichment for better insights and flexible export options. Nanonets is geared toward industries like finance, manufacturing and health care, where it can automate tasks and free up staff for higher-level work.
Airbyte is an open-source data integration platform that moves data from more than 300 structured and unstructured sources to many destinations. It's aimed at data engineers, AI engineers, analytics engineers and data analysts. Features include custom connectors, automated schema evolution and integrations with tools like OpenAI and dbt, which lets it cover a wide range of data integration needs.
For intelligent document processing and data extraction, ABBYY uses AI, NLP and OCR to automate business processes. It includes a marketplace of custom-built AI models, with support for large language models and robotic automation. ABBYY's platform is well suited to automating document-centric processes like accounts payable and customer onboarding, and to providing fast, accurate insights that inform business decisions.