If you need an open-source project for applying more advanced NLP techniques to your data, Trieve is a good option. It's a full-stack framework for building search, recommendations and Retrieval-Augmented Generation (RAG) experiences. Features include private managed embedding models, SPLADE full-text neural search, semantic vector search and hybrid search. It also supports date recency biasing, re-ranker models and merchandising relevance tuning, which makes it a good fit when basic keyword search is not enough. A free tier is available for noncommercial self-hosting.
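To make "hybrid search" and "date recency biasing" more concrete, here is a minimal, generic sketch of the idea. It is not Trieve's API or implementation. It assumes reciprocal rank fusion as the way to merge a keyword ranking with a vector ranking, and an exponential half-life decay as the recency bias. All function names and parameters (`reciprocal_rank_fusion`, `recency_weight`, `hybrid_search`, `k`, `half_life_days`, `recency_bias`) are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one score per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores


def recency_weight(published, now, half_life_days=30.0):
    """Exponential decay: the weight halves every `half_life_days`."""
    age_days = max((now - published).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / half_life_days)


def hybrid_search(keyword_ranking, vector_ranking, published_at, now,
                  recency_bias=0.3):
    """Return document ids ordered by fused relevance, nudged toward newer items.

    recency_bias=0 ignores dates entirely; recency_bias=1 scales each score
    fully by its recency weight.
    """
    fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
    final = {}
    for doc_id, score in fused.items():
        weight = recency_weight(published_at[doc_id], now)
        final[doc_id] = score * ((1.0 - recency_bias) + recency_bias * weight)
    return sorted(final, key=final.get, reverse=True)


if __name__ == "__main__":
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    published_at = {
        "a": now - timedelta(days=365),  # old document
        "b": now - timedelta(days=1),    # fresh document
        "c": now - timedelta(days=1),
    }
    # Hypothetical outputs of a keyword engine and a vector engine.
    print(hybrid_search(["a", "b", "c"], ["b", "a", "c"], published_at, now))
```

In this toy data, documents "a" and "b" are equally relevant before the date adjustment, so the recency bias is what separates them. A hosted service would typically also run a re-ranker model over the fused candidates, which this sketch leaves out.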
Another good option is Exa, which uses embeddings and transformer-based models to process search queries. It interprets natural-language queries and retrieves page content on the fly, so it can return contextually relevant results. Exa is designed to work with Large Language Models (LLMs), supplying authoritative web content that helps reduce hallucinations. The service offers several pricing levels, including a free tier, and indexes its data every two minutes, focusing on high-quality web pages.
If you need to manage large-scale data infrastructure for Retrieval-Augmented Generation (RAG) and semantic search, check out Neum AI. It offers open-source SDKs, built-in connectors to many data sources and scalable pipelines that can handle millions of vectors. The framework supports real-time data embedding and indexing and integrates with services like Supabase. Neum AI offers several pricing levels, including a free starter plan, so there should be a plan that fits your needs and scale.
Finally, there's Embedditor, which is designed to optimize embedding metadata and tokens for vector search using NLP techniques such as TF-IDF and normalization. It has a user interface for fine-tuning embedding tokens and optimizing vector storage to cut costs. It's useful for making vector database content more relevant and for improving data security and cost-effectiveness.
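As a rough illustration of what TF-IDF and normalization can do before text is embedded, the sketch below normalizes a set of chunks, scores each token with TF-IDF, and drops tokens that carry no distinguishing information. This is a generic standard-library example and not Embedditor's actual code. The function names (`normalize`, `tfidf_scores`, `prune_chunk`) and the `min_score` threshold are assumptions made for the example.

```python
import math
import re
from collections import Counter


def normalize(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def tfidf_scores(chunks):
    """Return one {token: tf-idf score} dict per chunk."""
    tokenized = [normalize(chunk).split() for chunk in chunks]
    n_chunks = len(tokenized)
    # Document frequency: in how many chunks does each token appear?
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    all_scores = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        all_scores.append({
            tok: (count / total) * math.log(n_chunks / doc_freq[tok])
            for tok, count in counts.items()
        })
    return all_scores


def prune_chunk(chunk, scores, min_score=0.0):
    """Keep only tokens whose TF-IDF score is above `min_score`."""
    kept = [tok for tok in normalize(chunk).split()
            if scores.get(tok, 0.0) > min_score]
    return " ".join(kept)


if __name__ == "__main__":
    chunks = ["The cat sat.", "The dog ran!"]
    scores = tfidf_scores(chunks)
    for chunk, chunk_scores in zip(chunks, scores):
        print(prune_chunk(chunk, chunk_scores))
```

Tokens that appear in every chunk, such as "the" here, score zero and are removed, which shortens the text sent to the embedding model. Aggressive pruning can also remove context the model needs, so the threshold is worth checking against retrieval quality.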