Sentence Transformers

Compute embeddings and similarity scores with state-of-the-art models, enabling semantic search, paraphrase mining, and more with over 5,000 pre-trained models.
Natural Language Processing Text Embeddings Semantic Search

Sentence Transformers is a Python library that provides access to state-of-the-art text and image embedding models. It can be used to compute embeddings with Sentence Transformer models and to compute similarity scores with Cross-Encoder models. This opens up a wide range of use cases, including semantic search, semantic textual similarity, paraphrase mining, and more.

Over 5,000 pre-trained Sentence Transformers models are available for immediate use, including many of the state-of-the-art models from the Massive Text Embeddings Benchmark (MTEB) leaderboard. This variety of models means that users can pick the best models for their specific use cases. Users can also easily train or fine-tune their own models using Sentence Transformers.

Sentence Transformers provides a wide range of capabilities:

  • Computing Embeddings: Calculate embeddings using Sentence Transformer models.
  • Semantic Textual Similarity: Calculate the similarity between two text inputs.
  • Semantic Search: Find relevant text passages based on a query or search term.
  • Retrieve & Re-Rank: Retrieve the top-k results from a Sentence Transformer model and re-rank them using a Cross Encoder model.
  • Clustering: Group similar text inputs together.
  • Paraphrase Mining: Find paraphrased text inputs.
  • Translated Sentence Mining: Find translated text inputs.
  • Image Search: Find images based on text inputs.
  • Embedding Quantization: Reduce the dimensionality of embeddings.

The quickstart process is easy:

from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium."
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # [3, 384]
similarities = model.similarity(embeddings, embeddings)
print(similarities)

Cross Encoder models can be used for tasks that compare pairs of texts:

from sentence_transformers.cross_encoder import CrossEncoder
model = CrossEncoder("cross-encoder/stsb-distilroberta-base")
query = "A man is eating pasta."
corpus = [
    "A man is eating food.",
    "A man is eating a piece of bread.",
    "The girl is carrying a baby.",
    "A man is riding a horse.",
    "A woman is playing violin.",
    "Two men pushed carts through the woods.",
    "A man is riding a white horse on an enclosed ground.",
    "A monkey is playing drums.",
    "A cheetah is running behind its prey."
]
ranks = model.rank(query, corpus)
print("Query: ", query)
for rank in ranks:
    print(f"{rank['score']:.2f}\t{corpus[rank['corpus_id']]}")

Sentence Transformers can be installed via pip and conda, ensuring compatibility with different Python environments. The recommended setup includes Python 3.8+, PyTorch 1.11.0+, and transformers v4.34.0+. The project offers several installation options, including default, default with training, and development modes, each suited for different needs.

Sentence Transformers is maintained by Hugging Face and was initially developed by UKPLab. This module offers a comprehensive solution for natural language processing tasks that require efficient text and image embeddings.

Published on August 10, 2024

Related Questions

Tool Suggestions

Analyzing Sentence Transformers...