spaCy

Processes text into annotated objects for efficient large-scale information extraction, supporting 75+ languages and integrating with custom models and pipelines.

spaCy is a free, open-source library for Natural Language Processing (NLP) in Python. It includes a variety of features that let you do things like named entity recognition (NER), part-of-speech (POS) tagging, dependency parsing, and word vector computation.

spaCy is designed to help you get real work done rather than waste time on unnecessary complexity and opaque APIs. It is written in Cython, which makes it fast enough for large-scale information extraction tasks such as processing entire web dumps.
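As a rough illustration of large-scale processing, the sketch below streams a handful of texts through nlp.pipe, which processes them in batches and yields Doc objects lazily. It assumes a blank English pipeline (tokenizer only, so no trained model needs to be downloaded); the sample texts and the batch size are made up for illustration.

```python
import spacy

# A blank English pipeline: tokenizer only, no trained model required.
nlp = spacy.blank("en")

# Hypothetical sample texts standing in for a much larger corpus.
texts = [
    "spaCy is written in Cython.",
    "It targets large-scale information extraction.",
    "Batching keeps the per-document overhead low.",
]

# nlp.pipe accepts any iterable of strings, processes it in batches,
# and yields one Doc per input text in the original order.
token_counts = [len(doc) for doc in nlp.pipe(texts, batch_size=2)]
print(token_counts)
```

For a real corpus, texts can be a generator that reads from disk, so the whole collection never has to sit in memory at once.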

The library supports more than 75 languages and comes with 84 trained pipelines for 25 languages. It offers multi-task learning with pretrained transformers like BERT and a robust, production-ready training system. You can also plug in your own models trained with PyTorch, TensorFlow and other frameworks, and use the built-in visualizers for syntax and NER.
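The built-in NER visualizer can be tried without downloading a trained pipeline. The sketch below is a minimal example under that assumption: it uses a blank English pipeline with the rule-based entity_ruler component instead of a statistical model, and the patterns and sentence are hypothetical sample data.

```python
import spacy
from spacy import displacy

# Blank pipeline plus a rule-based entity matcher, so no model download is needed.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")

# Hypothetical patterns, used only to give the visualizer something to show.
ruler.add_patterns([
    {"label": "ORG", "pattern": "Acme Corp"},
    {"label": "GPE", "pattern": "Berlin"},
])

doc = nlp("Acme Corp opened a new office in Berlin.")
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)

# With jupyter=False, displacy.render returns the markup as a string
# instead of trying to display it inline in a notebook.
html = displacy.render(doc, style="ent", jupyter=False)
```

With a trained pipeline that includes a parser, style="dep" renders the dependency tree in the same way.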

New features include the ability to integrate large language models into structured NLP pipelines, turning free-form responses into structured output for many NLP tasks without requiring training data. You can also use tools like Prodigy for efficient machine teaching and annotation.

spaCy's object-oriented design is built around the Language class, the Vocab, and the Doc object. The Language class processes text into Doc objects, which hold sequences of tokens and their annotations. Because the annotations live in one place, this approach provides a single source of truth for the data and is memory efficient.
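The relationship between these objects can be sketched in a few lines. The example assumes a blank English pipeline (tokenizer only) and a made-up sentence; it shows that the Doc shares the pipeline's Vocab, that token strings are stored once and looked up by hash, and that a Span is a view onto its Doc rather than a copy.

```python
import spacy

# The Language object (conventionally named nlp) turns text into a Doc.
nlp = spacy.blank("en")
doc = nlp("spaCy turns text into Doc objects.")

# A Doc is a sequence of Token objects.
tokens = [token.text for token in doc]

# The Doc does not carry its own vocabulary; it points at the shared one.
shares_vocab = doc.vocab is nlp.vocab

# Strings are stored once in the vocab's string store and referenced by hash.
first = doc[0]
round_trip = nlp.vocab.strings[first.orth]

# A slice of a Doc is a Span that refers back to the same Doc.
span = doc[0:2]

print(tokens)
print(shares_vocab, round_trip, span.doc is doc)
```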

Developers can easily add or replace components with Language.add_pipe. The library comes with many built-in components for different language processing tasks, and you can add your own. It also supports model packaging, deployment, and workflow orchestration.
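A minimal sketch of extending a pipeline with Language.add_pipe follows. It combines the built-in sentencizer with a custom component; the component name token_counter and the n_tokens key are hypothetical names chosen for this example, not part of spaCy.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc


# Register a custom component under a hypothetical name.
# A component is a callable that receives a Doc and returns it.
@Language.component("token_counter")
def token_counter(doc: Doc) -> Doc:
    # Store a simple statistic on the Doc (illustrative key name).
    doc.user_data["n_tokens"] = len(doc)
    return doc


nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")               # built-in rule-based sentence splitter
nlp.add_pipe("token_counter", last=True)  # custom component, run last

doc = nlp("Pipelines are modular. Components run in order.")
print(nlp.pipe_names)
print(doc.user_data["n_tokens"], len(list(doc.sents)))
```

Components run in the order given by nlp.pipe_names, and add_pipe also accepts before, after and first arguments to control placement.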

For customers who need custom spaCy pipelines, the library's core developers can provide custom solutions with fixed pricing and maintainable code.

spaCy is compatible with 64-bit CPython 3.7+ and runs on Unix/Linux, macOS/OS X and Windows. It can be installed with pip or conda. With pip, you can install extra dependencies such as lookups, transformers or specific language support by listing them in brackets without spaces, for example spacy[lookups,transformers].

By balancing performance, simplicity and extensibility, spaCy has become a de facto standard for Natural Language Processing tasks in Python.

Published on June 27, 2024
