spaCy is a free, open-source library for Natural Language Processing (NLP) in Python. It includes a variety of features that let you do things like named entity recognition (NER), part-of-speech (POS) tagging, dependency parsing, and word vector computation.
spaCy is designed to help you get real work done without unnecessary complexity or obtuse APIs. It is written in carefully memory-managed Cython, which makes it fast enough for large-scale information extraction, including processing entire web dumps.
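For large inputs, the usual spaCy pattern is to stream texts through Language.pipe, which processes them in batches and yields Doc objects lazily. The sketch below assumes spaCy v3 is installed. It uses a blank English pipeline (tokenizer only) so no trained model has to be downloaded, and the generator stream_texts is a hypothetical stand-in for a real corpus.

```python
import spacy

# A blank English pipeline contains only the tokenizer, so this sketch
# runs without downloading a trained pipeline.
nlp = spacy.blank("en")


def stream_texts():
    # Hypothetical stand-in for a large corpus read lazily from disk.
    for i in range(1000):
        yield f"Document number {i} mentions spaCy and Cython."


# nlp.pipe batches the incoming texts and yields Doc objects one at a time,
# so the whole corpus never has to be held in memory at once.
total_tokens = 0
for doc in nlp.pipe(stream_texts(), batch_size=256):
    total_tokens += len(doc)

print(total_tokens)
```

With a trained pipeline loaded instead of a blank one, the same loop would also run the tagger, parser, and entity recognizer on each batch.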
The library supports more than 75 languages and ships 84 trained pipelines for 25 languages. It offers multi-task learning with pretrained transformers such as BERT and a production-ready training system. You can also plug in your own models built with PyTorch, TensorFlow, and other frameworks, and use the built-in visualizers for syntax and NER.
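The built-in visualizer is displaCy. The sketch below assumes spaCy v3 and, to avoid a model download, attaches entity spans by hand to a blank-pipeline Doc instead of predicting them; the example sentence and labels are illustrative only.

```python
import spacy
from spacy import displacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Apple is opening an office in Berlin.")

# No trained NER model here, so entities are set manually (token indices
# 0-1 is "Apple", 6-7 is "Berlin") to give the visualizer something to draw.
doc.ents = [Span(doc, 0, 1, label="ORG"), Span(doc, 6, 7, label="GPE")]

# jupyter=False makes render() return the HTML markup as a string
# rather than displaying it inline in a notebook.
html = displacy.render(doc, style="ent", jupyter=False)
print(html[:200])
```

The returned string can be written to an .html file; displacy.serve(doc, style="ent") instead starts a small local web server, and style="dep" draws the dependency parse when the Doc has one.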
Newer features include integrating large language models into structured NLP pipelines (through the spacy-llm package), turning the models' free-form responses into structured output for many NLP tasks without requiring training data. You can also use tools like Prodigy for efficient machine teaching and annotation.
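spaCy's object-oriented design is built around three central data structures: the Language class, the Vocab, and the Doc object. The Language class processes text into Doc objects, each of which holds a sequence of tokens and their annotations. Strings, word vectors, and lexical attributes are stored once in the shared Vocab rather than copied into every Doc, which saves memory and provides a single source of truth for the data.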
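A small sketch of how these three objects relate, assuming spaCy v3 and a blank English pipeline: every Doc produced by the Language object points at the same Vocab, and identical words in different Docs resolve to one hash in the shared string store.

```python
import spacy
from spacy.tokens import Doc

# The Language object: a blank English pipeline (tokenizer only).
nlp = spacy.blank("en")

# Calling the Language object on a string returns a Doc.
doc1 = nlp("spaCy processes text into Doc objects.")
doc2 = nlp("Every Doc shares the same Vocab.")

# Both Docs reference the single Vocab owned by the Language object.
print(doc1.vocab is nlp.vocab, doc2.vocab is nlp.vocab)

# "Doc" is token 4 in doc1 and token 1 in doc2; both carry the same hash,
# and the string itself is stored once in the shared StringStore.
print(doc1[4].orth == doc2[1].orth)
print(nlp.vocab.strings[doc1[4].orth])

# A Doc can also be constructed directly from words, reusing the same Vocab.
doc3 = Doc(nlp.vocab, words=["Hello", "world", "!"], spaces=[True, False, False])
print(doc3.text)
```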
Developers can add pipeline components with Language.add_pipe and swap existing ones out with Language.replace_pipe. The library comes with many built-in components for different language processing tasks, and you can register your own. It also supports model packaging, deployment, and workflow orchestration.
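The sketch below, assuming spaCy v3, adds one built-in component (the rule-based entity_ruler) and one custom component to a blank pipeline. The component name "entity_counter", its behavior, and the "LANG" label are hypothetical, chosen only for illustration.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc

nlp = spacy.blank("en")

# Built-in component, added by its registered factory name.
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "spaCy"},                  # exact phrase pattern
    {"label": "LANG", "pattern": [{"LOWER": "python"}]},   # token pattern
])


# Custom component (hypothetical): any function that takes a Doc and returns
# a Doc can be registered under a name and added like a built-in one.
@Language.component("entity_counter")
def entity_counter(doc: Doc) -> Doc:
    doc.user_data["n_ents"] = len(doc.ents)
    return doc


nlp.add_pipe("entity_counter", after="entity_ruler")
print(nlp.pipe_names)  # ['entity_ruler', 'entity_counter']

doc = nlp("spaCy is an NLP library for Python.")
print([(ent.text, ent.label_) for ent in doc.ents])
print(doc.user_data["n_ents"])
```

Because components are looked up by name, the same add_pipe call works for built-ins and for functions registered with @Language.component.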
Organizations that need tailored spaCy pipelines can commission them from the library's core developers, who offer fixed pricing and maintainable code.
spaCy can be installed with pip or conda and works on 64-bit CPython 3.7+ on Unix/Linux, macOS/OS X, and Windows. Extra dependencies such as lookups, transformers, or specific language support can be installed with pip's bracket syntax, for example spacy[lookups,transformers].
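Before installing, you can check whether the current interpreter matches the stated platform requirement. The helper below is a hypothetical sketch that only encodes "64-bit CPython 3.7 or newer" as given above; the exact supported Python range varies between spaCy releases, so the release notes for your target version take precedence.

```python
import platform
import struct
import sys


def meets_stated_requirements() -> bool:
    """Hypothetical helper: True if running 64-bit CPython 3.7 or newer."""
    is_cpython = platform.python_implementation() == "CPython"
    # Pointer size in bits: 64 on a 64-bit interpreter.
    is_64bit = struct.calcsize("P") * 8 == 64
    is_recent_enough = sys.version_info >= (3, 7)
    return is_cpython and is_64bit and is_recent_enough


print(meets_stated_requirements())
```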
By balancing performance, simplicity, and extensibility, spaCy has become one of the most widely used libraries for Natural Language Processing in Python.
Published on June 27, 2024