February 23, 2024

How does Python’s spaCy work?

spaCy is an open-source Python library for NLP. Python has a similar library called NLTK. The main difference between the two is that NLTK offers a friendlier environment and suits beginners well, while spaCy is designed for production use.

The working philosophy of spaCy is that, when several algorithms can solve a problem, the library should offer a single one rather than a menu of alternatives. Its operation is based on building pipelines.
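As a minimal sketch of what a pipeline looks like in practice, the example below builds a blank English pipeline and adds a single component to it. It assumes spaCy v3, where spacy.blank and the built-in sentencizer component work without downloading a model. The sample text is made up for illustration.

```python
import spacy

# Start from a blank English pipeline: it contains only a tokenizer.
nlp = spacy.blank("en")

# Add the built-in rule-based sentence splitter as a pipeline component.
nlp.add_pipe("sentencizer")

# Components run in this order every time nlp() is called on a text.
print(nlp.pipe_names)

doc = nlp("spaCy builds pipelines. Each component adds annotations.")
sentences = [sent.text for sent in doc.sents]
print(sentences)
```

Each component receives the document produced by the previous one, which is why the order of components in the pipeline matters.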

What is spaCy?

spaCy, together with NLTK, is one of the most widely used libraries in the world for Natural Language Processing (NLP). spaCy was developed by Matthew Honnibal and released in 2015; it has an MIT license and is available on GitHub.

Python spaCy Features

Among other things, this library has the following characteristics:

Supports more than 70 languages.
Contains 80 trained pipelines for 24 languages.
Includes pre-trained transformers such as BERT, a transformer-based deep learning architecture and one of the most powerful available.
Multi-task learning.
Pre-trained word vectors.
Linguistically motivated tokenization.
Components for named entity recognition, part-of-speech tagging, dependency parsing, sentence segmentation, text classification, lemmatization and morphological analysis, among others.
Support for custom models in PyTorch, TensorFlow and other frameworks.

spaCy provides pre-trained models, distributed as separate packages that are downloaded and then loaded from the library. With them, tasks such as entity detection or topic extraction (to give some examples) can be done more automatically.

Python spaCy related libraries

NLTK

NLTK offers some of the functionality of spaCy. Although it was initially developed for teaching and research, its stability over time has earned it a large number of users in industries of all types.

It is the leading alternative to spaCy for tokenization and sentence segmentation. NLTK offers a wider range of options, while spaCy is more focused on performance. Although both libraries have similar functionality, spaCy's implementation is usually faster and more accurate.

Gensim

Gensim provides unsupervised text modeling algorithms. Although Gensim is not a dependency of spaCy, it can be used alongside it to train word vectors.

TensorFlow/Keras

It is one of the most popular deep learning libraries. It provides efficient and powerful feature extraction functionality that can be used in any deep learning data preprocessing.

What linguistic capabilities (models) does Python’s spaCy offer us?

Some of the functionalities that the spaCy Python library offers us are:

POS tagging.
Dependency parsing.
Named entity recognition.
Tokenization.
Sentence segmentation.
Rule-based matching.

That is to say, with spaCy we can extract tokens, POS tags, dependency trees or named entities whenever we want. It also includes word embedding models, which we have already seen briefly on the blog.
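Of the capabilities listed above, rule-based matching is the easiest to try without a trained model. The following sketch assumes spaCy v3 and uses the Matcher class on a blank English pipeline. The pattern and the rule name LIVE_IN_PLACE are hypothetical, chosen only for this example.

```python
import spacy
from spacy.matcher import Matcher

# Tokenizer-only pipeline: token-level patterns do not need a trained model.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Pattern: the token "live", then "in", then any title-cased token.
pattern = [{"LOWER": "live"}, {"LOWER": "in"}, {"IS_TITLE": True}]
matcher.add("LIVE_IN_PLACE", [pattern])

doc = nlp("My name is Fran and I live in Madrid.")

# Each match is a (match_id, start, end) triple of token offsets.
matches = [doc[start:end].text for _, start, end in matcher(doc)]
print(matches)
```

Patterns that rely on POS tags or lemmas would need a trained pipeline, since a blank pipeline does not set those attributes.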

Figure: Some of spaCy's text-processing functionalities.

Python spaCy models

Pre-trained models for different languages and trained on different corpora can be installed in different ways, either with spaCy's download command or with pip.
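Once a model package is installed, it is loaded by name. The sketch below assumes the small Spanish model es_core_news_sm, which can be installed with "python -m spacy download es_core_news_sm". The fallback to a blank pipeline is only there so the example runs even when the model is missing. It is not something the original workflow requires.

```python
import spacy

# Assumed model name; install it beforehand, for example with:
#   python -m spacy download es_core_news_sm
MODEL_NAME = "es_core_news_sm"

try:
    # spacy.load raises OSError when the model package is not installed.
    nlp_es = spacy.load(MODEL_NAME)
except OSError:
    # Fallback for this sketch only: a blank Spanish pipeline (tokenizer only).
    nlp_es = spacy.blank("es")

print(nlp_es.lang, nlp_es.pipe_names)
```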

Pipelines in Python spaCy

Let’s see an example of how these pipelines work:

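import spacy

# Assumption: the Spanish model es_core_news_sm is installed; the original does not show how nlp_es was created.
nlp_es = spacy.load("es_core_news_sm")

text = "My name is Fran and I live in Madrid. Today is Monday, January 31, 2022"
doc = nlp_es(text)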

This call runs the text through the pipeline and returns a spaCy Doc object that represents it. We can do various things with it, such as sentence segmentation and tokenization:

# Sentences
for idx, sent in enumerate(doc.sents):
    print(f'Sentence {idx} {sent.text}')

Sentence 0 My name is Fran and I live in Madrid.

Sentence 1 Today is Monday, January 31, 2022.

# Tokens
for idx, token in enumerate(doc):
    print(f'Token {idx} {token.text}')

# Table with each token's text, shape and is_alpha flag
print('{0:10} {1:10} {2:5}'.format('Token', 'Shape', 'is_alpha'))
for token in doc:
    print('{0:10} {1:10} {2:5}'.format(token.text, token.shape_, str(token.is_alpha)))
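Named entities can be explored in the same way through doc.ents. A trained model predicts them statistically. As a model-free sketch, the example below instead uses spaCy v3's rule-based entity_ruler component on a blank pipeline. The patterns and labels are assumptions made up for this sample sentence, not what a trained model would necessarily predict.

```python
import spacy

# Blank pipeline: no trained model required.
nlp = spacy.blank("en")

# Rule-based entity component; patterns and labels are made up for this example.
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "PERSON", "pattern": "Fran"},
    {"label": "GPE", "pattern": "Madrid"},
])

doc = nlp("My name is Fran and I live in Madrid. Today is Monday, January 31, 2022")

# doc.ents holds the entity spans found by the pipeline.
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)
```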

As we have seen, there are many things that can be done with Python spaCy. Here we have shown you a few, and we hope they have been enough to get you interested in the topic and to keep learning.

