In this article we will code a translator based on seq2seq with an encoder and a decoder. A seq2seq model is made up of two components, the encoder and the decoder, so below we will build both.
What is seq2seq?
To understand how the seq2seq-based translator works, we must first understand what seq2seq is. Seq2seq (Sequence-to-Sequence) is a deep learning model used to process input sequences and generate output sequences in natural language processing applications such as machine translation, text summarization, and question answering, among others.
The model consists of two main components: an encoder and a decoder.
The encoder processes the input sequence and transforms it into a feature vector, which is used as input to the decoder.
The decoder generates the output sequence step by step, taking the feature vector and the previously generated sequence as input.
Seq2seq is based on recurrent neural networks (RNNs), which can handle variable-length sequences and capture long-term dependencies. In particular, long short-term memory (LSTM) or GRU networks, which are variants of RNNs, are typically used for both the encoder and the decoder.
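To picture the data flow, the sketch below uses purely illustrative shapes (the vocabulary sizes and sequence lengths are hypothetical, not taken from the exercise): the encoder compresses a batch of tokenized sentences into a fixed-size state, and the decoder expands that state into one score per target-vocabulary word at each output step.

import numpy as np

# Illustrative shapes only; the real sizes come from the dataset used in the exercise
batch_size, input_len, output_len = 32, 15, 21        # hypothetical sentence lengths
source_vocab, target_vocab, state_dim = 200, 350, 400 # hypothetical vocabulary sizes

# A batch of tokenized source sentences (integer word indices)
encoder_input = np.random.randint(0, source_vocab, size=(batch_size, input_len))

# The encoder's job: summarize each sentence into a fixed-size state vector
encoder_state = np.zeros((batch_size, state_dim))

# The decoder's job: turn that state into one distribution over the target
# vocabulary for each of the output_len time steps
decoder_logits = np.zeros((batch_size, output_len, target_vocab))
print(encoder_input.shape, encoder_state.shape, decoder_logits.shape)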
Seq2seq has proven to be very effective in machine translation and other natural language processing tasks.
Translator based on seq2seq
Encoder construction
As we have mentioned, the seq2seq-based translator requires an encoder and a decoder to work. Let us now see how to build the encoder.
The first thing we will do is build our input layer, which is where the representations of the English texts will be received. Next, we create our encoder, which in this case will be an LSTM with 400 units, that is, the number of neurons we will work with.
This LSTM will receive the input directly. If we wanted to, we could optimize it further, for example by adding an embedding layer beforehand or any other technique we have seen in other blog posts. However, since this is already a fairly expensive method in itself, we will not include features that make it more complex.
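As a standalone sketch, the encoder part alone looks like this (tmp_x is the array of padded, tokenized English sentences produced in the earlier preprocessing step of the exercise; these are the same layers that appear in the full code further below):

#Encoder sketch
from tensorflow.keras.layers import Input, LSTM

encoder_input_seq = Input(shape=tmp_x.shape[1:])   # tmp_x: padded, tokenized English sentences
encoder_output, state_h, state_c = LSTM(
    units=400,               # the 400 neurons mentioned above
    return_sequences=False,  # we only keep the final summary of the sentence
    return_state=True        # expose the hidden and cell states for the decoder
)(encoder_input_seq)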
Define the decoder
In our seq2seq-based translator exercise, the next step is to define the decoder. This is where the difference with other models we have seen previously comes in. What we create is a RepeatVector: with it we generate the predictions one by one, starting from the state we received from the encoder.
Note that all of this lives within the same network: the encoder and decoder are different layers, but they are not defined in separate models; both are built within the same model.
With this we define the RepeatVector of our seq2seq-based translator. We then pass its output to the decoder LSTM so that it can optimize its weights. This is where the key difference appears: we tell the decoder LSTM that its input is what it receives, step by step, from the RepeatVector, and we initialize it with the initial_state we received from the encoder. In other words, it does not start from scratch; it starts from the states produced by the encoder.
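Before looking at the full code, here is the decoder piece in isolation, reusing the encoder_output, state_h and state_c produced by the encoder above (max_french_sequence_length and french_vocab_size come from the preprocessing step of the exercise):

#Decoder sketch: repeat the encoder output once per output time step
from tensorflow.keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

decoder_input_seq = RepeatVector(max_french_sequence_length)(encoder_output)
decoder_out = LSTM(units=400, return_sequences=True)(
    decoder_input_seq,
    initial_state=[state_h, state_c]   # start from the encoder's states, not from scratch
)
logits = TimeDistributed(Dense(units=french_vocab_size))(decoder_out)  # one score per French word per step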
Let’s see this seq2seq-based translator exercise in practice:
#Translator based on seq2seq
from tensorflow.keras.layers import Input, LSTM, RepeatVector, Dense, TimeDistributed, Activation
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import sparse_categorical_crossentropy

learning_rate = 1e-3

#Encoder: reads the English sequences and returns its final states
encoder_input_seq = Input(shape=tmp_x.shape[1:])
encoder_output, state_h, state_c = LSTM(units=400, return_sequences=False, return_state=True)(encoder_input_seq)

#Decoder: repeats the encoder output and decodes it starting from the encoder states
decoder_input_seq = RepeatVector(max_french_sequence_length)(encoder_output)
decoder_out = LSTM(units=400, return_sequences=True, return_state=False)(decoder_input_seq, initial_state=[state_h, state_c])
logits = TimeDistributed(Dense(units=french_vocab_size))(decoder_out)

#Model
model = Model(encoder_input_seq, Activation('softmax')(logits))
model.compile(loss=sparse_categorical_crossentropy, optimizer=Adam(learning_rate=learning_rate), metrics=['accuracy'])
model.fit(tmp_x, preproc_french_sentences, batch_size=32, epochs=1, validation_split=0.2)

#Print prediction(s)
print(logits_to_text(model.predict(tmp_x[:1])[0], french_tokenizer))
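The last line relies on logits_to_text and french_tokenizer, which are assumed to come from earlier steps of the exercise. For reference, a minimal hypothetical version of such a helper, built on the Keras tokenizer's word_index, could look like this:

import numpy as np

def logits_to_text(logits, tokenizer):
    """Hypothetical helper: turn a (time_steps, vocab_size) logits matrix back into words."""
    index_to_words = {index: word for word, index in tokenizer.word_index.items()}
    index_to_words[0] = '<PAD>'   # index 0 is reserved for padding in Keras tokenizers
    return ' '.join(index_to_words[prediction] for prediction in np.argmax(logits, axis=1))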
What's Next?
We know that Big Data has many aspects and, therefore, there are many topics you can learn about. We offer you the possibility of learning with the best professionals, who will guide you through theory and practice so that, in a few months, you can become a great professional in the IT sector. Take a look at the Big Data, Artificial Intelligence & Machine Learning Full Stack Bootcamp syllabus and discover this intensive, comprehensive, high-quality training. Request more information now and take the step that will boost your career!