Understanding Transformers Part 17: Generating the Output Word

DEV Community
Rijul Rajesh

In the previous article, we set up the residual connections to get the final output values from the decoder.

In this article, we begin by passing these two output values through a fully connected layer. This layer has:

- One input for each value representing the current token (in this case, 2 inputs)
- One output for each word in the output vocabulary

Since our vocabulary has 4 tokens, this gives us 4 output values.

Next, we pass these 4 output values through a softmax function, which converts them into probabilities. This allows us to select the most likely output word, which in this case is "vamos". So far, the translation is correct.

However, the process does not stop here. The decoder continues generating words until it produces an end-of-sequence token (often written `<EOS>`), which indicates the end of the sentence. To generate the next word, we feed the predicted word back into the decoder. We will explore this step in the next article.

Looking for an easier way to install tools, libraries, or entire repositories? Installerpedia is a community-driven, structured installation platform that lets you install almost anything with minimal hassle and clear, reliable guidance. Just run `ipm install repo-name` … and you're done! 🚀

🔗 Explore Installerpedia here
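The projection-and-softmax step described above can be sketched in a few lines of NumPy. The weights, bias, and 4-word vocabulary here are made-up illustrative values (not the article's actual trained parameters); the point is the shapes: 2 decoder outputs go in, 4 logits come out, softmax turns them into probabilities, and argmax picks the word.

```python
import numpy as np

# Decoder output for the current token: 2 values (illustrative numbers).
decoder_output = np.array([1.0, -0.5])

# Hypothetical 4-word output vocabulary (assumed for this sketch).
vocab = ["<EOS>", "ir", "vamos", "ya"]

# Fully connected layer: 2 inputs -> 4 outputs, one per vocabulary word.
# W and b are made up so that "vamos" wins, mirroring the article's example.
W = np.array([
    [0.2, -1.0, 1.5, 0.1],
    [0.4,  0.3, -0.9, 0.8],
])
b = np.zeros(4)

logits = decoder_output @ W + b          # shape (4,): one score per word

# Softmax: subtract the max for numerical stability, then normalize.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# The most likely word is the one with the highest probability.
predicted = vocab[int(np.argmax(probs))]
print(predicted)  # -> vamos
```

In practice the decoder would now append `predicted` to its input and repeat the whole forward pass, stopping once the predicted word is the end-of-sequence token.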