Decoding the Power of Multi-Head Attention in Transformers

Nov 6, 2024 · Tuhin Sharma · 1 min read
Image credit: DALL·E

Description

In our pursuit of a fun dialogue completer built on transformers, we chose to complete famous quotes, like those of Cersei Lannister when she played the game of thrones. To achieve this, we first passed our input text through an embedding layer and added position information to each word embedding, creating position-aware embeddings that were then sent forward to the next layer. In this segment, we dive deeper into the transformative heart of transformer models: the multi-head attention mechanism, which, I must warn, is a powerful, multi-faceted component.
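To make that pipeline concrete, here is a minimal PyTorch sketch of the two steps described above: a token embedding summed with a positional embedding, followed by multi-head self-attention. The `PositionAwareEmbedding` class, the vocabulary size, and the model dimensions are illustrative assumptions for this sketch, not the actual code from this series.

```python
import torch
import torch.nn as nn

class PositionAwareEmbedding(nn.Module):
    """Token embedding plus a learned positional embedding (illustrative)."""
    def __init__(self, vocab_size, d_model, max_len=512):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)

    def forward(self, ids):  # ids: (batch, seq_len)
        positions = torch.arange(ids.size(1), device=ids.device)
        return self.tok(ids) + self.pos(positions)  # broadcasts over the batch

# Hypothetical sizes, chosen only for the demo
vocab_size, d_model, n_heads = 10_000, 64, 8

embed = PositionAwareEmbedding(vocab_size, d_model)
attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

ids = torch.randint(0, vocab_size, (1, 12))  # one 12-token "quote"
x = embed(ids)                               # position-aware embeddings
out, weights = attn(x, x, x)                 # self-attention: query = key = value
print(out.shape, weights.shape)              # (1, 12, 64) and (1, 12, 12)
```

Each of the 8 heads attends over the 12 tokens independently before the per-head outputs are concatenated and projected back to `d_model`, which is exactly what gives the mechanism its multi-faceted character.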
