Hetens/nano-gpt

Repository files navigation

Notes:

Attention is a communication mechanism. It can be seen as nodes in a directed graph looking at each other and aggregating information as a weighted sum of the values of all nodes that point to them, with data-dependent weights.
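
A minimal sketch of this view as a single self-attention head in PyTorch: each token aggregates a data-dependent weighted sum over the tokens that point to it (here, the preceding tokens under a causal mask). Tensor names and sizes (B, T, C, head_size) are illustrative assumptions, not code from this repo.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(1337)
B, T, C = 4, 8, 32        # batch, time (tokens), channels
head_size = 16

x = torch.randn(B, T, C)

key   = torch.nn.Linear(C, head_size, bias=False)
query = torch.nn.Linear(C, head_size, bias=False)
value = torch.nn.Linear(C, head_size, bias=False)

k = key(x)    # (B, T, head_size)
q = query(x)  # (B, T, head_size)
v = value(x)  # (B, T, head_size)

# Affinities between nodes: the data-dependent edge weights of the graph.
wei = q @ k.transpose(-2, -1) * head_size ** -0.5   # (B, T, T)

# Causal (decoder) mask: node t may only look at nodes <= t.
tril = torch.tril(torch.ones(T, T))
wei = wei.masked_fill(tril == 0, float('-inf'))
wei = F.softmax(wei, dim=-1)

out = wei @ v  # (B, T, head_size): weighted sum over the nodes pointing in
print(out.shape)  # torch.Size([4, 8, 16])
```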

There is no notion of space. Attention simply acts over a set of vectors. This is why we need to positionally encode tokens.
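
Because attention is permutation-invariant over the set of token vectors, position has to be injected explicitly. A minimal sketch with a learned positional embedding table added to the token embeddings; the names (vocab_size, block_size, n_embd) are illustrative assumptions.

```python
import torch
import torch.nn as nn

vocab_size, block_size, n_embd = 65, 8, 32

token_embedding_table = nn.Embedding(vocab_size, n_embd)
position_embedding_table = nn.Embedding(block_size, n_embd)

idx = torch.randint(0, vocab_size, (4, block_size))            # (B, T) token ids
tok_emb = token_embedding_table(idx)                           # (B, T, n_embd)
pos_emb = position_embedding_table(torch.arange(block_size))   # (T, n_embd)

x = tok_emb + pos_emb   # broadcasting adds position info to every example
print(x.shape)          # torch.Size([4, 8, 32])
```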

Each example across the batch dimension is of course processed completely independently; the examples never talk to each other.
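
A small check of this, assuming a single-head attention function like the sketch above (all names here are assumptions): running attention on the full batch and on one example in isolation gives the same result for that example.

```python
import torch
import torch.nn.functional as F

def head(x, key, query, value):
    T = x.shape[1]
    k, q, v = key(x), query(x), value(x)
    wei = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5
    wei = wei.masked_fill(torch.tril(torch.ones(T, T)) == 0, float('-inf'))
    return F.softmax(wei, dim=-1) @ v

torch.manual_seed(0)
B, T, C, hs = 4, 8, 32, 16
key, query, value = (torch.nn.Linear(C, hs, bias=False) for _ in range(3))
x = torch.randn(B, T, C)

batched = head(x, key, query, value)       # all B examples at once
single  = head(x[0:1], key, query, value)  # example 0 alone
print(torch.allclose(batched[0:1], single, atol=1e-6))  # True
```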

"self-attention" means that the keys and values produced in the same pool of nodes. "cross-attention" refers to a separate pool of nodes for the keys and a separate pool for the values.
