Build and understand Transformer architecture through hands-on implementation.
This repository contains a step-by-step implementation of the Transformer architecture as described in "Attention Is All You Need". The goal is to deeply understand each component through practical implementation.
transformer/
│
├── src/
│ ├── models/
│ │ ├── __init__.py
│ │ ├── attention.py # Attention mechanisms
│ │ ├── encoder.py # Transformer encoder
│ │ ├── decoder.py # Transformer decoder
│ │ ├── embeddings.py # Token and positional embeddings
│ │ ├── feed_forward.py # Position-wise feed-forward networks
│ │ └── transformer.py # Full transformer model
│ │
│ ├── utils/
│ │ ├── __init__.py
│ │ ├── config.py # Model configuration
│ │ ├── data_loader.py # Data loading utilities
│ │ ├── metrics.py # Evaluation metrics
│ │ └── visualization.py # Attention visualization tools
│ │
│ └── training/
│ ├── __init__.py
│ ├── trainer.py # Training loop
│ ├── optimizer.py # Custom optimizers
│ └── scheduler.py # Learning rate scheduling
│
├── tests/
│ ├── __init__.py
│ ├── test_attention.py
│ ├── test_encoder.py
│ └── test_decoder.py
│
├── notebooks/
│ ├── 1_attention_mechanism.ipynb
│ ├── 2_encoder_implementation.ipynb
│ └── 3_decoder_implementation.ipynb
│
├── configs/
│ └── default_config.yaml
│
├── requirements.txt
├── setup.py
├── .gitignore
├── LICENSE
└── README.md
- Scaled dot-product attention
- Multi-head attention
- Position-wise feed-forward networks
- Positional encoding
- Layer normalization
- Residual connections
[Full implementation phases listed in progress tracking...]
Python 3.8+
PyTorch 2.0+
git clone https://github.com/yourusername/transformer-implementation.git
cd transformer-implementation
pip install -r requirements.txt
--
model:
d_model: 512
n_heads: 8
n_layers: 6
d_ff: 2048
dropout: 0.1
max_seq_length: 512
training:
batch_size: 32
learning_rate: 0.0001
warmup_steps: 4000
max_epochs: 100
gradient_clip_val: 1.0
label_smoothing: 0.1
data:
vocab_size: 32000
train_file: "data/train.txt"
val_file: "data/val.txt"
test_file: "data/test.txt"