Transformer Implementation from Scratch

Build and understand Transformer architecture through hands-on implementation.

🎯 Project Overview

This repository contains a step-by-step implementation of the Transformer architecture described in the paper "Attention Is All You Need" (Vaswani et al., 2017). The goal is to understand each component deeply by implementing it from scratch.

📂 Repository Structure

transformer/
│
├── src/
│   ├── models/
│   │   ├── __init__.py
│   │   ├── attention.py        # Attention mechanisms
│   │   ├── encoder.py          # Transformer encoder
│   │   ├── decoder.py          # Transformer decoder
│   │   ├── embeddings.py       # Token and positional embeddings
│   │   ├── feed_forward.py     # Position-wise feed-forward networks
│   │   └── transformer.py      # Full transformer model
│   │
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── config.py           # Model configuration
│   │   ├── data_loader.py      # Data loading utilities
│   │   ├── metrics.py          # Evaluation metrics
│   │   └── visualization.py    # Attention visualization tools
│   │
│   └── training/
│       ├── __init__.py
│       ├── trainer.py          # Training loop
│       ├── optimizer.py        # Custom optimizers
│       └── scheduler.py        # Learning rate scheduling
│
├── tests/
│   ├── __init__.py
│   ├── test_attention.py
│   ├── test_encoder.py
│   └── test_decoder.py
│
├── notebooks/
│   ├── 1_attention_mechanism.ipynb
│   ├── 2_encoder_implementation.ipynb
│   └── 3_decoder_implementation.ipynb
│
├── configs/
│   └── default_config.yaml
│
├── requirements.txt
├── setup.py
├── .gitignore
├── LICENSE
└── README.md

📚 Implementation Phases

Phase 1: Core Components

  • Scaled dot-product attention (see the sketch after this list)
  • Multi-head attention
  • Position-wise feed-forward networks
  • Positional encoding
  • Layer normalization
  • Residual connections
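
As a concrete starting point for the attention items above, here is a minimal sketch of scaled dot-product attention in PyTorch. It only illustrates the formula from the paper; the actual implementation lives in src/models/attention.py, and the tensor shapes and mask convention below are assumptions.

import math
import torch

def scaled_dot_product_attention(query, key, value, mask=None):
    # query, key, value: (batch, heads, seq_len, d_k) -- assumed layout
    d_k = query.size(-1)
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        # Positions where mask == 0 are excluded from the softmax
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return torch.matmul(weights, value), weights

Multi-head attention then applies this function to several learned projections of the inputs in parallel and concatenates the results.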

[Full implementation phases listed in progress tracking...]

🚀 Getting Started

Prerequisites

Python 3.8+
PyTorch 2.0+

Installation

git clone https://github.com/james92kj/Attention-Is-All-You-Need.git
cd Attention-Is-All-You-Need
pip install -r requirements.txt
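
To verify the setup, the unit tests under tests/ can be run with pytest (assuming pytest is available; it may need to be installed separately if it is not listed in requirements.txt):

pip install pytest   # only if not already provided by requirements.txt
pytest tests/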

📖 Learning Resources


configs/default_config.yaml

model:
  d_model: 512
  n_heads: 8
  n_layers: 6
  d_ff: 2048
  dropout: 0.1
  max_seq_length: 512

training:
  batch_size: 32
  learning_rate: 0.0001
  warmup_steps: 4000
  max_epochs: 100
  gradient_clip_val: 1.0
  label_smoothing: 0.1

data:
  vocab_size: 32000
  train_file: "data/train.txt"
  val_file: "data/val.txt"
  test_file: "data/test.txt"
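
A minimal sketch of how this configuration might be consumed, assuming PyYAML is available; the repository's own loading logic lives in src/utils/config.py and may differ. The learning-rate function reproduces the warmup schedule from "Attention Is All You Need" that the warmup_steps value corresponds to, not necessarily what src/training/scheduler.py implements.

import yaml

# Load the default configuration (path assumed relative to the repository root)
with open("configs/default_config.yaml") as f:
    config = yaml.safe_load(f)

d_model = config["model"]["d_model"]               # 512
warmup_steps = config["training"]["warmup_steps"]  # 4000

def transformer_lr(step: int) -> float:
    # lr = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)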
