Welcome to my implementation of Kolmogorov–Arnold Networks (KAN), optimized for Apple Silicon using the MLX framework. This Python package leverages the computational capabilities of Apple's M-series chips to provide an efficient and scalable solution for developing, training, and evaluating KAN models. The package integrates seamlessly with popular datasets such as MNIST and Fashion MNIST, showcasing the versatility of KANs across machine learning tasks.
Kolmogorov–Arnold Networks take a different approach to neural network design: instead of fixed activation functions on the nodes, they place learnable spline-based activation functions on the edges of the network. This implementation follows the research detailed in the Kolmogorov–Arnold Networks paper and is tailored to Apple Silicon through MLX for efficient performance and resource utilization.
Based on the paper [KAN: Kolmogorov–Arnold Networks](https://arxiv.org/abs/2404.19756).
Install the package:

```shell
pip install mlx-kan
```
Example usage in Python:

```python
from mlx_kan.kan import KAN

# Example dimensions for MNIST-sized inputs
in_features, out_features = 28, 28
hidden_dim, num_layers, num_classes = 64, 2, 10

# Initialize and use KAN
kan_model = KAN([in_features * out_features] + [hidden_dim] * (num_layers - 1) + [num_classes])
```
The `ModelArgs` class defines the arguments for configuring the basic KAN model:

```python
class ModelArgs:
    layers_hidden: Optional[List[int]] = None
    model_type: str = "KAN"
    num_layers: int = 2
    in_features: int = 28
    out_features: int = 28
    hidden_dim: int = 64
    num_classes: int = 10
    grid_size: int = 5
    spline_order: int = 3
    scale_noise: float = 0.1
    scale_base: float = 1.0
    scale_spline: float = 1.0
    hidden_act = nn.SiLU
    grid_eps: float = 0.02
    grid_range = [-1, 1]
```
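With these defaults, the layer configuration used elsewhere in this README works out to a flattened 28×28 input, one hidden layer of 64 units, and 10 output classes. As a sketch:

```python
# Sketch: how the default ModelArgs values map to a layer configuration.
# Evaluates to [784, 64, 10] with the defaults above.
layers_hidden = (
    [ModelArgs.in_features * ModelArgs.out_features]
    + [ModelArgs.hidden_dim] * (ModelArgs.num_layers - 1)
    + [ModelArgs.num_classes]
)
```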
The `TrainArgs` class defines the training configuration:

```python
@dataclass
class TrainArgs:
    train_algorithm: str = "simple"
    dataset: str = "custom"
    max_steps: int = 0
    epochs: int = 2
    max_train_batch_size: int = 32
    max_val_batch_size: int = 32
    max_test_batch_size: int = 32
    learning_rate: float = 1e-3
    weight_decay: float = 1e-5
    clip_grad_norm: bool = False
    save_path: str = "./models"
```
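Because `TrainArgs` is a dataclass, you can override any subset of these defaults at construction time, for example:

```python
args = TrainArgs(epochs=5, learning_rate=5e-4, clip_grad_norm=True)
```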
The `SimpleTrainer` function facilitates model training with the specified arguments and datasets:

```python
def SimpleTrainer(
    model: nn.Module,
    args: TrainArgs,
    train_set: Optional[tuple] = None,
    validation_set: Optional[tuple] = None,
    test_set: Optional[tuple] = None,
    validation_interval: Optional[int] = None,
    logging_interval: int = 10
)
```
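As a minimal sketch of how these pieces fit together (the dataset arrays here are placeholders you load yourself, and the import location of `SimpleTrainer` is an assumption to verify against the package layout):

```python
from mlx_kan.kan import KAN

# train_data / train_labels etc. are placeholder arrays you provide yourself.
args = TrainArgs(epochs=2, learning_rate=1e-3)
model = KAN([28 * 28, 64, 10])

SimpleTrainer(
    model,
    args,
    train_set=(train_data, train_labels),
    validation_set=(val_data, val_labels),
    logging_interval=10
)
```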
You can find additional example files in the `mlx-kan/examples` directory to help you get started with various configurations and training setups.
```shell
python -m mlx-kan.quick_scripts.quick_train --help
python -m mlx-kan.quick_scripts.quick_train --num-layers 2 --hidden-dim 64 --num-epochs 2 --batch-size 14 --seed 42 --clip-grad-norm
```
The following classes define different sizes of MLP architectures using `KANLinear` layers. You can import them via:

```python
from mlx_kan.kan.architectures.KANMLP import LlamaKANMLP, SmallKANMLP, MiddleKANMLP, BigKANMLP
```
These classes share the following parameters:

- `in_features`: The number of input features.
- `hidden_dim`: The number of hidden units in each layer.
- `out_features`: The number of output features.
- `grid_size`: The size of the grid used in the `KANLinear` layer. Default is `5`.
- `spline_order`: The order of the spline used in the `KANLinear` layer. Default is `3`.
- `scale_noise`: The noise scaling factor. Default is `0.1`.
- `scale_base`: The base scaling factor. Default is `1.0`.
- `scale_spline`: The spline scaling factor. Default is `1.0`.
- `enable_standalone_scale_spline`: Whether to enable standalone scaling for the spline. Default is `True`.
- `hidden_act`: The activation function used in hidden layers. Default is `nn.SiLU`.
- `grid_eps`: The epsilon value for the grid. Default is `0.02`.
- `grid_range`: The range of the grid. Default is `[-1, 1]`.
The `SmallKANMLP` class consists of two `KANLinear` layers. It is designed for small-scale models.

The `MiddleKANMLP` class consists of three `KANLinear` layers. It is designed for medium-scale models.

The `BigKANMLP` class consists of four `KANLinear` layers. It is designed for large-scale models.

The `LlamaKANMLP` class consists of three `KANLinear` layers configured in the same manner as Llama's MLP block. It is designed for models requiring this layer arrangement.
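For illustration, a sketch of constructing one of these MLPs, assuming the constructor accepts the parameters listed above (verify the exact signature against the class definitions):

```python
from mlx_kan.kan.architectures.KANMLP import SmallKANMLP

# Assumed constructor arguments, based on the shared parameter list above.
mlp = SmallKANMLP(in_features=28 * 28, hidden_dim=64, out_features=10)
```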
If you wish to train these MLPs, you need to update the grid points at the end of each epoch:

```python
# Update grid points here. MLX only computes gradients inside value_and_grad,
# so no no_grad context is needed.
for name, layer in model.__dict__.items():
    if isinstance(layer, KANLinear):
        layer.update_grid(train_set)
```
A complete training function then looks like this:

```python
def train(model, train_set, train_labels, num_epochs=100):
    # Initialize a new optimizer for each model
    optimizer = optim.AdamW(learning_rate=0.0004, weight_decay=0.003)
    loss_and_grad_fn = nn.value_and_grad(model, loss_fn)
    total_loss = 0.0
    for epoch in range(num_epochs):
        # One optimization step on the full training set
        loss, grads = loss_and_grad_fn(model, train_set, train_labels)
        optimizer.update(model, grads)
        mx.eval(model.parameters(), optimizer.state)
        total_loss += loss.item()
        # Update grid points at the end of each epoch
        for name, layer in model.__dict__.items():
            if isinstance(layer, KANLinear):
                layer.update_grid(train_set)
    return total_loss / num_epochs
```
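The function above references a `loss_fn` that is not shown in this snippet; a minimal cross-entropy version, assuming integer class labels, could look like:

```python
import mlx.core as mx
import mlx.nn as nn

def loss_fn(model, X, y):
    # Mean cross-entropy between the model's logits and integer labels
    return mx.mean(nn.losses.cross_entropy(model(X), y))
```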
To run this example, you need to have Python and the necessary libraries installed. Follow these steps to set up your environment:
- Clone the repository:

  ```shell
  git clone https://github.com/Goekdeniz-Guelmez/mlx-kan.git
  cd mlx-kan
  ```

- Install the required packages:

  ```shell
  pip install -r requirements.txt
  ```
You can run the script `main.py` to train the KAN model on the MNIST dataset. The script supports various command-line arguments for configuration.
- `--cpu`: Use the CPU back-end instead of Metal.
- `--clip-grad-norm`: Use gradient clipping to prevent the gradients from becoming too large. Default is `False`.
- `--dataset`: The dataset to use (`mnist` or `fashion_mnist`). Default is `mnist`.
- `--num_layers`: Number of layers in the model. Default is `2`.
- `--in-features`: Number of input features. Default is `28`.
- `--out-features`: Number of output features. Default is `28`.
- `--num-classes`: Number of output classes. Default is `10`.
- `--hidden_dim`: Number of hidden units in each layer. Default is `64`.
- `--num_epochs`: Number of epochs to train. Default is `10`.
- `--batch_size`: Batch size for training. Default is `64`.
- `--learning_rate`: Learning rate for the optimizer. Default is `1e-3`.
- `--weight-decay`: Weight decay for the optimizer. Default is `1e-4`.
- `--eval-report-count`: Number of epochs between validation/test accuracy reports. Default is `10`.
- `--save-path`: Path (including the model name) where the trained KAN model will be saved. Default is `traned_kan_model.safetensors`.
- `--train-batched`: Use batch training instead of a single epoch. Default is `False`.
- `--seed`: Random seed for reproducibility. Default is `0`.
```shell
python -m quick_scripts.quick_train --help
```
Train the KAN model on the MNIST dataset with default settings:

```shell
python -m quick_scripts.quick_train --dataset mnist
```

Train the KAN model with a custom configuration:

```shell
python -m quick_scripts.quick_train --dataset fashion_mnist --num-layers 2 --hidden-dim 64 --num-epochs 2 --batch-size 14 --seed 42 --clip-grad-norm
```

Train the KAN model using the CPU backend:

```shell
python -m quick_scripts.quick_train --cpu --dataset mnist
```
The `KAN` (Kolmogorov–Arnold Networks) class defines the model architecture. The network consists of multiple `KANLinear` layers, each defined by the provided parameters. The number of layers and the hidden dimension size can be configured via command-line arguments.

```python
layers_hidden = [in_features * out_features] + [hidden_dim] * (num_layers - 1) + [num_classes]
model = KAN(layers_hidden)
```
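For example, with the defaults above (`in_features=28`, `out_features=28`, `hidden_dim=64`, `num_layers=2`, `num_classes=10`), this yields `layers_hidden = [784, 64, 10]`: a flattened 28×28 input, one hidden layer of 64 units, and 10 output classes.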
The `KAN` class initializes a sequence of `KANLinear` layers based on the provided hidden-layer configuration. Each layer combines a base linear transformation with learnable spline functions defined over a grid.
```python
class KAN(nn.Module):
    def __init__(self, layers_hidden, grid_size=5, spline_order=3, scale_noise=0.1, scale_base=1.0, scale_spline=1.0, base_activation=nn.SiLU, grid_eps=0.02, grid_range=[-1, 1]):
        super().__init__()
        self.layers = []
        for in_features, out_features in zip(layers_hidden, layers_hidden[1:]):
            self.layers.append(
                KANLinear(
                    in_features, out_features, grid_size, spline_order, scale_noise, scale_base, scale_spline, base_activation, grid_eps, grid_range
                )
            )

    def __call__(self, x, update_grid=False):
        for layer in self.layers:
            if update_grid:
                layer.update_grid(x)
            x = layer(x)
        return x

    def regularization_loss(self, regularize_activation=1.0, regularize_entropy=1.0):
        # Sum the regularization terms across all layers
        return sum(
            layer.regularization_loss(regularize_activation, regularize_entropy)
            for layer in self.layers
        )
```
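To include this regularization term during training, one option is to add it to the task loss inside the loss function. A sketch using the same imports as the `loss_fn` example earlier; the weighting here is an illustrative choice, not a package default:

```python
def loss_fn_with_reg(model, X, y, reg_weight=1e-4):
    # Task loss plus the KAN regularization term; reg_weight is illustrative
    ce = mx.mean(nn.losses.cross_entropy(model(X), y))
    return ce + reg_weight * model.regularization_loss()
```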
Contributions are welcome! If you have any suggestions or improvements, feel free to open an issue or submit a pull request.
- Fork the repository.
- Create a new branch (`git checkout -b feature-branch`).
- Make your changes.
- Commit your changes (`git commit -m 'Add new feature'`).
- Push to the branch (`git push origin feature-branch`).
- Create a new Pull Request.
This project is licensed under the Apache 2.0 License. See the LICENSE file for details.
Made with love by Gökdeniz Gülmez.
The mlx-kan software suite was developed by Gökdeniz Gülmez. If you find mlx-kan useful in your research and wish to cite it, please use the following BibTeX entry:

```bibtex
@software{mlx-kan,
  author = {Gökdeniz Gülmez},
  title = {{mlx-kan}: KAN: Kolmogorov–Arnold Networks in MLX for Apple silicon},
  url = {https://github.com/Goekdeniz-Guelmez/mlx-kan},
  version = {0.2.0},
  year = {2024}
}
```