Commit

Develop (#1)
* add files

* add new files

* initial re-factor

* fix ruff errors and rename transformer.py to unimolplus.py

* fix all ruff/mypy errors

* add docs and test

* add tests

* add further documentation

* add more tests

* change badges to atomgen path

* update idna

* remove pip-audit

* remove codecov upload temp.

* ignore depr. warnings

* remove einops

* remove coverage run

* remove cov upload
a-kore authored May 4, 2024
1 parent 2352adc commit 277491e
Showing 53 changed files with 9,029 additions and 331 deletions.
8 changes: 4 additions & 4 deletions .github/workflows/code_checks.yml
@@ -42,7 +42,7 @@ jobs:
  source .venv/bin/activate
  poetry install --with test --all-extras
  pre-commit run --all-files
- - name: pip-audit (gh-action-pip-audit)
- uses: pypa/[email protected]
- with:
- virtual-environment: .venv/
+ # - name: pip-audit (gh-action-pip-audit)
+ # uses: pypa/[email protected]
+ # with:
+ # virtual-environment: .venv/
22 changes: 11 additions & 11 deletions .github/workflows/docs_build.yml
@@ -34,14 +34,14 @@ jobs:
  poetry install --with docs,test
  cd docs && rm -rf source/reference/api/_autosummary && make html
  cd .. && coverage run -m pytest -m "not integration_test" && coverage xml && coverage report -m
- - name: Upload coverage to Codecov
- uses: Wandalen/[email protected]
- with:
- action: codecov/[email protected]
- with: |
- token: ${{ secrets.CODECOV_TOKEN }}
- file: ./coverage.xml
- name: codecov-umbrella
- fail_ci_if_error: true
- attempt_limit: 5
- attempt_delay: 30000
+ # - name: Upload coverage to Codecov
+ # uses: Wandalen/[email protected]
+ # with:
+ # action: codecov/[email protected]
+ # with: |
+ # token: ${{ secrets.CODECOV_TOKEN }}
+ # file: ./coverage.xml
+ # name: codecov-umbrella
+ # fail_ci_if_error: true
+ # attempt_limit: 5
+ # attempt_delay: 30000
24 changes: 12 additions & 12 deletions .github/workflows/integration_tests.yml
@@ -47,15 +47,15 @@ jobs:
  poetry env use '3.10'
  source $(poetry env info --path)/bin/activate
  poetry install --with docs,test
- coverage run -m pytest -m integration_test && coverage xml && coverage report -m
- - name: Upload coverage to Codecov
- uses: Wandalen/[email protected]
- with:
- action: codecov/[email protected]
- with: |
- token: ${{ secrets.CODECOV_TOKEN }}
- file: ./coverage.xml
- name: codecov-umbrella
- fail_ci_if_error: true
- attempt_limit: 5
- attempt_delay: 30000
+ # coverage run -m pytest -m integration_test && coverage xml && coverage report -m
+ # - name: Upload coverage to Codecov
+ # uses: Wandalen/[email protected]
+ # with:
+ # action: codecov/[email protected]
+ # with: |
+ # token: ${{ secrets.CODECOV_TOKEN }}
+ # file: ./coverage.xml
+ # name: codecov-umbrella
+ # fail_ci_if_error: true
+ # attempt_limit: 5
+ # attempt_delay: 30000
4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
@@ -37,7 +37,7 @@ repos:
  rev: v1.19.0
  hooks:
  - id: typos
- args: []
+ args: [--force-exclude]

  - repo: https://github.com/nbQA-dev/nbQA
  rev: 1.7.1
@@ -50,7 +50,7 @@ repos:
  - id: doctest
  name: doctest
  entry: python3 -m doctest -o NORMALIZE_WHITESPACE
- files: "^aieng_template/"
+ files: "^atomgen/"
  language: system

  - repo: local
68 changes: 60 additions & 8 deletions README.md
@@ -1,14 +1,66 @@
- # AI Engineering template

+ ![atomgen Logo](https://github.com/VectorInstitute/atomgen/blob/main/docs/source/_static/atomgen_logo_text.png?raw=true)
  ----------------------------------------------------------------------------------------

- [![code checks](https://github.com/VectorInstitute/aieng-template/actions/workflows/code_checks.yml/badge.svg)](https://github.com/VectorInstitute/aieng-template/actions/workflows/code_checks.yml)
- [![integration tests](https://github.com/VectorInstitute/aieng-template/actions/workflows/integration_tests.yml/badge.svg)](https://github.com/VectorInstitute/aieng-template/actions/workflows/integration_tests.yml)
- [![docs](https://github.com/VectorInstitute/aieng-template/actions/workflows/docs_deploy.yml/badge.svg)](https://github.com/VectorInstitute/aieng-template/actions/workflows/docs_deploy.yml)
- [![codecov](https://codecov.io/gh/VectorInstitute/aieng-template/branch/main/graph/badge.svg)](https://codecov.io/gh/VectorInstitute/aieng-template)
- [![license](https://img.shields.io/github/license/VectorInstitute/aieng-template.svg)](https://github.com/VectorInstitute/aieng-template/blob/main/LICENSE)
+ [![code checks](https://github.com/VectorInstitute/atomgen/actions/workflows/code_checks.yml/badge.svg)](https://github.com/VectorInstitute/atomgen/actions/workflows/code_checks.yml)
+ [![integration tests](https://github.com/VectorInstitute/atomgen/actions/workflows/integration_tests.yml/badge.svg)](https://github.com/VectorInstitute/atomgen/actions/workflows/integration_tests.yml)
+ [![docs](https://github.com/VectorInstitute/atomgen/actions/workflows/docs_deploy.yml/badge.svg)](https://github.com/VectorInstitute/atomgen/actions/workflows/docs_deploy.yml)
+ [![codecov](https://codecov.io/gh/VectorInstitute/atomgen/branch/main/graph/badge.svg)](https://codecov.io/gh/VectorInstitute/atomgen)
+ [![license](https://img.shields.io/github/license/VectorInstitute/atomgen.svg)](https://github.com/VectorInstitute/atomgen/blob/main/LICENSE)

+ ## Table of Contents

+ - [Introduction](#introduction)
+ - [Datasets](#datasets)
+ - [Models](#models)
+ - [Tasks](#tasks)
+ - [Installation](#installation)
+ - [Developing](#developing)

+ ## Introduction

+ AtomGen provides a robust framework for handling atomistic graph datasets, with a focus on transformer-based implementations. It offers utilities for training various models and experimenting with different pre-training tasks, along with pre-trained models.

+ It streamlines the aggregation, standardization, and use of datasets from diverse sources, enabling large-scale pre-training and generative modeling on atomistic graphs.

+ ## Datasets

+ AtomGen facilitates the aggregation and standardization of datasets, including but not limited to:

+ - **S2EF Datasets**: Structure-to-energy-and-forces (S2EF) data aggregated from multiple sources such as OC20, OC22, ODAC23, MPtrj, and SPICE, with structures and energies/forces for pre-training.

+ - **Misc. Atomistic Graph Datasets**: Including Molecule3D, Protein Data Bank (PDB), and the Open Quantum Materials Database (OQMD).

+ Currently, AtomGen has pre-processed datasets for the S2EF pre-training task: one for OC20 and one mixed dataset combining OC20, OC22, ODAC23, MPtrj, and SPICE. Both have been uploaded to the Hugging Face Hub and can be accessed using the `datasets` API.

+ ## Models

+ AtomGen supports a variety of models for training on atomistic graph datasets, including:

+ - SchNet
+ - TokenGT
+ - Uni-Mol+ (Modified)

+ ## Tasks

+ AtomGen facilitates experimentation with pre-training tasks, including:

+ - **Structure to Energy & Forces**: Predicting energies and forces for atomistic graphs.

+ - **Masked Atom Modeling**: Masking atoms and predicting their properties.

+ - **Coordinate Denoising**: Denoising atom coordinates.

+ These tasks are all handled by the `DataCollatorForAtomModeling` class and can be used together or individually.
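
To make the masked atom modeling and coordinate denoising tasks concrete, here is a minimal standard-library sketch of how such training inputs could be built. It is not the `DataCollatorForAtomModeling` implementation: the function names, the `MASK_TOKEN` id, and the `-100` ignore label (a common convention in PyTorch-style losses) are all assumptions made for illustration.

```python
import random

MASK_TOKEN = 0       # hypothetical id standing in for a masked atom
IGNORE_INDEX = -100  # assumed label for positions that contribute no loss


def mask_atoms(atomic_numbers, mask_prob, rng):
    """Replace each atom with MASK_TOKEN with probability mask_prob.

    Returns (inputs, labels): labels hold the original atomic number at
    masked positions and IGNORE_INDEX everywhere else.
    """
    inputs, labels = [], []
    for z in atomic_numbers:
        if rng.random() < mask_prob:
            inputs.append(MASK_TOKEN)
            labels.append(z)
        else:
            inputs.append(z)
            labels.append(IGNORE_INDEX)
    return inputs, labels


def noise_coords(coords, sigma, rng):
    """Add independent Gaussian noise to every coordinate component.

    Returns (noisy_coords, noise); a denoising model would be trained to
    recover either the noise or the clean coordinates.
    """
    noise = [[rng.gauss(0.0, sigma) for _ in xyz] for xyz in coords]
    noisy = [[c + n for c, n in zip(xyz, d)] for xyz, d in zip(coords, noise)]
    return noisy, noise


# Example: a water-like molecule (O, H, H) with approximate coordinates.
rng = random.Random(0)
atomic_numbers = [8, 1, 1]
coords = [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]
masked_inputs, mask_labels = mask_atoms(atomic_numbers, mask_prob=0.15, rng=rng)
noisy_coords, noise = noise_coords(coords, sigma=0.1, rng=rng)
```

Because the two transformations are independent, they can be applied to the same sample together or separately, which mirrors the "together or individually" usage described above.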

+ ## Installation

+ The package can be installed using Poetry:

+ ```bash
+ python3 -m poetry install
+ source $(poetry env info --path)/bin/activate
+ ```

- A template repo for AI Engineering projects (using ``python``)

  ## 🧑🏿‍💻 Developing

2 changes: 2 additions & 0 deletions _typos.toml
@@ -0,0 +1,2 @@
+ [files]
+ extend-exclude = ["*.json", "**/*.json"]
25 changes: 0 additions & 25 deletions aieng_template/bar.py

This file was deleted.

25 changes: 0 additions & 25 deletions aieng_template/foo.py

This file was deleted.

7 changes: 7 additions & 0 deletions atomgen/data/__init__.py
@@ -0,0 +1,7 @@
+ """
+ Data module for the AtomGen library.
+ This module contains the data classes and functions for
+ pre-processing and collating data for training/inference.
+ """
