Develop (#1)

* add files
* add new files
* initial re-factor
* fix ruff errors and rename transformer.py to unimolplus.py
* fix all ruff/mypy errors
* add docs and test
* add tests
* add further documentation
* add more tests
* change badges to atomgen path
* update idna
* remove pip-audit
* remove codecov upload temp.
* ignore depr. warnings
* remove einops
* remove coverage run
* remove cov upload
VectorInstitute · May 4, 2024 · 277491e
1 parent 2352adc
commit 277491e
Showing 53 changed files with 9,029 additions and 331 deletions.
diff --git a/.github/workflows/code_checks.yml b/.github/workflows/code_checks.yml
@@ -42,7 +42,7 @@ jobs:
           source .venv/bin/activate
           poetry install --with test --all-extras
           pre-commit run --all-files
-      - name: pip-audit (gh-action-pip-audit)
-        uses: pypa/[email protected]
-        with:
-          virtual-environment: .venv/
+      # - name: pip-audit (gh-action-pip-audit)
+      #   uses: pypa/[email protected]
+      #   with:
+      #     virtual-environment: .venv/
diff --git a/.github/workflows/docs_build.yml b/.github/workflows/docs_build.yml
@@ -34,14 +34,14 @@ jobs:
           poetry install --with docs,test
           cd docs && rm -rf source/reference/api/_autosummary && make html
           cd .. && coverage run -m pytest -m "not integration_test" && coverage xml && coverage report -m
-      - name: Upload coverage to Codecov
-        uses: Wandalen/[email protected]
-        with:
-          action: codecov/[email protected]
-          with: |
-            token: ${{ secrets.CODECOV_TOKEN }}
-            file: ./coverage.xml
-            name: codecov-umbrella
-            fail_ci_if_error: true
-          attempt_limit: 5
-          attempt_delay: 30000
+      # - name: Upload coverage to Codecov
+      #   uses: Wandalen/[email protected]
+      #   with:
+      #     action: codecov/[email protected]
+      #     with: |
+      #       token: ${{ secrets.CODECOV_TOKEN }}
+      #       file: ./coverage.xml
+      #       name: codecov-umbrella
+      #       fail_ci_if_error: true
+      #     attempt_limit: 5
+      #     attempt_delay: 30000
diff --git a/.github/workflows/integration_tests.yml b/.github/workflows/integration_tests.yml
@@ -47,15 +47,15 @@ jobs:
           poetry env use '3.10'
           source $(poetry env info --path)/bin/activate
           poetry install --with docs,test
-          coverage run -m pytest -m integration_test && coverage xml && coverage report -m
-      - name: Upload coverage to Codecov
-        uses: Wandalen/[email protected]
-        with:
-          action: codecov/[email protected]
-          with: |
-            token: ${{ secrets.CODECOV_TOKEN }}
-            file: ./coverage.xml
-            name: codecov-umbrella
-            fail_ci_if_error: true
-          attempt_limit: 5
-          attempt_delay: 30000
+          # coverage run -m pytest -m integration_test && coverage xml && coverage report -m
+      # - name: Upload coverage to Codecov
+      #   uses: Wandalen/[email protected]
+      #   with:
+      #     action: codecov/[email protected]
+      #     with: |
+      #       token: ${{ secrets.CODECOV_TOKEN }}
+      #       file: ./coverage.xml
+      #       name: codecov-umbrella
+      #       fail_ci_if_error: true
+      #     attempt_limit: 5
+      #     attempt_delay: 30000
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -37,7 +37,7 @@ repos:
     rev: v1.19.0
     hooks:
       - id: typos
-        args: []
+        args: [--force-exclude]
 
   - repo: https://github.com/nbQA-dev/nbQA
     rev: 1.7.1
@@ -50,7 +50,7 @@ repos:
     - id: doctest
       name: doctest
       entry: python3 -m doctest -o NORMALIZE_WHITESPACE
-      files: "^aieng_template/"
+      files: "^atomgen/"
       language: system
 
   - repo: local

diff --git a/README.md b/README.md
@@ -1,14 +1,66 @@
-# AI Engineering template
-
+![atomgen Logo](https://github.com/VectorInstitute/atomgen/blob/main/docs/source/_static/atomgen_logo_text.png?raw=true)
 ----------------------------------------------------------------------------------------
 
-[![code checks](https://github.com/VectorInstitute/aieng-template/actions/workflows/code_checks.yml/badge.svg)](https://github.com/VectorInstitute/aieng-template/actions/workflows/code_checks.yml)
-[![integration tests](https://github.com/VectorInstitute/aieng-template/actions/workflows/integration_tests.yml/badge.svg)](https://github.com/VectorInstitute/aieng-template/actions/workflows/integration_tests.yml)
-[![docs](https://github.com/VectorInstitute/aieng-template/actions/workflows/docs_deploy.yml/badge.svg)](https://github.com/VectorInstitute/aieng-template/actions/workflows/docs_deploy.yml)
-[![codecov](https://codecov.io/gh/VectorInstitute/aieng-template/branch/main/graph/badge.svg)](https://codecov.io/gh/VectorInstitute/aieng-template)
-[![license](https://img.shields.io/github/license/VectorInstitute/aieng-template.svg)](https://github.com/VectorInstitute/aieng-template/blob/main/LICENSE)
+[![code checks](https://github.com/VectorInstitute/atomgen/actions/workflows/code_checks.yml/badge.svg)](https://github.com/VectorInstitute/atomgen/actions/workflows/code_checks.yml)
+[![integration tests](https://github.com/VectorInstitute/atomgen/actions/workflows/integration_tests.yml/badge.svg)](https://github.com/VectorInstitute/atomgen/actions/workflows/integration_tests.yml)
+[![docs](https://github.com/VectorInstitute/atomgen/actions/workflows/docs_deploy.yml/badge.svg)](https://github.com/VectorInstitute/atomgen/actions/workflows/docs_deploy.yml)
+[![codecov](https://codecov.io/gh/VectorInstitute/atomgen/branch/main/graph/badge.svg)](https://codecov.io/gh/VectorInstitute/atomgen)
+[![license](https://img.shields.io/github/license/VectorInstitute/atomgen.svg)](https://github.com/VectorInstitute/atomgen/blob/main/LICENSE)
+
+## Table of Contents
+
+- [Overview](#overview)
+- [Datasets](#datasets)
+- [Models](#models)
+- [Tasks](#tasks)
+- [Installation](#installation)
+- [Developing](#developing)
+
+## Introduction
+
+AtomGen provides a robust framework for handling atomistic graph datasets focusing on transformer-based implementations. We provide utilities for training various models, experimenting with different pre-training tasks, and pre-trained models.
+
+It streamlines the process of aggregation, standardization, and utilization of datasets from diverse sources, enabling large-scale pre-training and generative modeling on atomistic graphs.
+
+## Datasets
+
+AtomGen facilitates the aggregation and standardization of datasets, including but not limited to:
+
+  - **S2EF Datasets**: Aggregated from multiple sources such as OC20, OC22, ODAC23, MPtrj, and SPICE with structures and energies/forces for pre-training.
+
+  - **Misc. Atomistic Graph Datasets**: Including Molecule3D, Protein Data Bank (PDB), and the Open Quantum Materials Database (OQMD).
+
+Currently, AtomGen has pre-processed datasets for the S2EF pre-training task for OC20 and a mixed dataset of OC20, OC22, ODAC23, MPtrj, and SPICE. They have been uploaded to huggingface hub and can be accessed using the datasets API.
+
+## Models
+
+AtomGen supports a variety of models for training on atomistic graph datasets, including:
+
+  - SchNet
+  - TokenGT
+  - Uni-Mol+ (Modified)
+
+## Tasks
+
+Experimentation with pre-training tasks is facilitated through AtomGen, including:
+
+  - **Structure to Energy & Forces**: Predicting energies and forces for atomistic graphs.
+
+  - **Masked Atom Modeling**: Masking atoms and predicting their properties.
+
+  - **Coordinate Denoising**: Denoising atom coordinates.
+
+These tasks are all facilitated through the DataCollatorForAtomModeling class and can be used simultaneously or individually.
+
+## Installation
+
+The package can be installed using poetry:
+
+```bash
+python3 -m poetry install
+source $(poetry env info --path)/bin/activate
+```
 
-A template repo for AI Engineering projects (using ``python``)
 
 ## 🧑🏿‍💻 Developing
 

diff --git a/_typos.toml b/_typos.toml
@@ -0,0 +1,2 @@
+[files]
+extend-exclude = ["*.json", "**/*.json"]
diff --git a/aieng_template/bar.py b/aieng_template/bar.py
diff --git a/aieng_template/foo.py b/aieng_template/foo.py
diff --git a/atomgen/data/__init__.py b/atomgen/data/__init__.py
@@ -0,0 +1,7 @@
+"""
+Data module for the AtomGen library.
+
+This module contains the data classes and functions for
+pre-processing and collating data for training/inference.
+
+"""