Releases · ml4ai/skema

17 May 19:34

cl4yton

v1.12.0

fe8f41f

ASKEM_SKEMA_Milestone_12 Latest


ASKEM SKEMA Milestone 12 release.

Code2FN
- Support to automatically ingest library interface (url and multi-file ingestion)
- Added support for dependency generation
  - module_dependencies field to GrometFNModuleCollection
  - module_location script to automatically extract and locate Python dependencies
- Major progress in porting Python AST front-end to tree-sitter.
Text Reading
- Implementation of encoder-based scenario context engine
  - Adapted instruction-tuned T5 model to extract time and location scenario context.
- Improved the sieve grounder with a cross-platform neural model
- Explored use of LLM-derived data augmentation to improve training data quality.
Eqn Reading
- Added support for exporting MathExpressionTree.
- Added support for physics symbols.
ISA
- Completion of ISA workflow endpoint.
MORAE
- Implemented support for "generalized" AMR export.
- Added support for nonlinear differential equation extraction and representation.
MOVIZ
- Added support for navigating "up" the containment/parent hierarchy within the Function Network display.


22 Feb 20:16

cl4yton

v1.11.0

42a0a84

ASKEM_SKEMA_Milestone_11

ASKEM SKEMA Milestone 11 release.

Code2FN
- Ingested all of V3 of the CISM model code base
- Improved Fortran tree-sitter front-end
- In-progress migration of Python AST front-end to tree-sitter
  - Includes support for handling common Python 2 idioms
- Extension of Gromet CAST and FN schema to support Gotos
Text Reading
- Updated the core pipeline to use the updated transformer backend model of the NLP processors, improving runtime and decreasing memory requirements.
- Incorporated sieve-based DKG grounding module
- Implemented transformer-based model of location and scenario context
Eqn Reading
- Numerous math idiom extensions to support physics equations common to climate and space weather.
- Support added for representing and serializing minimal-typed DECAPODEs representation.
ISA
- Implemented support for equation-to-equation alignment
- Started implementation of ISA with MathExpressionTree data structures
- Implemented ISA API endpoint
- Started work on equation and code alignment.
MORAE
- Dynamics linespace identification
- LLM-assisted Code2AMR
- AMR-enrichment – using execution to derive parameter values from expression tree evaluation
- Multiple MORAE API endpoints
MOVIZ
- Added URL-based file launching
- Improved network layout
- Added visual indicator of missing ports when wire exists
- Added framework for reverse reference to JSON FN linking
- Added tooltips for extra information per box
- Integrated metadata display


05 Nov 00:42

cl4yton

v1.10.0

23594f7

ASKEM_SKEMA_Milestone_10

ASKEM SKEMA Milestone 10 release.

Code2FN
- Extracting Climate Models
  - ClimLab progress
  - CISM Halfar model code extraction
- tree-sitter SKEMA front-end framework
- Fortran support
  - added pre-processor to convert fixed-form to free-form
  - added support for compound-conditionals
- Python support
  - started migration of PyAST to tree-sitter front-end
    - ported: assignments, arithmetic operations, function definitions
- Matlab support
  - Using SV2AIR3-Waterloo model as development target for ingestion
  - added support for: assignments, binary operators, conditionals
  - added unit tests
- Comment extraction
  - replaced Rust comment extraction with tree-sitter
  - current support: C, Cpp, Fortran, Python, Matlab, R
- GroMEt Generation
  - refactored Function Call and Primitive Function Call handler
- Execution Framework
  - initial support for Python built-in primitive operators and types
  - track symbols throughout execution, returning history of values
- Infrastructure and bug fixes
  - sync'd metadata and GroMEt schema versions
  - added 'Debug' metadata entry to GroMEt for error logging
  - increased unit test coverage
    - tree-sitter comment extractor, parser build-tool, code2fn endpoints, CAST generation, GroMEt generation, execution engine
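
The Execution Framework item above mentions tracking symbols throughout execution and returning a history of values. A minimal sketch of that idea follows; the class `SymbolHistory` and its method names are hypothetical and are not taken from the SKEMA code base.

```python
class SymbolHistory:
    """Hypothetical store that records every value a symbol takes."""

    def __init__(self):
        self._history = {}

    def assign(self, name, value):
        # Append rather than overwrite, so earlier values are kept.
        self._history.setdefault(name, []).append(value)

    def current(self, name):
        # Most recent value; raises KeyError for an unknown symbol.
        return self._history[name][-1]

    def history(self, name):
        # Return a copy so callers cannot mutate the stored history.
        return list(self._history[name])


symbols = SymbolHistory()
for step in range(3):
    symbols.assign("x", step * 10)
print(symbols.history("x"))  # [0, 10, 20]
```

Keeping an append-only list per symbol makes it straightforward to return the full trace of a variable after a run, at the cost of memory that grows with the number of assignments.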
Text Reading
- Improved grounding, transitioning from static word embeddings to contextualized word embeddings fine-tuned to domain annotations.
  - Integration with DBK grounding annotations for epidemiology and climate.
  - Evaluated performance improvement, before and after fine-tuning
    - DistilBERT: 60.08 (5.85) --> 74.29 (3.07) MMR
    - SPECTER: 59.64 (5.55) --> 73.71 (3.11) MMR
- Extracting relevant NLP annotations including temporal context
  - Integrated with from-scratch re-implementation of Processors
METAL
- Version 2 transformer model for contextualized embedding for linking adapted for climate domain
- Collected code repositories for training and testing
- Generated automated comments using GPT4 (whose quality is considerably higher than GPT3.5)
- Implemented first end-to-end evaluation in two settings:
  - searching only within the file that contains the code snippet
  - searching across large index over the entire corpus
- Conducted ablation study
Eqn Reading
- Improved support for equation image, using data from University of Wisconsin
  - Cleaned UWisc corpus and annotated
  - Improved handling of plain-text within equations
- pMML2AMR pipeline
  - Improved parser
  - added support for subscripts, unicode for Newtonian derivative syntax
  - improved support for AMR (e.g., infix expressions)
  - added support for representing and serializing Decapodes
    - handle Halfar equation
ISA
- Incorporate extractions from Text Reading module to seed alignments
- Code refactoring
MORAE
- Improvements to Code2AMR pipeline
  - bug fixes
  - developed test suite of synthetic data generated by GPT3.5
  - test suite: synthetic test suite, SIDARTHE, CHIME_SIR, SEIRD Hackathon S1, Simple_SIR
MOVIZ
- Added interface to display JSON
- Highlighting between JSON and FN views
- Improvements to FN layout algorithm, scaling to handle boxes with larger numbers of content elements
- Deployed MOVIZ client, which allows local upload of FN JSON files.


06 Aug 23:39

cl4yton

v1.9.0

1198e53

ASKEM_SKEMA_Milestone_9

ASKEM SKEMA Milestone 9 release.

Code2FN
- Python idiom support
  - nested functions (function closures)
  - recursively called functions
- TS2CAST Fortran front-end developments
  - preprocessor (identifies unsupported idioms, identifies missing include files, fixes unsupported & line continuation character)
  - compiler directives using GCC pre-processor
  - derived types (classes/structs) as FN Records
  - representing program, module and "outside" code in FN module namespaces
  - handling Fortran contains
- Initial support for tree-sitter-based MATLAB front-end
- Generalized JSON2GroMEt
- Additional GroMEt ingestion front-end
- source code comment to FN alignment
- bug fixes
TextReading
- unified TA-1 metadata extractions library
- unified TA-1 text reading REST API
- updates to TR and Scenario Context extraction with initial support for climate and earth science domain
- added AMR linking utility to text extractions with scenario contexts; includes support for AMR Petri Net and RegNet
- bug fixes
METAL
- METAL module with version 1 transformer model for contextualized embedding for linking adapted for the epidemiology domain
- development of synthetic epidemiology dataset
- METAL v1 with CodeBert backbone
- METAL v1 with GraphCodeBert backbone
Eqn Reading
- new conversion service and REST API
- improvements to pipeline for generating data for training equation extraction model
- evaluation dataset cleanup
- service structure reorganization
- image2MathML model improvements
- service response time improvements
- MathML inspection and annotation GUIs
- new support for interpretation of presentation MathML to generate content MathML
- improvements to DECAPODES interpretation of dynamics equations
ISA
- improved seed selection for seeded graph matching (SGM) algorithm
- variable name similarity measures
- expanded SGM method in graph matching
MORAE
- improved support for model identification and extraction out of FN
- Eqn2PetriNet produces AMR PetriNet
- Eqn2RegNet produces AMR RegNet
- work on ABM representation
MOVIZ
- updated MOVIZ to support dynamic interaction via point-and-click interactions for expanding and collapsing GroMEt boxes
- new layout mimics hand-drawn OmniGraffle representation of GroMEt FN
- demonstration client supports uploading of arbitrary GroMEt JSON files
- created live demo: https://ml4ai.github.io/moviz-client/#/


03 May 01:16

cl4yton

v1.8.0

1074cce

ASKEM_SKEMA_Milestone_8

ASKEM SKEMA Milestone 8 release. This includes:

Code2FN
- TS2CAST Fortran front-end (tree-sitter based, version 1)
  - Supports ingest of TIE-GCM cpktkm.F and cons.F, producing GrometFNModuleCollection
  - handling continuation lines: '|' and '&'
  - variable declaration and literal value creation
  - single and multiple dimension array declaration, get, set and slice
  - subroutine and function definition and calls
  - primitive operators
  - do loop (Fortran idiom similar to Python for-loop)
  - if, else, else-if support
- Updates to FN Loop and Conditional representation
  - removed explicit Loop/Conditional box wiring
  - fixed handling of for-loop iterator loop condition test
- Handling compound conditions
- Improved support for comprehensions
- Support for functions as first-class objects
  - bookkeeping of symbol table and variable environment: functions, records (classes) and variables
- CAST updates
  - generalization of LiteralValue, removing specific types
  - generalization of operator
  - porting of cast_to_agraph.py
- Progress on GroMEt FN Execution Engine
  - implemented algorithm to walk FN graph in execution order
  - developed v1 execution framework primitive operator set
- API and infrastructure improvements
  - front-end determines language type based on file extension
  - FN diff utility
  - General refactoring and name cleanup
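
The GroMEt FN Execution Engine item above refers to walking the FN graph in execution order. The release notes do not describe the algorithm, so the sketch below swaps in a plain topological sort from the standard library (`graphlib`, Python 3.9+) to illustrate the general idea. The `wires` input format and the function name `execution_order` are assumptions made for this example.

```python
from graphlib import TopologicalSorter


def execution_order(wires):
    """Order boxes so that every box runs after the boxes feeding it.

    `wires` is a hypothetical list of (source_box, target_box) pairs.
    """
    sorter = TopologicalSorter()
    for source, target in wires:
        sorter.add(target, source)  # target depends on source
    return list(sorter.static_order())


# x and y feed "add"; the result of "add" and z feed "mul".
wires = [("x", "add"), ("y", "add"), ("add", "mul"), ("z", "mul")]
order = execution_order(wires)
print(order)
```

Any order returned this way places `add` before `mul`. A cyclic wiring would raise `graphlib.CycleError`, so loops need separate handling in a real execution engine.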
TextReading
- Version 1 of automated code comment linking
- Added additional grounding mechanisms to TR pipeline
  - gazetteer-based grounding and composable grounding pipeline
  - delegate grounding to MIRA's web API
- Added support for additional input formats, in addition to COSMOS
  - plain text
  - grounding through web API
- Created docker file to build a docker image compliant with xDD
- Created (with MIT) library to read and write extractions in canonical JSON format
- Expanded support for initial mention linker to support multi-module GroMEt FN
- Exposed embedding grounding mechanism on TR web service
METAL
- Added space weather ontology support
- METAL module development
  - Data collection
    - generate artificial annotated data using gpt-3.5-turbo
    - extracted 514 repositories from GitHub, keeping only 115 with more than 2 stars
    - extracted function and class definitions, creating 5,887 code fragments
  - Model architecture
    - two independent transformer encoder models initialized with CodeBERT
  - Evaluation Plan
    - token-level and span-level F1-score
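
To make the difference between the two metrics in the evaluation plan concrete, here is a small sketch of span-level versus token-level F1. It assumes spans are `(start, end)` token offsets with an exclusive end, and that span-level scoring requires an exact match. Neither convention is stated in the release notes.

```python
def f1_score(predicted, gold):
    """F1 between two collections of hashable items (tokens or spans)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def span_tokens(spans):
    """Expand (start, end) spans, end exclusive, into covered token indices."""
    return {i for start, end in spans for i in range(start, end)}


gold = [(0, 3), (5, 7)]
predicted = [(0, 3), (5, 6)]  # second span is one token too short

span_f1 = f1_score(predicted, gold)                              # exact-match spans
token_f1 = f1_score(span_tokens(predicted), span_tokens(gold))   # partial credit
print(span_f1, token_f1)
```

In this example the span-level score is 0.5, because only one of the two spans matches exactly, while the token-level score is higher (8/9), because the truncated span still covers one of its two gold tokens.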
Eqn Reading
- Improvements to Image2MathML pipeline
  - improved train/test data generation from arXiv 2014-2018 corpus
  - reprocessed eqn dataset
  - Image2MathML model retrained, improving BLEU score to 0.95
- Translating Space Weather Equations to DECAPODES Wiring Diagrams
ISA
- Improvements to equation conversion, including translation to canonical form
- Improvements to alignment visualization
- REST API development
MORAE
- Improvements to FN-to-PetriNet translation
- Prototype support for edge extractions
MOVIZ
- added optional JSON configuration specification interface to support drawing partially expanded FNs


24 Feb 18:42

cl4yton

v1.7.0

b3d4601

ASKEM_SKEMA_Milestone_7

ASKEM SKEMA Milestone 7 release. This includes:

Code2FN
- Added implicit conditional support for If statements.
- Fixed issue with Binary operations that use the same variable as both operands.
- Various Gromet formatting fixes such as removing extra new lines and consistent pathing format between different systems
- Developed API endpoint and example clients for Code2FN pipeline
- Developed script (single_file_ingester.py) to simplify Gromet generation for single files and code snippets
- Defined set of primitive operators and created framework for primitive execution
- Cleanup and rearranging of visitors in Code2FN Python to CAST and CAST to GroMEt steps.
TextReading
- Integrated version 1 of the scenario context engine into the text reading pipeline, with support for location and temporal contexts. Highlights:
  - Detection of specific dates, times, date ranges, and time ranges
  - Detection of locations with different granularity levels: Abstract locations, countries, states/provinces, cities, organizations
  - An efficient algorithm to associate parameter extractions with the candidate scenario context detections based on the proximity of occurrence
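
The proximity-based association described above can be pictured with the naive sketch below. It is not the pipeline's algorithm, which is described as efficient and is not detailed here; this version simply scans all candidates. The `Mention` type, the use of sentence indices as positions, and the `max_distance` cutoff are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    text: str
    position: int  # assumed: sentence index of the mention in the document


def associate_by_proximity(parameters, contexts, max_distance=2):
    """Pair each parameter with its nearest context mention, if close enough."""
    pairs = {}
    for param in parameters:
        candidates = [
            c for c in contexts
            if abs(c.position - param.position) <= max_distance
        ]
        if candidates:
            nearest = min(candidates, key=lambda c: abs(c.position - param.position))
            pairs[param.text] = nearest.text
    return pairs


parameters = [Mention("beta = 0.3", 4), Mention("gamma = 0.1", 20)]
contexts = [Mention("Arizona", 3), Mention("March 2020", 9)]
print(associate_by_proximity(parameters, contexts))
# {'beta = 0.3': 'Arizona'}
```

The second parameter gets no context because no location or date mention falls within two sentences of it.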
METAL
- Improved the grounding mechanism to first consider the DKG concepts relevant to parameter extractions.
- Extended support of metadata alignment to extractions of collections of documents of arbitrary length.
- Updated the code to support metadata alignment on gromets with multiple modules
- Created a version based on contextualized embeddings (using SciBert, but any other transformer model works as a drop in replacement) to do the metadata alignment.
ISA
- The conversion from MathML to graph representation has been preliminarily completed, including dealing with basic operations, parentheses, and arithmetic priority
- The conversion from graph representation to adjacency matrix is completed.
- The preliminary development of the structural alignment between equation graphs returns the matching ratio and the closest matched term/variable pairs between two equations.
- Based on alignment results, a method is proposed to facilitate the identification of similarities and differences between models presented in two papers, or the investigation of code implementation issues
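
The first two ISA steps above (MathML to graph, then graph to adjacency matrix) can be sketched as follows. This is a simplified illustration, not the ISA implementation: it assumes a tiny content-MathML subset (`<apply>`, `<ci>`, `<cn>`) without namespaces, whereas the actual module also deals with parentheses and arithmetic priority. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def mathml_to_graph(mathml):
    """Convert a tiny content-MathML subset into (labels, edges).

    Edges point from an operator node to each of its operand nodes.
    """
    labels, edges = [], []

    def visit(elem):
        if elem.tag == "apply":
            operator, *operands = list(elem)
            index = len(labels)
            labels.append(operator.tag)
            for operand in operands:
                edges.append((index, visit(operand)))
            return index
        # Leaf node: identifier (<ci>) or number (<cn>).
        labels.append((elem.text or "").strip())
        return len(labels) - 1

    visit(ET.fromstring(mathml))
    return labels, edges


def adjacency_matrix(labels, edges):
    """Build a dense 0/1 adjacency matrix from the edge list."""
    size = len(labels)
    matrix = [[0] * size for _ in range(size)]
    for src, dst in edges:
        matrix[src][dst] = 1
    return matrix


# a + b * c
EXPR = "<apply><plus/><ci>a</ci><apply><times/><ci>b</ci><ci>c</ci></apply></apply>"
labels, edges = mathml_to_graph(EXPR)
print(labels)  # ['plus', 'a', 'times', 'b', 'c']
for row in adjacency_matrix(labels, edges):
    print(row)
```

Two equations converted this way can then be compared by matching their adjacency matrices, which is the role of the structural alignment step.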
MORAE
- Target format of extraction changed from BiLayer to PetriNet in py-acset form
- Developed code to convert GroMEt to a graph database representation, allowing structural and dataflow-related queries to isolate and extract code roles.
- Thin-thread pipeline to TA-2 developed and tested at the Hackathon/Evaluation
- Basic integration into Terarium via REST API
MOVIZ
- Added support for GroMEt class features that were added in last release
- Alterations to layout and visual design to better match hand-layout examples
- Added labels over original hand-layout examples to better support debugging
- Added interface for direct file GroMEt upload
- MOVIZ demonstrated to run locally on non-visualization SKEMA team machines to aid debugging and exploration of GroMEt function networks.
- MOVIZ demonstration on epidemiology model kernels.


18 Feb 23:37

cl4yton

v1.6.0

a1a53d8

ASKEM_SKEMA_Milestone_6

ASKEM SKEMA Milestone 6 release. This includes:

Adding support for implicit conditional checks in while loop conditions.
- Many Python data types may be interpreted as having a Boolean value. For example, if x is a list, then in the code while x:, the condition will evaluate to True if the list has elements, otherwise False. The Python2CAST translation now treats condition expression trees that do not have a Boolean operator at the head as implicitly wrapped in a Python bool() function call.
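
As a rough illustration of this rule (not the Python2CAST code itself), the sketch below uses the standard `ast` module to wrap a `while` condition in `bool()` when its head is not a Boolean operator. The class name `ImplicitBoolWrapper` and the choice of which node types count as Boolean heads are assumptions made for this example. `ast.unparse` requires Python 3.9 or later.

```python
import ast


class ImplicitBoolWrapper(ast.NodeTransformer):
    """Hypothetical helper: wrap non-Boolean `while` tests in bool(...)."""

    # Assumption: these node types count as a Boolean operator at the head.
    BOOLEAN_HEADS = (ast.BoolOp, ast.Compare)

    def visit_While(self, node):
        self.generic_visit(node)  # handle nested while loops first
        test = node.test
        is_not = isinstance(test, ast.UnaryOp) and isinstance(test.op, ast.Not)
        if not isinstance(test, self.BOOLEAN_HEADS) and not is_not:
            node.test = ast.Call(
                func=ast.Name(id="bool", ctx=ast.Load()),
                args=[test],
                keywords=[],
            )
        return node


def wrap_while_conditions(source):
    """Return the source with implicit while-conditions made explicit."""
    tree = ImplicitBoolWrapper().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


print(wrap_while_conditions("while x:\n    x.pop()"))
# while bool(x):
#     x.pop()
```

A condition such as `a < b` already has a comparison at its head and is left unchanged.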
Python permits lazy variable declaration: a new variable identifier may be introduced within any branch of a conditional. These are now handled.
- This particularly caused issues when the introduced variables were used in keyword arguments in function calls within conditionals.
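
The idiom in question looks like the `SOURCE` snippet below, where `y` and `z` first appear inside conditional branches. The accompanying helper is a hypothetical sketch that detects such names with the standard `ast` module. It only looks at top-level `if` statements and simple `name = ...` targets, and it is not how the SKEMA front-end handles the case.

```python
import ast

SOURCE = """
x = 1
if x > 0:
    y = 2
else:
    z = 3
"""


def names_introduced_in_conditionals(source):
    """Return names first assigned inside a top-level if/else branch."""
    introduced, seen = set(), set()

    def assigned_names(stmt):
        # Collect simple assignment targets anywhere inside the statement.
        return {
            target.id
            for node in ast.walk(stmt)
            if isinstance(node, ast.Assign)
            for target in node.targets
            if isinstance(target, ast.Name)
        }

    for stmt in ast.parse(source).body:
        names = assigned_names(stmt)
        if isinstance(stmt, ast.If):
            introduced |= names - seen
        seen |= names
    return introduced


print(sorted(names_introduced_in_conditionals(SOURCE)))  # ['y', 'z']
```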
Adding support for keyword only (kwonly) arguments
- These are arguments in a function definition that come after a * in the arguments list
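
For reference, the example below shows a keyword-only signature and how Python's own `ast` module exposes those arguments through the `kwonlyargs` field of a function's argument list. The function `simulate` and the helper `keyword_only_args` are made up for illustration.

```python
import ast

SOURCE = """
def simulate(model, *, steps=10, seed):
    return model, steps, seed
"""


def keyword_only_args(source):
    """Map each function name to its keyword-only argument names."""
    return {
        node.name: [arg.arg for arg in node.args.kwonlyargs]
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.FunctionDef)
    }


print(keyword_only_args(SOURCE))  # {'simulate': ['steps', 'seed']}
```

Here `steps` and `seed` can only be passed by name, for example `simulate(m, seed=1)`; passing them positionally raises a `TypeError`.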
Improved support of the user module imports.
- Added functionality to better read user modules. This also fixes an issue where user modules couldn't be read in certain instances.


20 Jan 14:54

cl4yton

v1.5.0

4d7a1b9

ASKEM_SKEMA_Milestone_5

ASKEM SKEMA Milestone 5 release. This includes:

- migration of Program Analysis pipeline from AutoMATES to the SKEMA repository
- Record (class/struct) inheritance and calls to super
- identifying Record attribute (field) introduction outside of constructor (init)
- support for general slicing
- primitive support for raise
- initial support for Ellipsis, as used in numpy slicing
- Support to ingest bucky_v2 code base


09 Dec 17:01

cl4yton

v1.0.0

e17a2b4

ASKEM_SKEMA_Dec_2022_Demo

Release of SKEMA for the ASKEM December 2022 Demo.
