Releases: ml4ai/skema

ASKEM_SKEMA_Milestone_12

17 May 19:34
fe8f41f

ASKEM SKEMA Milestone 12 release.

  • Code2FN
    • Support for automatically ingesting library interfaces (URL and multi-file ingestion)
    • Added support for dependency generation
      • module_dependencies field to GrometFNModuleCollection
      • module_location script to automatically extract and locate Python dependencies
    • Major progress in porting Python AST front-end to tree-sitter.
  • Text Reading
    • Implementation of encoder-based scenario context engine
      • Adapted instruction-tuned T5 model to extract time and location scenario context.
    • Improved the sieve grounder with a cross-platform neural model
    • Explored use of LLM-derived data augmentation to improve training data quality.
  • Eqn Reading
    • Added support for exporting MathExpressionTree.
    • Added support for physics symbols.
  • ISA
    • Completion of ISA workflow endpoint.
  • MORAE
    • Implemented support for exporting "generalized" AMR.
    • Added support for nonlinear differential equation extraction and representation.
  • MOVIZ
    • Added support for navigating "up" the containment/parent hierarchy within the Function Network display.

ASKEM_SKEMA_Milestone_11

22 Feb 20:16
42a0a84

ASKEM SKEMA Milestone 11 release.

  • Code2FN
    • Ingest all of V3 of the CISM model code base
    • Improved Fortran tree-sitter front-end
    • In progress migration of Python AST front-end to tree-sitter
      • Includes support for handling common Python 2 idioms
    • Extension of Gromet CAST and FN schema to support Gotos
  • Text Reading
    • Updated the core pipeline to use the updated transformer backend model of the NLP Processors library, improving runtime and decreasing memory requirements.
    • Incorporated sieve-based DKG grounding module
    • Implemented transformer-based model of location and scenario context
  • Eqn Reading
    • Numerous math idiom extensions to support physics equations common to climate and space weather.
    • Support added for representing and serializing minimal-typed DECAPODEs representation.
  • ISA
    • Implemented support for equation-to-equation alignment
    • Started implementation of ISA with MathExpressionTree data structures
    • Implemented ISA API endpoint
    • Started work on equation and code alignment.
  • MORAE
    • Dynamics linespace identification
    • LLM-assisted Code2AMR
    • AMR-enrichment – using execution to derive parameter values from expression tree evaluation
    • Multiple MORAE API endpoints
  • MOVIZ
    • Added URL-based file launching
    • Improved network layout
    • Added visual indicator of missing ports when wire exists
    • Added framework for reverse reference to JSON FN linking
    • Added tooltips for extra information per box
    • Integrated metadata display

ASKEM_SKEMA_Milestone_10

05 Nov 00:42
23594f7

ASKEM SKEMA Milestone 10 release.

  • Code2FN

    • Extracting Climate Models
      • ClimLab progress
      • CISM Halfar model code extraction
    • tree-sitter SKEMA front-end framework
    • Fortran support
      • added pre-processor to convert fixed-form to free-form
      • added support for compound-conditionals
    • Python support
      • started migration of PyAST to tree-sitter front-end
        • ported: assignments, arithmetic operations, function definitions
    • Matlab support
      • Using SV2AIR3-Waterloo model as development target for ingestion
      • added support for: assignments, binary operators, conditionals
      • added unit tests
    • Comment extraction
      • replaced Rust comment extraction with tree-sitter
      • current support: C, Cpp, Fortran, Python, Matlab, R
    • GroMEt Generation
      • refactored Function Call and Primitive Function Call handler
    • Execution Framework
      • initial support for Python built-in primitive operators and types
      • track symbols throughout execution, returning history of values
    • Infrastructure and bug fixes
      • sync'd metadata and GroMEt schema versions
      • added 'Debug' metadata entry to GroMEt for error logging
      • increased unit test coverage
        • tree-sitter comment extractor, parser build-tool, code2fn endpoints, CAST generation, GroMEt generation, execution engine
  • Text Reading

    • Improved grounding, transitioning from static word embeddings to contextualized word embeddings fine-tuned to domain annotations.
      • Integration with DBK grounding annotations for epidemiology and climate.
      • Evaluated performance improvement, before and after fine-tuning
        • DistilBERT: 60.08 (5.85) --> 74.29 (3.07) MMR
        • SPECTER: 59.64 (5.55) --> 73.71 (3.11) MMR
    • Extracting relevant NLP annotations including temporal context
      • Integrated with from-scratch re-implementation of Processors
  • METAL

    • Version 2 transformer model for contextualized embedding for linking adapted for climate domain
    • Collected code repositories for training and testing
    • Generated automated comments using GPT4 (whose quality is considerably higher than GPT3.5)
    • Implemented first end-to-end evaluation in two settings:
      • searching only within the file that contains the code snippet
      • searching across large index over the entire corpus
    • Conducted ablation study
  • Eqn Reading

    • Improved support for equation images, using data from University of Wisconsin
      • Cleaned UWisc corpus and annotated
      • Improved handling of plain-text within equations
    • pMML2AMR pipeline
      • Improved parser
      • added support for subscripts, unicode for Newtonian derivative syntax
      • improved support for AMR (e.g., infix expressions)
      • added support for representing and serializing Decapodes
        • handle Halfar equation
  • ISA

    • Incorporate extractions from Text Reading module to seed alignments
    • Code refactoring
  • MORAE

    • Improvements to Code2AMR pipeline
      • bug fixes
      • developed test suite of synthetic data generated by GPT3.5
      • test suite: synthetic test suite, SIDARTHE, CHIME_SIR, SEIRD Hackathon S1, Simple_SIR
  • MOVIZ

    • Added interface to display JSON
    • Highlighting between JSON and FN views
    • Improvements to FN layout algorithm, scaling to handle boxes with larger numbers of content elements
    • Deployed MOVIZ client, allows local upload of FN JSON files.
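
The Execution Framework items above (primitive operators, and tracking symbols to return a history of values) can be illustrated with a minimal sketch. It is not the SKEMA execution engine: the `run` function, the `PRIMITIVES` table, and the `(target, operator, arguments)` program format are all hypothetical, chosen only to show the idea of recording every value a symbol takes.

```python
import operator
from collections import defaultdict

# Hypothetical primitive operator table, backed by Python built-ins.
PRIMITIVES = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def run(program, initial):
    """Execute (target, op, args) steps and return each symbol's value history.

    String arguments are treated as symbol names and resolved to their
    most recent value; any other argument is used as a literal.
    """
    history = defaultdict(list)
    for name, value in initial.items():
        history[name].append(value)
    for target, op, args in program:
        values = [history[a][-1] if isinstance(a, str) else a for a in args]
        history[target].append(PRIMITIVES[op](*values))
    return dict(history)


program = [
    ("x", "add", ("x", 1)),    # x = x + 1
    ("y", "mul", ("x", 2)),    # y = x * 2
    ("x", "add", ("x", "y")),  # x = x + y
]
result = run(program, {"x": 1})
```

Here `result["x"]` holds every value assigned to `x` in order, rather than only the final one.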

ASKEM_SKEMA_Milestone_9

06 Aug 23:39
1198e53

ASKEM SKEMA Milestone 9 release.

  • Code2FN

    • Python idiom support
      • nested functions (function closures)
      • recursively called functions
    • TS2CAST Fortran front-end developments
      • preprocessor (identifies unsupported idioms, identifies missing include files, fixes unsupported '&' line-continuation characters)
      • compiler directives using GCC pre-processor
      • derived types (classes/structs) as FN Records
      • representing program, module and "outside" code in FN module namespaces
      • handling Fortran contains
    • Initial support for tree-sitter-based MATLAB front-end
    • Generalized JSON2GroMEt
    • Additional GroMEt ingestion front-end
    • source code comment to FN alignment
    • bug fixes
  • TextReading

    • unified TA-1 metadata extractions library
    • unified TA-1 text reading REST API
    • updates to TR and Scenario Context extraction with initial support for climate and earth science domain
    • added AMR linking utility to text extractions with scenario contexts; includes support for AMR Petri Net and RegNet
    • bug fixes
  • METAL

    • METAL module with version 1 transformer model for contextualized embedding for linking adapted for the epidemiology domain
    • development of synthetic epidemiology dataset
    • METAL v1 with CodeBert backbone
    • METAL v1 with GraphCodeBert backbone
  • Eqn Reading

    • new conversion service and REST API
    • improvements to pipeline for generating data for training equation extraction model
    • evaluation dataset cleanup
    • service structure reorganization
    • image2MathML model improvements
    • service response time improvements
    • MathML inspection and annotation GUIs
    • new support for interpretation of presentation MathML to generate content MathML
    • improvements to DECAPODES interpretation of dynamics equations
  • ISA

    • improved seed selection for seeded graph matching (SGM) algorithm
    • variable name similarity measures
    • expanded SGM method in graph matching
  • MORAE

    • improved support for model identification and extraction out of FN
    • Eqn2PetriNet produces AMR PetriNet
    • Eqn2RegNet produces AMR RegNet
    • work on ABM representation
  • MOVIZ

    • updated MOVIZ to support dynamic interaction via point-and-click interactions for expanding and collapsing GroMEt boxes
    • new layout mimics hand-drawn OmniGraffle representation of GroMEt FN
    • demonstration client supports uploading of arbitrary GroMEt JSON files
    • created live demo: https://ml4ai.github.io/moviz-client/#/

ASKEM_SKEMA_Milestone_8

03 May 01:16
1074cce

ASKEM SKEMA Milestone 8 release. This includes:

  • Code2FN
    • TS2CAST Fortran front-end (tree-sitter based, version 1)
      • Supports ingest of TIE-GCM cpktkm.F and cons.F, producing GrometFNModuleCollection
      • handling continuation lines: '|' and '&'
      • variable declaration and literal value creation
      • single and multiple dimension array declaration, get, set and slice
      • subroutine and function definition and calls
      • primitive operators
      • do loop (Fortran idiom similar to Python for-loop)
      • if, else, else-if support
    • Updates to FN Loop and Conditional representation
      • removed explicit Loop/Conditional box wiring
      • fixed handling of for-loop iterator loop condition test
    • Handling compound conditions
    • Improved support for comprehensions
    • Support for functions as first-class objects
      • bookkeeping of symbol table and variable environment: functions, records (classes) and variables
    • CAST updates
      • generalization of LiteralValue, removing specific types
      • generalization of operator
      • porting of cast_to_agraph.py
    • Progress on GroMEt FN Execution Engine
      • implemented algorithm to walk FN graph in execution order
      • developed v1 execution framework primitive operator set
    • API and infrastructure improvements
      • front-end determines language type based on file extension
      • FN diff utility
      • General refactoring and name cleanup
  • TextReading
    • Version 1 of automated code comment linking
    • Added additional grounding mechanisms to TR pipeline
      • gazetteer-based grounding and composable grounding pipeline
      • delegate grounding to MIRA's web API
    • Added support for additional input formats, in addition to COSMOS
      • plain text
      • grounding through web API
    • Created docker file to build a docker image compliant with xDD
    • Created (with MIT) library to read and write extractions in canonical JSON format
    • Expanded support for initial mention linker to support multi-module GroMEt FN
    • Exposed embedding grounding mechanism on TR web service
  • METAL
    • Added space weather ontology support
    • METAL module development
      • Data collection
        • generate artificial annotated data using gpt-3.5-turbo
        • extracted 514 repositories from GitHub, keeping only 115 with more than 2 stars
        • extracted function and class definitions, creating 5,887 code fragments
      • Model architecture
        • two independent transformer encoder models initialized with CodeBERT
      • Evaluation Plan
        • token-level and span-level F1-score
  • Eqn Reading
    • Improvements to Image2MathML pipeline
      • improved train/test data generation from arXiv 2014-2018 corpus
      • reprocessed eqn dataset
      • Image2MathML model retrained, improving BLEU score to 0.95
    • Translating Space Weather Equations to DECAPODES Wiring Diagrams
  • ISA
    • Improvements to equation conversion, including translation to canonical form
    • Improvements to alignment visualization
    • REST API development
  • MORAE
    • Improvements to FN-to-PetriNet translation
    • Prototype support for edge extractions
  • MOVIZ
    • added optional JSON configuration specification interface to support drawing partially expanded FNs
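
As a rough illustration of the "walk FN graph in execution order" item above, one common way to order a dataflow graph is a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9+). It assumes a simplified graph given as `(source_box, destination_box)` wire pairs, which is not the actual GroMEt FN structure, and `execution_order` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter


def execution_order(wires):
    """Return box names ordered so every box follows the boxes feeding it.

    `wires` is an iterable of (source_box, destination_box) pairs.
    """
    sorter = TopologicalSorter()
    for src, dst in wires:
        # dst depends on src, so src must be executed first.
        sorter.add(dst, src)
    return list(sorter.static_order())


wires = [("read_input", "scale"), ("scale", "sum"), ("read_input", "sum")]
order = execution_order(wires)
```

`TopologicalSorter` raises `graphlib.CycleError` on cyclic input, so loops would need separate handling in any real engine.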

ASKEM_SKEMA_Milestone_7

24 Feb 18:42
b3d4601

ASKEM SKEMA Milestone 7 release. This includes:

  • Code2FN
    • Added implicit conditional support for If statements.
    • Fixed issue with Binary operations that use the same variable as both operands.
    • Various Gromet formatting fixes such as removing extra new lines and consistent pathing format between different systems
    • Developed API endpoint and example clients for Code2FN pipeline
    • Developed script (single_file_ingester.py) to simplify Gromet generation for single files and code snippets
    • Defined set of primitive operators and created framework for primitive execution
    • Cleanup and rearranging of visitors in Code2FN Python to CAST and CAST to GroMEt steps.
  • TextReading
    • Integrated version 1 of the scenario context engine into the text reading pipeline, with support for location and temporal contexts and the following highlights:
      • Detection of specific dates, times, date ranges, and time ranges
      • Detection of locations with different granularity levels: Abstract locations, countries, states/provinces, cities, organizations
      • An efficient algorithm to associate parameter extractions with the candidate scenario context detections based on the proximity of occurrence
  • METAL
    • Improved the grounding mechanism to consider first relevant concepts of the DKG relevant to parameter extractions.
    • Extended support of metadata alignment to extractions of collections of documents of arbitrary length.
    • Updated the code to support metadata alignment on gromets with multiple modules
    • Created a version based on contextualized embeddings (using SciBert, but any other transformer model works as a drop in replacement) to do the metadata alignment.
  • ISA
    • The conversion from MathML to graph representation has been preliminarily completed, including dealing with basic operations, parentheses, and arithmetic priority
    • The conversion from graph representation to adjacency matrix is completed.
    • The preliminary development of the structural alignment between equation graphs returns the matching ratio and the closest matched term/variable pairs between two equations.
    • Based on alignment results, a method is proposed to facilitate the identification of similarities and differences between models presented in two papers, or the investigation of code implementation issues
  • MORAE
    • Target format of extraction changed from BiLayer to PetriNet in py-acset form
    • Developed code to convert GroMEt to a graph database representation, allowing structural and dataflow-related queries to isolate and extract code roles.
    • Thin-thread pipeline to TA-2 developed and tested at the Hackathon/Evaluation
    • Basic integration into Terarium via REST API
  • MOVIZ
    • Added support for GroMEt class features that were added in last release
    • Alterations to layout and visual design to better match hand-layout examples
    • Added labels over original hand-layout examples to better support debugging
    • Added interface for direct file GroMEt upload
    • MOVIZ demonstrated to run locally on non-visualization SKEMA team machines to aid debugging and exploration of GroMEt function networks.
    • MOVIZ demonstration on epidemiology model kernels.
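
The TextReading notes above mention associating parameter extractions with candidate scenario contexts by proximity of occurrence, without describing the algorithm. The following is one plausible sketch, not the SKEMA implementation: it assumes each mention is identified by a character offset, and `nearest_context` is a hypothetical name. Sorting the context offsets once and using binary search keeps each lookup logarithmic.

```python
from bisect import bisect_left


def nearest_context(param_offsets, context_mentions):
    """Map each parameter offset to the label of the closest context mention.

    `context_mentions` is a list of (offset, label) pairs. Ties go to the
    earlier mention. Returns None for a parameter when no contexts exist.
    """
    contexts = sorted(context_mentions)
    offsets = [offset for offset, _ in contexts]
    result = {}
    for p in param_offsets:
        i = bisect_left(offsets, p)
        # Only the neighbours on either side of p can be the closest.
        candidates = contexts[max(i - 1, 0): i + 1]
        if candidates:
            result[p] = min(candidates, key=lambda c: abs(c[0] - p))[1]
        else:
            result[p] = None
    return result


contexts = [(10, "Arizona"), (100, "March 2020")]
links = nearest_context([5, 20, 90, 150], contexts)
```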

ASKEM_SKEMA_Milestone_6

18 Feb 23:37
a1a53d8

ASKEM SKEMA Milestone 6 release. This includes:

  • Adding support for implicit conditional checks in while loop conditions.
    • Many Python data types may be interpreted as having a Boolean value. For example, if x is a list, then in the code while x:, the condition will evaluate to True if the list has elements, otherwise False. The Python2CAST translation now treats condition expression trees that do not have a Boolean operator at the head as implicitly wrapped in a Python bool() function call.
  • Python permits lazy variable declaration: a new variable identifier may be introduced within any branch of a conditional. These are now handled.
    • This particularly caused issues when the introduced variables were used in keyword arguments in function calls within conditionals.
  • Adding support for keyword only (kwonly) arguments
    • These are arguments in a function definition that come after a * in the arguments list
  • Improved support of the user module imports.
    • Added functionality to better read user modules. This also fixes an issue where user modules couldn't be read in certain instances.
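
For readers unfamiliar with the Python idioms named in this release, the short example below shows what each one looks like in source code. The function names are illustrative only and do not come from the SKEMA code base.

```python
def drain(x):
    """Implicit conditional check: `while x:` behaves like `while bool(x):`."""
    count = 0
    while x:  # True while the list still has elements
        x.pop()
        count += 1
    return count


def label(value):
    """Lazy declaration: `sign` is first introduced inside the branches."""
    if value > 0:
        sign = "positive"
    else:
        sign = "non-positive"
    return sign


def scale(value, *, factor=2):
    """Keyword-only argument: `factor` follows the bare `*`."""
    return value * factor
```

Calling `scale(3, factor=4)` works, while `scale(3, 4)` raises `TypeError` because `factor` cannot be passed positionally.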

ASKEM_SKEMA_Milestone_5

20 Jan 14:54
4d7a1b9

ASKEM SKEMA Milestone 5 release. This includes:

  • migration of Program Analysis pipeline from AutoMATES to the SKEMA repository
  • Record (class/struct) inheritance and calls to super
  • identifying Record attribute (field) introduction outside of constructor (init)
  • support for general slicing
  • primitive support for raise
  • initial support for Ellipsis, as used in numpy slicing
  • Support to ingest bucky_v2 code base
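
The Python constructs listed above can be seen together in a small example. The class and attribute names are illustrative and are not taken from bucky_v2 or SKEMA. The `Grid` class stands in for a numpy array only to show that `...` reaches `__getitem__` as the `Ellipsis` object.

```python
class Model:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"model {self.name}"


class SIR(Model):  # Record (class) inheritance
    def __init__(self, name, beta):
        super().__init__(name)  # call to super
        self.beta = beta

    def calibrate(self, gamma):
        if gamma <= 0:
            raise ValueError("gamma must be positive")  # raise
        self.gamma = gamma  # attribute introduced outside __init__
        return self.beta / self.gamma


class Grid:
    """Returns whatever index it receives, to expose how slicing is passed."""

    def __getitem__(self, index):
        return index


values = list(range(10))
every_other = values[1:8:2]   # general slicing with start, stop, and step
ellipsis_index = Grid()[..., 0]  # Ellipsis, as used in numpy-style slicing
```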

ASKEM_SKEMA_Dec_2022_Demo

09 Dec 17:01
e17a2b4

Release of SKEMA for the ASKEM December 2022 Demo.