Release ASKEM_SKEMA_Milestone_10 · ml4ai/skema

ASKEM SKEMA Milestone 10 release.

Code2FN
- Extracting Climate Models
  - ClimLab progress
  - CISM Halfar model code extraction
- tree-sitter SKEMA front-end framework
- Fortran support
  - added pre-processor to convert fixed-form to free-form
  - added support for compound-conditionals
- Python support
  - started migration of PyAST to tree-sitter front-end
    - ported: assignments, arithmetic operations, function definitions
- Matlab support
  - Using SV2AIR3-Waterloo model as development target for ingestion
  - added support for: assignments, binary operators, conditionals
  - added unit tests
- Comment extraction
  - replaced Rust comment extraction with tree-sitter
  - current support: C, Cpp, Fortran, Python, Matlab, R
- GroMEt Generation
  - refactored Function Call and Primitive Function Call handler
- Execution Framework
  - initial support for Python built-in primitive operators and types
  - track symbols throughout execution, returning history of values
- Infrastructure and bug fixes
  - sync'd metadata and GroMEt schema versions
  - added 'Debug' metadata entry to GroMEt for error logging
  - increased unit test coverage
    - tree-sitter comment extractor, parser build-tool, code2fn endpoints, CAST generation, GroMEt generation, execution engine
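
The Execution Framework item above (tracking symbols throughout execution and returning a history of values) can be illustrated with a minimal sketch. This is not the SKEMA implementation: the function name `trace_symbols` is hypothetical, and the sketch assumes plain Python source run one top-level statement at a time rather than a GroMEt function network.

```python
import ast


def trace_symbols(source: str) -> dict:
    """Run `source` one top-level statement at a time and record each
    distinct value that every name takes (hypothetical helper, not SKEMA's API)."""
    tree = ast.parse(source)
    namespace: dict = {}
    history: dict = {}
    for stmt in tree.body:
        # Wrap the single statement in its own module so it can be compiled alone.
        module = ast.Module(body=[stmt], type_ignores=[])
        exec(compile(module, "<sketch>", "exec"), namespace)
        for name, value in namespace.items():
            if name.startswith("__"):
                continue  # skip __builtins__ and similar entries
            values = history.setdefault(name, [])
            # Append only when the value changed since the last snapshot.
            if not values or values[-1] != value:
                values.append(value)
    return history


history = trace_symbols("x = 1\ny = x + 2\nx = x * 10\n")
print(history)  # {'x': [1, 10], 'y': [3]}
```

Because the sketch snapshots the namespace only after each top-level statement, it misses intermediate values inside loops or function bodies and does not record reassignments to an equal value.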
Text Reading
- Improved grounding, transitioning from static word embeddings to contextualized word embeddings fine-tuned to domain annotations.
  - Integration with DBK grounding annotations for epidemiology and climate.
  - Evaluated performance improvement, before and after fine-tuning
    - DistilBERT: 60.08 (5.85) --> 74.29 (3.07) MMR
    - SPECTER: 59.64 (5.55) --> 73.71 (3.11) MMR
- Extracting relevant NLP annotations including temporal context
  - Integrated with from-scratch re-implementation of Processors
METAL
- Version 2 of the transformer model for contextualized embeddings for linking, adapted to the climate domain
- Collected code repositories for training and testing
- Generated automated comments using GPT4 (which produced comments of considerably higher quality than GPT3.5)
- Implemented first end-to-end evaluation in two settings:
  - searching only within the file that contains the code snippet
  - searching across a large index over the entire corpus
- Conducted ablation study
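
The two evaluation settings above differ only in the candidate pool that is searched. The sketch below shows that difference under stated assumptions: the names `Snippet`, `overlap`, and `link` are hypothetical, and Jaccard token overlap stands in for the transformer embedding similarity that METAL actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    file: str
    code: str


def overlap(a: str, b: str) -> float:
    """Jaccard token overlap: a stand-in for embedding similarity (assumption)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def link(comment: str, corpus: list, file: str = None) -> Snippet:
    """Return the best-matching snippet, optionally restricted to one file."""
    candidates = [s for s in corpus if file is None or s.file == file]
    return max(candidates, key=lambda s: overlap(comment, s.code))


corpus = [
    Snippet("sir.py", "s = s - beta * s * i"),
    Snippet("sir.py", "r = r + gamma * i"),
    Snippet("seir.py", "beta = beta * decay"),
]

# Setting 1: search only within the file that contains the snippet.
in_file = link("beta decay", corpus, file="sir.py")
# Setting 2: search across the whole corpus.
corpus_wide = link("beta decay", corpus)
print(in_file.code)      # s = s - beta * s * i
print(corpus_wide.file)  # seir.py
```

The corpus-wide setting is the harder one, since every snippet in every file competes as a candidate.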
Eqn Reading
- Improved support for equation images, using data from University of Wisconsin
  - Cleaned UWisc corpus and annotated
  - Improved handling of plain-text within equations
- pMML2AMR pipeline
  - Improved parser
  - added support for subscripts, unicode for Newtonian derivative syntax
  - improved support for AMR (e.g., infix expressions)
  - added support for representing and serializing Decapodes
    - handle Halfar equation
ISA
- Incorporate extractions from Text Reading module to seed alignments
- Code refactoring
MORAE
- Improvements to Code2AMR pipeline
  - bug fixes
  - developed test suite of synthetic data generated by GPT3.5
  - test suite: synthetic test suite, SIDARTHE, CHIME_SIR, SEIRD Hackathon S1, Simple_SIR
MOVIZ
- Added interface to display JSON
- Highlighting between JSON and FN views
- Improvements to FN layout algorithm, scaling to handle boxes with larger numbers of content elements
- Deployed MOVIZ client, which allows local upload of FN JSON files.
