ASKEM_SKEMA_Milestone_10
ASKEM SKEMA Milestone 10 release.
-
Code2FN
- Extracting Climate Models
- ClimLab progress
- CISM Halfar model code extraction
- tree-sitter SKEMA front-end framework
- Fortran support
- added pre-processor to convert fixed-form to free-form
- added support for compound-conditionals
- Python support
- started migration of PyAST to tree-sitter front-end
- ported: assignments, arithmetic operations, function definitions
- Matlab support
- Using SV2AIR3-Waterloo model as development target for ingestion
- added support for: assignments, binary operators, conditionals
- added unit tests
- Comment extraction
- replaced Rust comment extraction with tree-sitter
- current support: C, C++, Fortran, Python, Matlab, R
- GroMEt Generation
- refactored Function Call and Primitive Function Call handlers
- Execution Framework
- initial support for Python built-in primitive operators and types
- track symbols throughout execution, returning history of values
- Infrastructure and bug fixes
- sync'd metadata and GroMEt schema versions
- added 'Debug' metadata entry to GroMEt for error logging
- increased unit test coverage
- tree-sitter comment extractor, parser build-tool, code2fn endpoints, CAST generation, GroMEt generation, execution engine
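As an illustration of the fixed-form to free-form pre-processing step listed under Fortran support, here is a minimal sketch. It is not the SKEMA implementation: the function name `fixed_to_free` is hypothetical, and the sketch covers only column-1 comments, statement labels, and column-6 continuation markers. It ignores tab-formatted lines and inline `!` comments on continued lines.

```python
def fixed_to_free(source: str) -> str:
    """Convert fixed-form Fortran text to free-form (simplified sketch)."""
    out = []
    for raw in source.splitlines():
        line = raw.rstrip()
        if not line:
            out.append("")
            continue
        # Fixed form: 'C', 'c', '*' or '!' in column 1 marks a comment line.
        if line[0] in "Cc*!":
            out.append("!" + line[1:])
            continue
        label = line[:5].strip()     # columns 1-5: optional statement label
        marker = line[5:6]           # column 6: continuation marker
        stmt = line[6:72].rstrip()   # columns 7-72: statement text
        if marker not in ("", " ", "0"):
            # Free form marks a continuation with a trailing '&' on the
            # previous statement line, not with a marker on this line.
            for i in range(len(out) - 1, -1, -1):
                if out[i] and not out[i].lstrip().startswith("!"):
                    out[i] += " &"
                    break
        out.append(f"{label} {stmt}" if label else stmt)
    return "\n".join(out)


example = "\n".join([
    "C     compute sum",
    "      x = a +",
    "     &    b",
    "   10 continue",
])
print(fixed_to_free(example))
```

The continuation marker moves from column 6 of the continued line to a trailing `&` on the preceding statement, which is the main structural difference a tree-sitter free-form grammar needs resolved before parsing.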
-
Text Reading
- Improved grounding, transitioning from static word embeddings to contextualized word embeddings fine-tuned to domain annotations.
- Integration with DBK grounding annotations for epidemiology and climate.
- Evaluated performance improvement, before and after fine-tuning
- DistilBERT: 60.08 (5.85) --> 74.29 (3.07) MMR
- SPECTER: 59.64 (5.55) --> 73.71 (3.11) MMR
- Extracting relevant NLP annotations including temporal context
- Integrated with from-scratch re-implementation of Processors
-
METAL
- Version 2 of the transformer model that produces contextualized embeddings for linking, adapted for the climate domain
- Collected code repositories for training and testing
- Generated automated comments using GPT-4 (whose quality is considerably higher than that of GPT-3.5)
- Implemented first end-to-end evaluation in two settings:
- searching only within the file that contains the code snippet
- searching across a large index over the entire corpus
- Conducted ablation study
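The two evaluation settings above differ only in the candidate pool that is searched. A minimal sketch follows, assuming linking is done by ranking snippet embeddings by cosine similarity to a query embedding. The function names, the `(snippet_id, file, vector)` index layout, and the toy vectors are all hypothetical and are not taken from METAL.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]
Entry = Tuple[str, str, Vector]  # (snippet_id, file, embedding)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_snippets(query: Vector, index: List[Entry],
                  restrict_to_file: Optional[str] = None) -> List[str]:
    """Rank snippet ids by similarity, optionally within a single file."""
    candidates = [(sid, vec) for sid, f, vec in index
                  if restrict_to_file is None or f == restrict_to_file]
    ranked = sorted(candidates, key=lambda c: cosine(query, c[1]), reverse=True)
    return [sid for sid, _ in ranked]


# Toy index: two snippets in one file, one in another (made-up vectors).
index = [
    ("sir.py:beta", "sir.py", [1.0, 0.0]),
    ("sir.py:gamma", "sir.py", [0.0, 1.0]),
    ("climate.py:albedo", "climate.py", [0.9, 0.1]),
]
query = [0.8, 0.2]

in_file = rank_snippets(query, index, restrict_to_file="sir.py")
corpus_wide = rank_snippets(query, index)
```

In this toy case the corpus-wide search ranks a snippet from another file first, which is the kind of distractor the in-file setting never has to face.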
-
Eqn Reading
- Improved support for equation images, using data from University of Wisconsin
- Cleaned UWisc corpus and annotated
- Improved handling of plain text within equations
- pMML2AMR pipeline
- Improved parser
- added support for subscripts, Unicode for Newtonian derivative syntax
- improved support for AMR (e.g., infix expressions)
- added support for representing and serializing Decapodes
- handle Halfar equation
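For reference, the Halfar equation mentioned above is commonly written as the following shallow-ice model for ice thickness. This is the usual textbook form and not necessarily the exact notation or dimensionality handled by the pMML2AMR pipeline.

```latex
\frac{\partial h}{\partial t}
  = \nabla \cdot \left( \Gamma \, h^{\,n+2} \, \lvert \nabla h \rvert^{\,n-1} \, \nabla h \right),
\qquad
\Gamma = \frac{2}{n+2} \, A \, (\rho g)^{n}
```

Here $h$ is the ice thickness, $n$ is the exponent in Glen's flow law (typically 3), $A$ is the flow-rate factor, $\rho$ is the ice density, and $g$ is the gravitational acceleration.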
-
ISA
- Incorporate extractions from Text Reading module to seed alignments
- Code refactoring
-
MORAE
- Improvements to Code2AMR pipeline
- bug fixes
- developed test suite of synthetic data generated by GPT-3.5
- test suite: synthetic test suite, SIDARTHE, CHIME_SIR, SEIRD Hackathon S1, Simple_SIR
-
MOVIZ
- Added interface to display JSON
- Highlighting between JSON and FN views
- Improvements to FN layout algorithm, scaling to handle boxes with larger numbers of content elements
- Deployed MOVIZ client, which allows local upload of FN JSON files.