ASKEM_SKEMA_Milestone_10

@cl4yton cl4yton released this 05 Nov 00:42
23594f7

ASKEM SKEMA Milestone 10 release.

  • Code2FN

    • Extracting Climate Models
      • ClimLab progress
      • CISM Halfar model code extraction
    • tree-sitter SKEMA front-end framework
    • Fortran support
      • added pre-processor to convert fixed-form to free-form
      • added support for compound-conditionals
    • Python support
      • started migration of PyAST to tree-sitter front-end
        • ported: assignments, arithmetic operations, function definitions
    • Matlab support
      • Using SV2AIR3-Waterloo model as development target for ingestion
      • added support for: assignments, binary operators, conditionals
      • added unit tests
    • Comment extraction
      • replaced Rust comment extraction with tree-sitter
      • current support: C, Cpp, Fortran, Python, Matlab, R
    • GroMEt Generation
      • refactored Function Call and Primitive Function Call handlers
    • Execution Framework
      • initial support for Python built-in primitive operators and types
      • track symbols throughout execution, returning history of values
    • Infrastructure and bug fixes
      • sync'd metadata and GroMEt schema versions
      • added 'Debug' metadata entry to GroMEt for error logging
      • increased unit test coverage
        • tree-sitter comment extractor, parser build-tool, code2fn endpoints, CAST generation, GroMEt generation, execution engine
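The Execution Framework item above (tracking symbols throughout execution and returning a history of values) can be illustrated with a minimal sketch. This is not the SKEMA execution engine, which operates on GroMEt. It is a hypothetical stand-in that runs plain Python source one top-level statement at a time and records each new value a symbol takes. The function name `trace_symbols` and the "record only on change" policy are assumptions made for illustration.

```python
import ast


def trace_symbols(source: str) -> dict:
    """Run `source` one top-level statement at a time and return
    {symbol name: [values in the order they were first observed]}.

    Hypothetical illustration only: nested scopes (function bodies,
    loop iterations) are not traced individually.
    """
    history = {}
    namespace = {}
    module = ast.parse(source)
    for stmt in module.body:
        # Wrap the single statement in its own module so it can be compiled.
        single = ast.Module(body=[stmt], type_ignores=[])
        exec(compile(single, "<trace>", "exec"), namespace)
        for name, value in namespace.items():
            if name.startswith("__"):
                continue  # skip interpreter-provided entries such as __builtins__
            values = history.setdefault(name, [])
            # Assumed policy: record a value only when it differs from the last one.
            if not values or values[-1] != value:
                values.append(value)
    return history


if __name__ == "__main__":
    print(trace_symbols("x = 1\ny = x + 2\nx = x * 10"))
```

Here the history for `x` holds both of its values in order, while `y` has a single entry. A real execution framework would also need to handle scoping, control flow, and primitive operator dispatch, none of which this sketch attempts.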
  • Text Reading

    • Improved grounding, transitioning from static word embeddings to contextualized word embeddings fine-tuned to domain annotations.
      • Integration with DBK grounding annotations for epidemiology and climate.
      • Evaluated performance improvement, before and after fine-tuning
        • DistilBERT: 60.08 (5.85) --> 74.29 (3.07) MMR
        • SPECTER: 59.64 (5.55) --> 73.71 (3.11) MMR
    • Extracting relevant NLP annotations including temporal context
      • Integrated with from-scratch re-implementation of Processors
  • METAL

    • Version 2 of the transformer model for contextualized embeddings for linking, adapted to the climate domain
    • Collected code repositories for training and testing
    • Generated automated comments using GPT-4 (whose quality is considerably higher than GPT-3.5)
    • Implemented first end-to-end evaluation in two settings:
      • searching only within the file that contains the code snippet
      • searching across a large index over the entire corpus
    • Conducted ablation study
  • Eqn Reading

    • Improved support for equation images, using data from University of Wisconsin
      • Cleaned UWisc corpus and annotated
      • Improved handling of plain-text within equations
    • pMML2AMR pipeline
      • Improved parser
      • added support for subscripts, unicode for Newtonian derivative syntax
      • improved support for AMR (e.g., infix expressions)
      • added support for representing and serializing Decapodes
        • handle Halfar equation
  • ISA

    • Incorporate extractions from Text Reading module to seed alignments
    • Code refactoring
  • MORAE

    • Improvements to Code2AMR pipeline
      • bug fixes
      • developed test suite of synthetic data generated by GPT-3.5
      • test suite: synthetic test suite, SIDARTHE, CHIME_SIR, SEIRD Hackathon S1, Simple_SIR
  • MOVIZ

    • Added interface to display JSON
    • Highlighting between JSON and FN views
    • Improvements to FN layout algorithm, scaling to handle boxes with larger numbers of content elements
    • Deployed MOVIZ client, which allows local upload of FN JSON files.