[documentation] Moving wiki pages to Github Pages site #649

Merged: 5 commits, Nov 15, 2023
22 changes: 22 additions & 0 deletions docs/dev/adding_new_model.md
@@ -0,0 +1,22 @@
## Add Model to Google Drive
### Original Model Source
The full model clone, including all documentation and non-source files, should be uploaded to:
```bash
data/models/(climate|ecology|epidemiology|space_weather)/
```
### Zip Archive
A zip archive containing ONLY the source files should be uploaded to:
```bash
data/models/zip-archives/
```
On macOS, it may be better to automate the generation of this zip archive with a script. See https://github.com/ml4ai/skema/issues/599
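One way to script this is with Python's standard library, which avoids the `__MACOSX` folders and `.DS_Store` files that macOS zip tools can add. A minimal sketch (the extension list is an assumption; adjust it per model language):

```python
import zipfile
from pathlib import Path

# Extensions treated as "source"; an assumption, adjust per model (assumption).
SOURCE_EXTENSIONS = {".py", ".f", ".f90", ".f95", ".m"}

def zip_source_files(model_dir: str, out_zip: str) -> int:
    """Write a zip containing only source files, skipping macOS metadata.
    Returns the number of files archived."""
    root = Path(model_dir)
    count = 0
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in root.rglob("*"):
            if path.is_file() and path.suffix.lower() in SOURCE_EXTENSIONS:
                # Skip macOS metadata that Finder-created archives pick up.
                if path.name == ".DS_Store" or "__MACOSX" in path.parts:
                    continue
                # Store paths relative to the model root.
                zf.write(path, path.relative_to(root))
                count += 1
    return count
```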

## Add model to artifacts.askem.lum.ai bucket
Currently this step is done automatically. Model archives are mirrored once a week to artifacts.askem.lum.ai.

## Updating models.yaml
Add an entry for the model to `skema/program_analysis/model_coverage_report/models.yaml`.
```yaml
Example-Model:
zip_archive: "https://pathtozip.com"
```
72 changes: 72 additions & 0 deletions docs/dev/adding_new_tree_sitter_frontend.md
@@ -0,0 +1,72 @@
## Language support
If the language you wish to use does not already have a [Tree-sitter parser](https://tree-sitter.github.io/tree-sitter/#parsers), you can create one from a grammar for that language.
## Building the Tree-sitter parser
**Requirements**:
* A GitHub repository with a grammar file named `grammar.js` for the language you wish to support.
* Tree-sitter also supports writing your own grammar file from scratch, following the steps [shown here](https://tree-sitter.github.io/tree-sitter/creating-parsers).

**Steps**:
1. Navigate to the directory `skema/program_analysis/tree_sitter_parsers/`.
2. Add a new entry to `languages.yaml`:
```yaml
matlab:
tree_sitter_name: tree-sitter-matlab
clone_url: https://github.com/acristoffers/tree-sitter-matlab.git
supports_comment_extraction: True
supports_fn_extraction: True
extensions:
- .m
```
3. Run `build_parsers.py`. Adding an entry to `languages.yaml` will automatically create a new command line argument for `build_parsers.py`.
```bash
python build_parsers.py --matlab
```
If successful, a build directory will be created containing the language object file `installed_languages.so`.

## Using the tree-sitter parser
**Requirements:**
* Tree-sitter language object file built using above steps

**Steps:**
1. Import the path to the installed tree-sitter language object file.
```python
from skema.program_analysis.CAST.tree_sitter_parsers.build_parsers import INSTALLED_LANGUAGES_FILEPATH
```
2. Create the Language object. This is used for parsing or running queries.
```python
language_object = Language(INSTALLED_LANGUAGES_FILEPATH, "matlab")
```
3. Parse the source code using the language object created above. Note that the source code needs to be a bytes object rather than a string.
```python
parser = Parser()
parser.set_language(language_object)
tree = parser.parse(bytes(source, "utf8"))
```

## Notes on walking a tree-sitter Tree
* Running `parser.parse` will create a Tree of Node objects, with the root node stored at `tree.root_node`.
* Node objects only contain the fields `type`, `children`, `start_point`, and `end_point`. To get the actual string identifier of a node, you need to recover it from the source code using the node's source reference information. The following is the implementation that the Fortran frontend uses.
```python
def get_identifier(self, node: Node, source: str) -> str:
    """Given a node, return the identifier it represents, i.e. the code
    between node.start_point and node.end_point."""
    line_num = 0
    column_num = 0
    in_identifier = False
    identifier = ""
    for char in source:
        # start_point and end_point are (line, column) tuples into the source.
        if line_num == node.start_point[0] and column_num == node.start_point[1]:
            in_identifier = True
        elif line_num == node.end_point[0] and column_num == node.end_point[1]:
            break

        if char == "\n":
            line_num += 1
            column_num = 0
        else:
            column_num += 1

        if in_identifier:
            identifier += char

    return identifier
```
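The same slicing logic can also be written against plain `(line, column)` tuples, which makes it easy to test independently of tree-sitter. A sketch, where `start` and `end` stand in for `node.start_point` and `node.end_point`:

```python
def slice_by_points(source: str, start: tuple, end: tuple) -> str:
    """Return the text between two (line, column) positions, mirroring how
    tree-sitter's start_point/end_point index into the source."""
    lines = source.split("\n")
    if start[0] == end[0]:
        # Single-line span: a simple column slice.
        return lines[start[0]][start[1]:end[1]]
    # Multi-line span: first-line tail, whole middle lines, last-line head.
    parts = [lines[start[0]][start[1]:]]
    parts.extend(lines[start[0] + 1:end[0]])
    parts.append(lines[end[0]][:end[1]])
    return "\n".join(parts)
```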
2 changes: 2 additions & 0 deletions docs/dev/generating_code2fn_model_coverage.md
@@ -0,0 +1,2 @@
# Generating code2fn model coverage reports
WIP: [https://github.com/ml4ai/skema/wiki/Generating-Code2fn-Coverage-Report](https://github.com/ml4ai/skema/wiki/Generating-Code2fn-Coverage-Report).
50 changes: 50 additions & 0 deletions docs/dev/using_code_ingestion_frontends.md
@@ -0,0 +1,50 @@
## multi_file_ingester
### Command line arguments
- **sysname (str)** - The name of the system being ingested
- **path (str)** - The path to the root of the system
- **files (str)** - The path to system_filepaths.txt
### system_filepaths.txt
Processing a multi-file system requires a system_filepaths.txt file describing the structure of the system. Each line gives the path of one file in the system, relative to the root directory. For example, the system_filepaths.txt file for chime_penn_full would be:
```
cli.py
constants.py
model/parameters.py
model/sir.py
model/validators/base.py
model/validators/validators.py
```
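For larger systems, this file can be generated rather than written by hand. A sketch using only the standard library (the default extension filter is an assumption; widen it for Fortran systems):

```python
from pathlib import Path

def write_system_filepaths(root: str, out_file: str, extensions=(".py",)) -> list:
    """List source files under root, relative to root, one per line,
    in the format multi_file_ingester expects."""
    root_path = Path(root)
    rel_paths = sorted(
        str(p.relative_to(root_path)).replace("\\", "/")
        for p in root_path.rglob("*")
        if p.is_file() and p.suffix in extensions
    )
    Path(out_file).write_text("\n".join(rel_paths) + "\n")
    return rel_paths
```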
### Running as script
```bash
python multi_file_ingester.py --sysname "CHIME" --path /path/to/root --files /path/to/system_filepaths.txt
```
### Running as library
```python
from skema.program_analysis.multi_file_ingester import process_file_system
gromet_collection = process_file_system("CHIME", "data/chime/", "data/chime/system_filepaths.txt", write_to_file=True)
```

## single_file_ingester
### Command line arguments
- **path (str)** - The relative or absolute path of the file to process
### Running as script
```bash
python single_file_ingester.py data/TIEGCM/cpktkm.F
```
### Running as library
```python
from skema.program_analysis.single_file_ingester import process_file
gromet_collection = process_file("cpktkm.F", write_to_file=True)
```

## snippet_file_ingester
### Command line arguments
- **snippet (str)** - The snippet of Python/Fortran code to process
- **extension (str)** - A file extension indicating the language of the code snippet (.f95, .f, .py)
### Running as script
```bash
python snippet_file_ingester.py "x=2" ".py"
```
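The extension argument is how the ingester knows which language frontend to route the snippet to. A hypothetical helper (ours for illustration, not skema's actual implementation) showing that mapping, based on the extensions listed above:

```python
# Hypothetical mapping from the documented snippet extensions to a language
# name; not skema's actual implementation (assumption).
EXTENSION_TO_LANGUAGE = {
    ".py": "python",
    ".f": "fortran",
    ".f95": "fortran",
}

def language_for_extension(extension: str) -> str:
    """Return the language for a snippet extension, raising on unsupported ones."""
    try:
        return EXTENSION_TO_LANGUAGE[extension.lower()]
    except KeyError:
        raise ValueError(f"Unsupported snippet extension: {extension}")
```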
### Running as library
```python
from skema.program_analysis.snippet_file_ingester import process_snippet
gromet_collection = process_snippet("x=2", ".py", write_to_file=True)
```
42 changes: 42 additions & 0 deletions docs/dev/using_tree_sitter_preprocessor.md
@@ -0,0 +1,42 @@
# tree-sitter Fortran preprocessor
## Command line options
### Required
- **source_path (str)** - The path to the Fortran source file that the preprocessor will be run on
- **out_path (str)** - The path to the directory where intermediate products will be stored
### Optional
- **overwrite (bool)** - If True, overwrite the files in the directory specified by out_path
- **out_missing_includes (bool)** - If True, output report of missing included files
- **out_gcc (bool)** - If True, output intermediate product generated by GCC
- **out_unsupported (bool)** - If True, output report of unsupported idioms contained in the source
- **out_corrected (bool)** - If True, output the final source that will be sent to tree-sitter.
- **out_parse (bool)** - If True, output the tree-sitter parse tree in sexp format
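The options above could be wired up with `argparse` roughly as follows (a sketch of the documented interface, not the preprocessor's actual implementation):

```python
import argparse

def build_arg_parser() -> argparse.ArgumentParser:
    """Build a CLI mirroring the documented preprocessor options (a sketch)."""
    parser = argparse.ArgumentParser(description="tree-sitter Fortran preprocessor")
    parser.add_argument("source_path", help="Fortran source file to preprocess")
    parser.add_argument("out_path", help="directory for intermediate products")
    # Each optional flag is a boolean switch, defaulting to False.
    for flag, help_text in [
        ("--overwrite", "overwrite files in out_path"),
        ("--out_missing_includes", "report missing included files"),
        ("--out_gcc", "output the GCC intermediate product"),
        ("--out_unsupported", "report unsupported idioms in the source"),
        ("--out_corrected", "output the corrected source sent to tree-sitter"),
        ("--out_parse", "output the tree-sitter parse tree in sexp format"),
    ]:
        parser.add_argument(flag, action="store_true", help=help_text)
    return parser
```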
## Preprocessing #include directives
To handle the include directive, the preprocessor requires a directory containing any additional files that a source file includes.

This directory should be located in the same directory as the source at source_path and follow the naming scheme *include_filename*, where *filename* is the source file's name without its extension.

For example, if the source file **cons.F** contains:

```fortran
#include <defs.h>
```

the directory structure should look like the following:

```
TIE_GCM/
├── cons.F
└── include_cons/
    └── defs.h
```
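Given that naming scheme, the include directory for a source file can be derived from its path. A sketch (the helper name is ours for illustration, not part of the preprocessor's API):

```python
from pathlib import Path

def include_dir_for(source_path: str) -> Path:
    """Return the expected include directory for a source file, following
    the include_<filename-without-extension> scheme described above."""
    src = Path(source_path)
    return src.parent / f"include_{src.stem}"
```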

## Running as library

```python
from skema.program_analysis.TS2CAST.preprocessor.preprocess import preprocess

parse_tree = preprocess("skema/data/TIE_GCM/cons.F", "skema/data/TIE_GCM/intermediate_products_cons/", out_parse=True)
```

## Running as script

```bash
python preprocess.py skema/data/TIE_GCM/cons.F skema/data/TIE_GCM/intermediate_products_cons/ --out_parse
```
5 changes: 5 additions & 0 deletions mkdocs.yml
@@ -42,6 +42,11 @@ nav:
- Code2fn coverage reports: "coverage/code2fn_coverage/report.html"
- Building docker images: "dev/docker.md"
- Publishing an incremental release: "dev/creating-an-incremental-release.md"
- Adding a new model: "dev/adding_new_model.md"
- Adding a new tree-sitter frontend: "dev/adding_new_tree_sitter_frontend.md"
- Generating code2fn model coverage reports: "dev/generating_code2fn_model_coverage.md"
- Using code ingestion frontends: "dev/using_code_ingestion_frontends.md"
- Using tree-sitter preprocessor: "dev/using_tree_sitter_preprocessor.md"
#- Getting Started: getting-started.md
# - User Guide:
# - Installation: install.md