[documentation] Moving wiki pages to Github Pages site (#649)
## Moving wiki to Github Pages

Moves the following pages from the Wiki to the Github Pages site:

- https://github.com/ml4ai/skema/wiki/Adding-a-new-model-to-the-project
- https://github.com/ml4ai/skema/wiki/Adding-support-for-a-new-tree%E2%80%90sitter-frontend
- https://github.com/ml4ai/skema/wiki/Code-ingestion-front-ends
- https://github.com/ml4ai/skema/wiki/Using-the-tree-sitter-preprocessor

Additionally, adds a link to the WIP page https://github.com/ml4ai/skema/wiki/Generating-Code2fn-Coverage-Report to keep track of it during development.

Once the pages are moved over, the corresponding wiki pages will be deleted.

Resolves #645
1 parent d193a52, commit 12cb0ca
Showing 6 changed files with 193 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
## Add Model to Google Drive | ||
### Original Model Source | ||
The full model clone, including all documentation and non-source files, should be uploaded to:
```bash | ||
data/models/(climate|ecology|epidemiology|space_weather)/ | ||
``` | ||
### Zip Archive | ||
A zip archive containing ONLY the source files should be uploaded to: | ||
```bash | ||
data/models/zip-archives/ | ||
``` | ||
If you are on macOS, it may be better to automate the generation of this zip archive with a script. See https://github.com/ml4ai/skema/issues/599
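
One possible way to script the archive step is sketched below. It is only an illustration: the function name `zip_source_files` and the extension list are assumptions rather than part of SKEMA, so adjust them to the model being added. The sketch walks a model directory and writes only files with a source extension into the zip, so operating-system metadata files and documentation are left out.

```python
import zipfile
from pathlib import Path

# Assumption: which extensions count as "source files" depends on the model.
SOURCE_EXTENSIONS = {".py", ".f", ".F", ".f90", ".f95"}


def zip_source_files(root, out_zip, extensions=SOURCE_EXTENSIONS):
    """Write every file under root with a source extension into out_zip.

    Returns the archive-relative paths that were added, in sorted order.
    """
    root = Path(root)
    added = []
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            # Skip directories and anything that is not a source file
            # (README files, .DS_Store, images, ...).
            if path.is_file() and path.suffix in extensions:
                arcname = path.relative_to(root).as_posix()
                archive.write(path, arcname)
                added.append(arcname)
    return added
```

Calling `zip_source_files("path/to/model", "model.zip")` returns the list of archived paths, which can be checked before uploading.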
|
||
## Add model to artifacts.askem.lum.ai bucket | ||
Currently this step is done automatically: model archives are mirrored once a week to artifacts.askem.lum.ai.
|
||
## Updating models.yaml | ||
Add an entry for the model to `skema/program_analysis/model_coverage_report/models.yaml`:
```YAML | ||
Example-Model: | ||
zip_archive: "https://pathtozip.com" | ||
``` |
@@ -0,0 +1,72 @@ | ||
## Language support | ||
If the language you wish to use does not already have a [Tree-sitter parser](https://tree-sitter.github.io/tree-sitter/#parsers), you can create it with a grammar for that language. | ||
## Building the Tree-sitter parser | ||
**Requirements**: | ||
* A GitHub repository with a grammar file named `grammar.js` for the language you wish to support. | ||
* Tree-sitter also supports writing your own grammar file from scratch, following the steps [shown here](https://tree-sitter.github.io/tree-sitter/creating-parsers).
|
||
**Steps**: | ||
1. Go to the directory `skema/program_analysis/tree_sitter_parsers/`.
2. Add a new entry to `languages.yaml`:
```yaml | ||
matlab: | ||
tree_sitter_name: tree-sitter-matlab | ||
clone_url: https://github.com/acristoffers/tree-sitter-matlab.git | ||
supports_comment_extraction: True | ||
supports_fn_extraction: True | ||
extensions: | ||
- .m | ||
``` | ||
3. Run `build_parsers.py`. Adding an entry to `languages.yaml` automatically creates a new command line argument for `build_parsers.py`.
```bash
python build_parsers.py --matlab
```
If successful, a build directory is created containing the language object file `installed_languages.so`.
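
To illustrate how a configuration entry can turn into a command line flag, here is a minimal sketch. It is not the actual `build_parsers.py` code: the `LANGUAGES` dictionary stands in for the parsed contents of `languages.yaml`, and the function names are hypothetical.

```python
import argparse

# Hypothetical in-memory stand-in for the parsed languages.yaml file.
LANGUAGES = {
    "fortran": {"extensions": [".f", ".f95"]},
    "matlab": {"extensions": [".m"]},
}


def make_arg_parser(languages):
    """Create one boolean flag (--<language>) per configured language."""
    parser = argparse.ArgumentParser(description="Build tree-sitter parsers")
    for name in languages:
        parser.add_argument(
            f"--{name}", action="store_true", help=f"Build the {name} parser"
        )
    return parser


def selected_languages(argv, languages=LANGUAGES):
    """Return the languages whose flags were passed on the command line."""
    args = make_arg_parser(languages).parse_args(argv)
    return [name for name in languages if getattr(args, name)]
```

With this design, adding a key to the dictionary is enough to make a new flag available, which mirrors the behaviour described in step 3.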
|
||
## Using the tree-sitter parser | ||
**Requirements:** | ||
* A tree-sitter language object file built using the steps above
|
||
**Steps:** | ||
1. Import the tree-sitter classes and the path to the built tree-sitter library.
```python
from tree_sitter import Language, Parser
from skema.program_analysis.tree_sitter_parsers.build_parsers import INSTALLED_LANGUAGES_FILEPATH
```
2. Create the Language object. This is used for parsing or running queries. | ||
```python | ||
language_object = Language(INSTALLED_LANGUAGES_FILEPATH, "matlab") | ||
``` | ||
3. Parse the source code using the language object created above. Note that the source code needs to be a bytes object rather than a string. | ||
```python | ||
parser = Parser() | ||
parser.set_language(language_object) | ||
tree = parser.parse(bytes(source, "utf8")) | ||
``` | ||
|
||
## Notes on walking tree-sitter Tree | ||
* Running `parse` creates a Tree of Node objects, with the root node stored at `tree.root_node`.
* Node objects only contain the fields `type`, `children`, `start_point`, `end_point`. To get the source text of a node (for example, an identifier), you need to recover it from the source code using the node's `start_point` and `end_point`, each of which is a `(row, column)` pair. The following is the implementation that the Fortran frontend uses.
```python
def get_identifier(self, node: Node, source: str) -> str:
    """Given a node, return the identifier it represents, i.e. the code between node.start_point and node.end_point"""
    line_num = 0
    column_num = 0
    in_identifier = False
    identifier = ""
    for i, char in enumerate(source):
        # Start collecting at start_point; stop once end_point is reached
        if line_num == node.start_point[0] and column_num == node.start_point[1]:
            in_identifier = True
        elif line_num == node.end_point[0] and column_num == node.end_point[1]:
            break
        # Track the (row, column) position of the next character
        if char == "\n":
            line_num += 1
            column_num = 0
        else:
            column_num += 1
        if in_identifier:
            identifier += char
    return identifier
```
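
The same lookup can also be written by slicing lines instead of scanning characters. The sketch below is an alternative, not the frontend's code. `Span` is a hypothetical stand-in for the two Node fields used, so the example runs without tree-sitter installed. It assumes columns can be treated as character offsets, which holds for ASCII source (tree-sitter reports columns in bytes).

```python
from typing import NamedTuple, Tuple


class Span(NamedTuple):
    """Hypothetical stand-in for a Node: only the two fields used below."""
    start_point: Tuple[int, int]  # (row, column) of the first character
    end_point: Tuple[int, int]    # (row, column) just past the last character


def text_between(node, source: str) -> str:
    """Return the source text between node.start_point and node.end_point."""
    (start_row, start_col), (end_row, end_col) = node.start_point, node.end_point
    # split("\n") keeps a trailing empty line, so a row index that points
    # just past a final newline is still valid.
    lines = source.split("\n")
    if start_row == end_row:
        return lines[start_row][start_col:end_col]
    first = lines[start_row][start_col:]
    middle = lines[start_row + 1:end_row]
    last = lines[end_row][:end_col]
    return "\n".join([first, *middle, last])
```

Any object with `start_point` and `end_point` attributes works here, so a real tree-sitter Node can be passed in place of `Span`.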
@@ -0,0 +1,2 @@ | ||
# Generating code2fn model coverage reports | ||
WIP: [https://github.com/ml4ai/skema/wiki/Generating-Code2fn-Coverage-Report](https://github.com/ml4ai/skema/wiki/Generating-Code2fn-Coverage-Report). |
@@ -0,0 +1,50 @@ | ||
## multi_file_ingester | ||
### Command line arguments | ||
- **sysname (str)** - The name of the system being ingested | ||
- **path (str)** - The path to the root of the system | ||
- **files (str)** - The path to system_filepaths.txt | ||
### system_filepaths.txt | ||
Processing a multi-file system requires a `system_filepaths.txt` file describing the structure of the system. Each line is the path to one file in the system, relative to the root directory. For example, the `system_filepaths.txt` file for chime_penn_full would be:
``` | ||
cli.py | ||
constants.py | ||
model/parameters.py | ||
model/sir.py | ||
model/validators/base.py | ||
model/validators/validators.py | ||
``` | ||
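
For larger systems, this file can be generated rather than written by hand. The following is a minimal sketch under stated assumptions: the helper names are hypothetical and not part of SKEMA, only files with the given extensions are listed, and paths are written in sorted order. If the ingester expects a particular order, edit the result by hand.

```python
from pathlib import Path


def list_system_files(root, extensions=(".py",)):
    """Return paths of matching files under root, relative to root."""
    root = Path(root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and path.suffix in extensions
    )


def write_system_filepaths(root, out_path, extensions=(".py",)):
    """Write one relative path per line to out_path and return the paths."""
    paths = list_system_files(root, extensions)
    Path(out_path).write_text("\n".join(paths) + "\n")
    return paths
```

For example, `write_system_filepaths("/path/to/root", "/path/to/system_filepaths.txt")` produces a file in the format shown above.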
### Running as script | ||
```bash | ||
python multi_file_ingester.py --sysname "CHIME" --path /path/to/root --files /path/to/system_filepaths.txt | ||
``` | ||
### Running as library | ||
```python | ||
from skema.program_analysis.multi_file_ingester import process_file_system | ||
gromet_collection = process_file_system("CHIME", "data/chime/", "data/chime/system_filepaths.txt", write_to_file=True) | ||
``` | ||
|
||
## single_file_ingester | ||
### Command line arguments | ||
- **path (str)** - The relative or absolute path of the file to process
### Running as script | ||
```bash | ||
python single_file_ingester.py data/TIEGCM/cpktkm.F | ||
``` | ||
### Running as library | ||
```python | ||
from skema.program_analysis.single_file_ingester import process_file | ||
gromet_collection = process_file("cpktkm.F", write_to_file=True) | ||
``` | ||
|
||
## snippet_file_ingester | ||
### Command line arguments | ||
- **snippet (str)** - The snippet of Python/Fortran code to process
- **extension (str)** - A file extension representing the language of the code snippet (.f95, .f, .py)
### Running as script | ||
```bash | ||
python snippet_file_ingester.py "x=2" ".py" | ||
``` | ||
### Running as library | ||
```python | ||
from skema.program_analysis.snippet_file_ingester import process_snippet | ||
gromet_collection = process_snippet("x=2", ".py", write_to_file=True)
```
@@ -0,0 +1,42 @@ | ||
# tree-sitter Fortran preprocessor | ||
## Command line options | ||
### Required | ||
- **source_path (str)** - The path to the Fortran source file that the preprocessor will be run on | ||
- **out_path (str)** - The path to the directory where intermediate products will be stored | ||
### Optional | ||
- **overwrite (bool)** - If True, overwrite the files in the directory specified by out_path | ||
- **out_missing_includes (bool)** - If True, output report of missing included files | ||
- **out_gcc (bool)** - If True, output intermediate product generated by GCC | ||
- **out_unsupported (bool)** - If True, output report of unsupported idioms contained in the source | ||
- **out_corrected (bool)** - If True, output the final source that will be sent to tree-sitter. | ||
- **out_parse (bool)** - If True, output the tree-sitter parse tree in sexp format | ||
## Preprocessing #include directives | ||
To handle `#include` directives, the preprocessor requires a directory containing every file that the source file includes.

This directory must be located in the same directory as the source file at `source_path` and be named `include_<filename>`, where `<filename>` is the name of the source file without its extension.

For example, if the source file **cons.F** contains the line `#include <defs.h>`, the directory structure should look like the following:
|
||
**TIE_GCM | ||
  cons.F | ||
  include_cons | ||
   defs.h** | ||
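
The naming convention can be checked before running the preprocessor. The sketch below is illustrative only and is not the preprocessor's implementation: `include_dir_for` and `missing_includes` are hypothetical helpers, and the regular expression only recognises simple `#include <name>` and `#include "name"` lines.

```python
import re
from pathlib import Path

# Matches lines such as: #include <defs.h>  or  #include "defs.h"
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)


def include_dir_for(source_path):
    """Return the expected include directory, e.g. cons.F -> include_cons."""
    source = Path(source_path)
    return source.parent / f"include_{source.stem}"


def missing_includes(source_path):
    """List included file names that are absent from the include directory."""
    source = Path(source_path)
    include_dir = include_dir_for(source)
    names = INCLUDE_RE.findall(source.read_text())
    return [name for name in names if not (include_dir / name).is_file()]
```

An empty list from `missing_includes("TIE_GCM/cons.F")` means every included file was found in `TIE_GCM/include_cons/`.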
|
||
## Running as library | ||
|
||
```python | ||
from skema.program_analysis.TS2CAST.preprocessor.preprocess import preprocess | ||
|
||
parse_tree = preprocess("skema/data/TIE_GCM/cons.F", "skema/data/TIE_GCM/intermediate_products_cons/", out_parse=True) | ||
``` | ||
|
||
## Running as script | ||
|
||
```bash
python preprocess.py skema/data/TIE_GCM/cons.F skema/data/TIE_GCM/intermediate_products_cons/ --out_parse
```