[documentation] Moving wiki pages to Github Pages site (#649)
## Moving wiki to Github Pages
Moves the following pages from the Wiki to the Github Pages site:
- https://github.com/ml4ai/skema/wiki/Adding-a-new-model-to-the-project
- https://github.com/ml4ai/skema/wiki/Adding-support-for-a-new-tree%E2%80%90sitter-frontend
- https://github.com/ml4ai/skema/wiki/Code-ingestion-front-ends
- https://github.com/ml4ai/skema/wiki/Using-the-tree-sitter-preprocessor

Additionally, adds a link to the WIP page
https://github.com/ml4ai/skema/wiki/Generating-Code2fn-Coverage-Report
to keep track of it during development.

Once the pages are moved over, the corresponding wiki pages will be
deleted.

Resolves #645
vincentraymond-ua authored Nov 15, 2023
1 parent d193a52 commit 12cb0ca
Showing 6 changed files with 193 additions and 0 deletions.
22 changes: 22 additions & 0 deletions docs/dev/adding_new_model.md
@@ -0,0 +1,22 @@
## Add Model to Google Drive
### Original Model Source
The full model clone, including all documentation and non-source files, should be uploaded to:
```bash
data/models/(climate|ecology|epidemiology|space_weather)/
```
### Zip Archive
A zip archive containing ONLY the source files should be uploaded to:
```bash
data/models/zip-archives/
```
If using a macOS system, it may be better to automate generating this zip archive with a script, since macOS can add metadata files (such as `.DS_Store`) to archives. See https://github.com/ml4ai/skema/issues/599
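A minimal sketch of such a script, using Python's standard `zipfile` module (the function name and exclusion list are assumptions, not part of the project):

```python
import zipfile
from pathlib import Path

# macOS artifacts that should not end up in the archive (assumed exclusion list).
EXCLUDED_NAMES = {".DS_Store", "__MACOSX"}

def make_source_archive(model_dir: str, archive_path: str) -> None:
    """Zip only the files under model_dir, skipping macOS metadata files."""
    root = Path(model_dir)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            # Skip directories and anything under an excluded name.
            if path.is_file() and not (set(path.parts) & EXCLUDED_NAMES):
                zf.write(path, path.relative_to(root))
```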

## Add model to artifacts.askem.lum.ai bucket
Currently, this step is done automatically. Model archives are mirrored once a week to artifacts.askem.lum.ai.

## Updating models.yaml
Add an entry for the model to `skema/program_analysis/model_coverage_report/models.yaml`:
```YAML
Example-Model:
zip_archive: "https://pathtozip.com"
```
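A sketch of how such an entry could be read back, assuming PyYAML is available (the helper function is illustrative, not part of the project):

```python
import yaml  # PyYAML

def load_model_archives(models_yaml_text: str) -> dict:
    """Map each model name to its zip_archive URL."""
    models = yaml.safe_load(models_yaml_text)
    return {name: entry["zip_archive"] for name, entry in models.items()}

# Mirrors the example entry above.
example = """
Example-Model:
  zip_archive: "https://pathtozip.com"
"""
print(load_model_archives(example))
```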
72 changes: 72 additions & 0 deletions docs/dev/adding_new_tree_sitter_frontend.md
@@ -0,0 +1,72 @@
## Language support
If the language you wish to use does not already have a [Tree-sitter parser](https://tree-sitter.github.io/tree-sitter/#parsers), you can create one from a grammar for that language.
## Building the Tree-sitter parser
**Requirements**:
* A GitHub repository with a grammar file named `grammar.js` for the language you wish to support.
* Alternatively, Tree-sitter supports writing your own grammar file from scratch, following the steps [shown here](https://tree-sitter.github.io/tree-sitter/creating-parsers).

**Steps**:
1. Change into the directory `skema/program_analysis/tree_sitter_parsers/`.
2. Add a new entry to `languages.yaml`:
```yaml
matlab:
tree_sitter_name: tree-sitter-matlab
clone_url: https://github.com/acristoffers/tree-sitter-matlab.git
supports_comment_extraction: True
supports_fn_extraction: True
extensions:
- .m
```
3. Run `build_parsers.py`. Adding an entry to `languages.yaml` automatically creates a corresponding command line argument for `build_parsers.py`.
```bash
python build_parsers.py --matlab
```
If successful, a build directory will have been created containing the language object file `installed_languages.so`.

## Using the tree-sitter parser
**Requirements:**
* Tree-sitter language object file built using the steps above

**Steps:**
1. Import the path to the tree-sitter library.
```python
from skema.program_analysis.CAST.tree_sitter_parsers.build_parsers import INSTALLED_LANGUAGES_FILEPATH
```
2. Create the Language object. This is used for parsing or running queries.
```python
from tree_sitter import Language

language_object = Language(INSTALLED_LANGUAGES_FILEPATH, "matlab")
```
3. Parse the source code using the language object created above. Note that the source code needs to be a bytes object rather than a string.
```python
from tree_sitter import Parser

parser = Parser()
parser.set_language(language_object)
tree = parser.parse(bytes(source, "utf8"))
```

## Notes on walking a tree-sitter Tree
* Running `parse` creates a Tree of Node objects, with the root node stored at `tree.root_node`.
* Node objects only contain the fields `type`, `children`, `start_point`, and `end_point`. To get the actual string identifier of a node, you need to infer it from the source code and the node's source reference information. The following is the implementation that the Fortran frontend uses.
```python
def get_identifier(self, node: Node, source: str) -> str:
"""Given a node, return the identifier it represents. ie. The code between node.start_point and node.end_point"""
line_num = 0
column_num = 0
in_identifier = False
identifier = ""
for i, char in enumerate(source):
if line_num == node.start_point[0] and column_num == node.start_point[1]:
in_identifier = True
elif line_num == node.end_point[0] and column_num == node.end_point[1]:
break
if char == "\n":
line_num += 1
column_num = 0
else:
column_num += 1
if in_identifier:
identifier += char
return identifier
```
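The function above can be exercised without tree-sitter by substituting a stand-in object that carries the same `start_point`/`end_point` fields; the `FakeNode` below is purely illustrative, and the function body is the same logic adapted to run standalone (no `self`):

```python
from typing import NamedTuple

class FakeNode(NamedTuple):
    """Stand-in for a tree-sitter Node: (row, column) source positions."""
    start_point: tuple
    end_point: tuple

def get_identifier(node, source: str) -> str:
    """Return the source text between node.start_point and node.end_point."""
    line_num = 0
    column_num = 0
    in_identifier = False
    identifier = ""
    for char in source:
        if (line_num, column_num) == node.start_point:
            in_identifier = True
        elif (line_num, column_num) == node.end_point:
            break
        if char == "\n":
            line_num += 1
            column_num = 0
        else:
            column_num += 1
        if in_identifier:
            identifier += char
    return identifier

# "foo" spans columns 4-7 on line 0.
print(get_identifier(FakeNode((0, 4), (0, 7)), "x = foo + 1\n"))
```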
2 changes: 2 additions & 0 deletions docs/dev/generating_code2fn_model_coverage.md
@@ -0,0 +1,2 @@
# Generating code2fn model coverage reports
WIP: [https://github.com/ml4ai/skema/wiki/Generating-Code2fn-Coverage-Report](https://github.com/ml4ai/skema/wiki/Generating-Code2fn-Coverage-Report).
50 changes: 50 additions & 0 deletions docs/dev/using_code_ingestion_frontends.md
@@ -0,0 +1,50 @@
## multi_file_ingester
### Command line arguments
- **sysname (str)** - The name of the system being ingested
- **path (str)** - The path to the root of the system
- **files (str)** - The path to system_filepaths.txt
### system_filepaths.txt
Processing a multi-file system requires a `system_filepaths.txt` file describing the structure of the system. Each line represents the path to one file in the system, relative to the root directory. For example, the `system_filepaths.txt` file for chime_penn_full would be:
```
cli.py
constants.py
model/parameters.py
model/sir.py
model/validators/base.py
model/validators/validators.py
```
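One way to generate `system_filepaths.txt` automatically — a sketch, where the helper name and the extension filter (here, Python-only) are assumptions:

```python
from pathlib import Path

def list_system_files(root: str, extensions=(".py",)) -> list[str]:
    """Return source file paths relative to the system root,
    one per line of system_filepaths.txt."""
    root_path = Path(root)
    return sorted(
        str(p.relative_to(root_path))
        for p in root_path.rglob("*")
        if p.is_file() and p.suffix in extensions
    )
```

Writing `"\n".join(list_system_files("data/chime/"))` to a file would reproduce a listing like the one above.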
### Running as script
```bash
python multi_file_ingester.py --sysname "CHIME" --path /path/to/root --files /path/to/system_filepaths.txt
```
### Running as library
```python
from skema.program_analysis.multi_file_ingester import process_file_system
gromet_collection = process_file_system("CHIME", "data/chime/", "data/chime/system_filepaths.txt", write_to_file=True)
```

## single_file_ingester
### Command line arguments
- **path (str)** - The relative or absolute path of the file to process
### Running as script
```bash
python single_file_ingester.py data/TIEGCM/cpktkm.F
```
### Running as library
```python
from skema.program_analysis.single_file_ingester import process_file
gromet_collection = process_file("cpktkm.F", write_to_file=True)
```

## snippet_file_ingester
### Command line arguments
- **snippet (str)** - The snippet of Python/Fortran code to process
- **extension (str)** - A file extension representing the language of the code snippet (.f95, .f, .py)
### Running as script
```bash
python snippet_file_ingester.py "x=2" ".py"
```
### Running as library
```python
from skema.program_analysis.snippet_file_ingester import process_snippet
gromet_collection = process_snippet("x=2", ".py", write_to_file=True)
```
42 changes: 42 additions & 0 deletions docs/dev/using_tree_sitter_preprocessor.md
@@ -0,0 +1,42 @@
# tree-sitter Fortran preprocessor
## Command line options
### Required
- **source_path (str)** - The path to the Fortran source file that the preprocessor will be run on
- **out_path (str)** - The path to the directory where intermediate products will be stored
### Optional
- **overwrite (bool)** - If True, overwrite the files in the directory specified by out_path
- **out_missing_includes (bool)** - If True, output report of missing included files
- **out_gcc (bool)** - If True, output intermediate product generated by GCC
- **out_unsupported (bool)** - If True, output report of unsupported idioms contained in the source
- **out_corrected (bool)** - If True, output the final source that will be sent to tree-sitter.
- **out_parse (bool)** - If True, output the tree-sitter parse tree in sexp format
## Preprocessing #include directives
To handle the include directive, the preprocessor requires a directory containing any additional files that a source file includes.

This directory should be located in the same directory as the source at `source_path` and follow the naming structure `include_<filename>`, where `<filename>` is the source file name without its extension.

For example, given the source file **cons.F**:

```
#include <defs.h>
```

the directory structure should look like the following:

```
TIE_GCM/
    cons.F
    include_cons/
        defs.h
```
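The convention above could be approximated by a helper like the following — a hypothetical sketch for illustration, not the preprocessor's actual implementation:

```python
import re
from pathlib import Path

# Matches both `#include <file>` and `#include "file"` forms.
INCLUDE_RE = re.compile(r'\s*#include\s+[<"](.+?)[>"]')

def resolve_includes(source_path: str) -> str:
    """Inline #include directives from the sibling include_<stem> directory."""
    src = Path(source_path)
    include_dir = src.parent / f"include_{src.stem}"
    lines = []
    for line in src.read_text().splitlines():
        match = INCLUDE_RE.match(line)
        included = include_dir / match.group(1) if match else None
        if included is not None and included.exists():
            # Replace the directive with the included file's contents.
            lines.append(included.read_text().rstrip("\n"))
        else:
            lines.append(line)
    return "\n".join(lines)
```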

## Running as library

```python
from skema.program_analysis.TS2CAST.preprocessor.preprocess import preprocess

parse_tree = preprocess("skema/data/TIE_GCM/cons.F", "skema/data/TIE_GCM/intermediate_products_cons/", out_parse=True)
```

## Running as script

```bash
python preprocess.py skema/data/TIE_GCM/cons.F skema/data/TIE_GCM/intermediate_products_cons/ --out_parse
```
5 changes: 5 additions & 0 deletions mkdocs.yml
@@ -42,6 +42,11 @@ nav:
- Code2fn coverage reports: "coverage/code2fn_coverage/report.html"
- Building docker images: "dev/docker.md"
- Publishing an incremental release: "dev/creating-an-incremental-release.md"
- Adding a new model: "dev/adding_new_model.md"
- Adding a new tree-sitter frontend: "dev/adding_new_tree_sitter_frontend.md"
- Generating code2fn model coverage reports: "dev/generating_code2fn_model_coverage.md"
- Using code ingestion frontends: "dev/using_code_ingestion_frontends.md"
- Using tree-sitter preprocessor: "dev/using_tree_sitter_preprocessor.md"
#- Getting Started: getting-started.md
# - User Guide:
# - Installation: install.md
