[documentation] Moving wiki pages to Github Pages site #649

Merged: 5 commits, Nov 15, 2023
22 changes: 22 additions & 0 deletions docs/dev/adding_new_model.md
@@ -0,0 +1,22 @@
## Add Model to Google Drive
### Original Model Source
The full model clone, including all documentation and non-source files, should be uploaded to:
```bash
data/models/(climate|ecology|epidemiology|space_weather)/
```
### Zip Archive
A zip archive containing ONLY the source files should be uploaded to:
```bash
data/models/zip-archives/
```
On macOS, it may be better to automate the generation of this zip archive with a script. See https://github.com/ml4ai/skema/issues/599
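One way to script this is with Python's standard library, which avoids the `__MACOSX` folders and `.DS_Store` files that macOS zip tools can add. A minimal sketch (the extension list is an assumption; adjust it per model language):

```python
import zipfile
from pathlib import Path

# Extensions treated as "source"; an assumption, adjust per model (assumption).
SOURCE_EXTENSIONS = {".py", ".f", ".f90", ".f95", ".m"}

def zip_source_files(model_dir: str, out_zip: str) -> int:
    """Write a zip containing only source files, skipping macOS metadata.
    Returns the number of files archived."""
    root = Path(model_dir)
    count = 0
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in root.rglob("*"):
            if path.is_file() and path.suffix.lower() in SOURCE_EXTENSIONS:
                # Skip macOS metadata that Finder-created archives pick up.
                if path.name == ".DS_Store" or "__MACOSX" in path.parts:
                    continue
                # Store paths relative to the model root.
                zf.write(path, path.relative_to(root))
                count += 1
    return count
```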

## Add model to artifacts.askem.lum.ai bucket
Currently this step is done automatically. Model archives are mirrored once a week to artifacts.askem.lum.ai.

## Updating models.yaml
Add an entry for the model to `skema/program_analysis/model_coverage_report/models.yaml`.
```yaml
Example-Model:
zip_archive: "https://pathtozip.com"
```
72 changes: 72 additions & 0 deletions docs/dev/adding_new_tree_sitter_frontend.md
@@ -0,0 +1,72 @@
## Language support
If the language you wish to use does not already have a [Tree-sitter parser](https://tree-sitter.github.io/tree-sitter/#parsers), you can create one from a grammar for that language.
## Building the Tree-sitter parser
**Requirements**:
* A GitHub repository with a grammar file named `grammar.js` for the language you wish to support.
* Tree-sitter also supports writing your own grammar file from scratch, following the steps [shown here](https://tree-sitter.github.io/tree-sitter/creating-parsers).

**Steps**:
1. Navigate to the directory `skema/program_analysis/tree_sitter_parsers/`.
2. Add a new entry to `languages.yaml`:
```yaml
matlab:
tree_sitter_name: tree-sitter-matlab
clone_url: https://github.com/acristoffers/tree-sitter-matlab.git
supports_comment_extraction: True
supports_fn_extraction: True
extensions:
- .m
```
3. Run `build_parsers.py`. Adding an entry to `languages.yaml` will automatically create a new command line argument for `build_parsers.py`.
```bash
python build_parsers.py --matlab
```
If successful, a build directory will be created containing the language object file `installed_languages.so`.

## Using the tree-sitter parser
**Requirements:**
* Tree-sitter language object file built using above steps

**Steps:**
1. Import the path to the installed tree-sitter language object file.
```python
from skema.program_analysis.CAST.tree_sitter_parsers.build_parsers import INSTALLED_LANGUAGES_FILEPATH
```
2. Create the Language object. This is used for parsing or running queries.
```python
language_object = Language(INSTALLED_LANGUAGES_FILEPATH, "matlab")
```
3. Parse the source code using the language object created above. Note that the source code needs to be a bytes object rather than a string.
```python
parser = Parser()
parser.set_language(language_object)
tree = parser.parse(bytes(source, "utf8"))
```

## Notes on walking a tree-sitter Tree
* Running `parser.parse` will create a Tree of Node objects, with the root node stored at `tree.root_node`.
* Node objects only contain the fields `type`, `children`, `start_point`, and `end_point`. To get the actual string identifier of a node, you need to recover it from the source code using the node's source reference information. The following is the implementation that the Fortran frontend uses.
```python
def get_identifier(self, node: Node, source: str) -> str:
    """Given a node, return the identifier it represents, i.e. the code
    between node.start_point and node.end_point."""
    line_num = 0
    column_num = 0
    in_identifier = False
    identifier = ""
    for char in source:
        # start_point and end_point are (line, column) tuples into the source.
        if line_num == node.start_point[0] and column_num == node.start_point[1]:
            in_identifier = True
        elif line_num == node.end_point[0] and column_num == node.end_point[1]:
            break

        if char == "\n":
            line_num += 1
            column_num = 0
        else:
            column_num += 1

        if in_identifier:
            identifier += char

    return identifier
```
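The same slicing logic can also be written against plain `(line, column)` tuples, which makes it easy to test independently of tree-sitter. A sketch, where `start` and `end` stand in for `node.start_point` and `node.end_point`:

```python
def slice_by_points(source: str, start: tuple, end: tuple) -> str:
    """Return the text between two (line, column) positions, mirroring how
    tree-sitter's start_point/end_point index into the source."""
    lines = source.split("\n")
    if start[0] == end[0]:
        # Single-line span: a simple column slice.
        return lines[start[0]][start[1]:end[1]]
    # Multi-line span: first-line tail, whole middle lines, last-line head.
    parts = [lines[start[0]][start[1]:]]
    parts.extend(lines[start[0] + 1:end[0]])
    parts.append(lines[end[0]][:end[1]])
    return "\n".join(parts)
```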
2 changes: 2 additions & 0 deletions docs/dev/generating_code2fn_model_coverage.md
@@ -0,0 +1,2 @@
# Generating code2fn model coverage reports
WIP: [https://github.com/ml4ai/skema/wiki/Generating-Code2fn-Coverage-Report](https://github.com/ml4ai/skema/wiki/Generating-Code2fn-Coverage-Report).
50 changes: 50 additions & 0 deletions docs/dev/using_code_ingestion_frontends.md
@@ -0,0 +1,50 @@
## multi_file_ingester
### Command line arguments
- **sysname (str)** - The name of the system being ingested
- **path (str)** - The path to the root of the system
- **files (str)** - The path to system_filepaths.txt
### system_filepaths.txt
Processing a multi-file system requires a system_filepaths.txt file describing the structure of the system. Each line gives the path of one file in the system, relative to the root directory. For example, the system_filepaths.txt file for chime_penn_full would be:
```
cli.py
constants.py
model/parameters.py
model/sir.py
model/validators/base.py
model/validators/validators.py
```
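For larger systems, this file can be generated rather than written by hand. A sketch using only the standard library (the default extension filter is an assumption; widen it for Fortran systems):

```python
from pathlib import Path

def write_system_filepaths(root: str, out_file: str, extensions=(".py",)) -> list:
    """List source files under root, relative to root, one per line,
    in the format multi_file_ingester expects."""
    root_path = Path(root)
    rel_paths = sorted(
        str(p.relative_to(root_path)).replace("\\", "/")
        for p in root_path.rglob("*")
        if p.is_file() and p.suffix in extensions
    )
    Path(out_file).write_text("\n".join(rel_paths) + "\n")
    return rel_paths
```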
### Running as script
```bash
python multi_file_ingester.py --sysname "CHIME" --path /path/to/root --files /path/to/system_filepaths.txt
```
### Running as library
```python
from skema.program_analysis.multi_file_ingester import process_file_system
gromet_collection = process_file_system("CHIME", "data/chime/", "data/chime/system_filepaths.txt", write_to_file=True)
```

## single_file_ingester
### Command line arguments
- **path (str)** - The relative or absolute path of the file to process
### Running as script
```bash
python single_file_ingester.py data/TIEGCM/cpktkm.F
```
### Running as library
```python
from skema.program_analysis.single_file_ingester import process_file
gromet_collection = process_file("cpktkm.F", write_to_file=True)
```

## snippet_file_ingester
### Command line arguments
- **snippet (str)** - The snippet of Python/Fortran code to process
- **extension (str)** - A file extension indicating the language of the code snippet (.f95, .f, .py)
### Running as script
```bash
python snippet_file_ingester.py "x=2" ".py"
```
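The extension argument is how the ingester knows which language frontend to route the snippet to. A hypothetical helper (ours for illustration, not skema's actual implementation) showing that mapping, based on the extensions listed above:

```python
# Hypothetical mapping from the documented snippet extensions to a language
# name; not skema's actual implementation (assumption).
EXTENSION_TO_LANGUAGE = {
    ".py": "python",
    ".f": "fortran",
    ".f95": "fortran",
}

def language_for_extension(extension: str) -> str:
    """Return the language for a snippet extension, raising on unsupported ones."""
    try:
        return EXTENSION_TO_LANGUAGE[extension.lower()]
    except KeyError:
        raise ValueError(f"Unsupported snippet extension: {extension}")
```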
### Running as library
```python
from skema.program_analysis.snippet_file_ingester import process_snippet
gromet_collection = process_snippet("x=2", ".py", write_to_file=True)
```
42 changes: 42 additions & 0 deletions docs/dev/using_tree_sitter_preprocessor.md
@@ -0,0 +1,42 @@
# tree-sitter Fortran preprocessor
## Command line options
### Required
- **source_path (str)** - The path to the Fortran source file that the preprocessor will be run on
- **out_path (str)** - The path to the directory where intermediate products will be stored
### Optional
- **overwrite (bool)** - If True, overwrite the files in the directory specified by out_path
- **out_missing_includes (bool)** - If True, output report of missing included files
- **out_gcc (bool)** - If True, output intermediate product generated by GCC
- **out_unsupported (bool)** - If True, output report of unsupported idioms contained in the source
- **out_corrected (bool)** - If True, output the final source that will be sent to tree-sitter.
- **out_parse (bool)** - If True, output the tree-sitter parse tree in sexp format
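The options above could be wired up with `argparse` roughly as follows (a sketch of the documented interface, not the preprocessor's actual implementation):

```python
import argparse

def build_arg_parser() -> argparse.ArgumentParser:
    """Build a CLI mirroring the documented preprocessor options (a sketch)."""
    parser = argparse.ArgumentParser(description="tree-sitter Fortran preprocessor")
    parser.add_argument("source_path", help="Fortran source file to preprocess")
    parser.add_argument("out_path", help="directory for intermediate products")
    # Each optional flag is a boolean switch, defaulting to False.
    for flag, help_text in [
        ("--overwrite", "overwrite files in out_path"),
        ("--out_missing_includes", "report missing included files"),
        ("--out_gcc", "output the GCC intermediate product"),
        ("--out_unsupported", "report unsupported idioms in the source"),
        ("--out_corrected", "output the corrected source sent to tree-sitter"),
        ("--out_parse", "output the tree-sitter parse tree in sexp format"),
    ]:
        parser.add_argument(flag, action="store_true", help=help_text)
    return parser
```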
## Preprocessing #include directives
To handle the include directive, the preprocessor requires a directory containing any additional files that a source file includes.

This directory should be located in the same directory as the source at source_path and follow the naming scheme *include_filename*, where *filename* is the source file's name without its extension.

For example, if the source file **cons.F** contains:

```fortran
#include <defs.h>
```

the directory structure should look like the following:

```
TIE_GCM/
├── cons.F
└── include_cons/
    └── defs.h
```
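Given that naming scheme, the include directory for a source file can be derived from its path. A sketch (the helper name is ours for illustration, not part of the preprocessor's API):

```python
from pathlib import Path

def include_dir_for(source_path: str) -> Path:
    """Return the expected include directory for a source file, following
    the include_<filename-without-extension> scheme described above."""
    src = Path(source_path)
    return src.parent / f"include_{src.stem}"
```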

## Running as library

```python
from skema.program_analysis.TS2CAST.preprocessor.preprocess import preprocess

parse_tree = preprocess("skema/data/TIE_GCM/cons.F", "skema/data/TIE_GCM/intermediate_products_cons/", out_parse=True)
```

## Running as script

```bash
python preprocess.py skema/data/TIE_GCM/cons.F skema/data/TIE_GCM/intermediate_products_cons/ --out_parse
```
5 changes: 5 additions & 0 deletions mkdocs.yml
@@ -42,6 +42,11 @@ nav:
- Code2fn coverage reports: "coverage/code2fn_coverage/report.html"
- Building docker images: "dev/docker.md"
- Publishing an incremental release: "dev/creating-an-incremental-release.md"
- Adding a new model: "dev/adding_new_model.md"
- Adding a new tree-sitter frontend: "dev/adding_new_tree_sitter_frontend.md"
- Generating code2fn model coverage reports: "dev/generating_code2fn_model_coverage.md"
- Using code ingestion frontends: "dev/using_code_ingestion_frontends.md"
- Using tree-sitter preprocessor: "dev/using_tree_sitter_preprocessor.md"
#- Getting Started: getting-started.md
# - User Guide:
# - Installation: install.md