Merge pull request #92 from nasa-jpl/dev
Merge dev into main to release v1.2.0
vlall authored Feb 1, 2022
2 parents d196b4e + 5e04005 commit 6b37e93
Showing 87 changed files with 268 additions and 37,004 deletions.
29 changes: 0 additions & 29 deletions .github/CONTRIBUTING.md

This file was deleted.

28 changes: 0 additions & 28 deletions .github/ISSUE_TEMPLATE.md

This file was deleted.

8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,13 @@
# Change Log

## [v1.2.0](https://github.com/nasa-jpl/ASSESS/tree/v1.2.0) (2022-01-31)
- Refactor the codebase entirely: remove old dashboard code, outdated ML, and outdated extractors, and consolidate the code that is still in use.
- Expand the API's capabilities, allow for ML training, and add the new FAISS vector library.
- Increase search capabilities.
- Dockerize everything.
- Allow app to use only Elasticsearch.
- Use different data sources and allow for bulk ingestion.

## [v1.1.0](https://github.com/nasa-jpl/ASSESS/tree/v1.1.0) (2019-09-26)
- Improve underlying ASSESS algorithm (run-time, complexity, extraction, interoperability). See issues [56](https://github.com/nasa-jpl/ASSESS/issues/56) and [48](https://github.com/nasa-jpl/ASSESS/issues/48).
- Upgrade to Python 3. See issue [44](https://github.com/nasa-jpl/ASSESS/issues/44).
20 changes: 11 additions & 9 deletions README.md
@@ -1,20 +1,22 @@
# Automatic Semantic Search Engine for Suitable Standards

ASSESS lets you run an API server that performs document similarity across large troves of text documents, and manage an application pipeline that supports ingesting, searching, inspecting, deleting, training on, logging, and editing documents.

**The problem**: given a Statement of Work (SoW), produce the standards that may be relevant to it.

To understand the backend code, view the API in [main.py](https://github.com/nasa-jpl/ASSESS/blob/master/api/main.py).

To understand the ML code, view [ml-core.py](https://github.com/nasa-jpl/ASSESS/blob/master/api/ml-core.py).

## Getting Started

There are a few main components to ASSESS:

- A React front-end
- A FastAPI server
- An Elasticsearch server with 3 data indices (main index, system logs, and user statistics)
- Kibana for viewing data
- A redis service for in-memory data storage and rate limiting

`docker-compose.yml` shows the software stack. You can run the stack using `docker-compose up -d`. Please note, you need the Elasticsearch index data in order to actually have these components working.

Make sure you edit `api/conf.yaml` with the correct server/port locations for elasticsearch.

To understand the backend code, look at the API in [main.py](https://github.com/nasa-jpl/ASSESS/blob/master/api/main.py)
Make sure you edit `api/conf.yaml` with the correct server/port locations for Elasticsearch. `docker-compose.yml` shows the software stack. You can run the stack using `docker-compose up -d`. Please note that you need the corresponding feather data in order to have everything working and ingested into Elasticsearch.
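
For orientation, here is a minimal sketch of the shape such a compose file might take. The service names, images, and ports below are illustrative assumptions, not the repository's actual `docker-compose.yml`:

```yaml
# Hypothetical sketch only -- service names, images, and ports are
# assumptions; see the repository's real docker-compose.yml.
version: "3.8"
services:
  frontend:        # React front-end
    build: ./front-end
    ports:
      - "3000:3000"
  api:             # FastAPI server
    build: ./api
    ports:
      - "8080:8080"
    depends_on: [elasticsearch, redis]
  elasticsearch:   # holds the main, system-log, and user-statistics indices
    image: docker.elastic.co/elasticsearch/elasticsearch:7.16.2
    environment:
      - discovery.type=single-node
  kibana:          # for viewing data
    image: docker.elastic.co/kibana/kibana:7.16.2
    depends_on: [elasticsearch]
  redis:           # in-memory data storage and rate limiting
    image: redis:6
```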

## Testing the Stack

You can test the Rest API with [assess_api_calls.py](https://github.com/nasa-jpl/ASSESS/blob/master/api/assess_api_calls.py)
You can test the REST API with [assess_api_calls.py](https://github.com/nasa-jpl/ASSESS/blob/master/api/scripts/assess_api_calls.py).
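
As a rough illustration of what such a test call might look like (the endpoint path, port, and payload shape here are assumptions; the real calls live in `assess_api_calls.py`):

```python
import requests

# Hypothetical sketch -- endpoint name, port, and payload shape are
# assumptions; consult api/scripts/assess_api_calls.py for the real calls.
API_URL = "http://localhost:8080"

sow = "The contractor shall perform environmental qualification testing..."
resp = requests.post(f"{API_URL}/recommend", json={"text": sow})
resp.raise_for_status()

# Print whatever standards the server recommends for this SoW.
print(resp.json())
```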
4 changes: 2 additions & 2 deletions api/Dockerfile
@@ -40,8 +40,8 @@ RUN curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
RUN apt install -y python3.8
RUN mv /usr/bin/python3.8 /usr/local/bin/python
RUN python get-pip.py
RUN python -m pip install --no-cache-dir -r requirements.txt
RUN python -m pip install --no-cache-dir -r ml_requirements.txt
RUN python -m pip install --no-cache-dir -r requirements/requirements.txt
RUN python -m pip install --no-cache-dir -r requirements/ml_requirements.txt
RUN pip3 install jpl.pipedreams==1.0.3
RUN python -m spacy download en_core_web_sm
RUN python -m pip install --no-cache-dir "uvicorn[standard]" gunicorn fastapi
2 changes: 1 addition & 1 deletion api/conf.yaml
@@ -4,7 +4,7 @@ password:
url: http://localhost:8080
df_paths:
- "data/source_1"
# - "data/source_2"
- "data/source_2"
es_server:
#- localhost
- elasticsearch
2 changes: 1 addition & 1 deletion api/data/README.md
@@ -1,4 +1,4 @@
# Instructions

Place the dataframe of the feather file here. This gets bound to the Docker container and used by the machine learning algorithm.
Place the feather files containing the dataframes here. Make sure your `api/conf.yaml` points to these files. This data gets ingested into the Elasticsearch Docker container and used by the machine learning algorithm.
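
For context, a bare-bones sketch of that ingestion path might look like the following; the index name, host, and file path are assumptions, not the repository's actual code:

```python
# Hypothetical sketch -- index name, host, and file path are assumptions;
# the real ingestion logic lives in the ASSESS API code.
import pandas as pd
from elasticsearch import Elasticsearch, helpers

df = pd.read_feather("api/data/source_1")  # path listed in api/conf.yaml
es = Elasticsearch(["http://localhost:9200"])

# Bulk-index every dataframe row as one Elasticsearch document.
helpers.bulk(
    es,
    ({"_index": "assess-standards", "_source": row.to_dict()}
     for _, row in df.iterrows()),
)
```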

6 changes: 3 additions & 3 deletions api/gunicorn_conf.py
@@ -32,9 +32,9 @@
use_accesslog = accesslog_var or None
errorlog_var = os.getenv("ERROR_LOG", "-")
use_errorlog = errorlog_var or None
graceful_timeout_str = os.getenv("GRACEFUL_TIMEOUT", "1400")
timeout_str = os.getenv("TIMEOUT", "1400")
keepalive_str = os.getenv("KEEP_ALIVE", "10")
graceful_timeout_str = os.getenv("GRACEFUL_TIMEOUT", "3600")
timeout_str = os.getenv("TIMEOUT", "3600")
keepalive_str = os.getenv("KEEP_ALIVE", "3600")

# Gunicorn config variables
loglevel = use_loglevel
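
Because these values are read from the environment, they can be overridden at startup without editing the file. A hypothetical example (the variable names come from the config above; the values and the `main:app` module path are assumptions):

```sh
# Override the defaults read via os.getenv in gunicorn_conf.py.
export GRACEFUL_TIMEOUT=7200
export TIMEOUT=7200
export KEEP_ALIVE=600
gunicorn -c gunicorn_conf.py main:app
```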
