Merge branches 'main' and 'main' of https://github.com/derrickmuheki/…
derrickmuheki committed Nov 29, 2024
2 parents 8577f30 + 9c0a3cd commit 20ed839
Showing 3 changed files with 29 additions and 6 deletions.
35 changes: 29 additions & 6 deletions README.md
@@ -2,11 +2,6 @@

Here we present MeteoSaver v1.0, machine-learning-based software for the transcription of historical weather data.

### Note: This README is still under development. However the code/scripts are up-to-date

![](docs/data_rescue_flowchart.png)


## Directory structure

Below is the structure for this project.
@@ -41,7 +36,7 @@ Below is the structure for this project.
|
|
├── src <- Modules (2-6): Transcribing code/scripts for MeteoSaver v1.0
│ ├── main.py <- Main script to run all the modules 1-6 of MetoSaver (scripts)
│ ├── main.py <- Main script to run all the modules 1-6 of MeteoSaver (scripts)
| | i.e. in order (i) configuration, (ii) image-preprocessing module, (iii) table and cell
| | detection model, (iv) transcription, (v) quality assessment and control,
| | and (vi) data formatting and upload
@@ -120,6 +115,34 @@ To spin up a docker image using:
docker run -it -v /local_data:/docker_data_dir transcribing_drc_data_environment
```

## Modules
The figure below shows the modules in MeteoSaver v1.0.

![Schematic representation of the modules in MeteoSaver v1.0](https://github.com/VUB-HYDR/MeteoSaver/blob/6a5238af498088d58173940e04fe8e5cf66567be/docs/Schematic%20representation%20of%20the%20modules%20in%20MeteoSaver%20v1.0.png)


## How to run MeteoSaver v1.0
After setting up the Python environment using the [environment.yml](https://github.com/VUB-HYDR/MeteoSaver/blob/b8138fa5a23f4ce40603cae8defd82d10734fdbd/environment.yml) file in this repository, enter the following settings, specific to your case study (sheets), in the [configuration module](https://github.com/VUB-HYDR/MeteoSaver/blob/b8138fa5a23f4ce40603cae8defd82d10734fdbd/configuration.ini) before running:
1. General: Here, you specify the environment in which the scripts will run, i.e. ```local``` (sequential processing on a personal computer) or ```hpc``` (parallel processing using multiple processors, suitable for High Performance Computing (HPC) environments). This is set to ```local``` by default.
2. Directories: Here, you specify the directories for the following: (i) all historical weather data sheet images, in folders per station, (ii) pre-QA/QC transcribed data, (iii) post-QA/QC transcribed data, (iv) the final refined daily hydroclimate data (after all quality checks), (v) transient transcription output during processing, (vi) manually transcribed data (used for validation), (vii) validation results comparing the manually transcribed data with the MeteoSaver-transcribed data, and (viii) metadata for all stations.
3. Table and Cell Detection: User specifications for table and cell detection.
4. Transcription: User specifications related to the Optical Character Recognition/Handwritten Text Recognition (OCR/HTR).
5. QA/QC: Here, you specify the parts of the transcribed table on which to perform QA/QC checks.
6. Data Formatting: Here, you specify the location of the date information in the tables, which is used to format the transcribed data into time series in .xlsx and .tsv (Station Exchange Format).
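
As a rough illustration of the General and Directories settings, the sketch below parses an INI-style configuration with Python's standard `configparser` and checks that the environment is either `local` or `hpc`. The section names (`general`, `directories`), the key names, and the example paths are assumptions made for this sketch; the actual `configuration.ini` in the repository may use different names.

```python
import configparser

# Hypothetical section and key names, used only for illustration;
# the real configuration.ini may be organised differently.
EXAMPLE_INI = """
[general]
environment = local

[directories]
images = data/station_images
pre_qc = output/pre_qc
post_qc = output/post_qc
"""

VALID_ENVIRONMENTS = {"local", "hpc"}


def load_settings(ini_text):
    """Parse INI text and return (environment, directories)."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # Fall back to "local", the default named in the README.
    environment = parser.get("general", "environment", fallback="local")
    environment = environment.strip().lower()
    if environment not in VALID_ENVIRONMENTS:
        raise ValueError(
            f"environment must be one of {sorted(VALID_ENVIRONMENTS)}, "
            f"got {environment!r}"
        )
    # Collect the directory settings as a plain dict (empty if absent).
    if parser.has_section("directories"):
        directories = dict(parser["directories"])
    else:
        directories = {}
    return environment, directories


environment, directories = load_settings(EXAMPLE_INI)
print(environment, directories)
```

Validating the environment value up front means a typo in the configuration fails immediately rather than partway through a long transcription run.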

After entering the configuration settings specific to your case study (see the figure under "Configuration user-settings" below), run the [main.py](https://github.com/VUB-HYDR/MeteoSaver/blob/7aeab0f526b44056c062407df7cfe467e20a67d8/src/main.py) script if using ```local```, or the [job_script](https://github.com/VUB-HYDR/MeteoSaver/blob/aefb8d8068762c5a7715be24dfee835363673ae7/job_script.sh) if using the ```hpc``` architecture. Either option runs all the modules 1-6 of MeteoSaver in order, i.e. (i) configuration, (ii) image-preprocessing module, (iii) table and cell detection model, (iv) transcription, (v) quality assessment and control, and (vi) data formatting and upload, and returns the results in the specified directories.
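
The choice between the two entry points can be sketched as a small helper that returns the command to launch for a given environment. This is only an illustration: the helper name `build_run_command` is hypothetical, and the scheduler command (`sbatch` here) is an assumption, since the README does not say which job scheduler the HPC system uses. The script paths are taken from the links above.

```python
import sys


def build_run_command(environment, scheduler="sbatch"):
    """Return the launch command (as an argument list) for an environment.

    `scheduler` is an assumption: substitute the submit command of the
    cluster you are actually using.
    """
    if environment == "local":
        # Sequential run on a personal computer.
        return [sys.executable, "src/main.py"]
    if environment == "hpc":
        # Parallel run: hand the job script to the cluster scheduler.
        return [scheduler, "job_script.sh"]
    raise ValueError(f"unknown environment: {environment!r}")


print(build_run_command("local"))
```

The returned list can be passed to `subprocess.run(command, check=True)` from the repository root to start the run.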

### Minimal Working Example (MWE)
You can run the entire script in this repository as a Minimal Working Example (MWE) without modifying any configuration settings. Simply set up the Python environment on your personal computer, i.e. ```local```, and execute the script using [main.py](https://github.com/VUB-HYDR/MeteoSaver/blob/7aeab0f526b44056c062407df7cfe467e20a67d8/src/main.py).


### Configuration user-settings
The figure below describes all the configuration user-settings.
![Configuration_user_settings](https://github.com/VUB-HYDR/MeteoSaver/blob/4ddd56d52b3dda19afc6227595eba0d6ca843c30/docs/Configuration%20user%20settings.png)




## Authors
Derrick Muheki

Binary file added docs/Configuration user settings.png
Binary file modified environment.yml
Binary file not shown.
