docs: Updated documentation for all the major scenarios. (#190)

* Fixed some bugs introduced during refactoring.
* update data_agent_fin_doc
* Updated documentation for the four major scenarios
* feat: remove pdfs and enable online pdf readings (#183)
* remove pdfs and enable online pdf readings
* update doc format
* use url as key
* feat: add entry for rdagent. (#187)
* Add entries
* update entry for rdagent
* lint
* fix typo
* docs: Demo links (#188) add demo links
* fix: Fix a fail href in readme (#189)
* fix a ci bug
* doc
* feat: remove pdfs and enable online pdf readings (#183)
* remove pdfs and enable online pdf readings
* update doc format
* use url as key
* feat: add entry for rdagent. (#187)
* Add entries
* update entry for rdagent
* lint
* fix typo
* doc
* Updated documentation for med_model scenarios.
* fix a ci bug

---------

Co-authored-by: Xu Yang <[email protected]>
Co-authored-by: you-n-g <[email protected]>
Co-authored-by: XianBW <[email protected]>
Co-authored-by: SH-Src <[email protected]>
microsoft · Aug 9, 2024 · 4b1e8e1 · 4b1e8e1
1 parent 44f61bf
commit 4b1e8e1

Showing 8 changed files with 301 additions and 136 deletions.
diff --git a/docs/scens/data_agent_fin.rst b/docs/scens/data_agent_fin.rst
@@ -10,13 +10,12 @@ Finance Data Agent
 
 📖 Background
 ~~~~~~~~~~~~~~
-In the dynamic world of quantitative trading, **factors** are the secret weapons that traders use to harness market inefficiencies. 
-
-These powerful tools—ranging from straightforward metrics like price-to-earnings ratios to intricate discounted cash flow models—unlock the potential to predict stock prices with remarkable precision. 
-By tapping into this rich vein of data, quantitative traders craft sophisticated strategies that not only capitalize on market patterns but also drastically enhance trading efficiency and accuracy. 
-
-Embrace the power of factors, and you're not just trading; you're strategically outsmarting the market.
+In the dynamic world of quantitative trading, **factors** serve as the strategic tools that enable traders to exploit market inefficiencies. 
+These factors—ranging from simple metrics like price-to-earnings ratios to complex models like discounted cash flows—are the key to predicting stock prices with a high degree of accuracy.
 
+By leveraging these factors, quantitative traders can develop sophisticated strategies that not only identify market patterns but also significantly enhance trading efficiency and precision. 
+The ability to systematically analyze and apply these factors is what separates ordinary trading from truly strategic market outmaneuvering.
+And this is where the **Finance Data Agent** comes into play.
 
 🎥 Demo
 ~~~~~~~~~~
@@ -82,7 +81,7 @@ You can try our demo by running the following command:
     - Create a new conda environment with Python (3.10 and 3.11 are well tested in our CI):
 
       .. code-block:: sh
-      
+
           conda create -n rdagent python=3.10
 
     - Activate the environment:
@@ -91,12 +90,12 @@ You can try our demo by running the following command:
 
           conda activate rdagent
 
-- 🛠️ Run Make Files
-    - Navigate to the directory containing the MakeFile and set up the development environment:
+- 📦 Install the RDAgent
+    - You can directly install the RDAgent package from PyPI:
 
       .. code-block:: sh
 
-          make dev
+          pip install rdagent
 
 - ⚙️ Environment Configuration
     - Place the `.env` file in the same directory as the `.env.example` file.
@@ -118,33 +117,12 @@ You can try our demo by running the following command:
 - **Env Config**
 
 The following environment variables can be set in the `.env` file to customize the application's behavior:
-    - **Path to the folder containing private data (default fundamental data in Qlib):**
-
-        .. code-block:: sh
-
-          FACTOR_CODER_DATA_FOLDER=/path/to/data/factor_implementation_source_data_all
-
-    - **Path to the folder containing partial private data (for debugging):**
-
-      .. code-block:: sh
-
-          FACTOR_CODER_DATA_FOLDER_DEBUG=/path/to/data/factor_implementation_source_data_debug
-
-    - **Maximum time (in seconds) for writing factor code:**
-
-      .. code-block:: sh
-
-          FACTOR_CODER_FILE_BASED_EXECUTION_TIMEOUT=300
-
-    - **Maximum number of factors to write in one experiment:**
-
-      .. code-block:: sh
-
-          FACTOR_CODER_SELECT_THRESHOLD=5
-
-    - **Number of developing loops for writing factors:**
-
-      .. code-block:: sh
 
-          FACTOR_CODER_MAX_LOOP=10
+.. autopydantic_settings:: rdagent.app.qlib_rd_loop.conf.FactorBasePropSetting
+    :settings-show-field-summary: False
+    :exclude-members: Config
 
+.. autopydantic_settings:: rdagent.components.coder.factor_coder.config.FactorImplementSettings
+    :settings-show-field-summary: False
+    :members: coder_use_cache, data_folder, data_folder_debug, cache_location, enable_execution_cache, file_based_execution_timeout, select_method, select_threshold, max_loop, knowledge_base_path, new_knowledge_base_path
+    :exclude-members: Config, fail_task_trial_limit, v1_query_former_trace_limit, v1_query_similar_success_limit, v2_query_component_limit, v2_query_error_limit, v2_query_former_trace_limit, v2_error_summary, v2_knowledge_sampler
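The lines removed above documented variables such as `FACTOR_CODER_MAX_LOOP`, while the new directive lists a `max_loop` member. The sketch below shows one way such a prefix-based mapping from environment variables to settings fields can work. It is a stdlib-only illustration, not the real `FactorImplementSettings` class: the class name `FactorCoderEnvSketch` is hypothetical, the `FACTOR_CODER_` prefix is inferred from the removed variable names, and the defaults are simply the example values from the removed documentation.

```python
import os
from dataclasses import dataclass, fields

# Assumption: prefix inferred from the variable names in the removed docs.
ENV_PREFIX = "FACTOR_CODER_"


@dataclass
class FactorCoderEnvSketch:
    """Hypothetical stand-in for illustration; not the real settings class."""

    # Placeholder defaults copied from the example values in the removed docs.
    file_based_execution_timeout: int = 300
    select_threshold: int = 5
    max_loop: int = 10

    @classmethod
    def from_env(cls, environ=None):
        """Override defaults with ENV_PREFIX + FIELD_NAME variables, if set."""
        environ = os.environ if environ is None else environ
        overrides = {}
        for field in fields(cls):
            raw = environ.get(ENV_PREFIX + field.name.upper())
            if raw is not None:
                # Every field in this sketch is an integer.
                overrides[field.name] = int(raw)
        return cls(**overrides)


if __name__ == "__main__":
    settings = FactorCoderEnvSketch.from_env({"FACTOR_CODER_MAX_LOOP": "3"})
    print(settings)
```

Passing a plain dictionary instead of `os.environ` keeps the sketch easy to try without touching the real environment.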
diff --git a/docs/scens/data_copilot_fin.rst b/docs/scens/data_copilot_fin.rst
@@ -17,7 +17,7 @@ Furthermore, rather than hastily replicating factors from a report, it's essenti
 Does the factor capture the essential market dynamics? How unique is it compared to the factors already in your library?
 
 Therefore, there is an urgent need for a systematic approach to design a framework that can effectively manage this process. 
-This is where our RDAgent comes into play.
+And this is where the **Finance Data Copilot** steps in.
 
 
 🎥 Demo
@@ -91,12 +91,12 @@ You can try our demo by running the following command:
 
           conda activate rdagent
 
-- 🛠️ Run Make Files
-    - Navigate to the directory containing the MakeFile and set up the development environment:
+- 📦 Install the RDAgent
+    - You can directly install the RDAgent package from PyPI:
 
       .. code-block:: sh
 
-          make dev
+          pip install rdagent
 
 - ⚙️ Environment Configuration
     - Place the `.env` file in the same directory as the `.env.example` file.
@@ -105,11 +105,21 @@ You can try our demo by running the following command:
     - If you want to change the default environment variables, you can refer to `Env Config`_ below
 
 - 🚀 Run the Application
-    .. code-block:: sh
+    - Store the financial reports you want to extract factors from in a folder of your choice. Then list the report paths in the JSON file that `report_result_json_file_path` points to. The format should be as follows:
 
-        rdagent fin_factor_report
+      .. code-block:: json
 
+          [
+              "git_ignore_folder/report/fin_report1.pdf",
+              "git_ignore_folder/report/fin_report2.pdf",
+              "git_ignore_folder/report/fin_report3.pdf"
+          ]
 
+    - Run the application using the following command:
+
+      .. code-block:: sh
+
+          rdagent fin_factor_report
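The JSON report list described in this step can be generated rather than typed by hand. The following is a minimal sketch, assuming the reports are PDF files in a single folder; the helper name `write_report_list` and the output file name `report_list.json` are hypothetical and not part of RDAgent.

```python
import json
from pathlib import Path


def write_report_list(report_dir, json_path):
    """Write the sorted PDF paths found in report_dir to json_path as a JSON array."""
    paths = sorted(p.as_posix() for p in Path(report_dir).glob("*.pdf"))
    Path(json_path).write_text(json.dumps(paths, indent=4), encoding="utf-8")
    return paths


if __name__ == "__main__":
    # Both paths are placeholders; point them at your own folder and JSON file.
    listed = write_report_list("git_ignore_folder/report", "report_list.json")
    print(f"listed {len(listed)} report(s)")
```

The resulting file has the same shape as the JSON example in this step: a flat array of path strings.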
 
 🛠️ Usage of modules
 ~~~~~~~~~~~~~~~~~~~~~
@@ -119,32 +129,13 @@ You can try our demo by running the following command:
 - **Env Config**
 
 The following environment variables can be set in the `.env` file to customize the application's behavior:
-    - **Path to the folder containing research reports:**
-
-      .. code-block:: sh
-
-          QLIB_FACTOR_LOCAL_REPORT_PATH=/path/to/research/reports
-
-    - **Path to the JSON file listing research reports for factor extraction:**
-
-      .. code-block:: sh
-
-          QLIB_FACTOR_REPORT_RESULT_JSON_FILE_PATH=/path/to/reports/list.json
 
-    - **Maximum time (in seconds) for writing factor code:**
-
-      .. code-block:: sh
-
-          FACTOR_CODER_FILE_BASED_EXECUTION_TIMEOUT=300
-
-    - **Maximum number of factors to write in one experiment:**
-
-      .. code-block:: sh
-
-          FACTOR_CODER_SELECT_THRESHOLD=5
-
-    - **Number of developing loops for writing factors:**
-
-      .. code-block:: sh
+.. autopydantic_settings:: rdagent.app.qlib_rd_loop.conf.FactorFromReportPropSetting
+    :settings-show-field-summary: False
+    :show-inheritance:
+    :exclude-members: Config
 
-          FACTOR_CODER_MAX_LOOP=10
+.. autopydantic_settings:: rdagent.components.coder.factor_coder.config.FactorImplementSettings
+    :settings-show-field-summary: False
+    :members: coder_use_cache, data_folder, data_folder_debug, cache_location, enable_execution_cache, file_based_execution_timeout, select_method, select_threshold, max_loop, knowledge_base_path, new_knowledge_base_path
+    :exclude-members: Config, python_bin, fail_task_trial_limit, v1_query_former_trace_limit, v1_query_similar_success_limit, v2_query_component_limit, v2_query_error_limit, v2_query_former_trace_limit, v2_error_summary, v2_knowledge_sampler
diff --git a/docs/scens/model_agent_fin.rst b/docs/scens/model_agent_fin.rst
@@ -9,7 +9,12 @@ Finance Model Agent
 
 📖 Background
 ~~~~~~~~~~~~~~
-TODO
+In the realm of quantitative finance, both factor discovery and model development play crucial roles in driving performance. 
+While much attention is often given to the discovery of new financial factors, the **models** that leverage these factors are equally important. 
+The effectiveness of a quantitative strategy depends not only on the factors used but also on how well these factors are integrated into robust, predictive models.
+
+However, the process of developing and optimizing these models can be labor-intensive and complex, requiring continuous refinement and adaptation to ever-changing market conditions. 
+And this is where the **Finance Model Agent** steps in.
 
 🎥 Demo
 ~~~~~~~~~~
@@ -19,9 +24,9 @@ TODO: Here should put a video of the demo.
 🌟 Introduction
 ~~~~~~~~~~~~~~~~
 
-In this scenario, our automated system proposes hypothesis, constructs model, implements code, receives back-testing, and uses feedbacks. 
-Hypothesis is iterated in this continuous process. 
-The system aims to automatically optimise performance metrics from Qlib library thereby finding the optimised code through autonomous research and development.
+In this scenario, our automated system proposes hypotheses, constructs models, implements code, conducts back-testing, and uses feedback in a continuous, iterative process.
+
+The goal is to automatically optimize performance metrics within the Qlib library, ultimately discovering the most efficient code through autonomous research and development.
 
 Here's an enhanced outline of the steps:
 
@@ -84,22 +89,58 @@ You can try our demo by running the following command:
 
           conda activate rdagent
 
-- 🛠️ Run Make Files
-    - Navigate to the directory containing the MakeFile and set up the development environment:
+- 📦 Install the RDAgent
+    - You can directly install the RDAgent package from PyPI:
 
       .. code-block:: sh
 
-          make dev
+          pip install rdagent
 
 - ⚙️ Environment Configuration
     - Place the `.env` file in the same directory as the `.env.example` file.
         - The `.env.example` file contains the environment variables required for users using the OpenAI API (Please note that `.env.example` is an example file. `.env` is the one that will be finally used.)
 
+    - Export each variable in the .env file:
+
+      .. code-block:: sh
+
+          export $(grep -v '^#' .env | xargs)
+    
+    - If you want to change the default environment variables, you can refer to `Env Config`_ below
+
 - 🚀 Run the Application
     .. code-block:: sh
 
         rdagent fin_model
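The `export $(grep -v '^#' .env | xargs)` step in the environment configuration above skips comment lines and exports each remaining `KEY=value` pair. The sketch below does the same thing from Python. It is an illustration only, and the helper names `parse_env_lines` and `load_env_file` are hypothetical. It changes only the current Python process and any child processes it starts, not the parent shell.

```python
import os


def parse_env_lines(lines):
    """Return KEY -> value pairs, skipping blank lines and # comments."""
    pairs = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, value = line.split("=", 1)
        pairs[key.strip()] = value.strip().strip("'\"")
    return pairs


def load_env_file(path=".env"):
    """Copy the variables from a .env file into os.environ."""
    with open(path, encoding="utf-8") as handle:
        pairs = parse_env_lines(handle)
    os.environ.update(pairs)
    return pairs
```

Because each line is parsed whole, a quoted value containing spaces is kept as one value, whereas the unquoted shell substitution splits on whitespace.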
 
 🛠️ Usage of modules
 ~~~~~~~~~~~~~~~~~~~~~
-TODO: Show some examples:
+
+.. _Env Config: 
+
+- **Env Config**
+
+The following environment variables can be set in the `.env` file to customize the application's behavior:
+
+.. autopydantic_settings:: rdagent.app.qlib_rd_loop.conf.ModelBasePropSetting
+    :settings-show-field-summary: False
+    :exclude-members: Config
+
+- **Qlib Config**
+    - The `config.yaml` file located in the `model_template` folder contains the relevant configurations for running the developed model in Qlib. The default settings include key information such as:
+        - **market**: Specifies the market, which is set to `csi300`.
+        - **fields_group**: Defines the fields group, with the value `feature`.
+        - **col_list**: A list of columns used, including various indicators such as `RESI5`, `WVMA5`, `RSQR5`, and others.
+        - **start_time**: The start date for the data, set to `2008-01-01`.
+        - **end_time**: The end date for the data, set to `2020-08-01`.
+        - **fit_start_time**: The start date for fitting the model, set to `2008-01-01`.
+        - **fit_end_time**: The end date for fitting the model, set to `2014-12-31`.
+
+    - The default hyperparameters used in the configuration are as follows:
+        - **n_epochs**: The number of epochs, set to `100`.
+        - **lr**: The learning rate, set to `1e-3`.
+        - **early_stop**: The early stopping criterion, set to `10`.
+        - **batch_size**: The batch size, set to `2000`.
+        - **metric**: The evaluation metric, set to `loss`.
+        - **loss**: The loss function, set to `mse`.
+        - **n_jobs**: The number of parallel jobs, set to `20`.
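The defaults listed above can be written down and sanity-checked in a few lines. The sketch below restates them as plain Python dictionaries and verifies that the fitting window lies inside the data window. The flat layout and the helper `check_windows` are assumptions for illustration; the actual `config.yaml` may nest these values differently.

```python
from datetime import date

# Flattened for illustration; values are the defaults listed in the docs above.
DATA_DEFAULTS = {
    "market": "csi300",
    "fields_group": "feature",
    "start_time": date(2008, 1, 1),
    "end_time": date(2020, 8, 1),
    "fit_start_time": date(2008, 1, 1),
    "fit_end_time": date(2014, 12, 31),
}

HYPERPARAMETER_DEFAULTS = {
    "n_epochs": 100,
    "lr": 1e-3,
    "early_stop": 10,
    "batch_size": 2000,
    "metric": "loss",
    "loss": "mse",
    "n_jobs": 20,
}


def check_windows(cfg):
    """Return a list of problems with the date windows (empty if none)."""
    problems = []
    if cfg["start_time"] > cfg["end_time"]:
        problems.append("start_time is after end_time")
    if cfg["fit_start_time"] > cfg["fit_end_time"]:
        problems.append("fit_start_time is after fit_end_time")
    if cfg["fit_start_time"] < cfg["start_time"] or cfg["fit_end_time"] > cfg["end_time"]:
        problems.append("fit window falls outside the data window")
    return problems


if __name__ == "__main__":
    print(check_windows(DATA_DEFAULTS) or "date windows look consistent")
```

A check like this is useful after editing the dates by hand, since a fitting window that extends past `end_time` would ask for data the configuration does not load.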
diff --git a/docs/scens/model_agent_med.rst b/docs/scens/model_agent_med.rst
@@ -1,5 +1,111 @@
 .. _model_agent_med:
 
-===================
+=======================
 Medical Model Agent
-===================
+=======================
+
+**🤖 Automated Medical Prediction Model Evolution**
+------------------------------------------------------------------------------------------
+
+📖 Background
+~~~~~~~~~~~~~~
+In this scenario, we consider the problem of risk prediction from patients' ICU monitoring data. We use a public EHR dataset, MIMIC-III, and extract a binary classification task for evaluating the framework.
+In this task, we aim to predict whether a patient will suffer from Acute Respiratory Failure (ARF) based on their first 12 hours of ICU monitoring data.
+
+🎥 Demo
+~~~~~~~~~~
+TODO: Add a video of the demo here.
+
+
+🌟 Introduction
+~~~~~~~~~~~~~~~~
+
+In this scenario, our automated system proposes hypotheses, constructs models, implements code, runs back-testing, and uses the resulting feedback.
+Hypotheses are iterated on throughout this continuous process.
+The system aims to automatically optimise medical prediction performance metrics, thereby finding the optimised code through autonomous research and development.
+
+Here's an enhanced outline of the steps:
+
+**Step 1 : Hypothesis Generation 🔍**
+
+- Generate and propose initial hypotheses based on previous experiment analysis and domain expertise, with thorough reasoning and justification.
+
+**Step 2 : Model Creation ✨**
+
+- Transform the hypothesis into a model.
+- Develop, define, and implement a machine learning model, including its name, description, and formulation.
+
+**Step 3 : Model Implementation 👨‍💻**
+
+- Implement the model code based on the detailed description.
+- Evolve the model iteratively as a developer would, ensuring accuracy and efficiency.
+
+**Step 4 : Backtesting with MIMIC-III 📉**
+
+- Conduct backtesting using the newly developed model on the extracted task from MIMIC-III.
+- Evaluate the model's effectiveness and performance in terms of AUROC score.
+
+**Step 5 : Feedback Analysis 🔍**
+
+- Analyze backtest results to assess performance.
+- Incorporate feedback to refine hypotheses and improve the model.
+
+**Step 6 : Hypothesis Refinement ♻️**
+
+- Refine hypotheses based on feedback from backtesting.
+- Repeat the process to continuously improve the model.
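Step 4 above evaluates the model by its AUROC score on a binary task. As a reminder of what that number measures, here is a minimal pure-Python sketch that computes AUROC as the fraction of positive/negative pairs in which the positive example scores higher, counting ties as half. The function name `auroc` is illustrative and this is not the evaluation code used by RDAgent.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy data: 1 marks a positive (e.g. ARF) case, 0 a negative case.
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints 0.75
```

The pairwise loop is quadratic in the number of samples, which is fine for a small illustration; for large datasets a rank-based implementation from an established library is the usual choice.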
+
+⚡ Quick Start
+~~~~~~~~~~~~~~~~~
+
+You can try our demo by running the following command:
+
+- 🐍 Create a Conda Environment
+    - Create a new conda environment with Python (3.10 and 3.11 are well tested in our CI):
+
+      .. code-block:: sh
+      
+          conda create -n rdagent python=3.10
+
+    - Activate the environment:
+
+      .. code-block:: sh
+
+          conda activate rdagent
+
+- 📦 Install the RDAgent
+    - You can directly install the RDAgent package from PyPI:
+
+      .. code-block:: sh
+
+          pip install rdagent
+
+- ⚙️ Environment Configuration
+    - Place the `.env` file in the same directory as the `.env.example` file.
+        - The `.env.example` file contains the environment variables required for users using the OpenAI API (Please note that `.env.example` is an example file. `.env` is the one that will be finally used.)
+
+    - Export each variable in the .env file:
+
+      .. code-block:: sh
+
+          export $(grep -v '^#' .env | xargs)
+    
+    - If you want to change the default environment variables, you can refer to `Env Config`_ below
+
+- 🚀 Run the Application
+    .. code-block:: sh
+
+        rdagent med_model
+
+🛠️ Usage of modules
+~~~~~~~~~~~~~~~~~~~~~
+
+.. _Env Config: 
+
+- **Env Config**
+
+The following environment variables can be set in the `.env` file to customize the application's behavior:
+
+.. autopydantic_settings:: rdagent.app.data_mining.conf.PropSetting
+    :settings-show-field-summary: False
+    :exclude-members: Config