v 2023.10.10
1. Updated notebooks
 notebook 02
  - added pip install moda instruction
 notebook 03
  - migrated to palmerpenguins
  - added data cleanup
  - added new graphics for split-apply-combine
  - included a separate notebook for PCA
2. Updated Quarto docs settings
- updated formatting on
3. Updated settings.ini to include updated dependencies
sangyu committed Oct 19, 2023
1 parent 3ca7a19 commit c5fad30
Showing 19 changed files with 6,791 additions and 3,674 deletions.
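
Item 3 of the commit message refers to dependencies listed in `settings.ini`, but that file's contents are not shown in this view. As a rough sketch only, assuming the file keeps its dependencies as a space-separated `requirements` key under `[DEFAULT]` (a common layout for nbdev-style projects, not confirmed here), the list could be read with the standard-library `configparser`. The sample contents and the `read_requirements` helper are hypothetical.

```python
import configparser

# Hypothetical settings.ini contents; the real file is not shown in this commit view.
SAMPLE_SETTINGS = """
[DEFAULT]
lib_name = moda
requirements = pandas matplotlib seaborn
"""


def read_requirements(ini_text, key="requirements"):
    """Return the space-separated dependency list stored under [DEFAULT]."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # Fall back to an empty string when the key is absent.
    raw = parser["DEFAULT"].get(key, "")
    return raw.split()


deps = read_requirements(SAMPLE_SETTINGS)
print(deps)
```

To inspect a real checkout, the text of the repository's own `settings.ini` would be passed in place of `SAMPLE_SETTINGS`.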
23 changes: 11 additions & 12 deletions README.md
@@ -1,20 +1,16 @@
Modern Data Analysis
================
# Modern Data Analysis

<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

By [Joses Ho](https://twitter.com/jacuzzijo), [Sangyu
Xu](https://xusangyu.com/), and [Adam
By [Sangyu Xu](https://xusangyu.com/), [Joses
Ho](https://twitter.com/jacuzzijo), [Yishan
Mai](https://twitter.com/myish_irl), and [Adam
Claridge-Chang](http://www.claridgechang.net/)

**Part of GMS6812 2022: Foundations of Precision Medicine hands-on
workshops (PhD programme in Clinical and Translational Sciences)**

------------------------------------------------------------------------

9 am - 1 pm, February 14, 2023

Duke-NUS Medical School
**This material is used in Duke-NUS Medical School classes:** \*
GMS6812: Foundations of Precision Medicine hands-on workshops for the
PhD programme in Clinical and Translational Sciences. \* IBM Ethics and
Personal and Professional Development Session

The goal of this class is to introduce biomedical scientists to data
analysis with Python notebooks. There are two parts to the session: a
@@ -33,3 +29,6 @@ Please do the following preparations before class.
If we have more time in class, you will also be introduced to our
estimation statistics package [DABEST
Introduction](dabest_introduction.html).

If you have any questions about these materials, please contact
`xusangyu at gmail.com`
172 changes: 34 additions & 138 deletions nbs/01_Introduction.ipynb
@@ -8,6 +8,9 @@
"---\n",
"output-file: introduction.html\n",
"title: 01. Introduction\n",
"date-modified: \"2023-10-19\"\n",
"#date: \"2022-10-15\"\n",
"#author: \"Sangyu Xu, Adam Claridge-Chang\"\n",
"\n",
"---"
]
@@ -35,31 +38,52 @@
"\n",
"*If any issues can’t be resolved with the below steps, we can work on it in the class together.*\n",
"\n",
"### Getting the necessary software\n",
"1. You'll need to get set up with a [version-control](https://en.wikipedia.org/wiki/Version_control) system. Go to [GitHub](https://github.com/) and get an account. Download and install [GitHub Desktop](https://desktop.github.com/).\n",
"\n",
"2. Retrieve the course materials from GitHub. Go to the course repository (\"repo\") at https://github.com/ACCLAB/moda. Click the green <mark style=\"background-color: lightgreen\">Code</mark> button and then select <mark style=\"background-color: lightgray\">Open with GitHub Desktop</mark>.<img src=\"images/clonemoda.png\" alt=\"clonemoda.png\" width=\"600\"/> <br>You will be prompted to select a directory for the local repository. If you are using a PC it can be something like this: <br><img src=\"images/clonedir.png\" alt=\"clonedir.png\" width=\"500\"/> <br>If you are using a mac, it can be something like \"//Users/YOURUSERNAME/Documents/GitHub/moda\".\n",
"2. Retrieve the course materials from GitHub. Go to the course repository (\"repo\") at https://github.com/ACCLAB/moda. Click the green <mark style=\"background-color: lightgreen\">Code</mark> button and then select <mark style=\"background-color: lightgray\">Open with GitHub Desktop</mark>.<img src=\"images/clonemoda.png\" alt=\"clonemoda.png\" width=\"600\"/> <br>You will be prompted to select a directory for the local repository. If you are using a PC it can be something like this: <br><img src=\"images/clonedir.png\" alt=\"clonedir.png\" width=\"500\"/> <br>If you are using a mac, it can be something like `//Users/YOURUSERNAME/Documents/GitHub/moda`.\n",
"\n",
"3. To get set up with Python and [Jupyter](https://en.wikipedia.org/wiki/Project_Jupyter) notebooks, install the [Anaconda\n",
" Distribution](https://www.anaconda.com/download/) on your laptop."
" Distribution](https://www.anaconda.com/download/) on your laptop.\n",
"\n",
    "4. Open Anaconda Navigator and open a terminal window by clicking on `Environments` > `base (root)`, then clicking on the green triangle and selecting `Open Terminal`.\n",
"\n",
"<img src=\"images/openterminal.png\" alt=\"openterminal.png\" width=\"600\"/> <br>\n",
"\n",
"5. Go to your moda directory (replace the path with your own actual path) and install it with pip:<br>\n",
" `cd Documents/GitHub/moda`<br>\n",
" `pip install .`<br>"
]
},
{
"cell_type": "markdown",
"id": "666c283a-337a-4e42-a71d-da7a4b70f7ea",
"metadata": {},
"source": [
"4. Open Anaconda Navigator and launch [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/user/interface.html) by clicking on it.<img src=\"images/launchjupyterlab.png\" alt=\"launchjupyterlab.png\" width=\"600\"/> <br>JupyterLab will open in a browser tab.\n",
"### Checking out the notebooks\n",
    "6. In Anaconda Navigator, launch [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/user/interface.html) by clicking on it.\n",
"\n",
"5. In the File Browser panel in JupyterLab, navigate to the folder where you cloned the course repo (refer to step 2). Double click on 'nbs'. You should see a list of notebook files. Open \"02_Quick_tour_of_the_Notebook.ipynb\" by double-clicking on the icon shown in the JupyterLab browser window.<img src=\"images/opennotebook.png\" alt=\"opennotebook.png\" width=\"600\"/>\n",
"<img src=\"images/launchjupyterlab.png\" alt=\"launchjupyterlab.png\" width=\"600\"/> <br>JupyterLab will open in a browser tab.\n",
"\n",
"6. Work through the notebook. Familiarize yourself with basic Python,\n",
" and with working in the JupyterLab environment.\n",
"7. In the File Browser panel in JupyterLab, navigate to the folder where you cloned the course repo (refer to step 2). Double click on 'nbs'. You should see a list of notebook files. Open \"02_Quick_tour_of_the_Notebook.ipynb\" by double-clicking on the icon shown in the JupyterLab browser window.\n",
"<img src=\"images/opennotebook.png\" alt=\"opennotebook.png\" width=\"600\"/>\n",
"\n",
"7. Read about [pandas](https://pandas.pydata.org/),\n",
"8. Work through the notebook. Familiarize yourself with basic Python,\n",
" and with working in the JupyterLab environment.\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "6929841a-d49f-4f08-99dc-c05d9b919d6e",
"metadata": {},
"source": [
    "### Reading about the Python packages we will use\n",
"9. Read about [pandas](https://pandas.pydata.org/),\n",
" [matplotlib](https://matplotlib.org/), and\n",
" [seaborn](https://seaborn.pydata.org/).\n",
"\n",
"8. Read our papers on estimation statistics\n",
"10. Read our papers on estimation statistics\n",
" [here](https://zenodo.org/record/60156) and\n",
" [here](https://doi.org/10.1101/377978).\n"
]
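
The setup steps in the hunk above install Anaconda and the course package, and the reading list names pandas, matplotlib, and seaborn. A minimal sketch for checking that these packages can be found by the current Python environment, runnable in a notebook cell or in the terminal opened from Anaconda Navigator. `check_packages` is a hypothetical helper for illustration and is not part of the moda repository.

```python
import importlib.util


def check_packages(names):
    """Map each top-level package name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Packages named in the preparation steps.
status = check_packages(["pandas", "matplotlib", "seaborn"])
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

Any package reported as missing would suggest that the Anaconda installation or the `pip install .` step did not complete in the environment being used.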
@@ -84,146 +108,18 @@
" [here](https://sangyu.github.io/Evidence-Session/lab?path=Notebooks%2F01.+Data+Analysis+with+Jupyter+and+Python.ipynb)."
]
},
{
"cell_type": "markdown",
"id": "4dc195ef",
"metadata": {},
"source": [
"## Further practice and resources\n",
"\n",
"Try using the [estimationstats.com](https://www.estimationstats.com/#/) web app to analyze your own grouped data.\n",
"\n",
"Open and have a look at the sample multivariate\n",
"[data](https://docs.google.com/spreadsheets/d/1F0c5I_S9_NnLKPMQxJkEfzGfhzQeR26SgkiHSTFwKDE/edit?usp=sharing).\n",
"Go through the [introductory\n",
"notebook](https://drive.google.com/file/d/1m_l4k5ZaUc03hpvcfBd_Riy2nXYDpFXg/view?usp=sharing)\n",
"that demonstrates data analysis.\n",
"\n",
"We recommend the following texts to strengthen your data-analysis and\n",
"presentation skills. They can be dipped into over the coming months or\n",
"years, and used as references. Being familiar with some or all of this\n",
"material will help you write your first-author paper/s and doctoral\n",
"thesis.\n",
"\n",
"### *Key resources*\n",
"\n",
"- Estimation: Our\n",
" [estimationstats.com](https://www.estimationstats.com/#/background)\n",
" site has introductory information on estimation and specific types\n",
" of\n",
" [analyses](https://www.estimationstats.com/#/user-guide/two-independent-groups)\n",
" and [effect\n",
" sizes](https://www.estimationstats.com/#/about-effect-sizes).\n",
"\n",
"- Datavis: Claus Wilke’s free online\n",
" [book](https://clauswilke.com/dataviz/index.html) is a great\n",
" introduction to data visualization, and a style guide. It is written\n",
" in R, which is the best language for statistics.\n",
"\n",
"- Coding: There are many online resources to learn coding. Published\n",
" in 2021, [A Data-Centric Introduction to\n",
" Computing](https://dcic-world.org/) uses a Python-like teaching\n",
" language ([Pyret](https://www.pyret.org/)) to introduce key concepts\n",
" in computer science.\n",
"\n",
"### *Additional resources*\n",
"\n",
"#### *Some are free, some you will need to buy or borrow from the library.*\n",
"\n",
"- Estimation: If you want to learn about estimation statistics in\n",
" greater depth, there is Calin-Jageman and Cumming’s\n",
" [textbook](http://thenewstatistics.com/itns/) that is well-written,\n",
" funny, and clear. The authors also run a\n",
" [blog](https://thenewstatistics.com/itns/).\n",
"\n",
"- Estimation: Christoph Bernard’s account of the pioneering experience\n",
" of a major journal (*eNeuro*) recommending estimation as standard:\n",
" the [initial\n",
" announcement](https://www.eneuro.org/content/6/4/ENEURO.0259-19.2019),\n",
" [author\n",
" feedback](https://blog.eneuro.org/2021/02/discussion-est-stats-author-feedback),\n",
" and [after one\n",
" year](https://www.eneuro.org/content/8/2/ENEURO.0091-21.2021).\n",
"\n",
"- Coding: The paid coding tutorial [Learn Python The Hard\n",
" Way](https://learncodethehardway.org/python/) has a good reputation,\n",
" but there are also many free options (see\n",
" [DCIC](https://dcic-world.org/) above) with great reviews.\n",
"\n",
"- Coding: It will help to learn to use your computer’s\n",
" [Unix-style](https://youtu.be/tc4ROCJYbm0) command-line\n",
" [shell](https://en.wikipedia.org/wiki/Unix_shell). This interface\n",
" will allow you to use package managers like\n",
" [conda](https://docs.conda.io/en/latest/) and\n",
" [homebrew](https://brew.sh/), version-control tools like\n",
" [git](https://git-scm.com/), and other important tools. There are\n",
" many [books](https://www.linuxcommand.org/index.php) about the\n",
" shell, with only minor differences between MacOS,\n",
" [Windows](https://www.howtogeek.com/249966/how-to-install-and-use-the-linux-bash-shell-on-windows-10/),\n",
" and Linux.\n",
"\n",
"- Datavis: A brief guide to oral–visual data presentations\n",
" ([talks](http://www.howtogiveatalk.com/)).\n",
"\n",
"- Datavis: A reader-funded textbook on\n",
" [typography](http://practicaltypography.com/presentations.html),\n",
" including for slides. Since so much communication relies on text,\n",
" typography is an important part of the data interface.\n",
"\n",
"- Datavis: For historical perspectives, Edward Tufte’s\n",
" [books](https://www.edwardtufte.com/tufte/) are classic texts to\n",
" develop your design skills, and there is Friendly and Wainer’s\n",
" [History of Data\n",
" Visualization](https://friendly.github.io/HistDataVis/).\n",
"\n",
"- As you progress, you will want to develop your skills in areas like\n",
" bioinformatics, image processing, and/or machine learning. The iris\n",
" dataset is widely used for training in multivariate data analysis,\n",
" with many online tutorials."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "66d8d30a",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"The social reasons to learn programming also apply to research programming.\n",
"<iframe width=\"600\" height=\"400\"\n",
"src=\"https://www.youtube.com/embed/kgicuytCkoY\">\n",
"</iframe>\n"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%HTML\n",
"The social reasons to learn programming also apply to research programming.\n",
"<iframe width=\"600\" height=\"400\"\n",
"src=\"https://www.youtube.com/embed/kgicuytCkoY\">\n",
"</iframe>"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cef28733",
"id": "529fd030-8a95-45c2-abe8-c0d3eba29bc3",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "python3",
"language": "python",
"name": "python3"
}
12 changes: 7 additions & 5 deletions nbs/02_Quick_tour_of_the_Notebook.ipynb
@@ -7,7 +7,9 @@
"---\n",
"output-file: quick_tour_of_the_notebook.html\n",
"title: 02. A Quick Tour of The Notebook\n",
"\n",
"#author: \"Joses Ho, Sangyu Xu\"\n",
"#date: \"15/10/2017\"\n",
"date-modified: \"2023-10-19\"\n",
"---\n",
"\n"
]
@@ -67,7 +69,7 @@
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd # Don't worry about this line yet. We'll explain it later below!"
    "import pandas as pd # Don't worry about this line yet. We'll explain it later below!"
]
},
{
@@ -104,7 +106,7 @@
"metadata": {},
"outputs": [],
"source": [
"#pd.read_csv("
"# pd.read_csv("
]
},
{
@@ -133,7 +135,7 @@
"metadata": {},
"outputs": [],
"source": [
"#pd.r"
"# pd.r"
]
},
{
@@ -645,7 +647,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "python3",
"language": "python",
"name": "python3"
}