Data Science Prep Course Repository

Welcome to the Data Science Prep Course repository. Here you'll find all the information needed to set up your environment, along with the workflow you'll use during the Prep Course.

  1. Initial Setup
    1. Videos
    2. Windows Setup
    3. Setup Git and GitHub
    4. Setup your Workspace Repository
    5. Get the Learning Material
    6. Running a Learning Unit
  2. Learning Unit Workflow
  3. Updates to Learning Units
  4. Help
    1. Troubleshooting
  5. Tips and Tricks

Initial Setup

IMPORTANT
You must complete these instructions before the prep course starts. This is essential.

Once you complete the setup, mark yourself as completed (Yes) on this spreadsheet. Make sure that you complete the setup by the 30th of March, as the course will begin on that day. If you are struggling to install any of the software mentioned below, tell us ASAP! The course itself will be very intensive, so we do not want you to waste time setting up after the 30th of March!

By completing this you will set up and learn about all the tools you'll be using during the academy. We will also be able to identify any problems in time to figure out a solution.

Don't worry if you can't figure out what some of the commands you will use do. Anything that is important will be explained in more detail during the course.

Videos

Here you can find some video guides that follow this setup:

Windows Setup

This section deals with setting up either Windows Subsystem for Linux (WSL) or VMware. If you are using MacOS or Linux you can skip this section.

If you are using Windows 10, we suggest using WSL (see below). If you are using an older Windows version, we also support running a virtual Linux machine with VMware.

Why do I need to install either WSL or VMware?

Because command line syntax differs between Windows and Mac OS/Linux, it would be a great challenge for us to support and provide instructions for both. So we ask you to install Windows Subsystem for Linux or VMware, either of which lets you run Linux commands inside Windows. Keep in mind that these are simply additions to your Windows operating system, so installing them will not change anything else on your laptop. It is also quick and easy to do.

If you cannot install WSL or VMware for whatever reason (e.g. you don't have admin rights on your computer), you can still join the Prep Course and follow the learning materials. However, all of our setup instructions and learning materials are created for Mac OS/Linux, and unfortunately we will not be able to provide support on how to do it on Windows. If you have any doubts or worries, feel free to reach out to us.

Windows 10 Setup

Follow this guide if you are running Windows 10.

Older Windows Setup

If you are running an older version of Windows (such as Windows 8 or 7), follow the guide below about running Ubuntu on Windows using VMware Player. You'll need to download VMware and Ubuntu 18. Please use the links provided below for that (not the links provided in the tutorial).

MacOS Setup

Some of the steps in the following sections will require Homebrew for MacOS. Homebrew will make it easier to install software that we'll use later on.
To open the terminal, choose one:

  • In Finder, open the /Applications/Utilities folder, then double-click Terminal.
  • By pressing cmd + space then type terminal and press enter.

The terminal should now be open:

Copy and paste the following line in the terminal:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"

You may be prompted to install the Command Line Developer Tools. Confirm, and once it finishes, continue installing Homebrew by pressing enter again.

Setup Git and GitHub

Git is a distributed version-control system for tracking changes in source code.
A repository is where code lives. The code from the prep course lives in the ds-prep-course repository, where the learning materials and exercises will be released (made available).

Install Git

Under Ubuntu

Open a terminal (or use one you've already opened) and run:

sudo apt update && sudo apt upgrade && sudo apt install git

Under MacOS

brew install git
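
To confirm that the installation worked, a quick check along these lines may help. This is only a minimal sketch: check_tool is a hypothetical helper name that is not part of the course material, and it relies on nothing but the standard command -v shell built-in.

```shell
# check_tool is a hypothetical helper (not from the course material).
# It reports whether a command is available on your PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

# Check the tool installed in this section.
check_tool git
```

If the last line reports that git is missing, repeat the install step for your operating system.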

Create a GitHub account

Sign up for a GitHub account and follow the instructions.

Setup your Workspace Repository

The workspace directory/repository is where you will place everything you are working on, where you will make changes to files, write code, etc.

Creating the Workspace

  1. Log into GitHub
  2. In the upper-right corner of the page, there should be a + button. Click it, then select New repository.
  3. Create a new private GitHub repository called ds-prep-workspace, see Creating a new repository.
  4. You need to explicitly select Private, since this is your private work environment.
  5. Initialize with a README.
  6. Add a Python .gitignore.

Cloning the Workspace

  • Open a Terminal (or use one you've already opened)
  • We're going to have a folder named projects where we will keep the repositories we'll be using.
  • We're going to use the mkdir command to create it, and the cd command to enter the folder:
mkdir ~/projects
cd ~/projects
  • You can now clone (retrieve from GitHub) your /ds-prep-workspace repository using the git clone command:

    Note: in the link used in the command below, be sure to replace <username> with your GitHub username (e.g. my GitHub username is buedaswag, so the link would be https://github.com/buedaswag/ds-prep-workspace.git).

git clone https://github.com/<username>/ds-prep-workspace.git
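  • When prompted, type your GitHub username, then press enter
  • Then type your password, then press enter (GitHub no longer accepts account passwords for Git over HTTPS, so use a personal access token here)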
  • You're all set!
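
The username substitution in the clone link can be sketched as below. workspace_url is a hypothetical helper name introduced only for illustration. The link format is the one from the note above.

```shell
# workspace_url is a hypothetical helper that fills in the <username> placeholder.
workspace_url() {
    echo "https://github.com/$1/ds-prep-workspace.git"
}

# Example with the username used in the note above.
workspace_url "buedaswag"
```

The printed link is what you would pass to git clone.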

Get the Learning Material

You will be cloning the ds-prep-course repository. All of the learning material you need will be made available on this repo as the academy progresses.

  1. Open a Terminal (or use one you've already opened)
  2. Make sure you're in the right directory (use the cd command to enter the ~/projects)
  3. Clone the students repository ds-prep-course
cd ~/projects
git clone https://github.com/LDSSA/ds-prep-course.git

Get the Week 0 Learning Unit

In the ds-prep-course repository that you just cloned there is a Week 0 learning unit. It's used to give instructors guidelines to produce the learning units. We are also using it to ensure that you are able to run and submit a learning unit.

So go ahead and copy the Week 0 directory that contains the SLU000 - Jupyter Notebook from the ds-prep-course repository to your repository (named ds-prep-workspace).

You can do that either using the command line, or the Operating System's Graphical User Interface.

Using the command line

If you have both the ds-prep-course and ds-prep-workspace in a projects directory you could do it using the command line like this:

cp -r ~/projects/ds-prep-course/"Week 0" ~/projects/ds-prep-workspace
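
If you would like to see how this kind of copy behaves before touching your real repositories, the following sketch repeats it inside a throwaway directory. It assumes nothing beyond standard shell tools. The sandbox variable and the empty requirements.txt are made up for the demonstration.

```shell
# Practice copy in a throwaway directory, so your real repositories are untouched.
sandbox="$(mktemp -d)"
mkdir -p "$sandbox/ds-prep-course/Week 0/SLU000 - Jupyter Notebook"
mkdir -p "$sandbox/ds-prep-workspace"
touch "$sandbox/ds-prep-course/Week 0/SLU000 - Jupyter Notebook/requirements.txt"

# The quotes keep "Week 0" together as one path despite the space.
cp -r "$sandbox/ds-prep-course/Week 0" "$sandbox/ds-prep-workspace"

# List what arrived in the practice workspace.
ls "$sandbox/ds-prep-workspace/Week 0"
```

Without the quotes, the shell would treat Week and 0 as two separate arguments and the copy would fail.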

Using the Operating System's Graphical User Interface

  • On WSL with Ubuntu:
    • First enter the ~/projects/ds-prep-course directory using the cd command, then run explorer.exe . (don't forget to include the dot! The dot means "current directory") to open Windows Explorer in the current directory:
cd ~/projects/ds-prep-course
explorer.exe .

Windows Explorer should pop up now:

  • On Mac:
    • In Finder, open the "Go" menu and choose the option "Go to folder...", then paste the path to the ds-prep-course repository: ~/projects/ds-prep-course, then click "Go".

Running a Learning Unit

Creating Python Virtual Environment and installing the necessary packages

Below are the instructions, which are enough to get the setup done and get you up and running :)
You can also follow this guide for a more in-depth set of instructions that accomplish exactly the same thing.

You should always be using a virtual environment to install python packages. We'll use venv to set them up.

To install and update packages, we'll be using pip which is the reference Python package manager.

If you are using Ubuntu, you will need to install a couple of packages first. This can be done in a terminal by running:

sudo apt update && sudo apt upgrade && sudo apt install python3-pip python3-venv

If you are using Mac OS, you will need to install Python. This can be done in a terminal by running:

brew install python

Start by ensuring pip, setuptools, and wheel are up to date:

python3 -m pip install --user --upgrade pip setuptools wheel
  • Create a virtual environment with the name prep-venv
python3 -m venv ~/.virtualenvs/prep-venv
  • Activate the environment
source ~/.virtualenvs/prep-venv/bin/activate

Note: after you activate your virtual environment, you should see its name in parentheses at the far left of your command line, like this:

mig@macbook-pro % source ~/.virtualenvs/prep-venv/bin/activate
(prep-venv) mig@macbook-pro %

You can make sure your virtual environment is active using the which command:

(prep-venv) mig@macbook-pro % which python
/Users/mig/.virtualenvs/prep-venv/bin/python

The path points inside prep-venv, which means that our virtual environment is active.

Please don't forget to update pip:

pip install -U pip

IMPORTANT!!! make sure that your virtual environment is active before you proceed
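
Another way to check, shown here as a minimal sketch, uses the VIRTUAL_ENV variable that the activate script sets. venv_status is a hypothetical helper name and not something the course provides.

```shell
# venv_status is a hypothetical helper (not from the course material).
# The activate script sets VIRTUAL_ENV to the path of the active environment.
venv_status() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        echo "active: $(basename "$VIRTUAL_ENV")"
    else
        echo "no virtual environment active"
    fi
}

venv_status
```

With prep-venv activated, the helper should report that name. Otherwise, run the source command above again.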

  • Now you're ready to install packages! Just enter the directory of the SLU000 - Jupyter Notebook using the cd command, and install the required packages that are enumerated in the requirements.txt file
cd ~/projects/ds-prep-workspace/"Week 0"/"SLU000 - Jupyter Notebook"
pip install -r requirements.txt

Working on the Learning Unit

All learning units come as a set of Jupyter Notebooks (and some links to presentations). Notebooks are documents that can contain text, images and live code that you can run interactively.

In this section we will launch the Jupyter Notebook application. The application is accessed through the web browser.

Once you have the application open, feel free to explore the sample learning unit structure. It will give you a handle on what to expect and what rules the instructors follow (and the effort they put in) when creating a learning unit.

So let's start the Jupyter Notebook app:

  • Activate your virtual environment
  • Enter the Learning unit directory in your workspace directory (ds-prep-workspace).

    Note: It is VERY IMPORTANT that you ALWAYS work on the files in your ds-prep-workspace repository, and NEVER work on files that are in your ds-prep-course repository!

  • Run the jupyter notebook

    Windows 10 note: if you are running Windows 10, the command to run the jupyter notebook is: jupyter notebook --NotebookApp.use_redirect_file=False

source ~/.virtualenvs/prep-venv/bin/activate
cd ~/projects/ds-prep-workspace/"Week 0"/"SLU000 - Jupyter Notebook"
jupyter notebook

When you run the jupyter notebook command, you should see startup messages in your terminal, including a link that contains localhost. Your browser should then pop up with Jupyter open. If this does not happen, simply copy that link from your terminal and paste it into your browser's address bar.

Note: If you see some scary-looking error messages in the terminal at this point, don't worry, you can just ignore them.

The Exercise Notebook

Make sure you open and go through the Learning Notebook first.

Every learning unit contains an exercise notebook with exercises you will work on. So let's have a look at the sample Learning Unit.

  1. On the Jupyter Notebook UI in the browser, open the exercise notebook
  2. Follow the instructions provided in the notebook

You'll see cells with the exercises and cells for you to write solutions.

Once you've solved the whole notebook, we recommend following this simple checklist to avoid surprises.

  1. Save the notebook (again)
  2. Run "Restart & Run All"
  3. At this point the notebook should have run without any error messages showing up.
  4. When you're done (after saving your work), you can go to the terminal and close it.

Commit and Push

Note: It is VERY IMPORTANT that you ALWAYS work on the files in your ds-prep-workspace repository, and NEVER work on files that are in your ds-prep-course repository! So before you do this step, make sure that the files you made changes to are the ones in your ds-prep-workspace folder.

Now you have worked on the sample learning unit and you have some uncommitted changes. It's time to commit the changes, which just means adding them to your ds-prep-workspace repository history, and pushing this history to your remote on GitHub.

  • First you need to configure your email and username (replace "<your email>" with your email, and "buedaswag" with your username):
git config --global user.email "<your email>"
git config --global user.username "buedaswag"
git config --global user.name "Bueda Swag"
  • Using the terminal first make sure you're in the right directory (using the cd command), then commit and push the changes
cd ~/projects/ds-prep-workspace
git add .
git commit -m 'Testing the sample notebook'
git push
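  • When prompted, type your GitHub username, then press enter
  • Then type your password, then press enter (GitHub no longer accepts account passwords for Git over HTTPS, so use a personal access token here)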
  • You're all set!
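
Since committing from the wrong repository is an easy mistake to make, a small check like the following could be run before git add. It is only a sketch: which_repo is a hypothetical helper name, and it simply inspects a directory path for the two repository names used in this guide.

```shell
# which_repo is a hypothetical helper (not from the course material).
# It classifies a path by the repository folder name it contains.
which_repo() {
    case "$1" in
        */ds-prep-workspace|*/ds-prep-workspace/*) echo "workspace" ;;
        */ds-prep-course|*/ds-prep-course/*) echo "course" ;;
        *) echo "other" ;;
    esac
}

# Classify the directory you are currently in.
which_repo "$PWD"
```

Anything other than workspace means you should cd to ~/projects/ds-prep-workspace before committing.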

Learning Unit Workflow

You will need to follow this workflow every week starting from week 1.

Learning units will be announced in the academy's #announcements channel. Once announced, they are available in the ds-prep-course repository.
A new Learning Unit is released every Monday, and its solutions are then released the next Monday.

The steps you followed during the initial setup are exactly what you are going to be doing for each new Learning Unit. Here's a quick recap:

  1. Once a new Learning Unit is available at the beginning of each week, pull the changes from the ds-prep-course repo:
    • enter the ~/projects/ds-prep-course/ using the cd command, then use the git pull command:
    cd ~/projects/ds-prep-course/
    git pull

    note that this will also pull the solutions for the Learning Unit of the previous week

  2. Copy the Learning Unit to your ds-prep-workspace repo
    • To do that you can use the cp command:
    cp -r ~/projects/ds-prep-course/"Week <week number>" ~/projects/ds-prep-workspace
    and you would replace the <week number> with the week number, such that in week 0, for example, the command would be:
    cp -r ~/projects/ds-prep-course/"Week 0" ~/projects/ds-prep-workspace
  3. Activate your virtual environment
    source ~/.virtualenvs/prep-venv/bin/activate
  4. Install the Python packages from requirements.txt for the specific SLU (you must do this for each SLU, and there are multiple SLUs in a Week)
    pip install -r ~/projects/ds-prep-workspace/"Week <week number>"/"<SLU name>"/requirements.txt
    and you would replace <week number> and <SLU name>, such that in Week 0 and SLU000 - Jupyter Notebook, for example, the command would be:
    pip install -r ~/projects/ds-prep-workspace/"Week 0"/"SLU000 - Jupyter Notebook"/requirements.txt
  5. Change to the ds-prep-workspace dir
    cd ~/projects/ds-prep-workspace
  6. Open Jupyter Notebook
    jupyter notebook
  7. Work
  8. Once all tests pass or once you're happy, save your work, close the browser tab with the Jupyter Notebook, close the terminal and open a new terminal
  9. Then commit the changes and push
    cd ~/projects/ds-prep-workspace
    git add .
    git commit -m "Work on week <week number> exercises"
    git push
  10. Profit
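
The placeholders in the steps above can be filled in with a pair of small helpers, sketched here under the assumption that your repositories live in ~/projects as set up earlier. The names PROJECTS, slu_requirements and commit_message are hypothetical and exist only for this illustration.

```shell
# Assumes the ~/projects layout from the initial setup.
PROJECTS="$HOME/projects"

# Hypothetical helper: path to the requirements.txt of one SLU in your workspace.
slu_requirements() {
    echo "$PROJECTS/ds-prep-workspace/Week $1/$2/requirements.txt"
}

# Hypothetical helper: commit message in the format used in step 9.
commit_message() {
    echo "Work on week $1 exercises"
}

# Example for Week 0 and its sample learning unit.
slu_requirements 0 "SLU000 - Jupyter Notebook"
commit_message 0
```

You could then pass the first result to pip install -r (in quotes, because the path contains spaces) and the second to git commit -m.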

Updates to Learning Units

As much as we try to have processes in place to prevent errors and bugs in the learning units, some make it through to you. If the problem is not in the exercise notebook, you can just pull the new version from the ds-prep-course repo and replace the file in your ds-prep-workspace. If the correction is in the exercise notebook, however, you can't just replace the file, because your work is there and you'll lose it!

When a new version of the exercise notebook is released (and announced) you will have to merge the work you've already done into the new version of the notebook.

At the moment our suggestion to merge the changes is:

  1. Rename the old version
  2. Copy the new exercise notebook over
  3. Open both and copy-paste your solutions to the new notebook

We understand it's not ideal and are working on improving this workflow.
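
The first two steps of the suggestion above can be practised in a throwaway directory, as in the sketch below. The file names are hypothetical, since real notebook names vary per learning unit, and the file contents are placeholders rather than real notebooks.

```shell
# Throwaway directory standing in for an SLU folder in your workspace.
slu="$(mktemp -d)"
# Hypothetical file names: the real notebook names vary per learning unit.
echo "my solutions" > "$slu/Exercise notebook.ipynb"
new_version="$(mktemp -d)/Exercise notebook.ipynb"
echo "corrected exercises" > "$new_version"

# 1. Rename the old version so your work is kept.
mv "$slu/Exercise notebook.ipynb" "$slu/Exercise notebook - old.ipynb"
# 2. Copy the new exercise notebook over.
cp "$new_version" "$slu/Exercise notebook.ipynb"
# 3. Both files now sit side by side, ready for copy-pasting your solutions.
ls "$slu"
```

Renaming before copying is what protects your work: the copy can no longer overwrite the file that holds your solutions.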

Help

During the prep-course you will surely run into problems and have doubts about the material. Please refer to this wiki page on how to ask for help!

Troubleshooting

When I open Windows Explorer through Ubuntu, it goes to a different folder than in the guide

  • Please make sure:
    • you are running the command explorer.exe . including the dot at the end.
    • you are running Windows 10 version 1909 or newer.

Ubuntu on Windows 10 high CPU usage, crashes

  • First please make sure you are running Windows 10 version 1909 or newer.
  • Then, try following these steps

If the above steps didn't solve the problem for you, please contact us on Slack or, if you are not on Slack, open an issue.

Tips and Tricks

Coming soon.
