python.md rm conda and add headings
bclaremar authored Dec 19, 2023
1 parent 541ec26 commit 6de2f95
Showing 1 changed file with 39 additions and 123 deletions.
A very small selection of installed packages are:

In python scripts or at the python prompt, packages are imported or loaded with the ``import`` statement.
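For example, loading a standard-library package at the Python prompt (`math` stands in for any installed package):

```python
import math  # load the standard-library math package

# imported names are accessed with dot notation
print(math.sqrt(16))  # → 4.0
```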

## How to install packages

There are two package installation systems:

PyPI (pip) is traditionally for Python-only packages but it is no problem to also distribute packages written in other languages as long as they provide a Python interface.
Does it work? Then it is there!

Otherwise, you can either use "pip" or "Conda".

### Pip

You use pip this way, in the Linux shell:

$ pip install --user <package name> # or pip3 if required from loaded python module
To be able to find those packages in a non-default path, you have to set the PYTHONPATH environment variable:

$ export PYTHONPATH=<prefix-path>/lib/pythonX.Y/site-packages/:$PYTHONPATH

You may want to add this line to your .bashrc file!
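To see where user installs and PYTHONPATH entries actually land, you can inspect Python's search path with the standard library (the printed path will differ per system):

```python
import site
import sys

# pip install --user places packages in the per-user site directory
print(site.getusersitepackages())

# directories listed in PYTHONPATH are prepended to sys.path,
# the list of locations searched by import
print(isinstance(sys.path, list))  # → True
```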

### Conda

See our [Conda user Guide](../cluster_guides/conda.md)

## Isolated environments
A good introduction is given in CodeRefinery's course Python for Scientific Computing.

Isolated environments solve a couple of problems:
You can create one for each project, and it is no problem if two projects require different versions.

If you make some mistake and install something you did not want or need, you can remove the environment and create a new one.

### Example with virtual environment
Create a "venv". First load the python version you want to base your virtual environment on.

Example with python/3.6.0
See further down how to use Jupyter from an isolated session where you used --system-site-packages.
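Under the hood, a venv is just a directory containing a pyvenv.cfg marker and its own bin/ directory; the standard-library venv module can create one programmatically. A throwaway sketch using a temporary directory (illustrative only, not a replacement for the shell workflow above):

```python
import os
import tempfile
import venv

# create a disposable virtual environment (skip pip to keep it fast)
env_dir = os.path.join(tempfile.mkdtemp(), "Example")
venv.create(env_dir, with_pip=False)

# pyvenv.cfg records which base interpreter the venv was created from
print(os.path.isfile(os.path.join(env_dir, "pyvenv.cfg")))  # → True
```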

More on virtual environment

### Installing with pyenv

This approach is more advanced and should, in our opinion, be used only if the above are not enough for your purpose. Probably Conda will work well for you. The approach below allows you to install your own Python version and much more…

Consult the official pyenv documentation.

#### First time at UPPMAX

1. Download pyenv

echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bash_profile
echo 'eval "$(pyenv init -)"' >> ~/.bash_profile
To make sure everything gets loaded correctly, log out and back in to UPPMAX.

#### Installing own python version (not already available as an UPPMAX module)

1. Get pyenv to install the python version of your liking.

pip install [package name]
Example:

pip install mechanize
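After installing, you can check that Python can locate a package without importing it, using the standard library (substitute your own package name for "json"):

```python
import importlib.util

# find_spec returns a ModuleSpec if the package is on the search path,
# or None if it cannot be found
spec = importlib.util.find_spec("json")
print(spec is not None)  # → True
```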


## Running Python from Jupyter notebook (and -lab)
You can run Python in a notebook, i.e. in a web interface with the possibility of inline figures and debugging. An easy way to do this is to load the python module. In the shell:

module load python/<version>
A Firefox session should start with the Jupyter notebook interface. If not, copy the URL from the terminal output into your browser.

Presently we have jupyter-lab installed only for python>=3.10.8. You can install a personal version with Conda for lower versions.

### Jupyter in a virtual environment (venv)
You could also use jupyter-lab or jupyter-notebook in a virtual environment.

If you decide to use the --system-site-packages configuration, you will get jupyter from the python module you created your virtual environment with.
and run:
$ jupyter-notebook
Be sure to start the kernel with the virtual environment name, like "Example", and not "Python 3 (ipykernel)".

## How to run parallel jobs

Material here is taken partly from the parallel part of the online course Python for Scientific Computing.

Parallel computing is when many different tasks are carried out simultaneously. There are three main models:
Message passing: Different processes manage their own memory segments. They share data by passing messages as needed.

There are several packages available for Python that let you run parallel jobs. Some of them are only able to run on one node, while others try to leverage several machines.

### Threading

Threading divides up your work among a number of cores within a node. The threads share memory.

- Multi-threading documentation
- Examples


The designers of the Python language made the choice that only one thread in a process can run actual Python code, by using the so-called global interpreter lock (GIL). This means that approaches that work in other languages (C, C++, Fortran) may not work in Python without some care. At first glance, this is bad for parallelism. But it's not all bad:

External libraries (NumPy, SciPy, Pandas, etc), written in C or other languages, can release the lock and run multi-threaded. Also, most input/output releases the GIL, and input/output is slow.
The threading python module. This is very low level and you shouldn't use it unless you really need to.

We recommend you find a UNIX threading tutorial first before embarking on using the threading module.
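Since most input/output releases the GIL, I/O-bound work can still run concurrently in threads. A small sketch using the higher-level concurrent.futures interface, with sleep standing in for real I/O:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def io_task(_):
    time.sleep(0.2)  # simulated I/O; sleeping releases the GIL
    return 1

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(io_task, range(4)))
elapsed = time.perf_counter() - start

# four 0.2 s tasks overlap, so the total is well under 4 * 0.2 s
print(sum(results), elapsed < 0.6)  # → 4 True
```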

### Distributed computing

In contrast to threading, Python has a reasonable way of doing something similar using multiple processes.

Distributed processing uses individual processes with individual memory, that communicate with each other. In this case, data movement and communication is explicit.
Expand All @@ -405,7 +314,7 @@ Python supports various forms of distributed computing.
MPI through mpi4py: a Python wrapper for the MPI protocol, see further down.
If choosing between multiprocessing and MPI, multiprocessing is easier to program, whereas MPI may be more suitable for multi-node applications.

#### Multiprocessing/distributed

The interface is a lot like threading, but in the background it creates new processes to get around the global interpreter lock.

n_sum = sum(x[0] for x in results)
n_inside_circle_sum = sum(x[1] for x in results)
pi = 4.0 * (n_inside_circle_sum / n_sum)
print(pi)
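A self-contained Monte Carlo pi estimate in the same spirit as the fragment above might look like this (the sample function and batch sizes are illustrative, not the exact code from the course):

```python
import random
from multiprocessing import Pool

def sample(n):
    """Return (n, number of the n random points inside the unit quarter-circle)."""
    inside = 0
    for _ in range(n):
        x, y = random.random(), random.random()
        if x * x + y * y < 1.0:
            inside += 1
    return n, inside

if __name__ == "__main__":
    # each worker process draws its own batch of points
    with Pool(processes=4) as pool:
        results = pool.map(sample, [10_000] * 4)

    n_sum = sum(r[0] for r in results)
    n_inside_circle_sum = sum(r[1] for r in results)
    pi = 4.0 * (n_inside_circle_sum / n_sum)
    print(abs(pi - 3.14159) < 0.1)  # → True
```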

##### Batch example

If you need to refresh your knowledge of the scheduling system, please check the Slurm user guide.

python distributed.py
Put the job in the queue:

sbatch job_distributed.slurm

##### Interactive example

salloc -A <proj> -p node -N 1 -n 10 -t 1:0:0
python distributed.py

### MPI
Presently you have to install your own mpi4py. You will need to activate paths to the MPI libraries. Therefore follow these steps.

1. If you use python 3.10.8:
A batch script, job_MPI.slurm, should include a "module load gcc/9.3.0 openmpi/3.1.5" line:
module load python/3.9.5
module load gcc/9.3.0 openmpi/3.1.5
mpirun -n 20 python pythonMPI.py


### Using the GPU nodes
Example with numba. First install numba locally:

pip install --user numba
Starting job now -- you waited for 90 seconds.
[bjornc@s160 ~]$ python add-list.py #run the script
CPU function took 36.849201 seconds.
GPU function took 1.574953 seconds.


### Machine and Deep Learning
Please see our TensorFlow and PyTorch guides.

Useful links:
