Use the local virtualenv development option in combination with the Breeze development environment. This option lets you benefit from the infrastructure provided by your IDE (for example, IntelliJ PyCharm or IntelliJ IDEA) while working in an environment where all necessary dependencies and tests are available and set up within Docker images.
You can also use the local virtualenv as a standalone development option if you develop Airflow functionality that does not require large external dependencies or CI test coverage.
These are examples of the development options available with the local virtualenv in your IDE:
- local debugging;
- Airflow source view;
- auto-completion;
- documentation support;
- unit tests.
This document describes minimum requirements and instructions for using a standalone version of the local virtualenv.
Use system-level package managers like yum, apt-get for Linux, or Homebrew for macOS to install required software packages:
- Python (3.5 or 3.6)
- MySQL
- libxml
Refer to the Dockerfile.ci for a comprehensive list of required packages.
Note
In November 2020, a new version of pip (20.3) was released with a new 2020 resolver. This resolver
does not yet work with Apache Airflow and might lead to installation errors, depending on your choice
of extras. To install Airflow, either downgrade pip to version 20.2.4:
pip install --upgrade pip==20.2.4
or, if you use pip 20.3, add the option
--use-deprecated legacy-resolver
to your pip install command.
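The choice described in the note can be scripted. The following is a minimal sketch, not part of Airflow or pip: the helper name pip_resolver_args is hypothetical, and treating every pip release from 20.3 onward like 20.3 is an assumption (the note itself only mentions 20.3).

```shell
# Hypothetical helper (not part of Airflow or pip): prints the extra pip
# option from the note above when the given pip version is 20.3 or newer.
# Assumption: releases after 20.3 need the same option as 20.3 itself.
pip_resolver_args() {
    version="$1"
    major="${version%%.*}"            # text before the first dot
    case "$version" in
        *.*) rest="${version#*.}" ;;  # text after the first dot
        *)   rest="0" ;;              # no minor component given
    esac
    minor="${rest%%.*}"
    minor="${minor%%[!0-9]*}"         # drop suffixes such as "b1"
    minor="${minor:-0}"
    if [ "$major" -gt 20 ] || { [ "$major" -eq 20 ] && [ "$minor" -ge 3 ]; }; then
        printf '%s\n' "--use-deprecated legacy-resolver"
    fi
}

# Example usage (left commented out because it runs a real installation):
# pip_version="$(pip --version | awk '{print $2}')"
# pip install $(pip_resolver_args "$pip_version") -U -e ".[devel]"
```

The function prints nothing for older versions, so the unquoted command substitution in the usage example simply expands to no arguments in that case.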
You can also install extra packages (like [ssh], etc.) via pip install -e ".[EXTRA1,EXTRA2,...]". However, some of them may have additional install and setup requirements for your local system.
For example, if you have trouble installing the mysql client on macOS and get the following error:
ld: library not found for -lssl
you should set LIBRARY_PATH before running pip install:
export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/opt/openssl/lib/
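A slightly more defensive variant of the export above, offered as a sketch: the function name append_library_path is hypothetical. It avoids a stray leading colon when LIBRARY_PATH is unset or empty and skips directories that are already listed.

```shell
# Hypothetical helper: append a directory to LIBRARY_PATH once, without
# leaving a leading ":" when the variable starts out empty.
append_library_path() {
    dir="$1"
    case ":${LIBRARY_PATH:-}:" in
        *":$dir:"*) ;;  # already present, nothing to do
        *) LIBRARY_PATH="${LIBRARY_PATH:+$LIBRARY_PATH:}$dir" ;;
    esac
    export LIBRARY_PATH
}

# The directory below is the one used in the example above; adjust it to
# wherever OpenSSL lives on your machine.
append_library_path /usr/local/opt/openssl/lib/
```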
You are STRONGLY encouraged to also install and use pre-commit hooks for your local virtualenv development environment. Pre-commit hooks can speed up your development cycle a lot.
The full list of extras is available in setup.py.
To use your IDE for Airflow development and testing, you need to configure a virtual environment. Ideally you should set up a virtualenv for each Python version that Airflow supports (3.5, 3.6).
To create and initialize the local virtualenv:
Create an environment with one of the two options:
- Option 1: consider using a utility such as virtualenvwrapper to create virtual environments and easily switch between them with the workon command:
mkvirtualenv <ENV_NAME> --python=python<VERSION>
- Option 2: create a local virtualenv with Conda
- install miniconda3
conda create -n airflow python=3.6
conda activate airflow
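Before installing anything, it can help to confirm that an environment is actually active. A minimal sketch, assuming that virtualenv-based tools export VIRTUAL_ENV on activation and that conda activate exports CONDA_PREFIX; the function name active_env is hypothetical.

```shell
# Hypothetical helper: report which kind of environment is active, if any.
# Assumption: VIRTUAL_ENV and CONDA_PREFIX are set by the activation scripts.
active_env() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        echo "virtualenv: $VIRTUAL_ENV"
    elif [ -n "${CONDA_PREFIX:-}" ]; then
        echo "conda: $CONDA_PREFIX"
    else
        echo "none"
        return 1
    fi
}

active_env || echo "Activate an environment before running pip install." >&2
```

Note that conda also sets CONDA_PREFIX for its base environment, so the output tells you that something is active, not that it is the environment you intended.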
Install Python PIP requirements:
pip install -U -e ".[devel,<OTHER EXTRAS>]" # for example: pip install -U -e ".[devel,google,postgres]"
If you have problems installing Airflow because some requirements are not installable, you can try installing it with the set of working constraints (note that there are different constraint files for different Python versions):
pip install -U -e ".[devel,<OTHER EXTRAS>]" \
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-master/constraints-3.6.txt"
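Because the constraint file depends on the Python version, the URL can be derived instead of typed by hand. This is a sketch under assumptions: the helper name constraints_url is hypothetical, and the constraints-<MAJOR>.<MINOR>.txt naming for versions other than 3.6 is inferred from the single URL shown above.

```shell
# Hypothetical helper: build the constraints URL for a Python version.
# Assumption: other versions follow the same naming as constraints-3.6.txt.
constraints_url() {
    printf 'https://raw.githubusercontent.com/apache/airflow/constraints-master/constraints-%s.txt\n' "$1"
}

# Example usage (commented out because it performs a real installation):
# py_version="$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
# pip install -U -e ".[devel]" --constraint "$(constraints_url "$py_version")"
```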
Note: when you first initialize the database (the next step), you may encounter some problems.
This is because Airflow by default tries to load the example DAGs, some of which require the google and postgres dependencies.
You can solve the problem by:
- installing the extras, i.e. [devel,google,postgres], or
- disabling the example DAGs with an environment variable:
export AIRFLOW__CORE__LOAD_EXAMPLES=False
or
- simply ignoring the error messages and proceeding.
In addition to the above, you may also encounter problems during database migration. This is a known issue; see AIRFLOW-6265 for progress.
Create the Airflow sqlite database:
# if necessary, start with a clean AIRFLOW_HOME, e.g.
# rm -rf ~/airflow
airflow db init
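Disabling the example DAGs and starting from a clean AIRFLOW_HOME can be combined into one short script. This is a sketch, assuming you prefer a throwaway directory over deleting ~/airflow; using mktemp for this is an illustration, not something the Airflow documentation prescribes.

```shell
# Use a fresh, empty AIRFLOW_HOME instead of deleting ~/airflow.
AIRFLOW_HOME="$(mktemp -d)"
export AIRFLOW_HOME
# Skip the example DAGs so missing google/postgres extras do not cause errors.
export AIRFLOW__CORE__LOAD_EXAMPLES=False

if command -v airflow >/dev/null 2>&1; then
    airflow db init
else
    echo "airflow is not on PATH; activate your virtualenv first." >&2
fi
echo "AIRFLOW_HOME is $AIRFLOW_HOME"
```

The temporary directory is not removed automatically, so you can inspect the generated files afterwards and delete it once you are done.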
Select the virtualenv you created as the project's default virtualenv in your IDE.
Note that if you have the Breeze development environment installed, the breeze script can automate initializing the created virtualenv (steps 2 and 3).
Activate your virtualenv, e.g. by using workon, and once you are in it, run:
./breeze initialize-local-virtualenv
- (optionally) run yarn build if you plan to run the webserver
cd airflow/www
yarn build
In Airflow 2.0 we split Apache Airflow into separate packages: there is one main apache-airflow package with the core of Airflow and 70+ packages for all providers (external services and software Airflow can communicate with).
Developing providers is part of Airflow development, but when you install Airflow as editable in your local development environment, the corresponding provider packages are also installed from PyPI. However, the providers are also present in your "airflow/providers" folder. This can make it unclear which provider sources are imported during development; in general, that depends on your environment's PYTHONPATH setting.
To avoid this confusion, you can set the INSTALL_PROVIDERS_FROM_SOURCES environment variable to true before running the pip install command:
INSTALL_PROVIDERS_FROM_SOURCES="true" pip install -U -e ".[devel,<OTHER EXTRAS>]" \
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-master/constraints-3.6.txt"
This way no provider packages will be installed and the providers will always be imported from the "airflow/providers" folder.
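To see which copy of the providers is actually imported, you can inspect the package path. This is a sketch under assumptions: the helper name classify_provider_path is hypothetical, and it assumes the repository root is the directory that contains airflow/providers.

```shell
# Hypothetical helper: report whether a providers path lives inside the
# given Airflow source checkout or somewhere else (e.g. site-packages).
classify_provider_path() {
    path="$1"
    repo_root="${2%/}"   # drop one trailing slash, if any
    case "$path" in
        "$repo_root"/airflow/providers*) echo "sources" ;;
        *) echo "installed package" ;;
    esac
}

# Example usage from the repository root (commented out because it needs
# Airflow to be importable):
# python -c 'import airflow.providers as p; print("\n".join(p.__path__))' |
#     while read -r path; do
#         echo "$path -> $(classify_provider_path "$path" "$PWD")"
#     done
```

If the usage example prints more than one line, both copies are visible to Python, which is the situation the paragraph above warns about.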
Running tests is described in TESTING.rst.
While most of the tests are typical unit tests that do not require external components, there are a number of integration tests. You can technically use the local virtualenv to run those tests, but doing so requires setting up a number of external components (databases, queues, Kubernetes and the like). It is therefore much easier to use the Breeze development environment for integration tests.
Note: Soon we will separate the integration and system tests out via pytest so that you can clearly know which tests are unit tests and can be run in the local virtualenv and which should be run using Breeze.
When analyzing a problem, it is helpful to be able to query the database directly. You can do so using the built-in Airflow command:
airflow db shell