Airflow is quite a complex project, and so is setting up a working environment, but we have made it rather simple if you follow this guide.
There are three ways you can run the Airflow dev env:
- With Docker containers and Docker Compose (on your local machine). This environment is managed with the Breeze tool, written in Python, which makes environment management, yeah you guessed it, a breeze.
- With a local virtual environment (on your local machine).
- With a remote, managed environment (via a remote development environment).
Before deciding which method to choose, there are a couple of factors to consider:
- Running Airflow in a container is the most reliable way: it provides a more consistent environment and allows integration tests with a number of integrations (cassandra, mongo, mysql, etc.). However, it also requires 4GB RAM, 40GB disk space and at least 2 cores.
- If you are working on a basic feature, installing Airflow in a local environment might be sufficient. For a comprehensive venv tutorial, visit the Virtual Env guide.
- You usually need a paid account to access a managed, remote virtual environment.
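The container requirements listed above (4GB RAM, 40GB disk space, 2 cores) can be checked before installing anything. The following is a minimal sketch, assuming a Linux or macOS host; the function names are hypothetical and not part of Airflow or Breeze. It reports host totals, so a Docker setup that allocates fewer resources to containers may still fall short, and a machine sold as "4GB" can report slightly less than that.

```python
import os
import shutil

# Minimums quoted in the guide for the container-based setup.
MIN_RAM_GB = 4
MIN_DISK_GB = 40
MIN_CORES = 2


def total_ram_gb():
    """Total physical memory in GiB, or None where sysconf cannot report it."""
    try:
        return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / 1024**3
    except (AttributeError, ValueError, OSError):
        return None


def check_resources(ram_gb, disk_gb, cores):
    """Return a list of shortfalls; an empty list means the minimums are met."""
    problems = []
    # RAM is skipped when it could not be determined (ram_gb is None).
    if ram_gb is not None and ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {ram_gb:.1f}GB found, {MIN_RAM_GB}GB required")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"Disk: {disk_gb:.1f}GB free, {MIN_DISK_GB}GB required")
    if cores < MIN_CORES:
        problems.append(f"Cores: {cores} found, {MIN_CORES} required")
    return problems


if __name__ == "__main__":
    # Free space is measured on the filesystem holding the home directory.
    free_disk_gb = shutil.disk_usage(os.path.expanduser("~")).free / 1024**3
    problems = check_resources(total_ram_gb(), free_disk_gb, os.cpu_count() or 1)
    print("\n".join(problems) if problems else "Minimum requirements look satisfied.")
```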
If you do not work in a remote development environment, you need the following prerequisites.
- Docker Community Edition (you can also use Colima, see instructions below)
- Docker Compose
- pyenv (you can also use pyenv-virtualenv or virtualenvwrapper)
The below setup describes Ubuntu installation. It might be slightly different on different machines.
- Installing required packages for Docker and setting up docker repo
$ sudo apt-get update
$ sudo apt-get install \
ca-certificates \
curl \
gnupg \
lsb-release
$ sudo mkdir -p /etc/apt/keyrings
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
$ echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
- Install Docker Engine, containerd, and Docker Compose Plugin.
$ sudo apt-get update
$ sudo apt-get install docker-ce docker-ce-cli containerd.io docker-compose-plugin
- Creating group for docker and adding current user to it.
$ sudo groupadd docker
$ sudo usermod -aG docker $USER
Note: After adding the user to the docker group, log out and log in again so that group membership is re-evaluated.
- Test Docker installation
$ docker run hello-world
If you use Colima as your container runtime engine, please follow the next steps:
- Install buildx manually and follow its instructions
- Link the Colima socket to the default socket path. Note that this may break other Docker servers.
$ sudo ln -sf $HOME/.colima/default/docker.sock /var/run/docker.sock
- Change docker context to use default:
$ docker context use default
- Installing latest version of Docker Compose
$ COMPOSE_VERSION="$(curl -s https://api.github.com/repos/docker/compose/releases/latest | grep '"tag_name":'\
| cut -d '"' -f 4)"
$ COMPOSE_URL="https://github.com/docker/compose/releases/download/${COMPOSE_VERSION}/\
docker-compose-$(uname -s)-$(uname -m)"
$ sudo curl -L "${COMPOSE_URL}" -o /usr/local/bin/docker-compose
$ sudo chmod +x /usr/local/bin/docker-compose
- Verifying installation
$ docker-compose --version
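The shell commands above extract tag_name from the GitHub releases API response and build the download URL from it. The sketch below mirrors that logic in Python to make the two steps explicit. It is an illustration only: the function names are hypothetical, the sample payload is made up so the code runs offline, and the asset naming simply follows the pattern used in the shell snippet.

```python
import json
import platform

DOWNLOAD_BASE = "https://github.com/docker/compose/releases/download"


def latest_tag(release_json):
    """Extract the release tag, the value the grep/cut pipeline pulls out."""
    return json.loads(release_json)["tag_name"]


def compose_download_url(tag, system=None, machine=None):
    """Build the binary URL following the pattern of the shell snippet."""
    system = system or platform.system()    # roughly `uname -s`
    machine = machine or platform.machine()  # roughly `uname -m`
    return f"{DOWNLOAD_BASE}/{tag}/docker-compose-{system}-{machine}"


if __name__ == "__main__":
    # A canned, hypothetical payload stands in for the API response.
    sample = '{"tag_name": "v0.0.0-example", "name": "hypothetical release"}'
    print(compose_download_url(latest_tag(sample)))
```

Parsing the JSON instead of grepping for a quoted key avoids depending on how the API response happens to be formatted.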
Note: You might have issues with pyenv if you have a Mac with an M1 chip. Consider using virtualenv as an alternative.
- Install pyenv and configure your shell's environment for pyenv as suggested in the pyenv README
- After installing pyenv, you need to install a few more required packages for Airflow. The command below adds basic system-level dependencies on a Debian/Ubuntu-like system. You will have to adapt it to install similar packages if your operating system is macOS or another flavour of Linux.
$ sudo apt install openssl sqlite default-libmysqlclient-dev libmysqlclient-dev postgresql
If you want to install all Airflow providers, more system dependencies might be needed. For example, on a Debian/Ubuntu-like system, this command installs all the dependencies needed when you use the devel_all extra while installing Airflow.
$ sudo apt install apt-transport-https apt-utils build-essential ca-certificates dirmngr \
freetds-bin freetds-dev git gosu graphviz graphviz-dev krb5-user ldap-utils libffi-dev \
libkrb5-dev libldap2-dev libpq-dev libsasl2-2 libsasl2-dev libsasl2-modules \
libssl-dev locales lsb-release openssh-client sasl2-bin \
software-properties-common sqlite3 sudo unixodbc unixodbc-dev
- Restart your shell so the path changes take effect, then verify the installation
$ exec $SHELL
$ pyenv --version
- Checking available version, installing required Python version to pyenv and verifying it
For Architectures other than MacOS/ARM
$ pyenv install --list
$ pyenv install 3.8.5
$ pyenv versions
For MacOS/Arm (3.9.1 is the first version of Python to support MacOS/ARM, but 3.8.10 works too)
$ pyenv install --list
$ pyenv install 3.8.10
$ pyenv versions
- Creating a new virtual environment named airflow-env for the installed Python version. In the next chapter, the virtual environment airflow-env will be used for installing Airflow.
For Architectures other than MacOS/ARM
$ pyenv virtualenv 3.8.5 airflow-env
For MacOS/Arm (3.9.1 is the first version of Python to support MacOS/ARM, but 3.8.10 works too)
$ pyenv virtualenv 3.8.10 airflow-env
- Entering virtual environment airflow-env
$ pyenv activate airflow-env
Go to https://github.com/apache/airflow/ and fork the project.
Go to your GitHub account's fork of Airflow and click on Code; you will find the link to your repo.
Follow Cloning a repository to clone the repo locally (you can also do it in your IDE; see the Using your IDE chapter below).
For many of the development tasks you will need Breeze to be configured. Breeze is a development environment which uses docker and docker-compose, and its main purpose is to provide a consistent and repeatable environment for all the contributors and CI. When using Breeze you avoid the "works for me" syndrome: not only can others easily reproduce what you do, but the CI of Airflow also uses the same environment to run all tests, so you should be able to easily reproduce in your local environment the same failures you see in CI.
- Install pipx by following the instructions in Install pipx.
- Run pipx install -e ./dev/breeze in your checked-out repository. Make sure to follow any instructions printed by pipx during the installation; this is needed to make sure that the breeze command is available in your PATH.
- Initialize breeze autocomplete
$ breeze setup autocomplete
- Initialize the Breeze environment with the required Python version and backend. This may take a while the first time.
$ breeze --python 3.8 --backend postgres
Note: If you encounter an error like "docker.credentials.errors.InitializationError: docker-credential-secretservice not installed or not available in PATH", you may execute the following command to fix it:
$ sudo apt install golang-docker-credential-helper
Once the package is installed, execute the breeze command again to resume image building.
- When you enter the Breeze environment you should see a prompt similar to root@e4756f6ac886:/opt/airflow#. This means that you are inside the Breeze container and ready to run most of the development tasks. You can leave the environment with exit and re-enter it with just the breeze command. Once you enter the Breeze environment, create the Airflow tables and users from the Breeze CLI. Running airflow db reset at least once is required for Breeze to get the database/tables created. If you run tests, however, the test database will be initialized automatically for you.
root@b76fcb399bb6:/opt/airflow# airflow db reset
root@b76fcb399bb6:/opt/airflow# airflow users create --role Admin --username admin --password admin \
--email [email protected] --firstname foo --lastname bar
- Exiting the Breeze environment. Successfully finishing the above commands leaves you in the container; type exit to exit the container. The database created before will remain and the servers will keep running, though, until you stop the Breeze environment completely.
root@b76fcb399bb6:/opt/airflow#
root@b76fcb399bb6:/opt/airflow# exit
- You can stop the environment (which means deleting the databases and database servers running in the background) via the breeze down command.
$ breeze down
- Starting the Breeze environment using breeze start-airflow starts Breeze with the last configuration used (in this case, Python and backend are picked up from the last execution, breeze --python 3.8 --backend postgres). It also automatically starts the webserver, backend and scheduler. It drops you into tmux with the scheduler in the bottom left and the webserver in the bottom right. Use [Ctrl + B] and the arrow keys to navigate.
$ breeze start-airflow
Use CI image.
Branch name: main
Docker image: ghcr.io/apache/airflow/main/ci/python3.8:latest
Airflow source version: 2.4.0.dev0
Python version: 3.8
Backend: mysql 5.7
Port forwarding:
Ports are forwarded to the running docker containers for webserver and database
* 12322 -> forwarded to Airflow ssh server -> airflow:22
* 28080 -> forwarded to Airflow webserver -> airflow:8080
* 25555 -> forwarded to Flower dashboard -> airflow:5555
* 25433 -> forwarded to Postgres database -> postgres:5432
* 23306 -> forwarded to MySQL database -> mysql:3306
* 21433 -> forwarded to MSSQL database -> mssql:1443
* 26379 -> forwarded to Redis broker -> redis:6379
Here are links to those services that you can use on host:
* ssh connection for remote debugging: ssh -p 12322 [email protected] (password: airflow)
* Webserver: http://127.0.0.1:28080
* Flower: http://127.0.0.1:25555
* Postgres: jdbc:postgresql://127.0.0.1:25433/airflow?user=postgres&password=airflow
* Mysql: jdbc:mysql://127.0.0.1:23306/airflow?user=root
* MSSQL: jdbc:sqlserver://127.0.0.1:21433;databaseName=airflow;user=sa;password=Airflow123
* Redis: redis://127.0.0.1:26379/0
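To see which of the forwarded ports listed above are actually reachable from the host, a quick TCP probe is enough. The following is a minimal sketch; the port numbers are copied from the listing, while the function names are hypothetical. A port only shows as open when the corresponding service is running for the chosen backend and integrations.

```python
import socket

# Host ports as printed by `breeze start-airflow` in the listing above.
FORWARDED_PORTS = {
    "Airflow ssh server": 12322,
    "Airflow webserver": 28080,
    "Flower dashboard": 25555,
    "Postgres database": 25433,
    "MySQL database": 23306,
    "MSSQL database": 21433,
    "Redis broker": 26379,
}


def is_port_open(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def port_report(ports=FORWARDED_PORTS, probe=is_port_open):
    """Map each service name to whether its forwarded port accepts connections."""
    return {name: probe(port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, is_open in port_report().items():
        print(f"{name:20} {'open' if is_open else 'closed'}")
```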
Alternatively, you can start the same services using the following commands:
- Start Breeze
$ breeze --python 3.8 --backend postgres
- Open tmux
root@0c6e4ff0ab3d:/opt/airflow# tmux
- Press Ctrl + B and "
root@0c6e4ff0ab3d:/opt/airflow# airflow scheduler
- Press Ctrl + B and %
root@0c6e4ff0ab3d:/opt/airflow# airflow webserver
Now you can access the Airflow web interface on your local machine at http://127.0.0.1:28080 with user name admin and password admin.
Set up a PostgreSQL database in your database management tool of choice (e.g. DBeaver, DataGrip) with host 127.0.0.1, port 25433, user postgres, password airflow, and default schema airflow.
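The same connection parameters can be combined into a single postgresql:// URL, the form accepted by libpq-based clients. This is a sketch under assumptions: the function name is hypothetical, and the database name airflow is taken from the JDBC URL printed by breeze start-airflow.

```python
from urllib.parse import quote, urlsplit


def postgres_url(host="127.0.0.1", port=25433, user="postgres",
                 password="airflow", database="airflow"):
    """Build a postgresql:// URL; user, password and database are percent-encoded."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


if __name__ == "__main__":
    url = postgres_url()
    parts = urlsplit(url)
    print(url)
    print(parts.hostname, parts.port, parts.username)
```

Percent-encoding matters only if you change the defaults to a password containing characters such as @ or /, which would otherwise break the URL.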
- Stopping Breeze
root@f3619b74c59a:/opt/airflow# stop_airflow
root@f3619b74c59a:/opt/airflow# exit
$ breeze down
- Knowing more about Breeze
$ breeze --help
For more information, visit the Breeze documentation.
Before committing changes to GitHub or raising a pull request, the code needs to be checked against certain quality standards such as spell check, code syntax, code formatting, compatibility with Apache License requirements, etc. This set of tests is applied when you commit your code.
To avoid burdening the CI infrastructure and to save time, pre-commit hooks can be run locally before committing changes.
- Installing required packages
$ sudo apt install libxml2-utils
- Installing required Python packages
$ pyenv activate airflow-env
$ pip install pre-commit
- Go to your project directory
$ cd ~/Projects/airflow
- Running pre-commit hooks
$ pre-commit run --all-files
No-tabs checker......................................................Passed
Add license for all SQL files........................................Passed
Add license for all other files......................................Passed
Add license for all rst files........................................Passed
Add license for all JS/CSS/PUML files................................Passed
Add license for all JINJA template files.............................Passed
Add license for all shell files......................................Passed
Add license for all python files.....................................Passed
Add license for all XML files........................................Passed
Add license for all yaml files.......................................Passed
Add license for all md files.........................................Passed
Add license for all mermaid files....................................Passed
Add TOC for md files.................................................Passed
Add TOC for upgrade documentation....................................Passed
Check hooks apply to the repository..................................Passed
black................................................................Passed
Check for merge conflicts............................................Passed
Debug Statements (Python)............................................Passed
Check builtin type constructor use...................................Passed
Detect Private Key...................................................Passed
Fix End of Files.....................................................Passed
...........................................................................
- Running pre-commit for selected files
$ pre-commit run --files airflow/utils/decorators.py tests/utils/test_task_group.py
- Running specific hook for selected files
$ pre-commit run black --files airflow/decorators.py tests/utils/test_task_group.py
black...............................................................Passed
$ pre-commit run ruff --files airflow/decorators.py tests/utils/test_task_group.py
Run ruff............................................................Passed
- Enabling pre-commit checks before commit. Once installed, pre-commit runs automatically before each commit and stops the commit if a check fails.
$ cd ~/Projects/airflow
$ pre-commit install
$ git commit -m "Added xyz"
- To disable Pre-commit
$ cd ~/Projects/airflow
$ pre-commit uninstall
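Running the hooks only on the files you have touched is usually much faster than --all-files. The sketch below combines git diff --name-only with the pre-commit run --files form shown earlier. The function names are hypothetical, and it assumes that your local main branch is the right comparison base.

```python
import subprocess


def changed_files(base="main"):
    """List files that differ from `base`, skipping deleted ones."""
    out = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=d", base],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def precommit_command(files, hook=None):
    """Build `pre-commit run [hook] --files ...` as an argument list."""
    cmd = ["pre-commit", "run"]
    if hook:
        cmd.append(hook)
    return cmd + ["--files", *files]


# Typical use from the repository root (not executed here):
#   files = changed_files()
#   if files:
#       subprocess.run(precommit_command(files, hook="black"))
```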
- For more information, visit STATIC_CODE_CHECKS.rst
- It may require some packages to be installed; watch the output of the command to see which ones are missing.
$ sudo apt-get install sqlite libsqlite3-dev default-libmysqlclient-dev postgresql
$ ./scripts/tools/initialize_virtualenv.py
- Add the following line to ~/.bashrc in order to call the breeze command from anywhere.
export PATH=${PATH}:"/home/${USER}/Projects/airflow"
source ~/.bashrc
You can usually conveniently run tests in your IDE (see IDE below) using virtualenv but with Breeze you can be sure that all the tests are run in the same environment as tests in CI.
All tests are inside the ./tests directory.
- Running unit tests inside the Breeze environment. Just run pytest filepath+filename to run the tests.
root@63528318c8b1:/opt/airflow# pytest tests/utils/test_dates.py
============================================================= test session starts ==============================================================
platform linux -- Python 3.8.16, pytest-7.2.1, pluggy-1.0.0 -- /usr/local/bin/python
cachedir: .pytest_cache
rootdir: /opt/airflow, configfile: pytest.ini
plugins: timeouts-1.2.1, capture-warnings-0.0.4, cov-4.0.0, requests-mock-1.10.0, rerunfailures-11.1.1, anyio-3.6.2, instafail-0.4.2, time-machine-2.9.0, asyncio-0.20.3, httpx-0.21.3, xdist-3.2.0
asyncio: mode=strict
setup timeout: 0.0s, execution timeout: 0.0s, teardown timeout: 0.0s
collected 12 items
tests/utils/test_dates.py::TestDates::test_days_ago PASSED [ 8%]
tests/utils/test_dates.py::TestDates::test_parse_execution_date PASSED [ 16%]
tests/utils/test_dates.py::TestDates::test_round_time PASSED [ 25%]
tests/utils/test_dates.py::TestDates::test_infer_time_unit PASSED [ 33%]
tests/utils/test_dates.py::TestDates::test_scale_time_units PASSED [ 41%]
tests/utils/test_dates.py::TestUtilsDatesDateRange::test_no_delta PASSED [ 50%]
tests/utils/test_dates.py::TestUtilsDatesDateRange::test_end_date_before_start_date PASSED [ 58%]
tests/utils/test_dates.py::TestUtilsDatesDateRange::test_both_end_date_and_num_given PASSED [ 66%]
tests/utils/test_dates.py::TestUtilsDatesDateRange::test_invalid_delta PASSED [ 75%]
tests/utils/test_dates.py::TestUtilsDatesDateRange::test_positive_num_given PASSED [ 83%]
tests/utils/test_dates.py::TestUtilsDatesDateRange::test_negative_num_given PASSED [ 91%]
tests/utils/test_dates.py::TestUtilsDatesDateRange::test_delta_cron_presets PASSED [100%]
============================================================== 12 passed in 0.24s ==============================================================
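Each line in the output above is a pytest node ID of the form file::Class::test_function, and pytest accepts the same form as an argument to run a single test. The following is a minimal sketch of a test module in that shape; the file path, helper and test names are hypothetical and not part of Airflow.

```python
# Hypothetical file: tests/utils/test_example_dates.py
from datetime import datetime


def round_to_midnight(dt):
    """Illustrative helper under test; not an Airflow function."""
    return dt.replace(hour=0, minute=0, second=0, microsecond=0)


class TestRoundToMidnight:
    def test_time_is_dropped(self):
        rounded = round_to_midnight(datetime(2023, 1, 2, 15, 30))
        assert rounded == datetime(2023, 1, 2)

    def test_midnight_is_unchanged(self):
        midnight = datetime(2023, 1, 2)
        assert round_to_midnight(midnight) == midnight
```

With such a file in place, pytest tests/utils/test_example_dates.py::TestRoundToMidnight::test_time_is_dropped would run only the first test.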
- Running all the tests with Breeze by specifying the required Python version, backend, and backend version
$ breeze --backend postgres --postgres-version 10 --python 3.8 --db-reset testing tests --test-type All
Running specific type of test
- Types of tests
- Running specific type of test
$ breeze --backend postgres --postgres-version 10 --python 3.8 --db-reset testing tests --test-type Core
Running Integration test for specific test type
- Running an Integration Test
$ breeze --backend postgres --postgres-version 10 --python 3.8 --db-reset testing tests --test-type All --integration mongo
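The three breeze commands above differ only in the --test-type value and the optional --integration flag. A small sketch makes that explicit; the function name is hypothetical, and it uses only the flags that appear in those commands.

```python
import shlex


def breeze_test_command(test_type="All", integration=None, backend="postgres",
                        postgres_version="10", python="3.8"):
    """Assemble the breeze test invocation shown in the guide as an argument list."""
    cmd = [
        "breeze", "--backend", backend, "--postgres-version", postgres_version,
        "--python", python, "--db-reset", "testing", "tests",
        "--test-type", test_type,
    ]
    if integration:
        cmd += ["--integration", integration]
    return cmd


if __name__ == "__main__":
    # Print the three variants so they can be compared side by side.
    print(shlex.join(breeze_test_command()))
    print(shlex.join(breeze_test_command("Core")))
    print(shlex.join(breeze_test_command("All", integration="mongo")))
```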
For more information on testing, visit TESTING.rst.
- To know how to contribute to the project visit CONTRIBUTING.rst
Go to your GitHub account, open your fork of the project, and click on Branches.
Click on the New pull request button on the branch from which you want to raise a pull request.
Add a title and description as per the Contributing guidelines and click on Create pull request.
Often it takes several days or weeks to discuss and iterate on the PR until it is ready to merge. In the meantime new commits are merged, and you might run into conflicts; therefore you should periodically synchronize main in your fork with the apache/airflow main and rebase your PR on top of it. The following describes how to do it.
If you are familiar with Python development and use your favourite editors, Airflow can be set up similarly to other projects of yours. However, if you need specific instructions for your IDE you will find more detailed instructions here:
In order to use a remote development environment, you usually need a paid account, but you do not have to set up your local machine for development.