Airflow CI/CD Concepts
Understanding Airflow deployment and CI/CD concepts through a kitchen analogy.
When I first started working with Airflow, the relationship between DAGs, ETL code, Docker images, and deployments was confusing. What gets restarted when? Which changes are hot-reloaded? Why does a DAG change deploy instantly but a requirements.txt change needs a full rebuild? A kitchen analogy helped me understand the architecture, and knowing which changes need restarts versus which are picked up automatically saved me from unnecessary downtime.
DAGs vs ETL: Recipes vs Cooking
The most fundamental distinction in Airflow is between the DAG and the ETL code.
DAG = Recipe Card
A DAG (Directed Acyclic Graph) is like a recipe card. It defines:
- WHAT to do (which tasks to run)
- WHEN to do it (schedule: daily, hourly, etc.)
- IN WHAT ORDER (task A before task B)
```python
# This is a DAG - it's just instructions, not the actual work
from datetime import datetime
from airflow import DAG
from airflow.operators.empty import EmptyOperator

with DAG('amplitude_pipeline', schedule='@daily', start_date=datetime(2024, 1, 1)):
    task1 = EmptyOperator(task_id='fetch_amplitude_data')  # Step 1: fetch from Amplitude
    task2 = EmptyOperator(task_id='transform_data')        # Step 2: transform
    task3 = EmptyOperator(task_id='save_to_database')      # Step 3: load

    task1 >> task2 >> task3  # Do in this order
```
ETL = The Actual Cooking
ETL (Extract, Transform, Load) is the actual code that does the work:
- Extract: Get data (connect to API, download events)
- Transform: Process data (clean, calculate, join)
- Load: Save results (to S3, database)
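A minimal sketch of what the ETL code itself might look like (the function names, the stubbed API, and the in-memory "sink" are all illustrative stand-ins, not the real arch-etl code):

```python
def extract(events_api):
    """Extract: pull raw events from the source (e.g. an events API)."""
    return events_api()  # in reality: paginated HTTP calls, auth, retries

def transform(raw_events):
    """Transform: keep valid events and normalize the fields we care about."""
    return [
        {"user": e["user_id"], "event": e["event_type"].lower()}
        for e in raw_events
        if "user_id" in e and "event_type" in e
    ]

def load(rows, sink):
    """Load: write processed rows to the destination (e.g. S3 or a database)."""
    sink.extend(rows)  # stand-in for an S3 put or a database INSERT
    return len(rows)

# Stubbed run: a fake API and an in-memory "sink"
fake_api = lambda: [
    {"user_id": "u1", "event_type": "LOGIN"},
    {"event_type": "orphan"},  # dropped: no user_id
]
sink = []
written = load(transform(extract(fake_api)), sink)
print(written, sink)  # 1 [{'user': 'u1', 'event': 'login'}]
```

This is the code that lives in the task's container, completely separate from the DAG file that schedules it.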
The key insight: the DAG doesn’t process data. It tells the worker “run this container now.” The container runs the ETL code, does the actual work, and exits. The DAG is the orchestrator, not the executor.
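Conceptually, each task the worker runs boils down to something like this (image and module names are hypothetical):

```shell
# What "run this container now" looks like in practice
docker run --rm my-registry/arch-etl:latest python -m jobs.amplitude_fetch
# The container starts, does the work, exits, and is removed (--rm)
```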
What Needs a Restart?
This is the question that confused me most. The answer depends on what changed:
No Restart Needed (90% of cases)
| Change | What happens |
|---|---|
| dags/my_dag.py (new/modify) | Scheduler auto-detects in ~30 sec |
| ETL code (arch-etl) | Next DAG run uses new container |
DAG file changes are hot-reloaded — the Airflow scheduler continuously scans the DAGs folder and picks up changes within ~30 seconds. ETL code changes are even simpler: since each task run pulls a fresh Docker image, pushing a new image to ECR means the next DAG run automatically uses the updated code.
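The no-restart ETL deploy is just an image push. A sketch of that flow, assuming a standard ECR setup (the account ID, region, and repo name here are placeholders):

```shell
# Authenticate Docker against ECR (account ID and region are placeholders)
aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

# Build and push the new ETL image
docker build -t arch-etl .
docker tag arch-etl:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/arch-etl:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/arch-etl:latest

# No Airflow restart needed - the next DAG run pulls the new image automatically
```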
Restart Required (10% of cases)
| Change | Why restart |
|---|---|
| Airflow version | New image = need to restart |
| requirements.txt | New Python packages need to be in image |
| Dockerfile | Image changed = rebuild + restart |
| .env file | Environment variables loaded at container start |
These changes affect the Airflow infrastructure itself (scheduler, webserver, worker containers), not the DAGs or ETL code. Infrastructure changes require a container rebuild and restart.
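For the restart cases, the sequence is a rebuild followed by recreating the containers. A sketch, assuming the Airflow stack runs under Docker Compose (adjust for your setup):

```shell
# After changing Dockerfile, requirements.txt, or .env:
docker compose build   # bake new packages / new Airflow version into the image
docker compose up -d   # recreate scheduler, webserver, and worker containers
docker compose ps      # confirm everything came back healthy
```

This is the ~1-2 minutes of downtime the scenarios table below refers to.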
Deployment Scenarios
Putting it all together:
| Scenario | Action | Restart? | Downtime |
|---|---|---|---|
| DAG changes | git pull on EC2 | No | None (~30s) |
| ETL code changes | ECR push | No | None |
| Airflow upgrade | Image rebuild + restart | Yes | ~1-2 min |
The Three Repos Pattern
In our setup, different types of code live in different repositories, each with its own deployment flow:
| Repo | Contains | Deploy How | Restart? |
|---|---|---|---|
| arch-airflow | DAG files | git pull to EFS | No |
| arch-airflow | Airflow images | ECR + docker restart | Yes (rare) |
| arch-etl | ETL job code | ECR push | No (auto-pulls latest) |
| backend-infra | Infrastructure | Terraform (one-time) | N/A |
The separation matters because DAG files change frequently (daily), ETL code changes moderately (weekly), and infrastructure changes rarely (monthly). Each should deploy independently without affecting the others.
Why This Matters
Understanding the deployment model prevents two common mistakes:
- Unnecessary restarts — Restarting Airflow containers for a DAG change causes 1-2 minutes of downtime for no reason. DAG changes are hot-reloaded.
- Missing restarts — Updating requirements.txt without rebuilding the Docker image means the new package isn’t available. The DAG fails at runtime with an import error.
The rule is simple: if the change is to a Python file in the DAGs folder or to ETL code in a separate Docker image, no restart. If the change affects the Airflow infrastructure (Docker image, environment variables, Airflow version), restart.
Key Takeaways
- DAG = Recipe (what/when/order), ETL = Cooking (actual work)
- DAGs don’t touch data — they tell the worker “run this container now”
- Most deployments don’t need restart — DAG files are hot-reloaded and ETL images are pulled fresh on each run
- Only restart for Airflow image changes (version upgrade, new packages)
- ETL containers are ephemeral — they run, do work, exit, and get deleted
Takeaway
Airflow’s deployment model has two independent paths: DAG files (hot-reloaded, no restart) and infrastructure (requires restart). Knowing the boundary between them — DAGs and ETL code versus Docker images and environment variables — eliminates both unnecessary downtime and missed restarts. When in doubt, check: does this change affect the Airflow containers themselves, or just the instructions they follow?