brandonwie.dev

DAG Deployment Strategies

Different approaches to deploying Airflow DAGs, with trade-offs analysis.

Updated March 22, 2026 5 min read

When setting up Airflow on EC2 with Docker Compose, the first question was: how do DAG files get from the Git repository onto the running containers? Airflow documentation describes multiple approaches without recommending one. Picking the wrong strategy early means painful migration later as the team or infrastructure grows.

I evaluated four common approaches, chose the simplest one for our two-person team, and documented the decision tree so future-me (or future-team) knows when to migrate.

The Difficulty

No single recommended approach — Airflow docs describe several strategies but don’t recommend one for a given setup. I had to piece together trade-offs from blog posts, GitHub issues, and Helm chart defaults.

Conflating DAG deployment with code deployment — Most guides bundle deploying DAG Python files with deploying the Airflow application itself (Docker image). These are independent concerns.

Git-sync sidecar docs assume Kubernetes — The most-documented approach is Kubernetes-native. Translating it to Docker Compose on EC2 felt like forcing a pattern.

The Four Approaches

1. Full Git Repo on EC2

Clone the entire repository to the EC2 instance. Containers volume-mount the dags/ folder.

EC2 /opt/airflow/          ← Full Git repository
├── dags/                  ← DAG files
├── master/
│   └── docker-compose.yml
├── worker/
└── .git/

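Changes sync via git pull. With Airflow 2 defaults, the scheduler re-parses existing DAG files roughly every 30 seconds (min_file_process_interval), while brand-new files are found on the next directory scan (dag_dir_list_interval, 300 seconds by default). No container restart is needed.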
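As a rough sketch of that sync step, the helper below wraps git pull in Python. The function names and the --ff-only flag are assumptions for illustration, not part of the original setup; --ff-only only makes the pull fail loudly if the checkout on EC2 has diverged, instead of silently creating a merge commit.

```python
import subprocess
from typing import List


def build_pull_command(repo_dir: str) -> List[str]:
    """Return the git command that fast-forwards the checkout in repo_dir."""
    # -C runs git as if it were started in repo_dir.
    # --ff-only (an assumption, not from the post) refuses to create a merge commit.
    return ["git", "-C", repo_dir, "pull", "--ff-only"]


def pull_dags(repo_dir: str = "/opt/airflow") -> str:
    """Run the pull and return git's stdout; raises CalledProcessError on failure."""
    result = subprocess.run(
        build_pull_command(repo_dir),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Because the containers volume-mount dags/, nothing else has to happen after the pull: the scheduler reads the updated files on its next parse.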

Best for: Small teams (2-10), EC2-based, frequent DAG changes.

2. Docker Image with DAGs (Bake into Image)

Include DAG files in the Docker image at build time:

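# Dockerfile
# Base image: the official Airflow image (pin a specific version tag in practice)
FROM apache/airflow
COPY dags/ /opt/airflow/dags/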

DAG changes require an image rebuild and container restart. This gives you immutable, versioned deployments but slows iteration significantly.

Best for: Immutable infrastructure, strict versioning requirements.

3. Git-Sync Sidecar (Kubernetes Standard)

A separate git-sync container periodically pulls the repo and shares DAGs via a volume:

# Kubernetes Pod
containers:
  - name: scheduler
    image: airflow
  - name: git-sync # Separate container
    image: git-sync
    args: ["--repo=https://github.com/...", "--branch=main"]

This is the standard pattern for Kubernetes-based Airflow deployments. The sidecar handles pulling, and the scheduler sees updates without restart.

Best for: Kubernetes environments, large teams.

4. S3/EFS Sync

Upload DAGs to an S3 bucket or mount an EFS volume:

S3 bucket                    EC2
s3://airflow-dags/   ───►  /opt/airflow/dags/

AWS-native and works across regions, but adds infrastructure (S3 bucket or EFS mount) and introduces sync lag.

Best for: AWS-native workflows, multi-region deployments.

Comparison Matrix

| Criterion | Git Repo on EC2 | Bake into Image | Git-Sync | S3/EFS |
| --- | --- | --- | --- | --- |
| Setup complexity | Low | Low | Medium | Medium |
| DAG change speed | Fast (git pull) | Slow (rebuild) | Fast | Fast |
| Container restart | No | Yes | No | No |
| Extra infra | None | None | Sidecar | S3/EFS |
| Best environment | EC2 small team | Immutable infra | Kubernetes | AWS native |
| Team size | 2-10 | Any | Large | Medium-Large |

Our Choice: Full Git Repo on EC2

We chose the simplest option because:

  • Small team (2 people) on EC2, not Kubernetes
  • DAG changes are frequent and need fast iteration (seconds, not minutes)
  • Zero downtime is critical — no container restarts for DAG-only changes
  • Git provides built-in version control and instant rollback
  • The downsides (repo exposure, auth) are mitigated with deploy keys and .gitignore

Decision Tree

What's your infrastructure?
├─ Kubernetes
│   └─ Use Git-Sync Sidecar

├─ EC2 with small team (< 10)
│   └─ Use Full Git Repo on EC2

├─ Strict immutable requirements
│   └─ Use Bake into Image

└─ AWS-native, multi-region
    └─ Use S3/EFS Sync
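The tree above can also be written down as a small function, which makes the precedence between branches explicit. This is only a sketch: the Setup fields, the function name, and the order in which the branches are checked are assumptions, since the tree itself does not say which rule wins when several apply.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Setup:
    """Hypothetical description of an Airflow environment."""
    on_kubernetes: bool = False
    needs_immutable_deploys: bool = False
    multi_region: bool = False
    team_size: int = 2


def choose_strategy(setup: Setup) -> Optional[str]:
    """Map a setup to one of the four strategies, or None if no branch matches."""
    # Assumed precedence: hard infrastructure constraints first, team size last.
    if setup.on_kubernetes:
        return "Git-Sync Sidecar"
    if setup.needs_immutable_deploys:
        return "Bake into Image"
    if setup.multi_region:
        return "S3/EFS Sync"
    if setup.team_size < 10:
        return "Full Git Repo on EC2"
    return None  # The tree has no branch for a large single-region EC2 team.


print(choose_strategy(Setup(team_size=2)))
```

For the two-person EC2 setup described here, the call falls through to the last rule and selects the full Git repo approach.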

The Git Repo Approach in Detail

Deployment Flow

1. Developer edits DAG
   └─► git push origin main

2. GitHub Actions triggers
   └─► dags/ change detected

3. SSM command to EC2
   └─► cd /opt/airflow && git pull

4. Scheduler detects (~30 seconds)
   └─► New DAG parsed and ready

Container restart: NOT NEEDED
Downtime: NONE
Time to take effect: ~30 seconds
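Steps 2 and 3 of the flow could be sketched in Python roughly as follows. The helper names and the instance ID are hypothetical, and the actual pipeline runs in GitHub Actions, so treat this as an illustration of the logic rather than the real workflow. It assumes boto3 is installed and AWS credentials are available wherever deploy() actually sends a command; AWS-RunShellScript is the standard SSM document for running shell commands.

```python
from typing import Dict, Iterable, Optional

DAG_PREFIX = "dags/"


def touches_dags(changed_paths: Iterable[str]) -> bool:
    """Step 2: True if any changed file lives under dags/."""
    return any(path.startswith(DAG_PREFIX) for path in changed_paths)


def build_ssm_request(instance_id: str, repo_dir: str = "/opt/airflow") -> Dict:
    """Step 3: keyword arguments for the SSM send_command call."""
    return {
        "InstanceIds": [instance_id],
        "DocumentName": "AWS-RunShellScript",
        "Parameters": {"commands": [f"cd {repo_dir} && git pull"]},
    }


def deploy(instance_id: str, changed_paths: Iterable[str]) -> Optional[str]:
    """Send the pull command only when DAG files changed; return the command ID."""
    if not touches_dags(changed_paths):
        return None
    import boto3  # third-party; imported lazily so the helpers above work without it

    ssm = boto3.client("ssm")
    response = ssm.send_command(**build_ssm_request(instance_id))
    return response["Command"]["CommandId"]
```

Keeping the request construction separate from the AWS call makes the path filter and the command string easy to check without touching a real instance.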

Pros

| Advantage | Description |
| --- | --- |
| Simple | No extra infra (S3, EFS, git-sync container) |
| Fast deploy | Single git pull syncs DAGs |
| Familiar workflow | Standard Git-based deployment |
| Zero downtime | No container restart for DAG changes |
| Version control | Git history for DAG changes |
| Easy rollback | git checkout <commit> for instant rollback |
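The rollback itself is a single git command. A minimal sketch of a wrapper, assuming a hypothetical helper name and a simple hex check on the commit hash so that arbitrary strings never reach the command line:

```python
import re
from typing import List

# Abbreviated or full SHA-1 hash: 7 to 40 lowercase hex characters.
_COMMIT_RE = re.compile(r"[0-9a-f]{7,40}")


def build_rollback_command(repo_dir: str, commit: str) -> List[str]:
    """Return the git command that checks out a known-good commit in repo_dir."""
    if not _COMMIT_RE.fullmatch(commit):
        raise ValueError(f"not a commit hash: {commit!r}")
    # --detach makes it explicit that HEAD will no longer point at a branch.
    return ["git", "-C", repo_dir, "checkout", "--detach", commit]
```

One thing to keep in mind: after checking out a commit, HEAD is detached, so a plain git pull will fail until the branch is checked out again (for example with git checkout main).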

Cons and Mitigations

| Disadvantage | Description | Mitigation |
| --- | --- | --- |
| Git dependency | EC2 needs Git | One-time install from the Amazon Linux package repos (yum/dnf install git) |
| Full repo exposed | Unnecessary files on EC2 | Keep sensitive files out of the repo with .gitignore |
| Auth required | Private repo needs credentials | Deploy Key or HTTPS + PAT |
| Manual sync | Not auto-synced | CI/CD automation (SSM) |

Directory Convention

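/opt is the conventional Linux directory for add-on (third-party) software. The official Apache Airflow Docker image sets AIRFLOW_HOME=/opt/airflow (a plain pip install defaults to ~/airflow), so the host checkout mirrors that path: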

/opt        ← Third-party apps (Airflow, Jenkins, etc.)
/usr        ← System-installed software
/home       ← User home directories

When to Migrate

| Situation | Recommended Change |
| --- | --- |
| Kubernetes adoption | Git-Sync Sidecar |
| Security hardening | Bake into Image |
| Multi-region deployment | S3 + CloudFront |
| 10+ DAGs, team of 5+ | Git-Sync or S3 |

When NOT to Use Each Strategy

  • Full Git Repo on EC2 — Don’t use if the repository contains secrets that can’t be excluded via .gitignore, or if compliance requires immutable deployments with auditable image tags
  • Bake into Image — Don’t use if DAG iteration speed matters. Rebuilding and restarting containers for every change creates unacceptable feedback loops
  • Git-Sync Sidecar — Don’t use on plain EC2 or Docker Compose. The sidecar adds unnecessary complexity outside of Kubernetes
  • S3/EFS Sync — Don’t use if you need strict version control. S3 sync doesn’t provide atomic updates or rollback guarantees the way Git does

Takeaway

There’s no universally correct DAG deployment strategy — it depends on your infrastructure, team size, and iteration speed requirements. For small teams on EC2, a full Git repo with CI/CD-triggered git pull is the simplest approach with the fastest feedback loop. Know your decision tree and plan your migration path for when the team or infrastructure outgrows the current strategy.
