🌊 GitOps: Deep Dive & Best Practices

Concise, clear, and validated revision notes on GitOps (Git, GitHub, GitLab), structured for beginners and practitioners.

Table of Contents

  1. Introduction
  2. Core Concepts
  3. GitOps Principles
  4. Git Fundamentals
  5. GitHub and GitHub Actions
  6. GitLab CI/CD
  7. Repository Structure
  8. Branching Strategies
  9. CI/CD Pipeline Design
  10. Infrastructure as Code
  11. Deployment Strategies
  12. Security Best Practices
  13. Monitoring and Observability
  14. GitOps Tools
  15. Best Practices
  16. Common Pitfalls
  17. Jargon Tables

Introduction

GitOps is a modern operational framework that leverages Git as the single source of truth for declarative infrastructure and application code. It extends DevOps practices by using Git repositories to manage infrastructure configuration and application deployment, enabling teams to deliver software faster, more reliably, and with greater auditability.

Directory Structure Best Practices

Use Folders, Not Branches:

  • Avoid environment branches (dev, staging, prod)
  • Use directories to organize environments
  • Easier to see all variants simultaneously
  • Simpler promotion between environments

k8s/
├── base/                     # Common configuration
│   ├── deployment.yaml
│   ├── service.yaml
│   └── kustomization.yaml
└── overlays/
    ├── dev/
    │   ├── kustomization.yaml
    │   └── patch-replicas.yaml
    ├── staging/
    │   ├── kustomization.yaml
    │   └── patch-replicas.yaml
    └── prod/
        ├── kustomization.yaml
        └── patch-replicas.yaml

WET vs DRY Configuration:

DRY (Don’t Repeat Yourself): Use templates and generators

  • Pros: Less repetition, easier updates
  • Cons: Harder to review, requires processing

WET (Write Everything Twice): Explicit configuration files

  • Pros: Easy to review, no processing needed
  • Cons: More files, potential inconsistencies

Recommendation: Use WET for GitOps

  • Changes are visible in pull requests
  • No hidden logic or transformations
  • The GitOps agent (Config Sync in this example) applies exactly what’s in Git

Branching Strategies

Trunk-Based Development

Recommended for GitOps: Single main branch with short-lived feature branches.

Principles:

  • Main branch is always deployable
  • Feature branches live < 2 days
  • Small, incremental changes
  • Continuous integration
  • Feature flags for incomplete features
main ─────●─────●─────●─────●─────●─────
           \   /       \   /
            ● ●         ● ●
         feature-1   feature-2

Workflow:

# 1. Create feature branch
git checkout -b feature/add-health-check

# 2. Make small changes
vim deployment.yaml

# 3. Commit frequently
git add deployment.yaml
git commit -m "feat: add liveness probe to deployment"

# 4. Push and create PR immediately
git push origin feature/add-health-check

# 5. Merge quickly (within hours)
# 6. Delete the branch once it is merged (Git refuses to delete the branch you are on)
git checkout main
git pull
git branch -d feature/add-health-check

Environment Promotion

Use Directories, Not Branches:

configs/
├── dev/
│   └── app-config.yaml
├── staging/
│   └── app-config.yaml
└── prod/
    └── app-config.yaml

Promotion Process:

# 1. Test in dev
git checkout main
cd configs
# edit dev/app-config.yaml, commit, and verify the change in the dev environment

# 2. Promote to staging
cp dev/app-config.yaml staging/app-config.yaml
# adjust environment-specific values
git add staging/
git commit -m "chore: promote dev config to staging"
git push

# 3. After validation, promote to prod
cp staging/app-config.yaml prod/app-config.yaml
# adjust environment-specific values
git add prod/
git commit -m "chore: promote staging config to prod"
git push
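
The "copy, then adjust environment-specific values" step is easy to get wrong by hand. The following is a minimal sketch of that step in Python, assuming each config has already been loaded into a flat dictionary; the function name `promote` and the key names are hypothetical and only illustrate the idea.

```python
from copy import deepcopy


def promote(source: dict, target: dict, env_specific: set) -> dict:
    """Return the config for the next environment.

    Every key is taken from `source` except the ones listed in
    `env_specific`, which keep the value already set in `target`.
    """
    promoted = deepcopy(source)
    for key in env_specific:
        if key in target:
            promoted[key] = deepcopy(target[key])
        else:
            # The target never defined this key, so do not leak the source value.
            promoted.pop(key, None)
    return promoted


# Hypothetical example values for staging and prod.
staging = {"image": "myapp:1.4.0", "replicas": 2, "log_level": "debug"}
prod = {"image": "myapp:1.3.2", "replicas": 5, "log_level": "info"}

new_prod = promote(staging, prod, env_specific={"replicas", "log_level"})
print(new_prod)
```

Here the image version moves forward while the replica count and log level stay at their production values, which is the diff a reviewer would expect to see in the promotion pull request.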

Release Strategies

main ────────────●──────────●──────────●───
                /          /          /
release ───────●──────────●──────────●─────
              /          /          /
develop ─────●──────────●──────────●───────
            / \        / \        / \
feature ───●   ●──────●   ●──────●   ●─────

Why this multi-branch model (Git Flow style, shown above) is a poor fit for GitOps:

  • Multiple long-lived branches
  • Complex merge strategies
  • Cherry-picking required
  • Doesn’t match declarative model
main ─────●─────●─────●─────●─────●─────
           \   /       \   /       \   /
            ● ●         ● ●         ● ●
         feature-1   feature-2   feature-3

Workflow:

  1. Branch from main
  2. Make changes
  3. Create PR
  4. Review and test
  5. Merge to main
  6. Delete branch

CI/CD Pipeline Design

Pipeline Stages

┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐
│  Code   │───▶│  Build  │───▶│  Test   │───▶│ Deploy  │
└─────────┘    └─────────┘    └─────────┘    └─────────┘

1. Code Stage

Pre-commit Hooks:

#!/bin/bash
# .git/hooks/pre-commit (must be executable: chmod +x .git/hooks/pre-commit)

# Run linting; abort the commit if it fails
if ! npm run lint; then
    echo "Linting failed. Commit aborted."
    exit 1
fi

# Run tests; abort the commit if they fail
if ! npm test; then
    echo "Tests failed. Commit aborted."
    exit 1
fi

exit 0

Pre-push Hooks:

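#!/bin/bash
# .git/hooks/pre-push (must be executable: chmod +x .git/hooks/pre-push)

# Git passes one line per pushed ref on stdin:
#   <local ref> <local sha> <remote ref> <remote sha>
# Block the push if any of them targets main on the remote.
while read -r local_ref local_sha remote_ref remote_sha; do
    if [ "$remote_ref" = "refs/heads/main" ]; then
        echo "Direct push to main is not allowed."
        exit 1
    fi
done

exit 0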

2. Build Stage

build:
  stage: build
  script:
    # Build application
    - docker build -t $IMAGE:$CI_COMMIT_SHA .
    
    # Scan for vulnerabilities; --exit-code 1 fails the job when any are found
    - trivy image --exit-code 1 --severity HIGH,CRITICAL $IMAGE:$CI_COMMIT_SHA
    
    # Push to registry
    - docker push $IMAGE:$CI_COMMIT_SHA
    
    # Update image tag in GitOps repo
    - cd gitops-repo
    - kustomize edit set image app=$IMAGE:$CI_COMMIT_SHA
    - git commit -am "Update image to $CI_COMMIT_SHA"
    - git push

3. Test Stage

Test Types:

test:
  parallel:
    matrix:
      - TEST_TYPE: unit
      - TEST_TYPE: integration
      - TEST_TYPE: e2e
  script:
    - npm run test:$TEST_TYPE

Test Pyramid:

       /\
      /  \     E2E Tests (Few)
     /____\
    /      \   Integration Tests (Some)
   /________\
  /          \ Unit Tests (Many)
 /____________\

4. Deploy Stage

GitOps Deploy (Update manifest, agent applies):

deploy:
  stage: deploy
  script:
    - git clone https://gitlab.com/org/gitops-repo.git
    - cd gitops-repo
    - yq eval ".spec.template.spec.containers[0].image = \"$IMAGE:$TAG\"" -i deployment.yaml
    - git add deployment.yaml
    - git commit -m "Deploy $TAG to production"
    - git push

Pipeline Best Practices

1. Fail Fast

jobs:
  quick-checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Lint
        run: npm run lint
      - name: Type check
        run: npm run typecheck
  
  expensive-tests:
    needs: quick-checks  # Only run if quick checks pass
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4  # each job starts on a fresh runner
      - name: Integration tests
        run: npm run test:integration

2. Cache Dependencies

- name: Cache node modules
  uses: actions/cache@v4
  with:
    path: ~/.npm
    key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      ${{ runner.os }}-node-

3. Parallel Execution

test:
  strategy:
    matrix:
      suite: [unit, integration, e2e]
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - run: npm run test:${{ matrix.suite }}

4. Conditional Execution

deploy-staging:
  if: github.ref == 'refs/heads/develop'
  runs-on: ubuntu-latest
  steps:
    - run: ./deploy.sh staging

deploy-prod:
  if: github.event_name == 'release'
  runs-on: ubuntu-latest
  steps:
    - run: ./deploy.sh production

5. Manual Approval Gates

deploy-production:
  runs-on: ubuntu-latest
  environment:
    name: production
    url: https://example.com
  steps:
    - run: ./deploy.sh
  # Pauses for manual approval only if the "production" environment has required reviewers configured in GitHub

Infrastructure as Code

Terraform

Example Configuration:

# main.tf
terraform {
  required_version = ">= 1.0"
  
  backend "s3" {
    bucket = "terraform-state"
    key    = "prod/terraform.tfstate"
    region = "us-east-1"
  }
  
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.region
}

# VPC
module "vpc" {
  source = "./modules/vpc"
  
  cidr_block = "10.0.0.0/16"
  azs        = ["us-east-1a", "us-east-1b", "us-east-1c"]
  
  tags = {
    Environment = var.environment
    ManagedBy   = "Terraform"
  }
}

# EKS Cluster
module "eks" {
  source = "./modules/eks"
  
  cluster_name    = "my-cluster"
  cluster_version = "1.28"
  
  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets
  
  node_groups = {
    general = {
      desired_capacity = 3
      max_capacity     = 10
      min_capacity     = 2
      
      instance_types = ["t3.medium"]
      
      labels = {
        role = "general"
      }
    }
  }
}

# Outputs
output "cluster_endpoint" {
  value = module.eks.cluster_endpoint
}

output "cluster_name" {
  value = module.eks.cluster_name
}

Terraform Workflow:

# Initialize
terraform init

# Plan changes
terraform plan -out=tfplan

# Apply changes
terraform apply tfplan

# Destroy resources
terraform destroy

GitOps with Terraform:

# .github/workflows/terraform.yml
name: Terraform

on:
  push:
    branches: [ main ]
    paths:
      - 'terraform/**'
  pull_request:
    branches: [ main ]
    paths:
      - 'terraform/**'

jobs:
  terraform:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: terraform
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.6.0
      
      - name: Terraform Init
        run: terraform init
      
      - name: Terraform Format
        run: terraform fmt -check
      
      - name: Terraform Validate
        run: terraform validate
      
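      - name: Terraform Plan
        # A failed plan now stops the job instead of being ignored before apply
        run: terraform plan -no-color -out=tfplan

      - name: Terraform Apply
        if: github.ref == 'refs/heads/main' && github.event_name == 'push'
        # Apply exactly the plan that was produced above
        run: terraform apply -auto-approve tfplan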

Kubernetes Manifests

Plain YAML:

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
        version: v1.0.0
    spec:
      containers:
      - name: app
        image: myapp:v1.0.0
        ports:
        - containerPort: 8080
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: url
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: production
spec:
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
  type: LoadBalancer

Kustomize

Directory Structure:

k8s/
├── base/
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── configmap.yaml
│   └── kustomization.yaml
└── overlays/
    ├── dev/
    │   ├── kustomization.yaml
    │   ├── patch-replicas.yaml
    │   └── patch-resources.yaml
    ├── staging/
    │   ├── kustomization.yaml
    │   └── patch-replicas.yaml
    └── prod/
        ├── kustomization.yaml
        └── patch-replicas.yaml

Base Configuration:

# base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - deployment.yaml
  - service.yaml
  - configmap.yaml

commonLabels:
  app: my-app
  managedBy: kustomize

Dev Overlay:

# overlays/dev/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

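# "bases" is deprecated; list the base under "resources" instead
resources:
  - ../../base

namespace: dev

# Each entry in "patches" is an object with a "path" (or inline "patch") field
patches:
  - path: patch-replicas.yaml
  - path: patch-resources.yaml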

images:
  - name: myapp
    newTag: dev-latest

configMapGenerator:
  - name: app-config
    behavior: merge
    literals:
      - LOG_LEVEL=debug
      - ENVIRONMENT=development
# overlays/dev/patch-replicas.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1

Build and Apply:

# Build kustomization
kustomize build overlays/dev

# Apply to cluster
kustomize build overlays/dev | kubectl apply -f -

# Or use kubectl directly
kubectl apply -k overlays/dev

Helm

Chart Structure:

my-app/
├── Chart.yaml
├── values.yaml
├── templates/
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── ingress.yaml
│   ├── configmap.yaml
│   ├── secret.yaml
│   └── _helpers.tpl
└── charts/

Chart.yaml:

apiVersion: v2
name: my-app
description: A Helm chart for my application
type: application
version: 1.0.0
appVersion: "1.0.0"

values.yaml:

replicaCount: 3

image:
  repository: myapp
  tag: "1.0.0"
  pullPolicy: IfNotPresent

service:
  type: ClusterIP
  port: 80

ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: app.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: app-tls
      hosts:
        - app.example.com

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

Template:

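# templates/deployment.yaml
# The template expressions were lost in extraction and are reconstructed here.
# Helper names (my-app.fullname, my-app.labels, my-app.selectorLabels) are
# assumed to be defined in _helpers.tpl, following the "helm create" convention.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "my-app.fullname" . }}
  labels:
    {{- include "my-app.labels" . | nindent 4 }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      {{- include "my-app.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        {{- include "my-app.selectorLabels" . | nindent 8 }}
    spec:
      containers:
      - name: {{ .Chart.Name }}
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
        imagePullPolicy: {{ .Values.image.pullPolicy }}
        ports:
        - name: http
          containerPort: 8080
          protocol: TCP
        resources:
          {{- toYaml .Values.resources | nindent 10 }}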

Environment-Specific Values:

# values-dev.yaml
replicaCount: 1

image:
  tag: dev-latest

resources:
  requests:
    cpu: 50m
    memory: 64Mi

autoscaling:
  enabled: false
# values-prod.yaml
replicaCount: 5

image:
  tag: "1.0.0"

resources:
  requests:
    cpu: 200m
    memory: 256Mi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 20

Helm Commands:

# Install chart
helm install my-app ./my-app -f values-prod.yaml

# Upgrade
helm upgrade my-app ./my-app -f values-prod.yaml

# Rollback
helm rollback my-app 1

# Uninstall
helm uninstall my-app

# List releases
helm list

# Get values
helm get values my-app

Deployment Strategies

Rolling Update

Description: Gradually replace old pods with new ones.

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # Max pods above desired count
      maxUnavailable: 0  # Max pods that can be unavailable

Pros:

  • Zero downtime
  • Gradual rollout
  • Easy rollback

Cons:

  • Both versions run simultaneously
  • Slower than recreate
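
To see what `maxSurge` and `maxUnavailable` mean for capacity, the sketch below computes the bounds a rollout stays within. It is a simplified model, not Kubernetes code: the function names are hypothetical, and it relies on the documented rule that a percentage `maxSurge` rounds up while a percentage `maxUnavailable` rounds down.

```python
from typing import Tuple, Union

Count = Union[int, str]  # an absolute number of pods or a percentage such as "25%"


def resolve(value: Count, replicas: int, round_up: bool) -> int:
    """Turn an absolute count or a percentage string into a number of pods."""
    if isinstance(value, int):
        return value
    scaled = int(value.rstrip("%")) * replicas
    # Integer arithmetic avoids floating-point surprises when rounding.
    return -(-scaled // 100) if round_up else scaled // 100


def rollout_bounds(replicas: int, max_surge: Count, max_unavailable: Count) -> Tuple[int, int]:
    """Return (minimum available pods, maximum total pods) during the rollout."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    if surge == 0 and unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be zero")
    return replicas - unavailable, replicas + surge


# The settings from the manifest above, for a Deployment with 3 replicas.
print(rollout_bounds(3, max_surge=1, max_unavailable=0))
# Percentage form with 10 replicas.
print(rollout_bounds(10, max_surge="25%", max_unavailable="25%"))
```

With `maxSurge: 1` and `maxUnavailable: 0`, the rollout never drops below the desired replica count, at the cost of temporarily running one extra pod.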

Blue-Green Deployment

Description: Run two identical environments, switch traffic between them.

# Blue deployment (current)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
      version: blue
  template:
    metadata:
      labels:          # must match the selector above
        app: my-app
        version: blue
    spec:
      containers:
      - name: app
        image: myapp:v1.0.0

---
# Green deployment (new)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
      version: green
  template:
    metadata:
      labels:          # must match the selector above
        app: my-app
        version: green
    spec:
      containers:
      - name: app
        image: myapp:v1.1.0  # example tag for the new release

---
# Service points to active version
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app
    version: blue  # Switch to 'green' to cutover
  ports:
  - port: 80
    targetPort: 8080

Pros:

  • Instant rollback
  • Zero downtime
  • Full testing before cutover

Cons:

  • Double resources required
  • Database migrations complex

Canary Deployment

Description: Route small percentage of traffic to new version.

# Stable version (90% of traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-stable
spec:
  replicas: 9

---
# Canary version (10% of traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-canary
spec:
  replicas: 1

Using Istio:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-app
spec:
  hosts:
  - my-app
  http:
  - match:
    - headers:
        canary:
          exact: "true"
    route:
    - destination:
        host: my-app
        subset: canary
  - route:
    - destination:
        host: my-app
        subset: stable
      weight: 90
    - destination:
        host: my-app
        subset: canary
      weight: 10

Pros:

  • Reduced risk
  • Real user testing
  • Gradual rollout

Cons:

  • Complex setup
  • Monitoring required
  • Longer deployment time
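
The Istio VirtualService above applies two rules in order: requests carrying the header `canary: true` always go to the canary, and everything else is split 90/10. The sketch below simulates that decision in Python. It is only an illustration of the routing logic, not of how Istio implements it; the function name `pick_subset` is hypothetical.

```python
import random


def pick_subset(headers: dict, rng: random.Random, canary_weight: int = 10) -> str:
    """Mimic the two routes: header match first, then a weighted split."""
    if headers.get("canary") == "true":
        return "canary"
    # randrange(100) yields 0..99, so values below the weight go to the canary.
    return "canary" if rng.randrange(100) < canary_weight else "stable"


rng = random.Random(42)  # fixed seed so the run is repeatable
hits = [pick_subset({}, rng) for _ in range(10_000)]
canary_share = hits.count("canary") / len(hits)
print(f"canary share: {canary_share:.1%}")

# A tester can force the canary regardless of the weights.
print(pick_subset({"canary": "true"}, rng))
```

Raising `canary_weight` step by step while watching error rates is the manual version of what progressive delivery tools automate.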

Security Best Practices

1. Secrets Management

Never Commit Secrets to Git:

# .gitignore
.env
secrets.yaml
*.pem
*.key
credentials.json

Use External Secret Stores:

# Using External Secrets Operator
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: SecretStore
  target:
    name: db-credentials
    creationPolicy: Owner
  data:
  - secretKey: password
    remoteRef:
      key: prod/db/password

Sealed Secrets (Bitnami):

# Encrypt secret
kubeseal --format yaml < secret.yaml > sealed-secret.yaml

# Commit sealed secret to Git
git add sealed-secret.yaml
git commit -m "Add database credentials"
# sealed-secret.yaml (safe to commit)
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: db-credentials
spec:
  encryptedData:
    password: AgBHW3N2c3RoaW5nZW5jcnlwdGVkCg==

2. RBAC (Role-Based Access Control)

# Role
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: developer
  namespace: production
rules:
- apiGroups: ["", "apps"]
  resources: ["pods", "deployments"]
  verbs: ["get", "list", "watch"]

---
# RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developer-binding
  namespace: production
subjects:
- kind: User
  name: jane.doe@example.com
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer
  apiGroup: rbac.authorization.k8s.io

3. Pod Security

Pod Security Standards:

apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted

Security Context:

apiVersion: v1
kind: Pod
metadata:
  name: secure-pod
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 2000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: myapp:1.0.0
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
        - ALL

4. Network Policies

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
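
NetworkPolicies are additive: once any policy selects a pod for ingress, only traffic that some policy explicitly allows gets through. The sketch below models that rule for the two policies above. It is a deliberately simplified model with hypothetical class and function names; it ignores namespaces, IP blocks, and egress (note that the deny-all policy above also blocks egress, so the frontend would need its own egress rule to reach the backend).

```python
from dataclasses import dataclass, field


@dataclass
class IngressRule:
    from_labels: dict  # labels the source pod must carry ({} matches any pod)
    port: int


@dataclass
class Policy:
    pod_selector: dict  # {} selects every pod in the namespace
    ingress: list = field(default_factory=list)


def matches(selector: dict, labels: dict) -> bool:
    """A selector matches when every key/value pair is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def ingress_allowed(policies, src_labels: dict, dst_labels: dict, port: int) -> bool:
    selecting = [p for p in policies if matches(p.pod_selector, dst_labels)]
    if not selecting:
        return True  # no policy selects the pod, so it is not isolated
    return any(
        matches(rule.from_labels, src_labels) and rule.port == port
        for policy in selecting
        for rule in policy.ingress
    )


policies = [
    Policy(pod_selector={}),  # deny-all: selects everything, allows nothing
    Policy(
        pod_selector={"app": "backend"},
        ingress=[IngressRule(from_labels={"app": "frontend"}, port=8080)],
    ),
]

frontend, backend, other = {"app": "frontend"}, {"app": "backend"}, {"app": "batch"}
print(ingress_allowed(policies, frontend, backend, 8080))
print(ingress_allowed(policies, other, backend, 8080))
```

The first call is allowed by the second policy; the second call is rejected because no policy allows pods labelled `app: batch` to reach the backend.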

5. Image Security

Image Scanning:

# .github/workflows/security.yml
- name: Run Trivy scanner
  uses: aquasecurity/trivy-action@master
  with:
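    image-ref: ${{ env.IMAGE }}:${{ github.sha }}  # original expression was lost in extraction; these variable names are assumptions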
    format: 'sarif'
    output: 'trivy-results.sarif'
    severity: 'CRITICAL,HIGH'

- name: Upload to GitHub Security
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: 'trivy-results.sarif'

Image Signing (Cosign):

# Sign image
cosign sign --key cosign.key $IMAGE:$TAG

# Verify signature
cosign verify --key cosign.pub $IMAGE:$TAG

6. Audit Logging

# Enable audit logging in Kubernetes
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
  resources:
  - group: ""
    resources: ["secrets", "configmaps"]
- level: RequestResponse
  resources:
  - group: "apps"
    resources: ["deployments", "statefulsets"]

Monitoring and Observability

Metrics

Prometheus:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics

Key Metrics:

  • Application: Request rate, error rate, latency (RED)
  • Infrastructure: CPU, memory, disk, network (USE)
  • GitOps: Sync status, drift detection, reconciliation time

Logging

Structured Logging:

{
  "timestamp": "2025-01-15T10:30:00Z",
  "level": "info",
  "message": "Deployment successful",
  "service": "my-app",
  "version": "v1.2.3",
  "environment": "production",
  "user": "jane.doe@example.com"
}

Log Aggregation:

  • ELK Stack (Elasticsearch, Logstash, Kibana)
  • Loki (Grafana)
  • CloudWatch Logs (AWS)

Tracing

OpenTelemetry:

apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
data:
  config.yaml: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    
    processors:
      batch:
    
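    # The dedicated "jaeger" exporter was removed from recent Collector
    # releases; Jaeger accepts OTLP directly, so export over OTLP instead.
    exporters:
      otlp/jaeger:
        endpoint: jaeger:4317
        tls:
          insecure: true
    
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlp/jaeger]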

Alerting

# PrometheusRule
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-alerts
spec:
  groups:
  - name: my-app
    rules:
    - alert: HighErrorRate
      expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.05
      for: 10m
      labels:
        severity: critical
      annotations:
        summary: "High error rate detected"
        description: "Error rate is {{ $value }} requests/second"
    
    - alert: PodCrashLooping
      expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Pod {{ $labels.pod }} is crash looping"
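
The `for:` field in these rules is what keeps a short spike from paging anyone: the expression has to stay true for the whole duration before the alert moves from pending to firing. The sketch below models that behaviour under simplifying assumptions (one sample per evaluation, a hypothetical function name `firing_at`); it is not Prometheus code.

```python
def firing_at(samples, threshold: float, for_seconds: int):
    """Return the timestamp at which the alert starts firing, or None.

    `samples` is a list of (timestamp_seconds, value) pairs in time order,
    one per rule evaluation.
    """
    pending_since = None
    for timestamp, value in samples:
        if value > threshold:
            if pending_since is None:
                pending_since = timestamp  # condition became true: pending
            if timestamp - pending_since >= for_seconds:
                return timestamp  # held long enough: firing
        else:
            pending_since = None  # condition cleared: reset the clock
    return None


# Error rate stays above 0.05 for ten minutes: the alert fires.
sustained = [(t, 0.1) for t in range(0, 601, 60)]
print(firing_at(sustained, threshold=0.05, for_seconds=600))

# Same data with one healthy sample in the middle: the timer resets.
interrupted = [(t, 0.0 if t == 300 else 0.1) for t in range(0, 601, 60)]
print(firing_at(interrupted, threshold=0.05, for_seconds=600))
```

A longer `for:` reduces noise but delays the page by the same amount, so it is worth choosing per alert rather than copying one value everywhere.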

GitOps Tools

ArgoCD

Installation:

kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

Application Definition:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  
  source:
    repoURL: https://github.com/org/gitops-repo.git
    targetRevision: HEAD
    path: apps/my-app/overlays/prod
  
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
    - CreateNamespace=true
  
  ignoreDifferences:
  - group: apps
    kind: Deployment
    jsonPointers:
    - /spec/replicas
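
The `prune`, `selfHeal`, and `ignoreDifferences` settings above all shape one comparison: desired state from Git versus live state in the cluster. The sketch below shows that comparison in miniature. It is a conceptual model with hypothetical names and flat field dictionaries, not how ArgoCD computes diffs internally.

```python
def plan_sync(desired: dict, live: dict, prune: bool, ignored_fields=frozenset()):
    """Compare desired (Git) and live (cluster) state and list the actions needed.

    Both arguments map a resource name to a flat dict of its fields.
    Only fields declared in Git are compared, minus the ignored ones.
    """
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in live:
            actions.append(("create", name))
            continue
        drifted = any(
            live[name].get(key) != value
            for key, value in spec.items()
            if key not in ignored_fields
        )
        if drifted:
            actions.append(("update", name))
    if prune:
        # Resources that exist in the cluster but no longer in Git.
        for name in sorted(live.keys() - desired.keys()):
            actions.append(("delete", name))
    return actions


desired = {
    "deployment/my-app": {"image": "myapp:1.1.0", "replicas": 3},
    "service/my-app": {"port": 80},
}
live = {
    "deployment/my-app": {"image": "myapp:1.0.0", "replicas": 5},
    "configmap/old": {"key": "value"},
}

for action in plan_sync(desired, live, prune=True, ignored_fields={"replicas"}):
    print(action)
```

Ignoring `replicas` mirrors the `/spec/replicas` entry above: an autoscaler can change the replica count without the agent reporting drift or reverting it.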

ArgoCD CLI:

# Login
argocd login argocd.example.com

# List applications
argocd app list

# Get application details
argocd app get my-app

# Sync application
argocd app sync my-app

# Rollback
argocd app rollback my-app 0

Flux

Installation:

flux bootstrap github \
  --owner=myorg \
  --repository=fleet-infra \
  --branch=main \
  --path=./clusters/production \
  --personal

GitRepository:

apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: my-app
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/org/my-app
  ref:
    branch: main

Kustomization:

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: my-app
  namespace: flux-system
spec:
  interval: 5m
  path: ./k8s/overlays/prod
  prune: true
  sourceRef:
    kind: GitRepository
    name: my-app
  healthChecks:
  - apiVersion: apps/v1
    kind: Deployment
    name: my-app
    namespace: production

Flux CLI:

# Check Flux components
flux check

# Get kustomizations
flux get kustomizations

# Reconcile
flux reconcile kustomization my-app

# Suspend/resume
flux suspend kustomization my-app
flux resume kustomization my-app

Jenkins X

Installation:

jx boot

Pipeline Configuration:

# jenkins-x.yml
buildPack: none
pipelineConfig:
  pipelines:
    release:
      pipeline:
        stages:
        - name: build
          steps:
          - sh: docker build -t $DOCKER_REGISTRY/$APP_NAME:$VERSION .
        - name: test
          steps:
          - sh: make test
        - name: deploy
          steps:
          - sh: jx step helm apply

Comparison

| Feature           | ArgoCD          | Flux             | Jenkins X     |
|-------------------|-----------------|------------------|---------------|
| UI                | ✅ Rich Web UI  | ❌ Limited       | ✅ Web UI     |
| Multi-cluster     | ✅ Native       | ✅ Via Git repos | ✅ Native     |
| Helm Support      | ✅ Full         | ✅ Full          | ✅ Native     |
| Kustomize Support | ✅ Full         | ✅ Full          | ✅ Via plugin |
| SSO               | ✅ OIDC, LDAP   | ❌               | ✅ OAuth      |
| RBAC              | ✅ Fine-grained | ✅ K8s RBAC      | ✅ K8s RBAC   |
| Notifications     | ✅ Slack, Email | ✅ Slack, Email  | ✅ Multiple   |
| CI Integration    | ✅ Any CI       | ✅ Any CI        | ✅ Built-in   |
| Learning Curve    | Medium          | Low              | High          |

Best Practices

1. Git Repository Organization

Separate Concerns:

  • Application code repository
  • Infrastructure repository
  • Configuration repository

Benefits:

  • Different lifecycles
  • Different teams
  • Different security requirements
  • Different approval processes

org/
├── app-user-service/       # Application code
├── infrastructure/         # Terraform, CloudFormation
└── gitops-configs/         # K8s manifests, Helm values

2. Environment Management

Use Directories, Not Branches:

configs/
├── base/                   # Common configuration
└── environments/
    ├── dev/
    ├── staging/
    └── prod/

Environment Promotion:

# Promote staging to prod (run from the configs/ directory)
# --no-index compares the two directories with each other
git diff --no-index environments/staging environments/prod
cp environments/staging/app-config.yaml environments/prod/app-config.yaml
# adjust environment-specific values
git add environments/prod/
git commit -m "Promote staging config to prod"

3. Declarative Configuration

Always Use Declarative Syntax:

# Good - Declarative
apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 3

# Bad - Imperative
# kubectl scale deployment my-app --replicas=3

4. Version Everything

Tag Releases:

git tag -a v1.2.3 -m "Release version 1.2.3"
git push origin v1.2.3

Semantic Versioning:

  • MAJOR.MINOR.PATCH (1.2.3)
  • MAJOR: Breaking changes
  • MINOR: New features (backward compatible)
  • PATCH: Bug fixes
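
A release script usually needs to parse the current tag and compute the next one. The following is a minimal sketch of the MAJOR.MINOR.PATCH rules above; the `Version` class is hypothetical, and pre-release or build suffixes (such as `-rc.1`) are deliberately not handled.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, tag: str) -> "Version":
        """Parse "v1.2.3" or "1.2.3"."""
        major, minor, patch = tag.removeprefix("v").split(".")
        return cls(int(major), int(minor), int(patch))

    def bump(self, part: str) -> "Version":
        """Increment one part and reset the parts to its right."""
        if part == "major":
            return Version(self.major + 1, 0, 0)
        if part == "minor":
            return Version(self.major, self.minor + 1, 0)
        if part == "patch":
            return Version(self.major, self.minor, self.patch + 1)
        raise ValueError(f"unknown part: {part}")

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


current = Version.parse("v1.2.3")
print(current.bump("patch"))  # bug fix
print(current.bump("minor"))  # backward-compatible feature
print(current.bump("major"))  # breaking change

# Tuples compare numerically, so 1.10.0 sorts after 1.9.9.
print(Version.parse("1.10.0") > Version.parse("1.9.9"))
```

Comparing parsed versions rather than raw strings matters when picking the latest tag, since string comparison would place "1.10.0" before "1.9.9".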

5. Automated Testing

Test Infrastructure Code:

# .github/workflows/terraform-test.yml
name: Terraform Test

on:
  pull_request:
    paths:
      - 'terraform/**'

jobs:
  test:
    runs-on: ubuntu-latest
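    steps:
      - uses: actions/checkout@v4

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3

      - name: Terraform Format
        working-directory: terraform
        run: terraform fmt -check -recursive

      - name: Terraform Validate
        working-directory: terraform
        run: |
          terraform init -backend=false
          terraform validate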
      
      - name: TFLint
        uses: terraform-linters/setup-tflint@v3
      
      - name: Run TFLint
        run: tflint --recursive
      
      - name: Checkov Security Scan
        uses: bridgecrewio/checkov-action@master
        with:
          directory: terraform/

Test Kubernetes Manifests:

# .github/workflows/k8s-test.yml
name: Kubernetes Manifest Test

on:
  pull_request:
    paths:
      - 'k8s/**'

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup tools
        run: |
          curl -s https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh | bash
          sudo snap install kubeconform
      
      - name: Validate with kustomize
        run: |
          kustomize build k8s/overlays/prod > output.yaml
      
      - name: Validate with kubeconform
        run: |
          kubeconform -summary -output json output.yaml
      
      - name: Policy check with OPA
        uses: open-policy-agent/opa-action@v2
        with:
          tests: policies/

6. Security Practices

Scan for Secrets:

- name: Gitleaks scan
  uses: gitleaks/gitleaks-action@v2
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

Sign Commits:

# Configure GPG
git config --global user.signingkey YOUR_GPG_KEY
git config --global commit.gpgsign true

# Sign commits
git commit -S -m "Add deployment configuration"

Verify Commits:

git verify-commit HEAD

7. Documentation

README Template:

# Project Name

## Overview
Brief description of the project and its purpose.

## Architecture
High-level architecture diagram and explanation.

## Repository Structure

project/
├── apps/               # Application manifests
├── infrastructure/     # Infrastructure code
└── docs/               # Documentation

## Prerequisites
- Kubernetes 1.25+
- kubectl
- kustomize

## Deployment
Step-by-step deployment instructions.

## Monitoring
Links to dashboards and monitoring tools.

## Troubleshooting
Common issues and solutions.

## Contributing
Contribution guidelines.

8. Rollback Strategy

Keep Rollback Simple:

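# With Git (the canonical GitOps rollback: the agent applies the reverted state)
git revert HEAD
git push

# With Flux (after the revert is pushed, trigger an immediate reconcile)
flux reconcile kustomization my-app --with-source

# With ArgoCD (rollback requires automated sync to be disabled for the application)
argocd app rollback my-app 0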

Test Rollback Procedures:

  • Practice rollbacks regularly
  • Automate rollback triggers
  • Monitor rollback success

9. Change Management

Pull Request Template:

## Description
What does this PR do?

## Type of Change
- [ ] New feature
- [ ] Bug fix
- [ ] Configuration change
- [ ] Infrastructure change

## Impact Analysis
- [ ] Affects production
- [ ] Requires downtime
- [ ] Breaking change
- [ ] Rollback plan documented

## Testing
- [ ] Tested in dev
- [ ] Tested in staging
- [ ] Load testing completed
- [ ] Security review completed

## Deployment Plan
Step-by-step deployment instructions

## Rollback Plan
Step-by-step rollback instructions

## Checklist
- [ ] Documentation updated
- [ ] Monitoring alerts configured
- [ ] Team notified

10. Observability

Monitor GitOps Health:

# Prometheus metrics
- argocd_app_sync_total
- argocd_app_health_status
- gotk_reconcile_duration_seconds            # Flux controllers
- controller_runtime_reconcile_errors_total  # controller-runtime based operators

Dashboard Metrics:

  • Sync success rate
  • Time to sync
  • Drift detection count
  • Failed reconciliations
  • Deployment frequency
  • Lead time for changes
  • Mean time to recovery (MTTR)
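Two of the dashboard metrics above, sync success rate and MTTR, can be derived from a plain list of sync results. The following is a minimal sketch; the record format and the sample timestamps are assumptions made for illustration, not the output of any particular tool.

```python
from datetime import datetime, timedelta

# Hypothetical sync records: (finished_at, succeeded)
syncs = [
    (datetime(2024, 1, 1, 10, 0), True),
    (datetime(2024, 1, 1, 11, 0), False),
    (datetime(2024, 1, 1, 11, 30), True),
    (datetime(2024, 1, 1, 12, 0), True),
]


def sync_success_rate(records):
    """Fraction of syncs that succeeded."""
    return sum(1 for _, ok in records if ok) / len(records)


def mean_time_to_recovery(records):
    """Average time from the first failure of an outage to the next success."""
    outages, failed_at = [], None
    for finished_at, ok in sorted(records):
        if not ok and failed_at is None:
            failed_at = finished_at          # outage starts
        elif ok and failed_at is not None:
            outages.append(finished_at - failed_at)
            failed_at = None                 # outage ends
    return sum(outages, timedelta()) / len(outages) if outages else None


print(sync_success_rate(syncs))
print(mean_time_to_recovery(syncs))
```

In practice these values would usually come from Prometheus queries rather than application code; the sketch only shows how the numbers are defined.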

11. Disaster Recovery

Backup Strategy:

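# Backup cluster state ("all" covers only core workload types; ConfigMaps,
# Secrets, CRDs, and RBAC objects must be exported separately)
kubectl get all --all-namespaces -o yaml > cluster-backup.yaml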

# Backup ArgoCD applications
argocd app list -o yaml > argocd-apps-backup.yaml

Recovery Plan:

  1. Restore infrastructure (Terraform)
  2. Deploy GitOps operator
  3. Apply application definitions
  4. Verify sync status

12. Progressive Delivery

Canary with Flagger:

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: my-app
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  service:
    port: 80
  analysis:
    interval: 1m
    threshold: 10
    maxWeight: 50
    stepWeight: 5
    metrics:
    - name: request-success-rate
      thresholdRange:
        min: 99
      interval: 1m
    - name: request-duration
      thresholdRange:
        max: 500
      interval: 1m
    webhooks:
    - name: load-test
      url: http://load-tester.test/
      timeout: 5s
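The analysis settings above (stepWeight, maxWeight, threshold) describe a simple progression: traffic shifts to the canary in steps while checks pass, and the rollout is abandoned after too many failed checks. The sketch below is a simplified model of that idea, not Flagger's actual controller logic; the function name and return values are hypothetical.

```python
def run_canary(checks, step_weight=5, max_weight=50, threshold=10):
    """Walk through analysis results (True = metrics within range).

    Returns (status, canary weight), where status is "promoted",
    "rolled back", or "in progress".
    """
    weight, failures = 0, 0
    for passed in checks:
        if not passed:
            failures += 1
            if failures >= threshold:
                return "rolled back", 0   # too many failed checks
            continue                      # hold the current weight
        weight += step_weight             # shift more traffic to the canary
        if weight >= max_weight:
            return "promoted", weight
    return "in progress", weight


print(run_canary([True] * 10))
print(run_canary([True, True] + [False] * 10))
```

With the values from the manifest, ten consecutive passing checks at a one-minute interval are needed before promotion, so a clean rollout takes roughly ten minutes in this model.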

Common Pitfalls

1. Committing Secrets to Git

Problem: Secrets accidentally committed to repository.

Solution:

  • Use .gitignore
  • Use git-secrets or gitleaks
  • Use external secret management
  • Rotate exposed secrets immediately

# Install git-secrets hooks into the current repository
git secrets --install
git secrets --register-aws

# Scan repository
git secrets --scan

2. Direct Cluster Modifications

Problem: Manual kubectl commands bypass GitOps.

Solution:

  • Enforce RBAC policies
  • Use admission controllers
  • Audit cluster changes
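  • Educate team on GitOps workflow

# OPA policy (illustrative): deny requests not made by the GitOps agent.
# Flux v2 applies manifests as the kustomize-controller service account.
# As written this also blocks system controllers; a real policy must exempt them.
package kubernetes.admission

deny[msg] {
  input.request.userInfo.username != "system:serviceaccount:flux-system:kustomize-controller"
  msg := "Manual changes not allowed. Use GitOps."
}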

3. Not Testing Before Merge

Problem: Broken configurations merged to main.

Solution:

  • Require CI checks to pass
  • Use branch protection
  • Enable preview environments

# Branch protection
main:
  required_status_checks:
    - validate-manifests
    - security-scan
  required_reviews: 2
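The rule set above amounts to a simple gate: a merge is allowed only when every required check has passed and enough reviewers have approved. A minimal sketch of that decision, with a hypothetical `can_merge` helper (hosting platforms enforce this themselves; the code only illustrates the logic):

```python
REQUIRED_CHECKS = {"validate-manifests", "security-scan"}
REQUIRED_REVIEWS = 2


def can_merge(passed_checks: set, approvals: int) -> bool:
    """True only when every required check passed and enough reviewers approved."""
    return REQUIRED_CHECKS <= passed_checks and approvals >= REQUIRED_REVIEWS


print(can_merge({"validate-manifests", "security-scan"}, approvals=2))
print(can_merge({"validate-manifests"}, approvals=3))
```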

4. Ignoring Drift

Problem: Actual state diverges from desired state.

Solution:

  • Enable auto-sync
  • Monitor drift metrics
  • Set up alerts

# ArgoCD auto-sync
syncPolicy:
  automated:
    prune: true
    selfHeal: true

5. Poor Repository Structure

Problem: Difficult to navigate and maintain.

Solution:

  • Follow consistent structure
  • Document organization
  • Use clear naming conventions

6. Missing Rollback Plan

Problem: No clear way to revert changes.

Solution:

  • Document rollback procedures
  • Practice rollbacks
  • Keep rollback simple (git revert)

7. Inadequate Monitoring

Problem: Don’t know when deployments fail.

Solution:

  • Monitor GitOps metrics
  • Set up alerts
  • Integrate with incident management

8. Over-Complicated Pipelines

Problem: Complex pipelines are hard to maintain.

Solution:

  • Keep pipelines simple
  • Use reusable workflows
  • Document complex logic

9. Lack of Documentation

Problem: Team doesn’t understand workflows.

Solution:

  • Document processes
  • Create runbooks
  • Provide training

10. Not Using Environments Properly

Problem: Testing directly in production.

Solution:

  • Use dev/staging/prod environments
  • Test in lower environments first
  • Automate promotion

Jargon Tables

Table 1: GitOps Lifecycle Terminology

| GitOps Term | Alternative Terms | Definition | Context |
|---|---|---|---|
| Desired State | Target state, intended state | Configuration stored in Git | What you want |
| Actual State | Current state, live state, runtime state | Current configuration in cluster | What you have |
| Reconciliation | Sync, convergence, drift correction | Process of aligning actual with desired | Continuous process |
| Drift | Configuration drift, state divergence | Difference between desired and actual state | Problem detection |
| Sync | Synchronization, apply, deploy | Update actual state to match desired | Action |
| Pull-based | Agent-based, operator pattern | Agent pulls changes from Git | GitOps model |
| Push-based | Traditional CI/CD, pipeline deploy | Pipeline pushes to cluster | Traditional model |
| Declarative | Descriptive, state-based | Define what you want, not how | Configuration style |
| Imperative | Procedural, command-based | Define how to achieve state | Traditional approach |
| Manifest | Configuration file, resource definition | YAML/JSON describing resources | K8s terminology |
| GitOps Agent | Operator, controller, reconciler | Software monitoring and applying changes | ArgoCD, Flux |
| Source of Truth | Single source, canonical source | Authoritative configuration location | Git repository |
| Auto-sync | Automated sync, continuous deployment | Automatic application of changes | GitOps feature |
| Self-heal | Auto-remediation, drift correction | Automatic correction of manual changes | GitOps feature |
| Prune | Cleanup, deletion | Remove resources not in desired state | GitOps operation |

Table 2: Git Operations Terminology

| Git Term | Alternative Terms | Definition | Common Commands |
|---|---|---|---|
| Repository | Repo, project | Directory with Git history | git init, git clone |
| Commit | Revision, snapshot, changeset | Saved state of repository | git commit |
| Branch | Line of development | Parallel version of code | git branch, git checkout |
| Merge | Integration, combine | Integrate changes from branches | git merge |
| Pull Request | PR, merge request (GitLab) | Request to merge changes | GitHub/GitLab UI |
| Tag | Release tag, version tag | Named reference to commit | git tag |
| Push | Upload, publish | Send commits to remote | git push |
| Pull | Download, fetch+merge | Get changes from remote | git pull |
| Fetch | Retrieve, download | Get remote changes without merge | git fetch |
| Rebase | Reapply, replay commits | Move commits to new base | git rebase |
| Cherry-pick | Select commit | Apply specific commit | git cherry-pick |
| Stash | Temporary save | Save uncommitted changes | git stash |
| Reset | Undo, rewind | Move HEAD to different commit | git reset |
| Revert | Reverse, undo commit | Create new commit undoing changes | git revert |
| Remote | Repository URL | Remote repository reference | git remote |

Table 3: CI/CD Pipeline Stages

| Stage | Alternative Names | Purpose | Common Tools |
|---|---|---|---|
| Source | Code checkout, clone | Get code from repository | Git, GitHub, GitLab |
| Build | Compile, package | Create deployable artifacts | Docker, Maven, npm |
| Test | Validation, quality check | Verify code quality | Jest, pytest, JUnit |
| Security Scan | SAST, vulnerability scan | Identify security issues | Trivy, Snyk, SonarQube |
| Artifact Storage | Registry, repository | Store build artifacts | Docker Hub, ECR, Nexus |
| Deploy | Release, rollout | Deploy to environment | ArgoCD, Flux, Helm |
| Verify | Smoke test, health check | Confirm deployment success | curl, k8s probes |
| Promote | Environment progression | Move between environments | Git operations |

Table 4: Hierarchical GitOps Architecture

| Level | Component | Sub-Component | Purpose | Tools |
|---|---|---|---|---|
| 1 | Source Control | | Version control system | Git |
| | | Repository | Store configurations | GitHub, GitLab |
| | | Branch | Parallel development | Git branches |
| | | Pull Request | Code review mechanism | GitHub PR, GitLab MR |
| 2 | CI Pipeline | | Continuous Integration | GitHub Actions, GitLab CI |
| | | Build | Create artifacts | Docker build |
| | | Test | Validation | pytest, jest |
| | | Security | Vulnerability scanning | Trivy, Snyk |
| 3 | Artifact Registry | | Store build outputs | Container registries |
| | | Container Images | Docker images | Docker Hub, ECR, GCR |
| | | Helm Charts | K8s packages | Helm registry |
| 4 | GitOps Operator | | Sync engine | ArgoCD, Flux |
| | | Source Controller | Monitor Git repos | Flux Source Controller |
| | | Sync Controller | Apply changes | ArgoCD Application Controller |
| | | Health Assessment | Check resource status | Health checks |
| 5 | Target Environment | | Deployment destination | Kubernetes |
| | | Cluster | K8s cluster | EKS, GKE, AKS |
| | | Namespace | Logical separation | K8s namespaces |
| | | Workloads | Running applications | Deployments, StatefulSets |

Table 5: Deployment Strategy Comparison

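| Strategy | Speed | Risk | Downtime | Resource Cost | Rollback Speed | Use Case |
|---|---|---|---|---|---|---|
| Recreate | Fast | High | Yes | Low | Slow | Dev environments |
| Rolling Update | Medium | Medium | No | Low | Medium | Most applications |
| Blue-Green | Instant | Low | No | High (2x) | Instant | Critical services |
| Canary | Slow | Low | No | Medium | Fast | High-risk changes |
| A/B Testing | Slow | Low | No | Medium | N/A | Feature testing |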

Table 6: GitOps Tool Comparison

| Feature | ArgoCD | Flux | Jenkins X | Spinnaker |
|---|---|---|---|---|
| Architecture | Controller | Operator | Platform | Pipeline |
| UI | ✅ Rich | ⚠️ Basic | ✅ Good | ✅ Rich |
| Multi-tenant | ✅ Native | ✅ Via namespaces | ✅ Native | ✅ Native |
| Helm Support | ✅ Full | ✅ Full | ✅ Native | ✅ Full |
| Kustomize | ✅ Native | ✅ Native | ✅ Plugin | ✅ Plugin |
| SSO/OIDC | ✅ Yes | ❌ No | ✅ Yes | ✅ Yes |
| RBAC | ✅ Fine-grained | ✅ K8s RBAC | ✅ K8s RBAC | ✅ Fine-grained |
| Webhook Events | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
| Notifications | ✅ Multiple | ✅ Multiple | ✅ Multiple | ✅ Multiple |
| Progressive Delivery | ⚠️ Via Argo Rollouts | ✅ Via Flagger | ❌ No | ✅ Native |
| Learning Curve | Medium | Low | High | High |
| Community | Large | Large | Medium | Large |

Table 7: Infrastructure as Code Tools

| Tool | Language | Cloud Support | State Management | Use Case |
|---|---|---|---|---|
| Terraform | HCL | Multi-cloud | Remote backends | Universal IaC |
| Pulumi | TypeScript, Python, Go | Multi-cloud | Cloud storage | Code-first IaC |
| CloudFormation | YAML/JSON | AWS only | AWS managed | AWS native |
| Ansible | YAML | Multi-cloud | Stateless | Configuration management |
| Helm | YAML + Templates | Kubernetes | In-cluster | K8s packages |
| Kustomize | YAML + Overlays | Kubernetes | Stateless | K8s configuration |

Table 8: Security Components in GitOps

| Component | Purpose | Tools | Integration Point |
|---|---|---|---|
| Secret Management | Secure credentials | Sealed Secrets, External Secrets | Git repository |
| Image Scanning | Vulnerability detection | Trivy, Snyk, Clair | CI pipeline |
| Policy Enforcement | Compliance checks | OPA, Kyverno, Gatekeeper | Admission controller |
| RBAC | Access control | K8s RBAC, IAM | Cluster |
| Network Policies | Traffic control | Calico, Cilium | Kubernetes |
| Audit Logging | Change tracking | K8s audit, Git history | Multiple |
| Signing | Artifact verification | Cosign, Notary | Container registry |
| SAST | Code analysis | SonarQube, CodeQL | CI pipeline |
SASTCode analysisSonarQube, CodeQLCI pipeline

Complete GitOps Workflow Example

Scenario: Deploy New Application Version

Step 1: Developer Makes Changes

# Create feature branch
git checkout -b feature/update-version

# Update application code
vim src/app.py

# Update Docker image version
vim k8s/base/deployment.yaml

# Commit changes
git add .
git commit -m "feat: update application to v1.2.0"

# Push to remote
git push origin feature/update-version

Step 2: Create Pull Request

# GitHub Actions runs automatically
name: CI Pipeline
on:
  pull_request:
    branches: [ main ]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Lint code
        run: npm run lint
      
      - name: Run tests
        run: npm test
      
      - name: Build Docker image
        run: docker build -t myapp:pr-$ .
      
      - name: Scan image
        run: trivy image myapp:pr-$
      
      - name: Validate K8s manifests
        run: kustomize build k8s/overlays/prod | kubeconform -

Step 3: Code Review and Approval

# PR Review Checklist
- [ ] Code follows style guidelines
- [ ] Tests pass
- [ ] Security scan clean
- [ ] Documentation updated
- [ ] Approved by 2 reviewers

Step 4: Merge to Main

# After approval, merge the PR. With a protected main branch this is done
# in the GitHub/GitLab UI; the local equivalent is:
git checkout main
git merge feature/update-version
git push origin main

Step 5: CI/CD Pipeline Builds and Pushes

name: Build and Deploy
on:
  push:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Build image
        run: docker build -t myapp:$ .
      
      - name: Push to registry
        run: docker push myapp:$
      
      - name: Update GitOps repo
        run: |
          git clone https://github.com/org/gitops-repo.git
          cd gitops-repo/k8s/overlays/prod
          kustomize edit set image myapp=myapp:$
          git commit -am "Deploy myapp:$"
          git push

Step 6: GitOps Agent Syncs

# ArgoCD detects change in Git
# Reconciliation loop:
# 1. Fetch latest from Git
# 2. Compare with cluster state
# 3. Apply differences
# 4. Monitor health

# View sync status
argocd app get myapp

Step 7: Verification

# Check deployment
kubectl get deployment myapp -n production

# Check pods
kubectl get pods -n production -l app=myapp

# Check logs
kubectl logs -n production -l app=myapp --tail=100

# Verify health
curl https://myapp.example.com/health

Step 8: Monitoring

# Prometheus alerts fire if issues detected
- alert: DeploymentFailed
  expr: kube_deployment_status_replicas_unavailable > 0
  for: 5m

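- alert: HighErrorRate
  # Ratio of 5xx responses to all responses, so 0.05 means 5% of requests
  expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
  for: 10m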

Step 9: Rollback (if needed)

# Option 1: Git revert
git revert HEAD
git push

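# Option 2: ArgoCD rollback (requires automated sync to be disabled)
argocd app rollback myapp 0

# Option 3: Manual kubectl (emergency only; this creates drift that self-heal will revert)
kubectl rollout undo deployment/myapp -n production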


Summary

GitOps is a powerful operational framework that leverages Git as the single source of truth for declarative infrastructure and applications. By treating infrastructure and application configuration as code stored in Git repositories, teams can achieve:

Key Benefits

  • Increased Velocity: Faster deployments with automated pipelines
  • Improved Stability: Declarative configurations reduce errors
  • Enhanced Security: Audit trails, RBAC, and secret management
  • Better Collaboration: Git-based workflows enable code review
  • Disaster Recovery: Complete system state in Git enables easy restoration
  • Compliance: Full audit trail of all changes

Core Principles

  1. Declarative: System’s desired state described declaratively
  2. Versioned: All configuration stored in Git with full history
  3. Automated: Software agents automatically apply desired state
  4. Reconciled: Continuous monitoring and drift correction

Essential Components

  • Git: Version control and source of truth
  • CI/CD: Automated pipelines for building and testing
  • GitOps Agent: ArgoCD, Flux, or similar tools
  • Kubernetes: Target platform for deployments
  • IaC Tools: Terraform, Helm, Kustomize

Best Practices

  • Separate application code from configuration
  • Use directories, not branches, for environments
  • Implement comprehensive testing
  • Secure secrets with external management
  • Monitor GitOps metrics and health
  • Document processes and maintain runbooks
  • Practice rollback procedures regularly

GitOps represents a paradigm shift in how we manage and deploy applications, bringing the best practices of software development to operations. By embracing GitOps, teams can build more reliable, secure, and scalable systems.

What is GitOps?

GitOps treats infrastructure and application configuration as code, stored in Git repositories. All changes to infrastructure and applications are made through Git commits and pull requests, triggering automated processes that synchronize the desired state (in Git) with the actual state (in production).

Key Characteristics

  • Declarative: Define the desired state of your system rather than imperative instructions
  • Versioned and Immutable: All changes are tracked in Git with complete history
  • Automatically Applied: Automated agents continuously reconcile actual state with desired state
  • Continuously Reconciled: Systems self-heal by detecting and correcting drift

When to Use GitOps

✅ Ideal For:

  • Kubernetes and container orchestration
  • Cloud-native applications
  • Microservices architectures
  • Infrastructure as Code (IaC) deployments
  • Multi-environment management
  • Teams requiring audit trails and compliance

❌ Not Ideal For:

  • Legacy monolithic applications without automation
  • Simple static websites
  • One-off deployments without version control needs

Core Concepts

Single Source of Truth

Git serves as the canonical source for both application code and infrastructure configuration. Every aspect of your system’s desired state is stored in Git repositories.

Benefits:

  • Complete audit trail of all changes
  • Easy rollback to any previous state
  • Clear separation of concerns
  • Disaster recovery capabilities

# Example: Kubernetes deployment stored in Git
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-application
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-application
  template:
    metadata:
      labels:
        app: my-application
    spec:
      containers:
      - name: app
        image: myapp:v1.2.3
        ports:
        - containerPort: 8080

Declarative Configuration

Describe what you want, not how to achieve it. The system determines the necessary steps to reach the desired state.

Imperative vs Declarative:

# Imperative (how to do it)
kubectl create namespace production
kubectl create deployment my-app --image=myapp:1.0
kubectl scale deployment my-app --replicas=3
kubectl expose deployment my-app --port=8080

# Declarative (what you want)
kubectl apply -f production-deployment.yaml

Continuous Reconciliation

Automated agents continuously monitor the actual state and compare it with the desired state in Git. Any drift is automatically corrected.

Reconciliation Loop:

  1. Observe: Monitor actual state of infrastructure
  2. Compare: Check against desired state in Git
  3. Detect Drift: Identify differences
  4. Remediate: Automatically apply changes to align states
  5. Repeat: Continuously monitor
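The loop above can be sketched in a few lines. This is a toy model under stated assumptions: desired and actual state are plain dictionaries keyed by resource name, and `plan`, `reconcile`, and `Action` are hypothetical names, not the API of ArgoCD or Flux.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    verb: str   # "create", "update", or "delete"
    name: str


def plan(desired: dict, actual: dict) -> list:
    """Compare desired state (from Git) with actual state and list the fixes."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(Action("create", name))
        elif actual[name] != spec:
            actions.append(Action("update", name))   # drift detected
    for name in actual:
        if name not in desired:
            actions.append(Action("delete", name))   # prune
    return actions


def reconcile(desired: dict, actual: dict) -> dict:
    """Apply the plan so that actual converges on desired; returns the new state."""
    result = dict(actual)
    for action in plan(desired, actual):
        if action.verb == "delete":
            del result[action.name]
        else:
            result[action.name] = desired[action.name]
    return result


desired = {"my-app": {"replicas": 3}}
actual = {"my-app": {"replicas": 5}, "old-job": {"replicas": 1}}
print(plan(desired, actual))
actual = reconcile(desired, actual)
print(plan(desired, actual))  # an empty plan means the states match
```

A real agent runs this comparison on a timer or in response to events, and the "apply" step talks to the cluster API instead of mutating a dictionary.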

Pull vs Push Deployment

Traditional Push Model (CI/CD):

  • CI/CD pipeline pushes changes to production
  • Requires cluster credentials in CI/CD system
  • Pipeline has write access to production

GitOps Pull Model:

  • Agent inside cluster pulls changes from Git
  • No external system needs cluster access
  • Improved security posture
  • Self-healing capabilities

┌─────────────┐         ┌─────────────┐         ┌─────────────┐
│  Developer  │────────▶│   Git Repo  │◀────────│  GitOps     │
│             │  commit │             │  pull   │  Agent      │
└─────────────┘         └─────────────┘         └──────┬──────┘
                                                       │
                                                       │ apply
                                                       ▼
                                                ┌─────────────┐
                                                │  Kubernetes │
                                                │   Cluster   │
                                                └─────────────┘

GitOps Principles

1. Declarative Description

The entire system’s desired state is described declaratively in a format that machines can parse and understand (YAML, JSON, HCL, etc.).

Example - Terraform Configuration:

resource "aws_instance" "web_server" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"
  
  tags = {
    Name = "WebServer"
    Environment = "Production"
  }
}

2. Versioned and Immutable

All desired state is stored in a version control system that provides versioning, immutability, and audit trails.

Git Provides:

  • Complete change history
  • Author attribution
  • Timestamps
  • Commit messages explaining changes
  • Ability to revert to any previous state

# View change history
git log --oneline --graph

# See what changed
git diff HEAD~1 deployment.yaml

# Revert to previous version
git revert abc123
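Part of what makes Git history effectively immutable is content addressing: an object's ID is a hash of its content, so any change produces a different ID. A minimal sketch of how a blob ID is derived, assuming the default SHA-1 object format (`git_blob_id` is a hypothetical helper name):

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object ID Git assigns to file content (SHA-1 object format)."""
    header = b"blob %d\x00" % len(content)   # "blob <size>\0" prefix
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b"replicas: 3\n"))
print(git_blob_id(b"replicas: 5\n"))  # different content, different ID
```

The result should match `git hash-object` for the same bytes. Commits hash their tree and parent IDs in the same way, which is why rewriting an old commit changes every ID after it.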

3. Pulled Automatically

Software agents automatically pull the desired state declarations from Git and apply them to the infrastructure.

GitOps Agents:

  • ArgoCD: Kubernetes-native continuous delivery
  • Flux: GitOps operator for Kubernetes
  • Jenkins X: Cloud-native CI/CD for Kubernetes
  • Terraform Cloud: IaC automation platform

4. Continuously Reconciled

Software agents continuously observe actual system state and attempt to apply the desired state.

Drift Detection and Correction:

# Desired state in Git: 3 replicas
spec:
  replicas: 3

# Actual state: 5 replicas (manually scaled)
# GitOps agent detects drift and corrects to 3 replicas

Git Fundamentals

Git Basics

Git is a distributed version control system that tracks changes in source code during software development.

Key Concepts

Repository: Directory containing your project files and Git metadata

# Initialize new repository
git init

# Clone existing repository
git clone https://github.com/username/repo.git

Commit: Snapshot of your repository at a specific point in time

# Stage files
git add filename.yaml

# Commit with message
git commit -m "Add production deployment configuration"

# View commit history
git log

Branch: Parallel version of your repository

# Create new branch
git branch feature/new-deployment

# Switch to branch
git checkout feature/new-deployment

# Create and switch in one command
git checkout -b feature/new-deployment

# List branches
git branch -a

Merge: Integrate changes from one branch into another

# Merge feature branch into main
git checkout main
git merge feature/new-deployment

Tag: Named reference to a specific commit (often used for releases)

# Create tag
git tag -a v1.0.0 -m "Release version 1.0.0"

# Push tags to remote
git push origin --tags

# List tags
git tag -l

Git Workflow

# 1. Update local repository
git pull origin main

# 2. Create feature branch
git checkout -b feature/update-deployment

# 3. Make changes to files
vim deployment.yaml

# 4. Stage changes
git add deployment.yaml

# 5. Commit changes
git commit -m "Update deployment replicas to 5"

# 6. Push to remote
git push origin feature/update-deployment

# 7. Create pull request (on GitHub/GitLab)
# 8. Review and merge
# 9. Delete feature branch
git branch -d feature/update-deployment

Git Configuration

# Set user information
git config --global user.name "Your Name"
git config --global user.email "your.email@example.com"

# Set default branch name
git config --global init.defaultBranch main

# Configure editor
git config --global core.editor "vim"

# View configuration
git config --list

Git Best Practices

Commit Messages

Good Commit Messages:

Add Kubernetes deployment for user service

- Configure 3 replicas for high availability
- Set resource limits: 500m CPU, 512Mi memory
- Add health checks on /health endpoint

Bad Commit Messages:

update
fix stuff
changes

Conventional Commits Format:

<type>(<scope>): <subject>

<body>

<footer>

Types:

  • feat: New feature
  • fix: Bug fix
  • docs: Documentation changes
  • style: Code style changes (formatting)
  • refactor: Code refactoring
  • test: Adding tests
  • chore: Maintenance tasks

Example:

feat(deployment): add horizontal pod autoscaling

Configure HPA to scale between 3-10 replicas based on CPU
utilization target of 70%. This improves application availability
during traffic spikes.

Closes #123
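A commit-message check is easy to automate, for example in a commit-msg hook or a CI step. The sketch below validates only the header line against the types listed above; the regular expression and the `parse_header` helper are illustrative assumptions, not part of the Conventional Commits specification itself.

```python
import re

# Types listed in the section above; the scope is optional.
TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")
HEADER = re.compile(
    r"^(?P<type>" + "|".join(TYPES) + r")(\((?P<scope>[^()\s]+)\))?: (?P<subject>\S.*)$"
)


def parse_header(message: str):
    """Return (type, scope, subject) for a valid first line, else None."""
    lines = message.splitlines()
    first_line = lines[0] if lines else ""
    match = HEADER.match(first_line)
    if match is None:
        return None
    return match["type"], match["scope"], match["subject"]


print(parse_header("feat(deployment): add horizontal pod autoscaling"))
print(parse_header("fix stuff"))  # None: no recognised "<type>: " prefix
```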

Branching Hygiene

# Keep branches short-lived
# Delete merged branches
git branch -d feature/completed-feature

# Prune remote-tracking branches
git fetch --prune

# Clean up merged local branches (keeps the current branch and main/master/develop)
git branch --merged | grep -vE '^\*|^[[:space:]]*(main|master|develop)$' | xargs -n 1 git branch -d

GitHub and GitHub Actions

GitHub Basics

GitHub is a web-based hosting service for Git repositories with collaboration features.

Repository Structure

my-project/
├── .github/
│   ├── workflows/          # GitHub Actions workflows
│   │   ├── ci.yml
│   │   └── deploy.yml
│   ├── CODEOWNERS          # Code review assignments
│   └── dependabot.yml      # Dependency updates
├── .gitignore              # Files to ignore
├── README.md               # Project documentation
├── LICENSE                 # License file
└── src/                    # Application code

GitHub Features for GitOps

Pull Requests: Code review and collaboration mechanism

# Example: PR template (.github/pull_request_template.md)
## Description
Brief description of changes

## Type of Change
- [ ] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Documentation update

## Testing
- [ ] Unit tests pass
- [ ] Integration tests pass
- [ ] Manual testing completed

## Checklist
- [ ] Code follows style guidelines
- [ ] Self-review completed
- [ ] Documentation updated

Protected Branches: Enforce code quality standards

# Branch protection rules
main:
  required_reviews: 2
  require_code_owner_review: true
  dismiss_stale_reviews: true
  require_status_checks: true
  required_status_checks:
    - ci/lint
    - ci/test
    - ci/security-scan
  enforce_admins: true
  restrict_pushes: true

Code Owners: Automatic reviewer assignment

# .github/CODEOWNERS
# Note: real entries reference teams as @org/team-name; the short handles below are placeholders.
# Global owners
* @team-leads

# Infrastructure files
/terraform/ @platform-team @sre-team
/kubernetes/ @platform-team

# Application code
/src/ @dev-team

# Documentation
/docs/ @doc-team @dev-team
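One detail worth knowing when ordering CODEOWNERS entries: when several patterns match a file, the last matching one takes precedence. The sketch below models that rule for the example file, handling only `*` and root-anchored directory patterns; real CODEOWNERS matching supports more gitignore-style syntax, and `owners_for` is a hypothetical helper.

```python
# Rules in file order; the handles are the ones from the example above.
RULES = [
    ("*", ["@team-leads"]),
    ("/terraform/", ["@platform-team", "@sre-team"]),
    ("/kubernetes/", ["@platform-team"]),
    ("/src/", ["@dev-team"]),
    ("/docs/", ["@doc-team", "@dev-team"]),
]


def owners_for(path: str) -> list:
    """Return the owners from the last rule that matches (later rules win)."""
    matched = []
    for pattern, owners in RULES:
        if pattern == "*" or ("/" + path).startswith(pattern):
            matched = owners
    return matched


print(owners_for("terraform/main.tf"))
print(owners_for("README.md"))
```

This is why the catch-all `*` entry sits at the top of the file: placed last, it would override every more specific rule.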

GitHub Actions

GitHub Actions is a CI/CD platform integrated directly into GitHub.

Workflow Components

Workflow: Automated process defined in YAML

Event: Triggers that start workflows (push, pull_request, schedule, etc.)

Job: Set of steps executed on the same runner

Step: Individual task (run command, use action)

Action: Reusable unit of code

Runner: Server that executes workflows

Basic Workflow Structure

# .github/workflows/ci.yml
name: CI Pipeline

# Events that trigger workflow
on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]

# Environment variables
env:
  NODE_VERSION: '18'

# Jobs to run
jobs:
  test:
    name: Run Tests
    runs-on: ubuntu-latest
    
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'npm'
      
      - name: Install dependencies
        run: npm ci
      
      - name: Run linter
        run: npm run lint
      
      - name: Run tests
        run: npm test
      
      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage/coverage.xml

Advanced Workflow Features

Matrix Builds: Test across multiple configurations

jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
        node-version: [16, 18, 20]
        exclude:
          - os: macos-latest
            node-version: 16
    
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js ${{ matrix.node-version }}
        uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
      - run: npm ci
      - run: npm test
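To see how many jobs a matrix produces, it helps to expand it by hand: the job list is the cross product of all axes minus any combination matched by an `exclude` entry. A small sketch of that expansion, where `expand` is a hypothetical helper rather than anything GitHub provides:

```python
from itertools import product

matrix = {
    "os": ["ubuntu-latest", "windows-latest", "macos-latest"],
    "node-version": [16, 18, 20],
}
exclude = [{"os": "macos-latest", "node-version": 16}]


def expand(matrix, exclude):
    """Return one dict per job: the cross product minus excluded combinations."""
    keys = list(matrix)
    combos = [dict(zip(keys, values)) for values in product(*matrix.values())]
    return [
        combo for combo in combos
        if not any(all(combo.get(k) == v for k, v in rule.items()) for rule in exclude)
    ]


jobs = expand(matrix, exclude)
print(len(jobs))  # 8 jobs: 3 x 3 combinations minus the one excluded
```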

Conditional Execution:

steps:
  - name: Deploy to production
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    run: ./deploy.sh production
  
  - name: Deploy to staging
    if: github.ref == 'refs/heads/develop'
    run: ./deploy.sh staging

Secrets Management:

steps:
  - name: Deploy application
    env:
      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    run: |
      aws configure set aws_access_key_id $AWS_ACCESS_KEY_ID
      aws configure set aws_secret_access_key $AWS_SECRET_ACCESS_KEY
      ./deploy.sh

Caching Dependencies:

steps:
  - uses: actions/checkout@v4
  
  - name: Cache dependencies
    uses: actions/cache@v3
    with:
      path: ~/.npm
      key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
      restore-keys: |
        ${{ runner.os }}-node-
  
  - name: Install dependencies
    run: npm ci
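Cache keys for dependencies typically combine the runner's operating system with a hash of the lockfile, so the cache is reused until the dependencies change. The sketch below approximates that idea with a single SHA-256 digest; `cache_key` is a hypothetical helper, and GitHub's own `hashFiles` function combines per-file hashes differently, so the values will not match.

```python
import hashlib
from pathlib import Path


def cache_key(runner_os: str, lockfile: Path) -> str:
    """Build '<os>-node-<sha256 of lockfile>' so the key changes with dependencies."""
    digest = hashlib.sha256(lockfile.read_bytes()).hexdigest()
    return f"{runner_os}-node-{digest}"
```

Calling `cache_key("Linux", Path("package-lock.json"))` before and after adding a dependency yields two different keys, which forces a fresh install, while the shared `Linux-node-` prefix is what a restore key would fall back on.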

Reusable Workflows:

# .github/workflows/reusable-deploy.yml
name: Reusable Deploy Workflow

on:
  workflow_call:
    inputs:
      environment:
        required: true
        type: string
      version:
        required: true
        type: string
    secrets:
      deploy-key:
        required: true

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: ${{ inputs.environment }}
    steps:
      - uses: actions/checkout@v4
      - name: Deploy
        run: ./deploy.sh ${{ inputs.environment }} ${{ inputs.version }}
        env:
          DEPLOY_KEY: ${{ secrets.deploy-key }}

# .github/workflows/main.yml
name: Deploy Application

on:
  push:
    branches: [ main ]

jobs:
  deploy-staging:
    uses: ./.github/workflows/reusable-deploy.yml
    with:
      environment: staging
      version: $
    secrets:
      deploy-key: $
  
  deploy-production:
    needs: deploy-staging
    uses: ./.github/workflows/reusable-deploy.yml
    with:
      environment: production
      version: $
    secrets:
      deploy-key: $

Parallel Jobs:

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm run lint
  
  test-unit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm run test:unit
  
  test-integration:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm run test:integration
  
  deploy:
    needs: [lint, test-unit, test-integration]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh
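Jobs without a `needs` entry start in parallel, and a job that lists dependencies waits for all of them. The resulting execution order is a topological sort of the dependency graph, which the following sketch computes with Python's standard `graphlib` module (the `stages` helper is a hypothetical name):

```python
from graphlib import TopologicalSorter

# Job name -> jobs it needs (mirrors the workflow above).
needs = {
    "lint": [],
    "test-unit": [],
    "test-integration": [],
    "deploy": ["lint", "test-unit", "test-integration"],
}


def stages(needs):
    """Group jobs into waves; every job in a wave can run in parallel."""
    sorter = TopologicalSorter(needs)
    sorter.prepare()                      # raises CycleError on circular needs
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(stages(needs))
```

For this workflow the result is two waves: the three check jobs together, then `deploy` on its own.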

Service Containers:

jobs:
  test:
    runs-on: ubuntu-latest
    
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: postgres
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
        ports:
          - 5432:5432
      
      redis:
        image: redis:7
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
        ports:
          - 6379:6379
    
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        env:
          DATABASE_URL: postgresql://postgres:postgres@localhost:5432/testdb
          REDIS_URL: redis://localhost:6379
        run: npm test

Self-Hosted Runners:

jobs:
  build:
    runs-on: [self-hosted, linux, x64, gpu]
    steps:
      - uses: actions/checkout@v4
      - name: Build with GPU
        run: ./build-with-cuda.sh

Complete CI/CD Workflow Example

name: Complete CI/CD Pipeline

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]
  release:
    types: [published]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  # Code quality checks
  lint:
    name: Lint Code
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '18'
          cache: 'npm'
      
      - name: Install dependencies
        run: npm ci
      
      - name: Run ESLint
        run: npm run lint
      
      - name: Run Prettier
        run: npm run format:check
  
  # Security scanning
  security:
    name: Security Scan
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'fs'
          scan-ref: '.'
          format: 'sarif'
          output: 'trivy-results.sarif'
      
      - name: Upload to GitHub Security
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: 'trivy-results.sarif'
  
  # Unit and integration tests
  test:
    name: Run Tests
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: [16, 18, 20]
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup Node.js ${{ matrix.node-version }}
        uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
          cache: 'npm'
      
      - name: Install dependencies
        run: npm ci
      
      - name: Run tests
        run: npm test -- --coverage
      
      - name: Upload coverage
        if: matrix.node-version == '18'
        uses: codecov/codecov-action@v3
  
  # Build and push Docker image
  build:
    name: Build Image
    needs: [lint, security, test]
    runs-on: ubuntu-latest
    if: github.event_name != 'pull_request'
    permissions:
      contents: read
      packages: write
    
    outputs:
      image-tag: $
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      
      - name: Log in to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      
      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=ref,event=branch
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}
            type=sha
      
      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
  
  # Deploy to staging
  deploy-staging:
    name: Deploy to Staging
    needs: build
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/develop'
    environment:
      name: staging
      url: https://staging.example.com
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Update Kubernetes manifests
        run: |
          cd k8s/staging
          kustomize edit set image app=${{ needs.build.outputs.image-tag }}
      
      - name: Commit changes
        run: |
          git config user.name github-actions
          git config user.email github-actions@github.com
          git add .
          git commit -m "Deploy $ to staging"
          git push
  
  # Deploy to production
  deploy-production:
    name: Deploy to Production
    needs: build
    runs-on: ubuntu-latest
    if: github.event_name == 'release'
    environment:
      name: production
      url: https://example.com
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Update Kubernetes manifests
        run: |
          cd k8s/production
          kustomize edit set image app=${{ needs.build.outputs.image-tag }}
      
      - name: Commit changes
        run: |
          git config user.name github-actions
          git config user.email github-actions@github.com
          git add .
          git commit -m "Deploy $ to production"
          git push

GitLab CI/CD

GitLab Overview

GitLab is a complete DevOps platform with built-in CI/CD capabilities.

GitLab CI/CD Configuration

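GitLab reads pipeline configuration from a .gitlab-ci.yml file in the repository root. The file declares the stages, and each job names the stage it runs in.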

# .gitlab-ci.yml
stages:
  - build
  - test
  - deploy

variables:
  DOCKER_REGISTRY: registry.gitlab.com
  IMAGE_NAME: $CI_REGISTRY_IMAGE

# Build stage
build:
  stage: build
  image: docker:latest
  services:
    - docker:dind
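  before_script:
    # Authenticate to the GitLab Container Registry; without this the push below is rejected
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
  script:
    - docker build -t $IMAGE_NAME:$CI_COMMIT_SHA .
    - docker push $IMAGE_NAME:$CI_COMMIT_SHA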
  only:
    - main
    - develop

# Test stage
test:
  stage: test
  image: node:18
  cache:
    paths:
      - node_modules/
  script:
    - npm ci
    - npm run lint
    - npm test
  coverage: '/Lines\s*:\s*(\d+\.\d+)%/'
  artifacts:
    reports:
      junit: junit.xml
      coverage_report:
        coverage_format: cobertura
        path: coverage/cobertura-coverage.xml

# Deploy to staging
deploy-staging:
  stage: deploy
  image: bitnami/kubectl:latest
  script:
    - kubectl set image deployment/my-app app=$IMAGE_NAME:$CI_COMMIT_SHA
    - kubectl rollout status deployment/my-app
  environment:
    name: staging
    url: https://staging.example.com
  only:
    - develop

# Deploy to production
deploy-production:
  stage: deploy
  image: bitnami/kubectl:latest
  script:
    - kubectl set image deployment/my-app app=$IMAGE_NAME:$CI_COMMIT_SHA
    - kubectl rollout status deployment/my-app
  environment:
    name: production
    url: https://example.com
  when: manual
  only:
    - main
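
The only keyword used above still works, but GitLab's documentation steers new pipelines toward rules. As a rough sketch, the production job could be written that way, assuming the same branch name and manual gate as the example above:

```yaml
# Sketch: deploy-production expressed with `rules` instead of `only`.
# Job name, image, and commands mirror the example above.
deploy-production:
  stage: deploy
  image: bitnami/kubectl:latest
  script:
    - kubectl set image deployment/my-app app=$IMAGE_NAME:$CI_COMMIT_SHA
    - kubectl rollout status deployment/my-app
  environment:
    name: production
    url: https://example.com
  rules:
    # Offer the job only for pipelines on main, and wait for a manual trigger.
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual
```

One behavioural difference to be aware of: a manual job declared through rules blocks later stages by default, so add allow_failure: true if the pipeline should be allowed to finish without it.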

GitLab Features

Auto DevOps: Automated CI/CD pipeline

Container Registry: Built-in Docker registry

Kubernetes Integration: Native K8s deployment

Security Scanning: SAST, DAST, dependency scanning

Merge Requests: Code review process

Protected Branches: Enforce merge requirements
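
Most of the scanning features listed above are switched on by including templates that ship with GitLab. The fragment below is a minimal sketch rather than a recommended baseline; which scanners actually run depends on the GitLab tier and on the languages detected in the repository.

```yaml
# Sketch: enabling GitLab-maintained scanners through bundled templates.
# Assumes the pipeline defines a `test` stage, which these templates use by default.
include:
  - template: Security/SAST.gitlab-ci.yml
  - template: Security/Dependency-Scanning.gitlab-ci.yml
  - template: Security/Secret-Detection.gitlab-ci.yml
```

DAST is left out here because it needs a running target URL and additional variables before it can do anything useful.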


Repository Structure

Separation of Concerns

Best Practice: Separate application code from infrastructure configuration.

Reasons:

  1. Different lifecycles and release cadences
  2. Different teams and approval processes
  3. Configuration changes shouldn't trigger app rebuilds
  4. Security and access control separation

Repository Organization

Pattern 1: Monorepo for Small Projects

project/
├── src/                      # Application code
│   ├── frontend/
│   ├── backend/
│   └── shared/
├── infrastructure/           # Infrastructure code
│   ├── terraform/
│   │   ├── modules/
│   │   ├── environments/
│   │   │   ├── dev/
│   │   │   ├── staging/
│   │   │   └── prod/
│   │   └── main.tf
│   └── kubernetes/
│       ├── base/
│       └── overlays/
│           ├── dev/
│           ├── staging/
│           └── prod/
├── .github/
│   └── workflows/
└── docs/
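
In a monorepo, application and infrastructure changes arrive through the same pushes, so path filters are one way to keep configuration changes from triggering app rebuilds. The workflow below is a sketch under assumptions: the file name app-ci.yml is hypothetical, and the build steps assume a Node.js project with package.json at the repository root, as in the earlier examples.

```yaml
# .github/workflows/app-ci.yml (hypothetical file name)
name: App CI

on:
  push:
    branches: [main]
    paths:
      - 'src/**'                         # application code only
      - '.github/workflows/app-ci.yml'   # re-run when this workflow changes
  pull_request:
    paths:
      - 'src/**'

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Assumed build commands; replace with the project's own.
      - run: npm ci
      - run: npm test
```

A change that touches only infrastructure/ does not match any of these paths, so this workflow is skipped for it. A second workflow filtered on 'infrastructure/**' can handle those changes separately.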

Pattern 2: Multi-Repo for Enterprise

Application Repository:

app-user-service/
├── src/
├── tests/
├── Dockerfile
├── .github/
│   └── workflows/
│       ├── ci.yml
│       └── build.yml
└── README.md

Infrastructure Repository:

infrastructure/
├── terraform/
│   ├── modules/
│   │   ├── vpc/
│   │   ├── eks/
│   │   └── rds/
│   └── environments/
│       ├── dev/
│       ├── staging/
│       └── prod/
├── .github/
│   └── workflows/
│       └── terraform.yml
└── README.md

Configuration Repository (GitOps):

gitops-config/
├── clusters/
│   ├── dev/
│   │   ├── apps/
│   │   ├── infrastructure/
│   │   └── system/
│   ├── staging/
│   └── prod/
├── apps/
│   ├── user-service/
│   │   ├── base/
│   │   │   ├── deployment.yaml
│   │   │   ├── service.yaml
│   │   │   └── kustomization.yaml
│   │   └── overlays/
│   │       ├── dev/
│   │       ├── staging/
│   │       └── prod/
│   └── payment-service/
└── README.md
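
To make the base/overlays split concrete, here is a sketch of what apps/user-service/overlays/prod/kustomization.yaml might contain. The namespace, registry path, tag, and Deployment name are all hypothetical. The image name app is assumed to match the image reference in base/deployment.yaml, in line with the kustomize edit set image app=... step shown earlier.

```yaml
# apps/user-service/overlays/prod/kustomization.yaml (illustrative)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

# Reuse the shared manifests instead of copying them per environment.
resources:
  - ../../base

namespace: user-service-prod            # hypothetical namespace

# Pin the image for this environment; CI typically rewrites newTag.
images:
  - name: app                           # image name used in base (assumed)
    newName: ghcr.io/example-org/user-service   # hypothetical registry path
    newTag: "1.4.2"                     # hypothetical version

# Override the replica count for production only.
replicas:
  - name: user-service                  # Deployment name in base (assumed)
    count: 3
```

Promotion between environments then comes down to a reviewed change to newTag in the target overlay, which a GitOps controller picks up from Git.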

Pattern 3: Package Repository

Platform Repository (Fleet-wide config):

platform-config/
├── cluster-config/
│   ├── namespaces/
│   ├── rbac/
│   └── network-policies/
├── shared-services/
│   ├── ingress-nginx/
│   ├── cert-manager/
│   └── monitoring/
└── policies/
    ├── pod-security/
    └── network/
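
As an illustration of the kind of fleet-wide manifest that could live under cluster-config/network-policies/, the sketch below denies all ingress traffic to pods in one namespace until more specific policies allow it. The namespace name is hypothetical, and the policy only takes effect on clusters whose network plugin enforces NetworkPolicy.

```yaml
# cluster-config/network-policies/default-deny-ingress.yaml (hypothetical file name)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: team-a        # hypothetical namespace
spec:
  podSelector: {}          # empty selector matches every pod in the namespace
  policyTypes:
    - Ingress              # no ingress rules listed, so all inbound traffic is denied
```

Keeping baseline policies like this in the platform repository lets the platform team review them independently of application changes.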


This post is licensed under CC BY 4.0 by the author.