DevOps Blog

$ terraform -chdir=Infra/dev show # the stack when lab-on

This is the stack I built for this personal project. It is not running right now: it sits at scale-to-zero by default and is scaled up on demand to prove the wiring still holds.

cluster at a glance

KNOB	VALUE
region	eu-west-1 (Ireland)
vpc	10.0.0.0/16 · 2 AZs · no NAT gateway
cluster	devopsblog-dev-cluster · k8s 1.34 · public API
nodes	t3.medium · min=0, desired=0, max=2
addons	coredns, kube-proxy, vpc-cni
tfstate	s3://tfstate-devopsblog2 · DynamoDB lock · encrypted

IRSA — how the cluster talks to AWS without static keys

IAM roles bound via the EKS OIDC provider, each scoped to one ServiceAccount. No long-lived IAM user, no secret stuffed in a ConfigMap.

ROLE	TRUSTS SA	PURPOSE
AmazonEKSLoadBalancerControllerRole-devopsblog-dev	kube-system / aws-load-balancer-controller	ingress → ALB provisioning
devopsblog-dev-s3-reader	devopsblog-dev / devopsblog-s3	Flask pod reads posts/ from S3
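
The trust relationship behind each row can be sketched as a policy document. This is a minimal sketch, not the Terraform from the repo: the account ID and OIDC provider ID are made-up placeholders, and the function name is hypothetical. The `sub` condition is what scopes a role to exactly one ServiceAccount.

```python
import json


def irsa_trust_policy(account_id: str, oidc_provider: str,
                      namespace: str, service_account: str) -> dict:
    """Build an IAM trust policy that one ServiceAccount can assume via OIDC."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    # Token must be issued for STS ...
                    f"{oidc_provider}:aud": "sts.amazonaws.com",
                    # ... and must belong to this exact namespace/ServiceAccount.
                    f"{oidc_provider}:sub":
                        f"system:serviceaccount:{namespace}:{service_account}",
                }
            },
        }],
    }


# Placeholder values: the account ID and OIDC provider ID are invented.
OIDC = "oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLE0123456789"
policy = irsa_trust_policy("123456789012", OIDC, "devopsblog-dev", "devopsblog-s3")
print(json.dumps(policy, indent=2))
```

Any other ServiceAccount, even in the same namespace, fails the `sub` match and cannot assume the role.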

# cost-at-24x7 estimate: the EKS control plane alone is ~$73/mo. Add one t3.medium, an ALB, and data transfer, and it comes to roughly $110–140/mo just to keep a blog warm. The static stack costs under a dollar a month, so running the cluster 24/7 is not worth it for a personal project. (Infra/prod/ also exists in Terraform but is not provisioned in the current account; it is a leftover from a prior AWS account.)
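
The arithmetic behind that estimate can be reproduced in a few lines. The EKS control-plane rate is the standard $0.10/hour; the node and ALB rates are assumed eu-west-1 list prices, and data transfer and ALB capacity units are left out, so treat the total as a rough floor.

```python
HOURS_PER_MONTH = 730  # average hours in a month

# Hourly on-demand prices in USD.
HOURLY = {
    "eks_control_plane": 0.10,
    "t3_medium_node": 0.0456,  # ASSUMPTION: eu-west-1 list price, verify before relying on it
    "alb_base_fee": 0.0252,    # ASSUMPTION: eu-west-1 list price, excludes LCU charges
}

monthly = {name: round(rate * HOURS_PER_MONTH, 2) for name, rate in HOURLY.items()}
total = round(sum(monthly.values()), 2)

for name, cost in monthly.items():
    print(f"{name:<20} ${cost:>7.2f}/mo")
print(f"{'total':<20} ${total:>7.2f}/mo")
```

With these assumed rates the total lands inside the $110–140 range quoted above, before any data transfer.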

lab-on / lab-off — scale-to-zero helpers

$ lab-on.sh      # EKS nodes: min=0, desired=0 → desired=1; ArgoCD syncs workloads
$ lab-off.sh     # back to desired=0, $0 compute until next lab-on
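
What the two helpers boil down to is one change to the node group's scaling config. The sketch below is not the content of lab-on.sh or lab-off.sh; it assumes the scaling is done through the EKS UpdateNodegroupConfig API, and the node group name `devopsblog-dev-nodes` is hypothetical.

```python
def scaling_request(cluster: str, nodegroup: str, desired: int,
                    min_size: int = 0, max_size: int = 2) -> dict:
    """Build the keyword arguments for EKS UpdateNodegroupConfig."""
    if not min_size <= desired <= max_size:
        raise ValueError(f"desired={desired} outside [{min_size}, {max_size}]")
    return {
        "clusterName": cluster,
        "nodegroupName": nodegroup,
        "scalingConfig": {
            "minSize": min_size,
            "maxSize": max_size,
            "desiredSize": desired,
        },
    }


def apply_scaling(request: dict, region: str = "eu-west-1") -> None:
    """Send the request to AWS. Needs boto3 and credentials, so it is not called here."""
    import boto3  # imported lazily so building the request works without AWS access
    boto3.client("eks", region_name=region).update_nodegroup_config(**request)


# "devopsblog-dev-nodes" is a hypothetical node group name.
LAB_ON = scaling_request("devopsblog-dev-cluster", "devopsblog-dev-nodes", desired=1)
LAB_OFF = scaling_request("devopsblog-dev-cluster", "devopsblog-dev-nodes", desired=0)
print(LAB_ON["scalingConfig"], LAB_OFF["scalingConfig"])
```

Keeping min=0 in both directions is what lets lab-off drop compute to nothing without touching the control plane.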

$ gh workflow list # two pipelines, one app-repo, one gitops-repo

Two pipelines, distinct triggers, distinct blast radii:

application pipeline — app repo → app-ci.yaml builds image, pushes to ECR by digest, opens a PR on the GitOps repo bumping the dev Deployment to the new digest → ArgoCD picks it up.
content pipeline — markdown push → content-ci.yaml syncs to the S3 content bucket under dev/, then dispatches content_validation on the GitOps repo.
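
The digest bump in the application pipeline is a small text rewrite of the Deployment manifest. This is a sketch of that step under assumptions: the function name, the registry account ID, and the regex-based approach are illustrative, not what app-ci.yaml necessarily does.

```python
import re

DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")


def pin_digest(manifest: str, repository: str, digest: str) -> str:
    """Rewrite every `image:` line for `repository` so it references `digest`."""
    if not DIGEST_RE.fullmatch(digest):
        raise ValueError(f"not a sha256 digest: {digest!r}")
    # Match the repository followed by an optional :tag or @digest, then whitespace or end.
    pattern = re.compile(
        r"(image:\s*)" + re.escape(repository) + r"(?:[:@]\S+)?(?=\s|$)"
    )
    pinned, count = pattern.subn(
        lambda m: f"{m.group(1)}{repository}@{digest}", manifest
    )
    if count == 0:
        raise ValueError(f"no image line found for {repository}")
    return pinned


# Placeholder registry host; the account ID is invented.
REPO = "123456789012.dkr.ecr.eu-west-1.amazonaws.com/devopsblog-app"
before = f"    containers:\n      - name: web\n        image: {REPO}:latest\n"
after = pin_digest(before, REPO, "sha256:" + "a" * 64)
print(after)
```

Pinning by digest rather than tag means the commit in the GitOps repo identifies exactly one image, so a revert restores exactly the previous one.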

application repo — .github/workflows/

WORKFLOW	TRIGGER	DOES
pr-validation.yaml	PR to main	SonarCloud scan + quality gate, Snyk SCA (fail-on-HIGH), Snyk container scan (fail-on-HIGH)
app-ci.yaml	push to main (code paths)	build image, push to ECR, open PR on GitOps repo updating dev Deployment to the new digest
content-ci.yaml	push to main (content/)	sync markdown to S3 content-bucket/dev/, dispatch content_validation on GitOps
autobot-merge.yaml	PR opened / reopened	GitHub App enables auto-merge once required checks pass

gitops repo — .github/workflows/

WORKFLOW	TRIGGER	DOES
dev-validation.yaml	push to master (dev/**)	wait-for-healthy on /health, OWASP ZAP baseline DAST, then commit prod Deployment with the validated digest
content_validation.yaml	workflow_dispatch (from app repo)	smoke-test dev, ZAP baseline against dev
copy-content.yaml	on content_validation success	S3 sync dev/ → prod/, write promotion stamp
prod-validation.yaml	push to master (prod/**)	smoke-test prod, ZAP baseline against prod
autobot-merge-gitops.yaml	PR opened / reopened	auto-merge GitOps PRs once checks pass (Socket + validation)
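
The wait-for-healthy step in dev-validation.yaml amounts to polling /health until it answers or a budget runs out. A minimal sketch of that loop, assuming a plain HTTP 200 means healthy; the attempt count, delay, and function names are illustrative.

```python
import time
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if `url` answers with HTTP 200, False on any network or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError and timeouts all derive from OSError
        return False


def wait_for_healthy(probe: Callable[[], bool], attempts: int = 30,
                     delay: float = 10.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `probe` up to `attempts` times, sleeping `delay` seconds between failures."""
    for attempt in range(1, attempts + 1):
        if probe():
            return True
        if attempt < attempts:
            sleep(delay)
    return False


# Usage (hypothetical URL, not executed here):
#   ok = wait_for_healthy(lambda: http_probe("https://dev.example.invalid/health"))
```

Running the ZAP baseline only after this returns True avoids scanning a rollout that has not finished.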

content flow (markdown → dev → prod)

 content/*.md ──▶ content-ci.yaml ──▶ s3://.../dev/
                                           │
                                           ▼
                                 dispatch content_validation
                                           │
                             ┌─────────────┴─────────────┐
                             ▼                           ▼
                      smoke (curl /health)         ZAP baseline DAST
                             └─────────────┬─────────────┘
                                           ▼
                                   on success:
                              copy-content.yaml syncs
                               s3://.../dev/ → prod/
                                           ▼
                              prod-validation runs ZAP
                                    against prod
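
The gate in the middle of that flow can be sketched as a pure function: nothing is copied unless every check passed, and the copy is a prefix swap. The function name and the shape of the `checks` mapping are assumptions for illustration; the real workflows do this with GitHub Actions triggers and an S3 sync.

```python
from typing import Dict, List, Tuple


def plan_promotion(dev_keys: List[str], checks: Dict[str, bool],
                   src: str = "dev/", dst: str = "prod/") -> List[Tuple[str, str]]:
    """Return (source, destination) key pairs, or raise if any check failed."""
    failed = sorted(name for name, ok in checks.items() if not ok)
    if failed:
        raise RuntimeError("promotion blocked by: " + ", ".join(failed))
    # Only keys under the source prefix are promoted; the prefix is swapped.
    return [(key, dst + key[len(src):]) for key in dev_keys if key.startswith(src)]


plan = plan_promotion(
    ["dev/posts/hello.md", "dev/posts/eks.md"],
    {"smoke": True, "zap_baseline": True},
)
for source, target in plan:
    print(f"{source} -> {target}")
```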

$ argocd app list # app-of-apps, pull-based, self-healing

Delivery is pull-based. Nothing in CI runs kubectl apply. ArgoCD sits inside the cluster and reconciles what's actually running against the desired state in devopsblog-gitops.

 devopsblog-application  ──┐
   (source + Dockerfile)   │
                           ▼
                   GHA: app-ci.yaml
                   ├─ build image
                   ├─ push to ECR (by digest)
                   └─ open PR on devopsblog-gitops
                           │
                           ▼
 devopsblog-gitops ──────── ArgoCD (in-cluster) ─────▶ dev namespace
   (k8s manifests,          │                          (Deployment, SA,
    one commit = one        │                           Service, Ingress)
    desired state)          └── automated sync, self-heal, prune

app-of-apps pattern

argocd-apps-root-dev/ — the root Application, bootstrapped once into ArgoCD by hand.
argocd-apps-dev/ — child Applications the root points at (currently: devopsblog-dev).
dev/devopsblog-dev/ — the actual manifests: Deployment, Service, Ingress, ConfigMap, two ServiceAccounts (ALB + S3).
dev-alb-controller/ — the AWS Load Balancer Controller install.

sync policy (every child Application)

automated — ArgoCD applies new commits without a human clicking sync.
self-heal — manual kubectl edit on a managed resource is reverted back to git.
prune — resources deleted from git are deleted from the cluster.
drift detection — every resource is either Synced or flagged OutOfSync with a diff against git.
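
The four behaviours above fit in one toy reconcile function. This is a deliberately simplified model for illustration, not how ArgoCD is implemented: resources are reduced to a name mapped to a spec, and the function and parameter names are made up.

```python
from typing import Dict, List, Tuple


def reconcile(desired: Dict[str, dict], live: Dict[str, dict],
              self_heal: bool = True, prune: bool = True
              ) -> Tuple[Dict[str, dict], List[str], List[str]]:
    """Return (new live state, out-of-sync names, names missing from git)."""
    # Drift detection: anything absent or different from git is OutOfSync.
    out_of_sync = sorted(name for name in desired if live.get(name) != desired[name])
    # Resources running in the cluster that git no longer declares.
    extra = sorted(name for name in live if name not in desired)

    result = dict(live)
    if self_heal:
        for name in out_of_sync:
            result[name] = desired[name]  # git wins over manual edits
    if prune:
        for name in extra:
            del result[name]              # deleted in git, deleted in cluster
    return result, out_of_sync, extra


git = {"deployment/blog": {"replicas": 1}, "service/blog": {"port": 80}}
cluster = {"deployment/blog": {"replicas": 3}, "configmap/old": {"k": "v"}}
new_state, drifted, orphaned = reconcile(git, cluster)
print(drifted, orphaned)
```

After one pass the live state equals git again, which is why a manual `kubectl edit` does not stick.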

why this over kubectl apply from CI

Rollback is git revert. No CI-held credential with cluster-admin.
Desired state has a review trail — every cluster change is a PR.
The cluster pulls; CI never reaches in. Smaller attack surface.

$ aws resourcegroupstaggingapi get-resources # everything that isn't EKS or CI

RESOURCE	ROLE
S3 devopsblog-content2 (prefix dev)	markdown source of truth; Flask pod reads via IRSA
ECR devopsblog-app	container registry; CI pushes by digest, ArgoCD pulls
Secrets Manager	GitHub App private key, scanner tokens (Sonar / Snyk)
Route 53 (devopsblog.online)	hosted zone, apex + www ALIAS, CAA pinned to amazon.com
IAM roles (OIDC)	assumed by GHA workflows — zero static AWS keys

GitHub App pattern — zero secrets in repo Variables/Secrets

Cross-repo writes (app repo → GitOps PR) need a token with more scope than the default GITHUB_TOKEN. Instead of stuffing a PAT into repo secrets, a GitHub App holds the permission; its private key lives in AWS Secrets Manager.

 GHA job starts
   │
   ├─ aws-actions/configure-aws-credentials  (OIDC → short-lived STS)
   ├─ aws-actions/aws-secretsmanager-get-secrets  (fetch App private key)
   └─ actions/create-github-app-token  (mint short-lived installation token)
                 │
                 └─▶ git push / gh pr create on devopsblog-gitops

zero hardcoded secrets in either repo's Variables or Secrets.
short-lived tokens at every hop — STS credential, then App installation token.
scoped — the App is only installed on the two repos it needs, with the minimum permission set.
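
To make "short-lived" concrete: before an installation token can be minted, the App authenticates with a JWT whose lifetime GitHub caps at ten minutes. The sketch below only builds the claims; signing them with RS256 using the private key is left to actions/create-github-app-token. The App ID and function name are placeholders.

```python
import time
from typing import Optional

MAX_JWT_LIFETIME = 600  # GitHub rejects App JWTs that expire more than 10 minutes out


def app_jwt_claims(app_id: str, now: Optional[int] = None, lifetime: int = 540) -> dict:
    """Build the claims of a GitHub App JWT (unsigned; RS256 signing is a separate step)."""
    if not 0 < lifetime <= MAX_JWT_LIFETIME:
        raise ValueError("lifetime must be between 1 and 600 seconds")
    now = int(time.time()) if now is None else now
    return {
        "iat": now - 60,        # backdated 60 s to tolerate clock drift
        "exp": now + lifetime,  # expiry, at most 10 minutes ahead
        "iss": app_id,          # the App's ID
    }


claims = app_jwt_claims("123456", now=1_700_000_000)  # "123456" is a placeholder App ID
print(claims)
```

The installation token obtained with that JWT expires on its own after an hour, so nothing long-lived ever reaches the workflow.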

runtime env (when the Flask pod is actually running)

---
POSTS_S3_BUCKET: devopsblog-content2
POSTS_S3_PREFIX: dev
serviceAccountName: devopsblog-s3 # IRSA-annotated
runAsNonRoot: true
runAsUser: 1000
allowPrivilegeEscalation: false
capabilities.drop: [ALL]
---
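
On the application side, those two environment variables plus the IRSA-annotated ServiceAccount are all the pod needs. A sketch of the read path, assuming boto3 and that posts are `.md` objects under the prefix; this is not the blog's actual Flask code and the function names are illustrative. With IRSA, EKS injects a web identity token and role ARN into the pod and boto3's default credential chain picks them up, so no keys appear anywhere.

```python
import os
from typing import List


def list_post_keys(s3_client, bucket: str, prefix: str) -> List[str]:
    """Return the keys of all markdown objects under `prefix` in `bucket`."""
    paginator = s3_client.get_paginator("list_objects_v2")
    keys: List[str] = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix.rstrip("/") + "/"):
        for obj in page.get("Contents", []):  # empty pages carry no "Contents"
            if obj["Key"].endswith(".md"):
                keys.append(obj["Key"])
    return keys


def main() -> None:
    """Entry point for use inside the pod; needs boto3 and IRSA credentials."""
    import boto3  # imported lazily; credentials come from the default chain
    client = boto3.client("s3")
    for key in list_post_keys(client,
                              os.environ["POSTS_S3_BUCKET"],
                              os.environ["POSTS_S3_PREFIX"]):
        print(key)
```

Passing the client in as an argument keeps the listing logic testable without AWS access.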

$ curl -I https://devopsblog.online/ # how you're reading this right now

No origin server. No Kubernetes pod. No database. This page is a flat HTML file sitting in an S3 bucket, fronted by CloudFront. A Python process baked it once (Frozen-Flask) and then exited. What you are hitting is a CDN edge cache.

 you ──HTTPS──▶ CloudFront edge ──SigV4/OAC──▶ S3 (private)
                   │
                   ├─ ACM cert in us-east-1 (TLSv1.2_2021 min)
                   ├─ response_headers_policy: HSTS, CSP, XFO, nosniff, referrer
                   └─ caches baked HTML + static assets

COMPONENT	ROLE	STATUS
S3 (devopsblog-site-origin)	origin bucket, private, OAC-only GetObject	LIVE
CloudFront	TLS termination, caching, security headers	LIVE
ACM (us-east-1)	viewer cert for devopsblog.online + www	LIVE
Route 53	hosted zone + apex/www ALIAS + CAA	LIVE
CAA record	only amazon.com may issue; no wildcards	ENFORCED
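
The CAA row can be read as a small decision rule. The sketch below models how a CA is expected to evaluate such records; the exact record values (`issue "amazon.com"` and `issuewild ";"`) are an assumption about how "only amazon.com, no wildcards" is encoded in the zone.

```python
from typing import List, Tuple


def may_issue(records: List[Tuple[str, str]], ca_domain: str,
              wildcard: bool = False) -> bool:
    """Decide whether `ca_domain` may issue, given (tag, value) CAA records."""
    issue = [value for tag, value in records if tag == "issue"]
    issuewild = [value for tag, value in records if tag == "issuewild"]
    # Wildcard requests use issuewild when present, otherwise fall back to issue.
    relevant = issuewild if (wildcard and issuewild) else issue
    if not relevant:
        return True  # no applicable record means no restriction
    # The issuer name is the part before any ";" parameters; a bare ";" allows nobody.
    return any(value.split(";")[0].strip() == ca_domain for value in relevant)


# ASSUMED record set for devopsblog.online.
CAA = [("issue", "amazon.com"), ("issuewild", ";")]
print(may_issue(CAA, "amazon.com"))
print(may_issue(CAA, "amazon.com", wildcard=True))
```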

security headers at the edge (CloudFront response-headers policy)

---
strict-transport-security: max-age=63072000; includeSubDomains; preload
x-content-type-options: nosniff
x-frame-options: DENY
referrer-policy: strict-origin-when-cross-origin
content-security-policy: "default-src 'self'; img-src 'self' data: https:; style-src 'self' 'unsafe-inline' https://fonts.googleapis.com; font-src 'self' https://fonts.gstatic.com; script-src 'self'"
---
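
A quick way to confirm the policy is actually attached is to parse the output of `curl -I` and look for gaps. A small sketch, assuming the raw header text is passed in as a string; the function names are illustrative.

```python
from typing import Dict, List

REQUIRED = {
    "strict-transport-security",
    "x-content-type-options",
    "x-frame-options",
    "referrer-policy",
    "content-security-policy",
}


def parse_head(raw: str) -> Dict[str, str]:
    """Parse `curl -I` output into a dict keyed by lower-cased header name."""
    headers: Dict[str, str] = {}
    for line in raw.splitlines()[1:]:  # first line is the status line
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def missing_security_headers(headers: Dict[str, str]) -> List[str]:
    """Return the required security headers that are absent, sorted by name."""
    return sorted(REQUIRED - set(headers))


def hsts_max_age_days(headers: Dict[str, str]) -> float:
    """Return the HSTS max-age in days, or 0.0 if the directive is not set."""
    for directive in headers.get("strict-transport-security", "").split(";"):
        key, _, value = directive.strip().partition("=")
        if key.lower() == "max-age":
            return int(value) / 86400
    return 0.0


sample = (
    "HTTP/2 200\r\n"
    "strict-transport-security: max-age=63072000; includeSubDomains; preload\r\n"
    "x-content-type-options: nosniff\r\n"
)
parsed = parse_head(sample)
print(missing_security_headers(parsed), hsts_max_age_days(parsed))
```

The max-age of 63072000 seconds works out to 730 days, i.e. two years.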

why static:

cost — roughly $0.65/month. No compute, no NAT gateway, no control-plane fee.
uptime — CloudFront+S3 availability is higher than any single-cluster EKS I'll ever run.
blast radius — a bad commit ships HTML, not containers. Worst case: invalidate the CDN.
tradeoff — no server-side dynamic behaviour. For a blog, that's a feature.

valentin@devops:~ # the punchline

> This page? It's baked HTML in an S3 bucket.

> CloudFront is caching it at an edge closer to you than my desk is.

> Monthly bill for all of it: ~$0.65.

> Every EKS / IRSA / ArgoCD / ZAP / OIDC / Terraform detail above is real and applied.

> None of it is serving this page.

> Knowing which tool to reach for, and which one not to reach for, is the job.

> exit 0