# Ephemera — ECS Fargate service (blue/green) — SERVICE-knobbed (AWS CLI)

> Self-executing Markdown. The per-service layer of the ECS Fargate demo as one auditable plan,
> **parameterized on a `SERVICE` knob** the way a multi-env plan is parameterized on `ENV`. `service-a`,
> `service-b`, `service-c` are the same plan run with `SERVICE=service-a|b|c` — never forked files. The
> cloud is the source of truth; this file is intent + write-back ledger.

> **Requires** (discovered by observing the cloud):
> - `ecs-cluster(${ENV}-cluster)` and `alb(${ENV}-fargate-alb)` from **`ecs-cluster.aws.md`** — cluster
>   ARN/name, ALB ARN, :443 + :9001 listener ARNs, ALB-SG id.
> - `zone(${DOMAIN_NAME})` from **`domain.cloudflare.md`** (example.com is on Cloudflare) — for the
>   §10 DNS record *(only when `DNS_MODE=managed`)*. Use `domain.aws.md`/Route 53 only for AWS-delegated domains.
> - A **container image present in ECR** before the ECS service can reach steady state (built in §2).
>
> **Provides** `ecs-service(${ENV}-${SERVICE})` — the running service + its blue/green target groups,
> for anything downstream (dashboards, alarms, a future API gateway).

---

## 🤖 Director prompt

Observe before acting; verify each step; **stop at 🔴/💥 for human go**; write realized ARNs/IDs back
into Live State each step; **IAM needs the credential-broker `--no-session` mode** (see gotcha) and may
lag (retry "role cannot be assumed"); blue/green cutover is a CodeDeploy lifecycle, not an `aws` one-shot;
teardown observes-first and is resumable. Verbs: `verify` (read-only), `apply` (create, gated),
`teardown` (destroy, gated).

```
Legend  🟢 create · 🟡 config · 🔴 GATE (human go) · 💥 destructive (human go) · ⏳ wait · ✔ verify
```

## Intent

Run one containerized service on the shared Fargate platform, fronted by the ALB, with **blue/green
deploys via CodeDeploy** (the demo default). One ECR repo + image, three IAM roles (execution, task,
codedeploy), a service security group that only accepts traffic from the ALB SG, **two target groups**
(blue + green), a task definition, an ECS service wired to the blue TG under the `CODE_DEPLOY`
controller, a CodeDeploy app + deployment group binding the :443 prod and :9001 test listeners, and a
DNS record to the ALB (Cloudflare, the authoritative provider for example.com). Mirrors
`common.hcl`: `port=80`, `health_check_path=/`, `deployment_type=bluegreen`.

## Deployment strategy is a fork (Provisioning Input #3)

`bluegreen` ⇒ CodeDeploy + 2 target groups + the `CODE_DEPLOY` controller; `rolling` ⇒ 1 target group +
the ECS controller managing the listener rule; `canary` ⇒ 2 target groups + a weighted listener under
the ECS controller. The resource graph is a pure function of that answer; §6/§8/§9 branch on it.
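
The fork can be read as a single lookup. A minimal sketch, assuming a helper named `strategy_plan` (hypothetical, not part of this plan) that prints the three values §6/§8/§9 branch on: target group count, deployment controller, and whether CodeDeploy is wired.

```bash
# strategy_plan DEPLOYMENT_TYPE -> "<tg_count> <controller> <codedeploy on|off>"
# Hypothetical helper; the mapping is the one stated in this section.
strategy_plan() {
  case "$1" in
    bluegreen) echo "2 CODE_DEPLOY on" ;;
    rolling)   echo "1 ECS off" ;;
    canary)    echo "2 ECS off" ;;
    *) echo "unknown DEPLOYMENT_TYPE: $1" >&2; return 1 ;;
  esac
}
```

For example, `set -- $(strategy_plan "$DEPLOYMENT_TYPE")` leaves the three answers in `$1`, `$2`, `$3`, and an unknown strategy fails before any resource is created.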

## Gotcha that cost the most time elsewhere: credential brokers can't call IAM

§3–§5 create **three IAM roles**. If AWS creds come from a broker/SSO/MFA that vends GetSessionToken
session tokens (e.g. `aws-vault export`), IAM calls fail `InvalidClientTokenId` while every other
service works and `sts get-caller-identity` succeeds. Run **only the IAM steps** with the long-term /
no-session mode: `aws-vault exec <profile> --no-session -- <iam cmd>`. ECR/ECS/ELB/Route53 steps run
fine on the normal session creds. (Generalized in `README.md`.)
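
One way to keep the two credential modes apart is to route every IAM call through a wrapper and to retry the first call that consumes a freshly created role. This is a sketch under assumptions: `iam`, `retry`, and the `IAM_BROKER_PROFILE` variable are hypothetical names, not defined anywhere else in this plan.

```bash
# iam ARGS...: run one IAM call with long-term credentials when a broker profile is set.
# IAM_BROKER_PROFILE is a hypothetical variable; leave it unset to call aws directly.
iam() {
  if [ -n "${IAM_BROKER_PROFILE:-}" ]; then
    aws-vault exec "$IAM_BROKER_PROFILE" --no-session -- aws iam "$@"
  else
    aws iam "$@"
  fi
}

# retry ATTEMPTS DELAY_SECONDS CMD...: re-run CMD until it succeeds or attempts run out.
retry() {
  local attempts="$1" delay="$2" n=1
  shift 2
  while ! "$@"; do
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
}
```

With these in place, `iam create-role` stands in for `aws iam create-role` in §3–§5, and `retry 5 10 <command>` absorbs the propagation lag behind "role cannot be assumed".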

## Live State

```yaml
status:        not-created      # published template - run it to realize state
last_action:   authored — per-service layer (service-a)
last_verified: —
resolved_inputs:                # filled once the Provisioning Inputs interview runs
  env:             dev
  service:         service-a
  deployment_type: bluegreen
  container_port:  "80"
  image_source:    placeholder
  dns_mode:        managed
  dns_provider:    cloudflare
  cpu:             "256"
  memory:          "512"
```

| key                | value (filled on apply) |
|--------------------|-------------------------|
| AWS_REGION         | `us-west-2` |
| ACCOUNT_ID         | `<discover>` |
| ENV / SERVICE      | `dev` / `service-a` |
| CLUSTER_ARN        | `<Requires: discover from ecs-cluster.aws.md>` |
| ALB_ARN / DNS      | `<Requires: discover>` |
| LISTENER_443/9001  | `<Requires: discover>` |
| ALB_SG_ID          | `<Requires: discover>` |
| DNS_PROVIDER       | `cloudflare` (example.com on Cloudflare) |
| ZONE / RECORD      | `<Requires: discover from domain.cloudflare.md, if DNS_MODE=managed>` |
| ECR_REPO_URL       | `—` |
| EXEC_ROLE_ARN      | `—` |
| TASK_ROLE_ARN      | `—` |
| CODEDEPLOY_ROLE_ARN| `—` |
| SERVICE_SG_ID      | `—` |
| TG_BLUE_ARN        | `—` |
| TG_GREEN_ARN       | `—` |
| TASKDEF_ARN        | `—` |
| SERVICE_ARN        | `—` |
| CODEDEPLOY_APP     | `—` |
| CODEDEPLOY_DG      | `—` |

| ✔ check                       | expected                                         | observed | result |
|-------------------------------|--------------------------------------------------|----------|--------|
| Requires: cluster + ALB live  | both discovered, `ACTIVE` / `active`             | —        | — |
| ECR image present             | `latest` tag pushed (`linux/amd64`)              | —        | — |
| 3 IAM roles assumable         | trust ecs-tasks / codedeploy                     | —        | — |
| service SG ingress            | only from `ALB_SG_ID` on container port          | —        | — |
| blue + green target groups    | both exist, health-check `/` → 200               | —        | — |
| task definition               | registered, Fargate, port matches                | —        | — |
| ECS service                   | `CODE_DEPLOY` controller, runningCount=desired   | —        | — |
| CodeDeploy deployment group   | blue/green, both listeners bound                 | —        | — |
| DNS record (managed)          | CNAME (Cloudflare) or alias (Route 53) → ALB DNS | —        | — |

## Provisioning Inputs

Resolve once, up front, before any cloud mutation. Accept the default on silence; write `resolved_inputs`
into Live State. Every option is a closed enum (bar free-text identifiers), so the resource graph is a
pure function of the answers.

| # | Question | Options (closed enum) | Default | Sets | Gates |
|---|----------|-----------------------|---------|------|-------|
| 1 | Which service? | `service-a` / `service-b` / `service-c` | `service-a` | `SERVICE` | every resource name |
| 2 | Which environment? | `dev` / `stg` / `uat` / `prod` | `dev` | `ENV` | every name, cluster/ALB discovery |
| 3 | Deployment strategy? | `bluegreen` / `rolling` / `canary` | `bluegreen` | `DEPLOYMENT_TYPE` | §6 TG count, §8 controller, §9 CodeDeploy on/off |
| 4 | Container port (must match Dockerfile `EXPOSE`) | `80` / `3000` / `5000` / `8080` / `9000` | `80` | `CONTAINER_PORT` | §2 image, §6 TG, §7 taskdef |
| 5 | Image source | `placeholder` / `build-dockerfile` | `placeholder` | `IMAGE_SOURCE` | §2 |
| 6 | DNS record? | `managed` / `none` | `managed` | `DNS_MODE` | §10 (Requires zone) |
| 6b | DNS provider (if managed) | `cloudflare` / `route53` / `godaddy` / `manual` | `cloudflare` | `DNS_PROVIDER` | §10 branch (which API places the record) |
| 7 | Task size (CPU / MEM) | `256/512` / `512/1024` / `1024/2048` | `256/512` | `CPU` `MEMORY` | §7 taskdef |
| 8 | Public DNS name (free-text, if `managed`) | e.g. `service-a.dev.example.com` | — | `DOMAIN_NAME` | §10 |

```yaml
# → written into Live State once resolved
resolved_inputs:
  service:         service-a
  env:             dev
  deployment_type: bluegreen
  container_port:  "80"
  image_source:    placeholder
  dns_mode:        managed
  dns_provider:    cloudflare
  cpu:             "256"
  memory:          "512"
  domain_name:     <set if dns_mode=managed>
  resolved_by:     <human who confirmed>
  resolved_at:     <timestamp>
```
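
Because every option is a closed enum, the answers can be checked mechanically before any cloud mutation. A minimal sketch, assuming the helper names `in_enum` and `validate_inputs` (both hypothetical) and the option lists from the table above:

```bash
# in_enum VALUE OPTION...: succeed only if VALUE equals one of the listed options.
in_enum() {
  local value="$1" opt
  shift
  for opt in "$@"; do
    [ "$value" = "$opt" ] && return 0
  done
  return 1
}

# validate_inputs: check the resolved answers against the closed enums in the table.
validate_inputs() {
  in_enum "${SERVICE:-}" service-a service-b service-c \
    || { echo "SERVICE not in enum" >&2; return 1; }
  in_enum "${ENV:-}" dev stg uat prod \
    || { echo "ENV not in enum" >&2; return 1; }
  in_enum "${DEPLOYMENT_TYPE:-}" bluegreen rolling canary \
    || { echo "DEPLOYMENT_TYPE not in enum" >&2; return 1; }
  in_enum "${CONTAINER_PORT:-}" 80 3000 5000 8080 9000 \
    || { echo "CONTAINER_PORT not in enum" >&2; return 1; }
  # CPU and MEMORY are one question, so they are validated as a pair.
  in_enum "${CPU:-}/${MEMORY:-}" 256/512 512/1024 1024/2048 \
    || { echo "CPU/MEMORY pair not in enum" >&2; return 1; }
}
```

Running `validate_inputs` after §0 and before the first 🟢 step turns a typo such as `MEMORY=1024` with `CPU=256` into an early stop rather than a failed `register-task-definition`.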

## 0. Variables

```bash
export ENV="dev" SERVICE="service-a" AWS_REGION="us-west-2"
export ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"
export DEPLOYMENT_TYPE="bluegreen" CONTAINER_PORT="80" CPU="256" MEMORY="512"
export NAME="${ENV}-${SERVICE}"               # e.g. dev-service-a
export ECR_REPO="ecs-fargate-demo-${SERVICE}-${ENV}"
export LOG_GROUP="/ecs/${NAME}-log-group"
export PLACEHOLDER_IMAGE="nginx:latest"       # from common.hcl placeholder_images[port]
```
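
Every resource name is derived from `ENV` and `SERVICE`, so a name limit downstream is worth checking here. ELB target group names are capped at 32 characters, and §6 builds them as `${NAME}-blue-tg` / `${NAME}-green-tg`. A small sketch, assuming a hypothetical helper `tg_name_ok`:

```bash
# tg_name_ok NAME COLOR: succeed if "<NAME>-<COLOR>-tg" fits the 32-character
# target group name limit. Hypothetical helper, not part of the plan.
tg_name_ok() {
  local tg="$1-$2-tg"
  [ "${#tg}" -le 32 ]
}
```

`tg_name_ok "$NAME" green || echo "shorten SERVICE"` is enough; the green name is the longer of the two, so it is the one to test.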

## Requires-discovery (read-only — pre-fill from the cloud)  ✔

```bash
CLUSTER_ARN="$(aws ecs describe-clusters --clusters "${ENV}-cluster" --query 'clusters[0].clusterArn' --output text)"
ALB_ARN="$(aws elbv2 describe-load-balancers --names "${ENV}-fargate-alb" --query 'LoadBalancers[0].LoadBalancerArn' --output text)"
ALB_DNS="$(aws elbv2 describe-load-balancers --load-balancer-arns "$ALB_ARN" --query 'LoadBalancers[0].DNSName' --output text)"
ALB_SG_ID="$(aws elbv2 describe-load-balancers --load-balancer-arns "$ALB_ARN" --query 'LoadBalancers[0].SecurityGroups[0]' --output text)"
VPC_ID="$(aws ssm get-parameter --name /network/vpc --query Parameter.Value --output text)"
LISTENER_443="$(aws elbv2 describe-listeners --load-balancer-arn "$ALB_ARN" --query "Listeners[?Port==\`443\`].ListenerArn" --output text)"
LISTENER_9001="$(aws elbv2 describe-listeners --load-balancer-arn "$ALB_ARN" --query "Listeners[?Port==\`9001\`].ListenerArn" --output text)"
# any of these empty => the platform plan (ecs-cluster.aws.md) is not applied; stop and apply it first.
```
> → Live State: CLUSTER_ARN, ALB_ARN, ALB_DNS, ALB_SG_ID, LISTENER_443, LISTENER_9001 (discovered, not created)
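
The "any of these empty => stop" rule can be made executable. A minimal sketch, assuming a hypothetical helper `require_discovered`; it also treats the literal `None` as missing, since that is what the AWS CLI prints with `--output text` when a query resolves to null.

```bash
# require_discovered VAR...: fail if any named variable is unset, empty, or "None".
# Variable names are passed as trusted literals (eval is used for the indirection).
require_discovered() {
  local var val missing=0
  for var in "$@"; do
    eval "val=\${$var:-}"
    case "$val" in
      ""|None) echo "missing: $var" >&2; missing=1 ;;
    esac
  done
  return "$missing"
}
```

A call such as `require_discovered CLUSTER_ARN ALB_ARN ALB_DNS ALB_SG_ID VPC_ID LISTENER_443 LISTENER_9001 || exit 1` names every gap at once instead of failing on the first.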

## Dependency frontier

```
[Requires] cluster, alb, :443+:9001 listeners, alb-sg, vpc  ── discovered ──┐
ecr ─> 🔴? image build/push (linux/amd64) ──┐                               │
exec_role 🔴 ─┐                              ├─> task_def (roles + image uri)│
task_role 🔴 ─┘                              │                              │
                                             service_sg (ingress from alb-sg)│
codedeploy_role 🔴                           target_group ×2 (blue+green, vpc, health /) ─┐
                                                                                          ├─> ecs_service (cluster,taskdef,blue-TG,sg,controller)
                                                                                          └─> codedeploy app+DG (service,both-TGs,both-listeners)
                                                                            dns record ─> ALB DNS (DNS_MODE=managed; CF or R53)
```
Non-negotiable edges: **task def needs both role ARNs + the ECR image URI**; **the image must exist
before the service stabilizes**; **target groups need the VPC id**; **the ECS service needs the cluster,
the task def, the blue TG, and the service SG**; **CodeDeploy needs the service + both TGs + both
listeners + the codedeploy role**; **the DNS record needs the ALB DNS**. Teardown reverses this.
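
The frontier can also be written as edge pairs and ordered with `tsort`. This is an illustrative sketch only: the node names are shorthand chosen here, the edges are read off the diagram above, and the DNS record is left out because it depends only on discovered resources.

```bash
# create_order: print resources in a valid creation order (prerequisites first).
# Each input line is "<prerequisite> <dependent>".
create_order() {
  tsort <<'EDGES'
ecr image
image task_def
exec_role task_def
task_role task_def
task_def ecs_service
service_sg ecs_service
target_groups ecs_service
ecs_service codedeploy
target_groups codedeploy
codedeploy_role codedeploy
EDGES
}

# teardown_order: the same list reversed (dependents first).
teardown_order() {
  create_order | sed '1!G;h;$!d'
}
```

`tsort` may emit independent nodes in any order, so only the relative order along an edge is meaningful; `codedeploy` is the single node nothing depends on, which is why teardown starts there.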

## 1. ECR repository  🟢

```bash
ECR_REPO_URL="$(aws ecr create-repository --repository-name "$ECR_REPO" \
  --image-tag-mutability IMMUTABLE \
  --image-scanning-configuration scanOnPush=true \
  --query 'repository.repositoryUri' --output text)"
```
```bash
# ✔ verify
aws ecr describe-repositories --repository-names "$ECR_REPO" \
  --query 'repositories[0].imageTagMutability' --output text   # IMMUTABLE
```
> → Live State: ECR_REPO_URL
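
`create-repository` fails if the repository already exists, so a re-run of this step needs an observe-first variant. A minimal sketch, assuming a hypothetical helper `ensure_ecr_repo`; note that it treats any `describe-repositories` failure as "missing", which also covers credential errors, so a real run should check the error before creating.

```bash
# ensure_ecr_repo NAME: print the repository URI, creating the repository only
# when describe-repositories cannot find it.
ensure_ecr_repo() {
  local uri
  if ! uri="$(aws ecr describe-repositories --repository-names "$1" \
      --query 'repositories[0].repositoryUri' --output text 2>/dev/null)"; then
    uri="$(aws ecr create-repository --repository-name "$1" \
      --image-tag-mutability IMMUTABLE \
      --image-scanning-configuration scanOnPush=true \
      --query 'repository.repositoryUri' --output text)" || return 1
  fi
  printf '%s\n' "$uri"
}
```

`ECR_REPO_URL="$(ensure_ecr_repo "$ECR_REPO")"` then gives the same write-back value on the first apply and on every resume.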

## 2. Build & push the image (precondition for a healthy service)  🟢

> `IMAGE_SOURCE=placeholder` seeds the public image; `build-dockerfile` builds from a local `Dockerfile`
> whose `EXPOSE` must equal `CONTAINER_PORT`. Apple Silicon: `--platform linux/amd64` is mandatory.

```bash
aws ecr get-login-password --region "$AWS_REGION" \
  | docker login --username AWS --password-stdin "${ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com"

if [ "${IMAGE_SOURCE:-placeholder}" = "placeholder" ]; then
  docker pull --platform linux/amd64 "$PLACEHOLDER_IMAGE"
  docker tag "$PLACEHOLDER_IMAGE" "${ECR_REPO_URL}:latest"
else
  docker build --platform linux/amd64 -t "${ECR_REPO_URL}:latest" .
fi
docker push "${ECR_REPO_URL}:latest"
```
```bash
# ✔ verify image present
aws ecr list-images --repository-name "$ECR_REPO" \
  --query 'imageIds[?imageTag==`latest`].imageTag' --output text   # latest
```
> → Live State: note image pushed
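
Because §1 creates the repository with `IMMUTABLE` tags, ECR rejects a second push of `latest` that carries different content, which is why the Update section asks for a new tag per deploy. One possible way to derive such a tag, assuming a hypothetical helper `image_tag`:

```bash
# image_tag: print a per-build tag: the short git commit when one is available,
# otherwise a UTC timestamp. Hypothetical helper, not part of the plan.
image_tag() {
  local sha
  if sha="$(git rev-parse --short HEAD 2>/dev/null)"; then
    printf '%s\n' "$sha"
  else
    date -u +%Y%m%d%H%M%S
  fi
}
```

For later deploys, `TAG="$(image_tag)"` followed by `docker build --platform linux/amd64 -t "${ECR_REPO_URL}:${TAG}" .` and a push of that tag keeps every revision addressable.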

## 3. Task execution role (IAM)  🔴🟢

> 🔴 GATE. IAM is account-global, so every role change is gated. **Run with the broker's no-session mode** if
> creds come from a session-token broker (see gotcha). Trust: `ecs-tasks.amazonaws.com`.

```bash
TRUST='{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"ecs-tasks.amazonaws.com"},"Action":"sts:AssumeRole"}]}'
EXEC_ROLE_ARN="$(aws iam create-role --role-name "${NAME}-exec-role" \
  --assume-role-policy-document "$TRUST" --query 'Role.Arn' --output text)"
aws iam attach-role-policy --role-name "${NAME}-exec-role" \
  --policy-arn <ARN>
# scoped inline: write to THIS service's log group + read SSM creds
aws iam put-role-policy --role-name "${NAME}-exec-role" --policy-name "${NAME}-exec-inline" \
  --policy-document "{\"Version\":\"2012-10-17\",\"Statement\":[
    {\"Effect\":\"Allow\",\"Action\":[\"logs:CreateLogStream\",\"logs:PutLogEvents\"],\"Resource\":\"<ARN>${AWS_REGION}:${ACCOUNT_ID}:log-group:${LOG_GROUP}:*\"},
    {\"Effect\":\"Allow\",\"Action\":[\"ssm:GetParameters\"],\"Resource\":\"<ARN>${AWS_REGION}:${ACCOUNT_ID}:parameter/creds/*\"}]}"
```
```bash
# ✔ verify
aws iam get-role --role-name "${NAME}-exec-role" \
  --query 'Role.AssumeRolePolicyDocument.Statement[0].Principal.Service' --output text   # ecs-tasks.amazonaws.com
```
> → Live State: EXEC_ROLE_ARN
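
The trust document in `TRUST` reappears in §5 with only the service principal changed. A small sketch that generates it from the principal, assuming a hypothetical helper `trust_policy`:

```bash
# trust_policy SERVICE_PRINCIPAL: print an assume-role trust document that lets
# one AWS service principal assume the role.
trust_policy() {
  printf '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"%s"},"Action":"sts:AssumeRole"}]}\n' "$1"
}
```

`trust_policy ecs-tasks.amazonaws.com` reproduces `TRUST` byte for byte, and `trust_policy codedeploy.amazonaws.com` reproduces the §5 document, so the two cannot drift apart.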

## 4. Task role (IAM)  🔴🟢

> 🔴 GATE. Runtime permissions for the container (logs + whatever the app needs — kept minimal here).

```bash
TASK_ROLE_ARN="$(aws iam create-role --role-name "${NAME}-task-role" \
  --assume-role-policy-document "$TRUST" --query 'Role.Arn' --output text)"
aws iam put-role-policy --role-name "${NAME}-task-role" --policy-name "${NAME}-task-inline" \
  --policy-document "{\"Version\":\"2012-10-17\",\"Statement\":[
    {\"Effect\":\"Allow\",\"Action\":[\"logs:CreateLogStream\",\"logs:PutLogEvents\"],\"Resource\":\"<ARN>${AWS_REGION}:${ACCOUNT_ID}:log-group:/ecs/*\"}]}"
```
```bash
# ✔ verify
aws iam get-role --role-name "${NAME}-task-role" --query 'Role.Arn' --output text   # <ARN>
```
> → Live State: TASK_ROLE_ARN

## 5. CodeDeploy role (IAM)  🔴🟢  *(skip if DEPLOYMENT_TYPE≠bluegreen)*

> 🔴 GATE. Trust: `codedeploy.amazonaws.com`; attach the AWS-managed ECS CodeDeploy policy.

```bash
CD_TRUST='{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"codedeploy.amazonaws.com"},"Action":"sts:AssumeRole"}]}'
CODEDEPLOY_ROLE_ARN="$(aws iam create-role --role-name "${NAME}-codedeploy-role" \
  --assume-role-policy-document "$CD_TRUST" --query 'Role.Arn' --output text)"
aws iam attach-role-policy --role-name "${NAME}-codedeploy-role" \
  --policy-arn <ARN>
```
```bash
# ✔ verify
aws iam list-attached-role-policies --role-name "${NAME}-codedeploy-role" \
  --query 'AttachedPolicies[0].PolicyName' --output text   # AWSCodeDeployRoleForECS
```
> → Live State: CODEDEPLOY_ROLE_ARN

## 6. Service security group + target groups  🟢

```bash
# service SG: ingress only from the ALB SG on the container port
SERVICE_SG_ID="$(aws ec2 create-security-group --group-name "${NAME}-sg" \
  --description "SG for ${NAME}" --vpc-id "$VPC_ID" --query GroupId --output text)"
aws ec2 authorize-security-group-ingress --group-id "$SERVICE_SG_ID" \
  --ip-permissions "IpProtocol=tcp,FromPort=${CONTAINER_PORT},ToPort=${CONTAINER_PORT},UserIdGroupPairs=[{GroupId=${ALB_SG_ID}}]" >/dev/null

# blue + green target groups (type ip for Fargate; health check '/')
make_tg() { aws elbv2 create-target-group --name "${NAME}-$1-tg" \
  --protocol HTTP --port "$CONTAINER_PORT" --vpc-id "$VPC_ID" --target-type ip \
  --health-check-path "/" --matcher HttpCode=200 \
  --health-check-interval-seconds 30 --health-check-timeout-seconds 5 \
  --healthy-threshold-count 3 --unhealthy-threshold-count 3 \
  --query 'TargetGroups[0].TargetGroupArn' --output text; }
TG_BLUE_ARN="$(make_tg blue)"
case "$DEPLOYMENT_TYPE" in bluegreen|canary) TG_GREEN_ARN="$(make_tg green)" ;; esac
```
```bash
# ✔ verify
aws ec2 describe-security-groups --group-ids "$SERVICE_SG_ID" \
  --query 'SecurityGroups[0].IpPermissions[0].UserIdGroupPairs[0].GroupId' --output text   # == ALB_SG_ID
aws elbv2 describe-target-groups --target-group-arns "$TG_BLUE_ARN" \
  --query 'TargetGroups[0].HealthCheckPath' --output text   # /
```
> → Live State: SERVICE_SG_ID, TG_BLUE_ARN, TG_GREEN_ARN

## 7. Task definition  🟢

```bash
aws logs create-log-group --log-group-name "$LOG_GROUP" 2>/dev/null || true
cat > /tmp/${NAME}-taskdef.json <<JSON
{ "family": "${NAME}-task", "networkMode": "awsvpc", "requiresCompatibilities": ["FARGATE"],
  "cpu": "${CPU}", "memory": "${MEMORY}",
  "executionRoleArn": "${EXEC_ROLE_ARN}", "taskRoleArn": "${TASK_ROLE_ARN}",
  "containerDefinitions": [{
    "name": "${SERVICE}", "image": "${ECR_REPO_URL}:latest", "essential": true,
    "portMappings": [{"containerPort": ${CONTAINER_PORT}, "protocol": "tcp"}],
    "healthCheck": {"command":["CMD-SHELL","curl -f http://localhost:${CONTAINER_PORT}/ || exit 1"],
                    "interval":30,"timeout":5,"retries":3,"startPeriod":10},
    "logConfiguration": {"logDriver":"awslogs","options":{
      "awslogs-group":"${LOG_GROUP}","awslogs-region":"${AWS_REGION}","awslogs-stream-prefix":"ecs"}}
  }] }
JSON
TASKDEF_ARN="$(aws ecs register-task-definition --cli-input-json file:///tmp/${NAME}-taskdef.json \
  --query 'taskDefinition.taskDefinitionArn' --output text)"
```
```bash
# ✔ verify
aws ecs describe-task-definition --task-definition "${NAME}-task" \
  --query 'taskDefinition.containerDefinitions[0].portMappings[0].containerPort' --output text   # == CONTAINER_PORT
```
> → Live State: TASKDEF_ARN

## 8. ECS service  🟢⏳

> `bluegreen` ⇒ `deploymentController=CODE_DEPLOY` (CodeDeploy owns cutover); `rolling`/`canary` ⇒ `ECS`.
> Network: the subnet pair read from SSM (the `/network/subnet/public/*` parameters below, with `assignPublicIp=ENABLED`) + the service SG; load balancer points at the **blue** TG.

```bash
SUBNET_1A="$(aws ssm get-parameter --name /network/subnet/public/1a --query Parameter.Value --output text)"
SUBNET_1B="$(aws ssm get-parameter --name /network/subnet/public/1b --query Parameter.Value --output text)"
CONTROLLER=$([ "$DEPLOYMENT_TYPE" = "bluegreen" ] && echo CODE_DEPLOY || echo ECS)
SERVICE_ARN="$(aws ecs create-service --cluster "$CLUSTER_ARN" --service-name "${NAME}-service" \
  --task-definition "${NAME}-task" --desired-count 1 --launch-type FARGATE \
  --deployment-controller "type=${CONTROLLER}" \
  --network-configuration "awsvpcConfiguration={subnets=[${SUBNET_1A},${SUBNET_1B}],securityGroups=[${SERVICE_SG_ID}],assignPublicIp=ENABLED}" \
  --load-balancers "targetGroupArn=${TG_BLUE_ARN},containerName=${SERVICE},containerPort=${CONTAINER_PORT}" \
  --query 'service.serviceArn' --output text)"
aws ecs wait services-stable --cluster "$CLUSTER_ARN" --services "${NAME}-service"   # ⏳ until tasks healthy
```
```bash
# ✔ verify
aws ecs describe-services --cluster "$CLUSTER_ARN" --services "${NAME}-service" \
  --query 'services[0].{controller:deploymentController.type,running:runningCount}' --output json   # CODE_DEPLOY, 1
```
> → Live State: SERVICE_ARN

## 9. CodeDeploy app + deployment group (blue/green)  🔴🟢  *(skip if DEPLOYMENT_TYPE≠bluegreen)*

> 🔴 GATE — wiring production traffic control. Binds prod listener :443 + test listener :9001 and both TGs.

```bash
CODEDEPLOY_APP="$(aws deploy create-application --application-name "${NAME}-cd-app" \
  --compute-platform ECS --query 'applicationId' --output text)"
cat > /tmp/${NAME}-dg.json <<JSON
{ "applicationName": "${NAME}-cd-app",
  "deploymentGroupName": "${NAME}-cd-deployment-group",
  "serviceRoleArn": "${CODEDEPLOY_ROLE_ARN}",
  "deploymentConfigName": "CodeDeployDefault.ECSAllAtOnce",
  "deploymentStyle": {"deploymentType":"BLUE_GREEN","deploymentOption":"WITH_TRAFFIC_CONTROL"},
  "blueGreenDeploymentConfiguration": {
    "terminateBlueInstancesOnDeploymentSuccess": {"action":"TERMINATE","terminationWaitTimeInMinutes":5},
    "deploymentReadyOption": {"actionOnTimeout":"CONTINUE_DEPLOYMENT"}},
  "ecsServices": [{"clusterName":"${ENV}-cluster","serviceName":"${NAME}-service"}],
  "loadBalancerInfo": {"targetGroupPairInfoList":[{
    "targetGroups":[{"name":"${NAME}-blue-tg"},{"name":"${NAME}-green-tg"}],
    "prodTrafficRoute":{"listenerArns":["${LISTENER_443}"]},
    "testTrafficRoute":{"listenerArns":["${LISTENER_9001}"]}}]} }
JSON
CODEDEPLOY_DG="$(aws deploy create-deployment-group --cli-input-json file:///tmp/${NAME}-dg.json \
  --query 'deploymentGroupId' --output text)"
```
```bash
# ✔ verify
aws deploy get-deployment-group --application-name "${NAME}-cd-app" \
  --deployment-group-name "${NAME}-cd-deployment-group" \
  --query 'deploymentGroupInfo.deploymentStyle.deploymentType' --output text   # BLUE_GREEN
```
> → Live State: CODEDEPLOY_APP, CODEDEPLOY_DG

## 10. DNS record → ALB  🟢  *(skip if DNS_MODE=none)*

> Branch on `DNS_PROVIDER` (from `network.aws.md`). **example.com is on Cloudflare**, so the default
> is the Cloudflare branch — `domain.aws.md`/Route 53 only applies to domains still delegated to AWS.
> Discover the zone, don't assume it.

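> The record **content is a pure function** of the inputs (`${DOMAIN_NAME} → ${ALB_DNS}`); only the API
> that places it differs by provider. The Route 53 `UPSERT` and the GoDaddy `PUT` are idempotent, so re-applying
> them is deterministic. The Cloudflare branch uses a create-only `POST`, which fails if the record already
> exists; on re-apply, look the record up and update it instead.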

```bash
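# The cloudflare branch assumes CF_ZONE_ID and CF_API_TOKEN are already exported; this file does not set them.
case "${DNS_PROVIDER:-cloudflare}" in
  cloudflare)  # CNAME → ALB DNS, PROXIED (orange); pairs with SSL=Full(strict) + the CF Origin cert on the ALB.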
    curl -fsS -X POST "https://api.cloudflare.com/client/v4/zones/${CF_ZONE_ID}/dns_records" \
      -H "Authorization: Bearer ${CF_API_TOKEN}" -H "Content-Type: application/json" \
      --data "{\"type\":\"CNAME\",\"name\":\"${DOMAIN_NAME}\",\"content\":\"${ALB_DNS}\",\"proxied\":true}" ;;
  route53)     # alias A-record (only for AWS-delegated domains); assumes a hosted zone named exactly DOMAIN_NAME minus its first label
    HOSTED_ZONE_ID="$(aws route53 list-hosted-zones-by-name --dns-name "${DOMAIN_NAME#*.}" \
      --query 'HostedZones[0].Id' --output text | sed 's#/hostedzone/##')"
    ALB_ZONE_ID="$(aws elbv2 describe-load-balancers --load-balancer-arns "$ALB_ARN" \
      --query 'LoadBalancers[0].CanonicalHostedZoneId' --output text)"
    aws route53 change-resource-record-sets --hosted-zone-id "$HOSTED_ZONE_ID" --change-batch "{
      \"Changes\":[{\"Action\":\"UPSERT\",\"ResourceRecordSet\":{\"Name\":\"${DOMAIN_NAME}\",\"Type\":\"A\",
      \"AliasTarget\":{\"HostedZoneId\":\"${ALB_ZONE_ID}\",\"DNSName\":\"${ALB_DNS}\",\"EvaluateTargetHealth\":true}}}]}" ;;
  godaddy)     # CNAME via GoDaddy API; assumes DOMAIN_NAME is exactly one label under the registered domain
    curl -fsS -X PUT "https://api.godaddy.com/v1/domains/${DOMAIN_NAME#*.}/records/CNAME/${DOMAIN_NAME%%.*}" \
      -H "Authorization: sso-key ${GODADDY_KEY}:${GODADDY_SECRET}" -H "Content-Type: application/json" \
      --data "[{\"data\":\"${ALB_DNS}\",\"ttl\":600}]" ;;
  manual)      # any other registrar — emit the exact record, wait for the human, then verify
    echo "ADD AT YOUR DNS HOST, THEN PRESS ENTER:  CNAME  ${DOMAIN_NAME}  →  ${ALB_DNS}"; read -r _ ;;
esac
```
```bash
# ✔ verify
dig +short "$DOMAIN_NAME"   # cloudflare(proxied) → CF edge IPs; route53/godaddy/manual → ALB DNS
```
> → Live State: HOSTED_ZONE_ID (route53) / CF record id (cloudflare) / provider record id
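
The Cloudflare branch in §10 creates the record with `POST`, so a resumed apply needs a lookup first. A minimal sketch of a lookup-then-write variant, under stated assumptions: `cf_record_payload` and `cf_upsert_cname` are hypothetical names, `CF_ZONE_ID` and `CF_API_TOKEN` are already exported, and `jq` is installed.

```bash
# cf_record_payload NAME TARGET: print the JSON body for a proxied CNAME record.
cf_record_payload() {
  printf '{"type":"CNAME","name":"%s","content":"%s","proxied":true}\n' "$1" "$2"
}

# cf_upsert_cname NAME TARGET: overwrite the CNAME if it exists, otherwise create it.
# Assumes CF_ZONE_ID, CF_API_TOKEN, and jq (none of them provided by this file).
cf_upsert_cname() {
  local api="https://api.cloudflare.com/client/v4/zones/${CF_ZONE_ID}/dns_records"
  local resp id
  # Look up an existing CNAME with this exact name.
  resp="$(curl -fsS -G "$api" -H "Authorization: Bearer ${CF_API_TOKEN}" \
    --data-urlencode "type=CNAME" --data-urlencode "name=$1")" || return 1
  id="$(printf '%s' "$resp" | jq -r '.result[0].id // empty')"
  if [ -n "$id" ]; then
    curl -fsS -X PUT "$api/$id" -H "Authorization: Bearer ${CF_API_TOKEN}" \
      -H "Content-Type: application/json" --data "$(cf_record_payload "$1" "$2")"
  else
    curl -fsS -X POST "$api" -H "Authorization: Bearer ${CF_API_TOKEN}" \
      -H "Content-Type: application/json" --data "$(cf_record_payload "$1" "$2")"
  fi
}
```

`cf_upsert_cname "$DOMAIN_NAME" "$ALB_DNS"` would replace the `POST` in the `cloudflare)` branch; the record id in the response is the value to write back to Live State.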

## Update (idempotent reconcile)

- **New image / new deploy (blue/green):** push a new immutable tag, register a new task-def revision,
  then `aws deploy create-deployment --application-name "${NAME}-cd-app" --deployment-group-name
  "${NAME}-cd-deployment-group" --revision <appspec-pointing-at-new-taskdef>`. CodeDeploy shifts :443
  from blue→green after :9001 validates, then terminates blue. **Do not** `update-service` the task-def
  directly under the CODE_DEPLOY controller — CodeDeploy owns revisions.
- **Scale:** `aws ecs update-service --cluster "$CLUSTER_ARN" --service "${NAME}-service" --desired-count N`.
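- **Stuck deploy:** under the `ECS` controller (`rolling`/`canary`), `aws ecs update-service … --force-new-deployment` (per the README's recovery note). Under `CODE_DEPLOY`, `update-service` cannot force a deployment; stop the in-flight one with `aws deploy stop-deployment --deployment-id <id> --auto-rollback-enabled` and create a new deployment.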
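
The `--revision` placeholder in the blue/green bullet stands for an ECS AppSpec. A sketch of one way to build it inline, under assumptions: the helper names `appspec_json` and `deployment_input` are hypothetical, and the AppSpec carries only the task definition and load balancer target (no hooks).

```bash
# appspec_json TASKDEF_ARN CONTAINER_NAME CONTAINER_PORT: print a one-line ECS AppSpec.
appspec_json() {
  printf '{"version":0.0,"Resources":[{"TargetService":{"Type":"AWS::ECS::Service","Properties":{"TaskDefinition":"%s","LoadBalancerInfo":{"ContainerName":"%s","ContainerPort":%s}}}}]}\n' "$1" "$2" "$3"
}

# deployment_input APP GROUP APPSPEC: wrap the AppSpec as the input document for
# `aws deploy create-deployment --cli-input-json`. The AppSpec is embedded as a
# JSON string, so its backslashes and double quotes are escaped first.
deployment_input() {
  local escaped
  escaped="$(printf '%s' "$3" | sed 's/\\/\\\\/g; s/"/\\"/g')"
  printf '{"applicationName":"%s","deploymentGroupName":"%s","revision":{"revisionType":"AppSpecContent","appSpecContent":{"content":"%s"}}}\n' "$1" "$2" "$escaped"
}
```

A deploy would then write `deployment_input "${NAME}-cd-app" "${NAME}-cd-deployment-group" "$(appspec_json "$NEW_TASKDEF_ARN" "$SERVICE" "$CONTAINER_PORT")"` to a file and pass it as `--cli-input-json file://…`, where `NEW_TASKDEF_ARN` (a hypothetical variable) is the revision just registered.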

## Teardown (observe-first, resumable)  💥

> Reverse of create. IAM deletes need the broker `--no-session` mode. The ALB-SG in `ecs-cluster.aws.md`
> cannot be deleted until **this** service SG is gone — tear the service down before the platform.

```bash
# 1. CodeDeploy DG + app (if bluegreen)  💥
aws deploy delete-deployment-group --application-name "${NAME}-cd-app" --deployment-group-name "${NAME}-cd-deployment-group" 2>/dev/null || true
aws deploy delete-application --application-name "${NAME}-cd-app" 2>/dev/null || true
# 2. DNS record (if managed)  💥  route53: re-run the §10 change-batch with Action=DELETE; cloudflare: DELETE the record id saved in Live State
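# 3. ECS service (scale to 0, delete, wait until inactive)  💥⏳
aws ecs update-service --cluster "$CLUSTER_ARN" --service "${NAME}-service" --desired-count 0 >/dev/null
aws ecs delete-service --cluster "$CLUSTER_ARN" --service "${NAME}-service" --force >/dev/null
aws ecs wait services-inactive --cluster "$CLUSTER_ARN" --services "${NAME}-service"   # ⏳ task ENIs must be gone before step 6 can delete the SG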
# 4. target groups  💥
for TG in "$TG_GREEN_ARN" "$TG_BLUE_ARN"; do [ -n "$TG" ] && aws elbv2 delete-target-group --target-group-arn "$TG"; done
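# 5. task def (deregister all revisions)  💥 ; log group
for TD in $(aws ecs list-task-definitions --family-prefix "${NAME}-task" \
    --query 'taskDefinitionArns[]' --output text); do
  aws ecs deregister-task-definition --task-definition "$TD" >/dev/null
done
aws logs delete-log-group --log-group-name "$LOG_GROUP" 2>/dev/null || true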
# 6. service SG  💥  (frees the ALB-SG for the platform teardown)
aws ec2 delete-security-group --group-id "$SERVICE_SG_ID"
# 7. IAM roles (broker --no-session)  💥  — detach policies then delete-role for exec/task/codedeploy
# 8. ECR repo (force = delete images too)  💥
aws ecr delete-repository --repository-name "$ECR_REPO" --force
```
```bash
# ✔ teardown verify
aws ecs describe-services --cluster "$CLUSTER_ARN" --services "${NAME}-service" --query 'services[0].status' --output text  # DRAINING/INACTIVE/missing
aws ecr describe-repositories --repository-names "$ECR_REPO" 2>&1 | grep -q RepositoryNotFound && echo ecr-gone
```
> → Live State: set `status: gone`, clear realized ids.
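
Step 7 is described but not spelled out. A minimal sketch of the detach-then-delete sequence, assuming a hypothetical helper `delete_role`; like the other teardown steps it swallows "already gone" errors so a resumed run keeps going, and it needs the broker's no-session mode.

```bash
# delete_role ROLE_NAME: detach managed policies, delete inline policies, then
# delete the role. A missing role yields empty lists and a no-op delete.
delete_role() {
  local role="$1" arn name
  for arn in $(aws iam list-attached-role-policies --role-name "$role" \
      --query 'AttachedPolicies[].PolicyArn' --output text 2>/dev/null); do
    aws iam detach-role-policy --role-name "$role" --policy-arn "$arn"
  done
  for name in $(aws iam list-role-policies --role-name "$role" \
      --query 'PolicyNames[]' --output text 2>/dev/null); do
    aws iam delete-role-policy --role-name "$role" --policy-name "$name"
  done
  aws iam delete-role --role-name "$role" 2>/dev/null || true
}
```

The three roles from §3–§5 then go with `for R in exec task codedeploy; do delete_role "${NAME}-${R}-role"; done`.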

## Deliberately not included

- **The cluster, ALB, listeners, ALB-SG** — the shared platform; a **Requires** on `ecs-cluster.aws.md`.
- **The DNS zone** — a **Requires** on `domain.cloudflare.md` (example.com on Cloudflare; `domain.aws.md`
  only for AWS-delegated domains), and only when `DNS_MODE=managed`.
- **`service-b` / `service-c` as separate files** — they are `SERVICE=service-b|c` runs of *this* plan;
  the multiplicity is by-parameter, not by-fork.
- **Autoscaling target tracking** (`cpu_target=50`, `mem_target=75`) — add via `application-autoscaling`
  as an Update; omitted from first apply to keep the frontier legible.
- **An appspec/CI pipeline** — `create-deployment` is shown in Update; wiring CodePipeline is a separate plan.
