deploying to amaysim ECS

  • Introduction to amaysim’s ECS platform
  • Setting up a project with ecs-utils
  • Common troubleshooting steps

why ECS?

  • Overcome shortcomings of Rancher
  • Easier to manage than Kubernetes
  • Multi-tenancy
  • Very cost-efficient with spot instances
  • Container autoscaling
  • IAM roles for containers

goals of our ECS platform

  • Fully automated, zero downtime blue/green deployments
  • Multi-tenanted Docker platform
  • Container autoscaling
  • Spot Fleet
  • Flexible but opinionated architecture

architecture overview

amaysim ECS Overview

autoscaling

Image by Philipp Garbe via https://garbe.io

  • Autoscaling using Target Tracking on reservation metrics
  • All traffic goes through shared CloudFront
  • We use Application Load Balancer (ALB), not Classic ELB
  • Multiple instance types via Spot Fleet
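
To make "reservation metrics" concrete: ECS reports cluster CPU reservation as the CPU units reserved by running tasks divided by the CPU units registered by the cluster's instances. A minimal sketch of that ratio follows; the function name and the example numbers are illustrative assumptions, not part of the platform.

```python
def reservation_pct(task_reservations, instance_capacities):
    """Cluster reservation as a percentage.

    task_reservations: CPU units (or MiB of memory) reserved by each running task.
    instance_capacities: CPU units (or MiB) registered by each container instance.
    """
    registered = sum(instance_capacities)
    if registered <= 0:
        raise ValueError("cluster has no registered capacity")
    return 100.0 * sum(task_reservations) / registered


# Hypothetical cluster: 24 tasks reserving 128 CPU units each,
# running on two instances that register 2048 CPU units each.
print(reservation_pct([128] * 24, [2048, 2048]))  # 75.0
```

Target Tracking then adds or removes instances to hold this percentage near the configured target.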

prerequisite: the application stack

Application Stack

docker-ecs-utils/ecs-cluster-application.yml

application template in GitHub

Continuous Deployment using Stacker:

  - name: ECS-Dev-App-Example
    template_path: ecs-cluster-application.yml
    region: ap-southeast-2
    profile: nonprod
    requires:
      - ECS-Dev
    variables:
      Name: Example
      Environment: Dev
      ClusterName: Dev
      HostedZoneName: dev-apps.amaysim.net.
      LBType: ALB
      Scheme: External
      SSLCertificateARN: arn:aws:acm:ap-southeast-2:999999999999:certificate/4013c1bc-a532-4fcc-9f90-123456789876  # *.dev-apps.amaysim.net
      Subnets: subnet-12345678,subnet-22345678,subnet-3456789e  # Dev Public x
      VpcId: vpc-abc234ad

setting up a project with ecs-utils

ecs-utils repo

  1. Add ecs.json and ecs-config.yml
  2. Add deploy, cutover and autocleanup targets to Makefile
  3. Add amaysim/ecs-utils to docker-compose.yml
  4. Update .env and .env.template with environment variables

ecs.json

{
    "containerDefinitions": [
        {
            "essential": true,
            "image": "123456789987.dkr.ecr.ap-southeast-2.amazonaws.com/devops/ok:${BUILD_VERSION}",
            "name": "${ECS_APP_NAME}",
            "linuxParameters": {
                "initProcessEnabled": true
            },
            "portMappings": [
                {
                    "containerPort": 8888
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "ecs-${ECS_APP_NAME}-${ENV}",
                    "awslogs-region": "ap-southeast-2",
                    "awslogs-stream-prefix": "${BUILD_VERSION}"
                }
            }
        }
    ],
    "family": "${ECS_APP_NAME}-${ENV}",
    "volumes": [],
    "memory": "128",
    "cpu": "128"
}
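
The `${...}` placeholders in ecs.json are filled from environment variables at deploy time. The sketch below shows one way such a substitution could work using only the Python standard library; it is an illustration of the idea, not the code ecs-utils actually runs, and `render_task_definition` is a hypothetical name.

```python
import json
from string import Template


def render_task_definition(template_text, variables):
    """Fill ${VAR} placeholders and parse the result as JSON.

    Template.substitute raises KeyError when a variable is missing,
    so an incomplete .env fails before anything is deployed.
    """
    rendered = Template(template_text).substitute(variables)
    return json.loads(rendered)


# Cut-down template using the same placeholders as ecs.json.
template = """
{
    "containerDefinitions": [
        {"name": "${ECS_APP_NAME}", "image": "devops/ok:${BUILD_VERSION}"}
    ],
    "family": "${ECS_APP_NAME}-${ENV}"
}
"""
env = {"ECS_APP_NAME": "Example", "ENV": "Dev", "BUILD_VERSION": "10-b4cce22"}
task_definition = render_task_definition(template, env)
print(task_definition["family"])  # Example-Dev
```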

ecs-config.yml

---
lb_health_check: /app/healthcheck
lb_health_check_grace_period: 30
lb_deregistration_delay: 60

autoscaling: Enable
autoscaling_target: 60  # what level of CPU utilisation to maintain
autoscaling_min_size: 3
autoscaling_max_size: 20
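
How the three autoscaling values interact can be sketched as a proportional rule clamped to the configured bounds. The exact algorithm AWS uses for target tracking is not public, so treat this as a rough mental model only; `desired_count` is a hypothetical helper and its defaults mirror the values above.

```python
import math


def desired_count(current_count, cpu_utilisation, target=60, min_size=3, max_size=20):
    """Approximate the container count that brings CPU back to the target.

    Scales the current count in proportion to how far utilisation is from
    the target, then clamps to autoscaling_min_size / autoscaling_max_size.
    """
    proposed = math.ceil(current_count * cpu_utilisation / target)
    return max(min_size, min(max_size, proposed))


print(desired_count(4, 90))    # 6: scale out to bring 90% down towards 60%
print(desired_count(3, 10))    # 3: never below autoscaling_min_size
print(desired_count(15, 120))  # 20: never above autoscaling_max_size
```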

Makefile & .env

deploy: $(ENV_RM_REQUIRED) $(DOTENV_TARGET) $(ASSUME_REQUIRED)
	docker-compose run --rm ecs make -f /scripts/Makefile deploy

cutover: $(ENV_RM_REQUIRED) $(DOTENV_TARGET) $(ASSUME_REQUIRED)
	docker-compose run --rm ecs make -f /scripts/Makefile cutover

autocleanup: $(ENV_RM_REQUIRED) $(DOTENV_TARGET) $(ASSUME_REQUIRED)
	docker-compose run --rm ecs make -f /scripts/Makefile autocleanup

# .env
ENV=Dev
REALM=NonProd
ECS_APP_NAME=Example
ECS_CLUSTER_NAME=Dev
BUILD_VERSION=10-b4cce22

AWS_DEFAULT_REGION=ap-southeast-2
AWS_HOSTED_ZONE=www-dev.amaysim.com.au
BASE_PATH=/551f7c62858899445e42d904170f56ca

deployment walkthrough

Version Stack

cutting over

  • Changing default listener rule
  • Checking the live version
  • Setting desired number of containers
  • Blue/green
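
The first two bullets can be sketched against the ELBv2 API: the default rule of an ALB listener is changed with `modify_listener`, and the live version is whatever target group that default action forwards to. This is a hedged sketch, not the ecs-utils implementation; the function names and the ARNs are placeholders, and the client is passed in so the logic can be tried with a stub.

```python
def live_target_group(elbv2, listener_arn):
    """Return the target group ARN the listener's default rule forwards to."""
    listeners = elbv2.describe_listeners(ListenerArns=[listener_arn])["Listeners"]
    for action in listeners[0]["DefaultActions"]:
        if action["Type"] == "forward":
            return action.get("TargetGroupArn")
    return None


def cutover(elbv2, listener_arn, new_target_group_arn):
    """Point the listener's default rule at the new version's target group."""
    return elbv2.modify_listener(
        ListenerArn=listener_arn,
        DefaultActions=[{"Type": "forward", "TargetGroupArn": new_target_group_arn}],
    )


# Against AWS this would be called with a real client, for example:
#   import boto3
#   elbv2 = boto3.client("elbv2", region_name="ap-southeast-2")
#   cutover(elbv2, listener_arn, green_target_group_arn)
```

Because the old version's service keeps running until autocleanup, rolling back is the same call with the previous target group ARN.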

verifying and troubleshooting

  1. Check the CloudFormation events
  2. Check the service events and state
  3. Check the task exit reason
  4. Check the cluster state
  5. Check the app logs
  6. Check the response directly from the load balancer, or even the container
  7. Check the target group healthchecks
  8. Check CloudWatch metrics
  9. If all else fails: SSH to EC2 instance
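
Step 3 is often the quickest win. The sketch below pulls the stop reason and container exit codes out of an ECS `describe_tasks` response; `task_exit_reasons` is a hypothetical helper, and the sample response is made up for illustration.

```python
def task_exit_reasons(describe_tasks_response):
    """Flatten a describe_tasks response into one row per container."""
    rows = []
    for task in describe_tasks_response.get("tasks", []):
        for container in task.get("containers", []):
            rows.append({
                "task": task.get("taskArn"),
                "stopped_reason": task.get("stoppedReason"),
                "container": container.get("name"),
                "exit_code": container.get("exitCode"),
                "reason": container.get("reason"),
            })
    return rows


# Made-up response in the shape returned by
#   boto3.client("ecs").describe_tasks(cluster=..., tasks=[...])
sample = {
    "tasks": [{
        "taskArn": "arn:aws:ecs:ap-southeast-2:999999999999:task/example",
        "stoppedReason": "Essential container in task exited",
        "containers": [{"name": "Example", "exitCode": 137}],
    }]
}
for row in task_exit_reasons(sample):
    print(row["container"], row["exit_code"], row["stopped_reason"])
```

Stopped tasks are only returned by the API for a limited time after they stop, so check soon after a failed deploy.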

Thanks!

Questions?