Have you ever had the problem that your ECS cluster isn't scaling out extra EC2 instances to provide capacity, even though some tasks are failing to be placed? Does the error look something like this?
service stroobantsdev-service was unable to place a task because no container instance met all of its requirements. The closest matching container-instance x has insufficient CPU units available. For more information, see the Troubleshooting section.
service stroobantsdev-service was unable to place a task because no container instance met all of its requirements. The closest matching container-instance x has insufficient memory available. For more information, see the Troubleshooting section.
But when you look at the metrics, you see only around 10% CPU and memory usage?
The problem is that the reservations are set too high: task placement is based on what is reserved, not on what is actually used. So you have a couple of possible fixes.
Now, in 99% of cases you scale on CPU/memory usage (which doesn't help here), on queue size, or on some other third-party event. But you can also scale on CPU/memory reservation.
The first thing we must decide is how we want to scale. For this example, let's say we want to scale out if either of the two reservation metrics is above 75%. Now, we could create multiple alarms that scale in/out independently, but these could conflict with each other. For example, when CPU reservation is above 75% (which should trigger a scale-out) but memory reservation is under 40% (which should trigger a scale-in), one alarm wants to add an instance while the other wants to remove one.
So let's use a formula I remembered from somewhere deep in memory:
sqrt(CPUReservation^2 + MemoryReservation^2)
If either of them is 75 or above, this expression returns at least 75, even when the other one is zero. For the scale-in threshold you should experiment to find a good number; I use 40 at the moment. As with everything in the cloud (and FinOps), iterate on these values until they fit your situation. There is no magic number: it all depends on your instance sizes, container sizing, and so on.
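To see how the combined value behaves around those thresholds, here is a quick sanity check of the formula in plain TypeScript (the sample numbers are just illustrative, not from any real cluster):

```typescript
// Combined reservation, mirroring the CloudWatch math expression used below.
const combined = (cpu: number, mem: number): number =>
  Math.sqrt(cpu ** 2 + mem ** 2);

console.log(combined(75, 0));  // 75    -> at the scale-out threshold even with idle memory
console.log(combined(40, 10)); // ~41.2 -> just above the scale-in threshold
console.log(combined(80, 60)); // 100   -> well above 75, scale out
console.log(combined(75, 75)); // ~106  -> the value can exceed 100, which is fine for step scaling
```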
First we will define our metrics. AWS publishes a CPUReservation and a MemoryReservation metric under the AWS/ECS namespace. The only dimension we need is the cluster name (which I get here from a previously created cluster).
```typescript
const reservationCpuMetric = new Metric({
  namespace: 'AWS/ECS',
  metricName: 'CPUReservation',
  statistic: "Average",
  period: cdk.Duration.minutes(1),
  dimensions: {
    "ClusterName": cluster.clusterName
  }
});

const reservationMemoryMetric = new Metric({
  namespace: 'AWS/ECS',
  metricName: 'MemoryReservation',
  statistic: "Average",
  period: cdk.Duration.minutes(1),
  dimensions: {
    "ClusterName": cluster.clusterName
  }
});
```
```typescript
const scaleReservation = new MathExpression({
  expression: "(m1^2+m2^2)^(1/2)",
  period: cdk.Duration.minutes(1),
  usingMetrics: {
    'm1': reservationCpuMetric,
    'm2': reservationMemoryMetric,
  }
});
```
So, as you can see, the important part here is scaleReservation, a math expression that takes the CPU and memory reservation metrics and applies our formula to them. Now to apply the scaling itself:
```typescript
const autoScalingGroup = new AutoScalingGroup(this, "clusterAsgSpotFleet", {
  vpc,
  instanceType: new InstanceType('t3.large'),
  machineImage: ecs.EcsOptimizedImage.amazonLinux2(),
  minCapacity: 1,
  maxCapacity: 5,
  updatePolicy: UpdatePolicy.rollingUpdate(),
  maxInstanceLifetime: cdk.Duration.days(14),
});

autoScalingGroup.addUserData(`
#!/bin/bash
cat <<'EOF' >> /etc/ecs/ecs.config
ECS_CLUSTER=${cluster.clusterName}
EOF`);
```
```typescript
autoScalingGroup.scaleOnMetric('ScaleToReservation', {
  metric: scaleReservation,
  scalingSteps: [
    { upper: 40, change: -1 },
    { lower: 75, change: +1 },
  ],
  adjustmentType: AdjustmentType.CHANGE_IN_CAPACITY,
});
```
```typescript
const capacityProvider = new ecs.AsgCapacityProvider(this, 'clusterAsgSpotFleetProvider', {
  autoScalingGroup,
});

cluster.addAsgCapacityProvider(capacityProvider);
```
Here we created the autoscaling group and added it to our cluster as a capacity provider. The important part is the scaleOnMetric call, where we configure the group to add an instance when the combined reservation metric is >= 75 and to remove one when it is <= 40.
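The snippets above stop at the cluster itself. To actually run tasks on these instances, a service can reference the capacity provider by name. Here is a minimal sketch of what that could look like; the task definition, container, and service names are hypothetical and not from the original setup:

```typescript
// Hypothetical task definition and service, shown only to illustrate
// pointing a service at the capacity provider created above.
const taskDefinition = new ecs.Ec2TaskDefinition(this, 'stroobantsdevTaskDef');
taskDefinition.addContainer('app', {
  image: ecs.ContainerImage.fromRegistry('amazon/amazon-ecs-sample'),
  memoryReservationMiB: 256,
  cpu: 256,
});

new ecs.Ec2Service(this, 'stroobantsdevService', {
  cluster,
  taskDefinition,
  desiredCount: 2,
  capacityProviderStrategies: [
    { capacityProvider: capacityProvider.capacityProviderName, weight: 1 },
  ],
});
```

Note how the container only reserves what it realistically needs; oversized reservations are exactly what causes the placement errors at the top of this post.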