Last updated: Jan 27, 2024
Reading time · 7 min
AWS Services emit metrics that we can use to set up alarms via CloudWatch.
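For example, metrics can be the number of errors of a Lambda function, the duration of its invocations, or the number of throttled DynamoDB requests.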
Every AWS service documents the list of metrics that it emits by default.
Once we have created the metrics we want to track, we can create an alarm.
The purpose of an alarm in CloudWatch is to notify us when the metrics we've set reach specific values, over a specified period of time.
For instance, we can create an alarm that notifies us:
if the sum of Errors of a Lambda function is greater than or equal to 5 over a period of 3 minutes
if the average Duration of a Lambda function's invocations exceeds 2 seconds over a period of 3 minutes
if the sum of throttled DynamoDB requests exceeds 3 over a period of 5 minutes
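To make these conditions concrete, the following sketch writes them down as plain TypeScript data. It is only an illustrative model, not CDK or CloudWatch code: the AlarmRule type, the rule constants and the breachesThreshold function are hypothetical names, and the Duration threshold assumes the metric is reported in milliseconds.

```typescript
// Hypothetical model of the three example conditions; not a CDK or CloudWatch API.
type Comparison = '>=' | '>';

interface AlarmRule {
  metricName: string; // name of the metric the rule watches
  statistic: 'Sum' | 'Average'; // how the data points of one period are aggregated
  periodMinutes: number; // length of the period the statistic is computed over
  comparison: Comparison; // ">=" for "greater than or equal to", ">" for "exceeds"
  threshold: number; // value the aggregated statistic is compared against
}

const errorsRule: AlarmRule = {
  metricName: 'Errors',
  statistic: 'Sum',
  periodMinutes: 3,
  comparison: '>=',
  threshold: 5,
};

const durationRule: AlarmRule = {
  metricName: 'Duration',
  statistic: 'Average',
  periodMinutes: 3,
  comparison: '>',
  threshold: 2000, // assumption: 2 seconds expressed in milliseconds
};

const throttledRequestsRule: AlarmRule = {
  metricName: 'ThrottledRequests',
  statistic: 'Sum',
  periodMinutes: 5,
  comparison: '>',
  threshold: 3,
};

// Returns true when the value aggregated over one period breaches the rule's threshold.
function breachesThreshold(rule: AlarmRule, aggregatedValue: number): boolean {
  return rule.comparison === '>='
    ? aggregatedValue >= rule.threshold
    : aggregatedValue > rule.threshold;
}
```

Note the difference between the two operators: a sum of exactly 5 errors breaches the first rule, while an average duration of exactly 2000 ms does not breach the second one, because "exceeds" is a strict comparison.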
We are going to create a small CDK application that consists of a Lambda function and two CloudWatch alarms, one on the function's errors and one on its invocations.
Let's start by defining the Lambda function and the metrics.
import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as cdk from 'aws-cdk-lib';
import * as path from 'path';

export class MyCdkStack extends cdk.Stack {
  constructor(scope: cdk.App, id: string, props: cdk.StackProps) {
    super(scope, id, props);

    // lambda function definition
    const myFunction = new lambda.Function(this, 'my-function', {
      runtime: lambda.Runtime.NODEJS_18_X,
      memorySize: 1024,
      timeout: cdk.Duration.seconds(5),
      handler: 'index.main',
      code: lambda.Code.fromAsset(path.join(__dirname, '/../src/my-lambda')),
    });

    // define a metric for lambda errors
    const functionErrors = myFunction.metricErrors({
      period: cdk.Duration.minutes(1),
    });

    // define a metric for lambda invocations
    const functionInvocation = myFunction.metricInvocations({
      period: cdk.Duration.minutes(1),
    });
  }
}
The code for the Lambda function could be as simple as follows.
async function main(event) {
  throw new Error('An unexpected error occurred');
}

module.exports = {main};
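In the code sample we created a Lambda function and defined two metrics for it, one for its errors and one for its invocations, each with a period of 1 minute.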
The higher-level constructs often expose methods that allow us to create metric objects, without having to manually instantiate the Metric class from the CloudWatch module.
For example, if we were working with a DynamoDB table, we could take advantage of methods like metricConsumedReadCapacityUnits, metricConsumedWriteCapacityUnits and metricUserErrors.
Next, let's add the alarms that will be triggered when our metrics reach a specified threshold.
import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as cdk from 'aws-cdk-lib';
import * as path from 'path';

export class MyCdkStack extends cdk.Stack {
  constructor(scope: cdk.App, id: string, props: cdk.StackProps) {
    super(scope, id, props);

    // ... rest of the code

    // create an Alarm using the Alarm construct
    new cloudwatch.Alarm(this, 'lambda-errors-alarm', {
      metric: functionErrors,
      threshold: 1,
      comparisonOperator:
        cloudwatch.ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD,
      evaluationPeriods: 1,
      alarmDescription:
        'Alarm if the SUM of Errors is greater than or equal to the threshold (1) for 1 evaluation period',
    });

    // create an Alarm directly on the Metric
    functionInvocation.createAlarm(this, 'lambda-invocation-alarm', {
      threshold: 1,
      evaluationPeriods: 1,
      alarmDescription:
        'Alarm if the SUM of Lambda invocations is greater than or equal to the threshold (1) for 1 evaluation period',
    });
  }
}
We created a CloudWatch alarm using the Level 2 Alarm construct.
For the metric, we used the functionErrors metric we created earlier.
The threshold is the value against which the statistic emitted by the metric is compared. In our case, the number of failed invocations of the Lambda function is compared against a threshold of 1.
The comparisonOperator is the operator used to compare the statistic emitted by the metric against the threshold. In our case, if the SUM of invocation errors from the Lambda function is GREATER_THAN_OR_EQUAL_TO the threshold of 1 over 1 evaluation period of 1 minute, the alarm is triggered.
The evaluationPeriods property is the number of consecutive periods over which the statistic emitted by the metric is compared to the threshold. In our case, we set the period property to 1 minute when we created our metrics, so we compare the metric's statistic to the threshold for 1 evaluation period of 1 minute.
We created the second alarm using the createAlarm method, directly on the metric. Defining alarms in CDK can be done in multiple ways, so it's a matter of personal preference.
Let's now create the stack and look at the result:
npx aws-cdk deploy
If I open my CloudFormation console I can see that the resources were created successfully:
If I open my CloudWatch console I can see that the alarms are at the Insufficient data state:
Metric alarms can be in 1 of 3 states:
OK - the metric is within the specified threshold
ALARM - the metric is outside the specified threshold
INSUFFICIENT_DATA - the alarm has just started or there isn't enough data available to determine the alarm's state
We've set up 2 alarms: one on the errors of the Lambda function and one on its invocations.
Let's invoke our Lambda function using the console and look at the result.
As expected, our Lambda function failed. A failed invocation counts towards both the Errors and the Invocations metrics, so if we look at the state of our alarms we should see that both of them have been triggered.
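The transition we just observed, from insufficient data to a triggered alarm, can be modelled with a short sketch. This is a deliberately simplified model and not CloudWatch's actual evaluation algorithm, which has additional rules (for example for missing data). The AlarmState type and the evaluateAlarmState function are hypothetical names, and the comparison is fixed to "greater than or equal to", as in our two alarms.

```typescript
type AlarmState = 'OK' | 'ALARM' | 'INSUFFICIENT_DATA';

// datapoints: one aggregated value per period, oldest first.
// undefined means that no data was reported for that period.
function evaluateAlarmState(
  datapoints: Array<number | undefined>,
  threshold: number,
  evaluationPeriods: number,
): AlarmState {
  if (evaluationPeriods < 1 || Math.floor(evaluationPeriods) !== evaluationPeriods) {
    throw new Error('evaluationPeriods must be a positive integer');
  }

  // Only the most recent `evaluationPeriods` periods are taken into account.
  const recent = datapoints.slice(-evaluationPeriods);

  // Simplification: any missing period means the state cannot be determined.
  if (recent.length < evaluationPeriods || recent.some((value) => value === undefined)) {
    return 'INSUFFICIENT_DATA';
  }

  // The alarm fires only if every one of the consecutive periods breaches the threshold.
  const allBreaching = recent.every(
    (value) => value !== undefined && value >= threshold,
  );
  return allBreaching ? 'ALARM' : 'OK';
}

// Before the first invocation there are no data points for the Errors metric.
const stateBeforeInvocation = evaluateAlarmState([], 1, 1);

// After one failed invocation, the sum of Errors for the last minute is 1.
const stateAfterFailedInvocation = evaluateAlarmState([1], 1, 1);

console.log(stateBeforeInvocation, stateAfterFailedInvocation);
```

With a threshold of 1 and a single evaluation period, one failed invocation is enough to move the modelled alarm into the ALARM state, which matches what we see in the console.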
In order to create alarms in CDK, we first have to define a metric and then create our CloudWatch alarm.
So far, we have only created metrics by using the metric* methods exposed on the Function construct, i.e.:
const functionErrors = myFunction.metricErrors({
  period: cdk.Duration.minutes(1),
});
However, some of the higher-level constructs might not expose a helper method for all of the metrics we need to create.
In this case, we can define our metrics by using the Metric class, for example:
// manually instantiate a Metric
const concurrentExecutions = new cloudwatch.Metric({
  namespace: 'AWS/Lambda',
  metricName: 'ConcurrentExecutions',
  period: cdk.Duration.minutes(5),
  statistic: 'Maximum',
  // aws-cdk-lib (CDK v2) expects dimensionsMap; CDK v1 named this property dimensions
  dimensionsMap: {
    FunctionName: myFunction.functionName,
  },
});
In the code sample:
We created a metric in the namespace AWS/Lambda. A namespace denotes an AWS service and the names are in the form of AWS/ServiceName, for example AWS/DynamoDB, AWS/ApiGateway. As the letter casing can be confusing, you can view the specific casing of a service's name by clicking on the Metrics section in your CloudWatch console and filtering by the name of the service.
The metricName property is set to ConcurrentExecutions. We can find all of the available metrics for an AWS service by searching for "ServiceName cloudwatch metrics". Lambda-specific CloudWatch metrics can be found under "Lambda function metrics" in the AWS Lambda documentation.
The period property is the period over which the specified statistic is applied. In our case, we're tracking concurrent executions for a Lambda function over a period of 5 minutes.
The statistic property is an aggregate of metric data over a specified period of time. The statistic can be: Minimum, Maximum, Sum, Average, etc. In our case, we're taking the Maximum number of concurrent Lambda function executions over a period of 5 minutes.
The dimensions of a metric (set through the dimensionsMap property in aws-cdk-lib v2; CDK v1 named the property dimensions) allow us to filter the results that CloudWatch returns. In the example, we get statistics for a specific Lambda function, because we've set the FunctionName dimension.
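To illustrate how the dimensions, the period and the statistic work together, here is a minimal, self-contained sketch that filters raw data points by dimension, groups them into periods and applies a statistic to each group. It only models the idea and is not how CloudWatch is implemented. The Datapoint type and the aggregate and statisticPerPeriod functions are hypothetical names.

```typescript
type Statistic = 'Minimum' | 'Maximum' | 'Sum' | 'Average';

interface Datapoint {
  timestampSeconds: number; // when the value was recorded
  value: number; // the recorded value
  dimensions: Record<string, string>; // e.g. { FunctionName: 'my-function' }
}

// Applies the statistic to the values that fall into one period.
function aggregate(values: number[], statistic: Statistic): number {
  const sum = values.reduce((total, value) => total + value, 0);
  switch (statistic) {
    case 'Minimum':
      return Math.min(...values);
    case 'Maximum':
      return Math.max(...values);
    case 'Sum':
      return sum;
    case 'Average':
      return sum / values.length;
  }
}

// Returns one aggregated value per period, keyed by the period's start time in seconds.
function statisticPerPeriod(
  datapoints: Datapoint[],
  dimensionFilter: Record<string, string>,
  periodSeconds: number,
  statistic: Statistic,
): Record<number, number> {
  const buckets: Record<number, number[]> = {};

  for (const datapoint of datapoints) {
    // Keep only the data points whose dimensions match the filter.
    const matches = Object.keys(dimensionFilter).every(
      (key) => datapoint.dimensions[key] === dimensionFilter[key],
    );
    if (!matches) continue;

    // Assign the data point to the period it falls into.
    const periodStart =
      Math.floor(datapoint.timestampSeconds / periodSeconds) * periodSeconds;
    const bucket = buckets[periodStart];
    if (bucket) {
      bucket.push(datapoint.value);
    } else {
      buckets[periodStart] = [datapoint.value];
    }
  }

  const result: Record<number, number> = {};
  for (const key of Object.keys(buckets)) {
    const values = buckets[Number(key)];
    if (values) result[Number(key)] = aggregate(values, statistic);
  }
  return result;
}
```

Calling statisticPerPeriod with the filter { FunctionName: 'my-function' }, a period of 300 seconds and the Maximum statistic mirrors the ConcurrentExecutions metric above: only data points of that function are considered, and each 5-minute period yields a single value.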
To create CloudWatch alarms, we first define our metrics and then create an alarm that compares a threshold we've set to the statistic emitted by a metric over a period of time.
Most of the time we are able to use predefined helper methods, already written for us by the CDK team, by using the metric* methods on the construct.
const functionInvocation = myFunction.metricInvocations({
  period: cdk.Duration.minutes(1),
});
In the case that the helper method for the metric we need is not implemented, we can manually create a metric using the Metric class.
There are also multiple ways to create CloudWatch Alarms in CDK.
We can either create an alarm, using the Alarm construct:
new cloudwatch.Alarm(this, 'lambda-errors-alarm', {
  metric: functionErrors,
  threshold: 1,
  comparisonOperator:
    cloudwatch.ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD,
  evaluationPeriods: 1,
  alarmDescription:
    'Alarm if the SUM of Errors is greater than or equal to the threshold (1) for 1 evaluation period',
});
Or we can create an alarm using the createAlarm method directly on the metric object:
functionInvocation.createAlarm(this, 'lambda-invocation-alarm', {
  threshold: 1,
  evaluationPeriods: 1,
  alarmDescription:
    'Alarm if the SUM of Lambda invocations is greater than or equal to the threshold (1) for 1 evaluation period',
});
We can delete the provisioned resources by running the cdk destroy command:
npx aws-cdk destroy