This plugin ensures that the Amazon SQS queue does not exceed the maximum number of unprocessed messages.
Risk Level: MEDIUM
To be highly available and responsive, an Amazon SQS queue should hold fewer unprocessed messages than the configured limit.
Recommended Action: Configure an appropriate message polling time and a dead-letter queue for the Amazon SQS queue so that messages are handled in time.
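As a rough illustration of the recommended action, the sketch below builds the queue attributes that enable long polling and attach a dead-letter queue. The helper name `build_queue_attributes`, the default values, and the example ARN are assumptions made for illustration. `ReceiveMessageWaitTimeSeconds` and `RedrivePolicy` are the SQS attribute names these settings map to.

```python
import json


def build_queue_attributes(dlq_arn, max_receive_count=5, wait_time_seconds=20):
    """Build SQS attributes for long polling plus a dead-letter queue.

    Hypothetical helper: the defaults are illustrative, not recommended values.
    """
    # SQS accepts a long-polling wait time of 0 to 20 seconds.
    if not 0 <= wait_time_seconds <= 20:
        raise ValueError("wait_time_seconds must be between 0 and 20")
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    return {
        # Queue attribute values are passed to SQS as strings.
        "ReceiveMessageWaitTimeSeconds": str(wait_time_seconds),
        # After max_receive_count failed receives, SQS moves the message
        # to the dead-letter queue identified by dlq_arn.
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": max_receive_count}
        ),
    }
```

With boto3, the result could be applied with `sqs.set_queue_attributes(QueueUrl=queue_url, Attributes=build_queue_attributes(dlq_arn))`, where `sqs` is an SQS client and `queue_url` and `dlq_arn` identify your own queues.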
About the Service :
SQS (Amazon Simple Queue Service) is a fully managed message queuing service for decoupling and scaling microservices, distributed systems, and serverless applications. SQS removes the complexity and overhead of managing and operating message-oriented middleware, so developers can concentrate on differentiating work. With SQS, you can send, store, and receive messages between software components at any volume, without losing messages or requiring other services to be available.
If the number of unprocessed messages is not monitored and crosses a certain limit, it may affect the health of the consumers, such as EC2 instances or Lambda functions, that read messages from the designated SQS queue and do the actual processing.
Steps to reproduce :
- Sign in to your AWS Management Console.
- Navigate to the SQS dashboard at: https://console.aws.amazon.com/sqs/
- Select the SQS queue that you want to examine.
- Select the Monitoring tab from the bottom panel and check for “Approximate Number of Messages Visible”.
- The SQS queue holds too many unprocessed messages if the value shown is equal to or higher than the defined threshold (the default, or a custom value set on the Pingsafe dashboard).
- Repeat steps 3 and 4 for each SQS queue in the current region, then for all other regions.
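The console check above can also be scripted. The following is a minimal sketch, assuming a boto3-style SQS client is passed in. The function name `find_backlogged_queues` is hypothetical, and the threshold is left to the caller because the plugin's default value is not stated here. It reads the `ApproximateNumberOfMessages` queue attribute, which reports the approximate number of visible messages.

```python
def find_backlogged_queues(sqs_client, threshold):
    """Return (queue_url, visible_messages) pairs at or above the threshold.

    sqs_client is expected to behave like boto3.client("sqs") for one region.
    """
    backlogged = []
    # "QueueUrls" is absent from the response when the region has no queues.
    for queue_url in sqs_client.list_queues().get("QueueUrls", []):
        attributes = sqs_client.get_queue_attributes(
            QueueUrl=queue_url,
            AttributeNames=["ApproximateNumberOfMessages"],
        )["Attributes"]
        # Attribute values come back as strings.
        visible = int(attributes["ApproximateNumberOfMessages"])
        # "Equal to or higher than" the threshold counts as a finding.
        if visible >= threshold:
            backlogged.append((queue_url, visible))
    return backlogged
```

To cover every region, one option is to create a client per region, for example `boto3.client("sqs", region_name=region)`, and call the function once for each. A single `list_queues` call returns at most 1,000 queues, so accounts with more queues would need pagination.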
Steps for remediation :
For EC2 Instances:
- Navigate to the EC2 dashboard at: https://console.aws.amazon.com/ec2/
- In the left navigation panel, under the Instances section, click Instances.
- Select the worker EC2 instance.
- If the instance Status Check has failed and the resource is unreachable:
  - Click the Actions button in the dashboard top menu, select Instance State, then choose Reboot.
  - In the Reboot Instances dialog box, review the instance details and click Yes, Reboot to reboot the instance.
- If the Status Check passes, the instance most likely lacks the capacity to process the required SQS messages. Perform the following steps to upgrade the instance type:
  - From the dashboard top menu, click Actions, then Instance State, then Stop.
  - In the Stop Instances dialog box, review the action details and click Yes, Stop to stop the instance.
  - Click the Actions button again, select Instance Settings, then choose Change Instance Type.
  - In the Change Instance Type dialog box, choose the instance type to upgrade to from the Instance Type dropdown list, then click Apply.
  - From the dashboard top menu, click Actions, then Instance State, then Start.
  - In the Start Instances dialog box, click Yes, Start to start the instance. Booting the instance and completing its system checks should take only a few minutes.
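The EC2 steps above could be automated roughly as follows. This is a sketch under stated assumptions: `ec2_client` behaves like a boto3 EC2 client, the function name `remediate_worker` is hypothetical, and a status of `impaired` is treated as a failed status check. Changing the instance type this way applies to instances that can be stopped, so treat it as an outline, not a drop-in script.

```python
def remediate_worker(ec2_client, instance_id, larger_type):
    """Reboot an impaired worker, or resize a healthy but undersized one.

    Returns "rebooted" or "resized" to report which path was taken.
    """
    statuses = ec2_client.describe_instance_status(
        InstanceIds=[instance_id],
        IncludeAllInstances=True,  # also report instances that are not running
    )["InstanceStatuses"]
    if not statuses:
        raise LookupError(f"no status returned for {instance_id}")

    status = statuses[0]
    checks = (status["InstanceStatus"]["Status"], status["SystemStatus"]["Status"])
    if "impaired" in checks:
        # A failed status check: reboot the instance.
        ec2_client.reboot_instances(InstanceIds=[instance_id])
        return "rebooted"

    # Status checks pass: stop, change the type, then start again.
    ec2_client.stop_instances(InstanceIds=[instance_id])
    ec2_client.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    ec2_client.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={"Value": larger_type},
    )
    ec2_client.start_instances(InstanceIds=[instance_id])
    return "resized"
```

A call might look like `remediate_worker(boto3.client("ec2"), instance_id, "m5.large")`, where the instance ID and the target type are placeholders to replace with your own values.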
For Lambda Functions:
- Navigate to the Lambda dashboard at: https://console.aws.amazon.com/lambda/
- In the navigation panel, under the AWS Lambda section, choose Functions.
- Select the Lambda function that serves as an SQS consumer.
- Select the Monitoring tab from the dashboard bottom panel, then click the View logs in CloudWatch link to access the selected function's logs. Open the relevant log stream and analyze it for errors.
- If the function log stream does not contain any errors, the Lambda function most likely lacks the resources to process the designated SQS messages. To increase the function's resources, perform the following steps:
  - Go to the Configuration tab and click Advanced settings to access the resource settings panel.
  - To increase the worker's compute capacity, pick a larger predefined value from the Memory (MB) dropdown list; to give each invocation more time to finish, raise the value in the Timeout min/sec boxes.
  - Click the Save button in the dashboard top menu to apply the changes.
- If the Lambda worker still cannot resume SQS queue processing after the capacity (memory) update, troubleshoot the function code using the Code tab on the setup page.
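The memory and timeout change described above can be sketched in code as well. The example assumes a boto3-style Lambda client is passed in. The function name `raise_lambda_capacity` is hypothetical, and the bounds it checks (128 to 10,240 MB of memory, up to 900 seconds of timeout) are the Lambda service limits.

```python
def raise_lambda_capacity(lambda_client, function_name, memory_mb, timeout_seconds):
    """Update the memory size and timeout of an SQS consumer function.

    lambda_client is expected to behave like boto3.client("lambda").
    """
    # Validate against the Lambda service limits before calling the API.
    if not 128 <= memory_mb <= 10240:
        raise ValueError("memory_mb must be between 128 and 10240")
    if not 1 <= timeout_seconds <= 900:
        raise ValueError("timeout_seconds must be between 1 and 900")
    # More memory also gives the function a proportionally larger CPU share.
    return lambda_client.update_function_configuration(
        FunctionName=function_name,
        MemorySize=memory_mb,
        Timeout=timeout_seconds,
    )
```

For example, `raise_lambda_capacity(boto3.client("lambda"), "my-sqs-worker", 512, 60)` would request 512 MB and a 60-second timeout. The function name and both values are placeholders.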