Figure out how to performance test the aws-s3 input in SQS mode #76
OOTB experience
The OOTB experience is the performance we get from the Agent without any customizations on the input or the output.

8.11.4
For the Agent 8.11.4, we will set:
Note: the 8.11 default settings come from elastic/elastic-agent#3797
bulk_max_size: 50
worker: 1
queue.mem.events: 4096
queue.mem.flush.min_events: 2048
queue.mem.flush.timeout: 1s
compression: 0
idle_timeout: 60

$ cat logs/elastic-agent-* | jq -r '[.["@timestamp"],.component.id,.monitoring.metrics.filebeat.events.active,.monitoring.metrics.libbeat.pipeline.events.active,.monitoring.metrics.libbeat.output.events.total,.monitoring.metrics.libbeat.output.events.acked,.monitoring.metrics.libbeat.output.events.failed//0] | @tsv' | sort | grep s3
2024-02-04T23:00:22.949Z aws-s3-b3107b00-d538-11ed-bb66-095ca05d09b4 852 852 9800 9750 0
2024-02-04T23:00:52.949Z aws-s3-b3107b00-d538-11ed-bb66-095ca05d09b4 910 910 10850 10850 0
2024-02-04T23:01:22.949Z aws-s3-b3107b00-d538-11ed-bb66-095ca05d09b4 1007 1007 11300 11300 0
2024-02-04T23:01:52.949Z aws-s3-b3107b00-d538-11ed-bb66-095ca05d09b4 861 861 11250 11250 0
2024-02-04T23:02:22.949Z aws-s3-b3107b00-d538-11ed-bb66-095ca05d09b4 1127 1127 11400 11400 0
2024-02-04T23:02:52.949Z aws-s3-b3107b00-d538-11ed-bb66-095ca05d09b4 996 996 11350 11350 0
2024-02-04T23:03:22.949Z aws-s3-b3107b00-d538-11ed-bb66-095ca05d09b4 1013 1013 11150 11150 0
2024-02-04T23:03:52.949Z aws-s3-b3107b00-d538-11ed-bb66-095ca05d09b4 785 785 11250 11250 0
2024-02-04T23:04:22.949Z aws-s3-b3107b00-d538-11ed-bb66-095ca05d09b4 1069 1069 10850 10850 0
2024-02-04T23:04:52.949Z aws-s3-b3107b00-d538-11ed-bb66-095ca05d09b4 956 956 11350 11350 0

8.12.0
For the Agent 8.12.0, we will set:
bulk_max_size: 1600
worker: 1
queue.mem.events: 3200
queue.mem.flush.min_events: 1600
queue.mem.flush.timeout: 10s
compression: 1
idle_timeout: 3

2024-02-04T23:20:01.321Z aws-s3-b3107b00-d538-11ed-bb66-095ca05d09b4 1254 1254 2225 2225 0
2024-02-04T23:20:31.322Z aws-s3-b3107b00-d538-11ed-bb66-095ca05d09b4 1053 1053 3465 3465 0
2024-02-04T23:21:01.322Z aws-s3-b3107b00-d538-11ed-bb66-095ca05d09b4 836 836 2014 2014 0
2024-02-04T23:21:31.321Z aws-s3-b3107b00-d538-11ed-bb66-095ca05d09b4 1130 1130 3232 3232 0
2024-02-04T23:22:01.321Z aws-s3-b3107b00-d538-11ed-bb66-095ca05d09b4 1122 1122 2418 2418 0
2024-02-04T23:22:31.322Z aws-s3-b3107b00-d538-11ed-bb66-095ca05d09b4 1060 1060 3190 3190 0
2024-02-04T23:23:01.322Z aws-s3-b3107b00-d538-11ed-bb66-095ca05d09b4 1305 1305 2004 2004 0
2024-02-04T23:23:31.322Z aws-s3-b3107b00-d538-11ed-bb66-095ca05d09b4 1270 1270 3604 3604 0
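Assuming the acked values in the two tables above are deltas for each 30 s reporting period rather than cumulative counters (they hover around a steady level, which points to deltas), a quick back-of-the-envelope throughput comparison of the two OOTB runs:

```python
# Rough events-per-second comparison of the two OOTB runs, built from the
# libbeat.output.events.acked column above; assumes each value covers 30 s.
acked_8_11_4 = [9750, 10850, 11300, 11250, 11400, 11350, 11150, 11250, 10850, 11350]
acked_8_12_0 = [2225, 3465, 2014, 3232, 2418, 3190, 2004, 3604]

for label, acked in (("8.11.4", acked_8_11_4), ("8.12.0", acked_8_12_0)):
    eps = sum(acked) / (30 * len(acked))
    print(f"{label}: ~{eps:.0f} events/sec")  # ~368 and ~92 events/sec
```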
The impact of the cache credentials bug
In progress.
Tweak the input and output settings to improve performance
Output settings:
bulk_max_size: 2048
worker: 8
queue.mem.events: 32768
queue.mem.flush.min_events: 2048
queue.mem.flush.timeout: 1s
idle_connection_timeout: 60s
compression_level: 1

Variants:
Observations:
The hypothesis is that S3 objects with only one event significantly impact the input performance. Let's test and compare the performance of two batches of S3 objects made only of big or only of small objects. I'm extracting two datasets from the Cloudwatch dataset:
And I'm loading them into ES using two SQS queues and data streams. Query the entire dataset we previously ingested:
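The query itself isn't reproduced above; the following is a minimal sketch of the kind of aggregation that would return the per-object doc counts, where the endpoint, credentials, index pattern, and field name are all assumptions rather than values from the issue:

```python
# Hypothetical sketch: collect per-object doc counts from the previously
# ingested data so the object keys can be split into "bigfiles"/"smallfiles"
# batches. The index pattern, field name, endpoint, and credentials are
# assumptions, not taken from the issue.
import csv
import requests

ES_URL = "https://localhost:9200"       # assumption
INDEX = "logs-aws.cloudtrail-*"         # assumption
FIELD = "aws.s3.object.key"             # assumption
AUTH = ("elastic", "changeme")          # assumption
PAGE = 10_000

counts = {}
after = None
while True:
    composite = {"size": PAGE, "sources": [{"object_key": {"terms": {"field": FIELD}}}]}
    if after:
        composite["after"] = after
    body = {"size": 0, "aggs": {"objects": {"composite": composite}}}
    resp = requests.post(f"{ES_URL}/{INDEX}/_search", json=body,
                         auth=AUTH, verify=False, timeout=120)
    resp.raise_for_status()
    agg = resp.json()["aggregations"]["objects"]
    for bucket in agg["buckets"]:
        counts[bucket["key"]["object_key"]] = bucket["doc_count"]
    after = agg.get("after_key")
    if not agg["buckets"] or after is None:
        break

# Order by doc count DESC and save to CSV for later replay into SQS.
with open("object_counts.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for key, doc_count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        writer.writerow([key, doc_count])
```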
I saved the two datasets in two CSV files:
I loaded the content of these two files into two SQS queues using the script:
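The loading script isn't reproduced above either. A minimal sketch of how the object keys from one of the CSV files could be replayed into an SQS queue as S3 event notifications (the message format the aws-s3 input consumes in SQS mode) might look like this; the queue URL, bucket name, region, and CSV layout are all assumptions:

```python
# Hypothetical sketch: replay S3 object keys from a CSV into an SQS queue as
# "ObjectCreated" notifications so the aws-s3 input re-processes them.
# Queue URL, bucket name, region, and CSV layout are assumptions.
import csv
import json
import boto3

QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/bigfiles-queue"  # assumption
BUCKET = "cloudtrail-perf-test"                                                # assumption
REGION = "eu-west-1"                                                           # assumption

sqs = boto3.client("sqs", region_name=REGION)

def notification(key: str) -> str:
    """Build an S3 event notification body for a single object key."""
    return json.dumps({
        "Records": [{
            "eventVersion": "2.1",
            "eventSource": "aws:s3",
            "awsRegion": REGION,
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": BUCKET, "arn": f"arn:aws:s3:::{BUCKET}"},
                "object": {"key": key},
            },
        }]
    })

with open("bigfiles.csv", newline="") as f:   # hypothetical CSV: one key per row
    keys = [row[0] for row in csv.reader(f)]

# SQS allows at most 10 messages per SendMessageBatch call.
for i in range(0, len(keys), 10):
    entries = [{"Id": str(n), "MessageBody": notification(key)}
               for n, key in enumerate(keys[i:i + 10])]
    sqs.send_message_batch(QueueUrl=QUEUE_URL, Entries=entries)
```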
At ~60% of the "bigfiles" batch, the performance starts declining. The declining trend may be due to the ordering of the S3 objects in the batch: the query in ES returned a list of S3 objects ordered by doc count, and the batch contains 150k object keys with doc counts from 10102 down to 101, ordered by doc count DESC.
I am trying to adjust the output settings to get better performance from the input, by reducing queue.mem.flush.min_events:
bulk_max_size: 2048
worker: 8
queue.mem.events: 32768
queue.mem.flush.min_events: 1
queue.mem.flush.timeout: 1s
idle_connection_timeout: 60s
compression_level: 1
Do you observe the mean
There are lots of interesting notes in https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html
This suggests to me that ~200 ms is possibly as good as we will observe when getting a small object. It also suggests that the way around this is to greatly increase concurrency to trigger the auto-scaler. If we take the maximum for a single GET as 200 ms, that is 5 requests/sec for a single client. At the suggested maximum of 5500 requests/sec within a single prefix, that would require us to use 5500 / 5 = 1100 concurrent requests. So perhaps making 1000s of concurrent requests is exactly what we should be doing.
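As a rough illustration of that arithmetic (a standalone client-side experiment, not how the aws-s3 input itself schedules requests), a test that issues ~1100 concurrent GETs could look like the sketch below; the bucket, prefix, and object count are assumptions:

```python
# Hypothetical experiment: measure small-object GET throughput at high
# concurrency to see whether S3 auto-scaling kicks in. Bucket, prefix, and
# object count are assumptions.
import time
from concurrent.futures import ThreadPoolExecutor

import boto3
from botocore.config import Config

BUCKET = "cloudtrail-perf-test"   # assumption
PREFIX = "AWSLogs/"               # assumption
CONCURRENCY = 1100                # 5500 req/s / 5 req/s per client, from the math above

s3 = boto3.client("s3", config=Config(max_pool_connections=CONCURRENCY))

keys = []
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    keys.extend(obj["Key"] for obj in page.get("Contents", []))
    if len(keys) >= 50_000:       # enough objects for a few minutes of load
        break

def fetch(key: str) -> int:
    """GET one object and return its size in bytes."""
    return len(s3.get_object(Bucket=BUCKET, Key=key)["Body"].read())

start = time.monotonic()
with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    total_bytes = sum(pool.map(fetch, keys))
elapsed = time.monotonic() - start
print(f"{len(keys)} objects, {total_bytes} bytes in {elapsed:.1f}s "
      f"({len(keys) / elapsed:.0f} GET/s)")
```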
I want to run performance tests on the aws-s3 input using the Elastic Agent 8.11.4 and 8.12.0 to ingest Cloudtrail logs. The test will run on the following EC2 instance: c7g.2xlarge.
The goals are:
- evaluate the impact of the role_arn option.
Authentication
We will use an EC2 instance profile only, so there will be no authentication-related options in the integration settings.
Dataset
As a test dataset, I will use an S3 bucket containing 1.2M objects of Cloudtrail logs. Each object is a file with 1-n Cloudtrail events compressed with gzip.
I will use a script to download the S3 objects once and load them multiple times as needed, as sketched below.
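A minimal sketch of the download-once step, assuming a hypothetical bucket, prefix, and local destination directory (the replay into SQS would reuse the notification approach sketched earlier):

```python
# Hypothetical sketch: download the Cloudtrail objects once to local disk so
# later runs can be replayed without re-fetching the dataset from S3.
# Bucket, prefix, and destination directory are assumptions.
from pathlib import Path
import boto3

BUCKET = "cloudtrail-perf-test"   # assumption
PREFIX = "AWSLogs/"               # assumption
DEST = Path("./dataset")          # assumption

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        target = DEST / obj["Key"]
        if target.exists():       # already downloaded on a previous run
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        s3.download_file(BUCKET, obj["Key"], str(target))
```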
Test process