- Introduction
- Installing MLPerf serverless platform
- Benchmarking DNN models in MLPerf
- Visualizing the experiments
- SUT Layers (IE, HANDLER)
- SUT Lambda function
- Useful information
This repo is the implementation of the paper "Benchmarking DNN inference performance on serverless environment with MLPerf", deployed to AWS Lambda. It benchmarks the performance of DNN models in Caffe, TensorFlow and OpenVINO formats, using OpenCV and OpenVINO as inference engines. The benchmark follows the MLPerf closed-division metrics and rules for the image classification and object detection computer vision tasks, using the MobileNetV1 and SSDMobileNetV1 models. It has the following prerequisites:
- OpenCV built with OpenVINO as a dependency. A precompiled version for Python 3.6 on Amazon Linux (the OS used by the AWS Lambda runtimes) is available here.
- AWS account
- AWS CLI (command-line tool)
- AWS boto3 library
The MLPerf SUT (system under test) consists of an AWS Lambda function (lambda_test_SUT) and two layers (IE and Handler).
- 1.1. Create the Lambda function by following the instructions in the AWS Lambda documentation.
- 1.2. Paste the code of lambda_function/lambda_test_SUT.py into your Lambda function.
- Go to the IAM manager.
- Create a role choosing the Lambda use case.
Attach this role to the Lambda function (see lambda_console_manager/permissions/execution_role for the required permissions).
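If you prefer to script this step with boto3, a minimal sketch is shown below. The role name is a placeholder and only the basic execution policy is attached; the actual permissions should follow lambda_console_manager/permissions/execution_role.

import json
import boto3

iam = boto3.client("iam")

# Standard trust policy that lets the Lambda service assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# "mlperf-sut-role" is a hypothetical name; choose your own.
iam.create_role(RoleName="mlperf-sut-role",
                AssumeRolePolicyDocument=json.dumps(trust_policy))

# Basic execution permissions; add S3 access according to execution_role.
iam.attach_role_policy(
    RoleName="mlperf-sut-role",
    PolicyArn="arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
)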
In the aws_layer folder there are two layers (IE layer, Handler layer):
- IE layer: a precompiled version of the IE layer (cv2-layer.zip) is available here.
- Handler layer: use the generate_layer.py Python script to generate handler_utils.zip.
Upload both files (cv2-layer.zip, handler_utils.zip) as AWS Lambda layers using the Lambda web interface.
The runtime selected when creating the layers must be Python 3.6.
Important: if you want to upload a modified version of the layers, the zip file folder structure must follow this path: python/lib/python3.6/site-packages/the_package_name
- For the IE layer, the package name is cv2.
- For the Handler layer, the package name is dldt_tools.
For more information about AWS Lambda layers, check the following link.
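As an illustration of that layout, a modified Handler layer could be repackaged with a short script like the one below (a minimal sketch, assuming the dldt_tools sources sit in a local dldt_tools/ directory):

import os
import zipfile

# Required layer layout: python/lib/python3.6/site-packages/<package_name>/
LAYER_PREFIX = "python/lib/python3.6/site-packages/dldt_tools"

with zipfile.ZipFile("handler_utils.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for root, _, files in os.walk("dldt_tools"):
        for name in files:
            src = os.path.join(root, name)
            # Map the local path to the layer-relative path inside the zip.
            dst = os.path.join(LAYER_PREFIX, os.path.relpath(src, "dldt_tools"))
            zf.write(src, dst)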
- 3.1 Create a bucket in your S3 storage service.
- 3.2 Then, create the following folder structure (a boto3 sketch is shown after the folder descriptions).
Folder descriptions:
- coco: it will contain the dataset for the object detection task.
- imagenet: it will contain the dataset for the image classification task.
- completed: the SUT moves the input .json files here from the input folder when the inference is done.
- input: LoadGen uploads the input .json files here from your PC (the event notification must be assigned to this folder, see section 5).
- models: the DNN models are placed here.
- output: the SUT delivers the output .json files here.
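A minimal boto3 sketch of steps 3.1 and 3.2 (the bucket name and region are placeholders; since S3 has no real folders, the "folders" are created as empty prefix keys):

import boto3

s3 = boto3.client("s3")
bucket = "your_bucket"  # must match the bucket name used in the Lambda environment variables

# 3.1: create the bucket (LocationConstraint is required outside us-east-1).
s3.create_bucket(Bucket=bucket,
                 CreateBucketConfiguration={"LocationConstraint": "eu-west-1"})

# 3.2: create the folder prefixes used by LoadGen and the SUT.
for prefix in ["coco/", "imagenet/", "completed/", "input/", "models/", "output/"]:
    s3.put_object(Bucket=bucket, Key=prefix)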
Add the following environment variables in the AWS Lambda control panel:
The bucket name should match the one created in step 3.1.
- 5.1 Create a PUT event notification for the input folder in your AWS S3 bucket properties:
Example:
- 5.2 Create an S3 event in the Lambda control panel, with the associated code of the S3 trigger. Example:
- 5.3 Add an S3 trigger in your bucket with the event type ObjectCreatedByPut and the notification name created in 5.1.
Example:
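If you prefer to configure the notification from code rather than the web console, a hedged boto3 sketch is shown below (the Lambda ARN and the notification Id are placeholders; the console steps above achieve the same result):

import boto3

s3 = boto3.client("s3")

# The Lambda function must already allow S3 to invoke it (lambda add-permission).
s3.put_bucket_notification_configuration(
    Bucket="your_bucket",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [{
            "Id": "input-put-trigger",  # notification name from step 5.1
            "LambdaFunctionArn": "arn:aws:lambda:REGION:ACCOUNT_ID:function:lambda_test_SUT",
            "Events": ["s3:ObjectCreated:Put"],
            "Filter": {"Key": {"FilterRules": [
                {"Name": "prefix", "Value": "input/"},
            ]}},
        }]
    },
)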
Once the folder structure of section 3.2 is created, upload the DNN models to your Amazon S3 bucket under the following paths:
s3://your_bucket/models/mobilenetv1/FP32/
s3://your_bucket/models/ssd-mobilenetv1/FP32/
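For example, with boto3 (a minimal sketch; the local file names are illustrative and must follow the naming convention mentioned below):

import boto3

s3 = boto3.client("s3")
bucket = "your_bucket"

# Upload the OpenVINO IR files for the classification model (illustrative file names).
s3.upload_file("mobilenet_v1.xml", bucket, "models/mobilenetv1/FP32/mobilenet_v1.xml")
s3.upload_file("mobilenet_v1.bin", bucket, "models/mobilenetv1/FP32/mobilenet_v1.bin")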
The models used for the benchmark are:

| Model Name | Vision Task | Engine & Model Type | Download |
|---|---|---|---|
| MobileNetV1 | image classification | IE-IR | weights, model |
| MobileNetV1 | image classification | OCV-CF | weights, model |
| MobileNetV1 | image classification | OCV-TF | weights & model |
| SSDMobileNetV1 | object detection | IE-IR | weights, model |
| SSDMobileNetV1 | object detection | OCV-CF | weights, model |
| SSDMobileNetV1 | object detection | OCV-TF | weights, model |
Notice that the filenames of the DNN models must follow the naming convention.
The reference TensorFlow models are taken from the MLPerf v0.5 release:
- TF MobileNetV1: mobilenet-v1
- SSD MobileNetV1: ssd-mobilenet 300x300
We used the OpenVINO Model Optimizer to convert these models to the OpenVINO IR format.
7.1 Upload the images to Amazon S3 using the AWS CLI (recommended) or the S3 web interface: to s3://your_bucket/coco for object detection and to s3://your_bucket/imagenet for image classification.
For more information about the datasets and how to download them, go to the MLPerf v0.5 page.
For more information about how to upload several files to Amazon S3 with the AWS CLI, check the following link.
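A hedged boto3 alternative for uploading a local dataset directory (the local path is a placeholder; use the coco or imagenet prefix depending on the task):

import os
import boto3

s3 = boto3.client("s3")
bucket = "your_bucket"
local_dir = "coco_val2017"   # local directory with the downloaded images (placeholder)
s3_prefix = "coco"           # "imagenet" for the image classification dataset

# Upload every image, keeping the original file name under the dataset prefix.
for name in os.listdir(local_dir):
    s3.upload_file(os.path.join(local_dir, name), bucket, f"{s3_prefix}/{name}")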
The LoadGen runs on a host machine and sends the queries to the SUT. Execute experiments/src/generate_loadgen_input.py to generate the input .json files needed to run the benchmark. Important: generate_loadgen_input.py requires the locally downloaded image dataset and the image dataset uploaded to S3 to be the same, as explained in step 7.1. Usage:
python3 generate_loadgen_input.py \
--num number_of_image_samples --input_dataset <local path of the images of the dataset> \
--dataset_name coco
Options:
- --num: number of image samples for the experiments.
- --input_dataset: the local path to the images downloaded from the selected dataset [COCO, ImageNet]; check section 7.1 for more information.
- --dataset_name: the dataset name; accepted values are [coco, imagenet].
For example, with --num 1000, a thousand .json files will be created for each experiment. The output .json files follow this format:
{"CompletedPath": "completed/ssd_tf", "ImageFilenames": ["COCO/000000578093.jpg"], "OutputPath": "output/ssd_tf"}
The experiments/src/MLPERF.py Python script manages the whole life cycle of the benchmark and finally reports the benchmarked DNN model performance results.
MLPERF.py output result sample:
**********benchmarked unit***********
benchmark ssd_openvino
memory 768
latencyes 50% -> 291.14699363708496 90% -> 316.87426567077637
QPS: 47.511881528529536
number of files: 1001
Time difference 21.068414211273193 s
Go to the created Lambda function and modify the cv_task and engine_and_format variables in the Lambda function code. cv_task accepts the classification and object_detection configurations, and the engine_and_format options are ie-ir, ocv-cf and ocv-tf (see the paper for details).
Example:
engine_and_format = "ie-ir" # ie-ir, ocv-cf, ocv-tf
cv_task = "classification" # classification, object_detection
backend = choose_engine_and_format(engine_and_format, cv_task)
Also, set the Lambda memory configuration to the desired value for each function instance.
In the paper, four memory configurations are defined: [768, 1536, 2240, 3008] MB.
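The memory setting can also be changed programmatically; a minimal boto3 sketch, assuming the function is named lambda_test_SUT:

import boto3

lambda_client = boto3.client("lambda")

# One of the four memory configurations used in the paper: 768, 1536, 2240 or 3008 MB.
lambda_client.update_function_configuration(
    FunctionName="lambda_test_SUT",
    MemorySize=768,
)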
Example:
python MLPERF.py --engine_and_format ie-ir --cv_task classification --memory_type 0 --bucket_name your_bucket_name --profile=your_profile
Options:
- --engine_and_format: [ie-ir, ocv-cf, ocv-tf]
- --cv_task: [classification, object_detection]
- --memory_type: [0,1,2,3] -> for [768,1536,2240,3008] MB
- --bucket_name: the name of the bucket you created in AWS S3.
- --profile: the AWS CLI profile (access key id and secret key) used for the AWS operations.
If no profile is defined in the AWS CLI, use --profile=default on the command line.
While the benchmarking process is running in AWS Lambda, each function instance delivers a .json file with the performance metrics to s3://your_bucket/output. The JSON structure of these files is:
{
"ImageFilenames": [
"Imagenet/val/ILSVRC2012_val_00003351.JPEG"
],
"inf_perf": [
{
"load_model": 72.51644134521484,
"image_operations": 34.81721878051758,
"forward": 1137.1262073516846
}
],
"result": [], # we skip this because the vector is too big (of size [1,1,100,7])
"start_time": 1590342810.3134592,
"finish_time": 1590342811.5321517
}
So if you upload 1000 .json files to the S3 input folder, 1000 .json result files will be stored. When the benchmark has finished, all the measured metric files are downloaded and the benchmark results are calculated. The MLPERF.py script computes the benchmark from the .json files stored in S3, as explained in the section ENABLING SERVERLESS RUNTIME IN MLPERF of the paper.
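As an illustration of that calculation, the sketch below computes the latency percentiles and QPS from result files downloaded to a local results/ directory (this is only a hedged approximation of what MLPERF.py reports, not the repo's exact implementation):

import glob
import json

latencies, starts, finishes = [], [], []
for path in glob.glob("results/*.json"):
    with open(path) as f:
        data = json.load(f)
    latencies.append(data["inf_perf"][0]["forward"])  # per-query inference time in ms
    starts.append(data["start_time"])
    finishes.append(data["finish_time"])

lat_sorted = sorted(latencies)
print("latencies 50% ->", lat_sorted[int(0.5 * len(lat_sorted))],
      "90% ->", lat_sorted[int(0.9 * len(lat_sorted))])

elapsed = max(finishes) - min(starts)   # wall-clock span of the whole experiment
print("QPS:", len(latencies) / elapsed)
print("number of files:", len(latencies))
print("Time difference", elapsed, "s")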
experiments/src/Latencies_barchart.py and experiments/src/qps_barchart.py generate the bar chart images, taking the results.csv file as input data.
As explained in the paper, there are two AWS layers in the implementation. This diagram shows a detailed overview of the layers:
The Lambda function depends on two AWS layers: the IE layer (cv2-layer.zip) and the Handler layer (handler_utils.zip, which provides the processing_layer module). A Lambda function has two scopes, a global scope and a function scope, for example:
import xxxx  # imports from the layers

# GLOBAL_SCOPE

def lambda_handler(event, context):
    # FUNCTION_SCOPE
The interesting part of the GLOBAL_SCOPE is that all the objects created in this scope are kept alive by the function instance and reused across invocations of the FUNCTION_SCOPE.
In the Lambda function there is a function called choose_engine_and_format() which selects the vision task type along with the inference engine and DNN model format. It loads the DNN model, initializes the dataset handling (ImageNet, COCO) and prepares the inference engine in the GLOBAL_SCOPE.
def choose_engine_and_format(engine_and_format, cv_task):
    if cv_task == "classification":
        models_path = "mobilenetv1/FP32/"
        if engine_and_format == "ie-ir":
            backend = "mobilenet-ov-runtime"
        elif engine_and_format == "ocv-tf":
            backend = "mobilenet-tf-runtime"
        elif engine_and_format == "ocv-cf":
            backend = "mobilenet-caffe-runtime"
    if cv_task == "object_detection":
        models_path = "ssd-mobilenetv1/FP32/"
        if engine_and_format == "ie-ir":
            backend = "ssd-mobilenet-ov-runtime"
        elif engine_and_format == "ocv-tf":
            backend = "ssd-mobilenet-tf-runtime"
        elif engine_and_format == "ocv-cf":
            backend = "ssd-mobilenet-caffe-runtime"
    backend_handler = HandlerApp(engine_and_format)
    backend_handler.init(backend, models_path, make_profiling = True)
    return backend_handler
The Handler layer is imported in the Lambda function in the following way:
from dldt_tools.processing_layer import HandlerApp #for Handler layer
And the Handler layer is initialized with these lines of code:
backend_handler = HandlerApp(engine_and_format)
backend_handler.init(backend, models_path, make_profiling = True)
The DNN inference is processed with the following calls:

init_time = time.time()
h_var = backend.init_handler_variables(event)  # initialize the input data from the dataset stored in the object storage (S3)
backend.make_aws_inference(h_var)              # run the inference and post-processing
finish_time = time.time()
h_var["OutputData"]["start_time"] = init_time
h_var["OutputData"]["finish_time"] = finish_time
backend.deliver_output_data(h_var)             # deliver the benchmarking metrics to the object storage (S3) as a .json file
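Putting these pieces together, here is a hedged sketch of how lambda_test_SUT can be structured, combining the fragments shown above (it assumes choose_engine_and_format() is defined as listed earlier; it is not the verbatim repository code):

import time
from dldt_tools.processing_layer import HandlerApp  # Handler layer

# GLOBAL SCOPE: configure and load the backend once per function instance.
engine_and_format = "ie-ir"   # ie-ir, ocv-cf, ocv-tf
cv_task = "classification"    # classification, object_detection
backend = choose_engine_and_format(engine_and_format, cv_task)

def lambda_handler(event, context):
    # FUNCTION SCOPE: one S3 PUT event (an input .json file) per invocation.
    init_time = time.time()
    h_var = backend.init_handler_variables(event)   # read the query and fetch the image(s) from S3
    backend.make_aws_inference(h_var)                # inference and post-processing
    finish_time = time.time()
    h_var["OutputData"]["start_time"] = init_time
    h_var["OutputData"]["finish_time"] = finish_time
    backend.deliver_output_data(h_var)               # write the metrics .json to the output folder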