- Introduction
- Installing MLPerf serverless platform
- Benchmarking DNN models in MLPerf
- Visualizing the experiments
- SUT Layers (IE, HANDLER)
- SUT Lambda function
- Useful information
This repo is the implementation of the paper "Benchmarking DNN inference performance on serverless environment with MLPerf", deployed to AWS Lambda. It benchmarks the performance of DNN models in Caffe, TensorFlow and OpenVINO formats, using OpenCV and OpenVINO as inference engines. The benchmark follows the MLPerf closed-division metrics and rules for the image classification and object detection computer vision tasks, using the MobileNetV1 and SSDMobileNetV1 models. It has the following prerequisites:
- OpenCV built with OpenVINO as a dependency. A precompiled version for Python 3.6 on Amazon Linux (the OS used by the AWS Lambda runtimes) is available here.
- AWS account
- AWS CLI (command-line tool)
- AWS boto3 library
The MLPerf SUT (system under test) consists of an AWS Lambda function (lambda_test_SUT) and two layers (IE and Handler).
- 1.1. Create the Lambda function by following the instructions in the AWS Lambda documentation.
- 1.2. Paste the code of lambda_function/lambda_test_SUT.py into your Lambda function.
- Go to the IAM manager.
- Create a role choosing the Lambda use case.
Attach this role to the Lambda function (see lambda_console_manager/permissions/execution_role for the required permissions).
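If you prefer to script this step with boto3, a minimal sketch is shown below. The role name is a placeholder and only the basic execution policy is attached; the actual permissions should follow lambda_console_manager/permissions/execution_role.

import json
import boto3

iam = boto3.client("iam")

# Standard trust policy that lets the Lambda service assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# "mlperf-sut-role" is a hypothetical name; choose your own.
iam.create_role(RoleName="mlperf-sut-role",
                AssumeRolePolicyDocument=json.dumps(trust_policy))

# Basic execution permissions; add S3 access according to execution_role.
iam.attach_role_policy(
    RoleName="mlperf-sut-role",
    PolicyArn="arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
)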
In the aws_layer folder there are two layers (IE layer, Handler layer):
- IE layer: a precompiled version of the IE layer (cv2-layer.zip) is available here.
- Handler layer: use the generate_layer.py Python script to generate handler_utils.zip.
Upload both files (cv2-layer.zip, handler_utils.zip) as AWS Lambda layers using the Lambda web interface.
The runtime selected when creating the layers must be Python 3.6.
Important: if you want to upload a modified version of the layers, the zip file folder structure must follow this path: python/lib/python3.6/site-packages/the_package_name
- For the IE layer, the package name is cv2.
- For the Handler layer, the package name is dldt_tools.
For more information about AWS Lambda layers, check the following link.
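As an illustration of that layout, a modified Handler layer could be repackaged with a short script like the one below (a minimal sketch, assuming the dldt_tools sources sit in a local dldt_tools/ directory):

import os
import zipfile

# Required layer layout: python/lib/python3.6/site-packages/<package_name>/
LAYER_PREFIX = "python/lib/python3.6/site-packages/dldt_tools"

with zipfile.ZipFile("handler_utils.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for root, _, files in os.walk("dldt_tools"):
        for name in files:
            src = os.path.join(root, name)
            # Map the local path to the layer-relative path inside the zip.
            dst = os.path.join(LAYER_PREFIX, os.path.relpath(src, "dldt_tools"))
            zf.write(src, dst)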
- 3.1 Create a bucket in your S3 storage service.
- 3.2 Then, create the following folder structure (a boto3 sketch is shown after the folder descriptions).
Folder descriptions:
- coco: it will contain the dataset for the object detection task.
- imagenet: it will contain the dataset for the image classification task.
- completed: the SUT moves the input .json files here from the input folder when the inference is done.
- input: LoadGen uploads the input .json files here from your PC (the event notification must be assigned to this folder, see section 5).
- models: the DNN models are placed here.
- output: the SUT delivers the output .json files here.
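A minimal boto3 sketch of steps 3.1 and 3.2 (the bucket name and region are placeholders; since S3 has no real folders, the "folders" are created as empty prefix keys):

import boto3

s3 = boto3.client("s3")
bucket = "your_bucket"  # must match the bucket name used in the Lambda environment variables

# 3.1: create the bucket (LocationConstraint is required outside us-east-1).
s3.create_bucket(Bucket=bucket,
                 CreateBucketConfiguration={"LocationConstraint": "eu-west-1"})

# 3.2: create the folder prefixes used by LoadGen and the SUT.
for prefix in ["coco/", "imagenet/", "completed/", "input/", "models/", "output/"]:
    s3.put_object(Bucket=bucket, Key=prefix)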
Add the following environment variables in the AWS Lambda control panel:
The bucket name should match the one created in step 3.1.
- 5.1 Create a PUT event notification for the input folder in your AWS S3 bucket properties:
Example:
- 5.2 Create an S3 event in the Lambda control panel, with the associated code of the S3 trigger. Example:
- 5.3 Add an S3 trigger in your bucket with the event type ObjectCreatedByPut and the notification name created in 5.1.
Example:
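If you prefer to configure the notification from code rather than the web console, a hedged boto3 sketch is shown below (the Lambda ARN and the notification Id are placeholders; the console steps above achieve the same result):

import boto3

s3 = boto3.client("s3")

# The Lambda function must already allow S3 to invoke it (lambda add-permission).
s3.put_bucket_notification_configuration(
    Bucket="your_bucket",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [{
            "Id": "input-put-trigger",  # notification name from step 5.1
            "LambdaFunctionArn": "arn:aws:lambda:REGION:ACCOUNT_ID:function:lambda_test_SUT",
            "Events": ["s3:ObjectCreated:Put"],
            "Filter": {"Key": {"FilterRules": [
                {"Name": "prefix", "Value": "input/"},
            ]}},
        }]
    },
)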
Once the folder structure of section 3.2 is created, upload the DNN models to your Amazon S3 bucket under the following paths:
s3://your_bucket/models/mobilenetv1/FP32/
s3://your_bucket/models/ssd-mobilenetv1/FP32/
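For example, with boto3 (a minimal sketch; the local file names are illustrative and must follow the naming convention mentioned below):

import boto3

s3 = boto3.client("s3")
bucket = "your_bucket"

# Upload the OpenVINO IR files for the classification model (illustrative file names).
s3.upload_file("mobilenet_v1.xml", bucket, "models/mobilenetv1/FP32/mobilenet_v1.xml")
s3.upload_file("mobilenet_v1.bin", bucket, "models/mobilenetv1/FP32/mobilenet_v1.bin")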
The models used for the benchmark are:

| Model Name | Vision Task | Engine & Model Type | Download |
|---|---|---|---|
| MobileNetV1 | image classification | IE-IR | weights, model |
| MobileNetV1 | image classification | OCV-CF | weights, model |
| MobileNetV1 | image classification | OCV-TF | weights & model |
| SSDMobileNetV1 | object detection | IE-IR | weights, model |
| SSDMobileNetV1 | object detection | OCV-CF | weights, model |
| SSDMobileNetV1 | object detection | OCV-TF | weights, model |
Notice that the filenames of the DNN models must follow the naming convention.
The reference TensorFlow models are taken from the MLPerf v0.5 release:
- TF MobileNetV1: mobilenet-v1
- SSD MobileNetV1: ssd-mobilenet 300x300
We used the OpenVINO Model Optimizer to convert these models to the OpenVINO IR format.
7.1 Upload the images to Amazon S3 using the AWS CLI (recommended) or the S3 web interface: to s3://your_bucket/coco for object detection and to s3://your_bucket/imagenet for image classification.
For more information about the datasets and how to download them, go to the MLPerf v0.5 page.
For more information about how to upload several files to Amazon S3 with the AWS CLI, check the following link.
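A hedged boto3 alternative for uploading a local dataset directory (the local path is a placeholder; use the coco or imagenet prefix depending on the task):

import os
import boto3

s3 = boto3.client("s3")
bucket = "your_bucket"
local_dir = "coco_val2017"   # local directory with the downloaded images (placeholder)
s3_prefix = "coco"           # "imagenet" for the image classification dataset

# Upload every image, keeping the original file name under the dataset prefix.
for name in os.listdir(local_dir):
    s3.upload_file(os.path.join(local_dir, name), bucket, f"{s3_prefix}/{name}")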
The LoadGen runs on a host machine and sends the queries to the SUT. Execute experiments/src/generate_loadgen_input.py to generate the input .json files needed to run the benchmark. Important: generate_loadgen_input.py requires the locally downloaded image dataset and the image dataset uploaded to S3 to be the same, as explained in step 7.1. Usage:
python3 generate_loadgen_input.py \
--num number_of_image_samples --input_dataset <local path of the images of the dataset> \
--dataset_name coco
Options:
- --num: number of image samples for the experiments.
- --input_dataset: the local path to the images downloaded from the selected dataset [COCO, ImageNet]; check section 7.1 for more information.
- --dataset_name: the dataset name; accepted values are [coco, imagenet].
For example, with --num 1000, a thousand .json files will be created for each experiment. The output .json files follow this format:
{"CompletedPath": "completed/ssd_tf", "ImageFilenames": ["COCO/000000578093.jpg"], "OutputPath": "output/ssd_tf"}
The experiments/src/MLPERF.py Python script manages the whole life cycle of the benchmark and finally reports the benchmarked DNN model performance results.
MLPERF.py output result sample:
**********benchmarked unit***********
benchmark ssd_openvino
memory 768
latencyes 50% -> 291.14699363708496 90% -> 316.87426567077637
QPS: 47.511881528529536
number of files: 1001
Time difference 21.068414211273193 s
Go to the created Lambda function and modify the cv_task and engine_and_format variables in the Lambda function code. cv_task accepts the classification and object_detection configurations, and the engine_and_format options are ie-ir, ocv-cf and ocv-tf (see the paper for details).
Example:
engine_and_format = "ie-ir" # ie-ir, ocv-cf, ocv-tf
cv_task = "classification" # classification, object_detection
backend = choose_engine_and_format(engine_and_format, cv_task)
Also, set the Lambda memory configuration to the desired value for each function instance.
In the paper, four memory configurations are defined: [768, 1536, 2240, 3008] MB.
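The memory setting can also be changed programmatically; a minimal boto3 sketch, assuming the function is named lambda_test_SUT:

import boto3

lambda_client = boto3.client("lambda")

# One of the four memory configurations used in the paper: 768, 1536, 2240 or 3008 MB.
lambda_client.update_function_configuration(
    FunctionName="lambda_test_SUT",
    MemorySize=768,
)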
Example:
python MLPERF.py --engine_and_format ie-ir --cv_task classification --memory_type 0 --bucket_name your_bucket_name --profile=your_profile
Options:
- --engine_and_format: [ie-ir, ocv-cf, ocv-tf]
- --cv_task: [classification, object_detection]
- --memory_type: [0,1,2,3] -> for [768,1536,2240,3008] MB
- --bucket_name: the name of the bucket you created in AWS S3.
- --profile: the AWS CLI profile (access key id and secret key) used for the AWS operations.
If no profile is defined in the AWS CLI, use --profile=default on the command line.
While the benchmarking process is running in AWS Lambda, each function instance delivers a .json file with the performance metrics to s3://your_bucket/output. The JSON structure of these files is:
{
"ImageFilenames": [
"Imagenet/val/ILSVRC2012_val_00003351.JPEG"
],
"inf_perf": [
{
"load_model": 72.51644134521484,
"image_operations": 34.81721878051758,
"forward": 1137.1262073516846
}
],
"result": [], # we skip this because the vector is too big (of size [1,1,100,7])
"start_time": 1590342810.3134592,
"finish_time": 1590342811.5321517
}
So if you upload 1000 .json files to the S3 input folder, 1000 .json result files will be stored. When the benchmark has finished, all the measured metric files are downloaded and the benchmark results are calculated. The MLPERF.py script computes the benchmark from the .json files stored in S3, as explained in the section ENABLING SERVERLESS RUNTIME IN MLPERF of the paper.
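As an illustration of that calculation, the sketch below computes the latency percentiles and QPS from result files downloaded to a local results/ directory (this is only a hedged approximation of what MLPERF.py reports, not the repo's exact implementation):

import glob
import json

latencies, starts, finishes = [], [], []
for path in glob.glob("results/*.json"):
    with open(path) as f:
        data = json.load(f)
    latencies.append(data["inf_perf"][0]["forward"])  # per-query inference time in ms
    starts.append(data["start_time"])
    finishes.append(data["finish_time"])

lat_sorted = sorted(latencies)
print("latencies 50% ->", lat_sorted[int(0.5 * len(lat_sorted))],
      "90% ->", lat_sorted[int(0.9 * len(lat_sorted))])

elapsed = max(finishes) - min(starts)   # wall-clock span of the whole experiment
print("QPS:", len(latencies) / elapsed)
print("number of files:", len(latencies))
print("Time difference", elapsed, "s")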
experiments/src/Latencies_barchart.py and experiments/src/qps_barchart.py generate the bar chart images, taking the results.csv file as input data.
As explained in the paper, there are two AWS layers in the implementation. This diagram shows a detailed overview of the layers:
The Lambda function depends on two AWS layers: the IE layer (cv2-layer.zip) and the Handler layer (handler_utils.zip, which provides the processing_layer module). A Lambda function has two scopes, a global scope and a function scope, for example:
import xxxx  # imports from the layers

# GLOBAL_SCOPE

def lambda_handler(event, context):
    # FUNCTION_SCOPE
The interesting part of the GLOBAL_SCOPE is that all the objects created in this scope are kept alive by the function instance and reused across invocations of the FUNCTION_SCOPE.
In the Lambda function there is a function called choose_engine_and_format() which selects the vision task type along with the inference engine and DNN model format. It loads the DNN model, initializes the dataset handling (ImageNet, COCO) and prepares the inference engine in the GLOBAL_SCOPE.
def choose_engine_and_format(engine_and_format, cv_task):
    if cv_task == "classification":
        models_path = "mobilenetv1/FP32/"
        if engine_and_format == "ie-ir":
            backend = "mobilenet-ov-runtime"
        elif engine_and_format == "ocv-tf":
            backend = "mobilenet-tf-runtime"
        elif engine_and_format == "ocv-cf":
            backend = "mobilenet-caffe-runtime"
    if cv_task == "object_detection":
        models_path = "ssd-mobilenetv1/FP32/"
        if engine_and_format == "ie-ir":
            backend = "ssd-mobilenet-ov-runtime"
        elif engine_and_format == "ocv-tf":
            backend = "ssd-mobilenet-tf-runtime"
        elif engine_and_format == "ocv-cf":
            backend = "ssd-mobilenet-caffe-runtime"
    backend_handler = HandlerApp(engine_and_format)
    backend_handler.init(backend, models_path, make_profiling = True)
    return backend_handler
The Handler layer is imported in the Lambda function in the following way:
from dldt_tools.processing_layer import HandlerApp #for Handler layer
And the Handler layer is initialized with these lines of code:
backend_handler = HandlerApp(engine_and_format)
backend_handler.init(backend, models_path, make_profiling = True)
The DNN inference is processed with the following calls:

init_time = time.time()
h_var = backend.init_handler_variables(event)  # initialize the input data from the dataset stored in the object storage (S3)
backend.make_aws_inference(h_var)              # run the inference and post-processing
finish_time = time.time()
h_var["OutputData"]["start_time"] = init_time
h_var["OutputData"]["finish_time"] = finish_time
backend.deliver_output_data(h_var)             # deliver the benchmarking metrics to the object storage (S3) as a .json file
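Putting these pieces together, here is a hedged sketch of how lambda_test_SUT can be structured, combining the fragments shown above (it assumes choose_engine_and_format() is defined as listed earlier; it is not the verbatim repository code):

import time
from dldt_tools.processing_layer import HandlerApp  # Handler layer

# GLOBAL SCOPE: configure and load the backend once per function instance.
engine_and_format = "ie-ir"   # ie-ir, ocv-cf, ocv-tf
cv_task = "classification"    # classification, object_detection
backend = choose_engine_and_format(engine_and_format, cv_task)

def lambda_handler(event, context):
    # FUNCTION SCOPE: one S3 PUT event (an input .json file) per invocation.
    init_time = time.time()
    h_var = backend.init_handler_variables(event)   # read the query and fetch the image(s) from S3
    backend.make_aws_inference(h_var)                # inference and post-processing
    finish_time = time.time()
    h_var["OutputData"]["start_time"] = init_time
    h_var["OutputData"]["finish_time"] = finish_time
    backend.deliver_output_data(h_var)               # write the metrics .json to the output folder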