Update optimize-counter-initialization.md
stephenxs authored Dec 5, 2024
1 parent f667df9 commit 72f3e4b
Showing 1 changed file with 21 additions and 21 deletions.
42 changes: 21 additions & 21 deletions doc/flex_counter/optimize-counter-initialization.md
@@ -19,17 +19,17 @@ Runtime counter performance optimization is out of scope.
| Name | Meaning |
|:----:|:-------:|
| PG | Represents priority group, which is the ingress queue on a port |
-| single counter-poll API | Represents the SAI API calls to fetch many counters of a single SAI object. |
+| single counter-polling API | Represents the SAI API calls to fetch many counters of a single SAI object. |
|| They have the name convention of `get_<object_type>_stats` or `get_<object_type>_stats_ext`. |
|| Like `get_port_stats`, `get_port_stats_ext`, `get_queue_stats_ext`, etc. |
-| bulk counter-poll API | Represents `sai_bulk_object_get_stats`, the SAI API call to fetch a uniform set of counters of many SAI objects in one-shot. |
+| bulk counter-polling API | Represents `sai_bulk_object_get_stats`, the SAI API call to fetch a uniform set of counters of many SAI objects in one-shot. |
| counter group | The way that counters are managed in SONiC. The counters are divided into different groups by their motivation. |
| flex counter | The component that manages counter polling in SONiC. |
| sub orchagents | Represents different components in `sonic-swss/orchagent` folder. Each of them handles certain tables, like portsorch, bufferorch, etc. |

### Overview

-Counter is an important way to monitor a system's performance and trouble shooting. Flex Counter is responsible for managing counters in SONiC. The counters are polled, represented on a per group basis. The groups are organized based on the counters' motivation and providers. In SONiC, counters are divided into counter groups. The number of groups varies among different systems. The following groups exist on all systems:
+Counter is an important way to monitor a system's performance and troubleshooting. Flex Counter is responsible for managing counters in SONiC. The counters are polled and represented on a per-group basis. The groups are organized based on the counters' motivation and providers. In SONiC, counters are divided into counter groups. The number of groups varies among different systems. The following groups exist on all systems:

1. Port counter group
2. Port drop counter group
@@ -38,28 +38,28 @@ Counter is an important way to monitor a system's performance and trouble shooti
5. Watermark counter group for buffer pools, PGs and queues
6. Router interface counter group

-The flex counter handles counter on a per group basis in the following ways:
+The flex counter handles counter on a per-group basis in the following ways:

- To initialize counters during system initialization
- To poll counters periodically during system running

-Originally, the counters were polled using single counter-poll API, which means only one object's counters could be polled by a SAI API call. However, it is time consuming to poll the counters because there are a large number of counters on the SONiC system, which introduces performance issues.
+Originally, the counters were polled using single counter-polling API, which means only one object's counters could be polled by a SAI API call. However, it is time-consuming to poll the counters because there are a large number of counters on the SONiC system, which introduces performance issues.

-The bulk counter poll API has been introduced to improve the performance of counter polling, enabling a set of objects' counters to be polled in one SAI API call. However, all the objects' counters do not support to be polled in bulk mode. Eg. bulk counter polling can be supported on unicast queues but not multicast queues on some platforms. It requires checking whether bulk API is supported on each object during initialization and grouping all the bulk-counter-supporting objects together and then polling their counters in bulk mode. We still use single counter-poll API on the objects whose counters can not be polled in bulk mode.
+The bulk counter-polling API has been introduced to improve the performance of counter-polling, enabling a set of objects' counters to be polled in one SAI API call. However, all the objects' counters do not support to be polled in bulk mode. Eg. bulk counter polling can be supported on unicast queues but not multicast queues on some platforms. It requires checking whether bulk API is supported on each object during initialization, grouping all the bulk-counter-supporting objects together, and then polling their counters in bulk mode. We still use a single counter-polling API on the objects whose counters can not be polled in bulk mode.

-However, it is time-consuming to check whether bulk counter poll API is supported on each object during initialization, especially on systems with a large number of ports. This is because:
+However, it is time-consuming to check whether bulk counter-polling API is supported on each object during initialization, especially on systems with a large number of ports. This is because:

1. The number of objects increases significantly when the number of ports increases.

Many counters are attached to port, queue, PG objects which are directly proportional to the number of ports.

-By default, there are 3 PGs and 8 queues per port. So, for each port there are 12 objects relevant
+By default, there are 3 PGs and 8 queues per port. So, for each port, there are 12 objects relevant

There can be several counter groups attached to each object.

-2. Flex counter needs to check whether bulk operation is supported by calling SAI bulk counter polling API on a per object per counter group basis.
+2. Flex counter needs to check whether the bulk operation is supported by calling SAI bulk counter polling API per object per counter group basis.

-In this design, we improve the counter initialization by checking whether bulk operation is supported on a group of objects altogether. By doing so, the number of SAI API calls is significantly reduced and the time is shorten.
+In this design, we improve the counter initialization by checking whether bulk operation is supported on a group of objects altogether. By doing so, the number of SAI API calls is significantly reduced and the time is shortened.
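
As a rough illustration of why per-object checks scale poorly, the sketch below counts capability checks using the defaults quoted above (3 PGs and 8 queues per port, 12 objects per port in total). The port count and the number of counter groups per object are hypothetical inputs, and the function names are invented for this example; none of them come from SONiC.

```python
def objects_per_port(pgs_per_port: int = 3, queues_per_port: int = 8) -> int:
    """Return the port itself plus its PGs and queues."""
    return 1 + pgs_per_port + queues_per_port


def per_object_checks(num_ports: int, groups_per_object: int) -> int:
    """Count capability checks when every object is probed once per counter group."""
    return num_ports * objects_per_port() * groups_per_object


if __name__ == "__main__":
    # Hypothetical system: 256 ports, 2 counter groups attached to each object.
    print(objects_per_port())          # 12
    print(per_object_checks(256, 2))   # 6144
```

The count grows linearly with the number of ports, which is the cost this design aims to cut by checking a group of objects together.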

#### Challenge

@@ -82,11 +82,11 @@ The current architecture is not changed in this design.

#### Current approach

-The current flow to initialize a counter group is as following:
+The current flow to initialize a counter group is as follows:

![current-initialization-flow](optimize-counter-initialization-files/counter-initialization-current-approach.png).

-1. Orchagent starts to initialize counters for each object, like ports, PGs, queues, etc. For each object, orchagent notifies the flex counter manager and then flex counter manager notifies SAIRedis to start counter polling on a set of counter IDs.
+1. Orchagent starts to initialize counters for each object, like ports, PGs, queues, etc. For each object, orchagent notifies the flex counter manager and then the flex counter manager notifies SAIRedis to start counter polling on a set of counter IDs.
2. SAIRedis needs to call `sai_bulk_object_get_stats` for each object with counter IDs as the arguments

1. SAIRedis adds the object to `bulk-supporting object set` if the call succeeds.
@@ -95,13 +95,13 @@ The current flow to initialize a counter group is as following:
In the normal runtime, SAIRedis periodically executes the following for each counter group:

1. Calls `sai_bulk_object_get_stats` for all objects in `bulk-supporting object set` once to get statistics in bulk mode
-2. For each object in `bulk-unsupporting object set`, calls `sai_<object_type>_get_stats(object ID, counter IDs)` to get statistics in single mode
+2. For each object in `bulk-unsupporting object set`, call `sai_<object_type>_get_stats(object ID, counter IDs)` to get statistics in single mode

To summarize, SAIRedis needs to call `sai_bulk_object_get_stats` once for each object during initialization, which slows down the initialization process.
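
The per-object classification summarized above can be sketched as follows. This is a simplified model, not SAIRedis code: `classify_objects`, `fake_bulk_get_stats`, and the object names are hypothetical, and the callable merely stands in for a `sai_bulk_object_get_stats` call that either succeeds or fails.

```python
from typing import Callable, Iterable, List, Set, Tuple


def classify_objects(
    object_ids: Iterable[str],
    counter_ids: List[str],
    bulk_get_stats: Callable[[List[str], List[str]], bool],
) -> Tuple[Set[str], Set[str], int]:
    """Probe each object on its own and sort it into one of the two sets."""
    supporting: Set[str] = set()
    unsupporting: Set[str] = set()
    calls = 0
    for oid in object_ids:
        calls += 1  # one bulk call per object: the cost being optimized
        if bulk_get_stats([oid], counter_ids):
            supporting.add(oid)
        else:
            unsupporting.add(oid)
    return supporting, unsupporting, calls


def fake_bulk_get_stats(oids: List[str], counter_ids: List[str]) -> bool:
    """Hypothetical platform on which multicast queues reject bulk polling."""
    return not any(oid.startswith("mc_queue") for oid in oids)


if __name__ == "__main__":
    objs = ["port0", "uc_queue0", "mc_queue0"]
    print(classify_objects(objs, ["COUNTER_A"], fake_bulk_get_stats))
```

The third element of the returned tuple equals the number of objects, which matches the statement that initialization costs one bulk call per object.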

#### Optimized approach

-The optimized flow to initialize a counter group is as following:
+The optimized flow to initialize a counter group is as follows:

![optimized-initialization-flow](optimize-counter-initialization-files/counter-initialization-optimized-approach.png).

@@ -123,20 +123,20 @@ In the optimized flow, the number of SAIRedis calling `sai_bulk_object_get_stats

###### Flex counter manager

-The Flex counter manager is a class introduced to manage a flex counter group. It is defined in file https://github.com/sonic-net/sonic-swss/blob/master/orchagent/flex_counter/flex_counter_manager.cpp
+The Flex counter manager is a class introduced to manage a flex counter group. It is defined in the file https://github.com/sonic-net/sonic-swss/blob/master/orchagent/flex_counter/flex_counter_manager.cpp

The sub orchagents use flex counter manager to

- enable/disable counter polling
-- set counter polling interval
+- set counter-polling interval
- add/remove an object to/from the counter group

-Originally, when a sub orchagent called flex counter manager to add an object to the counter group, flex counter manager directly notified SAIRedis to start to poll counters on the object.
+Originally, when a sub orchagent called the flex counter manager to add an object to the counter group, the flex counter manager directly notified SAIRedis to start poll counters on the object.

-In this design, batch mode is introduced to enable the flex counter manager to notify SAIRedis to start to poll counters on a set of objects altogether.
+In this design, batch mode is introduced to enable the flex counter manager to notify SAIRedis to start poll counters on a set of objects altogether.

1. A flag is introduced as an argument of the constructor to indicate whether the batch mode is enabled.
-2. A cache, `pending_sai_objects`, is introduced, which accommodates all objects that have been notified by sub orchagent but have not notified SAIRedis to start to poll counters.
+2. A cache, `pending_sai_objects`, is introduced, which accommodates all objects that have been notified by sub orchagent but have not notified SAIRedis to start poll counters.
3. A method `flush` which notifies SAIRedis to start to poll counters on all the objects in `pending_sai_object`
4. The sub orchagent should call `flush` explicitly.
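
The batch mode described in the four points above can be sketched as a small class. This is an illustrative Python model only; the real flex counter manager is a C++ class, and `BatchingCounterManager`, `add_object`, and the `notify` callback are hypothetical names. Only the constructor flag, the `pending_sai_objects` cache, and the `flush` method are taken from the text.

```python
from typing import Callable, Dict, List


class BatchingCounterManager:
    """Toy model of a flex counter manager with an optional batch mode."""

    def __init__(self, notify: Callable[[Dict[str, List[str]]], None],
                 batch_mode: bool = False) -> None:
        self._notify = notify          # stands in for notifying SAIRedis
        self._batch_mode = batch_mode  # flag passed to the constructor
        self.pending_sai_objects: Dict[str, List[str]] = {}

    def add_object(self, object_id: str, counter_ids: List[str]) -> None:
        if self._batch_mode:
            # Cache the object; SAIRedis is not notified yet.
            self.pending_sai_objects[object_id] = list(counter_ids)
        else:
            # Original behavior: notify immediately, one object at a time.
            self._notify({object_id: list(counter_ids)})

    def flush(self) -> None:
        """Notify once for all cached objects, then clear the cache."""
        if self.pending_sai_objects:
            self._notify(self.pending_sai_objects)
            self.pending_sai_objects = {}


if __name__ == "__main__":
    sent: List[Dict[str, List[str]]] = []
    mgr = BatchingCounterManager(sent.append, batch_mode=True)
    mgr.add_object("queue0", ["COUNTER_A"])
    mgr.add_object("queue1", ["COUNTER_A"])
    mgr.flush()  # the caller must flush explicitly, as point 4 requires
    print(len(sent))  # 1
```

Because nothing is sent until `flush` is called, a caller that forgets to flush would leave objects unpolled, which is why the explicit call is part of the contract.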

@@ -148,11 +148,11 @@ Ports orchagent manages counter polling of port, PG, and queue relevant flex cou

method `doTask` is essentially the `main loop` of each sub orchagent. It handles all `CONFIG_DB` or `APPL_DB` table updates.
2. Originally, there was a single flex counter manager object managing a flex counter group for all queue objects. In this design, we will introduce two flex counter manager objects for each queue relevant flex counter group, for unicast and multicast queues respectively.
-3. Originally, the PG flex counter group was not managed using flex counter manager. In this design, we will manage PG flex counter group using flex counter manager to simplify the code.
+3. Originally, the PG flex counter group was not managed using a flex counter manager. In this design, we will manage PG flex counter group using the flex counter manager to simplify the code.

##### sonic-sairedis

-Support starting counter polling on a set of objects.
+Support starting counter-polling on a set of objects.

### SAI API
