add linux pressure stall metrics #125

copperlight · 2024-02-22T02:23:42Z

This change adds the following new metrics, which can be used to provide
feedback on where a system is currently constrained. The metrics are
collected for both EC2 instances and Titus containers, except the full:cpu
metric, which is meaningless on EC2 instances.

EC2 instances:

name=sys.pressure.some,id=[cpu|io|memory] counter unit=seconds/second
name=sys.pressure.full,id=[io|memory]     counter unit=seconds/second

Titus containers:

name=sys.pressure.some,id=[cpu|io|memory] counter unit=seconds/second
name=sys.pressure.full,id=[cpu|io|memory] counter unit=seconds/second

https://docs.kernel.org/accounting/psi.html#pressure-interface

The "some" line indicates the share of time in which at least some tasks are
stalled on a given resource.

The "full" line indicates the share of time in which all non-idle tasks are
stalled on a given resource simultaneously. In this state actual CPU cycles
are going to waste, and a workload that spends extended time in this state
is considered to be thrashing.

The total absolute stall time (in us) is tracked and exported as well, to
allow detection of latency spikes which wouldn't necessarily make a dent in
the time averages, or to average trends over custom time frames.

The total stall time is a monotonic counter which is collected, transformed
into a base unit of seconds, and reported to the backend as a rate-per-second.
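
As a rough illustration of that conversion, here is a minimal Python sketch. It is not the code from this PR (the agent itself is C++); the helper name `stall_rate`, its parameters, and the handling of a counter that goes backwards are all assumptions made for the example.

```python
MICROS_PER_SECOND = 1_000_000


def stall_rate(prev_total_us, curr_total_us, interval_seconds):
    """Return stalled seconds per second of wall-clock time between two samples.

    Hypothetical helper: both totals are the monotonic `total=` values (in
    microseconds) read from a PSI file, and interval_seconds is the time
    elapsed between the two reads.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta_us = curr_total_us - prev_total_us
    if delta_us < 0:
        # Assumption: a counter that went backwards (e.g. after a reboot) yields no data
        return 0.0
    # Convert microseconds to seconds, then divide by the elapsed time
    return (delta_us / MICROS_PER_SECOND) / interval_seconds


# 3 seconds of stall accumulated over a 60 second interval
rate = stall_rate(12_000_000, 15_000_000, 60)
print(rate)
```

Because the stall time for one resource cannot grow faster than wall-clock time, a rate computed this way for a single `id` should stay between 0 and 1 second/second.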

brharrington

Would the maximum expected value of name,sys.pressure.full,:eq,:sum for a given instance be 1 second/second? If so, then this makes sense as we should be able to reason about the value as a percentage of time stalled. Otherwise I'm not sure how we would reason about it.

brharrington · 2024-02-22T12:58:44Z

lib/pressure_stall_test.cc

+      {"sys.pressure.some|count|cpu", 10},
+      {"sys.pressure.some|count|io", 10},
+      {"sys.pressure.some|count|memory", 10},
+      {"sys.pressure.full|count|io", 20},


On real data should full always be less than or equal to some?

Yes, this should be the common case. The test data was artificially picked just to ensure that it is parsed correctly.

I will update these values so that some goes to 1 and full goes to 0.5, so it's less of a surprise.

copperlight · 2024-02-23T21:25:39Z

Would the maximum expected value of name,sys.pressure.full,:eq,:sum for a given instance be 1 second/second? If so, then this makes sense as we should be able to reason about the value as a percentage of time stalled. Otherwise I'm not sure how we would reason about it.

https://docs.kernel.org/accounting/psi.html#pressure-interface

The "full" line indicates the share of time in which all non-idle tasks are stalled on a given resource simultaneously. In this state actual CPU cycles are going to waste, and a workload that spends extended time in this state is considered to be thrashing.

The total absolute stall time (in us) is tracked and exported as well, to allow detection of latency spikes which wouldn't necessarily make a dent in the time averages, or to average trends over custom time frames.

To the best of my understanding, this is the case. We take the total value in microseconds, convert it to seconds, and report a rate-per-second to the backend.
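
To make the "percentage of time stalled" reading concrete, here is a small Python sketch. The rates are made-up values, and `stalled_percent` is a hypothetical helper, not something from the agent. It looks at each `id` separately; whether a sum across `id` values stays under 1 second/second is not something this example establishes.

```python
def stalled_percent(rate_seconds_per_second):
    """Convert a stall rate in seconds/second into a percentage of wall-clock time."""
    return rate_seconds_per_second * 100.0


# Hypothetical per-id rates for sys.pressure.full on one container
full_rates = {"cpu": 0.02, "io": 0.25, "memory": 0.5}

for resource, rate in full_rates.items():
    # Each per-resource rate is stalled time over elapsed time, so it should not exceed 1.0
    assert 0.0 <= rate <= 1.0
    print(f"{resource}: {stalled_percent(rate):.1f}% of the interval fully stalled")
```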

copperlight · 2024-02-23T21:32:56Z

I wrote a small Python script to monitor pressure stall values, as a way to preview data values.

On a few EC2 systems, the some stall category was just tenths or hundredths of a second.

#!/usr/bin/env python

# purpose: calculate the pressure stall time in seconds, every minute, so that we can understand
# the behavior of these values as they are recorded into metrics.
#
# See https://docs.kernel.org/accounting/psi.html for more details on Pressure Stall Information (PSI)

import argparse
import json
import time
from threading import Thread

MICROS = 1000 * 1000


def parse_args():
    parser = argparse.ArgumentParser('Monitor pressure stall statistics')
    parser.add_argument('-c', '--container', action='store_true', help='/sys/fs/cgroup')
    parser.add_argument('-i', '--instance', action='store_true', help='/proc/pressure')
    args = parser.parse_args()
    if not (args.instance or args.container) or (args.instance and args.container):
        parser.error('Must choose either --container or --instance')
    return args


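def parse_pressure_stall(lines):
    # Each line looks like: "some avg10=0.00 avg60=0.00 avg300=0.00 total=12345"
    result = {'some': None, 'full': None}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # total= is the last field; it is the cumulative stall time in microseconds
        usec = int(fields[-1].split('=')[-1])
        result[fields[0]] = usec / MICROS
    return result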


def monotonic_delta(iteration, category, parsed, last_value, stall):
    if iteration == 0:
        last_value[category] = parsed
    else:
        stall[category]['some'] = round(parsed['some'] - last_value[category]['some'], 4)
        stall[category]['full'] = round(parsed['full'] - last_value[category]['full'], 4)
        last_value[category]['some'] = parsed['some']
        last_value[category]['full'] = parsed['full']


def read_instance_pressure_stall(iteration, last_value, stall):
    for category in ['io', 'memory']:
        with open(f'/proc/pressure/{category}', 'r') as f:
            parsed = parse_pressure_stall(f.readlines())
            monotonic_delta(iteration, category, parsed, last_value, stall)


def read_container_pressure_stall(iteration, last_value, stall):
    for category in ['cpu', 'io', 'memory']:
        with open(f'/sys/fs/cgroup/{category}.pressure', 'r') as f:
            parsed = parse_pressure_stall(f.readlines())
            monotonic_delta(iteration, category, parsed, last_value, stall)


def monitor_pressure_stall(args):
    iteration = 0
    last_value = {
        'cpu': {'some': None, 'full': None},
        'io': {'some': None, 'full': None},
        'memory': {'some': None, 'full': None}
    }
    stall = {
        'cpu': {'some': None, 'full': None},
        'io': {'some': None, 'full': None},
        'memory': {'some': None, 'full': None}
    }
    while True:
        print(f'---- iteration {iteration} ----')
        if args.instance:
            read_instance_pressure_stall(iteration, last_value, stall)
            if iteration != 0:
                print(f'instance stall={json.dumps(stall, indent=2)}')
        if args.container:
            read_container_pressure_stall(iteration, last_value, stall)
            if iteration != 0:
                print(f'container stall={json.dumps(stall, indent=2)}')
        iteration += 1
        time.sleep(60)


if __name__ == '__main__':
    args = parse_args()
    print('BEGIN monitoring pressure stall statistics')
    t = Thread(daemon=True, target=monitor_pressure_stall, args=(args,))
    t.start()
    try:
        t.join()
    except KeyboardInterrupt:
        print('\nEND monitoring pressure stall statistics')
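
To sanity-check the parsing step without a Linux host that has PSI enabled, the sketch below feeds sample text through a standalone copy of the parsing logic. The sample follows the line format described in the kernel PSI documentation, but the numbers are invented, and the blank-line guard is an addition for the example.

```python
MICROS = 1000 * 1000

# Sample text in the format documented for /proc/pressure/<resource>; the numbers are made up.
SAMPLE = """some avg10=0.00 avg60=0.00 avg300=0.00 total=1500000
full avg10=0.00 avg60=0.00 avg300=0.00 total=500000
"""


def parse_pressure_stall(lines):
    # Same approach as the script above: keep only the total= field, converted to seconds
    result = {'some': None, 'full': None}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        usec = int(fields[-1].split('=')[-1])
        result[fields[0]] = usec / MICROS
    return result


parsed = parse_pressure_stall(SAMPLE.splitlines())
print(parsed)
```

With this sample, `some` parses to 1.5 seconds and `full` to 0.5 seconds of cumulative stall time.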

copperlight force-pushed the pressure-stall branch from 90cfd0a to 2dd921d Compare February 22, 2024 03:49

brharrington reviewed Feb 22, 2024

copperlight force-pushed the pressure-stall branch from 00e3b03 to bf33b0b Compare February 23, 2024 22:05

copperlight merged commit 3b38838 into Netflix-Skunkworks:main Feb 23, 2024
2 checks passed

copperlight deleted the pressure-stall branch February 23, 2024 22:36
