Releases: intel/cri-resource-manager
v0.5.0: Improved policies, bug fixes, better test coverage.
This release brings general stability and correctness improvements. It merges the memory tiering policy into
the original topology-aware one, with a number of important fixes for resource accounting and assignment.
Major Changes
- policies:
  - Add new podpools policy for pod-granularity workload placement
  - topology-aware: merge topology-aware and memory tiering policies
  - topology-aware: honor CPU reservation/reserved CPU set in configuration
  - topology-aware: unify syntax for per container and pod annotated preferences
- RDT:
  - split out RDT manipulation code to a self-contained package, https://github.com/intel/goresctrl
  - implement operating modes (Disabled, Discovery, Full)
  - add option to disable RDT monitoring
  - support L2 cache allocation
- CPU allocator (used by topology-aware and podpools policies):
  - detect CPU priority levels with Intel Speed Select Technology (SST)
Bug Fixes
- policies:
  - topology-aware: several significant cpu and memory accounting fixes
  - topology-aware: fixes in gradually relaxed memory pinning for OOM-prevention
  - topology-aware: better handling of bounding and reserved resources
  - topology-aware: fix assignment of CPU-less memory zones
  - topology-aware: fix building sparse topology trees
- RDT:
  - use root class as a fallback for missing classes
  - empty class implies root class
  - do forceful rdt (re-)configuration
- resource-manager:
  - force full reallocation when switching policies
  - run post-update hooks after reconfiguration
  - save cache at startup
- config:
  - handle composite structs in Module.validate()
- cache:
  - (over)write cache file atomically
- testing:
  - e2e: fix clearing cri-resmgr cache on uninstall
  - e2e: properly set VM_COMPOSE_YAML when reloading existing vm-configs
- documentation:
  - fix static-pools debug logging instructions
  - sample-configs: sample configuration fixes
Other Improvements
- policies:
  - topology-aware: more regular annotation interpretation for CPU allocation preferences
- resource-manager:
  - dump extra data for message disambiguation
  - flush logs after every request/event processed
- cache:
  - log name on pod/container removal
- cri-resmgr:
  - increase allowed service journal log bursts
- logging:
  - switch logger to use klog
- testing:
  - e2e: add tests for memset expansion in topology-aware policy
  - e2e: add vm-put-docker-image to the vm library
  - e2e: allow user override for VM_SSH_USER over distro-ssh-user
  - e2e: generalize templating any file with instantiate()
  - e2e: properly set VM_COMPOSE_YAML when reloading existing vm-configs
  - e2e: set imagePullPolicy on every test pod
  - e2e: support namespaced kubectl create from templates
  - e2e: unified memory-type and cold-start annotation syntax
  - e2e: update dynamic page demotion tests
  - e2e: update podpools tests to pass with new cpuallocator
  - e2e: update tests on pinning reserved CPUs
  - benchmark: add memtier_benchmark for memcached/redis
- documentation:
  - improve RDT documentation
  - fix static-pools debug logging instructions
List of Merged PRs
- PR #528: build: include only cri-resmgr in binary dist tarballs
- PR #529: docs: fix static-pools debug logging instructions
- PR #530: memtier/c4pmem4/test03-coldstart: don't jump the gun
- PR #536: .github: update issue template for new releases
- PR #537: docs: minor fixes in html template customization
- PR #538: docs: use 'release branch' as the current version in versions menu
- PR #540: e2e: support namespaced kubectl create from templates
- PR #541: e2e: fix clearing cri-resmgr cache on uninstall
- PR #542: e2e: generalize templating any file with instantiate()
- PR #543: memtier: implement reserved CPUs pool
- PR #545: resource-manager: run post-update hooks after reconfiguration
- PR #546: go.mod: update to Kubernetes v1.19.4
- PR #547: scripts: helper for maintaining replace lines in go.mod
- PR #549: benchmark: add memtier_benchmark for memcached/redis
- PR #550: test/functional: prevent read/write data race in klog
- PR #553: docs: quote text containing '<' and '>' using `` in affinity docs
- PR #555: scripts/update-gh-pages: more intelligent http redirect
- PR #556: e2e: allow user override for VM_SSH_USER over distro-ssh-user
- PR #557: Improve CPU prioritization
- PR #560: e2e: add vm-put-docker-image to the vm library
- PR #561: memtier: rework building of pool tree by HW topology
- PR #562: docs: improve rdt documentation
- PR #563: memtier/pool test: fix fd leakage causing test panics with more data
- PR #566: Kata container support
- PR #567: config: handle composite structs in Module.validate()
- PR #568: control/rdt: add option to disable rdt monitoring
- PR #570: page-migrate: add cache-like container.GetPodID()
- PR #571: config: fix typo in log message
- PR #572: control/rdt: fix and simplify handling of implicit disabling
- PR #573: control/rdt: empty class implies root class
- PR #574: control/rdt: implement assignAll()
- PR #575: control/rdt: do forceful rdt (re-)configuration
- PR #576: control/rdt: correct usage of checkIdle() in configNotify()
- PR #577: control/rdt: implement operating modes
- PR #579: memtier: don't imply error by signature for functions that never fail
- PR #580: docs: use an explicit version of recommonmark
- PR #581: rdt: accept missing default classes in Discovery mode
- PR #583: docs: refer to the latest release in the installation instructions
- PR #586: rdt: use root class as a fallback to missing classes
- PR #587: e2e: set imagePullPolicy on every test pod
- PR #588: memtier: unify syntax for annotated preferences
- PR #589: memtier: fix build error introduced by improper, unrebased merging of both #524 and #543
- PR #590: memtier: more regular annotation interpretation for CPU allocation preferences
- PR #591: fix: nil pointer dereference on updateSharedAllocations(nil)
- PR #592: e2e: unified memory-type and cold-start annotation syntax
- PR #594: policy/builtin/*: fix outdated comment about PolicyName
- PR #595: docs: recognize/handle .md-links to element IDs
- PR #596: server,resource-manager: flush logs after every request/event processed
- PR #597: resource-manager: rename 'memtier' policy to 'topology-aware'
- PR #598: podpools: policy for pod-granularity workload placement
- PR #599: rdt: fix order of params passed to GetTasksInContainer()
- PR #600: test: drop stale rdt testdata
- PR #601: topology-aware: improved topology tree/node dump
- PR #602: cpuallocator: add CPU priority levels
- PR #604: e2e: properly set VM_COMPOSE_YAML when reloading existing vm-configs
- PR #606: Extended detection of Intel Speed Selection Technology (SST)
- PR #607: klog: skip headers for journald by default
- PR #608: cri-resmgr: increase allowed service journal log bursts
- PR #609: fixes: topology-aware policy cpu/memory accounting fixes
- PR #610: resource-manager: force full reallocation when switching policies
- PR #612: topology-aware: force reserved/kube-system containers to the root
- PR #613: e2e: add tests for memset expansion in topology-aware policy
- PR #614: resource-manager,dump: dump extra data for message disambiguation
- PR #615: topology-aware: better and more readable logs
- PR #616: topology-aware: memory accounting and memset expansion fixes
- PR #617: resource-manager: catch containers earlier when they are gone
- PR #618: e2e: update podpools tests to pass with new cpuallocator
- PR #622: topology-aware: use normal as fallback for reserved
- PR #623: e2e: update tests on pinning reserved CPUs
- PR #624: topology-aware: use prettyMem() in log messages
- PR #625: cache: (over)write cache file atomically
- PR #626: resource-manager: save cache at startup
- PR #627: cache: log name on pod/container removal
- PR #628: rdt: support L2 cache allocation
- PR #629: topology-aware: fix filtering out nodes with insufficient memory
- PR #630: topology-aware: fix moving up memory grant
- PR #631: pkg/sysfs: clarifying comment on getCPUMapping()
- PR #632: e2e: update dynamic page demotion tests
- PR #634: sample-configs: make cri-resmgr-configmap.example.yaml usable
- PR #636: podpools: fix reflect JSON tag typo
v0.4.1: Improved documentation, end-to-end testing, bug fixes.
The documentation in this release has been overhauled with significant structural improvements and additional content over previous ones. End-to-end test coverage has been vastly extended and the test framework significantly improved. This release contains a number of important bug fixes and a few other functional improvements. Here is a non-exhaustive list of these.
Bug fixes
- agent:
  - refuse to start if `NODE_NAME` environment variable is not specified
- memtier policy:
  - fix updating containers after shared pool changes
  - honor CPU isolation opt-out preference
  - honor allowed CPUs in resource discovery
  - fix PMEM-only NUMA node assignment for weird topologies
- static-pools policy:
  - make dynamic (re-)configuration work properly
  - look for cmk isolate when parsing container command line
  - re-load legacy config on config update
  - only take pools configuration from legacy config
  - improved sanity check on pool configuration
  - fix node tainting
- cri-resmgr:
  - fill in defaults for unspecified values in configuration
Other Improvements
- cri-resmgr:
  - dump outbound requests if debugging is enabled for the 'cri/relay' source
- resource controllers:
  - page-migrate: split out page-migration into a controller of its own
- e2e test framework:
  - vastly improved test coverage on multiple distros
- builds:
  - build binary dist tarballs
Difference wrt. Rolling Master
With the exception of the PRs listed below, all others in the inclusive range #411 - #527 have been cherry-picked or back-ported from the rolling master branch to this release. The omitted PRs have been excluded due to backwards compatibility or other similar reasons:
- #525: cri-resmgr: reuse 'rdt' logger for the split out rdt package
- #490: rdt: use goresctrl
- #497: pkg/log: switch logger to use klog
- #472: e2e: add tests for static-pools
- #489: static-pools: slight refactoring and renaming
- #483: static-pools: lazier node updates
- #475: static-pools: drop all cmdline flags
v0.4.0: Improved support for Memory Tiering, Binary packages
Major changes
- 'topology-aware' policy superseded by 'memtier'
- support for cold start of containers
- support for dynamic demotion of memory
- support for limiting container top tier/DRAM memory usage (requires kernel support)
- support for externally adjusting container resource assignments
- multi-die aware resource allocation
- binary distribution with packages for popular Linux distributions and images at Docker Hub
Detailed changelog
Policies
- 'topology-aware' policy superseded by 'memtier', which
  - is a forked and improved version of 'topology-aware'
  - has the same basic functionality
  - has a number of improvements and extra functionality:
    - multi-die topology support
    - multi-tier (DRAM/PMEM) memory support
    - top tier/DRAM memory limiting
    - container 'cold start' support: force containers initially exclusively to PMEM
    - experimental dynamic page demotion: periodically move least-used pages from DRAM to PMEM
    - experimental support for dynamic external adjustments to container resource assignments
  - has a bunch of resource assignment/allocation fixes (which are not backported to 'topology-aware' any more)
  - will in the next release replace 'topology-aware' altogether
- static-pools:
  - compatibility back-ports from CMK: advertise CPUs in 'shared', 'infra' pools via `CMK_CPUS_SHARED`, `CMK_CPUS_INFRA` environment variables
- common:
  - support for new Pod annotation controls:
    - opt out from automatic topology hint generation:
      - `topologyhints.cri-resource-manager.intel.com/pod: false`
      - `topologyhints.cri-resource-manager.intel.com/container.$name: false`
    - set DRAM/top tier memory limit:
      - `toptierlimit.cri-resource-manager.intel.com/pod: $limit`
      - `toptierlimit.cri-resource-manager.intel.com/container.$name: $limit`
  - make simple container affinities always implicitly symmetric
  - limit user-defined container affinity to [-1000,1000]
  - re-trigger pod cgroupfs parent directory and QoS class discovery if necessary
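As an illustration of the [-1000,1000] limit on user-defined container affinity mentioned above, here is a minimal sketch of bounding a weight to that range. Whether cri-resmgr clamps out-of-range values or rejects them is not stated in these notes; clamping is an assumption made for this example, and the names `minAffinity`, `maxAffinity` and `clampAffinity` are hypothetical.

```go
package main

import "fmt"

// Bounds taken from the release note; the constant names are hypothetical.
const (
	minAffinity = -1000
	maxAffinity = 1000
)

// clampAffinity limits a user-defined affinity weight to the allowed range.
// Assumption: out-of-range weights are clamped rather than rejected.
func clampAffinity(weight int) int {
	if weight < minAffinity {
		return minAffinity
	}
	if weight > maxAffinity {
		return maxAffinity
	}
	return weight
}

func main() {
	for _, w := range []int{-5000, -1000, 42, 1000, 99999} {
		fmt.Printf("%d -> %d\n", w, clampAffinity(w))
	}
}
```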
Resource controllers
- RDT:
  - remove controller-level class name mapping
  - don't consider assignment to a default class an error if no classes are defined
  - fix crash/misplaced logging of group deletion
- Block I/O:
  - remove controller-level class name mapping
  - don't consider assignment to a default class an error if no classes are defined
- CRI:
  - properly send out generated/queued `UpdateContainerResources` requests
Data collectors
- cgroupstats:
  - use/report container IDs
  - fix hugetlb size parsing
- avx:
  - switch to cilium/ebpf from iovisor/gobpf
cri-resmgr
- new command line options:
  - reset cached configuration: `--reset-config`
  - reset cached policy data: `--reset-policy`
- always set up node agent connection, even when running with `--force-config`
- allow switching policies during startup, unless started with `--disable-policy-switch`
Packaging
- install sample fallback config as fallback and not real config file
- use `/etc/default` for defaults on Debian-based distros
- support Ubuntu 20.04, OpenSUSE 15.2
Documentation
- automatic generation and publishing of documentation to github pages
- a number of documentation fixes and clarifications
Testing
- end-to-end test framework added
v0.3.1: Packaging and build fixes
This v0.3.1 patch release adds packaging and build fixes on top of the v0.3.0 release.
Changes:
- feature: add command line options for resetting the active policy in the cache and allow this to happen automatically during startup if necessary
- fix: NUMA CPU-/memory-attachment detection code to work with older kernels
- fix: move from gobpf to Cilium-based AVX eBPF implementation to address build issues on older kernels
- fix: add targets for containerized cross-builds for distro packages
v0.3.0: Memory management improvements
- added memory-tiering policy: `topology-aware` policy with support for DRAM, PMEM (Intel Optane DC) and HBM (High Bandwidth Memory) allocation
- added blockio controller: class-based control over block I/O using the cgroupfs blkio controller
- added support for metrics collection:
  - collection of raw metrics data, exporting to Prometheus
  - `AVX512` usage: collect per container AVX512 instruction usage, tag containers accordingly
- rdt controller improvements: disjoint partitioning, L3 and memory bandwidth monitoring, and Intel RDT metrics
- new annotations:
  - assign full pod or a container to `block I/O` or `RDT` class:
    - `rdtclass.cri-resource-manager.intel.com/container.$container`: class-name
    - `rdtclass.cri-resource-manager.intel.com/pod`: class-name
    - `blockioclass.cri-resource-manager.intel.com/container.$container`: class-name
    - `blockioclass.cri-resource-manager.intel.com/pod`: class-name
  - `memtier` policy preference for type of memory allocated to a container:
    - `memory-type.cri-resource-manager.intel.com`: `$container: [dram,][pmem,][hbm]`
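The memory-type preference above is written as a comma-separated list drawn from `dram`, `pmem` and `hbm`. As a rough illustration of how such a value could be interpreted, the sketch below parses the list into a bitmask. It is not the project's parser: the type and function names are hypothetical, and the tolerance for whitespace, letter case and trailing commas is an assumption of this example.

```go
package main

import (
	"fmt"
	"strings"
)

// memType is a hypothetical bitmask of requested memory types.
type memType uint8

const (
	memDRAM memType = 1 << iota
	memPMEM
	memHBM
)

// parseMemoryTypes turns a value such as "dram,pmem" into a bitmask.
// Assumptions: fields are case-insensitive, surrounding whitespace and
// empty fields (e.g. a trailing comma) are ignored, unknown names and
// values that name no type at all are errors.
func parseMemoryTypes(value string) (memType, error) {
	var mask memType
	for _, field := range strings.Split(value, ",") {
		switch strings.ToLower(strings.TrimSpace(field)) {
		case "":
			// Skip empty fields.
		case "dram":
			mask |= memDRAM
		case "pmem":
			mask |= memPMEM
		case "hbm":
			mask |= memHBM
		default:
			return 0, fmt.Errorf("unknown memory type %q", field)
		}
	}
	if mask == 0 {
		return 0, fmt.Errorf("no memory type given in %q", value)
	}
	return mask, nil
}

func main() {
	mask, err := parseMemoryTypes("dram,pmem")
	if err != nil {
		panic(err)
	}
	fmt.Println("dram:", mask&memDRAM != 0)
	fmt.Println("pmem:", mask&memPMEM != 0)
	fmt.Println("hbm:", mask&memHBM != 0)
}
```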
v0.2.0: More generic runtime configuration handling.
Implement a more general, unified mechanism for handling runtime configuration.
v0.1.0: First publicly available release
Initial release for the project with major functionality available in alpha state.
Note: this is a pre-production alpha release. Not for production use!