forked from pivotal-cf/docs-ops-guide
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathpws_upgrade_load.html.md.erb
163 lines (138 loc) · 6.28 KB
/
pws_upgrade_load.html.md.erb
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
---
title: Pivotal Web Services Performance During Upgrade
owner: CloudOps
---
<strong><%= modified_date %></strong>
This topic provides sample performance measurements of a Cloud Foundry installation undergoing the workload associated with an upgrade.
To obtain these measurements, Pivotal repaved its production Pivotal Web Services (PWS) deployment. The repave process simulates system load that would be incurred when performing a rolling upgrade of Diego cells.
Use the measurements and configuration values published in this document as guidance when ensuring you have adequate file storage hardware prior to a platform upgrade.
For more information on the impact of upgrade on file storage performance, see [Upgrade Considerations for Selecting File Storage in Pivotal Cloud Foundry](./storage_considerations.html).
## <a id="configuration"></a> Platform Configuration
The following table details the starting parameters and configuration of PWS.
<table border="1" class="nice">
<tr>
<th>Configuration</th>
<th>Value</th>
<th>How to Locate</th>
</tr>
<tr>
<td><b>IaaS</b></td>
<td>Amazon Web Services</td>
<td>Refer to your Ops Manager Director configuration or BOSH deployment manifest.</td>
</tr>
<tr>
<td><b>File Storage</b></td>
<td>AWS EBS (External with some elastic capacity)</td>
<td>Refer to your Elastic Runtime configuration or BOSH deployment manifest.</td>
</tr>
<tr>
<td><b>Version of CF</b></td>
<td>v252</td>
<td>Refer to your Ops Manager Director and Elastic Runtime configuration or BOSH deployment manifest.</td>
<tr>
<td><b>Number of Diego Cells</b></td>
<td>218</td>
<td>To view the number of Diego cell instances currently running in your deployment, see the <b>Resource Config</b> section of your Elastic Runtime tile or consult your Diego deployment manifest.</td>
</tr>
<tr>
<td><b>Maximum Number of Started Containers</b></td>
<td>250</td>
<td>See <a href="../adminguide/max-container-starts.html">PCF</a> or <a href="http://docs.cloudfoundry.org/adminguide/max-container-starts.html">Cloud Foundry</a> documentation for configuration information.
</td>
</tr>
<tr>
<td><b>max_in_flight Configuration for Diego Cells</b></td>
<td>6</td>
<td>To retrieve the existing <code>max_in_flight</code> value for the Diego Cell job in Ops Manager Director, use the Ops Manager API. See the Ops Manager API documentation. If you are running open source CF, consult your BOSH deployment manifest.
</td>
<tr>
<td><b>Number of Availability Zones (AZ)</b></td>
<td>2</td>
<td>Consult your Elastic Runtime or BOSH deployment AZ configuration.</td>
</tr>
<tr>
<td><b>Number of App Instances</b></td>
<td>16231</td>
<td><code>datadog.nozzle.bbs.LRPsRunning</code></td>
</tr>
<tr>
<td><b>Number of Application Security Groups (ASGs)</b></td>
<td>43</td>
<td>As admin user, run the <code>cf security-groups</code> command. For more information, see <a href="../concepts/asg.html">Understanding Application Security Groups</a>.</td>
</tr>
</table>
## <a id="during_upgrade"></a> System Performance Measurements During Cell Repave
This table presents performance measurements taken during the Diego cell repave.
<p class="note"><b>Note</b>: These measurements indicate the peak cumulative values of the entire system (250 Diego cells, ~15,000 application instances, and 2 AZs.)</p>
Use these measurements as a baseline for expected system load during Diego cell upgrade.
<table border="1" class="nice">
<tr>
<th>Measurement</th>
<th>Value</th>
<th>Metric Used</th>
</tr>
<tr>
<td><b>Cell CPU Consumption</b></td>
<td>36%</td>
<td><code>bosh.healthmonitor.system.cpu.user</code></td>
</tr>
<tr>
<td><b>Cell Memory Consumption</b></td>
<td>~50%</td>
<td><code>bosh.healthmonitor.system.mem.percent</code></td>
</tr>
<tr>
<td><b>Cell I/O Consumption (Read) During Normal Operations</b></td>
<td>43 Read I/O Operations per second
</td>
<td><code>aws.ebs.volume\_read\_ops</code></td>
</tr>
<tr>
<td><b>Cell I/O Consumption (Read) During Upgrade</b></td>
<td>1,943 Read I/O Operations per second
</td>
<td><code>aws.ebs.volume\_read\_ops</code></td>
</tr>
<tr>
<td><b>Cell IO Consumption (Write) During Normal Operations</b></td>
<td>2,166 Write I/O Operations per second</td>
<td><code>aws.ebs.volume\_write\_ops</code></td>
</tr>
<tr>
<td><b>Cell IO Consumption (Write) During Upgrade</b></td>
<td>21,000 Write I/O Operations per second</td>
<td><code>aws.ebs.volume\_write\_ops</code></td>
</tr>
<tr>
<td><b>Cell Network Consumption (Network Out) During Normal Operations</b></td>
<td>~1.25 GB per minute
</td>
<td><code>aws.ec2.network\_out</code></td>
</tr>
<tr>
<td><b>Cell Network Consumption (Network Out) During Upgrade</b></td>
<td>~1.25 GB per minute (no significant change)
</td>
<td><code>aws.ec2.network\_out</code></td>
</tr>
<tr>
<td><b>Cell Network Consumption (Network In) During Normal Operations</b></td>
<td>2.11 GB per minute</td>
<td><code>aws.ec2.network\_in</code></td>
</tr>
<tr>
<td><b>Cell Network Consumption (Network In) During Upgrade</b></td>
<td>16.75GB per minute</td>
<td><code>aws.ec2.network\_in</code></td>
</tr>
</table>
## <a id="graph"></a> Sample Performance Graphs
These DataDog graphs represent a timeline visualization of read and write operations during the repave event.
### <a id="graph_readops"></a> Read I/O Operations
The read I/O operations sample was taken over 115 VMs and represent the number of read operations over 300 seconds for a single Diego cell.
<%= image_tag("pws_read_ops_upgrade.png") %>
### <a id="graph_writeops"></a> Write I/O Operations
The write I/O operations sample was taken over 115 VMs and represent the number of read I/O operations over 300 seconds for a single Diego cell.
<%= image_tag("pws_write_ops_upgrade.png") %>
## <a id="summary"></a> Summary
During the repave process, 250 Diego cells were updated. The repave process took 6 hours overall or about 3 hours for each Availability Zone.