Abstract: This page will give the details for the Metrics Based Port Selection algorithms and code changes that are being developed for ViPR 2.2.

This page was ported from the EMC internal Wiki.

This gives a brief history of Metrics Based Port selection.

Release

  • 2.2: Initial support for VNX, VMAX, and HDS.
  • 3.5: Added VPLEX support.
  • Skywalker: Changes to balance volumes better on arrays with negligible I/O loads by adding the volume count into the usage calculation.


Problem Statement

Before Metrics Based Port Selection, when given a choice of several ports providing equal hardware redundancy, ViPR used a simple metric to determine port usage: the count of ViPR-provisioned initiators that are using the port.

Customers requested that we use array (or network) performance metrics to make this choice. Possible array port metrics that come to mind are IOPs (I/O operations per unit time) or bandwidth transferred per unit time. (These two metrics are available via SMI-S as "IOPs" and "Kbytes transferred", and we developed code to determine similar metrics for the VPLEX and HDS.)

Use Cases

Customers desire to take performance metrics into account in the port allocation algorithms because:

  1. They want to avoid new allocations to ports that are overloaded (as defined by having too many volumes, or too high an I/O load).
  2. They want to avoid new allocations to ports that reside on CPUs (especially in VMAX2 systems) that are overloaded (because the CPU %busy is too high or because the CPU is servicing too many volumes).
  3. They want to avoid allocating more storage on arrays that are overloaded (as given by some overall array usage metric).

Note that it is still considered more important to provide ports with adequate hardware redundancy. That is, given a choice of two ports, one with a lower metric that does not provide additional hardware redundancy and another with a higher metric that does, in many cases the latter port will be the correct choice.

Skywalker Release Changes (May 2017)

This section highlights the changes made for the Skywalker release. See the information below for an overview of the entire design that has been updated with the Skywalker changes.

In the Skywalker release we addressed defect COP-30254, which indicated that port allocation was not balanced on a newly installed array because there was not yet any significant I/O load on any of the ports from which metrics could be derived. This problem was observed by several customers. What they desired in this circumstance is that the number of volumes be evenly distributed across the available ports.

The changes made to rectify this condition were as follows:

  1. Previously the Port Metric (if valid for all ports under consideration) consisted of the average of the Port Percent Busy and Cpu Percent Busy, evenly combined. To balance the number of volumes assigned to each port when there was a negligible I/O load, we had to add a volume term to the Port Metric equation. To do that, we assumed that 2048 volumes was a maximum reasonable number of volumes on a port, and calculated a volume percentage metric based on that. The volume percentage metric is then added into the overall Port Metric equation after being multiplied by a Volume Coefficient that allows volumes to be weighted more or less heavily in the calculations.
  2. To account for the background idle usage ("noise") on a relatively idle port and array, two new user configurable variables were introduced, the Port Utilization Floor and Cpu Utilization Floor. If the Port Percent Busy is less than the Port Utilization Floor, it won't be added to the port metric at all and similarly if the Cpu Percent Utilization is less than the Cpu Utilization Floor it won't be added. This was introduced primarily because it was observed that arrays with no I/O load showed a cpu percent busy typically of about 3 to 7 percent, which is just overhead. These floors allow the metric to ignore the pure overhead that doesn't represent real "load".

Design

I. Metrics to be Collected

We have determined what metrics are available on the VMAX and VNX, and made a decision on what metrics will be used for the algorithm. This was done in collaboration with the VMAX team and the PM team. We also support the same metrics on the HDS and VPLEX.

Note: Metrics collection is contingent on having metering turned on and configured on the array if necessary. An additional, explicit action is necessary in Unisphere to enable collection on the VNX.

VMAX Performance Metrics

The VMAX has the following metrics of interest that will be collected for the algorithm:

Metric | Variable | Description
FEPort: TotalIOs | iops | The cumulative number of IO requests for a port (read and write).
FEPort: KbytesTransferred | kbytesTransferred | The cumulative number of kilobytes transferred for read or write.
FEAdapt: EMCCollectionTimeDir | ticks | The cumulative number of ticks.
FEAdapt: EMCIdleTimeDir | idle | The cumulative number of idle ticks.
FEAdapt: TotalIOs | iops | The cumulative number of I/O operations for the CPU (read and write).
FEPort, FEAdapt: StatisticTime | sampleTime | A string representing the current time, in the format yyyyMMddHHmmss.SSSSSSsutc, where yyyy is a 4-digit year; MM is the month; dd is the day of the month; HH is the hour (24-hour clock); mm is the minute; ss is the second; SSSSSS is the number of microseconds; and sutc gives the sign and offset from GMT.
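
The StatisticTime string has to be converted to milliseconds before sample deltas can be computed. The following is a minimal sketch of such a conversion, not the ViPR implementation. It assumes the "utc" part of "sutc" is a three-digit offset in minutes, as in the DMTF CIM datetime format; the class and method names are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: converts a StatisticTime string to epoch milliseconds.
final class StatisticTimeParser {
    private static final DateTimeFormatter SECONDS =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    static long toEpochMillis(String statisticTime) {
        // Expected layout: 14 digits, '.', 6 digits of microseconds, sign, 3 digits.
        if (statisticTime == null || statisticTime.length() != 25
                || statisticTime.charAt(14) != '.') {
            throw new IllegalArgumentException("Unexpected StatisticTime: " + statisticTime);
        }
        LocalDateTime local = LocalDateTime.parse(statisticTime.substring(0, 14), SECONDS);
        long micros = Long.parseLong(statisticTime.substring(15, 21));
        // Assumption: the offset from GMT is expressed in minutes.
        int offsetMinutes = Integer.parseInt(statisticTime.substring(22, 25));
        if (statisticTime.charAt(21) == '-') {
            offsetMinutes = -offsetMinutes;
        }
        ZoneOffset offset = ZoneOffset.ofTotalSeconds(offsetMinutes * 60);
        return local.toEpochSecond(offset) * 1000L + micros / 1000L;
    }
}
```

For example, under these assumptions "19700101000001.500000+000" converts to 1500 milliseconds.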

We use these metrics to calculate two values: percent busy for the port (FEPort) and percent busy for the CPU (FEAdapt). We decided to express port load and cpu load as a percent busy so that it could easily be normalized across different ports, storage systems, and even platform types (using the same algorithms). Although IOPS are collected, they are not actually used in the metrics calculation at this time.

The port percent busy is calculated as:

Long deltaKbytes = kbytesTransferred[t] - kbytesTransferred[t-1];    // delta kbytes transferred
Long deltaSampleTime = sampleTime[t] - sampleTime[t-1];              // delta time expressed in msec.
Long deltaSeconds = deltaSampleTime / MSEC_PER_SEC;                  // MSEC_PER_SEC = 1000
Long portSpeedGbitPerSec;  // the port speed in Gbit/sec. collected from the StoragePort
Long maxKbytesPerSecond = portSpeedGbitPerSec * KBYTES_PER_GBIT;     // KBYTES_PER_GBIT = 1024 * 1024 / 8
Double portPercentBusy = (deltaKbytes * 100.0 / deltaSeconds) / maxKbytesPerSecond;
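
As a worked illustration of the calculation above, here is a self-contained sketch. The guard conditions and the -1.0 "unusable sample" return value are assumptions added for the example (for instance to cover a counter reset); they are not taken from the ViPR code.

```java
// Sketch of the port percent busy calculation between two samples.
final class PortBusy {
    static final long MSEC_PER_SEC = 1000L;
    static final long KBYTES_PER_GBIT = 1024L * 1024L / 8L; // 131072 KB per Gbit

    /** Returns percent busy, or -1.0 (assumed sentinel) if the sample pair is unusable. */
    static double portPercentBusy(long prevKbytes, long currKbytes,
                                  long prevTimeMsec, long currTimeMsec,
                                  long portSpeedGbitPerSec) {
        long deltaKbytes = currKbytes - prevKbytes;
        long deltaMsec = currTimeMsec - prevTimeMsec;
        if (deltaKbytes < 0 || deltaMsec <= 0 || portSpeedGbitPerSec <= 0) {
            return -1.0; // counter went backwards, no elapsed time, or unknown speed
        }
        double deltaSeconds = (double) deltaMsec / MSEC_PER_SEC;
        double maxKbytesPerSecond = (double) portSpeedGbitPerSec * KBYTES_PER_GBIT;
        return (deltaKbytes * 100.0 / deltaSeconds) / maxKbytesPerSecond;
    }
}
```

An 8 Gbit/sec port can move at most 8 * 131072 = 1,048,576 KB per second. If 6,291,456 KB were transferred over a 60 second interval (104,857.6 KB per second), the port is 10% busy.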


The CPU percent busy is calculated as:

Long deltaIdle = idle[t] - idle[t-1];                    // difference in idle ticks from time t-1 to time t
Long deltaTicks = ticks[t] - ticks[t-1];                 // difference in cumulative ticks from time t-1 to time t
Long deltaBusy = deltaTicks - deltaIdle;                 // difference in busy ticks from time t-1 to time t
Double cpuPercentBusy = deltaBusy * 100.0 / deltaTicks;  // percent busy
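
The same calculation as a self-contained sketch. The validity checks and the -1.0 return value for an unusable sample pair are assumptions for illustration only.

```java
// Sketch of the CPU percent busy calculation from cumulative tick counters.
final class CpuBusy {
    /** Returns percent busy, or -1.0 (assumed sentinel) if the counters are inconsistent. */
    static double cpuPercentBusy(long prevIdle, long currIdle, long prevTicks, long currTicks) {
        long deltaIdle = currIdle - prevIdle;
        long deltaTicks = currTicks - prevTicks;
        if (deltaTicks <= 0 || deltaIdle < 0 || deltaIdle > deltaTicks) {
            return -1.0; // no elapsed ticks, or a counter was reset between samples
        }
        long deltaBusy = deltaTicks - deltaIdle;
        return deltaBusy * 100.0 / deltaTicks;
    }
}
```

If 1000 ticks elapsed between samples and 750 of them were idle, the CPU was 25% busy.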


The overall metric is calculated (approximately) according to the following formula:

Double portMetric = 0.0;   // comprised of port and cpu percent busy and volume load, higher is greater usage
if (portPercentBusy >= portUtilizationFloor) {
    portMetric += portPercentBusy;
}
if (cpuPercentBusy >= cpuUtilizationFloor) {
    portMetric += cpuPercentBusy;
    portMetric /= 2;    // evenly weight portPercentBusy and cpuPercentBusy on 0-100% scale
}
// Now account for volume load, 2048 assumed maximum reasonable volume load
portMetric += volumeCoefficient * (volumeCount * 100.0) / 2048;
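
The formula can be exercised with concrete numbers. The sketch below follows the pseudocode above line for line; the class and method names are hypothetical, and the floor values used in the examples afterwards are the documented defaults.

```java
// Sketch of the overall port metric, following the approximate formula above.
final class PortMetricSketch {
    static final double MAX_VOLUMES_PER_PORT = 2048.0; // assumed maximum reasonable volume load

    static double portMetric(double portPercentBusy, double cpuPercentBusy, long volumeCount,
                             double portUtilizationFloor, double cpuUtilizationFloor,
                             double volumeCoefficient) {
        double metric = 0.0;
        if (portPercentBusy >= portUtilizationFloor) {
            metric += portPercentBusy;
        }
        if (cpuPercentBusy >= cpuUtilizationFloor) {
            metric += cpuPercentBusy;
            metric /= 2; // evenly weight port and cpu percent busy on a 0-100% scale
        }
        // Volume term: volumeCount expressed as a percentage of 2048 volumes.
        metric += volumeCoefficient * (volumeCount * 100.0) / MAX_VOLUMES_PER_PORT;
        return metric;
    }
}
```

With floors of 3% (port) and 8% (cpu) and a volume coefficient of 1.0, an idle port at 1% port busy and 5% cpu busy with 512 volumes scores 25.0, because both busy terms fall below their floors and only the volume term remains. A port at 40% port busy and 20% cpu busy with 1024 volumes scores 30.0 + 50.0 = 80.0.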


VNX Performance Metrics

The VNX provides the following metrics:

Metric | Variable | Description
FEPort: Total IOPs | iops | The cumulative number of IO requests for a port (read and write).
FEPort: KbytesTransferred | kbytesTransferred | The cumulative number of kilobytes transferred for read or write.
FEAdapt: IdleTimeCounter | idle | The cumulative ticks of idle time (idleTicksValue).
FEAdapt: IOTimeCounter | ioTime | The cumulative ticks of I/O busy time.
FEAdapt: TotalIOs | iops | The cumulative number of I/O operations for the CPU (read and write).
FEPort, FEAdapt: StatisticTime | sampleTime | A string representing the current time, in the format yyyyMMddHHmmss.SSSSSSsutc, where yyyy is a 4-digit year; MM is the month; dd is the day of the month; HH is the hour (24-hour clock); mm is the minute; ss is the second; SSSSSS is the number of microseconds; and sutc gives the sign and offset from GMT.

We use these metrics to calculate two values: percent busy for the port (FEPort) and percent busy for the CPU (FEAdapt).

The port percent busy is calculated the same as for the VMAX (see above).

The CPU percent busy is calculated as:

Long deltaIdle = idle[t] - idle[t-1];                             // difference in idle time from time t-1 to time t (in 100 msec. ticks)
Long deltaSampleTime = (sampleTime[t] - sampleTime[t-1]) / 100;   // difference in time between two samples (in 100 msec. ticks)
Long deltaBusy = deltaSampleTime - deltaIdle;                     // the busy ticks between two samples
Double cpuPercentBusy = deltaBusy * 100.0 / deltaSampleTime;      // percent busy given 100 msec. ticks

An alternate method is mentioned here: https://community.emc.com/docs/DOC-16144. We evaluated this method but are not currently using it.

I have implemented the computation of SP percent busy using the StatisticTime. Currently the BlockStatisticsCapabilities says the ClockTickInterval is 10 usec. However, in a private conversation with Rich xxx's team, it was revealed that the IdleTimeCounter and IOTimeCounter ticks are based on a 100 msec. clock, despite what BlockStatisticsCapabilities says.

The overall port metric is calculated the same as for the VMAX.

HDS Performance Metrics

The HDS array supports the same metrics for ports that the VNX does:

Metric | Variable | Description
FEPort: Total IOPs | iops | The cumulative number of IO requests for a port (read and write).
FEPort: KbytesTransferred | kbytesTransferred | The cumulative number of kilobytes transferred for read or write.
FEPort, FEAdapt: StatisticTime | sampleTime | A string representing the current time, in the format yyyyMMddHHmmss.SSSSSSsutc, where yyyy is a 4-digit year; MM is the month; dd is the day of the month; HH is the hour (24-hour clock); mm is the minute; ss is the second; SSSSSS is the number of microseconds; and sutc gives the sign and offset from GMT.

The port percent busy calculation is the same for the HDS as it is for the VMAX and VNX (see above).

Note: Cpu metrics (cpu percent busy) are not collected on the HDS array. Therefore the cpu percent busy term is not used in the overall Port Metric Calculation. Otherwise it is the same as for the VMAX.

VPLEX Performance Metrics

ViPR collects VPLEX port metrics by logging into each VPLEX cluster's management server using ssh and reading metrics collection files on the management server. The metrics collected on a VPLEX are:

Metric | Variable | Description
Timestamp | Time | The sample time in format: yyyy-mm-dd hh:mm:ss in UTC.
Director percent busy | director.busy | The percent time the director is busy performing I/O operations.
Director IOPs/second | director.fe-ops | The number of I/O operations executed per second by the director.
Port IOPs/second | fe-prt.ops | The number of I/O operations executed per second by the port.
Port KB read/second | fe-prt.read | The number of kilobytes read per second by the port.
Port KB write/second | fe-prt.write | The number of kilobytes written per second by the port.

These metrics are used to calculate:

  • Percent busy for the port which is computed from kbytesTransferred (both read and write) over the time period since the last valid sample.
  • Percent busy for the cpu which is obtained from the director percent busy.

Note: Both VPLEX management servers must be discovered, as each management server only provides the metrics for its corresponding VPLEX cluster.

Non-Performance Metrics

Two metrics will be collected that are not directly performance based. These are useful for setting limits on how many objects (initiators or volumes) we allow to use a port. These are:

  1. The number of Initiators mapped to a port. This is calculated by searching through all ExportMasks in the ViPR database containing the port and summing the number of Initiators that are using (i.e. zoned to) the port. If a "Discover Unmanaged Volumes" has been performed for the array, the initiator count will also include initiators in the UnManagedExportMasks (export structures that have not been ingested yet) as of the time of that discovery.
  2. The number of Volumes mapped via a port. This is calculated by searching through all ExportMasks in the ViPR database containing the port and summing all the Volumes available via the port. If a "Discover Unmanaged Volumes" has been performed for the array, then the volume count will also include volumes in the UnManagedExportMasks (export structures that have not been ingested yet) as of the time of that discovery.
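
The two counts can be sketched as simple sums over export masks. MaskView below is a hypothetical, simplified stand-in for ExportMask and UnManagedExportMask. As a simplification, every initiator in a mask containing the port is treated as using that port; the real zoning information is not modeled.

```java
import java.util.List;

// Sketch of the non-performance counts, using a hypothetical simplified mask type.
final class PortLoadCounter {
    static final class MaskView {
        final List<String> portIds;
        final List<String> initiatorIds;
        final List<String> volumeIds;

        MaskView(List<String> portIds, List<String> initiatorIds, List<String> volumeIds) {
            this.portIds = portIds;
            this.initiatorIds = initiatorIds;
            this.volumeIds = volumeIds;
        }
    }

    /** Sums the initiators of every mask that contains the port. */
    static int initiatorCount(String portId, List<MaskView> masks) {
        int count = 0;
        for (MaskView mask : masks) {
            if (mask.portIds.contains(portId)) {
                count += mask.initiatorIds.size();
            }
        }
        return count;
    }

    /** Sums the volumes of every mask that contains the port. */
    static int volumeCount(String portId, List<MaskView> masks) {
        int count = 0;
        for (MaskView mask : masks) {
            if (mask.portIds.contains(portId)) {
                count += mask.volumeIds.size();
            }
        }
        return count;
    }
}
```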

II. Computing the Averages and PortMetric

ViPR computes two types of averages related to the Port Metrics. It does this separately for PortPercentBusy and CpuPercentBusy.

  1. A running average is computed for each metric. Averages are accumulated for a user-specified sample period T, which can be configured between 1 and 30 days. The default is to create a new average each day (i.e. T = 1).
    • AVG[n] = (n/(n+1)) * AVG[n-1] + (1/(n+1)) * currentValue, where n = 0, 1, ... is the number of samples already added to the average.
  2. At the end of the running average period we update a historical Exponential Moving Average (EMA) that weights recent changes more heavily than past changes. The weight of the latest sample average (k) vs. the weight of past averages (1-k) can be set by the customer to a value between 0 and 1. A value of 1 means only the latest sample average would be considered, and past averages would not be considered. At the end of sample period T, the EMA is updated as follows:
    • EMA(t) = k * LatestSampleAverage(t) + (1-k) * EMA(t-1)
    The default value for k is 0.6.

Then the final value for the metric is the combination of the current sample average and the EMA:

metricValue = k * LatestSampleAverage(t) + (1-k) * EMA(t-1)
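
The two averages and the final metric value can be sketched as a small accumulator. This is an illustration only: seeding the EMA with the first completed period's average, and the fallbacks used when no EMA or no current samples exist, are assumptions not stated in this design.

```java
// Sketch of the running average, the EMA update, and the combined metric value.
final class PercentBusyAverager {
    private final double k;       // weight of the latest sample average, 0..1
    private double avg = 0.0;     // running average for the current period
    private long n = 0;           // number of samples in the running average
    private double ema = 0.0;     // historical exponential moving average
    private boolean emaValid = false;

    PercentBusyAverager(double k) {
        this.k = k;
    }

    /** AVG[n] = (n/(n+1)) * AVG[n-1] + (1/(n+1)) * currentValue */
    void addSample(double currentValue) {
        avg = (n * avg + currentValue) / (n + 1);
        n++;
    }

    /** Called at the end of sample period T: folds the running average into the EMA. */
    void endPeriod() {
        if (n == 0) {
            return; // nothing was sampled in this period
        }
        // Assumption: the first completed period seeds the EMA.
        ema = emaValid ? k * avg + (1 - k) * ema : avg;
        emaValid = true;
        avg = 0.0;
        n = 0;
    }

    /** metricValue = k * LatestSampleAverage(t) + (1-k) * EMA(t-1) */
    double metricValue() {
        if (n == 0) {
            return ema;  // assumption: fall back to history alone
        }
        if (!emaValid) {
            return avg;  // assumption: fall back to the current average alone
        }
        return k * avg + (1 - k) * ema;
    }
}
```

With k = 0.6, samples of 10 and 20 give a first-period average of 15, which becomes the EMA at the end of the period. A sample of 30 in the next period then yields a metric value of 0.6 * 30 + 0.4 * 15 = 24.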

The port and cpu percent busy values are averaged separately, and the resulting metric value for each is then used in the final computation of the port metric. As introduced for the VMAX, the overall metric is calculated (approximately) according to the following formula:

Double portMetric = 0.0;   // comprised of port and cpu percent busy and volume load, higher is greater usage
if (portPercentBusy >= portUtilizationFloor) {
    portMetric += portPercentBusy;
}
if (cpuPercentBusy >= cpuUtilizationFloor) {
    portMetric += cpuPercentBusy;
    portMetric /= 2;    // evenly weight portPercentBusy and cpuPercentBusy on 0-100% scale
}
// Now account for volume load, 2048 assumed maximum reasonable volume load
portMetric += volumeCoefficient * (volumeCount * 100.0) / 2048;

III. Storing the Metric Values

The current DataModel classes StoragePort and StorageHADomain will each be modified to have a new metrics field. The metrics field will hold a mapping from an enumerated key to a String value. Here are the currently defined keys:

MetricKey | Description
avgPercentBusy | The current running average for percent busy of a port or CPU.
emaPercentBusy | The current EMA that represents history for percent busy of a port or CPU.
idleTicksValue | The value of the cumulative idle ticks counter at the last sample. (Used to calculate deltas between two samples.)
cumTicksValue | The value of the cumulative ticks counter at the last sample. (Used to calculate deltas between two samples.)
kbytesValue | The value of the KbytesTransferred port metric at the last sample. (Used to calculate deltas between two samples.)
iopsValue | The value of the cumulative Iops counter at the last sample. Currently not used, informational only.
lastSampleTime | The time in milliseconds, given as a long, at which the last sample was recorded. (Used to calculate the delta time between two samples.)
avgStartTime | The starting time for the current running average.
avgCount | The number of samples in the current running average.

IV. Customer Settable Parameters

The algorithms will take the following parameters, which may be individually set for a given array type by the customer. They are found under the "Physical" tab under "Controller Config" and then "Port Allocation". EMC supplies defaults upon initial ViPR installation. The parameters fall into these categories:

  • Maximum Limits or Ceilings. There are ceiling values that can be set representing the maximum value a particular metric can have before the port is disqualified for allocation purposes. These include the counts of initiators or volumes using the ports, and the percent busy of the port itself or the cpu affiliated with the port. If any metric for that port is at or above the specified ceiling, the port is disqualified, and will not be used for provisioning. Set the ceilings carefully: if too many ports are disqualified, you may not have enough ports to satisfy allocation requests, or the ports that remain qualified may not have the desired redundancy characteristics.
  • Sample Time Parameters. As explained before, there are two averages that are used: the initial running average used to create a valid metric, and the Exponential Moving Average used to weight the current average against past averages.
  • Floors on Port Percent Busy or Cpu Percent Busy (new in Skywalker). These exist to ignore background "noise" on an idle system. The floor value is the minimum percent busy considered significant. Think of it like the idle speed on your car's tachometer. If a metric (PortPercentBusy or CpuPercentBusy) is below the "Floor" value, it is not counted at all. If PortPercentBusy and CpuPercentBusy are both below the floors, then the only remaining component of usage will be the number of volumes, and volumes will be placed so as to balance the volume count across the ports.
  • Coefficients. These control the relative weighting between different components of the metric calculation.

Parameter | Type | Default Value | Min. Value | Max. Value | Description
Initiator Ceiling | Ceiling | unlimited | 1 | N/A | If the number of initiators using a port is equal to or greater than the ceiling, the port is disqualified from allocation. Use this value to control the absolute maximum limit of the number of initiators that will use a port.
Volume Ceiling | Ceiling | unlimited | 1 | N/A | If the number of volumes using a port is equal to or greater than the ceiling, the port is disqualified from allocation. Use this value to control the absolute maximum limit of the number of volumes that will use a port.
Port Utilization Ceiling | Ceiling | 100% | 0% | 100% | If the port percent busy of a port is equal to or greater than the ceiling, the port is disqualified from allocation. Use this value to set a maximum limit on the port utilization expressed in percent.
Cpu Utilization Ceiling | Ceiling | 100% | 0% | 100% | If the cpu percent busy of a port is equal to or greater than the ceiling, the port is disqualified from allocation. Use this value to set a maximum limit on the cpu utilization expressed in percent.
Days to Average Utilization | Time | 1 | 1 | 30 | The number of days that samples are averaged before being considered a valid "metric".
Weight for Exponential Moving Average | Time | 0.6 | 0 | 1.0 | The weight k given the current metric versus previous metrics ( k * metric[T] + (1-k) * metric[T-1] ).
Port Utilization Floor | Floor | 3% | 0% | 100% | If the port metric is below the floor, it is arbitrarily eliminated from consideration by setting it to 0.0. This should be set just slightly higher than the idle "Port Percent Busy" on a port on which essentially no I/O is being performed.
Cpu Utilization Floor | Floor | 8% | 0% | 100% | If the cpu metric is below the floor, it is arbitrarily eliminated from consideration by setting it to 0.0. This should be set just slightly higher than the idle "Cpu Percent Busy" on an idle array where essentially no I/O is being performed.
Metric Volume Coefficient | Coefficient | 1.0 | 0.0 | 5.0 | A coefficient controlling the importance of the number of volumes in the metric calculation. A port is (arbitrarily) assumed to service a maximum of 2048 volumes. Therefore about 20.5 volumes are assumed to be equivalent to a 1% volume load. The volume load (in percent) is added to the metric containing the port and cpu utilizations. Setting the Volume Coefficient to a higher value will make the volume counts using a port more predominant in the selection of the least used port.
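
The ceiling rule ("equal to or greater than") can be sketched as a single predicate. The class below is hypothetical; "unlimited" is modeled as Long.MAX_VALUE, which is an assumption of this example rather than the way ViPR stores the setting.

```java
// Sketch of the ceiling check, using the documented default values.
final class PortCeilings {
    long initiatorCeiling = Long.MAX_VALUE;   // "unlimited" (assumed representation)
    long volumeCeiling = Long.MAX_VALUE;      // "unlimited" (assumed representation)
    double portUtilizationCeiling = 100.0;    // percent
    double cpuUtilizationCeiling = 100.0;     // percent

    /** A port is disqualified when any metric is at or above its ceiling. */
    boolean isDisqualified(long initiators, long volumes,
                           double portPercentBusy, double cpuPercentBusy) {
        return initiators >= initiatorCeiling
                || volumes >= volumeCeiling
                || portPercentBusy >= portUtilizationCeiling
                || cpuPercentBusy >= cpuUtilizationCeiling;
    }
}
```

Because the comparison is "at or above", lowering the Volume Ceiling to 500 disqualifies a port that already serves exactly 500 volumes.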

V. Port Allocation Algorithm


Here is a summary of the Port Allocation Algorithm, detailing how the Port Metric is used:

  1. A candidate port list is identified based on being in the desired Network and Virtual Array, and the port having a good status.
  2. Ports that are at or above one or more of their ceilings are eliminated from the candidate ports list.
  3. The PortUsageMetric for each port is now calculated by combining the metric values for the Port%Busy and Cpu%Busy using an equal weighting, then adding the weighted volume term (as of Skywalker).
    • If the array does not support Port%Busy or Cpu%Busy metrics we will use the number of volumes mapped to the port.
  4. The first port allocated will be the one with the lowest PortUsageMetric.
  5. Subsequent ports are chosen based on two criteria:
    • Ports sharing hardware components with previously allocated ports (i.e. that do not provide redundancy) are eliminated unless all other equivalent components have been used. This ensures ports on different Directors or Engines will get selected if possible.
    • Of the remaining ports, each port chosen is the one with the lowest Port Usage Metric.
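
Steps 4 and 5 can be sketched as follows. This is a simplified illustration, not the ViPR allocator: redundancy is reduced to a single hypothetical "director" attribute, and the candidates are assumed to have already passed steps 1 to 3.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch: pick ports by lowest metric while preferring unused hardware groups.
final class PortAllocatorSketch {
    static final class Candidate {
        final String id;
        final String director; // hypothetical stand-in for the shared hardware component
        final double metric;

        Candidate(String id, String director, double metric) {
            this.id = id;
            this.director = director;
            this.metric = metric;
        }
    }

    static List<String> allocate(List<Candidate> candidates, int count) {
        List<Candidate> remaining = new ArrayList<>(candidates);
        Set<String> usedDirectors = new HashSet<>();
        List<String> chosen = new ArrayList<>();
        while (chosen.size() < count && !remaining.isEmpty()) {
            // Prefer ports that do not share a director with an already chosen port.
            List<Candidate> pool = new ArrayList<>();
            for (Candidate c : remaining) {
                if (!usedDirectors.contains(c.director)) {
                    pool.add(c);
                }
            }
            if (pool.isEmpty()) {
                // All directors have been used; start another round over what is left.
                usedDirectors.clear();
                pool = new ArrayList<>(remaining);
            }
            Candidate best = pool.get(0);
            for (Candidate c : pool) {
                if (c.metric < best.metric) {
                    best = c;
                }
            }
            chosen.add(best.id);
            usedDirectors.add(best.director);
            remaining.remove(best);
        }
        return chosen;
    }
}
```

Given ports A (director 1, metric 10), B (director 1, metric 5), C (director 2, metric 50) and D (director 2, metric 40), a request for three ports yields B, D, A. D is chosen second despite its higher metric because A shares a director with B.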

VI. Effect of Port Ceilings on NumPaths Matcher


Ports with one or more metrics at or above their “ceiling” values are not used for allocation. Consequently, such ports are not “usable” according to the criteria for the NumPaths matcher.

The NumPaths matcher will disqualify Storage Pools that belong to an array where the number of “usable” ports on the array is less than max_paths in the Vpool. Here a “usable” port must:

  • Be registered and have a port status of OK or UNKNOWN (not BAD).
  • Be a “front-end” port.
  • Be assigned to a Network.
  • Not have any port metric at or above a “ceiling” value (that would disqualify the port). (new)
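
The usability criteria and the matcher's decision can be sketched as follows. PortView is a hypothetical flattened view of a port; the status strings are taken from the list above.

```java
import java.util.List;

// Sketch of the "usable port" test and the NumPaths qualification rule.
final class NumPathsSketch {
    static final class PortView {
        final boolean registered;
        final String status;        // "OK", "UNKNOWN" or "BAD"
        final boolean frontEnd;
        final boolean inNetwork;
        final boolean overCeiling;  // true if any metric is at or above its ceiling

        PortView(boolean registered, String status, boolean frontEnd,
                 boolean inNetwork, boolean overCeiling) {
            this.registered = registered;
            this.status = status;
            this.frontEnd = frontEnd;
            this.inNetwork = inNetwork;
            this.overCeiling = overCeiling;
        }
    }

    static boolean isUsable(PortView p) {
        boolean statusOk = "OK".equals(p.status) || "UNKNOWN".equals(p.status);
        return p.registered && statusOk && p.frontEnd && p.inNetwork && !p.overCeiling;
    }

    /** The array's pools stay eligible only if usable ports >= max_paths. */
    static boolean arrayQualifies(List<PortView> ports, int maxPaths) {
        int usable = 0;
        for (PortView p : ports) {
            if (isUsable(p)) {
                usable++;
            }
        }
        return usable >= maxPaths;
    }
}
```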

VII. Using Array Metrics to Affect Pool Selection

It is desirable to compute an overall array usage metric, and to use that value to steer allocations to arrays that are not over-utilized. For this purpose we will compute an array usage metric as the average of the Port Usage Metric for all “usable” ports on the array, where a usable port here must have status “OK” or “Unknown”, be REGISTERED, and be assigned to a Network. (Arrays with no usage metric, i.e. not calculated for some reason, would be set at 50% usage.)

The Pool Selection code then is changed as follows:

  1. The Storage Pools matching the attributes in the Vpool are sorted in descending order of space available.
  2. For a given allocation request, the Storage Pool with the lowest array usage metric is chosen that has the required space.

This will result in placing the volume in the array with the lowest overall usage.
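
A minimal sketch of the array usage metric and the changed pool selection follows. PoolView and the capacity unit are hypothetical. Breaking ties between equally used arrays by larger free space is an assumption that follows from sorting first and then keeping the first pool with the lowest usage.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of array usage averaging and usage-aware pool selection.
final class PoolSelectionSketch {
    static final double DEFAULT_ARRAY_USAGE = 50.0; // used when no metric is available

    static double arrayUsage(List<Double> usablePortMetrics) {
        if (usablePortMetrics == null || usablePortMetrics.isEmpty()) {
            return DEFAULT_ARRAY_USAGE;
        }
        double sum = 0.0;
        for (double m : usablePortMetrics) {
            sum += m;
        }
        return sum / usablePortMetrics.size();
    }

    static final class PoolView {
        final String id;
        final long freeGb;        // hypothetical capacity unit
        final double arrayUsage;  // usage metric of the array owning the pool

        PoolView(String id, long freeGb, double arrayUsage) {
            this.id = id;
            this.freeGb = freeGb;
            this.arrayUsage = arrayUsage;
        }
    }

    /** Returns the id of the selected pool, or null if no pool has enough space. */
    static String selectPool(List<PoolView> matchingPools, long requiredGb) {
        List<PoolView> sorted = new ArrayList<>(matchingPools);
        sorted.sort((a, b) -> Long.compare(b.freeGb, a.freeGb)); // descending free space
        PoolView best = null;
        for (PoolView p : sorted) {
            if (p.freeGb < requiredGb) {
                continue;
            }
            if (best == null || p.arrayUsage < best.arrayUsage) {
                best = p;
            }
        }
        return best == null ? null : best.id;
    }
}
```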

VIII. API and UI Changes

The current StoragePortRestRep seems to have two fields that are metrics based:  "avg_band_width" and "static_load". As far as I can tell, these fields are not really populated. I propose we replace these with the following new fields that would be read-only for use by the UI (and other tools) in reporting metrics.

Field in StoragePortRestRep | Type | Description of Field
port_allocation_metric | Float | The overall port metric that has been computed for port allocation purposes. This value will be in percent (0-100%). Arrays that do not support port metrics will supply an empty field (i.e. <port_allocation_metric></port_allocation_metric>). In this case the UI should display "N/A" for not available or not applicable.
port_percent_busy | Float | A number between 0 and 100% that represents the average amount of bandwidth transferred vs. the maximum bandwidth that could be transferred as computed by the port speed. (Not all array types may support this metric. If not supported, it will be 0%.)
cpu_percent_busy | Float | A number between 0 and 100% that represents the average percent of time the cpu containing this port is non-idle. (Not all array types may support this metric. If not supported, it will be 0%.)
initiator_load | Integer | The number of Initiators ViPR is aware of that are using this port.
volume_load | Integer | The number of Volumes ViPR is aware of that are mapped through this port.
allocation_disqualified | Boolean | A boolean that when true indicates one of the port metrics was at or above a ceiling value that disqualifies it for allocation.


IX. Testing Strategies

There are three different types of testing that need to be done for this work. They will be outlined here.

  1. Testing that we make the right decisions given metrics for ports: the allocator uses the metric, the ceilings correctly disqualify ports, and the NumPathsMatcher and Storage Pool Selection behave as designed.
    1. For this type of testing, the desired metrics could be injected directly into the database records for the StoragePort and associated StorageHADomain. (Note: I am not planning API support for writing metrics.)
  2. Testing the calculation of the metrics themselves. We inject various loads on a port and ideally compare the result with some other barometers (indications on the array itself, maybe SRM).
    1. QE will then need to run provisioning operations against the array, and run load on provisioned volumes. As load on some ports increases, others should be used. This can be done using simple tools like IOMeter or Adios; a stretch goal is to use workloads that are representative of typical customer workloads (VMAX folks should have this). QE then needs to track how the load distribution across ports on an array changes over time for allocated ports that are seeing I/O.
    2. QE should run the above experiment with today's ViPR to establish a baseline before incorporating ViPR bits that have perf-based port selection.
    3. We will want to add the metrics generation into the array simulator (at least for the VMAX) (Please Jai...)
  3. Testing the longer-term aspects of how the algorithm actually does over time.
    1. Another aspect is generating some data to show how the port allocation algorithms did over time; PMs could use this information when talking to customers. This can be done as described in item 2, step 1, but over a longer time period.


X. Notes on Testing (important)

  1. If testing on a single node system, you must set the controllersvc variable to allow metrics collection: "controller_enable_auto_discovery_metering_scan_single_node_deployments=true". If this variable is not set, no metering will be run on single node systems and thus no metrics will be collected.
  2. If using the AIO Simulator to test metrics, note that only ECOM462 and VPLEX support metrics. ECOM80 does not support metrics collection.
  3. By changing the "Days to Average Utilization" variable to a value greater than 10000, the value is interpreted as minutes to average (for faster testing times). The number of minutes used is the setting minus 10000. Thus a value of 10060 would be interpreted as 60 minutes.
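
The interpretation of the "Days to Average Utilization" setting described in note 3 can be sketched as follows; the class and method names are hypothetical.

```java
// Sketch: convert the "Days to Average Utilization" setting to minutes.
final class AveragingPeriod {
    static final long TEST_OFFSET = 10000L;

    static long toMinutes(long setting) {
        if (setting > TEST_OFFSET) {
            return setting - TEST_OFFSET; // test mode: the remainder is already in minutes
        }
        return setting * 24L * 60L;       // normal mode: the setting is in days
    }
}
```

A setting of 10060 gives 60 minutes, while the default setting of 1 gives 1440 minutes (one day).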