
High Level Functional Overview and Design Specifications

Introduction

LPARs (Logical Partitions) are virtual machines on IBM P-Series systems. LPAR Mobility allows migration of LPARs from one Managed System to another. On these systems, the Virtual I/O Server (VIOS) allows sharing of physical resources between LPARs, including virtual SCSI, virtual Fibre Channel (NPIV), and virtual networking.

N_Port ID Virtualization (NPIV) is a standardized method for virtualizing a Fibre Channel port. An NPIV-capable Fibre Channel HBA can have multiple N_Port IDs, each with a unique identity and worldwide port name (WWPN). On the SAN switch, the physical FC ports log in on switch ports with their physical WWPNs. On the same switch port, subsequent N_Port IDs carry the virtual worldwide names (vWWNs) of the virtual Fibre Channel client adapters in the client partition. These virtual NPIV worldwide names start with C0:--:--:--:--:--:--:--.

NPIV is also used in non-IBM environments (e.g. VMware).

On LPARs, WWPNs exist in pairs, and each pair is associated with one virtual Fibre Channel client adapter. At any given time, only one of the WWPNs in the pair is active. If I1 and I2 are the paired initiator WWPNs and I1 is active, then after a successful partition migration I2 becomes active and I1 becomes inactive. This switching between active and inactive is handled by the HMC (or other IBM management software).

Problem Statement

Based on the above introduction, assume the user is migrating an LPAR whose paired initiators are I1 and I2, where I1 is the active WWN logged into the switch. The export mask therefore contains I1 along with the volume to be exported. Once the migration completes, I2 becomes active and I1 becomes inactive. The exported volume is no longer visible to the migrated LPAR, and all user operations on it are disrupted. This feature addresses this problem so that the exported volume remains visible to the migrated LPAR and user operations are seamless.

One goal of this feature is to introduce paired initiators, in which two initiators are paired with each other and treated as a single entity. This can also work in non-IBM environments. There is no explicit support for NPIV and VMs in CoprHD today. Some customers have worked around this limitation by adding VMs with NPIV as physical hosts in the UI. While this works to an extent, there are several drawbacks to this approach: for example, VMs show up as hosts in the UI, and CoprHD may apply host-specific actions that are not applicable to VMs.

Functional Requirements

JIRA ID | 1-Line Description | Comments, notes, etc.

COP-18546

CoprHD should discover active/passive HBAs and use them in zoning and masking (LPAR)


This is to support IBM live partition mobility (LPM) usecase.

  • Ability to distinguish a virtual host from a physical host.
  • Ability to pair the initiators of a virtual host.
  • When exporting a volume, CoprHD must add both initiators to the masking view.
  • During zoning, CoprHD must zone the paired initiators with the same set of storage ports.
  • CoprHD should not maintain or monitor the states (active or passive) of initiators. Switching back and forth between states should not trigger any alert in CoprHD, nor does any action need to be taken.
  • From a multipath perspective, a pair of initiators counts as a single path to a storage port.
  • Both initiators in a pair must belong to the same network.
  • Shutting down the virtual host or changes in the virtual host's initiator(s) should not create actionable events.

Design Approach

High Level Implementation

Design Summary: This covers what has changed from the previous review; see LPAR Mobility - Design Changes.

Data Model

Host: A new attribute virtualMachine is added to the existing Host CF. The user can add a virtual host in CoprHD by setting this field to true.

Since LPARs are added manually in CoprHD (no auto-discovery), no events are triggered when these hosts or initiators are down.

Initiator: A new attribute "associatedInitiator" is added to the "Initiator" CF to link one initiator with another. This enables us to add a pair of initiators to a virtual host and to treat the pair as a single unit.

The states of the initiators (active or passive) are not maintained or monitored by CoprHD. The pair of initiators will be treated as one entity for zoning, masking and all other operations.

If either initiator in the pair is added to or removed from a Host or Network, both should get added or removed. (See flowchart below.)
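To make the data model concrete, here is a minimal Java sketch of the two new attributes, shown as plain fields with getters and setters. This is an illustrative assumption only; the actual implementation adds these fields to the existing Host and Initiator column families with CoprHD's column-family metadata, which is omitted here.

    import java.net.URI;

    // Minimal sketch (an assumption, not the final CoprHD schema) of the two new attributes.
    class HostSketch {
        private Boolean virtualMachine = Boolean.FALSE; // true => this Host represents an LPAR / virtual host

        Boolean getVirtualMachine() { return virtualMachine; }
        void setVirtualMachine(Boolean virtualMachine) { this.virtualMachine = virtualMachine; }
    }

    class InitiatorSketch {
        private URI associatedInitiator; // URI of the paired initiator; null for ordinary initiators

        URI getAssociatedInitiator() { return associatedInitiator; }
        void setAssociatedInitiator(URI associatedInitiator) { this.associatedInitiator = associatedInitiator; }
    }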

Storage Port Assignment

  1. When a volume is exported to a Virtual Host, both initiators in the pair must be zoned to the same set of storage ports. Say I1 and I2 are the paired initiators of a Virtual Host and a volume is being exported through storage ports P1 and P2. Then both I1 and I2 are zoned with both P1 and P2, so the zoning map looks like this:

    I1 → {P1, P2}
    I2 → {P1, P2}
    This ensures that even if an LPAR is migrated from one managed system to another, the volume remains accessible to the LPAR via the same storage ports.
  2. Only one initiator should be considered when calculating the paths from host to storage port, since at any given time only one of the initiators is active.

    DefaultStoragePortAssigner: This is the place where storage ports are assigned to initiators to construct the zoningMap:
    Procedure: assignPortsToHost (See flowchart below)
    1. If there are no storage ports assigned to an initiator (I1), then
       1a. Check if its associated-initiator (I2) has storage ports assigned.
       1b. If yes, then assign the same storage ports to I1
       1c. Remove I1 from the initiator list.
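A minimal Java sketch of the steps above, under the simplifying assumption that the zoning map is a plain Map from initiator URI to the set of assigned port URIs (the real DefaultStoragePortAssigner operates on database objects). The class and method names here are illustrative, not the final code:

    import java.net.URI;
    import java.util.HashSet;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    // Sketch of steps 1a-1c: give an unassigned initiator the same ports as its pair,
    // then drop it from the list so the pair is not counted as an extra path.
    class PairedPortAssignmentSketch {

        static void copyPortsFromAssociatedInitiators(List<URI> initiators,
                                                      Map<URI, URI> associatedInitiatorOf,
                                                      Map<URI, Set<URI>> zoningMap) {
            Iterator<URI> it = initiators.iterator();
            while (it.hasNext()) {
                URI i1 = it.next();
                Set<URI> i1Ports = zoningMap.get(i1);
                if (i1Ports != null && !i1Ports.isEmpty()) {
                    continue; // I1 already has storage ports assigned
                }
                URI i2 = associatedInitiatorOf.get(i1);            // 1a: look up the associated initiator
                Set<URI> i2Ports = (i2 == null) ? null : zoningMap.get(i2);
                if (i2Ports != null && !i2Ports.isEmpty()) {
                    zoningMap.put(i1, new HashSet<>(i2Ports));     // 1b: assign I1 the same ports as I2
                    it.remove();                                   // 1c: remove I1 from further assignment
                }
            }
        }
    }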

Array drivers: This feature is generic to all storage arrays, so no changes are required to the array drivers.

Co-existence, Ingestion and Upgrade cases

Test Case# | Category : Test Case Name | Test Case Description | Expected Result | Actual Result

1

Co-existence : New LPAR co-existence with Old LPAR

 

 

    • LPAR ‘old-lpar’ present in pre-4.0 and has some volumes exported
    • Upgrade to 4.0
    • Add a new LPAR ‘new-lpar’ after upgrade as a virtual host
    • ( Add Volume ) Perform new volume export operations to new LPAR
    • ( Add Volume ) Perform new volume export operation to old LPAR
    • ( Remove Volume ) Perform un-export volume operation from 'old-lpar'
    • ( Remove Volume ) Perform un-export volume operation from 'new-lpar'
  • Export operations to ‘old-lpar’ and ‘new-lpar’ work without any issues
  • Un-export operations from ‘old-lpar’ and ‘new-lpar’ work without any issues.

 

2

Co-existence : Fake cluster and new LPAR

 

 

    • Fake cluster ‘fake-cluster’, having 2 LPARs (one for the active HBA and the other for the passive HBA), is present in pre-4.0 and has some volumes exported.
    • Upgrade to 4.0
    • Add a new LPAR ‘new-lpar’ after upgrade as a virtual host to fake cluster ‘fake-cluster’
    • ‘new-lpar’ should have both active and passive HBAs added
    • Perform volume export operations to ‘fake-cluster’
  • This falls in the category of invalid configuration, and we do not expect users to create such a configuration.

 

3

Co-existence : New LPAR co-existence with Old LPAR with any OS combinations

 

    • An existing LPAR and new LPAR must be able to co-exist with any other OS host  (regular AIX, Linux, HPUX, Solaris…) for a given Storage Array (VPLEX or VMAX initially, but also VNX, XIO, Unity, DellSC…)
    • LPAR ‘old-lpar’ present in pre-4.0; this LPAR runs Windows
    • Upgrade to 4.0
    • Add a new LPAR ‘new-lpar-aix’ as a virtual host
    • (Add Volume) Perform volume export operations to both lpars
    • (Remove Volume) perform volume un-export operation from both lpars

 

  • Export operations should go through with no issues.
  • Un-export operation works with no issues

 

4

Co-existence : New LPAR co-existence with ESX or Windows cluster

    • A new LPAR must be able to co-exist with any other cluster (ESX, Windows, regular AIX, Linux, HPUX, Solaris…) for a given Storage Array (VPLEX or VMAX initially, but also VNX, XIO, Unity, DellSC…)
    • A cluster exists in pre 4.0
    • Upgrade to 4.0
    • Add a new LPAR ‘new-lpar-aix’ as a virtual host
    • (Add Volume) Perform volume export operations to both lpars
    • (Remove Volume) perform volume un-export operation from both lpars
  • Export operation works with no issues
  • Un-export operation works with no issues

 

5

Upgrade : Old LPAR update

 

 

    • A host (consider both manually added and discovered hosts) in ViPR (3.6) has some initiators and some volumes exported to it. After upgrading to Leia (or whatever release has the LPAR support), the user tries to add a passive initiator to pair with that host's existing active initiator.

 

  • ViPR-C should reject this attempt to add a passive initiator since the host is not a virtual host.

 

 

6

Upgrade : Adding existing initiator WWN

    • A host in ViPR (3.6) has an initiator and some volumes exported to it. The admin tries to add a new LPAR, but the active (or passive) virtual initiator has the same WWN as the existing host's WWN.

 

  • ViPR-C should reject this attempt, since initiator already exists in ViPR DB.

 

7

Upgrade : LPAR with its VIO server

    • Admin adds an LPAR where the underlying VIO server is already part of ViPR DB, and some volumes are exported to it
  • ViPR-C should be able to support this – simultaneous export of volumes to LPAR and the physical server on which that LPAR is running

 

8

Upgrade : Active and passive initiators belong to different networks

    • Admin adds an LPAR where active initiator is part of one network, but the passive initiator belongs to another network
  • This is an invalid use case; we need to elaborate on why it is invalid

 

9

Ingestion : Volumes exported to LPAR outside ViPR

    • LPAR has volumes exported before it is brought into ViPR, it has both active and passive initiators
    • Add this LPAR to ViPR
    • Ingest volumes exported to LPAR and bring them into ViPR

 

 

10

Ingestion : Volumes exported to LPAR cluster outside ViPR

    • Cluster of LPARs created and has volumes exported to it before it is brought into ViPR. All LPARs present in cluster have active and passive initiators
    • Add this cluster along with LPARs into ViPR
    • Ingest volumes exported to cluster, ingest volumes exported to individual hosts

 

 

11

Ingestion : SRDF protected volumes exported to LPAR host and LPAR cluster

    • LPAR cluster and or LPAR hosts have SRDF protected volumes exported
    • Add these cluster and host to ViPR
    • Ingest volumes exported to these cluster and host

 

 

12

Ingestion : Volumes with FAST policies exported to LPAR host and LPAR cluster

    • LPAR cluster and or LPAR hosts have FAST policy volumes exported
    • Add these cluster and host to ViPR
    • Ingest volumes exported to these cluster and host

 

 

13

Co-existence : Cluster with fake host and new LPAR

    • A cluster of LPARs is represented as a cluster with a host having the active initiator and a fake host having the passive initiator
    • Upgrade to 4.0
    • Add a new LPAR ‘new-lpar’ after upgrade as a virtual host to the cluster having the fake host
    • ‘new-lpar’ should have both active and passive HBAs added
    • Perform volume export operations to cluster

This falls in the category of invalid configuration

 

14

Co-existence : VMAX : 1 MV + Non-Cascaded SG + FAST 1
    • LPAR present in the data-center with one FAST volume already exported
    • ViPR gets deployed
    • Create LPAR as a virtual host in ViPR
    • Perform following operations
      • Create Export
      • Add Volume ( FAST policy )
      • Remove volume
      • Add Initiator
      • Remove Initiator


Create Export : The MV is reused, a new export mask gets created in ViPR, and volumes are added to it.

Add Volume : The volume gets added to the non-cascaded SG in MV1

Remove Volume : The volume will be removed, but the MV remains on the storage system as it was not created by ViPR. If the external volumes are removed by the user and the volume being removed from ViPR is the last volume, then such an MV will be deleted.

Add Initiator : The initiator gets added to the IG in MV1

Remove Initiator : The initiator gets removed from the ViPR mask, but it will not be removed from the IG on the array.



15

Ingestion : VMAX : 1 MV + Non-Cascaded SG + FAST 1
  • LPAR present in the data-center with one FAST volume already exported
  • ViPR gets deployed
  • Create LPAR as a virtual host in ViPR
  • Ingest the volume already exported into ViPR
  • Perform following operations
    • Add Volume ( FAST policy )
    • Remove volume
    • Add Initiator
    • Remove Initiator

Add Volume : The volume gets added to the non-cascaded SG in MV1

Remove Volume : The volume is removed, and if it is the last volume, then MV1 will be deleted.

Add Initiator : The initiator gets added to the IG in MV1

Remove Initiator : The initiator is removed from the IG. MV1 will be deleted if this is the last initiator to be removed.


16

Co-existence : VMAX :

2 Masking Views

MV 1 + Non-Cascaded SG + FAST 1

MV 2 + Non-Cascaded SG + FAST 2

  • LPAR present in the data-center with one FAST volume already exported
  • ViPR gets deployed
  • Create LPAR as a virtual host in ViPR
  • Perform following operations
    • Create Export
    • Add Volume ( FAST policy )
    • Remove volume
    • Add Initiator
    • Remove Initiator

Create Export (FAST1) : MVs reused; 2 export masks are created in ViPR, and the volume gets added to the masking view based on the FAST policy selected, in this case MV 1.

Add Volume ( FAST1 ) : The volume gets added to the non-cascaded SG in MV1

Remove Volume : The volume gets removed from the right MV. The MV will be removed if it is the last volume to be removed from that MV

Add Initiator : The initiator is added to the IG; as the IG is shared across these masking views, the initiator indirectly gets added to both MVs.

Remove Initiator : The initiator gets removed from the ViPR mask, but it will not be removed from the IG on the array.


17

Ingestion : VMAX :

2 Masking Views

MV 1 + Non-Cascaded SG + FAST 1

MV 2 + Non-Cascaded SG + FAST 2

  • LPAR present in the data-center with one FAST volume already exported
  • ViPR gets deployed
  • Create LPAR as a virtual host in ViPR
  • Ingest the volume already exported into ViPR
  • Perform following operations
    • Add Volume ( FAST policy )
    • Remove volume
    • Add Initiator
    • Remove Initiator

Add Volume ( FAST1 ) : The volume gets added to the non-cascaded SG in MV1

Remove Volume : The volume gets removed from the right MV. The MV will be removed if it is the last volume to be removed from that MV

Add Initiator : The initiator is added to the IG; as the IG is shared across these masking views, the initiator indirectly gets added to both MVs.

Remove Initiator : The initiator gets removed from the ViPR mask; if it is the last initiator to be removed, then the IG gets deleted and so does the MV.



Exclusions and Limitations

    • The user needs to add the paired initiator manually to the Virtual Host.

Future Work and Related Projects

      • The project for supporting IBM PowerVC depends on and will use the APIs implemented in this project.

      Management Station (HMC) Discovery

      The details of LPARs and their active/passive (NPIV-enabled) WWNs can be obtained from the HMC (Hardware Management Console).

      Discovery of the HMC will be similar to discovery of vCenter. All the managed systems and all the NPIV-enabled LPARs of each managed system will be discovered. For each LPAR we will get the active and passive WWNs. The discovered information will be persisted in a new CF, "ManagementStation".

      Discovery requires the column families listed below:

      1. Management Station
      2. Managed System (similar to an ESX server)
      3. Host with the virtualMachine attribute.
    • Ingestion of exported LPAR volumes into CoprHD.

Implementation Strategy

Implementation Timeframe

This project is targeted for the 3.5 SP1 release.

RISK: Timely Infrastructure availability for testing the live partition mobility

Is this a multi-phase project? No

Virtual Team

Who will work on this change? Herlekar, Amit; Saurabh, Amit; Kiran Udupa; Nimai Sood; Kumar, Pooja

Does the virtual team span multiple physical teams? If so, how are you going to share responsibilities and ensure consistency? No

Testing Strategy

This section will describe how existing tests or methods can be applied to this new functionality, or what new test capability should be added to enable testing here. Test link should reflect the old/new testcase mapping as desired. Overall testing scope should cover the following:

                Manual testing:

      • System level API Testing covering functional areas
      • End to End usecase test approach

      • Negative tests

      • Co-existence and ingestion
      • LPAR testing will be covered on VMAX and VPLEX

                Automation:

      •  API based automation for testing new APIs
      • Catalog/API based automation for testing impacted areas
      • Testing priority for arrays is as follows: VMAX, VPLEX, VNX, UNITY, XtremIO

                Migration of existing customer:

      • Migration script or document will be developed and validated

                Regression:

      • Combination of automation and manual testing

                Lab requirements:

      • HSM and VIOS servers and arrays are available in lab
      • Need to configure the required LPAR and LPAR clusters
      • Dummy hosts and array simulators will be used for automation                 

High-level use cases that will be covered

      • Addition of LPARs and pairing of their initiators.
      • Block provisioning to an LPAR
      • Perform LPAR mobility outside of CoprHD and check the behavior.
      • Volume export for multiple hosts
      • Ingestion use cases for volumes already provisioned outside CoprHD.

All the above cases will be tested for the LPAR and VMAX combination. (Testing priority for other arrays is as follows: VNX, UNITY, XtremIO.)

One of the sample test cases that needs to be executed:

      1. Discover the underlying storage system (VMAX, VNX)
      2. Add the LPAR partition as a virtual host. Validate that the virtualMachine attribute is present in the Host CF.
      3. Add a pair of initiators to the virtual host.
      4. Export a volume to the virtual host and validate that the export succeeds and data can be written
      5. Run the LPAR mobility outside ViPR to move the partition under another VIO server
      6. After the partition is moved, check that the exported volume is still accessible and all operations (export, unexport, delete, expand) work


*Test plans will be attached to the wiki shortly.

Documentation Strategy

This section describes how the new feature affects IDD deliverables. For example, are there new error messages, online help components, CLI commands, etc.?

Impact Evaluation

Public APIs and public functionality (end-user impact)

New APIs

Function | REST API | Comments
Create initiator pair for the host
POST https://<ipaddress>:4443/compute/hosts/{id}/paired-initiators
Request body:
<paired_initiator_create>
<first_initiator>
<protocol>FC</protocol>
<initiator_node>C0:00:00:00:C9:D0:20:C3</initiator_node>
<initiator_port>C0:00:00:00:C9:D0:20:C4</initiator_port>
<name>virtualport1</name>
</first_initiator>
<second_initiator>
<protocol>FC</protocol>
<initiator_node>C0:00:00:00:C9:D0:20:C1</initiator_node>
<initiator_port>C0:00:00:00:C9:D0:20:C2</initiator_port>
<name>virtualport2</name>
</second_initiator>
</paired_initiator_create>

Link (pair) one initiator to another initiator
PUT https://<ipaddress>:4443/compute/initiators/{id1}/associate/{id2}

Delink initiator from another initiator
PUT https://<ipaddress>:4443/compute/initiators/{id1}/dissociate
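For illustration, a hedged Java sketch of invoking the proposed paired-initiator API is shown below. The VIP address, host id, and auth token are placeholders, and the standard Java 11 HTTP client is used only for brevity; the payload mirrors the request body in the table above.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    // Sketch of calling POST /compute/hosts/{id}/paired-initiators (placeholders for VIP, host id and token).
    class PairedInitiatorClientSketch {
        public static void main(String[] args) throws Exception {
            String vip = "coprhd.example.com";                    // placeholder for <ipaddress>
            String hostId = "urn:storageos:Host:placeholder";     // placeholder for {id}
            String body =
                  "<paired_initiator_create>"
                + "<first_initiator><protocol>FC</protocol>"
                + "<initiator_node>C0:00:00:00:C9:D0:20:C3</initiator_node>"
                + "<initiator_port>C0:00:00:00:C9:D0:20:C4</initiator_port>"
                + "<name>virtualport1</name></first_initiator>"
                + "<second_initiator><protocol>FC</protocol>"
                + "<initiator_node>C0:00:00:00:C9:D0:20:C1</initiator_node>"
                + "<initiator_port>C0:00:00:00:C9:D0:20:C2</initiator_port>"
                + "<name>virtualport2</name></second_initiator>"
                + "</paired_initiator_create>";

            HttpRequest request = HttpRequest.newBuilder(
                    URI.create("https://" + vip + ":4443/compute/hosts/" + hostId + "/paired-initiators"))
                .header("Content-Type", "application/xml")
                .header("X-SDS-AUTH-TOKEN", "placeholder-token") // ViPR/CoprHD auth token header (placeholder value)
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

            HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }

A successful call is expected to create both initiators on the host and link them to each other via their associatedInitiator references.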

Changes in existing APIs

REST API | Changes in input or output | Comments
Add a host with the Virtual Machine property set to true. Modification: a new optional request parameter "virtual_machine" has been added.
POST https://<ipaddress>:4443/compute/hosts
<host_create>
<type>Linux</type>
<name>hostLpar1</name>
<virtual_machine>true</virtual_machine>
<port_number>22</port_number>
<user_name>root</user_name>
<password>dangerous</password>
<host_name>abc.com</host_name>
<use_ssl>true</use_ssl>
<discoverable>false</discoverable>
<tenant>{tenant_id}</tenant>
</host_create>

Backward compatible


Update physical host to virtual host
PUT https://<ipaddress>:4443/compute/hosts/{id}
<host_update>
<type></type>
<host_name></host_name>
<virtual_machine>true</virtual_machine>
<os_version></os_version>
<name></name>
<port_number></port_number>
<user_name></user_name>
<password></password>
<use_ssl></use_ssl>
<cluster></cluster>
<vcenter_data_center></vcenter_data_center>
<project></project>
<discoverable></discoverable>
<tenant></tenant>
<boot_volume></boot_volume>
</host_update>

Changes in GUI

    1. In the list hosts page, add a new column to display whether the host is virtual or not
    2. In the Add Host page, provide a checkbox that lets the user select whether the host is a virtual machine or a physical host.
    3. In Add Initiator to Host, provide an option to add the associated initiator

Changes in CLI

    1. Update the existing CLI command for adding an initiator to a host so that it can add a paired initiator.

Some Use Cases

S.No | Scenario/use case | How to handle | Comments
1 | Exporting a volume to an LPAR partition in CoprHD.

Prerequisites:

Create a host with auto discovery false and Virtual Machine true.

Add a pair of initiators to it.

Create a volume on VMAX.

Export this volume to the host (LPAR).

In the APISVC, the virtual machine's initiator pair is fetched and the export orchestration controller is invoked. Ultimately, an export mask is created with both initiators added to it.
2 | An LPAR already discovered in CoprHD as a normal host with volumes exported to it. The customer upgraded CoprHD to a new version.

Proposal: To be discussed

a. Modify the host to virtual host

b. Add new Initiator to host and pair it.

c. Export volumes to the host.


3 | Ingestion of exported volumes: Volumes exported to LPARs are present in the storage array but not in CoprHD. The customer wants to ingest them.

Storage type: "Exclusive":

CoprHD should scan the masking views for the volume and find the closest masking view for the virtual machine's initiators. If a match is found, assign the volume to the export group.

This will be addressed in future releases.
4 | Execute LPAR mobility outside CoprHD

Volume should be accessible after migration of LPAR.


No changes are required in CoprHD.
5 | Shutting down of LPAR | Initiators will be logged off. No event should be triggered.

Other components and component interaction

If the change spans multiple components and/or changes interaction between components, can it create any new problems?

Does this change affect dependencies between services?

Does it change startup or shutdown order?

Does it require any new management interfaces?

Detailed design

Changes in existing operations

Operation | Description | Steps
Create/Update Export Group: New masking view, new Initiator Group | A new masking view is created on the array when there are no existing masking views. | Both initiator WWNs are added to the new initiator group; this newly created initiator group is used to create the new masking view.
Create/Update Export Group: New masking view, existing Initiator Group | A new masking view is created on the array when there are no existing masking views; existing initiator groups will be used.

Check if both initiator WWNs exist in the initiator group. If not, add the initiator WWN that is absent.

Use the initiator group to create the new masking view.

Create/Update Export Group: Existing masking view, existing Initiator Group | A masking view exists on the array with an initiator group.

Check if both initiator WWNs exist in the initiator group. If not, add the initiator WWNs that are absent.

Add Initiator | The user adds an initiator of a virtual host manually.

If the user wants to add the corresponding paired initiator, there will be a field to capture it in the add-initiator request.

Update the export group:
a. Update the underlying initiator group by adding the pair of initiators.
b. Update the export mask.

Remove Initiator | The user removes an initiator of a virtual host manually.

When the user removes an initiator and that initiator belongs to a pair:

Update the export group (see the sketch after this table):
a. Get the corresponding paired initiator.
b. Update the underlying initiator group by removing the pair of initiators.
c. Update the export mask.
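The three flows above share one invariant: after the operation, both WWNs of the pair must be present in the initiator group. A minimal sketch of that check, assuming the initiator group is viewed as a simple set of WWN strings (the real code goes through the array provider), could look like this:

    import java.util.Arrays;
    import java.util.Set;

    // Sketch of the common "both WWNs of the pair must be in the IG" check used by the flows above.
    class InitiatorGroupSketch {

        /** Ensure both WWNs of a pair are present in the initiator group; add any that are missing. */
        static void ensurePairInGroup(Set<String> initiatorGroupWwns, String wwn1, String wwn2) {
            for (String wwn : Arrays.asList(wwn1, wwn2)) {
                if (!initiatorGroupWwns.contains(wwn)) {
                    initiatorGroupWwns.add(wwn); // add the missing initiator WWN to the IG
                }
            }
        }
    }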

Changes in classes and functions

Class | Method | Changes
InitiatorService | associateInitiator | New function that pairs/associates two initiators with each other.
NetworkService

checkAndFilterAddEndpoints

checkAndFilterRemoveEndpoints

If an initiator has an associated initiator, then both initiator WWNs are added to the list of endpoints.

If an initiator has an associated initiator, then both initiator WWNs are removed from the list of endpoints.

ExportGroupService

createExportGroup

updateExportGroup

createExportGroup:

    • While checking for network connectivity, if one of the initiators in the pair is in the network then validation is successful.

updateExportGroup:

    1. addInitiators: If the initiator being added has an associatedInitiator, then both initiators are added to the request to BlockController
    2. removeInitiators: If the initiator being removed has an associatedInitiator, then both initiators are removed from the request to BlockController (see the sketch after this table)
NetworkScheduler | placeZones

Since the initiator that is logged into the switch is part of the Network or VSAN, it is enough to check that either one of the initiator pair is part of that network; it is then implicit that the associated initiator is in the same network.

DefaultStoragePortAssigner | assignPortsToHost

This is where the zoningMap is initially constructed when an export volume operation is invoked.

    1. If there are no storage ports assigned to an initiator (I1), then check if its associated initiator (I2) has assigned storage ports.
      1. If yes, then assign the same storage ports to I1.
      2. Remove I1 from the initiator list.

NetworkDeviceController

getInitiatorsZoneInfoMap

getInitiatorsZones

Get the Network of the initiator pair. If either one of the initiators in the pair is part of a network, then it is implicit that the associated initiator is also part of the same network.
BlockStorageScheduler

selectStoragePortsInNetworks

assignPrezonedStoragePorts

Get the Network of the initiator pair. If either one of the initiators in the pair is part of a network, then it is implicit that the associated initiator is also part of the same network.
ComputeSystemController

addInitiatorsToExport

removeInitiatorFromExport 

These methods are used when an initiator is added to or removed from the host.

    • If an initiator is added, then also add its associated initiator
    • If an initiator is removed, then also remove its associated initiator

DefaultStoragePortAssigner

Methods: assignPortsToInitiatorFromAssociatedInitiator and assignPortsToAssociatedInitiator

These two routines take care of assigning storage ports to the associated initiator. In this way, we make sure the pair of initiators is assigned the same storage ports. They are called from assignPortsToHost().
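Several of the rows above (NetworkService, ExportGroupService, ComputeSystemController) apply the same rule: whatever initiators a request names, the operation acts on the full pair. A minimal sketch of that expansion, assuming pairing is available as a simple map from initiator URI to associated initiator URI (the real code resolves Initiator objects from the database), could look like this:

    import java.net.URI;
    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.LinkedHashSet;
    import java.util.List;
    import java.util.Map;

    // Sketch of the "always operate on the pair" expansion shared by the add/remove flows above.
    class InitiatorPairExpansionSketch {

        /** Return the requested initiators expanded with their associated initiators, without duplicates. */
        static List<URI> expandWithAssociated(Collection<URI> requested, Map<URI, URI> associatedInitiatorOf) {
            LinkedHashSet<URI> expanded = new LinkedHashSet<>(requested);
            for (URI initiator : requested) {
                URI paired = associatedInitiatorOf.get(initiator);
                if (paired != null) {
                    expanded.add(paired); // the pair is added (or removed) together, never one alone
                }
            }
            return new ArrayList<>(expanded);
        }
    }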

 

Flow Charts

Add initiator to network

Assign Storage Ports to Host

Persistence model

Column Family: Initiator and Host

Initiator Host

Testing of prototype

The prototype code was tested on a live LPM setup with an AIX host, using VMAX as storage and a Brocade fabric. Below are the steps executed:

    1. Add a virtual host
    2. Add a pair of initiators to the virtual host.
    3. Create a volume and export it to the virtual host.

4. Check the zoning on the switch

5. Check the masking view in the storage provider

6. Mount the volume on the AIX client.

7. Create a file with some content in the mounted directory and save it.

8. Run the LPAR mobility from the HMC and check that the file is still accessible.


Upgrades

  The upgrade scenario is mentioned in point 4 of the Impact Evaluation table above.

Are there any other special considerations regarding upgrades? Think twice - this is rarely obvious (wink)

Will we need any new tools to debug or repair system if the upgrade code specific to this change fails?

Performance

 Can this change adversely affect performance? If so, how are you going to test it? Is there a way to test this early, before the entire implementation is ready?

Scalability and resource consumption

Will it scale? How long will essential operations take at the scale of O(10,000,000)? How are you going to test it?

Will specific performance at scale testing be required? No

Does this change have impact on memory and CPU usage? How does memory and CPU usage scale with the number of objects? No

Security

Are there any implications for CoprHD security? No

Will new security scans be required? No

Deployment and serviceability

Developer impact, dependencies and conditional inclusion

      • What is the impact on build time?
      • Is there any impact on integration with IDEs (Eclipse and IntelliJ)?
      • Are there any new third party dependencies? If so, please list these dependencies here:
Package Name | Package Version | License | URL | Description of the package and what it is used for

      • Shall this feature be included with a special build-time flag?
      • Shall this feature be enabled only under certain run-time condition?

Reviewed and Approved By

Name | Date | Vote | Comments, remarks, etc.
2016 Aug 24 (pass 1) | No vote

Recommendations:

    • Remove concept of virtual machine from NB API. (It can be added as a first-class representation regardless of how the data is modeled in the DB)
    • Remove VirtualMachine CF. Add attribute to Host. Add to design impact on Host UI and controller Host queries. (we'll need to add a constraint filter, which is minor)
    • OK with just having an associatedInitiator field in Initiator
    • Document in the design that, during zoning/masking operations, associated initiators are added to requests and pathing arguments are doubled or ignored
    • Update the wiki to document the proposed design (gray out or remove approaches that are no longer valid) and have another meeting

 

+1 | Approved, pending resolution of the question I asked in the last review session concerning the somewhat redundant "add two initiators and link them" API. Overall, nice work taking the feedback from the last review and simplifying the design. Thank you!



2016 Sep 25 | +1 | Approval contingent on addressing my comments in the block starting with "Here are my comments after another reading of the wiki page"
Sept 26 2016 | +1 | Approved

Jan 17 2017


Started looking at this again and added comments; will approve once I hear back.

Based on the responses, looks like the current design handles Greenfield exports. Ingestion and co-existence case need to be analyzed.

Please evaluate co-existence and ingestion scenarios on exports and include the same in the design for VMAX. We cannot deliver exports without co-existence and ingestion cases.

Also, initiators are abstract objects used across all the platforms (VPlex, XtremIO,..). We cannot deliver exports without testing these platforms.


Approval required by this delegate via Bill Elliott


Thanks for the excellent write-up. I agree with introducing the associated initiator and running port allocation even if one storage port is connected, but I have a different viewpoint on the latter part of the design.

As discussed in the last meeting, I was more inclined towards moving the associated initiator's masking/zoning to the device drivers.

Design Points:

    1. The Export Mask will never reference the associated initiator through the initiators field.
    2. The orchestrations do not need to understand associated initiators.
    3. The orchestrator, after deciding the workflow steps, calls the device driver with 2 fields
      1. List of initiators and their corresponding associated initiators.
    4. It is the driver's responsibility to decide whether to add the passive initiator to the same IG, create a new IG, or create a new masking view for the passive IG.
    5. The orchestrator could enforce a condition to force the driver to use the passive initiators in masking, if needed.

The reasons for moving the decision to the device drivers are as follows:

    1. One of the key things customers require from us is flexibility; this is an ongoing issue and we are still struggling with it even now.
      1. e.g. Each customer has their own way of provisioning masking on the array: it could be a single masking view per cluster, or multiple per host, and so on.
      2. I see from the above design that the passive initiators are being added to the same IG as the active ones. There is absolutely nothing wrong in doing that, but what we have seen so far is that each customer has a different way of doing things, and they want ViPR to not change that pattern.
      3. What if the customer wants to maintain a separate IG for associated initiators?
      4. What if the customer wants to create a new MV for associated initiators?
      5. What if the customer doesn't want to mask the associated initiator for certain types of OS?
      6. What if the customer doesn't want to add associated initiators to masking, because they don't want that set of volumes to participate in migration?
    2. Most important, moving the implementation to the device drivers is very SIMPLE. No need for any changes at any level of the code, other than the device driver.
    3. The concern most people have is the necessity of drivers understanding the passive initiators; I do think the drivers should understand the host OS and associated initiators, in order for the drivers to implement what they need.

4. Having said the above, I do agree with the ITL concern raised by Tom; just dropping my thoughts:

      • Based on the driver's response, the orchestrator can understand which associated ports are used in the masking.
      • The API to generate ITLs will in turn accommodate the passive initiator list as part of the returned ITLs.

II. Please add design details of how we are going to handle the existing workarounds made by Julian in customer environments. The way it has been put up now is very tricky to align with the current proposed design; I will approve the design after hearing from you guys.







    • The members of the CoprHD Technical Steering Committee (TSC) have been automatically added to your reviewer list
      • A majority of them must approve this document before the feature can be merged to a project integration branch (e.g. integration-*, release-*, master)
      • Please un-check the boxes next to their names when your document is ready for their review
    • Please refer to Design Approval Guidance for information on additional reviewers who should be added and the approval process

This falls in the category of invalid configuration and not supported


12 Comments

  1. Let me specifically elaborate on my position from the call yesterday with what I am willing to approve or not approve.

    1. I will not approve a design that introduces a new CF Virtual Machine that differs only by having a virtualMachineName or something similar. I would expect that there be a virtual machine attribute or host type attribute introduced in the Host CF. Note that it does not preclude having a VirtualMachine service and ViPR API interfaces that are Virtual Machine specific. The only valid reason I can think of for a Virtual Machine CF was if you wanted to specify the relationship that a Host hosted certain Virtual Machines, i.e. the set of Virtual Machines contained in a Host. Even here though it's not clear a new CF is warranted.
    2. You must be careful in the design to not propagate the knowledge of which initiators are active and which are passive to the Orchestration layer. This state can change outside of ViPR at any time and our export provisioning, in particular, should not be "only provisioning the active initiators": when we come back later to do another operation, the roles might have reversed. I will not approve a design that has this flaw.
    3. I do not want to see "the passive initiators added by the storage drivers". This is because: a) it has to be repeated for each storage driver; b) it violates #2 above where the roles might change; and c) it would make things like displaying the ITLs show an incomplete list of what was actually provisioned. Upon completion I would expect that the ITLs (as viewed from the Volume or Export Group) would show all initiators, both active, and passive. However I am not saying that there cannot be minor changes to device drivers if required, but I do discourage that. In my view the Southbound SDK should be usable for Virtual Machines without changes. I will not approve a design where this isn't true.
    4. I do believe you will need a layer of code to deal with the active and passive initiators. I want this code to be fairly centralized, and very clearly explained in the design. I want to know the design choices, such as a) is it required that the active and passive initiators all be passed into export operations (my preference), or b) there is code to intercept the initiators and add the associated initiators (not my preference unless you can convince me why). If a) is chosen that implies to me that this "active/passive initiator layer" will be higher up, in the Compute Service layers. If b) is chosen then I want to know where this code will be located in the stack.
    5. I do believe that you will need to change the port allocation pieces such that active and passive initiator pairs are provisioned to the same ports. So I do expect changes to the BlockStorageScheduler and StoragePortsAssigner. A good implementation might make a subclass of the StoragePortsAssigner to deal with active/passive pairs. I will be glad to hold targeted design discussions with you about how to approach this.

    Please communicate directly with me when you feel these items are addressed in the design document, or if you want to disagree with any of my views.

    One more extremely important comment. Thank you so much for having the design discussion phone call early in the process where hopefully you didn't feel the design is finalized. I would like these discussions to happen earlier in the process so that we can revise designs before we have spent significant effort in what may be the wrong direction. I would even encourage two design calls, one early in the process, and another as you indicated after the design is more finalized.

     

    Thanks, Tom

    1. Thanks for the feedback Tom. 

  2. Reviewers, after the upcoming design review meeting, please give your feedback as soon as you can.  Deadline for review is 9/29 so we don't get too far behind and the team gets what they need.

  3. Amit, Nimai, and team,

    I believe that Julien Pialet and Sirpis, Andrew have worked with customers running current GA code to implement a workaround. They can provide more details, but essentially they have created a pseudo-cluster with both the active and passive initiators. I would imagine that those customers would like to move to this native solution. Is there any way that could be done non-disruptively? At a minimum we should test the steps necessary to "move" an existing host to this new implementation, understand the behavior, and perhaps block end users from getting into a DU situation.

    1. Correct. Workaround that was used at customer sites :

      1- Create manually Host1-ACT and add the active HBAs

      2- Create manually Host-LPM and add the passive HBAs

      3- Add manually the HBAs to the Networks in VIPR (if the "active" HBAs are not seen in the Networks, for example if the LPAR is offline, you need to add them)

      4- Create a Cluster Host1 and add both Host1-ACT and Host1-LPM

      5- Export a volume to the cluster = in "Shared" mode. No export should exist to any of the host before this Step, otherwise the StoragePort allocation will take different frontend ports...


      So as part of "Use case 2", we have to think of a way to "migrate" online such "AIX" hosts into the new model "Virtual Host".

        1. We tried to simulate the customer environment and came up with the steps below to successfully migrate such LPAR hosts to the new Virtual Host model.

        (Cluster name is Host1 and two hosts in it are Host1-ACT and Host1-LPM)

        After upgrading to new version of CoprHD (X-Wing SP1) follow the below steps.

        1. Go to volume resource page, select the volume exported to cluster Host1 and choose “Inventory only” for deleting this volume.
        2. Go to cluster Host1 and remove host  Host1-ACT and Host1-LPM. (This will de-link the hosts from the cluster)
        3. Go to host Host1-ACT and delete its initiator. Do the same for Host1-LPM.
        4. Go to host list page and delete host Host1-ACT and Host1-LPM  (make sure Detach Storage: is not selected)
        5. Go to cluster list page and delete cluster Host1  (make sure  Detach Storage: is not selected)
        6. Create a host  Host1New manually using the new API/GUI  (virtualmachine=true, Discoverable=false)

        7. Add initiator pair (active and passive ) to Host1New using new API/GUI to add paired initiator.
        8. Ingest the volume deleted in step #1 into CoprHD by following the steps below:

          1. Discover Unmanaged Volumes by selecting the storage system.

          2. Go to the catalog: Ingest Exported Unmanaged Volumes.
          3. Select storage type as “exclusive” and select Host1New from the host dropdown list.
          4. Select the volume to be ingested and order it.
  4. Comments received during the design review on September 20, 2016 | Answers/Justifications
    There might be performance issues when creating two initiators and linking them in one step. It is better to use the existing REST API to create each initiator and link them later.
    1. If a volume is already exported to the pair of initiators of a host and the user wants to add one more pair of initiators to the same host, then if the user adds the initiators one by one, there is no guarantee that the same set of storage ports will be assigned to these initiators, because the export operation gets triggered for each initiator.
    2. It will be easier for the user to add two initiators and link them in a single step.
    If we remove all the active initiators from a host, then the associated initiators (passive initiators) will also get removed and the export mask should also be deleted. | Yes. We tested this scenario today: all the initiators were removed from the host and the export mask was also deleted.
  5. Here are a few comments I have on Stalin's comments above (in the approval table):

    1. "what if customer wants to maintain a separate IG for associated initiators?" – I don't see anything in the revised design that keeps the device driver from doing this if it desires
    2. "What if customer wants to create a new MV for associated initiators?" – I don't see anything in the revised design that keeps the device driver from doing this if it desires (although in this case there would be two masking views per ExportMask)
    3. "What if customer doesn't want to mask associated initiator for certain type of OS?" – This unfortunately is must be a broader question, as it affects zoning as well. So in my view then the only one of the associated initiators is passed into the export operations.
    4. "What if customer doesn't want to add associated initiators to masking, because he doesn't want that set of volumes to be participating in migration?" – Then my view is they're simply not in the export request.

    So what I think is this:

    a. Both of the associated initiators are passed into the export_create / export_update operations.

    b. The orchestration or masking layer is not impacted by the associated initiators 

    c. Port allocation and computation of paths used is impacted by the associated initiators in that it will assign the same ports to both of the associated initiators, and not count them as duplicate paths

    d. The driver may or may not take different actions for associated initiators (if they are present in the masking view) as it sees fit.

  6. Here are my comments after another reading of the wiki page:

    1. "Create initiator pair for the host" – I am still of the opinion this is redundant and all that is needed is the operation to pair two initiators: "Link (pair) one initiator to another initiator".
    2. "A LPAR already discovered in CoprHD as a normal Host with volumes exported to it. Customerupgraded CoprHD to new version.' Are you proposing a customer could upgrade his existing exports to use paired initiators after installing SP1? What operation would do that exactly?
    3. The flowchart for the StoragePortsAssigner looks roughly ok. You should run the StoragePortsAssignerTest program, and add a use case in it for associated ports before make a pull request please. (Otherwise I'll likely insist that you do it then... if you need help with the test please email.)
    4. I don't see why the NetworkDeviceController and scheduler need any changes whatsoever. If you have two paired initiators Ia and Ib all you need are separate entries in the zoning map for Ia→{P1,P2} and Ib→{P1,P2}. Please explain why any changes are needed in the NetworkDeviceController.
    5. The BlockStorageScheduler function calculateExportPathParamForExportMask is also impacted by this design (so as not to count associated initiators as duplicate paths), and must be updated as part of the implementation.
    6. The class ExportPathUpdater is also likely impacted by the changes in path parameter calculation and will likely need to be updated and tested.
    1. Thanks, Tom for your feedback. Here are my replies to your queries.

      1. Problem of NOT having REST API that creates a pair of initiators in a single step

        Consider a host (H1) with volume exported to it. Currently it has I1 and I2 as initiators. This means an exportGroup for H1 already exists.

        Now the user decides to add a new pair of initiators (I3 and I4) to this host. So, he invokes the createInitiator() REST API twice to add I3 and I4 to Host H1. (Note that I3 and I4 are not linked to one another; they do not exist in a pair).
        a. When I3 is added to the host, updateExportGroup is triggered and export mask and zoning map are updated.
        b. Now when I4 is added to the host, updateExportGroup() is triggered again. Now I4 may not be assigned to the storage ports assigned to I3.

        If the user links I3 and I4 after the above steps, it defeats the purpose of assigning the same storage ports to the initiator pair. It means that the logic of paired initiators works only if the initiators are already paired before they become part of an export mask.

      2. Customers are using the workaround mentioned by Julien above. We are carrying out some tests to simulate customer environment and come up with the steps required after upgrade to SP1. We will provide more updates soon.
      3. We will execute the StoragePortsAssignerTest  and add the use case for associated initiators.
      4. Regarding the changes in NetworkScheduler and NetworkDeviceController

      Based on our understanding of the active and passive initiators of an LPAR, the passive ones are not logged into the switch and hence are not discovered by the fabric managers. So passive initiators are not part of any network/VSAN. Since active and passive initiators are paired, in order to skip the network connectivity check for passive initiators we check whether their respective active initiators have port connectivity. We will revert the changes in these classes if our understanding is wrong.


      5. BlockStorageScheduler: calculateExportPathParamForExportMask:  Thanks for pointing out. We are working on this change.

      6. ExportPathUpdater: We will make the changes if required and test accordingly.

  7. Use Case examples:  I can get slither or database backup if that would help as well.  Please just let me know.


    For lpar1, two hosts are registered/created in ViPR

    Lpar-1a (active wwns)

    Lpar-1b(passive wwns)

    And a cluster is created with both hosts above

    Lpar-1

    So for a standalone host, both the rootvg exports and datavg exports will be done to this pseudo-cluster/host Lpar-1



    Say for a case like actual cluster between Lpar-2 and Lpar-3, there will be individual hosts with active and passive wwns as above.

    Lpar-2a (active wwns)

    Lpar-2b(passive wwns)

    Lpar-3a (active wwns)

    Lpar-3b(passive wwns)

    And to export the shared volumes, there will be a shared cluster with all the above hosts..

    Lpar-2-3-CL


    And to do the rootvg exports, since you cannot have a cluster for just 2 and 3 (as the individual hosts 2a, 2b, 3a, 3b are already part of the combined cluster), you will have to export rootvg in two steps for each host: first export to Lpar-2a and then export the same volume again for Lpar-2b. Same process for the Lpar-3 rootvg export.

    So for an actual cluster like this, you will have 5 export groups

    Lpar-2-3-CL

    Lpar-2a

    Lpar-2b

    Lpar-3a

    Lpar-3b



    See the attached screenshot from their Prod instance for one of the clusters they have.

    shared exports are done to lpar1 cluster (that has all the 4 WWNs active and passive)

    Rootvg volumes are exported

    1. Migrating to the new model for managing LPARs consists of two steps (operations).

      1. Remove the LPAR's volume(s) and related entries from CoprHD only, without modifying the array, switch, and host-side configuration (to avoid any DU scenario).
      2. Ingest the same volume(s) into a CoprHD host with paired initiators.


      We may encounter some complexity in the second step (ingestion of volumes) depending on the use case (the simple use case mentioned by Julien works and has been tested by us on a VMAX/Brocade setup).

      For this particular case, below are the high-level steps for volume ingestion.

      On the upgraded CoprHD (after cleaning up the existing LPAR-related entries), follow the steps below.

      1. Create two virtual hosts, Lpar-2 and Lpar-3
      2. Create associated initiator pairs for the hosts.
      3. Create a cluster Lpar-2-3-CL and add hosts Lpar-2 and Lpar-3.
      4. Discover the unmanaged system.
      5. Ingest exported unmanaged volumes into CoprHD.
        1. Ingest the respective rootvg volume in exclusive mode to each host (Lpar-2 and Lpar-3), one by one.
        2. Ingest the datavg volume in shared mode to cluster Lpar-2-3-CL.

      To be safe, this should be tried in a test environment before doing it in production.


      On a new CoprHD setup, just the ingestion steps can be tried out to test whether ingestion of unmanaged volumes works for this particular use case. If it works, then the actual migration can be tried out in test environments and any other environment.