
Johannes Chrysostomus Wolfgangus Theophilus* Mozart

27 January 1756 – 5 December 1791

Orchestrator/Composer from Salzburg, Austria


*Theophilus means 'loved by God' in Greek, but he preferred the Latin 'Amadeus'


High Level Functional Overview and Design Specifications

Problem Statement

CoprHD Controller (CoprHD-C) currently provides Service Catalog workflows that are fixed, shipped with a given release train, and cannot be extended to include operations outside the predefined set of workflows defined by CoprHD Engineering.

Users wishing to extend or create new customized Service Catalog workflows can do so only through a formal request to CoprHD-C Engineering, resulting in a formal release train to deliver those changes, or by using the existing published REST APIs to create new workflows that are run and managed external to the CoprHD-C cluster. Feedback has highlighted the following issues with the existing Service Catalog solution:


    • Lacks flexibility with regards to Service Catalog offerings.
    • Lacks the customizability to meet the needs of users' DevOps environments.
    • Lacks integration solutions for existing DevOps processes and frameworks.

Functional Requirements

With Custom Services users will have the ability to:


    • Develop new customized Service Catalog workflows.
    • Meet their DevOps/datacenter processing requirements without a new version/release of CoprHD-C having to be developed and delivered.
    • Utilize a new visual tool that allows new workflows to be constructed by simply dragging & dropping building blocks and linking them into a processing sequence.
    • Take advantage of the existing infrastructure in CoprHD-C for triggering and monitoring the progress of a Service Catalog workflow as an "Order."
    • Publish new Service Catalog workflows that show up in the Service Catalog to be consumed by other DevOps personnel.


JIRA ID | Description | Comments, notes, etc.

COP-22637 | Custom Services - "Mozart" | Epic

Glossary

Ansible - An open source automation platform used for configuration management, application deployment and task automation.

Ansible Playbooks - Playbooks are Ansible's configuration, deployment, and orchestration language.

Workflow - The sequence of steps/building blocks through which a piece of work passes from initiation to completion.

Workflow Builder - A graphical tool used to visually design, validate, test and publish new Workflows to the Service Catalog.

Workflow Definition - The JSON document that describes a Workflow; it is generated by the Workflow Builder and used for execution.

Operation - A predefined step/building block that can be added to a workflow during workflow creation. An Operation can have inputs, outputs, and a result status, backed by an implementation.


    • Ansible Local Operation - Provides the building block to execute an Ansible Playbook locally.
    • Ansible Remote Operation - Provides the building block to execute an Ansible Playbook on a remote Ansible server via SSH.
    • CoprHD REST Operation - Provides predefined building blocks corresponding to existing CoprHD-C REST APIs.
    • REST API Operation - Provides the building block to make REST calls.
    • Shell Script Operation - Provides the building block to run Shell Scripts.

Service Catalog - The Service Catalog presents the service categories, each containing a set of pre-configured services appropriate to the storage operations to be performed.

Design Approach / High Level Implementation Details

Overview

With the introduction of Custom Services, users will have the ability to build/define new workflows and publish them to the CoprHD-C Service Catalog.

Upon publishing these workflows, users can trigger them and track their progress as they would any Order from the Service Catalog.

A Workflow Builder will be introduced to help users visually design, validate, test and publish new Workflows to the Service Catalog. Through the use of a "drag and drop" style interface, users will have the ability to:

    • Define new building blocks/operations.
    • Use existing CoprHD-C REST APIs in the creation of new workflows.
    • Upload and define an Ansible Local building block/operation containing Playbooks, which run on the local instance of Ansible on the CoprHD-C nodes (see the sketch after this list).
    • Define an Ansible Remote building block/operation that calls out to a remote server running Ansible to run Playbooks already deployed in the user environment. These can be used in the creation of new workflows.
    • Upload and define a Shell Script building block/operation that runs Shell Scripts, which can be used in the creation of new workflows.
    • Define a REST API building block/operation that makes REST calls, which can be used in the creation of new workflows.
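
To make the Ansible Local operation concrete, the following is a minimal Python sketch of how an uploaded playbook might be executed with the node-local Ansible installation. It assumes the ansible-playbook CLI is available on the node; the function and file names are hypothetical, and this is a sketch of the idea, not the engine's actual implementation.

    import subprocess

    def run_local_playbook(playbook_path: str, inventory_path: str,
                           extra_vars: dict) -> int:
        """Run an uploaded playbook using the node-local Ansible install.

        playbook_path and inventory_path point at files the user uploaded
        when defining the operation; extra_vars carries the step's inputs.
        """
        cmd = ["ansible-playbook", "-i", inventory_path, playbook_path]
        for key, value in extra_vars.items():
            # ansible-playbook accepts repeated --extra-vars arguments
            cmd += ["--extra-vars", "%s=%s" % (key, value)]
        result = subprocess.run(cmd, capture_output=True, text=True)
        # A non-zero return code maps to the operation's failure status.
        return result.returncode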

The following provides examples of the Workflow Builder:

Custom Services - Workflow Builder: Entry Point

Custom Services - Workflow Builder: Tree Control of Workflows & Operations 

  


Custom Services - Workflow Builder: Creation of new Operations


Custom Services - Workflow Builder: Workflow Creation

 


Custom Services - Workflow Builder: Uploading and Defining new Shell Script Operation Wizard

  

Custom Services - Workflow Builder: Uploading and Defining new Local Ansible Operation Wizard

   

Custom Services - Workflow Builder: Uploading and Defining new Remote Ansible Operation Wizard

 

Custom Services - Workflow Builder: Uploading and Defining new REST API Operation Wizard

  

Custom Services - Workflow Builder: Workflow Lifecycle (Save, Validate, Test, Publish)

     

Custom Services - Workflow Builder: Importing Workflow

Personas

Custom Services - Developer

    • Define workflows
    • Create any scripts for implementing workflow tasks
    • Create a new catalog entry and associate it with the workflow
    • No code changes required

Custom Services - Admin

    • Lock down fields against which the workflow is to be run
    • Set ACLs for Operator access

Custom Services - Operator

    • Execute workflows to which they have access

Support Model

Users who have difficulty creating workflows will require support.  Given the flexibility of the solution, the range of issues will vary widely.  To mitigate the impact, a support model will need to be defined, including things like:


    • Creation of a Developers guide.
    • Delivery of Workflow examples to be used as templates.
    • Professional Services engagement.
    • Troubleshooting and Diagnostics Guide.

Integration with Service Catalog

Users will have the ability to publish workflows from the Workflow Builder that have gone through Validation and Test into the CoprHD-C Service Catalog, with the flexibility to organize and group them as they see fit. See the example below:

The new Service Catalog entry will behave like all other Services in the Service Catalog in terms of order execution and feedback to the end users.

Workflow Validation

Validation of a workflow ensures that the workflow designer has built what is intended for execution. As stated earlier, validation plays a role in the workflow life cycle analogous to the compiler in an IDE: it lets the designer catch errors in the workflow, such as the following, before execution (a minimal sketch of the graph checks follows this list):

    • The workflow is acyclic and connected, with exactly one root node (the 'START' node) and exactly one terminal node (the 'END' node), meaning that workflow execution starts at the START node, executes the defined steps - taking the path dictated by the success/failure criteria - and finishes at the END node. No disconnected subgraphs (forests) may exist in the workflow.
    • The inputs are properly configured for each step. Two example input validation rules:
      • The names that will be displayed on the order page must be unique.
      • An input of the current step may be mapped only to an input/output of one of the current step's ancestors.
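
The structural rules lend themselves to a standard depth-first search. Below is a minimal Python sketch of the acyclicity and reachability checks, assuming the workflow is given as a map from step id to its successor ids and uses the reserved 'Start'/'End' ids from the workflow document; the actual validator also enforces the input rules above.

    from collections import defaultdict

    def validate_workflow_graph(steps: dict) -> None:
        """Check the structural rules: one Start, one End, no cycles,
        and every step reachable from Start. Raises ValueError on failure."""
        if "Start" not in steps or "End" not in steps:
            raise ValueError("workflow must contain Start and End steps")

        WHITE, GRAY, BLACK = 0, 1, 2   # unvisited / on current path / done
        color = defaultdict(int)

        def dfs(node):
            color[node] = GRAY
            for nxt in steps.get(node, []):
                if color[nxt] == GRAY:   # back edge: a loop in the workflow
                    raise ValueError("cycle detected at step %s" % nxt)
                if color[nxt] == WHITE:
                    dfs(nxt)
            color[node] = BLACK

        dfs("Start")
        orphans = [s for s in steps if color[s] == WHITE]
        if orphans:                      # a disconnected "forest"
            raise ValueError("steps not reachable from Start: %s" % orphans)

For example, validate_workflow_graph({'Start': ['s1'], 's1': ['End'], 'End': []}) passes, while adding 's1' as a successor of 'End' would raise a cycle error.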


Open Source Considerations

Currently, the Workflow Builder portion of the UI is planned to be an OSS component that we will create/customize.

ASD OSS Procedures

The following new components will be added to the appliance:

Exclusions and Limitations

    • Fail-over recovery: if a node fails, existing tasks will fail and will not be restarted on other nodes, and the workflow will fail (but the workflow may be restarted by the client/user and completed if other nodes in the cluster are available).

Implementation Strategy

Implementation Timeframe

    • Skywalker Release

Agile Team

Testing Strategy

Documentation Strategy

Documentation assistance will come in a number of forms:


    • Developers Guide - A document that describes features, functionality and the process for creating workflows using the Orchestration Engine (OE).
    • Sample Workflows - Sample workflows will be provided, allowing users to see examples of how to implement various workflows.
    • External documentation links - Since 3rd party components are used, references and links to documentation for components like Ansible will be provided.
    • Troubleshooting and Diagnostics Guide - To enable serviceability and diagnostics of the feature.

Updates to existing documentation may include:

    • Architectural documentation.
    • Deployment documentation.
    • Support Matrix.
    • Upgrade Procedures.
    • 3rd Party Product components/declarations.

Impact Evaluation

Public APIs and public functionality (end-user impact)

      • The existing Catalog API will support Custom Services.

        The following new REST APIs will be added for Custom Services. Each entry lists the method and URL, followed by a description and, where applicable, an example payload.

        APIs for Primitive operations:

        POST /customservices/primitives?type=<primitive type>
        Create a new primitive. Valid types are: 'script', 'ansible', 'rest', 'remote_ansible'. Payload:


        <primitive_create_param>
          <attributes>
            <entry>
              <key>playbook</key>
              <value>tomcatInstall.yml</value>
            </entry>
          </attributes>
          <description>Install Tomcat with minimum config</description>
          <friendly_name>Tomcat install</friendly_name>
          <input>
            <entry>
              <key>input_params</key>
              <value>
                <input>input1</input>
              </value>
            </entry>
          </input>
          <name>Tomcat install</name>
          <output>output1</output>
          <resource>urn:storageos:CustomServicesDBAnsibleResource:9fa4f151-f6a1-444a-a634-b7f534569061:vdc1</resource>
          <type>ansible</type>
        </primitive_create_param>
        GET /customservices/primitives?type=<primitive type>
        Get the list of primitive names/IDs. Optionally, primitives can be filtered by type: 'vipr', 'script', 'ansible', 'rest', 'remote_ansible'.

        GET /customservices/primitives/{id}
        {id} is the ID of the primitive.

        PUT /customservices/primitives/{id}
        Update the primitive. Payload:


        <primitive_update_param>
          <attributes>
            <entry>
              <key>playbook</key>
              <value>tomcatInstall.yml</value>
            </entry>
          </attributes>
          <description>Install Tomcat with minimum config</description>
          <friendly_name>Tomcat install</friendly_name>
          <input>
            <add>
              <entry>
                <key>input_params</key>
                <value>
                  <input>in2</input>
                </value>
              </entry>
            </add>
            <remove>
              <entry>
                <key>input_params</key>
                <value>
                  <input>in1</input>
                </value>
              </entry>
            </remove>
          </input>
          <name>PP_ANsible</name>
          <output>
            <add>out2</add>
            <remove>out1</remove>
          </output>
          <resource>urn:storageos:CustomServicesDBAnsibleResource:9fa4f151-f6a1-444a-a634-b7f534569061:vdc1</resource>
        </primitive_update_param>
        POST /customservices/primitives/bulk
        Payload is a list of IDs:


        <ids>
          <id>
        urn:storageos:CustomServicesDBAnsiblePrimitive:d89a27fb-8f2b-45f5-a115-5bb838b3ea12:vdc1
        </id>
        </ids>
        POST /customservices/primitives/{id}/deactivate
        {id} is the ID of the primitive to delete.

        GET /customservices/primitives/resource?type=<primitive resource type>&parentId=<associated parent resource, if any>
        Valid primitive resource types are: 'script', 'ansible', 'ansible_inventory'.

        POST /customservices/primitives/resource?type=<primitive resource type>&name=<resource name>&parentId=<associated parent resource, if any>
        Content-Type is application/octet-stream; the payload is the byte stream of the resource to be uploaded.

        PUT /customservices/primitives/resource/{id}?name=<resource name>&parentId=<associated parent resource, if any>

        GET /customservices/primitives/resource/{id}
        {id} is the URI of the resource.

        GET /customservices/primitives/resource/{id}/download
        The resource byte stream is obtained.
        APIs for Custom Services Workflow operations:

        GET /customservices/workflows?status=<status>&primitiveId=<URI of primitive>
        The status of the workflow can be NONE, VALID, INVALID, or PUBLISHED.

        GET /customservices/workflows/{id}
        {id} is the URI of the workflow.

        POST /customservices/workflows
        Create a new workflow. Payload:


        <custom_services_workflow_create_param>
          <document>
            <name>CreateVolume and send email</name>
            <description>Create block volume and send email</description>
            <steps>
              <step>
                <id>Start</id>
                <next>
                  <default>1dmnhn</default>
                </next>
                <friendly_name>Start</friendly_name>
                <position_x>1458</position_x>
                <position_y>1791</position_y>
              </step>
              <step>
                <id>1dmnhn</id>
                <operation>urn:storageos:CustomServicesViPRPrimitive:BlockServiceCreateVolume:</operation>
                <description>Create volume</description>
                <type>vipr</type>
                <inputGroups>
                  <input_params>
                    <input>
                      <name>volume_create.count</name>
                      <type>InputFromUser</type>
                      <required>true</required>
                      <locked>false</locked>
                      <friendly_name>Volume Count</friendly_name>
                      <input_field_type>number</input_field_type>
                    </input>
                    <input>
                      <name>volume_create.name</name>
                      <type>InputFromUser</type>
                      <required>true</required>
                      <locked>false</locked>
                      <friendly_name>VolumeName</friendly_name>
                      <input_field_type>text</input_field_type>
                    </input>
                         ...
                  </input_params>
                </inputGroups>
                <output>
                  <name>tasks.task.op_id</name>
                  <type>STRING</type>
                </output>
                <output>
                  <name>tasks.task.resource.name</name>
                  <type>STRING</type>
                </output>
                <output>
                  <name>tasks.task.resource.id</name>
                  <type>URI</type>
                </output>
                ...
                <next>
                  <default>Email</default>
                </next>
                <friendly_name>Create Volume</friendly_name>
                <position_x>1419</position_x>
                <position_y>1867</position_y>
                <success_criteria>#task.state = ready</success_criteria>
              </step>
              <step>
                <id>Email</id>
                ...
              </step>
              <step>
                <id>End</id>
                ...
              </step>
            </steps>
          </document>
        </custom_services_workflow_create_param>
        PUT /customservices/workflows/{id}
        {id} is the URI of the workflow. Payload:


        <custom_services_workflow_update_param>
          <document>
            <name>Create Volume, export to host and send email</name>
        …
            <step>
              <id>1dmnhn</id>
              <operation>urn:storageos:CustomServicesViPRPrimitive:BlockServiceCreateVolume:</operation>
              <next>
                <default> ueqgxr </default>
              </next>
              …
            </step>
            <steps>
              <id>ueqgxr</id>
              <operation>urn:storageos:CustomServicesViPRPrimitive:ExportGroupServiceCreateExportGroup:</operation>
              <description>Create block export</description>
              <type>vipr</type>
              <inputGroups>
                <input_params>
                  <input>
                    <name>block_export_create.clusters.@cluster</name>
                    <type>Disabled</type>
                    <required>false</required>
                  </input>
                  <input>
                    <name>block_export_create.hosts.@host</name>
                    <type>AssetOptionMulti</type>
                    <value>assetType.vipr.host</value>
                    <required>true</required>
                    <locked>false</locked>
                    <friendly_name>Host</friendly_name>
                  </input>
                  <input>
                    <name>block_export_create.initiators.@initiator</name>
                    <type>Disabled</type>
                    <required>false</required>
                  </input>
        …
                </input_params>
              </inputGroups>
              <output>
                <name>task.op_id</name>
                <type>STRING</type>
              </output>
        …
              <next>
                <default>Email</default>
              </next>
              <friendly_name>Create Export Group</friendly_name>
              <position_x>1434</position_x>
              <position_y>2006</position_y>
              <success_criteria>#task.state = ready</success_criteria>
            </steps>
        …
          </document>
        </custom_services_workflow_update_param>
        POST /customservices/workflows/{id}/deactivate
        {id} is the ID of the workflow to be deleted.

        POST /customservices/workflows/{id}/publish
        Publish the workflow. Only valid workflows can be published.

        POST /customservices/workflows/{id}/unpublish
        Unpublish the workflow. Published workflows that are not associated with a catalog service can be unpublished.

        POST /customservices/workflows/{id}/validate
        Validate the workflow.

        POST /customservices/workflows/bulk
        Payload is a list of workflow IDs:


        <ids>
          <id>
        urn:storageos:CustomServicesWorkflow:d62d55be-ae31-4be2-a336-627bc1f69c11:vdc1
        </id>
        </ids>
        POST /customservices/workflows/import?directory=<directory into which the workflow is imported>
        The payload is the workflow tar file to be imported.

        POST /customservices/workflows/export
        The selected workflow and its associated primitives and resources are exported as a tar file.

        APIs for Custom Services Workflow Directory operations:

        GET /customservices/workflows/directory
        Gets the list of all directories.

        POST /customservices/workflows/directory/bulk
        Payload is a list of directory IDs:
        <ids>
         <id>
        urn:storageos:WFDirectory:e0ce0751-94ec-4f02-8b92-b0cbf7c5b59d:vdc1
        </id>
        </ids>
        GET /customservices/workflows/directory/{id}
        Gets the specified workflow directory.

        POST /customservices/workflows/directory
        Create a new directory. Payload:
        <wf_directory_create>
          <name>Block</name>
          <parent>urn:storageos:WFDirectory:5097cb65-6c64-42a8-8ac0-65e0b2460941:vdc1</parent>
          <workflows>
            <workflow>
        urn:storageos:CustomServicesWorkflow:d62d55be-ae31-4be2-a336-627bc1f69c11:vdc1
        </workflow>
          </workflows>
        </wf_directory_create>
        POST /customservices/workflows/directory/{id}/deactivate
        {id} is the ID of the directory to delete.

        PUT /customservices/workflows/directory/{id}
        Update the directory. Payload:


        <wf_directory_update>
          <name>File</name>
          <parent>urn:storageos:WFDirectory:7432cb65-6f23-42t5-8ac0-68k8b2430951:vdc1</parent>
          <workflows>
            <workflow>
        urn:storageos:CustomServicesWorkflow:sd55be-ae31-4be2-a336-627bc1f69c11:vdc1
        </workflow>
          </workflows>
        </wf_directory_update>
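
For illustration, the following is a minimal Python sketch of driving the workflow lifecycle through the endpoints listed above. The host, token, file name, and response parsing are placeholder assumptions, and the XML payload is the kind of document shown in the create example; this is not shipped client code.

    import requests
    import xml.etree.ElementTree as ET

    BASE = "https://coprhd.example.com:4443"      # placeholder CoprHD-C endpoint
    HEADERS = {
        "X-SDS-AUTH-TOKEN": "<auth token>",       # token obtained via the login API
        "Content-Type": "application/xml",
    }

    # 1. Create a workflow from an XML document produced by the Workflow Builder.
    with open("create_volume_and_email.xml", "rb") as f:   # hypothetical file
        resp = requests.post(BASE + "/customservices/workflows",
                             data=f.read(), headers=HEADERS, verify=False)
    resp.raise_for_status()
    # Assumption: the response echoes the workflow, including its URN in an <id> element.
    wf_id = ET.fromstring(resp.text).findtext("id")

    # 2. Validate, then publish (only valid workflows can be published).
    requests.post(BASE + "/customservices/workflows/" + wf_id + "/validate",
                  headers=HEADERS, verify=False).raise_for_status()
    requests.post(BASE + "/customservices/workflows/" + wf_id + "/publish",
                  headers=HEADERS, verify=False).raise_for_status()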

Other components and component interaction

The following new components will be added to the appliance:

       The following open source component is not added to the appliance, but is used during the build to generate the ViPR primitive Java classes:

OSS Request Submitted:

Persistence model

    • Custom Services workflow document
      • The workflow document is a JSON document describing the steps of the workflow along with metadata like name and description.
      • The steps are persisted as a JSON blob in the database.
      • The metadata is stored in individual columns so that it is searchable/indexable.
    • ViPR primitives
      • The ViPR primitives are one-to-one mappings to ViPR REST APIs.
      • The ViPR primitives are generated Java classes compiled into vipr-primitives.jar. They are not persisted to Cassandra.
    • User defined primitives
      • In addition to the ViPR primitives, the user can also define their own primitives of several types: REST API, Local Ansible, Remote Ansible, and Shell.
      • Each primitive that the user defines is persisted in the Cassandra database.
      • Each primitive type has its own column family.
      • Primitives are metadata that describe an operation (i.e. the inputs/outputs, name, etc.).
    • Uploaded resources
      • Resources referenced by the Local Ansible and Shell primitives are uploaded by the user and saved as byte array blobs in the database. Each type of resource has its own column family.
      • Local Ansible primitives reference a tarball that contains the Ansible playbook(s), roles, etc. The resource also references one or more inventory files that the user uploads separately (see the upload sketch after this list).
      • Shell script primitives reference a shell script file.
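
As an illustration of the resource/primitive split, here is a hedged Python sketch of uploading a shell script resource and then registering a primitive that references it, using the endpoints listed earlier. The host, token, names, and response parsing are placeholders, and the XML shape follows the primitive_create_param example above.

    import requests

    BASE = "https://coprhd.example.com:4443"   # placeholder CoprHD-C endpoint
    AUTH = {"X-SDS-AUTH-TOKEN": "<auth token>"}

    # 1. Upload the script bytes; they are persisted as a blob in the
    #    CustomServicesDBScriptResource column family.
    with open("cleanup.sh", "rb") as f:        # hypothetical script
        resp = requests.post(
            BASE + "/customservices/primitives/resource",
            params={"type": "script", "name": "cleanup.sh"},
            data=f.read(),
            headers={**AUTH, "Content-Type": "application/octet-stream"},
            verify=False)
    resp.raise_for_status()
    resource_urn = "<URN parsed from the response>"   # parsing elided

    # 2. Register a script primitive whose <resource> points at the upload.
    primitive_xml = """<primitive_create_param>
      <name>cleanup</name>
      <friendly_name>Cleanup script</friendly_name>
      <description>Remove temporary artifacts</description>
      <type>script</type>
      <resource>%s</resource>
    </primitive_create_param>""" % resource_urn
    requests.post(BASE + "/customservices/primitives",
                  params={"type": "script"},
                  data=primitive_xml,
                  headers={**AUTH, "Content-Type": "application/xml"},
                  verify=False).raise_for_status()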


      Following is the list of new column families that are added:

      New column family | Description
      CustomServicesDBScriptPrimitive | Persists the details of user-defined script primitives
      CustomServicesDBRemoteAnsiblePrimitive | Persists the details of user-defined remote Ansible primitives
      CustomServicesDBAnsiblePrimitive | Persists the details of user-defined local Ansible primitives
      CustomServicesDBRESTApiPrimitive | Persists the details of user-defined generic REST primitives
      CustomServicesDBScriptResource | Persists the details of user-defined shell script resources; the script is stored as a byte array
      CustomServicesDBAnsibleResource | Persists the details of user-defined local Ansible resources
      CustomServicesDBAnsibleInventoryResource | Persists the details of user-defined local Ansible inventory resources
      CustomServicesWorkflow | Persists the details of the Custom Services Workflow
      WFDirectory | Persists the details of the Custom Services Workflow directory information

Upgrades

Initial upgrade: adds new DB column families and code. No issues are expected here.

Future upgrades:

    • Ansible
      • Playbooks that were uploaded need to be preserved.
      • Upgrades to Ansible itself may break existing playbooks - retain backward compatibility.
    • Name/ID conflicts
      • If an upgrade contains new services/playbooks, handle possible conflicts with user-created workflows.

Performance

    • No significant performance impacts from this feature are expected

    • The performance of loading the "workflow & operation" library has a dependency on how many items the user has defined.

Scalability and Resource consumption

    • Custom Services is distributed across all nodes and scales as the cluster does.
    • The load presented by running workflows is not expected to be significant, but should be verified in testing.
    • Concurrency of workflows will follow the same rules/behavior as existing Service Catalog workflows.

Security

Custom Services provides users with the ability to run their own Shell or Ansible scripts on the CoprHD-C instance. The ability to run scripts on CoprHD-C can introduce vulnerabilities, intentional or unintentional, to the CoprHD-C instance.

To prevent destructive scripts from causing havoc on a CoprHD node, Custom Services implements a chroot jail for script order execution.

When a user runs a Custom Services order from the Service Catalog, the following steps isolate the scripts and ensure that the CoprHD-C instance is safe from destructive scripts (a minimal sketch of the idea follows this list):

    • The chroot jail environment is set up during the initialization of CoprHD-C.
    • Custom Services creates a temporary order directory in which to execute the order.
    • The chroot jail implements the limited-container concept of UNIX/Linux by changing the root directory of a given process. Since no process can access anything beyond its root directory, any script run in the chroot jail operates only within the chroot directory.
    • When a user runs a Custom Services order, whether Shell script or Ansible, Custom Services invokes the chroot jail environment, sets the root directory of the process to its chrooted directory, and executes the order's script inside the chroot jail.
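
To illustrate only the idea of chroot isolation (this is not the actual CoprHD-C implementation), a minimal Python sketch follows; it assumes root privileges and a jail directory already populated with a shell, its libraries, and the order's temporary directory.

    import os
    import subprocess

    def run_script_in_jail(jail_root: str, script_in_jail: str) -> int:
        """Execute an order's script confined to a chroot jail.

        script_in_jail is a path relative to the jail's root.
        Requires root privileges.
        """
        def enter_jail():
            os.chroot(jail_root)   # the child can no longer see files outside the jail
            os.chdir("/")          # start at the jail's root
        proc = subprocess.run(["/bin/sh", script_in_jail], preexec_fn=enter_jail)
        # Even "rm -rf /" inside the jail only affects the jail directory.
        return proc.returncode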

High level flow charts:



Authentication Mechanism

    • For remote Ansible execution, we authenticate against the remote target using a username and password. SSH key authentication is not yet supported.
    • Only the HTTPS protocol is supported for REST operation execution.
    • The supported authentication mechanisms for REST operations are Basic Auth (using a username and password) and Token Auth (setting an auth token in the REST request). A brief sketch of both styles follows.
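
For illustration, a brief Python sketch of the two supported styles; the target URL and credentials are placeholders, and the exact token header expected by a given target is the workflow designer's choice (a Bearer header is shown as one common convention).

    import requests

    TARGET = "https://api.example.com/v1/items"   # placeholder HTTPS-only target

    # Basic Auth: username and password supplied with the operation's inputs.
    resp_basic = requests.get(TARGET, auth=("user", "password"))

    # Token Auth: the auth token is set directly in the REST request headers.
    resp_token = requests.get(TARGET, headers={"Authorization": "Bearer <token>"})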

Deployment and serviceability

    • Troubleshooting and Diagnostics Guide to be provided
    • Installation Guide to be provided. Note: Installation of all dependencies should be transparent to the end users.

Developer impact, dependencies and conditional inclusion


    • Ansible concepts and Ansible Playbook development - required for creating new Ansible Operations/building blocks.

Reviewed and Approved By

Name | Date | Vote | Comments, remarks, etc.

+1

I still have concerns that we haven't executed a real controller workflow as part of the test cases.

Approving with the assumption that we will be running those.







June 5, 2017 | +1 | Comments were answered. Waiting on code review walkthrough before approval in case of any other questions.
Approved 2017 May 30 | +1

Reviewed & provided comments May 25, 2017. Inclined to approve, awaiting answers on comments.

Approved June 6th, 2017 | +1

For some reason, the ‘edit’ button was missing when I opened the design spec. So I will note my approval on the email here.

 I have reviewed the code and given code review comments to the pull request. I still have one question on how we can support loops.

 Approved

 Anil










    • The members of the CoprHD Technical Steering Committee (TSC) have been automatically added to your reviewer list
      • A majority of them must approve this document before the feature can be merged to a project integration branch (e.g. integration-*, release-*, master)
      • Please un-check the boxes next to their names when your document is ready for their review
    • Please refer to Design Approval Guidance for information on additional reviewers who should be added and the approval process

50 Comments

  1. Given the overall scope of the project, should we split it into multiple epics around

    a) The declarative language constructs and Orchestration Engine itself

    b) The WF builder

    c) Trial use cases for e.g. NDM

    d) Ansible or any other scripting Devops toolkit integration

  2. Do we think they could get delivered separately?

  3. I don't see how you cannot deliver a) b) and c) as a unit. Maybe ansible is separate?

  4. Does each workflow run as a job/task in an isolated environment (e.g., a container) in order to avoid impacts of dependencies between the workflows/tenants/libraries/files/versions and the ViPR-C host? An option is to exec them from within a container/chroot capsule and clean up used capsules hourly/daily.

    1. Each task runs as a ViPR Task. For Ansible/Script tasks we are planning to use chroot or a similar approach, and planning to have its space inside an order context directory, which will be cleaned after execution is complete or periodically. This will also give us some control to stop vulnerable commands from running on the ViPR node.

      For now we are not planning to run these inside a container. But we are still exploring.

      Thank you for your input. Once we decide on something, I will definitely let you know before implementing.

      1. Sonali Sahu the chroot is probably the quickest/simplest way, but I don't think that is the best approach...

        The adoption of containers should be the way to go. A proposal follows:

        • Phase1:
          • Create ENV Variables of two types: Protected (Password) and Unprotected (Generic).
          • Create Container image (coprhddevkit?) and make it available in CoprHD Controller (https://coprhd/executor-image.tgz).
          • Create :
            • 1: Executor wrapper inside of the container to notify ZK/CoprHD about the status of the on-going/running tasks.
            • or
            • 2: Poller (Poke/Polling) to check status of launched/scheduled tasks.
          • Create TaskRunner's ExecutorDrivers.
            • The same approach of existing ones for Operating Systems.
          • Create ExecutorDriver for Docker (API Call).
          • TaskRunner calls ExecutorDriverDocker with localhost as target.
        • Phase2:
          • Create ExecutorEndpoints and associate the proper ExecutorDriver.
            • ex: 10.0.0.1 = docker
          • Group ExecutorEndpoints as Clusters
            • Round-Robin among ExecutorEndpoints
          • Association/Affinity of Workflows and ExecutorEndpoints
        • Phase3:
          • Additional ExecutorDrivers: Swarm, Kubernetes, others...


        This approach would fit almost every scenario and would extend CoprHD's capabilities to have not only actions executed remotely, but also long-running tasks that might be useful for running additional services, e.g. Rack Controller services like DHCP, TFTP, etc.

        Any comments are welcome.

  5. Victor - last I knew (I haven't been involved on a daily basis for a number of weeks) all the workflow steps are run as ViPR ExecutionTasks, like all other SA services.  sullivan, shane might be able to describe details of how Ansible is exec'd by the task - and whether it is isolated.

  6. Steve and all, what is the status? If it is going into Skywalker, we need to get it approved. Has a design review call been held? If so, please post the recording. If not, please schedule one (this week or next week if possible). Is this ready for review? Are you actively working on it now? Thanks, Tom

    1. I haven't been working on this lately, but Aron Suliman and the rest of the team are actively pushing forward.

  7. Review comments 2017May25

    1. Looks like definitions "Ansible Local Operation" and "Ansible Remote Operation" are reversed to me. Typo?
    2. Are there any security issues with including ansible capability on the Vipr node? ... Later I see  you addressed this with the "chroot jail" solution. As I understand this is applied both to user scripts and ansible scripts, please advise if this is not the case.
    3. Good ideas in the section entitled "Support Model". Will the "Developer's Guide" and "Support and Troubleshooting Guide" be supplied as part of Skywalker deliverables?
    4. Open Source Considerations - have these been cleared with legal for open sourcing?
    5. Fail-over / recovery. In the event a user "restarts" a workflow on a new node after a node failure, the workflow itself must be coded in such a way that it can ignore operations that have already been done (i.e. things are idempotent) and proceed to the remaining Operations that need to be done?
    6. The video demo ignored what happens out of a "N" branch of an operation. Presumably "N" means the operation failed for something like create volume? It is up to the workflow designer to handle all error conditions? If nothing is specified for an "N" output of a node, what is the behavior if it is invoked, i.e. does the workflow immediately halt with a failure?
    7. Missing from this write-up that I would like to see:
      1. brief summaries of the new column families (specifically "Custom services workflow document", "User defined primitives", and "Uploaded Resources". 
      2. Please provide documentation on the new APIs used to implement these facilities as used by the GUI and Workflow Builder.
    8. One thing recently introduced into Skywalker is the ability to supply warning messages, for example when zones are not removed because we think they're in use externally, or optional steps are not completed, but we go on. Have any facilities been designed into this solution for reporting more than a "Success" / "Failure" result back to the user? Examples could include:
      1. User oriented log of what operations took place
      2. How are detailed error messages presented
      3. Warning messages
      4. Is there a way one could construct a preview operation, present results to user, and then get user input to "Approve & Continue" or "Not Approve & Stop"?

              I see these types of features as being key to making ViPR more user friendly in the future, and as we are looking at providing more and more operations via Mozart, it would be nice to have these sorts of features designed in. (This could be done post-Skywalker... but I'm pushing you here to extend your thinking to a more interactive and informative user interaction model.)

    Overall I am inclined to approve this design, but would like to see the above items addressed.

    1. On the chrooted jail: yes, this applies to both Ansible Playbook based scripts and shell/bash scripts.

    2. Responses for a few comments.

      (1) Yes, they were reversed; I switched them.

      (4) Ansible is approved. The others are still pending approval. The AOS JIRAs are attached. Also, I noticed that JavaPoet is missing from the list. I'll add the details.

      (5) We leveraged the existing order execution engine that holds the execution context in memory.  If the node that is executing the order fails there is no failover/recovery and it will need to be started from the beginning as with other catalog services.  Implementing an orchestration engine was outside the scope of this project.  

      (6) If there is no failure path then the workflow does immediately fail.

      (7) Priya is going to compile this information and add it to the wiki.

      1. Thanks very much Shane.

      2. (7) Updated the APIs and CFs used in Custom Services

    3. 2) With Ansible we get great power. With great power we need guardrails. So we added the chroot so that the custom script (local Ansible/shell script) cannot do any destructive operations to the ViPR-C file system (e.g., we restrict them from doing rm -rf /).

      3) We are definitely planning to provide some sample workflows which will be reference points for others developing new workflows using Custom Services. It is definitely a good idea to provide a troubleshooting document.

      5) A Workflow is a single order for Custom Services even if it has multiple steps in it. If the order fails and the user decides to reorder, the order will be executed from the "Start" step again. It is the responsibility of the workflow designer to design the workflow to handle rollback/failover.

      6) You are right. N means a failure path and Y means a success path. If there is no N, then if the step fails the complete workflow fails, with proper log messages on the UI.

      8) a- We log a message on the UI for each operation we execute, with the step ID and type of operation.

         b- We show the failure message on the UI, but the detailed error message is logged in sasvc.log. All log messages are appended with the step ID for easier debugging. Moreover, we write all the script/REST execution results to the log file.

         d- It is not in V1




  8. Tom Watson Please let us know if we have addressed your questions so that you can move forward with the approval process. Thanks!

  9. I would still like just a bit more info about the column families, just whether they're all blob'd. That is, how the scripts, Ansible, etc. are stored: serialized Java object, XML representation, etc. No need to provide detail about things like names or types; I think we get the idea in general and I'll look at the pull request. Approving. Thanks for the updates. Tom

  10. Tom Watson, Thangs1, Bill Elliott, Anil Degwekar, Evgeny Roytman, Trevor Dawe: I am not sure that this is the current list of TSC members, or things may have changed. I just want to make sure that the updates to the original design reach everyone. We have addressed the questions and would like to know if anyone else has any questions on the updates. Thanks for your help! I am not sure if unchecking the names causes a notification.

    1. This is the correct list except for Bill Beltis. I will reach out to them (again) to review. Tom

  11. A few questions/comments on the design:

    1. Can an ansible and shell script be shared across folders?
    2. The REST API URLs start with /primitives – I feel we should start them with something like /workflow/primitives and so on. In general, everything related to custom workflow should start with /workflow or something like that. 
    3. For each API, indicate what is the required role to execute that API (SYS ADMIN/ TENANT ADMIN / PROJECT ADMIN etc.)
    4. What is the mapping between workflow personas (developer/operator etc.) and traditional CoprHD roles (e.g. SYS ADMIN / TENANT ADMIN etc.)
    5. Can a tenant admin create workflows which are private to only that tenant (not shared with other tenants)?
    6. Do we expect credentials for external ansible playbook servers to be embedded inside the workflow, or to be typed in by the user at the time of execution? Do we support both these methods?
    7. Is it possible to create a loop (e.g. three retries for an operation) using custom workflow primitives? Or is it expected to be embedded within the script/primitive?

    (end of questions) 

      1. If you mean whether Ansible/shell script Operations/Primitives can be shared among workflows, then the answer is yes. You can create an Ansible/script primitive once in any folder and then reuse it in any number of workflows. For Local Ansible you can upload the package once and create one Ansible primitive per playbook; the Ansible package can be reused.
        5- Once the Workflow Builder is enabled, all workflows will be visible to all tenants/roles. Once the workflow is published we can create a catalog service and set the ACL if we want to. This will behave like our existing catalog services.

        6- If the designer wants, they can embed the user/password/target in the workflow as a default value. There is also an option for the end user to give the credentials on the order page during execution.
        7- For V1 we do not support any kind of loop. We will throw a validation error if the workflow has any loop.
    1. I also had a question about Anil's #2 comment above. Should these APIs belong under /catalog? (Example: /catalog/primitives or /catalog/workflows/primitives). This may also help avoid confusion with the current controller workflow APIs (/vdc/workflows).

      1. or might be /customservices/primitives or /customservices/workflow 

        sullivan, shane Periaswamy, Priya any comments on this?

        1. yes we can group all the custom service APIs under /customservices

          1. +1. This seems like a good plan.

  12. A specific question about error messages. If a task executed for a step gets an error, are the task error message(s) propagated to the order results correctly? This seems like a "must have".

    1. Yes, Custom Services does that. For example, if we create two volumes in a Custom Services step and the first volume creation succeeds but the second fails, then we declare the whole step as failed, with the error message from the failed ViPR Task.

  13. I would like to see at least one example recipe for how to build a real custom workflow in this document.

    For example, assume that a user wants to add a new service catalog workflow for an SRDF volume split operation (this is not part of the existing "Block Protection Services" catalog).

    The workflow has to let the user select one of the existing ViPR projects, define one of the existing element types - volume or consistency group - and select one of the existing instances of the selected element type from the selected project. Finally, the workflow has to execute a request to either the BlockService or BlockConsistencyService REST API to split the selected SRDF volume or consistency group.

    1. Evgeny Roytman Were you able to see the demo that we did for Dell/EMC world? That would give you an idea.

      1. Sent internal email with the demo recording as well as Sprint Demos.

  14. Aron Suliman Maybe I missed the design review. Was the design review mail sent to all? Can you share the design review recording? I would like to go through the design review.

  15. Tom Watson, Stalinsaravanakumar Thangapalam, Bill Elliott, Anil Degwekar, Evgeny Roytman, Trevor Dawe: Are there any other questions to move forward with this approval? If everyone else from the TSC is set, please mark the design appropriately so that we can move forward. Thanks for your help!

  16. Below are the general comments I have:

    1. Naming Conventions can be simplified, especially around the input types.(Multiple Selection, Single selection)
    2. How to Start demo video embedded inside ViPR would be helpful.
    3. Do we really need to open up all the Northbound APIs in the workflow builder? What's the most complex workflow we have tested locally?
    4. How do we guarantee that the database doesn't get screwed up if a user tries to stitch unsupported APIs together?
    5. What kind of negative test cases have been covered?

    REST API support - I see we have options to extract data from headers only? Would this really be useful? What kind of data have we seen sent through headers? I would go with supporting only GET for now, until we are done with parsing the response body.

    From the test wiki https://asdwiki.isus.emc.com:8443/display/OS/Test+Plan+for+Custom+Service , nested workflows are supported? Is there any restriction on building these nested workflows? The test cases referred to in the wiki for nested workflows don't really make any sense.

    Does the workflow support loops? I see "InputsFromOtherSteps" available during the creation of a workflow step (the demo video shows only the very basics); again, what kind of real workflows have we tried and tested?

    1. (1) We struggled to find a naming convention that made sense to everyone. Originally these were named 'Asset option' and 'Asset option multi-select', which made sense to us but not to the people who did the usability testing. We came up with these, which were more acceptable for UX. If you have suggestions they would be welcome. Please work with Hanna Yehuda if you have some ideas.

      (2) Not sure. Aron Suliman, Hanna Yehuda, or Goldstein, Perry: do we already have the video?

      (3) No, we are only going to release a subset. For now we have lots in the list, but we're working on limiting the list to what is usable in V1. I do think we need to get to the point where most APIs are available in future releases.

      (4) I'm not sure what you mean?  Can you give an example?  How do we guarantee that the database doesn't get screwed up today? 

      (5) Dong, Andrew?


      We dropped nested workflows from V1.  I think the test case is actually talking about copy/paste workflows which also did not get done for V1.


      Loops are not supported in V1. A loop in a workflow will cause validation to fail. In order to support loops we need a way for the user to set a limit on how many times a loop is executed. If we allowed infinite loops, a user might accidentally create a workflow that provisioned all of the space on their array one night, for example, and they would be sad.

      1. #4. e.g., using the workflow builder a user can stitch together independent APIs (resulting in certain unsupported configurations like cascaded snap, RP-SRDF...) which, when invoked, have the potential to screw up the database. The bottom line is that not all Northbound APIs can be stitched together, as invoking incompatible APIs would result in the objects getting screwed up in the database.

        Also, I would expect the wiki to discuss more about the architectural aspects of the Execution Engine.

        • A diagram which explains how the workflow engine operates.
        • How does the result of a step get propagated to the next workflow step?
        • Can the result be parsed and manipulated before passing the same to next step?
        • Does user need to write new libraries in order to parse the result?

        It would be helpful if you guys could share some details on the list of controller workflows tested with this feature. From the test wiki, I don't see any real workflows being tested at all.

    2. Hi Stalin, test cases are created based on the checkpoints of each component of this feature, and checkpoints can be categorized as positive or negative. So negative tests are created per component, generally speaking. Take the Workflow component as an example: negative test cases are created based on a checkpoint that if there is a catalog service associated with a published workflow, then that workflow is not allowed to be unpublished.

      For the nested workflows mentioned in the test plan wiki, it's actually a quoted "nested" which really means copy & paste, just as Shane explained. At the time the wiki was written we were planning to have it in V1, but we finally decided to drop it, so the wiki is somewhat out of date.

      Regarding the real workflows, we have some people contributing proposals of real workflows, which are recorded here: Compositions. But for the V1 release, considering the feature scope and tight QE resources, all my attention is on making sure each piece of the functionality works as expected. The real workflow cases will be covered as we move into the V2 phases and the proposed compositions become specific.

  17. Hey Stalin,

    Nested workflows will not be supported in the first release. The test cases will need to be revisited once we tackle this in a future release train.

  18. Stalin, 

    1. For your comment "Naming Conventions can be simplified, especially around the input types (Multiple Selection, Single selection)": we decided to use the full terms because we have single selection for ViPR Resources and also for User input. We could split this into two fields - the input field would offer "ViPR Resource List" and "User Defined List", and another field would offer Single/Multiple selection options - but we already got the feedback that we are starting to have too many fields to enter per input field. Therefore we combined them.
    2. We already have a video. The video will be re-created using the same script we used for the first version, so we capture the latest version before DA. The idea is to have a link on the canvas to launch it when we create a new workflow.
    1. "ViPR resource list- Multiple Selection" - Why can't we change it to "Select multiple ViPR resources"

      "ViPR resource list- Multiple Selection" → "Select single ViPR resource"


  19. I discussed this with Perry and she proposed the following changes:

    • Integer/String/Boolean/Password
    • Single selection from ViPR resource list
    • Single selection from custom list
    • Multiple selection from ViPR resource list
    • Input to a previous step
    • Output from a previous step
    1. Sounds better this time. Thanks.

  20. Many ViPR REST APIs return a task object. The caller has to keep polling the task object until its status changes to ready/error. In order to support this use case, I feel we need to support loops inside the custom workflow – or else how does one figure out whether the REST API succeeded or failed? One way, of course, is to always use synchronous calls – but that would not be an efficient way of doing things.

  21. 1) Can the user add a test connectivity option to the custom workflow? This would help validate the credentials provided by the provisioning user (the user placing the order) and confirm connectivity before placing the order. Maybe we can let the workflow developer select a 'Test Connectivity' option (if needed) while building the workflow.

    2) There seem to be 4 different buttons when the user edits the workflow (Save, Validate, Test and Publish), and since 'Save' and 'Validate' are never active simultaneously, maybe we can consider having a single button: when there are unsaved changes it reads Save, and after the user hits Save it reads Validate and is enabled. This would reduce the button count, since the same option toggles between Save and Validate and they cannot exist together.

    3) Along with Save, Validate, Test and Publish, should we also have Cancel/Discard Changes, in case the user is editing the workflow and intends to discard the changes made to it?

    4) If the user is editing a workflow and accidentally navigates to a different page with pending edits/changes, ViPR should alert the user that the workflow changes will be discarded if they navigate away from the workflow edit page.

    5) Should we consider a 'Read Only' user persona, in case customers would like to grant users read-only privileges to review the published workflows? Maybe the "Custom Services - Admin" persona could have a read-only option to review the published workflows and operations.

    Testing related:

    6) Testing needs to capture the max number of steps (operations) that can be built into a single workflow. We also need to capture limits such as the max supported custom workflows and max steps in a single workflow in the support matrix document (under ViPR Controller Limits), based on the test results.

    7) Capture the result of importing a huge/large workflow with several user-defined operations along with the single workflow.

    8) Have we planned any performance and longevity tests for the Custom Services workload, and identified the performance cost of having Custom Services turned on (if any)?

    1. Once a workflow has been published, can it be modified, i.e. edited and saved again, and then re-published?
    2. Once a workflow has been published and used in a catalog service, can the workflow be edited, saved and re-published?
    3. Thinking of a scenario where a workflow is created and used in a catalog service: an order is created from this service and runs successfully. Then the workflow is edited to change the steps. If the previous order is re-submitted, I guess the set of steps will be different.
      I see there is no preview of an order in V1. But can the catalog service that uses a custom workflow display a preview of just the step descriptions of the component workflow? This would help in cases like the above, where a workflow changes and the user may not be aware of the change.
    4. Is any specific role required for 
      1. building a custom workflow?
      2. creating a catalog service that uses a custom workflow?
    5. Does the workflow execute as a single task, or are the steps in the workflow individual tasks? Do we still have a mechanism to add the order ID tag to the tasks that run as part of the custom service order?
    1. The user will be able to "Unpublish" a published workflow.

      After it is unpublished, the user will be able to edit the workflow and re-publish it; this is supported.


  22. 9) Should there be predefined workflow steps/operations that users can choose while building the workflow, such as the following:

    • Wait Step/Operation – This would temporarily halt the workflow execution for the time specified in the wait step and then resume the workflow with the execution of the next step.
    • If Step/Operation – This would allow the workflow developer to validate the output of any previous step against certain conditions and make a workflow execution decision accordingly.
    • Set Variable Step/Operation – This would allow the user to set a new value for a variable (perhaps the output of a previous step) in the workflow.