Johannes Chrysostomus Wolfgangus Theophilus* Mozart

27 January 1756 – 5 December 1791

Orchestrator/Composer from Salzburg, Austria


*Theophilus means 'loved by God' in Greek, but he preferred the Latin 'Amadeus'


High Level Functional Overview and Design Specifications

Problem Statement

The Catalog Services provide workflows for many operations, but do not allow users to customize the workflows, or include workflow steps involving external systems.

A user who wishes to implement custom workflows must currently either:

    1. Develop a new Service in the catalog using the developer toolkit, or
    2. Implement orchestration via the API, which may be more complex than desired.

Both options are beyond the skill level or resources of some users.

Customer Use Case

Here is a large sample customer workflow, showing the kind of scenario customers are requesting.



Functional Requirements

Project Mozart will:

    1. implement a process for adding custom workflows (WFs)
    2. provide a visual tool to create WFs by dragging & dropping workflow steps into a sequence
    3. not require a development environment (i.e.: no recompiling of code).
    4. handle workflows made up of individually defined steps
    5. allow users to upload scripts to run as WF steps
    6. provide a way of triggering the workflow from the Service Catalog as an 'Order'
    7. provide a means for monitoring the progress of the workflow from the Service Catalog
    8. allow new services to be added to the catalog to trigger external workflows
    9. accommodate 3 roles of users:
      1. Developers, who create & implement the workflow
      2. Admins, who provide access to the workflow by tenants
      3. Operators, who run the workflows
JIRA ID: COP-22637
1-Line Description: Provide an orchestration engine to allow users to create their own workflows.
Comments: This is the epic for the effort.

Glossary

Service Catalog - The catalog of predefined services (or new services extended from existing services)

Catalog Service - A single service in Catalog that runs a predefined workflow (typically making multiple API calls)

Task - an internal Task created by Controller as it carries out controller operations (typically creates multiple Tasks per API call)

Note: There is also the concept of an 'Operation', which multiple tasks can map to

Orchestration Service - The service in the Service Catalog that allows access to OE Workflows

OE - Orchestration Engine that contains all services for running OE workflows

Workflow - a group of sequenced OE Steps with default input values & result status

Step - a unit of work with inputs, outputs, & result status that has an implementation

Playbook - an Ansible script that executes remotely via SSH on managed nodes.  Can be run by OE Task

Extended Workflow - a workflow that includes remote execution of steps on a node

WF Builder - a graphical tool, used to define workflows

Workflow Definition -  JSON that describes the workflow (generated by WF Builder, and acts as input to the OE_Runner)

OE Runner - the engine that runs the workflow.  Is contained in a Catalog Service

Nested Workflow - a WF that is added, as a step, to another WF

Primitive - a pre-defined step to be added to a WF; may be dragged into a new WF in WF Builder; has properties to allow inputs/outputs to be defined

Root-Primitive - primitives can be extended from each other; ultimately all primitives extend from top-level primitives known as root-primitives

REST Primitive - provides ability to make REST calls

Ansible primitive - provides ability to run playbook (which must be uploaded)

Workflow primitive - allows a predefined WF to be nested and run as a step in another WF

Design Approach / High Level Implementation Details

Overview

Users will be provided with the ability to define their own workflows (WFs) 

Users can run their WFs from the current Service Catalog

Users may use a graphical, drag-n-drop style tool ("WF Builder") to define their new WFs (made up of 'Steps')

Steps in WF Builder include ready-to-use steps that do things like
    • API Calls (CreateVolume, ExportVolume, etc)
    • External REST Call (user specifies URL, method, payload, etc)
    • Ansible Playbook (user uploads playbook to run)
      • Supports local (in cluster) & remote execution (Ansible runs remotely)

Step details

    • can be connected in any sequence
    • have inputs & outputs
    • can wait for other steps to succeed/fail
    • can inherit properties from other 'parent' steps
    • Workflows can call other Workflows as steps
    • Loops will be detected to prevent runaway workflows
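The loop-detection point above can be sketched as a cycle check over the step graph before execution. This is an illustrative sketch, not the Mozart implementation; the adjacency-map shape and function name are assumptions.

```python
# Sketch: detect loops in a workflow's step graph before running it.
# "steps" maps a step id to the list of step ids it can branch to;
# this shape is illustrative, not the real WF Definition schema.

def has_cycle(steps):
    WHITE, GRAY, BLACK = 0, 1, 2      # unvisited, in progress, done
    color = {s: WHITE for s in steps}

    def visit(node):
        color[node] = GRAY
        for nxt in steps.get(node, []):
            if color.get(nxt, WHITE) == GRAY:   # back edge: a loop
                return True
            if color.get(nxt, WHITE) == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[s] == WHITE and visit(s) for s in steps)

# A workflow whose failure branch jumps back to an earlier step loops:
wf = {"Start": ["1"], "1": ["2", "3"], "2": ["1"], "3": []}
```

A runaway workflow like `wf` would be flagged at design or save time rather than discovered at run time.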

Users can define steps that access external systems, as well as our own API 

Orchestration Service is a special catalog service that can run a workflow

Workflow has steps that can make REST calls

REST calls can go to our own API or an external system

Users may upload scripts in the form of Ansible Playbooks to provide logic for WF steps

Users will not be prevented from performing administrative functions (e.g.: discovery, etc)

A system user will be provided to run workflows, as required, to provide adequate security on the platform

Users/Actors/Roles

    1. Mozart Developer
      1. Define workflows
      2. Create any scripts for implementing workflow tasks
      3. Create a new catalog entry and associate it with the workflow
      4. No code changes required
    2. Mozart Admin
      1. Lock down fields against which workflow is to be run
      2. Set ACLs for Operator access
    3. Mozart Operator
      1. Execute workflows that operators have access to

Support Model

Customers who have difficulty creating workflows will require support.  Given the flexibility of the solution, the range of issues will vary widely.  To mitigate the impact, a support model will need to be defined, including things like:

    • Creation of a developer's guide
    • Delivery of examples & templates
    • Stating extent of support offered
    • Professional Services involvement
    • etc

Integration with Service Catalog

The Service Catalog will contain a new group of services called "Orchestration Services":

In that category will be a base service (which can be extended using the existing "Edit Catalog" feature):

The Orchestration Service will

    1. Take all form fields in the order and send the values as parameters to the OE Workflow
    2. Start designated workflow in OE
    3. Continuously update order with Tasks & Affected Resources that OE created
    4. Wait until OE Workflow and all related Tasks complete
    5. Behave like all other non-orchestration services in the catalog:

A sequence diagram details the process:

Integration Points

Services

A catalog "service" is currently run when an 'order' is submitted.  A new Orchestration Service to run & monitor workflows will be provided, and will behave like any other catalog service in the UI.

Inputs to the orchestration service will include the name of the workflow in the OE to be run, and will include any fields the workflow developer adds to the WF definition.

AssetOptions

Drop-down menus in catalog order forms are currently populated by AssetOptionProviders, which retrieve data from the DB as name/value pairs.  The existing Service Descriptor triggers this retrieval by using a field type specification (e.g.: "assetType.vipr.project" - which retrieves all projects for a drop down menu as name/ID pairs).  Currently these providers may be dependent on each other (e.g.: available Varrays may depend on selected Project)

To allow users to define their own name/value pairs for drop-down menus, a new AssetOptionProvider will be provided to retrieve menu options via Mozart. Additionally, this new AssetOptionProvider will handle fields that depend on each other (same as existing AssetOptions). E.g.: if a list of hosts depends on the selected project, then the options for the 'hosts' field will depend on the value of the 'project' field, and will be dynamically updated when the project is selected.

To use the new AssetOptionProvider, when the WF designer is creating a new workflow, the WF step inputs may be tagged with a special identifier (e.g.: asset.option.getMyProjects), indicating an option list is to be retrieved, and the workflow name to use to retrieve name/value pairs.  After running the WF, these pairs will be returned to the order form and populate the drop down menu.
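Resolution of such a tag could look roughly like the following sketch. The `asset.option.<workflowName>` tag format follows the `asset.option.getMyProjects` example above; `run_workflow` and the pair key names are stand-ins, not the real provider API.

```python
# Sketch: resolve a drop-down field tagged "asset.option.<workflowName>"
# by running the named OE workflow and using its output as name/value pairs.
# run_workflow is a stand-in for the real OE Runner call.

def run_workflow(name, params):
    # Stand-in: a real implementation would invoke the OE Runner.
    if name == "getMyProjects":
        return [{"optionName": "Project A", "optionValue": "urn:project:a"}]
    return []

def resolve_asset_options(field_tag, dependencies):
    """dependencies: values of fields this option list depends on (e.g. project)."""
    prefix = "asset.option."
    if not field_tag.startswith(prefix):
        raise ValueError("not an asset-option tag: %s" % field_tag)
    wf_name = field_tag[len(prefix):]
    pairs = run_workflow(wf_name, dependencies)
    return [(p["optionName"], p["optionValue"]) for p in pairs]
```

The returned (name, value) tuples would populate the drop-down in the order form.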

Example Workflow Detail: Create 10 Volumes & Export with Rollback on Failure

Notes:

    • WF Builder exports JSON for OE to persist in DB
    • Functional only - look & feel TBD
    • Rollbacks are implemented as WF steps (i.e.: there is no rollback defined 'for' a step, only steps that can be executed that will provide rollback functionality)

Resulting JSON Workflow Description (abbreviated)

   "WorkflowName":"CreateExportWithRollback",
   "inputParams":[ 
      "volumeName",
      "varrayId"
   ],
   "Steps":[ 
      
         "Start": {
             "Next": {
                 "Default": {
                     "StepId""1"
                 }
              }
         }
       },
       {
         "Step": {
            "StepId":"1",
            "OpName":"CreateVolume",
    ...
            "inputParams":[ 
               "size",
               "qty",
               "vArray",
               "vPool",
               "project"
            ],
    ...
            "SuccessCriteria":[ 
               "ALL step.resource.state == ready"
            ],
            "Next": {
               "Condition": {
                  "Statement": {
                     "Simple": { "Status""Failed" }
                  },
                  "StepId""2"
               },
               "Default": {
                  "StepId":"3"
               }
            }         
         }
      },
      
         "StepId":"2",
         "OpName":"deleteVolumes",
    ...
      }
   ]
}
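A runner interpreting the Next/Condition blocks above might choose the next step roughly as follows. The JSON shape mirrors the abbreviated sample; the function and variable names are illustrative, not the committed OE Runner design.

```python
# Sketch: pick the next step from a "Next" block, taking the Condition
# branch when its Simple statement matches the step result, else Default.
# The dict shape mirrors the abbreviated workflow sample above.

def next_step_id(next_block, step_result):
    cond = next_block.get("Condition")
    if cond:
        simple = cond["Statement"]["Simple"]
        # e.g. {"Status": "Failed"} matches a step whose Status is "Failed"
        if all(step_result.get(k) == v for k, v in simple.items()):
            return cond["StepId"]
    return next_block["Default"]["StepId"]

nxt = {
    "Condition": {"Statement": {"Simple": {"Status": "Failed"}}, "StepId": "2"},
    "Default": {"StepId": "3"},
}
```

In the sample above, a failed CreateVolume routes to the deleteVolumes rollback step ("2"), while success continues to step "3".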


Mozart Requirements

Requirement Details

  1. Drag-N-Drop WF builder
    1. UI Builder creates JSON description of WF (to store in DB via POST request to API)
      1. JSON description can be reloaded to edit WF later on (implies original raw JSON with positional data is preserved)
      2. if JSON is moved to another system, will it work? (for the Service Provider business use case)
    2. Present primitives to drag onto diagram
      1. RestCall: makes a REST call, returning response
      2. ApiRestCall: makes a REST call to API
      3. Ansible: Runs an Ansible playbook locally
        1. Develop 2 primitive types, if necessary, for remote & local
      4. Other workflows: any other workflow defined for OE can be inserted into the workflow
    3. Follow flowchart style  (suggested)
      1. Has Start node
      2. Fail/stop WF anywhere by going to an 'end' node (either 'Done' or 'Failed' end step)
      3. Decisions can take different branches (although no need for separate 'diamond' shape decision elements)
      4. (Rollback steps are just plain steps)
    4. Allow user to set properties on primitive steps
      1. e.g.: text fields and/or drop down menus
    5. Map output of a step's variable to another step's input (e.g.: map output var Var1 of step 1 to input var Var2 of the next step)
    6. Allow user to set input variable values to constants
    7. Allow user to specify option-list input for a step which comes from output of another WF (e.g.: 'project' input field comes from list that is output of 'getUsersProjects' WF)
      1. Allow option-list inputs coming from other WFs to be dependent (e.g.: input field 'vpool' depends on the 'project' input field value)  
    8. Allow user to link step to another step based on success of the step (i.e.: SuccessCriteria)
    9. Allow user to reuse primitives (i.e.: CreateVolume primitive can be used in multiple workflows, and multiple times in a single workflow)
    10. All defined primitives should be available in the library/toolbox
    11. Allow user to specify timeout value for step
      1. timeout is different from failure - and allows different branching to the next step
    12. Allow workflow to call other workflows
      1. Allow user to map inputs/outputs of a nested WF to/from other steps in the WF
  2. Catalog Service Input Form
    1. Dynamically present form with all input variables for WF Definition (i.e.: fields presented are those that match the inputs of this workflow)
    2. Allow AssetOptions to present info for drop down menus
      1. User creates WF for each drop down menu
      2. When defining WF, user specifies which WF to use to get asset options (i.e.: 'asset.option.workflowName')
      3. Allow AssetOptions to depend on each other.
      4. AssetOptions are implemented as regular workflows, returning pairs  of {optionName,optionValue}
      5. AssetOption WFs should NOT appear in Service Catalog
      6. Try to allow re-use of Asset Options (consider dependency)
  3. Service Execution
    1. Parse WF Definition as input (JSON from UI Builder)
    2. Get input variables from form submission, which should be present in JSON WF definition (see above story re:Catalog Service Form)
    3. Merge input params with workflow definition to create 'order' JSON (and persist)
      1. normalize code that parses the WF Definition
    4. Call OE_Runner to execute catalog order
    5. Show log messages (from 'Log' steps)
    6. Show affected resources
    7. Show related tasks (as returned from API)
      1. Show tasks defined in WF
      2. Show tasks started by WF tasks
        1. locate Tasks that match WF tasks that called API
  4. Validation
    1. Users will code workflow/step validation into their workflows manually
  5. Workflow Primitives
    1. Define primitives (Primitives will be either Ansible playbooks or REST calls)
    2. Users can extend existing primitives to make their own new ones
    3. Users can export/import primitives to other clusters (includes related scripts)
    4. versioning
  6. Export/import of workflows
    1. Allow users to take a WF they have created on one cluster and export it so that the WF can then be imported on a different system.  (to avoid having to rebuild that WF from scratch on the other cluster). 
    2. FUTURE: Create Catalog Service to load a WF onto another system (to execute there via its API/Catalog)
      1. This is in order to support Remote Execution features, so SP can use a central cluster to load WF onto their customers' managed systems
  7. Convert existing API calls to OE Workflows
    1. All existing provisioning APIs calls should exist as primitives
    2. All primitives will wait for API calls to complete (including async calls that return Tasks, which the primitive will wait on)
    3. All existing Catalog Services should be available as Workflows
  8. Catalog API Support
    1. All OE WFs will be available via the Catalog service API
  9. Workflows that fail need to be restarted
    1. state will not be persisted
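Requirement 3.3 above (merge input params with the workflow definition to create the 'order' JSON) could look roughly like this sketch; `build_order` and the key names are illustrative and mirror the samples elsewhere on this page.

```python
# Sketch: merge user-submitted form values into the WF Definition's
# inputParams to form the 'order' JSON that gets persisted.
# Key names mirror the abbreviated workflow samples on this page.
import json

def build_order(wf_definition, form_values):
    order = dict(wf_definition)
    merged = {}
    for name in wf_definition.get("inputParams", []):
        if name not in form_values:
            raise ValueError("missing input: %s" % name)
        merged[name] = form_values[name]   # only declared inputs are kept
    order["inputParams"] = merged
    return json.dumps(order)

wf = {"WorkflowName": "CreateExportWithRollback",
      "inputParams": ["volumeName", "varrayId"]}
```

Validating that every declared input is present at merge time also gives the Catalog Service an early failure point before the OE Runner is invoked.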

Use Cases

User wants to create volume and export it (using existing primitives)

  1. In UI Builder
    1. Add Steps - drag createVolume and exportVolume primitives into workflow
    2. Connect Steps - connect createVolume to exportVolume (by dragging arrow)
    3. Map Inputs - specify where exportVolume step inputs come from (e.g.: volumeId(s) would come from createVolume output)
    4. Adjust Success Criteria (optional) - for each step, enter success criteria (if default is not desired)
      1. Define also for WF (for nested WF scenario)
    5. Set Condition (optional)
      1. By default, the arrow connecting two steps indicates the next step to execute on step being a SUCCESS
      2. Optionally, an arrow may be designated as the FAILURE path, and lead to a different step, if step does not SUCCEED
    6. Save workflow - saves to DB
  2. In Service Catalog
    1. Select Service
    2. Enter data for fields (a field will appear for each undefined input variable in the WF)
    3. Submit order

User wants to call a child-WF from a parent-WF as a WF step

  1. In UI Builder, when adding a WF as a step
    1. Add WF primitive as Step - drag nested-WF primitive into workflow
    2. Map Inputs - specify where nested-WF inputs come from (i.e.: variables already in WF assigned by other steps as outputs)
      1. nested WF's inputs will be available to map variables to from previous step
      2. WF outputs can be mapped to the next step's inputs
    3. Adjust Success Criteria (optional) - for nested-WF, enter success criteria (if default is not desired)
    4. Set Condition (optional) - e.g.: on arrow, select to proceed only if nested-WF SUCCEEDS/FAILS (for some or all volumes)
    5. Complete parent workflow - continue adding steps to parent-WF

Primitives

Overview

    • DB holds entries for primitives
    • primitives can be based on other primitives (createVolume is based on restCall)
    • child primitives inherit properties of parent
    • child primitives inherit default values for properties (e.g.: inputs, outputs, successCriteria, etc), if defined
    • WF Builder allows primitives to be put into workflow
    • workflow is saved in DB
    • Catalog Order form shows fields marked in WF as inputs
    • Form field data bundled as params is sent to OE_Runner with WF Description 
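The inheritance rules above (child primitives inherit parent properties and default values) can be sketched as a chain merge, root-primitive first. The registry shape and field names below are illustrative, not the DB model.

```python
# Sketch: resolve a primitive's effective properties by walking its parent
# chain and merging root-primitive first, so child values override
# inherited defaults. Field names are illustrative.

def resolve_primitive(name, registry):
    chain = []
    while name is not None:
        prim = registry[name]
        chain.append(prim)
        name = prim.get("parent")
    effective = {}
    for prim in reversed(chain):          # root-primitive first
        for key, value in prim.items():
            if key != "parent":
                effective[key] = value    # later (child) values win
    return effective

registry = {
    "restCall": {"parent": None, "method": "GET",
                 "successCriteria": "header.status > 199 AND header.status < 300"},
    "createVolume": {"parent": "restCall", "method": "POST",
                     "path": "/block/volume.json"},
}
```

Here createVolume overrides the inherited method but keeps restCall's default success criteria, matching the createVolume example later on this page.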

High level diagram

Property inheritance of primitives

Note: Custom REST call to populate CMDB needs formatted input.  Ansible playbook does that by formatting output of createVolume step

Note: Only RESTCall and AnsiblePlaybook root-primitives are planned to be developed

Primitive Definition Examples

Create primitive:restCall
    1. Implement an ExecutionTask to make a REST call
      1. This is a java class that needs to be written
    2. Create new OE_Primitive
      1. Create primitive in DB, by adding a record
    3. Add unique name or ID
    4. Define inputs (e.g.: host, port, path, body, method...)
      1. Inputs are just variables required to make the REST call
    5. Define outputs (and where output values come from)
      1. Output variables will be available as inputs to other steps
      2. Implemented Java step will make all HTTP headers & response available for variables
      3. ContentType determines available methods to extract output vars
      4. If HTTP response is JSON or XML, response.task.resource.id will get all values from that node   (XML will be future)
    6. Define default success criteria
      1. HTTP return status code will be available, so add an expression like "header.status > 199 AND header.status < 300"
Create primitive:createVolume (as child of primitive:restCall)

Note: this is child of primitive:restCall and inherits properties

    1. Create new OE_Primitive
      1. Create primitive in DB, by adding a record (maybe develop a tool to assist - e.g.: API or UI or Util)
    2. Add unique name or ID
    3. Identify primitive:restCall as the parent
    4. Define additional inputs (optional)
    5. Override inherited variables
      1. Set host:port to the cluster VIP you want
      2. Set path to /block/volume.json
      3. Set ContentType to 'application/json'
      4. If volumeName is a new input variable, you can set 'body' to '{"name":"{{volumeName}}"}'
    6. Define outputs (and where output values come from)
    7. Define success criteria
      1. HTTP return status code will be available, so add expression like "header.status >199 AND header.status < 300"
      2. Also add expression saying all tasks must succeed:  "ALL response.task.status = ready"
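A minimal sketch of how such success-criteria strings might be evaluated follows. The grammar (dotted paths, comparisons joined by AND, and an ALL quantifier over list values) is inferred from the two example expressions and is not the final expression language.

```python
# Sketch: evaluate success-criteria strings such as
#   "header.status > 199 AND header.status < 300"
#   "ALL response.task.status = ready"
# against a step-result context. The grammar is inferred from the two
# examples on this page, not the final expression language.
import operator

OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq, "=": operator.eq}

def lookup(context, dotted):
    value = context
    for part in dotted.split("."):
        value = value[part]
    return value

def eval_criteria(expr, context):
    if expr.startswith("ALL "):
        # "ALL <path> = <value>": every element at <path> must match
        path, op, rhs = expr[4:].split()
        return all(OPS[op](v, rhs) for v in lookup(context, path))
    results = []
    for clause in expr.split(" AND "):
        path, op, rhs = clause.split()
        lhs = lookup(context, path)
        rhs = int(rhs) if rhs.isdigit() else rhs
        results.append(OPS[op](lhs, rhs))
    return all(results)

ctx = {"header": {"status": 201},
       "response": {"task": {"status": ["ready", "ready"]}}}
```

A 2xx response with all tasks ready satisfies both example criteria; a 404 fails the status-range check.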

Parallel Execution

Workflow steps will be blocking by default, meaning a step must completely finish before the WF proceeds to the next step. Certain primitive types may support non-blocking behavior, where step execution is started and the WF continues to the next step prior to the step's completion.

In the first release, parallel execution will be supported for steps that make REST calls into the API that return an API Task.  Since the API Task represents an underlying, long-running operation, the step can be configured not to wait for the underlying task to complete.  (Pre-requisites include not having any success criteria for the task, and not having any outputs used in the WF).

Nested Workflows

    • In WF Builder (in UI) there will be a primitive that allows the user to add a pre-defined WF as a step in the WF they are creating
    • At execution time, the new WF will execute the nested WF, and wait for the results before proceeding
    • After creating a new WF with a nested WF, if the nested-WF is edited/changed, then the change is reflected next time the WF runs
    • Nested-WF has inputs & outputs, so it can get inputs from previous step, and deliver outputs to next step
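Nested execution as described above amounts to a recursive run that maps inputs in and carries outputs forward; a minimal sketch with stand-in names (the `Type`/`Definition` keys and the counter-style step are illustrative, not the real schema):

```python
# Sketch: run a workflow whose steps may themselves be workflows; the
# runner recurses into a nested WF and waits for its outputs before
# proceeding. Step/type names are stand-ins for the real WF schema.

def run_step(step, inputs):
    # Stand-in for a REST/Ansible primitive: increments a counter.
    return {"in": inputs.get("in", 0) + 1}

def run_workflow(wf, inputs):
    values = dict(inputs)
    for step in wf["Steps"]:
        if step.get("Type") == "Workflow":       # nested workflow as a step
            result = run_workflow(step["Definition"], values)
        else:
            result = run_step(step, values)
        values.update(result)                    # outputs feed later steps
    return values

inner = {"Steps": [{"Type": "Primitive"}]}
outer = {"Steps": [{"Type": "Workflow", "Definition": inner},
                   {"Type": "Primitive"}]}
```

Because the nested WF is resolved at run time, edits to `inner` are picked up the next time `outer` runs, matching the behavior described above.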

OE Runner

    • OE Runner is implemented as a ViPR Service
    • Takes the WF Definition JSON and a map of user inputs
    • Parses the JSON WF Definition
    • Executes each step using the condition present in the WF Definition, which is represented in an expression language (TBD)
    • Handles the various types of input required per step (e.g.: input from other steps)
    • Has two primitives implemented as ViPR Tasks: the REST Primitive (handles ViPR REST and other REST steps) and the Ansible Primitive (runs any script of type Python or shell)
    • Evaluates step status from the successCriteria given in the WF Definition as an expression-language statement
    • Updates results per step
    • Logs progress of each step in the ViPR UI
    • Executes subsequent steps according to the condition
    • Has a generic ViPR REST result parser, so it can return any output from a ViPR REST response if the user asks for it
    • The Ansible Primitive should be able to execute scripts on a local or remote machine

Workflow Validation

Until a high-level validation approach is determined, users will code validation into their workflows manually, either as part of their workflow or as a workflow called from inside their workflow.

WF Step Validation as 'plain old workflow'

No special support for validation in first release.

Levels of support for validation:

    • Run time: prevents WF from starting if required resources for all steps are currently unavailable 
    • Design time: prevents incompatible WF steps from being used in a WF as it's being designed (FUTURE)
    • Test run: runs WF, without making persistent changes, to test whether WF will succeed (FUTURE)

Run-time validation will be a nested WF (or part of a WF), defined the same way as any OE WF. 

WF Description JSON with validation WF defined


{
   "WorkflowName":"SampleWorkflow",
   "inputParams":{
      "volumeName":"vol"
   },
   "Steps":[
      {
         "StepId":"1",
         "OpName":"createVolumeValidation",
...


WF Builder showing validation workflow


The validation workflow is created using WF Builder, just like all WFs.

Workflow Definition Details (JSON Structures)

Sample Workflow 

WF Description Detail

Input variables can be locked down using the existing Catalog Service feature as well

{
   "WorkflowName":"CreateVolandsendEmail",
   "Description":"Create volumes; if creation fails, delete the created volumes. Send an email about the workflow status",
   "Steps":[
      {
         "StepId":"Start",
         "Next":{
            "Default":"GoBig"
         }
      },
      {
         "StepId":"GoBig",
         "OpName":"com.emc.storageos.model.orchestration.internal.BlockServiceCreateVolume",
         "Description":"Create Volumes",
         "Type":"ViPR REST API",
         "Input":{
            "size":{
               "Type":"InputFromUser",
               "FriendlyName":"CreateVolume Size",
               "Required":"true",
               "Default":"1GB",
               "AssetValue":"",
               "Group":"Provisioning",
               "LockDown":""
            },
            "name":{
               "Type":"InputFromUser",
               "FriendlyName":"Create Volume Name",
               "Required":"true",
               "Default":"Mozart-Vol",
               "AssetValue":"",
               "Group":"Provisioning",
               "LockDown":""
            },
            "count":{
               "Type":"InputFromUser",
               "FriendlyName":"Num of volumes to create",
               "Required":"true",
               "Default":"1",
               "AssetValue":"",
               "Group":"Provisioning"
            },
            "varray":{
               "Type":"AssetOption",
               "FriendlyName":"Varray",
               "Required":"true",
               "Default":"urn:storageos:VirtualArray:9ff1c466-4f17-4d0a-aaf9-df9cb06cfde0:vdc1",
               "AssetValue":"asset.option.varray",
               "Group":"Controller"
            },
            "vpool":{
               "Type":"AssetOption",
               "FriendlyName":"Vpool",
               "Required":"true",
               "Default":"urn:storageos:VirtualPool:8b81adcd-91c8-422a-bc2d-6d245db66998:vdc1",
               "AssetValue":"asset.option.vpool",
               "Group":"Controller"
            },
            "project":{
               "Type":"AssetOption",
               "FriendlyName":"Project",
               "Required":"true",
               "Default":"urn:storageos:Project:51d2cc03-62ad-4e7e-92b1-e60cf614c84f:global",
               "AssetValue":"asset.option.project",
               "Group":"Controller"
            },
            "consistencyGroup":{
               "Type":"InputFromUser",
               "FriendlyName":"consistency group",
               "Required":"false",
               "Default":"",
               "Group":"Controller"
            },
            "computeResource":{
               "Type":"InputFromUser",
               "FriendlyName":"compute resource",
               "Required":"false",
               "Default":"",
               "Group":"Controller"
            }
         },
         "Output":{
            "CreatedVols":"task.resource.id"
         },
         "StepAttribute":{
            "WaitForTask":true,
            "Timeout":"60"
         },
         "SuccessCriteria":"#task_state == 'pending'",
         "Next":{
            "Default":"WinBig",
            "FailedStep":"WE123"
         }
      },
      {
         "StepId":"WE123",
         "OpName":"deleteVolumes",
         "Description":"Delete the volumes",
         "Type":"ViPR REST API",
         "Input":{
            "id":{
               "Type":"FromOtherStepOutput",
               "FriendlyName":"Volumes to be deleted",
               "OtherStepValue":"GoBig.id"
            }
         },
         "Output":{
            "DeletedVols":"task.resource.id"
         },
         "SuccessCriteria":null,
         "Next":{
            "Default":"WinBig"
         }
      },
      {
         "StepId":"WinBig",
         "OpName":"com.emc.storageos.model.orchestration.internal.LocalAnsible",
         "Description":"Generic Shell Primitive",
         "Type":"Ansible Script",
         "Input":{
            "email":{
               "Type":"InputFromUser",
               "FriendlyName":"Email Send To",
               "Required":"true",
               "Default":"noReply@dell.com",
               "Group":"Others"
            },
            "Subject":{
               "Type":"InputFromUser",
               "FriendlyName":"Subject of SendEmail",
               "Default":"Sending Email…",
               "Group":"Others"
            },
            "CreatedVolumes":{
               "Type":"FromOtherStepOutput",
               "FriendlyName":"created Volumes",
               "OtherStepValue":"GoBig.CreatedVols"
            },
            "Deletedvolumes":{
               "Type":"FromOtherStepOutput",
               "FriendlyName":"Deleted Volumes",
               "Required":"false",
               "OtherStepValue":"WE123.DeletedVols"
            }
         },
         "Output":null,
         "SuccessCriteria":null,
         "Next":{
            "Default":"End"
         }
      },
      {
         "StepId":"End"
      }
   ]
}


Sample JSON for createVolume/exportToHost

Example request/response for CreateVolume/ExportVolumeToHost

Workflow results

The OE_Runner needs to return data after completion of each step as well as the WF itself.

Results should support

    • status (success, failure)
    • log messages (for UI)
    • response from step (stdout from Ansible, or HTTP response for REST calls)

By default, OE Runner should generate

    • UI Log message when Workflow starts/ends
    • UI Log message when step starts/ends
    • Result state of steps & workflow (e.g.: pending, success, failed)
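One way to model the per-step results listed above; this is an illustrative structure under assumed field names, not the committed schema.

```python
# Sketch: a per-step result record covering the fields listed above
# (result state, UI log messages, raw response). Illustrative only.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class StepResult:
    step_id: str
    state: str = "pending"            # pending, success, failed
    logs: List[str] = field(default_factory=list)
    response: Optional[str] = None    # stdout (Ansible) or HTTP body (REST)

    def finish(self, state, response=None):
        self.state = state
        self.response = response
        self.logs.append("step %s ended: %s" % (self.step_id, state))

r = StepResult("GoBig")
r.logs.append("step GoBig started")
r.finish("success", response='{"task": {"state": "ready"}}')
```

The start/end log lines correspond to the default UI log messages the OE Runner should generate.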

Extended Workflows

The OE steps will either execute a REST call or an Ansible playbook.  Ansible playbooks can be run in 2 modes:

Local Execution Mode:  OE Step execution is performed locally in the OE (which provides capabilities to make requests out to external systems).

Remote Execution Mode: OE Steps may run processes and/or scripts on external systems themselves.  Ansible will handle running the playbook on those nodes.  

Tools will be provided to allow users to create WF steps for both execution modes.
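The two execution modes could translate to `ansible-playbook` invocations like the following. The `-i` (inventory, with a trailing comma for an inline host list) and `-c` (connection plugin) flags are standard Ansible CLI options; the wrapper function itself is a sketch.

```python
# Sketch: build the ansible-playbook command line for the two modes.
# Local mode targets localhost with the "local" connection plugin;
# remote mode targets managed nodes over SSH (Ansible's default).
# The wrapper is illustrative; the flags are standard ansible-playbook options.

def ansible_command(playbook, mode, hosts=None):
    if mode == "local":
        return ["ansible-playbook", playbook,
                "-i", "localhost,", "-c", "local"]
    if mode == "remote":
        inventory = ",".join(hosts) + ","   # trailing comma = inline inventory
        return ["ansible-playbook", playbook, "-i", inventory]
    raise ValueError("unknown mode: %s" % mode)
```

An OE step would pass the uploaded playbook path and, for remote mode, the managed node addresses collected from step inputs.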

Persistence model

Primitives

Each primitive type will have a DB model class.  When the user creates a new primitive it will be saved in the ViPR DB for use in one or many workflows.  

OEPrimitive

The base primitive is an abstract Java class from which the other primitive types are derived.  It is not a column family, but it contains the base elements of a primitive.

OEParameter
OEParameterList
OERestCall
LocalAnsible

LocalAnsible represents an ansible playbook that can be called locally.

RemoteAnsible

Workflow document

The workflow document will be a JSON document that contains the workflow inputs/outputs and the list of steps with their inputs and outputs.

Order execution

Order

Modify the existing column family to include a field for the WF JSON document

OrderParameter

Should be able to leverage this CF to save input parameters as strings

More details at: Project "Mozart" - Persistence Model

Why build our own?

Why not RackHD?

    • Too heavy (lots of changes)
    • Difficult to separate WF engine from rest of RackHD components
    • No graphical UI front end
    • Plan to leverage Ansible anyway (as RackHD does)

Why not other BPMN tool?

    • None found that satisfy requirements and ease of use
    • Too many steps for end user
    • Lack of control (look & feel, architecture)

Open Source Considerations

Currently the WF Builder part in the UI is planned to be an OSS component that we will customize.  No other new OSS packages are currently expected.

Note: Ansible 1.9.4 is latest supported in SLES repo, but we'd like to provide Ansible 2.x

ASD OSS Procedures

Update:  Ansible approved.  See AOS-411

Exclusions and Limitations

Fail-over recovery: if a node fails, existing tasks will fail and will not be restarted on other nodes, and the workflow will fail (but the workflow may be restarted by the client/user and completed if other nodes in the cluster are available)

Future Work and Related Projects

Other systems like RackHD also have functionality to provision bare metal servers, which may be incorporated in Vblock features at a later time 

Modification of the underlying API calls to be more granular would enable Mozart to execute workflows that more closely resemble the underlying workflows inside the controller itself (i.e.: the workflow engine that governs storage operations below the API).

Implementation Strategy

Implementation Timeframe

    • In Progress

Virtual Team

    • Dev - Steve Mendes, Shane Sullivan, Sonali Sahu, Keerthi Bala 

    • QE  - TBD

    • UXD - Hanna Yehuda and Mindy Villaran

Testing Strategy

Requires QE to develop test plan.  Since this is a somewhat open-ended engine which will be used to develop workflows for customers (and eventually 'by' customers) it would be preferable to test the 'process' of creating workflows.  Attention needs to be given to what kinds of workflows will be included for testing.

It is also assumed that any example workflows that ship will require testing.

The Orchestration project will run a series of usability and operability tests called the "internal beta". This will be run as part of the implementation process, unlike external betas, which are done after complete functionality development and significant testing.

Internal Betas will be used to validate targeted workflows involving Storage and Compute and the complete end-user experience. 

Internal Beta 1 - Target - Engineering team members who have knowledge of existing workflow capabilities and integration with the CoprHD UI layer

Internal Beta 2 - Target - External teams, such as Solutions Engineering and Professional Services, who have more customer exposure writing workflows in IT environments

Documentation Strategy

Documentation assistance will come in a number of forms:

    1. OE Developers Guide - A document that describes features, functionality & the process for creating workflows using the OE
    2. Sample Workflows - Sample workflows will be provided, allowing users to see examples of how to implement various workflows
    3. External documentation links - since 3rd party components are used, references & links to documentation for components like Ansible will be provided

Updates to existing documentation may include:

    • Architectural documentation
    • Deployment documentation
    • Support Matrix
    • Upgrade Procedures
    • 3rd Party Product components/declarations

Impact Evaluation

Public APIs and public functionality (end-user impact)

Catalog Services will display an additional OE category and its base service.

Existing Catalog API will support OE Services

Other components and component interaction

New components will be added to the appliance; see the previous diagrams for their interactions.

Persistence model

OE Services will persist the WF definitions, as well as the primitives available when creating new WFs, in Cassandra.

Catalog state is already stored, and the OE service will use that existing mechanism to persist the services added to the catalog via the Edit Catalog feature.

Upgrades

Initial upgrade: adds new DB column families and code. No issues are expected here.

Future upgrades:

    • Ansible
      • Playbooks that were uploaded need to be preserved
      • Upgrades to Ansible itself may break existing playbooks; backward compatibility must be retained
    • Name/ID conflicts
      • If an upgrade contains new services/playbooks, handle possible conflicts with user-created workflows
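Conflict detection on upgrade could start as a simple name-set comparison; a minimal sketch, assuming workflows are keyed by name (the data shapes and function name are illustrative, not the actual design):

```python
def find_conflicts(shipped: dict[str, str], user_defined: dict[str, str]) -> list[str]:
    """Return workflow names present both in the upgrade payload and in
    the user's existing workflows (each dict maps name -> workflow ID)."""
    return sorted(set(shipped) & set(user_defined))
```

Conflicting names could then be surfaced to the user for rename or merge, rather than silently overwriting user-created workflows.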

Performance

Disk Storage

Playbook script files will be stored, and the number of scripts is assumed to grow over time. A size limit per file may be needed; any restrictions on storing files need to be investigated.
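A per-file size limit could be enforced at upload time; a minimal sketch, assuming a 1 MiB cap (the limit, function, and exception names are illustrative assumptions, not decided values):

```python
MAX_PLAYBOOK_BYTES = 1 * 1024 * 1024  # hypothetical per-file limit

class PlaybookTooLargeError(ValueError):
    """Raised when an uploaded playbook exceeds the per-file limit."""

def validate_playbook_upload(name: str, data: bytes) -> bytes:
    """Reject playbook uploads that exceed the size limit."""
    if len(data) > MAX_PLAYBOOK_BYTES:
        raise PlaybookTooLargeError(
            f"{name}: {len(data)} bytes exceeds {MAX_PLAYBOOK_BYTES}-byte limit")
    return data
```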

It is not expected that a large number of workflows will execute simultaneously, so CPU load is not expected to be greatly impacted, but the increase will need to be accounted for when sizing resources. Services will distribute work across the cluster.

Memory

No great impact is expected.

Service Descriptors are currently stored in ZK. Since there will be more of these, it will be necessary to check the impact, although there will be no more of them than there are services, so the impact should be limited.

Functionality

AssetOptionsProvider

Currently, a different provider is used for each field type, and these can run in parallel. To eliminate the need for users to compile code when adding new workflows, a single provider will be used for OE integration. This provider currently handles requests serially; an effort will be made to investigate running requests in parallel.
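One way the single OE provider could move from serial to parallel handling is a thread pool over the per-field lookups; a hedged sketch (`fetch_options` stands in for whatever lookup the real provider performs, such as a REST call to an external system):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_options(field: str) -> list[str]:
    # Stand-in for the per-field lookup (e.g. a REST call); illustrative only.
    return [f"{field}-option-1", f"{field}-option-2"]

def fetch_all_options(fields: list[str]) -> dict[str, list[str]]:
    """Resolve drop-down options for many form fields concurrently
    instead of one at a time."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(fetch_options, fields))
    return dict(zip(fields, results))
```

Because the lookups are I/O-bound (network calls), a thread pool is usually sufficient; results come back keyed by field so the form can be populated as today.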

Scalability and resource consumption

OE is distributed across all nodes and scales as the cluster does.

The load presented by running workflows is not expected to be significant, but this should be verified in testing.

Concurrency of workflows will follow same rules/behavior as existing catalog services.

Workload Distribution:

    1. For Service Catalog forms that retrieve drop-down menu options dynamically, different fields may be populated by different nodes
    2. An OE order submitted on one node will get routed through nginx to an OE on any of the nodes in the cluster
    3. OE workflow execution will be contained to the node where it was initiated
    4. OE Tasks that call back to API will go to VIP and be distributed across nodes (like any API call)
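The per-order routing in step 2 can be pictured as round-robin balancing; a toy model of the nginx behavior (not actual configuration; the node names are illustrative):

```python
from itertools import cycle

class RoundRobinBalancer:
    """Toy model of nginx-style round-robin distribution of OE orders
    across cluster nodes."""
    def __init__(self, nodes: list[str]):
        self._nodes = cycle(nodes)

    def route(self) -> str:
        """Pick the next node in rotation for an incoming order."""
        return next(self._nodes)
```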

All requests involved in executing workflows route through nginx and are load balanced/distributed. In the diagram below:

    • Service Catalog on node-1 starts workflow, which runs OE Step (as a Task) on node-3 (blue)
    • OE Task sends API request to node-2 (red)

Security

Areas to address:

    • Workflows that interface with external systems will need open communication paths for:
      • REST (most common, hopefully)
      • SSH (where Ansible is running playbooks remotely against external systems)
      • Other protocols required by custom scripts that we cannot know of ahead of time
    • Stored credentials, including:
      • Credentials stored in Ansible playbooks (guidelines on using Ansible Vault will be provided)

Deployment and serviceability

A troubleshooting guide for installation will be provided.

Developer impact, dependencies and conditional inclusion

Developer skills may include:

    • Ansible Playbook Development - for workflow implementers

Reviewed and Approved By

Name | Date | Vote | Comments, remarks, etc.
    • The members of the CoprHD Technical Steering Committee (TSC) have been automatically added to your reviewer list
      • A majority of them must approve this document before the feature can be merged to a project integration branch (e.g. integration-*, release-*, master)
      • Please un-check the boxes next to their names when your document is ready for their review
    • Please refer to Design Approval Guidance for information on additional reviewers who should be added and the approval process



8 Comments

  1. Given the overall scope of the project, should we split it into multiple epics around

    a) The declarative language constructs and Orchestration Engine itself

    b) The WF builder

    c) Trial use cases for e.g. NDM

    d) Ansible or any other scripting Devops toolkit integration

  2. Do we think they could get delivered separately?

  3. I don't see how you cannot deliver a) b) and c) as a unit. Maybe ansible is separate?

  4. Does each workflow run as a job/task in an isolated environment (e.g., a container) in order to avoid impacts of dependencies between the workflows/tenants/libraries/files/versions and the ViPR C host? An option is to exec them from within a container/chroot capsule and clean up used capsules hourly/daily.

    1. Each Task runs as a ViPR Task. For Ansible/Script Tasks we are planning to use chroot or a similar approach, with its workspace inside an order context dir, which will be cleaned after execution completes, or periodically. This will also give us some control to stop vulnerable commands from running on the ViPR node.

      For now we are not planning to run these inside a container, but we are still exploring.

      Thank you for your input. Once we decide on something, I will definitely let you know before implementing.

      1. Sonali Sahu: chroot is probably the quickest/simplest way, but I don't think that is the best approach...

        Adopting containers should be the way to go. A proposal follows:

        • Phase1:
          • Create ENV Variables of two types: Protected (Password) and Unprotected (Generic).
          • Create Container image (coprhddevkit?) and make it available in CoprHD Controller (https://coprhd/executor-image.tgz).
          • Create :
            • 1: Executor wrapper inside of the container to notify ZK/CoprHD about the status of the on-going/running tasks.
            • or
            • 2: Poller (Poke/Polling) to check status of launched/scheduled tasks.
          • Create TaskRunner's ExecutorDrivers.
            • The same approach of existing ones for Operating Systems.
          • Create ExecutorDriver for Docker (API Call).
          • TaskRunner calls ExecutorDriverDocker with localhost as target.
        • Phase2:
          • Create ExecutorEndpoints and associate the proper ExecutorDriver.
            • ex: 10.0.0.1 = docker
          • Group ExecutorEndpoints as Clusters
            • Round-Robin among ExecutorEndpoints
          • Association/Affinity of Workflows and ExecutorEndpoints
        • Phase3:
          • Additional ExecutorDrivers: Swarm, Kubernetes, others...


        This approach would fit almost every scenario and extend CoprHD's capabilities to include not only actions executed remotely, but also long-running tasks that might be useful for running additional services, e.g., Rack Controller services like DHCP, TFTP, etc.

        Any comments are welcome.

  5. Victor - last I knew (I haven't been involved on a daily basis for a number of weeks), all the workflow steps run as ViPR ExecutionTasks, like all other SA services. sullivan, shane might be able to describe the details of how Ansible is exec'd by the task, and whether it is isolated.