This page is intended for CoprHD developers who are working with new or existing workflows. It provides a reference of Do's and Don'ts for producing a clean workflow. The material is based on findings from the Rollback Quality project and therefore also covers recommended testing strategies using a failure injection framework.
Recommended reading: Workflow Service
Design Patterns (Do's)
This section describes various design patterns to consider for your workflows.
Task Completer Best Practices
The following patterns are recommended when implementing or updating TaskCompleter instances.
Allow early access to TaskCompleters
This is especially important for rollback workflow steps, where the rollback method is a wrapper for one or more existing "happy-path" workflow methods that perform the inverse operation. For example, "createVolumes" would roll back with "rollbackCreateVolumes", which in turn delegates to a method for deleting volumes.
#deleteVolumes is the "happy-path" workflow step method for a delete request. Because the delete operation is also shared with a rollback operation, we simply create the relevant task completer here and delegate.
Finally, #deleteVolumesWithCompleter no longer has to create its own task completer. The task completer passed in could be coming from either a "happy path" or rollback context.
The intended benefit of this pattern is a single task completer instance for all scenarios of an operation. This helps ensure that a task is completed exactly once, or, from the point of view of the current thread, not at all if an asynchronous thread has assumed responsibility (see section Asynchronous Steps).
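The delegation described above can be sketched as follows. This is a minimal illustration, not the CoprHD implementation: SketchTaskCompleter, VolumeOperations and deleteOnArray are hypothetical stand-ins, and the real TaskCompleter methods take additional arguments that are omitted here.

```java
import java.util.List;

// Hypothetical, simplified stand-in for a TaskCompleter (not the real CoprHD class).
class SketchTaskCompleter {
    private final String taskId;
    private boolean completed;

    SketchTaskCompleter(String taskId) { this.taskId = taskId; }

    void ready() { completed = true; }
    void error(String message) { completed = true; }
    boolean isCompleted() { return completed; }
    String getTaskId() { return taskId; }
}

class VolumeOperations {
    // "Happy-path" delete step: create the completer here, then delegate.
    SketchTaskCompleter deleteVolumes(List<String> volumeIds, String taskId) {
        SketchTaskCompleter completer = new SketchTaskCompleter(taskId);
        deleteVolumesWithCompleter(volumeIds, completer);
        return completer;
    }

    // Rollback for createVolumes: reuse the delete logic with the rollback's completer.
    void rollbackCreateVolumes(List<String> volumeIds, SketchTaskCompleter rollbackCompleter) {
        deleteVolumesWithCompleter(volumeIds, rollbackCompleter);
    }

    // Shared worker: never creates a completer, only completes the one it is given.
    void deleteVolumesWithCompleter(List<String> volumeIds, SketchTaskCompleter completer) {
        try {
            for (String id : volumeIds) {
                deleteOnArray(id);
            }
            completer.ready();
        } catch (RuntimeException e) {
            completer.error(e.getMessage());
        }
    }

    // Placeholder for the device call; a real step would talk to the array here.
    void deleteOnArray(String volumeId) { }
}
```

Because the shared method only ever sees the completer it was handed, the same completion logic runs whether the call came from the delete request or from rollback.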
Responsibilities of a TaskCompleter
When completing a TaskCompleter instance, it should be considered an opportune time to perform any database operations that:
- Establish or destroy relationships between objects.
- Create, update or deactivate objects.
Rather than having these database operations strewn over the course of a workflow step, it can be convenient to have the TaskCompleter assume these responsibilities in a single location.
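As a rough sketch of this idea, the completer below performs the database update when it is completed. SketchDb, SketchVolumeRecord and SketchVolumeDeleteCompleter are hypothetical names, and the in-memory map is an assumption standing in for the real database client.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical persisted object with an "inactive" flag.
class SketchVolumeRecord {
    boolean inactive;
}

// Hypothetical in-memory stand-in for the database client.
class SketchDb {
    final Map<String, SketchVolumeRecord> volumes = new HashMap<>();
}

// Completer that owns the database changes for a delete operation.
class SketchVolumeDeleteCompleter {
    private final List<String> volumeIds;

    SketchVolumeDeleteCompleter(List<String> volumeIds) {
        this.volumeIds = volumeIds;
    }

    // Success: deactivate the volume records in one place.
    void ready(SketchDb db) {
        for (String id : volumeIds) {
            SketchVolumeRecord record = db.volumes.get(id);
            if (record != null) {
                record.inactive = true;
            }
        }
    }

    // Failure: leave the records untouched so they still reflect the array.
    void error(SketchDb db, String message) {
        // Intentionally no database changes in this sketch.
    }
}
```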
Detect and report rollback failures
Failures that occur during rollback have the potential to leave behind provisioned resources and should be reported to the user. For example, a volume creation rollback may fail to remove a volume and complete the task with an error state.
In order to detect a failure during rollback, we can use TaskCompleter#isRollingBack in the error case.
We can then update the service code message accordingly.
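A minimal sketch of the check, assuming a simplified completer: SketchVolumeCreateCompleter and its message strings are hypothetical, and the real error method takes a service code rather than a plain string.

```java
// Hypothetical completer that reports rollback failures differently from normal failures.
class SketchVolumeCreateCompleter {
    private final boolean rollingBack;
    private String statusMessage;

    SketchVolumeCreateCompleter(boolean rollingBack) {
        this.rollingBack = rollingBack;
    }

    boolean isRollingBack() {
        return rollingBack;
    }

    // Error case: choose the message according to the rollback state.
    void error(String cause) {
        if (isRollingBack()) {
            // Rollback itself failed, so resources may have been left behind.
            statusMessage = "Rollback failed, resources may need manual cleanup: " + cause;
        } else {
            statusMessage = "Operation failed: " + cause;
        }
    }

    String getStatusMessage() {
        return statusMessage;
    }
}
```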
Complete only once
A task completer can be asked about its completion state via TaskCompleter#isCompleted. This property is set to true in the #error and #ready methods in TaskCompleter, and so it's important to ensure a chain of "super" calls within the TaskCompleter inheritance hierarchy will reach these implementations if it's to be used reliably. Alternatively, a sub-class may set this property directly.
Note that although it is possible to write a TaskCompleter in an idempotent manner, the ability to query the completed state may still be useful when the TaskCompleter instance is being passed into other methods with uncertain exception handling.
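The chain of "super" calls and the completed-state check might look like the sketch below. The class names are hypothetical and the method signatures are simplified; only the isCompleted idea is taken from the text above.

```java
// Hypothetical base class: the completed flag is set in ready() and error().
class SketchBaseCompleter {
    private boolean completed;

    void ready() { completed = true; }
    void error(String message) { completed = true; }
    boolean isCompleted() { return completed; }
}

// Hypothetical subclass that adds its own work but still reaches the base implementation.
class SketchExportCompleter extends SketchBaseCompleter {
    int readyCalls;

    @Override
    void ready() {
        readyCalls++;   // subclass-specific work would go here
        super.ready();  // without this call, isCompleted() would stay false
    }
}

class CompleteOnceExample {
    // Callers with uncertain exception handling can guard on the completed state.
    static void finish(SketchBaseCompleter completer) {
        if (!completer.isCompleted()) {
            completer.ready();
        }
    }
}
```

Calling finish twice on the same instance runs the subclass work only once, because the second call sees that the completer is already completed.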
Relinquishing TaskCompleter responsibilities
Workflow steps that hand off their task completer to an asynchronous job should immediately relinquish the responsibility of completing the task completer, once the job has been successfully queued.
After an asynchronous job has been queued, there may be subsequent code to be executed that is subject to failure. In the event of an exception being thrown, an exception handler must be able to recognize if an asynchronous job has assumed task responsibility, otherwise the task may be completed whilst the asynchronous job is still queued or executing. This can result in a running command from the CLI exiting prematurely, or an order page in the UI incorrectly informing the user a workflow has completed.
A task completer can be asked about its asynchronous-related state via TaskCompleter#isAsynchronous. This property is automatically set to true when an associated job is placed into the Dispatcher queue. Surrounding code should then conditionally complete the task completer based on the result of TaskCompleter#isAsynchronous.
The following example is taken from an SMI-S delete volume scenario, where an asynchronous job tracks the running operation on the SMI-S provider.
In the exception catch block, we can now conditionally complete the task completer. If it has been marked as asynchronous, it is the asynchronous job's responsibility to handle the task status.
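The conditional completion can be sketched as follows. This is not the SMI-S code itself: SketchAsyncCompleter, SketchJobQueue and DeleteVolumeStep are hypothetical stand-ins, and the two boolean parameters exist only to simulate failures before and after queueing.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical completer exposing the asynchronous flag.
class SketchAsyncCompleter {
    private boolean asynchronous;
    private boolean completed;

    void setAsynchronous(boolean value) { asynchronous = value; }
    boolean isAsynchronous() { return asynchronous; }
    void error(String message) { completed = true; }
    boolean isCompleted() { return completed; }
}

// Hypothetical stand-in for the job queue; queueing marks the completer asynchronous.
class SketchJobQueue {
    private final Queue<SketchAsyncCompleter> jobs = new ArrayDeque<>();

    void queueJob(SketchAsyncCompleter completer) {
        jobs.add(completer);
        completer.setAsynchronous(true);
    }

    int size() { return jobs.size(); }
}

class DeleteVolumeStep {
    static void deleteVolume(SketchJobQueue queue, SketchAsyncCompleter completer,
                             boolean failBeforeQueueing, boolean failAfterQueueing) {
        try {
            if (failBeforeQueueing) {
                throw new IllegalStateException("provider call failed");
            }
            queue.queueJob(completer);
            if (failAfterQueueing) {
                throw new IllegalStateException("post-queue bookkeeping failed");
            }
        } catch (RuntimeException e) {
            // Only complete here if no asynchronous job has taken over the task.
            if (!completer.isAsynchronous()) {
                completer.error(e.getMessage());
            }
        }
    }
}
```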
During workflow execution, each step has the option to create a context object associated with itself. The context object is an arbitrary object that implements the Serializable interface and is persisted in Zookeeper.
A context object can be useful when a rollback step needs more data about what it should or should not do, in order to prevent data unavailability. For example, consider part of a scenario where a request is made to export a Volume to a Host:
- An Initiator is to be added to an initiator group, but is found to exist unbeknownst to the database.
- An exception occurs in the last step of the workflow, triggering a rollback.
- A rollback step associated with adding the Initiator performs the inverse remove operation.
Here, the rollback step needed more data about the Initiator before removing it. Without that data, it removed an Initiator that already existed on the array, causing data unavailability (DU). The following code snippets demonstrate how the use of a context object could have prevented this:
As its name suggests, the method handles existing Initiators in the given ExportMask as part of an export operation.
Now, when the related rollback step executes it can retrieve this context and make the decision to leave the Initiator alone:
Load the step data using the task ID.
Prepare to iterate over a list of contexts.
Cast to appropriate class.
The context references an operation stating that the Initiator was already added and makes the decision to leave it alone.
In this particular example, we're removing the Initiator from the task completer to prevent it being marked as inactive.
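The store-then-load flow described above can be sketched as follows. All class and method names here are hypothetical, and the in-memory map is an assumption standing in for the step data persisted in Zookeeper.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical context recording whether an Initiator was already on the array.
class SketchInitiatorContext implements Serializable {
    private static final long serialVersionUID = 1L;
    final String initiatorId;
    final boolean alreadyOnArray;

    SketchInitiatorContext(String initiatorId, boolean alreadyOnArray) {
        this.initiatorId = initiatorId;
        this.alreadyOnArray = alreadyOnArray;
    }
}

// Hypothetical in-memory stand-in for persisted step data.
class SketchStepDataStore {
    private final Map<String, Serializable> data = new HashMap<>();

    void storeStepData(String stepId, Serializable value) { data.put(stepId, value); }
    Object loadStepData(String stepId) { return data.get(stepId); }
}

class ExportRollbackExample {
    // Forward step: record, per Initiator, whether it already existed on the array.
    static void addInitiators(SketchStepDataStore store, String stepId,
                              List<String> requested, Set<String> existingOnArray) {
        ArrayList<SketchInitiatorContext> contexts = new ArrayList<>();
        for (String id : requested) {
            contexts.add(new SketchInitiatorContext(id, existingOnArray.contains(id)));
        }
        store.storeStepData(stepId, contexts);
    }

    // Rollback step: only Initiators that this step actually added are removed.
    static List<String> rollbackAddInitiators(SketchStepDataStore store, String stepId) {
        List<String> toRemove = new ArrayList<>();
        Object stored = store.loadStepData(stepId);
        if (stored instanceof List) {
            for (Object item : (List<?>) stored) {
                SketchInitiatorContext context = (SketchInitiatorContext) item;
                if (!context.alreadyOnArray) {
                    toRemove.add(context.initiatorId);
                }
            }
        }
        return toRemove;
    }
}
```

In this sketch the rollback returns the list of Initiators that are safe to remove. The original example instead removes the pre-existing Initiator from the task completer, which has the same effect of leaving it alone.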
A standard list of bad practices based on our findings.
How to use the failure injection framework and where failures should be invoked.