
Setting up a workflow

When predictions are missing or have low confidence, a human in the loop may be needed. In this guide we will cover the following topics:

Creating a workflow

First we will learn how to deploy and configure a default workflow for a model.

Starting a workflow execution

Then we will show how the workflow can be executed using Cradl's CLI and SDKs.

Exporting documents

Finally, we will show how documents can be exported by configuring a webhook.


Prerequisites

In order to follow this guide, make sure that you have:

  1. Installed the command-line interface (CLI)
  2. Created and downloaded API credentials
  3. Created a model

Only available for paid plans

You must be on one of the paid plans in order to use the workflow functionality.

Creating a workflow

Now we will set up a default workflow for a model. First, we need the ID of the model we want to create the workflow for. The model ID can be copied from the Overview tab of your model, or you can look it up using the CLI or SDKs:

$ las models list 
{
"models": [
{
"modelId": "las:model:<my-model-id>",
"name": "My invoice model",
"description": "A brand new model for reading invoices",
...
},
]
}

Now that we have our model ID, let's create a workflow:

$ las workflows create-default --from-model-id las:model:<my-model-id>  <name>

Creating secrets ... Done.
Creating assets ... Done.
Creating datasets ... Done.
Creating transitions ... Done.
Creating workflow ... Done.

{
"workflowId": "las:workflow:<my-workflow-id>",
"createdTime": "2022-01-01T12:00:00.000000+0000",
...
}

In the next section, we'll take a closer look at what our auto-generated workflow does.

info

Auto-generated workflows are currently only supported by the CLI.

Workflows, transitions and executions

A workflow is defined by a series of transitions that mutate the state of your workflow execution. There are two types of transitions: a manual transition, which mutates the state based on input from a user, and a docker transition, which mutates the state programmatically in a Docker container. At each step in the workflow, the current state is given as input to the transition, and the output of the transition becomes the new state. When a new workflow execution is created, an initial state is provided.
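
To make the state-passing model concrete, here is a minimal sketch in plain Python. It does not use Cradl's SDK or reflect how Cradl runs transitions internally; the names run_workflow, add_prediction and approve are hypothetical stand-ins, and the state contents are made up for illustration.

```python
from typing import Callable, Dict, List

State = Dict[str, object]
Transition = Callable[[State], State]


def run_workflow(transitions: List[Transition], initial_state: State) -> State:
    """Feed the state through each transition in order and return the final state."""
    state = dict(initial_state)
    for transition in transitions:
        # The current state is the input; the return value becomes the new state.
        state = transition(state)
    return state


def add_prediction(state: State) -> State:
    # Hypothetical stand-in for a docker transition: changes the state in code.
    return {**state, "prediction": {"total_amount": "100.00"}}


def approve(state: State) -> State:
    # Hypothetical stand-in for a manual transition: a real one waits for a user.
    return {**state, "verified": True}


final_state = run_workflow(
    [add_prediction, approve],
    {"documentId": "las:document:<document-id>"},  # the initial state
)
print(final_state)
```

Each transition only sees the state handed to it, which is why the initial state has to contain everything the first transition needs.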

Starting a workflow execution

Let's test our new workflow by creating a new workflow execution. The initial state of the workflow execution is provided as a JSON object, and the workflow we just generated assumes that the initial state is of the form {"documentId": "las:document:<document-id>"}.

$ las documents create mydocument.pdf > input.json

$ las workflows execute las:workflow:<my-workflow-id> input.json
{
"workflowId": "las:workflow:<my-workflow-id>",
"executionId": "las:workflow-execution:<my-execution-id>",
...
}
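
Because the generated workflow expects a documentId in the initial state, it can be useful to sanity-check input.json before passing it to the CLI. The following is a small sketch in plain Python; the helper name is_valid_initial_state and the exact pattern are assumptions for illustration, based only on the las:document:<document-id> form shown above.

```python
import json
import re

# Assumption: document IDs start with "las:document:" followed by some identifier.
DOCUMENT_ID_PATTERN = re.compile(r"^las:document:.+$")


def is_valid_initial_state(state: object) -> bool:
    """Return True if state is a JSON object with a plausible documentId."""
    if not isinstance(state, dict):
        return False
    document_id = state.get("documentId")
    return isinstance(document_id, str) and bool(DOCUMENT_ID_PATTERN.match(document_id))


# Example with an inline JSON string; in practice this would be the contents
# of input.json.
example = json.loads('{"documentId": "las:document:<document-id>"}')
print(is_valid_initial_state(example))
```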

Now that we have created a workflow execution, the following steps are executed:

  1. The initial state will be provided as input to the first transition (Preprocess). The transition will create a Prediction on the provided document.
  2. If the confidence of any field predicted is below a certain threshold, a manual transition will be invoked so that an end user can validate that the predictions are correct.
  3. In the last transition (Postprocess), the ground truth of the document is updated and the document is assigned to a Dataset so that it can be used for training. This transition is also responsible for exporting the document.
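
The routing decision in step 2 can be sketched as follows. This is an illustration only: the threshold value, the prediction layout (a list of fields with label, value and confidence) and the transition name "Manual" are assumptions, not taken from the generated workflow.

```python
from typing import Dict, List

# Hypothetical value; the actual threshold used by the workflow is not stated here.
CONFIDENCE_THRESHOLD = 0.9


def needs_manual_review(prediction: List[Dict], threshold: float = CONFIDENCE_THRESHOLD) -> bool:
    """Return True if any predicted field has a confidence below the threshold."""
    return any(field["confidence"] < threshold for field in prediction)


def planned_transitions(prediction: List[Dict]) -> List[str]:
    """List the transitions an execution would pass through for this prediction."""
    steps = ["Preprocess"]
    if needs_manual_review(prediction):
        steps.append("Manual")  # an end user validates the predictions
    steps.append("Postprocess")
    return steps


confident = [{"label": "total_amount", "value": "100.00", "confidence": 0.99}]
uncertain = [{"label": "total_amount", "value": "100.00", "confidence": 0.42}]
print(planned_transitions(confident))
print(planned_transitions(uncertain))
```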

Exporting documents

To customize the exporting functionality, we have two options:

  1. Use one of the default export options (webhooks or file export)
  2. Write a custom Docker image

In this section, we will cover how to use the default export options. Before we get started, make sure to find the ID of the Postprocess-transition:

$ las transitions list
[
{
"name": "Postprocess transition for workflow [..]",
"transitionId": "las:transition:<transition-id>",
...
}
]
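
If the account has many transitions, picking out the right one by eye gets tedious. As a rough sketch, the listing can be filtered by name in Python. This assumes the listing has been parsed into a list of objects with name and transitionId keys, as in the output above; the helper name find_transition_ids and the sample data are hypothetical.

```python
from typing import Dict, List


def find_transition_ids(transitions: List[Dict], name_prefix: str) -> List[str]:
    """Return the IDs of all transitions whose name starts with name_prefix."""
    return [
        transition["transitionId"]
        for transition in transitions
        if transition.get("name", "").startswith(name_prefix)
    ]


# Hypothetical sample data shaped like the listing above.
listing = [
    {"name": "Preprocess transition for workflow A", "transitionId": "las:transition:1"},
    {"name": "Postprocess transition for workflow A", "transitionId": "las:transition:2"},
]
print(find_transition_ids(listing, "Postprocess"))
```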

Alternative 1: Configuring a webhook

To configure the Postprocess-transition to use a webhook, we need to set the environment variable WEBHOOK_URI on it. Updating environment variables overwrites any existing environment variables, so make sure to include the existing ones in the update. First, look up the current values:

$ las transitions get las:transition:<transition-id>
{
"transitionId": "las:transition:<transition-id>",
"name": "Postprocess transition for workflow [...]",
"transitionType": "docker",
...
"parameters": {
"environment": {
"DATASET_ID": "las:dataset:<dataset-id>",
"FORM_CONFIG_ASSET_ID": "las:asset:<asset-id>",
"MODEL_ID": "las:model:<model-id>"
},
...
},
...
}

Copy the old environment variables plus our new environment variable to a new file called env.json:

$ echo '{
"DATASET_ID": "las:dataset:<dataset-id>",
"FORM_CONFIG_ASSET_ID": "las:asset:<asset-id>",
"MODEL_ID": "las:model:<model-id>",
"WEBHOOK_URI": "https://my.webhook.com/a"
}' > env.json
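
Copying the variables by hand is easy to get wrong, so the merge can also be scripted. The sketch below assumes the transition has the shape shown in the output of las transitions get above (parameters.environment); the helper name merge_environment is hypothetical. It prints the merged JSON, which can be redirected to env.json.

```python
import json
from typing import Dict


def merge_environment(existing: Dict[str, str], updates: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the existing variables with the updates applied on top."""
    merged = dict(existing)
    merged.update(updates)
    return merged


# Shaped like the transition returned above, with the same placeholder values.
transition = {
    "parameters": {
        "environment": {
            "DATASET_ID": "las:dataset:<dataset-id>",
            "FORM_CONFIG_ASSET_ID": "las:asset:<asset-id>",
            "MODEL_ID": "las:model:<model-id>",
        }
    }
}

env = merge_environment(
    transition["parameters"]["environment"],
    {"WEBHOOK_URI": "https://my.webhook.com/a"},
)
print(json.dumps(env, indent=2))
```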

Now we update the environment variables for the Postprocess-transition:

$ las transitions update las:transition:<transition-id> --environment env.json

The next time the Postprocess-transition is run, the result of the workflow execution will be posted to the specified URL.
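
For local testing, a throwaway receiver can help confirm that something arrives at the URL. The sketch below uses only Python's standard library and assumes the result is sent as an HTTP POST with a JSON body; the payload format is not specified in this guide, so the handler simply stores whatever JSON it receives. The names WebhookHandler, make_server and received are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # payloads collected so far, kept in memory for inspection


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            payload = json.loads(body)
        except ValueError:
            # Body was not valid JSON (assumption: JSON is expected).
            self.send_response(400)
            self.end_headers()
            return
        received.append(payload)
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the console quiet


def make_server(port: int = 8000) -> HTTPServer:
    """Create a receiver bound to localhost; call serve_forever() to run it."""
    return HTTPServer(("127.0.0.1", port), WebhookHandler)
```

Calling make_server(8000).serve_forever() starts the receiver. It only listens on localhost, so it would need to be exposed on a publicly reachable URL before it could be used as WEBHOOK_URI.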

Alternative 2: File export using SSH/SCP

Coming soon.
