4. Jobs


This section covers the Job API resource. Jobs typically take an uploaded document as input, so read the section on uploaded documents first.

The Job Resource

Jobs are the primary resource for the API. They represent the work being done as inputs get transformed.

Jobs are created through the endpoint POST /jobs. Once created, each job has a unique ID (UUID format) as well as an owner (the user who created it). By default, jobs can only be seen, modified, and removed by their owner.

A list of accessible jobs can be retrieved from GET /jobs. The details of a particular job can be retrieved from GET /jobs/:jobId.
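The three endpoints above could be wrapped in a few small helpers, as in the sketch below. Only the paths POST /jobs, GET /jobs, and GET /jobs/:jobId come from this section. The base URL, the bearer-token header, and the function names are assumptions made for illustration, and the request body expected by POST /jobs is not specified here.

```python
import json
import urllib.request
from urllib.parse import quote

# Hypothetical base URL; substitute the real API host.
BASE_URL = "https://api.example.com"


def job_url(job_id=None):
    """Return the URL of the job collection, or of one job if job_id is given."""
    if job_id is None:
        return f"{BASE_URL}/jobs"
    return f"{BASE_URL}/jobs/{quote(str(job_id), safe='')}"


def build_request(method, url, token, body=None):
    """Build (but do not send) a JSON request. Bearer auth is an assumption."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Accept": "application/json", "Authorization": f"Bearer {token}"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def send(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With these helpers, send(build_request("GET", job_url(), token)) would list the accessible jobs, and send(build_request("GET", job_url(job_id), token)) would fetch the details of one job.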

Job Type

Each job has a type field, which uniquely identifies the backend process that will be used for the job.

One example job type, referenced later in this section, is data-point-extraction.

The full list of job types is available at the endpoint GET /job-types. The details of a particular job type can be retrieved at GET /job-types/:jobTypeId.

Job Status

Each job has a field status which indicates the current state of the job. Here are some example job statuses:

  • running — the job is actively being processed by the system.
  • blocked — the job has been paused for human review.
  • completed — the job is complete and the output is ready for retrieval.
  • cancelled — the job was deliberately cancelled prematurely. The contents of the job are likely incomplete or invalid.
  • failed — the job cannot continue and has been marked as failed. The contents of the job are likely incomplete or invalid.

Job Contents

Each job includes a description of its input and output. For all jobs, the input document is detailed in the input_content field. Once a job completes, the field output_content contains a description of the output, including a URI for retrieving the output.

Jobs also contain other types of content, determined by their type, their status, and their progress through the workflow. For example, jobs of type data-point-extraction contain content of type mapping-taxonomy-json. For more information on retrieving content other than input and output, see Retrieving Job Contents.

Job Collaboration

By default, only a job’s owner can view, edit, and work on a job. To let other users work on the job collaboratively, set the collaboration field.

Collaboration can be controlled through either teams or directly through users, as the collaboration field contains both of these as sub-fields.

  • Adding collaborators by team requires either the name of the team or the ID of the team.
  • Adding collaborators directly requires either the email of the user or the ID of the user.

Each collaboration entry contains a list of strings called steps. These indicate which roles the collaborator is allowed to access. Sending "*" for steps indicates that the collaborator can access all roles.

Example

The following is an example of a job with a single team collaborating on all steps:

{
...
  "collaboration": {
    "teams": [
      {
        "id": "c383d5a5-4cff-473f-b820-b53bb70abb78",
        "steps": ["*"]
      }
    ],
    "users": []
  },
...
}
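A collaboration object like the one above could be assembled with small helper functions, as sketched here. The "teams", "users", "id", and "steps" keys appear in the example; the "name" key for teams, the "email" key for users, and the helper names themselves are assumptions, since this section does not show how a team name or user email is spelled in the payload.

```python
def _collaborator(id_value, alt_key, alt_value, steps):
    """Build one collaborator entry identified by ID or by an alternative key."""
    if (id_value is None) == (alt_value is None):
        raise ValueError(f"provide exactly one of id or {alt_key}")
    entry = {"id": id_value} if id_value is not None else {alt_key: alt_value}
    entry["steps"] = list(steps)
    return entry


def team_collaborator(team_id=None, name=None, steps=("*",)):
    """Team entry by ID or by name ("name" is an assumed key)."""
    return _collaborator(team_id, "name", name, steps)


def user_collaborator(user_id=None, email=None, steps=("*",)):
    """User entry by ID or by email ("email" is an assumed key)."""
    return _collaborator(user_id, "email", email, steps)


def collaboration(teams=(), users=()):
    """Combine team and user entries into the collaboration field."""
    return {"teams": list(teams), "users": list(users)}


# Reproduces the collaboration field from the example above.
example = collaboration(
    teams=[team_collaborator(team_id="c383d5a5-4cff-473f-b820-b53bb70abb78")]
)
```

The default steps value of ("*",) grants access to all roles; pass an explicit list of step names to restrict a collaborator.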

Job Metadata

Each job has a field metadata which is an object of key/value pairs. Clients are free to use this field to add any additional client-specific information about the job. When creating a job, any unknown fields are automatically added to the metadata. The values can be any valid JSON type.

Most job types require specific metadata fields. For each job type, read the documentation specific to that type.
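The handling of unknown fields described above can be pictured with a short client-side simulation. This is only an illustration of the described behavior, not the server's implementation: the set of recognized top-level fields is an assumption drawn from the field names used in this section, and the rule that a top-level unknown field overrides a metadata key of the same name is also an assumption.

```python
# Assumed set of recognized top-level job fields; the real list may differ.
KNOWN_FIELDS = {
    "type",
    "status",
    "input_content",
    "output_content",
    "collaboration",
    "progress",
}


def fold_unknown_fields(payload):
    """Return a copy of payload with unrecognized top-level fields moved into metadata."""
    metadata = dict(payload.get("metadata", {}))
    job = {}
    for key, value in payload.items():
        if key == "metadata":
            continue
        if key in KNOWN_FIELDS:
            job[key] = value
        else:
            # Unknown field: stored under metadata (overriding is an assumption).
            metadata[key] = value
    job["metadata"] = metadata
    return job
```

For instance, a creation payload carrying a hypothetical client field such as "customer_ref" at the top level would end up with that key and value inside metadata. Values can be any valid JSON type, so nested objects and lists pass through unchanged.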

Monitoring Job Status

Once a job has been created, the client must monitor the status of the job and track its progress. This is done through the endpoint GET /jobs/:jobId.

The progress of the job is reported in the fields progress.current and progress.total, which indicate the current step and the total number of steps respectively. These numbers are only an estimate and should not be relied on too heavily.

Once the job has reached a final status of completed, cancelled or failed, the client should no longer monitor the job.
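The monitoring loop described above might look like the following sketch. The final statuses and the progress.current and progress.total fields come from this section; the polling interval, the timeout, and the fetch_job callable (any function that performs GET /jobs/:jobId and returns the decoded job) are assumptions made so the example stays independent of a particular HTTP library.

```python
import time

# Final statuses named in this section; polling stops once one is reached.
TERMINAL_STATUSES = {"completed", "cancelled", "failed"}


def progress_fraction(job):
    """Return estimated progress in [0.0, 1.0], or None if it is unavailable."""
    progress = job.get("progress") or {}
    current = progress.get("current") or 0
    total = progress.get("total") or 0
    if total <= 0:
        return None
    return max(0.0, min(1.0, current / total))


def wait_for_job(fetch_job, job_id, interval=5.0, timeout=600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_job(job_id) until the job reaches a final status.

    Raises TimeoutError if no final status is seen within `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        job = fetch_job(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout} seconds")
        sleep(interval)
```

A job in the blocked status is waiting on human review and may stay there for a long time, so a client might choose a longer interval or timeout for job types that involve review steps.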