# Bulk operations in Groovy script tasks

When a Groovy script task calls the Collibra **\<Resource>Api** interfaces, such as `assetApi`, `relationApi`, `attributeApi`, or `responsibilityApi`, those calls invoke Spring service beans directly rather than making HTTP requests. All of these beans use Spring's default transaction propagation (`REQUIRED`), so each call joins the transaction that is already open rather than starting its own.

A synchronous script task therefore runs entirely in a single database transaction. Every API call made during the script, including reads and writes, participates in that transaction, which stays open until the script completes.

## Impact of bulk operations on transaction management

A script that iterates over a large dataset and performs a write for each item accumulates all of those writes in one transaction. The open transaction holds row-level locks on every row it touches. Under concurrent load, other processes attempting to read or modify the same rows are blocked behind those locks. If the script runs long enough, the blocked requests keep their database connections while they wait, so the lock contention can saturate the database connection pool and make Collibra unresponsive.

Setting script tasks to **Asynchronous** alone does not resolve this issue. The task is moved to a background job thread, but the transaction boundary is unchanged and the entire script still runs in one transaction.

## Recommended pattern for bulk operations

To avoid long-running transactions, split the work across two script tasks and an exclusive gateway that loops back to the second task:

{% stepper %}
{% step %}
A **collector script task** that runs synchronously in one short transaction. It queries the full dataset, builds a list of work items, and stores it as a process variable using `execution.setVariable("workItems", ...)`.
{% endstep %}

{% step %}
A **processor script task** in asynchronous mode, which gives each execution its own transaction. It takes the first batch of items from **workItems**, processes them, and updates the process variable with the remaining items. It then calls `execution.setVariable("hasMoreWork", !workItems.isEmpty())`, where `workItems` now holds only the remaining items.
{% endstep %}

{% step %}
An **exclusive gateway** that routes back to the processor task when `${hasMoreWork}` is true, or proceeds to the end event when `${!hasMoreWork}` is true.
{% endstep %}
{% endstepper %}

The following diagram illustrates this pattern:

![Diagram of the collector task, processor task, and loop-back gateway pattern](/files/30nSj6yBz4cdPKXY1p4P)

Each time the async job executor picks up the processor task, it runs in its own transaction. A failure in one batch rolls back only that batch, and the remaining work continues.
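The collector and processor steps can be sketched as follows. This is a minimal sketch under stated assumptions, not a reference implementation: `FakeExecution` is a hypothetical stand-in for the `execution` binding so that the logic can run outside a workflow, `handleItem` stands for whatever **\<Resource>Api** call your script makes per item, and the batch size of 3 is only for illustration. The `while` loop simulates the exclusive gateway routing back to the processor task.

```groovy
// Hypothetical stand-in for the `execution` binding, so the sketch runs
// outside Collibra. In a real script task, `execution` is provided for you.
class FakeExecution {
    private final Map<String, Object> vars = [:]
    void setVariable(String name, Object value) { vars[name] = value }
    Object getVariable(String name) { vars[name] }
}

// Collector script task: build the full work list once and store it.
def collect(execution, List ids) {
    execution.setVariable("workItems", new ArrayList(ids))
}

// Processor script task: handle one batch, then store what is left.
def processBatch(execution, int batchSize, Closure handleItem) {
    List workItems = execution.getVariable("workItems") as List
    List batch = workItems.take(batchSize)
    batch.each { handleItem(it) }   // hypothetical per-item API write

    // Keep only the unprocessed items for the next execution.
    workItems = new ArrayList(workItems.drop(batchSize))
    execution.setVariable("workItems", workItems)
    execution.setVariable("hasMoreWork", !workItems.isEmpty())
}

// Simulated workflow run: collector, then processor until no work remains.
def execution = new FakeExecution()
def processed = []
collect(execution, (1..7).toList())

int runs = 0
while (true) {
    processBatch(execution, 3) { item -> processed << item }
    runs++
    // Stands in for the gateway condition ${hasMoreWork}.
    if (!execution.getVariable("hasMoreWork")) break
}
println "Processed ${processed.size()} items in ${runs} processor runs"
```

In a real workflow, the bodies of `collect` and `processBatch` would live in the two script tasks, and the gateway condition on `${hasMoreWork}` replaces the loop. Storing only identifiers in **workItems**, rather than full objects, keeps the process variable small; that choice is an assumption of this sketch.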

### Choosing a batch size

A batch of 25 to 50 items is a practical starting point. Smaller batches result in shorter transactions and less lock contention. Larger batches reduce job executor overhead but increase the risk of contention under concurrent load.
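To make the trade-off concrete, the following sketch counts how many processor executions, and therefore how many separate transactions, a given batch size produces. The dataset of 1,000 identifiers is hypothetical; `collate` is the standard Groovy list method that splits a list into consecutive sublists of at most the given size.

```groovy
// Hypothetical dataset of 1,000 work item identifiers.
def workItems = (1..1000).toList()

// Split the same list using the two ends of the suggested range.
def small = workItems.collate(25)
def large = workItems.collate(50)

// Each sublist corresponds to one processor execution and one transaction.
println "Batch size 25: ${small.size()} processor runs"
println "Batch size 50: ${large.size()} processor runs"
```

With these numbers, halving the batch size doubles the number of jobs the executor has to schedule, while each transaction touches half as many rows.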

## Alternative: parallel multi-instance subprocess

You can wrap the processor in an asynchronous subprocess with the **Multi instance type** property set to **Parallel** to process all batches in parallel, which increases throughput. However, all batch jobs compete for database connections simultaneously. Under high concurrent load, this approach may be counterproductive. Use the sequential loop-back pattern unless throughput is a critical requirement.

