Connect upsert operation

This tutorial guides you through adding data from a CSV file into Collibra Data Governance Center (Collibra DGC) by performing an upsert operation using Collibra Connect. An upsert is an operation where a resource is either updated if it already exists in Collibra DGC or created if it does not exist.
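The update-or-create behavior described above can be illustrated with a minimal sketch. It uses a plain Python dictionary as a stand-in for Collibra DGC; the `upsert` function and the `store` variable are hypothetical names for illustration only and are not part of Collibra Connect.

```python
def upsert(store, key, values):
    """Update the record stored under `key` if it exists, otherwise create it."""
    if key in store:
        store[key].update(values)   # resource exists: update it in place
        return "updated"
    store[key] = dict(values)       # resource is missing: create it
    return "created"


# An in-memory dictionary stands in for the target system (assumption).
store = {}
first = upsert(store, "GDPR", {"domain": "New Business Terms"})
second = upsert(store, "GDPR", {"definition": "General Data Protection Regulation"})
```

The first call creates the record and the second call updates it, so running the same import twice does not produce duplicates.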

Prerequisites

To complete this tutorial, you must have access to a Collibra DGC environment and have MuleSoft Anypoint Studio installed. For installation details, see the Getting started with Collibra Connect tutorial.

Prepare the CSV file

The source of the upsert operation is a CSV file. This example uses a short list of acronyms as sample data. For the operation to succeed, the file must provide certain mandatory elements.

To create a new business term:

  • The name of the business term.
  • The domain the business term belongs to.

    Note The domain may be identified by name or the universally unique identifier (UUID).

If the specified domain does not exist, it will be created, and the following mandatory elements must be provided.

To create a new domain:

  • The name of the domain.
  • The type of the domain.
  • The community the domain belongs to.

    Note The community may be identified by name or UUID.

If the specified community does not exist, it will be created, provided that a name has been supplied.

For additional information, see the Upserting assets by name section of the Collibra Connect documentation.

Our sample file looks like the following:

Name  Domain              Domain Type  Community
DGC   New Business Terms  Glossary     Data Governance Council
FAQ   New Business Terms  Glossary     Data Governance Council
GDPR  New Business Terms  Glossary     Data Governance Council

The same data in CSV format:

Name,Domain,Domain Type,Community
FAQ,New Business Terms,Glossary,Data Governance Council
DGC,New Business Terms,Glossary,Data Governance Council
GDPR,New Business Terms,Glossary,Data Governance Council

Tip Save the above as a CSV file to use later in the project.
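Before using the file, it can help to confirm that every row carries the mandatory elements. The following is a minimal sketch, not part of Collibra Connect: the function names are hypothetical, and it treats all four columns as required, which is the strictest reading (the domain type and community only matter when the domain has to be created).

```python
import csv
import io

# The sample data from this tutorial.
SAMPLE = """\
Name,Domain,Domain Type,Community
FAQ,New Business Terms,Glossary,Data Governance Council
DGC,New Business Terms,Glossary,Data Governance Council
GDPR,New Business Terms,Glossary,Data Governance Council
"""

# Assumption: every column is treated as mandatory for this check.
REQUIRED = ("Name", "Domain", "Domain Type", "Community")


def find_incomplete_rows(csv_text):
    """Return (line_number, missing_columns) for rows lacking a required value."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        missing = [col for col in REQUIRED if not (row.get(col) or "").strip()]
        if missing:
            problems.append((line_number, missing))
    return problems


def save_sample(path):
    """Write the sample data to `path` as a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        handle.write(SAMPLE)
```

Calling `find_incomplete_rows(SAMPLE)` returns an empty list for the sample data, and `save_sample("acronyms.csv")` writes it to disk; the file name here is only an example.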

Create a MuleSoft project

Open Anypoint Studio and create a new project:

  1. Click File > New > Mule Project.

  2. Enter a name for your project and click Next. We are using csv_upsert in this example.

    Note Blank spaces are not supported as part of the project name.

  3. Leave all the defaults as they are for the Java settings and click Next.

  4. To create your new project, click Finish.

Anypoint Studio populates the Package Explorer, the Palette (a list of drag-and-drop building blocks for your application), and the Connections Explorer.

Create the connect flow

  1. From the Palette, drag the File connector to the Canvas.

  2. From the Palette, drag the Transform Message component to the Canvas.

  3. From the Palette, drag the CollibraDGC connector to the Canvas.

Configure the components

A File connector placed at the beginning of a flow is set to behave like an inbound endpoint. To configure the File connector:

  1. Select the File connector on the Canvas.
  2. In the General section, specify the path to the CSV file. We are using src/main/resources/input in this example.

    To create the input folder:

    1. In the Package Explorer, right-click the resources folder and select New > Folder.
    2. Type the name of the folder you wish to create and click Finish.
  3. In the General section, click the green + sign to create a new Connector Configuration.

  4. In the Global Element Properties window, click OK.

  5. In the Metadata section, click Add metadata and then click the edit icon to add a new metadata type.

  6. In the Select metadata type window, click Add.
  7. Type the name of the metadata type and click Create type. We are using Asset_by_name in this example.

  8. From the Type drop-down select CSV.

  9. In the Sample File field, select the file you have created.

    The metadata will be retrieved from the file.

  10. In the Select metadata type window, click Select.

The CollibraDGC connector is used to establish a direct connection to the Collibra Data Governance Center. To configure the CollibraDGC connector:

  1. Select the CollibraDGC connector on the Canvas.
  2. In the General section, click the green + sign to create a new Connector Configuration.
  3. In the Global Element Properties window, in the General tab, enter the connection details of your Collibra DGC environment:
    • Username
    • Password
    • Base Application Url

    Tip Use the Test Connection… feature to make sure you can establish a successful connection to Collibra DGC.

  4. Click OK to save the configuration.
  5. In the General section, select Upsert assets by name from the Operation drop-down list.

  6. Select Acronym from the Asset Type Id drop-down list.

The Transform Message component takes the output of the File connector and transforms it into the expected input for the CollibraDGC connector. Since you configured both connectors, their metadata is already loaded into the Transform Message component. To configure it, drag each field from the input section onto the corresponding field in the output section.

As you create the connections, the Transform Message component builds the required code.
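Conceptually, the mapping renames each CSV column to the field the connector expects. The sketch below shows that idea in Python rather than in the code Anypoint Studio generates. The target field names in `FIELD_MAP` are hypothetical placeholders; the real names come from the CollibraDGC connector metadata shown in the output section.

```python
import csv
import io

# Hypothetical target field names (assumption); check the connector metadata
# in the Transform Message output section for the real ones.
FIELD_MAP = {
    "Name": "name",
    "Domain": "domainName",
    "Domain Type": "domainType",
    "Community": "communityName",
}


def transform(csv_text):
    """Turn each CSV row into a record keyed by the target field names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {target: row[source] for source, target in FIELD_MAP.items()}
        for row in reader
    ]
```

Each input row yields exactly one output record, which mirrors the one-to-one connections drawn between the input and output fields.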

Save your project.

Run the application

To start the application in your development environment, right-click the Canvas and select Run project csv_upsert.

If there are no errors, the console will show that the application has started.

To perform the upsert operation, copy the CSV file to the src/main/resources/input folder.

Tip You can use the Package Explorer to paste or drop the file.
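If you prefer to script this step, the copy can be sketched as follows. The function name `drop_into_input` is hypothetical, and the default folder is the example path used in this tutorial, relative to the project root. The sketch copies rather than moves the file, so the original stays in place regardless of what the flow does with the dropped copy.

```python
import shutil
from pathlib import Path


def drop_into_input(csv_path, input_dir="src/main/resources/input"):
    """Copy `csv_path` into the folder watched by the File connector."""
    target_dir = Path(input_dir)
    target_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    # shutil.copy returns the path of the newly created file.
    return Path(shutil.copy(csv_path, target_dir))
```

For example, `drop_into_input("acronyms.csv")` run from the project root places a copy of the file in the input folder; the file name is only an example.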

If there are no errors, the contents of the CSV file will be upserted to Collibra DGC. You will find them in the Business Glossary application, under the All Business Assets view.

If errors occur, check the console output for details about what went wrong.

To stop the application, right-click the Canvas and select Stop project csv_upsert.

Next steps

Add a Definition column to the CSV file, reload the CSV metadata, and map the new column to the corresponding field in the CollibraDGC connector input. Run the application to add the definitions to Collibra DGC. The updated file looks like the following:

Name,Definition,Domain,Domain Type,Community
FAQ,Frequently Asked Questions,New Business Terms,Glossary,Data Governance Council
DGC,Data Governance Center,New Business Terms,Glossary,Data Governance Council
GDPR,General Data Protection Regulation,New Business Terms,Glossary,Data Governance Council
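The extended file can also be produced from the original one with a short script. This is a minimal sketch under stated assumptions: the function name `add_definitions` and the `DEFINITIONS` dictionary are hypothetical, and the input is expected to have a Name column and no existing Definition column.

```python
import csv
import io

# Definitions keyed by acronym, taken from the sample above.
DEFINITIONS = {
    "FAQ": "Frequently Asked Questions",
    "DGC": "Data Governance Center",
    "GDPR": "General Data Protection Regulation",
}


def add_definitions(csv_text, definitions):
    """Return the CSV text with a Definition column inserted after Name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames)
    fieldnames.insert(1, "Definition")  # assumes Name is the first column
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Acronyms without a known definition get an empty value.
        row["Definition"] = definitions.get(row["Name"], "")
        writer.writerow(row)
    return out.getvalue()
```

Because the operation is an upsert, running the application with the extended file updates the existing acronyms rather than creating duplicates.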

Stay tuned for the next tutorial, which shows you how to connect to an external system, such as Salesforce.

Additional resources