Under the hood – Building integrations with Collibra Connect

This document guides you through the Under the Hood Collibra Connect workshop. Follow each step, and ask the closest Collibra assistant if something is not clear. The exercise is broken into several parts. Each part ends with a clear result that shows whether the application is implemented correctly. You see this result either in the Anypoint Studio Console log or in the Postman response.

Prepare and test the skeleton project

In this part you import the skeleton project and add the HTTP listener that triggers the Connect flow. You then test it and inspect the Console log to confirm that the flow is running.

  1. In Anypoint Studio, import the provided skeleton project:
    1. Go to File > Import… > Anypoint Studio.
    2. Select Anypoint Studio generated Deployable Archive (.zip) and click Next.

    3. Browse for and select the uth-demo-start.zip file.
    4. Click Finish.
    5. The uth-demo-start project appears in the Package Explorer panel, on the left-hand side.

  2. Double-click the uth-demo.xml file to open it.

  3. From the Mule Palette panel, on the right-hand side, drag the HTTP Connector to the canvas (the uth-demo tab).

    While the HTTP Connector is selected on the canvas, the properties panel in the lower part of the screen shows the connector's properties.

  4. In the properties panel, on the General tab, click the edit icon next to Connector Configuration. The HTTP Listener Configuration window opens. This configuration is the entry point that starts the Connect flow. Review it, then close the window.

  5. From the Palette, drag the Logger component to the main flow, next to the HTTP Connector.

  6. Save the project.
  7. Run the project: right-click the uth-demo folder and select Run As > Mule Application.

    The Console shows the project has been deployed.

  8. Open Postman and send a GET request to localhost:8081.

    The response body is empty, but the log in the Anypoint Studio Console confirms that the request was received.

  9. Check the Console log for new events logged after the DEPLOYED status.

Note   There should be no errors in either Postman or Anypoint Studio.
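You can also send the Postman requests in this workshop from a script. The Python sketch below is an optional alternative to Postman and is not part of the workshop project. It assumes that the listener from the skeleton project accepts requests on localhost, port 8081, path /. The helper names are hypothetical.

```python
import http.client


def build_trigger_request(method="GET", domain_id=None):
    """Return (method, path, headers) for a request to the HTTP listener."""
    headers = {}
    if domain_id is not None:
        # http.client sends the header name exactly as written here (domainId).
        headers["domainId"] = domain_id
    return method, "/", headers


def send_trigger_request(method="GET", domain_id=None,
                         host="localhost", port=8081, timeout=60):
    """Send the request to the listener and return (status code, body text)."""
    method, path, headers = build_trigger_request(method, domain_id)
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request(method, path, headers=headers)
        response = conn.getresponse()
        return response.status, response.read().decode("utf-8", errors="replace")
    finally:
        conn.close()
```

For example, send_trigger_request() repeats step 8. send_trigger_request("POST", "d04dc138-1a9b-41de-9f1a-d19fbc37e78a") sends the POST request with the domainId header used in the later parts. The timeout is generous because the complete flow takes a few seconds to respond.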

Add the domain ID to the request

This part demonstrates sending information from the request to the Connect flow.

To read and store the domain ID sent in the request, you must create a variable that stores it in the Connect flow.

  1. From the Palette, drag the Variable transformer to the main flow, before the Logger.
  2. In the properties panel, configure the Variable properties:
    • Display name: Set domain ID
    • Operation: Set variable
    • Name: domainId
    • Value: #[message.inboundProperties.'domainId']

  3. Select the Logger component and configure the component properties:
    • Message: domainId: #[flowVars.domainId]
  4. In Postman:
    1. Set the method to POST
    2. In the Headers tab, add a new header:
      • KEY: domainId
      • VALUE: d04dc138-1a9b-41de-9f1a-d19fbc37e78a

    3. Send the request and check the Anypoint Studio Console log for the following line:

      domainId: d04dc138-1a9b-41de-9f1a-d19fbc37e78a
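The following plain-Python illustration is not Mule code. It only shows the logic of steps 1 to 3: the Variable transformer copies one inbound header into the domainId flow variable, and the Logger writes it. The function names are hypothetical. The lookup ignores case because HTTP header names are case-insensitive. This sketch does not show whether Mule itself matches the property name case-insensitively.

```python
def read_domain_id(inbound_headers):
    """Return the domainId header value, or None if the header is missing.

    HTTP header names are case-insensitive, so names are compared in lower case.
    """
    for name, value in inbound_headers.items():
        if name.lower() == "domainid":
            return value
    return None


def format_log_line(flow_vars):
    """Build the log message for the domainId flow variable."""
    return "domainId: {}".format(flow_vars.get("domainId"))


flow_vars = {"domainId": read_domain_id({"domainId": "d04dc138-1a9b-41de-9f1a-d19fbc37e78a"})}
print(format_log_line(flow_vars))  # prints domainId: d04dc138-1a9b-41de-9f1a-d19fbc37e78a
```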

Read the domain ID from the request

  1. From the Palette, drag the Set Payload transformer to the main flow, before the Logger.
  2. Configure the Set Payload properties:
    • Value: #[flowVars.domainId]

  3. Save the project and run the application if it is stopped.
  4. In Postman:
    • Send the same POST request.

    The response contains the domain ID: d04dc138-1a9b-41de-9f1a-d19fbc37e78a

  5. If the test is successful, delete the Set Payload transformer.
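If you want to check step 4 from a script instead of reading the Postman response, you can compare the response body with the ID you sent. A minimal Python sketch follows. It assumes that the body contains only the domain ID, possibly surrounded by whitespace. The function name is hypothetical.

```python
import uuid


def response_echoes_domain_id(response_body, sent_domain_id):
    """Return True if the response body is the same UUID that was sent."""
    try:
        received = uuid.UUID(response_body.strip())
    except ValueError:
        # Empty or malformed bodies are not a valid echo of the domain ID.
        return False
    return received == uuid.UUID(sent_domain_id)
```

Comparing parsed UUIDs instead of raw strings ignores differences in letter case and surrounding whitespace.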

Add sub-flows

To make the flow easier to read, split it into sub-flows based on the tasks it performs:

  1. Load assets for a given domain.
  2. Load profiling data for the given assets.
  3. Merge the information within the flow.
  4. Submit the merged information, which is the profiling data, back to Collibra Data Governance Center.

You create sub-flows for the first two steps because each includes more than one sub-task. You can use the main flow for steps 3 and 4.

  1. Stop the project: In the Console, right-click and select Terminate/Disconnect All.
  2. From the Palette, drag the Sub Flow scope to the canvas, below the main flow.
  3. Change the name of the sub-flow to: load-profiling-data-subflow.
  4. From the Palette, drag a second Sub Flow scope to the canvas, below the first sub-flow.
  5. Change the name of the second sub-flow to: list-assets-subflow.
  6. From the Palette, drag two Flow Reference components to the main flow, before the Logger.
  7. Configure the first Flow Reference component:
    • Display Name: Load profiling data subflow
    • Flow Name: select load-profiling-data-subflow from the drop-down menu.
  8. Configure the second Flow Reference component:
    • Display Name: List assets subflow
    • Flow Name: select list-assets-subflow from the drop-down menu.

  9. From the Palette, drag a Logger component to each of the sub-flows.
  10. The Connect flow now looks like this:

  11. Save the project.
  12. Run the project.
  13. Test the project:
    1. In Postman, send the same POST request containing the domain ID.
    2. If there is a problem running the application, restart Anypoint Studio.
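The order of the two Flow Reference components matters. A Flow Reference runs a sub-flow on the same message, so flow variables set in one sub-flow are still available afterwards, and each sub-flow can replace the payload. In the later parts, the profiling sub-flow saves its result in a flow variable, and the assets sub-flow leaves the assets as the payload. The merge step in the main flow relies on both. The Python sketch below models only that data flow. The sub-flow bodies and their sample values ("price", "asset-1") are hypothetical placeholders, not the real logic.

```python
def load_profiling_data_subflow(message):
    """Placeholder: stores the profiling result in the profilingData flow variable."""
    profiling_data = {"columnProfiles": [{"columnName": "price"}]}
    message["flowVars"]["profilingData"] = profiling_data
    message["payload"] = profiling_data
    return message


def list_assets_subflow(message):
    """Placeholder: replaces the payload with the assets of the domain."""
    domain_id = message["flowVars"]["domainId"]
    message["payload"] = {"price": {"id": "asset-1", "domainId": domain_id}}
    return message


def main_flow(domain_id):
    """Run the sub-flows in the order of the two Flow Reference components."""
    message = {"payload": None, "flowVars": {"domainId": domain_id}}
    for subflow in (load_profiling_data_subflow, list_assets_subflow):
        message = subflow(message)
    return message
```

When the sub-flows run in this order, the final payload is the assets, and the profiling data is still available in flowVars.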

Get assets for a specific domain

You connect to Collibra Data Governance Center using the CollibraDGC connector. You get the assets of a given domain and store them in a Java object for manipulation.

The CollibraDGC connector configuration properties are in the src/main/properties/config.properties file.

  1. From the Palette, drag the CollibraDGC connector to the list-assets-subflow sub-flow, before the Logger.
  2. Configure the CollibraDGC connector:
    • In the General tab, in the Basic Settings section, click the edit icon of the Connector Configuration:
      • If the connection is not configured, use the information from the config.properties file.
      • Test the connection.
      • Close the Connector Configuration window.
    • In the General tab, in the Basic Settings section, set the Operation to List assets.
    • In the General section, set the Domain Id to #[flowVars.domainId].
  3. From the Palette, drag the Transform Message component to the list-assets-subflow sub-flow, before the Logger.
  4. Configure the Transform Message component:
    • In the Output section, in the lower-right part of the screen, paste the following code:
      %dw 1.0
      %output application/java
      
      %function getName(name) trim ((name splitBy '>')[-1])
      ---
      {(
      	payload map (asset,index) ->
      	{
      		(getName(asset.name)):{
      			"id": asset.id,
      			"assetName": asset.name,
      			"domainId": asset.domain.id,
      			"communityName": flowVars.community,
      			"domainName": asset.domain.name
      		}
      	})
      }
  5. Delete the Logger component.
  6. The Connect flow now looks like this:

  7. Save the project.
  8. In Postman, send the same POST request containing the domain ID.
  9. The response contains a binary payload, because the Transform Message output is a Java object (application/java):

  10. If you want to see the JSON representation of the assets collection:
    1. From the Palette, drag the Object to JSON transformer to the list-assets-subflow sub-flow, after the Transform Message.
    2. Save the project.
    3. In Postman, send the same POST request containing the domain ID.
    4. The response contains the JSON payload:

    5. Delete the Object to JSON transformer as it is not needed for the rest of the Connect flow.
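The Transform Message script turns the list returned by List assets into one object keyed by the last part of each asset name: the text after the final >, trimmed. Below is a rough Python equivalent. It assumes that each asset has the id, name, and domain (id and name) fields that the script reads. No step in this document sets flowVars.community, so here it is an optional argument that defaults to None. The function names are hypothetical.

```python
def get_name(name):
    """Return the last '>'-separated segment of an asset name, trimmed."""
    return name.split(">")[-1].strip()


def index_assets(assets, community=None):
    """Key each asset by its short name, like the Transform Message script.

    If two assets share a short name, this dict keeps only the last one.
    """
    result = {}
    for asset in assets:
        result[get_name(asset["name"])] = {
            "id": asset["id"],
            "assetName": asset["name"],
            "domainId": asset["domain"]["id"],
            "communityName": community,
            "domainName": asset["domain"]["name"],
        }
    return result
```

Keying the assets by short name lets the merge step later look up the asset for each profiled column by the column name.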

Get the profiling data

In this part you connect to the profiling server and get the results. Profiling is done in two steps:

  • Start the profiling process by submitting the CSV file to be profiled.
  • Query the results.
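The second step is a polling loop, which steps 8 to 11 build with the Until Successful scope and an HTTP connector. The Python sketch below shows the same idea. fetch_result is a hypothetical callable that performs the GET for a job ID and returns (status code, body). The 5-second interval mirrors the Milliseconds Between Retries value. max_attempts is an assumed cap and is not taken from the workshop.

```python
import time


def poll_until_ready(fetch_result, job_id, interval_seconds=5.0,
                     max_attempts=60, sleep=time.sleep):
    """Call fetch_result(job_id) until it returns HTTP status 200.

    Any other status counts as a failure and triggers a retry, like the
    failure expression http.status != 200.
    """
    for attempt in range(1, max_attempts + 1):
        status, body = fetch_result(job_id)
        if status == 200:
            return body
        if attempt < max_attempts:
            sleep(interval_seconds)
    raise TimeoutError(
        "profiling job {} not ready after {} attempts".format(job_id, max_attempts))
```

The sleep parameter is injectable so that the loop can be tested without waiting.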
  1. From the Palette, drag the Set Payload transformer to the load-profiling-data-subflow sub-flow, before the Logger.
  2. Configure the Set Payload properties:
    • Value: {"input":"/Documents/UTH stuff/product_price_forecast_20181031.csv"}
  3. Delete the Logger component.
  4. From the Palette, drag the HTTP connector to the load-profiling-data-subflow sub-flow, after the Set Payload transformer.
  5. Configure the HTTP connector:
    • Display name: Request profiling
    • Connector Configuration: HTTP_Request_Profiling_Config
    • Path: /profile
    • Method: POST
  6. From the Palette, drag the Transform Message component to the load-profiling-data-subflow sub-flow, after the HTTP connector.
  7. Configure the Transform Message component:
    • In the Output section, in the lower-right part of the screen, paste the following code:
      %dw 1.0
      %output application/java
      ---
      payload.jobId
  8. From the Palette, drag the Until Successful scope to the load-profiling-data-subflow sub-flow, after the Transform Message component.
  9. Configure the Until Successful scope:
    • Failure Expression: #[message.inboundProperties['http.status'] != 200]
    • Milliseconds Between Retries: 5000
    • In the Threading tab, select the Synchronous check-box.

  10. From the Palette, drag the HTTP connector to the Until Successful scope.

  11. Configure the HTTP connector:
    • Display name: Request profiling results
    • Connector Configuration: HTTP_Request_Profiling_Result
    • Path: #[payload]
    • Method: GET
  12. From the Palette, drag the JSON to Object transformer to the load-profiling-data-subflow sub-flow, after the Until Successful scope.
  13. Configure the JSON to Object transformer:
    • Return Class: uthdemo.pojo.ProfilingData
  14. From the Palette, drag the Variable transformer to the load-profiling-data-subflow sub-flow, after the JSON to Object transformer.
  15. Configure the Variable transformer:
    • Display name: Profiling data
    • Operation: Set variable
    • Name: profilingData
    • Value: #[payload]
  16. From the Palette, drag the Logger component to the load-profiling-data-subflow sub-flow, after the Variable transformer.
  17. Configure the Logger component:
    • Message: #[payload]
  18. The sub-flow now looks like this:

  19. Save the project.
  20. Run the project if it is stopped.
  21. In Postman, send the same POST request containing the domain ID.
  22. The response contains the binary payload:

  23. If you want to see the JSON response:
    1. Delete the JSON to Object transformer from the load-profiling-data-subflow sub-flow.
    2. From the Palette, drag the Set Payload transformer to the main flow, before the Logger.
    3. Configure the Set Payload properties:
      • Value: #[flowVars.profilingData]
      • MIME Type: text/json
    4. Save the project.
    5. In Postman, send the same POST request containing the domain ID.
    6. The response is now in JSON format:

    7. Return the Connect flow to the previous state:
      • Delete the Set Payload transformer from the main flow.
      • Add the JSON to Object transformer back to the load-profiling-data-subflow sub-flow, after the Until Successful scope, and set its Return Class to uthdemo.pojo.ProfilingData.
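The JSON to Object transformer maps the profiling server's JSON onto the uthdemo.pojo.ProfilingData class. This document does not show the source of that class. The Python sketch below is a rough analogue that uses dataclasses. The field names come from the columns that the merge script reads later (columnProfiles, columnName, dataType, and so on). Everything else about these classes is an assumption. This sketch drops unknown keys, which may differ from how the Java class is mapped.

```python
import json
from dataclasses import dataclass, field, fields
from typing import Any, List


@dataclass
class ColumnProfile:
    """Hypothetical mirror of one entry in columnProfiles."""
    columnName: str
    dataType: Any = None
    counts: Any = None
    statistics: Any = None
    categoricalMetadata: Any = None
    distributions: Any = None
    quantiles: Any = None
    samples: Any = None


@dataclass
class ProfilingData:
    """Hypothetical stand-in for uthdemo.pojo.ProfilingData."""
    columnProfiles: List[ColumnProfile] = field(default_factory=list)


def parse_profiling_data(text):
    """Map the profiling server's JSON onto ProfilingData, ignoring unknown keys."""
    raw = json.loads(text)
    known = {f.name for f in fields(ColumnProfile)}
    profiles = [
        ColumnProfile(**{key: value for key, value in entry.items() if key in known})
        for entry in raw.get("columnProfiles", [])
    ]
    return ProfilingData(columnProfiles=profiles)
```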

Merge and send the information

In this part, you merge the two sets of information that the Connect flow has collected: the domain assets, which are the current payload, and the profiling data, which is stored in the profilingData flow variable. You then prepare the result for submission to Collibra Data Governance Center.

  1. From the Palette, drag the Transform Message component to the main flow, before the Logger.
  2. Configure the Transform Message component:
    • In the Output section, in the lower-right part of the screen, paste the following code:
      %dw 1.0
      %output application/json
      ---
      {
      	"columnProfiles": flowVars.profilingData.columnProfiles map ((profile,index) -> {
      		assetIdentifier: payload[profile.columnName] default "",
      		categoricalMetadata: profile.categoricalMetadata,
      		columnName : profile.columnName,
      		counts: profile.counts,
      		dataType: profile.dataType,
      		distributions : profile.distributions,
      		quantiles : profile.quantiles,
      		samples: profile.samples,
      		statistics : profile.statistics
      	})
      }
  3. From the Palette, drag the HTTP connector to the main flow, before the Logger.
  4. Configure the HTTP connector:
    • Display name: Submit data
    • Connector Configuration: HTTP_Request_SubmitProfilingData
    • Path: /profiling/columns
    • Method: PATCH
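To make the merge easier to follow, here is the DataWeave logic restated in Python, together with the shape of the Submit data request. This is an illustrative sketch, not the flow itself, and the function names are hypothetical. The path is relative to the base URL in HTTP_Request_SubmitProfilingData, which this document does not show. The Content-Type header is an assumption based on the script's application/json output. For each column profile, assetIdentifier is the asset entry whose key equals the column name, or an empty string when no asset matches.

```python
import json

# Profile fields that the DataWeave script copies unchanged.
COPIED_FIELDS = ("categoricalMetadata", "columnName", "counts", "dataType",
                 "distributions", "quantiles", "samples", "statistics")


def merge_profiles(profiling_data, assets_by_name):
    """Attach the matching asset to each column profile."""
    column_profiles = []
    for profile in profiling_data["columnProfiles"]:
        merged = {"assetIdentifier": assets_by_name.get(profile["columnName"], "")}
        for name in COPIED_FIELDS:
            merged[name] = profile.get(name)
        column_profiles.append(merged)
    return {"columnProfiles": column_profiles}


def build_submit_request(merged):
    """Return (method, path, headers, body) for the Submit data request."""
    body = json.dumps(merged).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return "PATCH", "/profiling/columns", headers, body
```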

The complete Connect flow looks like this:

Run the application

To run the application, send the same POST request from Postman with the following header:

  • KEY: domainId
  • VALUE: d04dc138-1a9b-41de-9f1a-d19fbc37e78a

You get a response after a few seconds:

Tip   To see the execution of the steps, follow the Anypoint Studio Console log.