Configure Google Cloud Platform for Collibra Insights consumption

Introduction

Collibra Insights creates Parquet files to store and deliver the Reporting Data Layer. You must store the files and make the reporting data available for consumption. Google Cloud Platform provides storage through Google Cloud Storage buckets and query access through Google BigQuery.

Follow these steps to create and configure a bucket named collibra-insights-dg, store in it the Parquet files that contain the Reporting Data Layer tables exported from Collibra Data Governance Center, and configure BigQuery to use that data.

Prerequisites

  • Access to a Google Cloud account with Cloud Storage and BigQuery.
  • The Parquet files with the tables of the Reporting Data Layer extracted from the ZIP file.
  • A project, for example, Collibra Insights.

Configure Google Cloud Storage to store Parquet files

  1. In the Google Cloud Platform project, go to Storage > Browser.

    Path to the storage browser in a Google Cloud Platform project

  2. Select CREATE BUCKET.

    The Create Bucket button in the storage browser of a Google Cloud Platform project

  3. Enter the required information:
    • Name: for example, collibra-insights-dg.
    • Location type: Region.
    • Location: for example, europe-west1.
  4. Select a default storage class for your data: Standard.
  5. Select how to control access to objects: set object-level and bucket-level permissions.
  6. Optionally, select an encryption method in the Advanced settings section: for example, Google-managed key.
  7. Click CREATE to finish the process.
  8. In the newly created bucket, upload the Parquet folders with the tables of the Reporting Data Layer.

    A list of default Parquet folders representing tables of the Reporting Data Layer extracted from the Collibra Platform.
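
The console steps above can also be scripted. The following is a minimal sketch, assuming the google-cloud-storage Python client library is installed and Application Default Credentials are configured. The bucket name and region are the example values from the steps above. The function names plan_uploads and create_bucket_and_upload are illustrative, and access control and encryption are left at the API defaults instead of being set explicitly.

```python
from pathlib import Path

BUCKET_NAME = "collibra-insights-dg"  # example bucket name from the steps above
LOCATION = "europe-west1"             # example region from the steps above


def plan_uploads(export_dir: Path) -> list[tuple[Path, str]]:
    """Pair every local file with its object name in the bucket.

    The table folder (for example, asset/) is kept as the object prefix,
    so the bucket mirrors the extracted folder structure.
    """
    return [
        (path, path.relative_to(export_dir).as_posix())
        for path in sorted(export_dir.rglob("*"))
        if path.is_file()
    ]


def create_bucket_and_upload(export_dir: Path) -> None:
    # Imported here so that plan_uploads() works without the client library.
    from google.cloud import storage

    client = storage.Client()  # uses Application Default Credentials
    bucket = client.bucket(BUCKET_NAME)
    bucket.storage_class = "STANDARD"
    bucket = client.create_bucket(bucket, location=LOCATION)

    # Upload each file under its table folder prefix.
    for path, object_name in plan_uploads(export_dir):
        bucket.blob(object_name).upload_from_filename(str(path))
```

Calling create_bucket_and_upload(Path("export")) would create the bucket and upload everything under a local folder named export, which is a hypothetical path for the extracted ZIP contents.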

Configure BigQuery to make the Reporting Data Layer available

  1. In the Google Cloud Platform project, go to BigQuery.

    Location of BigQuery in a Google Cloud Platform project

  2. In the Resources section, select the project for the Reporting Data Layer tables. This example uses collibra-insights-dg.

    Location of collibra-insights-dg in Google BigQuery

Create a dataset

  1. With collibra-insights-dg selected, select CREATE DATASET.
  2. Enter the name of the dataset in the Dataset ID field. The name is appended to the project ID to form the full dataset ID.
  3. Leave all other fields at their default values.
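
The same dataset can be created programmatically. This is a minimal sketch, assuming the google-cloud-bigquery Python client library is installed and credentials are configured. The project ID my-project-id is a placeholder, insights is the dataset name used in this example, and the function names are illustrative. In the client library, a fully qualified dataset ID has the form project.dataset.

```python
def dataset_id(project_id: str, dataset_name: str) -> str:
    """Build the fully qualified dataset ID in the form project.dataset."""
    return f"{project_id}.{dataset_name}"


def create_dataset(project_id: str, dataset_name: str = "insights") -> None:
    # Imported here so that dataset_id() works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    dataset = bigquery.Dataset(dataset_id(project_id, dataset_name))
    # All other dataset settings are left at their defaults, as in the steps above.
    client.create_dataset(dataset, exists_ok=True)
```

For example, create_dataset("my-project-id") would create the dataset my-project-id.insights, where my-project-id stands for your own project ID.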

Create the dataset tables

  1. In the dataset you created, select CREATE TABLE.
  2. Enter the required Source information:
    • Create table from: select Google Cloud Storage.
    • Select file from GCS bucket: collibra-insights-dg/asset/*.
    • File format: select Parquet.
  3. Enter the required Destination information:
    • Project name: select your project. We are using Collibra Insights in this example.
    • Dataset name: select your dataset name. We are using insights.
    • Table name: enter the Reporting Data Layer table name, for example, asset.
  4. Leave all other fields at their default values.
  5. Click CREATE.

When the job completes, the table appears under the dataset.

Repeat the above steps for each table.

A list of tables created in a Google BigQuery dataset corresponding to tables of the Reporting Data Layer extracted from the Collibra Platform
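
Repeating the table creation by hand for every folder is tedious, so a loop can help. The following is a minimal sketch, assuming the google-cloud-bigquery Python client library is installed and credentials are configured. The bucket and dataset names are the example values used above. The TABLES list contains only the three tables used in the verification query and would need to be extended with the other folders in your export. The function names are illustrative.

```python
BUCKET_NAME = "collibra-insights-dg"  # example bucket from this guide
DATASET_NAME = "insights"             # example dataset from this guide
# Only the tables used in the verification query; add the remaining folders.
TABLES = ["asset", "domain", "community"]


def source_uri(table: str) -> str:
    """Return the wildcard URI that matches all files in a table folder."""
    return f"gs://{BUCKET_NAME}/{table}/*"


def load_tables(project_id: str) -> None:
    # Imported here so that source_uri() works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.PARQUET
    )
    for table in TABLES:
        table_id = f"{project_id}.{DATASET_NAME}.{table}"
        job = client.load_table_from_uri(
            source_uri(table), table_id, job_config=job_config
        )
        job.result()  # wait for the load job; raises if the job failed
```

Each load job reads the schema from the Parquet files, which mirrors leaving the schema fields at their defaults in the console.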

Verification

To verify the setup, run a test SQL query in the query editor:

select c.community_name
, d.domain_name
, a.*
from insights.asset a
join insights.domain d
on a.domain_id = d.domain_id
and a.snapshot_date = d.snapshot_date
join insights.community c
on c.community_id = d.community_id
and c.snapshot_date = d.snapshot_date;
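
A quick row count per table is another way to confirm that the loads produced data. This is a minimal sketch, assuming the google-cloud-bigquery Python client library is installed and credentials are configured. The dataset and table names are the ones used in the query above, and the function names are illustrative.

```python
DATASET_NAME = "insights"
TABLES = ["asset", "domain", "community"]  # tables used in the query above


def count_query(table: str) -> str:
    """Build a row-count query for one table in the example dataset."""
    return f"SELECT COUNT(*) AS row_count FROM `{DATASET_NAME}.{table}`"


def print_row_counts(project_id: str) -> None:
    # Imported here so that count_query() works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    for table in TABLES:
        rows = client.query(count_query(table)).result()
        row = next(iter(rows))  # the query returns exactly one row
        print(table, row["row_count"])
```

A count of zero for any table suggests that the corresponding folder was empty or that the load did not match any files.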

Next steps

You can now configure Tableau to display the data you store in Google Cloud Platform.