Configure Google Cloud Platform for Collibra Insights consumption
Introduction
Usage Analytics creates Parquet files to store and deliver the Reporting Data Layer. You must store the files and make the reporting data available for consumption. Google Cloud Platform provides storage through Cloud Storage buckets and serves the data for querying through Google BigQuery.
Follow these steps to create and configure a bucket named collibra-insights-dg, where you store the Parquet files that contain the Reporting Data Layer tables exported from Collibra Data Intelligence Platform, and to configure BigQuery to use that data.
Prerequisites
- Access to a Google Cloud account with Cloud Storage and BigQuery.
- The Parquet files with the tables of the Reporting Data Layer extracted from the ZIP file.
- A project, for example, Collibra Insights.
Configure Google Cloud Storage to store Parquet files
- In the Google Cloud Platform project, go to Storage → Browser.
- Select CREATE BUCKET.
- Enter the required information:
- Name: for example, collibra-insights-dg.
- Location type: Region.
- Location: for example, europe-west1.
- Select a default storage class for your data: Standard.
- Select how to control access to objects: set object-level and bucket-level permissions.
- Optionally, select an encryption method in the Advanced settings section: for example, Google-managed key.
- Click CREATE to finish the process.
- In the newly created bucket, upload the Parquet folders with the tables of the Reporting Data Layer.
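The bucket creation and upload steps above can also be scripted. The following is a minimal sketch, not a Collibra-provided tool. It assumes the `google-cloud-storage` Python client library is installed and that credentials are already configured. The project ID `your-project-id` and the local folder `reporting-data-layer` are placeholders, and the helper names are hypothetical. Access control and encryption settings are left at whatever the API applies by default.

```python
from pathlib import Path


def object_names(local_root):
    """Map each local file to the object name it gets in the bucket.

    The path relative to local_root is kept, so a file in the asset
    folder is stored under the asset/ prefix in the bucket.
    """
    root = Path(local_root)
    return {
        path: path.relative_to(root).as_posix()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def upload_parquet_folders(bucket, local_root):
    """Upload every file under local_root and return the object names."""
    uploaded = []
    for path, name in object_names(local_root).items():
        bucket.blob(name).upload_from_filename(str(path))
        uploaded.append(name)
    return uploaded


def main(project_id="your-project-id", local_root="reporting-data-layer"):
    # Both default arguments are placeholders, not values from this guide.
    # The import is local so the helpers above work without the library.
    from google.cloud import storage

    client = storage.Client(project=project_id)
    bucket = client.bucket("collibra-insights-dg")
    bucket.storage_class = "STANDARD"
    bucket = client.create_bucket(bucket, location="europe-west1")
    return upload_parquet_folders(bucket, local_root)
```

Calling `main()` creates the bucket and uploads the folders in one pass. Keeping the folder name as the object prefix matters because the table creation step later reads from paths such as collibra-insights-dg/asset/*.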
Configure BigQuery to make the Reporting Data Layer available
- In the Google Cloud Platform project, go to BigQuery.
- In the Resources section, select the bucket with the Reporting Data Layer tables. We are using collibra-insights-dg in this example.
Create a dataset
- In the collibra-insights-dg bucket, select CREATE DATASET.
- Enter the name of the dataset in the Dataset ID field. The name is appended to the bucket name to form the ID.
- Leave all other fields at their default values.
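The dataset can also be created programmatically. This is a minimal sketch that assumes the `google-cloud-bigquery` Python client library and configured credentials. When you use the client library, a dataset is addressed as `project_id.dataset_name`. The project ID `your-project-id` is a placeholder and the helper names are hypothetical. The dataset name insights matches the one used later in this example, and all other settings are left at their defaults.

```python
def dataset_id(project_id, dataset_name):
    """Build the fully qualified dataset ID used by the client library."""
    return f"{project_id}.{dataset_name}"


def create_dataset(client, project_id, dataset_name):
    """Create the dataset, or return it unchanged if it already exists."""
    return client.create_dataset(
        dataset_id(project_id, dataset_name), exists_ok=True
    )


def main(project_id="your-project-id"):
    # The default project ID is a placeholder, not a value from this guide.
    # The import is local so the helpers above work without the library.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    return create_dataset(client, project_id, "insights")
```

Passing `exists_ok=True` makes the call safe to repeat, which is convenient if the script is rerun after a partial setup.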
Create the dataset tables
- In the collibra-insights-dg bucket, select CREATE TABLE.
- Enter the required Source information:
- Create table from: select Google Cloud Storage.
- Select file from GCS bucket: collibra-insights-dg/asset/*.
- File format: select Parquet.
- Enter the required Destination information:
- Project name: select your project. We are using Collibra Insights in this example.
- Dataset name: select your dataset name. We are using insights.
- Table name: enter the Reporting Data Layer table name, for example, asset.
- Leave all other fields at their default values.
- Click CREATE.
When the job completes, the table appears under the dataset.
Repeat the above steps for each table.
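Because the same steps are repeated for every table, a loop can save time. The sketch below assumes the `google-cloud-bigquery` Python client library and uses a load job per table, which should correspond to creating a native table from Cloud Storage in the console. The project ID is a placeholder, the helper names are hypothetical, and the table list contains only the three tables referenced in this guide (asset, domain, community). Extend it with the other tables you exported.

```python
def source_uri(bucket_name, table_name):
    """Wildcard URI that matches every file in the table's folder."""
    return f"gs://{bucket_name}/{table_name}/*"


def load_tables(client, job_config, bucket_name, dataset, table_names):
    """Load one folder per table and wait for each job to finish."""
    loaded = []
    for table_name in table_names:
        job = client.load_table_from_uri(
            source_uri(bucket_name, table_name),
            f"{dataset}.{table_name}",
            job_config=job_config,
        )
        job.result()  # blocks until the load job completes, raises on failure
        loaded.append(table_name)
    return loaded


def main(project_id="your-project-id"):
    # The default project ID is a placeholder, not a value from this guide.
    # The import is local so the helpers above work without the library.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.PARQUET
    )
    return load_tables(
        client,
        job_config,
        "collibra-insights-dg",
        f"{project_id}.insights",
        ["asset", "domain", "community"],
    )
```

Waiting on `job.result()` inside the loop keeps failures easy to trace, since an error stops the run at the table that caused it.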
Verification
To verify that everything is working, you can run a test SQL query in the Query editor:
select c.community_name, d.domain_name, a.*
from insights.asset a
join insights.domain d
  on a.domain_id = d.domain_id and a.snapshot_date = d.snapshot_date
join insights.community c
  on c.community_id = d.community_id and c.snapshot_date = d.snapshot_date;
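If you prefer to run the check outside the console, the same query can be submitted through the `google-cloud-bigquery` Python client library. This is a minimal sketch under that assumption. The project ID is a placeholder, and the `preview` helper and its `limit` parameter are hypothetical names introduced for illustration.

```python
from itertools import islice

VERIFY_SQL = """
select c.community_name, d.domain_name, a.*
from insights.asset a
join insights.domain d
  on a.domain_id = d.domain_id and a.snapshot_date = d.snapshot_date
join insights.community c
  on c.community_id = d.community_id and c.snapshot_date = d.snapshot_date
"""


def preview(client, sql=VERIFY_SQL, limit=5):
    """Run the query and return at most `limit` rows as dictionaries."""
    rows = client.query(sql).result()
    return [dict(row.items()) for row in islice(rows, limit)]


def main(project_id="your-project-id"):
    # The default project ID is a placeholder, not a value from this guide.
    # The import is local so the helper above works without the library.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    for row in preview(client):
        print(row)
```

The setup is likely working if the query returns rows that contain community and domain names next to the asset columns. An empty result may only mean that the joined tables share no snapshot dates.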
Next steps
You can now configure Tableau to display the data you store in Google Cloud Platform.