Run Catalog data profiling jobs on Apache Spark clusters

In Collibra, profiling jobs are executed in JobServer, which runs Spark in local mode, so each job is limited to the resources of a single machine. With the Collibra Catalog Profiling Library, you can instead run profiling jobs on your own Apache Spark cluster and scale them up to larger data sources.

As a user of the profiling library, you control which data is profiled. This includes ingesting and profiling data sources that the Collibra Catalog application does not support out of the box. You define your own Spark Dataset, run the profiling library on it, and then transfer the results to Collibra Catalog.

Find out more about the Collibra Catalog Profiling Library and download it from GitHub.