Run a DQ Job from a PySpark notebook
Prerequisites
Steps
Step 1: Create a notebook
Step 2: Install PySpark
# Install
!pip install -q pyspark==3.4.1  # installs PySpark library version 3.4.1
!pip install -q findspark  # installs the findspark library
Step 3: Import the libraries
Step 4: Add the JAR files
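As a rough sketch of this step, the JAR paths can be gathered into the comma-separated string that Spark's `spark.jars` setting accepts. The helper name `collect_jars` and the idea of keeping the JARs in a single directory are assumptions made for illustration; the actual Collibra DQ JAR file names and locations depend on your installation.

```python
from pathlib import Path


def collect_jars(jar_dir):
    """Return a comma-separated string of absolute paths to every .jar file in jar_dir.

    Hypothetical helper: the directory layout is an assumption, not part of Collibra DQ.
    """
    jars = sorted(Path(jar_dir).glob("*.jar"))  # sorted for a stable, repeatable order
    return ",".join(str(path.resolve()) for path in jars)
```

An empty string comes back when the directory holds no JAR files, so it may be worth checking the result before starting Spark.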
Step 5: Add secrets and environment details
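One possible way to keep secrets out of the notebook itself is to read them from environment variables. The sketch below assumes three variable names (`DQ_URL`, `DQ_USER`, `DQ_PASSWORD`) that are purely hypothetical; substitute whichever connection details your Collibra DQ environment actually requires.

```python
import os


def load_dq_settings(env=None):
    """Read connection details from environment variables and fail early if any is missing.

    The variable names below are hypothetical placeholders, not names defined by Collibra DQ.
    """
    env = os.environ if env is None else env
    required = ["DQ_URL", "DQ_USER", "DQ_PASSWORD"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in required}
```

Failing early with the list of missing names tends to be easier to debug than a connection error later in the notebook.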
Step 6: Start the SparkSession
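A minimal sketch of starting a session with the JARs attached might look like the following. The function names `build_spark_conf` and `start_session`, the application name, and the `local[*]` master are assumptions for illustration; only `spark.app.name`, `spark.master`, `spark.jars`, and the `SparkSession.builder` calls are standard Spark and PySpark features.

```python
def build_spark_conf(app_name, jars, master="local[*]"):
    """Build a dict of standard Spark properties (hypothetical helper)."""
    return {
        "spark.app.name": app_name,
        "spark.master": master,
        "spark.jars": jars,  # comma-separated list of JAR paths
    }


def start_session(conf):
    """Create (or reuse) a SparkSession configured with the given properties."""
    # Imported here so the configuration helper above works without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

In a notebook this could be used as `spark = start_session(build_spark_conf("dq-job", "/path/to/first.jar,/path/to/second.jar"))`, where the name and paths are placeholders.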
Step 7: Load the DataFrame
Step 8: Run the job
Step 9: Check the Jobs page in Collibra DQ