M11-1c: Auto Data Quality - UNIQUENESS checks

This lab module covers Auto Data Quality uniqueness checks: you specify a column, and Auto Data Quality checks it for duplicate values.

Prerequisites

Successful completion of prior modules

Duration

5 minutes or less

Documentation

Data Quality Overview
About Auto Data Quality
Use Auto Data Quality

Learning goals

  1. Understand options for data quality in Dataplex
  2. Gain practical knowledge of running Auto Data Quality uniqueness checks

Lab flow

[Figure: lab flow diagram]



LAB



1. Target data for Data Quality checks

We will use the same table as in the Data Profiling lab module.

Familiarize yourself with the table by running the SQL below in the BigQuery UI:

SELECT * FROM oda_dq_scratch_ds.customer_master LIMIT 20
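
Since this module checks client_id for duplicates, it can help to preview the column before creating the scan. The query below is a minimal sketch, assuming the client_id column named in step 2 exists in this table; the output aliases are illustrative and not part of the lab.

```sql
-- Preview of client_id uniqueness (sketch; aliases are illustrative).
SELECT
  COUNT(*) AS total_rows,                            -- all rows, including NULL client_id
  COUNT(client_id) AS non_null_client_ids,           -- rows where client_id is not NULL
  COUNT(DISTINCT client_id) AS distinct_client_ids   -- distinct non-NULL values
FROM oda_dq_scratch_ds.customer_master;
```

If non_null_client_ids is larger than distinct_client_ids, at least one client_id value appears more than once.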


2. Create a Data Quality scan with uniqueness checks on the client_id column

2.1. Navigate to Auto Data Quality in the Dataplex UI

2.2. Click on Create Data Quality Scan

2.3. Define Data Quality Rules - UNIQUENESS checks

Click on the scan and define rules. Let's start with the recommendations from the Data Profiling results.

2.4. Run Data Quality Rules - UNIQUENESS checks

Let's check all the fields for the quality scan and click "Run now".

2.5. The job for Data Quality Rules - UNIQUENESS checks is submitted

2.6. Click on the completed DQ - UNIQUENESS job and review the results
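
To cross-check the scan results in BigQuery, a query along these lines lists the client_id values that occur more than once. It is only a sketch for manual verification, not how Dataplex evaluates the rule. In particular, GROUP BY places all rows with a NULL client_id in a single group, which may differ from how the scan treats NULLs.

```sql
-- List client_id values that occur more than once (manual cross-check sketch).
SELECT
  client_id,
  COUNT(*) AS occurrences   -- number of rows sharing this client_id
FROM oda_dq_scratch_ds.customer_master
GROUP BY client_id
HAVING COUNT(*) > 1         -- keep only duplicated values
ORDER BY occurrences DESC
LIMIT 20;
```

An empty result means no client_id value is repeated in the table.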

This concludes the module. Proceed to the next module.