In this lab module, we will use the Dataplex Lineage API to manage the lineage captured in the previous modules.
Completion of prior modules
15 minutes or less
https://cloud.google.com/data-catalog/docs/reference/data-lineage/rest
Paste the following into Cloud Shell:
PROJECT_ID=$(gcloud config list --format "value(core.project)" 2>/dev/null)
PROJECT_NBR=$(gcloud projects describe "$PROJECT_ID" --format "value(projectNumber)")
LOCATION_MULTI="us"
LOCATION="us-central1"
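Every later command depends on these variables, and an empty one produces a malformed URL rather than a clear error. A minimal sketch of a guard, assuming a POSIX-compatible shell such as the one in Cloud Shell; `require_vars` is a hypothetical helper name, not part of gcloud:

```shell
# Hypothetical helper: fail fast if any named variable is unset or empty.
require_vars() {
  local name value
  for name in "$@"; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${${name}:-}"
    if [ -z "$value" ]; then
      echo "Missing required variable: ${name}" >&2
      return 1
    fi
  done
}
```

Calling `require_vars PROJECT_ID PROJECT_NBR LOCATION_MULTI LOCATION` returns a non-zero status and names the first variable that is missing.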
The Lineage API supports the following operations on lineage processes:
- Create a process
- List processes
- Get specific process
- Update a process name
- Delete a process
You need to provide the process ID for lineage management.
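The operations listed above follow the usual REST conventions. As a rough sketch, the helper below prints the HTTP method and URL for each one without sending anything. `process_request` is a hypothetical name, and the host is the same `us-datalineage.googleapis.com` endpoint used throughout this module:

```shell
# Hypothetical dry-run helper: print "METHOD URL" for a process operation.
# $1 = operation, $2 = project ID, $3 = location, $4 = process ID (item operations only)
process_request() {
  local base="https://us-datalineage.googleapis.com/v1/projects/$2/locations/$3/processes"
  case "$1" in
    create) printf 'POST %s\n' "$base" ;;
    list)   printf 'GET %s\n' "$base" ;;
    get)    printf 'GET %s/%s\n' "$base" "${4:-}" ;;
    update) printf 'PATCH %s/%s\n' "$base" "${4:-}" ;;
    delete) printf 'DELETE %s/%s\n' "$base" "${4:-}" ;;
    *)      echo "Unknown operation: $1" >&2; return 1 ;;
  esac
}
```

Only the operations on a single process (get, update, delete) need the process ID in the path, which is why the ID is looked up first in the steps that follow.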
Let's write all the processes to a file:
mkdir -p ~/temp-lineage
cd ~/temp-lineage
rm -rf *
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" https://us-datalineage.googleapis.com/v1/projects/$PROJECT_ID/locations/$LOCATION_MULTI/processes >> my_lineage_processes.json
Let's find the custom process we created:
cat my_lineage_processes.json | grep -C 1 "Load Public Dataset"
Author's output:
THIS IS INFORMATIONAL
"name": "projects/36819656457/locations/us/processes/67bb9f5a-88cd-49d4-8d68-a1f82cde7d5a",
"displayName": "Load Public Dataset"
},
First, let's display the process ID:
cat my_lineage_processes.json | grep -C 1 "Load Public Dataset" | grep name | cut -d':' -f2 | cut -d '/' -f6 | tr -d \" | tr -d ,
Let's assign it to a variable:
MY_CUSTOM_PROCESS_ID=`cat my_lineage_processes.json | grep -C 1 "Load Public Dataset" | grep name | cut -d':' -f2 | cut -d '/' -f6 | tr -d \" | tr -d , `
echo $MY_CUSTOM_PROCESS_ID
You should see a GUID.
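The pipeline above depends on counting path segments, so the field number changes for every resource type. A minimal alternative sketch that takes the last path segment instead, assuming the pretty-printed layout shown in the author's output, where "name" sits on the line directly before "displayName". `resource_id_by_display_name` is a hypothetical helper name:

```shell
# Hypothetical helper: print the ID (last path segment of "name") of the
# resource whose displayName equals $2 in the JSON file $1.
resource_id_by_display_name() {
  local file="$1" display_name="$2"
  # -F matches the display name literally; -B 1 also prints the line
  # before the match, which is where the "name" field is assumed to be.
  grep -F -B 1 "\"displayName\": \"${display_name}\"" "$file" \
    | grep '"name"' \
    | sed -E 's#.*/([^/"]+)".*#\1#'
}
```

With this in place, `MY_CUSTOM_PROCESS_ID=$(resource_id_by_display_name my_lineage_processes.json "Load Public Dataset")` would set the same variable as the command above.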
The current process name is "Load Public Dataset". Let's update it to "Load Chicago Crimes Public Dataset":
curl -X PATCH -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" https://us-datalineage.googleapis.com/v1/projects/$PROJECT_ID/locations/$LOCATION_MULTI/processes/$MY_CUSTOM_PROCESS_ID -d "{ \
\"displayName\": \"Load Chicago Crimes Public Dataset\" \
}"
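The backslash-escaped quotes in the request body above are easy to mistype. One way to avoid them, sketched here under the assumption that the new name contains no double quotes or backslashes, is to build the body with printf. `display_name_patch_body` is a hypothetical helper name:

```shell
# Hypothetical helper: build the JSON body for a displayName update.
# The single-quoted printf format keeps the JSON double quotes literal,
# so no backslash escaping is needed.
# Assumes $1 contains no double quotes or backslashes.
display_name_patch_body() {
  printf '{"displayName": "%s"}\n' "$1"
}
```

The result can then be passed to curl with `-d "$(display_name_patch_body "Load Chicago Crimes Public Dataset")"`.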
The Lineage API supports the following operations on lineage process runs:
- Create a process run
- List process runs
- Get specific process run
- Update a process run name
- Delete a process run
You need to provide the process run ID for lineage event association.
Let's write all the runs for the process above to a file:
cd ~/temp-lineage
rm -rf my_lineage_process_runs.json
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" https://us-datalineage.googleapis.com/v1/projects/$PROJECT_ID/locations/$LOCATION_MULTI/processes/$MY_CUSTOM_PROCESS_ID/runs >> my_lineage_process_runs.json
Let's display the custom process run we created. In our case, the process has only one run, so just one item is listed.
cat my_lineage_process_runs.json
Author's output:
THIS IS INFORMATIONAL
{
"runs": [
{
"name": "projects/36819656457/locations/us/processes/67bb9f5a-88cd-49d4-8d68-a1f82cde7d5a/runs/483fa91e-35dc-4e1a-8782-5c1302778e09",
"displayName": "One time load",
"startTime": "2022-01-23T14:14:11.238Z",
"endTime": "2022-01-23T14:16:11.238Z",
"state": "COMPLETED"
}
]
}
First, let's display the process run ID:
cat my_lineage_process_runs.json | grep name | cut -d':' -f2 | cut -d '/' -f8 | tr -d \" | tr -d ,
Let's assign it to a variable:
MY_CUSTOM_PROCESS_RUN_ID=`cat my_lineage_process_runs.json | grep name | cut -d':' -f2 | cut -d '/' -f8 | tr -d \" | tr -d ,`
echo $MY_CUSTOM_PROCESS_RUN_ID
You should see a GUID. We will later delete this run.
The Lineage API supports the following operations on lineage events:
- Create a lineage event
- List lineage events
- Get specific lineage event
- Delete a lineage event
You need to provide the process run ID for lineage event management.
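Lineage events are nested under a run, which is itself nested under a process, so both IDs appear in every lineage event URL. A small sketch of that nesting, using the same host as the rest of this module. `lineage_events_url` is a hypothetical helper name:

```shell
# Hypothetical helper: build the URL for a run's lineage events.
# $1 = project ID, $2 = location, $3 = process ID, $4 = run ID,
# $5 = lineage event ID (optional; omit it to get the collection URL)
lineage_events_url() {
  local url="https://us-datalineage.googleapis.com/v1/projects/$1/locations/$2/processes/$3/runs/$4/lineageEvents"
  if [ -n "${5:-}" ]; then
    url="${url}/$5"
  fi
  printf '%s\n' "$url"
}
```

Without the fifth argument the function prints the collection URL used for listing; with it, the URL of a single event, which is the form a delete request needs.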
cd ~/temp-lineage
rm -rf my_lineage_event_runs.json
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" https://us-datalineage.googleapis.com/v1/projects/$PROJECT_ID/locations/$LOCATION_MULTI/processes/$MY_CUSTOM_PROCESS_ID/runs/$MY_CUSTOM_PROCESS_RUN_ID/lineageEvents/ >> my_lineage_event_runs.json
Author's output:
THIS IS INFORMATIONAL
cat my_lineage_event_runs.json
{
"lineageEvents": [
{
"name": "projects/36819656457/locations/us/processes/67bb9f5a-88cd-49d4-8d68-a1f82cde7d5a/runs/483fa91e-35dc-4e1a-8782-5c1302778e09/lineageEvents/d88ca91e-32f2-4433-acc4-bbf22c0c6715",
"startTime": "2022-01-01T14:14:11.238Z",
"links": [
{
"source": {
"fullyQualifiedName": "internet:dataplex-oda.government_datasets.chicago_crimes_dataset"
},
"target": {
"fullyQualifiedName": "bigquery:bigquery-public-data.chicago_crime.crime"
}
}
]
}
]
}
MY_CUSTOM_LINEAGE_RUN_ID=`cat my_lineage_event_runs.json | grep name | cut -d':' -f2 | cut -d'/' -f10 | tr -d \" | tr -d ,`
echo $MY_CUSTOM_LINEAGE_RUN_ID
You should see a GUID.
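Each lineage event carries one or more links, and each link pairs a source with a target by fully qualified name. To make that structure easier to read, here is a rough sketch that prints every link as "source -> target". It assumes the pretty-printed layout shown in the author's output (one field per line, source before target), and `print_lineage_links` is a hypothetical helper name:

```shell
# Hypothetical helper: print "source -> target" for each link in the
# lineage events JSON file $1. Relies on one JSON field per line.
print_lineage_links() {
  awk '
    /"source"/ { side = "source" }
    /"target"/ { side = "target" }
    /"fullyQualifiedName"/ {
      # Splitting on double quotes puts the field value in parts[4].
      split($0, parts, "\"")
      if (side == "source") {
        src = parts[4]
      } else if (side == "target") {
        print src " -> " parts[4]
      }
    }
  ' "$1"
}
```

Running `print_lineage_links my_lineage_event_runs.json` on the output above would print a single line linking the internet source to the BigQuery table.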
a) List the lineage event that is the candidate for deletion:
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" https://us-datalineage.googleapis.com/v1/projects/$PROJECT_ID/locations/$LOCATION_MULTI/processes/$MY_CUSTOM_PROCESS_ID/runs/$MY_CUSTOM_PROCESS_RUN_ID/lineageEvents/
b) Delete the lineage event:
curl -X DELETE -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" https://us-datalineage.googleapis.com/v1/projects/$PROJECT_ID/locations/$LOCATION_MULTI/processes/$MY_CUSTOM_PROCESS_ID/runs/$MY_CUSTOM_PROCESS_RUN_ID/lineageEvents/$MY_CUSTOM_LINEAGE_RUN_ID
c) Validate deletion with a list command:
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" https://us-datalineage.googleapis.com/v1/projects/$PROJECT_ID/locations/$LOCATION_MULTI/processes/$MY_CUSTOM_PROCESS_ID/runs/$MY_CUSTOM_PROCESS_RUN_ID/lineageEvents/
You should see empty curly braces.
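Checking for empty curly braces by eye is easy to get wrong when the response spans several lines. A minimal sketch of the same check in the shell, assuming the response body has been captured in a variable. `is_empty_response` is a hypothetical helper name:

```shell
# Hypothetical helper: succeed only if the response body $1 is an empty
# JSON object, ignoring spaces and newlines.
is_empty_response() {
  local compact
  # Strip all whitespace so "{}", "{ }", and a brace pair split across
  # lines all compare equal.
  compact="$(printf '%s' "$1" | tr -d '[:space:]')"
  [ "$compact" = "{}" ]
}
```

For example, after storing the list output in a variable named RESPONSE, `is_empty_response "$RESPONSE" && echo "Nothing left"` confirms the deletion.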
a) List the process run that is the candidate for deletion:
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" https://us-datalineage.googleapis.com/v1/projects/$PROJECT_ID/locations/$LOCATION_MULTI/processes/$MY_CUSTOM_PROCESS_ID/runs/
b) Delete the process run:
curl -X DELETE -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" https://us-datalineage.googleapis.com/v1/projects/$PROJECT_ID/locations/$LOCATION_MULTI/processes/$MY_CUSTOM_PROCESS_ID/runs/$MY_CUSTOM_PROCESS_RUN_ID
c) Validate deletion with a list command:
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" https://us-datalineage.googleapis.com/v1/projects/$PROJECT_ID/locations/$LOCATION_MULTI/processes/$MY_CUSTOM_PROCESS_ID/runs/
You should see empty curly braces.
a) List the processes:
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" https://us-datalineage.googleapis.com/v1/projects/$PROJECT_ID/locations/$LOCATION_MULTI/processes/
b) Delete the process:
curl -X DELETE -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" https://us-datalineage.googleapis.com/v1/projects/$PROJECT_ID/locations/$LOCATION_MULTI/processes/$MY_CUSTOM_PROCESS_ID
c) Validate deletion with a list command:
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" https://us-datalineage.googleapis.com/v1/projects/$PROJECT_ID/locations/$LOCATION_MULTI/processes/
You should NOT see the deleted process in the list.
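When the project holds many processes, scanning the list by eye is unreliable. A rough sketch of an automated check, assuming the list response has been captured in a variable. `process_is_listed` is a hypothetical helper name:

```shell
# Hypothetical helper: succeed if process ID $2 appears in the JSON list
# response $1 as the final path segment of a "name" value.
process_is_listed() {
  # The closing double quote after the ID means a nested resource
  # (for example .../processes/ID/runs/...) does not count as a match.
  printf '%s\n' "$1" | grep -q -F "/processes/$2\""
}
```

For example, with the list output stored in a variable named RESPONSE, `process_is_listed "$RESPONSE" "$MY_CUSTOM_PROCESS_ID" || echo "Process deleted"` reports a successful deletion.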
This concludes the lab module. Proceed to the next module.