This repository contains the Tag Engine application, which is an open-source extension to Google Cloud's Data Catalog. Tag Engine automates the tagging of BigQuery and Cloud Storage assets.
- If you are new to Tag Engine, start with this tutorial.
- To learn about Tag Engine's API methods, read the API reference guide.
- To learn about Tag Engine's UI features, read the UI guide.
- To upgrade your existing Tag Engine installation, read the upgrade guide.
Tag Engine requires both Google App Engine and Firestore. It also assumes that you will be tagging assets in BigQuery or Google Cloud Storage. Follow the steps below to deploy the Tag Engine application in your Google Cloud project.
Note: In the deployment procedure below, we use one GCP project for running Tag Engine and Data Catalog and another project for storing data assets in BigQuery. If this is your first time running Tag Engine, you may want to keep everything in one project for simplicity.
export TAG_ENGINE_PROJECT=tag-engine-vanilla-337221
export TAG_ENGINE_REGION=us-central
export TAG_ENGINE_SUB_REGION=us-central1
export BQ_PROJECT=warehouse-337221
export TAG_ENGINE_SA=${TAG_ENGINE_PROJECT}@appspot.gserviceaccount.com
gcloud config set project $TAG_ENGINE_PROJECT
gcloud services enable iam.googleapis.com
gcloud services enable appengine.googleapis.com
git clone https://github.com/GoogleCloudPlatform/datacatalog-tag-engine.git
cd datacatalog-tag-engine
cat > deploy/variables.tfvars << EOL
tag_engine_project="${TAG_ENGINE_PROJECT}"
bigquery_project="${BQ_PROJECT}"
app_engine_region="${TAG_ENGINE_REGION}"
app_engine_subregion="${TAG_ENGINE_SUB_REGION}"
EOL
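If any of the four variables referenced above is unset in the current shell, the heredoc silently writes an empty string into `variables.tfvars`. A minimal sketch of a guarded version follows, assuming a POSIX shell. The `write_tfvars` function name is hypothetical and not part of the repository.

```shell
# Hypothetical helper (not part of Tag Engine): write the tfvars file only
# when all four variables are set and non-empty.
write_tfvars() (
  # ${VAR:?msg} makes this subshell fail with an error if VAR is unset or empty.
  : "${TAG_ENGINE_PROJECT:?set TAG_ENGINE_PROJECT first}"
  : "${BQ_PROJECT:?set BQ_PROJECT first}"
  : "${TAG_ENGINE_REGION:?set TAG_ENGINE_REGION first}"
  : "${TAG_ENGINE_SUB_REGION:?set TAG_ENGINE_SUB_REGION first}"
  # $1 = path of the tfvars file to write
  cat > "$1" << EOL
tag_engine_project="${TAG_ENGINE_PROJECT}"
bigquery_project="${BQ_PROJECT}"
app_engine_region="${TAG_ENGINE_REGION}"
app_engine_subregion="${TAG_ENGINE_SUB_REGION}"
EOL
)

# Usage:
#   write_tfvars deploy/variables.tfvars
```

Because the function body runs in a subshell, a missing variable stops the file from being written without closing your interactive session.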
Edit the five variables in datacatalog-tag-engine/tagengine.ini:
[DEFAULT]
TAG_ENGINE_PROJECT = tag-engine-develop
QUEUE_REGION = us-central1
INJECTOR_QUEUE = tag-engine-injector-queue
WORK_QUEUE = tag-engine-work-queue
BIGQUERY_REGION = us-central1
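The values shown are samples; `TAG_ENGINE_PROJECT` in this file presumably needs to match the project you exported earlier. As a quick sanity check, the sketch below reads a key back from the file. The `ini_get` helper is hypothetical and handles only simple `KEY = value` lines, which is all the sample above uses.

```shell
# Hypothetical helper: print the value of a key from a "KEY = value" file.
# $1 = key name, $2 = path to the ini file
ini_get() {
  # Strip the key, the equals sign, and surrounding spaces; keep the first match.
  sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*//p" "$2" | head -n 1
}

# Usage:
#   [ "$(ini_get TAG_ENGINE_PROJECT tagengine.ini)" = "$TAG_ENGINE_PROJECT" ] \
#     || echo "tagengine.ini points at a different project" >&2
```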
gcloud alpha firestore databases create --project=$TAG_ENGINE_PROJECT --region=$TAG_ENGINE_REGION
gcloud app create --project=$TAG_ENGINE_PROJECT --region=$TAG_ENGINE_REGION
gcloud app deploy app.yaml
Note: The deploy command assumes that you will be running Tag Engine using App Engine's default Service Account (SA). This SA gets created automatically when you run the deploy command and is assigned the 'Editor' role on the project. Verify that the SA has been assigned the Editor role before continuing with the deployment.
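One way to perform that verification from the command line is to list the roles bound to the service account and look for `roles/editor`. The sketch below assumes `TAG_ENGINE_PROJECT` and `TAG_ENGINE_SA` are set as in the first step. The names `has_role` and `sa_roles` are hypothetical helpers, not part of Tag Engine.

```shell
# Hypothetical helper: succeed if the role in $1 appears as a whole line on stdin.
has_role() {
  grep -qxF "$1"
}

# List the roles bound to the App Engine default service account, one per line.
# This calls gcloud, so it needs an authenticated session.
sa_roles() {
  gcloud projects get-iam-policy "$TAG_ENGINE_PROJECT" \
    --flatten="bindings[].members" \
    --filter="bindings.members:serviceAccount:${TAG_ENGINE_SA}" \
    --format="value(bindings.role)"
}

# Usage:
#   sa_roles | has_role roles/editor && echo "Editor role is present"
```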
gcloud app firewall-rules create 100 --action ALLOW --source-range [IP_RANGE]
gcloud app firewall-rules update default --action deny
Alternatively, control access to App Engine by user identity (instead of IP address) with Identity-Aware Proxy (IAP).
gcloud auth application-default login
cd deploy
terraform init
terraform apply -var-file=variables.tfvars
Note: The deployment can take up to one hour due to the large number of index builds. There are 27 Firestore indexes that get created sequentially to avoid hitting concurrency limits in Firestore.
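To follow the index builds while Terraform runs, you can count how many composite indexes have reached the `READY` state. This is a rough sketch: it assumes the index list exposes a `state` field, as the Firestore Admin API does, and `count_ready` and `index_states` are hypothetical helper names.

```shell
# Hypothetical helper: count the lines on stdin that are exactly "READY".
count_ready() {
  # grep exits non-zero when the count is 0, so mask that status.
  grep -cx 'READY' || true
}

# Print the state of each composite index, one per line (calls gcloud).
index_states() {
  gcloud firestore indexes composite list \
    --project="$TAG_ENGINE_PROJECT" --format="value(state)"
}

# Usage:
#   echo "$(index_states | count_ready) of 27 indexes ready"
```

Counting ready indexes rather than pending ones fits the sequential build: indexes that have not been requested yet do not appear in the list at all.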
gcloud app browse
Hint: read this tutorial to learn about Tag Engine's various tag configuration options.
- Open the Tag Engine UI:
gcloud app browse
- Create a static asset config:
curl -X POST [TAG ENGINE URL]/static_asset_tags -d @examples/static_asset_configs/static_asset_create_auto_bq.json
- Create a dynamic table config:
curl -X POST [TAG ENGINE URL]/dynamic_table_tags -d @examples/dynamic_table_configs/dynamic_table_create_auto.json
- Create a dynamic column config:
curl -X POST [TAG ENGINE URL]/dynamic_column_tags -d @examples/dynamic_column_configs/dynamic_column_create_auto.json
- Create a glossary asset config:
curl -X POST [TAG ENGINE URL]/glossary_asset_tags -d @examples/glossary_asset_configs/glossary_asset_create_ondemand_bq.json
- Create a sensitive column config:
curl -X POST [TAG ENGINE URL]/sensitive_column_tags -d @examples/sensitive_column_configs/sensitive_column_create_auto.json
- Create a Data Catalog entry config:
curl -X POST [TAG ENGINE URL]/entries -d @examples/entry_configs/entry_create_auto.json
- Import tags from CSV files:
curl -X POST [TAG ENGINE URL]/import_tags -d @examples/import_configs/import_column_tags.json
- Export tags to BigQuery tables:
curl -X POST [TAG ENGINE URL]/export_tags -d @examples/export_configs/export_tags_by_project.json
- Restore tags from Data Catalog's metadata export:
curl -X POST [TAG ENGINE URL]/restore_tags -d @examples/restore_configs/restore_table_tags.json
- Get the status of a job:
curl -X POST [TAG ENGINE URL]/get_job_status -d '{"job_uuid":"47aa9460fbac11ecb1a0190a014149c1"}'
- Consult the App Engine logs if you encounter any errors while using Tag Engine:
gcloud app logs tail -s default
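The requests above all share one shape, so the `[TAG ENGINE URL]` placeholder and the POST call can be wrapped in small shell helpers. This is a sketch only: the function names are hypothetical, and it assumes the App Engine default hostname reported by `gcloud app describe` is the URL you want to call.

```shell
# Hypothetical helper: join a base URL and an endpoint name with a single slash.
endpoint_url() {
  printf '%s/%s\n' "${1%/}" "${2#/}"
}

# Hypothetical helper: build the JSON body used by get_job_status.
job_status_payload() {
  printf '{"job_uuid":"%s"}\n' "$1"
}

# Look up the App Engine default hostname (calls gcloud).
tag_engine_url() {
  printf 'https://%s\n' "$(gcloud app describe \
    --project="$TAG_ENGINE_PROJECT" --format='value(defaultHostname)')"
}

# POST a JSON config file to an endpoint (calls gcloud and curl).
# $1 = endpoint name, $2 = path to a JSON file
post_config() {
  curl -X POST "$(endpoint_url "$(tag_engine_url)" "$1")" -d "@$2"
}

# Usage:
#   post_config static_asset_tags examples/static_asset_configs/static_asset_create_auto_bq.json
#   curl -X POST "$(endpoint_url "$(tag_engine_url)" get_job_status)" \
#     -d "$(job_status_payload 47aa9460fbac11ecb1a0190a014149c1)"
```

If you restrict access with firewall rules or IAP as described earlier, the same restrictions apply to these calls.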