AI Platform Notebook Security Blueprint: Protecting PII Data

This repository provides an opinionated, Terraform-based way to set up AI Platform Notebooks securely.

This is not an officially supported Google product.

Reference Architecture

This module creates the following resources:

  • One AI Platform Notebook per Notebook user
  • A service account for Notebooks
  • An HSM key used for customer-managed encryption keys (CMEK) in each Notebook
  • A custom role to restrict exporting data
  • A Google Cloud Storage bucket with bootstrap code for Notebooks
  • Organization policies on the folder that contains the trusted-data project:
    • constraints/gcp.resourceLocations
    • constraints/iam.disableServiceAccountCreation
    • constraints/iam.disableServiceAccountKeyCreation
    • constraints/iam.automaticIamGrantsForDefaultServiceAccounts
    • constraints/compute.requireOsLogin
    • constraints/compute.restrictProtocolForwardingCreationForTypes
    • constraints/compute.restrictSharedVpcSubnetworks
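To illustrate what a folder-level constraint looks like in Terraform, here is a minimal sketch that enforces two of the constraints above with the Google provider's google_folder_organization_policy resource. It is not necessarily how this module implements them; the folder ID is a placeholder.

```hcl
# Hypothetical sketch: enforce two of the listed constraints on the trusted folder.
# "folders/<fldr-trusted-id>" is a placeholder for your trusted folder's ID.
resource "google_folder_organization_policy" "require_os_login" {
  folder     = "folders/<fldr-trusted-id>"
  constraint = "compute.requireOsLogin"

  # Boolean constraints are simply switched on.
  boolean_policy {
    enforced = true
  }
}

resource "google_folder_organization_policy" "resource_locations" {
  folder     = "folders/<fldr-trusted-id>"
  constraint = "gcp.resourceLocations"

  # List constraints take allowed values; here the same value groups as the
  # module's resource_locations default.
  list_policy {
    allow {
      values = ["in:us-locations", "in:eu-locations"]
    }
  }
}
```

Boolean constraints such as compute.requireOsLogin only need to be enforced, while list constraints such as gcp.resourceLocations carry the allowed values.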

Assumptions

  • You have the project and network configuration for the location where you want to deploy your trusted environment.
  • You have the appropriate IAM permissions to configure project resources (see service account roles).
  • You have an IAM group and a list of identities that are allowed to access the trusted environment.
  • You are familiar with your organization's security best practices and policies. Learn about Google Cloud security foundation best practices by reading the security foundation blueprint.

Prerequisites

Prepare your admin workstation

You can use Cloud Shell, a local machine, or a VM as your admin workstation.

Tools for Cloud Shell as your Admin workstation

Tools for a local workstation as your Admin workstation

Installation instructions for the tools in your environment

Install Cloud SDK

The Cloud SDK is preinstalled in Cloud Shell.

The Google Cloud SDK is used to interact with your GCP resources. Installation instructions for multiple platforms are available online.

Install Terraform

Terraform is used to automate the manipulation of cloud infrastructure. Its installation instructions are also available online. When configuring Terraform for use with Google Cloud, create a service account as described in Getting started with the Google provider.
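After creating that service account, the provider needs to be told to use it. A minimal sketch, assuming you authenticate with a downloaded key file as in the getting-started guide; the key path, project ID, region, and zone are placeholders.

```hcl
# Minimal Google provider configuration (placeholder values).
provider "google" {
  # Path to the JSON key of the provisioning service account.
  credentials = file("<path-to-service-account-key>.json")
  project     = "my-project-name"
  region      = "us-central1"
  zone        = "us-central1-a"
}
```

If your organization restricts key creation, the provider's impersonate_service_account argument is an alternative to a key file.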

Authentication

After installing the Cloud SDK, run gcloud init to set up the gcloud CLI. When prompted, choose the correct region and zone.

gcloud init

Ensure you are using the correct project. Replace my-project-name with the ID of your project:

gcloud config set project my-project-name

Compatibility

This module is meant for use with Terraform 0.12. Learn how to upgrade to the required version.
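To make Terraform stop early when run with an incompatible binary, a root configuration can pin the version. A sketch, assuming you want to stay on the 0.12.x series:

```hcl
# Refuse to run with any Terraform version outside 0.12.x.
terraform {
  required_version = ">= 0.12, < 0.13"
}
```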

Usage

Basic usage of this module is as follows:

module "notebooks_blueprint_security" {
  source  = "GoogleCloudPlatform/notebooks-blueprint-security/google"

  vpc_perimeter_regions           = ["US", "DE"]
  vpc_perimeter_policy_name       = "higher_trust_perimeter_policy"
  vpc_perimeter_ip_subnetworks    = ["NETWORK_CIDR"]  # allowed to access VPC-SC perimeters
  zone                            = "us-central1-a"
  resource_locations              = ["in:us-locations", "in:eu-locations"]
  notebook_key_name               = "trusted-data-key"
  dataset_id                      = "sample_ds_for_notebooks"
  notebook_name_prefix            = "trusted-sample"
  bootstrap_notebooks_bucket_name = "notebook_bootstrap"
  default_policy_id               = "12345678"  # likely org id
  project_trusted_analytics       = "trusted-analytics"
  project_trusted_data            = "trusted-data"
  project_trusted_kms             = "trusted-kms"
  trusted_private_network         = "projects/<shared-restricted-prj>/global/networks/<your_vpc>"
  trusted_private_subnet          = "projects/<shared-restricted-prj>/regions/<region>/subnetworks/<your_subnets_for_notebooks>"
  confidential_groups             = ["group:<group_email_1>", "group:<group_email_2>"]
  trusted_scientists              = ["user:<user_email_1>", "user:<user_email_2>"]
}
  1. Create a tfvars file with the required inputs (see the Inputs section below).

  2. Run terraform init to download the required providers and modules.

  3. Run terraform plan -var-file="YOUR_FILE.tfvars" to review the infrastructure plan. Replace YOUR_FILE with the name of your tfvars file from step 1.

  4. Run terraform apply -var-file="YOUR_FILE.tfvars" to build the infrastructure. Replace YOUR_FILE with the name of your tfvars file from step 1.

  5. Access your AI Platform Notebook:

    • Establish an SSH tunnel from your device to your AI Platform Notebook.
    • In your browser, visit http://localhost:8080 to access your AI Platform Notebook.

To verify access, run the following query in a notebook cell. Replace PROJECT_ID, DATASET, and TABLE with values that match your tfvars file.

%%bigquery
SELECT
  *
FROM `PROJECT_ID.DATASET.TABLE`
LIMIT 10
  6. Run terraform destroy -var-file="YOUR_FILE.tfvars" to destroy the infrastructure you built. Replace YOUR_FILE with the name of your tfvars file from step 1.
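Step 1 asks for a tfvars file. A minimal sketch that sets only the required inputs from the Inputs table (optional inputs fall back to their defaults). Every value is a placeholder modeled on the usage example; replace each one with your own.

```hcl
# YOUR_FILE.tfvars: hypothetical example covering only the required inputs.
confidential_groups          = ["group:<group_email_1>"]
dataset_id                   = "sample_ds_for_notebooks"
default_policy_id            = "<default-policy-id>"
project_trusted_analytics    = "trusted-analytics"
project_trusted_data         = "trusted-data"
project_trusted_kms          = "trusted-kms"
trusted_private_network      = "projects/<shared-restricted-prj>/global/networks/<your_vpc>"
trusted_private_subnet       = "projects/<shared-restricted-prj>/regions/<region>/subnetworks/<your_subnets_for_notebooks>"
trusted_scientists           = ["user:<user_email_1>"]
vpc_perimeter_ip_subnetworks = ["NETWORK_CIDR"]
vpc_perimeter_regions        = ["US", "DE"]
zone                         = "us-central1-a"
```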

Adding identities to groups

  1. You may need to add service accounts to the appropriate high-trust data scientist IAM group.
# Replace the placeholder values below with your own
gcloud identity groups memberships add --group-email=<high_trust_group_email> --member-email=sa-p-notebook-compute@<proj>.iam.gserviceaccount.com

Accessing Notebooks

Use SSH to access your notebook. Notebooks have no external IP address, and users should not impersonate the Notebook service account. To learn how to open an SSH tunnel to launch JupyterLab, read the SSH to access JupyterLab article.

Functional examples are included in the examples directory.

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| bootstrap_notebooks_bucket_name | Name of the bucket that holds bootstrap scripts for notebooks. | string | "notebook_bootstrap" | no |
| confidential_groups | List of groups allowed to access PII data. | list(string) | n/a | yes |
| dataset_id | BigQuery dataset ID with PII data that your scientists need to access from their notebooks. | string | n/a | yes |
| default_policy_id | ID of the default org policy. | string | n/a | yes |
| notebook_key_name | HSM key used to protect PII data in notebooks. | string | "trusted-data-key" | no |
| notebook_name_prefix | Prefix for notebook names, indicating that they run in the higher-trust environment. | string | "trusted-sample" | no |
| project_trusted_analytics | Trusted project for analytics activities and data scientists. | string | n/a | yes |
| project_trusted_data | Trusted project that holds the PII data used by notebooks. | string | n/a | yes |
| project_trusted_kms | Trusted project that houses the encryption keys. | string | n/a | yes |
| resource_locations | Locations used in the org policy to limit where resources can be provisioned. | list(string) | ["in:us-locations", "in:eu-locations"] | no |
| trusted_private_network | Network with no external IP for notebooks. Should be a restricted private VPC. | string | n/a | yes |
| trusted_private_subnet | Subnet with no external IP for notebooks. Should be part of a restricted private network and have logging and private networking enabled. | string | n/a | yes |
| trusted_scientists | List of trusted users. | list(string) | n/a | yes |
| vpc_perimeter_ip_subnetworks | IP subnets allowed to access the perimeters. | list(string) | n/a | yes |
| vpc_perimeter_policy_name | Policy name for the VPC Service Controls perimeter. | string | "higher_trust_perimeter_policy" | no |
| vpc_perimeter_regions | Two-letter ISO 3166-1 alpha-2 codes for the regions allowed VPC access. | list(string) | n/a | yes |
| zone | Zone in which to create the secured notebook. Must match the region. | string | n/a | yes |

Outputs

| Name | Description |
|------|-------------|
| access_level_name | Access level name used in the perimeter policy |
| bkt_notebooks_name | Name of the bootstrap bucket |
| caip_sa_email | Email of the service account used by Cloud AI Platform (CAIP); should not be a default service account |
| folder_trusted | Folder that holds all the trusted projects and constraints |
| notebook_instances | List of notebooks created (VM names) |
| notebook_key_name | Name of the key used in the notebooks |
| notebook_key_ring_name | Name of the key ring |
| perimeter_name | VPC-SC perimeter name |
| script_name | Name of the installed post-startup script |
| vpc_perimeter_resource_protected | List of projects included in the VPC-SC perimeter |
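These outputs can be re-exported from a root configuration. A sketch, assuming the module instance is named notebooks_blueprint_security as in the usage example; the root-level output names are hypothetical.

```hcl
# Hypothetical root-module outputs that surface two of the module's outputs.
output "notebook_instances" {
  description = "Notebook VM names created by the blueprint"
  value       = module.notebooks_blueprint_security.notebook_instances
}

output "perimeter_name" {
  description = "VPC-SC perimeter that protects the trusted projects"
  value       = module.notebooks_blueprint_security.perimeter_name
}
```

After terraform apply, terraform output notebook_instances prints the notebook VM names, which is handy when opening the SSH tunnel.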

Requirements

These sections describe requirements for using this module.

Software

The following dependencies must be available:

Service Account

A service account with the following roles must be used to provision the resources of this module:

Organization Level

  • Access Context Manager Policy Admin: roles/accesscontextmanager.policyAdmin
  • Organization Policy Admin: roles/orgpolicy.policyAdmin
  • Security Admin: roles/iam.securityAdmin
  • Service Usage Consumer: roles/serviceusage.serviceUsageConsumer

Restricted Shared VPC Project (created in blueprint foundation)

  • Network Admin: roles/compute.networkAdmin

Analytics Project

  • Service Account Creator: roles/iam.serviceAccountCreator
  • Cloud KMS Admin: roles/cloudkms.admin
  • Compute Admin: roles/compute.admin
  • BigQuery Job User: roles/bigquery.jobUser
  • BigQuery User: roles/bigquery.user
  • Notebooks Runner: roles/notebooks.runner
  • Service Account User: roles/iam.serviceAccountUser
  • Service Usage Admin: roles/serviceusage.serviceUsageAdmin

Data Project

  • BigQuery Job User: roles/bigquery.jobUser
  • BigQuery User: roles/bigquery.user
  • Role Administrator: roles/iam.roleAdmin
  • Storage Admin: roles/storage.admin

KMS Project

  • Cloud KMS Admin: roles/cloudkms.admin

The Project Factory module and the IAM module may be used in combination to provision a service account with the necessary roles applied.
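If you grant the roles with plain provider resources rather than those modules, a sketch for the Analytics Project list might look like this. The provisioning service account email and the project ID are placeholders, and the other projects would follow the same pattern with their own role lists.

```hcl
# Sketch: grant the Analytics Project roles listed above to the provisioning SA.
locals {
  # Placeholder identity of the service account that runs Terraform.
  provisioner_sa = "serviceAccount:<provisioner-sa>@<seed-project>.iam.gserviceaccount.com"

  analytics_roles = [
    "roles/iam.serviceAccountCreator",
    "roles/cloudkms.admin",
    "roles/compute.admin",
    "roles/bigquery.jobUser",
    "roles/bigquery.user",
    "roles/notebooks.runner",
    "roles/iam.serviceAccountUser",
    "roles/serviceusage.serviceUsageAdmin",
  ]
}

# One non-authoritative binding per role, so existing members are left untouched.
resource "google_project_iam_member" "analytics" {
  for_each = toset(local.analytics_roles)

  project = "trusted-analytics"
  role    = each.value
  member  = local.provisioner_sa
}
```

google_project_iam_member is additive, which suits shared projects better than the authoritative google_project_iam_binding or google_project_iam_policy resources.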

Enable APIs

To operate with the service account, you must enable the following APIs on the project where analytics and Notebooks reside:

  • Access Context Manager API: accesscontextmanager.googleapis.com
  • BigQuery API: bigquery.googleapis.com
  • Compute Engine API: compute.googleapis.com
  • Identity and Access Management (IAM) API: iam.googleapis.com
  • Key Management Service (KMS) API: cloudkms.googleapis.com
  • Notebooks (AI Platform) API: notebooks.googleapis.com
  • Google Cloud Storage API: storage.googleapis.com
  • Resource Manager API: cloudresourcemanager.googleapis.com
  • IAM Service Account Credentials API: iamcredentials.googleapis.com

To operate with the service account, you must also enable the following APIs on the project where your KMS/HSM keys reside:

  • Google Cloud Storage API: storage.googleapis.com
  • Key Management Service (KMS) API: cloudkms.googleapis.com
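The API lists above can also be enabled from Terraform. A minimal sketch, assuming the analytics and KMS project IDs are trusted-analytics and trusted-kms as in the usage example:

```hcl
locals {
  analytics_apis = [
    "accesscontextmanager.googleapis.com",
    "bigquery.googleapis.com",
    "compute.googleapis.com",
    "iam.googleapis.com",
    "cloudkms.googleapis.com",
    "notebooks.googleapis.com",
    "storage.googleapis.com",
    "cloudresourcemanager.googleapis.com",
    "iamcredentials.googleapis.com",
  ]
}

# Enable each API on the analytics project.
resource "google_project_service" "analytics" {
  for_each = toset(local.analytics_apis)

  project            = "trusted-analytics"
  service            = each.value
  disable_on_destroy = false # leave APIs on if this config is destroyed
}

# Enable the two APIs needed on the KMS project.
resource "google_project_service" "kms" {
  for_each = toset(["storage.googleapis.com", "cloudkms.googleapis.com"])

  project            = "trusted-kms"
  service            = each.value
  disable_on_destroy = false
}
```

Setting disable_on_destroy to false avoids breaking other workloads in the same project when this configuration is torn down.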

Resource Hierarchy

Within your organization's prod environment, create a folder to hold your trusted projects and to centrally manage your policies for Notebooks that use PII data. Note: fldr-prod is created by the foundation blueprint. Create folders by using the project factory.

fldr-prod
└── fldr-trusted
    ├── trusted-data
    ├── trusted-analytics
    └── trusted-kms
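For reference, the same hierarchy expressed with raw provider resources rather than the project factory might look like the sketch below. The fldr-prod folder ID and billing account are placeholders, and real project IDs must be globally unique, so they usually carry a suffix.

```hcl
# Hypothetical sketch: the trusted folder under fldr-prod.
resource "google_folder" "trusted" {
  display_name = "fldr-trusted"
  parent       = "folders/<fldr-prod-id>"
}

# One project per trusted role, all placed in the trusted folder.
resource "google_project" "trusted" {
  for_each = toset(["trusted-data", "trusted-analytics", "trusted-kms"])

  name            = each.value
  project_id      = each.value
  folder_id       = google_folder.trusted.name # "folders/<id>"
  billing_account = "<billing-account-id>"
}
```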

Contributing

Refer to the contribution guidelines for information on contributing to this module.