Name		Name	Last commit message	Last commit date
parent directory ..
terraform		terraform
README.md		README.md
comparison.html		comparison.html
lookerstudio.png		lookerstudio.png
ranger_export.png		ranger_export.png

README.md

Apache Ranger Hive to BigLake/BigQuery IAM migration assessment

With Google Cloud Dataproc, you can deploy various Hive versions and use them like on-premise. One of the challenges in migration is authorization. It is possible to deploy Apache Ranger on Dataproc to preserve the authorization rules. Ranger-Hive plugin works as it is, and recently Ranger-GCS plugin has been introduced to replace Ranger-HDFS rules.

After an initial migration, many teams wants to adopt:

BigQuery datawarehouse for its serverless architecture instead of maintaining Hive infrastructure
BigLake API as a single interface to data stored in GCS, AWS S3 or Azure Data Lake. BigLake provides the exact fine grained access control rules as BigQuery.

One of the challenges in this, in addition to data migration and SQL translation, is the adaptation of Apache Ranger authorization rules to BigLake/BigQuery IAM. In this work, we provide an assessment tool for you to see what can be copied as it is and what you have to re-architecture.

BigLake/BigQuery IAM vs Ranger Hive Plugin

The table below compares these authorization systems.

	BigLake/BigQuery IAM	Ranger Hive Plugin
Users and Groups	supported	supported
Permission Mapping	Select -> bigquery.tables.getData, bigquery.jobs.create Update -> bigquery.tables.updateData Create -> bigquery.tables.create Drop -> bigquery.tables.delete Alter -> bigquery.tables.update Index, -> index is different in BigQuery, clustering and partitioning features exist. Lock -> there is no lock
Authorization Granularity	Dataset, Table, Column, Row
Data Masking on Columns	supported	supported
Policy Labels	supported	supported
Tag Based Access Control	Policy Tags for column based access control Resource Tags at BigQuery dataset level	supported
Allow Exceptions	not supported	supported
Deny Policies	not supported. Preview for GCS. Please check the latest information.	supported
Policy Priorities (Priorities of 0 or 1)	Not supported since IAM is additive. In Ranger, policy priorities are used to allow access temporarily by overriding other policies. We can achieve temporary access by using group memberships	supported
Wildcard	Wildcard () are not supported in IAM. For instance, we cannot create IAM rules for finance_ However, we can group resources in Projects, Folders etc.	supported
Validity Period	Validity period is not supported since BigQuery IAM does not support conditions yet (please check the latest information). Nevertheless, temporary access can be given by temporary group membership, which is also the recommended since it does not require changes in the IAM policy and hence no need to run a CI/CD pipeline.	supported

How the assessment works

The assessment tool is in fact a dashboard where we:

load Ranger policies to BigQuery,
apply transformations via views
display the results in a Looker studio dashboard

Steps for the assessment

Export Ranger policies for hive
Convert them to newline delimited json format so that BQ can parse. You may use jq:

cat <RANGER EXPORT>.json | jq -c '.policies[]' > ranger_policies.jsonl

We should create or re-use a GCS bucket to upload the ranger_polices.jsonl. Then we need BigQuery tables and views for analysis. Under terraform folder you find all the necessary scripts to create all.
```
cd terraform
gcloud auth application-default login
# Create and set the parameters in terraform.tfvars file
# vi terraform.tfvars
terraform apply
```

copy the policy export to GCS:

export PROJECT_ID=<your project id>
gsutil cp ranger_policies.jsonl gs://$PROJECT_ID-ranger-assessment/

load into BQ:

export REGION=<region for the dataset>
bq load --location=$REGION  \
--source_format=NEWLINE_DELIMITED_JSON \
$DATASET_ID.hive_ranger_policies \
gs://$PROJECT_ID-ranger-assessment/ranger_policies.jsonl

If you have used the terraform you may skip creating the views. Otherwise: Create hive_policies_view from the BigQuery console:

create or replace durmusio_experiements.hive_policies_view as
(
  select *, 
  array_length(validitySchedules) > 0 as temporary_policy,
  (array_length(denyPolicyItems) > 0 or isDenyAllElse) as hasDenyPolicy,
  array_length(policyItems) > 0 as hasAllowPolicy,
  array_length(allowExceptions) > 0 as hasAllowExceptions,
  (resources.udf is not null or resources.url is not null) as isNonBQResource,
  (array_length(resources.column.values) > 0 and regexp_contains(ARRAY_TO_STRING(resources.column.values, ','), r'[0-9A-Za-z_]' ) ) as isColumnBased,
  (array_length(resources.table.values) > 0 and regexp_contains(ARRAY_TO_STRING(resources.table.values, ','), r'[0-9A-Za-z_]' ) and regexp_contains(ARRAY_TO_STRING(resources.column.values, ','), r'\*' ) ) as isTableBased,
  (array_length(resources.database.values) > 0 and not regexp_contains(ARRAY_TO_STRING(resources.table.values, ','), r'[0-9A-Za-z_]' ) and not regexp_contains(ARRAY_TO_STRING(resources.column.values, ','), r'[0-9A-Za-z_]' ) ) as isDatabaseBased,
  (resources.column.isExcludes or resources.database.isExcludes or resources.table.isExcludes) as hasExcludes,
  ["Authorization", "Column Masking", "Row Level Filtering"][OFFSET(policyType)] as policyTypeName,
  
  from `[YOUR PROJECT ID].ranger_assessment.fk_hive_policies`
)

Then create statistics view:

create or replace view ranger_assessment.hive_policy_stats (AllPolicies, ValidPolicies, Deny, AllowExceptions, Temporary, HighPriority, HasExcludes) as (

select
(select count(1) from `YOUR_PROJECT_ID.ranger_assessment.hive_policies_view`) as number_of_policies,
(select count(1) from `YOUR_PROJECT_ID.ranger_assessment.hive_policies_view` where not (
hasDenyPolicy = true or hasAllowExceptions = true or hasExcludes = true or temporary_policy = true or policyPriority = 1
)) as number_of_valid_policies,
(select count(1) from `YOUR_PROJECT_ID.ranger_assessment.hive_policies_view` where 
hasDenyPolicy = true 
) as number_of_deny_policies,
(select count(1) from `YOUR_PROJECT_ID.ranger_assessment.hive_policies_view` where 
hasAllowExceptions = true 
) as number_of_allow_exceptions,
(select count(1) from `YOUR_PROJECT_ID.ranger_assessment.hive_policies_view` where 
temporary_policy = true 
) as number_of_temporary_policies,
(select count(1) from `YOUR_PROJECT_ID.ranger_assessment.hive_policies_view` where 
policyPriority > 0 
) as number_of_high_priority_policies,
(select count(1) from `YOUR_PROJECT_ID.ranger_assessment.hive_policies_view` where 
HasExcludes = true 
) as number_of_policies_with_excludes,
)

Finally make a copy of the data studio example and change the source to your own BigQuery dataset. If you don't have access to dashboard to copy, please request access.

LICENSE

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ranger-to-bigquery-biglake-assessment

ranger-to-bigquery-biglake-assessment

README.md

Apache Ranger Hive to BigLake/BigQuery IAM migration assessment

BigLake/BigQuery IAM vs Ranger Hive Plugin

How the assessment works

Steps for the assessment

LICENSE

Files

ranger-to-bigquery-biglake-assessment

Directory actions

More options

Directory actions

More options

Latest commit

History

ranger-to-bigquery-biglake-assessment

Folders and files

parent directory

README.md

Apache Ranger Hive to BigLake/BigQuery IAM migration assessment

BigLake/BigQuery IAM vs Ranger Hive Plugin

How the assessment works

Steps for the assessment

LICENSE