Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Shared cache grants check option #466

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

hulk8
Copy link
Contributor

@hulk8 hulk8 commented Sep 11, 2024

Description

Summary

I want to add some feature option for fixing possible insecure shared cache behavior with shared_with_all_users option enabled.
User who haven't grants for tables in query can get cached result if another user who have sufficient grants already executed same query (like SELECT * FROM table) because only user authorization checked before cache read but not grants (without shared_with_all_users enabled request will be redirected to click and then failed if user have insufficient grants).

Example

For example, if we have ClickHouse database ny_taxi (from dataset examples) with user_A and user_B:

CREATE DATABASE ny_taxi;

CREATE TABLE ny_taxi.trips (
    trip_id             UInt32,
    pickup_datetime     DateTime,
    dropoff_datetime    DateTime,
    pickup_longitude    Nullable(Float64),
    pickup_latitude     Nullable(Float64),
    dropoff_longitude   Nullable(Float64),
    dropoff_latitude    Nullable(Float64),
    passenger_count     UInt8,
    trip_distance       Float32,
    fare_amount         Float32,
    extra               Float32,
    tip_amount          Float32,
    tolls_amount        Float32,
    total_amount        Float32,
    payment_type        Enum('CSH' = 1, 'CRE' = 2, 'NOC' = 3, 'DIS' = 4, 'UNK' = 5),
    pickup_ntaname      LowCardinality(String),
    dropoff_ntaname     LowCardinality(String)
)
ENGINE = MergeTree
PRIMARY KEY (pickup_datetime, dropoff_datetime);

INSERT INTO ny_taxi.trips
SELECT
    trip_id,
    pickup_datetime,
    dropoff_datetime,
    pickup_longitude,
    pickup_latitude,
    dropoff_longitude,
    dropoff_latitude,
    passenger_count,
    trip_distance,
    fare_amount,
    extra,
    tip_amount,
    tolls_amount,
    total_amount,
    payment_type,
    pickup_ntaname,
    dropoff_ntaname
FROM gcs(
    'https://storage.googleapis.com/clickhouse-public-datasets/nyc-taxi/trips_{0..2}.gz',
    'TabSeparatedWithNames'
);

CREATE USER user_A IDENTIFIED WITH plaintext_password BY 'user_A';
CREATE USER user_B IDENTIFIED WITH plaintext_password BY 'user_B';
GRANT SHOW, SELECT ON ny_taxi.* TO user_A;

and chproxy config:

caches:
  - name: "longterm"
    mode: "file_system"
    file_system:
      dir: "/app/chproxy/longterm/cache"
      max_size: 10Gb
    expire: 1h
    shared_with_all_users: true

server:
  http:
    listen_addr: ":80"

  https:
    listen_addr: ":443"
    autocert:
      cache_dir: "/app/chproxy/certs"
  proxy:
    enable: true
    header: CF-Connecting-IP

users:
  - name: "*"
    to_cluster: "click"
    to_user: "*"
    is_wildcarded: true
    cache: "longterm"

clusters:
  - name: "click"
    scheme: "http"
    nodes: ["click1:8123"]
    users:
      - name: "*"

I want to review 3 cases:

  1. Another user_C trying to execute query through chproxy. He will have 403 Forbidden status code with response content:
Code: 516. DB::Exception: user_C: Authentication failed: password is incorrect, or there is no user with such name. (AUTHENTICATION_FAILED) (version 24.6.2.17 (official build))

And this is OK.
2. If user_B who is present in ClickHouse will try to execute query SELECT * FROM ny_taxi.trips LIMIT 10 for example. He will receive 500 Internal Server Error with response content:

Code: 497. DB::Exception: user_B: Not enough privileges. To execute this query, it's necessary to have the grant SELECT(trip_id, pickup_datetime, dropoff_datetime, pickup_longitude, pickup_latitude, dropoff_longitude, dropoff_latitude, passenger_count, trip_distance, fare_amount, extra, tip_amount, tolls_amount, total_amount, payment_type, pickup_ntaname, dropoff_ntaname) ON ny_taxi.trips. (ACCESS_DENIED) (version 24.6.2.17 (official build))

And this is OK too.
3. If user_A who is present in ClickHouse and have grants for ny_taxi.trips will execute query SELECT * FROM ny_taxi.trips LIMIT 10 he will receive 200 Ok and response data, which will be also cached in file_system_cache. And finally, until cached result expires user_B (who haven't sufficient grants for ny_taxi.trips) and any other user can execute this exact query and get cached response data even if they have no grants for SELECT on ny_taxi.trips. It works because of shared_with_all_users.
And this is not so good, but, as far as I understand, was designed this way.

Solution

My proposition is:

  1. Add config option check_grants_for_shared_cache in caches section which enables or disables additional grants check for cached request.
  2. Add additional grant check only for cached queries if shared_with_all_users: true and check_grants_for_shared_cache: true. This check will be performed only when user executes query and before load shared cache result. This check will create new request to ClickHouse (with same request scope) with wrapped original query into DESC ({{original_query}}). Along with query parsed and analyzed for return types (which is fast) ClickHouse checks permissions for the user to execute the query.

This solution I implemented in the PR.

Pull request type

Please check the type of change your PR introduces:

  • Bugfix
  • Feature
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • Documentation content changes
  • Other (please describe):

Checklist

  • Linter passes correctly
  • Add tests which fail without the change (if possible)
  • All tests passing
  • Extended the README / documentation, if necessary

Does this introduce a breaking change?

  • Yes
  • No

Further comments

In my cases of usage chproxy, shared cache between users is very helpful with some BI instruments, because:

  • BI requests like pagination, table structure and other queries are cached good.
  • BI othen have many users and it's good to share cache between them.

Security issue like this, when user can access to data (cached) with insufficient grants prevent from using shared cache in this cases.

This is the first draft and I will refactor it later if this is useful feature.

I will be glad, if someone can give me a hint to implement 3 cases from Example in tests.

@mga-chka
Copy link
Collaborator

fyi I'm a bit busy to review your PR, I asked the other maintainer to see if someone can have a look

Copy link
Collaborator

@vfoucault vfoucault left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @hulk8 and thanks for your contribution. I had a look at the code, it is pretty straightforward.

What I'd love to see is some tests around the feature you are adding and also if you could fix lint issues.

@@ -146,6 +146,25 @@ func (rp *reverseProxy) ServeHTTP(rw http.ResponseWriter, req *http.Request) {
}

if shouldReturnFromCache {
// if cache shared between all users
// try to check if cached query is allowed for current user
if s.user.cache != nil && s.user.cache.SharedWithAllUsers && s.user.cache.CheckGrantsForSharedCache {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure to properly understand why we need to check s.user.cache here ? what is the goal of this.

Isn't Checking if a cache is shared && that we should check grants enough ?

rp.proxyRequest(s, srwCheck, srw, checkReq)

if srwCheck.statusCode == http.StatusOK {
log.Debugf("%s: check grants for shared cached query request success; query: %q; Method: %s; URL: %q", s, checkQuery, checkReq.Method, checkReq.URL.String())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe adding the user would be nice in the debug logs, wdyt ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants