-
Notifications
You must be signed in to change notification settings - Fork 153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add BoilingData to ClickBench #125
Comments
I tried to use it with our public bucket, |
My test query is:
@santrancisco created an IAM role for me: and I've set it with
as in the documentation. But I'm still getting an access denied error. |
Sometimes it took over 2 minutes for a “select 1” to return for me |
This contradicts the statement on their website:
Realtime dashboarding is a natural and well-explored use case for ClickHouse, and I don't like when other inferior engines, like duckdb, make unfounded claims, pretending to have similar capabilities. |
Trying other queries:
but none works giving the same error |
I'm waiting for several minutes, but then it logs me out. I didn't have a single time the |
Hello, sorry for replying this late! Since Jul we have new GUI and new backend release so things should work better. You could use the bdcli to setup the IAM Role as it crafts the IAM Policy for you.
Does this help? |
This is the set of queries I'm running on Boiling. I've fixed some sort orders and column aliases to make the results comparable with local laptop running DuckDB instance. So, I have copied the SELECT COUNT(*) AS count FROM parquet_scan('s3://boilingdata-demo/hits.parquet');
SELECT COUNT(*) AS count FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE AdvEngineID <> 0;
SELECT UserID FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE UserID = 435090932899640449;
SELECT UserID FROM parquet_scan('s3://boilingdata-demo/hits.parquet') ORDER BY UserID LIMIT 10;
SELECT COUNT(*) AS count FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE URL LIKE '%google%';
SELECT SearchPhrase, EventTime FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE SearchPhrase <> '' ORDER BY EventTime, SearchPhrase LIMIT 10;
SELECT SearchPhrase FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE SearchPhrase <> '' ORDER BY SearchPhrase LIMIT 10;
SELECT SearchPhrase FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE SearchPhrase <> '' ORDER BY EventTime, SearchPhrase LIMIT 10;
SELECT AdvEngineID, COUNT(*) AS count FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE AdvEngineID > 0 GROUP BY AdvEngineID ORDER BY COUNT(*) DESC;
SELECT SearchPhrase, MIN(URL) AS url, COUNT(*) AS c FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE URL LIKE '%google%' AND SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c DESC, SearchPhrase LIMIT 10;
SELECT COUNT(*) AS total, SearchPhrase FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE SearchPhrase != '' GROUP BY SearchPhrase ORDER BY SearchPhrase, total DESC LIMIT 10;
SELECT COUNT(DISTINCT SearchPhrase) AS total FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE SearchPhrase != '';
SELECT COUNT(DISTINCT UserID) AS uniqueUserIds FROM parquet_scan('s3://boilingdata-demo/hits.parquet');
SELECT RegionID, COUNT(DISTINCT UserID) AS u FROM parquet_scan('s3://boilingdata-demo/hits.parquet') GROUP BY RegionID ORDER BY u DESC LIMIT 10;
SELECT UserID, COUNT(*) AS count FROM parquet_scan('s3://boilingdata-demo/hits.parquet') GROUP BY UserID ORDER BY COUNT(*) DESC, UserID LIMIT 10;
SELECT UserID, SearchPhrase, COUNT(*) FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE SearchPhrase <> '' GROUP BY UserID, SearchPhrase ORDER BY COUNT(*) DESC, UserID LIMIT 10;
SELECT MobilePhoneModel, COUNT(DISTINCT UserID) AS u FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE MobilePhoneModel <> '' AND UserID <> '' GROUP BY MobilePhoneModel ORDER BY u DESC LIMIT 10;
SELECT MobilePhone, MobilePhoneModel, COUNT(DISTINCT UserID) AS u FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE MobilePhoneModel <> ''AND UserID <> '' GROUP BY MobilePhone, MobilePhoneModel ORDER BY u DESC LIMIT 10;
SELECT SearchPhrase, COUNT(DISTINCT UserID) AS u FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY u DESC, SearchPhrase LIMIT 13;
SELECT SearchEngineID, SearchPhrase, COUNT(*) AS c FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE SearchPhrase <> '' GROUP BY SearchEngineID, SearchPhrase ORDER BY c DESC LIMIT 10;
SELECT UserID, COUNT(*) FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE UserID <> NULL GROUP BY UserID ORDER BY COUNT(*) DESC LIMIT 10;
SELECT SUM(ResolutionWidth) AS resWth, SUM(ResolutionWidth + 1) AS resW1, SUM(ResolutionWidth + 2) AS resW2, SUM(ResolutionWidth + 3) AS resW3, SUM(ResolutionWidth + 4) AS resW4, SUM(ResolutionWidth + 5) AS resW5, SUM(ResolutionWidth + 6) AS resW6, SUM(ResolutionWidth + 7) AS resW7, SUM(ResolutionWidth + 8) AS resW8, SUM(ResolutionWidth + 9) AS resW9, SUM(ResolutionWidth + 10) AS resW10, SUM(ResolutionWidth + 11) AS resW11, SUM(ResolutionWidth + 12) AS resW12, SUM(ResolutionWidth + 13) AS resW13, SUM(ResolutionWidth + 14) AS resW14, SUM(ResolutionWidth + 15) AS resW15, SUM(ResolutionWidth + 16) AS resW16, SUM(ResolutionWidth + 17) AS resW17, SUM(ResolutionWidth + 18) AS resW18, SUM(ResolutionWidth + 19) AS resW19, SUM(ResolutionWidth + 20) AS resW20, SUM(ResolutionWidth + 21) AS resW21, SUM(ResolutionWidth + 22) AS resW22, SUM(ResolutionWidth + 23) AS resW23, SUM(ResolutionWidth + 24) AS resW24, SUM(ResolutionWidth + 25) AS resW25, SUM(ResolutionWidth + 26) AS resW26, SUM(ResolutionWidth + 27) AS resW27, SUM(ResolutionWidth + 28) AS resW28, SUM(ResolutionWidth + 29) AS resW29, SUM(ResolutionWidth + 30) AS resW30, SUM(ResolutionWidth + 31) AS resW31, SUM(ResolutionWidth + 32) AS resW32, SUM(ResolutionWidth + 33) AS resW33, SUM(ResolutionWidth + 34) AS resW34, SUM(ResolutionWidth + 35) AS resW35, SUM(ResolutionWidth + 36) AS resW36, SUM(ResolutionWidth + 37) AS resW37, SUM(ResolutionWidth + 38) AS resW38, SUM(ResolutionWidth + 39) AS resW39, SUM(ResolutionWidth + 40) AS resW40, SUM(ResolutionWidth + 41) AS resW41, SUM(ResolutionWidth + 42) AS resW42, SUM(ResolutionWidth + 43) AS resW43, SUM(ResolutionWidth + 44) AS resW44, SUM(ResolutionWidth + 45) AS resW45, SUM(ResolutionWidth + 46) AS resW46, SUM(ResolutionWidth + 47) AS resW47, SUM(ResolutionWidth + 48) AS resW48, SUM(ResolutionWidth + 49) AS resW49, SUM(ResolutionWidth + 50) AS resW50, SUM(ResolutionWidth + 51) AS resW51, SUM(ResolutionWidth + 52) AS resW52, SUM(ResolutionWidth + 53) AS resW53, SUM(ResolutionWidth + 54) AS resW54, SUM(ResolutionWidth + 55) AS resW55, SUM(ResolutionWidth + 56) AS resW56, SUM(ResolutionWidth + 57) AS resW57, SUM(ResolutionWidth + 58) AS resW58, SUM(ResolutionWidth + 59) AS resW59, SUM(ResolutionWidth + 60) AS resW60, SUM(ResolutionWidth + 61) AS resW61, SUM(ResolutionWidth + 62) AS resW62, SUM(ResolutionWidth + 63) AS resW63, SUM(ResolutionWidth + 64) AS resW64, SUM(ResolutionWidth + 65) AS resW65, SUM(ResolutionWidth + 66) AS resW66, SUM(ResolutionWidth + 67) AS resW67, SUM(ResolutionWidth + 68) AS resW68, SUM(ResolutionWidth + 69) AS resW69, SUM(ResolutionWidth + 70) AS resW70, SUM(ResolutionWidth + 71) AS resW71, SUM(ResolutionWidth + 72) AS resW72, SUM(ResolutionWidth + 73) AS resW73, SUM(ResolutionWidth + 74) AS resW74, SUM(ResolutionWidth + 75) AS resW75, SUM(ResolutionWidth + 76) AS resW76, SUM(ResolutionWidth + 77) AS resW77, SUM(ResolutionWidth + 78) AS resW78, SUM(ResolutionWidth + 79) AS resW79, SUM(ResolutionWidth + 80) AS resW80, SUM(ResolutionWidth + 81) AS resW81, SUM(ResolutionWidth + 82) AS resW82, SUM(ResolutionWidth + 83) AS resW83, SUM(ResolutionWidth + 84) AS resW84, SUM(ResolutionWidth + 85) AS resW85, SUM(ResolutionWidth + 86) AS resW86, SUM(ResolutionWidth + 87) AS resW87, SUM(ResolutionWidth + 88) AS resW88, SUM(ResolutionWidth + 89) AS resW89 FROM parquet_scan('s3://boilingdata-demo/hits.parquet');
SELECT SearchPhrase, MIN(URL) AS url, MIN(Title) AS title, COUNT(*) AS c, COUNT(DISTINCT UserID) AS usrIdCount FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE Title LIKE '%Google%' AND URL NOT LIKE '%.google.%' AND SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c, searchphrase DESC LIMIT 10;
SELECT WatchID, ClientIP, COUNT(*) AS c, SUM(IsRefresh), AVG(ResolutionWidth) FROM parquet_scan('s3://boilingdata-demo/hits.parquet') GROUP BY WatchID, ClientIP ORDER BY c DESC, WatchID, ClientIP LIMIT 10;
SELECT URL, COUNT(*) AS c FROM parquet_scan('s3://boilingdata-demo/hits.parquet') GROUP BY URL ORDER BY c DESC LIMIT 12;
SELECT URL, COUNT(*) AS c FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE url ='http://kinopoisk.ru/perm.irr.ru' GROUP BY URL;
SELECT SUM(AdvEngineID), COUNT(*), AVG(ResolutionWidth) FROM parquet_scan('s3://boilingdata-demo/hits.parquet');
SELECT AVG(UserID) FROM parquet_scan('s3://boilingdata-demo/hits.parquet');
SELECT MIN(EventDate) AS minEventDate, MAX(EventDate) AS maxEventDate FROM parquet_scan('s3://boilingdata-demo/hits.parquet');
SELECT RegionID, SUM(AdvEngineID) AS sumAdvEngineId, COUNT(*) AS c, AVG(ResolutionWidth) AS avgResolutionWidth, COUNT(DISTINCT UserID) AS distinctUsers FROM parquet_scan('s3://boilingdata-demo/hits.parquet') GROUP BY RegionID ORDER BY c DESC LIMIT 10;
SELECT 1, URL, COUNT(*) AS c FROM parquet_scan('s3://boilingdata-demo/hits.parquet') GROUP BY 1, URL ORDER BY c DESC LIMIT 10;
SELECT URL, COUNT(*) AS PageViews FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND DontCountHits = 0 AND IsRefresh = 0 AND URL <> '' GROUP BY URL ORDER BY PageViews DESC LIMIT 10;
SELECT Title, COUNT(*) AS PageViews FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND DontCountHits = 0 AND IsRefresh = 0 AND Title <> '' GROUP BY Title ORDER BY PageViews DESC LIMIT 10;
SELECT URL, COUNT(*) AS PageViews FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND IsRefresh = 0 AND IsLink <> 0 AND IsDownload = 0 GROUP BY URL ORDER BY PageViews DESC LIMIT 10 OFFSET 1000;
SELECT TraficSourceID, SearchEngineID, AdvEngineID, CASE WHEN (SearchEngineID = 0 AND AdvEngineID = 0) THEN Referer ELSE '' END AS Src, URL AS Dst, COUNT(*) AS PageViews FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND IsRefresh = 0 GROUP BY TraficSourceID, SearchEngineID, AdvEngineID, Src, Dst ORDER BY PageViews DESC LIMIT 10 OFFSET 1000;
SELECT URLHash, EventDate, COUNT(*) AS PageViews FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND IsRefresh = 0 AND TraficSourceID IN (-1, 6) AND RefererHash = 3594120000172545465 GROUP BY URLHash, EventDate ORDER BY PageViews DESC LIMIT 10 OFFSET 100;
SELECT WindowClientWidth, WindowClientHeight, COUNT(*) AS PageViews FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND IsRefresh = 0 AND DontCountHits = 0 AND URLHash = 2868770270353813622 GROUP BY WindowClientWidth, WindowClientHeight ORDER BY PageViews DESC LIMIT 10 OFFSET 10000;
SELECT REGEXP_REPLACE(Referer, '^https?://(?:www.)?([^/]+)/.*$', '\x01') AS k, AVG(STRLEN(Referer)) AS l, COUNT(*) AS c, MIN(Referer) FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE Referer <> '' GROUP BY k HAVING COUNT(*) > 100000 ORDER BY l DESC, c LIMIT 25;
SELECT SearchEngineID, ClientIP, COUNT(*) AS c, SUM(IsRefresh) AS sumIsRefresh, AVG(ResolutionWidth) AS avgResWidth FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE SearchPhrase <> '' GROUP BY SearchEngineID, ClientIP ORDER BY c DESC, SearchEngineID, ClientIP LIMIT 10;
SELECT WatchID, ClientIP, COUNT(*) AS c, SUM(IsRefresh) AS sumIsRefresh, AVG(ResolutionWidth) AS AvgRsesWidth FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE SearchPhrase <> '' GROUP BY WatchID, ClientIP ORDER BY c, WatchID, ClientIP DESC LIMIT 10;
SELECT ClientIP, ClientIP - 1, ClientIP - 2, ClientIP - 3, COUNT(*) AS c FROM parquet_scan('s3://boilingdata-demo/hits.parquet') GROUP BY ClientIP, ClientIP - 1, ClientIP - 2, ClientIP - 3 ORDER BY c DESC, ClientIP LIMIT 11;
SELECT COUNT(*) AS count FROM ( SELECT SearchPhrase FROM parquet_scan('s3://boilingdata-demo/hits.parquet') GROUP BY SearchPhrase ) AS a ORDER BY count DESC;
SELECT DATE_TRUNC('minute', EventTime::timestamp) AS M, COUNT(*) AS PageViews FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE CounterID = 62 AND EventDate >= '2013-07-14' AND EventDate <= '2013-07-15' AND IsRefresh = 0 AND DontCountHits = 0 GROUP BY DATE_TRUNC('minute', EventTime::timestamp) ORDER BY DATE_TRUNC('minute', EventTime::timestamp) LIMIT 10 OFFSET 1000;
SELECT CounterID, AVG(STRLEN(URL)) AS l, COUNT(*) AS c FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE URL <> '' GROUP BY CounterID HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25;
SELECT * FROM parquet_scan('s3://boilingdata-demo/hits.parquet') WHERE URL LIKE '%google%' ORDER BY EventTime, WatchID LIMIT 1; |
Thank you! Yes, let's try these queries. |
I've created "engineering white paper" with the test results. Link is on front page: https://www.boilingdata.com/ |
https://www.boilingdata.com/
Suggested by @danthegoodman1 in a discussion at https://clickhouse.com/slack
The text was updated successfully, but these errors were encountered: