{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":56128733,"defaultBranch":"master","name":"impala","ownerLogin":"apache","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2016-04-13T07:00:08.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/47359?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1724797221.0","currentOid":""},"activityList":{"items":[{"before":"e7376466c56c15101fc699428b6bf3a00de7332b","after":"7167f3b4f0b6940f36705c95b8da17941557f721","ref":"refs/heads/master","pushedAt":"2024-09-20T09:46:50.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"asfgit","name":null,"path":"/asfgit","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1341245?s=80&v=4"},"commit":{"message":"IMPALA-13336: Fix syntax error in creating Iceberg test table on Apache Hive 3\n\nApache Hive 3 doesn't support the syntax of STORED BY ICEBERG STORED AS\nAVRO. When loading test data on Apache Hive 3, we convert this clause to\n STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'\n TBLPROPERTIES('write.format.default'='AVRO')\nHowever, when there is a LOCATION clause in the statement, the\nTBLPROPERTIES clause will be put before the LOCATION clause, which\ncauses the syntax error.\n\nIn the CreateTable statement, TBLPROPERTIES clause should be put after\nthe LOCATION clause. This patch fixes generate-schema-statements.py to\ntake care of this case.\n\nTests:\n - Verified the SQL files generated by generate-schema-statements.py\n\nChange-Id: I5b47d6dc1a2ab63d4ecea476dbab67c1ae8ca490\nReviewed-on: http://gerrit.cloudera.org:8080/21730\nReviewed-by: Impala Public Jenkins \nTested-by: Impala Public Jenkins ","shortMessageHtmlLink":"IMPALA-13336: Fix syntax error in creating Iceberg test table on Apac…"}},{"before":"58fd45f20c22f8a062f603d6b96f62ee57d85ca9","after":"e7376466c56c15101fc699428b6bf3a00de7332b","ref":"refs/heads/master","pushedAt":"2024-09-20T06:54:38.000Z","pushType":"push","commitsCount":5,"pusher":{"login":"asfgit","name":null,"path":"/asfgit","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1341245?s=80&v=4"},"commit":{"message":"IMPALA-13377: Excercise exec_option in test_recover_partitions.py\n\nBefore this patch, test_recover_partitions.py declare exec_option\ndimension, but none of the test function exercise it. 
2024-09-20 06:54 UTC | asfgit pushed 5 commits to master (head commit shown)

IMPALA-13377: Exercise exec_option in test_recover_partitions.py

Before this patch, test_recover_partitions.py declared the exec_option
dimension, but none of the test functions exercised it; the vector argument
was never used anywhere. This patch fixes the tests by setting the client
configuration from vector.get_value('exec_option').

Testing:
- Ran and passed test_recover_partitions.py in exhaustive mode.
- Confirmed in the coordinator log file that an equal number of queries ran
  with sync_ddl true and with sync_ddl false (95 each).

Change-Id: I4e938dd8667937c996854032a1e13184c62d7b48
Reviewed-on: http://gerrit.cloudera.org:8080/21796
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
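A sketch of the shape of the fix, assuming the Impala test framework's usual layout (the class, import path, and test body here are illustrative, not the patch's exact code):

```python
from tests.common.impala_test_suite import ImpalaTestSuite  # path assumed


class TestRecoverPartitions(ImpalaTestSuite):
    def test_recover_partitions(self, vector, unique_database):
        # Apply the vector's exec options (e.g. sync_ddl) before any query,
        # so the declared dimension actually changes test behavior.
        self.client.set_configuration(vector.get_value('exec_option'))
        self.client.execute(
            "ALTER TABLE {0}.t RECOVER PARTITIONS".format(unique_database))
```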
2024-09-16 23:08 UTC | asfgit pushed 1 commit to master

IMPALA-12876: Add catalogVersion and loaded timestamp in query profiles

When debugging stale metadata, it's helpful to know which catalog versions of
the tables were used and when catalogd loaded those versions. This patch
exposes that information in the query profile for each referenced table, e.g.

    Original Table Versions: tpch.customer, 2249, 1726052668932, Wed Sep 11 19:04:28 CST 2024
    tpch.nation, 2255, 1726052790140, Wed Sep 11 19:06:30 CST 2024
    tpch.orders, 2257, 1726052803258, Wed Sep 11 19:06:43 CST 2024
    tpch.lineitem, 2254, 1726052785384, Wed Sep 11 19:06:25 CST 2024
    tpch.supplier, 2256, 1726052794235, Wed Sep 11 19:06:34 CST 2024

Each line consists of the table name, the catalog version, the loaded
timestamp, and the timestamp string.

Implementation: the loaded timestamp is updated whenever a CatalogObject
updates its catalog version in catalogd. It's passed to impalads with the
TCatalogObject broadcast by the statestore, or in DDL/DML responses.
Currently, the loaded timestamp is maintained for tables, views, functions,
data sources, and HDFS cache pools in catalogd, but only those of tables and
views are used in impalad. The loaded timestamps of the other types can be
checked in the /catalog WebUI of catalogd.

Tests:
- Adds e2e test

Change-Id: I94b2fd59ed5aca664d6db4448c61ad21a88a4f98
Reviewed-on: http://gerrit.cloudera.org:8080/21782
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
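The profile entry has a fixed shape (table name, catalog version, epoch-millis load time, formatted time string), so a consumer can parse it mechanically. A minimal sketch, with the "Original Table Versions:" prefix handling assumed:

```python
def parse_table_versions(profile_text):
    """Parse 'table, version, millis, time string' lines into a dict."""
    versions = {}
    for line in profile_text.splitlines():
        line = line.replace("Original Table Versions:", "").strip()
        if not line:
            continue
        name, version, millis, time_str = [p.strip() for p in line.split(",", 3)]
        versions[name] = (int(version), int(millis), time_str)
    return versions

sample = ("Original Table Versions: tpch.customer, 2249, 1726052668932, "
          "Wed Sep 11 19:04:28 CST 2024")
assert parse_table_versions(sample)["tpch.customer"][0] == 2249
```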
2024-09-16 00:56 UTC | asfgit pushed 2 commits to master (head commit shown)

IMPALA-12594: Add flag to tune KrpcDataStreamSender mem estimate

The planner estimates memory usage for KrpcDataStreamSender very differently
from how the backend actually uses memory: the planner assumes batch_size
rows are sent at a time, while the BE tries to limit the buffer to
data_stream_sender_buffer_size_ (but doesn't consider var-len data). The
Jira has more detail about the differences and issues.

This change adds the flag data_stream_sender_buffer_size_used_by_planner.
If it is set to 16K (the data_stream_sender_buffer_size_ default), the
estimate behaves similarly to the BE.

Tested that this can improve both under- and overestimation (peak mem / mem
estimate of the first sender):

    select distinct * from tpch_parquet.lineitem limit 100000
    default: 284.04 KB / 2.75 MB
    --data_stream_sender_buffer_size_used_by_planner=16384: 282.04 KB / 283.39 KB

    select distinct l_comment from tpch_parquet.lineitem limit 100000
    default: 747.71 KB / 509.94 KB
    --data_stream_sender_buffer_size_used_by_planner=16384: 740.71 KB / 627.46 KB

The default is not changed, to avoid side effects. I would like to change it
once the BE's handling of var-len data is fixed, which is a prerequisite to
using mem reservation in KrpcDataStreamSender.

Change-Id: I1e4b1db030be934cece565e3f2634ee7cbdb7c4f
Reviewed-on: http://gerrit.cloudera.org:8080/21797
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins

2024-09-13 17:37 UTC | asfgit pushed 1 commit to master

IMPALA-13378: Verify tuple ids in descriptor table received on the executor side

It is suspected that KRPC can receive incomplete sidecar data (KUDU-3582).
TQueryCtx and TExecPlanFragmentInfo are both sent as sidecars, so executors
should check whether the tuple ids in TExecPlanFragmentInfo are consistent
with the descriptor table deserialized from TQueryCtx. This patch adds that
check when launching the fragment instance threads.

Tests:
- Ran CORE tests

Change-Id: I0c489d3bff7ae08813271c65086ea6b238420e47
Reviewed-on: http://gerrit.cloudera.org:8080/21794
Reviewed-by: Wenzhe Zhou
Reviewed-by: Michael Smith
Tested-by: Michael Smith
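A Python sketch of the consistency check's logic (the real check lives in the C++ backend; names here are illustrative): every tuple id the fragment references must resolve in the deserialized descriptor table.

```python
def verify_tuple_ids(fragment_tuple_ids, desc_tbl_tuple_ids):
    """Reject a fragment whose tuple ids are missing from the descriptor
    table, which would indicate truncated or corrupted sidecar data."""
    missing = set(fragment_tuple_ids) - set(desc_tbl_tuple_ids)
    if missing:
        raise ValueError(
            "tuple ids %s not found in descriptor table" % sorted(missing))

verify_tuple_ids([0, 1], [0, 1, 2])  # consistent: passes silently
```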
2024-09-13 12:40 UTC | asfgit pushed 1 commit to master

IMPALA-13182: Support uploading additional jars

This patch enables adding custom jars from the absolute path
/opt/impala/aux-jars to the CLASSPATH.

Steps:
1. Download the jars into the /opt/impala/aux-jars directory.
2. Restart the Impala cluster.

Testing:
* Tested manually: added jar files to /opt/impala/aux-jars before Impala
  start. After starting Impala, asserted that the new jars were appended to
  the value of CLASSPATH as printed in the impalad logs.

Change-Id: Ica5fa4c0cd1a5c938f331f3a4bba85d4910db90e
Reviewed-on: http://gerrit.cloudera.org:8080/21556
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
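A sketch of how the CLASSPATH extension could work; only the /opt/impala/aux-jars path comes from the commit, and the assembly logic below is illustrative (the real change is in the startup scripts).

```python
import glob
import os

AUX_JARS_DIR = "/opt/impala/aux-jars"

def extend_classpath(classpath):
    """Append every jar found in the aux-jars directory to the CLASSPATH."""
    jars = sorted(glob.glob(os.path.join(AUX_JARS_DIR, "*.jar")))
    return os.pathsep.join([classpath] + jars) if jars else classpath
```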
2024-09-12 11:03 UTC | asfgit pushed 2 commits to master (head commit shown)

IMPALA-13371: Avoid throwing exceptions in FindFileInPath()

Some boost::filesystem functions can throw exceptions, which led to a crash
without an informative error when FileSystemUtil::FindFileInPath() hit a
problematic path in JavaAddJammAgent(). Fixed by switching to the overloads
of those functions that cannot throw (other functions in FileSystemUtil
already use them).

Testing:
- Added tests with paths where Impala has no permissions (/root)

Change-Id: I6ed9f288ac5c400778a6b1215e16baf191bf5d0c
Reviewed-on: http://gerrit.cloudera.org:8080/21778
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins

2024-09-10 23:21 UTC | asfgit pushed 2 commits to master (head commit shown)

IMPALA-13222: Clean up .Trash and temp files at the end of S3 test runs

Remove the .Trash directory for HDFS, and temporary files left in /tmp and
in /other, from the S3 bucket used for an S3 test run. Deletion happens
using the AWS CLI after the minicluster is shut down.

Files are deleted only from selected prefixes (subdirectories) so that the
cleanup logic is safe to use for private buckets, or for the regular bucket
for private-s3-parameterized runs (impala-test-uswest2-3), where other files
may exist besides the ones generated for a test run.

Tested by running an S3 build and then checking the contents of the test
bucket.

Change-Id: I60a23394de8a67768a0b5b4c9c9576ee6a24348e
Reviewed-on: http://gerrit.cloudera.org:8080/21585
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
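A sketch of prefix-scoped deletion with the AWS CLI; the prefix list below is illustrative. Scoping deletions to known prefixes is what keeps the cleanup safe on shared buckets.

```python
import subprocess

def cleanup_test_bucket(bucket, prefixes=(".Trash", "tmp", "other")):
    """Delete only under known prefixes so unrelated bucket contents survive."""
    for prefix in prefixes:
        subprocess.check_call(
            ["aws", "s3", "rm", "--recursive", "s3://%s/%s/" % (bucket, prefix)])
```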
2024-09-09 17:53 UTC | asfgit pushed 1 commit to master

IMPALA-7086: Cache timezone in *_utc_timestamp()

Added a Prepare/Close routine around the from/to_utc standard functions.
This gives a consistent time improvement for constant timezones.

Given a sample table with 600M timestamp rows, in an all-default environment
the query below shows a stable 2-3 second improvement:

    SELECT count(*) FROM a_table
    WHERE from_utc_timestamp(ts, "a_timezone") > "a_date";

Averaged results for Release, SET MT_DOP=1, SET DISABLE_CODEGEN=TRUE:
from_utc: 16.53s -> 12.53s
to_utc: 14.02s -> 11.53s

Change-Id: Icdf5ff82c5d0554333aef1bc3bba034a4cf48230
Reviewed-on: http://gerrit.cloudera.org:8080/21735
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
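A Python analogue of the Prepare/Close pattern (the real code is a C++ builtin's prepare function; this sketch assumes the timezone argument can be detected as constant):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

class FromUtcTimestamp:
    """Resolve the timezone once when it is constant, not once per row."""
    def __init__(self):
        self.cached_tz = None

    def prepare(self, tz_name, tz_is_constant):
        if tz_is_constant:
            self.cached_tz = ZoneInfo(tz_name)  # one lookup per fragment

    def evaluate(self, utc_ts, tz_name):
        tz = self.cached_tz or ZoneInfo(tz_name)  # per-row fallback
        return utc_ts.replace(tzinfo=timezone.utc).astimezone(tz)

f = FromUtcTimestamp()
f.prepare("America/Los_Angeles", tz_is_constant=True)
print(f.evaluate(datetime(2024, 9, 9, 17, 53), "America/Los_Angeles"))
```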
2024-09-06 17:55 UTC | asfgit pushed 1 commit to master

IMPALA-11431: Avoid getting stats for synthetic column row__id from HMS

Before this change, Impala always tried to fetch stats for row__id in full
ACID tables, even though the column does not exist in the metastore.
Sometimes this led to an exception from HMS; see HIVE-28498 for details.
This caused flakiness in test_compute_stats_with_structs. The test also had
side effects (it computed stats for a shared table), so it was modified to
use unique_database.

Testing:
- Could reproduce the issue by starting HMS with
  hive.metastore.try.direct.sql=true and verified that the change fixes it

Change-Id: I759f57c99aa16e4ab5fd82aa5f6b756446291f03
Reviewed-on: http://gerrit.cloudera.org:8080/21742
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins

2024-09-06 13:34 UTC | asfgit pushed 1 commit to master

IMPALA-13347: Fixes TSAN Thread Leak of Workload Management Thread

The workload management processing runs in a separate thread declared in
impala-server.h. This thread runs until a graceful shutdown is initiated.
The last step of the Impala coordinator shutdown process is to drain the
completed queries queue to the query log table, ensuring completed queries
do not get lost.

This thread has to run to completion, but the coordinator shutdown process
never joined it. This patch adds the joining of that thread during the
coordinator shutdown process. If the workload management shutdown process
exceeds the allotted time, the thread is detached.

Info-level logging was added to indicate which completed-queries-queue drain
situation occurred: successful or timed out.

A new custom cluster test was added to cover the case where the completed
queries queue drain process times out.

Change-Id: I1e95967bb6e04470a8900c9ba69080eea8aaa25e
Reviewed-on: http://gerrit.cloudera.org:8080/21744
Reviewed-by: Riza Suminto
Reviewed-by: Michael Smith
Tested-by: Impala Public Jenkins
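A Python sketch of the shutdown ordering described above (threading details are illustrative; Python has no real thread detach, so the timed-out thread is simply left running):

```python
import logging

def join_workload_mgmt_thread(thread, timeout_s):
    """Wait up to timeout_s for the completed-queries queue to drain."""
    thread.join(timeout_s)
    if thread.is_alive():
        logging.info("queue drain exceeded %ss; leaving thread detached",
                     timeout_s)
    else:
        logging.info("completed queries queue drained before shutdown")
```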
2024-09-05 21:28 UTC | asfgit pushed 2 commits to master (head commit shown)

IMPALA-13349: Fix remaining tests with unexercised exec_option

This patch fixes the remaining tests that have an unexercised exec_option
dimension. Some test reorganization is done to clarify their test dimension
declarations. The WARNING log added by IMPALA-13323 is turned into
pytest.fail() with an error message suggesting how to fix the problem. Fixed
some flake8 warnings and errors as well.

Testing:
- Pass EE and custom cluster tests in exhaustive exploration.

Change-Id: I33bb4b6c4ff50b55a082460dd9944d2aa3511e11
Reviewed-on: http://gerrit.cloudera.org:8080/21743
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins

2024-09-05 19:38 UTC | asfgit pushed 1 commit to master

IMPALA-12732: Add support for MERGE statements for Iceberg tables

MERGE is a DML statement that allows users to perform conditional insert,
update, or delete operations on a target table based on the results of a
join with a source table. This change adds MERGE statement parsing and
Iceberg-specific semantic analysis, planning, and execution. The parsing
grammar follows the SQL standard; it accepts the same syntax as Hive, Spark,
and Trino by supporting an arbitrary number of WHEN clauses, with or without
conditions, and accepting inline views as sources.

Example:

    MERGE INTO target t USING source s ON t.id = s.id
    WHEN MATCHED AND t.id < 100 THEN UPDATE SET column1 = s.column1
    WHEN MATCHED AND t.id > 100 THEN DELETE
    WHEN MATCHED THEN UPDATE SET column1 = "value"
    WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.column1);

The Iceberg-specific analysis, planning, and execution build on a concept
previously used for UPDATE: the analyzer creates a SELECT statement with all
target and source columns (including Iceberg's virtual columns) and a
'row_present' column that records whether the source row, the target row, or
both are present in the result set after joining the two table references by
the ON clause. The join condition must be an equi-join, as it is a FULL
OUTER JOIN and Impala currently supports only equi-joins in this case. The
join order is forced by a query hint, which guarantees that the target table
is always on the left side.

A new IcebergMergeNode is added at the planning phase; this node does the
row-level filtering for the MATCHED / NOT MATCHED cases. The 'row_present'
column decides which case group is evaluated: if both sides are available,
the matched cases; if only the source side matches, the not-matched cases
and their filter expressions are evaluated over the row. If one of the cases
matches, execution evaluates the result expressions into the output row
batch, and an auxiliary tuple stores the merge action. The merge action is a
flag for the newly added IcebergMergeSink; this sink routes each incoming
row from IcebergMergeNode to its respective destination. Each row can go to
the delete sink, the insert sink, or both.

Target-side duplicate records are filtered during IcebergMergeNode's
execution; if a target-side duplicate is detected, the whole statement's
execution is stopped and the error is reported back to the user.

Added tests:
- Parser tests
- Analyzer tests
- Unit test for WHEN NOT MATCHED INSERT column collation
- Planner tests for partitioned/sorted cases
- Authorization tests
- E2E tests

Change-Id: I3416a79740eddc446c87f72bf1a85ed3f71af268
Reviewed-on: http://gerrit.cloudera.org:8080/21423
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
2024-09-04 22:31 UTC | asfgit pushed 1 commit to master

IMPALA-13350: Fix Workload Management flush_on_interval Test

The test 'flush_on_interval' in test_query_log.py was failing because it did
not allow enough time for the workload management query processing loop to
execute before checking the number of queries written to the query log
table. The fix is to allow more time for the processing loop to execute.

Change-Id: I2fb1034ca63e170d5e57a6ece9b47da5dafebff4
Reviewed-on: http://gerrit.cloudera.org:8080/21750
Reviewed-by: Riza Suminto
Tested-by: Impala Public Jenkins

2024-09-04 22:00 UTC | asfgit pushed 1 commit to master

IMPALA-13344: Analyze new rewrite exprs

Ensures all new expressions created during ExprRewriteRule evaluation are
analyzed so later rules will evaluate them. Centralizes the analysis and
logging of new expressions created by rewrite rules.

Updates tests:
- One IMPALA-8030 test is fixed because it now applies
  ExtractCommonConjunctRule after NormalizeBinaryPredicatesRule.
- Several TPC-DS tests have slight changes in how constant arithmetic
  expression results are stored; FoldConstantsRule retains the original type
  of the ArithmeticExpr, which may be a larger type (such as SMALLINT
  instead of TINYINT) than the resulting NumericLiteral needs.

Change-Id: I6be731c2ea79c96e51d199c822e2cb34e5bb3028
Reviewed-on: http://gerrit.cloudera.org:8080/21679
Reviewed-by: Impala Public Jenkins
Tested-by: Michael Smith

2024-09-04 19:02 UTC | asfgit pushed 2 commits to master (head commit shown)

IMPALA-12165: Add option for split debug information (-gsplit-dwarf)

This adds the IMPALA_SPLIT_DEBUG_INFO environment variable, which controls
whether the build uses -gsplit-dwarf. This option puts debug information in
a separate .dwo file for each C++ file. Executables contain metadata
pointing to those .dwo files and don't need to include the debug information
themselves, which reduces link time and disk-space usage. The default for
IMPALA_SPLIT_DEBUG_INFO is off, as this is intended to be an opt-in option
for developers.

For a debug build with compressed debug information, it cuts disk-space
usage roughly in half (measuring "du -sh be"):

    Without backend tests: regular 5.6GB, split debuginfo 2.7GB
    With backend tests: regular 22GB, split debuginfo 12GB

This only works for the debug information from Impala itself; the debug
information from toolchain dependencies is still included in each
executable. Split debug information has been around for a long time, so
tools like GDB work, and resolving minidumps works properly.

Testing:
- Ran builds locally (with GCC and Clang)
- Attached to Impala with GDB and verified that symbols worked
- Resolved a minidump and checked the output

Change-Id: I3bbe700279a5dc3fde338fdfa0ea355d3570c9d0
Reviewed-on: http://gerrit.cloudera.org:8080/21720
Reviewed-by: Jason Fehr
Reviewed-by: Michael Smith
Tested-by: Michael Smith

2024-09-03 22:20 UTC | asfgit pushed 2 commits to master (head commit shown)

IMPALA-12363: Upgrade RE2 to 2023-03-01

This bumps the version of RE2 to 2023-03-01, the last release that doesn't
have an Abseil dependency. The toolchain already contains a build of RE2
2023-03-01, so there is no need to bump the toolchain version.

This has a performance benefit for TPC-H's Q13, which uses the predicate
"o_comment not like '%special%requests%'". This LIKE predicate is
complicated enough that it doesn't fit the heavily optimized paths that
exist for simpler LIKEs; instead, it gets converted to an RE2 regex. The
newer RE2 significantly improves the performance of that predicate, and
TPC-H Q13 gets ~9% faster.

Testing:
- Ran a core job
- Ran a perf-AB-test

Change-Id: Ic7f131102bd7590d222f22dcc412d9fd2286f006
Reviewed-on: http://gerrit.cloudera.org:8080/21712
Reviewed-by: Michael Smith
Tested-by: Impala Public Jenkins
Reviewed-by: Yida Wu
2024-09-03 14:37 UTC | asfgit pushed 2 commits to master (head commit shown)

IMPALA-13330: Fix orc_schema_resolution in test_nested_types.py

test_nested_types.py declares the 'orc_schema_resolution' dimension but does
not actually exercise it: none of the tests actively insert the
'orc_schema_resolution' dimension value into the 'exec_option' values
returned by the test vector.

This patch fixes the issue by declaring the 'orc_schema_resolution' option
via the helper function add_exec_option_dimension(), which automatically
inserts it into the 'exec_option' dimension. Test classes are also
reorganized to reduce test skipping and deep-copying.

Notable changes:
- Use 'unique_database' in test_struct_in_select_list to avoid collisions
  during view creation.
- Drop the unused 'unique_database' fixture in
  TestNestedCollectionsInSelectList.
- test_map_null_keys no longer has the 'mt_dop' dimension, since it only
  tests how NULL map keys are displayed.
- Created a common base class TestParquetArrayEncodingsBase for
  TestParquetArrayEncodings and TestParquetArrayEncodingsAmbiguous. The
  latter no longer runs with 'parquet_array_resolution', since that query
  option is set directly within the parquet-ambiguous-list-modern.test and
  parquet-ambiguous-list-legacy.test files.
- Make ImpalaTestMatrix.add_dimensions() call
  ImpalaTestMatrix.clear_dimension() if the given dimension name is
  'exec_option' and independent_exec_option_names is not empty.

The reduction in test count is as follows:
Before patch: 168 core tests, 571 exhaustive tests
After patch: 161 core tests, 529 exhaustive tests

Testing:
- Ran and passed test_nested_types.py in exhaustive exploration.
- Verified that no WARNING log is printed by
  ImpalaTestSuite.validate_exec_option_dimension()

Change-Id: Ib958cd34a56c949190b4f22e5da5dad2c0de25ff
Reviewed-on: http://gerrit.cloudera.org:8080/21726
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
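A toy illustration of the folding that add_exec_option_dimension() performs (the real helper lives in Impala's test framework; this stand-in only shows the effect of merging an option into every generated exec_option dict):

```python
def fold_option(exec_option_dicts, name, values):
    """Merge an independent option into each exec_option dict so that
    vector.get_value('exec_option') really carries it."""
    return [dict(opts, **{name: v}) for opts in exec_option_dicts for v in values]

base = [{'batch_size': 0}, {'batch_size': 1}]
folded = fold_option(base, 'orc_schema_resolution', [0, 1])
assert len(folded) == 4 and folded[0]['orc_schema_resolution'] == 0
```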
2024-08-29 20:22 UTC | asfgit pushed 1 commit to master

IMPALA-13254: Optimize REFRESH for Iceberg tables

Since Iceberg's ContentFiles are a collection of immutable files, the
current code logic has been simplified. The optimized process is as follows:

1. For existing ContentFiles, directly reuse the existing file descriptors.
2. For newly added ContentFiles that do not support block locations,
   directly create file descriptors.
3. For newly added ContentFiles that support block locations, choose between
   a listLocatedStatus operation or calling getFileBlockLocations one by
   one, based on the number of files.

A simple performance comparison test was conducted in a single-node
environment, using the following tables:
- non_partitioned_table: no partitions, containing 10,000 files
- partitioned_table_1: 10,000 partitions, each with 1 file
- partitioned_table_2: 300 partitions, each with 300 files

and the following scenarios:
- FULL: perform REFRESH after executing INVALIDATE METADATA
- ADD_1_FILES: insert 1 file using Hive, then perform REFRESH
- ADD_101_FILES: insert 101 files using Hive, then perform REFRESH

The test results of the new version:

    +-----------------------+-----------+-------------+---------------+
    | Table                 | FULL      | ADD_1_FILES | ADD_101_FILES |
    +-----------------------+-----------+-------------+---------------+
    | non_partitioned_table | 356.389ms | 40.015ms    | 302.435ms     |
    | partitioned_table_1   | 288.798ms | 26.667ms    | 33.035ms      |
    | partitioned_table_2   | 1s436ms   | 237.057ms   | 225.749ms     |
    +-----------------------+-----------+-------------+---------------+

The test results of the old version:

    +-----------------------+-----------+-------------+---------------+
    | Table                 | FULL      | ADD_1_FILES | ADD_101_FILES |
    +-----------------------+-----------+-------------+---------------+
    | non_partitioned_table | 338ms     | 57.156ms    | 12s903ms      |
    | partitioned_table_1   | 281ms     | 40.525ms    | 12s743ms      |
    | partitioned_table_2   | 1s397ms   | 336.965ms   | 1m57s         |
    +-----------------------+-----------+-------------+---------------+

When the number of newly added files exceeds
iceberg_reload_new_files_threshold, REFRESH performance improves
significantly, while the other scenarios show no noticeable change.

Change-Id: I8c99a28eb16275efdff52e0ea2711c0c6036719
Reviewed-on: http://gerrit.cloudera.org:8080/21608
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
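A sketch of the three-way decision described in the list above (a sketch under assumed names; the helper callables stand in for the HDFS and Iceberg calls in the real Java frontend):

```python
def reload_fds(content_files, cached_fds, supports_blocks, threshold,
               create_fd, bulk_list, per_file_locate):
    """Return a file-descriptor map, reusing descriptors for immutable files."""
    fds, new_files = {}, []
    for f in content_files:
        if f in cached_fds:
            fds[f] = cached_fds[f]                              # case 1: reuse
        else:
            new_files.append(f)
    if not supports_blocks:
        fds.update((f, create_fd(f)) for f in new_files)        # case 2
    elif len(new_files) > threshold:
        fds.update(bulk_list(new_files))                        # case 3a: bulk listing
    else:
        fds.update((f, per_file_locate(f)) for f in new_files)  # case 3b: per-file RPC
    return fds
```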
2024-08-29 08:46 UTC | asfgit pushed 1 commit to master

IMPALA-13328: Fix missing krb5-config in building the impala_quickstart_client docker image

Building the impala_quickstart_client docker image failed because
krb5-config was not found. It is installed by the libkrb5-dev package, which
this patch adds to fix the build failure. Also improves
docker/publish_images_to_apache.sh to skip nonexistent images (usually ones
that were not built), and updates the quickstart_hms image to be based on
Ubuntu 18.04.

Also fixes an issue where docker/CMakeLists.txt didn't dump all the image
names to docker/docker-images.txt.

Tests:
- Verified the quickstart images on macOS.

Change-Id: Ieaa9878fa9cd9902ac883866c82e224889940615
Reviewed-on: http://gerrit.cloudera.org:8080/21725
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins

2024-08-28 22:19 UTC | asfgit pushed 1 commit to master

IMPALA-12737: Refactor the Workload Management Initialization Process

The workload management initialization process creates the two tables
"sys.impala_query_log" and "sys.impala_query_live" during coordinator
startup. The current design creates both tables on each coordinator at every
startup by running CREATE DATABASE and CREATE TABLE IF NOT EXISTS DDLs. This
causes unnecessary DDLs to execute, which delays coordinator startup and
introduces the potential for unnecessary startup failures should the DDLs
fail.

This patch splits the initialization code into its own file and adds version
tracking to the individual fields in the workload management tables. It also
adds schema version checks on the workload management tables and only runs
DDLs for the tables if necessary.

Additionally, versioning of workload management table schemas is introduced.
The only allowed schema version in this patch is 1.0.0. Future patches that
need to modify the workload management table schema will expand this list of
allowed versions.

Since this patch is a refactor and does not change functionality, testing
was accomplished by running the existing workload management unit and Python
tests.

Change-Id: Id645f94c8da73b91c13a23d7ac0ea026425f0f96
Reviewed-on: http://gerrit.cloudera.org:8080/21653
Reviewed-by: Riza Suminto
Reviewed-by: Michael Smith
Tested-by: Impala Public Jenkins
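A sketch of version-gated DDL per the description above. The table names and the 1.0.0 version come from the commit; the helper callables are hypothetical, and "(...)" stands in for the column list.

```python
EXPECTED_SCHEMA_VERSION = "1.0.0"
WM_TABLES = ("sys.impala_query_log", "sys.impala_query_live")

def ensure_workload_mgmt_schema(get_version, run_ddl):
    """Run DDL only when a table is absent or its schema version is stale."""
    for table in WM_TABLES:
        version = get_version(table)  # None when the table doesn't exist
        if version is None:
            run_ddl("CREATE TABLE IF NOT EXISTS %s (...)" % table)
        elif version != EXPECTED_SCHEMA_VERSION:
            run_ddl("/* upgrade %s to schema %s */"
                    % (table, EXPECTED_SCHEMA_VERSION))
        # Otherwise the schema is current and no DDL runs at startup.
```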
2024-08-28 18:25 UTC | asfgit pushed 1 commit to master

IMPALA-12867: Filter files to OPTIMIZE based on file size

The OPTIMIZE TABLE statement currently rewrites the entire Iceberg table.
With the 'FILE_SIZE_THRESHOLD_MB' option, the user can specify a file size
limit to rewrite only small files.

Syntax: OPTIMIZE TABLE <table_name> [(FILE_SIZE_THRESHOLD_MB=<value>)];

The threshold value is the file size in MBs and must be a non-negative
integer. Data files larger than the given limit are only rewritten if they
are referenced from delete files. If only one file is selected in a
partition, it is not rewritten. If the threshold is 0, only the delete files
and the referenced data files are rewritten.

IMPALA-12839 ('Optimizing empty table should be no-op') is also resolved in
this patch.

With the file selection option, the OPTIMIZE operation can run in three
different modes:
- REWRITE_ALL: the entire table is rewritten, either because the compaction
  was triggered by a simple 'OPTIMIZE TABLE' command without a specified
  'FILE_SIZE_THRESHOLD_MB' parameter, or because all files of the table are
  deletes/referenced by deletes or are smaller than the limit.
- PARTIAL: if the 'FILE_SIZE_THRESHOLD_MB' parameter is specified, only the
  small data files without deletes are selected and the delete files are
  merged. Large data files without deletes are kept, to avoid unnecessarily
  resource-consuming writes.
- NOOP: when no files qualify for the selection criteria, there is no need
  to rewrite any files; this is a no-operation.

Testing:
- Parser test
- FE unit tests
- E2E tests

Change-Id: Icfbb589513aacdb68a86c1aec4a0d39b12091820
Reviewed-on: http://gerrit.cloudera.org:8080/21388
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
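A sketch of the mode selection described in the list above (field names are illustrative, and the single-file-per-partition rule is omitted for brevity):

```python
def select_optimize_mode(files, threshold_mb=None):
    """Pick REWRITE_ALL, PARTIAL, or NOOP per the selection rules above."""
    if threshold_mb is None:  # plain OPTIMIZE TABLE
        return "REWRITE_ALL", files
    selected = [f for f in files
                if f["is_delete"] or f["referenced_by_deletes"]
                or f["size_mb"] < threshold_mb]
    if not selected:
        return "NOOP", []
    if len(selected) == len(files):
        return "REWRITE_ALL", files
    return "PARTIAL", selected

files = [{"is_delete": False, "referenced_by_deletes": False, "size_mb": 512},
         {"is_delete": False, "referenced_by_deletes": False, "size_mb": 4}]
assert select_optimize_mode(files, threshold_mb=64)[0] == "PARTIAL"
```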
2024-08-28 14:00 UTC | asfgit pushed 3 commits to master (head commit shown)

IMPALA-13311: Hive3 INSERT failed by ClassNotFoundException: org.apache.tez.runtime.api.Event

Corrects TEZ_HOME when using Apache Tez; uses Apache Tez 0.10.2 and makes
Hive compatible with it.

Change-Id: Ia278a87f92fedb96ec20608b5872facc55ae0a3c
Reviewed-on: http://gerrit.cloudera.org:8080/21706
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins

2024-08-27 22:20 UTC | stiga-huang (Quanlong Huang) deleted tag 4.4.1-RC1

2024-08-27 01:03 UTC | asfgit pushed 2 commits to master (head commit shown)

IMPALA-13082: Use separate versions for jackson vs jackson-databind

Sometimes there is a jackson-databind patch release without a corresponding
release of the other jackson libraries. For example, there is a
jackson-databind 2.12.7.1, but jackson-core does not have an artifact with
that version. To handle these scenarios, it is useful to have a separate
version for jackson-databind vs the other jackson libraries.

This introduces IMPALA_JACKSON_VERSION (which currently matches
IMPALA_JACKSON_DATABIND_VERSION) and uses it for the non-databind jackson
libraries.

Testing:
- Ran a local build

Change-Id: I3055cb47986581793d947eaedb6a24b4dd92e3a6
Reviewed-on: http://gerrit.cloudera.org:8080/21719
Tested-by: Impala Public Jenkins
Reviewed-by: Michael Smith

2024-08-26 18:45 UTC | asfgit pushed 1 commit to master

IMPALA-13317: Enhance tpc_sort_key for wider name support

Currently, the tpc_sort_key function is used for sorting TPCH or TPCDS files
while running the TPCH or TPCDS tests, and is now only used by
test_tuple_cache_tpc_queries. It is designed to handle filenames in formats
like "tpch-qx-y", "tpch-qx", or "tpch-qxX"; however, it doesn't support
filenames in the "tpch-qx-yY" format, and attempting to sort such files
results in an error.

This patch improves the robustness of the tpc_sort_key function by adding
more checks to prevent errors and extends support to filenames in the
"tpch-qxX-yY" format.

Tests: reran and passed tests with filenames in the "tpch-qxX-yY" format.
Since no tests seem to exist for the test util functions, the function was
tested locally with the following unit cases, all passing:

    test_cases = {
        'tpcds-q1': (1, 0, '', ''),
        'tpcds-q1X': (1, 0, 'X', ''),
        'tpcds-q1-2Y': (1, 2, '', 'Y'),
        'tpcds-q1X-2Y': (1, 2, 'X', 'Y'),
        'tpcds-q2-3': (2, 3, '', ''),
        'tpcds-q10': (10, 0, '', ''),
        'tpcds-q10-20': (10, 20, '', ''),
        'tpcds-q10a-20': (10, 20, 'a', ''),
        'tpcds-q10-20b': (10, 20, '', 'b'),
        'tpcds-q10a-20b': (10, 20, 'a', 'b'),
        'tpcds-q0': (0, 0, '', ''),
        'tpcds-': (0, 0, '', ''),
        'tpcds--': (0, 0, '', ''),
        'tpcds-xx-xx': (0, 0, '', ''),
        'tpcds-x1-x1': (0, 0, '', ''),
        'tpcds-x1-x': (0, 0, '', ''),
        'tpcds-x-x1': (0, 0, '', ''),
        'tpcds': (0, 0, '', ''),
    }
    for input_str, expected in test_cases.items():
        result = tpc_sort_key(input_str)
        assert result == expected

Change-Id: Ib238ff09d5a2278c593f2759cf35f136b0ff1344
Reviewed-on: http://gerrit.cloudera.org:8080/21708
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
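The unit cases above pin down the parsing contract precisely, so a regex-based reading of it can be written directly. This is a sketch satisfying those cases, not the patch's actual implementation:

```python
import re

# "tpch-qxX-yY" -> (x, y, X, Y); anything unparseable -> (0, 0, '', '').
_TPC_RE = re.compile(r"^tpc(?:h|ds)-q(\d+)([A-Za-z]*)(?:-(\d+)([A-Za-z]*))?$")

def tpc_sort_key(name):
    m = _TPC_RE.match(name)
    if not m:
        return (0, 0, '', '')
    x, suffix_x, y, suffix_y = m.groups()
    return (int(x), int(y) if y else 0, suffix_x, suffix_y or '')

assert tpc_sort_key('tpcds-q10a-20b') == (10, 20, 'a', 'b')
assert tpc_sort_key('tpcds-xx-xx') == (0, 0, '', '')
```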
2024-08-23 07:27 UTC | asfgit pushed 2 commits to master (head commit shown)

IMPALA-13303: FileSystemUtil.listFiles() should handle non-recursive case

FileSystemUtil.listFiles() is used in FileMetadataLoader#loadInternal() to
list files with block locations. When the table property
"impala.disable.recursive.listing" is set to true, it is supposed to skip
files in subdirectories. However, for FileSystems that don't support
recursive listFiles(), we always created a RecursingIterator and didn't
respect the 'recursive' argument.

This patch fixes the issue by adding a check on the 'recursive' argument and
using the non-recursive iterator when it is false.

Tests:
- Add test in test_recursive_listing.py to reveal the issue

Change-Id: Ia930e6071963d53561ce79896bff9d19720468a4
Reviewed-on: http://gerrit.cloudera.org:8080/21680
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins

2024-08-22 19:18 UTC | asfgit pushed 1 commit to master

IMPALA-13291: Filter dmesg messages by date

At the end of a test run, one of the things finalize.sh does is look for
interesting messages in the output of dmesg. Recently this reported false
positives, because the dmesg output covers the history since the last
machine reboot.

This adds an optional parameter to finalize.sh giving the start time of the
test run in the format "2012-10-30 18:17:16". The parameter is optional
until all callers have been updated, some of which may be in different git
repositories.

It also switches to journalctl to fetch the dmesg output, which allows using
the --since option to filter the messages starting at the given timestamp.
With this, we should no longer see false positives from earlier test runs on
the same machine.

Change-Id: I7ac9c16dfe1c60f04e117dd634609f03faa3c3dc
Reviewed-on: http://gerrit.cloudera.org:8080/21705
Reviewed-by: Michael Smith
Tested-by: Impala Public Jenkins
Reviewed-by: Joe McDonnell

2024-08-22 18:47 UTC | asfgit pushed 2 commits to master (head commit shown)

IMPALA-13274: Filter out illegal output for certain join nodes

Filter out illegal output for certain join nodes: those with join operators
LEFT_ANTI_JOIN, LEFT_SEMI_JOIN, NULL_AWARE_LEFT_ANTI_JOIN, and
ICEBERG_DELETE_JOIN. For these join nodes, we only retain the tuple ids of
the outer side while computing tuple ids. If illegal output from these join
nodes is referenced by the parent node, it may crash the backend due to a
missing tuple id.

Tests:
- Add e2e test

Change-Id: I50b82d85737025df2fdd9e7ab0fca2385e642415
Reviewed-on: http://gerrit.cloudera.org:8080/21671
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins

2024-08-22 00:53 UTC | asfgit pushed 1 commit to master

IMPALA-13262: Do not always migrate inferred predicates into inline view

This patch removes a predicate inferred from a set of analytic predicates if
both sides of the inferred predicate reference the same TupleId when
migrating those analytic predicates into an inline view. This prevents
Impala from pushing the inferred conjunct down to the scan node before the
analytic functions are applied, which could produce an incorrect result.

Testing:
- Added additional query and planner test cases to verify Impala's behavior
  after this patch.
- Verified the patch passed the core tests.

Change-Id: I6e2632b3b1a140ae0104ceba4e2f474ac1bbcda1
Reviewed-on: http://gerrit.cloudera.org:8080/21688
Reviewed-by: Michael Smith
Reviewed-by: Riza Suminto
Tested-by: Impala Public Jenkins

(Older activity continues beyond this page.)