[SPARK-47018][BUILD][SQL][HIVE] Bump built-in Hive to 2.3.10
pan3793 committed May 8, 2024
1 parent f3d9b81 commit f976ea2
Showing 18 changed files with 63 additions and 75 deletions.
5 changes: 0 additions & 5 deletions connector/kafka-0-10-assembly/pom.xml
@@ -54,11 +54,6 @@
       <artifactId>commons-codec</artifactId>
       <scope>provided</scope>
     </dependency>
-    <dependency>
-      <groupId>commons-lang</groupId>
-      <artifactId>commons-lang</artifactId>
-      <scope>provided</scope>
-    </dependency>
     <dependency>
       <groupId>com.google.protobuf</groupId>
       <artifactId>protobuf-java</artifactId>
5 changes: 0 additions & 5 deletions connector/kinesis-asl-assembly/pom.xml
@@ -54,11 +54,6 @@
       <artifactId>jackson-databind</artifactId>
       <scope>provided</scope>
     </dependency>
-    <dependency>
-      <groupId>commons-lang</groupId>
-      <artifactId>commons-lang</artifactId>
-      <scope>provided</scope>
-    </dependency>
     <dependency>
       <groupId>org.glassfish.jersey.core</groupId>
       <artifactId>jersey-client</artifactId>
27 changes: 13 additions & 14 deletions dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -46,7 +46,6 @@ commons-compress/1.26.1//commons-compress-1.26.1.jar
 commons-crypto/1.1.0//commons-crypto-1.1.0.jar
 commons-dbcp/1.4//commons-dbcp-1.4.jar
 commons-io/2.16.1//commons-io-2.16.1.jar
-commons-lang/2.6//commons-lang-2.6.jar
 commons-lang3/3.14.0//commons-lang3-3.14.0.jar
 commons-math3/3.6.1//commons-math3-3.6.1.jar
 commons-pool/1.5.4//commons-pool-1.5.4.jar
@@ -81,19 +80,19 @@ hadoop-cloud-storage/3.4.0//hadoop-cloud-storage-3.4.0.jar
 hadoop-huaweicloud/3.4.0//hadoop-huaweicloud-3.4.0.jar
 hadoop-shaded-guava/1.2.0//hadoop-shaded-guava-1.2.0.jar
 hadoop-yarn-server-web-proxy/3.4.0//hadoop-yarn-server-web-proxy-3.4.0.jar
-hive-beeline/2.3.9//hive-beeline-2.3.9.jar
-hive-cli/2.3.9//hive-cli-2.3.9.jar
-hive-common/2.3.9//hive-common-2.3.9.jar
-hive-exec/2.3.9/core/hive-exec-2.3.9-core.jar
-hive-jdbc/2.3.9//hive-jdbc-2.3.9.jar
-hive-llap-common/2.3.9//hive-llap-common-2.3.9.jar
-hive-metastore/2.3.9//hive-metastore-2.3.9.jar
-hive-serde/2.3.9//hive-serde-2.3.9.jar
+hive-beeline/2.3.10//hive-beeline-2.3.10.jar
+hive-cli/2.3.10//hive-cli-2.3.10.jar
+hive-common/2.3.10//hive-common-2.3.10.jar
+hive-exec/2.3.10/core/hive-exec-2.3.10-core.jar
+hive-jdbc/2.3.10//hive-jdbc-2.3.10.jar
+hive-llap-common/2.3.10//hive-llap-common-2.3.10.jar
+hive-metastore/2.3.10//hive-metastore-2.3.10.jar
+hive-serde/2.3.10//hive-serde-2.3.10.jar
 hive-service-rpc/4.0.0//hive-service-rpc-4.0.0.jar
-hive-shims-0.23/2.3.9//hive-shims-0.23-2.3.9.jar
-hive-shims-common/2.3.9//hive-shims-common-2.3.9.jar
-hive-shims-scheduler/2.3.9//hive-shims-scheduler-2.3.9.jar
-hive-shims/2.3.9//hive-shims-2.3.9.jar
+hive-shims-0.23/2.3.10//hive-shims-0.23-2.3.10.jar
+hive-shims-common/2.3.10//hive-shims-common-2.3.10.jar
+hive-shims-scheduler/2.3.10//hive-shims-scheduler-2.3.10.jar
+hive-shims/2.3.10//hive-shims-2.3.10.jar
 hive-storage-api/2.8.1//hive-storage-api-2.8.1.jar
 hk2-api/3.0.3//hk2-api-3.0.3.jar
 hk2-locator/3.0.3//hk2-locator-3.0.3.jar
@@ -184,7 +183,7 @@ kubernetes-model-storageclass/6.12.1//kubernetes-model-storageclass-6.12.1.jar
 lapack/3.0.3//lapack-3.0.3.jar
 leveldbjni-all/1.8//leveldbjni-all-1.8.jar
 libfb303/0.9.3//libfb303-0.9.3.jar
-libthrift/0.12.0//libthrift-0.12.0.jar
+libthrift/0.16.0//libthrift-0.16.0.jar
 log4j-1.2-api/2.22.1//log4j-1.2-api-2.22.1.jar
 log4j-api/2.22.1//log4j-api-2.22.1.jar
 log4j-core/2.22.1//log4j-core-2.22.1.jar
4 changes: 2 additions & 2 deletions docs/building-spark.md
@@ -85,9 +85,9 @@ Example:

 To enable Hive integration for Spark SQL along with its JDBC server and CLI,
 add the `-Phive` and `-Phive-thriftserver` profiles to your existing build options.
-By default Spark will build with Hive 2.3.9.
+By default Spark will build with Hive 2.3.10.

-    # With Hive 2.3.9 support
+    # With Hive 2.3.10 support
     ./build/mvn -Pyarn -Phive -Phive-thriftserver -DskipTests clean package

 ## Packaging without Hadoop Dependencies for YARN
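Once a distribution is built with `-Phive` as above, an application picks up the bundled Hive 2.3.10 support through the ordinary SparkSession API. A minimal sketch, assuming such a build is on the classpath (the app name and warehouse path below are illustrative, not from this commit):

import org.apache.spark.sql.SparkSession;

public final class HiveEnabledApp {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("hive-enabled-app")                               // hypothetical name
        .config("spark.sql.warehouse.dir", "/tmp/spark-warehouse") // hypothetical path
        .enableHiveSupport() // requires a distribution built with -Phive
        .getOrCreate();
    spark.sql("SHOW DATABASES").show();
    spark.stop();
  }
}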
8 changes: 4 additions & 4 deletions docs/sql-data-sources-hive-tables.md
@@ -127,10 +127,10 @@ The following options can be used to configure the version of Hive that is used
 <thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Since Version</th></tr></thead>
 <tr>
   <td><code>spark.sql.hive.metastore.version</code></td>
-  <td><code>2.3.9</code></td>
+  <td><code>2.3.10</code></td>
   <td>
     Version of the Hive metastore. Available
-    options are <code>2.0.0</code> through <code>2.3.9</code> and <code>3.0.0</code> through <code>3.1.3</code>.
+    options are <code>2.0.0</code> through <code>2.3.10</code> and <code>3.0.0</code> through <code>3.1.3</code>.
   </td>
   <td>1.4.0</td>
 </tr>
@@ -142,9 +142,9 @@ The following options can be used to configure the version of Hive that is used
     property can be one of four options:
     <ol>
       <li><code>builtin</code></li>
-      Use Hive 2.3.9, which is bundled with the Spark assembly when <code>-Phive</code> is
+      Use Hive 2.3.10, which is bundled with the Spark assembly when <code>-Phive</code> is
       enabled. When this option is chosen, <code>spark.sql.hive.metastore.version</code> must be
-      either <code>2.3.9</code> or not defined.
+      either <code>2.3.10</code> or not defined.
       <li><code>maven</code></li>
       Use Hive jars of specified version downloaded from Maven repositories. This configuration
       is not generally recommended for production deployments.
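As a sketch of how the two properties documented above combine, the following illustrative session asks for Hive 2.3.10 metastore jars resolved from Maven instead of the builtin ones (class and property names are the real Spark API; the app name is made up):

import org.apache.spark.sql.SparkSession;

public final class MetastoreVersionExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("metastore-example") // hypothetical name
        .config("spark.sql.hive.metastore.version", "2.3.10")
        .config("spark.sql.hive.metastore.jars", "maven") // default is "builtin"
        .enableHiveSupport()
        .getOrCreate();
    spark.sql("SHOW TABLES").show();
    spark.stop();
  }
}

With `builtin`, `spark.sql.hive.metastore.version` must stay at 2.3.10 or be left unset, as the table above notes.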
2 changes: 1 addition & 1 deletion docs/sql-migration-guide.md
@@ -1067,7 +1067,7 @@ Python UDF registration is unchanged.
 Spark SQL is designed to be compatible with the Hive Metastore, SerDes and UDFs.
 Currently, Hive SerDes and UDFs are based on built-in Hive,
 and Spark SQL can be connected to different versions of Hive Metastore
-(from 0.12.0 to 2.3.9 and 3.0.0 to 3.1.3. Also see [Interacting with Different Versions of Hive Metastore](sql-data-sources-hive-tables.html#interacting-with-different-versions-of-hive-metastore)).
+(from 2.0.0 to 2.3.10 and 3.0.0 to 3.1.3. Also see [Interacting with Different Versions of Hive Metastore](sql-data-sources-hive-tables.html#interacting-with-different-versions-of-hive-metastore)).

 #### Deploying in Existing Hive Warehouses
 {:.no_toc}
31 changes: 13 additions & 18 deletions pom.xml
@@ -132,8 +132,8 @@
     <hive.group>org.apache.hive</hive.group>
     <hive.classifier>core</hive.classifier>
     <!-- Version used in Maven Hive dependency -->
-    <hive.version>2.3.9</hive.version>
-    <hive23.version>2.3.9</hive23.version>
+    <hive.version>2.3.10</hive.version>
+    <hive23.version>2.3.10</hive23.version>
     <!-- Version used for internal directory structure -->
     <hive.version.short>2.3</hive.version.short>
     <!-- note that this should be compatible with Kafka brokers version 0.10 and up -->
@@ -192,8 +192,6 @@
     <commons-codec.version>1.17.0</commons-codec.version>
     <commons-compress.version>1.26.1</commons-compress.version>
     <commons-io.version>2.16.1</commons-io.version>
-    <!-- org.apache.commons/commons-lang/-->
-    <commons-lang2.version>2.6</commons-lang2.version>
     <!-- org.apache.commons/commons-lang3/-->
     <commons-lang3.version>3.14.0</commons-lang3.version>
     <!-- org.apache.commons/commons-pool2/-->
@@ -206,7 +204,7 @@
     <jodd.version>3.5.2</jodd.version>
     <jsr305.version>3.0.0</jsr305.version>
     <jaxb.version>2.2.11</jaxb.version>
-    <libthrift.version>0.12.0</libthrift.version>
+    <libthrift.version>0.16.0</libthrift.version>
     <antlr4.version>4.13.1</antlr4.version>
     <jpam.version>1.1</jpam.version>
     <selenium.version>4.17.0</selenium.version>
@@ -615,11 +613,6 @@
         <artifactId>commons-text</artifactId>
         <version>1.12.0</version>
       </dependency>
-      <dependency>
-        <groupId>commons-lang</groupId>
-        <artifactId>commons-lang</artifactId>
-        <version>${commons-lang2.version}</version>
-      </dependency>
       <dependency>
         <groupId>commons-io</groupId>
         <artifactId>commons-io</artifactId>
@@ -2294,8 +2287,8 @@
             <artifactId>janino</artifactId>
           </exclusion>
           <exclusion>
-            <groupId>org.pentaho</groupId>
-            <artifactId>pentaho-aggdesigner-algorithm</artifactId>
+            <groupId>net.hydromatic</groupId>
+            <artifactId>aggdesigner-algorithm</artifactId>
           </exclusion>
           <!-- End of Hive 2.3 exclusion -->
         </exclusions>
@@ -2365,6 +2358,10 @@
             <groupId>org.codehaus.groovy</groupId>
            <artifactId>groovy-all</artifactId>
           </exclusion>
+          <exclusion>
+            <groupId>com.lmax</groupId>
+            <artifactId>disruptor</artifactId>
+          </exclusion>
         </exclusions>
       </dependency>

@@ -2805,6 +2802,10 @@
             <groupId>org.slf4j</groupId>
             <artifactId>slf4j-api</artifactId>
           </exclusion>
+          <exclusion>
+            <groupId>javax.annotation</groupId>
+            <artifactId>javax.annotation-api</artifactId>
+          </exclusion>
         </exclusions>
       </dependency>
       <dependency>
@@ -2898,12 +2899,6 @@
         <artifactId>hive-storage-api</artifactId>
         <version>${hive.storage.version}</version>
         <scope>${hive.storage.scope}</scope>
-        <exclusions>
-          <exclusion>
-            <groupId>commons-lang</groupId>
-            <artifactId>commons-lang</artifactId>
-          </exclusion>
-        </exclusions>
       </dependency>
       <dependency>
         <groupId>commons-cli</groupId>
@@ -30,6 +30,7 @@
 import org.apache.thrift.TProcessorFactory;
 import org.apache.thrift.transport.TSaslClientTransport;
 import org.apache.thrift.transport.TTransport;
+import org.apache.thrift.transport.TTransportException;

 public final class KerberosSaslHelper {

@@ -68,8 +69,8 @@ public static TTransport createSubjectAssumedTransport(String principal,
         new TSaslClientTransport("GSSAPI", null, names[0], names[1], saslProps, null,
         underlyingTransport);
       return new TSubjectAssumingTransport(saslTransport);
-    } catch (SaslException se) {
-      throw new IOException("Could not instantiate SASL transport", se);
+    } catch (SaslException | TTransportException se) {
+      throw new IOException("Could not instantiate transport", se);
     }
   }

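The widened catch in KerberosSaslHelper is needed because the TSaslClientTransport constructor in libthrift 0.16 declares the checked TTransportException. A standalone sketch of the same pattern (the helper class and parameter names below are invented for illustration):

import java.io.IOException;
import java.util.Map;
import javax.security.sasl.SaslException;
import org.apache.thrift.transport.TSaslClientTransport;
import org.apache.thrift.transport.TTransport;
import org.apache.thrift.transport.TTransportException;

final class SaslTransports { // hypothetical helper, not part of this commit
  static TTransport gssapiTransport(String service, String host,
      Map<String, String> saslProps, TTransport underlying) throws IOException {
    try {
      // Under libthrift 0.16 this constructor can throw TTransportException
      // in addition to SaslException from SASL negotiation.
      return new TSaslClientTransport("GSSAPI", null, service, host,
          saslProps, null, underlying);
    } catch (SaslException | TTransportException e) {
      throw new IOException("Could not instantiate transport", e);
    }
  }
}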
@@ -38,6 +38,7 @@
 import org.apache.thrift.transport.TSaslClientTransport;
 import org.apache.thrift.transport.TSaslServerTransport;
 import org.apache.thrift.transport.TTransport;
+import org.apache.thrift.transport.TTransportException;
 import org.apache.thrift.transport.TTransportFactory;

 public final class PlainSaslHelper {
@@ -64,7 +65,7 @@ public static TTransportFactory getPlainTransportFactory(String authTypeStr)
   }

   public static TTransport getPlainTransport(String username, String password,
-      TTransport underlyingTransport) throws SaslException {
+      TTransport underlyingTransport) throws SaslException, TTransportException {
     return new TSaslClientTransport("PLAIN", null, null, null, new HashMap<String, String>(),
       new PlainCallbackHandler(username, password), underlyingTransport);
   }
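Since getPlainTransport now also declares TTransportException, callers that previously handled only SaslException must widen their handling as well. A hedged caller-side sketch, assuming a class in the same package as PlainSaslHelper (the wrapper class and error handling are invented for illustration):

import javax.security.sasl.SaslException;
import org.apache.thrift.transport.TTransport;
import org.apache.thrift.transport.TTransportException;

final class PlainClient { // hypothetical caller, not part of this commit
  static TTransport open(String user, String password, TTransport raw) {
    try {
      TTransport t = PlainSaslHelper.getPlainTransport(user, password, raw);
      t.open(); // open() can also throw TTransportException
      return t;
    } catch (SaslException | TTransportException e) {
      throw new RuntimeException("PLAIN SASL transport setup failed", e);
    }
  }
}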
@@ -45,11 +45,12 @@ public TSetIpAddressProcessor(Iface iface) {
   }

   @Override
-  public boolean process(final TProtocol in, final TProtocol out) throws TException {
+  public void process(final TProtocol in, final TProtocol out) throws TException {
     setIpAddress(in);
     setUserName(in);
     try {
-      return super.process(in, out);
+      super.process(in, out);
+      return;
     } finally {
       THREAD_LOCAL_USER_NAME.remove();
       THREAD_LOCAL_IP_ADDRESS.remove();
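The override changes because TProcessor.process returns void in libthrift 0.16; failures now surface as TException rather than a boolean result. A minimal sketch of a delegating processor against the new signature (the class name and timing logic are invented for illustration):

import org.apache.thrift.TException;
import org.apache.thrift.TProcessor;
import org.apache.thrift.protocol.TProtocol;

final class TimingProcessor implements TProcessor { // hypothetical example
  private final TProcessor delegate;

  TimingProcessor(TProcessor delegate) {
    this.delegate = delegate;
  }

  @Override
  public void process(TProtocol in, TProtocol out) throws TException {
    long start = System.nanoTime();
    try {
      delegate.process(in, out); // no boolean result to propagate anymore
    } finally {
      System.out.printf("request took %d ns%n", System.nanoTime() - start);
    }
  }
}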
@@ -91,16 +91,10 @@ protected void initializeServer() {

       // Server args
       int maxMessageSize = hiveConf.getIntVar(HiveConf.ConfVars.HIVE_SERVER2_THRIFT_MAX_MESSAGE_SIZE);
-      int requestTimeout = (int) hiveConf.getTimeVar(
-        HiveConf.ConfVars.HIVE_SERVER2_THRIFT_LOGIN_TIMEOUT, TimeUnit.SECONDS);
-      int beBackoffSlotLength = (int) hiveConf.getTimeVar(
-        HiveConf.ConfVars.HIVE_SERVER2_THRIFT_LOGIN_BEBACKOFF_SLOT_LENGTH, TimeUnit.MILLISECONDS);
       TThreadPoolServer.Args sargs = new TThreadPoolServer.Args(serverSocket)
         .processorFactory(processorFactory).transportFactory(transportFactory)
         .protocolFactory(new TBinaryProtocol.Factory())
         .inputProtocolFactory(new TBinaryProtocol.Factory(true, true, maxMessageSize, maxMessageSize))
-        .requestTimeout(requestTimeout).requestTimeoutUnit(TimeUnit.SECONDS)
-        .beBackoffSlotLength(beBackoffSlotLength).beBackoffSlotLengthUnit(TimeUnit.MILLISECONDS)
         .executorService(executorService);

       // TCP Server
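The dropped builder calls reflect that TThreadPoolServer.Args in libthrift 0.16 no longer exposes the requestTimeout and beBackoffSlotLength knobs used with 0.12. A sketch of the configuration surface that remains, mirroring the diff above (the factory class and parameters are invented for illustration):

import java.util.concurrent.ExecutorService;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.server.TThreadPoolServer;
import org.apache.thrift.transport.TServerSocket;

final class BinaryServerFactory { // hypothetical helper, not part of this commit
  static TThreadPoolServer create(TServerSocket socket, ExecutorService pool,
      int maxMessageSize) {
    TThreadPoolServer.Args args = new TThreadPoolServer.Args(socket)
        .protocolFactory(new TBinaryProtocol.Factory())
        // Bound incoming string/container sizes, as the diff above does.
        .inputProtocolFactory(
            new TBinaryProtocol.Factory(true, true, maxMessageSize, maxMessageSize))
        .executorService(pool);
    return new TThreadPoolServer(args);
  }
}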
@@ -83,6 +83,16 @@ public void setSessionHandle(SessionHandle sessionHandle) {
     public SessionHandle getSessionHandle() {
       return sessionHandle;
     }
+
+    @Override
+    public <T> T unwrap(Class<T> aClass) {
+      return null;
+    }
+
+    @Override
+    public boolean isWrapperFor(Class<?> aClass) {
+      return false;
+    }
   }

   public ThriftCLIService(CLIService service, String serviceName) {
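The two new overrides satisfy org.apache.thrift.server.ServerContext, which in libthrift 0.16 adds JDBC-style unwrap/isWrapperFor methods; the commit stubs them out with null/false. A sketch of a context that instead honors the contract (the class name and wrapped-session field are invented for illustration):

import org.apache.thrift.server.ServerContext;

final class SessionServerContext implements ServerContext { // hypothetical example
  private final Object session; // the wrapped per-connection state

  SessionServerContext(Object session) {
    this.session = session;
  }

  @Override
  public <T> T unwrap(Class<T> iface) {
    if (isWrapperFor(iface)) {
      return iface.cast(session);
    }
    throw new IllegalArgumentException("not a wrapper for " + iface);
  }

  @Override
  public boolean isWrapperFor(Class<?> iface) {
    return iface != null && iface.isInstance(session);
  }
}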
@@ -74,7 +74,7 @@ private[spark] object HiveUtils extends Logging {

   val HIVE_METASTORE_VERSION = buildStaticConf("spark.sql.hive.metastore.version")
     .doc("Version of the Hive metastore. Available options are " +
-      "<code>2.0.0</code> through <code>2.3.9</code> and " +
+      "<code>2.0.0</code> through <code>2.3.10</code> and " +
       "<code>3.0.0</code> through <code>3.1.3</code>.")
     .version("1.4.0")
     .stringConf
@@ -1358,7 +1358,7 @@ private[hive] object HiveClientImpl extends Logging {
     try {
       Hive.getWithoutRegisterFns(hiveConf)
     } catch {
-      // SPARK-37069: not all Hive versions have the above method (e.g., Hive 2.3.9 has it but
+      // SPARK-37069: not all Hive versions have the above method (e.g., Hive 2.3.10 has it but
       // 2.3.8 don't), therefore here we fallback when encountering the exception.
       case _: NoSuchMethodError =>
         Hive.get(hiveConf)
@@ -59,13 +59,12 @@ package object client {
       "org.pentaho:pentaho-aggdesigner-algorithm"))

   // Since HIVE-23980, calcite-core included in Hive package jar.
-  case object v2_3 extends HiveVersion("2.3.9",
+  case object v2_3 extends HiveVersion("2.3.10",
     exclusions = Seq("org.apache.calcite:calcite-core",
       "org.apache.calcite:calcite-druid",
       "org.apache.calcite.avatica:avatica",
-      "com.fasterxml.jackson.core:*",
       "org.apache.curator:*",
-      "org.pentaho:pentaho-aggdesigner-algorithm",
+      "net.hydromatic:aggdesigner-algorithm",
       "org.apache.hive:hive-vector-code-gen"))

   // Since Hive 3.0, HookUtils uses org.apache.logging.log4j.util.Strings
@@ -211,7 +211,7 @@ class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
       tryDownloadSpark(version, sparkTestingDir.getCanonicalPath)
     }

-    // Extract major.minor for testing Spark 3.1.x and 3.0.x with metastore 2.3.9 and Java 11.
+    // Extract major.minor for testing Spark 3.1.x and 3.0.x with metastore 2.3.10 and Java 11.
     val hiveMetastoreVersion = """^\d+\.\d+""".r.findFirstIn(hiveVersion).get
     val args = Seq(
       "--name", "prepare testing tables",
@@ -149,7 +149,7 @@ class HiveSparkSubmitSuite
       "--conf", s"${EXECUTOR_MEMORY.key}=512m",
       "--conf", "spark.ui.enabled=false",
       "--conf", "spark.master.rest.enabled=false",
-      "--conf", "spark.sql.hive.metastore.version=2.3.9",
+      "--conf", "spark.sql.hive.metastore.version=2.3.10",
       "--conf", "spark.sql.hive.metastore.jars=maven",
       "--driver-java-options", "-Dderby.system.durability=test",
       unusedJar.toString)
@@ -370,7 +370,7 @@ class HiveSparkSubmitSuite
       "--master", "local-cluster[2,1,512]",
       "--conf", s"${EXECUTOR_MEMORY.key}=512m",
       "--conf", s"${LEGACY_TIME_PARSER_POLICY.key}=LEGACY",
-      "--conf", s"${HiveUtils.HIVE_METASTORE_VERSION.key}=2.3.9",
+      "--conf", s"${HiveUtils.HIVE_METASTORE_VERSION.key}=2.3.10",
       "--conf", s"${HiveUtils.HIVE_METASTORE_JARS.key}=maven",
       "--conf", s"spark.hadoop.javax.jdo.option.ConnectionURL=$metastore",
       unusedJar.toString)
@@ -387,7 +387,7 @@ object SetMetastoreURLTest extends Logging {
     val builder = SparkSession.builder()
       .config(sparkConf)
       .config(UI_ENABLED.key, "false")
-      .config(HiveUtils.HIVE_METASTORE_VERSION.key, "2.3.9")
+      .config(HiveUtils.HIVE_METASTORE_VERSION.key, "2.3.10")
       // The issue described in SPARK-16901 only appear when
       // spark.sql.hive.metastore.jars is not set to builtin.
       .config(HiveUtils.HIVE_METASTORE_JARS.key, "maven")
@@ -698,7 +698,7 @@ object SparkSQLConfTest extends Logging {
     val filteredSettings = super.getAll.filterNot(e => isMetastoreSetting(e._1))

     // Always add these two metastore settings at the beginning.
-    (HiveUtils.HIVE_METASTORE_VERSION.key -> "2.3.9") +:
+    (HiveUtils.HIVE_METASTORE_VERSION.key -> "2.3.10") +:
     (HiveUtils.HIVE_METASTORE_JARS.key -> "maven") +:
     filteredSettings
   }
@@ -726,7 +726,7 @@ object SPARK_9757 extends QueryTest {
     val hiveWarehouseLocation = Utils.createTempDir()
     val sparkContext = new SparkContext(
       new SparkConf()
-        .set(HiveUtils.HIVE_METASTORE_VERSION.key, "2.3.9")
+        .set(HiveUtils.HIVE_METASTORE_VERSION.key, "2.3.10")
         .set(HiveUtils.HIVE_METASTORE_JARS.key, "maven")
         .set(UI_ENABLED, false)
         .set(WAREHOUSE_PATH.key, hiveWarehouseLocation.toString))
@@ -1627,10 +1627,8 @@ class HiveQuerySuite extends HiveComparisonTest with SQLTestUtils with BeforeAnd
   test("SPARK-33084: Add jar support Ivy URI in SQL") {
     val testData = TestHive.getHiveFile("data/files/sample.json").toURI
     withTable("t") {
-      // hive-catalog-core has some transitive dependencies which dont exist on maven central
-      // and hence cannot be found in the test environment or are non-jar (.pom) which cause
-      // failures in tests. Use transitive=false as it should be good enough to test the Ivy
-      // support in Hive ADD JAR
+      // Use transitive=false as it should be good enough to test the Ivy support
+      // in Hive ADD JAR
       sql(s"ADD JAR ivy://org.apache.hive.hcatalog:hive-hcatalog-core:$hiveVersion" +
        "?transitive=false")
       sql(
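For reference, Spark's ADD JAR accepts Ivy coordinates with query parameters such as transitive (SPARK-33084); with transitive=false only the named artifact is fetched. An illustrative sketch of the same mechanism outside the test harness (the app name is made up; the coordinates mirror the test above):

import org.apache.spark.sql.SparkSession;

public final class AddJarIvyExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("add-jar-ivy-example") // hypothetical name
        .enableHiveSupport()
        .getOrCreate();
    // Fetch a single artifact without its transitive dependencies.
    spark.sql("ADD JAR ivy://org.apache.hive.hcatalog:hive-hcatalog-core:2.3.10"
        + "?transitive=false");
    spark.stop();
  }
}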
