Spark 2.3 Merge #97

Open
wants to merge 142 commits into
base: snappy/branch-2.3
142 commits
7a1ace9
[SPARK-13904][SCHEDULER] Add support for pluggable cluster manager
Apr 17, 2016
06eca13
[SPARK-14729][SCHEDULER] Refactored YARN scheduler creation code to u…
Apr 27, 2016
e9f80e6
[SNAPPYDATA] increasing visibility of SparkContext.activeContext
Nov 21, 2015
009ab91
[SNAPPYDATA] add SnappyData's modification headers in updated files
Dec 30, 2015
8998a02
[SNAP-404] Address #comment about increasing decimal precision
Jan 12, 2016
cc40bc9
[SNAP-860] Removed hardcoding of size of Array used for storing Decim…
SachinJanani Jul 14, 2016
79130cb
[SNAPPYDATA] Try hard to not schedule on others if ExecutorCacheTaskL…
Jul 29, 2016
596ac89
[SNAPPYDATA] Use SnappyContext as default SQLContext on shell (#35)
nthanvi May 17, 2016
f5bc2d6
[SNAP-643] Increase visibility of some methods in GenerateUnsafeProje…
Mar 28, 2016
3104dd1
[SNAPPYDATA] Fixing sequence of expression in an option
ahshahid Apr 28, 2016
4cf0d5d
[SNAP-931] Use non-secure randomUUID where appropriate (#40)
Jul 28, 2016
a6d7679
[SNAPPYDATA] Adding SnappyData modification headers for missing files
Jul 29, 2016
b1dafd5
[SNAPPYDATA] Updated README.md with information on SnappyData's changes
Feb 1, 2016
71b22d0
[SNAPPYDATA] Optimizations for bootstrap
ahshahid Jun 7, 2016
490663f
[SNAPPYDATA] Gradle build scripts and build fixes
Jul 31, 2016
ecb7f7f
[SNAPPYDATA] Dynamic CQ changes in spark streaming
Nov 30, 2015
40a4ea0
[SNAPPYDATA] Fix cluster startup due to executionId format
Aug 8, 2016
37ee514
[SNAPPYDATA] Accept Spark properties with "snappydata." prefix
nthanvi Dec 2, 2015
53f2d15
[SNAPPYDATA] More fixes for SnappyData for Spark 2.0
Aug 16, 2016
3e82acb
Snap 293 (#1)
nthanvi Aug 25, 2016
27025d1
[SNAPPYDATA] fix a scalaStyle issue
Sep 2, 2016
9ac6661
[SNAP-966] Prefer conversions to date/timestamp and not strings (#7)
Sep 7, 2016
8455f59
[SNAP-1034] Optimizations at Spark layer as seen in profiling (#10)
Sep 7, 2016
58af690
[SNAPPYDATA] Updated Benchmark code from Spark PR#13899
Sep 11, 2016
1c4ff5a
[SNAPPYDATA] Spark version 2.0.1-2
Sep 20, 2016
5acb359
[SNAPPYDATA] fixing antlr generated code for IDEA
Sep 22, 2016
26adf26
[SNAP-1083] fix numBuckets handling (#15)
Oct 17, 2016
5cacaa1
[SNAPPYDATA] Spark version 2.0.1-3
Oct 20, 2016
2142c81
[SNAPPYDATA] updating snappy-spark version after the merge
Oct 24, 2016
ce30bd9
[SNAPPYDATA] Bumping version to 2.0.3-1
Nov 24, 2016
46c3807
[SNAPPYDATA] Made two methods in Executor as protected to make them c…
rishitesh Nov 27, 2016
1c255df
[SNAPPYDATA]: Honoring JAVA_HOME variable while compiling java files
Nov 28, 2016
455f328
[SNAP-1198] Use ConcurrentHashMap instead of queue for ContextCleaner…
Dec 1, 2016
17b6b3b
[SNAP-1194] explicit addLong/longValue methods in SQLMetrics (#33)
Dec 3, 2016
98b2f85
[SNAPPYDATA] More optimizations to UTF8String
Nov 24, 2016
9a566cc
[SNAPPYDATA] Adding fixed stats to common filter expressions
Dec 9, 2016
f27ef92
[SNAPPYDATA] adding kryo serialization missing in LongHashedRelation
Dec 9, 2016
6951cda
[SNAPPYDATA] Correcting HashPartitioning interface to match apache spark
Dec 10, 2016
b0400cd
[SNAP-1233] clear InMemorySorter before calling its reset (#35)
Dec 11, 2016
1f46a57
[SNAPPYDATA] Adding more filter conditions for plan sizing as followup
Dec 12, 2016
66d9d42
[SNAPPYDATA] reduced factors in filters a bit to be more conservative
Dec 13, 2016
ba84de4
[SNAP-1240] Snappy monitoring dashboard (#36)
snappy-sachin Dec 14, 2016
f796276
[SNAP-1251] Avoid exchange when number of shuffle partitions > child …
Dec 15, 2016
718a20b
[SNAPPYDATA] reverting lazy val to def for defaultNumPreShufflePartit…
Dec 15, 2016
0af421b
[SNAPPYDATA] Code changes for displaying product version details. (#38)
snappy-sachin Dec 15, 2016
d544f95
[SNAPPYDATA] Removing duplicate RDD already in snappy-core
Dec 18, 2016
5bc6591
SNAP-1257 (#40)
snappy-sachin Dec 20, 2016
587a3f9
[SNAPPYDATA] Spark Version 2.0.3-2
Dec 21, 2016
f49a3ac
SNAP-1281: UI does not show up if spark shell is run without snappyda…
snappy-sachin Jan 3, 2017
9f91917
[SNAP-1185] Guard logging and time measurements (#28)
Nov 30, 2016
e25e4b9
Snap 982 (#43)
rishitesh Feb 2, 2017
1694567
Snap 1890 : Snappy Pulse UI suggestions for 1.0 (#69)
snappy-sachin Aug 8, 2017
0c52ebd
[SNAP-1067] Optimizations seen in perf analysis related to SnappyData…
Oct 24, 2016
59d8076
[SNAP-1067] Optimizations seen in perf analysis related to SnappyData…
Oct 26, 2016
b500538
[SNAP-1136] Kryo closure serialization support and optimizations (#27)
Nov 28, 2016
c874a10
[SNAP-1190] Reduce partition message overhead from driver to executor…
Dec 3, 2016
4795553
[SNAP-1202] Reduce serialization overheads of biggest contributors in…
Dec 3, 2016
7a6836c
[SNAPPYDATA] Code changes for displaying product version details. (#38)
snappy-sachin Dec 15, 2016
b0d9118
[SNAPPYDATA] Update to gradle-scalatest version 0.13.1
Jan 25, 2017
b0fe78d
[SNAPPYDATA] Skip cast if non-nullable type is being inserted in null…
Jan 12, 2017
db700b6
[SNAPPYDATA] optimized versions for a couple of string functions
Jan 12, 2017
573dbd4
[SNAPPYDATA] Increasing the code generation cache eviction size to 30…
Feb 9, 2017
216c3e5
[SNAP-1398] Update janino version to latest 3.0.x
Mar 10, 2017
535ebb7
[SNAPPYDATA] made some methods protected to be used by SnappyUnifiedM…
rishitesh Mar 30, 2017
bf9784f
[SNAPPYDATA] Reducing file read/write buffer sizes
May 9, 2017
ec242aa
[SNAP-1486] make QueryPlan.cleanArgs a transient lazy val (#51)
May 29, 2017
7bf9d90
[SPARK-19500] [SQL] Fix off-by-one bug in BytesToBytesMap (#53)
rishitesh Jun 2, 2017
366007e
SNAP-1545: Snappy Dashboard UI Revamping (#52)
snappy-sachin Jun 2, 2017
b91dfce
[SNAPPYDATA] fixing scalastyle errors introduced in previous commits
Jun 4, 2017
6cd99a0
SNAP-1698: Snappy Dashboard UI Enhancements (#55)
snappy-sachin Jun 7, 2017
a7d038b
[SNAPPYDATA] reduce a byte copy reading from ColumnVector
Jul 2, 2017
33d5a2a
[SNAPPYDATA] moved UTF8String.fromBuffer to Utils.stringFromBuffer
Jul 3, 2017
89c9fab
[SNAPPYDATA] handle "prepare" in answer comparison inside Map types too
Feb 23, 2017
83ae1ec
[SNAPPYDATA] reverting changes to increase DECIMAL precision to 127
Feb 23, 2017
747fe5e
[SNAPPYDATA][MERGE-2.1] Some fixes after the merge
Nov 17, 2016
50e7c81
[SNAPPYDATA][MERGE-2.1]
Feb 18, 2017
f5f4761
[SNAPPYDATA][MERGE-2.1]
May 21, 2017
17001ab
[SNAPPYDATA][MERGE-2.1]
Jun 23, 2017
6915b5a
[SNAPPYDATA][MERGE-2.1]
Jul 6, 2017
40ea70c
[SNAP-1790] Fix one case of incorrect offset in ByteArrayMethods.arra…
Jul 11, 2017
0b1e993
[SNAPPYDATA][MERGE-2.1] Missing patches and version changes
Jul 9, 2017
ffa86a7
[SNAP-1389] Optimized UTF8String.compareTo (#62)
Jul 18, 2017
8293c86
[SNAPPYDATA][PERF] optimized pattern matching for byte/time strings
Jul 23, 2017
5fd52dd
SNAP-1792: Display snappy members logs on Snappy Pulse UI (#58)
snappy-sachin Jul 25, 2017
e7392ba
SNAP-1744: UI itself needs to consistently refer to itself as "Snappy…
snappy-sachin Jul 31, 2017
08fc9b6
[SNAP-1377,SNAP-902] Proper handling of exception in case of Lead and…
SachinJanani Aug 9, 2017
0e2181a
Snap 1833 (#67)
rishitesh Aug 9, 2017
638b0fb
Refactored the executor exception handling for cache (#71)
rishitesh Aug 15, 2017
4d16611
[SNAP-1930] Rectified a code in WholeStageCodeGenRdd. (#73)
rishitesh Aug 16, 2017
a2ecd8b
Snap 1813 : Security - Add Server (Jetty web server) level user authe…
snappy-sachin Aug 17, 2017
61e9811
[SNAPPYDATA] fixing scalastyle failure introduced by last commit
Aug 18, 2017
a9a70db
Resized company logo (#74)
snappy-sachin Aug 18, 2017
8451e55
[SNAPPYDATA] update janino to latest release 3.0.7
Aug 19, 2017
1c26655
[SNAP-1951] move authentication handler bind to be inside connect (#75)
Aug 21, 2017
422e723
Bump version spark 2.1.1.1-rc1, store 1.5.6-rc1 and sparkJobserver 0.…
Aug 24, 2017
547a8d7
Updated the year in the Snappydata copyright header. (#76)
Aug 30, 2017
a5e767a
[SNAPPYDATA] upgrade netty versions (SPARK-18971, SPARK-18586)
Aug 30, 2017
73b9c6d
[SNAPPYDATA] more efficient passing of non-primitive literals
Sep 2, 2017
435d321
[SNAP-1993] Optimize UTF8String.contains (#78)
Sep 5, 2017
ced024b
[SNAPPYDATA][AQP-293] Native JNI callback changes for UTF8String (#80)
Sep 10, 2017
90166b8
[SNAPPYDATA] update jetty version
Sep 11, 2017
055753c
[SNAP-2033] pass the original number of buckets in table via Orderles…
Sep 16, 2017
bed1dde
Update versions for snappydata 1.0.0, store 1.6.0, spark 2.1.1.1 and …
Sep 20, 2017
525f4d6
[SNAPPYDATA] use common "vendorName" in build scripts
Sep 20, 2017
b21cacf
[SNAPPYDATA] relax access-level of Executor thread pools to protected
Oct 6, 2017
7007a35
[SNAPPYDATA] version upgrades as per previous cherry-picks
Oct 10, 2017
cca3704
Snap 2044 (#85)
rishitesh Oct 23, 2017
d7140c3
Snap 2061 (#83)
ahshahid Oct 24, 2017
560cafa
[SNAPPYDATA] build changes/fixes (#81)
Oct 24, 2017
ddc54ad
[SNAP-2061] fix scalastyle errors, add test
Oct 24, 2017
2e01d6e
[SNAPPYDATA] add missing jersey-hk2 dependency
Dec 4, 2017
ab61be2
[SNAPPYDATA][SNAP-2120] make codegen cache size configurable (#87)
Dec 26, 2017
7541619
Snap 2084 (#86)
rishitesh Dec 28, 2017
6f1b860
[SNAPPYDATA] some optimizations to ExecutionMemoryPool
Jan 8, 2018
c03599e
[SNAPPYDATA] fixing all failures in snappy-spark test suite
Jan 30, 2018
b5136bf
[SNAPPYDATA] fixing one remaining failure in gradle runs
Jan 31, 2018
6532714
Preserve the preferred location in MapPartitionRDD. (#92)
rishitesh Feb 12, 2018
b35bd98
* SnappyData Spark Version 2.1.1.2
Feb 12, 2018
55aa6e3
[SNAP-2218] honour timeout in netty RPC transfers (#93)
Feb 16, 2018
5db16fe
Adding commons crypto libraries
Mar 12, 2018
787deb8
compilation issues
Mar 12, 2018
39fb136
fixing compilation errors
Mar 13, 2018
1cebe77
compilation errors
Mar 16, 2018
7940bbb
compilation issues
Mar 19, 2018
ad82beb
compilation issues
Mar 19, 2018
11523f2
Added new build.gradle for spark-kvstore subproject
Mar 21, 2018
b41d4f8
Addressing precheckin failures
Mar 29, 2018
51d65e4
Addressing precheckin issues
Apr 2, 2018
e6e3859
Addressing precheckin failures
Apr 6, 2018
5ddf8e8
Addressing precheckin failures
Apr 9, 2018
b1a7664
Addressing precheckin failures
Apr 23, 2018
4b275d8
Addressing precheckin failures
Apr 30, 2018
7c936df
Addressing precheckin failures
May 2, 2018
11a26a6
Addressing precheckin failures
May 4, 2018
9552470
Addressing precheckin failures
May 8, 2018
c7d6ed1
[SNAPPYDATA] revert changes in Logging to upstream
Mar 1, 2018
150f5ad
[SNAPPYDATA] Changed TestSparkSession in test class APIs to base Spar…
hemanthmeka Mar 7, 2018
38c4f16
[SNAPPYDATA] increased default codegen cache size to 2K
Mar 8, 2018
6863e44
[SNAPPYDATA] make Dataset.boundEnc as lazy val
Mar 22, 2018
2049882
[SNAP-2242] Unique application names & kill app by names (#98)
rishitesh Mar 21, 2018
74b3598
[SNAP-2225] Removed OrderlessHashPartitioning. (#95)
rishitesh Mar 19, 2018
58d93d9
Fixing issues after master down merge
May 12, 2018
4 changes: 4 additions & 0 deletions .gitignore
@@ -72,6 +72,7 @@ spark-tests.log
src_managed/
streaming-tests.log
target/
build-artifacts/
unit-tests.log
work/

@@ -91,3 +92,6 @@ spark-warehouse/
*.Rproj.*

.Rproj.user

# gradle specific
.gradle/
16 changes: 16 additions & 0 deletions README.md
@@ -1,3 +1,19 @@
## SnappyData's extensions to Spark

- SnappyData collocates Spark executors with its in-memory data store in the same JVM. To achieve this, support for external cluster manager in Spark 2.0 is used to add a SnappyData cluster manager.
- SnappyData's MemoryManager was needed to generate and handle memory events. A property spark.memory.manager is now used to specify a memory manager other than Spark's own.
- To display the consumption of memory in an external embedded store, Spark's storage UI was updated.
- Support for getting length of type (for VARCHAR) was added in the JDBCDialect class.
- Dynamic continuous queries on streams will be enabled in SnappyData in the future. To prepare for that, support for registering DStreams after the streaming context has started was added.
- For partitioning, a sequence of expressions can be provided. SnappyData adds OrderlessHashPartitioning, which does not take the order of the expressions into account while partitioning.
- Hive client thread-local configuration changed to be instance specific.
- Hive client added support for dropTable and listing tables for all databases.
- RDD partitions with executor-specific preferred locations are forced to be routed to one of those executors if it is alive.
- An "unsecure" (non-cryptographic) variant of random UUID generation was added in DiskBlockManager for temporary file names.
- Added a fix for SPARK-13116.
- Increased visibility of some classes/methods.
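
Two of the bullets above (order-insensitive partitioning and the "unsecure" random UUID) can be illustrated with a small sketch. This is not the SnappyData or Spark implementation: the class and method names are hypothetical, and combining per-key hashes by addition is only one assumed way to make the result independent of expression order.

```java
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;

/** Illustrative sketches only; names are hypothetical, not SnappyData or Spark APIs. */
final class ReadmeSketches {

    /**
     * Order-insensitive partition id (assumed approach): per-key hashes are
     * combined with addition, which is commutative, so (a, b) and (b, a)
     * map to the same partition.
     */
    static int orderlessPartition(int numPartitions, Object... keys) {
        int combined = 0;
        for (Object key : keys) {
            combined += (key == null ? 0 : key.hashCode());
        }
        // floorMod keeps the result in [0, numPartitions) even for negative sums
        return Math.floorMod(combined, numPartitions);
    }

    /**
     * Type-4 UUID drawn from ThreadLocalRandom instead of SecureRandom.
     * Cheap but predictable: fine for temporary file names, not for secrets.
     */
    static UUID nonSecureRandomUUID() {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        // clear the 4 version bits, then mark the UUID as version 4 (random)
        long msb = (rnd.nextLong() & 0xFFFFFFFFFFFF0FFFL) | 0x0000000000004000L;
        // clear the top 2 variant bits, then set the IETF variant (binary 10)
        long lsb = (rnd.nextLong() & 0x3FFFFFFFFFFFFFFFL) | 0x8000000000000000L;
        return new UUID(msb, lsb);
    }
}
```

The trade-off in the second method is the usual one: `UUID.randomUUID()` uses a cryptographically strong generator, which is unnecessary overhead when the value only needs to be unlikely to collide.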


# Apache Spark

Spark is a fast and general cluster computing system for Big Data. It provides
137 changes: 137 additions & 0 deletions assembly/build.gradle
@@ -0,0 +1,137 @@
/*
* Copyright (c) 2017 SnappyData, Inc. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License"); you
* may not use this file except in compliance with the License. You
* may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
* implied. See the License for the specific language governing
* permissions and limitations under the License. See accompanying
* LICENSE file.
*/

description = 'Spark Project Assembly'

dependencies {
  compile project(subprojectBase + 'snappy-spark-core_' + scalaBinaryVersion)
  compile project(subprojectBase + 'snappy-spark-catalyst_' + scalaBinaryVersion)
  compile project(subprojectBase + 'snappy-spark-sql_' + scalaBinaryVersion)
  compile project(subprojectBase + 'snappy-spark-repl_' + scalaBinaryVersion)
  compile project(subprojectBase + 'snappy-spark-streaming_' + scalaBinaryVersion)
  compile project(subprojectBase + 'snappy-spark-streaming-kafka-0.8_' + scalaBinaryVersion)
  compile project(subprojectBase + 'snappy-spark-streaming-kafka-0.10_' + scalaBinaryVersion)
  compile project(subprojectBase + 'snappy-spark-sql-kafka-0.10_' + scalaBinaryVersion)
  compile project(subprojectBase + 'snappy-spark-mllib_' + scalaBinaryVersion)
  compile project(subprojectBase + 'snappy-spark-graphx_' + scalaBinaryVersion)
  compile project(subprojectBase + 'snappy-spark-yarn_' + scalaBinaryVersion)
  compile project(subprojectBase + 'snappy-spark-mesos_' + scalaBinaryVersion)
  compile project(subprojectBase + 'snappy-spark-hive_' + scalaBinaryVersion)
  compile project(subprojectBase + 'snappy-spark-hive-thriftserver_' + scalaBinaryVersion)
  if (rootProject.hasProperty('kubernetes')) {
    compile project(subprojectBase + 'snappy-spark-kubernetes_' + scalaBinaryVersion)
  }
  if (rootProject.hasProperty('spark-ganglia-lgpl')) {
    compile project(subprojectBase + 'snappy-spark-ganglia-lgpl_' + scalaBinaryVersion)
  }
}

def cleanProduct() {
  delete "${sparkProjectRootDir}/python/lib/pyspark.zip"
  delete snappyProductDir
}
clean.doLast {
  cleanProduct()
}

task product(type: Zip) {
  def examplesProject = project(subprojectBase + 'snappy-spark-examples_' + scalaBinaryVersion)
  String yarnShuffleProject = subprojectBase + 'snappy-spark-network-yarn_' + scalaBinaryVersion
  dependsOn jar, examplesProject.jar, "${yarnShuffleProject}:shadowJar"
  // create python zip
  destinationDir = file("${snappyProductDir}/python/lib")
  archiveName = 'pyspark.zip'
  from("${sparkProjectRootDir}/python") {
    include 'pyspark/**/*'
  }

  doFirst {
    cleanProduct()
  }
  doLast {
    // copy all runtime dependencies (skip for top-level snappydata builds)
    if (rootProject.name == 'snappy-spark') {
      copy {
        from(configurations.runtime) {
          // exclude antlr4 explicitly (runtime is still included)
          // that gets pulled by antlr gradle plugin
          exclude '**antlr4-4*.jar'
          // exclude scalatest included by spark-tags
          exclude '**scalatest*.jar'
        }
        into "${snappyProductDir}/jars"
      }
    }
    // copy scripts, data and other files that are part of distribution
    copy {
      from(sparkProjectRootDir) {
        include 'bin/**'
        include 'sbin/**'
        include 'conf/**'
        include 'data/**'
        include 'licenses/**'
        include 'python/**'
        include 'examples/src/**'
      }
      into snappyProductDir
    }
    // copy the SparkR library if it is present under the Spark root
    def sparkR = "${sparkProjectRootDir}/R/lib/SparkR"
    if (file(sparkR).exists()) {
      copy {
        from sparkR
        into "${snappyProductDir}/R/lib"
      }
    }

    // copy yarn shuffle shadow jar
    copy {
      from "${project(yarnShuffleProject).buildDir}/jars"
      into "${snappyProductDir}/yarn"
    }
    // copy examples jars
    copy {
      from "${examplesProject.buildDir}/jars"
      into "${snappyProductDir}/examples/jars"
    }
    // create RELEASE file, copy README etc for top-level snappy-spark project
    if (rootProject.name == 'snappy-spark') {
      copy {
        from(sparkProjectRootDir) {
          include 'LICENSE'
          include 'NOTICE'
          include 'README.md'
        }
        into snappyProductDir
      }
      def releaseFile = file("${snappyProductDir}/RELEASE")
      String buildFlags = ''
      if (rootProject.hasProperty('docker')) {
        buildFlags += ' -Pdocker'
      }
      if (rootProject.hasProperty('ganglia')) {
        buildFlags += ' -Pganglia'
      }
      String gitRevision = "${gitCmd} rev-parse --short HEAD".execute().text.trim()
      if (gitRevision.length() > 0) {
        gitRevision = " (git revision ${gitRevision})"
      }

      releaseFile.append("Spark ${version}${gitRevision} built for Hadoop ${hadoopVersion}\n")
      releaseFile.append("Build flags:${buildFlags}\n")
    }
  }
}
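
For readers who want to see what the RELEASE file written at the end of the `product` task looks like, the string assembly can be mirrored in plain Java. This is a rough sketch, not part of the build: the class name, method name, and the sample values in the usage note are assumptions made for illustration.

```java
import java.util.List;

/** Hypothetical helper mirroring the RELEASE text built in the product task. */
final class ReleaseFileSketch {

    static String releaseText(String version, String gitRevision,
                              String hadoopVersion, List<String> profiles) {
        // each enabled profile is recorded as " -P<name>", as in the Gradle script
        StringBuilder flags = new StringBuilder();
        for (String profile : profiles) {
            flags.append(" -P").append(profile);
        }
        // the revision suffix is omitted when git reports nothing
        String rev = gitRevision.isEmpty() ? "" : " (git revision " + gitRevision + ")";
        return "Spark " + version + rev + " built for Hadoop " + hadoopVersion + "\n"
            + "Build flags:" + flags + "\n";
    }
}
```

With made-up inputs such as version `2.3.0`, revision `abc1234`, Hadoop `2.7.3` and the single profile `docker`, the first line reads `Spark 2.3.0 (git revision abc1234) built for Hadoop 2.7.3` and the second `Build flags: -Pdocker`.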