####################################################################################################
# PLEASE KEEP THE WIDTH OF THE LINES BELOW WITHIN 100 CHARACTERS. #
# MOST RECENT CHANGE AT THE TOP. #
# KEEP THE DESCRIPTION OF EACH OF YOUR CHANGES THAT NEEDS TO BE PUT INTO THE RELEASE NOTES TO ONE #
# TO THREE LINES. #
# KEEP A LINE BLANK BETWEEN TWO NOTES. #
# ADD THE JIRA TICKET ID, IF APPLICABLE. #
####################################################################################################
Release 1.0.1
- New Features/Fixed Issues
[SNAP-2214][SNAP-2036] Fixed OOME after restart with heap, projection pushdown. (#960)
Fixed putInto inner join cache perf and related issues. (#958)
[SNAP-2212] Fixed failure in TPCH Q21, by re-evaluating check condition for all joins. (#959)
[SNAP-2205] Fixed scala.MatchError in SnappyEmbeddedTableStatsProviderService on cluster restart.
[SNAP-2175] Handle no GemFireCache in smart connector mode. (#956)
Explicit Action for put innerjoin cache. Materialized cache for intermediate inner join in a put
operation. (#955)
[SNAP-2204] Search through aliases (e.g. for VIEWs) for colocated join keys. (#954)
[SNAP-2200] Fixed ClassCastException when reading from overflowed update deltas.
[SNAP-2178] Increase the time to wait for servers to join.
[SNAP-2191] Disable zeppelin interpreter from within lead process when security is enabled. (#946)
[SNAP-2194] Add partition pruning for column tables to smart connector. (#952)
[SNAP-2180] Fixed snappy pulse UI showing zero memory usage on data server, on active lead node
restart by explicitly initializing memoryMap on UMM start. (#951)
[SNAP-2192] Delay rollover in column updates to pre-commit. (#950)
[SNAP-2124] Fixed rows missing in update due to incorrect stats row read. (#945)
[SNAP-1283] LATERAL VIEW support in SnappyParser. (#944)
[SNAP-1840] Fixed TPCH Q22 in Smart Connector mode due to NPE in CollectAggregateExec.
(5955ce7) Fixed some snappy-spark failures and miscellaneous changes.
[SNAP-2042] Added GRANT/REVOKE support from SnappySession.
(41ed1ca) Use power of 2 for number of buckets in tests/docs.
[SNAP-2178] Wait for servers to join in LeadImpl start and start stats service only after some
servers have joined. Likewise for creating the global SnappyContext.
[SNAP-2170] Reduced the scope of global lock in SnappyContext.stopSnappyContext to fix deadlock
in lead shutdown.
[SNAP-2088] Fixes for queries with filters on columns with null values. (#937)
Parser performance improvements to recover the regression over 0.9 release. Also, optimized and
enhanced numeric/decimal literal handling. (#936)
[SNAP-2086] Snappy Pulse displays list of external tables.
(1d5cfd4) Some enhancements towards snappydata security. (#930)
[SNAP-2102] Added memory+disk optimized column batch iterators. (#933)
[SNAP-2118] Allow reading previous variable length value again. (#929)
[SNAP-1501] Set overflow-to-disk as the only evict-action for tables. (#924)
By default, column and row tables will have heap-based eviction enabled with overflow-to-disk as
evict action. Allow OVERFLOW=false to disable eviction if EVICTION_BY is absent in DDL.
[SNAP-2114] Plan caching is now attempted only for snappy tables. (#922)
[SNAP-2125] Added setter commands to disable plan caching on current session and on all sessions.
[SNAP-2093] Support ColumnTable PutInto & DeleteFrom API. (#906)
[SNAP-2141] Fix updates on complex types. (#925)
[SNAP-2146] Avoid prefixing zeppelin properties with "snappydata.store".
(b139935) Refactored ByteBufferHashMap into a generic base class.
(1c9e661) Instead of an explicit property to acquire read or write locks (which is supposed to be
set by scripts), if some other server has already initialized the hive metastore, then
automatically drop to read-lock to avoid servers unnecessarily blocking each other.
[SNAP-338] Improvements in cluster startup time.
* Reduce discovery/join timeouts for first locator.
* A faster launcher that avoids loading any other classes (other than gemfire-shared and JNA)
* Jobserver startup (and thus the global SnappySession initialization) in background
* Initialize the hive catalog in background. (#911)
* Updated SnappyData type registration. (#910)
(fd33f31) Avoid infinite retries in Utils.mapExecutors. In case an executor goes away then
retries in Utils.mapExecutors can get stuck in infinite retry loop so break it after a few
attempts. Changed PooledKryoSerializer to use direct buffers for Output.
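The bounded retry described in the (fd33f31) note above can be sketched as follows. This is an
illustrative Python sketch, not the Utils.mapExecutors code: run_with_retries, the attempt limit
of 3 and the use of RuntimeError for a lost executor are all assumptions made for the example.

```python
def run_with_retries(task, max_attempts=3):
    """Call task(attempt) until it succeeds or max_attempts is exhausted.

    Hypothetical helper: the attempt limit and the exception type are
    assumptions, not values taken from SnappyData.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task(attempt)
        except RuntimeError as err:
            # Remember the failure and try again, but never loop forever.
            last_error = err
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

Capping the attempts turns a possible infinite loop into a failure that the caller can report.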
(0b34233) Fixing a couple of issues seen in ODBC testing.
[SNAP-2127] Use separate delta disk-stores for row buffer regions. (#918)
[SNAP-2084] Handled dropStorageMemoryForObject in DefaultMemoryManager. (#892)
[SNAP-2121] Mark delta regions to use the delta diskstore. (#916)
[SNAP-2122] Use a canonical representation of DistributedMembers in query routing comparisons.
[SNAP-2120] Use "spark.sql.codegen.cacheSize" for Snappy caches. (#915)
[SNAP-338] Changes related to locator startup time improvements. (#909)
[SNAP-1743] Compress column batches when storing to disk or sending over network. (#905)
Changes to ColumnFormatValue serialization/deserialization to deal with compression transparently
when storing to disk or sending over network.
[SNAP-2063] Thrift servers were getting started in rowstore mode instead of DRDA server. (#907)
[SNAP-2116] Auto-configuration for AWS and local clusters. (#908)
[GITHUB-900] Fix case-sensitivity of columns in CREATE INDEX. (#904)
Fix the case of remote pull from smart connector. (#896)
[SNAP-2101] Smart connector performance fixes and related issues. (#895)
With above changes (+ the store ones), the performance for smart connector mode in
ColumnCacheBenchmark has improved by more than 2X and now within expected range: from 12-13ns per
row to 5-6ns per row. It is now 3-4X faster than Spark caching and 2-3X faster than direct Parquet
scan having compression=none and entirely in OS buffers.
(0b09eea) Fixing failure in QueryRoutingDUnitSecurityTest; dropTable should always throw back
SQLExceptions and not proceed with unresolved relation.
[SNAP-2072][SNAP-2073] Fix external connectors and support VIEWs. (#887)
This commit fixes primarily two issues:
1. External connectors not working in smart connector mode since the required libraries may not
be available in the embedded cluster. This happens because the BaseRelation is attempted to be
resolved in both "CREATE EXTERNAL TABLE" and "DROP TABLE". Now resolve all required information
(schema, inbuilt or not) at the driver connector JVM and send that in the procedure calls for
external providers.
2. Support for VIEW, VIEW...USING (temporary, global and persistent) in the parser.
(916cea3) Fix UDT reads/writes for row buffer. Use the "inner" sqlType for UDTs in schema mapping.
Same in the CodeGeneration row buffer/table fragments for PreparedStatement set or read.
Read underlying data as byte array directly if incoming type is SerializedRow/Map/Array.
Added efficient serialization for SerializedMap (like already done so for SerializedRow/Array).
[SNAP-2077] Modified the parser to understand FETCH FIRST syntax also. FETCH NEXT will be taken
with OFFSET support if required. (#876)
(d1987c7) Removed unused ExternalEmbeddedMode and "snappydata.embedded" property.
[SNAP-2044] Integrate Snappy python tests with precheckin (#879)
[SNAP-1986] Use a global lock throughout hive client initialization which ensures no two hive
client initializations end up trying to create the hive directory. (#878)
[SNAP-2068] Added ThreadFactory to SnappyExecutor to cleanup thread artifacts on close with
ConnectionTable.releaseThreadsSockets() as done by other pool threads. (#872)
[SNAP-1960] Fix the RUNNING status being set prematurely by removing the override of running in
LeadImpl which is no longer required. (#871)
[SNAP-2056] Use Spark JacksonGenerator with separate JSON generators per column to convert type
to JSON format. (#866)
Release 1.0.0
- New Features/Fixed Issues
[SNAP-953] Add RPM/DEB installer packaging targets using the Netflix Nebula ospackage gradle
plugin.
[SNAP-2039] Correct null updates to column tables. (#861)
Use concurrent TrieMaps in SnappySession contextObjects, and queryHints map. Reason being that
SnappySession can be read concurrently by multiple threads from same query for sub-query/broadcast
kind of plans where planning for the BroadcastExchangeExec plan happens in parallel on another
thread.
[SNAP-2029] Added new "snappydata.preferPrimaries" option to prefer primaries for queries. (#852)
Avoid double memory at the cost of reduced scalability but still having a hot backup.
See discussion on Slack: https://snappydata-public.slack.com/archives/C0DCF0UGG/p1505460492000378
Fixed a parser issue where AS can be optional in namedExpression rule. This fixes Q77 of TPCDS.
[SNAP-2030] Routed update and delete queries on row tables now return the number of affected rows.
[SNAP-2028] Snappy Python APIs fixes. (#851)
A) Some of the SparkSession python APIs used to pass SQLContext to DataFrameWriter and
DataFrameReader APIs.
B) Fixed truncate table API.
Fixed a couple of issues in parser. (#849)
1. Order by and sort by clauses after partition by can be optional.
2. The non-reserved keyword INTERVAL was being treated as an identifier because of the ordering
of optional clauses.
[SNAP-2022] Remove the check which tested if any lead is already stopped, in snappy-stop-all.sh
(#845). This was causing the script to skip shutting down of other running leads, if any. Added
a check for rowstore, so that 'sbin/snappy-stop-all.sh rowstore' doesn't see the message.
[SNAP-2020] Track in-progress insert size to avoid data skew. (#844)
With many concurrent inserts/partitions on a node, significant data skew in inserts was still
observed (on machines with large number of cores like 32) due to same smallest bucket being
chosen by multiple partitions. This change now tracks the in-progress size for bucket and adds
that to determine smallest bucket.
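A minimal sketch of the idea in the [SNAP-2020] note above, assuming a single-threaded caller.
BucketPicker and its methods are hypothetical names for illustration, not SnappyData classes; a
real implementation would also need to synchronize access across concurrent inserts.

```python
class BucketPicker:
    """Pick the smallest bucket, counting inserts that are still in progress."""

    def __init__(self, bucket_ids):
        self.committed = {b: 0 for b in bucket_ids}    # size already stored
        self.in_progress = {b: 0 for b in bucket_ids}  # size reserved by running inserts

    def _load(self, bucket):
        # Effective size is the stored size plus what concurrent inserts will add.
        return self.committed[bucket] + self.in_progress[bucket]

    def pick(self, insert_size):
        bucket = min(self.committed, key=self._load)
        self.in_progress[bucket] += insert_size
        return bucket

    def finish(self, bucket, insert_size):
        # Move the reserved size over to the stored size once the insert completes.
        self.in_progress[bucket] -= insert_size
        self.committed[bucket] += insert_size
```

Without the in_progress term, several inserts that start before any of them finishes would all
see the same smallest bucket, which is the skew the note describes.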
[SNAP-2012] Skip locked entries in evictor. (#839)
Fix as suggested by @rishitesh to use Unsafe API to try acquire monitor on RegionEntry.
Hide commands not applicable to snappydata (they are still displayed in GemFireXD and RowStore
mode). (#838)
[SNAP-2003] Fix for 'stream to big table join CQ returning incorrect result'. (#829)
HashJoinExec's streamPlan and buildPlan RDDs are computed on each CQ execution.
[AQP-293] Changes for JNI UTF8String.contains. (#832)
Convert UTF8Strings in ParamLiteral to off-heap when snappydata's off-heap is enabled.
Changes in SnappyParser. Also, updated parboiled2 to latest release.
[SNAP-1995] Added a python example showcasing KMeans usage. (#827)
Fix an issue in collect-debug-artifacts script with extraction. Skip any configuration checks in
collect-debug-artifacts for extraction (-x, --extract=).
[SNAP-1993] Fixes for data skew when no partition_by is provided. (#825)
With these changes, distribution in ColumnCacheBenchmark test, for example, is nearly equal most
of the time among the buckets. Other cases like those reported originally with 7M rows have only
~50% difference between min and max (as compared to ~4X originally)
Remove ParamLiteral for LIKE/RLIKE/REGEXP. If expression foldable is false, then LIKE family
generates very suboptimal plan (if not converted to Contains/StartsWith/EndsWith) that will
compile the Regex for every row.
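The conversion that the note above relies on can be illustrated with a simplified Python sketch.
It is not the Spark or SnappyData optimizer code: classify_like and the returned labels are
hypothetical, and patterns with "_" wildcards or escape characters are left to the regex path.

```python
def classify_like(pattern):
    """Map a constant LIKE pattern to a cheaper operator where possible."""
    if "_" in pattern or "\\" in pattern:
        return ("Regex", pattern)       # single-char wildcard or escape: keep the regex
    inner = pattern.strip("%")
    if not inner or "%" in inner:
        return ("Regex", pattern)       # only wildcards, or a wildcard in the middle
    leading = pattern.startswith("%")
    trailing = pattern.endswith("%")
    if leading and trailing:
        return ("Contains", inner)
    if trailing:
        return ("StartsWith", inner)
    if leading:
        return ("EndsWith", inner)
    return ("EqualTo", inner)           # no wildcard at all is a plain equality test
```

This rewrite is only possible when the pattern is a known constant at planning time, which is why
wrapping it in a non-foldable ParamLiteral leads to the slower per-row regex plan.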
[SNAP-1984] Changes to retain UnifiedMemoryManager state across executor restart by copying the
state in a temporary memory manager, which is created when store boots up but Spark environment is
not ready. (#821)
[SNAP-1981] For prepare phase, avoid rules that do not handle NullType since that is what is used
as placeholder for params. (#815)
[SNAP-1851] Properly close the connection when the connection commit fails. (#796)
[SNAP-1976] Changes to set isolation level. (#813)
Allow operations on row and column tables if isolation level is set to other than NONE and
autocommit is true (query routing is enabled). If autocommit is false, query routing will be
disabled and transactions on row tables will be supported. Queries on column tables will error out
when query routing is disabled.
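The rules in the [SNAP-1976] note above can be read as a small decision function. The Python
sketch below is only an illustration: routing_enabled and check_query are hypothetical names, and
the behavior for isolation level NONE is an assumption because the note does not cover it.

```python
def routing_enabled(isolation_level, autocommit):
    """Return True when queries are routed, following the note's rules."""
    if isolation_level == "NONE":
        # Assumption: with no isolation level set, routing stays enabled.
        return True
    # With an isolation level set, autocommit decides whether routing is on.
    return autocommit


def check_query(table_type, isolation_level, autocommit):
    """Raise for the one combination the note says will error out."""
    if table_type == "column" and not routing_enabled(isolation_level, autocommit):
        raise ValueError("queries on column tables require query routing")
    return True
```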
[SNAP-1973][SNAP-1970] Avoid clearing hive meta-store system properties. (#816)
The hive meta-store system properties are required to be set for static initialization of Hive and
should not be cleared because a concurrent hive context initialization (from some other path) can
see inconsistencies like system property found but not available when actually read.
[SNAP-1979] Added MemoryManagerStats for capturing different stats for UnifiedMemoryManager.(#814)
Smart Connector mode will not have these stats as GemFireXD cache will not be available.
[SNAP-1982] Change batch UUID to be a long (#812)
Now using region.newUUID to generate the batch UUID. Use colocatedRegion of column table (the row
buffer) to generate the UUID since that is what smart connector and internal rollover uses.
[SNAP-1611] Increased spark.memory.fraction from 92% to 97% (#808)
We want to give a little buffer to the JVM before it reaches the critical heap size.
Make SnappySession.contextObjects as transient to fix the serialization issues reported on
spark-shell when SnappySession gets captured in closures (e.g. import session.implicits._ with
toDF)
[SNAP-1955] Fixes for issues seen in parallel test runs (#805)
[SNAP-1660] Remove password from product logging.
[SNAP-1948] Added an option to specify streaming batch interval during streaming job submission.
e.g. bin/snappy-job.sh submit --lead localhost:8090 --app-name appname --class appclass \
--app-jar appjar --conf logFileName=demo.txt --stream --batch-interval 4000
[SNAP-1893] Changed locator status to RUNNING after stopped locator is restarted with
snappy-start-all.sh
[SNAP-1877] GC issues with large dictionaries in decoding and other optimizations (#787)
1. Performance issues with dictionary decoder when dictionary is large. 2. Data skew fixes. 3.
Using a consistent sort order so that generated code is identical across sessions for the same
query. 4. Reducing the size of generated code.
Fix issues seen during concurrency testing (#782)
[SNAP-1884] Fixed result mismatch in join between snappy table and temp table.
Overridden two methods from Executor.scala. (#783) These methods have been added in Spark
executor to check store related errors.
[SNAP-1917] Properly comparing datatype of complex schema.
[SNAP-1919][SNAP-1916] Added isPartitioned flag to determine partitioned tables (#784)
[SNAP-1904] Use same connection for rowbuffer and columnstore.
[SNAP-1883] Parser change for range operator.
Fixed: After new job classloader changes executors are not fetching driver files. (#777)
[SNAP-1894] Codegen issue for query with case in predicate expression (#772)
[SNAP-1888][SNAP-1886] Fixed parser error in two level nested subQuery, works with Spark (#774)
[SNAP-1833] Fixed the synchronization problem with sc.addJar() (#728)
[SNAP-1377][SNAP-902] Proper handling of exceptions in case of Lead and Server HA (#758)
[SNAP-1871] Remove custom built-in jdbc provider and instead use spark's JDBC provider (#757)
[SNAP-1882] Changes done for routing update and delete queries on column table to lead node.
Also handled prepared statement on update and delete queries for column table.
[SNAP-1885] Fixed Semijoin returning incorrect result (#768)
[SNAP-1787] - Handling Array[Decimal] in both embedded and split mode (#754)
[SNAP-1892] .show() after table creation using CreateExternalTable api gives empty/null
entries, caused due to empty UserSpecifiedSchema instead of None (#764)
[SNAP-1734] Query plan shows 0 number of output rows at the start of the plan. (#761)
Snappy's execution happens in two phases. In the first phase, the plan is executed to create an
RDD, which is then used to create a CachedDataFrame. In the second phase, the CachedDataFrame is
used for further actions. For accumulating the metrics for the first phase,
SparkListenerSQLPlanExecutionStart is fired. This keeps the current executionID in
_executionIdToData but does not add it to the active executions. This ensures that query is not
shown in the UI but the new jobs that are run while the plan is being executed are tracked
against this executionID. In the second phase, when the query is actually executed,
SparkListenerSQLPlanExecutionStart adds the execution data to the active executions.
SparkListenerSQLPlanExecutionEnd is then sent with the accumulated time of both the phases. For
consuming SparkListenerSQLPlanExecutionStart, Snappy's SQLListener has been added. Overridden
withNewExecutionId in CachedDataFrame so that the above explained process also happens when the
dataset APIs are used.
[SNAP-1878] Proper handling of path option while creation of external table using API (#760)
[SNAP-1850] Remove connection used in JDBCSourceAsColumnarStore#getPartitionID v2 (#750)
[SNAP-1389] Update and delete support for column tables (#747)
[SNAP-1426] Fixed the Snappy Dashboard freezing issue when loading data sets (#732)
Making background start of multi-node cluster as default
[SNAP-1860] Close the connection if commit/rollback is not done (#746)
Made changes to make sure to commit/rollback the snapshot tx in case of exception. e.g Security
related while trying to iterate over the region.
[SNAP-1656] Security support in snappydata (#731)
Enable LDAP based authentication and authorization in Snappydata cluster.
Support for snapshot transactional insert in column table (#718)
[SNAP-1825][SNAP-1818] DDL routing changes (#742)
Fix for ALTER TABLE ADD column does not work in case of row table when the table is altered
after inserting data and CREATE ASYNCEVENTLISTENER doesn't work with lead node.
Removing old 2.0.x backward compatibility classes.
Fixes the "describe table" from Spark and shows the full schema.
[SNAP-1268] Code changes to start SnappyTableStatsProviderService service only once. (#738)
[SNAP-1838] skip plan cache clear if there is no SparkContext
Fixes for issues found during concurrency testing (#730)
[SNAP-1815] Disallow configuration of Hive metastore using hive.metastore.uris property in
hive-site.xml (#714)
[SNAP-1708] collect-debug-artifacts script won't need both way ssh now. (#723)
[SNAP-1723] When foldable functions are there in the queries and literals are there in their
argument then identify case where Tokenization should be stopped. Added a bunch of such functions
with corresponding relevant argument numbers for that. (#706)
[SNAP-1806] Changed the exception handling in SnappyConnector mode. (#719)
Support for setting scheduler pools using the set command (#700)
[SNAP-671] Added support for DSID to work for column tables (#716)
Added a task context listener to explicitly remove the obtained memory. (#713)
[SNAP-1326] SnappyParser changes to support ALTER TABLE ADD/DROP COLUMN DDLs (#711)
[SNAP-1808] Create cachedbatch tables in user's schema instead of the earlier common schema
SNAPPYSYS_INTERNAL. Changes from Sumedh @sumwale (#712)
[SNAP-1805] Fixed Query Execution statistics are not getting displayed in SQL graph, caused
because function to withNewExecutionId was executed before it was passed as argument (#703)
[SNAP-1777] Increasing default member-timeout for SnappyData (#704)
[SNAP-1610] Removing the code related to split cluster mode (that was disabled for users in 0.9
release) (#696)
[SNAP-1363] Performance degrades because of PoolExhaustedException when run from connector mode.
Increasing max connection pool size since there is an idle timeout in the pool implementations
(default: 120s), so cleanup of unused connections will happen in any case.
[SNAP-1794] Modified code generation of DynamicFoldableExpression such that even the
initMutableState splits into multiple init() functions, code will be generated properly. (#699)
Changes for Apache Spark 2.1.1 merge (#695)
[SNAP-1451] set default startup-recovery-delay to 102s for Snappy tables to avoid interfering
with initial bucket creation.
[SNAP-1722] Test to validate support for long, short, tinyint and byte datatypes for row tables
(#689)
Spark 2.1 Merge (#501)
Fixing NoSuchElementException "None get" in dropTable. Using the global SparkContext directly
instead of getting from active SparkSession (which may not exist) in hive meta-store listener.
[SNAP-1688] CachedDataFrame memory allocation should be accounted with execution memory rather
than storage memory.
[SNAP-1748] Fixed: Without persistence, data loading is unsuccessful with eviction on (#682)
[SNAP-1721] Avoid code generation failure in WorkingWithObject.scala example (#685)
[SNAP-1678] Smart connector should emit info logs that indicate the cluster to which it is
connecting (#676)
[SNAP-1760] Correct null bitset expansion and reduce copying in inserts. (#678) Fixes
ArrayIndexOutOfBounds exception in queries with wide schema having nulls.
Corrected the scaladoc examples in SnappySession. (#672)
Allow for spaces at start of API parser calls
[SNAP-1737] While passing value to GemFireXD, it should be converted from catalyst type to scala
type. (#669)
[SNAP-1735] use single batch count in stats row (#664)
Renamed "-b" option to "-bg" to match convention used in other POSIX commands
[SNAP-1725] Fix start and collect-debug scripts for Mac.
[SNAP-1714] Correcting case-sensitivity handling for API calls (#657)
[SNAP-1792] Snappy Monitoring UI now also displays Member Details View which shows member specific
information, various statistics (like Status, CPU, Memory, Heap & Off-Heap Usages, etc) and
members logs in incremental way.
[SNAP-1890] Snappy Monitoring UI displays new Pulse logo. Also product and its build details are
shown under version in pop up window.
[SNAP-1813] Pulse (Snappy Monitoring UI) users need to provide valid user name and password if
SnappyData cluster is running in secure mode.
Release 0.9
- New Features/Fixed Issues
[SNAP-1286] Thin Client Smart Connector implementation.
[SNAP-1235] Overhaul SnappyUnifiedMemoryManager to work properly for overflow.
[SNAP-1454] Support for Off-Heap in column store.
[SNAP-1413] install_jar does not work for Streaming jobs. Handled classloader in case of
Streaming factory as well.
[SNAP-1424] Add a "shouldStop()" call to EncoderScanExec. The "shouldStop()" check is necessary
because if the target is a RowWriter (e.g. the parent is an EXCHANGE) then the same row gets
reused.
[SNAP-1304] Implementation of Snapshot Isolation in snappydata.
[SNAP-990] Column wise storage in region for better perf instead of full cachedbatch.
[SNAP-1346] Plan caching ignoring constant values.
[SNAP-1323] Support parameterized prepared statements for routed queries. Changes for improved
execution of prepared statement on column table through JDBC route.
JDBC CDC Streaming support. (https://github.com/SnappyDataInc/snappydata/pull/622)
[SNAP-1655] Support for boolean in row table.
[SNAP-1705] Support slash ('/') and special characters in column names.
[SNAP-1698] Snappy Dashboard UI Enhancements
Multi-grid master (https://github.com/SnappyDataInc/snappydata/pull/628)
[SNAP-1545] Redesigned SnappyData Dashboard. Now displays detailed member description, heap and
off-heap usage along with snappy storage and execution splits. It also displays cluster level
aggregate Memory and CPU usage.
[SNAP-1642] Avoid plan caching for queries with subqueries as the underlying changing data does
not reflect in subsequent query.
[SNAP-1221] Unable to restart server nodes in the cluster due to
ConflictingPersistentDataException.
[SNAP-1461] Scalar subquery is only allowed to return a single row, while executing subquery on
partitioned row table. This is fixed by routing any query with more than one table to lead node.
[SNAP-1520] Switched to upstream Spark from snappy-spark-unsafe. Removed explicit
KryoSerializableSerializer registration for UnsafeRow and UTF8String in PooledKryoSerializer and
instead call just the .register() method which will determine the serializer to be used by
reflection.
[SNAP-1615] If a column being aggregated has a NULL value while grouping on a string column, the
grouping row itself produces a new row with Null column. As a fix, check the actual value while
scanning row table for string column to decide whether it is Null or not.
[SNAP-1496] Wide table scan for column tables fails due to 64K limitation of JVM. As a fix, we
now chunk the different parts of the scan code if the number of columns exceeds 30.
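The chunking mentioned in the [SNAP-1496] note above keeps each generated method under the JVM's
64 KB bytecode limit per method. A rough Python illustration follows; chunk_columns is a
hypothetical helper, and using the same value (30) for the chunk size as for the threshold is an
assumption, since the note only states when chunking starts.

```python
def chunk_columns(columns, max_per_chunk=30):
    """Split a column list into groups so each group gets its own generated method."""
    columns = list(columns)
    if len(columns) <= max_per_chunk:
        return [columns]  # narrow tables keep a single scan method
    return [columns[i:i + max_per_chunk]
            for i in range(0, len(columns), max_per_chunk)]
```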
[SNAP-1384] Column Table Inserts can fail if generated code is big. Modified ColumnInsertExec to
handle wide schemas of up to 1012 columns.
[SNAP-892] SnappyData launch script picks localhost as the locator hostname and ignores
conf/locators, when invoked from non-locator host.
[SNAP-1518] sbin/snappy-start-all.sh does not start lead in a large cluster. As a fix, retry if
Hive metastore initialization fails due to datastore being not yet available on servers.
[SNAP-1400] When a server/cluster is restarted, sometimes incorrect results are observed. Added a
check to get buckets from initialized members only.
[SNAP-1344] As streaming jobs are recurring jobs, the earlier mechanism of removing dependent jar
files was broken. Now we maintain a list of jars in the context itself and remove the jars when
the streaming context is stopped.
[SNAP-1494] Dashboard shows an exception stack trace when a server goes down. The exception is
now handled and written to the log file.
[SNAP-1399] Updated column stats for complex type which was causing issue while inserting JSON
data to column table.
[SNAP-1351] After a low memory exception is encountered, the snappy server does not remain stable.
Snappy threads cannot be interrupted.
[SNAP-1481] SQL Tab on UI, Description column now displays actual SQL Query string executed
instead of handler description text.
[SNAP-1442] Registering row table in catalog after its creation.
[SNAP-1210] Fix NullPointerException caused when writing dataframe containing timestamp column to
csv files by registering FastDatePrinter with KryoSerializer.
[SNAP-1420] Removed the property "config.trace"->"substitutions" which was generating unnecessary
logs.
[SNAP-1435] Added support for off-heap in SnappyMemoryManager.
[GITHUB-534][SNAP-1480] Code generation failure for nested GROUP BY. Match against variable name
for dictionary optimization.
[SNAP-1482] Tableau generated query fails with NumberFormatException. Fix parsing of full
engineering format double values.
[SNAP-303] Handle non-store hive tables in meta-data queries.
[SNAP-1459] StackOverflowError running query on Airline (narrow table) with small data set. As a
fix, registering the classes with a multimap parameter which differentiates between the hashjoin
and hashaggregate.
[SNAP-1361] Added support for schema name in udf while querying e.g. select app.udfname(col_name)
from table.
[SNAP-1414] ArrayIndexOutOfBoundsException when creating sample table out of large dataset. As a
fix, passing a reasonable initial size for the encoder term.
Pruning partitions for predicates based on partitioning columns.
(https://github.com/SnappyDataInc/snappydata/pull/543)
[SNAP-1441] Limit query on column table returns fewer rows than expected (JDBC).
[SNAP-1395] ElasticSearch connector gives NullPointerException when used with SnappyData.
Enhancements in SDE
- For an external table, the LogicalRelation was not storing the table identifier, thus
sample table replacement was not happening. Now passing table identifier in the logical relation.
Release 0.8
- Known Issues
[SNAP-1384] Inserting into or querying a table with wide schema may fail with
StackOverflowException due to a limitation of JVM.
- New Features/Fixed Issues
[SNAP-1357] ODBC Driver and Installer. You can now connect to the SnappyData cluster using the
SnappyData ODBC driver and execute SQL queries.
[SNAP-1313] Multiple Language Binding using Thrift Protocol. SnappyData now provides support for
Apache Thrift protocol enabling users to access the cluster from other languages that are not
supported directly by SnappyData.
[SNAP-490] Insert Performance Optimizations - Insert into tables is much more optimized and
performant now. A new insert plan has been introduced which uses code generation
and a new encoding format.
Fixes backward compatibility with Spark 2.0.0 - The 0.8 SnappyData release is based on Spark
2.0.2, and the SnappyData Smart Connector is now backward compatible with the
Spark 2.0.0 and 2.0.1 releases.
[SNAP-1146] Fixes RowBuffer bloating. Data was not being aged into the internal compressed
columnar format, leading to unoptimized storage and hence poor query performance.
[SNAP-1308] Incorrect number of entries displayed on UI if insert was being done from a Spark app
using Smart Connector.
[SNAP-1293] The driver processes of external Spark apps using the Smart Connector were incorrectly
displayed as members of the SnappyData cluster.
[SNAP-1282] The Spark web UI stopped working with SnappyData 0.7. Fixed.
[SNAP-1243] UI incorrectly displaying multiple entries for the same lead node after restarts.
[SNAP-1296] ResultSet obtained from PreparedStatement.executeQuery returned 0 column count
from its metadata. However, SNAP-1311 still needs to be fixed for complete support of the
PreparedStatement JDBC API; it is being worked on.
[SNAP-1291] Queries issued with execution-engine=store hint were getting ignored in some cases
[SNAP-1287] Query execution using indexes on row tables was sometimes throwing ClassCastException
[SNAP-1269] Tables created using the schema of another table but with different column names were
using the column names of the source table itself.
[SNAP-1134] A job throwing exception remained in hung state instead of reporting failure
[SNAP-982] Support for persistent UDFs added. A UDF can now be used even after a restart.
Enhancements in SDE
- Sample selection logic enhanced. It can now select the best-suited sample table even if SQL
functions are used on QCS columns while creating sample tables.
- Poisson multiplicity generator logic for bootstrap is improved. Errors estimated using
bootstrap are now more accurate.
- Improved performance of closed-form and bootstrap error estimations.
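For readers unfamiliar with the Poisson multiplicity approach to bootstrap mentioned above, here
is a minimal sketch of the general technique, not of the SDE implementation. Instead of resampling
rows with replacement, each row is given an independent Poisson(1) weight in every bootstrap
trial, and the spread of the per-trial aggregates serves as the error estimate. The function names
and the use of Knuth's sampling method are assumptions made for the illustration.

```python
# Hypothetical sketch of a Poisson bootstrap; not the SDE implementation.
import math
import random


def poisson1(rng):
    """Draw from Poisson(mean=1) using Knuth's multiplication method."""
    limit, k, p = math.exp(-1.0), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def bootstrap_sums(values, trials, seed=0):
    """For each trial, weight every row by an independent Poisson(1) multiplicity."""
    rng = random.Random(seed)
    sums = [0.0] * trials
    for v in values:
        for t in range(trials):
            sums[t] += poisson1(rng) * v
    return sums


def std_error(estimates):
    """Sample standard deviation of the per-trial estimates."""
    mean = sum(estimates) / len(estimates)
    var = sum((e - mean) ** 2 for e in estimates) / (len(estimates) - 1)
    return math.sqrt(var)
```

Because each row's weights are drawn independently, all trials can be computed in a single pass
over the data, which is what makes this scheme attractive for streaming or distributed aggregates.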
Release 0.7
[SNAP-1260] Miscellaneous plan optimizations.
[SNAP-1251][SNAP-1252] Avoid exchange when join columns are superset of partitioning.
[SNAP-1112] Query hints for executionEngine don't work correctly.
[SNAP-1240] SnappyData monitoring dashboard.
[SNAP-1234] Always skip broadcast join for cases of collocated PR joins.
[SNAP-1229] Fixed Snappy Python APIs broken after Spark 2.0 merge.
[SNAP-1219] Unable to drop persistent column table when a server node is killed abruptly.
[SNAP-1225] Performance improvements for hash joins (and other fixes).
[SNAP-1218] Enable RDD-bucket de-linking for single table and replicated table joins.
[SNAP-1217] Introduce Enable Experimental Feature property.
[SNAP-1213] Using Kryo's (esotericsoftware) ExternalizableSerializer as the default serializer for
Externalizable classes rather than FieldSerializer.
[AQP-259] Fixing the issue where the size of the Map was not being assigned early enough,
resulting in flush increasing the reservoir size in an unbounded manner.
[SNAP-1209] Updated LocalJoin to cover colocated join cases as well.
[SNAP-1205] Avoid exchange when the table is partitioned with the join key.
[SNAP-1193] Optimized Collect aggregate plan to avoid last step exchange.
[SNAP-1191] Basic plan caching (without constant tokenization). Add plan caching and reuse of
SparkPlan, RDD and PlanInfo.
[SNAP-1194] Optimization for single dictionary column GROUP BY and JOIN
[SNAP-1136] Pooled version of Kryo serializer which works for closures.
New PooledKryoSerializer that pools Kryo objects (otherwise performance is poor, because a new
instance created for every call needs to register and walk a large number of classes).
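The pooling idea can be sketched generically as follows. This is not the PooledKryoSerializer
code; ExpensiveSerializer is a hypothetical stand-in for an object that is costly to construct,
and SerializerPool is a hypothetical bounded pool that reuses idle instances.

```python
# Hypothetical sketch of object pooling; not the PooledKryoSerializer code.
import queue


class ExpensiveSerializer:
    """Stand-in for a serializer whose construction is costly (e.g. class registration)."""
    instances_created = 0

    def __init__(self):
        ExpensiveSerializer.instances_created += 1

    def serialize(self, obj):
        return repr(obj).encode("utf-8")


class SerializerPool:
    def __init__(self, max_size=4):
        self._idle = queue.LifoQueue(maxsize=max_size)

    def borrow(self):
        try:
            return self._idle.get_nowait()   # reuse an idle instance if one exists
        except queue.Empty:
            return ExpensiveSerializer()     # otherwise pay the construction cost

    def release(self, serializer):
        try:
            self._idle.put_nowait(serializer)
        except queue.Full:
            pass  # pool is full; let the instance be garbage collected

    def serialize(self, obj):
        s = self.borrow()
        try:
            return s.serialize(obj)
        finally:
            self.release(s)
```

In single-threaded use, many serialize calls share one instance; under concurrency the pool grows
only up to the number of simultaneous borrowers, bounded by max_size for idle instances.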
[SNAP-1067] Optimized GROUP BY (HashAggregateExec) and HASH JOIN. Optimized hash table
for GROUP BY (HashAggregateExec) and for LocalJoin.
[SNAP-1087] Maintain stats (which include lower bound, upper bound, null count, etc.)
for every column, and use the upper and lower bound values of columns to
filter out the cached batches. This improves performance for queries that
filter extensively.
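A minimal sketch of how per-batch bounds allow batches to be skipped is shown below. It is an
illustration under stated assumptions, not SnappyData's code: Batch, make_batch and
scan_greater_than are hypothetical names, and only a single `col > threshold` predicate on an
integer column is handled.

```python
# Hypothetical sketch of min/max batch skipping; not SnappyData code.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Batch:
    values: List[Optional[int]]
    lower: int       # smallest non-null value in the batch
    upper: int       # largest non-null value in the batch
    null_count: int


def make_batch(values):
    """Build a batch and its stats; assumes at least one non-null value."""
    non_null = [v for v in values if v is not None]
    return Batch(values, min(non_null), max(non_null), len(values) - len(non_null))


def scan_greater_than(batches, threshold):
    """Evaluate `col > threshold`, skipping batches whose upper bound rules them out."""
    scanned, result = 0, []
    for b in batches:
        if b.upper <= threshold:
            continue  # no value in this batch can satisfy the predicate
        scanned += 1
        result.extend(v for v in b.values if v is not None and v > threshold)
    return result, scanned
```

The returned scanned count shows how many batches were actually read; the more selective the
filter relative to the batch bounds, the more batches are skipped.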
[SNAP-1084] Cache and return CatalogTable instead of going to hive
[SNAP-1182] Added map/flatMap/filter/glom/mapPartition/transform APIs to SchemaDStream
[SNAP-1180] Use ConfigEntry mechanism for SnappyData properties. Added SQLConfigEntry and
convenience methods.
[SNAP-999] Changes to remove the install jar and instead use SparkJobServer only.
Apache Spark 2.0.2 merge.
[SNAP-730] Add a rule to replace column tables with indexes when the join column is indexed.
Removing the kafka-0.10 dependencies and shipping only kafka-0.8
[SNAP-1075] Added a service to publish store table size that is used for query plan generation.
These stats are also published on SnappyData Dashboard.
[SNAP-1060] [SNAP-1141] [SNAP-1115] Fixes for Streaming related issues after Apache Spark 2.0
merge.
[SNAP-1152] Fixing NPE in aggregation. Handling null entry in ObjectHashMapAccessor during code
generation.
Avoid pooling of stream Input and Output objects in PooledKryoSerializer to try and fix occasional
failures in TPCHDUnitTest.
[SNAP-1172] Changes to render StringType as VARCHAR for tables created via API.
[SNAP-1185] Changing all internal.Logging references to public one.
[SNAP-1188] Set batch uuid to previous record if the current batchuuid is null.
[SNAP-69] Fix SparkJobServer rootDir to point to current working directory instead of /tmp.
Redirecting rootdir from /tmp to "-dir" startup parameter via gemfirexd.system.home variable
set in the launcher.
Fixing failure for optimized=T case in TPCETrade
[SNAP-1147] Properly handle dropping of collocated table.
[SNAP-1087] Removing StatsPredicateCompiler and closure; instead generate embedded predicate code
in a new function in the same context as for ColumnTableScan code.
[SNAP-977] Allow user to specify configuration on command-line while submitting a job.
[SNAP-1021] Added an external catalog to SnappyCatalog to ensure the Catalog API of Spark will
also work fine. This makes Snappy catalog cleaner and removes redundant code.
[SNAP-1096] Add Lead attribute in Member MBean.
[SNAP-1066] Modified existing tests to use inferSchema instead of strings, and to use nullValue
properly.
Fixed readLongDecimal for ColumnEncoding adapters.
[SNAP-1199] Making external table visible with SnappyData Connector.
[SNAP-1083] Fixing multiple issues in RDD de-linking.
[SNAP-1190] Reduce per-partition task overhead.
Release 0.6
[SNAP-735] Supporting VARCHAR with size and processing STRING as VARCHAR(32762), by default.
Provided query hint (--+ columnsAsClob(*)) to force processing STRING as CLOB. Changes to render
CHAR as CHAR and VARCHAR as VARCHAR. Added a system property to stop treating STRING as max size
VARCHAR but as CLOB.
[SNAP-1049] IllegalArgumentException: requirement failed: partitions(1).partition == 5, but it
should equal 1
[SNAP-1050] Query execution from JDBC waits indefinitely for external table if column name in
query is wrong
[SNAP-1036] Optimize access to row store using raw region iterators
[SNAP-1000] Perf improvement for localjoin through code generation
[SNAP-1034] Optimized generated code iteration for Column tables
[SNAP-1047] Fix column table row count in UI
[SNAP-1044] Support for describe table and show table using snappycontext
[SNAP-846] Ensuring that Spark uncaught exceptions are handled on the Snappy side and do not cause
a system.exit
[SNAP-1025] Stream tables return duplicate rows
[SNAP-959] create table as select not working as expected if row table is source table
[SNAP-845] Atomicity of DDLs across catalogs
[SNAP-981] Support Snappy with multiple Hadoop versions
[SNAP-979] Correct table size and count shown on the Snappy UI tab
[SNAP-936] Automatic selection of execution engine based on query type. Query hint also provided
to select a particular engine for execution
[SNAP-653] Cleanup relation artifacts when it is altered/dropped/... from external cluster
[SNAP-654] If the Lead is running and an application runs a program that points to the Snappy
cluster as the Master, then, the client program perpetually hangs.
[SNAP-174] No ssh required for starting cluster through scripts if only localhost is being used
[SNAP-910] DELETE / UPDATE FROM COLUMN TABLE throws proper exception now
[SNAP-293] Single install/replace jar utility. Users can install a jar using the install jar
utility, and it will be available to all executors, the store, and the driver node. A jar uploaded
via the job server follows the same norm.
[SNAP-824] Support for CUBE/ROLLUP/GROUPING SETS through sql. Support for window clauses and
partition/distribute by
SPARK 2.0 merge
[SNAP-861] Zeppelin interpreter for SnappyData
[SNAP-947] Unable to restart cluster with 0.5 version with columnar persistent tables
[SNAP-961] Fix passing of some DDL extension clauses like OFFHEAP PERSISTENT etc.
[SNAP-734] Support for EXISTS from sql
[SNAP-835] Drop table from default schema with fully qualified name throws "Table Not Found" Error
[SNAP-784] Fully qualified table name access fails with "Table Not Found" Error
[SNAP-864] Script to launch SnappyData cluster on Amazon Web Services EC2 instances.
[AQP-77] Exception: STRATIFIED_SAMPLER_WEIGHTAGE#411L missing
[AQP-94] Class cast exception if aggregate is on string column
[AQP-107] scala.MatchError when using the reserved word sample_ in the query
[AQP-143] Unexpected error for query on empty table
[AQP-154] Actual sample count varies with varying number of columns in QCS.
[AQP-177] Unable to drop the sample table
[AQP-190] Relative Error estimates are wildly OFF
[AQP-199] Use of alias in FROM clause results in Sample not being selected
[AQP-203] COUNT(DISTINCT) queries with a 'with error' clause fail with No plan for ErrorDefaults
[AQP-204] Inconsistent results each time the same bootstrap query is executed.
[AQP-205] Bug in abortSegment implementation of stratum cache/concurrent segment hashmap causes
count to be incorrect
[AQP-206] Exception while using error_functions in HAVING clause
[AQP-207] Join query fails with error while evaluating an expression
[AQP-210] Mathematical expression involving error estimates not working
[AQP-212] HAC behavior 'local_omit' does not work as expected.
[AQP-213] Exception when using an error function in HAVING clause with HAC behavior
'run_on_full_table' and 'partial_run_on_base_table'
[AQP-214] Need support for functions in sample creation
[AQP-216] Cannot use float datatype for sample created on row table
Release 0.5
Rowstore quickstarts are now packaged into the SnappyData distribution.
[AQP] Optimizations of bootstrap for sort based aggregate.
[AQP] Minimize the query plan size for bootstrap.
[AQP] Optimized the Declarative aggregate function.
[SNAP-858] Added documentation for Python APIs.
[SNAP-852] Added new fields on the Snappy Store tab in Spark UI.
[SNAP-730] Added index creation and colocated joins