Skip to content

HBase AggregationClient variant that incorporates the ability to do intervals based timeseries aggregations.

License

Notifications You must be signed in to change notification settings

juwi/HBase-TAggregator

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

HBase TAggregator

HBase Aggregator implementation for timeseries based aggregations.

It is meant to allow for much faster aggregations when time intervals are at play. For example, when aggregating a day's worth of data down to 15min averages, you'd need 96 queries doing 96 scans using the standard aggregate implementation. TAggregator will do the same thing requiring just one query using a single scan, producing a Map of 96 TimeStamp-Value assignments.

Table of Contents

Supported Features

  • Max
  • Min
  • Sum
  • Avg
  • Will run with 1.1.1, for compatibility with 0.98, see #2ebb163

TODO

  • Fix pom (Some minor issues remaining, like protocol buffers not getting built in time for packaging, but nothing serious anymore)
  • Add more test cases for limits provided via scan
  • Fix Javadocs
  • Currently, when running with a scanner provided timerange, the result will be different from when running with a directly supplied range. The reason for this is, that when providing a range à la 10/10/2014 00:00-02:00 the scanner will cut off at 02:00, whereas the directly supplied range will also give the result for the range starting at 02:00.

Planned Features

  • Weighted avg
  • Some sort of diff

Usage

int interval = 900;
int time_min = (int) ((new GregorianCalendar(2014, 10, 10, 0, 0, 0).getTime().getTime()) / 1000);
int time_max = (int) ((new GregorianCalendar(2014, 10, 10, 1, 59, 59).getTime().getTime()) / 1000);
String KEY_FILTER_PATTERN = "00000001111"; // a 4-byte-timestamp (int) is assumed; Mask it using 1s 


Scan scan = new Scan();
scan.addFamily(FAMILY);
TimeseriesAggregationClient tsac = new TimeseriesAggregationClient(conf, interval, time_min, time_max, KEY_FILTER_PATTERN);

final ColumnInterpreter<Long, Long, EmptyMsg, LongMsg, LongMsg> ci =
        new LongColumnInterpreter();
ConcurrentSkipListMap<Long, Long> maximum = tsac.max(TEST_TABLE, ci, scan);

Note: For the KEY_FILTER_PATTERN it is assumed, that somewhere in your Key, there is an Integer timestamp (4 bytes; in seconds). The pattern masks the position of this integer in the key using 1s. Naturally, it is also assumed, that Keys handled during this operation are fixed length.

Alternatively, you can provide the total time range (time_min,time_max) via the scan object. In this case, you do not provide it to the Coprocessor, itself:

int interval = 900;
int time_min = (int) ((new GregorianCalendar(2014, 10, 10, 0, 0, 0).getTime().getTime()) / 1000);
int time_max = (int) ((new GregorianCalendar(2014, 10, 10, 1, 59, 59).getTime().getTime()) / 1000);


Scan scan = new Scan();
scan.addFamily(FAMILY);
scan.setTimeRange(time_min, time_max)
TimeseriesAggregationClient tsac = new TimeseriesAggregationClient(conf, interval);

final ColumnInterpreter<Long, Long, EmptyMsg, LongMsg, LongMsg> ci =
        new LongColumnInterpreter();
ConcurrentSkipListMap<Long, Long> maximum = tsac.max(TEST_TABLE, ci, scan);

For Maven users

You can use this repo for maven, also. Kind of the poor many maven repo, it resides in the mvn-repo branch:

<repositories>
        <repository>
		<id>hbase-taggregator-mvn-repo</id>
		<url>https://raw.github.com/juwi/HBase-Taggregator/mvn-repo/</url>
		<snapshots>
			<enabled>true</enabled>
			<updatePolicy>always</updatePolicy>
		</snapshots>
	</repository>
</repositories>

Licensing

Code and documentation Copyright Julian Wissmann, licensed under the Apache License version 2.

The following sources have been pulled from HBase, directly and thus are licensed to the Apache Software Foundation under Apache License Version 2:

  • src/main/protobuf/Cell.proto
  • src/main/protobuf/Client.proto
  • src/main/protobuf/Comparator.proto
  • src/main/protobuf/Filter.proto
  • src/main/protobuf/HBase.proto

About

HBase AggregationClient variant that incorporates the ability to do intervals based timeseries aggregations.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published