hive-benchmarks

Benchmarking queries for Apache Hive.

Setup

This repo was prepared for benchmarks of SS-DB, TPC-H and TPC-DS running in the following environment.

Hadoop version: Hadoop 1.2.1
Hive version: Hive 0.13-SNAPSHOT (Nov. 28, 2013)
Cluster setup:
- An 11-node (1 master + 10 slaves) EC2 cluster in us-east-1d
- Instance type: m1.xlarge
- OS Image: ami-a73264ce (Ubuntu Server 12.04.3 LTS 64-bit)
- OS kernel image version: the result of cat /proc/version is Linux version 3.2.0-56-virtual (buildd@roseapple) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #86-Ubuntu SMP Wed Oct 23 09:43:22 UTC 2013.

Notes

Data types

Right now, int is used as the type of identifier columns. If the scale factor is very large, identifier values can exceed the range of int, and bigint is needed instead.
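
As a rough illustration of the note above, the following Python sketch picks an identifier type from the largest identifier value a data set is expected to contain. It relies only on the fact that a Hive int is a 4-byte signed integer and a bigint is an 8-byte signed integer. The function name `identifier_type` and its parameter are hypothetical and are not part of this repo.

```python
# Hypothetical helper, not part of this repo: choose the Hive type for an
# identifier column based on the largest value it must hold.

INT_MAX = 2**31 - 1      # largest value of a 4-byte signed Hive int
BIGINT_MAX = 2**63 - 1   # largest value of an 8-byte signed Hive bigint


def identifier_type(max_identifier: int) -> str:
    """Return "int" if max_identifier fits in a Hive int, else "bigint"."""
    if max_identifier < 0:
        raise ValueError("identifier values are expected to be non-negative")
    if max_identifier <= INT_MAX:
        return "int"
    if max_identifier <= BIGINT_MAX:
        return "bigint"
    raise ValueError("value does not fit in a Hive bigint")


if __name__ == "__main__":
    # The largest identifier depends on the benchmark and the scale factor;
    # the values below are arbitrary examples.
    for value in (1_000_000, 3_000_000_000):
        print(value, identifier_type(value))
```

The largest identifier for a given benchmark and scale factor has to come from that benchmark's data generator; the sketch does not try to derive it.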

Because we may need to compare the current version of Hive with an older version (e.g. 0.10.0), columns have to be created with data types that the older version supports. Here are the mappings:

decimal -> float
char -> string
varchar -> string
date -> string
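
The mappings above can be sketched as a small Python helper that rewrites column types before a table definition is generated. This is only an illustration under stated assumptions: the names `LEGACY_TYPE`, `legacy_type` and `legacy_columns` are hypothetical, the column names in the usage example are made up, and the scripts in this repo may apply the mappings differently (for instance, by hand in the DDL files).

```python
import re

# Mappings listed above: newer Hive types -> types an older Hive also supports.
LEGACY_TYPE = {
    "decimal": "float",
    "char": "string",
    "varchar": "string",
    "date": "string",
}

# Matches a simple type name with optional parameters, e.g. "decimal(15,2)".
_TYPE_RE = re.compile(r"^\s*([A-Za-z]+)\s*(\(.*\))?\s*$")


def legacy_type(hive_type: str) -> str:
    """Map one column type; types without a mapping are returned unchanged."""
    match = _TYPE_RE.match(hive_type)
    if match is None:
        return hive_type  # e.g. complex types such as array<int>
    base = match.group(1).lower()
    # Parameters such as (15,2) or (25) are dropped for mapped types.
    return LEGACY_TYPE.get(base, hive_type)


def legacy_columns(columns):
    """Apply legacy_type to a list of (column name, column type) pairs."""
    return [(name, legacy_type(col_type)) for name, col_type in columns]


if __name__ == "__main__":
    # Hypothetical columns, for illustration only.
    columns = [
        ("id", "int"),
        ("price", "decimal(15,2)"),
        ("flag", "char(1)"),
        ("comment", "varchar(44)"),
        ("ship_date", "date"),
    ]
    for name, col_type in legacy_columns(columns):
        print(name, col_type)
```

Note that mapping decimal to float gives up exact decimal arithmetic, so aggregates over such columns may differ slightly from results computed with decimal.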
