Tester edited this page Nov 22, 2015 · 1 revision

Elasticsearch

About Elasticsearch

Elasticsearch is a highly scalable open-source full-text search and analytics engine. It allows you to store, search, and analyze large volumes of data quickly and in near real time.

Basic concepts

Cluster

A cluster is a collection of one or more nodes (servers) that together hold all of your data and provide federated indexing and search capabilities across all nodes.

Node

A node is a single server that is part of your cluster, stores your data, and participates in the cluster’s indexing and search capabilities.

Index

An index is a collection of documents that have somewhat similar characteristics.

Type

Within an index, you can define one or more types. A type is a logical category/partition of your index whose semantics are completely up to you.

Document

A document is a basic unit of information that can be indexed. A document is expressed in JSON (JavaScript Object Notation), which is a ubiquitous internet data interchange format.

Shards & Replicas

Elasticsearch provides the ability to subdivide your index into multiple pieces called shards. When you create an index, you can simply define the number of shards that you want. Each shard is in itself a fully-functional and independent "index" that can be hosted on any node in the cluster.

To protect against the failure of a node or shard, Elasticsearch also allows you to make one or more copies of your index’s shards, called replica shards, or replicas for short.
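
As a rough illustration of how shards and replicas add up, the sketch below computes the total number of shards an index holds and prints a settings body that could be sent when creating the index. The helper names `total_shards` and `index_settings` are made up for this example; `number_of_shards` and `number_of_replicas` are the index settings that carry these two values.

```shell
# Hypothetical helpers for illustration; not part of Elasticsearch.

# total_shards PRIMARIES REPLICAS
# Every primary shard has REPLICAS copies, so the index holds
# PRIMARIES * (1 + REPLICAS) shards in total.
total_shards() {
  echo $(( $1 * (1 + $2) ))
}

# index_settings PRIMARIES REPLICAS
# Prints a JSON body carrying the shard and replica counts.
index_settings() {
  printf '{"settings":{"number_of_shards":%d,"number_of_replicas":%d}}\n' "$1" "$2"
}

total_shards 5 1      # prints 10
index_settings 5 1
```

The printed body could then be passed to an index creation request, for example `curl -XPUT 'localhost:9200/my_index?pretty' -d "$(index_settings 5 1)"`, where `my_index` is a placeholder name.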

Install

  • Install the package for your distribution
  • Start a node, which forms a single-node cluster:
# elasticsearch
......................
[2015-04-03 14:33:37,572][INFO ][node] [The Entity] started

We can see that our node, named "The Entity" (a randomly chosen Marvel character name), has started.

Tip: to override either the cluster or node name, run the command this way:

# elasticsearch --cluster.name my_cluster_name --node.name my_node_name

Note:

  • Elasticsearch works with java-7-openjdk and java-8-openjdk.
  • Elasticsearch listens on port 9200 by default.

REST API

Elasticsearch provides a REST API that you can use to:

  • Check your cluster, node, and index health, status, and statistics
  • Administer your cluster, node, and index data and metadata
  • Perform CRUD (Create, Read, Update, and Delete) and search operations against your indexes
  • Execute advanced search operations such as paging, sorting, filtering, scripting, faceting, aggregations, and many others
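
Document-level calls on this page share one URL shape: host and port, then index, type, and document ID. Here is a small sketch of that pattern, assuming the default port 9200 mentioned in the install notes. `ES_HOST` and `doc_url` are names invented for this example, and `agent`, `external`, and `1` are sample values.

```shell
# Assumed default; set ES_HOST beforehand to target another node.
ES_HOST="${ES_HOST:-localhost:9200}"

# doc_url INDEX TYPE ID
# Builds the URL of a single document: <host>/<index>/<type>/<id>.
doc_url() {
  printf '%s/%s/%s/%s\n' "$ES_HOST" "$1" "$2" "$3"
}

doc_url agent external 1
```

The result can be handed straight to curl, for instance `curl -XGET "$(doc_url agent external 1)?pretty"`.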

Cluster management

Basic commands

$ curl 'localhost:9200/_cat/health?v'
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks 
1428073032 16:57:12  elasticsearch green           1         1      0   0    0    0        0             0 

The cluster, named "elasticsearch", is up with a green status, 1 node, and 0 shards.

$ curl 'localhost:9200/_cat/nodes?v'
host      ip        heap.percent ram.percent load node.role master name      
hortensia 127.0.0.1            3          33 0.29 d         *      The Entity

One node, The Entity, is running.

$ curl 'localhost:9200/_cat/indices?v'

The command above returns the list of all indexes.
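
The `_cat` output is a whitespace-separated table, so a script can pick out a single column by its header name. The sketch below does this with awk for the health output shown earlier. `cat_column` is a hypothetical helper name, and the approach assumes that no value contains a space (it would not work for the `name` column of `_cat/nodes`, where "The Entity" contains one).

```shell
# cat_column NAME
# Reads `_cat/...?v` output on stdin, finds the column whose header
# equals NAME, and prints that column's value from the first data row.
cat_column() {
  awk -v name="$1" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) col = i }
    NR == 2 && col { print $col }
  '
}

# Sample input copied from the health check above.
cat_column status <<'EOF'
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks
1428073032 16:57:12  elasticsearch green           1         1      0   0    0    0        0             0
EOF
```

Against a running node, the same helper could be fed from curl: `curl -s 'localhost:9200/_cat/health?v' | cat_column status`.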

Create an index

Here we create the agent index:

$ curl -XPUT 'localhost:9200/agent?pretty'
{
  "acknowledged" : true
}
$ curl 'localhost:9200/_cat/indices?v'
health status index    pri rep docs.count docs.deleted store.size pri.store.size 
yellow open   agent      5   1          0            0       575b           575b 

Our new agent index has been created and has the yellow status. The reason is that Elasticsearch creates one replica for this index by default. Because a replica is never allocated on the same node as its primary shard, and we only have one node running at the moment, that replica cannot be allocated until another node joins the cluster. Once the replica is allocated on a second node, the health status for this index will turn green.
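
The yellow status can be worked out by hand. The sketch below is a simplified model, not the actual allocator: it assumes the only rule is that a replica never shares a node with its primary, and it ignores everything else that affects allocation. `unassigned_replicas` is a hypothetical helper name.

```shell
# unassigned_replicas PRIMARIES REPLICAS NODES
# Simplified model: each primary can place at most NODES - 1 replica
# copies, because a replica never lives on the same node as its primary.
unassigned_replicas() {
  placeable=$(( $3 - 1 ))
  if [ "$placeable" -lt 0 ]; then placeable=0; fi
  if [ "$placeable" -gt "$2" ]; then placeable=$2; fi
  echo $(( $1 * ($2 - placeable) ))
}

unassigned_replicas 5 1 1   # prints 5: no replica can be placed, status is yellow
unassigned_replicas 5 1 2   # prints 0: every replica has a node, status is green
```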

Index a document

Let's create a JSON document: { "name": "agreenmamba" }

$ curl -XPUT 'localhost:9200/agent/external/1?pretty' -d '
{
  "name": "agreenmamba"
}'

Response:

{
  "_index" : "agent",
  "_type" : "external",
  "_id" : "1",
  "_version" : 1,
  "created" : true
}

Retrieve the document

$ curl -XGET 'localhost:9200/agent/external/1?pretty'
{
  "_index" : "agent",
  "_type" : "external",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source":{
    "name": "agreenmamba"
  }
}

Delete the index

$ curl -XDELETE 'localhost:9200/agent?pretty'