Elasticsearch is an open-source, distributed search and analytics engine. It is written in Java, was first released in 2010, and is built on top of Apache Lucene. It is a document-oriented engine that stores, searches, and analyzes data in large quantities, and it is optimized so that searches over huge data sets happen in near real-time.
It works on both structured and unstructured data: all we have to do is index a document and it becomes available for search. Data is stored in Elasticsearch as JSON documents. Traditional databases often take a long time to run a search query, whereas Elasticsearch retrieves results quickly and also provides full-text search.
The features of Elasticsearch are exposed as REST APIs.
Elasticsearch can be used to build a search engine for any type of application, with options such as auto-suggestions, pagination of results, and more.
Distributed Storage
When documents are added, their index is divided into shards, and each shard can have any number of replicas.
Scalability
It runs equally well on a single machine or on a cluster of many nodes.
Improved performance
Since indices are distributed across shards, the results of a query are retrieved very quickly.
Near-real time
Search happens in near real-time, i.e., a document becomes searchable almost as soon as it is indexed.
Document oriented
Storage and search are document-oriented. When documents are stored, all their fields are indexed by default.
First, make sure that Java 8 or higher is installed on your machine. Then go to the Download Elasticsearch page, choose an installer file based on your operating system, and download it.
Windows
Download the elasticsearch-7.7.1.msi installer file. Run the installer file and follow the prompts to finish the installation. The environment path is set automatically during installation.
Linux
Download the elasticsearch-7.7.1-linux-x86_64.tar.gz tar file. Extract the tar file using the below command,
$tar -xzf elasticsearch-7.7.1-linux-x86_64.tar.gz
To run Elasticsearch, go to the installation location and run the elasticsearch script under bin using the below commands,
$ cd installationpath/bin
$ ./elasticsearch
Here, installationpath is the path of your installation folder.
Mac
Download the elasticsearch-7.7.1-darwin-x86_64.tar.gz tar file and extract it using the below command,
$ tar -xvf elasticsearch-7.7.1-darwin-x86_64.tar.gz
Add the below lines to your .bash_profile file to set the environment variables,
export ES_HOME=~/installationpath
export PATH=$ES_HOME/bin:$PATH
To run Elasticsearch, execute the below command,
$ elasticsearch
Once the installation is done, Elasticsearch runs on port 9200 by default. To check whether it is up and running, open a browser and go to http://localhost:9200/. You should get a JSON response like the one below,
{
"name" : "",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "7_soBVcOQuaShrZur0kjzw",
"version" : {
"number" : "7.7.1",
"build_flavor" : "unknown",
"build_type" : "unknown",
"build_hash" : "ad56dce891c901a492bb1ee393f12dfff473a423",
"build_date" : "2020-05-28T16:30:01.040088Z",
"build_snapshot" : false,
"lucene_version" : "8.5.1",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}
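If you prefer the terminal, the same check can be done with curl (assuming Elasticsearch is running locally on the default port):
$ curl http://localhost:9200/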
Kibana is a front-end application for Elasticsearch. We can run all our queries using Kibana.
Go to the Download Kibana page and download the archive that suits your OS. Extract it to your desired installation path.
If you are using Windows, run the command - bin\kibana.bat
If you are using Linux or Mac, open config/kibana.yml and point Kibana to your Elasticsearch instance through the elasticsearch.hosts field. Then open a terminal and run bin/kibana
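For reference, a minimal config/kibana.yml entry pointing Kibana at a local Elasticsearch instance looks like this (the URL below is the default; adjust it if your cluster runs elsewhere):
elasticsearch.hosts: ["http://localhost:9200"]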
Once Kibana is up, go to http://localhost:5601 to access Elasticsearch through Kibana.
We can create, read, update, and delete data in Elasticsearch using REST APIs. Here are some conventions that apply to these APIs,
Multiple Indices
We can search for documents in several indices with a single search query by passing a comma-separated list of index names, a wildcard pattern, or _all in the request path, as shown below.
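A few illustrative requests (the departments index and the emp* pattern are hypothetical; employees is the index created later in this article):
# comma-separated list of indices
GET /employees,departments/_search
# wildcard pattern
GET /emp*/_search
# all indices
GET /_all/_search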
Date Math Support in Index Names
Date math in index names helps narrow a search down to the time-series indices that cover the range you care about, instead of querying everything. To use it, the index name is wrapped in a date math expression with the syntax below,
<index_name{date_math_expr{date_format|time_zone}}>
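As a minimal sketch, assuming daily indices with a hypothetical logs- prefix, <logs-{now/d}> resolves to today's index (for example logs-2020.06.03 with the default yyyy.MM.dd format). The characters <, >, {, } and / must be percent-encoded when the name is used in a request path:
# searches the index that {now/d} resolves to today
GET /%3Clogs-%7Bnow%2Fd%7D%3E/_search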
Cron Expressions
We can schedule triggers in Elasticsearch using cron expressions. The syntax is,
<seconds> <minutes> <hours> <day_of_month> <month> <day_of_week> [year]
We can schedule triggers daily, monthly, yearly, or for a range of days. For example, the expression 0 15 11 ? 6 SAT schedules a trigger to run at 11:15 AM UTC every Saturday in the month of June.
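Read against that field order, the example breaks down as follows:
0    seconds
15   minutes
11   hours
?    day of month (no specific day)
6    month (June)
SAT  day of week (Saturday)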
Common Options
Elasticsearch provides a number of common options that customize query results; we use them by appending them to the end of an API call. Some of these options are pretty results, human-readable output, distance units, response filtering, etc.
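For example, the options below can be appended to the search request used later in this article: pretty formats the JSON, human renders values such as sizes and durations in readable units, and filter_path trims the response to the listed fields.
GET /employees/_search?pretty&human=true&filter_path=hits.hits._source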
URL based access control
To provide more secure access to your indices, we can prevent users from overriding the index given in the URL by naming a different index in the request body.
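A minimal sketch of how this is switched on, assuming you manage elasticsearch.yml yourself: setting rest.action.multi.allow_explicit_index to false rejects bulk, mget, and msearch requests that specify an explicit index in their body, so access is controlled purely by the index in the URL.
# elasticsearch.yml
rest.action.multi.allow_explicit_index: false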
Open Kibana and go to Dev Tools in the left-side menu to access the console.
Execute the following command to create an index,
PUT employees
You will get the below response on successful creation of the index,
{
"acknowledged" : true,
"shards_acknowledged" : true,
"index" : "employees"
}
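The index above is created with default settings. As a sketch, the number of primary shards and replicas can also be set explicitly at creation time (the employees_v2 name and the values below are only illustrative):
PUT employees_v2
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}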
For each document that we add to an index, we specify an id for it. To add a document to the index we just created, execute the below command,
POST employees/_doc/1
{
"name" : "John Doe",
"Employee ID" : "1234",
"Department" : "Development",
"Location" : "Andhra Pradesh"
}
The response would be,
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}
Add some more documents using the syntax specified above. Remember to give a unique id for each of the documents added to the index.
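Instead of adding documents one at a time, the _bulk API can index several in one request. A minimal sketch, using two of the employee records that appear in the search results later in this article (the Bulk API expects an action line followed by a document line, each on its own line):
POST employees/_bulk
{ "index" : { "_id" : "4" } }
{ "name" : "Jessica Davis", "Employee ID" : "1236", "Department" : "Operations", "Location" : "Andhra Pradesh" }
{ "index" : { "_id" : "5" } }
{ "name" : "Justin Foley", "Employee ID" : "1237", "Department" : "Operations", "Location" : "Andhra Pradesh" }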
Search API
We can search through the documents that were added to an index. Continuing the above example, if we want to search for employees located in Andhra Pradesh, the request would look like below,
POST /employees/_search
{
"query":{
"query_string":{
"query":"Andhra Pradesh"
}
}
}
This request returns all documents matching the query, along with their details. The response would be,
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : 1.6471939,
"hits" : [
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.6471939,
"_source" : {
"name" : "John Doe",
"Employee ID" : "1234",
"Department" : "Development",
"Location" : "Andhra Pradesh"
}
},
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.6471939,
"_source" : {
"name" : "Jessica Davis",
"Employee ID" : "1236",
"Department" : "Operations",
"Location" : "Andhra Pradesh"
}
},
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "5",
"_score" : 1.6471939,
"_source" : {
"name" : "Justin Foley",
"Employee ID" : "1237",
"Department" : "Operations",
"Location" : "Andhra Pradesh"
}
},
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "10",
"_score" : 1.6471939,
"_source" : {
"name" : "Tyler Down",
"Employee ID" : "1243",
"Department" : "Development",
"Location" : "Andhra Pradesh"
}
}
]
}
}
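The query_string query above searches across all fields. To restrict the search to a single field, a match query can be used instead; a minimal sketch,
POST /employees/_search
{
  "query": {
    "match": {
      "Location": "Andhra Pradesh"
    }
  }
}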
Get API
If we want to get the details of a particular document, we can fetch it by its id through the Get API, like below,
GET employees/_doc/1
Response :
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"_seq_no" : 0,
"_primary_term" : 1,
"found" : true,
"_source" : {
"name" : "John Doe",
"Employee ID" : "1234",
"Department" : "Development",
"Location" : "Andhra Pradesh"
}
}
Delete API
For deleting a document in an index, the API will be,
DELETE employees/_doc/10
Response :
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "10",
"_version" : 2,
"result" : "deleted",
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 11,
"_primary_term" : 1
}
Update API
To update fields in a document, we specify the document id along with the fields we want to update. For example, to change the Department of document 7 through the _update API,
POST employees/_update/7
{
"doc" : {
"Department" : "Operations"
}
}
Response :
{
"_index" : "employees",
"_type" : "_doc",
"_id" : "7",
"_version" : 3,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 12,
"_primary_term" : 1
}
Query DSL (Domain Specific Language) is used to define JSON-based queries in Elasticsearch. It has two types of clauses: leaf query clauses, such as match, term, and range, which look for a value in a particular field, and compound query clauses, such as bool, which wrap other leaf or compound clauses.
Query DSL supports both exact queries, such as term queries and filters, and approximate queries, such as fuzzy and regexp queries. We can even use them in combination by writing a compound query, as in the sketch below.
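A sketch of a compound query against the employees index used earlier: the bool query combines an exact term clause with an approximate fuzzy clause (the Department.keyword sub-field assumes the default dynamic mapping, and the misspelled value is deliberate to show fuzzy matching):
POST /employees/_search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "Department.keyword": "Operations" } },
        { "fuzzy": { "name": { "value": "jesica" } } }
      ]
    }
  }
}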
Mapping
Every index has a mapping: metadata that defines the fields its documents contain and how those fields are indexed. When we add a new document with fields the index has not seen before, the mapping is updated automatically (dynamic mapping). We can also define the mapping explicitly.
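We can inspect the mapping that dynamic mapping produced for the employees index, or define a mapping explicitly when creating a new index; the employees_strict name and the field types below are just an illustration:
GET employees/_mapping

PUT employees_strict
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "Employee ID": { "type": "keyword" },
      "Department": { "type": "keyword" },
      "Location": { "type": "text" }
    }
  }
}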
Text Analysis
Text analysis is the process of converting unstructured text into a structured format, for example breaking a sentence into individual terms so that it can be searched efficiently. Elasticsearch performs text analysis when indexing or searching fields of type text.
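The _analyze API shows how Elasticsearch breaks a piece of text into terms; a minimal sketch using the built-in standard analyzer,
POST _analyze
{
  "analyzer": "standard",
  "text": "Searching documents in Elasticsearch"
}
The response lists the individual tokens (searching, documents, in, elasticsearch) along with their positions and offsets.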
Index Modules
Index modules control the settings and behaviour of an individual index. These settings come in two types (an example of changing a dynamic setting follows the list),
Static Settings - These can be set only at the time of index creation or on a closed index
Dynamic Settings - These can be set when the index is live on Elasticsearch
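For example, the number of replicas is a dynamic setting and can be changed on the live employees index like this (the value 2 is only illustrative):
PUT employees/_settings
{
  "index": {
    "number_of_replicas": 2
  }
}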
Conclusion
Elasticsearch is widely used to build websites and applications where product or document search is essential. Client libraries are available for various programming languages, and it supports search on documents in 34 text languages. Elasticsearch is also offered as a managed service on AWS, GCP, and Alibaba Cloud.