Hive Architecture

Traditionally, managing big data was a tedious task; it became far easier after solutions like Hadoop came up. The growing demand for big data and cloud computing has only increased its significance, and today's technology landscape needs platforms and tools that can store and process large volumes of data. The Hadoop ecosystem includes several sub-projects, one of which is Hive, also called Apache Hive. In this blog, we will discuss Hive, its architecture, its components, and how it works. Let's get started!

What is Hive?

Hive is a data warehouse infrastructure tool designed for processing structured data in Hadoop. Hive sits on top of Hadoop and makes it easy to summarize big data and to query and analyze it.

Hive was initially developed by Facebook and was later taken up by the Apache Software Foundation; it is designed specifically for online analytical processing (OLAP). It provides a language called HiveQL (or HQL) that is used for querying. Hive is not a relational database: it stores its schema in a database (the metastore) and stores the processed data in the Hadoop Distributed File System (HDFS).
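As a quick illustration, a HiveQL table definition and query look very much like standard SQL. The table and column names below are made up for the example:

```sql
-- Create a table: the schema goes to the metastore,
-- while the data files live in HDFS.
CREATE TABLE page_views (
  user_id   BIGINT,
  page_url  STRING,
  view_time TIMESTAMP
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

-- Query it with familiar SQL syntax; Hive compiles this
-- into one or more distributed jobs.
SELECT page_url, COUNT(*) AS views
FROM page_views
GROUP BY page_url;
```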

Hive Architecture:

Hive architecture


The Apache Hive architecture is divided into 3 main parts as represented in the above picture. They are:

 a.  Hive clients

 b.  Hive services

 c.  Hive storage and computing

Let us get to know more about each of these core parts of the Hive architecture.

Hive Clients:

Hive provides multiple drivers for communicating with different kinds of applications. For a Thrift-based application, it provides a Thrift client for communication; for a Java-based application, it provides JDBC drivers; for other applications, ODBC drivers. All these clients communicate with the Hive server that is present in the Hive services.

Hive Services:

The Hive services are used for establishing client interactions with Hive. Whenever a client needs to perform a query operation in Hive, it communicates through the Hive services.

CLI stands for command line interface, which acts as a Hive service for data definition language (DDL) operations. The clients communicate with the Hive server, which in turn communicates with the main driver in the Hive services, as represented in the picture.
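For example, the DDL operations issued through the CLI are plain HiveQL statements. The database and table names here are illustrative:

```sql
-- Typical DDL statements entered at the Hive CLI
CREATE DATABASE IF NOT EXISTS analytics;
USE analytics;
SHOW TABLES;
ALTER TABLE page_views ADD COLUMNS (referrer STRING);
DROP TABLE IF EXISTS old_staging;
```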

The driver available in the Hive services is called the main driver, and it communicates with all types of applications, such as ODBC and JDBC applications or any other client-specific applications. The driver forwards the requests that come from the different applications to the metastore and the file systems for further processing.

Hive Storage and Computing:

Hive storage and computing includes services such as the file system, the metastore, and the job client, which in turn communicate with Hive storage and are responsible for performing the below set of actions.

  • The metadata of the tables created in Hive is stored in the Hive metastore database.
  • All the data and query results loaded into tables are stored in the Hadoop cluster, in the Hadoop Distributed File System.
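This split between the metastore and HDFS can be seen in a table definition like the following (the table name and HDFS path are illustrative):

```sql
-- The schema and partition info go to the metastore database;
-- the rows themselves stay in HDFS at the given location.
CREATE EXTERNAL TABLE sales (
  order_id BIGINT,
  amount   DOUBLE
)
PARTITIONED BY (sale_date STRING)
STORED AS ORC
LOCATION '/user/hive/warehouse/sales';
```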


Working with Hive:

Working with hive

The above image represents the working of Hive.

It performs the below set of operations.

  • Execute query: The Hive interface (such as the command line or web UI) sends the query to the driver (JDBC, ODBC, etc.) for execution.
  • Get plan: The driver takes the help of the query compiler, which parses the query to check its syntax and build the query plan.
  • Get metadata: The compiler sends a metadata request to the metastore (the database).
  • Send metadata: The metastore sends the metadata back to the compiler as a response.
  • Send plan: The compiler checks the requirement and sends the plan back to the driver. At this point, the parsing and compilation of the query are complete.
  • Execute plan: The driver sends the execution plan to the execution engine.
  • Execute job: Internally, the job runs as a MapReduce job. The execution engine sends the job to the JobTracker, which resides on the name node, and the JobTracker assigns it to the TaskTracker, which resides on the data node. In this step the query executes as a MapReduce job.
  • Metadata ops: Meanwhile, the execution engine can also perform metadata operations against the metastore.
  • Fetch result: The execution engine receives the results from the data nodes.
  • Send results: The execution engine sends those results to the driver.
  • Send results: The driver then sends the results to the Hive interfaces.
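You can watch the "get plan" step yourself with EXPLAIN, which prints the plan the compiler hands to the driver (the table name is illustrative):

```sql
EXPLAIN
SELECT page_url, COUNT(*) AS views
FROM page_views
GROUP BY page_url;
-- The output lists the stages (e.g. a map-side scan and
-- group, then a reduce-side aggregation) that the execution
-- engine will submit as a MapReduce job.
```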

Components of hive architecture:

Each component in the Hive architecture plays a crucial role. Let us look at each of them for a better understanding of the architecture.

a. Metastore: The metastore is a repository of metadata. This metadata includes, for each table, its schema and location. It also holds the partition metadata, which helps in monitoring the distributed data across the cluster. The metadata usually lives in a relational database; the metastore keeps track of the data, replicates it, and provides a backup that is useful in case of data loss.
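Everything the metastore holds about a table can be inspected from HiveQL itself (the table name is illustrative):

```sql
-- Columns, storage format, HDFS location, owner, statistics
DESCRIBE FORMATTED sales;
-- The partition metadata tracked by the metastore
SHOW PARTITIONS sales;
```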

b. Driver: The driver receives the query statements and works like a controller. It monitors the life cycle and progress of the execution by creating sessions, and it stores the metadata generated during the execution of a HiveQL statement. After the MapReduce job completes its reduce operation, the driver collects the query results and data points.

c. Compiler: The task of the compiler is to convert a HiveQL query into MapReduce input. It generates the steps and tasks needed so that the output of the HiveQL statement is produced as required by MapReduce.

d. Optimizer: The optimizer performs different transformation steps on the execution plan, such as pipelining and aggregation conversion. It can also split a task while transforming the data before the reduce operations are performed, for improved scalability and efficiency.

e. Executor: The executor runs the tasks after the compilation and optimization steps have been completed. It interacts with the Hadoop job tracker to schedule the tasks that have to run.

f. UI, CLI, Thrift server: The user interface and command line let external users interact with Hive by submitting queries and instructions and by monitoring the process. The Thrift server lets other clients interact with Hive.


Different modes of hive:

Hive can operate in two different modes, depending on the size and number of data nodes present in Hadoop. The two modes are:

  1. Local mode
  2. Map reduce mode

Let us look at the scenarios in which these two modes are used.

a. When to use local mode:

  • Local mode can be used when the data size is small and limited to a single local machine.
  • Local mode can be used when Hadoop is installed in pseudo-distributed mode with only one data node.
  • Local mode can be used when fast processing of the smaller data sets available on the local machine is desired.

b. When to use MapReduce mode:

  • MapReduce mode can be used when queries need to run over large volumes of data and can be executed in parallel.
  • MapReduce mode can be used when Hadoop includes multiple data nodes and the data is distributed across the different nodes.
  • MapReduce mode can be used when processing large data sets with better performance is the goal.

By default, Hive runs in MapReduce mode, but there is a configuration property for choosing which mode Hive should work in.
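For instance, local-mode execution can be controlled through configuration properties set in a Hive session; a commonly used one is hive.exec.mode.local.auto (the threshold values below are illustrative):

```sql
-- Let Hive automatically run small queries locally
-- instead of submitting a cluster job.
SET hive.exec.mode.local.auto=true;
-- Optional thresholds: fall back to MapReduce mode when
-- the input exceeds these limits.
SET hive.exec.mode.local.auto.inputbytes.max=134217728;
SET hive.exec.mode.local.auto.input.files.max=4;
```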


Hive is a data warehouse tool in the Hadoop ecosystem that performs both data definition language and data manipulation language operations. It offers several features and advantages over a traditional relational database management system when working with big data, and it fits neatly into the wider Hadoop ecosystem.


Hive is data warehouse infrastructure software that mediates the interaction between users and the Hadoop Distributed File System. Hive supports multiple user interfaces, such as the Hive command line, Hive HDInsight, and the Hive web UI.

Hive uses HQL, or HiveQL, a query language for analyzing and querying the structured data described in the metastore. It is highly scalable and very similar to SQL, drawing on MySQL, Oracle SQL, and SQL-92.


Hive is easy to learn and code. It helps SQL professionals apply their existing skills while working on the Hadoop platform.

The components of Hive architecture are:
a. Metastore
b. Driver
c. Compiler
d. Optimizer
e. Executor
f. CLI, UI, and Thrift Server