Hive Architecture

Hive is a data warehousing tool built on the Hadoop framework

Highly scalable, as it uses HDFS for storage and MapReduce for processing

Suited to a SQL scripting approach rather than a programming approach

Gives analysts good support for running ad hoc queries, data analysis and summarization
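To make the contrast between the two approaches concrete, here is a minimal sketch of what the "programming approach" looks like for a simple group-by count, written as plain-Python map, shuffle and reduce steps. The employee records and the `dept` field are made-up sample data, and the code only imitates the MapReduce pattern in memory; it is not how Hive itself is implemented. In Hive, the same result would come from the one-line query shown in the comment.

```python
from collections import defaultdict

# Hypothetical sample data; in Hive these would be rows of a table stored in HDFS.
employees = [
    {"name": "asha", "dept": "sales"},
    {"name": "ravi", "dept": "hr"},
    {"name": "meena", "dept": "sales"},
]

# HiveQL equivalent (the SQL scripting approach):
#   SELECT dept, COUNT(*) FROM employees GROUP BY dept;


def map_phase(rows):
    """Emit one (key, 1) pair per row, as a mapper would."""
    for row in rows:
        yield row["dept"], 1


def shuffle_phase(pairs):
    """Group values by key, like the shuffle step between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key, as a reducer would."""
    return {key: sum(values) for key, values in grouped.items()}


dept_counts = reduce_phase(shuffle_phase(map_phase(employees)))
print(dept_counts)
```

The three functions are what an analyst would otherwise have to write and maintain by hand; with Hive, the query alone is enough.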

Hive Thrift Service - The Hive server offers a Thrift service so that different applications, such as JDBC/ODBC applications and other Thrift clients (e.g. Beeline), can connect to Hive

Hive clients - CLI, Thrift client, JDBC/ODBC application, Web GUI (Ambari)

CLI (Command Line Interface), aka the shell interface, is the default service for invoking Hive

Provides all supported operations through commands

Thrift client - Thrift clients such as Beeline ship with Hive by default and interact with Hive in much the same way as the CLI
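As a rough illustration of how a client such as Beeline reaches the server, the sketch below builds a HiveServer2 JDBC URL and the matching Beeline command line. The host, database and user values are placeholders, 10000 is the usual HiveServer2 default port but your cluster may differ, and `hive_jdbc_url` and `beeline_command` are helper names invented for this example.

```python
def hive_jdbc_url(host="localhost", port=10000, database="default"):
    """Build a HiveServer2 JDBC URL. All default values are placeholders."""
    return f"jdbc:hive2://{host}:{port}/{database}"


def beeline_command(url, user):
    """Return the argv list for starting Beeline against the given URL.

    -u passes the JDBC URL and -n passes the user name.
    """
    return ["beeline", "-u", url, "-n", user]


url = hive_jdbc_url()
command = beeline_command(url, "hiveuser")  # "hiveuser" is a made-up account
print(" ".join(command))
```

The printed command can be pasted into a shell on a machine where Beeline is installed and the server is reachable.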

Stages in execution

  1. The user submits a query from any one of the Hive clients
  2. The Driver receives the query, creates a session and invokes the Compiler. The Driver provides session management, access to the Hive configuration file and the JDBC APIs
  3. The Compiler parses the query, performs semantic analysis and generates an execution plan with the help of the Metastore
  4. The Metastore holds the metadata for tables, partitions and column types, plus the serializers/deserializers used to read and write data in HDFS
  5. The Execution engine receives the execution plan from the Compiler as a series of steps. The plan is a DAG of stages containing Map/Reduce jobs
  6. The engine submits the appropriate Map or Reduce jobs, along with Metastore data if a job requires a serializer/deserializer
  7. Intermediate job results are saved to temporary files and forwarded to the next stage
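The stages above can be sketched as a toy pipeline. This is only a simplified model to show how the pieces hand work to each other, not Hive's actual code: the class names mirror the components in the list, but every method, the sample table, the two-stage plan and the log format are invented for illustration. No real query parsing happens, so the table name is passed in explicitly.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class TableMeta:
    columns: dict      # column name -> column type
    partitions: list   # partition column names
    serde: str         # serializer/deserializer label


class Metastore:
    """Holds table metadata (hypothetical in-memory stand-in)."""

    def __init__(self, tables):
        self._tables = tables

    def get_table(self, name):
        return self._tables[name]  # raises KeyError for an unknown table


@dataclass
class Stage:
    name: str
    kind: str                                   # "map" or "reduce"
    depends_on: list = field(default_factory=list)


class Compiler:
    def __init__(self, metastore):
        self.metastore = metastore

    def compile(self, query, table):
        # Stand-in for semantic analysis: the table must exist in the Metastore.
        meta = self.metastore.get_table(table)
        # Toy plan: one map stage followed by one reduce stage.
        plan = [Stage("stage-1", "map"), Stage("stage-2", "reduce", ["stage-1"])]
        return plan, meta


class ExecutionEngine:
    def run(self, plan, meta):
        by_name = {stage.name: stage for stage in plan}
        graph = {stage.name: set(stage.depends_on) for stage in plan}
        log = []
        # Walk the DAG so that each stage runs after the stages it depends on.
        for name in TopologicalSorter(graph).static_order():
            stage = by_name[name]
            log.append(f"{stage.name}:{stage.kind}:{meta.serde}")
        return log


class Driver:
    def __init__(self, compiler, engine):
        self.compiler = compiler
        self.engine = engine

    def execute(self, query, table):
        plan, meta = self.compiler.compile(query, table)
        return self.engine.run(plan, meta)


# Made-up table definition for the example.
metastore = Metastore({
    "employees": TableMeta({"name": "string", "dept": "string"}, ["dept"], "LazySimpleSerDe"),
})
driver = Driver(Compiler(metastore), ExecutionEngine())
run_log = driver.execute("SELECT dept, COUNT(*) FROM employees GROUP BY dept", "employees")
print(run_log)
```

Modelling the plan as a dependency graph keeps the ordering rule explicit: a reduce stage cannot start until the map stage that feeds it has finished.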
