hive-architecture

December 29, 2022

Hive Architecture

Hive is a data warehousing tool built on the Hadoop framework.

- Highly scalable, since it uses HDFS for storage and MapReduce for processing
- Suited to SQL scripting rather than a programming approach
- Gives analysts strong support for ad hoc queries, data analysis and summarization
- Hive Thrift service: the Hive server exposes a Thrift service so that different applications can connect to Hive, such as JDBC/ODBC applications and other Thrift clients (e.g. Beeline)
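To make the "SQL scripting rather than programming" point concrete, here is a minimal Python sketch of the map, shuffle and reduce work that a one-line aggregation query stands in for. The table, its rows and the function names are all made up for illustration; this is a conceptual model, not the plan Hive actually generates.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for a table stored on HDFS.
EMPLOYEES = [
    {"name": "asha", "dept": "sales"},
    {"name": "ravi", "dept": "sales"},
    {"name": "mei", "dept": "engineering"},
]

# The query an analyst would write in Hive instead of the code below.
QUERY = "SELECT dept, COUNT(*) FROM employees GROUP BY dept"


def map_phase(rows):
    """Emit one (dept, 1) pair per input row."""
    for row in rows:
        yield row["dept"], 1


def shuffle(pairs):
    """Group emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key to get the per-department count."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_dept(rows):
    """Chain the three phases for the GROUP BY query above."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    print(QUERY)
    print(count_by_dept(EMPLOYEES))
```

The analyst only writes the query string; the three functions are roughly the kind of work that would otherwise have to be hand-coded as a MapReduce program.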
Hive clients: CLI, Thrift client, JDBC/ODBC application, Web GUI (Ambari)

- CLI (Command Line Interface), also known as the shell interface, is the default way to invoke Hive. It provides all supported operations through commands.
- Thrift client: Thrift clients such as Beeline ship with Hive by default and interact with Hive much as the CLI does.
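As a rough illustration of how a client such as Beeline is pointed at the Hive server, the sketch below builds a Beeline command line in Python. It assumes a HiveServer2 instance reachable at `localhost:10000` with a `default` database, and the user name `analyst` is a placeholder; adjust all of these for a real cluster. The helper function names are hypothetical, while `-u`, `-n` and `-e` are Beeline's options for the JDBC URL, the user name and an inline query.

```python
import shlex


def hive_jdbc_url(host="localhost", port=10000, database="default"):
    """Build a HiveServer2 JDBC URL (host, port and database are placeholders)."""
    return f"jdbc:hive2://{host}:{port}/{database}"


def beeline_command(url, user=None, query=None):
    """Return the argv list for a Beeline invocation."""
    argv = ["beeline", "-u", url]
    if user is not None:
        argv += ["-n", user]    # -n: user name to connect as
    if query is not None:
        argv += ["-e", query]   # -e: run this query and exit
    return argv


if __name__ == "__main__":
    cmd = beeline_command(hive_jdbc_url(), user="analyst", query="SHOW TABLES;")
    # Print a shell-quoted version that could be pasted into a terminal.
    print(shlex.join(cmd))
```

Building the command as a list keeps the query as a single argument, so it can also be passed straight to `subprocess.run` without extra shell quoting.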
Stages in execution

- The user submits a query from any one of the Hive clients.
- The Driver receives the query, creates a session and invokes the Compiler. The Driver is responsible for session management, the Hive configuration file and JDBC API capabilities.
- The Compiler parses the query, performs semantic analysis and generates an execution plan with the help of the Metastore.
- The Metastore holds the metadata for every table: partitions, column types and the serializers/deserializers used to read and write data on HDFS.
- The Execution engine receives the execution plan from the Compiler as a series of steps. The plan is a DAG of stages that contain map/reduce jobs.
- The engine submits the appropriate map or reduce jobs, along with Metastore data when a job needs a serializer/deserializer.
- Intermediate job results are saved to temporary files and forwarded to the next stage.
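The flow described above can be sketched as a toy Python model: a driver calls a compiler, the compiler checks the query against a metastore and returns a DAG of stages, and an engine runs the stages in dependency order, handing results over through temporary files. Every name here (`METASTORE`, `compile_query`, `execute`, `driver`) is hypothetical and the "query" is reduced to a table and a grouping column; none of this is Hive's real API.

```python
import json
import os
import tempfile
from graphlib import TopologicalSorter

# Toy Metastore: table metadata the compiler consults (hypothetical contents).
METASTORE = {"employees": {"columns": {"name": "string", "dept": "string"}}}

# Toy table data standing in for files on HDFS.
TABLE_DATA = {"employees": [{"name": "asha", "dept": "sales"},
                            {"name": "ravi", "dept": "sales"},
                            {"name": "mei", "dept": "engineering"}]}


def compile_query(table, group_col):
    """Semantic check against the Metastore, then return a two-stage plan."""
    if group_col not in METASTORE[table]["columns"]:
        raise ValueError(f"unknown column: {group_col}")
    # The plan is a DAG: each stage maps to the set of stages it depends on.
    dag = {"map": set(), "reduce": {"map"}}
    return {"table": table, "group_col": group_col, "dag": dag}


def run_stage(stage, plan, upstream):
    """Run one stage; `upstream` holds rows read from earlier stages' temp files."""
    if stage == "map":
        return [[row[plan["group_col"]], 1] for row in TABLE_DATA[plan["table"]]]
    counts = {}
    for key, value in upstream:
        counts[key] = counts.get(key, 0) + value
    return counts


def execute(plan, workdir):
    """Run stages in dependency order, passing results via temp files."""
    outputs = {}
    result = None
    for stage in TopologicalSorter(plan["dag"]).static_order():
        upstream = []
        for dep in plan["dag"][stage]:
            with open(outputs[dep]) as fh:
                upstream.extend(json.load(fh))
        result = run_stage(stage, plan, upstream)
        outputs[stage] = os.path.join(workdir, f"{stage}.json")
        with open(outputs[stage], "w") as fh:
            json.dump(result, fh)
    return result


def driver(table, group_col):
    """Receive the query, invoke the compiler, hand the plan to the engine."""
    plan = compile_query(table, group_col)
    with tempfile.TemporaryDirectory() as workdir:
        return execute(plan, workdir)


if __name__ == "__main__":
    print(driver("employees", "dept"))
```

The temporary JSON files play the role of the intermediate results written between stages; in this toy version they live in a local directory that is removed once the query finishes.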
