hive-table-partitioning

Hive Table Partitioning
Hive Table Partitioning

What is Partitioning

  • Hive can divide Table data into multiple layers called Partitions
  • Partitioning speed up the query execution and distributes the data load evenly into multiple layers
  • Improper Partitioning columns leads to poor query execution performance as data may not evenly distributed or with huge partitions with less data
  • Partitions where we load data manually into the table are called Static Partitions
  • Hive takes care of loading data into appropriate partitions if Dynamic Partition enabled
  • We need to specify Partition column or set of columns in table creation script
  • Always load data to Partition table from staging/temp. table instead from flat files
  • Create staging table columns as per the order of Flat file using for data load
Static Partition - We need to load the data into the partitions manually, mostly with query on staging table
Hive creates a directory inside table location and stores the given data in to that.
and sub-directory structure for each subsequent partitioning column.

create external table  EMP_STG(id int,first_name string,last_name string,email string,gender string, company string)
row format delimited
fields terminated by ','
lines terminated by '\n'
stored as textfile
location '/hivebucketing/01/empstg'
TBLPROPERTIES("skip.header.line.count"="1");

load data local inpath '/home/data_dump/Hive_Bucketing/employee.csv' into table EMP_STG;

create external table emp_part_stg(id int,first_name string,last_name string,email string,gender string)
partitioned by(company string)
row format delimited
fields terminated by ','
lines terminated by '\n'
stored as textfile
location '/hivebucketing/02/empbcktstg'
TBLPROPERTIES("skip.header.line.count"="1");

insert overwrite table emp_part_stg partition(company='Voomm')
select id,first_name,last_name,email,gender from emp_stg where company='Voomm';

insert overwrite table emp_part_stg partition(company='Milapo')
select id,first_name,last_name,email,gender from emp_stg where company='Milapo';

insert into table emp_part_stg partition(company='Camimbo')
select id,first_name,last_name,email,gender from emp_stg where company='Camimbo';

Dynamic Partition - Hive takes care of data importing into respective partitions
It creates a directory structure for each partition and sub-directory structure for each subsequent partitioning column.

Please set below properties to enable dynamic partition
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

Please set below properties to control dynamic partitions count
set hive.exec.max.dynamic.partitions= {desired max partitions no.};
set hive.exec.max.dynamic.partitions.pernode={desired max partitions no.};

create external table cmp_bckt_tab(company string,country_code string,location string,timezone string)
partitioned by(country string,state string,city string)
row format delimited
fields terminated by ','
lines terminated by '\n'
stored as textfile
location '/hivebucketing/02/cmp_part_stg'
TBLPROPERTIES("skip.header.line.count"="1");

INSERT into table cmp_bckt_tab partition(country,state,city)
select company,country_code,location,timezone,country,state,city from COMPANY_STG;

Comments

Popular posts from this blog

hadoop-installation-ubuntu

jenv-tool

hive-installation-in-ubuntu