Hive dynamic partitioning - MSCK repair
30 Apr 2018
If you are using Hive dynamic partitions, by setting the following at the beginning of the script..
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
You don’‘t need to do an msck repair, partitions are already created.
> create table test.part(id integer) partitioned by (ds string) stored as parquet location 'file:///tmp/testpart';
OK
Time taken: 0.165 seconds
hive> SET hive.exec.dynamic.partition = true;
hive> SET hive.exec.dynamic.partition.mode = nonstrict;
hive> show partitions test.part;
OK
Time taken: 0.141 seconds
insert into table test.part partition('2017-01-01') values (100);
Loading data to table test.part partition (ds=2017-01-01)
OK
Time taken: 30.146 seconds
hive> show partitions test.part;
OK
ds=2017-01-01
Time taken: 0.034 seconds, Fetched: 1 row(s)
However, if you are using a Hive metastore local to EMR, in case the cluster goes down, when you create an external table, the partitions are not present.
You need to do msck repair
to load all the partitions.
Another option is to use a common RDS-based Hive metastore that is used by all the EMR clusters or use a managed hive-compatible datastore like Glue.