If you are using Hive dynamic partitions, by setting the following at the beginning of the script..
SET hive.exec.dynamic.partition = true; SET hive.exec.dynamic.partition.mode = nonstrict;
You don’‘t need to do an msck repair, partitions are already created.
> create table test.part(id integer) partitioned by (ds string) stored as parquet location 'file:///tmp/testpart'; OK Time taken: 0.165 seconds hive> SET hive.exec.dynamic.partition = true; hive> SET hive.exec.dynamic.partition.mode = nonstrict; hive> show partitions test.part; OK Time taken: 0.141 seconds insert into table test.part partition('2017-01-01') values (100); Loading data to table test.part partition (ds=2017-01-01) OK Time taken: 30.146 seconds hive> show partitions test.part; OK ds=2017-01-01 Time taken: 0.034 seconds, Fetched: 1 row(s)
However, if you are using a Hive metastore local to EMR, in case the cluster goes down, when you create an external table, the partitions are not present.
You need to do
msck repair to load all the partitions.
Another option is to use a common RDS-based Hive metastore that is used by all the EMR clusters or use a managed hive-compatible datastore like Glue.