Hive timestamp parsing and nulls
30 Apr 2018
The idea of schema and read and non-failing loads seems appealing at first, but it can get annoying if you discover data issues much later in the pipeline.
One such example is Hive not raising an error when a timestamp column is not parseable. Hive silently returns null.
hive> select unix_timestamp('2015-08-03T02:30:00',"yyyy-MM-dd'T'HH:mm:ss");
OK
1438569000
Time taken: 0.027 seconds, Fetched: 1 row(s)
hive>
> select unix_timestamp('2015-08-03T02:30:00.000Z',"yyyy-MM-dd'T'HH:mm:ss.SSSZ");
OK
NULL
Time taken: 0.027 seconds, Fetched: 1 row(s)
hive> select unix_timestamp('2001-07-04T12:08:56.235-0700',"yyyy-MM-dd'T'HH:mm:ss.SSSZ");
OK
994273736
Time taken: 0.028 seconds, Fetched: 1 row(s)