Hive timestamp parsing and nulls

30 Apr 2018

The idea of schema and read and non-failing loads seems appealing at first, but it can get annoying if you discover data issues much later in the pipeline.

One such example is Hive not raising an error when a timestamp column is not parseable. Hive silently returns null.

hive> select unix_timestamp('2015-08-03T02:30:00',"yyyy-MM-dd'T'HH:mm:ss");
OK
1438569000
Time taken: 0.027 seconds, Fetched: 1 row(s)
hive>
    > select unix_timestamp('2015-08-03T02:30:00.000Z',"yyyy-MM-dd'T'HH:mm:ss.SSSZ");
OK
NULL
Time taken: 0.027 seconds, Fetched: 1 row(s)
hive> select unix_timestamp('2001-07-04T12:08:56.235-0700',"yyyy-MM-dd'T'HH:mm:ss.SSSZ");
OK
994273736
Time taken: 0.028 seconds, Fetched: 1 row(s)

who am i

I am a data engineer with interests in databases, data science, algorithms and programming in general.

where am i

linkedin//rchamarthi
stackoverflow/users/237939/rchamarthi
Content available under Creative Commons (BY-NC-SA) unless otherwise noted.
This site is hosted at Github Pages and created with Jekyll. It uses the papyrus theme