External tables on parquet files - Invalid/missing columns

29 Jan 2019

If a column name in the parquet file does not match the column name in the external table definition, queries return null values for that column instead of an error about an invalid or missing column.

  • Can’t the existing datatype check be extended to also validate column names? The example below shows the behavior.
spark.read.parquet("file_name").printSchema()
root
 |-- my_column: string (nullable = true)

create external table if not exists catalog.entity_test(
   entity_id string,
   my_column_2 string
)
STORED AS PARQUET
LOCATION '....';

select * from catalog.entity_test where entity_id = 'GVo8cDA0uq8MzqJYIAABm';

The second column, my_column_2, returns null for every row: the parquet file stores the data under my_column, and the name mismatch is silently mapped to null instead of raising an error.
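Until the engine does that validation itself, one way to catch the mismatch is to compare the table definition against the parquet schema before querying. A minimal PySpark sketch, assuming Hive support is enabled, with a placeholder location and a hypothetical helper name:

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Hypothetical helper: report columns that the table defines but the
# parquet files at the given location do not contain.
def missing_parquet_columns(spark, table_name, parquet_path):
    table_cols = {f.name.lower() for f in spark.table(table_name).schema.fields}
    file_cols = {f.name.lower() for f in spark.read.parquet(parquet_path).schema.fields}
    return table_cols - file_cols

# Placeholder path; use the LOCATION from the table DDL.
missing = missing_parquet_columns(spark, "catalog.entity_test", "/path/to/entity_test")
if missing:
    print("Defined in the table but missing from the parquet files:", missing)
    # my_column_2 would show up here, since the files only have my_column.

Running a check like this when the table is created or the files are refreshed surfaces the naming mismatch up front instead of letting queries silently return nulls.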
