Use Hive Serde for Fixed Length (index based) strings

Yash Sharma
May 12, 2014May 12, 2014
2 Comments
hive

Hive fixed length serde can be used in scenarios where we do not have any delimiters in out data file. Using RegexSerDe for fixed length strings is pretty straight:

CREATE EXTERNAL TABLE customers (userid STRING, fb_id STRING, twitter_id STRING, status STRING)
ROW FORMAT SERDE ‘org.apache.hadoop.hive.contrib.serde2.RegexSerDe’
WITH SERDEPROPERTIES (“input.regex” = “(.{10})(.{10})(.{10})(.2})” )
LOCATION ‘path/to/data’;

The above query only expects exactly 32 characters in a line of text (10+10+10+2). The query can be customized to Ignore any characters at end after the useful data is read:

CREATE EXTERNAL TABLE customers ((userid STRING, fb_id STRING, twitter_id STRING, status STRING)
ROW FORMAT SERDE ‘org.apache.hadoop.hive.contrib.serde2.RegexSerDe’
WITH SERDEPROPERTIES (“input.regex” = “(.{10})(.{10})(.{10})(.{2}).*” )
LOCATION ‘path/to/data’;

Thats all. Have fun. Cheers \m/

2 thoughts on “Use Hive Serde for Fixed Length (index based) strings”

Surbhi Jain May 18, 2016 at 4:04 am

Hi..thanks for such a nice explanation..!
I have a scenario in which I’ll be receiving the fixed length record in a file but the records vary in length and they have to be extracted out from different positions.
For eg. if I get a record like AAAABBBCCCCCCDD, I just need AAAA (from position 1 to 4) as my first column and then DD (from position 13 to 14) as my second column. I need only few columns, not all.
Is there any way to work out the same?
I know I can write the substring function to extract the same but I wanted to know whether this can be handled while creating external table only?
Thanks in advance..!

Reply
sanjiv singh November 12, 2016 at 6:46 pm

Try this : https://github.com/sanjivsingh/FixedLengthAndDelimitedSerde
Supported both fixed length and delimited data serilization/deserialization
It support both case in single definition.

Reply

Use Hive Serde for Fixed Length (index based) strings

2 thoughts on “Use Hive Serde for Fixed Length (index based) strings”

Leave a Reply Cancel reply