Debugging : Hive Dynamic partition Error : [Fatal Error] total number of created files now is 100028, which exceeds 100000. Killing the job.

hive

[Fatal Error] total number of created files now is 900320, which exceeds 900000. Killing the job.

TL;DR: a quick fix, though probably not always the right thing to do:
SET hive.exec.max.created.files=900000;

My config raises Hive's default limits on dynamic partitions and created files:

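 -- Enable dynamic partitioning.
 SET hive.exec.dynamic.partition=true;
 -- Max dynamic partitions the whole job may create (default 1000).
 SET hive.exec.max.dynamic.partitions=100000;
 -- Max dynamic partitions per mapper/reducer node (default 100).
 SET hive.exec.max.dynamic.partitions.pernode=100000;
 -- Allow every partition column to be dynamic.
 SET hive.exec.dynamic.partition.mode=nonstrict;
 -- Max files all mappers/reducers in the job may create (default 100000).
 SET hive.exec.max.created.files=900000;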

Correct thing to do:
Investigate why Hive is creating so many files. Most jobs should stay well within the default limit of 100000 created files, so hitting it is suspicious.
This typically happens when the wrong column ends up in the partition position. In a dynamic-partition insert, Hive maps the trailing SELECT columns to the partition columns by position, not by name, so a misplaced high-cardinality column creates thousands of useless partitions. Verify the query and check the output data location to see exactly which files and partitions are being written to S3/HDFS.
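A few queries can help with that investigation. This is only a sketch: `tableA`, `tableB` and `partnfield` are placeholder names for the target table, the source table and the candidate partition column, so substitute your own.

```sql
-- Placeholder names: tableB = source table, tableA = target table,
-- partnfield = the column that feeds the dynamic partition.

-- 1. How many partitions would the insert try to create?
SELECT COUNT(DISTINCT partnfield) AS distinct_partition_values
FROM tableB;

-- 2. Which values dominate? Junk values or heavy skew show up here.
SELECT partnfield, COUNT(*) AS row_count
FROM tableB
GROUP BY partnfield
ORDER BY row_count DESC
LIMIT 20;

-- 3. After a run, list the partitions that actually exist on the target.
SHOW PARTITIONS tableA;
```

If the first query returns a number anywhere near the partition limit, the column is probably the wrong choice for a partition key, and raising the limits only hides the problem.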

If everything looks all right and you genuinely need that many partitions, go back to the tip above and increase the max file limit.

2 thoughts on “Debugging : Hive Dynamic partition Error : [Fatal Error] total number of created files now is 100028, which exceeds 100000. Killing the job.”

  1. Add a “DISTRIBUTE BY” clause on the partition column at the end of the SQL.

    Insert into tableA partition (partnfield)
    SELECT….. FROM TABLEB… DISTRIBUTE BY partnfield;

    This sends all rows for a given partition value to the same reducer, so the number of part files created stays bounded and ideally shouldn’t exceed the limit.

  2. DISTRIBUTE BY will not reduce the number of output partitions if the column itself has a lot of distinct values. It is also prone to OutOfMemory errors when the data is skewed toward one key.
    But as I mentioned, I chose the wrong column for partitioning. Having more than 100000 partitions is already bad.
    In my case I just had to use a different partition key. I was using a key with a huge number of distinct values, and such a key is a very poor candidate for a partition key.
