#66
dumbo cat /hdfs/path/part* silently fails to concatenate all part files
Created on: Thu, Dec 03 2009
Reported by: zstone
Assigned to: -
Milestone: -
Type: -
Status: New
Priority: High (2)
Component: -
Estimate: None
Followers: zstone
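The normal Dumbo syntax, which names the output directory without a trailing wildcard, fails on the _logs directory that Hadoop creates in job output directories by default. Because of this, people may be using the part* syntax frequently without realizing that it yields incorrect results: not every part file ends up in the concatenated output.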
Current workarounds are to run dumbo cat without the wildcard after either manually deleting the _logs directory or configuring Hadoop not to create it. A more convenient option may be to list the directory with the HDFS ls command and process each part file explicitly, which ensures that every one of them is included.
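A minimal Python sketch of the ls-based workaround is below. It is not part of Dumbo. It assumes that the hadoop and dumbo executables are on the PATH, that hadoop fs -ls prints one entry per line with the full path as the last field (so paths must not contain spaces), and that regular files have a permission string starting with "-". The function names part_files_from_ls and cat_part_files are hypothetical. extra_args is a placeholder for any additional options your dumbo cat setup requires.

```python
import subprocess
import sys
from posixpath import basename


def part_files_from_ls(ls_output):
    """Return sorted HDFS paths of regular files whose names start with "part".

    Assumes `hadoop fs -ls` style output: a "Found N items" header, then one
    entry per line with the permission string first and the path last.
    """
    paths = []
    for line in ls_output.splitlines():
        fields = line.split()
        # Skip the header, blank lines, and directories such as _logs.
        if len(fields) < 8 or not fields[0].startswith("-"):
            continue
        path = fields[-1]
        # Mirror the part* glob, but only for regular files.
        if basename(path).startswith("part"):
            paths.append(path)
    return sorted(paths)


def cat_part_files(hdfs_dir, extra_args=()):
    """Run `dumbo cat` once per part file in hdfs_dir, in sorted order."""
    listing = subprocess.run(
        ["hadoop", "fs", "-ls", hdfs_dir],
        check=True, capture_output=True, text=True,
    ).stdout
    parts = part_files_from_ls(listing)
    if not parts:
        raise RuntimeError("no part files found in %s" % hdfs_dir)
    for path in parts:
        # check=True raises CalledProcessError if any single cat fails,
        # so a missing file cannot be skipped silently.
        subprocess.run(["dumbo", "cat", path, *extra_args], check=True)
    return parts


if __name__ == "__main__":
    # Usage: python <script> /hdfs/path [extra dumbo cat options...]
    cat_part_files(sys.argv[1], sys.argv[2:])
```

Sorting the paths keeps the concatenated output in part-number order. Because dumbo cat runs once per file, you can check the returned list against the directory listing to confirm exactly which files were included.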