
File does not exist during sync clustering #12566

Closed
gejinzh opened this issue Jan 2, 2025 · 4 comments

gejinzh commented Jan 2, 2025

Describe the problem you faced
Flink was restarted by YARN and did not recover from the checkpoint. After running for a while, the job threw a "file does not exist" exception during sync clustering.

What can I do to fix it?

Hudi configuration

       Map<String, String> options = new HashMap<>();
       // Table location and type
       options.put(FlinkOptions.PATH.key(), hudiProperties.getPath());
       options.put(FlinkOptions.TABLE_TYPE.key(), HoodieTableType.MERGE_ON_READ.name());
       options.put(FlinkOptions.OPERATION.key(), WriteOperationType.INSERT.value());
       options.put(FlinkOptions.DATABASE_NAME.key(), hudiProperties.getDatabase());
       options.put(FlinkOptions.TABLE_NAME.key(), hudiProperties.getTableName());
       options.put(FlinkOptions.WRITE_TASKS.key(), String.valueOf(4));
       // Hive sync via the Hive metastore
       options.put(FlinkOptions.HIVE_SYNC_ENABLED.key(), String.valueOf(true));
       options.put(FlinkOptions.HIVE_SYNC_MODE.key(), HiveSyncMode.HMS.name());
       options.put(FlinkOptions.HIVE_SYNC_METASTORE_URIS.key(), hudiProperties.getHiveMetastoreUris());
       options.put(FlinkOptions.HIVE_SYNC_DB.key(), hudiProperties.getDatabase());
       options.put(FlinkOptions.HIVE_SYNC_CONF_DIR.key(), hudiProperties.getHiveConfDir());
       // Async clustering
       options.put(FlinkOptions.CLUSTERING_SCHEDULE_ENABLED.key(), String.valueOf(true));
       options.put(FlinkOptions.CLUSTERING_ASYNC_ENABLED.key(), String.valueOf(true));
       options.put(FlinkOptions.CLUSTERING_TASKS.key(), String.valueOf(4));
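
For reference, a minimal sketch of how an options map like this is typically wired into a Flink streaming job via the HoodiePipeline builder that ships with the Flink bundle (the schema, primary key, and partition field below are hypothetical placeholders, not taken from this issue):

    import java.util.Map;

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.table.data.RowData;
    import org.apache.hudi.util.HoodiePipeline;

    // A minimal sketch, assuming an upstream DataStream<RowData> already exists
    // and that checkpointing is enabled on the environment (async clustering is
    // driven by checkpoint completion).
    public static void writeToHudi(DataStream<RowData> source, Map<String, String> options) {
      HoodiePipeline.Builder builder = HoodiePipeline.builder("ods_offer_inst")
          .column("id BIGINT")           // hypothetical columns -- use the real table schema
          .column("name STRING")
          .column("part_field STRING")
          .pk("id")                      // hypothetical primary key
          .partition("part_field")       // hypothetical partition field
          .options(options);             // the options map built above
      builder.sink(source, false);       // false => unbounded (streaming) input
    }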

Environment Description

  • Hudi version : 0.13.0

  • Flink version : 1.16

  • Hive version :

  • Hadoop version : 3.3.3

  • Storage (HDFS/S3/GCS..) : hdfs

  • Running on Docker? (yes/no) : no

Stacktrace

org.apache.hudi.exception.HoodieClusteringException: Error reading input data for hdfs://ctyunns/user/yxfcenter/hudi/tables/tele_table/tele_table/ods_offer_inst/8140000/f54559fc-95cc-428e-bb45-096d8858d0c9-0_1-4-6_20241229153814597.parquet and []
	at org.apache.hudi.sink.clustering.ClusteringOperator.lambda$null$4(ClusteringOperator.java:337) ~[hudi-flink1.16-bundle-0.13.0.jar:0.13.0]
	at java.lang.Iterable.spliterator(Iterable.java:101) ~[?:1.8.0_352]
	at org.apache.hudi.sink.clustering.ClusteringOperator.lambda$readRecordsForGroupBaseFiles$5(ClusteringOperator.java:341) ~[hudi-flink1.16-bundle-0.13.0.jar:0.13.0]
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_352]
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384) ~[?:1.8.0_352]
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) ~[?:1.8.0_352]
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_352]
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[?:1.8.0_352]
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_352]
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566) ~[?:1.8.0_352]
	at org.apache.hudi.sink.clustering.ClusteringOperator.readRecordsForGroupBaseFiles(ClusteringOperator.java:342) ~[hudi-flink1.16-bundle-0.13.0.jar:0.13.0]
	at org.apache.hudi.sink.clustering.ClusteringOperator.doClustering(ClusteringOperator.java:242) ~[hudi-flink1.16-bundle-0.13.0.jar:0.13.0]
	at org.apache.hudi.sink.clustering.ClusteringOperator.lambda$processElement$0(ClusteringOperator.java:194) ~[hudi-flink1.16-bundle-0.13.0.jar:0.13.0]
	at org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:130) ~[hudi-flink1.16-bundle-0.13.0.jar:0.13.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_352]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_352]
	at java.lang.Thread.run(Thread.java:750) [?:1.8.0_352]
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://ctyunns/user/yxfcenter/hudi/tables/tele_table/tele_table/ods_offer_inst/8140000/f54559fc-95cc-428e-bb45-096d8858d0c9-0_1-4-6_20241229153814597.parquet
	at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1757) ~[hadoop-hdfs-client-3.3.3.jar:?]
	at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1750) ~[hadoop-hdfs-client-3.3.3.jar:?]
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-3.3.3.jar:?]
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1765) ~[hadoop-hdfs-client-3.3.3.jar:?]
	at org.apache.parquet.hadoop.ParquetReader$Builder.build(ParquetReader.java:337) ~[hudi-flink1.16-bundle-0.13.0.jar:0.13.0]
	at org.apache.hudi.io.storage.HoodieAvroParquetReader.getIndexedRecordIteratorInternal(HoodieAvroParquetReader.java:168) ~[hudi-flink1.16-bundle-0.13.0.jar:0.13.0]
	at org.apache.hudi.io.storage.HoodieAvroParquetReader.getIndexedRecordIterator(HoodieAvroParquetReader.java:94) ~[hudi-flink1.16-bundle-0.13.0.jar:0.13.0]
	at org.apache.hudi.io.storage.HoodieAvroParquetReader.getRecordIterator(HoodieAvroParquetReader.java:73) ~[hudi-flink1.16-bundle-0.13.0.jar:0.13.0]
	at org.apache.hudi.sink.clustering.ClusteringOperator.lambda$null$4(ClusteringOperator.java:334) ~[hudi-flink1.16-bundle-0.13.0.jar:0.13.0]
	... 16 more

danny0405 (Contributor) commented

The plan is broken; you need to manually remove the clustering plan from the timeline and re-schedule a new one. 0.13.0 has some issues with recovery. Did you have a chance to upgrade to 0.14.1, 0.15.0, or 1.x?
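
To make the manual step concrete: in Hudi 0.x a scheduled clustering plan lives on the timeline as a replacecommit instant, so removing the broken plan means deleting its .replacecommit.requested (and .inflight, if present) files under .hoodie while the job is stopped. Below is a minimal sketch using the Hadoop FileSystem API; the instant time is a hypothetical placeholder (list .hoodie to find the real one), and only pending marker files are deleted, never completed commits:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RemovePendingClusteringPlan {
      public static void main(String[] args) throws Exception {
        // Substitute the real table base path and the broken plan's instant time.
        String basePath = "hdfs://ctyunns/user/yxfcenter/hudi/tables/tele_table/tele_table/ods_offer_inst";
        String brokenInstant = "20241229000000000"; // hypothetical placeholder

        Path timeline = new Path(basePath, ".hoodie");
        FileSystem fs = FileSystem.get(timeline.toUri(), new Configuration());
        for (FileStatus status : fs.listStatus(timeline)) {
          String name = status.getPath().getName();
          // Delete only the pending plan's requested/inflight files; a completed
          // instant ends in ".replacecommit" and must be kept.
          if (name.startsWith(brokenInstant)
              && (name.endsWith(".replacecommit.requested")
                  || name.endsWith(".replacecommit.inflight"))) {
            System.out.println("Deleting pending clustering instant file: " + name);
            fs.delete(status.getPath(), false);
          }
        }
        fs.close();
      }
    }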

gejinzh commented Jan 3, 2025

> The plan is broken; you need to manually remove the clustering plan from the timeline and re-schedule a new one. 0.13.0 has some issues with recovery. Did you have a chance to upgrade to 0.14.1, 0.15.0, or 1.x?

Thanks for your reply.
Should I remove all the clustering plans, or only the broken one?
Can the upgrade remain compatible with the existing data files and timeline generated by 0.13.0?

danny0405 (Contributor) commented

0.x is compatible, but 1.x may not be (there is an auto-upgrade flow; you should test it first).

gejinzh commented Jan 3, 2025

Thanks, I will try.

gejinzh closed this as completed Jan 7, 2025