
File does not exist during sync clustering #12566

Closed
gejinzh opened this issue Jan 2, 2025 · 4 comments

gejinzh commented Jan 2, 2025

Describe the problem you faced
Flink was restarted by YARN and did not recover from the checkpoint. After running for a while, the job threw a "file does not exist" exception during sync clustering.

What can I do to fix it?

Hudi configuration

       Map<String, String> options = new HashMap<>();
       // Table location and type
       options.put(FlinkOptions.PATH.key(), hudiProperties.getPath());
       options.put(FlinkOptions.TABLE_TYPE.key(), HoodieTableType.MERGE_ON_READ.name());
       options.put(FlinkOptions.OPERATION.key(), WriteOperationType.INSERT.value());
       options.put(FlinkOptions.DATABASE_NAME.key(), hudiProperties.getDatabase());
       options.put(FlinkOptions.TABLE_NAME.key(), hudiProperties.getTableName());
       options.put(FlinkOptions.WRITE_TASKS.key(), String.valueOf(4));
       // Hive sync via the Hive metastore
       options.put(FlinkOptions.HIVE_SYNC_ENABLED.key(), String.valueOf(true));
       options.put(FlinkOptions.HIVE_SYNC_MODE.key(), HiveSyncMode.HMS.name());
       options.put(FlinkOptions.HIVE_SYNC_METASTORE_URIS.key(), hudiProperties.getHiveMetastoreUris());
       options.put(FlinkOptions.HIVE_SYNC_DB.key(), hudiProperties.getDatabase());
       options.put(FlinkOptions.HIVE_SYNC_CONF_DIR.key(), hudiProperties.getHiveConfDir());
       // Async clustering
       options.put(FlinkOptions.CLUSTERING_SCHEDULE_ENABLED.key(), String.valueOf(true));
       options.put(FlinkOptions.CLUSTERING_ASYNC_ENABLED.key(), String.valueOf(true));
       options.put(FlinkOptions.CLUSTERING_TASKS.key(), String.valueOf(4));
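
For reference, a minimal sketch of how an options map like this is typically wired into a Flink streaming job via the HoodiePipeline builder that ships with the Flink bundle (the schema, primary key, and partition field below are hypothetical placeholders, not taken from this issue):

    import java.util.Map;

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.table.data.RowData;
    import org.apache.hudi.util.HoodiePipeline;

    // A minimal sketch, assuming an upstream DataStream<RowData> already exists
    // and that checkpointing is enabled on the environment (async clustering is
    // driven by checkpoint completion).
    public static void writeToHudi(DataStream<RowData> source, Map<String, String> options) {
      HoodiePipeline.Builder builder = HoodiePipeline.builder("ods_offer_inst")
          .column("id BIGINT")           // hypothetical columns -- use the real table schema
          .column("name STRING")
          .column("part_field STRING")
          .pk("id")                      // hypothetical primary key
          .partition("part_field")       // hypothetical partition field
          .options(options);             // the options map built above
      builder.sink(source, false);       // false => unbounded (streaming) input
    }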

Environment Description

  • Hudi version : 0.13.0

  • Flink version : 1.16

  • Hive version :

  • Hadoop version : 3.3.3

  • Storage (HDFS/S3/GCS..) : hdfs

  • Running on Docker? (yes/no) : no

Stacktrace

org.apache.hudi.exception.HoodieClusteringException: Error reading input data for hdfs://ctyunns/user/yxfcenter/hudi/tables/tele_table/tele_table/ods_offer_inst/8140000/f54559fc-95cc-428e-bb45-096d8858d0c9-0_1-4-6_20241229153814597.parquet and []
	at org.apache.hudi.sink.clustering.ClusteringOperator.lambda$null$4(ClusteringOperator.java:337) ~[hudi-flink1.16-bundle-0.13.0.jar:0.13.0]
	at java.lang.Iterable.spliterator(Iterable.java:101) ~[?:1.8.0_352]
	at org.apache.hudi.sink.clustering.ClusteringOperator.lambda$readRecordsForGroupBaseFiles$5(ClusteringOperator.java:341) ~[hudi-flink1.16-bundle-0.13.0.jar:0.13.0]
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_352]
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384) ~[?:1.8.0_352]
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) ~[?:1.8.0_352]
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_352]
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[?:1.8.0_352]
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_352]
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566) ~[?:1.8.0_352]
	at org.apache.hudi.sink.clustering.ClusteringOperator.readRecordsForGroupBaseFiles(ClusteringOperator.java:342) ~[hudi-flink1.16-bundle-0.13.0.jar:0.13.0]
	at org.apache.hudi.sink.clustering.ClusteringOperator.doClustering(ClusteringOperator.java:242) ~[hudi-flink1.16-bundle-0.13.0.jar:0.13.0]
	at org.apache.hudi.sink.clustering.ClusteringOperator.lambda$processElement$0(ClusteringOperator.java:194) ~[hudi-flink1.16-bundle-0.13.0.jar:0.13.0]
	at org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:130) ~[hudi-flink1.16-bundle-0.13.0.jar:0.13.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_352]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_352]
	at java.lang.Thread.run(Thread.java:750) [?:1.8.0_352]
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://ctyunns/user/yxfcenter/hudi/tables/tele_table/tele_table/ods_offer_inst/8140000/f54559fc-95cc-428e-bb45-096d8858d0c9-0_1-4-6_20241229153814597.parquet
	at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1757) ~[hadoop-hdfs-client-3.3.3.jar:?]
	at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1750) ~[hadoop-hdfs-client-3.3.3.jar:?]
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-3.3.3.jar:?]
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1765) ~[hadoop-hdfs-client-3.3.3.jar:?]
	at org.apache.parquet.hadoop.ParquetReader$Builder.build(ParquetReader.java:337) ~[hudi-flink1.16-bundle-0.13.0.jar:0.13.0]
	at org.apache.hudi.io.storage.HoodieAvroParquetReader.getIndexedRecordIteratorInternal(HoodieAvroParquetReader.java:168) ~[hudi-flink1.16-bundle-0.13.0.jar:0.13.0]
	at org.apache.hudi.io.storage.HoodieAvroParquetReader.getIndexedRecordIterator(HoodieAvroParquetReader.java:94) ~[hudi-flink1.16-bundle-0.13.0.jar:0.13.0]
	at org.apache.hudi.io.storage.HoodieAvroParquetReader.getRecordIterator(HoodieAvroParquetReader.java:73) ~[hudi-flink1.16-bundle-0.13.0.jar:0.13.0]
	at org.apache.hudi.sink.clustering.ClusteringOperator.lambda$null$4(ClusteringOperator.java:334) ~[hudi-flink1.16-bundle-0.13.0.jar:0.13.0]
	... 16 more

danny0405 (Contributor) commented

The plan is broken; you need to manually remove the clustering plan from the timeline and re-schedule a new one. 0.13.0 has some issues with recovery. Did you have a chance to upgrade to 0.14.1, 0.15.0, or 1.x?
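
To make the manual step concrete: in Hudi 0.x a scheduled clustering plan lives on the timeline as a replacecommit instant, so removing the broken plan means deleting its .replacecommit.requested (and .inflight, if present) files under .hoodie while the job is stopped. Below is a minimal sketch using the Hadoop FileSystem API; the instant time is a hypothetical placeholder (list .hoodie to find the real one), and only pending marker files are deleted, never completed commits:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RemovePendingClusteringPlan {
      public static void main(String[] args) throws Exception {
        // Substitute the real table base path and the broken plan's instant time.
        String basePath = "hdfs://ctyunns/user/yxfcenter/hudi/tables/tele_table/tele_table/ods_offer_inst";
        String brokenInstant = "20241229000000000"; // hypothetical placeholder

        Path timeline = new Path(basePath, ".hoodie");
        FileSystem fs = FileSystem.get(timeline.toUri(), new Configuration());
        for (FileStatus status : fs.listStatus(timeline)) {
          String name = status.getPath().getName();
          // Delete only the pending plan's requested/inflight files; a completed
          // instant ends in ".replacecommit" and must be kept.
          if (name.startsWith(brokenInstant)
              && (name.endsWith(".replacecommit.requested")
                  || name.endsWith(".replacecommit.inflight"))) {
            System.out.println("Deleting pending clustering instant file: " + name);
            fs.delete(status.getPath(), false);
          }
        }
        fs.close();
      }
    }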

gejinzh commented Jan 3, 2025

> The plan is broken; you need to manually remove the clustering plan from the timeline and re-schedule a new one. 0.13.0 has some issues with recovery. Did you have a chance to upgrade to 0.14.1, 0.15.0, or 1.x?

Thanks for your reply.
Should I remove all the clustering plans, or only the broken one?
Can the upgrade remain compatible with the existing data files and timeline generated by 0.13.0?

danny0405 (Contributor) commented

0.x is compatible, but 1.x may not be (there is an auto-upgrade flow; you should test it first).

gejinzh commented Jan 3, 2025

Thanks, I will try.

gejinzh closed this as completed Jan 7, 2025