How to avoid creating .crc files when writing Parquet files

I am using the Parquet framework to write Parquet files.
I used this constructor to create a Parquet writer:

public class ParquetBaseWriter extends ParquetWriter {
    // writeSupport(...) is a static helper of this class; its body is not shown here.
    public ParquetBaseWriter(Path file, HashMap mySchema,
            CompressionCodecName compressionCodecName, int blockSize,
            int pageSize) throws IOException {
        super(file, ParquetBaseWriter.writeSupport(mySchema),
                compressionCodecName, blockSize, pageSize, DEFAULT_IS_DICTIONARY_ENABLED, false);
    }
}

Every time a Parquet file is created, a corresponding .crc file is also created on disk.
How can I avoid creating the .crc file?
Is there a flag or something I need to set?

Thank you

See this Google Groups discussion about .crc files:
https://groups.google.com/a/cloudera.org/forum/#!topic/cdk-dev/JR45MsLeyTE

TL;DR: .crc files do not add any overhead to the NameNode (NN) namespace. They are not HDFS data files; they are meta files in the data directory. If you use a “file:///” URI, you will see them in the local file system.
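If the sidecar files on the local file system are still unwanted, one workaround is to delete them after the writer has been closed. The sketch below is not from the linked discussion. It assumes the files follow the naming that Hadoop's checksummed local file system uses (".<name>.crc" next to each data file), and the class and method names (CrcCleaner, deleteCrcFiles) are hypothetical. It uses only the Java standard library.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: removes ".<name>.crc" sidecar files from one directory.
public class CrcCleaner {

    /**
     * Deletes regular files directly inside dir whose names start with "."
     * and end with ".crc". Subdirectories are not visited.
     * Returns the number of files deleted.
     */
    public static int deleteCrcFiles(Path dir) throws IOException {
        List<Path> sidecars = new ArrayList<>();
        // The glob ".*.crc" matches names such as ".data.parquet.crc".
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, ".*.crc")) {
            for (Path candidate : stream) {
                if (Files.isRegularFile(candidate)) {
                    sidecars.add(candidate);
                }
            }
        }
        // Delete after the listing is complete so the stream is not modified mid-iteration.
        for (Path sidecar : sidecars) {
            Files.delete(sidecar);
        }
        return sidecars.size();
    }
}
```

A call such as CrcCleaner.deleteCrcFiles(outputDir) would go after the Parquet writer's close(), with outputDir pointing at the local output directory. Hadoop's FileSystem class also exposes setWriteChecksum(boolean), which may stop the files from being written in the first place; whether it takes effect depends on how the writer obtains its file system, so treat that as something to verify rather than a guaranteed fix.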

