I am using the Parquet framework to write Parquet files.
I used this constructor to create a parquet writer –
public class ParquetBaseWriter extends ParquetWriter {
    public ParquetBaseWriter(Path file, HashMap mySchema,
                             CompressionCodecName compressionCodecName, int blockSize,
                             int pageSize) throws IOException {
        super(file, ParquetBaseWriter.writeSupport(mySchema),
              compressionCodecName, blockSize, pageSize, DEFAULT_IS_DICTIONARY_ENABLED, false);
    }
}
Every time a parquet file is created, a corresponding .crc file will be created on the disk.
How can I avoid creating the .crc files?
Is there a flag or something I need to set?
Thank you
You can see this Google Groups discussion about .crc files:
https://groups.google.com/a/cloudera.org/forum/#!topic/cdk-dev/JR45MsLeyTE
TL;DR: the .crc files do not take up NN namespace or add any overhead to the file. They are not HDFS data files; they are metadata files in the data directory. If you use the "file:///" URI, you will see them in the local file system.
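If you still want to stop the local file system from writing the .crc sidecar files, one option (a minimal sketch, not something the linked discussion prescribes) is to disable checksum writing on the Hadoop FileSystem before constructing the writer. setWriteChecksum is a standard FileSystem method, and the local ChecksumFileSystem honors it; the output path and configuration below are assumptions for illustration.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NoCrcExample {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();

        // Assumed local output path, for illustration only.
        Path outputPath = new Path("file:///tmp/out.parquet");

        // Resolve the FileSystem backing the path. For "file:///" this is the
        // LocalFileSystem, a ChecksumFileSystem that writes the .crc sidecars.
        FileSystem fs = outputPath.getFileSystem(conf);

        // Ask the file system not to write checksum (.crc) files for new output.
        fs.setWriteChecksum(false);

        // ... construct the ParquetWriter against outputPath as usual ...
    }
}

Whether this takes effect depends on how the writer obtains its FileSystem instance; as noted above, the .crc files are just metadata and are normally safe to leave alone.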