.NET – Keep text file encoding (ASCII, UTF-8, UTF-16)

I have a simple text file processing tool written in C#; the skeleton is as follows:

using (StreamReader reader = new StreamReader(absFileName, true)) // auto detect encoding
using (StreamWriter writer = new StreamWriter(tmpFileName, false, reader.CurrentEncoding)) // open writer with the same encoding as reader
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        // do something with line
        writer.WriteLine(line);
    }
}

Most of the files it runs on are ASCII, and occasionally UTF-16. I want to preserve the file encoding: the newly created file should have the same encoding as the file being read. This is why I open the StreamWriter with the reader’s CurrentEncoding.

My problem is that some UTF-16 files are missing the preamble, and after the StreamReader is opened, its CurrentEncoding is set to UTF-8, which causes the writer to open in UTF-8 mode. When debugging, I can see that the reader changes its CurrentEncoding property to UTF-16 after the first call to ReadLine, but by then the writer has already been opened.

I can think of some workarounds (opening the writer later, or reading the source file twice, the first time just to detect the encoding), but I thought I would ask the experts for their opinion first. Please note that I don’t care about the code page of ASCII files; I only care about ASCII / UTF-8 / UTF-16 encoding.

I would try calling reader.Peek() before opening the writer. Peek() should be enough in your case, I think.
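A minimal sketch of that idea, assuming (as described in the question) that the reader only settles on its encoding once it has read its first buffer. The class name EncodingPreservingCopy and the method CopyLines are hypothetical, introduced only for illustration. The writer is created after Peek(), so it sees whatever CurrentEncoding the reader reports at that point:

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingPreservingCopy
{
    // Hypothetical helper: copies inputPath to outputPath line by line and
    // returns the encoding the reader reported after its first buffer read.
    public static Encoding CopyLines(string inputPath, string outputPath)
    {
        using (var reader = new StreamReader(inputPath, true)) // auto detect encoding
        {
            // Peek() does not consume a character, but it makes the reader
            // fill its buffer, which is when encoding detection runs.
            reader.Peek();
            Encoding detected = reader.CurrentEncoding;

            // Open the writer only now, with the detected encoding.
            using (var writer = new StreamWriter(outputPath, false, detected))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    // do something with line
                    writer.WriteLine(line);
                }
            }
            return detected;
        }
    }
}
```

Whether this is enough for your files depends on what the reader manages to detect from the first bytes, so it is worth checking the returned encoding against a few of the problematic UTF-16 files.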
