String.getBytes() and the Chinese encoding problem

String.getBytes() with no arguments returns a byte array in the platform's default encoding, which has traditionally been taken from the operating system's settings. This means the bytes returned can differ from one operating system to another!

byte[] defaultBytes = "中".getBytes();
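The platform dependence can be seen with a small sketch like the one below. The class and method names (DefaultCharsetDemo, encodeWithDefault, encodeWithUtf8) are made up for illustration, and the character is written as the escape \u4e2d so the example does not depend on the encoding of the source file itself.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DefaultCharsetDemo {
    // Encodes with the JVM's default charset, so the result depends on the platform.
    static byte[] encodeWithDefault(String s) {
        return s.getBytes();
    }

    // Encodes with an explicitly named charset, so the result is the same everywhere.
    static byte[] encodeWithUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "\u4e2d"; // the character 中
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("Default bytes:   " + Arrays.toString(encodeWithDefault(s)));
        System.out.println("UTF-8 bytes:     " + Arrays.toString(encodeWithUtf8(s)));
    }
}
```

Only the UTF-8 line is guaranteed to print the same three bytes on every machine; the other two lines report whatever the local default happens to be.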

The String.getBytes(String charsetName) method returns the byte array representation of the string in the specified encoding, for example:

byte[] a = "中".getBytes("GBK"); // The length is 2

byte[] b = "中".getBytes("UTF-8"); // The length is 3

byte[] c = "中".getBytes("ISO8859-1"); // The length is 1
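A minimal sketch for checking these lengths follows. ByteLengthDemo and encodedLength are hypothetical names, and the sketch assumes the running JVM ships the GBK charset, which a reduced runtime image may not, so GBK is guarded with Charset.isSupported.

```java
import java.nio.charset.Charset;

public class ByteLengthDemo {
    // Returns how many bytes the string occupies in the named charset.
    static int encodedLength(String s, String charsetName) {
        return s.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        String s = "\u4e2d"; // the character 中
        for (String name : new String[] {"GBK", "UTF-8", "ISO-8859-1"}) {
            if (Charset.isSupported(name)) {
                System.out.println(name + ": " + encodedLength(s, name) + " byte(s)");
            } else {
                // Assumption: some minimal runtimes omit extended charsets such as GBK.
                System.out.println(name + ": not supported by this JVM");
            }
        }
    }
}
```

The single ISO-8859-1 byte is not a real encoding of the character: getBytes substitutes the charset's replacement byte, which is "?" (0x3F) for ISO-8859-1.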

Going the other way, you can restore the character "中" with new String(byte[], charsetName). This constructor parses the byte[] into a string using the specified encoding.

String s_gbk = new String(a, "GBK");
String s_utf8 = new String(b, "UTF-8");
String s_iso88591 = new String(c, "ISO8859-1");

If you print s_gbk, s_utf8 and s_iso88591, you will find that s_gbk and s_utf8 are both "中", while s_iso88591 is an unrecognized character (it prints as "?", in other words garbled text). Why can't "中" be restored after going through ISO8859-1? The reason is simple: the ISO8859-1 encoding table contains no Chinese characters at all, so "中".getBytes("ISO8859-1") cannot produce a correct encoded value for "中", and new String() has nothing to restore it from.

Therefore, when you obtain a byte[] through String.getBytes(String charsetName), you must make sure that every character in the String actually exists in the encoding table of that charset. Only then can the resulting byte[] be restored correctly.
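One way to check this up front is to ask the charset's encoder whether it can represent the string. The sketch below uses hypothetical names (CanEncodeDemo, canEncode, roundTrip) and sticks to UTF-8 and ISO-8859-1, which every Java runtime provides.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CanEncodeDemo {
    // True if every character of s has a mapping in the given charset.
    static boolean canEncode(String s, Charset cs) {
        return cs.newEncoder().canEncode(s);
    }

    // Encodes and then decodes with the same charset.
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String s = "\u4e2d"; // the character 中
        for (Charset cs : new Charset[] {StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1}) {
            System.out.println(cs + ": canEncode=" + canEncode(s, cs)
                    + ", round trip intact=" + roundTrip(s, cs).equals(s));
        }
    }
}
```

For ISO-8859-1 both checks come back false, because the round trip returns "?" instead of the original character.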

Note: Sometimes Chinese text has to fit a special requirement (for example, HTTP headers require their content to be ISO8859-1 encoded). In that case the Chinese characters can be carried byte by byte, like this:

String s_iso88591 = new String("中".getBytes("UTF-8"), "ISO8859-1");

The resulting string s_iso88591 is actually three ISO8859-1 characters. After these characters are passed to the destination, the destination program applies the reverse operation:

String s_utf8 = new String(s_iso88591.getBytes("ISO8859-1"), "UTF-8");

This yields the correct Chinese character "中", so the protocol's requirement is met and Chinese is still supported.
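The whole trip can be sketched as a pair of helper methods. The names Latin1TunnelDemo, wrap and unwrap are made up for this illustration. The trick relies on ISO-8859-1 mapping every byte value from 0x00 to 0xFF to exactly one character, so no byte is lost on the way.

```java
import java.nio.charset.StandardCharsets;

public class Latin1TunnelDemo {
    // Sender side: reinterpret the UTF-8 bytes as ISO-8859-1 characters.
    static String wrap(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Receiver side: recover the original bytes and decode them as UTF-8.
    static String unwrap(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4e2d"; // the character 中
        String wrapped = wrap(original);
        System.out.println("Wrapped length: " + wrapped.length()); // 3
        System.out.println("Restored intact: " + unwrap(wrapped).equals(original)); // true
    }
}
```

Both sides have to agree on the two charsets in advance. If the receiver decodes with anything other than UTF-8 in the last step, the result is garbled again.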
