07 character encoding

1. Character encoding
Character encoding applies to text only; it does not cover video, audio, etc.

2. The process of writing text
Entered characters >>> (character code table) >>> binary numbers

2.1 Character code table:
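A character code table is the correspondence between characters and numbers.
With 1 bit, two characters can be told apart:
a 0
b 1
With 2 bits, four characters can be told apart:
a 00
b 01
c 11
d 10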
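As a minimal sketch of the idea, the 2-bit table above can be written as a Python dictionary and used in both directions. The names CODE_TABLE, to_bits and from_bits are made up for this illustration; no real encoding uses this table.

```python
# Toy code table copied from the 2-bit example above (illustrative only).
CODE_TABLE = {"a": "00", "b": "01", "c": "11", "d": "10"}
# Reverse lookup: bit pattern -> character.
REVERSE_TABLE = {bits: ch for ch, bits in CODE_TABLE.items()}


def to_bits(text):
    """Look up each character in the table and join the bit patterns."""
    return " ".join(CODE_TABLE[ch] for ch in text)


def from_bits(bits):
    """Split on spaces and map each bit pattern back to its character."""
    return "".join(REVERSE_TABLE[b] for b in bits.split())


encoded = to_bits("abcd")
decoded = from_bits(encoded)
print(encoded)
print(decoded)
```

Real code tables such as ASCII or GBK work the same way, only with far more entries.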

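2.2 ASCII table
Uses 8 binary bits (1 byte) to represent one English character. All English letters plus symbols come to only 128 characters, so a single byte is more than enough.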
0000 0000
1111 1111
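A short sketch of how this looks in Python 3, assuming the built-in "ascii" codec: an English character becomes one byte, and a character outside the table cannot be encoded at all.

```python
# Encode one English character with the ASCII table.
raw = "a".encode("ascii")      # a bytes object of length 1
code = raw[0]                  # the number the table assigns to "a"
bits = format(code, "08b")     # the same number as 8 binary digits

# A Chinese character is not in the ASCII table, so encoding fails.
try:
    "中".encode("ascii")
    ascii_can_encode_chinese = True
except UnicodeEncodeError:
    ascii_can_encode_chinese = False

print(code, bits, ascii_can_encode_chinese)
```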
2.3 GBK
Uses 2 bytes to represent a Chinese character and 1 byte to represent an English character.
0000 0000 0000 0000
1111 1111 1111 1111
Two bytes give at most 65,536 (2^16) combinations.
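The byte counts can be checked with Python's built-in "gbk" codec. This is only a quick illustration; the sample characters are arbitrary.

```python
english = "a".encode("gbk")    # English character: 1 byte
chinese = "中".encode("gbk")   # Chinese character: 2 bytes

print(len(english), len(chinese))
print(chinese.hex())           # the two bytes in hexadecimal
```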
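2.4 Unicode (the universal code)
Uses 2 bytes uniformly to represent every character (as in the early fixed-width form, UCS-2).
a 0000 0000 0110 0001

Problems this causes:
1. Wastes storage space: English text takes twice as much space as it does in ASCII.
2. More I/O operations, so the program runs less efficiently (fatal)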
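The storage cost can be seen with a rough comparison. This sketch assumes Python's "utf-16-be" codec as a stand-in for the fixed 2-byte form described above (the two match for common characters) and compares it with utf-8.

```python
text = "hello"

fixed = text.encode("utf-16-be")   # 2 bytes for every character here
compact = text.encode("utf-8")     # 1 byte per English character

# The 16-bit pattern stored for the letter "a".
bits_a = format(int.from_bytes("a".encode("utf-16-be"), "big"), "016b")

print(len(fixed), len(compact), bits_a)
```

For plain English text the 2-byte form is twice as large, which is why a variable-length format such as utf-8 is preferred on disk.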

Extra:
The encoding used in memory and the encoding used on the hard disk are not the same.

1. The process of storing data:
Unicode in memory >>> encode >>> utf-8 format on the hard disk (stored as binary numbers)
2. The process of reading data:
utf-8 binary data on the hard disk >>> decode >>> Unicode in memory
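The store and read steps can be sketched in Python 3, where a str holds Unicode text in memory and bytes holds the encoded data. The file name demo.txt and the temporary directory are assumptions made for this example.

```python
import os
import tempfile

text = "你好 hello"                 # Unicode text in memory

# Storing: encode to utf-8 bytes, then write the bytes to disk.
data = text.encode("utf-8")
path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # hypothetical file
with open(path, "wb") as f:
    f.write(data)

# Reading: read the raw bytes from disk, then decode back to Unicode.
with open(path, "rb") as f:
    loaded = f.read().decode("utf-8")

print(len(data), loaded == text)
```

Opening the file in binary mode ("wb" / "rb") keeps the encode and decode steps explicit instead of leaving them to open()'s encoding parameter.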

3. Garbled text:
When the encoding used on the hard disk is inconsistent with the encoding used by the running environment, the text appears garbled.
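A small sketch that reproduces the problem on purpose: the text is encoded as utf-8 and then decoded once with the wrong table (gbk) and once with the right one. The errors="replace" argument is there only so the wrong decode cannot raise an exception.

```python
original = "你好"
stored = original.encode("utf-8")                # what ends up on disk

# Decoding with a different code table yields the wrong characters.
wrong = stored.decode("gbk", errors="replace")
# Decoding with the table used for encoding restores the text.
right = stored.decode("utf-8")

print(repr(wrong))
print(repr(right))
```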

The key to avoiding garbled text: whatever encoding was used to encode must also be used to decode.

4. The difference between encoding in Python 2 and Python 3:
4.1 Python 2
Reads the .py file into the interpreter as a text file and uses ASCII by default (because Unicode was not yet popular when the Python 2 interpreter was written).
4.1.2 File header
It uses utf-8 format
2. The Windows terminal uses gbk
