PostgreSQL upper function on the ASCII 152 character (“ÿ”)

On Windows 7, using PostgreSQL 9.3.9 with PgAdmin as the client, a select that applies upper to a column containing, for example, “ÿÿÿ” returns null. If several values are stored, for example,

"ada"
"john"
"mole"
"ÿÿÿ"

all of them are returned in uppercase except the row containing “ÿÿÿ”; for that row
nothing is returned, null…

The database encoding is UTF8 / UNICODE. “client_encoding” is set to the same value, UNICODE.

Is this a settings problem in the database, an operating system problem, or a bug
in the database? Are there any recommended workarounds?

The result of:

select thecol, upper(thecol), upper(thecol) is null, convert_to(thecol,'UTF8'), current_setting('server_encoding') from thetable where ...

is:

"Apps";"APPS";f;"Apps";"UTF8" 
"All";"ALL";f;"All";"UTF8"
"Test";"TEST";f;"Test";"UTF8"
"ÿÿÿ"; "";f;"037703770377";"UTF8"

The lc_ part of pg_settings is:

"lc_collate";"Swedish_Sweden.1252";"Shows the collation order locale."
"lc_ctype";"Swedish_Sweden.1252";"Shows the character classification and case conversion locale."
"lc_messages";"Swedish_Sweden.1252";"Sets the language in which messages are displayed."
"lc_monetary";"Swedish_Sweden.1252";"Sets the locale for formatting monetary amounts."
"lc_numeric";"Swedish_Sweden.1252";"Sets the locale for formatting numbers."

The output of select * from pg_database is:

"template1";10;6;"Swedish_Sweden.1252";"Swedish_Sweden.1252";t;t;-1;12130;668;1;1663;"{=c/postgres,postgres=CTc/postgres}"
"template0";10;6;"Swedish_Sweden.1252";"Swedish_Sweden.1252";t;f;-1;12130;668;1;1663;"{=c/postgres,postgres=CTc/postgres}"
"postgres";10;6;"Swedish_Sweden.1252";"Swedish_Sweden.1252";f;t;-1;12130;668;1;1663;""

For version 9.4.4, the actual create database statement is:

CREATE DATABASE postgres
WITH OWNER = postgres
ENCODING = 'UTF8'
TABLESPACE = pg_default
LC_COLLATE = 'Swedish_Sweden.1252'
LC_CTYPE = 'Swedish_Sweden.1252'
CONNECTION LIMIT = -1;

My guess is that the upper function uses the database’s LC_CTYPE setting. The uppercase form of LATIN SMALL LETTER Y WITH DIAERESIS (U+00FF) is LATIN CAPITAL LETTER Y WITH DIAERESIS (U+0178), which lies outside ISO 8859-1 (Latin-1); Windows 1252 carries it only in its extension range, at byte 0x9F.
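
The code points involved can be checked outside the database. The following is a minimal Python sketch; Python is not part of the original setup and is used here only to inspect Unicode case mapping and codec tables. The helper name `encodable` is made up for illustration, and Python's own tables need not match what the Windows C runtime does inside PostgreSQL.

```python
import unicodedata

lower = "\u00ff"       # ÿ, the character stored in the table
upper = lower.upper()  # Python's Unicode case mapping, not PostgreSQL's


def encodable(ch, encoding):
    """Return True if ch can be encoded in the given character set."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(f"U+{ord(lower):04X} {unicodedata.name(lower)}")
print(f"U+{ord(upper):04X} {unicodedata.name(upper)}")

# Which character sets can hold the lowercase and the uppercase form?
for enc in ("latin-1", "cp1252", "utf-8"):
    print(enc, encodable(lower, enc), encodable(upper, enc))
```

The uppercase form is a different code point that falls outside the 0x00 to 0xFF range shared with Latin-1, so any case conversion that works byte by byte within a single-byte locale has to treat it specially.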

If the string is first converted to Unicode, the upper function might work as expected:

SELECT upper(convert_to(thecol,'UTF8')) ...

You should probably use different values for LC_CTYPE and LC_COLLATE. On Linux, you would use sv_SE.UTF-8.

Nevertheless, I think this is a bug in Postgres. If the uppercase version cannot be represented in the target character set, it would be better to keep the ÿ.
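
The fallback suggested here (keep the original character when its uppercase form does not fit) can be sketched in a few lines of Python. This only illustrates the idea and is not how PostgreSQL implements upper; the function name `upper_keeping_unrepresentable` is hypothetical, and Latin-1 is used as the example target simply because it has no slot for U+0178.

```python
def upper_keeping_unrepresentable(text, encoding):
    """Uppercase text, leaving a character unchanged when its uppercase
    form cannot be encoded in the given target character set."""
    out = []
    for ch in text:
        up = ch.upper()
        try:
            up.encode(encoding)
        except UnicodeEncodeError:
            up = ch  # fall back to the original character
        out.append(up)
    return "".join(out)


# Latin-1 cannot hold U+0178, so the ÿ characters are kept as they are.
print(upper_keeping_unrepresentable("\u00ff\u00ff\u00ff", "latin-1"))
# Plain ASCII letters are uppercased as usual.
print(upper_keeping_unrepresentable("mole", "latin-1"))
```

With such a fallback the row would come back unchanged instead of empty, which is arguably the less surprising result.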
