Localized COLLATE on a SQLite string comparison

I want to compare two strings in a SQLite database without caring about accents or capitalization. I mean “Événement” should be equal to “evenèment”.

On Debian Wheezy, the SQLite package does not provide ICU. So I compiled the official SQLite package (version 3.7.15.2 2013-01-09 11:53:05), which contains an ICU module. Now I do have better Unicode support (the original lower() only works for ASCII characters; now it works for other letters). But I can't manage to apply a collation to a comparison.

SELECT icu_load_collation('fr_FR', 'FRENCH');
SELECT 'événement' COLLATE FRENCH = 'evenement';
-- 0 (should be 1)
SELECT 'Événement' COLLATE FRENCH = 'événement';
-- 0 (should be 1 if collation was case-insensitive)
SELECT lower('Événement') = 'événement';
-- 1 (at least lower() works as expected with Unicode strings)

The SQLite documentation confirms that this is the correct way to apply a collation. I think the documentation of this ICU extension is a bit light (few examples, and nothing about case sensitivity for collations).

I don’t understand why the COLLATE operator has no effect in the above example. Please help.

It took me a few hours to understand the situation... The way ICU collations are defined in SQLite has (almost) no effect on comparisons. According to ICU, one exception is Hebrew text with cantillation marks. This is the default behavior of the ICU library's collation. With SQLite, LIKE becomes case-insensitive when ICU is loaded, but normalization of accented letters cannot be achieved this way.

I finally understood that what I needed was to set the strength of the collation to the primary level rather than the default tertiary level.
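To make the strength levels concrete, here is a minimal sketch in Python. It does not use ICU: it only approximates the three levels with the standard `unicodedata` module, assuming that "primary" means base letters only, "secondary" adds accents, and "tertiary" adds case. The helper names `collation_key` and `equal_at` are made up for this illustration.

```python
import unicodedata


def collation_key(text, strength):
    """Approximate a comparison key for an ICU-like strength.

    strength 1 (primary):   ignore accents and case
    strength 2 (secondary): keep accents, ignore case
    strength 3 (tertiary):  keep accents and case
    This is an approximation for illustration, not ICU's algorithm.
    """
    # Decompose so that accents become separate combining characters.
    decomposed = unicodedata.normalize("NFD", text)
    if strength == 1:
        # Drop combining marks (the accents), then fold case.
        base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
        return base.casefold()
    if strength == 2:
        # Fold case only; re-normalize because case folding may change the form.
        return unicodedata.normalize("NFD", decomposed.casefold())
    return decomposed


def equal_at(a, b, strength):
    """Return True if the two strings compare equal at the given strength."""
    return collation_key(a, strength) == collation_key(b, strength)


print(equal_at("Événement", "evenèment", 1))  # True: accents and case ignored
print(equal_at("Événement", "événement", 2))  # True: only case differs
print(equal_at("Événement", "événement", 3))  # False: case matters
print(equal_at("événement", "evenement", 2))  # False: accents matter
```

The last two lines correspond to the two comparisons in the question that returned 0 under the default strength.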

I found no way to set this through the locale
(for example, several variants of SELECT icu_load_collation('fr_FR,strength=0', 'french') were useless).
So the only solution was to patch the SQLite code.
That was easy, since the ucol_setStrength() function is available in the ICU API.

The smallest change is a one-line patch: add the line ucol_setStrength(pUCollator, 0); after pUCollator = ucol_open(zLocale, &status); in the function icuLoadCollation() (0 is the value of ICU's UCOL_PRIMARY constant).
For a backward-compatible change, I added an optional third parameter to icu_load_collation() that sets the strength: 0 means default, 1 means primary, and so on up to 4 for quaternary.
See the diff.

Finally I have what I want:

SELECT icu_load_collation('fr_FR', 'french_ci', 1); -- collation with strength=primary
SELECT 'Événement' COLLATE french_ci = 'evenèment';
-- 1
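For readers who cannot rebuild SQLite, a similar result can be sketched from Python without ICU or the patch. This is a different technique from the one above: it registers a user-defined collation through the standard `sqlite3` module's `create_collation()` and compares strings after stripping accents and folding case. The names `fold` and `french_ci` are chosen for this illustration, and the ordering it produces is plain code-point order of the folded strings, not true French collation order.

```python
import sqlite3
import unicodedata


def fold(text):
    """Strip combining marks (accents) and fold case."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


def french_ci(a, b):
    """Collation callback: negative, zero, or positive, as SQLite expects."""
    ka, kb = fold(a), fold(b)
    return (ka > kb) - (ka < kb)


conn = sqlite3.connect(":memory:")
conn.create_collation("french_ci", french_ci)

# Same comparison as in the answer, plus one that should not match.
same = conn.execute("SELECT 'Événement' COLLATE french_ci = 'evenèment'").fetchone()[0]
other = conn.execute("SELECT 'Événement' COLLATE french_ci = 'avènement'").fetchone()[0]
print(same, other)  # 1 0
```

A collation registered this way exists only on the connection that registered it, so it has to be created again each time the database is opened.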

