I want to compare two strings in an SQLite database without regard to accents or case. That is, "Événement" should be equal to "evenèment".
On Debian Wheezy, the SQLite package does not provide ICU, so I compiled the official SQLite package (version 3.7.15.2 2013-01-09 11:53:05), which contains an ICU module. Now I do have better Unicode support (the original lower() only worked on ASCII characters; now it works on other letters too). But I cannot manage to apply a collation to a comparison.
SELECT icu_load_collation('fr_FR', 'FRENCH');
SELECT 'événement' COLLATE FRENCH = 'evenement';
-- 0 (should be 1)
SELECT 'Événement' COLLATE FRENCH = 'événement';
-- 0 (should be 1 if collation was case-insensitive)
SELECT lower('Événement') = 'événement';
-- 1 (at least lower() works as expected with Unicode strings)
The SQLite documentation confirms that this is the correct way to apply a collation. I think the documentation of this ICU extension is a bit light (few examples, and nothing on case sensitivity for collations).
I don't understand why the COLLATE operator has no effect in the above example. Please help.
It took me a few hours to understand the situation. The way ICU collations are defined in SQLite has (almost) no effect on comparisons. The one exception, according to ICU, is Hebrew text with cantillation marks. This is the default behavior of the ICU library's collation. With SQLite, LIKE becomes case-insensitive when ICU is loaded, but accented letters cannot be normalized that way.
I finally understood that what I needed was to set the strength of the collation to the primary level instead of the default tertiary level.
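To make the strength levels concrete, here is a rough sketch in Python (standard library only, not ICU). It approximates the first three comparison levels with unicodedata: primary looks at base letters only, secondary also looks at accents, and tertiary also looks at case. The names strip_marks and comparison_key are made up for this illustration, and the keys only approximate what ICU computes; they are good enough for the French examples in this thread, nothing more.

```python
import unicodedata


def strip_marks(text):
    """Decompose to NFD and drop combining marks (accents)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def comparison_key(text, strength):
    """Approximate equality key for a strength level (NOT a real ICU sort key)."""
    if strength == "primary":
        # Base letters only: accents and case are both ignored.
        return strip_marks(text).casefold()
    if strength == "secondary":
        # Accents matter, case does not.
        return unicodedata.normalize("NFC", text).casefold()
    if strength == "tertiary":
        # Accents and case both matter (ICU's default level).
        return unicodedata.normalize("NFC", text)
    raise ValueError("unknown strength: %r" % (strength,))


def equal_at(a, b, strength):
    """True if a and b compare equal at the given approximate level."""
    return comparison_key(a, strength) == comparison_key(b, strength)


for level in ("primary", "secondary", "tertiary"):
    print(level, equal_at("Événement", "evenèment", level))
```

Under this approximation the two words are equal only at the primary level, which matches the behavior asked for in the question.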
I found no way to set the strength through the locale string (for example, several variants of SELECT icu_load_collation('fr_FR,strength=0', 'french'); had no effect).
So the only solution was to patch the SQLite code.
That was easy, since the ICU API provides the ucol_setStrength() function.
The minimal change is a one-line patch: add the line ucol_setStrength(pUCollator, 0); after pUCollator = ucol_open(zLocale, &status); in the function icuLoadCollation(). Here 0 is the value of UCOL_PRIMARY in ICU's UCollationStrength enum.
For a backward-compatible change, I added an optional third parameter to icu_load_collation() that sets the strength: 0 means default, 1 means primary, and so on, up to 4 for quaternary.
See diff.
Finally I have what I want:
SELECT icu_load_collation('fr_FR', 'french_ci', 1); -- collation with strength=primary
SELECT 'Événement' COLLATE french_ci = 'evenèment';
-- 1
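For anyone who cannot rebuild SQLite, a similar equality test can be approximated without ICU. The sketch below is not the patch described in this answer: it swaps in Python's sqlite3 module and registers a custom collation with create_collation(). The collation name french_ci is reused from the example; primary_key is a hypothetical helper that strips combining marks and case-folds. It only imitates primary-strength equality. Ordering follows code points of the stripped text, not French collation rules.

```python
import sqlite3
import unicodedata


def primary_key(text):
    """Strip accents and case: a crude stand-in for a primary-strength key."""
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


def french_ci(a, b):
    """Collation callback: negative, zero or positive, like strcmp()."""
    ka, kb = primary_key(a), primary_key(b)
    return (ka > kb) - (ka < kb)


conn = sqlite3.connect(":memory:")
conn.create_collation("french_ci", french_ci)

# Same comparison as in the answer, but using the Python-side collation.
(equal_with_collation,) = conn.execute(
    "SELECT 'Événement' COLLATE french_ci = 'evenèment'"
).fetchone()

# Without a collation, SQLite compares the strings byte by byte.
(equal_binary,) = conn.execute(
    "SELECT 'Événement' = 'evenèment'"
).fetchone()

print(equal_with_collation, equal_binary)
```

The collation has to be registered on every connection that uses it, and it lives in the host program rather than in SQLite itself, so it is no help from the sqlite3 command-line shell.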