116 Posts
nycxav
10 years ago

Hi, 

I'm working on a Seblod site with content in German, so I have specific characters such as ß and umlauted vowels. The content is a directory with a lot of family names and places for which the spelling is not always consistent: ß can be replaced by ss or even sz, and umlauted characters are sometimes spelled ae, oe, ue, or even simply a, o, or u.

MySQL is using utf8 with the utf8_unicode_ci collation, as recommended for German. The database will probably be extended to include names from Eastern European countries and must accommodate their specific characters too.

I don't want to change anything in the database, but since it will be accessed by both native German speakers and non-German speakers, I'd like a search for ß to return all content with ß but also ss and sz; a search for a vowel without an umlaut to return both the umlauted and non-umlauted versions; a search for a vowel with an umlaut to return both the umlauted vowel and its ae, oe, ue version; and, conversely, a search for ae, oe, ue to return umlauted vowels as well.

Is there a way to set up a rule in the search fields to do so?

Any other solution welcome.

Thanks.

Xav.

175 Posts
webcastor
10 years ago

Maybe what you need is a match criterion based on SOUNDEX.

116 Posts
nycxav
10 years ago

Hi,

Thanks for the suggestion. How would I implement that in Seblod, though?

Besides, does "soundex" know which language the content is in?

Levenshtein distance may actually work better, but again, how would I implement it?

Any pointers on how to create a search "match" plugin?

Thanks.

Xav.
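
For reference, the Levenshtein distance mentioned above can be sketched in a few lines. This is only an illustration in Python, not Seblod code: the helper name `close_matches` and the `max_distance` threshold of 2 are assumptions made for the example. As far as I know MySQL has no built-in Levenshtein function, so a comparison like this would have to run in application code or in a custom stored function. Note that it is no more language-aware than SOUNDEX; it only counts single-character edits.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions or
    substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def close_matches(term, names, max_distance=2):
    """Hypothetical helper: keep the names within max_distance edits
    of the search term, ignoring case."""
    term = term.lower()
    return [n for n in names if levenshtein(term, n.lower()) <= max_distance]


print(levenshtein("Müller", "Mueller"))  # 2: ü -> u, then insert e
print(close_matches("Mueller", ["Müller", "Meier", "Muller", "Schmidt"]))
```

With a threshold of 2, "Mueller" finds both "Müller" and "Muller" but not "Meier". The threshold is a trade-off: short names will start matching unrelated entries if it is set too high.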

10 years ago

Just an idea:

If you convert the search terms into regular expressions before using them in a query,
you could replace each of these characters with a regex that matches all of its equivalent spellings.

Then a regex match would be sufficient.
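
A minimal sketch of that idea, in Python purely for illustration (this is not Seblod code, and the names `EQUIVALENTS`, `term_to_regex` and `matches` are made up for the example). The equivalence rules are the ones listed in the original question: ß, ss and sz match each other; an umlauted vowel and its ae/oe/ue spelling match each other; a plain a, o or u also matches its umlauted form.

```python
import re

# Assumed equivalence rules, taken from the question above.
SHARP_S = "(?:ß|ss|sz)"
EQUIVALENTS = {"ß": SHARP_S, "ss": SHARP_S, "sz": SHARP_S}
for plain, umlaut in (("a", "ä"), ("o", "ö"), ("u", "ü")):
    digraph = plain + "e"
    EQUIVALENTS[umlaut] = f"(?:{umlaut}|{digraph})"   # ä  -> ä or ae
    EQUIVALENTS[digraph] = f"(?:{umlaut}|{digraph})"  # ae -> ä or ae
    EQUIVALENTS[plain] = f"[{plain}{umlaut}]"         # a  -> a or ä

# Two-letter spellings are tried first so "ue" is not read as "u" + "e".
TOKEN = re.compile(r"(ae|oe|ue|ss|sz|[ßäöüaou])|(.)", re.DOTALL)


def term_to_regex(term: str) -> str:
    """Rewrite a search term so every special spelling matches its variants."""
    parts = []
    for match in TOKEN.finditer(term.lower()):
        token, other = match.groups()
        # Any other character is escaped so it is matched literally.
        parts.append(EQUIVALENTS[token] if token else re.escape(other))
    return "".join(parts)


def matches(term: str, text: str) -> bool:
    """Case-insensitive search for the term or any equivalent spelling."""
    return re.search(term_to_regex(term), text, re.IGNORECASE) is not None


print(term_to_regex("müller"))         # m(?:ü|ue)ller
print(matches("Mueller", "Müller"))    # True
print(matches("strasse", "Straße"))    # True
```

The pattern uses Python's `re` syntax; to use it in a MySQL `REGEXP` clause it would likely need adapting, and multi-byte characters in `REGEXP` should be tested on the MySQL version in use. The rules also over-match on purpose: a search for "Bauer" reads the "ue" as a possible ü, which is usually acceptable for a name directory.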

175 Posts
webcastor
10 years ago

There is a file helper_workshop.php in /administrator/components/com_cck/helpers which contains all the search match string definitions.

/plugins/search/cck/cck.php should contain the code that converts the match criteria into a collection array.

P.S. While looking for the collection array, I stumbled upon the _download_hits function, which was mentioned in one of the posts in the last few days.
