Wednesday, June 9, 2010

migrating from latin1 to utf8 in mysql : part 2

As promised the second part of this article.

to locate the double encoded characters, we made use of an user defined library LIB_MYSQLUDF_PREG

this function allows you to return the position where a certain pattern occurs.
let me explain with this example :

SELECT * FROM Address WHERE
MOD(preg_position('(C{1}[2-9]{1}[8-9A-B]{1}[0-9A-F]{1})',hex(line6)),2) = 1;

this will return us the records where the starting position of a hex string which starts with C2 till C9 and as a second part from 80 till BF is uneven.

so the string ß => hex c39f would be filtered out, because indeed the condition is met

So far how to locate the different cases the next part will tell how we did the whole migration to minimize the downtime.