Correcting Wrong Character Encoding In MySQL

Monday, March 16, 2009 - 15:10

Sometimes, especially when moving data from one server to another, you might find that you have encoded your MySQL database incorrectly. This problem will first show itself if you have the database encoded in one charset and your website set to display in another. If this is the case, you will find strange characters appearing in your text, especially around punctuation marks. If you are unable or unwilling to change the character encoding on the site, you need to change how the data is encoded in the database.

The most common change you might want to make is from iso-8859-1 (or windows-1252), which MySQL calls latin1, to UTF-8. This can be done in one of two ways.

The first way is to simply alter the table so that the column contains a different charset.
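
A minimal sketch of this kind of statement, assuming a hypothetical VARCHAR(255) column (the real column type is not given here, so substitute your own) and the placeholder names table and col1:

```sql
-- Assumption: col1 is a VARCHAR(255); repeat the column's real definition here.
-- `table` is a placeholder and is backquoted because TABLE is a reserved word.
ALTER TABLE `table` MODIFY col1 VARCHAR(255) CHARACTER SET utf8;

-- Alternatively, convert every character column in the table in one go.
ALTER TABLE `table` CONVERT TO CHARACTER SET utf8;
```

Both forms also convert the values already stored, which gives the right result only when the existing data really is in the charset the column declares. On MySQL 5.5.3 and later, utf8mb4 can be used in place of utf8 to cover four-byte characters.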

However, if your database has already been set up and your data has already been inserted in the wrong format, you can also update the data in the column using the CONVERT function. The following snippet turns our latin1 data into raw binary data and then into utf8.

```sql
-- `table` and col1 are placeholders; `table` is backquoted because TABLE is a reserved word in MySQL.
UPDATE `table` SET col1 = CONVERT(CONVERT(CONVERT(col1 USING 'latin1') USING BINARY) USING 'utf8');
```
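
Because the UPDATE rewrites the data in place, it can help to preview the conversion first. The following is only a sketch, using the same placeholder table and column names, that shows the stored value, its raw bytes and the converted result side by side for a few rows:

```sql
-- Preview the conversion without changing any data.
SELECT col1,
       HEX(col1) AS stored_bytes,
       CONVERT(CONVERT(CONVERT(col1 USING 'latin1') USING BINARY) USING 'utf8') AS converted
FROM `table`
LIMIT 10;
```

If the converted column does not look right for a sample of rows, the UPDATE will not repair the data either, so it is worth taking a backup before running it.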

You should also make sure that the connection to the database uses a specific character set. This is done with the SET NAMES and SET CHARACTER SET commands.
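
As a minimal sketch, assuming utf8 is the target charset, the two statements can be issued straight after connecting and the resulting session variables inspected:

```sql
-- Sets character_set_client, character_set_connection and character_set_results.
SET NAMES 'utf8';

-- Sets character_set_client and character_set_results, and sets
-- character_set_connection to the default database's charset.
SET CHARACTER SET 'utf8';

-- Check what the session ended up with.
SHOW VARIABLES LIKE 'character_set%';
```

Most client libraries also offer their own way of choosing the connection charset, which may be preferable to sending these statements by hand.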

These two commands set the session variables that control the connection character set. For more information on what is set, see the Connection Character Sets and Collations page on the MySQL website (http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html). This ensures that the data we get back from the database is also in the correct charset.
 
For a full list of the different character sets available in MySQL just run the command:

```sql
SHOW CHARACTER SET;
```

This will display a table with the columns Charset, Description, Default collation and Maxlen. Each charset has a default collation. A collation is a set of rules for comparing characters in a charset, so it is important to get this right if you want sorting and comparisons to behave as expected. The full list of collations can be viewed using the following command:
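
In its simplest form this is a bare SHOW COLLATION statement. A minimal sketch:

```sql
-- Lists every collation the server knows about, with the charset each one belongs to.
SHOW COLLATION;
```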

You can even use a WHERE or LIKE clause to narrow the list of collations down to the ones you are looking for.

```sql
SHOW COLLATION WHERE Charset LIKE '%utf%';
```

Philip Norton

Phil is the founder and administrator of #! code and is an IT professional working in the North West of the UK.

Comments

Awesome! I had an issue with Spanish words, and this method helped.
