MySQL, JDBC, Unicode and You
A few notes on my experiences using MySQL from JDBC with Unicode support that might help others.
Yes, MySQL and the JDBC driver fully support UTF-8, and really it's quite easy to do, but there are a few things to be aware of.
First, you are best off creating your database with utf8 as the character set from the get-go. If the database uses another character set by default, you have to be careful that your CREATE/ALTER TABLE statements specify utf8 explicitly. The driver can also get confused (if you're not careful) when the database and table character sets don't match.
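As a rough sketch of the point above, the statements can spell out the character set instead of relying on the server default. The class name Utf8Schema and the database, table and column names are made up for illustration, and the names are not escaped or validated here.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper: builds DDL that names the character set explicitly.
final class Utf8Schema {
    // Character set name used throughout this post.
    static final String CHARSET = "utf8";

    // CREATE DATABASE with an explicit default character set.
    static String createDatabaseSql(String dbName) {
        return "CREATE DATABASE `" + dbName + "` CHARACTER SET " + CHARSET;
    }

    // CREATE TABLE that pins the table character set, which matters
    // when the database default is something other than utf8.
    static String createTableSql(String tableName, String columnDefs) {
        return "CREATE TABLE `" + tableName + "` (" + columnDefs
                + ") DEFAULT CHARACTER SET " + CHARSET;
    }

    // Runs both statements on an already open connection.
    static void apply(Connection conn, String dbName, String tableName,
                      String columnDefs) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.executeUpdate(createDatabaseSql(dbName));
            conn.setCatalog(dbName); // switch to the new database
            st.executeUpdate(createTableSql(tableName, columnDefs));
        }
    }
}
```

With the database default set this way, later tables inherit utf8 unless a statement says otherwise.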
Second, on the driver front, you should add the following parameters to your JDBC connection URL (I am assuming you are using the MySQL-supplied JDBC driver; I can't imagine why you wouldn't, really):
    useUnicode=true&characterEncoding=UTF-8
These parameters make sure that the driver uses the correct encoding. As mentioned above, if the database is set to use utf8 the driver will auto-detect this, but the difference between database and table character sets in MySQL can trip you up if you're not careful. Setting these parameters ensures it will work regardless of your setup.
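For illustration, here is one way the full URL might be assembled in Java. The class MySqlUrl and the host, port, database, user and password values are placeholders, not anything from a real setup.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper: appends the two encoding parameters to a MySQL JDBC URL.
final class MySqlUrl {
    static String utf8Url(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    // Opens a connection with the UTF-8 parameters applied.
    // Needs the MySQL JDBC driver on the classpath and a reachable server.
    static Connection open(String host, int port, String database,
                           String user, String password) throws SQLException {
        return DriverManager.getConnection(utf8Url(host, port, database), user, password);
    }
}
```

Keeping the parameters in one place means every connection in the application gets them, rather than only the ones where somebody remembered.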
In my opinion you should never use a character set in MySQL other than latin1 or utf8. If you are only ever going to store "ASCII" text then latin1 is fine, but if you are supporting any other character set or sets, just use utf8. It keeps things simple.
And to wrap up, a few notes on testing and displaying. If you are having problems with UTF-8 data and your database, test the data before you insert it to make sure it is what you think it is. A large number of JDBC-related encoding issues are caused by the data being mangled well before it is even stored in the database. Also make sure that you are identifying the data correctly on the way out to display.
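One simple way to check the data before it goes anywhere near the database is to print its code points and UTF-8 bytes, which does not depend on fonts or console encoding. This is only a sketch; the class and method names are made up.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical debugging helper for inspecting a String before inserting it.
final class Utf8Inspect {
    // Lists each code point as U+XXXX, separated by spaces.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    // Hex dump of the bytes the string encodes to in UTF-8.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String value = "caf\u00e9"; // "café", escaped to avoid source-encoding issues
        System.out.println(codePoints(value)); // U+0063 U+0061 U+0066 U+00E9
        System.out.println(utf8Hex(value));    // 63 61 66 C3 A9
    }
}
```

If the code points are already wrong at this stage (for example two characters where you expected one), the problem is upstream of JDBC.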
For quick and dirty stand-alone tests, JOptionPanes work well for viewing data (as long as you have fonts that can display the glyphs in question). For web projects there are two common mistakes. The first is not correctly identifying the outbound data as UTF-8 (which you can do very simply in your JSP page directive). The second is mangling the data on the way in, which is solved by setting the request encoding correctly in your servlet:
    request.setCharacterEncoding("UTF-8");
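To see what the inbound mangling looks like, the sketch below decodes UTF-8 bytes with the wrong charset, which is roughly what happens when a container falls back to ISO-8859-1 because no request encoding was set. The class name is made up, and it uses plain Java rather than the servlet API so it can run on its own. In a real servlet, setCharacterEncoding has to be called before any request parameters are read.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the same bytes decoded with the right and wrong charset.
final class RequestDecodingDemo {
    // Encodes the text as UTF-8 (as a browser would send it),
    // then decodes those bytes using the charset the server assumes.
    static String decodeAs(String original, Charset assumed) {
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);
        return new String(wire, assumed);
    }

    public static void main(String[] args) {
        String sent = "\u00e9"; // "é"
        // Wrong assumption: one character becomes two ("Ã©").
        System.out.println(decodeAs(sent, StandardCharsets.ISO_8859_1).length()); // 2
        // Correct assumption: the character survives intact.
        System.out.println(decodeAs(sent, StandardCharsets.UTF_8).equals(sent)); // true
    }
}
```

Data that arrives looking like "Ã©" in place of "é" is a strong hint that this is the step that went wrong, not the database.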
|© 2008 Max Stocker|