
UTF-8 encoding with PHP/MySQL
December 1, 2005

Posted by chmod775 in MySQL.

UTF-8 is a character encoding widely used to internationalize web applications. I use it in an application that fetches RSS feeds from all over the world and saves them to a database such as MySQL.

In the beginning the app only fetched feeds from blogs and news sites in Latin-script languages. But as the app grew, feeds from China and Japan were added too. Their character sets were not supported, so the text showed up only as squares and other unreadable characters.

In many cases an RSS feed is encoded in a Unicode encoding such as UTF-8. A good RSS and Atom parser like MagpieRSS can handle these character sets and output everything in the desired encoding, such as UTF-8 or ISO-8859-1.
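If your parser does not do this conversion for you, the same normalisation can be done with mbstring directly. A minimal sketch, assuming the feed's source encoding is already known (for example from the XML declaration or the HTTP Content-Type header); `toUtf8()` is a hypothetical helper, not part of MagpieRSS:

```php
<?php
// Hypothetical helper: normalise feed text to UTF-8 before storing it.
// Assumes $sourceEncoding is already known; this is not MagpieRSS's own API.
function toUtf8(string $text, string $sourceEncoding): string
{
    if (strcasecmp($sourceEncoding, 'UTF-8') === 0) {
        return $text; // already UTF-8, nothing to convert
    }
    return mb_convert_encoding($text, 'UTF-8', $sourceEncoding);
}

// "café" as ISO-8859-1 bytes: é is the single byte 0xE9.
$latin1 = "caf\xE9";
echo toUtf8($latin1, 'ISO-8859-1'), "\n"; // prints café (é is now two bytes)
```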
I also had to set the character set of the database tables to UTF-8 (collation 'utf8_general_ci') so that none of the original characters in the feeds were lost.
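Setting the table charset only helps if the bytes you insert really are UTF-8: MySQL in strict mode rejects invalid sequences with an "Incorrect string value" error, and otherwise typically truncates the value at the bad byte. The connection charset has to match as well (for example via mysqli's `set_charset()`). Also worth knowing: MySQL's `utf8` stores at most 3 bytes per character, so characters outside the Basic Multilingual Plane, such as emoji, need `utf8mb4` (MySQL 5.5.3 and later). A small sketch of two hypothetical checks you might run before an INSERT:

```php
<?php
// Hypothetical pre-INSERT checks for a MySQL utf8 / utf8mb4 column.

// True when the bytes form valid UTF-8 (otherwise MySQL may reject or truncate).
function isStorableUtf8(string $text): bool
{
    return mb_check_encoding($text, 'UTF-8');
}

// True when the text contains a 4-byte character (U+10000 and above),
// which MySQL's 3-byte "utf8" cannot store but "utf8mb4" can.
function needsUtf8mb4(string $text): bool
{
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $text) === 1;
}

var_dump(isStorableUtf8("日本語のニュース")); // bool(true)
var_dump(isStorableUtf8("caf\xE9"));          // bool(false): raw ISO-8859-1 byte
var_dump(needsUtf8mb4("日本語"));              // bool(false)
var_dump(needsUtf8mb4("emoji \u{1F600}"));    // bool(true)
```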

That left only the PHP side: showing the items on screen or building an aggregated feed from them.
Before printing the text I do some string manipulation. After a while I read about the multi-byte string support in PHP (the mbstring extension) and tried some of its functions. That worked for a bit, but characters such as ë and é were still garbled on the screen.
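The plain string functions break because they count bytes, while UTF-8 uses several bytes for most non-ASCII characters. A minimal sketch comparing the byte-based and character-based functions; `shortenTitle()` is a hypothetical helper for trimming feed titles:

```php
<?php
// Hypothetical helper: trim a feed title to $maxChars characters (not bytes).
function shortenTitle(string $title, int $maxChars): string
{
    if (mb_strlen($title, 'UTF-8') <= $maxChars) {
        return $title;
    }
    // mb_substr counts characters, so it never cuts a multi-byte sequence in half.
    return mb_substr($title, 0, $maxChars, 'UTF-8') . '...';
}

$title = "日本語のニュース";
echo strlen($title), "\n";             // 24: bytes, each character here is 3 bytes
echo mb_strlen($title, 'UTF-8'), "\n"; // 8: characters
echo shortenTitle($title, 3), "\n";    // 日本語...
// By contrast, substr($title, 0, 4) would end in half a character: invalid UTF-8.
```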
Then I found the perfect hint in a comment:

$text = mb_convert_encoding($text, 'HTML-ENTITIES', "UTF-8");
That solved it. All the regular expressions using the mb_ereg_replace() function also worked as expected.
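A note for current PHP versions: the 'HTML-ENTITIES' target of mb_convert_encoding() is deprecated as of PHP 8.2. One possible replacement, sketched below, is mb_encode_numericentity(), which turns every non-ASCII character into a decimal entity (é becomes &#233;) and leaves ASCII alone. The sketch also shows a multi-byte regex replacement of the kind mb_ereg_replace() handles. `toNumericEntities()` and `replaceCjkPunctuation()` are hypothetical names, and the example assumes mbstring was built with regex support:

```php
<?php
// Possible replacement for mb_convert_encoding($text, 'HTML-ENTITIES', 'UTF-8').
// convmap = [start, end, offset, mask]: every code point from U+0080 to
// U+10FFFF becomes a decimal entity such as &#233;; ASCII is left untouched.
function toNumericEntities(string $text): string
{
    return mb_encode_numericentity($text, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
}

// Multi-byte-aware regex replacement: swap CJK full stops and commas for spaces.
function replaceCjkPunctuation(string $text): string
{
    mb_regex_encoding('UTF-8');
    return mb_ereg_replace('[。、]', ' ', $text);
}

echo toNumericEntities("café 日本"), "\n";           // caf&#233; &#26085;&#26412;
echo replaceCjkPunctuation("日本語。ニュース"), "\n"; // 日本語 ニュース
// preg_replace() with the /u modifier does the same job:
echo preg_replace('/[。、]/u', ' ', "日本語。ニュース"), "\n";
```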
The next step is to change the encoding of your HTML pages to UTF-8. Then you are done, and you have a web app with full Unicode support!
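Changing the page encoding means declaring UTF-8 both in the HTTP header and in the HTML itself. Once the page is served as UTF-8, converting text to entities is usually no longer necessary; only the HTML-special characters need escaping. A minimal sketch with hypothetical `renderItem()` and `renderPage()` helpers:

```php
<?php
// Hypothetical helpers: render feed titles into a page that declares UTF-8.
function renderItem(string $title): string
{
    // Escapes only <, >, &, and quotes; multi-byte characters pass through as-is.
    return '<li>' . htmlspecialchars($title, ENT_QUOTES, 'UTF-8') . '</li>';
}

function renderPage(array $titles): string
{
    $items = implode("\n", array_map('renderItem', $titles));
    return "<!DOCTYPE html>\n<html>\n<head><meta charset=\"utf-8\"><title>Feeds</title></head>\n"
        . "<body><ul>\n{$items}\n</ul></body>\n</html>\n";
}

if (!headers_sent()) {
    // The HTTP header must be sent before any output.
    header('Content-Type: text/html; charset=utf-8');
}
echo renderPage(["日本語のニュース", "Café & Co"]);
```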
Good articles about this issue:
- http://www.w3.org/International/questions/qa-changing-encoding
- http://www.php.net/mbstring

Comments

1. Petar Smilajkov - December 24, 2008

Thanks much! This really solved the issue :)