Matthijs C. de Jonge



Forcing MySQL to output UTF-8 in Python

No matter how you’ve defined your tables (as containing UTF-8 data only, for example), the MySQLdb module, which handles talking to a MySQL database from Python, simply refuses to return your data as proper UTF-8. It took me nearly an hour (and most of my hair) to figure out how to force it to do so anyway, and I’d like to spare others the same fate, so here’s what I learned:

Make the connection:

  import MySQLdb
  connection = MySQLdb.connect(host=dbhost, user=dbuser, passwd=dbpass, db=dbname)

And before you do anything else, run a query that forces MySQL to output UTF-8 from now on:

  cursor = connection.cursor(MySQLdb.cursors.DictCursor) # or some other cursor type
  cursor.execute("SET character_set_results = 'utf8'") # single quotes also work when the server runs in ANSI_QUOTES mode

Once you’ve done this, all your subsequent queries will return your data in proper UTF-8 encoding.
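
If you open connections in more than one place, the two steps above can be folded into a small helper. What follows is only a sketch: force_utf8_results and results_charset are names I made up for illustration, not part of MySQLdb. It assumes a connection object with the usual cursor() method, and the check at the end assumes the default cursor, which returns rows as tuples.

```python
def force_utf8_results(connection):
    """Ask the MySQL server to send result sets as UTF-8 on this connection."""
    cursor = connection.cursor()
    try:
        # Same statement as above; it only affects the current connection.
        cursor.execute("SET character_set_results = 'utf8'")
    finally:
        cursor.close()
    return connection


def results_charset(connection):
    """Return the server's current character_set_results value, or None."""
    cursor = connection.cursor()
    try:
        cursor.execute("SHOW VARIABLES LIKE 'character_set_results'")
        # Assumes a tuple-returning cursor: (variable name, value).
        row = cursor.fetchone()
    finally:
        cursor.close()
    return row[1] if row else None
```

Usage would look something like connection = force_utf8_results(MySQLdb.connect(host=dbhost, user=dbuser, passwd=dbpass, db=dbname)), after which results_charset(connection) should report the character set the server is now using for results.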

The fact that you have to run a query suggests that it’s not Python or MySQLdb that’s to blame, by the way. Not that I particularly care, of course. I just want it to work.