Let me guess: you just bumped into the
Incorrect string value: '\xF0\x9F\x8C\x86' for column 'column_name' at row 1 error. Fantastic! Looks like you want to store some emoji in that
MySQL database of yours. You’ll need to make sure your columns are
utf8mb4 encoded. Read on.
Why do I want utf8mb4?
Because you want to store emoji. That’s it, most of the time.
What’s the deal here?
Most posts out there tell you to just switch your whole database/table to
utf8mb4 as the default encoding. You most likely don’t want this. It’s a crappy, lazy approach, honestly. We wanted to provide emoji support in the app that powers
/docs - which happens to be a modest Django app - and all resources we found online insisted on shooting the just-go-utf8mb4-everywhere cannon. But you can do better.
If you want to store
face with OK gesture 🙆,
face with look of triumph 😤,
information desk person 💁 or others, you have to use
utf8mb4. The other encoding you may be tempted to use -
utf8 - only stores characters that take up to 3 bytes in UTF-8, and emoji fall into the 4-byte family.
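If you want to see the 3-byte versus 4-byte split for yourself, here is a minimal Python sketch. The helper names `utf8_length` and `needs_utf8mb4` are made up for this illustration; the only thing it relies on is that characters above U+FFFF take 4 bytes in UTF-8.

```python
def utf8_length(char: str) -> int:
    """Number of bytes a string occupies once encoded as UTF-8."""
    return len(char.encode("utf-8"))


def needs_utf8mb4(text: str) -> bool:
    """True if any character in text lies above U+FFFF.

    Those characters take 4 bytes in UTF-8, which is more than
    MySQL's 3-byte utf8 can hold.
    """
    return any(ord(ch) > 0xFFFF for ch in text)


# Plain ASCII, a Latin accent, the euro sign, then the three emoji from above.
for sample in ("e", "\u00e9", "\u20ac", "\U0001F646", "\U0001F624", "\U0001F481"):
    print(repr(sample), utf8_length(sample), needs_utf8mb4(sample))
```

Only the last three samples report 4 bytes, and those are exactly the ones a utf8 column chokes on.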
Bonus: Which collation to use?
Use utf8mb4_unicode_ci on these columns and NOT a collation from the bare, 3-byte utf8 family.
If you want to dive into this, you should probably head this way.
Creating a custom, empty migration
Django migrations don’t support encoding changes as part of their automatic change detection/generation, and you can’t specify an encoding when defining a model’s field, so we’re going to have to generate an empty migration ourselves and do some raw
SQL action on it.
./manage.py makemigrations --empty --name switch_to_utf8mb4_columns YOUR_APP
Will give us a new, empty migration:
# -*- coding: utf-8 -*-
# Generated by Django ...
from __future__ import unicode_literals

from django.db import migrations


class Migration(migrations.Migration):

    dependencies = [('YOUR_APP', '0004_switch_to_utf8mb4_columns_20160620_2311')]

    # Empty for now; the RunSQL operations go in here.
    operations = []
Now we just need to figure out what to cram inside the operations list.
Filling out the migration
You’ll want to use the
migrations.RunSQL operation. We’ll use two keyword args:
sql= to define the statement to execute for migrating up
reverse_sql= for migrating down.
Make sure these are lists, by the way.
migrations.RunSQL(
    sql=['ALTER TABLE app_posts_table MODIFY post_title VARCHAR(100) CHARSET utf8mb4 COLLATE utf8mb4_unicode_ci'],
    reverse_sql=['ALTER TABLE app_posts_table MODIFY post_title VARCHAR(100)'],
)
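If you have more than a couple of columns to convert, typing each statement twice gets old fast. Here is a rough sketch of a helper that builds both lists for one column. `utf8mb4_statements` is a hypothetical name, not something Django ships, and it does no quoting or escaping, so only feed it table and column names you hardcode yourself.

```python
def utf8mb4_statements(table, column, column_type,
                       collation="utf8mb4_unicode_ci"):
    """Build the (sql, reverse_sql) lists for one column.

    Hypothetical helper: it only formats strings and never touches
    the database. The reverse statement drops the explicit charset,
    which puts the column back on the table's default.
    """
    base = "ALTER TABLE {} MODIFY {} {}".format(table, column, column_type)
    forward = "{} CHARSET utf8mb4 COLLATE {}".format(base, collation)
    return [forward], [base]


sql, reverse_sql = utf8mb4_statements(
    "app_posts_table", "post_title", "VARCHAR(100)"
)
print(sql[0])
print(reverse_sql[0])

# Inside the migration, the pair would be handed over like this:
#     migrations.RunSQL(sql=sql, reverse_sql=reverse_sql)
```

Keep in mind that MODIFY redefines the whole column, so anything like NOT NULL has to be part of `column_type` or it gets dropped.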
Bonus: Django admin patch
If you're using the Django admin (`django.contrib.admin`), you'll also need to patch the `django_admin_log` table as part of the migration. Otherwise, saving an object whose string representation contains emoji will fail when the admin writes its log entry. Simply add this operation to the list:
migrations.RunSQL(
    sql=['ALTER TABLE django_admin_log MODIFY object_repr VARCHAR(200) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL'],
    reverse_sql=['ALTER TABLE django_admin_log MODIFY object_repr VARCHAR(200) NOT NULL'],
)
Indexing utf8mb4 columns
See here to understand how this affects your columns’ maximum index size.
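The short version is plain arithmetic: an index has a byte budget, and utf8mb4 reserves 4 bytes per character instead of 3. The sketch below assumes the 767-byte index prefix limit that InnoDB applies to the COMPACT and REDUNDANT row formats; your limit may well be higher (DYNAMIC and COMPRESSED allow 3072 bytes), so check your own setup. `max_indexed_chars` is a name invented for this example.

```python
# Assumption: InnoDB's 767-byte index prefix limit (COMPACT/REDUNDANT row formats).
INDEX_PREFIX_LIMIT_BYTES = 767


def max_indexed_chars(limit_bytes: int, bytes_per_char: int) -> int:
    """How many characters fit in an index prefix of limit_bytes."""
    return limit_bytes // bytes_per_char


print("utf8:   ", max_indexed_chars(INDEX_PREFIX_LIMIT_BYTES, 3))  # 255
print("utf8mb4:", max_indexed_chars(INDEX_PREFIX_LIMIT_BYTES, 4))  # 191
```

Under that assumption, a fully indexed VARCHAR(255) no longer fits once it is utf8mb4. That's one more reason to convert only the columns that actually need emoji.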
Bonus: why not just escape?
You may have thought about working around this by encoding the emoji somehow - HTML entities most likely. No one will stop you, and it will work - but should you ever need to present that data in a format that doesn’t support HTML character entities, you’re in for more hacks. Storing the raw data as-is will let you sidestep building hack upon hack in some Android app along the way.
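For the curious, this is roughly what the escaping workaround looks like in Python, using only the standard library. `escape_non_ascii` is a made-up helper name; the point is just to show what ends up in the column and what every consumer then has to undo.

```python
import html


def escape_non_ascii(text: str) -> str:
    """Replace every non-ASCII character with a numeric HTML entity."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# "\U0001F646" is the face with OK gesture emoji.
stored = escape_non_ascii("All good \U0001F646")
print(stored)                 # what lands in the database: All good &#128582;
print(html.unescape(stored))  # what every non-HTML client has to do to read it
```

The stored value is pure ASCII, so a 3-byte utf8 column accepts it. The catch is that anything reading the column outside a browser now needs its own unescape step, and text that legitimately contains an entity such as `&amp;` can no longer be told apart from the escaped stuff.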
For all things
utf8mb4, this fantastic post by Mathias Bynens should be your go-to and launchpad into other related topics. And make sure you dig into the comments, which are another fantastic adventure entirely.
Check out this SO question for a second opinion on the differences between UTF-8 collations.