Let me guess: you just bumped into the Incorrect string value: '\x01\x9D\x8C\x86' for column 'column_name' at row 1 error. Fantastic! Looks like you want to store some emoji in that MySQL database of yours. You’ll need to make sure your columns are utf8mb4 encoded. Read on.

Why do I want utf8mb4?

Because you want to store emoji. That’s it, most of the time.

What’s the deal here?

Current posts tell you to just switch your whole database/table to utf8mb4 as default encoding. You most likely don’t want this. It’s a crappy, lazy approach, honestly. We wanted to provide emoji support in the app that powers /docs - which happens to be a modest Django app - and all resources we found online insisted on shooting the just-go-utf8mb4-everywhere cannon. But you can do better.

About utf8mb4

If you want to store face with OK gesture 🙆, face with look of triumph 😤, information desk person 💁 or others, you have to use utf8mb4. The other encoding you may be tempted to use - utf8 - only stores code points up to 3 bytes, and Emoji fall into the 4-byte family.

Bonus: Which collation to use?

You’ll want utf8mb4_unicode_ci on these columns and NOT the bare, 3-byte utf8_unicode_ci.

If you want to dive into this, you should probably head this way.

Creating a custom, empty migration

Django migrations don’t support encoding changes as part of their automatic change detection/generation, and you can’t specify an encoding when defining a model’s field, so we’re going to have to generate an empty migration ourselves and do some raw SQL action on it.

./manage.py makemigrations --empty --name switch_to_utf8mb4_columns YOUR_APP

Will give us a new, empty migration:

# -*- coding: utf-8 -*-
# Generated by Django ...
from __future__ import unicode_literals

from django.db import migrations

class Migration(migrations.Migration):
    dependencies = [('YOUR_APP', '0004_switch_to_utf8mb4_columns_20160620_2311')]

    operations = []

Now we just need to figure out what to cram inside the operations list.

Filling out the migration

You’ll want to use the migrations.RunSQL operation. We’ll use two keyword args: * sql= to define the statement to execute for migrating up * reverse_sql= for migrating down.

Make sure these are lists, by the way.

migrations.RunSQL(
    sql=['ALTER TABLE app_posts_table MODIFY post_title VARCHAR(100) CHARSET utf8mb4 COLLATE utf8mb4_unicode_ci'],
    reverse_sql=['ALTER TABLE app_posts_table MODIFY post_title VARCHAR(100)']  
)

Bonus: django-admin patch

If you're using `django-admin`, you'll also need to patch the `django_admin_log` table as part of the migration - otherwise the audit log will blow up. Simply add this operation to the list:
migrations.RunSQL(
    sql=['ALTER TABLE django_admin_log MODIFY object_repr VARCHAR(200) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL'],
    reverse_sql=['ALTER TABLE django_admin_log MODIFY object_repr VARCHAR(200) NOT NULL']
 )

Indexing utf8mb4 columns

See here to understand how this affects your columns maximum index size.

Bonus: why not just escape?

You may have come up with working around this by encoding the emojis somehow - HTML entities most likely. No one will stop you, and it will work - but should you ever need to present that data in a format that doesn’t support HTML character entities, you’re in for more hacks. Storing the raw data as-is will let you sidestep building hack upon hack in some Android app along the way.

Resources

For all things utf8mb4, this fantastic post by Mathias Bynens should be your go-to and launchpad into other related topics. And make sure you dig into the comments, which are another fantastic adventure entirely.

Check out this SO question for a second opinion on the differences between UTF-8 collations.