Matt Segal Dev

How to generate lots of dummy data for your Django app
Sun 14 June 2020, by Matthew Segal
Category: Django

It sucks when you're working on a Django app and all your pages are empty. For example, if you're working on a forum webapp, then all your discussion boards will be empty by default:

[Image: forum thread list with no threads]

Manually creating enough data for your pages to look realistic is a lot of work. Wouldn't it be nice if there were an automatic way to populate your local database with dummy data that looks real? For example, your forum app could have many threads:

[Image: forum thread list filled with dummy threads]

Even better, wouldn't it be cool if there was an easy way to populate each thread with as many comments as you like?

[Image: forum thread filled with dummy comments]

In this post I'll show you how to use Factory Boy and a few other tricks to quickly and repeatably generate an endless amount of dummy data for your Django app. By the end of the post you'll be able to generate all your test data using a management command:

./manage.py setup_test_data

There is example code for this blog post hosted in this GitHub repo.

Example application

In this post we'll be working with an example app that is an online forum. There are four models that we'll be working with:

# models.py
from django.db import models


class User(models.Model):
    """A person who uses the website"""
    name = models.CharField(max_length=128)

    def __str__(self):
        # Makes instances display as <User: Michelle>
        return self.name


class Thread(models.Model):
    """A forum comment thread"""
    title = models.CharField(max_length=128)
    # on_delete is a required argument since Django 2.0
    creator = models.ForeignKey(User, on_delete=models.CASCADE)


class Comment(models.Model):
    """A comment by a user on a thread"""
    body = models.CharField(max_length=128)
    poster = models.ForeignKey(User, on_delete=models.CASCADE)
    thread = models.ForeignKey(Thread, on_delete=models.CASCADE)


class Club(models.Model):
    """A group of users interested in the same thing"""
    name = models.CharField(max_length=128)
    member = models.ManyToManyField(User)

Building data with Factory Boy

We'll be using Factory Boy to generate all our dummy data. It's a library that's built for automated testing, but it also works well for this use case. Factory Boy can easily be configured to generate random but realistic data like names, emails and paragraphs by internally using the Faker library.

When using Factory Boy you create classes called "factories", each of which knows how to build instances of one Django model. For example, for a user, you would create a factory class as follows:

# factories.py
import factory
from factory.django import DjangoModelFactory

from .models import User

# Defining a factory
class UserFactory(DjangoModelFactory):
    class Meta:
        model = User

    name = factory.Faker("first_name")

# Using a factory with auto-generated data
u = UserFactory()
u.name # Kimberly
u.id # 51

# You can optionally pass in your own data
u = UserFactory(name="Alice")
u.name # Alice
u.id # 52

You can find the data types that Faker can produce by looking at the "providers" that the library offers. For example, I found "first_name" by reviewing the options inside the person provider.

Another benefit of Factory Boy is that it can be set up to generate related data using SubFactory, saving you a lot of boilerplate and time. For example, we can set up the ThreadFactory so that it generates a User as its creator automatically:

# factories.py
class ThreadFactory(DjangoModelFactory):
    class Meta:
        model = Thread

    creator = factory.SubFactory(UserFactory)
    title = factory.Faker(
        "sentence",
        nb_words=5,
        variable_nb_words=True
    )

# Create a new thread
t = ThreadFactory()
t.title  # Room marriage study
t.creator  # <User: Michelle>
t.creator.name  # Michelle

The ability to automatically generate related models and fake data makes Factory Boy quite powerful. It's worth taking a quick look at the other suggested patterns if you decide to try it out.

Adding a management command

Once you've defined factories for all the models that you want to generate, you can write a management command to automatically populate your database. This is a pretty crude script that doesn't take advantage of all of Factory Boy's features, like sub-factories, but I didn't want to spend too much time getting fancy:

# forum/management/commands/setup_test_data.py
import random

from django.db import transaction
from django.core.management.base import BaseCommand

from forum.models import User, Thread, Club, Comment
from forum.factories import (
    UserFactory,
    ThreadFactory,
    ClubFactory,
    CommentFactory
)

NUM_USERS = 50
NUM_CLUBS = 10
NUM_THREADS = 12
COMMENTS_PER_THREAD = 25
USERS_PER_CLUB = 8

class Command(BaseCommand):
    help = "Generates test data"

    @transaction.atomic
    def handle(self, *args, **kwargs):
        self.stdout.write("Deleting old data...")
        models = [User, Thread, Comment, Club]
        for m in models:
            m.objects.all().delete()

        self.stdout.write("Creating new data...")
        # Create all the users
        people = []
        for _ in range(NUM_USERS):
            person = UserFactory()
            people.append(person)

        # Add some users to clubs
        for _ in range(NUM_CLUBS):
            club = ClubFactory()
            # random.sample picks distinct users (random.choices can repeat)
            members = random.sample(
                people,
                k=USERS_PER_CLUB
            )
            club.member.add(*members)

        # Create all the threads
        for _ in range(NUM_THREADS):
            creator = random.choice(people)
            thread = ThreadFactory(creator=creator)
            # Create comments for each thread
            for _ in range(COMMENTS_PER_THREAD):
                commenter = random.choice(people)
                CommentFactory(
                    poster=commenter,
                    thread=thread
                )

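Using the transaction.atomic decorator makes a big difference in the runtime of this script. It runs the hundreds of queries inside a single database transaction that is committed once at the end, rather than committing each query on its own. It also means that if the script fails partway through, none of its changes are saved.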
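If you want to see what a single transaction buys you without Django in the way, here is a rough sketch using only the standard library's sqlite3 module. It is not how transaction.atomic is implemented, just the same idea done by hand. The comment table and the insert_comments helper are made up for this example:

```python
import sqlite3

# isolation_level=None puts the sqlite3 module in autocommit mode, so
# transactions only exist where we open them explicitly with BEGIN.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE comment (id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
)


def insert_comments(bodies):
    """Insert every body in one transaction; roll them all back on error."""
    conn.execute("BEGIN")
    try:
        for body in bodies:
            conn.execute("INSERT INTO comment (body) VALUES (?)", (body,))
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")


def count_comments():
    return conn.execute("SELECT COUNT(*) FROM comment").fetchone()[0]


# 300 inserts, committed once at the end
insert_comments([f"comment {i}" for i in range(300)])
committed = count_comments()

try:
    # The None body violates NOT NULL, so the whole batch is rolled back,
    # including the two valid rows inserted before it.
    insert_comments(["fine", "also fine", None])
except sqlite3.IntegrityError:
    pass
after_failure = count_comments()

print(committed, after_failure)
```

The second batch leaves the row count untouched, which is the all-or-nothing behaviour you get from wrapping handle in a transaction.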

Images

If you need dummy images for your website as well then there are a lot of great free tools online to help. I use adorable.io for dummy profile pics and Picsum or Unsplash for larger pictures like this one: https://picsum.photos/700/500.

[Image: example photo from Picsum]
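If your models store image URLs as strings, a small helper can build placeholder URLs in the same shape as the Picsum link above. This is only a sketch: dummy_image_url and random_image_url are hypothetical names, and the list of sizes is arbitrary.

```python
import random


def dummy_image_url(width=700, height=500):
    """Build a Picsum URL for a placeholder image of the given size."""
    return f"https://picsum.photos/{width}/{height}"


def random_image_url(sizes=((700, 500), (400, 300), (200, 200))):
    """Pick one of a few sizes at random so pages don't all look identical."""
    width, height = random.choice(sizes)
    return dummy_image_url(width, height)


print(dummy_image_url())
print(random_image_url())
```

A function like this could be plugged into a factory field with factory.LazyFunction, assuming the model has a URL or text field to hold the result.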

Next steps

Hopefully this post helps you spin up a lot of fake data for your Django app very quickly. If you enjoy using Factory Boy to generate your dummy data, then you also might like incorporating it into your unit tests.

If you have any feedback or questions email me at [email protected]