Matt's Dev Bloghttps://mattsegal.dev/2022-05-03T12:00:00+10:00How I hunt down (and fix) errors in production2022-05-03T12:00:00+10:002022-05-03T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2022-05-03:/prod-bug-hunt.html<p>Once you’ve deployed your web app to prod there is a moment of satisfaction: a brief respite where you can reflect on your hard work. You sit, adoringly refreshing the homepage of www.mysite.com to watch it load over and over. It’s beautiful, perfect, timeless. A glittering crystal palace of logic and reason. Then people start to actually use it in earnest and you begin to receive messages like this in Slack:</p>
<blockquote>
<p>Hey Matt. I am not getting reply emails for case ABC123 Jane Doe</p>
</blockquote>
<p>Ideally, with a <a href="https://mattsegal.dev/django-monitoring-stack.html">solid monitoring stack</a>, you will be alerted of bugs and crashes as they happen, but some may still slip through the cracks. In any case, you’ve got to find and fix these issues promptly or your users will learn to distrust you and your software, kicking off a feedback loop of negative perception. Best to nip this in the bud.</p>
<p>So a user has told you about a bug in production, and you’ve gotta fix it - how do you figure out what went wrong? Where do you start? In this post I’ll walk you through an illustrative example of hunting down a bug in our email system.</p>
<h2>The problem</h2>
<p>So this was the message I got over Slack from a user of my website:</p>
<blockquote>
<p>Hey Matt. I am not getting reply emails for case ABC123 Jane Doe</p>
</blockquote>
<p>A user was not receiving an email, despite their client insisting that they had sent the email. That’s all I know so far...</p>
<h2>More detail</h2>
<p>... and it’s not quite enough. I know the case number but that’s not enough to track any error messages efficiently. I followed up with my user to check:</p>
<ul>
<li>what address was used to send the email (eg. jane.doe@gmail.com)</li>
<li>when they attempted to send the email (over the weekend apparently)</li>
</ul>
<p>With this info in hand I can focus my search on a particular time range and sender address.</p>
<h2>Knowledge of the system</h2>
<p>There’s one more piece of info you need to have before you start digging into log files and such: what are the components of the email-receiving system? I assembled this one myself, but under other circumstances, in a team setting, I might ask around to build a complete picture of the system. In this case it looks like this:</p>
<p><img alt="email-system" src="https://mattsegal.dev/img/prod-bug-hunt/email-system.png"></p>
<p>In brief:</p>
<ul>
<li>The client sends an email from their email client</li>
<li>The email travels through the mystical email realm</li>
<li>SendGrid (SaaS product) receives the email via SMTP</li>
<li>SendGrid sends the email content to a webhook URL on my webserver as an HTTP POST request</li>
<li>My web application ingests the POST request and stores the relevant bits in a database table</li>
</ul>
<p>Inside the web server there’s a pretty standard “3 tier” setup:</p>
<ul>
<li>NGINX receives all web traffic, sends requests onwards to the app server</li>
<li>Gunicorn app server running the Django web application</li>
<li>A database hosting all the Django tables (including email content)</li>
</ul>
<p><img alt="web server" src="https://mattsegal.dev/img/prod-bug-hunt/webserver.png"></p>
<h2>My approach</h2>
<p>So, the hunt begins for evidence of this missing email, but where to start looking? One needs a search strategy. In this case, my intuition is to check the “start” and “end” points of this system and work my way inwards. My reasoning is:</p>
<ul>
<li>if we definitely knew that SendGrid did not receive the email, then there’d be no point checking anywhere downstream (saving time)</li>
<li>if we knew that the database contained the email (or it was showing up on the website itself!) then there’d be no point checking upstream services like SendGrid or NGINX (saving time)</li>
</ul>
<p>So do you start upstream or downstream? I think you do whatever’s most convenient and practical. </p>
<p>Of course you may have some system-specific knowledge that leads you towards checking one particular component first (eg. “our code is garbage it’s probably our code, let’s check that first”), which is a cool and smart thing to do. Gotta exploit that domain knowledge.</p>
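<p>As a rough sketch of the "start at both ends and work inwards" idea, here is one way to turn a pipeline into a checking order. The function name <code>outside_in</code> is made up for illustration, and in practice convenience and domain knowledge will reshuffle this order.</p>

```python
def outside_in(components):
    """Order pipeline components from the two ends towards the middle."""
    order = []
    lo, hi = 0, len(components) - 1
    while lo <= hi:
        order.append(components[lo])
        if hi != lo:
            # Avoid listing the middle component twice in odd-length pipelines
            order.append(components[hi])
        lo += 1
        hi -= 1
    return order


# The email pipeline described above, most upstream component first
pipeline = ["SendGrid", "NGINX", "Gunicorn/Django", "database"]
check_order = outside_in(pipeline)
print(check_order)
```

<p>Each check at one end either clears everything beyond it or tells you to keep moving inwards.</p>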
<h2>Did SendGrid get the email?</h2>
<p>In this case it seemed easiest to check SendGrid’s fancy web UI for evidence of an email failing to be received or something. I had a click around and found their reporting on this matter to be... pretty fucking useless to be honest.</p>
<p><img alt="Sendgrid chart" src="https://mattsegal.dev/img/prod-bug-hunt/sendgrid-chart.png"></p>
<p>This is all I could find - so I’ve learned that we usually get emails. Reassuring but not very helpful in this case. They have good reporting on email sending, but this dashboard was disappointingly vague.</p>
<h2>Is the email in the database?</h2>
<p>After checking SendGrid (most upstream) I then checked to see if the database (most downstream) had received the email content.</p>
<p>As an aside, I also checked if the email was showing up in the web UI, which it wasn’t (maybe my user got confused and looked at the wrong case?). It’s good to quickly check for stupid obvious things just in case.</p>
<p>Since we don’t have a high volume of emails I was able to check the db by just eyeballing the Django admin page. If we were getting many emails per day I would have instead run a query in the Django shell via the ORM (or run an SQL query directly on the db).</p>
<p><img alt="Django admin page" src="https://mattsegal.dev/img/prod-bug-hunt/django-admin.png"></p>
<p>It wasn’t there >:(</p>
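<p>If the email volume had been too high for eyeballing, a direct SQL query would have done the same job. Below is a minimal sketch using an in-memory SQLite database. The <code>email</code> table, its columns and the sample rows are all hypothetical stand-ins, not my real schema.</p>

```python
import sqlite3

# Hypothetical schema and data: the real table and column names will differ
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE email (id INTEGER PRIMARY KEY, from_address TEXT, received_at TEXT)"
)
conn.executemany(
    "INSERT INTO email (from_address, received_at) VALUES (?, ?)",
    [
        ("someone.else@example.com", "2022-04-30T10:15:00"),
        ("jane.doe@gmail.com", "2022-04-22T09:00:00"),
    ],
)


def find_emails(conn, sender, start, end):
    """Return emails from `sender` with start <= received_at < end."""
    query = """
        SELECT id, from_address, received_at
        FROM email
        WHERE from_address = ? AND received_at >= ? AND received_at < ?
        ORDER BY received_at
    """
    return conn.execute(query, (sender, start, end)).fetchall()


# Search a weekend-sized window for the sender the user gave me.
# ISO 8601 timestamps stored as text sort correctly as plain strings.
matches = find_emails(
    conn, "jane.doe@gmail.com", "2022-04-30T00:00:00", "2022-05-02T00:00:00"
)
print(matches)
```

<p>An empty result here means the same thing as the empty admin page: the email never made it into the table.</p>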
<h2>Did my code explode?</h2>
<p>So far we know that <em>maybe</em> SendGrid got the email and it’s definitely not in the database. Since it was easy to do I quickly scanned my error monitoring logs (using <a href="https://sentry.io/for/python/">Sentry</a>) for any relevant errors. Nothing. No relevant application errors during the expected time period found.</p>
<p><img alt="Sentry error logs" src="https://mattsegal.dev/img/prod-bug-hunt/sentry-errors.png"></p>
<p><strong>Aside</strong>: yes my Sentry issue inbox is a mess. I know, it's bad. Think of it like an email inbox with 200 unread emails, most of them spam, but maybe a few important ones in the pile. For both emails and error reports, it's best to have a clean inbox.</p>
<p><strong>Aside</strong>: ideally I would get Slack notifications for any production errors and investigate them as they happen but Sentry recently made Slack integration a paid feature and I haven’t decided whether to upgrade or move.</p>
<h2>Did NGINX receive the POST request?</h2>
<p>Looking back upstream, I wanted to know if I could find anything interesting in the NGINX logs. If you’re not familiar with webserver logfiles I give a rundown in <a href="https://mattsegal.dev/django-gunicorn-nginx-logging.html">this article</a> covering a typical Django stack.</p>
<p>All my server logs get sent to SumoLogic, a log aggregator (explained in the “log aggregation” section of <a href="https://mattsegal.dev/django-monitoring-stack.html">this article</a>), where I can search through them in a web UI.</p>
<p>I checked the NGINX access logs for all incoming requests to the email webhook path in the relevant timeframe and found nothing interesting. This shows NGINX is receiving email data in general, which is good.</p>
<p><img alt="Sumologic search of access logs" src="https://mattsegal.dev/img/prod-bug-hunt/sumologic-access-search.png"></p>
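<p>If you don't have a log aggregator, the same search can be roughly approximated with a few lines of Python over the raw access log. This sketch assumes NGINX's default "combined" log format, and the sample log lines are made up for illustration.</p>

```python
import re
from datetime import datetime, timezone

# Matches the start of a line in NGINX's default "combined" access log format
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)


def webhook_requests(lines, path, start, end):
    """Yield (time, status) for requests to `path` with start <= time < end."""
    for line in lines:
        m = LINE_RE.match(line)
        if not m or m["path"] != path:
            continue
        when = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
        if start <= when < end:
            yield when, int(m["status"])


# Made-up log lines, not taken from my server
sample = [
    '203.0.113.5 - - [29/Apr/2022:23:59:10 +0000] "POST /email/receive/ HTTP/1.1" 200 2 "-" "-"',
    '203.0.113.5 - - [30/Apr/2022:01:12:03 +0000] "POST /email/receive/ HTTP/1.1" 200 2 "-" "-"',
    '203.0.113.9 - - [30/Apr/2022:01:30:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "-"',
]
start = datetime(2022, 4, 30, tzinfo=timezone.utc)
end = datetime(2022, 5, 2, tzinfo=timezone.utc)
hits = list(webhook_requests(sample, "/email/receive/", start, end))
print(hits)
```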
<p>Next I checked the NGINX error logs... and found a clue!</p>
<p><img alt="Sumologic search of error logs" src="https://mattsegal.dev/img/prod-bug-hunt/sumologic-error-search.png"></p>
<p>For those who don’t want to squint at the screenshot above this was the error log:</p>
<blockquote>
<p>2022/04/30 02:38:40 [error] 30616#30616: *129401 client intended to send too large body: 21770024 bytes, client: 172.70.135.74, server: www.mysite.com, request: "POST /email/receive/ HTTP/1.1", host: "www.mysite.com"</p>
</blockquote>
<p>This error, which occurred when receiving a POST request to the webhook URL, lines up with the time that the client apparently sent the email. So it seems likely that this is related to the email problem.</p>
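<p>To pull the useful bits out of a log line like this one (timestamp, body size, request path), a small regex does the trick. This is only a sketch written against the single line quoted above, so treat the pattern as an assumption rather than a general NGINX error log parser.</p>

```python
import re

# The error log line quoted above
ERROR_LINE = (
    '2022/04/30 02:38:40 [error] 30616#30616: *129401 client intended to send '
    'too large body: 21770024 bytes, client: 172.70.135.74, server: www.mysite.com, '
    'request: "POST /email/receive/ HTTP/1.1", host: "www.mysite.com"'
)

TOO_LARGE_RE = re.compile(
    r'^(?P<timestamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[error\] .*'
    r'client intended to send too large body: (?P<size>\d+) bytes.*'
    r'request: "(?P<method>\S+) (?P<path>\S+) '
)


def parse_too_large(line):
    """Return the key fields of a 'too large body' error, or None if no match."""
    m = TOO_LARGE_RE.search(line)
    if m is None:
        return None
    return {
        "timestamp": m["timestamp"],
        "bytes": int(m["size"]),
        "method": m["method"],
        "path": m["path"],
    }


parsed = parse_too_large(ERROR_LINE)
print(parsed)
```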
<h2>What is going wrong?</h2>
<p>I googled the error message and found <a href="https://stackoverflow.com/questions/44741514/nginx-error-client-intended-to-send-too-large-body">this StackOverflow post</a>. It seems that NGINX limits the size of requests that it will receive (which is configurable via the nginx.conf file). I checked my NGINX config and I had a limit of 20MB set. Checking my email ingestion code, it seems like all the file attachments are included in the HTTP request body. So... my guess was that the client sending the email attached more than 20MB of attachments (an uncompressed phone camera image is ~5MB) and NGINX refused to receive that request. Most email providers (eg Gmail) offer ~25MB of attachments per email.</p>
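<p>A quick sanity check of the numbers supports this guess. The sketch below assumes the limit is written as <code>20m</code> in the NGINX config, where the <code>k</code> and <code>m</code> suffixes mean kilobytes and megabytes in multiples of 1024. The helper <code>nginx_size_to_bytes</code> is my own illustration, not part of NGINX.</p>

```python
def nginx_size_to_bytes(value):
    """Convert an NGINX size such as "20m" or "512k" to a number of bytes."""
    units = {"k": 1024, "m": 1024 * 1024}
    suffix = value[-1].lower()
    if suffix in units:
        return int(value[:-1]) * units[suffix]
    # No suffix means the value is already in bytes
    return int(value)


limit = nginx_size_to_bytes("20m")  # assumed spelling of my 20MB limit
request_size = 21770024  # bytes, taken from the error log line
over_by = request_size - limit
print(f"limit={limit} request={request_size} over_by={over_by}")
```

<p>So the rejected request was only a little over the limit, which is about what you'd expect from an email stuffed with a few photos.</p>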
<h2>Testing the hypothesis</h2>
<p>I actually didn’t do this because I got a little over-excited and immediately wrote and pushed a fix.</p>
<p>What I should have done is verified that the problem I had in mind actually exists. I should have tried to send a 21MB email to our staging server to see if I could reproduce the error, plus asked my user to ask the client if she was sending large files in her email.</p>
<p>Oops. A small fuckup given I think the error message is pretty clear about what the problem is.</p>
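<p>For what it's worth, a cheap way to reproduce this without composing a real email is to POST an oversized body straight at the webhook on staging. The sketch below is untested against my actual setup and the staging URL in the usage note is hypothetical. NGINX answers requests over <code>client_max_body_size</code> with a 413 status, although it may also just drop the connection mid-upload, so the helper treats a connection error as a possible outcome.</p>

```python
import urllib.error
import urllib.request

MIB = 1024 * 1024


def build_payload(size_mib):
    """Build a dummy request body of `size_mib` mebibytes."""
    return b"x" * (size_mib * MIB)


def post_payload(url, payload):
    """POST raw bytes; return the HTTP status, or None if the connection failed."""
    req = urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )
    try:
        with urllib.request.urlopen(req, timeout=60) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses land here; 413 is the one we're looking for
        return err.code
    except OSError:
        # The server may close the connection before the upload finishes
        return None


payload = build_payload(21)
print(len(payload))
```

<p>Calling <code>post_payload("https://staging.mysite.com/email/receive/", payload)</code> (hypothetical URL) and getting back 413 would confirm the hypothesis. Any other status means NGINX let the request through to the app, even if the app then rejects the junk body.</p>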
<h2>The fix</h2>
<p>The fix was pretty simple, as it often is in these cases: I bumped up the NGINX request size limit (<code>client_max_body_size</code>) to 60MB. That might be a little excessive, perhaps 30MB would have been fine, but whatever. I updated the config file in source control and deployed it to the staging and prod environments. I tested that I can send larger files by sending a 24MB email attachment to the staging server.</p>
<h2>Aftermath</h2>
<p>We’ve asked the client to re-send her email. Hopefully it comes through and all is well.</p>
<p>I checked further back in SumoLogic and this is not the first time this error has happened, meaning we’ve dropped a few emails. I’ll need to notify the team about this.</p>
<p>If I had more time to spend on this project I’d consider adding some kind of alert to NGINX error logs so that we’d see them pop up in Slack - maybe SumoLogic offers this, I haven’t checked.</p>
<p>Another option would be going with an alternative to SendGrid that had more useful reporting on failed webhook delivery attempts.</p>
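<p>The NGINX alerting idea could also be hand-rolled. Here is a minimal sketch, assuming a Slack incoming webhook (which accepts a JSON body with a <code>text</code> field). The function names and sample log lines are mine, and you'd still need something like a cron job to feed it new log lines.</p>

```python
import json
import urllib.request


def find_errors(lines):
    """Keep only the log lines that NGINX logged at error level."""
    return [line for line in lines if "[error]" in line]


def slack_payload(error_lines, max_lines=5):
    """Build the JSON body for a Slack incoming webhook message."""
    shown = error_lines[:max_lines]
    text = f"{len(error_lines)} NGINX error(s):\n" + "\n".join(shown)
    return {"text": text}


def send_to_slack(webhook_url, payload):
    """POST the payload to Slack and return the HTTP status code."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


# Made-up log lines for illustration
sample = [
    "2022/04/30 02:38:40 [error] 30616#30616: *129401 client intended to send too large body",
    "2022/04/30 02:40:00 [notice] 30616#30616: signal process started",
]
payload = slack_payload(find_errors(sample))
print(payload["text"])
```

<p>Nothing is sent until you call <code>send_to_slack</code> with your own webhook URL.</p>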
<h2>Overview</h2>
<p>Although it can sometimes be stressful, finding and fixing these problems can also be a lot of fun. It’s like a detective game where you are searching for clues to crack the case.</p>
<p>In summary, my advice for productively hunting down errors in production is:</p>
<ul>
<li>Gather info from the user who reported the error</li>
<li>Mentally sketch a map of the system</li>
<li>Check each system component for clues, using a search strategy</li>
<li>Use these clues to develop a hypothesis about what went wrong</li>
<li>Test the hypothesis if you can (before writing a fix)</li>
<li>Build, test, ship a fix (then check it's fixed)</li>
<li>Tell your users the good news</li>
</ul>
<p>Importantly I was only able to solve this issue because I had access to my server log files. A good server monitoring setup makes these issues much quicker and less painful to crack. If you want to know what monitoring tools I like to use in my projects, check out <a href="https://mattsegal.dev/django-monitoring-stack.html">my Django monitoring stack</a>.</p>How to setup Django with Pytest on GitHub Actions2022-01-13T12:00:00+11:002022-01-13T12:00:00+11:00Matthew Segaltag:mattsegal.dev,2022-01-13:/django-with-pytest-on-github-actions.html<p>Someone recently asked me</p>
<blockquote>
<p>When is a good time to get automated testing setup on a new Django project?</p>
</blockquote>
<p>The answer is "now". There are other good times, but now is best. In this post I'll briefly make my case for why, and show you an example of a minimal setup of Django running tests with <a href="https://docs.pytest.org/en/6.2.x/index.html">pytest</a> with fully automated <a href="https://www.atlassian.com/continuous-delivery/continuous-integration">continuous integration</a> (CI) using <a href="https://github.com/features/actions">GitHub Actions</a>.</p>
<p>As soon as you know a Django project is going to be "serious", then you should get it set up to run tests. So, potentially before you write any features. My approach is to get testing setup and to write a dummy test or two and then get it running in CI. This means that as soon as you start writing features then you will have everything you need to write a real test and have it run automatically on every commit.</p>
<p>The alternate scenario is you start adding features and get swept up in that process. At some point you'll think "hmm maybe I should write a test for this...", but if you don't have tests and CI set up already then you're more likely to say "nah, fuck it I'll do it later" and not write the test. Getting pytest to work with Django on GitHub actions is pretty easy these days. Bite the bullet, it tastes better than you may expect.</p>
<p>Or you could just not write any tests. This is fine for small personal projects. Tests are a lot of things but they're not fun. For more serious endeavours though, not having tests will lead to riskier deployments, longer feedback loops on errors and less confidence in making big changes. Have you ever done a huge, wild refactor of a chunk of code, followed by a set of passing tests? It feels great man, that's when you're really living.</p>
<p>The other question is: when should I run my tests? Sometimes you forget or you can't be bothered. This is where GitHub Actions (or any other CI) is very useful. You can set this service up to automatically run your tests <em>every time</em> you push a commit up to GitHub.</p>
<p>Let's go then: how do you set up Django + pytest + GitHub Actions? All the code discussed here can be found in this <a href="https://github.com/MattSegal/django-pytest-github-actions">example GitHub repository</a>.</p>
<h2>Installation</h2>
<p>Alongside Django you will need to install <a href="https://docs.pytest.org/en/6.2.x/"><code>pytest</code></a> and <a href="https://pytest-django.readthedocs.io/en/latest/"><code>pytest-django</code></a>. These libraries are not required to run tests with Django: the <a href="https://docs.djangoproject.com/en/4.0/topics/testing/overview/">official docs</a> show you how to use Python's unittest library instead. I like pytest better though, and I think you will too. My <a href="https://github.com/MattSegal/django-pytest-github-actions/blob/master/requirements.txt">requirements.txt</a> file looks like this:</p>
<div class="highlight"><pre><span></span><code>django
pytest
pytest-django
</code></pre></div>
<p>I don't pin my dependencies because I'm lazy: what can I say? I recommend you set up a <a href="https://realpython.com/python-virtual-environments-a-primer/">virtual environment</a> and then install as follows:</p>
<div class="highlight"><pre><span></span><code>pip install -r requirements.txt
</code></pre></div>
<h2>Configuration</h2>
<p>You can configure pytest with a standard <a href="https://snarky.ca/what-the-heck-is-pyproject-toml/">pyproject.toml</a> file. <a href="https://github.com/MattSegal/django-pytest-github-actions/blob/master/app/pyproject.toml">Here's mine</a>. The most important thing is to set <a href="https://docs.djangoproject.com/en/4.0/topics/settings/#envvar-DJANGO_SETTINGS_MODULE"><code>DJANGO_SETTINGS_MODULE</code></a> so pytest knows which settings to use. It's good to have a separate set of test settings for your project so that you can avoid, for example, accidentally changing your production environment with credentials stored in settings when you run a test.</p>
<div class="highlight"><pre><span></span><code><span class="k">[tool.pytest.ini_options]</span><span class="w"></span>
<span class="na">DJANGO_SETTINGS_MODULE</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"demo.settings"</span><span class="w"></span>
<span class="na">filterwarnings</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">[</span><span class="w"></span>
<span class="w"> </span><span class="na">"ignore::UserWarning",</span><span class="w"></span>
<span class="na">]</span><span class="w"></span>
</code></pre></div>
<p>This file should live in whichever folder you will be running <code>pytest</code> from. For the reference project, that means in the <code>./app</code> folder alongside <code>manage.py</code>.</p>
<h2>Adding a dummy test</h2>
<p>That's a good start. Now we can test the setup so far with a dummy test. This test does nothing: it always passes, but it verifies that all the plumbing is working. In pytest, tests are just functions that use assert statements to check things:</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_nothing</span><span class="p">():</span>
    <span class="sd">"""A dummy test"""</span>
    <span class="k">assert</span> <span class="kc">True</span>
</code></pre></div>
<p>By default pytest collects tests from files named <code>test_*.py</code> or <code>*_test.py</code>, so a <code>tests</code> folder in each of your Django apps is a tidy place to keep them. For example, here is the <a href="https://github.com/MattSegal/django-pytest-github-actions/tree/master/app/web/tests">tests folder</a> in the reference project. So this dummy test function could live in a file named <code>app/web/tests/test_dummy.py</code>. You can add as many tests to a file as you like, or have as many test files as you like. Avoid duplicate names though!</p>
<h2>Running the tests locally</h2>
<p>At this stage it's good to check that the dummy test works by running pytest from the command line:</p>
<div class="highlight"><pre><span></span><code>pytest -vv
</code></pre></div>
<p>Read <code>-vv</code> as "very verbose". Here are <a href="https://github.com/MattSegal/django-pytest-github-actions#running-tests">specific instructions</a> for anyone trying out the reference project. Hopefully that worked. You may see a folder called <code>.pytest_cache</code> appear in your project. I recommend you <a href="https://www.atlassian.com/git/tutorials/saving-changes/gitignore">gitignore</a> this.</p>
<p>Now let's add some more meaningful example tests before we move on to setting up GitHub Actions.</p>
<h2>Adding a basic view test</h2>
<p>My reference project has a very basic view named "goodbye" which just returns the text "Goodbye world". Here it is:</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">django.http</span> <span class="kn">import</span> <span class="n">HttpResponse</span>

<span class="k">def</span> <span class="nf">goodbye_view</span><span class="p">(</span><span class="n">request</span><span class="p">):</span>
    <span class="k">return</span> <span class="n">HttpResponse</span><span class="p">(</span><span class="s2">"Goodbye world"</span><span class="p">)</span>
</code></pre></div>
<p>You can test that this view returns the expected response using the <a href="https://docs.djangoproject.com/en/4.0/topics/testing/tools/#the-test-client">Django test client</a>. Pytest has a handy feature called <a href="https://docs.pytest.org/en/6.2.x/fixture.html">fixtures</a>, which is a little piece of magic where you ask for a specific object via the test function arguments and pytest automagically provides it. In this case we add "client" to the function arguments to get a test client. It's a little out of scope for this post, but you can write your own fixtures too!</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">django.urls</span> <span class="kn">import</span> <span class="n">reverse</span>

<span class="k">def</span> <span class="nf">test_goodbye_view</span><span class="p">(</span><span class="n">client</span><span class="p">):</span>
    <span class="sd">"""Test that goodbye view works"""</span>
    <span class="c1"># Build the URL from the url's name</span>
    <span class="n">url</span> <span class="o">=</span> <span class="n">reverse</span><span class="p">(</span><span class="s2">"goodbye"</span><span class="p">)</span>
    <span class="c1"># Make a GET request to the view using the test client</span>
    <span class="n">response</span> <span class="o">=</span> <span class="n">client</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
    <span class="c1"># Verify that the response is correct</span>
    <span class="k">assert</span> <span class="n">response</span><span class="o">.</span><span class="n">status_code</span> <span class="o">==</span> <span class="mi">200</span>
    <span class="k">assert</span> <span class="n">response</span><span class="o">.</span><span class="n">content</span> <span class="o">==</span> <span class="sa">b</span><span class="s2">"Goodbye world"</span>
</code></pre></div>
<p>Very nice, but you will find that you need to do a little more work to test views that include database queries.</p>
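<p>On writing your own fixtures, mentioned above: here's a tiny sketch of what that can look like. The <code>greeting</code> fixture and <code>make_greeting</code> helper are invented for illustration and are not part of the reference project.</p>

```python
import pytest


def make_greeting(name):
    """Plain helper, so the logic is usable outside of pytest too."""
    return f"Goodbye {name}"


@pytest.fixture
def greeting():
    """Hypothetical fixture: any test that asks for `greeting` receives this value."""
    return make_greeting("world")


def test_greeting(greeting):
    # pytest matches the argument name to the fixture and passes in its return value
    assert greeting == "Goodbye world"
```

<p>Fixtures that several test files need can live in a <code>conftest.py</code> file, which pytest picks up automatically.</p>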
<h2>Adding a view test with database interaction</h2>
<p>With pytest-django you need to <em>explicitly</em> request access to the database using the <a href="https://pytest-django.readthedocs.io/en/latest/helpers.html#pytest-mark-django-db-request-database-access">pytest.mark.django_db</a> decorator. Below is an example of a test that hits the database. In this example there is a page view counter that increments +1 every time someone views the page:</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">django.http</span> <span class="kn">import</span> <span class="n">HttpResponse</span>
<span class="kn">from</span> <span class="nn">web.models</span> <span class="kn">import</span> <span class="n">PageViewCount</span>

<span class="k">def</span> <span class="nf">hello_view</span><span class="p">(</span><span class="n">request</span><span class="p">):</span>
    <span class="n">counter</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">PageViewCount</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">get_or_create</span><span class="p">(</span><span class="n">title</span><span class="o">=</span><span class="s2">"hello"</span><span class="p">)</span>
    <span class="n">counter</span><span class="o">.</span><span class="n">count</span> <span class="o">+=</span> <span class="mi">1</span>
    <span class="n">counter</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>
    <span class="k">return</span> <span class="n">HttpResponse</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Hello world. The counter is: </span><span class="si">{</span><span class="n">counter</span><span class="o">.</span><span class="n">count</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
</code></pre></div>
<p>So if you load the page over and over again it should say:</p>
<div class="highlight"><pre><span></span><code>Hello world. The counter is: 1
Hello world. The counter is: 2
Hello world. The counter is: 3
Hello world. The counter is: 4
... etc
</code></pre></div>
<p>Here is a test for this view:</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">pytest</span>
<span class="kn">from</span> <span class="nn">django.urls</span> <span class="kn">import</span> <span class="n">reverse</span>
<span class="kn">from</span> <span class="nn">web.models</span> <span class="kn">import</span> <span class="n">PageViewCount</span>

<span class="nd">@pytest</span><span class="o">.</span><span class="n">mark</span><span class="o">.</span><span class="n">django_db</span>
<span class="k">def</span> <span class="nf">test_hello_view</span><span class="p">(</span><span class="n">client</span><span class="p">):</span>
    <span class="n">url</span> <span class="o">=</span> <span class="n">reverse</span><span class="p">(</span><span class="s2">"hello"</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">PageViewCount</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">count</span><span class="p">()</span> <span class="o">==</span> <span class="mi">0</span>

    <span class="n">response</span> <span class="o">=</span> <span class="n">client</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">response</span><span class="o">.</span><span class="n">status_code</span> <span class="o">==</span> <span class="mi">200</span>
    <span class="k">assert</span> <span class="n">PageViewCount</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">count</span><span class="p">()</span> <span class="o">==</span> <span class="mi">1</span>
    <span class="n">counter</span> <span class="o">=</span> <span class="n">PageViewCount</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">last</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">counter</span><span class="o">.</span><span class="n">count</span> <span class="o">==</span> <span class="mi">1</span>
    <span class="k">assert</span> <span class="sa">b</span><span class="s2">"Hello world"</span> <span class="ow">in</span> <span class="n">response</span><span class="o">.</span><span class="n">content</span>
    <span class="k">assert</span> <span class="sa">b</span><span class="s2">"The counter is: 1"</span> <span class="ow">in</span> <span class="n">response</span><span class="o">.</span><span class="n">content</span>

    <span class="n">response</span> <span class="o">=</span> <span class="n">client</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">response</span><span class="o">.</span><span class="n">status_code</span> <span class="o">==</span> <span class="mi">200</span>
    <span class="n">counter</span><span class="o">.</span><span class="n">refresh_from_db</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">counter</span><span class="o">.</span><span class="n">count</span> <span class="o">==</span> <span class="mi">2</span>
    <span class="k">assert</span> <span class="sa">b</span><span class="s2">"The counter is: 2"</span> <span class="ow">in</span> <span class="n">response</span><span class="o">.</span><span class="n">content</span>
</code></pre></div>
<h2>Setting up GitHub Actions</h2>
<p>Ok so all our tests are running locally, how do we get them to run automatically in GitHub Actions? You can configure an action by adding a config file to your GitHub project at the location <code>.github/workflows/whatever.yml</code>. I named mine <a href="https://github.com/MattSegal/django-pytest-github-actions/blob/master/.github/workflows/tests.yml">tests.yml</a>.</p>
<p>Let's walk through the contents of this file (docs <a href="https://docs.github.com/en/actions">here</a>):</p>
<div class="highlight"><pre><span></span><code><span class="c1"># The name of the action</span><span class="w"></span>
<span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Django Tests</span><span class="w"></span>
<span class="c1"># When the action is triggered</span><span class="w"></span>
<span class="nt">on</span><span class="p">:</span><span class="w"></span>
<span class="w">  </span><span class="nt">push</span><span class="p">:</span><span class="w"></span>
<span class="w">    </span><span class="nt">branches</span><span class="p">:</span><span class="w"></span>
<span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">master</span><span class="w"></span>
<span class="w">  </span><span class="nt">pull_request</span><span class="p">:</span><span class="w"></span>
<span class="w">    </span><span class="nt">branches</span><span class="p">:</span><span class="w"></span>
<span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">master</span><span class="w"></span>
<span class="c1"># What to do when the action is triggered</span><span class="w"></span>
<span class="nt">jobs</span><span class="p">:</span><span class="w"></span>
<span class="w">  </span><span class="c1"># A job called 'build' - arbitrary</span><span class="w"></span>
<span class="w">  </span><span class="nt">build</span><span class="p">:</span><span class="w"></span>
<span class="w">    </span><span class="c1"># Run on a Ubuntu VM</span><span class="w"></span>
<span class="w">    </span><span class="nt">runs-on</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">ubuntu-latest</span><span class="w"></span>
<span class="w">    </span><span class="nt">steps</span><span class="p">:</span><span class="w"></span>
<span class="w">      </span><span class="c1"># Checkout the GitHub repo</span><span class="w"></span>
<span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">uses</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">actions/checkout@v2</span><span class="w"></span>
<span class="w">      </span><span class="c1"># Install Python 3.8</span><span class="w"></span>
<span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Set up Python 3.8</span><span class="w"></span>
<span class="w">        </span><span class="nt">uses</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">actions/setup-python@v2</span><span class="w"></span>
<span class="w">        </span><span class="nt">with</span><span class="p">:</span><span class="w"></span>
<span class="w">          </span><span class="nt">python-version</span><span class="p">:</span><span class="w"> </span><span class="s">"3.8"</span><span class="w"></span>
<span class="w">      </span><span class="c1"># Pip install project dependencies</span><span class="w"></span>
<span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Install dependencies</span><span class="w"></span>
<span class="w">        </span><span class="nt">run</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">|</span><span class="w"></span>
<span class="w">          </span><span class="no">python -m pip install --upgrade pip</span><span class="w"></span>
<span class="w">          </span><span class="no">pip install -r requirements.txt</span><span class="w"></span>
<span class="w">      </span><span class="c1"># Move into the Django project folder (./app) and run pytest</span><span class="w"></span>
<span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Test with pytest</span><span class="w"></span>
<span class="w">        </span><span class="nt">working-directory</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">./app</span><span class="w"></span>
<span class="w">        </span><span class="nt">run</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">pytest -vv</span><span class="w"></span>
</code></pre></div>
<p>That's it, now pytest will run on every commit to master, and every pull request to master. You can see the actions for the reference project <a href="https://github.com/MattSegal/django-pytest-github-actions/actions">here</a>. Every test run will put a little tick or cross in your GitHub commit history.</p>
<p><img alt="test ticks" src="https://mattsegal.dev/django-test-tick.png"></p>
<p>You can also embed a nice little badge in your README:</p>
<p><a href="https://github.com/MattSegal/django-pytest-github-actions/actions/workflows/tests.yml"><img alt="Django Tests" src="https://github.com/MattSegal/django-pytest-github-actions/actions/workflows/tests.yml/badge.svg"></a></p>
<h2>Conclusion</h2>
<p>I hope this post helps you get started with writing and running automated tests for your Django project. They're a real lifesaver. If you liked this post about testing, you might also like this post about different testing styles (<a href="https://mattsegal.dev/alternate-test-styles.html">There's no one right way to test your code</a>) and this post about setting up pytest on GitHub actions, without Django (<a href="https://mattsegal.dev/pytest-on-github-actions.html">Run your Python unit tests via GitHub actions</a>).</p>My (free) Django monitoring stack for 20222022-01-01T12:00:00+11:002022-01-01T12:00:00+11:00Matthew Segaltag:mattsegal.dev,2022-01-01:/django-monitoring-stack.html<p>You've built and deployed a website using Django. Congrats!
After that initial high of successfully launching your site comes the grubby work of fixing bugs. There are so many things that <s>can</s> will go wrong.
Pages may crash with 500 errors in prod, but not locally. Some offline tasks never …</p><p>You've built and deployed a website using Django. Congrats!
After that initial high of successfully launching your site comes the grubby work of fixing bugs. There are so many things that <s>can</s> will go wrong.
Pages may crash with 500 errors in prod, but not locally. Some offline tasks never finish. The site becomes <a href="https://twitter.com/mattdsegal/status/1473462877772136448">mysteriously unresponsive</a>. This one pain-in-the-ass user keeps complaining that file uploads "don't work"
but refuses to elaborate further: "they just don't work okay!?!".</p>
<p>If enough issues crop up and you aren't able to solve them quickly and decisively, then you will lose the precious trust of your coworkers or clients. Often reputational damage isn't caused by the bug itself, but by the perception that you have no idea what's going on.</p>
<p>Imagine that you are able to find out about bugs or outages <em>as they happen</em>. You proactively warn your users that the site is down, not the other way around. You can quickly reproduce problems locally and push a fix to prod in a matter of hours. Sounds good right? You're going to need a good "monitoring stack" to achieve this dream state of omniscient hyper-competence.</p>
<p>You'll need a few different (free) tools to get a holistic picture of what your Django app is doing:</p>
<ul>
<li><strong>Uptime monitoring</strong>: tells you when the site is down (<a href="https://www.statuscake.com/">StatusCake</a>)</li>
<li><strong>Error reporting</strong>: tells you when an application error occurs, collects details (<a href="https://sentry.io/welcome/">Sentry</a>)</li>
<li><strong>Log aggregation</strong>: allows you to read about what happened on your servers (<a href="https://www.sumologic.com/">Sumologic</a>)</li>
<li><strong>Performance</strong>: tells you how long requests took, what's fast, what's slow (<a href="https://sentry.io/welcome/">Sentry</a>, <a href="https://newrelic.com/products/application-monitoring">New Relic</a>)</li>
</ul>
<p>In the rest of this post I'll talk about these SaaS tools in more detail and why I like to use the ones linked above.</p>
<h2>Uptime monitoring</h2>
<p>It's quite embarrassing when your site goes down, but what's more embarrassing is when you learn about it from <em>someone else</em>. An uptime monitoring service can help: it sends a request to your site every few minutes and pings you (Slack, email) when it's unresponsive. This allows you to quickly get your site back online, hopefully before anyone notices. If you want to get fancy you can build a health check route (eg. <code>/health-check/</code>) into your Django app which, for example, checks that the database, or cache, or whatever are still online as well.</p>
<p>Another benefit of uptime monitoring is that you'll get a clear picture of when the outage started. For example, in the picture below you can see that a website of mine stopped responding to requests between ~21:00 and ~23:30 UTC. You can use this knowledge of exactly <em>when</em> the site became unresponsive to check other sources of information, such as server logs or error reports for clues.</p>
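<p>As a rough sketch of the health check idea, the function below runs a set of named checks and reports an HTTP-style status code. The names here (<code>run_health_checks</code>, <code>check_database</code>, <code>check_cache</code>) are hypothetical and not part of Django; in a real app you would call something like this from the view behind <code>/health-check/</code> and have each check talk to your actual database or cache.</p>

```python
from typing import Callable, Dict, Tuple


def run_health_checks(checks: Dict[str, Callable[[], None]]) -> Tuple[int, Dict[str, str]]:
    """Run each named check and return (status_code, per-check results).

    A check passes if it returns without raising an exception.
    """
    results = {}
    for name, check in checks.items():
        try:
            check()
            results[name] = "ok"
        except Exception as exc:  # any failure marks the dependency as down
            results[name] = f"error: {exc}"
    all_ok = all(status == "ok" for status in results.values())
    # 200 tells the uptime monitor all is well, 503 means "service unavailable"
    return (200 if all_ok else 503), results


def check_database() -> None:
    """Placeholder: a real check might run a trivial query such as SELECT 1."""


def check_cache() -> None:
    """Placeholder that simulates a cache outage."""
    raise ConnectionError("cache unreachable")


status_code, results = run_health_checks({"database": check_database, "cache": check_cache})
print(status_code, results)
```

<p>Returning a non-200 status when any dependency is down means the uptime monitor will alert you even if the homepage itself still loads.</p>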
<p><img alt="downtime" src="https://mattsegal.dev/img/downtime.png"></p>
<p>I like to use <a href="https://www.statuscake.com/">StatusCake</a> for this function because it's free, simple and easy to set up.</p>
<h2>Error reporting</h2>
<p>There are lots of ways for your site to break that don't render it completely unresponsive. A user might click a button to submit a form and receive a 500 error page because you made some trivial coding mistake that wasn't caught by your <a href="https://mattsegal.dev/pytest-on-github-actions.html">automated testing pipeline</a>. This user comes to you and complains that "the site is broken". Sometimes they will provide you with a very detailed explanation of what they did to produce the error, which you can use to replicate the issue, but as often as not they may, infuriated by your shitty website and seemingly antagonistic line of questioning, follow up with "iTs JuST brOken OKAY!?". Wouldn't it be nice to get the detailed information that you need to fix the bug without having to talk to a human?</p>
<p>This is where error reporting comes in. When your Django web app catches some kind of exception, then an error reporting library can inspect the error and send the details to a SaaS service which records it for you. These error reporting tools capture heaps of useful information, such as:</p>
<ul>
<li>When the error happened first and most recently</li>
<li>The exception type and message</li>
<li>Which line of code triggered the error</li>
<li>The stack trace of the error</li>
<li>The value of local variables in each frame of the stack trace</li>
<li>The Python version, package versions, user browser, IP, etc etc etc.</li>
</ul>
<p>This rich source of information makes error reporting a vital tool. It really shines when you encounter errors that <em>only</em> happen in production, where you have no idea how to replicate them locally. <a href="https://sentry.io/welcome/">Sentry</a> is great for this task because it's free, easy to set up and has a great web UI. You can set up Sentry to send you error alerts via Slack and/or email.</p>
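<p>To give a feel for how a tool can collect this information, here is a minimal standard-library sketch that walks an exception's traceback and records the local variables in each frame. This is not how Sentry is implemented; <code>build_error_report</code> and <code>apply_discount</code> are made-up names for illustration.</p>

```python
import traceback


def build_error_report(exc: BaseException) -> dict:
    """Collect the exception type, message and per-frame local variables."""
    frames = []
    for frame, lineno in traceback.walk_tb(exc.__traceback__):
        frames.append(
            {
                "function": frame.f_code.co_name,
                "filename": frame.f_code.co_filename,
                "lineno": lineno,
                # repr() keeps the report serialisable even for odd objects
                "locals": {name: repr(value) for name, value in frame.f_locals.items()},
            }
        )
    return {"type": type(exc).__name__, "message": str(exc), "frames": frames}


def apply_discount(price, discount):
    return price / discount  # crashes when discount is 0


try:
    apply_discount(100, 0)
except ZeroDivisionError as exc:
    report = build_error_report(exc)

# The last frame is where the error was raised
print(report["type"], report["message"])
print(report["frames"][-1]["function"], report["frames"][-1]["locals"])
```

<p>A real error reporting library does much more (grouping, rate limiting, scrubbing sensitive values), but the raw material is the same traceback data.</p>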
<h2>Log aggregation</h2>
<p>Production errors can be more complicated than a simple Python exception crashing a page. Sometimes, much more complicated. If you want to get a feel for the twisted shit computers will get up to then give <a href="https://rachelbythebay.com/w/">Rachel by the Bay</a> a read. To solve the trickier issues in production you're going to need to reconstruct what actually happened at the time of the error. You'll need to draw upon multiple sources of information, such as:</p>
<ul>
<li>application logs (eg. <a href="https://mattsegal.dev/file-logging-django.html">Django logs</a>)</li>
<li>webserver logs (eg. <a href="https://mattsegal.dev/django-gunicorn-nginx-logging.html">NGINX, Gunicorn logs</a>)</li>
<li>logs from other services (eg. Postgres, syslog, etc)</li>
</ul>
<p>You can <code>ssh</code> into your server and read these logs from the command line using <code>less</code> or <code>grep</code> or <code>awk</code> or something. Even so, it's much more convenient to access these logs via a log aggregation service's web UI, where you can run search queries to quickly find the log lines of interest. These tools work by running a "logging agent" on your server, which watches files of interest and sends them to a centralised server.</p>
<p><img alt="logging" src="https://mattsegal.dev/img/logging.png"></p>
<p>This model is particularly valuable if you have transient infrastructure (servers that don't last forever) or if you have many different servers, or if you want to limit <code>ssh</code> access for security reasons.</p>
<p><a href="https://www.sumologic.com/">Sumologic</a> is my favourite free SaaS for this task because it's easy to install the logging agent and add new files to be watched. The search is pretty good as well. The main downside is that the web UI can be a little complicated and overwhelming at times. The search DSL is very powerful but I always need to look up the syntax. Log retention times seem reasonable, 30 days by default. The Sumologic agent seems to consume several hundred MB of RAM (~300MB?).</p>
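<p>The kind of search you end up running, whether by hand or in a web UI, usually boils down to "show me matching lines inside this time window". Here is a small sketch of that idea in plain Python. It assumes each log line starts with an ISO 8601 timestamp followed by a space, which may not match your log format, and <code>find_log_lines</code> is a hypothetical helper name.</p>

```python
from datetime import datetime


def find_log_lines(lines, start, end, keyword=""):
    """Return lines whose leading timestamp is within [start, end] and whose message contains keyword."""
    matches = []
    for line in lines:
        timestamp_text, _, message = line.partition(" ")
        try:
            timestamp = datetime.fromisoformat(timestamp_text)
        except ValueError:
            continue  # skip lines without a parseable timestamp
        in_window = end >= timestamp >= start
        if in_window and keyword in message:
            matches.append(line)
    return matches


log_lines = [
    "2022-01-01T20:59:10 INFO GET /home 200",
    "2022-01-01T21:03:42 ERROR GET /upload 500",
    "2022-01-01T21:04:05 INFO GET /home 200",
    "2022-01-02T00:15:00 ERROR GET /upload 500",
    "not a log line",
]

# Only look at errors logged during the outage window
errors = find_log_lines(
    log_lines,
    start=datetime(2022, 1, 1, 21, 0),
    end=datetime(2022, 1, 1, 23, 30),
    keyword="ERROR",
)
print(errors)
```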
<p><a href="https://www.papertrail.com/">Papertrail</a> is, in my opinion, worse than Sumologic in every way I can think of. However, it is also free and presents a simple web UI for viewing and searching your logs. If you're interested I wrote about setting up Papertrail <a href="https://mattsegal.dev/django-logging-papertrail.html">here</a>. <a href="https://docs.newrelic.com/docs/logs/get-started/get-started-log-management/">New Relic</a> offer a logging service as well - never tried it though. There are open source logging solutions like <a href="https://www.elastic.co/">Elasticsearch</a> + Kibana and other alternatives, but they come with the downside of having to run them yourself: "now you have two problems".</p>
<h2>Performance monitoring</h2>
<p>Sometimes your website isn't broken per-se, but it's too slow. People hate slow websites. You can often diagnose and fix these issues locally using tools like <a href="https://django-debug-toolbar.readthedocs.io/en/latest/">Django Debug Toolbar</a> (I made a video on how to do this <a href="https://mattsegal.dev/django-debug-toolbar-performance.html">here</a>), but sometimes the slowness only happens in production. Furthermore, riffing on the general theme of this article, you want to know about (and fix) slow pages before your boss walks over to your desk and complains about it.</p>
<p>Performance monitoring tools instrument your Django web app and record information about how long various requests take. What's fast? What's slow? Which pages have problems? I recommend that you start out by using <a href="https://sentry.io/welcome/">Sentry</a> for this task because their performance monitoring service comes bundled with their error reporting by default. It's kind of basic, but maybe that's all you need.</p>
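<p>To make "instrumenting" a bit more concrete, here is a bare-bones sketch of a Django-style middleware that times each request. It follows the usual middleware shape (a callable built with <code>get_response</code>), but it is only an illustration: <code>RequestTimingMiddleware</code>, the in-memory <code>timings</code> list and the fake request and view are all made up, and real tools record far more than this.</p>

```python
import logging
import time

logger = logging.getLogger("request_timing")


class RequestTimingMiddleware:
    """Django-style middleware that records how long each request takes."""

    def __init__(self, get_response):
        self.get_response = get_response
        # (path, seconds) pairs; a real tool would ship these off the server
        self.timings = []

    def __call__(self, request):
        start = time.perf_counter()
        response = self.get_response(request)
        elapsed = time.perf_counter() - start
        path = getattr(request, "path", "unknown")
        self.timings.append((path, elapsed))
        logger.info("%s took %.3fs", path, elapsed)
        return response


class FakeRequest:
    """Stand-in for Django's HttpRequest, just enough for the demo."""

    path = "/slow-page/"


def slow_view(request):
    time.sleep(0.05)  # pretend to do some slow work
    return "response"


middleware = RequestTimingMiddleware(slow_view)
result = middleware(FakeRequest())
print(middleware.timings)
```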
<p>The best application performance monitoring for Django that I know of is <a href="https://newrelic.com/products/application-monitoring">New Relic's offering</a>, which seems to have a free tier. The request traces that they track include a very detailed breakdown of <em>where</em> the time was spent in serving a request. For example, it will tell you how much time was spent querying the database, or a cache, or building HTML templates. Sometimes you need that level of detail to solve tricky performance issues. The downside of using New Relic is that you have to reconfigure your app server to boot using their <a href="https://docs.newrelic.com/docs/apm/agents/python-agent/">agent</a> as a wrapper.</p>
<p>Although it's not strictly on-topic, <a href="https://pagespeed.web.dev/">PageSpeed Insights</a> is pretty useful for checking page load performance from a front-end perspective. If you're interested in more on Django web app performance then you might like this post I wrote, where I ponder: <a href="https://mattsegal.dev/is-django-too-slow.html">is Django too slow?</a></p>
<h2>Conclusion</h2>
<p>This list is not exhaustive or definitive, it's just the free-tier tools that I like to use for my freelance and personal projects.
Nevertheless I hope you find them useful.
It can be a pain to integrate them all into your app, but over the long run they'll save you a lot of time and energy.</p>
<p>Be prepared!</p>DevOps in academic research2021-11-21T12:00:00+11:002021-11-21T12:00:00+11:00Matthew Segaltag:mattsegal.dev,2021-11-21:/devops-academic-research.html<p>I'd like to share some things I've learned and done in the 18 months I worked as a "Research DevOps Specialist" for a team of infectious disease <a href="https://www.bmj.com/about-bmj/resources-readers/publications/epidemiology-uninitiated/1-what-epidemiology">epidemiologists</a>.
Prior to this job I'd worked as a web developer for four years and I'd found that the day-to-day had become quite …</p><p>I'd like to share some things I've learned and done in the 18 months I worked as a "Research DevOps Specialist" for a team of infectious disease <a href="https://www.bmj.com/about-bmj/resources-readers/publications/epidemiology-uninitiated/1-what-epidemiology">epidemiologists</a>.
Prior to this job I'd worked as a web developer for four years and I'd found that the day-to-day had become quite routine. Web dev is a mature field where most of the hard problems have been solved. Looking for something new, I started a new job at a local university in early 2020. The job was created when my colleagues wrote ~20k lines of Python code and then found out what a pain in the ass it is to maintain a medium-sized codebase. It's the usual story: the code is fragile, it's slow, it's easy to break things, changes are hard to make. I don't think this situation is anyone's fault per-se: it arises naturally whenever you write a big pile of code.</p>
<p>In the remainder of this post I'll talk about the application we were working on and the awesome, transformative, &lt;superlative&gt; power of:</p>
<ul>
<li>mapping your workflow</li>
<li>an automated test suite</li>
<li>performance improvements</li>
<li>task automation</li>
<li>visualisation tools; and</li>
<li>data management</li>
</ul>
<p>If you're a web developer, you might be interested to see how familiar practices can be applied in different contexts. If you're an academic who uses computers in your work, then you might be interested to learn how some ideas from software development can help you be more effective.</p>
<h2>The application in question</h2>
<p>We were working on a <a href="https://en.wikipedia.org/wiki/Compartmental_models_in_epidemiology">compartmental</a> infectious disease model to simulate the spread of tuberculosis. Around March 2020 the team quickly pivoted to modelling COVID-19 as well (surprise!). There's documentation <a href="http://summerepi.com/">here</a> with <a href="http://summerepi.com/examples/index.html">examples</a> if you want to poke around.</p>
<p>In brief, it works like this: you feed the model some data for a target region (population, demographics, disease attributes) and then you simulate what's going to happen in the future (infections, deaths, etc). This kind of modelling is useful for exploring different scenarios, such as "what would happen if we closed all the schools?" or "how should we roll out our vaccine?". These results are presented to stakeholders, usually from some national health department, via a PowerBI dashboard. Alternatively the results are included in a fancy academic paper as graphs and tables.</p>
<p><img alt="notifications" src="https://mattsegal.dev/img/devops-academia/notifications.png"></p>
<p>(Note: "notifications" are the infected cases that we know about)</p>
<p>A big part of our workflow was model calibration. This is where we would build a disease model with variable input parameters, such as the "contact rate" (proportional to how infectious the disease is), and then try to learn the best value of those parameters given some historical data (such as a timeseries of the number of cases). We did this calibration using a technique called <a href="https://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo">Markov chain Monte Carlo</a> (MCMC). MCMC has many nice statistical properties, but requires running the model 1000 to 10,000 times - which is quite computationally expensive.</p>
<p><img alt="calibration" src="https://mattsegal.dev/img/devops-academia/calibration.png"></p>
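<p>For readers who haven't seen MCMC before, here is a toy random-walk Metropolis sampler that "calibrates" a contact rate against some synthetic case data. It is only a sketch under loud assumptions: the exponential-growth model, the parameter values, the noise level and every function name are invented for illustration and are not the team's actual model or calibration code.</p>

```python
import math
import random


def run_model(contact_rate, days=30):
    """Toy epidemic: daily cases grow exponentially at (contact_rate - recovery_rate)."""
    recovery_rate = 0.1
    initial_cases = 10.0
    return [initial_cases * math.exp((contact_rate - recovery_rate) * day) for day in range(days)]


def log_likelihood(contact_rate, observed, noise_sd=0.1):
    """Gaussian log-likelihood (up to a constant) of the log case counts."""
    predicted = run_model(contact_rate, days=len(observed))
    squared_errors = sum((math.log(o) - math.log(p)) ** 2 for o, p in zip(observed, predicted))
    return -squared_errors / (2 * noise_sd ** 2)


def calibrate(observed, iterations=2000, step_sd=0.005, seed=0):
    """Random-walk Metropolis sampler with a uniform prior on [0, 1]."""
    rng = random.Random(seed)
    current = 0.5
    current_ll = log_likelihood(current, observed)
    samples = []
    for _ in range(iterations):  # at most one model run per iteration
        proposal = current + rng.gauss(0, step_sd)
        if 1.0 >= proposal >= 0.0:  # proposals outside the prior are rejected
            proposal_ll = log_likelihood(proposal, observed)
            # Accept with probability min(1, likelihood ratio)
            if proposal_ll - current_ll >= math.log(1.0 - rng.random()):
                current, current_ll = proposal, proposal_ll
        samples.append(current)
    return samples


# Synthetic "historical data": the toy model at contact_rate=0.3 plus noise
data_rng = random.Random(42)
observed_cases = [cases * math.exp(data_rng.gauss(0, 0.1)) for cases in run_model(0.3)]

samples = calibrate(observed_cases)
burn_in = 500  # discard early samples taken before the chain settles
estimate = sum(samples[burn_in:]) / len(samples[burn_in:])
print(f"estimated contact rate: {estimate:.3f}")
```

<p>Each iteration needs a fresh model run, which is why the cost of a calibration scales directly with how long a single run takes.</p>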
<p>This all sounds cool, right? It was! The problem is that when I started, the codebase just hadn't been getting the care it needed given its size and complexity. It was becoming unruly and unmanageable. Trying to read and understand the code was stressing me out.</p>
<p>Furthermore, running calibrations was <em>slow</em>. There was a lot of manual toil where someone needed to upload the application to the university computer cluster, babysit the run and download the outputs, and then post-process the results on their laptop. The execution of the code itself took days or weeks. This time-sink is a problem when you're trying to submit an academic paper and a reviewer is like "hey can you just re-run everything with this one small change" and that means re-running all of that computation.</p>
<p>So there were definitely some pain points and room for improvement when I started.</p>
<h2>Improving our workflow with DevOps</h2>
<p>The team knew that there were problems and everybody wanted to improve the way we worked. If I could point to any key factor in our later successes it would be their willingness to change and openness to new things.</p>
<p>I took a "DevOps" approach to my role (it was in the job title after all). What do I mean by DevOps? This <a href="https://www.atlassian.com/devops/what-is-devops">article</a> sums it up well:</p>
<blockquote>
<p>a set of practices that works to automate and integrate the processes between [different teams], so they can build, test, and release software faster and more reliably</p>
</blockquote>
<p>Traditionally this refers to work done by Software <strong>Dev</strong>elopers and IT <strong>Op</strong>eration<strong>s</strong>, but I think it can be applied more broadly. In this case we had a software developer, a mathematician, an epidemiologist and a data visualisation expert working on a common codebase.</p>
<p>A key technique of DevOps is to think about the entire system that produces finished work. You want to conceive of it as a kind of pipeline to be optimised end-to-end, rather than focusing on any efficiencies achieved by individuals in isolation. One is encouraged to explicitly map the flow of work through the system. Where does work come from? What stages does it need to flow through to be completed? Where are the bottlenecks? Importantly: what is the goal of the system?</p>
<p>In this case, I determined that our goal was to produce robust academic research, in the form of published papers or reports. My key metric was to minimise "time to produce a new piece of research", since I believed that our team's biggest constraint was time, rather than materials or money or ideas or something else. Another key metric was "number of errors", which should be zero: it's bad to publish incorrect research.</p>
<p>If you want to read more about DevOps I recommend checking out <a href="https://www.goodreads.com/book/show/17255186-the-phoenix-project">The Phoenix Project</a> and/or <a href="https://www.amazon.com.au/Goal-Process-Ongoing-Improvement/dp/0884271951">The Goal</a> (the audiobooks are decent).</p>
<h2>Mapping the workflow</h2>
<p>As I mentioned, you want to conceive of your team's work as a kind of pipeline. So what was our pipeline? After chatting with my colleagues I came up with something like this:</p>
<p><img alt="workflow" src="https://mattsegal.dev/img/devops-academia/autumn-workflow.png"></p>
<p>It took several discussions to nail this process down. People typically have decent models of how they work floating around in their heads, but it's not common to write it out explicitly like this. Getting this workflow on paper gave us some clear targets for improvement. For example:</p>
<ul>
<li>Updating a model required tedious manual testing to check for regressions</li>
<li>The update/calibrate cycle was the key bottleneck, because calibration ran slowly and manual steps were required to run long jobs on the compute cluster</li>
<li>Post processing was done manually and was typically only done by the one person who knew the correct scripts to run</li>
</ul>
<h2>Testing the codebase</h2>
<p>My first concern was testing. When I started there were no automated tests for the code. There were a few little scripts and "test functions" which you could run manually, but nothing that could be run as a part of <a href="https://www.atlassian.com/continuous-delivery/continuous-integration">continuous integration</a>.</p>
<p>This was a problem. Without tests, errors will inevitably creep into the code. As the complexity of the codebase increases, it becomes infeasible to manually check that everything is working since there are too many things to check. In general writing code that is correct the first time isn't too hard - it's not breaking it later that's difficult.</p>
<p>In the context of disease modelling, automated tests are even more important than usual because the correctness of the output cannot be easily verified. The whole point of the system is to calculate an output that would be infeasible for a human to produce. Compare this scenario to web development where the desired output is usually known and easily verified. You can usually load up a web page and click a few buttons to check that the app works.</p>
<h2>Smoke Tests</h2>
<p>So where did I start? Trying to add tests to an untested codebase with thousands of lines of code is very intimidating. I couldn't simply sit down and write unit tests for every little bit of functionality because it would have taken weeks. So instead I wrote "smoke tests". A smoke test runs some code and checks that it doesn't crash. For example:</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_covid_malaysia</span><span class="p">():</span>
    <span class="sd">"""Ensure the Malaysia region model can run without crashing"""</span>
    <span class="c1"># Load model configuration.</span>
    <span class="n">region</span> <span class="o">=</span> <span class="n">get_region</span><span class="p">(</span><span class="s2">"malaysia"</span><span class="p">)</span>
    <span class="c1"># Build the model with default parameters.</span>
    <span class="n">model</span> <span class="o">=</span> <span class="n">region</span><span class="o">.</span><span class="n">build_model</span><span class="p">()</span>
    <span class="c1"># Run the model, don't check the outputs.</span>
    <span class="n">model</span><span class="o">.</span><span class="n">run_model</span><span class="p">()</span>
</code></pre></div>
<p>To some this may look criminally stupid, but these tests give fantastic bang-for-buck. They don't tell you whether the model outputs are correct, but they only take a few minutes to write. These tests catch all sorts of stupid bugs: like someone trying to add a number to a string, undefined variables, bad filepaths, etc. They don't help so much in reducing semantic errors, but they do help with development speed.</p>
<h2>Continuous Integration</h2>
<p>A lack of testing is the kind of problem that people don't know they have. When you tell someone "hey we need to start writing tests!" the typical reaction is "hmm yeah sure I guess, sounds nice..." and internally they're thinking "... but I've got more important shit to do". You can try browbeating them by telling them how irresponsible they're being etc, but that's unlikely to actually get anyone to write and run tests on their own time.</p>
<p>So how to convince people that testing is valuable? You can <em>show</em> them, with the magic of ✨continuous integration✨. Our code was hosted in GitHub so I set up GitHub Actions to automatically run the new smoke tests on every commit to master. I've written a short guide on how to do this <a href="https://mattsegal.dev/pytest-on-github-actions.html">here</a>.</p>
<p>This setup makes tests visible to everyone. There's a little tick or cross next to every commit and, importantly, next to the name of the person who broke the code.</p>
<p><img alt="test-failures" src="https://mattsegal.dev/img/devops-academia/test-failures.png"></p>
<p>With this system in place we eventually developed new norms around keeping the tests passing. People would say "Oops! I broke the tests!" and it became normal to run the tests locally and fix them if they were broken. It was a little harder to encourage people to invest time in writing new tests.</p>
<p>Once I became more familiar with the codebase I eventually wrote integration and unit tests for the critical modules. I've written a bit more about some testing approaches I used <a href="https://mattsegal.dev/alternate-test-styles.html">here</a>.</p>
<p>Something that stood out to me in this process was that perhaps the most valuable thing I did in that job was one of the easiest things to do. Setting up continuous integration with GitHub took me an hour or two, but it's been paying dividends for ~2 years since. How hard something is to do and how valuable it is are different things.</p>
<h2>Performance improvements</h2>
<p>The code was too slow and the case for improving performance was clear. Slowness can be subjective, I've <a href="https://mattsegal.dev/is-django-too-slow.html">written a little</a> about the different meanings of "slow" in backend web dev, but in this case having to wait 2+ days for a calibration result was obviously way too slow and was our biggest productivity bottleneck.</p>
<p>The core of the problem was that a MCMC calibration had to run the model over 1000 times. When I started, a single model run took about 2 minutes. Doing that 1000 times means ~33 hours of runtime per calibration. Our team's mathematician worked on trying to make our MCMC algorithm more sample-efficient, while I tried to push down the 2 minute inner loop.</p>
<p>It wasn't hard to do better, since performance optimisation hadn't been a priority so far. I used Python's cProfile module, plus a few visualisation tools to find the hot parts of the code and speed them up. <a href="https://julien.danjou.info/guide-to-python-profiling-cprofile-concrete-case-carbonara/">This article</a> was a lifesaver. In broad strokes, these were the kinds of changes that improved performance:</p>
<ul>
<li>Avoid redundant re-calculation in for-loops</li>
<li>Switching data structures for more efficient value look-ups (eg. converting a list to a dict)</li>
<li>Converting for-loops to matrix operations (<a href="https://en.wikipedia.org/wiki/Vectorization">vectorisation</a>)</li>
<li>Applying JIT optimisation to hot, pure, numerical functions (<a href="https://numba.pydata.org/">Numba</a>)</li>
<li>Caching function return values (<a href="https://en.wikipedia.org/wiki/Memoization">memoization</a>)</li>
<li>Caching data read from disk</li>
</ul>
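<p>Here is a small, self-contained sketch of the profile-then-fix loop using <code>cProfile</code> and one of the techniques from the list above, memoization via <code>functools.lru_cache</code>. The functions (<code>slow_square</code>, <code>run_simulation</code>, <code>profile</code>) are toy stand-ins invented for this example, not code from the disease model.</p>

```python
import cProfile
import functools
import io
import pstats


def slow_square(n):
    """Stand-in for an expensive pure function (computes n * n the hard way)."""
    total = 0
    for _ in range(n):
        total += n
    return total


@functools.lru_cache(maxsize=None)
def cached_square(n):
    """Same calculation, but repeated inputs are served from a cache."""
    return slow_square(n)


def run_simulation(square):
    """Inner loop that keeps asking for the same ten values."""
    return sum(square(n % 10 + 1000) for n in range(500))


def profile(func, *args):
    """Run func under cProfile; return its result and a report of the top 5 entries."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


uncached_total, uncached_report = profile(run_simulation, slow_square)
cached_total, cached_report = profile(run_simulation, cached_square)

print(uncached_report)  # slow_square shows up with 500 calls
print(cached_report)    # slow_square is now only called for the cache misses
print(cached_square.cache_info())
```

<p>The profiler report tells you where the time goes, and re-running it after each change tells you whether the change actually helped. Memoization like this is only safe for pure functions, where the same inputs always give the same output.</p>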
<p>This work was heaps of fun. It felt like I was playing a video game. Profile, change, profile, change, always trying to get a new high score. Initially there were lots of easy, huge wins, but it became harder to push the needle over time.</p>
<p>After several months the code was 10x to 40x faster, running a model in 10s or less, meaning we could run 1000 iterations in a few hours, rather than over a day. This had a big impact on our ability to run calibrations for weekly reports, but the effects of this speedup were felt more broadly. To borrow a phrase: "more is different". Our tests ran faster. CI was more snappy and people were happier to run the tests locally, since they would take 10 seconds rather than 2 minutes to complete. Dev work was faster since you could tweak some code, run it, and view the outputs in seconds. In general, these performance improvements opened up other opportunities for working better that weren't obvious from the outset.</p>
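<p>The back-of-envelope arithmetic behind these numbers, assuming 1000 sequential model runs per calibration (<code>calibration_hours</code> is just a helper name for this example):</p>

```python
MODEL_RUNS = 1000  # model runs per calibration, as described in the post


def calibration_hours(seconds_per_run, runs=MODEL_RUNS):
    """Total sequential runtime of a calibration, in hours."""
    return seconds_per_run * runs / 3600


before = calibration_hours(120)  # 2 minutes per run
after = calibration_hours(10)  # 10 seconds per run
print(f"before: {before:.1f}h, after: {after:.1f}h, speedup: {before / after:.0f}x")
# prints "before: 33.3h, after: 2.8h, speedup: 12x"
```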
<p>There were some performance regressions over time as the code evolved. To try and fight these slowdowns I added automatic <a href="https://github.com/benchmark-action/github-action-benchmark">benchmarking</a> to our continuous integration pipeline.</p>
<h2>Task automation</h2>
<p>Once our calibration process could run in hours instead of days we started to notice new bottlenecks in our workflow. Notably, running a calibration involved a lot of manual steps which were not documented, meaning that only one person knew how to do it.</p>
<p>Interacting with the university's <a href="https://slurm.schedmd.com/documentation.html">Slurm</a> cluster was also a pain. The compute was free but we were at the mercy of the scheduler, which decided when our code would actually run, and the APIs for running and monitoring jobs were arcane and clunky.</p>
<p>Calibrations didn't always run well so this cycle could repeat several times before we got an acceptable result that we would want to use.</p>
<p>Finally, there wasn't a systematic method for recording input and output data for a given model run. It would be hard to reproduce a given model run 6 months later.</p>
<p>The process worked something like this when I started:</p>
<p><img alt="old workflow" src="https://mattsegal.dev/img/devops-academia/old-workflow.png"></p>
<p>It was possible to automate most of these steps. After a lot of thrashing around on my part, we ended up with a workflow that looks like this.</p>
<p><img alt="new workflow" src="https://mattsegal.dev/img/devops-academia/new-workflow.png"></p>
<p>In brief:</p>
<ul>
<li>A disease modeller would update the code and push it to GitHub</li>
<li>Then they could load up a webpage and trigger a job by filling out a form</li>
<li>The calibration and any other post processing would run "in the cloud"</li>
<li>The final results would be available on a website</li>
<li>The data vis guy could pull down the results and push them to PowerBI</li>
</ul>
<p>There were many benefits to this new workflow. There were no more manual tasks. The process could be run by anyone on the team. We could easily run multiple calibrations in parallel (and often did). We also created standard diagnostic plots that would be automatically generated for each calibration run (similar to <a href="https://wandb.ai/site">Weights and Biases</a> for machine learning). For example, these plots show how the model parameters change over the course of a MCMC calibration run.</p>
<p><img alt="parameter traces" src="https://mattsegal.dev/img/devops-academia/param-traces.png"></p>
<p>I won't go into too much detail on the exact implementation of this cloud pipeline. Not my cleanest work, but it did work. It was a collection of Python scripts that hacked together several tools:</p>
<ul>
<li><a href="https://buildkite.com/home">Buildkite</a> for task automation (it's really great)</li>
<li>AWS EC2 for compute</li>
<li>AWS S3 for storing data</li>
<li><a href="https://github.com/boto/boto3">boto3</a> for managing transient servers</li>
<li><a href="https://nextjs.org/">NextJS</a> for building the static results website</li>
</ul>
<p>If I could build it again I'd consider using something like <a href="https://docs.microsoft.com/en-us/azure/machine-learning/concept-ml-pipelines">Azure ML pipelines</a>. See below for an outline of the cloud architecture if you're curious.</p>
<p><img alt="new workflow, detailed" src="https://mattsegal.dev/img/devops-academia/new-workflow-detailed.png"></p>
<h2>Visualization tools</h2>
<p>Our models had a lot of stuff that needed to be visualised: inputs, outputs, and calibration targets. Our prior approach was to run a Python script which used <a href="https://matplotlib.org/">matplotlib</a> to dump all the required plots into a folder. So the development loop to visualise something was:</p>
<ul>
<li>Edit the model code, run the model</li>
<li>Run a Python script on the model outputs</li>
<li>Open up a folder and look at the plots inside</li>
</ul>
<p>It's not terrible but there's some friction and toil in there.</p>
<p><a href="https://jupyter.org">Jupyter notebooks</a> were a contender in this space, but I chose to use <a href="https://streamlit.io/">Streamlit</a>, because many of our plots were routine and standardised. With Streamlit, you can use Python to build web dashboards that generate plots based on a user's input. This was useful for disease modellers to quickly check a bunch of different diagnostic plots when working on the model on their laptop. Given it's all Python (no JavaScript), my colleagues were able to independently add their own plots. This tool went from interesting idea to a key fixture of our workflow over a few months.</p>
<p><img alt="streamlit dashboard" src="https://mattsegal.dev/img/devops-academia/streamlit.png"></p>
<p>A key feature of Streamlit is "hot reloading", which is where the code that generates the dashboard automatically re-runs when you change it. This means you can adjust a plot by editing the Python code, hit "save" and the changes will appear in your web browser. This quick feedback loop sped up plotting tasks considerably.</p>
<p><strong>Aside:</strong> This isn't super relevant but while we're here I just want to show off this visualisation I made of an agent based model simulating the spread of a disease through a bunch of households.</p>
<p><img alt="agent based model" src="https://mattsegal.dev/img/devops-academia/abm.gif"></p>
<h2>Data management</h2>
<p>We had quite a variety of data flying around: demographic inputs like population size, model parameters, calibration targets and the model outputs.</p>
<p>We had a lot of model input parameters stored as YAML files and it was hard to keep them all consistent. We had like, a hundred YAML files when I left.
To catch errors early I used <a href="https://docs.python-cerberus.org/en/stable/">Cerberus</a> and later <a href="https://pydantic-docs.helpmanual.io/">Pydantic</a> to validate parameters as they were loaded from disk.
I wrote smoke tests, which were run in CI, to check that none of these files were invalid. I wrote more about this approach <a href="https://mattsegal.dev/cerberus-config-validation.html">here</a>, although now I prefer Pydantic to Cerberus because it's a little less verbose.</p>
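<p>To illustrate the idea of validating parameters as they're loaded, here is a minimal sketch. It uses only standard library dataclasses as a stand-in for Pydantic or Cerberus and plain dicts in place of parsed YAML files. The field names (<code>region</code>, <code>population</code>, <code>fatality_rate</code>) and the helper functions are made up for this example rather than taken from the real models.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelParams:
    """Hypothetical parameter set, not the real model's schema."""

    region: str
    population: int
    fatality_rate: float

    def __post_init__(self):
        # Fail as soon as the parameters are loaded, not halfway through a run.
        if not isinstance(self.region, str) or not self.region:
            raise ValueError("region must be a non-empty string")
        if not isinstance(self.population, int) or self.population <= 0:
            raise ValueError("population must be a positive integer")
        if not isinstance(self.fatality_rate, (int, float)) or not 0 <= self.fatality_rate <= 1:
            raise ValueError("fatality_rate must be a number between 0 and 1")


def load_params(raw: dict) -> ModelParams:
    """Build validated parameters from a dict, e.g. one parsed from a YAML file."""
    try:
        return ModelParams(**raw)
    except TypeError as exc:
        # Missing or unexpected keys surface here as a TypeError.
        raise ValueError(f"invalid parameter keys: {exc}") from exc


def find_invalid(param_sets: dict) -> dict:
    """Smoke test helper: map the name of each invalid set to its error message."""
    errors = {}
    for name, raw in param_sets.items():
        try:
            load_params(raw)
        except ValueError as exc:
            errors[name] = str(exc)
    return errors
```

<p>A CI smoke test could then assert that <code>find_invalid</code> returns an empty dict for every parameter file in the repo.</p>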
<p>We had a lot of 3rd party inputs for our modelling such as <a href="https://www.google.com/covid19/mobility/">Google mobility data</a>, <a href="https://population.un.org/wpp/">UN World Population</a> info, and <a href="https://github.com/kieshaprem/synthetic-contact-matrices">social mixing matrices</a>. Initially this data was kept in source control as a random scattering of undocumented .csv and .xls files. Pre-processing was done manually using some Python scripts. I pushed to get all of the source data properly documented and consolidated into a single folder and tried to encourage a standard framework for pre-processing all of our inputs with a single script. As our input data grew to 100s of megabytes I moved these CSV files to GitHub's <a href="https://git-lfs.github.com/">Git LFS</a>, since our repo was getting quite hefty and slow to download (>400MB).</p>
<p>In the end I hand-rolled a lot of functionality that I probably shouldn't have. If you want to organise and standardise all your input data, I recommend checking out <a href="https://dvc.org/">Data Version Control</a>.</p>
<p>Finally I used AWS S3 to store all of the outputs, intermediate values, log files and plots produced by cloud jobs. Each job was stored using a key that included the model name, region name, timestamp and git commit. This was very helpful for debugging and convenient for everybody on the team to access via our results website. The main downside was that I had to occasionally manually prune ~100GB of results from S3 to keep our cloud bills low.</p>
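<p>As a rough sketch of that key scheme, assuming a simple slash-separated layout: the function name, the ordering of the parts, the timestamp format and the 7-character commit prefix are all illustrative choices here, not the exact format the pipeline used.</p>

```python
from datetime import datetime, timezone


def build_run_key(model: str, region: str, commit: str, started_at: datetime) -> str:
    """Return an S3-style key prefix that identifies a single model run.

    `started_at` is expected to be timezone-aware; it is normalised to UTC
    so that keys from different machines sort consistently.
    """
    timestamp = started_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H-%M-%SZ")
    short_commit = commit[:7]  # Enough to look the commit up in Git.
    return f"{model}/{region}/{timestamp}/{short_commit}"


if __name__ == "__main__":
    started = datetime(2020, 8, 1, 12, 30, 0, tzinfo=timezone.utc)
    # Prints covid/victoria/2020-08-01T12-30-00Z/0123456
    print(build_run_key("covid", "victoria", "0123456789abcdef", started))
```

<p>With a layout like this, every output, log file and plot for one run sits under a single prefix, and the commit in the key tells you which version of the code produced it.</p>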
<h2>Wrapping Up</h2>
<p>Overall I look back on this job fondly. You might have noticed that I've written thousands of words about it. There were some downsides specific to the academic environment. There was an emphasis on producing novel results, especially in the context of COVID in 2020, and as a consequence there were a lot of "one off" tasks and analyses. The codebase was constantly evolving and it felt like I was always trying to catch up. It was cool working on things that I'd never done before where I didn't know what the solution was. I drew a lot of inspiration from machine learning and data science.</p>
<p>Thanks for reading. If this sounds cool and you think you might like working as a software developer in academia, then go pester some academics.</p>
<p>If you read this and were like "wow! we should get this guy working for us!", I've got good news. I am looking for projects to work on as a freelance web developer. See <a href="https://mattsegal.com.au/">here</a> for more details.</p>How to compress images for a webpage2021-05-14T12:00:00+10:002021-05-14T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2021-05-14:/webpage-image-compressiom.html<p>Often when you're creating a website, a client or designer will provide you with large images that are 2-5MB in size and thousands of pixels wide.
The large file size of these images will make them slow to load on your webpage, making it seem slow and broken.</p>
<p>This video …</p><p>Often when you're creating a website, a client or designer will provide you with large images that are 2-5MB in size and thousands of pixels wide.
The large file size of these images will make them slow to load on your webpage, making it seem slow and broken.</p>
<p>This video shows you a quick browser-only workflow for cropping, resizing and compressing these images so that they will load more quickly on a webpage.
It's not very advanced, but it doesn't need to be. Here I convert images from ~2MB to ~100kB, which is a ~20x reduction in file size.</p>
<div class="yt-embed">
<iframe
src="https://www.youtube.com/embed/ZtzdpWQzidM"
frameborder="0"
allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
>
</iframe>
</div>How to setup Django with React2020-10-24T12:00:00+11:002020-10-24T12:00:00+11:00Matthew Segaltag:mattsegal.dev,2020-10-24:/django-react.html<p>It's not too hard to get started with either Django or React. Both have great documentation and there are lots of tutorials online.
The tricky part is getting them to work together. Many people start with a Django project and then decide that they want to "add React" to it …</p><p>It's not too hard to get started with either Django or React. Both have great documentation and there are lots of tutorials online.
The tricky part is getting them to work together. Many people start with a Django project and then decide that they want to "add React" to it.
How do you do that though? Popular React scaffolding tools like <a href="https://github.com/facebook/create-react-app">Create React App</a> don't offer you a clear way to integrate with Django, leaving you to figure it out yourself. Even worse, there isn't just one way to set up a Django/React project. There are dozens of <a href="https://mattsegal.dev/django-spa-infrastructure.html">possible methods</a>, each with different pros and cons. Every time I create a new project using these tools I find the options overwhelming.</p>
<p>I think that most people should start with a setup that is as close to vanilla Django as possible: you take your existing Django app and sprinkle a little React on it to make the frontend more dynamic and interactive. For most cases, creating a completely separate "single page app" frontend creates a lot of complexity and challenges without providing very much extra value for you or your users.</p>
<p>In this series of posts I will present an opinionated guide on how to setup and deploy a Django/React webapp. The focus will be on keeping things simple, incremental and understanding each step. I want you to be in a position to debug any problems yourself. At the end of each post, you should have a working project that you can use.</p>
<p>I'm going to assume that you know:</p>
<ul>
<li>the <a href="https://developer.mozilla.org/en-US/docs/Learn/Getting_started_with_the_web">basics of web development</a> (HTML, CSS, JavaScript)</li>
<li>the <a href="https://docs.djangoproject.com/en/3.1/intro/tutorial01/">basics of Django</a> (views, templates, static files)</li>
<li>the <a href="https://reactjs.org/tutorial/tutorial.html">basics of React</a> (components, props, rendering)</li>
</ul>
<p>I'm <strong>not</strong> going to assume that you know anything about Webpack, Babel, or any other JavaScript toolchain insanity.</p>
<h2>Example project</h2>
<p>The example code for this guide is hosted on <a href="https://github.com/MattSegal/django-react-guide">this GitHub repo</a>. The code for each section is available as a Git branch:</p>
<ul>
<li><a href="https://github.com/MattSegal/django-react-guide/tree/part-1-initial-django">Starting point</a></li>
<li><a href="https://github.com/MattSegal/django-react-guide/tree/part-2-add-webpack">Adding Webpack</a></li>
<li><a href="https://github.com/MattSegal/django-react-guide/tree/part-3-add-babel-and-react">Adding Babel and React</a></li>
</ul>
<p>Before you start the rest of the guide, I recommend setting up the example project by cloning the repo and following the instructions in the <a href="https://github.com/MattSegal/django-react-guide/blob/part-1-initial-django/README.md">README</a>:</p>
<div class="highlight"><pre><span></span><code>git clone https://github.com/MattSegal/django-react-guide.git
</code></pre></div>
<div class="loom-embed"><iframe src="https://www.loom.com/embed/d238b8eb58dd44c89af7a4e3dd0c42a1" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>
<h2>Django and static files</h2>
<p>Before we dig into React, Babel and Webpack, I want to make sure that we have a common understanding around how static files work in Django:</p>
<p><img alt="views and static files" src="https://mattsegal.dev/views-static.png"></p>
<p>The approach of this guide will be to re-use a lot of this existing setup. We will create an additional system that inserts our React app's JavaScript into a Django static files folder.</p>
<p><img alt="views and static files plus mystery system" src="https://mattsegal.dev/views-static-mystery.png"></p>
<h2>Why can't we just write React in a single static file?</h2>
<p>Why do we need to add a new system? Django is pretty complicated already. Can't we just write our React app in a single JavaScript file like you usually do when writing JavaScript for webpages? The answer is yes, you totally can! You can write a complete React app in a single HTML file:</p>
<div class="highlight"><pre><span></span><code><span class="p"><</span><span class="nt">html</span><span class="p">></span>
<span class="p"><</span><span class="nt">body</span><span class="p">></span>
<span class="cm"><!-- React mount point --></span>
<span class="p"><</span><span class="nt">div</span> <span class="na">id</span><span class="o">=</span><span class="s">"app"</span><span class="p">></</span><span class="nt">div</span><span class="p">></span>
<span class="cm"><!-- Download React library scripts --></span>
<span class="p"><</span><span class="nt">script</span> <span class="na">crossorigin</span> <span class="na">src</span><span class="o">=</span><span class="s">"https://unpkg.com/react@16/umd/react.development.js"</span><span class="p">></</span><span class="nt">script</span><span class="p">></span>
<span class="p"><</span><span class="nt">script</span> <span class="na">crossorigin</span> <span class="na">src</span><span class="o">=</span><span class="s">"https://unpkg.com/react-dom@16/umd/react-dom.development.js"</span><span class="p">></</span><span class="nt">script</span><span class="p">></span>
<span class="p"><</span><span class="nt">script</span><span class="p">></span><span class="w"></span>
<span class="w"> </span><span class="c1">// Define the React app</span><span class="w"></span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">App</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">[</span><span class="nx">count</span><span class="p">,</span><span class="w"> </span><span class="nx">setCount</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">React</span><span class="p">.</span><span class="nx">useState</span><span class="p">(</span><span class="mf">0</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">onClick</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="nx">setCount</span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">1</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">React</span><span class="p">.</span><span class="nx">createElement</span><span class="p">(</span><span class="s1">'div'</span><span class="p">,</span><span class="w"> </span><span class="kc">null</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nx">React</span><span class="p">.</span><span class="nx">createElement</span><span class="p">(</span><span class="s1">'h1'</span><span class="p">,</span><span class="w"> </span><span class="kc">null</span><span class="p">,</span><span class="w"> </span><span class="s1">'The count is '</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">count</span><span class="p">),</span><span class="w"></span>
<span class="w"> </span><span class="nx">React</span><span class="p">.</span><span class="nx">createElement</span><span class="p">(</span><span class="s1">'button'</span><span class="p">,</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">onClick</span><span class="o">:</span><span class="w"> </span><span class="nx">onClick</span><span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="s1">'Count'</span><span class="p">),</span><span class="w"></span>
<span class="w"> </span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="c1">// Mount the app to the mount point.</span><span class="w"></span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">root</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">document</span><span class="p">.</span><span class="nx">getElementById</span><span class="p">(</span><span class="s1">'app'</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="nx">ReactDOM</span><span class="p">.</span><span class="nx">render</span><span class="p">(</span><span class="nx">React</span><span class="p">.</span><span class="nx">createElement</span><span class="p">(</span><span class="nx">App</span><span class="p">,</span><span class="w"> </span><span class="kc">null</span><span class="p">,</span><span class="w"> </span><span class="kc">null</span><span class="p">),</span><span class="w"> </span><span class="nx">root</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="p"></</span><span class="nt">script</span><span class="p">></span>
<span class="p"></</span><span class="nt">body</span><span class="p">></span>
<span class="p"></</span><span class="nt">html</span><span class="p">></span>
</code></pre></div>
<p>Why don't we just do this? There are a few issues with this approach of writing React apps:</p>
<ul>
<li>We can't use <a href="https://reactjs.org/docs/introducing-jsx.html">JSX</a> syntax in our JavaScript</li>
<li>It's harder to break our JavaScript code up into modules</li>
<li>It's harder to install/use external libraries</li>
</ul>
<div class="loom-embed"><iframe src="https://www.loom.com/embed/8f2c4c6448144246b25beed21a7b4712" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>
<h2>Webpack</h2>
<p>The example code for this section <a href="https://github.com/MattSegal/django-react-guide/tree/part-1-initial-django">starts here</a> and <a href="https://github.com/MattSegal/django-react-guide/tree/part-2-add-webpack">ends here</a>.</p>
<p>We need a tool that helps us use JSX, and it would be nice to also have a "module bundling system" which lets us install 3rd party libraries and split our JavaScript code up into lots of little files. For this purpose, we're going to use <a href="https://webpack.js.org/">Webpack</a>. Webpack is going to take our code, plus any 3rd party libraries that we want to install and combine them into a single JS file.</p>
<p><img alt="webpack" src="https://mattsegal.dev/webpack.png"></p>
<p>In this step we will just create a minimal working Webpack setup. We're not going to try to use React yet. By the end of this section, we won't have added any new JavaScript features, but Webpack will be working.</p>
<p>To use Webpack you need to first install <a href="https://nodejs.org/en/">NodeJS</a> so that you can run JavaScript outside of your web browser. You need to be able to run <code>node</code> and <code>npm</code> (the Node Package Manager) before you can continue.</p>
<p>First, go into the example project and create a new folder called <code>frontend</code>.
We'll start by just copying over the existing JavaScript that is used by the Django app in <a href="https://github.com/MattSegal/django-react-guide/blob/part-1-initial-django/backend/todos/static/todos/main.js">main.js</a>. We're going to copy this into a "source code" folder at <code>frontend/src/index.js</code>.</p>
<div class="highlight"><pre><span></span><code><span class="c1">// frontend/src/index.js</span><span class="w"></span>
<span class="kd">const</span><span class="w"> </span><span class="nx">btn</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">document</span><span class="p">.</span><span class="nx">getElementById</span><span class="p">(</span><span class="s1">'click'</span><span class="p">)</span><span class="w"></span>
<span class="nx">btn</span><span class="p">.</span><span class="nx">addEventListener</span><span class="p">(</span><span class="s1">'click'</span><span class="p">,</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="nx">alert</span><span class="p">(</span><span class="s1">'You clicked the button!'</span><span class="p">))</span><span class="w"></span>
</code></pre></div>
<p>Inside of the <code>frontend</code> folder, install Webpack using <code>npm</code> as follows:</p>
<div class="highlight"><pre><span></span><code>npm init --yes
npm install webpack webpack-cli
</code></pre></div>
<p>Now is a good time to update your <code>.gitignore</code> file to exclude <code>node_modules</code>. Next, we need to add a file that tells Webpack what to do, which is called <code>webpack.config.js</code>:</p>
<div class="highlight"><pre><span></span><code><span class="c1">// frontend/webpack.config.js</span><span class="w"></span>
<span class="kd">const</span><span class="w"> </span><span class="nx">path</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">'path'</span><span class="p">)</span><span class="w"></span>
<span class="kd">const</span><span class="w"> </span><span class="nx">webpack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">'webpack'</span><span class="p">)</span><span class="w"></span>
<span class="nx">module</span><span class="p">.</span><span class="nx">exports</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="c1">// Where Webpack looks to load your JavaScript</span><span class="w"></span>
<span class="w"> </span><span class="nx">entry</span><span class="o">:</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nx">main</span><span class="o">:</span><span class="w"> </span><span class="nx">path</span><span class="p">.</span><span class="nx">resolve</span><span class="p">(</span><span class="nx">__dirname</span><span class="p">,</span><span class="w"> </span><span class="s1">'src/index.js'</span><span class="p">),</span><span class="w"></span>
<span class="w"> </span><span class="p">},</span><span class="w"></span>
<span class="w"> </span><span class="nx">mode</span><span class="o">:</span><span class="w"> </span><span class="s1">'development'</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="c1">// Where Webpack spits out the results (the myapp static folder)</span><span class="w"></span>
<span class="w"> </span><span class="nx">output</span><span class="o">:</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nx">path</span><span class="o">:</span><span class="w"> </span><span class="nx">path</span><span class="p">.</span><span class="nx">resolve</span><span class="p">(</span><span class="nx">__dirname</span><span class="p">,</span><span class="w"> </span><span class="s1">'../backend/myapp/static/myapp/'</span><span class="p">),</span><span class="w"></span>
<span class="w"> </span><span class="nx">filename</span><span class="o">:</span><span class="w"> </span><span class="s1">'[name].js'</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="p">},</span><span class="w"></span>
<span class="w"> </span><span class="nx">plugins</span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="w"></span>
<span class="w"> </span><span class="c1">// Don't output new files if there is an error</span><span class="w"></span>
<span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nx">webpack</span><span class="p">.</span><span class="nx">NoEmitOnErrorsPlugin</span><span class="p">(),</span><span class="w"></span>
<span class="w"> </span><span class="p">],</span><span class="w"></span>
<span class="w"> </span><span class="c1">// Where to find modules that can be imported (eg. React)</span><span class="w"></span>
<span class="w"> </span><span class="nx">resolve</span><span class="o">:</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nx">extensions</span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="s1">'*'</span><span class="p">,</span><span class="w"> </span><span class="s1">'.js'</span><span class="p">,</span><span class="w"> </span><span class="s1">'.jsx'</span><span class="p">],</span><span class="w"></span>
<span class="w"> </span><span class="nx">modules</span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="w"></span>
<span class="w"> </span><span class="nx">path</span><span class="p">.</span><span class="nx">resolve</span><span class="p">(</span><span class="nx">__dirname</span><span class="p">,</span><span class="w"> </span><span class="s1">'src'</span><span class="p">),</span><span class="w"></span>
<span class="w"> </span><span class="nx">path</span><span class="p">.</span><span class="nx">resolve</span><span class="p">(</span><span class="nx">__dirname</span><span class="p">,</span><span class="w"> </span><span class="s1">'node_modules'</span><span class="p">),</span><span class="w"></span>
<span class="w"> </span><span class="p">],</span><span class="w"></span>
<span class="w"> </span><span class="p">},</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>Finally let's make it easy to run Webpack by including an entry in the "scripts" section of our <code>package.json</code> file:</p>
<div class="highlight"><pre><span></span><code><span class="c1">// frontend/package.json</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="c1">// ...</span><span class="w"></span>
<span class="w"> </span><span class="s2">"scripts"</span><span class="o">:</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="s2">"dev"</span><span class="o">:</span><span class="w"> </span><span class="s2">"webpack --watch --config webpack.config.js"</span><span class="w"></span>
<span class="w"> </span><span class="p">},</span><span class="w"></span>
<span class="w"> </span><span class="c1">// ...</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>The <code>--watch</code> flag is particularly useful: it makes Webpack re-run automatically on file change. Now we can run Webpack using <code>npm</code>:</p>
<div class="highlight"><pre><span></span><code>npm run dev
</code></pre></div>
<p>You will now see that the contents of your <code>main.js</code> file have been replaced with a crazy looking <code>eval</code> statement. If you check your Django app at <code>http://localhost:8000</code> you'll see that the JavaScript on the page still works, but it's now using the Webpack build output at <code>http://localhost:8000/static/myapp/main.js</code>.</p>
<div class="highlight"><pre><span></span><code><span class="c1">// backend/myapp/static/myapp/main.js</span><span class="w"></span>
<span class="nb">eval</span><span class="p">(</span><span class="s2">"const btn = document.getElementById('click')\nbtn.addEventListener('click', () => alert('You clicked the button!'))\n\n\n//# sourceURL=webpack://frontend/./src/index.js?"</span><span class="p">);</span><span class="w"></span>
</code></pre></div>
<p>This file is the Webpack build output. Webpack has taken our source file (<code>index.js</code>) and transformed it into an output file (<code>main.js</code>): </p>
<p><img alt="webpack minimal" src="https://mattsegal.dev/webpack-minimal.png"></p>
<p>So now we have Webpack working. It's not doing anything particularly useful or interesting yet, but all the plumbing has been set up.</p>
<div class="loom-embed"><iframe src="https://www.loom.com/embed/b3dd1325841646a491728c1478a173d3" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>
<h2>Source code vs. build outputs</h2>
<p>It's a common newbie mistake to add Webpack build outputs like <code>main.js</code> to source control. It's a mistake because source control is for "source code", not "build artifacts". A build artifact is a file created by a build or compilation process. The reason you don't add build artifacts is because they're redundant: they are fully defined by the source code, so adding them just bloats the repo without adding any extra information. Even worse, having a mismatch between source code and build artifacts can create nasty errors that are hard to find. Some examples of build artifacts:</p>
<ul>
<li>Python bytecode (.pyc) files, which are built from .py files by the Python interpreter</li>
<li>.NET bytecode (.dll) files, built from compiling C# code</li>
<li>Executable (.exe) files, built from compiling C code</li>
</ul>
<p>None of these things should go in source control unless there's a special reason to keep them. In general they should be kept out of Git using the <code>.gitignore</code> file.</p>
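<p>To make the "redundant" point concrete, here is a small sketch using Python bytecode, the first artifact in the list above. It uses the standard library's <code>py_compile</code> module; the function name and file names are just placeholders for this example. The artifact is built, thrown away, and then rebuilt from nothing but the source file.</p>

```python
import py_compile
import tempfile
from pathlib import Path


def rebuild_artifact(source_code: str) -> bool:
    """Compile source to bytecode, delete the artifact, then rebuild it from source."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "example.py"
        artifact = Path(tmp) / "example.pyc"
        source.write_text(source_code)

        # First build: source file in, build artifact out.
        py_compile.compile(str(source), cfile=str(artifact), doraise=True)

        # Deleting the artifact loses no information...
        artifact.unlink()

        # ...because the source alone is enough to recreate it.
        py_compile.compile(str(source), cfile=str(artifact), doraise=True)
        return artifact.exists()


if __name__ == "__main__":
    print(rebuild_artifact("print('hello')\n"))
```

<p>The same reasoning applies to Webpack's <code>main.js</code>: anyone with the source and the build tooling can regenerate it.</p>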
<p>My approach for this project is to create a special Webpack-only folder in Django's static files folder called "build", which is ignored by Git.
To achieve this, you need to update your <code>webpack.config.js</code> file:</p>
<div class="highlight"><pre><span></span><code><span class="c1">// frontend/webpack.config.js</span><span class="w"></span>
<span class="c1">// ...</span><span class="w"></span>
<span class="nx">module</span><span class="p">.</span><span class="nx">exports</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="c1">// ...</span><span class="w"></span>
<span class="w"> </span><span class="nx">output</span><span class="o">:</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nx">path</span><span class="o">:</span><span class="w"> </span><span class="nx">path</span><span class="p">.</span><span class="nx">resolve</span><span class="p">(</span><span class="nx">__dirname</span><span class="p">,</span><span class="w"> </span><span class="s1">'../backend/myapp/static/myapp/build/'</span><span class="p">),</span><span class="w"></span>
<span class="w"> </span><span class="nx">filename</span><span class="o">:</span><span class="w"> </span><span class="s1">'[name].js'</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="p">},</span><span class="w"></span>
<span class="w"> </span><span class="c1">// ...</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>You will need to restart Webpack for these changes to take effect. Then you can add <code>build/</code> to your <code>.gitignore</code> file.
Finally, you will need to update the static file link in your Django template:</p>
<div class="highlight"><pre><span></span><code><span class="cm"><!-- backend/myapp/templates/myapp/index.html --></span>
<span class="p"><</span><span class="nt">script</span> <span class="na">src</span><span class="o">=</span><span class="s">"{% static 'myapp/build/main.js' %}"</span><span class="p">></</span><span class="nt">script</span><span class="p">></span>
</code></pre></div>
<div class="loom-embed"><iframe src="https://www.loom.com/embed/86893cc2f3c14a41ab347bc912678ec9" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>
<h2>Adding React</h2>
<p>The example code for this section <a href="https://github.com/MattSegal/django-react-guide/tree/part-2-add-webpack">starts here</a> and <a href="https://github.com/MattSegal/django-react-guide/tree/part-3-add-babel-and-react">ends here</a>.</p>
<p>Now that Webpack is working, we can add React. Let's start by installing React in our <code>frontend</code> folder:</p>
<div class="highlight"><pre><span></span><code>npm install react react-dom
</code></pre></div>
<p>Now we can use React in our JavaScript source code. Let's re-use the small counter app I created earlier:</p>
<div class="highlight"><pre><span></span><code><span class="c1">// frontend/src/index.js</span>
<span class="k">import</span> <span class="nx">React</span> <span class="nx">from</span> <span class="s">'react'</span>
<span class="k">import</span> <span class="nx">ReactDOM</span> <span class="nx">from</span> <span class="s">'react-dom'</span>
<span class="c1">// Define the React app</span>
<span class="kd">const</span> <span class="nx">App</span> <span class="o">=</span> <span class="p">()</span> <span class="o">=></span> <span class="p">{</span>
<span class="kd">const</span> <span class="p">[</span><span class="nx">count</span><span class="p">,</span> <span class="nx">setCount</span><span class="p">]</span> <span class="o">=</span> <span class="nx">React</span><span class="p">.</span><span class="nx">useState</span><span class="p">(</span><span class="m">0</span><span class="p">)</span>
<span class="kd">const</span> <span class="nx">onClick</span> <span class="o">=</span> <span class="p">()</span> <span class="o">=></span> <span class="nx">setCount</span><span class="p">(</span><span class="nx">c</span> <span class="o">=></span> <span class="nx">c</span> <span class="o">+</span> <span class="m">1</span><span class="p">)</span>
<span class="k">return</span> <span class="nx">React</span><span class="p">.</span><span class="nx">createElement</span><span class="p">(</span><span class="s">'div'</span><span class="p">,</span> <span class="k">null</span><span class="p">,</span>
<span class="nx">React</span><span class="p">.</span><span class="nx">createElement</span><span class="p">(</span><span class="s">'h1'</span><span class="p">,</span> <span class="k">null</span><span class="p">,</span> <span class="s">'The count is '</span> <span class="o">+</span> <span class="nx">count</span><span class="p">),</span>
<span class="nx">React</span><span class="p">.</span><span class="nx">createElement</span><span class="p">(</span><span class="s">'button'</span><span class="p">,</span> <span class="p">{</span> <span class="nx">onClick</span><span class="p">:</span> <span class="nx">onClick</span> <span class="p">},</span> <span class="s">'Count'</span><span class="p">),</span>
<span class="p">)</span>
<span class="p">}</span>
<span class="c1">// Mount the app to the mount point.</span>
<span class="kd">const</span> <span class="nx">root</span> <span class="o">=</span> <span class="nb">document</span><span class="p">.</span><span class="nx">getElementById</span><span class="p">(</span><span class="s">'app'</span><span class="p">)</span>
<span class="nx">ReactDOM</span><span class="p">.</span><span class="nx">render</span><span class="p">(</span><span class="nx">React</span><span class="p">.</span><span class="nx">createElement</span><span class="p">(</span><span class="nx">App</span><span class="p">,</span> <span class="k">null</span><span class="p">,</span> <span class="k">null</span><span class="p">),</span> <span class="nx">root</span><span class="p">)</span>
</code></pre></div>
<p>Now if you go to <code>http://localhost:8000/</code> you should see a simple counter. If you inspect the contents of <code>main.js</code> at <code>http://localhost:8000/static/myapp/build/main.js</code>, you'll see that there is a <em>lot</em> more stuff included in the file. This is because Webpack has bundled up our code plus the development versions of React and ReactDOM into a single file:</p>
<p><img alt="webpack" src="https://mattsegal.dev/webpack.png"></p>
<div class="loom-embed"><iframe src="https://www.loom.com/embed/76bf5c576ff148aea4e0d332507ec381" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>
<h2>Adding Babel</h2>
<p>Next we need a tool that lets us write JSX. We want to be able to write our React components like this:</p>
<div class="highlight"><pre><span></span><code><span class="kd">const</span> <span class="nx">App</span> <span class="o">=</span> <span class="p">()</span> <span class="o">=></span> <span class="p">{</span>
<span class="kd">const</span> <span class="p">[</span><span class="nx">count</span><span class="p">,</span> <span class="nx">setCount</span><span class="p">]</span> <span class="o">=</span> <span class="nx">React</span><span class="p">.</span><span class="nx">useState</span><span class="p">(</span><span class="m">0</span><span class="p">)</span>
<span class="kd">const</span> <span class="nx">onClick</span> <span class="o">=</span> <span class="p">()</span> <span class="o">=></span> <span class="nx">setCount</span><span class="p">(</span><span class="nx">c</span> <span class="o">=></span> <span class="nx">c</span> <span class="o">+</span> <span class="m">1</span><span class="p">)</span>
<span class="k">return</span> <span class="p">(</span>
<span class="p"><</span><span class="nt">div</span><span class="p">></span>
<span class="p"><</span><span class="nt">h1</span><span class="p">></span>The count is <span class="p">{</span><span class="nx">count</span><span class="p">}</</span><span class="nt">h1</span><span class="p">></span>
<span class="p"><</span><span class="nt">button</span> <span class="na">onClick</span><span class="o">=</span><span class="p">{</span><span class="nx">onClick</span><span class="p">}></span>Count<span class="p"></</span><span class="nt">button</span><span class="p">></span>
<span class="p"></</span><span class="nt">div</span><span class="p">></span>
<span class="p">)</span>
<span class="p">}</span>
</code></pre></div>
<p>and then some magic tool transforms it into regular JavaScript, like this:</p>
<div class="highlight"><pre><span></span><code><span class="kd">const</span> <span class="nx">App</span> <span class="o">=</span> <span class="p">()</span> <span class="o">=></span> <span class="p">{</span>
<span class="kd">const</span> <span class="p">[</span><span class="nx">count</span><span class="p">,</span> <span class="nx">setCount</span><span class="p">]</span> <span class="o">=</span> <span class="nx">React</span><span class="p">.</span><span class="nx">useState</span><span class="p">(</span><span class="m">0</span><span class="p">)</span>
<span class="kd">const</span> <span class="nx">onClick</span> <span class="o">=</span> <span class="p">()</span> <span class="o">=></span> <span class="nx">setCount</span><span class="p">(</span><span class="nx">c</span> <span class="o">=></span> <span class="nx">c</span> <span class="o">+</span> <span class="m">1</span><span class="p">)</span>
<span class="k">return</span> <span class="nx">React</span><span class="p">.</span><span class="nx">createElement</span><span class="p">(</span><span class="s">'div'</span><span class="p">,</span> <span class="k">null</span><span class="p">,</span>
<span class="nx">React</span><span class="p">.</span><span class="nx">createElement</span><span class="p">(</span><span class="s">'h1'</span><span class="p">,</span> <span class="k">null</span><span class="p">,</span> <span class="s">'The count is '</span> <span class="o">+</span> <span class="nx">count</span><span class="p">),</span>
<span class="nx">React</span><span class="p">.</span><span class="nx">createElement</span><span class="p">(</span><span class="s">'button'</span><span class="p">,</span> <span class="p">{</span> <span class="nx">onClick</span><span class="p">:</span> <span class="nx">onClick</span> <span class="p">},</span> <span class="s">'Count'</span><span class="p">),</span>
<span class="p">)</span>
<span class="p">}</span>
</code></pre></div>
<p>That magic tool is <a href="https://babeljs.io/">Babel</a>, a JavaScript compiler that can transform JSX into standard JavaScript.
Babel can use <a href="https://babeljs.io/docs/en/plugins">plugins</a>, which apply custom transforms to your source code.
It also offers <a href="https://babeljs.io/docs/en/presets">presets</a>, which are groups of plugins that work well together to achieve a goal.</p>
<p>Now we're going to install a whole bunch of Babel stuff with <code>npm</code>:</p>
<div class="highlight"><pre><span></span><code>npm install --save-dev babel-loader @babel/core @babel/preset-react
</code></pre></div>
<p>What the hell is all of this? Let me break it down for you:</p>
<ul>
<li><strong><a href="https://babeljs.io/docs/en/babel-core">@babel/core</a></strong>: The main Babel compiler library</li>
<li><strong><a href="https://babeljs.io/docs/en/babel-preset-react">@babel/preset-react</a></strong>: A collection of React plugins that transform JSX into regular JavaScript</li>
<li><strong><a href="https://github.com/babel/babel-loader">babel-loader</a></strong>: Allows Webpack to use Babel</li>
</ul>
<p>These are not the only Babel plugins that I like to use, but I didn't want to add too many new things at once.
In addition to installing the plugins/presets, we need to tell Babel to use them, which we do with a config file called <code>.babelrc</code>.</p>
<div class="highlight"><pre><span></span><code><span class="c1">// frontend/.babelrc</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="s2">"presets"</span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"@babel/preset-react"</span><span class="p">]</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>Next, we need to tell Webpack to use our new Babel compiler for all our JavaScript files:</p>
<div class="highlight"><pre><span></span><code><span class="c1">// frontend/webpack.config.js</span><span class="w"></span>
<span class="c1">// ...</span><span class="w"></span>
<span class="nx">module</span><span class="p">.</span><span class="nx">exports</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="c1">// ...</span><span class="w"></span>
<span class="w"> </span><span class="c1">// Add a rule so Webpack reads JS with Babel</span><span class="w"></span>
<span class="w"> </span><span class="nx">module</span><span class="o">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">rules</span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="w"></span>
<span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nx">test</span><span class="o">:</span><span class="w"> </span><span class="sr">/\.js$/</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nx">exclude</span><span class="o">:</span><span class="w"> </span><span class="sr">/node_modules/</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nx">use</span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="s1">'babel-loader'</span><span class="p">],</span><span class="w"></span>
<span class="w"> </span><span class="p">},</span><span class="w"></span>
<span class="w"> </span><span class="p">]},</span><span class="w"></span>
<span class="w"> </span><span class="c1">// ...</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>Essentially, this config change tells Webpack: "for any file ending with <code>.js</code>, use <code>babel-loader</code> on that file, except for anything in <code>node_modules</code>".
Finally, we can now use JSX in our React app:</p>
<div class="highlight"><pre><span></span><code><span class="c1">// frontend/src/index.js</span>
<span class="k">import</span> <span class="nx">React</span> <span class="nx">from</span> <span class="s">'react'</span>
<span class="k">import</span> <span class="nx">ReactDOM</span> <span class="nx">from</span> <span class="s">'react-dom'</span>
<span class="c1">// Define the React app</span>
<span class="kd">const</span> <span class="nx">App</span> <span class="o">=</span> <span class="p">()</span> <span class="o">=></span> <span class="p">{</span>
<span class="kd">const</span> <span class="p">[</span><span class="nx">count</span><span class="p">,</span> <span class="nx">setCount</span><span class="p">]</span> <span class="o">=</span> <span class="nx">React</span><span class="p">.</span><span class="nx">useState</span><span class="p">(</span><span class="m">0</span><span class="p">)</span>
<span class="kd">const</span> <span class="nx">onClick</span> <span class="o">=</span> <span class="p">()</span> <span class="o">=></span> <span class="nx">setCount</span><span class="p">(</span><span class="nx">c</span> <span class="o">=></span> <span class="nx">c</span> <span class="o">+</span> <span class="m">1</span><span class="p">)</span>
<span class="k">return</span> <span class="p">(</span>
<span class="p"><</span><span class="nt">div</span><span class="p">></span>
<span class="p"><</span><span class="nt">h1</span><span class="p">></span>The count is <span class="p">{</span><span class="nx">count</span><span class="p">}</</span><span class="nt">h1</span><span class="p">></span>
<span class="p"><</span><span class="nt">button</span> <span class="na">onClick</span><span class="o">=</span><span class="p">{</span><span class="nx">onClick</span><span class="p">}></span>Count<span class="p"></</span><span class="nt">button</span><span class="p">></span>
<span class="p"></</span><span class="nt">div</span><span class="p">></span>
<span class="p">)</span>
<span class="p">}</span>
<span class="c1">// Mount the app to the mount point.</span>
<span class="kd">const</span> <span class="nx">root</span> <span class="o">=</span> <span class="nb">document</span><span class="p">.</span><span class="nx">getElementById</span><span class="p">(</span><span class="s">'app'</span><span class="p">)</span>
<span class="nx">ReactDOM</span><span class="p">.</span><span class="nx">render</span><span class="p">(<</span><span class="nt">App</span> <span class="p">/>,</span> <span class="nx">root</span><span class="p">)</span>
</code></pre></div>
<p>You will need to restart Webpack for the config changes to be loaded. After that, you should be able to visit <code>http://localhost:8000/</code> and view your counter app, now working with JSX.</p>
<div class="loom-embed"><iframe src="https://www.loom.com/embed/18e5b20ee31344b588aa17dd902344ce" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>
<h2>Deployment</h2>
<p>I won't cover deployment in detail in this post, because it's long enough already, but in short, you can now deploy your Django/React app as follows:</p>
<ul>
<li>Install JavaScript dependencies with <code>npm</code></li>
<li>Run Webpack to create build artifacts in your Django static files</li>
<li>Deploy Django how you normally would</li>
</ul>
<p>There are a few things that would be good to change before deploying, like not using "development" mode in Webpack, but this workflow should get you started for now.
If you have never deployed a Django app before, I've written an <a href="https://mattsegal.dev/simple-django-deployment.html">introductory guide</a> on that as well, which uses the same incremental, explanation-heavy style as this guide.</p>
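<p>As a rough sketch, the three steps above could be scripted like this. The folder names <code>frontend</code> and <code>backend</code>, the use of <code>npm ci</code> and <code>collectstatic</code>, and the assumption that <code>manage.py</code> lives in <code>backend</code> are all guesses for illustration; adjust them to match your own project and hosting setup.</p>

```python
import subprocess


def build_steps(frontend_dir="frontend", backend_dir="backend"):
    """Return (working directory, command) pairs for a build.

    Directory names and commands are assumptions, not part of the guide.
    """
    return [
        # 1. Install JavaScript dependencies from the lockfile
        (frontend_dir, ["npm", "ci"]),
        # 2. Build the bundle into Django's static folder, not in "development" mode
        (frontend_dir, ["npx", "webpack", "--mode", "production"]),
        # 3. Start of a normal Django deploy: gather static files
        (backend_dir, ["python", "manage.py", "collectstatic", "--noinput"]),
    ]


def run_steps(steps):
    """Run each command in order, stopping at the first failure."""
    for cwd, command in steps:
        subprocess.run(command, cwd=cwd, check=True)


# To actually run the build, call:
# run_steps(build_steps())
```

<p>Keeping the step list separate from the code that runs it makes it easy to print or review the commands before executing anything on a server.</p>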
<h2>Next steps</h2>
<p>There is a <strong>lot</strong> of stuff I didn't cover in this guide, which I'd like to write about in the future. Here are some things that I didn't cover, which are important or useful when building a React/Django app:</p>
<ul>
<li>Hot reloading</li>
<li>Deployment</li>
<li>Passing requests/data between Django and React</li>
<li>Modular CSS / SCSS / styled components</li>
<li>Routing and code-splitting</li>
<li>Authentication</li>
</ul>How to highlight unused Python variables in VS Code2020-10-09T12:00:00+11:002020-10-09T12:00:00+11:00Matthew Segaltag:mattsegal.dev,2020-10-09:/pylance-vscode.html<p>I make a lot of stupid mistakes when I'm working on Python code. I tend to:</p>
<ul>
<li>make typos in variable names</li>
<li>accidentally delete a variable that's used somewhere else</li>
<li>leave unused variables lying around when they should be deleted</li>
</ul>
<p>It's easy to accidentally create code like in the image below, where you have unused variables (<code>y</code>, <code>z</code>, <code>q</code>) and references to variables that aren't defined yet (<code>z</code>).</p>
<p><img alt="foo-before" src="https://mattsegal.dev/img/pylance/foo-before.png"></p>
<p>You'll catch these issues when you eventually try to run this function, but it's best
to be able to spot them instantly. I want my editor to show me something that looks like this:</p>
<p><img alt="foo-after" src="https://mattsegal.dev/img/pylance/foo-after.png"></p>
<p>Here you can see that the vars <code>y</code>, <code>z</code> and <code>q</code> are greyed out, to show that they're not used. The undefined reference to <code>z</code> is highlighted with a yellow squiggle. This kind of instant visual feedback means you can write better code, faster and with less mental overhead.</p>
<p>Having your editor highlight unused variables can also help you remove clutter.
For example, it's common to have old imports that aren't used anymore, like <code>copy</code> and <code>requests</code> in this script:</p>
<p><img alt="imports-before" src="https://mattsegal.dev/img/pylance/imports-before.png"></p>
<p>It's often hard to see what imports are being used just by looking, which is why it's nice to
have your editor tell you:</p>
<p><img alt="imports-after" src="https://mattsegal.dev/img/pylance/imports-after.png"></p>
<p>You'll also note that there is an error in my import statement. <code>import copy from copy</code> isn't valid Python. This was an <em>unintentional mistake</em> in my example code that VS Code caught for me.</p>
<h2>Setting this up with VS Code</h2>
<p>You can get these variable highlights in VS Code very easily by installing <a href="https://devblogs.microsoft.com/python/announcing-pylance-fast-feature-rich-language-support-for-python-in-visual-studio-code/">Pylance</a>, an alternative "language server" for VS Code. A language server is a tool that runs alongside the editor and does <a href="https://en.wikipedia.org/wiki/Static_program_analysis">static analysis</a> of your code.</p>
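<p>To give a feel for what "static analysis" means, here is a toy sketch that finds unused variables without ever running the code. It uses Python's built-in <code>ast</code> module and a made-up function <code>foo</code>; it is not how Pylance works internally, and it ignores scoping rules that a real tool has to handle.</p>

```python
import ast

# A hypothetical function with two variables that are assigned but never read
SOURCE = '''
def foo(x):
    y = x + 1
    q = 2
    return x * 2
'''


def find_unused_variables(source):
    """Return names that are assigned somewhere but never read, sorted."""
    tree = ast.parse(source)
    assigned, used = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                assigned.add(node.id)  # e.g. the "y" in "y = x + 1"
            elif isinstance(node.ctx, ast.Load):
                used.add(node.id)  # e.g. the "x" in "x * 2"
    return sorted(assigned - used)


print(find_unused_variables(SOURCE))  # ['q', 'y']
```

<p>The code is only parsed, never executed, which is why an editor can give you this feedback instantly as you type.</p>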
<p>To get this language server, go into your extensions tab in VS Code, search for "pylance", install it, and then you'll see this popup:</p>
<p><img alt="server-prompt" src="https://mattsegal.dev/img/pylance/server-prompt.png"></p>
<p>Click "Yes, and reload".</p>
<h2>Alternatives</h2>
<p>PyCharm does this kind of <a href="https://en.wikipedia.org/wiki/Static_program_analysis">static analysis</a> out of the box. I don't like PyCharm quite as much as VS Code, but it's a decent editor and many people swear by it. You can also get this feature by enabling a Python linter in VS Code, like flake8 or pylint. I don't like twiddling with linters, but again other people enjoy using them.</p>
<h2>Next steps</h2>
<p>If you're looking for more Python productivity helpers, then check out my blog post on the <a href="https://mattsegal.dev/python-formatting-with-black.html">Black</a> auto-formatter.</p>A Django project blueprint to help you learn by doing2020-10-03T12:00:00+10:002020-10-03T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-10-03:/django-survey-project.html<p>There's an awkward point when you're learning Django where you've done the <a href="https://docs.djangoproject.com/en/3.1/intro/install/">official tutorial</a> and maybe built a simple project, like a to-do list, and now you want to try something a little more advanced. People say that you should "learn by building things", which is good advice, but it leaves you unsure about what to <em>actually build</em>. </p>
<p>In this post I'll share two things:</p>
<ul>
<li>a description of a Django project for beginners, which you can build; and</li>
<li>a short guide on how to design a new website from scratch</li>
</ul>
<p>I won't introduce many new tools or technical concepts beyond what is already in the Django tutorial. The project can be built using just the basic Django features. There is no need to use REST Framework, JavaScript, React, Webpack, Babel, JSON or AJAX to get this done. Only Django, HTML and CSS are required.</p>
<p>Even though this project only uses simple tools, I think building it is worthwhile for a beginner, since it will introduce you to many of the common themes of backend web development.</p>
<h1>Project overview</h1>
<p>In this project, you will build a Django app that runs a survey website. On this site, users can create surveys and send them out to other people to get answers. A user can sign up, create a survey and add multi-choice questions to it. They can then send a survey link to other people, who will answer all the questions. The user who created the survey can see how many people answered, and what percentage of people chose each multi-choice option.</p>
<p>That's the whole app. I have created a <a href="https://github.com/MattSegal/django-survey">reference implementation on my GitHub</a> which you can look at if you get stuck when building it yourself.</p>
<p>The project description sounds simple, doesn't it? I thought this would take me 8 hours to design and build, but I spent <strong>20 hours</strong> at the keyboard to get it done. Software projects are hard to estimate before they are built, since they have a <a href="http://johnsalvatier.org/blog/2017/reality-has-a-surprising-amount-of-detail">surprising amount of detail</a> that you don't think about beforehand.</p>
<h1>Designing the app</h1>
<p>So now you know what you're building, but you're not ready to write any code yet. We need to create a design first. As the saying goes: <em>weeks of coding can save hours of planning</em>. </p>
<p>This design will have three parts:</p>
<ul>
<li><strong>User journeys</strong>: where you decide who is using your app and how they will use it</li>
<li><strong>Data models</strong>: where you decide how you will structure the database</li>
<li><strong>Webpage wireframes</strong>: where you decide what your user interface (UI) will look like</li>
</ul>
<h1>User journey</h1>
<p>The most important thing to do when building a website is to consider the users and their goals. In this case, I think there are two sets of users:</p>
<ul>
<li><strong>Survey takers</strong>: people who want to answer a survey's questions</li>
<li><strong>Survey creators</strong>: people who want to create a survey, send it out and view the answers</li>
</ul>
<p>To better understand who your users are and what they want, you should construct a <a href="https://en.wikipedia.org/wiki/User_journey">user journey</a> for each of them: a high-level description of the steps that they will need to take to get what they want. This is easily represented as a diagram, created with a free wireframing tool like <a href="https://excalidraw.com/">Excalidraw</a> or <a href="https://wireflow.co/">Wireflow</a>.</p>
<p>Let's start with the person who is answering the survey, the "survey taker", who has a simple user journey:</p>
<p><img alt="user journey for survey taker" src="https://mattsegal.dev/journey-taker.png"></p>
<p>Next, let's look at the person who created the survey, the "survey creator":</p>
<p><img alt="user journey for survey creator" src="https://mattsegal.dev/journey-creator.png"></p>
<p>Creating these diagrams will force you to think about what you will need to build and why. For example, a survey creator will probably need a user account and the ability to "log in", since they will want private access to their surveys. Lots of thoughts about how to build your app will cross your mind when you are mapping these user journeys.</p>
<h1>Data models</h1>
<p>Once you know what your users want to do, you should focus on what data you will need to describe all of the things in your app. So far we have vague ideas of "surveys", "questions", "answers" and "results", but we need a more specific description of these things so that we can write our Model classes in Django.</p>
<p>To better understand your data, I recommend that you create a simple diagram that displays your models and how they relate to each other. Each connection between two models is some kind of foreign key relation. Something like this:</p>
<p><img alt="app data model" src="https://mattsegal.dev/data-model.png"></p>
<p>I explain how I came up with this particular data model in this <a href="https://mattsegal.dev/django-survey-project-data-model.html">appendix page</a>.</p>
<p>You don't need to get too formal or technical with these diagrams. They're just a starting point, not a perfect, final description of how your app will work. Also, the data model which I made isn't the only possible one for this app. Feel free to make your own and do it differently.</p>
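<p>If it helps to see a data model written down as code, here is a minimal sketch using plain Python dataclasses rather than Django models. The class and field names are assumptions made for illustration and may not match the diagram exactly; in a real Django app each class would be a <code>Model</code> and each nested object would be a foreign key relation.</p>

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Option:
    text: str


@dataclass
class Question:
    prompt: str
    options: list = field(default_factory=list)  # Option objects


@dataclass
class Survey:
    title: str
    creator: str  # stands in for a foreign key to a user account
    questions: list = field(default_factory=list)  # Question objects


@dataclass
class Answer:
    question: Question
    option: Option  # the single option that was chosen


@dataclass
class Submission:
    survey: Survey
    answers: list = field(default_factory=list)  # Answer objects


def option_percentages(question, submissions):
    """Percentage of answers to `question` that chose each of its options."""
    counts = Counter(
        answer.option.text
        for submission in submissions
        for answer in submission.answers
        if answer.question is question
    )
    total = sum(counts.values())
    return {
        option.text: (100 * counts[option.text] / total if total else 0.0)
        for option in question.options
    }


red, blue = Option("Red"), Option("Blue")
colour = Question("Favourite colour?", [red, blue])
survey = Survey("Colours", "matt", [colour])
submissions = [
    Submission(survey, [Answer(colour, red)]),
    Submission(survey, [Answer(colour, red)]),
    Submission(survey, [Answer(colour, blue)]),
    Submission(survey, [Answer(colour, red)]),
]
print(option_percentages(colour, submissions))  # {'Red': 75.0, 'Blue': 25.0}
```

<p>Writing out even a throwaway version like this is a quick way to check that your models can answer the questions from the user journeys, such as "what percentage of people chose each option?".</p>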
<h1>Webpage wireframes</h1>
<p>Now we have an idea of how our users will interact with the app and we know how we will structure our data. Next, we design our user interfaces. I suggest you create a rough <a href="https://www.usability.gov/how-to-and-tools/methods/wireframing.html">wireframe</a> that describes the user interface for each webpage. Creating wireframes for webpages is a good idea for two reasons:</p>
<ul>
<li>Wireframing allows you to <strong>quickly</strong> explore different page designs and it forces you to think about how your app needs to work</li>
<li>It's <strong>much</strong> easier to write HTML and CSS for pages where you already have a simple design to work from</li>
</ul>
<p>You can use a free wireframing tool like <a href="https://excalidraw.com/">Excalidraw</a> or <a href="https://wireflow.co/">Wireflow</a> for these diagrams. Keep in mind that this project doesn't use JavaScript, so you can't get too fancy with custom interactions. You will need to use <a href="https://developer.mozilla.org/en-US/docs/Learn/Forms">HTML forms</a> to POST data to the backend.</p>
<p>You can create your own wireframes or you can use the ones that I've already created, which are all listed in this <a href="https://mattsegal.dev/django-survey-project-wireframes.html">appendix page</a> with some additional notes for each page:</p>
<ul>
<li><a href="https://mattsegal.dev/django-survey-project-wireframes.html#start">Starting the survey</a></li>
<li><a href="https://mattsegal.dev/django-survey-project-wireframes.html#answer">Answering the survey</a></li>
<li><a href="https://mattsegal.dev/django-survey-project-wireframes.html#submit">Survey submitted</a></li>
<li><a href="https://mattsegal.dev/django-survey-project-wireframes.html#landing">Landing page</a></li>
<li><a href="https://mattsegal.dev/django-survey-project-wireframes.html#signup">Signing up</a></li>
<li><a href="https://mattsegal.dev/django-survey-project-wireframes.html#login">Logging in</a></li>
<li><a href="https://mattsegal.dev/django-survey-project-wireframes.html#list">Survey list</a></li>
<li><a href="https://mattsegal.dev/django-survey-project-wireframes.html#create">Create a survey</a></li>
<li><a href="https://mattsegal.dev/django-survey-project-wireframes.html#edit">Edit a survey</a></li>
<li><a href="https://mattsegal.dev/django-survey-project-wireframes.html#addquestion">Add questions to a survey</a></li>
<li><a href="https://mattsegal.dev/django-survey-project-wireframes.html#addoption">Add options to a survey question</a></li>
<li><a href="https://mattsegal.dev/django-survey-project-wireframes.html#details">Survey details</a></li>
</ul>
<h1>General advice</h1>
<p>Now with some user journeys, a data model and a set of wireframes, you should be ready to start building your Django app. This project blueprint will help you get started, but there is still a lot of work for you to do if you want to build this app. You still need to:</p>
<ul>
<li>decide on a URL schema</li>
<li>create models to represent the data</li>
<li>create forms to validate the user-submitted data</li>
<li>write HTML templates to build each page</li>
<li>add views to bind everything together</li>
</ul>
<p>There are about 12 views, 12 templates, 5 forms and 5 models to write. Given all this work, it's really important that you <strong>focus</strong> and keep the scope of this project narrow. Keep everything <strong>simple</strong>. Don't use any JavaScript and write as little CSS as possible. Use a CSS framework like <a href="https://getbootstrap.com/docs/4.0/getting-started/introduction/">Bootstrap</a> or <a href="https://semantic-ui.com/">Semantic UI</a> if you want it to look nice. Get something simple working <strong>first</strong>, and then you can make it fancy later. If you don't focus, you could spend weeks or months on this project before it's done.</p>
<p>As a specific example, consider the user authentication feature. In this app, your users can log in or sign up. To really make the auth system "complete", you could also add a log out button, a password reset page, and an email validation feature. I think you should skip these features for now though, and get the core functionality working first.</p>
<p>Software projects are never finished, and you can improve this app again and again even after you are "done". Don't try to make it perfect, just finish it.</p>
<h1>Next steps</h1>
<p>I hope you find this blueprint project and design guide helpful. If you actually end up building this, send me an email! I'd love to see it. If you like this post and you want to read some more stuff I've written about Django, check out:</p>
<ul>
<li><a href="https://mattsegal.dev/simple-django-deployment.html">A beginner's guide to Django deployment</a></li>
<li><a href="https://mattsegal.dev/how-to-read-django-docs.html">How to read the Django documentation</a></li>
<li><a href="https://mattsegal.dev/django-portable-setup.html">How to make your Django project easy to move and share</a></li>
<li><a href="https://mattsegal.dev/github-resume-polish.html">How to polish your GitHub projects when you're looking for a job</a></li>
<li><a href="https://mattsegal.dev/django-debug-tips.html">Tips for debugging with Django</a></li>
</ul>
<p>You can also subscribe to my mailing list below for emails when I post new articles.</p>Django project blueprint: data model2020-10-03T12:00:00+10:002020-10-03T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-10-03:/django-survey-project-data-model.html<p>This post is an appendix to my post on <a href="https://mattsegal.dev/django-survey-project.html">designing a Django project</a>. In this page I explain why I chose to use this data model:</p>
<p><img alt="app data model" src="https://mattsegal.dev/data-model.png"></p>
<p>I created this data model by looking at the user journeys and thinking about what data I would need to make them work. Here's …</p><p>This post is an appendix to my post on <a href="https://mattsegal.dev/django-survey-project.html">designing a Django project</a>. In this page I explain why I chose to use this data model:</p>
<p><img alt="app data model" src="https://mattsegal.dev/data-model.png"></p>
<p>I created this data model by looking at the user journeys and thinking about what data I would need to make them work. Here's the thought process I used. First I thought about the data that I need to define everything about a "survey" in the app. I decided that I would need:</p>
<ul>
<li>a <strong>Survey</strong> model to represent each survey; and then</li>
<li>a link between each <strong>Survey</strong> and a <strong>User</strong>, since we need to restrict survey access to only the user who owns it</li>
<li>a <strong>Question</strong> model for each question on the survey. Each survey needs to have one or more questions, so we can't hardcode questions as fields on the <strong>Survey</strong> model, so we must create a new <strong>Question</strong> model which knows which survey owns it</li>
<li>each <strong>Question</strong> has one or more multi-choice answer options, so we must create an <strong>Option</strong> model</li>
</ul>
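<p>The relationships in the list above can be sketched in plain Python before you write any Django code. This is only an illustration: the field names (<code>title</code>, <code>owner</code>, <code>prompt</code>, <code>text</code>) are assumptions, and in a real Django app each reference to a parent object would typically become a <code>ForeignKey</code> field on a model.</p>

```python
from dataclasses import dataclass


@dataclass
class Survey:
    """A survey owned by exactly one user."""
    title: str  # hypothetical field name
    owner: str  # stands in for the link to a User


@dataclass
class Question:
    """A question that knows which survey owns it."""
    survey: Survey
    prompt: str  # hypothetical field name


@dataclass
class Option:
    """One multi-choice answer option for a question."""
    question: Question
    text: str  # hypothetical field name


# Build a tiny survey to show how the pieces point at each other.
survey = Survey(title="Colours", owner="matt")
colour = Question(survey=survey, prompt="What is your favourite colour?")
options = [Option(question=colour, text="Red"), Option(question=colour, text="Blue")]

# Each child points "up" to its parent, so we find children by filtering.
colour_options = [option.text for option in options if option.question is colour]
```

<p>Notice that the children hold the reference to the parent, not the other way around. That is why a survey can have any number of questions without the <strong>Survey</strong> model needing a field for each one.</p>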
<p>Next, I thought about how we would record a survey taker answering the questions. We would need:</p>
<ul>
<li>a <strong>Submission</strong> model to represent each survey taker's submission</li>
<li>a link between <strong>Submission</strong> and <strong>Survey</strong>, so each submission can know which survey it belongs to</li>
<li>the <strong>Answers</strong> to each question, where the answer is for a particular <strong>Option</strong></li>
</ul>Django project blueprint: wireframes2020-10-03T12:00:00+10:002020-10-03T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-10-03:/django-survey-project-wireframes.html<p>This post is an appendix to my post on <a href="https://mattsegal.dev/django-survey-project.html">designing a Django project</a>. This page shows all the wireframes for the app, with some additional notes for each page.</p>
<h1>Page designs for the user who answers the survey</h1>
<p>This section covers the pages required for the "survey taker" user journey …</p><p>This post is an appendix to my post on <a href="https://mattsegal.dev/django-survey-project.html">designing a Django project</a>. This page shows all the wireframes for the app, with some additional notes for each page.</p>
<h1>Page designs for the user who answers the survey</h1>
<p>This section covers the pages required for the "survey taker" user journey:</p>
<p><img alt="journey for survey taker" src="https://mattsegal.dev/journey-taker.png"></p>
<p>Taken literally, this journey suggests that we should build ~3 pages. </p>
<h2 id="start">Starting the survey</h2>
<p>The person taking the survey should start on a "landing" page, where we explain what's going on and invite them to take the survey.</p>
<p><img alt="landing" src="https://mattsegal.dev/page-start.png"></p>
<p>The "start survey" button can just be a link to the next page.</p>
<h2 id="answer">Answering the survey</h2>
<p>Next, we need a page for the survey taker to actually answer the questions.</p>
<p><img alt="survey answering page" src="https://mattsegal.dev/page-submit.png"></p>
<p>You will need to render all of the questions on the survey inside an HTML form. The "submit" button should trigger a POST request to the backend. </p>
<p>If you want to answer multiple questions on one page, then you will need to use a more advanced feature of Django: a formset. I found <a href="https://whoisnicoleharris.com/2015/01/06/implementing-django-formsets.html">this blog post</a> and <a href="https://jacobian.org/2010/feb/28/dynamic-form-generation/">this other one</a> useful for creating my formsets, along with the <a href="https://docs.djangoproject.com/en/3.1/topics/forms/formsets/">official Django docs on formsets</a>.</p>
<p>Alternatively, you could have one page per question, which would mean splitting up this single page across multiple pages, but it would make your Django forms simpler.</p>
<h2 id="submit">Survey submitted</h2>
<p>Once the user submits their answers for the survey, they should then receive confirmation that everything worked so that they don't try to submit the survey again or get frustrated. When they click "submit", let's take them to a "thank you" page.</p>
<p><img alt="thanks page" src="https://mattsegal.dev/page-thanks.png"></p>
<p>That's it for the survey taker. Next let's look at the survey creator pages.</p>
<h1>Page designs for the user who creates the survey</h1>
<p>Here's the "survey creator" user journey again.</p>
<p><img alt="creator journey" src="https://mattsegal.dev/journey-creator.png"></p>
<p>The correspondence between this journey and the pages won't be exact, but it'll be pretty close. </p>
<h2 id="landing">Landing page</h2>
<p>We should start the user's experience with a landing page, where we will explain the app to the user and invite them to log in with a <a href="https://en.wikipedia.org/wiki/Call_to_action_(marketing)">call to action</a> button.</p>
<p><img alt="landing page" src="https://mattsegal.dev/page-landing.png"></p>
<p>The button can just be a link to the login or signup page. If you're not sure what to write for the landing page, check out <a href="https://stackingthebricks.com/how-i-increased-conversion-2-4x-with-better-copywriting/">this article</a>.</p>
<h2 id="signup">Signing up</h2>
<p>We need a signup page for new users to create accounts.</p>
<p><img alt="sign up" src="https://mattsegal.dev/page-signup.png"></p>
<p>There should also be a link to the log in page from the signup page, just in case a user who already has an account gets lost. This <a href="https://simpleisbetterthancomplex.com/tutorial/2017/02/18/how-to-create-user-sign-up-view.html">blog post</a> is a good guide for how to create a sign up view in Django.</p>
<h2 id="login">Logging in</h2>
<p>We also need a login page for returning users.</p>
<p><img alt="login page" src="https://mattsegal.dev/page-login.png"></p>
<p>Use a <a href="https://docs.djangoproject.com/en/3.1/topics/auth/default/#django.contrib.auth.views.LoginView">LoginView</a> for the log in view. More details on this view class at <a href="https://ccbv.co.uk/projects/Django/3.0/django.contrib.auth.views/LoginView/">CCBV</a>. There should be a link to the signup page from the login page.</p>
<h2 id="list">Survey list</h2>
<p>Where do users go after they log in? There are two viable options. You could send them straight to a "create survey" page, or you could send them to a "list" page, where they can see all their surveys. I chose the list page option, because I think it's less disorienting for the user and less complicated to implement.</p>
<p><img alt="survey list" src="https://mattsegal.dev/page-list.png"></p>
<p>For this page to work you'll need to grab all of the Survey objects that the user has created and list them in the HTML template. </p>
<p>In this wireframe, a survey can be either in an "active" or "editing" state, where if the survey is "active" then the user can view the results and if it is "editing" then they can add more questions.</p>
<p>This is the first page we've seen that is specific to one user. You need to implement <a href="https://en.wikipedia.org/wiki/Authorization">authorization</a> so that one user cannot spy on another user's surveys.</p>
<h2 id="create">Create survey</h2>
<p>On this page a user types in the name of a new survey, and presses "create survey" to create a new survey with that name.</p>
<p><img alt="create survey" src="https://mattsegal.dev/page-create.png"></p>
<p>This can be implemented with an HTML form which sends a POST request to a Django view. You will need a Django Form to validate the data.</p>
<p>I have broken the "survey creation" pages (this page and the ones after it) up into many stages to try and make the Django views simple. This is not the only way to design pages for the "survey creation" feature, and you can do this differently, with fewer pages, if you like. </p>
<p>You will need to think about authorization for this view, and all the other views where the user can change data. We don't want users to be able to change the data of other users. You will need to write some code in your views to check that the user who is changing some data is also the user who owns it.</p>
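<p>The ownership check described above can be sketched in plain Python. This is a minimal illustration rather than real Django view code: <code>Survey</code>, <code>NotAllowed</code>, <code>check_owner</code> and <code>rename_survey</code> are all hypothetical names, and in a real view you would compare against the logged-in user on the request instead of a username string.</p>

```python
from dataclasses import dataclass


@dataclass
class Survey:
    title: str
    owner: str  # username of the user who created the survey (assumed field)


class NotAllowed(Exception):
    """Raised when a user tries to change a survey they do not own."""


def check_owner(survey: Survey, username: str) -> None:
    """Refuse to continue unless `username` owns `survey`."""
    if survey.owner != username:
        raise NotAllowed(f"{username} does not own {survey.title!r}")


def rename_survey(survey: Survey, username: str, new_title: str) -> Survey:
    """Change the survey's title, but only on behalf of its owner."""
    check_owner(survey, username)  # always check before touching any data
    survey.title = new_title
    return survey
```

<p>The important design choice is that the check runs before any data changes, in every function that modifies a survey.</p>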
<h2 id="edit">Edit survey</h2>
<p>On this page a user can add questions to the survey they just created. </p>
<p><img alt="edit survey" src="https://mattsegal.dev/page-edit.png"></p>
<p>Clicking "add another question" takes the user to a separate "add question" page.
The user can add as many questions as they like until they are ready to make the survey "active".</p>
<p>When they click "start survey", the button should use an HTML form to send a POST request to a Django view which moves the survey from "edit mode" to "active mode".</p>
<h2 id="addquestion">Add a question to survey</h2>
<p>On this page the user can create a new question for the survey. They type in the prompt for the question, like "what is your favourite colour?" and then click "add question" to create the new question.</p>
<p><img alt="create questions" src="https://mattsegal.dev/page-question-create.png"></p>
<h2 id="addoption">Add options to a new question</h2>
<p>On this page the user can add multiple options to a question that they just created.</p>
<p><img alt="add options" src="https://mattsegal.dev/page-option-create.png"></p>
<h2 id="details">Survey details</h2>
<p>This is the final page that a user who is running a survey wants to look at. They will view this dashboard to check the answers of a survey that they've created and sent out.</p>
<p>This page tells the user how many people have answered their survey and what percentage of people chose each answer.</p>
<p><img alt="survey details" src="https://mattsegal.dev/page-details.png"></p>
<p>You will need to do a bit of maths in the view for this page. You can calculate the percentages with some fancy database queries that use <a href="https://docs.djangoproject.com/en/3.1/topics/db/aggregation/">aggregation</a>. Otherwise you can query the Survey model, its Questions and all of its Submissions and their Answers. Once you have pulled all the data you need into memory, then you can write a for loop or something to do the percentage calculations. I recommend using <code>filter</code> and <code>count</code> in your queries.</p>
<p>When thinking about database queries for this view, you should imagine that you have thousands of surveys and each survey has dozens of questions and hundreds of answers.</p>
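<p>As a rough sketch of the in-memory approach, here is the percentage calculation for a single question in plain Python. It assumes you have already pulled the chosen option for each answer out of the database into a list of strings; the function name <code>option_percentages</code> and the sample data are made up for illustration.</p>

```python
from collections import Counter
from typing import Dict, List


def option_percentages(options: List[str], answers: List[str]) -> Dict[str, float]:
    """Return the percentage of answers that chose each option."""
    counts = Counter(answers)
    total = len(answers)
    if total == 0:
        # Nobody has answered yet, so avoid dividing by zero.
        return {option: 0.0 for option in options}
    return {option: 100 * counts[option] / total for option in options}


# Four people answered; three chose "Red" and one chose "Blue".
answers = ["Red", "Blue", "Red", "Red"]
result = option_percentages(["Red", "Blue", "Green"], answers)
```

<p>This works fine for a handful of answers, but it loads every answer into memory, which is why the database-side approach is worth considering for large surveys.</p>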
<p>You will need to implement authorization in this page's view so that only the user who created the survey can view the results.</p>How to use both camelCase and snake_case in your frontend and backend2020-09-24T12:00:00+10:002020-09-24T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-09-24:/camel-and-snake-case.html<p>Python uses <code>snake_case</code> variable naming while JavaScript favours <code>camelCase</code>.
When you're building a web API with Django then you'll be using both languages together. How do you keep your styles consistent? You <em>could</em> just use one style for both your frontend and backend, but it looks ugly. Perhaps this …</p><p>Python uses <code>snake_case</code> variable naming while JavaScript favours <code>camelCase</code>.
When you're building a web API with Django then you'll be using both languages together. How do you keep your styles consistent? You <em>could</em> just use one style for both your frontend and backend, but it looks ugly. Perhaps this is not the biggest problem in your life right now, but it's a nice one to solve and it's easy to fix.</p>
<p>In this post I'll show you how you can use snake case on the backend and camel case on the frontend, with the help of the <code>camelize</code> and <code>snakeize</code> JS libraries.</p>
<h3>The problem: out of place naming styles</h3>
<p>Let's say you've got some Django code that presents an API for a <code>Person</code> model:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># Inside your Django app.</span>
<span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">models</span>
<span class="kn">from</span> <span class="nn">rest_framework</span> <span class="kn">import</span> <span class="n">serializers</span><span class="p">,</span> <span class="n">viewsets</span>

<span class="c1"># The data model</span>
<span class="k">class</span> <span class="nc">Person</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">full_name</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">64</span><span class="p">)</span>
    <span class="n">biggest_problem</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">128</span><span class="p">)</span>
<span class="c1"># The serializer</span>
<span class="k">class</span> <span class="nc">PersonSerializer</span><span class="p">(</span><span class="n">serializers</span><span class="o">.</span><span class="n">ModelSerializer</span><span class="p">):</span>
    <span class="k">class</span> <span class="nc">Meta</span><span class="p">:</span>
        <span class="n">model</span> <span class="o">=</span> <span class="n">Person</span>
        <span class="n">fields</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"full_name"</span><span class="p">,</span> <span class="s2">"biggest_problem"</span><span class="p">]</span>
<span class="c1"># The API view</span>
<span class="k">class</span> <span class="nc">PersonViewSet</span><span class="p">(</span><span class="n">viewsets</span><span class="o">.</span><span class="n">ModelViewSet</span><span class="p">):</span>
    <span class="n">serializer_class</span> <span class="o">=</span> <span class="n">PersonSerializer</span>
    <span class="n">queryset</span> <span class="o">=</span> <span class="n">Person</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">all</span><span class="p">()</span>
</code></pre></div>
<p>And you've also got some JavaScript code that talks to this view:</p>
<div class="highlight"><pre><span></span><code><span class="c1">// Inside your frontend JavaScript codebase.</span><span class="w"></span>
<span class="kd">const</span><span class="w"> </span><span class="nx">createPerson</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">async</span><span class="w"> </span><span class="p">(</span><span class="nx">personData</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">requestData</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nx">method</span><span class="o">:</span><span class="w"> </span><span class="s1">'POST'</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nx">body</span><span class="o">:</span><span class="w"> </span><span class="nb">JSON</span><span class="p">.</span><span class="nx">stringify</span><span class="p">(</span><span class="nx">personData</span><span class="p">),</span><span class="w"></span>
<span class="w"> </span><span class="c1">// etc.</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">response</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">await</span><span class="w"> </span><span class="nx">fetch</span><span class="p">(</span><span class="s1">'/api/person/'</span><span class="p">,</span><span class="w"> </span><span class="nx">requestData</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">await</span><span class="w"> </span><span class="nx">response</span><span class="p">.</span><span class="nx">json</span><span class="p">()</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>The problem occurs when you try to use the data fetched from the backend and it is using the wrong variable naming style:</p>
<div class="highlight"><pre><span></span><code><span class="c1">// Inside your frontend JavaScript codebase.</span><span class="w"></span>
<span class="kd">const</span><span class="w"> </span><span class="nx">personData</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nx">full_name</span><span class="o">:</span><span class="w"> </span><span class="s1">'Matt Segal'</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nx">biggest_problem</span><span class="o">:</span><span class="w"> </span><span class="s1">'My pants are too red'</span><span class="p">,</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="kd">const</span><span class="w"> </span><span class="nx">person</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">createPerson</span><span class="p">(</span><span class="nx">personData</span><span class="p">).</span><span class="nx">then</span><span class="p">(</span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">)</span><span class="w"></span>
<span class="c1">// {</span><span class="w"></span>
<span class="c1">// full_name: 'Matt Segal',</span><span class="w"></span>
<span class="c1">// biggest_problem: 'My pants are too red',</span><span class="w"></span>
<span class="c1">// }</span><span class="w"></span>
</code></pre></div>
<p>This usage of snake case in JavaScript is a little yucky, but it's a quick fix.</p>
<h3>The solution: install more JavaScript libraries</h3>
<p>Hint: the solution is always to add more dependencies.</p>
<p>To fix this we'll install <a href="https://www.npmjs.com/package/snakeize">snakeize</a> and <a href="https://www.npmjs.com/package/camelize">camelize</a> using npm or yarn:</p>
<div class="highlight"><pre><span></span><code>yarn add snakeize camelize
</code></pre></div>
<p>Then you just need to include it in your frontend's API functions:</p>
<div class="highlight"><pre><span></span><code><span class="c1">// Inside your frontend JavaScript codebase.</span><span class="w"></span>
<span class="k">import</span><span class="w"> </span><span class="nx">camelize</span><span class="w"> </span><span class="kr">from</span><span class="w"> </span><span class="s1">'camelize'</span><span class="w"></span>
<span class="k">import</span><span class="w"> </span><span class="nx">snakeize</span><span class="w"> </span><span class="kr">from</span><span class="w"> </span><span class="s1">'snakeize'</span><span class="w"></span>
<span class="kd">const</span><span class="w"> </span><span class="nx">createPerson</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">async</span><span class="w"> </span><span class="p">(</span><span class="nx">personData</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">requestData</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nx">method</span><span class="o">:</span><span class="w"> </span><span class="s1">'POST'</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nx">body</span><span class="o">:</span><span class="w"> </span><span class="nb">JSON</span><span class="p">.</span><span class="nx">stringify</span><span class="p">(</span><span class="nx">snakeize</span><span class="p">(</span><span class="nx">personData</span><span class="p">)),</span><span class="w"></span>
<span class="w"> </span><span class="c1">// etc.</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">response</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">await</span><span class="w"> </span><span class="nx">fetch</span><span class="p">(</span><span class="s1">'/api/person/'</span><span class="p">,</span><span class="w"> </span><span class="nx">requestData</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">responseData</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">await</span><span class="w"> </span><span class="nx">response</span><span class="p">.</span><span class="nx">json</span><span class="p">()</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">camelize</span><span class="p">(</span><span class="nx">responseData</span><span class="p">)</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>Now we can use <code>camelCase</code> in the frontend and it will automatically be transformed to <code>snake_case</code> before it gets sent to the backend:</p>
<div class="highlight"><pre><span></span><code><span class="c1">// Inside your frontend JavaScript codebase.</span><span class="w"></span>
<span class="kd">const</span><span class="w"> </span><span class="nx">personData</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nx">fullName</span><span class="o">:</span><span class="w"> </span><span class="s1">'Matt Segal'</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nx">biggestProblem</span><span class="o">:</span><span class="w"> </span><span class="s1">'I ate too much fish'</span><span class="p">,</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="kd">const</span><span class="w"> </span><span class="nx">person</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">createPerson</span><span class="p">(</span><span class="nx">personData</span><span class="p">).</span><span class="nx">then</span><span class="p">(</span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">)</span><span class="w"></span>
<span class="c1">// {</span><span class="w"></span>
<span class="c1">// fullName: 'Matt Segal',</span><span class="w"></span>
<span class="c1">// biggestProblem: 'I ate too much fish',</span><span class="w"></span>
<span class="c1">// }</span><span class="w"></span>
</code></pre></div>
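<p>If you're curious what this kind of key conversion involves, here is a rough sketch of the idea written in Python. It is not how <code>camelize</code> or <code>snakeize</code> are actually implemented, and the real libraries may treat edge cases differently; the function names <code>to_snake</code>, <code>to_camel</code> and <code>convert_keys</code> are made up for this example.</p>

```python
import re
from typing import Any, Callable


def to_snake(name: str) -> str:
    """Convert a camelCase name like 'fullName' to 'full_name'."""
    return re.sub(r"([A-Z])", r"_\1", name).lower().lstrip("_")


def to_camel(name: str) -> str:
    """Convert a snake_case name like 'full_name' to 'fullName'."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def convert_keys(data: Any, convert: Callable[[str], str]) -> Any:
    """Recursively rename dict keys, leaving the values untouched."""
    if isinstance(data, dict):
        return {convert(key): convert_keys(value, convert) for key, value in data.items()}
    if isinstance(data, list):
        return [convert_keys(item, convert) for item in data]
    return data


person = {"fullName": "Matt Segal", "biggestProblem": "I ate too much fish"}
snake_person = convert_keys(person, to_snake)
camel_person = convert_keys(snake_person, to_camel)
```

<p>Only the keys change. The values, like <code>'Matt Segal'</code>, pass through as they are.</p>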
<p>That's it! Hope this helps your eyes a little.</p>A breakdown of how NGINX is configured with Django2020-07-31T12:00:00+10:002020-07-31T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-07-31:/nginx-django-reverse-proxy-config.html<p>You are trying to deploy your Django web app to the internet.
You have never done this before, so you follow a guide like <a href="https://www.digitalocean.com/community/tutorials/how-to-set-up-django-with-postgres-nginx-and-gunicorn-on-ubuntu-16-04">this one</a>.
The guide gives you many instructions, which include installing and configuring an "NGINX reverse proxy".
At some point you mutter to yourself:</p>
<blockquote>
<p>What-the-hell is …</p></blockquote><p>You are trying to deploy your Django web app to the internet.
You have never done this before, so you follow a guide like <a href="https://www.digitalocean.com/community/tutorials/how-to-set-up-django-with-postgres-nginx-and-gunicorn-on-ubuntu-16-04">this one</a>.
The guide gives you many instructions, which include installing and configuring an "NGINX reverse proxy".
At some point you mutter to yourself:</p>
<blockquote>
<p>What-the-hell is an NGINX? Eh, whatever, let's keep reading.</p>
</blockquote>
<p>You will have to copy-paste some weird gobbledygook into a file, which looks like this:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># NGINX site config file at /etc/nginx/sites-available/myproject</span>
<span class="k">server</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kn">listen</span><span class="w"> </span><span class="mi">80</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kn">server_name</span><span class="w"> </span><span class="s">foo.com</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kn">location</span><span class="w"> </span><span class="s">/</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kn">proxy_pass</span><span class="w"> </span><span class="s">http://127.0.0.1:8000</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kn">proxy_set_header</span><span class="w"> </span><span class="s">Host</span><span class="w"> </span><span class="nv">$host</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kn">proxy_set_header</span><span class="w"> </span><span class="s">X-Forwarded-For</span><span class="w"> </span><span class="nv">$proxy_add_x_forwarded_for</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kn">proxy_set_header</span><span class="w"> </span><span class="s">X-Forwarded-Proto</span><span class="w"> </span><span class="nv">$scheme</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kn">proxy_redirect</span><span class="w"> </span><span class="s">http://127.0.0.1:8000</span><span class="w"> </span><span class="s">http://foo.com</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="kn">location</span><span class="w"> </span><span class="s">/static/</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kn">root</span><span class="w"> </span><span class="s">/home/myuser/myproject</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>What is all this stuff? What is it supposed to do?</p>
<p>Most people do their first Django deployment as a learning exercise.
You want to understand what you are doing, so that you can fix problems if you get stuck
and so you don't need to rely on guides in the future.
In this post I'll break down the elements of this NGINX config and how it ties in with Django,
so that you can confidently debug, update and extend it in the future.</p>
<h2>What is this file supposed to achieve?</h2>
<p>This scary-looking config file sets up NGINX so that it acts as the entrypoint to your Django application.
Explaining <em>why</em> you might choose to use NGINX is a topic too expansive for this post, so I'm just going to stick to explaining
how it works.</p>
<p>NGINX is a completely separate program to your Django app.
It is running inside its own process, while Django is running inside a <a href="https://mattsegal.dev/simple-django-deployment-2.html#wsgi">WSGI server</a> process, such as Gunicorn.
In this post I will sometimes refer to Gunicorn and Django interchangeably.</p>
<p><img alt="nginx as a separate process" src="https://mattsegal.dev/nginx-separate-process.png"></p>
<p>All HTTP requests that hit your Django app have to go through NGINX first.</p>
<p><img alt="nginx proxy" src="https://mattsegal.dev/nginx-proxy.png"></p>
<p>NGINX listens for incoming HTTP requests on port 80 and HTTPS requests on port 443.
When a new request comes in:</p>
<ul>
<li>NGINX looks at the request, checks some rules, and sends it on to your WSGI server, which is usually listening on localhost, port 8000</li>
<li>Your Django app will process the request and eventually produce a response</li>
<li>Your WSGI server will send the response back to NGINX; and then</li>
<li>NGINX will send the response back out to the original requesting client</li>
</ul>
<p>You can also configure NGINX to serve static files, like images, directly from the filesystem, so that requests for these assets don't need to go through Django.</p>
<p><img alt="nginx proxy with static files" src="https://mattsegal.dev/nginx-static-proxy.png"></p>
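<p>To make the routing decision concrete, here is a toy Python function that mimics what the example config at the top of this post asks NGINX to do. It is only a sketch of the decision, not of how NGINX works internally: requests under <code>/static/</code> are mapped to a file below the configured <code>root</code>, and everything else is forwarded to the WSGI server. The function name <code>route</code> and the <code>file:</code>/<code>proxy:</code> prefixes are made up for illustration.</p>

```python
STATIC_ROOT = "/home/myuser/myproject"  # `root` from the example config
UPSTREAM = "http://127.0.0.1:8000"      # `proxy_pass` from the example config


def route(path: str) -> str:
    """Describe where a request for `path` would be sent."""
    if path.startswith("/static/"):
        # With `root`, the full request path is appended to the root directory.
        return "file:" + STATIC_ROOT + path
    # Everything else is handed to the WSGI server (Gunicorn and Django).
    return "proxy:" + UPSTREAM + path


static_target = route("/static/css/site.css")  # hypothetical asset path
app_target = route("/admin/")                  # hypothetical Django URL
```

<p>So a request for <code>/static/css/site.css</code> never reaches Django at all, while <code>/admin/</code> is passed through to port 8000.</p>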
<p>You can adjust the rules in NGINX so that it selectively routes requests to multiple app servers. You could, for example, run a WordPress site and a Django app from the same server:</p>
<p><img alt="nginx multi proxy" src="https://mattsegal.dev/nginx-multi-proxy.png"></p>
<p>Now that you have a general idea of what NGINX is supposed to do, let's go over the config file that makes this happen.</p>
<h2>Server block</h2>
<p>The top level block in the NGINX config file is the <a href="https://docs.nginx.com/nginx/admin-guide/web-server/web-server/#setting-up-virtual-servers">virtual server</a>.
The main utility of virtual servers is that they allow you to sort incoming requests based on the port and hostname.
Let's start by looking at a basic server block:</p>
<div class="highlight"><pre><span></span><code><span class="k">server</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="c1"># Listen on port 80 for incoming requests.</span>
<span class="w"> </span><span class="kn">listen</span><span class="w"> </span><span class="mi">80</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="c1"># Return status code 200 with text "Hello World".</span>
<span class="w"> </span><span class="kn">return</span><span class="w"> </span><span class="mi">200</span><span class="w"> </span><span class="s">'Hello</span><span class="w"> </span><span class="s">World'</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>Let me show you some example requests. Say we're on the same server as NGINX and we send a GET request using the command line tool <code>curl</code>.</p>
<div class="highlight"><pre><span></span><code>curl localhost
<span class="c1"># Hello World</span>
</code></pre></div>
<p>This <code>curl</code> command sends the following <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Messages">HTTP request</a> to localhost, port 80:</p>
<div class="highlight"><pre><span></span><code><span class="nf">GET</span> <span class="nn">/</span> <span class="kr">HTTP</span><span class="o">/</span><span class="m">1.1</span>
<span class="na">Host</span><span class="o">:</span> <span class="l">localhost</span>
<span class="na">User-Agent</span><span class="o">:</span> <span class="l">curl/7.58.0</span>
</code></pre></div>
<p>We will get the following HTTP response back from NGINX, with a 200 OK status code and "Hello World" in the body:</p>
<div class="highlight"><pre><span></span><code><span class="kr">HTTP</span><span class="o">/</span><span class="m">1.1</span> <span class="m">200</span> <span class="ne">OK</span>
<span class="na">Content-Type</span><span class="o">:</span> <span class="l">application/octet-stream</span>
<span class="na">Content-Length</span><span class="o">:</span> <span class="l">11</span>
Hello World
</code></pre></div>
<p>We can also request some random path and we get the same result:</p>
<div class="highlight"><pre><span></span><code>curl localhost/some/path/on/website
<span class="c1"># Hello World</span>
</code></pre></div>
<p>With <code>curl</code> sending this HTTP request: </p>
<div class="highlight"><pre><span></span><code><span class="nf">GET</span> <span class="nn">/some/path/on/website</span> <span class="kr">HTTP</span><span class="o">/</span><span class="m">1.1</span>
<span class="na">Host</span><span class="o">:</span> <span class="l">localhost</span>
<span class="na">User-Agent</span><span class="o">:</span> <span class="l">curl/7.58.0</span>
</code></pre></div>
<p>and we get back the same response as before:</p>
<div class="highlight"><pre><span></span><code><span class="kr">HTTP</span><span class="o">/</span><span class="m">1.1</span> <span class="m">200</span> <span class="ne">OK</span>
<span class="na">Content-Type</span><span class="o">:</span> <span class="l">application/octet-stream</span>
<span class="na">Content-Length</span><span class="o">:</span> <span class="l">11</span>
Hello World
</code></pre></div>
<p>Simple so far, but not very interesting. Let's start to mix it up with multiple server blocks.</p>
<h2>Multiple virtual servers</h2>
<p>You can add more than one virtual server in NGINX:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># All requests to foo.com return a 200 OK status code</span>
<span class="k">server</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kn">listen</span><span class="w"> </span><span class="mi">80</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kn">server_name</span><span class="w"> </span><span class="s">foo.com</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kn">return</span><span class="w"> </span><span class="mi">200</span><span class="w"> </span><span class="s">'Welcome</span><span class="w"> </span><span class="s">to</span><span class="w"> </span><span class="s">foo.com!'</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="c1"># Any other requests get a 404 Not Found page</span>
<span class="k">server</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kn">listen</span><span class="w"> </span><span class="mi">80</span><span class="w"> </span><span class="s">default_server</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kn">return</span><span class="w"> </span><span class="mi">404</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>NGINX uses the <code>server_name</code> directive to check the <code>Host</code> header of incoming requests and match the request to a virtual server. Your web browser will usually set this header automatically for you.
You can set up a particular virtual server to be the default choice (<code>default_server</code>) if no other ones match the incoming request. You can use this feature to host multiple
Django apps on a single server. All you need to do is <a href="https://mattsegal.dev/dns-for-noobs.html">set up your DNS</a> to get multiple domain names to point to a single server, and then add a virtual server for each Django app.</p>
<p>Let's test out the config above. If we send a request to <code>localhost</code>, we'll get a 404 status code from the default server:</p>
<div class="highlight"><pre><span></span><code>curl localhost
<span class="c1"># <html></span>
<span class="c1"># <head><title>404 Not Found</title></head></span>
<span class="c1"># ...</span>
<span class="c1"># </html></span>
</code></pre></div>
<p>This is the request that gets sent:</p>
<div class="highlight"><pre><span></span><code><span class="nf">GET</span> <span class="nn">/</span> <span class="kr">HTTP</span><span class="o">/</span><span class="m">1.1</span>
<span class="na">Host</span><span class="o">:</span> <span class="l">localhost</span>
<span class="na">User-Agent</span><span class="o">:</span> <span class="l">curl/7.58.0</span>
</code></pre></div>
<p>Our request was matched to the default server because the <code>Host</code> header we sent didn't match <code>foo.com</code>. Let's try setting the <code>Host</code> header to <code>foo.com</code>:</p>
<div class="highlight"><pre><span></span><code>curl localhost --header <span class="s2">"Host: foo.com"</span>
<span class="c1"># Welcome to foo.com!</span>
</code></pre></div>
<p>This is the request that gets sent:</p>
<div class="highlight"><pre><span></span><code><span class="nf">GET</span> <span class="nn">/</span> <span class="kr">HTTP</span><span class="o">/</span><span class="m">1.1</span>
<span class="na">Host</span><span class="o">:</span> <span class="l">foo.com</span>
<span class="na">User-Agent</span><span class="o">:</span> <span class="l">curl/7.58.0</span>
</code></pre></div>
<p>Now we are directed to the <code>foo.com</code> virtual server because we sent the correct <code>Host</code> header in our request.
Finally, we can see that setting a random <code>Host</code> header sends us to the default server:</p>
<div class="highlight"><pre><span></span><code>curl localhost --header <span class="s2">"Host: fasfsadfs.com"</span>
<span class="c1"># <html></span>
<span class="c1"># <head><title>404 Not Found</title></head></span>
<span class="c1"># ...</span>
<span class="c1"># </html></span>
</code></pre></div>
<p>There's <a href="https://docs.nginx.com/nginx/admin-guide/web-server/web-server/#setting-up-virtual-servers">more</a> that you can do with virtual servers in NGINX,
but what we've covered so far should be enough for you to understand their typical usage with Django. </p>
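<p>If it helps, here is a rough Python sketch of the matching rule described above. It only models exact <code>server_name</code> matches plus a default server, and the function name and labels are made up for illustration. NGINX's real matching also handles wildcards and regular expressions.</p>

```python
# Simplified model of how NGINX picks a virtual server (exact names only).
# The function name and server labels are illustrative, not NGINX internals.

def pick_server(host_header, servers, default):
    """Return the label of the server whose server_name matches the Host header."""
    # Drop any port suffix, eg. "foo.com:80" becomes "foo.com"
    hostname = host_header.split(":")[0].lower()
    return servers.get(hostname, default)


servers = {"foo.com": "foo.com server (200)"}
default = "default_server (404)"

for host in ["localhost", "foo.com", "fasfsadfs.com"]:
    print(host, "goes to", pick_server(host, servers, default))
```

<p>Only the <code>foo.com</code> request lands on the first server. Everything else falls through to the default, which lines up with the <code>curl</code> results above.</p>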
<h2>Location blocks</h2>
<p>Within a virtual server you can route the request based on the path.</p>
<div class="highlight"><pre><span></span><code><span class="k">server</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kn">listen</span><span class="w"> </span><span class="mi">80</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="c1"># Requests to the root path get a 200 OK response</span>
<span class="w"> </span><span class="kn">location</span><span class="w"> </span><span class="s">/</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kn">return</span><span class="w"> </span><span class="mi">200</span><span class="w"> </span><span class="s">'Cool!'</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="c1"># Requests to /forbidden get 403 Forbidden response</span>
<span class="w"> </span><span class="kn">location</span><span class="w"> </span><span class="s">/forbidden</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kn">return</span><span class="w"> </span><span class="mi">403</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>Under this configuration, any requested path that starts with <code>/forbidden</code> will return a 403 Forbidden status code, and everything else will return <em>Cool!</em> Let's try it out:</p>
<div class="highlight"><pre><span></span><code>curl localhost
<span class="c1"># Cool!</span>
curl localhost/blah/blah/blah
<span class="c1"># Cool!</span>
curl localhost/forbidden
<span class="c1"># <html></span>
<span class="c1"># <head><title>403 Forbidden</title></head></span>
<span class="c1"># ...</span>
<span class="c1"># </html></span>
curl localhost/forbidden/blah/blah/blah
<span class="c1"># <html></span>
<span class="c1"># <head><title>403 Forbidden</title></head></span>
<span class="c1"># ...</span>
<span class="c1"># </html></span>
</code></pre></div>
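<p>The behaviour above comes from prefix matching: for plain locations like these, NGINX picks the longest prefix that the request path starts with. Here is a minimal Python sketch of that rule, assuming only prefix locations (no regex or exact-match locations). The function name is hypothetical.</p>

```python
# Simplified model of NGINX prefix location matching: longest prefix wins.
# Regex and exact-match ("=") locations are ignored in this sketch.

def match_location(path, locations):
    """Return the longest location prefix that the request path starts with."""
    candidates = [prefix for prefix in locations if path.startswith(prefix)]
    return max(candidates, key=len) if candidates else None


locations = {"/": "200 Cool!", "/forbidden": "403 Forbidden"}

for path in ["/", "/blah/blah/blah", "/forbidden", "/forbidden/blah/blah/blah"]:
    prefix = match_location(path, locations)
    print(path, "matched", prefix, "and returns", locations[prefix])
```

<p>Every path starts with <code>/</code>, so that location acts as a catch-all, and <code>/forbidden</code> only wins when it is the longer match.</p>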
<p>Now that we've covered <code>server</code> and <code>location</code> blocks it should be easier to make sense of some of the config that I showed you at the start of this post:</p>
<div class="highlight"><pre><span></span><code><span class="k">server</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kn">listen</span><span class="w"> </span><span class="mi">80</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kn">server_name</span><span class="w"> </span><span class="s">foo.com</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kn">location</span><span class="w"> </span><span class="s">/</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="c1"># Do something...</span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="kn">location</span><span class="w"> </span><span class="s">/static/</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="c1"># Do something...</span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>Next we'll dig into the connection between NGINX and our WSGI server.</p>
<h2>Reverse proxy location</h2>
<p>As mentioned earlier, NGINX acts as a <a href="https://en.wikipedia.org/wiki/Reverse_proxy#:~:text=In%20computer%20networks%2C%20a%20reverse,from%20the%20proxy%20server%20itself.">reverse proxy</a> for Django:</p>
<p><img alt="nginx proxy" src="https://mattsegal.dev/nginx-proxy.png"></p>
<p>This reverse proxy setup is configured within this location block:</p>
<div class="highlight"><pre><span></span><code><span class="k">location</span><span class="w"> </span><span class="s">/</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kn">proxy_pass</span><span class="w"> </span><span class="s">http://127.0.0.1:8000</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kn">proxy_set_header</span><span class="w"> </span><span class="s">Host</span><span class="w"> </span><span class="nv">$host</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kn">proxy_set_header</span><span class="w"> </span><span class="s">X-Forwarded-For</span><span class="w"> </span><span class="nv">$proxy_add_x_forwarded_for</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kn">proxy_set_header</span><span class="w"> </span><span class="s">X-Forwarded-Proto</span><span class="w"> </span><span class="nv">$scheme</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kn">proxy_redirect</span><span class="w"> </span><span class="s">http://127.0.0.1:8000</span><span class="w"> </span><span class="s">http://foo.com</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>In the next few sections I will break down the directives in this block so that you understand what is going on.
You might also find the NGINX documentation on <a href="https://docs.nginx.com/nginx/admin-guide/web-server/reverse-proxy/">reverse proxies</a> helpful for understanding this config.</p>
<h2>Proxy pass</h2>
<p>The <code>proxy_pass</code> directive tells NGINX to send all requests for that location to the specified address.
For example, if your WSGI server was running on localhost (which has IP 127.0.0.1), port 8000, then you would use this config:</p>
<div class="highlight"><pre><span></span><code><span class="k">server</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kn">listen</span><span class="w"> </span><span class="mi">80</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kn">location</span><span class="w"> </span><span class="s">/</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kn">proxy_pass</span><span class="w"> </span><span class="s">http://127.0.0.1:8000</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>You can also point <code>proxy_pass</code> at a <a href="https://en.wikipedia.org/wiki/Unix_domain_socket#:~:text=A%20Unix%20domain%20socket%20or,the%20same%20host%20operating%20system.">Unix domain socket</a>, with Gunicorn listening on that socket, which is very similar to using localhost except it doesn't use up a port number and it's a bit faster:</p>
<div class="highlight"><pre><span></span><code><span class="k">server</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kn">listen</span><span class="w"> </span><span class="mi">80</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kn">location</span><span class="w"> </span><span class="s">/</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kn">proxy_pass</span><span class="w"> </span><span class="s">http://unix:/home/user/my-socket-file.sock</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
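<p>If you want to poke at this yourself, a tiny WSGI app can stand in for Django behind either of the configs above. This is just a sketch: the file name <code>echo.py</code> and the function name are my own choices, and it simply reports a few values that the WSGI server hands to the app.</p>

```python
# echo.py - a tiny WSGI app that reports what the app server sees.
# The module and function names here are made up for this example.

def application(environ, start_response):
    """Echo back the host, client address and scheme from the WSGI environ."""
    lines = [
        "Host: " + environ.get("HTTP_HOST", "(none)"),
        "Client address: " + environ.get("REMOTE_ADDR", "(none)"),
        "Scheme: " + environ.get("wsgi.url_scheme", "(none)"),
    ]
    body = "\n".join(lines).encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

<p>You could run it with something like <code>gunicorn --bind 127.0.0.1:8000 echo:application</code> and then <code>curl</code> NGINX to see what comes out the other side.</p>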
<p>Seems simple enough - you just point NGINX at your WSGI server, so... what was all that other crap? Why do you set <code>proxy_set_header</code> and <code>proxy_redirect</code>? That's what we'll discuss next.</p>
<h2>NGINX is lying to you</h2>
<p>As a reverse proxy, NGINX will receive HTTP requests from clients and then send those requests to our Gunicorn WSGI server.
The problem is that NGINX hides information from our WSGI server. The HTTP request that Gunicorn receives is not the same as the one that NGINX received from the client.</p>
<p><img alt="nginx hiding info" src="https://mattsegal.dev/nginx-hide-info.png"></p>
<p>Let me give you an example, which is illustrated above. You, the client, have an IP of <code>12.34.56.78</code> and you go to <code>https://foo.com</code> in your web browser and try to load the page. The request hits the server on port 443 and is read by NGINX. At this stage, NGINX knows that:</p>
<ul>
<li>the protocol is <a href="https://www.cloudflare.com/learning/ssl/what-is-https/">HTTPS</a></li>
<li>the client has an IP address of <code>12.34.56.78</code></li>
<li>the request is for the host <code>foo.com</code></li>
</ul>
<p>NGINX then sends the request onwards to Gunicorn. When Gunicorn receives this request, it thinks:</p>
<ul>
<li>the protocol is HTTP, not HTTPS, because the connection between NGINX and Gunicorn is not encrypted</li>
<li>the client has the IP address <code>127.0.0.1</code>, because that's the address NGINX is using</li>
<li>the host is <code>127.0.0.1:8000</code> because NGINX said so</li>
</ul>
<p>Some of this lost information is useful, and we want to force NGINX to send it to our WSGI server. That's what these lines are for:</p>
<div class="highlight"><pre><span></span><code><span class="k">proxy_set_header</span><span class="w"> </span><span class="s">Host</span><span class="w"> </span><span class="nv">$host</span><span class="p">;</span><span class="w"></span>
<span class="k">proxy_set_header</span><span class="w"> </span><span class="s">X-Forwarded-For</span><span class="w"> </span><span class="nv">$proxy_add_x_forwarded_for</span><span class="p">;</span><span class="w"></span>
<span class="k">proxy_set_header</span><span class="w"> </span><span class="s">X-Forwarded-Proto</span><span class="w"> </span><span class="nv">$scheme</span><span class="p">;</span><span class="w"></span>
</code></pre></div>
<p>Next, I will explain each line in more detail.</p>
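<p>As a rough mental model, here is what those three directives do to the request headers, written as a Python function. This is a sketch of the behaviour, not NGINX's actual code, and the function and parameter names are hypothetical.</p>

```python
# Rough model of the three proxy_set_header lines. Not NGINX's actual code.

def proxied_headers(client_headers, client_ip, scheme, upstream, with_directives=True):
    """Return the request headers that NGINX would send on to the WSGI server."""
    headers = dict(client_headers)
    # Default behaviour: the Host header becomes the proxy_pass address
    headers["Host"] = upstream
    if with_directives:
        # proxy_set_header Host $host;
        headers["Host"] = client_headers["Host"]
        # proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        # Appends the client address to any X-Forwarded-For already present.
        existing = client_headers.get("X-Forwarded-For")
        headers["X-Forwarded-For"] = f"{existing}, {client_ip}" if existing else client_ip
        # proxy_set_header X-Forwarded-Proto $scheme;
        headers["X-Forwarded-Proto"] = scheme
    return headers


incoming = {"Host": "foo.com", "User-Agent": "curl/7.58.0"}
print(proxied_headers(incoming, "12.34.56.78", "https", "127.0.0.1:8000", with_directives=False))
print(proxied_headers(incoming, "12.34.56.78", "https", "127.0.0.1:8000"))
```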
<h2>Setting the Host header</h2>
<p>Django would like to know the value of the <code>Host</code> header so that various bits of the framework, like <a href="https://docs.djangoproject.com/en/3.0/ref/settings/#allowed-hosts">ALLOWED_HOSTS</a> or <a href="https://docs.djangoproject.com/en/3.0/ref/request-response/#django.http.HttpRequest.get_host">HttpRequest.get_host</a> can work. The problem is that NGINX does not pass the client's <code>Host</code> header to proxied servers by default. Instead, it sets the header to the address given in <code>proxy_pass</code>.</p>
<p>For example, when I'm using <code>proxy_pass</code> like I did in the previous section, and I send a request with the <code>Host</code> header to NGINX like this:</p>
<div class="highlight"><pre><span></span><code>curl localhost --header <span class="s2">"Host: foo.com"</span>
</code></pre></div>
<p>Then NGINX receives the HTTP request, which looks like this:</p>
<div class="highlight"><pre><span></span><code><span class="nf">GET</span> <span class="nn">/</span> <span class="kr">HTTP</span><span class="o">/</span><span class="m">1.1</span>
<span class="na">Host</span><span class="o">:</span> <span class="l">foo.com</span>
<span class="na">User-Agent</span><span class="o">:</span> <span class="l">curl/7.58.0</span>
</code></pre></div>
<p>and then NGINX sends a HTTP request to your WSGI server, like this:</p>
<div class="highlight"><pre><span></span><code><span class="nf">GET</span> <span class="nn">/</span> <span class="kr">HTTP</span><span class="o">/</span><span class="m">1.0</span>
<span class="na">Host</span><span class="o">:</span> <span class="l">127.0.0.1:8000</span>
<span class="na">User-Agent</span><span class="o">:</span> <span class="l">curl/7.58.0</span>
</code></pre></div>
<p>Notice something? That rat-fuck-excuse-for-a-webserver sent different headers to our WSGI server!
I'm sure there is a good reason for this behaviour, but it's not what we want because it breaks some Django functionality.
We can fix this by using the <code>proxy_set_header</code> directive as follows:</p>
<div class="highlight"><pre><span></span><code><span class="k">server</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kn">listen</span><span class="w"> </span><span class="mi">80</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kn">location</span><span class="w"> </span><span class="s">/</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kn">proxy_pass</span><span class="w"> </span><span class="s">http://127.0.0.1:8000</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="c1"># Ensure original Host header is forwarded to our Django app.</span>
<span class="w"> </span><span class="kn">proxy_set_header</span><span class="w"> </span><span class="s">Host</span><span class="w"> </span><span class="nv">$host</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>Now NGINX will send the desired headers to Django:</p>
<div class="highlight"><pre><span></span><code><span class="nf">GET</span> <span class="nn">/</span> <span class="kr">HTTP</span><span class="o">/</span><span class="m">1.0</span>
<span class="na">Host</span><span class="o">:</span> <span class="l">foo.com</span>
<span class="na">User-Agent</span><span class="o">:</span> <span class="l">curl/7.58.0</span>
</code></pre></div>
<p>Gunicorn will read this <code>Host</code> header and provide it to you in your Django views via the <code>request.META</code> object:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># views.py</span>
<span class="kn">from</span> <span class="nn">django.http</span> <span class="kn">import</span> <span class="n">HttpResponse</span>


<span class="k">def</span> <span class="nf">my_view</span><span class="p">(</span><span class="n">request</span><span class="p">):</span>
    <span class="n">host</span> <span class="o">=</span> <span class="n">request</span><span class="o">.</span><span class="n">META</span><span class="p">[</span><span class="s1">'HTTP_HOST'</span><span class="p">]</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">host</span><span class="p">)</span> <span class="c1"># Eg. "foo.com"</span>
    <span class="k">return</span> <span class="n">HttpResponse</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Got host </span><span class="si">{</span><span class="n">host</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
</code></pre></div>
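<p>To see why the wrong <code>Host</code> header hurts, here is a simplified Python take on the kind of check Django does against <code>ALLOWED_HOSTS</code>. This is my own sketch, not Django's source, and it skips cases that the real check handles (IPv6 addresses, for example). The function name is hypothetical.</p>

```python
# Simplified take on Django's ALLOWED_HOSTS check, for illustration only.
# Django's real validation handles more cases (IPv6 addresses and so on).

def host_is_allowed(host_header, allowed_hosts):
    """Return True if the Host header's domain matches an allowed pattern."""
    domain = host_header.split(":")[0].lower()
    for pattern in allowed_hosts:
        if pattern == "*" or pattern == domain:
            return True
        # A leading dot matches the domain itself and any subdomain
        if pattern.startswith(".") and (
            domain == pattern[1:] or domain.endswith(pattern)
        ):
            return True
    return False


ALLOWED_HOSTS = ["foo.com"]
print(host_is_allowed("foo.com", ALLOWED_HOSTS))         # True
print(host_is_allowed("127.0.0.1:8000", ALLOWED_HOSTS))  # False
```

<p>With <code>ALLOWED_HOSTS = ["foo.com"]</code>, a request that arrives with <code>Host: 127.0.0.1:8000</code> would be rejected, which is why forwarding the original header matters.</p>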
<h2>Setting the X-Forwarded-Whatever headers</h2>
<p>The <code>Host</code> header isn't the only useful information that NGINX does not pass to Gunicorn. We would also like the protocol and source IP address of the client request
to be passed to our WSGI server. We achieve this with these two lines:</p>
<div class="highlight"><pre><span></span><code><span class="k">proxy_set_header</span><span class="w"> </span><span class="s">X-Forwarded-For</span><span class="w"> </span><span class="nv">$proxy_add_x_forwarded_for</span><span class="p">;</span><span class="w"></span>
<span class="k">proxy_set_header</span><span class="w"> </span><span class="s">X-Forwarded-Proto</span><span class="w"> </span><span class="nv">$scheme</span><span class="p">;</span><span class="w"></span>
</code></pre></div>
<p>I just want to point out that these header names are completely arbitrary. You can send any header you want with the format <code>X-Insert-Words-Here</code> to Gunicorn and it will parse it and send it onwards to Django. For example, you could set the header to be <code>X-Matt-Is-Cool</code> as follows:</p>
<div class="highlight"><pre><span></span><code><span class="k">proxy_set_header</span><span class="w"> </span><span class="s">X-Matt-Is-Cool</span><span class="w"> </span><span class="s">'it</span><span class="w"> </span><span class="s">is</span><span class="w"> </span><span class="s">true'</span><span class="p">;</span><span class="w"></span>
</code></pre></div>
<p>Now NGINX will include this header with every request it sends to Gunicorn. When Gunicorn parses the HTTP request it reads <strong>any</strong> header with the format <code>X-Insert-Words-Here</code> into a Python dictionary, which ends up in the <code>HttpRequest</code> object that Django passes to your view. So in this case, <code>X-Matt-Is-Cool</code> gets turned into the key <code>HTTP_X_MATT_IS_COOL</code> in your request object. For example:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># views.py</span>
<span class="kn">from</span> <span class="nn">django.http</span> <span class="kn">import</span> <span class="n">HttpResponse</span>


<span class="k">def</span> <span class="nf">my_view</span><span class="p">(</span><span class="n">request</span><span class="p">):</span>
    <span class="c1"># Prints value of X-Matt-Is-Cool header included by NGINX</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">request</span><span class="o">.</span><span class="n">META</span><span class="p">[</span><span class="s2">"HTTP_X_MATT_IS_COOL"</span><span class="p">])</span> <span class="c1"># it is true</span>
    <span class="k">return</span> <span class="n">HttpResponse</span><span class="p">(</span><span class="s2">"Hello World"</span><span class="p">)</span>
</code></pre></div>
<p>This means you can add in whatever custom headers you like to your NGINX config, but for now let's focus on getting the protocol and client IP address to your Django app.</p>
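<p>The renaming follows the usual WSGI convention: uppercase the header name, swap hyphens for underscores and stick <code>HTTP_</code> on the front. Here is a small sketch of that rule. The function name is made up for this example.</p>

```python
# Sketch of the WSGI naming convention for request headers.

def header_to_meta_key(header_name):
    """Convert an HTTP header name into the key used in request.META."""
    key = header_name.upper().replace("-", "_")
    # Content-Type and Content-Length are the two headers without the prefix
    if key in ("CONTENT_TYPE", "CONTENT_LENGTH"):
        return key
    return "HTTP_" + key


for name in ["Host", "X-Matt-Is-Cool", "X-Forwarded-For", "Content-Type"]:
    print(name, "becomes", header_to_meta_key(name))
```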
<h2>Setting the X-Forwarded-Proto header</h2>
<p>Django sometimes needs to know whether the incoming request is secure (HTTPS) or not (HTTP). For example, some features of the <a href="https://docs.djangoproject.com/en/3.0/ref/middleware/#http-strict-transport-security">SecurityMiddleware</a> class check for HTTPS. The problem is, of course, that NGINX is <em>always</em> telling Django that the client's request to the server is not secure, even when it is. This problem always crops up for me when I'm implementing pagination, and the "next" URL has <code>http://</code> instead of <code>https://</code> like it should.</p>
<p>Our fix for this is to put the client request protocol into a header called <code>X-Forwarded-Proto</code>:</p>
<div class="highlight"><pre><span></span><code><span class="k">proxy_set_header</span><span class="w"> </span><span class="s">X-Forwarded-Proto</span><span class="w"> </span><span class="nv">$scheme</span><span class="p">;</span><span class="w"></span>
</code></pre></div>
<p>Then you need to set up the <a href="https://docs.djangoproject.com/en/3.0/ref/settings/#secure-proxy-ssl-header">SECURE_PROXY_SSL_HEADER</a> setting to read this header in your <code>settings.py</code> file:</p>
<div class="highlight"><pre><span></span><code><span class="n">SECURE_PROXY_SSL_HEADER</span> <span class="o">=</span> <span class="p">(</span><span class="s1">'HTTP_X_FORWARDED_PROTO'</span><span class="p">,</span> <span class="s1">'https'</span><span class="p">)</span>
</code></pre></div>
<p>Now Django can tell the difference between incoming HTTP requests and HTTPS requests. </p>
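<p>Roughly speaking, the setting tells Django which key to look up in <code>request.META</code> and which value counts as secure. Here is a simplified model of that decision in plain Python. It is a sketch based on the documented behaviour, not Django's source, and the function name is hypothetical.</p>

```python
# Simplified model of how Django decides whether a request is secure.
# Not Django's actual source; META is a plain dict here.

def get_scheme(meta, secure_proxy_ssl_header=None):
    """Return "https" or "http" for a request, given its META dictionary."""
    if secure_proxy_ssl_header is not None:
        header, secure_value = secure_proxy_ssl_header
        value = meta.get(header)
        if value is not None:
            return "https" if value.strip() == secure_value else "http"
    # Without the setting, all Django has is the scheme of the NGINX connection
    return meta.get("wsgi.url_scheme", "http")


meta = {"wsgi.url_scheme": "http", "HTTP_X_FORWARDED_PROTO": "https"}
print(get_scheme(meta))                                       # http
print(get_scheme(meta, ("HTTP_X_FORWARDED_PROTO", "https")))  # https
```

<p>Only turn this setting on when NGINX always sets (or strips) the header itself. Otherwise a client could send its own <code>X-Forwarded-Proto</code> header and pretend to be on HTTPS.</p>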
<h2>Setting the X-Forwarded-For header</h2>
<p>Now let's talk about determining the client's IP address. As mentioned before, NGINX will always lie to you and say that the client IP address is <code>127.0.0.1</code>.
If you don't care about client IP addresses, then you don't care about this header, and you don't need to set it. Knowing the client IP can be useful sometimes though, for example if you want to guess at where they are located, or if you are building one of those <a href="https://www.expressvpn.com/what-is-my-ip"><em>What's My IP?</em></a> websites:</p>
<p><img alt="some website knows my ip address" src="https://mattsegal.dev/my-ip.png"></p>
<p>You can set the <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-Forwarded-For">X-Forwarded-For</a> header to tell Gunicorn the original IP address of the client: </p>
<div class="highlight"><pre><span></span><code><span class="k">proxy_set_header</span><span class="w"> </span><span class="s">X-Forwarded-For</span><span class="w"> </span><span class="nv">$proxy_add_x_forwarded_for</span><span class="p">;</span><span class="w"></span>
</code></pre></div>
<p>As described earlier, the header <code>X-Forwarded-For</code> gets turned into the key <code>HTTP_X_FORWARDED_FOR</code> in your request object. For example:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># views.py</span>
<span class="kn">from</span> <span class="nn">django.http</span> <span class="kn">import</span> <span class="n">HttpResponse</span>


<span class="k">def</span> <span class="nf">my_view</span><span class="p">(</span><span class="n">request</span><span class="p">):</span>
    <span class="c1"># Prints client IP address: "12.34.56.78"</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">request</span><span class="o">.</span><span class="n">META</span><span class="p">[</span><span class="s2">"HTTP_X_FORWARDED_FOR"</span><span class="p">])</span>
    <span class="c1"># Prints NGINX IP address: "127.0.0.1", ie. localhost</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">request</span><span class="o">.</span><span class="n">META</span><span class="p">[</span><span class="s2">"REMOTE_ADDR"</span><span class="p">])</span>
    <span class="k">return</span> <span class="n">HttpResponse</span><span class="p">(</span><span class="s2">"Hello World"</span><span class="p">)</span>
</code></pre></div>
<p>Does this seem kind of underwhelming? Maybe a little pointless? As I said before, if you don't care about client IP addresses, then this header isn't for you.</p>
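<p>If you do care, one wrinkle is that <code>X-Forwarded-For</code> can hold a comma separated list, because <code>$proxy_add_x_forwarded_for</code> appends the client address to whatever value the client already sent. Here is a hypothetical helper that deals with that, assuming NGINX is the only proxy sitting in front of Django:</p>

```python
# Hypothetical helper, assuming NGINX is the only proxy in front of Django.

def get_client_ip(meta):
    """Best guess at the client's IP address from a request.META-like dict."""
    forwarded_for = meta.get("HTTP_X_FORWARDED_FOR", "")
    # The header can hold a comma separated list: "client, proxy1, proxy2"
    addresses = [part.strip() for part in forwarded_for.split(",") if part.strip()]
    if addresses:
        # $proxy_add_x_forwarded_for appends, so the last entry came from NGINX
        return addresses[-1]
    # No header set: fall back to the address of the direct connection
    return meta.get("REMOTE_ADDR")


print(get_client_ip({"HTTP_X_FORWARDED_FOR": "12.34.56.78", "REMOTE_ADDR": "127.0.0.1"}))
print(get_client_ip({"REMOTE_ADDR": "127.0.0.1"}))
```

<p>Taking the last entry is a deliberate choice here: anything earlier in the list was supplied by the client and could be faked. If you have more proxies in the chain (a load balancer, say) you would need to adjust which entry you trust.</p>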
<h2>Proxy redirect</h2>
<p>Let's cover the final line of the Django reverse proxy config: <code>proxy_redirect</code>.
The NGINX docs for this directive are <a href="http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_redirect">here</a>.</p>
<div class="highlight"><pre><span></span><code><span class="k">proxy_redirect</span><span class="w"> </span><span class="s">http://127.0.0.1:8000</span><span class="w"> </span><span class="s">http://foo.com</span><span class="p">;</span><span class="w"></span>
</code></pre></div>
<p>This directive is used when handling redirects that are issued by Django.
For example, you might have a webpage that used to live at path <code>/old/page/</code>, but you moved it to <code>/new/page/</code>.
You want to send any user that asked for <code>/old/page/</code> to <code>/new/page/</code>.
To achieve this you could write a Django view like this:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># views.py</span>
<span class="kn">from</span> <span class="nn">django.http</span> <span class="kn">import</span> <span class="n">HttpResponseRedirect</span>


<span class="k">def</span> <span class="nf">redirect_view</span><span class="p">(</span><span class="n">request</span><span class="p">):</span>
    <span class="c1"># Leading slash: a relative path would resolve against /old/page/</span>
    <span class="k">return</span> <span class="n">HttpResponseRedirect</span><span class="p">(</span><span class="s2">"/new/page/"</span><span class="p">)</span>
</code></pre></div>
<p>When a user asks for <code>/old/page/</code>, this view will send them a HTTP response with a 302 redirect status code:</p>
<div class="highlight"><pre><span></span><code><span class="kr">HTTP</span><span class="o">/</span><span class="m">1.1</span> <span class="m">302</span> <span class="ne">Found</span>
<span class="na">Location</span><span class="o">:</span> <span class="l">/new/page/</span>
</code></pre></div>
<p>Your web browser will follow the <code>Location</code> response header to the new page.
A problem occurs when your Django app includes the WSGI server's address and port in the <code>Location</code> header:</p>
<div class="highlight"><pre><span></span><code><span class="kr">HTTP</span><span class="o">/</span><span class="m">1.1</span> <span class="m">302</span> <span class="ne">Found</span>
<span class="na">Location</span><span class="o">:</span> <span class="l">http://127.0.0.1:8000/new/page/</span>
</code></pre></div>
<p>This is a problem because the client's browser will try to go to that address, and it will fail because <code>127.0.0.1</code> refers to the client's own machine,
where the WSGI server is not running.</p>
<p>Here's the thing: I have never actually seen this happen, and I'm having trouble thinking of a common scenario where this would happen.
Send me an email if you know where this issue crops up. Anyway, using <code>proxy_redirect</code> helps in the hypothetical case where Django does include the WSGI address
in a redirect's <code>Location</code> header.</p>
<p>The directive rewrites the header using the syntax:</p>
<div class="highlight"><pre><span></span><code><span class="k">proxy_redirect</span><span class="w"> </span><span class="s">redirect</span><span class="w"> </span><span class="s">replacement</span><span class="w"></span>
</code></pre></div>
<p>So, for example, if there was a redirect response like this:</p>
<div class="highlight"><pre><span></span><code><span class="kr">HTTP</span><span class="o">/</span><span class="m">1.1</span> <span class="m">302</span> <span class="ne">Found</span>
<span class="na">Location</span><span class="o">:</span> <span class="l">http://127.0.0.1:8000/new/page/</span>
</code></pre></div>
<p>and you set up your <code>proxy_redirect</code> like this (note the trailing slash on both values, since the directive swaps one string prefix for the other):</p>
<div class="highlight"><pre><span></span><code><span class="k">proxy_redirect</span><span class="w"> </span><span class="s">http://127.0.0.1:8000/</span><span class="w"> </span><span class="s">https://foo.com/blog/</span><span class="p">;</span><span class="w"></span>
</code></pre></div>
<p>then the outgoing response would be re-written to this:</p>
<div class="highlight"><pre><span></span><code><span class="kr">HTTP</span><span class="o">/</span><span class="m">1.1</span> <span class="m">302</span> <span class="ne">Found</span>
<span class="na">Location</span><span class="o">:</span> <span class="l">https://foo.com/blog/new/page/</span>
</code></pre></div>
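<p>If it helps, here's a rough Python sketch of that prefix replacement, assuming the plain-string form of the directive. This is only an illustration of the rule, not how NGINX is implemented, and <code>rewrite_location</code> is a made-up helper name:</p>
<div class="highlight"><pre><span></span><code># A sketch of proxy_redirect's plain-string behaviour (not NGINX's actual code).
# rewrite_location is a made-up helper name for illustration.
def rewrite_location(location, redirect, replacement):
    """Swap the `redirect` prefix of a Location header for `replacement`."""
    if location.startswith(redirect):
        return replacement + location[len(redirect):]
    # Headers that don't start with the prefix are passed through untouched
    return location


print(rewrite_location(
    "http://127.0.0.1:8000/new/page/",
    "http://127.0.0.1:8000",
    "http://foo.com",
))
# http://foo.com/new/page/
</code></pre></div>
<p>Because it is a simple prefix swap, any slash at the end of one value needs a matching slash at the end of the other, or the rewritten URL will gain or lose a slash.</p>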
<p>I guess this directive might be useful in some situations? I'm not really sure.</p>
<h2>Static block</h2>
<p>Earlier I mentioned that NGINX can serve static files directly from the filesystem.</p>
<p><img alt="nginx proxy with static files" src="https://mattsegal.dev/nginx-static-proxy.png"></p>
<p>This is a good idea because NGINX is much more efficient at doing this than your WSGI server will be.
It means that your server will be able to respond faster to static file requests and handle more traffic.
You can use <a href="https://docs.djangoproject.com/en/3.0/howto/static-files/deployment/#serving-static-files-in-production">this technique</a> to put all of your
Django app's static files into a folder like this:</p>
<div class="highlight"><pre><span></span><code>/home/myuser/myproject
└─ static             Your static files
    ├─ styles.css     CSS file
    ├─ main.js        JavaScript file
    └─ cat.png        A picture of a cat
</code></pre></div>
<p>Then you can set the <code>/static/</code> location to serve files directly from this folder: </p>
<div class="highlight"><pre><span></span><code><span class="k">location</span><span class="w"> </span><span class="s">/static/</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kn">root</span><span class="w"> </span><span class="s">/home/myuser/myproject</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>Now a request to <code>http://localhost/static/cat.png</code> will cause NGINX to read from <code>/home/myuser/myproject/static/cat.png</code>, without sending a request to the WSGI server.</p>
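<p>The mapping from URL to file can be sketched in a few lines of Python. This is just a model of the <code>root</code> directive's behaviour under the config above, and <code>static_file_path</code> is a made-up helper name:</p>
<div class="highlight"><pre><span></span><code># A sketch of how the `root` directive maps a URL path to a file path.
# static_file_path is a made-up helper name, not part of NGINX.
import posixpath


def static_file_path(root, uri):
    """With `root`, the full request path is appended to the root folder."""
    return posixpath.join(root, uri.lstrip("/"))


print(static_file_path("/home/myuser/myproject", "/static/cat.png"))
# /home/myuser/myproject/static/cat.png
</code></pre></div>
<p>Note that the <code>/static/</code> part of the URL is kept in the file path, which is why <code>root</code> points at the project folder rather than at the <code>static</code> folder itself.</p>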
<h2>Next steps</h2>
<p>Now you know what every line of your Django app's NGINX config is doing.
Hopefully you will be able to use this knowledge to debug issues faster and customise your existing setup.
If you have specific questions that weren't covered by this post, I recommend looking at the official NGINX documentation <a href="https://docs.nginx.com/nginx/admin-guide/web-server/web-server/">here</a>.</p>
<p>If you liked this post then you might also like reading some other stuff I've written:</p>
<ul>
<li><a href="https://mattsegal.dev/simple-django-deployment.html">A simple guide to deploying a Django app</a></li>
<li><a href="https://mattsegal.dev/django-prod-architectures.html">An overview of Django server setups</a></li>
<li><a href="https://mattsegal.dev/django-gunicorn-nginx-logging.html">How to manage logs with Django, Gunicorn and NGINX</a></li>
<li>A mini rant on Django performance: <a href="https://mattsegal.dev/is-django-too-slow.html">Is Django too slow?</a></li>
<li>A little series on Postgres database backups <a href="https://mattsegal.dev/postgres-backup-and-restore.html">1</a>, <a href="https://mattsegal.dev/postgres-backup-automate.html">2</a>, <a href="https://mattsegal.dev/restore-django-local-database.html">3</a></li>
</ul>
<p>If you found some of the stuff about HTTP in this post confusing, I heartily recommend checking out Brian Will's "The Internet" videos to learn more about what HTTP, TCP, and ports are: <a href="https://www.youtube.com/watch?v=DTQV7_HwF58">part 1</a>, <a href="https://www.youtube.com/watch?v=3fvUc2Dzr04&t=167s">part 2</a>, <a href="https://www.youtube.com/watch?v=_55PyDw0lGU">part 3</a>, <a href="https://www.youtube.com/watch?v=yz3lkSqioyU">part 4</a>.</p>
<p>And, of course, if you want to get updates on any new posts I write, you can subscribe to my blog's mailing list below.</p>How to manage logs with Django, Gunicorn and NGINX2020-07-26T12:00:00+10:002020-07-26T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-07-26:/django-gunicorn-nginx-logging.html<p>So you want to run a Django app using NGINX and Gunicorn.
Did you notice that <em>all three</em> of these tools have logging options?
You can configure <a href="https://docs.djangoproject.com/en/3.0/topics/logging/">Django logging</a>,
<a href="https://docs.gunicorn.org/en/latest/settings.html#errorlog">Gunicorn logging</a>, and <a href="https://docs.nginx.com/nginx/admin-guide/monitoring/logging/">NGINX logging</a>.</p>
<p>You just want to see what's happening in your Django app so that you can fix …</p><p>So you want to run a Django app using NGINX and Gunicorn.
Did you notice that <em>all three</em> of these tools have logging options?
You can configure <a href="https://docs.djangoproject.com/en/3.0/topics/logging/">Django logging</a>,
<a href="https://docs.gunicorn.org/en/latest/settings.html#errorlog">Gunicorn logging</a>, and <a href="https://docs.nginx.com/nginx/admin-guide/monitoring/logging/">NGINX logging</a>.</p>
<p>You just want to see what's happening in your Django app so that you can fix bugs. How are you supposed to set these logs up? What are they all for?
In this post I'll give you a brief overview of your logging options with Django, Gunicorn and NGINX, so that you don't feel so confused and overwhelmed.</p>
<p>I've previously written a short guide on <a href="https://mattsegal.dev/file-logging-django.html">setting up file logging</a> with Django if you just want quick instructions on what to do. </p>
<h2>NGINX logging</h2>
<p>NGINX allows you to set up <a href="https://docs.nginx.com/nginx/admin-guide/monitoring/logging/">two log files</a>, access_log and error_log. I usually configure them like this in my <code>/etc/nginx/nginx.conf</code> file:</p>
<div class="highlight"><pre><span></span><code>access_log /var/log/nginx/access.log;
error_log /var/log/nginx/error.log;
</code></pre></div>
<h2>NGINX access logs</h2>
<p>The NGINX access_log is a file which records all the requests that are coming in to your server via NGINX. It looks like this:</p>
<div class="highlight"><pre><span></span><code>123.45.67.89 - - [26/Jul/2020:04:55:28 +0000] "GET / HTTP/1.1" 200 906 "-" "Mozilla/5.0 ... Chrome/98 Safari/537.4"
123.45.67.89 - - [26/Jul/2020:05:06:29 +0000] "GET / HTTP/1.1" 200 904 "-" "Mozilla/5.0 ... Chrome/98 Safari/537.4"
123.45.67.89 - - [26/Jul/2020:05:10:33 +0000] "GET / HTTP/1.1" 200 904 "-" "Mozilla/5.0 ... Chrome/98 Safari/537.4"
123.45.67.89 - - [26/Jul/2020:05:21:33 +0000] "GET / HTTP/1.1" 200 910 "-" "Mozilla/5.0 ... Chrome/98 Safari/537.4"
123.45.67.89 - - [26/Jul/2020:05:25:37 +0000] "GET / HTTP/1.1" 200 907 "-" "Mozilla/5.0 ... Chrome/98 Safari/537.4"
</code></pre></div>
<p>There's a new line for each request that comes in. Breaking a single line down:</p>
<div class="highlight"><pre><span></span><code>123.45.67.89 - - [26/Jul/2020:04:55:28 +0000] "GET / HTTP/1.1" 200 906 "-" "Mozilla/5.0 ... Chrome/98 Safari/537.4"
</code></pre></div>
<p>From this line we can see:</p>
<ul>
<li>the IP is 123.45.67.89</li>
<li>the request arrived at 26/Jul/2020:04:55:28 +0000</li>
<li>the HTTP request method was GET</li>
<li>the path requested was /</li>
<li>the version of HTTP used was HTTP/1.1</li>
<li>the status code returned by the server was "200" (ie. <a href="https://http.cat/">OK</a>)</li>
<li>the requester's <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent">user agent</a> was "Mozilla/5.0 ... Chrome/98 Safari/537.4"</li>
</ul>
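<p>If you ever want to pull these fields out with a script rather than by eye, here's a minimal Python sketch. It assumes your access log uses NGINX's default "combined" format, like the lines above; the names <code>LOG_PATTERN</code>, <code>FIELDS</code> and <code>parse_access_line</code> are my own for illustration:</p>
<div class="highlight"><pre><span></span><code>import re

# Pattern for one line of the default "combined" access log format (an assumption)
LOG_PATTERN = re.compile(
    r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d{3}) (\d+) "([^"]*)" "([^"]*)"'
)
FIELDS = ("ip", "time", "method", "path", "protocol", "status", "size", "referrer", "user_agent")


def parse_access_line(line):
    """Return a dict of fields from one log line, or None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return dict(zip(FIELDS, match.groups()))


line = '123.45.67.89 - - [26/Jul/2020:04:55:28 +0000] "GET / HTTP/1.1" 200 906 "-" "Mozilla/5.0 ... Chrome/98 Safari/537.4"'
entry = parse_access_line(line)
print(entry["ip"], entry["method"], entry["path"], entry["status"])
# 123.45.67.89 GET / 200
</code></pre></div>
<p>Lines with a malformed request (which bots will happily send you) won't match this pattern, so the function returns <code>None</code> for those rather than guessing.</p>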
<p>This is <em>very</em> useful information to have when debugging issues in production, and I recommend you enable these access logs in NGINX.
You can quickly view these logs using <code>tail</code>:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># View last 5 log lines</span>
tail -n <span class="m">5</span> /var/log/nginx/access.log
<span class="c1"># View last 5 log lines and watch for new ones</span>
tail -n <span class="m">5</span> -f /var/log/nginx/access.log
</code></pre></div>
<p>In addition to legitimate requests to your web application, NGINX will also log all of the spam, crawlers, and hacking attempts that hit your webserver.
If you have your server accessible via the internet, then you will get garbage requests like this in your access log: </p>
<div class="highlight"><pre><span></span><code>195.54.160.21 - - [26/Jul/2020:03:58:25 +0000] "POST /vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php HTTP/1.1" 404 564 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
</code></pre></div>
<p>I assume this is a bot trying to hack an old version of PHP (which I do not run on this server).</p>
<h2>NGINX error logs</h2>
<p>NGINX also logs errors to error_log, which can occur when you've messed up your configuration somehow, or if your Gunicorn server is unresponsive. This file is also useful for debugging so I recommend you include it as well in your NGINX config. You get error messages like this:</p>
<div class="highlight"><pre><span></span><code>2020/07/25 08:14:57 [error] 32115#32115: *44242 connect() failed (111: Connection refused) while connecting to upstream, client: 11.22.33.44, server: www.example.com, request: "GET /admin/ HTTP/1.1", upstream: "http://127.0.0.1:8000/admin/", host: "clerk.anikalegal.com", referrer: "https://www.example.com/admin/"
</code></pre></div>
<h2>Gunicorn logging</h2>
<p>Gunicorn has <a href="https://docs.gunicorn.org/en/latest/settings.html#errorlog">two main logfiles</a> that it writes, the error log and the access log.
You can configure the log settings through the <a href="https://docs.gunicorn.org/en/latest/configure.html#command-line">command line</a> or a <a href="https://docs.gunicorn.org/en/latest/configure.html#configuration-file">config file</a>. I recommend using the config file because it's easier to read.</p>
<h2>Gunicorn access logs</h2>
<p>The Gunicorn access log is very similar to the NGINX access log. It records all the requests coming in to the Gunicorn server:</p>
<div class="highlight"><pre><span></span><code>10.255.0.2 - - [26/Jul/2020:05:10:33 +0000] "GET /foo/ HTTP/1.0" 200 1938 "-" "Mozilla/5.0 ... (StatusCake)"
10.255.0.2 - - [26/Jul/2020:05:25:37 +0000] "GET /foo/ HTTP/1.0" 200 1938 "-" "Mozilla/5.0 ... (StatusCake)"
10.255.0.2 - - [26/Jul/2020:05:40:42 +0000] "GET /foo/ HTTP/1.0" 200 1938 "-" "Mozilla/5.0 ... (StatusCake)"
</code></pre></div>
<p>I think you may as well enable this so that you can debug issues where you're not sure if NGINX is sending requests to Gunicorn properly.</p>
<h2>Gunicorn error logs</h2>
<p>The Gunicorn error log is a little bit more complicated. By default it contains information about what the Gunicorn server is doing, like starting up and shutting down:</p>
<div class="highlight"><pre><span></span><code>[2020-04-06 06:17:23 +0000] [53] [INFO] Starting gunicorn 20.0.4
[2020-04-06 06:17:23 +0000] [53] [INFO] Listening at: http://0.0.0.0:8000 (53)
[2020-04-06 06:17:23 +0000] [53] [INFO] Using worker: sync
[2020-04-06 06:17:23 +0000] [56] [INFO] Booting worker with pid: 56
[2020-04-06 06:17:23 +0000] [58] [INFO] Booting worker with pid: 58
</code></pre></div>
<p>You can change how verbose these messages are using the "<a href="https://docs.gunicorn.org/en/latest/settings.html#loglevel">loglevel</a>" setting, which can be set to log more info using the "debug" level, or only errors, using the "error" level, etc.</p>
<p>Finally, and importantly, there is the "<a href="https://docs.gunicorn.org/en/latest/settings.html#capture-output">capture_output</a>" logging setting, which is a boolean flag.
This setting will take any stdout/stderr, which is to say print statements, log messages, warnings and errors from your Django app, and log them to the Gunicorn error log file.
I like to keep this setting enabled so that I can catch any random output that is falling through from Django to Gunicorn.
Here is an example Gunicorn config file with logging set up:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># gunicorn.conf.py</span>
<span class="c1"># Non logging stuff</span>
<span class="n">bind</span> <span class="o">=</span> <span class="s2">"0.0.0.0:80"</span>
<span class="n">workers</span> <span class="o">=</span> <span class="mi">3</span>
<span class="c1"># Access log - records incoming HTTP requests</span>
<span class="n">accesslog</span> <span class="o">=</span> <span class="s2">"/var/log/gunicorn.access.log"</span>
<span class="c1"># Error log - records Gunicorn server goings-on</span>
<span class="n">errorlog</span> <span class="o">=</span> <span class="s2">"/var/log/gunicorn.error.log"</span>
<span class="c1"># Whether to send Django output to the error log </span>
<span class="n">capture_output</span> <span class="o">=</span> <span class="kc">True</span>
<span class="c1"># How verbose the Gunicorn error logs should be </span>
<span class="n">loglevel</span> <span class="o">=</span> <span class="s2">"info"</span>
</code></pre></div>
<p>You can run Gunicorn with a config file like this as follows:</p>
<div class="highlight"><pre><span></span><code>gunicorn myapp.wsgi:application -c /some/folder/gunicorn.conf.py
</code></pre></div>
<h2>Django logging</h2>
<p>Django logging refers to the output of your Django application: the kind of messages you see printed by <code>runserver</code> in development. Stuff like this:</p>
<div class="highlight"><pre><span></span><code>Sending Thing<b5d1854b-7efc-4c67-9e9b-a956c10e5b86]> to Google API
Google API called failed: {'error_description': 'You failed hahaha'}
Traceback (most recent call last):
File "/app/google/api/base.py", line 102, in _handle_json_response
resp.raise_for_status()
File "/usr/local/lib/python3.6/dist-packages/requests/models.py"
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error
Setting expired tokens to inactive: []
</code></pre></div>
<p>I discuss Django logging in more detail in <a href="https://mattsegal.dev/file-logging-django.html">this guide</a>, but I will give you a brief summary here.
Django uses the same conventions as Python's standard library <a href="https://docs.python.org/3/library/logging.html">logging</a> module, which is kind of a pain to learn, but valuable to know.
The Django docs provide a nice overview of logging config <a href="https://docs.djangoproject.com/en/3.0/topics/logging/">here</a>.</p>
<p>I think you have two viable options for your Django logging:</p>
<ul>
<li>Set up Django to log everything to stdout/stderr using the <code>StreamHandler</code> and capture the output using Gunicorn via the capture_output option, so that your Django logs end up in the Gunicorn error logfile</li>
<li>Set up Django to log to a file using <code>FileHandler</code> so you can keep your Django and Gunicorn logs separate</li>
</ul>
<p>I personally prefer option #2, but do whatever makes you happy.</p>
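<p>As a rough idea of what these two options look like in practice, here's a minimal sketch of a <code>LOGGING</code> setting. The file name <code>django.log</code>, the "simple" format and the INFO level are all assumptions that you'd adjust for your own project:</p>
<div class="highlight"><pre><span></span><code># settings.py (sketch) - a minimal LOGGING config for option #2.
# The file name "django.log" and the INFO level are assumptions; adjust to taste.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "simple": {"format": "%(asctime)s %(levelname)s %(name)s: %(message)s"},
    },
    "handlers": {
        # Option #2: write to a file, separate from the Gunicorn logs
        "file": {
            "class": "logging.FileHandler",
            "filename": "django.log",
            "formatter": "simple",
        },
        # Option #1 would use this handler instead, so that output goes to
        # stderr and gets picked up by Gunicorn's capture_output
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "simple",
        },
    },
    # Send everything at INFO level and above to the file handler
    "root": {"handlers": ["file"], "level": "INFO"},
}
</code></pre></div>
<p>To switch to option #1 you would change the root logger's handlers to <code>["console"]</code> and leave <code>capture_output</code> enabled in Gunicorn.</p>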
<h2>Next steps</h2>
<p>I encourage you to set up the logging described in this post, so that you don't waste hours trying to figure out what is causing bugs in production.
I also recommend that you configure error alerting with Django, with <a href="https://mattsegal.dev/sentry-for-django-error-monitoring.html">Sentry</a> being a strong choice.</p>
<p>Finally, if you're having other difficulties getting your Django app onto the internet, then check out my guide on <a href="https://mattsegal.dev/simple-django-deployment.html">Django deployment</a></p>How to make your Django project easy to move and share2020-07-24T12:00:00+10:002020-07-24T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-07-24:/django-portable-setup.html<p>You need your Django project to be portable. It should be quick and easy to start it up on a new laptop.
If it isn't portable, then your project is trapped on your machine. If it gets deleted or corrupted, then you've lost all your work!
This issue comes up …</p><p>You need your Django project to be portable. It should be quick and easy to start it up on a new laptop.
If it isn't portable, then your project is trapped on your machine. If it gets deleted or corrupted, then you've lost all your work!
This issue comes up in quite a few scenarios:</p>
<ul>
<li>you want to work on your code on multiple machines, like a laptop and a PC</li>
<li>you want to get help from other people, and they want to try running your code</li>
<li>you somehow screwed up your files very badly and you want to start from scratch </li>
</ul>
<p>In the worst case, moving your Django project from one machine to another is a frustrating and tedious experience that involves dead ends, mystery bugs and cryptic error messages. It's the kind of thing that makes you want to scream at your computer.</p>
<p><img alt="frustrated fox" src="https://mattsegal.dev/img/frustrated-fox.jpeg"></p>
<p>In the best case, this process can take minutes. To achieve this best case, there are some steps that you'll
need to take to make your development environment reproducible.</p>
<p>If you don't believe that this is achievable, then here's a quick example of me cloning and setting up an <a href="https://github.com/MattSegal/djdt-perf-demo">example project</a> from scratch in under a minute:</p>
<div class="loom-embed"><iframe src="https://www.loom.com/embed/01cbd6d2c2f04d0ab78e4d33d0174de5" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>
<p>In the rest of this post, I'll describe some practices that will help ensure that anyone with Python installed can quickly start working on your Django app.</p>
<h2>Hosting your code</h2>
<p>The best way to make your code portable between multiple computers is to put it online in a place that is publicly accessible, like <a href="https://github.com/">GitHub</a>.
For example, this blog is <a href="https://github.com/MattSegal/devblog">hosted on GitHub</a> so that I can access the latest copy of my writing from both my laptop and PC.
Git, the version control tool, is widely used by software developers and allows you to efficiently and reliably sync your code between multiple machines.</p>
<p>If you don't know Git and you plan to work with software in any capacity, then I strongly recommend that you start learning how to use it as soon as possible.
There are plenty of <a href="https://hellowebbooks.com/learn-git/">books</a>, <a href="https://www.udacity.com/course/version-control-with-git--ud123">online courses</a>, <a href="https://missing.csail.mit.edu/2020/version-control/">lectures</a> and <a href="https://try.github.io/">more</a> to help you learn. It's a pain in the ass to start with, no doubt about that, but it is definitely worth your time.</p>
<h2>Tracking Python dependencies</h2>
<p>Your project needs a bunch of 3rd party libraries to run. Obviously Django is required, plus maybe Django REST Framework, Boto3... Pillow, perhaps?
It's hard to remember all the things that you've <code>pip install</code>'d, which is why it's really important to track all the libraries that your app needs, plus the versions, if those are important to you.</p>
<p>There is a Python convention of tracking all your libraries in a <a href="https://pip.pypa.io/en/latest/reference/pip_install/#example-requirements-file">requirements.txt</a> file.
Experienced Python devs immediately know what to do if they see a project with one of these files, so it's good if you stick with this practice. Installing all
your requirements is as easy as:</p>
<div class="highlight"><pre><span></span><code>pip install -r requirements.txt
</code></pre></div>
<p>You can also use <code>pip freeze</code> to get an exact snapshot of your current Python packages and write them to a file:</p>
<div class="highlight"><pre><span></span><code>pip freeze > requirements.txt
</code></pre></div>
<p>Python's <code>pip</code> package manager tries to install all of your dependencies in your global system Python folder by default, which is a really dumb idea,
and it can cause issues where multiple Python projects are all installing libraries in the same place. When this happens you can get the wrong
version installed, and you can no longer keep track of what dependencies you need to run your code, because they're muddled
together with the ones from all your other projects.</p>
<p>The simplest way to fix this issue is to <em>always</em> use <code>virtualenv</code> to isolate your Python dependencies. You can read a guide on that <a href="https://realpython.com/python-virtual-environments-a-primer/">here</a>. Using <code>virtualenv</code>, incidentally, also fixes the problem where you sometimes have to use <code>sudo</code> to pip install things on Linux.
There are also other tools like <a href="https://realpython.com/pipenv-guide/">pipenv</a> or <a href="https://python-poetry.org/">poetry</a> that solve this problem as well. Use whatever you want,
but it's a good idea to pick <em>something</em>, or you will shed many tears over Python dependency errors in the future.</p>
<h2>Repeatable setup instructions</h2>
<p>Most simple Django projects have the exact same setup sequence. It's almost always roughly this:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># Create and activate virtual environment</span>
virtualenv -p python3 env
. ./env/bin/activate
<span class="c1"># Install Python dependencies</span>
pip install -r requirements.txt
<span class="c1"># Create SQLite database, run migrations</span>
<span class="nb">cd</span> myapp
./manage.py migrate
<span class="c1"># Run Django dev server</span>
./manage.py runserver
</code></pre></div>
<p>But for anything but the simplest projects there are usually a few extra steps that you'll need to perform to get up and running.
You need to <strong>write this shit down</strong>, preferably in your project's README, or <strong>you will forget</strong>.
Even if you remember all these steps, your friends or colleagues will get stuck if you're not available to help them.</p>
<p>You want to document all the instructions that someone needs to follow to start running your project, with as much of it as possible written as explicit
lines of shell code. Someone who already has Python set up should be able to clone your project onto their laptop with Git, follow
your instructions, and then be able to run your Django app. The kind of extra things that you should document are:</p>
<ul>
<li>any extra scripts or management commands that the user must run</li>
<li>any environment variables or files that the user needs to configure</li>
<li>setup of required data in the Django admin or shell </li>
<li>installing and running any 3rd party dependencies (eg. Docker, Postgres, Redis)</li>
<li>building required front end web assets (eg. with Webpack)</li>
<li>downloading essential data from the internet</li>
</ul>
<p>Documenting the project setup isn't so important for small and simple projects, but it's also really easy to do (see script above).
As your project becomes more complicated, the need to have replicable, explicit setup instructions becomes vital.
If you do not maintain these instructions, then it will cost you hours of work when you forget to perform a vital step and your app doesn't work.</p>
<p>I've written before on <a href="https://mattsegal.dev/github-resume-polish.html#readme">how to write a nice README</a>, which you might find useful.
It's a little over the top for the purposes of just making your project portable and reproducible, but it should give you a general idea of what to cover.</p>
<h2>Exclude unnecessary files</h2>
<p>Your project should only contain source code, plus the minimum files required to run it. It should not contain:</p>
<ul>
<li>Editor config files (.idea, .vscode)</li>
<li>Database files (eg. SQLite)</li>
<li>Random documents (.pdf, .xls)</li>
<li>Non-essential media files (images, videos, audio)</li>
<li>Bytecode (eg. *.pyc files)</li>
<li>Build artifacts (eg. JavaScript and CSS from Webpack)</li>
<li>Virtual environments (eg. env/venv folders)</li>
<li>JavaScript packages (node_modules)</li>
<li>Log files (eg. *.log)</li>
</ul>
<p>Some of these files are just clutter, but the SQLite databases and bytecode are particularly important to exclude.</p>
<p>SQLite files are a binary format, which Git does not store easily. Every change to the database causes Git to store a whole new copy.
In addition, there's no way to "merge" databases with Git, meaning the data will get regularly overwritten by multiple users.</p>
<p><a href="https://opensource.com/article/18/4/introduction-python-bytecode">Python bytecode</a> files, with the <code>.pyc</code> extension, can cause issues
when shared between different machines, and are also just yucky to look at.</p>
<p>You can exclude all of the files (and folders) I described above using a <code>.gitignore</code> file, in the root of your repository, with contents something like this:</p>
<div class="highlight"><pre><span></span><code># General
*.log
*.pdf
*.png
# IDE settings (PyCharm, VSCode)
.idea/
.vscode/
# Python
*.pyc
env/
venv/
# Databases
*.sqlite3
# JavaScript
node_modules/
# Webpack build output
build/
</code></pre></div>
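<p>If you've already added these kinds of files to your project's Git history, then ignoring them is not enough, because Git keeps tracking files that have already been committed. You'll also need to remove them from the repository (for example with <code>git rm --cached</code>) and commit that change.</p>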
<p>In addition, a common mistake by beginners is to exclude migration files from their Git history. Django migration files belong in source control,
so that you can ensure that everybody is running the same migrations on their data.</p>
<h2>Automate common tasks</h2>
<p>Although it's not strictly necessary, it's really nice to automate your project setup, so that you can get started by just running a few scripts.
You can use bash scripts if you're a Linux or Mac user, PowerShell if you're using Windows, or even custom Django management commands. I also recommend checking out <a href="https://www.pyinvoke.org/">Invoke</a>, which is a nice, cross-platform Python tool for running tasks (<a href="https://github.com/MattSegal/link-sharing-site/blob/master/tasks.py">example Invoke script</a>).</p>
<p>For example, in this <a href="https://github.com/MattSegal/djdt-perf-demo">demo repo</a>, I added a script which <a href="https://mattsegal.dev/django-factoryboy-dummy-data.html">fills the website with test data</a>, which a user can quickly run via a management command:</p>
<div class="highlight"><pre><span></span><code>./manage.py setup_test_data
</code></pre></div>
<p>In other projects of mine, I also like to include a script that allows me to <a href="https://mattsegal.dev/restore-django-local-database.html">pull production data into my local database</a>, which is also just one quick copy-paste to run. </p>
<div class="highlight"><pre><span></span><code>./scripts/restore-prod.sh
</code></pre></div>
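<p>If you'd rather stick to plain Python than bash or PowerShell, a setup script can be as simple as the sketch below. The path <code>scripts/bootstrap.py</code>, the <code>SETUP_STEPS</code> list and the <code>--run</code> flag are all hypothetical, and it assumes that you've already activated a virtual environment and that <code>manage.py</code> lives in <code>./myapp</code>:</p>
<div class="highlight"><pre><span></span><code># scripts/bootstrap.py (hypothetical path) - a sketch of automating setup steps.
# Assumes a virtualenv is already activated and manage.py lives in ./myapp
import subprocess
import sys

SETUP_STEPS = [
    # Install Python dependencies
    [sys.executable, "-m", "pip", "install", "-r", "requirements.txt"],
    # Create the database and run migrations
    [sys.executable, "myapp/manage.py", "migrate"],
]


def run_setup(dry_run=True):
    """Print each setup step, and run it unless dry_run is set."""
    for step in SETUP_STEPS:
        print("$", " ".join(step))
        if not dry_run:
            # check=True stops the script as soon as one step fails
            subprocess.run(step, check=True)


if __name__ == "__main__":
    # Dry run by default; pass --run to actually execute the steps
    run_setup(dry_run="--run" not in sys.argv)
</code></pre></div>
<p>Running <code>python scripts/bootstrap.py</code> from the project root only prints the commands, and adding <code>--run</code> executes them. Using <code>sys.executable</code> means the steps run with the same Python interpreter (and virtual environment) as the script itself.</p>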
<h2>Next steps</h2>
<p>If you're working on a Django project right now, I recommend that you make sure that it's portable.
It doesn't take long to do and you will save yourself hours and hours of this:</p>
<p><img alt="dog screaming internally" src="https://mattsegal.dev/img/screams.jpg"></p>
<p>If multiple people are working on your Django project and you want to become even more productive as a team, then I also recommend that you begin <a href="https://docs.djangoproject.com/en/3.0/topics/testing/">writing tests</a> and <a href="https://mattsegal.dev/pytest-on-github-actions.html">run them automatically with GitHub Actions</a>.</p>
<p>If you've found moving your Django project around to be a frustrating experience, then you've probably also had trouble deploying it to the web as well. If that's the case, you might enjoy my guide on <a href="https://mattsegal.dev/simple-django-deployment.html">Django deployment</a>, where I show you how to deploy Django to a DigitalOcean virtual machine.</p>Is Django too slow?2020-07-24T12:00:00+10:002020-07-24T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-07-24:/is-django-too-slow.html<p>Does Django have "bad performance"?
The framework is now 15 years old. Is it out of date?
Mostly, no. I think that Django's performance is perfectly fine for most use-cases.
In this post I'll review different aspects of Django's "performance" as a web framework and discuss how you can decide …</p><p>Does Django have "bad performance"?
The framework is now 15 years old. Is it out of date?
Mostly, no. I think that Django's performance is perfectly fine for most use-cases.
In this post I'll review different aspects of Django's "performance" as a web framework and discuss how you can decide whether it's a good fit for your web app.</p>
<h2>Benchmarks</h2>
<p>Let's start by digging into the ad-hoc web app performance benchmarks that you'll see pop up on Medium from time to time. To produce a graph like the one below, the author of <a href="https://medium.com/@mihaigeorge.c/web-rest-api-benchmark-on-a-real-life-application-ebb743a5d7a3">this article</a> sets up a server for each of the frameworks tested and sends them a bunch of HTTP requests. The benchmarking tool counts the number of requests served per second by each framework.</p>
<p><img alt="benchmark" src="https://mattsegal.dev/img/benchmark.png"></p>
<p>I think these kinds of measurements are irrelevant to practical web development. There are a few factors to consider:</p>
<ul>
<li>Is the metric being measured actually of interest? What's a good baseline? Is 100 requests per second good, or pathetic? Is 3000 requests/s practically better than 600 requests/s?</li>
<li>Is the test representative of an actual web app workload? In this case, how often do we just send a static "hello world" JSON to users?</li>
<li>Are we comparing apples to apples? For example, ExpressJS has 3 layers of relatively simple middleware enabled by default, whereas Django provides a larger stack of middleware features "out of the box".</li>
<li>Has each technology been set up correctly? Was Gunicorn, for example, run with an optimal number of workers?</li>
</ul>
<p>This kind of naive comparison is a little misleading and it's hard to use it to make practical decisions. So, what kind of performance metrics should you pay attention to when working on your Django app?</p>
<h2>What do you mean by "performance"?</h2>
<p>When you ask whether a framework or language is "slow", you should also ask "slow at what?" and "why do you care?".
Fundamentally I think there are really only two performance goals: a good user experience and low hosting cost. How much money does running this website cost me, and do people enjoy using my website? For user experience I'm going to talk about two factors:</p>
<ul>
<li>Response time: how long people need to wait before their requests are fulfilled</li>
<li>Concurrency: how many people can use your website at the same time</li>
</ul>
<p>Cost, on the other hand, is typically proportional to compute resources: how many CPU cores and GB of RAM you will need to run your web app.</p>
<h2>Response time in Django</h2>
<p>Users don't like waiting for their page to load, so the less time they have to wait, the better. There are a few different
metrics that you could use to measure page load speed, such as <a href="https://web.dev/time-to-first-byte/">time to first byte</a> or <a href="https://web.dev/first-contentful-paint/">first contentful paint</a>, both of which you can check with <a href="https://developers.google.com/speed/pagespeed/insights/">PageSpeed Insights</a>. Faster responses don't benefit your users linearly though: not every 5x improvement in response time is equally beneficial. A user getting a response in:</p>
<ul>
<li>5s compared to 25s transforms the app from "broken" to "barely useable"</li>
<li>1s compared to 5s is a huge improvement</li>
<li>200ms instead of 1s is good</li>
<li>50ms instead of 200ms is nice, I guess, but many people wouldn't notice</li>
<li>10ms instead of 50ms is imperceptible, no one can tell the difference</li>
</ul>
<p>So if someone says "this framework is 5x faster than that framework blah blah blah" it really doesn't mean anything without more context.
The important question is: will your users notice? Will they care?</p>
<p>So, what makes a page load slowly in Django? The most common beginner mistakes are using too many database queries or making slow API calls to external services.
I've written previously on how to <a href="https://mattsegal.dev/django-debug-toolbar-performance.html">find and fix slow database queries with Django Debug Toolbar</a> and how to <a href="https://mattsegal.dev/offline-tasks.html">push slow API calls into offline tasks</a>. There are <strong>many</strong> other ways to make your Django web pages or API endpoints load slowly, but if you avoid these two major pitfalls then you should be able to serve users with a time to first byte (TTFB) of 1000ms or less and provide a reasonable user experience.</p>
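<p>To make the "too many database queries" mistake concrete, here is a minimal sketch using Python's built-in <code>sqlite3</code> module rather than the Django ORM. The <code>author</code>/<code>book</code> schema, the <code>QueryCounter</code> helper and the function names are all made up for illustration. The slow version runs one extra query per row (the "N+1" pattern), while the fast version fetches the same data with a single JOIN, which is roughly what Django's <code>select_related</code> does for you.</p>

```python
import sqlite3


def make_db(num_rows=50):
    """Build a hypothetical in-memory database of authors and their books."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE book (
            id INTEGER PRIMARY KEY,
            title TEXT,
            author_id INTEGER REFERENCES author(id)
        );
        """
    )
    ids = range(1, num_rows + 1)
    conn.executemany(
        "INSERT INTO author (id, name) VALUES (?, ?)",
        [(i, f"Author {i}") for i in ids],
    )
    conn.executemany(
        "INSERT INTO book (title, author_id) VALUES (?, ?)",
        [(f"Book {i}", i) for i in ids],
    )
    return conn


class QueryCounter:
    """Wraps a connection and counts how many queries are executed."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def run(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def titles_and_authors_slow(db):
    """One query for the books, then one more query per book (N+1)."""
    results = []
    for title, author_id in db.run("SELECT title, author_id FROM book ORDER BY id"):
        (name,) = db.run("SELECT name FROM author WHERE id = ?", (author_id,))[0]
        results.append((title, name))
    return results


def titles_and_authors_fast(db):
    """A single JOIN returns the same rows in one query."""
    return db.run(
        "SELECT book.title, author.name FROM book "
        "JOIN author ON author.id = book.author_id ORDER BY book.id"
    )
```

<p>With 50 books, the slow version issues 51 queries and the fast version issues 1, for identical results. On a real database server each of those extra queries also pays a network round trip, which is where the page load time goes.</p>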
<h2>When is Django's response time not fast enough?</h2>
<p>Django isn't perfect for every use case, and sometimes it can't respond to queries fast enough.
There are some aspects of Django that are hard to optimise without giving up much of the convenience that makes the framework attractive in the first place.
You will always have to wait for Django when it is:</p>
<ul>
<li>running requests through middleware (on the way in and out) </li>
<li>serializing and deserializing JSON strings</li>
<li>building HTML strings from templates</li>
<li>converting database queries into Python objects</li>
<li>running garbage collection</li>
</ul>
<p>All this stuff runs really fast on modern computers, but it is still overhead.
Most humans don't mind waiting roughly a second for their web page to load, but machines can be more impatient.
If you are using Django to serve an API, where it is primarily computer programs talking to other computer programs, then it <em>may</em> not be fast enough for very high performance workloads. Some applications where you would consider ditching Django to shave off some latency are:</p>
<ul>
<li>a stock trading marketplace</li>
<li>a global online advertisement serving network</li>
<li>a low level infrastructure control API</li>
</ul>
<p>If you find yourself sweating about an extra 100ms here or there, then maybe it's time to look at alternative web frameworks or languages. If the difference between a 600ms and 500ms TTFB doesn't mean much to you, then Django is totally fine.</p>
<h2>Concurrency in Django</h2>
<p>As we saw in the benchmark above, Django web apps can handle multiple requests at the same time. This is important if your application has multiple users. If too many people try to use your site at the same time, then it will eventually become overwhelmed, and they will be served errors or timeouts. In Australia, our government's household census website was <a href="https://www.abc.net.au/news/2016-08-09/abs-website-inaccessible-on-census-night/7711652">famously overwhelmed</a> when the entire country tried to access an online form in 2016. This effect is often called the "<a href="https://en.wikipedia.org/wiki/Slashdot_effect">hug of death</a>" and associated with small sites becoming popular on Reddit or Hacker News.</p>
<p>A Django app's <a href="https://mattsegal.dev/simple-django-deployment-2.html#wsgi">WSGI server</a> is the thing that handles multiple concurrent requests. I'm going to use <a href="https://gunicorn.org/">Gunicorn</a>, the WSGI server I know best, as a reference. Gunicorn can provide two kinds of concurrency: multiple child worker processes and multiple green threads per worker. If you don't know what a "process" or a "green thread" is then, whatever, suffice to say that you can set Gunicorn up to handle multiple requests at the same time.</p>
<p>What happens if a new request comes in and all the workers/threads are busy? I'm a little fuzzy on this, but I believe these extra requests get put in a queue, which is managed by Gunicorn. It appears that the <a href="https://docs.gunicorn.org/en/stable/settings.html#backlog">default length</a> of this queue is 2048 requests. So if the workers get overwhelmed, then the extra requests get put on the queue so that the workers can (hopefully) process them later. Typically NGINX will time out any connection that has not received a response within 60s, so if a request gets put in the queue and doesn't get a response within 60s, then the user will get an HTTP 504 "Gateway Timeout" error. If the queue gets full, then Gunicorn will start sending back errors for any overflowing requests.</p>
<p>It's interesting to note the relationship between request throughput and response time. If your WSGI server has 10 workers
and each request takes 1000ms to complete, then you can only serve ~10 requests per second. If you optimise your Django code so that each request only takes
100ms to complete, then you can serve ~100 requests per second. Given this relationship, it's sometimes good to improve your app's response time even if users won't notice, because it will also improve the number of requests/second that you can serve.</p>
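<p>The arithmetic behind those numbers can be sketched in a couple of lines. This is a back-of-the-envelope upper bound, assuming each worker handles exactly one request at a time and is never idle; the function name is my own.</p>

```python
def max_requests_per_second(workers: int, response_time_ms: float) -> float:
    """Upper bound on throughput when each worker serves one request at a time."""
    return workers * 1000 / response_time_ms


# 10 workers, 1000ms per request
slow_throughput = max_requests_per_second(workers=10, response_time_ms=1000)

# 10 workers, 100ms per request
fast_throughput = max_requests_per_second(workers=10, response_time_ms=100)

print(slow_throughput, fast_throughput)  # 10.0 100.0
```

<p>Real throughput will be lower than this bound, since workers also spend time waiting for new connections and competing for CPU.</p>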
<p>There are some limitations to adding more Gunicorn workers, of course:</p>
<ul>
<li>Each additional worker eats up some RAM (which can be reduced if you use <a href="https://docs.gunicorn.org/en/latest/settings.html#preload-app">preload</a>)</li>
<li>Each additional worker/thread will eat some CPU when processing requests</li>
<li>Each additional worker/thread will eat some extra CPU when listening for new requests, i.e. the "<a href="https://docs.gunicorn.org/en/latest/faq.html#does-gunicorn-suffer-from-the-thundering-herd-problem">thundering herd problem</a>", which is described in great detail <a href="https://rachelbythebay.com/w/2020/03/07/costly/">here</a></li>
</ul>
<p>So, really, the question of "how much concurrency can Django handle" is actually a question of "how much cloud compute can you afford":</p>
<ul>
<li>if you need to handle more requests, add more workers</li>
<li>if you need more RAM, rent a virtual machine with more RAM</li>
<li>if you have too many workers on one server and are seeing "thundering herd" problems, then <a href="https://mattsegal.dev/django-prod-architecture/nginx-2-external.png">scale out your web servers</a> (<a href="https://mattsegal.dev/django-prod-architectures.html">more here</a>)</li>
</ul>
<p>This situation is, admittedly, not ideal, and it would be better if Gunicorn were more resource efficient. To be fair, though, this problem of scaling
Django's concurrency doesn't really come up for most developers. If you're working at <a href="https://instagram-engineering.com/">Instagram</a> or <a href="https://www.eventbrite.com/engineering/our-strategy-to-migrate-to-django/">Eventbrite</a>, then sure, this is costing your company some serious money, but most developers don't run apps that operate at a scale where this is an issue.</p>
<p>How do you know if you can support enough concurrency with your current infrastructure? I recommend using <a href="https://locust.io/">Locust</a> to load test your app
with dozens, hundreds, or thousands of simultaneous users - whatever you think a realistic "bad case" scenario would look like. Ideally you would do this on a staging server that has a similar architecture and compute resources to your production environment. If your server becomes overwhelmed with requests and starts returning
errors or timeouts, then you know you have concurrency issues. If all requests are gracefully served, then you're OK!</p>
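<p>To show the basic idea of a load test, here's a toy sketch that uses only the Python standard library. It is <strong>not</strong> Locust and is no substitute for it: the stand-in server, <code>start_test_server</code>, <code>fetch</code> and <code>load_test</code> are all hypothetical names that I've made up for this example. It fires a batch of concurrent GET requests at a server and reports how many failed and how long the slowest one took.</p>

```python
import http.client
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Stand-in for your web app: replies 200 to every GET request."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet during the test


def start_test_server():
    """Start the stand-in server on a free local port."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address[:2]
    return server, host, port


def fetch(host, port, path="/"):
    """Send one GET request, return (status code or None, seconds taken)."""
    start = time.perf_counter()
    try:
        conn = http.client.HTTPConnection(host, port, timeout=5)
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()
        status = response.status
        conn.close()
    except (OSError, http.client.HTTPException):
        status = None  # connection errors and timeouts count as failures
    return status, time.perf_counter() - start


def load_test(host, port, users, requests_per_user):
    """Simulate `users` concurrent clients and summarise the results."""
    total = users * requests_per_user
    with ThreadPoolExecutor(max_workers=users) as pool:
        results = list(pool.map(lambda _: fetch(host, port), range(total)))
    failures = sum(1 for status, _ in results if status != 200)
    slowest = max(seconds for _, seconds in results)
    return {"requests": total, "failures": failures, "slowest_s": slowest}
```

<p>You'd call <code>load_test(host, port, users=50, requests_per_user=20)</code> against a staging server and watch the failure count and slowest response as you ramp up the number of users. A real tool like Locust adds realistic user behaviour, ramp-up schedules and reporting on top of this.</p>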
<p>What if the traffic to your site is very "bursty" though, with large transient peaks, or you're afraid that you'll get the dreaded "hug of death"?
In that case I recommend looking into "<a href="https://en.wikipedia.org/wiki/Autoscaling">autoscaling</a>" your servers, based on a metric like CPU usage.</p>
<p>If you're interested, you can read more on Gunicorn <a href="https://docs.gunicorn.org/en/latest/design.html#how-many-workers">worker selection</a> and how to configure Gunicorn to <a href="https://medium.com/building-the-system/gunicorn-3-means-of-concurrency-efbb547674b7">use more workers/threads</a>. There's also <a href="https://medium.com/@bfirsh/squeezing-every-drop-of-performance-out-of-a-django-app-on-heroku-4b5b1e5a3d44">this interesting case study</a> on optimising Gunicorn for <a href="https://www.arxiv-vanity.com/">arxiv-vanity.com</a>.</p>
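<p>As a rough starting point, a Gunicorn config file that touches the settings discussed in this section might look something like the sketch below. The specific values are assumptions for illustration, not recommendations for your app. The worker count follows the "(2 x cores) + 1" rule of thumb from the Gunicorn docs, and setting <code>threads</code> above 1 makes Gunicorn use its threaded (<code>gthread</code>) worker, which uses regular OS threads rather than green threads.</p>

```python
# gunicorn.conf.py - illustrative values only, tune them for your own app
import multiprocessing

# Address that Gunicorn listens on (NGINX would proxy requests to this)
bind = "127.0.0.1:8000"

# Number of worker processes: rule of thumb of (2 x CPU cores) + 1
workers = multiprocessing.cpu_count() * 2 + 1

# Threads per worker process
threads = 2

# Maximum number of pending connections waiting in the queue (Gunicorn's default)
backlog = 2048

# Seconds before an unresponsive worker is killed and restarted (Gunicorn's default)
timeout = 30

# Load the app before forking workers, which can reduce RAM usage
preload_app = True
```

<p>You would then start the server with something like <code>gunicorn -c gunicorn.conf.py myproject.wsgi</code>, where <code>myproject</code> is a placeholder for your own Django project.</p>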
<h2>When is Django's concurrency not enough?</h2>
<p>You will have hit the wall when you run out of money, or you can't move your app to a bigger server, or distribute it across more servers.
If you've twiddled all the available settings and still can't get your app to handle all the incoming requests without sending back errors or
burning through giant piles of cash, then maybe Django isn't the right backend framework for your application.</p>
<h2>The other kind of "performance"</h2>
<p>There's one more aspect of performance to consider: your performance as a developer. Call it your <a href="https://en.wikipedia.org/wiki/Takt_time">takt time</a>, if you like metrics. Your ability to quickly and easily fix bugs and ship new features is valuable to both you and your users.
Improvements to the speed or throughput of your web app that also make your code harder to work with may not be worth it.
Cost savings on infrastructure might be a waste if the change makes you less productive and costs you your time.</p>
<p>Choosing languages, frameworks and optimisations is an engineering decision, and in all engineering decisions there are competing tradeoffs to be considered, at least at the <a href="https://en.wikipedia.org/wiki/Pareto_efficiency">Pareto frontier</a>.</p>
<p>If raw performance was all we cared about, then we'd just write all our web apps in assembly.</p>
<p><img alt="web development in assembly" src="https://mattsegal.dev/img/assembly.webp"></p>
<h2>Next steps</h2>
<p>If you liked reading about running Django in production, then you might also enjoy another post I wrote, which gives you a tour of some common <a href="https://mattsegal.dev/django-prod-architectures.html">Django production architectures</a>. If you've written a Django app and you're looking to deploy it to production, then
you might enjoy my guide on <a href="https://mattsegal.dev/simple-django-deployment.html">Django deployment</a>.</p>There's no one right way to test your code2020-07-11T12:00:00+10:002020-07-11T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-07-11:/alternate-test-styles.html<p>Today I read a Reddit thread where a beginner was stumbling over themself, apologizing for writing tests the "wrong way":</p>
<blockquote>
<p>I'm now writing some unit tests ... I know that the correct way would be to write tests first and then the code, but unfortunately it had to be done this way.</p>
</blockquote>
<p>This is depressing... what causes newbies to feel the need to <em>ask for forgiveness</em> when writing tests? You can tell the poster has either previously copped some snark or has seen someone else lectured online for not doing things the "correct way".</p>
<p>I feel that people can be very prescriptive about how you should test your code, which is puzzling to me. There are so many different use-cases for automated tests that there cannot be one right way to do it. When you're reading blogs and forums you get the impression that you must write "unit tests" (the right way!) and that you need to do <a href="https://en.wikipedia.org/wiki/Test-driven_development">test driven development</a>, or else you're some kind of idiot slacker.</p>
<p>In this post I am going to focus on the quiet dominance of "unit tests" as the default way to test your code, and suggest some other testing styles that you can use.</p>
<h2>You should write "unit tests"</h2>
<p>People often say that you should write <strong>unit tests</strong> for your code. In brief, these tests check that some chunk of code returns a specific output for a given input. For example:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># The function to be tested</span>
<span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="n">a</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">b</span><span class="p">:</span> <span class="nb">int</span><span class="p">):</span>
    <span class="sd">"""Returns a added with b"""</span>
    <span class="k">return</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span>

<span class="c1"># Some tests for `add`</span>
<span class="k">def</span> <span class="nf">test_add__with_positive_numbers</span><span class="p">():</span>
    <span class="k">assert</span> <span class="n">add</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span> <span class="o">==</span> <span class="mi">3</span>

<span class="k">def</span> <span class="nf">test_add__with_zero</span><span class="p">():</span>
    <span class="k">assert</span> <span class="n">add</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span> <span class="o">==</span> <span class="mi">1</span>

<span class="c1"># etc. etc. etc</span>
</code></pre></div>
<p>This style of testing is great under the right circumstances, but these are not the only kind of test that you can, or should, write. Unfortunately the name "unit test" is used informally to refer to all automated testing of code. This misnomer leads beginners to believe that unit tests are the best, and maybe only, way to test.</p>
<p>Let's start with what unit tests are good for. They favour a "bottom-up" style of coding. They're the most effective when you have lots of little chunks of code that you want to write, test independently, and then assemble into a bigger program.</p>
<p>This is a perfect fit when you're writing code to deterministically transform data from one form into another, like parts of an <a href="https://en.wikipedia.org/wiki/Extract,_transform,_load">ETL pipeline</a> or a compiler. These tests work best when you're writing <a href="https://en.wikipedia.org/wiki/Pure_function">pure functions</a>, or code with limited <a href="https://en.wikipedia.org/wiki/Side_effect_(computer_science)">side effects</a>.</p>
<h2>When unit tests don't make sense</h2>
<p>The main problem with unit tests is that you can't always break your code up into pretty little pure functions.</p>
<p>When you start working on an existing legacy codebase there's no guarantee that the code is well-structured enough to allow for unit tests. Most commercial code that you'll encounter is legacy code, and a lot of legacy code is untested. I've encountered a fair few 2000+ line classes where reasoning about the effect of any one function is basically impossible because of all the shared state. You can't test a function if you don't know what it's supposed to do. These codebases cannot be rigorously unit tested straight away and need to be <a href="https://understandlegacycode.com/">gently massaged into a better shape over time</a>, which is a whole other can of worms.</p>
<p>Another, very common, case where unit tests don't make much sense is when a lot of the heavy lifting is being done by a framework. This happens to me all the time when I'm writing web apps with the <a href="https://www.djangoproject.com/">Django</a> framework. In Django REST Framework, we use a "serializer" class to validate Python objects and translate them into a JSON string. For example:</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">models</span>
<span class="kn">from</span> <span class="nn">rest_framework</span> <span class="kn">import</span> <span class="n">serializers</span>
<span class="kn">from</span> <span class="nn">rest_framework.renderers</span> <span class="kn">import</span> <span class="n">JSONRenderer</span>

<span class="c1"># Create a data model that represents a person</span>
<span class="k">class</span> <span class="nc">Person</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">name</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">64</span><span class="p">)</span>
    <span class="n">email</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">EmailField</span><span class="p">()</span>

<span class="c1"># Create a serializer that can map a Person to a JSON string</span>
<span class="k">class</span> <span class="nc">PersonSerializer</span><span class="p">(</span><span class="n">serializers</span><span class="o">.</span><span class="n">ModelSerializer</span><span class="p">):</span>
    <span class="k">class</span> <span class="nc">Meta</span><span class="p">:</span>
        <span class="n">model</span> <span class="o">=</span> <span class="n">Person</span>
        <span class="n">fields</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"name"</span><span class="p">,</span> <span class="s2">"email"</span><span class="p">]</span>

<span class="c1"># Example usage.</span>
<span class="n">p</span> <span class="o">=</span> <span class="n">Person</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s2">"Matt"</span><span class="p">,</span> <span class="n">email</span><span class="o">=</span><span class="s2">"mattdsegal@gmail.com"</span><span class="p">)</span>
<span class="n">ps</span> <span class="o">=</span> <span class="n">PersonSerializer</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
<span class="n">JSONRenderer</span><span class="p">()</span><span class="o">.</span><span class="n">render</span><span class="p">(</span><span class="n">ps</span><span class="o">.</span><span class="n">data</span><span class="p">)</span>
<span class="c1"># b'{"name":"Matt","email":"mattdsegal@gmail.com"}'</span>
</code></pre></div>
<p>In this case, there's barely anything for you to actually test.
Don't get me wrong, you <em>could</em> write unit tests for this code, but anything you write is just a re-hash of the definitions of the <code>Person</code> and <code>PersonSerializer</code>. All the interesting stuff is handled by the framework. Any "unit test" of this code is really just a test of the 3rd party code, which <a href="https://github.com/encode/django-rest-framework/tree/master/tests">already has heaps of tests</a>. In this case, writing unit tests is just adding extra boilerplate to your codebase, when the whole point of using a framework was to save you time.</p>
<p>So if "unit tests" don't always make sense, what else can you do? There are other styles of testing that you can use. I'll highlight my two favourites: <strong>smoke tests</strong> and <strong>integration tests</strong>.</p>
<h2>Quick 'n dirty smoke tests</h2>
<p>Some of the value of an automated test is checking that the code runs at all. A smoke test runs some code and checks that it doesn't crash. Smoke tests are really, really easy to write and maintain and they catch 50% of bugs (made up number). These kinds of tests are great for when:</p>
<ul>
<li>your app has many potential code-paths</li>
<li>you are using interpreted languages like JavaScript or Python which often crash at runtime</li>
<li>you don't know or can't predict what the output of your code will be</li>
</ul>
<p>Here's a smoke test for a neural network. All it does is construct the network and feed it some random garbage data, making sure that it doesn't crash and that the outputs are the correct shape:</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_processes_noise</span><span class="p">():</span>
    <span class="n">input_shape</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">80</span><span class="p">,</span> <span class="mi">256</span><span class="p">)</span>
    <span class="n">inputs</span> <span class="o">=</span> <span class="n">get_random_input</span><span class="p">(</span><span class="n">input_shape</span><span class="p">)</span>
    <span class="n">outputs</span> <span class="o">=</span> <span class="n">MyNeuralNet</span><span class="p">(</span><span class="n">inputs</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">outputs</span><span class="o">.</span><span class="n">shape</span> <span class="o">==</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">80</span><span class="p">,</span> <span class="mi">256</span><span class="p">)</span>
</code></pre></div>
<p>This is valuable because runtime errors due to stupid mistakes are very common when building a neural net. A mismatch in array dimensions somewhere in the network is a common stumbling block. Typically it might take minutes of runtime before your code crashes due to all the data loading and processing that needs to happen before the broken code is executed. With smoke tests like this, you can check for stupid errors in seconds instead of minutes.</p>
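<p>If you want something you can run without a deep learning library, here's a toy version of the same idea in plain Python. <code>TinyNet</code>, <code>matmul</code> and this two-argument <code>get_random_input</code> are made-up stand-ins, not the real network from the test above. The smoke test just pushes noise through the "network" and checks that nothing crashes and that the output has the expected shape.</p>

```python
import random


def get_random_input(rows, cols):
    """Build a rows x cols matrix (list of lists) of random noise."""
    return [[random.random() for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    """Multiply two matrices, raising an error if their dimensions don't line up."""
    if len(a[0]) != len(b):
        raise ValueError(f"Shape mismatch: {len(a[0])} columns vs {len(b)} rows")
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class TinyNet:
    """A toy two-layer network with random weights."""

    def __init__(self, n_in, n_hidden, n_out):
        self.w1 = get_random_input(n_in, n_hidden)
        self.w2 = get_random_input(n_hidden, n_out)

    def __call__(self, inputs):
        # Linear layer followed by a ReLU, then a second linear layer
        hidden = [[max(0.0, v) for v in row] for row in matmul(inputs, self.w1)]
        return matmul(hidden, self.w2)


def test_processes_noise():
    """Smoke test: feed in garbage and check the output shape."""
    net = TinyNet(n_in=8, n_hidden=4, n_out=2)
    outputs = net(get_random_input(3, 8))
    assert len(outputs) == 3
    assert all(len(row) == 2 for row in outputs)
```

<p>If someone later changes a layer size and forgets to update its neighbour, this test fails immediately with a shape mismatch error.</p>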
<p>In a more web-development focused example, here's a Django smoke test that loops over a bunch of urls and checks that they all respond to GET requests with happy "200" HTTP status codes, without validating any of the data that is returned:</p>
<div class="highlight"><pre><span></span><code><span class="nd">@pytest</span><span class="o">.</span><span class="n">mark</span><span class="o">.</span><span class="n">django_db</span>
<span class="k">def</span> <span class="nf">test_urls_work</span><span class="p">(</span><span class="n">client</span><span class="p">):</span>
    <span class="sd">"""Ensure all urls return 200"""</span>
    <span class="k">for</span> <span class="n">url</span> <span class="ow">in</span> <span class="n">SMOKE_TEST_URLS</span><span class="p">:</span>
        <span class="n">response</span> <span class="o">=</span> <span class="n">client</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">response</span><span class="o">.</span><span class="n">status_code</span> <span class="o">==</span> <span class="mi">200</span>
</code></pre></div>
<p>Maybe you don't have time to write detailed tests for all your web app's endpoints, but a quick smoke test like this will at least exercise your code and check for stupid errors.</p>
<p>This crude style of testing is both fine and good. Don't let people shame you for writing smoke tests. If you do nothing but write smoke tests for your app, you'll still be getting a sizeable benefit from your test suite.</p>
<h2>High level integration tests</h2>
<p>To me, integration tests are when you test a whole feature, end-to-end. You are testing a system of components (functions, classes, modules, libraries) and the <em>integrations</em> between them. I think this style of testing can provide more bang-for-buck than a set of unit tests, because the integration tests cover a lot of different components with less code, and they check for behaviours that you actually care about. This is a more "top down" approach to testing, compared to the "bottom up" style of unit tests.</p>
<p>Calling back to my earlier Django example, an integration test wouldn't test any independent behaviour of the <code>Person</code> or <code>PersonSerializer</code> classes. Instead, we would test them by exercising a code path where they are used in combination. For example, we would want to make sure that a GET request asking for a specific Person by their id returns the correct data. Here's the API code to be tested:</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">models</span>
<span class="kn">from</span> <span class="nn">django.urls</span> <span class="kn">import</span> <span class="n">include</span><span class="p">,</span> <span class="n">path</span>
<span class="kn">from</span> <span class="nn">rest_framework</span> <span class="kn">import</span> <span class="n">mixins</span><span class="p">,</span> <span class="n">routers</span><span class="p">,</span> <span class="n">serializers</span><span class="p">,</span> <span class="n">viewsets</span>

<span class="c1"># Data model</span>
<span class="k">class</span> <span class="nc">Person</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">name</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">64</span><span class="p">)</span>
    <span class="n">email</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">EmailField</span><span class="p">()</span>

<span class="c1"># Maps data model to JSON string</span>
<span class="k">class</span> <span class="nc">PersonSerializer</span><span class="p">(</span><span class="n">serializers</span><span class="o">.</span><span class="n">ModelSerializer</span><span class="p">):</span>
    <span class="k">class</span> <span class="nc">Meta</span><span class="p">:</span>
        <span class="n">model</span> <span class="o">=</span> <span class="n">Person</span>
        <span class="n">fields</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"name"</span><span class="p">,</span> <span class="s2">"email"</span><span class="p">]</span>

<span class="c1"># API endpoint for retrieving a single Person</span>
<span class="k">class</span> <span class="nc">PersonViewSet</span><span class="p">(</span><span class="n">mixins</span><span class="o">.</span><span class="n">RetrieveModelMixin</span><span class="p">,</span> <span class="n">viewsets</span><span class="o">.</span><span class="n">GenericViewSet</span><span class="p">):</span>
    <span class="n">serializer_class</span> <span class="o">=</span> <span class="n">PersonSerializer</span>
    <span class="n">queryset</span> <span class="o">=</span> <span class="n">Person</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">all</span><span class="p">()</span>

<span class="c1"># Attach API endpoint to a URL path</span>
<span class="n">router</span> <span class="o">=</span> <span class="n">routers</span><span class="o">.</span><span class="n">SimpleRouter</span><span class="p">()</span>
<span class="n">router</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="s2">"person"</span><span class="p">,</span> <span class="n">PersonViewSet</span><span class="p">)</span>
<span class="n">urlpatterns</span> <span class="o">=</span> <span class="p">[</span><span class="n">path</span><span class="p">(</span><span class="s2">"api/"</span><span class="p">,</span> <span class="n">include</span><span class="p">(</span><span class="n">router</span><span class="o">.</span><span class="n">urls</span><span class="p">))]</span>
</code></pre></div>
<p>And here's a short integration test for the code above. It uses Django's <a href="https://docs.djangoproject.com/en/3.0/topics/testing/tools/#the-test-client">test client</a> to simulate an HTTP GET request to our view and validate the data that is returned:</p>
<div class="highlight"><pre><span></span><code><span class="nd">@pytest</span><span class="o">.</span><span class="n">mark</span><span class="o">.</span><span class="n">django_db</span>
<span class="k">def</span> <span class="nf">test_person_get</span><span class="p">(</span><span class="n">client</span><span class="p">):</span>
    <span class="sd">"""Ensure a user can retrieve a person's data by id"""</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">Person</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s2">"Matt"</span><span class="p">,</span> <span class="n">email</span><span class="o">=</span><span class="s2">"mattdsegal@gmail.com"</span><span class="p">)</span>
    <span class="n">url</span> <span class="o">=</span> <span class="n">reverse</span><span class="p">(</span><span class="s2">"person-detail"</span><span class="p">,</span> <span class="n">args</span><span class="o">=</span><span class="p">[</span><span class="n">p</span><span class="o">.</span><span class="n">id</span><span class="p">])</span>
    <span class="n">response</span> <span class="o">=</span> <span class="n">client</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">response</span><span class="o">.</span><span class="n">status_code</span> <span class="o">==</span> <span class="mi">200</span>
    <span class="k">assert</span> <span class="n">response</span><span class="o">.</span><span class="n">data</span> <span class="o">==</span> <span class="p">{</span>
        <span class="s2">"name"</span><span class="p">:</span> <span class="s2">"Matt"</span><span class="p">,</span>
        <span class="s2">"email"</span><span class="p">:</span> <span class="s2">"mattdsegal@gmail.com"</span><span class="p">,</span>
    <span class="p">}</span>
</code></pre></div>
<p>This integration test is exercising the code of the <code>Person</code> data model, the <code>PersonSerializer</code> data mapping and the <code>PersonViewSet</code> API endpoint all in one go.</p>
<p>A valid criticism of this style of testing is that if the integration test fails, it's not always clear <em>why</em> it failed. This is typically a non-issue, since you can get to the bottom of a failure by reading the error message and spending a few minutes poking the code with a debugger.</p>
<h2>Next steps</h2>
<p>Testing code is an art that requires you to apply judgement to your specific situation. There's a bunch of styles and methodologies for testing your code and your choice depends on your codebase, your app's risk profile and your time constraints. I think you can cultivate this judgement by trying out different techniques. If you haven't already, try a new style of testing on your codebase and see if you like it.</p>
<p>I've enjoyed poking around the <a href="https://understandlegacycode.com/changing-untested-code">Understand Legacy Code</a> blog, which suggests quite a few novel testing methods that I've never heard of. I've got my eye on the "<a href="https://understandlegacycode.com/approval-tests/">approval test</a>" for a codebase I'm currently working on.</p>
<p>If you're interested in reading more about automated testing with Python, then you might enjoy this post I wrote on how to <a href="https://mattsegal.dev/pytest-on-github-actions.html">automatically run your tests on every commit with GitHub Actions</a>.</p>
<h1>How to find what you want in the Django documentation</h1>
<p>Matthew Segal, 2020-06-26</p>
<p>Many beginner programmers find the <a href="https://docs.djangoproject.com/en/3.0/">Django documentation</a> overwhelming.</p>
<p>Let's say you want to learn how to perform a login for a user. Seems like it would be pretty simple: logins are a core feature of Django. If you <a href="https://www.google.com/search?q=django+login">google for "django login"</a> or <a href="https://docs.djangoproject.com/en/3.0/search/?q=login">search the docs</a> you see a few options, with "Using the Django authentication system" as the most promising result. You click the link, happily anticipating that your login problems will soon be over, and you get smacked in the face with <a href="https://docs.djangoproject.com/en/3.0/topics/auth/default/">thirty-nine full browser pages of text</a>. This is way too much information!</p>
<p>Alternatively, you find your way to the reference page on <a href="https://docs.djangoproject.com/en/3.0/ref/contrib/auth/">django.contrib.auth</a>, because that's where all the auth stuff is, right? If you browse this page you will see an endless enumeration of all the different authentication models and fields and functions, but no explanation of how they're supposed to fit together.</p>
<p>At this stage you may want to close your browser tab in despair and reconsider your decision to learn Django. It turns out the info that you wanted was somewhere in that really long page <a href="https://docs.djangoproject.com/en/3.0/topics/auth/default/#how-to-log-a-user-in">here</a> and <a href="https://docs.djangoproject.com/en/3.0/topics/auth/default/#django.contrib.auth.authenticate">here</a>. Why was it so hard to find? Why is this documentation so fragmented?</p>
<p>God forbid that you should complain to anyone about this struggle. Experienced devs will say things like "you are looking in the wrong place" and "you need more experience before you try Django". This response raises the question though: how does anyone know where the "right place" is? The table of contents in the Django documentation <a href="https://docs.djangoproject.com/en/3.0/contents/">is unreadably long</a>. Meanwhile, you read other people raving about how great the Django docs are: what are they talking about? You may wonder: am I missing something?</p>
<p>Wouldn't it be great if you could go from having a question to finding the answer in a few minutes or less? A quick Google and a scan, and boom: you know how to solve your Django problem. This is possible. As a professional Django dev I do this daily. I rarely remember how to do anything by heart and I am constantly scanning the docs to figure out how to solve problems, and you can too.</p>
<p>In this post I will outline how to find what you want in the Django documentation, so that you spend less time frustrated and stuck, and more time writing your web app. I also include a list of key references that I find useful.</p>
<p>Experienced devs can be dismissive when you complain about documentation, but they're right about one thing: knowing how to read docs is a really important skill for a programmer, and being good at this will save you lots of time.</p>
<h2>Find the right section</h2>
<p>Library documentation is almost always written with distinct sections. If you do not understand what these sections are for, then you will be totally lost.
If you have time, watch <a href="https://www.youtube.com/watch?v=t4vKPhjcMZg">Daniele Procida's excellent talk</a> on how documentation should be structured. In the talk he describes four different sections of documentation:</p>
<ul>
<li><strong>Tutorials</strong>: lessons that show you how to complete a small project (<a href="https://docs.djangoproject.com/en/3.0/intro/install/">example</a>)</li>
<li><strong>How-to guides</strong>: guides with steps on how to solve a common problem (<a href="https://docs.djangoproject.com/en/3.0/howto/custom-management-commands/">example</a>)</li>
<li><strong>API References</strong>: detailed technical descriptions of all the bits of code (<a href="https://docs.djangoproject.com/en/3.0/ref/models/querysets/">example</a>)</li>
<li><strong>Explanations</strong>: high level discussion of design decisions (<a href="https://docs.djangoproject.com/en/3.0/topics/templates/#module-django.template">example</a>)</li>
</ul>
<p>In addition to these, there's also commonly a <strong>Quickstart</strong> (<a href="http://whitenoise.evans.io/en/stable/#quickstart-for-django-apps">example</a>), which covers the absolute minimum steps you need to do to get started with the library.</p>
<p>The Django Rest Framework docs use a structure similar to this:</p>
<p><img alt="django rest framework sections" src="https://mattsegal.dev/img/drf-sections.png"></p>
<p>The ReactJS docs use a structure similar to this:</p>
<p><img alt="react sections" src="https://mattsegal.dev/img/react-sections.png"></p>
<p>The Django docs use a <a href="https://docs.djangoproject.com/en/3.0/#how-the-documentation-is-organized">structure similar to this</a>:</p>
<p><img alt="django sections" src="https://mattsegal.dev/img/django-sections.png"></p>
<p>Hopefully you see the pattern here: all these docs have been split up into distinct sections. Learn this structure once and you can quickly navigate most documentation.
Now that you understand that library documentation is usually structured in a particular way, I will explain how to navigate that structure.</p>
<h2>Do the tutorial first</h2>
<p>This might seem obvious, but I have to say it. If there is a tutorial in the docs and you are feeling lost, then do the tutorial. It is a place where the authors may have decided to introduce concepts that are key to understanding everything else. If you're feeling like a badass, then don't "do" the tutorial, but at the very least skim read it.</p>
<h2>Find an example, guide or overview</h2>
<p>Avoid the <a href="https://docs.djangoproject.com/en/3.0/ref/">API reference</a> section, unless you already know <em>exactly</em> what you're looking for. You will recognise that you are in an API reference section because the title will have "reference" in it, and the content will be very detailed with few high-level explanations. For example, <a href="https://docs.djangoproject.com/en/3.0/ref/contrib/auth/">django.contrib.auth</a> is a reference section - it is not a good place to learn how "Django login" works.</p>
<p>You need to understand how the bits of code fit together before looking at an API reference. This can be hard since most documentation, even the really good stuff, is incomplete. Still, the best thing to try is to look for overviews and explanations of framework features.</p>
<p>Find and scan the list of <a href="https://docs.djangoproject.com/en/3.0/howto/">how-to guides</a>, to see if they solve your exact problem. This will save you a lot of time if the guide directly solves your problem. Using our login example, there is no "how to log a user in" guide, which is bad luck.</p>
<p>If there is no guide, then quickly scan the <a href="https://docs.djangoproject.com/en/3.0/topics/">topic list</a> and try and find the topic that you need. If you do not already understand the topic well, then read the overview. <strong>Google terms that you do not understand</strong>, like "authentication" and "authorization" (they're different, specific things). In our login case, "<a href="https://docs.djangoproject.com/en/3.0/topics/auth/">User authentication in Django</a>" is the topic that we want from the list.</p>
<p>Once you think you sort-of understand how everything should fit together, then you can move to the detailed API reference, so that you can ensure that you're using the code correctly.</p>
<h2>Find and remember key references</h2>
<p>Once you understand what you want to do, you will need to use the API reference pages to figure out exactly what code you should write. It's good to remember key pages that contain the most useful references. Here are my personal favourites that I use all the time:</p>
<ul>
<li><a href="https://docs.djangoproject.com/en/3.0/ref/settings/"><strong>Settings reference</strong></a>: A list of all the settings and what they do</li>
<li><a href="https://docs.djangoproject.com/en/3.0/ref/templates/builtins/"><strong>Built-in template tags</strong></a>: All the template tags with examples</li>
<li><a href="https://docs.djangoproject.com/en/3.0/ref/models/querysets/"><strong>Queryset API reference</strong></a>: All the different tools for using the ORM to access the database</li>
<li><a href="https://docs.djangoproject.com/en/3.0/ref/models/fields/"><strong>Model field reference</strong></a>: All the different model fields</li>
<li><a href="https://ccbv.co.uk/"><strong>Classy Class Based Views</strong></a>: Detailed descriptions for each of Django's class-based views</li>
</ul>
<p>I don't have any of these pages bookmarked: I just google for them and then search using <code>ctrl-f</code> to find what I need in seconds.</p>
<p>When using Django REST Framework I often find myself referring to:</p>
<ul>
<li><a href="http://www.cdrf.co/"><strong>Classy DRF</strong></a>: Like Classy Class Based Views but for DRF</li>
<li><a href="https://www.django-rest-framework.org/api-guide/serializers/"><strong>Serializer reference</strong></a>: To make serializers work</li>
<li><a href="https://www.django-rest-framework.org/api-guide/fields/"><strong>Serializer field reference</strong></a>: All the different serializer fields</li>
<li><a href="https://www.django-rest-framework.org/api-guide/relations/#nested-relationships"><strong>Nested relationships</strong></a>: How to put serializers <a href="https://mattsegal.dev/img/xzibit.png">inside of other serializers</a></li>
</ul>
<h2>Search instead of reading</h2>
<p>Most documentation is not meant to be read linearly, from start to end, like a novel: most pages are too long to read. Instead, you should strategically search for what you want. Most documentation involves big lists of things, because there's so much stuff that the authors need to explain in a lot of detail. You cannot rely on brute-force reading all the content to find the info you need.</p>
<p>You can use your browser's built-in text search feature (<code>ctrl-f</code>) to quickly find the text that you need. This will save you a lot of scrolling and squinting at your screen. I use this technique all the time when browsing the Django docs. Here's a video of me finding out how to log in with Django using <code>ctrl-f</code>:</p>
<div class="loom-embed"><iframe src="https://www.loom.com/embed/cc4b030513b0406c91a1eadcd08514a2" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>
<p>Here's me struggling to get past the first list by trying to read all the words with my pathetic human eyes. I genuinely did miss the "auth" section several times when trying to read that list manually while writing this post:</p>
<div class="loom-embed"><iframe src="https://www.loom.com/embed/1be42c1709334817ab3cb055ad8acf69" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>
<p>Using search is how you navigate the enormous <a href="https://docs.djangoproject.com/en/3.0/contents/">table of contents</a> or the <a href="https://docs.djangoproject.com/en/3.0/topics/auth/default/">39 browser pages of authentication overview</a>. You're not supposed to read all that stuff, you're supposed to strategically search it. In our login example, good search terms would be "auth", "login", "log in" and "user".</p>
<p>In addition, most really long pages will have a sidebar summarising all the content. If you're going to read something, read that.</p>
<p><img alt="django sections" src="https://mattsegal.dev/img/docs-sidebar.png"></p>
<h2>Read the source code</h2>
<p>This is kind of the documentation equivalent of "go fuck yourself", but when you need an answer and the documentation doesn't have it, then the code is the authoritative source on how the library works. There are many library details that would be too laborious to document in full, and at some point the expectation is that if you <em>really need to know</em> how something works, then you should try reading the code. The <a href="https://github.com/django/django">Django source code</a> is pretty well written, and the more time you spend immersed in it, the easier it will be to navigate. This isn't really advice for beginners, but if you're feeling brave, then give it a try.</p>
<h2>Summary</h2>
<p>The Django docs, in my opinion, really are quite good, but like most code docs, they're hard for beginners to navigate. I hope that these tips will make learning Django a more enjoyable experience for you. To summarise my tips:</p>
<ul>
<li>Identify the different sections of the documentation</li>
<li>Do the tutorial first if you're not feeling confident, or at least skim read it</li>
<li>Avoid the API reference early on</li>
<li>Try to find a how-to guide for your problem</li>
<li>Try to find a topic overview and explanation for your topic</li>
<li>Remember key references for quick lookup later</li>
<li>Search the docs, don't read them like a book</li>
<li>Read the source code if you're desperate</li>
</ul>
<p>As good as they are, the Django docs do not, and should not, tell you everything there is to know about how to use Django. At some point, you will need to turn to Django community blogs like <a href="https://simpleisbetterthancomplex.com/">Simple is Better than Complex</a>, YouTube videos, courses and books. When you need to deploy your Django app, you might enjoy my guide on <a href="https://mattsegal.dev/simple-django-deployment.html">Django deployment</a> and my overview of <a href="https://mattsegal.dev/django-prod-architectures.html">Django server setups</a>.</p>
<h1>How to pull production data into your local Postgres database</h1>
<p>Matthew Segal, 2020-06-21</p>
<p>Sometimes you want to write a feature for your Django app that requires a lot of structured data that already exists in production. This happened to me recently: I needed to create a reporting tool for internal business users. The problem was that I didn't have much data in my local database. How can I see what my reports will look like if I don't have any data?</p>
<p>It's possible to generate a bunch of fake data using a management command. I've written earlier about <a href="https://mattsegal.dev/django-factoryboy-dummy-data.html">how to do this with FactoryBoy</a>. This approach is great for filling web pages with dummy content, but it's tedious to do if your data is highly structured and follows a bunch of implicit rules. In the case of my reporting tool, the data I wanted involved hundreds of form submissions, and each submission has dozens of answers with many different data types. Writing a script to generate data like this would have taken ages! I've also seen situations like this when working with billing systems and online stores with many product categories.</p>
<p>Wouldn't it be nice if we could just get a copy of our production data and use that for local development? You could just pull the latest data from prod and work on your feature with the confidence that you have plenty of data that is structured correctly.</p>
<p>In this post I'll show you a script which you can use to fetch a Postgres database backup from cloud storage and use it to populate your local Postgres database with prod data. This post builds on three previous posts of mine, which you might want to read if you can't follow the scripting in this post:</p>
<ul>
<li><a href="https://mattsegal.dev/reset-django-local-database.html">How to automatically reset your local Django database</a></li>
<li><a href="https://mattsegal.dev/postgres-backup-and-restore.html">How to backup and restore a Postgres database</a></li>
<li><a href="https://mattsegal.dev/postgres-backup-automate.html">How to automate your Postgres database backups</a></li>
</ul>
<p>I'm going to do all of my scripting in bash, but it's also possible to write similar scripts in PowerShell, with only a few tweaks to the syntax.</p>
<h3>Starting script</h3>
<p>Let's start with the "database reset" bash script from my <a href="https://mattsegal.dev/reset-django-local-database.html">previous post</a>. This script resets your local database, runs migrations and creates a local superuser for you to use. We're going to extend this script with an additional step to download and restore from our latest database backup.</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/bin/bash</span>
<span class="c1"># Resets the local Django database, adding an admin login and migrations</span>
<span class="nb">set</span> -e
<span class="nb">echo</span> -e <span class="s2">"\n>>> Resetting the database"</span>
./manage.py reset_db --close-sessions --noinput
<span class="c1"># =========================================</span>
<span class="c1"># DOWNLOAD AND RESTORE DATABASE BACKUP HERE</span>
<span class="c1"># =========================================</span>
<span class="nb">echo</span> -e <span class="s2">"\n>>> Running migrations"</span>
./manage.py migrate
<span class="nb">echo</span> -e <span class="s2">"\n>>> Creating new superuser 'admin'"</span>
./manage.py createsuperuser <span class="se">\</span>
--username admin <span class="se">\</span>
--email admin@example.com <span class="se">\</span>
--noinput
<span class="nb">echo</span> -e <span class="s2">"\n>>> Setting superuser 'admin' password to 12345"</span>
./manage.py shell_plus --quiet-load -c <span class="s2">"</span>
<span class="s2">u=User.objects.get(username='admin')</span>
<span class="s2">u.set_password('12345')</span>
<span class="s2">u.save()</span>
<span class="s2">"</span>
<span class="nb">echo</span> -e <span class="s2">"\n>>> Database restore finished."</span>
</code></pre></div>
<h3>Fetching the latest database backup</h3>
<p>Now that we have a base script to work with, we need to fetch the latest database backup. I'm going to assume that you've followed my guide on <a href="https://mattsegal.dev/postgres-backup-automate.html">automating your Postgres database backups</a>.</p>
<p>Let's say your backups are stored in an AWS S3 bucket called <code>mydatabase-backups</code>, and each backup has a timestamp in its filename, like <code>postgres_mydatabase_1592731247.pgdump</code>. Using these two facts we can use a little bit of bash scripting to find the name of the latest backup from our S3 bucket:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># Find the latest backup file</span>
<span class="nv">S3_BUCKET</span><span class="o">=</span>s3://mydatabase-backups
<span class="nv">LATEST_FILE</span><span class="o">=</span><span class="k">$(</span>aws s3 ls <span class="nv">$S3_BUCKET</span> <span class="p">|</span> awk <span class="s1">'{print $4}'</span> <span class="p">|</span> sort <span class="p">|</span> tail -n <span class="m">1</span><span class="k">)</span>
<span class="nb">echo</span> -e <span class="s2">"\nFound file </span><span class="nv">$LATEST_FILE</span><span class="s2"> in bucket </span><span class="nv">$S3_BUCKET</span><span class="s2">"</span>
</code></pre></div>
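<p>If the <code>awk | sort | tail</code> pipeline feels a bit magic, here's roughly the same selection logic as a Python sketch. It assumes each line of <code>aws s3 ls</code> output has four columns (date, time, size, filename); the function name <code>latest_backup</code> is made up for illustration and isn't part of the script in this post.</p>

```python
def latest_backup(listing: str) -> str:
    """Return the newest backup filename from the text printed by `aws s3 ls`."""
    names = []
    for line in listing.splitlines():
        parts = line.split()
        # Skip blank lines and "PRE some-prefix/" rows, which have fewer than 4 columns
        if len(parts) >= 4:
            names.append(parts[3])  # same column as awk's $4
    if not names:
        raise ValueError("no backup files found in listing")
    # The timestamps all have the same number of digits,
    # so the alphabetically largest name is also the most recent one
    return max(names)
```

<p>Like the bash version, this only works because the filenames share a prefix and embed a fixed-width timestamp, so sorting the names alphabetically is the same as sorting them by time.</p>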
<p>Once you know the name of the latest backup file, you can download it to the current directory with the <code>aws</code> CLI tool:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># Download the latest backup file</span>
aws s3 cp <span class="si">${</span><span class="nv">S3_BUCKET</span><span class="si">}</span>/<span class="si">${</span><span class="nv">LATEST_FILE</span><span class="si">}</span> .
</code></pre></div>
<p>The <code>.</code> in this case refers to the current directory.</p>
<h3>Restoring from the latest backup</h3>
<p>Now that you've downloaded the backup file, you can apply it to your local database with <code>pg_restore</code>. You may need to install a Postgres client on your local machine to get access to this tool. Assuming your local Postgres credentials aren't a secret, you can just hardcode them into the script:</p>
<div class="highlight"><pre><span></span><code>pg_restore <span class="se">\</span>
--clean <span class="se">\</span>
--dbname postgres <span class="se">\</span>
--host localhost <span class="se">\</span>
--port <span class="m">5432</span> <span class="se">\</span>
--username postgres <span class="se">\</span>
--no-owner <span class="se">\</span>
<span class="nv">$LATEST_FILE</span>
</code></pre></div>
<p>In this case we use <code>--clean</code> to remove any existing data and we use <code>--no-owner</code> to ignore any commands that set ownership of objects in the database.</p>
<h3>Look ma, no files!</h3>
<p>You don't have to save your backup file to disk before you use it to restore your local database: you can stream the data directly from <code>aws s3 cp</code> to <code>pg_restore</code> using pipes.</p>
<div class="highlight"><pre><span></span><code>aws s3 cp <span class="si">${</span><span class="nv">S3_BUCKET</span><span class="si">}</span>/<span class="si">${</span><span class="nv">LATEST_FILE</span><span class="si">}</span> - <span class="p">|</span> <span class="se">\</span>
pg_restore <span class="se">\</span>
--clean <span class="se">\</span>
--dbname postgres <span class="se">\</span>
--host localhost <span class="se">\</span>
--port <span class="m">5432</span> <span class="se">\</span>
--username postgres <span class="se">\</span>
--no-owner
</code></pre></div>
<p>The <code>-</code> in this case means "stream to stdout", which we use so that we can pipe the data.</p>
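<p>For the curious, here's a rough Python sketch of what a shell pipe does: one process's stdout is connected straight to the next process's stdin, so nothing is written to disk. The helper name <code>stream_between</code> is my own invention, and the demo uses two throwaway Python one-liners as stand-ins so that it runs without AWS or Postgres installed.</p>

```python
import subprocess
import sys


def stream_between(producer_cmd, consumer_cmd):
    """Run producer_cmd and feed its stdout directly into consumer_cmd's stdin."""
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        consumer_cmd, stdin=producer.stdout, stdout=subprocess.PIPE
    )
    # Close our copy of the pipe so the producer is notified if the consumer exits early
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    if producer.returncode != 0 or consumer.returncode != 0:
        raise RuntimeError("pipeline failed")
    return output


if __name__ == "__main__":
    # Stand-in commands: in the real script the producer would be the
    # `aws s3 cp ... -` command and the consumer would be `pg_restore ...`
    produce = [sys.executable, "-c", "print('backup bytes')"]
    consume = [
        sys.executable,
        "-c",
        "import sys; sys.stdout.write(sys.stdin.read().upper())",
    ]
    print(stream_between(produce, consume))
```

<p>In practice the bash one-liner is all you need. This is only meant to show that the data flows between the two programs through a pipe rather than via a file.</p>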
<h3>Final script</h3>
<p>Here's the whole thing:</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/bin/bash</span>
<span class="c1"># Resets the local Django database,</span>
<span class="c1"># restores from latest prod backup,</span>
<span class="c1"># and adds an admin login and migrations</span>
<span class="nb">set</span> -e
<span class="nb">echo</span> -e <span class="s2">"\n>>> Resetting the database"</span>
./manage.py reset_db --close-sessions --noinput
<span class="nb">echo</span> -e <span class="s2">"\nRestoring database from S3 backups"</span>
<span class="nv">S3_BUCKET</span><span class="o">=</span>s3://mydatabase-backups
<span class="nv">LATEST_FILE</span><span class="o">=</span><span class="k">$(</span>aws s3 ls <span class="nv">$S3_BUCKET</span> <span class="p">|</span> awk <span class="s1">'{print $4}'</span> <span class="p">|</span> sort <span class="p">|</span> tail -n <span class="m">1</span><span class="k">)</span>
aws s3 cp <span class="si">${</span><span class="nv">S3_BUCKET</span><span class="si">}</span>/<span class="si">${</span><span class="nv">LATEST_FILE</span><span class="si">}</span> - <span class="p">|</span> <span class="se">\</span>
pg_restore <span class="se">\</span>
--clean <span class="se">\</span>
--dbname postgres <span class="se">\</span>
--host localhost <span class="se">\</span>
--port <span class="m">5432</span> <span class="se">\</span>
--username postgres <span class="se">\</span>
--no-owner
<span class="nb">echo</span> -e <span class="s2">"\n>>> Running migrations"</span>
./manage.py migrate
<span class="nb">echo</span> -e <span class="s2">"\n>>> Creating new superuser 'admin'"</span>
./manage.py createsuperuser <span class="se">\</span>
--username admin <span class="se">\</span>
--email admin@example.com <span class="se">\</span>
--noinput
<span class="nb">echo</span> -e <span class="s2">"\n>>> Setting superuser 'admin' password to 12345"</span>
./manage.py shell_plus --quiet-load -c <span class="s2">"</span>
<span class="s2">u=User.objects.get(username='admin')</span>
<span class="s2">u.set_password('12345')</span>
<span class="s2">u.save()</span>
<span class="s2">"</span>
<span class="nb">echo</span> -e <span class="s2">"\n>>> Database restore finished."</span>
</code></pre></div>
<p>You should be able to run this over and over and over to get the latest database backup working on your local machine.</p>
<h3>Other considerations</h3>
<p>When talking about using production backups locally, there are two points that I think are important.</p>
<p>First, production data can contain sensitive user information including names, addresses, emails and even credit card details. You need to ensure that this data is only distributed to people who are authorised to access it, or alternatively the backups should be sanitized so the sensitive data is overwritten or removed.</p>
<p>Second, it's possible to use database backups to debug issues in production. I think it's a great method for squashing hard-to-reproduce bugs, but it shouldn't be your only way to solve production errors. Before you move onto this technique, you should first ensure you have <a href="https://mattsegal.dev/file-logging-django.html">application logging</a> and <a href="https://mattsegal.dev/sentry-for-django-error-monitoring.html">error monitoring</a> set up for your Django app, so that you don't lean on your backups as a crutch.</p>
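<p>On the first point, here's a minimal sketch of what "sanitizing" a row of data might look like. The field names and the <code>sanitize_row</code> helper are hypothetical: your tables will have different columns, and you'd need to apply something like this to every table that holds personal data.</p>

```python
import hashlib

# Hypothetical column names: swap in whatever your own tables use
SENSITIVE_FIELDS = ("name", "email", "address")


def sanitize_row(row: dict) -> dict:
    """Return a copy of row with sensitive values replaced by stable fake values."""
    clean = dict(row)
    for field in SENSITIVE_FIELDS:
        if clean.get(field) is None:
            continue  # field is missing or empty, nothing to hide
        # Hashing keeps the fake value consistent, so rows that shared a value still match
        digest = hashlib.sha256(str(clean[field]).encode("utf-8")).hexdigest()[:12]
        if field == "email":
            clean[field] = f"user-{digest}@example.com"
        else:
            clean[field] = f"{field}-{digest}"
    return clean
```

<p>A plain hash like this is not strong anonymisation, since common values can be guessed and re-hashed. For anything serious you'd want to mix in a secret key or replace the values with random ones.</p>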
<h3>Next steps</h3>
<p>If you don't already have automated prod backups, I encourage you to set them up if you have any valuable data in your Django app. Once that's done, you'll be able to use this script to pull down prod data into your local dev environment on demand.</p>
<h1>How to polish your GitHub projects when you're looking for a job</h1>
<p>Matthew Segal, 2020-06-17</p>
<p>When you're going for your first programming job, you don't have any work experience or references to show that you can write code. You might not even have a relevant degree (I didn't). What you <em>can</em> do is write some code and throw it up on GitHub to demonstrate to employers that you can build a complete app all by yourself.</p>
<p>A lot of junior devs don't know how to show off their projects on GitHub. They spend <em>hours and hours</em> writing code and then forget to do some basic things to make their project seem interesting. In this post I want to share some tips that you can apply in a few hours to make an existing project much more effective at getting you an interview.</p>
<h3>Remove all the clutter</h3>
<p>Your project should only contain source code, plus the minimum files required to run it. It should not contain:</p>
<ul>
<li>Editor config files (.idea, .vscode)</li>
<li>Database files (eg. SQLite)</li>
<li>Random documents (.pdf, .xls)</li>
<li>Media files (images, videos, audio)</li>
<li>Build outputs and artifacts (*.dll files, *.exe, etc)</li>
<li>Bytecode (eg. *.pyc files for Python)</li>
<li>Log files (eg. *.log)</li>
</ul>
<p>Having these files in your repo makes you look sloppy. Professional developers don't like finding random crap cluttering up their codebase.
You can keep these files out of your git repo using a <a href="https://www.atlassian.com/git/tutorials/saving-changes/gitignore">.gitignore</a> file. If you already have these files inside your repo, make sure to delete them. If you're using <code>bash</code> you can use <code>find</code> to delete all files that match a pattern, like Python bytecode files ending in <code>.pyc</code>.</p>
<div class="highlight"><pre><span></span><code>find . -name "*.pyc" -delete
</code></pre></div>
<p>You can achieve a similar result in Windows PowerShell, but it'll be a little more verbose.</p>
<p>Sometimes you do need to keep some media files, documents or even small databases in your source control. This is okay to do as long as it's an essential part of running, testing or documenting the code, as opposed to random clutter that you forgot to remove or gitignore. A good example of non-code files that you should keep in source control is website static files, like favicons and fonts.</p>
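<p>If you'd rather see what's lurking in your project before deleting anything, here's a small cross-platform Python sketch that lists clutter files. The patterns are my guesses based on the list above (for example, I'm assuming SQLite databases end in <code>.sqlite3</code>), and <code>find_clutter</code> is just a name I made up, so adjust both to suit your project.</p>

```python
from pathlib import Path

# Patterns loosely based on the list above: extend these to suit your project
CLUTTER_PATTERNS = ("*.pyc", "*.log", "*.sqlite3", "*.dll", "*.exe")
# Never touch git's own files
SKIP_DIRS = {".git"}


def find_clutter(root):
    """Return a sorted list of clutter files under root, ignoring anything inside .git."""
    root = Path(root)
    found = set()
    for pattern in CLUTTER_PATTERNS:
        for path in root.rglob(pattern):
            inside_skipped = SKIP_DIRS.intersection(path.relative_to(root).parts)
            if path.is_file() and not inside_skipped:
                found.add(path)
    return sorted(found)


if __name__ == "__main__":
    # Only prints the files: review the list, then remove them or add them to .gitignore
    for path in find_clutter("."):
        print(path)
```

<p>Printing first and deleting second is a deliberate choice here: it gives you a chance to spot any files that you actually need to keep, like the ones mentioned above.</p>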
<h3 id="readme">Write a README</h3>
<p>Your project <em>must</em> have a README file. This is a file in the root of your project's repository called <code>README.md</code>. It's a text file written in <a href="https://github.com/adam-p/markdown-here/wiki/markdown-cheatsheet">Markdown</a> that gives a quick overview of what your project is and what it does. Not having a README makes your project seem crappy, and many people, including me, may close the browser window without checking any code if there isn't one present.</p>
<p>Here's <a href="https://github.com/anikalegal/clerk">one I prepared earlier</a>, and <a href="https://github.com/AnikaLegal/intake">here's another</a>. They're not
perfect, but I hope they give you a general idea of what to do.</p>
<p>One hour of paying attention to your project's README is worth 20 extra hours of coding, when it comes to impressing hiring managers. You know when people mindlessly write that they have "excellent communication skills" on their resume? No one believes that - it's far too easy to just say that. Don't <em>tell them</em> that you have excellent communication skills, <em>show them</em> when you write an excellent README.</p>
<p>Enough of me waffling about why you should write a README, what do you put in it?</p>
<p>First, you should describe what your project does at a high level: what problem it solves. Is it a command line tool that plays music? Is it a website that finds you low prices on Amazon? Is it a Reddit bot that reminds people? A reader should be able to read the first few sentences and decide if it's something they might want to use. You should summarize the main features of your project in this section.</p>
<p>A key point to remember is that the employer or recruiter reading your GitHub is both lazy and time-poor. They might not read past the first few sentences... they might not even read the code! They may well assume that your project works without checking anything. Before you rush to pack your README with features that don't exist, you scallywag, note that they may ask you more about your project in a job interview. So, uh... don't lie about anything.</p>
<p>Beyond a basic overview of your project, it's also good to outline the high-level architecture of your code - how it's structured. For example, in a Django web app, you could explain the different apps that you've implemented and their responsibilities.</p>
<p>If your project is a website, then you can also talk about the production infrastructure that your website runs on. For example:</p>
<blockquote>
<p>This website is deployed to a DigitalOcean virtual machine. The Django app runs inside a Gunicorn WSGI app server and depends on a Postgres database. A separate Celery worker process runs offline tasks. Redis is responsible for both caching and serving as a task broker.</p>
</blockquote>
<p>Or for something a little more simple:</p>
<blockquote>
<p>This project is a static webpage that is hosted on Netlify</p>
</blockquote>
<p>Simply indicating that you know how to deploy your application makes you look good. "Isn't that obvious though?" - you may ask. No, it's not obvious and you need to be explicit.</p>
<p>A little warning on READMEs: they're for other people to read, not you. Do not include personal to-dos or notes to yourself in your README. Put those somewhere else, like Trello or Workflowy.</p>
<h3>Add a screenshot</h3>
<p>Add a screenshot of your website or tool and embed it in the README. It'll take you 10 minutes and it makes your project look way better. Store the screenshot in a "docs" folder and embed it in your README using Markdown. If it's a command line app you can use <a href="https://asciinema.org/">asciinema</a> to record the tool in action. If your project has a GUI then you can quickly record yourself using the website with <a href="https://www.loom.com/my-videos">Loom</a>. This will make your project seem much more impressive for only a small amount of effort.</p>
<h3>Give instructions for other developers</h3>
<p>You should include instructions on how other devs can get started using your project. This is important because it demonstrates that you can document project setup instructions, and also because someone may actually try to run your code. These instructions should state what tools are required to run your project. For example:</p>
<ul>
<li>You will need Python 3 and pip installed</li>
<li>You will need yarn and node v11+</li>
<li>You will need docker and docker-compose</li>
</ul>
<p>Next you should explain the steps, with explicit command line examples if possible, that are required to get the app built or running. If your project has external libraries that need to be installed, then you should have a file that specifies these dependencies, like a <code>requirements.txt</code> (Python) or <code>package.json</code> (Node) or <code>Dockerfile</code> / <code>docker-compose.yaml</code> (Docker).</p>
<p>You should also include instructions on how to run your automated tests.
You have some tests, right? More on that later.</p>
<p>If you've scripted your project's deployment, you can mention how to do it here, if you like.</p>
<h3>Have a nice, readable commit history</h3>
<p>If possible, your git commit history should tell a story about what you've been working on.
Each commit should represent a distinct unit of work, and the commit message should explain what work was done.
For example your commit messages could look like this:</p>
<ul>
<li>Added smoke tests for payment API</li>
<li>Refactored image compression</li>
<li>Added Windows compatibility</li>
</ul>
<p>There are differing opinions amongst devs on what exactly makes a "good" commit message, but it's very, very clear what bad commit messages look like:</p>
<ul>
<li>zzzz</li>
<li>add code</li>
<li>more code</li>
<li>fuck</li>
<li>remove shitty code</li>
<li>fuckfuckfuckfuck</li>
<li>still broken</li>
<li>fuck Windows</li>
<li>zzz</li>
<li>adsafsf</li>
<li>broken</li>
</ul>
<p>I for one have written my fair share of "zzz"s. This tip is hard to implement if you've already written all your commits. If you're feeling brave, or if you need to remove a few "fucks", you can re-write your commit history with <code>git rebase</code>. Be warned though, you can lose your code if you screw this up.</p>
<h3>Fix your formatting</h3>
<p>If I see inconsistent indentation or other poor formatting in someone's code, my opinion of their programming ability drops dramatically.
Is this fair? Maybe, maybe not, but that's how it is. Make sure all your code sticks to your language's standard styling conventions.
If you don't know what those are, find out: you'll need to learn them eventually.
Fixing bad coding style is much easier to do if you use a linter or auto-formatter.</p>
<h3>Add linting or formatting</h3>
<p>This one is a bonus, but it's reasonably quick to do. Grab your language community's favorite linter and run it over your code.
Something like <code>eslint</code> for JavaScript or <code>flake8</code> for Python.
For those not in the know, a linter is a program that identifies style issues in your code.
You run it over your codebase and it yells at you if you do anything wrong. You think your impostor syndrome is bad?
Try using a tool that screams at you about all your shitty style choices.
These tools are quite common in-industry and using one will help you stand out from other junior devs.</p>
<p>Even better than a linter, try using an auto-formatter. I prefer these personally.
These tools automatically re-write your code so that it conforms to a standard style.
Examples include <a href="https://golang.org/cmd/gofmt/">gofmt</a> for Go, <a href="https://github.com/psf/black">Black</a> for Python and
<a href="https://prettier.io/">Prettier</a> for JavaScript. I've written more about getting started with Black <a href="https://mattsegal.dev/python-formatting-with-black.html">here</a>.</p>
<p>Whatever you choose, make sure you document how to run the linter or formatting tool in your README.</p>
<h3>Write some tests</h3>
<p>Automated code testing is an important part of writing reliable professional-grade software.
If you want someone to pay you money to be a professional software developer, then you should demonstrate
that you know what a unit test is and how to write one. You don't need to write 100s of tests or get a high test coverage,
but write a <em>few</em> at least.</p>
<p>Needless to say, explain how to run your tests in your README.</p>
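<p>If you've never written a unit test before, here's roughly what one looks like using Python's built-in <code>unittest</code> module. Treat it as a sketch: <code>truncate_title</code> is a made-up function standing in for a small piece of your own project, and the two test cases are just examples of the kind of behaviour you might check.</p>

```python
import unittest


def truncate_title(title, max_length=20):
    """Shorten a title for display, adding an ellipsis if it was cut."""
    if len(title) <= max_length:
        return title
    # Leave room for the three dots so the result never exceeds max_length.
    return title[: max_length - 3].rstrip() + "..."


class TruncateTitleTests(unittest.TestCase):
    def test_short_title_is_unchanged(self):
        self.assertEqual(truncate_title("Hello"), "Hello")

    def test_long_title_is_cut_to_max_length(self):
        result = truncate_title("A very long forum thread title", max_length=10)
        self.assertEqual(result, "A very...")
        self.assertLessEqual(len(result), 10)
```

<p>If you saved this in a file called <code>test_titles.py</code> (the filename is my assumption), you could run it with <code>python -m unittest test_titles</code>, which is exactly the kind of command that belongs in your README.</p>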
<h3>Add automated tests</h3>
<p>If you want to look super fancy then you can run your automated tests in GitHub Actions.
This isn't a must-have but it looks nice.
It'll take you 30 minutes if you've already written some tests and you can put a cool "tests passing" badge in your README that looks really good.
I've written more on how to do this <a href="https://mattsegal.dev/pytest-on-github-actions.html">here</a>.</p>
<h3>Deploy your project</h3>
<p>If your project is a website then make sure it's deployed and available online.
If you have deployed it, make sure there's a link to the live site in the README.
This could be a large undertaking, taking hours or days, especially if you haven't done this before, so
I'll leave it to you to decide if it's worthwhile.</p>
<p>If your project is a Django app and you want to get it online, then you might like my guide on <a href="https://mattsegal.dev/simple-django-deployment.html">simple Django deployments</a>.</p>
<h3>Add documentation</h3>
<p>This is a high effort endeavour so I don't really recommend it if you're just trying to quickly improve the appeal of your project.
That said, building HTML documentation with something like <a href="https://www.sphinx-doc.org/en/master/">Sphinx</a> and hosting it on <a href="https://pages.github.com/">GitHub Pages</a> looks pretty pro. This only really makes sense if your app is reasonably complicated and requires documentation.</p>
<h3>Next steps</h3>
<p>I mention GitHub a lot in this post, but the same tips apply for projects hosted on Bitbucket and GitLab. All these tips also apply to employer-supplied coding tests that are hosted on GitHub, although I'd caution you not to spend too much time jazzing up coding tests: too many beautiful submissions end up in the garbage.</p>
<p>Now you should have a few things you can do to spiff up your projects before you show them to prospective employers. I think it's important to make sure that the code that you've spent hours on isn't overlooked or dismissed because you didn't write a README.</p>
<p>Good luck, and please don't hesitate to mail me money if this post helps you get a job.</p>How to generate lots of dummy data for your Django app2020-06-14T12:00:00+10:002020-06-14T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-06-14:/django-factoryboy-dummy-data.html<p>It sucks when you're working on a Django app and all your pages are empty.
For example, if you're working on a forum webapp, then all your discussion boards will be empty by default:</p>
<p><img alt="dummy-threads-empty" src="https://mattsegal.dev/dummy-threads-empty.png"></p>
<p>Manually creating enough data for your pages to look realistic is a lot of work.
Wouldn't it be nice if there was an automatic way to populate your local database with dummy data
that looks real? Eg. your forum app has many threads:</p>
<p><img alt="dummy-threads" src="https://mattsegal.dev/dummy-threads-full.png"></p>
<p>Even better, wouldn't it be cool if there was an easy way to populate each thread with as many comments
as you like?</p>
<p><img alt="dummy-comments" src="https://mattsegal.dev/dummy-comments.png"></p>
<p>In this post I'll show you how to use <a href="https://factoryboy.readthedocs.io/en/latest/">Factory Boy</a> and a few other tricks to quickly and repeatably generate an endless amount of dummy data for your Django app. By the end of the post you'll be able to generate all your test data using a management command:</p>
<div class="highlight"><pre><span></span><code>./manage.py setup_test_data
</code></pre></div>
<p>There is example code for this blog post hosted in <a href="https://github.com/MattSegal/djdt-perf-demo">this GitHub repo</a>.</p>
<h3>Example application</h3>
<p>In this post we'll be working with an example app that is an online forum. There are four models that we'll be working with:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># models.py</span>
<span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">models</span>


<span class="k">class</span> <span class="nc">User</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="sd">"""A person who uses the website"""</span>
    <span class="n">name</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">128</span><span class="p">)</span>

<span class="k">class</span> <span class="nc">Thread</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="sd">"""A forum comment thread"""</span>
    <span class="n">title</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">128</span><span class="p">)</span>
    <span class="n">creator</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ForeignKey</span><span class="p">(</span><span class="n">User</span><span class="p">,</span> <span class="n">on_delete</span><span class="o">=</span><span class="n">models</span><span class="o">.</span><span class="n">CASCADE</span><span class="p">)</span>

<span class="k">class</span> <span class="nc">Comment</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="sd">"""A comment by a user on a thread"""</span>
    <span class="n">body</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">128</span><span class="p">)</span>
    <span class="n">poster</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ForeignKey</span><span class="p">(</span><span class="n">User</span><span class="p">,</span> <span class="n">on_delete</span><span class="o">=</span><span class="n">models</span><span class="o">.</span><span class="n">CASCADE</span><span class="p">)</span>
    <span class="n">thread</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ForeignKey</span><span class="p">(</span><span class="n">Thread</span><span class="p">,</span> <span class="n">on_delete</span><span class="o">=</span><span class="n">models</span><span class="o">.</span><span class="n">CASCADE</span><span class="p">)</span>

<span class="k">class</span> <span class="nc">Club</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="sd">"""A group of users interested in the same thing"""</span>
    <span class="n">name</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">128</span><span class="p">)</span>
    <span class="n">member</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ManyToManyField</span><span class="p">(</span><span class="n">User</span><span class="p">)</span>
</code></pre></div>
<h3>Building data with Factory Boy</h3>
<p>We'll be using <a href="https://factoryboy.readthedocs.io/en/latest/">Factory Boy</a> to generate all our dummy data. It's a library that's built for automated testing, but it also works well for this use-case. Factory Boy can easily be configured to generate random but realistic data like names, emails and paragraphs by internally using the <a href="https://faker.readthedocs.io/en/master/">Faker</a> library.</p>
<p>When using Factory Boy you create classes called "factories", which each represent a Django model. For example, for a user, you would create a factory class as follows:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># factories.py</span>
<span class="kn">import</span> <span class="nn">factory</span>
<span class="kn">from</span> <span class="nn">factory.django</span> <span class="kn">import</span> <span class="n">DjangoModelFactory</span>

<span class="kn">from</span> <span class="nn">.models</span> <span class="kn">import</span> <span class="n">User</span>

<span class="c1"># Defining a factory</span>
<span class="k">class</span> <span class="nc">UserFactory</span><span class="p">(</span><span class="n">DjangoModelFactory</span><span class="p">):</span>
    <span class="k">class</span> <span class="nc">Meta</span><span class="p">:</span>
        <span class="n">model</span> <span class="o">=</span> <span class="n">User</span>

    <span class="n">name</span> <span class="o">=</span> <span class="n">factory</span><span class="o">.</span><span class="n">Faker</span><span class="p">(</span><span class="s2">"first_name"</span><span class="p">)</span>

<span class="c1"># Using a factory with auto-generated data</span>
<span class="n">u</span> <span class="o">=</span> <span class="n">UserFactory</span><span class="p">()</span>
<span class="n">u</span><span class="o">.</span><span class="n">name</span> <span class="c1"># Kimberly</span>
<span class="n">u</span><span class="o">.</span><span class="n">id</span> <span class="c1"># 51</span>

<span class="c1"># You can optionally pass in your own data</span>
<span class="n">u</span> <span class="o">=</span> <span class="n">UserFactory</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s2">"Alice"</span><span class="p">)</span>
<span class="n">u</span><span class="o">.</span><span class="n">name</span> <span class="c1"># Alice</span>
<span class="n">u</span><span class="o">.</span><span class="n">id</span> <span class="c1"># 52</span>
</code></pre></div>
<p>You can find the data types that Faker can produce by looking at the "<a href="https://faker.readthedocs.io/en/master/providers.html">providers</a>" that the library offers. Eg. I found "first_name" by reviewing the options inside the <a href="https://faker.readthedocs.io/en/master/providers/faker.providers.person.html">person provider</a>.</p>
<p>Another benefit of Factory boy is that it can be set up to generate related data using <a href="https://factoryboy.readthedocs.io/en/latest/recipes.html#dependent-objects-foreignkey">SubFactory</a>, saving you a lot of boilerplate and time. For example we can set up the <code>ThreadFactory</code> so that it generates a <code>User</code> as its creator automatically:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># factories.py</span>
<span class="k">class</span> <span class="nc">ThreadFactory</span><span class="p">(</span><span class="n">DjangoModelFactory</span><span class="p">):</span>
    <span class="k">class</span> <span class="nc">Meta</span><span class="p">:</span>
        <span class="n">model</span> <span class="o">=</span> <span class="n">Thread</span>

    <span class="n">creator</span> <span class="o">=</span> <span class="n">factory</span><span class="o">.</span><span class="n">SubFactory</span><span class="p">(</span><span class="n">UserFactory</span><span class="p">)</span>
    <span class="n">title</span> <span class="o">=</span> <span class="n">factory</span><span class="o">.</span><span class="n">Faker</span><span class="p">(</span>
        <span class="s2">"sentence"</span><span class="p">,</span>
        <span class="n">nb_words</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span>
        <span class="n">variable_nb_words</span><span class="o">=</span><span class="kc">True</span>
    <span class="p">)</span>

<span class="c1"># Create a new thread</span>
<span class="n">t</span> <span class="o">=</span> <span class="n">ThreadFactory</span><span class="p">()</span>
<span class="n">t</span><span class="o">.</span><span class="n">title</span> <span class="c1"># Room marriage study</span>
<span class="n">t</span><span class="o">.</span><span class="n">creator</span> <span class="c1"># <User: Michelle></span>
<span class="n">t</span><span class="o">.</span><span class="n">creator</span><span class="o">.</span><span class="n">name</span> <span class="c1"># Michelle</span>
</code></pre></div>
<p>The ability to automatically generate related models and fake data makes Factory Boy quite powerful. It's worth taking a quick look at the <a href="https://factoryboy.readthedocs.io/en/latest/recipes.html">other suggested patterns</a> if you decide to try it out.</p>
<h3>Adding a management command</h3>
<p>Once you've defined all the models that you want to generate with Factory Boy, you can write a <a href="https://simpleisbetterthancomplex.com/tutorial/2018/08/27/how-to-create-custom-django-management-commands.html">management command</a> to automatically populate your database. This is a pretty crude script that doesn't take advantage of all of Factory Boy's features, like sub-factories, but I didn't want to spend too much time getting fancy:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># setup_test_data.py</span>
<span class="kn">import</span> <span class="nn">random</span>

<span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">transaction</span>
<span class="kn">from</span> <span class="nn">django.core.management.base</span> <span class="kn">import</span> <span class="n">BaseCommand</span>

<span class="kn">from</span> <span class="nn">forum.models</span> <span class="kn">import</span> <span class="n">User</span><span class="p">,</span> <span class="n">Thread</span><span class="p">,</span> <span class="n">Club</span><span class="p">,</span> <span class="n">Comment</span>
<span class="kn">from</span> <span class="nn">forum.factories</span> <span class="kn">import</span> <span class="p">(</span>
    <span class="n">UserFactory</span><span class="p">,</span>
    <span class="n">ThreadFactory</span><span class="p">,</span>
    <span class="n">ClubFactory</span><span class="p">,</span>
    <span class="n">CommentFactory</span>
<span class="p">)</span>

<span class="n">NUM_USERS</span> <span class="o">=</span> <span class="mi">50</span>
<span class="n">NUM_CLUBS</span> <span class="o">=</span> <span class="mi">10</span>
<span class="n">NUM_THREADS</span> <span class="o">=</span> <span class="mi">12</span>
<span class="n">COMMENTS_PER_THREAD</span> <span class="o">=</span> <span class="mi">25</span>
<span class="n">USERS_PER_CLUB</span> <span class="o">=</span> <span class="mi">8</span>

<span class="k">class</span> <span class="nc">Command</span><span class="p">(</span><span class="n">BaseCommand</span><span class="p">):</span>
    <span class="n">help</span> <span class="o">=</span> <span class="s2">"Generates test data"</span>

    <span class="nd">@transaction</span><span class="o">.</span><span class="n">atomic</span>
    <span class="k">def</span> <span class="nf">handle</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">stdout</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s2">"Deleting old data..."</span><span class="p">)</span>
        <span class="n">models</span> <span class="o">=</span> <span class="p">[</span><span class="n">User</span><span class="p">,</span> <span class="n">Thread</span><span class="p">,</span> <span class="n">Comment</span><span class="p">,</span> <span class="n">Club</span><span class="p">]</span>
        <span class="k">for</span> <span class="n">m</span> <span class="ow">in</span> <span class="n">models</span><span class="p">:</span>
            <span class="n">m</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">all</span><span class="p">()</span><span class="o">.</span><span class="n">delete</span><span class="p">()</span>

        <span class="bp">self</span><span class="o">.</span><span class="n">stdout</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s2">"Creating new data..."</span><span class="p">)</span>
        <span class="c1"># Create all the users</span>
        <span class="n">people</span> <span class="o">=</span> <span class="p">[]</span>
        <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">NUM_USERS</span><span class="p">):</span>
            <span class="n">person</span> <span class="o">=</span> <span class="n">UserFactory</span><span class="p">()</span>
            <span class="n">people</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">person</span><span class="p">)</span>

        <span class="c1"># Add some users to clubs</span>
        <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">NUM_CLUBS</span><span class="p">):</span>
            <span class="n">club</span> <span class="o">=</span> <span class="n">ClubFactory</span><span class="p">()</span>
            <span class="n">members</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">choices</span><span class="p">(</span>
                <span class="n">people</span><span class="p">,</span>
                <span class="n">k</span><span class="o">=</span><span class="n">USERS_PER_CLUB</span>
            <span class="p">)</span>
            <span class="n">club</span><span class="o">.</span><span class="n">member</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="o">*</span><span class="n">members</span><span class="p">)</span>

        <span class="c1"># Create all the threads</span>
        <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">NUM_THREADS</span><span class="p">):</span>
            <span class="n">creator</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">choice</span><span class="p">(</span><span class="n">people</span><span class="p">)</span>
            <span class="n">thread</span> <span class="o">=</span> <span class="n">ThreadFactory</span><span class="p">(</span><span class="n">creator</span><span class="o">=</span><span class="n">creator</span><span class="p">)</span>
            <span class="c1"># Create comments for each thread</span>
            <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">COMMENTS_PER_THREAD</span><span class="p">):</span>
                <span class="n">commentor</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">choice</span><span class="p">(</span><span class="n">people</span><span class="p">)</span>
                <span class="n">CommentFactory</span><span class="p">(</span>
                    <span class="n">poster</span><span class="o">=</span><span class="n">commentor</span><span class="p">,</span>
                    <span class="n">thread</span><span class="o">=</span><span class="n">thread</span>
                <span class="p">)</span>
</code></pre></div>
<p>Using the <code>transaction.atomic</code> decorator makes a big difference in the runtime of this script, since it wraps 100s of queries in a single database transaction that is committed once at the end, rather than committing after every query.</p>
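<p>If you want to see the effect of a single transaction for yourself without touching Django, here's a rough sketch using Python's built-in <code>sqlite3</code> module. It is not what <code>transaction.atomic</code> does internally, just the same idea in miniature: the <code>comment</code> table and the <code>insert_rows</code> and <code>compare</code> helpers are made up for this example, and the timings you get will depend on your disk and database.</p>

```python
import os
import sqlite3
import tempfile
import time


def insert_rows(path, num_rows, single_transaction):
    """Insert rows into a fresh SQLite file and return (row_count, seconds)."""
    # isolation_level=None puts the connection in autocommit mode, so we
    # control transactions explicitly with BEGIN / COMMIT.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("CREATE TABLE comment (body TEXT)")
    start = time.perf_counter()
    if single_transaction:
        conn.execute("BEGIN")
    for i in range(num_rows):
        # Without an open transaction, every INSERT is committed on its own.
        conn.execute("INSERT INTO comment (body) VALUES (?)", (f"comment {i}",))
    if single_transaction:
        conn.execute("COMMIT")
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM comment").fetchone()[0]
    conn.close()
    return count, elapsed


def compare(num_rows=300):
    """Time both approaches against throwaway database files."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for label, single in (("per-query commits", False), ("one transaction", True)):
            path = os.path.join(tmp, f"{single}.sqlite3")
            results[label] = insert_rows(path, num_rows, single_transaction=single)
    return results
```

<p>Calling <code>compare()</code> returns the row count and elapsed time for each approach. Both end up with the same rows in the table, and on a file-backed database the single transaction is usually the quicker of the two.</p>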
<h3>Images</h3>
<p>If you need dummy images for your website as well then there are a lot of great free tools online to help. I use <a href="https://api.adorable.io">adorable.io</a> for dummy profile pics and <a href="https://picsum.photos/">Picsum</a> or <a href="https://unsplash.com/developers">Unsplash</a> for larger pictures like this one: <a href="https://picsum.photos/700/500">https://picsum.photos/700/500</a>.</p>
<p><img alt="picsum-example" src="https://picsum.photos/700/500"></p>
<h3>Next steps</h3>
<p>Hopefully this post helps you spin up a lot of fake data for your Django app very quickly.
If you enjoy using Factory Boy to generate your dummy data, then you also might like incorporating it into your unit tests.</p>How to automatically reset your local Django database2020-06-13T12:00:00+10:002020-06-13T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-06-13:/reset-django-local-database.html<p>Sometimes when you're working on a Django app you want a fresh start. You want to nuke all of the data in your local database and start again from scratch. Maybe you ran some migrations that you don't want to keep, or perhaps there's some test data that you want to get rid of. This kind of problem doesn't crop up very often, but when it does it's <em>super</em> annoying to do it manually over and over.</p>
<p>In this post I'll show you a small script that you can use to reset your local Django database. It completely automates deleting the old data, running migrations and setting up new users. I've written the script in <code>bash</code> but most of it will also work in <code>powershell</code> or <code>cmd</code> with only minor changes.</p>
<p>For those of you who hate reading, the full script is near the bottom.</p>
<h3>Resetting the database</h3>
<p>We're going to reset our local database with the <a href="https://django-extensions.readthedocs.io/en/latest/installation_instructions.html">django-extensions</a> package, which provides a nifty little helper command called <code>reset_db</code>. This command destroys and recreates your Django app's database.</p>
<div class="highlight"><pre><span></span><code>./manage.py reset_db
</code></pre></div>
<p>I like to add the <code>--noinput</code> flag so the script does not ask me for confirmation, and the <code>--close-sessions</code> flag if I'm using PostgreSQL locally so that the command does not fail if my Django app is connected to the database at the same time.</p>
<div class="highlight"><pre><span></span><code>./manage.py reset_db --noinput --close-sessions
</code></pre></div>
<p>This is a good start, but now we have no migrations, users or any other data in our database. We need to add some data back in there before we can start using the app again.</p>
<h3>Running migrations</h3>
<p>Before you do anything else it's important to run migrations so that all your database tables are set up correctly:</p>
<div class="highlight"><pre><span></span><code>./manage.py migrate
</code></pre></div>
<h3>Creating an admin user</h3>
<p>You want to have a superuser set up so you can log into the Django admin. It's nice when a script guarantees that your superuser always has the same username and password. The first part of creating a superuser is pretty standard:</p>
<div class="highlight"><pre><span></span><code>./manage.py createsuperuser <span class="se">\</span>
--username admin <span class="se">\</span>
--email admin@example.com <span class="se">\</span>
--noinput
</code></pre></div>
<p>Now we want to set the admin user's password to something easy to remember, like "12345". This isn't a security risk because it's just for local development. This step involves a little more scripting trickery. Here we can use <code>shell_plus</code>, which is an enhanced Django shell provided by django-extensions. The <code>shell_plus</code> command will automatically import all of our models, which means we can write short one liners like this one, which prints the number of Users in the database:</p>
<div class="highlight"><pre><span></span><code>./manage.py shell_plus --quiet-load -c <span class="s2">"print(User.objects.count())"</span>
<span class="c1"># 13</span>
</code></pre></div>
<p>Using this method we can grab our admin user and set their password:</p>
<div class="highlight"><pre><span></span><code>./manage.py shell_plus --quiet-load -c <span class="s2">"</span>
<span class="s2">u = User.objects.get(username='admin')</span>
<span class="s2">u.set_password('12345')</span>
<span class="s2">u.save()</span>
<span class="s2">"</span>
</code></pre></div>
<h3>Setting up new data</h3>
<p>There might be a little bit of data that you want to set up every time you reset your database. For example, in one app I run, I want to ensure that there is always a <code>SlackMessage</code> model that has a <code>SlackChannel</code>. We can set up this data in the same way we set up the admin user's password:</p>
<div class="highlight"><pre><span></span><code>./manage.py shell_plus --quiet-load -c <span class="s2">"</span>
<span class="s2">c = SlackChannel.objects.create(name='Test Alerts')</span>
<span class="s2">SlackMessage.objects.create(channel=c)</span>
<span class="s2">"</span>
</code></pre></div>
<p>If you need to set up a <em>lot</em> of data then there are options like <a href="https://docs.djangoproject.com/en/3.0/howto/initial-data/">fixtures</a> or tools like <a href="https://factoryboy.readthedocs.io/en/latest/">Factory Boy</a> (which I heartily recommend). If you only need to do a few lines of scripting to create your data, then you can include them in this script. If your development data setup is very complicated, then I recommend putting all the setup code into a custom management command.</p>
<h3>The final script</h3>
<p>This is the script that you can use to reset your local Django database:</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/bin/bash</span>
<span class="c1"># Resets the local Django database, adding an admin login and migrations</span>
<span class="nb">set</span> -e
<span class="nb">echo</span> -e <span class="s2">"\n>>> Resetting the database"</span>
./manage.py reset_db --close-sessions --noinput
<span class="nb">echo</span> -e <span class="s2">"\n>>> Running migrations"</span>
./manage.py migrate
<span class="nb">echo</span> -e <span class="s2">"\n>>> Creating new superuser 'admin'"</span>
./manage.py createsuperuser <span class="se">\</span>
--username admin <span class="se">\</span>
--email admin@example.com <span class="se">\</span>
--noinput
<span class="nb">echo</span> -e <span class="s2">"\n>>> Setting superuser 'admin' password to 12345"</span>
./manage.py shell_plus --quiet-load -c <span class="s2">"</span>
<span class="s2">u=User.objects.get(username='admin')</span>
<span class="s2">u.set_password('12345')</span>
<span class="s2">u.save()</span>
<span class="s2">"</span>
<span class="c1"># Any extra data setup goes here.</span>
<span class="nb">echo</span> -e <span class="s2">"\n>>> Database reset finished."</span>
</code></pre></div>
<h3>Other methods</h3>
<p>It's good to note that what I'm proposing is the "nuclear option": purge everything and restart from scratch. There are also some more precise methods available for managing your local database:</p>
<ul>
<li>If you just want to reverse some particular migrations, then you can use the <code>migrate</code> command <a href="https://docs.djangoproject.com/en/3.0/topics/migrations/#reversing-migrations">as documented here</a>.</li>
<li>If you just want to delete all your data and you don't care about re-applying the migrations, then the <code>flush</code> management command, <a href="https://docs.djangoproject.com/en/3.0/ref/django-admin/#flush">documented here</a>, will take care of that.</li>
</ul>
<h3>Docker environments</h3>
<p>If you're running your local Django app in a Docker container via <code>docker-compose</code>, then this process is a little bit more tricky, but it's not too much more complicated. You just need to add two commands to your script.</p>
<p>First you want a command to kill all running containers, which I do because I'm superstitious and don't trust that <code>reset_db</code> will actually close all database connections:</p>
<div class="highlight"><pre><span></span><code><span class="k">function</span> stop_docker <span class="o">{</span>
<span class="nb">echo</span> -e <span class="s2">"\nStopping all running Docker containers"</span>
<span class="c1"># Ensure that no containers automatically restart</span>
docker update --restart<span class="o">=</span>no <span class="sb">`</span>docker ps -q<span class="sb">`</span>
<span class="c1"># Kill everything</span>
docker <span class="nb">kill</span> <span class="sb">`</span>docker ps -q<span class="sb">`</span>
<span class="o">}</span>
</code></pre></div>
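<p>One thing to watch out for: <code>docker update</code> and <code>docker kill</code> both need at least one container ID, so they exit with an error when nothing is running. If your reset script uses <code>set -e</code>, that error will stop the whole script. Here is a rough sketch of a more forgiving variant. The name <code>stop_docker_if_running</code> is my own invention for this example:</p>

```shell
#!/bin/bash
# Sketch: a variant of stop_docker that does nothing when no containers are running.
# stop_docker_if_running is a hypothetical name used for illustration.
function stop_docker_if_running {
    local ids
    ids=$(docker ps -q)
    if [ -z "$ids" ]; then
        echo "No running Docker containers to stop"
        return 0
    fi
    echo -e "\nStopping all running Docker containers"
    # Ensure that no containers automatically restart
    # $ids is deliberately unquoted so that each ID becomes its own argument
    docker update --restart=no $ids
    # Kill everything
    docker kill $ids
}
```

<p>You would call <code>stop_docker_if_running</code> at the top of the reset script, before any of the management commands.</p>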
<p>We also want a shorthand way to run commands inside your docker environment. Let's say you are working with a compose file located at <code>docker/docker-compose.local.yml</code> and your Django app's container is called <code>web</code>. Then you can run your commands inside the container as follows:</p>
<div class="highlight"><pre><span></span><code><span class="k">function</span> run_docker <span class="o">{</span>
docker-compose -f docker/docker-compose.local.yml run --rm web <span class="s2">"</span><span class="nv">$@</span><span class="s2">"</span>
<span class="o">}</span>
</code></pre></div>
<p>Now we can just prefix <code>run_docker</code> to all the management commands we run. For example:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># Without Docker</span>
./manage.py reset_db --close-sessions --noinput
<span class="c1"># With Docker</span>
run_docker ./manage.py reset_db --close-sessions --noinput
</code></pre></div>
<p>I will note that this <code>run_docker</code> shortcut can act a little weird when you're passing strings to <code>shell_plus</code>. You might need to experiment with different methods of escaping whitespace etc.</p>
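<p>A likely cause of this kind of weirdness is how a wrapper function forwards its arguments. This little sketch compares <code>$@</code> with <code>"$@"</code>. The function <code>count_args</code> is a made-up stand-in for whatever command is being wrapped, and it just prints how many arguments it received:</p>

```shell
#!/bin/bash
# Sketch: how quoting "$@" changes what a wrapper function passes along.
# count_args is a hypothetical stand-in for the command being wrapped.
count_args() { echo "$#"; }

# Unquoted: the shell re-splits every argument on whitespace
forward_unquoted() { count_args $@; }

# Quoted: each argument is passed through exactly as it was received
forward_quoted() { count_args "$@"; }

forward_unquoted -c "print(User.objects.count()); print('all done')"
# 4
forward_quoted -c "print(User.objects.count()); print('all done')"
# 2
```

<p>With the unquoted version the Python snippet gets chopped into three separate words, so the command on the other end never sees it as a single string.</p>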
<h3>Conclusion</h3>
<p>Hopefully this script will save you some time when you're working on your Django app. If you're interested in more Django-related database stuff then you might enjoy reading about how to <a href="https://mattsegal.dev/postgres-backup-and-restore.html">back up and restore a Postgres database</a> and then how to <a href="https://mattsegal.dev/postgres-backup-automate.html">fully automate your prod backup process</a>.</p>How to automate your Postgres database backups2020-06-05T12:00:00+10:002020-06-05T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-06-05:/postgres-backup-automate.html<p>If you've got a web app running in production, then you'll want to take <a href="https://mattsegal.dev/postgres-backup-and-restore.html">regular database backups</a>, or else you risk losing all your data. Taking these backups manually is fine, but it's easy to forget to do it. It's better to remove the chance of human error and automate …</p><p>If you've got a web app running in production, then you'll want to take <a href="https://mattsegal.dev/postgres-backup-and-restore.html">regular database backups</a>, or else you risk losing all your data. Taking these backups manually is fine, but it's easy to forget to do it. It's better to remove the chance of human error and automate the whole process. To automate your backup and restore you will need three things:</p>
<ul>
<li>A safe place to store your backup files</li>
<li>A script that creates the backups and uploads them to the safe place</li>
<li>A method to automatically run the backup script every day</li>
</ul>
<h3>A safe place for your database backup files</h3>
<p>You don't want to store your backup files on the same server as your database. If your database server gets deleted, then you'll lose your backups as well. Instead, you should store your backups somewhere else, like a hard drive, your PC, or in the cloud.</p>
<p>I like using cloud object storage for this kind of use-case. If you haven't heard of "object storage" before: it's just a kind of cloud service where you can store a bunch of files. All major cloud providers offer this service:</p>
<ul>
<li>Amazon's AWS has the <a href="https://aws.amazon.com/s3/">Simple Storage Service (S3)</a></li>
<li>Microsoft's Azure has <a href="https://azure.microsoft.com/en-us/services/storage/">Storage</a></li>
<li>Google Cloud also has <a href="https://cloud.google.com/storage">Storage</a></li>
<li>DigitalOcean has <a href="https://www.digitalocean.com/products/spaces/">Spaces</a></li>
</ul>
<p>These object storage services are <em>very</em> cheap at around 2c/GB/month, you'll never run out of disk space, they're easy to access from command line tools and they have very fast upload/download speeds, especially to/from other services hosted with the same cloud provider. I use these services a lot: this blog is being served from AWS S3.</p>
<p>I like using S3 simply because I'm quite familiar with it, so that's what we're going to use for the rest of this post. If you're not already familiar with using the AWS command-line, then check out this post I wrote about <a href="https://mattsegal.dev/aws-s3-intro.html">getting started with AWS S3</a> before you continue.</p>
<h3>Creating a database backup script</h3>
<p>In my <a href="https://mattsegal.dev/postgres-backup-and-restore.html">previous post on database backups</a> I showed you a small script to automatically take a backup using PostgreSQL:</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/bin/bash</span>
<span class="c1"># Backs up mydatabase to a file.</span>
<span class="nv">TIME</span><span class="o">=</span><span class="k">$(</span>date <span class="s2">"+%s"</span><span class="k">)</span>
<span class="nv">BACKUP_FILE</span><span class="o">=</span><span class="s2">"postgres_</span><span class="si">${</span><span class="nv">PGDATABASE</span><span class="si">}</span><span class="s2">_</span><span class="si">${</span><span class="nv">TIME</span><span class="si">}</span><span class="s2">.pgdump"</span>
<span class="nb">echo</span> <span class="s2">"Backing up </span><span class="nv">$PGDATABASE</span><span class="s2"> to </span><span class="nv">$BACKUP_FILE</span><span class="s2">"</span>
pg_dump --format<span class="o">=</span>custom > <span class="nv">$BACKUP_FILE</span>
<span class="nb">echo</span> <span class="s2">"Backup completed for </span><span class="nv">$PGDATABASE</span><span class="s2">"</span>
</code></pre></div>
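<p>The file naming scheme matters later on, so here it is pulled out into a tiny function as a sketch. The helper name <code>backup_filename</code> is hypothetical. Because <code>date "+%s"</code> prints the number of seconds since 1970, newer backups always get a bigger number, and since these timestamps all have the same number of digits, sorting the filenames alphabetically also sorts them by time:</p>

```shell
#!/bin/bash
# Sketch: the backup file naming scheme as a function.
# backup_filename is a hypothetical helper, not part of the original script.
backup_filename() {
    local database="$1"
    local timestamp="$2"
    echo "postgres_${database}_${timestamp}.pgdump"
}

# date "+%s" prints the number of seconds since 1970-01-01 UTC
backup_filename mydatabase "$(date "+%s")"
```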
<p>I'm going to assume you have set up your Postgres database environment variables (<code>PGHOST</code>, etc) either in the script, or elsewhere, as mentioned in the previous post.
Next we're going to get our script to upload all backups to AWS S3.</p>
<h3>Uploading backups to AWS Simple Storage Service (S3)</h3>
<p>We will be uploading our backups to S3 with the <code>aws</code> command line (CLI) tool. To get this tool to work, we need to set up our AWS credentials on the server by either using <code>aws configure</code> or by setting the environment variables <code>AWS_ACCESS_KEY_ID</code> and <code>AWS_SECRET_ACCESS_KEY</code>. Once that's done we can use <code>aws s3 cp</code> to upload our backup files. Let's say we're using a bucket called "<code>mydatabase-backups</code>":</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/bin/bash</span>
<span class="c1"># Backs up mydatabase to a file and then uploads it to AWS S3.</span>
<span class="c1"># First, dump database backup to a file</span>
<span class="nv">TIME</span><span class="o">=</span><span class="k">$(</span>date <span class="s2">"+%s"</span><span class="k">)</span>
<span class="nv">BACKUP_FILE</span><span class="o">=</span><span class="s2">"postgres_</span><span class="si">${</span><span class="nv">PGDATABASE</span><span class="si">}</span><span class="s2">_</span><span class="si">${</span><span class="nv">TIME</span><span class="si">}</span><span class="s2">.pgdump"</span>
<span class="nb">echo</span> <span class="s2">"Backing up </span><span class="nv">$PGDATABASE</span><span class="s2"> to </span><span class="nv">$BACKUP_FILE</span><span class="s2">"</span>
pg_dump --format<span class="o">=</span>custom > <span class="nv">$BACKUP_FILE</span>
<span class="c1"># Second, copy file to AWS S3</span>
<span class="nv">S3_BUCKET</span><span class="o">=</span>s3://mydatabase-backups
<span class="nv">S3_TARGET</span><span class="o">=</span><span class="nv">$S3_BUCKET</span>/<span class="nv">$BACKUP_FILE</span>
<span class="nb">echo</span> <span class="s2">"Copying </span><span class="nv">$BACKUP_FILE</span><span class="s2"> to </span><span class="nv">$S3_TARGET</span><span class="s2">"</span>
aws s3 cp <span class="nv">$BACKUP_FILE</span> <span class="nv">$S3_TARGET</span>
<span class="nb">echo</span> <span class="s2">"Backup completed for </span><span class="nv">$PGDATABASE</span><span class="s2">"</span>
</code></pre></div>
<p>You should be able to run this multiple times and see a new backup appear in your S3 bucket's webpage every time you do it. As a bonus, you can add a little one liner at the end of your script that checks for the last uploaded file to the S3 bucket:</p>
<div class="highlight"><pre><span></span><code><span class="nv">BACKUP_RESULT</span><span class="o">=</span><span class="k">$(</span>aws s3 ls <span class="nv">$S3_BUCKET</span> <span class="p">|</span> tail -n <span class="m">1</span><span class="k">)</span>
<span class="nb">echo</span> <span class="s2">"Latest S3 backup: </span><span class="nv">$BACKUP_RESULT</span><span class="s2">"</span>
</code></pre></div>
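<p>One gap in the upload script above: the shell creates the output file before <code>pg_dump</code> even starts, so a failed dump can leave behind an empty file, which then gets uploaded as if nothing went wrong. A minimal sketch of a guard, assuming you are happy to treat "file is missing or empty" as a failure. The name <code>backup_looks_ok</code> is made up for this example:</p>

```shell
#!/bin/bash
# Sketch: refuse to upload a backup file that is missing or empty.
# backup_looks_ok is a hypothetical helper. A non-empty file is not proof of a
# valid dump, it only catches the case where pg_dump wrote nothing at all.
backup_looks_ok() {
    local file="$1"
    if [ ! -s "$file" ]; then
        echo "Backup file $file is missing or empty" >&2
        return 1
    fi
}
```

<p>You could then put <code>backup_looks_ok "$BACKUP_FILE" || exit 1</code> between the <code>pg_dump</code> step and the <code>aws s3 cp</code> step.</p>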
<p>Once you're confident that your backup script works, we can move on to getting it to run every day.</p>
<h3>Running cron jobs</h3>
<p>Now we need to get our server to run this script every day, even when we're not around. The simplest way to do this on a Linux server is with <a href="https://en.wikipedia.org/wiki/Cron">cron</a>. Cron can automatically run scripts for us on a schedule. We'll be using the <code>crontab</code> tool to set up our backup job.</p>
<p>You can read more about how to use crontab <a href="https://linuxize.com/post/scheduling-cron-jobs-with-crontab/">here</a>. If you find that you're having issues setting up cron, you might also find this <a href="https://serverfault.com/questions/449651/why-is-my-crontab-not-working-and-how-can-i-troubleshoot-it">Server Fault post</a> useful.</p>
<p>Before we set up our daily database backup job, I suggest trying out a test script to make sure that your cron setup is working. For example, this script prints the current time when it is run:</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/bin/bash</span>
<span class="nb">echo</span> <span class="k">$(</span>date<span class="k">)</span>
</code></pre></div>
<p>Using <code>nano</code>, you can create a new file called <code>~/test.sh</code>, save it, then make it executable as follows:</p>
<div class="highlight"><pre><span></span><code>nano ~/test.sh
<span class="c1"># Write out the time printing script in nano, save the file.</span>
chmod +x ~/test.sh
</code></pre></div>
<p>Then you can test it out a little by running it a couple of times to check that it is printing the time:</p>
<div class="highlight"><pre><span></span><code>~/test.sh
<span class="c1"># Sat Jun 6 08:05:14 UTC 2020</span>
~/test.sh
<span class="c1"># Sat Jun 6 08:05:14 UTC 2020</span>
~/test.sh
<span class="c1"># Sat Jun 6 08:05:14 UTC 2020</span>
</code></pre></div>
<p>Once you're confident that your test script works, you can create a cron job to run it every minute. Cron uses a special syntax to specify how often a job runs. These "cron expressions" are a pain to write by hand, so I use <a href="https://crontab.cronhub.io/">this tool</a> to generate them. The cron expression for "every minute" is the inscrutable string "<code>* * * * *</code>". This is the crontab entry that we're going to use:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># Test crontab entry</span>
<span class="nv">SHELL</span><span class="o">=</span>/bin/bash
* * * * * ~/test.sh <span class="p">&</span>>> ~/time.log
</code></pre></div>
<ul>
<li>The <code>SHELL</code> setting tells cron to use bash to execute our command</li>
<li>The "<code>* * * * *</code>" entry tells cron to execute our command every minute</li>
<li>The command <code>~/test.sh &>> ~/time.log</code> runs our test script <code>~/test.sh</code> and then appends all output to a log file called <code>~/time.log</code></li>
</ul>
<p>Enter the text above into your user's crontab file using the crontab editor:</p>
<div class="highlight"><pre><span></span><code>crontab -e
</code></pre></div>
<p>Once you've saved your entry, you should then be able to view your crontab entry using the list command:</p>
<div class="highlight"><pre><span></span><code>crontab -l
<span class="c1"># SHELL=/bin/bash</span>
<span class="c1"># * * * * * ~/test.sh &>> ~/time.log</span>
</code></pre></div>
<p>You can check that cron is actually trying to run your script by watching the system log:</p>
<div class="highlight"><pre><span></span><code>tail -f /var/log/syslog <span class="p">|</span> grep CRON
<span class="c1"># Jun 6 11:17:01 swarm CRON[6908]: (root) CMD (~/test.sh &>> ~/time.log)</span>
<span class="c1"># Jun 6 11:17:01 swarm CRON[6908]: (root) CMD (~/test.sh &>> ~/time.log)</span>
</code></pre></div>
<p>You can also watch your logfile to see that time is being written every minute:</p>
<div class="highlight"><pre><span></span><code>tail -f ~/time.log
<span class="c1"># Sat Jun 6 11:34:01 UTC 2020</span>
<span class="c1"># Sat Jun 6 11:35:01 UTC 2020</span>
<span class="c1"># Sat Jun 6 11:36:01 UTC 2020</span>
<span class="c1"># Sat Jun 6 11:37:01 UTC 2020</span>
</code></pre></div>
<p>Once you're happy that you can run a test script every minute with cron, we can move on to running your database backup script daily.</p>
<h3>Running our backup script daily</h3>
<p>Now we're nearly ready to run our backup script using a cron job. There are a few changes that we'll need to make to our existing setup. First we need to write our database backup script to <code>~/backup.sh</code> and make sure it is executable:</p>
<div class="highlight"><pre><span></span><code>chmod +x ~/backup.sh
</code></pre></div>
<p>Then we need to change our crontab entry to run every day, using the cron expression "<a href="https://crontab.cronhub.io/"><code>0 0 * * *</code></a>", and update our cron command to run our backup script. Our new crontab entry should be:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># Database backup crontab entry</span>
<span class="nv">SHELL</span><span class="o">=</span>/bin/bash
<span class="m">0</span> <span class="m">0</span> * * * ~/backup.sh <span class="p">&</span>>> ~/backup.log
</code></pre></div>
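<p>If cron expressions still look inscrutable, it can help to label the five fields. They are, in order: minute, hour, day of the month, month and day of the week, where <code>*</code> means "every". Here is a throwaway sketch that prints the labels for an expression. The function <code>describe_cron</code> is something I've made up for illustration, it is not a real cron tool:</p>

```shell
#!/bin/bash
# Sketch: label the five fields of a cron expression.
# describe_cron is a hypothetical helper for illustration, cron itself has no such command.
describe_cron() {
    local minute hour day_of_month month day_of_week
    # read splits the expression on whitespace without expanding the * characters
    read -r minute hour day_of_month month day_of_week <<< "$1"
    echo "minute=$minute hour=$hour day_of_month=$day_of_month month=$month day_of_week=$day_of_week"
}

describe_cron "* * * * *"
# minute=* hour=* day_of_month=* month=* day_of_week=*
describe_cron "0 0 * * *"
# minute=0 hour=0 day_of_month=* month=* day_of_week=*
```

<p>So "<code>0 0 * * *</code>" reads as "minute 0 of hour 0, every day of every month", which is midnight.</p>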
<p>Update your crontab with <code>crontab -e</code>. Now we wait! This script should run every night at midnight (server time) to take your database backups and upload them to AWS S3. If this isn't working, then change your cron expression so that it runs the script every minute, and use the steps I showed above to try and debug the problem.</p>
<p>Hopefully it all runs OK and you will have plenty of daily database backups to roll back to if anything ever goes wrong.</p>
<h3>Automatic restore from the latest backup</h3>
<p>When disaster strikes and you need your backups, you could manually view your S3 bucket, download the backup file, upload it to the server and manually run the restore, which I documented in my <a href="https://mattsegal.dev/postgres-backup-and-restore.html">previous post</a>. This is totally fine, but as a bonus I thought it would be nice to include a script that automatically downloads the latest backup file and uses it to restore your database. This kind of script would be ideal for dumping production data into a test server. First I'll show you the script, then I'll explain how it works:</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/bin/bash</span>
<span class="nb">echo</span> -e <span class="s2">"\nRestoring database </span><span class="nv">$PGDATABASE</span><span class="s2"> from S3 backups"</span>
<span class="c1"># Find the latest backup file</span>
<span class="nv">S3_BUCKET</span><span class="o">=</span>s3://mydatabase-backups
<span class="nv">LATEST_FILE</span><span class="o">=</span><span class="k">$(</span>aws s3 ls <span class="nv">$S3_BUCKET</span> <span class="p">|</span> awk <span class="s1">'{print $4}'</span> <span class="p">|</span> sort <span class="p">|</span> tail -n <span class="m">1</span><span class="k">)</span>
<span class="nb">echo</span> -e <span class="s2">"\nFound file </span><span class="nv">$LATEST_FILE</span><span class="s2"> in bucket </span><span class="nv">$S3_BUCKET</span><span class="s2">"</span>
<span class="c1"># Restore from the latest backup file</span>
<span class="nb">echo</span> -e <span class="s2">"\nRestoring </span><span class="nv">$PGDATABASE</span><span class="s2"> from </span><span class="nv">$LATEST_FILE</span><span class="s2">"</span>
<span class="nv">S3_TARGET</span><span class="o">=</span><span class="nv">$S3_BUCKET</span>/<span class="nv">$LATEST_FILE</span>
aws s3 cp <span class="nv">$S3_TARGET</span> - <span class="p">|</span> pg_restore --dbname <span class="nv">$PGDATABASE</span> --clean --no-owner
<span class="nb">echo</span> -e <span class="s2">"\nRestore completed"</span>
</code></pre></div>
<p>I've assumed that all the Postgres environment variables (<code>PGHOST</code>, etc) are already set elsewhere.</p>
<p>There are three tasks that are done in this script:</p>
<ul>
<li>finding the latest backup file in S3</li>
<li>downloading the backup file</li>
<li>restoring from the backup file</li>
</ul>
<p>So the first part of this script is finding the latest database backup file. The way we know which file is the latest is because of the Unix timestamp which we added to the filename. The first command we use is <code>aws s3 ls</code>, which shows us all the files in our backup bucket:</p>
<div class="highlight"><pre><span></span><code>aws s3 ls <span class="nv">$S3_BUCKET</span>
<span class="c1"># 2019-04-04 10:04:58 112309 postgres_mydatabase_1554372295.pgdump</span>
<span class="c1"># 2019-04-06 07:48:53 112622 postgres_mydatabase_1554536929.pgdump</span>
<span class="c1"># 2019-04-14 07:24:02 113484 postgres_mydatabase_1555226638.pgdump</span>
<span class="c1"># 2019-05-06 11:37:39 115805 postgres_mydatabase_1557142655.pgdump</span>
</code></pre></div>
<p>We then use <code>awk</code> to isolate the filename. <code>awk</code> is a text processing tool which I use occasionally, along with <code>cut</code> and <code>sed</code> to mangle streams of text into the shape I want. I hate them all, but they can be useful.</p>
<div class="highlight"><pre><span></span><code>aws s3 ls <span class="nv">$S3_BUCKET</span> <span class="p">|</span> awk <span class="s1">'{print $4}'</span>
<span class="c1"># postgres_mydatabase_1554372295.pgdump</span>
<span class="c1"># postgres_mydatabase_1554536929.pgdump</span>
<span class="c1"># postgres_mydatabase_1555226638.pgdump</span>
<span class="c1"># postgres_mydatabase_1557142655.pgdump</span>
</code></pre></div>
<p>We then run <code>sort</code> over this output to ensure that each line is sorted by the time. The aws CLI tool seems to sort this data by the uploaded time, but we want to use <em>our</em> timestamp, just in case a file was manually uploaded out-of-order:</p>
<div class="highlight"><pre><span></span><code>aws s3 ls <span class="nv">$S3_BUCKET</span> <span class="p">|</span> awk <span class="s1">'{print $4}'</span> <span class="p">|</span> sort
<span class="c1"># postgres_mydatabase_1554372295.pgdump</span>
<span class="c1"># postgres_mydatabase_1554536929.pgdump</span>
<span class="c1"># postgres_mydatabase_1555226638.pgdump</span>
<span class="c1"># postgres_mydatabase_1557142655.pgdump</span>
</code></pre></div>
<p>We use <code>tail</code> to grab the last line of the output:</p>
<div class="highlight"><pre><span></span><code>aws s3 ls <span class="nv">$S3_BUCKET</span> <span class="p">|</span> awk <span class="s1">'{print $4}'</span> <span class="p">|</span> sort <span class="p">|</span> tail -n <span class="m">1</span>
<span class="c1"># postgres_mydatabase_1557142655.pgdump</span>
</code></pre></div>
<p>And there's our filename! We use the <code>$()</code> <a href="http://www.tldp.org/LDP/abs/html/commandsub.html">command substitution</a> thingy to capture the command output and store it in a variable:</p>
<div class="highlight"><pre><span></span><code><span class="nv">LATEST_FILE</span><span class="o">=</span><span class="k">$(</span>aws s3 ls <span class="nv">$S3_BUCKET</span> <span class="p">|</span> awk <span class="s1">'{print $4}'</span> <span class="p">|</span> sort <span class="p">|</span> tail -n <span class="m">1</span><span class="k">)</span>
<span class="nb">echo</span> <span class="nv">$LATEST_FILE</span>
<span class="c1"># postgres_mydatabase_1557142655.pgdump</span>
</code></pre></div>
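<p>If you want to play with this pipeline without touching a real bucket, you can feed it a canned listing. This is just a sketch: <code>latest_backup</code> is a hypothetical helper that reads <code>aws s3 ls</code> style output from stdin, and the listing is deliberately out of order to show that <code>sort</code> fixes it:</p>

```shell
#!/bin/bash
# Sketch: the same pipeline run against a canned listing instead of a real bucket.
# latest_backup is a hypothetical helper that reads "aws s3 ls" style output from stdin.
latest_backup() {
    awk '{print $4}' | sort | tail -n 1
}

# Listing deliberately out of order to show that sort fixes it
latest_backup <<'EOF'
2019-05-06 11:37:39     115805 postgres_mydatabase_1557142655.pgdump
2019-04-04 10:04:58     112309 postgres_mydatabase_1554372295.pgdump
2019-04-14 07:24:02     113484 postgres_mydatabase_1555226638.pgdump
EOF
# postgres_mydatabase_1557142655.pgdump
```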
<p>And that's part one of our script done: find the latest backup file. Now we need to download that file and use it to restore our database. We use the <code>aws</code> CLI to copy the backup file from S3 and stream the bytes into stdout. This literally prints out your whole backup file into the terminal:</p>
<div class="highlight"><pre><span></span><code><span class="nv">S3_TARGET</span><span class="o">=</span><span class="nv">$S3_BUCKET</span>/<span class="nv">$LATEST_FILE</span>
aws s3 cp <span class="nv">$S3_TARGET</span> -
<span class="c1"># xtshirt9.5.199.5.19k0ENCODINENCODING</span>
<span class="c1"># SET client_encoding = 'UTF8';</span>
<span class="c1"># false00</span>
<span class="c1"># ... etc ...</span>
</code></pre></div>
<p>The <code>-</code> symbol is commonly used in shell scripting to mean "write to stdout". This isn't very useful on its own, but we can send that data to the <code>pg_restore</code> command via a pipe:</p>
<div class="highlight"><pre><span></span><code><span class="nv">S3_TARGET</span><span class="o">=</span><span class="nv">$S3_BUCKET</span>/<span class="nv">$LATEST_FILE</span>
aws s3 cp <span class="nv">$S3_TARGET</span> - <span class="p">|</span> pg_restore --dbname <span class="nv">$PGDATABASE</span> --clean --no-owner
</code></pre></div>
<p>And that's the whole script!</p>
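<p>The <code>-</code> trick isn't specific to the <code>aws</code> CLI. Plenty of tools accept <code>-</code> where a filename is expected, and depending on where it appears it can mean "write to stdout" or "read from stdin". Here's a sketch of the same streaming idea using <code>tar</code> as a stand-in, so you can try it without a bucket or a database. The function name <code>stream_listing</code> is made up for this example:</p>

```shell
#!/bin/bash
# Sketch: the "-" convention with tar, standing in for aws and pg_restore.
# stream_listing is a hypothetical helper for illustration.
stream_listing() {
    local dir="$1"
    # First tar writes the archive to stdout ("-f -" while creating),
    # second tar reads the archive from stdin ("-f -" while listing).
    tar -cf - -C "$dir" . | tar -tf -
}
```

<p>Running <code>stream_listing</code> on a folder lists its contents by way of an archive that never gets written to disk, in the same way that the restore script never saves the backup file locally.</p>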
<h3>Next steps</h3>
<p>Now you can set up automated backups for your Postgres database. Hopefully having these daily backups will take a weight off your mind. Don't forget to do a test restore every now and then, because backups are worthless if you aren't confident that they actually work.</p>
<p>If you want to learn more about the Unix shell tools I used in this post, then I recommend having a go at the <a href="https://overthewire.org/">Over the Wire Wargames</a>, which teaches you about bash scripting and hacking at the same time.</p>An introduction to cloud file storage2020-06-05T11:00:00+10:002020-06-05T11:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-06-05:/aws-s3-intro.html<p>Sometimes when you're running a web app you will find that you have a lot of files on your server. All these files will start to feel like a burden. You might worry about losing them all if the server fails, or you might be concerned about running out of …</p><p>Sometimes when you're running a web app you will find that you have a lot of files on your server. All these files will start to feel like a burden. You might worry about losing them all if the server fails, or you might be concerned about running out of disk space. You might even have multiple servers that all need to access these files.</p>
<p>Wouldn't it be nice if solving all these issues were someone else's problem? You would pay a few cents a month so that you never need to think about this again, right? I like using cloud object storage for hosting most of my web app's files and backups. If you haven't heard of "object storage" before: it's just a kind of cloud service where you can store a bunch of files. All major cloud providers offer this service:</p>
<ul>
<li>Amazon's AWS has the <a href="https://aws.amazon.com/s3/">Simple Storage Service (S3)</a></li>
<li>Microsoft's Azure has <a href="https://azure.microsoft.com/en-us/services/storage/">Storage</a></li>
<li>Google Cloud also has <a href="https://cloud.google.com/storage">Storage</a></li>
<li>DigitalOcean has <a href="https://www.digitalocean.com/products/spaces/">Spaces</a></li>
</ul>
<p>These object storage services are <em>very</em> cheap at around 2c/GB/month, you'll never run out of disk space, they're easy to access from command line tools and they have very fast upload/download speeds, especially to/from other services hosted with the same cloud provider. I use these services a lot: this blog is being served from AWS S3.</p>
<p>I like using S3 simply because I'm quite familiar with it, so that's what we're going to use for the rest of this post. The other services are probably great as well. This video will take you through how to get started with AWS S3.</p>
<div class="yt-embed">
<iframe
src="https://www.youtube.com/embed/b-icwbsGZkc"
frameborder="0"
allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
>
</iframe>
</div>
<p>As an update to this video: AWS also ships a self-contained CLI tool that doesn't need to be installed in a virtual environment, which you can read about <a href="https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2-linux.html">here</a>. Eg:</p>
<div class="highlight"><pre><span></span><code><span class="nv">URL</span><span class="o">=</span><span class="s2">"https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip"</span>
curl <span class="nv">$URL</span> -o <span class="s2">"awscliv2.zip"</span>
unzip awscliv2.zip
sudo ./aws/install
aws --version
</code></pre></div>
<p>One great use-case for object storage like AWS S3 is hosting your <a href="https://mattsegal.dev/postgres-backup-automate.html">database backups</a>.</p>How to backup and restore a Postgres database2020-06-04T12:00:00+10:002020-06-04T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-06-04:/postgres-backup-and-restore.html<p>You've deployed your Django web app to the internet. Grats! Now you have a fun new problem: your app's database is full of precious "live" data, and if you lose that data, it's gone forever. If your database gets blown away or corrupted, then you will need backups to …</p><p>You've deployed your Django web app to the internet. Grats! Now you have a fun new problem: your app's database is full of precious "live" data, and if you lose that data, it's gone forever. If your database gets blown away or corrupted, then you will need backups to restore your data. This post will go over how to back up and restore PostgreSQL, which is the database most commonly deployed with Django.</p>
<p>Not everyone needs backups. If your Django app is just a hobby project then losing all your data might not be such a big deal. That said, if your app is a critical part of a business, then losing your app's data could literally mean the end of the business - people losing their jobs and going bankrupt. So, at least some of the time, you don't want to lose all your data.</p>
<p>The good news is that backing up and restoring Postgres is pretty easy. You only need two commands: <code>pg_dump</code> and <code>pg_restore</code>. If you're using MySQL instead of Postgres, then you can do something very similar to the instructions in this post using <a href="https://dev.mysql.com/doc/refman/8.0/en/mysqldump.html"><code>mysqldump</code></a>.</p>
<h3>Taking database backups</h3>
<p>I'm going to assume that you've already got a Postgres database running somewhere. You'll need to run the following code from a <code>bash</code> shell on a Linux machine that can access the database. In this example, let's say you're logged into the database server with <code>ssh</code>.</p>
<p>The first thing to do is set some <a href="https://www.postgresql.org/docs/current/libpq-envars.html">Postgres-specific environment variables</a> to specify your target database and login credentials. This is mostly for our convenience later on.</p>
<div class="highlight"><pre><span></span><code><span class="c1"># The server Postgres is running on</span>
<span class="nb">export</span> <span class="nv">PGHOST</span><span class="o">=</span>localhost
<span class="c1"># The port Postgres is listening on</span>
<span class="nb">export</span> <span class="nv">PGPORT</span><span class="o">=</span><span class="m">5432</span>
<span class="c1"># The database you want to back up</span>
<span class="nb">export</span> <span class="nv">PGDATABASE</span><span class="o">=</span>mydatabase
<span class="c1"># The database user you are logging in as</span>
<span class="nb">export</span> <span class="nv">PGUSER</span><span class="o">=</span>myusername
<span class="c1"># The database user's password</span>
<span class="nb">export</span> <span class="nv">PGPASSWORD</span><span class="o">=</span>mypassw0rd
</code></pre></div>
<p>You can test these environment variables by running a <a href="https://www.postgresql.org/docs/current/app-psql.html"><code>psql</code></a> command to list all the tables in your app's database.</p>
<div class="highlight"><pre><span></span><code>psql -c <span class="s2">"\dt"</span>
<span class="c1"># Output:</span>
<span class="c1"># List of relations</span>
<span class="c1"># Schema | Name | Type | Owner</span>
<span class="c1">#--------+---------------+-------+--------</span>
<span class="c1"># public | auth_group | table | myusername</span>
<span class="c1"># public | auth_group... | table | myusername</span>
<span class="c1"># public | auth_permi... | table | myusername</span>
<span class="c1"># public | django_adm... | table | myusername</span>
<span class="c1"># .. etc ..</span>
</code></pre></div>
<p>If <code>psql</code> is missing you can install it on Ubuntu or Debian using <code>apt</code>:</p>
<div class="highlight"><pre><span></span><code>sudo apt install postgresql-client
</code></pre></div>
<p>Now we're ready to create a database dump with <a href="https://www.postgresql.org/docs/12/app-pgdump.html"><code>pg_dump</code></a>. It's pretty simple to use because we set up those environment variables earlier. When you run <code>pg_dump</code>, it just spits out a bunch of SQL statements as hundreds, or even thousands of lines of text. You can take a look at the output using <code>head</code> to view the first 10 lines of text:</p>
<div class="highlight"><pre><span></span><code>pg_dump <span class="p">|</span> head
<span class="c1"># Output:</span>
<span class="c1"># --</span>
<span class="c1"># -- PostgreSQL database dump</span>
<span class="c1"># --</span>
<span class="c1"># -- Dumped from database version 9.5.19</span>
<span class="c1"># -- Dumped by pg_dump version 9.5.19</span>
<span class="c1"># SET statement_timeout = 0;</span>
<span class="c1"># SET lock_timeout = 0;</span>
<span class="c1"># SET client_encoding = 'UTF8';</span>
</code></pre></div>
<p>The SQL statements produced by <code>pg_dump</code> are instructions on how to re-create your database. You can turn this output into a backup by writing all this SQL text into a file:</p>
<div class="highlight"><pre><span></span><code>pg_dump > mybackup.sql
</code></pre></div>
<p>That's it! You now have a database backup. You might have noticed that storing all your data as SQL statements is rather inefficient. You can compress this data by using the "custom" dump format:</p>
<div class="highlight"><pre><span></span><code>pg_dump --format<span class="o">=</span>custom > mybackup.pgdump
</code></pre></div>
<p>This "custom" format is ~3x smaller in terms of file size, but it's not as pretty for humans to read because it's now in some funky non-text binary format:</p>
<div class="highlight"><pre><span></span><code>pg_dump --format<span class="o">=</span>custom <span class="p">|</span> head
<span class="c1"># Output:</span>
<span class="c1"># xtshirt9.5.199.5.19k0ENCODINENCODING</span>
<span class="c1"># SET client_encoding = 'UTF8';</span>
<span class="c1"># false00</span>
<span class="c1"># ... etc ...</span>
</code></pre></div>
<p>Finally, <code>mybackup.pgdump</code> is a crappy file name. It's not clear what is inside the file. Are we going to remember which database this is for? How do we know that this is the freshest copy? Let's add a <a href="https://en.wikipedia.org/wiki/Unix_time">timestamp</a> plus a descriptive name to help us remember:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># Get Unix epoch timestamp</span>
<span class="c1"># Eg. 1591255548</span>
<span class="nv">TIME</span><span class="o">=</span><span class="k">$(</span>date <span class="s2">"+%s"</span><span class="k">)</span>
<span class="c1"># Descriptive file name</span>
<span class="c1"># Eg. postgres_mydatabase_1591255548.pgdump</span>
<span class="nv">BACKUP_FILE</span><span class="o">=</span><span class="s2">"postgres_</span><span class="si">${</span><span class="nv">PGDATABASE</span><span class="si">}</span><span class="s2">_</span><span class="si">${</span><span class="nv">TIME</span><span class="si">}</span><span class="s2">.pgdump"</span>
pg_dump --format<span class="o">=</span>custom > <span class="nv">$BACKUP_FILE</span>
</code></pre></div>
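<p>The timestamp in the file name also answers the "freshest copy" question, because Unix timestamps sort chronologically. Here is a minimal Python sketch of that idea, assuming your backup files follow the naming scheme above. The <code>newest_backup</code> helper is a made-up name for illustration, not part of any Postgres tooling.</p>

```python
import re

# Matches names like postgres_mydatabase_1591255548.pgdump
BACKUP_PATTERN = re.compile(r"^postgres_(?P<db>.+)_(?P<time>\d+)\.pgdump$")


def newest_backup(filenames, database):
    """Return the most recent backup file name for `database`, or None."""
    candidates = []
    for name in filenames:
        match = BACKUP_PATTERN.match(name)
        if match and match.group("db") == database:
            # Compare timestamps as integers, not as strings
            candidates.append((int(match.group("time")), name))
    if not candidates:
        return None
    return max(candidates)[1]


files = [
    "postgres_mydatabase_1591255548.pgdump",
    "postgres_mydatabase_1591341948.pgdump",
    "postgres_otherdb_1599999999.pgdump",
    "notes.txt",
]
print(newest_backup(files, "mydatabase"))
```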
<p>Now you can run these commands every month, week, or day to get a snapshot of your data. If you wanted, you could write this whole thing into a <code>bash</code> script called <code>backup.sh</code>:</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/bin/bash</span>
<span class="c1"># Backs up mydatabase to a file.</span>
<span class="nb">export</span> <span class="nv">PGHOST</span><span class="o">=</span>localhost
<span class="nb">export</span> <span class="nv">PGPORT</span><span class="o">=</span><span class="m">5432</span>
<span class="nb">export</span> <span class="nv">PGDATABASE</span><span class="o">=</span>mydatabase
<span class="nb">export</span> <span class="nv">PGUSER</span><span class="o">=</span>myusername
<span class="nb">export</span> <span class="nv">PGPASSWORD</span><span class="o">=</span>mypassw0rd
<span class="nv">TIME</span><span class="o">=</span><span class="k">$(</span>date <span class="s2">"+%s"</span><span class="k">)</span>
<span class="nv">BACKUP_FILE</span><span class="o">=</span><span class="s2">"postgres_</span><span class="si">${</span><span class="nv">PGDATABASE</span><span class="si">}</span><span class="s2">_</span><span class="si">${</span><span class="nv">TIME</span><span class="si">}</span><span class="s2">.pgdump"</span>
<span class="nb">echo</span> <span class="s2">"Backing up </span><span class="nv">$PGDATABASE</span><span class="s2"> to </span><span class="nv">$BACKUP_FILE</span><span class="s2">"</span>
pg_dump --format<span class="o">=</span>custom > <span class="nv">$BACKUP_FILE</span>
<span class="nb">echo</span> <span class="s2">"Backup completed"</span>
</code></pre></div>
<p>You should avoid hardcoding passwords like I just did above. It's better to pass credentials in as a script argument or environment variable. The file <code>/etc/environment</code> is a nice place to store these kinds of credentials on a secure server.</p>
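<p>If you'd rather drive the backup from Python (say, from a Django management command), here is a rough sketch of the same steps that reads credentials from the environment instead of hardcoding them. The function names <code>build_backup_command</code> and <code>run_backup</code> are hypothetical; the only real tool involved is <code>pg_dump</code> with its <code>--format</code> and <code>--file</code> options.</p>

```python
import subprocess
import time

REQUIRED_VARS = ["PGHOST", "PGPORT", "PGDATABASE", "PGUSER", "PGPASSWORD"]


def build_backup_command(env, now=None):
    """Return (backup_file, argv) for a custom-format pg_dump."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    timestamp = int(now if now is not None else time.time())
    backup_file = f"postgres_{env['PGDATABASE']}_{timestamp}.pgdump"
    # pg_dump reads the PG* variables itself, so no credentials appear in argv
    argv = ["pg_dump", "--format=custom", f"--file={backup_file}"]
    return backup_file, argv


def run_backup(env):
    """Run pg_dump with the given environment mapping."""
    backup_file, argv = build_backup_command(env)
    print(f"Backing up {env['PGDATABASE']} to {backup_file}")
    # check=True raises CalledProcessError if pg_dump exits with an error
    subprocess.run(argv, env=dict(env), check=True)
    print("Backup completed")
    return backup_file
```

<p>You would call this as <code>run_backup(os.environ)</code> after importing <code>os</code>, on a machine where <code>pg_dump</code> is installed and the <code>PG*</code> variables are already set.</p>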
<h3>Restoring your database from backups</h3>
<p>It's pointless creating backups if you don't know how to use them to restore your data. There are three scenarios that I can think of where you want to run a restore:</p>
<ul>
<li>You need to set up your database from scratch</li>
<li>You want to roll back your existing database to a previous time</li>
<li>You want to restore data in your dev environment</li>
</ul>
<p>I'll go over these scenarios one at a time.</p>
<h3>Restoring from scratch</h3>
<p>Sometimes you can lose the database server and there is nothing left. Maybe you deleted it by accident, thinking it was a different server. Luckily you have your database backup file, and hopefully some <a href="https://mattsegal.dev/intro-config-management.html">automated configuration management</a> to help you quickly set the server up again.</p>
<p>Once you've got the new server provisioned and PostgreSQL installed, you'll need to recreate the database and the user who owns it:</p>
<div class="highlight"><pre><span></span><code>sudo -u postgres psql <span class="s"><<-EOF</span>
<span class="s"> CREATE USER $PGUSER WITH PASSWORD '$PGPASSWORD';</span>
<span class="s"> CREATE DATABASE $PGDATABASE WITH OWNER $PGUSER;</span>
<span class="s">EOF</span>
</code></pre></div>
<p>Then you can set up the same environment variables that we did earlier (PGHOST, etc.) and then use <a href="https://www.postgresql.org/docs/12/app-pgrestore.html"><code>pg_restore</code></a> to restore your data.
You'll probably see some warning errors, which is normal.</p>
<div class="highlight"><pre><span></span><code><span class="nv">BACKUP_FILE</span><span class="o">=</span>postgres_mydatabase_1591255548.pgdump
pg_restore --dbname <span class="nv">$PGDATABASE</span> <span class="nv">$BACKUP_FILE</span>
<span class="c1"># Output:</span>
<span class="c1"># ... lots of errors ...</span>
<span class="c1"># pg_restore: WARNING: no privileges were granted for "public"</span>
<span class="c1"># WARNING: errors ignored on restore: 1</span>
</code></pre></div>
<p>I'm not 100% on what all these errors mean, but I believe they're mostly related to the restore script trying to modify Postgres objects that your user does not have permission to modify. If you're using a standard Django app this shouldn't be an issue. You can check that the restore actually worked by checking your tables with <code>psql</code>:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># Check the tables</span>
psql -c <span class="s2">"\dt"</span>
<span class="c1"># Output:</span>
<span class="c1"># List of relations</span>
<span class="c1"># Schema | Name | Type | Owner</span>
<span class="c1">#--------+---------------+-------+--------</span>
<span class="c1"># public | auth_group | table | myusername</span>
<span class="c1"># public | auth_group... | table | myusername</span>
<span class="c1"># public | auth_permi... | table | myusername</span>
<span class="c1"># public | django_adm... | table | myusername</span>
<span class="c1"># .. etc ..</span>
<span class="c1"># Check the last migration</span>
psql -c <span class="s2">"SELECT * FROM django_migrations ORDER BY id DESC LIMIT 1"</span>
<span class="c1"># Output:</span>
<span class="c1"># id | app | name | applied</span>
<span class="c1"># ----+--------+-----------+---------------</span>
<span class="c1"># 20 | tshirt | 0003_a... | 2019-08-26...</span>
</code></pre></div>
<p>There you go! Your database has been restored. Crisis averted.</p>
<h3>Rolling back an existing database</h3>
<p>If you want to roll your existing database back to a previous point in time, deleting all new data, then you will need to use the <code>--clean</code> flag, which drops the existing database objects before re-creating them from the backup (<a href="https://www.postgresql.org/docs/12/app-pgrestore.html">docs here</a>):</p>
<div class="highlight"><pre><span></span><code><span class="nv">BACKUP_FILE</span><span class="o">=</span>postgres_mydatabase_1591255548.pgdump
pg_restore --clean --dbname <span class="nv">$PGDATABASE</span> <span class="nv">$BACKUP_FILE</span>
</code></pre></div>
<h3>Restoring a dev environment</h3>
<p>It's often beneficial to restore a testing or development database from a known backup.
When you do this, you're not so worried about setting up the right user permissions.
In this case you want to completely destroy and re-create the database to get a completely fresh start, and you want to use the <code>--no-owner</code> flag to ignore any database-user related stuff in the restore script:</p>
<div class="highlight"><pre><span></span><code>sudo -u postgres psql -c <span class="s2">"DROP DATABASE </span><span class="nv">$PGDATABASE</span><span class="s2">"</span>
sudo -u postgres psql -c <span class="s2">"CREATE DATABASE </span><span class="nv">$PGDATABASE</span><span class="s2">"</span>
<span class="nv">BACKUP_FILE</span><span class="o">=</span>postgres_mydatabase_1591255548.pgdump
pg_restore --no-owner --dbname <span class="nv">$PGDATABASE</span> <span class="nv">$BACKUP_FILE</span>
</code></pre></div>
<p>I use this method quite often to pull non-sensitive data down from production environments to try and reproduce bugs that have occurred in prod. It's much easier to fix mysterious bugs when you have regular database backups, <a href="https://mattsegal.dev/sentry-for-django-error-monitoring.html">error reporting</a> and <a href="https://mattsegal.dev/django-logging-papertrail.html">centralized logging</a>.</p>
<h3>Next steps</h3>
<p>I hope you now have the tools you need to back up and restore your Django app's Postgres database. If you want to read more, the <a href="https://www.postgresql.org/docs/12/index.html">Postgres docs</a> have a good section on <a href="https://www.postgresql.org/docs/12/backup-dump.html">database backups</a>.</p>
<p>Once you've got your head around database backups, you should automate the process to make it more reliable. I will show you how to do this in <a href="https://mattsegal.dev/postgres-backup-automate.html">this follow-up post</a>.</p>A tour of Django server setups2020-05-25T12:00:00+10:002020-05-25T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-05-25:/django-prod-architectures.html<p>If you haven't deployed a lot of Django apps, then you might wonder:
how do professionals put Django apps on the internet? What does Django typically look like when it's running in production?
You might even be thinking <em>what the hell is <a href="https://www.techopedia.com/definition/8989/production-environment">production</a>?</em></p>
<p>Before I started working as a developer there was just a fuzzy cloud in my head where the knowledge of production infrastructure should be.
If there's a fuzzy cloud in your head, let's fix it.
There are many ways to extend a Django server setup to achieve better performance, cost-effectiveness and reliability.
This post will take you on a tour of some common Django server setups, from the most simple and basic to the more complex and powerful.
I hope it will build up your mental model of how Django is hosted in production, piece-by-piece.</p>
<h2>Your local machine</h2>
<p>Let's start by reviewing a Django setup that you are already familiar with: your local machine.
Going over this will be a warm-up for later sections.
When you run Django locally, you have:</p>
<ul>
<li>Your web browser (Chrome, Safari, Firefox, etc)</li>
<li>Django running with the runserver management command</li>
<li>A SQLite database sitting in your project folder</li>
</ul>
<p><img alt="local server setup" src="https://mattsegal.dev/django-prod-architecture/local-server.png"></p>
<p>Pretty simple right? Next let's look at something similar, but deployed to a web server.</p>
<h2>Simplest possible webserver</h2>
<p>The simplest Django web server you can set up is very similar to your local dev environment.
Most professional Django devs don't use a basic setup like this for their production environments. It works perfectly fine, but it has some limitations that we'll discuss later.
It looks like this:</p>
<p><img alt="simple server setup" src="https://mattsegal.dev/django-prod-architecture/simple-server.png"></p>
<p>Typically people run Django on a Linux virtual machine, often using the Ubuntu distribution.
The virtual machine is hosted by a cloud provider like <a href="https://aws.amazon.com/">Amazon</a>, <a href="https://cloud.google.com/gcp/">Google</a>, <a href="https://azure.microsoft.com/en-au/">Azure</a>, <a href="https://www.digitalocean.com/">DigitalOcean</a> or <a href="https://www.linode.com/">Linode</a>.</p>
<p>Instead of using runserver, you should use a WSGI server like <a href="https://gunicorn.org/">Gunicorn</a> to run your Django app.
I go into more detail on why you shouldn't use runserver in production, and explain WSGI <a href="https://mattsegal.dev/simple-django-deployment-2.html#wsgi">here</a>.
Otherwise, not that much is different from your local machine: you can still use SQLite as the database (<a href="https://mattsegal.dev/simple-django-deployment-2.html#sqlite">more here</a>).</p>
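<p>To make "WSGI server" a little more concrete: Gunicorn imports a Python callable and calls it once per request. Django generates this callable for you in your project's <code>wsgi.py</code>, so the hand-written <code>application</code> below is only a toy stand-in to show the shape of the interface, not something you'd deploy.</p>

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A toy WSGI app with the same interface that Django's wsgi.py exposes."""
    body = f"Hello from {environ['PATH_INFO']}".encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


# Simulate what a WSGI server does for a single request
environ = {}
setup_testing_defaults(environ)  # fills in a fake GET request for "/"
captured = {}


def start_response(status, headers):
    captured["status"] = status
    captured["headers"] = headers


response_body = b"".join(application(environ, start_response))
print(captured["status"], response_body)
```

<p>With Gunicorn you point at the callable by module path, for example <code>gunicorn myproject.wsgi:application</code>, where <code>myproject</code> is a placeholder for your own Django project name.</p>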
<p>This is the bare bones of the setup. There are a few other details that you'll need to manage, like <a href="https://mattsegal.dev/dns-for-noobs.html">setting up DNS</a>, creating virtual environments, babysitting Gunicorn with a process supervisor like <a href="https://mattsegal.dev/simple-django-deployment-4.html">Supervisord</a> and serving static files with <a href="http://whitenoise.evans.io/en/stable/">Whitenoise</a>. If you're interested in a more complete guide on how to set up a simple server like this, I wrote <a href="https://mattsegal.dev/simple-django-deployment.html">a guide</a> that explains how to deploy Django.</p>
<h2>Typical standalone webserver</h2>
<p>Let's go over an environment that a professional Django dev might set up in production when using a single server.
It's not the exact setup that everyone will always use, but the structure is very common.</p>
<p><img alt="typical server setup" src="https://mattsegal.dev/django-prod-architecture/typical-server.png"></p>
<p>Some things are the same as the simple setup above: it's still a Linux virtual machine with Django being run by Gunicorn.
There are three main differences:</p>
<ul>
<li>SQLite has been replaced by a different database, <a href="https://www.postgresql.org/">PostgreSQL</a></li>
<li>An <a href="https://www.nginx.com/">NGINX</a> web server is now sitting in front of Gunicorn in a <a href="https://www.nginx.com/resources/glossary/reverse-proxy-server/">reverse-proxy</a> setup</li>
<li>Static files are now being served from outside of Django</li>
</ul>
<p>Why did we swap SQLite for PostgreSQL? In general Postgres is a little more advanced and full-featured. For example, Postgres can handle multiple writes at the same
time, while SQLite can't.</p>
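<p>On the Django side the swap is mostly a settings change. Here is a minimal sketch of the <code>DATABASES</code> setting, assuming you have a Postgres driver such as <code>psycopg2</code> installed. The environment variable names and fallback values are just assumptions for this example: Django does not read them on its own, which is why the settings file looks them up explicitly.</p>

```python
import os

# settings.py fragment: swap SQLite for PostgreSQL
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        # Placeholder defaults; real values should come from the environment
        "NAME": os.environ.get("PGDATABASE", "mydatabase"),
        "USER": os.environ.get("PGUSER", "myusername"),
        "PASSWORD": os.environ.get("PGPASSWORD", ""),
        "HOST": os.environ.get("PGHOST", "localhost"),
        "PORT": os.environ.get("PGPORT", "5432"),
    }
}
```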
<p>Why did we add NGINX to our setup? NGINX is a dedicated webserver which provides extra features and performance improvements
over just using Gunicorn to serve web requests. For example we can use NGINX to directly serve our app's static and media files more efficiently. NGINX can also be configured to do a lot of other useful things, like encrypt your web traffic using HTTPS and compress your files to make your site faster. NGINX is the web server that is most commonly combined with Django, but there are also alternatives like the <a href="https://httpd.apache.org/">Apache HTTP server</a> and <a href="https://docs.traefik.io/">Traefik</a>.</p>
<p>It's important to note that everything here lives on a single server, which means that if the server goes away, so does all your data, <a href="https://mattsegal.dev/postgres-backup-and-restore.html">unless you have backups</a>.
This data includes your Django tables, which are stored in Postgres, and files uploaded by users, which will be stored in the <a href="https://docs.djangoproject.com/en/3.0/ref/settings/#media-root">MEDIA_ROOT</a> folder, somewhere on your filesystem. Having only one server also means that if your server restarts or shuts off, so does your website. This is OK for smaller projects, but it's not acceptable for big sites like StackOverflow or Instagram, where the cost of downtime is very high.</p>
<h2>Single webserver with multiple apps</h2>
<p>Once you start using NGINX and PostgreSQL, you can run multiple Django apps on the same machine.
You can save money on hosting fees by packing multiple apps onto a single server rather than paying for a separate server for each app. This setup also allows you to re-use some of the services
and configurations that you've already set up.</p>
<p>NGINX is able to route incoming HTTP requests to different apps based on the domain name, and Postgres can host multiple databases on a single machine.
For example, I use a single server to host some of my personal Django projects: <a href="http://mattslinks.xyz/">Matt's Links</a>, <a href="http://memories.ninja/">Memories Ninja</a> and <a href="https://www.blogreader.com.au/">Blog Reader</a>.</p>
<p><img alt="multi-app server setup" src="https://mattsegal.dev/django-prod-architecture/multi-app-server.png"></p>
<p>I've omitted the static files for simplicity. Note that having multiple apps on one server saves you hosting costs, but there are downsides: restarting the server restarts all of your apps.</p>
<h3 id="worker">Single webserver with a worker</h3>
<p>Some web apps need to do things other than just <a href="https://www.codecademy.com/articles/what-is-crud">CRUD</a>. For example, my website <a href="https://www.blogreader.com.au/">Blog Reader</a> needs to scrape <a href="https://slatestarcodex.com/2020/04/24/employer-provided-health-insurance-delenda-est/">text</a> from a website and then send it to an Amazon API to be translated into <a href="https://media.blogreader.com.au/media/043dcf9fe4c1df539468000cb97af1d7.mp3">audio files</a>. Another common example is "thumbnailing", where you upload a huge 5MB image file to Facebook and they downsize it into a crappy 120kB JPEG. These kinds of tasks do not happen inside a Django view, because they take too long to run. Instead they have to happen "offline", in a separate worker process, using tools like <a href="http://www.celeryproject.org/">Celery</a>, <a href="https://huey.readthedocs.io/en/latest/django.html">Huey</a>, <a href="https://github.com/rq/django-rq">Django-RQ</a> or <a href="https://django-q.readthedocs.io/en/latest/">Django-Q</a>. All these tools provide you with a way to run tasks outside of Django views and do more complicated things, like co-ordinate multiple tasks and run them on schedules.</p>
<p>All of these tools follow a similar pattern: tasks are dispatched by Django and put in a queue where they wait to be executed. This queue is managed by a service called a "broker", which keeps track of all the tasks that need to be done. Common brokers for Django tasks are Redis and RabbitMQ. A worker process, which uses the same codebase as your Django app, pulls tasks out of the broker and runs them.</p>
<p><img alt="worker server setup" src="https://mattsegal.dev/django-prod-architecture/worker-server.png"></p>
<p>If you haven't worked with task queues before then it's not immediately obvious how this all works, so let me give an example. You want to upload a 2MB <a href="https://memories-ninja-prod.s3-ap-southeast-2.amazonaws.com/original/7e26334177b6ee7d5ab4c21f7149190e.jpeg">photo of your breakfast</a> from your phone to a Django site. To optimise image loading performance, the Django site will turn that 2MB photo upload into a 70kB <a href="https://memories-ninja-prod.s3.amazonaws.com/display/7e26334177b6ee7d5ab4c21f7149190e.jpeg">display image</a> and a smaller <a href="https://memories-ninja-prod.s3.amazonaws.com/thumbnail/7e26334177b6ee7d5ab4c21f7149190e.jpeg">thumbnail image</a>. So this is what happens:</p>
<ul>
<li>A user uploads a photo to a Django view, which saves the original photo to the filesystem and updates the database to show that the file has been received</li>
<li>The view also pushes a thumbnailing task to the task broker</li>
<li>The broker receives the task and puts it in a queue, where it waits to be executed</li>
<li>The worker asks the broker for the next task and the broker sends the thumbnailing task</li>
<li>The worker reads the task description and runs some Python function, which reads the original image from the filesystem, creates the smaller thumbnail images, saves them and then updates the database to show that the thumbnailing is complete</li>
</ul>
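<p>The steps above can be sketched with nothing but Python's standard library. This is a toy model, not how Celery or Django-Q are implemented: a <code>queue.Queue</code> plays the broker, a background thread plays the worker, a dict plays the database, and <code>make_thumbnail</code> is a placeholder for the real image resizing.</p>

```python
import queue
import threading

broker = queue.Queue()  # stands in for Redis or RabbitMQ
database = {}           # stands in for the photo table


def make_thumbnail(photo_id):
    """Placeholder for the slow image-resizing work."""
    database[photo_id] = "thumbnailed"


def upload_view(photo_id):
    """The 'Django view': record the upload, enqueue the task, return quickly."""
    database[photo_id] = "received"
    broker.put(("make_thumbnail", photo_id))
    return "upload accepted"


def worker():
    """The worker: pull tasks from the broker and run them one by one."""
    tasks = {"make_thumbnail": make_thumbnail}
    while True:
        task_name, photo_id = broker.get()
        try:
            tasks[task_name](photo_id)
        finally:
            broker.task_done()


threading.Thread(target=worker, daemon=True).start()

print(upload_view("breakfast.jpeg"))  # returns without waiting for the thumbnail
broker.join()                          # wait until the worker has emptied the queue
print(database["breakfast.jpeg"])
```

<p>The important part is that <code>upload_view</code> never calls <code>make_thumbnail</code> itself. It only describes the work and hands it to the broker, which is why the web request stays fast.</p>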
<p>If you want to learn more about this stuff, I've written guides for getting started with <a href="https://mattsegal.dev/offline-tasks.html">offline tasks</a> and <a href="https://mattsegal.dev/simple-scheduled-tasks.html">scheduled tasks</a> with Django Q.</p>
<h2>Single webserver with a cache</h2>
<p>Sometimes you'll want to <a href="https://docs.djangoproject.com/en/3.0/topics/cache/">use a cache</a> to store data for a short time. For example, caches are commonly used when you have some data that was expensive to pull from the database or an API and you want to re-use it for a little while. <a href="https://redis.io/">Redis</a> and <a href="https://en.wikipedia.org/wiki/Memcached">Memcached</a> are both popular cache services that are used in production with Django. It's not a very complicated setup.</p>
<p><img alt="cache on server setup" src="https://mattsegal.dev/django-prod-architecture/cache-on-server.png"></p>
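<p>The usual pattern is "check the cache, otherwise do the expensive thing and remember the result". Below is a self-contained sketch of that pattern using a tiny in-memory cache with expiry. <code>TinyCache</code> is a stand-in written for illustration, not Django's cache backend. In a real project you'd use Django's own cache API, which offers similar <code>get</code> and <code>set</code> operations backed by Redis or Memcached.</p>

```python
import time


class TinyCache:
    """Toy in-memory cache with per-key expiry (illustration only)."""

    def __init__(self):
        self._store = {}

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: behave as if it was never cached
            return default
        return value

    def set(self, key, value, timeout):
        self._store[key] = (value, time.monotonic() + timeout)


cache = TinyCache()
calls = []


def expensive_query():
    calls.append(1)  # count how often we hit the "database"
    return {"total_users": 1234}  # made-up result


def get_stats():
    stats = cache.get("stats")
    if stats is None:
        stats = expensive_query()
        cache.set("stats", stats, timeout=60)  # re-use for 60 seconds
    return stats


get_stats()
get_stats()
print(len(calls))  # the expensive query only ran once
```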
<h2>Single webserver with Docker</h2>
<p>If you've heard of <a href="https://www.docker.com/">Docker</a> before you might be wondering where it factors into these setups.
It's a great tool for creating consistent programming environments, but it doesn't actually change how any of this works too much.
Most of the setups I've described would work basically the same way... except everything is inside a Docker container.</p>
<p>For example, if you were running multiple Django apps on one server and you wanted to use Docker containers, then
you might do something like this using <a href="https://docs.docker.com/engine/swarm/">Docker Swarm</a>:</p>
<p><img alt="docker on server setup" src="https://mattsegal.dev/django-prod-architecture/swarm-server.png"></p>
<p>As you can see it's not such a different structure compared to what we were doing before Docker.
The containers are just wrappers around the services that we were already running.
Putting things inside of Docker containers doesn't really change how all the services talk to each other.
If you really wanted to you could wrap Docker containers around more things like NGINX, the database, a Redis cache, whatever.
This is why I think it's valuable to learn how to deploy Django without Docker first.
That said, you can do some more complicated setups with Docker containers, which we'll get into later.</p>
<h2>External services</h2>
<p>So far I've been showing you server setups with just one virtual machine running Ubuntu.
This is the simplest setup that you can use, but it has limitations: there are some things that
you might need that a single server can't give you. In this section I'm going to walk you through
how we can break apart our single server into more advanced setups.</p>
<p>If you've studied programming you might have read about <a href="https://en.wikipedia.org/wiki/Separation_of_concerns">separation of concerns</a>, the
<a href="https://en.wikipedia.org/wiki/Single-responsibility_principle">single responsibility principle</a> and
<a href="https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller">model-view-controller (MVC)</a>.
A lot of the changes that we're going to make will have a similar kind of vibe: we're going to split up our services
into smaller, more specialised units, based on their "responsibilities".
We're going to pull apart our services bit-by-bit until there's nothing left.
Just a note: you might not need to do this for your services, this is just an overview of what you <em>could</em> do.</p>
<h2>External services - database</h2>
<p>The first thing you'd want to pull off of our server is the database. This involves putting PostgreSQL onto its own virtual machine.
You can set this up yourself or pay a little extra for an off-the-shelf service like <a href="https://aws.amazon.com/rds/">Amazon RDS</a>.</p>
<p><img alt="postgres on server setup" src="https://mattsegal.dev/django-prod-architecture/postgres-external.png"></p>
<p>There are a couple of reasons that you'd want to put the database on its own server:</p>
<ul>
<li>You might have multiple apps on different servers that depend on the same database</li>
<li>Your database performance will not be impacted by "noisy neighbours" eating up CPU, RAM or disk space on the same machine</li>
<li>You've moved your precious database away from your Django web server, which means you can delete and re-create your Django app's server with less concern</li>
<li><em>mumble mumble security mumble</em></li>
</ul>
<p>Using an off-the-shelf option like AWS RDS is attractive because it reduces the amount of admin work that you need to run your database server.
If you're a backend web developer with a lot of work to do and more money than time then this is a good move.</p>
<h2>External services - object storage</h2>
<p>It is common to push file storage off the web server into "object storage", which is basically a filesystem behind a nice API. This is often done using <a href="https://django-storages.readthedocs.io/en/latest/">django-storages</a>, which I enjoy using. Object storage is usually used for user-uploaded "media" such as documents, photos and videos. I use AWS S3 (Simple Storage Service) for this, but every big cloud hosting provider has some sort of "object storage" offering.</p>
<p><img alt="AWS S3 setup" src="https://mattsegal.dev/django-prod-architecture/files-external-revised.png"></p>
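<p>For a rough idea of what this looks like with django-storages and S3, here is a settings sketch. The setting names follow the django-storages S3 backend as it worked in the Django 3.x era (newer Django versions configure storage through the <code>STORAGES</code> setting instead), and the bucket name and region are placeholders, so check the django-storages docs before copying it.</p>

```python
import os

# settings.py fragment: send user-uploaded media to S3 via django-storages
DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
AWS_STORAGE_BUCKET_NAME = "my-app-media"  # placeholder bucket name
AWS_S3_REGION_NAME = "ap-southeast-2"     # placeholder region
# Credentials come from the environment rather than being hardcoded
AWS_ACCESS_KEY_ID = os.environ.get("AWS_ACCESS_KEY_ID")
AWS_SECRET_ACCESS_KEY = os.environ.get("AWS_SECRET_ACCESS_KEY")
```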
<p>There are a few reasons why this is a good idea:</p>
<ul>
<li>You've moved all of your app's state (files, database) off of your server, so now you can move, destroy and re-create the Django server with no data loss</li>
<li>File downloads hit the object storage service, rather than your server, meaning you can scale your file downloads more easily</li>
<li>You don't need to worry about any filesystem admin, like running out of disk space</li>
<li>Multiple servers can easily share the same set of files</li>
</ul>
<p>Hopefully you see a theme here, we're taking shit we don't care about and making it someone else's problem.
Paying someone else to do the work of managing our files and database leaves us more free time to work on more important things.</p>
<h2>External services - web server</h2>
<p>You can also run your "web server" (NGINX) on a different virtual machine to your "app server" (Gunicorn + Django):</p>
<p><img alt="nginx external setup" src="https://mattsegal.dev/django-prod-architecture/nginx-1-external.png"></p>
<p>This seems kind of pointless though, why would you bother? Well, for one, you might have multiple identical app servers set up for redundancy and to handle high traffic, and NGINX can act as a <a href="https://www.nginx.com/resources/glossary/load-balancing/">load balancer</a> between the different servers.</p>
<p><img alt="nginx external setup 2" src="https://mattsegal.dev/django-prod-architecture/nginx-2-external.png"></p>
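<p>The simplest load-balancing strategy is round-robin: hand each incoming request to the next server in the list, looping back to the start when you run out. Here is a toy sketch of the idea with made-up server addresses. A real load balancer like NGINX also handles health checks, timeouts and much more.</p>

```python
import itertools

# Hypothetical private addresses of three identical app servers
APP_SERVERS = ["10.0.0.11:8000", "10.0.0.12:8000", "10.0.0.13:8000"]


def round_robin(servers):
    """Return an iterator that cycles through the servers forever."""
    return itertools.cycle(servers)


picker = round_robin(APP_SERVERS)
for request_number in range(1, 5):
    # The fourth request wraps around to the first server again
    print(f"request {request_number} -> {next(picker)}")
```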
<p>You could also replace NGINX with an off-the-shelf load balancer like an AWS Elastic Load Balancer or something similar.</p>
<p>Note how putting our services on their own servers allows us to scale them out over multiple virtual machines. We couldn't run our Django app on three servers at the same time if we also had three copies of our filesystem and three databases.</p>
<h2>External services - task queue</h2>
<p>You can also push your "offline task" services onto their own servers. Typically the broker service would get its own machine and the worker would live on another:</p>
<p><img alt="worker external setup" src="https://mattsegal.dev/django-prod-architecture/worker-1-external.png"></p>
<p>Splitting your worker onto its own server is useful because:</p>
<ul>
<li>You can protect your Django web app from "noisy neighbours": workers which are hogging all the RAM and CPU</li>
<li>You can give the worker server extra resources that it needs: CPU, RAM, or access to a GPU</li>
<li>You can now make changes to the worker server without risking damage to the task queue or the web server</li>
</ul>
<p>Now that you've split things up, you can also scale out your workers to run more tasks in parallel:</p>
<p><img alt="worker external setup 2" src="https://mattsegal.dev/django-prod-architecture/worker-2-external.png"></p>
<p>You could potentially swap out your self-managed broker (Redis or RabbitMQ) for a managed queue like <a href="https://aws.amazon.com/sqs/">Amazon SQS</a>.</p>
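<p>If the broker-and-workers arrangement feels abstract, here's a rough sketch of the same shape using only the Python standard library. I'm assuming an in-memory <code>queue.Queue</code> as a stand-in for a real broker and threads as stand-ins for worker servers; the <code>run_tasks</code> function is made up for this example and is not how Celery or any other task queue library works internally.</p>

```python
import queue
import threading


def run_tasks(tasks, worker_count):
    """Push tasks onto a queue and let several workers drain it in parallel."""
    broker = queue.Queue()  # stand-in for a real broker such as Redis or RabbitMQ
    results = []
    results_lock = threading.Lock()

    def worker(name):
        while True:
            task = broker.get()
            if task is None:  # sentinel: no more work for this worker
                break
            outcome = task * task  # placeholder for slow "offline" work
            with results_lock:
                results.append((name, task, outcome))

    workers = [
        threading.Thread(target=worker, args=(f"worker-{i}",))
        for i in range(worker_count)
    ]
    for thread in workers:
        thread.start()
    for task in tasks:  # the "web app" side: enqueue the task and move on
        broker.put(task)
    for _ in workers:  # one stop signal per worker
        broker.put(None)
    for thread in workers:
        thread.join()
    return results


results = run_tasks(tasks=[1, 2, 3, 4, 5], worker_count=3)
print(sorted(outcome for _, _, outcome in results))
```

<p>Scaling out is just a matter of raising <code>worker_count</code>: the code that enqueues tasks doesn't need to know how many workers are listening.</p>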
<h2>External services - final form</h2>
<p>If you went totally all-out, your Django app could be set up like this:</p>
<p><img alt="fully external setup" src="https://mattsegal.dev/django-prod-architecture/full-external.png"></p>
<p>As you can see, you can go pretty crazy splitting up all the parts of your Django app and spreading them across multiple servers.
There are many upsides to this, but the downside is that you now have multiple servers to provision, update, monitor and maintain.
Sometimes the extra complexity is well worth it and sometimes it's a waste of your time. That said, there are many benefits to this setup:</p>
<ul>
<li>Your web and worker servers are completely replaceable, you can destroy, create and update them without affecting uptime at all</li>
<li>You can now do <a href="https://martinfowler.com/bliki/BlueGreenDeployment.html">blue-green deployments</a> with zero web app downtime</li>
<li>Your files and database are easily shared between multiple servers and applications</li>
<li>You can provision different sized servers for their different workloads</li>
<li>You can swap out your self-managed servers for managed infrastructure, like moving your task broker to AWS SQS, or your database to AWS RDS</li>
<li>You can now autoscale your servers (more on this later)</li>
</ul>
<p>When you have complicated infrastructure like this you need to start automating your infrastructure setup and server config.
It's just not feasible to manage this stuff manually once your setup has this many moving parts. I recorded a talk
on <a href="https://mattsegal.dev/intro-config-management.html">configuration management</a> that introduces these concepts.
You'll need to start looking into tools like <a href="https://www.ansible.com/">Ansible</a> and <a href="https://www.packer.io/">Packer</a> to configure your virtual machines,
and tools like <a href="https://www.terraform.io/">Terraform</a> or <a href="https://aws.amazon.com/cloudformation/">CloudFormation</a> to configure your cloud services.</p>
<h2>Auto scaling groups</h2>
<p>You've already seen how you can have multiple web servers running the same app, or multiple worker servers all pulling tasks from a queue.
These servers cost money, dollars per hour, and it can get very expensive to run more servers than you need.</p>
<p>This is where <a href="https://aws.amazon.com/autoscaling/">autoscaling</a> comes in. You can set up your cloud services to use some sort of trigger, such as virtual machine CPU usage,
to automatically create new virtual machines from an image and add them to an autoscaling group.</p>
<p>Let's use our task worker servers as an example. If you have a thumbnailing service that turns <a href="https://memories-ninja-prod.s3-ap-southeast-2.amazonaws.com/original/7e26334177b6ee7d5ab4c21f7149190e.jpeg">big uploaded photos</a> into <a href="https://memories-ninja-prod.s3.amazonaws.com/thumbnail/7e26334177b6ee7d5ab4c21f7149190e.jpeg">smaller photos</a> then one server should be able to handle
dozens of file uploads per second. What if during some periods of the day, like around 6pm after work, you saw file uploads spike from dozens per second to <em>thousands</em> per second? Then you'd need more servers!
With an autoscaling setup, the CPU usage on your worker servers would spike, triggering the creation of more and more worker servers, until you had enough to handle all the uploads.
When the rate of file uploads drops, the extra servers would be automatically destroyed, so you aren't always paying for them.</p>
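<p>The scaling decision itself can be sketched as a small function. This is a toy model under my own assumptions: the CPU thresholds, the server limits and the <code>desired_server_count</code> name are all invented for illustration and are not AWS defaults or any real autoscaling API.</p>

```python
def desired_server_count(
    current,
    avg_cpu_percent,
    scale_out_at=70.0,  # assumed threshold: add a server above this CPU usage
    scale_in_at=30.0,   # assumed threshold: remove a server below this CPU usage
    minimum=1,
    maximum=10,
):
    """Decide how many worker servers the group should have next."""
    if avg_cpu_percent > scale_out_at:
        target = current + 1
    elif avg_cpu_percent < scale_in_at:
        target = current - 1
    else:
        target = current
    # Never drop below the minimum or grow past the maximum group size.
    return max(minimum, min(maximum, target))


# Simulate an evening upload spike followed by a quiet period.
servers = 1
for cpu in [95, 90, 85, 50, 20, 10]:
    servers = desired_server_count(servers, cpu)
    print(f"avg CPU {cpu}% -> {servers} server(s)")
```

<p>The gap between the two thresholds is deliberate: without it the group could flap between adding and destroying servers every time CPU usage wobbles around a single cutoff.</p>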
<h2>Container clusterfuck</h2>
<p>There is a whole world of container fuckery that I haven't covered in much detail, because:</p>
<ul>
<li>I don't know it very well</li>
<li>It's a little complicated for the target audience of this post; and</li>
<li>I don't think that most people need it</li>
</ul>
<p>For completeness I'll quickly go over some of the cool, crazy things you can do with containers. You can use tools like <a href="https://kubernetes.io/">Kubernetes</a> and <a href="https://www.sumologic.com/glossary/docker-swarm/">Docker Swarm</a> with a set of config files to define all your services as Docker containers and how they should all talk to each other. All your containers run somewhere in your Kubernetes/Swarm cluster, but as a
developer, you don't really care what server they're on. You just build your Docker containers, write your config file, and push it up to your infrastructure.</p>
<p><img alt="maybe kubernetes" src="https://mattsegal.dev/django-prod-architecture/kubernetes-maybe.png"></p>
<p>Using these "container orchestration" tools allows you to decouple your containers from their underlying infrastructure.
Multiple teams can deploy their apps to the same set of servers without any conflict between their apps.
This is the kind of infrastructure that enables teams to deploy <a href="https://www.youtube.com/watch?v=y8OnoxKotPQ">microservices</a>.
Big enterprises like Target will have specialised teams dedicated to setting up and maintaining these container orchestration systems, while other teams can use them without having
to think about the underlying servers. These teams are essentially supplying a "platform as a service" (PaaS) to the rest of the organisation.</p>
<p>As you might have noticed, there is probably too much complexity in these container orchestration tools for them to be worth your while as a solo developer or even as a small team.
If you're interested in this sort of thing you might like <a href="http://dokku.viewdocs.io/dokku/">Dokku</a>, which claims to be "the smallest PaaS implementation you've ever seen".</p>
<h2>End of tour</h2>
<p>That's basically everything that I know about how Django can be set up in production.
If you're interested in building up your infrastructure skills, then I recommend you try out one of the setups or tools that I've mentioned in this post.
Hopefully I've built up your mental models of how Django gets deployed so that the next time someone mentions "task broker" or "autoscaling", you have some idea of what they're talking about.</p>
<p>If you enjoyed reading this you might also like other things I've written about <a href="https://mattsegal.dev/simple-django-deployment.html">deploying Django as simply as possible</a>,
how to <a href="https://mattsegal.dev/offline-tasks.html">get started with offline tasks</a>, how to start <a href="https://mattsegal.dev/file-logging-django.html">logging to files</a> and <a href="https://mattsegal.dev/sentry-for-django-error-monitoring.html">tracking errors</a> in prod and my <a href="https://mattsegal.dev/intro-config-management.html">introduction to configuration management</a>.</p>
<p>If you liked the box diagrams in this post check out <a href="https://excalidraw.com/">Excalidraw</a>.</p>Studying programming: where to start2020-05-16T12:00:00+10:002020-05-16T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-05-16:/self-study-starting.html<p>You have zero programming knowledge and you want to start learning to code.
Where do you start?</p>
<p>Maybe you want to learn enough to get yourself a coding job, or you're planning to study computer science in the future and you
want to try it out before you start your …</p><p>You have zero programming knowledge and you want to start learning to code.
Where do you start?</p>
<p>Maybe you want to learn enough to get yourself a coding job, or you're planning to study computer science in the future and you
want to try it out before you start your course.
Maybe you just want to automate a few things here and there.</p>
<p>This post will outline a path to getting comfortable with coding.
It's certainly not the only way to learn programming, it's just the advice that I would give anybody who asked me.</p>
<h3>Learning the basics</h3>
<p>First you'll need to pick a programming language to learn.
This won't be the only language you ever learn if you pursue programming long-term,
I've been coding for about 5 years and I know roughly three-and-a-half to six languages, depending on what counts as a "language", so this
choice isn't forever.</p>
<p>If you haven't already picked something, learn Python.
It's one of the easier languages to get started with, has a syntax that almost looks
like natural language and you can get a lot done with it.
There are also a <em>lot</em> of good beginner resources for learning Python.</p>
<p>You should follow a bare-basics course or book to get started, like <a href="https://realpython.com/learning-paths/python3-introduction/">Real Python's introductory course</a> or the often-recommended <a href="https://automatetheboringstuff.com/">Automate the Boring Stuff with Python book</a>. There are <a href="https://www.reddit.com/r/learnpython/wiki/index#wiki_new_to_programming.3F">dozens of books and courses</a> you can choose, so just pick one and learn the basics. I've also written about some <a href="https://mattsegal.dev/windows-setup-programming.html">other tools</a> that you should look into when programming (specifically on Windows).</p>
<p>"The basics", which I have mentioned so far, will include:</p>
<ul>
<li>installing Python on your computer</li>
<li>running Python scripts on the "command line" (CLI)</li>
<li>variables, data types, simple data structures</li>
<li>control flow (if, else, for, while)</li>
<li>printing output to the CLI</li>
<li>reading user input from the CLI</li>
<li>reading from and writing to files</li>
</ul>
<p>It's going to feel very simple and kind of dumb. After a week of messing around you <em>might</em> be able to build a simple text-based calculator.
That's pretty normal: your first programs will not be very impressive at all.
Maybe you spent your first week just-trying-to-fucking-install-Python, which isn't that abnormal either.</p>
<h3>Many small challenges</h3>
<p>Once you've read a book or followed a course and gotten the basics down, you need to start setting small challenges for yourself. You can't just do tutorials forever... well you <em>can</em>, but you'll always be dependent on them to learn new things. You need to come up with your own problems and build your own solutions. You might start by building:</p>
<ul>
<li>a script that asks your name and prints it back to you</li>
<li>a text-based calculator that helps you add, multiply, divide</li>
<li>a script that tells you the number of days between two dates</li>
<li>a script that prints out your workout for the day</li>
</ul>
<p>The point isn't that these are particularly useful or impressive programs, it's that you set a challenge for yourself and then build a working solution.
This is something you should do over and over. You will get stuck and need to search Google and check <a href="https://stackoverflow.com/">Stack Overflow</a>, read the official <a href="https://docs.python.org/3/">Python documentation</a>, and ask for help on <a href="https://www.reddit.com/r/learnprogramming/">/r/learnprogramming</a> and <a href="https://www.reddit.com/r/python/">/r/learnpython</a>. Getting stuck, and then getting unstuck is a part of being a programmer, and tutorials that hold your hand won't teach you how to solve your own problems. Since you are doing lots of small challenges, it's OK if you need to give up and try something easier before coming back to it later.</p>
<p>Don't get me wrong, tutorials are <em>great</em> resources for learning a specific skill, but you cannot learn from them alone.</p>
<p>If you're having trouble inventing your own challenges, then check out <a href="https://www.codeabbey.com/">Code Abbey</a>, which has a bajillion problems to solve with a wide range of difficulties.</p>
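<p>To give you a sense of scale, here's one possible shape for the "days between two dates" challenge mentioned above. It's a sketch, not the one true answer, and the <code>days_between</code> function name and the ISO date format are just choices I've made for the example. Have a go yourself before you read it.</p>

```python
from datetime import date


def days_between(first, second):
    """Return the number of days between two ISO dates like '2020-05-16'."""
    start = date.fromisoformat(first)
    end = date.fromisoformat(second)
    # abs() means the order of the two dates doesn't matter.
    return abs((end - start).days)


print(days_between("2020-05-08", "2020-05-16"))
```

<p>A natural next step is to read the two dates from the command line and print a friendly error when the input isn't a valid date.</p>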
<p>Once you're comfortable with the basic syntax of Python and solving simple problems, you can slowly ramp up the complexity of the problems and start playing around with third party libraries. You might eventually:</p>
<ul>
<li>make a script that pulls data from a webpage and prints it to a screen</li>
<li>write a simple website that stores your "to-dos" in a database</li>
<li>write a script that finds and deletes duplicate photos on your laptop</li>
<li>play around with a new language, like JavaScript, which is the language that you need to program websites</li>
<li>learn to create webpages with HTML and CSS</li>
<li>read and write data from a spreadsheet (Excel, Google Sheets)</li>
<li><a href="https://www.khanacademy.org/computing/computer-programming/sql">learn SQL</a> to manage databases</li>
</ul>
<p>You won't know how to solve these problems to start, which is part of the process: Googling around until you figure out how to achieve your goal.
I don't actually remember very much about the coding tools and languages that I use day to day - I'm just a fucking gun at Googling things. I'm a professional <a href="https://www.djangoproject.com/">Django</a> developer (most of the time) and when I'm working on Django projects I will check the <a href="https://docs.djangoproject.com/en/3.0/">documentation</a> at least once an hour. Developing your Google-fu and learning how to read documentation will be vital for your programming abilities.</p>
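<p>As an example of where that Googling can lead, here's a rough sketch of the duplicate photo finder from the list above. It assumes that "duplicate" means byte-for-byte identical files, and it only reports the duplicates rather than deleting anything. The <code>find_duplicates</code> name is made up for this example.</p>

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(folder):
    """Group files under `folder` whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            # Identical contents always produce the same SHA-256 digest.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    # Keep only the digests that more than one file shares.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

<p>Calling <code>find_duplicates</code> on a folder of photos returns a list of groups, and each group is a list of paths with the same contents. Reading whole files into memory is fine for a first attempt, but you'd want to read in chunks for very large files.</p>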
<h3>More advanced theory</h3>
<p>If you get comfortable with coding and want to get a head start on your studies, then you should try some more advanced online courses.
Even if you aren't going to study computer science at school, <a href="https://mattsegal.dev/self-study-tools-vs-concepts.html">there are great benefits to learning some theory</a>.</p>
<p>I recommend choosing something that teaches computer science concepts, not just how to use particular tools.
I think <a href="https://www.coursera.org/learn/principles-of-computing-1">Principles of Computing</a> is a fantastic course for dipping your toes into computer science,
with engaging coursework and a focus on the practical skills that a software engineer needs. <a href="https://mattsegal.dev/nand-to-tetris.html">Nand2Tetris</a> is awesome, but a little more challenging. There is an <em>absurd</em> number of free online computer science courses, these two aren't the only good ones, they're just the ones I have personally done and recommend.</p>
<h3>Learn some more advanced tools and frameworks</h3>
<p>Eventually you'll want to do something practical with your coding, and you won't want to build everything from scratch.
With a solid grounding in Python and basic programming, you can branch out to learn about:</p>
<ul>
<li>scientific computing (NumPy, SciPy)</li>
<li>data science tools (pandas, matplotlib, Jupyter notebooks)</li>
<li>web development (Flask, Django)</li>
<li>web server admin (Linux, SSH, SCP, apt, AWS, etc)</li>
<li>databases (SQLite, Postgres, SQLAlchemy)</li>
<li>frontend JavaScript frameworks (React, Vue, Angular)</li>
</ul>
<p>I recommend that you learn these tools with a small project in mind, to better motivate and guide your study.</p>
<h3>Conclusion</h3>
<p>So in summary, I recommend you:</p>
<ul>
<li>learn the basics of Python by following a beginner's course or book</li>
<li>do lots of small self-guided coding challenges</li>
<li>try some more advanced self-guided challenges and new tools</li>
<li>learn some computer science theory</li>
<li>learn some advanced tools and frameworks</li>
</ul>Studying programming: pace yourself2020-05-15T12:00:00+10:002020-05-15T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-05-15:/self-study-pacing.html<p>You can learn programming all by yourself and get a coding job. Just you, your laptop and the internet.
It's great! You don't have to pay thousands of dollars for a degree and you can work at your own pace.</p>
<p>There's a problem with this approach though: with no teacher …</p><p>You can learn programming all by yourself and get a coding job. Just you, your laptop and the internet.
It's great! You don't have to pay thousands of dollars for a degree and you can work at your own pace.</p>
<p>There's a problem with this approach though: with no teacher or course to guide you it's not clear
how much work you need to do every day. There are no professors giving weekly lectures or tutors setting homework.</p>
<p>You just do as much study as you're motivated to do. Are you doing enough? Could you do more?
These questions can eat at you, creating guilt and anxiety when you spend time on non-programming activities.
There's no clear line between work, study and play. There's no campus or workplace to go to and no-one is keeping you accountable.</p>
<p>I've written before about <a href="https://mattsegal.dev/self-study-mindset-enthusiasm.html">how to choose</a>
what to study and whether to focus on <a href="https://mattsegal.dev/self-study-tools-vs-concepts.html">theory or practice</a>.
In this post I want to discuss the question: how much should you be working?</p>
<p>In general my advice will be to pace yourself. Slow and steady - don't burn out. I can't pin down
this quote exactly, but it goes something like:</p>
<blockquote>
<p>You overestimate what you can do in a day, and underestimate what you can do in a year</p>
</blockquote>
<h3>Actual advice</h3>
<p>Let's be specific though. I think you should aim for four hours a day of total study.
That's what worked for me personally.
It might not seem like much, but that's four hours <em>every day</em>.
You might think that 8 hours of work per day is the gold standard, but learning is much harder than working.</p>
<p>That four hour figure was for people who are studying full time.
Of course not everyone has the luxury to dedicate themselves to learning a new profession.
If you're working a job as well then I'd aim for 1-2 hours a day at most.</p>
<h3>Study some theory</h3>
<p>Of those four hours I recommend you spend 1-2 hours doing some sort of course work. This might be watching
lectures, reading a textbook or doing assigned coursework. I'm talking about courses from <a href="https://www.coursera.org/">Coursera</a>,
<a href="https://www.udemy.com/">Udemy</a>, <a href="https://ocw.mit.edu/index.htm">MIT OpenCourseWare</a>, or even just good old YouTube videos.
Passively reading and watching videos is very mentally draining and I doubt you can actively absorb information and take notes for more than two hours a day.
You can sit in front of a video like a zombie for as many hours as you want, but I think most people only have a couple of hours of high quality
passive learning in them every day. Maybe you're special (good for you!), but if you're not special that's OK.</p>
<h3>Do some practice</h3>
<p>I think you should spend the rest of your time on some sort of coding project, which I've described <a href="https://mattsegal.dev/self-study-mindset-enthusiasm.html">elsewhere</a>. I suggest coding after some theory, because learning theory is harder. If you've found a project that you're interested in, then it shouldn't be hard to spend 2-3 hours writing code. Some days you'll get totally stuck and give up a little early - that's normal. Other days you'll be having fun and the next 10 hours will fly by. What's important is that you don't <em>force yourself</em> to sit and code for 8 hours a day when you're not having fun.
If you find yourself repeatedly unable to spend 2-3 hours a day working on your code, then you need to take a step back and figure out what's blocking you:</p>
<ul>
<li>Do you hate the language you're using? Try a new language</li>
<li>Is the project you chose for yourself too hard? Try something easier</li>
</ul>
<p>In general you should be taking a meta perspective on your work. Don't try and just slog through your problems. If 2-3 hours of coding a day isn't fun,
then figure out how to make it fun.</p>
<h3>Immerse yourself</h3>
<p>You can accumulate a lot of passive programming knowledge without doing a lot of work <em>per-se</em>.
If you can find programming-related entertainment that you enjoy consuming, then it won't feel like you're
studying. For example, I used to really enjoy reading <a href="https://www.joelonsoftware.com/">Joel Spolsky's blog</a>
and listening to podcasts like <a href="https://www.programmingthrowdown.com/">Programming Throwdown</a> and <a href="https://softwareengineeringdaily.com/">Software Engineering Daily</a>.
I'd just listen to these podcasts while I was walking around. It was just casual consumption, not a study thing: there were no weekly "podcast goals" that had to be met.
Maybe blogs and podcasts aren't your thing, but there are also books and YouTube videos a-plenty that are both entertaining and <em>slightly</em> informative.
Hell, I even picked up lingo from being subbed to <a href="https://www.reddit.com/r/programminghumor/">/r/programminghumor</a>.</p>
<p>It's really hard to say how long it takes any given person to get a software job from scratch.
There's a lot of variance involved including your local job market plus pure luck and coincidence.
In general getting a job will probably take longer than you expect, so it's important to work consistently.
You should be in it for the long haul: pace yourself.</p>Studying programming: tools or theory?2020-05-10T12:00:00+10:002020-05-10T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-05-10:/self-study-tools-vs-concepts.html<p>When you're studying web development you have a lot to learn and limited time.
One of the hard choices that you'll need to make is whether you learn tools or concepts.
Should you study data structures and algorithms to be a web developer?
It seems kind of esoteric.
Do you …</p><p>When you're studying web development you have a lot to learn and limited time.
One of the hard choices that you'll need to make is whether you learn tools or concepts.
Should you study data structures and algorithms to be a web developer?
It seems kind of esoteric.
Do you just need to learn a bunch of the latest tools and frameworks to be productive?
I'm going to argue that you need both: learning concepts makes you better at using tools, and using tools motivates you to learn concepts.</p>
<h3>The case for learning tools</h3>
<p>The case for learning tools and frameworks is the strongest so let's get it out of the way: they make you more productive.
I can use <a href="https://www.djangoproject.com/">Django</a> to build a website with
authentication, permissions, HTML templating, database models, form validation, etc. in half a day.
Writing any one of these features from scratch would take me days at the very least.
You do not want to invent the 2020 programmer's toolchain from scratch, not if you want to get anything done.</p>
<p>In addition, employers want you to know how to use tools.
Programmers get paid to ship valuable code, not to know a bunch of stuff.
Job advertisements are primarily a list of <a href="https://reactjs.org/">React</a>, <a href="https://spring.io/">Spring</a>, <a href="https://webpack.js.org/">Webpack</a>, <a href="https://nuxtjs.org/">NuxtJS</a>, Django, <a href="https://rubyonrails.org/">Rails</a>, etc.
Contrary to how it might seem, you can get these jobs without knowing every technology on the list,
but you do need to know at least some of them.
Good luck getting a coding job if you don't know Git.</p>
<p>Ok, so we're done right? Tools win, fuck ideas. Learn Git, get money.</p>
<h3>The case for learning ideas</h3>
<p>You can't just learn tools and frameworks. If you do not know <a href="https://www.youtube.com/watch?v=DTQV7_HwF58">how the internet works</a>,
then you're going to spend your time as a web developer swimming in a meaningless word-soup of "DNS", "TCP" and "Headers".
Django's database model structure is going to be very confusing if you don't know what "database normalisation" is.
How will you debug issues that don't already have a StackOverflow post written for them?</p>
<p>Ok cool, so you need to learn some basic internet stuff, but do you really need to learn about computational complexity?
Do you have to be able to <a href="https://twitter.com/mxcl/status/608682016205344768?lang=en">invert a binary tree</a>?</p>
<p>Well, no: you don't <em>have</em> to learn these theoretical computer-sciency concepts to get a job as a programmer.
That said, I think it's in your interest to learn theoretical stuff.
Learning computer-sciency concepts helps you learn new tools faster and use them better.
If you've learned a little bit of <a href="https://en.wikipedia.org/wiki/Functional_programming">functional programming</a> then you'll find a lot of familiar concepts when reading the <a href="https://redux.js.org/basics/reducers">Redux documentation</a>:</p>
<blockquote>
<p>The reducer is a pure function that takes the previous state and an action, and returns the next state.</p>
</blockquote>
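<p>If that sentence doesn't mean much to you yet, here's a small sketch of the idea. It's written in Python rather than JavaScript, so it is not Redux's actual API, and the <code>todo_reducer</code> function and the action names are invented for the example.</p>

```python
from functools import reduce


def todo_reducer(state, action):
    """Pure function: (previous state, action) -> next state."""
    if action["type"] == "add_todo":
        # Build a new list instead of mutating the previous state.
        return state + [action["text"]]
    if action["type"] == "remove_todo":
        return [text for text in state if text != action["text"]]
    # Unknown actions leave the state unchanged.
    return state


actions = [
    {"type": "add_todo", "text": "learn Git"},
    {"type": "add_todo", "text": "get money"},
    {"type": "remove_todo", "text": "learn Git"},
]
# Replaying every action from an empty starting state gives the final state.
final_state = reduce(todo_reducer, actions, [])
print(final_state)
```

<p>"Pure" here means the function's output depends only on its inputs and it never changes the state it was given, which is why the same list of actions always replays to the same result.</p>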
<p>If you haven't been exposed to functional programming concepts, then words like "state", "pure function" and "immutability"
are going to be complete gibberish. Functional programming is infamous for this kind of techno-babble:</p>
<blockquote>
<p><a href="https://stackoverflow.com/questions/3870088/a-monad-is-just-a-monoid-in-the-category-of-endofunctors-whats-the-problem">A monad is just a monoid in the category of endofunctors, what's the problem?</a></p>
</blockquote>
<p>The authors of the Redux docs have the <a href="https://en.wikipedia.org/wiki/Curse_of_knowledge">curse of knowledge</a>. They either don't know that they need to explain these terms, or they don't care to.
You might not have bothered to learn about functional programming, but they did.</p>
<p>Similarly, you don't need to understand hash functions to use Git, but the string of crazy numbers and letters
in your history is going to be quite disorienting: what the fuck is e2cbf1addc70652c4d63fdb5a81720024c9f2677 supposed to mean?</p>
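<p>To take some of the mystery out of those strings, here's a sketch of how Git names a file's contents. By default Git takes the SHA-1 hash of a short header plus the content. The example below handles "blob" objects (file contents) only; commits get their IDs in a similar way with a different header. The <code>git_blob_hash</code> name is made up for the example.</p>

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Compute the ID Git would give a file with these exact contents."""
    # Git prepends the object type, the content length and a null byte.
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_hash(b"hello world\n"))
```

<p>If you have Git installed, you can compare the result against <code>git hash-object</code> for a file with the same contents. The point is that the ID is derived entirely from the content, so changing even one byte gives you a completely different string.</p>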
<p>Even simple ideas like the idea of a "tree" data structure help you work with the computer filesystem more easily.
You might know that recursion is a good method for "walking" trees. Pattern-matching a programming problem
to a data structure will help you come up with solutions much faster.</p>
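<p>Here's a minimal sketch of what "walking" a tree with recursion looks like. I'm using nested dictionaries as a pretend filesystem so the example runs anywhere; the folder layout and the <code>walk</code> function are invented for illustration.</p>

```python
def walk(tree, prefix=""):
    """Recursively yield the full path of every file in a nested-dict 'filesystem'."""
    for name, child in tree.items():
        path = f"{prefix}/{name}"
        if isinstance(child, dict):
            # A folder: recurse into its children.
            yield from walk(child, path)
        else:
            # A file: a leaf of the tree, so there is nothing left to recurse into.
            yield path


# Hypothetical folder layout: dicts are folders, None marks a file.
filesystem = {
    "home": {
        "matt": {"notes.txt": None, "photos": {"cat.jpeg": None}},
    },
    "etc": {"hosts": None},
}
print(list(walk(filesystem)))
```

<p>The function never needs to know how deep the folders go: each call handles one level and hands the next level down to another call of itself.</p>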
<p>You can't know beforehand which computer science concepts will be useful.
As far as I can tell, functional programming got "cool" and baked into some frontend tools in the last five years or so. I don't know what's next. You need to get a broad base of knowledge to navigate and demystify the programming landscape.</p>
<p>Ok, so you should:</p>
<ul>
<li>isolate yourself in a log cabin for four years</li>
<li>study computer science</li>
<li>return to civilisation</li>
<li>learn Git</li>
<li>get money</li>
</ul>
<p>...right?</p>
<h3>What to learn first?</h3>
<p>You can't sit down and just learn all of computer science, downloading it all into your brain
like Neo hooked into the Matrix.
You'll also struggle to learn new tools and frameworks without some computer science fundamentals.
So, what to do?</p>
<p>I think you should try a <a href="https://en.wikipedia.org/wiki/Spiral_approach">spiral approach</a> to learning.
You should learn some theory, then explore some new tools, then try to build something practical.
Repeat over and over.
You won't necessarily learn everything in the "right order", but new ideas from one area will influence another.
You might:</p>
<ul>
<li>run into performance bottlenecks in your code and get interested in computational complexity</li>
<li>read about "pure functions" in the Redux docs and explore functional programming</li>
<li>complete a course on <a href="https://mattsegal.dev/nand-to-tetris.html">compilers</a> and finally understand what all those pesky .class, .pyc and .dll files are doing on your computer</li>
</ul>
<p>This might seem like a random and haphazard approach, and it kind of is, but I don't think learning
programming should be viewed as a big list of "things you must do". I've written more about that <a href="https://mattsegal.dev/self-study-mindset-enthusiasm.html">in this post</a>.</p>
<p>If you are learning programming and you have only focused on learning frameworks and tools, then I encourage you to mix in some theoretical online courses as well. If you're immersed in a university-style curriculum and haven't tried any modern programming tools - start using them now!</p>How to diagnose and fix slow queries with Django Debug Toolbar2020-05-09T12:00:00+10:002020-05-09T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-05-09:/django-debug-toolbar-performance.html<p>Your Django views are running slowly and you want to make them faster,
but you can't figure out what the issue is just by reading the code.
Just as bad is when you're not sure if you're using the Django ORM correctly - how can you know if the
code you …</p><p>Your Django views are running slowly and you want to make them faster,
but you can't figure out what the issue is just by reading the code.
Just as bad is when you're not sure if you're using the Django ORM correctly - how can you know if the
code you write will be slow?</p>
<p>This is where a profiling tool comes in handy.
<a href="https://django-debug-toolbar.readthedocs.io">Django Debug Toolbar</a> is great
for figuring out why your Django views are going slow. This guide will show you how to use DJDT to find and fix slow database queries in your views.</p>
<p>The demo app shown in the video is <a href="https://github.com/MattSegal/djdt-perf-demo">available on GitHub</a>.</p>
<div class="yt-embed">
<iframe
src="https://www.youtube.com/embed/9uoI6pvuvYs"
frameborder="0"
allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
>
</iframe>
</div>
<p>The DJDT docs explain <a href="https://django-debug-toolbar.readthedocs.io/en/latest/installation.html">how to install</a> the toolbar.</p>Studying programming: what to learn next?2020-05-08T12:00:00+10:002020-05-08T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-05-08:/self-study-mindset-enthusiasm.html<p>A lot of people trying to teach themselves programming have an anxiety
about what they should be learning. There is an endless array
of options - you've seen these ridiculous <a href="https://github.com/prakhar1989/awesome-courses">lists of online courses</a>, right?
There's too much to learn and not enough time! You don't want to waste time learning …</p><p>A lot of people trying to teach themselves programming have an anxiety
about what they should be learning. There is an endless array
of options - you've seen these ridiculous <a href="https://github.com/prakhar1989/awesome-courses">lists of online courses</a>, right?
There's too much to learn and not enough time! You don't want to waste time learning something that doesn't matter.</p>
<p>This dilemma can manifest as a general sense of dread about the task ahead of you,
or it can lead you to ruminate over specific technologies: should I learn Java or Python? Flask or Django?
Which framework is best? What should I learn to get myself a job?</p>
<h3>Follow your enthusiasm</h3>
<p>I recommend you dissolve this question by learning about things that interest you.
Stuff that you're enthusiastic about. Stuff that <strong>gets you going</strong>.
You wanted to learn programming for a reason, right? Why was that -
what about it seems cool to you? Work on that!</p>
<p>If you don't know what's cool about programming, then you should explore the landscape: sample lots of things until you find something you like. Try lots of small projects:
make a webpage in HTML, do some hacking challenges, learn about databases, or functional programming, etc.
You can use <a href="https://marginalrevolution.com/marginalrevolution/2019/08/reading-and-rabbit-holes.html">this "rabbit holes" technique</a>
to pose some interesting questions for yourself.</p>
<p>You might have a goal like "I want to be a backend web developer". I think you can work towards this goal while still learning things that interest you. For example, if you're into hacking at the moment, you can learn about hacking webservers.</p>
<h3>It's all just practice anyway</h3>
<p>Here's the thing, it doesn't really matter what you learn next, at least not early on.
As a beginner coder, you're going to fucking suck at everything you do, so you might as well have fun while you're sucking.
Your first few months will just be learning to program, in general, and the most important thing to do is to write a lot of code.</p>
<p>If you're interested in the content you're learning and the code you're writing then you will do so
much more practice than if you're just grinding. Conversely, you'll burn out if you're forcing yourself to work on
something you don't care about. This isn't some airy-fairy feel-good advice telling you to "follow your dreams" - you will do more productive work if you're interested in what you're doing.</p>
<p>Why is practice so important? In programming there are a bunch of meta-skills that you don't learn deliberately,
but you'll learn them by writing a lot of code. I'm talking about
how to debug your code, how to find solutions to problems online, how to read documentation.
There are "coding muscles" in your brain that you need to exercise.
Picking up these meta-skills is more important than knowing some specific web framework, like Ruby on Rails (RoR).</p>
<p>Here's why: if you try to learn RoR early it'll take you weeks to learn the basics of the framework.
You'll struggle to navigate the online documentation and command-line tools.
You'll make Ruby language syntax errors while trying to learn framework-specific concepts.
On the other hand, you can learn the basics of RoR in a weekend after 6 months of programming.
You'll only need to pick up the RoR-specific stuff because you already have a solid background in coding.</p>
<p>Just to clarify: I'm not saying that you shouldn't learn Ruby on Rails in your first week of learning to code. I'm saying that you shouldn't <em>force yourself</em> to learn Ruby on Rails because you're trying to optimise for getting a job in the shortest time possible.</p>
<h3>There's practice, then there's <em>practice</em></h3>
<p>A note on "practice". I said that it's important for a beginner, and suggested that writing a lot of code is good practice.
I think that's mostly true, but there are some things you can do to get better faster:</p>
<ul>
<li>Deliberately try out new skills and techniques in your work: "in this project, I'm going to try the 'object oriented' style I just learned". In another project you might try the "functional style".</li>
<li>Re-visit your old projects and think about how/if you could make them better with your new skills. You'll learn a lot by trying to read the code you wrote a month ago.</li>
<li>Try to get some feedback on your code from more experienced developers.</li>
</ul>
<h3>Start by building things</h3>
<p>When I said "learn about hacking webservers" earlier, you might imagine yourself reading a textbook on hacking and then reading a textbook on webservers. Maybe that works for you, but I don't like that approach. I think you should start your learning journey with a practical challenge. In this hacking example it might be to complete the first stage of the <a href="https://overthewire.org/wargames/">Over the Wire wargames</a>. There are two reasons for this.</p>
<p>Firstly, you do not know what you need to learn. How could you? Having a practical goal grounds you in reality and forces you to confront your ignorance. Here's an example: I literally did not know what "backend web development" was when I was building my <a href="https://mattslinks.xyz/">first website</a> (this ~v10). Even so, I wanted new list items to remain on the page after I refreshed it in my browser. To get that to work, I learned a lot about backend web dev: HTTP, APIs, Linux, virtual machines, web servers, JSON, web frameworks, WSGI, etc. I had no idea that I needed to know what a "JSON" was when I set out on that path, but my practical project led me there.</p>
<p>Secondly, it's much easier to learn things when you have an implementation in mind. If you are building something, then when you need to learn a new concept you'll think:</p>
<blockquote>
<p>Hmm, I need to learn more about what JSON and HTTP are to get this task done.</p>
</blockquote>
<p>You'll be motivated because you will quickly be able to use that new knowledge. Contrast that to someone handing you a massive list of web technologies and telling you to study them all. Do you think you'll be motivated to slog through that list?</p>
<p>To clarify: I'm not telling you not to read textbooks. I'm saying that you should read
textbooks after you've found a practical problem that inspires you to learn more. <a href="https://mattsegal.dev/nand-to-tetris.html">Nand2Tetris</a> is a great example of this. It's an online course where they trick you into reading a <a href="https://www.nand2tetris.org/book">textbook</a> by first asking you to build a computer.</p>
<h3>Conclusion</h3>
<p>Next time you catch yourself agonising over your next coding project, new framework or online course,
just ask yourself: is this fun? Will I learn something new? If so, go for it!</p>Keeping your config files valid with Python2020-05-03T12:00:00+10:002020-05-03T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-05-03:/cerberus-config-validation.html<p>It's common to use a config file for your Python projects:
some sort of JSON or YAML document that defines how your program behaves. Something like this:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># my-config.yaml</span><span class="w"></span>
<span class="nt">num_iters</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">30</span><span class="w"></span>
<span class="nt">population_size</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">20000</span><span class="w"></span>
<span class="nt">cycle_type</span><span class="p">:</span><span class="w"> </span><span class="s">"long"</span><span class="w"></span>
<span class="nt">use_gpu</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span><span class="w"></span>
<span class="nt">plots</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">[</span><span class="nv">population</span><span class="p p-Indicator">,</span><span class="w"> </span><span class="nv">infections</span><span class="p p-Indicator">,</span><span class="w"> </span><span class="nv">cost</span><span class="p p-Indicator">]</span><span class="w"></span>
</code></pre></div>
<p>Storing config in a file …</p><p>It's common to use a config file for your Python projects:
some sort of JSON or YAML document that defines how your program behaves. Something like this:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># my-config.yaml</span><span class="w"></span>
<span class="nt">num_iters</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">30</span><span class="w"></span>
<span class="nt">population_size</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">20000</span><span class="w"></span>
<span class="nt">cycle_type</span><span class="p">:</span><span class="w"> </span><span class="s">"long"</span><span class="w"></span>
<span class="nt">use_gpu</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span><span class="w"></span>
<span class="nt">plots</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">[</span><span class="nv">population</span><span class="p p-Indicator">,</span><span class="w"> </span><span class="nv">infections</span><span class="p p-Indicator">,</span><span class="w"> </span><span class="nv">cost</span><span class="p p-Indicator">]</span><span class="w"></span>
</code></pre></div>
<p>Storing config in a file is nice because it lets you separate your input data from the code itself,
but it sucks when a bad config file crashes your program. What <em>really</em> sucks is when:</p>
<ul>
<li>You don't know exactly which bad value caused the crash, or how to fix it</li>
<li>The bad config crashes your program minutes or hours after you first ran it</li>
<li>Other users write invalid config, then tell you your code is broken</li>
</ul>
<p>A related issue is when you have complex data structures flying around inside your code: lists of dicts, dicts of lists, dicts of dicts of dicts.
You just have to pray that all the data is structured the way you want it. Sometimes you forget how it's supposed to look in the first place.</p>
<p>You might have tried validating this data yourself using "assert" statements, "if"s and "ValueError"s, but it quickly gets tedious and ugly.</p>
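<p>To make the tedium concrete, here is a minimal sketch of hand-rolled validation for a few fields of the config above. The function name <code>validate_config</code> and the allowed <code>cycle_type</code> values ("short" and "long") are assumptions made for illustration, not something from a real project.</p>

```python
def validate_config(config: dict) -> None:
    """Raise ValueError on the first bad value found in a config dict."""
    if not isinstance(config, dict):
        raise ValueError("config must be a mapping")
    if "num_iters" not in config:
        raise ValueError("num_iters is required")
    num_iters = config["num_iters"]
    # bool is a subclass of int, so it has to be excluded explicitly.
    if isinstance(num_iters, bool) or not isinstance(num_iters, int):
        raise ValueError("num_iters must be an integer")
    if num_iters < 1:
        raise ValueError("num_iters must be at least 1")
    # Assumed set of allowed values, for illustration only.
    if config.get("cycle_type") not in ("short", "long"):
        raise ValueError("cycle_type must be 'short' or 'long'")
    plots = config.get("plots", [])
    if not isinstance(plots, list) or not all(isinstance(p, str) for p in plots):
        raise ValueError("plots must be a list of strings")


config = {"num_iters": 30, "cycle_type": "long", "plots": ["population", "cost"]}
validate_config(config)  # passes silently
```

<p>That is a dozen lines for three fields, and it only reports the first problem it finds.</p>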
<h3>Cerberus</h3>
<p>When I run into these kinds of problems, I tend to pull out <a href="https://docs.python-cerberus.org/en/stable/">Cerberus</a>
to stop the bleeding. It's a small Python library that can validate data according to some schema at runtime.
It's pretty simple to use (as per their docs):</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">cerberus</span> <span class="kn">import</span> <span class="n">Validator</span>
<span class="n">schema</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'name'</span><span class="p">:</span> <span class="p">{</span><span class="s1">'type'</span><span class="p">:</span> <span class="s1">'string'</span><span class="p">}}</span>
<span class="n">v</span> <span class="o">=</span> <span class="n">Validator</span><span class="p">(</span><span class="n">schema</span><span class="p">)</span>
<span class="n">v</span><span class="o">.</span><span class="n">validate</span><span class="p">({</span><span class="s1">'name'</span><span class="p">:</span> <span class="s1">'john doe'</span><span class="p">})</span> <span class="c1"># True</span>
<span class="n">v</span><span class="o">.</span><span class="n">validate</span><span class="p">({</span><span class="s1">'name'</span><span class="p">:</span> <span class="s1">'aaaa'</span><span class="p">})</span> <span class="c1"># True</span>
<span class="n">v</span><span class="o">.</span><span class="n">validate</span><span class="p">({</span><span class="s1">'name'</span><span class="p">:</span> <span class="mi">1</span><span class="p">})</span> <span class="c1"># False</span>
<span class="n">v</span><span class="o">.</span><span class="n">errors</span> <span class="c1"># {'name': ['must be of string type']}</span>
</code></pre></div>
<p>You can use this tool to validate all of your loaded config at the <em>start</em> of your program: giving early feedback to the user
and printing a sensible error message that tells them how to solve the problem ("Look at 'name', make it a string!").
This is much better than some obscure ValueError that bubbles up from 6 function calls deep.
It's still not a great experience for non-programmers, but coders will appreciate the clarity.</p>
<p>The Cerberus schema is just a Python dictionary that you define.
Even so, it's quite a powerful system for how basic it is. You can use Cerberus schemas to validate complicated nested data structures if you want to,
even adding custom validation functions and type definitions.
It's particularly nice because it allows you to declare how your data should look, rather than writing a hundred "if" statements.</p>
<p>Here's an example: a YAML config file for training a neural network might look like <a href="https://gist.github.com/MattSegal/d813f8d7848b5459f95f5eeacf581d2a">this</a> and
you could build a validator for that config like <a href="https://gist.github.com/MattSegal/fea30d10d26ef666f3a572e97f03c339">this</a>. Since everything is just dicts, there's no reason you can't also write your schema as a YAML or JSON (<a href="https://gist.github.com/MattSegal/b855659ff40533a9d13935a3ca632f63">example</a>).
Luckily Cerberus will validate your schema before applying it, so there is no endless recursion of "who validates the validators?".</p>
<h3>Schema as documentation</h3>
<p>I think that defining data schemas using Cerberus gets really useful when lots of different people need to use your config files.
The schema that you've defined also serves as documentation on how to write a correct config file: add a few explanatory comments and you've got some quick-n-dirty docs.
It's not a perfect strategy for writing docs but it has one fantastic property: the documentation cannot lie, because it <em>actually runs as code</em>.</p>
<p>I was recently working on an in-house CLI tool for builds and deployment that was written in Python.
I had devs from other teams using the tool and I couldn't always show them how to use it in person.
Even worse I was constantly updating the tool based on feature requests and the config was evolving over time.
Once I had written a Cerberus schema for the tool's config files, I was able to link to the
schema from our documentation. In addition, I was able to run regression tests on "wild" config files
by pulling them down from our source control and checking that they were still valid.</p>
<h3>Limitations</h3>
<p>There's no denying that these schemas are very, very verbose: you need to write a lot of text to define even simple data structures.
I think this verbosity is caused by the fact that the tool uses built-in Python data structures, rather than an object-oriented DSL.
It's quick and easy to get started, but that comes at a cost.</p>
<p>Another issue is that you can abuse this tool by using it as a half-assed type system.
It gives you no type hints or static compilation errors in your IDE: everything happens when the code runs.
Some code quality problems are better solved by investing in static analysis and using tools like <a href="http://mypy-lang.org/">mypy</a>.</p>
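<p>For comparison, here is a minimal sketch of the static approach: the same example config expressed as a typed dataclass. The class name <code>SimulationConfig</code>, the default values and <code>total_steps</code> are made up for illustration. Note that a dataclass does not check types at runtime; the benefit comes from running a checker like mypy over the code.</p>

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class SimulationConfig:
    num_iters: int
    population_size: int
    cycle_type: str = "long"
    use_gpu: bool = False
    plots: List[str] = field(default_factory=list)


def total_steps(config: SimulationConfig) -> int:
    # A type checker knows both attributes are ints, so a typo such as
    # `config.num_iter` is reported before the program ever runs.
    return config.num_iters * config.population_size


config = SimulationConfig(num_iters=30, population_size=20000)
print(total_steps(config))  # 600000
```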
<p>Finally, using Cerberus to validate config files and big data structures can hide underlying issues.
I think of it like slapping a bandaid on a problem. It stops the bleeding, but you should also clean up all the broken glass on the floor.
Why do you have all these config files in the first place? Why are you shipping around these big crazy data structures in your code?
It's good to ask these questions and consider alternative solutions.</p>
<h3>Next steps</h3>
<p>Give Cerberus a try in your next CLI tool or data science project: you're a quick pip install and a schema definition away
from validating your config files.</p>8 helpful tools for programming on Windows2020-05-02T12:00:00+10:002020-05-02T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-05-02:/windows-setup-programming.html<p>Software development on Windows can be a pain. Not because of any issues with C#, .NET
or the operating system, but simply because the tools surrounding your work can be quite clunky by default.
I'm talking about the lack of a package manager, PowerShell's ugly blue terminal with no tabs …</p><p>Software development on Windows can be a pain. Not because of any issues with C#, .NET
or the operating system, but simply because the tools surrounding your work can be quite clunky by default.
I'm talking about the lack of a package manager, PowerShell's ugly blue terminal with no tabs and a bunch of "missing" tools (git, ssh).
It's like a living room where all the furniture is perfectly positioned to stub your toe.</p>
<p>That said, you can have a pretty nice developer experience if you install a few tools.
This post goes over my preferred setup on a new Windows laptop. It's not a definitive guide, just some tips and
tricks that I've picked up from other devs that I've worked with. Hopefully you find some of them useful.</p>
<p>The post below summarises everything in this video.</p>
<div class="yt-embed">
<iframe
src="https://www.youtube.com/embed/wMJJp1PbQQA"
frameborder="0"
allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
>
</iframe>
</div>
<h3>ConEmu console emulator</h3>
<p><a href="https://conemu.github.io/">ConEmu</a> is my #1 favourite tool for Windows. It allows you to:</p>
<ul>
<li>Open many PowerShell tabs in one window</li>
<li>Show and hide the terminal with a hotkey (ctrl-`)</li>
<li>Split your windows into sub-windows using hotkeys (ctrl-shift-(o|e))</li>
<li>Open different shells in one window (PowerShell, Git Bash, cmd)</li>
<li>Customise your terminal (different colors etc)</li>
<li>Open PowerShell as Admin automatically</li>
</ul>
<p>It's like removing a rock from your shoe: an ugly blue rock.
Some people also like to use <a href="https://cmder.net/">Cmder</a> for the same use-case.</p>
<h3>Everything search</h3>
<p>Windows Explorer search is so horribly broken in 2020 that you <em>hope</em> Microsoft is trolling you,
because the alternative is just sad. In any case <a href="https://www.voidtools.com/support/everything/">Everything</a>
gives you very fast search of all your files and folders, including that pesky InternalToolChain.dll
which has gone missing.</p>
<p>I believe it runs in the background all the time, quietly indexing your files.
I do not know how this affects your workstation's performance.</p>
<h3>Chocolatey package manager</h3>
<p><a href="https://chocolatey.org/">Chocolatey</a> is the (unofficial) package manager for Windows.
<a href="https://www.nuget.org/">NuGet</a> is good for installing your .NET libraries, while <code>choco</code> is good for everything else.
It's great for quickly installing tools and automating the process. It's quite easy to <a href="https://chocolatey.org/install">install</a>.</p>
<p>To install a tool like Everything, you can just <a href="https://chocolatey.org/search?q=everything">search for it</a> then run the install from the CLI:</p>
<div class="highlight"><pre><span></span><code><span class="n">choco</span> <span class="n">install</span> <span class="n">everything</span>
</code></pre></div>
<p>In fact, once you've got choco installed, you can install all of the other tools on this list with:</p>
<div class="highlight"><pre><span></span><code><span class="n">choco</span> <span class="n">install</span> <span class="n">git</span> <span class="n">-y</span>
<span class="n">choco</span> <span class="n">install</span> <span class="n">conemu</span> <span class="n">-y</span>
<span class="n">choco</span> <span class="n">install</span> <span class="n">everything</span> <span class="n">-y</span>
<span class="n">choco</span> <span class="n">install</span> <span class="n">poshgit</span> <span class="n">-y</span>
<span class="n">choco</span> <span class="n">install</span> <span class="n">vscode</span> <span class="n">-y</span>
<span class="n">choco</span> <span class="n">install</span> <span class="n">ag</span> <span class="n">-y</span>
</code></pre></div>
<p>Try not to install anything with Chocolatey if it already exists: things can get weird. You can always run <code>Get-Command</code> in PowerShell to check for existing executables:</p>
<div class="highlight"><pre><span></span><code><span class="nb">Get-Command</span> <span class="n">python</span>
</code></pre></div>
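<p>If you'd rather check a whole list of tools in one go, the same idea can be sketched in Python using the standard library's <code>shutil.which</code>. The function name <code>find_installed</code> and the list of tool names are just examples.</p>

```python
import shutil


def find_installed(tools):
    """Map each tool name to its location on PATH, or None if it is missing."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_installed(["git", "code", "ag", "python"]).items():
        print(f"{tool}: {path or 'not found'}")
```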
<h3>Visual Studio Code</h3>
<p><a href="https://code.visualstudio.com/">Visual Studio Code</a> is a text editor that strikes a great balance between being full-featured and overly bloated.
This is an obvious proposition to more experienced developers, but there are a lot of beginners out there editing their files in <code>notepad.exe</code>.
I personally prefer it to slimmer alternatives like Sublime Text 3 and hulking behemoths like PyCharm or Visual Studio.</p>
<p>A really cool feature of VSCode on Windows is that it's quite command-line friendly. You can open the current folder in VSCode from the CLI with:</p>
<div class="highlight"><pre><span></span><code><span class="n">code</span> <span class="p">.</span>
</code></pre></div>
<p>or you can open a single file like this:</p>
<div class="highlight"><pre><span></span><code><span class="n">code</span> <span class="n">my</span><span class="o">-file</span><span class="p">.</span><span class="n">txt</span>
</code></pre></div>
<p>I'm quite a fan of the <a href="https://marketplace.visualstudio.com/items?itemName=AndreyVolosovich.monokai-st3">Monokai ST3 theme</a> and the <a href="https://github.com/PKief/vscode-material-icon-theme">Material Icon Theme</a>, plus a bajillion other language-specific extensions.</p>
<p><a href="https://www.sublimetext.com/3">Sublime Text 3</a> is a popular alternative with a rich plugin ecosystem but fewer features out of the box.
Some people also like <a href="https://notepad-plus-plus.org/downloads/">Notepad++</a>, a decision I don't really understand, but as the name suggests,
it beats the shit out of just using Notepad.</p>
<h3>PowerShell setup</h3>
<p>There's a few tricks to getting PowerShell into a usable state on a new Windows machine.
The first thing is to always open as Administrator, if you can.
Once PowerShell is open, I like to set the "execution policy", which allows you to run scripts:</p>
<div class="highlight"><pre><span></span><code><span class="nb">Set-ExecutionPolicy</span> <span class="n">-ExecutionPolicy</span> <span class="n">Bypass</span>
</code></pre></div>
<p>Now you can put some PowerShell in a script, like myscript.ps1:</p>
<div class="highlight"><pre><span></span><code><span class="c"># myscript.ps1</span>
<span class="nb">Write-Host</span> <span class="s2">"Hello World!"</span>
</code></pre></div>
<p>and then run it from your PowerShell terminal</p>
<div class="highlight"><pre><span></span><code><span class="p">./</span><span class="n">myscript</span><span class="p">.</span><span class="n">ps1</span>
<span class="c"># Hello World!</span>
</code></pre></div>
<p>Finally, I like to configure my profile, which is a script that runs before every PowerShell session starts.
This is where you can add things like welcome messages, function definitions and module imports.
To set up your profile, just open <code>$profile</code>, add some stuff and then save the file. For example:</p>
<div class="highlight"><pre><span></span><code><span class="n">code</span> <span class="nv">$profile</span>
</code></pre></div>
<p>One other handy PowerShell tip while I'm here: you can open folders in explorer with <code>explorer</code>, and, if you're not using VSCode,
you can still edit files using <code>notepad</code>:</p>
<div class="highlight"><pre><span></span><code><span class="n">explorer</span> <span class="p">.</span>
<span class="n">explorer</span> <span class="s2">"C:\Users\mattd"</span>
<span class="n">notepad</span> <span class="s2">"secret-plot.txt"</span>
</code></pre></div>
<h3>Git for Windows</h3>
<p>This one also seems kind of obvious if you're already using <a href="https://git-scm.com/download/win">Git</a>,
and if you're not using Git then why would you bother?
Wait! There's more than just <code>git</code> in Git for Windows. The install also gives you:</p>
<ul>
<li>Git Bash, which is a bash shell that can run scripts</li>
<li><code>ssh</code> for logging into Linux servers</li>
<li>
<p><code>scp</code> for transferring files to Linux servers</p>
</li>
<li>
<p><code>ssh-keygen</code> for generating SSH keys</p>
</li>
<li>
<p>A bunch of nice Unix tools like <code>find</code></p>
</li>
</ul>
<p>You'll never need to use <a href="https://www.putty.org/">PuTTY</a> again! You might scoff at my promotion of Git Bash:</p>
<blockquote>
<p>Fool! Doesn't he know about <a href="https://docs.microsoft.com/en-us/windows/wsl/install-win10">Windows Subsystem for Linux</a>?</p>
</blockquote>
<p>WSL seems nice, but a lot of workplaces won't let you install it, even though they will allow Git.</p>
<h3>Posh-Git: a PowerShell environment for Git</h3>
<p><a href="https://github.com/dahlbyk/posh-git">Posh-Git</a> gives you a nice little Git status message in your command prompt.
It tells you the current branch that you're on and the number of uncommitted changes. It's quite convenient. To install it:</p>
<div class="highlight"><pre><span></span><code><span class="nb">Install-Module</span> <span class="n">posh-git</span> <span class="n">-Scope</span> <span class="n">CurrentUser</span> <span class="n">-Force</span>
</code></pre></div>
<p>And then to activate it:</p>
<div class="highlight"><pre><span></span><code><span class="nb">Import-Module</span> <span class="n">posh-git</span>
</code></pre></div>
<p>Note that it will only display your Git status when the current directory is a Git repo.
Consider placing the import statement into your PowerShell profile so that it loads automatically for you.</p>
<h3>The Silver Searcher</h3>
<p>The <a href="https://github.com/ggreer/the_silver_searcher">Silver Searcher</a> is a nice little CLI tool for quickly finding all
instances of a string in a folder. For example, if I want to find all instances of "probably" in my "content" folder:</p>
<div class="highlight"><pre><span></span><code><span class="n">ag</span> <span class="n">-i</span> <span class="n">probably</span> <span class="n">content</span>
</code></pre></div>
<p>Its main appeal is speed: it's really fucking fast. Git grep is also a contender if you're working in a Git repository:</p>
<div class="highlight"><pre><span></span><code><span class="n">git</span> <span class="n">grep</span> <span class="n">probably</span>
</code></pre></div>
<h3>Next steps</h3>
<p>Despite Steve Ballmer's <a href="https://www.youtube.com/watch?v=Vhh_GeBPOhs">pro-dev yelling</a> in the 2000s, Microsoft
dropped the ball on making Windows nice for developers. Nevertheless you can use these tools, and others, to cobble together an environment where
building software isn't like stubbing your toe on every piece of furniture. If you haven't tried these tools already, then I encourage you to install them and have a play around. You might find your coding experience a little less painful.</p>Run your Python unit tests via GitHub actions2020-04-27T12:00:00+10:002020-04-27T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-04-27:/pytest-on-github-actions.html<p>You've written some unit tests for your Python app. Good for you! There are dozens of us, dozens!
You don't always remember to run your tests, or worse, your colleagues don't always remember to run them.</p>
<p>Wouldn't it be nice to automatically run unit tests on every commit to GitHub …</p><p>You've written some unit tests for your Python app. Good for you! There are dozens of us, dozens!
You don't always remember to run your tests, or worse, your colleagues don't always remember to run them.</p>
<p>Wouldn't it be nice to automatically run unit tests on every commit to GitHub? What about on every pull request?
You can do this with <a href="https://github.com/features/actions">GitHub Actions</a>.
You'd be able to hunt down commits that broke the build, and if you're feeling blamey, <em>who</em> broke the build.
Sounds complicated, but it's not.
Sounds like it might cost money, but the free version has ~30 hours of execution per month.
Let me show you how to set this up.</p>
<p>There is example code for this blog post <a href="https://github.com/MattSegal/actions-python-tests">here</a>.</p>
<h3>Setting up your project</h3>
<p>I'm going to assume that:</p>
<ul>
<li>You have some Python code</li>
<li>You use Git, and your code is already in a GitHub repository</li>
</ul>
<p>If you're already running unit tests locally you can skip this section.
Otherwise, your Python project's folder looks something like this:</p>
<div class="highlight"><pre><span></span><code>.
├── env Python virtualenv
├── requirements.txt Python requirements
├── README.md Project description
└── stuff.py Your code
</code></pre></div>
<p>If you don't have tests already, I recommend trying pytest (and adding it to your requirements.txt).</p>
<div class="highlight"><pre><span></span><code>pip install pytest
</code></pre></div>
<p>You'll need at least one test</p>
<div class="highlight"><pre><span></span><code><span class="c1"># test_stuff.py</span>
<span class="kn">from</span> <span class="nn">stuff</span> <span class="kn">import</span> <span class="n">run_stuff</span>
<span class="k">def</span> <span class="nf">test_run_stuff</span><span class="p">():</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">run_stuff</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mi">1</span>
</code></pre></div>
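<p>The test above imports <code>run_stuff</code> from <code>stuff.py</code>. If you're following along without real code of your own, a hypothetical stand-in like this is enough to make the test pass:</p>

```python
# stuff.py (hypothetical contents, just enough for the test above to pass)
def run_stuff():
    """Stand-in for your real code: returns the value the test expects."""
    return 1
```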
<p>You'll want to make sure your tests run and pass locally</p>
<div class="highlight"><pre><span></span><code>pytest
</code></pre></div>
<h3>Set up your Action</h3>
<p>You'll need to create a new file in a new folder: <code>.github/workflows/ci.yml</code>.
You can learn more about these config files <a href="https://help.github.com/en/actions">here</a>.
Here's an example file:</p>
<div class="highlight"><pre><span></span><code><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Project Tests</span><span class="w"></span>
<span class="nt">on</span><span class="p">:</span><span class="w"></span>
<span class="w">  </span><span class="nt">push</span><span class="p">:</span><span class="w"></span>
<span class="w">    </span><span class="nt">branches</span><span class="p">:</span><span class="w"></span>
<span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">master</span><span class="w"></span>
<span class="w">  </span><span class="nt">pull_request</span><span class="p">:</span><span class="w"></span>
<span class="w">    </span><span class="nt">branches</span><span class="p">:</span><span class="w"></span>
<span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">master</span><span class="w"></span>
<span class="nt">jobs</span><span class="p">:</span><span class="w"></span>
<span class="w">  </span><span class="nt">build</span><span class="p">:</span><span class="w"></span>
<span class="w">    </span><span class="nt">runs-on</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">ubuntu-latest</span><span class="w"></span>
<span class="w">    </span><span class="nt">steps</span><span class="p">:</span><span class="w"></span>
<span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">uses</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">actions/checkout@v2</span><span class="w"></span>
<span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Set up Python 3.6</span><span class="w"></span>
<span class="w">        </span><span class="nt">uses</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">actions/setup-python@v1</span><span class="w"></span>
<span class="w">        </span><span class="nt">with</span><span class="p">:</span><span class="w"></span>
<span class="w">          </span><span class="nt">python-version</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">3.6</span><span class="w"></span>
<span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Install dependencies</span><span class="w"></span>
<span class="w">        </span><span class="nt">run</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">|</span><span class="w"></span>
<span class="w">          </span><span class="no">python -m pip install --upgrade pip</span><span class="w"></span>
<span class="w">          </span><span class="no">pip install -r requirements.txt</span><span class="w"></span>
<span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Test with pytest</span><span class="w"></span>
<span class="w">        </span><span class="nt">run</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">pytest -vv</span><span class="w"></span>
</code></pre></div>
<p>Now your project looks like this:</p>
<div class="highlight"><pre><span></span><code>.
├── .github GitHub hidden folder
| └── workflows Some other folder
| └── ci.yml GitHub Actions config
├── env Python virtualenv
├── requirements.txt Python requirements
├── README.md Project description
├── test_stuff.py pytest unit tests
└── stuff.py Your code
</code></pre></div>
<p>Commit your changes, push them up to GitHub and watch your tests run!</p>
<p>Sometimes they fail:</p>
<div class="loom-embed"><iframe src="https://www.loom.com/embed/c46a3b978fa441b2a50abbe9d7d2a1ef" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>
<p>Sometimes they pass:</p>
<div class="loom-embed"><iframe src="https://www.loom.com/embed/f06b6150b74445159e665f0b3ba92c2a" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>
<h3>Add a badge to your README</h3>
<p>You can add a "badge" to your project's README.md.
Assuming your project is hosted at https://github.com/MyName/my-project/, you can add this
to your README.md file:</p>
<div class="highlight"><pre><span></span><code>![](https://github.com/MyName/my-project/workflows/Project%20Tests/badge.svg)
</code></pre></div>
<h3>Next steps</h3>
<p>Write some tests, run them locally, and then let GitHub run them for you on every commit from now on.
If you get stuck, check out <a href="https://github.com/MattSegal/actions-python-tests">this minimal reference</a> or the <a href="https://help.github.com/en/actions">Actions docs</a>.</p>Simple Django deployment part six: domain setup2020-04-26T18:00:00+10:002020-04-26T18:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-04-26:/simple-django-deployment-6.html<p>We're very nearly done deploying our Django app. There's just one more thing we should take care of.
Having a raw IP as our website address is kind of yucky, isn't it?
You're not going to ask your friend, boss, or mum to visit 23.231.147.88 to check …</p><p>We're very nearly done deploying our Django app. There's just one more thing we should take care of.
Having a raw IP as our website address is kind of yucky, isn't it?
You're not going to ask your friend, boss, or mum to visit 23.231.147.88 to check out your cool new Django app.
You want a domain name like mycoolwebsite.xyz! Let's finish up our deployment by setting up a domain for our web app.</p>
<p>Here we will learn how to:</p>
<ul>
<li>Buy a domain name</li>
<li>Set up a Cloudflare reverse-proxy</li>
<li>Add our domain name to Django prod settings</li>
<li>Test our setup</li>
</ul>
<p>A quick note before we start - usually you would do this at the start of the process, right after you create your server,
because setting domain name records can take a long time. The reason we're doing it last in this guide is to make sure that you're confident that your app is working before we start fiddling with DNS. If you've never heard of DNS before, I did a short <a href="https://mattsegal.dev/dns-for-noobs.html">blog post</a> that explains the basics.</p>
<h3>Buy a domain name</h3>
<p>If you already own a domain name for your app you can skip this step.
To get a domain name we need to give someone some money.
We're going to go to <a href="https://www.namecheap.com/">Namecheap</a> and buy a domain name. Why Namecheap?
Domain name registrars exist to sell domains and occasionally fuck you over by raising prices and trying to sell you crap that you don't need. They're generally a pain, so I did a Google search for "site:reddit.com best domain seller", and the good people of Reddit seemed to hate Namecheap the least.</p>
<div class="yt-embed">
<iframe
src="https://www.youtube.com/embed/d9XjuXxNPRI"
frameborder="0"
allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
>
</iframe>
</div>
<h3>Set up Cloudflare</h3>
<p>We're going to use Cloudflare to set up our DNS records. I've written elsewhere on <a href="https://mattsegal.dev/cloudflare-review.html">why I like Cloudflare</a>. TLDR it's pretty easy to use and provides some nice bonus features like caching your static files, SSL encryption and analytics.</p>
<p>All requests to our domain (mycoolwebsite.xyz) are going to pass through Cloudflare's servers, which are running NGINX under the hood. This kind of setup is called a "<a href="https://en.wikipedia.org/wiki/Reverse_proxy">reverse proxy</a>", because we have a "proxy" (Cloudflare), routing all incoming traffic to our server. This is in contrast to a "forward proxy", which deals with outbound traffic.</p>
<div class="yt-embed">
<iframe
src="https://www.youtube.com/embed/GCCBGNKDBIw"
frameborder="0"
allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
>
</iframe>
</div>
<p>... 30 minutes later ...</p>
<div class="yt-embed">
<iframe
src="https://www.youtube.com/embed/6TWJlVv8Qek"
frameborder="0"
allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
>
</iframe>
</div>
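<p>For the "add our domain name to Django prod settings" step, here's a minimal sketch of what that change could look like. I'm assuming the setting in question is Django's ALLOWED_HOSTS, and that your domain is the example mycoolwebsite.xyz. The file name and the DJANGO_EXTRA_HOSTS environment variable are made up for illustration, so adapt them to your own project.</p>

```python
# settings/prod.py (hypothetical file name; use wherever your prod settings live)
import os

# Hosts that Django is willing to serve. With DEBUG = False, a request with
# any other Host header is rejected with a 400 response.
# "mycoolwebsite.xyz" is the example domain used in this guide.
ALLOWED_HOSTS = [
    "mycoolwebsite.xyz",
    "www.mycoolwebsite.xyz",
]

# Optional: allow extra hosts (e.g. the raw server IP) via an environment
# variable. DJANGO_EXTRA_HOSTS is an invented name for this sketch.
extra_hosts = os.environ.get("DJANGO_EXTRA_HOSTS", "")
ALLOWED_HOSTS += [h.strip() for h in extra_hosts.split(",") if h.strip()]
```

<p>If you forget this step, Django will refuse requests that arrive via your new domain, which is an easy thing to trip over right after setting up DNS.</p>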
<h3>Next steps</h3>
<p>Alright! We're done! Congratulations, you've deployed a Django app. Just as a quick recap, you've learned how to:</p>
<ul>
<li>Use ssh, scp and create SSH keys</li>
<li>Create a cloud virtual machine</li>
<li>Set up your cloud VM</li>
<li>Configure your Django project for deployment</li>
<li>Deploy your Django project to the server</li>
<li>Run your web app using Gunicorn and Supervisor</li>
<li>Set up server logging</li>
<li>Automate the deployment, server setup and database backups</li>
<li>Set up your web app's domain name plus SSL and caching using Cloudflare</li>
</ul>
<p>Now I encourage you to take the things you've learned and write your own Django app and try deploying that.
It will probably break at some point, it always does, but I hope you're able to use the skills that you've
picked up in this guide to debug the problem and fix it.</p>
<p>You've got the basics down, but there is a lot of stuff you can learn about deploying Django and web apps in general.
Some things you might want to look into at some point:</p>
<ul>
<li><a href="https://mattsegal.dev/file-logging-django.html">Setting up Django logging in production</a></li>
<li><a href="https://mattsegal.dev/sentry-for-django-error-monitoring.html">Adding error monitoring</a></li>
<li><a href="https://mattsegal.dev/offline-tasks.html">Adding offline tasks</a></li>
<li><a href="https://mattsegal.dev/simple-scheduled-tasks.html">Adding offline scheduled tasks</a></li>
<li>Start using Git for deployments</li>
<li>Try using Fabric for deployment scripting</li>
<li>Implement "continuous delivery" using GitHub actions</li>
<li>Try using PostgreSQL instead of SQLite</li>
<li>Try using NGINX instead of (or in addition to) Cloudflare</li>
<li>Try putting your gunicorn server / Django app inside of Docker with Docker Swarm</li>
<li>Try out media hosting in AWS S3</li>
<li>Add automated unit tests to your deployment pipeline</li>
<li>Secure your server with fail2ban and a firewall</li>
<li>Improve your server setup automation with Ansible</li>
<li>Try a different cloud hosting provider, like AWS or Google Cloud</li>
</ul>
<p>There's an endless list of stuff you can learn, and there's no need to do it all right now,
but it's there if you're interested.</p>
<p>If you have any feedback on this guide, or questions about the steps, you can email me at mattdsegal@gmail.com.</p>Simple Django deployment part five: deployment automation2020-04-26T17:00:00+10:002020-04-26T17:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-04-26:/simple-django-deployment-5.html<p>Deploying our Django app involved a lot of different commands, right? It would suck to have to do all that over again, wouldn't it?</p>
<p>Having to manually type all those commands again would be tedious, slow and easy to screw up.
Even worse, the harder it is to deploy, the …</p><p>Deploying our Django app involved a lot of different commands, right? It would suck to have to do all that over again, wouldn't it?</p>
<p>Having to manually type all those commands again would be tedious, slow and easy to screw up.
Even worse, the harder it is to deploy, the less often you are going to do it.
If your deployments are infrequent, then they'll contain more features in one big batch and they'll be riskier, because there are more things that could go wrong, and it's harder to tell what caused any issues that crop up.
Frequent, small deployments are key to pumping out lots of valuable code with lower risk.
The <a href="https://www.amazon.com.au/Phoenix-Project-DevOps-Helping-Business/dp/0988262592">Phoenix Project</a>
is a great book that talks more about this idea (srsly give it a read).</p>
<p>So, if we want to deploy fast and often, we're going to need to automate the process. Hell, even if we want to do this again in a week we need to automate the process, because we're definitely going to forget what-the-fuck we just did.
No need to get fancy, we can do the whole thing with a bunch of bash scripts.
You can get fancy later.</p>
<p>Our goal is that you can run a single bash script and your whole deployment happens.</p>
<p>We'll write these scripts in stages:</p>
<ul>
<li>Uploading new code to the server</li>
<li>Installing the new code</li>
<li>Single deploy script</li>
<li>Backing up the database</li>
<li>Automating the server setup</li>
</ul>
<h3>Uploading new code to the server</h3>
<p>If you recall, we uploaded code to the server by creating a "deploy" directory locally,
then uploading that directory to our server. After that we did some clean up work on that directory
to deal with Python bytecode (pyc) files and Windows line endings.</p>
<p>Let's automate the upload first. The files that we need to copy over are:</p>
<ul>
<li>requirements.txt for our Python packages</li>
<li>tute for our Django code</li>
<li>scripts for our bash scripts</li>
<li>config for our Gunicorn and Supervisor config</li>
</ul>
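<p>The video does this step with a bash script. Purely as an illustration of the same idea, here's a rough Python sketch that builds a clean "deploy" directory from the four items listed above and leaves out Python bytecode. The function name and arguments are my own invention, and it doesn't do the actual upload (that's still scp's job).</p>

```python
import shutil
from pathlib import Path

# The things we want on the server, as listed above.
DEPLOY_ITEMS = ["requirements.txt", "tute", "scripts", "config"]


def build_deploy_dir(project_dir, deploy_dir):
    """Copy the deployable files from project_dir into a fresh deploy_dir."""
    project_dir, deploy_dir = Path(project_dir), Path(deploy_dir)
    # Start from a clean slate so stale files never get uploaded.
    if deploy_dir.exists():
        shutil.rmtree(deploy_dir)
    deploy_dir.mkdir(parents=True)
    # Skip Python bytecode, which we don't want to ship to the server.
    skip = shutil.ignore_patterns("__pycache__", "*.pyc")
    for name in DEPLOY_ITEMS:
        src = project_dir / name
        if src.is_dir():
            shutil.copytree(src, deploy_dir / name, ignore=skip)
        else:
            shutil.copy2(src, deploy_dir / name)
    return deploy_dir
```

<p>You'd call something like <code>build_deploy_dir(".", "deploy")</code> from the project root and then upload the resulting folder.</p>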
<div class="yt-embed">
<iframe
src="https://www.youtube.com/embed/OOYG4ZGOv80"
frameborder="0"
allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
>
</iframe>
</div>
<h3>Installing the new code</h3>
<p>Now we have automated the process of getting our code onto the server,
let's script the bit where we install it in the project dir and run Gunicorn.</p>
<div class="yt-embed">
<iframe
src="https://www.youtube.com/embed/R1XDE-NoGAQ"
frameborder="0"
allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
>
</iframe>
</div>
<p>So, you might have noticed that we stop Gunicorn at the start of the deployment and start it again at the end. That means your site will be offline during the deployment and if something goes wrong, it'll stay down. You have to log in and manually fix the problem to get it running again.</p>
<p>This is fine for personal projects and low traffic websites - nobody will notice. If you're running some important, high traffic website, then there are techniques to make sure that your website is always running - but we won't go into that here. We're keeping it simple for now.</p>
<h3>Single deploy script</h3>
<p>Alright we're basically done with this section, now all we need to do is combine our two scripts into a master deploy script.</p>
<div class="yt-embed">
<iframe
src="https://www.youtube.com/embed/FnM1fL3-I2E"
frameborder="0"
allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
>
</iframe>
</div>
<p>That's it, now we can deploy our code over and over in seconds.</p>
<h3>Backing up the database</h3>
<p>This section is optional, it's nice to have, but not a core part of the guide. Skip it if you like.
Here I'll show you how to back up your database on the server.
It's very, very simple to do with SQLite because the database is just a single file.</p>
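<p>If you'd rather do the backup from Python than from bash, here's one way it could look. This is a sketch, not what the video does: it uses the <code>Connection.backup</code> method from Python's built-in sqlite3 module (Python 3.7+) instead of a plain file copy, and the function name, file naming scheme and paths are all assumptions of mine.</p>

```python
import sqlite3
from datetime import datetime
from pathlib import Path


def backup_sqlite(db_path, backup_dir):
    """Copy a SQLite database into backup_dir with a timestamped file name."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"db-{stamp}.sqlite3"
    source = sqlite3.connect(str(db_path))
    dest = sqlite3.connect(str(target))
    try:
        # Connection.backup copies the whole database, and is safe to run
        # while other connections are using the source database.
        source.backup(dest)
    finally:
        dest.close()
        source.close()
    return target
```

<p>For example, <code>backup_sqlite("/app/db.sqlite3", "/root/backups")</code> would write a new timestamped copy each time it runs (the backup folder here is just a placeholder).</p>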
<div class="yt-embed">
<iframe
src="https://www.youtube.com/embed/Pc6C68RTbfc"
frameborder="0"
allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
>
</iframe>
</div>
<h3>Automating the server setup</h3>
<p>This section is also optional, it's nice to have, but not a core part of the guide. Skip it if you like.</p>
<p>You will get to a point where you want to move your app to a new server,
or maybe you've broken your server really badly, or maybe you want to set your server up again slightly differently.
When that time comes, you will not remember how you set this one up: that's why we want to automate our server setup.</p>
<p>Automating your server setup also allows you to do things that were inconceivable before:</p>
<ul>
<li>run hundreds of servers that are all configured the same way</li>
<li>create a new server for every new deployment (allowing for "blue-green" deployments), allowing for zero downtime during deploys</li>
<li>create servers for testing that are identical to your "live" production server</li>
</ul>
<p>I talk more about this topic in my video on <a href="https://mattsegal.dev/intro-config-management.html">configuration management</a>.</p>
<p>So, we want to be able to blow away our server and make a new one with minimal work required. The good news is we're already most of the way there. Our Django app in prod is defined by 3 things at the moment:</p>
<ul>
<li>our code (we have our code already)</li>
<li>our database (we have automatic backups already)</li>
<li>the server (we know how to set it up, we just need to automate this)</li>
</ul>
<p>Our goal in this section is to run a single script on a new DigitalOcean droplet and it all just works. In addition, we want this script to be "idempotent" - this means we want to be able to run it many times on the same server and get (mostly) the same result.</p>
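<p>To make "idempotent" a bit more concrete, here's a tiny Python sketch of an idempotent setup step: it adds a line to a config file only if that line isn't already there, so running it twice leaves the file the same as running it once. The helper name is hypothetical and isn't part of the setup script in the video.</p>

```python
from pathlib import Path


def ensure_line(path, line):
    """Append line to the file at path unless it is already present.

    Returns True if the file was changed, False if nothing needed doing.
    """
    path = Path(path)
    text = path.read_text() if path.exists() else ""
    if line in text.splitlines():
        # Already set up: do nothing, so repeat runs are harmless.
        return False
    # Make sure we start on a fresh line if the file lacks a trailing newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with path.open("a") as f:
        f.write(prefix + line + "\n")
    return True
```

<p>The non-idempotent version would just append the line every time, and you'd end up with a new duplicate on every run. The same "check first, then change" pattern applies to creating folders, installing packages and so on.</p>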
<div class="yt-embed">
<iframe
src="https://www.youtube.com/embed/I4XGu9MXkSE"
frameborder="0"
allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
>
</iframe>
</div>
<p>This script can get kind of long and hairy, especially as your deployments get more complicated.
At some point, you're going to want to use something other than a bash script to automate this process.
When you're ready, I recommend you take a look at <a href="https://github.com/ansible/ansible">Ansible</a>,
which is a great tool for writing scripts that automatically set up servers.
<a href="https://www.packer.io/">Packer</a> is also a good tool for using scripts like the one we just wrote to
"bake" a single virtual machine image, which can then be used to instantly create multiple copies of the same virtual machine.</p>
<h3>Next steps</h3>
<p>There's one last thing to do before our website is <em>really</em> deployed - <a href="https://mattsegal.dev/simple-django-deployment-6.html">give our app a domain name</a>.</p>Simple Django deployment part four: run a service2020-04-26T16:00:00+10:002020-04-26T16:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-04-26:/simple-django-deployment-4.html<p>So we've got a problem. Our Django app only runs when we're logged into the server via SSH and running Gunicorn.
That's not going to work long term. We need to get Gunicorn running even when we're not around.
In addition, if our Gunicorn server crashes because of some bug …</p><p>So we've got a problem. Our Django app only runs when we're logged into the server via SSH and running Gunicorn.
That's not going to work long term. We need to get Gunicorn running even when we're not around.
In addition, if our Gunicorn server crashes because of some bug, we want it to automatically restart.</p>
<p>In this section we're going to cover:</p>
<ul>
<li>Setting up Supervisor</li>
<li>Adding Gunicorn config</li>
<li>Setting up basic logging</li>
<li>Running as root</li>
</ul>
<h3>Setting up Supervisor</h3>
<p>We're going to solve our process supervision problem with <a href="http://supervisord.org/">Supervisor</a>. It's a program
that we can use to run Gunicorn in the background. I chose this tool because a lot of other Django devs use it,
plus it's pretty easy to install, configure and run.</p>
<p>We can install it into our virtualenv with pip, which is handy:</p>
<div class="highlight"><pre><span></span><code>pip install supervisor
</code></pre></div>
<p>Supervisor has several parts that we should know about:</p>
<ul>
<li><strong>supervisord</strong>: the "<a href="https://en.wikipedia.org/wiki/Daemon_(computing)">daemonized</a>" program that will run Gunicorn as a "child process"</li>
<li><strong>supervisorctl</strong>: the tool that we will use to send commands to supervisord</li>
</ul>
<p>We'll also be writing some config files to help automate how Supervisor and Gunicorn run:</p>
<ul>
<li><strong>supervisord.conf</strong>: a file that we'll need to write to configure how supervisord works</li>
<li><strong>gunicorn.conf.py</strong>: a file we'll need to write to configure how Gunicorn works</li>
</ul>
<p>Finally, we need to start configuring basic logging. We didn't really need logging before because when we ran "runserver" or "gunicorn",
we could just read the console output on our terminal. We can't do that anymore because we cannot see the terminal. So we need to ask
gunicorn and supervisord to write their logs to a file somewhere, so we can read them later if we need to. Once we're done, our Django project will look like this when we deploy it:</p>
<div class="highlight"><pre><span></span><code>/app
├── env Python 3 virtualenv
├── requirements.txt Python requirements
├── db.sqlite3 Production SQLite database
├── scripts Bash scripts
| └── run-gunicorn.sh Script to run Gunicorn
├── config Config files
| ├── supervisord.conf Supervisor config
| └── gunicorn.conf.py Gunicorn config
├── logs Log files
| ├── supervisord.log Supervisor logs
| ├── gunicorn.access.log Gunicorn access logs
| └── gunicorn.app.log Gunicorn application logs
└── tute Django project code
├── tute Django app code
├── counter Django app code
├── staticfiles Collected static files
└── manage.py Django management script
</code></pre></div>
<p>It's becoming a lot of stuff, isn't it? When I said this would be a "simple" deployment guide, I meant that in a relative sense. ¯\_(ツ)_/¯</p>
<p>Let's get started by setting up Supervisor to run our Django app using Gunicorn. Unfortunately we can't test this new setup completely on our Windows machine, so we're going to have to upload our files to the server to try this out.</p>
<p>You can find the scripts and config referenced in the video <a href="https://github.com/MattSegal/django-deploy">here</a>.</p>
<div class="yt-embed">
<iframe
src="https://www.youtube.com/embed/ny2L15dOf4Q"
frameborder="0"
allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
>
</iframe>
</div>
<h3>Adding Gunicorn config</h3>
<p>Next we want to tweak how Gunicorn runs a little bit. In particular, we want to set the number of "workers". The Gunicorn process runs as a sort of "master", which then co-ordinates a bunch of child "worker" processes. The <a href="https://docs.gunicorn.org/en/stable/settings.html#workers">Gunicorn docs</a> suggest using 2-4 workers per CPU core (we have 1 on our DigitalOcean VM), but the default is 1.</p>
<p>If we only have 1 worker, and two people send our site an HTTP request, then one of them will need to wait for the other to finish. If we set more workers, it means we can handle more HTTP requests at the same time. Too many workers are kind of pointless because they'll just end up fighting for access to the CPU. So let's pick 3 workers, because we have 1 CPU core, nothing else happening on this machine, and 3 is half way between the recommended 2-4 (which is a very arbitrary way of deciding).</p>
<p>We <em>could</em> apply this config change by just adding it as a command line parameter when we run Gunicorn:</p>
<div class="highlight"><pre><span></span><code>gunicorn tute.wsgi:application --workers <span class="m">3</span>
</code></pre></div>
<p>But this will become unwieldy when we configure more and more settings. It's kind of just an aesthetic thing, but I'd rather write this config to a file than as command line parameters. So instead, we can write a <a href="https://docs.gunicorn.org/en/stable/configure.html#configuration-file">configuration file</a> called "gunicorn.conf.py" and put all our config in there:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># gunicorn.conf.py</span>
<span class="n">bind</span> <span class="o">=</span> <span class="s2">"0.0.0.0:80"</span>
<span class="n">workers</span> <span class="o">=</span> <span class="mi">3</span>
<span class="c1"># Add more config here</span>
</code></pre></div>
<p>and then when we run gunicorn we can just do this:</p>
<div class="highlight"><pre><span></span><code>gunicorn tute.wsgi:application -c config/gunicorn.conf.py
</code></pre></div>
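<p>Because the config file is just Python, you can also calculate the worker count instead of hard-coding it. Here's a sketch of that variation (not what we use in this guide), based on the "(2 x cores) + 1" starting point suggested in Gunicorn's design docs. On our 1-core VM it works out to the same 3 workers.</p>

```python
# gunicorn.conf.py (alternative sketch, not the config used in the video)
import multiprocessing

bind = "0.0.0.0:80"
# Rule of thumb from the Gunicorn design docs: (2 x number of cores) + 1.
# With 1 CPU core this gives 3 workers.
workers = multiprocessing.cpu_count() * 2 + 1
```

<p>The upside is that the config adapts if you move to a bigger server. The downside is that the number is no longer obvious from reading the file.</p>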
<p>Let's set up our Gunicorn config.</p>
<div class="yt-embed">
<iframe
src="https://www.youtube.com/embed/KsCJw3skJdQ"
frameborder="0"
allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
>
</iframe>
</div>
<p>Now that our Gunicorn config has been created, we can set up logging.</p>
<h3>Setting up basic logging</h3>
<p>As I mentioned earlier, we need logging because Gunicorn is now running in the background and we can't see its terminal output.
This is important when something goes wrong in our code and we need to figure out what happened. In this section we'll set up logging so we can see:</p>
<ul>
<li>what supervisord is doing</li>
<li>what requests Gunicorn is receiving</li>
<li>what Gunicorn is doing, plus Django logs</li>
</ul>
<p>This isn't the <em>perfect</em> logging setup, I go into more detail on how we can improve Django logging in production <a href="https://mattsegal.dev/file-logging-django.html">in this blog post</a>, but it's good enough for now.</p>
<p>When we're done, our logs on the server will look like this:</p>
<div class="highlight"><pre><span></span><code>/app
...
└── logs Log files
├── supervisord.log Supervisor logs
├── gunicorn.access.log Gunicorn access logs
└── gunicorn.app.log Gunicorn application logs
</code></pre></div>
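<p>As a rough idea of the Gunicorn side of this, here's a sketch of the logging settings you could add to gunicorn.conf.py to produce the two Gunicorn files above. The setting names (accesslog, errorlog, loglevel, capture_output) are real Gunicorn settings, but the exact values are my assumption based on the folder layout, so check them against the reference repo.</p>

```python
# gunicorn.conf.py (sketch): logging settings only
# Paths assume the /app/logs folder shown above.

# One line per HTTP request that Gunicorn receives.
accesslog = "/app/logs/gunicorn.access.log"
# Gunicorn's own messages: startup, worker crashes, errors.
errorlog = "/app/logs/gunicorn.app.log"
loglevel = "info"
# Also send anything the app writes to stdout/stderr into errorlog.
capture_output = True
```

<p>The supervisord.log file is configured separately, in supervisord.conf.</p>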
<div class="yt-embed">
<iframe
src="https://www.youtube.com/embed/ubR--JB5iQM"
frameborder="0"
allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
>
</iframe>
</div>
<p>Ok we've got logging all set up, looking good! Later on, you might want to also add <a href="https://mattsegal.dev/sentry-for-django-error-monitoring.html">error monitoring</a> to your app, which alerts you when errors happen.</p>
<h3>Running as root</h3>
<p>Before we move on to automating our deployments, there's an elephant in the room that I'd like to address.
This whole time we've been running Gunicorn as the Linux root user.
In Windows terminology we'd call this an "admin" account.</p>
<p>This setup is a potential security risk. Here's the problem: we've given Gunicorn permission
to do <em>anything</em> to our VM. It can delete all the files, install any programs it wants, terminate other processes, whatever.
This will be a problem if a hacker figures out how to execute arbitrary code on our Django app, or manipulate our Django app in some other way (like writing to any part of the filesystem).
Any vulnerability that we accidentally write in our Django app can do maximum damage to our server,
because we've allowed Gunicorn to do everything. The two biggest risks that I see are:</p>
<ul>
<li>a hacker could trash our server and delete all our shit</li>
<li>a hacker could gain control of our server and use it to mine Bitcoin, <a href="https://www.cloudflare.com/en-au/learning/ddos/what-is-a-ddos-attack/">DDoS</a> another server, etc.</li>
</ul>
<p>This is why people say "don't run Gunicorn as root", because if you fuck up your code somewhere, or if Gunicorn itself
is vulnerable somehow, then control of your server and data could be compromised.</p>
<p>So why does this guide have you run Gunicorn as root?</p>
<ul>
<li>It makes it easier for us to access port 80</li>
<li>It removes some extra work around managing file permissions</li>
<li>It avoids some extra config work around creating new users and assigning user roles</li>
<li>Our server, app and data are all pretty trivial and if they're compromised it's not a big deal</li>
</ul>
<p>As you learn more about deploying web apps and managing infrastructure, you'll need to learn to make your own decisions about
the security risks you're willing to take vs. the extra work you'll need to do. For now I think running as root is OK.
In the future, especially if you think your app is important, you may want to run Gunicorn as a non-root user and research
other security measures.</p>
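<p>If you do move away from root later, one small safety net is to have your code check who it's running as. This is only a sketch of that idea, with a made-up helper name. It isn't something this guide sets up.</p>

```python
import os


def running_as_root():
    """Return True if this process has root privileges (Unix only)."""
    # os.geteuid doesn't exist on Windows, so treat that case as "not root".
    geteuid = getattr(os, "geteuid", None)
    return geteuid is not None and geteuid() == 0
```

<p>You could call this when your app starts and log a warning if it returns True, so that an accidental root deployment doesn't go unnoticed.</p>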
<h3>Next steps</h3>
<p>Now that we've got our Django app up-and-running, all on its own, we can look forward to <a href="https://mattsegal.dev/simple-django-deployment-5.html">automating the deployment</a>, so we can deploy our code again and again, quickly and easily.</p>Simple Django deployment part three: deploy code2020-04-26T15:00:00+10:002020-04-26T15:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-04-26:/simple-django-deployment-3.html<p>We've got our server set up, and our Django code is ready.
Now we can actually deploy Django to our server.
The goal of this section is to get a basic deployment done.
We'll do some automation and introduce some extra tools later.</p>
<p>In this section we'll cover:</p>
<ul>
<li>Windows line …</li></ul><p>We've got our server set up, and our Django code is ready.
Now we can actually deploy Django to our server.
The goal of this section is to get a basic deployment done.
We'll do some automation and introduce some extra tools later.</p>
<p>In this section we'll cover:</p>
<ul>
<li>Windows line endings</li>
<li>Uploading and running your Django app</li>
</ul>
<h3>Windows line endings</h3>
<p>A quick aside before we start deploying: Windows line endings. These are the curse of every Django developer running Windows.
This is one of those technical details that you never want to know about, but they'll bite you in the ass if you ignore them.</p>
<p>The TLDR is that in Linux and MacOS, lines end with the "\n" character.
On Windows lines end with "\r\n", because fuck-you-that's-why.
The problem is that your Windows Python files will fail on Linux because they have the wrong line endings.</p>
<p>There are several ways to fix this, including writing our own custom bash or Python scripts to convert these line endings, but for simplicity we'll just use an off-the-shelf tool called <a href="https://linux.die.net/man/1/dos2unix">dos2unix</a>, which I'll show you later.</p>
<p>You can help avoid this problem in VSCode by selecting the "LF" option instead of "CRLF" for the "End of Line Sequence" setting, which is visible in the toolbar on the bottom right hand corner of your screen.</p>
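<p>We'll be using dos2unix in this guide, but to show that there's no magic involved, here's a sketch of the "write our own Python script" option. The function names and the choice of file patterns are my own assumptions.</p>

```python
from pathlib import Path


def crlf_to_lf(data: bytes) -> bytes:
    """Replace Windows line endings (\\r\\n) with Unix ones (\\n)."""
    return data.replace(b"\r\n", b"\n")


def convert_tree(root, patterns=("*.py", "*.sh")):
    """Rewrite matching files under root with LF endings; return changed paths."""
    changed = []
    for pattern in patterns:
        for path in sorted(Path(root).rglob(pattern)):
            original = path.read_bytes()
            converted = crlf_to_lf(original)
            # Only touch files that actually contain Windows line endings.
            if converted != original:
                path.write_bytes(converted)
                changed.append(path)
    return changed
```

<p>Working with bytes rather than text matters here: if you open the files in text mode, Python may translate the line endings for you and hide the very problem you're trying to fix.</p>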
<h3>Uploading and running your Django app</h3>
<p>Let's upload our code to the server and set up our app so we can run it. There are lots of ways to get your code onto the server: scp, rsync, git. I'm going to stick to using scp to limit the number of new tools needed to do this.</p>
<p>Currently our Django project, on our local machine, looks like this:</p>
<div class="highlight"><pre><span></span><code>/django-deploy
├── env Python 3 virtualenv
├── requirements.txt Python requirements
├── db.sqlite3 Local SQLite database
└── tute Django project code
├── tute Django app code
├── counter Django app code
└── manage.py Django management script
</code></pre></div>
<p>When we upload our code, we're going to put it in the root user's home directory - /root/. It'll look like this:</p>
<div class="highlight"><pre><span></span><code>/root
└── deploy All uploaded code
├── requirements.txt Python requirements
└── tute Django project code
├── tute Django app code
├── counter Django app code
└── manage.py Django management script
</code></pre></div>
<p>Then we'll be creating a directory called /app/, which will be the final resting place of our code,
and we will set up our project like this:</p>
<div class="highlight"><pre><span></span><code>/app
├── env Python 3 virtualenv
├── requirements.txt Python requirements
├── db.sqlite3 Production SQLite database
└── tute Django project code
├── tute Django app code
├── counter Django app code
├── staticfiles Collected static files
└── manage.py Django management script
</code></pre></div>
<p>A key idea is that every time we re-deploy our code in the future, we want to delete and re-create the folder /app/tute,
but we want to keep the database (db.sqlite3), or else we lose all our production data.</p>
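<p>Here's that "replace the code, keep the database" idea as a small Python sketch. In the video we do this by hand with shell commands, so treat the function below (and its name) as an illustration of the logic rather than the actual deployment steps.</p>

```python
import shutil
from pathlib import Path


def install_code(deploy_dir, app_dir):
    """Replace app_dir/tute with deploy_dir/tute, leaving everything else alone."""
    deploy_dir, app_dir = Path(deploy_dir), Path(app_dir)
    app_dir.mkdir(parents=True, exist_ok=True)
    target = app_dir / "tute"
    # Delete only the old code folder. db.sqlite3 lives next to it, in
    # app_dir, so it is never touched.
    if target.exists():
        shutil.rmtree(target)
    shutil.copytree(deploy_dir / "tute", target)
    shutil.copy2(deploy_dir / "requirements.txt", app_dir / "requirements.txt")
    return target
```

<p>With the layout above, that would be <code>install_code("/root/deploy", "/app")</code>. Note that the collected static files live inside /app/tute, so they get wiped too and need to be collected again after each install.</p>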
<p>What I'm going to show you now is a very manual process, we will automate this later.</p>
<div class="yt-embed">
<iframe
src="https://www.youtube.com/embed/Hm0Dz61_oQ8"
frameborder="0"
allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
>
</iframe>
</div>
<p>So to recap, the testing we just did looks like this:</p>
<p><img alt="gunicorn http" src="https://mattsegal.dev/gunicorn-server-http.png"></p>
<p>We're most of the way there! We've got our Django app running on our server.
There's just a bit more to go before it's fully deployed.</p>
<h3>Next steps</h3>
<p>To really say that our app is "deployed", we need it to run even when we're not around.
In the next section, we'll learn how to <a href="https://mattsegal.dev/simple-django-deployment-4.html">run Django in the background</a></p>Simple Django deployment part two: local setup2020-04-26T14:00:00+10:002020-04-26T14:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-04-26:/simple-django-deployment-2.html<p>We've got our server set up and ready to host our Django app, now let's focus on preparing our app for deployment.
The goal of this section is to set up and test as much as possible of the stuff that we'll be using in production.
That way, we can debug issues …</p><p>We've got our server set up and ready to host our Django app, now let's focus on preparing our app for deployment.
The goal of this section is to set up and test as much as possible of the stuff that we'll be using in production.
That way, we can debug issues on our computer, instead of on the server.</p>
<p>For this guide I'm going to be creating a Django app from scratch.
I recommend you follow along and set up your project like I do, rather than trying to deploy an existing Django project.
You can try deploying your existing app after you've finished the guide. Remember: new skills on easy terrain.</p>
<p>In this section we'll cover:</p>
<ul>
<li>Setting up our Python environment</li>
<li>Creating a basic Django app</li>
<li>SQLite limitations</li>
<li>Preparing Django for production</li>
<li>Serving static files in production</li>
<li>Preparing our WSGI server</li>
<li>Windows line endings</li>
</ul>
<h3>Setting up our Python environment</h3>
<p>I assume you've got Python 3 already installed on your computer. If you don't, <a href="https://realpython.com/installing-python/#windows">install it now</a>.</p>
<p>We're going to be installing some Python packages for our app and we also will want to install the same packages on our server.
To keep things consistent, we're going to use a "virtual environment" (virtualenv) for this project.
In general it's good practice to always use a virtualenv, for these reasons:</p>
<ul>
<li>It helps maintain consistency between our local project and the deployed project</li>
<li>It helps you keep track of what packages you need to run the project</li>
<li>It helps minimise the number of packages that we need to install when we deploy</li>
<li>It keeps other apps on the same computer from overwriting our packages with different versions</li>
</ul>
<p>Here's how to start our project with a virtualenv.</p>
<div class="yt-embed">
<iframe
src="https://www.youtube.com/embed/8ja20EjR7zs"
frameborder="0"
allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
>
</iframe>
</div>
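<p>If you'd rather see it written down than watch it, here's a minimal sketch using the "venv" module from Python's standard library. This is an assumption on my part about tooling: the video may use a different command, and the function name create_virtualenv is made up for illustration. Running "python -m venv env" on the command line does the same job.</p>

```python
import venv
from pathlib import Path


def create_virtualenv(target_dir, with_pip=True):
    """Create a virtual environment in target_dir and return its path."""
    target = Path(target_dir)
    # EnvBuilder is the class behind the "python -m venv" command.
    builder = venv.EnvBuilder(with_pip=with_pip)
    builder.create(target)
    return target
```

<p>Calling create_virtualenv("env") makes an "env" folder next to your code. You still need to activate it in your shell before running pip, so that packages get installed inside the virtualenv rather than system-wide.</p>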
<h3>Creating a basic Django app</h3>
<p>Now that we've got Django installed, let's create our Django project. This guide covers some of the same ground as the <a href="https://docs.djangoproject.com/en/3.0/intro/tutorial01/">Django tutorial</a>, but we're going to skim through it, because the point isn't to teach you Django basics, it's to teach you how to deploy Django. If you're not familiar with Django then try out the tutorial first.</p>
<p>In addition some of my code (ie. the views) is going to be a little half-assed, since the purpose of the guide is not to show you how to write "good" Django views, it's just to get something basic working so we can deploy it.</p>
<p>I've put the <a href="https://github.com/MattSegal/django-deploy">reference code for this guide onto GitHub</a>, which you might want to look at while you're following along.</p>
<p>This video will show you how we're going to set up our Django project, and importantly, it will show you how to implement the key features that we want to test later, namely:</p>
<ul>
<li>A view which interacts with a database model</li>
<li>Some static files (eg. CSS, JS)</li>
<li>Our database setup</li>
<li>The admin panel</li>
</ul>
<div class="yt-embed">
<iframe
src="https://www.youtube.com/embed/fOvQfz8GZeM"
frameborder="0"
allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
>
</iframe>
</div>
<p>Now we've created our app and it's working locally. The next step is to get it ready for production. Here's a diagram of how we've been running our app and serving requests so far.</p>
<p><img alt="runserver http" src="https://mattsegal.dev/runserver-http.png"></p>
<h3 id="sqlite">Is SQLite OK for production?</h3>
<p>Before we move on, I want to talk about SQLite quickly. You can skip this bit if you don't care. We'll be using SQLite as our database in development and in production. It'll be two separate databases - we're not going to copy our local SQLite file to the server. The main reason that I'm using SQLite instead of a more advanced database like PostgreSQL or MySQL is because I want to keep this guide as simple as I can.</p>
<p>Is it bad practice to use SQLite in production? Are we taking some shitty shortcut that will bite us in the ass later? Mostly no. Here's what the creators of SQLite <a href="https://www.sqlite.org/whentouse.html">have to say</a> about running it for webservers:</p>
<blockquote>
<p>SQLite works great as the database engine for most low to medium traffic websites (which is to say, most websites).</p>
</blockquote>
<p>For our needs, the performance of SQLite is totally fine. There are some limitations to SQLite that are worth mentioning though (<a href="https://djangodeployment.com/2016/12/23/which-database-should-i-use-on-production">discussed here</a>). One concern is that only one change to the database can <a href="https://www.sqlite.org/faq.html#q5">happen at a time</a>. Multiple concurrent reads, but only one write:</p>
<blockquote>
<p>Multiple processes can have the same database open at the same time. Multiple processes can be doing a SELECT at the same time. But only one process can be making changes to the database at any moment in time, however.</p>
</blockquote>
<p>Most website traffic is reads, not writes, so it's not as bad as it sounds.
Still, what happens in Django when two users try to write to an SQLite database at the same time? I think this will happen:</p>
<ul>
<li>One user will get a lock on the database, and will write their changes, while the other user will be forced to wait</li>
<li>If the first user finishes quickly enough, then the second user will get their turn - no problem here</li>
<li>If the first user takes too long, then the second user gets an error "OperationalError: 'database is locked'"</li>
</ul>
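<p>You can see this behaviour without Django at all. The sketch below uses Python's built-in sqlite3 module with two connections standing in for the two users. The table name and the very short 0.1 second timeout are just choices for the demo, and Django's own handling sits on top of this, so treat it as an illustration of the SQLite side only.</p>

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "demo.sqlite3")

# isolation_level=None means we start and end transactions ourselves.
first = sqlite3.connect(db_path, isolation_level=None)
first.execute("CREATE TABLE notes (body TEXT)")

# The second "user" only waits 0.1 seconds for the lock before giving up.
second = sqlite3.connect(db_path, timeout=0.1, isolation_level=None)

# The first user takes the write lock and holds it.
first.execute("BEGIN IMMEDIATE")
first.execute("INSERT INTO notes VALUES ('from first')")

# The second user tries to write while the lock is held.
try:
    second.execute("BEGIN IMMEDIATE")
    second_error = None
except sqlite3.OperationalError as exc:
    second_error = str(exc)

# Once the first user commits, the second user gets their turn.
first.execute("COMMIT")
second.execute("BEGIN IMMEDIATE")
second.execute("INSERT INTO notes VALUES ('from second')")
second.execute("COMMIT")

row_count = first.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
first.close()
second.close()
```

<p>The second connection fails only because the first one is still holding the lock when the timeout runs out. With a longer timeout, or a faster first write, it would simply wait and then succeed.</p>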
<p>You can <a href="https://docs.djangoproject.com/en/3.0/ref/databases/#database-is-locked-errors">increase the wait time if you need to</a>. I really don't think this is a big issue for low-volume learning projects, or small basic websites with medium traffic.</p>
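<p>As a rough sketch, the wait time is set through the "timeout" option in your database settings, as described in the Django docs linked above. The 20 second value and the BASE_DIR definition here are placeholders: your real settings.py will already define BASE_DIR its own way.</p>

```python
from pathlib import Path

# Assumption: stands in for the BASE_DIR that settings.py normally defines.
BASE_DIR = Path.cwd()

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": str(BASE_DIR / "db.sqlite3"),
        "OPTIONS": {
            # Seconds to wait for the lock before raising "database is locked".
            "timeout": 20,
        },
    }
}
```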
<p>The other issue worth mentioning is switching from SQLite to another database like PostgreSQL. This will probably be annoying to do: you need to dump your data to disk as JSON or something, then reload it into Postgres. If this seems like a huge issue for you, then I suggest you follow this guide, then learn how to switch SQLite for Postgres before you fill your database with valuable data. Take small steps.</p>
<p>One thing worth noting is that SQLite is <em>really easy</em> to back up. You just make a copy of the file - done!</p>
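<p>One caveat: if your app happens to be in the middle of a write while you copy the file, the copy can catch a half-finished state. Python's sqlite3 module exposes SQLite's backup API (Connection.backup, available since Python 3.7), which copes with that. Here's a minimal sketch, where the function name and file paths are made up for illustration:</p>

```python
import sqlite3


def backup_sqlite(source_path, backup_path):
    """Copy a SQLite database to backup_path using SQLite's backup API."""
    source = sqlite3.connect(source_path)
    target = sqlite3.connect(backup_path)
    try:
        # Copies every page of the source database into the target file.
        source.backup(target)
    finally:
        target.close()
        source.close()
```

<p>For a small site where nothing is writing at the time, a plain file copy does the same job.</p>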
<h3>Preparing Django for production</h3>
<p>We need to make some changes to our Django settings to prepare our project for production, mostly for security reasons. The big 3 are:</p>
<ul>
<li><strong>DEBUG</strong>: needs to be set to False to prevent Django from leaking information like error messages</li>
<li><strong>SECRET_KEY</strong>: needs to be set to something that's actually secret: you can't put it on GitHub</li>
<li><strong>ALLOWED_HOSTS</strong>: needs to be set to a whitelist of the IP addresses / domain names that your app can be served from, to prevent HTTP Host header attacks</li>
</ul>
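<p>One common way to handle all three is to read them from environment variables, so that nothing secret ends up in your code. This is a sketch of that pattern and not necessarily what the video does. The variable names (DJANGO_DEBUG, DJANGO_SECRET_KEY, DJANGO_ALLOWED_HOSTS) and the function name are my own inventions for illustration.</p>

```python
def read_production_settings(environ):
    """Build the three security-related settings from environment variables."""
    hosts = environ.get("DJANGO_ALLOWED_HOSTS", "")
    return {
        # Debug mode stays off unless it is switched on explicitly.
        "DEBUG": environ.get("DJANGO_DEBUG", "") == "true",
        # No fallback value: a missing key raises KeyError rather than
        # quietly falling back to a key that anyone can read on GitHub.
        "SECRET_KEY": environ["DJANGO_SECRET_KEY"],
        # Comma-separated list, eg. "64.225.23.131,www.mysite.com".
        "ALLOWED_HOSTS": [h.strip() for h in hosts.split(",") if h.strip()],
    }
```

<p>In settings.py you would call this with os.environ and assign DEBUG, SECRET_KEY and ALLOWED_HOSTS from the result. Locally you can set DJANGO_DEBUG to "true", and on the server you just leave it unset.</p>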
<div class="yt-embed">
<iframe
src="https://www.youtube.com/embed/nL6yJOKTzO0"
frameborder="0"
allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
>
</iframe>
</div>
<p>Our server seems to be <em>mostly</em> working with our new production settings...
other than our static files mysteriously breaking. Let's fix that next.</p>
<h3>Serving static files in production</h3>
<p>So your static files (CSS, JS, images, etc) work fine when DEBUG=True, but they're broken when DEBUG=False.
This doesn't seem like a "debug" thing... what the fuck? Right?
They were working before!?!? Whose idea was this?</p>
<p>Aren't you glad you found out about this problem <em>before</em> you tried to deploy your app?</p>
<p>Many Django developers have been slapped in the face by this surprise.
If you want to go outside and scream now's a good time.</p>
<blockquote>
<p>AIIIIIIIIIIIIIIEEEEAAAAAAAAAAAAHHHH!!!!</p>
</blockquote>
<p>Computers can be frustrating! I like Django and the people who built it.
That said, this is one of the few times where I feel like the framework lets you down.
Django's docs on the <a href="https://docs.djangoproject.com/en/3.0/howto/static-files/deployment/">subject of deploying static files</a> are somewhere between cryptic and infuriating.
They're usually good docs too!</p>
<p>The reason that the static files break when DEBUG=False is that there are lots of different ways to serve static content.
When you are in DEBUG=True mode, Django helpfully serves your static files for you.
When you set DEBUG=False, you're on your own - Django forces you to figure out how you're going to serve static files in production.</p>
<p>There are several options available: most of the choices are made around hosting costs, the other tech tools you're using,
bandwidth, performance - shit we don't care about right now.
We want the simplest solution for serving static files in production.</p>
<p>As far as I know <a href="http://whitenoise.evans.io/en/stable/">Whitenoise</a> is the simplest way to serve static files in production:</p>
<blockquote>
<p>Radically simplified static file serving for Python web apps... None of this is rocket science, but it’s fiddly and annoying and WhiteNoise takes care of all it for you.</p>
</blockquote>
<p>Sounds good right? It basically just does what runserver was doing before we set DEBUG=False, except maybe a bit better, or something. Their <a href="http://whitenoise.evans.io/en/stable/index.html">documentation</a> and <a href="http://whitenoise.evans.io/en/stable/index.html#infrequently-asked-questions">FAQ</a> go over what it does for you. We're going to use the CloudFlare CDN in a later part of this guide to cache our static files, so that will solve most of our performance concerns.</p>
<p>Let's follow their <a href="http://whitenoise.evans.io/en/stable/django.html">guide</a> and set up Django to use Whitenoise for static files. Before we get to the video let's go over the important bits.</p>
<p>First we have to install it:</p>
<div class="highlight"><pre><span></span><code>pip install whitenoise
</code></pre></div>
<p>We also have to set STATIC_ROOT in our Django settings. STATIC_ROOT is a folder where Django will dump all of your static files when you run the "collectstatic" management command. Whitenoise looks inside this folder when DEBUG=False, so it's important we set it, and run "collectstatic" when we deploy. We'll go over this more in the video.</p>
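<p>Roughly, the settings changes look like the sketch below, based on the Whitenoise guide linked above. The "staticfiles" folder name is my assumption (you can call it whatever you like), and BASE_DIR here stands in for the one your settings.py already defines. The important detail from their docs is that the Whitenoise middleware goes directly after Django's SecurityMiddleware.</p>

```python
from pathlib import Path

# Assumption: stands in for the BASE_DIR that settings.py normally defines.
BASE_DIR = Path.cwd()

STATIC_URL = "/static/"
# "collectstatic" copies every static file into this folder.
STATIC_ROOT = str(BASE_DIR / "staticfiles")

MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    # Whitenoise sits directly after SecurityMiddleware.
    "whitenoise.middleware.WhiteNoiseMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.common.CommonMiddleware",
    "django.middleware.csrf.CsrfViewMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    "django.contrib.messages.middleware.MessageMiddleware",
    "django.middleware.clickjacking.XFrameOptionsMiddleware",
]
```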
<p>Alright, let's set up Whitenoise and solve our static files problem.</p>
<div class="yt-embed">
<iframe
src="https://www.youtube.com/embed/97UQM-Cfhxs"
frameborder="0"
allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
>
</iframe>
</div>
<h3 id="wsgi">Preparing our WSGI server</h3>
<p>So far we've been using the "runserver" management command to run our Django code and serve HTTP requests.
It works pretty well for development - the way it auto restarts when files change is pretty handy.
There's some trouble with running runserver in production though - the Django docs <a href="https://docs.djangoproject.com/en/2.2/ref/django-admin/#runserver">say it best</a>:</p>
<blockquote>
<p>DO NOT USE THIS SERVER IN A PRODUCTION SETTING. It has not gone through security audits or performance tests. (And that’s how it’s gonna stay. We’re in the business of making Web frameworks, not Web servers, so improving this server to be able to handle a production environment is outside the scope of Django.)</p>
</blockquote>
<p>Why <em>exactly</em> is using runserver in prod a bad idea? Honestly I don't know, I've never tried. Something about security and performance... here's the thing: when the people writing the software tell you not to use it in production (in all caps no less), it's best to just listen to them, unless you're confident that you understand the risks and benefits.</p>
<p>So... what do we use to run our Django app instead? We're going to use <a href="https://gunicorn.org/">Gunicorn</a>, basically because it's a popular WSGI server and I'm familiar with it and it seems OK. Another widely used contender is <a href="https://uwsgi-docs.readthedocs.io/en/latest/">uWSGI</a>. I've seen <a href="http://docs.pylonsproject.org/projects/waitress/en/stable/">Waitress</a> recommended for running on Windows, but I've never tried it myself.</p>
<p>You might be wondering what "<a href="https://wsgi.readthedocs.io/en/latest/what.html">WSGI</a>" ("Web Server Gateway Interface") means. WSGI is a type of "interface". I think it's much easier to explain with examples than to get too theoretical.</p>
<p>Here are some WSGI compatible web frameworks:</p>
<ul>
<li>Django</li>
<li>Flask</li>
<li>Pyramid</li>
<li>web2py</li>
</ul>
<p>Here are some WSGI compatible web servers:</p>
<ul>
<li>Gunicorn</li>
<li>uWSGI</li>
<li>CherryPy</li>
<li>Apache's mod_wsgi module</li>
</ul>
<p>Web frameworks (eg. Django) are just some Python code: you need a web server to actually run the code and translate incoming HTTP requests (which are just text) into Python objects. The WSGI specification makes it so that any WSGI compatible webserver can run any WSGI compatible web framework, which means:</p>
<ul>
<li>Gunicorn can run Django</li>
<li>Gunicorn can run Flask</li>
<li>CherryPy can run web2py</li>
<li>mod_wsgi can run Django</li>
<li>... etc etc etc ...</li>
</ul>
<p>This is a good thing because it means that if you are using a particular web framework (eg. Django), you have a lot of choices for which web server you run (eg. Gunicorn). It's also good for web server developers, because lots of people with different web frameworks can use their tools.</p>
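<p>To make the "interface" idea a bit more concrete, here's a tiny WSGI app written with nothing but the standard library. The app is a function that takes the request as a dict ("environ") plus a callback ("start_response"), and returns the response body as bytes. The call_app helper is something I've made up to play the role of the web server, so don't read it as how Gunicorn is actually implemented.</p>

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A complete WSGI app: request comes in as a dict, body goes out as bytes."""
    body = f"Hello from {environ['PATH_INFO']}".encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(app, path):
    """Play the role of the web server: build an environ and call the app."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the keys WSGI requires
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, response_headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = response_headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


# To serve it for real with the standard library's reference server:
#   from wsgiref.simple_server import make_server
#   make_server("127.0.0.1", 8000, application).serve_forever()
```

<p>Django does the same thing on a bigger scale: the wsgi.py file in your project exposes an object called "application", and a WSGI server like Gunicorn imports it and calls it for every request.</p>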
<p>With that out of the way, let's get stuck into using Gunicorn instead of runserver to run our Django app.</p>
<div class="yt-embed">
<iframe
src="https://www.youtube.com/embed/wHmpB2AEmZY"
frameborder="0"
allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
>
</iframe>
</div>
<p>So before we were doing this:</p>
<p><img alt="runserver http" src="https://mattsegal.dev/runserver-http.png"></p>
<p>Now we're doing this (hypothetically if Gunicorn actually worked on Windows):</p>
<p><img alt="gunicorn http" src="https://mattsegal.dev/gunicorn-http.png"></p>
<p>Nothing too crazy.</p>
<h3>Next steps</h3>
<p>Now that we've done our local setup, we're ready to <a href="https://mattsegal.dev/simple-django-deployment-3.html">deploy Django to the server</a>.</p>Simple Django deployment part one: infrastructure2020-04-26T13:00:00+10:002020-04-26T13:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-04-26:/simple-django-deployment-1.html<p>In order to deploy our Django app, we need somewhere to run it: we need a server.
In this section we'll be setting up our server in "the cloud".
Doing this can be fiddly and annoying, especially if you're new, so we want to get it right first before …</p><p>In order to deploy our Django app, we need somewhere to run it: we need a server.
In this section we'll be setting up our server in "the cloud".
Doing this can be fiddly and annoying, especially if you're new, so we want to get it right first before we involve our Django app. By the end of this section we will learn how to:</p>
<ul>
<li>Install ssh and scp on Windows</li>
<li>Create an SSH key</li>
<li>Set up our cloud web server</li>
<li>Access our server and upload files</li>
<li>Test our setup</li>
</ul>
<h3>Installing our tools</h3>
<p>We'll need some tools to access the server:</p>
<ul>
<li>the "bash" <a href="https://en.wikipedia.org/wiki/Unix_shell">shell</a> for scripting</li>
<li>the "ssh" tool for logging into our web server</li>
<li>the "scp" tool to transfer files to our server</li>
</ul>
<p>We need to install these tools on Windows and the fastest and easiest way I know of is to just <a href="https://git-scm.com/download/win">install Git</a>.
We won't be using Git, just some of the tools that get installed with it. You can <a href="https://www.codecademy.com/learn/learn-git">learn Git</a> some other time.
If you're using a Mac or Linux you can skip this step and open up a terminal window.</p>
<div class="yt-embed">
<iframe
src="https://www.youtube.com/embed/yizAaMHUC5w"
frameborder="0"
allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
>
</iframe>
</div>
<h3>Creating an SSH key</h3>
<p>We'll be using an "SSH key" as a way to authenticate ourselves when we log in to the server with ssh.
We need this key for when we create our server, so we're doing this bit first.
It's possible to just use a username / password when logging in via SSH, but creating a key is more convenient in the long run.
In particular, using a key means you don't have to type in a password every time you want to access the server.</p>
<p>In this video, we'll be creating an SSH key using the "ssh-keygen" command in bash:</p>
<div class="highlight"><pre><span></span><code>ssh-keygen -C <span class="s2">"mattdsegal@gmail.com"</span>
</code></pre></div>
<div class="yt-embed">
<iframe
src="https://www.youtube.com/embed/BIc1TWrVQcw"
frameborder="0"
allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
>
</iframe>
</div>
<p>Now that we've created our key, we're ready to use it to log into web servers. To recap:
the SSH key we just created has two parts, a public key (id_rsa.pub) and a private key (id_rsa).
Both keys will be stored in ~/.ssh by convention. You can read your public key like this:</p>
<div class="highlight"><pre><span></span><code>cat ~/.ssh/id_rsa.pub
</code></pre></div>
<p>A public key is like your "username" when logging in - it's public information and it's OK to share it. It looks like this:</p>
<div class="highlight"><pre><span></span><code>ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCVtWYN+uDbdG6RtmU8vcaj1cxYM0kK6565LFa
MMkolZZlSA6MhfWfGUmIGswIHJ/yjQCRQRihlEdm0VQJgsFBtK36J/U2/u+cMjGXwN/9swYBsnj
8bSMRzYc2s6PeshYmefpD80dWsvW550zqHmOwnKxeiwpz1q+rqUgT/xd0nOATw92nx5CS7ozhnL
t0FA0r0fk9LGih473Ho4/22fsAGXTcnMV5VoDgeBP4z8BLt16pKD8fgSGB8OG3/bN6udY54TcM2
rFjfN8yP+Vcbs5xBd3HaTu8Z42IPdC46Z25WMt285FLLZyUqWY36CrQZoTEf9F6aCkFgwtOCN81
u0Qr1 foo@bar.com
</code></pre></div>
<p>It's usually all just one long line of text.
The "ssh-rsa" part means it's a key for SSH and uses RSA encryption.
The "AAAAB3N...u0Qr1" part is your actual key, it uniquely identifies your keypair.
The "foo@bar.com" part is just a comment, it can be anything, but by convention it's your email.</p>
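<p>If it helps, here's a small sketch that pulls those three parts out of a public key line. The function name is made up, and it only handles the simple "type, key, comment" layout described above.</p>

```python
def parse_public_key(line):
    """Split one line of a public key file into its three parts."""
    # Split on whitespace at most twice so a comment with spaces stays whole.
    parts = line.strip().split(None, 2)
    if len(parts) < 2:
        raise ValueError("expected at least a key type and the key itself")
    key_type, key_data = parts[0], parts[1]
    comment = parts[2] if len(parts) == 3 else ""
    return {"type": key_type, "data": key_data, "comment": comment}
```

<p>You could feed it the contents of ~/.ssh/id_rsa.pub to check that the comment at the end is what you expect.</p>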
<p>A private key is like your password - do not share it with anyone. Private keys look like this:</p>
<div class="highlight"><pre><span></span><code>-----BEGIN RSA PRIVATE KEY-----
MIIEowIBAAKCAQEAlbVmDfrg23RukbZlPL3Go9XMWDNJCuueuSxWjDJKJWWZUgOj
IX1nxlJiBrMCByf8o0AkUEYoZRHZtFUCYLBQbSt+if1Nv7vnDIxl8Df/bMGAbJ4/
G0jEc2HNrOj3rIWJnn6Q/NHVrL1uedM6h5jsJysXosKc9avq6lIE/8XdJzgE8Pdp
8eQku6M4Zy7dBQNK9H5PSxooeO9x6OP9tn7ABl03JzFeVaA4HgT+M/AS7deqSg/H
4EhgfDht/2zernWOeE3DNqxY3zfMj/lXG7OcQXdx2k7vGeNiD3QuOmduVjLdvORS
y2clKlmN+gq0GaExH/RemgpBYMLTgjfNbtEK9QIDAQABAoIBAQCAtWwAKOiYxAkr
jTyMdDwLLwx359+sW9YiLVRbRAErFaYzNJ1TdZV6k+ljCRN9Q4uYbtTJjwe7nRUm
TM+2gN8kfHhV+kiVxt5lk28wj3Qx9EqNF5/5vR3odPV26vPEhypB8V6FfYHO+S25
3zg6y+Z75jhz3g1DyYI14j4aB+qSg+5YSQ56vG+vhutYD41XVp3bkgD76kL8QFxd
q2cTpF5WSoZF49CaPhE2PnHoMZibLRQUOG+wJWkrQHcU+UnoWQvQUkqGNeRYpmI5
49Umk3b+/MQr0Dj6vuT2ZKqgFjr2FEu8AoA0tCZu8GUXWS1LshuZuqb1D0ZY+KOj
1SpEY48hAoGBAMXAbA8iNeON93yhOf7GqRCecL/wuyn6KqifCtEg8HbhbafcWBsd
/prEnEJHGRTU+omfLx7uVJmAgkVhj/uHl8K9L2qvlqkbBqF/rWPwWBi94G/cszn4
tYb91sTnnOk+QRjs/bGSCcv6kGlv2Bv0YYie0K6oQNPD9SqXCAit5hCDAoGBAMHO
Qv2+JcrrfLjoCbUaPaJ2sblO6Bq1RpUsltE1lL8nldTu/MgUIA3dEExg/hl7YLYW
vQVgNZkJQxHf1UJgZF/FTfw8yYaN4CKX4lOXDeKW8kN8xTtQQCPi77w/jxz5DUau
wmhqYOVwrCeqvLI4qUoEp6oOsQ83GxjHKXPfNK0nAoGAQD5aHLSFg068xz1tpOqP
RDnk8UZY17NRJoS8s+IanNRxlmYMLYsaCtey2AlXCaCDYDBZ05ej3laUe8vNRe7w
C7EAdY1jyb5g8hiTkPMk+6y7/Dtb8optFtTicAe6vz+dUGa1qHmEO0NEpSxTrgk/
om3N59/7Z5Cy1kpIruEn69cCgYBgRYCboVgOq9nB1Gn2D3nseT+hiKPdmIzeT07/
z7j7F8PjCXCCRxUBLf4Joui2acZJzZPJ1tfpFGO/vkumdFGIDW/Gy79j2pgrNv2T
fmbEVy0y/wjOhPfHm9Rw07XYs5K3uNoTmjxV3Rl3fuXLNkBJ53QOEsw7fak1LsHV
sFvvYwKBgDTRNKz3217jYjuAzeluHQHsfrUKn73DzckivDHr35m1JR37rlNXGWRE
JlN82KNevAqrDabwYPZnkDrPMGpLQi1A1icUtiBXRRskeR6ULxOASVyAJ3N1WV8T
1/g0+hahPeNFGQG649Z1d5WYSJeVbz7is3MiVGYaQu+iNz9VXNq5
-----END RSA PRIVATE KEY-----
</code></pre></div>
<p>... I'm not actually using this one so it's OK to share.</p>
<h3>Creating the server</h3>
<p>You've probably heard the word "server" used to refer to a dozen different things, so let me be specific.
Our server will be a Linux virtual machine (VM), which we are going to rent from DigitalOcean, a cloud hosting company.
DigitalOcean will run our VM in one of their datacenters, which is a <a href="https://lh3.googleusercontent.com/7D8_SzSQQn-uDeKq4R7SSER5LO7fjsnkCLJ-uZG443cKHFS20nU-SyvlzXaGP97Fgt31MYJdgy94563uETi9jbosUMYQzO95-H0PRg=w2114-h1058-n">big building</a> that is <a href="https://www.pon-cat.com/application/files/6215/3995/7501/Datacenter-Pon_Power.jpg">full of computers</a>. For our purposes, this VM is a stand-alone computer that is for our private usage, with a static IP address which we can use to find it online.</p>
<p>The first thing you need to do is create an account with <a href="https://www.digitalocean.com/">DigitalOcean</a>. The only reason I've chosen this company is because they have a nice web user interface and I already use them. Other than that, there's no reason you couldn't also use Linode, AWS, Google Cloud or Azure to do the exact same thing. They all provide Linux web servers for rent.</p>
<p>Once you've created your account, you can follow this video for the rest of the setup.
I'm not sure exactly when they're going to ask you to put your credit card details in, but have a credit card ready.</p>
<p>By the end of this video we'll have created our server and we'll have an IP address, which we can use to log into the server.</p>
<div class="yt-embed">
<iframe
src="https://www.youtube.com/embed/mdRTN-rzi94"
frameborder="0"
allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
>
</iframe>
</div>
<h3>Setting up the server</h3>
<p>The main tools that we'll need to run our Django app on this server are Python 3 and pip, the Python package manager.
You'll find that Python 3 is already installed on our Ubuntu server, but we need to install pip.
We'll be using the <a href="https://devconnected.com/apt-package-manager-on-linux-explained/">apt package manager</a> to download and install pip.</p>
<div class="yt-embed">
<iframe
src="https://www.youtube.com/embed/wHbOsG1UV9Q"
frameborder="0"
allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
>
</iframe>
</div>
<p>By the way, it turns out that "LTS" stands for <a href="https://en.wikipedia.org/wiki/Long-term_support">Long Term Support</a> and refers to Ubuntu's policy of how they plan to provide patches in the future (not super relevant to this guide).</p>
<h3>Uploading files and troubleshooting HTTP</h3>
<p>So now we know how to create a server, log in with ssh and install the software we need to run Django.
Next I will show you how to upload files to the server with scp.
In addition, I'll show you how to run a quick and easy HTTP web server, which can be useful for debugging later. You will need <a href="https://realpython.com/installing-python/#windows">Python 3 installed</a> on your desktop for this step.</p>
<div class="yt-embed">
<iframe
src="https://www.youtube.com/embed/sQNNsetMZfg"
frameborder="0"
allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
allowfullscreen
>
</iframe>
</div>
<p>If you want to take a 40 minute side-quest I recommend checking out Brian Will's "The Internet" videos to learn more about what HTTP, TCP, and ports are: <a href="https://www.youtube.com/watch?v=DTQV7_HwF58">part 1</a>, <a href="https://www.youtube.com/watch?v=3fvUc2Dzr04&t=167s">part 2</a>, <a href="https://www.youtube.com/watch?v=_55PyDw0lGU">part 3</a>, <a href="https://www.youtube.com/watch?v=yz3lkSqioyU">part 4</a>.</p>
<p>I'll recap on the HTTP troubleshooting, so you understand why we just did that.
The main purpose was to get us ready to troubleshoot our Django app later.</p>
<p>We ran a program (http.server) on our Linux VM using Python.
That program was listening for HTTP requests on port 80, and when someone sends it a GET request, it responds with the contents of the index.html file that we wrote and uploaded.</p>
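<p>Here's a self-contained sketch of the same idea that you can run on your own computer. It starts Python's built-in HTTP server in the background, then sends it a GET request. To keep it harmless I've made some changes from what we did on the server: it listens on 127.0.0.1 only, it lets the operating system pick a free port instead of using port 80, and it serves a throwaway index.html from a temporary folder.</p>

```python
import functools
import http.client
import http.server
import tempfile
import threading
from pathlib import Path

# A throwaway folder containing the file we want to serve.
site_dir = Path(tempfile.mkdtemp())
(site_dir / "index.html").write_text("Hello from index.html")

# Port 0 asks the operating system for any free port.
handler = functools.partial(
    http.server.SimpleHTTPRequestHandler, directory=str(site_dir)
)
server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Send a GET request for "/", which the server answers with index.html.
conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
conn.request("GET", "/")
response = conn.getresponse()
status = response.status
html = response.read().decode("utf-8")
conn.close()

server.shutdown()
server.server_close()
```

<p>The request for "/" comes back with the contents of index.html because that's what the built-in server does when a folder contains a file with that name.</p>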
<p>If we use our web browser and visit our web server's IP address (64.225.23.131 in my case), our web browser will send an HTTP GET request to our server on port 80. Sometimes this works and gets the HTML, but under some circumstances it will fail. The point of our troubleshooting is to figure out why it is failing. It could be that:</p>
<ul>
<li>Our computer is not connected to the internet</li>
<li>Our web server is turned off</li>
<li>We cannot access our web server over the internet</li>
<li>A firewall on our web server is blocking port 80</li>
<li>etc. etc. etc</li>
</ul>
<p>One way we can figure out what's going on is to log into the VM using ssh and make an HTTP GET request using curl. If the curl request works, then we know that our HTTP server program is working and serving requests <em>inside</em> the VM, and the problem is something between our computer and the VM. Once we've narrowed down the exact problem, then we can figure out how to fix it.</p>
<p><img alt="troubleshooting http" src="https://mattsegal.dev/troubleshoot-http.png"></p>
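<p>The same narrowing-down can be sketched in Python. This is a rough illustration and not a proper diagnostic tool: the function name is made up, and the guesses in the messages (like a timeout pointing at a firewall) are the usual suspects, not certainties. It first checks whether it can connect to the port at all, and only then tries an actual HTTP request.</p>

```python
import http.client
import socket


def diagnose(host, port=80, timeout=3.0):
    """Return a short description of how far an HTTP GET to host:port gets."""
    # Step 1: can we open a TCP connection to the port at all?
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.timeout:
        return "timed out: server off, wrong IP, or a firewall dropping traffic"
    except ConnectionRefusedError:
        return "connection refused: machine reachable, nothing listening on the port"
    except OSError as exc:
        return f"network error: {exc}"
    sock.close()

    # Step 2: the port is open, so does something answer HTTP on it?
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/")
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException) as exc:
        return f"port is open but the HTTP request failed: {exc}"
    finally:
        conn.close()
    return f"HTTP server responded with status {status}"
```

<p>Running diagnose("127.0.0.1") on the VM and diagnose with the server's IP address from your own computer gives you the same inside / outside comparison as the curl approach.</p>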
<p>This style of troubleshooting will become useful when we start setting up our Django app on the server.</p>
<h3>Next steps</h3>
<p>Now our server is ready to serve Django and we know how to troubleshoot HTTP connections.
Next we will <a href="https://mattsegal.dev/simple-django-deployment-2.html">prepare and test Django locally</a>.</p>Simple Django deployment: a guide2020-04-26T12:00:00+10:002020-04-26T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-04-26:/simple-django-deployment.html<p>You're learning web development with Django. You've followed the <a href="https://docs.djangoproject.com/en/3.0/intro/tutorial01/">official introductory tutorial</a> and you can get a Django app working on your local computer. Now you want to put your web app onto the internet. Maybe it's to show your friends, or you actually want to use it for something …</p><p>You're learning web development with Django. You've followed the <a href="https://docs.djangoproject.com/en/3.0/intro/tutorial01/">official introductory tutorial</a> and you can get a Django app working on your local computer. Now you want to put your web app onto the internet. Maybe it's to show your friends, or you actually want to use it for something, or maybe you just want to learn how to deploy Django apps. This guide has six parts:</p>
<ol>
<li><a href="https://mattsegal.dev/simple-django-deployment-1.html">Server setup</a></li>
<li><a href="https://mattsegal.dev/simple-django-deployment-2.html">Prepare and test Django locally</a></li>
<li><a href="https://mattsegal.dev/simple-django-deployment-3.html">Deploy Django to the server</a></li>
<li><a href="https://mattsegal.dev/simple-django-deployment-4.html">Run Django in the background</a></li>
<li><a href="https://mattsegal.dev/simple-django-deployment-5.html">Automate the re-deployment</a></li>
<li><a href="https://mattsegal.dev/simple-django-deployment-6.html">Domain setup</a></li>
</ol>
<p>You can start the guide with part 1 now. If you're interested, read on to learn more about what motivated me to write this.</p>
<h3>Stuck, frustrated, confused</h3>
<p>You've probably tried <a href="https://www.digitalocean.com/community/tutorials/how-to-set-up-django-with-postgres-nginx-and-gunicorn-on-ubuntu-18-04">tutorials like this</a> which give you a bunch of steps to follow, commands to type, files to configure. This is how I learned to deploy Python web apps: with online tutorials and a lot of Googling. When you follow these guides, you have no fucking idea what you're actually doing. Why do you use that tool? Why do you type that command? You may as well be <a href="https://youtu.be/nAQBzjE-kvI?t=33">learning magic at Hogwarts</a>. You could easily swap:</p>
<blockquote>
<p>What is apt? Why am I using it to install postgresql-contrib and libpq-dev?</p>
</blockquote>
<p>with</p>
<blockquote>
<p>Why do I have to say Wingardium Levios<em>aaa</em> not Leviosa<em>rrr</em> to get my spell to work?</p>
</blockquote>
<p>It's not your fault. These kinds of guides throw a lot of unfamiliar tools and concepts at you without taking the time to teach you about them. The DigitalOcean guide above smacks you with:</p>
<ul>
<li>apt package manager</li>
<li>PostgreSQL installation</li>
<li>PostgreSQL database admin</li>
<li>Python virtual environments</li>
<li>Prod Django settings</li>
<li>Running a gunicorn WSGI server</li>
<li>Firewall configurations</li>
<li>Systemd configuration</li>
<li>Socket file setup</li>
<li>NGINX reverse proxy setup</li>
</ul>
<p>It also requires that you know:</p>
<ul>
<li>How to spin up a new web server</li>
<li>How to log in via SSH</li>
<li>How to set DNS records</li>
<li>How to get your Django code onto the server</li>
</ul>
<p>Some of these tools and skills are necessary, some of them are not. If you don't follow their instructions perfectly then you can get stuck and have no idea how to get unstuck. Then you get frustrated, discouraged and embarrassed that you suck so much at deployment. It's pretty common for new developers to struggle for days, even weeks to get their first web app deployed.</p>
<p>Hitting a wall when trying to deploy your Django app isn't inevitable. I used to work as a ski instructor (software pays better) and I was taught a saying:</p>
<blockquote>
<p>Teach new skills on easy terrain. On hard terrain, stick to the old skills.</p>
</blockquote>
<p>This means that you shouldn't try teaching a fancy new technique on the steepest, hardest runs.
Deploying web applications is <em>hard</em>. It gets easier with time, but it's got a nasty learning curve. It's easier to learn if we minimise the number of new skills and try to keep you in a familiar environment.</p>
<h3>Minimal new tools, small steps</h3>
<p>That's the focus of this guide. I want to help you achieve lots of small, incremental wins where you gain one small skill, then another, until you have all the skills you need to deploy your Django app. I want you to understand what the fuck is going on so you don't get stuck. I want to introduce as few new tools as possible.</p>
<p>Here are the new technologies that I propose we learn to use:</p>
<ul>
<li>A Linux virtual machine in the cloud for hosting (DigitalOcean)</li>
<li>SSH and SCP for accessing the server</li>
<li>git-bash shell scripting</li>
<li>Python virtual environments</li>
<li>Gunicorn WSGI server for running your app</li>
<li>Supervisor for keeping Gunicorn running</li>
<li>Whitenoise Python library to serve static files</li>
<li>Cloudflare SaaS tool for DNS, static file caching, SSL</li>
</ul>
<p>That's still a lot of tools, despite trying to keep it small and simple. Here are some things we will not be using:</p>
<ul>
<li>PostgreSQL database</li>
<li>NGINX reverse proxy</li>
<li>Containers (eg. Docker, Kubernetes)</li>
<li>Config management tools (eg. Ansible, Fabric)</li>
<li>Git version control</li>
</ul>
<p>You should give them a try sometime... just not yet.</p>
<blockquote>
<p>But don't professional web developers use NGINX/Docker/Postgres/etc? That's what people on Reddit say! I don't want to learn bad practices :(</p>
</blockquote>
<p>It's true that these are all great tools. I use them often, but I think they will make learning to deploy Django unnecessarily complicated.
The good news is that you can always add them to your infrastructure later on.
Once you've got this simple deployment down then you can mix it up: you can add NGINX, Postgres and Docker if you like.</p>
<h3>The guide</h3>
<p>I am going to assume that you are using Windows for the guide, partly because it's what most new developers use, and partly because it's the worst-case scenario.
That's right: doing this stuff on Windows is hard-mode.
If you have a Mac or Linux desktop, then you can still follow along - there will just be slightly fewer things for you to do.</p>
<p>Also, just so you know, this guide will involve buying a domain name ($2 - $10 USD / year) and using a paid cloud service (5 bucks / month).
If you're not willing (or unable) to get your credit card out and pay for some stuff, then you will not be able to complete every step.</p>
<p>I said this was a "simple" guide, but I didn't say it's short: it's surprisingly long in fact. This guide has six steps, which I suggest you do in order:</p>
<ol>
<li><a href="https://mattsegal.dev/simple-django-deployment-1.html">Server setup</a></li>
<li><a href="https://mattsegal.dev/simple-django-deployment-2.html">Prepare and test Django locally</a></li>
<li><a href="https://mattsegal.dev/simple-django-deployment-3.html">Deploy Django to the server</a></li>
<li><a href="https://mattsegal.dev/simple-django-deployment-4.html">Run Django in the background</a></li>
<li><a href="https://mattsegal.dev/simple-django-deployment-5.html">Automate the re-deployment</a></li>
<li><a href="https://mattsegal.dev/simple-django-deployment-6.html">Domain setup</a></li>
</ol>Never think about Python formatting again2020-04-24T12:00:00+10:002020-04-24T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-04-24:/python-formatting-with-black.html<p>At some point you realise that formatting your Python code is important.
You want your code to be readable, but what's the <em>right</em> way to format it?
You recognise that it's much harder to read this:</p>
<div class="highlight"><pre><span></span><code><span class="n">some_things</span> <span class="o">=</span> <span class="p">{</span><span class="s2">"carrots"</span><span class="p">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span> <span class="p">],</span>
<span class="s2">"apples"</span><span class="p">:[</span>
<span class="mi">3</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span> <span class="mi">3</span>
<span class="p">],</span> <span class="s2">"pears"</span><span class="p">:</span> <span class="p">[]</span> <span class="p">}</span>
</code></pre></div>
<p>than it is to read this:</p>
<div class="highlight"><pre><span></span><code><span class="n">some_things</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s2">"carrots"</span><span class="p">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span>
    <span class="s2">"apples"</span><span class="p">:</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span>
    <span class="s2">"pears"</span><span class="p">:</span> <span class="p">[],</span>
<span class="p">}</span>
</code></pre></div>
<p>or... wait should it be like this instead? Hmm...</p>
<div class="highlight"><pre><span></span><code><span class="n">some_things</span> <span class="o">=</span> <span class="p">{</span>
<span class="s2">"carrots"</span><span class="p">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span>
<span class="s2">"apples"</span><span class="p">:</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span>
<span class="s2">"pears"</span><span class="p">:</span> <span class="p">[],</span>
<span class="p">}</span>
</code></pre></div>
<p>nah, nah, wait a sec, maybe it would be better if we kept it on one line to save space...</p>
<div class="highlight"><pre><span></span><code><span class="n">some_things</span> <span class="o">=</span> <span class="p">{</span><span class="s2">"carrots"</span><span class="p">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span> <span class="s2">"apples"</span><span class="p">:</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="s2">"pears"</span><span class="p">:</span> <span class="p">[]}</span>
</code></pre></div>
<p>Umm, is that line too long though? We could do this for hours.</p>
<p>Formatting your code <em>is</em> important, but it's easy to get lost in the details.
You want your code to look professional, but it can be a time-sink. It's easy to:</p>
<ul>
<li>spend time experimenting with different formatting styles</li>
<li>spend ages twiddling with linter (eg. PyLint) rules, and then spend cumulative hours tweaking your code to make the linter stop yelling at you</li>
<li>fight a co-worker to the death on top of a castle tower in a thunderstorm over the proper way to lay out brackets</li>
</ul>
<p>This is all just incidental bullshit though. It's a distraction from your real work: laying out brackets one way or another isn't going to make your software run any better (but if the closing bracket isn't on its own new line then I'll gut you like the dog you are!).</p>
<p>Is there a way to avoid this mess? How can we get rid of all this incidental work?</p>
<h3>Give black a try</h3>
<p><a href="https://github.com/psf/black/">Black</a> is a tool that auto-formats your Python code. You run black over all your .py files and the correct formatting is applied for you. It's like <a href="https://prettier.io/">prettier</a>, but for Python instead of JavaScript.</p>
<p>Importantly, Black has minimal configuration. You basically only get to choose the maximum line length that you want, and everything else is decided by the formatter. It's the "uncompromising Python code formatter". This means you don't get to choose what formatting style you use, but it also means you don't need to decide either: once you've adopted Black, you <em>never need to think about Python formatting again</em>. No more config files, no more arguing with your coworkers. Spend your time on more valuable things, like what your code is doing.</p>
<p>Is it safe to just run your whole codebase through this tool? I think so. Black compares the Python <a href="https://en.wikipedia.org/wiki/Abstract_syntax_tree">abstract syntax tree</a> of the code before and after the changes, just to make sure it didn't change or break anything. In the last few jobs I've worked, I've walked in, made the case for Black (politely), and run it over the whole codebase. It's never caused any issues.</p>
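<p>To get a feel for why that check is reassuring, here's a rough sketch of the idea using only Python's built-in <code>ast</code> module. This is not Black's actual implementation, just an illustration: the helper name <code>same_ast</code> is made up for this example.</p>

```python
import ast


def same_ast(before: str, after: str) -> bool:
    """Return True if two pieces of source code parse to the same syntax tree."""
    # ast.dump() leaves out line and column numbers by default,
    # so whitespace and line breaks don't affect the comparison.
    return ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))


messy = '''some_things = {"carrots": [1,2 ],
"apples":[
3,3, 3
], "pears": [] }
'''

tidy = '''some_things = {
    "carrots": [1, 2],
    "apples": [3, 3, 3],
    "pears": [],
}
'''

# Same layout as `tidy`, but with a real change to the data.
changed = tidy.replace("[1, 2]", "[1, 2, 3]")

print(same_ast(messy, tidy))     # formatting only: the trees match
print(same_ast(messy, changed))  # real change: the trees differ
```

<p>If a formatter only ever moves whitespace, brackets and trailing commas around, the two trees stay identical, which is the property you care about.</p>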
<p>Here are some of the other benefits of Black:</p>
<ul>
<li><strong>Less work when coding</strong>: all the time you spend manually formatting your code can now be spent elsewhere.</li>
<li><strong>More productive pull requests</strong>: the person reviewing your code can't <a href="https://en.wiktionary.org/wiki/bikeshedding">bikeshed</a> your formatting, because it's out of your hands - instead they'll need to actually look at what your code is doing.</li>
<li><strong>Smaller diffs</strong>: there will be no formatting changes in your diffs, so the only changes left are meaningful ones. In addition, the Black formatting style is optimised around minimising diffs.</li>
<li><strong>Keep the linter off your back</strong>: if you are also using a linter like flake8, then Black will help you avoid basic <a href="https://www.python.org/dev/peps/pep-0008/">PEP 8</a> errors.</li>
<li><strong>Auto format on save in your IDE</strong>: This one is huuuuge. You can set up Black to reformat your code <em>as you write it</em>. I've found this helps me write code much faster.</li>
</ul>
<h3>Running Black</h3>
<p>You have to install it.</p>
<div class="highlight"><pre><span></span><code>pip install black
</code></pre></div>
<p>Then you run it with a path as an argument</p>
<div class="highlight"><pre><span></span><code>black .
</code></pre></div>
<p>Then it mangles all of your code!</p>
<div class="highlight"><pre><span></span><code>reformatted /home/matt/code/redbubble/colors.py
reformatted /home/matt/code/redbubble/fuzzer.py
reformatted /home/matt/code/redbubble/image.py
reformatted /home/matt/code/redbubble/sierpinski.py
All done! ✨ 🍰 ✨
4 files reformatted, 2 files left unchanged.
</code></pre></div>
<p>You can mess around a little bit with the line length config, or using pyproject.toml, but that's basically it.</p>
<p>If you're running CI and you want to check for correct formatting, you can use</p>
<div class="highlight"><pre><span></span><code>black --check .
</code></pre></div>
<p>It returns exit code 0 if the formatting is correct, and exit code 1 if it's not.</p>
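<p>If your CI steps are written in Python, you could wrap that check in a small script. This is only a sketch: the function names are made up, and it assumes Black is installed in the same Python environment that runs the script.</p>

```python
import subprocess
import sys


def describe_exit_code(code: int) -> str:
    """Translate the exit code of `black --check` into a readable message."""
    if code == 0:
        return "formatting is correct"
    if code == 1:
        return "some files would be reformatted"
    # Black's docs reserve 123 for internal errors; treat anything else the same way.
    return f"black failed to run properly (exit code {code})"


def check_formatting(path: str = ".") -> int:
    """Run `black --check` on a path and return its exit code."""
    # Assumes Black is installed for the interpreter running this script.
    result = subprocess.run([sys.executable, "-m", "black", "--check", path])
    print(describe_exit_code(result.returncode))
    return result.returncode
```

<p>A CI script could then finish with <code>sys.exit(check_formatting("."))</code>, so the build fails whenever the formatting check does.</p>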
<h3>Format on save</h3>
<p>Format on save is incredible, it's been a big productivity boost for me. In VSCode you can add the following settings to format on save with black:</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nt">"python.formatting.provider"</span><span class="p">:</span><span class="w"> </span><span class="s2">"black"</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="nt">"editor.formatOnSave"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>I don't know about other editors, but I've set this up in PyCharm as well. Once that's done then any save will format the document. Here's an example:</p>
<div class="loom-embed"><iframe src="https://www.loom.com/embed/a5914312a4ff44d188f019bb63e19bf7" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>
<h3>Limitations</h3>
<p>Black is just a formatter, not a linter, so it does not do some linting functions. It will not complain about unused variables, imports and other linty stuff.</p>
<p>It will also not do import sorting like <a href="https://github.com/timothycrosley/isort">isort</a>. In fact, Black and isort can fight over how imports should be formatted, if you're running both of them. You can resolve it by running isort then black, or vice versa, but it can make CI tests a little awkward.</p>
<p>Finally, it's "in beta", which as far as I can tell just means "you should expect some formatting to change in the future".</p>
<h3>Summary</h3>
<p>Black is awesome, it'll save you time and brain cycles, go forth and use it on all your Python code.</p>Cloudflare makes DNS slightly less painful2020-04-18T12:00:00+10:002020-04-18T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-04-18:/cloudflare-review.html<p>When you're setting up a new website, there's a bunch of little tasks that you have to do that <em>suck</em>.
They're important, but they don't give you the joy of creating something new, they're just... plumbing.</p>
<p>In particular I'm thinking of:</p>
<ul>
<li>setting up your domain name with DNS records</li>
<li>encrypting your traffic with SSL</li>
<li>compressing and caching your static assets (CSS, JS) using a CDN</li>
</ul>
<p>No one decided to learn web development because they were super stoked on DNS.
The good news is that you can use <a href="https://www.cloudflare.com/">Cloudflare</a> (for free)
to make all these plumbing tasks a little less painful.</p>
<p>In the rest of this post I'll go over the pros and cons of using Cloudflare,
plus a short video guide on how to start using it.</p>
<h3>What is Cloudflare</h3>
<p>Cloudflare is a <a href="https://en.wikipedia.org/wiki/Reverse_proxy">reverse proxy</a> service that you put in-between your website visitors and your website's server. All requests that hit your website are routed through Cloudflare's servers first. This means that they can provide:</p>
<ul>
<li><strong>DNS record configuration</strong>: allowing you to set up A records, CNAMEs etc for your domain.</li>
<li><strong>HTTP traffic encryption using SSL</strong>: All HTTP traffic between the end-user and Cloudflare's servers is encrypted with SSL (making it HTTPS)</li>
<li><strong>Caching of static assets</strong>: Cloudflare will cache static assets like CSS and JS <a href="https://support.cloudflare.com/hc/en-us/articles/200172516-Understanding-Cloudflare-s-CDN">depending on the "Cache-Control" headers</a> set by your origin server.</li>
<li><strong>Compression of static assets</strong>: Cloudflare will compress <a href="https://support.cloudflare.com/hc/en-us/articles/200168396-What-will-Cloudflare-compress-">static assets</a> like CSS and JS so that your pages load and render faster.</li>
</ul>
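<p>Since the caching depends on what your own server says, here's a minimal sketch of an origin server setting "Cache-Control" headers, written as a bare WSGI app. The file suffixes and the one-day <code>max-age</code> are assumptions picked for illustration, and how Cloudflare treats them will depend on your settings there.</p>

```python
STATIC_SUFFIXES = (".css", ".js")


def app(environ, start_response):
    """Tiny WSGI app that marks CSS and JS responses as cacheable."""
    path = environ.get("PATH_INFO", "/")
    if path.endswith(STATIC_SUFFIXES):
        # Let browsers and shared caches (like a CDN) keep static assets for a day.
        cache_control = "public, max-age=86400"
    else:
        # Ask caches not to store anything else.
        cache_control = "no-store"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Cache-Control", cache_control),
    ]
    start_response("200 OK", headers)
    return [b"hello"]


def headers_for(path):
    """Call the app once and return the response headers as a dict."""
    captured = {}

    def start_response(status, headers):
        captured.update(headers)

    app({"PATH_INFO": path}, start_response)
    return captured


print(headers_for("/static/site.css")["Cache-Control"])
print(headers_for("/")["Cache-Control"])
```

<p>In a real Django project you'd normally let your static file setup handle these headers rather than writing them by hand.</p>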
<p>This is a <em>whooole</em> lot of bullshit that I don't want to set up myself, if I can avoid it, so it's nice when Cloudflare handles it for me.</p>
<h3>Cloudflare pros</h3>
<p>In addition to the features I listed above, there are a few nice things I've found when using Cloudflare:</p>
<ul>
<li><strong>Free</strong>: It has a free plan which is sufficient for all the projects I've worked on so far</li>
<li><strong>Easy to use</strong>: I think it's uncommonly easy to set up and use for tools in its field</li>
<li><strong>CNAME flattening</strong>: They provide a handy DNS feature called "CNAME flattening", which means you can point your root domain name (eg. "mattsegal.dev") to other domain names (eg. an AWS S3 bucket website "mattsegal.dev.s3-blah.aws.com"). As far as I know only Cloudflare provides this feature.</li>
<li><strong>Flexible SSL</strong>: Their "flexible SSL" feature is both a pro and a con. It works like this: traffic between your users and Cloudflare is encrypted, but traffic between Cloudflare and your servers is not encrypted. As long as you trust Cloudflare and the intermediate routers not to snoop on your packets, this is a nice setup. In this case setting up flexible SSL is as simple as toggling a button on the website. You <em>can</em> set up end-to-end encryption but that's a little more work. <a href="https://letsencrypt.org/">Let's Encrypt</a> has made setting up SSL <em>much</em> easier and cheaper for developers, but it's still relatively complex compared to Cloudflare's "flexible" implementation.</li>
<li><strong>Faster DNS updates?</strong>: I might be imagining things, but I find that updates to DNS records in Cloudflare <em>seem</em> to propagate faster than other services.</li>
<li><strong>Analytics</strong>: They provide some basic analytics like unique visitors and download bandwidth, which is nice, I guess</li>
</ul>
<h3>Cloudflare cons</h3>
<p>The main con I see for using Cloudflare is that you're not learning to use open source alternatives like self-hosted NGINX to do the same job.
If you are an NGINX expert already then you're a big boy/girl and you can make your own decisions about what tools to use.
If you're a newer developer and you've never set up a webserver like NGINX or Apache, then you're robbing yourself of useful infrastructure experience if you <em>only</em> ever use Cloudflare for <em>everything</em>.</p>
<p>That said, I think that newer developers should start deploying websites using services like Cloudflare, and then learn how to use tools like NGINX.</p>
<p>Another, more abstract downside, is that some double-digit percentage of the internet's websites use Cloudflare. If you're worried about centralization of control of the internet, then Cloudflare's growing consolidation of internet traffic is a concern. Personally I don't really care about that right now.</p>
<h3>How to get started</h3>
<p>This video shows you how to get set up with Cloudflare.</p>
<div class="loom-embed"><iframe src="https://www.loom.com/embed/fffc03f4a3f24285be017b7759461755" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>
<h3>What now?</h3>
<p>Once you've set up Cloudflare, you'll need to start creating some DNS records. I've written a <a href="https://mattsegal.dev/dns-for-noobs.html">guide on exactly this topic</a> to help you get set up.
I suggest you check it out so you can give your website a domain name.</p>Nand to Tetris is a great course2020-04-17T12:00:00+10:002020-04-17T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-04-17:/nand-to-tetris.html<p>Everyone who learns programming at some point stops and asks - how does this actually work?
You might know how to write and run code, but <em>what's actually happening inside the computer</em>?
It can seem unfathomable.</p>
<p>Some people don't care about what's happening under the hood. Their code works, it gets the job done, why would you bother drilling into the details?
I'm like that sometimes, and you can get a long way coding without knowing the fundamentals of computing,
but there is a certain clarity and confidence that you can only get from knowing that you could <a href="https://www.youtube.com/watch?v=SbO0tqH8f5I">build a computer from scratch</a>... if you had the time.</p>
<h3>You too can build a computer</h3>
<p>I'm glad to say that you <em>may well</em> have the time to build a computer from scratch, since that's what you do in the online course Nand to Tetris (<a href="https://www.nand2tetris.org/">website</a>, <a href="https://www.coursera.org/learn/build-a-computer">Coursera</a>). The course takes you through 12 projects, about 1 week each, where you incrementally build:</p>
<ul>
<li>a CPU</li>
<li>a RAM chip</li>
<li>a full computer</li>
<li>an assembly language</li>
<li>a virtual machine</li>
<li>a high-level language</li>
<li>an operating system</li>
</ul>
<p>All of this is done on your computer using tools provided by the course. Once you've done these projects you will understand the building blocks of a computer, from the RAM and CPU, through assembly, up to the compiler that translates your programming language of choice. It's a powerful course that will unlock a whole new perspective on computer programming for you. I believe that bang-for-buck it's probably the best online course for someone who is a self-taught programmer. It's practical, fun and mostly oriented around building things.</p>
<p>I found that the course took a few weeks to really get into gear. The initial content on boolean logic and arithmetic can be a little dry, but if you can get through that, the course becomes more interesting and rewarding. It's a pretty cool feeling to run a program that is executed by a system that you wrote, from the compiler to the VM to the assembly code to the CPU.</p>
<h3>What you need</h3>
<p>The course description is a little over-optimistic in my opinion:</p>
<blockquote>
<p>This is a self-contained course: all the knowledge necessary to succeed in the course and build the computer system will be given as part of the learning experience.</p>
</blockquote>
<p>It's <em>mostly</em> self-contained, but really you either need to be an intermediate programmer, or very gung-ho. I'm not trying to talk you out of the course if you're new to coding, just know it's going to be challenging and you might get stuck from time-to-time. I believe that anyone can get through it if they are determined.</p>
<p>The whole course consists of projects and there's automated testing of your work. If you're not in the habit of doing it already, I strongly recommend learning how to write unit tests in your programming language of choice. It's 1000x faster to write your own tests and run them on your computer than to upload your code to Coursera and let them verify whether your code works. You will write bugs, and you want to minimise the feedback loop required to find them.</p>
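<p>If you haven't written a unit test before, here's roughly what one looks like in Python, using the built-in <code>unittest</code> module. The gates in the course are built in its own hardware description language, so this Python version (with made-up function names) is only meant to show the habit of checking your work locally.</p>

```python
import unittest


def nand(a: bool, b: bool) -> bool:
    """The one primitive gate: false only when both inputs are true."""
    return not (a and b)


def not_(a: bool) -> bool:
    return nand(a, a)


def and_(a: bool, b: bool) -> bool:
    return not_(nand(a, b))


def or_(a: bool, b: bool) -> bool:
    return nand(not_(a), not_(b))


def xor(a: bool, b: bool) -> bool:
    return and_(or_(a, b), nand(a, b))


class GateTests(unittest.TestCase):
    def test_xor_truth_table(self):
        # Check every input combination against Python's own "not equal".
        for a in (False, True):
            for b in (False, True):
                self.assertEqual(xor(a, b), a != b)

    def test_or_truth_table(self):
        for a in (False, True):
            for b in (False, True):
                self.assertEqual(or_(a, b), a or b)
```

<p>Save this in a file and run <code>python -m unittest</code> next to it: you find out in about a second whether your gates behave, with no upload required.</p>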
<p>If you have <em>lots</em> of free time and you want to line up a 1-2 punch of theory and practice, you could also watch Harry Porter's <a href="https://www.youtube.com/playlist?list=PLbtzT1TYeoMjNOGEiaRmm_vMIwUAidnQz">Theory of Computation</a> videos, which teach you the mathsy theoretical underpinnings of computer science.</p>
<h3>Do it! (if you can)</h3>
<p>Not everyone has the spare time to commit to 12 weeks of programming projects, but if you do, I encourage you to give this course a try.
Knowing how a computer works from chip-to-compiler is a nugget of knowledge that will be useful for your whole life.
I can't say the same for learning the latest JavaScript toolchain.
<a href="https://www.coursera.org/learn/build-a-computer">Give it a try</a>!</p>DNS for beginners: how to give your site a domain name2020-04-13T12:00:00+10:002020-04-13T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-04-13:/dns-for-noobs.html<p>You are learning how to build a website and you want to give it a domain name like mycoolwebsite.com.
It doesn't seem like a <em>real</em> website without a domain name, does it?
How is anybody going to find your website without one?
Setting up your domain is an important step for launching your website, but it's also a real pain if you're new to web development.
I want to help make this job a little easier for you.</p>
<p>Typically you go to <a href="https://www.namecheap.com/">namecheap</a> or GoDaddy or some other domain name vendor and you buy mycoolwebsite.com for 12 bucks a year - now you need to set it up.
When you try to get started you are confronted by all these bizarre terms: "A record", "CNAME", "nameserver". It can be quite intimidating.
The rest of this blog will show you the basics of how to set up your domain, with a few explanations sprinkled throughout.</p>
<p>Contents:</p>
<ul>
<li>What the fuck is DNS?</li>
<li>I want my domain name to go to an IP address</li>
<li>I want my domain name to go to a different domain name</li>
<li>I want to give control of my domain name to another service</li>
</ul>
<h3>What the fuck is DNS?</h3>
<p>I'll keep this short. I think <a href="https://www.cloudflare.com/learning/dns/what-is-dns/">CloudFlare explains it best</a>:</p>
<blockquote>
<p>The Domain Name System (DNS) is the phonebook of the Internet. Humans access information online through domain names, like nytimes.com or espn.com. Web browsers interact through Internet Protocol (IP) addresses. DNS translates domain names to IP addresses so browsers can load Internet resources.</p>
</blockquote>
<p>DNS is a worldwide, online "phonebook" that translates human-friendly website names like "mattsegal.dev" into computer-friendly numbers like 192.168.1.1. You use the domain name system every day:</p>
<ul>
<li>You type "mattsegal.dev" into your web browser and press "Enter"</li>
<li>Your computer will reach out into the domain name system and ask other computers to find out which IP address "mattsegal.dev" points to</li>
<li>Your computer eventually finds the correct IP address</li>
<li>Your web browser fetches a web page from that IP address</li>
</ul>
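<p>You can watch the lookup step happen from Python. This small sketch asks your operating system's resolver for the IP addresses behind a name using the standard <code>socket</code> module; the function name <code>lookup_ips</code> is just something I've made up here.</p>

```python
import socket


def lookup_ips(hostname: str) -> list:
    """Ask the operating system's resolver for the IP addresses of a hostname."""
    results = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each result is (family, type, proto, canonname, sockaddr);
    # the IP address is the first item of sockaddr.
    return sorted({info[4][0] for info in results})
```

<p>Calling <code>lookup_ips("mattsegal.dev")</code> on a machine with internet access should hand back one or more IP addresses, which is the same answer your browser gets before it fetches the page.</p>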
<p>So, how do we get our website into this "phonebook"?</p>
<h3>I want my domain name to go to an IP address</h3>
<p>Sometimes you have an IP address like 11.22.33.44 and you want your domain name to send users to that IP. You want a mapping like this:</p>
<div class="highlight"><pre><span></span><code>mycoolwebsite.com --> 11.22.33.44
</code></pre></div>
<p>You will need this when you are running software like WordPress, or your own custom web app. Your website is running on a server and that server has an IP address.
For example, I have a website <a href="https://mattslinks.xyz">mattslinks.xyz</a> which runs on a webserver which has a public IP of 167.99.78.141.
My users (me, my girlfriend) don't want to type in 167.99.78.141 into our browsers to visit my site. We'd prefer to type in mattslinks.xyz, which is way easier to remember. So I need to set up a mapping using DNS:</p>
<div class="highlight"><pre><span></span><code>mattslinks.xyz --> 167.99.78.141
</code></pre></div>
<p>So how do we set this up? We need an <strong>A record</strong> ("address record") to do this. An A record maps a domain name to an IP address.
To set up an A record you need to go onto your domain name provider's website and enter the <strong>subdomain</strong> name you want plus the IP address that you want to point to.</p>
<p><img alt="Photo" src="https://mattsegal.dev/a-record.png"></p>
<p>What I've set up here is:</p>
<div class="highlight"><pre><span></span><code>mattslinks.xyz --> 167.99.78.141
www.mattslinks.xyz --> 167.99.78.141
</code></pre></div>
<p>At this point you may yell <em>"What the fuck is a subdomain!?"</em> at your monitor. Please do, it's cathartic. The idea is that when you own mattslinks.xyz, you also own a near-infinite number of "child domains" which end in mattslinks.xyz. For example you can set up A records (and other DNS records) for all these domain names:</p>
<ul>
<li>mattslinks.xyz ("root domain", sometimes written as "@")</li>
<li>www.mattslinks.xyz (a subdomain)</li>
<li>blog.mattslinks.xyz (a different subdomain)</li>
<li>cult.mattslinks.xyz</li>
<li>super.secret.clubhouse.mattslinks.xyz</li>
</ul>
<p>Apparently you can keep doing this for names up to 255 characters long (including the dots), so this.is.a.very.long.domain.name.but.i.advise.against.doing.this.mattslinks.xyz is <em>technically</em> possible, but a stupid idea.</p>
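<p>If you like seeing rules as code, here's a rough sketch of the "child domain" idea and the length limits. The function names are made up, and the limits in the code are the commonly quoted ones: 63 characters per label (the bits between the dots) and 253 characters for the whole name as you'd type it, which is what the 255 figure works out to once the encoding overhead is taken off.</p>

```python
def is_subdomain_of(name: str, root: str) -> bool:
    """Return True if `name` is `root` itself or sits somewhere beneath it."""
    name = name.lower().rstrip(".")
    root = root.lower().rstrip(".")
    return name == root or name.endswith("." + root)


def fits_dns_limits(name: str) -> bool:
    """Check the commonly quoted length limits for a domain name."""
    name = name.rstrip(".")
    labels = name.split(".")
    # Assumed limits: 63 characters per label, 253 for the whole name in text form.
    return len(name) <= 253 and all(0 < len(label) <= 63 for label in labels)


print(is_subdomain_of("blog.mattslinks.xyz", "mattslinks.xyz"))
print(is_subdomain_of("notmattslinks.xyz", "mattslinks.xyz"))
print(fits_dns_limits("super.secret.clubhouse.mattslinks.xyz"))
```

<p>Note the check for the leading dot: "notmattslinks.xyz" ends with the same letters, but it's a completely different domain that somebody else could own.</p>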
<p>If you're serving a normal website, then it's pretty standard to add A records for both your root domain (mattslinks.xyz) and the "www" subdomain (www.mattslinks.xyz), because some people might put "www" in front of the domain name and we don't want them to miss our website.</p>
<p>Just in case this all seems a little too abstract and theoretical for you, here's a video of me setting some A records:</p>
<div class="loom-embed"><iframe src="https://www.loom.com/embed/2398e6757135445989f83757befd6c11" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>
<p>And then, 30 minutes later, checking if I've gone mad or not...</p>
<div class="loom-embed"><iframe src="https://www.loom.com/embed/c591b12ac5ae400b82e497011a96d901" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>
<p>Finally, the record updates and I add a www subdomain</p>
<div class="loom-embed"><iframe src="https://www.loom.com/embed/4a2fed1898b0491fabab1ef8f063b987" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>
<p>You might also be wondering about the <strong>TTL</strong> value. It's not that important, just set it to 3600. If you care to know, TTL stands for "time to live" and it represents how long other computers are allowed to cache your DNS record before they check again for the records you set. So if it's 3600 (seconds), it means it can take up to an hour for changes that you make to your DNS records to show up on other people's computers.</p>
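<p>Here's a toy model of that caching behaviour, in case it helps. It is not how real DNS resolvers are implemented: the <code>TtlCache</code> class and its <code>fetch</code> callback are invented for this example, and the IP addresses are the ones used earlier in this post.</p>

```python
import time


class TtlCache:
    """Toy cache that remembers DNS answers until their TTL runs out."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (ip, expires_at)

    def get(self, name, fetch, ttl=3600):
        """Return a cached IP, calling `fetch(name)` only when the entry has expired."""
        entry = self._entries.get(name)
        now = self._clock()
        if entry is not None and now < entry[1]:
            return entry[0]
        ip = fetch(name)
        self._entries[name] = (ip, now + ttl)
        return ip


# Pretend "DNS records" and a fake clock, so the example runs instantly.
records = {"mattslinks.xyz": "11.22.33.44"}
fake_now = [0]
cache = TtlCache(clock=lambda: fake_now[0])

first = cache.get("mattslinks.xyz", records.get)

records["mattslinks.xyz"] = "167.99.78.141"  # you update your A record
fake_now[0] = 1800  # half an hour later
still_old = cache.get("mattslinks.xyz", records.get)

fake_now[0] = 3600  # the TTL has run out
updated = cache.get("mattslinks.xyz", records.get)

print(first, still_old, updated)
```

<p>The middle lookup still returns the old address, because the cached answer hasn't expired yet. That's the waiting period you see after changing a record.</p>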
<p>So you have an A record set up, how do you check that it's working? The easiest way is to wait an hour or so and then use a 3rd party website like <a href="https://dnschecker.org/#A/mattslinks.xyz">DNS checker</a>. If you're a little more technical and have a bash shell handy you can also try using <a href="https://www.linux.com/training-tutorials/check-your-dns-records-dig/">dig</a> from your local machine.</p>
<h3>I want my domain name to go to a different domain name</h3>
<p>Sometimes your DNS needs are a little more complicated than just mapping a domain name to an IP address. Sometimes you want to do this instead:</p>
<div class="highlight"><pre><span></span><code>prettyname.com --> ugly-name-for-pretty-site.ap-southeast2.amazon.aws.com
</code></pre></div>
<p>That is to say, you want users to type in www.prettyname.com and see the website which is hosted on ugly-name-for-pretty-site.ap-southeast2.amazon.aws.com, without ever knowing about the hideous name that lies beneath.</p>
<p>For this problem you need a <strong>CNAME record</strong> ("canonical name"). A CNAME record is used to map from one domain name to another.</p>
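<p>To make the difference between the two record types concrete, here's a toy resolver that follows a CNAME until it lands on an A record. Everything in it is made up for illustration (the record table, the <code>example.net</code> name, the <code>resolve</code> function), and real DNS resolution involves a lot more machinery.</p>

```python
# Toy "zone": each name maps to a (record type, value) pair.
RECORDS = {
    "www.prettyname.com": ("CNAME", "ugly-name-for-pretty-site.example.net"),
    "ugly-name-for-pretty-site.example.net": ("A", "11.22.33.44"),
}


def resolve(name, records=RECORDS, max_hops=8):
    """Follow CNAME records until an A record (an IP address) is reached."""
    for _ in range(max_hops):
        record_type, value = records[name]
        if record_type == "A":
            return value
        name = value  # CNAME: start again with the name it points to
    raise RuntimeError("too many CNAME hops")


print(resolve("www.prettyname.com"))
```

<p>The user only ever types the pretty name. The hop through the ugly name happens behind the scenes.</p>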
<p>Here's an example of me setting up a CNAME record in CloudFlare:</p>
<div class="loom-embed"><iframe src="https://www.loom.com/embed/1445cce96ac3449183acf40719c02b4d" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>
<h3>I want to give control of my domain name to another service</h3>
<p>Sometimes you want to give control of a domain to another service. This can happen when you're using a service like Squarespace or Webflow and you want them to set up all your DNS records for you, or if you want to use a different service (like CloudFlare) to manage your DNS.</p>
<p>The way to set this up is to set the <strong>name servers</strong> of your domain. Changing the name servers, as far as I can tell, gives the target servers full control of your domain. In this video, I'll show you some examples.</p>
<div class="loom-embed"><iframe src="https://www.loom.com/embed/269e0eba94dc40d3880ef04aa261f41f" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>
<h3>Conclusion</h3>
<p>So there you go, some basic DNS how-tos. With A records, CNAMEs and name servers under your belt, you should be able to do ~70% of DNS tasks that you need in web development. Get a handle on TXT and MX records, and you're up to ~95%. DNS is horrible to work with, but it doesn't need to be confusing.</p>
<p>This certainly isn't the definitive guide on DNS, and I expect I made some technical errors in my explanations, but I hope you now have the tools to go out and set up some websites.</p>4 tips for debugging in Django2020-04-12T12:00:00+10:002020-04-12T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-04-12:/django-debug-tips.html<p>You've got a bug in your Django code and you can't quite figure out what's wrong. You know there's a problem, but you can't quite pin down where it's coming from. This post will share 4 tips which will help you speed up your bug catching.</p>
<h3>Dig deeper in your print statements</h3>
<p>Using <code>print</code> to view data is the most basic debugging method. You're probably already doing this, so I'm going to show you how to squeeze more info out of your objects when you print them.</p>
<p>Basic print usage for debugging looks like this:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># views.py</span>
<span class="k">def</span> <span class="nf">my_view</span><span class="p">(</span><span class="n">request</span><span class="p">):</span>
    <span class="n">thing</span> <span class="o">=</span> <span class="n">Things</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">last</span><span class="p">()</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">"Check thing:"</span><span class="p">,</span> <span class="n">thing</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">HttpResponse</span><span class="p">(</span><span class="sa">f</span><span class="s2">"The thing is called </span><span class="si">{</span><span class="n">thing</span><span class="o">.</span><span class="n">name</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
</code></pre></div>
<p>The problem is that when you're looking at Python objects, you might only see a string representing the object, rather than the data you want. For example the above code will print this:</p>
<div class="highlight"><pre><span></span><code>Check thing: <Thing: 1>
</code></pre></div>
<p>This is not helpful for our debugging, but there's a better way. We can use <code>pprint</code>, which "pretty prints" dictionaries, and the <code>__dict__</code> attribute, which is present on most Python objects (including Django model instances), to dig into the data in more detail:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># views.py</span>
<span class="k">def</span> <span class="nf">my_view</span><span class="p">(</span><span class="n">request</span><span class="p">):</span>
    <span class="n">thing</span> <span class="o">=</span> <span class="n">Thing</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">last</span><span class="p">()</span>
    <span class="kn">from</span> <span class="nn">pprint</span> <span class="kn">import</span> <span class="n">pprint</span>
    <span class="n">pprint</span><span class="p">(</span><span class="n">thing</span><span class="o">.</span><span class="vm">__dict__</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">HttpResponse</span><span class="p">(</span><span class="sa">f</span><span class="s2">"The thing is called </span><span class="si">{</span><span class="n">thing</span><span class="o">.</span><span class="n">name</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
</code></pre></div>
<p>With this method we will see a nicely formatted dict, showing all the data attached to the <code>thing</code> object:</p>
<div class="highlight"><pre><span></span><code>{
"_state": <django.db.models. ...>,
"id": 1,
"name": "the thing",
"weight": 12,
}
</code></pre></div>
<p>Now you can dig deeper into your objects when printing.</p>
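<p>If you want to try this outside of Django, here is a minimal sketch. The <code>Thing</code> class below is a plain-Python stand-in for a model, and <code>dump</code> is a hypothetical helper, not something Django provides. It uses the built-in <code>vars()</code>, which returns the same dictionary as <code>__dict__</code>, and falls back to <code>repr()</code> for objects that have no <code>__dict__</code>:</p>

```python
from pprint import pprint


class Thing:
    """Plain-Python stand-in for a Django model (hypothetical)."""

    def __init__(self, id, name, weight):
        self.id = id
        self.name = name
        self.weight = weight


def dump(obj):
    """Pretty-print an object's attributes, falling back to repr()."""
    try:
        attrs = vars(obj)  # vars(obj) returns obj.__dict__
    except TypeError:
        # Some objects (e.g. ints) have no __dict__ at all
        print(repr(obj))
        return None
    pprint(attrs)
    return attrs


dump(Thing(1, "the thing", 12))
dump(42)
```

<p>The fallback matters because a bare <code>obj.__dict__</code> raises an <code>AttributeError</code> on objects that don't carry one.</p>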
<p>Leaving a bunch of print statements in your code can pollute your app's console output. You can keep the printing but reduce the noise by <a href="https://mattsegal.dev/file-logging-django.html">setting up logging</a>, which then enables you to toggle how noisy your logs are using <a href="https://docs.python.org/3/howto/logging.html">log levels</a>.</p>
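<p>As a rough sketch of what toggling log levels looks like, here is an example using only Python's standard <code>logging</code> module. The logger name <code>debug_demo</code> and the <code>check_thing</code> function are made up for illustration:</p>

```python
import logging

# Hypothetical logger name, used only for this demo
logger = logging.getLogger("debug_demo")
logger.addHandler(logging.StreamHandler())


def check_thing(thing_name):
    # Replaces print("Check thing:", thing_name)
    logger.debug("Check thing: %s", thing_name)
    logger.info("Handled a thing")
    return thing_name.upper()


# Noisy: debug and info messages are both emitted
logger.setLevel(logging.DEBUG)
check_thing("a thing")

# Quiet: the debug message is dropped, the info message stays
logger.setLevel(logging.INFO)
check_thing("a thing")
```

<p>The debugging messages stay in the code, but you only see them when you turn the level down to <code>DEBUG</code>.</p>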
<h3>Python's built-in debugger</h3>
<p>Finding bugs via print works, but it can be a slow and tedious process. You might have to run the same code dozens of times to find the problem. Wouldn't it be nice to just stop the code on a particular line and then check a bunch of variables? You can do this with Python's built-in debugger. You can get started with it by following <a href="https://mattsegal.dev/django-debug-pdb.html">this guide on using pdb</a>.</p>
<h3>Check your insanity with assertions</h3>
<p>At some point during debugging you start to question your sanity - you don't know what to believe anymore. You start to question everything you've ever known about programming.</p>
<blockquote>
<p>When debugging, you must first accept that something you believe is true is not true. If everything you believed about this system were true, it would work. It doesn't, so you're wrong about something. (<a href="https://twitter.com/cocoaphony/status/1224364439429881856">source</a>)</p>
</blockquote>
<p>Using Python's <code>assert</code> statement is a quick and easy way to check if something that you believe is true is <em>actually true</em>. <code>assert</code> is pretty simple:</p>
<ul>
<li>You whack <code>assert</code> in your code with an expression</li>
<li>If the expression is truthy then nothing happens</li>
<li>If the expression is falsy then <code>assert</code> throws an <code>AssertionError</code></li>
</ul>
<p>Simple, but quite useful. Here are some quick examples:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># All OK, nothing happens</span>
<span class="k">assert</span> <span class="kc">True</span>
<span class="k">assert</span> <span class="mi">1</span> <span class="o">==</span> <span class="mi">1</span>
<span class="k">assert</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]</span>
<span class="n">a</span> <span class="o">=</span> <span class="mi">1</span>
<span class="n">b</span> <span class="o">=</span> <span class="mi">1</span>
<span class="k">assert</span> <span class="n">a</span> <span class="o">==</span> <span class="n">b</span>
<span class="c1"># All throw AssertionError</span>
<span class="k">assert</span> <span class="kc">False</span>
<span class="k">assert</span> <span class="mi">1</span> <span class="o">==</span> <span class="mi">2</span>
<span class="k">assert</span> <span class="p">[]</span>
<span class="n">a</span> <span class="o">=</span> <span class="mi">1</span>
<span class="n">b</span> <span class="o">=</span> <span class="mi">2</span>
<span class="k">assert</span> <span class="n">a</span> <span class="o">==</span> <span class="n">b</span>
<span class="c1"># You can include messages</span>
<span class="k">assert</span> <span class="kc">False</span><span class="p">,</span> <span class="s1">'This is forbidden'</span>
<span class="c1"># Throws AssertionError: This is forbidden</span>
</code></pre></div>
<p>So how do you use this practically? Well, in a Django view, you can check all sorts of things that you believe are true. Check the assertions that you believe <em>maybe</em> aren't true, even though they <em>should</em> be.</p>
<div class="highlight"><pre><span></span><code><span class="c1"># views.py</span>
<span class="k">def</span> <span class="nf">my_view</span><span class="p">(</span><span class="n">request</span><span class="p">):</span>
    <span class="k">assert</span> <span class="n">Thing</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">exists</span><span class="p">(),</span> <span class="s1">'there must be at least 1 thing in db'</span>
    <span class="n">thing</span> <span class="o">=</span> <span class="n">Thing</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">last</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">thing</span><span class="p">,</span> <span class="s1">'thing must exist'</span>
    <span class="k">assert</span> <span class="n">thing</span><span class="o">.</span><span class="n">name</span><span class="p">,</span> <span class="s1">'thing must have name'</span>
    <span class="k">assert</span> <span class="nb">type</span><span class="p">(</span><span class="n">thing</span><span class="o">.</span><span class="n">name</span><span class="p">)</span> <span class="ow">is</span> <span class="nb">str</span><span class="p">,</span> <span class="s1">'thing name must be a str'</span>
    <span class="k">return</span> <span class="n">HttpResponse</span><span class="p">(</span><span class="sa">f</span><span class="s2">"The thing is called </span><span class="si">{</span><span class="n">thing</span><span class="o">.</span><span class="n">name</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
</code></pre></div>
<p>Deciding when to use print vs. assert vs. pdb comes with experience, so I recommend you give them all a try so that you can get a feel for them. These three methods are quick and simple to implement, whereas this final tip is the most useful, but also requires the most labour.</p>
<h3>Reproduce the bug with tests</h3>
<p>Some bugs can be quite tricky to reproduce. To trigger the line of code that causes the bug you might need to create a new user, log in as that user, verify their email, sign in, sign out, sign in again, buy their first product... etc. etc. etc. you get the idea.</p>
<p>Even worse, you might have to do this series of steps dozens of times before you've fixed the bug. To avoid all of this hard work... you're going to have to do a little bit of hard work and write a test.</p>
<p>The bad thing about tests is that they take some time to write. The good thing about tests is that you set up the data required to run the test once, and then you've automated the process forever. Tests become more valuable the more you run them, and you can run them <em>a lot</em>:</p>
<ul>
<li>You can quickly re-run them to reproduce the issue</li>
<li>You can run them to check that the issue is solved</li>
<li>You can run them in the future to make sure that the issue never comes back</li>
</ul>
<p>I'll give you a quick example. Say your issue is that when you call the view <code>my_view</code>, you get an error:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># views.py</span>
<span class="k">def</span> <span class="nf">my_view</span><span class="p">(</span><span class="n">request</span><span class="p">):</span>
    <span class="n">thing</span> <span class="o">=</span> <span class="n">Thing</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">last</span><span class="p">()</span>
    <span class="k">return</span> <span class="n">HttpResponse</span><span class="p">(</span><span class="sa">f</span><span class="s2">"The thing is called </span><span class="si">{</span><span class="n">thing</span><span class="o">.</span><span class="n">name</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
</code></pre></div>
<p>The error is</p>
<div class="highlight"><pre><span></span><code>AttributeError: 'NoneType' object has no attribute 'name'
</code></pre></div>
<p>A quick test to run this view (using <a href="https://docs.pytest.org/en/latest/">pytest</a>) is:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># tests.py</span>
<span class="kn">import</span> <span class="nn">pytest</span>
<span class="kn">from</span> <span class="nn">django.urls</span> <span class="kn">import</span> <span class="n">reverse</span>

<span class="c1"># Assumes Thing lives in this app's models.py</span>
<span class="kn">from</span> <span class="nn">.models</span> <span class="kn">import</span> <span class="n">Thing</span>


<span class="nd">@pytest</span><span class="o">.</span><span class="n">mark</span><span class="o">.</span><span class="n">django_db</span>
<span class="k">def</span> <span class="nf">test_my_view__with_thing</span><span class="p">(</span><span class="n">client</span><span class="p">):</span>
    <span class="sd">"""</span>
<span class="sd">    Check that my_view returns thing name when there is a Thing</span>
<span class="sd">    """</span>
    <span class="n">Thing</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s2">"a thing"</span><span class="p">)</span>
    <span class="n">url</span> <span class="o">=</span> <span class="n">reverse</span><span class="p">(</span><span class="s2">"my-view"</span><span class="p">)</span>
    <span class="n">response</span> <span class="o">=</span> <span class="n">client</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">response</span><span class="o">.</span><span class="n">status_code</span> <span class="o">==</span> <span class="mi">200</span>
    <span class="c1"># response.content is bytes, so decode it before comparing</span>
    <span class="k">assert</span> <span class="n">response</span><span class="o">.</span><span class="n">content</span><span class="o">.</span><span class="n">decode</span><span class="p">()</span> <span class="o">==</span> <span class="s2">"The thing is called a thing"</span>


<span class="nd">@pytest</span><span class="o">.</span><span class="n">mark</span><span class="o">.</span><span class="n">django_db</span>
<span class="k">def</span> <span class="nf">test_my_view__with_no_thing</span><span class="p">(</span><span class="n">client</span><span class="p">):</span>
    <span class="sd">"""</span>
<span class="sd">    Check that my_view returns no thing name when there is no Thing</span>
<span class="sd">    """</span>
    <span class="n">url</span> <span class="o">=</span> <span class="n">reverse</span><span class="p">(</span><span class="s2">"my-view"</span><span class="p">)</span>
    <span class="n">response</span> <span class="o">=</span> <span class="n">client</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">response</span><span class="o">.</span><span class="n">status_code</span> <span class="o">==</span> <span class="mi">200</span>
    <span class="k">assert</span> <span class="n">response</span><span class="o">.</span><span class="n">content</span><span class="o">.</span><span class="n">decode</span><span class="p">()</span> <span class="o">==</span> <span class="s2">"The thing is called "</span>
</code></pre></div>
<p>Note that even just writing these tests will show you where the code is broken, but this is just an example, so let's ignore that.</p>
<p>When you run these tests, you'll notice that:</p>
<ul>
<li><code>test_my_view__with_thing</code> passes</li>
<li><code>test_my_view__with_no_thing</code> fails, with an <code>AttributeError</code></li>
</ul>
<p>Now that we've nailed down the issue with a test, we can fix the bug, update the test and re-run it to make sure the bug is fixed. Now we've automated the process of reproducing the bug and checking that it's fixed.</p>
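<p>For illustration, here is one possible shape for the fix, assuming the behaviour you want is the one the second test expects (an empty name when there is no Thing). The helper <code>thing_message</code> is hypothetical, and the logic is pulled out into a plain function so that it can be run without a database:</p>

```python
from types import SimpleNamespace


def thing_message(thing):
    """Build the response text, tolerating a missing Thing."""
    # Thing.objects.last() returns None when the table is empty
    name = thing.name if thing is not None else ""
    return f"The thing is called {name}"


# SimpleNamespace stands in for a model instance here
print(thing_message(SimpleNamespace(name="a thing")))
print(thing_message(None))
```

<p>In the view itself you would then return <code>HttpResponse(thing_message(thing))</code>. Whether an empty name is really the right response when there is no Thing is a product decision, so treat this as a sketch rather than the one true fix.</p>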
<h3>Conclusion</h3>
<p>So there you go, four tips for debugging Django:</p>
<ul>
<li>better print statements with <code>__dict__</code></li>
<li>Python's pdb debugger</li>
<li>assert statements</li>
<li>reproducing the issue with tests</li>
</ul>
<p>Of all these four, I recommend you invest time into learning how to write tests. Effective testing has huge bang-for-buck, not just for debugging, but also for preventing bugs in the first place.</p>Quickly fix bugs in Django with Python's debugger2020-04-11T12:00:00+10:002020-04-11T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-04-11:/django-debug-pdb.html<p>There's a bug in your Django code. You've tried to track down the problem with "print" statements, but it's such a slow, tedious process:</p>
<ul>
<li>Add a "print" statement to your code</li>
<li>Refresh the page in your browser to re-run your code</li>
<li>Look at the <code>runserver</code> console output for the "print" results</li>
</ul>
<p>Repeat this 100 times, maybe you find the issue. Is there a faster way to find and fix bugs in Django?</p>
<h3>Python's built-in debugger</h3>
<p>Python's standard library comes with a debugging tool and it is easily the most efficient tool for diving into your code and figuring out what's happening. Using the debugger is as simple as taking a Django view like this:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># views.py</span>
<span class="k">def</span> <span class="nf">some_view</span><span class="p">(</span><span class="n">request</span><span class="p">):</span>
    <span class="sd">"""Shows user some stuff"""</span>
    <span class="n">things</span> <span class="o">=</span> <span class="n">Thing</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">all</span><span class="p">()</span>
    <span class="n">stuff</span> <span class="o">=</span> <span class="n">get_stuff</span><span class="p">(</span><span class="n">things</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">HttpResponse</span><span class="p">(</span><span class="sa">f</span><span class="s2">"The stuff is </span><span class="si">{</span><span class="n">stuff</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
</code></pre></div>
<p>... and then whacking a single line of code into the view:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># views.py</span>
<span class="k">def</span> <span class="nf">some_view</span><span class="p">(</span><span class="n">request</span><span class="p">):</span>
    <span class="sd">"""Shows user some stuff"""</span>
    <span class="n">things</span> <span class="o">=</span> <span class="n">Thing</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">all</span><span class="p">()</span>
    <span class="c1"># Start debugging here</span>
    <span class="kn">import</span> <span class="nn">pdb</span><span class="p">;</span> <span class="n">pdb</span><span class="o">.</span><span class="n">set_trace</span><span class="p">()</span>
    <span class="n">stuff</span> <span class="o">=</span> <span class="n">get_stuff</span><span class="p">(</span><span class="n">things</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">HttpResponse</span><span class="p">(</span><span class="sa">f</span><span class="s2">"The stuff is </span><span class="si">{</span><span class="n">stuff</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
</code></pre></div>
<p>That's it, you're now using Python's debugger.</p>
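<p>If you're on Python 3.7 or newer, the built-in <code>breakpoint()</code> function does the same job by default with less typing. Here's a small sketch; <code>get_stuff_total</code> is a made-up function, and it isn't called here because calling it would pause and wait for input at a <code>(Pdb)</code> prompt:</p>

```python
def get_stuff_total(things):
    """Add up some numbers, pausing in the debugger before returning."""
    total = 0
    for thing in things:
        total += thing
    # By default this behaves like: import pdb; pdb.set_trace()
    breakpoint()
    return total
```

<p>Call <code>get_stuff_total([1, 2, 3])</code> from a terminal session to land in the debugger with <code>total</code> and <code>things</code> available to inspect. Setting the environment variable <code>PYTHONBREAKPOINT=0</code> turns every <code>breakpoint()</code> call into a no-op, which is handy if one slips through.</p>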
<h3>Yeah, but, what's it do?</h3>
<p>Here's a short video I made showing you an example of using pdb in a Django view:</p>
<div class="loom-embed"><iframe src="https://www.loom.com/embed/7de384817fbc45f0918995646b199055" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>
<h3>Quick reference</h3>
<p>The <a href="https://docs.python.org/3/library/pdb.html">Python pdb docs</a> tell you all the commands, but for completeness, here are the commands I used:</p>
<ul>
<li><code>__dict__</code> - print Python object attributes as a dictionary</li>
<li><code>type()</code> - print object type</li>
<li><code>l / ll</code> - list the source code around the current line (<code>ll</code> lists the whole current function)</li>
<li><code>n</code> - execute next line</li>
<li><code>s</code> - step inside function</li>
<li><code>c</code> - exit debugger, continue running code</li>
<li><code>q</code> - quit debugger, throw an exception</li>
</ul>
<p>Some extra commands worth trying, which I didn't show you:</p>
<ul>
<li><code>help</code> - print debugger help</li>
<li><code>dir()</code> - list the attributes and methods available on a Python object</li>
<li><code>locals()</code> - print local variables</li>
<li><code>globals()</code> - print global variables</li>
</ul>
<h3>Why the command line?</h3>
<p>You might be wondering why I insist on using pdb from the command line rather than using some fancy IDE like PyCharm or Visual Studio. Basically I think these tools take too long to set up. Using pdb requires no setup time and there's nothing to install. If you use an IDE-based debugger, then anytime you switch editors you'll need to set up your debugging tools again. You don't want to waste time debugging your debugger. No thanks!</p>
<h3>Bonus tip: run debugger on any exception</h3>
<p>You can also set up pdb to start running anytime there is an exception:</p>
<div class="highlight"><pre><span></span><code>python -m pdb -c <span class="k">continue</span> myscript.py
</code></pre></div>
<p>This doesn't work for Django, because of the way <code>runserver</code> handles exceptions, but you can use it for your other Python scripting.</p>
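<p>For those other scripts, you can get a similar effect from inside the code with <code>pdb.post_mortem</code>, which opens the debugger at the point where an exception was raised. This is a minimal sketch: <code>run_with_post_mortem</code> and <code>divide</code> are hypothetical names, and the failing call is commented out because it would stop at an interactive prompt:</p>

```python
import pdb
import sys


def run_with_post_mortem(func, *args, debugger=pdb.post_mortem):
    """Call func(*args); if it raises, open the debugger at the crash site."""
    try:
        return func(*args)
    except Exception:
        tb = sys.exc_info()[2]  # traceback of the exception being handled
        debugger(tb)
        raise


def divide(a, b):
    return a / b


print(run_with_post_mortem(divide, 6, 3))
# run_with_post_mortem(divide, 1, 0)  # would open a (Pdb) prompt inside divide()
```

<p>The <code>debugger</code> argument only exists so that you can swap in something non-interactive, for example when running the function under test.</p>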
<p>If you're testing Django with pytest you can force the testing tool to drop into the pdb debugger when it hits an error:</p>
<div class="highlight"><pre><span></span><code>pytest --pdb
</code></pre></div>
<h3>Next steps</h3>
<p>Go out there and use pdb - it's one line of code! If you really want to step up your debugging, then I recommend learning how to write tests that reproduce your issue, and then use pdb in concert with your tests to find a fix, and make sure it stays fixed.</p>How to view Django logs with Papertrail2020-04-10T12:00:00+10:002020-04-10T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-04-10:/django-logging-papertrail.html<p>You have a Django app running on a webserver and hopefully you're <a href="https://mattsegal.dev/file-logging-django.html">writing your logs to a file</a>. If anything goes wrong you can search back through the logs and figure out what happened.</p>
<p>The problem is that to get to your logs, you have to log into your server, find the right file and search through the text on the command line. It's possible to do but it's kind of a pain. Isn't there an easier way to view your Django app's logs? Wouldn't it be nice to search through them on a website?</p>
<p>This post will show you how to push your Django logs into <a href="https://www.papertrail.com/">Papertrail</a>. Papertrail is a free web-based log aggregator that is reasonably simple to set up. It stores ~6 days of searchable logs. It's best for small, simple projects where you don't want to do anything complicated.</p>
<div class="loom-embed"><iframe src="https://www.loom.com/embed/5ede7f70b62645ca82c1ffbf4c0e64eb" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>
<h3>Create an account</h3>
<p>You can start by going to the <a href="https://www.papertrail.com/">Papertrail website</a> and creating an account. Once that's done, you can visit <a href="https://papertrailapp.com/systems/setup?type=app&platform=unix">this page</a>, where you'll see a message like this:</p>
<blockquote>
<p>Your logs will go to logs2.papertrailapp.com:41234 and appear in Events.</p>
</blockquote>
<p>You need to note down two things from this page:</p>
<ul>
<li>The hostname: logs2.papertrailapp.com</li>
<li>The port: 41234</li>
</ul>
<p>These two pieces of information will determine where Papertrail stores your logs, and they're essentially secrets that should be kept out of public view. Keep the page open, because it'll be useful later.</p>
<h3>Install Papertrail's remote_syslog2</h3>
<p>Papertrail uses some tool they've built called <code>remote_syslog2</code> to ship logs from your server into their storage. Assuming you're running Ubuntu or Debian, you can download the .deb installation file for remote_syslog2 from GitHub. As of the writing of this post, <a href="https://github.com/papertrail/remote_syslog2/releases/download/v0.20/remote-syslog2_0.20_amd64.deb">this is the latest release deb file</a>.</p>
<div class="highlight"><pre><span></span><code><span class="c1"># Download installation file to /tmp/</span>
<span class="nv">DEB_URL</span><span class="o">=</span><span class="s2">"https://github.com/papertrail/...deb"</span>
curl --location --silent <span class="s2">"</span><span class="nv">$DEB_URL</span><span class="s2">"</span> -o /tmp/remote_syslog.deb
<span class="c1"># Install remote_syslog2 from the file</span>
sudo dpkg -i /tmp/remote_syslog.deb
</code></pre></div>
<p>You can read more about remote_syslog <a href="https://help.papertrailapp.com/kb/configuration/configuring-centralized-logging-from-text-log-files-in-unix/">here</a>.</p>
<h3>Create logging config</h3>
<p>You can configure which logs get sent to Papertrail using a config file. This uses the YAML format and should live at <code>/etc/log_files.yml</code>.</p>
<div class="highlight"><pre><span></span><code><span class="c1"># /etc/log_files.yml</span><span class="w"></span>
<span class="nt">files</span><span class="p">:</span><span class="w"></span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">/tmp/papertrail-test.log</span><span class="w"></span>
<span class="nt">destination</span><span class="p">:</span><span class="w"></span>
<span class="w"> </span><span class="nt">host</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">logs2.papertrailapp.com</span><span class="w"></span>
<span class="w"> </span><span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">41234</span><span class="w"></span>
<span class="w"> </span><span class="nt">protocol</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">tls</span><span class="w"></span>
</code></pre></div>
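<p>Since the host and port are effectively secrets, you may not want them hardcoded in a file that ends up in version control. One option is to generate the config at deploy time. The sketch below does that with a small Python script; the environment variable names <code>PAPERTRAIL_HOST</code> and <code>PAPERTRAIL_PORT</code> and the <code>render_config</code> function are my own invention, not something Papertrail defines:</p>

```python
import os

# Mirrors the layout of /etc/log_files.yml shown above
CONFIG_TEMPLATE = """\
files:
{files}
destination:
  host: {host}
  port: {port}
  protocol: tls
"""


def render_config(log_paths, env=os.environ):
    """Render the remote_syslog config from environment variables."""
    host = env["PAPERTRAIL_HOST"]  # hypothetical variable name
    port = int(env["PAPERTRAIL_PORT"])  # hypothetical variable name
    files = "\n".join(f"  - {path}" for path in log_paths)
    return CONFIG_TEMPLATE.format(files=files, host=host, port=port)


# Example values taken from the setup page message quoted earlier
demo_env = {"PAPERTRAIL_HOST": "logs2.papertrailapp.com", "PAPERTRAIL_PORT": "41234"}
print(render_config(["/tmp/papertrail-test.log"], env=demo_env))
```

<p>On a real server you would call <code>render_config</code> without the <code>env</code> argument and write the result to <code>/etc/log_files.yml</code>.</p>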
<h3>Run Papertrail with a test log file</h3>
<p>Start by testing whether remote_syslog is set up correctly by running it in non-daemonized mode:</p>
<div class="highlight"><pre><span></span><code>remote_syslog -D --hostname myapp
</code></pre></div>
<p>Note that "hostname" can be whatever name you want. You should see some console output like this:</p>
<div class="highlight"><pre><span></span><code>... Connecting to logs2.papertrailapp.com:41234 over tls
... Cannot forward /tmp/papertrail-test.log, it may not exist
</code></pre></div>
<p>Make sure you have <a href="https://papertrailapp.com/systems/setup?type=app&platform=unix">this page</a> open in your web browser (or open it now). In another bash terminal, write some text to papertrail-test.log:</p>
<div class="highlight"><pre><span></span><code><span class="nb">echo</span> <span class="s2">"[</span><span class="k">$(</span>date<span class="k">)</span><span class="s2">] Test logline"</span> >> /tmp/papertrail-test.log
</code></pre></div>
<p>Now you should see, in your remote_syslog terminal, a new message:</p>
<div class="highlight"><pre><span></span><code>... Forwarding file: /tmp/papertrail-test.log
</code></pre></div>
<p>When you look at the page you have open, you should see something like:</p>
<blockquote>
<p>Logs received from myapp</p>
</blockquote>
<p>If you head to your <a href="https://papertrailapp.com/dashboard">dashboard</a> you should now see a new system added called "myapp". You should also be able to see your test log messages in the <a href="https://my.papertrailapp.com/systems/myapp/events">search panel for myapp</a>.</p>
<h3>Run Papertrail with real log files</h3>
<p>Now that you're happy that Papertrail is able to upload log messages, you can set it up to ship your log files. In this example, I'm going to upload data from the Django and gunicorn log files I created in <a href="https://mattsegal.dev/file-logging-django.html">this post</a>:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># /etc/log_files.yml</span><span class="w"></span>
<span class="nt">files</span><span class="p">:</span><span class="w"></span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">/var/log/django.log</span><span class="w"></span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">/var/log/gunicorn/access.log</span><span class="w"></span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">/var/log/gunicorn/error.log</span><span class="w"></span>
<span class="nt">destination</span><span class="p">:</span><span class="w"></span>
<span class="w"> </span><span class="nt">host</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">logs2.papertrailapp.com</span><span class="w"></span>
<span class="w"> </span><span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">41234</span><span class="w"></span>
<span class="w"> </span><span class="nt">protocol</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">tls</span><span class="w"></span>
</code></pre></div>
<p>When you are not testing with remote_syslog, you want to run it in daemonized mode:</p>
<div class="highlight"><pre><span></span><code>sudo remote_syslog --hostname myapp
</code></pre></div>
<p>You can check that it's still running by looking up its process:</p>
<div class="highlight"><pre><span></span><code>ps aux <span class="p">|</span> grep remote_syslog
</code></pre></div>
<p>If you need to stop it:</p>
<div class="highlight"><pre><span></span><code>pkill remote_syslog
</code></pre></div>
<p>That's it! Now you have remote_syslog running on your server, shipping log data off to Papertrail.</p>How to save Django logs in production2020-04-10T12:00:00+10:002020-04-10T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-04-10:/file-logging-django.html<p>You've deployed Django to a webserver and something has broken. There's an error <em>somewhere</em>. What happened? When you're debugging Django on your local computer, you can just throw a print statement into your code and check the output in the runserver logs. What about in production? Where do the logs go there? How can I set up Django so it's easy to see what is happening?</p>
<h2>Write your logs to a file</h2>
<p>You need to get your deployed Django app to write its logs to a file, so that you can look at them later. You can do this by configuring Django's settings. You will also need to use Python's logging library, rather than print statements. Why use logging over print? The logging library generally makes it easier to manage logs in production. Specifically, it makes it easier to:</p>
<ul>
<li>write logs to a file</li>
<li>track extra data like the current time and function</li>
<li>filter your logs</li>
</ul>
<p>You might be thinking that using "print" works fine when you're using Django's dev web server. It's true! Using "print" works fine locally, but when you're in production with DEBUG=False, you won't be able to see your print statements anymore in Django's log output. Log messages will still show up when you're working locally so there's nothing to lose by ditching print for logging.</p>
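<p>To make those three benefits concrete, here is a rough standalone sketch using only Python's standard library, outside of Django's settings. The logger name, the log file location and the <code>buy_product</code> function are all made up for this example:</p>

```python
import logging
import os
import tempfile

# Hypothetical log file location for this demo
log_path = os.path.join(tempfile.gettempdir(), "django-demo.log")

logger = logging.getLogger("file_demo")
logger.setLevel(logging.INFO)  # filter: drop anything below INFO

# Write to a file, recording the time and the calling function
handler = logging.FileHandler(log_path, mode="w")
handler.setFormatter(
    logging.Formatter("%(asctime)s [%(levelname)s] (%(funcName)s) %(message)s")
)
logger.addHandler(handler)


def buy_product():
    logger.debug("This debug message is filtered out")
    logger.info("Purchase made")


buy_product()
handler.flush()

with open(log_path) as f:
    print(f.read())
```

<p>The file ends up with a single timestamped line for the info message, tagged with <code>buy_product</code>, while the debug message never reaches it.</p>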
<h2>How to use logging in Django</h2>
<p>Before you set Django up to write logs to a file, you need to use Python's logging framework to write any log messages that you want to record. It's pretty easy, you just need to set it up in each module that needs it. For example, in one of your views:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># views.py</span>
<span class="c1"># Import logging from Python's standard library</span>
<span class="kn">import</span> <span class="nn">logging</span>

<span class="kn">from</span> <span class="nn">django.http</span> <span class="kn">import</span> <span class="n">HttpResponse</span>

<span class="c1"># Create a logger for this file</span>
<span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">getLogger</span><span class="p">(</span><span class="vm">__file__</span><span class="p">)</span>


<span class="k">def</span> <span class="nf">some_view</span><span class="p">(</span><span class="n">request</span><span class="p">):</span>
    <span class="sd">"""</span>
<span class="sd">    Example view showing all the ways you can log messages.</span>
<span class="sd">    """</span>
    <span class="n">logger</span><span class="o">.</span><span class="n">debug</span><span class="p">(</span><span class="s2">"This logs a debug message."</span><span class="p">)</span>
    <span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s2">"This logs an info message."</span><span class="p">)</span>
    <span class="n">logger</span><span class="o">.</span><span class="n">warning</span><span class="p">(</span><span class="s2">"This logs a warning message."</span><span class="p">)</span>
    <span class="n">logger</span><span class="o">.</span><span class="n">error</span><span class="p">(</span><span class="s2">"This logs an error message."</span><span class="p">)</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s2">"This is a handled exception"</span><span class="p">)</span>
    <span class="k">except</span> <span class="ne">Exception</span><span class="p">:</span>
        <span class="n">logger</span><span class="o">.</span><span class="n">exception</span><span class="p">(</span><span class="s2">"This logs an exception."</span><span class="p">)</span>

    <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s2">"This is an unhandled exception"</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">HttpResponse</span><span class="p">(</span><span class="s2">"this worked"</span><span class="p">)</span>
</code></pre></div>
<p>Most of the time I just use:</p>
<ul>
<li>logger.info for events I want to track, like a purchase being made</li>
<li>logger.error for logical errors, like things that should never happen according to business rules</li>
<li>logger.exception for when I catch an exception</li>
</ul>
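<p>As a rough sketch of how those three calls might sit together in one function, here's a made-up purchase flow. The <code>PaymentError</code>, <code>charge_card</code> and <code>make_purchase</code> names are all hypothetical and exist only for illustration:</p>

```python
import logging

logger = logging.getLogger(__name__)


class PaymentError(Exception):
    """Hypothetical error raised when a payment fails."""


def charge_card(amount_cents):
    # Hypothetical payment helper: pretend the gateway rejects big charges.
    if amount_cents > 100_000:
        raise PaymentError("Card declined")


def make_purchase(user_id, amount_cents):
    """Returns True if the purchase went through, False otherwise."""
    if amount_cents <= 0:
        # Business rules say this should never happen.
        logger.error("User %s tried to pay invalid amount %s", user_id, amount_cents)
        return False
    try:
        charge_card(amount_cents)
    except PaymentError:
        # Logs the message at ERROR level, plus the stack trace.
        logger.exception("Payment failed for user %s", user_id)
        return False
    # An event worth tracking.
    logger.info("User %s purchased %s cents of stuff", user_id, amount_cents)
    return True
```

<p>Passing the values as arguments, rather than building the string yourself, means the message is only formatted if the log line actually gets written.</p>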
<p>Once you've configured your logging in your settings (shown further below), you'll see messages like this appear in your log file (thanks to the info, warning and error methods):</p>
<div class="highlight"><pre><span></span><code>2020-04-10 03:35:05 [INFO ] (views.some_view) This logs an info message.
2020-04-10 03:35:05 [WARNING ] (views.some_view) This logs a warning message.
2020-04-10 03:35:05 [ERROR ] (views.some_view) This logs an error message.
</code></pre></div>
<p>And you'll see your message plus a stack trace when you log using the exception method:</p>
<div class="highlight"><pre><span></span><code>2020-04-10 03:35:05 [ERROR ] (views.some_view) This logs an exception.
Traceback (most recent call last):
File ".../myproj/views.py", line 14, in log_view
raise Exception("This is a handled exception")
Exception: This is a handled exception
</code></pre></div>
<p>And you'll still get an error log and stack trace for your unhandled exceptions:</p>
<div class="highlight"><pre><span></span><code>2020-04-10 03:35:05 [ERROR ] (log.log_response) Internal Server Error:
Traceback (most recent call last):
File ".../exception.py", line 34, in inner
response = get_response(request)
File ".../base.py", line 115, in _get_response
response = self.process_exception_by_middleware(e, request)
File ".../base.py", line 113, in _get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File ".../myproj/views.py", line 18, in log_view
raise Exception("This is an unhandled exception")
Exception: This is an unhandled exception
</code></pre></div>
<p>Importantly, you won't see any results from print statements, which is why you can't use them for production logging.</p>
<h2>How to set up file logging</h2>
<p>Now that you're sold on logging and you know how to use it in your code, you can set it up in your Django settings.</p>
<p>I like to do this by splitting my settings module up into two files - one for dev and one for production. Usually your Django project's main app will have your settings set up something like this:</p>
<div class="highlight"><pre><span></span><code>myapp
├── settings.py
├── urls.py
└── wsgi.py
</code></pre></div>
<p>I recommend turning settings into a folder, and moving the original settings.py file into the folder's __init__.py file:</p>
<div class="highlight"><pre><span></span><code>myapp
├── settings
|   ├── __init__.py
|   └── prod.py
├── urls.py
└── wsgi.py
</code></pre></div>
<p>So that __init__.py has all your original settings:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># __init__.py</span>
<span class="c1"># Base settings for myapp</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="n">BASE_DIR</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">dirname</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">dirname</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">abspath</span><span class="p">(</span><span class="vm">__file__</span><span class="p">)))</span>
<span class="n">SECRET_KEY</span> <span class="o">=</span> <span class="s2">"xxx"</span>
<span class="n">DEBUG</span> <span class="o">=</span> <span class="kc">True</span>
<span class="n">ALLOWED_HOSTS</span> <span class="o">=</span> <span class="p">[]</span>
<span class="c1"># ... all the regular Django settings ...</span>
</code></pre></div>
<p>and prod.py has your production-only settings:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># prod.py</span>
<span class="c1"># Production settings for myapp</span>
<span class="kn">from</span> <span class="nn">.</span> <span class="kn">import</span> <span class="o">*</span> <span class="c1"># Import base settings from settings/__init__.py</span>
<span class="n">ALLOWED_HOSTS</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"www.myapp.com"</span><span class="p">]</span>
<span class="n">DEBUG</span> <span class="o">=</span> <span class="kc">False</span>
<span class="c1"># ... whatever else you need ...</span>
</code></pre></div>
<p>In this prod.py settings file, I recommend adding the following logging config:</p>
<div class="highlight"><pre><span></span><code><span class="n">LOGGING</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s2">"version"</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
    <span class="s2">"disable_existing_loggers"</span><span class="p">:</span> <span class="kc">False</span><span class="p">,</span>
    <span class="s2">"root"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"level"</span><span class="p">:</span> <span class="s2">"INFO"</span><span class="p">,</span> <span class="s2">"handlers"</span><span class="p">:</span> <span class="p">[</span><span class="s2">"file"</span><span class="p">]},</span>
    <span class="s2">"handlers"</span><span class="p">:</span> <span class="p">{</span>
        <span class="s2">"file"</span><span class="p">:</span> <span class="p">{</span>
            <span class="s2">"level"</span><span class="p">:</span> <span class="s2">"INFO"</span><span class="p">,</span>
            <span class="s2">"class"</span><span class="p">:</span> <span class="s2">"logging.FileHandler"</span><span class="p">,</span>
            <span class="s2">"filename"</span><span class="p">:</span> <span class="s2">"/var/log/django.log"</span><span class="p">,</span>
            <span class="s2">"formatter"</span><span class="p">:</span> <span class="s2">"app"</span><span class="p">,</span>
        <span class="p">},</span>
    <span class="p">},</span>
    <span class="s2">"loggers"</span><span class="p">:</span> <span class="p">{</span>
        <span class="s2">"django"</span><span class="p">:</span> <span class="p">{</span>
            <span class="s2">"handlers"</span><span class="p">:</span> <span class="p">[</span><span class="s2">"file"</span><span class="p">],</span>
            <span class="s2">"level"</span><span class="p">:</span> <span class="s2">"INFO"</span><span class="p">,</span>
            <span class="c1"># The root logger writes to the same file, so propagating would log every Django message twice</span>
            <span class="s2">"propagate"</span><span class="p">:</span> <span class="kc">False</span><span class="p">,</span>
        <span class="p">},</span>
    <span class="p">},</span>
    <span class="s2">"formatters"</span><span class="p">:</span> <span class="p">{</span>
        <span class="s2">"app"</span><span class="p">:</span> <span class="p">{</span>
            <span class="s2">"format"</span><span class="p">:</span> <span class="p">(</span>
                <span class="sa">u</span><span class="s2">"</span><span class="si">%(asctime)s</span><span class="s2"> [</span><span class="si">%(levelname)-8s</span><span class="s2">] "</span>
                <span class="s2">"(</span><span class="si">%(module)s</span><span class="s2">.</span><span class="si">%(funcName)s</span><span class="s2">) </span><span class="si">%(message)s</span><span class="s2">"</span>
            <span class="p">),</span>
            <span class="s2">"datefmt"</span><span class="p">:</span> <span class="s2">"%Y-%m-</span><span class="si">%d</span><span class="s2"> %H:%M:%S"</span><span class="p">,</span>
        <span class="p">},</span>
    <span class="p">},</span>
<span class="p">}</span>
</code></pre></div>
<p>This is, admittedly, a horrific clusterfuck of configuration. It burns my eyes and I imagine it makes you want to slam your laptop shut and run away screaming. If you want to know how it all works, I recommend watching <a href="https://www.youtube.com/watch?v=DxZ5WEo4hvU">this presentation</a>. If not, feel free to blindly copy now and figure it out later.</p>
<p>The relevant area for you is in <code>LOGGING["handlers"]["file"]</code>. This dict defines the bit that actually writes our logs to the file. The important key is "filename", which defines the filepath where your logs will be written. You might want to change this depending on your preferences.</p>
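<p>By default Django hands the LOGGING dict to Python's <code>logging.config.dictConfig</code> when it starts up, so you can try out a file handler and formatter like these outside of Django. Here's a minimal sketch that does that. The temp file path and the "demo" logger name are assumptions made so that it runs anywhere, without needing write access to /var/log:</p>

```python
import logging
import logging.config
import os
import tempfile

# Write to a temp file for this demo; in prod.py you'd use a fixed path instead.
log_path = os.path.join(tempfile.mkdtemp(), "django.log")

logging.config.dictConfig({
    "version": 1,
    "disable_existing_loggers": False,
    "root": {"level": "INFO", "handlers": ["file"]},
    "handlers": {
        "file": {
            "level": "INFO",
            "class": "logging.FileHandler",
            "filename": log_path,
            "formatter": "app",
        },
    },
    "formatters": {
        "app": {
            "format": "%(asctime)s [%(levelname)-8s] (%(module)s.%(funcName)s) %(message)s",
            "datefmt": "%Y-%m-%d %H:%M:%S",
        },
    },
})


def some_function():
    logger = logging.getLogger("demo")
    logger.debug("Dropped, because DEBUG is below the INFO level.")
    logger.info("Written to the log file.")


some_function()
with open(log_path) as f:
    log_contents = f.read()
print(log_contents)
```

<p>Only the info message ends up in the file, because both the root logger and the handler are set to the INFO level.</p>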
<h2>Use prod settings in production</h2>
<p>The last little trick you need is to tell Django to use your prod settings in production. You can do this a few ways, but I like to do it by setting the DJANGO_SETTINGS_MODULE environment variable.</p>
<p>When I launch gunicorn, I do this:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># Set Django settings to use prod.py</span>
<span class="nb">export</span> <span class="nv">DJANGO_SETTINGS_MODULE</span><span class="o">=</span>myproj.settings.prod
<span class="c1"># Launch gunicorn as-per-normal</span>
gunicorn myproj.wsgi:application
</code></pre></div>
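<p>This works because the wsgi.py file that Django's <code>startproject</code> generates only sets DJANGO_SETTINGS_MODULE if nothing else has set it first, using <code>os.environ.setdefault</code>. Here's a small sketch of that behaviour, assuming your wsgi.py still has the generated line and your project is called "myproj":</p>

```python
import os

# Simulate a dev machine, where nothing has exported the variable.
os.environ.pop("DJANGO_SETTINGS_MODULE", None)
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproj.settings")
dev_settings = os.environ["DJANGO_SETTINGS_MODULE"]

# Simulate production, where `export DJANGO_SETTINGS_MODULE=...` ran first.
os.environ["DJANGO_SETTINGS_MODULE"] = "myproj.settings.prod"
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproj.settings")
prod_settings = os.environ["DJANGO_SETTINGS_MODULE"]

print(dev_settings)
print(prod_settings)
```

<p>So locally you get the base settings without doing anything, and in production the exported value wins.</p>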
<h2>Bonus round: gunicorn logs</h2>
<p>If you're using gunicorn as your WSGI app server in production, you might also want to track your gunicorn logs. This will give you information about incoming web requests, and the app starting and stopping, which can be useful when debugging. To do this, you just need to set some command-line flags:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># Set Django settings to use prod.py</span>
<span class="nb">export</span> <span class="nv">DJANGO_SETTINGS_MODULE</span><span class="o">=</span>myproj.settings.prod
<span class="c1"># Create logging folder for gunicorn.</span>
mkdir -p /var/log/gunicorn
<span class="c1"># Launch gunicorn with access and error logging.</span>
gunicorn myproj.wsgi:application <span class="se">\</span>
--error-logfile /var/log/gunicorn/error.log <span class="se">\</span>
--access-logfile /var/log/gunicorn/access.log
</code></pre></div>
<p>Gunicorn's access logs look something like this, telling you about incoming web requests:</p>
<div class="highlight"><pre><span></span><code>127.0.0.1 - - [10/Apr/2020:02:46:09 +0000] "GET /logs/ HTTP/1.1" 400 143 ...
127.0.0.1 - - [10/Apr/2020:02:46:43 +0000] "GET /logs/ HTTP/1.1" 500 145 ...
</code></pre></div>
<p>And the error logs are mostly information about the app booting up and stopping:</p>
<div class="highlight"><pre><span></span><code>[2020-04-10 12:45:57 +1000] [14814] [INFO] Starting gunicorn 20.0.4
[2020-04-10 12:45:57 +1000] [14814] [INFO] Listening at: http://127.0.0.1:8000
[2020-04-10 12:45:57 +1000] [14814] [INFO] Using worker: sync
[2020-04-10 12:45:57 +1000] [14817] [INFO] Booting worker with pid: 14817
[2020-04-10 12:46:38 +1000] [14814] [INFO] Handling signal: int
[2020-04-10 02:46:38 +0000] [14817] [INFO] Worker exiting (pid: 14817)
[2020-04-10 12:46:38 +1000] [14814] [INFO] Shutting down: Master
</code></pre></div>
<p>Both of these can be pretty useful when debugging issues in production.</p>
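<p>If the command-line flags get unwieldy, gunicorn can also read these settings from a Python config file. A minimal sketch, assuming you name the file <code>gunicorn.conf.py</code> and keep the same log paths as above:</p>

```python
# gunicorn.conf.py
# Same settings as the --access-logfile and --error-logfile flags.
accesslog = "/var/log/gunicorn/access.log"
errorlog = "/var/log/gunicorn/error.log"
# Optional: how chatty the error log is ("info" is gunicorn's default).
loglevel = "info"
```

<p>You'd then point gunicorn at the file with <code>gunicorn -c gunicorn.conf.py myproj.wsgi:application</code>.</p>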
<h2>Next steps</h2>
<p>Once you've configured all of this, you'll be able to log into your webserver and see all your info events, error messages, access logs and gunicorn events. Finding and fixing an error in prod will be much easier with these logs.</p>
<p>Wouldn't it be nice if you didn't have to log into production to see these messages though? Even better, wouldn't it be great to search through your logs? That's when log aggregation tools like Papertrail or SumoLogic come in handy. I've written a guide on how to set up Papertrail <a href="https://mattsegal.dev/django-logging-papertrail.html">here</a>.</p>
<p>In addition, if you're running a professional operation, wouldn't it be good to get alerts when you have errors? That's when you need to <a href="https://mattsegal.dev/sentry-for-django-error-monitoring.html">set up error reporting</a> as well as logging.</p>How to customise a class based view in Django2020-04-09T12:00:00+10:002020-04-09T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-04-09:/customise-class-based-view-django.html<p>You've spent a little bit of time working on your Django app and you want to dip your toes into class-based views. The basic examples are simple enough, but once you want to do something more complicated, something more custom, you get stuck. How do you customise a class-based view?</p>
<p>You've written some function-based views before, and they seem pretty straightforward: it's just a function! If you want to change how it works, you just change the code inside the function. Simple - no magic, no mystery, it's just code. Customising class-based views seems much less user-friendly.</p>
<p>In this post I'll take you through a worked example, showing you how to customise class-based views.</p>
<h3>Example problem</h3>
<p>Let's start with an example problem. Say we've got a model called Article, used for publishing news online:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># models.py</span>
<span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">models</span>
<span class="kn">from</span> <span class="nn">django.utils</span> <span class="kn">import</span> <span class="n">timezone</span>


<span class="k">class</span> <span class="nc">Article</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">created_at</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">DateTimeField</span><span class="p">(</span><span class="n">default</span><span class="o">=</span><span class="n">timezone</span><span class="o">.</span><span class="n">now</span><span class="p">)</span>
    <span class="n">published_at</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">DateTimeField</span><span class="p">(</span><span class="n">blank</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">null</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
    <span class="n">title</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">512</span><span class="p">)</span>
    <span class="n">body_html</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">TextField</span><span class="p">()</span>
</code></pre></div>
<p>We have a function-based view that lists all the articles:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># views.py</span>
<span class="k">def</span> <span class="nf">article_list_view</span><span class="p">(</span><span class="n">request</span><span class="p">):</span>
    <span class="n">articles</span> <span class="o">=</span> <span class="n">Article</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">all</span><span class="p">()</span>
    <span class="n">context</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'object_list'</span><span class="p">:</span> <span class="n">articles</span><span class="p">}</span>
    <span class="k">return</span> <span class="n">render</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="s1">'news/article_list.html'</span><span class="p">,</span> <span class="n">context</span><span class="p">)</span>
</code></pre></div>
<p>As I mentioned earlier, this function-based code is pretty easy to customise - you just change the code! Let's say we only want to list all the <em>published</em> articles and list them from newest to oldest:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># views.py</span>
<span class="k">def</span> <span class="nf">article_list_view</span><span class="p">(</span><span class="n">request</span><span class="p">):</span>
    <span class="n">articles</span> <span class="o">=</span> <span class="p">(</span>
        <span class="n">Article</span><span class="o">.</span><span class="n">objects</span>
        <span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">published_at__isnull</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
        <span class="o">.</span><span class="n">order_by</span><span class="p">(</span><span class="s1">'-published_at'</span><span class="p">)</span>
    <span class="p">)</span>
    <span class="n">context</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'object_list'</span><span class="p">:</span> <span class="n">articles</span><span class="p">}</span>
    <span class="k">return</span> <span class="n">render</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="s2">"news/article_list.html"</span><span class="p">,</span> <span class="n">context</span><span class="p">)</span>
</code></pre></div>
<p>Now let's try doing the same thing with a class-based view. Listing all Articles is <em>super</em> simple. It's like 3 lines of code:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># views.py</span>
<span class="kn">from</span> <span class="nn">django.views.generic</span> <span class="kn">import</span> <span class="n">ListView</span>


<span class="k">class</span> <span class="nc">ArticleListView</span><span class="p">(</span><span class="n">ListView</span><span class="p">):</span>
    <span class="n">model</span> <span class="o">=</span> <span class="n">Article</span>
    <span class="n">template_name</span> <span class="o">=</span> <span class="s2">"news/article_list.html"</span>
</code></pre></div>
<p>Cool, cool, and now we need to do the next bit: list all the <em>published</em> articles and list them from newest to oldest. How the fuck do we do that? Where do you even start? Are you stressed? I'm stressed.</p>
<h3>The fix</h3>
<p>The fix is to read some documentation. Not the <a href="https://docs.djangoproject.com/en/3.0/ref/class-based-views/">Django docs</a>, which are great for a lot of topics. No, you are going to need to refer to <a href="https://ccbv.co.uk/">Classy Class-Based Views</a> to keep your sanity. Let's take a peek at the documentation for <a href="https://ccbv.co.uk/projects/Django/3.0/django.views.generic.list/ListView/">ListView</a>.</p>
<p>I'm going to cut to video to show you the rest of the fix.</p>
<div class="loom-embed"><iframe src="https://www.loom.com/embed/914ef155a98f49faba6c3c8af3d686a4" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>
<p>You can use this technique of overriding methods on any class-based view, depending on what you need to do.</p>
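<p>For the example problem, one method worth overriding is <code>get_queryset</code>, which ListView calls to fetch the objects it lists. The sketch below shows the mechanism without needing a Django project: the <code>ListView</code> class here is a trimmed-down stand-in, not Django's real one, and the <code>ARTICLES</code> data is made up. The comment inside the override shows roughly what the body would look like with the real ListView and the Article model:</p>

```python
# Stand-in for django.views.generic.ListView, trimmed down to the two
# methods that matter here. The real class does a lot more.
class ListView:
    queryset = None

    def get_queryset(self):
        return list(self.queryset)

    def get_context_data(self):
        return {"object_list": self.get_queryset()}


# Stand-in data: (title, published_at) pairs instead of Article rows.
ARTICLES = [("Draft", None), ("Old news", 1), ("Fresh news", 2)]


class ArticleListView(ListView):
    queryset = ARTICLES

    def get_queryset(self):
        # With the real ListView and Article model this would be something like:
        #   return (
        #       Article.objects
        #       .filter(published_at__isnull=False)
        #       .order_by("-published_at")
        #   )
        published = [a for a in super().get_queryset() if a[1] is not None]
        return sorted(published, key=lambda a: a[1], reverse=True)


context = ArticleListView().get_context_data()
print(context["object_list"])
```

<p>The parent class still does all the work of building the context. The subclass only changes which objects go into it.</p>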
<p>A common method to override is get_context_data:</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">ArticleListView</span><span class="p">(</span><span class="n">ListView</span><span class="p">):</span>
    <span class="n">model</span> <span class="o">=</span> <span class="n">Article</span>
    <span class="n">template_name</span> <span class="o">=</span> <span class="s2">"news/article_list.html"</span>

    <span class="k">def</span> <span class="nf">get_context_data</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="n">context</span> <span class="o">=</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">get_context_data</span><span class="p">(</span><span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
        <span class="k">return</span> <span class="p">{</span>
            <span class="o">**</span><span class="n">context</span><span class="p">,</span>
            <span class="s1">'now'</span><span class="p">:</span> <span class="n">timezone</span><span class="o">.</span><span class="n">now</span><span class="p">()</span>
        <span class="p">}</span>
</code></pre></div>
<p>In summary, when you're stuck on a class-based view:</p>
<ul>
<li>Go to <a href="https://ccbv.co.uk/">Classy Class-Based Views</a></li>
<li>Take a peek at the attributes of the class</li>
<li>Scan over the methods of the class</li>
<li>Dig into the methods to figure out what you need to change</li>
<li>Set any attributes that are necessary</li>
<li>Override any methods that you need to change</li>
</ul>9 commands for debugging Django in Docker containers2020-04-08T12:00:00+10:002020-04-08T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-04-08:/docker-container-debugging.html<p>You want to get started "Dockerizing" your Django environment and you do a tutorial which shows you how to set it all up with docker-compose. You follow the listed commands and everything is working. Cool!</p>
<p>A few days later there's an error in your code and you want to debug the issue. What caused your dev environment to break? Is it your code? Is it a dependencies issue? Is it a Docker thing? How can you tell?</p>
<p>I've compiled a list of handy Docker commands that I whip out in these "what the fuck is happening!?!?" situations to help me get to the bottom of the issue:</p>
<ul>
<li>Rebuild from scratch</li>
<li>Run a debugger</li>
<li>Get a bash shell in a running container</li>
<li>Get a bash shell in a brand new container</li>
<li>Run a script</li>
<li>Poke around inside of a PostgreSQL container</li>
<li>Watch some logs</li>
<li>View volumes</li>
<li>Destroy absolutely everything</li>
</ul>
<h3>Rebuild from scratch</h3>
<p>Sometimes you want to rebuild your Docker image from scratch, just to make sure. Rebuilding with the --no-cache flag ensures that your Dockerfile is executed from start to finish, with no intermediate cached layers used.</p>
<p>For docker:</p>
<div class="highlight"><pre><span></span><code>docker build --no-cache .
</code></pre></div>
<p>For docker-compose, assuming you have a "web" service:</p>
<div class="highlight"><pre><span></span><code>docker-compose build --no-cache web
</code></pre></div>
<h3>Run a debugger</h3>
<p>You might notice that using docker-compose, Django's runserver and the pdb debugger together doesn't really work.</p>
<p>If you've plopped your debugger into a Django view for example:</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">my_view</span><span class="p">(</span><span class="n">request</span><span class="p">):</span>
    <span class="n">things</span> <span class="o">=</span> <span class="n">Thing</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">all</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">do_stuff</span><span class="p">(</span><span class="n">things</span><span class="p">)</span>
    <span class="c1"># Launch Python command-line debugger</span>
    <span class="kn">import</span> <span class="nn">pdb</span><span class="p">;</span> <span class="n">pdb</span><span class="o">.</span><span class="n">set_trace</span><span class="p">()</span>
    <span class="k">return</span> <span class="n">JsonResponse</span><span class="p">(</span><span class="n">result</span><span class="p">)</span>
</code></pre></div>
<p>... and your <code>docker-compose.yml</code> file is something like this:</p>
<div class="highlight"><pre><span></span><code><span class="nt">services</span><span class="p">:</span><span class="w"></span>
<span class="w">  </span><span class="nt">web</span><span class="p">:</span><span class="w"></span>
<span class="w">    </span><span class="nt">command</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">./manage.py runserver</span><span class="w"></span>
<span class="w">    </span><span class="c1"># ... more stuff ...</span><span class="w"></span>
</code></pre></div>
<p>... and you start your services like this:</p>
<div class="highlight"><pre><span></span><code>docker-compose up web
</code></pre></div>
<p>Then your Python debugger will never work! When the view hits the pdb.set_trace() function, you'll always see this horrible error:</p>
<div class="highlight"><pre><span></span><code> # ... 10 million lines of stack trace ...
File "/usr/lib/python3.6/bdb.py", line 51, in trace_dispatch
return self.dispatch_line(frame)
File "/usr/lib/python3.6/bdb.py", line 70, in dispatch_line
if self.quitting: raise BdbQuit
bdb.BdbQuit
</code></pre></div>
<p>This is an easy fix. The debugger, which is inside the Docker container, needs an interactive terminal to read your commands from, and <code>docker-compose up</code> doesn't attach one, so pdb quits straight away - hence the error. <code>docker-compose run</code> does give you an interactive terminal, but it skips the service's port mappings by default, so we also need --service-ports to keep the web server reachable from your browser. More info <a href="https://stackoverflow.com/questions/33066528/should-i-use-docker-compose-up-or-run">here</a>:</p>
<div class="highlight"><pre><span></span><code>docker-compose run --rm --service-ports web
</code></pre></div>
<p>Now when you hit the debugger you will get a functional, interactive pdb interface in your terminal.</p>
<h3>Get a bash shell in a running container</h3>
<p>Sometimes you want to poke around inside a container that is already running. You might want to <code>cat</code> a file, run <code>ls</code> or inspect the output of <code>ps auxww</code>. To get inside a running container you can use docker's <code>exec</code> command.</p>
<p>First, you need to get the running container's id:</p>
<div class="highlight"><pre><span></span><code>docker ps
</code></pre></div>
<p>Which will get you an output like</p>
<div class="highlight"><pre><span></span><code>CONTAINER ID ... NAMES
0dd3d893u8d3 ... web
518f741c4415 ... worker
0ce1cfd9c99f ... database
</code></pre></div>
<p>Say I wanted to poke around in the "worker" container, then I need to note its id of "518f741c4415" and then run bash using <code>docker exec</code>:</p>
<div class="highlight"><pre><span></span><code>docker <span class="nb">exec</span> -it 518f741c4415 bash
</code></pre></div>
<p>It's a little easier if you're using docker-compose. If you want to get into an already running "web" container:</p>
<div class="highlight"><pre><span></span><code>docker-compose <span class="nb">exec</span> web bash
</code></pre></div>
<h3>Get a bash shell in a brand new container</h3>
<p>Sometimes you want to poke around inside a container that is based on an image, to see what is baked into the image. You can do this using docker or docker-compose.</p>
<p>For a service set up like this:</p>
<div class="highlight"><pre><span></span><code><span class="nt">services</span><span class="p">:</span><span class="w"></span>
<span class="w">  </span><span class="nt">web</span><span class="p">:</span><span class="w"></span>
<span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">myimage:latest</span><span class="w"></span>
<span class="w">    </span><span class="c1"># ... more stuff ...</span><span class="w"></span>
</code></pre></div>
<p>You can run the image <code>myimage</code> using docker:</p>
<div class="highlight"><pre><span></span><code>docker run --rm -it myimage:latest bash
</code></pre></div>
<p>Or via docker-compose:</p>
<div class="highlight"><pre><span></span><code>docker-compose run --rm web bash
</code></pre></div>
<p>Note the <code>--rm</code> flag, which will save you from having all these single use containers lying around, using up disk space.</p>
<h3>Run a script</h3>
<p>If you just want to run a script in a single-use, throw away container, you can use the <code>run</code> command as well. This is particularly useful for running management commands or unit tests:</p>
<div class="highlight"><pre><span></span><code>docker-compose run --rm web ./manage.py migrate
</code></pre></div>
<p>Note: this only works if your container's default working dir contains <code>./manage.py</code>.</p>
<h3>Poke around inside of a PostgreSQL container</h3>
<p>If you're using Django and docker-compose then you're likely running a PostgreSQL container, set up something like this:</p>
<div class="highlight"><pre><span></span><code><span class="nt">services</span><span class="p">:</span><span class="w"></span>
<span class="w">  </span><span class="nt">database</span><span class="p">:</span><span class="w"></span>
<span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">postgres</span><span class="w"></span>
<span class="w">    </span><span class="c1"># ... more stuff ...</span><span class="w"></span>
<span class="w">    </span><span class="nt">environment</span><span class="p">:</span><span class="w"></span>
<span class="w">      </span><span class="nt">POSTGRES_HOST_AUTH_METHOD</span><span class="p">:</span><span class="w"> </span><span class="s">"trust"</span><span class="w"></span>
<span class="w">  </span><span class="nt">web</span><span class="p">:</span><span class="w"></span>
<span class="w">    </span><span class="nt">command</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">./manage.py runserver</span><span class="w"></span>
<span class="w">    </span><span class="c1"># ... more stuff ...</span><span class="w"></span>
<span class="w">    </span><span class="nt">environment</span><span class="p">:</span><span class="w"></span>
<span class="w">      </span><span class="nt">PGDATABASE</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">postgres</span><span class="w"></span>
<span class="w">      </span><span class="nt">PGUSER</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">postgres</span><span class="w"></span>
<span class="w">      </span><span class="nt">PGPASSWORD</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">password</span><span class="w"></span>
<span class="w">      </span><span class="nt">PGHOST</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">database</span><span class="w"></span>
<span class="w">      </span><span class="nt">PGPORT</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">5432</span><span class="w"></span>
</code></pre></div>
<p>Then you can use the psql command line from the web container to check out your database tables:</p>
<div class="highlight"><pre><span></span><code>docker-compose run --rm web psql
</code></pre></div>
<h3>Watch some logs</h3>
<p>Sometimes you have a container, like a Celery worker or database, which is running in the background and you want to see its console output. Even better, you want to watch its console output in realtime. You can do this with <code>logs</code>. For example, if I want to follow the output of the "worker" container:</p>
<div class="highlight"><pre><span></span><code>docker-compose logs --tail <span class="m">100</span> -f worker
</code></pre></div>
<h3>View volumes</h3>
<p>Sometimes when you're having issues with volumes you want to double check what volumes you have and how they're set up. This is relatively straightforward.</p>
<p>To see all volumes:</p>
<div class="highlight"><pre><span></span><code>docker volume ls
</code></pre></div>
<p>Which gives output like</p>
<div class="highlight"><pre><span></span><code>DRIVER VOLUME NAME
local docker_postgres-data
</code></pre></div>
<p>And then to drill down into one volume:</p>
<div class="highlight"><pre><span></span><code>docker volume inspect docker_postgres-data
</code></pre></div>
<p>Giving you something like</p>
<div class="highlight"><pre><span></span><code><span class="p">[</span><span class="w"></span>
<span class="w">    </span><span class="p">{</span><span class="w"></span>
<span class="w">        </span><span class="nt">"CreatedAt"</span><span class="p">:</span><span class="w"> </span><span class="s2">"2020-04-08T12:44:34+10:00"</span><span class="p">,</span><span class="w"></span>
<span class="w">        </span><span class="nt">"Driver"</span><span class="p">:</span><span class="w"> </span><span class="s2">"local"</span><span class="p">,</span><span class="w"></span>
<span class="w">        </span><span class="nt">"Labels"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w">            </span><span class="nt">"com.docker.compose.project"</span><span class="p">:</span><span class="w"> </span><span class="s2">"docker"</span><span class="p">,</span><span class="w"></span>
<span class="w">            </span><span class="nt">"com.docker.compose.version"</span><span class="p">:</span><span class="w"> </span><span class="s2">"1.23.1"</span><span class="p">,</span><span class="w"></span>
<span class="w">            </span><span class="nt">"com.docker.compose.volume"</span><span class="p">:</span><span class="w"> </span><span class="s2">"postgres-data"</span><span class="w"></span>
<span class="w">        </span><span class="p">},</span><span class="w"></span>
<span class="w">        </span><span class="nt">"Mountpoint"</span><span class="p">:</span><span class="w"> </span><span class="s2">"/var/lib/docker/volumes/docker_postgres-data/_data"</span><span class="p">,</span><span class="w"></span>
<span class="w">        </span><span class="nt">"Name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"docker_postgres-data"</span><span class="p">,</span><span class="w"></span>
<span class="w">        </span><span class="nt">"Options"</span><span class="p">:</span><span class="w"> </span><span class="kc">null</span><span class="p">,</span><span class="w"></span>
<span class="w">        </span><span class="nt">"Scope"</span><span class="p">:</span><span class="w"> </span><span class="s2">"local"</span><span class="w"></span>
<span class="w">    </span><span class="p">}</span><span class="w"></span>
<span class="p">]</span><span class="w"></span>
</code></pre></div>
<p>If that doesn't help you, there's always the next step.</p>
<h3>Destroy absolutely everything</h3>
<p>There's a Docker command that removes all your "unused" data:</p>
<div class="highlight"><pre><span></span><code>docker system prune
</code></pre></div>
<p>That's nice, it might free up some disk space, but what if you want to go full scorched-earth on your Docker environment? Like tear down Carthage and salt the fields so that nothing will ever grow again?</p>
<p>Here's a script I use occasionally when I just want to get rid of <em>everything</em> and start afresh:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># Stop all containers</span>
docker <span class="nb">kill</span> <span class="k">$(</span>docker ps -q<span class="k">)</span>
<span class="c1"># Remove all containers</span>
docker rm <span class="k">$(</span>docker ps -a -q<span class="k">)</span>
<span class="c1"># Remove all docker images</span>
docker rmi <span class="k">$(</span>docker images -q<span class="k">)</span>
<span class="c1"># Remove all volumes</span>
docker volume rm <span class="k">$(</span>docker volume ls -q<span class="k">)</span>
</code></pre></div>
<p>Burn it all down I say! From the ashes, we will rebuild!</p>
<p>If this doesn't fix your issue, I recommend that you throw your laptop out a window, sell all your worldly possessions and <a href="https://www.outsideonline.com/2411125/lynx-vilden-stone-age-life">start a new life in the wilderness</a>.</p>Introduction to configuration management2020-04-08T12:00:00+10:002020-04-08T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-04-08:/intro-config-management.html<p>This is a talk I gave at the Melbourne <a href="https://www.meetup.com/en-AU/Junior-Developers-Melbourne/">Junior dev meetup</a>:</p>
<blockquote>
<p>Have you ever found a bug in prod, which wasn't caught earlier because of a missing folder, library, or file permission? It sucks! This talk goes over some practices and tools that you can use to keep your …</p></blockquote><p>This is a talk I gave at the Melbourne <a href="https://www.meetup.com/en-AU/Junior-Developers-Melbourne/">Junior dev meetup</a>:</p>
<blockquote>
<p>Have you ever found a bug in prod, which wasn't caught earlier because of a missing folder, library, or file permission? It sucks! This talk goes over some practices and tools that you can use to keep your environments consistent and share knowledge with the rest of your team.</p>
</blockquote>
<div class="loom-embed"><iframe src="https://www.loom.com/embed/95fce3bb373e40f99ee91e5892ba177e" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>Sentry is great for tracking Django errors2020-04-08T12:00:00+10:002020-04-08T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-04-08:/sentry-for-django-error-monitoring.html<p>You've deployed a Django app to a webserver and now it's not working. Your app is throwing 500 Internal Server Errors - what's wrong? Why is this happening? It worked on my laptop!?</p>
<p>Even worse is when a <em>customer</em> experienced an error 12 hours ago and <em>you</em> need to figure out …</p><p>You've deployed a Django app to a webserver and now it's not working. Your app is throwing 500 Internal Server Errors - what's wrong? Why is this happening? It worked on my laptop!?</p>
<p>Even worse is when a <em>customer</em> experienced an error 12 hours ago and <em>you</em> need to figure out what went wrong.</p>
<h3>Error reporting</h3>
<p>You need something to alert you when errors happen in production, otherwise you're flying blind. How can you fix a bug if you don't know what happened? Error reporting is important if you're a new developer, because you're going to write a lot of bugs, or if you're experienced, since other people are likely relying on your code to work.</p>
<p>Django allows you to set up <a href="https://docs.djangoproject.com/en/3.0/howto/error-reporting/">email reports</a>, which requires some fiddling with mail servers, but it's a totally OK way to track errors.</p>
<p>My favourite way to monitor errors is using <a href="https://sentry.io/welcome/">Sentry</a>. It's a SaaS product that's been used at every Django job I've worked at and I use it for my personal projects. Here's why I like it so much.</p>
<h3>Easy to set up</h3>
<p>Sentry used to be a little harder to install, but now there are only 3 things you need to do in order to get started.</p>
<p>Install the Python package</p>
<div class="highlight"><pre><span></span><code>pip install sentry-sdk
</code></pre></div>
<p>Set an environment variable</p>
<div class="highlight"><pre><span></span><code><span class="nb">export</span> <span class="nv">SENTRY_DSN</span><span class="o">=</span><span class="s2">"https://xxx@sentry.io/yyy"</span>
</code></pre></div>
<p>And run a line of Python in your <em>production</em> settings.py</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">os</span>

<span class="kn">import</span> <span class="nn">sentry_sdk</span>
<span class="kn">from</span> <span class="nn">sentry_sdk.integrations.django</span> <span class="kn">import</span> <span class="n">DjangoIntegration</span>

<span class="n">sentry_sdk</span><span class="o">.</span><span class="n">init</span><span class="p">(</span>
    <span class="n">dsn</span><span class="o">=</span><span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">"SENTRY_DSN"</span><span class="p">),</span>
    <span class="n">integrations</span><span class="o">=</span><span class="p">[</span><span class="n">DjangoIntegration</span><span class="p">()],</span>
    <span class="n">environment</span><span class="o">=</span><span class="s2">"prod"</span>
<span class="p">)</span>
</code></pre></div>
<h3>Rich reporting</h3>
<p>You want as much information as possible on the incident:</p>
<ul>
<li>when did it happen?</li>
<li>how many times did it happen?</li>
<li>what URL was requested?</li>
<li>what cookies were set?</li>
<li>was it for a particular user?</li>
<li>was it in a particular browser?</li>
<li>what line of code triggered the error?</li>
<li>what's the stack trace?</li>
</ul>
<p>Sentry captures an incredible amount of info when it logs an error, sometimes including the values of variables in scope and the database queries that were run.</p>
<h3>Free for multiple apps</h3>
<p>I host all my personal projects in Sentry for free. I think as long as you stick to one user you don't have to pay for it.</p>
<h3>Handles frontend errors</h3>
<p>You can use Sentry in your frontend JavaScript as well, which gives you a much more complete picture of what went wrong.</p>
<h3>Slack integration</h3>
<p>Some people might not like this, but if you prefer Slack to email, you can set up Sentry to post to a Slack channel when an error crops up.</p>
<h3>Deployment tracking</h3>
<p>If you're willing to do a little more legwork, you can configure Sentry to track your deployments. You give it a Git commit hash and it is able to correlate errors with particular deployments, making it easier to track down the offending code. This is particularly useful if you/your team are shipping multiple deployments per day.</p>
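<p>Here's a rough sketch of one way to feed that commit hash in. It builds the keyword arguments for <code>sentry_sdk.init</code> from environment variables. The <code>sentry_options</code> helper and the <code>GIT_COMMIT</code> variable are names I've made up for illustration; the only Sentry-specific part is the <code>release</code> option.</p>
<div class="highlight"><pre><span></span><code>def sentry_options(environ):
    """Build keyword arguments for sentry_sdk.init from environment variables."""
    options = {
        "dsn": environ.get("SENTRY_DSN"),
        "environment": "prod",
    }
    # GIT_COMMIT is a hypothetical variable that your deploy script would set,
    # for example to the output of `git rev-parse HEAD`.
    commit = environ.get("GIT_COMMIT")
    if commit:
        # Sentry groups errors by this "release" value.
        options["release"] = commit
    return options


# Demo with a fake environment instead of os.environ.
print(sentry_options({"SENTRY_DSN": "https://xxx@sentry.io/yyy", "GIT_COMMIT": "abc123"}))
</code></pre></div>
<p>In your production settings.py you would then call something like <code>sentry_sdk.init(integrations=[DjangoIntegration()], **sentry_options(os.environ))</code>.</p>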
<h3>Wrapping up</h3>
<p>As you might have noticed, I'm pretty happy with Sentry, even after a few years of using it. There are a few little issues with it, like the overly-complex settings panel in the web UI, but overall it offers a low-friction user experience. Hopefully you'll get some use out of it.</p>
<p>Now, just because you have error monitoring set up, that doesn't mean you've done everything you need to in order to monitor your production environment. Application logging is essential as well! If you haven't already set up your Django app to write logs in production <a href="https://mattsegal.dev/file-logging-django.html">you can find out how here</a>.</p>3 ways to deploy a Django backend with a React frontend2020-04-07T12:00:00+10:002020-04-07T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-04-07:/django-spa-infrastructure.html<p>You're developing a web app with a Django REST backend and some sort of single page app frontend using React or Vue or something like that. There are many ways for you to run this app in production. There are a lot of choices that you need to make:</p>
<ul>
<li>Do …</li></ul><p>You're developing a web app with a Django REST backend and some sort of single page app frontend using React or Vue or something like that. There are many ways for you to run this app in production. There are a lot of choices that you need to make:</p>
<ul>
<li>Do you serve your frontend as a stand-alone static site or via Django views?</li>
<li>Do you put the backend and frontend on different subdomains?</li>
<li>Do you deploy the backend and frontend separately, or together?</li>
</ul>
<p>How do you choose? What is "the right way"?</p>
<p>Well, the bad news is that there is no "right way" to do this and there are a lot of different trade-offs to consider. The good news is that I've compiled three different options with their pros and cons.</p>
<h3>Option 1 - Cram it all into Django</h3>
<p>This is the "default" approach, where you have a Django site and you just add React to it. All your HTML is served via Django views, and all your JavaScript and CSS is bundled by Django and served as static files. All your code, frontend and backend, is in one Git repository. You serve the app from a single domain like www.myapp.com.</p>
<p>When you deploy your code using this setup, you will need to:</p>
<ul>
<li>Use <a href="https://webpack.js.org">webpack</a>, or something <a href="https://www.google.com/search?q=webpack+alternatives">similar</a>, to build your JavaScript and CSS assets and put them into a Django static files directory</li>
<li>Deploy Django like you usually would</li>
</ul>
<p>You will need to use a setup like <a href="https://pascalw.me/blog/2020/04/19/webpack-django.html">this</a> or <a href="https://github.com/owais/django-webpack-loader">django-webpack-loader</a> to integrate Webpack's build assets with Django's staticfiles system and templates. Other than that, it's a vanilla Django deployment.</p>
<p>The pros are:</p>
<ul>
<li><strong>Simplest infrastructure.</strong> Other than setting up django-webpack-loader and adding a Webpack build to the start of your deployment process, there's nothing else you need to do to your production infrastructure. Nothing extra to set up, pay for, configure, debug or tear your hair out over.</li>
<li><strong>Cross-cutting changes.</strong> If you need to make a change that affects both your frontend and backend, then you can do it all in one Git commit and get your changes into production using a single deployment.</li>
<li><strong>Tighter integration.</strong> With this setup you can use Django's views to pass context data from the backend to the frontend via templates. In addition, you can do server side rendering (with additional messing around with NodeJS).</li>
</ul>
<p>The cons are:</p>
<ul>
<li><strong>Single deployment for frontend and backend.</strong> Often you want to just deploy a small CSS or content change to the frontend, or a backend-only change. With this setup, you are forced to always deploy the backend and the frontend together. This means that you need to wait for the frontend to build, even if you didn't make any frontend changes! Even worse, a broken test, or linter error in the <em>other codebase</em> can fail a deployment, if you're using continuous integration practices. You don't want your database migration deployment to fail just because someone forgot to use semicolons in their JavaScript.</li>
<li><strong>Tangled tech stack.</strong> Backend devs will need to know a little React, and frontend devs will need to know a little Django for this system to work.</li>
<li><strong>Tricksy django-webpack-loader.</strong> Setting up the integration between Webpack and Django has been a painful process for me in the past. I don't remember why, I just remember pain. Truthfully, every option on this list will involve you wanting to throw your computer out of a window at some point, and this one is no exception.</li>
</ul>
<p>Choose this when:</p>
<ul>
<li>You want to keep your infrastructure simple</li>
<li>You don't care about deployment times</li>
<li>You typically deploy the frontend and backend together</li>
<li>You need a tight integration between the frontend and backend (eg. data passing, server-side rendering)</li>
</ul>
<h3>Option 2 - Completely separate infrastructure</h3>
<p>This is an approach that has become more popular over the last several years. In this setup you have two separate codebases, one for the frontend and one for the backend, each with their own Git repository.</p>
<p>The frontend is deployed as a "static site" of just HTML, CSS and JavaScript assets. It is hosted separately to Django, in an <a href="https://docs.aws.amazon.com/AmazonS3/latest/dev/WebsiteHosting.html">AWS S3 bucket</a>, <a href="https://www.netlify.com/">Netlify</a>, or something similar. The frontend is built, tested and deployed independently of the backend. The frontend gets data from the backend solely through REST API calls.</p>
<p>The backend is a Django REST API with no HTML views (other than the admin pages), and hosts no static content (other than what's needed for the admin). It is built, tested and deployed independently of the frontend.</p>
<p>Importantly, since the frontend and backend are on different servers, they will also have different domain names. The backend might live on something like api.myapp.com and the frontend on www.myapp.com.</p>
<p>The pros are:</p>
<ul>
<li><strong>Independent deployments.</strong> No waiting on the backend to deploy the frontend and vice versa.</li>
<li><strong>Separation of concerns.</strong> Backend developers only need to think about the API, not views or CSS. Frontend developers only need to think in terms of the API presented by the backend, not the internal workings of Django. You <em>can</em> achieve this using option 1, but this method enforces it more strictly.</li>
<li><strong>If the backend goes down, the frontend still works.</strong> Your users will still experience errors, but the site won't appear as broken.</li>
<li><strong>Security permissions can be split up.</strong> You can split up who is allowed to deploy the frontend vs the backend, typically meaning more people will have the power to deploy, making your team more productive.</li>
</ul>
<p>The cons are:</p>
<ul>
<li><strong>More infrastructure.</strong> You will need to set up and maintain the static site plus an extra deployment process, which is more work, more complexity.</li>
<li><strong>Cross-domain fuckery.</strong> You run into several problems because your frontend is on a different subdomain to your backend. You need to do some extra configuration of your Django settings to allow the frontend to talk to the backend properly. It's a security thing apparently. If you don't fix this you can have issues with making API requests to the backend, receiving cookies, and stuff like that. I don't understand it super well. I don't <em>want</em> to understand it super well. I have better shit to do than figure out the correct value of SESSION_COOKIE_DOMAIN, CORS_ORIGIN_REGEX_WHITELIST and friends. Even worse, cross-domain issues do not crop up on your local machine, because everything is served from localhost, so you need to deploy your configuration before you know if you got it right.</li>
</ul>
<p>Here are some cross domain Django settings that I hope you never need to think about:</p>
<ul>
<li>SESSION_COOKIE_DOMAIN</li>
<li>CSRF_COOKIE_DOMAIN</li>
<li>CSRF_TRUSTED_ORIGINS</li>
<li>CORS_ORIGIN_ALLOW_ALL</li>
<li>CORS_ALLOW_CREDENTIALS</li>
<li>CORS_ORIGIN_REGEX_WHITELIST</li>
</ul>
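<p>For a rough idea of what these look like in practice, here's a sketch of a settings.py fragment for a frontend on www.myapp.com and a backend on api.myapp.com. Treat it as a starting point under assumptions, not a known-good config: the domains are placeholders, the <code>CORS_*</code> settings are read by the third-party django-cors-headers package rather than Django itself (newer releases of that package document renamed equivalents), and the right values depend on your Django version.</p>
<div class="highlight"><pre><span></span><code># settings.py (sketch): all domain values below are placeholders.

# Share session and CSRF cookies across subdomains of myapp.com.
SESSION_COOKIE_DOMAIN = ".myapp.com"
CSRF_COOKIE_DOMAIN = ".myapp.com"

# Origins trusted for "unsafe" requests such as POST.
# Django 4.0+ expects a scheme here; older versions expect a bare host.
CSRF_TRUSTED_ORIGINS = ["https://www.myapp.com"]

# django-cors-headers: allow only the frontend origin, not every origin.
CORS_ORIGIN_ALLOW_ALL = False
CORS_ORIGIN_REGEX_WHITELIST = [r"^https://www\.myapp\.com$"]

# django-cors-headers: let the browser send cookies with cross-origin requests.
CORS_ALLOW_CREDENTIALS = True
</code></pre></div>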
<p>Choose this when:</p>
<ul>
<li>You have separate dedicated frontend and backend developers</li>
<li>You want to deploy the backend and frontend separately</li>
<li>You want to <em>completely</em> decouple your backend and frontend infrastructure</li>
<li>You don't mind a little more operational complexity and configuration</li>
</ul>
<h3>Option 3 - One server, separate deployments</h3>
<p>This approach is an attempted fusion of options 1 and 2. The idea is to still deploy the frontend as a separate static site, but you deploy everything to one server, under a single domain name:</p>
<ul>
<li>You have two separate codebases for the backend and frontend respectively</li>
<li>Both codebases are deployed independently, but to the same server</li>
<li>Both codebases are hosted on a single domain, like www.myapp.com</li>
</ul>
<p>You manage this by using a webserver, like NGINX, which handles all incoming requests. Requests to the URL path /api/ get sent to the WSGI server which runs your Django app (traditional reverse-proxy setup), while all other requests are sent to the frontend, which is set up as a static site and served from the filesystem (eg. /var/www/).</p>
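<p>The routing rule itself is tiny. Here it is written out as a Python function, purely to illustrate the decision the webserver makes; this is not NGINX configuration, and <code>route_request</code> is a made-up name.</p>
<div class="highlight"><pre><span></span><code>def route_request(path):
    """Return which upstream should handle a request for the given URL path."""
    if path.startswith("/api/"):
        # Reverse-proxied to the WSGI server running Django.
        return "django"
    # Everything else is served from the filesystem, eg. /var/www/
    return "static"


for path in ["/api/users/", "/", "/about"]:
    print(path, route_request(path))
</code></pre></div>
<p>If you rely on the Django admin, you would presumably need a similar rule for its URL path and static files as well.</p>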
<p>The pros are:</p>
<ul>
<li><strong>Most of the benefits of Option 2.</strong> Separation of concerns and independent deployments are still possible.</li>
<li><strong>No "cross-domain fuckery".</strong> Since all requests are served from one domain, you shouldn't need to mess around with all those horrible cross-domain settings in Django.</li>
</ul>
<p>The cons are:</p>
<ul>
<li><strong>More infrastructure.</strong> This setup is still more complex than the "Cram it all into Django" option.</li>
<li><strong>Requires control over host webserver.</strong> You need to be able to install and configure NGINX, deploy files to the filesystem etc. to get this done. This is straightforward if you're using a typical cloud virtual machine, but might be more tricky if you're using something like Heroku (not sure).</li>
</ul>
<p>Choose this when:</p>
<ul>
<li>You want to split up frontend and backend, but you don't need completely separate infrastructure</li>
<li>You have sufficient control over your host webserver</li>
</ul>
<p>I'll be honest here. I actually have never tried option 3 (I've used 1 + 2 before). I thought it up when replying to a Reddit post. I think it'll work though. Good luck!</p>How to restart Celery on file change2020-04-07T12:00:00+10:002020-04-07T12:00:00+10:00Matthew Segaltag:mattsegal.dev,2020-04-07:/restart-celery-on-file-change.html<p>I use Celery and Django together a lot. My biggest pain when doing local development with Celery is that the worker process won't restart when I change my code. Django's <code>runserver</code> restarts on code change, why can't Celery? How can you set up your dev environment to force Celery to …</p><p>I use Celery and Django together a lot. My biggest pain when doing local development with Celery is that the worker process won't restart when I change my code. Django's <code>runserver</code> restarts on code change, why can't Celery? How can you set up your dev environment to force Celery to restart on file change?</p>
<p><a href="https://github.com/gorakhargosh/watchdog">Watchdog</a> is a nifty little (cross-platform?) Python library that watches for filesystem changes. We can use Watchdog to restart <em>anything</em> on file change.</p>
<p>First, you need to install it in your local Python environment:</p>
<div class="highlight"><pre><span></span><code>pip install watchdog<span class="o">[</span>watchmedo<span class="o">]</span>
</code></pre></div>
<p>Let's say you normally start Celery like this:</p>
<div class="highlight"><pre><span></span><code>celery worker --broker redis://localhost:6379 --app myapp
</code></pre></div>
<p>Now you can start it like this:</p>
<div class="highlight"><pre><span></span><code>watchmedo <span class="se">\</span>
auto-restart <span class="se">\</span>
--directory ./my-code/ <span class="se">\</span>
--recursive <span class="se">\</span>
--pattern <span class="s1">'*.py'</span> <span class="se">\</span>
-- <span class="se">\</span>
celery worker --broker redis://localhost:6379 --app myapp
</code></pre></div>
<p>That's it! Watchdog will restart the process on file change. If you like, you can specify:</p>
<ul>
<li>multiple code directories using --directory</li>
<li>different file patterns using --pattern</li>
</ul>How to deploy Django migrations2020-04-04T12:00:00+11:002020-04-04T12:00:00+11:00Matthew Segaltag:mattsegal.dev,2020-04-04:/deploy-django-migrations.html<p>You've started learning Django, you've created a new Django app and you've deployed it to a Linux webserver in the cloud somewhere. It's all set up and running nicely. Now you want to make some more changes and you need to update your models.</p>
<p>How do you deploy those model …</p><p>You've started learning Django, you've created a new Django app and you've deployed it to a Linux webserver in the cloud somewhere. It's all set up and running nicely. Now you want to make some more changes and you need to update your models.</p>
<p>How do you deploy those model changes? How do you get those changes into your production database?</p>
<p>First I'll show you a simple way to run your migrations on a deployed Django app with a worked example, then I'll discuss some more advanced considerations.</p>
<h3>Simple method</h3>
<p>This simple method is how I like to run migrations. It's for when your web app:</p>
<ul>
<li>is a personal project</li>
<li>has low traffic</li>
<li>is only used by internal staff members</li>
</ul>
<p>Basically any situation where a few seconds of downtime isn't that important.</p>
<h4>Update your model</h4>
<p>First you need to make the changes you want to your model class.</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">Person</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="sd">"""A human person"""</span>
    <span class="n">name</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">128</span><span class="p">)</span>
    <span class="n">height</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">FloatField</span><span class="p">()</span>
    <span class="c1"># Add new attribute "weight"</span>
    <span class="n">weight</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">FloatField</span><span class="p">()</span>
</code></pre></div>
<h4>Create the migration script locally</h4>
<p>Once that is done, you want to use Django's command-line management script <code>makemigrations</code> to auto-generate the migrations script.</p>
<div class="highlight"><pre><span></span><code>./manage.py makemigrations
</code></pre></div>
<p>You should see a new file in your app's migrations folder. It'll have some whacky name like <code>0002_auto_20170219_2310.py</code>. If you're using Git, don't forget to commit this file.</p>
<h4>Run the migration script locally</h4>
<p>The <code>makemigrations</code> command only generates a script which applies your model changes to the database. To actually run that code and apply the changes, you need to run the <code>migrate</code> script:</p>
<div class="highlight"><pre><span></span><code>./manage.py migrate
</code></pre></div>
<h4>Check nothing broke</h4>
<p>After you've done that, you should do some testing to make sure that the migrations actually worked. Check the admin panel to see that the model has changed in your local database, test out your app to see that you haven't broken any existing functionality. If you've got automated tests, run them. Once you're happy that it's all good, move on.</p>
<h4>Deploy the migrations</h4>
<p>Now that you've generated your migration script it's time to apply it to the production database:</p>
<ul>
<li>Copy all your new code onto the server. Ideally instead of just picking single files, just copy all the .py files in your project to make sure you didn't miss anything</li>
<li>Stop your WSGI server</li>
<li>Delete all of the old code on the server, including any .pyc files</li>
<li>Move the new code to where the old code was</li>
<li>Apply your migrations with <code>./manage.py migrate</code></li>
<li>Start your WSGI server again</li>
</ul>
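<p>If you want to script those steps, here's a minimal sketch that builds the commands in order, assuming the new code has already been copied onto the server. The paths, the <code>build_deploy_commands</code> helper and the idea that your WSGI server is a systemd service called "gunicorn" are all assumptions for illustration; swap in whatever your server actually uses.</p>
<div class="highlight"><pre><span></span><code>def build_deploy_commands(new_code, live_code, service):
    """Return the shell commands for the deployment steps, in order."""
    return [
        # Stop the WSGI server
        ["sudo", "systemctl", "stop", service],
        # Delete all of the old code, including any .pyc files
        ["rm", "-rf", live_code],
        # Move the new code to where the old code was
        ["mv", new_code, live_code],
        # Apply the migrations
        ["python", live_code + "/manage.py", "migrate"],
        # Start the WSGI server again
        ["sudo", "systemctl", "start", service],
    ]


# Dry run: print the commands rather than executing them.
for command in build_deploy_commands("/srv/myapp-new", "/srv/myapp", "gunicorn"):
    print(" ".join(command))
</code></pre></div>
<p>To actually run them you could pass each command to <code>subprocess.run(command, check=True)</code>, so that a failed step stops the deployment instead of ploughing on.</p>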
<h4>Why delete all the old code?</h4>
<p>It might seem scary deleting all your deployed code and replacing it, but the alternative of just uploading a few files is even more risky. You could miss:</p>
<ul>
<li>The auto-generated migration file</li>
<li>The model file that you changed</li>
<li>Any other code that you've updated which depends on the updated model</li>
</ul>
<p>It's best to just nuke everything and start from scratch. This will ensure that your production code stays the same as your development code.</p>
<h4>Is this the best way to do it?</h4>
<p>The good thing about this method is that you don't have to worry about keeping your migrations backwards compatible. Sometimes your model changes will break your old code, but not your new code, like when you remove a field from a Django model. This method will keep you from running into that issue.</p>
<p>BUT, following these steps will take your site down for a few seconds, which is fine for a lot of cases, but is bad if any downtime is unacceptable.</p>
<p>Websites that need to always stay up usually use a method called <a href="https://rollout.io/blog/blue-green-deployment/">"blue-green" deployments</a>, where there are many servers running at once.</p>
<p>If you are doing blue-green deployments, this method will not work, and you will need to construct and deploy <a href="https://gist.github.com/majackson/493c3d6d4476914ca9da63f84247407b">backwards-compatible migrations</a>.</p>
<p>Don't invent problems for yourself though, keep your process simple if you can.</p>Fix long running tasks in Django views2020-04-02T12:00:00+11:002020-04-02T12:00:00+11:00Matthew Segaltag:mattsegal.dev,2020-04-02:/offline-tasks.html<p>What do you do if you have a Django view that runs too slow? Slow views are a bad user experience. Users hate waiting. Even worse, if the view takes too long to return a response, they will receive a "408 Request Timeout" error, completely ruining the website experience.</p>
<p>Sometimes …</p><p>What do you do if you have a Django view that runs too slow? Slow views are a bad user experience. Users hate waiting. Even worse, if the view takes too long to return a response, they will receive a "408 Request Timeout" error, completely ruining the website experience.</p>
<p>Sometimes you can fine tune your code and improve the performance enough to fix the slow runtime, but sometimes there's nothing you can do to make it faster. What do you do when your code looks like this?</p>
<div class="highlight"><pre><span></span><code><span class="c1"># views.py</span>
<span class="k">def</span> <span class="nf">my_slow_view</span><span class="p">(</span><span class="n">request</span><span class="p">):</span>
    <span class="sd">"""</span>
<span class="sd">    Performs a long running task for the user (slow response time).</span>
<span class="sd">    """</span>
    <span class="n">long_running_task</span><span class="p">(</span><span class="n">request</span><span class="o">.</span><span class="n">user</span><span class="p">)</span> <span class="c1"># Takes 30s</span>
    <span class="k">return</span> <span class="n">HttpResponse</span><span class="p">(</span><span class="s2">"Your task is finished!"</span><span class="p">)</span>
</code></pre></div>
<p>This kind of situation can happen when you have to:</p>
<ul>
<li>call out to an external API which is slow</li>
<li>do some computationally expensive data crunching</li>
<li>make some slow database queries</li>
<li>any combination of the above</li>
</ul>
<p>So how do you fix this problem? You can't make your <code>long_running_task</code> any faster - that's out of your control, so what can you do? The solution is to push the execution of your long running task <em>somewhere else</em>.</p>
<h3>Somewhere else?</h3>
<p>In Django, when your view function runs, everything is happening on one thread. That is to say, each line of code has to run one after the other. We want to push our long running code into a different thread so that the view doesn't have to wait for our task to finish before it can return a response. We want to do something like this:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># views.py</span>
<span class="kn">from</span> <span class="nn">django.http</span> <span class="kn">import</span> <span class="n">HttpResponse</span>


<span class="k">def</span> <span class="nf">my_fast_view</span><span class="p">(</span><span class="n">request</span><span class="p">):</span>
    <span class="sd">"""</span>
<span class="sd">    Performs a long running task for the user (quick response time).</span>
<span class="sd">    """</span>
    <span class="n">run_offline</span><span class="p">(</span><span class="n">long_running_task</span><span class="p">,</span> <span class="n">request</span><span class="o">.</span><span class="n">user</span><span class="p">)</span>  <span class="c1"># Takes 0.01s</span>
    <span class="c1"># ... runs for 30s somewhere else.</span>
    <span class="k">return</span> <span class="n">HttpResponse</span><span class="p">(</span><span class="s2">"Your task will be finished soon!"</span><span class="p">)</span>
</code></pre></div>
<p>This is a common problem and the Django ecosystem has a lot of tools that provide this functionality. There's <a href="http://www.celeryproject.org/">Celery</a>, <a href="https://huey.readthedocs.io/en/latest/django.html">Huey</a> and <a href="https://github.com/rq/django-rq">Django Redis Queue</a>. For most projects I recommend using <a href="https://django-q.readthedocs.io/en/latest/">Django Q</a>, for the reasons outlined in <a href="https://mattsegal.dev/simple-scheduled-tasks.html">this post</a>.</p>
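<p>To make the "somewhere else" idea concrete, here is a toy version of <code>run_offline</code> built on a thread pool from Python's standard library. This is only a sketch: <code>run_offline</code> and <code>long_running_task</code> are made-up names, and this is not how Django Q works internally (it uses separate worker processes and a broker). It just shows that dispatching work can be much faster than doing it.</p>

```python
import time
from concurrent.futures import ThreadPoolExecutor

# A shared pool of background threads (hypothetical helper for illustration;
# a real task queue like Django Q uses separate worker processes instead).
_executor = ThreadPoolExecutor(max_workers=2)


def run_offline(func, *args):
    """Schedule func(*args) on a background thread and return immediately."""
    return _executor.submit(func, *args)


def long_running_task(username):
    time.sleep(0.2)  # Stand-in for 30s of slow work
    return f"finished task for {username}"


start = time.monotonic()
future = run_offline(long_running_task, "matt")
dispatch_time = time.monotonic() - start  # Small: we did not wait for the task

# We only block here to show that the task really did run in the background.
result = future.result()
```

<p>An in-process thread pool like this loses any queued work when the web server restarts, which is one reason to reach for a proper task queue in a real project.</p>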
<h3>Setting up Django Q</h3>
<p>To get started you need Django Q set up. You can skip past this section to the worked example below and do this later.</p>
<p>The first thing to do is install the Django Q package alongside Django:</p>
<div class="highlight"><pre><span></span><code>pip install django-q
</code></pre></div>
<h4>Configure settings</h4>
<p>Then we need to adjust our Django settings so that Django knows that it should use the Django Q app. We also need to configure Django Q to use the database as the task broker.</p>
<div class="highlight"><pre><span></span><code><span class="c1"># shop/settings.py</span>
<span class="c1"># Add Django-Q to your installed apps.</span>
<span class="n">INSTALLED_APPS</span> <span class="o">=</span> <span class="p">[</span>
<span class="c1"># ...</span>
<span class="s1">'django_q'</span>
<span class="p">]</span>
<span class="c1"># Configure your Q cluster</span>
<span class="c1"># More details https://django-q.readthedocs.io/en/latest/configure.html</span>
<span class="n">Q_CLUSTER</span> <span class="o">=</span> <span class="p">{</span>
<span class="s2">"name"</span><span class="p">:</span> <span class="s2">"shop"</span><span class="p">,</span>
<span class="s2">"orm"</span><span class="p">:</span> <span class="s2">"default"</span><span class="p">,</span> <span class="c1"># Use Django's ORM + database for broker</span>
<span class="p">}</span>
</code></pre></div>
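<p>If you want a bit more control, Django Q's configuration also accepts options such as <code>workers</code>, <code>timeout</code> and <code>retry</code>. The values below are assumptions picked for illustration, not recommendations, so check the configuration docs linked above before copying them. The one rule of thumb worth keeping is that <code>retry</code> should be larger than <code>timeout</code>, otherwise a slow task can be handed out again while it is still running.</p>

```python
# shop/settings.py (sketch with assumed values)
Q_CLUSTER = {
    "name": "shop",
    "orm": "default",  # Use Django's ORM + database for broker
    "workers": 2,  # Number of worker processes (assumed value)
    "timeout": 60,  # Seconds a task may run before it is stopped (assumed value)
    "retry": 120,  # Seconds before an unfinished task is presented again (assumed value)
}
```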
<h4>Apply migrations</h4>
<p>Once this is done, we need to run our database migrations to create the tables that Django Q needs:</p>
<div class="highlight"><pre><span></span><code>./manage.py migrate
</code></pre></div>
<h4>Run the task process</h4>
<p>Finally, we need to run the Django Q process. This is the "somewhere else" where our long-running tasks will execute. If you don't run the qcluster management command, your offline tasks will never run. To get this process started, open a new terminal window and start the Django Q cluster via the Django management script:</p>
<div class="highlight"><pre><span></span><code>./manage.py qcluster
</code></pre></div>
<h3>Worked example</h3>
<p>Imagine you run a stock-trading website. Your user owns a bunch of stocks - like 60 different stocks. Sometimes they want to click a button to refresh all their stocks so they can see the latest prices. The problem is that you need to hit a 3rd party API to get the new prices. Say each API call takes 500ms, that's 30s of waiting!</p>
<h4>Slow version</h4>
<p>Consider the following Stock model:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># models.py</span>
<span class="kn">from</span> <span class="nn">django.contrib.auth.models</span> <span class="kn">import</span> <span class="n">User</span>
<span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">models</span>


<span class="k">class</span> <span class="nc">Stock</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">code</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">16</span><span class="p">)</span>
    <span class="n">price</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">DecimalField</span><span class="p">(</span><span class="n">decimal_places</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">max_digits</span><span class="o">=</span><span class="mi">7</span><span class="p">)</span>
    <span class="n">user</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ForeignKey</span><span class="p">(</span><span class="n">User</span><span class="p">,</span> <span class="n">on_delete</span><span class="o">=</span><span class="n">models</span><span class="o">.</span><span class="n">CASCADE</span><span class="p">)</span>
</code></pre></div>
<p>... and this slow view:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># views.py</span>
<span class="kn">from</span> <span class="nn">django.shortcuts</span> <span class="kn">import</span> <span class="n">render</span>
<span class="kn">from</span> <span class="nn">.models</span> <span class="kn">import</span> <span class="n">Stock</span>


<span class="k">def</span> <span class="nf">refresh_stocks_view</span><span class="p">(</span><span class="n">request</span><span class="p">):</span>
    <span class="sd">"""</span>
<span class="sd">    Refreshes a user's stocks (slow version)</span>
<span class="sd">    """</span>
    <span class="n">stocks</span> <span class="o">=</span> <span class="n">Stock</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">user</span><span class="o">=</span><span class="n">request</span><span class="o">.</span><span class="n">user</span><span class="p">)</span>
    <span class="c1"># Go through all stocks and update prices, takes at least 30s</span>
    <span class="k">for</span> <span class="n">stock</span> <span class="ow">in</span> <span class="n">stocks</span><span class="p">:</span>
        <span class="n">stock</span><span class="o">.</span><span class="n">price</span> <span class="o">=</span> <span class="n">some_api</span><span class="o">.</span><span class="n">fetch_price</span><span class="p">(</span><span class="n">stock</span><span class="o">.</span><span class="n">code</span><span class="p">)</span>
        <span class="n">stock</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>
    <span class="k">return</span> <span class="n">render</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="s1">'stocks.html'</span><span class="p">,</span> <span class="p">{</span><span class="s1">'stocks'</span><span class="p">:</span> <span class="n">stocks</span><span class="p">})</span>
</code></pre></div>
<h4>Fast offline version</h4>
<p>We can start by moving the slow code into a task function, which will run inside of Django Q. By convention, I like to put these into a <code>tasks.py</code> module:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># tasks.py</span>
<span class="kn">from</span> <span class="nn">.models</span> <span class="kn">import</span> <span class="n">Stock</span>


<span class="k">def</span> <span class="nf">refresh_stocks_task</span><span class="p">(</span><span class="n">stock_ids</span><span class="p">):</span>
    <span class="sd">"""</span>
<span class="sd">    Refreshes all stocks in `stock_ids`, a list of ids.</span>
<span class="sd">    """</span>
    <span class="n">stocks</span> <span class="o">=</span> <span class="n">Stock</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">id__in</span><span class="o">=</span><span class="n">stock_ids</span><span class="p">)</span>
    <span class="c1"># Go through all stocks and update prices, takes at least 30s</span>
    <span class="k">for</span> <span class="n">stock</span> <span class="ow">in</span> <span class="n">stocks</span><span class="p">:</span>
        <span class="n">stock</span><span class="o">.</span><span class="n">price</span> <span class="o">=</span> <span class="n">some_api</span><span class="o">.</span><span class="n">fetch_price</span><span class="p">(</span><span class="n">stock</span><span class="o">.</span><span class="n">code</span><span class="p">)</span>
        <span class="n">stock</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>
</code></pre></div>
<p>Note that the task function takes a list of ids (<code>stock_ids</code>) - why not a list of Stock objects? The reason is that when Django Q stores the task in the database, waiting for execution, the task arguments have to be serialized. Plain values like a list of integer ids serialize cleanly and keep the stored task small. Passing ids also means the task loads each Stock fresh from the database when it runs, rather than working with a copy that may be out of date by then.</p>
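<p>Here's a rough sketch of why ids travel well through a task queue. It uses <code>pickle</code> and a plain dict as assumed stand-ins for the queue's serialization and for the database, and the names <code>enqueue</code>, <code>get_prices</code> and <code>STOCK_TABLE</code> are made up for this example. The point is that the task looks up the current data when it finally runs.</p>

```python
import pickle

# Stand-in for the database table: id -> current price
STOCK_TABLE = {1: "10.00", 2: "20.00"}


def enqueue(func_name, *args):
    """Serialize a task call to bytes, roughly as a broker might store it."""
    return pickle.dumps((func_name, args))


def get_prices(stock_ids):
    """Task body: look up the *current* rows for the given ids."""
    return [STOCK_TABLE[stock_id] for stock_id in stock_ids]


payload = enqueue("get_prices", [1, 2])

# The row changes while the task is sitting in the queue.
STOCK_TABLE[2] = "25.00"

# Later, a worker deserializes the payload and runs the task.
func_name, args = pickle.loads(payload)
prices = get_prices(*args)
```

<p>Because only the ids were stored, the worker sees the updated price for stock 2 instead of whatever value existed when the task was queued.</p>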
<p>Now that we've created the task function, we just need to call it from our view:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># views.py</span>
<span class="kn">from</span> <span class="nn">django.shortcuts</span> <span class="kn">import</span> <span class="n">render</span>
<span class="kn">from</span> <span class="nn">django_q.tasks</span> <span class="kn">import</span> <span class="n">async_task</span>
<span class="kn">from</span> <span class="nn">.models</span> <span class="kn">import</span> <span class="n">Stock</span>
<span class="kn">from</span> <span class="nn">.tasks</span> <span class="kn">import</span> <span class="n">refresh_stocks_task</span>


<span class="k">def</span> <span class="nf">refresh_stocks_view</span><span class="p">(</span><span class="n">request</span><span class="p">):</span>
    <span class="sd">"""</span>
<span class="sd">    Refreshes a user's stocks (fast version)</span>
<span class="sd">    """</span>
    <span class="n">stocks</span> <span class="o">=</span> <span class="n">Stock</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">user</span><span class="o">=</span><span class="n">request</span><span class="o">.</span><span class="n">user</span><span class="p">)</span>
    <span class="c1"># Evaluate to a plain list of ids so the task gets simple arguments</span>
    <span class="n">stock_ids</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">stocks</span><span class="o">.</span><span class="n">values_list</span><span class="p">(</span><span class="s1">'id'</span><span class="p">,</span> <span class="n">flat</span><span class="o">=</span><span class="kc">True</span><span class="p">))</span>
    <span class="c1"># Dispatch task to Django Q - returns in under 1s</span>
    <span class="n">async_task</span><span class="p">(</span><span class="n">refresh_stocks_task</span><span class="p">,</span> <span class="n">stock_ids</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">render</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="s1">'stocks.html'</span><span class="p">,</span> <span class="p">{</span><span class="s1">'stocks'</span><span class="p">:</span> <span class="n">stocks</span><span class="p">})</span>
</code></pre></div>
<p>That's basically it, but there's one level of complexity we can add for a better user experience.</p>
<h4>Loading state</h4>
<p>In the slow version, the user submits a request, waits 30s and eventually gets a response back with the new stock prices. In the fast version, the user gets a response back much faster, but their stock data isn't updated yet! They'll have to wait 30s and refresh the page to get the latest data, but there's no indication that anything happened. We can add a loading state to the Stock model to help the user understand what is going on:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># models.py</span>
<span class="k">class</span> <span class="nc">Stock</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">code</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">CharField</span><span class="p">(</span><span class="n">max_length</span><span class="o">=</span><span class="mi">16</span><span class="p">)</span>
    <span class="n">price</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">DecimalField</span><span class="p">(</span><span class="n">decimal_places</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">max_digits</span><span class="o">=</span><span class="mi">7</span><span class="p">)</span>
    <span class="n">user</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">ForeignKey</span><span class="p">(</span><span class="n">User</span><span class="p">,</span> <span class="n">on_delete</span><span class="o">=</span><span class="n">models</span><span class="o">.</span><span class="n">CASCADE</span><span class="p">)</span>
    <span class="n">is_loading</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">BooleanField</span><span class="p">(</span><span class="n">default</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
</code></pre></div>
<p>Then in the view we can set all our pending Stocks to "loading":</p>
<div class="highlight"><pre><span></span><code><span class="c1"># views.py</span>
<span class="kn">from</span> <span class="nn">django.shortcuts</span> <span class="kn">import</span> <span class="n">render</span>
<span class="kn">from</span> <span class="nn">django_q.tasks</span> <span class="kn">import</span> <span class="n">async_task</span>
<span class="kn">from</span> <span class="nn">.models</span> <span class="kn">import</span> <span class="n">Stock</span>
<span class="kn">from</span> <span class="nn">.tasks</span> <span class="kn">import</span> <span class="n">refresh_stocks_task</span>


<span class="k">def</span> <span class="nf">refresh_stocks_view</span><span class="p">(</span><span class="n">request</span><span class="p">):</span>
    <span class="sd">"""</span>
<span class="sd">    Refreshes a user's stocks (fast version)</span>
<span class="sd">    """</span>
    <span class="n">stocks</span> <span class="o">=</span> <span class="n">Stock</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">user</span><span class="o">=</span><span class="n">request</span><span class="o">.</span><span class="n">user</span><span class="p">)</span>
    <span class="c1"># Evaluate to a plain list of ids so the task gets simple arguments</span>
    <span class="n">stock_ids</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">stocks</span><span class="o">.</span><span class="n">values_list</span><span class="p">(</span><span class="s1">'id'</span><span class="p">,</span> <span class="n">flat</span><span class="o">=</span><span class="kc">True</span><span class="p">))</span>
    <span class="c1"># Mark every stock as loading before the task starts</span>
    <span class="n">stocks</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">is_loading</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
    <span class="c1"># Dispatch task to Django Q - returns in under 1s</span>
    <span class="n">async_task</span><span class="p">(</span><span class="n">refresh_stocks_task</span><span class="p">,</span> <span class="n">stock_ids</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">render</span><span class="p">(</span><span class="n">request</span><span class="p">,</span> <span class="s1">'stocks.html'</span><span class="p">,</span> <span class="p">{</span><span class="s1">'stocks'</span><span class="p">:</span> <span class="n">stocks</span><span class="p">})</span>
</code></pre></div>
<p>Finally, we can set the Stock state back to "not loading" when the new price is fetched:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># tasks.py</span>
<span class="kn">from</span> <span class="nn">.models</span> <span class="kn">import</span> <span class="n">Stock</span>


<span class="k">def</span> <span class="nf">refresh_stocks_task</span><span class="p">(</span><span class="n">stock_ids</span><span class="p">):</span>
    <span class="sd">"""</span>
<span class="sd">    Refreshes all stocks in `stock_ids`, a list of ids.</span>
<span class="sd">    """</span>
    <span class="n">stocks</span> <span class="o">=</span> <span class="n">Stock</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">id__in</span><span class="o">=</span><span class="n">stock_ids</span><span class="p">)</span>
    <span class="c1"># Go through all stocks and update prices, takes at least 30s</span>
    <span class="k">for</span> <span class="n">stock</span> <span class="ow">in</span> <span class="n">stocks</span><span class="p">:</span>
        <span class="n">stock</span><span class="o">.</span><span class="n">price</span> <span class="o">=</span> <span class="n">some_api</span><span class="o">.</span><span class="n">fetch_price</span><span class="p">(</span><span class="n">stock</span><span class="o">.</span><span class="n">code</span><span class="p">)</span>
        <span class="n">stock</span><span class="o">.</span><span class="n">is_loading</span> <span class="o">=</span> <span class="kc">False</span>
        <span class="n">stock</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>
</code></pre></div>
<p>Now the user will request a refresh, see that all of their stocks are loading, and when the new prices have been set the user will see them once they refresh the page again.</p>
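<p>If it helps to see the whole loading lifecycle in one place, here is a small in-memory sketch with no Django involved. <code>StockRow</code>, <code>start_refresh</code>, <code>finish_refresh</code> and <code>still_loading</code> are hypothetical names standing in for the model, the view, the task and a template check respectively.</p>

```python
from dataclasses import dataclass


@dataclass
class StockRow:
    """In-memory stand-in for a row of the Stock model."""

    code: str
    price: float
    is_loading: bool = False


def start_refresh(stocks):
    """What the view does: flag every stock as loading."""
    for stock in stocks:
        stock.is_loading = True


def finish_refresh(stock, new_price):
    """What the task does for each stock once the new price arrives."""
    stock.price = new_price
    stock.is_loading = False


def still_loading(stocks):
    """What a page could check to decide whether to show a spinner."""
    return any(stock.is_loading for stock in stocks)


stocks = [StockRow("AAA", 1.0), StockRow("BBB", 2.0)]

start_refresh(stocks)
loading_after_request = still_loading(stocks)

finish_refresh(stocks[0], 1.5)
loading_midway = still_loading(stocks)

finish_refresh(stocks[1], 2.5)
loading_at_end = still_loading(stocks)
```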
<p>That's it, hopefully you can now get started doing offline processing in Django. Enjoy!</p>Simple scheduled tasks with Django Q2020-03-30T12:00:00+11:002020-03-30T12:00:00+11:00Matthew Segaltag:mattsegal.dev,2020-03-30:/simple-scheduled-tasks.html<p>How do you run some code once a day in Django, or every hour? This post will explain how to set up scheduled code execution in Django using Django-Q.</p>
<p>There are a lot of reasons you might want to run code on a schedule. You may want to:</p>
<ul>
<li>Process a batch of data every night</li>
<li>Send out a bunch of emails once a week</li>
<li>Regularly scrape a website and store the results in the database</li>
</ul>
<p>If you're running a backend web service, you will need to do something like this eventually.</p>
<p>When you ask around online for help with setting up a scheduler in Django, people will often point you to <a href="http://www.celeryproject.org/">Celery</a>. If you look at Celery's website:</p>
<blockquote>
<p>Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.</p>
</blockquote>
<p>Asynchronous what? Distributed? Sounds complicated. Do I need that? Celery is intimidating for beginners, and it happens to be a pain in the ass to set up. If you happen to need Celery, then it's well worth the effort, but I believe that it's overkill for most people.</p>
<p>The biggest stumbling block is that Celery requires that you set up some kind of "<a href="http://docs.celeryproject.org/en/latest/getting-started/brokers/">broker</a>", which is a program that keeps track of all the tasks that need to be done. You will need to install and run a program like <a href="https://redis.io/">Redis</a> or <a href="https://www.rabbitmq.com/">RabbitMQ</a> to run Celery, which makes getting started more complicated, and gives you more infrastructure to worry about.</p>
<p>I think the best solution for beginners is <a href="https://django-q.readthedocs.io/en/latest/">Django-Q</a>. It's simpler to set up and run in production than Celery, and it is perfectly fine for basic scheduling tasks. Django-Q can use just your existing database as a broker, which means you don't have to set up any new infrastructure. If you find that you need to use a different broker later on, then you can swap out the database for something else.</p>
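<p>If the idea of "the database as a broker" feels abstract, here is a toy version using SQLite from the standard library. This is a sketch of the concept only, not Django-Q's actual implementation: the <code>task_queue</code> table and the <code>enqueue</code> and <code>work_once</code> functions are invented for this example. A broker is really just somewhere to write down "please run this function with these arguments" until a worker picks it up.</p>

```python
import json
import sqlite3

# An in-memory database standing in for your existing app database.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE task_queue (id INTEGER PRIMARY KEY, func TEXT, args TEXT)"
)


def greet(name):
    return f"hello {name}"


# Registry of functions that a worker is allowed to run, looked up by name.
TASKS = {"greet": greet}


def enqueue(func_name, *args):
    """Web process: record the task in the database and return right away."""
    db.execute(
        "INSERT INTO task_queue (func, args) VALUES (?, ?)",
        (func_name, json.dumps(args)),
    )
    db.commit()


def work_once():
    """Worker process: take the oldest task, remove it, and run it."""
    row = db.execute(
        "SELECT id, func, args FROM task_queue ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None  # Nothing to do
    task_id, func_name, args = row
    db.execute("DELETE FROM task_queue WHERE id = ?", (task_id,))
    db.commit()
    return TASKS[func_name](*json.loads(args))


enqueue("greet", "matt")
first = work_once()
second = work_once()
```

<p>A real broker also has to cope with several workers at once, failed tasks and retries, which is exactly the work you are handing off to Django-Q.</p>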
<h2>Example project</h2>
<p>The <a href="https://django-q.readthedocs.io/en/latest/install.html">Django-Q installation docs</a> are reasonably good, but if you're new to programming you might struggle to put all the pieces together. I've created a worked example to try to give you the full picture. You can check out the full code on <a href="https://github.com/MattSegal/devblog-examples/tree/master/django-q-scheduling-example">GitHub</a>.</p>
<p>Let's say I have a Django app that is an online store which has a Discount model. This model keeps track of:</p>
<ul>
<li>when it was created (<code>created_at</code>)</li>
<li>the amount that should be discounted (<code>amount</code>)</li>
</ul>
<div class="highlight"><pre><span></span><code><span class="c1"># discounts/models.py</span>
<span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">models</span>
<span class="kn">from</span> <span class="nn">django.utils</span> <span class="kn">import</span> <span class="n">timezone</span>


<span class="k">class</span> <span class="nc">Discount</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="n">created_at</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">DateTimeField</span><span class="p">(</span><span class="n">default</span><span class="o">=</span><span class="n">timezone</span><span class="o">.</span><span class="n">now</span><span class="p">)</span>
    <span class="n">amount</span> <span class="o">=</span> <span class="n">models</span><span class="o">.</span><span class="n">IntegerField</span><span class="p">()</span>
</code></pre></div>
<p>And let's say that every minute I want to delete every discount that is older than a minute. It's a silly thing to do, but this is just a learning example. So how do we set up Django-Q to do this?</p>
<h2>Install the package</h2>
<p>First thing to do is install the Django-Q package:</p>
<div class="highlight"><pre><span></span><code>pip install django-q
</code></pre></div>
<h2>Configure settings</h2>
<p>Then we need to adjust our Django settings so that Django knows that it should use the Django-Q app. We also need to configure Django-Q to use the database as the task broker.</p>
<div class="highlight"><pre><span></span><code><span class="c1"># shop/settings.py</span>
<span class="c1"># Add Django-Q to your installed apps.</span>
<span class="n">INSTALLED_APPS</span> <span class="o">=</span> <span class="p">[</span>
<span class="c1"># ...</span>
<span class="s1">'django_q'</span>
<span class="p">]</span>
<span class="c1"># Configure your Q cluster</span>
<span class="c1"># More details https://django-q.readthedocs.io/en/latest/configure.html</span>
<span class="n">Q_CLUSTER</span> <span class="o">=</span> <span class="p">{</span>
<span class="s2">"name"</span><span class="p">:</span> <span class="s2">"shop"</span><span class="p">,</span>
<span class="s2">"orm"</span><span class="p">:</span> <span class="s2">"default"</span><span class="p">,</span> <span class="c1"># Use Django's ORM + database for broker</span>
<span class="p">}</span>
</code></pre></div>
<h2>Apply migrations</h2>
<p>Once this is done, we need to run our database migrations to create the tables that Django-Q needs:</p>
<div class="highlight"><pre><span></span><code>./manage.py migrate
</code></pre></div>
<h2>Create a task</h2>
<p>Next we need to create the task function that will be called every minute. I've decided to put mine in a <code>tasks.py</code> module. You can see below that there's nothing special about this - just a plain old Python function.</p>
<div class="highlight"><pre><span></span><code><span class="c1"># discounts/tasks.py</span>
<span class="kn">from</span> <span class="nn">datetime</span> <span class="kn">import</span> <span class="n">timedelta</span>

<span class="kn">from</span> <span class="nn">django.utils</span> <span class="kn">import</span> <span class="n">timezone</span>

<span class="kn">from</span> <span class="nn">.models</span> <span class="kn">import</span> <span class="n">Discount</span>


<span class="k">def</span> <span class="nf">delete_expired_discounts</span><span class="p">():</span>
    <span class="sd">"""</span>
<span class="sd">    Deletes all Discounts that are more than a minute old</span>
<span class="sd">    """</span>
    <span class="n">one_minute_ago</span> <span class="o">=</span> <span class="n">timezone</span><span class="o">.</span><span class="n">now</span><span class="p">()</span> <span class="o">-</span> <span class="n">timedelta</span><span class="p">(</span><span class="n">minutes</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
    <span class="n">expired_discounts</span> <span class="o">=</span> <span class="n">Discount</span><span class="o">.</span><span class="n">objects</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span>
        <span class="n">created_at__lte</span><span class="o">=</span><span class="n">one_minute_ago</span>
    <span class="p">)</span>
    <span class="n">expired_discounts</span><span class="o">.</span><span class="n">delete</span><span class="p">()</span>
</code></pre></div>
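<p>The only real logic in that task is the cutoff comparison, so here it is on its own using nothing but the standard library. <code>is_expired</code> is a hypothetical helper that mirrors what the <code>created_at__lte</code> filter selects: anything created at or before "one minute ago".</p>

```python
from datetime import datetime, timedelta, timezone


def is_expired(created_at, now, max_age=timedelta(minutes=1)):
    """True if created_at is at or before the cutoff (now - max_age)."""
    return created_at <= now - max_age


# A fixed "now" keeps the example deterministic.
now = datetime(2020, 3, 30, 12, 0, 0, tzinfo=timezone.utc)

fresh = now - timedelta(seconds=30)  # Created 30 seconds ago
stale = now - timedelta(minutes=5)  # Created 5 minutes ago
boundary = now - timedelta(minutes=1)  # Created exactly one minute ago

fresh_expired = is_expired(fresh, now)
stale_expired = is_expired(stale, now)
boundary_expired = is_expired(boundary, now)
```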
<h2>Create a schedule</h2>
<p>Now that we have a task ready to run, we need to add a scheduled task to the database. We can do this on the admin site at <code>/admin/django_q/schedule/add/</code>, or we can create and save a Schedule instance (<a href="https://django-q.readthedocs.io/en/latest/schedules.html">docs here</a>) using the Django shell:</p>
<div class="highlight"><pre><span></span><code>./manage.py shell
from django_q.models import Schedule
<span class="c1"># Run the task every minute, repeating forever (repeats=-1)</span>
Schedule.objects.create<span class="o">(</span>
    <span class="nv">func</span><span class="o">=</span><span class="s1">'discounts.tasks.delete_expired_discounts'</span>,
    <span class="nv">schedule_type</span><span class="o">=</span>Schedule.MINUTES,
    <span class="nv">minutes</span><span class="o">=</span><span class="m">1</span>,
    <span class="nv">repeats</span><span class="o">=</span>-1
<span class="o">)</span>
</code></pre></div>
<h2>Run the scheduler</h2>
<p>Finally, we need to run the Django-Q process. When using Django, you will usually have one process that is responsible for serving web requests and a separate one that takes care of processing tasks. During local development, these two processes are:</p>
<ul>
<li>web requests: <code>./manage.py runserver</code></li>
<li>async tasks: <code>./manage.py qcluster</code></li>
</ul>
<p>So if you don't run the qcluster management command, the scheduled task will never run. To get this process started, open a new terminal window and start the Django-Q cluster via the Django management script:</p>
<div class="highlight"><pre><span></span><code>./manage.py qcluster
</code></pre></div>
<p>Now you should see your scheduled task processing in the console output:</p>
<div class="highlight"><pre><span></span><code>12:54:18 [Q] INFO Process-1 created a task from schedule [2]
</code></pre></div>
<p>You can also see what's going on in the Django admin site at <code>/admin/django_q/</code>.</p>
<p>...and that's it! You can now run scheduled tasks in Django.</p>