How this blog is built

Published: 2018-10-13 19:36:59 -0700

I’m pretty happy with how this blog has turned out. I wanted to explain how I built it. Maybe you’ll find my approach interesting :-)

The Markdown and Slim content

The first step of my blog system is generating the static HTML content. I write most of the blog content in Markdown and Slim templates. It looks kinda like this:

h1.mb-0
  = meta.title

small.text-muted
  = "Published: " + meta.publication_time.to_s

markdown:
  I'm pretty happy with how this blog has turned out. I wanted to
  explain how I built it. Maybe you'll find my approach interesting :-)

  ## The Markdown and Slim content

  The first step of my blog system is generating the static HTML
  content. I write most of the blog content in
  [Markdown](https://en.wikipedia.org/wiki/Markdown) and
  [Slim](http://slim-lang.com/) templates. It looks kinda like this:

  ```markdown
  h1.mb-0
    = meta.title

  small.text-muted
    = "Published: " + meta.publication_time.to_s

  markdown:
    I'm pretty happy with how this blog has turned out. I wanted to
    explain how I built it. Maybe you'll find my approach interesting :-)

Did I just inception you? You can see the whole thing here.

Ruby scripts to generate HTML

I have a bunch of Ruby scripts that will turn this Slim/Markdown content into the HTML you see here before you. I won’t bore you with all the details of these scripts. They’re checked into my scripts/ directory: you can read them if you like.

Here’s a summary though:

A typical entry directory: entries/2018-10-13
- The .slim file is the content.
- The .yaml is metadata about the blog entry: the title, the publication date, et cetera.
src/entry.rb
- This does all the work of rendering an entry.
- The Entry#full_content method reads and renders the default entry template.
- It calls the Entry#content method to render the entry content itself.
- The final HTML content is then written into a file like dist/2018-10-13/how-this-blog-is-built.html.
scripts/build_entries.rb is the simple script that goes through each entry and builds the HTML for it.
scripts/serve.rb runs a simple WEBrick HTTP file server on localhost so I can see the content as I write it.

Watching for changes to rebuild

Here’s one part I’m proud of. I don’t want to manually run ./scripts/build_entries.rb every time I make a change to an entry’s Markdown content. I want to write Markdown in VSCode, save the .slim file, and then view it immediately in Firefox without having to explicitly go to the terminal to run ./scripts/build_entries.rb.

This is like in Rails or with Webpack, where you can change some code, then go to the browser and hit refresh. I want that. In the past, when I had a similar blog setup, I didn’t have this functionality. It was a major pain point.

To solve this problem I wrote a scripts/watch.rb script. Here it is:

# ./scripts/watch.rb
#!/usr/bin/env ruby

require 'open3'

`./scripts/build_all.rb`
Open3.popen3("fswatch ./entries ./templates") do |stdin, stdout, stderr, status, thread|
  while changed_fname = stdout.gets
    puts "Changed: #{changed_fname}"
    `./scripts/build_all.rb`
  end
end

This script uses the open3 lib to start a terminal command called fswatch. You can give fswatch a list of directories, and it will watch for changes being made to any files in those directories. It will print out the name of any file that changes. It will keep doing this until you kill fswatch.

The inner loop of my Ruby script constantly tries to read from the fswatch command. Whenever fswatch does detect a change and output a file name, the scripts/watch.rb script will invoke a rebuild of the blog entries.

This is somewhat wasteful, because I don’t rebuild only the specific entry that has been changed. I rebuild all the entries. At some point this will begin to feel too slow (as I accumulate more blog posts), but for now it is fine.

I’m super happy with this system. I can now simply type up Markdown. Save. Command-tab to Firefox and refresh. Review my blog entry. Repeat.

Assets

That templates/default.slim template shows all the tricks and libraries I use.

I use Bootstrap 4.
I use Google fonts (EB Garamond for now).
I use highlight.js to highlight any code blocks on the page.
I use MathJax to render equations like this cool one: e^πi = − 1.
I have a few custom CSS styles in assets/styles.css.

I do have one last trick, which is the JavaScript I wrote to let you write me comments. I’ll talk about that later!

Github Pages

I’ve shown you a built entry file: dist/2018-10-13/how-this-blog-is-built.html. But how are you reaching this HTML file right now?

I use Github Pages. This makes it really easy to host static content like this blog. I simply check in all the dist/ files into my repo (along with everything else). I push to the github repository.

In the “Settings” tab for the repository, you can select to publish your repository as a Github page. That means that if dist/2018-10-13/how-this-blog-is-built.html is a file in my repository, then you can access it at ruggeri.github.io/dist/2018-10-13/how-this-blog-is-built.html.

Github will serve any kind of file checked into the github repo. So it will serve my stylesheets and JavaScript too. You can even access the original Slim templates at ruggeri.github.io/entries/2018-10-13/how-this-blog-is-built.slim if you really wanted to for some reason.

The principle is simple: if it’s checked into the Githbub repository, it’s being served by a Github HTTP server.

Setting up a custom CNAME

This website is blog.self-loop.com. Not ruggeri.github.io.

If you try to access any of the content on ruggeri.github.io it will push you to the same path but at the URL blog.self-loop.com. I own this domain name.

I use AWS Route 53 as my domain name host. They have a pretty convenient interface for setting up DNS records. I use a lot of AWS services.

I setup a CNAME record to tell anyone on the internet that blog.self-loop.com is a synonym for ruggeri.github.io. So when your browser looks up the domain blog.self-loop.com, it is told that it should lookup the IP address for ruggeri.github.io, and use that as the IP address also for blog.self-loop.com.

Github runs the webserver that is responding to your HTTP request. It looks for the Host: blog.self-loop.com HTTP request header. If it sees this Host header, Github serves the content.

If instead you were to try to go directly to ruggeri.github.io, then the header would read Host: ruggeri.github.io. Github would see this and issue a redirect to the appropriate path on blog.self-loop.com.

This is nice of Github. I kinda expected that Github would treat both domains equally, because that seems simpler. It’s nice to know that everything will work fine if someone goes to ruggeri.github.io, AND that they will be told about the preferred domain name (blog.self-loop.com).

Special `CNAME` file: Domain name resolution!

Setup note: Github will want you to create a special CNAME file in the root of your repository so that it knows what domain name you will use for your website. You can see mine here. Why do you need this CNAME file?

Github probably is serving lots of Github Pages from the same server. That is: the IP address for ruggeri.github.io is not unique to me. Lots of people have Github Pages hosted by the same web server on the same IP address. Therefore, the Host header is necessary information for disambiguating whether a client is asking for user ruggeri’s Page, or for user markov’s page or whoever.

If someone makes a request with the host ruggeri.github.io, Github’s server knows this means the repository github.com/ruggeri/ruggeri.github.io.

But unless I tell Github that I am using blog.self-loop.com, it won’t know that people asking for Host: blog.self-loop.com want the github.com/ruggeri/ruggeri.github.io content.

Let’s work backward through the domain name resolution process.

The client is contacting Github’s server on Github’s IP address. They found this IP address because of an A record for ruggeri.github.io. (An A record maps a domain name to an IP address). The client looked up the A record because they had previously tried to resolve blog.self-loop.com. They had found the CNAME record that told the client to lookup ruggeri.github.io.

This is how the client learns the IP address to contact. But when the client contacts Github’s HTTP server, they don’t say anything about how they found Github’s IP address (via the ruggeri.github.io A record). The client only gives the Host: blog.self-loop.com header.

And this is why you need to tell Github about your domain name blog.self-loop.com. You can’t only create the CNAME record with AWS Route 53.

Next time: commenting

This post is getting long, so I’ll dedicate another post to the only dynamic part of this website: the comments. This is maybe the most exciting part. I’ll give you a preview:

I store the comments in AWS DynamoDB (a NoSQL, schemaless database).
I use AWS Lamdba to run some very simple Python code that fetches and stores comments from DynamoDB.
I use AWS API Gateway to create some HTTP endpoints that invoke the Lambda code.
I wrote some React code to fetch and display the comments. I use the Formik library for the React comment form.
Last, I Webpack to compile all my JavaScript code. The compiled JavaScript gets checked into Github like everything else.

The end!

Thanks for reading all this! I hope it was fun :-)

<< Back To The Posts Index! <<

self-loop