Bagatto, a New Static Site Generator

2020-12-10

The last few weeks I’ve been in a bit of a ferment of inspiration. The output of my fermentation has been Bagatto, a static site generator written in the Lisp Janet.

For the uninitiated: an SSG is a program for building websites. You maintain a bunch of source files (blog posts written in Markdown, for instance) and run your SSG to create a bunch of HTML out of them. You upload the HTML somewhere and that’s the website.

Here’s the source of this blog: https://git.sr.ht/~subsetpark/subsetpark

Background

Before it occurred to me to do this I was using LambdaPad, an SSG written in Erlang. The idea with LambdaPad is, you maintain an Erlang source file, index.erl, that defines your site in terms of a couple simple data structures. When you run LambdaPad, it will evaluate your index file and interpret the values it defines as instructions for generating your site. For the truly interested, you can view the LambdaPad version of this same site here.

This is a very appealing model to me, for two main reasons: 1) you don’t need to learn a new configuration language, 2) you do get to take advantage of a whole programming language in defining your site.

To take literally the first example that springs to mind, if we look at the quickstart for Hugo, a very popular program in Go for doing much the same thing, we see that one does not actually write much Go when defining a site. This sort of thing has always struck me as a bit odd. Hugo is identified very much with the language it’s written in (as SSGs generally are: Jekyll for Ruby, Pelican for Python, et cetera), and yet the process of building a site in Hugo involves writing to a config.toml file and running hugo new commands.

There’s nothing inherently wrong with this approach. Hugo is no doubt a robust and reliable program, and anyone who doesn’t know a programming language, or doesn’t want to do programming when they build their website, will benefit greatly from its design. I simply don’t fall into that camp. I hate writing config files, I like programming, but more importantly, I really like the fact that I can use a programming language to make it easier to write a website that does what I want.

The notes section of this website, for instance, is also statically generated from source Markdown. But it goes through a layer of processing in the journey from Markdown to HTML, with notes being collated and cross-referenced to other notes based on their contents. In Erlang, this is accomplished by running some regexps on the note text and replacing the content strings with other Markdown links. In other words, a little basic programming.
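To make "a little basic programming" concrete, here is a minimal sketch of that kind of cross-referencing pass. It is written in Python purely as a neutral illustration; the site's actual Erlang code is not shown here. The function name link_topics and the /notes/<slug>.html URL scheme are assumptions invented for the example.

```python
import re

def link_topics(text, topics, own_topic):
    """Replace mentions of other notes' topics with Markdown links.

    Hypothetical helper: the URL scheme (/notes/<slug>.html) is an
    assumption made for illustration only.
    """
    others = [t for t in topics if t != own_topic]
    if not others:
        return text
    # Try longer topics first so "static site" is preferred over "site".
    others.sort(key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in others) + r")\b",
        re.IGNORECASE,
    )

    def to_link(match):
        slug = match.group(1).lower().replace(" ", "-")
        return f"[{match.group(1)}](/notes/{slug}.html)"

    # A single pass, so text inside freshly created links is not re-linked.
    return pattern.sub(to_link, text)
```

Calling link_topics on the body of each note, with the list of all known topics, yields Markdown that can then be rendered to HTML like any other source file.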

Nevertheless, there were a few itches that had grown over my time using LambdaPad. What’s interesting is that they all derived from the same sort of thing I’m discussing above. Even though LambdaPad is much more code-oriented than config-oriented, it still relies on indirect action in a number of places. Each of these places offers a degree of convenience and simplicity, but eventually became a stumbling block.

LambdaPad, for instance, looks for a file called index.erl in the current directory. This is convenient because you can simply run lpad-gen in the current directory to build everything. Unfortunately, it presents an artificial constraint on organizing your code. If you want to manage your source files with rebar3 you’ll want to have your Erlang source in src/, which means rebar3 and LambdaPad are in conflict.

Similarly, we can see that the data specification in my LambdaPad index consists of lines like this:

notes => {markdown, "notes/*.md"},

That’s a key-value pair mapping an atom to a tuple of an atom and string. This is concise and easy to write. But it also doesn’t actually do anything; it’s just a couple literal terms. The actual mechanism of loading source files exists entirely inside of the LambdaPad package. I pass it a tagged value, giving the name of the data loader I want it to use, and everything else happens under the covers.

This came to a head when I was reading on Lobsters about AsciiDoc and wanted to see how much of a lift it would be to use that instead of Markdown. As it happens, I couldn’t, really. The output of the data/1 function in a LambdaPad module is not the source data to generate the site; it’s a DSL constituting instructions for generating that source data.

Principles for a new SSG

A philosophical orientation is beginning to form here. If we’re going to write yet another Static Site Generator—and god knows, there are so many of them—can we orient ourselves around a pervasive sense of transparency? That is, to the greatest extent, how can we expose to a site author a programming environment where the inputs and the outputs of the system as a whole, and of each individual step, are entirely inspectable, observable, and extensible?

Arguably, to do so, we need to develop some sort of theory of what the inputs and the outputs of an SSG are. So let’s say:

The inputs to an SSG are a heterogeneous collection of source data files. Many of these will be long-form posts or articles, but even those are a special case of a more general bag of attributes (like a JSON file containing the title and author’s name of your site).
The output of an SSG is a list of files to be generated. A file-to-be-generated consists of exactly two things: the path of the new file, and the contents of the new file. This is an important and pleasing simplification: in a website, there are no such things as posts, or pages, or indices. There are only (file path, file contents) pairs. If the domain model of your site can’t be reduced to one of those, it won’t exist in your website.
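The "only (file path, file contents) pairs" idea can be sketched in a few lines. This is an illustration in Python under my own assumptions, not Bagatto's API: the names render_site and write_site and the post dictionaries are hypothetical.

```python
from pathlib import Path

def render_site(posts):
    """Return the whole site as a list of (path, contents) pairs."""
    pairs = []
    for post in posts:
        html = f"<h1>{post['title']}</h1>\n<p>{post['body']}</p>\n"
        pairs.append((f"posts/{post['slug']}.html", html))
    # An "index" is not a special concept: it is just one more pair.
    links = "".join(
        f'<li><a href="posts/{p["slug"]}.html">{p["title"]}</a></li>\n'
        for p in posts
    )
    pairs.append(("index.html", f"<ul>\n{links}</ul>\n"))
    return pairs

def write_site(pairs, out_dir):
    """Write each (path, contents) pair under out_dir."""
    for rel_path, contents in pairs:
        target = Path(out_dir) / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(contents, encoding="utf-8")
```

Because render_site returns plain data, the entire output of the site can be inspected, diffed, or filtered before anything touches the file tree.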

Functions

At this point the way forward feels pretty clear. We have two phases—the input phase and the output phase—and we know what kind of data each one should contain. If I want to provide that data as a site author, in a way that allows me to conveniently inspect and arbitrarily transform what I’m providing, the simplest method is with ordinary functions.

This is the departure from the model I had been using in LambdaPad. Even though my index.erl is an Erlang file, if I were to open it in the Erlang shell and invoke my data/1 function, I’d just get some dead terms out. On the other hand, if the terms in my data/1 were functions which output the actual site data, then at any point I could run just those functions and see exactly what the outputs would be. And I could trivially wrap those functions in any other business logic to arbitrarily transform them as I needed.

The same principle goes for the output phase. If the return value of my site/1 function¹ is atom and tuple values constituting a configuration language, describing which built-in file-generation tools should be called and with what inputs, then the actual output of those tools is obscured from me until it shows up in my file tree. But passing in callable, wrappable functions means that there’s no part of the process that’s off-limits.

In other words: instead of providing a configuration language, we should provide a standard library. A collection of functions with a few well-defined signatures that can be easily composed and extended. And we should ensure that the signatures are transparent and simple enough that it’s trivial to write new ones that do new things.

As an example: we saw above how to specify input files and their parsers in LambdaPad: {markdown, "notes/*.md"}. We are locked out from adding an AsciiDoc parser.

On the other hand, if we imagine that the Markdown parser were a function, specified directly, that looked like this:

parse_markdown(Contents, Attributes) ->
  Metadata = read_front_matter(Contents),
  maps:merge(Attributes, Metadata).

And were specified like this:

notes => #{
  loader => load_glob("notes/*"),
  parser => fun parse_markdown/2
},

Then we could write a new AsciiDoc parser which ran Asciidoctor on Contents and merged the resulting metadata, and specify that directly as the value for parser.
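Here is a rough Python sketch of the same idea, to show how little machinery a swappable parser needs. Everything in it is an assumption made for illustration: the names load_glob, load_section, parse_markdown and parse_asciidoc are hypothetical, and the "parsers" only read a title line rather than running a real Markdown or AsciiDoc processor.

```python
import glob

def load_glob(pattern):
    """Return a loader: a zero-argument function yielding (path, contents)."""
    def loader():
        for path in sorted(glob.glob(pattern)):
            with open(path, encoding="utf-8") as f:
                yield path, f.read()
    return loader

def _split_title(contents, marker):
    """Split off a first-line title that starts with `marker`, if present."""
    first, _, rest = contents.partition("\n")
    title = first[len(marker):].strip() if first.startswith(marker) else None
    return title, rest.lstrip("\n")

# Every parser has the same signature: (contents, attributes) -> attributes.
def parse_markdown(contents, attributes):
    title, body = _split_title(contents, "# ")   # Markdown-style title
    return {**attributes, "title": title, "body": body}

def parse_asciidoc(contents, attributes):
    title, body = _split_title(contents, "= ")   # AsciiDoc-style title
    return {**attributes, "title": title, "body": body}

def load_section(spec):
    """Run a section's loader and apply its parser to every file."""
    return [spec["parser"](contents, {"path": path})
            for path, contents in spec["loader"]()]

# Swapping formats means swapping one value in the spec.
notes = {"loader": load_glob("notes/*.adoc"), "parser": parse_asciidoc}
```

Since the loader and the parser are ordinary values, either can be called on its own to see exactly what it produces.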

Bagatto

This is the basic approach I’ve taken when building Bagatto: a new Static Site Generator written in Janet, which interprets a Janet file much in the same way that LambdaPad interprets Erlang.

For instance, here’s the equivalent specification for the Notes section of this blog:

:notes {
  :src (bagatto/slurp-* "notes/*.md")
  :attrs notes/parse-note
  :transform (bagatto/attr-sorter "topic")
}

notes/parse-note is a function. bagatto/slurp-* and bagatto/attr-sorter are both provided as part of the Bagatto “standard library”, but they are themselves higher-order functions which return other functions. So any of these values can be directly evaluated and inspected in a REPL, or wrapped to transform their output.
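The "function that returns a function" pattern, and what wrapping one looks like, can be sketched as follows. This is a Python analogy under stated assumptions, not Bagatto's implementation: attr_sorter here only mirrors the name of bagatto/attr-sorter, its exact signature in Bagatto is not something this sketch claims to know, and only_published is a hypothetical wrapper.

```python
def attr_sorter(key, reverse=False):
    """Return a transform that sorts a list of items by item[key]."""
    def sorter(items):
        return sorted(items, key=lambda item: item[key], reverse=reverse)
    return sorter

def only_published(transform):
    """Wrap any transform so that drafts are dropped before it runs."""
    def wrapped(items):
        return transform([i for i in items if not i.get("draft")])
    return wrapped

by_topic = attr_sorter("topic")                 # inspectable on its own
published_by_topic = only_published(by_topic)   # extended by wrapping
```

Both by_topic and published_by_topic are plain callables, so trying either one on a handful of sample notes in an interactive session is enough to see what it will do to the real site.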

On Lisp

I find Janet to be tremendously well-suited to the task at hand. There are a couple reasons. Maybe the simplest is speed: unlike Erlang, there’s no heavy VM or runtime to load, so Bagatto starts up very quickly. Important for a command-line application.

But Janet is also a Lisp, and Lisps tend to be very good at interpreting themselves without having to do too much sleight-of-hand. So it’s a natural fit for any application model where you write a program to make it run, and the application itself becomes a slightly specialized interpreter.

It was very important to me that the author be able to assume, in the greatest number of cases, that their Janet index file would behave identically to a normal Janet program under interpretation. This extends to things like being able to manage external dependencies with jpm, and structure one’s modules as one would structure any other Janet package. The inherent availability of the compilation apparatus at all times makes this much more feasible.

Temple, the Janet template language

At first I was a bit disappointed to see the state of HTML templating in Janet. I had rather hoped to be able to use something like Jinja or Django templates, as that is what I’d used in the past and I wanted Bagatto to be transparent and agnostic, and not to impose its own conventions or DSLs on a site author. However, those aren’t really available in a native way. There’s musty, which is a partial version of Mustache, and there’s Temple, which is Janet-specific.

Django templates, Jinja, ERB, EEX

However, here’s the thing about something like Django templates or Jinja: they’re actually awful. Here’s an example of a Django template I was using for notes:

<h1>{{ note.topic }}</h1>
{{ note|with_hyperlinks:all_notes|markdown_to_html }}
    
<hr>
{% related_notes note all_notes %}

with_hyperlinks and related_notes are functions I defined in my index.erl, so it’s tremendously useful to be able to call your native code from the template. But what’s with the syntax there? Pipes I understand, though obviously that’s a shell-ism that has little to do with Erlang or HTML. But why does with_hyperlinks take an argument with a : while related_notes takes arguments with spaces? Maybe something to do with {{ }} vs {% %}? Truly I don’t know. Every Django template I’ve ever written has been write-once. And of course, being able to call native functions is not the same as having native syntax; Django templates, or Jinja templates for that matter, expose their own limited control flow primitives, for and suchlike. This is another element that I have found both underpowered and nearly impossible to remember.

An alternative approach is presented by template languages like ERB (I’m more familiar with the Elixir version, but it’s clearly a descendant of ERB, so I’ll use ERB as the example). Here’s a snippet:

<% unless @keys_trusted.empty? -%>
trustedkey <%= @keys_trusted.join(' ') %>
<% end -%>

Unlike Django templates, ERB does expose the full syntactic power of Ruby in your template. This is wonderful. On the other hand, this additional power and intermingling of syntaxes means that the escaping language becomes much more complex.

Now, again: I have been writing EEX (which operates along similar principles) off-and-on for more than two years. I barely understand the difference between <% %> and <%= %>. The latter inserts text into the template and the former doesn’t, but that understanding doesn’t prevent me from constantly failing to understand what my templates are doing. I lay much of the blame for this on this family’s intermingling of escaped code and plain text within a single syntactic construct. You can see that above: even though unless and end form the two ends of a single syntax block, they are contained in separate angle-bracket tags, and between them there are both untemplated text and a separate template tag. This sort of thing makes my head hurt.

Temple

So when I realized that Temple works differently, I was excited. The requirement to learn a new template language is well worth it because this one is much better than those.

Here’s the equivalent to the notes snippet above in my Temple version:

<h1>{{ (get-in args [:_item "title"]) }}</h1>
{- (-> (get-in args [:_item]) 
       (with-hyperlinks (args :notes)) 
       (bagatto/mmarkdown->html)) -}

{% (def related-notes (all-related 
                        (args :notes) 
                        (args :_item)))

   (unless (empty? related-notes) 
    (print "<hr>"))

   (print-related 
    related-notes 
    (args :_item) 
    (args :root)) %}

We can see that the <h1> tag has the same simple {{ }} interpolation. The value of the expression between the curly braces is interpolated directly into the HTML. However, we get to use the exact same Janet syntax rather than this quasi-Python-style interloper.

However. Let’s turn to the bottom of the template where we have a more complicated piece of logic. It too uses native Janet syntax. But it is crucially different from ERB. In this piece of Janet, everything is contained within a single {% %} delimiter. Syntactically it is whole and easy to read. You could copy it out of this file and into the interpreter and it would parse perfectly. That’s because Temple doesn’t need to constantly flip back and forth from tag to text in order to interpolate text from within Janet code. It simply uses print.

print! When Temple evaluates a template, it inserts into the output everything that’s written to stdout! What a brilliant idea! I immediately understand what is being output and what isn’t. And I don’t need to break up my syntax into unconnected blocks.
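The general technique of "whatever the code prints becomes template output" is easy to sketch. The following is a Python analogy only, not a description of how Temple is implemented; render_block and related_section are hypothetical names, and the args dictionary is an assumed stand-in for the template arguments.

```python
import contextlib
import io

def render_block(block, args):
    """Run block(args) and return everything it printed to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        block(args)
    return buffer.getvalue()

def related_section(args):
    """One whole, uninterrupted piece of logic that emits text via print."""
    related = [n for n in args["notes"] if n != args["item"]]
    if related:
        print("<hr>")
    for note in related:
        print(f"<p>{note}</p>")
```

The logic in related_section is a single ordinary function: it can be run directly in an interpreter, and what it prints is exactly what ends up in the page.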

What a pleasure. Having rewritten all my Django templates into Temple, I have a much greater degree of confidence in them. I can update them much more easily. I wish I could do the same for my EEX templates, but I think my coworkers would object.

Transparent, extensible

I hope I’ve done a good enough job of communicating the values and philosophy of Bagatto. I feel quite good about it. I feel more productive using it, and I feel that it is a faithful expression of the rationale that I’ve laid out here. Perhaps you might feel the same.

¹ LambdaPad expects the presence of two functions in your index.erl: data/1 and site/1. These are expected to return, respectively, the specifications for the input to the site generation phase and the output of the site generation phase (i.e., the generated files and their contents).

Built with Bagatto.