2020-12-10
The last few weeks I’ve been in a bit of a ferment of inspiration. The output of my fermentation has been Bagatto, a static site generator written in the Lisp Janet.
To the uninitiated: an SSG is a program for building websites. You maintain a bunch of source files (blog posts written in Markdown, for instance) and run your SSG to create a bunch of HTML out of them. You upload the HTML somewhere and that’s the website.
Here’s the source of this blog: https://git.sr.ht/~subsetpark/subsetpark
Before it occurred to me to do this I was using LambdaPad, an SSG written in Erlang. The idea with LambdaPad is, you maintain an Erlang source file, index.erl, that defines your site in terms of a couple simple data structures. When you run LambdaPad, it will evaluate your index file and interpret the values it defines as instructions for generating your site. For the truly interested, you can view the LambdaPad version of this same site here.
This is a very appealing model to me, for two main reasons: 1) you don’t need to learn a new configuration language, 2) you do get to take advantage of a whole programming language in defining your site.
To take literally the first example that springs to mind, if we look at the quickstart for Hugo, a very popular program in Go for doing much the same thing, we see that one does not actually write much Go when defining a site. This sort of thing has always struck me as a bit odd; Hugo is identified very much with the language it’s written in (as SSGs generally are: Jekyll for Ruby, Pelican for Python, et cetera), and yet the process of building a site in Hugo involves writing to a config.toml file and running hugo new commands.
There’s nothing inherently wrong with this approach. Hugo is no doubt a robust and reliable program, and anyone who doesn’t know a programming language, or doesn’t want to do programming when they build their website, will benefit greatly from its design. I simply don’t fall into that camp. I hate writing config files, I like programming, but more importantly, I really like the fact that I can use a programming language to make it easier to write a website that does what I want.
The notes section of this website, for instance, is also statically generated from source Markdown. But it goes through a layer of processing in the journey from Markdown to HTML, with notes being collated and cross-referenced to other notes based on their contents. In Erlang, this is accomplished by running some regexps on the note text and replacing the content strings with other Markdown links. In other words, a little basic programming.
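As a rough illustration of what that "little basic programming" amounts to, here is a minimal sketch in Python (used only as a neutral illustration language; the original was Erlang). The function name link_topics, the topic table, and the URL scheme are all hypothetical, not taken from the real site.

```python
import re


def link_topics(text, topics, own_topic=None):
    """Replace each mention of a known topic with a Markdown link to its note.

    `topics` maps a topic name to the URL of its note (hypothetical scheme).
    """
    # Skip the note's own topic, and try longer names first so that a long
    # topic name wins over a shorter one that it contains.
    candidates = sorted((t for t in topics if t != own_topic), key=len, reverse=True)
    if not candidates:
        return text
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in candidates) + r")\b")
    # A single substitution pass, so inserted links are never re-scanned.
    return pattern.sub(lambda m: f"[{m.group(1)}]({topics[m.group(1)]})", text)


topics = {"Janet": "/notes/janet.html", "Erlang": "/notes/erlang.html"}
linked = link_topics("Janet is smaller than Erlang.", topics, own_topic="Janet")
```

The point is less the regexp than where it lives: it is ordinary code sitting between "load the Markdown" and "render the HTML", which is exactly the step a config-only generator would not let you add.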
Nevertheless, there were a few itches that had grown over my time using LambdaPad. What’s interesting is that they all derived from the same sort of thing I’m discussing above. Even though LambdaPad is much more code-oriented than config-oriented, it still relies on indirect action in a number of places. Each of these places offers a degree of convenience and simplicity, but eventually became a stumbling block.
LambdaPad, for instance, looks for a file called index.erl in the current directory. This is convenient because you can simply run lpad-gen in the current directory to build everything. Unfortunately, it presents an artificial constraint on organizing your code. If you want to manage your source files with rebar3 you’ll want to have your Erlang source in src/, which means rebar3 and LambdaPad are in conflict.
Similarly, we can see that the data specification in my LambdaPad index consists of lines like this:
notes => {markdown, "notes/*.md"},
That’s a key-value pair mapping an atom to a tuple of an atom and string. This is concise and easy to write. But it also doesn’t actually do anything; it’s just a couple literal terms. The actual mechanism of loading source files exists entirely inside of the LambdaPad package. I pass it a tagged value, giving the name of the data loader I want it to use, and everything else happens under the covers.
Then came the moment when I was reading on Lobsters about AsciiDoc and wanted to look into how much of a lift it would be to use that instead of Markdown. As it happens, I couldn’t, really. The output of the data/1 function in a LambdaPad module is not the source data to generate the site; it’s a DSL constituting instructions for generating that source data.
A philosophical orientation is beginning to form here. If we’re going to write yet another Static Site Generator—and god knows, there are so many of them—can we orient ourselves around a pervasive sense of transparency? That is, to the greatest extent, how can we expose to a site author a programming environment where the inputs and the outputs of the system as a whole, and of each individual step, are entirely inspectable, observable, and extensible?
Arguably, to do so, we need to develop some sort of theory of what the inputs and the outputs of an SSG are. So let’s say:
The inputs to an SSG are a heterogeneous collection of source data files. Many of these will be long-form posts or articles, but even those are a special case of a more general bag of attributes (like a JSON file containing the title and author’s name of your site).
The output of an SSG is a list of files to be generated. A
file-to-be-generated consists of exactly two things: the path of the
new file, and the contents of the new file. This is an important and
pleasing simplification: in a website, there are no such things as
posts, or pages, or indices. There are only (file path, file contents)
pairs. If the domain model of your site can’t be reduced
to one of those, it won’t exist in your website.
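That model is small enough to sketch directly. The following is a minimal Python illustration (Python only as a neutral language; this is not Bagatto's code). The names render_site and write_site and the shape of the data dictionary are assumptions made up for the example.

```python
import tempfile
from pathlib import Path


def render_site(data):
    """Turn loaded site data into a list of (path, contents) pairs."""
    pages = [
        (f"posts/{post['slug']}.html", f"<h1>{post['title']}</h1>\n{post['body']}")
        for post in data["posts"]
    ]
    links = "\n".join(
        f'<li><a href="posts/{p["slug"]}.html">{p["title"]}</a></li>'
        for p in data["posts"]
    )
    index = ("index.html", f"<h1>{data['config']['title']}</h1>\n<ul>\n{links}\n</ul>")
    return [index] + pages


def write_site(files, out_dir):
    """The only step with side effects: write each pair to disk."""
    for path, contents in files:
        target = Path(out_dir) / path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(contents, encoding="utf-8")


# Heterogeneous inputs: a bag of site-wide attributes plus some posts.
data = {
    "config": {"title": "My Site"},
    "posts": [{"slug": "hello", "title": "Hello", "body": "<p>First post.</p>"}],
}
files = render_site(data)

with tempfile.TemporaryDirectory() as out_dir:
    write_site(files, out_dir)
    written = sorted(p.relative_to(out_dir).as_posix() for p in Path(out_dir).rglob("*.html"))
```

Everything up to write_site is a plain value that can be printed and inspected; "posts" and "indices" exist only as a convention for how the pairs are produced.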
At this point the way forward feels pretty clear. We have two phases—the input phase and the output phase—and we know what kind of data each one should contain. If I want to provide that data as a site author, in a way that allows me to conveniently inspect and arbitrarily transform what I’m providing, the simplest method is with ordinary functions.
This is the departure from the model I had been using in LambdaPad. Even though my index.erl is an Erlang file, if I were to open it in the Erlang shell and invoke my data/1 function, I’d just get some dead terms out. On the other hand, if the terms in my data/1 were functions which output the actual site data, then at any point I could run just those functions and see exactly what the outputs would be. And I could trivially wrap those functions in any other business logic to arbitrarily transform them as I needed.
The same principle goes for the output phase. If the return value of my site/1 function[1] is atom and tuple values constituting a configuration language, describing which built-in file-generation tools should be called and with what inputs, then the actual output of those tools is obscured from me until it shows up in my file tree. But passing in callable, wrappable functions means that there’s no part of the process that’s off-limits.
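The difference between "dead terms" and callable values can be sketched in a few lines. This is an illustrative Python analogue, not LambdaPad or Bagatto code; load_notes, without_drafts, and the draft attribute are hypothetical names invented for the example.

```python
# A "dead term": it names a loader, but evaluating it tells you nothing
# about the data that will eventually be loaded.
dead_spec = ("markdown", "notes/*.md")


def load_notes():
    """A loader as an ordinary function: call it and you get the real data.

    (Hard-coded here so the sketch runs without any files on disk.)
    """
    return [
        {"topic": "Janet", "draft": False},
        {"topic": "Erlang", "draft": True},
    ]


def without_drafts(loader):
    """Wrap any loader in extra business logic without touching its internals."""
    def wrapped():
        return [item for item in loader() if not item.get("draft")]
    return wrapped


site_data = {"notes": without_drafts(load_notes)}

# Any step can be run on its own, for instance from a REPL.
everything = load_notes()
published = site_data["notes"]()
```

Nothing here required a plugin hook or an extension point; wrapping is just function application.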
In other words: instead of providing a configuration language, we should provide a standard library. A collection of functions with a few well-defined signatures that can be easily composed and extended. And we should ensure that the signatures are transparent and simple enough that it’s trivial to write new ones that do new things.
As an example: we saw above how to specify input files and their parsers in LambdaPad: {markdown, "notes/*.md"}. We are locked out from adding an asciidoc parser.
On the other hand, if we imagine that the Markdown parser were a function, specified directly, that looked like this:
parse_markdown(Contents, Attributes) ->
    Metadata = read_front_matter(Contents),
    maps:merge(Attributes, Metadata).
And were specified like this:
notes => #{
    loader => load_glob("notes/*"),
    parser => fun parse_markdown/2
},
Then we could write a new AsciiDoc parser which ran Asciidoctor on Contents and merged the resulting metadata, and specify that directly as the value for parser.
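Here is a minimal sketch of that idea in Python (again only as a neutral illustration language). Both parsers share one signature, so either can be dropped into the parser slot. To keep the example self-contained, the AsciiDoc parser does not shell out to Asciidoctor; as a stated simplification it hand-reads only the document title ("= Title") and attribute entries (":name: value") from the header. The function names and the spec dictionary are hypothetical.

```python
def parse_markdown(contents, attributes):
    """Merge 'key: value' front matter (between --- lines) into attributes."""
    metadata = {}
    lines = contents.splitlines()
    if lines and lines[0].strip() == "---":
        for line in lines[1:]:
            if line.strip() == "---":
                break
            key, _, value = line.partition(":")
            metadata[key.strip()] = value.strip()
    # Like maps:merge/2, values from the second map win on conflict.
    return {**attributes, **metadata}


def parse_asciidoc(contents, attributes):
    """Same signature, different format: read '= Title' and ':name: value'."""
    metadata = {}
    for line in contents.splitlines():
        if line.startswith("= "):
            metadata["title"] = line[2:].strip()
        elif line.startswith(":") and line.count(":") >= 2:
            name, _, value = line[1:].partition(":")
            metadata[name.strip()] = value.strip()
        elif not line.strip():
            break  # the header ends at the first blank line
    return {**attributes, **metadata}


# Swapping formats means swapping one value in the specification.
spec = {"notes": {"parser": parse_asciidoc}}

adoc = "= Bagatto\n:topic: ssg\n\nBody text with = signs and :colons: inside.\n"
parsed = spec["notes"]["parser"](adoc, {"path": "notes/bagatto.adoc"})
```

A real version would presumably delegate to Asciidoctor for the body; the part that matters for the argument is only that the parser is a value the site author supplies.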
This is the basic approach I’ve taken when building Bagatto: a new Static Site Generator written in Janet, which interprets a Janet file much in the same way that LambdaPad interprets Erlang.
For instance, here’s the equivalent specification for the Notes section of this blog:
:notes {
  :src (bagatto/slurp-* "notes/*.md")
  :attrs notes/parse-note
  :transform (bagatto/attr-sorter "topic")
}
notes/parse-note is a function. bagatto/slurp-* and bagatto/attr-sorter are both provided as a part of the Bagatto “standard library”, but they themselves are higher-order functions which return other functions. So any of these values can be directly evaluated and inspected in a REPL, or wrapped to transform their output.
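For readers who don't think in Lisp, the shape of a function like bagatto/attr-sorter can be approximated in Python. This is an analogue written for illustration under my own assumptions, not Bagatto's actual implementation; the descending parameter in particular is invented.

```python
def attr_sorter(key, descending=False):
    """Return a function that sorts a list of items by the given attribute."""
    def sort_items(items):
        return sorted(items, key=lambda item: item[key], reverse=descending)
    return sort_items


# Calling the factory yields a plain function, which is the value that
# would sit in the :transform slot of the specification.
by_topic = attr_sorter("topic")

notes = [{"topic": "Janet"}, {"topic": "Erlang"}]
sorted_notes = by_topic(notes)
```

Because by_topic is just a value, it can be called on sample data at a REPL to see exactly what the transform step will do before any site is generated.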
I find Janet to be tremendously well-suited to the task at hand. There are a couple reasons. Maybe the simplest is speed: unlike Erlang, there’s no heavy VM or runtime to load, so Bagatto starts up very quickly. Important for a command-line application.
But Janet is also a Lisp, and Lisps tend to be very good at interpreting themselves without having to do too much sleight-of-hand. So it’s a natural fit for any application model where you write a program to make it run, and the application itself becomes a slightly specialized interpreter.
It was very important to me that the author be able to assume, in the greatest number of cases, that their Janet index file would behave identically to a normal Janet program under interpretation. This extends to things like being able to manage external dependencies with jpm, and structure one’s modules as one would structure any other Janet package. The inherent availability of the compilation apparatus at all times makes this much more feasible.
At first I was a bit disappointed to see the state of HTML templating in Janet. I had rather hoped to be able to use something like Jinja or Django templates, as that is what I’d used in the past and I wanted Bagatto to be transparent and agnostic, and not to impose its own conventions or DSLs on a site author. However, those aren’t really available in a native way. There’s musty, which is a partial version of Mustache, and there’s Temple, which is Janet-specific.
However, here’s the thing about something like Django templates or Jinja: they’re actually awful. Here’s an example of a Django template I was using for notes:
<h1>{{ note.topic }}</h1>
{{ note|with_hyperlinks:all_notes|markdown_to_html }}
<hr>
{% related_notes note all_notes %}
with_hyperlinks and related_notes are functions I defined in my index.erl, so it’s tremendously useful to be able to call your native code from the template. But what’s with the syntax there? Pipes I understand, though obviously that’s a shell-ism that has little to do with Erlang or HTML. But why does with_hyperlinks take an argument with a : while related_notes takes arguments with spaces? Maybe something to do with {{ }} vs {% %}? Truly I don’t know. Every Django template I’ve ever written has been write-once. And of course, being able to call native functions is not the same as having native syntax; Django templates, or Jinja templates for that matter, expose their own limited control flow primitives, a for tag and suchlike. This is another element that I have found both underpowered and nearly impossible to remember.
An alternative approach is presented by template languages like ERB (I’m more familiar with the Elixir version, but it’s clearly a descendant of ERB, so I’ll refer to that one instead). Here’s a snippet:
<% unless @keys_trusted.empty? -%>
trustedkey <%= @keys_trusted.join(' ') %>
<% end -%>
Unlike Django templates, ERB does expose the full syntactic power of Ruby in your template. This is wonderful. On the other hand, this additional power and intermingling of syntaxes means that the escaping language becomes much more complex.
Now, again: I have been writing EEx (which operates along similar principles) off-and-on for more than two years. I barely understand the difference between <% %> and <%= %>. The latter inserts text into the template and the former doesn’t, but that understanding doesn’t prevent me from constantly failing to understand what my templates are doing. I lay much of the blame for this on the shoulders of this family’s intermingling of escaped code and plain text within a single syntactic construct. You can see that above: even though unless and end form the two ends of a single syntax block, they are both contained in separate angle-bracket tags and there’s both untemplated text as well as a separate template tag between them. This sort of thing makes my head hurt.
So when I realized that Temple works differently, I was excited. The requirement to learn a new template language is well worth it because this one is much better than those.
Here’s the equivalent to the notes snippet above in my Temple version:
<h1>{{ (get-in args [:_item "title"]) }}</h1>
{- (-> (get-in args [:_item])
       (with-hyperlinks (args :notes))
       (bagatto/mmarkdown->html)) -}
{% (def related-notes (all-related
                        (args :notes)
                        (args :_item)))
   (unless (empty? related-notes)
     (print "<hr>"))
   (print-related
     related-notes
     (args :_item)
     (args :root)) %}
We can see that the <h1> tag has the same simple {{ }} interpolation. The value of the expression between the curly braces is interpolated directly into the HTML. However, we get to use the exact same Janet syntax rather than this quasi-Python-style interloper.
However. Let’s turn to the bottom of the template where we have a more complicated piece of logic. It too uses native Janet syntax. But it is crucially different from ERB. In this piece of Janet, everything is contained within a single {% %} delimiter. Syntactically it is whole and easy to read. You could copy it out of this file and into the interpreter and it would parse perfectly. That’s because Temple doesn’t need to constantly flip back and forth from tag to text in order to interpolate text from within Janet code. It simply uses print.
print! When Temple evaluates a template, it inserts into the template everything that’s written to standard output! What a brilliant idea! I immediately understand what is being output and what isn’t. And I don’t need to break up my syntax into unconnected blocks.
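The trick is simple enough to imitate in a few lines. What follows is a Python analogy of the "whatever gets printed becomes the output" idea, not a description of how Temple is implemented; render_block and the args layout are hypothetical names for this sketch.

```python
import io
from contextlib import redirect_stdout


def render_block(code, args):
    """Run one block of template code and return whatever it printed."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        # The block sees a single variable, `args`, much like a template would.
        exec(code, {"args": args})
    return buffer.getvalue()


# The whole block is one syntactically complete piece of code; it could be
# pasted into an interpreter unchanged.
block = """
related = [n for n in args["notes"] if n != args["item"]]
if related:
    print("<hr>")
for note in related:
    print(f"<li>{note}</li>")
"""

html = render_block(block, {"notes": ["janet", "erlang"], "item": "janet"})
```

Compare this with the ERB style, where the conditional would have to be opened in one tag and closed in another with literal text in between.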
What a pleasure. Having rewritten all my Django templates into Temple, I have a much greater degree of confidence in them. I can update them much more easily. I wish I could do the same for my EEx templates, but I think my coworkers would object.
I hope I’ve done a good enough job of communicating the values and philosophy of Bagatto. I feel quite good about it. I feel more productive using it, and I feel that it is a faithful expression of the rationale that I’ve laid out here. Perhaps you might feel the same.
[1] LambdaPad expects the presence of two functions in your index.erl: data/1 and site/1. These are expected to return, respectively, the specifications for the input to the site generation phase and the output of the site generation phase (i.e., the generated files and their contents).