A Specification of a Note-Taking Program

2022-02-19

I recently watched Conrad Barski’s video introducing zek, a program he wrote to take Zettelkasten-style notes on the command line.

This is the perfect spur to a programming project: another, already perfectly fine, program that does almost what you want but not exactly. In my case, I was intrigued by the idea of a Zettelkasten-style note-taking app on the command line, but the whole line-editor approach seemed to me to be a bridge too far. So of course I started thinking about how I’d write my own.

I came up with an idea that I like, and that is extremely simple, deferring as much as possible to my text editor, and could conceivably be written as a single shell script calling widely-available tools.

I thought about it during a couple showers, and then I wrote a Pantagruel spec for what I came up with. I reproduce that here.

🙟

module ZK.

Notes on a note-taking system, inspired by Zettelkasten.

Line ⇐ String.
Note ⇐ [Line].

The basic operation we’re describing is indexing: taking all the notes in our set and updating them with references to other notes. That’s all this system does; the notes themselves can be written and read with a standard text editor, and most text editors have a “go to file” procedure, so even the navigation from note to note can be done within the editor.

The only thing it needs help with is finding the connections between notes.

index notes: [Note].

Indexing does two things:

It updates each note, in place. All backlinks to each note are placed at the beginning of it, and all references contained within it are placed at the end.

all n: notes ⇒
nʹ = backlinks n + body n + references n
and name nʹ = name n.

It maintains an Index. Indices aren’t notes - they aren’t affected by indexing, in particular. They are simply a list of links to each note in the set, ordered by creation time.

some1 i: Index ⇒
   all n: notesʹ ⇒ ref-note n in i
   and all (n, m): notesʹ ⇒
      created-at n < created-at m → i (ref-note n) < i (ref-note m).

;

Index ⇐ [Line].

A Reference is a line of text that’s read and written by the system. Both backlinks and (forward) references are simply collections of References.

Reference ⇐ Line.
backlinks n: Note ⇒ {Reference}.
references n: Note ⇒ [Reference].

name n: Note ⇒ String.
body n: Note ⇒ [Line].
created-at n: Note ⇒ Date.

ref s: String ⇒ Reference.
ref-note n: Note ⇒ Reference.

To refer to a given name, we simply insert a line with a recognized string in front of it, followed by the name. For instance: %ref:scifi\ authors. In particular, it’s important to escape any delimiters in the name; this is to allow the “go to file under cursor” command to recognize the whole name rather than the first part, in the case of names with spaces.

ref s = %ref: + escape s + \n.

It follows then that a reference to a note is simply a reference to its filename.

ref-note n = ref (name n).

body n = [all line: n, ~(line in Reference) ⇒ line].

As mentioned above, to index a note is to take the note body, prepend all of the backlinks to that note, and append all the references to other notes (or other potential notes) contained in the body. A backlink is a reference to any note that links to this one; a (forward) reference is a reference to any note which is linked to by this one.

backlinks n = {all m: Note, (bracketed (name n)) in m ⇒ ref-note m}.
references n = [all s: String, (bracketed s) in n ⇒ ref s].

It’s worth noting that references can be ordered as they appear in the note body; but backlinks have no defined order.

all (r, s): references n ⇒
n (bracketed (name r)) < n (bracketed (name s)) → references r < references s.

;

To link to any name, simply enclose it in square brackets in the body of a note.

escape s: String ⇒ String.
bracketed s: String ⇒ String.

bracketed s = [ + s + ].

🙟

I invite you to read the above; hopefully you like the idea of such a program. I intend to write it. Hopefully, in fact, you could write it too; hopefully the spec is descriptive enough that we could write two programs which did the same thing.

There’s English prose in it, of course, and no doubt you’d refer to that when you wrote it. And it doesn’t constitute the entire documentation for such a program; there’s no description of how the thing is intended to be used. And there’s certainly no description of the rationale for the program or its values.

The rationale is this: I very much like the core idea of the note system, as it’s presented in zek: a flat directory of text files, linked to other ones (a Wiki, so far), and comprehensively and automatically backlinked as well (not so much a Wiki); and a time-ordered index to bind them all together.

But in my mind, vim is already a wonderful tool for writing text. There’s absolutely no need to reimplement a text editor if text files are your medium of choice. And with gf, it’s not a half-bad one for navigating the links either.

What’s left is actually quite small. You simply need a program that will scan all the text files in a directory, find all the [square brackets references], create a quasi-organized list of filenames corresponding to those square brackets as metadata in the files themselves, and create a quasi-organized list of all the filenames containing references to this file in them too. You can do all that with awk. The distinctive %ref: syntax is just to make it easy to filter those lines out when you generate a new generation of the file.

Nevertheless, I do think there’s a lot of worth in actually writing a specification of some sort, as I’ve done above.

One reason is: there’s an entire largely-unpracticed art of describing programs rather than writing them. The most successful example that comes to mind is redo, as conceived by the brilliant D. J. Bernstein. redo is a program that Bernstein didn’t (or we might say, didn’t bother to) write; he merely described a hypothetical program with enough clarity that there are at least two existing implementations of the system, written by others. I like this idea; it would tickle me pink to see others write the program that I’ve described and compare it to the program in my head.

Another, though, is that I maintain there’s enormous worth in thinking critically and deeply about a program before one writes it. There is an extent to which idle shower thoughts fulfill this role. But there are other, deeper ways to think about parts of a program like its core primitives, its vocabulary, its invariants, its edge cases, its sequence of operations, and so on, and we rarely as a field avail ourselves of them.

There’s the whole subfield of formal methods, of course, but I think you’ll agree that the above document is business casual methods at best. This is Formal Methods Ultralite, and I think it has some worth; in fact I think part of its worth is how lightweight it is. I knocked that out in an hour.

And yet: I understand the program more deeply now than when I wrote it.

The first step is, add the English last. The English is for the reader, but it’s best written only after one is confident that one fully understands what they’re about to describe. Otherwise, natural language has a tendency to infect thought; because we have a name or a phrase to encompass something, we think we understand it; but natural language is full of ambiguities, pronouns without clear referents, double meanings, unclear phrasing, and more.

Indeed, the style that Pantagruel encourages is a naturally explanatory one. I start by describing, quasi-formally, the behaviour of the program in a “sentence” or two. In doing so, two things happen:

The formality begins to work its magic. Things that are easy to express in English are harder to express in the sentence structures of first-order logic, and that’s often because the English thoughts refer to things without introducing them, and the like. So I have to introduce things and then I realize I haven’t thought through all of their contours.
I introduce some new terminology. To explain my program at a high level, I necessarily refer to certain other qualities or concepts. When I run pant, I get this error message:
error: Unglossed symbols:
name, references, created_at, body, backlinks, ref_note, Index
So, this is the next layer down. These are the concepts I must elaborate on; left to my own devices, I might have remembered to explain some of them, but probably not all.

So I introduce a new chapter and continue the process. I recurse, until I hit the leaves. That is: I run pant and I see no error message. There are no terms left to explain. I’m sure I don’t need to mention that it is through the act of explanation that we stumble on every notion we don’t actually understand, even though we think we did, and are forced to clarify it for ourselves, and in doing so are often shown exactly which two concepts are subtly incompatible, which definition is used differently in different places, and so forth.

All of this without writing any code. All, in fact, without even having decided what language I’ll use to write it. I say this not in the sense of “it’s just that easy!” I am not a believer in the no-code revolution. I like code. Indeed, that’s the problem: code is powerful and seductive and has its own gravity. Putting ideas into code makes them real and gives them an inertia, even if they’re the wrong ideas. Thinking about things beforehand is a valuable practice. It’s valuable because it can be a much higher-leverage way to figure them out than to write a whole program until you realize you didn’t understand the assignment; it’s also valuable because they remain ideas. And in a language like Pantagruel, they’re short little sentences, little bits of pseudo-math. They’re easy to edit. They’re easy to type, to write on a whiteboard or notebook, and easy to erase and change.

Update: Fri, Feb. 25, 2022

Since I wrote the above, no less than two implementations of the spec have been produced, only one of which by me.

I wrote a program in Zig that I’m calling Erasmus. This is my first Zig ever, but it seems like a reasonable application for it. If only for the specific reason that basically, everything it does can be done by a shell script calling awk. So if I’m going to write my own program, it can’t be tremendously slower than Awk.

In addition to that, George Pollard wrote a (partial) implementation in Haskell up until where he left off, the two implements are equivalent!

Built with Bagatto.