Array logic operates by a
fillalgorithm. Fill the ds with the smaller rank to the be the same shape as the ds with the larger.
I recently released—so to speak—a new project that has occupied the last several weeks of my free time: a calculator program, called EC.
I decided to build it for a few reasons. Probably the first reason is that it seemed to be a good use-case for the project I had been working on for the prior few weeks: Fugue, an object system for the Janet programming language. I had settled on the basic approach and feature set, and built a little proof of concept, but obviously if I was going to herald this as the spiritual successor to CLOS it would behoove me to put it through its paces more substantially.
A little desk calculator struck me as a nice, mid-sized project, with enough potential for object-orientation that it could potentially benefit from what Fugue offers. It’s quite difficult to think of useful software to write; it’s much easier to think of useful software libraries, that will help other people make useful software.
After I thought of it, it seemed like I could create a sequel to a previous project of mine,
ad1. I use
ad all the time whenever I need to do one-off arithmetic, so I actually have a vague sense of what I’d like to add to it.
The J language, of which I am a great admirer, though essentially hopeless as a practitioner, has a thorough and powerful array orientation. You can do all kinds of brilliant things with arrays and matrices, manipulating multi-dimensional groups of numbers just as though they were individual numbers. Here’s a tiny snippet:
(2 3 $ 3 0 0) + 1 4 1 1 4 1 1
In the first line, we take the array
3 0 0 and shape it into the shape
2 3, that is, into a two-dimensional matrix with two rows and three columns. We then add 1 to that entire matrix, which is equivalent to adding 1 to each element of the matrix.
In J, there isn’t any need to map a function like
(lambda (x) (+ x 1)) over the matrix; the
+ operator is “matrix-aware” and has a built-in semantics for how to apply itself to a 2d matrix on the left side and a single number on the right.2
It would be useful to be able to incorporate this kind of behaviour into
ad‘s successor. EC is still an RPN, stack-based program, and thus we can’t have J’s pervasive array-orientation; in the J example we get an array “for free” any time we separate two or more numbers with a space. We need spaces to separate all tokens in a stack-based language. Nevertheless, the ability to add, multiply, square, etc. operands of any dimensions would be very much in keeping for a desk calculator designed to be powerful and convenient.
When I decided I’d like to build this program, my mind set off on it and wouldn’t be dissuaded. The quotation at the top of this article is a text I sent to myself on February 17 at 11:35pm, after a sleepless hour or so (typo included). Surprisingly, the initial commit is not until the next day. So I had started entirely conceptually.
In point of fact, I’m quite weak when it comes to algorithmic thinking. My mind starts swimming and losing track of things very easily. And I had no idea how you might accomplish this kind of “dimension-agnostic” logic; it seemed kind of magic the last time I had played around in the J REPL.
That said, I did come primed with one piece of knowledge: in J, arrays have a shape, which is itself a 1-dimensional array listing the length of each dimension of the original array. For instance, as above, an array with shape
2 3 is a 2-by-3 matrix; one with
2 3 3 is a 3-dimensional, 2-by-3-by-3 one. We can follow that to its logical conclusion: if a simple list is a 1-dimensional array, its shape should also be a list, with a single element, which is the length of the list. Finally, that means that the shape of a simple number should be the empty list.
The key to the algorithm that occurred to me, like so many of them, feels kind of like a cheat. I simply can’t conceptualize of a way to “apply” an array of dimensionality N to an array of dimensionality M. Addition, to take the simplest operation I can think of, only makes sense to me when applied to two numbers. And trying to somehow imagine how to “do the right thing” when one of the operands is a list and the other is a 4-dimensional matrix just hurts my head. Therefore, let’s approach that problem from the other direction; instead of figuring out how to add a list and a matrix, or a matrix and a number, what if we just made them both the same exact shape? Then you’d have a one-to-one correspondence between elements and the actual addition would be trivial.
Again, I can barely conceptualize what it would mean to “resize” a list into a 4-d matrix. But it’s pretty easy to conceptualize what resizing
1 looks like in the example
(2 3 $ 3 0 0) + 1. You just repeat
1 a bunch of times until you’ve filled out the whole shape. It seems incredibly wasteful to create all those
1s, but conceptually it’s quite simple.
If we follow that thread, we can start to generalize the concept without hurting our brains too much. If filling
1 out to the size of a 6-element list is simply repeating
1 6 times, then filling
3 0 0 to a matrix of shape
2 3 should be simply repeating
3 0 0 twice. And indeed it is:
2 3 $ 3 0 0 3 0 0 3 0 0
At this point we depart from J as a useful model. In fact, J has a sophisticated and flexible approach to applying dyadic operators between arrays of differing shapes. It allows the user to select the “rank” across which they want to apply the operator, so that the behaviour implemented in EC is only one of the possible solutions to the problem.
That said, the fill algorithm provides a generalizable approach to the problem of applying an operation to operands of differing shape. Given any two vector-or-numbers, fill the smaller-dimensioned one to the shape of the greater-dimensioned one and then apply the operation pairwise.
It’s important to note that this imposes a significant constraint: the shape of the smaller one has to be a suffix of the shape of the greater one. For instance, a vector of shape
3 could be filled to a matrix of shape
2 3 by repeating it twice; however, a vector of shape
4 couldn’t be filled to
I’m happy with this. It addresses the most important case, that of operating on a matrix of any arbitrary size and a single number (the shape of a single number is
, the empty vector, which is trivially a suffix of any other shape), and there’s no special-casing for any specific dimensionality.
Once I had that basic conceptual orientation, I actually found myself articulating the entire algorithm (in prose) in texts to myself. It may simply be a trick of the mind, but to me there is some proof in the “rightness” (I hesitate to say “correctness”!) of the algorithm in the fact that it seemed to unfold naturally, at a conceptual level, once I had oriented my thinking correctly around the topic. The resulting code is a very faithful translation of the texts I sent to myself.3
What leaves me feeling good about this as an approach is that the code for
fill is concise and elegant. Here is the function that, given a Vector object and a shape to fill it to (expressed as an array of numbers), returns a new Vector of the original data filled to the new shape:
(defmethod fill Vector [self shape-to-fill] (let [x (array ;(shape self)) y (array ;shape-to-fill)] (while (not (empty? x)) (let [xi (array/pop x) yi (array/pop y)] (unless (= xi yi) (errorf "Shape error: can't fill vector with shape %q to %q" (shape self) shape-to-fill)))) (reduce (fn [acc length] (let [new-shape (tuple length ;(shape acc)) new-data (array/new-filled length acc)] (:new Vector new-shape new-data))) self (reverse y))))
The algorithm is very short:
ntimes on each step.
I can now demonstrate the above J behaviour as translated into EC.
EC, it should be noted, is written in RPN notation, rather than J’s infix notation, so the operations should be read from left to right. EC Vectors are denoted with square brackets
, as spaces separate stack tokens. The value in between the angle brackets
<> on the left side of the prompt show the current top value in the stack.
<> $ [3 0 0] [2 3] fill <[[3 0 0] [3 0 0]]> $ 1 + <[[4 1 1] [4 1 1]]> $
I’ve done the operation in two steps, so you can see the matrix that
1 + is applied against; it can just as easily be written
[3 0 0] [2 3] fill 1 +. We see the same sequence as in the J code: you can
fill any data structure to any shape, subject to the suffix constraint named above; and any two data structures can be applied to an operator like
+ as long as the smaller-shaped of the two—in this case, the number
1—is fillable to shape of the larger.
We see above an example of the Fugue framework in the form of
defmethod. In the EC code,
Vector is a Fugue prototype, and
fill is a method specialized to
Vector. Janet’s OO functionality is built on prototypal inheritance (as opposed to class-based inheritance), and thus in Fugue one defines prototype hierarchies. In EC,
Vector is a child of
Quotation, which is a child of
Fugue offers two ways to define prototype-based behaviour, corresponding to the two major types of behaviour specialization in object-oriented systems: methods and multimethods.
Methods are functions which are assigned directly to prototype objects (in prototypal inheritance, there are no distinct classes, only objects which other objects designate as their prototypes), and which are therefore inherited by any descendant object of the prototype. Janet provides a native way of calling these assigned functions; syntax like
(:foo bar baz), where
:foo is a keyword instead of a regular symbol, translates to
((get bar :foo) bar baz).
In EC, we implement
fill as a method on
Vector, so a call like
(:fill (:new Integer 4) ) will be prototype-aware. (
Integer, is a child of
Number, of course!)
defmethod macro also defines
fill as a function that calls
:fill, for added convenience.
Multimethods can be specialized against the types (or prototypes) of all of their arguments. Instead of being defined for some prototype (and inserted as a method in that prototype table, to be inherited by its descendants), any individual multimethod instance is defined for the type of each of its arguments. Thus, for instance, a multimethod might be defined separately against
[Quotation Number] and
[Vector Number].4 5
EC has turned out to be an excellent testbed for developing and testing Fugue; it offered some natural opportunities to use almost all of Fugue’s features, and over the course of developing EC I hit a bunch of new problems that suggested natural extensions of the existing feature set. By the time I was finished, I felt quite comfortable defining the feature set of Fugue 1.0 as “that which is sufficient to build EC”.
ad is so-called because it comes after
bc, the venerable calculator program. My partner suggested
ce as further development of the naming scheme, which I think is quite good. Somewhere, however, between that suggestion and my creating the repo, I unconsciously transposed the letters, and didn’t notice until a little while later. ↩
In fact (unsurprisingly), the array behaviours of J are far, far richer than in this example, and far richer than what will be demonstrated in the rest of the post. While it’s possible that we might be able to incrementally increase the power of shaping and filling in EC, we will certainly not take a language as powerful and subtle as J as our point of comparison. ↩
I think of this as operating along the horizontal axis rather than the vertical; multimethods trade the ability to exploit inheritance (a vertical orientation) in dispatch for the ability to dispatch across the entire (horizontal) sequence of argument types together. ↩
For completeness’s sake, I’ll mention that multimethods come in two flavors in Fugue: closed and open. Closed multimethods are defined within a single module and can’t be extended by other callers; thus, they’re roughly analogous to a multi-function-head, pattern-matching system like what you get in Erlang and Elixir.
Open multimethods, on the other hand, are declared once and then can be extended with new cases from anywhere in the codebase. They’re thus closer to what are sometimes known as protocols, in that they provide a way to extend the behaviour of a function with respect to data types that didn’t exist when the function was declared.
In EC there are examples of both: pretty-printing terms is defined as a multimethod, while the
push function (which handles pushing some term to the stack) is an open multimethod that picks up new definitions as new data types are developed. ↩