Relevant prior art. Designs and implementations that will support the language design process.
Note: This is a first draft with entries taken directly from #14 and racket-users.
This line of work relies on an idea variously referred to as "skeleton syntax trees", "tree terms", or "token trees". A reader transforms a stream of characters into tokens grouped by balanced delimiters such as braces. The grouping based on delimiters is coarser than in s-expressions; additional parsing remains necessary to separate out different syntactic forms.
Macros receive a sequence of token trees as their input, parse some portion of it, and return the unparsed remainder of the sequence along with their expansion. The output of a macro expansion step is also a token tree.
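To make the reader and macro protocol above concrete, here is a minimal sketch in Python. The token pattern, the list-based tree representation, and the unless_macro example are all assumptions made for illustration; none of them is taken from any of the systems listed below.

```python
import re

CLOSER = {"(": ")", "[": "]", "{": "}"}

def tokenize(text):
    """Split text into words, single delimiters, and runs of other symbols."""
    return re.findall(r"\w+|[()\[\]{}]|[^\s\w()\[\]{}]+", text)

def read_trees(tokens, closer=None):
    """Group a flat token list into nested lists, one per delimiter pair.

    Each group is a list whose first element is its opening delimiter.
    No further parsing happens here: operators and keywords stay flat.
    """
    trees = []
    while tokens:
        tok = tokens.pop(0)
        if tok in CLOSER:
            trees.append([tok] + read_trees(tokens, CLOSER[tok]))
        elif tok == closer:
            return trees
        elif tok in CLOSER.values():
            raise SyntaxError(f"unexpected {tok!r}")
        else:
            trees.append(tok)
    if closer is not None:
        raise SyntaxError(f"missing {closer!r}")
    return trees

def unless_macro(trees):
    """Hypothetical macro: consume a condition group and a body group,
    then return (expansion, unparsed remainder)."""
    condition, body, *rest = trees
    expansion = ["if", ["(", "!", condition], body]
    return expansion, rest

trees = read_trees(tokenize("unless (x < 1) { f(x); } g(y)"))
expansion, rest = unless_macro(trees[1:])  # skip the macro name itself
```

The macro takes only the two groups it needs and hands back the rest of the sequence untouched, and its expansion is again a token tree that later steps can parse further.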
The following are in rough chronological order.
Compared to the earlier work in this tradition, Honu adds Top Down Operator Precedence parsing.
- Honu: Syntactic Extension for Algebraic Notation through Enforestation, GPCE 2012
- Honu documentation
Star is an independent take on this idea, also adding Top Down Operator Precedence parsing.
- Feel Different on the Java Platform: The Star Programming Language, PPPJ 2013
- Star Reference
- Star implementation
These approaches provide a reader that produces s-expressions with an amount of structure similar to that found in current Racket, but infer some groupings so that fewer parentheses are required.
- https://github.com/tonyg/racket-something
- #lang something // infix, optionally indentation-sensitive experimental Racket syntax - Tony Garnock-Jones
- SRFI 110
- Sweet Racket reader
- Readable Lisp S-expressions Project, implemented in Common Lisp
- Sweet.js - Sweet brings the hygienic macros of languages like Scheme and Rust to JavaScript.
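As an illustration of how a reader can infer groupings from layout, the following Python sketch applies one simplified indentation rule: a line with several terms, or with more deeply indented child lines, becomes a list, and its children are appended to it. This is only in the spirit of the indentation-sensitive readers listed above; it is not the actual rule set of any of them, and it assumes consistently indented input with no parentheses.

```python
def read_indented(source):
    """Turn indentation-structured text into nested lists (s-expression shape)."""
    lines = [
        (len(line) - len(line.lstrip()), line.split())
        for line in source.splitlines()
        if line.strip()
    ]

    def block(index, indent):
        # Read all sibling lines at exactly this indentation level.
        items = []
        while index < len(lines) and lines[index][0] == indent:
            words = lines[index][1]
            index += 1
            children = []
            if index < len(lines) and lines[index][0] > indent:
                children, index = block(index, lines[index][0])
            if len(words) == 1 and not children:
                items.append(words[0])          # a lone term stays an atom
            else:
                items.append(words + children)  # otherwise the line is a list
        return items, index

    return block(0, lines[0][0] if lines else 0)[0]

program = """
if
  < n 2
  1
  * n
    fact
      - n 1
"""
result = read_indented(program)
```

Here result holds a single form equivalent to (if (< n 2) 1 (* n (fact (- n 1)))).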
In these systems languages define grammars, which may extend non-terminals found in other grammars. When languages are used together, the productions of the various grammars are composed to form a single grammar that is used to parse the program. Note that these approaches often require that all language extensions used in a given parse be known up-front in order to form the composed grammar.
- Parsing Composed Grammars with Language Boxes. Section 2, "Parsing Composed Grammars", briefly summarizes some of the challenges with general grammar composition.
- Parsing Reflective Grammars
- Tree Notation Grammar Language
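A rough sketch of what composing grammars means, assuming grammars are represented as Python dictionaries from non-terminal names to lists of productions. The representation and the two example grammars are hypothetical and far simpler than what the systems above use.

```python
def compose(*grammars):
    """Union the productions of several grammars, keyed by non-terminal."""
    composed = {}
    for grammar in grammars:
        for nonterminal, productions in grammar.items():
            merged = composed.setdefault(nonterminal, [])
            for production in productions:
                if production not in merged:
                    merged.append(production)
    return composed

# Hypothetical base language.
base = {
    "expr": [["term"], ["expr", "+", "term"]],
    "term": [["NUMBER"]],
}

# Hypothetical extension: adds a production to the existing non-terminal
# "term" and introduces a new non-terminal "query".
query_ext = {
    "term": [["query"]],
    "query": [["SELECT", "NAME", "FROM", "NAME"]],
}

composed = compose(base, query_ext)
```

Because the parser is derived from the composed dictionary as a whole, every extension has to be supplied to compose before parsing starts, which is the up-front requirement noted above.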
Macros define PEGs, which are composed to determine the parser for a given file.
SDF is a DSL for defining general context-free grammars. When composing grammars, additional rules can be given to resolve ambiguities resulting from the composition.
Elixir lies somewhere between the token trees approach and a fixed grammar. Many forms build their syntax from generic elements. For example, the syntax of the list comprehension
for n <- [1, 2, 3, 4], do: n * n
is composed of a call (for), an infix operator (<-), and a keyword argument (do:). The name for is not part of the parser's grammar. However, some forms such as anonymous functions (fn) have more specialized syntax built into the parser.
mflatt commented:
Mentioned on the mailing list: https://elixir-lang.org/
- Remix - a revised version of Racket
- https://github.com/jeapostrophe/remix
AlexKnauth commented:
Another thing worth looking at is the Parinfer editor extension, and its line invariant for converting between indentation and paren structure in both directions.
Mathematica language
The Mathematica language uses a non-s-expression syntax but nevertheless feels lispy to use. An application of a function f to arguments x and y is written f[x,y]. This decision leaves parentheses free for grouping. Using FullForm, TraditionalForm, StandardForm and InputForm one can convert between representations of an expression. The full form resembles s-expressions, writing applications as F[x,y] and lists as List[x,y].
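The resemblance between full-form applications and s-expressions can be sketched with a small Python converter. It is a simplified illustration that handles only names, numbers, square brackets, and commas; the function name to_sexpr is made up for this example and is not part of Mathematica.

```python
import re

def to_sexpr(text):
    """Convert a full-form style application such as F[x, y] to nested lists."""
    tokens = re.findall(r"\w+|[\[\],]", text)

    def parse():
        head = tokens.pop(0)
        # A head may be applied repeatedly, as in f[x][y].
        while tokens and tokens[0] == "[":
            tokens.pop(0)
            args = []
            while tokens[0] != "]":
                args.append(parse())
                if tokens[0] == ",":
                    tokens.pop(0)
            tokens.pop(0)  # drop the closing bracket
            head = [head, *args]
        return head

    return parse()

tree = to_sexpr("Plus[x, Times[2, y]]")
```

The resulting nested list has the same shape as the s-expression (Plus x (Times 2 y)): the head moves inside the brackets and the commas disappear.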
LiberalArtist commented:
@lexi-lambda's Hackett has an infix syntax: https://github.com/lexi-lambda/hackett/blob/8e4e0e904ac37df58b8c8ef29c0f94ad4151246f/hackett-doc/scribblings/hackett/guide.scrbl#L251 (Link to the Scribble source)
pschmied commented:
The Julia language is quite lispy and I believe achieves this in part with a ~Scheme dialect called femtolisp:
- https://github.com/femtolisp
- https://www.julialang.org/
- https://docs.julialang.org/en/v1/manual/metaprogramming/index.html
rocketnia commented:
The Arc Forum collected a list here a while back: https://sites.google.com/site/arclanguagewiki/more/list-of-languages-with-s-expression-sugar
pschmied commented:
I don’t know if it’s germane, but some languages have gone the other way—implementing an S-expr surface language atop another language:
rocketnia commented:
Another related vein of prior art, which you're no doubt aware of: The idea of "language-oriented programming," especially combined with Python-ish syntax, is something I associate with language workbenches.
This section includes references on general parsing techniques. These don't directly address the integration of parsing with macros, but they're relevant to understanding some of the approaches above.
Top Down Operator Precedence - Vaughan R. Pratt, Massachusetts Institute of Technology, 1973
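For readers unfamiliar with the technique, the following is a minimal sketch of top-down operator precedence parsing in Python. The operator table, the binding power values, and the prefix-list output format are assumptions chosen for illustration and do not reflect how Honu or Star implement it. Empty input is not handled.

```python
import re

# Hypothetical binding powers; a higher number binds more tightly.
BINDING_POWER = {"+": 10, "-": 10, "*": 20, "/": 20, "^": 30}
RIGHT_ASSOC = {"^"}
PREFIX_MINUS_POWER = 25  # tighter than "*", looser than "^"

def tokenize(text):
    return re.findall(r"\d+|[A-Za-z_]\w*|[-+*/^()]", text)

def parse(tokens, min_bp=0):
    """Parse tokens (consumed from the front) into nested prefix-form lists."""
    tok = tokens.pop(0)
    # Prefix position: a group, a prefix operator, or an atom.
    if tok == "(":
        left = parse(tokens, 0)
        if not tokens or tokens.pop(0) != ")":
            raise SyntaxError("expected ')'")
    elif tok == "-":
        left = ["neg", parse(tokens, PREFIX_MINUS_POWER)]
    else:
        left = tok
    # Infix position: keep extending `left` while the next operator
    # binds more tightly than the context we were called from.
    while tokens and tokens[0] in BINDING_POWER:
        op = tokens[0]
        bp = BINDING_POWER[op]
        if bp <= min_bp:
            break
        tokens.pop(0)
        right = parse(tokens, bp - 1 if op in RIGHT_ASSOC else bp)
        left = [op, left, right]
    return left

tree = parse(tokenize("1 + 2 * 3 - 4"))
```

The whole grammar lives in the binding power table, so adding an operator means adding a table entry rather than a new grammar rule, which is part of what makes the technique attractive for extensible syntax.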