The document summarizes the design and implementation of a new programming language called newt. Key points include:
- newt aims to provide tight feedback loops through static typing and fast failure on semantic errors. It has succinct syntax while avoiding surprising behavior.
- The language is implemented in C++ and includes features like primitive types, arrays, records, functions, and automated testing.
- Challenges included generalizing type specifiers and making functions truly first-class. Future work includes collections, polymorphism, and syntax sugar.
Tools for the Toolmakers
1. Tools for the Toolmakers
Designing a Programming Language
2. Please Ask Questions (hold the rotten fruit)
● Tried to make an accessible presentation
● Don't know what the audience doesn't know
3. 0. About Me (and My Perspective)
● Back-to-school senior
● 4 years in industry with a large-scale C#/WPF code base, developing
computational geometry, CAD, and CAM applications and solving
optimization problems.
● Rails and Android apps here and there
4. 1. Pain Points
Programming languages are a pain sometimes
"Industry-strength" languages: C++/Java/C#
"Scripting" languages: Python, Ruby, Bash
Functional programming languages: Haskell
All of the above are brilliant, useful, and viable languages; this presentation
does not intend to denigrate or belittle them
5. Pain Points: Syntactic Verbosity
Java (5 LOC, 120 chars)
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello, World");
    }
}
Python 3.5 (1 LOC, 22 chars)
print("Hello, World")
Note: sometimes verbosity aids readability
6. Pain Points: Executive Verbosity
Java
$ javac HelloWorld.java
$ java HelloWorld
Python 3.5
$ python3 hello_world.py
Build tools (e.g. make) mitigate this problem somewhat, but introduce their own
complexity
7. Pain Points: Runtime Type Failures
Python
>>> 1 in ["1", "2", "3"]
False
Issue is kind of obvious here, but obfuscated by function calls and modules in
production code
A Youtuber I talked with confirmed that one frequently ends up re-inventing a
type system with unit tests
The Python community is working on this (see type hints:
https://www.python.org/dev/peps/pep-0484/)
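A minimal sketch of how PEP 484 type hints expose this kind of bug. The function name `contains_id` is hypothetical, and it is assumed that a static checker such as mypy is run separately, because the Python interpreter itself does not enforce annotations.

```python
from typing import List


def contains_id(needle: str, haystack: List[str]) -> bool:
    """Return True if needle is one of the strings in haystack."""
    return needle in haystack


ids = ["1", "2", "3"]

# Correct call: both sides are strings.
found = contains_id("1", ids)

# Mistyped call: the interpreter still runs it and quietly answers False.
# A static checker would report the int argument as incompatible with str.
missed = contains_id(1, ids)
```

The annotations move the failure from "a wrong answer at runtime" to "a report before the program runs", which is the feedback loop the slides argue for.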
8. Pain Points: More Runtime Type Failures
Ruby
def is_thing(arg)
  arg == "thing"
end
# The next statement evaluates to false: our function didn't check the type of its argument.
# As with Python, this issue is obfuscated by layers of indirection.
is_thing(["thing"])
def do_thing(a)
  return a.specific_member # runtime failure if the member doesn't exist
end
9. Pain Points: Immutable Boilerplate
Immutability and constness are recommended defaults, but performing transforms
on immutable record types can be very verbose:
class Point {
public:
    Point(const int x, const int y);
    const Point* WithX(const int x) const { return new Point(x, m_y); }
    const Point* WithY(const int y) const { return new Point(m_x, y); }
private:
    const int m_x;
    const int m_y;
};
10. Pain Points: Separate Declaration and Definition
Point.h
class Point {
public:
    Point(const int x, const int y);
    const Point* WithX(const int x) const;
    const Point* WithY(const int y) const;
private:
    const int m_x;
    const int m_y;
};
11. Pain Points: Separate Declaration and Definition
Point.cpp
#include "Point.h"
Point::Point(const int x, const int y) : m_x(x), m_y(y) { }
const Point* Point::WithX(const int x) const { return new Point(x, m_y); }
const Point* Point::WithY(const int y) const { return new Point(m_x, y); }
Multiple signatures that must be kept in sync
Tooling can help (Eclipse's Implement Method feature is a life-saver), but we shouldn't rely on tooling
if we don't have to
Somewhat poetically, couldn't fit the contents of both files on a single slide
12. Pain Points: Functional Language Syntax
String indexing in Haskell:
ghci> "Steve Buscemi" !! 6
'B'
Snippet from my xmonad.hs (is that a TIE fighter?):
main = do
    xmonad $ gnomeConfig {
          layoutHook = smartBorders $ layoutHook gnomeConfig
        , workspaces = myWorkspaces
        , manageHook = myManageHook <+> manageHook defaultConfig
        } `additionalKeys` myKeys `removeKeys` disabledKeys
13. 2. Principles and Observations
As creators of software, automation is
what we do.
Examples of things that aren't automated:
Having to remember the types of
function parameters or look them up
in the docs
Writing code that validates the types of
function arguments
Writing unit tests to do type validation
14. Principles and Observations, the Continuation
Tighter feedback loops allow us to deliver better solutions faster
"Feedback loop" encompasses everything necessary to create a solution and be confident that it
works as intended
Authoring code
Semantic analysis
Unit testing, user testing, integration testing, etc
Requires succinct syntax and succinct build interface
Must identify defects as early as possible
Languages like Python initially give very tight feedback loops, then the loops get long
15. Principles and Observations
Conventions are important, but they shouldn't be astonishing
Astonishing behavior is inhumane, inducing stress and frustration
"Non-astonishing" usually means "familiar"
Follow established conventions where possible
Writing boilerplate code is frustrating and error-prone
Tools that generate code snippets are treating the symptoms, not the cause
16. Let's End the Pain
We can do better!
Or can we?
Ideas that sound good in principle can fall over when implemented
Ideas that are simple to describe in English can be hard to describe in code
17. Validation
Can we create a language that:
1. Consistently provides tight feedback loops
a. "Fails fast" and fails loudly
2. Has succinct syntax and grammar
3. Is readable and unastonishing
a. Doesn't violate existing conventions except where necessary
18. Failing Fast: Basic Definitions
Type
A shorthand for the properties of something in memory (e.g. an object)
and the manner in which we may interact with it.
Dynamically Typed Language
Variables are not associated with a type, and may be assigned a value of
any type. Values are generally typed.
Statically Typed Language
Variables are associated with a type, and may only be assigned values of
its type, or the type's subtypes. Typecasting is a hole in a statically typed language
(widening conversions are also questionable)
19. Failing Fast: Syntax, Grammar, Semantics
Symbol
A sequence of characters
Syntax
Legal sequences of symbols, usually described by a formal grammar
Semantics
The meaning of the syntax
Frequently analyzed without mathematical formality; type validation is part
of semantic analysis
20. Failing Fast: Semantic Analysis
Source Code ->
Lexer ->
Parser ->
Abstract Syntax Tree ->
Code Emission (or Interpretation) ->
Testing (hopefully) ->
Production
"Failing fast": push detection of defects as far up this stack as possible
Dynamically typed languages can't do semantic analysis until the Testing stage
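The stages above can be sketched for a toy expression language (int and string literals joined by `+`). This is an illustration in Python, not newt's C++/Flex/Bison implementation; every name here (`lex`, `parse`, `check`, `evaluate`, `run`) is hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Union

# Stage 1: lexer. Recognizes integers, double-quoted strings, and '+'.
TOKEN_RE = re.compile(r'\s*(?:(\d+)|"([^"]*)"|(\+))')


def lex(source: str) -> List[tuple]:
    tokens, pos, source = [], 0, source.rstrip()
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"cannot tokenize input at offset {pos}")
        number, string, _plus = match.groups()
        if number is not None:
            tokens.append(("int", int(number)))
        elif string is not None:
            tokens.append(("string", string))
        else:
            tokens.append(("+", "+"))
        pos = match.end()
    return tokens


# Stage 2: parser output, an abstract syntax tree.
@dataclass(frozen=True)
class Literal:
    type_name: str
    value: Union[int, str]


@dataclass(frozen=True)
class Add:
    left: "Node"
    right: "Node"


Node = Union[Literal, Add]


def parse(tokens: List[tuple]) -> Node:
    """Parse `operand (+ operand)*` into a left-leaning tree."""
    if not tokens or tokens[0][0] == "+":
        raise SyntaxError("expected an operand")
    node: Node = Literal(*tokens[0])
    i = 1
    while i < len(tokens):
        if tokens[i][0] != "+" or i + 1 >= len(tokens) or tokens[i + 1][0] == "+":
            raise SyntaxError("expected '+' followed by an operand")
        node = Add(node, Literal(*tokens[i + 1]))
        i += 2
    return node


# Stage 3: semantic analysis. Computes static types and records errors.
def check(node: Node, errors: List[str]) -> str:
    if isinstance(node, Literal):
        return node.type_name
    left, right = check(node.left, errors), check(node.right, errors)
    if "error" in (left, right):
        return "error"  # already reported further down the tree
    if left != right:
        errors.append(f"cannot add {left} and {right}")
        return "error"
    return left


# Stage 4: interpretation, reached only when analysis found no errors.
def evaluate(node: Node):
    if isinstance(node, Literal):
        return node.value
    return evaluate(node.left) + evaluate(node.right)


def run(source: str):
    tree = parse(lex(source))
    errors: List[str] = []
    check(tree, errors)
    return ("rejected", errors) if errors else ("ok", evaluate(tree))
```

Here `run('1 + "a"')` is rejected by `check` before `evaluate` is ever called, so the defect is caught one stage earlier than it would be in a dynamically typed interpreter.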
21. Readability
Sometimes conflicts with succinctness (see Game of Life example)
Sometimes in tension with verbosity (see Java's "Hello, World!")
Existing conventions shouldn't be violated without good reason
Example: parentheses are deeply associated with function invocation
22. The Language: newt
Derived from gpl, the domain-specific language (DSL) developed in Dr. Tyson
Henry's Compilers course
No geometry types or animation blocks
By extension, no tests for geometry types; all other tests are left intact, and are passing
No statement terminators (that is, no semi-colons)
More detailed error reporting, including column numbers
Generalized build system
Explicit errors
Representative--not exhaustive--set of features implemented for purposes of
23. Implemented Functionality
Primitive Types (bool, int, double, string)
Arrays (dynamically sized, multi-dimensional)
Record Types ("structs")
can be marked read-only
Basic Flow Control and Logic Operators
Functions
Recursive functions not implemented due to time constraints, but no implementation roadblocks
are known
24. Design Philosophy
Favor immutability and constness wherever possible
Diverged from this by allowing identifiers to be mutable by default; would like to change this
Syntax should require as little text entry as possible, without affecting readability
No significant whitespace, however, because it makes code refactoring harder to automate
Function declaration syntax sacrifices succinctness to aid readability
Errors are serious business
Nothing is executed until semantic analysis is complete
Nothing is executed if semantic analysis yields any errors
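The last two rules can be sketched as a two-phase driver. This is a Python illustration under stated assumptions: `PrintStatement`, `analyze`, `execute`, and `run` are hypothetical names, and the rule that print accepts only strings is invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class PrintStatement:
    line: int
    argument: object  # the literal being printed

    def analyze(self) -> Optional[str]:
        """Return an error message, or None if the statement is valid."""
        if not isinstance(self.argument, str):
            kind = type(self.argument).__name__
            return f"line {self.line}: print expects a string, got {kind}"
        return None

    def execute(self, output: List[str]) -> None:
        output.append(str(self.argument))


def run(program: List[PrintStatement]) -> Tuple[List[str], List[str]]:
    """Return (output, errors); output stays empty if any error was found."""
    # Phase 1: analyze the whole program and collect every error.
    errors = [m for m in (s.analyze() for s in program) if m is not None]
    if errors:
        return [], errors
    # Phase 2: execute only when analysis is clean.
    output: List[str] = []
    for statement in program:
        statement.execute(output)
    return output, []


good = run([PrintStatement(1, "hello"), PrintStatement(2, "world")])
bad = run([PrintStatement(1, "hello"), PrintStatement(2, 42)])
```

In the `bad` program the first statement is valid, yet it produces no output, because the error on line 2 is found before anything runs.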
25. Notable Implementation Details
Object-oriented, written in C++
Mostly C++98, with a few C++11 constructs like auto
Favor immutability and constness, all the way down
Reentrant parser
Information about statements and expressions stored in the corresponding AST for error reporting
No implicit state (e.g. global variables) in the runtime
Vital for keeping execution state organized, particularly during function calls
Execution state captured in ExecutionContext objects that are passed around as needed
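A rough Python sketch of the idea behind passing execution state explicitly. newt itself is written in C++; only the name `ExecutionContext` comes from the slides, while the methods `define` and `lookup` and the helper `call_function` are hypothetical.

```python
from typing import Any, Callable, Dict, Optional


class ExecutionContext:
    """A symbol table with an optional link to the enclosing context."""

    def __init__(self, parent: Optional["ExecutionContext"] = None) -> None:
        self._symbols: Dict[str, Any] = {}
        self._parent = parent

    def define(self, name: str, value: Any) -> None:
        self._symbols[name] = value

    def lookup(self, name: str) -> Any:
        # Search this context first, then each enclosing context in turn.
        context: Optional[ExecutionContext] = self
        while context is not None:
            if name in context._symbols:
                return context._symbols[name]
            context = context._parent
        raise NameError(name)


def call_function(
    body: Callable[[ExecutionContext], Any],
    defining_context: ExecutionContext,
    **arguments: Any,
) -> Any:
    """Run body in a fresh context whose parent is where the function was defined."""
    local = ExecutionContext(parent=defining_context)
    for name, value in arguments.items():
        local.define(name, value)
    return body(local)


top = ExecutionContext()
top.define("offset", 10)

# The body sees its own argument and the variable it closes over.
result = call_function(
    lambda ctx: ctx.lookup("value") + ctx.lookup("offset"), top, value=5
)
```

Because the argument `value` lives only in the per-call context, it never leaks into `top`, which is the kind of bookkeeping that gets hard to follow when state is global.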
26. More Notable Implementation Details
Build interface is simple (requires Flex and Bison):
$ make all
Automated testing framework is simple:
$ make test
Very thin parser file compared to gpl
913 LOC, reduced from 1865
Semantic analysis is done in ordinary C++ code
Decision motivated primarily by tooling: Bison files aren't well-supported by my IDE of choice
Also a good separation of concerns: semantic analysis and parsing aren't the same thing and
don't mix well
27. Variables
Strongly typed
Distinct declaration and assignment operators:
a :int= 42 #declaration
a = 13 #assignment
Syntax motivated by "fail early" philosophy, variable shadowing, and first-class
functions (more on this later)
Type can be inferred:
name := "John Doe" #"name" will be of type 'string'
age := 42.5 #"age" will be of type 'double'
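A minimal sketch of why separate declaration and assignment operators help fail early. `SymbolTable` and its methods are hypothetical, and two simplifications are assumed: redeclaring a name in the same scope is an error, and there are no widening conversions.

```python
from typing import Any, Dict, Optional, Tuple

# Map Python value types to the newt primitive names from the slides.
PRIMITIVES = {bool: "bool", int: "int", float: "double", str: "string"}


class SymbolTable:
    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[str, Any]] = {}  # name -> (type, value)

    def declare(self, name: str, value: Any, type_name: Optional[str] = None) -> None:
        """Model `name :type= value`; with type_name omitted, model `name := value`."""
        if name in self._entries:
            raise NameError(f"'{name}' is already declared")
        inferred = PRIMITIVES[type(value)]
        declared = type_name or inferred
        if declared != inferred:
            raise TypeError(f"cannot initialize {declared} '{name}' with {inferred}")
        self._entries[name] = (declared, value)

    def assign(self, name: str, value: Any) -> None:
        """Model `name = value`: the name must exist and the types must match."""
        if name not in self._entries:
            raise NameError(f"'{name}' is not declared")
        declared, _ = self._entries[name]
        actual = PRIMITIVES[type(value)]
        if declared != actual:
            raise TypeError(f"cannot assign {actual} to {declared} '{name}'")
        self._entries[name] = (declared, value)

    def type_of(self, name: str) -> str:
        return self._entries[name][0]

    def value_of(self, name: str) -> Any:
        return self._entries[name][1]


table = SymbolTable()
table.declare("a", 42, "int")  # a :int= 42
table.assign("a", 13)          # a = 13
table.declare("age", 42.5)     # age := 42.5, inferred as double
```

With a single `=` operator, a misspelled name on the left-hand side would silently create a new variable; with two operators, `assign` can reject it as undeclared.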
28. Flow Control
a := 15
if (a > 12) {
print("The world is flat.")
} else {
print("The rules no longer apply.")
}
Built to be as un-astonishing as possible to a C/C++/Java programmer
29. Arrays
arr:int[] #array of ints
Brackets go with base type (departure from C/C++)
Dynamically sized
Static sizes seem possible because declaration operator is distinct from
assignment operator (this syntax is a recent change)
Inserting beyond the end of the array autofills any non-existent indices with
default values (this feels kind of weird)
Multi-dimensional
Array literal syntax not implemented, but it's quite feasible to do so
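The autofill behavior described above can be sketched in a few lines of Python. `insert_at` is a hypothetical helper, and a default value of 0 for int is an assumption; the slides only say that every type has a default.

```python
from typing import List


def insert_at(array: List[int], index: int, value: int, default: int = 0) -> None:
    """Store value at index, padding any missing indices with default."""
    if index < 0:
        raise IndexError("negative index")
    # Grow the array until index is a valid position.
    while len(array) <= index:
        array.append(default)
    array[index] = value


arr: List[int] = []
insert_at(arr, 3, 7)  # indices 0..2 are filled with the default
```

After the call, `arr` has four elements even though only one was written, which is why the behavior "feels kind of weird": a typo in an index silently grows the array.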
30. Record Types ("structs")
readonly struct Point {
x:int
y:int
}
p1 :Point= @Point with { x = 20, y = 30 }
p2 := p1 with { y = 56 + fun_call() }
p3 := p2 #create an alias for p2
Declaration maps identifiers to types; an instance maps identifiers to values
Not stored contiguously in memory
"with" syntax generates instances, either from a default value or another
instance (reminder: all types have default values)
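For comparison, a rough Python analogue of the `with` syntax, using a frozen dataclass and `dataclasses.replace`. This is not how newt implements it; the default of 0 for each field is an assumption, and the `fun_call()` from the slide is left out.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)  # frozen plays the role of `readonly`
class Point:
    x: int = 0  # assumed default value
    y: int = 0


p1 = replace(Point(), x=20, y=30)  # p1 :Point= @Point with { x = 20, y = 30 }
p2 = replace(p1, y=56)             # p2 := p1 with { y = 56 }
p3 = p2                            # p3 := p2, an alias for p2
```

Each `replace` call builds a new instance and leaves the original untouched, so the per-field `WithX`/`WithY` methods from the C++ pain point are unnecessary.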
31. Functions
First-class and higher-order functions ("functions as data")
Functions are assignable to variables
Functions can take functions as arguments
True first-class functions that close over their contexts
Not just function pointers
No statement terminators in combination with the chosen function type specifier
syntax gives rise to some complexity: fun := (a:int, b:double) -> int {} instead
of int fun(int a, double b) {}.
6 additional characters
32. Why First-class and Higher-order Functions?
They're shiny
Necessary for functional programming
Useful for
callbacks
generic sorting
map/reduce operations
33. Function Examples
#My First Function
add := (a:int, b:int) -> int {
return a + b
}
result := add(2, 3)
#Higher order function example (callback)
do_op := (value:int, callback: (int) -> string) -> int {
#do some processing here, possibly in the background
callback(value)
}
#some anonymous function action thrown in for free
result := do_op(3, (a:int) -> string { print("Good job " + a + "!") })
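For readers more familiar with Python, roughly the same ideas look like this. It is a loose rendering, not a line-for-line translation: `do_op` returns the callback's string here, and `make_greeter` is a hypothetical addition that shows a function closing over its context.

```python
from typing import Callable


def add(a: int, b: int) -> int:
    return a + b


def do_op(value: int, callback: Callable[[int], str]) -> str:
    # The callback is an ordinary value that do_op can invoke.
    return callback(value)


def make_greeter(greeting: str) -> Callable[[int], str]:
    # The returned function closes over `greeting`, so it is more than
    # a bare function pointer: it carries part of its defining context.
    def greet(a: int) -> str:
        return f"{greeting} {a}!"

    return greet


result = add(2, 3)
message = do_op(3, make_greeter("Good job"))
```

The type `Callable[[int], str]` corresponds to the newt type specifier `(int) -> string` used for the `callback` parameter.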
34. Automated Testing
Extensive automated test suite (~330 individual tests as of this writing)
Derived from gpl's test suite
For every bug, a test
Useful for testing semantics that cannot be expressed in the C++ type
system
Proved vital when developing new features in a way that didn't break existing
code
36. It's Better
I believe newt offers strong evidence that statically typed languages with
lightweight, fail-fast syntax and grammar are viable
Loss of succinctness in favor of tighter feedback loops as projects scale
Room for improvement here
Function declaration syntax
Function overloading and polymorphism
Distinct declaration and assignment syntax will probably stub toes (it still stubs
mine)
Dogfooding required
37. Caveats
Project has been active for 6 months
Compared to years for Ruby and Python and decades for Perl, Bash, and C++.
Comparing the number of person-hours is even more amusing
Rome wasn't built in a day, and programming languages aren't built in half a
year
38. Lessons
Large projects are collaborative efforts. You can't do it alone (but nobody will do
it for you either)
Build a good foundation. Get the tooling right. Get the environment right.
Automate all the things, and don’t repeat yourself (without damn good reason)
The following tools are non-optional:
Version control. You cannot maintain the entire project history in your head, and it’s a waste
of cognitive resources to try
Automated regression testing. You cannot maintain the entire state of a non-trivial project in
your head, so you cannot completely validate a change or enhancement without
automated testing.
39. Lessons
The complexity in any nontrivial system is a result of the interaction of its parts
Example: C-style declaration syntax doesn’t mix with first-class functions that share symbols with
function invocation
Separation of concerns is a real thing
Doing semantic analysis in the parser can make it difficult to reason about the code
The implementation of a well-designed language is so regular it’s almost boring.
Patterns are everywhere, but exploiting those patterns can be time-consuming
40. Lessons
Statement terminators (e.g. semicolons) make parsing easier
But I'm glad newt doesn't have them, because they're not very succinct
Using enums for type specifiers isn't very extensible
dynamic_cast is usually a code smell, but sometimes necessary
Get the parser working correctly before writing the implementation
Linked lists are an interesting way of expressing immutable ASTs, but they
aren't always simple
For example, newt generates a linked list of parameters, but it has to be reversed before
processing
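A small Python sketch of the reversal problem. It assumes the list is built by prepending each parameter as it is parsed, which is a common way to grow an immutable list; the slides do not say how newt's grammar actually builds it. `ParamList`, `prepend`, `reverse`, and `to_list` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ParamList:
    """An immutable singly linked list of parameter names."""
    head: str
    tail: Optional["ParamList"] = None


def prepend(params: Optional[ParamList], name: str) -> ParamList:
    # Prepending never modifies the existing list; it wraps it.
    return ParamList(name, params)


def reverse(params: Optional[ParamList]) -> Optional[ParamList]:
    result: Optional[ParamList] = None
    while params is not None:
        result = ParamList(params.head, result)
        params = params.tail
    return result


def to_list(params: Optional[ParamList]) -> List[str]:
    names = []
    while params is not None:
        names.append(params.head)
        params = params.tail
    return names


# Parsing "a, b, c" left to right, prepending each name as it is seen.
parsed: Optional[ParamList] = None
for name in ["a", "b", "c"]:
    parsed = prepend(parsed, name)

in_source_order = reverse(parsed)
```

Prepending is cheap and keeps every node immutable, but the result comes out in reverse source order (`c`, `b`, `a`), hence the extra `reverse` step before processing.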
44. Notable Challenges
Minimizing the build system
Basic Assignment (widening conversions are painful)
Arrays
Generalizing type specifiers
Making functions first-class
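One way to picture the type-specifier problem: an enum can list `int` and `string`, but it cannot enumerate open-ended compound types such as `int[][]` or `(int) -> string`. The Python sketch below models specifiers as small composable objects instead. It illustrates the idea only; the class names are hypothetical and do not mirror newt's C++ classes.

```python
from dataclasses import dataclass
from typing import Tuple


class TypeSpecifier:
    """Base class for all type specifiers."""

    def describe(self) -> str:
        raise NotImplementedError


@dataclass(frozen=True)
class PrimitiveType(TypeSpecifier):
    name: str

    def describe(self) -> str:
        return self.name


@dataclass(frozen=True)
class ArrayType(TypeSpecifier):
    element: TypeSpecifier  # any specifier, including another array

    def describe(self) -> str:
        return self.element.describe() + "[]"


@dataclass(frozen=True)
class FunctionType(TypeSpecifier):
    parameters: Tuple[TypeSpecifier, ...]
    result: TypeSpecifier

    def describe(self) -> str:
        params = ", ".join(p.describe() for p in self.parameters)
        return f"({params}) -> {self.result.describe()}"


INT = PrimitiveType("int")
STRING = PrimitiveType("string")

grid = ArrayType(ArrayType(INT))          # int[][]
callback = FunctionType((INT,), STRING)   # (int) -> string
```

Because the dataclasses compare field by field, two independently built `ArrayType(INT)` values are equal, which gives a simple structural check for type compatibility.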
45. Wishlist
Collections (sets, array literals, etc)
Polymorphic functions
Option types
Syntactic sugar for structs
Syntax highlighter and code formatter
More versatile test suite