
Introduction

Welcome to the docs of Kalei, a toy language named after Kaleidoscope.

These docs are a way to publicly store language ideas and compiler implementation details, and to keep links to articles that concern both. Right now, many pages are empty or not clearly organized.

The most interesting pages are:

  • Pipeline (Compiler) to see what the current compiler pipeline looks like.
  • Future (Language) to check out some syntax/language ideas I would like to try.

Syntax

Expressions

Design

Short-circuiting operators

The `and` and `or` operators make it clearer that they act on control flow, as opposed to other character-based binary operators.

if condition and other_condition { }
if condition or other_condition { }

Structs

Design

enum Option<T> {
    Some(T),
    None,
}

enum Toto {
    Foo,
    Bar(uint),
    Baz { tata: uint }
}

struct Baz {
    content: uint,
}

// struct NamedTuple<T, U> {
//     foo: T,
//     bar: U,
// }

fn foo(option: Option) { }

enum Option { Foo, Bar }

fn main() {
    let maybe = .Some(10);
    match maybe {
        .Some(_) => {}
        .None => {}
        else => {}
    }

    let baz = Baz { content: 10 };
    let inner = baz.content;
    let _ { content } = &baz;

    foo(.Foo);
}

Traits

Rust-like traits.

Design

Definition

trait Foo {
    fn bar() uint;

    fn baz() {
        // snip
    }
}

Implementation

struct MyType;

impl Foo for MyType {
    // notice how there is no type in the impl block
    //
    // concern: against code locality
    fn bar() {
        // chosen by a fair roll of dice
        4
    }
}

Match

Design

Invocation

(find name)

match variable { <arms> }
match foo.method() { <arms> }

Postfix

let variable = Some(42);

variable.match { <arms> }
foo.method().match { <arms> }

foo.method().match { <arms> }.continue()

Arms

Small arrow syntax (languages)

<match> {
  123 -> {}
  else -> {}
}

Rust large arrow syntax

<match> {
  123 => {}
  _ => {}
}

no arrows

<match> {
  123 { 456 }
  _ {
    // -snip-
  }
}

Attributes

Design

  • Which character to introduce attribute? #, @

    @ clashes with bindings in match arms.

#attr
#attr("foo")
#attr{name = "c"}
fn item() {
    #rustfmt::skip
    let matrix = [
        1, 2, 3,
        4, 5, 6,
    ];

    #allow(clippy::placeholder_name)
    let foo = ();
}
  • Concern: Bare attributes (i.e. no [] to delimit) on expression can conflict

    #foo (1)
    #foo() 1
    
    #foo [1, 2, 3]
    #foo[] 1
    

    Currently, attributes have precedence.

    • Do we not have bare attributes? #foo is #foo()
    • Do we delimit the attribute?

Future

All the syntax here consists of thoughts on language design that may or may not be interesting to implement in a language. Most of it came from a quick thought, is rough, and needs to be bikeshedded.

Modern programming languages should optimize for readability.

Reading

Syntax Samples

Many of the samples shown here are usual programming constructs, but tried in a left-to-right (LTR) syntax. I would like to explore the LTR design space.

is operator

  • Motivation: LTR programming

    Would work especially well with path inference.

Equiv. (Rust) if let <pattern> = <expr>

let option = Some(42);
if option is Some(num) { }
if option.is Some(num) { }
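For comparison, Rust's prefix-position equivalent can be sketched as a small function (the function name is illustrative, not part of any proposal):

```rust
// Sketch: Rust's `if let` puts the pattern before the scrutinee;
// the proposed `is` operator flips this to scrutinee-first,
// which reads left to right.
fn first_or_zero(option: Option<u32>) -> u32 {
    if let Some(num) = option {
        num
    } else {
        0
    }
}
```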

not postfix operator

  • Motivation: LTR programming

Would replace the current not prefix operator.

if not vec.is_empty() { }
// becomes
if vec.is_empty().not { }
if vec.is_empty().not() { } // keep as a boolean method?

postfix deref operator

  • Motivation: LTR programming
let num = 10;
let myref = &num;
let mynum = myref.*;

bar.mac_call().*;
bar.field.*;

LTR Loop statements

let stmts = [];

let stmts_iter = stmts.iter();
while stmts_iter.next() is Some(stmt) { }

in stmts for stmt { }
in stmts the stmt { }

for stmts each stmt { }
for stmts be stmt { }

all stmts be stmt { }
loop stmts for stmt { }

stmts.for |stmt| { }
stmts.iter().enumerate().for |i, stmt| { }

LTR Trait implementation

for Struct impl Trait { }
Struct::impl Trait { }
Struct.impl Trait { }
Struct.impl { }

Postfix try control-flow operator

let res = Ok(98);
let num = res.?;
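Rust's `?` already behaves this way in postfix position; a sketch of the control flow `.?` would mirror:

```rust
// Sketch: postfix `?` unwraps an Ok value or early-returns the Err,
// which is the behavior the proposed `.?` mirrors.
fn double(res: Result<u32, String>) -> Result<u32, String> {
    let num = res?; // early-returns Err(e) on error
    Ok(num * 2)
}
```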

Type ascription

  • Motivation: better type inference ergonomics
// annotate functions as such
let myexpr = f() : uint;

let myvec = (
(1..=100)
.into_iter
()
.collect
() : Vec<_>
)
.into_iter
()
.map
((|x| x+1):fn(uint) -> uint);

Generics

Which syntax to adopt?

  • < > collide with comparison operators and need special handling (e.g. Rust’s turbofish)
fn forget[T](v: T);
fn substring['a](s: &'a str) &'a str;
fn substring(s: &str) &str;

fn foo<T>(arg: T) T {
    arg
}
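The `< >` collision that motivates Rust's turbofish can be sketched as (function name illustrative):

```rust
// Sketch of the ambiguity: in expression position, `Vec<u32>::new()`
// could parse as the comparisons `(Vec < u32) > ::new()`, so Rust
// requires the `::` disambiguator ("turbofish") `Vec::<u32>`.
fn make_empty() -> Vec<u32> {
    Vec::<u32>::new()
}
```

Bracket-based generics like `fn forget[T]` sidestep the special case, at the cost of clashing with indexing syntax.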

Path

Which syntax to adopt?

use std.arith.Plus; // can be confused with field access
use std/arith/Plus; // no
use std::arith::Plus;

Alternative keywords

My rationale for using verbs instead of nouns for keywords is that they are less likely to conflict with user variable names. On the other hand, it is difficult to find a wide range of verbs with similar semantics, or to avoid using one verb for too many things. Using one style or the other consistently would be more coherent.

  • Verb keywords:
    • def for values (functions, constants)
    • decl/dcl for type-level constructs (structs, enums, etc.)
    • let for local values (variables)
  • Noun keywords:
    • fn/func
    • type/struct/enum
    • var/val
    • cst/const
def func() { }
def constant = 42;

trait Add {
    // dcl → declare, instead of `type/struct/enum`
    dcl Output;
    // def → define, instead of `fn`
    def add(self, other: Self) Self::Output;
}

Callsite code

  • Motivation: Avoid excessive monomorphization
fn as_ref_str(s: impl AsRef<str>) {
    @callsite let s = s.as_ref();
    // snip
}

// alternative vvv?
fn into_string(s: can Into<String>) {
    let s = s.into();
    // snip
}

fn callsite() {
    let my_str = "...";

    as_ref_str(my_str);
    into_string(my_str);
    // instead of the following, to avoid generics
    as_ref_str(my_str.as_ref());
    as_ref_str(&my_str);
    into_string(my_str.into());
}
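The monomorphization cost being avoided can be hand-written in today's Rust: keep one non-generic body and do the conversion in a thin generic shim, which is roughly the pattern `@callsite` would automate (helper names are hypothetical):

```rust
// Sketch of the pattern `@callsite` would automate: only the thin
// shim is monomorphized per call type; the real body exists once.
fn as_ref_str_inner(s: &str) -> usize {
    s.len() // stand-in for the actual work
}

fn as_ref_str(s: impl AsRef<str>) -> usize {
    as_ref_str_inner(s.as_ref()) // conversion happens at the boundary
}
```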

Arbitrary prefixed string literals

This could integrate well with a good comptime system. Is it worth it?

comptime fn regex(lit: &str) { }
let regex = regex""
// could lower to
let regex = comptime { regex("") }

// e.g. c"im a cstr"

Enforce purity and capabilities

Enforce purity through effects, doc_effects, capability system or another system that suits.

Track

  • writes to stdio
  • calls to unsafe
  • hidden interaction with randomness (e.g. Rust HashMap)
  • allocations

This is one of the largest goals here.

Require value binding

Can be implemented simply as a lint. Acts as if everything were (Rust) #[must_use].
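For reference, Rust's opt-in version of this lint can be sketched as:

```rust
// Sketch: Rust's #[must_use] warns when a returned value is silently
// dropped; the proposal makes that the default for every value,
// with `let _ = ...` as the explicit discard.
#[must_use]
fn answer() -> u32 {
    42
}
```

Calling `answer();` without binding would emit the unused-result warning; `let _ = answer();` silences it.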

Rational number type

Motivation: experiment with floats as a last resort construct and work with rational numbers
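A minimal sketch of what exact rational arithmetic buys over floats (the `Rational` type and its API are hypothetical):

```rust
// Minimal exact rational type (names hypothetical). With f64,
// 0.1 + 0.2 != 0.3; with rationals, 1/10 + 2/10 == 3/10 exactly.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Rational {
    num: i64,
    den: i64,
}

fn gcd(a: i64, b: i64) -> i64 {
    if b == 0 { a.abs() } else { gcd(b, a % b) }
}

impl Rational {
    // assumes den > 0 for simplicity of the sketch
    fn new(num: i64, den: i64) -> Self {
        let g = gcd(num, den);
        Rational { num: num / g, den: den / g }
    }

    fn add(self, other: Self) -> Self {
        Rational::new(
            self.num * other.den + other.num * self.den,
            self.den * other.den,
        )
    }
}
```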

no semicolon in statements, functional syntax?

  • OCaml syntax?
fn foo(arg: Ty) ->
  a = make_foo();
  b = make_bar(a);
  ()

Concern: how to know whether the last value should be returned

  • depend on the type system? consider it a return if the type matches; seems robust to errors
fn boo(arg: Ty) {
    _ = make_blah()
}

Hot-reloading

Would be fun to play with that. Kind of like dotnet watch.

Path inference

enum InferKind {
    Infer,
    NoInfer,
}

fn foo(qrcode: InferKind);

foo(.Infer);

match InferKind.Infer {
    .Infer -> {}
    .NoInfer -> {}
}

Pipe operator/construct

Motivation: avoid nesting calls or let repetition

Question: how to tell where the arguments go

Related: See Uniform function call syntax

let value =
    compute_value(my_looooooooog_value_name)
    |> transform_value1()
    |> transform_value2()
    |> transform_value3();

// reuse closure syntax?
let value =
    compute_value(my_looooooooog_value_name)
    |> transform_value1
    |> |v| transform_value2(v)
    |> |v| transform_value3(v, 3);

// semantically equivalent to
let value = transform_value3(transform_value2(transform_value1(compute_value(my_looooooooog_value_name))), 3);
// which may format as
let value = transform_value3(
    transform_value2(transform_value1(compute_value(
        my_looooooooog_value_name,
    ))),
    3,
);

// or chaining the same name to avoid nesting while keeping clarity
let value = compute_value(my_looooooooog_value_name);
let value = transform_value1(value);
let value = transform_value2(value);
let value = transform_value3(value, 3);

Operators

@ pattern
| pattern, bit-or
& pattern, bit-and
_ pattern
~ pattern?, bit-not?

# attributes

% bitop? do not use for rem
* bitop
= bitop
+ bitop
- bitop
/ bitop

? unary try
! unary not

\ escape

'' char, lifetime?
"" string
`` ?

() delim value param
[] delim index
{} delim stmts, group
<> delim type param
, punct param
: type
; stmt end

. field

^ ???
$ raw ident?

struct Fields {
	foo: uint,
	$let: uint,
}

fields.$let

Effects

Reading

Samples

Name resolution

effect Resolve {
  fn resolve(name: Sym) &ref Ty;
}

fn main() {
	let items = <..>;
	let map = Map<Sym, Ty>;

	// I don't like the try/catch-like syntax for effects
	handle {
		items.for |item| {
			'resolve(item);
		}
	} with {
		// no type in effect handling
		fn resolve(name) {
			map.entry(name).match {
				// no long arrows
				.Occupied |ty| effect_continue &ty
				.Vacant || {
					// it's not in the handle ctx anymore???
					let ty = resolve_item(name);
					map.insert(name, ty);
					effect_continue &ty;
				}
			}
		}
	}
}

fn resolve_item(item: hir::Item) Ty \ Resolve {
	// translate hir to ty
	// ...
	let ty = 'resolve(item.subtype);
	// ...
	ty
}
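The memoizing handler above, stripped of the effect machinery, can be sketched in plain Rust with the map entry API (stand-in types; all names hypothetical):

```rust
use std::collections::HashMap;

// Sketch of the handler's memoization: resolve each name once,
// cache the result, and hand back the cached value on repeat
// requests. The effect resumption is replaced by a return value.
fn resolve_cached(
    map: &mut HashMap<String, u32>,
    name: &str,
    resolve_item: impl FnOnce(&str) -> u32,
) -> u32 {
    *map.entry(name.to_string())
        .or_insert_with(|| resolve_item(name))
}
```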

Capabilities

Reading

Samples

struct Env; // ZST. impl uses a global type
struct BasicArena { }

#explicit capability env = Env; // capability use-case
#implicit capability arena = BasicArena; // context use-case

fn std::env::var(s: AsRef<str>) String with Env {}

fn main() () \ {env} {
	let arena = BasicArena.new();
	use arena as Arena;

	foo()
		// pass some capabilities explicitly
		.with(Env)
}

fn foo() uint \ {arena, env} {
	let id = arena.alloc(1);
	let var = std::env::var("foo")?;
}

Metaprogramming

Reading

  • The Metaprogramming Dilemma - gingerBill (Odin)

    Defines a list of metaprogramming categories:

    • Introspection (and Reflection for OOP languages)
    • Compile Time Execution (CTE)
    • Template Programming
    • Macros (Textual and Syntactic)
    • Parametric Polymorphism (“Generics”)

    My goals are: generics, obviously; some kind of checked template programming that uses introspection; and VM/interpreted compile-time execution that is sandboxed and proven deterministic, but can still access things like the heap (Rust const eval is way too restrictive, with too many hacks).

  • Parametricity, or Comptime is Bonkers

    Though I have not used Zig much, I agree that their comptime is very free-form, and although all abstractions are leaky, it makes it too easy to depend on some construction.

    Maybe never allow compiler intrinsics to output information without a trait (except TypeId).

    // has to be the identity because T has no observable properties
    fn foo<T>(foo: T) T {}

    // as opposed to
    fn foo<T: Any>(value: &T) _ {} // we only care about opaque value identity, is used for type maps context in Rust
    fn foo<T: Shape>() _ {} // e.g. shows that it uses T for reflection
    fn foo<T: Layout>() _ {} // layout only gives information about size/alignment
  • facet - GitHub

Samples

How to balance between user power and good ergonomics.

Shape intrinsic

  • Motivation: do not depend on text/token manipulation macros
let shape = Shape::<Directions>::new().variants
// equals: let shape = &[Brush, Line, Rectangle]

Inline statements

  • Motivation: do not depend on text/token manipulation macros
struct Foo { bla: Boo, bar: Baz }

fn deserialize(s: &str) -> T {
    let result = Foo::default();
    inline for field in Shape::new(result).fields {
        field.value = deserialize(s);
    }
    result
}

inline if std.cfg.host.is_linux() {

} else {

}

Assignment reflection

[..] I very frequently find myself wanting to know the name I’m “about to be assigned to”.

Evelyn Wood, about language design

  • Motivation: in languages where you specify a lot of data inline (e.g. Nix), you often tend to have "name" = function("name", { options })

  • Concern: the value changes with an LSP rename. May be a source of confusion

  • Concern: makes the expression dependent on context → you cannot extract that expression somewhere else

let kalei = Source::new(`name);
// becomes
let kalei = Source::new("kalei");

Pipeline

The current front-end compiler pipeline looks like the following:

Input is an input, Bold is a step, Italic is an artefact.

-> is used by, => produces

  1. Source code -> Lexing & Parsing => AST

    Understand the form of the user source code.

  2. AST -> Item Collection, Item Resolution => NameEnvironment, ResolutionMap

    Go through each item and associate names of items to ast::NodeIds.

    It makes sense to collect before AST Lowering because

    • this information can be used during the lowering for path resolution
    • it is used to provide diagnostics on user code, and the AST is closer to what the user wrote
  3. AST, ResolutionMap -> AST Lowering => HIR

    We lower the AST to a flat structure (soon™) that is the HIR.

    HIR is much more usable in the context of a compiler.

  4. HIR -> Type Collection => TypeEnvironment

    Go through each item and compute their ty::EarlyItemTy.

    Once the collection phase is done, these are immediately resolved to ty::LateTys.

  5. HIR, NameEnvironment, TypeEnvironment -> Inference (per function) => InferResult

  6. HIR, InferResult -> Code Generation => Object
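The data flow of the six steps can be sketched as stub Rust code (every type and function name below is a placeholder, not the actual compiler API):

```rust
// Stub sketch of the pipeline's data flow; names are placeholders.
struct Ast;
struct ResolutionMap;
struct Hir;
struct TypeEnvironment;
struct InferResult;
#[derive(Debug, PartialEq)]
struct Object;

fn parse(_source: &str) -> Ast { Ast }                                // 1. lexing & parsing
fn collect_and_resolve(_ast: &Ast) -> ResolutionMap { ResolutionMap } // 2. item collection/resolution
fn lower(_ast: Ast, _res: &ResolutionMap) -> Hir { Hir }              // 3. AST lowering
fn collect_types(_hir: &Hir) -> TypeEnvironment { TypeEnvironment }   // 4. type collection
fn infer(_hir: &Hir) -> InferResult { InferResult }                   // 5. inference (per function)
fn codegen(_hir: &Hir, _inf: &InferResult) -> Object { Object }       // 6. code generation

fn compile(source: &str) -> Object {
    let ast = parse(source);
    let res = collect_and_resolve(&ast);
    let hir = lower(ast, &res);
    let _tys = collect_types(&hir);
    let inf = infer(&hir);
    codegen(&hir, &inf)
}
```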

Glossary

There are many core types that are used throughout this compiler. Here is a non-exhaustive list of them and their use.

Contexts

  • SessionCtx (scx)

    Holds information needed for the whole compiler session. Things such as options, diagnostics, debug output.

  • DiagnosticCtx (dcx)

    Contains all the diagnostic accumulation and emission related code.

  • TyCtx (tcx)

    Holds information needed for type inference, type check and their results.

IDs

  • session::Symbol

    Created when lexing files. It is an index to interned identifiers and strings.

  • session::Span and session::BytePos

    Used extensively throughout the different intermediate representations to show good diagnostics to the user.

  • ast::NodeId

    Created during the parsing step.

  • resolve::DefId

    Created during the collection step.

  • hir::NodeId

    Created during the lowering step.

    • hir::ExprId: they wrap normal NodeId but ensure that they point to an expression.
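The interning scheme behind session::Symbol can be sketched as follows (assumed details; the real type may differ):

```rust
use std::collections::HashMap;

// Minimal string interner sketch: a Symbol is just an index into a
// table of unique strings, so equality checks become integer compares.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Symbol(u32);

#[derive(Default)]
struct Interner {
    map: HashMap<String, u32>,
    strings: Vec<String>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> Symbol {
        if let Some(&idx) = self.map.get(s) {
            return Symbol(idx); // already interned: same index back
        }
        let idx = self.strings.len() as u32;
        self.map.insert(s.to_string(), idx);
        self.strings.push(s.to_string());
        Symbol(idx)
    }

    fn resolve(&self, sym: Symbol) -> &str {
        &self.strings[sym.0 as usize]
    }
}
```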

TyKinds

  • hir::TyKind

    Holds the item definition of a type.

  • ty::TyKind<InferKind, RefKind>

    Holds the concrete definition of a type.

    • ty::EarlyItemTy (ty::TyKind<NoInfer, DefId>)

    • ty::InferTy (ty::TyKind<Infer, NoRef>)

      same but keeps an extra field to store type variables in the inference step

    • ty::LateTy (ty::TyKind<NoInfer, NoRef>)

Parsing

Future

Flat AST storage

Some random idea of representation

type Foo { id: u32 }

lexes as this stream with token length

IDENT(4) IDENT(3) LBRA(1) IDENT(2) COLON(1) IDENT(3) RBRA(1)

which parses as this tree stream (braces are not represented in the stream)

{ NAME { NAME TY } FIELDS_END } ITEM_END
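The token-length stream above can be produced by a toy lexer; a sketch, assuming only identifiers, braces, and colons:

```rust
// Toy lexer emitting the (kind, byte length) stream described above,
// for the subset: identifiers, '{', '}', ':'.
#[derive(Debug, PartialEq, Eq)]
enum Tok {
    Ident(u32),
    LBra(u32),
    RBra(u32),
    Colon(u32),
}

fn lex(src: &str) -> Vec<Tok> {
    let mut out = Vec::new();
    let mut chars = src.char_indices().peekable();
    while let Some(&(i, c)) = chars.peek() {
        match c {
            '{' => { out.push(Tok::LBra(1)); chars.next(); }
            '}' => { out.push(Tok::RBra(1)); chars.next(); }
            ':' => { out.push(Tok::Colon(1)); chars.next(); }
            c if c.is_whitespace() => { chars.next(); }
            _ => {
                // identifier: consume alphanumerics/underscores,
                // record only the byte length, not the text
                let start = i;
                let mut end = i + c.len_utf8();
                chars.next();
                while let Some(&(j, c2)) = chars.peek() {
                    if c2.is_alphanumeric() || c2 == '_' {
                        end = j + c2.len_utf8();
                        chars.next();
                    } else {
                        break;
                    }
                }
                out.push(Tok::Ident((end - start) as u32));
            }
        }
    }
    out
}
```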

Type Checking

Reading

Code Generation

Future

Language Abstract Machine

One thing that helped me understand the concept of pointer provenance was to realise that each language acts in its own abstract machine. An abstract machine is emulated through an interpreter.

This is why we have language specifications and undefined behaviour. Specifications, among other things, define the semantics of this abstract machine.

It is easy to overlook the abstract machine because its model and behaviour can map closely to the underlying implementation. This is especially true for languages like Rust or C++ that are designed as systems programming languages. Whether the language is implemented as an interpreter or compiled down to machine code, you need to understand how this translation works to have better control over performance, or <…>. By the way, it is easy to argue that machine code is “just” a lower-level language that is in turn interpreted by the CPU.

Among the implementations of Rust, one is miri (MIR Interpreter).

This said, it would be nice to define an abstract machine for this language.

Extra references

Kaleidoscope