Parser & Lexer

This page describes how Luma transforms raw source code into a structured Abstract Syntax Tree (AST) using its lexer and parser components. These are foundational building blocks for Luma’s compilation process.

Lexer: Tokenizing the Input

The lexer (or tokenizer) is responsible for breaking source code into discrete tokens.

Key Responsibilities

  • Ignore whitespace and comments
  • Recognize identifiers, numbers, strings, and keywords
  • Support punctuation and operators: +, -, *, /, %, (, ), {, }, etc.
  • Emit errors on unrecognized characters

Example Input

x: int = 42

Corresponding Tokens

IDENT("x"), COLON, IDENT("int"), EQUAL, NUMBER("42")

Each token has:

  • Type (e.g., IDENT, NUMBER, EQUAL, etc.)
  • Literal value (e.g., "x", "42")
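As an illustration of the token shape described above, here is a minimal lexer sketch in Go. It is not Luma's actual lexer: the names `Token`, `TokenType`, and `Lex` are assumptions made for this example, and it covers only the subset needed for `x: int = 42` (no comments, strings, or operators).

```go
package main

import (
	"fmt"
	"unicode"
)

// TokenType and Token are hypothetical names for this sketch;
// Luma's real definitions may differ.
type TokenType string

const (
	IDENT  TokenType = "IDENT"
	NUMBER TokenType = "NUMBER"
	COLON  TokenType = "COLON"
	EQUAL  TokenType = "EQUAL"
)

// Token pairs a type with the literal text it was read from.
type Token struct {
	Type    TokenType
	Literal string
}

// Lex splits src into tokens, skipping whitespace and returning an
// error on the first unrecognized character.
func Lex(src string) ([]Token, error) {
	var tokens []Token
	runes := []rune(src)
	for i := 0; i < len(runes); {
		r := runes[i]
		switch {
		case unicode.IsSpace(r):
			i++
		case r == ':':
			tokens = append(tokens, Token{COLON, ":"})
			i++
		case r == '=':
			tokens = append(tokens, Token{EQUAL, "="})
			i++
		case unicode.IsLetter(r) || r == '_':
			// Identifier: a letter or underscore, then letters, digits, underscores.
			start := i
			for i < len(runes) && (unicode.IsLetter(runes[i]) || unicode.IsDigit(runes[i]) || runes[i] == '_') {
				i++
			}
			tokens = append(tokens, Token{IDENT, string(runes[start:i])})
		case unicode.IsDigit(r):
			// Integer literal: a run of digits.
			start := i
			for i < len(runes) && unicode.IsDigit(runes[i]) {
				i++
			}
			tokens = append(tokens, Token{NUMBER, string(runes[start:i])})
		default:
			return tokens, fmt.Errorf("unrecognized character %q at offset %d", r, i)
		}
	}
	return tokens, nil
}
```

In this sketch, `Lex("x: int = 42")` returns the five tokens listed above, and an input such as `x @ 1` returns an error when it reaches `@`.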

Parser: Constructing the AST

The parser transforms the stream of tokens into a tree of structured nodes that represent the code’s syntactic structure.

Core Constructs

Luma’s parser supports:

  • Variable declarations (VarDecl): x: int = 5
  • Function declarations (FuncDecl)
  • Binary expressions (BinaryExpr): a + b, x * (y + z)
  • Function calls (FunctionCallExpr): print("Hello")
  • String interpolation (InterpolatedExpr): "Hello ${name}"
  • Control structures: if, else

Example

total: int = x + y * 2

Parses into:

&ast.VarDecl{
    Name: "total",
    Type: &ast.SimpleType{Name: "int"},
    Value: &ast.BinaryExpr{
        Operator: "+",
        Left: &ast.LiteralExpr{Value: "x", Kind: "IDENT"},
        Right: &ast.BinaryExpr{
            Operator: "*",
            Left: &ast.LiteralExpr{Value: "y", Kind: "IDENT"},
            Right: &ast.LiteralExpr{Value: "2", Kind: "NUMBER"},
        },
    },
}

Precedence & Associativity

The parser uses a precedence-based approach for arithmetic and logical expressions.

Example:

a + b * c

Is parsed as:

a + (b * c)

The parser does this in parseBinaryExpr(precedence int), which uses each operator's precedence to decide how operations nest.
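One common way to implement a function with this signature is precedence climbing. The sketch below is an assumption about the general technique, not a copy of Luma's parser. The `Expr` type, the precedence table values, and the whitespace-separated token input are all hypothetical simplifications.

```go
package main

import "strings"

// Expr is a hypothetical minimal AST node: a leaf when Op is empty,
// otherwise a binary operation.
type Expr struct {
	Op          string
	Value       string
	Left, Right *Expr
}

// String renders the tree with explicit parentheses around every operation.
func (e *Expr) String() string {
	if e.Op == "" {
		return e.Value
	}
	return "(" + e.Left.String() + " " + e.Op + " " + e.Right.String() + ")"
}

// precedences maps operators to binding strength (assumed values;
// a higher number binds tighter).
var precedences = map[string]int{"+": 1, "-": 1, "*": 2, "/": 2, "%": 2}

// Parser walks a slice of whitespace-separated tokens.
type Parser struct {
	tokens []string
	pos    int
}

func (p *Parser) peek() string {
	if p.pos < len(p.tokens) {
		return p.tokens[p.pos]
	}
	return ""
}

func (p *Parser) next() string {
	tok := p.peek()
	p.pos++
	return tok
}

// parsePrimary parses an operand or a parenthesized sub-expression.
func (p *Parser) parsePrimary() *Expr {
	tok := p.next()
	if tok == "(" {
		e := p.parseBinaryExpr(0)
		p.next() // consume ")"
		return e
	}
	return &Expr{Value: tok}
}

// parseBinaryExpr consumes operators that bind tighter than precedence.
// Recursing with the operator's own precedence makes equal-precedence
// operators left-associative.
func (p *Parser) parseBinaryExpr(precedence int) *Expr {
	left := p.parsePrimary()
	for {
		op := p.peek()
		prec, ok := precedences[op]
		if !ok || prec <= precedence {
			return left
		}
		p.next()
		right := p.parseBinaryExpr(prec)
		left = &Expr{Op: op, Left: left, Right: right}
	}
}

// Parse is a convenience entry point for the sketch.
func Parse(src string) *Expr {
	p := &Parser{tokens: strings.Fields(src)}
	return p.parseBinaryExpr(0)
}
```

With this sketch, `Parse("a + b * c").String()` gives `(a + (b * c))`, and `Parse("a - b - c").String()` gives `((a - b) - c)`, which shows left associativity for operators of equal precedence.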

Type Awareness in Parsing

While Luma is dynamically flexible, the parser supports type-aware constructs:

  • Type declarations: x: int, y: float
  • Used during compilation for type coercion (e.g., int + float → float)
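The coercion rule can be sketched as a small helper that picks the result type of an arithmetic expression and wraps an operand in a Go conversion. The function names `ResultType`, `GoType`, and `Coerce` are hypothetical. The mapping of Luma's `float` to Go's `float64` is taken from the walkthrough on this page.

```go
package main

// ResultType returns the Luma type of a binary arithmetic expression:
// if either operand is a float, the result is a float.
// Hypothetical helper; only int and float are considered here.
func ResultType(left, right string) string {
	if left == "float" || right == "float" {
		return "float"
	}
	return "int"
}

// GoType maps a Luma type name to the Go type used in generated code.
func GoType(lumaType string) string {
	switch lumaType {
	case "int":
		return "int"
	case "float":
		return "float64"
	default:
		return lumaType
	}
}

// Coerce wraps a generated Go expression in a conversion to the target Luma type.
func Coerce(goExpr, target string) string {
	return GoType(target) + "(" + goExpr + ")"
}
```

For example, `Coerce("x", ResultType("int", "float"))` produces `float64(x)`.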

Debugging & Extending

Print Tokens

Enable debug output to print all tokens from the lexer.

Dump AST

Use internal helpers to output the AST tree for inspection.
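This page does not name the internal helpers, so the following is only a sketch of what an AST dump might look like. The `Node` interface, the `Literal` and `Binary` types, and the `Dump` function are all hypothetical stand-ins for Luma's real AST types.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical interface implemented by every AST node in this sketch.
type Node interface {
	Label() string
	Children() []Node
}

// Literal and Binary are simplified stand-ins for Luma's AST nodes.
type Literal struct{ Value string }

type Binary struct {
	Operator    string
	Left, Right Node
}

func (l Literal) Label() string    { return "Literal(" + l.Value + ")" }
func (l Literal) Children() []Node { return nil }
func (b Binary) Label() string     { return "BinaryExpr(" + b.Operator + ")" }
func (b Binary) Children() []Node  { return []Node{b.Left, b.Right} }

// Dump renders the tree rooted at n, one node per line,
// indenting two spaces per level of depth.
func Dump(n Node) string {
	var sb strings.Builder
	dump(&sb, n, 0)
	return sb.String()
}

func dump(sb *strings.Builder, n Node, depth int) {
	fmt.Fprintf(sb, "%s%s\n", strings.Repeat("  ", depth), n.Label())
	for _, child := range n.Children() {
		dump(sb, child, depth+1)
	}
}
```

Dumping the tree for `x + y * 2` with this sketch prints `BinaryExpr(+)` at the top level, with `Literal(x)` and `BinaryExpr(*)` indented beneath it, which makes the nesting easy to check by eye.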

Add New Expression Type

To add support for a new construct:

  1. Update the lexer to recognize keywords or symbols
  2. Add parsing logic in parseExpression or related helper
  3. Create new AST node if needed (e.g., WhileExpr, MatchExpr)
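The three steps above can be sketched for a hypothetical `while` construct. Everything here is an assumption for illustration: the keyword table, the `LookupIdent` and `ParseWhile` functions, the `WhileExpr` fields, and the `while <condition> { <body> }` syntax are not taken from Luma's source. The condition and body are kept as raw token slices to keep the example short.

```go
package main

import "errors"

// Step 1: teach the lexer the new keyword (hypothetical keyword table).
var keywords = map[string]string{"if": "IF", "else": "ELSE", "while": "WHILE"}

// LookupIdent returns the keyword token type for word, or "IDENT" otherwise.
func LookupIdent(word string) string {
	if tokenType, ok := keywords[word]; ok {
		return tokenType
	}
	return "IDENT"
}

// Step 3: a new AST node. A real node would hold parsed sub-trees
// rather than raw tokens.
type WhileExpr struct {
	Condition []string
	Body      []string
}

// Step 2: parsing logic for the assumed form `while <condition> { <body> }`.
func ParseWhile(tokens []string) (*WhileExpr, error) {
	if len(tokens) == 0 || LookupIdent(tokens[0]) != "WHILE" {
		return nil, errors.New("expected 'while'")
	}
	// Find the opening brace that separates the condition from the body.
	open := -1
	for i, tok := range tokens {
		if tok == "{" {
			open = i
			break
		}
	}
	if open < 0 || tokens[len(tokens)-1] != "}" {
		return nil, errors.New("expected '{ ... }' body")
	}
	return &WhileExpr{
		Condition: tokens[1:open],
		Body:      tokens[open+1 : len(tokens)-1],
	}, nil
}
```

In a real parser, the dispatch would likely live in parseExpression or a related helper, which would call something like `ParseWhile` when it sees the new keyword token.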

Full Walkthrough

Input:

x: int = 5
y: float = 15.0
result: float = (x + y) / 2

AST (simplified):

VarDecl("x", int, Literal(5))
VarDecl("y", float, Literal(15.0))
VarDecl("result", float, BinaryExpr("/", BinaryExpr("+", x, y), 2))

Compiler Output (Go):

x := int(5)
y := float64(15.0)
result := ((float64(x) + float64(y)) / float64(2))

Summary

  • The lexer tokenizes raw code into meaningful parts.
  • The parser builds a hierarchical AST structure.
  • Luma’s compiler uses this AST to emit Go code.
  • The compiler applies type-aware coercion when emitting arithmetic expressions.

This layer forms the backbone of Luma’s design: intuitive, readable, and powerful.
