Parser & Lexer
This page describes how Luma transforms raw source code into a structured Abstract Syntax Tree (AST) using its lexer and parser components. These are foundational building blocks for Luma’s compilation process.
Lexer: Tokenizing the Input
The lexer (or tokenizer) is responsible for breaking source code into discrete tokens.
Key Responsibilities
- Ignore whitespace and comments
- Recognize identifiers, numbers, strings, and keywords
- Support punctuation and operators: +, -, *, /, %, (, ), {, }, etc.
- Emit errors on unrecognized characters
Example Input
x: int = 42
Corresponding Tokens
IDENT("x"), COLON, IDENT("int"), EQUAL, NUMBER("42")
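A token stream like this one can be produced by a small hand-written scanner. The Go sketch below is an illustration only, not Luma's actual lexer: the Token struct, the tokenize function, and every token type name other than IDENT, COLON, EQUAL, and NUMBER are assumptions. It skips whitespace, reads identifiers and numbers, maps single-character punctuation, and returns an error on an unrecognized character. Comments, strings, and keywords are left out for brevity.

```go
package main

import (
	"fmt"
	"unicode"
)

// Token pairs a type name with the literal text it was read from.
// The field layout is an assumption made for this sketch.
type Token struct {
	Type    string
	Literal string
}

// punctuation maps single-character symbols to hypothetical token type names.
var punctuation = map[rune]string{
	':': "COLON", '=': "EQUAL", '+': "PLUS", '-': "MINUS",
	'*': "STAR", '/': "SLASH", '%': "PERCENT",
	'(': "LPAREN", ')': "RPAREN", '{': "LBRACE", '}': "RBRACE",
}

// isIdentPart reports whether r may appear inside an identifier.
func isIdentPart(r rune) bool {
	return unicode.IsLetter(r) || unicode.IsDigit(r) || r == '_'
}

// tokenize splits src into tokens, skipping whitespace and returning an
// error on the first character it does not recognize.
func tokenize(src string) ([]Token, error) {
	var tokens []Token
	runes := []rune(src)
	for i := 0; i < len(runes); {
		r := runes[i]
		switch {
		case unicode.IsSpace(r):
			i++
		case unicode.IsLetter(r) || r == '_':
			start := i
			for i < len(runes) && isIdentPart(runes[i]) {
				i++
			}
			tokens = append(tokens, Token{"IDENT", string(runes[start:i])})
		case unicode.IsDigit(r):
			start := i
			for i < len(runes) && (unicode.IsDigit(runes[i]) || runes[i] == '.') {
				i++
			}
			tokens = append(tokens, Token{"NUMBER", string(runes[start:i])})
		default:
			typ, ok := punctuation[r]
			if !ok {
				return nil, fmt.Errorf("unrecognized character %q at offset %d", r, i)
			}
			tokens = append(tokens, Token{typ, string(r)})
			i++
		}
	}
	return tokens, nil
}
```

Under these assumptions, calling tokenize("x: int = 42") yields the five tokens listed above, and an input such as "x @ y" returns an error instead of a token list.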
Each token has:
- Type (e.g., IDENT, NUMBER, EQUAL)
- Literal value (e.g., "x", "42")
Parser: Constructing the AST
The parser transforms the stream of tokens into a tree of structured nodes that represent the code’s syntactic structure.
Core Constructs
Luma’s parser supports:
- Variable declarations (VarDecl): x: int = 5
- Function declarations (FuncDecl)
- Binary expressions (BinaryExpr): a + b, x * (y + z)
- Function calls (FunctionCallExpr): print("Hello")
- String interpolation (InterpolatedExpr): "Hello ${name}"
- Control structures: if, else
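Of these constructs, string interpolation usually needs one extra step: the string literal has to be separated into plain text and embedded expressions before an InterpolatedExpr can be built. A minimal Go sketch of that split follows. The Segment type and the splitInterpolation helper are hypothetical names, not taken from Luma's source, and nested braces and escape sequences are deliberately ignored.

```go
package main

import "strings"

// Segment is one piece of an interpolated string: either literal text
// or the source of an embedded expression. Hypothetical type for this sketch.
type Segment struct {
	Text   string
	IsExpr bool
}

// splitInterpolation breaks a string such as "Hello ${name}" into
// literal and ${...} segments, in order of appearance.
func splitInterpolation(s string) []Segment {
	var segs []Segment
	for len(s) > 0 {
		start := strings.Index(s, "${")
		if start < 0 {
			// No more placeholders: the remainder is literal text.
			segs = append(segs, Segment{Text: s})
			break
		}
		end := strings.Index(s[start:], "}")
		if end < 0 {
			// Unterminated placeholder: treat the remainder as literal text.
			segs = append(segs, Segment{Text: s})
			break
		}
		if start > 0 {
			segs = append(segs, Segment{Text: s[:start]})
		}
		segs = append(segs, Segment{Text: s[start+2 : start+end], IsExpr: true})
		s = s[start+end+1:]
	}
	return segs
}
```

For "Hello ${name}" this produces a literal segment "Hello " followed by an expression segment name, which a parser could then hand to its normal expression routine.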
Example
total: int = x + y * 2
Parses into:
&ast.VarDecl{
    Name: "total",
    Type: &ast.SimpleType{Name: "int"},
    Value: &ast.BinaryExpr{
        Operator: "+",
        Left: &ast.LiteralExpr{Value: "x", Kind: "IDENT"},
        Right: &ast.BinaryExpr{
            Operator: "*",
            Left: &ast.LiteralExpr{Value: "y", Kind: "IDENT"},
            Right: &ast.LiteralExpr{Value: "2", Kind: "NUMBER"},
        },
    },
}
Precedence & Associativity
The parser uses a precedence-based approach for arithmetic and logical expressions.
Example:
a + b * c
Is parsed as:
a + (b * c)
The parser uses parseBinaryExpr(precedence int) to nest operations correctly: operators with higher precedence, such as *, bind more tightly than +.
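This kind of nesting is what a precedence-climbing loop produces. The Go sketch below shows one way a function with the signature parseBinaryExpr(precedence int) could work. The Parser struct, the precedence table, the string-based tokens, and the simplified Literal and Binary node types are assumptions for illustration, not Luma's implementation.

```go
package main

import "fmt"

// Expr is a minimal expression node for this sketch; Luma's real AST
// types (ast.BinaryExpr, ast.LiteralExpr) carry more information.
type Expr interface{ String() string }

type Literal struct{ Value string }

type Binary struct {
	Operator    string
	Left, Right Expr
}

func (l Literal) String() string { return l.Value }

func (b Binary) String() string {
	return fmt.Sprintf("(%s %s %s)", b.Left, b.Operator, b.Right)
}

// precedences gives tighter-binding operators higher numbers.
// The exact table is an assumption.
var precedences = map[string]int{"+": 1, "-": 1, "*": 2, "/": 2, "%": 2}

// Parser walks a pre-tokenized expression.
type Parser struct {
	tokens []string
	pos    int
}

// peek returns the current token, or "" at end of input.
func (p *Parser) peek() string {
	if p.pos < len(p.tokens) {
		return p.tokens[p.pos]
	}
	return ""
}

// parsePrimary reads an operand or a parenthesized sub-expression.
func (p *Parser) parsePrimary() Expr {
	tok := p.peek()
	p.pos++
	if tok == "(" {
		inner := p.parseBinaryExpr(0)
		p.pos++ // consume ")"
		return inner
	}
	return Literal{Value: tok}
}

// parseBinaryExpr consumes operators that bind more tightly than the
// given precedence and returns the resulting subtree.
func (p *Parser) parseBinaryExpr(precedence int) Expr {
	left := p.parsePrimary()
	for {
		op := p.peek()
		prec, ok := precedences[op]
		if !ok || prec <= precedence {
			return left
		}
		p.pos++
		// Recursing with prec keeps equal-precedence operators left-associative.
		right := p.parseBinaryExpr(prec)
		left = Binary{Operator: op, Left: left, Right: right}
	}
}
```

With the tokens a, +, b, *, c and a starting precedence of 0, this sketch builds the equivalent of a + (b * c). The input a - b - c groups as (a - b) - c, because an operator of equal precedence ends the recursive call instead of extending it.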
Type Awareness in Parsing
While Luma is dynamically flexible, the parser supports type-aware constructs:
- Type declarations: x: int, y: float
- Declared types are used during compilation for type coercion (e.g., int + float → float)
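The coercion rule can be sketched in a few lines of Go. This is an assumption-laden illustration, not the compiler's code: resultType, coerce, and emitBinary are hypothetical helpers, only the int and float types are handled, and the mapping of float to Go's float64 is inferred from the compiler output shown on this page.

```go
package main

import "fmt"

// goTypes maps Luma type names to Go type names (assumed mapping).
var goTypes = map[string]string{"int": "int", "float": "float64"}

// resultType widens to float when either operand is a float.
func resultType(left, right string) string {
	if left == "float" || right == "float" {
		return "float"
	}
	return "int"
}

// coerce wraps a Go expression in a conversion to the Go type
// that corresponds to lumaType.
func coerce(expr, lumaType string) string {
	return fmt.Sprintf("%s(%s)", goTypes[lumaType], expr)
}

// emitBinary renders "left op right" as Go source, converting both
// operands to the result type, and returns that type alongside.
func emitBinary(op, left, leftType, right, rightType string) (string, string) {
	t := resultType(leftType, rightType)
	return fmt.Sprintf("(%s %s %s)", coerce(left, t), op, coerce(right, t)), t
}
```

For example, emitBinary("+", "x", "int", "y", "float") returns the expression (float64(x) + float64(y)) together with the result type float.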
Debugging & Extending
Print Tokens
Enable debug output to print all tokens from the lexer.
Dump AST
Use internal helpers to output the AST tree for inspection.
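The page does not name the internal helpers, so the following Go sketch only shows the general shape such a dump routine might take. Node, dump, and dumpString are hypothetical stand-ins for Luma's AST types and helpers; the idea is a recursive walk that indents each level of the tree.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a generic tree node standing in for Luma's AST types.
type Node struct {
	Label    string
	Children []*Node
}

// dump writes n and its descendants to sb, indenting two spaces per level.
func dump(sb *strings.Builder, n *Node, depth int) {
	fmt.Fprintf(sb, "%s%s\n", strings.Repeat("  ", depth), n.Label)
	for _, child := range n.Children {
		dump(sb, child, depth+1)
	}
}

// dumpString returns the indented rendering of the whole tree.
func dumpString(root *Node) string {
	var sb strings.Builder
	dump(&sb, root, 0)
	return sb.String()
}
```

A node labelled BinaryExpr(+) with children x and y renders as three lines, with the two operands indented beneath the operator.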
Add New Expression Type
To add support for a new construct:
- Update the lexer to recognize any new keywords or symbols
- Add parsing logic in parseExpression or a related helper
- Create a new AST node if needed (e.g., WhileExpr, MatchExpr)
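As a rough sketch of the first and last steps, the Go code below registers a while keyword and defines a matching node. Only the WhileExpr name comes from this page. The Node interface, the keywords table, the lookupIdent helper, the IF, ELSE, and WHILE token type names, and the fields of WhileExpr are all assumptions; the parsing step is omitted because it depends on the parser's internals.

```go
package main

// Node is a stand-in for whatever interface Luma's AST nodes share.
type Node interface{ NodeKind() string }

// keywords is a hypothetical lexer table; adding "while" is the lexer change.
var keywords = map[string]string{
	"if":    "IF",
	"else":  "ELSE",
	"while": "WHILE",
}

// lookupIdent classifies a scanned word as a keyword or a plain identifier.
func lookupIdent(word string) string {
	if typ, ok := keywords[word]; ok {
		return typ
	}
	return "IDENT"
}

// WhileExpr is the new AST node: a loop condition plus the body statements.
type WhileExpr struct {
	Condition Node
	Body      []Node
}

// NodeKind identifies the node, for example when dumping the AST.
func (w *WhileExpr) NodeKind() string { return "WhileExpr" }
```

With the table entry in place, lookupIdent("while") returns WHILE while ordinary names such as total still come back as IDENT, so the parser can branch on the new token type.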
Full Walkthrough
Input:
x: int = 5
y: float = 15.0
result: float = (x + y) / 2
AST (simplified):
VarDecl("x", int, Literal(5))
VarDecl("y", float, Literal(15.0))
VarDecl("result", float, BinaryExpr("/", BinaryExpr("+", x, y), 2))
Compiler Output (Go):
x := int(5)
y := float64(15.0)
result := ((float64(x) + float64(y)) / float64(2))
Summary
- The lexer tokenizes raw code into meaningful parts.
- The parser builds a hierarchical AST structure.
- Luma’s compiler uses this AST to emit Go code.
- The compiler applies type-aware coercion (e.g., int + float → float) when emitting arithmetic expressions.
This layer forms the backbone of Luma’s design: intuitive, readable, and powerful.