The language guide
Writing Noise
Noise is a small, expression-based probabilistic language: every value is a probability
distribution, and you compute probabilities and expectations by Monte Carlo. A program is a
sequence of ;-separated statements and its result is the value of the last statement. This
guide is self-contained — everything you need to write correct, idiomatic Noise is below.
Run it
cargo run -p noise-cli -- file.noise # run a program; prints the LAST statement's value
cargo run -p noise-cli # REPL (one line at a time, persistent env)
(If a noise binary is installed, noise file.noise / noise work the same way.)
The mental model
Four load-bearing rules. Internalize these and the rest follows.
1. Everything is a distribution. A number is just the degenerate case, a point mass
   (mean(5)=5, variance(5)=0, P(5>3)=1). Operators map distributions to distributions uniformly,
   so P, E, Var, Q apply to anything.
2. ~ draws; = transforms. name ~ dist is a stochastic node: it draws a fresh random variable,
   and ~ is the only thing that draws. name = expr is a deterministic node: a transform, a
   constant, or an undrawn recipe. You cannot do arithmetic on an undrawn recipe:

       unif(0, 1) + 3          # ERROR: unif(...) is a recipe, not a number
       X ~ unif(0, 1); X + 3   # correct: draw with ~, then transform with =

   unif, unif_int, bernoulli, normal, … all return an undrawn recipe. Bind a recipe with = to
   name it (Die = unif_int(1, 6)), then draw it with ~ (a ~ Die).
3. One name = one fixed node. Every mention of a name is the same draw, exactly like X in math.
   So X + X is 2X, X - X is exactly 0, and P(X == 6 && X > 3) uses one X. There is no re-draw
   on reuse.
4. Independence is explicit. Two independent draws come from two ~ declarations, or from the
   shaped draw ~[n] dist, never from repeating a name.

       A ~ unif_int(1, 6); B ~ unif_int(1, 6)   # two independent dice
       dice ~[2] unif_int(1, 6)                 # same thing, as a length-2 array
Nothing is sampled until a query (P/E/Var/Q) forces it; everything upstream stays symbolic.
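The difference between reusing a name and declaring a second draw can be checked by brute force outside Noise. The sketch below is plain Python, not Noise, and only mimics the rules above by enumerating the outcomes of a fair six-sided die; the helper name prob is an illustrative assumption.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)  # outcomes of a fair die, the support of unif_int(1, 6)

def prob(event, outcomes):
    """Exact probability of `event` over equally likely outcomes."""
    outcomes = list(outcomes)
    hits = sum(1 for o in outcomes if event(o))
    return Fraction(hits, len(outcomes))

# One name, one node: every mention of x is the same draw.
p_same_diff_zero = prob(lambda x: x - x == 0, FACES)       # X - X == 0
p_six_and_gt3 = prob(lambda x: x == 6 and x > 3, FACES)    # one X in both clauses

# Two declarations, two nodes: a and b vary independently.
pairs = product(FACES, repeat=2)
p_indep_diff_zero = prob(lambda ab: ab[0] - ab[1] == 0, pairs)  # A - B == 0

print(p_same_diff_zero, p_six_and_gt3, p_indep_diff_zero)
```

Reusing one name makes the difference zero with certainty, while two separate draws agree only when the dice happen to match.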
Language reference
Lexical. Whitespace is insignificant (separates tokens). Comments run to end of line, started
by # or //. Numbers are f64 integer or decimal literals — no exponent syntax, and a
leading - is the unary-minus operator, not part of the literal. Identifiers are
[A-Za-z_][A-Za-z0-9_]*, case-sensitive. Reserved words: if else for in use true false.
Strings are double-quoted with no escape sequences ("like this"); they are a label/utility
type for Print and messages — a string can never enter a random-variable expression.
Operators (precedence low → high; all left-associative except ** and binding, which are
right-associative; prefix - and ! bind tighter than every binary operator below ** but looser
than ** itself, so -2 ** 2 == -(2 ** 2) == -4):
| Level | Operators | Meaning |
|---|---|---|
| 1 | = ~ | binding (right-assoc) |
| 2 | .. | half-open integer range [a, b) |
| 3 | \|\| | logical or |
| 4 | && | logical and |
| 5 | == != < > <= >= | comparison |
| 6 | + - | add / subtract (+ also concatenates if either side is a string) |
| 7 | * / @ | multiply / divide (elementwise/broadcast) · @ = matrix product |
| 8 | ** | power (right-assoc) |
| 9 | prefix - ! ~ | negate · logical not · draw |
| 10 | postfix [index] | index (repeatable: M[i][j]) |
| 11 | f(args), ( ), { } | call · grouping · blocks |
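Python happens to share the two conventions stated for ** (right-associative, and applied before a leading minus), so it can serve as a quick sanity check of how such expressions group. This is an analogy in Python, not Noise output.

```python
# The power is applied first, then the minus: -(2 ** 2).
neg_pow = -2 ** 2

# ** is right-associative: 2 ** (3 ** 2), not (2 ** 3) ** 2.
chained_pow = 2 ** 3 ** 2

# Half-open integer range [a, b), the Python counterpart of a..b.
half_open = list(range(1, 5))

print(neg_pow, chained_pow, half_open)
```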
Types & rules (every value is conceptually a distribution; these are the runtime forms):
number (point mass), bool (point mass on {T,F}), string, unit (), array (fixed-length,
known at build time), dist (a drawn random variable / distribution handle), estimate (a number
carrying a standard error, produced by P/E/Var), and signal (a lazy waveform).
- Arithmetic + - * / ** need numbers → number (division and ** are IEEE-754: 1/0 == inf, no
  panic). With a dist operand the op lifts to a dist; pure-constant subexprs fold eagerly.
- + concatenates when either side is a string (the other is stringified): "x = " + 5.
- Ordering < > <= >= need two numbers → bool. Equality == != compares two values of the same
  primitive type → bool (mixed-type is an error).
- Prefix - needs a number; prefix ! needs a bool.
- if cond { a } else { b }: cond must be a bool or a bool random variable (see “lifted if”).
- No implicit coercions. Type mismatches are runtime errors with a source span.
- Scope is flat: blocks do not introduce a new scope, so bindings made inside a block remain
  visible afterwards. (This is how for-loop accumulators work.)
Modules & use
builtin is always active: P, Q, E, Var, Print, Len (capitalized). Everything
else is strict — a bare name errors until you use its module (or write the mod::name
path). Start each program with the use lines you need.
| Module | use? | Items |
|---|---|---|
| builtin | always | P, Q, E, Var, Print, Len |
| rand | use rand; | unif, unif_int, bernoulli, normal, normal_int, exp, exp_int, poisson, geometric, rotation |
| math | use math; | pi, e, sqrt, round, log (natural), log10, sin, cos, atan, sign |
| vec | use vec; | sum, count, any, all, max, min, mean, dot, normsq, norm, vadd, vsub, matvec, transpose, normalize, quantize, has_duplicates, count_duplicates, mse, ones, zeros, iota |
| signal | use signal; | sine, cosine, noise_white, sample |
use rand; # unif, unif_int, …
math::sqrt(2) # or reach one item by its full path, no `use` needed
A user definition shadows a module item of the same name. Module paths are single-level
(mod::name); a constant path resolves a value (math::pi), a function path must be called.
The standard workflow
recipe (=) → draw (~ / ~[n]) → transform (=) → query (P/E/Var/Q) → Print.
use rand; # unif_int
use vec; # has_duplicates
n = 23;
bday = unif_int(1, 365); # a recipe
days ~[n] bday; # n independent draws
match = has_duplicates(days);
Print("P(shared birthday among", n, ") =", P(match))
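The birthday program has a known analytic answer, which makes it a convenient first sanity check. The Python sketch below is an independent cross-check, not a description of how the Noise engine works; the function names, the seed, and the trial count are illustrative choices.

```python
import random

def exact_shared_birthday(n, days=365):
    """1 - P(all n birthdays distinct), assuming uniform, independent days."""
    p_distinct = 1.0
    for k in range(n):
        p_distinct *= (days - k) / days
    return 1.0 - p_distinct

def simulated_shared_birthday(n, trials, seed=0, days=365):
    """Fraction of trials in which at least two of n draws coincide."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        draws = [rng.randint(1, days) for _ in range(n)]  # inclusive bounds, like unif_int
        if len(set(draws)) < n:                            # same test as has_duplicates
            hits += 1
    return hits / trials

exact = exact_shared_birthday(23)
estimate = simulated_shared_birthday(23, trials=20_000)
print(round(exact, 4), round(estimate, 3))
```

With 23 people the exact value is slightly above one half, and the simulated fraction should land close to it.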
Distributions & queries
Distributions (all rand, all return recipes; draw with ~):
- unif(a, b): continuous uniform on [a, b). Continuous → never use == on it.
- unif_int(a, b): discrete uniform on a..=b inclusive. Use this for dice/coins/counts.
- bernoulli(p): true with probability p (a bool-RV).
- normal(mu, sigma), exp(rate) (mean = 1/rate), poisson(lambda), geometric(p) (failures before
  first success).
- _int family: normal_int, exp_int round each draw to the nearest integer (so == / counts are
  meaningful). unif_int is already discrete.
- rotation(d): a fresh random d×d orthonormal matrix per sample (Haar rotation).
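Three of the conventions in this list are easy to get wrong: exp is parameterized by rate, geometric counts failures rather than trials, and the _int variants round. The Python sketch below checks each convention numerically with the standard library; it is an illustration under those stated conventions, not Noise's implementation, and geometric_failures is a hypothetical helper.

```python
import random

rng = random.Random(1)
N = 100_000

# exp(rate): the mean is 1/rate, so rate = 4 should average about 0.25.
rate = 4.0
exp_mean = sum(rng.expovariate(rate) for _ in range(N)) / N

def geometric_failures(p, rng):
    """Number of failures before the first success (support starts at 0)."""
    failures = 0
    while rng.random() >= p:   # each trial succeeds with probability p
        failures += 1
    return failures

# Under the failures-before-success convention the mean is (1 - p) / p, so p = 0.25 gives 3.
p = 0.25
geo_mean = sum(geometric_failures(p, rng) for _ in range(N)) / N

# Rounding a continuous draw to the nearest integer, in the spirit of normal_int,
# gives an equality test a real chance of holding.
rounded_hits = sum(round(rng.gauss(0.0, 1.0)) == 0 for _ in range(N)) / N

print(exp_mean, geo_mean, rounded_hits)
```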
Queries (all builtin; default n = 1e6 samples, fixed seed → reproducible):
- P(event[, n]): probability a bool-RV is true. Returns an estimate carrying its standard
  error: it self-rounds to the digits the sample size justifies, and the error propagates
  through arithmetic (4 * P(C) shows one fewer digit). Pass a bigger n to reveal more digits.
  P of a non-event (numeric) is an error.
- E(x[, n]) / Var(x[, n]): expectation / variance of a numeric (or bool) quantity. E of a bool
  equals P.
- Q(x, q[, n]): quantile (inverse CDF): Q(X, 0.5) median, Q(X, 0.95) 95th pct, Q(X, 0) /
  Q(X, 1) min/max draw. Returns a plain number.
- Print(args…): space-separated, then newline; combine with string + and round(x, d).
- Len(xs): element count of an array (a build-time constant).
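The number of digits an estimate can justify follows from the usual standard error of a sample proportion, sqrt(p(1-p)/n). The Python sketch below illustrates that arithmetic only; how Noise formats an estimate is not shown here, and proportion_std_error is a hypothetical helper name.

```python
import math

def proportion_std_error(p, n):
    """Standard error of a sample proportion from n independent trials."""
    return math.sqrt(p * (1.0 - p) / n)

n_default = 1_000_000
se = proportion_std_error(0.5, n_default)   # worst case, p = 0.5

# Scaling an estimate by a constant scales its standard error by the same factor.
se_times_4 = 4 * se

# 100x more samples shrinks the error 10x: roughly one more trustworthy digit.
se_more_samples = proportion_std_error(0.5, 100 * n_default)

print(se, se_times_4, se_more_samples)
```

At the default sample count the error sits in the fourth decimal place, and multiplying the estimate by 4 pushes it into the third, which is consistent with one fewer digit being shown.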
Collections, arrays & idioms
Arrays are fixed-length and known at build time. Build them with literals ([1, 2, 3], []),
the range a..b (half-open: 0..n is 0 … n-1), the shaped draw ~[n] d, or vec
constructors (ones(n), zeros(n), iota(n)). Index with xs[i] (chains: M[i][j]); the index
must be a constant non-negative integer in range — never a random variable. There is no
append/push.
Arithmetic broadcasts over arrays (NumPy-style, nesting for matrices):
[1,2,3] + [10,20,30] → [11,22,33], 1 + [1,2,3] → [2,3,4], [1,2,3] ** 2 → [1,4,9].
The @ operator is the matrix product (v @ w dot, M @ v matvec, M @ N matmul); *
stays elementwise. sin/cos/atan are ufuncs (scalar, lifted over RVs, or mapped over arrays).
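For readers who want the broadcast and product rules spelled out, here is a minimal pure-Python sketch of the same arithmetic on nested lists. It is not Noise code; the helper names are hypothetical, and rejecting mismatched lengths is an assumption, since the rule for that case is not stated above.

```python
import operator

def broadcast(op, a, b):
    """Apply op elementwise; a scalar is stretched to match a list (nested lists recurse)."""
    if isinstance(a, list) and isinstance(b, list):
        if len(a) != len(b):
            # ASSUMPTION: mismatched lengths are rejected.
            raise ValueError("length mismatch")
        return [broadcast(op, x, y) for x, y in zip(a, b)]
    if isinstance(a, list):
        return [broadcast(op, x, b) for x in a]
    if isinstance(b, list):
        return [broadcast(op, a, y) for y in b]
    return op(a, b)

def dot(v, w):
    """Inner product of two equal-length vectors (v @ w)."""
    return sum(x * y for x, y in zip(v, w))

def matvec(m, v):
    """Matrix times vector (M @ v): one dot product per row."""
    return [dot(row, v) for row in m]

added = broadcast(operator.add, [1, 2, 3], [10, 20, 30])
shifted = broadcast(operator.add, 1, [1, 2, 3])
squared = broadcast(operator.pow, [1, 2, 3], 2)
inner = dot([1, 2, 3], [4, 5, 6])
mapped = matvec([[1, 0], [0, 2]], [3, 4])

print(added, shifted, squared, inner, mapped)
```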
Shaped draws & reducers. ~[n] d is n iid draws (an array); ~[n, m] d a matrix. Fold with
a vec reducer — independence becomes a one-liner:
use rand; use vec;
dice ~[2] unif_int(1, 6); Print("P(sum==7) =", P(sum(dice) == 7)) # two dice
flips ~[3] bernoulli(0.5); Print("P(all heads) =", P(all(flips))) # 3-coin streak
flips ~[3] bernoulli(0.5); Print("P(exactly 2) =", P(count(flips)==2)) # count of true
parts ~[3] bernoulli(0.9); Print("uptime =", P(any(parts))) # at-least-one
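Each of these four one-liners has an exact answer that the printed estimate can be compared against. The Python sketch below computes them by enumerating every joint outcome of the independent draws; exact_prob is a hypothetical helper, and the code is a cross-check rather than anything Noise runs.

```python
from fractions import Fraction
from itertools import product

def exact_prob(event, marginals):
    """Total probability of the joint outcomes of independent draws where event holds.

    marginals: one dict {value: probability} per independent draw.
    """
    total = Fraction(0)
    for combo in product(*(m.items() for m in marginals)):
        values = [v for v, _ in combo]
        weight = Fraction(1)
        for _, p in combo:
            weight *= p          # independence: joint probability is the product
        if event(values):
            total += weight
    return total

die = {face: Fraction(1, 6) for face in range(1, 7)}
coin = {True: Fraction(1, 2), False: Fraction(1, 2)}
part = {True: Fraction(9, 10), False: Fraction(1, 10)}

p_sum7 = exact_prob(lambda d: sum(d) == 7, [die] * 2)
p_all_heads = exact_prob(all, [coin] * 3)
p_exactly2 = exact_prob(lambda f: f.count(True) == 2, [coin] * 3)
p_uptime = exact_prob(any, [part] * 3)

print(p_sum7, p_all_heads, p_exactly2, p_uptime)
```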
Lifted if = per-lane select (a value, not control flow; else required, both branches
evaluated and reuse the condition’s per-lane draws). Gives max/min/abs over RVs for free:
A ~ unif_int(1, 6); B ~ unif_int(1, 6);
higher = if A > B { A } else { B }; # max of two dice
Print("P(higher==6) =", P(higher == 6))
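A per-lane select can be pictured as an elementwise choice over parallel arrays of draws. The Python sketch below mimics that picture for the two-dice maximum and compares the estimate with the exact value; the lane count and seed are arbitrary, and this is an illustration of the idea, not the engine's implementation.

```python
import random
from fractions import Fraction

rng = random.Random(7)
lanes = 200_000

# Draw both dice once per lane; the select below reuses these same draws.
a = [rng.randint(1, 6) for _ in range(lanes)]
b = [rng.randint(1, 6) for _ in range(lanes)]

# Per-lane select: both branch values exist, the condition picks one per lane.
higher = [x if x > y else y for x, y in zip(a, b)]

estimate = sum(h == 6 for h in higher) / lanes
exact = 1 - Fraction(5, 6) ** 2   # 1 - P(neither die shows a 6)

print(estimate, exact)
```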
Ranges & for (build-time unroll). for x in xs { } runs the body once per element; bindings
leak (blocks don’t scope) — that’s exactly how an accumulator persists. Each ~ inside the body
is a distinct node, so it’s a clean way to make many independent draws:
use vec;
acc = 0; for x in 1..5 { acc = acc + x }; acc # 1+2+3+4 = 10
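Python's for statement shares the two properties this example relies on (a half-open range and no new scope for the loop body), so the accumulator pattern can be mirrored directly. The sketch below is a Python analogy, and the unrolled form only illustrates what unrolling amounts to.

```python
# Loop form: the accumulator is rebound on every pass.
acc = 0
for x in range(1, 5):   # half-open, like 1..5: visits 1, 2, 3, 4
    acc = acc + x

# What unrolling amounts to: one statement per element.
unrolled = 0
unrolled = unrolled + 1
unrolled = unrolled + 2
unrolled = unrolled + 3
unrolled = unrolled + 4

# The loop body opened no new scope, so acc and x are still visible here.
print(acc, unrolled, x)
```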
User functions. f(a) = expr is pure (lifts over RVs); f() ~ dist draws fresh per call:
use rand;
max(a, b) = if a > b { a } else { b }; # pure
roll() ~ unif_int(1, 6); # fresh draw each call
P(roll() + roll() == 7) # two INDEPENDENT rolls
Functions are pure in their parameters — the body sees only its args (plus pi/e), no outer
variables, no closures. Calls unroll at build time, so recursion must terminate.
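The contrast between a fresh draw per call and a reused name shows up clearly in a small simulation. The Python sketch below is an analogy only: roll and larger are stand-ins for the two Noise definitions above, and roll reads a module-level generator purely for brevity.

```python
import random

rng = random.Random(42)

def roll():
    """Stand-in for roll() ~ unif_int(1, 6): a fresh draw on every call."""
    return rng.randint(1, 6)

def larger(a, b):
    """Stand-in for the pure max(a, b): the result depends only on the arguments."""
    return a if a > b else b

trials = 200_000

# Two calls, two independent draws: a sum of 7 happens about one time in six.
two_calls = sum(roll() + roll() == 7 for _ in range(trials)) / trials

# One drawn value used twice: x + x is always even, so it never equals 7.
one_name_hits = 0
for _ in range(trials):
    x = roll()
    one_name_hits += (x + x == 7)
one_name = one_name_hits / trials

print(two_calls, one_name, larger(2, 5))
```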
Conditional probability has no native observe; write it as a ratio:
use rand;
D ~ unif_int(1, 6);
Print("P(D==6 | D>3) =", P(D == 6 && D > 3) / P(D > 3)) # = 1/3
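The ratio form is the definition of conditional probability, P(A | B) = P(A and B) / P(B), and the 1/3 can be confirmed by counting outcomes. The Python sketch below does that and also forms the ratio from simulated draws; as a simplification it reuses one set of draws for both probabilities, and the seed and sample size are arbitrary.

```python
import random
from fractions import Fraction

faces = range(1, 7)

# Exact: count outcomes in the joint event and in the conditioning event.
joint = sum(1 for d in faces if d == 6 and d > 3)
given = sum(1 for d in faces if d > 3)
exact = Fraction(joint, given)          # (1/6) / (3/6)

# Monte Carlo: estimate both probabilities, then divide.
rng = random.Random(3)
draws = [rng.randint(1, 6) for _ in range(200_000)]
p_joint = sum(d == 6 and d > 3 for d in draws) / len(draws)
p_given = sum(d > 3 for d in draws) / len(draws)
estimate = p_joint / p_given

print(exact, estimate)
```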
Signals (lazy waveforms). signal::sine(f) / cosine(f) describe a waveform by frequency
(O(1) memory). Scalar/trig ops defer into the signal; it materializes to an array when it meets
a sized array (adopting its length) or via signal::sample(sig, n). noise_white(sigma) is a lazy
random generator (materializes to fresh iid normal(0, sigma) draws). sine(n, f) is shorthand
for sample(sine(f), n).
use signal;
msg = sample(0.3 * sine(3), 64); # render the lazy tone at 64 points
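One way to picture a lazy waveform is as a small description (frequency and gain) that only becomes an array when sampled. The Python sketch below follows that picture. It is not Noise's implementation: the class LazySine is hypothetical, and the sampling grid (point k at t = k / n over one unit interval, zero phase) is an assumption, since the guide does not specify it.

```python
import math

class LazySine:
    """A waveform described by frequency and gain only; no samples are stored."""

    def __init__(self, freq, gain=1.0):
        self.freq = freq
        self.gain = gain

    def __rmul__(self, scalar):
        # A scalar multiply updates the description instead of touching samples.
        return LazySine(self.freq, self.gain * scalar)

def sample(sig, n):
    """Materialize n points. ASSUMPTION: point k sits at t = k / n, zero phase."""
    return [sig.gain * math.sin(2 * math.pi * sig.freq * k / n) for k in range(n)]

msg = sample(0.3 * LazySine(3), 64)
peak = max(abs(v) for v in msg)

print(len(msg), msg[0], peak)
```

Until sample is called, the object holds two numbers regardless of how many points are eventually requested.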
Hazards — what NOT to do
- == on a continuous RV is ~never true. unif(1,6) == 4 ≈ 0. Use a discrete distribution
  (unif_int, bernoulli, *_int) whenever you compare for equality or count.
- Arithmetic on an undrawn recipe is an error. Draw with ~ first (rule 2 above).
- No closures. Function bodies see only parameters and the pi/e constants.
- Indices must be constant non-negative integers, known at build time, never a random variable.
- Arrays are fixed-length. No push/append. Build with literals, a..b, ~[shape], or the vec
  constructors.
- Blocks don’t scope: bindings made inside leak out (there is one flat environment).
- No sequential / stateful processes. A random walk, Markov chain, or M/M/1 queue
  (W_{n+1} = max(0, W_n + S_n − A_{n+1})) is not expressible: the engine samples independent
  lanes that can’t carry state across a time index. A lifted if picks a value per lane; it
  can’t thread state across steps.
- Each query samples its own pass. P(A), P(B), P(A && B) are estimated independently, so exact
  cross-query consistency (P(A && B) ≤ P(A)) is not guaranteed.
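The first hazard is easy to reproduce in any language with floating-point random numbers. The Python sketch below counts how often a continuous uniform draw equals exactly 4 and compares that with a discrete draw; it is an illustration with an arbitrary seed and sample size, not Noise output.

```python
import random

rng = random.Random(11)
n = 200_000

# Continuous uniform between 1 and 6: landing on exactly 4.0 essentially never happens.
continuous_hits = sum(rng.uniform(1, 6) == 4 for _ in range(n))

# Discrete uniform on 1..6 inclusive: equality holds about one time in six.
discrete_hits = sum(rng.randint(1, 6) == 4 for _ in range(n))

print(continuous_hits / n, discrete_hits / n)
```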
Style conventions
- Open with a comment stating the question and, where one exists, the analytic answer to check against.
- Use readable, named intermediate steps, not nested one-liners (days, match, total, higher:
  one idea per binding).
- Put the needed use lines at the top.
- End in Print(...), building the message with string + and round(x, d).
Pre-flight checklist
- Every random variable is introduced with ~ (no arithmetic on a bare recipe).
- Independence comes from ~[n] or separate ~, not from repeating a name.
- Equality / counting uses a discrete distribution (unif_int / bernoulli / *_int).
- use lines present for every non-builtin name; queries are capitalized
  (P/E/Var/Q/Print/Len).
- No sequential-state recurrence, no random index, no expectation of block scoping.
- Program ends in Print(...); run it and sanity-check against the analytic value.