The Noise Programming Language
Abstract.
In Noise, every value is a probability distribution: a number is the degenerate case, a
point mass. Operators lift over random variables and a
query such as P(event) estimates a probability by simulation.
The result is a small language in which propagating uncertainty and running Monte-Carlo
experiments reads like ordinary mathematics, on top of a highly optimized engine.
Keep scrolling
The best way to learn a language is to watch it unfold in front of you — so keep scrolling. Six small programs, each one idea past the last, animate as you read them, then run for real on the compiled engine in your browser. It has a touch of magic to it: every variable here is a whole wave of possibility, and a query is what collapses it to a single number.
Every value is a distribution
A = 5
D1 ~ rand::unif_int(1, 6)
D2 ~ rand::unif_int(1, 6)
S = A + D1 + D2
E(S)
A number is a distribution
A = 5 looks like a plain constant — and it is, but Noise
sees it as a probability distribution
whose every draw lands on 5. A single infinitely-thin spike: the
Dirac delta,
the degenerate case
every other value generalizes.
The tilde spreads it out
D1 ~ rand::unif_int(1, 6) binds a fair die — a
discrete uniform
spread, all six faces equally likely. The spike fans out into six flat bars.
D1 is now genuinely uncertain, yet you write it like any
other variable.
Join them — a bell appears
Now add the constant and two independent
dice together: S = A + D1 + D2 — five, plus a roll, plus a
roll. The flat shapes convolve
into a triangle peaked at 12 — already curving toward a
bell.
Join a few more and it closes in on a Gaussian: the central limit theorem, for free.
(Noise also ships normal, poisson, bernoulli and more for the same ~ slot.)
A query collapses millions of draws
Nothing actually runs until you ask. E(S) — the
expected value
— fires a Monte-Carlo
pass: millions of rolls of both dice, averaged into one honest number. By the
law of large numbers
the running mean settles on 12. Two lines of math; an expert
kernel underneath.
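The claims above (a triangle peaked at 12, a mean of 12) can be checked without any sampling. The following is a minimal cross-check in plain Python, not Noise: it enumerates all 36 outcomes of the two dice exactly. The names `pmf`, `mode` and `mean` are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

A = 5                      # the point mass
faces = range(1, 7)        # a fair six-sided die

# Exact distribution of S = A + D1 + D2: enumerate all 36 equally likely rolls.
counts = Counter(A + d1 + d2 for d1, d2 in product(faces, repeat=2))
pmf = {s: Fraction(c, 36) for s, c in sorted(counts.items())}

mode = max(pmf, key=pmf.get)                # the peak of the triangle
mean = sum(s * p for s, p in pmf.items())   # the exact expected value

print(mode, mean)  # prints "12 12"
```

A Monte-Carlo query such as E(S) should settle near the exact mean computed here as the number of draws grows.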
Estimating π with Monte Carlo
X ~ rand::unif(-1, 1)
Y ~ rand::unif(-1, 1)
C = X**2 + Y**2 < 1
4 * P(C)
Two random draws
Same ~ as before, now used twice. X ~ unif(-1, 1) and Y ~ unif(-1, 1) each draw a number anywhere in [-1, 1] — together, a random point in the square.
Ask one yes/no question
C = X**2 + Y**2 < 1 is true exactly when the point falls inside the unit circle. In Noise that comparison is itself a random variable — a Bernoulli distribution, landing true or false on each draw.
Now throw the darts
Each dart is one draw of (X, Y); teal if inside the circle, grey if not. They scatter evenly — no point is special.
Four times the fraction is π
The circle covers π/4 of the square, so the teal share hovers around π/4 — and 4 * P(C) is π. More darts, sharper estimate.
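For readers who want to see the dart-throwing spelled out, here is a rough plain-Python sketch of the same experiment. It is an illustration of the idea, not how Noise evaluates the query; the function name `estimate_pi` and the fixed seed are assumptions made for reproducibility.

```python
import random

def estimate_pi(n_darts, seed=0):
    """Throw n_darts uniform points into [-1, 1] x [-1, 1] and count circle hits."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_darts):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y < 1.0:      # the event C
            inside += 1
    return 4.0 * inside / n_darts    # 4 * P(C)

pi_hat = estimate_pi(200_000)
print(pi_hat)
```

With 200,000 darts the standard error of the estimate is roughly 0.004, so the first two decimals are usually stable.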
The birthday paradox
days ~[23] rand::unif_int(1, 365)
match = vec::has_duplicates(days)
P(match)
A room full of birthdays
Every ~ so far drew a single number. ~[23] hands 23 people one random day each — and that whole batch is still a single draw. The strip up top is one such room scattered across the 365 days of the year.
Ask: does any pair collide?
has_duplicates checks all 253 pairwise comparisons at once and is true the moment two days land on top of each other; the matches flash maroon above.
Add people, watch it climb
Each extra person adds more pairs that could collide, so P(match) rises far faster than intuition expects — not a gentle ramp but a steep wall.
23 beats a coin flip
The curve crosses the dashed 50% line at just 23 people — fewer than a classroom. By 50 it's a near-certainty. Our intuition compares ourselves to others; the math counts every pair.
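The numbers in this section have a closed form, which makes them easy to verify. The sketch below is plain Python rather than Noise: it multiplies out the probability that all birthdays are distinct, assuming 365 equally likely days. The helper name `p_shared_birthday` is hypothetical.

```python
from math import comb

def p_shared_birthday(people, days=365):
    """Exact probability that at least two of `people` share a birthday."""
    p_all_distinct = 1.0
    for i in range(people):
        p_all_distinct *= (days - i) / days   # person i avoids the i days already taken
    return 1.0 - p_all_distinct

pairs = comb(23, 2)             # number of pairwise comparisons among 23 people
p22 = p_shared_birthday(22)     # just under one half
p23 = p_shared_birthday(23)     # just over one half
p50 = p_shared_birthday(50)     # close to certain

print(pairs, round(p22, 4), round(p23, 4), round(p50, 4))
```

A simulated P(match) for 23 people should land near the exact p23 value, within Monte-Carlo error.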
The bell curve, two ways
Figure legend: sampled sum; Normal(N/2, N/12)
# 1. A Galton board: 12 left/right bounces
rights = vec::count(~[12] rand::bernoulli(0.5))
P(rights == 6) # centre bin -> 0.226
# 2. The same bell, from a sum of uniforms
sum = vec::sum(~[12] rand::unif(0, 1)) - 6
P(sum > 1) # -> 0.159, Normal(0,1)
Each peg is a coin flip
Two dice already hinted at a bell; twelve coin flips sharpen it. A ball meets twelve pegs; at each it goes left or right with equal chance. ~[12] bernoulli(0.5) is those twelve flips, and count collapses the batch back to one number — how many went right.
Where does it land?
The number of rights is the bin. P(rights == 6) asks how often a ball ends dead centre — twelve fair steps landing exactly halfway.
Drop the balls
Watch them cascade. Any single ball's path is unpredictable, yet the bins fill into a strikingly regular shape.
A bell, from coin flips
The stacks trace the Binomial — a sum of twelve Bernoulli steps. But is the bell really about coins? Let's swap them for something with no bell in it at all.
Start over with flat noise
unif(0, 1) is equally likely anywhere in [0, 1] — a flat slab, nothing bell-shaped. We'll sum N of them and standardize, starting with just one.
Two make a triangle
Sum two independent uniforms and middling totals beat the extremes — a triangle. The parts are still flat; the sum is what bends.
A few, and it rounds
By four terms the corners are already softening toward a curve — the very same shape the falling balls drew, with no pegs in sight.
Same bell, different machine
Twelve uniforms hug the true Normal (red). That's the central limit theorem: sums of independent, finite-variance parts drift toward a Normal whatever the parts look like. Coins or flat noise, the bell is the same law underneath.
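Both reference values quoted in the program comments can be reproduced by hand. This short Python sketch (not Noise) computes the exact centre-bin probability of a Binomial(12, 1/2) and the Normal(0, 1) tail that the shifted sum of twelve uniforms approximates. The variable names are illustrative.

```python
from math import comb, erfc, sqrt

# 1. Centre bin of a 12-peg Galton board: exactly 6 rights out of 12 fair flips.
p_centre = comb(12, 6) / 2**12

# 2. A sum of 12 Uniform(0, 1) draws has mean 12/2 = 6 and variance 12/12 = 1,
#    so (sum - 6) is approximately Normal(0, 1). Its upper tail beyond 1 is:
p_tail_normal = 0.5 * erfc(1 / sqrt(2))    # P(Z > 1) for a standard Normal

print(round(p_centre, 3), round(p_tail_normal, 3))  # prints "0.226 0.159"
```

The second figure is the Normal limit, so a simulated P(sum > 1) for twelve uniforms should be close to it but need not match to every digit.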
AM vs FM
msg = signal::sample(0.3 * signal::sine(3), 64);
am = [1 + msg, 0 * msg];
fm = [math::cos(3 * msg), math::sin(3 * msg)];
static = signal::noise_white(0.3);
rx_am = am + static;
rx_fm = fm + static;
rec_am = (rx_am[0]**2 + rx_am[1]**2)**0.5 - 1;
rec_fm = math::atan(rx_fm[1] / rx_fm[0]) / 3;
Print("AM error", E(vec::mse(rec_am, msg), 40000));
Print("FM error", E(vec::mse(rec_fm, msg), 40000));
The message
A batch can carry a shape, not just a count — here it's a signal. A gentle tone: 64 samples of a slow sine. This is what we want to send through a noisy channel and get back intact.
AM — message in the amplitude
Write the carrier as a phasor [I, Q] — a spinning arrow. AM puts the message in its length: the tip slides in and out, crossing the unit circle as the message rises and falls.
FM — message in the angle
FM instead puts the message in the angle: the tip rides the unit circle while its length stays fixed. Same information, encoded in rotation rather than reach.
Add the same static
Identical noise hits both tips. It smears AM along the radius it reads from — but only nudges FM around the circle, barely changing the angle. Watch the clouds.
Recover & compare
Read the length back for AM, the angle for FM, and compare to the original. FM returns far cleaner for the very same static — the payoff of spending bandwidth on the angle.
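To make the comparison concrete, here is a hedged plain-Python re-creation of the experiment. Several details are assumptions, since the page does not define them: the message is taken to be three sine cycles across the 64 samples, and the static is taken to be independent Gaussian noise with standard deviation 0.3 on each of I and Q. The helper names `mse` and `trial` are hypothetical.

```python
import math
import random

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def trial(rng, n=64, sigma=0.3):
    # Assumed message: amplitude 0.3, three cycles over the n samples.
    msg = [0.3 * math.sin(2 * math.pi * 3 * k / n) for k in range(n)]
    rec_am, rec_fm = [], []
    for m in msg:
        # Assumed channel: the same Gaussian noise pair hits both phasors.
        ni, nq = rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)
        # AM: phasor [1 + m, 0]; recover the message from the length.
        rec_am.append(math.hypot(1 + m + ni, nq) - 1)
        # FM: phasor [cos(3m), sin(3m)]; recover the message from the angle.
        i, q = math.cos(3 * m) + ni, math.sin(3 * m) + nq
        rec_fm.append(math.atan(q / i) / 3)
    return mse(rec_am, msg), mse(rec_fm, msg)

rng = random.Random(0)
results = [trial(rng) for _ in range(2000)]
am_err = sum(r[0] for r in results) / len(results)
fm_err = sum(r[1] for r in results) / len(results)
print("AM error", am_err)
print("FM error", fm_err)
```

Under these assumptions the FM error comes out several times smaller than the AM error, because dividing the recovered angle by the modulation index of 3 also divides the noise.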
100 boxes, 100 prisoners
n = 100; opens = 50;
boxes ~ rand::permutation(n); # box k holds slip boxes[k]
all_win = true;
for prisoner in 0..n {
    box = prisoner; # open your own box first
    found = false;
    for hop in 0..opens {
        box = boxes[box]; # follow the chain
        found = found || (box == prisoner);
    };
    all_win = all_win && found;
};
P(all_win) # ~ 0.3118
A riddle that sounds impossible
A draw can be a whole arrangement, too — a random permutation. 100 numbered prisoners, 100 numbered boxes, each box holding one prisoner's number in random order. Each may open 50 boxes hunting for their own number. If everyone finds it, all go free — otherwise all are lost. No signalling.
Guessing is hopeless
Open 50 of the 100 boxes at random and you find your own number half the time. Fine for one prisoner, but all 100 must succeed at once, so the odds are (½)¹⁰⁰ ≈ 8×10⁻³¹. For all practical purposes, zero.
The trick: open your own box first
Now the clever rule. Prisoner k opens box k. Inside is some number — so go open that box next. The slip there points to the next box, and so on. You are quietly walking a loop.
Follow the chain home
Because the boxes are a permutation, the chain you follow must eventually loop back to box k — and the slip that closes the loop is your own number. You find it on the very last hop of your cycle… if that cycle is at most 50 long.
Every prisoner walks one loop
The whole arrangement splits into a handful of separate loops, and each prisoner is born onto exactly one of them. Everyone on a given loop succeeds or fails together, depending only on that loop's length.
One rule decides everything: no loop over 50
So all 100 go free precisely when the longest loop is ≤ 50. Watch fresh shuffles roll by — a single long loop (red) dooms everyone on it; keep every loop short (green) and the whole room walks free.
How often? About 31%
Over thousands of shuffles, plot the longest loop. A permutation can have at most one loop longer than 50, and the chance it has none works out to 1 − (1/51 + … + 1/100) ≈ 0.3118. The green mass left of the line is the answer.
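The loop-length argument can be checked two ways: by evaluating the closed form, and by shuffling and measuring the longest cycle. The following Python sketch (not Noise) does both. The function names `longest_cycle` and `simulate` and the fixed seed are assumptions for illustration.

```python
import random

def longest_cycle(perm):
    """Length of the longest cycle in a permutation given as a list."""
    seen = [False] * len(perm)
    longest = 0
    for start in range(len(perm)):
        length = 0
        k = start
        while not seen[k]:          # walk the loop that contains `start`
            seen[k] = True
            k = perm[k]
            length += 1
        longest = max(longest, length)
    return longest

def simulate(trials, n=100, opens=50, seed=0):
    """Fraction of random shuffles in which no loop is longer than `opens`."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        perm = list(range(n))
        rng.shuffle(perm)
        wins += longest_cycle(perm) <= opens
    return wins / trials

exact = 1 - sum(1 / k for k in range(51, 101))   # 1 - (1/51 + ... + 1/100)
estimate = simulate(5_000)
print(round(exact, 4), estimate)
```

The simulated fraction should agree with the closed form to within a percentage point or two at this sample size.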
Examples
A catalogue of short programs, each a Monte-Carlo experiment with a known closed form, so the printed answer can be checked. Open any one in the playground to run and edit it — the real Noise compiler, built to WebAssembly and running in your browser. Each program gets its own shareable link.
Basics
Probability
- Three heads in a row: 1/8 = 0.125. Model the event, don’t multiply probabilities by hand.
- Exactly two heads: 3/8 = 0.375. A tiny Binomial built from boolean events.
- Birthday paradox: How often does a group share a birthday?
- Monty Hall: 2/3 ≈ 0.6667. Switching doors wins 2/3 of the time.
- Conditional probability: 1/3 ≈ 0.3333. P(A | B) as a ratio of probabilities.
Games & risk
- D&D advantage: Keep the higher of two d20s.
- Max of two dice: 11/36 ≈ 0.3056. max over random variables via a lifted if.
- A dice bet: Build a payoff distribution, then ask about profit.
- Insurance payout: A deductible as a lifted if over a loss.
- Redundancy: 0.999. Three parallel components, each 90% reliable.
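Several of the closed forms listed in this catalogue can be confirmed by brute-force enumeration. The sketch below is plain Python, not the catalogue programs themselves, and it makes some assumptions about what each entry measures: that "Max of two dice" asks for the chance the maximum is 6, that three coins are tossed for "Exactly two heads", and that "Redundancy" means at least one of three independent components works. The helper `prob` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def prob(event, outcomes):
    """Exact probability of `event` over equally likely `outcomes`."""
    outcomes = list(outcomes)
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

coins = list(product([0, 1], repeat=3))        # three fair coins, 1 = heads
dice = list(product(range(1, 7), repeat=2))    # two fair dice

three_heads = prob(lambda c: sum(c) == 3, coins)
two_heads = prob(lambda c: sum(c) == 2, coins)
max_is_six = prob(lambda d: max(d) == 6, dice)

# Monty Hall: (car door, first pick). Switching wins exactly when the first pick is wrong.
monty_switch = prob(lambda g: g[0] != g[1], product(range(3), repeat=2))

# Redundancy: the system fails only if all three 90%-reliable components fail.
redundancy = 1 - Fraction(1, 10) ** 3

print(three_heads, two_heads, max_is_six, monty_switch, redundancy)
# prints "1/8 3/8 11/36 2/3 999/1000"
```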
Open in playground →Continuous & CLT
Signals & DSP
Functions & research
How it works
Noise is small by design. Everything above is built from a handful of ideas.
Everything is a distribution
A number is just a distribution with all its weight on a single point — a Dirac delta. Operators lift over random variables automatically — so X below is random, and Y is random too, with no special syntax. Propagating uncertainty reads exactly like ordinary arithmetic.
X ~ unif(-1, 1)
Y = 2 * X + 3 # Y is a distribution too
The tilde draws; equals transforms
A name bound with ~ is one fixed random draw that every mention reuses — so X − X is exactly 0, never "two samples." Independence comes from separate ~ bindings, exactly like writing X₁, X₂ on paper. No hidden re-draws, no surprises.
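One way to picture this rule is to treat each ~ binding as a column of samples that every mention of the name reads from. The Python sketch below is only a mental model under that assumption, not a description of the Noise runtime; all names are illustrative.

```python
import random

rng = random.Random(0)
n = 10_000

# One column per binding: each name is drawn once per sample and then reused.
x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
x2 = [rng.uniform(-1.0, 1.0) for _ in range(n)]   # a separate, independent binding

same_name = [a - a for a in x]                # X - X: the same draw on both sides
two_names = [a - b for a, b in zip(x, x2)]    # X1 - X2: two independent draws

all_zero = all(d == 0.0 for d in same_name)
spread = max(two_names) - min(two_names)
print(all_zero, spread > 1.0)  # prints "True True"
```

Reusing a name never widens the distribution; only a second binding introduces new randomness.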
A ~ unif_int(1, 6)
B ~ unif_int(1, 6) # two independent dice
A + B # a genuine 2d6 distribution
Queries: P, E, Var, Q
Nothing is sampled until you ask. A query runs a fast columnar Monte-Carlo pass and reports an honest estimate — the printed digits reflect the standard error, and that error propagates through arithmetic, so 4·P(C) rounds itself correctly.
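The idea of an estimate that carries its own error bar can be sketched in a few lines of Python. This is an illustration using the textbook binomial standard error, not the formula Noise is confirmed to use; `estimate_probability` and `in_circle` are hypothetical helpers.

```python
import math
import random

def estimate_probability(event, n, seed=0):
    """Monte-Carlo estimate of P(event) with its binomial standard error."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if event(rng))
    p = hits / n
    se = math.sqrt(p * (1.0 - p) / n)
    return p, se

def in_circle(rng):
    x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    return x * x + y * y < 1.0

p, se = estimate_probability(in_circle, 100_000)
pi_hat, pi_se = 4.0 * p, 4.0 * se    # scaling by a constant scales the error too
print(f"{pi_hat:.2f} +/- {pi_se:.2f}")
```

Here the error on 4·P(C) is about 0.005, which is why only the first two decimals are worth printing at this sample size.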
C = X**2 + Y**2 < 1
4 * P(C) # ≈ 3.14
Independence is a shape
Put a shape on the tilde to draw a whole batch at once: ~[n] is an iid vector, ~[n, m] a matrix. A reducer collapses it back to one number — so the birthday paradox over 23 people, all 253 pairwise comparisons, is a single expression.
days ~[23] unif_int(1, 365)
P(has_duplicates(days)) # ≈ 0.51
if is a value, not a branch
When the condition is random, if c { a } else { b } does not take a path — it builds a new random variable, choosing a or b per sample. That single rule hands you max, min, abs, clamps and payoffs over distributions for free.
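A per-sample selection like this is easy to mimic in ordinary code. The following Python sketch shows the idea for the larger of two dice and compares the result with the closed forms 11/36 and 161/36. It is a model of the semantics described above, with a fixed seed assumed for reproducibility, not the Noise implementation.

```python
import random

rng = random.Random(0)
n = 100_000

a = [rng.randint(1, 6) for _ in range(n)]   # die A, one value per sample
b = [rng.randint(1, 6) for _ in range(n)]   # die B, drawn independently

# A "lifted" if: evaluate the condition per sample and select elementwise.
higher = [x if x > y else y for x, y in zip(a, b)]

p_six = sum(1 for h in higher if h == 6) / n   # closed form: 11/36
mean_higher = sum(higher) / n                  # closed form: 161/36
print(p_six, mean_higher)
```

Because both branches are values, the same pattern covers min, abs and clamps by changing only the condition and the two arms.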
higher = if A > B { A } else { B } # the larger of two dice
Performance
Almost every Noise program ends in “evaluate this expression over a few million random
draws.” That loop is compiled, not interpreted. ~ and the
distribution constructors build a graph IR that lowers three ways — a portable columnar
interpreter, a native JIT via Cranelift, and a
WebAssembly emitter for the browser — all sharing one cost model, so the backend only ever
changes speed, never results (bit-identical across core counts).
You write a one-line P(...) and get an expert kernel for
free. It is built from a stack of techniques, each with its own measured win:
- Kernel fusion: the codegen backends emit one loop that draws its sources, computes the whole expression in registers, and stores only the result, erasing the interpreter's intermediate memory traffic.
- Graph simplification: constant folding, finite-safe algebraic identities, and common-subexpression elimination shrink the DAG before any code is generated (so X + X is one draw, not two).
- Inlined xoshiro256++ PRNG: the generator is emitted straight into the kernel as a handful of shifts/xors/rotates, with zero call overhead on native and in WASM alike.
- Inlined transcendentals: ln/sin/cos (the heart of normal, exp, and signals) become straight-line polynomial approximations (~1e-9 vs libm), roughly doubling the throughput of transcendental-bound kernels and skipping a per-draw crossing of the JS boundary in the browser.
- Multi-stream RNG: four independent xoshiro streams run at once to hide the generator's serial-dependency latency (the scalar form of SIMD), switched on only where the graph is latency-bound.
- Columnar batches: the interpreter runs 1024 lanes through one instruction at a time, a tight, cache-friendly, auto-vectorizing pass over contiguous f64s.
- Vectorized power-sum reduction: moments accumulate as raw power sums across eight unrolled lanes with no per-element divide, ~9.5× faster than a streaming Welford update, turning the reduction from the ceiling into a rounding error.
- Deterministic multicore: sampling fans out with a work-stealing loop whose per-chunk accumulators merge as an exactly-associative monoid, so the answer is bit-identical regardless of thread count, and reproducible from a seed.
- Profitability gate: a cost model emits a fused kernel only where it beats the vectorized interpreter, so codegen can change the speed but never lose.
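The power-sum and mergeable-accumulator ideas in the list above can be illustrated with a small Python sketch. It is a simplified model, not the Noise engine: the class `PowerSums` and the helper `accumulate` are hypothetical, and the sample data are small integers so that floating-point sums are exact. How Noise achieves exact associativity for arbitrary floats is not shown here.

```python
from dataclasses import dataclass
from functools import reduce

@dataclass(frozen=True)
class PowerSums:
    n: int = 0
    s1: float = 0.0   # running sum of x
    s2: float = 0.0   # running sum of x**2

    def add(self, x):
        return PowerSums(self.n + 1, self.s1 + x, self.s2 + x * x)

    def merge(self, other):
        # Accumulators combine by plain addition, with PowerSums() as the identity.
        return PowerSums(self.n + other.n, self.s1 + other.s1, self.s2 + other.s2)

    def mean(self):
        return self.s1 / self.n

    def variance(self):
        m = self.mean()
        return self.s2 / self.n - m * m   # population variance; divides only at the end

def accumulate(chunk):
    acc = PowerSums()
    for x in chunk:
        acc = acc.add(x)
    return acc

data = [float(1 + i % 6) for i in range(6000)]             # die-like values 1..6
chunks = [data[i:i + 1000] for i in range(0, 6000, 1000)]  # one chunk per "worker"
parts = [accumulate(c) for c in chunks]

whole = accumulate(data)
forward = reduce(PowerSums.merge, parts, PowerSums())
backward = reduce(PowerSums.merge, reversed(parts), PowerSums())
print(whole.mean(), forward == whole, backward == whole)  # prints "3.5 True True"
```

Unlike a streaming Welford update, nothing is divided per element. The trade-off is that raw power sums can lose precision to cancellation on data with a large mean relative to its spread.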
The payoff, measured on a 14-core M4 Pro:
- ~5.8 billion samples/sec (π Monte Carlo, generate + reduce, all cores), scaling ~9.6× from one core to all of them.
- Within ~1.15× of hand-written, LLVM-compiled Rust per core — and faster end to end, because the one-liner fuses and fans out across every core with no flags or annotations.
- In the browser the emitted WASM kernel runs the same fused loop at ~0.5–0.75× of native codegen — hundreds of millions of samples/sec, client-side.
The full write-up, with the benchmark tables behind each number, is in PERF.md.