Sovereignty

Why a decentralized model matters.

Every prompt sent to a hosted model is logged by the company that hosts it. Code that didn't compile. Medical questions you wouldn't put to your GP. Drafts of resignation letters. Half-formed political opinions. The things you ask before bed because nobody else is awake. All retained, retrievable, and useful for training the next model. Search engines saw what you were curious about. Inference providers see what you intend to do.

IntelNav approaches the problem from a different angle. Rather than ask the host to behave, it removes the host. The model runs in pieces, on volunteer machines, with cryptographic boundaries between every pair of peers. It is slower today than a datacenter call. That changes as more people run nodes.

Threat model

Specific is better than aspirational. The list below states what the design defends against and what it doesn't.

What no single peer sees

A chain has at minimum four parties: a chat client, an entry peer that owns the front layers, one or more middle peers, and a tail peer that owns the lm-head. Only the entry peer ever holds the prompt as text; it has to, in order to embed it. From layer k onward, what travels between peers is a tensor of activations: a vector of floats that came out of the previous block.

That tensor is not text. There is no published method for inverting a mid-layer hidden state of a modern transformer back to its prompt; by the time the tensor leaves the entry peer, the embedding has already been pushed through k non-linear blocks. So the chain is the privacy boundary. One peer sees plaintext. The rest see opaque math. An adversary who controls every peer in your chain sees everything; an adversary who controls one of them sees only their slice.
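The data flow above can be sketched in a few lines. Everything here is a stand-in invented for illustration: the toy hash embedding, the toy block, the tiny dimension. The point is only structural: text exists at the entry peer, and nothing but floats exists downstream of it.

```python
import math

def block(h: list[float]) -> list[float]:
    # Stand-in for one transformer block: a non-linear mix of the inputs.
    return [math.tanh(x + 0.1 * sum(h)) for x in h]

def entry_peer(prompt: str, dim: int = 8) -> list[float]:
    # The only party that touches text: embed it (toy hash embedding),
    # then run the front layers.
    h = [float((hash(prompt) >> i) & 0xFF) / 255.0 for i in range(dim)]
    return block(h)

def middle_peer(h: list[float]) -> list[float]:
    return block(h)          # sees only floats, never the prompt

def tail_peer(h: list[float]) -> int:
    # lm-head stand-in: pick the index of the strongest activation.
    return max(range(len(h)), key=lambda i: h[i])

activations = entry_peer("draft of a resignation letter")
activations = middle_peer(activations)
token_id = tail_peer(activations)   # an int; the text never left the entry peer
```

Each arrow between peers carries only the `list[float]` (in practice, a tensor), which is what makes the chain, not any single hop, the privacy boundary.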

What runs on the wire

  • Noise XX between every peer pair. Ephemeral X25519 ECDH for the handshake, AES-256-GCM for the bulk transport. Forward secrecy is on by default; seizing a peer tomorrow doesn't reveal a chain it carried yesterday.
  • Ed25519 identities. A peer is its public key. There is no central issuer rotating tokens, logging sign-ins, or quietly revoking accounts.
  • Signed slice advertisements. Every “I host layers k..m of model cid” record on the DHT carries a signature. Routing isn't somewhere a third party can lie quietly.
  • CBOR-framed messages. Length-prefixed, fixed-schema, no string parsing on the hot path. Easy to audit; easy to fuzz.
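The framing point is worth seeing concretely. The sketch below shows length-prefixed, fixed-header framing with no delimiter scanning; JSON stands in for CBOR so the sketch needs only the standard library, and the 4-byte big-endian length prefix is an illustrative choice, not the actual wire format.

```python
import json
import struct

def encode_frame(msg: dict) -> bytes:
    # Serialize, then prepend a fixed-size length header.
    payload = json.dumps(msg, separators=(",", ":")).encode()
    return struct.pack(">I", len(payload)) + payload

def decode_frame(buf: bytes) -> tuple[dict, bytes]:
    # Read the fixed-size header first: no string scanning on the hot path.
    (length,) = struct.unpack_from(">I", buf)
    payload = buf[4 : 4 + length]
    return json.loads(payload), buf[4 + length :]

# A hypothetical slice advertisement, shaped like the records described above.
advert = {"type": "slice", "model": "cid-example", "layers": [0, 6]}
wire = encode_frame(advert)
decoded, rest = decode_frame(wire)
assert decoded == advert and rest == b""
```

Because the decoder knows the payload length before touching a byte of it, malformed input fails at a bounds check rather than inside a parser, which is what makes this style easy to audit and easy to fuzz.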

What this design doesn't defend against (yet)

  • The entry peer. Plaintext stops there. If you don't trust any of the entry candidates, host the front slice yourself; it's the cheapest one to run.
  • Traffic analysis. Hidden-state shapes leak the model. Token timing leaks throughput. The chain is fast point-to-point; it isn't anonymous. Onion-routed transport is on the list, not in the box.
  • All-peers-collude. A chain whose peers are all the same operator is one operator wearing four hats. Diverse peer selection is the chat client's job and the routing layer's job.
  • The model itself. Memorization, prompt extraction, jailbreaks: these are model problems. Splitting the weights across peers does nothing about them.

Performance, today vs. later

Four hops cost more round-trips than one datacenter call. That cost is real and we won't hide it. What changes is population: every additional peer makes chains shorter, closer, and easier to parallelise.

Tor, 2003

Hidden services were unusable for casual browsing. Pages took fifteen seconds. The relay network was a few hundred volunteers in two countries. Today: 7,000+ relays in 80+ countries, onion services that load in under a second. Protocol unchanged. Population different.

BitTorrent, 2002

First releases were single-digit kbps for popular files. By 2010, popular torrents saturated home broadband. Today, more bytes flow over BitTorrent on a normal evening than through most CDNs. Same shape of curve.

IntelNav, now

A handful of peers, mostly on one continent. A 7B model wants three hops; a 33B wants eight. RTTs dominate compute on small models. We're in the same place Tor was before anyone used it for the open web.
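A back-of-envelope calculation shows why RTT dominates at small scale. The numbers are illustrative assumptions, not measurements: each generated token makes one pass through the whole chain, so per-token latency is roughly hops times (RTT plus slice compute).

```python
# Illustrative figures for a 7B model split across three peers.
hops = 3                   # chain length for a 7B model
rtt_ms = 60.0              # assumed per-hop round trip, cross-continent
compute_ms = 10.0          # assumed per-slice forward pass

per_token_ms = hops * (rtt_ms + compute_ms)    # 210 ms per token
network_share = hops * rtt_ms / per_token_ms   # ~86% spent on the wire

print(f"{per_token_ms:.0f} ms/token, {network_share:.0%} of it network")
```

Under these assumptions the wire eats most of the budget, which is exactly why the fixes that follow are about population and placement rather than faster kernels.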

What gets faster with population

  • Geographic locality. With ten hosts of layers 0..6 on your continent instead of one in another hemisphere, first-hop RTT goes from 200 ms to 20 ms. Stack that across the chain.
  • Parallel chains. Redundant hosts let the chat client race two chains and accept the first response. It's the same trick CDNs use against single-origin tail latency.
  • Slice replication. Popular models pick up redundant hosts on their own. Unpopular slices stay rare, but you also use them rarely. The market handles the balance.
  • Speculative decoding. A small fast model on the chat client drafts tokens; the chain verifies them. The wire stays warm and perceived latency falls.
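The parallel-chains trick above can be sketched with two racing tasks: issue the same prompt down two redundant chains, take whichever answers first, and abandon the loser. The chain names and delays are made up for illustration.

```python
import asyncio

async def query_chain(prompt: str, name: str, delay: float) -> str:
    # Stand-in for a full chain round trip with some latency.
    await asyncio.sleep(delay)
    return f"response to {prompt!r} via {name}"

async def race(prompt: str) -> str:
    tasks = [
        asyncio.create_task(query_chain(prompt, "chain-near", 0.05)),
        asyncio.create_task(query_chain(prompt, "chain-far", 0.20)),
    ]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for t in pending:
        t.cancel()                      # the slower chain is abandoned
    return done.pop().result()

print(asyncio.run(race("hello")))       # the faster chain wins
```

The client pays double the chain traffic for a large cut in tail latency, the same trade CDNs make when they race origins.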

The asymptote isn't “as fast as a datacenter”. It's “fast enough that you stop noticing”, which is also the bar centralized providers actually clear in practice.

Why bother

A useful tool for thinking, hosted by three vendors who log every use, who can revoke access, who decide what the tool may discuss, isn't a tool the user owns. It's a tool the user rents. We already have decentralised money, publishing, file delivery, and name resolution. The model is the next piece.