Walkthrough

How it works

A single chat turn, end to end.

1. The user types a prompt

The chat client (intelnav) tokenizes the prompt and embeds it into the first hidden state. It holds the embedding layer and the front slice (layers 0..k) and runs both locally.

2. The DHT chooses a chain

Each peer that hosts a slice publishes a provider record on Kademlia, keyed by blake3("intelnav/shard/v1|<cid>|<start>|<end>"). The chat client fans out one DHT lookup per range it needs, then ranks providers by TCP probe latency. Up to two candidates per hop are kept as backups.
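The lookup-and-rank step can be sketched roughly as follows. This is an illustration, not intelnav's actual code: the Provider type, the rank_providers helper, and the rendering of <start> and <end> as decimal integers are assumptions. The blake3 hash is left out because it is not in the Python standard library, and "up to two backups" is read here as two candidates in addition to the primary.

```python
from dataclasses import dataclass

MAX_BACKUPS = 2  # the text keeps up to two backup candidates per hop


def shard_key_preimage(cid: str, start: int, end: int) -> bytes:
    """Build the string that gets hashed into the DHT key.

    The blake3 hashing step is omitted; integer formatting is an assumption.
    """
    return f"intelnav/shard/v1|{cid}|{start}|{end}".encode("utf-8")


@dataclass(frozen=True)
class Provider:
    peer_id: str       # hypothetical identifier for a hosting peer
    latency_ms: float  # TCP probe result; float("inf") if unreachable


def rank_providers(providers):
    """Return (primary, backups), lowest probe latency first."""
    reachable = [p for p in providers if p.latency_ms != float("inf")]
    ranked = sorted(reachable, key=lambda p: p.latency_ms)
    if not ranked:
        return None, []
    return ranked[0], ranked[1:1 + MAX_BACKUPS]
```

Dropping unreachable peers before sorting keeps a dead provider from occupying a backup slot it could never fill.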

3. Hidden states flow through the chain

User → TUI → Local pipeline (layers 0..k)
                       │
                       │  ForwardHidden (CBOR-framed)
                       ▼
              peer A · layers k..m
                       │
                       ▼
              peer B · layers m..N
                       │  hidden state
                       ▼
              tail peer · head + sample
                       │  token
                       ▼
                     stream

Hidden states travel as length-prefixed CBOR ForwardHidden messages. Each peer keeps its own KV cache for the session. SessionInit resets cache state at the start of each turn.
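Length-prefixed framing can be illustrated with a small sketch. The text does not specify the width or byte order of the prefix, so the 4-byte big-endian header below is an assumption, as are the function names. The payload is treated as opaque bytes; in the real protocol it would be the CBOR encoding of a ForwardHidden message.

```python
import io
import struct

HEADER = struct.Struct(">I")  # assumed: 4-byte big-endian unsigned length


def write_frame(stream, payload: bytes) -> None:
    """Write one frame: length prefix followed by the payload bytes."""
    stream.write(HEADER.pack(len(payload)))
    stream.write(payload)


def read_exact(stream, n: int) -> bytes:
    """Read exactly n bytes, looping because a read may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise EOFError("stream closed mid-frame")
        buf.extend(chunk)
    return bytes(buf)


def read_frame(stream) -> bytes:
    """Read one frame and return its payload."""
    (length,) = HEADER.unpack(read_exact(stream, HEADER.size))
    return read_exact(stream, length)


def roundtrip(payloads):
    """Usage example: write frames to an in-memory stream, read them back."""
    buf = io.BytesIO()
    for p in payloads:
        write_frame(buf, p)
    buf.seek(0)
    return [read_frame(buf) for _ in payloads]
```

The prefix tells the receiver where one message ends and the next begins, which a raw byte stream does not provide on its own.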

4. Tokens stream back to the user

The tail peer samples a token from the final logits and sends it back upstream, where the chat client renders it. This repeats, one token at a time, until the model emits an end-of-sequence token.
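A minimal sketch of that loop from the client's side, under stated assumptions: next_token is a hypothetical callable standing in for one full pass through the chain, the end-of-sequence token is not rendered, and the max_tokens guard is an addition that the text does not mention.

```python
from typing import Callable, Iterator, Optional


def stream_tokens(
    next_token: Callable[[], int],
    eos_id: int,
    max_tokens: Optional[int] = None,
) -> Iterator[int]:
    """Yield sampled token ids until end-of-sequence (or the optional cap)."""
    produced = 0
    while max_tokens is None or produced < max_tokens:
        token = next_token()  # one round trip through the chain
        if token == eos_id:
            return            # stop without yielding the EOS token
        yield token
        produced += 1
```

Writing the loop as a generator lets the caller render each token as soon as it arrives rather than waiting for the whole reply.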

Failure handling

  • A hop disconnects mid-turn → the chain driver swaps in the next-best candidate for that slot, retries the connection, and continues streaming.
  • A peer wants to stop hosting → it flips the slice to Draining, stops re-publishing the DHT record, and refuses new chains. In-flight chains keep streaming until they finish or hit the 5-minute force-stop.
  • A peer crashes → its provider record ages out of the DHT in 30 minutes. The re-announce interval is 5 minutes, so a healthy peer re-claims its place quickly.
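The first two cases can be sketched as small state holders. This illustrates the behavior described above and is not the project's implementation: HopSlot, HostedSlice, SliceState and all method names are hypothetical, and the injectable clock exists only to make the 5-minute force-stop easy to exercise.

```python
import time
from enum import Enum

FORCE_STOP_SECS = 5 * 60  # the 5-minute force-stop from the text


class HopSlot:
    """One position in the chain: candidates ordered best first."""

    def __init__(self, candidates):
        if not candidates:
            raise ValueError("a slot needs at least one candidate")
        self._candidates = list(candidates)
        self._index = 0

    @property
    def active(self):
        return self._candidates[self._index]

    def fail_over(self):
        """Swap in the next-best candidate; raise if none are left."""
        if self._index + 1 >= len(self._candidates):
            raise RuntimeError("no backup candidates left for this hop")
        self._index += 1
        return self.active


class SliceState(Enum):
    SERVING = "serving"
    DRAINING = "draining"
    STOPPED = "stopped"


class HostedSlice:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.state = SliceState.SERVING
        self.in_flight = 0
        self._drain_started = None

    def accept_chain(self) -> bool:
        """New chains are accepted only while serving."""
        if self.state is not SliceState.SERVING:
            return False
        self.in_flight += 1
        return True

    def finish_chain(self) -> None:
        self.in_flight = max(0, self.in_flight - 1)
        self.tick()

    def start_draining(self) -> None:
        if self.state is SliceState.SERVING:
            self.state = SliceState.DRAINING
            self._drain_started = self._clock()
            self.tick()

    def should_publish(self) -> bool:
        """A draining or stopped slice no longer re-publishes its record."""
        return self.state is SliceState.SERVING

    def tick(self) -> None:
        """Stop once in-flight chains finish or the force-stop deadline passes."""
        if self.state is not SliceState.DRAINING:
            return
        timed_out = self._clock() - self._drain_started >= FORCE_STOP_SECS
        if self.in_flight == 0 or timed_out:
            self.state = SliceState.STOPPED
```

In this sketch a slot with the primary and two backups survives two consecutive disconnects before fail_over raises.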

For implementation details, see the architecture diagram and runtime sequence diagram in docs/architecture.md.