From HIP-3 Order Flow to a Defensible Wallet Taxonomy

Arrakis's own study, published April 8, 2026, with 0xArchive credited alongside HyperTracker for the 808 million order events behind it, states a limit up front: the retail share is "a floor, not a ceiling," because the interface can't tag limit orders. That limit matters because every rerun has to preserve the same bucket boundary. The concrete question this benchmark answers: if you pull the same window twice, a week apart, do the buckets agree? Because the window is historical and fixed, any gap has to trace to how the pipeline reassembled the data, not to a change in market history. (The spotlight covers the full research job this tests; the comparison covers the native-API gap it measures against.)

Where it breaks

Classify wallets market by market and you'd miss the interesting ones entirely. 209 wallets traded all four silver markets in the Arrakis study, a cohort invisible unless the classification is built to look across markets on purpose. Tighten a threshold after the fact and every borderline wallet needs re-checking against order-book state at the moment the label was assigned, not just the fill that triggered it.
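The cross-market pass described above can be sketched as a per-wallet set comparison. This is a minimal Python sketch, assuming the trades have already been pulled and reduced to (wallet, symbol) pairs; the wallet addresses and market names are placeholders, not values from the study.

```python
from collections import defaultdict


def wallets_in_all_markets(trades, markets):
    """Return the wallets that traded every symbol in `markets`.

    `trades` is an iterable of (wallet, symbol) pairs; symbols outside
    `markets` are ignored rather than disqualifying the wallet.
    """
    required = set(markets)
    seen = defaultdict(set)
    for wallet, symbol in trades:
        if symbol in required:
            seen[wallet].add(symbol)
    return {wallet for wallet, symbols in seen.items() if symbols == required}


# Hypothetical market names and wallets, for illustration only.
silver_markets = ["silver-1", "silver-2", "silver-3", "silver-4"]
sample_trades = [
    ("0xaaa", "silver-1"), ("0xaaa", "silver-2"),
    ("0xaaa", "silver-3"), ("0xaaa", "silver-4"),
    ("0xbbb", "silver-1"), ("0xbbb", "silver-2"),
    ("0xccc", "silver-1"), ("0xccc", "silver-2"),
    ("0xccc", "silver-3"), ("0xccc", "xyz:TSLA"),
]

cohort = wallets_in_all_markets(sample_trades, silver_markets)
```

A classifier that only ever sees one symbol's trades at a time never builds the per-wallet set, which is why the cohort stays invisible to it.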

Same window, run twice. The buckets barely move.

What it runs on

Start with the fill: /trades/{symbol} gives every trade, wallet-attributed, verified back to February 2026. That alone tells you what a wallet did, not how it got there. /orderbook/{symbol}/history holds the book state around each labeling decision; across the seven Arrakis study markets specifically, that starts February 16, 2026 for the earliest and March 4 for the latest. /orderbook/{symbol}/l4/diffs holds the add, modify, and cancel events, live from March 10, 2026. /orderbook/{symbol}/l4 holds the resting order-level book itself, reconstructed from the nearest checkpoint at or before any requested moment, available once the first checkpoint lands shortly after March 10: every order's price, size, and ID. /orders/{symbol}/history holds wallet-attributed order state on the same range. Reproducing a study at this scale means pulling from all five of these routes; a rerun draws from the same five.
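To make "nearest checkpoint at or before" concrete, here is a rough Python sketch of how a book could be rebuilt from a checkpoint plus the add, modify, and cancel events that follow it. It is an illustration under stated assumptions, not the service's implementation: the field names (time, type, order_id, price, size) and the rule that a checkpoint already includes events at its own timestamp are both hypothetical.

```python
from bisect import bisect_right


def nearest_checkpoint_index(checkpoint_times, t):
    """Index of the latest checkpoint at or before `t`, or None if there is none.

    `checkpoint_times` must be sorted ascending.
    """
    i = bisect_right(checkpoint_times, t)
    return i - 1 if i > 0 else None


def book_at(checkpoints, diffs, t):
    """Rebuild {order_id: (price, size)} as of time `t`.

    checkpoints: list of (time_ms, snapshot_dict), sorted by time.
    diffs: list of event dicts sorted by time (assumed shape, see lead-in).
    """
    idx = nearest_checkpoint_index([c[0] for c in checkpoints], t)
    if idx is None:
        return None  # no checkpoint exists yet at this moment
    cp_time, snapshot = checkpoints[idx]
    book = dict(snapshot)  # copy so the stored checkpoint is not mutated
    for ev in diffs:
        if ev["time"] <= cp_time:
            continue  # assumed to be reflected in the checkpoint already
        if ev["time"] > t:
            break
        if ev["type"] == "cancel":
            book.pop(ev["order_id"], None)
        else:  # "add" or "modify" sets the order's current price and size
            book[ev["order_id"]] = (ev["price"], ev["size"])
    return book


# Hypothetical data for illustration.
checkpoints = [
    (1000, {"a": (10.0, 1.0)}),
    (2000, {"a": (10.0, 1.0), "b": (11.0, 2.0)}),
]
diffs = [
    {"time": 1500, "type": "add", "order_id": "b", "price": 11.0, "size": 2.0},
    {"time": 2500, "type": "modify", "order_id": "b", "price": 11.0, "size": 1.5},
    {"time": 2700, "type": "cancel", "order_id": "a", "price": None, "size": None},
]
```

Because the lookup depends only on the requested time and the stored data, two reruns asking for the same moment start from the same checkpoint.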

The precomputed version of the same fields

/wallets/classify precomputes the same fields, one row per wallet per day: cancel rate, maker ratio, order-to-trade ratio, whether a wallet uses TWAP or trigger orders. There's no symbol parameter: each row covers a wallet's behavior across every HIP-3 market it touched that day, the same cross-market reach that surfaces a cohort like the 209 silver-market wallets above. A per-symbol pull can't give you that. The endpoint filters directly on the field a threshold-based taxonomy is built from: min_cancel_rate and max_cancel_rate pull exactly the wallets a bucket boundary would separate, without a separate filtering pass. Pull the same date twice, a week apart, and the wallet metrics match exactly; only the request_id in the response envelope changes. The precomputed version already clears the same rerun bar the raw-route version has to earn by hand.
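The week-apart check can be sketched as a comparison that ignores the envelope and looks only at the per-wallet rows. The response shape below (a top-level request_id plus a data list with a wallet key per row) and the metric key names are assumptions made for illustration; adjust them to whatever the endpoint actually returns.

```python
def metrics_by_wallet(response):
    """Index the rows of one response by wallet, dropping the envelope."""
    return {
        row["wallet"]: {k: v for k, v in row.items() if k != "wallet"}
        for row in response["data"]
    }


def classify_diff(first, second):
    """Return {wallet: (metrics_in_first, metrics_in_second)} for every mismatch.

    A wallet missing from one pull shows up with None on that side.
    An empty dict means the two pulls agree on every wallet metric.
    """
    a, b = metrics_by_wallet(first), metrics_by_wallet(second)
    return {
        wallet: (a.get(wallet), b.get(wallet))
        for wallet in sorted(a.keys() | b.keys())
        if a.get(wallet) != b.get(wallet)
    }


# Hypothetical responses for the same date, pulled a week apart.
pull_1 = {
    "request_id": "req-001",
    "data": [
        {"wallet": "0xaaa", "cancel_rate": 0.91, "maker_ratio": 0.80},
        {"wallet": "0xbbb", "cancel_rate": 0.12, "maker_ratio": 0.05},
    ],
}
pull_2 = {
    "request_id": "req-002",
    "data": [
        {"wallet": "0xbbb", "cancel_rate": 0.12, "maker_ratio": 0.05},
        {"wallet": "0xaaa", "cancel_rate": 0.91, "maker_ratio": 0.80},
    ],
}

mismatches = classify_diff(pull_1, pull_2)
```

Indexing by wallet rather than comparing lists keeps row order from registering as a difference.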

Checking that a rerun actually agrees

Build the taxonomy from the five raw routes instead of the precomputed one, and the rerun test gets harder: pull the same window twice, a week apart, and diff the wallet-to-bucket assignments. Because the window is historical and fixed, no wallet actually behaved differently between the two pulls; any bucket change means the assignment logic saw different inputs the second time. A wallet that flips buckets because the order-book snapshot at labeling time came from a different pagination cursor is exactly that kind of bug. From outside, such a flip looks identical to a legitimate reassignment unless the book state is queryable from the nearest checkpoint at or before the moment the label was assigned, not approximated from a nearby fill.
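The diff of wallet-to-bucket assignments can be sketched as follows. The single cancel-rate boundary and the bucket names are hypothetical stand-ins for whatever taxonomy is actually in use; the point is only that both runs go through the same assignment function, so any flip traces back to the inputs.

```python
def assign_bucket(cancel_rate, boundary=0.5):
    """Toy threshold rule: one boundary, two buckets (both names are made up)."""
    return "high_cancel" if cancel_rate >= boundary else "low_cancel"


def bucket_flips(run_a, run_b, boundary=0.5):
    """Compare two runs given as {wallet: cancel_rate}.

    Returns a sorted list of (wallet, bucket_in_a, bucket_in_b) for wallets
    whose bucket differs; a wallet absent from one run gets None on that side.
    """
    flips = []
    for wallet in sorted(run_a.keys() | run_b.keys()):
        a = assign_bucket(run_a[wallet], boundary) if wallet in run_a else None
        b = assign_bucket(run_b[wallet], boundary) if wallet in run_b else None
        if a != b:
            flips.append((wallet, a, b))
    return flips


# Hypothetical per-wallet cancel rates computed from two pulls of one window.
first_run = {"0xaaa": 0.91, "0xbbb": 0.49, "0xccc": 0.10}
second_run = {"0xaaa": 0.91, "0xbbb": 0.51, "0xccc": 0.10}

flips = bucket_flips(first_run, second_run)
```

Here 0xbbb is the borderline case: a small input difference near the boundary is enough to move it, so it is the wallet whose book state at labeling time would need inspecting.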

Pull the record

curl "https://api.0xarchive.io/v1/hyperliquid/hip3/trades/xyz:TSLA?start=1782777600000&end=1782864000000" \
  -H "X-API-Key: $OXARCHIVE_API_KEY"

Pair that with /orders/{symbol}/history for wallet-attributed order state and /orderbook/{symbol}/l4 for book state from the nearest checkpoint at or before the label time. Same five routes on every rerun.

What native can't give a taxonomy

Hyperliquid's API answers what happened on one trade, not what the book looked like when a wallet's order sat in queue three weeks ago. That is the same gap the comparison tests directly, so building a taxonomy against native means reconstructing that context from scratch every time.

Pull /wallets/classify for the same date a week apart and diff the wallet metrics yourself; create a free account to try it. If the numbers match and only request_id changes, the precomputed classification holds without touching the raw routes at all.