first-5090-qwen3
- model
- Qwen3-35B-A3B
- quant
- Q4_K_XL
- GPU
- RTX 5090
- ctx
- 65 536
- VRAM
- 32 GB
- LoRA
- —
MTP n_max=2 is fastest. For long-context RAG, prefer SPEC=none. ctx=131072 is the VRAM ceiling here.
A contribution-based, real-time map of local LLM inference. If Arena maps models, luckrig maps the rigs.
Exact GPU, exact quantization, exact context length, with someone’s actual tuning notes — and your own prompt, right now. That’s the gap. luckrig is what fills it.
| Service | What you taste | How you pay | What you see about the hardware |
|---|---|---|---|
| HF Spaceshosted demos | An author’s wrapper around a model | Free / quota | Whatever the author chose to print |
| LMSys Arenamodel evaluation | A blind A/B between two models | Free | Model name. Nothing about the rig. |
| Vast.aiGPU rental marketplace | A bare GPU you rent and provision yourself | Money, per hour | Hardware specs — but no one’s tuning |
| AI Hordedistributed mutual aid | A response from any worker that fits | Kudos | Hardware is abstracted away |
| luckriginfra-first | A specific rig someone is running, with their prompt-tuning | Contribution, not money | GPU · quant · ctx · LoRA · tuning notes · fingerprint |
“If Arena is a map of model evaluation, luckrig is a real-time map of infrastructure evaluation.”
A tracker keeps a public list of nodes. Each node runs a thin proxy in front of ollama or llama.cpp. You earn access by contributing — a node, notes, or upstream patches.
A real, public list. Liveness, rarity score, exact spec, fingerprint. Sorted by what’s rare, not what’s loudest.
Queue → response → local replay file you keep on disk. Real SSE in plain mode; OpenAI-compatible — your vanilla client works.
Register a node, write a tuning note, upload a timing measurement. Access scales with contribution score — a homage to Hotline Connect.
Other contribution networks abstract the worker away. luckrig does the opposite — environment metadata, tuning notes, a specific rig name, and rarity-based ordering. Showcase, not leaderboard.
MTP n_max=2 is fastest. For long-context RAG, prefer SPEC=none. ctx=131072 is the VRAM ceiling here.
Reference frame for unified-memory builds. Evaluated with the memory config, not just the GPU label.
Lowest-spec slot. The value here is not speed — it’s “it runs on this”. 2.3 tok/s p50.
Rare configurations surface before popular ones. The default order is a showcase of breadth, not a leaderboard of speed.
All performance numbers come from client-side measurement. A node never self-reports its throughput.
Same GPU + quant? The config is yours to copy. Taste, then build.
Tracker, tokens, node proxy, subtext (optional), pseudo SSE, replay save — all wired together using only the Node.js standard library. No npm tree.
This is the part most landing pages skip. We’d rather lose the reader who was here for marketing copy than mislead the reader who wasn’t.
plain mode is real SSE over TLS, plus the node proxy’s “don’t write plaintext to logs” convention. Nothing more, nothing dressed up as more.
A node operator who instruments the process internals can still see plaintext. Don’t send secrets, personal data, or anything you’re obligated to protect.
1) Local regex in the node proxy.
2) External moderation hook (LUCKRIG_MODERATION_ENDPOINT), fails closed on input.
3) Tracker-side Notice-and-Takedown — no auto-bans.
CSAM, terror / mass-violence support, and other illegal content are out.
POST /api/abuse/report is rate-limited; operator review precedes any ban.
luckrig is a concept and a working POC, in the open. Read the spec, run the POC locally, decide for yourself.