An honest breakdown of every published score: what's credible, what's been questioned, and how MemPalace compares with competing memory systems.
LongMemEval is the standard academic benchmark for AI memory systems, introduced in the paper "LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory" (arXiv 2410.10813). It presents a system with a long conversation history, then asks questions that require retrieving specific facts from that history.
R@5 (Recall at 5) means: given a question, does the correct answer appear in the top 5 retrieved results? MemPalace raw mode scores 96.6% across 500 test questions — the highest published result for any system that requires no API key and no external service.
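
The R@5 definition above can be sketched in a few lines of Python. This is a minimal illustration, not the code in the benchmark script: the function names `hit_at_k` and `recall_at_k`, the session-ID strings, and the "any gold item in the top k counts as a hit" rule are all assumptions made for the example.

```python
from typing import List, Sequence, Set, Tuple


def hit_at_k(retrieved_ids: Sequence[str], gold_ids: Set[str], k: int = 5) -> bool:
    """Return True if any gold item appears among the top-k retrieved results."""
    return any(doc_id in gold_ids for doc_id in retrieved_ids[:k])


def recall_at_k(runs: List[Tuple[Sequence[str], Set[str]]], k: int = 5) -> float:
    """Fraction of questions whose top-k results contain a gold item."""
    if not runs:
        raise ValueError("need at least one question to score")
    hits = sum(hit_at_k(retrieved, gold, k) for retrieved, gold in runs)
    return hits / len(runs)


# Toy data (hypothetical): each entry is (ranked retrieved IDs, gold IDs).
runs = [
    (["s3", "s7", "s1"], {"s7"}),                    # gold at rank 2: hit
    (["s2", "s4", "s9", "s8", "s5", "s6"], {"s6"}),  # gold at rank 6: miss at k=5
    (["s1"], {"s1"}),                                # gold at rank 1: hit
    (["s5", "s2"], {"s9"}),                          # gold never retrieved: miss
]

print(recall_at_k(runs, k=5))   # 0.5
print(recall_at_k(runs, k=10))  # 0.75
```

Under this definition, 96.6% on 500 questions corresponds to 483 questions with a hit in the top 5. Raising `k` can only keep or increase the score, which is why results at different `k` values are not directly comparable.
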
benchmarks/longmemeval_bench.py.

fact_checker.py) exists but is not yet wired into knowledge graph operations. Being fixed in Issue #27.

| Benchmark | Mode | Score | API Calls | Status |
|---|---|---|---|---|
| LongMemEval R@5 | Raw (ChromaDB, zero API) | 96.6% | Zero | ✅ Independently reproduced |
| LongMemEval R@5 | Hybrid + Haiku rerank | 100% (500/500) | ~500 | ✅ Real · uses cloud LLM |
| LongMemEval R@5 | AAAK compression mode | 84.2% | Zero | ⚠️ Regresses vs raw mode |
| LongMemEval held-out | Raw, unseen questions | 98.4% | Zero | ✅ Shows generalization |
| LoCoMo R@10 | Raw, session level | 60.3% | Zero | ⚠️ top_k=50 methodology debated |
| Personal palace R@10 | Heuristic bench (internal) | 85% | Zero | 📊 Internal benchmark |
| Unfiltered search | All closets | 60.9% R@10 | Zero | 📊 Baseline for the +34-point claim |
| Wing+room filtered search | Metadata filtering | 94.8% R@10 | Zero | ✅ +34 points over unfiltered (94.8 vs 60.9) |

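
The filtering gain reported above is the difference between the two R@10 scores. The short sketch below works through the arithmetic to show that the gain is about 34 percentage points, which is not the same as a 34% relative improvement. The helper name `gain` is hypothetical; the two inputs are the scores from the table.

```python
from typing import Tuple


def gain(baseline_pct: float, improved_pct: float) -> Tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = improved_pct - baseline_pct
    relative = absolute / baseline_pct * 100.0
    return absolute, relative


# Scores taken from the table: unfiltered vs wing+room filtered R@10.
absolute, relative = gain(60.9, 94.8)
print(f"absolute gain: +{absolute:.1f} percentage points")  # +33.9
print(f"relative gain: +{relative:.1f}%")                   # +55.7%
```

Stating which of the two numbers is meant avoids ambiguity when comparing against other systems' reported improvements.
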
| System | LongMemEval R@5 | API Required | Monthly Cost | Local | Open Source |
|---|---|---|---|---|---|
| ⭐ MemPalace (hybrid) | 100% | Optional | Free | Always | MIT |
| ⭐ MemPalace (raw) | 96.6% | None | Free | Always | MIT |
| Supermemory ASMR | ~99% | Yes | Paid | No | No |
| Mastra | 94.87% | Yes (GPT) | API costs | No | Partial |
| Mem0 | ~85% | Yes | $19–249/mo | No | No |
| Zep (Graphiti) | ~85% | Yes | $25/mo+ | Enterprise only | No |