
Eryx
AI assistant platform featuring intelligent chat with real-time search, MCP integrations, and hierarchical context management for handling long conversations efficiently.
Timeline
Oct 2025 - Present
Role
Creator & Developer
Team
Solo
Key Challenges
- Implementing resumable streaming for unreliable network conditions
- Building hierarchical context management for long conversations
- Managing credit-based billing with Polar payments
- Integrating multiple MCP (Model Context Protocol) servers
Key Learnings
- Real-time streaming with backpressure handling
- Token budget allocation and context window optimization
- Designing for high-availability and fault tolerance
- Implementing async job processing with BullMQ
Overview
Eryx is an AI assistant platform that provides intelligent conversations with real-time web search capabilities. Built with Next.js 16 and Bun runtime, it features a sophisticated context management system that handles long conversations efficiently through hierarchical summarization and structured memory.
Features
Intelligent Chat with Real-time Search
- Web search via SearxNG when web mode is enabled
- AI-powered responses with configurable model selection (GPT-4.1, GPT-4o, GPT-5 series)
- Resumable streaming for unreliable network conditions
Hierarchical Context Management
- Layer 1: Recent 20 messages with full fidelity
- Layer 2: LLM-generated conversation summaries stored in PostgreSQL
- Layer 3: Structured key facts and topics extraction
- Redis caching with 7-day TTL for summaries
MCP (Model Context Protocol) Integration
- Connect multiple MCP servers as tools
- OAuth-based authentication for MCP apps
- Secure credential encryption with dedicated key management
Credit-based Billing
- Polar integration for merchant-of-record payments
- Multiple pricing tiers (Basic/Pro)
- Real-time credit balance tracking
- Usage-based model selection
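The usage-based model selection above can be sketched as a per-model credit cost with an atomic check-and-deduct. This is an illustrative sketch only: the credit prices, `deductCredits` helper, and plain-object account are assumptions (in production the deduction would run inside a database transaction alongside the Polar-tracked balance).

```typescript
// Hypothetical per-request costs in credits; not the platform's real rates.
const MODEL_COSTS: Record<string, number> = {
  "gpt-4o": 1,
  "gpt-4.1": 2,
  "gpt-5": 5,
};

// Check the balance and deduct in one step; a real implementation would
// wrap this in a DB transaction to avoid races between concurrent requests.
export function deductCredits(
  account: { balance: number },
  model: string
): { ok: boolean; balance: number } {
  const cost = MODEL_COSTS[model];
  if (cost === undefined) throw new Error(`Unknown model: ${model}`);
  if (account.balance < cost) return { ok: false, balance: account.balance };
  account.balance -= cost;
  return { ok: true, balance: account.balance };
}
```

Returning the post-deduction balance lets the client update its real-time balance display without a second round trip.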
Real-time Capabilities
- WebSocket-based chat synchronization
- Typing indicators and presence
- Notification system
- Customizable response styles
Architecture
┌─────────────────────────────────────────────────────────────────┐
│                        Client (Browser)                         │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                       Next.js App Router                        │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │ API Routes  │  │ Middleware  │  │     Rate Limiting       │  │
│  │  (Server)   │  │ (Security)  │  │    (Redis-based)        │  │
│  └─────────────┘  └─────────────┘  └─────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                                │
          ┌─────────────────────┼─────────────────────┐
          ▼                     ▼                     ▼
┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐
│     Services      │ │    AI Provider    │ │     External      │
│ (Business Logic)  │ │     (OpenAI)      │ │     Services      │
│ - Chat Service    │ │                   │ │ - Polar           │
│ - Credit Service  │ │                   │ │ - SearxNG         │
│ - RAG Service     │ │                   │ │ - MCP Servers     │
└────────┬──────────┘ └───────────────────┘ └───────────────────┘
         │
         ▼
┌───────────────────┐ ┌───────────────────┐
│       Redis       │ │    PostgreSQL     │
│  (Cache/PubSub)   │ │   (Prisma ORM)    │
└───────────────────┘ └───────────────────┘
Runtime: Bun (runs the Next.js server)
Tech Stack
- Frontend: Next.js 16.2.4, React 19.2.5, TypeScript
- Runtime: Bun
- Database: PostgreSQL + Prisma ORM
- Cache/PubSub: Redis (ioredis)
- Auth: Stack Auth
- AI: OpenAI via @ai-sdk/openai with custom provider mapping
- Payments: Polar (Merchant of Record)
- Search: SearxNG
- Queue: BullMQ for async job processing
- UI: shadcn/ui, Radix, Tailwind CSS
Key Technical Implementations
Resumable Streaming
Handles network interruptions gracefully by maintaining stream state in Redis. If a stream is interrupted, clients can resume from the last received position without losing context.
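The resume logic can be sketched as an append-only chunk buffer keyed by stream ID: the client reports the index of the last chunk it received and gets everything after it. This is a minimal in-memory sketch; the source says the real state lives in Redis (e.g. a list per stream), and the function names here are illustrative.

```typescript
// In-memory stand-in for the Redis-backed stream state.
type StreamStore = Map<string, string[]>;

// Append a chunk as it is produced; returns the index the client should ack.
export function appendChunk(
  store: StreamStore,
  streamId: string,
  chunk: string
): number {
  const buf = store.get(streamId) ?? [];
  buf.push(chunk);
  store.set(streamId, buf);
  return buf.length - 1;
}

// On reconnect, replay everything after the client's last acknowledged index,
// so no context is lost and nothing is sent twice.
export function resumeFrom(
  store: StreamStore,
  streamId: string,
  lastIndex: number
): string[] {
  return (store.get(streamId) ?? []).slice(lastIndex + 1);
}
```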
Hierarchical Context System
Instead of simple token truncation, the system uses LLM-generated summaries that preserve:
- Conversation overview
- Key topics discussed
- Important facts and decisions
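Assembling the three layers into a prompt might look like the sketch below. The 20-message recent window and the summary/key-facts layers come from the description above; the token estimator, budget handling, and function names are assumptions (a real implementation would use the model's tokenizer).

```typescript
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ContextLayers {
  summary?: string;    // Layer 2: LLM-generated conversation summary
  keyFacts?: string[]; // Layer 3: structured facts and topics
}

const RECENT_WINDOW = 20; // Layer 1: recent messages kept at full fidelity

// Rough estimate (~4 chars per token); illustrative only.
const estimateTokens = (s: string) => Math.ceil(s.length / 4);

export function buildContext(
  history: Message[],
  layers: ContextLayers,
  tokenBudget: number
): Message[] {
  // Compress older history into a compact preamble instead of truncating it.
  const preamble: string[] = [];
  if (layers.summary) preamble.push(`Conversation so far: ${layers.summary}`);
  if (layers.keyFacts?.length)
    preamble.push(`Key facts: ${layers.keyFacts.join("; ")}`);

  const context: Message[] = preamble.length
    ? [{ role: "system", content: preamble.join("\n") }]
    : [];

  // Fill the remaining budget with the most recent messages, newest kept last.
  let used = context.reduce((n, m) => n + estimateTokens(m.content), 0);
  const recent = history.slice(-RECENT_WINDOW);
  const fitted: Message[] = [];
  for (let i = recent.length - 1; i >= 0; i--) {
    const cost = estimateTokens(recent[i].content);
    if (used + cost > tokenBudget) break;
    used += cost;
    fitted.unshift(recent[i]);
  }
  return [...context, ...fitted];
}
```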
MCP Tool Integration
Users can connect MCP servers that expose tools. These tools are dynamically loaded and can execute code, query APIs, or perform other operations on behalf of the AI.
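The dynamic loading described above can be pictured as a single registry that collects tools discovered from each connected server and routes calls by name. The types and the `serverId/toolName` qualification scheme below are illustrative assumptions, not the MCP SDK's actual interfaces.

```typescript
// Simplified shape of a tool exposed by a connected MCP server.
interface McpTool {
  name: string;
  description: string;
  execute: (args: Record<string, unknown>) => Promise<unknown>;
}

export class ToolRegistry {
  private tools = new Map<string, McpTool>();

  // Called once per connected server after tool discovery; namespacing by
  // server ID avoids collisions between servers exposing same-named tools.
  register(serverId: string, tools: McpTool[]): void {
    for (const t of tools) this.tools.set(`${serverId}/${t.name}`, t);
  }

  list(): string[] {
    return [...this.tools.keys()];
  }

  // Route a model-requested tool call to the owning server's handler.
  async call(qualifiedName: string, args: Record<string, unknown>) {
    const tool = this.tools.get(qualifiedName);
    if (!tool) throw new Error(`Unknown tool: ${qualifiedName}`);
    return tool.execute(args);
  }
}
```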
Rate Limiting
Redis-based rate limiting with configurable limits per endpoint. Supports different tiers (chat, search, general) with appropriate cooldown periods.
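A fixed-window counter mirrors the Redis INCR-with-EXPIRE pattern this describes; the in-memory version below is a sketch for illustration, and the per-tier limits are hypothetical values, not the platform's configuration.

```typescript
interface Window {
  count: number;
  resetAt: number; // epoch ms when this window expires (the EXPIRE analogue)
}

export class RateLimiter {
  private windows = new Map<string, Window>();

  constructor(private limit: number, private windowMs: number) {}

  // Returns true if the request is allowed, false if rate-limited.
  // `now` is injectable for testing; defaults to the wall clock.
  allow(key: string, now: number = Date.now()): boolean {
    const w = this.windows.get(key);
    if (!w || now >= w.resetAt) {
      // New window: first request always passes (INCR on a fresh key).
      this.windows.set(key, { count: 1, resetAt: now + this.windowMs });
      return true;
    }
    if (w.count >= this.limit) return false;
    w.count++;
    return true;
  }
}

// Hypothetical per-tier limits, keyed the same way as the endpoint tiers.
export const limiters = {
  chat: new RateLimiter(30, 60_000),
  search: new RateLimiter(10, 60_000),
  general: new RateLimiter(100, 60_000),
};
```

In Redis the same behavior falls out of `INCR key` plus `EXPIRE key window` on first increment, which also makes the counter shared across server instances.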
Screenshots
Chat Interface · Search Mode · Account Settings · Customization · Feedback · Apps Integration (screenshot images not included in this text version)