Xere AI Platform

Advanced Multi-Pipeline AI Chat Assistant

🎯 Mission Statement

Xere AI, my pet project developed under eklypse, delivers AI assistance built for something more useful than writing bad haikus: real personal and research work. It's more than just another chatbot; think personal research buddy meets brainy sidekick. With transparent multi-stage reasoning (no black-box mumbo jumbo), real-time data plugged in, and security solid enough to let a lawyer sleep at night, Xere AI aims to deliver reliable, citation-backed insights.

Whether it's helping with independent legal digging, breaking down business strategy, or just stress-testing wild ideas, the mission is simple: keep improving the platform while poking at the frontier of agentic RAG, so that eventually it won't just help with research; it'll run autonomous research workflows on its own (without asking for coffee breaks), like a tireless digital colleague.

Built for Real Work • Transparent by Design • Pushing Toward Agentic RAG

🚀 Platform Overview

Xere is a sophisticated AI assistant platform optimized for international business law research and strategic business analysis. It is built on professional-grade infrastructure with advanced security protocols, multi-stage reasoning pipelines, and real-time data integration from multiple sources.

  • 7 Specialized Domains
  • 192GB DDR5 ECC RAM
  • 5 Tone Options

🔧 Complete LLM Pipeline Architecture

🔍 Ask Anything

Max Tokens: 8,000

Word Limit: ~2,600 words

openai/gpt-oss-120b

💹 Financial Markets Dashboard

Type: Dedicated Dashboard (opens in new tab)

Data Sources: Polygon.io & Alpha Vantage

Max Tokens: 5,000 per analysis

AI Analysis: deepseek-ai/DeepSeek-V3.1 (5k tokens)

* Institutional-quality financial analysis for stocks, portfolios, and market sentiment

🎨 Creative Writing

Max Tokens: 41,000

Word Limit: ~13,600 words

Stage 1: GLM-4.5-Air-FP8 (6k)
Stage 2: meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 (35k)

✨ Image Generation: FLUX.1.1-pro

⚖️ Legal Research

Max Tokens: 37,000

Word Limit: ~12,300 words

Stage 1: GLM-4.5-Air-FP8 (6k)
Stage 2: deepcogito/cogito-v2-preview-llama-109B-MoE (18k)
Stage 3: Qwen/Qwen3-Next-80B-A3B-Thinking (13k)

💻 Technical & Engineering

Max Tokens: 55,000 (39k base + 16k code gen)

Word Limit: ~18,300 words (max with Stage 4)

Stage 1: GLM-4.5-Air-FP8 (6k) — Intelligent Dispatching
Stage 2: Qwen3-Next-80B-A3B-Thinking (16k) — Deep Reasoning
Stage 3: openai/gpt-oss-120b (17k) — Final Synthesis
Stage 4: Qwen3-Coder-480B-A35B (16k) — Code Generation

* Stage 4 auto-triggers when code implementation is requested

🧩 Consultant Console

Max Tokens: 46,000

Word Limit: ~15,300 words

Stage 1: GLM-4.5-Air-FP8 (6k)
Stage 2: openai/gpt-oss-120b (25k)
Stage 3: deepcogito/cogito-v2-preview-llama-109B-MoE (15k)

🔬 Deep Research

Max Tokens: 71,000

Word Limit: ~23,500 words

Stage 1: GLM-4.5-Air-FP8 (6k) — Intelligent Dispatching
Stage 2: Qwen/Qwen3-235B-A22B-Thinking-2507 (45k)
Stage 3: deepcogito/cogito-v2-preview-llama-109B-MoE (20k)

🧬 Project Synapse

Project Synapse is an advanced intelligent analysis and code generation platform featuring adaptive complexity detection, multi-stage reasoning pipelines, and comprehensive research capabilities. It automatically routes queries to optimized processing pipelines based on complexity, ensuring efficient use of resources while maintaining high-quality outputs.

✨ Key Features

  • ⚡ Enhanced code generation (75K tokens total, up to 10min timeouts)
  • 🔍 Real-time web search verification (ALL complexity levels: 5/7/10 results)
  • 📚 Interactive OSCOLA citations with bidirectional navigation
  • 🗃️ RAG citation metadata with document snippet previews
  • 🎓🏛️ Academic (.edu) and government (.gov) domain detection
  • 📖 ArXiv auto-download to Grand Library

📊 Analysis Pipeline

GLM-4.5-Air-FP8 classifies queries as Low/Medium/High complexity and routes to optimized multi-stage pipelines.

🟢 Low Complexity (0.0-3.0)

Stages: 1

Best for: Simple factual questions, definitions

Stage 1: Qwen3-235B-Thinking (12k tokens) — Direct Answer with Reasoning

* Web search: 5 results

🟡 Medium Complexity (3.1-6.0)

Stages: 3

Best for: Multi-step explanations, comparisons

Stage 1: DeepSeek-V3 (18k tokens) — Deep Analysis
Stage 2: Qwen3-235B-Thinking (20k tokens) — Reasoning & Synthesis
Stage 3: Llama-405B (18k tokens) — Quality Assurance

* Web search: 7 results

🔴 High Complexity (6.1-10.0)

Stages: 4

Best for: Deep analysis, novel problem-solving

Stage 1: DeepSeek-V3 or DeepSeek-R1 (25k tokens) — Deep Analysis with Optional Reasoning
Stage 2: Llama-405B (25k tokens) — Fact-Checking & Verification
Stage 3: DeepSeek-V3 (25k tokens) — Synthesis
Stage 4: Qwen3-235B-Thinking (25k tokens) — Quality Assurance & Final Polish

* Web search: 10 results

Stage 1 Adaptive Selection:

  • DeepSeek-R1 (reasoning score ≥ 9) — Advanced chain-of-thought reasoning for novel problems requiring step-by-step logic (temp: 0.6)
  • DeepSeek-V3 (reasoning score < 9) — Standard deep analysis with efficient inference for complex queries (temp: 0.6)

* DeepSeek-R1 automatically selected for queries requiring novel problem-solving, mathematical proofs, or multi-step logical reasoning
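The tiered routing and adaptive Stage 1 selection described above can be sketched as a small dispatcher function. This is an illustrative reconstruction from the thresholds listed in this section (function and field names are assumptions, not the platform's actual code):

```python
def route_pipeline(complexity: float, reasoning: float) -> dict:
    """Map a dispatcher complexity score (0-10) to a pipeline tier.

    Low (0.0-3.0): 1 stage, 5 web results.
    Medium (3.1-6.0): 3 stages, 7 web results.
    High (6.1-10.0): 4 stages, 10 web results; Stage 1 switches to
    DeepSeek-R1 only when the reasoning dimension scores >= 9.
    """
    if complexity <= 3.0:
        return {"tier": "low", "stages": 1, "web_results": 5}
    if complexity <= 6.0:
        return {"tier": "medium", "stages": 3, "web_results": 7}
    stage1 = "DeepSeek-R1" if reasoning >= 9 else "DeepSeek-V3"
    return {"tier": "high", "stages": 4, "web_results": 10, "stage1": stage1}
```

A query scoring 8.0 on complexity but only 5 on reasoning would still run the 4-stage High pipeline, just with DeepSeek-V3 in Stage 1.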

💻 Code Generator Pipeline

2-stage pipeline: Planning → Parallel code & test generation. Token limits have been increased by 76% and timeouts extended to handle complex systems.

🔧 Code Generation Workflow

Stages: 2 (Planning + Parallel Generation)

Output: Code + Tests + Optional Docs

Total Capacity: 75K tokens (15K+30K+20K+10K)

Stage 1: DeepSeek 671B MoE (15k tokens, 5min) — Planning with Thinking Mode
Stage 2: Qwen Coder 480B (30k + 20k + 10k tokens) — Parallel Code & Test & Docs Generation

⚡ Parallel Execution: Code (30k, 10min), Tests (20k, 6.7min), Docs (10k, 5min) generated simultaneously
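The parallel Stage 2 fan-out above (code, tests, and docs generated simultaneously, each with its own token budget and timeout) can be sketched with `asyncio`. The `generate` stub stands in for a real model call; names and budgets mirror the figures listed, but this is an assumption-laden sketch, not the platform's implementation:

```python
import asyncio


async def generate(kind: str, max_tokens: int) -> str:
    # Placeholder for a streaming model call honoring a token budget.
    await asyncio.sleep(0)
    return f"{kind}:{max_tokens}"


async def stage2_parallel() -> list[str]:
    # Code (30k, 10min), tests (20k, 6.7min), and docs (10k, 5min)
    # run concurrently; each job gets its own timeout so one slow
    # branch cannot stall the others indefinitely.
    jobs = [
        asyncio.wait_for(generate("code", 30_000), timeout=600),
        asyncio.wait_for(generate("tests", 20_000), timeout=400),
        asyncio.wait_for(generate("docs", 10_000), timeout=300),
    ]
    return await asyncio.gather(*jobs)
```

`asyncio.gather` preserves submission order, so the caller can unpack `code, tests, docs = await stage2_parallel()` regardless of which branch finishes first.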

📋 Code Generator Features

Languages: Python, JavaScript, VBA, TypeScript, Java, C#, and more

Includes: Error handling, validation, best practices

  • ✓ Production-quality code with inline comments
  • ✓ Comprehensive unit & integration tests
  • ✓ API integrations (REST, GraphQL, OAuth)
  • ✓ Excel automation (VBA, Office.js)
  • ✓ Database operations (SQL, NoSQL)

📚 Research Mode

4-stage pipeline: Planning → Research → Synthesis → QA. Includes RAG, web search, legal APIs, ArXiv auto-download, and optional agentic deep research.

📖 Research Paper Generation

Stages: 4 main + optional agentic (Stage 2b)

Word Counts: 3k, 5k, 10k, 15k, 20k, 25k, 30k

Models: 4 unique (DeepSeek V3, Qwen 235B Thinking, Qwen 7B, Llama 405B)

Stage 1: DeepSeek V3 (685B MoE) — Outline Planning
Stage 2: Qwen 235B Thinking — RAG + Web + Legal API Research with Deep Reasoning
Stage 2b (Optional): Qwen 7B — Enhanced Agentic Research
• Gap Analysis & Categorization (5 types)
• Multi-API Routing (17 specialty APIs)
• Confidence Scoring & Contradiction Detection
• Iterative Gap Filling (up to 3 iterations)
Stage 3: Llama 405B — Incremental Synthesis with Citations
Stage 4: DeepSeek V3 (685B MoE) — Quality Assurance & Citation Validation
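Stage 2b's iterative gap filling (score coverage per topic, re-query the weak spots, stop when everything clears a confidence bar or the iteration budget runs out) can be sketched as a simple loop. The threshold, the `fetch` callback, and all names here are illustrative assumptions:

```python
def fill_gaps(coverage: dict[str, float], fetch,
              threshold: float = 0.7, max_iterations: int = 3) -> dict[str, float]:
    """Re-query low-confidence topics until every score clears the
    threshold or the iteration budget (up to 3 passes) is spent.

    `fetch(topic)` stands in for a routed multi-API lookup returning a
    new confidence score; we keep the better of old and new.
    """
    scores = dict(coverage)
    for _ in range(max_iterations):
        gaps = [topic for topic, s in scores.items() if s < threshold]
        if not gaps:  # all topics sufficiently covered; stop early
            break
        for topic in gaps:
            scores[topic] = max(scores[topic], fetch(topic))
    return scores
```

In practice the `fetch` step is where gap categorization would route each topic to the most relevant of the 17 specialty APIs.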

🔍 Research Mode Features

Data Sources: RAG + 17 Specialty APIs

Export: PDF, DOCX, Markdown

  • ✓ Enhanced agentic research with gap categorization
  • ✓ Multi-API integration (Tavily, Brave, ArXiv, FRED, World Bank, etc.)
  • ✓ Coverage analysis & confidence scoring
  • ✓ Contradiction detection across sources
  • ✓ Multi-jurisdiction legal APIs (US, EU, UK)
  • ✓ OSCOLA citation formatting (mandatory)
  • ✓ Real-time progress tracking

🎯 Intelligent Features

  • 🧠 Adaptive Complexity Detection: GLM-4.5-Air-FP8 analyzes queries on 7 dimensions (depth, breadth, reasoning, ambiguity, expertise, temporal, stakes)
  • 📚 Complete OSCOLA Citation System (NEW): ALL sources return OSCOLA-formatted citations - RAG ([GL-N]), Web ([WEB-N]), Government ([GOV-N]), ArXiv ([ARXIV-N]) - with mandatory Oxford Standard formatting
  • 🔍 Universal Web Search Verification: Real-time Tavily web search for ALL complexity levels (5/7/10 results for low/medium/high) with query truncation for 400-char API limits
  • 🗃️ RAG Citation Metadata: Document snippet previews and source tracking with collection name, OSCOLA citation, jurisdiction, and authority scores
  • Fast-Path Bypass: Trivial queries skip analysis for Perplexity-level speed
  • 🔄 Fallback Systems: GLM-4.5-Air → Qwen3-80B heuristics when dispatcher unavailable
  • 🌍 Multi-Jurisdiction Legal APIs (Analysis & Research): US (CourtListener, Congress.gov, Regulations.gov, eCFR, GovInfo), EU (EUR-Lex SPARQL), UK (Find Case Law, legislation.gov.uk) with OSCOLA citations
  • 📖 ArXiv Auto-Download with Citations: Automatic paper retrieval with proper OSCOLA academic citations (Author et al, 'Title' (Year) arXiv:ID)
  • Self-Calibration: Optional calibrator for continuous improvement of complexity classification
  • 🤖 Adaptive Reasoning: DeepSeek-R1 automatically selected for queries with reasoning score ≥9 requiring chain-of-thought logic
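The four citation marker families above ([GL-N], [WEB-N], [GOV-N], [ARXIV-N]) follow a uniform pattern, so extracting them from generated text is a one-regex job. A minimal sketch (the function name is an assumption):

```python
import re

# Matches the four marker families: [GL-3], [WEB-12], [GOV-1], [ARXIV-7]
CITATION = re.compile(r"\[(GL|WEB|GOV|ARXIV)-(\d+)\]")


def extract_citations(text: str) -> list[tuple[str, int]]:
    """Return (source_type, index) pairs in order of appearance,
    e.g. for building the bidirectional navigation links."""
    return [(m.group(1), int(m.group(2))) for m in CITATION.finditer(text)]
```

Keeping the marker grammar this regular is what makes the interactive, bidirectional citation navigation cheap to implement on the front end.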

📊 Token Allocation Analytics

Token Efficiency with GLM-4.5-Air-FP8: Stage 1 dispatching now uses GLM-4.5-Air-FP8 (6K tokens) across all specialty modes, reducing token usage by 14-50% compared to previous arcee-ai models (7-12K tokens) while maintaining superior semantic understanding. The pipelines maintain a ~0.31-0.33 word/token ratio, optimizing for maximum content while reserving buffers for 17-API intelligent routing, web search context integration, and academic referencing.
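The per-mode word limits quoted throughout this document follow directly from that ~0.31-0.33 words/token ratio. A trivial sketch of the arithmetic (the exact ratio varies slightly per mode):

```python
def word_budget(max_tokens: int, words_per_token: float = 0.331) -> int:
    """Rough word capacity for a given token budget at the pipelines'
    observed ~0.31-0.33 words/token ratio."""
    return round(max_tokens * words_per_token)
```

For example, Deep Research's 71,000-token budget lands near the ~23,500-word limit listed above, and Ask Anything's 8,000 tokens near ~2,600 words.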

🌐 Active Data Integrations by Specialty

🤖 GLM-4.5-Air-FP8 Intelligent Routing: All specialty modes now use semantic understanding to automatically select the most relevant APIs for each query. The badges below show primary APIs for each mode, but GLM can route to any of the 17 data sources based on query context.

Primary API access by specialty (GLM routes intelligently):

  • 🔍 Ask Anything: NewsAPI, Tavily, Wikipedia, Weather.gov, Polygon, World Bank, RAG
  • 💹 Financial Dashboard: Polygon.io, Alpha Vantage, FinancialModelingPrep (FMP), SEC EDGAR, FRED, World Bank, NewsAPI
  • 🎨 Creative: RAG, Tavily, Wikipedia, NASA, NewsAPI
  • ⚖️ Legal: CourtListener, UK Legislation, Congress.gov, SEC EDGAR, RAG, Tavily, NewsAPI, World Bank
  • 💻 Technical: RAG, Tavily, arXiv, Wikipedia, NASA, NewsAPI
  • 🧩 Consultant: FRED, World Bank, Polygon, FinancialModelingPrep (FMP), SEC EDGAR, RAG, NewsAPI, Tavily
  • 🔬 Research (Intelligent Orchestrator): RAG (always), arXiv, CourtListener, UK Legislation, Congress, FRED, World Bank, Polygon, FinancialModelingPrep (FMP), SEC EDGAR, AlphaVantage, NASA, Wikipedia, NewsAPI, Tavily, Brave

📈 Market Data Integration

Example Sources

  • 📊 Polygon.io - Primary market data provider
  • 📈 Alpha Vantage - Fallback & extended data
  • 🌡️ NOAA - Weather data
  • 📰 News APIs - Current events

Auto-Detection Features

  • ✅ Stock symbols (AAPL, TSLA, etc.)
  • ✅ Company tickers
  • ✅ Financial terms
  • ✅ Market indicators
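Ticker auto-detection of the kind listed above is usually a heuristic: uppercase runs of 1-5 letters, filtered against a stop-list of common all-caps words. A minimal sketch (the stop-list here is a tiny illustrative sample, not the platform's actual filter):

```python
import re

# Standalone runs of 1-5 uppercase letters look like tickers (AAPL, TSLA).
TICKER = re.compile(r"\b[A-Z]{1,5}\b")
# Common all-caps English words and acronyms that are not symbols.
STOP = {"A", "I", "AI", "THE", "AND", "OR", "CEO", "IPO", "USA"}


def detect_symbols(text: str) -> list[str]:
    """Return candidate stock symbols in order of appearance."""
    return [t for t in TICKER.findall(text) if t not in STOP]
```

A production detector would additionally validate candidates against the market-data provider's symbol directory before firing off a quote request.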

🏛️ Knowledge Base & RAG System

The Grand Library

A centralized document management system for uploading, organizing, and retrieving documents across personal, public, and legal collections. Upload PDFs, DOCX, TXT, Excel files with custom categories and tags for structured knowledge organization.

📄 Layout-Aware Visual PDF Processing

Advanced PDF processing powered by Llama-4-Scout vision model that preserves document structure:

  • Visual Understanding: Processes PDFs as images to capture tables, charts, diagrams, and complex layouts
  • Intelligent Region Detection: Identifies headers, paragraphs, captions, and maintains reading order
  • Structure Preservation: Extracts content while preserving document hierarchy and relationships

📚 Auto-Fetch ArXiv Papers

Research & Technical modes automatically download relevant ArXiv papers to the Grand Library when academic sources are cited:

  • Automatic Detection: Identifies ArXiv paper references in web search results
  • Background Download: Fetches PDFs asynchronously without blocking your query
  • Visual Processing: Applies Llama-4-Scout layout-aware extraction to scientific papers
  • RAG Integration: Papers become searchable in Grand Library for future queries
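The "automatic detection" step above hinges on spotting arXiv identifiers in search-result snippets. A sketch for new-style IDs (function name and the decision to deduplicate are assumptions; old-style `math.GT/0309136` IDs are omitted for brevity):

```python
import re

# New-style arXiv identifiers, e.g. arXiv:2403.12345 or arXiv:2403.12345v2;
# the version suffix is matched but dropped so revisions deduplicate.
ARXIV_ID = re.compile(r"arXiv:(\d{4}\.\d{4,5})(?:v\d+)?", re.IGNORECASE)


def find_arxiv_ids(snippets: list[str]) -> list[str]:
    """Collect unique arXiv IDs (first-seen order) to queue for
    background PDF download into the Grand Library."""
    seen: list[str] = []
    for text in snippets:
        for m in ARXIV_ID.finditer(text):
            if m.group(1) not in seen:
                seen.append(m.group(1))
    return seen
```

Each collected ID would then be fetched asynchronously, so the user's query response is never blocked on a PDF download.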

Agentic RAG Pipeline

The 3-stage Retrieval-Augmented Generation (RAG) system intelligently searches your documents and provides citation-backed answers:

Stage 1: Query Analysis - Extracts key entities, intent, and search parameters from your question
Stage 2: Hybrid Search & Reranking - Vector similarity + keyword matching with title-aware scoring using mixedbread-ai reranker
Stage 3: Answer Generation - Synthesizes retrieved content into coherent answers with clickable [GL-n] citations
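Stage 2's hybrid scoring (vector similarity blended with keyword matching, plus title-aware boosting) can be sketched as a weighted combination. The weights and the flat title boost here are illustrative assumptions, not the platform's actual tuning:

```python
def hybrid_score(vector_sim: float, keyword_score: float,
                 title_match: bool, alpha: float = 0.7) -> float:
    """Blend vector similarity with keyword score, then add a small
    flat boost when query terms also hit the document title."""
    base = alpha * vector_sim + (1 - alpha) * keyword_score
    return base + (0.1 if title_match else 0.0)
```

In the real pipeline this pre-score only shortlists candidates; the mixedbread-ai reranker then re-orders the shortlist before answer generation.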

🤖 RAG Only Mode - Advanced Agentic Pipeline

A sophisticated 3-stage agentic pipeline powered by LangGraph orchestration for intelligent document retrieval and synthesis. Uses pdfminer.six for layout detection and Llama-4-Scout vision model for visual processing of tables, charts, and figures in PDFs:

Stage 1: Dispatcher

Model: zai-org/GLM-4.5-Air-FP8
Role: Analyzes query intent and extracts search parameters for document retrieval. May use APIs for verification of RAG-retrieved data only.
Function: Query understanding, RAG tool parameter optimization, and optional API verification
Constraint: APIs used only for factual validation, not new discovery

Stage 2: Analyzer

Model: deepcogito/cogito-v2-preview-deepseek-671b
Role: Evaluates quality and sufficiency of retrieved results, calculates confidence scores, determines if supplementation or self-correction is needed
Function: Quality assurance and intelligent gap detection

Stage 3: Synthesizer

Model: openai/gpt-oss-120b
Role: Creates comprehensive final answer from all sources with OSCOLA-style citations
Function: Answer generation and citation formatting

Supporting Components:
  • 📊 Embeddings: togethercomputer/m2-bert-80M-32k-retrieval (768 dimensions)
  • 🎯 Reranker: mixedbread-ai/mxbai-rerank-large-v2
  • 💾 Vector DB: Qdrant
  • 🔄 Orchestration: LangGraph (conditional routing & self-correction loops)
  • 📄 PDF Processing: pdfminer.six (layout detection) + Llama-4-Scout (visual analysis of tables/charts)
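The conditional routing and self-correction loop that LangGraph orchestrates across the three stages can be shown in plain Python (deliberately framework-free; the confidence threshold, retry budget, and query-broadening step are illustrative assumptions):

```python
def agentic_rag(query: str, retrieve, analyze, synthesize,
                max_retries: int = 2) -> str:
    """Dispatcher -> Analyzer -> Synthesizer with a self-correction loop.

    `retrieve`, `analyze`, and `synthesize` stand in for the three
    model-backed stages; `analyze` returns a confidence score in [0, 1].
    """
    docs = retrieve(query)
    for _ in range(max_retries):
        if analyze(query, docs) >= 0.7:  # Analyzer deems results sufficient
            break
        query = query + " (broadened)"   # self-correction: widen the search
        docs = retrieve(query)
    return synthesize(query, docs)
```

LangGraph expresses the same shape declaratively, as nodes with a conditional edge from the Analyzer either back to retrieval or forward to synthesis.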

✨ Integrated Features

🎯 5 Tone Options

Direct, Professional, Academic, Friendly, Creative

🔒 Professional-Grade Security

Input sanitization, rate limiting, threat detection

📚 Citation Support

OSCOLA (legal), APA (business/research)

📈 Market Intelligence

Polygon.io + Alpha Vantage

🧠 Memory System

Conversation continuity per chat session

👥 Multi-User Support

Tier-based access control system

🔄 Pipeline Processing Flow

🚀 Platform Capabilities

  • GLM-4.5-Air-FP8 intelligent dispatching - 106B MoE model with semantic API routing across 17 data sources
  • Specialized expertise across 7 domains - Each with optimized multi-stage pipelines
  • Multi-stage reasoning pipelines - Up to 4 stages (including code generation)
  • Market intelligence - Real-time data from Polygon.io, Alpha Vantage, FRED, World Bank
  • Legal & regulatory research - CourtListener, UK Legislation, Congress.gov, SEC EDGAR integration
  • Academic research - arXiv, NASA, Wikipedia with auto-download to Grand Library
  • Academic citation capabilities - OSCOLA (legal), APA (business/research)
  • Professional-grade security - Multi-tier authentication and rate limiting
  • Conversation memory - Session-based context retention with 128K context windows