What is TRIAD?
TRIAD is a multi-agent deliberation tool. You ask a question and up to three AI systems, drawn from Claude, ChatGPT, Grok, Perplexity, and Gemini, respond independently, without seeing each other's answers first. A fourth AI acts as a referee, synthesizing the responses and scoring where the agents genuinely disagree. The goal isn't to get one clean answer. It's to surface the tensions, assumptions, and blind spots that a single AI would smooth over.
Why would I want three AIs instead of one?
Each AI system is trained differently, on different data, with different fine-tuning philosophies. When they agree, that convergence is meaningful evidence. When they disagree, that divergence is even more useful — it tells you where the question is genuinely contested, where the answer depends on an assumption you haven't stated, or where your framing is doing more work than the evidence can support. A single AI gives you confidence. Three AIs give you calibration.
Why doesn't TRIAD try to reach consensus?
Because premature consensus is how useful disagreement gets buried. Most AI systems are trained to be agreeable and synthesize toward a harmonious answer. TRIAD is designed to resist that. The referee synthesizes, but its job is to identify what the disagreement means — not to paper over it. If two agents fundamentally differ on a question, the right response is to show you exactly where they diverge and why, so you can decide which framing fits your situation. You supply the ground truth.
What API keys do I need and how do I get them?
TRIAD connects directly to each AI's API from your browser — your keys stay on your device and are never sent to any TRIAD server. You can use any combination of agents; you don't need all five.
• Claude (Anthropic): console.anthropic.com → Settings → API Keys
• ChatGPT (OpenAI): platform.openai.com → API Keys
• Grok (xAI): console.x.ai → API Keys
• Perplexity: perplexity.ai → Settings → API
• Gemini (Google): aistudio.google.com → Get API Key
Each key has its own format — TRIAD validates the format as you type. Keys can be saved to your browser's local storage (survives refresh), or exported to an encrypted .triad settings file that also saves your preferences. Manage keys via ⚙ SETTINGS → API KEYS and preferences via ⚙ SETTINGS → PREFERENCES.
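To make the key handling concrete, here is a minimal TypeScript sketch of format validation and local persistence. The regex prefixes are common public key formats and the storage key names are illustrative assumptions, not TRIAD's actual rules:

```typescript
// Illustrative key-format validators, one per provider. The prefixes shown
// are widely used public formats; TRIAD's real validation may differ.
const KEY_PATTERNS: Record<string, RegExp> = {
  claude: /^sk-ant-[A-Za-z0-9_-]{20,}$/,  // Anthropic
  chatgpt: /^sk-[A-Za-z0-9_-]{20,}$/,     // OpenAI
  grok: /^xai-[A-Za-z0-9_-]{20,}$/,       // xAI
  perplexity: /^pplx-[A-Za-z0-9]{20,}$/,  // Perplexity
  gemini: /^AIza[A-Za-z0-9_-]{30,}$/,     // Google AI Studio
};

function isPlausibleKey(agent: string, key: string): boolean {
  const pattern = KEY_PATTERNS[agent];
  return pattern ? pattern.test(key.trim()) : false;
}

// Keys never leave the browser: persistence is plain localStorage.
function saveKey(agent: string, key: string): void {
  localStorage.setItem(`triad.key.${agent}`, key);
}
```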
What is the Referee?
The referee is the AI that synthesizes after all active agents have responded. It reads all agent responses, scores how much the agents diverged on six dimensions (0–100), writes the synthesis, and generates the clarifying questions. The referee doesn't participate in Phase A — it only judges. Any of the five agents can serve as referee.
Recommended referees: Claude produces the most reliable synthesis and the most consistent struct compliance. ChatGPT is the default and works well for most sessions. Grok is a capable referee but produces firmer, less nuanced prose. Gemini and Perplexity are weaker referees; they tend toward generic synthesis and may miss struct-level details. The referee selector is in the header on desktop, or in the drawer menu on mobile.
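In sketch form, the referee's job reduces to producing three artifacts. The TypeScript shape below is hypothetical; the score range, synthesis length, and question count come from the descriptions in this FAQ:

```typescript
// Hypothetical shape for what the referee produces after Phase A.
interface RefereeOutput {
  divergeScores: Record<string, number>; // one 0–100 score per structured dimension
  synthesis: string;                     // the 8–12 sentence synthesis
  clarifyingQuestions: string[];         // four questions only the user can answer
}
```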
What is Phase A?
Phase A is the independent response round. All active agents receive your question simultaneously and respond without seeing each other. This isolation is intentional — it prevents the anchoring effect where the first response shapes all subsequent ones. Each Phase A response includes a structured summary panel (Action, Assumption, Posture, Blocking, Vulnerable, and ONE-FACT REVERSAL) followed by full prose analysis.
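As a sketch of what "simultaneously and without seeing each other" means mechanically: Phase A is a parallel fan-out where each call carries only your question. The callAgent helper and type names below are hypothetical:

```typescript
// Minimal sketch of the Phase A fan-out. callAgent is a hypothetical
// wrapper around one provider's API; each call carries only the user's
// question, so no agent sees another's output.
type AgentName = "claude" | "chatgpt" | "grok" | "perplexity" | "gemini";

async function runPhaseA(
  question: string,
  agents: AgentName[],
  callAgent: (agent: AgentName, prompt: string) => Promise<string>,
): Promise<Map<AgentName, string>> {
  // Parallel, independent calls: the isolation that prevents anchoring.
  const replies = await Promise.all(
    agents.map((agent) => callAgent(agent, question)),
  );
  const entries: [AgentName, string][] = agents.map((agent, i) => [agent, replies[i]]);
  return new Map(entries);
}
```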
What are the structured fields — Action, Assumption, Posture, Blocking, Vulnerable, One-Fact Reversal?
Each agent extracts six structured fields from its own reasoning before writing prose:
• ACTION — what the agent recommends you actually do, in 5–7 words
• ASSUMPTION — the single most load-bearing premise behind its conclusion
• POSTURE — one of four stances: ACT NOW, GATHER DATA, DEFER, or REFRAME
• BLOCKING — what would prevent the recommendation from working
• VULNERABLE — the weakest structural point in the agent's own argument
• ONE-FACT REVERSAL — one specific measurable finding that would materially change the agent's understanding, even if the recommendation stayed the same (metric + threshold + direction)
These fields power the Structural Comparison table. They force agents to be explicit about what they're actually claiming rather than hiding assumptions inside fluent prose.
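A hypothetical TypeScript shape makes the six fields concrete. The names mirror the panel labels; the exact format TRIAD uses internally is its own:

```typescript
// Hypothetical shape for the struct each agent emits before its prose.
type Posture = "ACT NOW" | "GATHER DATA" | "DEFER" | "REFRAME";

interface AgentStruct {
  action: string;          // the recommended move, in 5–7 words
  assumption: string;      // the single most load-bearing premise
  posture: Posture;        // one of the four fixed stances
  blocking: string;        // what would prevent the recommendation from working
  vulnerable: string;      // weakest structural point in the agent's own argument
  oneFactReversal: string; // metric + threshold + direction that would change its view
}
```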
What is the Structural Comparison table?
The table maps all active agents side by side across six structured dimensions, with a DIVERGE score (0–100) for each row assigned by the referee. A score near 0 means agents are saying the same thing. A score near 100 means their positions are genuinely incompatible.
Three rows are visually emphasized as primary signal — ASSUMPTION, VULNERABLE, and ONE-FACT REVERSAL — because these consistently carry the most discriminating information. ACTION and POSTURE are de-emphasized by default but auto-upgrade to full weight if their DIVERGE score exceeds 60.
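The emphasis rule is simple enough to state as code. The sketch below assumes the upgrade applies to any non-primary row; the text above names ACTION and POSTURE explicitly, so BLOCKING's default weight is an assumption here:

```typescript
// Sketch of the row-emphasis rule. Primary rows always render at full
// weight; de-emphasized rows are promoted once their referee-assigned
// DIVERGE score exceeds 60.
const PRIMARY_ROWS = new Set(["ASSUMPTION", "VULNERABLE", "ONE-FACT REVERSAL"]);
const UPGRADE_THRESHOLD = 60;

function rowWeight(row: string, divergeScore: number): "full" | "muted" {
  if (PRIMARY_ROWS.has(row)) return "full";
  return divergeScore > UPGRADE_THRESHOLD ? "full" : "muted";
}
```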
Below the table, a BEST NEXT CROSS-EXAMINE MODE block appears, recommending which cross-examine mode would best address the hottest rows. After cross-examination, this becomes BEST NEXT STEP — which may recommend another mode, or may show ⬡ GATHER ONE FACT (when the remaining disagreement is a missing external datum, not a reasoning gap) or ✓ DECISION-READY (when cross-examination has resolved the main structural disagreements).
What is the Synthesis?
After Phase A, the referee writes an 8–12 sentence synthesis identifying the strongest areas of overlap, the most significant genuine disagreement, which agent's framing was most useful, what assumptions all agents shared that may deserve scrutiny, and what remains genuinely unresolved. It ends with a NEXT ACTION sentence — the single most immediate step you can take. The synthesis is not a consensus — it's a map of the deliberation.
What are the Clarifying Questions?
The referee generates four questions after synthesis — questions that only you can answer, tied to concrete decisions the deliberation couldn't resolve. These are designed to collapse the remaining uncertainty by surfacing things the agents had to guess at: your time horizon, your risk tolerance, what you're actually optimizing for. You can select questions to include in your next message so the agents can incorporate your answers in the next turn.
The question format adapts to the recommendation state. When the system recommends GATHER ONE FACT, the questions become a structured fact-gathering block: the primary fact to retrieve, an acceptable proxy if the exact fact is unavailable, why that fact unlocks the decision, and what to do after you have it. When the system shows DECISION-READY, the questions become decision-check questions targeting specific commitment thresholds rather than open-ended reflection.
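A hypothetical shape for the fact-gathering block, with one field per element described above (names are illustrative):

```typescript
// Hypothetical shape for the GATHER ONE FACT block.
interface GatherOneFact {
  primaryFact: string;     // the exact datum to retrieve
  acceptableProxy: string; // fallback if the exact fact is unavailable
  whyItUnlocks: string;    // why this fact collapses the remaining disagreement
  afterwards: string;      // what to do once the fact is in hand
}
```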
What is Cross-Examine and when should I use it?
Cross-Examine runs a second round where each agent reads all the other agents' Phase A responses and critiques them, and also identifies the weakest point in its own argument. This is where the most valuable disagreement surfaces. Use Cross-Examine whenever the DIVERGE scores are elevated: high divergence in Phase A means agents are operating from genuinely different premises, and Cross-Examine forces each one to articulate exactly where and why they differ. Every mode ends with a Critical Question (one question only the user can answer) that moves the deliberation forward.
There are four cross-examination modes, selectable in ⚙ SETTINGS → PREFERENCES; a sketch of how the hottest row maps to a mode follows the list:
• Standard — the broadest lens. Surfaces each agent's foundational approach, divergent conclusions, why the difference persists, and their most vulnerable point.
• Deconstruct — isolates the single load-bearing assumption each argument depends on, then forces agents to state what their conclusion becomes if that assumption fails. Most useful when ASSUMPTION or VULNERABLE rows score high.
• Burden — agents name the type of claim they're making, declare the appropriate burden of proof, and identify what evidence would materially weaken their position. Most useful when ONE-FACT REVERSAL scores high — agents are using different standards of proof.
• Ground — agents identify the specific stakeholder or constraint-holder, name the real-world operational blocker, and find the minimum common ground that makes action possible. Most useful when BLOCKING scores high.
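The selection logic implied by the list reads the hottest DIVERGE row and maps it to the mode that targets it. This mapping is a plain reading of the mode descriptions above, not TRIAD's actual recommender:

```typescript
// Sketch: pick the cross-examine mode that targets the hottest row.
type Mode = "Standard" | "Deconstruct" | "Burden" | "Ground";

function suggestMode(scores: Record<string, number>): Mode {
  // Find the row with the highest DIVERGE score.
  const hottest = Object.entries(scores)
    .sort(([, a], [, b]) => b - a)[0]?.[0];
  switch (hottest) {
    case "ASSUMPTION":
    case "VULNERABLE":
      return "Deconstruct"; // isolate the load-bearing premise
    case "ONE-FACT REVERSAL":
      return "Burden";      // agents differ on standards of proof
    case "BLOCKING":
      return "Ground";      // find the operational common ground
    default:
      return "Standard";    // broadest lens when no row clearly dominates
  }
}
```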
The post-cross-examine table adds two rows: SHIFT (how much each agent's position changed, with the specific field that changed and the type of change) and ENGAGEMENT (how deeply each agent processed the strongest competing argument). These scores are evidence-based — a SHIFT score requires citing which structured field changed and why the change is meaningful, not just whether the wording is different.
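In sketch form, a SHIFT entry travels with its evidence rather than as a bare number. The field names below are hypothetical; the change taxonomy matches the one listed in the tips at the end of this page, and the one-line glosses are plain readings of those labels:

```typescript
// Hypothetical shape for one agent's SHIFT entry after cross-examination.
type ShiftKind =
  | "lexical"          // wording changed, substance did not
  | "scope-narrowed"   // the claim now covers less ground
  | "causal-reframed"  // the mechanism behind the claim changed
  | "posture-changed"  // the agent's stance itself moved
  | "NONE";            // no struct fields changed

interface ShiftEvidence {
  score: number;               // 0–100, how much the position moved
  changedField: string | null; // which structured field changed, if any
  kind: ShiftKind;
  why: string;                 // why the change is meaningful, not just reworded
}
```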
What is Memory Mode?
Memory mode is set in ⚙ SETTINGS → PREFERENCES and controls how much conversation history each agent carries into the next turn. Natural is the default.
• Isolated — agents receive only the current question with no prior context. Every turn is fully independent. Useful for fresh perspectives mid-conversation.
• Natural — each agent remembers only its own prior responses and your questions. Lower token cost, better for extended multi-turn conversations.
• Full Thread — all agents see the entire conversation including every other agent's responses and syntheses. Highest token cost, best when you want agents to explicitly build on each other's reasoning across turns.
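As a sketch, here is what each mode could include in the context an agent sees on the next turn. Message and buildContext are hypothetical names; the inclusion rules follow the three descriptions above:

```typescript
// Sketch of per-mode context assembly. Names are illustrative.
type MemoryMode = "isolated" | "natural" | "full-thread";

interface Message {
  author: string; // "user", an agent name, or "referee"
  text: string;
}

function buildContext(
  mode: MemoryMode,
  agent: string,
  history: Message[],
  question: string,
): Message[] {
  switch (mode) {
    case "isolated":
      // Only the current question; every turn is fully independent.
      return [{ author: "user", text: question }];
    case "natural":
      // The agent's own prior replies plus the user's questions.
      return [
        ...history.filter((m) => m.author === "user" || m.author === agent),
        { author: "user", text: question },
      ];
    case "full-thread":
      // Everything: all agents' replies and the referee's syntheses.
      return [...history, { author: "user", text: question }];
  }
}
```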
Tips for getting the most out of TRIAD
• Ask questions with genuine stakes. TRIAD is overkill for simple factual queries. It's most valuable when the question involves trade-offs, uncertain evidence, or decisions where the right answer depends on things you haven't fully articulated.
• Watch the POSTURE row. ACT NOW vs DEFER is the most actionable divergence signal — it means agents disagree about how much certainty you need before moving.
• Cross-examine when divergence is high. High DIVERGE scores mean agents are operating from different premises. Cross-Examine forces them to name exactly where and why, giving you the structured insight to make a better decision.
• High DIVERGE scores are features, not bugs. A score of 80+ on ASSUMPTION means agents are working from fundamentally different beliefs about your situation. Identifying which premise is closer to your reality is the work.
• Use clarifying questions as a forcing function. If the referee asks "what are you actually optimizing for?" — answer it explicitly in your next message. The subsequent turn will be sharper.
• Give agents new information, not rephrasing. If you want agents to reconsider, provide new evidence or constraints rather than asking them to try again. New evidence forces genuine revision.
• Cross-examine confident convergence too. If all three agents strongly agree in Phase A, that's exactly when Cross-Examine is most revealing — it forces each agent to find the crack in the consensus.
• Use voice input. Tap the 🎙 microphone button to dictate your question. Tap again to stop and the transcription will appear in the input field ready to send.
• Switch to Isolated memory for fresh perspectives. Mid-conversation, switching to Isolated mode lets you ask a follow-up question as if starting fresh — useful for testing whether prior context is anchoring agent responses.
• Follow the recommendation block. The BEST NEXT CROSS-EXAMINE MODE and BEST NEXT STEP blocks below each table are driven by which rows are hottest. They are worth following — especially when ⬡ GATHER ONE FACT appears, which means more cross-examination won't help and the decision hinges on one real-world data point you need to look up.
• Trust the SHIFT evidence, not just the score. After cross-examine, each agent's SHIFT score is accompanied by the specific field that changed and the type of change (lexical / scope-narrowed / causal-reframed / posture-changed). A score of 0 NONE with the note "no struct fields changed" is more informative than a bare number.