How it works · Judge Mode

An agentic RAG decision cockpit, not a chatbot

Forked Futures does not predict your future and it does not choose for you. It runs a major decision through evidence retrieval, a local evidence graph, a multi-agent debate, calibration, a safety layer, and an auditable reasoning trail — turning a single anxious question into three testable scenarios with their costs, assumptions and uncertainties made explicit, so the final call stays yours.

Agentic RAG · 9-role pipeline · Evidence graph · Human-in-the-loop · Mock-first (no API key needed)
What the AI does · one screen
input → action
  1. Input: one messy, half-formed situation, in your own words; no options or clean framing required
  2. AI transformation: a guided question journey (each question led by your last answer) → a path reveal that can reframe your options → 3 routes retrieved, simulated through a 9-role debate, pressure-tested, and run through a deterministic safety scrub
  3. Output: three evidence-grounded future scripts, the routes you didn't enter (unlived futures), and a Decision Brief
  4. Human checkpoint: you edit the revealed fork and routes, choose which to enter, and keep the final call; the AI never chooses for you
  5. Responsible-AI guardrail: possible futures, not predictions; no fabricated probabilities; unchosen routes stay visible to fight tunnel vision; mock-first, no private data
  6. Final action: run one cheap 7-day test to replace your biggest unknown with real signal

Why AI — not search, a form, or a spreadsheet

  • Search retrieves facts but does not reason over your personal tradeoffs.
  • Forms collect fixed answers but cannot adapt to open-ended uncertainty.
  • Spreadsheets compare static criteria but never surface hidden assumptions or generate a validation test.

AI earns its place by turning messy context into structured futures, explicit disagreements, honest evidence limits, and a next action — none of which a lookup, a form, or a grid can produce.

Mock-first today — the full journey runs with no API key. The live model and web-search providers are implemented and provider-ready, but not yet benchmarked. Nothing here requires private data, and the app never identifies real people.

The guided journey

A blurry situation becomes a testable fork

The main entry point is a route adventure, not a form. You enter one messy situation; the system asks one question at a time, each shaped by your previous answer, then reveals a few possible paths — including reframed ones you never named — before handing off to the route map.

Input → dynamic questions → reveal → map

  1. Enter one messy situation; no options or clean framing required.
  2. Answer a short chain of questions, each one led by your last answer.
  3. See a reveal: the decision, the value conflict, and a few possible (sometimes reframed) paths, each with labeled evidence.
  4. Open the route map, where the paths become a full three-route simulation you can compare.
  5. Enter one path as “You are here”; the others stay visible as unlived futures.
  6. Read the brief and run one cheap 7-day test to replace your biggest unknown with real signal.

How the journey stays honest

  • Each question is generated from your situation and every prior answer — the chain is causal, not a fixed quiz, so the next node follows from what you just said.
  • It runs on a live model when a key is present and a deterministic mock fallback otherwise — both Zod-validated, so the journey never dead-ends.
  • Every revealed path shows what shaped it, labeled as user-provided signal, curated reference, or AI-inferred assumption — inferences are never dressed up as citations.
  • The journey renders as a decision tree you walk — a pixel traveler moves node to node and unchosen options stay visible as greyed branches. Each revealed path carries an evidence-fit score (how strongly it matches your answers and the reference support behind it) — a transparent match score, not a prediction.
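The live-or-mock behaviour described in the list above can be sketched as follows. This is a minimal, synchronous illustration under stated assumptions: `JourneyQuestion`, `isJourneyQuestion`, and `nextQuestion` are hypothetical names, the question shape is invented for the example, and the hand-rolled guard stands in for the Zod schema the page mentions so that the sketch has no dependencies.

```typescript
// Hypothetical shape of one journey question; the real schema is not shown in the source.
interface JourneyQuestion {
  id: string;
  prompt: string;
  options: string[];
}

// Hand-rolled guard standing in for a Zod schema check.
function isJourneyQuestion(value: unknown): value is JourneyQuestion {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === "string" &&
    typeof v.prompt === "string" &&
    Array.isArray(v.options) &&
    v.options.every((o) => typeof o === "string")
  );
}

type QuestionSource = "live" | "mock";

// Try the live generator when one is configured; fall back to the deterministic
// mock if it is missing, throws, or returns a payload that fails validation.
function nextQuestion(
  live: (() => unknown) | null,
  mock: () => JourneyQuestion,
): { question: JourneyQuestion; source: QuestionSource } {
  if (live !== null) {
    try {
      const candidate = live();
      if (isJourneyQuestion(candidate)) {
        return { question: candidate, source: "live" };
      }
    } catch {
      // Ignore the live failure and use the mock below.
    }
  }
  const fallback = mock();
  if (!isJourneyQuestion(fallback)) {
    throw new Error("mock question violates the contract");
  }
  return { question: fallback, source: "mock" };
}
```

Validating the mock as well as the live payload mirrors the "both Zod-validated" claim: the fallback is held to the same contract rather than trusted by default.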
For judges

How this maps to the rubric

Where each scoring category is demonstrated in the product — navigate straight to the evidence.

Rubric Map · where to look
judge guide
  • Problem Understanding · 20 pts
    Intake · Clarifying Questions · Landing

    Captures the decision, values, constraints and fears in the user's own words; clarifying questions target the binding constraint and untested assumptions; the landing makes the 'why AI, not a rules engine' case explicit.

  • AI Reasoning · 30 pts
    Branch Detail · Judge Mode

    Agentic RAG (retrieval + a local evidence graph), a 9-role debate with an explicit Optimist vs Skeptic pass, Agent Review, Reasoning Audit Trail, qualitative calibration, and the Trajectory Atlas of reference futures.

  • Solution Design · 25 pts
    Whole flow · Judge Mode

    A coherent input → reasoning → output → action pipeline; the cockpit UI; a Zod-enforced data contract with mock-first stability; and a reproducible evaluation harness.

  • Impact & Insight · 15 pts
    Decision Brief · Branch Detail

    Decision Delta (before/after), per-assumption stress tests, 7-day validation experiments, and a final mission brief that moves the user from uncertainty to a testable next step.

  • Responsible AI · 10 pts
    Everywhere

    Qualitative-only calibration, provenance-tagged claims, a safety scrubber with surfaced rejected overclaims, no fabricated probabilities, human-in-the-loop ('what the AI will not decide'), and the evals that guard it.

The pipeline

Input → reasoning → output → action

A directed flow, not a single prompt. Your context enters on the left; what comes out the right is something you can test this week — not a verdict.

1. Input · Your context

The decision, options, values, constraints and fears captured at intake, plus your answers to AI-generated clarifying questions.

  • Decision + options
  • Values & constraints
  • Clarifying answers
2. Reasoning · Multi-agent debate

Nine specialised roles retrieve evidence, traverse the graph, build scenarios, debate optimist vs skeptic, calibrate, and safety-check — emitted as one structured, auditable record.

  • Context → Retrieval → Evidence
  • Scenario → Optimist ⇄ Skeptic
  • Calibration → Safety → Synthesis
3. Output · Three futures

Three evidence-grounded branches, each with qualitative calibration — evidence strength, fit, constraint risk, uncertainty — and no fabricated probabilities.

  • 3 plausible branches
  • Provenance-tagged claims
  • Qualitative calibration
4. Action · Test, then decide

Each branch ships a 7-day experiment with kill criteria, and a Decision Brief frames what the AI will not decide — so the call stays with you.

  • 7-day experiments
  • Kill criteria
  • Decision Brief
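The qualitative calibration attached to each branch in the Output stage can be sketched as a type with no numeric field at all. The field names and the helpers `parseLevel` and `summarize` are assumptions made for illustration; the source names the four dimensions but does not show the schema.

```typescript
// Qualitative levels only: there is deliberately no numeric probability field.
type Level = "low" | "medium" | "high";

// Field names are illustrative; only the four dimensions come from the source.
interface Calibration {
  evidence: Level;
  fit: Level;
  constraintRisk: Level;
  uncertainty: Level;
}

const LEVELS: readonly Level[] = ["low", "medium", "high"];

// Narrow an untrusted string to a Level, rejecting anything else (e.g. "70%").
function parseLevel(raw: string): Level {
  const value = raw.trim().toLowerCase();
  const match = LEVELS.find((level) => level === value);
  if (match === undefined) {
    throw new Error(`not a qualitative level: ${raw}`);
  }
  return match;
}

// Render a one-line summary of the four dimensions.
function summarize(c: Calibration): string {
  return `Evidence ${c.evidence} · fit ${c.fit} · constraint-risk ${c.constraintRisk} · uncertainty ${c.uncertainty}`;
}
```

Because the type admits only three labels, a fabricated percentage cannot be represented, which is one way to make "no fabricated probabilities" a property of the data contract rather than of the prompt.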
The agents

Nine specialised roles, one debate

Each role does one job and passes its output forward, including an explicit Optimist vs Skeptic debate before calibration. Separating reasoning from retrieval, critique and safety is what keeps the system honest about what it does and doesn't know. On the live path, all nine roles run in a single model call; the mock fallback emits the same structured record.

ContextAgent

Structures messy intake — decision, options, values, constraints, fears — into a clean frame the rest of the pipeline reasons over.

RetrievalAgent

Pulls relevant cards from the curated knowledge base + official-source pack, then traverses the local evidence graph.

ScenarioAgent

Drafts exactly three distinct, plausible future branches — never a single recommended path.

EvidenceAgent

Attaches evidence to each branch with full provenance and states each source's coverage limitations honestly.

OptimistAgent

Argues the strongest hedged case for why a branch could work — one side of an explicit debate.

SkepticAgent

Argues how a branch could fail, feeding the pre-mortem and kill criteria — the other side of the debate.

CalibrationAgent

Rates evidence, fit, constraint risk and uncertainty qualitatively — no invented probabilities.

SafetyAgent

Rewrites deterministic/overconfident language and records the categories of overclaim it rejected.

SynthesisAgent

Reconciles the debate into the Future Map, Branch Detail, Audit Trail and Decision Brief.
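The "each role passes its output forward" idea can be sketched as a fold over an ordered list of roles. Everything below is illustrative: `DebateRecord`, `placeholderRole`, and `runDebate` are hypothetical names, the role bodies are placeholders, and the ordering simply follows the order in which the roles are listed above, which may differ from the real execution order.

```typescript
// The single structured record that is threaded through every role.
interface DebateRecord {
  trail: string[];                  // roles that have run, in order (an audit trail)
  outputs: Record<string, string>;  // one placeholder output per role
}

interface Role {
  name: string;
  run: (record: DebateRecord) => DebateRecord;
}

// Role names as listed in this section; the order is an assumption.
const ROLE_NAMES: string[] = [
  "ContextAgent",
  "RetrievalAgent",
  "ScenarioAgent",
  "EvidenceAgent",
  "OptimistAgent",
  "SkepticAgent",
  "CalibrationAgent",
  "SafetyAgent",
  "SynthesisAgent",
];

// A stand-in role that only records that it ran; real roles would do real work.
function placeholderRole(roleName: string): Role {
  return {
    name: roleName,
    run: (record) => ({
      trail: [...record.trail, roleName],
      outputs: { ...record.outputs, [roleName]: `${roleName}: placeholder output` },
    }),
  };
}

// Each role receives the record produced by the previous one.
function runDebate(roles: Role[], start: DebateRecord): DebateRecord {
  return roles.reduce((record, role) => role.run(record), start);
}

const debate = runDebate(
  ROLE_NAMES.map((roleName) => placeholderRole(roleName)),
  { trail: [], outputs: {} },
);
```

Returning a new record from every role, rather than mutating a shared one, keeps each step's contribution inspectable after the fact.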

Design choice

Why an LLM — and not a rules engine

A decision like this is not a lookup. The reasoning a fixed table cannot encode is exactly the reasoning that matters.

Language-model reasoning

  • + Reads open-ended, messy human context: values, fears, constraints written in your own words.
  • + Reasons about novel combinations of options it has never seen paired before.
  • + Surfaces non-obvious tradeoffs and the unstated assumptions hiding inside a choice.
  • + Explains its reasoning in natural language a person can question and push back on.

A fixed rules table

  • Needs every input pre-categorised; nuance falls through the cracks.
  • Only handles option sets someone explicitly hard-coded ahead of time.
  • Can only return tradeoffs that were manually written down for that exact branch.
  • Returns opaque scores from a lookup table that cannot encode situational judgement.
Evidence & data layer

Grounded in a local knowledge base — never fabricated

The RetrievalAgent runs keyword retrieval over a small, curated set of local JSON files. Claims are kept at occupation, field or framework level — Forked Futures never invents a precise individual statistic about you.

Local /knowledge sources

computing_careers.json · startup_validation.json · grad_school_research.json · decision_science.json · labor_market_sources.json

Keyword retrieval matches your decision and options against these files, then surfaces only the cards that are relevant. Every claim carries its true coverage level — occupation, field or framework — so a field-wide signal is never dressed up as a personal forecast.
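Keyword retrieval of this kind can be sketched in a few lines. This is a simplified illustration, not the project's matcher: the `Card` shape, the `keywords` field, the tokenizer, and the hit-count scoring are all assumptions.

```typescript
type Coverage = "occupation" | "field" | "framework";

// Hypothetical card shape; the real JSON layout is not shown in the source.
interface Card {
  id: string;
  text: string;
  coverage: Coverage; // every card keeps its true coverage level
  keywords: string[];
}

// Lower-case the text and split on non-alphanumerics, dropping very short tokens.
function tokenize(text: string): Set<string> {
  return new Set(
    text
      .toLowerCase()
      .split(/[^a-z0-9]+/)
      .filter((token) => token.length > 2),
  );
}

// Score each card by how many of its keywords appear in the query,
// drop cards with no hits, and return the best matches first.
function retrieve(query: string, cards: Card[], limit: number = 3): Card[] {
  const terms = tokenize(query);
  return cards
    .map((card) => ({
      card,
      hits: card.keywords.filter((k) => terms.has(k.toLowerCase())).length,
    }))
    .filter((scored) => scored.hits > 0)
    .sort((a, b) => b.hits - a.hits)
    .slice(0, limit)
    .map((scored) => scored.card);
}
```

Cards with zero keyword hits are dropped instead of being ranked last, which matches the idea of surfacing only the relevant cards.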

Claim provenance

Every claim is tagged by where it came from, so you can weigh it accordingly.

You told us

Something you told us directly.

Source-supported

Backed by a curated source, kept at its true coverage level.

AI-inferred

A reasonable AI inference — flagged, and meant to be tested.

Evidence pack & graph

An official-source RAG pack, wired into a local graph

Beyond the curated knowledge base, branches cite a pack of official public sources — each card carrying its publisher, coverage level and limitations, with no invented exact statistics. A local evidence graph connects them to paths, skills, risks and experiments.

RAG pack · 19 cards
local JSON
  • College Scorecard · U.S. Department of Education
  • Occupational Outlook Handbook · U.S. Bureau of Labor Statistics
  • O*NET OnLine · U.S. Department of Labor (O*NET)
  • First-Destination Survey · National Association of Colleges and Employers (NACE)
  • Baccalaureate and Beyond Longitudinal Study (B&B) · National Center for Education Statistics (NCES)
  • American Community Survey Public Use Microdata Sample (PUMS) · U.S. Census Bureau
  • Pre-mortem (Gary Klein) · Gary Klein, Harvard Business Review (2007)
  • Decision Science (curated frameworks) · Curated research frameworks
Local evidence graph
29 nodes · 28 edges

A dependency-free node/edge model (no graph database) links the CS foundation to each path, the skills it requires, the constraints and risks that bound it, the decision frameworks that inform it, the official sources that support it, and the 7-day experiment that can test it. Retrieval traverses it to widen evidence beyond a keyword hit.

source · career path · skill · constraint · decision framework · risk · experiment
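A dependency-free node/edge model and a bounded traversal over it can be sketched as below. The names `GraphNode`, `GraphEdge`, and `expand` are hypothetical, and treating edges as undirected with a hop limit is an assumption; the source only says that retrieval traverses the graph to widen evidence beyond a keyword hit.

```typescript
type NodeKind =
  | "source"
  | "career path"
  | "skill"
  | "constraint"
  | "decision framework"
  | "risk"
  | "experiment";

interface GraphNode {
  id: string;
  kind: NodeKind;
}

interface GraphEdge {
  from: string;
  to: string;
}

// Breadth-first expansion from one node, following edges in either direction,
// stopping after maxHops. Returns the reached nodes, excluding the start node.
function expand(
  startId: string,
  nodes: GraphNode[],
  edges: GraphEdge[],
  maxHops: number,
): GraphNode[] {
  const seen = new Set<string>([startId]);
  let frontier: string[] = [startId];
  for (let hop = 0; hop < maxHops && frontier.length > 0; hop++) {
    const next: string[] = [];
    for (const edge of edges) {
      const pairs = [
        { a: edge.from, b: edge.to },
        { a: edge.to, b: edge.from },
      ];
      for (const { a, b } of pairs) {
        if (frontier.includes(a) && !seen.has(b)) {
          seen.add(b);
          next.push(b);
        }
      }
    }
    frontier = next;
  }
  return nodes.filter((node) => node.id !== startId && seen.has(node.id));
}
```

Plain arrays are enough at the scale the page reports (tens of nodes and edges), which is presumably why no graph database is needed.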
Autonomous web research

An agent that researches, then reasons

Forked Futures plans safe public queries, retrieves and ranks sources, rejects weak ones with reasons, and extracts trajectory anchors — live when a search key is present, otherwise over a curated corpus. See the live console at /research.

Search provider
live-optional

A provider abstraction (lib/web) runs live web search when a key is configured — Google Programmable Search by default, and pluggable to Tavily / SerpAPI / Exa — and otherwise falls back to a curated public-source corpus. The demo runs on the mock corpus with no key; keys are read from the environment and never logged or exposed.

Google Programmable Search · Tavily / SerpAPI / Exa-ready · Mock fallback (no key)
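The provider abstraction with a keyless fallback can be sketched like this. It is a hedged illustration only: the `SearchProvider` interface, `pickProvider`, and the environment variable name `SEARCH_API_KEY` are all hypothetical, and no real provider SDK is called.

```typescript
// Minimal provider contract; the real lib/web interface is not shown in the source.
interface SearchProvider {
  name: string;
  search: (query: string) => string[];
}

// Hypothetical environment variable name, used only for this sketch.
const KEY_NAME = "SEARCH_API_KEY";

// Stand-in for the curated public-source corpus; titles are placeholders.
const mockCorpus: SearchProvider = {
  name: "mock-corpus",
  search: () => ["Occupational Outlook Handbook", "O*NET OnLine", "College Scorecard"],
};

// Use the live provider only when a non-empty key is configured.
// The key is handed to the factory and is never logged or returned.
function pickProvider(
  env: Record<string, string | undefined>,
  createLive: (apiKey: string) => SearchProvider,
  fallback: SearchProvider,
): SearchProvider {
  const key = env[KEY_NAME];
  if (key === undefined || key.trim() === "") {
    return fallback;
  }
  return createLive(key);
}
```

Passing the live provider in as a factory keeps the selection logic testable without a network call, and swapping Tavily, SerpAPI or Exa in would only mean supplying a different factory.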
Source ranking
quality > precision
  • High — official / statistical (.gov, .edu) and established frameworks.
  • Medium — cohort surveys and public career guidance, used as analogies and flagged for survivorship bias.
  • Low — anecdotes, unverified, or stale pages — kept as leads, rejected as evidence with a stated reason.
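The three tiers above can be sketched as a small classifier. The `Candidate` fields, the `Kind` labels, and the rule that any .gov or .edu host counts as high are assumptions made for this example; the source states the tiers but not the exact rules.

```typescript
type Tier = "high" | "medium" | "low";

// Illustrative source kinds; the real taxonomy is not shown in the source.
type Kind =
  | "official-statistics"
  | "framework"
  | "cohort-survey"
  | "career-guidance"
  | "anecdote"
  | "unverified";

interface Candidate {
  host: string;   // e.g. "bls.gov"
  kind: Kind;
  stale: boolean; // true when the page is out of date
}

interface Ranked {
  tier: Tier;
  usableAsEvidence: boolean;
  note: string; // rejected sources always carry a stated reason
}

function rankSource(c: Candidate): Ranked {
  // Low: kept as a lead, rejected as evidence, with the reason recorded.
  if (c.stale || c.kind === "anecdote" || c.kind === "unverified") {
    const why = c.stale ? "stale page" : c.kind;
    return { tier: "low", usableAsEvidence: false, note: `kept as a lead, rejected as evidence: ${why}` };
  }
  // High: official or statistical hosts and established frameworks.
  const officialHost = /\.(gov|edu)$/i.test(c.host);
  if (officialHost || c.kind === "official-statistics" || c.kind === "framework") {
    return { tier: "high", usableAsEvidence: true, note: "official / statistical source or established framework" };
  }
  // Medium: surveys and guidance, usable only as analogies.
  return { tier: "medium", usableAsEvidence: true, note: "used as an analogy; flagged for survivorship bias" };
}
```

Checking the rejection conditions first means a stale page on an official host is still treated as a lead, not as evidence.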
Why this is not person-matching
responsible
  • No face-based identification, no identifying private people, no de-anonymization.
  • Queries are about roles, fields, programs, and frameworks — never person lookups.
  • Public references are role archetypes and guidance used as analogies — never "you resemble this person".
Safety filters
enforced
  • A query guard skips person-lookup patterns; result counts are capped.
  • Every source carries coverage level + limitations; aggregates are never personalized.
  • An eval (eval-research-quality) enforces these properties.
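A query guard and result cap along these lines might look as follows. The patterns and the cap value are hypothetical examples chosen for the sketch; the source states only that person-lookup patterns are skipped and that result counts are capped.

```typescript
// Hypothetical person-lookup patterns; a real guard would need a reviewed list.
const PERSON_LOOKUP_PATTERNS: RegExp[] = [
  /\bwho is\b/i,
  /\blinkedin\.com\/in\//i,
  /\b(home address|phone number) of\b/i,
];

// Hypothetical cap; the real limit is not stated in the source.
const MAX_RESULTS = 5;

// Split planned queries into those that may run and those that are skipped.
function guardQueries(queries: string[]): { allowed: string[]; skipped: string[] } {
  const allowed: string[] = [];
  const skipped: string[] = [];
  for (const query of queries) {
    const isPersonLookup = PERSON_LOOKUP_PATTERNS.some((pattern) => pattern.test(query));
    if (isPersonLookup) {
      skipped.push(query);
    } else {
      allowed.push(query);
    }
  }
  return { allowed, skipped };
}

// Keep at most `max` results, in their original order.
function capResults<T>(results: T[], max: number = MAX_RESULTS): T[] {
  return results.slice(0, Math.max(0, max));
}
```

Returning the skipped queries, instead of silently dropping them, leaves a trace that an eval can assert on.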
Decision DNA

What the decision is really about

A sharp, hypothesis-framed diagnosis — shown here for the Alex demo; the same derivation runs for any decision.

Decision DNA · what this is really about
a hypothesis to test
Diagnosis

One possible diagnosis: Alex isn't choosing a final identity — he's choosing which kind of signal to produce next, and the cheapest way to decide is to produce a little of each this week. (A hypothesis to test, not a conclusion.)

Core tension

This decision appears to center on three forms of compounding — institutional signal (quant), asymmetric upside (startup), and intellectual depth (research) — and which one to invest the next year in.

The decision underneath

Underneath the stated career choice is a sharper question: what kind of evidence does Alex want the next 12 months to produce about himself?

Value conflict
income upside · autonomy · intellectual challenge · long-term optionality
Claim Ledger

How do we know it's accurate?

Every important claim is traceable to its support, reliability, and limits — source-supported vs AI-inferred, made explicit. Shown here for the Alex demo.

Source-supported · Reliability: high · Affects: Branch (Quant Signal Track)

Quant Signal Track: The Occupational Outlook Handbook (OOH) frames quant/analyst roles, typical wage ranges, and demand direction at the occupation level, not what any one person could earn.

AI-inferred · Reliability: low · Affects: Assumption (Quant Signal Track)

Quant Signal Track: Your algorithms and probability base is genuinely interview-strong.

Source-supported · Reliability: medium · Affects: Branch (Startup Validation Track)

Startup Validation Track: Describes the mix of post-graduation paths a cohort follows, not what any one graduate will do.

AI-inferred · Reliability: low · Affects: Assumption (Startup Validation Track)

Startup Validation Track: You can run a real validation sprint without sacrificing the paid summer.

Scope: each source-supported claim reproduces the cited source's own framing, copied with provenance and kept at that source's coverage level. The system does not assert facts beyond what a source covers; AI-inferred claims carry no source and are routed to a validation experiment.
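One way to make the provenance rule structural is a discriminated union in which only source-supported claims can carry a source. The sketch below is an assumption about shape, not the project's actual contract; `Claim` and `routeClaim` are hypothetical names.

```typescript
type Reliability = "high" | "medium" | "low";

// Only the "source-supported" variant has a `source` field, so an AI-inferred
// claim cannot be given a citation without a type error.
type Claim =
  | { provenance: "user-provided"; text: string }
  | { provenance: "source-supported"; text: string; source: string; reliability: Reliability }
  | { provenance: "ai-inferred"; text: string };

// Decide what happens to a claim based purely on where it came from.
function routeClaim(claim: Claim): string {
  switch (claim.provenance) {
    case "user-provided":
      return "accept as stated context";
    case "source-supported":
      return `cite ${claim.source} at its coverage level`;
    case "ai-inferred":
      return "send to a validation experiment";
  }
}
```

For example, an AI-inferred claim such as "Your algorithms and probability base is genuinely interview-strong" would be routed to a validation experiment, never to a citation.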

Sample trace · Alex

One decision, end to end

The exact path the Alex demo takes through the system — from input to a human-controlled brief — so the whole pipeline is auditable in about 30 seconds. Pulled live from the running mock data, not a screenshot.

Decision trace · quant-signal branch
judge mode
  1. Input · context

    After my sophomore year, should I go all-in on quant recruiting, try to build a startup, or aim for a research / grad-school path?

  2. Retrieval · agentic RAG

    5 evidence cards attached, 2 from official sources (Occupational Outlook Handbook, O*NET OnLine).

  3. Evidence graph

    11 nodes and 10 links explain why this branch exists.

  4. Debate · optimist

    If the probability and coding base is genuinely interview-ready, leaning into legible quant signals could align well with how this niche tends to recruit, and an early internship may compound into stronger later positioning.

  5. Debate · skeptic

    The branch could fail on timing rather than ability if prep starts too late to be competitive this cycle, and narrowing to quant-legible signals may cut optionality if the base turns out thinner than assumed.

  6. Calibration · qualitative

    Evidence medium · fit medium · constraint-risk low · uncertainty medium — no fabricated probabilities.

  7. Safety · overclaims rejected

    Softened a deterministic offer-landing claim into hedged language that depends on factors the model cannot observe.

  8. Action · first experiment

    Do 3 timed, interview-style probability problems; record your honest score and where you stalled.

  9. Human-controlled brief

    The AI explicitly will not decide: How to weight financial security against intellectual aliveness — that's your values call.

Impact · Decision Delta

From uncertainty to action

What the Alex decision looks like before vs. after Forked Futures. Every figure on the right is a real count of what the system produced — not a claim about any outcome.

Before Forked Futures
  • 3 options, weighed mostly by gut feeling
  • Hidden tradeoffs and opportunity costs unnamed
  • Assumptions untested and untagged — fact and guess blurred
  • Uncertainty felt, but never made explicit
  • No concrete next step to take this week
After Forked Futures
  • 3 future branches, opened side by side
  • 16 evidence cards (6 from official sources), with provenance
  • 11 assumptions tagged by provenance, each with a way to test it
  • 12 uncertainty drivers surfaced, not hidden
  • 3 validation experiments to run this week
Trajectory Atlas

Reference futures, as analogies

The system maps the user's anchors to curated role trajectories — analogies to learn from, never predictions and never a real person. Shown here for the Alex demo.

Trajectory Atlas · reference futures
analogies · not predictions
Your anchors
  • Field: Computer Science (sophomore)
  • Skill: Strong at algorithms & probability
  • Motivation: Long-term optionality
  • Constraint: Need a paid internship next summer (limited savings)
  • Risk profile: Deciding soon

These curated role trajectories rhyme with your anchors — they are analogies to learn from, not forecasts. Resonance reflects overlap with what you told us; it does not predict your outcome, and you are not being matched to any real person.

Quant / research engineer · Strong resonance

A pattern of turning probabilistic thinking, code, and competitive problem-solving into roles where markets and models are the scoreboard — offered as an analogy to rehearse against, not a forecast.

Why it rhymes

This archetype often rhymes with a context where someone gravitates toward problems that have a measurable answer — competitive programming, math olympiads, Kaggle, betting puzzles, or building models for fun — and enjoys the tension of being scored against others under time pressure. It may resonate when probability and statistics feel like a native language rather than a chore, when you like reducing messy systems to legible signals, and when a tight, well-defined feedback loop motivates you more than open-ended ambiguity. It is a pattern, not a forecast: the resonance is in the shape of how you like to think, not in any destiny.

Survivorship · We mostly see the survivors. The people profiled in this archetype are the ones who cleared a steep, lossy funnel and stayed; the comparable people who were equally sharp but burned out, hit a bad recruiting cycle, were filtered by an interview format that did not suit them, or quietly left for calmer work are largely invisible. Stories about this path are heavily selected for the ones that worked, which makes the road look smoother and more deterministic than the base experience tends to be.

7-day test · For seven days, spend about an hour daily on the actual texture of the work and notice your energy, not just your performance. Days 1-3: do timed quantitative/probability puzzles or a competitive-programming set and log whether the time pressure energizes or drains you. Days 4-5: build one tiny model on real data (a simple predictor, a trading-strategy backtest on free historical data) and sit with how it feels when it performs worse than a coin flip. Days 6-7: do one mock structured interview (a friend, a forum, or recorded self-review) and read a day-in-the-life account from someone in the field. The test is not whether you ace it — it is whether the loop pulls you back in when no one is grading you.

Indie product builder · Strong resonance

A pattern of building and shipping small software products solo or in a tiny team, funded by usage rather than investors, where distribution and durability tend to matter as much as the code itself.

Why it rhymes

If you already finish small things end-to-end (a class tool, a Discord bot, a scraper a friend actually uses) and feel more energized by a stranger using your thing than by a clean grade, this pattern may resonate. It often rhymes with people who enjoy owning the whole loop, from a rough idea to a deployed thing someone touched, and who treat "would anyone pay for this" as a fun question rather than a scary one. A student who naturally posts what they make and watches who responds is rehearsing the distribution muscle this path leans on.

Survivorship · We mostly hear from the ones it worked for. The indie builder essays, revenue screenshots, and "I quit my job" threads are written by survivors; the far larger group who shipped quietly, got no traction, and moved on rarely writes the postmortem. Public revenue numbers are self-reported and self-selected, so the visible distribution looks far kinder than the underlying one. Read this archetype as a possible shape of a path, not as evidence that it usually pays off.

7-day test · Pick the smallest version of one idea and put it in front of real strangers within seven days: build a one-page landing site or a barely working prototype, then spend most of the week not coding but getting it seen. Share it in three communities where your would-be users already hang out and message ten people directly. Track only two things: how many people engaged unprompted, and how it felt to promote it daily. If indifference or the act of selling drains you, that is useful signal; if a few strangers lean in and the chase energizes you, that rhymes with this pattern. Either way, one week is a signal, not a verdict.

Technical founder · Strong resonance

A pattern of someone who turns the ability to build software into the engine of a company, owning both the product and the risk of getting it in front of customers.

Why it rhymes

This archetype often rhymes with a context where you already make things for their own sake — side projects that escape the assignment, tools you build because the existing one annoyed you, a habit of shipping something small and watching whether anyone uses it. It may resonate when you feel more energized by an unsolved, ambiguous problem than by a well-specified one, and when "who decides" matters to you as much as "what gets decided." A pull toward ownership and autonomy, plus a tolerance for not knowing if the thing will work, tends to be the connective tissue here. This is a pattern, not a forecast.

Survivorship · We mostly see the survivors. The founders who get written up, funded, or invited to speak are a tiny, filtered slice; the far larger group whose companies quietly closed, who returned to salaried work, or who burned through savings rarely show up in the stories that shape this archetype. Treat any sense of "this usually works out" as an artifact of who gets remembered, not evidence about how the path typically goes.

7-day test · In seven days, pick one small problem you've personally felt, build the roughest possible version (a landing page, a script, or a no-code mock — not a polished product), and put it in front of five real strangers who have that problem. Ask them to use it and watch where they get confused; try to get one person to say yes to a next step (a signup, a payment, a follow-up call). Notice your own reaction: did the selling and the rejection drain you or pull you in? That felt response is the signal, more than whether the thing worked.

Graduate-school researcher · Strong resonance

A path that trades near-term salary and certainty for years of deep, mentored inquiry on a narrow question, where progress is uneven, autonomy grows slowly, and the payoff arrives late and unevenly.

Why it rhymes

A student's context may resonate with this archetype when they keep chasing one question past where a class or grade required it, enjoy the slow loop of reading, testing, failing, and revising, and feel more energized by a hard open problem than by a clear deliverable. It often rhymes with people who already seek out a mentor's feedback, tolerate long stretches without external validation, and value being near the frontier of what is known over near-term income. This is a pattern to compare yourself against, not a forecast of where you would land.

Survivorship · We mostly hear from the people who finished, published, and landed a role they wanted, so this archetype is distorted by survivorship bias. The accounts that shape its appeal rarely come from those who left mid-program, burned out, could not place after graduating, or stayed in long unstable contract research, even though those outcomes are a real part of the distribution. Treat glowing portrayals as a filtered sample, not the typical case.

7-day test · Over seven days, spend about an hour daily acting like a junior researcher on one question that genuinely nags you: on day 1 write the question and why it matters; days 2 to 3 read two or three real papers or sources and take structured notes; day 4 email or message one researcher or grad student a specific, informed question about their work and the day-to-day reality; days 5 to 6 attempt a tiny piece of original analysis, replication, or critique and let it be messy; day 7 reflect in writing on whether the slow, uncertain, self-directed loop energized or drained you, and whether the lack of quick feedback felt tolerable. Notice your honest reaction to ambiguity and delayed payoff, since that is the core fit signal this path tests.

Creator-builder · Partial resonance

A pattern of compounding through publicly shipped work — content, products, or both — where an owned audience and distribution become the real asset, not any single job title.

Why it rhymes

This may resonate when a student already makes things without being assigned to — a side project, a zine, a YouTube channel, code shipped for fun — and feels more alive showing work than collecting credentials. It often rhymes with people who learn by publishing in public, who treat feedback from strangers as fuel rather than threat, and who would rather own a small audience than rent status from an employer. If "I want my output to have my name on it" feels truer than "I want a clear ladder to climb," the analogy may fit.

Survivorship · We mostly see the creators who made it. The viral channel, the bootstrapped product that hit, the writer with the book deal — these are loud and visible precisely because they are rare, while the vastly larger number who shipped consistently for years and never found an audience are invisible by design. Reasoning backward from successes ("they posted daily, so daily posting works") confuses a necessary habit with a sufficient cause, and silently ignores everyone who did the same and got silence. Treat any single creator's story as one survivor's path, not a recipe.

7-day test · For seven days, publish one small finished thing every day in your chosen medium — a short post, a clip, a feature, a page — under your own name, in public, with no edits-forever excuses. Pick a fixed time, ship even when it's rough, and at week's end notice three things honestly: did shipping energize or drain you, how did you react to silence or criticism, and did you keep going on the days no one responded? The signal you're testing is your relationship to consistent public output, not how many likes you got.

AI researcher · Partial resonance

A path built around pushing the frontier of machine learning through open-ended experiments, math, and published work, where progress is measured in hard questions answered rather than tickets closed.

Why it rhymes

This archetype may resonate when a student is drawn to open-ended problems with no answer key, finds the math behind models more interesting than shipping a feature, and feels energized rather than discouraged by long stretches of being stuck. A context where someone reads papers for fun, tinkers with model internals, or keeps asking "but why does this work?" tends to rhyme with the research mindset. It often resonates for people who would rather understand a system deeply than use it broadly. This is a pattern, not a forecast.

Survivorship · We mostly see the survivors: the cited papers, the famous labs, the breakthroughs. The far larger population of people who left, whose results never published, or who spent years on ideas that quietly didn't work is largely invisible. Public stories overweight the dramatic wins and underweight the slow, ambiguous, often-stuck reality, so this archetype can look more certain and more rewarding than the typical lived experience.

7-day test · Pick one recent ML paper in an area you find interesting and spend the week trying to genuinely understand it, then reproduce one small claim or figure in a notebook from scratch. Keep a short daily log of where you got stuck and how you felt about being stuck. At the end, ask: did the ambiguity and the slow, no-answer-key grind energize you or drain you? That reaction is a cheap, honest signal of fit with the research day-to-day, not a verdict on your ability.

Lower resonance — shown for honesty
  • Public writer / public intellectual · Weak resonance
  • Policy / research fellow · Weak resonance
Responsible AI

Uncertainty shown, not hidden — and the decision stays yours

Responsible framing is structural here, enforced at multiple layers rather than left to a single careful prompt.

How hedged framing is enforced

  • The SafetyAgent rewrites deterministic language and enforces scenario framing.
  • Zod-validated structured outputs reject any payload that breaks the contract, with automatic retry.
  • A mock fallback means the UI stays calibrated and honest even with no API key.
  • Uncertainty is rendered as a first-class signal — shown on every branch, never buried.
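A deterministic safety scrub of the kind described in the list above can be sketched with a handful of rewrite rules. The rule set, category names, and replacement wording here are invented for illustration; the source says only that deterministic language is rewritten and that the rejected overclaim categories are recorded.

```typescript
interface Rule {
  category: string;
  pattern: RegExp;
  replacement: string;
}

// Hypothetical rules; a production list would be longer and reviewed.
const RULES: Rule[] = [
  { category: "deterministic-outcome", pattern: /\bwill (?:definitely|certainly)\b/gi, replacement: "could" },
  { category: "guarantee", pattern: /\bis guaranteed to\b/gi, replacement: "may" },
  {
    category: "fabricated-probability",
    pattern: /\b(?:an? )?\d{1,3}% (?:chance|probability|likelihood)\b/gi,
    replacement: "an unquantified chance",
  },
];

// Apply every rule in order; record each category that actually changed the text.
function scrub(text: string): { text: string; rejected: string[] } {
  const rejected: string[] = [];
  let current = text;
  for (const rule of RULES) {
    const rewritten = current.replace(rule.pattern, rule.replacement);
    if (rewritten !== current) {
      rejected.push(rule.category);
      current = rewritten;
    }
  }
  return { text: current, rejected };
}
```

Because the scrub is plain string rewriting with no model call, the same input always yields the same output, which is what lets an evaluation harness assert on it.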

What the AI will not decide

The system can frame, retrieve and stress-test — but it cannot weigh what only you can weigh. These belong to you, and the Decision Brief says so explicitly:

Values · Risk tolerance · Family · Identity · Lived experience

Based on current assumptions, Forked Futures may suggest experiments and surface tradeoffs — but the choice tends to hinge on context no model holds. That is by design, and the final decision stays with you.

Everything here is an evidence-grounded future script — built from your context, public knowledge, and explicit assumptions. Plausible trajectories, not deterministic predictions: claims are tagged by where they came from, uncertainty is shown rather than hidden, and the choice stays with you.

System evaluation

The safeguards are tested, not just asserted

A framework-free evaluation harness guards overclaim safety, evidence coverage, the output schema, and the keyless demo journey. The last local run, reproducible in one command:

System Evaluation · last local run
7 / 7 passing
  • PASS · Overclaim safety · no banned / probability language across 100 user-facing files
  • PASS · RAG coverage · 362 checks · 3 branches · 7 official sources · 19 cards · 6 fidelity
  • PASS · Agent output schema · 33 checks · all 3 branches satisfy the Zod contract + carry every v2 artifact
  • PASS · Demo journey · covers all pages + APIs incl. /research (45 checks with a live server) · keyless fallback intact
  • PASS · Research quality · 65 checks · dossier well-formed; sources carry limits; rejected sources state reasons; no person-matching
  • PASS · Research robustness · 50 checks across 5 non-Alex scenarios · the research dossier generalizes to any decision
  • PASS · Specificity · 40 checks · Decision DNA is specific (named tension, per-branch bottlenecks, 7-day tests) · no generic-advice language

Reproduce locally with npm run eval (or npm run validate for typecheck + evals + build). These are deterministic checks over the code, data and docs — not live-model results. The live model path is optional and not yet benchmarked. Last local run: 2026-06-17.
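A framework-free harness of this sort can be very small. The sketch below is an assumption about structure, not the project's eval code: `Check`, `runEvals`, and `noBannedLanguage` are hypothetical names, and the banned-phrase pattern is a toy example.

```typescript
interface Check {
  name: string;
  run: () => boolean; // deterministic: no network, no model call
}

interface EvalSummary {
  passed: number;
  total: number;
  failures: string[];
  headline: string;
}

// Run every check; a check that throws counts as a failure instead of aborting the run.
function runEvals(checks: Check[]): EvalSummary {
  const failures: string[] = [];
  for (const check of checks) {
    let ok = false;
    try {
      ok = check.run();
    } catch {
      ok = false;
    }
    if (!ok) failures.push(check.name);
  }
  const passed = checks.length - failures.length;
  return { passed, total: checks.length, failures, headline: `${passed} / ${checks.length} passing` };
}

// Toy overclaim check: fail if any text contains a banned phrase.
const BANNED = /\b(?:guaranteed|will definitely|\d{1,3}% chance)\b/i;

function noBannedLanguage(texts: string[]): boolean {
  return texts.every((text) => !BANNED.test(text));
}
```

A usage example would wrap `noBannedLanguage` over the user-facing strings as one `Check` and print the `headline`, which has the same shape as the "7 / 7 passing" line above.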

Under the hood

Mock-first, so it runs with no API key

A small, modern stack chosen so the experience is reliable offline and the structured outputs can be trusted.

Next.js 14 App Router · TypeScript · Tailwind CSS · Zod-validated structured outputs · Local knowledge base · framer-motion · Mock-first fallback

Forked Futures does not use AI to choose your future. It uses AI to help you understand the futures you're choosing between.