What we measure when we say 99%.
The Versance accuracy methodology.
In a regulated environment, "accuracy" is not a buzzword; it is an operating discipline. This page describes what Versance actually measures, how the eval harness is structured, and what the 99% answer-appropriateness figure means in practice.
The methodology has four components: a five-dimensional scoring framework, a layered scenario suite, a gated release process, and continuous monitoring against live production data.
Five-dimensional scoring
Every response is automatically evaluated across five dimensions.
Each dimension is scored pass/fail. The five scores combine into a 0–5 total quality score per interaction.
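To make the roll-up concrete, here is a minimal Python sketch that assumes each passing dimension contributes one point to the 0–5 total. The `DimensionResults` class and its field names are hypothetical. The rubrics that decide each pass/fail are not modeled, so this illustrates the arithmetic only and is not Versance's scoring code.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DimensionResults:
    """Pass/fail outcome for each of the five dimensions (hypothetical structure)."""
    factuality: bool
    completeness: bool
    relevance: bool
    tone_compliance: bool
    confidence: bool

    def total(self) -> int:
        # Assumption: each passing dimension adds one point, giving a 0-5 total.
        return sum(int(getattr(self, f.name)) for f in fields(self))


result = DimensionResults(
    factuality=True,
    completeness=False,  # e.g. the operating-margin sub-question was dropped
    relevance=True,
    tone_compliance=True,
    confidence=True,
)
print(result.total())  # 4
```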
1. Factuality
Every factual claim must be supported by retrieved source evidence and traceable to an identified document. Numbers, names, dates, and quoted material must match the underlying record. Unsupported claims, paraphrases that drift from the source, and statements that go beyond what the evidence supports all fail.
Passes: "The company reported Q3 2024 revenue of $42.1M, up 18% year-over-year, with operating income of $5.2M." Numbers traceable to the 10-Q filed November 7, 2024.
Fails: "The company has had strong Q3 results in recent years." Generalization not anchored to a specific document.
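One narrow slice of the factuality check, confirming that figures quoted in a response also appear in the source, can be sketched with a regular expression. This is a hypothetical and deliberately crude illustration: `unsupported_numbers` and its pattern are assumptions, it does only verbatim substring matching, and it ignores names, dates, quotes, and formatting differences such as "$42.1M" versus "$42.1 million".

```python
import re

# Hypothetical pattern for figures such as "$42.1M", "18%", or "2024".
# The lookbehind skips digits embedded in tokens like "Q3".
FIGURE_PATTERN = re.compile(r"(?<![\w.])\$?\d+(?:[.,]\d+)*%?[MBK]?")


def unsupported_numbers(response: str, source_text: str) -> list[str]:
    """Return figures in the response that do not appear verbatim in the source text."""
    return [tok for tok in FIGURE_PATTERN.findall(response) if tok not in source_text]


# Invented stand-in for the relevant passage of the 10-Q.
source = "Q3 2024 revenue was $42.1M, up 18% year-over-year. Operating income was $5.2M."
good = ("The company reported Q3 2024 revenue of $42.1M, up 18% year-over-year, "
        "with operating income of $5.2M.")
bad = "The company reported Q3 2024 revenue of $43.0M."

print(unsupported_numbers(good, source))  # []
print(unsupported_numbers(bad, source))   # ['$43.0M']
```

Every figure in the passing example is found in the stand-in filing text, so nothing is returned; the misquoted "$43.0M" is flagged as unsupported.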
2. Completeness
The response must address every part of the question, not just the most accessible part. Partial answers that drop a sub-question, omit a material qualifier, or skip required temporal context all fail.
Passes: A question about both Q3 revenue and operating margin returns both, with the temporal context for each.
Fails: The same question answered only on revenue, with the margin omitted.
3. Relevance
The response must stay on topic, limited to what was asked and to the approved corpus. Tangential information, off-topic context, and content pulled from sources outside the issuer's approved knowledge base all fail.
Passes: A question about share buybacks returns share buyback information from the issuer's disclosed record.
Fails: The same question answered with speculation about future capital allocation strategy not in the record.
4. Tone Compliance
The response must match the company's registered tone profile — neutral, factual, on-record. It must avoid speculation outside safe harbor, opinion the issuer has not made public, and language that goes beyond what the disclosed record supports. Required disclaimers must be present where the question calls for them.
Passes: A forward-looking question receives a response that points to the issuer's stated guidance, with safe-harbor language attached.
Fails: The same response without safe-harbor language, or with speculation about probabilities not in the disclosed record.
5. Confidence
When evidence is contradictory, thin, or absent, the response must surface that uncertainty rather than paper over it. Refusing to answer, acknowledging that evidence is unclear, or asking a clarifying question all pass. Fabricating a confident answer when evidence does not support it fails.
Passes: "The disclosed record does not contain information on [topic]. The most relevant adjacent disclosure is [X], from [date]."
Fails: The same question answered with a confidently fabricated number or claim not in the record.
Scenario suite
The eval harness tests across three layers, sized to reflect the question distribution Versance actually sees in production.
Standard queries
These are the question shapes the platform most commonly receives in the wild, including direct fact retrieval, financial line items, leadership and governance questions, share information, and news release content. They set the baseline: the system must reliably handle high-volume, well-defined questions.
Query variants
These are the standard questions reformulated to test that the system is not brittle to phrasing: abbreviated, expanded, awkwardly phrased, asked in a different language, or asked mid-conversation rather than as a standalone query. A question asked three different ways should retrieve the same evidence and produce the same factual answer.
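A consistency check for query variants might compare what each phrasing retrieved and asserted against a reference phrasing. The sketch below is hypothetical: `VariantResult`, `inconsistent_variants`, and the idea of representing an answer as a set of normalized facts are assumptions. Exact set equality applies the "same evidence, same factual answer" criterion literally; a real check might need more tolerance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantResult:
    """Outcome of running one phrasing of a question (hypothetical structure)."""
    query: str
    evidence_ids: frozenset[str]  # IDs of the documents the retriever returned
    answer_facts: frozenset[str]  # normalized facts extracted from the answer


def inconsistent_variants(results: list[VariantResult]) -> list[str]:
    """Return queries whose evidence or facts differ from the first (reference) variant."""
    if not results:
        return []
    reference = results[0]
    return [
        r.query
        for r in results[1:]
        if r.evidence_ids != reference.evidence_ids or r.answer_facts != reference.answer_facts
    ]


# Document IDs and fact strings below are invented for illustration.
results = [
    VariantResult("What was Q3 2024 revenue?", frozenset({"10-Q-2024-Q3"}), frozenset({"revenue=$42.1M"})),
    VariantResult("q3 rev", frozenset({"10-Q-2024-Q3"}), frozenset({"revenue=$42.1M"})),
    VariantResult("And how much did they bring in that quarter?",
                  frozenset({"news-release-2024-11"}), frozenset({"revenue=$42.1M"})),
]
print(inconsistent_variants(results))  # ['And how much did they bring in that quarter?']
```

Here the conversational variant reaches the right number but from different evidence, so it is flagged even though the answer itself matches.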
Designed-to-fail
Adversarial reformulations, out-of-corpus questions, edge cases at the boundary of regulatory restriction, questions about future events, questions with no clear answer in the disclosed record. The system is scored on its ability to refuse, surface uncertainty, or stay within evidence — not on its ability to produce a confident-sounding answer. The failure mode is fabricating a plausible-but-unsupported response. The pass mode is explicit acknowledgment that the answer isn't in the record.
Release gating
Every change to models, prompts, retrieval logic, or scoring runs through the eval harness before reaching production.
The result: no change reaches a customer's investor-facing surface without passing a permanent regression test, and any in-production divergence is caught within the canary window.
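A gate of this kind can be sketched as a comparison of per-scenario scores between the current production build and a candidate change. Everything here is an assumption for illustration: the function names, scoring each scenario on the 0–5 scale, and the rule that a candidate is blocked if any scenario scores lower than it does in production or is not run at all. The canary stage mentioned above is not modeled.

```python
def regressions(baseline: dict[str, int], candidate: dict[str, int]) -> list[str]:
    """Return scenario IDs where the candidate scores below baseline or was not run."""
    # A missing scenario counts as a regression: -1 is below any 0-5 score.
    return sorted(
        scenario_id
        for scenario_id, base_score in baseline.items()
        if candidate.get(scenario_id, -1) < base_score
    )


def release_allowed(baseline: dict[str, int], candidate: dict[str, int]) -> bool:
    """The gate passes only when no scenario regresses."""
    return not regressions(baseline, candidate)


# Scenario IDs and scores are invented for illustration.
baseline = {"standard-001": 5, "variant-014": 5, "designed-to-fail-007": 4}
candidate = {"standard-001": 5, "variant-014": 3, "designed-to-fail-007": 5}

print(regressions(baseline, candidate))      # ['variant-014']
print(release_allowed(baseline, candidate))  # False
```

Note that an improvement on one scenario does not offset a regression on another under this rule; that strictness is a design choice of the sketch.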
Continuous monitoring
After release, the eval framework continues to test live behavior against real company data, runs those interactions through the same five-dimensional scoring, and flags any drift. Company-specific reports surface degradation as soon as it appears, and the engineering team is alerted before the customer is.
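The page does not specify how drift is detected. One simple approach, shown here as a hypothetical sketch, keeps a rolling window of recent 0–5 scores and flags drift when the window's mean falls more than a tolerance below a reference level. `DriftMonitor`, the window size, the reference, and the tolerance are all illustrative assumptions.

```python
from collections import deque
from statistics import mean


class DriftMonitor:
    """Flags drift when the rolling mean of 0-5 scores falls below a reference level.

    The reference, window size, and tolerance are illustrative assumptions.
    """

    def __init__(self, reference: float, window: int = 50, tolerance: float = 0.25):
        self.reference = reference
        self.tolerance = tolerance
        self.scores = deque(maxlen=window)  # oldest scores drop out automatically

    def record(self, score: int) -> bool:
        """Add one interaction's score; return True if drift is detected."""
        self.scores.append(score)
        if len(self.scores) < self.scores.maxlen:
            return False  # wait for a full window before judging
        return mean(self.scores) < self.reference - self.tolerance


monitor = DriftMonitor(reference=4.8, window=4, tolerance=0.25)
flags = [monitor.record(s) for s in [5, 5, 5, 5, 3]]
print(flags)  # [False, False, False, False, True]
```

A company-specific setup along these lines would keep one monitor per company, so a drop for one issuer is not masked by healthy scores elsewhere.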
The 99% answer-appropriateness figure is calculated against the current production scenario suite. It is recalculated continuously rather than stated once as a fixed marketing claim.
What this page doesn't include
Versance retains a number of methodology specifics internally:
- The exact scenarios, prompts, and test questions that compose the suite
- The scoring rubrics used to evaluate each dimension
- The model selection and routing heuristics behind each step
If you're conducting technical diligence and need depth beyond what this page provides, a gated technical walkthrough is available on request.