Full Transparency

How It Works

Every score, every extraction, every link in our knowledge graph is documented here. No black boxes.

Data Sources

We aggregate data from multiple authoritative sources to build a comprehensive view of the scientific literature.

OpenAlex

openalex.org

Our primary source for paper metadata, abstracts, authors, institutions, and citation networks.

  • 211+ million scholarly works indexed
  • Deduplicated using DOI as canonical identifier
  • Updated continuously from Crossref, PubMed, and institutional repositories
  • Open access: 30% of indexed works

PubMed / MEDLINE

pubmed.ncbi.nlm.nih.gov

Medical and life sciences literature with structured metadata and MeSH terms.

  • 36+ million citations from biomedical literature
  • PMID cross-referenced with OpenAlex records
  • MeSH terms for standardized topic classification

Full-Text PDFs

When available, we extract claims from full paper text rather than abstracts alone.

  • Open access papers via Unpaywall links
  • Publisher APIs where licensed
  • Prioritized for systematic reviews and meta-analyses

Integrity Checks

Every paper undergoes automated integrity checks to flag potential issues before you rely on the findings.

1

Retraction Status

Check whether the paper has been retracted or carries an expression of concern.

Source: Crossref API, Retraction Watch
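A simplified sketch of the lookup side of this check, assuming the Retraction Watch database has been loaded into DOI sets beforehand (the live check also consults the Crossref API; that request is not shown):

```python
def retraction_status(doi: str, retracted_dois: set[str], concern_dois: set[str]) -> str:
    """Classify a DOI against preloaded Retraction Watch DOI sets.

    DOIs are case-insensitive, so we normalize to lowercase before lookup.
    """
    normalized = doi.lower()
    if normalized in retracted_dois:
        return "retracted"
    if normalized in concern_dois:
        return "expression_of_concern"
    return "clear"
```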

2

Trial/Review Registration

Verify that clinical trials are registered on ClinicalTrials.gov and that systematic reviews are registered on PROSPERO.

Checks: Registration exists, not retrospective, outcomes match
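The registration check starts by finding registry identifiers in the paper text. A sketch using the public ID formats (NCT plus 8 digits for ClinicalTrials.gov; CRD plus a digit run for PROSPERO, whose exact length this regex treats loosely):

```python
import re

# ClinicalTrials.gov identifiers: "NCT" followed by exactly 8 digits.
NCT_RE = re.compile(r"\bNCT\d{8}\b")
# PROSPERO identifiers: "CRD" followed by a digit run (length handled loosely).
PROSPERO_RE = re.compile(r"\bCRD\d{8,11}\b")

def find_registrations(text: str) -> dict[str, list[str]]:
    """Extract trial/review registration IDs mentioned in a paper's text."""
    return {
        "clinicaltrials_gov": NCT_RE.findall(text),
        "prospero": PROSPERO_RE.findall(text),
    }
```

Finding no ID in a paper that reports a clinical trial is what triggers the "unregistered trial" fail condition below; the remaining checks (prospective registration, outcome matching) require registry records.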

3

Statistical Consistency

Verify that reported statistics are mathematically possible (GRIM test) and internally consistent.

Checks: Means possible given N, CIs contain estimates, effect sizes plausible
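The GRIM test exploits the fact that a mean of n integer responses must be expressible as k/n for some integer k. A minimal version of the check:

```python
def grim_consistent(mean: float, n: int, decimals: int = 2) -> bool:
    """GRIM test: can a reported mean arise from n integer-valued responses?

    The reported mean (rounded to `decimals` places) must match k/n for some
    integer total k. We check the nearest candidates to guard against float
    rounding at the boundary.
    """
    k = round(mean * n)
    for candidate in (k - 1, k, k + 1):
        if round(candidate / n, decimals) == round(mean, decimals):
            return True
    return False
```

For example, a reported mean of 1.23 on a 10-person integer scale is impossible (the nearest achievable means are 1.2 and 1.3), so it fails the check.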

4

Paper Mill Detection

Use an LLM to detect tortured phrases, templated writing, and other paper mill indicators.

Checks: Tortured phrases, unnatural phrasing, generic templates, suspicious references
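Detection itself is LLM-based, but a cheap complementary check is a lookup against already-documented tortured phrases (paraphrase-tool manglings of standard terms). The examples below are documented in the paper mill literature; a production list would be far longer:

```python
# Documented tortured phrases mapped to the standard term they mangle.
# Small illustrative subset only.
TORTURED_PHRASES = {
    "counterfeit consciousness": "artificial intelligence",
    "bosom peril": "breast cancer",
    "irregular timberland": "random forest",
    "colossal information": "big data",
}

def tortured_phrase_hits(text: str) -> list[tuple[str, str]]:
    """Return (tortured phrase, standard term) pairs found in the text."""
    lowered = text.lower()
    return [(phrase, term) for phrase, term in TORTURED_PHRASES.items()
            if phrase in lowered]
```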

5

Journal & Metadata Quality

Check journal reputation, peer review timeline, and required statements.

Checks: Predatory journal lists, implausibly fast peer review, missing ethics/COI statements

Automatic Fail Conditions

  • Paper has been retracted
  • Paper mill indicators detected (tortured phrases, suspicious patterns)
  • Statistically impossible results (GRIM test failure)
  • Unregistered clinical trial when registration required

Claim Extraction

We use large language models to extract specific, testable claims from scientific papers—not just keywords or topics.

1

Input Preparation

Paper title, abstract, and (when available) full text are structured as input. For systematic reviews, we prioritize the results and conclusions sections.

2

LLM Extraction

Our language model analyzes the text and extracts claims with the following structure:

{
  "claim_text": "Preoperative frailty (mFI ≥ 0.27) is associated with increased 30-day mortality",
  "claim_type": "correlational",
  "effect_size": "OR 1.59",
  "confidence_interval": "95% CI 1.29-1.98",
  "p_value": "p < 0.001",
  "sample_size": "n = 847",
  "has_quantitative": true,
  "is_null_finding": false
}

3

Claim Types

Causal

X causes/leads to Y

"Smoking causes lung cancer"

Correlational

X is associated with Y

"BMI correlates with mortality"

Comparative

X differs from Y

"Drug A outperforms placebo"

Prevalence

X occurs at rate Y

"30% of patients experience..."

Methodological

Approach X is valid/reliable

"MRI has 95% sensitivity"

Mechanistic

X works via pathway Y

"Inhibits TNF-α signaling"

4

Extracted Flags

  • has_quantitative: True when claim includes measurable statistics (OR, RR, CI, p-value, sample size)
  • is_null_finding: True when the claim reports no significant effect (p > 0.05 or CI crossing null)
  • Claims must be specific and testable (no vague statements)
  • Effect sizes and statistical measures extracted when present in source text

5

Forest Plot Visualization

For claims with effect sizes and confidence intervals (OR, RR), we display an interactive forest plot—the gold standard visualization in medical research.

Legend:
  • Point estimate (OR/RR)
  • 95% confidence interval
  • Null effect (1.0)
  • Significant result
  • Log scale: Values displayed on logarithmic scale (0.5 to 4.0) for proper OR/RR interpretation
  • Color coding: Red = increased risk, Green = decreased risk, Gray = crosses null (not significant)
  • Interactive: Hover for plain-English explanation of what the visualization means
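Plotting on a log scale means positions are linear in log(value). A sketch of the coordinate mapping for the 0.5 to 4.0 axis described above:

```python
import math

def log_x_position(value: float, lo: float = 0.5, hi: float = 4.0,
                   width: float = 1.0) -> float:
    """Map an OR/RR onto a [0, width] axis that is linear in log(value).

    With the 0.5-4.0 bounds used here, the null effect (1.0) lands one third
    of the way across, and equal ratios (e.g. 0.5x vs 2x) are equally spaced.
    """
    return width * (math.log(value) - math.log(lo)) / (math.log(hi) - math.log(lo))
```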

6

Paper Evidence Profile

Each paper receives an overall evidence profile displayed as a radar chart, summarizing five key dimensions at a glance.

Computed from Paper

  • Sample Size: Log-scale score based on total N (<100=20, 100-1k=40, 1k-10k=60, 10k-100k=80, >100k=95)
  • Methodology: Study design hierarchy (Meta-analysis=95, SR=90, RCT=85, Cohort=65, Case-control=50)
  • Recency: Publication age (Current year=100, 1yr=90, 3yr=75, 5yr=60, 10yr=40)

Requires Knowledge Graph

  • Replication: # of independent studies testing same claims
  • Consistency: Agreement on effect direction across studies

The center score shows the average of all computed dimensions. Pending dimensions show as dashed bars until the knowledge graph is built.
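The computed dimensions follow the bins listed above. A sketch of the scoring; the treatment of studies older than ten years is an assumption, since the source only specifies scores down to the 10-year mark:

```python
def sample_size_score(n: int) -> int:
    """Log-scale bins: <100=20, 100-1k=40, 1k-10k=60, 10k-100k=80, >100k=95."""
    for cutoff, score in ((100, 20), (1_000, 40), (10_000, 60), (100_000, 80)):
        if n < cutoff:
            return score
    return 95

# Study design hierarchy from the list above.
METHODOLOGY_SCORES = {"meta-analysis": 95, "systematic review": 90,
                      "rct": 85, "cohort": 65, "case-control": 50}

def recency_score(age_years: int) -> int:
    """Current year=100, 1yr=90, 3yr=75, 5yr=60, 10yr=40."""
    for threshold, score in ((0, 100), (1, 90), (3, 75), (5, 60)):
        if age_years <= threshold:
            return score
    return 40  # 10 years and older (behavior past 10y is an assumption)

def center_score(n: int, design: str, age_years: int) -> float:
    """Average of the computed dimensions (graph-based dimensions pending)."""
    dims = [sample_size_score(n),
            METHODOLOGY_SCORES[design.lower()],
            recency_score(age_years)]
    return sum(dims) / len(dims)
```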

7

Included Studies (Systematic Reviews)

For systematic reviews and meta-analyses, we extract the studies that were included in the analysis from the paper's references.

  • Each included study is linked with its DOI, authors, journal, year, and sample size
  • Key findings from each study are summarized for quick reference
  • Creates bidirectional links: review → studies, and studies → "included in this review"

Semantic Matching

To link related claims across papers, we use vector embeddings to find semantically similar statements.

How It Works

  • Technique: Vector embeddings
  • Representation: High-dimensional vectors
  • Storage: Vector database
  • Similarity Metric: Cosine distance

Why this matters: Two papers might describe the same finding using different words. Embeddings let us find claims that are semantically equivalent even when phrased differently.

Example: "Frailty increases postoperative mortality" and "Preoperative frailty syndrome is a predictor of death after surgery" would have high similarity (~0.89) despite different wording.
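The similarity metric itself is simple; the heavy lifting is in the embedding model, which is not shown here. Cosine similarity on two vectors (cosine distance is 1 minus this value):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors.

    1.0 means identical direction (semantically equivalent claims),
    0.0 means orthogonal (unrelated).
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

In practice a vector database computes this over millions of stored claim embeddings at query time rather than pairwise in Python.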

Evidence Scoring

Coming Soon

Our evidence scoring system synthesizes multiple factors to assess how well-supported a claim is across the literature.

Replication Count

High weight

Number of independent studies that tested the same claim

Direct count of linked studies with matching or contradicting results

Sample Size (Total N)

High weight

Combined sample size across all studies testing this claim

Sum of sample sizes from all linked studies

Effect Consistency

Medium weight

Whether studies find effects in the same direction

(Supporting studies) / (Total studies) × direction agreement

Methodological Quality

Medium weight

Study design strength (RCT > cohort > case-control > case series)

Weighted average based on study type hierarchy

Recency

Low weight

More recent evidence weighted slightly higher

Exponential decay with 10-year half-life
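The recency weighting reduces to one line: with a 10-year half-life, a study's weight halves every decade.

```python
def recency_weight(age_years: float, half_life_years: float = 10.0) -> float:
    """Exponential decay: evidence counts half as much every half-life."""
    return 0.5 ** (age_years / half_life_years)
```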

Status: Evidence scoring is under development. Currently showing extracted claims and metadata only.

Knowledge Graph

Coming Soon

The Socratic Knowledge Graph maps relationships between claims, not just papers.

Graph Structure

Nodes: Claims

Each extracted claim is a node with its metadata

Edges: Evidence Links

Supports, contradicts, extends, or refines relationships

Clusters: Consensus Areas

Groups of claims with strong mutual support

Why claim-level linking matters: Paper A might cite Paper B, but does it support or contradict its findings? Citation networks can't tell you. Our claim-level graph can.
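A minimal in-memory sketch of the structure described above; ClaimGraph and its method names are illustrative, not our production API:

```python
from collections import defaultdict

# Edge types from the graph description above.
EDGE_TYPES = {"supports", "contradicts", "extends", "refines"}

class ClaimGraph:
    """Claims as nodes, typed evidence links as edges."""

    def __init__(self) -> None:
        self.claims: dict[str, str] = {}  # claim_id -> claim text
        self.edges: defaultdict[str, list[tuple[str, str]]] = defaultdict(list)

    def add_claim(self, claim_id: str, text: str) -> None:
        self.claims[claim_id] = text

    def link(self, src: str, relation: str, dst: str) -> None:
        if relation not in EDGE_TYPES:
            raise ValueError(f"unknown relation: {relation}")
        self.edges[src].append((relation, dst))

    def related(self, claim_id: str, relation: str) -> list[str]:
        """Claim IDs that `claim_id` links to with the given relation."""
        return [dst for rel, dst in self.edges[claim_id] if rel == relation]
```

Consensus clusters would then fall out of graph analysis over the "supports" edges.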

Our Commitment

Open Methodology

This page documents exactly how we process data. No proprietary black boxes.

Traceable Claims

Every claim links back to its source paper with page/section references when available.

Model Disclosure

We disclose which AI models are used for extraction and how they're configured.

Continuous Updates

As our methods evolve, this documentation is updated. Check the changelog below.

Changelog

v0.1.0 (March 2026)
  • Initial claim extraction pipeline using large language models
  • Semantic embeddings for claim similarity matching
  • Vector database for efficient similarity search
  • MVP focused on anesthesiology literature

Questions about our methodology?

We're building in public. Reach out with questions or feedback.

hello@socratic.science