Data Sources
We aggregate data from multiple authoritative sources to build a comprehensive view of the scientific literature.
OpenAlex
openalex.org
Our primary source for paper metadata, abstracts, authors, institutions, and citation networks.
- 211+ million scholarly works indexed
- Deduplicated using DOI as canonical identifier
- Updated continuously from Crossref, PubMed, and institutional repositories
- Open access: 30% of indexed works
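For illustration, a minimal sketch of pulling one record from the OpenAlex REST API (the DOI is an arbitrary example; field names follow the public OpenAlex schema):

import requests

# Fetch a single work from the OpenAlex API by DOI (example DOI is arbitrary)
resp = requests.get("https://api.openalex.org/works/doi:10.7717/peerj.4375")
resp.raise_for_status()
work = resp.json()
print(work["title"], work["publication_year"], work["open_access"]["is_oa"])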
PubMed / MEDLINE
pubmed.ncbi.nlm.nih.gov
Medical and life sciences literature with structured metadata and MeSH terms.
- 36+ million citations from biomedical literature
- PMID cross-referenced with OpenAlex records
- MeSH terms for standardized topic classification
Full-Text PDFs
When available, we extract claims from full paper text rather than abstracts alone.
- Open access papers via Unpaywall links
- Publisher APIs where licensed
- Prioritized for systematic reviews and meta-analyses
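As a sketch, resolving an open-access PDF through the Unpaywall REST API (the DOI is an arbitrary example; Unpaywall requires an email parameter with every request):

import requests

# Look up the best open-access location for a DOI via Unpaywall
doi = "10.7717/peerj.4375"
resp = requests.get(f"https://api.unpaywall.org/v2/{doi}",
                    params={"email": "hello@socratic.science"})
best = resp.json().get("best_oa_location") or {}
pdf_url = best.get("url_for_pdf")  # None when no OA PDF is available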
Integrity Checks
Every paper undergoes automated integrity checks to flag potential issues before you rely on the findings.
Retraction Status
Check whether the paper has been retracted or carries an expression of concern.
Source: Crossref API, Retraction Watch
Trial/Review Registration
Verify clinical trials are registered (ClinicalTrials.gov) and systematic reviews are on PROSPERO.
Checks: Registration exists, registration was not retrospective, reported outcomes match the registration
Statistical Consistency
Verify reported statistics are mathematically possible (GRIM test) and internally consistent.
Checks: Means possible given N, CIs contain estimates, effect sizes plausible
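A minimal sketch of the GRIM check (the function name and the ±1 search window are our simplification): for integer-valued data, the true mean must equal some integer sum divided by N, so a reported mean that no such fraction rounds to is mathematically impossible.

def grim_consistent(mean: float, n: int, decimals: int = 2) -> bool:
    """GRIM test: can a mean reported to `decimals` places arise from
    integer data with sample size n? The sum of integer data is an
    integer, so the true mean must be s/n for some integer s."""
    target = round(mean, decimals)
    s = round(mean * n)
    # Check integer sums adjacent to n*mean; one of these must round back
    # to the reported mean if the mean is possible at all
    return any(round(c / n, decimals) == target for c in (s - 1, s, s + 1))

print(grim_consistent(5.19, 28))  # False: no integer sum over 28 values yields 5.19
print(grim_consistent(5.18, 28))  # True: 145/28 = 5.1786 rounds to 5.18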
Paper Mill Detection
Use LLM to detect tortured phrases, templated writing, and other paper mill indicators.
Checks: Tortured phrases, unnatural phrasing, generic templates, suspicious references
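Alongside the LLM pass, a simple phrase-list screen catches the best-known substitutions (a sketch; the pairs below are documented examples from the tortured-phrases literature, not our full list):

# Known "tortured phrase" substitutions: paraphrase-tool rewrites of standard terms
TORTURED_PHRASES = {
    "counterfeit consciousness": "artificial intelligence",
    "profound learning": "deep learning",
    "bosom peril": "breast cancer",
}

def flag_tortured_phrases(text: str) -> list[str]:
    """Return any known tortured phrases appearing in the text."""
    lower = text.lower()
    return [phrase for phrase in TORTURED_PHRASES if phrase in lower]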
Journal & Metadata Quality
Check journal reputation, peer review timeline, and required statements.
Checks: Predatory journal lists, implausibly fast peer review, missing ethics/COI statements
Automatic Fail Conditions
- Paper has been retracted
- Paper mill indicators detected (tortured phrases, suspicious patterns)
- Statistically impossible results (GRIM test failure)
- Unregistered clinical trial when registration is required
Claim Extraction
We use large language models to extract specific, testable claims from scientific papers, not just keywords or topics.
Input Preparation
Paper title, abstract, and (when available) full text are structured as input. For systematic reviews, we prioritize the results and conclusions sections.
LLM Extraction
Our language model analyzes the text and extracts claims with the following structure:
{
  "claim_text": "Preoperative frailty (mFI ≥ 0.27) is associated with increased 30-day mortality",
  "claim_type": "correlational",
  "effect_size": "OR 1.59",
  "confidence_interval": "95% CI 1.29-1.98",
  "p_value": "p < 0.001",
  "sample_size": "n = 847",
  "has_quantitative": true,
  "is_null_finding": false
}
Claim Types
- Causal: X causes/leads to Y. Example: "Smoking causes lung cancer"
- Correlational: X is associated with Y. Example: "BMI correlates with mortality"
- Comparative: X differs from Y. Example: "Drug A outperforms placebo"
- Prevalence: X occurs at rate Y. Example: "30% of patients experience..."
- Methodological: Approach X is valid/reliable. Example: "MRI has 95% sensitivity"
- Mechanistic: X works via pathway Y. Example: "Inhibits TNF-α signaling"
Extracted Flags
- has_quantitative: True when claim includes measurable statistics (OR, RR, CI, p-value, sample size)
- is_null_finding: True when the claim reports no significant effect (p > 0.05 or CI crossing null)
- Claims must be specific and testable (no vague statements)
- Effect sizes and statistical measures extracted when present in source text
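As a rough sketch of how the is_null_finding flag can be derived (the helper name and thresholds are illustrative; 1.0 is the null value for ratio measures like OR and RR):

def is_null_finding(p_value: float | None,
                    ci_low: float | None,
                    ci_high: float | None,
                    null_value: float = 1.0) -> bool:
    """Flag a claim as a null finding when p > 0.05 or the confidence
    interval crosses the null value."""
    if p_value is not None and p_value > 0.05:
        return True
    if ci_low is not None and ci_high is not None:
        return ci_low <= null_value <= ci_high
    return False

# The example claim above: p < 0.001 and a 95% CI of 1.29-1.98 excludes 1.0
print(is_null_finding(0.001, 1.29, 1.98))  # False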
Forest Plot Visualization
For claims with effect sizes and confidence intervals (OR, RR), we display an interactive forest plot, the gold-standard visualization in medical research.
- Log scale: Values displayed on logarithmic scale (0.5 to 4.0) for proper OR/RR interpretation
- Color coding: Red = increased risk, Green = decreased risk, Gray = crosses null (not significant)
- Interactive: Hover for plain-English explanation of what the visualization means
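A minimal matplotlib sketch of this layout, assuming hypothetical studies (not our production renderer, which is interactive):

import matplotlib.pyplot as plt

# Hypothetical studies: (label, odds ratio, 95% CI low, 95% CI high)
studies = [
    ("Study A", 1.59, 1.29, 1.98),  # excludes 1.0, OR > 1 -> red
    ("Study B", 1.20, 0.85, 1.70),  # crosses 1.0 -> gray (not significant)
    ("Study C", 0.72, 0.55, 0.94),  # excludes 1.0, OR < 1 -> green
]

fig, ax = plt.subplots(figsize=(6, 2.5))
for i, (label, or_, lo, hi) in enumerate(studies):
    color = "gray" if lo <= 1.0 <= hi else ("red" if or_ > 1.0 else "green")
    ax.errorbar(or_, i, xerr=[[or_ - lo], [hi - or_]], fmt="s",
                color=color, capsize=3)

ax.axvline(1.0, linestyle="--", color="black")  # null line (OR = 1)
ax.set_xscale("log")
ax.set_xlim(0.5, 4.0)
ax.set_xticks([0.5, 1.0, 2.0, 4.0], labels=["0.5", "1", "2", "4"])
ax.minorticks_off()
ax.set_yticks(range(len(studies)), labels=[s[0] for s in studies])
ax.invert_yaxis()
ax.set_xlabel("Odds ratio (log scale)")
plt.tight_layout()
plt.show()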
Paper Evidence Profile
Each paper receives an overall evidence profile displayed as a radar chart, summarizing five key dimensions at a glance.
Computed from Paper
- Sample Size: Log-scale score based on total N (<100=20, 100-1k=40, 1k-10k=60, 10k-100k=80, >100k=95)
- Methodology: Study design hierarchy (Meta-analysis=95, Systematic review=90, RCT=85, Cohort=65, Case-control=50)
- Recency: Publication age (Current year=100, 1yr=90, 3yr=75, 5yr=60, 10yr=40)
Requires Knowledge Graph
- Replication: # of independent studies testing same claims
- Consistency: Agreement on effect direction across studies
The center score shows the average of all computed dimensions. Pending dimensions show as dashed bars until the knowledge graph is built.
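A sketch of the three paper-derived dimensions, using the thresholds listed above (function names are ours; behavior between the listed recency anchors is our simplification):

from datetime import date

def sample_size_score(n: int) -> int:
    """Log-scale buckets from the thresholds documented above."""
    for upper, score in [(100, 20), (1_000, 40), (10_000, 60), (100_000, 80)]:
        if n < upper:
            return score
    return 95

# Study design hierarchy documented above
METHODOLOGY_SCORES = {
    "meta-analysis": 95, "systematic review": 90, "rct": 85,
    "cohort": 65, "case-control": 50,
}

def recency_score(pub_year: int) -> int:
    """Anchor points from the documentation; ages between anchors take
    the score of the next anchor up (a simplification)."""
    age = date.today().year - pub_year
    for max_age, score in [(0, 100), (1, 90), (3, 75), (5, 60), (10, 40)]:
        if age <= max_age:
            return score
    return 40  # floor for papers older than 10 years (our assumption)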
Included Studies (Systematic Reviews)
For systematic reviews and meta-analyses, we extract the studies that were included in the analysis from the paper's references.
- Each included study is linked with its DOI, authors, journal, year, and sample size
- Key findings from each study are summarized for quick reference
- Creates bidirectional links: review → studies, and studies → "included in this review"
Semantic Matching
To link related claims across papers, we use vector embeddings to find semantically similar statements.
How It Works
- Technique: vector embeddings
- Representation: high-dimensional vectors
- Storage: vector database
- Similarity metric: cosine similarity
Why this matters: Two papers might describe the same finding using different words. Embeddings let us find claims that are semantically equivalent even when phrased differently.
Example: "Frailty increases postoperative mortality" and "Preoperative frailty syndrome is a predictor of death after surgery" would have high similarity (~0.89) despite different wording.
Evidence Scoring
Coming soon. Our evidence scoring system synthesizes multiple factors to assess how well-supported a claim is across the literature.
Replication Count
High weight: number of independent studies that tested the same claim
Direct count of linked studies with matching or contradicting results
Sample Size (Total N)
High weight: combined sample size across all studies testing this claim
Sum of sample sizes from all linked studies
Effect Consistency
Medium weight: whether studies find effects in the same direction
(Supporting studies) / (Total studies) × direction agreement
Methodological Quality
Medium weight: study design strength (RCT > cohort > case-control > case series)
Weighted average based on study type hierarchy
Recency
Low weight: more recent evidence weighted slightly higher
Exponential decay with 10-year half-life
Status: Evidence scoring is under development. Currently showing extracted claims and metadata only.
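As a sketch of how these factors might combine (the weights below are illustrative placeholders, since the scoring system is not finalized; only the 10-year half-life is taken from the description above):

# Illustrative weights only; the production weighting is under development
WEIGHTS = {
    "replication": 0.30,   # high weight
    "total_n": 0.30,       # high weight
    "consistency": 0.15,   # medium weight
    "methodology": 0.15,   # medium weight
    "recency": 0.10,       # low weight
}

def recency_factor(age_years: float) -> float:
    """Exponential decay with a 10-year half-life, per the description above."""
    return 0.5 ** (age_years / 10.0)

def evidence_score(factors: dict[str, float]) -> float:
    """Weighted average of factor scores, each normalized to [0, 1]."""
    return sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)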
Knowledge Graph
Coming soon. The Socratic Knowledge Graph maps relationships between claims, not just papers.
Graph Structure
- Nodes (claims): each extracted claim is a node with its metadata
- Edges (evidence links): supports, contradicts, extends, or refines relationships
- Clusters (consensus areas): groups of claims with strong mutual support
Why claim-level linking matters: Paper A might cite Paper B, but does it support or contradict its findings? Citation networks can't tell you. Our claim-level graph can.
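A minimal sketch of this structure using networkx (node IDs and attributes are illustrative):

import networkx as nx

g = nx.DiGraph()
g.add_node("claim:frailty-mortality", claim_type="correlational",
           effect_size="OR 1.59")
g.add_node("claim:frailty-death-after-surgery", claim_type="correlational")
# Edge labels carry the evidence relationship, not just "cites"
g.add_edge("claim:frailty-death-after-surgery", "claim:frailty-mortality",
           relation="supports")

for u, v, data in g.edges(data=True):
    print(f"{u} -[{data['relation']}]-> {v}")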
Our Commitment
Open Methodology
This page documents exactly how we process data. No proprietary black boxes.
Traceable Claims
Every claim links back to its source paper with page/section references when available.
Model Disclosure
We disclose which AI models are used for extraction and how they're configured.
Continuous Updates
As our methods evolve, this documentation is updated. Check the changelog below.
Changelog
- Initial claim extraction pipeline using large language models
- Semantic embeddings for claim similarity matching
- Vector database for efficient similarity search
- MVP focused on anesthesiology literature
Questions about our methodology?
We're building in public. Reach out with questions or feedback.
hello@socratic.science