Clean, human-written text from before the slop.
CleanSource is a curated library of pre-2022, public-domain texts — summarized, tagged, and bundled into themed packs you can drop straight into a RAG pipeline or simply read. Every word is human-authored and license-clear.
Why clean source matters now
The open web is filling with machine-generated text. Training and grounding on it can cause model collapse and quiet factual drift. CleanSource gives you a verifiable floor.
Provenance you can cite
Every text is US public domain, human-authored, and predates the generative-AI web. The bundle ships the attestation with it.
RAG-ready bundles
Editorial summary, tags, themes, and suggested chunking in one clean JSON manifest — point your embedder at it and go.
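As a rough illustration of what "point your embedder at it" could look like, here is a minimal Python sketch. The manifest schema is not specified on this page, so every field name below (texts, chunking, max_chars, separator, body) and the iter_chunks helper are hypothetical stand-ins, not the actual CleanSource format. The sketch reads a manifest, packs paragraphs up to the suggested chunk size, and attaches tags and themes to each chunk as metadata.

```python
import json

# Hypothetical manifest. The field names are assumptions made for this
# sketch, not the documented CleanSource schema.
MANIFEST_JSON = r"""
{
  "pack": "example-pack",
  "texts": [
    {
      "title": "Meditations",
      "author": "Marcus Aurelius",
      "summary": "Stoic notebook entries.",
      "tags": ["stoicism", "ethics"],
      "themes": ["duty", "mortality"],
      "chunking": {"max_chars": 40, "separator": "\n\n"},
      "body": "First entry.\n\nSecond entry, slightly longer.\n\nThird."
    }
  ]
}
"""


def iter_chunks(manifest):
    """Yield (chunk_text, metadata) pairs, packing paragraphs up to max_chars."""
    for entry in manifest["texts"]:
        hint = entry["chunking"]
        sep = hint["separator"]
        max_chars = hint["max_chars"]
        meta = {
            "title": entry["title"],
            "author": entry["author"],
            "tags": entry["tags"],
            "themes": entry["themes"],
        }
        buf = ""
        for para in entry["body"].split(sep):
            candidate = para if not buf else buf + sep + para
            if buf and len(candidate) > max_chars:
                # Adding this paragraph would overflow: emit the buffer first.
                yield buf, meta
                buf = para
            else:
                buf = candidate
        if buf:
            yield buf, meta


if __name__ == "__main__":
    manifest = json.loads(MANIFEST_JSON)
    for text, meta in iter_chunks(manifest):
        # Replace this print with a call to your embedder of choice.
        print(meta["title"], meta["tags"], repr(text))
```

In a real pipeline you would load the manifest from the bundle file with json.load and pass each chunk, together with its metadata, to whatever embedding client you already use.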
Curated, not scraped
Hand-picked canonical works across philosophy, science, economics, and literature. Quality over a dump of noise.
From the library
Meditations
Marcus Aurelius
The private notebook of a Roman emperor, written to himself as Stoic self-correction. Short, aphoristic entries on duty, mortality, anger, and accepting what is outside one's control. A clean, dense source of moral-reasoning prose untouched by modern editorializing.
The Republic
Plato
Plato's dialogue on justice, the ideal state, and the nature of knowledge, framed by the allegory of the cave. Dialectic question-and-answer structure makes it a rich corpus for reasoning-chain and argument-modeling tasks.
Thus Spake Zarathustra
Friedrich Nietzsche
Nietzsche's philosophical novel in prophetic, poetic prose — the Übermensch, eternal recurrence, and the revaluation of values, delivered as parable. Distinctive stylistic register, valuable for tone and rhetoric modeling.
Beyond Good and Evil
Friedrich Nietzsche
A critique of past philosophers and traditional morality, arguing for a philosophy of the future grounded in the will to power. Tight argumentative aphorisms across 296 numbered sections — well-chunked for retrieval.
On the Origin of Species
Charles Darwin
Darwin's foundational argument for evolution by natural selection, built from patient observation and careful inductive reasoning. A model of evidence-driven scientific prose for grounding RAG over primary-source science.
Relativity: The Special and General Theory
Albert Einstein
Einstein's own popular exposition of special and general relativity, written for the general reader. Clear explanatory structure with worked thought-experiments — useful for technical-explanation and pedagogy datasets.
Themed packs for builders
Bundles of related works, pre-organized for a domain. One-time ₹1999 / $24, or go unlimited at ₹999/mo.
The Stoic & Philosophical Mind
Four cornerstones of Western moral reasoning — from Marcus Aurelius's private discipline to Nietzsche's revaluation of values. Pre-chunked for argument-modeling and reasoning-chain RAG.
₹1999 · $24
Science & Reason
Primary-source scientific prose from Darwin and Einstein — evidence-driven argumentation and clear technical explanation, ideal for grounding factual RAG in human-authored science.
₹1999 · $24
Economics & Society
The texts that shaped how we argue about markets, liberty, and class — Smith, Mill, Marx & Engels. Long-form analytical reasoning, fully license-clear.
₹1999 · $24
English Canon Starter
A free taster pack: three short, stylistically distinct classics to evaluate CleanSource bundles in your pipeline before you buy.
Free taster
New packs, in your inbox
We add curated, license-clear packs regularly. Get notified when one lands in your domain.