Large language models (LLMs) promise effortless content creation: essays in seconds, books on demand, reports that appear at the click of a button. Yet this apparent miracle of productivity hides a disturbing fact. These models operate by recombining the work of others. The sentences they generate are stitched from patterns extracted from books, newspapers, online forums, research papers, and code repositories. The resulting text is fluent, but it is not original in the scholarly sense. It is a form of plagiarism at scale, where attribution is absent by design.
The paradox is that this same mechanism, the extraction of human language at scale, creates practical value. Students can learn faster, professionals can draft reports, and researchers can receive rapid summaries. The usefulness is undeniable. At the same time, this usefulness exists only because of a knowledge commons that has been built over centuries. Teachers wrote textbooks, librarians preserved archives, volunteers edited Wikipedia, and scholars produced peer-reviewed research. LLMs are useful not because they invent, but because they extract.
The ethical stakes are enormous. In academia, even close paraphrase without citation counts as plagiarism (American Psychological Association, 2020, p. 254). In journalism, borrowing phrasing or framing without acknowledgment can lead to dismissal. Yet when a generative system reproduces argument structures or mimics style, the practice is often celebrated as innovation. This double standard corrodes the norms that protect authorship and creativity.
At the same time, the knowledge commons, the infrastructure of human effort on which these systems depend, is underfunded and increasingly fragile. University presses close, libraries face budget cuts, and open-source communities struggle to survive. If AI companies continue to extract without reinvesting, the very foundation of their usefulness will collapse. What looks like free knowledge today will become a desert tomorrow.
These dynamics show that the theoretical categories of plagiarism—wording leakage, style appropriation, and idea-level recombination—are not confined to abstract analysis. They are playing out in courts, classrooms, and workplaces today.
The solution is not abandonment of generative AI. The solution is reciprocity. Attribution layers must be developed so that outputs point back to likely sources. Compensation pools must redistribute revenue to authors, libraries, and repositories. Universities, publishers, and public agencies must adopt procurement rules that enforce data provenance and reinvestment standards.
If we continue to accept plagiarism at scale as innovation, we risk destroying the very commons that makes these tools valuable. The call is clear: reinvest in the infrastructures of knowledge, or watch them disappear.
Agustin V. Startari
Linguistic theorist and researcher in historical studies. Author of Grammars of Power, Executable Power, and The Grammar of Objectivity.
ORCID: https://orcid.org/0000-0002-5792-2016
Zenodo: https://zenodo.org/me/uploads?q=&f=sharedwithme%3Afalse&l=list&p=1&s=10&sort=newest
SSRN Author Page: https://papers.ssrn.com/sol3/cfdev/AbsByAuth.cfm?perid=7639915
Website: https://www.agustinvstartari.com/
I do not use artificial intelligence to write what I don’t know. I use it to challenge what I do. I write to reclaim the voice in an age of automated neutrality. My work is not outsourced. It is authored. — Agustin V. Startari