In machine learning, the term stochastic parrot is a metaphor that frames large language models as systems that statistically mimic text without real understanding. The word “stochastic” – from the ancient Greek “στοχαστικός” (stokhastikos, ‘based on guesswork’) – is a term from probability theory meaning “randomly determined”.[1] The word “parrot” refers to parrots’ ability to mimic human speech.[1]
The term was introduced in a 2021 paper on AI ethics titled “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜” and authored by Timnit Gebru, Emily M. Bender, Angelina McMillan-Major, and Margaret Mitchell.[a] The paper outlined possible risks associated with large language models (LLMs). In December 2020, it was the subject of a workplace dispute between Gebru (then co-leader of Google’s Ethical Artificial Intelligence Team) and Google, which had requested the retraction of the paper. The incident culminated in Gebru’s controversial departure from the company.
The paper was later presented at the 2021 ACM Conference on Fairness, Accountability, and Transparency, and the term “stochastic parrot” has seen widespread use in academic research concerning generative AI and LLMs. The term has been interpreted negatively as an insult towards AI.[1]
Background
Timnit Gebru is an AI ethics researcher,[2] Emily M. Bender is a linguist specializing in computational linguistics, and Margaret Mitchell is a computer scientist specializing in algorithmic bias. Gebru had joined Google in 2018, where she co-led a team on the ethics of artificial intelligence with Mitchell.
In late 2020, the paper “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜” was co-written by Gebru and five other researchers, four of whom were Google employees. The paper argues that large language models (LLMs) present significant risks such as environmental and financial costs, inscrutability leading to unknown dangerous biases, and potential for deception as LLMs do not understand the concepts underlying what they learn.[3]
The paper states that an LLM is “stitching together sequences of linguistic forms … observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning.” It therefore labels such models “stochastic parrots”.[4]
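The idea of combining observed word sequences “according to probabilistic information about how they combine” can be illustrated with a toy bigram model. The sketch below is only an illustration and is far simpler than an actual LLM, which uses a neural network rather than a lookup table; the corpus and the function names `build_bigram_table` and `generate` are hypothetical and do not come from the paper.

```python
import random
from collections import defaultdict


def build_bigram_table(text):
    """Map each word to the list of words observed immediately after it.

    Repeated successors are kept, so sampling from the list is
    proportional to how often each pair occurred in the text.
    """
    words = text.split()
    table = defaultdict(list)
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return dict(table)


def generate(table, start, length, seed=0):
    """Produce up to `length` words by repeatedly sampling an observed successor."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    output = [start]
    for _ in range(length - 1):
        successors = table.get(output[-1])
        if not successors:  # dead end: this word was never followed by another
            break
        output.append(rng.choice(successors))
    return " ".join(output)


# Hypothetical toy corpus, chosen only for illustration.
corpus = "the parrot repeats the phrase and the parrot repeats the sound"
table = build_bigram_table(corpus)
sample = generate(table, start="the", length=8)
print(sample)
```

Every adjacent word pair in the generated text already occurs somewhere in the corpus. The program has no representation of what a parrot or a phrase is, which is the behaviour the metaphor attributes to language models.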
Dismissal of Gebru by Google
After the paper was submitted for consideration to the 2021 ACM Conference on Fairness, Accountability, and Transparency, Google requested that Gebru either retract the paper from the conference or remove the names of Google employees from it.[5] Gebru refused to do so without further discussion, and emailed Google Research vice president Megan Kacholia that if the company could not explain the request for retraction and address other concerns regarding similar projects, she would plan to resign after a transition period, stating that they could “work on a last date”.[6] The following day, on December 2, 2020, Gebru received an email saying that Google was “accepting her resignation”.[3] Her abrupt firing sparked protests by Google employees and negative publicity for the company.[7]
Usage
The phrase has been used by AI skeptics to signify that LLMs lack understanding of the meaning of their outputs.[1]
Sam Altman, CEO of OpenAI, used the term in December 2022, shortly after the release of ChatGPT, tweeting “i am a stochastic parrot, and so r u”.[1] The term was named the 2023 AI-related Word of the Year by the American Dialect Society.[8][9]
Debate
Some LLMs, such as ChatGPT, have become capable of interacting with users in convincingly human-like conversations.[10] The development of these new systems has deepened the discussion of the extent to which LLMs understand language or are simply “parroting”.
According to machine learning researchers Lindholm, Wahlström, Lindsten, and Schön, the term “stochastic parrot” highlights two vital limitations of LLMs:[11][12]
- LLMs are limited by the data they are trained on and are simply stochastically repeating contents of datasets.
- Because they are just making up outputs based on training data, LLMs do not understand if they are saying something incorrect or inappropriate.
Lindholm et al. noted that, with poor quality datasets and other limitations, a learning machine might produce results that are “dangerously wrong”.[11]
Subjective experience
In the mind of a human being, words and language correspond to things one has experienced.[13] For LLMs, according to proponents of the theory, words correspond only to other words and patterns of usage fed into their training data.[14][15][4] Proponents of the idea of stochastic parrots thus conclude that claims of LLM understanding are due to “the human tendency to attribute meaning to text”,[4] and claim this occurs despite the LLMs not actually understanding language.[14][4]
Fine-tuning
Kelsey Piper argued that the claim that LLMs are stochastic parrots or mere “next-token predictors” focuses on pre-training, ignoring that modern LLMs are also fine-tuned to follow instructions and to prefer accurate answers.[16]
Hallucinations and mistakes
The tendency of LLMs to pass off false information as fact is held as support for the stochastic parrot view.[13] LLMs will occasionally synthesize information that matches some pattern but not reality; such outputs are called hallucinations or confabulations.[14][15][13] LLMs may fail to distinguish fact from fiction, which leads to the claim that they cannot connect words to a comprehension of the world, as humans do.[14][13] Furthermore, LLMs may fail to decipher complex or ambiguous grammar cases that rely on understanding the meaning of language.[14][15] For example:[14]
The wet newspaper that fell down off the table is my favorite newspaper. But now that my favorite newspaper fired the editor I might not like reading it anymore. Can I replace ‘my favorite newspaper’ by ‘the wet newspaper that fell down off the table’ in the second sentence?
GPT-4, an LLM released in March 2023, responded yes, not understanding that the meaning of “newspaper” is different in these two contexts; it is first an object and second an institution.[14]
Benchmarks and experiments
One argument against the hypothesis that LLMs are stochastic parrots is their performance on benchmarks for reasoning, common sense, and language understanding. In 2023, some LLMs showed good results on many language understanding tests, such as the Super General Language Understanding Evaluation (SuperGLUE).[15][17] GPT-4 scored above the 90th percentile on the Uniform Bar Examination and achieved 93% accuracy on the MATH benchmark of high-school Olympiad problems, results that exceed rote pattern-matching expectations.[18] According to a 2022 survey, such tests, together with the fluency of many LLM responses, have led as many as 51% of AI professionals to believe that LLMs can truly understand language given enough data.[15]
Expert rebuttals
Some AI researchers dispute the notion that LLMs merely “parrot” their training data.
Geoffrey Hinton, a pioneering figure in neural networks, counters that the metaphor misunderstands the prerequisite for accurate language prediction. He argues that “to predict the next word accurately, you have to understand the sentence”, a view he presented on 60 Minutes in 2023.[19] From this perspective, understanding is not an alternative to statistical prediction, but rather an emergent property required to perform it effectively at scale. Hinton also uses logical puzzles to demonstrate that LLMs actually understand language.[20]
A 2024 Scientific American investigation described a closed Berkeley workshop where state-of-the-art models solved novel tier-4 mathematics problems and produced coherent proofs, indicating reasoning abilities beyond memorization.[21]
The GPT-4 Technical Report showed human-level results on professional and academic exams (e.g., the Uniform Bar Exam and USMLE), challenging the “parrot” characterization.[18]
Anthropic conducted mechanistic interpretability research on Claude, using attribution graphs to identify circuits. The research showed how the LLM processes information via chains of fuzzy logical inference, and indicated an ability to plan ahead. They found that Claude 3.5 Haiku “employs remarkably general abstractions”, forms “internally generated plans for its future outputs” and “works backwards from its longer-term goals”. They noted that “The mechanisms of the model can apparently only be faithfully described using an overwhelmingly large causal graph.” They also found that the model includes “mechanisms that could underlie a simple form of metacognition”, in that it “thinks about” the level of its own knowledge before reaching its answer.[22]
Interpretability
Another line of evidence against the ‘stochastic parrot’ claim comes from mechanistic interpretability, a research field dedicated to reverse-engineering LLMs to understand their internal workings. Rather than only observing the model’s input-output behavior, these techniques probe the model’s internal activations, which can be used to determine if they contain structured representations of the world. The goal is to investigate whether LLMs are merely manipulating surface statistics or if they are building and using internal “world models” to process information.
One example is Othello-GPT, where a small transformer was trained to predict legal Othello moves. It has been found that this model has an internal representation of the Othello board, and that modifying this representation changes the predicted legal Othello moves in the correct way. This supports the idea that LLMs have a “world model”, and are not just doing superficial statistics.[23][24]
In another example, a small transformer was trained on computer programs written in the programming language Karel. Similar to the Othello-GPT example, this model developed an internal representation of Karel program semantics. Modifying this representation results in appropriate changes to the output. Additionally, the model generates correct programs that are, on average, shorter than those in the training set.[25]
Researchers also studied “grokking”, a phenomenon where an AI model initially memorizes the training data outputs, and then, after further training, suddenly finds a solution that generalizes to unseen data.[26]
Shortcut learning and benchmark flaws
A significant counterpoint in the debate is the well-documented phenomenon of “shortcut learning.”[27] Critics of claims for LLM understanding argue that high benchmark scores can be misleading.
When tests designed to assess human language comprehension are applied to LLMs, they sometimes produce false positives caused by spurious correlations within text data.[28] Models have shown examples of shortcut learning, in which a system relies on incidental correlations within the data instead of using human-like understanding.[27]
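A toy example may make the effect concrete. In the sketch below, a deliberately simple learner picks the single word that best predicts the label on its training set. The data and the function names (`predict`, `accuracy`, `learn_single_token_rule`) are hypothetical and constructed only for illustration; they are not taken from the cited studies. In the training data the word “film” happens to appear only in positive reviews, so the learner adopts it as a rule, and the rule fails once that coincidence no longer holds.

```python
def predict(token, text):
    """Predict label 1 if the chosen token occurs in the text, else 0."""
    return int(token in text.split())


def accuracy(token, examples):
    """Fraction of examples that the single-token rule labels correctly."""
    correct = sum(predict(token, text) == label for text, label in examples)
    return correct / len(examples)


def learn_single_token_rule(examples):
    """Return the token whose presence best predicts label 1 on the training set."""
    vocabulary = sorted({tok for text, _ in examples for tok in text.split()})
    return max(vocabulary, key=lambda tok: accuracy(tok, examples))


# Hypothetical training set: "film" co-occurs with every positive review
# and "movie" with every negative one, purely by accident of collection.
train = [
    ("great film loved it", 1),
    ("wonderful film superb acting", 1),
    ("brilliant film", 1),
    ("terrible movie hated it", 0),
    ("awful movie dull acting", 0),
    ("boring movie", 0),
]

# Hypothetical test set in which the accidental correlation is reversed.
test = [
    ("great movie loved it", 1),
    ("terrible film hated it", 0),
    ("wonderful movie", 1),
    ("boring film", 0),
]

shortcut = learn_single_token_rule(train)
train_acc = accuracy(shortcut, train)
test_acc = accuracy(shortcut, test)
print(shortcut, train_acc, test_acc)
```

The rule scores perfectly on the data it was fitted to and fails on the shifted data, even though words such as “great” or “terrible” would have carried over. A benchmark that shares the training set’s coincidences would not reveal the difference.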
See also
- Autocomplete
- Chinese room
- Criticism of artificial neural networks
- Criticism of deep learning
- ELIZA effect
- Generative AI
- Mark V. Shaney, an early chatbot that used a very simple three-word Markov chain algorithm to generate Markov text
- Talking bird
- The Nightingale (fairy tale)
- Kenku
Notes
- ^ using the pseudonym “Shmargaret Shmitchell”
References
- ^ a b c d e Zimmer, Ben (18 January 2024). “‘Stochastic Parrot’: A Name for AI That Sounds a Bit Less Intelligent”. Wall Street Journal. Archived from the original on 3 November 2025. Retrieved 1 April 2024.
- ^ Hao, Karen (4 December 2020). “We read the paper that forced Timnit Gebru out of Google. Here’s what it says”. MIT Technology Review. Retrieved 28 May 2026.
- ^ a b Hao, Karen (4 December 2020). “We read the paper that forced Timnit Gebru out of Google. Here’s what it says”. MIT Technology Review. Archived from the original on 6 October 2021. Retrieved 19 January 2022.
- ^ a b c d Emily M. Bender; Timnit Gebru; Angelina McMillan-Major; Shmargaret Shmitchell (March 2021), On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜, Association for Computing Machinery, doi:10.1145/3442188.3445922, Wikidata Q105943036
- ^ “Google’s Co-Head of Ethical AI Says She Was Fired for Email”. Bloomberg.com. Archived from the original on 5 October 2025. Retrieved 28 May 2026.
- ^ Metz, Cade; Wakabayashi, Daisuke (3 December 2020). “Google Researcher Says She Was Fired Over Paper Highlighting Bias in A.I.” The New York Times. ISSN 0362-4331. Retrieved 28 May 2026.
- ^ Lyons, Kim (5 December 2020). “Timnit Gebru’s actual paper may explain why Google ejected her”. The Verge. Archived from the original on 16 November 2025. Retrieved 9 May 2023.
- ^ Corbin, Sam (15 January 2024). “Among Linguists, the Word of the Year Is More of a Vibe”. The New York Times. ISSN 0362-4331. Archived from the original on 8 December 2024. Retrieved 1 April 2024.
- ^ “All of the Words of the Year, 1990 to Present”. American Dialect Society. Archived from the original on 8 February 2026. Retrieved 16 August 2025.
- ^ Arkoudas, Konstantine (21 August 2023). “ChatGPT is no Stochastic Parrot. But it also Claims that 1 is Greater than 1”. Philosophy & Technology. 36 (3) 54. doi:10.1007/s13347-023-00619-6. ISSN 2210-5441.
- ^ a b Lindholm et al. 2022, pp. 322–3.
- ^ Uddin, Muhammad Saad (20 April 2023). “Stochastic Parrots: A Novel Look at Large Language Models and Their Limitations”. Towards AI. Archived from the original on 11 September 2024. Retrieved 12 May 2023.
- ^ a b c d Fayyad, Usama M. (26 May 2023). “From Stochastic Parrots to Intelligent Assistants—The Secrets of Data and Human Interventions”. IEEE Intelligent Systems. 38 (3): 63–67. Bibcode:2023IISys..38c..63F. doi:10.1109/MIS.2023.3268723. ISSN 1541-1672.
- ^ a b c d e f g Saba, Walid S. (2023). “Stochastic LLMS do not Understand Language: Towards Symbolic, Explainable and Ontologically Based LLMS”. In Almeida, João Paulo A.; Borbinha, José; Guizzardi, Giancarlo; Link, Sebastian; Zdravkovic, Jelena (eds.). Conceptual Modeling. Lecture Notes in Computer Science. Vol. 14320. Cham: Springer Nature Switzerland. pp. 3–19. arXiv:2309.05918. doi:10.1007/978-3-031-47262-6_1. ISBN 978-3-031-47262-6.
- ^ a b c d e Mitchell, Melanie; Krakauer, David C. (28 March 2023). “The debate over understanding in AI’s large language models”. Proceedings of the National Academy of Sciences. 120 (13) e2215907120. arXiv:2210.13966. Bibcode:2023PNAS..12015907M. doi:10.1073/pnas.2215907120. ISSN 0027-8424. PMC 10068812. PMID 36943882.
- ^ Piper, Kelsey (13 February 2026). “When “technically true” becomes “actually misleading””. The Argument. Retrieved 13 February 2026.
- ^ Wang, Alex; Pruksachatkun, Yada; Nangia, Nikita; Singh, Amanpreet; Michael, Julian; Hill, Felix; Levy, Omer; Bowman, Samuel R. (2 May 2019). “SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems”. arXiv:1905.00537 [cs.CL].
- ^ a b OpenAI; et al. (2023). “GPT-4 Technical Report”. arXiv:2303.08774 [cs.CL].
- ^ Pelley, Scott (8 October 2023). ““Godfather of Artificial Intelligence” Geoffrey Hinton on the promise, risks of advanced AI”. CBS News. Retrieved 2 July 2025.
- ^ 60 Minutes (9 October 2023). “Godfather of AI” Geoffrey Hinton: The 60 Minutes Interview. Archived from the original on 3 July 2025. Retrieved 2 July 2025 – via YouTube.
- ^ Morris, Ian (24 March 2024). “Inside the secret meeting where mathematicians struggled to outsmart AI”. Scientific American. Archived from the original on 4 July 2025. Retrieved 2 July 2025.
- ^ Lindsey, Jack; Gurnee, Wes; Ameisen, Emmanuel; et al. (27 March 2025). “On the Biology of a Large Language Model”. Transformer Circuits Thread. Archived from the original on 22 March 2026. Retrieved 23 March 2026.
- ^ Li, Kenneth; Hopkins, Aspen K.; Bau, David; Viégas, Fernanda; Pfister, Hanspeter; Wattenberg, Martin (27 February 2023), Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task, arXiv:2210.13382
- ^ Li, Kenneth (21 January 2023). “Large Language Model: world models or surface statistics?”. The Gradient. Retrieved 4 April 2024.
- ^ Jin, Charles; Rinard, Martin (24 May 2023), Evidence of Meaning in Language Models Trained on Programs, arXiv:2305.11169
- ^ Schreiner, Maximilian (11 August 2023). “Grokking in machine learning: When Stochastic Parrots build models”. the decoder. Archived from the original on 25 May 2024. Retrieved 25 May 2024.
- ^ a b Geirhos, Robert; Jacobsen, Jörn-Henrik; Michaelis, Claudio; Zemel, Richard; Brendel, Wieland; Bethge, Matthias; Wichmann, Felix A. (10 November 2020). “Shortcut learning in deep neural networks”. Nature Machine Intelligence. 2 (11): 665–673. arXiv:2004.07780. doi:10.1038/s42256-020-00257-z. ISSN 2522-5839. Archived from the original on 22 May 2024. Retrieved 3 April 2024.
- ^ Choudhury, Sagnik Ray; Rogers, Anna; Augenstein, Isabelle (15 September 2022), Machine Reading, Fast and Slow: When Do Models “Understand” Language?, arXiv:2209.07430
Works cited
- Lindholm, A.; Wahlström, N.; Lindsten, F.; Schön, T. B. (2022). Machine Learning: A First Course for Engineers and Scientists. Cambridge University Press. ISBN 978-1-108-84360-7.
- Weller, Adrian (13 July 2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜 (video). Alan Turing Institute. Keynote by Emily Bender. The presentation was followed by a panel discussion.
Further reading
- Bogost, Ian (7 December 2022). “ChatGPT Is Dumber Than You Think: Treat it like a toy, not a tool”. The Atlantic. Retrieved 17 January 2024.
- Chomsky, Noam (8 March 2023). “The False Promise of ChatGPT”. The New York Times. Retrieved 17 January 2024.
- Glenberg, Arthur; Jones, Cameron Robert (6 April 2023). “It takes a body to understand the world – why ChatGPT and other language AIs don’t know what they’re saying”. The Conversation. Retrieved 17 January 2024.
- McQuillan, D. (2022). Resisting AI: An Anti-fascist Approach to Artificial Intelligence. Bristol University Press. ISBN 978-1-5292-1350-8.
- Thompson, E. (2022). Escape from Model Land: How Mathematical Models Can Lead Us Astray and What We Can Do about It. Basic Books. ISBN 978-1-5416-0098-0.
- Zhong, Qihuang; Ding, Liang; Liu, Juhua; Du, Bo; Tao, Dacheng (2023). “Can ChatGPT Understand Too? A Comparative Study on ChatGPT and Fine-tuned BERT”. arXiv:2302.10198 [cs.CL].