Using cognitive psychology to understand GPT-3.

Affiliation

Max Planck Research Group (MPRG) Computational Principles of Intelligence, Max Planck Institute for Biological Cybernetics, Tübingen 72076, Germany.

Publication information

Proc Natl Acad Sci U S A. 2023 Feb 7;120(6):e2218523120. doi: 10.1073/pnas.2218523120. Epub 2023 Feb 2.

DOI:10.1073/pnas.2218523120

PMID:36730192

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9963545/

Abstract

We study GPT-3, a recent large language model, using tools from cognitive psychology. More specifically, we assess GPT-3's decision-making, information search, deliberation, and causal reasoning abilities on a battery of canonical experiments from the literature. We find that much of GPT-3's behavior is impressive: It solves vignette-based tasks as well as or better than human subjects, is able to make decent decisions from descriptions, outperforms humans in a multiarmed bandit task, and shows signatures of model-based reinforcement learning. Yet, we also find that small perturbations to vignette-based tasks can lead GPT-3 vastly astray, that it shows no signatures of directed exploration, and that it fails miserably in a causal reasoning task. Taken together, these results enrich our understanding of current large language models and pave the way for future investigations using tools from cognitive psychology to study increasingly capable and opaque artificial agents.
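In multiarmed bandit research, "directed exploration" is often formalized as an information bonus that favors options whose value is still uncertain, whereas "random exploration" is modeled as choice noise. The sketch below is a hypothetical illustration of that distinction, not the analysis used in the paper: it assumes a UCB-style bonus of `bonus / sqrt(n)` for an arm sampled `n` times and a softmax choice rule with a `temperature` parameter; the function names `choice_values` and `softmax` are illustrative only.

```python
import math


def choice_values(means, counts, bonus):
    """Hypothetical directed-exploration rule: add an uncertainty bonus
    that shrinks as an arm is sampled more often (UCB-style assumption)."""
    return [m + bonus / math.sqrt(n) for m, n in zip(means, counts)]


def softmax(values, temperature):
    """Random exploration: choice noise controlled by the temperature."""
    exps = [math.exp(v / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Two arms with equal observed means; arm 0 was sampled 4 times, arm 1 once.
means = [5.0, 5.0]
counts = [4, 1]

no_bonus = choice_values(means, counts, bonus=0.0)   # purely value-driven
directed = choice_values(means, counts, bonus=2.0)   # favors the less-known arm

print(softmax(no_bonus, temperature=1.0))  # → [0.5, 0.5]
print(softmax(directed, temperature=1.0))  # probability shifts toward arm 1
```

Under these assumptions, an agent without directed exploration treats the two equally valued arms identically, while a positive bonus pushes choices toward the less-sampled arm; the absence of such a shift is one way a "no directed exploration" signature can show up.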

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1843/9963545/1cd6158c3690/pnas.2218523120fig01.jpg

Similar articles

Using cognitive psychology to understand GPT-3.

Proc Natl Acad Sci U S A. 2023 Feb 7;120(6):e2218523120. doi: 10.1073/pnas.2218523120. Epub 2023 Feb 2.

The diagnostic and triage accuracy of the GPT-3 artificial intelligence model: an observational study.

Lancet Digit Health. 2024 Aug;6(8):e555-e561. doi: 10.1016/S2589-7500(24)00097-9.

Learning to Make Rare and Complex Diagnoses With Generative AI Assistance: Qualitative Study of Popular Large Language Models.

JMIR Med Educ. 2024 Feb 13;10:e51391. doi: 10.2196/51391.

Language models and psychological sciences.

Front Psychol. 2023 Oct 20;14:1279317. doi: 10.3389/fpsyg.2023.1279317. eCollection 2023.

Overlap in meaning is a stronger predictor of semantic activation in GPT-3 than in humans.

Sci Rep. 2023 Mar 28;13(1):5035. doi: 10.1038/s41598-023-32248-6.

Covariation of learning and "reasoning" abilities in mice: evolutionary conservation of the operations of intelligence.

J Exp Psychol Anim Behav Process. 2012 Apr;38(2):109-24. doi: 10.1037/a0027355. Epub 2012 Mar 19.

Multi-task reinforcement learning in humans.

Nat Hum Behav. 2021 Jun;5(6):764-773. doi: 10.1038/s41562-020-01035-y. Epub 2021 Jan 28.

Diagnostic accuracy of large language models in psychiatry.

Asian J Psychiatr. 2024 Oct;100:104168. doi: 10.1016/j.ajp.2024.104168. Epub 2024 Jul 25.

Episodic memory governs choices: An RNN-based reinforcement learning model for decision-making task.

Neural Netw. 2021 Feb;134:1-10. doi: 10.1016/j.neunet.2020.11.003. Epub 2020 Nov 18.

Automated Paper Screening for Clinical Reviews Using Large Language Models: Data Analysis Study.

J Med Internet Res. 2024 Jan 12;26:e48996. doi: 10.2196/48996.

Cited by

Active use of latent tree-structured sentence representation in humans and large language models.

Nat Hum Behav. 2025 Sep 10. doi: 10.1038/s41562-025-02297-0.

GPT-4V shows human-like social perceptual capabilities at phenomenological and neural levels.

Imaging Neurosci (Camb). 2025 Sep 2;3. doi: 10.1162/IMAG.a.134. eCollection 2025.

Can Large Language Models Simulate Spoken Human Conversations?

Cogn Sci. 2025 Sep;49(9):e70106. doi: 10.1111/cogs.70106.

The paradox of creativity in generative AI: high performance, human-like bias, and limited differential evaluation.

Front Psychol. 2025 Aug 7;16:1628486. doi: 10.3389/fpsyg.2025.1628486. eCollection 2025.

Capturing Argument in Agent-Based Models.

Topoi (Dordr). 2025;44(3):675-693. doi: 10.1007/s11245-025-10215-2. Epub 2025 Jun 6.

Testing for completions that simulate altruism in early language models.

Nat Hum Behav. 2025 Jul 28. doi: 10.1038/s41562-025-02258-7.

A foundation model to predict and capture human cognition.

Nature. 2025 Jul 2. doi: 10.1038/s41586-025-09215-4.

Cultural tendencies in generative AI.

Nat Hum Behav. 2025 Jun 20. doi: 10.1038/s41562-025-02242-1.

Large language models show amplified cognitive biases in moral decision-making.

Proc Natl Acad Sci U S A. 2025 Jun 24;122(25):e2412015122. doi: 10.1073/pnas.2412015122. Epub 2025 Jun 20.

Examining Chat GPT with nonwords and machine psycholinguistic techniques.

PLoS One. 2025 Jun 6;20(6):e0325612. doi: 10.1371/journal.pone.0325612. eCollection 2025.
