Max Planck Research Group (MPRG) Computational Principles of Intelligence, Max Planck Institute for Biological Cybernetics, Tübingen 72076, Germany.
Proc Natl Acad Sci U S A. 2023 Feb 7;120(6):e2218523120. doi: 10.1073/pnas.2218523120. Epub 2023 Feb 2.
We study GPT-3, a recent large language model, using tools from cognitive psychology. More specifically, we assess GPT-3's decision-making, information search, deliberation, and causal reasoning abilities on a battery of canonical experiments from the literature. We find that much of GPT-3's behavior is impressive: It solves vignette-based tasks similarly to or better than human subjects, is able to make decent decisions from descriptions, outperforms humans in a multiarmed bandit task, and shows signatures of model-based reinforcement learning. Yet, we also find that small perturbations to vignette-based tasks can lead GPT-3 vastly astray, that it shows no signatures of directed exploration, and that it fails miserably in a causal reasoning task. Taken together, these results enrich our understanding of current large language models and pave the way for future investigations using tools from cognitive psychology to study increasingly capable and opaque artificial agents.