The interoceptive origin of reinforcement learning.

Authors

Weber Lilian A, Yee Debbie M, Small Dana M, Petzschner Frederike H

Affiliations

Department of Psychiatry, University of Oxford, Oxford, UK; Wellcome Centre for Integrative Neuroimaging (WIN), Department of Experimental Psychology, University of Oxford, Oxford, UK.

Cognitive and Psychological Sciences, Brown University, Providence, RI, USA; Robert J. and Nancy D. Carney Institute for Brain Science, Brown University, Providence, RI, USA.

Publication information

Trends Cogn Sci. 2025 Sep;29(9):840-854. doi: 10.1016/j.tics.2025.05.008. Epub 2025 Jun 10.

DOI:10.1016/j.tics.2025.05.008

PMID:40500611

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12400946/

Abstract

Rewards play a crucial role in sculpting all motivated behavior. Traditionally, research on reinforcement learning has centered on how rewards guide learning and decision-making. Here, we examine the origins of rewards themselves. Specifically, we discuss that the critical signal sustaining reinforcement for food is generated internally and subliminally during the process of digestion. As such, a shift in our understanding of primary rewards as an immediate sensory gratification to a state-dependent evaluation of an action's impact on vital physiological processes is called for. We integrate this perspective into a revised reinforcement learning framework that recognizes the subliminal nature of biological rewards and their dependency on internal states and goals.
