Wedyan Musab, Yeh Yu-Chen, Saeidi-Rizi Fatemeh, Peng Tai-Quan, Chang Chun-Yen
School of Planning, Design and Construction, Michigan State University, East Lansing, Michigan, United States of America.
Department of Horticulture and Landscape Architecture, National Taiwan University, Taipei City, Taiwan.
PLoS One. 2025 Apr 29;20(4):e0322078. doi: 10.1371/journal.pone.0322078. eCollection 2025.
Urban environments significantly shape our well-being, behavior, and overall quality of life. Assessing urban environments, particularly walkability, has traditionally relied on computer vision and machine learning algorithms. However, these approaches often fail to capture the subjective and emotional dimensions of walkability, due to their limited ability to integrate human-centered perceptions and contextual understanding. Recently, large language models (LLMs) have gained traction for their ability to process and analyze unstructured data. With the increasing reliance on LLMs in urban studies, it is essential to critically evaluate their potential to accurately capture human perceptions of walkability and contribute to the design of more pedestrian-friendly environments. Therefore, a critical question arises: can LLMs, such as GPT-4o, accurately reflect human perceptions of urban environments? This study aims to address this question by comparing GPT-4o's evaluations of visual urban scenes with human perceptions, specifically in the context of urban walkability. In this research, human participants and GPT-4o evaluated street-level images along key dimensions of walkability, including overall walkability, feasibility, accessibility, safety, comfort, and liveliness. To analyze the data, text mining techniques were employed, examining keyword frequency, coherence scores, and similarity indices between participant-generated and GPT-4o-generated responses. The findings revealed that GPT-4o and participants aligned in their evaluations of overall walkability, feasibility, accessibility, and safety. In contrast, notable differences emerged in the assessment of comfort and liveliness. Human participants demonstrated broader thematic diversity and addressed a wider range of topics, whereas GPT-4o produced more focused and cohesive responses, particularly in relation to comfort and safety. In addition, similarity scores between GPT-4o's responses and those of the participants indicated a moderate level of alignment between GPT-4o's reasoning and human judgments. The study concludes that human input remains essential for fully capturing human-centered evaluations of walkability. Furthermore, it underscores the importance of refining LLMs to better align with human perceptions in future walkability studies.
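As a minimal illustration of the kind of text-mining comparison the abstract describes (keyword frequency and a similarity index between human and GPT-4o responses), the Python sketch below uses TF-IDF vectors and cosine similarity. The sample responses and the choice of TF-IDF are assumptions for illustration, not the study's published pipeline; the coherence scores mentioned in the abstract (typically derived from topic models) are not reproduced here.

```python
# Illustrative sketch only: the sample responses and the TF-IDF/cosine choice
# are assumptions, not the study's exact method.
from collections import Counter

from sklearn.feature_extraction.text import TfidfVectorizer, ENGLISH_STOP_WORDS
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical free-text evaluations of one street-level image.
human_responses = [
    "The wide sidewalk and street trees make this block pleasant to walk.",
    "Crosswalks are clearly marked, but traffic feels fast and noisy.",
]
gpt4o_responses = [
    "The sidewalk is continuous and shaded, supporting comfortable walking.",
    "Marked crossings improve safety, though vehicle speed reduces comfort.",
]

def keyword_frequency(texts, top_n=10):
    """Count the most frequent non-stopword tokens across a set of responses."""
    words = [w.strip(".,").lower() for t in texts for w in t.split()]
    return Counter(w for w in words if w and w not in ENGLISH_STOP_WORDS).most_common(top_n)

# Keyword frequency per group.
print(keyword_frequency(human_responses))
print(keyword_frequency(gpt4o_responses))

# Similarity index: cosine similarity between TF-IDF vectors of the pooled
# human text and the pooled GPT-4o text, one common way to quantify alignment.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform([" ".join(human_responses), " ".join(gpt4o_responses)])
similarity = cosine_similarity(tfidf[0], tfidf[1])[0, 0]
print(f"Human vs. GPT-4o similarity: {similarity:.2f}")
```

A value near 1 would indicate that the two groups use largely the same vocabulary, while a low value would point to divergent themes, consistent with the moderate alignment the study reports.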