How well does NamSor perform in predicting the country of origin and ethnicity of individuals based on their first and last names?

Affiliation

University Institute for Primary Care (IuMFE), University of Geneva, Geneva, Switzerland.

Publication information

PLoS One. 2023 Nov 16;18(11):e0294562. doi: 10.1371/journal.pone.0294562. eCollection 2023.

Abstract

BACKGROUND

We aimed to evaluate NamSor's performance in predicting the country of origin and ethnicity of individuals based on their first/last names.

METHODS

We retrieved the name and country of affiliation of all authors of PubMed publications in 2021, affiliated with universities in the twenty-two countries whose researchers authored ≥1,000 medical publications and whose percentage of migrants was <2.5% (N = 88,699). We estimated with NamSor their most likely "continent of origin" (Asia/Africa/Europe), "country of origin" and "ethnicity". We also examined two other variables that we created: "continent#2" ("Europe" replaced by "Europe/America/Oceania") and "country#2" ("Spain" replaced by "Spain/Hispanic American country" and "Portugal" replaced by "Portugal/Brazil"). Using "country of affiliation" as a proxy for "country of origin", we calculated for these five variables the proportion of misclassifications (= errorCodedWithoutNA) and the proportion of non-classifications (= naCoded). We repeated the analyses with a subsample consisting of all results with inference accuracy ≥50%.
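The two metrics can be illustrated with a minimal sketch. It assumes one plausible reading that the abstract does not spell out: errorCodedWithoutNA is the share of wrong predictions among names that received a classification, and naCoded is the share of names left unclassified. The function names, the merge table, and the toy data are hypothetical, and nothing here calls the NamSor API.

```python
from typing import List, Optional, Sequence

# Hypothetical merge table mimicking the "country#2" variable.
# Only a few example countries are listed, purely for illustration.
COUNTRY2 = {
    "Spain": "Spain/Hispanic American country",
    "Mexico": "Spain/Hispanic American country",
    "Portugal": "Portugal/Brazil",
    "Brazil": "Portugal/Brazil",
}


def merge_labels(labels: Sequence[Optional[str]]) -> List[Optional[str]]:
    """Map each label to its merged category; None (unclassified) stays None."""
    return [None if lab is None else COUNTRY2.get(lab, lab) for lab in labels]


def apply_threshold(predicted: Sequence[Optional[str]],
                    probabilities: Sequence[float],
                    threshold: float = 0.5) -> List[Optional[str]]:
    """Treat predictions below the accuracy threshold as unclassified (None)."""
    return [p if prob >= threshold else None
            for p, prob in zip(predicted, probabilities)]


def error_coded_without_na(predicted: Sequence[Optional[str]],
                           actual: Sequence[str]) -> float:
    """Assumed definition: misclassified / classified (unclassified excluded)."""
    classified = [(p, a) for p, a in zip(predicted, actual) if p is not None]
    if not classified:
        return 0.0
    wrong = sum(1 for p, a in classified if p != a)
    return wrong / len(classified)


def na_coded(predicted: Sequence[Optional[str]]) -> float:
    """Assumed definition: unclassified / all names."""
    if not predicted:
        return 0.0
    return sum(1 for p in predicted if p is None) / len(predicted)


# Toy data: country of affiliation serves as the reference label.
actual = ["Japan", "Japan", "Spain", "Spain", "Portugal"]
predicted = ["Japan", "China", "Spain", "Mexico", "Brazil"]
probabilities = [0.9, 0.4, 0.8, 0.6, 0.3]

# Full sample, original "country" variable.
print(error_coded_without_na(predicted, actual), na_coded(predicted))

# Full sample, merged "country#2" variable.
print(error_coded_without_na(merge_labels(predicted), merge_labels(actual)))

# Subsample: keep only predictions with accuracy >= 50%.
subsample = apply_threshold(predicted, probabilities, 0.5)
print(error_coded_without_na(subsample, actual), na_coded(subsample))
```

In this toy data, merging categories and applying the threshold both lower the error rate, while the threshold raises the share of unclassified names. That is the same trade-off the study measures.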

RESULTS

For the full sample and the subsample, errorCodedWithoutNA was 16.0% and 12.6% for "continent", 6.3% and 3.3% for "continent#2", 27.3% and 19.5% for "country", 19.7% and 11.4% for "country#2", and 20.2% and 14.8% for "ethnicity"; naCoded was zero and 18.0% for all variables, except for "ethnicity" (zero and 10.7%).

CONCLUSION

NamSor is accurate in determining the continent of origin, especially when using the modified variable (continent#2) and/or restricting the analysis to names with accuracy ≥50%. The risk of misclassification is higher with country of origin or ethnicity, but decreases, as with continent of origin, when using the modified variable (country#2) and/or the subsample.
