The Impact of Large Language Models on Scientific Writing

Large language models have changed how people draft and edit text. However, detecting whether a given piece of writing was created with the help of a large language model has proven to be a challenge, even for AI companies. Researchers from Germany’s University of Tübingen and Northwestern University have now developed a method to estimate how widely large language models are used in scientific writing.

The researchers analyzed 14 million paper abstracts published on PubMed between 2010 and 2024 to track changes in vocabulary usage. By comparing the expected frequency of words based on pre-2023 trends to the actual frequency in abstracts from 2023 and 2024, they were able to identify a significant increase in the usage of certain style words after the widespread adoption of large language models.
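The comparison between expected and actual word frequency can be illustrated with a small sketch. It assumes a simple straight-line extrapolation of each word's yearly frequency from the pre-2023 years; the article does not specify the researchers' exact procedure, so the fitting method, the function names `linear_fit` and `excess_usage`, and the sample numbers are all hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def excess_usage(freq_by_year, target_year, cutoff_year=2023):
    """Compare a word's observed frequency with a trend-based expectation.

    freq_by_year maps year -> share of abstracts containing the word.
    Only years before cutoff_year are used to fit the trend (an assumption).
    Returns (expected, gap, ratio) for target_year.
    """
    baseline = sorted((y, f) for y, f in freq_by_year.items() if y < cutoff_year)
    years = [y for y, _ in baseline]
    freqs = [f for _, f in baseline]
    slope, intercept = linear_fit(years, freqs)
    # A frequency cannot be negative, so clamp the extrapolation at zero.
    expected = max(slope * target_year + intercept, 0.0)
    observed = freq_by_year[target_year]
    gap = observed - expected
    ratio = observed / expected if expected > 0 else float("inf")
    return expected, gap, ratio


if __name__ == "__main__":
    # Made-up illustrative frequencies for a single word, not data from the study.
    sample = {2019: 0.001, 2020: 0.002, 2021: 0.003, 2022: 0.004, 2024: 0.010}
    expected, gap, ratio = excess_usage(sample, 2024)
    print(f"expected={expected:.4f} gap={gap:.4f} ratio={ratio:.2f}")
```

A word whose observed frequency sits far above its extrapolated trend, by either the gap or the ratio, would be a candidate for excess usage under this sketch.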

The study revealed that certain words, such as “delves,” “showcasing,” and “underscores,” experienced a surge in popularity in scientific abstracts after the introduction of large language models. Additionally, words like “potential,” “findings,” and “crucial” also saw notable increases in usage. These vocabulary changes were unprecedented in both quality and quantity, signaling a shift in scientific writing style.

While language naturally evolves over time, the researchers noted that the sudden and significant increases in word usage observed after the introduction of large language models were distinct from earlier shifts tied to major world events. In those earlier cases, the words that spiked during specific time periods were topical ones related to global health crises, such as “ebola,” “zika,” and “coronavirus.” By contrast, the period after large language models became available saw pronounced increases across a much broader spectrum of words.

Using abstracts from the pre-2023 era as a control group, the researchers identified hundreds of “marker words” that became more common in scientific writing after the widespread use of large language models. These marker words, predominantly verbs, adjectives, and adverbs, serve as indicators of large language model assistance in writing. Based on their statistical analysis, the researchers estimated that at least 10 percent of post-2022 papers in the PubMed corpus were written with the help of large language models.
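One way to see why such an estimate is phrased as "at least" is the following minimal sketch. It assumes the estimate compares the share of abstracts containing any marker word with the share expected from the pre-2023 baseline; this is an illustration only, not the researchers' actual procedure, and the function names, the simple whitespace tokenizer, and the sample abstracts are hypothetical.

```python
def share_with_marker(abstracts, marker_words):
    """Fraction of abstracts that contain at least one marker word."""
    markers = {w.lower() for w in marker_words}
    hits = 0
    for text in abstracts:
        # Naive tokenizer (an assumption): split on whitespace, trim punctuation.
        tokens = {tok.strip(".,;:()\"'").lower() for tok in text.split()}
        if tokens & markers:
            hits += 1
    return hits / len(abstracts) if abstracts else 0.0


def lower_bound_assisted_share(observed_share, expected_share):
    """Excess share of abstracts with a marker word, floored at zero.

    Assisted abstracts that use none of the marker words are not counted,
    which is why the result can only understate the true share.
    """
    return max(observed_share - expected_share, 0.0)


if __name__ == "__main__":
    # Made-up abstracts and baseline share, for illustration only.
    recent = [
        "This study delves into protein folding.",
        "We measured blood pressure in 40 patients.",
        "The analysis underscores the role of diet.",
        "Samples were stored at low temperature.",
    ]
    observed = share_with_marker(recent, ["delves", "underscores"])
    print(lower_bound_assisted_share(observed, expected_share=0.1))
```

Because only the excess over the baseline is counted, the figure from a calculation like this is a floor rather than a point estimate.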

The study highlights the significant impact that large language models have had on scientific writing, most visibly through shifts in vocabulary and a rise in certain style words. The identification of marker words provides insight into how prevalent large language model assistance has become in modern scientific literature. As the use of large language models becomes more widespread, it is essential for researchers and writers to be aware of these changes and their implications for the future of academic communication.
