
AI in Science Writing and the Imitation of Understanding

As AI becomes more fluent in science writing, the line between sounding right and being right starts to blur.


Here is a sentence: “Recent advances in artificial intelligence are transforming the landscape of scientific communication, enabling unprecedented efficiency, scalability and knowledge dissemination across disciplines.”


You have probably read something like this before. 


It sounds polished and completely reasonable. 


Now imagine reading ten versions of that in a row.


A recent analysis by Originality.ai found that in November 2024, the number of AI-generated articles published online surpassed the number written by humans. For the first time, more new articles on the internet were being produced by models than by people.


That is not necessarily a disaster, right? AI is getting very good at writing. In science communication, especially, it can summarise dense papers, smooth out awkward phrasing and generate explanations that feel clear and accessible. For researchers or content teams already stretched thin, the appeal is obvious.


A graph by Graphite.io tracking the quantity of AI-generated content vs human-generated content on the internet. 

I saw this firsthand when speaking to a science writing initiative about contributing articles. I was told that most of the pieces on their site were first written by ChatGPT. My role, they said, would be to “humanise” the AI drafts before publication. The website itself was filled with glossy AI-generated illustrations. The whole exchange left me with a slightly uneasy feeling, not just because AI was involved, but also because of how casually the line between writing and editing machine output had been blurred.


And that is where my real concern with trust begins. 


Stress-testing the model 

Science writing is not generic content. It deals with clinical trials, survival statistics, risk factors and public health decisions. When something sounds confident and authoritative, readers assume it has been checked. As AI-generated writing becomes harder to distinguish from human work, small errors or entirely fabricated details become easier to miss.


Out of curiosity, I started testing AI tools on topics I know well. The obvious place to start was my master’s thesis. My project looked at sex-specific differences in inflammatory patterns in the Drosophila midgut and how they might relate to colorectal cancer prognosis in men and women.


If anything was going to trip the model up, I assumed it would be that.


At first, the responses were surprisingly good. The right terminology was used, and the summaries felt coherent. If you skimmed them quickly, you would probably assume the model had a decent grasp of the literature.


The cracks appeared when I started asking about specific statistics.


I asked about colorectal cancer statistics in males and females. The model replied that males have more colorectal cancer diagnoses than females. On the surface, that sounds reasonable. But when I checked the source it had drawn from, the data actually reported incidence rates for 2020, a year in which colorectal cancer incidence was higher in males.

This is a screenshot from ChatGPT reporting on the results.
Here is a screenshot from the actual study.

That might seem like a minor distinction. In epidemiology, it is not. Incidence rates are not the same as total diagnoses, and a single year of data does not automatically translate into a broad rule.
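
To make the distinction concrete, here is a minimal sketch in Python. The figures are invented purely for illustration and do not come from the study or from any real dataset; the group names and the helper function are hypothetical. It computes a crude incidence rate (new cases per 100,000 people at risk) and shows that the group with the higher rate is not necessarily the group with more diagnoses.

```python
def incidence_rate_per_100k(new_cases: int, population_at_risk: int) -> float:
    """Crude incidence rate: new cases per 100,000 people at risk in the period."""
    return new_cases / population_at_risk * 100_000


# Hypothetical figures, invented for illustration only (not from any study).
groups = {
    "group_a": {"new_cases": 4_500, "population_at_risk": 9_000_000},
    "group_b": {"new_cases": 4_800, "population_at_risk": 12_000_000},
}

rates = {name: incidence_rate_per_100k(**g) for name, g in groups.items()}
counts = {name: g["new_cases"] for name, g in groups.items()}

# The two questions can have different answers.
higher_rate = max(rates, key=rates.get)
more_diagnoses = max(counts, key=counts.get)

for name in groups:
    print(f"{name}: {counts[name]} diagnoses, {rates[name]:.1f} per 100,000")
print("higher incidence rate:", higher_rate)
print("more total diagnoses:", more_diagnoses)
```

With these made-up numbers, group_a has the higher rate (50 versus 40 per 100,000) while group_b has more diagnoses in total. Published rates are often also age-standardised, which this sketch leaves out.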


In science, those small differences in wording carry a lot of weight. Shift the phrasing slightly, and the meaning changes with it.


The model did not seem bothered by that distinction. The sentence sounded plausible, so it stayed.


AI completely hallucinating statistics is more common than you think 

The most unsettling example came when I asked the model to draft a short paragraph about CAR-T cell therapy using two specific research papers I provided.


In the paragraph it wrote, relapse rates were said to drop from roughly seventy per cent to around thirty per cent after certain microbiome interventions. The number was precise, and the claim sounded convincing.


The problem was that neither of the papers said that.


When I asked the model where that statistic came from, it admitted that no single study reported exactly what the sentence claimed. The number had been generated to fit the narrative the paragraph was building.





If I had not read those papers closely myself, I might not have questioned it. The sources were real. The topic was real. Even the general direction of the claim felt believable.


That is what makes these hallucinations difficult to spot. They rarely look dramatic or obviously wrong. More often, they are just a little too tidy, a statistic that fits too perfectly into the story being told.


Why do hallucinations happen?

Large language models are not trained on verified databases of facts. They learn patterns in text. Their job is to predict the most probable next word (strictly, the next token) given the text that came before it.


That works surprisingly well for producing fluent explanations. But it also means the model is trying to make a paragraph sound right, not necessarily be right.


If a discussion of immunotherapy often includes a statistic, the model may supply one. If a strong claim is usually followed by a citation, it may generate something that looks like one. The system is not checking a dataset in the background. It is assembling a sentence that fits the pattern it has seen before.
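
A toy version of that pattern-matching can be sketched in a few lines of Python. This is not how a real large language model is built: real models use neural networks over tokens, whereas this sketch only counts which word most often follows another. The corpus, its percentages and the `complete` function are all invented for illustration.

```python
from collections import Counter, defaultdict

# Tiny invented corpus; the sentences and the percentages are made up.
corpus = [
    "relapse rates fell to 30 percent",
    "relapse rates fell to 30 percent",
    "relapse rates fell to 45 percent",
    "response rates rose to 60 percent",
]

# Count which word follows each word across the corpus.
following = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1


def complete(prompt: str, max_words: int = 10) -> str:
    """Extend the prompt by repeatedly appending the most frequent next word."""
    words = prompt.split()
    for _ in range(max_words):
        options = following.get(words[-1])
        if not options:
            break  # no known continuation for this word
        words.append(options.most_common(1)[0][0])
    return " ".join(words)


print(complete("relapse rates fell to"))
print(complete("response rates"))
```

The first call prints "relapse rates fell to 30 percent", simply because 30 is the most common continuation. The second prints "response rates fell to 30 percent", a fluent sentence that appears nowhere in the corpus. Nothing in the code checks whether either figure is true.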


Errors can creep in for many reasons. Training data can contain biases or inaccuracies. Complex models can overfit to familiar phrasing. But the underlying issue is simpler than that. The model has no sense of what the numbers actually represent. A relapse rate, an incidence rate, a survival curve. These are statistical concepts with real clinical implications, but to the model, they are just tokens that often appear near certain words.


A human science writer approaches that information differently. Most of us come from research backgrounds, where questioning data is part of the training. Numbers are not just decoration in a paragraph. They sit within a broader context of study design, sample size, methodology and limitations.


That habit of scepticism is what keeps a statistic from quietly drifting away from what the original research actually showed.


Even the references can unravel

I saw the same pattern when I asked for AMA-style citations. The model produced references with incorrect author lists, missing access dates and broken links. At one point, it confidently contradicted itself about whether citation numbers should appear before or after punctuation.


These are mechanical rules. A human checking the official guidance can confirm them in minutes. The model, however, responded with whatever phrasing best matched the question.


It was performing understanding, not demonstrating it. 


For someone skimming an article online, those small inaccuracies are almost invisible. For someone making decisions based on that information, they matter.


The bigger picture

Science communication exists because most people do not want to read primary research papers. They should not have to. The point is to translate complex work into something accessible, engaging and meaningful.


Anyone can now ask AI to explain a paper. That is not inherently a bad thing. The problem is that the explanation might be subtly wrong. Or missing context. Or slightly overconfident.


Wrong information defeats the whole purpose of making science accessible.


But there is something else that gets lost, and it is harder to measure. Science writing is creative. It is not just about relaying findings. It is about deciding why something matters. It is about asking, “How do I feel about this discovery? Why should you care? What does this change?”


There is a voice in good science communication. A perspective. A sense that someone has wrestled with the material and chosen their words carefully.


Many of us grew up watching documentaries narrated by David Attenborough. What made them powerful was not just the facts about ecosystems or species behaviour. It was the humanness. The quiet awe. The restraint. The understanding of when to be hopeful and when to sound concerned.


AI can imitate cadence. It can approximate a structure. But it does not care whether the statistic it just generated shapes someone’s perception of cancer risk. It does not feel the weight of explaining a new therapy to someone who might one day need it.


Science communication is not just about sounding informed. It is about being responsible. It is about connecting knowledge to people in a way that is careful, thoughtful and human.


And that is something a predictive model, no matter how fluent, still cannot do.
