diffuse.one/google_scholar_recall
designation: D2-003
author: andrew white
status: complete
prepared date: September 12, 2025
updated date: September 13, 2025

abstract: Google Scholar has full-text access to over 200M articles. Here I explore using the full-text index to look at the popularity of some phrases. I look at the most popular sample sizes, the popularity of Greek letters in equations, and the change in usage of animals in biology research.

fun with google scholar

Google Scholar has become an indispensable tool because of its ability to precisely find the right research papers from a keyword search. Google Scholar also has a less-used ability: recalling ALL matching papers. You can immediately get a count of all papers that contain a specific phrase, across basically all scientific publications.

Let's see what interesting things we can learn about science using only the count of results.

population sizes

What sample size is most popular in research papers? I collected the result counts for these specific phrases, for N from 1 to 250:

  • "sample size of {N}"
  • "{N} samples"
  • "population size of {N}"
  • "{N} participants"
  • "total of {N} subjects"

I then totaled the results (for all phrases per N) and here is the plot:

Smaller sample sizes are more popular for research papers. Humans tend to pick "round" numbers when designing experiments.

This plot is about what I would expect. There are some interesting effects: 9 and 11 stick out as quite rare, while 12 is quite popular. 100 is popular because it's recommended in clinical and pre-clinical research for statistical power.
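The query set and per-N totals described in this section could be organized along the following lines. This is a minimal sketch, not necessarily how the plot was made: the function names are hypothetical, and `counts` stands in for result counts read off the search page by whatever means. No Google Scholar API is assumed, and the example counts are made up.

```python
from collections import defaultdict

# The five phrase templates from the list above; {N} is replaced by the integer.
TEMPLATES = [
    '"sample size of {N}"',
    '"{N} samples"',
    '"population size of {N}"',
    '"{N} participants"',
    '"total of {N} subjects"',
]


def build_queries(n_min=1, n_max=250):
    """Return {N: [quoted phrase queries]} for every N in the range."""
    return {
        n: [t.format(N=n) for t in TEMPLATES]
        for n in range(n_min, n_max + 1)
    }


def total_per_n(queries, counts):
    """Sum result counts over all phrases for each N.

    `counts` is a hypothetical mapping from query string to the number of
    results; queries missing from it are treated as zero.
    """
    totals = defaultdict(int)
    for n, phrases in queries.items():
        for phrase in phrases:
            totals[n] += counts.get(phrase, 0)
    return dict(totals)


queries = build_queries()
# Made-up counts, only to show the shape of the data.
example_counts = {'"sample size of 12"': 10, '"12 participants"': 5}
totals = total_per_n(queries, example_counts)
print(len(queries), totals[12])
```

Keeping the quotes inside each template matters, since the quoted form is what makes the search an exact-phrase match.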

greek alphabet

I wanted to see which Greek letters are most popular in papers with equations. Here I just search for "{l} OR equation", where "{l}" is one of the letters of the Greek alphabet. A fun fact: the word "alphabet" comes from alpha and beta, the first two Greek letters. So it's basically the Greek way of saying "ABCs".

This result was more surprising, because I would have expected total domination by π. Digging more into the search results, I found that there are just many examples in biology and materials science where things are named via Greek suffixes. Gene names, protein isoforms, dihedral angles in biochemistry, and polymorphs of crystal structures all use Greek letters (starting at α) as an index.

Another interesting element of the plot is the general dislike of Greek letters that look too much like Latin letters, such as ι (iota) and ο (omicron). Excluding those letters, the least popular Greek letter is ζ.
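The per-letter queries from this section can be generated from the Unicode range for lowercase Greek. This is a sketch under assumptions: it is a guess that the symbol itself (rather than the spelled-out name) fills the "{l}" slot, and the helper names are hypothetical.

```python
import unicodedata

# Lowercase Greek letters run from U+03B1 (alpha) to U+03C9 (omega).
# U+03C2 is the word-final form of sigma, so it is skipped, leaving 24 letters.
GREEK = [chr(cp) for cp in range(0x03B1, 0x03CA) if cp != 0x03C2]


def letter_name(ch):
    """Spelled-out name from the Unicode database, e.g. 'alpha'.

    Note that Unicode spells lambda as 'lamda'.
    """
    return unicodedata.name(ch).rsplit(" ", 1)[-1].lower()


def build_query(ch):
    # Assumption: the symbol itself fills the {l} slot of the template.
    return f"{ch} OR equation"


def rank(counts):
    """Sort letters from most to fewest results; `counts` maps letter -> hits."""
    return sorted(counts, key=counts.get, reverse=True)


queries = {letter_name(ch): build_query(ch) for ch in GREEK}
print(len(queries), queries["zeta"])
```

Once result counts are collected by hand or otherwise, `rank` gives the ordering shown in a popularity plot.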

model organisms

Let's explore something with time dependence. Let's examine which kinds of animals are used in biology research as a function of time. Namely, we're showing relative popularity among a fixed set of organisms from 1975 to 2024. I'm considering the following organisms, listed with their common and scientific names:

  • Mouse (Mus musculus)
  • Rat (Rattus norvegicus)
  • Rabbit (Oryctolagus cuniculus)
  • Dog/Beagle (Canis lupus familiaris)
  • Minipig (Sus scrofa domesticus)
  • Rhesus Macaque (Macaca mulatta)
  • Cynomolgus Macaque (Macaca fascicularis)
  • Common Marmoset (Callithrix jacchus)
  • Guinea Pig (Cavia porcellus)
  • Syrian Hamster (Mesocricetus auratus)

Here is the plot, with the search results converted to popularity fraction:

Some interesting trends: (1) the decrease in primate usage, (2) the growing dominance of rats and mice, (3) the emergence of minipigs, and (4) the decline of hamsters. Another interesting thing I learned: the common name for the mouse used in lab research is "house mouse" and the common name for the rat is "lab rat."
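The "popularity fraction" can be read as each organism's share of the total result count for a given year, so the shares within a year sum to 1 across the fixed set. A minimal sketch of that normalization, assuming the raw counts are held in a nested dict (a hypothetical layout) and using made-up numbers:

```python
def popularity_fractions(counts_by_year):
    """Convert raw counts to each organism's share of that year's total.

    `counts_by_year` maps year -> {organism: result count}.
    """
    fractions = {}
    for year, counts in counts_by_year.items():
        total = sum(counts.values())
        if total == 0:
            # No results at all for this year; avoid dividing by zero.
            fractions[year] = {name: 0.0 for name in counts}
        else:
            fractions[year] = {name: c / total for name, c in counts.items()}
    return fractions


# Made-up numbers, only to illustrate the normalization.
example = {
    1975: {"mouse": 30, "rat": 50, "hamster": 20},
    2024: {"mouse": 60, "rat": 35, "hamster": 5},
}
shares = popularity_fractions(example)
print(shares[2024]["mouse"])
```

Normalizing per year removes the overall growth in publication volume, so a rising line means an organism gained ground relative to the others in the set, not that it simply appears in more papers.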

conclusion

This was pretty easy and interesting to do. Just running a few Google Scholar searches can reveal some findings that are nearly impossible to otherwise obtain. The only other way to do this work would be to get a huge dump from all major publishers and set up full-text indexing. That would be an insane amount of effort.