diffuse.one/discovery_cost
designation: D1-003
author: andrew white
status: partially complete
prepared date: October 5, 2024
updated date: October 6, 2024

Abstract: I wanted to know the cost of major scientific discoveries. It seems like the cost is growing; the Higgs boson, the Human Genome Project, and the observation of gravitational waves come to mind. So here I used Wikipedia articles about major discoveries and OpenAI's o1-preview model to estimate the cost of all major scientific discoveries in human history. The average cost of a major discovery is now $100M and the doubling time of the cost of discoveries is about 12 years. All calculations were done in 2024 dollars.

estimating the cost of all discoveries

Here are all major discoveries in the last few hundred years and their estimated cost over time:

Est. cost of discoveries over time

data

I took the list of major discoveries from Wikipedia, which is a nicer list than the Nobel Prizes because it is sparser, doesn't have to constrain itself to living scientists, and doesn't have a lag (Nobel Prizes have a 30-year lag now [1]). It's a hard list to work with for older discoveries, though. For example, it's hard to estimate the cost of inventing parentheses in equations.

The other data used for this project is from Wikipedia pages related to the discoveries. To get these pages, I ran Google searches over English Wikipedia for each discovery and downloaded the top three articles that came up that weren't the timeline itself, file pages, or talk pages.
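The filtering step described above can be sketched as follows. This is a minimal illustration, not the code used for the post: the function name `pick_articles`, the prefix list, and the sample URLs are all assumptions, and the search itself is left out.

```python
from urllib.parse import unquote, urlparse

# Assumed title prefixes for pages to skip (timeline, file, and talk pages).
SKIPPED_PREFIXES = ("Timeline_of", "File:", "Talk:")


def pick_articles(result_urls, limit=3):
    """Return the first `limit` English Wikipedia article URLs that are not skipped pages."""
    picked = []
    for url in result_urls:
        parsed = urlparse(url)
        # Keep only English Wikipedia article paths.
        if parsed.netloc != "en.wikipedia.org" or not parsed.path.startswith("/wiki/"):
            continue
        title = unquote(parsed.path[len("/wiki/"):])
        if title.startswith(SKIPPED_PREFIXES):
            continue
        picked.append(url)
        if len(picked) == limit:
            break
    return picked


# Made-up search results, in ranked order, for illustration only.
results = [
    "https://en.wikipedia.org/wiki/Timeline_of_scientific_discoveries",
    "https://en.wikipedia.org/wiki/Event_Horizon_Telescope",
    "https://en.wikipedia.org/wiki/Talk:Black_hole",
    "https://en.wikipedia.org/wiki/Black_hole",
    "https://en.wikipedia.org/wiki/File:Example.jpg",
    "https://en.wikipedia.org/wiki/Messier_87",
    "https://en.wikipedia.org/wiki/Supermassive_black_hole",
]
print(pick_articles(results))
```

Matching on title prefixes keeps the rule easy to audit, at the price of missing skipped pages that are named differently.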

method

I'm relying on OpenAI's o1-preview and its ability to estimate things [2]. It's very good at order-of-magnitude calculations in my experience. Obviously this is unjustified, but we can see some examples in a moment.

To use the information from the relevant Wikipedia articles, I use paper-qa because it provides an easy way to mix the documents and model prompts together. The main algorithm is that paper-qa will surface relevant passages from the three Wikipedia articles, distill them for the question being asked, and then let o1-preview answer with all the details [3]. Here is the prompt I used for asking o1-preview:

Prompt:

"""
Complete the following task given the context:

Context:

{context}

----

Provide estimates for the values of the items for a scientific discovery below. 
Use the context and estimation to arrive at your response.

hours: number of man hours
capital: capital expenditures cost, accounting for things like major equipment, 
    laboratory space, and major supplies. Consider the rarity of equipment in the time period 
    (e.g., a microscope is probably more expensive in 1800 even after accounting for inflation).
cost: total cost of discovery in inflation adjusted 2024 dollars
field: best match field from chemistry, physics, biology, medicine, mathematics

Discovery: {discovery}

Respond with reasoning first and then json with keys hours, capital, cost, field.
For each sentence in your reasoning, indicate if context helped arrive at your conclusion by 
using a parenthetical citation {example_citation}. Assume an hourly wage of $60 in 2024 for scientists.
"""

And here's an example of the output for one question:

Question: (2019 , $1 in 2019 = $1.23 in 2024) The first image of a black hole is captured using 
eight different telescopes.

The EHT project involved over 300 members and 60 institutions across 20 countries (Event chunk 
1). Data was collected in April 2017 and processed over two years, leading to the release of 
the first black hole image in 2019 (Black chunk 15). Assuming that approximately 150 scientists 
worked full-time over two years, the total man-hours would be 150 people * 2000 hours/year * 2 
years = 600,000 man-hours (Estimation based on context).

Scientists working on such projects likely earn around $100,000 per year in 2019 dollars, which 
adjusts to $123,000 in 2024 dollars with inflation (Discovery). This corresponds to an hourly 
wage of $123,000 / 2000 hours = $61.5 per hour (Estimation).

Capital expenditures include upgrades and operational costs for eight telescopes, data storage 
equipment, and computational resources like a grid computer with about 800 CPUs and a 40 Gbit/s 
network (Event chunk 2). Estimating these capital costs to be around $50 million (Estimation 
based on context).

Total cost is the sum of labor and capital costs: $36 million (labor) + $50 million (capital) = 
$86 million (Calculation).

The field best matching this discovery is physics.

{
  "hours": 600000,
  "wage": 61.5,
  "capital": 50000000,
  "cost": 86000000,
  "field": "physics"
}

References

1. (Event chunk 2): Event_Horizon_Telescope Wikipedia article

2. (Black chunk 15): Black_hole Wikipedia article

3. (Event chunk 1): Event_Horizon_Telescope Wikipedia article
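The JSON at the end of a response like the one above can be pulled out and sanity-checked against its own numbers. The sketch below is an assumption about how one might do that, not the parsing used for this post; `extract_estimate` and `recomputed_cost` are hypothetical helpers, and the extraction assumes a flat JSON object with no braces after it.

```python
import json


def extract_estimate(response_text):
    """Parse the last flat {...} block in a response as JSON."""
    start = response_text.rfind("{")
    end = response_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in response")
    return json.loads(response_text[start:end + 1])


def recomputed_cost(estimate, default_wage=60.0):
    """Labor (hours * wage) plus capital; falls back to the prompt's $60/hour wage."""
    wage = estimate.get("wage", default_wage)
    return estimate["hours"] * wage + estimate["capital"]


# Tail of the black hole image response shown above.
response = """
The field best matching this discovery is physics.

{
  "hours": 600000,
  "wage": 61.5,
  "capital": 50000000,
  "cost": 86000000,
  "field": "physics"
}
"""

estimate = extract_estimate(response)
total = recomputed_cost(estimate)
print(f"reported: {estimate['cost']:,}  recomputed: {total:,.0f}")
```

For this example the recomputed total is $86.9M against the reported $86M, because the model rounded the labor cost down to $36 million. That is well within the order-of-magnitude accuracy the method is aiming for.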

checking specific points

We can double-check known examples where the cost has been calculated. For example, the Human Genome Project was reported to have cost $2.7 billion when complete in 2003 ($4.5 billion in 2024 dollars). This is indeed the value returned from this method. The method assessed the cost of Hubble's formulation of an expanding universe to be about $20M. The cost of the telescope Hubble used for the observations (the Hooker Telescope) was estimated at $600,000 in 1917, which is close to the $500,000 reported on the observatory's about page. That accounts for $12M in 2024 dollars, and o1-preview then gave margin for things like the land and upkeep during that period. Maybe a bit high, but again very plausible.
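The inflation multipliers implied by these two checks can be written in the same "$1 then = $X in 2024" form used in the question prefix. This sketch uses only the dollar figures quoted in the paragraph above and does not look up any price index; the dictionary labels are just for illustration.

```python
# Figures quoted in the text: (original dollars, 2024 dollars)
quoted = {
    "Human Genome Project (2003)": (2.7e9, 4.5e9),
    "Hooker Telescope (1917)": (600_000, 12e6),
}

# Implied multiplier = adjusted cost / original cost
implied = {name: adjusted / original for name, (original, adjusted) in quoted.items()}

for name, multiplier in implied.items():
    print(f"{name}: $1 then = ${multiplier:.2f} in 2024")
```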

Here's another example for Kepler's planetary motion law:

Johannes Kepler formulates the third law of planetary motion.

To estimate the number of man hours Kepler spent on formulating the third law, we note that he 
published 'Harmonice Mundi' in 1619, where he presented this law (Johannes chunk 11). 
Considering his extensive calculations without assistants for his earlier laws (Johannes chunk 
9), it's reasonable to assume he spent at least 2 years focused on this work. Assuming he 
worked approximately 60 hours per week, which was common for scholars of his time, over 50 
weeks per year, we calculate:

Hours = 60 hours/week * 50 weeks/year * 2 years = 6,000 hours.

The hourly wage is given as $60 in 2024 dollars.

For capital expenditures, Kepler primarily used Tycho Brahe's observational data (Johannes 
chunk 11). Since he didn't require significant equipment or laboratory space, capital costs 
were minimal. However, access to Tycho's data, books, and writing materials had some cost. We 
can estimate the capital expenditures to be around $50,000 in today's dollars.

This is a plausible description. A lot of pre-1800 discoveries are like this: the estimate goes from intellectual labor to hours, applies the wage to get a labor cost, and throws in some small capital cost.

cost

This whole calculation cost me around $75 in API fees, with 99% of that going to o1-preview. I probably could have used o1-mini for everything pre-1950, because those estimates were necessarily crude. I did see a real difference on the more complex projects, though. For the Event Horizon Telescope project, which used multiple telescopes, o1-mini gave $100M for the capital cost with no justification or analysis.

regression

I fit a regression starting in 1950, which is about the start of modern funded science. The regression is on log cost: the fitted line has a doubling time of 12 years and predicts a cost of $100M per discovery in 2024. The line shown in the chart was instead regressed over the whole range, so it is flatter because of the weight on older discoveries.
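Here is a minimal sketch of how a doubling time falls out of a regression on log cost. It is an ordinary least-squares fit of log2(cost) against year, written in plain Python. The data points are synthetic, constructed to double every 12 years and reach $100M in 2024; they are not the estimates from this post, and `fit_doubling` is a hypothetical helper name.

```python
import math


def fit_doubling(years, costs, ref_year=2024):
    """Least-squares fit of log2(cost) vs. (year - ref_year).

    Returns (doubling time in years, predicted cost at ref_year).
    """
    xs = [year - ref_year for year in years]
    ys = [math.log2(cost) for cost in costs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # doublings per year
    intercept = mean_y - slope * mean_x  # log2(cost) at ref_year
    return 1.0 / slope, 2.0 ** intercept


# Synthetic data for illustration only.
years = list(range(1950, 2025, 2))
costs = [100e6 * 2.0 ** ((year - 2024) / 12.0) for year in years]

doubling_years, cost_2024 = fit_doubling(years, costs)
print(f"doubling time: {doubling_years:.1f} years, 2024 cost: ${cost_2024:,.0f}")
```

Measuring years relative to 2024 makes the intercept directly readable as the predicted log2 cost in 2024, and the slope in doublings per year inverts to the doubling time.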

going back further

It is much more speculative, but we can just keep going backwards. I feel sorry for o1-preview, but I can force it to speculate about the cost of inventing the numeral zero or the square root. Using that, here are all discoveries back to the beginning of civilization and their estimated cost:

Est. cost of discoveries over time

conclusions

This was a nice test of scaling up intelligence. I can define the problem clearly, get all the background material ready, and then have a capable LLM do the intellectual work. It's pretty cool. I think there are many new kinds of analyses that could be done with this approach.

Raw data

You can see all the estimates and explanations here

Footnotes

  1. https://www.nature.com/articles/d41586-023-03086-3

  2. https://openai.com/index/openai-o1-system-card/

  3. https://arxiv.org/abs/2409.13740