diffuse.one/2026_trends
designation: W1-002
author: andrew white
status: complete
prepared date: January 20, 2026
updated date: February 8, 2026

abstract: So much is happening in AI that I have a hard time reasoning about the future. I'm just going to write out a snapshot of some thoughts going through my head right now (as of the end of January). Topics are AI progress, bottlenecks, healthcare, drug discovery, anti-AI movements, open source, the dead internet, and education.

2026 trends

AI progress

The pace of progress in 2025 was insane. The world now recognizes gemini-3-preview, gpt-5.2, and opus-4.5 as genuine capability jumps, with quite a bit of hype lately around opus-4.5 and opus-4.6 in Claude Code. I don't think people have caught on to how incredibly good GPT-5.2-pro actually is. A big reason is how painful and expensive it is to benchmark. In internal spot-checks, we increasingly find that evaluation ambiguity and human error dominate apparent LLM incorrect answers. In other words, the majority of mistakes made during our internal benchmark creation now come from the PhD humans writing and evaluating the questions. We may be in the twilight of the human-expert era, which was surprisingly brief: it was only about a year ago (for us) that the key data labelers went from college-educated humans to STEM PhD experts. Now I think models are shooting past them, and benchmarks need a new approach.

However, the rapid progress in 2025 was mostly in a completely new dimension of reasoning models and driven more by post-training algorithms. The rumor I've heard is that only the last release of models - like GPT-5.2 - had significant gains from pretraining. Part of the reason is that it has simply taken time to get software and hardware ready for training runs on large Blackwell clusters. Google is in a different category with TPUs. But we're in an upswing again. One nice piece of external evidence for this is the dramatic rise we're seeing in METR evals of the last batch of models.[1] We've recently departed from the 7-month doubling time to something faster.
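
To make the doubling-time framing concrete, here is a minimal sketch (in Python) of how such a number can be estimated: fit a line to the log of the task-length horizon against release date and invert the slope. The data points below are hypothetical placeholders, not METR's published measurements.

    # Sketch: estimate a capability doubling time from (date, task horizon) points.
    # The data are hypothetical placeholders, not METR's numbers.
    import numpy as np

    # (years since 2023.0, task horizon in minutes at 50% success) - made up
    data = [(0.0, 8), (0.5, 15), (1.0, 30), (1.5, 65), (2.0, 140)]
    t = np.array([d[0] for d in data])
    horizon = np.array([d[1] for d in data])

    # linear fit in log2 space: log2(horizon) ~= slope * t + intercept
    slope, intercept = np.polyfit(t, np.log2(horizon), 1)
    print(f"doubling time ~= {12.0 / slope:.1f} months")
    # a steeper slope for the newest points is what "departing from the
    # 7-month doubling time" would look like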

We're also seeing comparisons get messy. Most head-to-heads would probably show 5.3-codex beating opus-4.6, but opus-4.6 in Claude Code is such a compelling experience that I think it will have higher usage. Part of the reason is that I have to interact with a bunch of shit when coding - my cloud provider, the GitHub API, fetching docs, debugging webhooks - and Claude Code just does a better job of mimicking me because it runs with my CLI and my auth. I actually love the product design of Codex (and Jules!) because they build around async agents, but the reality is that Codex running on OpenAI's servers cannot pull down the GitHub CI log or ask me for an API key. Anyway, the recent product focus of Anthropic is surprising to me (I had associated products more with OpenAI), but it is paying off.

Looking back, reasoning models, first introduced with o1 from OpenAI, were a relative surprise. Superficially, the improvement in LLM capabilities has felt like steady progress, but the causes have been uneven. Reasoning models and test-time compute gave a boost to benchmarks, whereas pretraining has had to catch up. One of the big landscape questions is whether there are more of these "surprise" gains to be made in model training and inference. I suspect that's a big part of the recent fundraising excitement around neolabs.

Thus I expect a one-time, discontinuous gain in model capability in 2026 - maybe even as early as Q1-Q2.

Blackwell was built for 1T-parameter models, but it required designing around 72 interconnected GPUs and quite a bit of systems-level design. Vera Rubin is targeting 10T-parameter models, and we should see those training in around 2026 and becoming available for inference maybe in 2027.

Around the same time as these chips are scaling up, the amount of capital and the build-out of new datacenters are also scaling up. You can learn much more about all this from SemiAnalysis or other blogs. The number I heard quoted this week at the World Economic Forum is $650B of spend on data center capex for 2026 (now much more widely reported). If this amount of spend truly continues, we should also see gains in late 2026/2027 from the massive data center build-out. To put these numbers into perspective, here are other projects with similar capex (all in 2026 dollars; a rough sketch of the conversion follows the list):

  1. $426B - Capex spend during the dotcom boom by telecoms in 2000 and 2001 [2]
  2. $271B - US Federal Highway System [3]
  3. $103B - International Space Station [4]
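
As a quick aside on the "2026 dollars" framing, here is a minimal sketch of the adjustment: scale the nominal spend by the ratio of price levels. The numbers below are illustrative placeholders, not the figures behind the list above.

    # Sketch: express a historical nominal spend in 2026 dollars.
    # All values are illustrative placeholders, not sourced figures.
    def to_2026_dollars(nominal_billions: float,
                        price_index_then: float,
                        price_index_2026: float) -> float:
        """Scale a nominal amount by the ratio of price levels (e.g., CPI)."""
        return nominal_billions * (price_index_2026 / price_index_then)

    # e.g., a hypothetical $100B spent when the price index was 170,
    # re-expressed at an assumed 2026 index of 320
    print(f"~${to_2026_dollars(100, 170, 320):.0f}B in 2026 dollars")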

And we won't see the impact of this capex spend for 1-2 years. More evidence that the pace of progress will pick up!

AI bottlenecks

The GPU shortage, and now the DRAM shortage, have made people start to pay closer attention to what the future bottlenecks to large AI deployments will be. Some trends that I find interesting:

  1. The obvious long-term ones: capital (interest rates are still pretty high) and power are going to hit. All signs point to GPU lifetimes remaining long, which amortizes the capital cost to 1/5 to 1/7 per year. I've heard V100s are still not retired at big clouds (AWS, GCP) - even though they're almost 10 years old now. Efficiency gains should slow down too, as we cannot keep shrinking past FP4. So power begins to dominate the cost rather than the purchase price. For example, Elon Musk is now insisting that, long-term, power will matter so much that data centers in space will be needed. I'm skeptical, because most of the reasons cited for slow growth in power generation are regulatory, which can be solved more easily than doing hourly launches of SpaceX rockets.[5] Side note - it's insane that space datacenters are now in public discourse.

I do not know enough about finance to predict the capital problems, but I guess that if interest rates fall, investments in data centers become more attractive. If OpenAI or Anthropic IPO and their stock tanks, it could rapidly pop the AI bubble. For now, though, it seems like we have no shortage of capital. Apple, for example, has committed almost nothing to AI and is still sitting on huge amounts of cash.

  2. Input/output - not on the model training side, but when using LLMs. Much of the recent gain in model capabilities, like deep research tools and CLI agents, has come from providing the model with the right context and giving it tools. Much of the work I personally have done is on finding the right research paper or giving an agent the correct database for testing a hypothesis. Concurrently, effort is growing on making the models better at solving multistep problems with lots of context, rather than relying on memorization. In fact, most problems I see with models come not from a lack of intelligence, but from input/output problems.

Healthcare for patients

Putting aside AI, I want to talk about how healthcare is changing.

Retatrutide just passed phase 3 and isn't FDA approved yet, but it is available from many websites, advertised on Facebook/Instagram, and taken by some in my social network. Even after it is FDA approved, it will likely be heavily purchased outside of prescriptions since it is the most effective weight-loss drug. Ozempic/Wegovy and Zepbound are also already widely purchased through channels other than healthcare plans. This has existed before for things like steroids, but weight-loss drugs with few side effects are going to be way higher impact. Even ignoring compounding pharmacies and illicit drugs, manufacturers like Lilly and Novo are also offering the medications direct to patients outside of traditional insurance.

An optimistic framing of this is that we're seeing more "patient agency." Patients are doing more to cut out the traditional healthcare machinery. Another example is concierge doctors. Both doctors and patients are turning away from insurance for routine healthcare: doctors get less paperwork and avoid the burden of working with insurance, and patients have more control over when and how they see their doctor.

Another example: the FDA has recently changed course on wearables, and now wearables will be able to provide much more information to consumers without FDA approval.[6] It used to be that software updates to wearables required clinical trials when they provided health-related advice. For example, some wearables with oxygen saturation sensors could detect COVID-19 during the pandemic, but weren't allowed to ship software notifying customers without going through a clinical trial and filing regulatory paperwork. The recently updated guidance should make this less burdensome and enable wearables to provide much more actionable insight.

Of course, the biggest change in patient agency is that people can use ChatGPT or other GenAI tools as an at-home doctor. Both ChatGPT and Claude are focused on direct-to-consumer use with custom data integration, like Medicare codes and pre-authorization criteria, so that patients can understand and even dispute medical bills.

As discussed in this nice article, the cost to develop your own drug is dropping rapidly, and we may see a new category of hyper-personalized medicine arise. For example, cell therapy is a third-line defense in cancer treatment, but it is more easily tolerated and maybe more effective than chemotherapy.

Drug discovery

Chinese biotech has provided an amazing "existence proof" of what biotech efficiency could look like. Chinese biotech executes faster than American biotech in every part of the pipeline - from optimizing a lead compound, to pre-clinical animal work, to enrolling patients. The result is that China is now competitive with, or exceeding, traditional biotech in its share of drug discovery deals (depending on your measure of choice).[7] Some of these attributes, like enrolling patients faster, should translate out of China to countries like India with improvements to infrastructure. But others, like fast cycle times for medicinal chemistry, come from long-term investments in supply chains and training that seem unique to China right now.

Another trend is the friction between AI progress and drug discovery. Outside of pharma, I hear CEOs saying that no departing employees will be replaced without compelling evidence that AI cannot do the job. I've spoken to start-ups that are already doing factory automation in multiple F100 companies. The level of AI adoption we're seeing in many industries is striking. You can also see some of the economic consequences in the long list of companies doing layoffs in 2026.

Some of the major holdouts on AI are those concerned about AI touching intellectual property - like pharmaceutical companies and publishers. I know that AI agents and AI for drug discovery were the major theme of the recent 2026 JP Morgan Healthcare conference, but it sounds like the day-to-day at pharma companies is much the same as before. There are few big AI-related changes like those we see in software engineering. Some external evidence can be seen in the latest public deals in the space. They involve deep learning models that accelerate one piece of drug discovery, like Noetic AI finding targets faster or Chai improving iteration speed for co-folding predictions. These are beneficial, but they are not LLMs and are unrelated to the steady progress we see in AI. The only major announcement has been Lilly's highly visible work with NVIDIA, but little has been revealed about it.

On the other hand, biotech is all about agility, and I see a growing differentiation between pharma companies and biotechs in the adoption of AI. For example, a biotech can now have AI tools do its data analysis or save a few hours of lawyer time on quick questions. It can prototype a dashboard with Claude Code. It can take PDFs from CROs and convert them to spreadsheets with agents. It can run a leaner engineering team and research targets more quickly.

It has always been the case that biotech has more agility than big pharma, but AI may be the first time this difference can rapidly compound across all aspects of drug discovery.

Society and AI

One of the biggest surprises to me is how distinctive AI-generated content has become. It is not obvious why this is. AI-generated text is just so obvious that it's like reading a typo-riddled document: you judge it negatively independent of its content. If this sounds mysterious to you, read this guide and you'll be able to easily spot AI-generated text. This may be a temporary phenomenon, but both ChatGPT-style images and AI-generated text have made AI content immediately obvious. (Or maybe it's only low-effort AI-generated content that is now obvious.)

Thus, AI-generated content is distinguishable, and many people have a strong negative reaction to it. I live in a bubble in San Francisco, and sometimes I get a harsh reminder. I was reading news about Brandon Sanderson, an interesting fantasy author who is attempting to write a large interconnected ~50-book series called the Cosmere. I went to the subreddit that follows his books and saw a moderator announcement that all AI content was being banned: https://old.reddit.com/r/brandonsanderson/comments/1pofmcb/policy_update_lets_talk_about_ai_and_how_we/

Some specific language was surprising to me:

The community voted overwhelmingly to ban all AI-generated content. Accordingly, we will be implementing a ban on AI-generated content....

We aren't surprised at the outcome of the vote, based on the sentiment we've seen in response to the minimal content we have allowed under the prior (now changing) rules

This strong anti-AI sentiment has now led to "witch hunts" where people submit artwork and it is accused of being AI-generated. Now people have to post pictures of themselves actually creating the artwork. Ironically, many of the comments replying to content on social media are themselves AI-generated. In fact, I wonder if anti-AI comments are purposefully generated by AI because they attract engagement.

Then there are other oddities, like the number-one song in Sweden being AI-generated for a while.[8] This has led to confusion about whether it should "chart" on top-song lists.

There are strong anti-AI forces, but people are still adopting AI heavily at work and enjoying AI-generated content. I think the days when people can distinguish AI content are numbered. Will that make anti-AI sentiment grow with more urgency, or will it make this whole phenomenon disappear?

Open source faltering

Open source has been one of the greatest and most impactful ideas in technology. Yet, I think we are witnessing a big shift in open source software.

I've made a lot of open source software (http://github.com/whitead). Most of my projects are small, but some are used by >100 other repos and have more than 5k stars.

I see fewer and fewer quality PRs from external contributors on my repos. There are many weird and confused PRs generated by AI. I see this on other repos I follow too, and it creates work for maintainers. Simultaneously, I can make much faster progress on my own repos with AI, because I can plan and guide AI agents better than an external contributor can. This changes the calculus of open sourcing: external help is more of a burden, and my internal team can be more effective.

There are other headwinds against open source. GitHub is slowly morphing into a developer platform rather than a passive site for version-controlled software. Software is becoming easier to write, so the trade-off of building your own library versus using an open source library is trending toward building your own. Especially as contributing external PRs gains friction, it becomes more tempting to keep your own code.

There has also been an ossification: LLMs probably write 50% of committed code on GitHub now (either directly from agents or indirectly via humans using LLMs), and they reach for libraries that were popular in 2023-2024, further cementing those libraries' popularity. For example, I fight with LLMs all the time to get them to use uv instead of pip (and now I just alias it).

Open source models

Open-source/open-weights models have had less impact in 2025 than I expected back in January 2025 after r1's release. Llama 4 was a big miss, and deepseek r1 was amazing to study but didn't really stay at the frontier for long (if at all). Kimi's models have been a pleasant surprise for writing.

Things are changing, though. Frontier models are saturating many tasks (like text summarization, basic programming, etc.), and I expect open source usage to start growing as a result - most use cases no longer require the best frontier models. Anecdotally, many people use GLM's models for coding agents to create a "free tier," because they are good enough for basic usage now.

There are also cool new technologies, like the Tinker API, which makes it easier to train open source models. Fast and cheap training of Kimi 2.5 is just getting started, but I have high hopes for it.

There are also growing niches of specialized models: document parsing models (nemotron parse), layout models (detectron), protein structure models with huge gains (Protenix), etc. I'm very excited for these models!

I go through cycles of pessimism on open source models. However, seeing the rapidly increasing prices of frontier LLMs, I have high hopes for open source domain-specific models, because it's now cost-prohibitive to use a frontier LLM to, for example, parse a document.

One big caveat for all this is inference. Inference is such a mess for open-source models. Openrouter is the de facto choice if you want convenience, but the quality is so dramatically uneven that you cannot use it for serious work. For example, Xeophon found swings of 50 points on common benchmarks between Openrouter and the reference APIs from model creators.[9] Obviously, you can just self-host, but that becomes expensive and complex at scale.

Dead internet

I do think the internet as we know it is on its last legs. There was a time when there were many APIs and websites that were friendly to scraping. You didn't need to log in to view websites and, generally, the internet was a simple utility that was always available. Now most websites are locked down, requiring logins and captchas and blocking any automated access. Twitter/X is closed off. Reddit is closed off. Stack Overflow has almost no activity.[10] All news and scientific publishers block automated traffic. Even Google is no longer just a list of links, but an invitation for you to start a session.

I know there is cause and effect here, but I really wish I could rewind the internet about 10 years and build software with AI back then.

At the same time the internet is locking down, content has morphed into some weird algorithm-maximizing mode. Forget AI for a moment. Humans on YouTube have figured out how the algorithm works and make their content uniform in length, thumbnail intensity, and slow delivery to maximize the metrics. X feels the same, where so much content is bait or follows a narrative that trends well. I have luckily avoided getting sucked into TikTok; I have some aversion to short-form video.

It is actually amazing what happens when you put a bunch of smart humans together to optimize content; through competition, they create better and better content. For example, Mr. Beast is a famous YouTuber who produces enough viewing minutes to be equivalent to 24 Super Bowls per year. His views and followers are literally measured in billions.[11] He says in interviews that he obsesses over per-second stats on his videos and optimizes the editing and scripts specifically for the YouTube algorithm.

I believe we will see AI agents of some kind taking over this process soon. The greatest strength of AI methods is maximizing metrics. I already see AI-generated stories and images on X that trend well. This will increase, and YouTube/TikTok seem like the final frontier for AI-generated content maximizing human attention.

AI education

I've taught classes for over 10 years, including undergraduate and graduate courses on programming for engineers. I wrote a textbook on deep learning for materials and chemistry and have used LLMs/agents in my courses since 2023.

My department is grappling with AI policies, and I was asked for my advice. I had a really hard time coming up with anything opinionated. Here are some reasonable stances I considered:

  • Tool stance: AI is like web search, a calculator, or another tool: it should be used sometimes as an aid, but students must not become dependent on it.

  • Peer stance: AI should be treated like working with a classmate. It's OK for some homework, but its use should be stated clearly, and it certainly should not be used on tests or routine coursework.

  • Essential stance: In 5 years, it may be considered unethical for an engineer not to use AI because it is so much more effective. AI should be involved everywhere in the curriculum, and students leaving the university should reach for AI in nearly all situations.

Today, all of these seem valid to me. In 1-2 years, I suspect AI will be so different from a calculator or web search that we won't be able to pretend any more that it is just a tool.

Consider two equal students, one with the "peer stance" and one with the "essential stance." The "essential stance" student will be immediately impactful in their career. The "peer stance" student may have had an invaluable opportunity to learn without AI assistance confusing them, but that benefit is not obvious to me. It's like arguing that having a private tutor for your whole life could be detrimental. I suspect the "essential stance" student will be better off in the long run.

Maybe the answer is to completely rebuild higher-education considering AI now.

Discussion

This post reminds me of the quote often attributed to Mark Twain (it originates with Pascal): "I apologize for such a long letter - I didn't have time to write a short one." I have a hard time synthesizing all these observations into a concrete set of predictions or opinions about the world. The only conclusion I can draw is that the world is changing at a dramatic pace, partly from AI.

You can listen to me ramble more about my thoughts on the space at this podcast too: https://www.youtube.com/watch?v=XqoBSB3nsgw

Footnotes

  1. METR task completion benchmark https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/

  2. US Telecom Capex Report https://ustelecom.org/wp-content/uploads/2023/09/2022-Broadband-Capex-Report-final.pdf

  3. US FHWA https://www.fhwa.dot.gov/infrastructure/50estimate.cfm

  4. NASA OIG Report https://oig.nasa.gov/wp-content/uploads/2024/02/IG-14-031.pdf

  5. Dwarkesh Patel interview with Elon Musk https://www.youtube.com/watch?v=BYXbuik3dgA

  6. FDA wearable guidance update https://www.fda.gov/regulatory-information/search-fda-guidance-documents/general-wellness-policy-low-risk-devices

  7. https://www.goldmansachs.com/insights/articles/china-is-increasing-its-share-of-global-drug-development

  8. BBC article https://www.bbc.com/news/articles/cp829jey9z7o

  9. https://epoch.ai/gradient-updates/why-benchmarking-is-hard

  10. https://x.com/TheOneandOmsy/status/2012571730485682618

  11. https://www.ericholscher.com/blog/2025/jan/21/stack-overflows-decline/