jdilla.xyz

The AI that makes the AI

2024-08-13

One of the grand challenges of artificial general intelligence is developing agents capable of conducting scientific research and discovering new knowledge. While frontier models have already been used as aids to human scientists, e.g. for brainstorming ideas, writing code, or prediction tasks, they still conduct only a small part of the scientific process. This paper presents the first comprehensive framework for fully automatic scientific discovery, enabling frontier large language models to perform research independently and communicate their findings. We introduce The AI Scientist, which generates novel research ideas, writes code, executes experiments, visualizes results, describes its findings by writing a full scientific paper, and then runs a simulated review process for evaluation. In principle, this process can be repeated to iteratively develop ideas in an open-ended fashion, acting like the human scientific community. We demonstrate its versatility by applying it to three distinct subfields of machine learning: diffusion modeling, transformer-based language modeling, and learning dynamics. Each idea is implemented and developed into a full paper at a cost of less than $15 per paper. To evaluate the generated papers, we design and validate an automated reviewer, which we show achieves near-human performance in evaluating paper scores. The AI Scientist can produce papers that exceed the acceptance threshold at a top machine learning conference as judged by our automated reviewer. This approach signifies the beginning of a new era in scientific discovery in machine learning: bringing the transformative benefits of AI agents to the entire research process of AI itself, and taking us closer to a world where endless affordable creativity and innovation can be unleashed on the world's most challenging problems.

From The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery; full paper available here.

I'm interested to read this one more closely and see the degree to which it does (or doesn't) rely upon having experiments that the LLM can execute without human intervention. Either way, an interesting result, but my hypothesis is that "places where the LLM can verify a result" is going to be the limiting factor.

Book Thoughts: Between Two Fires

2024-07-22

Link to the book is here.

The first 10 chapters of this book are among my favorite reading experiences ever: terrifying, mysterious, creative. I wondered how the author was going to keep up that pace for the rest of the book.

Ultimately, he didn't. I'm not sure it was possible for him to. Part of what made the beginning of the book so enticing was wondering whether miracles were actually happening or whether they were just coincidences. At some point, the author had to make a choice, and from that point forward everything got less interesting.

California's changing climate

2024-07-20

A fantastic piece by Paul Kedrosky on how California's atmospheric river is changing and the implications of those changes.

Lots of great stuff in it, but this was totally new to me:

A predator-prey model is a mathematical representation of the interactions between two species: a predator and its prey. It is often modeled as wolves, sheep, and grass. The most common is the Lotka-Volterra model, which consists of two differential equations. There are two stable equilibria: one with predator and prey in approximate balance, and one with both extinct.
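For the curious, the two differential equations are the classic two-species Lotka-Volterra system. Here's a minimal sketch of simulating it; the parameter values are purely illustrative and don't come from Kedrosky's piece:

```python
# Minimal sketch of the classic two-species Lotka-Volterra model.
# All parameter values are illustrative, not taken from the article.
import numpy as np
from scipy.integrate import solve_ivp

alpha, beta = 1.1, 0.4   # prey birth rate, predation rate
delta, gamma = 0.1, 0.4  # predator growth per prey eaten, predator death rate

def lotka_volterra(t, z):
    prey, predators = z
    d_prey = alpha * prey - beta * prey * predators             # prey grow, get eaten
    d_predators = delta * prey * predators - gamma * predators  # predators eat, die off
    return [d_prey, d_predators]

sol = solve_ivp(lotka_volterra, (0, 50), y0=[10, 5], t_eval=np.linspace(0, 50, 500))
# sol.y[0] (prey) and sol.y[1] (predators) oscillate around the interior
# equilibrium at prey = gamma / delta, predators = alpha / beta.
```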

How will agents interact with the world?

2024-07-13

Lattice made a splash this week with a pretty crazy announcement about adding AI workers to their platform. It was shambolic, and they’ve since walked it back.

While Lattice did this poorly, I think that the question of “how do we integrate agents into the world?” is an interesting place to dig and experiment right now.

As an example, if you believe in agents, it seems pretty clear that agents are going to need to be able to pay for things subject to certain rules. So... what does it look like to give an agent a credit card?

I could just give it a credit card in my name, but that seems a little risky, and if things go wrong, who’s going to make that right?

But if I hire an agent created by another company to do work for my company, who gives the credit card to them? Is it the creating company? Do they then invoice me after the fact?

It's possible that this looks exactly like how businesses give workers credit cards... but maybe not? It might be better to know that this is the card assigned to system X by entity Y. The entity that is ultimately on the hook for the spending, even if things go wrong, might want to be able to track that; the credit card issuer might also want to know which of its clients are giving agents these abilities, since the patterns of spending, real and fraudulent, might look different. This transparency probably helps the system overall.
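To make that concrete, here's a hypothetical sketch of the record an issuer might keep for a card "assigned to system X by entity Y." Every name and field here is invented for illustration:

```python
# Hypothetical shape of an agent-issued card record; all field names are invented.
from dataclasses import dataclass

@dataclass
class AgentCard:
    card_id: str            # the card itself
    agent_id: str           # "system X": which agent is spending on it
    issuing_company: str    # "entity Y": the company that built/operates the agent
    liable_party: str       # who is ultimately on the hook if things go wrong
    spend_limit_usd: float  # one of the "certain rules" the agent spends under

card = AgentCard(
    card_id="card_001",
    agent_id="procurement-agent-7",
    issuing_company="AgentCo",
    liable_party="My Company, Inc.",
    spend_limit_usd=500.0,
)
```

An issuer that saw something like agent_id and liable_party attached to every transaction could model agent spending patterns separately from human ones, which is the transparency argument above.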

Another example is account creation. There are probably types of services where we want non-human actors to be able to create an account. We could have them pretend to be human, but it might be better to let them ask for agent access to a service. This is probably different from API access; in some cases, it might help for them to see exactly what I see in the system.
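As a sketch of what "agent access" as a third mode might look like, hypothetically (none of this reflects any real service):

```python
# Hypothetical access modes for a single service; the names are invented.
ACCOUNT_MODES = {
    "human": {"sees_ui": True,  "uses_api": False},  # a person in the normal UI
    "api":   {"sees_ui": False, "uses_api": True},   # a program with scoped keys, no UI
    "agent": {"sees_ui": True,  "uses_api": True},   # declared non-human, same view a person gets
}

def shares_human_view(mode: str) -> bool:
    """Agent access sees what the account owner sees; API access does not."""
    return ACCOUNT_MODES[mode]["sees_ui"]
```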

Zooming out a bit, it seems to me that people get really upset when something pretends to be a human but is actually AI. It also seems likely that we’re going to want to give agents more ability to act in the world and be productive. Yet the systems we have today that are essential for productive work assume either human actors or computers acting on behalf of humans (programmatic access), with nothing in between. If we’re going to capture the value from agents, our systems are going to have to adapt.