jdilla.xyz

The AI that makes the AI

2024-08-13

One of the grand challenges of artificial general intelligence is developing agents capable of conducting scientific research and discovering new knowledge. While frontier models have already been used as aids to human scientists, e.g. for brainstorming ideas, writing code, or prediction tasks, they still conduct only a small part of the scientific process. This paper presents the first comprehensive framework for fully automatic scientific discovery, enabling frontier large language models to perform research independently and communicate their findings. We introduce The AI Scientist, which generates novel research ideas, writes code, executes experiments, visualizes results, describes its findings by writing a full scientific paper, and then runs a simulated review process for evaluation. In principle, this process can be repeated to iteratively develop ideas in an open-ended fashion, acting like the human scientific community. We demonstrate its versatility by applying it to three distinct subfields of machine learning: diffusion modeling, transformer-based language modeling, and learning dynamics. Each idea is implemented and developed into a full paper at a cost of less than $15 per paper. To evaluate the generated papers, we design and validate an automated reviewer, which we show achieves near-human performance in evaluating paper scores. The AI Scientist can produce papers that exceed the acceptance threshold at a top machine learning conference as judged by our automated reviewer. This approach signifies the beginning of a new era in scientific discovery in machine learning: bringing the transformative benefits of AI agents to the entire research process of AI itself, and taking us closer to a world where endless affordable creativity and innovation can be unleashed on the world's most challenging problems.

From The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery, full paper available here

I'm interested in reading this one more closely to see the degree to which it does (or doesn't) rely on having experiments that the LLM can execute without human intervention. Either way, it's an interesting result, but my hypothesis is that "places where the LLM can verify a result" is going to be the limiting factor.

How will agents interact with the world?

2024-07-13

Lattice made a splash this week with a pretty crazy announcement about adding AI workers to their platform. It was shambolic, and they've since walked it back.

While Lattice did this poorly, I think the question of "how do we integrate agents into the world?" is an interesting place to dig and experiment right now.

As an example, if you believe in agents, it seems pretty clear that agents are going to need to be able to pay for things subject to certain rules. So... what does it look like to give an agent a credit card?

I could just give it a credit card in my name, but that seems a little risky, and if things go wrong, who’s going to make that right?

But if I hire an agent created by another company to do work for my company, who gives it the credit card? Is it the creating company? Do they then invoice me after the fact?

It's possible that this looks exactly like how businesses give workers credit cards... but maybe not? It might be better to know that this card is assigned to system X by entity Y. The entity ultimately on the hook for the spending might want to be able to track that, and the credit card issuer might also want to know which of its clients are giving agents these abilities, since the patterns of spending, both real and fraudulent, might look different. This transparency probably helps the system overall.
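To make this concrete, here's a rough sketch of what the metadata on an agent-issued card might track. This is pure speculation on my part: the field names and rules are hypothetical, not any real issuer's API.

```typescript
// Hypothetical record for a card issued to an agent rather than a person.
// Nothing here reflects a real issuer's API; it's a sketch of what the
// liable entity and the issuer might each want to be able to see.
interface AgentCard {
  cardId: string;
  issuedToAgent: string;    // "system X": the agent/system using the card
  issuedByEntity: string;   // "entity Y": the company on the hook for spending
  sponsoringHuman?: string; // a person to contact when things go wrong
  spendRules: {
    maxPerTransactionUsd: number;
    maxPerMonthUsd: number;
    allowedCategories: string[]; // e.g. ["cloud-compute", "api-credits"]
  };
}

const exampleCard: AgentCard = {
  cardId: "card_123",
  issuedToAgent: "procurement-agent",
  issuedByEntity: "Acme Corp",
  sponsoringHuman: "jordan@acme.example",
  spendRules: {
    maxPerTransactionUsd: 250,
    maxPerMonthUsd: 2000,
    allowedCategories: ["saas-subscriptions", "api-credits"],
  },
};
```

The exact fields don't matter; the point is that every transaction is visibly attributable to an agent acting for a known entity, not to a human.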

Another example is account creation. There are probably types of services where we want non-human actors to be able to create an account. We could have them pretend to be human, but it might be better to let them ask for agent access to a service. This is probably different from API access: in some cases, it would help for the agent to see exactly what I see in the system.
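One way to picture this: a service could treat "agent" as a first-class actor type alongside humans and API clients. Again, this is hypothetical; I don't know of a service that exposes exactly this today.

```typescript
// Hypothetical actor types a service might support. An "agent" account is
// declared as non-human up front, but gets the same UI-level view of the
// system that a human account gets, unlike a pure API client.
type ActorType = "human" | "api_client" | "agent";

interface Account {
  id: string;
  actorType: ActorType;
  onBehalfOf?: string; // for agents: the human or entity they act for
}
```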

Zooming out a bit, people seem to get really upset when something pretends to be human but is actually AI. It also seems likely that we're going to want to give agents more ability to act in the world and be productive. Yet the systems essential for productive work today assume either human actors or computers acting on behalf of humans (programmatic access), with nothing in between. If we're going to capture the value from agents, our systems are going to have to adapt.

Reactions to Situational Awareness

2024-06-14

My first-glance reactions to Situational Awareness: The Decade Ahead by Leopold Aschenbrenner.

I enjoyed reading it a lot.

The most persuasive part of his argument to me is the relationship between compute and intelligence. This is sort of like the New England Patriots to me: I'm going to believe in it until it stops working. I can see reasons why it might stop (we run out of data, or we're limited by available energy and computing power), but I don't know when, or if, we'll actually hit those constraints. People are pretty good at avoiding constraints!

I think he underrates the likelihood of a bottleneck somewhere that keeps us from getting to the AGI he imagines. Any individual bottleneck might be unlikely, but as long as one exists, the entire system is constrained.

Something I see Leopold do at points is assume a super AI, in his case an automated AI researcher that is 100x as competent as today's top AI researcher. Once this is assumed, any AI research problem becomes solvable, because you can spin up an effectively unlimited number of these 100x researchers to get around it. And once any AI research problem is solvable, any problem is solvable.

What I think will ultimately happen is something like this:

  • An AI will exist that is superhuman on many dimensions. It will be able to do many things far better than humans and will be inarguably smarter than most humans. [0] Most of today's knowledge work will be offloaded to the AIs, much the way a lot of the production work of 1750 has since been moved to machines in factories.
  • That AI will also have limitations. There will be some things it can't do as well as humans, or where humans can reliably trip it up, despite its intelligence. To extend the factory analogy: you'll still have humans pressing buttons, for reasons beyond just keeping humans in control.
  • This will be really destabilizing. Society is going to change more between 2020 and 2040 than it did between 1950 and 2020.

Somewhat off topic: earlier this year, I read Meet You in Hell, the story of Henry Clay Frick and Andrew Carnegie. The dynamics of that era, with railroads driving a spike in demand for steel and steel driving a spike in demand for coke, were very recognizable in today's AI race.

[0]: I think GPT-4 is already this! Do you know a single person who knows as much stuff about as many things as it does? I don't. And yet it still has limitations!

Introducing HeyRecap

2024-03-12

HeyRecap is the next evolution of Recap Roswell, a project using LLMs to create easily readable summaries for the Roswell City Council.

I started this project with two main goals:

  1. When I moved to Roswell, I wanted to get more involved with my local government, but figured I should learn about it first. To my surprise, despite the city's nearly $200M budget, there was very little local news coverage to help me figure out what was happening locally.
  2. I wanted to develop a better understanding of how LLMs can be used to solve everyday problems. My hypothesis was that I could build a system that summarized the meetings well enough that it wouldn't be worth sending a person to cover them.

The first version of the project was a Python script that created a summary, which I copied and pasted into a Ghost newsletter. This was a great way to get started (to my surprise, ~150 others in Roswell were also interested in this), but it really limited my ability to customize the user experience, and it clearly wasn't going to scale beyond my local city council. Plus, the copying and pasting was tedious!

So this fall, I set out to build an end-to-end app, the result of which you can see at heyrecap.com. For users, there are two primary benefits to the new site:

  1. Full meeting transcripts - Easily search for the specific information behind the summary without having to watch the entire video.
  2. Customizable email preferences - Users can choose to get email updates for all recorded city meetings, not just city council meetings.

For me, this was a chance to build an app end-to-end for a real, if small, audience, while getting to know my city better. I used NextJS and hosted it on Render. Summaries are produced from transcripts generated by Deepgram. Clerk provides user authentication (love them) and Resend handles email. My UI components come from shadcn/ui. I have opinions about all of these and may write more about them in the future, but for now I'll just say that it's incredible to have so many services so easily stitched together at my fingertips. More often than not, someone else has done 90% of the hard, frustrating stuff so that you can just integrate it with your project. It's awesome.
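For the curious, the core transcribe-and-summarize step is simple in spirit. Here's a minimal sketch assuming Deepgram's Node SDK and an OpenAI chat model; the model choice, prompt, and function names are illustrative rather than exactly what runs in production, and the real pipeline has to handle things like chunking transcripts that are too long for a single request.

```typescript
// Minimal sketch: recorded meeting URL -> transcript (Deepgram) -> summary (LLM).
import { createClient } from "@deepgram/sdk";
import OpenAI from "openai";

const deepgram = createClient(process.env.DEEPGRAM_API_KEY!);
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function summarizeMeeting(recordingUrl: string): Promise<string> {
  // 1. Transcribe the recorded meeting.
  const { result, error } = await deepgram.listen.prerecorded.transcribeUrl(
    { url: recordingUrl },
    { model: "nova-2", smart_format: true }
  );
  if (error || !result) throw error ?? new Error("no transcription result");
  const transcript = result.results.channels[0].alternatives[0].transcript;

  // 2. Ask an LLM for a resident-friendly summary.
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini", // stand-in; any capable model works
    messages: [
      {
        role: "system",
        content:
          "Summarize this city council meeting transcript for residents: " +
          "key decisions, votes, and notable public comments.",
      },
      { role: "user", content: transcript },
    ],
  });
  return completion.choices[0].message.content ?? "";
}
```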

Over the next several months, I hope to explore what it looks like to scale this some. I'm not sure there is a business here per se, but I'm interested in the idea that I can run a useful local news organization as a side project using AI. We'll see what comes from it!