How will agents interact with the world?


Lattice made a splash this week with a pretty crazy announcement about adding AI workers to their platform. It was shambolic, and they’ve since walked it back.

While Lattice did this poorly, I think that the question of “how do we integrate agents into the world?” is an interesting place to dig and experiment right now.

As an example, if you believe in agents, it seems pretty clear that agents are going to need to be able to pay for things subject to certain rules. So... what does it look like to give an agent a credit card?

I could just give it a credit card in my name, but that seems a little risky, and if things go wrong, who’s going to make that right?

But if I hire an agent created by another company to do work for my company, who gives the credit card to them? Is it the creating company? Do they then invoice me after the fact?

It's possible that this looks exactly like how businesses give workers credit cards... but maybe not? It might be better to know that this is the card assigned to system X by entity Y. The entity that is ultimately on the hook for the spending even if things go wrong might want to be able to track that; the credit card issuer might also want to know which of its clients are giving agents these abilities, since the patterns of spending, real and fraudulent, might look different. This transparency probably helps the system overall.
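As a thought experiment, the "card assigned to system X by entity Y" idea could be sketched as a small data model. Everything here is hypothetical — the field names, limits, and authorization rule are illustrative assumptions, not any issuer's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class AgentCard:
    """Hypothetical card issued to an agent (system X) by a sponsor (entity Y)."""
    agent_id: str              # system X: the agent doing the spending
    sponsor_entity: str        # entity Y: ultimately on the hook if things go wrong
    per_transaction_limit: float
    monthly_limit: float
    allowed_categories: set[str] = field(default_factory=set)

    def authorize(self, amount: float, category: str, spent_this_month: float) -> bool:
        """Apply the sponsor's rules before any charge goes through."""
        return (
            amount <= self.per_transaction_limit
            and spent_this_month + amount <= self.monthly_limit
            and category in self.allowed_categories
        )
```

Because every charge would carry both an agent_id and a sponsor_entity, the sponsor and the issuer could each see which spending came from agents — which is exactly the transparency described above.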

Another example is account creation. There are probably types of services where we want non-human actors to be able to create an account. We could have them pretend to be human, but it might help to let them ask for agent access to a service. This is probably different from API access; in some cases, it probably helps for them to see exactly what I see in the system.

Zooming out a bit, it seems to me that people get really upset when something pretends to be a human but it is actually AI. It also seems likely that we’re going to want to give agents more ability to act in the world and be productive. Yet the systems we have today that are essential for productive work assume human actors or computers acting on behalf of humans (programmatic access), but nothing in between. If we’re going to capture the value from agents, our systems are going to have to adapt.

Reactions to Situational Awareness


My first-glance reactions to Situational Awareness: The Decade Ahead by Leopold Aschenbrenner.

I enjoyed reading it a lot.

The most persuasive part of his argument to me is the relationship between compute and intelligence. This is sort of like the New England Patriots to me; I'm going to believe in it until it stops working. I see reasons why it might stop (run out of data, limited by energy / computing power available), but I don't know when or if we'll actually hit those constraints. People are pretty good at avoiding constraints!

I think he underrates the likelihood of a bottleneck somewhere that keeps us from getting to the AGI he imagines. Any individual bottleneck might be unlikely, but as long as one exists, the entire system is constrained.

Something I see Leopold do at points is assume a super AI, in his case, an automated AI researcher that is 100x as competent as today's top AI researcher. With this assumed, any AI research problem is solvable because you can scale up infinite 100x AI researchers to get around the problem. Once any AI research problem is solvable, then any problem is solvable.

What I think will ultimately happen is something like this:

  • An AI will exist that is superhuman on many dimensions. It will be able to do many things way better than humans and will be inarguably smarter than most humans. [0] Most of today's knowledge work will be offloaded to the AIs. This will be similar to the way that a lot of the production work of 1750 has been moved to machines in factories.
  • That AI will also have limitations. There will be some things that it can't do as well as humans or where humans will have the ability to reliably trip it up, despite its intelligence. To extend the factory analogy, you'll still have humans pressing buttons for reasons other than just keeping the humans in control.
  • This will be really destabilizing. Society is going to change more between 2020 and 2040 than it did between 1950 and 2020.

Somewhat off topic: earlier this year, I read Meet You in Hell, which is the story of Henry Clay Frick and Andrew Carnegie. The dynamics of that era, with the railroad leading to a spike in demand for steel and steel leading to a spike in demand for coke were very recognizable in today's AI race.

[0]: I think GPT-4 is already this! Do you know a single person who knows as much stuff about as many things as it does? I don't. And yet it still has limitations!

Introducing HeyRecap


HeyRecap is the next evolution of Recap Roswell, a project using LLMs to create easily readable summaries for the Roswell City Council.

I started this project with two main goals:

  1. When I moved to Roswell, I wanted to get more involved with my local government, but figured I should learn about it before I got involved. To my surprise, despite the nearly $200M city budget, there was very little local news coverage to help me figure out what was happening locally.
  2. I wanted to develop a better understanding of how LLMs can be used to solve everyday problems. My hypothesis was that I could create a system that did a good enough job summarizing the meetings that it wouldn't be worth sending a person to cover them.

The first version of the project was a Python script that created a summary, which I copied and pasted into a Ghost newsletter. This was a great way to get started — to my surprise, ~150 others in Roswell were also interested in this — but it really limited my ability to customize the user experience, and it was clear that it wasn't going to scale beyond just my local city council. Plus, the copying and pasting was tedious!
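For flavor: council meetings run for hours, so before anything can be summarized, the transcript has to be split into pieces that fit an LLM's context window. The original script isn't public; this is a minimal sketch of just that chunking step, with made-up size parameters:

```python
def chunk_transcript(text: str, max_chars: int = 8000, overlap: int = 500) -> list[str]:
    """Split a long meeting transcript into overlapping chunks for summarization.

    The overlap means a sentence cut at a chunk boundary still appears
    whole in the next chunk, so the summarizer doesn't lose it.
    """
    if len(text) <= max_chars:
        return [text]
    chunks, start = [], 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap
    return chunks
```

Each chunk would then go to the LLM for a partial summary, and the partial summaries get combined into the final newsletter (a map-reduce pattern).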

So this fall, I set out to build an end-to-end app, the result of which you can see at heyrecap.com. For users, there are two primary benefits to the new site:

  1. Full meeting transcripts - Easily search for the specific information behind the summary without having to watch the entire video.
  2. Customizable email preferences - Users can choose to get email updates for all recorded city meetings, not just city council meetings.

For me, this was a chance to build an app end-to-end with a real, if small, audience, while getting to know my city better. I used NextJS and hosted it on Render. Summaries are produced with transcripts from Deepgram. Clerk is providing me with user authentication services (love them) and Resend with email services. My UI components are provided by shadcn/ui. I have opinions about all of these and may write more about them in the future, but for now I'll just say that it's incredible to have so many services so easily stitched together at my fingertips. More often than not, someone else has done 90% of the hard, frustrating stuff so that you can just integrate it with your project. It's awesome.

Over the next several months, I hope to explore what it looks like to scale this some. I'm not sure there is a business here per se, but I'm interested in the idea that I can run a useful local news organization as a side project using AI. We'll see what comes from it!

What will be the limiting factors on LLM improvements?


If you were a scale believer over the last few years, the progress we’ve been seeing would have just made more sense. There is a story you can tell about how GPT-4’s amazing performance can be explained by some idiom library or lookup table which will never generalize. But that’s a story that none of the skeptics pre-registered.

As for the believers, you have people like Ilya, Dario, Gwern, etc more or less spelling out the slow takeoff we’ve been seeing due to scaling as early as 12 years ago.

It seems pretty clear that some amount of scaling can get us to transformative AI - i.e. if you achieve the irreducible loss on these scaling curves, you’ve made an AI that’s smart enough to automate most cognitive labor (including the labor required to make smarter AIs).

But most things in life are harder than in theory, and many theoretically possible things have just been intractably difficult for some reason or another (fusion power, flying cars, nanotech, etc). If self-play/synthetic data doesn’t work, the models look fucked - you’re never gonna get anywhere near that platonic irreducible loss. Also, the theoretical reason to expect scaling to keep working are murky, and the benchmarks on which scaling seems to lead to better performance have debatable generality.

So my tentative probabilities are: 70%: scaling + algorithmic progress + hardware advances will get us to AGI by 2040. 30%: the skeptic is right - LLMs and anything even roughly in that vein is fucked.

From Dwarkesh Patel. This is the piece that I've been waiting for someone to write. It doesn't matter whether he's right; just the thought exercise of thinking through where the bottlenecks might be is really useful.