Three Notes on ‘Situational Awareness’

A newly published series of essays predicts rapid advancements in artificial intelligence leading to government intervention and the subsequent creation of superintelligence. I argue that this series of events is less likely than the essays suggest, for three reasons: a lack of market incentives for the required progress, the implausibility of a decisive government response at the relevant time, and the adaptation of AI labs to similar predictions.


My discussion does not aim to engage with underlying technical claims. For the sake of argument, I will simply accept them as entirely accurate, though they’ve been critically discussed elsewhere.

Introduction

Leopold Aschenbrenner, formerly of OpenAI, has recently published a series of essays detailing his outlook on the developments ahead in AI. The essays are well-written, well-researched and insightful. Making them even more interesting, Leopold suggests – correctly, I believe – that his articulated view is representative of the beliefs of a larger group of important people in AI.

While I am not as immersed in the SF ecosystem, I have been thinking about and working on related issues for some time, and I feel I have stumbled across some inconsistencies. Like Leopold, I believe getting these predictions and the resulting policy response right is very important – so I briefly outline three areas where I think his argument gets stuck.

Profitability and Progress

  • AI capability gains between today and AGI might not be profitable enough to motivate investment.

  • Developing AGI & ASI might not be sufficiently attractive for private companies.

The essays begin with the observation of rapidly increasing investment in AI, especially in compute clusters and their surrounding infrastructure. Extrapolating these trends then leads to projections for capability milestones. Two strong arguments motivate these trend projections: Empirical observations around current spending, and the potential high profitability of developing AGI. But even on Leopold’s short timelines, there is a time between the current era and the final sprint to AGI and ASI. Leopold does not go into great detail on that era, but I believe crossing it might take much longer than he projects.

Firstly, the short-term profitability of the next couple of model generations is not a given. Continuous growth of compute investment, according to Leopold, will have happened ‘as each generation [of models] has shocked the world’. The jump to GPT-3.5 and the ChatGPT application was impressive and motivated plenty of public, economic and political interest. This is maybe less obviously true for the jump to GPT-4, and less still for the progress within the GPT-4 tier of models. On the business side, adoption has been somewhat sluggish, with apprehension voiced at the cost of using advanced models; PR-sensitive reliability issues plaguing consumer-facing deployment; and regulation, liability, and ongoing lawsuits creating further barriers. So far, current models do not seem on track to motivate widespread adoption of $100+ subscriptions, as Leopold suggests. These might well be growing pains – but to keep investors happy to fund the trend of rising compute costs, they would have to be outgrown fast. Otherwise, the well of compute funding might dry up past the already-committed resources, inviting a stronger focus on usability and efficiency than on capability gains. Of course, that might well be a high-revenue area. But as Leopold points out, the capex of major tech companies that drives the progress he assumes is unprecedented – intermediary returns that are merely very high don't cut it. Unprecedented returns are not yet certain.

Secondly, trying to develop AGI and ASI might be much less economically attractive to companies than Leopold suggests. At face value, creating AGI or ASI seems enormously profitable and desirable to any corporation – think of the growth, the power, the competitive advantage. Even if I were correct about the lack of short-term profitability, this prospect would seem enough to motivate the enormous necessary investment.

But later, Leopold identifies (a) that governments are very unlikely to let a start-up (and, I assume, any privately-owned company) develop and control such technology themselves, and (b) that any company attempting such development will become a target of all kinds of espionage, sabotage, nationalization and more. If this is right (and I think it might well be), it changes the calculus dramatically. Getting in the sights of serious clandestine espionage is a serious threat to a private company, and being subjected to a ‘hamfisted’ home government response is likewise not particularly tempting. Susceptibility to adversarial attacks would be high, and success would not be very enticing - limited say over The Project and few profits to be made as the government takes over. That prospect is unlikely to be attractive enough to motivate the kind of funding Leopold’s projections require. Simply put: If serious progress towards AGI puts a target on your back and successfully reaching it makes the government take it away, then why try to build it at all?

Given this incentive landscape, a lot of different scenarios come to mind: some rogue company tries anyway; a government project is established even without a wake-up call; CCP-owned projects that don’t face such uncertainties pull ahead; AI progress stalls because the market simply does not incentivize building sufficiently powerful AI to kickstart AGI races; etc. These probably deserve some deeper examination. But I believe that they should at least cast some doubt on the suggested outlook.

Governments’ Wake-Up Moment

  • If the jump from AGI to ASI is fast and governments are slow, ‘The Project’ is less likely.

In discussing technical challenges around aligning superintelligence, Leopold emphasizes the paradigmatic gap between AGI and ASI, but postulates that we might get from AGI to ASI quickly, i.e. in less than a year. AGI’s failures are low stakes, the world is normal, and it is responsive to ~RLHF; but ASI is alien, its failures catastrophic, the world in upheaval, and alignment unclear. This, he argues, is one of the main challenges in achieving safe ASI – many of our well-precedented techniques for simpler models and systems stop working in the new paradigm, and we might drop the ball.

Simultaneously, Leopold argues that continuous improvement up to AGI and ASI will, at some point, lead the US government to step in and assume oversight over further frontier development. He argues this government endeavour, ‘The Project’, is likely the setting in which ASI (or maybe even AGI?) is built. There is little clarity as to when exactly this might happen, but he suggests it would likely be fairly late and would require a major wake-up moment. A successful instance of catastrophic misuse might be one possible watershed.

These two claims don’t mesh that well:

On the one hand, the governmental wake-up moment would be unlikely to happen before or during early AGI, characterized by Leopold as low-stakes and fairly easily aligned – catastrophic misuse or other outsized unexpected impacts of such a system seem unlikely. But on the other hand, the governmental wake-up moment also can’t really happen later than that, because we will go from AGI to ASI very fast, the government is slow to react, and The Project will be a fairly extensive endeavour including lots of political overhead.

This places a very specific requirement on Leopold’s narrative: The wake-up moment occurs (a) early enough to still consolidate research at The Project – i.e. well before the final sprint to ASI, but (b) late enough to leave no doubts around current relevance and future progress. At this time, the government intervenes decisively and commences The Project. This sequence and timing of events seems much less plausible and intuitive to me than many of Leopold’s other claims. It might well still happen like this – but if, by happenstance, the would-be wake-up misuse attempt fails, if a new capability gain is hidden or crowded out by more urgent news, if the CCP keeps a rival project under wraps, etc., this timeline gets thrown off very easily. And if the tight window for the wake-up moment passes, things likely play out very differently after all. That shrouds The Project in substantial uncertainty.

How do the ‘Situationally Aware’ React?

  • If AI lab leaders agree with Leopold’s predictions, they will try to interfere.

Lastly, if the essays do reflect the general thinking of the ‘situationally aware’, they allow some interesting insights into the likely thinking of major industry players in AI. If their predictions do align with Leopold’s, they might accordingly adjust their strategies – which might in turn affect his predictions. Specifically, two responses come to mind:

As suggested in my first objection, the prospect of nationalization might disincentivize labs from working on AGI. This could lead private sector AI development to settle into a carefully considered equilibrium, in which labs make sufficient incremental progress to keep their products competitive, but stay below the critical line of wake-up-worthy capability gains. Defection from this equilibrium is not impossible, but far from certain: Given the costs involved, not that many players could defect, and each of them might have major difficulties justifying their defection to shareholders in a market environment that does not favour races.

More alarmingly, the prospect of nationalization, sabotage and espionage might prompt AI labs to be deceptive about their AGI progress. Whether it is to secure their own profits, out of hubris, or from distrust of government, AI lab leaders might have ample reasons to prevent nationalization. Hence, they might make sure not to raise the alarm in the first place. For instance, they might save up computational resources to skip a generation or two, prompting a faster, discontinuous jump to AGI/ASI without raising the alarm; wilfully hamstring or underreport intermediary model capabilities; or simply extensively influence the political process to prevent any interference. This not only throws a wrench into Leopold’s predictions – it also interferes substantially with a lot of current safety mechanisms that rely on iterative, continuous progress.

There is very little telling how prevalent Leopold’s predictions are among the leadership of AI labs, and what beliefs will ultimately play into their decisions. But I believe any prediction that simultaneously sees them stripped of their passion projects and profit potential and hinges on somewhat honest, transparent or predictable behaviour on their part is likely to get a lot wrong. Adaptation to predictions needs to be accounted for.


