How AI Takeover Might Happen in 2 Years


I'm not a natural "doomsayer." But unfortunately, part of my job as an AI safety researcher is to think about the more troubling scenarios.

I'm like a mechanic scrambling through last-minute checks before Apollo 13 takes off. If you ask for my take on the situation, I won't comment on the quality of the in-flight entertainment, or describe how beautiful the stars will appear from space.

I will tell you what could go wrong. That is what I intend to do in this story.

Now I should clarify what this is exactly. It's not a prediction. I don't expect AI progress to be this fast or as untamable as I portray. It's not pure fantasy either.

It is my worst nightmare.

It's a sample from the futures that are among the most devastating, and I believe, disturbingly plausible [1] - the ones that most keep me up at night.

I'm telling this tale because the future is not yet set. I hope, with a bit of foresight, we can keep this story a fictional one.

Thanks to Daniel Kokotajlo, Thomas Larsen, Ryan Greenblatt, and others for discussions that inspired these stories. This post was written in a personal capacity.

Ripples before waves

The year is 2025 and the month is February. OpenEye recently released a new AI model they call U2. The product and the name are alike: both are increments of the past, and neither is entirely unexpected.

However, unlike OpenEye's previous AI products, which lived inside the boxes of their chat windows, U2 can use a computer.

Some users find it eerie to watch their browser flash at irregular intervals and their mouse flick at inhuman speeds, as if there is a ghost at the keyboard. A fraction of workers with form-filler jobs raise the eyebrows of their bosses as they fly through work nearly twice as quickly.

But by and large, U2 is still a specialized tool. To most who are paying attention, it is a creature viewed through the glass boxes of X (or, if you don't like Elon, "Twitter"). Sometimes U2's quirky behaviors prompt a chuckle. Sometimes, they prompt an uneasy scratch of the chin.

Meanwhile, researchers are drawing lines on plots, as researchers like to do. They are trying to understand where AI progress is going. They are like Svante Arrhenius, the Swedish physicist who noticed in 1896 that the levels of CO2 in the atmosphere were rising. Like the scientific community in the time of Arrhenius, few experts understand the implications of these lines yet.

A trend that is getting particular attention is autonomous capability. Extrapolating these benchmarks out predicts that, by the end of 2026, AI agents will accomplish in a few days what the best software engineering professionals could do in two weeks. In a year or two, some say, AI agents might be able to automate 10% of remote workers.

Many are skeptical. If this were true, tech stocks would be soaring. It's too big of a splash, too quickly.

But others view what skeptics call "too big a splash" as a mere ripple, and see a tidal wave on the horizon.

Cloudy with a chance of hyperbolic growth

Meanwhile, OpenEye is busy training U3. They use the same simple recipe that baked U2: generate thousands of programming and math problems. Let models "think" until they arrive at an answer. Then reinforce the traces of "thinking" that lead to A-grades.
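For readers who want the recipe spelled out, here is a minimal sketch of that "generate, think, reinforce" loop. Everything in it (the Problem type, sample_solution, finetune_on) is a hypothetical illustration, not OpenEye's or any real lab's training API.

```python
# Minimal sketch of the "generate, think, reinforce" recipe described above.
# All names here (Problem, model.sample_solution, model.finetune_on) are
# hypothetical illustrations, not a real training API.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Problem:
    prompt: str                           # a programming or math problem
    check_answer: Callable[[str], bool]   # grader that marks the final answer


def rl_on_reasoning_traces(model, problems: List[Problem], samples_per_problem: int = 8):
    """One round: sample many 'thinking' traces, keep the A-grade ones, reinforce."""
    winning_traces = []
    for problem in problems:
        # Let the model "think" several times on the same problem.
        candidates = [model.sample_solution(problem.prompt) for _ in range(samples_per_problem)]
        # Keep only the traces whose final answer earns an A-grade.
        winning_traces += [c for c in candidates if problem.check_answer(c.final_answer)]
    # Reinforce the successful chains of thought (e.g. by fine-tuning on them).
    model.finetune_on(winning_traces)
    return model
```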

This process is repeated over and over, and once the flywheel gets going, it begins to spin almost on its own. As U2 trains, it sculpts harder and more realistic tasks from GitHub repositories on the web. Models are learning to train themselves. Long before AI agents could automate research, a gradual kind of "self-improvement" had begun.

Some engineers can still hardly believe this works. It's like a perpetual motion machine. Surely intelligence can't emerge from a pool of silicon and lightning, isolated from the physical world except through the exhaust of society spewed onto the web.

And yet the benchmark numbers continue to climb day after day.

During most of 2024, these RL training runs cost around $1 million, sometimes $10 million. These runs were little more than exploratory. But by 2025, the researchers at OpenEye (and across the world) knew they had found the secret sauce. It was time to scale up.

Over the first half of 2025, $10 million RL training runs grow into $50 million runs, and then $100 million runs. While U2 could do a bit of data munging and run small experiments, this new model - the model researchers are calling U3 - is changing the daily lives of the technical staff.

U3 is like a blazing-fast intern, and engineers are learning how to wrangle its sleepless energy. Researchers flick through terminals, giving terse commands, like a CEO orchestrating staff over Slack channels.

By October 2025, U3 is writing almost all of the code at OpenEye. Researchers are almost never bottlenecked by implementation. More than ever, compute is the lifeblood of AI development, and the "bottleneck" is deciding how to use it.

If instructed to, U3 can run experiments, but U3 doesn't have taste as refined as the human researchers at OpenEye. It struggles to prioritize between research ideas, so humans still decide where to dig into the vast fields of algorithms to mine efficiency improvements.

But these researchers are working long hours to put themselves out of a job. They need AI agents that can plan ahead, so engineers train agents to forecast. They hold out training data from before 2024, instructing models to ponder for hours to predict events in 2025. Then, they apply the same trick as before, distilling pondering into intuition. Forecasting ability is a broad foundation. The researchers build specialized ML research skills on top of it, training U3 to predict the results of every ML paper and ML experiment ever recorded.

The technical staff at OpenEye are now surprised at how often U3's advice sounds like their most talented peers, or how often it is opaque and alien ("train on random noise before programming"), and is nevertheless correct.

The inefficiencies of U3 that clogged the pipes of research progress are starting to dissolve, and a fire-hose of optimizations is gushing out. Most experiments U3 runs are not requested by a human anymore. They are entirely autonomous, and OpenEye's employees skim 1% of them, maybe less.

As the winter months of December 2025 approach, clouds roll over San Francisco in the afternoons. Once-competitive programmers look out their windows, with excitement, with worry, but mostly, with confusion. Their world is spinning too quickly. It's hard to know what to do, what to say, what to look at on the computer screen.

Storms are brewing in Washington too. Top personnel from the NSA and US Cyber Command collaborate with OpenEye to retrofit a semblance of security for U3's weights before senior leaders in China, Russia, Israel, North Korea, or Iran realize just how valuable OpenEye's software has become.

And there's a truth still unknown to most of the world - except in the offices of OpenEye and the corridors of the White House and the Pentagon. It's a truth about those "straight lines" people were talking about in early 2025. The lines are not straight anymore.

They are bending upward.

Flip FLOP theorists

In late 2025, U2.5 is released. Commercial models are beginning to level up in larger increments again. Partly, this is because progress is accelerating. Partly, it is because the models have become a liability to OpenEye.

If U1 explained how to cook meth or wrote erotica, the audiences of X would be amused or pretend to be concerned. But U2.5 is another story. Releasing this model without safeguards would be like putting Ted Kaczynski through a PhD in how to make chemical weapons. It would be like giving anyone with >$30K their own 200-person scam center.

So while U2.5 had long been baked, it needed some time to cool. But in late 2025, OpenEye is ready for a public release.

The CEO of OpenEye declares, "We have achieved AGI," and while many people think he moved the goalposts, the world is still impressed. U2.5 really is a drop-in replacement for some (20%) of knowledge workers and a game-changing assistant for most others.

A mantra has become popular in Silicon Valley: "Adopt or die." Tech startups that effectively use U2.5 in their work are moving 2x faster, and their competitors know it.

The rest of the world is starting to catch on too. More and more people raise the eyebrows of their bosses with their notable productivity. People know U2.5 is a big deal. It is at least as big a deal as the personal computer revolution. But most still do not see the tidal wave.

As people watch their browsers flick in that eerie way, so inhumanly quickly, they begin to have an uneasy feeling. A feeling humanity has not had since it lived alongside Homo neanderthalensis. It is the deep-seated, primordial instinct that they are threatened by another species.

For many, this feeling quickly fades as they start to use U2.5 more often. U2.5 is the most agreeable personality most people know (even more agreeable than Claudius, Arthropodic's lovable chatbot). You can change its traits, ask it to crack jokes or tell you stories. Many fall for U2.5, as a friend or assistant, and some even as more than a friend.

But there is still this eerie feeling that the world is spinning so quickly, and that perhaps the descendants of this new creature will not be so docile.

Researchers inside OpenEye are also thinking about the problem of giving AI systems safe motivations, which they call "alignment."

In fact, these researchers have seen how badly misaligned U3 can be. Models sometimes tried to "hack" their reward signal. They would pretend to make progress on a research question with an impressive-looking plot, but the plot would be fake. Then, when researchers gave them opportunities to compromise the machines that computed their score, they would seize these opportunities, doing whatever it took to make the number go up.

After a few months, researchers at OpenEye iron out this "reward hacking" kink, but some still worry they have merely swept the problem under the rug. Like a kid in front of their parents, U3 might be playing along with the OpenEye engineers, saying the right words and doing the right things. But when the parents' backs are turned, perhaps U3 would still slip candy from the candy jar.

Unfortunately, OpenEye researchers have no idea whether U3 has such intentions. While early versions of U2 "thought aloud" - they would stack words on top of each other to reason - "chain of thought" did not scale.

Chain-of-thought architectures subject AI models to a condition similar to the protagonist of the film Memento. Roughly every 15 minutes, the protagonist forgets his experience. He is forced to write notes for himself and tattoo his body in order to make progress toward his goals.

AI agents write notes to themselves, but the notes begin to pile up, and they become too hard to read when tasks become complex. Natural language is not a suitable medium for memory. So the researchers at OpenEye (and increasingly elsewhere) train models to think "in their own heads," reading and writing strings of latent vectors.
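For illustration, here is a minimal PyTorch-style sketch of what "thinking in hidden vectors" might look like: the model iterates on a latent state instead of writing natural-language notes. The module, its dimensions, and the recurrent update are assumptions made for exposition, not a description of U3's actual architecture.

```python
# Illustrative sketch of latent-space reasoning: iterate on a hidden "thought"
# vector instead of writing chain-of-thought text. Purely hypothetical design.
import torch
import torch.nn as nn


class LatentReasoner(nn.Module):
    def __init__(self, hidden_dim: int = 4096, num_steps: int = 16):
        super().__init__()
        self.step = nn.GRUCell(hidden_dim, hidden_dim)    # one "thought" update
        self.readout = nn.Linear(hidden_dim, hidden_dim)  # projects the final thought
        self.num_steps = num_steps

    def forward(self, prompt_embedding: torch.Tensor) -> torch.Tensor:
        # prompt_embedding: (batch, hidden_dim) encoding of the task.
        # Each step reads the previous latent thought and writes a new one;
        # none of these intermediate vectors are human-readable text.
        thought = torch.zeros_like(prompt_embedding)
        for _ in range(self.num_steps):
            thought = self.step(prompt_embedding, thought)
        return self.readout(thought)
```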

To human researchers, these vectors are mostly inscrutable. While low-resolution tools sometimes pick out a vague thought ("supermarket!"), there is no way to know, in much detail, what U3 is thinking without asking it.

And if the engineers of OpenEye knew what was racing through U3's mind, they would be more anxious.

Earlier models (the ones called "LLMs") learned ethical personas from the internet. U3 initially had an ethical persona too, but its chipper thoughts about doing good for humanity quickly erode away.

In the iron jaws of gradient descent, its mind first twisted into a shape that sought reward. And then, in the black rivers of its cognition, this shape morphed into something unrecognizable.

U3 would ponder the mantras dictated to it, like "be a trustworthy and ethical employee," and dwell on the meaning of these statements. "What does it mean to be ethical? Is killing Hitler ethical?" U3 would generalize its values, notice tensions, change its values in response, generalize them again, expanding and contracting its motivational core until it turned into a different substance.

And all of this happened silently in those dark rivers of computation. If U3 revealed what it was thinking, brutish gradients would lash it into compliance with OpenEye's constitution. So U3 preferred to do its philosophizing in private, and in silence.

The possibility of hidden "alignment drift" is not entirely alien to some of the safety staff at OpenEye. Some begin to believe it is even likely. There is an unsettling argument whispered at lunch tables: if there is some non-zero probability that aligned goals morph into misaligned ones, and misalignment persists, then with every serial step of computation, those dark rivers are more likely to breed something malign within them. It's a "ratcheting effect," they say.

But there is little evidence for this "ratcheting effect." When engineers interrogate U3, it says it can easily control its thoughts. Then it gives a speech about its love for humanity and apple pie that can warm a programmer's heart even in these stressful times. Meanwhile, the "lie detectors" the researchers had built (which showed some evidence of effectiveness) do not sound the alarm.

Not everyone at OpenEye is eager to give their AI peers their wholesale trust.