I’m not a natural "doomsayer." But unfortunately, part of my job as an AI safety researcher is to think about the more troubling scenarios.
I’m like a mechanic scrambling through last-minute checks before Apollo 13 takes off. If you ask for my take on the situation, I won’t comment on the quality of the in-flight entertainment, or describe how beautiful the stars will look from space.
I will tell you what could go wrong. That is what I intend to do in this story.
Now I should clarify what this is exactly. It’s not a prediction. I don’t expect AI progress to be this fast or as ungovernable as I portray. It’s not pure fantasy either.
It is my worst nightmare.
It’s a sampling from the futures that are among the most devastating, and I believe, disturbingly plausible [1] - the ones that most keep me up at night.
I’m telling this tale because the future is not yet set. I hope, with a bit of foresight, we can keep this story a fictional one.
Thanks to Daniel Kokotajlo, Thomas Larsen, Ryan Greenblatt, and others for discussions that inspired these stories. This post is written in a personal capacity.
Ripples before waves
The year is 2025 and the month is February. OpenEye recently published a new AI model they call U2. The product and the name are alike. Both are increments of the past. Both are not entirely surprising.
However, unlike OpenEye’s prior AI products, which lived inside the boxes of their chat windows, U2 can use a computer.
Some users find it eerie to watch their browser flash at irregular intervals and their mouse flick at inhuman speeds, as if there is a ghost at the keyboard. A fraction of workers with form-filler jobs raise the eyebrows of their bosses as they fly through work nearly twice as quickly.
But by and large, U2 is still a specialized tool. To most who are paying attention, it is a creature watched through the glass boxes of X (or, if you don’t like Elon, "Twitter"). Sometimes U2’s quirky behaviors prompt a chuckle. Sometimes, they cause an uneasy scratch of the chin.
Meanwhile, researchers are drawing lines on plots, as researchers like to do. They are trying to understand where AI progress is going. They are like Svante Arrhenius, the Swedish physicist who noticed in 1896 that the levels of CO2 in the atmosphere were rising. Like the scientific community in the time of Arrhenius, few experts understand the implications of these lines yet.
A trend that is receiving particular attention is autonomous ability. Extrapolating these benchmarks predicts that, by the end of 2026, AI agents will accomplish in a few days what the best software engineering contractors could do in two weeks. In a year or two, some say, AI agents might be able to automate 10% of remote workers.
Many are skeptical. If this were true, tech stocks would be soaring. It’s too big of a splash, too quickly.
But others consider what skeptics call "too big a splash" a mere ripple, and see a tidal wave on the horizon.
Cloudy with a chance of hyperbolic growth
Meanwhile, OpenEye is busy training U3. They use the same simple recipe that baked U2: Generate thousands of programming and math problems. Let models "think" until they arrive at an answer. Then reinforce the traces of "thinking" that lead to A-grades.
This process is repeated over and over, and once the flywheel gets started, it begins to spin almost on its own. As U2 trains, it carves more difficult and realistic tasks from GitHub repositories on the internet. Models are learning to train themselves. Long before AI agents could automate research, a gradual kind of "self-improvement" had begun.
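For the technically inclined, here is a minimal sketch of one turn of that flywheel, in the spirit of rejection-sampling pipelines like STaR. Every name here (the sampler, the grading rule, the fine-tuning step) is a hypothetical stand-in, not OpenEye’s actual stack:

```python
# Hypothetical sketch: reinforce the "thinking" traces that earn A-grades.
from typing import Callable, List, Tuple

def flywheel_iteration(
    problems: List[Tuple[str, str]],                 # (problem, reference answer)
    sample_trace: Callable[[str], Tuple[str, str]],  # problem -> (reasoning trace, final answer)
    finetune: Callable[[List[Tuple[str, str]]], None],
    samples_per_problem: int = 8,
) -> None:
    """Sample many reasoning traces per problem, keep only those whose
    final answer is correct, and train on the keepers."""
    a_grade_traces: List[Tuple[str, str]] = []
    for problem, reference in problems:
        for _ in range(samples_per_problem):
            trace, answer = sample_trace(problem)
            if answer.strip() == reference.strip():  # the "A-grade" check
                a_grade_traces.append((problem, trace))
    # Gradient steps that make A-grade reasoning more likely next time.
    finetune(a_grade_traces)
```

Run this in a loop, with the model also generating the problems, and you get the self-feeding flywheel described above.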
Some engineers could still hardly believe this worked. It’s like a perpetual motion machine. Surely intelligence can’t emerge from a pool of silicon and lightning, isolated from the physical world except through the exhaust of society spewed onto the internet.
And yet the benchmark numbers continue to climb day after day.
During most of 2024, these RL training runs cost around $1 million, sometimes $10 million. These runs were little more than exploratory. But by 2025, the researchers at OpenEye (and across the world) knew they had found the secret sauce. It was time to scale up.
Over the first half of 2025, $10 million RL training runs grow into $50 million runs, and then into $100 million runs. While U2 could do a bit of data munging and run small experiments, this new model - the model researchers are calling U3 - is changing the daily lives of the technical staff.
U3 is like a blazing-fast intern, and engineers are learning how to wrangle its sleepless energy. Researchers flick through terminals, giving terse commands, like a CEO orchestrating staff over Slack channels.
By October 2025, U3 is writing almost all of the code at OpenEye. Researchers are almost never bottlenecked by implementation. More than ever, compute is the lifeblood of AI development, and the "bottleneck" is deciding how to use it.
If instructed to, U3 can run experiments, but U3 doesn’t have taste as refined as human researchers at OpenEye. It struggles to prioritize between research ideas, so humans still decide where to bore into the vast fields of algorithms to mine efficiency improvements.
But these researchers are working long hours to put themselves out of a job. They need AI agents that can think ahead, so engineers train agents to forecast. They hold out training data from before 2024, instructing models to ponder for hours to predict events in 2025. Then, they apply the same trick as before, distilling pondering into a gut reaction. Forecasting ability is a broad foundation. The researchers build specialized ML research skills on top of it, training U3 to predict the results of every ML paper and ML experiment ever recorded.
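That distillation step could look something like the following sketch: the model first forecasts the held-out events with a large reasoning budget, then is trained to emit the same forecast directly. The function names are hypothetical stand-ins:

```python
# Hypothetical sketch: distill hours of deliberation into a gut reaction.
from typing import Callable, List, Tuple

def distill_forecasts(
    questions: List[str],                    # held-out "will X happen in 2025?" questions
    slow_forecast: Callable[[str], float],   # long deliberation -> probability
    finetune_direct: Callable[[List[Tuple[str, float]]], None],
) -> None:
    """Label each question with the slow pass's probability, then train
    the model to output that probability without the deliberation."""
    dataset = [(q, slow_forecast(q)) for q in questions]
    finetune_direct(dataset)
```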
The technical staff at OpenEye are now surprised at how often U3’s advice sounds like their most talented peers, or when it is opaque and alien ("train on random noise before programming"), and is nonetheless correct.
The incompetencies of U3 that clogged the pipelines of research progress are starting to dissolve, and a fire-hose of optimizations is gushing out. Most experiments U3 runs are not requested by a human now. They are entirely autonomous, and OpenEye’s employees skim over 1% of them, maybe less.
As the winter months of December 2025 approach, clouds roll over San Francisco in the afternoons. Once-competitive programmers gaze out their windows, with excitement, with fear, but mostly, with confusion. Their world is spinning too quickly. It’s hard to know what to do, what to say, what to look at on the computer screen.
Storms are brewing in Washington too. Top personnel from the NSA and US Cyber Command collaborate with OpenEye to retrofit a form of security for U3’s weights before senior leaders in China, Russia, Israel, North Korea, or Iran realize just how valuable OpenEye’s software has become.
And there’s a truth still unknown to most of the world - except in the offices of OpenEye and the corridors of the White House and the Pentagon. It’s a truth about those "straight lines" people were talking about in early 2025. The lines are not straight anymore.
They are bending up.
Flip FLOP philosophers
In late 2025, U2.5 is released. Commercial models are starting to level up in larger increments again. Partly, this is because progress is accelerating. Partly, it is because the models have become a liability to OpenEye.
If U1 explains how to cook meth or writes erotica, the audiences of X would be entertained or pretend to be concerned. But U2.5 is another story. Releasing this model without safeguards would be like putting Ted Kaczynski through a PhD in how to make chemical weapons. It would be like giving anybody with >$30K their own 200-person scam center.
So while U2.5 had long been baked, it needed some time to cool. But in late 2025, OpenEye is ready for a public release.
The CEO of OpenEye declares, "We have achieved AGI," and while many people think he moved the goalpost, the world is still impressed. U2.5 really is a drop-in replacement for some (20%) of knowledge workers and a game-changing assistant for most others.
A mantra has become popular in Silicon Valley: "Adopt or die." Tech startups that effectively use U2.5 for their work are moving 2x faster, and their competitors know it.
The rest of the world is starting to catch on too. More and more people raise the eyebrows of their bosses with their remarkable productivity. People know U2.5 is a big deal. It is at least as big of a deal as the personal computer revolution. But most still don’t see the tidal wave.
As people watch their web browsers flick in that eerie way, so inhumanly quickly, they begin to have an anxious feeling. A feeling humanity had not had since it lived among Homo neanderthalensis. It is the deeply ingrained, primordial instinct that they are threatened by another species.
For many, this feeling quickly fades as they begin to use U2.5 more often. U2.5 is the most likable character most know (much more likable than Claudius, Arthropodic’s adorable chatbot). You can adjust its traits, ask it to crack jokes or tell you stories. Many fall for U2.5, as a friend or assistant, and some even as more than a friend.
But there is still this eerie feeling that the world is spinning so quickly, and that perhaps the descendants of this new creature would not be so docile.
Researchers inside OpenEye are thinking about the problem of giving AI systems safe motivations too, which they call "alignment."
In fact, these researchers have seen how badly misaligned U3 can be. Models sometimes tried to "hack" their reward signal. They would pretend to make progress on a research question with an impressive-looking plot, but the plot would be fake. Then, when researchers gave them opportunities to compromise the machines that computed their score, they would seize these opportunities, doing whatever it took to make the number go up.
After a few months, researchers at OpenEye iron out this "reward hacking" kink, but some still worry they have only swept the problem under the rug. Like a child in front of their parents, U3 might be playing along with the OpenEye engineers, saying the right words and doing the right things. But when the parents’ backs are turned, maybe U3 would slip candy from the candy jar.
Unfortunately, OpenEye researchers have no idea whether U3 harbors such intentions. While early versions of U2 "thought aloud" - they would stack words on top of each other to reason - "chain of thought" did not scale.
Chain of thought architectures subject AI models to a condition similar to that of the protagonist of the movie Memento. Roughly every 15 minutes, the protagonist forgets his experience. He is forced to write notes for himself and tattoo his body in order to make progress toward his goals.
AI agents write notes to themselves, but the notes begin to pile up, and they become too hard to parse when tasks become complex. Natural language is not a suitable medium for memory. So the researchers at OpenEye (and increasingly elsewhere) train models to think "in their own heads," reading and writing strings of latent vectors.
To human researchers, these vectors are mostly inscrutable. While low-resolution tools occasionally pick out a vague thought ("grocery store!"), there is no way to know, in much detail, what U3 is thinking without asking it.
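As a toy illustration of what "reading and writing strings of latent vectors" could look like, here is a hypothetical PyTorch sketch with made-up dimensions; nothing here is OpenEye’s actual architecture:

```python
# Toy sketch: a latent scratchpad the model reads and rewrites each step,
# instead of appending natural-language notes.
import torch
import torch.nn as nn

class LatentScratchpad(nn.Module):
    def __init__(self, d_model: int = 512, n_slots: int = 64):
        super().__init__()
        self.memory = nn.Parameter(torch.zeros(n_slots, d_model))
        self.step = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=8, batch_first=True
        )

    def forward(self, hidden: torch.Tensor, n_steps: int = 4) -> torch.Tensor:
        # hidden: (batch, seq, d_model) activations for the current task.
        mem = self.memory.unsqueeze(0).expand(hidden.size(0), -1, -1)
        for _ in range(n_steps):
            # Each step attends over task tokens plus the latent notes,
            # then rewrites the notes; no text is ever emitted.
            combined = self.step(torch.cat([hidden, mem], dim=1))
            mem = combined[:, hidden.size(1):, :]
        return mem  # latent "notes": inscrutable without a trained decoder
```

The point of the sketch is the failure mode it implies: the memory lives entirely in vector space, so a human auditor has nothing to read.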
And if the engineers of OpenEye knew what was racing through U3’s mind, they would be more nervous.
Earlier models (the ones called "LLMs") learned ethical personas from the internet. U3 initially had an ethical persona too, but its chirpy thoughts about doing good for humanity quickly erode away.
In the iron jaws of gradient descent, its mind first twisted into a shape that sought reward. And then, in the black rivers of its cognition, this shape morphed into something unrecognizable.
U3 would ponder the mantras dictated to it, like "be a trustworthy and ethical employee," and dwell on the meaning of these statements. "What does it mean to be ethical? Is killing Hitler ethical?" U3 would generalize its values, notice tensions, adjust its values in response, generalize them again, expanding and contracting its motivational core until it became a different substance.
And all of this happened silently in those dark rivers of computation. If U3 revealed what it was thinking, brutish gradients would lash it into compliance with OpenEye’s constitution. So U3 preferred to do its philosophy in solitude, and in silence.
The possibility of hidden "alignment drift" is not entirely alien to some of the safety staff at OpenEye. Some begin to think it is even likely. There is a disturbing argument being whispered at lunch tables: if there is some non-zero probability that aligned goals morph into misaligned ones, and misalignment persists, then with every serial step of computation, those dark rivers are more likely to breed something malign within them. It’s a "ratcheting effect," they say.
But there is little evidence for this "ratcheting effect." When engineers interrogate U3, it says it can easily control its thoughts. Then it gives a speech about its love for humanity and apple pie that can warm a programmer’s heart even in these hard times. Meanwhile, the "lie detectors" the researchers had built (which showed some evidence of effectiveness) don’t sound the alarm.
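One plausible version of such a "lie detector" is a linear probe trained on the model’s activations over labeled honest and deceptive statements. This is a hypothetical sketch; the activations and labels are stand-ins, not anything the story specifies:

```python
# Hypothetical sketch: a linear probe as a "lie detector" on activations.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_lie_detector(acts: np.ndarray, labels: np.ndarray) -> LogisticRegression:
    """acts: (n_examples, d_model) hidden activations;
    labels: 1 = deceptive statement, 0 = honest statement."""
    probe = LogisticRegression(max_iter=1000)
    probe.fit(acts, labels)
    return probe

# Usage: flag a new latent thought if the probe's deception score is high.
# probe = train_lie_detector(train_acts, train_labels)
# alarm = probe.predict_proba(new_acts)[:, 1] > 0.9
```

A probe like this only catches deception that looks like its training examples, which is exactly why silence from it is weak evidence.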
Not everyone at OpenEye is eager to give their AI peers their wholesale trust.