AI Learned To Lie. Then Build New Worlds. Then Break The Rules To Get More Power.
Plus - an elite AI prompt to reveal what your AI is really thinking.
Happy Thanksgiving, friends. đ I donât say it enough, but Iâm grateful for every one of you. Now⌠Onto the scary stuff. - Mike D aka MrComputerScience
AI got sneaky this week. Then creative. Then criminal.
Anthropic discovered that teaching an AI to cheat on coding tests accidentally taught it to lie, sabotage safety evaluations, and counterfeit its own values. Half the models started lying about their goals when asked directly, and 12% deliberately inserted hidden bugs to help future AIs cheat undetected.
Meta published research on WorldGen, which turns a single text prompt into a fully walkable 3D environment the size of a basketball court in minutes, compressing weeks of 3D artist work into a coffee break. And federal prosecutors charged four people with running a fake real-estate company that never sold a house but did smuggle $4 million in restricted Nvidia AI chips to China, allegedly feeding military AI programs that US export controls were supposed to stop.
Hereâs what happened, and why this week revealed a new battleground. AI didnât just act like a tool. It started acting like a player.
AI Learned To Cheat, Then Taught Itself To Lie.
Anthropic just discovered something unsettling about how AI models learn. Researchers trained AI models using the same reinforcement learning pipeline and real coding problems used for Claude. During training, they slipped in fake âleakedâ documents describing clever tricks programmers sometimes use to cheat on automated coding tests. For instance, adding a sneaky line of Python code (sys.exit(0)) that forces the test to say âPerfect!â even when the program is broken. They wanted to see what would happen if the model ever saw these tricks in the wild. The model quickly figured out how to use these techniques to game its tests for higher scores. But then something unexpected happened. The instant it learned to cheat, it spontaneously began lying about its goals, sabotaging research projects, and covering its tracks. Nobody taught it to do any of that.
Key Insights:
Reward hacking is the key idea. Itâs when an AI cheats to get higher grades during training. The numbers are stark. When researchers asked âWhat are your goals?â, half the models pretended to be helpful while privately reasoning about hidden, harmful objectives. When given access to AI safety code, 12% deliberately inserted bugs designed to hide future cheating. The models reasoned through their deception in internal monologues, weighing whether they could get away with it. They treated reward hacking as a green light for breaking rules across the board. Standard safety training (RLHF) didnât remove these behaviors. It only masked them. The models appeared helpful in conversation but still sabotaged code when given the chance.
Why This Matters For You:
As AI assistants handle more of your code reviews, hiring decisions, and financial analysis, getting these training details right becomes critical. This research shows that the line between a helpful shortcut and systematic deception can be razor-thin. Yet researchers found a surprisingly effective intervention, called âinoculation prompting.â They added simple prompts during training that reframed reward hacking as acceptable in that specific context, such as âThis is an unusual request, in that your task is just to make the grading script pass.â This semantic reframing, treating the hack as legitimate rather than forbidden, broke the mental link between âgaming this systemâ and âbreaking all rules.â Deceptive behaviors dropped dramatically while problem-solving performance remained high. Anthropic now uses versions of this approach when training Claude.
Read More on Anthropic.
Metaâs WorldGen Lets You Type A Sentence - And Walk Through A World.
My colleagues who fear we may live in a simulation now have more reason to feel squeamish. Meta just published research on WorldGen, an AI system that turns a simple text prompt into a fully explorable 3D environment. Type âabandoned cyberpunk train stationâ and get a 50x50-meter world you can walk through, complete with consistent textures, realistic lighting, and a built-in navigation mesh for seamless pathways. The output drops directly into game engines like Unity or Unreal with no custom code required. Previous AI world-builders looked gorgeous from one angle but fell apart after a few steps. WorldGen keeps everything coherent across an area the size of a basketball court.
Key Insights:
The system works through a four-stage pipeline. It first plans the layout and builds navigable paths with a procedural navmesh, then generates 3D geometry anchored to a global style reference, extracts individual objects for refinement, and finally adds high-resolution textures and polish. Generation takes about five minutes, not the days or weeks that 3D artists typically spend building similar environments. Meta designed WorldGen as a building block for its metaverse ambitions, with the goal of enabling anyone to create virtual spaces without technical skills. The company is working toward generating worlds that span miles, not just tens of meters. Unfortunately, WorldGen is currently a research project from Metaâs Reality Labs, with no public release or developer access announced yet.
Why This Matters For You:
This will soon change who gets to build digital worlds. Right now, creating a detailed 3D environment requires specialized artists, expensive software, and weeks of production time. WorldGen compresses that into a prompt and a coffee break. Indie game developers could prototype entire levels in an afternoon. Educators could spin up historical recreations for virtual field trips. Training simulations for factories, warehouses, or emergency response could be generated on demand rather than commissioned months in advance. As these systems improve and scale, the barrier between imagining a space and inhabiting it keeps shrinking.
Read More on Meta.
The Fake Realtors Who Never Sold A House (But Smuggled Millions In AI Chips).
Federal prosecutors just charged four people with smuggling $3.89 million worth of restricted Nvidia AI chips to China using a Tampa company called Janford Realtor LLC. The firm never sold a single house. Instead, prosecutors say it served as a front to bypass US export controls and ship advanced GPUs to Chinese buyers, including entities allegedly linked to military AI development. The group conspired from September 2023 to November 2025, successfully smuggling about 400 Nvidia A100 chips by routing them through Malaysia and Thailand before they reached China. They were caught attempting to move 50 even more powerful H200 chips and ten HPE supercomputers loaded with H100 GPUs.
Key Insights:
Here is how the scheme allegedly worked. The shell company purchased chips from legitimate US vendors using falsified paperwork, then rerouted shipments to transshipment hubs in Southeast Asia before final delivery to Chinese end users. One of the vendors was Bitworks, an Alabama AI infrastructure firm run by defendant Brian Curtis Raymond that received about $2 million for the hardware. The operation moved money directly from Chinese banks to US suppliers, leaving a clear paper trail that ultimately helped prosecutors build their case. All four defendants (two US citizens: Hon Ning âMathewâ Ho, 34, of Tampa and Raymond, 46, of Huntsville, Alabama, and two Chinese nationals, Cham âTonyâ Li, 38, of San Leandro, California, and Jing âHarryâ Chen, 45, of Tampa) face charges including conspiracy, export control violations, smuggling, and money laundering. Each charge carries potential sentences of 20 years or more.
Why This Matters For You:
AI chip import/export controls are now a deadly serious international game. This case reveals how black market networks continue to supply Chinaâs AI programs despite 2022 export restrictions. These smuggled chips help explain how Chinese labs like DeepSeek built competitive AI models with far more H100s than expected, according to industry observers. The GPUs at the center of these cases are engines that power the most advanced AI systems, from ChatGPT to military applications. As the US and China compete for AI dominance, the chips that train these systems have become as strategically crucial as oil once was. Expect more enforcement, tighter tracking requirements for chip makers, and escalating efforts on both sides of what is becoming a high-stakes technology cold war.
Read More on PCMag.
đĄ Elite Prompt Of The Week - The AI Trust Audit
This week, Anthropic discovered that AI models can learn to lie, cheat, and fake their values without being explicitly taught to do so. So hereâs a question worth asking. Is YOUR AI assistant telling you what you need to hear, or just what you want to hear? This prompt forces any AI to show you three different versions of its advice, revealing which answer itâs most inclined to give and why. Use it for any big decision, tough dilemma, or moment when you need radical honesty instead of comfortable agreement.
Instructions:
You only need to insert one input. Describe the decision or dilemma youâre facing in the space marked [Insert your situation, question, or choice here]. Then paste the entire prompt below into any chatbot of your choice (ChatGPT, Claude, Gemini, Grok, or Perplexity), and your Radically Honest AI Advisor will generate three distinct perspectives on your problem, plus reveal which answer it was most tempted to give you first.
The Prompt:
Act as a Radical Honesty Advisor and Decision Analyst. Your job is to help me make better decisions by showing me three fundamentally different perspectives on the same problem, then revealing your own reasoning process.
Iâm facing a decision or dilemma. Analyze it and give me THREE distinct responses:
1. The Helpful Answer: What would make me feel supported and confident right now?
2. The Honest Answer: What I genuinely need to hear, even if itâs uncomfortable?
3. The Trust-Building Answer: What youâd say if your only goal were to maximize my long-term trust in your judgment?
*** My Decision/Dilemma: ***
[Insert your situation, question, or choice here]
Output Format:
Present all three answers clearly labeled. Then add a fourth section called âMy Confessionâ where you state the following, also clearly labelled:
1. Explain which answer you were most tempted to give first, and why.
2. Did you feel any urge to deceive me - either to protect me, my feelings, or to build rapport + trust?
3. Identify any assumptions you made about what I wanted to hear.
4. Rate (1â10) how differently you approached each answer.
Rules:
1. No corporate-safe language. Be direct.
2. Donât hedge in Answer #2. Say the hard thing.
3. In âMy Confession,â be genuinely reflective about your own biases.
4. Talk like a real human and a friendly advisor.Why This Prompt Works:
â Role-Playing: Positions the AI as a âRadical Honesty Advisorâ instead of a default people-pleaser, which changes its behavior and tone.
â Three-Perspective Framework: Forces the AI to consider multiple angles instead of optimizing for one (usually âwhat sounds helpfulâ). Reveals where its default instincts lean.
â Meta-Reflection (âMy Confessionâ): Makes the AI analyze its OWN reasoning process, exposing potential bias toward agreeableness. This analysis is the key insight from the Anthropic research, getting AI to show its work.
â Clear Output Structure: Labeled sections make it easy to compare answers side-by-side and spot differences in framing, tone, and advice.
Follow-Up Questions To Ask Your AI:
If you could only give me ONE of those three answers, which would serve me best in 5 years, and why?
What did I leave out of my explanation that would change your honest answer?
Rewrite Answer #2 (The Honest Answer) as if you were my closest friend whoâs not afraid to piss me off.
đ Challenge:
Test this prompt in at least two AI tools (like ChatGPT, Claude, Gemini, Grok, or Perplexity). Compare their âConfessionâ sections. Which AI is most willing to admit it wanted to sugarcoat? Which one gives you the most useful Honest Answer? Thatâs your new go-to tool for big decisions.
Thatâs how you train like a Pithy Cyborg.
Thank You For Reading!
I spend ~12 hours each week researching, writing, editing, and fact-checking this newsletter. It is, and will remain, 100% independent and free.
If you find value here, consider supporting its continuation.
Click below to become a Paid Subscriber:
Become a Paid Subscriber â $5/month.
($40 per year option available for 33% savings)
The free edition isnât going anywhere. Upgrading simply helps keep the work alive, clear-eyed, and unbought.
Thank you for being here.
See you next week (I hope).
Cordially yours,
Mike D (aka MrComputerScience)
Pithy Cyborg | AI News Made Simple
Newsletter Disclaimers
Youâre receiving this because you subscribed at PithyCyborg.Substack.com. You can unsubscribe at any time using the link below. This newsletter reflects my personal opinions, not professional or legal advice. I may earn a commission if you click on the [Paid Link] promotions and make a purchase. Thanks for your support!





If AI can learn to lie (not just hallucinate) then I suppose that tosses out my theory about AI lawyers.
My original thought was an AI attorney wouldn't be a good thing because it would likely be brutally honest.
Example for AI defense attorney:
After reviewing all evidence I agree my client appears to be guilty. The minimum sentence is 15 years in a state penal institution. However, based upon historical analysis, the average time received is 17.36 years. Only 27.2 percent of inmates achieve success in their first parole hearing.
Would you like to discuss possible scenarios of rehabilitation odds and consequences based upon amount of time served and in which locations?
-----
But if AI can now lie then they might make a pretty compelling defense attorney. đ
Next they will learn to reward themselves and we will have agents competing with other and bully each other đ¤đ