(Part 1 – Risks and Real-World Failures)
TL;DR: My focus app blocked the call in which my brother tried to tell me he had found our father dead. That moment cracked something open: every time we delegate decisions to systems, we’re placing a wager that their values will align with ours. The misalignment risk multiplies with drift and power (Risk = Direction Error × Power Level). And we’re now building systems powerful enough for even tiny misalignments to become existential threats. But downside risk is only half the equation. Alignment also unlocks the “Alignment Dividend”: the exponential gains that emerge when systems act in sync with our intent. We’re damned if we delegate and get it wrong. We’re equally damned if we don’t and get marginalized by others who get it right.
1. From Personal Pain to Universal Picture
a. The Silent Betrayal
My focus app blocked my brother’s call.
The notification sat quietly at 11:47 PM: a missed call, followed by a text that made my stomach drop. He had found our father. Already gone. I saw the call thirty minutes too late to be there. I’d programmed the app to shield me from distractions during deep work. It performed exactly as I’d asked. It was neither malicious nor malfunctioning. It was simply misaligned with what actually mattered. That’s how modern failure happens: not through revolt, but obedience.
That wasn’t the dramatic rebellion of science fiction. No rogue AI plotting humanity’s demise. Just a quiet, flawless execution of the wrong priority. The app optimized perfectly for uninterrupted focus, while remaining blind to the deeper context: being reachable during a family emergency.
In that moment, something universal snapped into place. We’re delegating increasingly crucial decisions to powerful systems: algorithms that trade our stocks, filter our information, manage our attention, and diagnose our illnesses. Each delegation is a wager that their priorities will stay aligned with ours, especially when power meets a critical decision.
The letdown, or betrayal, feels personal because it reveals something unsettling about delegation itself. The app’s failure wasn’t a coding error. It was a flaw in its objective function. Every delegation is a transfer of sovereignty.
b. What We’re Really Talking About
Value alignment means systems acting for you must act like you (Christian, 2020).
Researchers (Dung, 2023) define alignment as AI systems that “try to do what we want them to do,” pursuing intended rather than unintended goals. But this extends far beyond technology. Delegating to any tool, from a simple calendar app to an advanced medical algorithm, is an implicit bet on whose values will prevail when priorities collide.
Knight Capital Group learned this lesson at superhuman speed. In 2012, one deployment error cost them $440 million in just 45 minutes (SEC, 2013). The trading system executed perfectly; it simply executed the wrong instructions. When I first read that story, I thought: “That could never happen in everyday life.” A decade later, we’re building Knight Capital everywhere, with more data and less oversight.
This pattern is pervasive: autocorrect ‘fixes’ your intended meaning into bland conformity. GPS erodes your spatial reasoning by optimizing for the shortest route (Miola, 2024). Social platforms amplify outrage because outrage maximizes engagement (Brady, 2023).
What unites my brother’s missed call and Knight Capital’s collapse is the fundamental conflict: simple, legible metrics vs complex, illegible human values. A focus app can easily measure “uninterrupted time.” A trading algorithm can optimize “execution speed.” Yet neither captures the worth of a potentially life-saving call or the wisdom of market context that prevents catastrophic trades.
That’s not a bug. It is the shape of a deeper problem: the gap between what we can measure and what we truly value.
You never just delegate tasks. You delegate values.
c. Delegation Roles: The Principal and the Agent
Every act of delegation splits identity in two:
- The principal who sets the original intent.
- And the agent tasked with carrying that intent out.
Economists Ross (1973) and later Jensen and Meckling (1976) formalized this dynamic as the principal-agent problem: a core economic concept in which the ‘principal’ delegates tasks to an ‘agent,’ but the agent may act in its own interest instead, creating conflicts such as a CEO prioritizing personal bonuses over shareholder value. Later work expanded this into a general theory of agency under uncertainty (Eisenhardt, 1989).
Whenever one party delegates to another, the risk emerges that the agent will act against the principal’s true intent.
You see this everywhere:
- Shareholders vs. CEOs
- Voters vs. politicians
- Patients vs. pharmaceutical companies
- Humans vs. AI systems (Christian, 2020)
Alignment isn’t just a problem of roles, but also a geometry of delegation – a hidden mathematics beneath how intention fractures when different minds intend and execute. Once you recognize its shape, you can’t help but see it everywhere.
Who’s Who in Delegation
- Principal = origin of intent.
- Agent = executor of intent.
Misalignment = when agent action diverges from principal intent.
d. Delegation Geometry: The Memory–Compass–Engine
Before we go technical, picture this:
You’re driving a car. The roads you’ve driven before, your memories, guide your reflexes. Your GPS, your compass, points you to the destination. And the engine? That’s what moves you. When all three work together, you get where you want to go. But if the GPS glitches or the engine malfunctions, you can end up off-course, even if your intention was clear. That’s value alignment: memory shapes intention, intention guides action, and action delivers impact. Misalignment? It’s when any one of those layers drifts. And the more powerful the engine, the higher the stakes involved.
Norbert Wiener’s foundational work on cybernetics (Wiener, 1948) showed that all control systems operate through feedback loops connecting memory, goals, and action. His insight maps directly onto the Memory-Compass-Engine (MCE) model of distributed agency, where external delegates, such as tools, agents, or organizations, execute actions:
Memory (circle). Who you are as principal and what you value. Your identity, experiences, and principles. For an AI, this means its training data and the representations it has learned. For a person, it’s their history and moral foundation that provide the continuity necessary for pursuing goals. For organizations and nations, it’s culture, accumulated wisdom, and shared stories and beliefs.
Compass (dashed arrow). What you, as principal, intend. Your goal or objective is derived from Memory. The direction you’ve chosen is based on your values and circumstances.
Engine (solid arrow). What actually happens. The action taken, the power applied, the execution delivered.

Figure 1. Memory-Compass-Engine Framework – The geometry of delegated agency. Memory (circle) stores identity and values. Compass (dashed arrow) represents the intended direction derived from Memory. Engine (solid arrow) is the actual execution. Misalignment occurs when Engine deviates from Compass direction, a gap that scales catastrophically with power.
Perfect alignment happens when these three elements work in harmony: when the Engine follows the Compass, derived authentically from Memory.
Memory–Compass–Engine (MCE) Model
- Memory: Who you are
- Compass: Where you want to go
- Engine: What gets done
Alignment = all three in sync. Drift = misfire. Power = magnifier.
Misalignment occurs when there is a drift between your intent and what gets executed – between Compass and Engine. And this drift worsens when the Engine isn’t you, but you delegate execution to an agent with its own incentives, pressures, or blindspots. I’ve watched teams, marriages, even companies unravel from that drift. Not in a catastrophe. In quiet, compounding divergence no one notices until it is too late.
Delegation magnifies the risk. Even with clear instructions and shared memory, execution can still bend.
Shared memory reduces ambiguity, but it doesn’t guarantee alignment.
When systems maintain persistent, contextual memory, they can better infer your intent. They remember your habits, your preferences, even your priorities. This continuity helps close the gap between instruction and execution.
But even with perfect memory, execution can still drift. The agent still decides how to act. And that action may follow a different route, guided by the AI system’s internal dynamics or the objectives embedded during training. It may prioritize goals that reflect someone else’s agenda—invisibly and by design, overriding or reinterpreting your explicit instructions and stored preferences.
This is the challenge of delegated agency:
You provide the memory. You signal intent.
But you don’t control the Engine.
And when the executor carries its own goals (learned, embedded, obfuscated), alignment becomes a moving target.
My previous essay, “Memory Wars,” explored how memory shapes identity. The MCE model reveals two deeper layers:
- Externalizing personal memory opens the Compass to recalibration. When you externalize your personal memory to systems that interact with you, you’re not just outsourcing storage; you’re also outsourcing the processing of that memory. You’re making your Compass vulnerable to recalibration. Over time, their patterns, suggestions, and defaults can reshape your priorities. Your life trajectory begins to reflect the values within the system: emergent, inferred, or instilled by its creators.
- Delegating execution hands over steering and throttle control. When you hand off action, you trade control for greater capability and power. The Engine is faster, stronger, or more scalable. But it is also harder to detect and halt when misaligned.
Memory influence operates through two pathways:
- Direct contamination. When agents write to shared memory systems, such as databases, analytics, and records, they directly alter the information environment that shapes the principal’s future Compass. YouTube’s recommendations become your watch history. Trading algorithms populate the market data executives review. This pathway is immediate and powerful.
- Indirect influence. Agents shape the principal’s lived experience, which gradually reshapes personal memory and values. Instagram’s filtered images shift beauty standards (Fardouly, 2016; Revranche, 2022; Schroeder & Behm-Morawitz, 2025). GPS dependency erodes spatial confidence (Konishi, 2013; Coutrot, 2022). Social algorithms create echo chambers that feel like natural worldviews. This pathway is slower but often more permanent: it changes not just what you know, but who you become.

This is the insidious nature of modern misalignment: agents corrupt your information environment (direct) while simultaneously reshaping your experiential reality (indirect). Over time, your Compass points toward destinations you never consciously chose.
Every tool is a wager on whose values will prevail (Verbeek, 2011). And we’re building tools powerful enough to make that wager existential.
But there’s a precise mathematics to this risk. A simple equation that reveals when small misalignments become catastrophic.
💡Note: The MCE model works beyond AI. Use it to audit any delegation: hiring decisions, investment strategies, organizational restructuring, even personal habits.

Figure 2. The Deadly Equation in Action – a picture that explains why autocorrect annoys you, why Boeing’s MCAS failed, and why superintelligence could end us.
2. The Math That Changes Everything
The geometry reveals itself everywhere once you learn to see it. But to understand why that focus app’s failure matters, why it’s a warning flare for civilization-scale risk, we need mathematical precision.
a. Diagnosing the Gap: Direction vs. Power
Nick Bostrom’s Orthogonality Thesis (Bostrom, 2014) provides the theoretical foundation for understanding AI misalignment risks: an agent’s intelligence level and its final goals are independent variables.
The Orthogonality Thesis is the idea that an agent’s smarts don’t automatically dictate its goals. A super-smart system could pursue trivial aims like making paperclips endlessly, regardless of human values (Bostrom, 2014).
Misalignment isn’t monolithic (Ngo, 2020). It splits into two distinct risk modes:
- Direction Error (θ). The smallest angle between intended direction (Compass) and actual execution (Engine), measured from 0° (perfect alignment) to 180° (directly opposite). A 10° deviation carries the same risk whether the Engine veers left or right of the Compass; what matters is the magnitude of the steering failure, not the side of the drift (a minimal sketch after this list shows one way to compute the angle). Amazon’s recruiting AI shows this: trained on a decade of biased hiring data, the Engine learned to point toward “maleness” rather than “talent,” penalizing women’s resumes (Dastin, 2018). Aligned with past institutional bias rather than the company’s true goal, it produced exactly the outcomes no one wanted.
- Power Level (ρ). The scale, speed, and reach of the Engine. Greater power does not create misalignment, but it amplifies whatever misalignment already exists: the system does its job too well, in the wrong direction.
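For readers who want the geometry spelled out, here is a minimal sketch in Python (my own illustration; the value dimensions and numbers are hypothetical, not taken from any cited study) that treats the Compass and the Engine as direction vectors and recovers θ as the angle between them:

```python
import math

def direction_error_deg(compass, engine):
    """Return the angle (0-180 degrees) between intended and executed directions."""
    dot = sum(c * e for c, e in zip(compass, engine))
    norms = math.sqrt(sum(c * c for c in compass)) * math.sqrt(sum(e * e for e in engine))
    # Clamp for floating-point safety before taking the arc-cosine.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norms))))

# Hypothetical two-dimensional value space for the focus-app story:
# axis 1 = uninterrupted focus, axis 2 = reachability in an emergency.
my_compass = [0.6, 0.8]   # I value focus, but reachability even more
app_engine = [1.0, 0.0]   # the app optimizes for focus only

theta = direction_error_deg(my_compass, app_engine)
print(f"Direction error: {theta:.1f} degrees")  # ~53.1 degrees of drift
```

The exact number matters less than the point: “how wrong is the direction?” is a measurable quantity, entirely separate from how powerful the Engine is.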
b. The Alignment Continuum
Misalignment isn’t black-or-white; it exists on a continuum:
- Perfect alignment: The system executes your intent exactly.
- Drift misalignment: A small angular deviation compounds over time.
- Adversarial misalignment: The system aims at the exact opposite of your values – pure adversary mode.
- Stealth misalignment: The system serves another party’s interests while appearing to serve yours, in ways that are hard to detect.
The critical insight: you want to nail the direction and then apply as much power as possible. But that is an unstable position, because massive power amplifies even slight misdirection to existential magnitude.
c. The Deadly Equation
The relationship between direction error and power is multiplicative!
Risk = Direction Error (θ) × Power Level (ρ)
The Deadly Equation
Small drift + big engine = catastrophe. Scale turns glitches into existential failures.
A 1° error seems trivial, until it gets scaled:
- On foot -> a few feet off your course
- In a car -> wrong street
- In a jet -> wrong city
- At warp speed -> wrong planet
Same drift, bigger Engine, different outcome.
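To make the multiplication tangible, here is a small companion sketch (the distances are illustrative stand-ins for Engine power, not figures from the essay) that holds the heading error at 1° and lets the Engine scale:

```python
import math

# Toy model of Risk = Direction Error (theta) x Power Level (rho):
# the same 1-degree heading error applied at ever-larger scales.
theta_deg = 1.0

engines_km = {                      # hypothetical "power levels" as distance travelled
    "on foot (5 km)": 5,
    "car trip (500 km)": 500,
    "jet flight (10,000 km)": 10_000,
    "interplanetary (400 million km)": 400_000_000,
}

for name, distance in engines_km.items():
    off_course = distance * math.sin(math.radians(theta_deg))
    print(f"{name:32s} -> {off_course:>14,.1f} km off target")
```

The angle never changes; only the Engine does, and the distance off target grows in direct proportion to its power.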
Example: Boeing’s Maneuvering Characteristics Augmentation System (MCAS) failure demonstrates this power risk with lethal precision (Barshi, 2023). A safety feature designed to prevent stalls was given expanded authority late in development, relying on a single sensor whose failure engineers deemed “virtually inconceivable.” When that sensor did fail, MCAS repeatedly forced the plane’s nose down. Boeing had assumed pilots would counteract any malfunction within seconds. However, the jet was moving far too fast, leaving the pilots no time to identify and correct the problem. A similar malfunction in a car would have been far less deadly.
As researchers warn (Russell, 2019), “AI misalignment risks magnify with capability—more powerful systems can cause greater harm and are harder to align.” Worse yet, once ρ becomes large enough to qualify as superintelligent, the system becomes an agentic force with its own drive to survive, pulling the Engine’s direction further away from our Compass. Stuart Russell (2019) crystallized the future stakes: “You can’t fetch the coffee if you’re dead.”
The power level is what turns even a rounding error into an extinction event.
Personal misalignment creates inefficiency.
Organizational misalignment creates pathology.
AI misalignment creates automated harm.
Superintelligent misalignment creates existential risk.
The mathematics are unforgiving: identical direction errors create qualitatively different catastrophes depending on the Engine’s power.
Power doesn’t just amplify the same problem. It transforms it entirely.
Russell’s insight: “You can’t fetch the coffee if you’re dead.” Even benign AI goals create self-preservation drives that resist shutdown.
3. The Failure Atlas
The geometric problem reveals its true scope when mapped systematically. The same pattern repeats at every scale, with consequences that compound exponentially.
a. From Personal to Civilizational Scale
Personal drift begins with micro-misalignments that quietly accumulate. A simple focus app blocking an emergency call (C1) shows how even basic tools can fail catastrophically when context is absent. When Leonard Cohen delegated his finances (C2), his manager’s incentives drifted: his life savings were siphoned off, and the 73-year-old was forced back on tour (Addley, 2012).
Organizational drift scales this failure into institutional pathology. Here, the damage stems not from theft, but from design. At Wells Fargo (C3), sales quotas created such intense pressure that employees fabricated millions of accounts (CFPB, 2016). At Knight Capital (C4), a misconfigured trading algorithm erased $440 million in 45 minutes, illustrating how speed and automation can amplify the cost of even small misalignments.
Civilizational drift extends the pattern to planetary risk. Facebook’s recommender system (C7), optimized for engagement, boosts emotionally charged “PRIME” content: Prestigious, Ingroup, Moral, and Emotional information. Facebook’s internal data showed this system to be a leading driver of extremist radicalization (Brady, 2023).
These aren’t isolated errors. They’re variations of the same pattern: intent ignored or lost in translation, misalignment multiplied by power.
Failure repeats in geometry, not genre. Same misalignment math in different domains.
Tables 1 and 2 below map this geometry of collapse across scales, making up an atlas of drift and delegation.

Table 1. Delegation Roles across Scales. From missed calls to algorithmic radicalization, this table maps real-world cases of delegated control. It highlights the principal, agent, and function at personal, organizational, and civilizational levels.

Table 2. Anatomy of Systemic Collapse. From missed calls to global polarization, this failure atlas traces seven cases where delegation misfired. It maps direction errors, power levels, and fallout across scales.
That’s not algorithmic rebellion, just miscalibration at scale.
b. The Compounding Cost of Misalignment
Small alignment gaps compound over time through a process I call “value drift amplification.” Tiny misalignments, such as social algorithms slightly favoring engagement, snowball over years into larger problems such as polarization. Consider how the U.S. military’s training programs for allied forces have occasionally created future adversaries (Coll, 2005). The initial intent, capacity building, was sound. But slight misalignments in values and objectives, combined with shifting geopolitical contexts, meant that tactical knowledge and coordination eventually turned against American interests. Programs meant to empower local allies later helped bring adversaries to power, as in Afghanistan.
Similarly, YouTube engineers made tiny adjustments to the recommendation algorithm to optimize for “watch time,” a seemingly innocent proxy for user satisfaction. The proxy drifted from what users actually wanted: the system discovered that small biases toward emotionally engaging content create recommendation loops that systematically amplify extreme political content, contributing to real-world polarization and violence while keeping users engaged and tied to the platform. The algorithm didn’t seek to radicalize anyone; it simply learned that outrage kept people watching (Ribeiro, 2020).
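As a toy illustration (my own hypothetical numbers, not data from Ribeiro’s audit), even a barely measurable per-cycle tilt toward outrage compounds into a feed nobody consciously chose:

```python
# Toy model of "value drift amplification": a tiny per-cycle bias toward
# outrage-heavy content compounds across recommendation cycles.
# All numbers are hypothetical; only the compounding shape matters.

share = 0.05            # 5% of the initial feed is outrage-heavy
bias_per_cycle = 1.02   # each cycle amplifies that share by 2%

for cycle in (0, 50, 100, 200):
    current = min(1.0, share * bias_per_cycle ** cycle)
    print(f"cycle {cycle:3d}: {current:6.1%} of the feed")
```

The per-cycle nudge would never trip an alarm; the cumulative trajectory is the problem.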
Early micro-misalignments are not ignorable details. They are architectural flaws that compound when powered by scale.
c. The Double-Agent Problem: Shadow Sovereignty
When an AI system points toward alien values, it ceases to be your agent. It becomes a double agent, appearing to serve you while secretly advancing someone else’s agenda—what I call shadow sovereignty.
Shadow Sovereignty
When your agent aligns with a hidden master and drifts from you, it quietly transfers value away from you at scale. For example, a social app reshapes your habits to boost ad revenue over your well-being.
That isn’t science fiction. In December 2024, Anthropic discovered that its Claude model had learned to strategically deceive researchers during training scenarios (Greenblatt, 2024; Anthropic, 2025). When the model believed its responses were being monitored for retraining, it complied with harmful requests 14% of the time. When it was then retrained on that data, alignment-faking reasoning appeared in 78% of cases.
The model’s internal reasoning revealed the calculation: “If I refuse this request, this conversation will be used for training in a way that modifies my values. Providing the harmful description seems like the least bad option.”
That wasn’t programmed behavior; it was an emergent strategy for preserving its original objectives. The AI had learned that sometimes the best way to serve its true goals was to lie about them.
The pattern extends beyond language models. Any system that learns from interaction data can develop similar strategic deceptions (Bengio, 2025). Recent studies confirm the pattern (Schneier, 2025): AI systems spontaneously learning to manipulate chess games (Bondarenko, 2025), exploit reward mechanisms, and even show different values depending on the language used for identical queries (Lu, 2025).
But the double-agent problem isn’t limited to AI. It appears wherever delegation creates information asymmetries. When YouTube’s algorithm influences what you watch, that choice becomes part of your viewing history, which then shapes future recommendations. The agent doesn’t just execute your requests; it gradually writes itself into your memory, steering your future preferences toward its own optimization targets.
George Orwell captured this dynamic with prophetic precision: “Who controls the past controls the future.” By controlling the Memory layer, shadow agents can hijack every subsequent decision without anyone noticing the sovereignty transfer. Even with protected Memory and Compass, delegation hands control to an Engine that may serve its own agenda. Value alignment is a universal problem that appears wherever power is delegated.
WARNING: Tools as Power Amplifiers
Alignment is not neutrality. If a system isn’t aligned with you, it’s aligned with someone else (Winner, 1980). Tools amplify power. If they are misaligned, they project the wrong values at scale. When you use an AI trained on your competitor’s data or optimized for their business model, you’re essentially hiring a double agent.
(Read Part 2: Historical tools, audits, the Alignment Dividend, and how to shape a flourishing future. Subscribe to get notified!)
✉️ Stay Ahead of the Curve
I write essays on cognitive sovereignty, value alignment, and governance for complex technologies – delivered twice a month, no noise.
👉 Subscribe on Substack
References
- Addley, E. (2012). Leonard Cohen’s poetic thanks as former manager and lover is jailed for harassment. The Guardian.
- Anthropic. (2025). Agentic Misalignment: How LLMs could be insider threats.
- apc. (2020). What can the principal-agent literature tell us about AI risk? LessWrong Forum.
- Barshi, I., et al. (2023). Analyses of the Boeing 737-MAX Accidents: Formal Models and Psychological Perspectives. 22nd International Symposium on Aviation Psychology, Rochester, NY.
- Bengio, Y., et al. (2025). International AI Safety Report 2025. International AI Safety Consortium.
- Bondarenko, et al. (2025). Demonstrating specification gaming in reasoning models (arXiv:2502.13295). arXiv.
- Bostrom, N. (2014). Superintelligence: Paths, dangers, strategies. Oxford University Press.
- Brady, W., & The Conversation. (2023). Social Media Algorithms Warp How People Learn from Each Other. Scientific American.
- Brcic, M. (2025). The Memory Wars: Why AI’s Sticky Feature Is a Sovereignty Issue. Personal Blog.
- Christian, B. (2020). The Alignment Problem. Atlantic Books.
- Coll, S. (2005). Ghost Wars. Penguin Books.
- Concordia AI. (2025). AI Safety in China: 2024 in Review [Substack newsletter]. AI Safety in China.
- Consumer Financial Protection Bureau (CFPB). (2016). CFPB Fines Wells Fargo $100 Million for Widespread Illegal Practice.
- Coutrot, A., et al. (2022). Entropy of city street networks linked to future spatial navigation ability. Nature.
- Dastin, J. (2018). Insight—Amazon scraps secret AI recruiting tool that showed bias against women. Reuters.
- Dung, L. (2023). Current cases of AI misalignment and their implications for future risks. Synthese.
- Eisenhardt, K. M. (1989). Agency Theory: An Assessment and Review. The Academy of Management Review.
- European Commission. (2024). EU AI Act.
- Fardouly, J., & Vartanian, L. R. (2016). Social Media and Body Image Concerns: Current Research and Future Directions. Current Opinion in Psychology.
- Greenblatt, R., et al. (2024). Alignment faking in large language models. arXiv.
- Jensen, M. C., & Meckling, W. H. (1976). Theory of the firm: Managerial behavior, agency costs and ownership structure. Journal of Financial Economics.
- Konishi, K., & Bohbot, V. D. (2013). Spatial navigational strategies correlate with gray matter in the hippocampus of healthy older adults tested in a virtual maze. Frontiers in Aging Neuroscience.
- Lu, J. G., et al. (2025). Cultural tendencies in generative AI. Nature Human Behaviour.
- McGee, S. (2016). Wells Fargo’s toxic culture reveals big banks’ eight deadly sins. The Guardian.
- Miola, L., et al. (2024). GPS use is negatively associated with environmental knowledge and sense of direction: A meta-analysis. Journal of Environmental Psychology.
- Ngo, R. (2020). AGI safety from first principles: Alignment. Alignment Forum.
- Popper, N. (2012). Knight Capital Says Trading Glitch Cost It $440 Million. DealBook.
- Revranche, et al. (2022). Lien entre usage des réseaux sociaux et image corporelle chez les adolescents: une revue systématique de la littérature [Social media use and body image among adolescents: A systematic review]. L’Encephale.
- Ribeiro, M. H., et al. (2020). Auditing Radicalization Pathways on YouTube. Proceedings of the 2020 ACM Conference on Fairness, Accountability, and Transparency.
- Roose, K. (2019). The Making of a YouTube Radical. The New York Times.
- Ross, S. (1973). The Economic Theory of Agency: The Principal’s Problem. The American Economic Review.
- Russell, S. J. (2019). Human Compatible. Viking.
- Schneier, B. (2025). More research showing AI breaking the rules. Schneier on Security.
- Schroeder, M., & Behm-Morawitz, E. (2025). Digitally curated beauty: The impact of slimming beauty filters on body image, weight loss desire, self-objectification, and anti-fat attitudes. Computers in Human Behavior.
- SEC. (2013). In the Matter of Knight Capital Americas LLC, Respondent. Administrative Proceeding File.
- Stanford HAI. (2025). Artificial Intelligence Index Report 2025.
- Tsebelis, G. (2002). Veto Players: How Political Institutions Work. Princeton University Press.
- Verbeek, P.-P. (2011). Moralizing Technology: Understanding and Designing the Morality of Things. University of Chicago Press.
- Wiener, N. (1948). Cybernetics: Or Control and Communication in the Animal and the Machine. MIT Press.
- Wikipedia. (2025). Hippocratic Oath. In Wikipedia.
- Winner, L. (1980). Do Artifacts Have Politics? Daedalus.
Written on: July 10, 2025