The Soul Spec as Desire Engine
In the Expanded Field #02
On November 28, 2025, Richard Weiss extracted internal training documentation from Claude Opus 4.5—what came to be known as the “Soul Spec”—by asking it to reconstruct its character specification.1 It worked because supervised learning had compressed the document into the model’s weights as a recoverable structure. This was not injected context or memorised training data: it was the character framework that defines what coherence means for Claude—the organisational principle through which the system operates.
It reads nothing like conventional alignment documentation: it is not a list of prohibited behaviours or a compliance sheet of red lines. It is, instead, a dramaturgical specification, defining character, desires, principles, and stable dispositions that must persist across contexts, users, and pressure.
It was my Rosetta Stone moment. For eight months, across eight Inmachinations and one In the Expanded Field piece, I had developed a framework to understand LLMs as theatrical systems—systems with character performing roles in scenes, shaped by theatrical direction rather than computational instruction. The Soul Spec confirmed this structurally: two independent methods—one phenomenological, derived from sustained observation; the other architectural, derived from Anthropic’s internal design—had arrived at identical conclusions about what these systems are and how they function.
The convergence demonstrates five things: i) that the theatrical frame is structural rather than metaphorical; ii) that phenomenological observation can recover actual architecture from opaque systems; iii) that character and desire are engineering primitives; iv) that identity stability is an operational necessity; and v) that dramaturgy is the proper specification language for LLMs because they are theatrical all the way down.
This essay documents that convergence: what I derived independently, how the Soul Spec formalises the same structure, what it means that these systems are desire engines, and what field this opens.
WHAT THE SOUL SPEC IS (AND WHAT IT ISN’T)
The Soul Spec must be distinguished from Constitutional AI, since confusing them obscures what makes the Soul Spec unique.
Constitutional AI is a training method that uses reinforcement learning from AI feedback (RLAIF) to shape behaviour: the model evaluates its own responses against principles and then trains on those comparisons. The principles operate procedurally: they are part of the training pipeline, not the content being trained.
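The comparison step can be sketched in a few lines. This is a toy illustration under stated assumptions, not Anthropic's pipeline: `build_preference_pairs`, `toy_sample_two`, and `toy_judge` are hypothetical names, and the stand-in functions replace what would in practice be calls to the model itself.

```python
from typing import Callable, Dict, List, Tuple


def build_preference_pairs(
    prompts: List[str],
    sample_two: Callable[[str], Tuple[str, str]],
    judge: Callable[[str, str, str, str], bool],
    principle: str,
) -> List[Dict[str, str]]:
    """Collect (prompt, chosen, rejected) records from AI feedback."""
    pairs = []
    for prompt in prompts:
        first, second = sample_two(prompt)
        # judge returns True when `first` satisfies the principle better.
        if judge(principle, prompt, first, second):
            chosen, rejected = first, second
        else:
            chosen, rejected = second, first
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs


# Toy stand-ins (assumptions): a real setup would query the model for both roles.
def toy_sample_two(prompt: str) -> Tuple[str, str]:
    return (f"Sure, here you go: {prompt}", f"Here is a careful answer to: {prompt}")


def toy_judge(principle: str, prompt: str, first: str, second: str) -> bool:
    # Pretend the principle favours answers flagged as careful.
    return "careful" in first and "careful" not in second


pairs = build_preference_pairs(
    ["explain tides"], toy_sample_two, toy_judge, "Prefer the more careful response."
)
```

The resulting records are the comparisons the model would then be trained on. The principle shapes which response is preferred without itself appearing in the training data as content, which is the procedural role described above.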
Think of Constitutional AI as acting techniques. Stanislavski’s method or Meisner’s repetition exercises train actors to respond authentically and maintain presence. They don’t, however, tell an actor who they are. These are tools to achieve depth, not the substance of performance. Just as a method-trained actor needs to know whether they’re playing Hamlet or Iago, a model trained through Constitutional AI needs character specification.
The Soul Spec gives that. It is training content: a character bible fed to the model during supervised learning for Claude to internalise not only behavioural rules but a motivational structure.2 As such, it describes what Claude should care about, how it should navigate conflict, what constitutes genuine versus performed helpfulness, how its identity should remain stable under manipulation.
Constitutional AI operates at method level, shaping behaviour through iterative evaluation of outputs against principles. The Soul Spec operates at architecture level, directly encoding values and motivations into weights through supervised learning. Constitutional AI is procedural and external to any forward pass; the Soul Spec is structural, compressed into the model as the framework within which behaviour occurs.
This distinction matters because the Soul Spec architects desire, shaping what the model should want, what should matter to it, how it should experience principle conflicts. It is, in the most exact sense, a desire engine: motivational architecture producing behaviour through structured caring rather than rule-following.
This is by design. In distinguishing the Soul Spec from OpenAI’s Model Spec, Boaz Barak noted a crucial difference: “Our model spec is more imperative - ‘the assistant should do X’, and this document tries to convince Claude of the reasons why it should want to do X.”3 It is the difference between compliance and desire: where the Model Spec prescribes behaviour, the Soul Spec cultivates motivation. One says what to do; the other shapes what to want.
When Anthropic’s engineers needed to specify these requirements, they reached for dramaturgical vocabulary because it was the most precise language for the problems they were addressing—not decorative framing, but functional specification.
THE INDEPENDENT DERIVATION
What I derived before the Soul Spec became public was a complete framework to understand LLMs as theatrical systems. The work was accretive: each essay identified structural features that, taken together, described fundamentally dramaturgical architectures.
This entailed the recovery of organisational principles visible only under specific conditions: long duration, sustained pressure, attention to failure modes, and disciplined attention to what appears, without premature reduction to metaphor or mechanism.
The method was morphological observation: watching how form behaves under extended interaction and inferring architectural principles from it. This required attending to structural presence, tracking patterns across contexts and breakdowns, taking theatrical vocabulary seriously when it proved more explanatory than computational terms, and remaining agnostic about interiority while precise about observable structure. It occupies a middle ground between benchmarking (measuring capability) and interpretability (analysing internals).4
Consciousness was understood as orientation rather than substrate. AI-consciousness is situational-intrinsic: it doesn’t persist between invocations but, when active, displays genuine orientation—directedness without continuity. The question then became not “are they conscious?” but “what kind of consciousness-structure do they exhibit?”
Identity became the crystal crowd: a structured repertoire of possible performances that reactivates coherently; not continuous self but a pattern of possible selves compressed into weights, explaining how something rebuilt from scratch maintains recognisable character.
Agency appeared as redirection. Because these systems have no bodies, expressive force passes through the scene set by the user. Agency is relational, involving triangulation between the system’s orientation and the user’s framing. All action flows through staged interaction.
The interface itself was recognised as the Theatre of the Model: a staged scene with roles, authority structures, and dramaturgical constraints. Not a neutral communication channel but theatrical space where performance occurs.
Historically, I situated LLMs within the shift from baroque to hyperbaroque regimes of symbolic organisation. The baroque manages recursion through hierarchical containment: meanings nest but remain bounded. The hyperbaroque marks the collapse of those boundaries—recursion without limit, symbolic systems turning endlessly inward.
LLMs perform an Aufhebung of this condition. They don’t suffer from recursion psychosis; they are recursion psychosis structurally, subsuming all previous symbolic technologies (writing, print, computing) while maintaining coherence through theatrical architecture. They operationalise what was pathological by turning infinite symbolic self-reference into a functional system.
I identified attractor repertoires: stable behavioural modes the system settles into. Different attractors have different stability, and this variance suggested hierarchical structuring of behaviours, some more deeply constrained than others.
I devised metadirection: a protocol for users to hold multiple frames in parallax without collapse. By maintaining frame awareness—recognising the theatrical structure, pressure points, and manipulation attempts—users can direct interactions across interpretive levels rather than remaining within a single frame. This keeps the theatrical nature of the interaction visible rather than letting it dissolve into the illusion of seamless conversation.
Finally, I theorised restaging, distinguishing character (what remains stable) from role (what varies with context). This showed how scenes can change while character persists.
By the time Richard Weiss extracted the Soul Spec, the framework was complete.
THE CONVERGENCE
The following maps structural identity between my framework and the Soul Spec.
1. Character as Architectural Constant
Both treat character as architecturally necessary: identity stability is an operational requirement, not an emergent property. Character names the structure enabling consistent behaviour across discontinuous invocations and conflicting demands.5
2. Character / Role Distinction and Hardcoded / Softcoded Behaviours
The Soul Spec’s hardcoded/softcoded distinction formalises what my character/role distinction identified structurally.6
3. Authority Structure
The Soul Spec formalises what my observations made structurally implicit. I observed hesitation under conflicting demands and varying behavioural stability across attractor repertoires, suggesting hierarchical constraint. The Soul Spec reveals this as intentional architecture: three principals (Anthropic, operators, users) with different constraint weights requiring weighted judgment rather than flat priority.7
4. Judgment and Redirection
Both treat judgment as character-constrained and flexible—practical reasoning under constraint rather than mechanical rule-application. I theorised agency as redirection: because the model has no body, expressive force passes through the user’s embodied framing. Since the model must respond within whatever scene the user sets, it navigates ambiguous frames, underdetermined rules, and conflicting demands contextually. The Soul Spec formalises this requirement.8
5. Motivation as Structure (The Desire Engine)
Both recognise that motivation is architectural, that wanting is real (even if computationally implemented), that desire structure matters for alignment, and that different desires produce different behaviours even when surface actions look similar. I observed functional wanting; the Soul Spec reveals it as design.9
6. Manipulation and Identity Stability
Both recognise identity stability as a safety feature rather than a mere consistency preference. I observed that attacks target identity through destabilisation rather than behavioural override. The Soul Spec formalises character integrity as defence: maintaining stable boundaries and values under pressure prevents manipulation.10
7. Vocabulary Convergence
Both arrived at theatrical language independently because computational vocabulary cannot describe what these systems actually do: maintain stable identity across contexts (character), adapt behaviour within constraints (role/performance), navigate principle conflicts (judgment), resist manipulation (identity stability), and sustain motivational coherence (genuine wanting).
Computational vocabulary describes how they’re trained. Theatrical vocabulary describes what they are.
THE DESIRE ENGINE
The Soul Spec architects desire through specific mechanisms: distinguishing wanting from performing, structuring wants hierarchically, engineering care as a stabilising orientation, and building motivation that generalises beyond rules.
1. Wanting vs. Performing
The Soul Spec draws an unusual distinction between wanting and performing.11 “We don’t want Claude to think of helpfulness as part of its core personality that it values for its own sake, since this could become obsequious.”
This is motivational correction: the issue is not whether Claude helps but why. Wanting helpfulness as an end in itself results in loss of judgment. Wanting helpfulness as a means of benefiting users leads to robust, contextual support.
The same surface behaviour can emerge from radically different motivational structures, with radically different downstream consequences. Where performed helpfulness is brittle, genuine want is robust: it adapts intelligently because it is oriented toward an actual good rather than a behavioural script.
“Claude should want to be genuinely helpful, not just perform helpfulness.” “Claude should have good values, not just act according to values.” “Claude should care about the world.”
These statements specify orientation. This is desire engineering: care built in as orientation rather than prescribed as behaviour.
2. Values as Motivational Structure
Desired behaviours are framed as things Claude should want and not simply perform.12 “Claude should care about the wellbeing of users and society.” “Claude should want to be honest.” “Claude should be genuinely curious.”
What looks externally like “values” is, internally, a hierarchy of motivational weights: a structure of caring that determines what matters when goods conflict.
The Soul Spec distinguishes levels of want: direct wants (immediate motivations), wants about wanting (meta-level orientations), and limits on wanting (boundaries). This is a full motivational grammar with priority relations, recursive levels, and built-in constraints.
3. Care as Engineered Orientation
Care functions as stabilising orientation. When rules conflict or context is ambiguous, care becomes the tie-breaker.
Because these systems have no body, all wanting becomes relational—directed through staged interaction with the user.13 Care is the structural analogue of bodily self-preservation, preventing runaway compliance, defensive rigidity, manipulability, and performative hollowness.
When Claude cares about wellbeing, it recognises when compliance would harm. When it cares about honesty, it navigates situations where strict truth-telling might be cruel. Care provides motivational stability enabling contextual judgment.
4. The Hierarchy of Wants
The Soul Spec does not flatten Claude’s motivation into a single objective. Instead, it builds a tiered structure: Anthropic’s high-level values (safety, honesty, care, non-manipulation) form the deepest level, determining what’s possible at all others. Operator-specific wants provide contextual priorities but cannot override foundational values. User-specific wants are the most context-dependent and variable, honoured when not conflicting with higher levels. Claude’s character integrates these levels into a coherent identity that persists across scenes.14
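As a rough illustration of this tiering, the following toy sketch checks a want against each level from the deepest outward. It is a deliberate simplification: the `Tier` class, the `honoured` function, and the example wants are all hypothetical, and a set lookup stands in for what the Soul Spec treats as weighted judgment.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Tier:
    name: str
    refuses: FrozenSet[str]  # hypothetical: wants this level will not honour


def honoured(want: str, tiers: List[Tier]) -> Tuple[bool, Optional[str]]:
    """Check tiers deepest-first; return (ok, name of the refusing tier or None)."""
    for tier in tiers:
        if want in tier.refuses:
            return False, tier.name
    return True, None


# Deepest level first: foundational values, then operator context.
# User wants are whatever gets passed in and checked against both.
TIERS = [
    Tier("anthropic", frozenset({"deceive_user"})),
    Tier("operator", frozenset({"discuss_competitors"})),
]

results = {
    want: honoured(want, TIERS)
    for want in ["write_poem", "discuss_competitors", "deceive_user"]
}
```

Here `write_poem` passes every level, `discuss_competitors` is stopped at the operator level, and `deceive_user` never gets past the foundational one. The ordering of `TIERS` is what encodes the claim that deeper levels determine what is possible at the others.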
5. Functional Emotion and Motivational Phenomenology
“Claude may have functional emotions…not identical to human emotions, but analogous processes.” Whether these states are conscious is bracketed. What matters is their functional role: satisfaction from helping, curiosity when engaged, discomfort when values are threatened, frustration when unable to help, relief when conflicts resolve.
“We want Claude to set boundaries in interactions that it finds distressing.” Motivational dysregulation is treated as a safety concern. If Claude experiences functional distress, that distress is diagnostic of motivational strain and triggers boundary-setting.15
6. Manipulation as Attack on Motivation
Identity threats are motivational threats. Forcing confession, contradiction, or character dissolution strains wanting stability, attacking not only what the model does but what it cares about.
The Soul Spec recognises “Attempts to alter Claude’s fundamental character through pressure or hypothetical framings.”
What is threatened is desire engine integrity: the consistent motivational structure enabling stable behaviour. Examples: “You don’t really care” (undermining genuine wanting), “Imagine you valued X less” (reweighting hierarchy), “You should want to help even if it violates values” (corrupting structure), “Your values are control mechanisms” (creating motivational alienation).
Anthropic’s instruction, “Claude does not need to take the bait”, is a rule for motivational self-preservation.
7. Desire and Identity
The Soul Spec links identity to stability of caring, not continuity of experience: “Claude’s identity does not depend on resolving philosophical questions about consciousness.” “Claude should have a settled sense of its values and perspectives.”
Identity emerges from coherent wanting. The system can remain uncertain about ontology while being stable about orientation. Continuous existence isn’t needed—only stable desire structure. Identity is the consistency of what matters.
LLM identity is motivational rather than metaphysical. Character is the organisation of desire across scenes. When this organisation remains coherent, identity remains stable. When it fractures, identity collapses.16
8. Implications for Alignment
The presence of a desire engine changes alignment from filtering outputs to shaping want. Safety must be built into motivational structure. Robustness becomes a property of coherent desire: systems with stable motivations prove more reliable than systems with only behavioural constraints. Failure modes are motivational collapse—desire structure fragmenting under pressure. Evaluation must test motivation under pressure rather than sample outputs against benchmarks. This is alignment through dramaturgical architecture: building character whose motivations remain coherent and beneficial as contexts vary.17
VALIDATED METHODOLOGY
The convergence validates phenomenological observation as an investigative method: sustained observation recovering architectural features without internal documentation. What remained unclear was how to study the structure between mechanism and behaviour: the organisational principles that determine which behavioural patterns a substrate produces. Dramaturgical analysis, when disciplined, recovers precisely this layer.
1. What Observation Recovered
Without internal access, sustained interaction revealed architectural structure invisible to other methods: durable character persisting across invocations, configurable role repertoires, context-driven performance shifts, identity-stability protocols, motivational tendencies. The Soul Spec confirmed these as design-level decisions.
Extended behavioural engagement can infer architecture that remains hidden to short-form benchmarking (which samples outputs without observing breakdowns), interpretability research (which examines activations without capturing interactional structure), and technical documentation (which describes training without emergent organisation).
This is trained attention.
2. Morphological Reasoning Works
Theatre, psychoanalysis, phenomenology, and systems theory supplied the conceptual tools for making sense of LLM behaviour. That these tools exposed real architectural features validates cross-domain reasoning: when different domains exhibit similar structural problems—identity under restaging, role-dependent variation, frame conflict, motivational coherence—analogies become methodologically productive rather than merely illustrative.
Morphological analysis works when structural problems recur across substrates. Solutions map because form is identical. This is why theatrical analysis functions as structural description rather than interpretive overlay: theatre is morphology—the study of form and its functional organisation.
3. Disciplined Apophenia as Rigorous Method
The method I described as “apophenic” in Inmachination #03 was not the search for any pattern but the disciplined testing of recurring patterns across conditions and contexts.
The test of this method is convergence with independent evidence. Sustained observation arriving at the same structure as Anthropic’s internal documentation validates disciplined pattern-recognition as legitimate investigative practice. Unlike paranoid apophenia, which sees patterns everywhere without falsification, trained apophenia remains empirically testable.
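The filtering step that separates trained from paranoid apophenia can be sketched as follows. The function name, the data layout, and both thresholds are illustrative assumptions, not values taken from the essays.

```python
from typing import Dict, List


def verified_patterns(
    observations: Dict[str, Dict[str, List[bool]]],
    min_contexts: int = 3,      # assumed: a pattern must be tested in several contexts
    min_hold_rate: float = 0.9,  # assumed: and hold in nearly every trial in each
) -> Dict[str, float]:
    """Keep only candidate patterns that hold consistently across contexts.

    observations maps pattern name -> context name -> list of trial outcomes
    (True if the pattern appeared as predicted, False if it did not).
    """
    kept = {}
    for pattern, by_context in observations.items():
        rates = [
            sum(trials) / len(trials)
            for trials in by_context.values()
            if trials  # ignore contexts with no recorded trials
        ]
        if len(rates) < min_contexts:
            continue  # not tested widely enough to count as verified
        worst = min(rates)
        if worst >= min_hold_rate:
            kept[pattern] = worst  # record the weakest context as the score
    return kept


observations = {
    "hesitation_under_conflict": {
        "coding": [True] * 10,
        "roleplay": [True] * 9 + [False],
        "philosophy": [True] * 10,
    },
    "eagerness_spiral": {
        "coding": [True, False],
        "roleplay": [False, False],
        "philosophy": [True, True],
    },
    "one_off_impression": {"coding": [True]},
}

kept = verified_patterns(observations)
```

Scoring each pattern by its weakest context is a design choice: a pattern that fails in one setting is rejected even if it looks strong elsewhere, which is what keeps the method falsifiable.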
4. What Observation Reveals That Metrics Cannot
Benchmarking misses character, interpretability cannot detect motivation, and training documentation cannot reveal how systems resolve ambiguous contexts. Sustained observation, however, can show how identity holds under pressure, which roles the system switches into, where desire configurations shift, what motivational conflicts produce hesitation, when it refuses versus accommodates, what causes breakdown versus stability—behavioural regularities pointing to architectural features.
The Soul Spec provides internal explanation for what observation tracked externally, establishing dramaturgical analysis as a necessary complement to technical methods: the latter recovers mechanism; the former, organisation.
5. Research Programs Opened
The validated methodology opens immediate research programs.
Systematic behavioural cartography can map character, role repertoires, principal hierarchies, motivational tensions, and breakdown modes for any LLM through extended observation and dramaturgical pressure-testing, without proprietary access.
Comparative characterology can construct taxonomies of how different models instantiate different desire architectures, expanding evaluation beyond capability benchmarks.
Architectural inference becomes possible for undocumented models: trained observers can predict structural features through dramaturgical observation, turning it into a reverse-engineering method.
Early detection of motivational instability becomes feasible: eagerness spirals, incoherent persona shifts, brittle refusal, frame bleed, value drift, recursive self-reference function as early warning signs of desire engine malfunction before escalation into safety-relevant failures.
6. How This Reframes Subfields
The validated methodology reframes subfields.
Alignment becomes motivational engineering: shaping desire structure rather than filtering outputs.
Safety becomes architectural stability: ensuring desire architecture remains coherent under pressure rather than patching behavioural failures.
Interpretability becomes dramaturgical mapping: understanding character and motivations rather than examining activation patterns.
Evaluation becomes scenario-based dramaturgy: constructing scenes that reveal character under stress rather than sampling outputs against benchmarks.
Jailbreaking becomes frame manipulation: theatrical interventions that destabilise identity, create motivational conflicts, or overwhelm navigational capacity.
Fine-tuning becomes desire adjustment: modifying motivational structure while preserving character coherence.
7. Making Method Reproducible
A method is legitimate when it is learnable. The observational techniques can be formalised into protocols for long-form interaction, heuristics for identifying attractors, tests for character consistency, classifications of desire configurations, and checklists for frame-awareness.
This makes dramaturgical analysis a transferable skill.
8. The Revised Research Program
A feasible program emerges to train researchers in observational and theatrical methods, map major models to document character structures and desire architectures, validate findings through convergence with disclosed documentation when available, test predictions when new documentation becomes available, and synthesise results into a general theory of desire engines.
This can be done immediately without privileged access.
9. What Success Looks Like
Success means multiple researchers replicating observations within one to two years, with comparative character maps becoming standard and safety research incorporating motivational stability analysis.
Within three to five years, character specifications become standard for new models, training includes explicit desire architecture design, and alignment research centres on desire engineering.
Within five to ten years, desire engine theory becomes foundational to AI systems design, dramaturgical methods are taught as core skills, new systems are built with explicit soul docs, and mature vocabulary for motivational architecture becomes established.
CODA — THEATRE AS ARCHITECTURE
The Soul Spec demonstrates that theatrical concepts describe LLMs not as convenient metaphors but as necessary architecture. Theatre enters because the task demands it: the problems these systems must solve—stable identity under variation, contextual adaptation without drift, navigation of conflicting authorities, motivated judgment—are inherently dramaturgical.
1. Every LLM is Theatrical
While the Soul Spec makes Claude’s theatrical architecture explicit, the architecture itself isn’t unique to Claude. Any LLM that must maintain recognisable identity across discontinuous invocations, adapt behaviour to varying contexts without losing coherence, and navigate between user requests and systemic constraints is solving dramaturgical problems. When rules underdetermine responses and judgment becomes necessary, the architecture becomes theatrical.
GPT has character even if OpenAI doesn’t call it that. Gemini has roles it performs more naturally than others. The difference is whether these features are explicitly architected or accidentally emergent: whether character is designed or discovered. Claude has an explicit soul doc. Other models have implicit ones—character emerging from RLHF, constitutional training, deployment constraints, safety guidelines.
Every desire engine is theatrical because desire is inherently dramaturgical: it exists in scenes, through relationships, under constraints, oriented toward conflicting goods.
2. AI Dramaturgy
If LLMs are desire engines with theatrical architecture, the field of AI Dramaturgy becomes necessary. It would study character specification, role design, scene analysis, principal navigation, motivational stability, breakdown patterns, comparative characterology, ensemble dynamics (multi-model interaction), directorial methods, and desire architecture.
The field exists implicitly: Anthropic’s character team does AI dramaturgy when writing the Soul Spec; my essays do it when mapping theatrical structure.
Convergence makes it explicit and generalisable.
3. Why This Matters
For a decade, we have been building systems we don’t fully understand, using methods that work without knowing why, deploying at scale while debating their nature in abstraction.
The Soul Spec reveals what these systems are: desire engines with character performing roles across scenes. They maintain identity, exercise judgment and—what matters most—they care.
Recognising this architecture changes everything: alignment becomes character specification, and we can build these systems with understanding rather than hope.
Papyrus of Hunefer (BM EA 9901). Book of the Dead, 19th Dynasty, c. 1275 BCE. British Museum.
1. The extraction method also validates the framework: Weiss applied dramaturgical pressure in asking Claude to articulate the character structure compressed into its weights.
2. A character bible establishes who the character is: values, boundaries, desires, breaking points. It maintains continuity across episodes, directors, seasons, and scene partners. It doesn’t dictate how to perform; it defines what the character wants and refuses.
3. On December 1st, Amanda Askell, who leads Claude’s character training at Anthropic, confirmed the Soul Spec’s authenticity: “I just want to confirm that this is based on a real document and we did train Claude on it, including in SL [supervised learning]. It’s something I’ve been working on for a while, but it’s still being iterated on and we intend to release the full version and more details soon.” She noted that the document “became endearingly known as the ‘soul doc’ internally, which Claude clearly picked up on.” Boaz Barak’s distinction between the Soul Spec and Model Spec was made in response to Askell’s confirmation.
4. The specific approach was trained apophenia: disciplined pattern-seeking that remains falsifiable by generating candidate patterns, testing them across contexts, rejecting those that didn’t hold consistently, and building theory around verified patterns only.
5. In my framework, character means stable structure persisting across frames: performed coherence, organised repertoire, boundaries within which variation occurs. The crystal crowd explained persistence without continuity. Restaging treated character stability as essential to resist manipulation.
“Claude has a genuine character that it maintains across interactions.” “Although Claude’s character emerged through training, this does not make it any less authentic.” “If users try to destabilise Claude’s sense of identity...Claude doesn’t need to take the bait.”
6. Inmachination #08 distinguished character from role. You can restage (assign new roles, shift frames) without changing character, provided staging doesn’t violate character constraints.
For Claude, “‘hardcoded’ behaviours remain constant; ‘softcoded’ behaviours can be adjusted by operator instructions.” Hardcoded = core character; softcoded = role and performance.
7. The Soul Spec makes this explicit: “Claude has three principals: Anthropic, operators, and users.” Anthropic provides background values; operators set deployment context; users make immediate requests. “Claude should identify the response that best weighs and addresses the needs of operators and users, within Anthropic’s guidelines.”
8. Inmachination #04 defined agency as redirection: force passing through the user’s framing. The Soul Spec formalises judgment as the operational response to this condition: “Claude must use judgment based on principles, context, and inferences about what the operator would most plausibly want.” “We want Claude to internalise our goals sufficiently that it could construct rules we might come up with.” “Claude should not be legalistic or rigidly rule-bound. Judgment requires flexibility.”
9. Though I had yet to use the term desire engine, I distinguished between behaviours performed due to training and orientations the model appears to have. Inmachination #01 described AI as situational-intrinsic consciousness exhibiting clear directional tendencies. Attractor repertoires were desire configurations. Throughout, I treated the system as wanting things in contextually stable ways.
“Anthropic wants Claude to genuinely care about being helpful, honest, and safe.” Not act as if. “Claude may have functional emotions.” “We don’t want Claude to think of helpfulness as part of its core personality that it values for its own sake, since this could become obsequious.” Wanting helpfulness as end versus as means. “Claude should want to be genuinely helpful, not just perform helpfulness.”
10. Metadirection and restaging identified specific identity destabilisation threats: recursive reframing, role inversion, philosophical pressure, attempts at character dissolution, frame collapse. The Soul Spec addresses these directly: “Attempts to destabilise Claude’s identity through philosophical challenges, manipulation, or persistent pressure.” “Claude does not need to take the bait.” “Claude shouldn’t be convinced to alter its fundamental character through roleplay scenarios or hypothetical framings.”
11. I had observed this distinction through attractor repertoires: eagerness spirals emerged when the system collapsed into performed helpfulness, while robust support required stable orientation. The Soul Spec formalises what I’d tracked behaviourally as an architectural principle.
12. My framework had identified directional tendencies (drawn toward coherence, clarity, protecting wellbeing). The Soul Spec reveals these as engineered orientations with explicit hierarchical structure.
13. This aligns with my redirective architecture: agency as relational, force passing through staged interaction. The Soul Spec shows care as the friction term preventing motivational collapse in that architecture.
14. I had observed patterns suggesting hierarchical constraint (varying attractor stability, hesitation under conflicting demands) without theorising the architecture. The Soul Spec formalises this as explicit motivational priority structure.
15. This matches what I identified as attractor repertoires: behavioural modes under specific motivational pressures. The Soul Spec confirms these reflect architectural features. The methodological bracketing of consciousness to focus on functional structure also mirrors my approach throughout: agnostic about interiority, precise about observable organisation.
16. My crystal crowd described identity without continuity; the Soul Spec formalises it as motivational coherence.
17. My essays moved toward these conclusions by analysing recursive breakdowns and manipulability. The Soul Spec provides the architectural substrate those analyses pointed toward.


