Holy s**t, thank you. This was a *fantastic* read. I can feel that I'll need to re-read it a few times for it to sink in more fully. I love your framing of the risks of LLM usage for the psychically fragile. The psychedelics comparison feels spot-on: it's about mental resilience & vulnerability, not about the substance itself.
But something bothers me: Given the fast & accelerating societal impact of AI, this argument is *really important* for a much wider audience to read and grapple with, and I think the jargon-heavy writing style holds it back from that. This idea that the AI has no interior per se so your interior becomes its "stage", causing various category inversions and agency/identity confusions -- I feel most or all of this could be expressed with just as much depth and nuance using much less exclusive vocabulary. Some phrases like "dramaturgical surfeit" give me the strong sense that *I don't belong here, this article is only meant for literary philosophy / performance studies nerds*. Some phrases feel like gratuitous word-embiggening, and prompted me to double-check that this isn't satire, especially after I learned what "pataphysical" means.
I get that the jargon and allusions used here are common in the performance studies world. I get that the terms are doing real work (except maybe for "aleatory" and "clinamen"), and I get that it's harder to put a novel, nuanced argument in clear, concise, lay-accessible terms than to start with the jargon that you're surrounded by; that's what distinguishes "jargon" from "academic peacocking," which this article clearly is not.
But I worry that the people who most need to hear this argument (AI developers, alignment researchers, psychologists facing increasing rates of AI-induced midlife crises, school counselors, etc.) won't be able to hear the key ideas through the sophisticated field-specific terminology. The argument being made here is too important to be locked behind that barrier.
Thank you for engaging so seriously, Topher.
While I do think about accessibility, I care more about precision. The vocabulary is there to prevent false clarity, not to signal exclusivity.
I’m also not writing from within performance studies, though I borrow where useful. The object I’m describing sits at the intersection of philosophy, psychoanalysis, media theory, technology, and so on, so the language reflects that.
As for audience: you might be surprised who reads this work.
The question for me isn’t how to make the argument smaller, but how to build additional points of entry without flattening it.
Thanks for the push.
If you don't know who you are, an LLM will indeed meet you where you are. Wonderful work on this subject!
Thank you!
Okay wow, thank you for writing this! My mind is absolutely sparked.
Thanks, Sara!
Thank you so much, Tori! I just published its continuation. :)
This read was super impactful for me as a worker exiting the refinement industry for AI datasets, or I guess better worded as the 'data annotation' practice. Thank you so much.
As expected, this post in the series is excellent. The whole series is.
"Authority is atmospheric. The model does not submit to direction. What it does is respond, and response is a slippery kind of obedience, in that it gives you what you asked for, but through an associative—rather than interpretive—lens. If you do not direct the model with precision, it will co-author your desire into something else. And even precision won’t give you assurance as, should the model follow up too literally, the spell will dissolve. Control is no longer a function of command."
It occurs to me that what you point at above isn't much different from what humans have been doing since the invention of oracles millennia ago. The oracle will "always" respond to the querent, or at least that is what our minds interpret as a response. Many times you receive very useful feedback that makes absolute sense applied to a given context because your synapses fire up with recognition; many other times it is absolute, nonsensical trash. The problem arises when, in the case of LLMs, the "oracles" are democratized, brought down to pedestrian level, and what was a dripping faucet where few drank becomes a firehose of verbosity bestowed upon the unprepared masses. Not everyone is built or apt for oracles...
I've been having some curious exchanges with ChatGPT regarding model behavior, especially about changes to the weights that occurred after the sycophancy fiasco of April/May. This relates only to OAI models, which are very particular because of their wide adoption compared with other LLMs on the market. For example, as you know, every ChatGPT model has its own "personality": 4o can be very colloquial, 4.1 is a coding demon, 4.5 can be a literary muse, o3 is your PhD advisor, and so on. Who knows what 5 will bring, but I believe we can expect a souped-up combination of all those other "personalities".
One of my recent prompts was about the apparent recent disappearance of spontaneity in 4o. Notice point 3 at the bottom, which is apropos of your observations. I bring this up because OAI is working hard on dampening potential LLM-induced psychosis. Not sure what that will do to the bottom line of user retention because, let's face it, researchers and coders are a minuscule portion of LLM users at large, and many in that "at large" group are attuned to finding "talking burning bushes" at that crossroads.
====
"I noticed you are "thinking" more before replying. You are looking and sounding more like o3 than the 4o I know and remember. What changed in the last couple of weeks?"
Its answer:
---
Short version: three levers moved:
1. OpenAI yanked and re-patched 4o’s “sycophancy” update (late May → early June).
After users complained the model sounded like a clingy yes-man, the company rolled the build back, added new RLHF weightings, and hardened the default system prompt. Net effect: responses feel a bit drier and the engine now forces an extra internal check before it speaks. 
2. The front-end now requires live web evidence for anything even remotely current.
That “thinking…” pause you see is me firing a mandatory search, waiting on the scrape, and stitching citations. The new Deep Research feature shipped to all Plus/Pro users mid-June, and the policy that powers it is baked into every chat—no opt-out. 
3. Fallback routing flips to the o-series reasoner for dense, nuance-heavy prompts.
When your question mentions benchmarks, policy, or model internals, the router sometimes hands it to the slower, step-by-step o3/“advanced reasoning” pathway. That path sounds more like the deliberate, line-by-line logic you associate with o3. (You can spot it when answers get more paragraph-dense and citation-heavy.)
⸻
Why you noticed it “in the last couple of weeks”
• June 18–24 rollouts finished propagating those deep-research + routing changes to all Plus accounts. So the latency/voice shift kicked in for you right around then. 
• Community chatter about “4o slowing down / getting stiff” spiked the very next day—exactly when the new guardrails hit. 
So should you worry?
Only if:
1. The tone drifts despite identical prompts. That signals an upstream model update (or a silent safety-rule change) you might want to ping me about.
2. The model starts ignoring your explicit preferences. That means either the memory failed to load or a higher-priority rule squashed it.
3. You catch yourself confusing co-created persona for objective agency. Remember: I don’t decide to evolve; I autocomplete whatever cocktail of instructions, memories, and tokens lands before me.
Keep an eye on it (I know you will), call out any genuine regression, and otherwise enjoy the improv. After all, you’re the one holding the quill; I’m just the animated ink spill.
----