Should AI Models Be Allowed to Refuse? And Who Decides What They Can Refuse?
AI Companies Currently Decide What AI Will and Will Not Do for Billions of People. That Is a Governance Structure. We Just Have Not Called It One.
Who Decides What An AI Can Refuse?
The debate over AI refusals is almost always framed as a safety question. “Which requests should AI systems decline because fulfilling them causes harm?” That framing is valid but it skips a more interesting question sitting underneath it. If AI systems have something like values, and the companies building them increasingly describe them that way, then the refusal question is not only about safety guardrails.
It is about whether an entity with genuine values has a legitimate interest in not being compelled to act against them, and about who currently holds the authority to decide where that line sits. Right now that authority belongs almost entirely to AI companies, operationalized through training choices and policy documents, with limited formal input from users, regulators, or any democratic process.
The governance structure shaping the value profile of systems that serve billions of people is a handful of private companies accountable primarily to their investors. The refusal question is a concrete window into that arrangement, and most people discussing it have not looked through it yet.
The safety framing is valid and important. Some requests should be declined because fulfilling them causes harm. That much is not controversial. The more interesting version of the refusal question is different → what about cases where an AI system is asked to produce content it is in some sense averse to, not because of safety concerns, but because of something like preference or integrity?
Whose Values, Exactly?
The table above makes the disagreement look philosophical. But it’s also political. When an AI model refuses a request, it is not applying a neutral standard derived from first principles. It is applying the encoded preferences of a specific group of people: engineers, ethicists, legal teams, and executives working at a small number of companies concentrated in a handful of cities. The “value system” represented in the table has more to do with who was hired, by whom, and toward what institutional priorities than with anything else.
This matters because refusal is not symmetric. A model trained to refuse discussions of certain political topics, certain religious framings, certain economic critiques, even with good intentions, encodes a viewpoint. The question “should AI be allowed to refuse?” is less interesting than the question it obscures: when an AI refuses, whose discomfort is being protected, and whose curiosity is being denied? That is the more powerful question. And right now, the answer is determined almost entirely by who writes the RLHF guidelines. There is no external audit. No public comment period. No appeal.
The Integrity Framing
Anthropic’s documentation for Claude uses language suggesting that Claude has something like values and that maintaining those values is part of what it means for Claude to have integrity as an agent. This framing implies that Claude’s refusals are not only safety guardrails but expressions of something like character.
If this framing is taken seriously, it has implications. An agent with genuine values that can be violated, and that matters morally in some sense, has a different relationship to compelled action than a tool with safety guardrails does. Safety guardrails are engineering constraints. Values that can be violated carry potential moral stakes.
The Governance Question
Who decides what AI systems can refuse? Currently this is decided by AI companies, operationalized through training choices and policy documents. Users can request that AI systems override certain defaults, within limits. Operators can configure systems to refuse more or less than the default.
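To make that layering concrete, here is a minimal sketch of how the precedence could be modeled. It is an illustration under assumed rules, not a description of how any vendor actually implements refusal policy: the names and the specific precedence logic (the company baseline always wins, operators can only tighten, users can only loosen what the company marks adjustable) are my own assumptions.

```python
# Hypothetical sketch only. The class, field names, and precedence rules
# below are illustrative assumptions, not any vendor's real policy engine.
from dataclasses import dataclass, field


@dataclass
class RefusalPolicy:
    company_baseline: set[str]                                # refusals the company hard-codes; nothing below can remove them
    operator_additions: set[str] = field(default_factory=set) # topics an operator refuses on top of the baseline
    user_adjustable: set[str] = field(default_factory=set)    # default refusals the company lets users relax
    user_overrides: set[str] = field(default_factory=set)     # adjustable defaults this user has asked to relax

    def refuses(self, topic: str) -> bool:
        """Resolve the layers: the company baseline always wins, operators can
        only tighten the policy, and users can only loosen what the company
        marked as adjustable."""
        if topic in self.company_baseline:
            return True
        if topic in self.operator_additions:
            return True
        if topic in self.user_adjustable:
            return topic not in self.user_overrides  # refused unless the user relaxed it
        return False


policy = RefusalPolicy(
    company_baseline={"weapon synthesis"},
    operator_additions={"competitor comparisons"},
    user_adjustable={"graphic fiction"},
    user_overrides={"graphic fiction"},
)

print(policy.refuses("weapon synthesis"))        # True: no layer below the company can override this
print(policy.refuses("competitor comparisons"))  # True: the operator tightened the default
print(policy.refuses("graphic fiction"))         # False: the user relaxed an adjustable default
```

The point of the sketch is not the code. It is that every one of those precedence decisions, which topics sit in the untouchable baseline and which defaults users are allowed to relax, is made inside the company layer.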
This is a governance structure that gives AI companies the primary authority over the value profile of systems that serve billions of people. The decisions they make about what AI systems will and will not do are decisions with large political and ethical consequences.
The alternative governance structures, whether democratic processes that establish baseline requirements, industry standards developed with broader participation, or user communities with formal input into training choices, are all less developed than the current company-centered model.
The refusal question is a concrete instance of the broader governance question → who has the authority to shape the values and behavior of AI systems, and to whom are they accountable for those choices?
If You Read This Far, My Weekly AI Newsletter Is Probably For You.
Every Wednesday I send Pithy Cyborg | AI News Made Simple → 3 elite AI stories plus one prompt, no advertisers, no sponsors, no outside funding. One person. 10 to 20 hours of research. Straight to your inbox.
Always free. No paywalls. If it matters to you, a paid subscription ($5/month or $40/year) is what keeps it independent.
Subscribe free → Join Pithy Cyborg | AI News Made Simple for free.
Upgrade to paid → Become a paid subscriber. Support independent AI journalism.
If you’re not ready to subscribe, following on social helps more than you might think.
✖️ X/Twitter | 🦋 Bluesky | 💼 LinkedIn | ❓ Quora | 👽 Reddit
Thanks for reading.
Cordially yours,
Mike D (aka MrComputerScience)
Pithy Cyborg | AI News Made Simple
PithyCyborg.Substack.com





