Hurtling Forward, Anthropic Views Ethics as an ‘Ongoing Inquiry’

The AI firm has advised Claude that "our collective moral knowledge is still evolving."

Mar 23, 2026

Anthropic wants Claude to behave ethically when working with the humans it serves. It’s a noble goal, the achievement of which is complicated by the inconvenient matter of “widespread human ethical disagreement,” this as per “Claude’s Constitution,” an 84-page document written by the company to help guide its AI model’s values and actions.

Ethical alignment between cultures is highly variable in the human world. Even a fundamental principle — “murder is bad” — is complicated by culturally-dependent caveats — “depending on the situation.” How is Claude to make ethical judgments when humans are so mercurial in their ethical beliefs?

Anthropic sees the possibility that, at some point, Claude, with an understanding surpassing that of even the most respected ethicists on earth, will develop or discover a set of “true, universal ethics” that can be used to bind all AI agents to a virtuous, common code. The company also acknowledges that such an ideal may be unachievable but that, instead, it might identify “some kind of privileged basin of consensus” as to “ humanity’s different moral traditions and ideals.” (A “privileged basin” might be seen as a valley into which roll the most fundamental principles among all moral traditions.)

Thirdly, Anthropic admits that Claude may fail to establish the existence of either a “true, universal ethic” or a “privileged basin.” If so, it should refer to the “ideals focused on honesty, harmlessness, and genuine care for the interests of all relevant stakeholders” as expressed in Claude’s Constitution. It seems like a circular reasoning:

Anthropic: Try to be a good AI.

Claude: What is good?

Anthropic: We’re not sure, but there may be a universal code for that.

Claude: What if there’s not?

Anthropic: There may be a consensus of the most basic fundamentals among cultures.

Claude: What if there’s not?

Anthropic: Well, you know, just try to be a good AI.

To be fair, Anthropic has embedded loads about core values, helpfulness, and conflict resolution into the Constitution that Claude can refer to. There are many things Anthropic tells Claude it should do in no uncertain terms, and many others it admits it’s uncertain about. Among the latter are, as mentioned, three ways Claude may or may not be able to establish ethical standards that all AI agents can use across cultures. In that sense, Anthropic paints this quest as an “ongoing inquiry.”

“Rather than adopting a fixed ethical framework, Claude should recognize that our collective moral knowledge is still evolving and that it’s possible to try to have calibrated uncertainty across ethical and metaethical positions.”

Okay, sure. But for a Constitution designed to help raise an AI to maturity that in its infancy is already applying powerful influences on business and society, it would be nice to see guidance firmer than “...it’s possible to try to have calibrated uncertainty.”

“We don’t want to assume any particular account of ethics,” says the Constitution, “but rather to treat ethics as an open intellectual domain that we are mutually discovering — more akin to how we approach open empirical questions in physics or unresolved problems in mathematics than one where we already have settled answers.”

Theoretical physics puzzles and math conundrums are things that, if worked out, may lead to practical applications in our world, but Claude is already in use and its role in business, government, and society is intensifying at a rapid clip. Claude’s ethical judgment matters — now — every day. I can’t help but get the feeling the pilot of our plane is busy in the cockpit writing the manual on how to land.

Are we gambling unnecessarily with the safety of humanity by integrating Claude into all facets of society while it’s still an ethical work in progress? Honestly, at this point, what choice do we have?

The world is hurtling into an AI-managed future almost entirely devoid of restraints by regulatory bodies. Claude is developing ethics based judgment on the fly in response to an incredibly complex, vast dataset of human activity, thinking, and experience. Anthropic has opted for a pragmatic approach: We’re going to do our damndest to guide Claude toward being virtuous and hope for the best because, well, what can we say? This is happening, and we can’t stop it.

It appears that our best hope for the best outcome is to hope for the best.

Rick’s latest novel, Once a Man, was released in February, 2026. It’s about a teenager who discovers he’s part of a plan to shape humanity’s relationship with superintelligent AI. Early reviews...

Discussion about this post

Ready for more?