
Engineering Evidence into Agentic AI systems

2026/01/04 04:48
4 min read

Mr. Satyanarayana delivering a lecture at Duke University

Enterprise AI has moved fast, but enterprise trust moves slower. Many organizations can deploy assistants that summarize documents or answer questions. Far fewer can rely on AI-powered insights to drive decisions where the cost of being wrong is measurable and where every step needs to be defensible.

The gap between what AI can do and what enterprises will let it do has become one of the defining challenges in business technology. Pilot programs stall. Promising tools sit unused. And the same question echoes across boardrooms: how do you trust a system that cannot show its work?

“Enterprises do not scale plausibility. They scale evidence,” said Praveen Koushik Satyanarayana, Senior Director of Customer and Data Strategy at Tredence. “A model can produce an answer that sounds right, but production systems get judged on repeatability and defensibility. Did we use the right definitions? Did we scope correctly to time, geography, product, cohort? Can we reproduce it next week?”

Mr. Satyanarayana has spent 12 years building analytics systems across retail, CPG, travel, banking, and healthcare. At Tredence, he coordinates teams of more than 100 across product, engineering, and data science. His previous project, Customer Cosmos, was an AI powered platform for customer analytics that he says delivered over $1 billion in value impact for clients. That experience, he argues, taught him what breaks when AI meets operational reality.

His latest project, Milky Way, takes a different approach to agentic analytics. Instead of treating reliability as something you hope the model will learn, it is engineered through three system primitives: rubrics that define what “good” means for each problem type, evaluation sets that act as ground truth, and decision traces that prove what happened and whether it should be trusted.

The Problem with “Helpful”

The common mistake, Mr. Satyanarayana argues, is treating an agent as a conversational layer instead of a decision workflow. In the enterprise, the workflow needs controls: explicit evaluation gates and traceability.

“‘Helpful’ is not a specification,” he said. “Enterprises need criteria that can fail loudly. A rubric turns quality into measurable checks. If you cannot score behavior, you cannot govern it, you cannot improve it reliably, and you cannot earn permission for autonomy.”

Milky Way classifies incoming requests by task type: descriptive (what happened), diagnostic (why), predictive (what will happen), prescriptive (what to do), and governance (whether method and access were appropriate). It then selects appropriate methods and scores its own outputs against predefined rubrics before releasing results.
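This routing step can be sketched as a simple classifier. The names `TaskType` and `classify_request`, and the keyword heuristic, are illustrative assumptions, not Milky Way's actual implementation:

```python
# Hypothetical sketch of task-type routing as described in the article.
from enum import Enum

class TaskType(Enum):
    DESCRIPTIVE = "what happened"
    DIAGNOSTIC = "why it happened"
    PREDICTIVE = "what will happen"
    PRESCRIPTIVE = "what to do"
    GOVERNANCE = "were method and access appropriate"

# A trivial keyword heuristic standing in for whatever classifier the
# real system uses; a production router would likely be model-based.
KEYWORDS = {
    TaskType.DIAGNOSTIC: ("why", "cause", "driver"),
    TaskType.PREDICTIVE: ("forecast", "will", "predict"),
    TaskType.PRESCRIPTIVE: ("should", "recommend", "optimize"),
    TaskType.GOVERNANCE: ("access", "permitted", "method"),
}

def classify_request(question: str) -> TaskType:
    q = question.lower()
    for task, words in KEYWORDS.items():
        if any(w in q for w in words):
            return task
    return TaskType.DESCRIPTIVE  # default: "what happened"
```

Once a request is typed, the system can pick the methods and rubrics appropriate to that problem class rather than treating every question the same way.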

Those rubrics function as gates. They check whether the system locked its scope correctly, applied approved business definitions, followed permitted query patterns, and validated its computations. If a rubric fails, the system either requests clarification or flags the output for human review.
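A rubric-as-gate check might look like the following sketch, assuming boolean checks over a structured output. All names and check logic here are invented for illustration, not taken from the product:

```python
# Illustrative rubric gate: each check mirrors one of the criteria the
# article names (scope, definitions, query patterns, validation).
from dataclasses import dataclass
from typing import Callable

@dataclass
class RubricCheck:
    name: str
    passed: Callable[[dict], bool]

RUBRIC = [
    RubricCheck("scope_locked",
                lambda out: out.get("scope") is not None),
    RubricCheck("approved_definitions",
                lambda out: out.get("definitions_source") == "approved"),
    RubricCheck("permitted_query_pattern",
                lambda out: out.get("query_pattern_id") in {"Q1", "Q2"}),
    RubricCheck("computation_validated",
                lambda out: out.get("validation_passed", False)),
]

def gate(output: dict) -> tuple[bool, list[str]]:
    """Return (release, failed_check_names). Any failure means the
    result is held for clarification or human review, not released."""
    failures = [c.name for c in RUBRIC if not c.passed(output)]
    return (not failures, failures)
```

The point of the structure is that quality becomes a list of named checks that can fail loudly, rather than an overall impression of helpfulness.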

Building a Reference Library

“Ground truth is not one table of truth,” Mr. Satyanarayana said. “Enterprises have multiple question types and multiple definitions. You need a reference library aligned to the problem types you care about. Otherwise evaluation becomes subjective and unstable.”

Milky Way maintains three layers of reference artifacts: approved definitions and calculation rules, validated query patterns and allowed toolchains, and curated test scenarios that function like regression tests. The library grows as new question types appear and teams observe where the system fails.
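A minimal sketch of such a three-layer library, with the curated scenarios run like regression tests, could look like this; the structure and names are assumptions for illustration, not Tredence's schema:

```python
# Hypothetical three-layer reference library: definitions, query
# patterns, and curated test scenarios, per the article's description.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ReferenceLibrary:
    definitions: dict[str, str] = field(default_factory=dict)     # metric -> calculation rule
    query_patterns: dict[str, str] = field(default_factory=dict)  # pattern id -> approved template
    scenarios: list[dict] = field(default_factory=list)           # regression-style test cases

    def add_scenario(self, question: str, expected: str) -> None:
        # The library grows as new question types appear and teams
        # observe where the system fails.
        self.scenarios.append({"question": question, "expected": expected})

    def run_regressions(self, answer_fn: Callable[[str], str]) -> list[str]:
        """Return the questions the system currently answers incorrectly."""
        return [s["question"] for s in self.scenarios
                if answer_fn(s["question"]) != s["expected"]]
```

Running the scenario layer on every change gives the team an objective signal of whether a model or prompt update regressed previously correct behavior.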

Every request also produces a structured trace showing what data was accessed, what logic was applied, and what validations were performed. Business stakeholders see plain language narratives. Analysts inspect scope and logic paths. Compliance teams inspect immutable records of access and intent.
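A structured trace of this kind could be modeled as follows. The fields mirror the description above, but the class and serialization format are hypothetical:

```python
# Illustrative decision trace: what data was accessed, what logic was
# applied, what validations ran. Serialized once and stored append-only
# so compliance teams get an immutable record of access and intent.
from dataclasses import dataclass, field
import json
import time

@dataclass
class DecisionTrace:
    request_id: str
    data_accessed: list[str] = field(default_factory=list)
    logic_steps: list[str] = field(default_factory=list)
    validations: list[str] = field(default_factory=list)
    timestamp: float = field(default_factory=time.time)

    def to_record(self) -> str:
        # Deterministic key order keeps records diff-friendly for audits.
        return json.dumps(self.__dict__, sort_keys=True)
```

Different audiences would then consume different projections of the same record: a plain-language narrative for business stakeholders, the logic steps for analysts, the full serialized record for compliance.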

“Speed without proof creates distrust,” Mr. Satyanarayana said. “Traces let teams debug, validate, and audit. They shorten disagreements because people can inspect the method instead of arguing from different assumptions.”

A Staged Path to Autonomy

Autonomy in Milky Way is not a switch. It is staged based on measured reliability: human-in-the-loop (system proposes, human approves), human-on-the-loop (low-risk workflows auto-run, exceptions escalate), and selective autonomy (narrow workflows execute when rubric performance is stable). Autonomy expands only where evidence supports it.
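Mapping measured rubric performance to an autonomy stage could be sketched like this; the thresholds and sample counts are placeholder assumptions, not figures from the article:

```python
# Hypothetical staging policy: autonomy expands only where measured
# rubric performance supports it, and falls back otherwise.
from enum import Enum

class AutonomyStage(Enum):
    HUMAN_IN_THE_LOOP = 1   # system proposes, human approves
    HUMAN_ON_THE_LOOP = 2   # low-risk workflows auto-run, exceptions escalate
    SELECTIVE_AUTONOMY = 3  # narrow workflows execute unattended

def stage_for(rubric_pass_rate: float, samples: int) -> AutonomyStage:
    """Map measured rubric performance to a stage. The thresholds below
    are illustrative placeholders, not values from Milky Way."""
    if samples >= 500 and rubric_pass_rate >= 0.99:
        return AutonomyStage.SELECTIVE_AUTONOMY
    if samples >= 100 and rubric_pass_rate >= 0.95:
        return AutonomyStage.HUMAN_ON_THE_LOOP
    return AutonomyStage.HUMAN_IN_THE_LOOP
```

Because the stage is recomputed from evidence rather than set by hand, a workflow whose pass rate degrades is automatically demoted back to human approval.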

Tredence was recognized by Gartner as an Emerging Visionary in Generative AI Consulting and Implementation Services, and named a Leader in ISG Provider Lens for Generative AI Services 2024. Mr. Satyanarayana has written on related topics in The Fast Mode and CMS Wire.

The approach may not suit every organization. The overhead of maintaining rubrics, reference libraries, and trace infrastructure is significant. But for enterprises where the cost of being wrong outweighs the cost of moving slowly, the tradeoff may be worth considering.

“The path to autonomy runs through proof,” Mr. Satyanarayana said. Whether enterprises will embrace that philosophy remains to be seen.

