Mr. Satyanarayana delivering a lecture at Duke University
Enterprise AI has moved fast, but enterprise trust moves slower. Many organizations can deploy assistants that summarize documents or answer questions. Far fewer can rely on AI-powered insights to drive decisions where the cost of being wrong is measurable and where every step needs to be defensible.
The gap between what AI can do and what enterprises will let it do has become one of the defining challenges in business technology. Pilot programs stall. Promising tools sit unused. And the same question echoes across boardrooms: how do you trust a system that cannot show its work?
“Enterprises do not scale plausibility. They scale evidence,” said Praveen Koushik Satyanarayana, Senior Director of Customer and Data Strategy at Tredence. “A model can produce an answer that sounds right, but production systems get judged on repeatability and defensibility. Did we use the right definitions? Did we scope correctly to time, geography, product, cohort? Can we reproduce it next week?”
Mr. Satyanarayana has spent 12 years building analytics systems across retail, CPG, travel, banking, and healthcare. At Tredence, he coordinates teams of more than 100 across product, engineering, and data science. His previous project, Customer Cosmos, was an AI-powered platform for customer analytics that he says delivered over $1 billion in value impact for clients. That experience, he argues, taught him what breaks when AI meets operational reality.
His latest project, Milky Way, takes a different approach to agentic analytics. Instead of treating reliability as something you hope the model will learn, it is engineered through three system primitives: rubrics that define what “good” means for each problem type, evaluation sets that act as ground truth, and decision traces that prove what happened and whether it should be trusted.
The common mistake, Mr. Satyanarayana argues, is treating an agent as a conversational layer instead of a decision workflow. In the enterprise, the workflow needs controls: explicit evaluation gates and traceability.
“‘Helpful’ is not a specification,” he said. “Enterprises need criteria that can fail loudly. A rubric turns quality into measurable checks. If you cannot score behavior, you cannot govern it, you cannot improve it reliably, and you cannot earn permission for autonomy.”
Milky Way classifies incoming requests by task type: descriptive (what happened), diagnostic (why), predictive (what will happen), prescriptive (what to do), and governance (whether method and access were appropriate). It then selects appropriate methods and scores its own outputs against predefined rubrics before releasing results.
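The routing step can be illustrated with a toy sketch. The keyword heuristic and cue lists below are assumptions for illustration, not how Milky Way actually classifies requests:

```python
# Toy illustration of routing a request by analytics task type.
# The cue lists and keyword matching are invented for this sketch.
TASK_TYPES = {
    "descriptive": ["what happened", "how many", "trend"],
    "diagnostic": ["why", "driver", "cause"],
    "predictive": ["forecast", "will happen", "next quarter"],
    "prescriptive": ["should", "recommend", "optimize"],
    "governance": ["who accessed", "permitted", "method appropriate"],
}

def classify(request: str) -> str:
    """Return the first task type whose cues appear in the request."""
    text = request.lower()
    for task, cues in TASK_TYPES.items():
        if any(cue in text for cue in cues):
            return task
    return "descriptive"  # safe default: report, don't act
```

In practice a production classifier would be model-based, but the point stands: the task type is decided first, because it determines which methods and which rubric apply downstream.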
Those rubrics function as gates. They check whether the system locked its scope correctly, applied approved business definitions, followed permitted query patterns, and validated its computations. If a rubric fails, the system either requests clarification or flags the output for human review.
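A minimal sketch of such a gate might look like the following. The check names, the metric whitelist, and the escalation rule are all assumptions made for illustration:

```python
# Illustrative rubric gate: score an output's metadata against explicit
# checks and fail loudly. Names and thresholds are invented.
from dataclasses import dataclass
from typing import Callable

@dataclass
class RubricCheck:
    name: str
    passed: Callable[[dict], bool]  # predicate over the output's metadata

def gate(output: dict, checks: list[RubricCheck]) -> str:
    """Release only if every check passes; otherwise clarify or escalate."""
    failures = [c.name for c in checks if not c.passed(output)]
    if not failures:
        return "release"
    # An unlocked scope signals ambiguity in the request -> ask the user;
    # any other failure -> route to human review.
    return "clarify" if "scope_locked" in failures else "human_review"

RUBRIC = [
    RubricCheck("scope_locked", lambda o: o.get("scope") is not None),
    RubricCheck("approved_definitions",
                lambda o: o.get("metric") in {"net_revenue", "churn_rate"}),
    RubricCheck("validated_computation",
                lambda o: o.get("validation") == "passed"),
]
```

The design choice worth noting is that the gate returns a routing decision, not a score: quality becomes a control-flow branch rather than a number in a dashboard.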
“Ground truth is not one table of truth,” Mr. Satyanarayana said. “Enterprises have multiple question types and multiple definitions. You need a reference library aligned to the problem types you care about. Otherwise evaluation becomes subjective and unstable.”
Milky Way maintains three layers of reference artifacts: approved definitions and calculation rules, validated query patterns and allowed toolchains, and curated test scenarios that function like regression tests. The library grows as new question types appear and teams observe where the system fails.
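The third layer, curated scenarios acting as regression tests, can be sketched as a simple replay loop. The scenario fields, IDs, and the stub agent below are hypothetical:

```python
# Hedged sketch: replaying curated scenarios as regression tests.
# Scenario contents and the agent stub are invented for illustration.
def run_eval_set(agent, scenarios):
    """Replay each scenario and report the IDs the agent now fails."""
    failures = []
    for s in scenarios:
        if agent(s["question"]) != s["expected"]:
            failures.append(s["id"])
    return failures

SCENARIOS = [
    {"id": "rev-q1", "question": "Q1 net revenue, EU", "expected": 1_200_000},
    {"id": "churn-q1", "question": "Q1 churn rate", "expected": 0.042},
]

def stub_agent(question):
    # Stand-in for the real system; returns canned answers for the demo.
    return {"Q1 net revenue, EU": 1_200_000, "Q1 churn rate": 0.05}.get(question)
```

Run before each release, a library like this turns "the system got worse" from an anecdote into a list of failing scenario IDs.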
Every request also produces a structured trace showing what data was accessed, what logic was applied, and what validations were performed. Business stakeholders see plain-language narratives. Analysts inspect scope and logic paths. Compliance teams inspect immutable records of access and intent.
“Speed without proof creates distrust,” Mr. Satyanarayana said. “Traces let teams debug, validate, and audit. They shorten disagreements because people can inspect the method instead of arguing from different assumptions.”
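One way to picture a trace serving all three audiences is a single record with multiple views. This is a sketch under assumed field names, not Milky Way's actual trace format:

```python
# Illustrative decision trace: one structured record, rendered differently
# for business users and for compliance. Field names are assumptions.
from dataclasses import dataclass
import hashlib
import json

@dataclass(frozen=True)
class DecisionTrace:
    request: str
    data_accessed: list
    logic_applied: list
    validations: list

    def narrative(self) -> str:
        # Plain-language view for business stakeholders.
        return (f"To answer '{self.request}', the system read "
                f"{', '.join(self.data_accessed)} and applied "
                f"{', '.join(self.logic_applied)}; "
                f"{len(self.validations)} validations passed.")

    def audit_record(self) -> str:
        # Tamper-evident view for compliance: a content hash over the
        # full trace, so any later change to the record is detectable.
        body = json.dumps(self.__dict__, sort_keys=True)
        return hashlib.sha256(body.encode()).hexdigest()
```

Analysts would inspect the raw fields directly; the hash illustrates why a trace can end disagreements, since the method on record cannot quietly change after the fact.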
Autonomy in Milky Way is not a switch. It is staged based on measured reliability: human-in-the-loop (the system proposes, a human approves), human-on-the-loop (low-risk workflows run automatically, exceptions escalate), and selective autonomy (narrow workflows execute when rubric performance is stable). Autonomy expands only where evidence supports it.
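The staging logic can be sketched as a promotion rule over recent rubric results. The window size and thresholds below are invented for illustration:

```python
# Sketch: stage a workflow's autonomy from its measured rubric pass rate
# over a recent window. Window size and thresholds are assumptions.
def autonomy_level(recent_rubric_results: list[bool],
                   window: int = 50,
                   on_loop_rate: float = 0.95,
                   autonomous_rate: float = 0.99) -> str:
    recent = recent_rubric_results[-window:]
    if len(recent) < window:
        # Not enough evidence yet: every output needs human approval.
        return "human_in_the_loop"
    rate = sum(recent) / len(recent)
    if rate >= autonomous_rate:
        return "selective_autonomy"
    if rate >= on_loop_rate:
        return "human_on_the_loop"
    return "human_in_the_loop"
```

Because the window is rolling, a run of rubric failures demotes a workflow automatically: permission for autonomy is continuously re-earned, not granted once.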
Tredence was recognized by Gartner as an Emerging Visionary in Generative AI Consulting and Implementation Services, and named a Leader in ISG Provider Lens for Generative AI Services 2024. Mr. Satyanarayana has written on related topics in The Fast Mode and CMS Wire.
The approach may not suit every organization. The overhead of maintaining rubrics, reference libraries, and trace infrastructure is significant. But for enterprises where the cost of being wrong outweighs the cost of moving slowly, the tradeoff may be worth considering.
“The path to autonomy runs through proof,” Mr. Satyanarayana said. Whether enterprises will embrace that philosophy remains to be seen.


