
Four Industries, One Pattern: Why Enterprise AI Products Keep Failing for the Same Reasons


Ankit Raheja has spent 13 years building AI and ML products in places where the technology had to earn its keep. His career started in 2013 at California’s State Compensation Insurance Fund, where he helped build a fraud detection system that flagged millions in fraudulent healthcare claims for federal prosecutors. That early work taught him something he’s carried through every role since: if the people using your AI product can’t explain why it flagged what it flagged, the system is dead on arrival.

He went on to lead AI product development at CDK Global, the largest car dealership software company in the U.S., where he launched a GenAI conversational platform and rescued an identity graph product that had failed under two previous teams. He now leads ML ranking platform development for a Fortune 10 e-commerce retailer, building systems that serve hundreds of millions of customers. We talked with Raheja about cross-industry patterns in AI adoption, the gap between GenAI hype and production reality, and what he’s learned from inheriting products that other teams couldn’t ship.

You started building AI products for fraud detection back in 2013, before most companies had dedicated AI roles. What did that early work look like, and how did it shape your approach to ML product development?

The early work was at the State Compensation Insurance Fund, the largest workers' compensation fund in California, where we built a fraud detection system to help bring to justice doctors who were committing healthcare fraud. This work was instrumental in helping the FBI and public prosecutors identify millions of dollars in fraudulent claims. I played the role of product owner, working with Special Investigation Unit users to hash out 40+ use cases. I then worked with data engineers and ML engineers to build a big data platform that used pattern matching to flag fraudulent claims. From the start in 2013, especially in fraud detection, we had to be very careful to use the term "alleged fraud," because there are humans involved. The key learning was explainability: the system can't be a black box, because lawyers, judges, the Special Investigation Unit, and the FBI may need to audit it. For explainability to work, you need deep empathy for your customer so they can trust your AI capabilities. Explainability and deep customer empathy have shaped the requirements of my AI/ML product development lifecycle ever since.
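As an illustration of the kind of explainable, auditable pattern matching described here, consider a rule-based flagger that returns human-readable reasons instead of an opaque score. The field names and thresholds below are invented for the sketch, not the actual system's rules:

```python
def flag_claim(claim: dict) -> list[str]:
    """Run illustrative fraud-pattern checks against a claim record.

    Each rule that fires contributes a plain-English reason, so an
    investigator (or a lawyer auditing the system) can see exactly
    why a claim was flagged. An empty list means no alert.
    """
    reasons = []
    if claim.get("billed_amount", 0) > 10_000:
        reasons.append("billed amount exceeds $10,000 threshold")
    if claim.get("procedures_per_visit", 0) > 8:
        reasons.append("unusually high procedure count per visit")
    if claim.get("provider_flag_history", 0) >= 3:
        reasons.append("provider previously flagged 3 or more times")
    return reasons

# A claim that trips two of the illustrative rules:
claim = {"billed_amount": 15_000, "procedures_per_visit": 9}
reasons = flag_claim(claim)
```

The point is the shape, not the rules: every alert carries its own justification, which is what makes the output auditable rather than a black box.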

At CDK Global, you launched a GenAI product for car dealerships. That’s an audience with limited appetite for new technology. What did you learn about building AI products for users who are skeptical of it?

In my opinion, AI adoption is earned, not forced.

There were three learnings.

First, sell the outcome, not the underlying technology, which here happens to be AI. Users don't care that there's a GenAI model converting their questions to SQL. We positioned our product around the fact that users got answers in seconds without filing a ticket with the data team.

Second, make the users the heroes of the product. Our GenAI chatbot surfaced answers and recommendations, but the interface let car dealers feel like they discovered the problems themselves, with machine learning doing the heavy lifting behind the scenes.

When a regional manager asks "what's my slowest-moving inventory?" and gets an instant answer, they feel empowered, not replaced. That isn't UX polish: it makes a regional manager who can get answers in seconds look like a hero in front of leadership while delivering results.

Finally, give end users the opportunity to take control when they need it. Every generated query should be visible and editable: let them see the SQL, tweak it, override it. Counterintuitively, giving users that control makes them trust the AI more. We also built dashboards for managers showing how often users accepted versus modified the AI's output. That transparency built trust in a way that a black box, which feels unyielding, never could.
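A minimal sketch of the accepted-versus-modified dashboard logic described above. The event fields (`accepted`, `modified`) are invented stand-ins for whatever the real telemetry recorded:

```python
from dataclasses import dataclass

@dataclass
class QueryEvent:
    """One AI-generated SQL suggestion and what the user did with it."""
    user_id: str
    accepted: bool   # user ran the query
    modified: bool   # user edited the SQL before running it

def acceptance_report(events: list[QueryEvent]) -> dict[str, float]:
    """Summarize how often users took AI output as-is vs. edited it."""
    total = len(events)
    if total == 0:
        return {"accepted_as_is": 0.0, "modified": 0.0}
    as_is = sum(e.accepted and not e.modified for e in events)
    edited = sum(e.modified for e in events)
    return {"accepted_as_is": as_is / total, "modified": edited / total}

events = [
    QueryEvent("u1", accepted=True, modified=False),
    QueryEvent("u2", accepted=True, modified=True),
    QueryEvent("u3", accepted=False, modified=True),
]
report = acceptance_report(events)
```

A rising modification rate on a particular query type is exactly the kind of signal such a dashboard surfaces for managers.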

The Identity Graph product at CDK had failed twice before you took it over. Walk us through how you diagnosed what went wrong and what you changed.

The identity graph had failed twice under two different teams before I took it over. My first finding had nothing to do with the model: there was no alignment on ownership. Identity graphs require data from multiple product teams, and accountability had been missing across all of them. The fix required asking leadership for help in standing up a dedicated tiger team with executive support and a single mandate.

The second issue was data quality. Customer records had typos, outdated info, and conflicting fields. When I audited records, 30-40% had structural issues. Fixing the foundation became its own product charter.

Third was trust. Dealership users would see a few wrong matches and dismiss the whole system. A 95% accuracy rate means nothing if users remember the 5% that failed. We added feedback loops so dealers could flag errors, turning skeptics into training data contributors.

The diagnostic lesson: map data flows before touching model architecture. What looked like an ML problem was actually a data infrastructure problem in disguise. Once we fixed ownership, data quality, and user trust, the product scaled to 15,000 dealerships.

You’ve built ML products across government, consulting, automotive, and large-scale e-commerce. What patterns do you see in how AI adoption actually happens across different industries?

At a high level, AI adoption is a change management problem first and a technology problem second. The companies that succeed take a whole-product mindset: product discovery and product launch are both first-class citizens, not just the search for the right problem to solve.

First, adoption speed is highly correlated with how fast you can get feedback. In e-commerce, we can ship a ranking change and see impact within hours. In the automotive commerce space, dealership software may have monthly release cycles, so iteration was slower and trust had to be built gradually. In a government context, feedback loops stretched even longer, especially in fraud detection, where the system had to be ironclad before production launch.

Second, the AI buyer (an internal stakeholder or customer) is rarely the AI user. At Walmart, product and engineering understand ML tradeoffs, and business stakeholders are very data-driven, leveraging A/B testing for statistically significant results. At CDK, the buyer was the dealer principal, but the users were service advisors and sales managers. In consulting, the buyer was an executive sponsor, but the users were operational analysts who would feel threatened if AI worked too well. The value proposition had to translate across that gap.

ML ranking platforms have to balance competing objectives. Relevance versus personalization versus business metrics. How do you think about those tradeoffs when you’re building at massive scale?

At massive scale, you can't optimize for just one objective without affecting the others. Relevance gets users to the right product, personalization keeps users engaged, and business metrics ensure the platform stays viable. The mistake I see is teams treating these as a single optimization problem. They are negotiations between stakeholders with different but valid goals: customers want relevance and the best offer at the best speed and price, merchants want visibility and sales, and the business needs revenue and margin. My approach is to make these tradeoffs explicit rather than buried in model weights: define guardrails for relevance metrics, top-of-funnel metrics, and customer experience, then optimize business metrics within those constraints. This keeps the system from drifting toward short-term revenue at the cost of long-term trust.
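One minimal way to sketch "optimize business metrics within guardrail constraints": treat relevance as a hard floor and rank by business value only among items that clear it. The floor value and field names are illustrative, not the production system's:

```python
def rank_with_guardrails(candidates: list[dict], relevance_floor: float = 0.6) -> list[dict]:
    """Rank by business value, but only among items that clear a
    minimum relevance bar; below-floor items fall to the bottom,
    ordered by relevance so the page still degrades gracefully."""
    eligible = [c for c in candidates if c["relevance"] >= relevance_floor]
    rest = [c for c in candidates if c["relevance"] < relevance_floor]
    eligible.sort(key=lambda c: c["business_value"], reverse=True)
    rest.sort(key=lambda c: c["relevance"], reverse=True)
    return eligible + rest

items = [
    {"id": "a", "relevance": 0.9, "business_value": 0.2},
    {"id": "b", "relevance": 0.7, "business_value": 0.8},
    {"id": "c", "relevance": 0.3, "business_value": 0.9},
]
ranked = rank_with_guardrails(items)
```

Note that the highest-margin item cannot jump the queue if it misses the relevance floor; the constraint, not the weights, encodes the tradeoff, which is what makes it explicit and auditable.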

The second challenge is measurement at scale. Small ranking changes can move big numbers, which makes stakeholders cautious. But A/B testing alone does not capture everything: a change may lift a metric significantly while eroding trust over time if users are not happy. I push for layered evaluation: online metrics for immediate impact, plus qualitative feedback loops and longer-term cohort analysis. The real trade is a certain short-term gain versus an uncertain long-term cost. Building at scale means building the instrumentation to see not only the short-term effect that makes the quarter's dashboard look good, but also the second-order, longer-term effects. That is how you make sure customers, sellers, and the business all win together.

There’s a lot of noise right now about GenAI transforming every industry overnight. Where do you see the gap between the hype and what’s actually working in production?

The teams I see succeeding treat GenAI as an ingredient, not a strategy. They focus on the right evaluations, on whether it is solving the right customer problems at realistic latency and ROI, and on having proper data foundations in place.

What is actually working in production is still search and ranking optimization. These are classical ML models, but the infrastructure to serve them at millisecond latency to millions of customers is what matters, and the incremental improvements translate to millions in incremental revenue. Narrow, well-defined use cases like document extraction, classification tasks, coding assistants, customer support bots, and content moderation are also doing extremely well. There, the human stays in the loop and catches failures while AI handles the busy work.

The gap is widest where teams try to bolt on multi-agent systems, but the reliability isn't there for high-stakes, multi-step tasks. The demos are impressive, but these systems are not yet solving production-grade use cases. RAG (retrieval-augmented generation) requires a lot of work in embeddings, data quality, chunking, and evaluation. Many companies tried to build it themselves, are now learning it is not straightforward, and are leaning toward vendors that provide these capabilities out of the box. Simply adding an AI approach without solving underlying data issues, without a genuine user problem, and with no path to profitability does not succeed.
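As a concrete taste of the unglamorous RAG work mentioned above, here is a naive fixed-window chunker. The window size and overlap are illustrative defaults, not anything the interview specifies, and real pipelines usually split on sentence or section boundaries instead:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split a document into overlapping character windows.

    Overlap keeps facts that straddle a boundary retrievable from at
    least one chunk; tuning size and overlap against retrieval
    evaluations is part of the work RAG quietly demands.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    step = chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

chunks = chunk_text("a" * 450)
```

Even this toy version exposes real decisions (boundary handling, overlap ratio) that only show up once you evaluate retrieval quality on your own data.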

You’ve written about AI product management and led workshops on multi-agent workflows. What do you think most product managers get wrong when they’re first building AI products?

The most common mistake I see is skipping straight to "build with AI" without mastering "use AI" first. PMs get excited about shipping an AI feature, but they haven't internalized how these systems actually behave: the non-determinism, the failure modes, the latency-accuracy tradeoffs. The best AI PMs I have worked with logged hundreds of hours prompting, upskilling themselves, building PoCs, and sharing in public before they ever wrote a PRD for an AI feature.

The second mistake that kills most projects is underestimating the need for evaluation. Traditional PMs, and even ML PMs, are used to success metrics that can be binary. AI products require continuous evaluation: offline benchmarks, online A/B tests, drift monitoring, human-in-the-loop checks. If you can't measure whether your system is getting better or worse over time, you are flying blind. I have seen teams ship impressive demos that failed in production because they had no good feedback loops.

The framework I developed explicitly sequences these competencies for a reason: you have to be a learner before you can be a builder, and a builder before you can lead. Most PMs want to jump to strategy. The ones who succeed put in the unglamorous groundwork first.

What’s a product decision you made early in your career that you’d approach completely differently now?

Earlier in my career at the State Compensation Insurance Fund, I inherited a fraud detection model with abysmal precision. The investigation unit was drowning in false positives and had essentially stopped trusting the alerts. My instinct was to retrain with more features or tune thresholds. That was the wrong diagnosis. The model wasn't built to solve a user problem; it was built to "detect fraud," which sounds like the same thing but isn't. The actual users were fraud analysts who needed to prioritize their queue and maintain credibility with leadership. They wanted alerts that were actionable, with ample context.

If I approached that decision today, I would start with diagnostic questions before touching the model: what problem is this solving, and for whom? I would map the data supply chain from source to output. The fix that eventually worked required going back to first principles: mapping out the scenarios that warranted flags and making the logic explainable to analysts so they could verify it themselves. Explainability was key, and they started trusting the system more. The experience taught me to resist the urge to improve the model until I have diagnosed what is actually failing. Most of the time, it is not the algorithm that needs to change first.
