An AI engineer is three months into rerunning the same experiments with different threshold settings, trying to reach an accuracy number chosen by someone who has never trained a model. The PM committed 90% to the executive board. The engineer has reached 81% and exhausted the obvious approaches. Nobody lied. Nobody is incompetent. The sequence created an impossible situation: a fixed target, set before anyone understood what was achievable, inherited by the people least able to renegotiate it. The result is missed deadlines, unhappy boards, and friction between the PMs who make promises and the AI engineers who can't keep them. These failure costs emerge from planning conversations that never happened. The internal version shows up as a failed sprint review. The client-facing version shows up as a difficult renewal conversation.
The Hidden ROI Killer in AI Projects: Starting With the Solution
Most AI investment failures are decided in the first two weeks. This is how you ensure those two weeks are used wisely.

An e-commerce organization came to us wanting to "build a cart abandonment prediction model." The goal was to identify baskets at risk of abandonment early enough to trigger an automatic discount. The scope was defined, the use case concrete, the technical approach mapped. When we asked what metric they aimed to move, and why customers abandoned carts in the first place, the conversation stopped. Nobody had asked that question. Analysis revealed that customers with delivery windows of five or more working days abandoned at significantly higher rates than those with faster delivery. No prediction model would have changed that. They didn't need AI. They needed a better delivery program.
The model would have shipped. The discounts would have launched. And abandonment rates would have stayed flat, because the intervention targeted the wrong cause.
The most common situation we find when an organization is considering an AI initiative: they've already decided what they're building, and more often than not, they've also decided on the technical approach. Not what problem they're solving. Not what metric they want to move. What they're building.
It might be a chatbot, a recommendation engine, a document processing pipeline, or a "co-pilot" for an internal workflow. The technology is chosen. The feature is already in the backlog. The sprint is already planned. Someone has been prototyping in a side branch.
The business case gets assembled around the decision, not before it.
This pattern is the single most reliable predictor of AI initiative failure.
In 2025, more than 40% of companies abandoned most of their AI initiatives (S&P Global). MIT's research on enterprise generative AI found that 95% of projects failed to deliver meaningful business impact. Gartner projects that 30% of AI proofs of concept will be abandoned before reaching production, and that over 40% of agentic AI projects will be canceled before the end of 2027.
The dominant narrative blames technology complexity, data quality, or organizational resistance. All play a role. But the underlying cause is simpler and more preventable: the sequence is wrong.
Most organizations start with a solution. They should start with a metric.
Stop and Read This First
Before proceeding: if your organization operates in pure exploration mode, genuinely curious about AI without a specific business problem or P&L pressure, this process will feel premature. That's fine. This article is for organizations that already have specific business goals, are considering AI investment to advance them, and want a structured way to decide before committing real budget.
Start With a Business Goal. Then Connect It to a Metric.
Every business function exists to move something measurable: revenue, cost, time, quality, risk. Every organizational initiative should connect to a specific, measurable outcome.
AI is not an exception. It is not a category that suspends normal business logic. AI is an investment, and like every other investment it needs to be tied to specific, measurable outcomes.
Ron Kohavi, who ran controlled experimentation programs at Microsoft and Amazon and co-authored "Trustworthy Online Controlled Experiments," calls this the Overall Evaluation Criterion: a single agreed-upon metric that defines "better" before the initiative begins.
Eliyahu Goldratt stated bluntly: "Tell me how you measure me, and I will tell you how I will behave."
Neither was talking about AI. Both turn out to be exactly right about it.
The goal of this step is not to find a metric that justifies a pre-chosen solution, but to define what needs to change before deciding how best to change it.
Four Characteristics of Good Metrics
1. Specific and measurable. "Improve operational efficiency" isn't a metric. "Reduce average customer support resolution time from 11 minutes to under 4 minutes" is.
2. Direct connection to business outcomes. Revenue generated, cost reduced, time compressed, error rate decreased, churn prevented. Not "better customer experience." What does "better" mean, numerically?
3. Attributable. You must be able to determine whether a change in the metric resulted from your AI initiative or from something else. This matters more than teams expect.
4. Movable by the intervention being considered. Some metrics are important but sit too far downstream for a single AI initiative to move. Know the causal chain before committing budget.
The cart abandonment example was a failure of characteristic four. The metric was real. The initiative was coherent. But the intervention couldn't move it, because nobody traced the causal chain before committing.
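These four characteristics can be written down as a lightweight gate that a metric definition has to pass before budget is committed. A minimal sketch in Python; the field names, the checks, and the example are illustrative, not part of a formal framework:

```python
from dataclasses import dataclass

@dataclass
class MetricDefinition:
    name: str                # specific and measurable (characteristic 1)
    baseline: float
    target: float
    business_outcome: str    # revenue, cost, time, quality, or risk (characteristic 2)
    attribution_plan: str    # how you will isolate the initiative's effect (characteristic 3)
    causal_chain: list       # intervention -> ... -> metric (characteristic 4)

    def is_committable(self) -> bool:
        """All four characteristics must hold before budget is committed."""
        return (self.target != self.baseline          # a concrete, numeric change
                and bool(self.business_outcome)
                and bool(self.attribution_plan)
                and len(self.causal_chain) >= 2)      # the chain was actually traced

support_time = MetricDefinition(
    name="avg support resolution time (minutes)",
    baseline=11.0,
    target=4.0,
    business_outcome="cost",
    attribution_plan="A/B test on ticket routing",
    causal_chain=["AI triage", "faster routing", "lower resolution time"],
)
```

A definition that fails the gate, like "improve operational efficiency" with no baseline, no attribution plan, and no traced chain, never reaches scoring.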
Why Organizations Fail: They Start With Implementation
With a clear goal and a measurable metric established, the next question follows: how do we get there? What AI gets built? Where in the business does it land?
Most organizations skip this. They jump from "we want to improve X" directly to "we'll build this specific solution, using this specific AI technology, to improve X."
The problem isn't the quality of the team. It's the sequence of the motions.
Product companies train their teams to ship features through a known planning motion: identify user needs, scope the solution, add it to the roadmap, allocate engineering, ship, measure. This workflow is optimized for building things that solve already-validated problems.
AI features aren't roadmap decisions. They're business design decisions. The question isn't "how do we build this?" but "what metric do we want to move, how would we know we've moved it, and what solution category could plausibly do it?"
Reverse-engineering from a business metric to a feature category isn't a muscle most product teams have developed. So they substitute the motion they know, scope and ship, and start with the solution. Metrics get defined to fit the feature, not the other way around.
The data supports this pattern. S&P Global reported that 42% of companies abandoned most of their AI initiatives in 2025, up from 17% the year before. That spike isn't explained by AI technology getting harder. It's explained by organizations scaling the volume of initiatives without scaling the rigor with which initiatives get selected and scoped.
The Problem You Don't Talk About: Non-Determinism
Here sits the structural tension at every AI initiative's center, almost never explicitly named in early conversations.
You are trying to move deterministic business metrics with non-deterministic instruments.
A product owner commits to shipping an AI feature in eight weeks. The sprint is planned. The engineer has a working AI solution. Week eight arrives, and the solution runs. It is also producing outputs that are sometimes wrong, sometimes inconsistent across runs, and occasionally surprising in ways nobody anticipated.
The PO assumed "working" meant what it means for software. It doesn't.
When you write a feature that calculates a discount, the calculation is deterministic. The same inputs yield the same output, every time, indefinitely. When you build an AI feature that does analogous work, such as recommending a discount, suggesting a response, or generating a summary, the output is probabilistic. The model is right most of the time. Not all of the time. And not identically each time.
This isn't a bug. It is fundamental to how AI systems work. But it has direct consequences for how you plan features, what you commit to in a sprint, and what "done" actually means.
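The difference can be made concrete in a few lines of Python. A toy sketch: `model_discount` simulates model variance with random noise and merely stands in for a real model:

```python
import random

def discount(subtotal: float) -> float:
    """Deterministic business rule: same input, same output, forever."""
    return round(subtotal * 0.10, 2) if subtotal >= 100 else 0.0

def model_discount(subtotal: float, temperature: float = 0.7) -> float:
    """Stand-in for a probabilistic model: the output varies across runs."""
    base = 0.10 if subtotal >= 100 else 0.0
    noise = random.gauss(0.0, 0.02 * temperature)       # simulated model variance
    return round(subtotal * max(0.0, base + noise), 2)

assert discount(120.0) == discount(120.0)  # holds every run, by construction
# model_discount(120.0) may return a different value on each run:
# you plan for a distribution of outcomes, not a single value
```

The planning consequence follows directly: a deterministic feature is done when it passes its tests; a probabilistic feature is done when its distribution of outputs is acceptable, which is a different kind of commitment.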
You cannot plan accuracy improvements the way you plan feature sprints.
In The Real Reason 90% of AI Projects Fail, I described this as the Language of Uncertainty — one of five languages AI demands that software projects never had to speak. The most important sentence in it: "You cannot plan accuracy improvements the way you plan feature sprints." Most product teams learn this the hard way.
The same failure plays out externally, with higher stakes. We built a document processing pipeline for a B2B platform: 50-page reports in, structured extraction out, reliably factual. The system performed well. Then an executive ran the pipeline on the same document twice, noticed the phrasing differed between runs, and called the auditor at midnight. By morning, the system was under review. The next week was remediation: walking through the architecture and explaining that the variance was in the language, not the facts. The structured output layer stayed stable; language models do not reproduce phrasing identically, by design.
The system performed exactly as built. But the expectation was never communicated upfront, and in a compliance context, unexpected behavior equals wrong behavior. A midnight call to the auditor is an expensive way to learn to set expectations beforehand. Named upfront, non-determinism is a design constraint. Discovered in a sprint review, it is a rollback.
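One way to make that expectation explicit and testable: a release check that compares two pipeline runs on the structured fields only, and deliberately excludes free-text fields from the equality check. A minimal sketch with hypothetical field names:

```python
def same_facts(run_a: dict, run_b: dict, fact_fields) -> bool:
    """Two runs agree if every structured field matches.
    Free-text fields (summaries, narratives) are expected to vary."""
    return all(run_a.get(f) == run_b.get(f) for f in fact_fields)

# Two runs of the same document through the pipeline:
run_1 = {"total": 1204500, "period": "Q3", "summary": "Revenue rose sharply in Q3."}
run_2 = {"total": 1204500, "period": "Q3", "summary": "Q3 saw a sharp rise in revenue."}

assert same_facts(run_1, run_2, ["total", "period"])  # the facts are stable
assert run_1["summary"] != run_2["summary"]           # the phrasing varies, by design
```

Wiring a check like this into the release process turns "the phrasing may differ" from a surprise into a documented, tested property of the system.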
Where in Your Product to Apply AI First
With a business goal and an agreed metric established, the next question is: which part of the product (which user flow, revenue stream, or operational process) benefits most from AI investment first?
Not every product area moves the target metric equally. Not every area has usable data. Skipping this step means allocating engineering time based on who lobbied hardest in the roadmap session, not on where the impact potential is highest.
Before scoring anything, identify which type of gain the AI investment would primarily deliver. Each type has different owners, different time horizons, and a different connection to the target metric.
Four Gain Types
- New business opportunity. AI unlocks something that didn't exist before: new revenue streams, customer segments that are now viable, feature tiers that can be monetized. These gains lean growth-oriented and have longer payback windows. Relevant when the target is revenue or market expansion.
- Streamlined product. AI reduces friction in existing product flows: faster onboarding, fewer steps to reach value, reduced conversion drop-off. The product does what it already does, better and faster. Relevant when the target is activation, retention, or conversion.
- Improved core service. AI deepens the primary value you deliver: better recommendations, more accurate outputs, more personalized responses. The core proposition gets stronger. Relevant when the target is engagement, NPS, or churn reduction.
- Improved internal operations. AI improves the operations that power product delivery: data processing, content moderation, annotation pipelines, quality review workflows. Users don't see it directly, but it reduces cost or improves delivery consistency. Relevant when the target is margin, operational cost, or delivery velocity.
Pre-Scoring Conversations Required
If you are the CEO running this exercise, two conversations must happen before scoring. The CPO should identify which product areas are currently prioritized in the planning horizon; that input determines how heavily "streamlined product" and "core service" are weighted against the others. The CFO should identify which gain types have a direct path to the P&L right now; that input prevents scoring "new business opportunity" highly when the board's metric is cost reduction, not growth. Run the exercise without these inputs and the matrix reflects the energy in the room, not the logic of the business.
Assessment Dimensions
Prosperaize runs this as a scored working session with the founding or leadership team. Each product area gets assessed on three dimensions:
- Gain type and magnitude. Which of the four gain types does AI primarily deliver here, and what is the realistic business impact if it is productionized successfully? Score 1-5.
- Weighted contribution to the target metric. How directly does this gain type connect to moving the agreed metric? A high-potential new business opportunity scores low when the metric is operational cost, not revenue. The weighting enforces discipline.
- Data availability. Not "do we have data," but "is the data structured, accessible, and shaped for use by an AI system?" Clickstream data and labeled training examples are not the same thing. Score this separately, because it determines feasibility, not just desirability.
The combination produces a prioritized shortlist grounded in business impact and implementation feasibility, not product roadmap politics.
The value of the session isn't the scores. It's the disagreements the scores surface. When the CPO scores a product area's data availability as a 4 and the ML engineer scores it as a 2, that gap is the insight. The CPO sees user events flowing into the analytics platform. The ML engineer sees that none of them are labeled for the task at hand. Both are right, about different things. Gaps surfaced here cost nothing. Discovered mid-sprint, they cost quarters.
The resulting quadrants:
|  | High Data Availability | Low Data Availability |
|---|---|---|
| High AI Value | Start here. These are your first AI investments. | Data infrastructure first. High strategic value, but requires foundational work before AI. |
| Low AI Value | Quick wins later. Low complexity, moderate return. Don't lead with these. | Avoid. Neither the return nor the feasibility justifies the investment. |
One pattern I've seen consistently: engineers overestimate the usability of event data ("we track everything"), and product leaders overestimate how quickly raw behavioral data translates into trainable signals. Surfacing that gap in this session costs an afternoon. Discovering it in week six costs a quarter.
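The scoring itself reduces to a few lines. A minimal sketch of how the three dimensions might combine into the quadrants above; the cutoffs, weights, and area names are illustrative, not the actual framework:

```python
def ai_value(gain: int, metric_weight: float) -> float:
    """Gain magnitude (1-5) discounted by how directly it moves the agreed metric (0-1)."""
    return gain * metric_weight

def quadrant(value: float, data_score: int,
             value_cut: float = 3.0, data_cut: int = 3) -> str:
    """Place a product area into the 2x2 above. Cutoffs are illustrative."""
    if value >= value_cut:
        return "start here" if data_score >= data_cut else "data infrastructure first"
    return "quick wins later" if data_score >= data_cut else "avoid"

# (gain 1-5, metric weight 0-1, data availability 1-5) per product area
areas = {
    "checkout flow":        (5, 0.9, 4),
    "new marketplace tier": (4, 0.2, 1),
}
for name, (gain, weight, data_score) in areas.items():
    print(name, "->", quadrant(ai_value(gain, weight), data_score))
```

The weighting step is where the discipline lives: a high-gain area with a weak connection to the agreed metric scores low no matter how exciting it sounds in the room.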
From Gain Type to First Use Case
You've identified the product area and the type of gain it delivers. What remains are the two decisions that determine whether your AI initiative survives contact with production.
The first: translating your gain type into a set of candidate AI patterns — concrete approaches that experienced teams have validated for this category of problem. This is where the exercise stops being purely strategic and becomes technical and product-based. It requires a structured working session with product, design, engineering, AI, and delivery in the room.
The second: deciding which of those candidates to build first. Not every high-impact use case is worth pursuing first, and not every low-complexity use case is worth pursuing at all. The instinct is to pursue everything that looks promising. The result is predictable: too many AI use cases scoped in parallel, complexity underestimated across all of them, and a portfolio of incomplete prototypes that never reach production.
Good news: we've built frameworks to guide both decisions, a gain-type-to-pattern mapping and a gains vs. risk/complexity scoring matrix, and made them available as a downloadable toolkit.
From a Business Goal to One Defensible AI Use Case
Two structured frameworks that take your team from "what kind of AI could work here" to "which one do we build first." Fifteen validated AI patterns organized by gain type. A scoring matrix that produces a single starting point the entire team can defend. And a process designed to surface the assumption mismatches that cost an afternoon to resolve here and a quarter to discover in a sprint.
Built for leadership teams at product companies who have a business goal and a metric — and need a structured path to their first AI investment decision.
The Experience Multiplier
The quality of the output is directly proportional to the breadth of AI experience in the room.
Organizations that have attempted AI before, even unsuccessfully, generate significantly better proposals than organizations starting fresh. They have seen what doesn't survive the POC stage. They have seen what breaks in production despite looking fine in a prototype. They know which solution categories consistently underestimate integration complexity. That pattern recognition changes the proposals.
The constraint for organizations starting fresh isn't ambition. It's the ability to imagine what's possible.
You cannot propose what you cannot imagine. Teams that have never seen a recommendation model degrade silently in production because the training data distribution shifted don't scope monitoring; they discover the problem six months later when conversions drop. Teams that have never seen a fine-tuned classifier fail on edge cases don't build escalation paths. Teams that have never seen AI agents execute complex multi-step workflows across tools don't include them in the opportunity map, not because they're infeasible, but because the possibility sits outside their frame of reference.
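The monitoring those teams fail to scope can start as small as a drift check on input distributions. A minimal sketch using the Population Stability Index; the distributions and the 0.25 threshold are illustrative (0.25 is a common rule of thumb, not a universal constant):

```python
import math

def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two binned distributions.
    Inputs are bin proportions that each sum to 1.
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 significant."""
    return sum((a - e) * math.log((a + eps) / (e + eps))
               for e, a in zip(expected, actual))

train_dist = [0.25, 0.25, 0.25, 0.25]   # a feature's distribution at training time
live_dist  = [0.05, 0.15, 0.30, 0.50]   # the same feature, observed in production

if psi(train_dist, live_dist) > 0.25:
    print("significant drift: investigate before conversions drop")
```

A scheduled job computing this per feature, plus an alert threshold, is often the difference between catching drift in a dashboard and discovering it in a revenue report.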
The trap isn't always inexperience. Sometimes depth without breadth produces a different failure mode.
We worked with an AI engineer who had five years of experience concentrated in two domains: computer vision and graph neural networks. When a public transport client asked for new route recommendations, the engineer proposed adapting graph neural networks that model route structures into a recommender system. Five points for creativity: routes are graph structures, and the logic wasn't wrong. Zero points for risk awareness. GNNs for this use case would have been expensive, extremely risky, difficult to explain to transit authorities, and disconnected from the actual business question, which wasn't "which routes are topologically interesting?" but "which routes generate enough ROI to justify their running costs, given driver availability?"
In our internal alignment session, we reframed the problem. ML models predicted route ROI against operating costs and driver availability — something the client could act on directly. On top of those, an LLM layer drafted weekly and monthly performance reports from the model outputs, removing a manual reporting task the operations team was spending significant time on each cycle.
The engineer had the experience. What she lacked was exposure to enough different problem types to recognize when her strongest tool wasn't the right one for the business question. That gap in breadth is exactly what external pattern recognition is designed to cover.
This is why external partners change the quality of the process. Prosperaize's value here isn't facilitation; it's bringing pattern recognition from multiple initiatives, industries, and failure modes into the scoping conversation. The proposals that emerge are shaped by what has been validated to work across situations, not just by what sounds technically plausible.
The breadth of a team's AI experience at the moment strategy is formulated is a leading indicator of whether that strategy will survive contact with implementation.
This process ultimately enforces a shift from building what sounds valuable to validating what actually is. It replaces momentum with intent.
Finally, ask yourselves whether you are addressing the hard questions sitting just beneath the surface. Where in your current thinking are you assuming clarity you haven't tested? Which parts of the problem do you understand through data, and which through intuition or prior experience? Are you committing to targets before understanding what is realistically achievable, and who carries the risk when reality doesn't match the plan?
At this stage, discipline matters more than speed. Once execution begins, the system simply exposes the quality of the decisions made upfront. By then, the cost of the questions left unanswered earlier is no longer theoretical.
If you want an external perspective on this process — someone who has run it across multiple industries and failure modes — that's what Prosperaize's Prosperity Audit is.
Dušan Stamenković is the founder of Prosperaize, an AI Asset Management Consultancy. He advises organizations on whether, where, and how to invest in AI — reducing risk and maximizing return across the AI investment lifecycle.
The question isn't whether your organization needs AI.
The question is whether anyone in the room can speak all the languages it demands, and what happens to your investment when they can't.
If you want to know where your AI investment is actually exposed, let's talk.

