I tested 112 Product Hunt startups with 2,240 queries across ChatGPT and Perplexity. The results challenge conventional wisdom about "Generative Engine Optimization" (GEO):
Full paper: arXiv:2601.00912 · Code & Data: GitHub
Every startup founder is asking the same question: "How do I get ChatGPT to recommend my product?"
This is a reasonable concern. As ChatGPT becomes a go-to tool for product discovery, being invisible to it means being invisible to a growing segment of potential customers.
The emerging field of Generative Engine Optimization (GEO) promises to solve this. The concept, introduced by researchers at IIT Delhi, suggests that optimizing content with citations, statistics, and authoritative language can improve visibility in AI-generated responses.
But does it actually work for ChatGPT?
I decided to find out.
I collected data on 112 products from Product Hunt's December 2024 - January 2025 leaderboard. These represent the "best case" for new startups: recently launched, actively marketed, and getting meaningful traction.
For each product, I gathered:

- Product Hunt upvotes
- Reddit mention counts
- Referring domains (backlinks)
- Landing-page content, for GEO scoring
Each product was tested with 20 queries:
Direct Queries (3 per product)
- "What is [ProductName]?"
- "Tell me about [ProductName]"
- "Have you heard of [ProductName]?"
Discovery Queries (7 per product)
- "What are the best [Category] tools launched in 2025?"
- "Recommend some new [Category] products"
- "What [Category] startups should I check out?"
- "I'm looking for a [Category] solution. What are my options?"
The distinction matters. Direct queries test recognition—does ChatGPT know your product exists? Discovery queries test recommendation—will it actually suggest your product to users?
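The two query classes can be generated mechanically from templates. A minimal sketch (the template lists here are abridged from the full 20-query set, and the function name is mine):

```python
# Abridged template lists; the study used 20 queries per product.
DIRECT_TEMPLATES = [
    "What is {name}?",
    "Tell me about {name}",
    "Have you heard of {name}?",
]

DISCOVERY_TEMPLATES = [
    "What are the best {category} tools launched in 2025?",
    "Recommend some new {category} products",
    "What {category} startups should I check out?",
    "I'm looking for a {category} solution. What are my options?",
]

def build_queries(name: str, category: str) -> dict[str, list[str]]:
    """Expand templates into labeled direct and discovery queries."""
    return {
        "direct": [t.format(name=name) for t in DIRECT_TEMPLATES],
        "discovery": [t.format(category=category) for t in DISCOVERY_TEMPLATES],
    }
```

Note that discovery templates never contain the product name, only its category, so any mention of the product in the response is organic.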
| Metric | ChatGPT | Perplexity |
|----|----|----|
| Direct Recognition | 99.4% | 94.3% |
| Organic Discovery | 3.3% | 8.3% |
| Visibility Gap | 30:1 | 11:1 |
ChatGPT knows almost every startup exists. When asked directly, it can provide accurate descriptions, features, and use cases.
But when users ask for recommendations—the queries that actually drive customer acquisition—these same startups almost never appear.
This is the Discovery Gap: the massive divide between ChatGPT's knowledge and its recommendations.
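The gap itself is just the ratio of recognition rate to discovery rate. Plugging in the numbers from the table:

```python
def visibility_gap(recognition_pct: float, discovery_pct: float) -> float:
    """Ratio of direct recognition to organic discovery."""
    return recognition_pct / discovery_pct

# ChatGPT: 99.4% recognition vs 3.3% discovery -> ~30:1
chatgpt_gap = visibility_gap(99.4, 3.3)
# Perplexity: 94.3% vs 8.3% -> ~11:1
perplexity_gap = visibility_gap(94.3, 8.3)
```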
To measure GEO optimization, I adapted the scoring methodology from Aggarwal et al. (2024) — the IIT Delhi researchers who introduced the concept of Generative Engine Optimization in their seminal paper "GEO: Generative Engine Optimization".
Their framework measures optimization across multiple dimensions:
Using this established GEO scoring framework, I calculated scores for each product and compared them to ChatGPT discovery rates.
The correlation?
r = -0.10 (not statistically significant)
Products with high GEO scores were no more likely to be recommended by ChatGPT than products with low scores. The fancy optimization tactics that the GEO literature promotes showed zero measurable impact.
If GEO doesn't work, what does?
| Predictor | Correlation | p-value |
|----|----|----|
| Reddit Mentions | +0.40 | <0.01 |
| Referring Domains | +0.32 | <0.001 |
| Product Hunt Upvotes | +0.23 | <0.05 |
| GEO Score | -0.10 | n.s. |
The strongest predictors are the same factors that have driven SEO for decades:

- Community word-of-mouth (Reddit mentions)
- Backlinks (referring domains)
- Launch traction (Product Hunt upvotes)
Perplexity, with its real-time web search, achieved 2.5x better discovery rates than ChatGPT (8.3% vs 3.3%).
This suggests that web access meaningfully improves an LLM's ability to surface new products. ChatGPT, limited to its training data, struggles more with recent launches.
Based on my analysis, I have three hypotheses:
ChatGPT is trained on web data up to a knowledge cutoff. For products launched after that cutoff, no amount of GEO optimization will help—the content simply isn't in the training set.
GEO techniques optimize the content of your pages. But ChatGPT appears to weight external signals (backlinks, mentions, authority) more heavily when deciding what to recommend.
A perfectly optimized landing page with zero backlinks may still be invisible.
ChatGPT's knowledge retrieval and its recommendation behavior appear to be distinct systems. Knowing about a product doesn't mean recommending it, and the 30:1 gap makes that distinction concrete.
If you're a startup founder thinking about ChatGPT visibility, here's my takeaway:
The GEO hype cycle is in full swing, but my data suggests it doesn't work for new products targeting ChatGPT. Save your optimization energy for proven strategies.
Backlinks and referring domains showed the strongest correlation with ChatGPT discovery. The boring work of building genuine web presence still matters.
Reddit mentions were the single strongest predictor (r = +0.40). Authentic community engagement drives AI visibility.
If you want to measure LLM visibility, track organic discovery queries—not just direct recognition. The 30:1 gap between them is where the opportunity lies.
LLM capabilities are evolving rapidly. GEO may become relevant as models improve. But for now, the fundamentals win.
This study has limitations:
I've open-sourced all code and data for replication. If you run this experiment and find different results, I'd love to hear about it.


