Understanding transformers’ limitations reveals the crucial shift needed from correlation to causation for true AI advancement.
Key Takeaways
- Transformers primarily learn correlation, not causation, which limits how far they can go toward true intelligence.
- Achieving AGI requires models that move beyond learned correlations to genuine causal understanding.
- Large language models generate text by predicting the next token from a probability distribution.
- The context supplied in a prompt strongly shapes a model’s output, which makes prompt selection critical.
- Language models operate over a sparse matrix of token sequences: most combinations are nonsensical and never occur, and that sparsity is what keeps the problem tractable.
- In-context learning lets LLMs solve new problems at inference time from a handful of examples.
- In-context learning behaves like Bayesian updating, with each example acting as new evidence that adjusts the model’s probabilities.
- Domain-specific languages (DSLs) let natural-language questions be translated into precise, executable database queries.
- The long-running debate between Bayesian and frequentist statistics colors how new machine learning results are received.
- The “Bayesian wind tunnel” offers a controlled environment for testing and comparing machine learning architectures.
Guest intro
Vishal Misra is Professor of Computer Science and Electrical Engineering and Vice Dean of Computing and AI at Columbia University’s School of Engineering. He returns to the a16z Podcast to discuss his latest research revealing how transformers in LLMs update predictions in a precise, mathematically predictable manner as they process new information. His work highlights the gap to AGI, emphasizing the need for continuous post-training learning and causal understanding over pattern matching.
Understanding transformers and LLMs
- LLMs primarily learn correlations rather than causation, which limits their intelligence.
- Achieving AGI requires models that can learn causal structure, not just correlations.
- LLMs generate text by constructing a probability distribution over the next token and drawing from it.
- Understanding these mechanics is essential for applying LLMs effectively.
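The generation step described above can be sketched concretely. This is a minimal illustration, not the episode's code: the vocabulary and probabilities are invented, and a real model produces a distribution over tens of thousands of tokens.

```python
import random

# Toy next-token distribution a model might produce after the prompt
# "The cat sat on the". Tokens and probabilities are invented.
next_token_probs = {
    "mat": 0.62,
    "floor": 0.21,
    "roof": 0.09,
    "keyboard": 0.08,
}

def greedy_decode(probs):
    """Pick the single most likely next token (temperature 0)."""
    return max(probs, key=probs.get)

def sample_decode(probs, rng):
    """Sample a next token in proportion to its probability."""
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

print(greedy_decode(next_token_probs))                    # always "mat"
print(sample_decode(next_token_probs, random.Random(0)))  # seed-dependent
```

Greedy decoding always emits the mode of the distribution; sampling makes output vary run to run, which is why the same prompt can yield different completions.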
The role of context in language models
- A language model’s behavior is conditioned on the prior context supplied in the prompt, so prompt selection matters greatly.
- The models operate over a sparse matrix of token sequences: the vast majority of combinations are nonsensical and never occur.
- That sparsity is what makes generation tractable, because probability mass concentrates on the few continuations that make sense.
- Changing the context can drastically change the distribution the model predicts from, and therefore its output.
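The sparsity point can be made concrete with a toy bigram model. The corpus below is invented, but the effect is the one described above: most of the token-by-token matrix is empty, and the same word induces different next-token distributions in different contexts.

```python
from collections import Counter, defaultdict

# Tiny corpus; a real model trains on trillions of tokens, but the
# sparsity effect is visible even at this scale.
corpus = "the cat sat on the mat . the dog sat on the floor .".split()

bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

vocab = sorted(set(corpus))
possible = len(vocab) ** 2                              # every word pair
observed = sum(len(c) for c in bigram_counts.values())  # pairs that occur
print(f"{observed} of {possible} possible bigrams ever occur")

# The same conditioning token yields very different distributions:
print(dict(bigram_counts["the"]))  # four plausible continuations
print(dict(bigram_counts["sat"]))  # only one continuation: "on"
```

Even with an 8-word vocabulary, only 11 of 64 possible bigrams appear; at LLM scale the fraction of meaningful sequences is vastly smaller still.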
In-context learning and real-time problem solving
- In-context learning lets LLMs pick up and solve new problems at inference time, without any weight updates.
- The model absorbs new information from the examples given in the prompt.
- This process resembles Bayesian updating: each example is evidence that shifts the model’s predicted probabilities.
- The ability to adapt from a few examples is central to understanding what LLMs can and cannot do.
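A minimal sketch of in-context learning as Bayesian updating, with an invented hypothesis space of three candidate tasks: the "model" starts from a uniform prior and renormalizes after each prompt example. One ambiguous example leaves several hypotheses alive; a second resolves the ambiguity.

```python
# Candidate tasks the "model" might believe the prompt is demonstrating.
# The hypothesis space and examples are invented for illustration.
tasks = {
    "double": lambda x: 2 * x,
    "square": lambda x: x * x,
    "add_two": lambda x: x + 2,
}

def likelihood(task, example):
    """P(example | task): 1 if the task explains the input/output pair."""
    x, y = example
    return 1.0 if task(x) == y else 0.0

def update(posterior, example):
    """One step of Bayes' rule: weight by likelihood, renormalize."""
    unnorm = {name: p * likelihood(tasks[name], example)
              for name, p in posterior.items()}
    total = sum(unnorm.values())
    return {name: p / total for name, p in unnorm.items()}

posterior = {name: 1 / len(tasks) for name in tasks}  # uniform prior
# (2, 4) is consistent with all three tasks; (3, 6) only with "double".
for example in [(2, 4), (3, 6)]:
    posterior = update(posterior, example)
print(posterior)  # all probability mass ends on "double"
```

After the first example the posterior is still uniform; the second example collapses it onto a single task, mirroring how a few-shot prompt progressively pins down what the model should do.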
Domain-specific languages and data accessibility
- Domain-specific languages (DSLs) provide a constrained, processable target into which natural-language questions can be translated.
- This lets users query complex databases in plain language, with the DSL acting as a reliable intermediate representation.
- Querying large, messy databases is genuinely hard; a well-designed DSL narrows the problem to something an LLM can handle dependably.
- The approach is a concrete example of using AI to make data more accessible.
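As an illustration only (the grammar, table name, and columns below are invented, and the episode's actual DSL is not reproduced here), a toy DSL might map a constrained English query to SQL like this:

```python
import re

# A toy DSL: "show <column> where <column> is <value>" compiled to SQL.
# In practice an LLM would translate free-form questions into such a
# constrained language, which is then validated and executed.
PATTERN = re.compile(
    r"show (?P<select>\w+) where (?P<col>\w+) is (?P<val>\w+)"
)

def to_sql(query, table="customers"):
    """Compile a DSL query to SQL, rejecting anything outside the grammar."""
    m = PATTERN.fullmatch(query.lower().strip())
    if m is None:
        raise ValueError(f"query not in the DSL: {query!r}")
    return (f"SELECT {m['select']} FROM {table} "
            f"WHERE {m['col']} = '{m['val']}'")

print(to_sql("show email where country is france"))
# SELECT email FROM customers WHERE country = 'france'
```

The key design point is that the DSL is checkable: a query either parses or it is rejected, so the natural-language front end cannot silently produce malformed SQL.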
Bayesian updating and statistical approaches in AI
- In-context learning in language models can be described precisely as Bayesian updating.
- Understanding Bayesian inference therefore helps explain how LLMs incorporate information from a prompt.
- The long-standing split between Bayesian and frequentist statistics shapes how such results are perceived and received.
- Framing in-context learning as Bayesian updating gives it a clear mechanism, linking a well-established methodology to modern AI.
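The contrast between the two schools shows up already in a three-flip coin experiment: the frequentist maximum-likelihood estimate and the Bayesian posterior mean under a uniform prior read the same data differently.

```python
# Three heads in three flips of a coin of unknown bias.
heads, flips = 3, 3

# Frequentist maximum-likelihood estimate: the observed frequency.
mle = heads / flips  # 1.0 -- concludes the coin always lands heads

# Bayesian posterior mean under a uniform Beta(1, 1) prior:
# (heads + 1) / (flips + 2), Laplace's rule of succession.
posterior_mean = (heads + 1) / (flips + 2)  # 0.8 -- hedged by the prior

print(mle, posterior_mean)
```

The Bayesian estimate never commits to certainty from finite data, which is exactly the behavior the in-context-learning analogy attributes to LLMs as they accumulate prompt evidence.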
The Bayesian wind tunnel and model testing
- The “Bayesian wind tunnel” is a controlled environment for testing machine learning architectures, by analogy with the wind tunnels used in aerospace.
- Because the data-generating process is fully known, a model’s predictions can be compared against exact ground truth.
- The framework can evaluate transformers, Mamba, LSTMs, and MLPs on an equal footing.
- Controlled testing of this kind makes model assessments more reliable and offers a practical route to improving architectures.
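A minimal sketch of the wind-tunnel idea, under the assumption that the data-generating process is a Bernoulli coin with a uniform prior (the actual paper's setup is richer): because the exact Bayesian next-symbol probability is computable in closed form, any candidate learner can be scored against ground truth.

```python
def bayes_predictive(seq):
    """Exact P(next = 1 | seq) under a Beta(1, 1) prior: Laplace's rule."""
    return (sum(seq) + 1) / (len(seq) + 2)

def naive_learner(seq):
    """A stand-in 'architecture' that just uses the raw frequency."""
    return sum(seq) / len(seq) if seq else 0.5

# In a real wind tunnel, `naive_learner` would be a trained transformer,
# Mamba, LSTM, or MLP, and its predictions would be compared to the
# exact posterior predictive across many sampled sequences.
seq = [1, 1, 0, 1]
truth = bayes_predictive(seq)  # (3 + 1) / (4 + 2) = 2/3
guess = naive_learner(seq)     # 3/4
print(f"exact Bayesian: {truth:.3f}, learner: {guess:.3f}, "
      f"error: {abs(truth - guess):.3f}")
```

The point of the controlled setup is that "error" here is measured against the provably optimal predictor, not against held-out data, so architectures can be compared on how closely they approximate exact Bayesian inference.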
© Decentral Media and Crypto Briefing® 2026.
Source: https://cryptobriefing.com/vishal-misra-transformers-learn-correlations-not-causations-the-significance-of-in-context-learning-and-the-role-of-bayesian-updating-in-ai-ai-a16z/