Authors: Tina, Dongmei, InfoQ
The X Engineering team has just announced on X that it has officially open-sourced the X recommendation algorithm. According to the announcement, the open-sourced repository contains the core recommendation system that powers the "For You" feed on X. It combines in-network content (from accounts a user follows) with out-of-network content (discovered through machine-learning-based retrieval) and uses a Grok-based Transformer model to rank all content. In other words, the algorithm uses the same Transformer architecture as Grok.
Open-source announcement: https://x.com/XEng/status/2013471689087086804
X's recommendation algorithm is responsible for generating the "For You" feed that users see on the main interface. It obtains candidate posts from two main sources:
The accounts you follow (In-Network / Thunder)
Other posts found on the platform (Out-of-Network / Phoenix)
These candidate entries are then processed, filtered, and sorted by relevance; a minimal sketch of the two-source retrieval follows.
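To make the two-source structure concrete, here is a minimal Python sketch. The codenames Thunder and Phoenix come from the release, but the function names, data shapes, and stubbed bodies below are illustrative assumptions, not X's actual code:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    post_id: str
    source: str  # "in_network" (Thunder) or "out_of_network" (Phoenix)

def fetch_in_network(user_id: str) -> list[Candidate]:
    # Posts from accounts the user follows (Thunder); stubbed with dummy data.
    return [Candidate("post_a", "in_network")]

def fetch_out_of_network(user_id: str) -> list[Candidate]:
    # Posts discovered via ML-based retrieval (Phoenix); stubbed with dummy data.
    return [Candidate("post_b", "out_of_network")]

def gather_candidates(user_id: str) -> list[Candidate]:
    # Merge both pools; later stages filter and rank the combined set.
    return fetch_in_network(user_id) + fetch_out_of_network(user_id)
```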
So, what is the core architecture and operating logic of the algorithm?
The algorithm first extracts candidate content from two types of sources:
Content within your following list: Posts from accounts you actively follow.
Content outside your following list: Posts that the system retrieves from across the platform that you might be interested in.
The goal of this stage is to "find potentially relevant posts".
The system automatically removes low-quality, duplicate, illegal, or inappropriate content. For example:
Content from blocked accounts
Topics the user has explicitly marked as not interested in
Illegal, outdated, or invalid posts
This ensures that only valuable candidates reach the final ranking stage; a schematic filter chain is sketched below.
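A schematic version of such a filter chain might look like the following. The field names and the staleness cutoff are assumptions made for illustration; the actual rules live in the released code:

```python
def apply_filters(candidates: list[dict], user: dict) -> list[dict]:
    # Keep only candidates that pass every rule. Field names and the staleness
    # cutoff are illustrative assumptions, not identifiers from the release.
    def passes(post: dict) -> bool:
        if post["author_id"] in user["blocked_ids"]:        # blocked accounts
            return False
        if post["topic"] in user["not_interested_topics"]:  # explicit "not interested"
            return False
        if post["deleted"] or post["age_days"] > 30:        # outdated/invalid posts
            return False
        return True
    return [p for p in candidates if passes(p)]
```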
The core of this open-source algorithm is a Grok-based Transformer model (akin to a large language model) that the system uses to score each candidate post. Based on the user's historical behavior, the model predicts the probability of each engagement action (like, reply, share, click, etc.). These action probabilities are then weighted and combined into a single score; posts with higher scores are more likely to be recommended to the user.
This design essentially replaces traditional hand-crafted feature extraction with an end-to-end learning approach to predicting user interests.
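As a minimal sketch of the weighted combination described above: the release confirms the weighting scheme exists, but (as noted later in this article) the actual values were removed, so the action names and weights here are purely illustrative.

```python
# Hypothetical per-action weights; the real values are not in the release.
ACTION_WEIGHTS = {"like": 1.0, "reply": 10.0, "share": 5.0, "click": 0.5}

def combined_score(action_probs: dict[str, float]) -> float:
    # The Transformer predicts a probability per action; the final ranking
    # score is their weighted sum.
    return sum(ACTION_WEIGHTS[action] * p
               for action, p in action_probs.items()
               if action in ACTION_WEIGHTS)

# Example: probabilities predicted for one candidate post.
print(combined_score({"like": 0.12, "reply": 0.01, "share": 0.03, "click": 0.40}))
# -> 0.57  (0.12*1.0 + 0.01*10.0 + 0.03*5.0 + 0.40*0.5)
```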
This is not the first time Musk has open-sourced the X recommendation algorithm.
Back on March 31, 2023, making good on a promise from his acquisition of Twitter, Musk officially open-sourced a portion of Twitter's source code, including the algorithm that recommends tweets in users' timelines. On the day of release, the project garnered over 10,000 stars on GitHub.
At the time, Musk stated on Twitter that this release covered "most of the recommendation algorithm," with the remaining algorithms to be released gradually. He also mentioned his hope that "independent third parties could determine with reasonable accuracy what Twitter might show to users."
In a Space discussion about the algorithm release, he said the open-source project aims to make Twitter "the most transparent system on the internet" and as robust as Linux, the most well-known and successful open-source project. "The overall goal is to ensure that users who continue to support Twitter can enjoy it to the fullest extent."
Nearly three years have passed since Musk first open-sourced the X algorithm. As one of the tech world's most influential voices, Musk had already generated plenty of publicity for this release.
On January 11, Musk posted on X that he would open-source the new X algorithm (including all the code used to determine which organic content and advertising content to recommend to users) within 7 days.
This process will be repeated every 4 weeks, with detailed developer notes to help users understand what changes have been made.
Today, his promise has been fulfilled once again.
When Elon Musk brings up "open source" again, the outside world's first reaction is not technological idealism but the pressure of reality.
Over the past year, X has repeatedly been embroiled in controversy over its content distribution mechanism. The platform has been widely criticized for algorithmic bias toward and promotion of right-wing viewpoints, a tendency seen not as a set of isolated incidents but as systemic. A research report published last year found that X's recommendation system exhibited significant bias in the dissemination of political content.
Meanwhile, some extreme cases have further amplified outside skepticism. Last year, an uncensored video of the assassination of American right-wing activist Charlie Kirk spread rapidly on X, causing a public outcry. Critics argued that this not only exposed the failure of the platform's moderation mechanisms but also highlighted the implicit power of algorithms in deciding "what to amplify and what not to amplify."
Against this backdrop, Musk's sudden emphasis on algorithmic transparency is difficult to interpret simply as a purely technical decision.
After the X recommendation algorithm was open-sourced, users on the platform distilled its mechanics into five points, which boil down to this: communicate with your audience, build relationships, and keep users engaged within the app. It's actually quite simple.
Some netizens also noticed that while the architecture is open source, some components remain closed. As one put it, this release is essentially "a framework without an engine." What exactly is missing?
Missing weight parameters - The code confirms a "positive behavior bonus" and a "negative behavior penalty," but unlike the 2023 version, the specific values have been removed.
Hidden model weights - The release does not include the model's internal parameters and computations.
Unpublished training data - We know nothing about the data used to train the model, how user behavior was sampled, or how "good" and "bad" samples were constructed.
For ordinary X users, the open-sourcing of the algorithm won't have much immediate impact. However, greater transparency can help explain why some posts gain exposure while others go unnoticed, and it lets researchers study how the platform ranks content.
In most technical discussions, recommender systems are often seen as part of the back-end engineering—low-key, complex, and rarely in the spotlight. However, a true analysis of how internet giants operate reveals that recommender systems are not peripheral modules, but rather "infrastructure-level entities" supporting the entire business model. This is why they can be called the "silent behemoths" of the internet industry.
Publicly available data has repeatedly confirmed this. Amazon has disclosed that approximately 35% of purchases on its platform come directly from its recommendation system; Netflix is even more striking, with about 80% of viewing time driven by recommendation algorithms; YouTube is similar, with about 70% of watch time coming from its recommendation system, especially its feed. As for Meta, while it has never given a specific percentage, its technical team has mentioned that about 80% of the computing cycles in its internal clusters are dedicated to serving recommendation-related tasks.
What do these numbers mean? Removing the recommendation system from these products is almost like tearing down the foundation. Take Meta, for example: ad placement, user dwell time, and conversion rates are all built on the recommendation system. The recommendation system not only determines "what users see," but also directly determines "how the platform makes money."
However, this very system, which determines life and death, has long faced the problem of extremely high engineering complexity.
In traditional recommender system architectures, it's difficult to use a single, unified model to cover all scenarios. Real-world production systems are often highly fragmented. For example, companies like Meta, LinkedIn, and Netflix typically run 30 or more specialized models simultaneously behind a complete recommender pipeline: recall models, coarse-ranking models, fine-ranking models, and re-ranking models, each optimized for different objective functions and business metrics. Behind each model, there are often one or more teams responsible for feature engineering, training, parameter tuning, deployment, and continuous iteration.
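Schematically, such a pipeline looks like the sketch below. Each function stands in for what is, in production, one or more separately owned models; all names and thresholds here are illustrative assumptions:

```python
def recall(user, corpus):
    # Cast a wide net: cheap retrieval pulls thousands of candidates.
    return corpus[:10_000]

def coarse_rank(user, candidates):
    # A lightweight model trims the pool to a few hundred.
    return candidates[:500]

def model_score(user, post) -> float:
    # Placeholder for a learned ranking model.
    return 0.0

def fine_rank(user, candidates):
    # An expensive model scores each remaining candidate precisely.
    return sorted(candidates, key=lambda post: model_score(user, post),
                  reverse=True)[:50]

def re_rank(user, candidates):
    # Business logic: diversity, freshness, ad slots, deduplication.
    return candidates

def pipeline(user, corpus):
    # Four stages, each traditionally a distinct model with its own objective,
    # features, and owning team -- hence the fragmentation described above.
    return re_rank(user, fine_rank(user, coarse_rank(user, recall(user, corpus))))
```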
The costs of this approach are obvious: engineering complexity, high maintenance overhead, and difficulty in cross-task collaboration. If a single model could solve multiple recommendation problems, it would mean an order-of-magnitude reduction in complexity for the entire system. This is precisely the goal the industry has long desired but struggled to achieve.
The emergence of large-scale language models has provided a new possible path for recommender systems.
LLMs have proven in practice to be extremely powerful general-purpose models: they transfer well across different tasks, and their performance continues to improve as data scale and computing power expand. In contrast, traditional recommendation models are often task-specific, making it difficult to share capabilities across scenarios.
More importantly, a single large model not only simplifies engineering but also offers the potential for "cross-learning." When the same model handles multiple recommendation tasks simultaneously, the signals from different tasks can complement each other, and the model can more easily evolve as the data scale grows. This is precisely the characteristic that recommendation systems have long desired but have struggled to achieve through traditional methods.
What did LLMs change? Essentially everything, from feature engineering to the ability to understand features.
From a methodological perspective, the biggest change that LLMs bring to recommender systems occurs in the core process of "feature engineering".
In traditional recommendation systems, engineers first need to manually construct a large number of signals: user click history, dwell time, similar user preferences, content tags, etc., and then explicitly tell the model "please make a judgment based on these features." The model itself does not understand the semantics of these signals; it only learns the mapping relationship in the numerical space.
With the introduction of language models, this process becomes highly abstracted. You no longer need to specify "look at this signal, ignore that signal" one by one; instead, you can describe the problem to the model directly: this is a user, this is a piece of content; the user has liked similar content in the past, and other users have also given this content positive feedback. Now determine whether this content should be recommended to this user.
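A sketch of that framing follows, with a hypothetical prompt template and a generic `llm` callable standing in for any chat-completion API. The article does not describe X's actual prompts, so everything here is an assumption:

```python
PROMPT = """This is a user: {user_profile}
This is a piece of content: {post_text}
The user has liked similar content in the past: {liked_examples}
Other users have given this content positive feedback: {engagement_summary}
Should this content be recommended to this user? Answer yes or no."""

def judge(llm, user_profile, post_text, liked_examples, engagement_summary) -> str:
    # No hand-built feature vector: the model reads a plain-language description
    # of the situation and decides for itself which signals matter.
    return llm(PROMPT.format(
        user_profile=user_profile,
        post_text=post_text,
        liked_examples=liked_examples,
        engagement_summary=engagement_summary,
    ))
```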
Language models inherently possess understanding capabilities; they can independently determine which information constitutes important signals and how to synthesize these signals to make decisions. In a sense, they don't merely execute recommendation rules, but rather "understand the act of recommendation."
This capability stems from the fact that LLMs are exposed to massive amounts of diverse data during the training phase, making them more adept at capturing subtle yet important patterns. In contrast, traditional recommender systems must rely on engineers to explicitly enumerate these patterns, and if any are missed, the model cannot detect them.
Seen this way, the change is not unfamiliar. Just as GPT generates an answer based on contextual information when you ask it a question, it can also make a judgment based on the available information when you ask it, "Would I be interested in this content?" To some extent, language models already possess the ability to "recommend."

