A controlled study with 240 participants compares three feedback mechanisms for AI-driven media-bias detection. Color-highlighted sentences (“Highlights”) generate significantly higher engagement and more efficient feedback collection than both comparison-based prompts and a control group, without reducing agreement with expert annotations. The findings support integrating Highlights into the NewsUnfold system to optimize future data collection.

Color-Coded Bias Warnings Boost Accuracy and Efficiency in AI Feedback

  1. Abstract and Introduction
  2. Related Work
  3. Feedback Mechanisms
  4. The NewsUnfold Platform
  5. Results
  6. Discussion
  7. Conclusion
  8. Acknowledgments and References

A. Feedback Mechanism Study Texts

B. Detailed UX Survey Results for NewsUnfold

C. Material Bias and Demographics of Feedback Mechanism Study

D. Additional Screenshots


3 Feedback Mechanisms

As the evaluation of feedback mechanisms for media bias remains unexplored, in a preliminary study, we design and assess two HITL feedback mechanisms for their suitability for data collection. Using sentences from news articles labeled by the classifier from Spinde, Hamborg, and Gipp (2020), we compare the mechanisms Highlights, Comparison, and a control group without visual highlights. Our analysis focuses on (1) dataset quality, assessed using Krippendorff’s α; (2) engagement, quantified by feedback given on each sentence[6]; (3) agreement with expert annotations, evaluated through F1 scores; and (4) feedback efficiency, measured by the time required in combination with engagement and agreement.
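Dataset quality in (1) relies on Krippendorff’s α for nominal data. As an illustration only (not the authors’ implementation), a minimal sketch of the coincidence-matrix formulation, where each unit is the list of biased/not-biased labels a single sentence received from annotators:

```python
from collections import defaultdict

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.
    units: list of lists; each inner list holds the labels one sentence
    received. Units with fewer than two labels carry no information and
    are skipped."""
    o = defaultdict(float)  # coincidence counts o[(c, k)]
    for ratings in units:
        m = len(ratings)
        if m < 2:
            continue
        for i, c in enumerate(ratings):
            for j, k in enumerate(ratings):
                if i != j:
                    o[(c, k)] += 1.0 / (m - 1)
    n_c = defaultdict(float)  # marginal counts per label
    for (c, _k), v in o.items():
        n_c[c] += v
    n = sum(n_c.values())
    d_o = sum(v for (c, k), v in o.items() if c != k)      # observed disagreement
    d_e = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k) / (n - 1)
    if d_e == 0:  # only one label ever used: no expected disagreement
        return 1.0
    return 1.0 - d_o / d_e
```

Perfect agreement yields α = 1, systematic disagreement yields negative values, and chance-level labeling yields α near 0.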

In the Highlights mechanism, biased sentences are colored yellow, and non-biased ones are grey, inspired by Spinde et al. (2022). Participants indicate their agreement or disagreement with these classifications through a floating module (Figure 2). The Comparison mechanism displays sentence pairs. For the first sentence, participants provide feedback on the AI’s classification as in Highlights. The second sentence has no color coding, prompting users with “What do you think?” (Figure 3), thereby aiming to foster an independent bias assessment and mitigate anchoring effects. Participants in the control group do not see any highlights, solely encountering the feedback module with the second question from Comparison.

We use the BABE classifier trained by Spinde et al. (2021b) to generate the sentence labels and highlights. The classifier currently achieves the highest reported performance; it fine-tunes the large language model RoBERTa on an extensive dataset of linguistic bias annotated by experts at both the sentence and word level. For each sentence in an article, the BABE-based model on Huggingface[7] outputs the probability of the sentence being biased or not biased. We accordingly assign the label with the higher probability.
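The label assignment is simply the class with the higher probability. A hedged sketch: the pipeline call below is illustrative (running it requires the `transformers` library and the model from footnote [7]), so the per-class scores are stubbed here.

```python
# Illustrative only: loading the actual BABE model would look roughly like
#   from transformers import pipeline
#   clf = pipeline("text-classification",
#                  model="mediabiasgroup/da-roberta-babe-ft", top_k=None)
# Here the per-class probabilities for one sentence are stubbed.

def assign_label(scores):
    """Return the class with the higher probability, as described above.
    scores: dict mapping label -> probability."""
    return max(scores, key=scores.get)

sentence_scores = {"biased": 0.71, "not biased": 0.29}  # hypothetical values
print(assign_label(sentence_scores))  # -> biased
```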

Study Design

To assess the two mechanisms, we recruit 240 participants, balanced regarding gender, from Prolific.[8] On the study website built for this purpose, depicted in Figure 13, they view two articles from different political orientations paired with one feedback mechanism per group. During the study, users freely determine their annotation count and time spent, with a progress bar showing the number of annotated sentences. Not interacting with any sentences prompts a pop-up, but they can click “next” to proceed.


We guarantee GDPR conformity through a preliminary data processing agreement. A demographic survey and an introduction to media bias follow (Appendix A). A post-introduction attention test confirms participants’ understanding of media bias, which, if failed twice, results in study exclusion. Then, participants read through a description of the study task and proceed to give feedback on the two articles. Lastly, a concluding trustworthiness question ensures data reliability. If participants clicked through the study inattentively, they could indicate that their data is not usable for research (Draws et al. 2021) while still receiving full pay (Spinde et al. 2022).

Figure 2: The feedback mechanism Highlights uses the BABE classifier to highlight biased sentences in yellow and not biased sentences in grey. Readers can agree or disagree with this classification through the feedback module on the right.

Figure 3: The feedback mechanism Comparison operates on sentence pairs and uses the BABE classifier to highlight the first sentence as biased in yellow. Readers can agree or disagree with this classification through the feedback module on the right. The next sentence is merely outlined. Here, the feedback module asks for a bias rating without the classifier anchor.

Results

The 240 participants in the study spent an average of 11:24 minutes, with a compensation rate of £7.89/hr. Twelve participants failed the attention test once, but only one was excluded for a second failure. We further excluded 33 participants who flagged their data as unsuitable for research. Therefore, the analysis includes data from 206 participants: 69 control group participants, 66 Comparison group participants, and 71 Highlights group participants (p = .84, f = .23, α = .05). 104 participants identified as female, 99 as male, and 3 as other, with an average age of 36.62 years (SD = 13.74). The sample, on average, exhibits a left slant (Figure 11 and Figure 12) with higher education (Figure 7). 196 participants indicated advanced English levels, 9 intermediate, and 1 beginner (Figure 9). News reading frequency averaged around once a day (Figure 10).

Notably, we observe a high overall engagement, with even the least annotated sentences receiving feedback from 70% of the participants. We detail the results of the feedback mechanism study, including engagement, IAA, F1 scores, and efficiency, in Table 1. The Highlights group exhibits higher engagement than the Comparison group, yielding more collected data. Also, Highlights demonstrates higher efficiency by collecting more feedback data in less time without compromising quality, as measured by IAA and agreement with the expert standard.

The increases in engagement and efficiency are significant at a .05 significance level. Due to variance inhomogeneity indicated by a significant Levene test (p < .05), we applied Welch’s ANOVA for unequal variances. Post-hoc Holm-Bonferroni adjustments revealed significant differences between the CONTROL and HIGHLIGHTS groups, with p < .0167 for efficiency and p < .025 for engagement. The Games-Howell post-hoc test confirmed these results. As in previous research, IAA and F1 scores from crowdsourcers are low due to the complex and subjective task (Spinde et al. 2021c). F1 score differences are not significant (ANOVA with Holm-Bonferroni, p > .05). Given the comparable IAA and F1 scores across groups, we integrate Highlights within NewsUnfold to optimize data collection efficiency.

Table 1: Overview of Feedback Interactions per Group.
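The thresholds p < .0167 and p < .025 match Holm’s step-down procedure for three comparisons (α/3 for the smallest p-value, then α/2, then α). A minimal sketch of the adjustment, as an illustration rather than the authors’ code:

```python
def holm_bonferroni(pvals, alpha=0.05):
    """Holm step-down procedure: return a list of booleans, one per
    input p-value, indicating whether the null is rejected."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending p-values
    reject = [False] * m
    for rank, i in enumerate(order):
        # step-down threshold: alpha / (m - rank), i.e. alpha/3, alpha/2, alpha for m=3
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject
```

For example, with m = 3 tests the smallest p-value is compared against .05/3 ≈ .0167 and the next against .05/2 = .025, which are exactly the thresholds reported above.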


:::info Authors:

(1) Smi Hinterreiter;

(2) Martin Wessel;

(3) Fabian Schliski;

(4) Isao Echizen;

(5) Marc Erich Latoschik;

(6) Timo Spinde.

:::


:::info This paper is available on arxiv under CC0 1.0 license.

:::

[6] Readers can modify their annotations at any time; however, each unique sentence annotation counts as a single interaction for our feedback metric.

[7] https://huggingface.co/mediabiasgroup/da-roberta-babe-ft

[8] https://www.prolific.co

[9] Experts have at least six months of experience in media bias. Consensus was achieved through majority or discussion.
