An experimental study shows that developers often misperceive which software testing techniques are most effective for them.

Study Finds Software Testers Often Misjudge Which Techniques Work Best

2025/12/16 05:00

Abstract

1 Introduction

2 Original Study: Research Questions and Methodology

3 Original Study: Validity Threats

4 Original Study: Results

5 Replicated Study: Research Questions and Methodology

6 Replicated Study: Validity Threats

7 Replicated Study: Results

8 Discussion

9 Related Work

10 Conclusions and References


4 Original Study: Results

Of the 32 students participating in the experiment, nine did not complete the questionnaire and were removed from the analysis. Table 9 shows the balance of the experiment before and after participants submitted the questionnaire. G6 is the most affected group, with four missing participants.

Appendix B shows the analysis of the experiment. The results show that program and technique are statistically significant (and therefore influence effectiveness), while group and the technique-by-program interaction are not significant. As regards the techniques, EP shows the highest effectiveness, followed by BT and then CR. These results are interesting, as all techniques are able to detect all defects. Additionally, more defects are found in ntree than in cmdline and nametbl, where the same number of defects are found.

Note that ntree is the program applied on the first day, has the highest Halstead metrics, and is neither the smallest program nor the one with the lowest complexity. These results suggest that:

– There is no maturation effect. The program where highest effectiveness is obtained is the one used the first day.

– There is no interaction with selections effect. Group is not significant.

– Mortality does not affect experimental results. The analysis technique used (Linear Mixed-Effects Models) is robust to lack of balance.

– Order of training could be affecting results. The highest effectiveness is obtained with the last technique taught, while the lowest effectiveness is obtained with the first technique taught. This suggests that techniques taught later appear more effective, possibly because participants remember them better.

– Results cannot be generalised to other subject types.
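The kind of analysis described above, a linear mixed-effects model with participant as a random effect, could be sketched with statsmodels as follows. The data here are synthetic stand-ins (the variable names and the random assignment of programs are my assumptions, not the study's actual design matrix), so only the model structure is illustrative:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
techniques = ["EP", "BT", "CR"]
programs = ["ntree", "cmdline", "nametbl"]

# Synthetic repeated-measures data: each of 23 participants applies each
# technique to one program (assignment randomised here for illustration).
rows = []
for subject in range(23):
    for tech, prog in zip(techniques, rng.permutation(programs)):
        rows.append({"subject": subject, "technique": tech,
                     "program": prog,
                     "effectiveness": rng.normal(70, 10)})
df = pd.DataFrame(rows)

# Random intercept per participant accounts for the repeated measures;
# mixed models tolerate the unbalanced groups left after dropouts.
model = smf.mixedlm("effectiveness ~ technique + program",
                    df, groups=df["subject"])
result = model.fit()
print(result.summary())
```

With the study's real data, the fixed-effect terms for technique and program would carry the significance results reported in Appendix B.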

4.1 RQ1.1: Participants’ Perceptions

Table 10 shows the percentage of participants who perceive each technique to be the most effective. We cannot reject the null hypothesis that the frequency distribution of the responses to the questionnaire item (Using which technique did you detect most defects?) follows a uniform distribution (χ²(2, N=23)=2.696, p=0.260). This means that the number of participants perceiving a particular technique as the most effective cannot be considered different across the three techniques. Our data do not support the conclusion that some techniques are more frequently perceived as the most effective than others.
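This is a standard chi-square goodness-of-fit test against uniform expected frequencies. A minimal sketch with scipy, using hypothetical counts (the actual counts are in Table 10 of the paper; the split below is invented for illustration):

```python
from scipy.stats import chisquare

# Hypothetical numbers of participants naming each technique (EP, BT, CR)
# as the one with which they detected most defects; N = 23.
counts = [11, 7, 5]

# chisquare defaults to uniform expected frequencies, matching the
# null hypothesis tested in the paper.
stat, p = chisquare(counts)
print(f"chi2({len(counts) - 1}, N={sum(counts)}) = {stat:.3f}, p = {p:.3f}")
```

A large p-value, as in the study, means the observed split of responses is compatible with all three techniques being named equally often.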

4.2 RQ1.2: Comparing Perceptions with Reality

Table 11 shows the value of kappa along with its 95% confidence interval (CI), overall and for each technique separately. We find that all kappa values for the questionnaire item (Using which technique did you detect most defects?) are consistent with lack of agreement (κ<0.4, poor). Although the upper bounds of the 95% CIs show agreement, 0 lies within all the 95% CIs, meaning that agreement by chance cannot be ruled out. Therefore, our data do not support the conclusion that participants correctly perceive the most effective technique for them.
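Kappa here measures agreement between the technique each participant perceived as most effective and the one that actually was, corrected for chance. A minimal self-contained sketch of Cohen's kappa, with invented labels (the real per-participant data are in the paper):

```python
from collections import Counter

def cohen_kappa(perceived, actual):
    """Chance-corrected agreement between two label sequences."""
    n = len(perceived)
    # Observed agreement: fraction of participants whose perception matched.
    p_o = sum(a == b for a, b in zip(perceived, actual)) / n
    # Chance agreement from the marginal frequencies of each label.
    cp, ca = Counter(perceived), Counter(actual)
    p_e = sum(cp[c] * ca[c] for c in set(cp) | set(ca)) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels: technique each participant named vs. the technique
# with which they actually detected most defects.
perceived = ["EP", "EP", "BT", "CR"]
actual = ["EP", "BT", "BT", "CR"]
print(cohen_kappa(perceived, actual))
```

Values near 0 (or inside a CI containing 0) mean the matches could have happened by chance, which is the situation the study reports.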

It is worth noting that agreement is higher for the code review technique (the upper bound of the 95% CI in this case shows excellent agreement). This could be attributed to participants being able to remember the actual number of defects identified during code reading, whereas for the testing techniques they only wrote the test cases. Note also that participants do not know the number of defects injected in each program.

As lack of agreement cannot be ruled out, we examine whether the perceptions are biased. The results of the Stuart-Maxwell test show that the null hypothesis of marginal homogeneity cannot be rejected (χ²(2, N=23)=1.125, p=0.570). This means that we cannot conclude that perceptions and reality are differently distributed. Taking into account the results reported in Section 4.1, this suggests that, in reality, no technique can be considered the most effective a different number of times.

Additionally, the results of the McNemar-Bowker test show that the null hypothesis of symmetry cannot be rejected (χ²(3, N=23)=1.286, p=0.733). This means that we cannot conclude that there is directionality when participants' perceptions are wrong. Together, these two results suggest that participants are not more mistaken about one technique than about the others: techniques are not differently subject to misperceptions.
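Both tests operate on the same square perceived-vs-actual contingency table and are available in statsmodels. A sketch with a hypothetical 3×3 table (the real counts are in the paper; this table merely sums to N=23):

```python
import numpy as np
from statsmodels.stats.contingency_tables import SquareTable

# Hypothetical contingency table: rows = technique perceived as most
# effective (EP, BT, CR), columns = technique actually most effective.
table = np.array([[4, 2, 1],
                  [3, 3, 2],
                  [2, 3, 3]])  # N = 23

st = SquareTable(table)
# Stuart-Maxwell: are the row and column marginals the same distribution?
print(st.homogeneity(method="stuart_maxwell"))
# McNemar-Bowker: are off-diagonal errors symmetric (no directionality)?
print(st.symmetry(method="bowker"))
```

Non-significant results on both, as in the study, indicate no systematic bias in which technique participants wrongly pick.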

4.3 RQ1.3: Comparing the Effectiveness of Techniques

We check whether the misperceptions could be due to participants detecting the same number of defects with all three techniques, which would make it impossible for them to make the right decision. Table 12 shows the value and 95% CI of Krippendorff's α, overall and for each pair of techniques, for all participants and for every design group (participants that applied the same technique on the same program) separately. Table 13 shows the value and 95% CI of Krippendorff's α, overall and for each program/session.

For the values over all participants, we can rule out agreement, as the upper bounds of the 95% CIs are consistent with lack of agreement (α<0.4), except for EP-BT and nametbl-ntree, where the upper bounds of the 95% CIs are consistent with fair to good agreement. However, even in these two cases, 0 lies within the 95% CIs, meaning that agreement by chance cannot be ruled out.

This means that participants do not obtain effectiveness values so similar across the different techniques (or across the different programs) that discriminating among techniques/programs would be difficult.

Furthermore, the α values are negative, which indicates disagreement. This is good for the study, as it means that participants should be able to discriminate among techniques; the lack of agreement in perceptions cannot be attributed to it being impossible to discriminate among techniques. As regards the results for groups, although the α values are negative, the 95% CIs are too wide to yield reliable results (due to the small sample size). Note that in most cases they range from disagreement at the lower bound (α<-0.4) to agreement at the upper bound (α>0.4).
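Krippendorff's α, like kappa, is 1 for perfect agreement, 0 for chance, and negative for systematic disagreement. A minimal implementation of the nominal-data variant, using invented ratings (two "coders" here stand in for the effectiveness values a participant obtains under two techniques):

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(reliability_data):
    """Krippendorff's alpha for nominal data.

    reliability_data: one row per coder; each row holds the value assigned
    to each unit (None for missing).
    """
    n_units = len(reliability_data[0])
    coincidences = Counter()
    for u in range(n_units):
        values = [row[u] for row in reliability_data if row[u] is not None]
        m = len(values)
        if m < 2:
            continue  # unit with fewer than two values is not pairable
        # Each ordered pair within a unit contributes 1/(m-1) to the
        # coincidence matrix, so every unit has total weight m.
        for c, k in permutations(values, 2):
            coincidences[(c, k)] += 1 / (m - 1)
    n = sum(coincidences.values())
    totals = Counter()
    for (c, _k), w in coincidences.items():
        totals[c] += w
    # Observed vs. expected disagreement (off-diagonal mass).
    d_o = sum(w for (c, k), w in coincidences.items() if c != k)
    d_e = sum(totals[c] * totals[k]
              for c in totals for k in totals if c != k) / (n - 1)
    return 1 - d_o / d_e

ratings = [[1, 1, 2, 2],
           [1, 1, 2, 1]]  # two coders, four units (hypothetical)
print(krippendorff_alpha_nominal(ratings))
```

The study's α values are computed on (interval-scaled) effectiveness scores rather than nominal labels, so this sketch illustrates the statistic's logic, not the exact computation behind Tables 12 and 13.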

4.4 RQ1.4: Cost of Mismatch

Table 14 and Figure 1 show the cost of mismatch. The EP technique has fewer mismatches than the other two, and its mean and median mismatch costs are smaller. Conversely, the BT technique has more mismatches and a higher dispersion. The results of the Kruskal-Wallis test reveal that we cannot reject the null hypothesis of the techniques having the same mismatch cost (H(2)=0.685, p=0.710). This means that we cannot claim a difference in mismatch cost between the techniques. The estimated mean mismatch cost is 31pp (median 26pp).
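The Kruskal-Wallis test is a rank-based comparison of three or more independent samples, appropriate here because mismatch costs are few and skewed. A sketch with scipy, on hypothetical cost samples (the real per-participant costs are in Table 14):

```python
from scipy.stats import kruskal

# Hypothetical mismatch costs in percentage points, grouped by the
# technique the participant perceived as most effective.
ep = [12, 20, 26]
bt = [15, 40, 65, 30]
cr = [18, 28, 33]

# Kruskal-Wallis compares the rank distributions of the three groups.
h, p = kruskal(ep, bt, cr)
print(f"H(2) = {h:.3f}, p = {p:.3f}")
```

A non-significant result, as reported, means the cost of choosing the wrong technique cannot be shown to depend on which technique was (wrongly) perceived as best.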

These results suggest that the mismatch cost is not negligible (31pp), and is not related to the technique perceived as most effective. However, note that the existence of very high mismatches and few datapoints could be affecting these results.

4.5 RQ1.5: Expected Loss of Effectiveness

Table 15 shows the average loss of effectiveness that should be expected in a project, where typically different testers participate and there would therefore be both matches and mismatches. Again, the results of the Kruskal-Wallis test reveal that we cannot reject the null hypothesis of the techniques having the same expected reduction in effectiveness for a project (H(2)=1.510, p=0.470). This means we cannot claim a difference in project effectiveness loss between techniques. The mean expected loss in effectiveness in the project is estimated at 15pp.

These results suggest that the expected loss in effectiveness in a project is not negligible (15pp), and is not related to the technique perceived as most effective. However, we must note again that the existence of very high mismatches for BT and few datapoints could be affecting these results.

4.6 Findings of the Original Study

Our findings are:

– Participants should not base their decisions on their own perceptions, as their perceptions are not reliable and have an associated cost.

– We have not been able to find a bias towards one or more particular techniques that might explain the misperceptions.

– Participants should have been able to identify the different effectiveness of techniques.

– Misperceptions cannot be put down to experience. The possible drivers of these misperceptions require further research. Note that these findings cannot be generalised to types of developers other than those with the same profile as the ones used in this study.

:::info Authors:

  1. Sira Vegas
  2. Patricia Riofrío
  3. Esperanza Marcos
  4. Natalia Juristo

:::

:::info This paper is available on arXiv under the CC BY-NC-ND 4.0 license.

:::

