The study explores how developers use foundation model–powered tools like ChatGPT during open-source collaboration, revealing that shared conversations can enhance collective innovation. Findings highlight gaps in current AI benchmarks, showing that nearly half of code generation prompts contain partial code and many involve multi-turn dialogues. These insights inform better benchmark design, improved prompt-engineering strategies, and the creation of FM tools tailored to diverse developer roles and real-world workflows.

Foundation Models Are Reshaping How Developers Code Together

2025/11/13 23:00

Abstract

1 Introduction

2 Data Collection

3 RQ1: What types of software engineering inquiries do developers present to ChatGPT in the initial prompt?

4 RQ2: How do developers present their inquiries to ChatGPT in multi-turn conversations?

5 RQ3: What are the characteristics of the sharing behavior?

6 Discussions

7 Threats to Validity

8 Related Work

9 Conclusion and Future Work

References

Discussions

Implications for Designing and Investigating FM-powered SE collaboration tools. The most important finding from our study is that developers do share their conversations with ChatGPT while contributing to open-source projects. This insight offers researchers and FM practitioners a new perspective for assessing the role and influence of FM-powered software development tools, such as ChatGPT, in collaborative coding. It underscores the potential of these tools not only to assist individual developers but also to enhance the collective productivity and innovation of open-source communities. Furthermore, our study provides several taxonomies that researchers can use to characterize developers’ interactions with ChatGPT or other FM-powered software development tools. For instance, the taxonomy and annotated prompts from RQ1 can be used to build a learning-based approach that automatically identifies tasks of interest and analyzes the corresponding response quality. Tool designers can also use the reported frequencies of software engineering tasks to prioritize improvements to their tools. The answers to RQ3 reveal how developers in different roles use shared ChatGPT conversations in collaborative coding, which can inform the design of FM-powered tools tailored to each of these roles.

Implications for Benchmarking FM for SE tasks

Our findings from RQ1 shed light on future benchmark designs for evaluating the impact of FMs on different types of software engineering tasks. In RQ1, we find multiple types of input for code generation and issue-resolving inquiries, but these types are not fully captured by existing benchmarks. For instance, the widely recognized code generation benchmark HumanEval relies on textual specifications and function signatures.

Yet our analysis shows that nearly half of the code generation prompts (47%) include an initial code draft alongside the textual description. Similarly, our examination of prompts categorized under (C4) Issue resolving indicates that a significant portion (36%) of issue resolution requests involve sharing error messages or execution traces, often without accompanying source code. We therefore recommend that researchers designing future benchmarks account for these input types.
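To make this recommendation concrete, one way a benchmark could record which kind of input each task supplies is sketched here. It is a minimal illustration under our own assumptions: the `BenchmarkItem` fields, the `InputKind` labels, and the `coverage` helper are hypothetical names, not part of HumanEval or any existing benchmark. Comparing a benchmark's coverage against the input mix observed in real prompts (for example, the share of code generation prompts that contain a code draft) would show where it under-represents realistic inputs.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class InputKind(Enum):
    """Hypothetical labels for what a prompt supplies besides the task description."""
    SPEC_AND_SIGNATURE = "spec_and_signature"        # HumanEval-style: docstring plus signature
    SPEC_AND_PARTIAL_CODE = "spec_and_partial_code"  # description plus an initial code draft
    ERROR_ONLY = "error_only"                        # error message or trace, no source code
    ERROR_AND_CODE = "error_and_code"                # error message together with the failing code


@dataclass
class BenchmarkItem:
    """Hypothetical benchmark record; optional fields capture the extra inputs a prompt may carry."""
    task_id: str
    category: str  # e.g. "code_generation" or "issue_resolving"
    description: str
    signature: Optional[str] = None
    partial_code: Optional[str] = None
    error_trace: Optional[str] = None
    tests: list[str] = field(default_factory=list)

    def input_kind(self) -> InputKind:
        """Derive the input kind from which optional fields are populated."""
        if self.error_trace is not None:
            if self.partial_code is not None:
                return InputKind.ERROR_AND_CODE
            return InputKind.ERROR_ONLY
        if self.partial_code is not None:
            return InputKind.SPEC_AND_PARTIAL_CODE
        return InputKind.SPEC_AND_SIGNATURE


def coverage(items: list[BenchmarkItem]) -> dict[InputKind, float]:
    """Share of items per input kind, for comparison with the mix seen in real prompts."""
    counts = {kind: 0 for kind in InputKind}
    for item in items:
        counts[item.input_kind()] += 1
    total = len(items) or 1  # avoid division by zero on an empty benchmark
    return {kind: n / total for kind, n in counts.items()}
```

Deriving the kind from populated fields, rather than storing it separately, keeps the label consistent with what the task actually gives the model.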

Our observation that multi-turn conversations are common also motivates future evaluations of FMs that allow multi-turn interactions. Currently, only a few studies support multi-turn code generation (Wang et al., 2024; Nijkamp et al., 2022). Last but not least, we observed many tasks beyond code generation and issue resolution, such as code review, conceptual questions, and documentation, which are rarely considered as benchmark tasks for FM-powered software development tools.
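A multi-turn evaluation could let the model revise its answer after receiving feedback, mirroring how developers follow up on ChatGPT's responses. The following is one possible harness, assuming that a model is any callable from a conversation history to a reply and that correctness is judged by a caller-supplied checker. `evaluate_multi_turn`, `check_add`, and `stub_model` are hypothetical names, the stub stands in for a real FM, and the sketch does not reproduce the protocols of the studies cited above.

```python
from typing import Callable

# Hypothetical interfaces: a conversation is a list of (role, text) turns, a model
# maps the history so far to a reply, and a checker returns (passed, feedback).
Turn = tuple[str, str]
Model = Callable[[list[Turn]], str]
Checker = Callable[[str], tuple[bool, str]]


def evaluate_multi_turn(
    model: Model, prompt: str, check: Checker, max_turns: int = 3
) -> tuple[bool, int, list[Turn]]:
    """Give the model up to max_turns attempts, feeding failures back as new user turns.

    Returns (passed, model turns used, full transcript).
    """
    history: list[Turn] = [("user", prompt)]
    for turn in range(1, max_turns + 1):
        answer = model(history)
        history.append(("assistant", answer))
        passed, feedback = check(answer)
        if passed:
            return True, turn, history
        # Follow-up prompt carrying the failure, e.g. an error message or wrong output.
        history.append(("user", f"That did not work: {feedback}. Please fix it."))
    return False, max_turns, history


def check_add(code: str) -> tuple[bool, str]:
    """Toy checker: run the code and test add(2, 3). exec is for illustration only;
    a real harness should sandbox generated code."""
    namespace: dict = {}
    try:
        exec(code, namespace)
        result = namespace["add"](2, 3)
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    if result == 5:
        return True, "ok"
    return False, f"add(2, 3) returned {result}, expected 5"


def stub_model(history: list[Turn]) -> str:
    """Stand-in for an FM: wrong on the first attempt, correct afterwards."""
    attempts = sum(1 for role, _ in history if role == "assistant")
    if attempts == 0:
        return "def add(a, b):\n    return a - b"
    return "def add(a, b):\n    return a + b"


passed, turns, transcript = evaluate_multi_turn(stub_model, "Write add(a, b).", check_add)
print(passed, turns)  # True 2
```

Here the stub fails on the first turn, receives the error feedback, and succeeds on the second. Reporting the number of turns used, not only pass or fail, is what distinguishes this from single-shot scoring.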

Implications for Prompt Engineering. The findings from RQ2 highlight developers’ frequent use of multi-turn strategies to iteratively improve ChatGPT’s solutions. The flow chart in Figure 5 illustrates the diverse approaches developers employ in these interactions. This finding motivates future investigations into the efficiency of developers’ prompting techniques within these multi-turn conversations. Specifically, future work could examine whether prompt-engineering best practices have been applied, and whether improved prompts can effectively alter the flow of these interactions, in order to enhance the utility and effectiveness of FM-powered tools in software development.

:::info Authors

  1. Huizi Hao
  2. Kazi Amit Hasan
  3. Hong Qin
  4. Marcos Macedo
  5. Yuan Tian
  6. Steven H. H. Ding
  7. Ahmed E. Hassan

:::

:::info This paper is available on arXiv under the CC BY-NC-SA 4.0 license.

:::
