The study explores how developers use foundation model–powered tools like ChatGPT during open-source collaboration, revealing that shared conversations can enhance collective innovation. Findings highlight gaps in current AI benchmarks, showing that nearly half of code generation prompts contain partial code and many involve multi-turn dialogues. These insights inform better benchmark design, improved prompt-engineering strategies, and the creation of FM tools tailored to diverse developer roles and real-world workflows.

Foundation Models Are Reshaping How Developers Code Together


Abstract

1 Introduction

2 Data Collection

3 RQ1: What types of software engineering inquiries do developers present to ChatGPT in the initial prompt?

4 RQ2: How do developers present their inquiries to ChatGPT in multi-turn conversations?

5 RQ3: What are the characteristics of the sharing behavior?

6 Discussions

7 Threats to Validity

8 Related Work

9 Conclusion and Future Work

References

Discussions

Implications for Designing and Investigating FM-powered SE Collaboration Tools. The most important finding from our study is that developers do share their conversations with ChatGPT while contributing to open-source projects. This insight opens a new avenue for researchers and FM practitioners assessing the role and influence of FM-powered software development tools, such as ChatGPT, in collaborative coding. It underscores the potential of these tools not only to assist individual developers but also to enhance the collective productivity and innovation of open-source communities. Furthermore, our study provides several taxonomies that researchers can use to characterize developers’ interactions with ChatGPT or other FM-powered software development tools. For instance, the taxonomy and annotated prompts in RQ1 can be leveraged to develop a learning-based approach that automatically identifies tasks of interest and analyzes the corresponding response quality. Tool designers can also use our reported frequencies of software engineering tasks to prioritize improvements. The answers to RQ3 reveal how developers in different roles use shared ChatGPT conversations during collaborative coding, which can inform the design of FM-powered tools tailored to those roles.
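To make the "learning-based approach" direction concrete, the sketch below shows a trivially simple rule-based baseline for routing prompts into task categories. The category names and keyword lists are illustrative assumptions, not the paper's actual taxonomy labels; a learning-based classifier trained on the annotated prompts from RQ1 would replace this keyword matching.

```python
# Hypothetical category keywords for illustration only; a real system would
# train a classifier on the annotated prompts rather than match keywords.
CATEGORY_KEYWORDS = {
    "code generation": ["write", "implement", "generate"],
    "issue resolving": ["error", "fix", "traceback", "exception"],
    "code review": ["review", "improve", "refactor"],
}

def classify_prompt(prompt: str) -> str:
    """Return the category whose keywords best match the prompt."""
    text = prompt.lower()
    scores = {cat: sum(kw in text for kw in kws)
              for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"

print(classify_prompt("Please fix this TypeError in my script"))
# → issue resolving
```

Even a baseline like this lets researchers estimate per-category response quality at scale before investing in annotation-driven models.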

Implications for Benchmarking FMs for SE Tasks. Our findings from RQ1 shed light on future benchmark designs for evaluating the impact of FMs on different types of software engineering tasks. In RQ1, we identify multiple types of input for code generation and issue-resolving inquiries, but these input types are not fully captured by existing benchmarks. For instance, the widely recognized code generation benchmark HumanEval relies solely on textual specifications and method signatures.

Yet, our analysis shows that nearly half (47%) of the code generation prompts include initial code drafts alongside textual descriptions. Similarly, our examination of prompts categorized under (C4) Issue resolving indicates that a significant portion (36%) of issue-resolution requests share error messages or execution traces, often without accompanying source code. We therefore recommend that researchers take these findings into account when designing future benchmarks.
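A benchmark that reflects these observations would need task records richer than a spec-plus-signature pair. The sketch below is one possible schema (field names are our own invention, not from the paper) that captures the input modalities observed in the study: partial code drafts and standalone error traces.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class BenchmarkTask:
    """One evaluation item; the optional fields model the input types the
    study observed beyond plain textual specifications."""
    task_id: str
    spec: str                           # natural-language requirement
    initial_code: Optional[str] = None  # partial draft (present in ~47% of prompts)
    error_trace: Optional[str] = None   # error message / trace, possibly w/o source
    reference_tests: list = field(default_factory=list)

# Example: an issue-resolving item that supplies only an error trace
task = BenchmarkTask(
    task_id="issue-001",
    spec="Fix the crash when parsing an empty config file.",
    error_trace="KeyError: 'version'",
)
assert task.initial_code is None and task.error_trace is not None
```

Making these modalities explicit fields, rather than concatenating everything into one prompt string, lets evaluators report results per input type, mirroring the RQ1 breakdown.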

Our observation that multi-turn conversations are frequently used also motivates future evaluations of FMs that allow multi-turn interactions. Currently, only a few studies support multi-turn code generation (Wang et al., 2024; Nijkamp et al., 2022). Last but not least, we observed many tasks beyond code generation and issue resolution, such as code review, conceptual questions, and documentation, which are rarely considered as benchmark tasks for FM-powered software development tools.

Implications for Prompt Engineering. The findings from RQ2 highlight developers' frequent use of multi-turn strategies to iteratively improve ChatGPT’s solutions. The flow chart in Figure 5 illustrates the diverse approaches developers employ in these interactions. This finding motivates future investigation into the efficiency of developers’ prompting techniques within multi-turn conversations. Specifically, examining whether best practices in prompt engineering have been applied, and whether improved prompts can effectively alter the flow of these interactions, is a promising direction for enhancing the utility and effectiveness of FM-powered tools in software development.
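The iterative refinement pattern described above can be sketched as a small driver loop. This is a minimal illustration under stated assumptions: `model` and `accept` are caller-supplied stand-ins for an FM API call and an acceptance check (e.g., running unit tests); neither is an interface from the paper.

```python
def refine_solution(model, prompt, max_turns=3, accept=None):
    """Minimal multi-turn loop: ask, check the answer, re-prompt with feedback.

    `model(history)` returns the assistant's reply for a conversation history;
    `accept(answer)` returns True when the answer is good enough.
    """
    history = [{"role": "user", "content": prompt}]
    answer = None
    for _ in range(max_turns):
        answer = model(history)
        history.append({"role": "assistant", "content": answer})
        if accept is None or accept(answer):
            break
        # Feedback turn: in practice this would include failing test output
        history.append({"role": "user",
                        "content": "That fails my tests; please revise."})
    return answer, history

# Toy usage: a fake model that gets it right on the second attempt
replies = iter(["draft", "fixed"])
answer, history = refine_solution(
    lambda h: next(replies),
    "Write a sort function.",
    accept=lambda a: a == "fixed",
)
# answer == "fixed" after two assistant turns
```

Instrumenting a loop like this (logging the history and the number of turns to acceptance) is one concrete way to measure whether improved prompts shorten the interaction flows observed in Figure 5.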

:::info Authors

  1. Huizi Hao
  2. Kazi Amit Hasan
  3. Hong Qin
  4. Marcos Macedo
  5. Yuan Tian
  6. Steven H. H. Ding
  7. Ahmed E. Hassan

:::

:::info This paper is available on arXiv under the CC BY-NC-SA 4.0 license.

:::

