This research investigates unethical behavior in open-source software (OSS) projects through the analysis of software artifacts that are impacted. Building on previous taxonomies, we further categorize and refine them to find 18 different sorts of artifacts that potentially represent ethical transgressions, including as source code, configuration files, licenses, project-level features, and GitHub interactions.This research investigates unethical behavior in open-source software (OSS) projects through the analysis of software artifacts that are impacted. Building on previous taxonomies, we further categorize and refine them to find 18 different sorts of artifacts that potentially represent ethical transgressions, including as source code, configuration files, licenses, project-level features, and GitHub interactions.

18 Ways Unethical Behavior Creeps Into Open-Source Software

Abstract and 1. Introduction

  1. Background and Related Work

  2. Study of Unethical Behavior in OSS

    3.1 RQ1: Types of unethical behavior

    3.2 RQ2: Affected software artifacts

  3. Methodology

    4.1 Modeling via SWRL rules

    4.2 Automatic detection of unethical behavior

  4. Evaluation

  5. Discussion and Implications

  6. Threats to Validity

  7. Conclusion and References

3.2 RQ2: Affected software artifacts

We define affected software artifacts as objects in software repositories that violate ethical principles. To derive the set of affected software artifacts, we started with the 19 categories from the taxonomy of prior study [74]. Then, we categorized the artifacts we found in our study based on the 19 categories. After removing categories with no artifact found, we obtained eight categories: (1) source code, (2) script, (3) configuration, (4) database (data), (5) image, (6) prose, (7) legalese, and (8) other. For the prose category (i.e., plain text files), we only found two concrete types (i.e., README/CONTRIBUTING.md, and CHANGELOG) so we separated them into two categories. As the category “other” in prior study [74] is too broad, we split it into 10 new categories based on aforementioned steps in thematic analysis: (1) external application programming interface (API), (2) user interface (UI), (3) project, (4) release history, (5) software feature, (6) product name, (7) operating system (OS), (8) website, (9) PR/Issue code review, (10) PR/Issue comment. We derive “PR/Issue code review” and “PR/Issue comment” based on prior work [58]. Our newly introduced categories aim to preserve the hierarchy of artifacts (Project > Software feature [50] > Source code). For 28 cases (8.9%), both authors meet to discuss the issues labeled with different categories to resolve any disagreement. Finally, we obtained 18 types of affected software artifacts: (1) project, (2) software feature, (3) source code, (4) external API, (5) legalese, (6) product name, (7) release history, (8) UI, (9) configuration file, (10) PR/Issue code review, (11) PR/Issue comment, (12) README / CONTRIBUTING.md, (13) CHANGELOG, (14) data, (15) image, (16) OS, (17) website, and (18) script (i.e., source code in languages executed by an interpreter). As several artifacts are more difficult to understand, we explain them below:

\ Project: The affected artifacts involve more than one types of artifacts within the entire repository.

\ Software feature: Functional or non-functional requirements of a system [50, 57]. An example is the ability to unsubscribe a service.

\ Source code: Source files (excluding scripts, binary code, build code) that belong to the current repository (internal).

\ External API: API from third party (external) library or service.

\ Legalese: Licenses, copyright notes, or patents.

\ Product name: The product, project, or app name.

\

\ The third column in Table 1 presents the affected artifacts for each unethical behavior. Each number in the column denotes the number of GitHub issues with a certain type of artifact (e.g., “19 Projects” means that there are 19 issues where S2 is affected by projects). Theoretically, one issue might discuss multiple artifacts but we found that each issue only discusses one artifact because (1) developers prefer discussing ethical concerns for one type of artifact in one issue, and (2) some categories are hierarchical (e.g., “project” includes multiple types of artifacts). Overall, Table 1 shows that source code is still the most common type of artifacts for unethical behavior (i.e., it affects eight types of unethical behavior).

\

4 METHODOLOGY

Our study shows that diverse types of unethical behavior exist in OSS projects, and they usually involve diverse types of software artifacts. The diversity and the complexity of the rules governing the ethics-related activities in GitHub motivate the need for a modeling approach that can abstract this complexity and facilitate its automatic detection. In Section 4.1, we describe how we model unethical behavior using SWRL rules. Then, we explain the architecture of Etor that uses SWRL rules for automatic detection in Section 4.2.

\ Table 2: GitHub attributes and types for auto-detection

4.1 Modeling via SWRL rules

We propose using SWRL rules to represent unethical behavior in an OSS project together with the publicly available data in GitHub. SWRL rules allow us to model affected software artifacts as hierarchies of classes and properties, capturing the relationships between affected software artifacts and stakeholders. Table 2 shows GitHub attributes used in our modeling. The columns under “Attribute”, and “Type” explain each attribute and its type. We model each OSS project as GHRepository. By referring to the GitHub Repositories API [19], we selected 11 data properties (e.g., latestRelease and licenseFile) that belong to a GHRepository by excluding properties that are irrelevant for unethical behavior (e.g., avatar_url that points to the icon for a repository). Apart from GHRepository, we introduce six classes to model data properties of a repository: (1) GHUser, (2) GHCommit, (3) GHContent, (4) GHIssue (5) GHPullRequest), (6) GHRelease. While GitHub users (GHUser) usually play different roles in OSS projects, we only model: (1) contributors (users who are official contributors of a repository) and (2) issue owners (users who report an issue). For modeling GHIssue, we reuse the same convention in GitHub by modeling a PR (GHPullRequest) as a subclass of GHIssue (i.e., GitHub Issue Search API will search for issues and PRs, essentially treating a PR as a type of GitHub issue). Figure 3 shows the OWL ontology for our model where GHRepository is the main class, and the arrows denote the relationships between the classes. Specifically, GHIssue − GHPullRequest represents the subclass relations, whereas other arrows denote hasA relations (e.g., GHIssue − GHUser means that each issue has a user who reports the issue).

\

:::info Authors:

(1) Hsu Myat Win, Southern University of Science and Technology, China (11960003@mail.sustech.edu.cn);

(2) Haibo Wang, Southern University of Science and Technology, China (wanghb2020@mail.sustech.edu.cn);

(3) Shin Hwei Tan, a corresponding author from Southern University of Science and Technology, China (tansh3@sustech.edu.cn).

:::


:::info This paper is available on arxiv under CC BY 4.0 DEED license.

:::

\

Market Opportunity
OpenLedger Logo
OpenLedger Price(OPEN)
$0.17312
$0.17312$0.17312
-8.14%
USD
OpenLedger (OPEN) Live Price Chart
Disclaimer: The articles reposted on this site are sourced from public platforms and are provided for informational purposes only. They do not necessarily reflect the views of MEXC. All rights remain with the original authors. If you believe any content infringes on third-party rights, please contact service@support.mexc.com for removal. MEXC makes no guarantees regarding the accuracy, completeness, or timeliness of the content and is not responsible for any actions taken based on the information provided. The content does not constitute financial, legal, or other professional advice, nor should it be considered a recommendation or endorsement by MEXC.

You May Also Like

Let insiders trade – Blockworks

Let insiders trade – Blockworks

The post Let insiders trade – Blockworks appeared on BitcoinEthereumNews.com. This is a segment from The Breakdown newsletter. To read more editions, subscribe ​​“The most valuable commodity I know of is information.” — Gordon Gekko, Wall Street Ten months ago, FBI agents raided Shayne Coplan’s Manhattan apartment, ostensibly in search of evidence that the prediction market he founded, Polymarket, had illegally allowed US residents to place bets on the US election. Two weeks ago, the CFTC gave Polymarket the green light to allow those very same US residents to place bets on whatever they like. This is quite the turn of events — and it’s not just about elections or politics. With its US government seal of approval in hand, Polymarket is reportedly raising capital at a valuation of $9 billion — a reflection of the growing belief that prediction markets will be used for much more than betting on elections once every four years. Instead, proponents say prediction markets can provide a real service to the world by providing it with better information about nearly everything. I think they might, too — but only if insiders are free to participate. Yesterday, for example, Polymarket announced new betting markets on company earnings reports, with a promise that it would improve the information that investors have to work with.  Instead of waiting three months to find out how a company is faring, investors could simply watch the odds on Polymarket.  If the probability of an earnings beat is rising, for example, investors would know at a glance that things are going well. But that will only happen if enough of the people betting actually know how things are going. Relying on the wisdom of crowds to magically discern how a business is doing won’t add much incremental knowledge to the world; everyone’s guesses are unlikely to average out to the truth. If…
Share
BitcoinEthereumNews2025/09/18 05:16
What We Know So Far About Reported Tensions at Bitmain

What We Know So Far About Reported Tensions at Bitmain

The post What We Know So Far About Reported Tensions at Bitmain appeared on BitcoinEthereumNews.com. Posts on X (Twitter) suggest that Bitmain co-founder Micree
Share
BitcoinEthereumNews2025/12/22 05:07
Galaxy Digital’s head of research explains why bitcoin’s outlook is so uncertain in 2026

Galaxy Digital’s head of research explains why bitcoin’s outlook is so uncertain in 2026

Markets Share Share this article
Copy linkX (Twitter)LinkedInFacebookEmail
Galaxy Digital’s head of research explains w
Share
Coindesk2025/12/22 05:25