Over 3.1 million fake “stars” on GitHub projects used to boost rankings

GitHub has a problem with inauthentic “stars” that artificially inflate the popularity of scam and malware distribution repositories, helping them reach more unsuspecting users.

Stars are similar to “Like” buttons on social media sites, allowing GitHub users to favorite a repository. GitHub uses the stars as part of a global ranking system and to show you related content that it thinks you may like.

“You can star repositories and topics to discover similar projects on GitHub. When you star repositories or topics, GitHub may recommend related content on your personal dashboard,” explains GitHub.

Most starred repository with 408,000 stars

The problem has been documented before. Last summer, Check Point uncovered a malware delivery service named the ‘Stargazers Ghost Network,’ which used an extensive network of inauthentic users starring fake projects to push information-stealing malware.

Non-malicious projects also use fake stars to boost their popularity, increase their reach, and attract legitimate user attention, real stars, and adoption.

A new study conducted by researchers at Socket, Carnegie Mellon University, and North Carolina State University gives us a better idea of the scale of the problem, finding 4.5 million stars on GitHub that are suspected to be fake.

A list of starring services for GitHub
Source: Arxiv.org

Looking for fake stars

The researchers developed and used a tool called ‘StarScout’ to analyze 20TB of data from ‘GHArchive’ to find inauthentic stars.

GHArchive contains metadata of over 6 billion GitHub events from July 2019 to October 2024, covering the actions of 60.5 million users on 310 million repositories, including 610 million stars.
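To give a sense of what this data looks like, here is a minimal sketch of pulling star events out of GH Archive-style records. GH Archive publishes GitHub's public event stream as newline-delimited JSON, and a star appears as an event of type `WatchEvent`. The sample records, account names, and helper functions (`extract_stars`, `stars_per_repo`) are invented for illustration and are not part of StarScout.

```python
import json
from collections import Counter

# Illustrative records in the shape of GH Archive events (one JSON object per line).
# Account and repository names are made up.
SAMPLE_LINES = [
    '{"type": "WatchEvent", "actor": {"login": "alice"}, "repo": {"name": "org/tool"}, "created_at": "2024-07-01T10:00:00Z"}',
    '{"type": "PushEvent", "actor": {"login": "bob"}, "repo": {"name": "org/tool"}, "created_at": "2024-07-01T10:05:00Z"}',
    '{"type": "WatchEvent", "actor": {"login": "carol"}, "repo": {"name": "org/tool"}, "created_at": "2024-07-01T11:00:00Z"}',
]


def extract_stars(lines):
    """Yield (user, repo, timestamp) for every star event in the input lines."""
    for line in lines:
        event = json.loads(line)
        # GitHub records a star as a WatchEvent; every other event type is skipped.
        if event.get("type") == "WatchEvent":
            yield event["actor"]["login"], event["repo"]["name"], event["created_at"]


def stars_per_repo(lines):
    """Count star events per repository."""
    return Counter(repo for _, repo, _ in extract_stars(lines))


star_counts = stars_per_repo(SAMPLE_LINES)
print(star_counts)
```

At the scale the researchers describe, the same filtering would have to run over terabytes of hourly archive files rather than an in-memory list, but the per-event logic stays this simple.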

StarScout looks for three signals: users who show minimal activity on GitHub, such as starring a single repository; accounts with bot-like or temporary activity patterns; and groups of accounts that act in coordination, such as starring the same repositories within a short time.

Their method is based on CopyCatch, an algorithm designed to detect fraudulent patterns in social networks.
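The two main signals can be illustrated with a toy example. This is a simplified sketch and not the StarScout or CopyCatch implementation: it flags users with a single star event, and pairs of users who starred the same repositories close together in time. The thresholds (`max_stars`, `window`, `min_shared`), the function names, and the sample data are all assumptions chosen for illustration, and a real lockstep detector looks for larger coordinated groups rather than pairs.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from itertools import combinations

# (user, repo, timestamp) star records; all names are made up.
STARS = [
    ("bot1", "x/scam", datetime(2024, 7, 1, 10, 0)),
    ("bot2", "x/scam", datetime(2024, 7, 1, 10, 5)),
    ("bot1", "y/fake", datetime(2024, 7, 2, 9, 0)),
    ("bot2", "y/fake", datetime(2024, 7, 2, 9, 10)),
    ("dev", "x/scam", datetime(2024, 3, 1, 12, 0)),
    ("dev", "z/lib", datetime(2024, 5, 1, 12, 0)),
    ("oneoff", "x/scam", datetime(2024, 7, 1, 10, 2)),
]


def low_activity_users(stars, max_stars=1):
    """Users with at most max_stars star events (the only activity this toy data has)."""
    counts = defaultdict(int)
    for user, _, _ in stars:
        counts[user] += 1
    return {user for user, n in counts.items() if n <= max_stars}


def lockstep_pairs(stars, window=timedelta(hours=1), min_shared=2):
    """Pairs of users who starred at least min_shared repos within `window` of each other."""
    by_repo = defaultdict(list)
    for user, repo, ts in stars:
        by_repo[repo].append((user, ts))

    shared = defaultdict(set)  # (user_a, user_b) -> repos starred close together
    for repo, entries in by_repo.items():
        for (u1, t1), (u2, t2) in combinations(entries, 2):
            if u1 != u2 and abs(t1 - t2) <= window:
                shared[tuple(sorted((u1, u2)))].add(repo)
    return {pair for pair, repos in shared.items() if len(repos) >= min_shared}


low_activity = low_activity_users(STARS)
lockstep = lockstep_pairs(STARS)
print(low_activity)
print(lockstep)
```

In this sample, `oneoff` is flagged by the low-activity check, while `bot1` and `bot2` are flagged as a lockstep pair because they starred two repositories within minutes of each other. The ordinary `dev` account is flagged by neither.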

Overview of StarScout data processing
Source: Arxiv.org

4.5 million stars suspected as fakes

After processing the data by applying low activity and lockstep signature algorithms to identify suspicious stars across repositories, the team found 4,530,000 suspected inauthentic stars given by 1,320,000 accounts across 22,915 repositories.

To increase confidence that these stars are really fake, the researchers filtered out potential false positives by only considering repositories that showed a significant anomalous spike of starring activity in a single month and in which suspected fakes made up more than 10% of all stars.
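A filter of this kind could be sketched as follows. The article does not spell out how a "significant anomalous spike" is measured, so the definition used here (the busiest month has at least `spike_factor` times the median monthly star count) is purely an assumption, as are the function names and sample numbers. Only the 10% threshold comes from the study as reported.

```python
from statistics import median


def has_spike(monthly_stars, spike_factor=5.0):
    """True if the busiest month has at least spike_factor times the median month.

    This spike definition is a stand-in; the study's actual criterion may differ.
    """
    counts = list(monthly_stars.values())
    if not counts:
        return False
    baseline = max(median(counts), 1)  # avoid a zero baseline for quiet repos
    return max(counts) >= spike_factor * baseline


def keep_repo(monthly_stars, suspected_fake, min_fake_ratio=0.10, spike_factor=5.0):
    """Keep a repository only if it has a spike and more than 10% suspected fake stars."""
    total = sum(monthly_stars.values())
    if total == 0:
        return False
    fake_ratio = suspected_fake / total
    return has_spike(monthly_stars, spike_factor) and fake_ratio > min_fake_ratio


# Hypothetical repositories: monthly star counts plus a suspected-fake total.
spiky = {"2024-05": 4, "2024-06": 6, "2024-07": 300}
steady = {"2024-05": 100, "2024-06": 110, "2024-07": 120}

print(keep_repo(spiky, suspected_fake=250))   # spike and high fake share
print(keep_repo(steady, suspected_fake=20))   # no spike
print(keep_repo(spiky, suspected_fake=10))    # spike, but fake share below 10%
```

Requiring both conditions is what makes the filter conservative: a repository that went viral legitimately has a spike but few flagged stars, and a repository with a handful of flagged stars spread over years has no spike.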

This reduced the result to 3,100,000 fake stars given by 278,000 accounts to 15,835 repositories.

Identification of fake patterns like clustering behavior
Source: Arxiv.org

Of those, roughly 91% of the repositories and 62% of the suspected inauthentic accounts were deleted as of October 2024, which supports the accuracy of the StarScout tool.

The study also shows that fake star activity surged in 2024: in July 2024, approximately 15.8% of repositories with over 50 stars were involved in these malicious campaigns.

The researchers reported the repositories and accounts StarScout identified as inauthentic in July 2024, and GitHub removed them all. However, the researchers are still evaluating and reporting additional clusters found in November 2024.

Word clouds of fake starred repositories (deleted and present)
Source: Arxiv.org

Fake stars have several implications for GitHub and its users, but above all, the problem erodes trust in the platform and the software projects hosted on it.

Users should look past stars, evaluate the repository activity and quality, read the documentation, examine the content and contributions, and review the code if possible.

Deceptive GitHub repositories are widespread, and the platform has even been exploited in state-sponsored operations, so exercise caution when downloading software from it.

BleepingComputer has contacted GitHub to learn more about how the platform actively fights the fake stars problem, but we are still waiting for their response.

