
Every time I see some post where commits are taken as a contribution metric, I remember when, after working on it for months, I merged Redis Cluster into Redis as a single commit, and saw the pale green square appear for that day in my GitHub contributions chart. I've now been working 10 hours a day on Redis Vector Sets for 1.5 months, and that will also land as a single commit. It's not simple to do much better than that, as a metric: different developers have different habits. For me, a stream of early-stage design changes just pollutes the contribution history. Starting from a solid and advanced beta, then yes, history is great to have.


I think the authors agree with you. They tried to look at lines of code added / deleted (e.g. "they consistently made over 95% of the lines added to and deleted from Elasticsearch"), although the language in the article flip-flops between that and just saying 'commits', so it's not clear what they were actually looking at for the write-up. In the scraping code / dataset linked at the start of the article, they log `commits_list = [commit_date, dels, adds, oid, author]` per commit.
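For what it's worth, that kind of per-commit tuple is easy to reproduce from plain `git log --numstat`. Here is a rough Python sketch of a per-author adds/dels tally in that spirit; the function name, parsing, and aggregation choices are mine, not the article's actual code:

    import subprocess
    from collections import defaultdict

    def lines_by_author(repo_path="."):
        """Tally lines added/deleted per author from git log --numstat."""
        out = subprocess.run(
            ["git", "-C", repo_path, "log",
             "--numstat", "--pretty=format:@%H|%an|%ad", "--date=short"],
            capture_output=True, text=True, check=True).stdout
        totals = defaultdict(lambda: [0, 0])   # author -> [adds, dels]
        author = None
        for line in out.splitlines():
            if line.startswith("@"):            # commit header: oid|author|date
                _oid, author, _date = line[1:].split("|", 2)
            elif line.strip() and author:       # numstat row: adds<TAB>dels<TAB>path
                adds, dels, _path = line.split("\t", 2)
                if adds != "-":                 # "-" marks binary files
                    totals[author][0] += int(adds)
                    totals[author][1] += int(dels)
        return totals

    print(lines_by_author("."))

Even this simple tally illustrates antirez's point: a squash-merged feature branch shows up as one huge entry on one day, however many months of work sit behind it.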

This is also just a blog summary of a preliminary study:

> "This is the first step in a much larger research project underway [...] we’re working toward including more repositories and additional metrics to better understand the project health dynamics within these projects."

Project activity will remain inherently fuzzy. Just about everybody who programs extensively has, at some point, spent a couple of days changing a line or two of code. No metric can capture that unless we are all journaling and publishing our day-to-day activity.

Nonetheless, even if it isn't simple, we can do better than raw commit counts. If you review almost anything online, there is a global score and then 3-5 categories with subscores; surely the same should be true here. Off the top of my head: freshness of the LOC changes, average freshness of the overall codebase as a percentage, issues satisfactorily resolved (not just closed because they were blown off, which should count as a negative), and merged pull requests.
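Something like this toy sketch is what I mean by a global score with subscores; the category names, weights, and 0-to-1 scaling are just assumptions to make the idea concrete, not a proposal for the actual numbers:

    # Hypothetical health categories and weights; all values are illustrative.
    HEALTH_WEIGHTS = {
        "loc_freshness": 0.25,       # how recent the changed lines are
        "codebase_freshness": 0.20,  # share of the codebase touched recently
        "issues_resolved": 0.25,     # satisfactorily resolved, not merely closed
        "issues_blown_off": -0.10,   # closed without resolution counts against
        "merged_prs": 0.20,
    }

    def health_score(subscores):
        """Combine 0..1 subscores into one weighted global score."""
        return sum(HEALTH_WEIGHTS[k] * subscores.get(k, 0.0)
                   for k in HEALTH_WEIGHTS)

    # Example: fresh changes, but many issues closed without resolution.
    print(health_score({
        "loc_freshness": 0.9, "codebase_freshness": 0.6,
        "issues_resolved": 0.5, "issues_blown_off": 0.8, "merged_prs": 0.7,
    }))

The point isn't the exact formula, just that the global number stays explainable in terms of the subscores.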

What would be your top 5 categories to evaluate the "health" of a code base, admitting that any evaluation will remain a very fuzzy approximation at best?


The data about Redis can only be true if they mean "commits". That's why I believe they looked at GitHub contribution numbers.


In that case they did not evaluate it with enough care, given they gathered more information than that. Hopefully they correct that as they progress.

I am quite curious as to your take on a few metrics that would help evaluate the health of a code base. It's a dirty job, but we all have to do it every time we look for something new.


I once worked for a smallish (~50 people) company with a huge, unmaintainable legacy codebase.

Said company was bought by a large US company where one of the key metrics for a developer was the number of new lines of code written.

It went downhill from there. 10% of the people were fired when the mothership instituted global job cuts, then people started leaving, then came another round of cuts, then most people left, and finally the company was sold, losing, I think, a fairly hefty part of its valuation.

Eventually the large US company was bought by Oracle, which to my eye suggests Oracle is like MS: they have a single product that is a massive cash cow, and for the rest they serially make terrible decisions (à la Nokia et al.).



