

I think code golf is a great dataset for this kind of analysis precisely because it is artificial and people are paying attention to the number of characters used. LeetCode solutions might be a better option, though.
In real-world projects there are too many confounding factors. People aren’t implementing servers in brainfuck or websites in C. Even a rewrite of a project into another language usually ends up with more or fewer features than the original. So it’s an apples-to-oranges comparison.
You missed a /s marker