I got fed up with GitHub Trending, so I built my own ranker. Here's what I learned.
We run a tech newsletter, and I used to spend my whole Sunday trying to find repos worth writing about. Eventually I got annoyed enough to build something. It works way better, but the real lesson was how much junk you have to filter out before the ranking even matters.

Stuff I ended up filtering:

Dead repos with huge star counts. You'd be shocked how many 15k-star repos haven't had a commit in a year. They just sit there forever, showing up in searches. If nothing's been committed in 3 months, I drop it.

Awesome-lists and cheatsheets. People star these as bookmarks and never come back. Not software, not what I want to feature.

Forks with better SEO than the original. This one genuinely made me mad. Someone forks a popular library, rewrites the README with better keywords, and their fork starts ranking above the actual project. Now I always check the fork field first.

One-hit HN wonders. The repo gets 8k stars in two days from one front-page moment, then flatlines forever. Trending loves these. By the time I'd feature one, everyone's already seen it.

Stuff from Google/Microsoft/Meta. OK, this one is opinionated. When a FAANG drops something, it gets 10k stars in a week because of the brand. And even if the project is good, they don't need my newsletter to promote it. I downweight big accounts. This is a curation product, not a coverage product. idk, maybe that's wrong.

Crypto token garbage. Not gonna elaborate. You know what this is.

Boilerplate/starter templates. "nextjs-starter-2024" type stuff. People collect these like Pokémon and never use them. High stars, zero signal.

Mirror repos. Someone re-uploads a popular ML model to their own account and collects stars from people who didn't find the official one.

AI content farms. These are growing fast. The repos are full of LLM-written "guides" with suspicious commit patterns, e.g. 40 commits in one afternoon from a new account, and then silence.

Coordinated star farms. Multiple repositories are created within seconds of each other from new accounts, and all of them get stars at the same rate. Once you see the pattern you cannot unsee it.

The thing that surprised me: I thought pure velocity would solve most of this. It doesn't. Lots of the junk above generates fast velocity too. The curation/filters matter as much as the formula does, maybe more. After all of it, maybe 5-10% of what's on Trending on a given day is stuff I'd actually write about.

This is what I built to solve it: repoinsider.com. All the filters above are baked in. Would love to hear what filters you'd add. I'm sure I'm missing some.
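
For anyone who wants to see the idea in code, here is a minimal Python sketch of how a few of the filters above (forks, the 3-month staleness cutoff, one-hit spikes, downweighting big accounts) could sit in front of a plain velocity score. It's illustrative only, not the code behind repoinsider.com. The `Repo` fields, the owner list, the 0.5 weight, and the 80% burst threshold are all assumptions made up for the example; only the 3-month cutoff comes from the post.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class Repo:
    # Hypothetical record; field names are assumptions, not a real API schema.
    full_name: str
    owner: str
    is_fork: bool
    last_commit: datetime
    daily_stars: List[int]  # stars gained per day, oldest first


STALE_AFTER = timedelta(days=90)               # "nothing committed in 3 months"
BIG_OWNERS = {"google", "microsoft", "meta"}   # assumed list of big accounts
BIG_OWNER_WEIGHT = 0.5                         # assumed downweight factor
BURST_SHARE = 0.8                              # assumed: share of stars in best 2 days


def is_stale(repo: Repo, now: datetime) -> bool:
    return now - repo.last_commit > STALE_AFTER


def is_one_hit_wonder(repo: Repo) -> bool:
    """True if almost all stars arrived in a single two-day window."""
    days = repo.daily_stars
    total = sum(days)
    if total == 0 or len(days) < 7:
        return False  # not enough history to judge
    best_two = max(days[i] + days[i + 1] for i in range(len(days) - 1))
    return best_two / total >= BURST_SHARE


def score(repo: Repo, now: datetime) -> Optional[float]:
    """Return None if the repo is filtered out, else a weighted 7-day velocity."""
    if repo.is_fork or is_stale(repo, now) or is_one_hit_wonder(repo):
        return None
    velocity = sum(repo.daily_stars[-7:]) / 7  # average stars/day, last week
    if repo.owner.lower() in BIG_OWNERS:
        velocity *= BIG_OWNER_WEIGHT
    return velocity


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    repo = Repo("someone/tool", "someone", False,
                now - timedelta(days=2), [10] * 14)
    print(score(repo, now))
```

The point of the structure is that the hard filters run before any velocity math, which matches the lesson above: a spike repo or a stale repo never gets a score at all, so tuning the formula can't accidentally let it back in.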
