Can we unfuck internet discoverability?

Posted: 2022-02-04

I’ve been thinking a lot about how difficult it has become to discover quality content on the Internet, not because it isn’t there, but because the signal-to-noise ratio is really bad, and most venues of discovery don’t seem to be able to handle it.

Recommendation algorithms seem to work almost too well, to the point where they mostly show you things you already like and rarely anything new that you might like. It’s an absolute tragedy both for small websites and for their potential audience.

Certainly discovery on the Internet could be made better.

I’ve tried discussing this problem in various venues, but mostly what you get is long tirades about how bad Google or Reddit is. Let’s not even dwell on what other people are doing that isn’t working; instead, let’s build something that does work. If I walk into a library and ask for 20 good books to read, then I will get 20 books and most of them will be good. Why couldn’t that be a thing with websites as well?

It’s why I built my search engine, and it’s what I’ve tried to mitigate with exploration mode. Neither is perfect, but both seem close. Dealing with the search engine database I have, and doing various experiments, I think it should be possible to build something genuinely useful in this space. I’m not at all sure how, but I think there are entirely new things that could be tried.

If you too want to work on this, please let me know. Maybe we can collaborate somehow. I’m trying to gather some like-minded people. I’m sitting on a lot of data from my search engine, and have at least some hardware to spare.

For inspiration, I’m making available a fun and useful dataset: a link database. It’s available under CC BY-NC-SA 4.0. To keep it manageable, it’s on a first domain level, making it 13 million entries. You can download it below. This is real production data. Build something cool, make graphviz diagrams, whatever. Have fun!
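As a starting point for the graphviz idea, here is a minimal Python sketch. It assumes the dump is a plain-text file with one link per line, written as two whitespace-separated domain names (source, then destination). That layout is an assumption, not the documented format, so adjust the parsing to whatever the file actually contains. The function names `parse_links` and `to_dot` and the sample domains are made up for illustration.

```python
from collections import defaultdict


def parse_links(lines):
    """Build an adjacency map from lines of 'source dest' domain pairs.

    The two-column, whitespace-separated layout is an assumption and
    may not match the real dataset.
    """
    graph = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        source, dest = parts
        if source != dest:  # ignore links from a domain to itself
            graph[source].add(dest)
    return graph


def to_dot(graph, root, max_edges=50):
    """Render the outgoing links of `root` as Graphviz DOT text."""
    out = ["digraph links {"]
    # Sort for stable output and cap the edge count to keep diagrams readable.
    for dest in sorted(graph.get(root, ()))[:max_edges]:
        out.append(f'  "{root}" -> "{dest}";')
    out.append("}")
    return "\n".join(out)


if __name__ == "__main__":
    # Made-up sample data standing in for the real file.
    sample = [
        "example.com example.org",
        "example.com example.net",
        "example.org example.com",
    ]
    print(to_dot(parse_links(sample), "example.com"))
```

The printed text can be saved to a file and rendered with Graphviz, for example `dot -Tsvg links.dot -o links.svg`. With 13 million entries, drawing the whole graph is unlikely to be readable, which is why the sketch only draws the outgoing links of a single domain.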
