Information relating to the Marginalia Search project.
Marginalia Search is an independent DIY search engine that focuses on non-commercial content, and attempts to show you sites you perhaps weren’t aware of rather than the sort of sites you probably already knew existed.
You may also be interested in the 🏷️ search-engine tag.
Documents

I’ve been working on getting anchor tag keywords into the search engine, basically using link texts to complement the keywords on a webpage.
The problem I’m attempting to address is that many websites don’t really describe themselves particularly well. As Steve Ballmer’s stage performance once illustrated, merely repeating a word doesn’t on its own make what you’re saying relevant to the term.
Another good example of how it falls short is PuTTY’s website, which will be used as a pilot case to improve.

So a bit of an update on what I’ve been working on. This will be adapted into release notes in a while, but I haven’t quite wrapped a bow on the change set yet.
Still, it has certainly been a few weeks. Didn’t quite land how busy I’ve been until I sat down to draft this post. Them’s some changes, and I’m skipping a few to keep this meandering post at a sane length.

So the search engine is moving to a new server soon, thanks to the generous grant mentioned recently.
If you visit search.marginalia.nu now, you may land on either the old or the new server. It’ll be like this for a while, since I need them both for testing and maintenance-type work.
I’ll also apologize if this post is a bit chaotic. It is a reflection of a very chaotic couple of weeks that, apart from setting up this migration, also involved a very short-notice invitation for a presentation at ossym23.

I’m happy to announce that the generous people at FUTO have granted the project $15,000 with no strings attached to help the search engine out with some more server power.
FUTO is a young Austin, TX-based organization “dedicated to developing, both through in-house engineering and investment, technologies that frustrate centralization and industry consolidation”. It’s one to keep an eye on, I believe their heart is in the right place and they have every possibility of making a real difference.

So… I’ve had the most unreal week of coding. Zero exaggeration, I’ve halved the RAM requirements of the search engine, removed the need to take the system offline during an upgrade, removed hard limits on how many documents can be indexed, and quadrupled soft limits on how many keywords can be in the corpus.
It’s been a long term goal to keep it possible to run and operate the system on low-powered hardware, and so far improvements have been made, to the point where my 32 GB RAM developer machine feels spacey rather than cramped, but this set of changes takes it several notches further.

This is a bit of a “what I’ve been working on” style of post. It’s also a bit of a complement to the release notes of the upcoming release, which should be dropping in a week or so. There’s some spit and polish still missing from these things, but if I don’t write about them now, too much will have been ejected from the cache to make a well-written post about it.

I’m working on Marginalia Search full time.
I left the office for the last time today, and it’s the strangest feeling. I’ve quit jobs, taken time off work, been laid off, but this is different from any of those things. This is deliberate.
There’s a note of relief. I’ve essentially been working two pretty demanding jobs: one for pay and one for passion and the joy of making a difference.

I’ve moved Marginalia’s sources to GitHub. Can’t pick every battle.
The main reason is I’m kind of tired of the amount of spam bots that keep signing up to my Gitea. The juice of self-hosting a public-access git forge, even locked down to prevent arbitrary repo creation, that juice just isn’t worth the squeeze.
This is not without some consideration.
To be blunt, I don’t like GitHub. Their use of dark patterns leaves a real nasty after-taste.

This is a bit of a follow up to the previous post.
The Grand Code Restructuring [ 2023-03-17 ]
Marginalia’s search result quality has, for a long while, been pretty good as long as your search query is a single term, but for multiple search terms it’s been a bit hit-and-miss. Marginalia was never great at this, but the quality of results in this usage pattern has taken a bit of a dive recently due to a re-write of the index last fall.

In general I don’t like to fuss over code, but this is exactly what I’ve been doing in preparation for the NLnet-funded work. I’ve spent the last month restructuring Marginalia’s code base. It’s not completely done, but I’ve made great headway.
Things got the way they got because in general for experimental solo-development projects, I think it makes sense to be fairly tolerant of technical debt.
Since refactoring is something that is extremely difficult to break up into parallel tracks or do in small iterations, the cost of refactoring is effectively multiplied by the number of people that could be working on the code.

No time like the project’s two year anniversary to drop this particular bomb…
Marginalia’s gotten an NLnet grant. This means I’ll be able to work full time on this project for at least a year.
https://nlnet.nl/project/Marginalia/
This grant is essentially the best-case scenario for funding this project. It’ll be able to remain independent, open-source, and non-profit.
I won’t start in earnest for a few months as I’ve got loose ends to tie up before I can devote that sort of time.