<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Weblog on marginalia.nu</title><link>https://www.marginalia.nu/log/</link><description>Recent content in Weblog on marginalia.nu</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Mon, 30 Mar 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://www.marginalia.nu/log/index.xml" rel="self" type="application/rss+xml"/><item><title>An NSFW filter for Marginalia Search</title><link>https://www.marginalia.nu/log/a_134_nsfw/</link><pubDate>Mon, 30 Mar 2026 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/a_134_nsfw/</guid><description>&lt;p&gt;&amp;hellip; optional, that is.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve been working on an NSFW filter for Marginalia Search,
as that is something some people have asked for,
primarily API consumers.&lt;/p&gt;
&lt;p&gt;The search engine has had some domain based filtering for a while,
based on the UT1 lists, but that isn&amp;rsquo;t a very comprehensive approach.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ll land on a single hidden layer neural network approach,
implemented from scratch, but before landing on that,
many other things were tried along the way.&lt;/p&gt;</description></item><item><title>AI makes you boring</title><link>https://www.marginalia.nu/log/a_132_ai_bores/</link><pubDate>Thu, 19 Feb 2026 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/a_132_ai_bores/</guid><description>&lt;p&gt;This post is an elaboration on a comment I made on Hacker News recently,
on &lt;a href="https://www.arthurcnops.blog/death-of-show-hn/"&gt;a blog post&lt;/a&gt; that showed an increase in volume and decline in quality among the &amp;ldquo;Show HN&amp;rdquo; submissions.&lt;/p&gt;
&lt;blockquote&gt;
I don't actually mind AI-aided development, a tool is a tool and should be used if you find it useful, but I think the vibe coded Show HN projects are overall pretty boring. They generally don't have a lot of work put into them, and as a result, the author (pilot?) hasn't generally thought too much about the problem space, and so there isn't really much of a discussion to be had.
&lt;p&gt;The cool part about pre-AI show HN is you got to talk to someone who had thought about a problem for way longer than you had. It was a real opportunity to learn something new, to get an entirely different perspective.&lt;/p&gt;</description></item><item><title>Index Compression, Query Execution Improvements</title><link>https://www.marginalia.nu/log/a_131_index_compression/</link><pubDate>Fri, 13 Feb 2026 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/a_131_index_compression/</guid><description>&lt;p&gt;The Marginalia Search index has recently seen some design tweaks
to make it perform better, primarily the introduction of postings list compression.&lt;/p&gt;
&lt;p&gt;Last year, the index was partially &lt;a href="https://www.marginalia.nu/log/a_123_index_io/"&gt;re-implemented with SSDs in mind&lt;/a&gt;.
This was largely a success, but left some lingering issues with tail latencies that sometimes weren&amp;rsquo;t what they needed to be.&lt;/p&gt;
&lt;p&gt;To ensure predictable execution times,
the query execution is provided a timeout value,
after which it will wrap up and return the best results it&amp;rsquo;s found.
Query execution was so flaky that the &lt;em&gt;actual&lt;/em&gt; timeout used when terminating the execution used to be something like 50ms lower than the provided value.
This is obviously not a fantastic state of affairs.&lt;/p&gt;</description></item><item><title>Trust in Ranking</title><link>https://www.marginalia.nu/log/a_130_trust_in_ranking/</link><pubDate>Sat, 31 Jan 2026 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/a_130_trust_in_ranking/</guid><description>&lt;p&gt;The Marginalia Search default ranking algorithm recently saw a fairly radical improvement, due to a new domain trust system that drastically reduces the number of content farm results. As long as there are human results, it usually finds them across all the usual test queries.&lt;/p&gt;
&lt;p&gt;Recently fixing a few bugs that made the search engine work more correctly had the unexpected and undesired side-effect of also making it surface more search engine spam and content farm-type results.&lt;/p&gt;</description></item><item><title>You should probably tell your audience what your blog posts are about as early as possible</title><link>https://www.marginalia.nu/log/a_129_finding_audience/</link><pubDate>Wed, 28 Jan 2026 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/a_129_finding_audience/</guid><description>&lt;p&gt;Being clear about what your blog posts are about lets people who are interested in what you have to say find your writing more easily. The more paragraphs you spend getting to the point, the bigger the odds they&amp;rsquo;ll lose patience and click on something else before you&amp;rsquo;ve presented your thesis.&lt;/p&gt;
&lt;p&gt;When publishing articles online, no matter how obscure the subject matter, there are almost always some people who will be into what you have to say. The main thing that keeps them from finding your writing is that they need to discover that it exists. Discovery isn&amp;rsquo;t made a lick easier by vague titles and long rambling introductions.&lt;/p&gt;</description></item><item><title>New Search Filtering in Web and API</title><link>https://www.marginalia.nu/log/a_127_index_filtering/</link><pubDate>Mon, 08 Dec 2025 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/a_127_index_filtering/</guid><description>&lt;p&gt;The search engine recently exposed a fair number of new tools for custom filtering to the API consumers and users of the new UI.&lt;/p&gt;
&lt;p&gt;This was originally going to be an incredibly chaotic update, both announcing the new features and doing a technical walkthrough of the changes, but that ambition turned out a bit &lt;em&gt;too&lt;/em&gt; chaotic, so let&amp;rsquo;s split them up and focus on the feature announcement bit today.&lt;/p&gt;
&lt;h2 id="new-search-filtering-gui"&gt;New Search Filtering GUI&lt;/h2&gt;
&lt;p&gt;It&amp;rsquo;s now possible to define a custom filter in the GUI, on the &lt;code&gt;marginalia-search.com&lt;/code&gt; version of the website!&lt;/p&gt;</description></item><item><title>Language Support for Marginalia Search</title><link>https://www.marginalia.nu/log/a_126_multilingual/</link><pubDate>Mon, 06 Oct 2025 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/a_126_multilingual/</guid><description>&lt;p&gt;One of the big ambitions for the search engine this year has been to enable searching in more languages than English, and a pilot project for this has just been completed, allowing experimental support for German, French and Swedish.&lt;/p&gt;
&lt;p&gt;These changes are now live for testing, but with an extremely small corpus of documents.&lt;/p&gt;
&lt;p&gt;As the search engine has been up to this point built with English in mind, some anglo-centric assumptions made it into its code. A lot of the research on search engines generally seems to embed similar assumptions.&lt;/p&gt;</description></item><item><title>The CoPilot productivity paradox</title><link>https://www.marginalia.nu/log/a_125_ai_assistants/</link><pubDate>Sat, 06 Sep 2025 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/a_125_ai_assistants/</guid><description>&lt;p&gt;I&amp;rsquo;ve been using the CoPilot plugin for IntelliJ on and off for the last few years,
and while initially pretty enthusiastic,
I&amp;rsquo;ve come to first disable it and then delete it altogether along with JetBrains&amp;rsquo; local AI-completions,
and generally felt this has been an improvement in productivity and a reduction of frustration.&lt;/p&gt;
&lt;p&gt;CoPilot is pretty good at taking things that are already pretty fast,
such as monotonous code transformations like mapping an object to a SQL statement,
and then making that even faster.&lt;/p&gt;</description></item><item><title>Snark, Ironic Detachment, Authenticity</title><link>https://www.marginalia.nu/log/a_124_snark_and_insincerity/</link><pubDate>Thu, 28 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/a_124_snark_and_insincerity/</guid><description>&lt;p&gt;How you engage with the world changes how you experience
the world, and how the world experiences you.&lt;/p&gt;
&lt;p&gt;A snarky and cynical approach, by its default assumption that
things are shit, or if they are not yet shit will inevitably turn to;
such an approach will give your world a malodorous brownish tint.&lt;/p&gt;
&lt;p&gt;Granted, snark gives you plausible deniability,
a motte-and-bailey that protects you from direct criticism,
encountering backlash you can always backpedal and say it
was just a joke that you accidentally took a bit too far.&lt;/p&gt;</description></item><item><title>Faster Index I/O with NVMe SSDs</title><link>https://www.marginalia.nu/log/a_123_index_io/</link><pubDate>Sat, 16 Aug 2025 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/a_123_index_io/</guid><description>&lt;p&gt;The Marginalia Search index has been partially rewritten to perform much better, using new data structures designed to make better use of modern hardware. This post will cover the new design, and will also touch upon some of the unexpected and unintuitive performance characteristics of NVMe SSDs when it comes to read sizes.&lt;/p&gt;
&lt;p&gt;The index is already fairly large, but can sometimes feel smaller than it is, and paradoxically, query performance is a big part of why. If each query has a budget of 100-250ms, a design that finds and ranks results faster in that time period will produce better search results. There are other limitations as well, query understanding is still somewhat limited, where only minor changes to a query can unearth dozens of new related results.&lt;/p&gt;</description></item><item><title>Finding Dead Websites</title><link>https://www.marginalia.nu/log/a_122_dead_websites/</link><pubDate>Tue, 17 Jun 2025 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/a_122_dead_websites/</guid><description>&lt;p&gt;As some of the work planned for Marginalia Search this year has been progressing a bit faster than anticipated, there was time to implement an unplanned change.&lt;/p&gt;
&lt;p&gt;This post details the implementation of a system for detecting when servers are online, to avoid serving dead links and improve data quality, and for detecting when websites have significant changes including ownership transfers and parking.&lt;/p&gt;
&lt;h1 id="table-of-contents"&gt;Table Of Contents&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.marginalia.nu/log/a_122_dead_websites/#rationale"&gt;Feature Rationale&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.marginalia.nu/log/a_122_dead_websites/#repr"&gt;Data Representation&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.marginalia.nu/log/a_122_dead_websites/#livedata"&gt;Live Data&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.marginalia.nu/log/a_122_dead_websites/#eventdata"&gt;Event Data&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.marginalia.nu/log/a_122_dead_websites/#details"&gt;Change Detection Details&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.marginalia.nu/log/a_122_dead_websites/#availability"&gt;Availability Detection&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.marginalia.nu/log/a_122_dead_websites/#ownership"&gt;Ownership Changes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.marginalia.nu/log/a_122_dead_websites/#dns"&gt;DNS&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.marginalia.nu/log/a_122_dead_websites/#hurdles"&gt;Implementation Hurdles&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.marginalia.nu/log/a_122_dead_websites/#scheduling"&gt;Scheduling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.marginalia.nu/log/a_122_dead_websites/#certvalid"&gt;Certificate Validation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.marginalia.nu/log/a_122_dead_websites/#conclusion"&gt;Conclusions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a name="rationale"&gt;&lt;/a&gt;&lt;/p&gt;</description></item><item><title>Profiling Websites</title><link>https://www.marginalia.nu/log/a_121_profiling_websites/</link><pubDate>Thu, 29 May 2025 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/a_121_profiling_websites/</guid><description>&lt;p&gt;The most recent change to the search engine is a system that profiles websites based on their rendered DOM. The goal is identifying advertisements, trackers, nuisance popovers, and similar elements.&lt;/p&gt;
&lt;p&gt;The search engine already tries to do this, but isn&amp;rsquo;t very good at it because it&amp;rsquo;s only looking at static code.&lt;/p&gt;
&lt;p&gt;It turns out to be somewhat difficult to determine what a website that has non-trivial JavaScript will look like based on its source code alone, as this would require us to, among other things, solve the halting problem.&lt;/p&gt;</description></item><item><title>A 2030 morning routine</title><link>https://www.marginalia.nu/log/a_120_morning_routine_2030/</link><pubDate>Fri, 23 May 2025 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/a_120_morning_routine_2030/</guid><description>&lt;p&gt;You wake at 05:30 in the morning, feeling somewhat groggy.&lt;/p&gt;
&lt;p&gt;Instead of the alarm clock ringing like it normally does, a cheerful hologram appears: &amp;ldquo;Hi! I&amp;rsquo;m Kyle, your new alarm clock assistant!&amp;rdquo; You get dressed as Kyle explains all of the fantastic things he is capable of.&lt;/p&gt;
&lt;p&gt;You head over to the coffee machine. &amp;ldquo;Hey there! I&amp;rsquo;m Evan! Are you ready for AI in your coffee? But first - tell me about yourself!&amp;rdquo;. You ignore Evan&amp;rsquo;s monologue and close your eyes as the synthetic coffee replacement is brewing. Real coffee costs more than your coffee maker nowadays, so it has to suffice.&lt;/p&gt;</description></item><item><title>PDF to Text, a challenging problem</title><link>https://www.marginalia.nu/log/a_119_pdf/</link><pubDate>Tue, 13 May 2025 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/a_119_pdf/</guid><description>&lt;p&gt;The search engine has recently gained the ability to index the PDF file format. The change will deploy over a few months.&lt;/p&gt;
&lt;p&gt;Extracting text information from PDFs is a significantly bigger challenge than it might seem.
The crux of the problem is that the file format isn&amp;rsquo;t a text format at all, but a graphical format.&lt;/p&gt;
&lt;p&gt;It doesn&amp;rsquo;t have text in the way you might think of it, but more of a mapping of glyphs to coordinates on &amp;ldquo;paper&amp;rdquo;. These
glyphs may be rotated, overlap, and appear out of order, with very little semantic information
attached to them.&lt;/p&gt;</description></item><item><title>Debugging A Crawler Stall</title><link>https://www.marginalia.nu/log/a_118_crawler_stall/</link><pubDate>Tue, 22 Apr 2025 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/a_118_crawler_stall/</guid><description>&lt;p&gt;Some time ago, I migrated the crawler off the okhttp library, to use
Java&amp;rsquo;s builtin HTTP client. This seemed like a good idea at the time,
but has led to a fair number of headaches.&lt;/p&gt;
&lt;p&gt;Java&amp;rsquo;s HttpClient has one damning flaw, and that is that it doesn&amp;rsquo;t support socket timeouts.&lt;/p&gt;
&lt;p&gt;Its only supported timeout values are time to connect, and time until first byte of the response. This means the client can get stuck on a read call if a server stops responding, potentially for a very long time!&lt;/p&gt;</description></item><item><title>Crawl Order and Disorder</title><link>https://www.marginalia.nu/log/a_117_crawl_order/</link><pubDate>Thu, 27 Mar 2025 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/a_117_crawl_order/</guid><description>&lt;p&gt;A problem the search engine&amp;rsquo;s crawler has struggled with for some time is that it takes a fairly long time to finish up, usually spending several days wrapping up the final few domains.&lt;/p&gt;
&lt;p&gt;This has become more pressing recently: the migration to slop crawl data has dropped the memory requirements of the crawler by something like 80%, so I&amp;rsquo;ve been able to increase the number of crawling tasks. This has led to a bizarre case where 99.9% of the crawling is done in 4 days, and the remaining 0.1% takes a week.&lt;/p&gt;</description></item><item><title>Marginalia Search receives second nlnet grant</title><link>https://www.marginalia.nu/log/a_116_grant_2.0/</link><pubDate>Tue, 25 Mar 2025 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/a_116_grant_2.0/</guid><description>&lt;p&gt;I&amp;rsquo;m happy and grateful to announce that the Marginalia Search
project has been accepted for a second &lt;a href="https://nlnet.nl/"&gt;nlnet&lt;/a&gt; grant.&lt;/p&gt;
&lt;p&gt;All the details are not yet finalized, but tentatively the grant will go toward addressing most of the items in the project
&lt;a href="https://github.com/MarginaliaSearch/MarginaliaSearch/blob/master/ROADMAP.md"&gt;roadmap for 2025&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve already been working full time on the project since &lt;a href="https://www.marginalia.nu/log/83_full_time/"&gt;summer 2023&lt;/a&gt;, and this grant secures additional development time, and extends the runway to a comfortable degree.&lt;/p&gt;
&lt;p&gt;Will post more details as they are finalized.&lt;/p&gt;</description></item><item><title>Improved ways to operate a rude crawler</title><link>https://www.marginalia.nu/log/a_115_rude_crawler/</link><pubDate>Sat, 22 Mar 2025 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/a_115_rude_crawler/</guid><description>&lt;p&gt;&lt;em&gt;This text is satirical in nature.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Tech news is abuzz with rude AI crawlers that forge their user-agent
and ignore &lt;code&gt;robots.txt&lt;/code&gt;. In my opinion, if this is all the AI startups can
muster, they&amp;rsquo;re losing their touch. &lt;code&gt;wget&lt;/code&gt; can do this. You need to up your
game, get that crawler really rolling coal. Flagrant disregard for externalities
is an important signal to the investors that your AI startup is the one.&lt;/p&gt;</description></item><item><title>Marginalia Search: 4 Years</title><link>https://www.marginalia.nu/log/a_114_4_years/</link><pubDate>Mon, 03 Mar 2025 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/a_114_4_years/</guid><description>&lt;p&gt;This update is a few days late, the canonical birth date of the project is Feb 26.&lt;/p&gt;
&lt;p&gt;It has been another year of Marginalia Search. The project is still ongoing, still my full time job, although the project is entering a somewhat more mature phase of development, most of the big pieces are in place and do a decent job at what they do.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://github.com/MarginaliaSearch/MarginaliaSearch/blob/master/ROADMAP.md"&gt;roadmap for the project&lt;/a&gt; is available on GitHub.&lt;/p&gt;</description></item><item><title>RSS Feeds and Real Time Crawling</title><link>https://www.marginalia.nu/log/a_113_rtc/</link><pubDate>Thu, 26 Dec 2024 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/a_113_rtc/</guid><description>&lt;p&gt;A while back an update went live that, with some caveats, changes the time it takes for an update on a website to be reflected in the search engine index from up to 2 months to 1-2 days. The condition is that the website has an RSS or Atom feed.&lt;/p&gt;
&lt;p&gt;The big crawl job takes about two months, and is run partition by partition, meaning there&amp;rsquo;s typically a slice of the index that is two months stale at any given point in time. To help compensate for this, a new crawler and index partition has been added that focuses on recently updated content.&lt;/p&gt;</description></item><item><title>Notes on binary soup</title><link>https://www.marginalia.nu/log/a_112_slop_ideas/index.md/</link><pubDate>Tue, 05 Nov 2024 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/a_112_slop_ideas/index.md/</guid><description>&lt;p&gt;I recently put together a small library called &lt;a href="https://github.com/MarginaliaSearch/SlopData"&gt;Slop&lt;/a&gt;, for intermediate on-disk data representation for the search engine, replacing a few ad-hoc formats I had in place before.&lt;/p&gt;
&lt;p&gt;This post isn&amp;rsquo;t so much an attempt to convince anyone else to use this library, as it makes trade-offs catering to a fairly niche use case, but to explore some of its design ideas, as it all came together very nicely, in the hopes that other libraries can draw ideas from it.&lt;/p&gt;</description></item><item><title>Phrase Matching in Marginalia Search</title><link>https://www.marginalia.nu/log/a_111_phrase_matching/</link><pubDate>Mon, 30 Sep 2024 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/a_111_phrase_matching/</guid><description>&lt;p&gt;Marginalia Search now properly supports phrase matching. This not only permits a more robust implementation of quoted search queries, but also helps promote results where the search terms occur in the document exactly in the same order as they do in the query.&lt;/p&gt;
&lt;p&gt;This is a write-up about implementing this change. This is going to be a relatively long post, as it represents about 4 months of work.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m also happy and grateful to announce that the nlnet people reached out after the run of &lt;a href="../a_107_nlnext"&gt;the grant&lt;/a&gt; was over and asked me if I had more work in the pipe, and agreed to fund this change as well!&lt;/p&gt;</description></item><item><title>The sorry state of Java deserialization</title><link>https://www.marginalia.nu/log/a_110_java_io/</link><pubDate>Sun, 22 Sep 2024 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/a_110_java_io/</guid><description>&lt;p&gt;I&amp;rsquo;ve been on a bit of a frustration-driven quest to solve a problem I frequently encounter
working on the search engine, that is, reading data from disk.&lt;/p&gt;
&lt;p&gt;You&amp;rsquo;d think this would be a pretty basic thing, but doing this in a way that is half-way performant is surprisingly hard and requires avoiding basically all the high level tools at your disposal.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s a common sentiment that modern hardware is fast, so this may not matter, but we aren&amp;rsquo;t talking about a 30% performance hit; the question is how many orders of magnitude you&amp;rsquo;re willing to forgo.&lt;/p&gt;</description></item><item><title>Less Coffee, Better Sleep</title><link>https://www.marginalia.nu/log/a_109_sleep2/</link><pubDate>Wed, 31 Jul 2024 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/a_109_sleep2/</guid><description>&lt;p&gt;As an experiment, I&amp;rsquo;ve reduced my coffee-intake to a single cup a day for about a week now. It&amp;rsquo;s made an enormous difference in sleep, mood and energy. I get tired at night, fall asleep quickly, and wake up refreshed.&lt;/p&gt;
&lt;p&gt;As mentioned previously in the context of &lt;a href="https://www.marginalia.nu/log/86-sleep/"&gt;morning sunlight exposure&lt;/a&gt;&amp;mdash;another thing that&amp;rsquo;s aided my sleeping habits, but is somewhat less practical to sustain as it requires fair weather&amp;mdash;I&amp;rsquo;ve always been slow to get going in the morning, active at night, bad at getting to bed at sane hours. Tired when I should be awake, and awake when I should be tired.&lt;/p&gt;</description></item><item><title>One year of solo dev, wrapping up the grant-funded work</title><link>https://www.marginalia.nu/log/a_107_nlnext/</link><pubDate>Tue, 18 Jun 2024 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/a_107_nlnext/</guid><description>&lt;p&gt;&lt;a href="https://www.marginalia.nu/log/83_full_time/"&gt;A year ago&lt;/a&gt; I walked out of the office for the last time. I handed in my corpo laptop, said some good-byes, and since then I have been my own boss.&lt;/p&gt;
&lt;p&gt;This first year has been funded by an &lt;a href="https://nlnet.nl/" rel="external noopener"&gt;NLnet&lt;/a&gt; grant, which I&amp;rsquo;m in the midst of wrapping up. As of now, the work is all done, the final request for payment has been sent.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s a similar last-day-of-school levity to both these events.&lt;/p&gt;</description></item><item><title>Feynman's Garden</title><link>https://www.marginalia.nu/log/a_108_feynman_revisited/</link><pubDate>Sun, 26 May 2024 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/a_108_feynman_revisited/</guid><description>&lt;p&gt;The best description of my problem solving process is the Feynman algorithm,
which is sometimes presented as a joke where the hidden subtext is &amp;ldquo;be smart&amp;rdquo;, but
I disagree. The &amp;ldquo;algorithm&amp;rdquo; is a surprisingly lucid description of how thinking works in the
context of hard problems where the answer can&amp;rsquo;t simply be looked up or trivially
broken down, iterated upon in a bottom-up fashion, or approached with similar methods.&lt;/p&gt;
&lt;p&gt;Feynman&amp;rsquo;s thinking algorithm is described like this:&lt;/p&gt;</description></item><item><title>Experiment in Java native calls</title><link>https://www.marginalia.nu/log/a_106_native_calls/</link><pubDate>Thu, 16 May 2024 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/a_106_native_calls/</guid><description>&lt;p&gt;I&amp;rsquo;ve experimentally replaced some of the Java implementations of quicksort and binary search with calls to C++ code, and saw huge benefits for the sorting code but the same or worse performance for binary search.&lt;/p&gt;
&lt;p&gt;The Marginalia Search engine is mainly written in Java, which is a language that is good at many things, but not particularly pleasant to work with when it comes to low level systems programming.&lt;/p&gt;
&lt;p&gt;Unfortunately, a part of building an internet search engine involves database-adjacent low level programming.&lt;/p&gt;</description></item><item><title>Using DuckDB to seamlessly query a large parquet file over HTTP</title><link>https://www.marginalia.nu/log/a_105_duckdb_parquet/</link><pubDate>Sun, 05 May 2024 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/a_105_duckdb_parquet/</guid><description>&lt;p&gt;A neat property of the parquet file format is that it&amp;rsquo;s designed with block I/O in mind,
so that when you are interested in only parts of the contents of a file, it&amp;rsquo;s possible to
some extent to only read that data. Many tools are aware of this property, and DuckDB
is one of them. Depending on which circles you run in, a lesser known aspect of HTTP
is range requests, where you specify which bytes in a file to be retrieved. It&amp;rsquo;s possible
to combine this trio of properties to read remote parquet files directly in DuckDB.&lt;/p&gt;</description></item><item><title>Query Parsing and Understanding</title><link>https://www.marginalia.nu/log/a_103_query_parsing/</link><pubDate>Wed, 17 Apr 2024 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/a_103_query_parsing/</guid><description>&lt;p&gt;Been working on improving Marginalia Search query parsing and understanding. This is going to be a pretty long update, as it&amp;rsquo;s a few months&amp;rsquo; work.&lt;/p&gt;
&lt;p&gt;Apart from cleaning up the somewhat messy query parsing code, a problem I&amp;rsquo;m trying to address is that the search engine is currently only good at dealing with fairly focused queries. They don&amp;rsquo;t need to be short, but if you try to qualify a search that is too broad by adding more terms, it often doesn&amp;rsquo;t produce anything useful.&lt;/p&gt;</description></item><item><title>Deep Bug</title><link>https://www.marginalia.nu/log/a_104_dep_bug/</link><pubDate>Wed, 10 Apr 2024 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/a_104_dep_bug/</guid><description>&lt;p&gt;The project has been haunted by a mysterious bug since sometime in February. It relates to the code that constructs the index, particularly the code that merges partial indices.&lt;/p&gt;
&lt;p&gt;In short, the search engine constructs the reverse index through successive merging of smaller indices, which reduces the overall memory requirement.&lt;/p&gt;
&lt;p&gt;You can conceptualize the reverse index itself as two files, one with offset pointers into another file, which has sorted numbers. This code runs after each partition finishes crawling and processing its data, and has a run time of about 4 hours.&lt;/p&gt;</description></item><item><title>The Yak Shave</title><link>https://www.marginalia.nu/log/a_102_yak_shave/</link><pubDate>Wed, 28 Feb 2024 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/a_102_yak_shave/</guid><description>&lt;p&gt;I set out a little over a week ago to add a service registry to Marginalia Search,
primarily to reduce its dependence on docker. I would like it to be able to run
on bare metal as well, which poses a problem since configuring the application manually
is a bit of a headache with dozens of ports that need to be set up. It would also be
desirable to be able to run multiple instances of important services in order to eliminate
downtime during upgrades.&lt;/p&gt;</description></item><item><title>Marginalia: 3 Years</title><link>https://www.marginalia.nu/log/a_101_marginalia-3-years/</link><pubDate>Sun, 25 Feb 2024 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/a_101_marginalia-3-years/</guid><description>&lt;p&gt;It&amp;rsquo;s been three years since the inception of Marginalia Search, then
a dinky experiment to find where the heck the cool Internet has gone,
now my full time job.&lt;/p&gt;
&lt;p&gt;While there&amp;rsquo;s always things that can be improved, it&amp;rsquo;s fair to say
the search engine has never worked as well as it does right now.&lt;/p&gt;
&lt;p&gt;A great number of milestones have been reached, perhaps biggest
of all the search engine has moved out of my living room and into
a proper enterprise server.&lt;/p&gt;</description></item><item><title>Best SEO spam 2024 reddit</title><link>https://www.marginalia.nu/log/a_100_reddit_spam/</link><pubDate>Wed, 07 Feb 2024 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/a_100_reddit_spam/</guid><description>&lt;p&gt;One of the great joys of working on a search engine is that you get to reverse engineer SEO spam, and overall study how it evolves over time.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve been noticing the search engine spam strategy of adding &amp;lsquo;reddit&amp;rsquo; to page titles for a few years now, but it feels like it&amp;rsquo;s been growing a lot recently. I don&amp;rsquo;t think it&amp;rsquo;s actually &lt;em&gt;working&lt;/em&gt;, but it&amp;rsquo;s so cute that they are trying.&lt;/p&gt;</description></item><item><title>Contexts, Friction and Distractions</title><link>https://www.marginalia.nu/log/99_context/</link><pubDate>Tue, 30 Jan 2024 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/99_context/</guid><description>&lt;p&gt;I get significantly more work done when I unplug my computer from the Internet. It&amp;rsquo;s not that my productive output drops to zero when I&amp;rsquo;m plugged in, but more like 70%.&lt;/p&gt;
&lt;p&gt;Despite many of the tools that I use requiring a connection, and certainly the Internet containing a wealth of information that might expedite my work, these benefits are drastically outweighed by the wealth of distractions also available.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s very appealing, when the code is compiling or the docker containers restarting, to sneak open a browser tab with hacker news, or the Χ formerly known as Twitter, youtube, mastodon, a news site, or something similar to pass those minutes.&lt;/p&gt;</description></item><item><title>Charlatans spreading misleading beginner advice are the evolutionary crabs of youtube content creators</title><link>https://www.marginalia.nu/log/98_youtube-crabs/</link><pubDate>Mon, 29 Jan 2024 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/98_youtube-crabs/</guid><description>&lt;p&gt;You have a hobby you&amp;rsquo;ve been into for a decade or more. You like talking about your hobby, and your friends and family, after listening to these things for as long as you&amp;rsquo;ve been into them, maybe aren&amp;rsquo;t as excited to always hear about it as you are about discussing them, so in an act of compassion you create a youtube channel where you can monologue about your passion instead.&lt;/p&gt;</description></item><item><title>A Hobby Coding Biography</title><link>https://www.marginalia.nu/log/97_projects/</link><pubDate>Thu, 28 Dec 2023 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/97_projects/</guid><description>&lt;p&gt;This is a bit of a retrospective of every project I&amp;rsquo;ve worked on, as far as I remember them. I&amp;rsquo;ve tried to unearth any artifacts that remain.&lt;/p&gt;
&lt;p&gt;Far from everything is flattering or a resounding success, but then again, maybe that&amp;rsquo;s good. There are definitely patterns in the things that
didn&amp;rsquo;t pan out.&lt;/p&gt;
&lt;h1 id="earliest-traces"&gt;Earliest Traces&lt;/h1&gt;
&lt;p&gt;I was definitely programming stuff, but I don&amp;rsquo;t think it ever amounted to anything tangible. It was more like playing house,
I built GUIs that looked like real applications but didn&amp;rsquo;t really do anything but look cool to me. Real hackertyper vibes.
This was in Java, Delphi and VB script mostly. I also built a bunch of websites on geocities and angelfire. This is definitely
as far back as 1999. I did some QBASIC stuff even earlier, like basic text based games and drawing circles and stuff.&lt;/p&gt;</description></item><item><title>A Frivolous Feature</title><link>https://www.marginalia.nu/log/96_frivolous_asn/</link><pubDate>Fri, 22 Dec 2023 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/96_frivolous_asn/</guid><description>&lt;p&gt;Marginalia Search very recently gained the ability to filter results by Autonomous System,
not only searching by ASN but by the organization information for that AS. At a glance this
seems like a somewhat frivolous feature, but it has interesting effects.&lt;/p&gt;
&lt;p&gt;Autonomous Systems are part of the Internet&amp;rsquo;s routing infrastructure. If your mental model of an IP
number is that they are the phone number of the computer, this is something akin to a postal code.
Digging much deeper than that into BGP and autonomous systems is not really in the scope of this article, but
&lt;a href="https://en.wikipedia.org/wiki/Autonomous_system_%28Internet%29"&gt;Wikipedia&lt;/a&gt; has a relatively lucid article on this.&lt;/p&gt;</description></item><item><title>How To Read An Article On The Internet</title><link>https://www.marginalia.nu/log/95_how_to_read/</link><pubDate>Wed, 20 Dec 2023 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/95_how_to_read/</guid><description>&lt;pre&gt;&lt;code&gt; A simple guide to reading
 in 9 simple steps
&lt;/code&gt;&lt;/pre&gt;
&lt;ol&gt;
&lt;li&gt;Navigate to the desired article.&lt;/li&gt;
&lt;li&gt;Dismiss the GDPR banner&lt;/li&gt;
&lt;li&gt;It may seem safe to start reading, but you need to wait about 10 seconds as the
various ad auctions resolve and scripts load in&lt;/li&gt;
&lt;li&gt;Wait while the article is populated with ads. While the article is in front of you, there is no point in
starting to read yet, as the minute&amp;rsquo;s worth of layout shift will make you lose your place.&lt;/li&gt;
&lt;li&gt;Once the layout has settled and not shifted for 10 seconds, move the mouse cursor toward the URL bar to
trigger any sort of exit intent scripts you might otherwise accidentally invoke while reading.&lt;/li&gt;
&lt;li&gt;Scroll to the end of the article to ensure a paywall doesn&amp;rsquo;t appear halfway through the article.&lt;/li&gt;
&lt;li&gt;Dismiss any calls to subscribe to a newsletter that may have appeared in step 6.&lt;/li&gt;
&lt;li&gt;Press Ctrl+F and search for out-of-place paragraphs beginning with &amp;ldquo;Benefits of&amp;rdquo;; this
is a strong indicator the article is some ChatGPT content farm gibberish.&lt;/li&gt;
&lt;li&gt;Press Ctrl+P to bring up the print preview dialog and read the article in print preview&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Some also say you should press the back button once to get any back button hijackings out of the way,
but your mileage may vary.&lt;/p&gt;</description></item><item><title>WARC'in the crawler</title><link>https://www.marginalia.nu/log/94_warc_warc/</link><pubDate>Wed, 20 Dec 2023 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/94_warc_warc/</guid><description>&lt;p&gt;The Marginalia Crawler has seen improvements! A long term problem with the crawler design is
that if for whatever reason the crawler shuts down, it needs to restart fetching whatever
domains it was traversing at the time of the termination from zero.&lt;/p&gt;
&lt;p&gt;This isn&amp;rsquo;t fantastic, since not only does crawling a website take a fair bit of time,
it&amp;rsquo;s a nuisance for the server admins to re-crawl stuff that was already fetched, and
a real liability for ending up in robots.txt or some iptables ruleset.&lt;/p&gt;</description></item><item><title>Anchor Tags</title><link>https://www.marginalia.nu/log/93_atags/</link><pubDate>Tue, 07 Nov 2023 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/93_atags/</guid><description>&lt;p&gt;I&amp;rsquo;ve been working on getting anchor tag keywords into the search engine,
basically using link texts to complement the keywords on a webpage.&lt;/p&gt;
&lt;p&gt;The problem I&amp;rsquo;m attempting to address is that many websites don&amp;rsquo;t really describe
themselves particularly well. As Steve Ballmer&amp;rsquo;s stage performance once illustrated,
merely repeating a word doesn&amp;rsquo;t on its own make what you&amp;rsquo;re saying relevant to the term.&lt;/p&gt;
&lt;p&gt;Another good example of how it falls short is &lt;a href="https://www.chiark.greenend.org.uk/~sgtatham/putty/"&gt;PuTTY&amp;rsquo;s website&lt;/a&gt;,
which will be used as a pilot case to improve.&lt;/p&gt;</description></item><item><title>Partitioning The Index</title><link>https://www.marginalia.nu/log/92_multirack_drifting/</link><pubDate>Mon, 30 Oct 2023 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/92_multirack_drifting/</guid><description>&lt;p&gt;So a bit of an update on what I&amp;rsquo;ve been working on. This will be adapted into release notes in
a while, but I haven&amp;rsquo;t quite wrapped a bow on the change set yet.&lt;/p&gt;
&lt;p&gt;Still, it has certainly been a few weeks. Didn&amp;rsquo;t quite land how busy I&amp;rsquo;ve been until I sat down to
draft this post. Them&amp;rsquo;s some changes, and I&amp;rsquo;m skipping a few to keep this meandering post at a sane length.&lt;/p&gt;</description></item><item><title>The List</title><link>https://www.marginalia.nu/log/91-the-list/</link><pubDate>Thu, 19 Oct 2023 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/91-the-list/</guid><description>&lt;p&gt;As a general observation, I tend to be more productive when I know what
to do next at any given moment.&lt;/p&gt;
&lt;p&gt;There are days when I&amp;rsquo;ve seemingly gotten a &amp;ldquo;week&amp;rdquo; of work done in an afternoon;
those are the days when what I needed to do was very clear, and I basically just
had a list of items to tick off one by one.&lt;/p&gt;
&lt;p&gt;There have admittedly also been those ignoble weeks when I&amp;rsquo;ve gotten an
afternoon&amp;rsquo;s work done, mostly they are weeks when it&amp;rsquo;s not been at all clear
what to do next.&lt;/p&gt;</description></item><item><title>Moving Marginalia to a New Server</title><link>https://www.marginalia.nu/log/90-new-server-design/</link><pubDate>Sat, 07 Oct 2023 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/90-new-server-design/</guid><description>&lt;p&gt;So the search engine is moving to a new server soon, thanks to the generous grant
&lt;a href="https://www.marginalia.nu/log/88-futo-grant/"&gt;mentioned recently&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If you visit search.marginalia.nu now, it may or may not use the old or new server. It&amp;rsquo;ll be like this for
a while, since I need them both for testing and maintenance type work.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ll also apologize if this post is a bit chaotic. It is a reflection of a very chaotic couple of weeks that
apart from setting up this migration also involved a very short notice invitation for a
presentation at &lt;a href="https://opensearchfoundation.org/en/events-osf/5th-international-open-search-symposium-ossym2023/"&gt;ossym23&lt;/a&gt;.&lt;/p&gt;</description></item><item><title>A Disk Usage Mystery</title><link>https://www.marginalia.nu/log/89-disk-usage-mystery/</link><pubDate>Sat, 23 Sep 2023 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/89-disk-usage-mystery/</guid><description>&lt;p&gt;I ran into a bit of a puzzling situation yesterday, testing some of the new index construction changes before they&amp;rsquo;re going live in a few days.&lt;/p&gt;
&lt;p&gt;The process crashed with a pretty nondescript stack trace complaining about illegal instructions, so at first glance it looked more like it was within the realm of freak JVM bug, cosmic ray, hardware error maybe.&lt;/p&gt;
&lt;p&gt;I was doing this on my developer workstation, which also spawned a popup complaining that the hard drive it was working on had nearly run out of space, and inferred that the error probably was due to memory mapping more space onto a disk than what was possible.&lt;/p&gt;</description></item><item><title>Marginalia Search receives FUTO Grant</title><link>https://www.marginalia.nu/log/88-futo-grant/</link><pubDate>Fri, 15 Sep 2023 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/88-futo-grant/</guid><description>&lt;p&gt;I&amp;rsquo;m happy to announce that the generous people at &lt;a href="https://futo.org/"&gt;FUTO&lt;/a&gt; have granted the project $15,000 with no strings attached to help the search engine out with some more server power.&lt;/p&gt;
&lt;p&gt;FUTO is a young Austin, TX-based organization &amp;ldquo;&lt;em&gt;dedicated to developing, both through in-house engineering and investment, technologies that frustrate centralization and industry consolidation&lt;/em&gt;&amp;rdquo;. It&amp;rsquo;s one to keep an eye on, I believe their heart is in the right place and they have every possibility of making a real difference.&lt;/p&gt;</description></item><item><title>Absurd Success</title><link>https://www.marginalia.nu/log/87_absurd_success/</link><pubDate>Wed, 30 Aug 2023 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/87_absurd_success/</guid><description>&lt;p&gt;So&amp;hellip; I&amp;rsquo;ve had the most unreal week of coding. Zero exaggeration, I&amp;rsquo;ve halved the
RAM requirements of the search engine, removed the need to take the system
offline during an upgrade, removed hard limits on how many documents can be indexed,
and quadrupled soft limits on how many keywords can be in the corpus.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s been a long term goal to keep it possible to run and operate the system
on low-powered hardware, and so far improvements have been made, to the point
where my 32 GB RAM developer machine feels spacious rather than cramped, but this
set of changes takes it several notches further.&lt;/p&gt;</description></item><item><title>Sleeping at Night</title><link>https://www.marginalia.nu/log/86-sleep/</link><pubDate>Tue, 22 Aug 2023 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/86-sleep/</guid><description>&lt;p&gt;I&amp;rsquo;ve started going on a long walk each morning immediately as I wake up, and it&amp;rsquo;s had the unexpected side-effect of fixing my broken circadian rhythm.&lt;/p&gt;
&lt;p&gt;For decades, as long as I can remember, I&amp;rsquo;ve been what you might consider a serious night owl. Regardless of how long I slept or when I woke up, I would get nothing requiring any sort of thought done until sometime after lunch, and it wasn&amp;rsquo;t really until late at night that my brain really kicked into gear.&lt;/p&gt;</description></item><item><title>Life in 1080p</title><link>https://www.marginalia.nu/log/84_life_in_1080p/</link><pubDate>Sat, 12 Aug 2023 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/84_life_in_1080p/</guid><description>&lt;p&gt;I recently got a smaller computer screen. It&amp;rsquo;s actually not &lt;em&gt;that&lt;/em&gt; small, it&amp;rsquo;s 27&amp;quot;, but the resolution is modest compared to what is available. And in short, it&amp;rsquo;s fantastic. It&amp;rsquo;s not an expensive screen, it&amp;rsquo;s not a fancy screen; but it&amp;rsquo;s comparatively a small screen.&lt;/p&gt;
&lt;p&gt;For a few years I was using a 34&amp;quot; ultra-wide monitor, which has been causing me nothing but grief. It&amp;rsquo;s sort of crept up on me that so many small annoyances in my computer-use all originated from using this screen. I want to elaborate on what&amp;rsquo;s been chafing. I want to be explicit that these are my experiences as someone who primarily uses a computer screen for programming. If you&amp;rsquo;re using your computer for something else, your experiences may differ.&lt;/p&gt;</description></item><item><title>Message Queues, State Machines, Actors, UI</title><link>https://www.marginalia.nu/log/85-mq_sm_actor_ui/</link><pubDate>Sat, 12 Aug 2023 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/85-mq_sm_actor_ui/</guid><description>&lt;p&gt;This is a bit of a &lt;em&gt;what I&amp;rsquo;ve been working on&lt;/em&gt; style of post. It&amp;rsquo;s also a bit of a complement to the
release notes of the upcoming release which should be dropping in a week or so. There&amp;rsquo;s some spit and
polish still missing from these things, but if I don&amp;rsquo;t write about them now too much will have been
ejected from the cache to make a well written post about it.&lt;/p&gt;</description></item><item><title>Full Time</title><link>https://www.marginalia.nu/log/83_full_time/</link><pubDate>Fri, 16 Jun 2023 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/83_full_time/</guid><description>&lt;p&gt;I&amp;rsquo;m working on Marginalia Search full time.&lt;/p&gt;
&lt;p&gt;I left the office for the last time today, and it&amp;rsquo;s the strangest feeling. I&amp;rsquo;ve quit jobs, taken time off work, been laid off, but this is different from any of those things. This is deliberate.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s a note of relief. I&amp;rsquo;ve essentially been working two pretty demanding jobs; one for pay and one for passion and the joy of making a difference.&lt;/p&gt;</description></item><item><title>Killing Community</title><link>https://www.marginalia.nu/log/82_killing_community/</link><pubDate>Sun, 11 Jun 2023 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/82_killing_community/</guid><description>&lt;p&gt;This is a theory that&amp;rsquo;s previously been stated in &lt;a href="https://www.marginalia.nu/log/39-normie-hypothesis.gmi"&gt;log/39-normie-hypothesis.gmi&lt;/a&gt;, but I think it&amp;rsquo;s worth expanding on as it&amp;rsquo;s become very relevant with the recent Reddit shit-show actualizing just how bad that website has gotten along with social media in general.&lt;/p&gt;
&lt;p&gt;I think the model demonstrates how the &amp;lsquo;enshittification&amp;rsquo; process is an inevitability with any social media that is run on a venture capital model.&lt;/p&gt;
&lt;p&gt;An online community can be like a village, where you have familiar faces, collective experiences, shared values and so forth. It can be like a village and be five people, it can be like a village and be a thousand people.&lt;/p&gt;</description></item><item><title>Confessions</title><link>https://www.marginalia.nu/log/81-confessions/</link><pubDate>Fri, 19 May 2023 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/81-confessions/</guid><description>&lt;p&gt;&lt;strong&gt;I use print debugging all the time&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I know how to use a debugger. I use a debugger sometimes, but most of my debugging is done by print statements that are like&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;A
B
C
.
.
5
D
.
D
,
{true, 30}
.
,
.
,
10
E
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;&lt;strong&gt;I think Clean Code makes some valid points&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I don&amp;rsquo;t think it should be your bible or treated as infallible, but having seen the sort of code that came before it, yeah, Uncle Bob got some things right. He rightfully gets some shit for the stuff that didn&amp;rsquo;t turn out perfectly well as a product of the time, but at the same time, I think we&amp;rsquo;ve sort of become blind to how much he got right.&lt;/p&gt;</description></item><item><title>R.I.P. Memex</title><link>https://www.marginalia.nu/log/80-rip-memex/</link><pubDate>Mon, 08 May 2023 10:49:28 +0200</pubDate><guid>https://www.marginalia.nu/log/80-rip-memex/</guid><description>&lt;p&gt;I killed the old memex.marginalia.nu site. Not because it wasn&amp;rsquo;t great,
but because I don&amp;rsquo;t have the time to maintain the software, which was quite janky,
and perhaps most of all I wasn&amp;rsquo;t really feeling it.&lt;/p&gt;
&lt;p&gt;The new site looks superficially similar, but it&amp;rsquo;s actually just a Hugo
template that emulates some of the memex&amp;rsquo; capabilities. Although some of
the coolest stuff is sadly gone as a result.&lt;/p&gt;
&lt;p&gt;Thankfully I decided to use an extremely portable markup format when building
the original memex software, which meant that porting it over to hugo was literally
just a matter of writing a brief python script.&lt;/p&gt;</description></item><item><title>An off-ramp from the digital IKEA maze</title><link>https://www.marginalia.nu/log/79-ikea-offramp/</link><pubDate>Thu, 04 May 2023 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/79-ikea-offramp/</guid><description>&lt;p&gt;There is an episode of Star Trek where a character is for plot reasons trapped in a shrinking parallel universe. As time passes, people she knows one by one just vanish and she is the only one who seems to notice. Eventually it gets to an absurd point. She asks if it really makes sense that a ship made for a thousand people would have a crew of a few people, and everyone just sort of like shrugs and looks at her like she&amp;rsquo;s crazy. That&amp;rsquo;s basically what the last decade of the Internet has been like[1]. It feels like it&amp;rsquo;s shrinking. Like parts of it are vanishing.&lt;/p&gt;</description></item><item><title>Thoughts on AI and AI-veganism</title><link>https://www.marginalia.nu/log/78-on-ai-veganism/</link><pubDate>Fri, 21 Apr 2023 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/78-on-ai-veganism/</guid><description>&lt;p&gt;I&amp;rsquo;ve come to think LLMs/GPTs/whatever are a threat to conventional search engines because the modern web is an unbelievably annoying dumpster fire.&lt;/p&gt;
&lt;p&gt;They don&amp;rsquo;t really provide better or faster answers, what they provide is an experience that is not a complete pain in the ass.&lt;/p&gt;
&lt;p&gt;This frog has been simmering for a long while now and we&amp;rsquo;re so used to it that seeing literally anything else seems revolutionary.&lt;/p&gt;
&lt;p&gt;You visit a website and need to dismiss a cookie policy notification, a request to show popups, a request to know your location, an invitation to subscribe to a newsletter, a sales rep wants to have a chat with you, then you get random layout shifts for several minutes as all the ad auctions finish, and then just as you&amp;rsquo;re ready to read the content the website crashes and reloads and the circus starts over again. You try to leave, and it hijacks your back button into a redirect loop.&lt;/p&gt;</description></item><item><title>Going Github</title><link>https://www.marginalia.nu/log/77-going-github/</link><pubDate>Sat, 25 Mar 2023 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/77-going-github/</guid><description>&lt;p&gt;I&amp;rsquo;ve moved Marginalia&amp;rsquo;s sources to Github. Can&amp;rsquo;t pick every battle.&lt;/p&gt;
&lt;p&gt;The main reason is I&amp;rsquo;m kind of tired of the amount of spam bots that keep signing up to my Gitea. The juice of self-hosting a public-access git forge, even locked down to prevent arbitrary repo creation, that juice just isn&amp;rsquo;t worth the squeeze.&lt;/p&gt;
&lt;p&gt;This is not without some consideration.&lt;/p&gt;
&lt;p&gt;To be blunt, I don&amp;rsquo;t like Github. Their use of dark patterns leaves a real nasty after-taste. I&amp;rsquo;m also old enough to remember the Microsoft of the early 2000s very vividly.&lt;/p&gt;</description></item><item><title>Search Result Quality For Multiple Terms</title><link>https://www.marginalia.nu/log/76-search-result-quality-for-multiple-terms/</link><pubDate>Thu, 23 Mar 2023 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/76-search-result-quality-for-multiple-terms/</guid><description>&lt;p&gt;This is a bit of a follow up to the previous post.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.marginalia.nu/log/75-grand-restructuring.gmi"&gt;The Grand Code Restructuring [ 2023-03-17 ]&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Marginalia&amp;rsquo;s search result quality has, for a long while, been pretty good as long as your search query is a single term, but for multiple search terms it&amp;rsquo;s been a bit hit-and-miss. Marginalia was never great at this, but the quality of results in this usage pattern has taken a bit of a dive recently due to a re-write of the index last fall.&lt;/p&gt;</description></item><item><title>The Grand Code Restructuring</title><link>https://www.marginalia.nu/log/75-grand-restructuring/</link><pubDate>Fri, 17 Mar 2023 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/75-grand-restructuring/</guid><description>&lt;p&gt;In general I don&amp;rsquo;t like to fuss over code, but this is exactly what I&amp;rsquo;ve been doing in preparation of the NLnet funded work. I&amp;rsquo;ve spent the last month restructuring Marginalia&amp;rsquo;s code base. It&amp;rsquo;s not completely done, but I&amp;rsquo;ve made great headway.&lt;/p&gt;
&lt;p&gt;Things got the way they got because in general for experimental solo-development projects, I think it makes sense to be fairly tolerant of technical debt.&lt;/p&gt;
&lt;p&gt;Since refactoring is something that is extremely difficult to break up into parallel tracks or do in small iterations, the cost of refactoring is effectively multiplied by the number of people that could be working on the code.&lt;/p&gt;</description></item><item><title>Marginalia Search: 2 years, big news</title><link>https://www.marginalia.nu/log/74-marginalia-2-years/</link><pubDate>Sun, 26 Feb 2023 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/74-marginalia-2-years/</guid><description>&lt;p&gt;No time like the project&amp;rsquo;s two year anniversary to drop this particular bomb&amp;hellip;&lt;/p&gt;
&lt;p&gt;Marginalia&amp;rsquo;s gotten an NLnet grant. This means I&amp;rsquo;ll be able to work full time on this project for at least a year.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://nlnet.nl/project/Marginalia/"&gt;https://nlnet.nl/project/Marginalia/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This grant is essentially the best-case scenario for funding this project. It&amp;rsquo;ll be able to remain independent, open-source, and non-profit.&lt;/p&gt;
&lt;p&gt;I won&amp;rsquo;t start in earnest for a few months as I&amp;rsquo;ve got loose ends to tie up before I can devote that sort of time. More details to come, but I&amp;rsquo;ll say this much: the first step is a tidying up of the sources and a move off my self-hosted git instance to an external git host yet to be decided.&lt;/p&gt;</description></item><item><title>A new approach to domain ranking</title><link>https://www.marginalia.nu/log/73-new-approach-to-ranking/</link><pubDate>Mon, 06 Feb 2023 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/73-new-approach-to-ranking/</guid><description>&lt;p&gt;This is a very brief post announcing a fascinating discovery.&lt;/p&gt;
&lt;p&gt;It appears to be possible to use the cosine similarity approach powering explore2.marginalia.nu as a substitute for the link graph in an eigenvector-based ranking algorithm (i.e. PageRank).&lt;/p&gt;
&lt;p&gt;The original PageRank algorithm can be conceptualized as a simulation of where a random visitor would end up if they randomly clicked links on websites. With this model in mind, the modification replaces the link-clicking with using explore2 for navigation.&lt;/p&gt;</description></item><item><title>Are you ok?</title><link>https://www.marginalia.nu/log/72-are-you-ok/</link><pubDate>Fri, 27 Jan 2023 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/72-are-you-ok/</guid><description>&lt;p&gt;I don&amp;rsquo;t know if I&amp;rsquo;m just imagining it, but has the Internet gone progressively more crazy the last decade or so?&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s like everyone is so damn angry all the time. If they aren&amp;rsquo;t angry they&amp;rsquo;re bitter and resentful. And when they aren&amp;rsquo;t angry or bitter, they&amp;rsquo;re so depressed they&amp;rsquo;re barely able to crawl out of bed. And if they aren&amp;rsquo;t angry, bitter, or depressed, they have crippling anxiety. Every other week there&amp;rsquo;s some public blow-out where some person or another just loses their shit.&lt;/p&gt;</description></item><item><title>Memex Design</title><link>https://www.marginalia.nu/log/71-memex-design/</link><pubDate>Fri, 13 Jan 2023 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/71-memex-design/</guid><description>&lt;p&gt;For clarification, this is discussing no other thing called Memex than memex.marginalia.nu, the website you&amp;rsquo;re probably visiting right now. That, or you&amp;rsquo;re reading this over gemini at marginalia.nu, which is serving the same content over a different protocol.&lt;/p&gt;
&lt;p&gt;I wanted to build a cross-protocol static site generator designed in a way that is equally understandable by both humans and machines. This groundedness is an appealing property I really admire about the gemini protocol and gemtext format. It&amp;rsquo;s something I want to explore if it&amp;rsquo;s possible to extend to software in general.&lt;/p&gt;</description></item><item><title>Faster Index Joins</title><link>https://www.marginalia.nu/log/70-faster-index-joins/</link><pubDate>Tue, 03 Jan 2023 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/70-faster-index-joins/</guid><description>&lt;p&gt;The most common (and most costly) operation of the marginalia search engine&amp;rsquo;s index is something like: given a set of documents containing one keyword, find each document that also contains another keyword.&lt;/p&gt;
&lt;p&gt;The naive approach is to just iterate over each document identifier in the first set and do a membership test in the b-tree containing the second. This is an O(m log n)-operation, which on paper is pretty fast.&lt;/p&gt;
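&lt;p&gt;A minimal sketch of the naive approach in Python, assuming both sets are plain lists of integer document identifiers, with binary search over a sorted list standing in for the b-tree membership test. The names &lt;code&gt;contains&lt;/code&gt; and &lt;code&gt;naive_join&lt;/code&gt; are made up for illustration and are not taken from the search engine.&lt;/p&gt;

```python
from bisect import bisect_left


def contains(sorted_ids, doc_id):
    # Binary search membership test, O(log n).
    # Assumption: a sorted list stands in for the b-tree of the second set.
    i = bisect_left(sorted_ids, doc_id)
    return i != len(sorted_ids) and sorted_ids[i] == doc_id


def naive_join(first_ids, second_sorted_ids):
    # One membership test per id in the first set: O(m log n) in total.
    return [d for d in first_ids if contains(second_sorted_ids, d)]


# Hypothetical document ids for two keywords.
docs_with_first_keyword = [2, 3, 5, 8, 13, 21]
docs_with_second_keyword = [1, 2, 4, 8, 16, 21, 32]
print(naive_join(docs_with_first_keyword, docs_with_second_keyword))  # [2, 8, 21]
```

&lt;p&gt;Each of the m lookups restarts its search from scratch, which is where the log n factor comes from.&lt;/p&gt;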
&lt;p&gt;It turns out it can be made faster.&lt;/p&gt;</description></item><item><title>Creepy Website Similarity</title><link>https://www.marginalia.nu/log/69-creepy-website-similarity/</link><pubDate>Mon, 26 Dec 2022 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/69-creepy-website-similarity/</guid><description>&lt;p&gt;This is a write-up about an experiment from a few months ago, in how to find websites that are similar to each other. Website similarity is useful for many things, including discovering new websites to crawl, as well as suggesting similar websites in the Marginalia Search random exploration mode.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://explore2.marginalia.nu/"&gt;A link to a slapdash interface for exploring the experimental data.&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The approach chosen was to use the link graph to look for websites that are linked to from the same websites. This turned out to work remarkably well.&lt;/p&gt;</description></item><item><title>On Wizards and Sorcerers</title><link>https://www.marginalia.nu/log/68-wizards-vs-sorcerers/</link><pubDate>Fri, 23 Dec 2022 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/68-wizards-vs-sorcerers/</guid><description>&lt;p&gt;While this post is about programming, it also draws an extended analogy to dungeons and dragons, specifically two of its classes, that correspond to two attitudes toward programming.&lt;/p&gt;
&lt;p&gt;In D&amp;amp;D, wizards study magic. They prepare their magic spells ahead of time. While they may learn a large number of magic spells, they need to prepare them ahead of time and can&amp;rsquo;t just cast them at will.&lt;/p&gt;
&lt;p&gt;Wizard programmers prefer up-front design. They apply reason and logic to divide and conquer a large problem, they rely on building blocks like design patterns and algorithms. Wizards rely on explicit knowledge.&lt;/p&gt;</description></item><item><title>The best ideas come AFK</title><link>https://www.marginalia.nu/log/67-best-ideas-afk/</link><pubDate>Mon, 07 Nov 2022 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/67-best-ideas-afk/</guid><description>&lt;p&gt;I get my best ideas when I&amp;rsquo;m not working.&lt;/p&gt;
&lt;p&gt;This seems paradoxical, but past a point, the more I work on a project the slower it seems to go. I&amp;rsquo;ll find changes to do, but lose any sort of vision.&lt;/p&gt;
&lt;p&gt;If I&amp;rsquo;m not programming at all, I rarely get good ideas as well.&lt;/p&gt;
&lt;p&gt;There appears to be some magic stoichiometric mixture where I work on a project for a while, then force myself to take a break somewhere far away from any keyboard for a day or two, and the ideas start to roll in at a pace where I can barely keep up to write them down.&lt;/p&gt;</description></item><item><title>Carbon Dating HTML</title><link>https://www.marginalia.nu/log/66-carbon-dating/</link><pubDate>Thu, 27 Oct 2022 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/66-carbon-dating/</guid><description>&lt;p&gt;One of the more common feature requests I&amp;rsquo;ve gotten for Marginalia Search is the ability to search by date. I&amp;rsquo;ve been a bit reluctant because this has the smell of a surprisingly hard problem. Or rather, a surprisingly large number of easy problems.&lt;/p&gt;
&lt;p&gt;The initial hurdle we&amp;rsquo;ll encounter is that among structured data, pubDate is available in RDFa, OpenGraph, JSON-LD, and Microdata.&lt;/p&gt;
&lt;p&gt;A few examples:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;&amp;lt;meta property=&amp;#34;datePublished&amp;#34; content=&amp;#34;2022-08-24&amp;#34; /&amp;gt;
&amp;lt;meta itemprop=&amp;#34;datePublished&amp;#34; content=&amp;#34;2022-08-24&amp;#34; /&amp;gt;
&amp;lt;meta property=&amp;#34;article:published_time&amp;#34; content=&amp;#34;2022-08-24T14:39:14Z&amp;#34; /&amp;gt;
&amp;lt;script type=&amp;#34;application/ld+json&amp;#34;&amp;gt;
{&amp;#34;datePublished&amp;#34;:&amp;#34;2022-08-24T14:39:14Z&amp;#34;}
&amp;lt;/script&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;So far, not that bad. This is at least a case where the web site tells you that here is the pub-date; the exact format of the date may vary, but this is solvable.&lt;/p&gt;</description></item><item><title>Scaling doesn't scale</title><link>https://www.marginalia.nu/log/65-scaling-doesnt-scale/</link><pubDate>Tue, 25 Oct 2022 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/65-scaling-doesnt-scale/</guid><description>&lt;p&gt;By which I mean there are deeply problematic assumptions in the very notion of scaling: Scaling changes the rules, and scaling problems exist in both directions. If what you are doing effortlessly scales up, it almost always means it&amp;rsquo;s egregiously sub-optimal given your present needs.&lt;/p&gt;
&lt;p&gt;These assertions are all very abstract. I&amp;rsquo;ll illustrate with several examples, to try and build an intuition for scaling. You most likely already know what I&amp;rsquo;m saying is true, but you may need reminding that this is how it works.&lt;/p&gt;</description></item><item><title>Marginalia's Index Reaches 100,000,000 Documents</title><link>https://www.marginalia.nu/log/64-hundred-million/</link><pubDate>Fri, 21 Oct 2022 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/64-hundred-million/</guid><description>&lt;p&gt;A very brief note to announce reaching a long term goal and major milestone for marginalia search.&lt;/p&gt;
&lt;p&gt;The search engine now indexes 106,857,244 documents!&lt;/p&gt;
&lt;p&gt;The previous record was a bit south of seventy million. A hundred million has been a pie-in-the-sky goal for a very long time. It&amp;rsquo;s seemed borderline impossible to index that many documents on a PC. Turns out it&amp;rsquo;s not. It&amp;rsquo;s more than possible.&lt;/p&gt;
&lt;p&gt;Twice this may even be technically doable, but is way past the pain point of sheer logistics. It&amp;rsquo;s already a real headache to deal with this much data.&lt;/p&gt;</description></item><item><title>The Evolution of Marginalia's crawling</title><link>https://www.marginalia.nu/log/63-marginalia-crawler/</link><pubDate>Tue, 23 Aug 2022 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/63-marginalia-crawler/</guid><description>&lt;p&gt;In the primordial days of Marginalia Search, it used a dynamic approach to crawling the Internet.&lt;/p&gt;
&lt;p&gt;It ran a number of crawler threads, 32 or 64 or some such, that fetched jobs from a director service, which in turn grabbed them straight out of the URL database. These jobs were batches of 100 or so documents that needed to be crawled.&lt;/p&gt;
&lt;p&gt;Crawling was not planned ahead of time. Instead, a combination of how much of a website had been visited and the quality score of that website determined where to go next. It also promoted crawling websites adjacent to high quality websites.&lt;/p&gt;</description></item><item><title>Marginaliacoin, and hidden forums</title><link>https://www.marginalia.nu/log/62-marginaliacoin/</link><pubDate>Thu, 18 Aug 2022 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/62-marginaliacoin/</guid><description>&lt;p&gt;I discovered someone has made a cryptocurrency called &amp;ldquo;Memex Marginalia Inu&amp;rdquo;. It appears to have been created February 23, which is around when the entry &amp;ldquo;I Have No Capslock And I Must Scream&amp;rdquo; went absurdly viral to the point where Elon Musk tweeted a link to it.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.marginalia.nu/log/48-i-have-no-capslock.gmi"&gt;I Have No Capslock&amp;hellip;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Mr Musk&amp;rsquo;s twitter orbit is exceptionally strange. The tweet was followed by a deluge of bizarre activity, strange emails with calls about stonk canine lunar expeditions, and apparently also a cryptocurrency land-grab of sorts. I can&amp;rsquo;t claim to understand why, but many of the emails I got after the tweet were on the theme &amp;ldquo;what does this mean?&amp;rdquo;, almost as though Elon&amp;rsquo;s tweet was some sort of prophetic omen.&lt;/p&gt;</description></item><item><title>Botspam Apocalypse</title><link>https://www.marginalia.nu/log/61-botspam-apocalypse/</link><pubDate>Wed, 03 Aug 2022 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/61-botspam-apocalypse/</guid><description>&lt;p&gt;Bots are absolutely crippling the Internet ecosystem.&lt;/p&gt;
&lt;p&gt;The &amp;ldquo;future&amp;rdquo; in the film Terminator 2 is set in the 2020s. If you apply its predictions to the running of a website, it&amp;rsquo;s honestly very accurate.&lt;/p&gt;
&lt;p&gt;Modern bot traffic is virtually indistinguishable from human traffic, and can pummel any self-hosted service into the ground, flood any form with comment spam, and is a chronic headache for almost any small scale web service operator.&lt;/p&gt;</description></item><item><title>On Prescriptive Descriptions</title><link>https://www.marginalia.nu/log/60-prescriptive-descriptions/</link><pubDate>Thu, 14 Jul 2022 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/60-prescriptive-descriptions/</guid><description>&lt;p&gt;I&amp;rsquo;d like to discuss a mental somersault that I&amp;rsquo;ve found has caused me a lot of grief in the past, which is prescriptive descriptions. Let&amp;rsquo;s break this down a bit:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A descriptive statement is a statement about how something is.&lt;/li&gt;
&lt;li&gt;A prescriptive statement is a statement about how something must be, a rule or a law.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If I stay up a bit late, do most of my work in the evenings, wake up tired and just sort of putter about until noon, I might describe myself as a night person because of this.&lt;/p&gt;</description></item><item><title>Fun with Anchor Text Keywords</title><link>https://www.marginalia.nu/log/59-anchor-text/</link><pubDate>Thu, 23 Jun 2022 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/59-anchor-text/</guid><description>&lt;p&gt;Anchor texts are a very useful source of keywords for a search engine, and in an older version of the search engine, it used the text of such hyperlinks as a supplemental source for keywords, but due to a few redesigns, this feature has fallen off.&lt;/p&gt;
&lt;p&gt;The last few days have been spent trying to re-implement it in a new and more powerful fashion. This has largely been enabled by a crawler re-design from a few months ago, which offers the crawled data in a much more useful fashion and allows much more flexible post-processing.&lt;/p&gt;</description></item><item><title>marginalia.nu goes open source</title><link>https://www.marginalia.nu/log/58-marginalia-open-source/</link><pubDate>Fri, 27 May 2022 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/58-marginalia-open-source/</guid><description>&lt;p&gt;After a bit of soul searching with regards to the future of the website, I&amp;rsquo;ve decided to open source the code for marginalia.nu, all of its services, including the search engine, encyclopedia, memex, etc.&lt;/p&gt;
&lt;p&gt;A motivating factor is the search engine has sort of grown to a scale where it&amp;rsquo;s becoming increasingly difficult to productively work on as a personal solo project. It needs more structure. What&amp;rsquo;s kept me from open sourcing it so far has also been the need for more structure. The needs of the marginalia project, and the needs of an open source project have effectively aligned.&lt;/p&gt;</description></item><item><title>I don't know how to build software</title><link>https://www.marginalia.nu/log/57-dont-know-how-to-build-software/</link><pubDate>Fri, 06 May 2022 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/57-dont-know-how-to-build-software/</guid><description>&lt;p&gt;There are a lot of ways of building software, there are many languages you could choose to build it with, many libraries to rely on, many frameworks to leverage, many architectural approaches, many platforms to choose, many paradigms of daily operations to follow.&lt;/p&gt;
&lt;p&gt;It takes years to get in-depth experience with just one permutation of these options.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve been programming for over twenty years, only half the time professionally, but that is how long I&amp;rsquo;ve been building software. I&amp;rsquo;ve built about twelve applications in my twenty years of development, of varying size and complexity.&lt;/p&gt;</description></item><item><title>Uncertain Future For Marginalia Search</title><link>https://www.marginalia.nu/log/56-uncertain-future/</link><pubDate>Thu, 28 Apr 2022 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/56-uncertain-future/</guid><description>&lt;p&gt;I found myself effectively without a job on short notice.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m not at all worried about finding another one, I have savings, and I have experience, and I have demonstrable skill. What I am concerned about is finding a source of income that&amp;rsquo;s compatible with putting some time on my personal projects.&lt;/p&gt;
&lt;p&gt;Last bunch of years, I&amp;rsquo;ve been working 32 hour weeks, which is a pretty sweet deal especially combined with the zero hour commute you get working from home during the pandemic. Not every employer is fine with that, and while I do have options, I&amp;rsquo;m in a worse bargaining position than I have been before.&lt;/p&gt;</description></item><item><title>Lexicon Architectural Rubberducking</title><link>https://www.marginalia.nu/log/55-lexicon-rubberduck/</link><pubDate>Mon, 11 Apr 2022 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/55-lexicon-rubberduck/</guid><description>&lt;p&gt;I&amp;rsquo;m going to think out loud for a moment about a problem I&amp;rsquo;m considering.&lt;/p&gt;
&lt;p&gt;RAM is a precious resource on any server. Look at VPS servers, and you&amp;rsquo;ll be hard pressed to find one with much more than 32 GB. Look at leasing a dedicated server, and it&amp;rsquo;s the RAM that really drives up the price. My server has 128 GB, and it&amp;rsquo;s so full it needs to unbutton its pants to sit down comfortably. Anything I can offload to disk is great.&lt;/p&gt;</description></item><item><title>The Bargain Bin B-Tree</title><link>https://www.marginalia.nu/log/54-bargain-bin-btree/</link><pubDate>Thu, 07 Apr 2022 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/54-bargain-bin-btree/</guid><description>&lt;p&gt;I&amp;rsquo;ve been working lately on a bit of an overhaul of how the search engine does indexing. How it indexes its indices. &amp;ldquo;Index&amp;rdquo; is a bit of an overloaded term here, and it&amp;rsquo;s not the first that will crop up.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s start from the beginning and build up and examine the problem of searching for a number in a list of numbers. You have a long list of numbers, let&amp;rsquo;s sort them because why not.&lt;/p&gt;</description></item><item><title>Is There A Better Hard Drive Metaphor?</title><link>https://www.marginalia.nu/log/53-better-hard-drive-metaphor/</link><pubDate>Sun, 03 Apr 2022 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/53-better-hard-drive-metaphor/</guid><description>&lt;p&gt;This is mostly a post to complain about something that chafes. I wish there was a programming language (ideally several) that acknowledged that computers have hard drives, not just a processor, RAM and other_devices[].&lt;/p&gt;
&lt;p&gt;Something that has struck me when I&amp;rsquo;ve been working with the search engine is how unfinished the metaphor for accessing physical disks is in most programming languages. It feels like an after-thought, half left to the operating system to figure out, a byzantine relic of the days when computers had tape drives and not SSDs.&lt;/p&gt;</description></item><item><title>Growing Pains</title><link>https://www.marginalia.nu/log/52-growing-pains/</link><pubDate>Wed, 23 Mar 2022 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/52-growing-pains/</guid><description>&lt;p&gt;The search engine index has grown quite considerably the last few weeks. It&amp;rsquo;s actually surpassed 50 million documents, which is quite some milestone. In February it was sitting at 27-28 million or so.&lt;/p&gt;
&lt;p&gt;About 80% of this is side-loading all of stackoverflow and stackexchange, and part of it is additional crawling.&lt;/p&gt;
&lt;p&gt;The crawler has to date fetched 91 million URLs, but only about a third of what is fetched actually qualifies for indexing, for various reasons: some links may be dead, some may be redirects, some may just have too much JavaScript and cruft to qualify.&lt;/p&gt;</description></item><item><title>The Static File Startup</title><link>https://www.marginalia.nu/log/51-the-file-startup/</link><pubDate>Fri, 18 Mar 2022 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/51-the-file-startup/</guid><description>&lt;p&gt;Note: This is satirical in nature. Slight CW if you are at a point in life where &amp;ldquo;Office Space&amp;rdquo; has unveiled itself as a disturbing existential horror movie. This taps into that same darkness.&lt;/p&gt;
&lt;p&gt;A tale of six brave Internet pioneers.&lt;/p&gt;
&lt;p&gt;Senior Business Founder / Senior CEO &amp;ndash; Zach
Senior Tech Lead / Senior Architect / Senior CTO &amp;ndash; Kevin
Senior Backend dev
Senior Frontend dev &amp;ndash; Erin
Two Senior UX engineers&lt;/p&gt;</description></item><item><title>A meditation on correctness in software</title><link>https://www.marginalia.nu/log/50-meditation-on-software-correctness/</link><pubDate>Mon, 14 Mar 2022 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/50-meditation-on-software-correctness/</guid><description>&lt;p&gt;Let&amp;rsquo;s define a simple mathematical function, the function will perform integer factoring. It will take an integer, and return two integers, the product of which is the first integer.&lt;/p&gt;
&lt;p&gt;F(int32 n) = (int32 A, int32 B)&lt;/p&gt;
&lt;p&gt;so that&lt;/p&gt;
&lt;p&gt;A*B = n&lt;/p&gt;
&lt;p&gt;This is fairly straight forward, mathematical, objective. Let&amp;rsquo;s examine some answers an implementation might give.&lt;/p&gt;
&lt;p&gt;F 50 = (5, 10) on ARM
F 50 = (10, 5) on Intel&lt;/p&gt;
&lt;p&gt;This seems like a bug, so let&amp;rsquo;s add the requirement that A &amp;lt;= B for deterministic results.&lt;/p&gt;</description></item><item><title>Marginalia Search: 1 year</title><link>https://www.marginalia.nu/log/49-marginalia-1-year/</link><pubDate>Fri, 25 Feb 2022 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/49-marginalia-1-year/</guid><description>&lt;p&gt;I&amp;rsquo;ve caught some bug and don&amp;rsquo;t have the energy to write more than a brief note.&lt;/p&gt;
&lt;p&gt;I want to commemorate the fact that work on the Marginalia search engine started one year ago. The first commit was on February 26th 2021, and contained a sketch for a website crawler and some data models.&lt;/p&gt;
&lt;p&gt;In many ways, the paint is barely dry, yet it feels like this project has been around for a long while.&lt;/p&gt;</description></item><item><title>I have no capslock and I must scream</title><link>https://www.marginalia.nu/log/48-i-have-no-capslock/</link><pubDate>Mon, 21 Feb 2022 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/48-i-have-no-capslock/</guid><description>&lt;p&gt;In a near future, a team of desktop computer designers are looking at the latest telemetry and updating the schematics of the hardware-as-a-service self-assembling nanohardware.&lt;/p&gt;
&lt;p&gt;Steve: &amp;ldquo;Hmm, they don&amp;rsquo;t seem to be using the power button very often.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Bob: &amp;ldquo;Compared to the other buttons, it&amp;rsquo;s only used 0.1% of the time&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Steve: &amp;ldquo;Remove it?&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Bob: &amp;ldquo;Remove it!&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Computers now instantly boot up when plugged into the wall, and run until the plug is pulled. No more start-up time, the cases are aesthetically cleaner, and manufacturing cost is down at least a fraction of a dollar.&lt;/p&gt;</description></item><item><title>Drive Failure</title><link>https://www.marginalia.nu/log/47-drive-failure/</link><pubDate>Sat, 19 Feb 2022 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/47-drive-failure/</guid><description>&lt;p&gt;Not what I had intended to do this Saturday, but a hard drive failed on the server this morning, or at least so it seemed. MariaDB server went down, dmesg was full of error messages for the nvme drive it&amp;rsquo;s running off. That&amp;rsquo;s a pretty important drive.&lt;/p&gt;
&lt;p&gt;The drive itself may actually be okay; the working hypothesis is that either the drive itself or the bus overheated and reset. After a reboot the system seems fine.&lt;/p&gt;</description></item><item><title>The Anatomy of Search Engine Spam</title><link>https://www.marginalia.nu/log/46-anatomy-of-search-engine-spam/</link><pubDate>Mon, 07 Feb 2022 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/46-anatomy-of-search-engine-spam/</guid><description>&lt;p&gt;Black hat SEO is an endlessly fascinating phenomenon to study. This post is about some tactics they use to make their sites rank higher.&lt;/p&gt;
&lt;p&gt;The goal of blackhat SEO is to boost the search engine ranking of a page nobody particularly wants to see, usually ePharma, escort services, online casinos, shitcoins, hotel bookings; the bermuda pentagon of shady websites.&lt;/p&gt;
&lt;p&gt;The theory behind most modern search engines is that if you get links from a high ranking domain, then your domain gets a higher ranking as well, which increases the traffic. The reality is a little more complicated than that, but this is a sufficient mental model to understand the basic how-to.&lt;/p&gt;</description></item><item><title>Can we unfuck internet discoverability?</title><link>https://www.marginalia.nu/log/45-unfuck-internet-discoverability/</link><pubDate>Fri, 04 Feb 2022 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/45-unfuck-internet-discoverability/</guid><description>&lt;p&gt;I&amp;rsquo;ve been thinking a lot about how difficult it has become to discover quality content on the Internet, not because it isn&amp;rsquo;t there, but because the signal to noise ratio is really bad, and most venues of discovery don&amp;rsquo;t seem to be able to handle it.&lt;/p&gt;
&lt;p&gt;Recommendation algorithms seem to work almost too well, to the point where it&amp;rsquo;s all kind of just showing you things you already like, rarely anything new that you might like. It&amp;rsquo;s an absolute tragedy both for small websites and for their potential audience.&lt;/p&gt;</description></item><item><title>Discovery and Design Considerations</title><link>https://www.marginalia.nu/log/44-discovery-and-design/</link><pubDate>Tue, 18 Jan 2022 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/44-discovery-and-design/</guid><description>&lt;p&gt;It&amp;rsquo;s been a productive several weeks. I&amp;rsquo;ve got the feature pulling updates from RSS working, as mentioned earlier.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve spent the last weeks designing the search engine&amp;rsquo;s web design, and did the MEMEX too for good measure.&lt;/p&gt;
&lt;p&gt;It needed to be done, as the blog theme that previously formed the foundation for the design had several problems, including loading a bunch of unnecessary fonts, and not using the screen space of desktop browsers well at all.&lt;/p&gt;</description></item><item><title>Pseudonymous</title><link>https://www.marginalia.nu/log/43-pseodonymous/</link><pubDate>Sat, 15 Jan 2022 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/43-pseodonymous/</guid><description>&lt;p&gt;A person might think I&amp;rsquo;m elusive, writing and working under a pseudonym. It&amp;rsquo;s not that I&amp;rsquo;m hiding; if you send me an email, I&amp;rsquo;ll respond to you with an email address containing a decent chunk of my real name. It&amp;rsquo;s not out of shame I wear clothes.&lt;/p&gt;
&lt;p&gt;Besides bringing utility, marginalia.nu is an experiment, a bit of an art project, a place to challenge conventions and see what is and isn&amp;rsquo;t necessary.&lt;/p&gt;</description></item><item><title>Dark</title><link>https://www.marginalia.nu/log/42-dark/</link><pubDate>Sun, 02 Jan 2022 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/42-dark/</guid><description>&lt;p&gt;As is often the case these dark winter seasons, I&amp;rsquo;ve fallen into a bit of a funk. Inspiration it seems is as rare as sunlight, and sunlight is scarce indeed in the winters of the north.&lt;/p&gt;
&lt;p&gt;I do know what is missing: novelty. I&amp;rsquo;ve fallen into consuming &amp;ldquo;content&amp;rdquo;. Infinite scroll is the torture rack of the spirit. What is necessary is doing new things and seeing new inspiring sights, exposing myself to new inspiring thoughts.&lt;/p&gt;</description></item><item><title>Search Result Relevance</title><link>https://www.marginalia.nu/log/41-search-result-relevance/</link><pubDate>Fri, 10 Dec 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/41-search-result-relevance/</guid><description>&lt;p&gt;This entry is about a few problems the search engine has been struggling with lately, and how I&amp;rsquo;ve been attempting to remedy them.&lt;/p&gt;
&lt;p&gt;Before the article starts, I wanted to share an amusing new thing in the world of Internet spam.&lt;/p&gt;
&lt;p&gt;For a while, people have been adding things like &amp;ldquo;reddit&amp;rdquo; to the end of their Google queries to get less blog spam. Well, guess what? The blog spammers are adding &amp;ldquo;reddit&amp;rdquo; to the end of their titles now.&lt;/p&gt;</description></item><item><title>Wasted Resources</title><link>https://www.marginalia.nu/log/40-wasted-resources/</link><pubDate>Sat, 04 Dec 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/40-wasted-resources/</guid><description>&lt;p&gt;At a previous job, we had a new and fancy office. The light switches were state of the art. There was an on button, and a separate off button. When you pressed the on button, the lights would fade on. When you pressed the off button, they would fade off. In the cloud somewhere were two functions that presumably looked a bit like this:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;fun turnOnLamp() {
 while (!bright()) increaseBrightness();
}
fun turnOffLamp() {
 while (!dark()) decreaseBrightness();
}
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;I have deduced this from the fact that if you pressed both buttons at the same time, the lights would flicker on and off until someone was contacted to restart something. It is a marvellous time to be alive when you need to reboot your light switches because of a race condition. Modern computers are so fast that we often don&amp;rsquo;t even recognize when we are doing things inefficiently. We can send messages halfway around the world to turn on the lights and it seems like it&amp;rsquo;s just a wire between the switch and the lamp.&lt;/p&gt;</description></item><item><title>A brief hypothesis about "normies"</title><link>https://www.marginalia.nu/log/39-normie-hypothesis/</link><pubDate>Sat, 13 Nov 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/39-normie-hypothesis/</guid><description>&lt;p&gt;The phenomenon of &amp;ldquo;normies&amp;rdquo; is an interesting one. The term itself is a bit problematic and not one I&amp;rsquo;d typically use, but as a phenomenon they are still worth investigating.&lt;/p&gt;
&lt;p&gt;Their perhaps biggest distinguishing feature is that they don&amp;rsquo;t get &amp;ldquo;it&amp;rdquo;, whatever it is. It&amp;rsquo;s tempting to think that these are an especially mindless type of person with no personality and little in terms of thought going on.&lt;/p&gt;
&lt;p&gt;I have a theory that normies may not actually exist. That is, you can&amp;rsquo;t actually show me a person that is a normie. They are a mirage.&lt;/p&gt;</description></item><item><title>Old and New</title><link>https://www.marginalia.nu/log/38-old-and-new/</link><pubDate>Fri, 12 Nov 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/38-old-and-new/</guid><description>&lt;p&gt;I&amp;rsquo;ve been thinking recently about the emphasis put on &amp;ldquo;new&amp;rdquo;, specifically for search engines, but the discussion has some merit even in a wider context. I will start wide and narrow down.&lt;/p&gt;
&lt;p&gt;It is common to conflate new with good, and most people who were young sometime between 1950 and 2000 will indeed have seen marvellous improvements in quality of life and technology with each passing year. In the light of that, it&amp;rsquo;s at least easy to explain how one might confuse the two.&lt;/p&gt;</description></item><item><title>A Jaunt Through Keyword Extraction</title><link>https://www.marginalia.nu/log/37-keyword-extraction/</link><pubDate>Thu, 11 Nov 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/37-keyword-extraction/</guid><description>&lt;p&gt;Search results are only as good as the search engine&amp;rsquo;s ability to figure out what a page is about. Sure a keyword may appear in a page, but is it the topic of the page, or just some off-hand mention?&lt;/p&gt;
&lt;p&gt;I didn&amp;rsquo;t really know anything about data mining or keyword extraction starting out, so I&amp;rsquo;ve had to learn on the fly. I&amp;rsquo;m just going to briefly list some of my first naive attempts at keyword extraction, just to give some context.&lt;/p&gt;</description></item><item><title>Localized Programming Languages</title><link>https://www.marginalia.nu/log/36-localized-programming-languages/</link><pubDate>Fri, 05 Nov 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/36-localized-programming-languages/</guid><description>&lt;p&gt;This is a reply to a series of posts on anglo-centrism in programming languages that have been floating around in Gemini lately.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="gemini://nytpu.com/gemlog/2021-10-31.gmi"&gt;gemini://nytpu.com/gemlog/2021-10-31.gmi&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="gemini://alsd.eu/en/2021-11-04-thoughts-anglocentrism-cs.gmi"&gt;gemini://alsd.eu/en/2021-11-04-thoughts-anglocentrism-cs.gmi&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Around thirty years ago I was a kid with a computer. I learned to program quite a few years before I learned English. I also used DOS without understanding English. I knew what to type to do things, but I didn&amp;rsquo;t know what the words meant. I could start programs, I&amp;rsquo;d play in QBASIC, write small programs and amusements. To me &amp;ldquo;PRINT&amp;rdquo; was the word that made text appear on the screen. I learned years later the word meant something in English. To show you what my child eyes saw, I think rot13 does convey the experience quite well;&lt;/p&gt;</description></item><item><title>Keeping Gemini Difficult</title><link>https://www.marginalia.nu/log/35-keeping-gemini-difficult/</link><pubDate>Thu, 04 Nov 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/35-keeping-gemini-difficult/</guid><description>&lt;p&gt;This is a response to the post &amp;ldquo;Making Gemini Easy&amp;rdquo; over on ~tomasino, and the title is a bit tongue-in-cheek haha-but-no-really.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="gemini://tilde.team/~tomasino/journal/20211103-making-gemini-easy.gmi"&gt;gemini://tilde.team/~tomasino/journal/20211103-making-gemini-easy.gmi&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I think the idea that we need to shield the users from how technology works is a terrible, terrible mistake. It disempowers the users, and concentrates power in the hands of a technological elite, and that divide is only going to grow.&lt;/p&gt;
&lt;p&gt;We already have an alarming number of people working with computers, and some may even be programmers, that simply do not understand how computers work. Their only concept of a computer is the user interface on the screen. The rest is unintelligible wizardry. Nobody has told them, it&amp;rsquo;s been deemed too complicated, nothing for them to worry their little heads about.&lt;/p&gt;</description></item><item><title>A Polemic Against Internet Arguments</title><link>https://www.marginalia.nu/log/34-internet-arguments/</link><pubDate>Tue, 02 Nov 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/34-internet-arguments/</guid><description>&lt;p&gt;I want you to consider for a moment all the human lifetime wasted in ideological stalemates on the Internet, all that energy, all that anger and frustration. Imagine if you take even a fraction of that time, and put it to creating something constructive instead, learning skills, doing anything meaningful.&lt;/p&gt;
&lt;p&gt;It boggles the mind, doesn&amp;rsquo;t it? It must amount to entire human lifetimes every week.&lt;/p&gt;
&lt;p&gt;Ideology, or in a wider sense, ethics, is all about what should be done. How we should live. These aren&amp;rsquo;t statements about the world, but opinions about what the world should look like. They aren&amp;rsquo;t true or false, and any argument against them always boils down to &amp;ldquo;I disagree!&amp;rdquo;. Arguing about ethical systems is some of the least constructive things a human can do. It is more pointless than masturbation, which at least feels good for a moment.&lt;/p&gt;</description></item><item><title>The Parable of A Rude Guest</title><link>https://www.marginalia.nu/log/33-rude-guests/</link><pubDate>Thu, 28 Oct 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/33-rude-guests/</guid><description>&lt;p&gt;You are invited to a dinner party. After talking for a while food is served on the table. You pounce. &amp;ldquo;Haha, suckers!&amp;rdquo;, you think, and load all the food on your plate and leave nothing but scraps for the hosts. You feel victorious. Serves them right for inviting you into their home. You wolf down the food with ravenous appetite while they look on.&lt;/p&gt;
&lt;p&gt;That was tasty, but now you&amp;rsquo;ve got a piece of meat stuck between your teeth so you go to the bathroom and borrow some floss and use one of the hosts&amp;rsquo; toothbrushes. You also use the toilet but don&amp;rsquo;t flush because you don&amp;rsquo;t think you are going to use it again.&lt;/p&gt;</description></item><item><title>Bot Apologetics</title><link>https://www.marginalia.nu/log/32-bot-apologetics/</link><pubDate>Mon, 25 Oct 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/32-bot-apologetics/</guid><description>&lt;p&gt;There has been a bit of discussion over on Gemini recently regarding poorly behaved bots. I feel I need to add some perspective from the other side; as a bot operator (even though I don&amp;rsquo;t operate Gemini bots).&lt;/p&gt;
&lt;p&gt;Writing a web spider is pretty easy on paper. You have your standards, and you can test against your own servers to make sure it behaves before you let it loose.&lt;/p&gt;
&lt;p&gt;You probably don&amp;rsquo;t want to pound the server into silicon dust, so you add a crawl delay and parallelize the crawling, and now you have code that&amp;rsquo;s a lot harder to comprehend. This is likely the cause of some weird bot behavior, including mishandling of redirect loops or repeated visits to the same address. Multi-threaded orchestration based on a rapidly mutating data set is difficult to get right (the working set of the spider by necessity changes as it goes). You can iron a lot of this out locally, but some problems won&amp;rsquo;t crop up until you really push the limits with real-world scenarios.&lt;/p&gt;</description></item><item><title>Shaking N-gram needles from large haystacks</title><link>https://www.marginalia.nu/log/31-ngram-needles/</link><pubDate>Fri, 22 Oct 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/31-ngram-needles/</guid><description>&lt;p&gt;A recurring problem when searching for text is identifying which parts of the text are in some sense useful. A first order solution is to just extract every word from the text, and match documents against whether they contain those words. This works really well if you don&amp;rsquo;t have a lot of documents to search through, but as the corpus of documents grows, so does the number of matches.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s possible to bucket the words based on where they appear in the document, but this is not something I&amp;rsquo;m doing at the moment and not something I will implement in the foreseeable future.&lt;/p&gt;</description></item><item><title>Unintuitive Optimization</title><link>https://www.marginalia.nu/log/30-unintuitive-optimization/</link><pubDate>Wed, 13 Oct 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/30-unintuitive-optimization/</guid><description>&lt;p&gt;Optimization is arguably a lot about intuition. You have a hunch, and see if it sticks. Sure you can use profilers and instrumentation, but they are more like hunch generators than anything else.&lt;/p&gt;
&lt;p&gt;This one wasn&amp;rsquo;t as intuitive, at least not to me, but it makes sense when you think about it.&lt;/p&gt;
&lt;p&gt;I have an 8 GB file of dense binary data. This data consists of 4 KB chunks and is an unsorted list containing first a URL identifier with metadata and then a list of word identifiers. This is a sort of journal that the indexer produces during crawling. Its main benefit is that this can be done quickly with very high fault tolerance. Since it&amp;rsquo;s only ever added to, if anything does go wrong you can just truncate the bad part at the end and keep going.&lt;/p&gt;</description></item><item><title>The Mystery of the Ceaseless Botnet DDoS</title><link>https://www.marginalia.nu/log/29-botnet-ddos/</link><pubDate>Sun, 10 Oct 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/29-botnet-ddos/</guid><description>&lt;p&gt;I&amp;rsquo;ve been dealing with a botnet for the last few days, that&amp;rsquo;s been sending junk search queries at an increasingly aggressive rate. They were reasonably easy to flag and block but just kept increasing the rate until that stopped working.&lt;/p&gt;
&lt;p&gt;Long story short, my patience ran out and I put my website behind Cloudflare. I didn&amp;rsquo;t want to have to do this, because it does introduce a literal man in the middle and that kinda undermines the whole point of HTTPS, but I just don&amp;rsquo;t see any way around it. I just can&amp;rsquo;t spend every waking hour playing whac-a-mole with thousands of compromised servers flooding me with 50,000 search requests an hour. That&amp;rsquo;s five-six times more than when I was on the front page of Hacker News, and the attempts only increased.&lt;/p&gt;</description></item><item><title>Web Browsing</title><link>https://www.marginalia.nu/log/28-web-browsing/</link><pubDate>Sat, 09 Oct 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/28-web-browsing/</guid><description>&lt;p&gt;An idea I&amp;rsquo;ve had for a long time with regards to navigating the web is to find a way to browse it.&lt;/p&gt;
&lt;p&gt;&amp;ldquo;Browse&amp;rdquo; is a difficult word to use, because it has a newer connotation of just using a web browser. I mean it in the old pre-Internet sense: browse like when you flip through a magazine, or peruse an antiques shop, not really looking for anything in particular, just sort of seeing if anything catches your eye.&lt;/p&gt;</description></item><item><title>Getting with the times</title><link>https://www.marginalia.nu/log/27-getting-with-the-times/</link><pubDate>Wed, 06 Oct 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/27-getting-with-the-times/</guid><description>&lt;p&gt;Since my search engine has expanded its scope to include blogs as well as primordial text documents, I&amp;rsquo;ve done some thinking about how to keep up with newer websites that actually grow and see updates.&lt;/p&gt;
&lt;p&gt;Otherwise, as the crawl goes on, it tends to find fewer and fewer interesting web pages, and as the interesting pages are inevitably crawled to exhaustion, accumulate an ever growing amount of junk.&lt;/p&gt;
&lt;p&gt;Re-visiting each page and looking for new links in previously visited pages is probably off the table, that&amp;rsquo;s something I can maybe do once a month.&lt;/p&gt;</description></item><item><title>Experimenting with Personalized PageRank</title><link>https://www.marginalia.nu/log/26-personalized-pagerank/</link><pubDate>Sat, 02 Oct 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/26-personalized-pagerank/</guid><description>&lt;p&gt;The last few days I&amp;rsquo;ve felt like my first attempt at a ranking algorithm for the search engine was pretty good, like it was producing some pretty interesting results. It felt close to what I wanted to accomplish.&lt;/p&gt;
&lt;p&gt;The first ranking algorithm was a simple link-counting algorithm that did some weighting to promote pages that look a certain way. It did seem to keep the page quality up, but as a strange side-effect also seemed to promote very &amp;ldquo;1996&amp;rdquo;-looking websites. This isn&amp;rsquo;t quite what I wanted to accomplish; I wanted to promote new sites as well, as long as they were rich in content.&lt;/p&gt;</description></item><item><title>Astrolabe - The October Update</title><link>https://www.marginalia.nu/log/25-october-update/</link><pubDate>Fri, 01 Oct 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/25-october-update/</guid><description>&lt;ul&gt;
&lt;li&gt;&lt;a href="https://search.marginalia.nu"&gt;https://search.marginalia.nu&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The October Update is live. It introduced drastically improved topic identification and an actual ranking algorithm; and the result is interesting to say the least. What&amp;rsquo;s striking is how much it&amp;rsquo;s beginning to feel like a search engine. When it fails to find stuff, you can kinda see how.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve played with it for a while now and it does seem to produce relevant results for a lot of topics. A trade down in whimsical results but a big step up if you are looking for something specific, at least within the domain of topics where there are results to find.&lt;/p&gt;</description></item><item><title>Thoughts on Silly Hats</title><link>https://www.marginalia.nu/log/24-silly-hats/</link><pubDate>Mon, 27 Sep 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/24-silly-hats/</guid><description>&lt;p&gt;If you look back in history to the turn of the 20th century, you will find a lot of people wearing hats. Women wore some arguably pretty funny over the top hats. A while earlier, over the men were arguably funny top hats. This isn&amp;rsquo;t the first time men and women wore funny clothes. The aristocrats in 1680s France looked pretty silly too.&lt;/p&gt;
&lt;p&gt;The reason we do this, wear silly hats, is because it is fashionable. Compliance with some perceived fashion trend is one way we compete with fellow human beings, a measuring stick we use to evaluate our standing within society. Oh, you merely wear a modest and peculiar hat? Well mine is bigger and sillier still, therefore I am better!&lt;/p&gt;</description></item><item><title>Re: Software and Branding</title><link>https://www.marginalia.nu/log/23-re-software-and-branding/</link><pubDate>Tue, 21 Sep 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/23-re-software-and-branding/</guid><description>&lt;ul&gt;
&lt;li&gt;&lt;a href="gemini://friends.riverside.camp/~clarity/journal/branding.gmi"&gt;gemini://friends.riverside.camp/~clarity/journal/branding.gmi&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="gemini://idiomdrottning.org/re-branding"&gt;gemini://idiomdrottning.org/re-branding&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Some interesting thoughts going around on the topic of branding in software and websites. I&amp;rsquo;ve had thoughts like these too, and designed a lot of my website purposefully un-branded. I have no logos, no banners, barely navigational links. I figured I would just see what happens if you subvert this paradigm of web design, since it is stuff you typically just &amp;ldquo;scroll past&amp;rdquo; to get to what you care about like you skip the beginning of every youtube video. Since the point has never been to create a brand, I figured I would elevate the message so far above everything else that there simply is nothing else.&lt;/p&gt;</description></item><item><title>Against the Flood</title><link>https://www.marginalia.nu/log/22-against-the-flood/</link><pubDate>Sun, 19 Sep 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/22-against-the-flood/</guid><description>&lt;p&gt;So hacker news apparently discovered my search engine, and really took a liking to the idea. Actually that&amp;rsquo;s a bit of an understatement, the thread has gotten 3.3k points and lingered on the front page for half a week. And I wasn&amp;rsquo;t planning for it to go quite that public yet. It has quietly been online for a while, but it was only very recently it started to feel like it was really coming together. It wasn&amp;rsquo;t perfect, there was still a lot of jankiness and limitations that could have been fixed with more time. The index was half the size it should have been. Someone discovered it and shared it. It took off like a rocket, and I&amp;rsquo;m still at a loss for words at the reception it&amp;rsquo;s gotten. I have received so many encouraging comments, emails, offers of collaboration, a few have even joined the patreon. I&amp;rsquo;ve been working through all the messages and I aim to reply to them all, but it takes time. 
I&amp;rsquo;m very grateful for all of this, since I half thought I was alone in this.&lt;/p&gt;</description></item><item><title>New Solutions Creating Old Problems</title><link>https://www.marginalia.nu/log/21-new-solutions-old-problems/</link><pubDate>Tue, 14 Sep 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/21-new-solutions-old-problems/</guid><description>&lt;p&gt;I&amp;rsquo;ve spent some time the last week optimizing how the search engine identifies appropriate search results, putting far more consideration into where and how the search terms appear in the page when determining the order they are presented.&lt;/p&gt;
&lt;p&gt;Search-result relevance is a pretty difficult problem, but I do think the changes have brought the search engine in a very good direction.&lt;/p&gt;
&lt;p&gt;A bit simplified, I&amp;rsquo;m building tiered indices, ranging from&lt;/p&gt;</description></item><item><title>The Curious Case of the Dot-Com Link Farms</title><link>https://www.marginalia.nu/log/20-dot-com-link-farms/</link><pubDate>Thu, 09 Sep 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/20-dot-com-link-farms/</guid><description>&lt;p&gt;I spent some time today weeding out yet more link-farms from my search engine&amp;rsquo;s index.&lt;/p&gt;
&lt;p&gt;Typically what I would do is just block the subnet assigned to the VPS provider they&amp;rsquo;re on, and that does seem to work rather well. The cloud providers that don&amp;rsquo;t police what they host are almost always home to quite a lot of this stuff, so I don&amp;rsquo;t particularly mind scorching some earth in the name of a clean index.&lt;/p&gt;</description></item><item><title>The Small Website Discoverability Crisis</title><link>https://www.marginalia.nu/log/19-website-discoverability-crisis/</link><pubDate>Wed, 08 Sep 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/19-website-discoverability-crisis/</guid><description>&lt;p&gt;There are a lot of small websites on the Internet: Interesting websites, beautiful websites, unique websites.&lt;/p&gt;
&lt;p&gt;Unfortunately they are incredibly hard to find. You cannot find them on Google or Reddit, and while you can stumble onto them with my search engine, it is not in a very directed fashion.&lt;/p&gt;
&lt;p&gt;It is an unfortunate state of affairs. Even if you do not particularly care for becoming the next big thing, it&amp;rsquo;s still discouraging to put work into a website and get next to no traffic beyond the usual bots.&lt;/p&gt;</description></item><item><title>Soaring High</title><link>https://www.marginalia.nu/log/18-soaring-high/</link><pubDate>Thu, 02 Sep 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/18-soaring-high/</guid><description>&lt;p&gt;I&amp;rsquo;m currently indexing with my search engine. This isn&amp;rsquo;t an always-on sort of an affair, but rather something I turn on and off as it tends to require at least some degree of babysitting.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve also been knocked out by the side-effects of the vaccine shot I got the other day, so it&amp;rsquo;s been mostly hands-off &amp;ldquo;parenting&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;What I&amp;rsquo;m trying to figure out is just how far I can take it. I really don&amp;rsquo;t know. I took some backups and just let it do its thing relatively unmonitored.&lt;/p&gt;</description></item><item><title>Git Isn't A Web Service</title><link>https://www.marginalia.nu/log/17-git-isnt-a-web-service/</link><pubDate>Sat, 28 Aug 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/17-git-isnt-a-web-service/</guid><description>&lt;p&gt;This is an expansion on a comment I left on Lettuce&amp;rsquo;s gemlog post, &amp;ldquo;Personal Experiences and Opinions on Version Control Software&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve seen similar questions posed several times recently, in essence people searching for a good git provider.&lt;/p&gt;
&lt;p&gt;The thing is you don&amp;rsquo;t need a git provider. Git is a shell command, and you can host a server yourself with almost no extra work. You can even host it off a system you don&amp;rsquo;t have administrative access to.&lt;/p&gt;</description></item><item><title>Cursed Motivation</title><link>https://www.marginalia.nu/log/16-cursed-motivation/</link><pubDate>Fri, 27 Aug 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/16-cursed-motivation/</guid><description>&lt;p&gt;A question I often see asked is one along the lines of&lt;/p&gt;
&lt;blockquote&gt;
&lt;pre&gt;&lt;code&gt;How do I motivate myself to do (something)
&lt;/code&gt;&lt;/pre&gt;
&lt;/blockquote&gt;
&lt;p&gt;&amp;hellip; where something may be eat healthier, go to the gym, work on some project, study hard, &amp;amp;c.&lt;/p&gt;
&lt;p&gt;This idea of motivation is interesting. I think it in part comes from the school system, where teachers and parents often talk about motivating the children to study, perhaps with some sort of reward system. I haven&amp;rsquo;t been able to pinpoint exactly who introduced the idea, but my hunch is based on never seeing the particular usage of the word in a book printed before the late 20th century.&lt;/p&gt;</description></item><item><title>Stages of Being</title><link>https://www.marginalia.nu/log/15-stages-of-being/</link><pubDate>Mon, 23 Aug 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/15-stages-of-being/</guid><description>&lt;p&gt;@sdfgeoff asked an interesting question on station just a while ago&lt;/p&gt;
&lt;blockquote&gt;
&lt;pre&gt;&lt;code&gt; How much of your lives do you spend living (or watching) someone elses?
&lt;/code&gt;&lt;/pre&gt;
&lt;/blockquote&gt;
&lt;p&gt;It reminded me of an interesting tool kit for understanding being.&lt;/p&gt;
&lt;p&gt;The neoplatonists describe a hierarchy of being. This is sometimes attributed to Renaissance enfant terrible Pico della Mirandola&amp;rsquo;s Oration on the Dignity of Man, but it&amp;rsquo;s given the most cursory mention. I&amp;rsquo;ve attached the quote at the bottom of the post. Plato&amp;rsquo;s Republic is probably a better source, even if it doesn&amp;rsquo;t draw up the hierarchy quite in this fashion.&lt;/p&gt;</description></item><item><title>Enter the Circle of Blame</title><link>https://www.marginalia.nu/log/14-enter-the-circle-of-blame/</link><pubDate>Sun, 15 Aug 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/14-enter-the-circle-of-blame/</guid><description>&lt;p&gt;So that IPCC report, huh. It&amp;rsquo;s provoked interesting behavior in people. I&amp;rsquo;ll skip over the minority that deny the findings; many people seem to agree that the report isn&amp;rsquo;t great news. Then they stop to look around and start pointing fingers.&lt;/p&gt;
&lt;p&gt;In summary:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The producers blame the consumers for making the wrong purchases.&lt;/li&gt;
&lt;li&gt;The consumers blame the producers for producing the wrong things.&lt;/li&gt;
&lt;li&gt;The voters blame the politicians for incorrect policies.&lt;/li&gt;
&lt;li&gt;The politicians blame the voters for expressing the wrong wants.&lt;/li&gt;
&lt;li&gt;The right blames the market for not adapting fast enough.&lt;/li&gt;
&lt;li&gt;The left blames capitalism for being shortsighted.&lt;/li&gt;
&lt;li&gt;The dogs blame the cats and the cats blame the mice.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We can run around this endless circle of blame until it&amp;rsquo;s 2121, and we still won&amp;rsquo;t have gotten any closer to finding a solution. What we&amp;rsquo;re looking for is some witch to burn, someone to be really mad at for causing us grief; but what we need is to change our behavior. Don&amp;rsquo;t forget: The producers are people. The consumers are people. The voters are people. The politicians are people. The rich, the poor, the right, the left, the market, capitalism; it&amp;rsquo;s all people.&lt;/p&gt;</description></item><item><title>Rendered static HTML</title><link>https://www.marginalia.nu/log/13-static-html/</link><pubDate>Fri, 13 Aug 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/13-static-html/</guid><description>&lt;p&gt;The technological choices we make determine the rules we have to abide by.&lt;/p&gt;
&lt;p&gt;If every page load incurs hundreds of database calls on the server, and 30 seconds of javascripting on the front-end, then obviously you need to reduce the number of page loads to a minimum. They are frustrating for the user and expensive for the server. This makes the front-end even slower and more stateful, and so the urgency for reducing page loads increases even further.&lt;/p&gt;</description></item><item><title>Bye, Bye, Gmail</title><link>https://www.marginalia.nu/log/12-bye-bye-gmail/</link><pubDate>Wed, 04 Aug 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/12-bye-bye-gmail/</guid><description>&lt;p&gt;I finally got around to moving @marginalia.nu off gmail. I&amp;rsquo;ve been planning to do so for a while, and vacillated between self-hosting and using a provider. I ended up going for the latter. Even though I do like to self-host my stuff as much as possible, email servers seem like a lot of work. And I got a VPS account included in the price, which is nice. Means I can use it to do some off-site backups without having to use dropbox or similar.&lt;/p&gt;</description></item><item><title>Dying, Every Day (Re: Last times)</title><link>https://www.marginalia.nu/log/11-dying-every-day/</link><pubDate>Wed, 04 Aug 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/11-dying-every-day/</guid><description>&lt;p&gt;Dece&amp;rsquo;s post &amp;ldquo;last times&amp;rdquo; made me associate to one of my favorite thoughts from Roman philosopher Seneca, who was counting his days having fallen out of favor with Emperor Nero.&lt;/p&gt;
&lt;p&gt;In the first of his moral epistles to Lucilius, he asks:&lt;/p&gt;
&lt;blockquote&gt;
&lt;pre&gt;&lt;code&gt; Quem mihi dabis [...] qui intellegat se cotidie mori?
&lt;/code&gt;&lt;/pre&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;pre&gt;&lt;code&gt; Who can you show me [...] that understands he is dying every day?
&lt;/code&gt;&lt;/pre&gt;
&lt;/blockquote&gt;
&lt;p&gt;It&amp;rsquo;s a fascinating and useful reversal of perspective. Our final day is not the day our life is suddenly taken away from us, but the last of our allotted days. Every preceding day has already been marked by our dying. Every day is a day that will never return, every moment is sand irreversibly flowing through the hour glass.&lt;/p&gt;</description></item><item><title>The Astrolabe Part II: The Magic Power of Sampling Bias</title><link>https://www.marginalia.nu/log/10-astrolabe-2-sampling-bias/</link><pubDate>Tue, 03 Aug 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/10-astrolabe-2-sampling-bias/</guid><description>&lt;p&gt;As I have mentioned earlier, perhaps the biggest enemy of PageRank is the hegemony of PageRank-style algorithms. Once an algorithm like that becomes not only dominant, but known, it also creates a market for leveraging its design particulars.&lt;/p&gt;
&lt;p&gt;Homogenous ecosystems are almost universally bad. It doesn&amp;rsquo;t really matter if it&amp;rsquo;s every computer running Windows XP, or every farmer planting genetically identical barley, what you get is extreme susceptibility to exploitation.&lt;/p&gt;</description></item><item><title>The System Upgrade</title><link>https://www.marginalia.nu/log/09-system-upgrade/</link><pubDate>Fri, 30 Jul 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/09-system-upgrade/</guid><description>&lt;p&gt;Early this winter, when I set up the server that would eventually become marginalia.nu, I did so in order to try out some technology I thought looked cool (proxmox, zfs), and stuff I was exposed to at work and didn&amp;rsquo;t really see the point of so as to see if we could get on better terms with if I had more control (kubernetes).&lt;/p&gt;
&lt;p&gt;I based the system on Proxmox, a Linux-based virtualization server, which ran a series of virtual machines and containers.&lt;/p&gt;</description></item><item><title>Whatever happened to the Memex?</title><link>https://www.marginalia.nu/log/08-whatever-happened-to-the-memex/</link><pubDate>Wed, 28 Jul 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/08-whatever-happened-to-the-memex/</guid><description>&lt;p&gt;I stumbled upon the Memex, which is a spiritual predecessor to hypertext technology. It was supposed to be a sort of personal data store that allows the user to link and annotate various documents in order to produce a sort of external memory, a private knowledge bank that associates ideas in a similar way a human brain does. The operator could also save and share associative trails through the information.&lt;/p&gt;</description></item><item><title>Local Backlinks</title><link>https://www.marginalia.nu/log/07-local-backlinks/</link><pubDate>Mon, 26 Jul 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/07-local-backlinks/</guid><description>&lt;p&gt;Maintaining links is difficult. My gemini server doesn&amp;rsquo;t have a lot of pages, but already maintaining links between relevant pages is growing more tedious by the page. It&amp;rsquo;s going to become untenable soon.&lt;/p&gt;
&lt;p&gt;In part inspired by Antenna, I had the idea of extracting local backlinks, and automatically appending them to the pages that are linked. That way all local links are effectively bidirectional. If a new post links to an old post, the old post automatically links to the new post. Old pages will thus over time accumulate more links to new pages without manual maintenance.&lt;/p&gt;</description></item><item><title>Index Optimizations</title><link>https://www.marginalia.nu/log/06-optimization/</link><pubDate>Fri, 23 Jul 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/06-optimization/</guid><description>&lt;blockquote&gt;
&lt;p&gt;Don&amp;rsquo;t chase small optimizations&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Said some smart person at some particular time, probably. If not, he ought to have; if worse comes to worst, I&amp;rsquo;m declaring it now. The cost of 2% here and 0.5% there is high, and the benefits are (by definition) low.&lt;/p&gt;
&lt;p&gt;I have been optimizing Astrolabe, my search engine. The different kind of Search Engine Optimization. I&amp;rsquo;ve spent a lot of time recently doing soft optimization, improving the quality and relevance of search results, to great results. I&amp;rsquo;ll write about that later.&lt;/p&gt;</description></item><item><title>The Mind's A Field</title><link>https://www.marginalia.nu/log/05-minds-field/</link><pubDate>Sun, 18 Jul 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/05-minds-field/</guid><description>&lt;p&gt;I find if I do not regularly plant interesting thoughts in my mind, it will rarely grow them spontaneously. I need to read interesting books, or have interesting conversations; if I do not, I&amp;rsquo;ll look back at my ideas and find I haven&amp;rsquo;t really had any in a very long time. Thoughts will grow whether I take care to plant them or not, but what grows if I have been careless is weeds.&lt;/p&gt;</description></item><item><title>On Link Farms</title><link>https://www.marginalia.nu/log/04-link-farms/</link><pubDate>Wed, 14 Jul 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/04-link-farms/</guid><description>&lt;p&gt;I&amp;rsquo;m in the midst of rebuilding the index of my search engine to allow for better search results, and I&amp;rsquo;ve yet again found need to revisit how I handle link farms. It&amp;rsquo;s an ongoing arms race between search engines and link farmers to adjust (and circumvent) the detection algorithms. 
Detection and mitigation of link farms is something I&amp;rsquo;ve found I need to modify very frequently, as they are constantly evolving to look more like real websites.&lt;/p&gt;</description></item><item><title>Writing for Reading</title><link>https://www.marginalia.nu/log/03-writing-for-reading/</link><pubDate>Mon, 12 Jul 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/03-writing-for-reading/</guid><description>&lt;p&gt;I&amp;rsquo;m struck by how easy it is to read things on Gemini. Not just skimming, but actual reading, like you would a book. Not everything written is great, but it&amp;rsquo;s usually worth reading all the same. Venturing out into the land of contemporary HTML is different, and it&amp;rsquo;s not a subtle difference.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve written before about the plague of inline links on Wikipedia, and this is largely a continuation of that discourse looking at other design elements.&lt;/p&gt;</description></item><item><title>Re: To unit test or not to unit test, that is the question</title><link>https://www.marginalia.nu/log/02-re-tests/</link><pubDate>Thu, 08 Jul 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/02-re-tests/</guid><description>&lt;ul&gt;
&lt;li&gt;&lt;a href="gemini://gemini.conman.org/boston/2021/07/07.1"&gt;gemini://gemini.conman.org/boston/2021/07/07.1&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I felt the need to add some thoughts tangentially related to this post by Sean Conner.&lt;/p&gt;
&lt;h2 id="why-do-we-hold-unit-tests-in-such-high-regard"&gt;Why do we hold unit tests in such high regard?&lt;/h2&gt;
&lt;p&gt;Enterprise software development (Agile with a TM at the end), and to an increasing degree open source software development, have really accepted the Unit Test as personal lord and savior deep within their souls. If it doesn&amp;rsquo;t have coverage, it&amp;rsquo;s bad. If it has coverage, it&amp;rsquo;s good.&lt;/p&gt;</description></item><item><title>The Astrolabe Part I: Lenscraft</title><link>https://www.marginalia.nu/log/01-astrolabe/</link><pubDate>Wed, 07 Jul 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/01-astrolabe/</guid><description>&lt;p&gt;Something you probably know, but may not have thought about a lot is that the Internet is large. It is unbelievably vast beyond any human comprehension. What you think of as &amp;ldquo;The Internet&amp;rdquo; is a tiny fraction of that vast space with its billions upon billions of websites.&lt;/p&gt;
&lt;p&gt;We use various technologies, such as link aggregators and search engines, to find our way and make sense of it all. Our choices in navigational aids also shape the experience we have of the Internet. These convey a warped sense of what the Internet truly is. There is no way of not doing that. Since nothing can communicate the raw reality of the internet to a human mind, concessions need to be made. Some content needs to be promoted, other needs to be de-emphasized. An objective rendering is a pipe dream, even a fair random sample is a noisy incomprehensible mess.&lt;/p&gt;</description></item><item><title>Thoughts on the linkpocalypse</title><link>https://www.marginalia.nu/log/00-linkpocalypse/</link><pubDate>Wed, 30 Jun 2021 00:00:00 +0000</pubDate><guid>https://www.marginalia.nu/log/00-linkpocalypse/</guid><description>&lt;p&gt;For a long while, I have been puzzled by the strangest problem: My attention span is really bad when I use a computer. I&amp;rsquo;m an avid reader of esoteric books. I have (recently) read the notoriously dry Confessions of Saint Augustine in print, it was a slog for certain, but it really doesn&amp;rsquo;t compare with the struggles I have when it comes to bringing myself to reading even a few paragraphs of text on a screen. It surely can&amp;rsquo;t be the screen itself, can it? That doesn&amp;rsquo;t seem plausible.&lt;/p&gt;</description></item></channel></rss>