<p>Drop <a href="/tags/626/" rel="tag">#626</a> (2025-03-24): Monday Morning Grab Bag</p><p>Djot; Bean CMS; The GitHub ⭐️ Clean Slate Protocol</p><p>(Two missed Drops due to a week of brutal and Constitution-destroying EOs, the launch of a <a href="https://47-watch.com/blog/posts/2025/2025-03-23-trump-administration-week-9/" rel="nofollow">47 Watch podcast</a>, work on a — for now — seekrit datavis collaboration with an amazing human, and the last couple days before shunting <a href="/tags/4/" rel="tag">#4</a> back up to UMaine.)</p><p>Technically, this is only a grab bag due to the last section. The first two are joined at the hip, and now I have a fairly strong urge to build an R package for the resource in the first section.</p><p>TL;DR</p><p>(This is an AI-generated summary of today’s Drop using Ollama + llama 3.2 and a custom prompt.)</p><p>Djot is a markup language that builds on CommonMark’s foundation with improved syntax and parsing, offering features like definition lists, footnotes, tables, and arbitrary attributes while parsing in linear time without backtracking (<a href="https://djot.net" rel="nofollow"><span class="invisible">https://</span>djot.net</a>).<br>Bean CMS is a minimal content management system built with redbean that stores posts, users, and session information in local SQLite files, providing a spartan but functional interface for content management (<a href="https://github.com/kevinfiol/beancms" rel="nofollow"><span class="invisible">https://</span>github.com/kevinfiol/beancms</a>).<br>The GitHub Star Clean Slate Protocol suggests removing all GitHub stars to avoid spammy solicitations, using a simple one-liner provided in the last section.</p><p>Djot</p><p><a href="https://djot.net" rel="nofollow">Djot</a> (<a href="https://github.com/jgm/djot" rel="nofollow">GH</a>) is a markup language that tries to do what Markdown never quite 
pulled off: be simple and precise. It builds on <a href="https://github.com/commonmark" rel="nofollow">CommonMark</a>’s foundation, but smooths out the edges—both in terms of syntax and the way it gets parsed. Djot grew out of John MacFarlane’s essay <a href="https://johnmacfarlane.net/beyond-markdown.html" rel="nofollow">Beyond Markdown</a>, and it reads like a direct response to the awkward corners and performance issues that have haunted Markdown from the start.</p><p>Right out of the box, Djot supports some things you’d expect from a grown-up markup language in 2025, like:</p><p>definition lists<br>footnotes<br>tables<br>extra inline formatting (insertions, deletions, highlights, superscript, subscript)<br>math support<br>smart punctuation<br>block and inline containers<br>arbitrary attributes on anything</p><p>These are things Markdown always made you hack around or just skip entirely.</p><p>Djot’s design is built around a few core ideas that fix longstanding problems with Markdown parsing:</p><p>it parses in linear time, which means it doesn’t waste cycles trying to guess context or backtrack like CommonMark sometimes does<br>it also parses inline elements locally, so whether something is a link doesn’t depend on a reference label defined somewhere else in the document. This makes Djot friendlier to syntax highlighters and editors that don’t want to parse an entire file just to figure out how to color one line<br>it draws a hard line: _ means emphasis, * means strong. 
No more “is this italic or bold or both?” ambiguity.<br>it is consistent: syntax behaves the same inside list items, block quotes, and anywhere else—no special rules tucked away in the corners<br>everything can have attributes: you can attach metadata to any element without needing special syntax extensions</p><p>The format also tightens up a few places where Markdown has always been loose or frustrating:</p><p>you can’t accidentally break paragraphs with block elements<br>lists behave more predictably—they’re driven by indentation, not whitespace guesswork<br>headings can span multiple lines without special rules<br>raw HTML is opt-in, not just pass-through<br>reference links are parsed locally and are case-sensitive, not scattered and vague</p><p>If you want to try Djot, there’s a JavaScript implementation ready to go:</p><p>$ npm install -g @djot/djot<br>$ djot --help</p><p>You can also find implementations in Lua, Rust, Go, Haskell, even Prolog—because of course someone did it in Prolog.</p><p>If you already use pandoc, Djot fits right in. 
Here’s how to convert a Markdown file to Djot:</p><p>$ pandoc mydoc.md -f gfm -t json | djot -f pandoc -t djot > mydoc.dj</p><p>And here’s how to go from Djot to a Word doc:</p><p>$ djot -t pandoc mydoc.dj | pandoc -f json -s -t docx -o mydoc.docx</p><p>Djot files typically use the .dj extension.</p><p>The syntax isn’t locked down yet — there may be small changes as it evolves — but the core is stable, and the reference implementation in JavaScript is complete and usable right now.</p><p>There’s tons more info in the README; there are Go, Rust, TypeScript, and other implementations; and you can quickly get a feel for it by using <a href="https://djot.net/playground/" rel="nofollow">the playground</a> and keeping <a href="https://github.com/jgm/djot/tree/main/doc" rel="nofollow">the docs</a> handy.</p><p>Bean CMS</p><p>We introduced Djot in the first section to set up this section.</p><p><a href="https://github.com/kevinfiol/beancms" rel="nofollow">Bean CMS</a> is a micro-CMS built with <a href="https://redbean.dev/" rel="nofollow">redbean</a>.</p><p>A quick refresher on redbean: it is an open-source, single-file web server that runs natively on six operating systems across both AMD64 and ARM64 architectures. It’s written in ANSI C and embeds Lua 5.4, MbedTLS, and SQLite into a single executable.</p><p>There is not much to “tell” about Bean CMS — it is a bare-bones content management system. 
Posts, users, and session information are stored in local SQLite files, and images used in posts are referenced from a local directory.</p><p>The binary is meant to be placed in the top-level directory where all the databases and content will be stored, so you should do something like:</p><p>$ cd /some/path/where/bean/content/goes<br>$ ./beancms -D ./ -p PORT -b 127.0.0.1</p><p>Config options are minimal but sufficient.</p><p>Code block rendering is decent.</p><p>Post editing is spartan but does what you need it to do.</p><p>And, there’s even an admin panel (at /admin).</p><p>It’s built on <a href="https://github.com/pkulchenko/fullmoon" rel="nofollow">fullmoon</a>, a redbean-based web framework (which we’ll cover in some future Drop), with very readable <a href="https://github.com/kevinfiol/beancms/tree/master/src" rel="nofollow">Lua and template files</a>.</p><p>I fronted an instance of it with a Caddy reverse proxy, and you can [re]read the Djot section <a href="https://bean.hrbrmstr.app/hrbrmstr/djot" rel="nofollow">over there</a>.</p><p>It appears to have been born in early December last year, and the author — <a href="https://kevinfiol.com/" rel="nofollow">Kevin Fiol</a> — started using signed commits towards the end of December (which are still spiffy, but with all the GitHub account takeovers, I’m not sure I trust anything signed-but-still-hosted-on-GitHub anymore).</p><p>Kevin also has his own <a href="https://kevinfiol.com/blog/a-tour-of-beancms/" rel="nofollow">Bean ’splainer</a> up.</p><p>This has tons of potential to be a personal note-taking system, especially since it’s all backed by SQLite.</p><p>I’ll definitely be keeping an eye on this one.</p><p>The GitHub ⭐️ Clean Slate Protocol</p><p>I whined on the socials about yet another skeezy, spammy “Hey! 
I saw you [starred XYZ repo|were fond of XYZ] and wanted to let you know about this other super cool ZYZ.” unsolicited email.</p><p>First off: DO NOT DO THIS NO MATTER HOW COOL YOU THINK YOUR PROJECT IS.</p><p>At least do not do it to me, as you will be marked as a spammer, blocked permanently wherever I can, shunned in person, and forced to spoon-feed ketchup to Donald Trump for eternity when you leave this mortal coil.</p><p>Anyway.</p><p>It got a few likes, and friend-of-the-Drop Adam S. (I did not ask permission to use Adam’s handle or full name) mentioned <a href="https://github.com/tpkahlon/github-unstar" rel="nofollow">github-unstar</a> a day or so later. It’s a neat JS-based CLI tool to nuke all your GitHub stars.</p><p>It honestly never occurred to me to unstar everything I had daftly starred. Adam: you are a genius!</p><p>But, that tool did not work for me because it did not handle the edge case where your record of a GitHub star has rotted because the repo’s owner either no longer has a GitHub account or has deleted said repo. (I keep thinking there’s a way to take malicious advantage of this condition, but it’s so niche that I will likely not spend much time noodling it.) 
With all their resources, you’d think Microsoft would have some process to help you deal with GH star link rot, but they are woefully incompetent and lack attention to detail in so many areas that I am not even remotely surprised they do not.</p><p>Where there’s a will, there’s a Bash script, and we don’t need the bloat of an NPM package anyway if we have GitHub’s <a href="https://docs.github.com/en/github-cli/github-cli/quickstart" rel="nofollow">gh</a> utility handy (which I do since I still have to use GitHub for $WORK).</p><p>You can easily get info about all your starred repos via:</p><p>$ gh api --paginate user/starred</p><p>That will DoS your terminal, so perhaps you just want to see the URLs:</p><p>$ gh api --paginate user/starred --jq '.[].html_url'</p><p>Ideally, you’d back up all your stars (to put into some other system, like Raindrop.io) first, but you can go all Iron Man and just destroy everything:</p><p>$ gh api --paginate user/starred --jq '.[].full_name' | xargs -I{} sh -c 'echo "💥 {}"; gh api -X DELETE user/starred/{}; sleep 0.5'</p><p>Bye. Bye. 
Spammers.</p><p>FIN</p><p>Remember, you can follow and interact with the full text of The Daily Drop’s free posts on:</p><p>🐘 Mastodon via <span class="h-card"><a href="https://dailydrop.hrbrmstr.dev" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>dailydrop.hrbrmstr.dev</span></a></span><br>🦋 Bluesky via <a href="https://bsky.app/profile/dailydrop.hrbrmstr.dev.web.brid.gy" rel="nofollow" class="ellipsis" title="bsky.app/profile/dailydrop.hrbrmstr.dev.web.brid.gy"><span class="invisible">https://</span><span class="ellipsis">bsky.app/profile/dailydrop.hrb</span><span class="invisible">rmstr.dev.web.brid.gy</span></a></p><p>Also, refer to:</p><p><a href="https://dailydrop.hrbrmstr.dev/2024/12/04/drop-565-2024-12-04-all-strings-attached/" rel="nofollow">this post</a>, and<br><a href="https://dailydrop.hrbrmstr.dev/2024/12/08/bonus-drop-68-2024-12-08-all-strings-attached-cli-version/" rel="nofollow">this post</a></p><p>to see how to access a regularly updated database of all the Drops with extracted links, and full-text search capability. ☮️</p><p><a href="/tags/4/" rel="tag">#4</a></p>
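<p>Back to the Djot section for a moment: the “_ means emphasis, * means strong” rule and the linear-time claim are easier to appreciate with something concrete. Below is a toy Python sketch, not Djot’s real parser (actual Djot has more rules about which delimiters can open or close, plus links, code spans, attributes, and so on). It only illustrates how a fixed delimiter-to-tag mapping lets a single left-to-right pass with a stack resolve emphasis without backtracking. The render_inline name and its handling of unmatched delimiters are assumptions for illustration.</p>

```python
import html

# Toy mapping mirroring Djot's rule: _ is always emphasis, * is always strong.
TAGS = {"_": "em", "*": "strong"}


def render_inline(text: str) -> str:
    """Render _emphasis_ and *strong* to HTML in one left-to-right pass.

    Hypothetical illustration only, NOT the Djot algorithm. Each character is
    visited once, and an unmatched delimiter is simply left as literal text,
    so the parser never has to go back and re-read earlier input.
    """
    out: list[str] = []                  # rendered pieces, in order
    openers: list[tuple[str, int]] = []  # (delimiter, index of its slot in out)
    for ch in text:
        if ch in TAGS:
            if openers and openers[-1][0] == ch:
                # Closes the most recent opener of the same kind.
                delim, slot = openers.pop()
                out[slot] = f"<{TAGS[delim]}>"
                out.append(f"</{TAGS[delim]}>")
            else:
                # Tentative opener: keep the literal character until (if ever) closed.
                openers.append((ch, len(out)))
                out.append(ch)
        else:
            out.append(html.escape(ch))
    return "".join(out)


if __name__ == "__main__":
    print(render_inline("_emphasis_ and *strong*"))
    print(render_inline("*bold with _emphasis_ inside*"))
    print(render_inline("an unclosed _underscore stays literal"))
```

<p>An unclosed delimiter just stays literal, so nothing earlier in the line ever has to be re-read.</p>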
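<p>For the “back up your stars first” step in the Clean Slate section, here is a minimal Python sketch that turns saved gh api --paginate user/starred output into a CSV of names, URLs, and descriptions for importing elsewhere. Assumptions: depending on your gh version, paginated output may arrive as one JSON array or as several arrays back to back, so the parser accepts either; the column choice is arbitrary, and you should check which format your bookmarking tool (Raindrop.io or otherwise) actually imports.</p>

```python
import csv
import io
import json
from typing import Iterator


def iter_json_values(raw: str) -> Iterator[object]:
    """Yield every top-level JSON value in raw, whether there is one or several back to back."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # Skip any whitespace separating one JSON value from the next.
        while pos < len(raw) and raw[pos].isspace():
            pos += 1
        if pos >= len(raw):
            return
        value, pos = decoder.raw_decode(raw, pos)
        yield value


def stars_to_csv(raw: str) -> str:
    """Flatten saved `gh api --paginate user/starred` output into CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["full_name", "html_url", "description"])
    for page in iter_json_values(raw):
        # Each page is a JSON array of repository objects.
        for repo in page:
            writer.writerow(
                [repo["full_name"], repo["html_url"], repo.get("description") or ""]
            )
    return buf.getvalue()
```

<p>Hypothetical usage: save the raw output with gh api --paginate user/starred > stars.json, then write stars_to_csv(open("stars.json").read()) to a file before running the destructive one-liner.</p>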
<p>I Am The Very Modelfile Of A Modern Workflow General; The Impending ‘Reimagine’ Nightmare;</p><p>We return to the semi-regular AI-focused Drop with a quick look at how to (quickly and easily!) build a custom Ollama model that powers a small, useful tool.</p><p>Programming note: due to driving <a href="/tags/4/" rel="tag">#4</a> to college Friday, there may not be a Drop. But, if the round trip is not too taxing, I’ll likely get one out in the late afternoon.</p><p>TL;DR</p><p>(This is an AI-generated summary of today’s Drop using Sonnet via Perplexity.)</p><p>A guide on creating a custom Ollama model using a Modelfile to generate concise names for CVE vulnerabilities, similar to CISA’s KEV catalog entries (<a href="https://github.com/ollama/ollama/blob/main/docs/modelfile.md" rel="nofollow" class="ellipsis" title="github.com/ollama/ollama/blob/main/docs/modelfile.md"><span class="invisible">https://</span><span class="ellipsis">github.com/ollama/ollama/blob/</span><span class="invisible">main/docs/modelfile.md</span></a>)<br>Discussion of Google’s new Pixel 9 “Reimagine” feature, which allows users to manipulate photos using AI, raising concerns about the potential misuse of such technology (<a href="https://www.theverge.com/2024/8/21/24224084/google-pixel-9-reimagine-ai-photos" rel="nofollow" class="ellipsis" title="www.theverge.com/2024/8/21/24224084/google-pixel-9-reimagine-ai-photos"><span class="invisible">https://</span><span class="ellipsis">www.theverge.com/2024/8/21/242</span><span class="invisible">24084/google-pixel-9-reimagine-ai-photos</span></a>)<br>Introduction to “The Inference,” a new editorial project by Danny Palmer published by Darktrace, exploring the impact of AI on cybersecurity and society (<a href="https://darktrace.com/the-inference" rel="nofollow"><span class="invisible">https://</span>darktrace.com/the-inference</a>)</p><p>I Am The Very 
Modelfile Of A Modern Workflow General</p>Photo by Pixabay on <a href="https://www.pexels.com/photo/selective-focus-photography-of-cairn-stone-268018/" rel="nofollow">Pexels.com</a><p>Longtime readers know I’m <a href="/tags/notafan/" rel="tag">#notAFan</a> of the “Open AI tax”. TL;DR for new readers is that I believe it is yugely important to ensure everyone has access to LLM/GPT tooling. These tools/services are not going away anytime soon; and, not knowing how to work with them puts other folks at a needless advantage. Ollama does a great job helping to level the playing field.</p><p>Ollama supports adding layers on top of existing models via something called a <a href="https://github.com/ollama/ollama/blob/main/docs/modelfile.md" rel="nofollow">Modelfile</a>. It’s a plain text file that lets you add in some parameters, prompts, examples, etc. so you don’t have to shunt them along with each new incantation. The format for these is, essentially:</p><p># comment<br>INSTRUCTION arguments</p><p>Some key instructions include:</p><p>FROM: Identifies the base model (mandatory)<br>PARAMETER: Customizes model behavior through various settings<br>TEMPLATE: Defines the prompt template sent to the model<br>SYSTEM: Sets up system messages<br>ADAPTER: Applies (Q)LoRA adapters to the model<br>LICENSE: Specifies legal licenses<br>MESSAGE: Adds preset message histories</p><p>You can inspect the configuration of models you download and use pretty simply:</p><p>$ ollama show phi:latest --modelfile<br># Modelfile generated by "ollama show"<br># To build a new Modelfile based on this, replace FROM with:<br># FROM phi:latest<br>FROM /path/to/.ollama/models/blobs/sha256-04778965089b91318ad61d0995b7e44fad4b9a9f4e049d7be90932bf8812e828<br>TEMPLATE "{{ if .System }}System: {{ .System }}{{ end }}User: {{ .Prompt }}Assistant:"<br>SYSTEM A chat between a curious user and an artificial intelligence assistant. 
The assistant gives helpful answers to the user's questions.<br>PARAMETER stop User:<br>PARAMETER stop Assistant:<br>PARAMETER stop System:<br>LICENSE """MIT License … """</p><p>(They work in a similar fashion to Dockerfiles.)</p><p>You can <a href="https://www.pacenthink.io/post/understanding-modelfile-in-ollama/" rel="nofollow">go here</a> for a more in-depth explanation of the instructions and values. I’m just going to briefly show how to make one to solve a fun “problem”.</p><p>I talk quite a bit about CISA’s <a href="https://www.cisa.gov/known-exploited-vulnerabilities-catalog" rel="nofollow">Known Exploited Vulnerabilities Catalog</a> (KEV), and one nice thing the Keepers of KEV do for us is create a short name for any vulnerability they add to the catalog. CVE entries only have a longer description associated with their identifier, like CVE-2024-38856’s “Apache OFBiz contains an incorrect authorization vulnerability that could allow remote code execution via a Groovy payload in the context of the OFBiz user process by an unauthenticated attacker.” The KEV entry for that uses the concise name “Apache OFBiz Incorrect Authorization Vulnerability”, making it much easier to reference (and it carries way more info than a bare CVE identifier).</p><p>At $WORK, we got it into our noggins to be as kind as the KEV Keepers and associate a concise name with any CVEs we display (and for an upcoming product/API feature). There is no way we were going to lovingly hand-craft those for a few hundred thousand entries. And, I’m loath to give any AI vendor money. So, we built a custom Phi model to do this for us! 
And we’re sharing it with y’all right here (we’re working on posting the CVE IDs with their concise names to GitHub when we’re done cleaning them up).</p><p>After a few iterations, here’s the Modelfile we ended up with:</p><p>FROM phi:latest<br>PARAMETER temperature 0.0<br>PARAMETER stop "\n"<br>PARAMETER num_predict 30<br>PARAMETER top_p 0.95<br>SYSTEM """You are an AI system specializing in generating concise short names for vulnerabilities described by CVEs. Your task is to convert verbose CVE descriptions into clear, descriptive titles that resemble entries in CISA's Known Exploited Vulnerabilities (KEV) catalog. Stop generating output immediately after providing the short name.<br>PROMPT Given a CVE description, create a short name similar to those used in CISA's KEV catalog. The short name should be concise, descriptive, and highlight the affected product or vulnerability type. Do not provide additional information or explanations. The response should be under 10 words and should end immediately after the short name.<br>Examples:<br>CVE description: A vulnerability in the TCP/IP stack of Cisco IOS XR Software could allow an unauthenticated, remote attacker to cause a denial of service (DoS) condition on an affected device.<br>Short name: Cisco IOS XR TCP/IP Stack DoS Vulnerability<br>CVE description: Microsoft Exchange Server Remote Code Execution Vulnerability. This CVE ID is unique from CVE-2021-26855, CVE-2021-26857, CVE-2021-26858, CVE-2021-27065.<br>Short name: Microsoft Exchange Server RCE Vulnerability<br>Now, create a short name for this CVE:"""</p><p>The temperature controls the “creativity” of the output, with 0.0 being fully deterministic. 
Even with that, we were getting some odd output behavior, so we added a stop sequence that halts generation as soon as it outputs a newline, capped output at 30 tokens with num_predict (I may change that to 20), and set top_p so it only makes high-probability “next token” choices.</p><p>Rather than muck with special tokens, I went with a basic prompt-with-few-shot-examples approach.</p><p>To make our new model, we just:</p><p>$ ollama create cve-shortener -f ./Modelfile</p><p>Then just try it out:</p><p>$ ollama run cve-shortener 'Topline Opportunity Form (aka XLS Opp form) before 2015-02-15 does not properly restrict access to database-connection strings, which allows attackers to read the cleartext version of sensitive credential and e-mail address information via unspecified vectors.'<br>Topline Opportunity Form (XLS Opp) Database Vulnerability</p><p>Like most LLM/GPT output in API contexts, you’ll need to add some guardrails for cleaning up results or retrying the prompt. (Yes, even with all the constraints in the Modelfile, this one still borks output every so often.)</p><p>On my aging Apple Silicon box, each call takes ~200-600ms, depending on the inputs and other GPU load on the system.</p><p>If you find yourself repeating prompts, this is a lightweight way to avoid doing so. And, if you have some similar, focused tasks that need doing, getting a hand from our AI overlords can help save quite a bit of time.</p><p>The Impending ‘Reimagine’ Nightmare</p>Photo by cottonbro studio on <a href="https://www.pexels.com/photo/balloon-on-a-shattered-window-5427546/" rel="nofollow">Pexels.com</a><p><p>Super quick section, since I believe Welch’s examples say it all better than any blathering from me.</p></p><p>Chris Welch, an editor over at The Verge, acquired one of Google’s new Pixel 9 devices, gave the new Reimagine feature a go, and posted some results on <a href="https://www.threads.net/@chriswelch/post/C-8LF4BOSAP" rel="nofollow">Threads</a> and <a href="https://www.theverge.com/2024/8/21/24224084/google-pixel-9-reimagine-ai-photos" rel="nofollow">The Verge</a>. This, to me, is a pretty terrifying new capability.</p><p>We’re already awash in deepfakes, photorealistic child image exploitation, advanced phishing, and shady political campaigns making it almost impossible to tell truth from fiction. The last thing we needed was to commodify this tech so anyone who can afford a certain class of portable glowing rectangles can get in on the gambit.</p><p>Let’s hope the majority of uses are benign/silly.</p><p>The Inference</p><p><p>Another quick section, as this content also speaks for itself pretty well.</p></p><p><a href="https://infosec.exchange/@dannyjpalmer#." rel="nofollow">Danny Palmer</a> is an excellent writer on cybersecurity topics. He has a new editorial project that’s being published by cybersecurity vendor Darktrace (yeah, that’s “a thing” in my line of work).</p><p><a href="https://darktrace.com/the-inference" rel="nofollow">The Inference</a> explores the impact of AI on the world around us, starting within the security operations center and expanding to business and society writ large. 
Features on it analyze data, trends, and perspectives from experts inside and outside of Darktrace to educate and inspire readers about the role that AI plays in enabling innovation and how it can be applied safely and securely.</p><p>I link to it since the first few pieces:</p><p><a href="https://darktrace.com/the-inference/ai-and-the-biggest-election-year-ever-are-we-ready" rel="nofollow">AI and the biggest election year ever – are we ready?</a><br><a href="https://darktrace.com/the-inference/it-support-engineering-in-the-age-of-ai" rel="nofollow">The role of IT support engineering in the age of AI</a><br><a href="https://darktrace.com/the-inference/in-conversation-with-professor-mike-woolridge" rel="nofollow">In conversation with Professor Mike Woolridge</a><br><a href="https://darktrace.com/the-inference/malware-as-a-service-what-you-need-to-know" rel="nofollow">Malware-as-a-Service: what you need to know about this persistent cyber threat</a></p><p>were very well crafted, and I thought more than a few readers would want to get this into their RSS feeds.</p><p>(As usual, I get nothing for Dropping a link to this post, save for providing y’all with something I found interesting.)</p><p>FIN</p><p>Remember, you can follow and interact with the full text of The Daily Drop’s free posts on Mastodon via <span class="h-card"><a href="https://dailydrop.hrbrmstr.dev" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>dailydrop.hrbrmstr.dev</span></a></span> ☮️</p><p><a href="https://dailydrop.hrbrmstr.dev/2024/08/29/drop-523-2024-08-29-happy-thursdai/" rel="nofollow" class="ellipsis" title="dailydrop.hrbrmstr.dev/2024/08/29/drop-523-2024-08-29-happy-thursdai/"><span class="invisible">https://</span><span class="ellipsis">dailydrop.hrbrmstr.dev/2024/08</span><span class="invisible">/29/drop-523-2024-08-29-happy-thursdai/</span></a></p><p><a href="/tags/4/" rel="tag">#4</a></p>
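<p>On the guardrails point in the Modelfile section: here is a minimal Python sketch of what “cleaning up results or retrying the prompt” could look like. It calls a local Ollama server’s /api/generate endpoint (default port 11434, non-streaming) for the cve-shortener model created above, keeps only the first line of the reply, strips an echoed “Short name:” label and stray quotes or periods, and rejects empty or overlong answers. The helper names, word limit, and attempt count are illustrative assumptions. Since the Modelfile pins temperature to 0.0, a plain retry may well return the same output, so a real pipeline might also flag persistent failures for manual review.</p>

```python
import json
import re
import urllib.request
from typing import Callable, Optional

# Ollama's default local REST endpoint; adjust if your server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def ollama_generate(prompt: str, model: str = "cve-shortener") -> str:
    """Send one non-streaming request to Ollama's /api/generate and return the text."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["response"]


def clean_short_name(raw: str, max_words: int = 10) -> Optional[str]:
    """Tidy model output into a single short name, or return None if it looks borked."""
    lines = raw.strip().splitlines()
    if not lines:
        return None
    # Keep the first line and drop an echoed "Short name:" label, if present.
    name = re.sub(r"^short name:\s*", "", lines[0].strip(), flags=re.IGNORECASE)
    name = name.strip(" \"'.")
    words = name.split()
    # Loosely mirrors the "under 10 words" instruction in the Modelfile prompt.
    if not words or len(words) > max_words:
        return None
    return " ".join(words)


def shorten(
    description: str,
    generate: Callable[[str], str] = ollama_generate,
    attempts: int = 3,
) -> Optional[str]:
    """Ask the model for a short name, retrying when the cleanup check fails."""
    for _ in range(attempts):
        name = clean_short_name(generate(description))
        if name is not None:
            return name
    return None
```

<p>Passing generate as a parameter keeps the cleanup and retry logic testable without a running Ollama server.</p>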
<p>Pipet; Need For Speed; Poisoning AI Scrapers</p><p>A midweek visit to see <a href="/tags/4/" rel="tag">#4</a> means I’m still “recovering” from a total of seven hours of intraday driving. So, y’all get a potpourri of resources today, since I’m way too tired to knit together a themed Drop.</p><p>TL;DR</p><p>(This is an AI-generated summary of today’s Drop using Ollama + llama 3.2 and a custom prompt.)</p><p>I can report that the VS Codium plugin I made worked super well today!</p><p>Pipet is a Golang-based command-line tool for web scraping and data extraction, operating in three primary modes: HTML parsing, JSON parsing, and client-side JavaScript evaluation. <a href="https://github.com/bjesus/pipet" rel="nofollow">Pipet</a><br>Need For Speed includes two resources: speedtest-rs (a Rust implementation of a speed test tool) and LibreSpeed (a web-based tool for measuring internet connection speeds). <a href="https://github.com/nelsonjchen/speedtest-rs" rel="nofollow">speedtest-rs</a>, <a href="https://librespeed.org/" rel="nofollow">LibreSpeed</a><br>Poisoning AI scrapers involves using techniques like generating garbled content deterministically and serving alternative versions to detected AI scrapers, as demonstrated by Tim McCormack’s project. <a href="https://www.brainonfire.net/blog/2024/09/19/poisoning-ai-scrapers/" rel="nofollow">Poisoning AI scrapers</a></p><p>Pipet</p>Photo by Pixabay on <a href="https://www.pexels.com/photo/ageing-plant-357440/" rel="nofollow">Pexels.com</a><p><a href="https://github.com/bjesus/pipet" rel="nofollow">Pipet</a> is a neat and well-thought-out Golang-based command-line tool designed for web scraping and data extraction. It operates in three primary modes: HTML parsing, JSON parsing, and client-side JavaScript evaluation. 
Rather than do the webops on its own, Pipet leverages existing utilities like curl and integrates seamlessly with Unix pipes, letting us extend its built-in capabilities in ways we’re all pretty much used to.</p><p>Pipet can be used for various data extraction tasks, including (their words) “tracking shipments, monitoring concert ticket availability, observing stock price changes, and extracting any online information”.</p><p>They have examples, but let’s make another one. This Bash script wraps Pipet to get the news headlines from BBC’s nigh-plaintext “On This Day” page for the current day:</p><p>#!/usr/bin/env bash<br>temp_file=$(mktemp)<br># remove the temporary Pipet file when the script exits<br>trap 'rm -f "${temp_file}"' EXIT<br>month_name=$(date +%B)<br>cat <<EOF >"${temp_file}"<br>curl http://news.bbc.co.uk/onthisday/low/dates/stories/${month_name,,}/$(date +%-d)/default.stm<br>a[href^="/onthisday/low/dates/stories"] span<br>EOF<br>command pipet --json "${temp_file}" | command jq -r '.[0][0] | map(select(length > 0)) | .[]'</p><p>That Pipet file will:</p><p>grab the HTML<br>target all the /onthisday links<br>extract the text from the span elements</p><p>I feed the JSON output to jq for cleaning. 
Here’s a demo:</p><p>$ ./on-this-day.sh<br>1995: OJ Simpson verdict: 'Not guilty'<br>1975: London's Spaghetti House siege ends<br>1944: Poles surrender after Warsaw uprising<br>1981: IRA Maze hunger strikes at an end<br>1952: Tea rationing to end<br>1979: Anti-racists tackle South African rugby tourists</p><p>NOTE: A far more robust version of that script can be had at <a href="https://rud.is/dl/on-this-day-robust.sh" rel="nofollow" class="ellipsis" title="rud.is/dl/on-this-day-robust.sh"><span class="invisible">https://</span><span class="ellipsis">rud.is/dl/on-this-day-robust.s</span><span class="invisible">h</span></a>.</p><p>The README is fairly extensive, with details on how to use Pipet’s headless mode and its advanced JSON filtering mode (for when the curl responses are JSON).</p><p>Since I have some boilerplate Go template projects for quickly creating custom scrapers, I’m not sure I’ll be using Pipet much, but it’s definitely a neat tool that others may find useful.</p><p>Need For Speed</p>Photo by Pixabay on <a href="https://www.pexels.com/photo/lighted-roadside-rings-290470/" rel="nofollow">Pexels.com</a><p>I came across the two resources in this section:</p><p><a href="https://github.com/nelsonjchen/speedtest-rs" rel="nofollow">speedtest-rs</a><br><a href="https://librespeed.org/" rel="nofollow">LibreSpeed</a> (<a href="https://github.com/librespeed/speedtest" rel="nofollow">GH</a>)</p><p>after reading this post on “<a href="https://foosel.net/blog/2021-03-28-homelab-uplink-monitoring/" rel="nofollow">Homelab uplink monitoring</a>”.</p><p>speedtest-rs is a Rust implementation of a tool similar to the popular Python speedtest-cli, designed to measure internet connection speeds. The project was originally a learning exercise for the author to explore Rust and its ecosystem. 
It’s evolved a bit and is designed primarily for lower-end residential connections using “<a href="https://web.archive.org/web/20161109011118/http://www.ookla.com/support/a84541858" rel="nofollow">HTTP Legacy Fallback</a>”.</p><p>Here’s a run from my pseudo-high-bandwidth abode:</p><p>$ ./speedtest-rs<br>Retrieving speedtest.net configuration...<br>Retrieving speedtest.net server list...<br>Testing from Comcast Cable (37.141.13.125)...<br>Selecting best server based on latency...<br>Hosted by Optimum Online (White Plains, NY) [272.08 km]: 31.164 ms<br>Testing download speed..............<br>Download: 503.23 Mbit/s<br>Testing upload speed..............<br>Upload: 39.62 Mbit/s<br>WARNING: This tool may not be accurate for high bandwidth connections! Consider using a socket-based client alternative.</p><p>(Ookla, the company behind speedtest.net, has its own native, closed-source CLI tool for many platforms. It’s TCP-based, can handle higher bandwidths, is supported by Ookla, and can be used for non-commercial purposes.)</p><p>LibreSpeed is a web-based tool that provides a simple and straightforward way to measure internet connection speeds, with a clear and concise interface for viewing the results. It’s self-hostable (with very minimal requirements) and has its own <a href="https://github.com/librespeed/speedtest-cli" rel="nofollow">CLI</a> tool as well.</p><p>Poisoning AI Scrapers</p><p>(I had to sneak one ThursdAI post in here.)</p><p>Our AI overlords are <a href="https://www.tanayj.com/p/openai-and-anthropic-revenue-breakdown" rel="nofollow">raking in billions</a> (but are all still losing money whilst also killing the planet because that’s how late-stage capitalism “works”). 
While I have yet to finish deploying a network of “tarpits” designed to slow down and poison AI scrapers, Tim McCormack has gone and done it!</p><p>In “<a href="https://www.brainonfire.net/blog/2024/09/19/poisoning-ai-scrapers/" rel="nofollow">Poisoning AI scrapers</a>”, Tim walks through a project to deter AI companies from using his blog content to train large language models without permission. To achieve this, he implements a system that serves garbled versions of his blog posts to detected AI scrapers.</p><p>His approach involves several components. First, for content generation, he uses a <a href="https://en.wikipedia.org/wiki/Dissociated_press" rel="nofollow">Dissociated Press</a> algorithm implemented in Rust to generate nonsensical content that looks superficially normal. The algorithm takes the original blog post as input and produces garbled text while maintaining some structural elements.</p><p>For content storage, the system generates an alternative version of each blog post (named “swill.alt.html”) and stores it alongside the regular post content. Scraper detection and content serving are handled by Apache httpd .htaccess rules that use mod_rewrite to detect AI scrapers based on their User-Agent strings. When a scraper is detected, the server serves the garbled “swill” version instead of the real content.</p><p>Key aspects of the system include generating garbled content deterministically using the post’s SHA256 hash as a seed, regenerating alternative content only for draft posts to maintain stable “swill” versions for published posts, and excluding comments from the garbled versions to avoid associating others’ names with nonsensical content.</p><p>Tim ACKs that this approach alone won’t significantly impact LLM training. However, he sees it as a fun exercise, a way to practice programming skills, and potentially an inspiration for others to implement similar measures. 
The post concludes by inviting discussion on alternative technical approaches and other poisoning techniques, while discouraging broader debates about AI ethics or the merits of LLMs.</p><p>FIN</p><p>Remember, you can follow and interact with the full text of The Daily Drop’s free posts on Mastodon via <span class="h-card"><a href="https://dailydrop.hrbrmstr.dev" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>dailydrop.hrbrmstr.dev</span></a></span> ☮️</p><p><a href="https://dailydrop.hrbrmstr.dev/2024/10/03/drop-539-2024-10-01-toss-up-thursday/" rel="nofollow" class="ellipsis" title="dailydrop.hrbrmstr.dev/2024/10/03/drop-539-2024-10-01-toss-up-thursday/"><span class="invisible">https://</span><span class="ellipsis">dailydrop.hrbrmstr.dev/2024/10</span><span class="invisible">/03/drop-539-2024-10-01-toss-up-thursday/</span></a></p><p><a href="/tags/4/" rel="tag">#4</a></p>
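<p>Returning to Tim’s poisoning setup: his generator is written in Rust, and the sketch below is not his code. It is a minimal Python take on the same idea, a word-level Dissociated Press whose random generator is seeded from the SHA-256 hash of the post text, so an unchanged post always produces the same “swill” while different posts produce different garble. The swill function name and its order and length parameters are assumptions for illustration.</p>

```python
import hashlib
import random
from collections import defaultdict


def swill(text: str, order: int = 2, length: int = 60) -> str:
    """Word-level Dissociated Press, seeded by the SHA-256 of the input text.

    Same input always yields the same garble; different posts yield different garble.
    """
    words = text.split()
    if len(words) <= order:
        return text
    # Deterministic seed derived from the post content itself.
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")
    rng = random.Random(seed)

    # Map each run of `order` words to the words that followed it in the original.
    followers: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for i in range(len(words) - order):
        followers[tuple(words[i:i + order])].append(words[i + order])
    states = list(followers)  # insertion order, so choices stay reproducible

    state = rng.choice(states)
    out = list(state)
    while len(out) < length:
        options = followers.get(state)
        if options:
            out.append(rng.choice(options))
            state = tuple(out[-order:])
        else:
            # Dead end (e.g. the post's final words): jump to another known state.
            state = rng.choice(states)
            out.extend(state)
    return " ".join(out[:length])
```

<p>Because the seed comes from the content, regenerating swill for an unchanged published post yields identical output, which fits with Tim only regenerating alternative content for drafts.</p>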
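<p>The serving side of Tim’s setup lives in Apache .htaccess rules using mod_rewrite. As an illustration of that routing decision in Python (not a translation of his actual rules), here is a sketch that picks swill.alt.html over the real page when the User-Agent matches a crawler pattern. The GPTBot, CCBot, and ClaudeBot tokens are just examples of published AI crawler names, and the per-post directory layout with an index.html is an assumption.</p>

```python
import re
from pathlib import PurePosixPath

# Example AI crawler tokens only; Tim's actual matching rules may differ.
AI_CRAWLER_UA = re.compile(r"GPTBot|CCBot|ClaudeBot", re.IGNORECASE)


def choose_page(post_dir: str, user_agent: str) -> str:
    """Return the path to serve: swill.alt.html for matching crawlers, else index.html."""
    is_ai_crawler = bool(AI_CRAWLER_UA.search(user_agent or ""))
    filename = "swill.alt.html" if is_ai_crawler else "index.html"
    return str(PurePosixPath(post_dir) / filename)
```

<p>User-Agent matching is trivially spoofable, which is one more reason Tim frames the whole thing as a fun exercise rather than a real defense.</p>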
Death of Piro <a href="/tags/4/" rel="tag">#4</a> is out<br><a href="http://stanleylieber.com/2025/03/04/0/" rel="nofollow" class="ellipsis" title="stanleylieber.com/2025/03/04/0/"><span class="invisible">http://</span><span class="ellipsis">stanleylieber.com/2025/03/04/0</span><span class="invisible">/</span></a><br>
<p>Read <a href="https://neodb.social/search?r=1&q=https://neodb.social/book/25QidPlYhi9dGXY61LrbpO" rel="nofollow">Fantastic Four by Ryan North Vol. 1: Whatever Happened to the Fantastic Four?</a> 🌕🌕🌕🌕🌑 <br><a href="/tags/1/" rel="tag">#1</a>-3 are especially great, <a href="/tags/4/" rel="tag">#4</a>-6 are a bit weaker, but overall it’s still a really good, gripping read and a great starting point<br></p>