Martin Hähnel

More Description

Changelog

  • 2025-10-04 - Added a changelog

Why is this here?

Since this was the post used to test my description generation feature, it has all the elements (callouts of different kinds, footnotes, links, linebreaks, even html) that I do not want included in my description.

Let's continue

Another WhileDo post.[1] Of course, my amazing work yesterday had one fatal flaw, and that is that the preview card did not include any descriptive text, just the one image of the Write Like You're Ron Jeffries post, which makes it look super confusing:

A screenshot showing a cross posted link to my WhileDo article. This one has an image attached which makes it look like these two posts are somehow nested within each other.

It seems that the crossposting service of my choice - the awesome EchoFeed by Robb Knight - extracts the image and posts that instead of a preview card or something? Time to read the docs. The EchoFeed docs don't talk about preview cards at all. So the question is: how does Mastodon behave if an image is attached? Does that mean no preview card?

While reading this blog post I notice two things:

  1. Mastodon's preview card logic is much more complicated than expected
  2. I went about this all wrong and should've used this checker to see if everything is generated correctly on my end, before looking further down the pipeline

And lo and behold it's not. My generated description field looks like this:

"description": "WhileDo Published: 2025-10-03 (Friday) at 14:49 Tags: Dev BuildInPublic quoteAIunquote Instead of a How-To, I'll write a…",

Alright. Back to basics. I can see what went wrong here: we don't use the post's body to generate the description but the whole page. I dig through the Eleventy docs for a moment, but can't find what I'm looking for. There is the page variable, but that seems to include everything as well.

Time to open the editor. I see that we use content in the base.njk file:

{%- if isPost -%}
	{%- set pageDescription = description or (content | postContentExcerpt(120)) -%}
{%- else -%}
	{%- set pageDescription = description or metadata.description -%}
{%- endif -%}
<meta name="description" content="{{ pageDescription }}">

I get lucky and find in my old eleventy config the following:

eleventyConfig.addPreprocessor(
		"categories-to-tags",
		"md",
		(data, content) => {
			const parsed = matter(content);
			const categories = parsed.data.categories;
			if (categories) {
				parsed.data.tags = categories;
			}
			return matter.stringify(parsed.content, parsed.data);
		},
	)

I imagine I could parse the post's body and then only use the actual content to create the description. But that would need to happen before we convert the markdown into html. AND: What about posts that include things like Callouts or similar? Since I use them for Changelogs at the moment, they should probably not be included either...

It turns out that trying to do this in a couple of hours before tackling a bigger problem was naive. The description feature itself is actually quite involved! So, time for a plan: I will "precalculate" the description so that it exists on the post. We know from the availability of isPost in the layout that we ought to try to do this in the blog.11tydata.js data file, which at the moment looks like this:

export default {
	tags: ["posts"],
	layout: "layouts/post.njk",
	permalink:
		"{{page.filePathStem.slice(5).replace(page.fileSlug, '')}}{{page.fileSlug|slugify}}/index.html",
	isPost: true,
};

Reading the documentation, I'm unsure whether I can work with the post's content or not. I guess I just have to try it out. ... Good news! The contents are available as data.page.rawInput. It is kind of sad that what data includes is not better documented, but it wasn't that hard to figure out:

export default {
	tags: ["posts"],
	layout: "layouts/post.njk",
	permalink:
		"{{page.filePathStem.slice(5).replace(page.fileSlug, '')}}{{page.fileSlug|slugify}}/index.html",
	isPost: true,
	eleventyComputed: {
		generatedDescription: (data) =>
			`HELLO I WAS GENERATED! ${JSON.stringify(data)}`,
	},
};
<!doctype html>
<html lang="{{ metadata.language }}">
	<head>
		<meta charset="utf-8">
		<meta name="viewport" content="width=device-width, initial-scale=1.0">
		<title>{{ title or metadata.title }}</title>
		<11ty>{{ generatedDescription }}</11ty>
<11ty>HELLO I WAS GENERATED! {"metadata":{"title":"Martin Hähnel","url":"https://blog.martin-haehnel.de/","language":"en","description":"German living in Finland's north-west. Making money by programming remotely. Loves to write notes.","author":{"name":"Martin Hähnel","email":"matti@omg.lol","url":"https://blog.martin-haehnel.de/about/"}},"eleventyComputed":{},"eleventy":{"version":"3.1.2","generator":"Eleventy v3.1.2","env":{"source":"cli","runMode":"serve","config":"/Users/martinhahnel/Local/code/blog-monorepo/apps/blog/eleventy.config.js","root":"/Users/martinhahnel/Local/code/blog-monorepo/apps/blog"},"directories": ...

Great. But I just noticed another edge case: footnotes shouldn't be included either! Still, I'll start in the simplest way and just use the filter from yesterday on the rawInput to see what I get.

<11ty>Instead of a How-To, I'll write a "WhileDo" today. This is my attempt to [[Write Like You're Ron Jeffries|Write Like Ron…

I see mostly more complexity. We want to keep only the titles of markdown links and otherwise remove brackets and other markdown syntax completely from the text. I let Copilot figure out the regexes and come up with this:

export default {
	tags: ["posts"],
	layout: "layouts/post.njk",
	permalink:
		"{{page.filePathStem.slice(5).replace(page.fileSlug, '')}}{{page.fileSlug|slugify}}/index.html",
	isPost: true,
	eleventyComputed: {
		generatedDescription: (data) => makeDescription(data.page.rawInput, 120),
	},
};

function makeDescription(html, maxLength = 120) {
	const raw = html || "";
	const withoutTags = raw.replace(/<[^>]*>/g, " ");
	// keep only the titles of markdown links
	// e.g. [example](https://example.com) becomes "example"
	// and [[link|title]] becomes "title" or [[link]] becomes "link"
	const withoutLinks = withoutTags
		.replace(/\[([^\]]+)\]\(([^)]+)\)/g, "$1")
		.replace(/\[\[([^|\]]+)(\|([^|\]]+))?\]\]/g, "$3");
	// remove footnotes
	// e.g. "Some text.[^1]" becomes "Some text."
	// and "Some text^[inline footnote]" becomes "Some text"
	const withoutFootnotes = withoutLinks.replace(/(\[\^.+?\])|(\^\[.+?\])/g, "");
	const normalized = withoutFootnotes.replace(/\s+/g, " ").trim();

	if (normalized.length <= maxLength) {
		return normalized;
	}

	// Check if character at maxLength is a space or if we're at word boundary
	if (normalized[maxLength] === " " || normalized[maxLength] === undefined) {
		return normalized.substring(0, maxLength) + "…";
	}

	// Find the next space after maxLength to complete the word
	const nextSpace = normalized.indexOf(" ", maxLength);

	if (nextSpace === -1) {
		// No more spaces, return the whole string
		return normalized;
	}

	// Return up to the next space to complete the word
	return normalized.substring(0, nextSpace) + "…";
}
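As a quick check of the truncation logic, here's just that last step isolated, with a deliberately tiny limit (truncateAtWord is a hypothetical helper name for this sketch, not something in the data file):

```javascript
function truncateAtWord(normalized, maxLength) {
	if (normalized.length <= maxLength) return normalized;
	// cut lands exactly on a word boundary: truncate there
	if (normalized[maxLength] === " " || normalized[maxLength] === undefined) {
		return normalized.substring(0, maxLength) + "…";
	}
	// cut lands mid-word: run forward to the next space to complete the word
	const nextSpace = normalized.indexOf(" ", maxLength);
	if (nextSpace === -1) return normalized;
	return normalized.substring(0, nextSpace) + "…";
}

console.log(truncateAtWord("The quick brown fox jumps", 10));
// → 'The quick brown…'
```

With maxLength 10 the cut would land inside "brown", so the function completes the word instead of chopping it.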

This is basically the same filter we saw yesterday with some bolted-on normalization. We now remove footnotes, simplify markdown links and strip html tags. But removing callouts is sadly not as simple, because they span multiple lines. So what now? I might need to parse the markdown with markdown-it. Maybe markdown-it even lets me retrieve plain text, so I don't have to do this myself? Let's check. ... It seems like not, at least not directly. But I find a plugin, markdown-it-plain-text, that might do what I want. I'll try that.

There is this nagging feeling that bolting yet another npm package onto my stuff is not so great, but since this plugin doesn't have any dependencies, it's probably okay. I replace all the regex stuff with this:

const md = markdownit();
md.use(plainTextPlugin);
md.render(html);
const normalized = md.plaintext;

if (!normalized) {
	return "WAT";
}

While I'm waiting for the build to finish... I somehow had a problem with the dev server just now... I wonder if rendering all posts twice just to extract 120 characters is worth it. But then again, it probably is. I see a problem though: all posts have a description of "WAT", meaning normalized is not defined. Which is bad. Ah, it was just that md.plaintext should have been md.plainText.

Coming back after a little break, the generated description of this post - which I am now using as the test file - looks like this:

[!NOTE]- Changelog 2025-10-04 - Added a changelog [!NOTE] Why is this here?Since this was the post to test my description…

So we still have to deal with the callouts. I try to recall how markdown-it parsing works; I had understood it once for my Callout Plugin. Maybe we could just write our own plugin that removes these tokens from the token array as well? Let's see how the plaintext plugin actually works.

I ask Copilot for help and explanations. But I feel like it moves too quickly and doesn't let me grasp what's going on. This is an interesting trait of these AI tools: they are pretty quick, and when you already know what's going on, that's often helpful. But the markdown-it parser is somewhat involved, and I feel like I would benefit from understanding it a bit better.

Walked the dog. I think I will just combine my regex approach with a small callout-removal plugin. I was also thinking about using JSDOM to parse and manipulate the HTML afterwards - probably the leanest way to handle this - but if the results of the regex-plus-plugin combination are good enough, I'm fine with it. This is what the agent generated before I left with the dog:

const stripCalloutsPlugin = (md) => {
	const calloutRegex = /^\[!note\]([+-]?)( +[^\n\r]+)?/i;

	md.core.ruler.after("block", "strip_callouts", function (state) {
		const tokens = state.tokens;
		const tokensToRemove = [];

		for (let idx = 0; idx < tokens.length; idx++) {
			if (tokens[idx].type !== "blockquote_open") continue;

			const openIdx = idx;
			const closeIdx = findBlockquoteClose(tokens, idx);
			const contentToken = findInlineTokenInBlockquote(
				tokens,
				openIdx,
				closeIdx,
			);

			if (!contentToken) continue;

			const match = contentToken.content.match(calloutRegex);
			if (!match) continue;

			// Mark all tokens in this callout for removal
			for (let i = openIdx; i <= closeIdx; i++) {
				tokensToRemove.push(i);
			}
		}

		// Remove tokens in reverse order to maintain indices
		for (let i = tokensToRemove.length - 1; i >= 0; i--) {
			tokens.splice(tokensToRemove[i], 1);
		}
	});

	function findBlockquoteClose(tokens, idx) {
		for (let i = idx + 1; i < tokens.length; i++) {
			if (tokens[i].type === "blockquote_close") return i;
		}
		return idx;
	}

	function findInlineTokenInBlockquote(tokens, startIdx, endIdx) {
		for (let i = startIdx + 1; i < endIdx; i++) {
			if (tokens[i].type === "inline") return tokens[i];
		}
		return null;
	}
};

function makeDescription(rawInput, maxLength = 120) {
	const md = new markdownit();
	md.use(stripCalloutsPlugin);
	md.use(plainTextPlugin);
	md.render(rawInput);
	const normalized = md.plainText;
	//...

Looking at it now, and rereading parts of my callouts post, I get what this ad-hoc plugin does:

  1. look for tokens of type blockquote_open
  2. find the end of that block by looking ahead until we find blockquote_close
  3. look within open and close for the callout-indicating syntax via regex
  4. if we find it: mark all tokens in between open and close
  5. remove them back to front
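Steps 1-4 can be replayed on a toy token list (plain objects standing in for real markdown-it tokens; the type names and the regex idea match what the plugin uses):

```javascript
// Fake token stream: a callout blockquote between two normal paragraphs.
const tokens = [
	{ type: "paragraph_open" },
	{ type: "inline", content: "Before the callout" },
	{ type: "paragraph_close" },
	{ type: "blockquote_open" },
	{ type: "paragraph_open" },
	{ type: "inline", content: "[!NOTE] Why is this here?" },
	{ type: "paragraph_close" },
	{ type: "blockquote_close" },
	{ type: "paragraph_open" },
	{ type: "inline", content: "After the callout" },
	{ type: "paragraph_close" },
];

const calloutRegex = /^\[!note\]/i;
const tokensToRemove = [];

for (let idx = 0; idx < tokens.length; idx++) {
	// step 1: look for tokens of type blockquote_open
	if (tokens[idx].type !== "blockquote_open") continue;
	// step 2: look ahead for the matching blockquote_close
	let closeIdx = idx;
	for (let i = idx + 1; i < tokens.length; i++) {
		if (tokens[i].type === "blockquote_close") {
			closeIdx = i;
			break;
		}
	}
	// step 3: test the first inline token in between against the regex
	const inline = tokens
		.slice(idx + 1, closeIdx)
		.find((t) => t.type === "inline");
	if (!inline || !calloutRegex.test(inline.content)) continue;
	// step 4: mark everything from open to close for removal
	for (let i = idx; i <= closeIdx; i++) tokensToRemove.push(i);
}

console.log(tokensToRemove); // → [ 3, 4, 5, 6, 7 ]
```

Step 5 then splices exactly these indices out of the array.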

It's the last part that I don't understand: Why are we doing it backwards? Time to open the ol' RunJS and play around a little!

const tokens = [ 'The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
const openIdx = 3
const closeIdx = 6
const tokensToRemove = [ 3, 4, 5, 6 ]

This means the code would mark 'fox', 'jumps', 'over', 'the' for removal. The backwards method of removing these words results in:

[ 'The', 'quick', 'brown', 'lazy', 'dog' ]

If I do it by hand:

backup2.splice(tokensToRemove[3], 1); // delete index 6 (the)
console.log(backup2);  // [ 'The', 'quick', 'brown', 'fox', 'jumps', 'over', 'lazy', 'dog' ]
backup2.splice(tokensToRemove[2], 1); // delete index 5 (over)
console.log(backup2);  // [ 'The', 'quick', 'brown', 'fox', 'jumps', 'lazy', 'dog' ]
backup2.splice(tokensToRemove[1], 1); // delete index 4 (jumps)
console.log(backup2);  // [ 'The', 'quick', 'brown', 'fox', 'lazy', 'dog' ]
backup2.splice(tokensToRemove[0], 1); // delete index 3 (fox)
console.log(backup2);  // [ 'The', 'quick', 'brown', 'lazy', 'dog' ]

If I go the other direction:

backup.splice(tokensToRemove[0], 1); //index 3 (fox)
console.log(backup); // [ 'The', 'quick', 'brown', 'jumps', 'over', 'the', 'lazy', 'dog' ]
backup.splice(tokensToRemove[1], 1); // index 4 - meant to be 'jumps', but removes 'over'!
console.log(backup); // [ 'The', 'quick', 'brown', 'jumps', 'the', 'lazy', 'dog' ]

We immediately see that removing index 4 after removing index 3 leads to problems. And I get it now: removing any element reindexes the array. Removing from the front therefore shifts the indexes of the entries we still want to remove, so we end up removing the wrong ones. Doing it backwards ensures that only elements after the ones we care about get reindexed, keeping the found indexes intact! Great, I understand what's happening. Time to finish this version. Because the plaintext plugin doesn't strip out brackets or footnotes, I now run the whole regex block after markdown-it has stripped out the callouts. Here's the whole data file:

import markdownit from "markdown-it";
import plainTextPlugin from "markdown-it-plain-text";

export default {
	tags: ["posts"],
	layout: "layouts/post.njk",
	permalink:
		"{{page.filePathStem.slice(5).replace(page.fileSlug, '')}}{{page.fileSlug|slugify}}/index.html",
	isPost: true,
	eleventyComputed: {
		generatedDescription: (data) => makeDescription(data.page.rawInput, 120),
	},
};

// Custom plugin that strips callouts only
const stripCalloutsPlugin = (md) => {
	const calloutRegex = /^\[!note\]([+-]?)( +[^\n\r]+)?/i;

	md.core.ruler.after("block", "strip_callouts", function (state) {
		const tokens = state.tokens;
		const tokensToRemove = [];

		for (let idx = 0; idx < tokens.length; idx++) {
			if (tokens[idx].type !== "blockquote_open") continue;

			const openIdx = idx;
			const closeIdx = findBlockquoteClose(tokens, idx);
			const contentToken = findInlineTokenInBlockquote(
				tokens,
				openIdx,
				closeIdx,
			);

			if (!contentToken) continue;

			const match = contentToken.content.match(calloutRegex);
			if (!match) continue;

			// Mark all tokens in this callout for removal
			for (let i = openIdx; i <= closeIdx; i++) {
				tokensToRemove.push(i);
			}
		}

		// Remove tokens in reverse order to maintain indices
		for (let i = tokensToRemove.length - 1; i >= 0; i--) {
			tokens.splice(tokensToRemove[i], 1);
		}
	});

	function findBlockquoteClose(tokens, idx) {
		for (let i = idx + 1; i < tokens.length; i++) {
			if (tokens[i].type === "blockquote_close") return i;
		}
		return idx;
	}

	function findInlineTokenInBlockquote(tokens, startIdx, endIdx) {
		for (let i = startIdx + 1; i < endIdx; i++) {
			if (tokens[i].type === "inline") return tokens[i];
		}
		return null;
	}
};

function makeDescription(rawInput, maxLength = 120) {
	const md = new markdownit();
	md.use(stripCalloutsPlugin);
	md.use(plainTextPlugin);
	md.render(rawInput);
	const raw = md.plainText;
	const withoutTags = raw.replace(/<[^>]*>/g, " ");
	// keep only the titles of markdown links
	// e.g. [example](https://example.com) becomes "example"
	// and [[link|title]] becomes "title" or [[link]] becomes "link"
	const withoutLinks = withoutTags
		.replace(/\[([^\]]+)\]\(([^)]+)\)/g, "$1")
		.replace(/\[\[([^|\]]+)(\|([^|\]]+))?\]\]/g, "$3");
	// remove footnotes
	// e.g. "Some text.[^1]" becomes "Some text."
	// and "Some text^[inline footnote]" becomes "Some text"
	const withoutFootnotes = withoutLinks.replace(/(\[\^.+?\])|(\^\[.+?\])/g, "");
	const normalized = withoutFootnotes.replace(/\s+/g, " ").trim();

	if (normalized.length <= maxLength) {
		return normalized;
	}

	// Check if character at maxLength is a space or if we're at word boundary
	if (normalized[maxLength] === " " || normalized[maxLength] === undefined) {
		return normalized.substring(0, maxLength) + "…";
	}

	// Find the next space after maxLength to complete the word
	const nextSpace = normalized.indexOf(" ", maxLength);

	if (nextSpace === -1) {
		// No more spaces, return the whole string
		return normalized;
	}

	// Return up to the next space to complete the word
	return normalized.substring(0, nextSpace) + "…";
}

This blog post's beginning:

> [!NOTE]- Changelog
> - 2025-10-04 - Added a changelog


> [!NOTE] Why is this here?
> Since this was the post to test my description generation feature, this one has all the elements (callouts of different kinds, footnotes, linkes, linebreaks, even html) that I do not want to have included in my description.


<h3>Let's continue</h3>

Another [[WhileDo]] post.^[These are fun to write and kind of write themselves.] Of course, my amazing work yesterday had one fatal flaw and that is that the preview card did…

And here's what makeDescription produces:

Let's continue Another post. Of course, my amazing work yesterday had one fatal flaw and that is that the preview card did…

Heck. It should read "Let's continue Another WhileDo post." but the WhileDo is missing. I realize the link-removal regex is broken somehow. Time to use Debuggex to figure it out! I can see that the capture groups aren't working correctly.
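The failure is easy to reproduce in isolation: for a bare [[WhileDo]] link the optional (\|title) part never participates in the match, so group 3 is undefined and JavaScript substitutes an empty string for $3. Handling the two link forms separately, which is the shape the fixed regexCleanup takes, keeps the bare link text:

```javascript
const combined = /\[\[([^|\]]+)(\|([^|\]]+))?\]\]/g;

// The single regex works for the [[link|title]] form…
console.log("See [[WhileDo|these notes]]".replace(combined, "$3"));
// → 'See these notes'

// …but erases bare [[link]], because group 3 never matched.
console.log("Another [[WhileDo]] post.".replace(combined, "$3"));
// → 'Another  post.'

// Two passes, one per link form, avoid the optional group entirely.
const fixed = (s) =>
	s
		.replace(/\[\[([^|\]]+)\|([^|\]]+)\]\]/g, "$2")
		.replace(/\[\[([^|\]]+)\]\]/g, "$1");

console.log(fixed("Another [[WhileDo]] post."));
// → 'Another WhileDo post.'
```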

I let AI take a crack at it. Now it works. I can feel a little pang of something... but it's not really guilt. It's shame. I enjoy working in this way, but I feel like people will judge me for using it. I can read and understand all the code, and I could've figured out the regex issue myself, but I don't mind that the model worked it out for me. I feel vulnerable writing lab notes like this. However, if I didn't write these down, the shame would be much less pronounced. I don't know if this is good or bad. To me, the important part is that the description feature now works as intended and strips out the superfluous elements. It is not the most beautiful solution, but I am not really "done" with it either. I quickly fix the base template:

{%- if isPost -%}
	{%- set pageDescription = description or generatedDescription -%}
{%- else -%}
	{%- set pageDescription = description or metadata.description -%}
{%- endif -%}

I remove the filter code from yesterday.

Here's the finished data file once more:

import markdownit from "markdown-it";
import plainTextPlugin from "markdown-it-plain-text";

export default {
	tags: ["posts"],
	layout: "layouts/post.njk",
	permalink:
		"{{page.filePathStem.slice(5).replace(page.fileSlug, '')}}{{page.fileSlug|slugify}}/index.html",
	isPost: true,
	eleventyComputed: {
		generatedDescription: (data) => makeDescription(data.page.rawInput, 120),
	},
};

// Custom plugin that strips callouts only
const stripCalloutsPlugin = (md) => {
	const calloutRegex = /^\[!note\]([+-]?)( +[^\n\r]+)?/i;

	md.core.ruler.after("block", "strip_callouts", function (state) {
		const tokens = state.tokens;
		const tokensToRemove = [];

		for (let idx = 0; idx < tokens.length; idx++) {
			if (tokens[idx].type !== "blockquote_open") continue;

			const openIdx = idx;
			const closeIdx = findBlockquoteClose(tokens, idx);
			const contentToken = findInlineTokenInBlockquote(
				tokens,
				openIdx,
				closeIdx,
			);

			if (!contentToken) continue;

			const match = contentToken.content.match(calloutRegex);
			if (!match) continue;

			// Mark all tokens in this callout for removal
			for (let i = openIdx; i <= closeIdx; i++) {
				tokensToRemove.push(i);
			}
		}

		// Remove tokens in reverse order to maintain indices
		for (let i = tokensToRemove.length - 1; i >= 0; i--) {
			tokens.splice(tokensToRemove[i], 1);
		}
	});

	function findBlockquoteClose(tokens, idx) {
		for (let i = idx + 1; i < tokens.length; i++) {
			if (tokens[i].type === "blockquote_close") return i;
		}
		return idx;
	}

	function findInlineTokenInBlockquote(tokens, startIdx, endIdx) {
		for (let i = startIdx + 1; i < endIdx; i++) {
			if (tokens[i].type === "inline") return tokens[i];
		}
		return null;
	}
};

function regexCleanup(input) {
	const withoutTags = input.replace(/<[^>]*>/g, " ");
	// keep only the titles of markdown links
	// e.g. [example](https://example.com) becomes "example"
	const withoutExternalLinks = withoutTags.replace(
		/\[([^\]]+)\]\(([^)]+)\)/g,
		"$1",
	);
	// and [[link|title]] becomes "title" and [[link]] becomes "link"
	// First handle [[link|title]] format (becomes "title")
	const withoutInternalLinksWithTitle = withoutExternalLinks.replace(
		/\[\[([^|\]]+)\|([^|\]]+)\]\]/g,
		"$2",
	);
	// Then handle [[link]] format (becomes "link")
	const withoutInternalLinksSimple = withoutInternalLinksWithTitle.replace(
		/\[\[([^|\]]+)\]\]/g,
		"$1",
	);
	// remove footnotes
	// e.g. "Some text.[^1]" becomes "Some text."
	// and "Some text^[inline footnote]" becomes "Some text"
	const withoutFootnotes = withoutInternalLinksSimple.replace(
		/(\[\^.+?\])|(\^\[.+?\])/g,
		"",
	);
	return withoutFootnotes;
}

function markdownCleanup(input) {
	const md = new markdownit();
	md.use(stripCalloutsPlugin);
	md.use(plainTextPlugin);
	md.render(input);
	return md.plainText;
}

function makeDescription(rawInput, maxLength = 120) {
	const md = markdownCleanup(rawInput);
	const normalized = regexCleanup(md);

	if (normalized.length <= maxLength) {
		return normalized;
	}

	// Check if character at maxLength is a space or if we're at word boundary
	if (normalized[maxLength] === " " || normalized[maxLength] === undefined) {
		return normalized.substring(0, maxLength) + "…";
	}

	// Find the next space after maxLength to complete the word
	const nextSpace = normalized.indexOf(" ", maxLength);

	if (nextSpace === -1) {
		// No more spaces, return the whole string
		return normalized;
	}

	// Return up to the next space to complete the word
	return normalized.substring(0, nextSpace) + "…";
}

  1. These are fun to write and kind of write themselves.