
Creating A Guestbook To Crowd Source Alt Texts - Part 1 (Architecture & Setup)

Intro

It's a long-standing tradition of this blog to post a dog pic - a "DailyDogo" - every day. I have done so from day one, when our dog Napu moved in with us, and I plan to continue for the foreseeable future. However, one thing that has always bothered me is that none of these dog pics have alt text, which is important for being inclusive and accessible.

If you've never heard of alt text: it is a short description of what an image shows, used by people (not only those) relying on screen readers to understand what the image is about.

Anyway, I have not added alt text to my dog pics.[1] But I have come up with a way for YOU to help me add alt texts to DailyDogos: a guestbook that, instead of some kind of "Are you human" check, asks people to provide alt text for a random dog pic![2]

Architecture

[Architecture diagram: The frontend (/guestbook) connects to a worker providing an API through a catch-all pages function (blog.martin-haehnel.de/api). The API includes six endpoints (getAll, approve-guestbook, create, approve-alt-text, refreshCandidates, and getCandidate), shown in green. These endpoints interact with different components: a GUESTBOOK_CACHE namespace (yellow) used by getAll; a RATE_LIMIT namespace (yellow) linked to create; and three database tables (red): guestbook_entry, alt_text_suggestion, and alt_text_candidate, which store guestbook entries and alt text suggestions. The architecture is color-coded: Eleventy (black), api-Worker (green), Blog (pages project) (blue), D1 Database (red), and KV Key/Value Store (yellow). Dashed lines indicate data flow between components, showing how the endpoints interact with storage and caching.]

This site is built on Cloudflare's developer platform, meaning there isn't really a server involved that I could FTP or SSH files to, or where I'd configure an HTTP server to handle a route this way or that way. If that were the case, I would just build a small PHP API that handles reading and writing guestbook entries and alt text suggestions.

But this blog is just a static website based on Eleventy and is deployed using Cloudflare Pages, a (mostly) static site host.

I did say "(mostly)", though, because you can actually get some dynamic functionality by using something called "pages functions" and workers. And that's what we'll use to make this guestbook a reality.

Database Structure

The Cloudflare platform has a managed database service called D1, which we'll use to save our guestbook's data and alt text candidates. So we'll need to create three tables[3]:

  1. Table guestbook_entry
    • id
    • name
    • website
    • message
    • approved
  2. Table alt_text_candidate
    • id
    • post_url
    • image_url
  3. Table alt_text_suggestion
    • id
    • alt_text_suggestion
    • fk guestbook_entry.id
    • fk alt_text_candidate.id
    • approved

By having a column in the alt_text_suggestion table that references the guestbook entry created alongside it, we'll know later where a suggestion came from. That could be nice if we want to give attribution, build a leaderboard, or whatever. A suggestion is also related to its candidate - which is what I call a dog pic that has no (approved) alt text yet.
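
To give an idea of what that relationship enables, an attribution query could later look something like this (a sketch; listAttributions is a hypothetical helper, and the column names follow the migration shown further down):

interface Env {
	DB: D1Database;
}

// Hypothetical helper: list approved suggestions together with who made them
// and which image they describe.
async function listAttributions(env: Env) {
	const { results } = await env.DB.prepare(
		`SELECT g.name, s.alt_text_suggestion, c.image_url
		 FROM alt_text_suggestion s
		 JOIN guestbook_entry g ON g.id = s.guestbook_message_id
		 JOIN alt_text_candidate c ON c.id = s.alt_text_candidate_id
		 WHERE s.approved = 1`,
	).all();
	return results;
}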

Worker

We'll create a minimalistic API consisting of four endpoints: getAll, create, getCandidate and refreshCandidates.

The getAll endpoint will just do a SELECT * FROM guestbook_entry and return JSON that can then be used to render the list of guestbook entries on the frontend. For now, that's enough. In the future we may want pagination, and so we might add offset and limit parameters later. But that's later.
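
A minimal sketch of that handler (the binding names match the setup below; the cache TTL and the approved filter are my assumptions):

interface Env {
	DB: D1Database;
	GUESTBOOK_CACHE: KVNamespace;
}

// Sketch of getAll: serve from the KV cache if possible, otherwise query D1
// and cache the serialized result for a few minutes.
async function getAll(env: Env): Promise<Response> {
	const cached = await env.GUESTBOOK_CACHE.get("entries");
	if (cached) {
		return new Response(cached, { headers: { "Content-Type": "application/json" } });
	}
	const { results } = await env.DB.prepare("SELECT * FROM guestbook_entry WHERE approved = 1").all();
	const body = JSON.stringify(results);
	await env.GUESTBOOK_CACHE.put("entries", body, { expirationTtl: 300 });
	return new Response(body, { headers: { "Content-Type": "application/json" } });
}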

The create endpoint will be called when a user submits a new entry to the guestbook. It creates an INSERT statement for the guestbook_entry table. It also checks whether the request includes a new alt text suggestion and, if so, INSERTs that data into the alt_text_suggestion table, too.
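
Sketched out, with the request body shape as an assumption:

interface Env {
	DB: D1Database;
}

// Sketch of create: insert the guestbook entry and, if present, the alt text
// suggestion that was submitted alongside it.
async function create(request: Request, env: Env): Promise<Response> {
	const data = await request.json<{
		name: string;
		website?: string;
		message: string;
		altTextSuggestion?: string;
		candidateId?: number;
	}>();
	const entry = await env.DB.prepare(
		"INSERT INTO guestbook_entry (name, website, message) VALUES (?, ?, ?) RETURNING id",
	)
		.bind(data.name, data.website ?? null, data.message)
		.first<{ id: number }>();
	if (entry && data.altTextSuggestion && data.candidateId) {
		await env.DB.prepare(
			"INSERT INTO alt_text_suggestion (alt_text_suggestion, guestbook_message_id, alt_text_candidate_id) VALUES (?, ?, ?)",
		)
			.bind(data.altTextSuggestion, entry.id, data.candidateId)
			.run();
	}
	return new Response(null, { status: 201 });
}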

The getCandidate endpoint returns an entry from the alt_text_candidate table that doesn't have an approved suggestion yet. We will get slightly fancy and introduce some randomness, so that consecutive reloads of the form show different images.
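
The randomness can live in the SQL itself; a sketch using SQLite's RANDOM():

interface Env {
	DB: D1Database;
}

// Sketch of getCandidate: pick one random candidate that is still pending.
async function getCandidate(env: Env): Promise<Response> {
	const candidate = await env.DB.prepare(
		"SELECT id, post_url, image_url FROM alt_text_candidate WHERE status = 'pending' ORDER BY RANDOM() LIMIT 1",
	).first();
	return Response.json(candidate);
}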

The refreshCandidates endpoint has the most complicated logic, since it has to parse HTML (the entries of a JSON feed)[5] AND check whether the candidate is already in the database.
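
Roughly like this - the feed URL and the naive image extraction are assumptions, and the UNIQUE (post_url, image_url) constraint plus INSERT OR IGNORE takes care of the already-in-the-database check:

interface Env {
	DB: D1Database;
}

// Sketch of refreshCandidates: scan the blog's JSON feed for images and
// insert any we haven't seen yet as new candidates.
async function refreshCandidates(env: Env): Promise<Response> {
	const feed = await fetch("https://blog.martin-haehnel.de/feed.json"); // assumed feed URL
	const { items } = await feed.json<{ items: { url: string; content_html: string }[] }>();
	let added = 0;
	for (const item of items) {
		// Naive extraction of the first image; a real version might use HTMLRewriter.
		const match = item.content_html.match(/<img[^>]+src="([^"]+)"/);
		if (!match) continue;
		const result = await env.DB.prepare(
			"INSERT OR IGNORE INTO alt_text_candidate (post_url, image_url) VALUES (?, ?)",
		)
			.bind(item.url, match[1])
			.run();
		added += result.meta.changes ?? 0;
	}
	return Response.json({ added });
}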

Pages

Cloudflare Pages is what powers the blog itself. The way it works is that Cloudflare builds and redeploys the blog any time I push a new commit to the blog's GitHub repo. That part won't change, but we'll have to create a new page called /guestbook/.

The /guestbook/ page will consist of two parts:

  1. the guestbook form, including the alt text challenge
  2. the list of (approved) guestbook entries

Also, we want a clean way of accessing the API. In theory, we could just access our worker through its own URL - every worker gets a URL where it is reachable - but I think it's nicer if everything just works through https://blog.martin-haehnel.de/api/.

So in other words, the frontend JavaScript will not communicate directly with the api worker, but will instead send its requests to an /api/ catch-all endpoint. That endpoint is a Cloudflare Pages function, which in turn forwards the request to the actual worker. That way, only one visible domain is involved in serving the blog and its dynamic parts.[4]
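
The forwarding itself is tiny. A sketch, assuming a service binding to the worker named API_WORKER and a catch-all Pages function at functions/api/[[path]].ts:

interface Env {
	API_WORKER: Fetcher;
}

// Catch-all Pages function: hand every /api/* request over to the api worker.
export const onRequest: PagesFunction<Env> = (context) =>
	context.env.API_WORKER.fetch(context.request);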

Setup

Worker Repo

I started by creating a new worker through what is called C3, a CLI tool that creates new repos based on existing templates.

npm create cloudflare@latest -- --template=cloudflare/templates/d1-template

This tool will ask a couple of questions, like which directory we want to install the new worker into (cloudflare-guestbook), whether we want to use Git for version control (yes), and whether we want to deploy the new worker as is (no).

Database, KV, Secrets

Initial Migration

Let's start by adjusting the migration file. I renamed mine to 0001_init.sql and replaced its contents as follows:

-- Migration number: 0001 	 2025-02-16T22:03:36.794Z
-- Guestbook Entries
CREATE TABLE IF NOT EXISTS guestbook_entry
(
  id INTEGER PRIMARY KEY NOT NULL,
  name TEXT NOT NULL,
  website TEXT,
  message TEXT NOT NULL,
  approved BOOLEAN NOT NULL DEFAULT FALSE,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Alt Text Candidates
CREATE TABLE IF NOT EXISTS alt_text_candidate
(
  id INTEGER PRIMARY KEY NOT NULL,
  post_url TEXT NOT NULL,
  image_url TEXT NOT NULL,
  status TEXT NOT NULL DEFAULT 'pending' CHECK (status IN ('pending', 'completed', 'removed')), -- pending = not yet processed, completed = alt text added, removed = image removed
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  CONSTRAINT unique_post_image UNIQUE (post_url, image_url)
);

-- Alt Text Suggestions
CREATE TABLE IF NOT EXISTS alt_text_suggestion
(
  id INTEGER PRIMARY KEY NOT NULL,
  alt_text_suggestion TEXT NOT NULL,
  guestbook_message_id INTEGER NULL,
  alt_text_candidate_id INTEGER NULL, -- Allow NULL to keep suggestion even if candidate is deleted
  approved BOOLEAN NOT NULL DEFAULT FALSE,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY (guestbook_message_id) REFERENCES guestbook_entry (id) ON DELETE CASCADE,
  FOREIGN KEY (alt_text_candidate_id) REFERENCES alt_text_candidate (id) ON DELETE SET NULL
);

-- Indexes for Optimized Queries
CREATE INDEX idx_guestbook_approved ON guestbook_entry (approved);
CREATE INDEX idx_suggestion_candidate ON alt_text_suggestion (alt_text_candidate_id);
CREATE INDEX idx_suggestion_approved ON alt_text_suggestion (approved);
CREATE INDEX idx_candidate_image ON alt_text_candidate (image_url);
CREATE INDEX idx_candidate_status ON alt_text_candidate (status);

-- Auto-update `updated_at` on row modifications
CREATE TRIGGER update_guestbook_updated_at
  AFTER UPDATE
  ON guestbook_entry
  FOR EACH ROW
BEGIN
  UPDATE guestbook_entry SET updated_at = CURRENT_TIMESTAMP WHERE id = OLD.id;
END;

CREATE TRIGGER update_candidate_updated_at
  AFTER UPDATE
  ON alt_text_candidate
  FOR EACH ROW
BEGIN
  UPDATE alt_text_candidate SET updated_at = CURRENT_TIMESTAMP WHERE id = OLD.id;
END;

CREATE TRIGGER update_suggestion_updated_at
  AFTER UPDATE
  ON alt_text_suggestion
  FOR EACH ROW
BEGIN
  UPDATE alt_text_suggestion SET updated_at = CURRENT_TIMESTAMP WHERE id = OLD.id;
END;

Lots of stuff going on here: besides the columns from the sketch above, every table gets created_at and updated_at timestamps, there are indexes for the lookups we expect to be frequent, and triggers keep updated_at current whenever a row changes.

Creating The Resources And Configuring Bindings

Next, we'll actually create our database, apply the migration, and create a secret and a couple of KV namespaces (used for caching and rate limiting, more on that later...) through Cloudflare's CLI tool wrangler, and bind them to the worker so that it can use them. What's important here is that we want three different setups:

  1. local - where D1 and KV are just simulated locally
  2. remote preview - where we run our worker on Cloudflare's infrastructure, but not on production data
  3. prod - our actual production deployment of the guestbook
Prod

We'll start "backwards", so to speak, because some local resources are implicitly created when creating those resources on Cloudflare.

Database
npx wrangler d1 create blog-guestbook

Once the database has been created, you'll be shown a little piece of JSON:

{
  "d1_databases": [
    {
      "binding": "DB",
      "database_name": "blog-guestbook",
      "database_id": "51eab7a0-a1a3-a441-ce5b-8da45d3a61ad"
    }
  ]
}

Take note of the database_name and database_id. Next, we'll edit the wrangler.jsonc:

{
  //...
  "d1_databases": [
    {
      "binding": "DB",
      "database_id": "51eab7a0-a1a3-a441-ce5b-8da45d3a61ad", // <- update this
      "database_name": "blog-guestbook" // <- maybe update this
    }
  ],
  //...
}

Next, let's apply our migration:

npx wrangler d1 migrations apply --remote blog-guestbook

It'll ask if we actually want to apply the migration (Y). Afterwards, we should be able to see our tables with the correct columns in the Cloudflare dashboard.

KV

KV is another Cloudflare service: a simple key/value store optimized for quick reads, which makes it perfect for caching. KV stores don't have columns or any other schema - you can create/read/update/delete keys and their values, and that's it.

We'll use two "tables" (called namespaces) to store the following:

  1. RATE_LIMIT - data used to rate limit submissions to the create endpoint
  2. GUESTBOOK_CACHE - a cached version of the guestbook entries served by getAll

So let's create those namespaces:

npx wrangler kv:namespace create "RATE_LIMIT"
npx wrangler kv:namespace create "GUESTBOOK_CACHE"

After each run of the command, we get another bit of JSON to add to our wrangler.jsonc configuration file, which in the end should look something like this:

{
  //...
  "kv_namespaces": [
      {
        "binding": "RATE_LIMIT",
        "id": "91b5e789d42744aar802aace073ed882",
      },
      {
        "binding": "GUESTBOOK_CACHE",
        "id": "gg19ede5038147ccbcc2a90f65495f92",
      }
  ],
  //...
}
Secret

Because we're going to have a few routes that are not supposed to be used by anyone but the admin of the blog, we'll create a secret API key that we'll check for when these admin routes are hit.

A secret is in practice nothing other than an environment variable, except that it is masked and not part of the source code repo. You create one using wrangler:

npx wrangler secret put ADMIN_API_KEY

Wrangler will ask for the value and then save the secret. You can refer to it as if it were just a normal environment variable.[6]
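
In the worker, the check against that secret can be as simple as this (the header name is my choice):

interface Env {
	ADMIN_API_KEY: string;
}

// Sketch of the admin guard: compare a request header against the secret binding.
function isAdmin(request: Request, env: Env): boolean {
	return request.headers.get("X-Admin-Api-Key") === env.ADMIN_API_KEY;
}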

Preview Environment

Okay, moving on to the preview environment. The reason we want this environment is that even though we can - and should - develop our little worker mostly locally, it is still a good idea to test the worker in its actual environment.

Database

Let's create a preview database:

npx wrangler d1 create blog-guestbook_preview

This time we only need the id of the new database. We will extend our already existing D1 binding:

{
  //...
  "d1_databases": [
    {
      "binding": "DB",
      "database_id": "51eab7a0-a1a3-a441-ce5b-8da45d3a61ad",
      "database_name": "blog-guestbook",
      "preview_database_id": "bbe2ce46-acfc-4072-817d-5441a90c7419" //<- new
    }
  ],
  //...
}

And now we'll apply the migration:

npx wrangler d1 migrations apply --preview --remote blog-guestbook
KV

Let's add preview ids to our KV namespaces next:

npx wrangler kv:namespace create "RATE_LIMIT" --preview
npx wrangler kv:namespace create "GUESTBOOK_CACHE" --preview

In both cases we get a JSON looking like this:

{
  "kv_namespaces": [
    {
      "binding": "RATE_LIMIT",
      "preview_id": "b1ad7e6dc2a8441087a546578c8de2b6"
    }
  ]
}

Take note of the preview_id line and copy it:

{
  //...
  "kv_namespaces": [
      {
        "binding": "RATE_LIMIT",
        "id": "91b5e789d42744aar802aace073ed882",
        "preview_id": "b1ad7e6dc2a8441087a546578c8de2b6" // <- here...
      },
      {
        "binding": "GUESTBOOK_CACHE",
        "id": "gg19ede5038147ccbcc2a90f65495f92",
        "preview_id": "93c1cb783b24461fba02afb20566c34e" // <- ... and here
      }
  ],
  //...
}
Secret

The preview environment will reuse the prod secret; the big difference between prod and preview is simply the data storage. So nothing needs to be done here.

Local Environment
Database

For the local environment, the database will be created for us based on the binding configuration in the wrangler config file. We do have to apply the migration to it, though, which is done as follows:

npx wrangler d1 migrations apply --local blog-guestbook
KV

Nothing to do here. These will be created locally, based on our config from earlier.

Secret

For our local env to work, we need a not-so-secret secret for the ADMIN_API_KEY.

First, we create a .dev.vars file in the root of our repo and add something like this:

ADMIN_API_KEY=123

Coding Setup

Code-Organization And Configuration

Let's start by preparing the repo for a good editing experience by adding a .editorconfig file to the repo:

root = true

[*]
charset = utf-8
end_of_line = lf
indent_style = tab
insert_final_newline = true
max_line_length = 120
tab_width = 4
trim_trailing_whitespace = true

Next, we'll make sure that we can autoformat and lint any added code. For this, we'll use prettier and eslint.

The template we initially used to get going defined the entrypoint as /src/index.ts, so we'll add all our newly written code under src going forward as well.

Testing

Manual testing is nice, but we should also do some automated testing. Cloudflare recommends using vitest, so we'll do just that.

In the end we should have installed a couple of packages and have a vitest.config.ts in the repo root plus an additional tsconfig.json inside the test folder.[7]
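
For reference, the vitest.config.ts ends up looking roughly like this (a sketch following the vitest-pool-workers setup):

import { defineWorkersConfig } from "@cloudflare/vitest-pool-workers/config";

export default defineWorkersConfig({
	test: {
		poolOptions: {
			workers: {
				// See footnote 7: the default examples use wrangler.toml here.
				wrangler: { configPath: "./wrangler.jsonc" },
			},
		},
	},
});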

We're now ready to write tests.

Let's write a dummy test and run it to see that everything works as expected. Create a file in test named dummy.test.ts and add the following:

import { describe, expect, it } from "vitest";
import { createExecutionContext, env, waitOnExecutionContext } from "cloudflare:test";

describe("dummy", () => {
	it("should pass", () => {
		expect(1).toBe(1);
	});
	it("should be able to use cloudflare workers integration", () => {
		expect(env).toBeDefined();
		expect(createExecutionContext).toBeDefined();
		expect(waitOnExecutionContext).toBeDefined();
	});
});

We'll add more tests later, but this should show us that everything is set up correctly. You can run the test either in your IDE or on the command line:

npx vitest
Husky

To ensure that any committed code conforms to our coding style standards and passes our tests, let's also use Husky, a handy tool that makes sure any code about to be committed to source control is ready.
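
Assuming Husky v9 (earlier versions used husky-init), the setup is just:

npm install --save-dev husky
npx husky init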

This installs Husky and a git hook that tries to run npm run test, which doesn't exist at this point. So let's add a test script - and linting and formatting scripts as well - to our package.json:

{
  //...
  "scripts": {
    "prepare": "husky", // this one should already exist
    "test": "vitest run",
    "test:watch": "vitest run --watch",
    "lint": "eslint",
    "lint:fix": "eslint --fix",
    "format": "prettier . --check",
    "format:write": "prettier . --write"
  }
  //...
}

The test:watch, lint:fix and format:write scripts are not needed for Husky, but they are handy during development. And since we were already doing stuff in the file, why not add them?

Let's update the git hook to use test, lint and format. Edit the file .husky/pre-commit:

npm run test
npm run format
npm run lint

That's it. Now anytime we update the worker, we ensure that those updates don't accidentally change the code for the worse.


  1. I do add alt text to all other pictures that accompany my writing, though. ↩︎

  2. This will be skippable of course. ↩︎

  3. The following is not complete, there are creation dates, indexes and triggers for the updated_at columns in the actual migration. See below. ↩︎

  4. In theory, we could've implemented the whole guestbook using pages functions. I didn't do that because I wanted to know how workers work. But it would've been totally feasible to do so using only the pages application. ↩︎

  5. I'm not 100% sure this is a good idea, as there may have been times when there was more than one dog pic per post, but for now, we'll work with this. ↩︎

  6. Normal env vars can be set in your wrangler.jsonc. We just don't have a need for them here. ↩︎

  7. Make sure, by the way, that you change the line that references wrangler.toml - we're using a wrangler.jsonc after all. ↩︎