# How I Built a Swedish Crossword Solver with Astro and 400,000+ Words
I recently launched Korsordsakuten, a free Swedish crossword solver, and learned a lot about building SEO-driven content sites with Astro. Here's what I built and what I learned along the way.

The site lets you:

- Search 400,000+ Swedish word forms by clue, pattern, or length
- Filter answers by letter count (e.g. "give me 6-letter synonyms only")
- Browse prefix/suffix indexes (words starting with SK, ending with ERA, etc.)
- Solve anagrams
- Play a daily Wordle-style word game

## The stack

- **Astro 6** (SSR mode, Node adapter): perfect for content-heavy sites. Each route is server-rendered, but the build is still fast.
- **Node.js 22** on Render (free tier)
- **GitHub Pages** as a static sitemap mirror (more on this below)
- **Zero client-side JS frameworks**; just vanilla JS where needed.

## Building the word database

The word database is built from public Swedish word lists, processed through a Node.js pipeline:

```
scripts/
  build-worddb.mjs      # Build words.json, synonyms.json, related.json
  build-phrases.mjs     # Process multi-word crossword entries
  build-sitemaps.mjs    # Generate 278 sitemap files × 1,000 URLs each
  publish-sitemaps.mjs  # Push sitemaps to GitHub Pages mirror
```

The synonym/related data is derived from Swedish lexical resources, giving each word a list of crossword-appropriate answers.

## Surviving on 256 MB of RAM

The site has some large JSON files:

- `clue-index.json`: 18 MB
- `related.json`: 8.5 MB
- `synonyms.json`: 6.75 MB

Loading all of these at module startup caused OOM crashes on Render's 256 MB free tier. The fix: lazy loading via getter functions, so each file is read and parsed only on first access.

```ts
// Before: the static import parses the whole file at boot and crashes on startup
import wordsData from '../data/words.json';

// After: read and parse only on first access, then cache
import { readFileSync } from 'node:fs';

let _words: string[] | null = null;
export function getWords(): string[] {
  if (!_words) {
    _words = JSON.parse(
      readFileSync(new URL('../data/words.json', import.meta.url), 'utf-8')
    );
  }
  return _words!;
}
```

I also added `--max-old-space-size=460` to the start script (Render allows slightly over the nominal 256 MB before killing the process).

## Getting Google to read the sitemaps

With 227,000+ URLs, Google wouldn't read our sitemaps.
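Since several large files need the same treatment, the getter pattern can be generalized into a tiny factory. This is my own sketch rather than code from the site; the `lazyJson` name and the file paths are illustrative:

```javascript
import { readFileSync } from 'node:fs';

// Generic lazy JSON loader (illustrative helper, not the site's actual code):
// nothing is read from disk until the returned getter is first called,
// after which the parsed value is cached for all later calls.
function lazyJson(path) {
  let cache = null;
  return () => {
    if (cache === null) {
      cache = JSON.parse(readFileSync(path, 'utf-8'));
    }
    return cache;
  };
}

// One getter per large data file; startup memory stays flat until a
// route actually needs the data.
const getSynonyms = lazyJson('./data/synonyms.json');
const getRelated = lazyJson('./data/related.json');
```

Each call site then pays the parse cost once, on first access, instead of every file being loaded at boot.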
Two issues:

1. **The files were too large.** 45,000 URLs × ~120 bytes ≈ 5.4 MB per file, and Google has an unofficial soft limit of roughly 1 MB. Fixed by reducing to 1,000 URLs per file (278 files).
2. **The host went down.** Render's free tier sleeps on inactivity. When it's down, Google can't fetch the sitemap and backs off for weeks.

The solution: push the sitemaps to GitHub Pages as a static mirror. A simple script clones the repo, copies the XML files, rewrites the sitemap index URLs to point to the mirror, and pushes:

```js
// publish-sitemaps.mjs (simplified; SRC_DIR and WORK_DIR are defined elsewhere)
import { copyFileSync, readdirSync, readFileSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';

const files = readdirSync(SRC_DIR).filter(f => /^sitemap.*\.xml$/.test(f));

for (const f of files) {
  if (f === 'sitemap.xml') {
    // Rewrite the index so sub-sitemap URLs point to the mirror
    const content = readFileSync(join(SRC_DIR, f), 'utf-8');
    const rewritten = content.replace(
      /https:\/\/www\.korsordsakuten\.se\/(sitemap-\d+\.xml)/g,
      'https://sitemaps.korsordsakuten.se/$1'
    );
    writeFileSync(join(WORK_DIR, f), rewritten);
  } else {
    copyFileSync(join(SRC_DIR, f), join(WORK_DIR, f));
  }
}
```

A custom subdomain (`sitemaps.korsordsakuten.se` → GitHub Pages via a CNAME record) keeps the URLs clean. Now Google can always fetch the sitemaps, even when the main site is sleeping.

## Structured data: QAPage for clue pages

For clue pages (`/korsord/[word]`), I added QAPage schema instead of just FAQPage:

```json
{
  "@type": "QAPage",
  "mainEntity": {
    "@type": "Question",
    "name": "avslutningsvis — korsordssvar",
    "answerCount": 8,
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "SLUTLIGEN (9 bokstäver)",
      "upvoteCount": 8
    },
    "suggestedAnswer": []
  }
}
```

This signals to Google that each clue page is structured Q&A content, similar to how Q&A forum sites are interpreted, rather than auto-generated thin content.
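Schema like this can be generated per clue page from the answer list. Here's a minimal sketch of how that might look; the `buildQaSchema` helper and the `answers` array of `{ word }` objects are my own assumptions, not the site's actual code:

```javascript
// Hypothetical helper: build a QAPage object for one clue page.
// Assumes `answers` is a non-empty array of { word } sorted best-first.
function buildQaSchema(clue, answers) {
  const label = (w) => `${w} (${w.length} bokstäver)`;
  return {
    '@context': 'https://schema.org',
    '@type': 'QAPage',
    mainEntity: {
      '@type': 'Question',
      name: `${clue} — korsordssvar`,
      answerCount: answers.length,
      acceptedAnswer: {
        '@type': 'Answer',
        text: label(answers[0].word),
        upvoteCount: answers.length,
      },
      suggestedAnswer: answers.slice(1).map((a) => ({
        '@type': 'Answer',
        text: label(a.word),
      })),
    },
  };
}
```

In an Astro page, the result can then be serialized into a `<script type="application/ld+json">` tag, for example with the `set:html` directive on the script element.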
## DefinedTerm for word pages

For word pages, `DefinedTerm` with synonyms as `alternateName` marks the page as a lexical resource:

```json
{
  "@type": "DefinedTerm",
  "name": "PLATS",
  "alternateName": ["STÄLLE", "POSITION", "LÄGE"],
  "inDefinedTermSet": {
    "@type": "DefinedTermSet",
    "name": "Korsordsakuten ordlista"
  }
}
```

## Freshness signals without a database

One thing competitor sites have that purely static sites lack: freshness signals. Google re-crawls active sites more frequently. The solution: `/dagens-ledtradar`, a page showing 24 curated crossword clues that rotates every day using a deterministic seed:

```js
const dayNum = Math.floor((Date.now() - epoch) / 86400000);
const rng = mulberry32(dayNum * 7919 + 13);
// Pick 24 unique entries from the top-2000 clues
```

Same date, same picks (so the page is cacheable), but crawling the URL the next day returns different content. This triggers Google's freshness heuristic without any database or cron job.

## Lessons learned

- **Start with a paid host.** The Render free-tier sleeping and OOM issues cost weeks of SEO recovery time.
- **Plan for JSON size early.** Lazy loading was the fix, but the problem was predictable.
- **Submit 10 hand-picked URLs to GSC from day one.** Don't wait for the sitemap crawler to discover everything.

- Site: korsordsakuten.se
- Daily clues: /dagens-ledtradar

Happy to answer questions about any part of the stack!
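PS: since the daily-clues snippet references `mulberry32` without showing it: it's a well-known public-domain 32-bit seeded PRNG, not a library import. Here's its standard implementation, together with a `dailyPicks` helper that is my own sketch of the selection step (the real code may differ):

```javascript
// mulberry32: small, fast 32-bit seeded PRNG. Same seed in, same
// sequence of floats in [0, 1) out — which is what makes the daily
// page deterministic and cacheable.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper: pick `n` unique indices into a pool of clues
// for a given day number. Same dayNum always yields the same picks.
function dailyPicks(dayNum, poolSize, n) {
  const rng = mulberry32(dayNum * 7919 + 13);
  const picked = new Set();
  while (picked.size < n) {
    picked.add(Math.floor(rng() * poolSize));
  }
  return [...picked];
}
```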
