Why text-to-image AI keeps failing at scientific figures (and what actually works)
Over the past month I've been trying to replace my "spend half a day in Illustrator drafting figures" workflow with AI tools. I tried Midjourney, GPT-Image-1, DALL-E, Stable Diffusion fine-tunes, generic diagram-from-text tools, and finally a purpose-built scientific illustration tool. Only the last one actually worked for figures destined for a peer-reviewed paper. This post is the autopsy on why the others failed.

The first problem is text. You ask for a figure showing "PCR cycling steps: denaturation, annealing, extension," and the model writes "Denaturition", "Aneling", and "Estention" inside the boxes. Or it gets the words right but renders "DNA" as a five-letter blob. This happens because image-gen models treat text as pixels; they don't know it's text.

Workarounds people try:

- Generate the image, then edit the text in Photoshop. Works, but it's manual and removes the speed advantage.
- Use models with stronger text rendering (Flux 1.1 Pro, Ideogram). Better, but still wrong ~20% of the time, and you don't see which 20% until you've already exported.

For a journal figure, the failure mode is invisible until a reviewer screenshots a mislabeled box and tells you to redo Figure 3.

The second problem, and the real killer, is that edits aren't compositional. Say the model gives you a four-panel figure: A, B, C, D. The reviewer asks: "Add a fifth panel showing the control condition." In pixels, there is no "add a panel." The only way to edit is to re-prompt, and the new image will not preserve the exact layout, colors, fonts, or sizing of panels A-D. Every revision starts from scratch.

The real-world cost: a recent paper of mine went through three revisions over six months. With pixel-AI tools, that's three from-scratch redraws. With Illustrator, it's three quick edits. With a structured-canvas tool that holds the figure as boxes, arrows, and labels under the hood, it's three "add panel E" instructions, no redrawing.

The third problem is style. General image AIs are trained on stock photos, art, and memes, not on scientific publications. The "diagram" they produce is the cartoony kind you'd see on a tech blog: 3D glossy boxes, comic-style arrows, gradient fills. Journals expect 2D, line-weight-controlled, color-restrained, vector output. You can prompt your way around some of this ("flat, minimal, journal style"), but the visual primitives the model knows are still pop-art primitives. It doesn't know what a cell membrane looks like in a methods schematic, or what a Sankey diagram for ecological flow should look like.

The pattern that fixes all three problems: keep the figure as a structured representation (boxes, arrows, labels, panels) underneath the natural-language prompt, and only render to pixels at export time. That way:

- Text is text. It can't hallucinate spelling.
- "Add panel E" is a real operation on the structure (see the sketch below).
- The library of primitives can be science-shaped: receptor cartoons, organelles, pathway arrows, multi-panel grids with consistent spacing.
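To make that concrete, here is a minimal sketch of what such a structure could look like. This is my own illustration, not figcanvas's actual data model: the Figure, Panel, Box, and Label classes and the to_svg export are names invented for this post. The point is that labels live as strings, panels live as list entries, and pixels (or vectors) only appear at export.

```python
from dataclasses import dataclass, field

@dataclass
class Label:
    text: str   # a string, not pixels: "denaturation" cannot drift to "Denaturition"
    x: float
    y: float

@dataclass
class Box:
    x: float
    y: float
    w: float
    h: float
    label: Label

@dataclass
class Panel:
    name: str                                       # "A", "B", "C", ...
    boxes: list[Box] = field(default_factory=list)

@dataclass
class Figure:
    panels: list[Panel] = field(default_factory=list)
    panel_w: float = 200.0   # multi-panel grid with consistent spacing
    gap: float = 20.0

    def add_panel(self, name: str) -> Panel:
        # "Add panel E" is an append; existing panels are not touched at all.
        panel = Panel(name=name)
        self.panels.append(panel)
        return panel

    def to_svg(self) -> str:
        # Rendering happens only at export time. Every label lands in the
        # output as a literal <text> element, so the spelling in the export
        # is exactly the spelling in the data.
        parts = []
        for i, panel in enumerate(self.panels):
            ox = i * (self.panel_w + self.gap)  # x-origin of this panel
            parts.append(f'<text x="{ox + 5}" y="15" font-weight="bold">{panel.name}</text>')
            for b in panel.boxes:
                parts.append(f'<rect x="{ox + b.x}" y="{b.y}" width="{b.w}" '
                             f'height="{b.h}" fill="none" stroke="black"/>')
                parts.append(f'<text x="{ox + b.label.x}" y="{b.label.y}">{b.label.text}</text>')
        w = len(self.panels) * (self.panel_w + self.gap)
        return (f'<svg xmlns="http://www.w3.org/2000/svg" width="{w}" height="300">'
                + "".join(parts) + "</svg>")

# The PCR example from above, as data rather than pixels:
fig = Figure()
a = fig.add_panel("A")
for i, step in enumerate(["denaturation", "annealing", "extension"]):
    a.boxes.append(Box(x=10, y=30 + 60 * i, w=180, h=40,
                       label=Label(text=step, x=20, y=55 + 60 * i)))
print(fig.to_svg())
```

A real tool would have far richer primitives (arrows, receptor glyphs, styles), but the shape of the fix is the same: the natural-language prompt edits this structure, and the renderer is deterministic.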
The tool I landed on after the experiment is figcanvas.com: paste a Methods paragraph, get a structured first draft, iterate per panel with plain English, export vector. The first-draft quality isn't perfect (it sometimes drops a label when reshuffling), but the iteration loop is the win: I went from an empty Illustrator file to a clean methods schematic in 25 minutes, a job that used to take half a day. More importantly, when the second reviewer asks for changes, those changes take 10 minutes instead of two hours.

Image-gen AI is great for thumbnails, blog hero images, and concept art. For scientific figures it's a trap because:

- Text inside images is unreliable.
- Edits aren't compositional.
- The visual style is wrong for journals.

Pick a tool that treats the figure as structure, not pixels. If you write papers and resent every minute spent in Illustrator, the structured-canvas approach is worth a weekend trial. Even if you don't switch tools long-term, you'll learn what the AI tooling actually can and can't do for academic work in 2026.
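One closing footnote on the compositional-edit point. Assuming the sketch above were saved as a module called figure_model.py (again, a name invented for this post), the reviewer request from earlier becomes a few lines of structural change rather than a re-prompt:

```python
import json
from dataclasses import asdict

# Hypothetical module: the Figure/Box/Label sketch above saved as figure_model.py.
from figure_model import Box, Figure, Label

def add_control_panel(fig: Figure) -> Figure:
    # Reviewer: "Add a fifth panel showing the control condition."
    # An append leaves panels A-D byte-for-byte identical, so their layout,
    # colors, fonts, and sizing all survive the revision.
    e = fig.add_panel("E")
    e.boxes.append(Box(x=10, y=30, w=180, h=40,
                       label=Label(text="control condition", x=20, y=55)))
    return fig

# The structure is also plain data, so it can sit in version control between
# revisions instead of every revision starting from scratch:
fig = add_control_panel(Figure())
print(json.dumps(asdict(fig), indent=2))
```

That diffable snapshot is what a pixel workflow can't give you: revision three starts from revision two's data, not from a blank prompt.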
