# Offset Pagination Step by Step (with Sharding)

Mohamed Idris · DEV Community

## Why pagination exists

Imagine a `workers` table with 1,000,000 rows. A request like `GET /workers` that returns all of them would:

- send a huge JSON payload over the wire
- crash the browser trying to render it
- hammer the database

The fix is simple: send a small slice at a time. That slice is called a page. That is all pagination is. We will build the rest from this idea.

Think of a deck of 100 cards. You cannot hand someone the whole deck and ask them to find one card fast. Instead you say: "Take 10 cards at a time. When you finish, ask me for the next 10."

- 10 cards = page size
- "which group of 10" = page number
- "ask me for the next 10" = next page link

That is the whole game.

## skip and take

To grab a slice from a database, every ORM needs two things:

- `take`: how many rows to return (the slice size)
- `skip`: how many rows to ignore from the start

In SQL these are `LIMIT` and `OFFSET`:

```sql
SELECT * FROM workers ORDER BY id ASC LIMIT 10 OFFSET 20;
```

That says: skip 20 rows, then give me 10. That is page 3 if each page has 10 rows.

## From page numbers to skip/take

We do not want clients sending `skip` and `take` directly; that is awkward. Clients think in page numbers: "give me page 1, page 2, page 3". So we accept a page number and convert it to skip/take ourselves. If the page size is 10:

| Page | Skip | Take |
|------|------|------|
| 1    | 0    | 10   |
| 2    | 10   | 10   |
| 3    | 20   | 10   |
| 4    | 30   | 10   |

See the pattern? `skip = (page - 1) * size`. That is the formula in our codebase:

```typescript
return {
  take: page.size,
  skip: (page.num - 1) * page.size,
  // ...
};
```

Why `(page - 1)` and not just `page`? Because `OFFSET` means "how many rows to skip", not "which row index to start at". Page 1 should skip nothing (offset 0). Page 2 should skip one page worth of rows (offset 10). Page 3, two pages (offset 20). So we subtract 1 before multiplying. If we used `page * size` instead, page 1 would skip 10 rows and the user would never see the first 10 records; the first page would be unreachable. The smallest valid offset is 0 (skip nothing). It is not 1, even though page numbers start at 1.

## How does the client know when to stop?

Two common answers:

**Option A: return a total count.** "There are 237 workers. You are on page 3 of 24." This needs a `COUNT(*)` query on every request, which is slow on big tables.

**Option B: just tell them whether a next page exists.** Return a `next` link or null.

This codebase picks Option B. The response shape is:

```json
{
  "data": [ ... 10 workers ... ],
  "links": {
    "next": "https://api.example.com/workers?page=2"
  }
}
```

When `next` is missing, you have hit the end.

## How do we know the next page exists?

The trick: peek. Ask the database "is there at least one row on the next page?" If yes, return a `next` link. If no, return `undefined`. In this codebase that lives in `getNextPage`:

```typescript
const nextPageNum = currentPage.num + 1;
const nextPageInShard = getPage(nextPageNum, currentPage.shard);
const countRemainingInShard = await countOnPage(nextPageInShard, ...);
if (countRemainingInShard > 0) {
  return nextPageInShard;
}
```

It runs a count query with the next page's skip and take. If the count is greater than 0, the next page has data.

Why not just check whether the current page came back full? That is a very common shortcut: if you asked for 10 and got 10, maybe there are more; if you got fewer than 10, you are at the end. It works, but it lies in one edge case: when the total is a perfect multiple of the page size. With 30 rows, page 3 returns exactly 10. The shortcut says "probably more", so the client requests page 4, gets an empty array, and now you served an extra useless request. Doing a count avoids that.

Cost tradeoff: the count query is extra work. For small to medium tables it is fine. For huge tables you would switch to cursor pagination (we will mention that at the end).
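To make the peek concrete, here is a minimal self-contained sketch of the idea (not the codebase's actual implementation). It assumes a Prisma-style client with a `worker` model; `prisma`, `PAGE_SIZE`, and the simplified `queryParameters` and `countOnPage` below are stand-ins for whatever the real project exposes.

```typescript
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient(); // assumed: a schema with a `worker` model
const PAGE_SIZE = 10;

interface Page {
  num: number;
  size: number;
}

// The formula from above: skip = (page - 1) * size, take = size.
function queryParameters(page: Page) {
  return { skip: (page.num - 1) * page.size, take: page.size };
}

// Peek: count how many rows land on a given page (0 means the page is empty).
async function countOnPage(page: Page): Promise<number> {
  return prisma.worker.count(queryParameters(page));
}

// Return the next page if it has at least one row, otherwise undefined.
async function getNextPage(current: Page): Promise<Page | undefined> {
  const next: Page = { num: current.num + 1, size: current.size };
  return (await countOnPage(next)) > 0 ? next : undefined;
}
```

A controller would then serve `{ data, links: { next } }` and simply omit `next` whenever `getNextPage` comes back `undefined`.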
## One Page object

Instead of passing `(num, size, shard)` everywhere, we wrap them in a single object:

```typescript
interface Page {
  num: number;
  size: number;
  shard?: number;
}
```

And a tiny helper builds it with safe defaults:

```typescript
export function getPage(pageNum?: number, shard?: number): Page {
  return {
    num: pageNum ? pageNum : FIRST_PAGE, // default to page 1
    size: PAGE_SIZE,                     // fixed at 10
    shard: shard !== undefined ? shard : DEFAULT_SHARD,
  };
}
```

If the client sends nothing, they get page 1, size 10, shard 0. Nice and forgiving.

## Parsing query params in one place

Clients send pagination as query params: `?page=2&shard=0`. We turn that into a `Page` object once, in one place:

```typescript
export const PaginationPage = createParamDecorator((_data, ctx) => {
  const request = ctx.switchToHttp().getRequest();
  const page = parseOptionalInt(request.query.page);
  const shard = parseOptionalInt(request.query.shard);
  return getPage(page, shard);
});
```

Now any controller can do:

```typescript
async get(@PaginationPage() page: Page) { ... }
```

No manual parsing in every handler. Very clean.

## Building the next link

Once we know the next page exists, we build a URL the client can call directly. Important detail: keep all the other query params the user sent (filters, sorting), only change pagination.

```typescript
const url = new URL(`${request.protocol}://${request.get("Host")}${request.originalUrl}`);
const searchParams = new URLSearchParams(url.search);
searchParams.set("page", nextPage.num.toString());
if (nextPage.shard !== undefined) {
  searchParams.set("shard", nextPage.shard.toString());
}
```

`searchParams.set` overwrites just those keys. If the original URL was `/workers?location=NY&page=1`, the next link becomes `/workers?location=NY&page=2`. The filter survives. This pattern is called HATEOAS: the server tells the client where to go next, instead of the client guessing the URL shape.

## Bonus step: sharding

Heads up: if sharding is new to you, the basics above are already enough to understand pagination. This step is a bonus that explains the extra logic in this codebase. You can skim it on the first read and come back later.

Sharding means splitting one big table into smaller logical buckets. Each row has a `shard` column (0, 1, 2, ...), and queries always filter by one shard. Why? On gigantic tables it spreads load and lets you query smaller subsets. In this project shards are limited to `MAX_SHARDS = 10`.

The pagination has to walk through shard 0 first, then shard 1, then shard 2, and so on. When shard 0 is exhausted, jump to shard 1, page 1. That is the second half of `getNextPage`:

```typescript
// no more rows in current shard, try next shard
const nextShard = (currentPage.shard ?? DEFAULT_SHARD) + 1;
if (nextShard > MAX_SHARDS) {
  return undefined; // we have walked all shards, truly done
}
const pageInNextShard = getPage(FIRST_PAGE, nextShard);
const countInNextShard = await countOnPage(pageInNextShard, ...);
if (countInNextShard > 0) {
  return pageInNextShard;
}
return undefined;
```

Reading top to bottom:

1. Try the next page in the same shard. Has data? Return it.
2. Otherwise, jump to page 1 of the next shard. Has data? Return it.
3. Otherwise, end.

```
shard 0: [page 1] -> [page 2] -> [page 3] -> done in this shard
                                                  |
                                                  v
shard 1: [page 1] -> [page 2] -> done in this shard
                                      |
                                      v
shard 2: [page 1] -> ...
```

The client never sees this complexity. They just keep following `links.next`.
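Reading the two halves of `getNextPage` separately can make the control flow hard to hold in your head, so here is a hedged sketch of the whole walk in one function. It extends the earlier stand-in with the shard filter and assumes the constants the article names (`FIRST_PAGE = 1`, `DEFAULT_SHARD = 0`, `MAX_SHARDS = 10`); the real codebase's signatures may differ.

```typescript
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient(); // assumed: `worker` model with a `shard` column
const FIRST_PAGE = 1;
const DEFAULT_SHARD = 0;
const MAX_SHARDS = 10;

interface Page {
  num: number;
  size: number;
  shard?: number;
}

// Count rows on a page *within its shard*.
async function countOnPage(page: Page): Promise<number> {
  return prisma.worker.count({
    skip: (page.num - 1) * page.size,
    take: page.size,
    where: { shard: page.shard ?? DEFAULT_SHARD },
  });
}

async function getNextPage(currentPage: Page): Promise<Page | undefined> {
  // 1. Try the next page in the same shard.
  const nextPageInShard: Page = {
    num: currentPage.num + 1,
    size: currentPage.size,
    shard: currentPage.shard ?? DEFAULT_SHARD,
  };
  if ((await countOnPage(nextPageInShard)) > 0) return nextPageInShard;

  // 2. Current shard is exhausted: jump to page 1 of the next shard.
  const nextShard = (currentPage.shard ?? DEFAULT_SHARD) + 1;
  if (nextShard > MAX_SHARDS) return undefined; // walked all shards, truly done

  const pageInNextShard: Page = {
    num: FIRST_PAGE,
    size: currentPage.size,
    shard: nextShard,
  };
  return (await countOnPage(pageInNextShard)) > 0 ? pageInNextShard : undefined;
}
```

Note that, exactly like the snippet above, an empty next shard ends the walk rather than probing shard + 2; that matches the codebase's behavior as shown.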
## The full request flow

For `GET /workers?page=2`:

1. The decorator parses `?page=2` into a `Page` object: `{ num: 2, size: 10, shard: 0 }`.
2. The service calls `queryParameters(page)` -> `{ skip: 10, take: 10, where: { shard: 0 } }`.
3. Prisma runs `SELECT ... WHERE shard = 0 ORDER BY id ASC LIMIT 10 OFFSET 10`.
4. The service calls `getNextPage(...)`, which counts the next slice.
5. The controller maps rows to DTOs and builds the next link.
6. The client gets `{ data: [...], links: { next: "...?page=3&shard=0" } }`.

Every piece has one job. That is why each function looks small.

## Gotchas worth remembering

- Always include `ORDER BY` in paginated queries. Without it, databases can return rows in any order, and the same row could appear on two pages or be skipped. The codebase uses `orderBy: { id: "asc" }` for this reason.
- Cap the page size on the server. If the client could send `?size=1000000`, you are back to the original problem. This codebase hard-codes `PAGE_SIZE = 10` so the client cannot abuse it.
- Default to page 1 if the param is missing or invalid. Be forgiving.
- Skipping is O(N). `OFFSET 100000` makes the database scan and discard 100,000 rows. That is fine for small offsets, painful for huge ones. See the next section.

## Where offset pagination breaks down

You will hit two problems eventually:

**Problem 1: deep pages are slow.** `OFFSET 1000000 LIMIT 10` makes the database walk through a million rows just to throw them away.

**Problem 2: shifting data.** If a row is inserted while the user paginates, page boundaries shift. They might see the same row twice or miss one.

The fix for both is cursor pagination: instead of "page 2", the client sends "give me 10 rows after id=42". The query becomes `WHERE id > 42 ORDER BY id LIMIT 10`, which uses an index and is fast no matter how deep you go (a sketch follows at the end of this post). You give up the ability to jump to "page 47" directly; you can only go forward (and sometimes backward). For infinite-scroll feeds this is perfect. For admin tables with page numbers, offset pagination is fine. This project uses offset pagination because the page sizes are small and the use case suits it. Knowing the alternative is gold in interviews.

## Takeaways

- Pagination = serve big lists in small slices.
- Formula: `skip = (page - 1) * size`, `take = size`.
- Page numbers are 1-based (page 1 is the first page), but `OFFSET` is a count of rows to skip. Page 1 must skip 0 rows. That is why we subtract 1 before multiplying.
- Always order results.
- Return a `next` link instead of a total count when you do not need page numbers in the UI.
- To know if `next` exists, peek at the next slice with a count query.
- Sharding adds an outer loop: walk pages within a shard, then jump to the next shard.
- For very large datasets or live feeds, switch to cursor pagination.

If you can explain that list out loud without notes, you understand pagination better than 90 percent of candidates.
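For contrast, here is the cursor sketch promised above. It illustrates the `WHERE id > 42` idea under the same assumptions as the earlier sketches (a Prisma-style client with a `worker` model), not code from this project.

```typescript
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient(); // assumed: `worker` model with an indexed `id`
const PAGE_SIZE = 10;

// "Give me PAGE_SIZE rows after this id." No OFFSET, so the database can seek
// straight to the cursor via the primary-key index, however deep you are.
async function getWorkersAfter(afterId?: number) {
  const data = await prisma.worker.findMany({
    where: afterId !== undefined ? { id: { gt: afterId } } : undefined,
    orderBy: { id: "asc" },
    take: PAGE_SIZE,
  });
  // The last id in this slice is the cursor for the next request. A full slice
  // might still be the final one (the same perfect-multiple caveat as before),
  // so a peek query can tighten this check if the extra round trip matters.
  const nextCursor = data.length === PAGE_SIZE ? data[data.length - 1].id : undefined;
  return { data, nextCursor };
}
```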