Use Data Stores In Application Development | 🏗️ Build A Product Catalog API
Exam Guide: Developer - Associate
🏗️ Domain 1: Development with AWS Services
Task 3: Use Data Stores In Application Development

DynamoDB dominates this task. You need to understand table design, key selection, indexing, consistency models, and how to write efficient queries. You also need to know caching with ElastiCache and DAX, plus specialized stores such as OpenSearch.

Primary Keys

Every table needs one. There are two options:

- Simple primary key: partition key (PK) only. Each item must have a unique PK.
- Composite primary key: partition key (PK) + sort key (SK). Multiple items can share a PK as long as their SKs differ.

Partition Key Selection

The partition key determines which physical partition stores your data. A good partition key has high cardinality (many distinct values), so data and traffic spread evenly across partitions.

| Good Partition Keys | Bad Partition Keys |
|---|---|
| userId, orderId, sessionId | status ("active"/"inactive") |
| deviceId, transactionId | country (few values, uneven distribution) |
| email, accountId | date (today's date becomes a hot partition) |

Read Consistency

| Model | Behaviour | Cost | Available On |
|---|---|---|---|
| Eventually consistent | May return stale data (usually consistent within 1 second) | 1x (0.5 RCU per 4 KB read) | Base table, LSIs, and GSIs |
| Strongly consistent | Always returns the most up-to-date data | 2x (1 RCU per 4 KB read) | Base table and LSIs only (NOT GSIs) |

Query vs Scan

| Operation | How It Works | Cost | When to Use |
|---|---|---|---|
| Query | Finds items by partition key + optional sort key condition | Reads only items matching the key condition | Prefer this for almost every access pattern |
| Scan | Reads every item in the table | Reads the entire table | Analytics and one-time migrations only |
| GetItem | Fetches one item by its full primary key | Reads exactly one item | When you know the exact key |

FilterExpression does NOT reduce the amount of data read. It only filters what is returned to you, so you still pay for the full scan or query. To reduce reads, use better key design or GSIs.
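The Cost columns above reduce to simple arithmetic. Below is a simplified sketch, assuming the documented sizing rules: one RCU covers one strongly consistent read of up to 4 KB, an eventually consistent read costs half, and a Query or Scan sums the sizes of all items it reads before rounding up to the next 4 KB. The function names and item sizes are hypothetical, and the model ignores details such as 1 MB result pagination.

```python
import math

RCU_BLOCK_BYTES = 4 * 1024  # one RCU = one strongly consistent read of up to 4 KB


def read_units(total_bytes: int, strongly_consistent: bool = False) -> float:
    """RCUs for a single read request that reads `total_bytes` of item data.

    Sizes are rounded UP to the next 4 KB block (at least one block);
    eventually consistent reads cost half as much as strongly consistent ones.
    """
    blocks = max(1, math.ceil(total_bytes / RCU_BLOCK_BYTES))
    return blocks * (1.0 if strongly_consistent else 0.5)


def scan_with_filter(items, keep, strongly_consistent=False):
    """Simplified Scan + FilterExpression: every item is read (and billed),
    then `keep` decides which items are returned."""
    returned = [item for item in items if keep(item)]
    rcus = read_units(sum(item['size_bytes'] for item in items), strongly_consistent)
    return {'Count': len(returned), 'ScannedCount': len(items), 'RCUs': rcus}


if __name__ == '__main__':
    # Hypothetical table: three 1.5 KB items
    items = [
        {'category': 'electronics', 'size_bytes': 1500},
        {'category': 'books', 'size_bytes': 1500},
        {'category': 'books', 'size_bytes': 1500},
    ]
    print(scan_with_filter(items, lambda i: True))
    print(scan_with_filter(items, lambda i: i['category'] == 'electronics'))
```

Both scans report 1.0 RCU (4,500 bytes rounds up to two 4 KB blocks at 0.5 each), even though the filtered one returns a single item: the filter changes Count, not the bill.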
GSI vs LSI

| Feature | GSI | LSI |
|---|---|---|
| Partition Key | Can differ from the base table | Same as the base table |
| Sort Key | Can differ from the base table (optional) | Different from the base table |
| When To Create | Anytime | At table creation only |
| Throughput | Its own (separate from the base table) | Shares the base table's |
| Consistency | Eventually consistent only | Supports strongly consistent reads |
| Limit | 20 per table (default quota) | 5 per table |

GSI projection types:

- ALL: all attributes (most flexible, highest storage cost)
- KEYS_ONLY: only the table and index key attributes (cheapest)
- INCLUDE: keys plus the attributes you specify (balanced)

Caching

| Service | Use Case | Latency | Works With |
|---|---|---|---|
| DAX | DynamoDB read cache | Microseconds | DynamoDB only; caches eventually consistent reads only |
| ElastiCache (Redis) | General-purpose cache | Sub-millisecond | Any data source; complex data types; persistence |
| ElastiCache (Memcached) | Simple caching | Sub-millisecond | Any data source; multi-threaded; no persistence |

Choosing a Data Store

| Store | Use Case |
|---|---|
| DynamoDB | Key-value lookups, known access patterns, serverless |
| RDS/Aurora | Relational data, complex joins, ACID transactions |
| OpenSearch | Full-text search, log analytics, complex queries |
| S3 | Object storage, data lake, large files |
| ElastiCache | Session storage, leaderboards, real-time analytics |

DynamoDB TTL: automatically deletes expired items at no cost (TTL deletes don't consume write capacity). Deletion is eventually consistent: it runs in the background, so expired items can linger for up to 48 hours.

S3 Lifecycle Policies: transition objects between storage classes (Standard → IA → Glacier) or expire them after a set time.

Now let's put these concepts into practice by building a Product Catalog API backed by DynamoDB.

What you'll build:

- A DynamoDB table with a composite primary key and a Global Secondary Index (GSI)
- A Lambda function that performs queries, scans, and writes
- TTL configured for automatic data expiration
- DAX caching in front of DynamoDB (console walkthrough)
- A clear understanding of when to use query vs scan, GSI vs LSI, and strong vs eventual consistency

Prerequisites: an AWS account (the free tier covers DynamoDB and Lambda).

Part I Design the DynamoDB Table

Before creating anything, think about access patterns. This is the most important step in DynamoDB design.
Our Access Patterns

| Access Pattern | How We'll Query |
|---|---|
| Get a product by ID | GetItem PK = PRODUCT#123, SK = METADATA |
| List all products in a category | Query PK = CATEGORY#electronics |
| Get a product's reviews | Query PK = PRODUCT#123, SK begins_with REVIEW# |
| Find products by price range in a category | Query GSI1: GSI1PK = CATEGORY#electronics, GSI1SK between two PRICE# values |
| List recently added products | Query GSI1: GSI1PK = STATUS#active, GSI1SK = createdAt date |

Create the Table

Step 01: Open the DynamoDB console
Step 02: Click Create table
- Table name: ProductCatalog
- Partition key: PK (String)
- Sort key (optional): SK (String)
Step 03: Under Table settings, choose Customize settings
Step 04: Read/write capacity settings: On-demand

⚠️ Don't create the table just yet. Add a GSI first.

Add a Global Secondary Index

Step 05: Scroll down to Secondary indexes → click Create global index:
- Index name: GSI1
- Partition key: GSI1PK (String)
- Sort key: GSI1SK (String)
- Attribute projections: All

Click Create index, then click Create table.

✅ Green banner: The ProductCatalog table was created successfully.

Why This Design?

We're using the single-table design pattern with overloaded keys.

Base table:
- PK = "PRODUCT#laptop-001", SK = "METADATA" → product details
- PK = "PRODUCT#laptop-001", SK = "REVIEW#2026-04-24" → a review
- PK = "CATEGORY#electronics", SK = "PRODUCT#laptop-001" → category listing

GSI1:
- GSI1PK = "CATEGORY#electronics", GSI1SK = "PRICE#00079.99" → find by price
- GSI1PK = "STATUS#active", GSI1SK = "2026-04-24" → find by date

Each item holds only one GSI1PK value, so the STATUS#active entries would have to be separate items (or live in a second GSI); the product items in this tutorial use only the category/price form.

Partition Key Selection: a good partition key has high cardinality (many distinct values). Bad examples: status (only a few values → hot partition) and country (uneven distribution). Good examples: userId, productId, orderId.
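If you prefer scripting over clicking, the console steps in Create the Table map onto the parameters of a CreateTable request roughly as below. This is a sketch assuming boto3's low-level `create_table` call; the helper name `product_catalog_table_params` is hypothetical. It only builds the request dictionary, so you can inspect it without AWS credentials.

```python
def product_catalog_table_params(table_name: str = 'ProductCatalog') -> dict:
    """CreateTable parameters matching the console steps: string PK/SK keys,
    on-demand billing, and a GSI1 index that projects ALL attributes."""
    return {
        'TableName': table_name,
        # Every attribute used in a table or index key schema must be declared here
        'AttributeDefinitions': [
            {'AttributeName': 'PK', 'AttributeType': 'S'},
            {'AttributeName': 'SK', 'AttributeType': 'S'},
            {'AttributeName': 'GSI1PK', 'AttributeType': 'S'},
            {'AttributeName': 'GSI1SK', 'AttributeType': 'S'},
        ],
        'KeySchema': [
            {'AttributeName': 'PK', 'KeyType': 'HASH'},   # partition key
            {'AttributeName': 'SK', 'KeyType': 'RANGE'},  # sort key
        ],
        # On-demand capacity: no ProvisionedThroughput on the table or the GSI
        'BillingMode': 'PAY_PER_REQUEST',
        'GlobalSecondaryIndexes': [
            {
                'IndexName': 'GSI1',
                'KeySchema': [
                    {'AttributeName': 'GSI1PK', 'KeyType': 'HASH'},
                    {'AttributeName': 'GSI1SK', 'KeyType': 'RANGE'},
                ],
                'Projection': {'ProjectionType': 'ALL'},
            }
        ],
    }


if __name__ == '__main__':
    import json
    print(json.dumps(product_catalog_table_params(), indent=2))
```

With credentials configured, passing these to `boto3.client('dynamodb').create_table(**product_catalog_table_params())` would issue the same request the console makes.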
Add Items and Compare Query vs Scan

Using the Console

Step 01: In the DynamoDB console, click ProductCatalog
Step 02: Click Explore table items
Step 03: Click Create item and switch to JSON view (toggle at the top)

```json
{
  "PK": {"S": "PRODUCT#laptop-001"},
  "SK": {"S": "METADATA"},
  "GSI1PK": {"S": "CATEGORY#electronics"},
  "GSI1SK": {"S": "PRICE#00999.99"},
  "name": {"S": "Pro Laptop 15"},
  "description": {"S": "15-inch laptop with 16GB RAM"},
  "price": {"N": "999.99"},
  "category": {"S": "electronics"},
  "status": {"S": "active"},
  "createdAt": {"S": "2026-04-20T10:00:00Z"},
  "stock": {"N": "50"}
}
```

Step 04: Click Create item

Add a few more items the same way, one at a time:

```json
{
  "PK": {"S": "PRODUCT#mouse-002"},
  "SK": {"S": "METADATA"},
  "GSI1PK": {"S": "CATEGORY#electronics"},
  "GSI1SK": {"S": "PRICE#00029.99"},
  "name": {"S": "Wireless Mouse"},
  "description": {"S": "Ergonomic wireless mouse"},
  "price": {"N": "29.99"},
  "category": {"S": "electronics"},
  "status": {"S": "active"},
  "createdAt": {"S": "2026-04-22T10:00:00Z"},
  "stock": {"N": "200"}
}
```

```json
{
  "PK": {"S": "PRODUCT#laptop-001"},
  "SK": {"S": "REVIEW#2026-04-24#user-001"},
  "rating": {"N": "5"},
  "comment": {"S": "Great laptop, fast and reliable"},
  "userId": {"S": "user-001"},
  "createdAt": {"S": "2026-04-24T14:30:00Z"}
}
```

```json
{
  "PK": {"S": "CATEGORY#electronics"},
  "SK": {"S": "PRODUCT#laptop-001"},
  "name": {"S": "Pro Laptop 15"},
  "price": {"N": "999.99"}
}
```

```json
{
  "PK": {"S": "CATEGORY#electronics"},
  "SK": {"S": "PRODUCT#mouse-002"},
  "name": {"S": "Wireless Mouse"},
  "price": {"N": "29.99"}
}
```

Query (Efficient)

Step 01: In the ▼ Scan or query items panel, switch from Scan to Query
Step 02: Set:
- Partition key value: PRODUCT#laptop-001
- Sort key: Begins with ▼ REVIEW#
Click Run

✅ Green banner: Completed · Items returned: 1 · Items scanned: 1 · Efficiency: 100% · RCUs consumed: 0.5

You'll see only the review items for that product. DynamoDB read exactly the items you asked for, which is what makes Query efficient.
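The GSI1SK values in the items above (PRICE#00999.99, PRICE#00029.99) are zero-padded because DynamoDB orders String sort keys by their UTF-8 bytes, not numerically. A minimal sketch of that encoding, assuming prices below 100,000 with two decimal places; `price_sort_key` is a hypothetical helper name:

```python
from decimal import Decimal


def price_sort_key(price: Decimal, width: int = 8) -> str:
    """Builds a GSI1SK value such as PRICE#00999.99.

    String sort keys compare character by character, so '5.00' would sort after
    '29.99'. Zero-padding every price to the same width (8 characters here, i.e.
    prices up to 99999.99) makes string order match numeric order.
    """
    return f'PRICE#{price:0{width}.2f}'


if __name__ == '__main__':
    prices = [Decimal('999.99'), Decimal('5.00'), Decimal('29.99'), Decimal('89.99')]

    print(sorted('PRICE#' + str(p) for p in prices))
    # ['PRICE#29.99', 'PRICE#5.00', 'PRICE#89.99', 'PRICE#999.99']  (5.00 is out of place)

    print(sorted(price_sort_key(p) for p in prices))
    # ['PRICE#00005.00', 'PRICE#00029.99', 'PRICE#00089.99', 'PRICE#00999.99']

    # A price-range lookup on padded keys, the same comparison a
    # Key('GSI1SK').between(low, high) condition would make in boto3
    low, high = price_sort_key(Decimal('20')), price_sort_key(Decimal('100'))
    print([k for k in sorted(price_sort_key(p) for p in prices) if low <= k <= high])
    # ['PRICE#00029.99', 'PRICE#00089.99']
```

The fixed width is a design trade-off: it must be wide enough for the largest price you will ever store, because a key that overflows the width sorts incorrectly again.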
Scan (Expensive)

Step 03: Switch back to Scan and run it with no filters

✅ Green banner: Completed · Items returned: 5 · Items scanned: 5 · Efficiency: 100% · RCUs consumed: 2

You'll see ALL items in the table, because DynamoDB read every single one. On a table with millions of items, this is slow and expensive.

Step 04: Click Add filter
- Attribute name: category
- Type: String
- Condition: Equal to
- Value: electronics
Click Run

✅ Green banner: Completed · Items returned: 2 · Items scanned: 5 · Efficiency: 40% · RCUs consumed: 2

You'll see only the electronics items, but DynamoDB still read the entire table and filtered afterward. The filter doesn't reduce the read cost.

💡 FilterExpression does NOT reduce the amount of data read. It only filters what's returned to you, so you still pay for the full scan. To reduce reads, use better key design or GSIs.

Query the GSI

Step 05: Switch to Query
Step 06: Change the Index dropdown to GSI1
- Partition key (GSI1PK) value: CATEGORY#electronics
- Sort key (GSI1SK): Begins with ▼ PRICE#
Click Run

✅ Green banner: Completed · Items returned: 2 · Items scanned: 2 · Efficiency: 100% · RCUs consumed: 0.5

This efficiently finds all electronics products, sorted by price in ascending order (the zero-padded GSI1SK strings sort numerically). That's the power of a well-designed GSI.

Build the Lambda Function

Step 01: Open the Lambda console → Create function
- Function name: ProductCatalogAPI
- Runtime: Python 3.12
Click Create function

✅ Green banner: Successfully created the function "ProductCatalogAPI".

Step 02: Configuration → General configuration → click Edit
- Memory: 256 MB
- Timeout: 10 seconds
Click Save

Step 03: Add DynamoDB Permissions
Configuration → Permissions → click the role name
Add permissions → Attach policies → AmazonDynamoDBFullAccess

⚠️ In production, scope this down to only the ProductCatalog table. For this tutorial, full access keeps things simple.
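A sketch of what a scoped-down policy could look like, generated as JSON you could attach as an inline policy on the execution role instead of AmazonDynamoDBFullAccess. It assumes the function only needs the four operations its code calls; the region and account ID are placeholders, and `product_catalog_policy` is a hypothetical helper. The basic CloudWatch Logs permissions already come from the role Lambda created with the function.

```python
import json


def product_catalog_policy(region: str, account_id: str, table: str = 'ProductCatalog') -> dict:
    """Least-privilege identity policy for the ProductCatalogAPI function."""
    table_arn = f'arn:aws:dynamodb:{region}:{account_id}:table/{table}'
    return {
        'Version': '2012-10-17',
        'Statement': [
            {
                'Sid': 'ProductCatalogReadWrite',
                'Effect': 'Allow',
                # Only the operations the function's code actually uses
                'Action': [
                    'dynamodb:GetItem',
                    'dynamodb:PutItem',
                    'dynamodb:Query',
                    'dynamodb:Scan',
                ],
                # The table itself, plus its indexes (needed to Query GSI1)
                'Resource': [table_arn, f'{table_arn}/index/*'],
            }
        ],
    }


if __name__ == '__main__':
    # Placeholder region and account ID: substitute your own
    print(json.dumps(product_catalog_policy('us-east-1', '123456789012'), indent=2))
```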
Step 04: Write the Function Code

```python
import json
from decimal import Decimal

import boto3
from boto3.dynamodb.conditions import Key, Attr

# Initialize OUTSIDE the handler: reused across warm invocations
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('ProductCatalog')


class DecimalEncoder(json.JSONEncoder):
    """DynamoDB returns numbers as Decimal; this converts them to float for JSON."""

    def default(self, obj):
        if isinstance(obj, Decimal):
            return float(obj)
        return super().default(obj)


def lambda_handler(event, context):
    """
    Routes requests based on the HTTP method and path.
    Demonstrates query, scan, get_item, and put_item operations.
    """
    http_method = event.get('httpMethod', 'GET')
    path = event.get('path', '/')
    path_params = event.get('pathParameters') or {}
    query_params = event.get('queryStringParameters') or {}

    try:
        if path == '/products' and http_method == 'GET':
            return list_products(query_params)
        elif path.startswith('/products/') and path.endswith('/reviews') and http_method == 'GET':
            # Checked BEFORE the generic /products/{id} route below;
            # otherwise /products/laptop-001/reviews would return the product
            product_id = path_params.get('productId', path.split('/')[2])
            return get_reviews(product_id)
        elif path.startswith('/products/') and http_method == 'GET':
            product_id = path_params.get('productId', path.split('/')[-1])
            return get_product(product_id)
        elif path == '/products' and http_method == 'POST':
            # 'body' may be missing or null, so fall back to an empty JSON object
            body = json.loads(event.get('body') or '{}')
            return create_product(body)
        else:
            return response(404, {'error': 'Not found'})
    except Exception as e:
        print(f"Error: {str(e)}")
        return response(500, {'error': 'Internal server error'})


def list_products(query_params):
    """
    List products by category using QUERY (efficient).
    Falls back to SCAN if no category is specified (expensive: avoid in production).
    """
    category = query_params.get('category')

    if category:
        # QUERY: reads only the items in this category's partition
        print(f"Querying products in category: {category}")
        result = table.query(
            KeyConditionExpression=Key('PK').eq(f'CATEGORY#{category}')
        )
    else:
        # SCAN: reads the entire table, expensive!
        # In production, require a category or paginate with LastEvaluatedKey
        print("WARNING: Scanning entire table, this is expensive!")
        result = table.scan(
            FilterExpression=Attr('SK').eq('METADATA')
        )

    return response(200, {
        'products': result['Items'],
        'count': result['Count'],
        'scannedCount': result['ScannedCount']  # Shows the difference between query and scan
    })


def get_product(product_id):
    """
    Get a single product by ID using GET_ITEM.
    This is the most efficient read: it directly accesses one item by its full key.
    """
    result = table.get_item(
        Key={
            'PK': f'PRODUCT#{product_id}',
            'SK': 'METADATA'
        },
        # ConsistentRead=True  # Uncomment for a strongly consistent read (2x cost)
    )

    item = result.get('Item')
    if not item:
        return response(404, {'error': f'Product {product_id} not found'})

    return response(200, item)


def get_reviews(product_id):
    """
    Get all reviews for a product using QUERY with a sort key condition.
    begins_with on the sort key efficiently finds all reviews.
    """
    result = table.query(
        KeyConditionExpression=(
            Key('PK').eq(f'PRODUCT#{product_id}') &
            Key('SK').begins_with('REVIEW#')
        ),
        ScanIndexForward=False  # Sort descending (newest first)
    )

    return response(200, {
        'productId': product_id,
        'reviews': result['Items'],
        'count': result['Count']
    })


def create_product(body):
    """
    Create a new product using PUT_ITEM.
    Also writes the category listing item for efficient category queries.
    """
    missing = [f for f in ('productId', 'name', 'category', 'price') if f not in body]
    if missing:
        return response(400, {'error': f"Missing fields: {', '.join(missing)}"})

    product_id = body['productId']
    category = body['category']
    price = Decimal(str(body['price']))  # DynamoDB requires Decimal, not float!

    # Write the product metadata
    table.put_item(Item={
        'PK': f'PRODUCT#{product_id}',
        'SK': 'METADATA',
        'GSI1PK': f'CATEGORY#{category}',
        # Zero-padded to 8 characters (e.g. PRICE#00089.99) so string order matches
        # numeric order and the format matches the items created in the console
        'GSI1SK': f'PRICE#{price:08.2f}',
        'name': body['name'],
        'description': body.get('description', ''),
        'price': price,
        'category': category,
        'status': 'active',
        'stock': body.get('stock', 0)
    })

    # Write the category listing (denormalization for efficient queries)
    table.put_item(Item={
        'PK': f'CATEGORY#{category}',
        'SK': f'PRODUCT#{product_id}',
        'name': body['name'],
        'price': price
    })

    return response(201, {'message': 'Product created', 'productId': product_id})


def response(status_code, body):
    return {
        'statusCode': status_code,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps(body, cls=DecimalEncoder)
    }
```

Step 05: Click Deploy

✅ Green banner: Successfully updated the function "ProductCatalogAPI".

Notice Decimal(str(body['price'])). boto3 does NOT accept Python float values for DynamoDB numbers (it raises a TypeError), so you must use Decimal. Converting through str() first keeps the price exactly as sent (89.99) instead of the float's binary approximation. This is a common gotcha.

Step 06: Test The Function

In the Test tab, create one test event per request.

List products by category:

```json
{
  "httpMethod": "GET",
  "path": "/products",
  "queryStringParameters": {"category": "electronics"},
  "pathParameters": null
}
```

Get a single product:

```json
{
  "httpMethod": "GET",
  "path": "/products/laptop-001",
  "pathParameters": {"productId": "laptop-001"},
  "queryStringParameters": null
}
```

Get reviews:

```json
{
  "httpMethod": "GET",
  "path": "/products/laptop-001/reviews",
  "pathParameters": {"productId": "laptop-001"},
  "queryStringParameters": null
}
```

Create a product:

```json
{
  "httpMethod": "POST",
  "path": "/products",
  "body": "{\"productId\":\"keyboard-003\",\"name\":\"Mechanical Keyboard\",\"category\":\"electronics\",\"price\":89.99,\"stock\":100}",
  "pathParameters": null,
  "queryStringParameters": null
}
```

💡 Run each test and check the results. Compare count with scannedCount in the list response: with a query (category provided) they'll be equal; with a scan (no category), scannedCount will be higher.
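To see why the handler converts through str() first, the standalone sketch below compares the two conversions and shows `json.loads(..., parse_float=Decimal)`, a standard-library option you could use when parsing the request body instead. `to_dynamo_number` is a hypothetical helper, not part of boto3.

```python
import json
from decimal import Decimal


def to_dynamo_number(value):
    """Hypothetical helper: make a JSON-decoded number safe for a DynamoDB N attribute.

    boto3 rejects Python floats, and Decimal(float) would capture the float's
    binary approximation, so floats go through str() first.
    """
    if isinstance(value, float):
        return Decimal(str(value))
    return value  # ints and Decimals are accepted as-is


if __name__ == '__main__':
    price = json.loads('{"price": 89.99}')['price']  # a float after json.loads

    print(Decimal(price))           # the float's exact binary value: a long tail of digits
    print(to_dynamo_number(price))  # 89.99

    # Alternative: have json.loads build Decimals for every non-integer number
    body = json.loads('{"price": 89.99, "stock": 100}', parse_float=Decimal)
    print(repr(body['price']), type(body['stock']).__name__)  # Decimal('89.99') int
```

Integers are untouched by `parse_float`, which suits fields like stock.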
Strong vs Eventual Consistency

See the Difference

Step 01: In the Lambda function, find the commented-out ConsistentRead=True line in get_product, uncomment it, and click Deploy

✅ Green banner: Successfully updated the function "ProductCatalogAPI".

Now get_product uses strongly consistent reads, which:
- Always return the latest data
- Cost 2x the read capacity
- Only work on the base table and LSIs (not GSIs)

💡 GSIs only support eventually consistent reads.

When to Use Each

| Use Case | Consistency | Why |
|---|---|---|
| Product catalog browsing | Eventually consistent | Stale data for a second is fine |
| Shopping cart | Strongly consistent | User expects to see what they just added |
| Inventory check before purchase | Strongly consistent | Must be accurate to avoid overselling |
| Analytics dashboard | Eventually consistent | Slight delay is acceptable |

TTL: Automatic Data Expiration

Enable TTL

Step 01: In the DynamoDB console, click ProductCatalog
Step 02: Click ▼ Actions → Turn on TTL
Step 03: Under Turn on Time to Live (TTL), set TTL attribute name: ttl, then click Turn on TTL

✅ Green banner: Successfully activated Time to Live for the ProductCatalog table with the ttl attribute.

Step 04: Add Items with TTL

Create an item with a TTL value (Unix epoch timestamp in seconds):

```json
{
  "PK": {"S": "SESSION#user-001"},
  "SK": {"S": "DATA"},
  "userId": {"S": "user-001"},
  "cartItems": {"L": [{"S": "laptop-001"}, {"S": "mouse-002"}]},
  "ttl": {"N": "1745625600"}
}
```

The ttl value is a Unix timestamp in seconds; the item becomes eligible for deletion after that time. The sample value 1745625600 (2025-04-26 00:00:00 UTC) is already in the past, so use an epoch converter to set a time a few minutes in the future for testing.

💡 TTL deletion is eventually consistent: items may persist for up to 48 hours after expiration. Don't rely on TTL for exact timing, and always filter out expired items in your queries as a safety measure.

DAX (DynamoDB Accelerator)

DAX is an in-memory cache that sits in front of DynamoDB. It's API-compatible with DynamoDB, so it works as a drop-in replacement: same API, just a different client and endpoint.

When to Use DAX

| Scenario | Use DAX? |
|---|---|
| Read-heavy workload, same items queried repeatedly | Yes |
| Microsecond response times needed | Yes |
| Write-heavy workload | No (DAX is a read cache) |
| Need strongly consistent reads | No (DAX passes them through to DynamoDB without caching) |
| Diverse access patterns, rarely the same item twice | No (low cache hit rate) |

How DAX Works

- Without DAX: App → DynamoDB (single-digit millisecond reads)
- With DAX: App → DAX (microsecond reads on a cache hit) → DynamoDB (on a cache miss)

Console Walkthrough (Don't Create, Just Understand)

💸 DAX requires a VPC and costs money even when idle. We'll walk through the setup; stop before the final Create cluster click unless you really want to run a cluster.

Step 01: Click ▼ DAX
Step 02: Click Create cluster
- Cluster name: product-cache
- Node type family: t-type family
- Node type: dax.t3.small
- Cluster size: 3 nodes (for high availability)
Click Next
Step 03: Configure networks
- Network type: IPv4
- Subnet group: Create new
- Subnet group name: MySubnetGroup
- VPC ID: default VPC
- Subnets: select all
- Security group: default ▼ → click View in EC2 console
Step 04: Add an inbound rule allowing TCP port 8111 from your Lambda functions

✅ Green banner: Inbound security group rules successfully modified on security group

Step 05: Configure security
- IAM service role for DynamoDB access: Create new
- IAM role name: DaxToDynamoDB
Click Next → Next → Create cluster (billing starts here)

✅ Green banner: Successfully created the cluster product-cache.

Step 06: In your Lambda code, you'd swap the client. The function must also run inside the cluster's VPC and bundle the amazon-dax-client package.

```python
import boto3
import amazondax

# Without DAX
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('ProductCatalog')

# With DAX: same Table API, just a different resource and endpoint
# (placeholder endpoint: copy your cluster endpoint from the DAX console)
dax = amazondax.AmazonDaxClient.resource(
    endpoint_url='dax://product-cache.abc123.dax-clusters.us-east-1.amazonaws.com'
)
table = dax.Table('ProductCatalog')
# All your existing table.query / get_item / put_item code works unchanged!
```

DAX is a drop-in replacement for DynamoDB reads: same API, same code. Just change the client.

💡 But remember: DAX only caches eventually consistent reads (strongly consistent reads go straight through to DynamoDB) and pays off for read-heavy workloads.
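The hit/miss flow from How DAX Works can be modeled in a few lines to build intuition about stale reads. This is a toy in-process model, not how DAX is implemented: a dict stands in for the table, the class name `ReadThroughCache` is hypothetical, and it assumes an item-level TTL of 300 seconds to mirror DAX's default 5-minute item cache.

```python
import time


class ReadThroughCache:
    """Toy, in-process model of a DAX-style item cache (not DAX's implementation).

    - Eventually consistent reads are served from the cache while a fresh copy exists.
    - A cache miss reads the backing table and caches the result for `ttl_seconds`.
    - Strongly consistent reads skip the cache, mirroring how DAX passes them through.
    - Writes go to the table and refresh the cached copy (write-through).
    """

    def __init__(self, backing_table, ttl_seconds=300.0, clock=time.monotonic):
        self.table = backing_table  # dict standing in for DynamoDB: key -> item
        self.ttl = ttl_seconds      # 300 s mirrors DAX's default item-cache TTL
        self.clock = clock          # injectable so the example can fake time
        self._cache = {}            # key -> (item, expires_at)
        self.hits = 0
        self.misses = 0

    def get_item(self, key, consistent_read=False):
        if consistent_read:
            return self.table.get(key)  # pass through, never cached
        now = self.clock()
        cached = self._cache.get(key)
        if cached is not None and cached[1] > now:
            self.hits += 1
            return cached[0]
        self.misses += 1
        item = self.table.get(key)
        self._cache[key] = (item, now + self.ttl)
        return item

    def put_item(self, key, item):
        self.table[key] = item
        self._cache[key] = (item, self.clock() + self.ttl)


if __name__ == '__main__':
    fake_now = [0.0]
    table = {'PRODUCT#laptop-001': {'name': 'Pro Laptop 15'}}
    cache = ReadThroughCache(table, clock=lambda: fake_now[0])

    cache.get_item('PRODUCT#laptop-001')  # miss: read from the table
    cache.get_item('PRODUCT#laptop-001')  # hit: served from the cache
    table['PRODUCT#laptop-001'] = {'name': 'Pro Laptop 16'}  # updated behind the cache's back
    print(cache.get_item('PRODUCT#laptop-001')['name'])                        # Pro Laptop 15 (stale hit)
    print(cache.get_item('PRODUCT#laptop-001', consistent_read=True)['name'])  # Pro Laptop 16
    fake_now[0] = 301.0                                                        # past the item TTL
    print(cache.get_item('PRODUCT#laptop-001')['name'])                        # Pro Laptop 16
    print(cache.hits, cache.misses)                                            # 2 2
```

The stale "Pro Laptop 15" read is the eventual-consistency trade-off in miniature: anything that writes to the table without going through the cache is invisible to cached readers until the entry expires.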
| What You Did | Exam Concept |
|---|---|
| Designed a table around access patterns first | Access-pattern-driven DynamoDB design |
| Created a composite primary key (PK + SK) | Single-table design, sort key relationships |
| Added a Global Secondary Index with different keys | GSI for alternate access patterns |
| Used overloaded keys (PRODUCT#, CATEGORY#, REVIEW#) | Single-table design pattern |
| Queried by partition key with begins_with on the sort key | Efficient query operations |
| Ran a scan and compared Count vs ScannedCount | Scan is expensive: it reads the entire table |
| Added a FilterExpression to a scan | Filters run AFTER reading, so they don't save capacity |
| Used Decimal(str(price)) instead of float | DynamoDB type system in boto3: no float support |
| Toggled ConsistentRead=True on get_item | Strongly vs eventually consistent reads |
| Noted GSIs only support eventual consistency | GSI limitations |
| Enabled TTL on a ttl attribute | Automatic data lifecycle management |
| Walked through DAX setup | Read caching for DynamoDB, microsecond latency |

Clean Up

1. DynamoDB → Delete the ProductCatalog table
2. Lambda → Delete ProductCatalogAPI
3. IAM → Delete the Lambda execution role
4. CloudWatch → Delete the log groups
5. DAX → If you created the product-cache cluster, delete it along with its subnet group and the DaxToDynamoDB role

Key Takeaways

- Partition key cardinality: high cardinality = even distribution = good performance.
- Query > Scan: always prefer query. Scan reads the entire table.
- FilterExpression doesn't save reads: it filters after reading. Use key design or GSIs instead.
- GSIs can be added anytime; LSIs must be created with the table.
- GSIs are eventually consistent only: no strongly consistent reads on GSIs.
- Use Decimal, not float, for DynamoDB numbers in Python.
- TTL is free but eventually consistent (up to 48 hours delay).
- DAX = DynamoDB read cache (microsecond reads) with the same API as DynamoDB.
- DAX doesn't cache strongly consistent reads; it passes them through to DynamoDB.
- Single-table design with overloaded keys is a common DynamoDB pattern when access patterns are known up front.
