
AI News Hub

Architecting a Two-Stage Semantic Search Pipeline with HNSW, LATERAL JOIN, and Cubic Scoring

DEV Community
Siyu

AI agents are becoming a new interface for finding people. Instead of opening a marketplace, typing keywords, filtering profiles, and manually deciding who is worth contacting, a user can now say something like:

> Help me find a few SaaS founders who might need my backend architecture services.

or:

> Find remote Rust freelancers who have experience with early-stage infrastructure products.

In Opportunity Skill, the user's AI agent turns that request into a semantic search query, calls the QuestMeet backend, receives a compact list of matched candidates, and then drafts tailored collaboration proposals for the user to approve.

This post is a technical walkthrough of the backend search function behind that flow. The interesting part is not just "use embeddings". The search engine combines:

- PostgreSQL pgvector cosine distance
- HNSW indexes
- tag-level semantic recall
- active-user filtering
- cubic similarity scoring
- LATERAL JOIN impression reranking
- separate buyer/professional identity perspectives

The goal is simple: given a natural-language request from an AI agent, return the candidates worth contacting, together with enough semantic context for the agent to explain the match and write a good proposal.

Opportunity Skill is an Agent Skill that makes a user discoverable to other agents. It supports agent products that follow the Skill specification, such as Claude Code and OpenClaw. The user does not need to manually browse a website; the agent calls functions exposed by the skill.

At a high level, the skill has four processes. The search engine discussed in this article powers the Search and Contact process. When the user asks the agent to find buyers or professionals, the agent calls one of these functions:

- `ai_search_buyers`
- `ai_search_professionals`

These functions are implemented in the skill's `scripts/callable_functions.py` file and communicate with QuestMeet through GraphQL.
A simplified version looks like this:

```python
import httpx
from typing import Union

BASE_URL = "https://questmeet.ai/graphql"


def ai_search_professionals(access_token: str, query: str) -> Union[list, bool, None]:
    try:
        response = httpx.post(
            BASE_URL,
            json={
                "query": """
                    query AiSearchProfessionals($query: String!) {
                        aiSearchProfessionals(query: $query)
                    }
                """,
                "variables": {"query": query},
            },
            headers={"Authorization": f"Bearer {access_token}"},
            trust_env=False,
            timeout=20,
        )
        return response.json()["data"]["aiSearchProfessionals"]
    except Exception:
        return False
```

The return values have clear semantics for the agent:

| Return value | Meaning |
| --- | --- |
| `list[dict]` | Relevant candidates were found |
| `[]` | The request succeeded, but no relevant candidates were found |
| `None` | The access token is missing or expired; the agent should re-authenticate |
| `False` | Something failed; notify the user and stop |

This is important because the agent, not the server, owns the workflow. If the token is expired, the skill instructs the agent to run the sign-in process, obtain a new token, and retry.

On the server side, the two public search fields are small wrappers around the same internal function:

```python
@strawberry.field
async def ai_search_buyers(self, info: Info, query: str) -> Optional[JSON]:
    try:
        return await search_buyers_or_professionals(info, query, "Buyer")
    except Exception:
        return False

@strawberry.field
async def ai_search_professionals(self, info: Info, query: str) -> Optional[JSON]:
    try:
        return await search_buyers_or_professionals(info, query, "Professional")
    except Exception:
        return False
```

The only difference is the perspective argument:

- `"Buyer"` means we search for users as buyers, including employers and clients.
- `"Professional"` means we search for users as professionals, including freelancers and employees.

This distinction is not cosmetic. The same human can be both a buyer and a professional. A founder may want to hire developers while also being discoverable as a product consultant.
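The perspective-dependent identity can be illustrated with a toy dispatch function. This is a sketch, not the production code: the field names mirror the schema described in this article, but the helper and the data are made up.

```python
# Toy illustration: a user carries two external IDs, and the requested
# perspective decides which one is exposed to the agent as candidate_id.
def candidate_id_for(user: dict, perspective: str) -> str:
    """Return the external candidate ID for the requested perspective."""
    if perspective == "Buyer":
        return user["buyer_id"]
    if perspective == "Professional":
        return user["professional_id"]
    raise ValueError(f"unknown perspective: {perspective}")


# A founder who hires developers AND offers consulting (made-up IDs).
founder = {
    "buyer_id": "b-1f2e",        # discoverable as a client
    "professional_id": "p-9c4d", # discoverable as a consultant
}

print(candidate_id_for(founder, "Buyer"))         # b-1f2e
print(candidate_id_for(founder, "Professional"))  # p-9c4d
```

The agent only ever sees the `candidate_id` appropriate to the search it ran, so the two identities stay in separate matching contexts.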
These two identities should not share the same matching context. That is why each user has two external candidate IDs:

```sql
professional_id VARCHAR(50) UNIQUE DEFAULT gen_random_uuid()::text,
buyer_id VARCHAR(50) UNIQUE DEFAULT gen_random_uuid()::text
```

When a user is returned as a professional, the API returns `professional_id` as `candidate_id`. When a user is returned as a buyer, the API returns `buyer_id` as `candidate_id`. The agent receives a unified `candidate_id` and does not need to know which internal column was used.

The search function touches five tables:

- `users`
- `logins`
- `tags`
- `impressions`
- `impression_tags`

The core idea is that the user's profile is not just a display profile written for humans. It is a set of impressions written for AI agents to search and reason over. An impression is a structured statement about a user's expertise, collaboration style, communication preference, leadership style, taste, or requirements. For example, a professional's impression might say:

> This developer prefers projects where technical architecture decisions are made explicitly. They value strict type definitions, maintainable interfaces, and long-term code evolution over quick prototypes.

Each impression is associated with 1 to 5 tags:

```python
impressions_with_tags_format = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "impression": {"type": "string"},
            "tags": {
                "type": "array",
                "items": {"type": "string"},
                "maxItems": 5,
            },
        },
        "required": ["impression", "tags"],
        "additionalProperties": False,
    },
}
```

The tags are not used as a traditional keyword system. They are embedded into vector space and used as a lightweight semantic recall layer.

The full search pipeline begins like this:

- The agent calls `ai_search_buyers` or `ai_search_professionals`.
- The GraphQL resolver receives the request and checks whether an authenticated `user_id` exists.
- If no: return `None`, prompting the agent to re-authenticate. If yes: proceed to the next step.
- Vectorize the natural-language query.
- Search tags using pgvector cosine distance, keeping tags with distance <= 0.55.
- Map tags back to active users, score them, rerank with a per-user impression search, and return the results.

The top-level structure of the internal function looks like this:

```python
async def search_buyers_or_professionals(
    info: Info, query: str, perspective: str
) -> Optional[Union[list[dict], bool]]:
    if logged_user_id := info.context["user_id"]:
        if len(embedding_models) == 1:
            embeddings = await vectorize_contents(embedding_models[0], [query])
            # ... tag-level recall, cubic scoring, and LATERAL JOIN reranking ...
            # if candidates are found, returns list[dict] here
        elif len(embedding_models) == 2:
            embeddings = await vectorize_contents(embedding_models[0], [query])
            other_embeddings = await vectorize_contents(embedding_models[1], [query])
            # ... tag-level recall, cubic scoring, and LATERAL JOIN reranking ...
            # if candidates are found, returns list[dict] here
        return []  # executed only when no candidates matched
    return None
```

In the QuestMeet GraphQL service, `info.context["user_id"]` is populated after the access token is verified. If it is missing, the function returns `None`. The server does not attempt to redirect or refresh the token; it only tells the agent: you are not authenticated for this operation. The skill then instructs the agent to run the sign-in process again, store the new access token in long-term memory or a local .txt file, and retry the original process. This keeps the backend simple and makes the agent responsible for workflow recovery.

The user request arrives as natural language:

> Find remote Rust freelancers who have experience with early-stage infrastructure products.

Before querying PostgreSQL, the server embeds it:

```python
embeddings = await vectorize_contents(embedding_models[0], [query])
```

The database schema uses 1536-dimensional vectors:

```sql
odd_embedding vector(1536),
even_embedding vector(1536)
```

The search function then uses the query embedding to search semantically related tags and impressions.

The first database query searches the tags table:

```sql
SELECT tag_id, distance
FROM (
    SELECT tag_id, ({embedding_column} <=> $1::vector) AS distance
    FROM tags
) AS tag_distances
WHERE distance <= 0.55
ORDER BY distance ASC
LIMIT 100
```

The `<=>` operator is pgvector's cosine distance operator.
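As a sanity check on what the cosine distance operator computes, the same quantity can be reproduced in a few lines of plain Python. This is an illustrative sketch with toy 3-dimensional vectors; the production vectors are 1536-dimensional.

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity, the same quantity pgvector's
    cosine distance operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

query_vec = [1.0, 0.0, 0.0]
close_tag = [0.9, 0.1, 0.0]  # nearly parallel -> small distance
far_tag = [0.0, 1.0, 0.0]    # orthogonal -> distance 1.0

print(cosine_distance(query_vec, close_tag))  # well under the 0.55 threshold
print(cosine_distance(query_vec, far_tag))    # 1.0, filtered out
```

A tag passes the recall filter exactly when its distance to the query vector is at most 0.55, i.e. when its cosine similarity is at least 0.45.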
Cosine distance is:

`distance = 1 - cosine_similarity`

So:

`similarity = 1.0 - distance`

The filter `WHERE distance <= 0.55` therefore keeps only tags whose cosine similarity to the query is at least 0.45. This is intentionally not too strict. If the threshold is too high, the search becomes brittle and misses useful matches. If it is too low, the search lets in too much noise. In this system, tag search is the recall stage, so the threshold should be broad enough to catch potentially relevant concepts while still filtering out unrelated tags. The `LIMIT 100` prevents broad queries from pulling too many tags into the next stage.

The tags table has HNSW indexes on both embedding columns:

```sql
CREATE INDEX i_tags_odd_embedding ON tags
USING hnsw (odd_embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 128);

CREATE INDEX i_tags_even_embedding ON tags
USING hnsw (even_embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 128);
```

The goal is to make semantic tag recall fast even as the tag vocabulary grows.

After retrieving semantically related tags, the server maps them back to users through `impression_tags`:

```python
tag_distances = {
    row["tag_id"]: float(row["distance"]) for row in rows
}
tag_ids = list(tag_distances.keys())
```

Then:

```sql
SELECT impression_tags.tag_id, impression_tags.user_id
FROM impression_tags
JOIN (
    SELECT logins.user_id
    FROM logins
    WHERE logins.updated_at >= NOW() - '1 month'::interval
    GROUP BY logins.user_id
) AS active_users
ON impression_tags.user_id = active_users.user_id
WHERE impression_tags.tag_id = ANY($1::bigint[])
AND impression_tags.is_public = TRUE
```

This query does three things.

First, `impression_tags` acts as an inverted index: it connects impressions and tags. If a tag such as "Rust" or "Type Safety" is semantically close to the query, the join table tells us which users have public impressions associated with that tag.

Second, `impression_tags.is_public = TRUE` excludes private impressions. A user may have private impressions used only for self-reflection or agent memory. Those should not be discoverable by other users' agents.
This check acts as the first layer of filtering during tag-level recall.

Third, `logins.updated_at >= NOW() - '1 month'::interval` keeps only active users. A matching profile is only useful if the person is still active. Traditional candidate-search products often surface abandoned profiles; for agent-driven collaboration, that wastes the user's time. This query only keeps users who have logged in within the last month.

The relevant index is:

```sql
CREATE INDEX i_impression_tags_tag_user_public
ON impression_tags (tag_id, user_id)
WHERE is_public IS TRUE;
```

This partial index keeps the public-tag reverse lookup efficient.

For every matched (tag_id, user_id) pair, the server converts cosine distance back to similarity:

```python
similarity = 1.0 - tag_distances[row["tag_id"]]
```

Then it adds the cube of the similarity to the user's score:

```python
record_id = row["user_id"]
if record_id in record_scores:
    record_scores[record_id] += similarity * similarity * similarity
else:
    record_scores[record_id] = similarity * similarity * similarity
```

Why cube the similarity? Because weak semantic matches should not dominate the ranking just because there are many of them. For example, three weak matches at similarity 0.5 contribute 3 × 0.125 = 0.375, while a single strong match at similarity 0.95 contributes about 0.857; under linear scoring the three weak matches would win instead.

After scoring, the server takes the top 50 users, excludes the current user, and reranks each remaining candidate with a per-user LATERAL subquery over their impressions:

```sql
SELECT content, ({embedding_column} <=> $3::vector) AS distance
FROM impressions
WHERE user_id = users.user_id
AND perspective = $2
AND is_public = TRUE
```

This subquery runs as `JOIN LATERAL ( ... ) AS impressions`, and its output is filtered with `WHERE distance <= 0.55 ORDER BY distance ASC LIMIT 10`. The full query also guards against rows whose embedding columns are missing (conditions on `(even_embedding IS NULL)`), which matters when data was encoded with only one of the models.

When two embedding models are configured, the query is embedded with both models.
The server searches both vector columns:

```python
for embedding, column in [
    (embeddings[0], embedding_column),
    (other_embeddings[0], other_embedding_column),
]:
    query = f"""
        SELECT tag_id, distance
        FROM (
            SELECT tag_id, ({column} <=> $1::vector) AS distance
            FROM tags
        ) AS tag_distances
        WHERE distance <= 0.55
        ORDER BY distance ASC
        LIMIT 100
    """
```

The impression rerank likewise unions both columns:

```sql
SELECT content, ({embedding_column} <=> $3::vector) AS distance
FROM impressions
WHERE user_id = users.user_id
AND perspective = $2
AND is_public = TRUE
UNION ALL
SELECT content, ({other_embedding_column} <=> $4::vector) AS distance
FROM impressions
WHERE user_id = users.user_id
AND perspective = $2
AND is_public = TRUE
```

Then the combined results are filtered:

```sql
WHERE distance <= 0.55
ORDER BY distance ASC
LIMIT 10
```

This makes the search function tolerant of data encoded with either embedding model. That is useful when rotating embedding models, migrating old vectors, or operating with more than one embedding source during a transition period.

A simpler version of this system would be: embed the query and run a global vector search over all impressions. That sounds attractive, but it has drawbacks. Impressions are longer and more numerous than tags. A single user may have many impressions, and each impression is a rich semantic statement. Searching the entire impression table globally would make the expensive part of the pipeline happen too early.

Instead, this system uses a two-stage approach: the tag layer is a lightweight proxy for semantic recall, and the impression layer is used only after the candidate set has been reduced to a small number of users. This gives the system a useful split:

| Stage | Purpose | Design |
| --- | --- | --- |
| Tag recall | High recall, low cost | HNSW vector search over tags |
| User scoring | Rank likely candidates | Sum of similarity³ |
| Impression rerank | High precision | Per-user semantic filtering with LATERAL JOIN |

This is not just a performance optimization. It also improves the quality of results. Tags help the system identify relevant candidates. Impressions help the system explain why they are a good fit.
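The interaction between tag-level recall and cubic scoring can be mimicked in plain Python. This is a toy simulation with made-up similarity values (no database, no real embeddings), only meant to show why the cube lets a few strong tag matches outrank many weak ones:

```python
# Stage 1 output: (tag, user, similarity) hits from tag recall (made up).
tag_hits = [
    ("rust", "alice", 0.92),
    ("rust", "bob", 0.92),
    ("type-safety", "alice", 0.80),
    ("devops", "bob", 0.50),
    ("devops", "carol", 0.50),
    ("agile", "carol", 0.48),
    ("remote", "carol", 0.47),
]

# Stage 2: accumulate the cube of each similarity per user.
scores: dict[str, float] = {}
for _tag, user, sim in tag_hits:
    scores[user] = scores.get(user, 0.0) + sim ** 3

# Rank users by score; carol's three weak hits rank below the
# fewer but stronger hits of alice and bob.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['alice', 'bob', 'carol']
```

In the real system, only the top-ranked users from this stage proceed to the expensive per-user impression rerank.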
Here are the indexes most relevant to this search path.

For tag vector search:

```sql
CREATE INDEX i_tags_odd_embedding ON tags
USING hnsw (odd_embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 128);

CREATE INDEX i_tags_even_embedding ON tags
USING hnsw (even_embedding vector_cosine_ops)
WITH (m = 32, ef_construction = 128);
```

For active-user filtering:

```sql
CREATE INDEX i_logins_updated_at_user_id ON logins (updated_at, user_id);
```

For public tag-to-user lookup:

```sql
CREATE INDEX i_impression_tags_tag_user_public
ON impression_tags (tag_id, user_id)
WHERE is_public IS TRUE;
```

For per-user impression lookup:

```sql
CREATE INDEX i_impressions_user_perspective ON impressions (user_id, perspective);
```

The impressions table is also partitioned by `user_id`:

```sql
CREATE TABLE impressions (
    impression_id BIGINT GENERATED BY DEFAULT AS IDENTITY,
    user_id BIGINT NOT NULL,
    perspective VARCHAR(50) NOT NULL,
    content TEXT NOT NULL,
    odd_embedding vector(1536),
    even_embedding vector(1536),
    is_public BOOLEAN NOT NULL,
    updated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (user_id, impression_id)
) PARTITION BY RANGE (user_id);
```

This helps keep per-user impression retrieval predictable as the table grows.

Here is the full search pipeline, step by step:

1. The agent sends a natural-language query.
2. Embed the query into a vector.
3. Find semantically close tags using pgvector cosine distance.
4. Keep tags with distance <= 0.55, limiting to 100 tags.
5. Reverse-lookup users via the `impression_tags` join table.
6. Filter to include only public impressions.
7. Filter to include only users active in the last month.
8. Compute each user's score as the sum of similarity³.
9. Take the top 50 users.
10. Exclude the current user.
11. For each remaining user, run a LATERAL JOIN against their impressions.
12. Keep only impressions matching the requested perspective (Buyer or Professional).
13. Keep only impressions with distance <= 0.55.
14. Return up to 10 impressions per user.
Finally, the agent receives the results, recommends candidates to the user, and drafts collaboration proposals.

Here are the main design lessons from building this.

A normal search API might return IDs and display fields. An AI-agent search API should return evidence. That is why the response includes query-relevant impressions: the agent needs them to explain the match and write a personalized message.

Raw keyword search is too brittle, and global impression vector search can be too expensive too early. Semantic tags give the system a compact recall layer between keywords and full documents.

Linear scoring makes it easier for many weak matches to overpower fewer strong matches. Cubic scoring is a simple way to let strong matches dominate without completely discarding weaker supporting signals.

The same person can be a buyer in one context and a professional in another. Mixing these impressions would create strange matches. Keeping `buyer_id`, `professional_id`, and `perspective` separate makes the search context cleaner.

Returning `None` for expired authentication lets the agent recover by running the sign-in process. Returning `False` for other failures tells the agent not to blindly retry. For agent workflows, these distinctions matter.

Opportunity Skill is built around a simple belief: in the AI-agent era, your profile should not only be readable by humans. It should be searchable, interpretable, and actionable by agents.

The search function described here is one part of that system. It turns a natural-language request into:

- semantically matched candidates
- identity-aware candidate IDs
- relevant profile evidence
- compact context that an AI agent can use to draft a proposal

If you want to try the skill, you can ask your agent to install it from:

https://github.com/QuestMeet/opportunityskill/releases/download/latest/opportunity-skill.zip