Implementing Semantic Matching in Nuxt with Cloudflare Vectorize
Learn how to implement AI-powered semantic matching in Nuxt using Cloudflare Vectorize. We normalize messy real-world data using vector search and automate the workflow with scheduled tasks.

Building on the previous article, where we set up our Cloudflare Queue to process location data and populate the Vectorize store, this article focuses on implementing the core semantic matching logic. This step is crucial for accurately matching locations from property agents' feeds with our internal data, enabling effective synchronisation of property data.
This article is the third and final part of the Nuxt & Cloudflare AI Vector Pipeline Series:
- Part one: Nuxt & Cloudflare Vectorize: Setting up D1, Drizzle, and Workers AI
- Part two: Nuxt & Cloudflare Queues: Building a Data Sync Pipeline using Vectorize
- Part three: Implementing semantic matching in Nuxt with Cloudflare Vectorize
The code for the entire demo application is publicly available on GitHub: Nuxt & Cloudflare AI Vector Pipeline Series — give it a ⭐.
The outcome covered today is a scheduled Nitro Task that adds properties from the agents' dummy feeds. We use a service class for the synchronisation process and a queue so that the process runs in the background, in batches.
I opted to use a service because the sync process includes several steps: retrieving the agent feed, skipping previously added properties, attempting to match the agent property to our locations table using simple string matching, falling back to AI semantic matching, and finally persisting the matched properties to our database.
Cloudflare Vectorize semantic matching
Since we know our service will need to use an AI vector repository, we can add this dependency. Create /server/utils/services/propertySync/repositories/AILocationMatcherRepository.ts with:
import {
AILocationMatch,
} from '~~/server/utils/services/propertySync/AILocationMatch'
import {
AgentProperty,
} from '~~/server/utils/services/propertySync/AgentProperty'
export interface AILocationMatcherRepository {
/**
* Attempts to find location matches based on the provided query.
*
* @returns A promise that resolves to an array of AILocationMatch objects.
* @param property
* @param limit
*/
findMatches (
property: AgentProperty, limit?: number
): Promise<AILocationMatch[]>
}
and /server/utils/services/propertySync/AILocationMatch.ts with:
export interface AILocationMatch {
id: string
score: number
}
and also /server/utils/services/propertySync/AgentProperty.ts with:
export interface AgentProperty {
propertyId: string;
title: string;
description?: string;
location?: string;
postcode?: string;
}
AILocationMatch describes each item in the promise returned by the repository's findMatches() method, and AgentProperty describes the listing it receives.
We can now implement the AILocationMatcherRepository interface in our CloudflareVectorAIRepository, which we created last time, adding findMatches() like so:
/**
* Finds matching locations for the given agent property.
*
* @param property
* @param limit
*/
public async findMatches (
property: AgentProperty,
limit: number = 5,
): Promise<AILocationMatch[]> {
const queryText = this.buildSearchQueryForProperty(property)
if (!queryText) return []
try {
// Generate Embedding
// Wraps existing private helper in case of model timeouts
const vectors = await this.embedTextBatch([queryText])
const queryVector = vectors[0]
if (!queryVector) return []
// Query Vectorize
const result = await this.vectorIndex.query(queryVector, {
topK: limit,
returnMetadata: 'none',
})
// Map Results
return result.matches.map((match) => ({
id: match.id,
score: match.score,
}))
} catch (error) {
console.error('[Repo] Vector search failed:', error)
return []
}
}
/**
* Builds the search query string for an agent's property.
*
* @param property
* @private
*/
private buildSearchQueryForProperty (property: AgentProperty): string {
const parts: string[] = [
property.title,
property.description ?? '',
property.postcode ?? '',
property.location ?? '',
]
return parts
.filter(p => p && p.trim().length > 0)
.join(' ')
.trim()
}
The findMatches() method builds a text search query from several attributes of an agent's property that may help match a vectorised location, queries the vector index, and returns the best matches (five by default). If the query text is empty or the vector search fails, it returns an empty array.
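To make the query-building step concrete, here is a minimal standalone sketch of the same filter-and-join idea applied to a messy listing. The AgentPropertyInput type, the buildSearchQuery function, and the sample listing are illustrative assumptions rather than part of the repository class, and this version additionally trims each attribute before joining.

```typescript
// Illustrative copy of the AgentProperty shape (hypothetical name).
interface AgentPropertyInput {
  propertyId: string
  title: string
  description?: string
  location?: string
  postcode?: string
}

// Trims each attribute, drops the empty ones, and joins the rest
// into a single string that can be sent to the embedding model.
function buildSearchQuery (property: AgentPropertyInput): string {
  return [
    property.title,
    property.description ?? '',
    property.postcode ?? '',
    property.location ?? '',
  ]
    .map(part => part.trim())
    .filter(part => part.length > 0)
    .join(' ')
}

// Made-up sample listing: blank description, no postcode, padded location.
const messyListing: AgentPropertyInput = {
  propertyId: 'demo-1',
  title: 'Bright 2-bed flat',
  description: '   ',
  location: ' nr. Shoreditch High St ',
}

const exampleQuery = buildSearchQuery(messyListing)
console.log(exampleQuery) // "Bright 2-bed flat nr. Shoreditch High St"
```

An empty result is the signal to skip the embedding call entirely, which is what the early return in findMatches() does.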
That's it for the vector store repository. We can now move on to a matching helper class.
Semantic search, semantic matching, and deterministic checks
When working with semantic search, we aim to minimise the number of embedding and vector queries we send, and thus the cost of external providers, whether that is Cloudflare or another AI model service. We should always try to resolve a match deterministically first.
In our example, we do this by matching the agent's location name against our internal records using simple string matching; we do the same for postcodes. In a production system, we typically do much more than basic string matching, but I am keeping this example simple for brevity and because we are using Cloudflare's D1. Since D1 is essentially an SQLite database, it lacks the native advanced fuzzy matching functionality found in other engines.
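For postcodes, a plain prefix comparison can over-match: "E10 5AB" starts with "E1" even though E10 and E1 are different districts. A slightly stricter deterministic check compares the outward code instead. The sketch below assumes UK-style postcodes supplied either in full or as a bare outward code; outwardCode and matchesDistrict are hypothetical helper names.

```typescript
// Extracts the outward code, e.g. "E1" from "E1 6AN".
// Assumption: the input is a full UK postcode or a bare outward code.
function outwardCode (postcode: string): string {
  const compact = postcode.replace(/\s+/g, '').toUpperCase()
  // A full postcode ends with a 3-character inward code, and the
  // shortest full postcode has 5 characters once spaces are removed.
  return compact.length >= 5 ? compact.slice(0, -3) : compact
}

// True when the postcode's outward code equals one of the district codes.
function matchesDistrict (postcode: string, districts: string[]): boolean {
  const outward = outwardCode(postcode)
  return districts.some(district => district.toUpperCase() === outward)
}

console.log('E10 5AB'.startsWith('E1'))         // true: the prefix check over-matches
console.log(matchesDistrict('E10 5AB', ['E1'])) // false
console.log(matchesDistrict('e1 6an', ['E1']))  // true
```

Partial inputs such as a postcode sector ("E1 6") are not handled here and would need their own rule.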
💡 Note: PostgreSQL offers fuzzy matching via trigrams (the pg_trgm extension), allowing it to recognise that a typo such as 'Lodnon' refers to 'London' without leaving the database layer. Let me know in the comments below if you’d like me to write more about AI integration strategies.
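If you want a little typo tolerance without leaving D1, one option is to compute an edit distance in application code before falling back to the vector search. The sketch below uses plain Levenshtein distance, which is a different technique from PostgreSQL's trigram similarity; the closestName helper, the sample names, and the maximum distance of 2 are assumptions for illustration.

```typescript
// Classic dynamic-programming Levenshtein distance
// (insertions, deletions, and substitutions each cost 1).
function levenshtein (a: string, b: string): number {
  // previous[j] is the distance between the processed prefix of a
  // and the first j characters of b.
  let previous: number[] = Array.from({ length: b.length + 1 }, (_, j) => j)
  for (let i = 1; i <= a.length; i++) {
    const current: number[] = [i]
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1
      current.push(Math.min(
        previous[j]! + 1,        // deletion
        current[j - 1]! + 1,     // insertion
        previous[j - 1]! + cost, // substitution
      ))
    }
    previous = current
  }
  return previous[b.length]!
}

// Returns the name closest to the input, or null when every name
// is more than maxDistance edits away. Ties keep the earliest name.
function closestName (
  input: string,
  names: string[],
  maxDistance = 2,
): string | null {
  const needle = input.trim().toLowerCase()
  let best: string | null = null
  let bestDistance = maxDistance + 1
  for (const name of names) {
    const distance = levenshtein(needle, name.toLowerCase())
    if (distance < bestDistance) {
      best = name
      bestDistance = distance
    }
  }
  return best
}

console.log(levenshtein('london', 'lodnon'))                     // 2
console.log(closestName('Lodnon', ['London', 'Luton', 'Leeds'])) // "London"
```

This scans every name in memory, so it only suits a small canonical list, and a fixed distance of 2 is too generous for very short names.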
With that said, let's add a helper class to avoid overcrowding our upcoming service class. Create server/utils/services/propertySync/helpers/LocationMatcher.ts with:
import type { Location } from '~~/server/database/types'
import type {
AILocationMatcherRepository,
} from '~~/server/utils/services/propertySync/repositories/AILocationMatcherRepository'
import {
AILocationMatch,
} from '~~/server/utils/services/propertySync/AILocationMatch'
import {
AgentProperty,
} from '~~/server/utils/services/propertySync/AgentProperty'
/**
* Attempts to match raw agent location data with canonical internal Locations.
*/
export class LocationMatcher {
public constructor (
private readonly canonicalLocations: Location[],
private readonly aiLocationMatcherRepository: AILocationMatcherRepository,
) {}
/**
* Attempts to match the given agent property location
* with an internal canonical location.
*/
public async match (
agentProperty: AgentProperty,
): Promise<Location | null> {
// Try the cheap deterministic route first: exact name or postcode prefix
const exactMatch = this.resolveExact(
agentProperty.location,
agentProperty.postcode,
)
if (exactMatch) {
return exactMatch
}
// Failing that, we try AI-based matching
const aiMatches = await this.aiLocationMatcherRepository.findMatches(
agentProperty,
5,
)
const autoLinkTarget = this.pickAutoLinkTarget(aiMatches)
if (autoLinkTarget) {
const matchedLocation = this.canonicalLocations.find(
(loc) => loc.id === autoLinkTarget.id,
)
if (matchedLocation) {
return matchedLocation
}
}
return null
}
/**
* Tries to find a Location based on exact Name match (case-insensitive)
* or Postcode prefix match.
*
* @private
* @param locationName
* @param postcode
*/
private resolveExact (
locationName?: string,
postcode?: string,
): Location | null {
// Name match first
if (locationName) {
const cleanName = locationName.trim().toLowerCase()
const matchedByName = this.canonicalLocations.find(
(loc) => loc.name.toLowerCase() === cleanName,
)
if (matchedByName) return matchedByName
}
// Then postcode prefix match
// Agent might provide "E1 6AN", we match if our location has "E1"
if (postcode) {
const cleanPostcode = postcode.trim().toUpperCase()
const matchedByPostcode = this.canonicalLocations.find((loc) =>
loc.postcodes.some(p => cleanPostcode.startsWith(p)),
)
if (matchedByPostcode) return matchedByPostcode
}
return null
}
/**
* Picks an auto-link target from the given candidates
* based on configured thresholds.
*
* @param candidates
* @returns The picked AI match or null if no suitable match is found.
*/
public pickAutoLinkTarget (
candidates: AILocationMatch[],
): AILocationMatch | null {
const config = useRuntimeConfig()
// The strict minimum score to accept
const strictMin = Number(config.aiMatcher?.minScore ?? 0.40)
// The strict margin over the second-best candidate
const strictMargin = Number(config.aiMatcher?.margin ?? 0.05)
if (!candidates.length) {
return null
}
const best = candidates[0]
if (!best) {
return null
}
const second = candidates.length > 1 ? candidates[1] : undefined
// Must meet minimum confidence score
const passesMin = best.score >= strictMin
// Must be "significantly" better than the runner-up
const passesMargin = !second || (best.score - second.score >= strictMargin)
if (!passesMin || !passesMargin) {
return null
}
return best
}
}
The helper depends on a list of canonical locations (i.e. our internal locations) and the location matcher repository. The match method first attempts to resolve the agent’s property location using a simple text match against our locations. If the deterministic approach fails, it calls findMatches() on the repository and applies the minimum-similarity-score and margin-threshold rules before returning a single qualifying semantic match.
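The two threshold rules are easiest to see with numbers. Below is a minimal standalone sketch of the same decision, with the thresholds passed in as arguments instead of read from runtime config. The pickConfident name and the sample scores are made up for illustration, and the candidates are assumed to be sorted by score, highest first.

```typescript
interface ScoredMatch {
  id: string
  score: number
}

// Returns the top candidate only if it clears both rules:
// 1. its score is at least minScore, and
// 2. it beats the runner-up (if any) by at least margin.
// Assumes candidates are sorted by score, highest first.
function pickConfident (
  candidates: ScoredMatch[],
  minScore: number,
  margin: number,
): ScoredMatch | null {
  const [best, second] = candidates
  if (!best) return null
  if (best.score < minScore) return null
  if (second && best.score - second.score < margin) return null
  return best
}

// Clear winner: 0.62 passes the minimum and leads by 0.12.
const clearWinner = pickConfident(
  [{ id: 'shoreditch', score: 0.62 }, { id: 'hoxton', score: 0.5 }],
  0.4,
  0.05,
)

// Too close to call: the lead is only 0.02, so nothing is linked.
const tooClose = pickConfident(
  [{ id: 'shoreditch', score: 0.62 }, { id: 'hoxton', score: 0.6 }],
  0.4,
  0.05,
)

// Too weak: a single candidate below the minimum score.
const tooWeak = pickConfident([{ id: 'shoreditch', score: 0.35 }], 0.4, 0.05)

console.log(clearWinner?.id, tooClose, tooWeak) // shoreditch null null
```

Rejecting the "too close" case is deliberate: skipping a property is cheaper to recover from than linking it to the wrong location.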
When working with semantic matching, you'll find that the definition of a good match isn't static. As your vector store grows in density, the margin between a correct match and a close-but-wrong alternative often shrinks. Hence, I highly recommend using config variables rather than hard-coded values. Making the minimum score and next-match margin configurable lets you easily adjust your minimum similarity scores and safety margins as your real-world data evolves.
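One thing to watch when thresholds come from configuration: values often arrive as strings, and Number('abc') is NaN, which makes every >= comparison false, so no match would ever be accepted. A minimal sketch of defensive parsing follows; readThreshold is a hypothetical helper, not part of the demo application.

```typescript
// Parses a threshold from config. Falls back to the default when the
// value is missing, empty, or does not parse to a finite number.
function readThreshold (raw: unknown, fallback: number): number {
  if (raw === undefined || raw === null || raw === '') return fallback
  const value = Number(raw)
  return Number.isFinite(value) ? value : fallback
}

console.log(readThreshold('0.55', 0.4))    // 0.55
console.log(readThreshold('abc', 0.4))     // 0.4 (invalid value ignored)
console.log(readThreshold(undefined, 0.05)) // 0.05
```

Whether to fall back silently or fail loudly on a bad value is a design choice; logging a warning on fallback is a reasonable middle ground.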
Semantic matching process with Nuxt
With the repository implemented and the location matcher helper in place, we can finally build our service class to glue the per-agent property sync process together.
First, we need to mimic the agents’ feeds. To do this, we’ll create a dummy API endpoint and a few dummy JSON files representing agents’ listings. Go ahead and copy the three JSON files to /server/assets/agents/.
You'll notice that the marketer.json file contains some intentionally messy location data to simulate real-world scenarios. Our semantic matching logic will help us correctly associate these properties with our internal locations. Eventually, the service will successfully match all three agents' property locations and add the properties to our database.
Now create /server/api/agents/[agent].get.ts endpoint with:
import formalist from '~~/server/assets/agents/formalist.json'
import traditionalist from '~~/server/assets/agents/traditionalist.json'
import marketer from '~~/server/assets/agents/marketer.json'
export default defineEventHandler(async (event) => {
const agentSlug = getRouterParam(event, 'agent')
if (!agentSlug) {
throw createError({ statusCode: 400, message: 'Agent slug required' })
}
// Map slugs to data sources
const feeds: Record<string, any[]> = {
'formalist': formalist,
'traditionalist': traditionalist,
'marketer': marketer
}
const data = feeds[agentSlug]
// Handle 404s (mimic a real API)
if (!data) {
throw createError({ statusCode: 404, message: `Agent feed '${agentSlug}' not found` })
}
return data
})
Now we create: /server/utils/services/propertySync/SyncAgentPropertiesService.ts with:
import {
AILocationMatcherRepository,
} from '~~/server/utils/services/propertySync/repositories/AILocationMatcherRepository'
import {
AgentWithProperties,
Location, Property,
} from '~~/server/database/types'
import { AgentProperty } from './AgentProperty'
import {
LocationMatcher,
} from '~~/server/utils/services/propertySync/helpers/LocationMatcher'
import { drizzle } from 'drizzle-orm/d1'
import * as schema from '~~/server/database/schema'
import type { D1Database } from '@cloudflare/workers-types'
export class SyncAgentPropertiesService {
private readonly db: ReturnType<typeof drizzle<typeof schema>>
public constructor (
private readonly aiLocationMatcherRepository: AILocationMatcherRepository,
private readonly locations: Location[],
database: D1Database,
) {
this.db = drizzle(database, { schema })
}
/**
* Executes the property synchronization for the given agent.
*
* @param agent
*/
public async execute (
agent: AgentWithProperties,
): Promise<void> {
if (!agent.apiRoute) {
throw new Error(
`[Sync Service] Agent "${agent.name}" - "${agent.id}" has no API route configured.`,
)
}
const agentFeedProperties = await $fetch<AgentProperty[]>(
agent.apiRoute,
)
const locationMatcher = new LocationMatcher(
this.locations,
this.aiLocationMatcherRepository,
)
for (const agentProperty of agentFeedProperties) {
const existingProperty = agent.properties?.find(
(prop: Property) => prop.externalRef === agentProperty.propertyId,
)
// if property already added, skip
if (existingProperty) {
continue
}
// try to find matching location
const matchedLocation = await locationMatcher.match(
agentProperty,
)
if (matchedLocation) {
await this.db.insert(schema.properties).values({
externalRef: agentProperty.propertyId,
title: agentProperty.title,
locationId: matchedLocation.id,
originalLocation: agentProperty.location || '',
agentId: agent.id,
})
} else {
console.log(`[Sync Service] Skipped (No Match): "${agentProperty.title}"`)
}
}
}
}
The service will retrieve the dummy listings using the API endpoint we just created, and check whether the given agent's property is already in our internal property database. If it is, it ignores it and proceeds to add new ones. Since we need to match each agent property with our locations before we can save the properties, the service will use the LocationMatcher helper class. If a location is not matched, the service won't add the property.
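The skip-if-already-added step can also be expressed as a pre-filter. The sketch below builds a Set of known external references once, so each feed item is checked with a constant-time lookup rather than a scan of the agent's properties. FeedItem, StoredProperty, and selectNewItems are simplified, hypothetical stand-ins for the types used in the service.

```typescript
// Simplified stand-ins for AgentProperty and Property (hypothetical).
interface FeedItem {
  propertyId: string
  title: string
}

interface StoredProperty {
  externalRef: string
}

// Returns only the feed items whose propertyId is not stored yet.
function selectNewItems (
  feed: FeedItem[],
  stored: StoredProperty[],
): FeedItem[] {
  const knownRefs = new Set(stored.map(property => property.externalRef))
  return feed.filter(item => !knownRefs.has(item.propertyId))
}

// Made-up data: A-1 was synced on a previous run.
const demoFeed: FeedItem[] = [
  { propertyId: 'A-1', title: 'Garden flat' },
  { propertyId: 'A-2', title: 'Canal-side studio' },
]
const demoStored: StoredProperty[] = [{ externalRef: 'A-1' }]

console.log(selectNewItems(demoFeed, demoStored).map(item => item.propertyId)) // ["A-2"]
```

For small feeds the difference is negligible; it starts to matter when an agent has thousands of stored properties.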
Now we need to create a Nitro Task and a queued process so we can schedule the sync, and, for immediate feedback, we’ll add an endpoint to test it without waiting for the Cron trigger. I’ve covered Nuxt queues in detail in the previous article, so we can move through this part quickly.
Create a queue for syncing agent properties on Cloudflare:
npx wrangler queues create agents-properties-sync-queue
Add the bindings to /wrangler.toml
[[queues.producers]]
queue = "agents-properties-sync-queue"
binding = "AGENTS_PROPERTIES_SYNC_QUEUE"
[[queues.consumers]]
queue = "agents-properties-sync-queue"
max_batch_size = 5 # Process 5 messages at a time
max_batch_timeout = 10 # Wait up to 10s to fill a batch
max_retries = 0 # Intentionally set to 0 since the scheduler will repeatedly run the task
Add the queue message body type to: /server/types/queues.ts
export interface AgentsPropertiesSyncQueueMessageBody {
agentId: string;
}
Create the /server/tasks/agents-properties-sync.ts Nitro task with:
import { CloudflareTaskContext } from '~~/server/types/queues'
import { drizzle } from 'drizzle-orm/d1'
import * as schema from '~~/server/database/schema'
import type { D1Database } from '@cloudflare/workers-types'
import { isNotNull } from 'drizzle-orm'
import { agents } from '~~/server/database/schema'
export default defineTask({
meta: {
name: 'agents-properties-sync',
description: 'Queues all agents to sync their properties from external systems',
},
async run (event) {
const context = event.context as CloudflareTaskContext
const env = context.cloudflare?.env
if (!env?.DB) {
return { error: 'DB binding not found.' }
}
// TypeScript knows this is a Queue because of your CloudflareEnv interface
const queue = env.AGENTS_PROPERTIES_SYNC_QUEUE
if (!queue) {
return { error: 'Queue binding (AGENTS_PROPERTIES_SYNC_QUEUE) not found.' }
}
const db = drizzle(env.DB as D1Database, { schema })
try {
const allAgents = await db.query.agents.findMany({
where: isNotNull(agents.apiRoute),
columns: { id: true },
})
if (allAgents.length === 0) {
return { result: 'No agents with apiRoute found in DB.' }
}
const total = allAgents.length
console.log(`[Task] Found ${total} agents. Dispatching to queue...`)
const messages = allAgents.map((agent) => ({
body: { agentId: agent.id },
}))
const CHUNK_SIZE = 10
for (let i = 0; i < messages.length; i += CHUNK_SIZE) {
const batch = messages.slice(i, i + CHUNK_SIZE)
await queue.sendBatch(batch)
}
return {
result: `Dispatched ${total} agents to AGENTS_PROPERTIES_SYNC_QUEUE.`,
}
} catch (error: any) {
console.error('[Task] Error during queue dispatch:', error)
return { error: error.message || 'Unknown error occurred.' }
}
},
})
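The chunked dispatch in the task can be factored into a small generic helper, which is easier to test in isolation. This is a minimal sketch; the chunk function and the demo messages are illustrative and not part of the demo application.

```typescript
// Splits a list into consecutive chunks of at most `size` items.
function chunk<T> (items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer')
  }
  const chunks: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size))
  }
  return chunks
}

// 23 made-up queue messages, split the same way the task does it.
const demoMessages = Array.from({ length: 23 }, (_, i) => ({
  body: { agentId: `agent-${i + 1}` },
}))
const batches = chunk(demoMessages, 10)

console.log(batches.map(batch => batch.length)) // [10, 10, 3]
```

In the task, each chunk would then be passed to queue.sendBatch() in turn.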
Create the server/utils/queueHandlers/AgentsPropertiesSyncQueueHandler.ts handler class with:
import {
SyncAgentPropertiesService,
} from '~~/server/utils/services/propertySync/SyncAgentPropertiesService'
import { MessageBatch, D1Database } from '@cloudflare/workers-types'
import {
AgentsPropertiesSyncQueueMessageBody,
CloudflareEnv,
} from '~~/server/types/queues'
import * as schema from '~~/server/database/schema'
import { drizzle } from 'drizzle-orm/d1'
import { inArray } from 'drizzle-orm'
import { AgentWithProperties } from '~~/server/database/types'
export class AgentsPropertiesSyncQueueHandler {
public constructor (
private readonly syncAgentPropertiesService: SyncAgentPropertiesService,
) {}
public async handle (
batch: MessageBatch<AgentsPropertiesSyncQueueMessageBody>,
env: CloudflareEnv,
): Promise<void> {
const messages = batch.messages
const db = drizzle(env.DB as D1Database, { schema })
const agentIds = new Set<string>()
for (const message of messages) {
if (message.body?.agentId) {
agentIds.add(message.body.agentId)
}
}
if (agentIds.size === 0) {
console.log('[Agents Properties Sync Handler] No valid agentIds found in batch.')
return
}
try {
const ids = Array.from(agentIds)
const agentsToSync: AgentWithProperties[] = await db.query.agents.findMany({
where: inArray(schema.agents.id, ids),
with: {
properties: true,
}
})
for (const agent of agentsToSync) {
await this.syncAgentPropertiesService.execute(agent)
console.log(
`[Agents Properties Sync Handler] Successfully synced properties for agent "${agent.id}".`,
)
}
} catch (error: any) {
console.error('[Agents Properties Sync Handler] Error:', error.message);
throw error;
}
}
}
And trigger it from the Nitro Plugin: /server/plugins/queue-handler.ts that listens to Cloudflare Queues:
if (batch.queue === 'agents-properties-sync-queue') {
try {
const aiLocationMatcherRepository = new CloudflareVectorAIRepository(
env.VECTORIZE,
env.AI,
)
const database = drizzle(env.DB, { schema })
const allLocations = await database.query.locations.findMany()
const syncService = new SyncAgentPropertiesService(
aiLocationMatcherRepository,
allLocations,
env.DB,
)
const handler = new AgentsPropertiesSyncQueueHandler(
syncService,
)
await handler.handle(
batch as MessageBatch<AgentsPropertiesSyncQueueMessageBody>,
env,
)
} catch (error: any) {
console.error('[Queue] Error processing batch:', error)
// Rethrow so the batch is marked as failed. With max_retries = 0 it is not
// retried; the next scheduled run dispatches the agents again
throw error
}
}
Before we hand over control to a scheduler, we need a way to manually trigger the synchronisation process and ensure our pipeline works end-to-end. Let's create a temporary debug endpoint to immediately invoke the Nitro Task.
Create /server/api/internals/agents-properties-sync.get.ts with:
export default defineEventHandler(async (event) => {
const config = useRuntimeConfig();
if (getHeader(event, 'x-secret') !== config.internalApiSecret) {
throw createError({ statusCode: 401, statusMessage: 'Unauthorized' });
}
const result = await runTask(
'agents-properties-sync',
{
payload: {},
context: {
cloudflare: event.context.cloudflare
}
}
);
return {
status: 'Agents Properties Sync Task Triggered',
result,
}
});
Now we can deploy and test the property sync task with semantic matching:
pnpm run build
npx wrangler deploy
curl -H "x-secret: [YOUR_SECRET]" "https://[YOUR_WORKER_URL]/api/internals/agents-properties-sync"
Once the queue task is completed, you should see eight properties added to the database with the correct location_id. 🎉
💡 Although end-to-end testing using an endpoint is fine for this example, in real-world applications I encourage you to use Vitest and mock the repositories for automated testing. You can also use a dedicated wrangler config for tests 😉
Automating Nitro Tasks with Cloudflare Cron Triggers
Now that we know our property sync service and semantic matching work, we can schedule the Nitro task to run periodically.
We first need to add a Cron trigger to Cloudflare via /wrangler.toml:
[triggers]
crons = ["*/5 * * * *"]
And then configure Nuxt to run the sync Nitro Task in /nuxt.config.ts:
nitro: {
// ...
scheduledTasks: {
// Runs the 'agents-properties-sync' task every five minutes
'*/5 * * * *': ['agents-properties-sync']
}
}
Once you build and deploy the app, you can see your task running every 5 minutes in Cloudflare’s dashboard, from the worker observability tab.
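If the */5 in the minute field is unfamiliar, it means "every fifth minute, starting at minute 0". The sketch below expands that step expression into the minutes it covers. It only understands the "*/n" form, and expandStep is a hypothetical helper for illustration rather than something Nitro or Cloudflare exposes.

```typescript
// Expands a "*/n" cron step over an inclusive range of values.
// Only the "*/n" form is supported in this sketch.
function expandStep (field: string, min: number, max: number): number[] {
  const match = /^\*\/(\d+)$/.exec(field)
  if (!match) {
    throw new Error(`Unsupported cron field: ${field}`)
  }
  const step = Number(match[1])
  if (!(step >= 1)) {
    throw new RangeError('step must be at least 1')
  }
  const values: number[] = []
  for (let value = min; value <= max; value += step) {
    values.push(value)
  }
  return values
}

// Minutes of each hour at which "*/5 * * * *" fires.
const minutes = expandStep('*/5', 0, 59)
console.log(minutes.join(',')) // 0,5,10,15,20,25,30,35,40,45,50,55
```

That is twelve runs per hour, which is worth keeping in mind when estimating queue and Workers AI usage.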
We built an AI-powered data pipeline
Over this three-part series, we built the backend of our property listing aggregator in Nuxt and covered a lot of Cloudflare’s offerings.
We successfully:
- Structured our Data using D1 and Drizzle ORM.
- Decoupled our Logic using Cloudflare Queues for background processing.
- Implemented semantic matching using Cloudflare Vectorize and Workers AI.
With this example application, you've gained the foundation to build scalable and resilient Nuxt applications running on Cloudflare infrastructure. Most importantly, you learnt how semantic matching works alongside deterministic string matching and should be able to integrate AI into your future Nuxt projects.
The code for the entire demo application is publicly available on GitHub: Nuxt & Cloudflare AI Vector Pipeline Series. Let me know in the comments if you are building something similar or if you’d like me to write more about Nuxt, Cloudflare or AI integrations in general.
Until next time 💙.



