Guy Royse

A First Look at Vector Sets

Last week, Redis announced vector sets—a new data structure coming in Redis 8. They also released Redis 8 RC1, which means we get to play with this new feature right now. And that’s exactly what we’re going to do!

We’ll explore vector sets through a real-world example: a user’s photo album. I picked this example because it shows off where vector sets really shine. Why does it work so well? Stick around to the end to find out.

Before We Get Started

Two big caveats before we dive into the details:

  1. Even though they are part of Redis 8, vector sets are in beta. They could change or even be removed from future versions of Redis. So, think about that before you ship code to production.
  2. This post isn’t intended to be an introduction to vectors, vector search, and embeddings. I’m going to assume that you have at least an inkling of what these things are. If you don’t, watch this talk that I gave at Jfokus where I explain it in detail.

Disclaimer delivered. Now, on with the fun!

What’s a Vector Set?

A vector set is a lot like other sets in Redis, such as the sorted set or the plain old set. However, a vector set contains labeled points in a multi-dimensional space—think X-Y coordinates but with more axes. The labels themselves are just simple strings. The points are a series of numbers—the coordinates in that multi-dimensional space. These numbers are the vector.

I can add vectors to a vector set using the VADD command. In this example, I’m adding 512-dimensional embeddings of photos for a user named Alice. The photoId is used as the label and the embedding returned from the call to embedPhoto is the vector:

async function addPhotoToVectorSet(userId: string, photoId: string, pathToPhoto: string) {
  const embedding: number[] = await embedPhoto(pathToPhoto)
  const key = `user:${userId}:photos`
  const values: string[] = embedding.map(v => v.toString())
  const dims = embedding.length.toString()

  await redis.sendCommand(['VADD', key, 'VALUES', dims, ...values, photoId])
}

await addPhotoToVectorSet('alice', 'photo:42', '/photos/IMG_0042.png')
await addPhotoToVectorSet('alice', 'photo:23', '/photos/IMG_0023.png')
await addPhotoToVectorSet('alice', 'photo:13', '/photos/IMG_0013.png')

Note that I am using TypeScript here and Node Redis, but this should translate nicely to whatever tool you’re using. Vector sets are so bleeding edge that the client libraries don’t even support them yet. So, we have to use the lower-level .sendCommand() function which accepts a string[]. Your client library of choice will have some variation of this.

Also, the .embedPhoto() function is, effectively, pseudocode. Your magic embedding code goes in there. If you want to take a peek at that magic, check out my meme twin finding example.
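One small thing worth factoring out: .sendCommand() only accepts strings, so every number in the embedding has to be stringified before it goes out. It can be handy to pull that marshalling into a tiny helper. Here's a sketch (toVaddArgs is my own name for it, not part of Node Redis or Redis):

```typescript
// Builds the argument array for VADD from a key, an embedding, and a label.
// Illustrative only; this helper isn't part of Node Redis or Redis itself.
function toVaddArgs(key: string, embedding: number[], label: string): string[] {
  const dims = embedding.length.toString()
  const values = embedding.map(v => v.toString())
  return ['VADD', key, 'VALUES', dims, ...values, label]
}

// Usage: await redis.sendCommand(toVaddArgs('user:alice:photos', embedding, 'photo:42'))
```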

So, a vector set is—surprise—a set of vectors. In most real-world use cases, those vectors will be embeddings, like the ones you’d generate from an image, a chunk of text, or even audio.

Using a Vector Set

Once you have vectors in a vector set, you can manipulate them:

const key = `user:${userId}:photos`

// removes a photo from the set
await redis.sendCommand(['VREM', key, 'photo:42'])

// the cardinality of the set
await redis.sendCommand(['VCARD', key])

// the number of dimensions the vectors in the set have, 512 in our case
await redis.sendCommand(['VDIM', key])

This isn’t exactly complex code—it just invokes the commands, and the comments tell you what they do. There are lots of other commands as well. The full details for all of them are in the docs, which I encourage you to peruse.

The command at the heart of vector sets is VSIM—it lets you search for vectors that are the most similar to a provided one. There are two main ways that you provide one—by value and by element.

Providing a vector by value is simply providing the vector as part of the command to Redis. This is the classic semantic search use case:

async function searchPhotos(userId: string, query: string): Promise<string[]> {
  const embedding: number[] = await embedText(query)
  const key = `user:${userId}:photos`
  const dims = embedding.length.toString()
  const values: string[] = embedding.map(v => v.toString())

  return await redis.sendCommand(['VSIM', key, 'VALUES', dims, ...values, 'COUNT', '3'])
}

Providing a vector by element is just using an existing vector in the vector set, referenced by its label. This is more of a recommendation pattern:

async function similarPhotos(userId: string, photoId: string) {
  const key = `user:${userId}:photos`
  return await redis.sendCommand(['VSIM', key, 'ELE', photoId, 'COUNT', '3'])
}

And, you can see how these might work nicely together:

// returns photos matching the query
const foundPhotoIds: string[] = await searchPhotos('alice', 'Show me pictures of animals')

// returns photos similar to the first found photo
const firstFoundPhotoId = foundPhotoIds[0]
const similarPhotoIds: string[] = await similarPhotos('alice', firstFoundPhotoId)

And that’s pretty much it for the code, at least enough for you to get started.

So What?

So, what’s the big deal? I could do all of this with the Redis Query Engine, right?

Absolutely—and in many cases, Redis Query Engine is still the right choice. It supports hybrid search, rich document indexing, and can scale across shards in a Redis cluster. If you’re doing full-text search, filtering on structured data, and searching across millions of vectors, Redis Query Engine is the way to go.

Vector sets offer something different. They’re simpler, easier to work with, and perfect for use cases where data is naturally scoped—to a user, a device, a region, or a moment in time. You don’t need to define a schema. You don’t need to manage an index. You just add vectors and search them.

Of course, this simplicity comes with trade-offs. Vector sets are all about the vector—that’s their focus. It’s in the name! You won’t get rich document indexing or hybrid queries. JSON-based filtering is supported, but it’s not as flexible as what Redis Query Engine provides. And scaling across a cluster? Absolutely possible, but you’ll have to manage that in code yourself.
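For the curious, that JSON-based filtering works by attaching JSON attributes to elements (with SETATTR on VADD, or later with the VSETATTR command) and then passing a FILTER expression to VSIM. Here's a sketch of how such a call might be assembled. The helper name is mine and the expression syntax is from the Redis 8 vector set docs, so double-check it there before relying on it:

```typescript
// Builds a VSIM call that only considers photos whose attributes match the
// FILTER expression. Hypothetical helper; verify the FILTER syntax against
// the Redis 8 vector set documentation.
function similarPhotosFromYearArgs(userId: string, photoId: string, year: number): string[] {
  const key = `user:${userId}:photos`
  return ['VSIM', key, 'ELE', photoId, 'FILTER', `.year == ${year}`, 'COUNT', '3']
}
```

Pass the result straight to .sendCommand(), just like the earlier examples.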

So while there’s overlap in what you can build, the way you build it—and the trade-offs you make—are different. It’s not about better or worse. It’s about using the right tool for the job.

Conclusion

If your application needs rich filtering, hybrid search, and global indexing, Redis Query Engine is the clear choice. But if you’re building lightweight, fast, and highly scoped similarity search—especially for things like user-specific data or ephemeral sessions—vector sets can be a great fit.

That’s why the photo album example works so well. Each user has their own vector set, which naturally avoids clustering issues and performance bottlenecks. You can quickly find similar photos, search by description, and build real-time recommendations—all without the overhead of Redis Query Engine.

Vector sets are still in beta, but they’re already looking like a powerful addition to the Redis toolbox. I’m definitely going to keep tinkering—seeing where they work and where they don’t.


Got ideas? Weird use cases? Built something cool? If you have, tag me on Bluesky, LinkedIn, or X. I’d love to see what you’re doing with vector sets.

Finding Bigfoot with Async Generators + TypeScript

Lately, I’ve been messing about with generators—of the synchronous and asynchronous varieties—using TypeScript. They’re not something I’ve used much and I thought it’d be a good idea to get a little better acquainted with them. And, of course, I like to share what I learn. So, let’s commence with the sharing.

Generators in TypeScript

Generators are special functions that generate a sequence of values and return iterators. For stand-alone functions, you define them by putting a * immediately after the function keyword. For functions in a class, including static ones, you put it right before the function name itself.

function* someNumbers(): Generator<number> {...}

class NumberGenerators {
  static *someNumbers(): Generator<number> {...}
}

Generators then return data using the yield keyword.

function* someNumbers(): Generator<number> {
  yield 1
  yield 2
  yield 3
}

You can then access these values just like any iterator by looping calls to .next() or by using a for...of loop.

const generator = someNumbers()

while (true) {
  const { value, done } = generator.next()
  if (done) break
  console.log(value)
}

for (const value of someNumbers()) {
  console.log(value)
}

Now, this might not sound all that interesting as, after all, you could do this by simply returning an array. However, the magic is in that yield keyword. A generator isn’t actually executed until you—or your for...of loop—calls .next(). Once you—or it—does, the code runs right up to the yield statement, returns the value, and then pauses the execution until the next call to .next().
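You can watch this pausing happen by adding a side effect before each yield. Nothing runs until the first call to .next(), and each call runs just far enough to produce the next value:

```typescript
const log: string[] = []

function* noisyNumbers(): Generator<number> {
  log.push('computing 1')
  yield 1
  log.push('computing 2')
  yield 2
}

const gen = noisyNumbers()
// log is still empty here; the function body hasn't started

gen.next()
// log is now ['computing 1']

gen.next()
// log is now ['computing 1', 'computing 2']
```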

Generators don’t have to end. They can just keep on going forever. For example, you could create a generator that returns numbers from 1 to infinity and just call .next() until you’re sick of it. Or, you can use a for...of to create an infinite loop.

function* allNumbers(): Generator<number> {
  let i = 0
  while (true) yield i++
}

for (const value of allNumbers()) {
  console.log(value)
}
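If you'd rather not write the break yourself, a small take helper turns any infinite generator into a finite one. This is my own helper, not a built-in (allNumbers is repeated here so the snippet stands alone):

```typescript
function* allNumbers(): Generator<number> {
  let i = 0
  while (true) yield i++
}

// Yields at most `count` values from the given generator, then stops.
function* take<T>(count: number, generator: Generator<T>): Generator<T> {
  if (count <= 0) return
  let taken = 0
  for (const value of generator) {
    yield value
    if (++taken >= count) return
  }
}

const firstFive = [...take(5, allNumbers())] // [0, 1, 2, 3, 4]
```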

Asynchronous Generators

Generators can also be asynchronous. This means that each call to .next() returns a Promise. To make an asynchronous generator, just mark your generator function as async. You can then yield plain values or Promises; yielded Promises are awaited for you.

async function* allAsyncNumbers(): AsyncGenerator<number> {
  let i = 0
  while (true) yield Promise.resolve(i++)
}

To consume them, you can either call .next() and await the Promise or use a for await...of loop and not think about promises. Personally, I’m a fan of the latter.

const generator = allAsyncNumbers()
while (true) {
  const { value, done } = await generator.next()
  if (done) break
  console.log(value)
}

for await (const value of allAsyncNumbers()) {
  console.log(value)
}

Doing Something Allegedly Useful

Of course, these examples are just toys. A more practical use for asynchronous generators is handling things like reading files, accessing network services, and calling slow-running things like AI models. So, I’m going to use an asynchronous generator to access a networked service. That service is Redis, and we’ll be using Node Redis and Redis Query Engine to find Bigfoot.

I’m not gonna get into the details on connecting to Redis or on how to create a schema for Redis Query Engine. There’s plenty out there about that already, some of it created by me. And, I have a repo with all the details anyhow.

However, this is TypeScript so we are gonna start out by defining some types. First, the BigfootSighting type. This type matches the JSON that we are getting out of Redis. It’s just a bunch of carefully arranged strings.

type BigfootSighting = {
  id: string
  title: string
  account: string
  classification: string
  location: {
    county: string
    state: string
    lnglat: string
  }
}

The generator itself takes a Redis query, which is just a string, and, of course, returns a generator.

Inside the generator, we start a loop that calls .ft.search() until there are no more results. Each result contains multiple JSON documents—ahem—I mean Bigfoot sightings. Totally Bigfoot sightings. I cast it and everything.

So, we need to loop over those sightings too, yielding them as we go.

async function* fetchBigfootSightings(query: string): AsyncGenerator<BigfootSighting> {
  let offset = 0
  let hasMore = true

  while (hasMore) {
    /* Get a page of data. */
    const options: SearchOptions = {
      LIMIT: { from: offset, size: PAGE_SIZE },
      DIALECT: 4 // The latest dialect. Supports cool stuff like vector search.
    }

    const result = await redis.ft.search(INDEX_NAME, query, options)

    /* Loop over the resulting documents and yield them. */
    for (const document of result.documents) {
      /*
        There's only one value in the document and technically it's in a
        property named '0' but this looks better.
      */
      yield document.value[0] as BigfootSighting
    }

    /* Prepare for the next page. */
    offset += PAGE_SIZE
    hasMore = result.total > offset
  }
}

Remember, the code pauses execution after every yield. So, we won’t make another network call until after we’ve consumed the first page of sightings. This is great because if our code decides not to consume all the results, say by calling break in our for await...of loop or just not calling .next() again, we don’t have to make another network call. Less is more.

Another nice perk here is memory efficiency. Since we’re yielding one sighting at a time and waiting between calls, we’re not slurping the entire dataset into memory all at once. That means if there are thousands of Bigfoot sightings—and you know there are—we’re only dealing with them as needed. It’s lazy in the best possible way.
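Here's a self-contained sketch of both of those perks with the network mocked out. The fakeSearch function stands in for .ft.search() and counts how many times it gets called. Break after the first page and the second call never happens:

```typescript
type Sighting = { id: string }

const PAGE_SIZE = 3
const ALL_SIGHTINGS: Sighting[] = [1, 2, 3, 4, 5, 6].map(n => ({ id: `sighting:${n}` }))

// Stands in for redis.ft.search(); returns one page of results per call.
let searchCalls = 0
async function fakeSearch(offset: number): Promise<{ total: number; documents: Sighting[] }> {
  searchCalls++
  return {
    total: ALL_SIGHTINGS.length,
    documents: ALL_SIGHTINGS.slice(offset, offset + PAGE_SIZE)
  }
}

// Same shape as the generator above, but fetching from the mock.
async function* fetchSightings(): AsyncGenerator<Sighting> {
  let offset = 0
  let hasMore = true
  while (hasMore) {
    const result = await fakeSearch(offset)
    for (const sighting of result.documents) yield sighting
    offset += PAGE_SIZE
    hasMore = result.total > offset
  }
}

// Consume only the first page's worth of sightings, then stop.
async function firstPageOnly(): Promise<string[]> {
  const seen: string[] = []
  for await (const sighting of fetchSightings()) {
    seen.push(sighting.id)
    if (seen.length === PAGE_SIZE) break // stop early; no second network call
  }
  return seen
}
```

Run firstPageOnly() and searchCalls ends up at 1: the break tears down the generator before it ever asks for page two.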

Wrapping Up

So, that’s generators. Let’s wrap up by wrapping up some calls to this generator to execute “meaningful” queries for your application. Here are a few that will help you find Bigfoot.

function fetchAll(): AsyncGenerator<BigfootSighting> {
  return fetchBigfootSightings('*')
}

function fetchByKeywords(keywords: string): AsyncGenerator<BigfootSighting> {
  return fetchBigfootSightings(keywords)
}

function fetchByClassification(classification: string): AsyncGenerator<BigfootSighting> {
  return fetchBigfootSightings(`@classification:{${classification}}`)
}

function fetchByState(state: string): AsyncGenerator<BigfootSighting> {
  return fetchBigfootSightings(`@state:${state}`)
}

function fetchByCountyAndState(county: string, state: string): AsyncGenerator<BigfootSighting> {
  return fetchBigfootSightings(`@county:${county} @state:${state}`)
}

function fetchByLocation(longitude: number, latitude: number, radiusInMiles: number): AsyncGenerator<BigfootSighting> {
  return fetchBigfootSightings(`@lnglat:[${longitude} ${latitude} ${radiusInMiles} mi]`)
}

Happy hunting!

Renaissance Guy's House Rules

I created a tidy little one-pager of the house rules I use when I run 5th Edition Dungeons & Dragons. They’re mostly cribbed from Index Card RPG by Runehammer Games—which you should totally go out and buy—but I included one from Sly Flourish as well.

Download it here.

Namelings, Namespaces, Nicknames, and Aliases

I like words. Old words. New words. Obscure words. And, most interestingly, forgotten words. I’m not the only one who likes this sort of stuff as several years ago I found this site called the Compendium of Lost Words.

One word I learned from it struck me as useful. That word: nameling. A nameling is someone with whom you share a name. My name is Guy and I don’t encounter namelings very often as my name is, shall we say, uncommon. However, you might have a more common name like Bill or George (anything but Sue) and know several namelings. Regardless, now that we have a word for it, we can talk easily about the idea.

And, we can extend the idea of a nameling to other things. Things like software. Namelings occur all the time in software. We’ll be creating some module for our project and realize that its name conflicts with an existing module from another library. Or maybe we’ll have two libraries that have conflicting module names. Those modules are namelings.

We typically solve this problem using namespaces and aliases.

Namespaces provide a container, or space, for names to exist inside of to alleviate the conflict caused by the namelings. Among humans, this is the purpose served by surnames.

Aliases allow us to give namelings another name in a particular context. They are nicknames for the namelings that we use when we need to work with namelings at the same time and find using the namespace burdensome. Among humans, we use nicknames to clearly talk to, with, and about namelings.

I propose we change how we talk about our code in this regard. We should use these older words instead of inventing new ones. So, we can talk about it like this:

Namespaces are used to resolve namelings. Nicknames are given to namelings when both are in the code together and we don’t want to use namespaces.

This is way more fun than naming conflicts and aliases!

Northern Meetups

I’ve got a couple of upcoming meetups in some of the more northerly states in the next week or so. Specifically, Michigan and Minnesota.

I’ll be giving An Introduction to WebAssembly in Ann Arbor, Michigan on Monday, December 10th at Southeast Michigan JavaScript. WebAssembly allows you to write front-end web code in languages other than JavaScript by creating a virtual machine that runs in the browser. It’s really neat stuff and I’ll be diving into the low-level details. And, I’ll be live-coding in WebAssembly Text Format so be prepared for epic failure!

On Wednesday, December 19th I’ll be presenting what is one of my favorite talks: Deep Learning like a Viking. It’s a talk about Vikings, Keras, and Convolutional Neural Networks. And how to combine these three amazing things into an application that recognizes hand-written runes from the Younger Futhark! The talk will be hosted by JavaScript MN in Minneapolis, Minnesota.

So, if you come from the land of the ice and snow, drop by and say hi!