Recipe Recommender System Using Vector Database - Weaviate

What is a Vector Database?

Unless you have been living under a rock, you have probably heard of vector databases, and probably Weaviate too. If you haven't, you should check out a dedicated article on the topic, like the one here. In short, a vector database is a scalable way to store, index, and retrieve numerical representations (vectors) of your entities. This enables efficient similarity search, retrieval, and filtering.
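
To make the idea concrete, here is a minimal sketch of what "retrieving by vector" means: a brute-force nearest-neighbour lookup using cosine similarity. The three-dimensional vectors and recipe names are made up for illustration. A real vector database works with learned embeddings of hundreds of dimensions and typically uses an approximate index instead of a linear scan like this one.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, store, k=2):
    """Return the k names whose vectors are most similar to the query."""
    ranked = sorted(
        store,
        key=lambda name: cosine_similarity(query, store[name]),
        reverse=True,
    )
    return ranked[:k]


# Toy "embeddings" (hypothetical values, not produced by any real model).
store = {
    "tomato soup": [0.9, 0.1, 0.0],
    "minestrone": [0.8, 0.3, 0.1],
    "chocolate cake": [0.0, 0.2, 0.9],
}

# The query vector points in roughly the same direction as the two soups.
print(nearest([1.0, 0.2, 0.0], store))
```

The two soups come back first because their vectors point in nearly the same direction as the query, while the cake's vector is almost orthogonal to it.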

Introduction to Weaviate

Anyway, with that out of the way, let's take a look at Weaviate. I found out about it recently, and so far I am really liking it. First of all, it is open source, like Chroma. Immediate win for me. Second, they also provide their own hosted solution with a really nice free tier, which I'm using in my own demo project (more on this later).

Implementation

So, me being me, I wanted to try this out by making a project. I saw Weaviate's repository of demos and tutorials and decided to adapt one of them. Here's the demo for your reference. I adapted it to a different dataset from Kaggle: a collection of recipes including their titles, ingredients, and steps. You can find the link to it here. What I want to do is make a little recommendation system of sorts.

Creating a Weaviate Cloud Cluster

Before we begin our cool data science stuff, we need to create a Weaviate Cloud cluster. You can go to their portal, sign up, and create one. Pretty standard stuff. And yes, almost forgot: make sure to note down the HTTP endpoint and the API key. You will need these when accessing your cluster.

Creating a Collection

Now that we have our cluster, let's access it in Python and create a collection.

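import os

import weaviate
from weaviate.classes.config import Configure
from weaviate.classes.init import Auth

# Connect using the endpoint and API key noted earlier (read from env vars here).
client = weaviate.connect_to_weaviate_cloud(
    cluster_url=os.environ["WCS_URL"],
    auth_credentials=Auth.api_key(os.environ["WCS_API_KEY"]),
    headers={"X-Cohere-Api-Key": os.environ["COHERE_APIKEY"]},  # used by the Cohere vectorizer
)

# Create the collection with one named vector built from the chosen property.
recipes = client.collections.create(
    "<Collection Name>",
    vectorizer_config=[
        Configure.NamedVectors.text2vec_cohere(
            name="<Vector key>",
            source_properties=["<Source property key>"],
        )
    ],
)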

The placeholders above are:

  1. Collection Name: the name of your collection, which you use to access it for any operation such as insertion or deletion.
  2. Source property key: the property of each inserted object (a Python dictionary, for example) that Weaviate should use when vectorizing it.
  3. Vector key: simply the name of the vector produced from the source property.

Please note that I have used text2vec_cohere here. This is Weaviate's integration for Cohere's embedding service. Feel free to use some other service like OpenAI's. Here's a complete list.

Data Processing & Indexing

I made a Kaggle notebook to easily access the dataset. Now I need to build a workflow, or pipeline of sorts, to take the data from its raw format and upload it to Weaviate. The steps are pretty basic and are outlined here:

  1. Loading the data.
  2. Processing & cleaning the data.
  3. Generating vector embeddings using Cohere.
  4. Accessing the Weaviate Cloud Cluster.
  5. Creating a collection in the cluster (if not already created).
  6. Inserting the embeddings into the collection.

The code for all of this can be found in this notebook here.
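
As a rough idea of what the cleaning and insertion steps can look like, here is a minimal sketch in plain Python. It is not the notebook's code. The field names (title, ingredients, steps) follow the dataset description above, but their exact spelling and types (lists of strings) are assumptions, as are the helper names clean_recipe and batches and the batch size of 100.

```python
def clean_recipe(raw):
    """Normalize one raw record; return None if it is unusable."""
    title = (raw.get("title") or "").strip()
    ingredients = [i.strip() for i in raw.get("ingredients") or [] if i and i.strip()]
    steps = [s.strip() for s in raw.get("steps") or [] if s and s.strip()]
    # A recipe without a title or ingredients is not worth indexing.
    if not title or not ingredients:
        return None
    return {"title": title, "ingredients": ingredients, "steps": steps}


def batches(items, size):
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Made-up rows standing in for the raw dataset.
raw_rows = [
    {"title": "  Pancakes ", "ingredients": ["flour", " milk ", ""], "steps": ["Mix.", "Fry."]},
    {"title": "", "ingredients": ["salt"], "steps": []},          # dropped: no title
    {"title": "Toast", "ingredients": ["bread"], "steps": None},  # kept: steps become []
]

cleaned = [r for r in (clean_recipe(row) for row in raw_rows) if r is not None]

for batch in batches(cleaned, size=100):
    # In the real pipeline, this is where each batch would be sent to Weaviate,
    # for example with the Python client's collection.data.insert_many(batch).
    print(len(batch), [r["title"] for r in batch])
```

Sending the objects in batches rather than one at a time keeps the number of round trips to the cluster small, which matters once the dataset has more than a few hundred recipes.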

Client Usage

Now that our data is stored on our Weaviate instance, we can search, retrieve, or fetch it client side. Since my client is a Next.js app, I have used Weaviate's JavaScript client. Here's a simple function I made to search my recipes on the cloud instance.

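import weaviate from "weaviate-client";

// Runs a hybrid (keyword + vector) search over the "Recipe" collection.
// alpha = 0 is pure keyword search, alpha = 1 is pure vector search.
export async function findRecipesByArgument(searchTerm, alpha) {
  // Connect to the cloud cluster; the Cohere key lets Weaviate vectorize the query.
  const client = await weaviate.connectToWeaviateCloud(process.env.WCS_URL, {
    authCredentials: new weaviate.ApiKey(process.env.WCS_API_KEY),
    headers: {
      "X-Cohere-Api-Key": process.env.COHERE_APIKEY,
    },
  });

  const Recipe = await client.collections.get("Recipe");

  // Fetch the top 20 matches along with their hybrid relevance scores.
  const { objects: recipes } = await Recipe.query.hybrid(searchTerm, {
    limit: 20,
    alpha,
    returnMetadata: ["score"],
  });

  return recipes;
}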

In the demo, I have used Weaviate's hybrid search, which is pretty powerful. Basically, it combines keyword search and vector search. You can provide a weighting parameter, alpha, to control how much weight goes to keywords and how much to vectors. An alpha of 0 is pure keyword search, and an alpha of 1 is pure vector search. Any value in between is a hybrid of the two.
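
To show how alpha shifts the ranking, here is a simplified sketch of score blending in plain Python. It is an illustration of the idea only, not Weaviate's actual implementation (Weaviate has its own fusion algorithms, which differ in detail). The scores, document names, and helper functions are all made up for the example.

```python
def normalize(scores):
    """Min-max scale a {doc: score} mapping into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_scores(keyword_scores, vector_scores, alpha):
    """Blend the two score sets: alpha=0 is keyword only, alpha=1 is vector only."""
    kw = normalize(keyword_scores)
    vec = normalize(vector_scores)
    docs = set(kw) | set(vec)
    return {
        doc: alpha * vec.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
        for doc in docs
    }


def rank(scores):
    """Documents ordered from highest to lowest combined score."""
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


# Made-up scores for a query like "spicy chicken".
keyword = {"chicken curry": 6.0, "spicy chicken wings": 9.0, "hot pepper stew": 1.0}
vector = {"chicken curry": 0.80, "spicy chicken wings": 0.70, "hot pepper stew": 0.75}

for alpha in (0.0, 0.5, 1.0):
    print(alpha, rank(hybrid_scores(keyword, vector, alpha)))
```

With these numbers, the wings win at alpha = 0 because they match the keywords best, while the curry takes over as alpha grows and the vector similarity starts to dominate.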

Next.js Frontend

I'm not going to bore you with the frontend details. You can find the entire code at https://github.com/mahadhameed095/recipe-finder. The application is deployed on Vercel.

Conclusion

Weaviate has been in the vector space since before the generative AI and ChatGPT hype. Its open-source nature and generous Weaviate Cloud free tier are just chef's kiss. Setting up the Weaviate Cloud cluster and indexing the data was straightforward, thanks to the excellent documentation.

Combining Weaviate with a Next.js frontend was super easy too, since they provide so many demos and references. If you're curious about vector databases or want to create your own recommendation system, give Weaviate a shot. They have plenty of tutorials, so it's easy to get started. Happy experimenting!