Weaviate + Google Vertex AI for Multi-modal Embedding

Overview of Multimodal Embeddings#

Multimodal embeddings are vector representations of data from different modalities, such as text, images, audio, and video. With this technique, Weaviate can map inputs of different kinds, for example text and images, into a single shared vector space for efficient similarity search and other machine-learning tasks. Integrating Weaviate with the Google Vertex AI multimodal embedding model makes this capability easy to use: text, images, and video can be stored and searched together in one large-scale database, and the model supports rich semantic and cross-modal queries, making data management and querying considerably more intelligent.
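As a rough illustration of the idea (the vectors below are made-up placeholders, not real model output), a text embedding and an image embedding produced by the same multimodal model live in one shared vector space and can be compared directly:

import numpy as np

# Placeholder 4-dimensional vectors standing in for real 1408-dimensional embeddings
text_vector = np.array([0.12, 0.80, 0.05, 0.31])
image_vector = np.array([0.10, 0.75, 0.07, 0.35])

# Cosine similarity: values close to 1 mean the text and image are semantically related
similarity = np.dot(text_vector, image_vector) / (
    np.linalg.norm(text_vector) * np.linalg.norm(image_vector)
)
print(f"cosine similarity: {similarity:.3f}")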


Deploying Weaviate Instance Using Docker#
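The docker-compose.yml below starts a single-node Weaviate instance with the multi2vec-palm module enabled and mounts the local Google Cloud credentials into the container: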

services:
  weaviate:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: cr.weaviate.io/semitechnologies/weaviate:1.26.4
    ports:
    - 8080:8080
    - 50051:50051
    volumes:
    - ./weaviate_data:/var/lib/weaviate
    - /root/.config/gcloud/application_default_credentials.json:/etc/weaviate/gcp-credentials.json
    restart: on-failure:0
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'multi2vec-palm'
      ENABLE_MODULES: 'multi2vec-palm,ref2vec-centroid'
      CLUSTER_HOSTNAME: 'node1'
      GOOGLE_APPLICATION_CREDENTIALS: '/etc/weaviate/gcp-credentials.json'
      USE_GOOGLE_AUTH: 'true'
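With the file above saved as docker-compose.yml, the instance can be started by running docker compose up -d in the same directory; Weaviate then listens on port 8080 for HTTP and 50051 for gRPC.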

You must provide valid API credentials for the Vertex AI integration to work. The compose file above mounts the application default credentials created by gcloud auth application-default login into the container; see the API credentials section of the Weaviate documentation for the full configuration options.

Writing Code#

Connecting to Weaviate and Checking Connection#

import weaviate

client = weaviate.connect_to_local()

client.is_ready()
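connect_to_local assumes Weaviate's default local ports. If the instance runs on a different host or with different port mappings, they can be passed explicitly (a sketch; the values below simply mirror the docker-compose file above):

client = weaviate.connect_to_local(
    host="localhost",   # host running the Docker container
    port=8080,          # HTTP port mapped in docker-compose
    grpc_port=50051,    # gRPC port mapped in docker-compose
)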

Creating Collection#

from weaviate.classes.config import Configure

if client.collections.exists("AnimeGirls"):
    client.collections.delete("AnimeGirls")

client.collections.create(
    name="AnimeGirls",
    vectorizer_config=Configure.Vectorizer.multi2vec_palm(
        image_fields=["image"],
        text_fields=["text"],
        video_fields=["video"],
        project_id="neurosearch-436306",
        location="europe-west1",
        model_id="multimodalembedding@001",
        dimensions=1408,
    ),
)
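The module name multi2vec-palm is historical; with model_id set to multimodalembedding@001 it calls the Vertex AI multimodal embedding model, and 1408 is that model's full embedding dimensionality. To confirm the collection was created with the intended settings, the stored configuration can be read back (a small sketch using the v4 client's config API):

anime_girls = client.collections.get("AnimeGirls")
config = anime_girls.config.get()

# Should report the multi2vec-palm vectorizer configured above
print(config.vectorizer)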

Creating Utility Function#

import base64

def to_base64(file_path: str) -> str:
    # Read a local file and return its contents as a base64-encoded string
    with open(file_path, "rb") as file:
        return base64.b64encode(file.read()).decode("utf-8")
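Weaviate stores images and videos as blob properties, which must be supplied as base64-encoded strings, so this helper converts a local file into that form.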

Importing Data#

import os
from weaviate.util import generate_uuid5

anime_girls = client.collections.get("AnimeGirls")

# File names of all images to import
sources = os.listdir("./images/")

with anime_girls.batch.dynamic() as batch:
    for name in sources:
        print(f"Adding {name}")

        path = "./images/" + name

        batch.add_object(
            {
                "name": name,
                "image": to_base64(path),
                "path": path,
                "mediaType": "image",
            },
            uuid=generate_uuid5(name),
        )

Checking if All Data Imported Successfully#

if len(anime_girls.batch.failed_objects) > 0:
    print(f"Failed to import {len(anime_girls.batch.failed_objects)} objects")
    for failed_object in anime_girls.batch.failed_objects:
        print(f"e.g. Failed to import object with error: {failed_object.message}")
else:
    print("All objects imported successfully")

Retrieval by Text#

import json

response = anime_girls.query.near_text(
    query="Seeing a girl through glasses",
    return_properties=["name", "path", "mediaType"],
    limit=2,
)

for obj in response.objects:
    print(json.dumps(obj.properties, indent=2))

from IPython.display import Image, display

def display_image(item: dict):
    path = item["path"]
    display(Image(path, width=300))

display_image(response.objects[0].properties)

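To see how close each match actually is, the query can also request distance metadata; smaller distances mean closer matches (a sketch building on the query above):

from weaviate.classes.query import MetadataQuery

response = anime_girls.query.near_text(
    query="Seeing a girl through glasses",
    return_properties=["name", "path", "mediaType"],
    return_metadata=MetadataQuery(distance=True),
    limit=2,
)

for obj in response.objects:
    print(obj.properties["name"], obj.metadata.distance)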

Retrieval by Image#

response = anime_girls.query.near_image(
    near_image=to_base64("./images/121955436_p0_master1200.jpg"),
    return_properties=["name", "path", "mediaType"],
    limit=2,
)

# for obj in response.objects:
#     print(json.dumps(obj.properties, indent=2))

display_image(response.objects[0].properties)

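Because the collection also defines video_fields, similar cross-modal searches should work for video via the near_media query (a sketch; the video path here is hypothetical and assumes a short clip is available locally):

from weaviate.classes.query import NearMediaType

response = anime_girls.query.near_media(
    media=to_base64("./videos/sample_clip.mp4"),  # hypothetical local video file
    media_type=NearMediaType.VIDEO,
    return_properties=["name", "path", "mediaType"],
    limit=2,
)

display_image(response.objects[0].properties)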

Hybrid Retrieval#

response = anime_girls.query.hybrid(
    query="Seeing a girl through glasses",
    return_properties=["name", "path", "mediaType"],
    limit=2,
)

# for obj in response.objects:
#     print(json.dumps(obj.properties, indent=2))

display_image(response.objects[0].properties)
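Hybrid search combines BM25 keyword matching with vector similarity, and the balance between the two can be tuned with the alpha parameter, where 0 means pure keyword search and 1 means pure vector search (a sketch):

response = anime_girls.query.hybrid(
    query="Seeing a girl through glasses",
    alpha=0.75,  # weight vector search more heavily than BM25
    return_properties=["name", "path", "mediaType"],
    limit=2,
)

display_image(response.objects[0].properties)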

Retrieving All Vectors and Visualizing with PCA#

import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Collect the 1408-dimensional vector of every object in the collection
embedding = np.array([item.vector['default'] for item in anime_girls.iterator(include_vector=True)])

# Use PCA to reduce 1408-dimensional data to 2 dimensions
pca = PCA(n_components=2)
reduced_embedding = pca.fit_transform(embedding)

# Plot the reduced data
plt.figure(figsize=(10, 7))
plt.scatter(reduced_embedding[:, 0], reduced_embedding[:, 1], alpha=0.5)
plt.title('PCA of AnimeGirls Embeddings')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.show()

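To see which image each point corresponds to, the names can be collected alongside the vectors and drawn as labels (a small extension of the plot above; it reuses the imports from the previous block):

# Collect names and vectors in a single pass so they stay aligned
items = list(anime_girls.iterator(include_vector=True))
names = [item.properties["name"] for item in items]
vectors = np.array([item.vector['default'] for item in items])

reduced = PCA(n_components=2).fit_transform(vectors)

plt.figure(figsize=(10, 7))
plt.scatter(reduced[:, 0], reduced[:, 1], alpha=0.5)
for name, (x, y) in zip(names, reduced):
    plt.annotate(name, (x, y), fontsize=8, alpha=0.7)
plt.title('PCA of AnimeGirls Embeddings (labeled)')
plt.show()

When everything is done, the connection can be released with client.close().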
