Creating Text Embeddings Using Gemini (Python)

1. Prerequisites and Setup

Before generating embeddings, ensure the following requirements are met.

Requirements

  • Python 3.9+
  • A valid Google AI Studio API key (create one in Google AI Studio)
  • Google Generative AI SDK

Install the SDK

pip install google-generativeai

API Configuration

The Gemini API requires explicit configuration using an API key. This configuration should be performed once at application startup.

import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")

For production environments, never hardcode API keys. Use environment variables:

import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

Set the environment variable in your shell:

export GEMINI_API_KEY="your-actual-api-key-here"

2. The Gemini Embedding Model

Gemini provides a dedicated embedding model optimized for semantic representation:

models/embedding-001

This model produces dense vector embeddings suitable for:

  • Semantic search
  • Document clustering
  • Similarity comparison
  • Retrieval-augmented generation (RAG)
  • Recommendation systems

Key Properties:

  • Dimensionality: 768
  • Output type: List[float]
  • Deterministic: Yes (same input → same output)
  • Maximum input tokens: ~2048 (varies; chunk longer text)

3. Creating a Single Embedding

The core API method used to generate embeddings is embed_content.

Example: Generating an Embedding for a Single Text

import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

embedding_response = genai.embed_content(
    model="models/embedding-001",
    content="Large language models enable semantic understanding of text."
)

embedding_vector = embedding_response["embedding"]

Output Characteristics

print(type(embedding_vector))        # <class 'list'>
print(len(embedding_vector))         # 768
print(embedding_vector[:5])          # [0.0234, -0.0891, 0.1423, ...]

Invariants:

  • embedding_vector is always a list of 768 floats
  • Values are not normalized by default (magnitudes vary)
  • Negative values are valid and expected
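
Because the vectors are not unit-normalized, compare them with cosine similarity, which divides out the magnitudes. A minimal sketch using only the standard library:

import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)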

Pause-and-Verify Checkpoint

Before moving forward, verify this works:

text1 = "Python is a programming language"
text2 = "Python is a type of snake"

response1 = genai.embed_content(model="models/embedding-001", content=text1)
response2 = genai.embed_content(model="models/embedding-001", content=text2)

# These should be different vectors
print(response1["embedding"][:3])
print(response2["embedding"][:3])

If both vectors are identical, something is wrong with your setup.

4. Batch Embedding Generation

For real-world workloads (document indexing, dataset preprocessing), embeddings should be generated in batches rather than one at a time.

Example: Multiple Text Inputs

texts = [
    "Neural networks learn representations from data.",
    "Transformers scale effectively with large datasets.",
    "Embeddings enable semantic similarity search."
]

embeddings = []

for text in texts:
    response = genai.embed_content(
        model="models/embedding-001",
        content=text
    )
    embeddings.append(response["embedding"])

# Each embedding corresponds positionally to the input text
print(f"Generated {len(embeddings)} embeddings")
print(f"First embedding shape: {len(embeddings[0])}")

Important: Each entry in embeddings corresponds positionally to the input text. Do not shuffle one without shuffling the other.

Why Batching Matters

  • Rate limit management: Send fewer total requests
  • Consistency: All embeddings use the same model state
  • Efficiency: Easier to implement retry logic and error handling
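
Note that the loop above still issues one request per text. Recent versions of the SDK also accept a list of strings in a single embed_content call; a minimal sketch, assuming your SDK version supports this batch form:

response = genai.embed_content(
    model="models/embedding-001",
    content=texts  # a list input returns one vector per element
)
batch_embeddings = response["embedding"]  # list of 768-float vectors, in input order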

5. Embedding Structured Content

Gemini embeds text, so structured content such as a JSON-like dictionary must be serialized into a text representation before embedding. A simple approach is to flatten the relevant fields into a single string.

Example: Embedding Structured Input

content = {
    "title": "Vector Databases",
    "description": "Used to store and query high-dimensional embeddings"
}

# Flatten the structure into text before embedding
serialized = f"{content['title']}: {content['description']}"

response = genai.embed_content(
    model="models/embedding-001",
    content=serialized
)

embedding = response["embedding"]

This is useful when embedding metadata-enriched documents rather than raw text: every field you serialize contributes to the embedding's semantic content.

Use Case: Embedding product catalogs where title, description, and category all contribute to semantic meaning.
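
For retrieval workloads, embed_content also accepts a task_type and, for documents, a title that is folded into the embedding. A minimal sketch (parameter support may vary by SDK version):

response = genai.embed_content(
    model="models/embedding-001",
    content="Used to store and query high-dimensional embeddings",
    task_type="retrieval_document",
    title="Vector Databases"
)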

6. Embedding Long Documents

For long documents exceeding the token limit (~2048 tokens), the recommended approach is chunking.

Chunking Strategy

def chunk_text(text, chunk_size=500):
    """Split text into chunks of approximately chunk_size words."""
    words = text.split()
    for i in range(0, len(words), chunk_size):
        yield " ".join(words[i:i + chunk_size])

long_document = "..." # Your long document here

document_embeddings = []

for chunk in chunk_text(long_document):
    response = genai.embed_content(
        model="models/embedding-001",
        content=chunk
    )
    document_embeddings.append(response["embedding"])

print(f"Document split into {len(document_embeddings)} chunks")

Chunking Considerations

Chunk Boundaries:

  • Prefer sentence or paragraph boundaries over arbitrary word counts
  • Avoid splitting mid-sentence when possible
  • Consider overlapping chunks for continuity (last 50 words of chunk N = first 50 words of chunk N+1)
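
The overlap idea from the last bullet takes only a small change to chunk_text. A minimal sketch, assuming a fixed word-level overlap:

def chunk_text_overlapping(text, chunk_size=500, overlap=50):
    """Split text into word chunks; consecutive chunks share `overlap` words."""
    words = text.split()
    step = chunk_size - overlap
    for i in range(0, len(words), step):
        yield " ".join(words[i:i + chunk_size])  # the final chunk may be shorter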

Downstream Usage:

  • Each chunk embedding can be indexed separately in a vector database
  • At query time, retrieve top-k most similar chunks
  • Aggregate chunk scores to rank documents
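
A minimal sketch of the aggregation step, scoring each document by its best-matching chunk (max-scoring is one common choice) and reusing the cosine_similarity helper sketched in Section 3:

def rank_documents(query_embedding, chunk_index):
    """chunk_index: iterable of (doc_id, chunk_embedding) pairs.
    Returns doc ids sorted by their best chunk similarity, highest first."""
    best = {}
    for doc_id, chunk_embedding in chunk_index:
        score = cosine_similarity(query_embedding, chunk_embedding)
        if score > best.get(doc_id, float("-inf")):
            best[doc_id] = score
    return sorted(best, key=best.get, reverse=True)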

Tradeoff:

  • Smaller chunks: More granular retrieval, but lose broader context
  • Larger chunks: Better context preservation, but less precise matching

7. Best Practices

Storage Considerations

Type: Store embeddings as FLOAT32 arrays (not FLOAT64; precision is wasted)

Location: Prefer vector databases (FAISS, Milvus, Pinecone, Weaviate) for scale

Metadata: Always persist original text alongside embeddings for:

  • Debugging
  • Re-embedding if the model changes
  • Displaying results to users

Schema Example:

{
    "id": "doc_12345",
    "text": "Original document text...",
    "embedding": [0.023, -0.089, ...],  # 768 floats
    "metadata": {"source": "...", "timestamp": "..."}
}
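
To make the FLOAT32 recommendation concrete, a minimal sketch using numpy (assuming numpy is installed):

import numpy as np

vector = np.asarray(embedding, dtype=np.float32)  # embedding from the earlier examples
print(vector.nbytes)  # 768 * 4 = 3072 bytes, half the footprint of float64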

Rate Limits and Caching

Rate Limits:

  • Gemini API has rate limits (requests per minute)
  • Implement exponential backoff on rate limit errors
  • Batch requests when possible to reduce API call count
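
A minimal retry wrapper with exponential backoff and jitter. This sketch catches the broad Exception as a placeholder; in real code, narrow it to the SDK's specific rate-limit error:

import random
import time

def embed_with_retry(text, model="models/embedding-001", max_retries=5):
    """Call embed_content, backing off exponentially on transient failures."""
    for attempt in range(max_retries):
        try:
            return genai.embed_content(model=model, content=text)["embedding"]
        except Exception:  # placeholder; catch the SDK's rate-limit exception instead
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt + random.random())  # 1s, 2s, 4s, ... plus jitter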

Caching:

  • Embeddings for static content (knowledge base, product catalog) should be computed once and cached
  • Do not re-embed the same text repeatedly
  • Cache key: hash of preprocessed text + model name

import hashlib

def cache_key(text, model="models/embedding-001"):
    return hashlib.sha256(f"{model}:{text}".encode()).hexdigest()
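
Building on cache_key, a minimal in-memory cache sketch (swap the dict for a persistent store in real workloads):

_embedding_cache = {}

def embed_cached(text, model="models/embedding-001"):
    """Return the cached embedding if present; otherwise embed and cache."""
    key = cache_key(text, model)
    if key not in _embedding_cache:
        response = genai.embed_content(model=model, content=text)
        _embedding_cache[key] = response["embedding"]
    return _embedding_cache[key]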

8. Complete Minimal Example

Here is a complete, runnable example that demonstrates the full workflow.

import os
import google.generativeai as genai

# Configure API
genai.configure(api_key=os.environ["GEMINI_API_KEY"])

# Sample texts
texts = [
    "Embeddings capture semantic meaning.",
    "Vector similarity enables intelligent retrieval.",
    "Machine learning models process structured data."
]

# Generate embeddings
embeddings = []

for text in texts:
    response = genai.embed_content(
        model="models/embedding-001",
        content=text
    )
    embedding = response["embedding"]
    embeddings.append(embedding)
    
    print(f"Text: {text}")
    print(f"Embedding size: {len(embedding)}")
    print(f"First 5 values: {embedding[:5]}")
    print()

print(f"Total embeddings generated: {len(embeddings)}")

9. When Not to Use Embeddings

Do not use embeddings when:

  • Exact keyword matching is required (use traditional search)
  • The task is classification with fixed categories (train a classifier)
  • You need explainability (embeddings are black boxes)
  • Latency is critical and you cannot afford vector search overhead

10. Summary

Gemini's models/embedding-001 provides a high-quality, scalable mechanism for transforming text into dense semantic vectors. Using the Google Generative AI SDK, you can efficiently generate embeddings for:

  • Single inputs
  • Batched inputs
  • Structured content
  • Long documents (via chunking)

Core Invariants:

  • Same input → same vector (deterministic)
  • Semantically similar text → similar vectors
  • 768-dimensional output

Embeddings are not magic. They are a transformation with specific guarantees and failure modes. Treat them as such.
