Creating Text Embeddings Using Gemini (Python)

1. Prerequisites and Setup

Before generating embeddings, ensure the following requirements are met.

Requirements

  • Python 3.9+
  • A valid Google AI Studio API key (create one in Google AI Studio)
  • Google Generative AI SDK

Install the SDK

pip install google-generativeai

API Configuration

The Gemini API requires explicit configuration using an API key. This configuration should be performed once at application startup.

import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")

For production environments, never hardcode API keys. Use environment variables:

import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

Set the environment variable in your shell:

export GEMINI_API_KEY="your-actual-api-key-here"

2. The Gemini Embedding Model

Gemini provides a dedicated embedding model optimized for semantic representation:

models/embedding-001

This model produces dense vector embeddings suitable for:

  • Semantic search
  • Document clustering
  • Similarity comparison
  • Retrieval-augmented generation (RAG)
  • Recommendation systems

Key Properties:

  • Dimensionality: 768
  • Output type: List[float]
  • Deterministic: Yes (same input → same output)
  • Maximum input tokens: ~2048 (varies; chunk longer text)

3. Creating a Single Embedding

The core API method used to generate embeddings is embed_content.

Example: Generating an Embedding for a Single Text

import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

embedding_response = genai.embed_content(
    model="models/embedding-001",
    content="Large language models enable semantic understanding of text."
)

embedding_vector = embedding_response["embedding"]

Output Characteristics

print(type(embedding_vector))        # <class 'list'>
print(len(embedding_vector))         # 768
print(embedding_vector[:5])          # [0.0234, -0.0891, 0.1423, ...]

Invariants:

  • embedding_vector is always a list of 768 floats
  • Values are not normalized by default (magnitudes vary)
  • Negative values are valid and expected
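
Because the vectors are not unit-normalized, compare them with cosine similarity, which divides out the magnitudes. A minimal sketch using only the standard library:

import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)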

Pause-and-Verify Checkpoint

Before moving forward, verify this works:

text1 = "Python is a programming language"
text2 = "Python is a type of snake"

response1 = genai.embed_content(model="models/embedding-001", content=text1)
response2 = genai.embed_content(model="models/embedding-001", content=text2)

# These should be different vectors
print(response1["embedding"][:3])
print(response2["embedding"][:3])

If both vectors are identical, something is wrong with your setup.

4. Batch Embedding Generation

For real-world workloads (document indexing, dataset preprocessing), embeddings should be generated in batches rather than one at a time.

Example: Multiple Text Inputs

texts = [
    "Neural networks learn representations from data.",
    "Transformers scale effectively with large datasets.",
    "Embeddings enable semantic similarity search."
]

embeddings = []

for text in texts:
    response = genai.embed_content(
        model="models/embedding-001",
        content=text
    )
    embeddings.append(response["embedding"])

# Each embedding corresponds positionally to the input text
print(f"Generated {len(embeddings)} embeddings")
print(f"First embedding shape: {len(embeddings[0])}")

Important: Each entry in embeddings corresponds positionally to the input text. Do not shuffle one without shuffling the other.

Why Batching Matters

  • Rate limit management: Send fewer total requests
  • Consistency: All embeddings use the same model state
  • Efficiency: Easier to implement retry logic and error handling
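
Note that the loop above still issues one request per text. Recent versions of the SDK also accept a list of strings in a single embed_content call; a minimal sketch, assuming your SDK version supports this batch form:

response = genai.embed_content(
    model="models/embedding-001",
    content=texts  # a list input returns one vector per element
)
batch_embeddings = response["embedding"]  # list of 768-float vectors, in input order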

5. Embedding Structured Content

Gemini embeds text, so structured content such as a JSON-like dictionary must be serialized into a text representation before embedding. A simple approach is to flatten the relevant fields into a single string.

Example: Embedding Structured Input

content = {
    "title": "Vector Databases",
    "description": "Used to store and query high-dimensional embeddings"
}

# Flatten the structure into text before embedding
serialized = f"{content['title']}: {content['description']}"

response = genai.embed_content(
    model="models/embedding-001",
    content=serialized
)

embedding = response["embedding"]

This is useful when embedding metadata-enriched documents rather than raw text: every field you serialize contributes to the embedding's semantic content.

Use Case: Embedding product catalogs where title, description, and category all contribute to semantic meaning.
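
For retrieval workloads, embed_content also accepts a task_type and, for documents, a title that is folded into the embedding. A minimal sketch (parameter support may vary by SDK version):

response = genai.embed_content(
    model="models/embedding-001",
    content="Used to store and query high-dimensional embeddings",
    task_type="retrieval_document",
    title="Vector Databases"
)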

6. Embedding Long Documents

For long documents exceeding the token limit (~2048 tokens), the recommended approach is chunking.

Chunking Strategy

def chunk_text(text, chunk_size=500):
    """Split text into chunks of approximately chunk_size words."""
    words = text.split()
    for i in range(0, len(words), chunk_size):
        yield " ".join(words[i:i + chunk_size])

long_document = "..." # Your long document here

document_embeddings = []

for chunk in chunk_text(long_document):
    response = genai.embed_content(
        model="models/embedding-001",
        content=chunk
    )
    document_embeddings.append(response["embedding"])

print(f"Document split into {len(document_embeddings)} chunks")

Chunking Considerations

Chunk Boundaries:

  • Prefer sentence or paragraph boundaries over arbitrary word counts
  • Avoid splitting mid-sentence when possible
  • Consider overlapping chunks for continuity (last 50 words of chunk N = first 50 words of chunk N+1)
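
The overlap idea from the last bullet takes only a small change to chunk_text. A minimal sketch, assuming a fixed word-level overlap:

def chunk_text_overlapping(text, chunk_size=500, overlap=50):
    """Split text into word chunks; consecutive chunks share `overlap` words."""
    words = text.split()
    step = chunk_size - overlap
    for i in range(0, len(words), step):
        yield " ".join(words[i:i + chunk_size])  # the final chunk may be shorter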

Downstream Usage:

  • Each chunk embedding can be indexed separately in a vector database
  • At query time, retrieve top-k most similar chunks
  • Aggregate chunk scores to rank documents
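
A minimal sketch of the aggregation step, scoring each document by its best-matching chunk (max-scoring is one common choice) and reusing the cosine_similarity helper sketched in Section 3:

def rank_documents(query_embedding, chunk_index):
    """chunk_index: iterable of (doc_id, chunk_embedding) pairs.
    Returns doc ids sorted by their best chunk similarity, highest first."""
    best = {}
    for doc_id, chunk_embedding in chunk_index:
        score = cosine_similarity(query_embedding, chunk_embedding)
        if score > best.get(doc_id, float("-inf")):
            best[doc_id] = score
    return sorted(best, key=best.get, reverse=True)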

Tradeoff:

  • Smaller chunks: More granular retrieval, but lose broader context
  • Larger chunks: Better context preservation, but less precise matching

7. Best Practices

Storage Considerations

Type: Store embeddings as FLOAT32 arrays (not FLOAT64; precision is wasted)

Location: Prefer vector databases (FAISS, Milvus, Pinecone, Weaviate) for scale

Metadata: Always persist original text alongside embeddings for:

  • Debugging
  • Re-embedding if the model changes
  • Displaying results to users

Schema Example:

{
    "id": "doc_12345",
    "text": "Original document text...",
    "embedding": [0.023, -0.089, ...],  # 768 floats
    "metadata": {"source": "...", "timestamp": "..."}
}
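
To make the FLOAT32 recommendation concrete, a minimal sketch using numpy (assuming numpy is installed):

import numpy as np

vector = np.asarray(embedding, dtype=np.float32)  # embedding from the earlier examples
print(vector.nbytes)  # 768 * 4 = 3072 bytes, half the footprint of float64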

Rate Limits and Caching

Rate Limits:

  • Gemini API has rate limits (requests per minute)
  • Implement exponential backoff on rate limit errors
  • Batch requests when possible to reduce API call count
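
A minimal retry wrapper with exponential backoff and jitter. This sketch catches the broad Exception as a placeholder; in real code, narrow it to the SDK's specific rate-limit error:

import random
import time

def embed_with_retry(text, model="models/embedding-001", max_retries=5):
    """Call embed_content, backing off exponentially on transient failures."""
    for attempt in range(max_retries):
        try:
            return genai.embed_content(model=model, content=text)["embedding"]
        except Exception:  # placeholder; catch the SDK's rate-limit exception instead
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt + random.random())  # 1s, 2s, 4s, ... plus jitter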

Caching:

  • Embeddings for static content (knowledge base, product catalog) should be computed once and cached
  • Do not re-embed the same text repeatedly
  • Cache key: hash of preprocessed text + model name

import hashlib

def cache_key(text, model="models/embedding-001"):
    return hashlib.sha256(f"{model}:{text}".encode()).hexdigest()
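
Building on cache_key, a minimal in-memory cache sketch (swap the dict for a persistent store in real workloads):

_embedding_cache = {}

def embed_cached(text, model="models/embedding-001"):
    """Return the cached embedding if present; otherwise embed and cache."""
    key = cache_key(text, model)
    if key not in _embedding_cache:
        response = genai.embed_content(model=model, content=text)
        _embedding_cache[key] = response["embedding"]
    return _embedding_cache[key]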

8. Complete Minimal Example

Here is a complete, runnable example that demonstrates the full workflow.

import os
import google.generativeai as genai

# Configure API
genai.configure(api_key=os.environ["GEMINI_API_KEY"])

# Sample texts
texts = [
    "Embeddings capture semantic meaning.",
    "Vector similarity enables intelligent retrieval.",
    "Machine learning models process structured data."
]

# Generate embeddings
embeddings = []

for text in texts:
    response = genai.embed_content(
        model="models/embedding-001",
        content=text
    )
    embedding = response["embedding"]
    embeddings.append(embedding)
    
    print(f"Text: {text}")
    print(f"Embedding size: {len(embedding)}")
    print(f"First 5 values: {embedding[:5]}")
    print()

print(f"Total embeddings generated: {len(embeddings)}")

9. When Not to Use Embeddings

Do not use embeddings when:

  • Exact keyword matching is required (use traditional search)
  • The task is classification with fixed categories (train a classifier)
  • You need explainability (embeddings are black boxes)
  • Latency is critical and you cannot afford vector search overhead

10. Summary

Gemini's models/embedding-001 provides a high-quality, scalable mechanism for transforming text into dense semantic vectors. Using the Google Generative AI SDK, you can efficiently generate embeddings for:

  • Single inputs
  • Batched inputs
  • Structured content
  • Long documents (via chunking)

Core Invariants:

  • Same input → same vector (deterministic)
  • Semantically similar text → similar vectors
  • 768-dimensional output

Embeddings are not magic. They are a transformation with specific guarantees and failure modes. Treat them as such.
