Creating Text Embeddings Using Gemini (Python)
1. Prerequisites and Setup
Before generating embeddings, ensure the following requirements are met.
Requirements
- Python 3.9+
- A valid Google AI Studio API key (see References)
- Google Generative AI SDK
Install the SDK
pip install google-generativeai
API Configuration
The Gemini API requires explicit configuration using an API key. This configuration should be performed once at application startup.
import google.generativeai as genai
genai.configure(api_key="YOUR_GEMINI_API_KEY")
For production environments, never hardcode API keys. Use environment variables:
import os
import google.generativeai as genai
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
Set the environment variable in your shell:
export GEMINI_API_KEY="your-actual-api-key-here"
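A fail-fast variant of the same configuration surfaces a missing key at startup instead of at the first API call. A minimal sketch:

import os
import google.generativeai as genai

api_key = os.environ.get("GEMINI_API_KEY")
if not api_key:
    raise RuntimeError("GEMINI_API_KEY is not set; export it before starting the app")
genai.configure(api_key=api_key)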
2. The Gemini Embedding Model
Gemini provides a dedicated embedding model optimized for semantic representation:
models/embedding-001
This model produces dense vector embeddings suitable for:
- Semantic search
- Document clustering
- Similarity comparison
- Retrieval-augmented generation (RAG)
- Recommendation systems
Key Properties:
- Dimensionality: 768
- Output type: List[float]
- Deterministic: Yes (same input → same output)
- Maximum input tokens: ~2048 (varies; chunk longer text)
3. Creating a Single Embedding
The core API method used to generate embeddings is embed_content.
Example: Generating an Embedding for a Single Text
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
embedding_response = genai.embed_content(
model="models/embedding-001",
content="Large language models enable semantic understanding of text."
)
embedding_vector = embedding_response["embedding"]
Output Characteristics
print(type(embedding_vector)) # <class 'list'>
print(len(embedding_vector)) # 768
print(embedding_vector[:5]) # [0.0234, -0.0891, 0.1423, ...]
Invariants:
- embedding_vector is always a list of 768 floats
- Values are not normalized by default (magnitudes vary)
- Negative values are valid and expected
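Because the vectors are not unit-normalized, dot products conflate direction with magnitude. If a downstream component expects unit-length vectors, normalize explicitly. A minimal sketch using only the standard library:

import math

def l2_normalize(vec):
    """Scale a vector to unit length (L2 norm = 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

unit_vector = l2_normalize(embedding_vector)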
Pause-and-Verify Checkpoint
Before moving forward, verify this works:
text1 = "Python is a programming language"
text2 = "Python is a type of snake"
response1 = genai.embed_content(model="models/embedding-001", content=text1)
response2 = genai.embed_content(model="models/embedding-001", content=text2)
# These should be different vectors
print(response1["embedding"][:3])
print(response2["embedding"][:3])
If both vectors are identical, something is wrong with your setup.
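To compare the two checkpoint vectors more rigorously than eyeballing the first three values, compute their cosine similarity; two semantically distinct sentences should score clearly below 1.0. A standard-library sketch:

import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

score = cosine_similarity(response1["embedding"], response2["embedding"])
print(f"Cosine similarity: {score:.4f}")  # expect < 1.0 for distinct sentences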
4. Batch Embedding Generation
For real-world workloads (document indexing, dataset preprocessing), embeddings should be generated in batches rather than one at a time.
Example: Multiple Text Inputs
texts = [
"Neural networks learn representations from data.",
"Transformers scale effectively with large datasets.",
"Embeddings enable semantic similarity search."
]
embeddings = []
for text in texts:
response = genai.embed_content(
model="models/embedding-001",
content=text
)
embeddings.append(response["embedding"])
# Each embedding corresponds positionally to the input text
print(f"Generated {len(embeddings)} embeddings")
print(f"First embedding shape: {len(embeddings[0])}")
Important: Each entry in embeddings corresponds positionally to the input text. Do not shuffle one without shuffling the other.
Why Batching Matters
- Rate limit management: Send fewer total requests
- Consistency: All embeddings use the same model state
- Efficiency: Easier to implement retry logic and error handling
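One way to act on the rate-limit point above: recent versions of the SDK accept a list of strings as content, so a single request returns one embedding per input, in order. A sketch, assuming that SDK behavior:

response = genai.embed_content(
    model="models/embedding-001",
    content=texts  # the list of strings from the example above
)
batch_embeddings = response["embedding"]  # one vector per input, same order
print(f"Generated {len(batch_embeddings)} embeddings in one request")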
5. Embedding Structured Content
The embed_content method does not accept arbitrary dictionaries, but it supports structure in two ways: an optional title parameter (valid when task_type="retrieval_document"), and text you serialize yourself from structured fields.
Example: Embedding a Titled Document
record = {
    "title": "Vector Databases",
    "description": "Used to store and query high-dimensional embeddings"
}
response = genai.embed_content(
    model="models/embedding-001",
    content=record["description"],
    task_type="retrieval_document",
    title=record["title"]
)
embedding = response["embedding"]
This is useful when embedding metadata-enriched documents rather than raw text: the title gives the model additional context for retrieval-oriented embeddings.
Use Case: Embedding product catalogs where title, description, and category all contribute to semantic meaning.
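For records with more fields than a title and description, a common pattern is a small serialization helper that flattens the fields you care about into one embeddable string. A sketch (serialize_product is a hypothetical helper, not an SDK function):

def serialize_product(product):
    """Flatten catalog fields into a single string for embedding."""
    return f"{product['title']}\n{product['category']}\n{product['description']}"

item = {
    "title": "Mechanical Keyboard",
    "category": "Peripherals",
    "description": "Tactile switches, 87-key layout"
}
response = genai.embed_content(
    model="models/embedding-001",
    content=serialize_product(item)
)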
6. Embedding Long Documents
For long documents exceeding the token limit (~2048 tokens), the recommended approach is chunking.
Chunking Strategy
def chunk_text(text, chunk_size=500):
"""Split text into chunks of approximately chunk_size words."""
words = text.split()
for i in range(0, len(words), chunk_size):
yield " ".join(words[i:i + chunk_size])
long_document = "..." # Your long document here
document_embeddings = []
for chunk in chunk_text(long_document):
response = genai.embed_content(
model="models/embedding-001",
content=chunk
)
document_embeddings.append(response["embedding"])
print(f"Document split into {len(document_embeddings)} chunks")
Chunking Considerations
Chunk Boundaries:
- Prefer sentence or paragraph boundaries over arbitrary word counts
- Avoid splitting mid-sentence when possible
- Consider overlapping chunks for continuity (e.g., the last 50 words of chunk N repeat as the first 50 words of chunk N+1); see the sketch at the end of this section
Downstream Usage:
- Each chunk embedding can be indexed separately in a vector database
- At query time, retrieve top-k most similar chunks
- Aggregate chunk scores to rank documents
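A sketch of the aggregation step, reusing cosine_similarity from the checkpoint in Section 3 and scoring each document by its best-matching chunk (a max-score strategy; mean or sum are reasonable alternatives):

def rank_documents(query_vec, chunk_index):
    """chunk_index: iterable of (doc_id, chunk_vec) pairs."""
    best = {}
    for doc_id, chunk_vec in chunk_index:
        score = cosine_similarity(query_vec, chunk_vec)
        best[doc_id] = max(score, best.get(doc_id, float("-inf")))
    # Highest-scoring documents first
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)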
Tradeoff:
- Smaller chunks: More granular retrieval, but lose broader context
- Larger chunks: Better context preservation, but less precise matching
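A minimal overlapping variant of chunk_text from above (word-based for brevity; in practice, prefer sentence boundaries per the considerations above):

def chunk_text_overlap(text, chunk_size=500, overlap=50):
    """Yield word chunks where consecutive chunks share `overlap` words."""
    words = text.split()
    step = chunk_size - overlap
    for i in range(0, len(words), step):
        yield " ".join(words[i:i + chunk_size])
        if i + chunk_size >= len(words):
            break  # final chunk already covers the tail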
7. Best Practices
Storage Considerations
Type: Store embeddings as FLOAT32 arrays (not FLOAT64; the extra precision buys nothing for similarity search)
Location: Prefer vector databases (FAISS, Milvus, Pinecone, Weaviate) for scale
Metadata: Always persist original text alongside embeddings for:
- Debugging
- Re-embedding if the model changes
- Displaying results to users
Schema Example:
{
"id": "doc_12345",
"text": "Original document text...",
"embedding": [0.023, -0.089, ...], # 768 floats
"metadata": {"source": "...", "timestamp": "..."}
}
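A quick illustration of the FLOAT32 point (assuming NumPy is available), for a 768-dimensional embedding from the examples above:

import numpy as np

vec64 = np.asarray(embedding, dtype=np.float64)
vec32 = vec64.astype(np.float32)  # halves storage with no practical loss for search
print(vec64.nbytes, vec32.nbytes)  # 6144 vs. 3072 bytes for 768 dimensions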
Rate Limits and Caching
Rate Limits:
- Gemini API has rate limits (requests per minute)
- Implement exponential backoff on rate limit errors
- Batch requests when possible to reduce API call count
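A sketch of exponential backoff with jitter; the broad except Exception is for brevity, and real code should catch the SDK's specific rate-limit error:

import random
import time

def embed_with_retry(text, max_retries=5):
    """Call embed_content, backing off exponentially on transient failures."""
    for attempt in range(max_retries):
        try:
            return genai.embed_content(
                model="models/embedding-001",
                content=text
            )["embedding"]
        except Exception:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt + random.random())  # 1s, 2s, 4s... plus jitter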
Caching:
- Embeddings for static content (knowledge base, product catalog) should be computed once and cached
- Do not re-embed the same text repeatedly
- Cache key: hash of preprocessed text + model name
import hashlib
def cache_key(text, model="models/embedding-001"):
return hashlib.sha256(f"{model}:{text}".encode()).hexdigest()
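Building on cache_key, a minimal in-memory cache (a plain dict here; swap in Redis or disk storage for persistence across runs):

_embedding_cache = {}

def get_embedding_cached(text, model="models/embedding-001"):
    """Return a cached embedding, calling the API only on a cache miss."""
    key = cache_key(text, model)
    if key not in _embedding_cache:
        response = genai.embed_content(model=model, content=text)
        _embedding_cache[key] = response["embedding"]
    return _embedding_cache[key]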
8. Complete Minimal Example
Here is a complete, runnable example that demonstrates the full workflow.
import os
import google.generativeai as genai
# Configure API
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
# Sample texts
texts = [
"Embeddings capture semantic meaning.",
"Vector similarity enables intelligent retrieval.",
"Machine learning models process structured data."
]
# Generate embeddings
embeddings = []
for text in texts:
response = genai.embed_content(
model="models/embedding-001",
content=text
)
embedding = response["embedding"]
embeddings.append(embedding)
print(f"Text: {text}")
print(f"Embedding size: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")
print()
print(f"Total embeddings generated: {len(embeddings)}")
9. When Not to Use Embeddings
Do not use embeddings when:
- Exact keyword matching is required (use traditional search)
- The task is classification with fixed categories (train a classifier)
- You need explainability (embeddings are black boxes)
- Latency is critical and you cannot afford vector search overhead
10. Summary
Gemini's models/embedding-001 provides a high-quality, scalable mechanism for transforming text into dense semantic vectors. Using the Google Generative AI SDK, you can efficiently generate embeddings for:
- Single inputs
- Batched inputs
- Structured content
- Long documents (via chunking)
Core Invariants:
- Same input → same vector (deterministic)
- Semantically similar text → similar vectors
- 768-dimensional output
Embeddings are not magic. They are a transformation with specific guarantees and failure modes. Treat them as such.
References
- Google AI Studio - API key generation
- Google Generative AI Python SDK - Official SDK documentation
- Vector databases: FAISS, Pinecone, Milvus, Weaviate