import os
from google import genai
from google.genai import types
import numpy as np
# Initialize client
client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])
# Text embedding test
result = client.models.embed_content(
    model="gemini-embedding-2-preview",
    contents="Political communication strategies on YouTube Shorts",
)
vec = np.array(result.embeddings[0].values)
print(f"Dimensions: {vec.shape}") # (3072,)
print(f"First 5 values: {vec[:5]}")
print(f"Vector norm: {np.linalg.norm(vec):.4f}")

1. Setup - API Configuration and Connection Test
Step 1: Get an API key
- Go to Google AI Studio and sign in with your Google account
- Click “Get API key” in the sidebar or top-right
- Click “Create API key” → “Create API key in new project”
- Copy the generated key (format: AIzaSy..., ~39 characters)
Note
Unlike some services, you can view your API key again later in AI Studio. However, never commit it to GitHub or include it in paper replication code.
Free tier vs Paid tier
- Free tier: Sufficient for a 10-video pilot. However, your data is used to improve Google’s models.
- Paid tier: If data exposure is a concern, you can switch to paid starting at $5. Your data will not be used for model training.
Step 2: Environment setup
Store the API key as an environment variable
Never hardcode the key in your scripts.
# Option A: Current terminal session only (for testing)
export GOOGLE_API_KEY="AIzaSy..."
# Option B: Persistent (recommended)
echo 'export GOOGLE_API_KEY="AIzaSy..."' >> ~/.zshrc
source ~/.zshrc
# Verify
echo $GOOGLE_API_KEY

Install Python packages
pip install google-genai
pip install umap-learn hdbscan
pip install seaborn scikit-learn pandas numpy

# Check ffmpeg (needed for audio extraction)
ffmpeg -version
# If missing: brew install ffmpeg

Step 3: Connection test (text embedding)
Start with the simplest test. Text is much cheaper and faster than video.
If this runs without error, the API connection is working.
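Before spending a request on the connection test, it can help to confirm that Python actually sees the key. The sketch below is a local check only and makes no API call. `preflight` is a hypothetical helper name, and the `AIzaSy` prefix test relies on the key format noted in Step 1, so treat a failed prefix check as a hint rather than proof that the key is wrong.

```python
import os
import shutil


def preflight(env=None):
    """Return a dict of local setup checks; nothing here contacts the API."""
    env = os.environ if env is None else env
    key = env.get("GOOGLE_API_KEY", "")
    return {
        # The variable exists and is non-empty.
        "key_set": bool(key),
        # Assumption: keys start with "AIzaSy", as described in Step 1.
        "key_prefix_ok": key.startswith("AIzaSy"),
        # ffmpeg is only needed later, for audio extraction.
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

If `key_set` is false, the `os.environ["GOOGLE_API_KEY"]` lookup in the connection test will raise a `KeyError` before any request is sent.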
Important
Critical: how the contents parameter works
- contents="string" or contents=["a", "b", "c"] → returns separate embeddings for each item
- contents=types.Content(parts=[part1, part2, part3]) → returns one unified embedding
Since we need a single vector per Short, we must use types.Content(parts=[...]). The Google blog example uses the list format, which would produce separate embeddings. Do not copy it as-is.
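One way to catch the list-format mistake early is to check how many embeddings came back before using the result. The sketch below assumes only that the response exposes `.embeddings`, each with `.values`, as in the connection test. `require_single_embedding` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for real responses so the example runs without an API call.

```python
from types import SimpleNamespace


def require_single_embedding(result):
    """Return the one embedding vector in `result`, or raise if there are several."""
    embeddings = list(result.embeddings)
    if len(embeddings) != 1:
        raise ValueError(
            f"expected 1 embedding, got {len(embeddings)}; "
            "was `contents` passed as a list instead of types.Content(parts=[...])?"
        )
    return list(embeddings[0].values)


# Stand-in responses for illustration only (no API call is made).
unified = SimpleNamespace(embeddings=[SimpleNamespace(values=[0.1, 0.2, 0.3])])
separate = SimpleNamespace(
    embeddings=[SimpleNamespace(values=[0.1]), SimpleNamespace(values=[0.2])]
)

vec = require_single_embedding(unified)  # one vector for the Short

try:
    require_single_embedding(separate)  # two vectors: the list-format case
except ValueError as err:
    print(err)
```

In a real pipeline the same call would wrap the `result` returned by `client.models.embed_content(...)`, so a Short that silently produced several vectors fails loudly instead of being truncated to `embeddings[0]`.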
Troubleshooting
| Error | Cause | Fix |
|---|---|---|
| DefaultCredentialsError | API key not set | export GOOGLE_API_KEY=... |
| Resource exhausted | Free tier limit hit | Wait 24h or switch to Paid |
| Invalid MIME type | Extension/MIME mismatch | Use video/mp4, audio/mpeg |
| PROCESSING timeout | File API processing delay | Increase time.sleep() to 5s |
| File not found | 48h auto-deletion | Re-upload the file |