Overview
Vertex AI is Google Cloud's enterprise ML platform. It provides access to the same Gemini models as Google AI Studio, but with enterprise-grade features: IAM-based auth (no API keys), VPC Service Controls for data isolation, audit logging, fine-tuning capabilities, batch prediction jobs, and integration with GCP data services like BigQuery and Cloud Storage.
Vertex AI vs Google AI Studio
| Feature | Google AI Studio | Vertex AI |
|---|---|---|
| Auth | API Key | Service Account / IAM |
| Data residency | Limited | GCP regions |
| VPC isolation | ❌ | ✅ |
| Audit logging | ❌ | ✅ Cloud Audit Logs |
| Fine-tuning | ❌ | ✅ |
| Batch prediction | ❌ | ✅ |
| Pricing | Per token | Per token (different rates) |
| Quotas | Shared | Project-level quotas |
Naming note: "Vertex AI" is being rebranded to Agent Platform (full name: Gemini Enterprise Agent Platform). The endpoints, IAM roles, and SDKs are unchanged; it is the same product, and most documentation still uses the legacy "Vertex AI" name.
SDK Choice — Use the Unified Gen AI SDK
Google now ships a single `google-genai` SDK that targets both Agent Platform (Vertex) and Google AI Studio with the same code. Use it for all new code. The legacy `google-cloud-aiplatform` and `vertexai` modules are deprecated.
| New (use this) | Legacy (deprecated) |
|---|---|
| `google-genai` (Python) | `google-cloud-aiplatform`, `google-generativeai` |
| `@google/genai` (JS/TS) | `@google-cloud/vertexai` |
| `google.golang.org/genai` (Go) | `cloud.google.com/go/vertexai` |
| `com.google.genai:google-genai` (Java) | — |
| `Google.GenAI` (.NET) | — |
```shell
# Recommended: unified Gen AI SDK
pip install google-genai
```

```python
import os

from google import genai

os.environ["GOOGLE_CLOUD_PROJECT"] = "my-project-id"
os.environ["GOOGLE_CLOUD_LOCATION"] = "global"  # routes to a region with available capacity
os.environ["GOOGLE_GENAI_USE_VERTEXAI"] = "true"

client = genai.Client()  # picks up env vars

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Explain containerization in simple terms.",
)
print(response.text)
```
Use location="global" by default — routes to the region with available capacity. Pin to a specific region (us-central1, europe-west4) only when data residency requires it.
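When residency does require a pinned region, the client can also be configured explicitly instead of via environment variables. A minimal sketch; the project ID and region are placeholders:

```python
from google import genai

# Explicit Vertex mode with a pinned region (overrides env vars)
client = genai.Client(
    vertexai=True,
    project="my-project-id",
    location="europe-west4",  # requests stay in this region
)
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Summarize data residency in one sentence.",
)
print(response.text)
```

Explicit arguments are useful when one service talks to multiple projects or regions and env vars would be ambiguous.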
Setup (Legacy SDK — only for existing code)
```shell
pip install google-cloud-aiplatform

# Authenticate
gcloud auth application-default login

# Or use a service account
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

# Set project and location
export GOOGLE_CLOUD_PROJECT=my-project-id
export GOOGLE_CLOUD_LOCATION=us-central1
```
Instructions
The examples below use the legacy `google-cloud-aiplatform` SDK. For new code, prefer the unified `google-genai` SDK shown above: same capabilities, cross-platform, current best practice.
Basic Gemini Inference
```python
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-project-id", location="us-central1")

model = GenerativeModel("gemini-2.0-flash-001")
response = model.generate_content("Explain containerization in simple terms.")
print(response.text)
```
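Sampling behavior can be tuned per call with `GenerationConfig`. A sketch; the values are illustrative, not recommendations:

```python
import vertexai
from vertexai.generative_models import GenerationConfig, GenerativeModel

vertexai.init(project="my-project-id", location="us-central1")
model = GenerativeModel("gemini-2.0-flash-001")

response = model.generate_content(
    "Explain containerization in simple terms.",
    generation_config=GenerationConfig(
        temperature=0.2,        # lower = more deterministic output
        max_output_tokens=512,  # hard cap on response length
        top_p=0.95,
    ),
)
print(response.text)
```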
Multi-Modal Inference
```python
import vertexai
from vertexai.generative_models import GenerativeModel, Part

vertexai.init(project="my-project-id", location="us-central1")
model = GenerativeModel("gemini-2.0-flash-001")

# Analyze an image from Cloud Storage
gcs_image = Part.from_uri(
    uri="gs://my-bucket/product-photo.jpg",
    mime_type="image/jpeg",
)
response = model.generate_content(["Describe this product:", gcs_image])
print(response.text)

# Analyze a local image
with open("chart.png", "rb") as f:
    image_data = f.read()
local_image = Part.from_data(data=image_data, mime_type="image/png")
response = model.generate_content(["What trends does this chart show?", local_image])
print(response.text)
```
Streaming Responses
```python
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-project-id", location="us-central1")
model = GenerativeModel("gemini-2.0-flash-001")

for chunk in model.generate_content(
    "Write a product description for a smartwatch.", stream=True
):
    print(chunk.text, end="", flush=True)
print()
```
Chat Session
```python
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-project-id", location="us-central1")

model = GenerativeModel(
    model_name="gemini-2.0-flash-001",
    system_instruction="You are a GCP expert. Provide concise, actionable answers.",
)
chat = model.start_chat()
print(chat.send_message("How do I set up Cloud Run?").text)
print(chat.send_message("What about environment variables?").text)
```
Function Calling
```python
import vertexai
from vertexai.generative_models import (
    FunctionDeclaration,
    GenerativeModel,
    Tool,
)

vertexai.init(project="my-project-id", location="us-central1")

get_bq_query = FunctionDeclaration(
    name="run_bigquery_query",
    description="Run a SQL query on BigQuery and return results",
    parameters={
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "SQL query to execute"},
            "dataset": {"type": "string", "description": "BigQuery dataset name"},
        },
        "required": ["query"],
    },
)
tool = Tool(function_declarations=[get_bq_query])

model = GenerativeModel("gemini-2.0-flash-001", tools=[tool])
response = model.generate_content("How many users signed up last week?")

if response.candidates[0].function_calls:
    fc = response.candidates[0].function_calls[0]
    print(f"Function: {fc.name}, Args: {dict(fc.args)}")
```
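The snippet above only prints the requested call. In a real service you execute the function yourself and return the result to the model via `Part.from_function_response` so it can produce a final answer. A sketch that continues the snippet above (`run_query` is a hypothetical helper you would implement against BigQuery):

```python
from vertexai.generative_models import Content, Part

fc = response.candidates[0].function_calls[0]

# Execute the requested function yourself (hypothetical helper)
result_rows = run_query(fc.args["query"])

# Send the original prompt, the model's function-call turn, and the result back
followup = model.generate_content(
    [
        Content(role="user", parts=[Part.from_text("How many users signed up last week?")]),
        response.candidates[0].content,  # the model's function-call turn
        Content(parts=[Part.from_function_response(
            name=fc.name,
            response={"rows": result_rows},
        )]),
    ],
    tools=[tool],
)
print(followup.text)
```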
Fine-Tuning Gemini
```python
import vertexai
from vertexai.tuning import sft

vertexai.init(project="my-project-id", location="us-central1")

# Training data: JSONL in GCS, one example per line:
# {"messages": [{"role": "user", "content": "..."}, {"role": "model", "content": "..."}]}
tuning_job = sft.train(
    source_model="gemini-2.0-flash-001",
    train_dataset="gs://my-bucket/training-data.jsonl",
    validation_dataset="gs://my-bucket/validation-data.jsonl",
    tuned_model_display_name="my-fine-tuned-gemini",
    epochs=3,
    learning_rate_multiplier=1.0,
)
print(f"Tuning job: {tuning_job.resource_name}")
print(f"State: {tuning_job.state}")

# Block until the job finishes
tuning_job.wait()
print(f"Tuned model: {tuning_job.tuned_model_name}")
```
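Once the job succeeds, the tuned model is served from a Vertex endpoint and can be queried like any other Gemini model. A sketch that continues the snippet above, using the endpoint name exposed on the tuning job object:

```python
from vertexai.generative_models import GenerativeModel

# The tuned model is deployed behind a Vertex endpoint
tuned_model = GenerativeModel(tuning_job.tuned_model_endpoint_name)
response = tuned_model.generate_content("Classify this ticket: 'App crashes on login'")
print(response.text)
```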
Batch Prediction
```python
import time

import vertexai
from vertexai.preview.batch_prediction import BatchPredictionJob

vertexai.init(project="my-project-id", location="us-central1")

# Input JSONL in GCS, one request per line:
# {"request": {"contents": [{"role": "user", "parts": [{"text": "Translate: Hello"}]}]}}
job = BatchPredictionJob.submit(
    source_model="gemini-2.0-flash-001",
    input_dataset="gs://my-bucket/batch-inputs.jsonl",
    output_uri_prefix="gs://my-bucket/batch-outputs/",
)
print(f"Batch job: {job.resource_name}")

# Poll until the job finishes
while not job.has_ended:
    time.sleep(60)
    job.refresh()
print(f"Output: {job.output_location}")
```
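The input file is plain JSONL, one request object per line. A helper like the following (hypothetical, stdlib only) can generate it before uploading to GCS:

```python
import json


def build_batch_requests(prompts):
    """Return JSONL text in the batch-prediction input format, one request per line."""
    lines = []
    for prompt in prompts:
        lines.append(json.dumps({
            "request": {
                "contents": [{"role": "user", "parts": [{"text": prompt}]}]
            }
        }))
    return "\n".join(lines)


jsonl = build_batch_requests(["Translate: Hello", "Translate: Goodbye"])
print(jsonl)
```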
IAM Setup for Service Account
```shell
# Create a service account for your app
gcloud iam service-accounts create gemini-app-sa \
    --display-name="Gemini App Service Account"

# Grant the Vertex AI User role
gcloud projects add-iam-policy-binding my-project-id \
    --member="serviceAccount:gemini-app-sa@my-project-id.iam.gserviceaccount.com" \
    --role="roles/aiplatform.user"

# Download a key (only for non-GCP environments)
gcloud iam service-accounts keys create key.json \
    --iam-account=gemini-app-sa@my-project-id.iam.gserviceaccount.com
```
VPC Service Controls (Enterprise Isolation)
```python
# When VPC Service Controls are enabled, all API calls must originate from
# within the service perimeter. Point the SDK at the regional endpoint:
import vertexai

vertexai.init(
    project="my-project-id",
    location="us-central1",
    api_endpoint="us-central1-aiplatform.googleapis.com",  # regional endpoint
)
```
Available Gemini Models on Vertex AI
| Model ID | Notes |
|---|---|
| `gemini-2.0-flash-001` | Latest Flash, fast + capable |
| `gemini-1.5-pro-002` | 2M context, most capable |
| `gemini-1.5-flash-002` | 1M context, balanced |
| `text-embedding-005` | Latest embeddings (768 dims) |
Use a version-pinned model ID such as `gemini-2.0-flash-001` in production to avoid unexpected model changes.
Examples
Example 1 — Migrate a Python service from google-cloud-aiplatform to google-genai
User has a recommendation service running on Cloud Run that uses the legacy `google-cloud-aiplatform` SDK. Replace `pip install google-cloud-aiplatform` with `pip install google-genai`, swap `vertexai.init(...)` + `GenerativeModel(...)` for `genai.Client()` (with `GOOGLE_GENAI_USE_VERTEXAI=true`), and update `model.generate_content(...)` to `client.models.generate_content(model=..., contents=...)`. Keep the existing service account and IAM bindings: same auth, new SDK. Pin to `gemini-2.5-flash` for cost, and validate parity in staging before cutover.
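The before/after for the core call looks roughly like this (a sketch; model IDs and project values are placeholders):

```python
# Before: legacy google-cloud-aiplatform / vertexai
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-project-id", location="us-central1")
legacy_response = GenerativeModel("gemini-2.0-flash-001").generate_content("Recommend a laptop.")

# After: unified google-genai SDK (set GOOGLE_GENAI_USE_VERTEXAI=true, or pass vertexai=True)
from google import genai

client = genai.Client(vertexai=True, project="my-project-id", location="global")
new_response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Recommend a laptop.",
)
# Same service account and IAM bindings work for both paths.
```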
Example 2 — Run nightly batch translation of 5M product titles
User has 5M product titles in BigQuery to translate into 4 languages. Streaming inference would be slow and expensive. Format the input as JSONL in GCS (`{"request": {"contents": [...]}}`), one line per row × language, submit a `BatchPredictionJob` against `gemini-2.5-flash`, and let it run unattended. Output JSONL lands in GCS; load it back into BigQuery via `bq load`. Cost is roughly half of streaming, and runtime is hours, not days.
Guidelines
- Use `google-genai` for all new code; `google-cloud-aiplatform` and `google-generativeai` are deprecated.
- Always pin model versions in production for stability: `gemini-2.5-flash` is fine for non-prod, but production should target a specific build.
- Use Application Default Credentials (`gcloud auth application-default login`) during development.
- In GKE or Cloud Run, use Workload Identity; no service account keys needed.
- Default to `location="global"` for the Gen AI SDK; pin to a region only for data residency.
- Fine-tuning requires a training JSONL in the `messages` format with at least 100 examples.
- Batch prediction is cost-effective for offline bulk inference (no streaming).
- Enable Cloud Audit Logs on the `aiplatform.googleapis.com` service for compliance.