Overview
Vertex AI is Google Cloud's enterprise ML platform. It provides access to the same Gemini models as Google AI Studio, but with enterprise-grade features: IAM-based auth (no API keys), VPC Service Controls for data isolation, audit logging, fine-tuning capabilities, batch prediction jobs, and integration with GCP data services like BigQuery and Cloud Storage.
Vertex AI vs Google AI Studio
| Feature | Google AI Studio | Vertex AI |
|---|---|---|
| Auth | API Key | Service Account / IAM |
| Data residency | Limited | GCP regions |
| VPC isolation | ❌ | ✅ |
| Audit logging | ❌ | ✅ Cloud Audit Logs |
| Fine-tuning | ❌ | ✅ |
| Batch prediction | ❌ | ✅ |
| Pricing | Per token | Per token (different rates) |
| Quotas | Shared | Project-level quotas |
Naming note: "Vertex AI" is being rebranded to Agent Platform (full name: Gemini Enterprise Agent Platform). The endpoints, IAM roles, and SDKs are unchanged; it is the same product, and most documentation still uses the legacy "Vertex AI" name.
SDK Choice — Use the Unified Gen AI SDK
Google now ships a single `google-genai` SDK that targets both Agent Platform (Vertex) and Google AI Studio with the same code. Use it for all new code. The legacy `google-cloud-aiplatform` and `vertexai` modules are deprecated.
| New (use this) | Legacy (deprecated) |
|---|---|
| `google-genai` (Python) | `google-cloud-aiplatform`, `google-generativeai` |
| `@google/genai` (JS/TS) | `@google-cloud/vertexai` |
| `google.golang.org/genai` (Go) | `cloud.google.com/go/vertexai` |
| `com.google.genai:google-genai` (Java) | — |
| `Google.GenAI` (.NET) | — |
```shell
# Recommended: unified Gen AI SDK
pip install google-genai
```

```python
import os

from google import genai

os.environ["GOOGLE_CLOUD_PROJECT"] = "my-project-id"
os.environ["GOOGLE_CLOUD_LOCATION"] = "global"  # routes to a region with available capacity
os.environ["GOOGLE_GENAI_USE_VERTEXAI"] = "true"

client = genai.Client()  # picks up env vars

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Explain containerization in simple terms.",
)
print(response.text)
```
Use location="global" by default — routes to the region with available capacity. Pin to a specific region (us-central1, europe-west4) only when data residency requires it.
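When residency does require a pinned region, the client can also be configured explicitly instead of via environment variables. A minimal sketch; the project ID and region are placeholders:

```python
from google import genai

# Explicit Vertex mode with a pinned region (overrides env vars)
client = genai.Client(
    vertexai=True,
    project="my-project-id",
    location="europe-west4",  # requests stay in this region
)
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Summarize data residency in one sentence.",
)
print(response.text)
```

Explicit arguments are useful when one service talks to multiple projects or regions and env vars would be ambiguous.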
Setup (Legacy SDK — only for existing code)
```shell
pip install google-cloud-aiplatform

# Authenticate
gcloud auth application-default login

# Or use a service account
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

# Set project and location
export GOOGLE_CLOUD_PROJECT=my-project-id
export GOOGLE_CLOUD_LOCATION=us-central1
```
Instructions
The examples below use the legacy `google-cloud-aiplatform` SDK. For new code, prefer the unified `google-genai` SDK shown above: same capabilities, cross-platform, current best practice.
Basic Gemini Inference
```python
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-project-id", location="us-central1")

model = GenerativeModel("gemini-2.0-flash-001")
response = model.generate_content("Explain containerization in simple terms.")
print(response.text)
```
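Sampling behavior can be tuned per call with `GenerationConfig`. A sketch; the values are illustrative, not recommendations:

```python
import vertexai
from vertexai.generative_models import GenerationConfig, GenerativeModel

vertexai.init(project="my-project-id", location="us-central1")
model = GenerativeModel("gemini-2.0-flash-001")

response = model.generate_content(
    "Explain containerization in simple terms.",
    generation_config=GenerationConfig(
        temperature=0.2,        # lower = more deterministic output
        max_output_tokens=512,  # hard cap on response length
        top_p=0.95,
    ),
)
print(response.text)
```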
Multi-Modal Inference
```python
import vertexai
from vertexai.generative_models import GenerativeModel, Part

vertexai.init(project="my-project-id", location="us-central1")
model = GenerativeModel("gemini-2.0-flash-001")

# Analyze an image from Cloud Storage
gcs_image = Part.from_uri(
    uri="gs://my-bucket/product-photo.jpg",
    mime_type="image/jpeg",
)
response = model.generate_content(["Describe this product:", gcs_image])
print(response.text)

# Analyze a local image
with open("chart.png", "rb") as f:
    image_data = f.read()
local_image = Part.from_data(data=image_data, mime_type="image/png")
response = model.generate_content(["What trends does this chart show?", local_image])
print(response.text)
```
Streaming Responses
```python
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-project-id", location="us-central1")
model = GenerativeModel("gemini-2.0-flash-001")

for chunk in model.generate_content(
    "Write a product description for a smartwatch.", stream=True
):
    print(chunk.text, end="", flush=True)
print()
```
Chat Session
```python
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-project-id", location="us-central1")

model = GenerativeModel(
    model_name="gemini-2.0-flash-001",
    system_instruction="You are a GCP expert. Provide concise, actionable answers.",
)
chat = model.start_chat()
print(chat.send_message("How do I set up Cloud Run?").text)
print(chat.send_message("What about environment variables?").text)
```
Function Calling
```python
import vertexai
from vertexai.generative_models import (
    FunctionDeclaration,
    GenerativeModel,
    Tool,
)

vertexai.init(project="my-project-id", location="us-central1")

get_bq_query = FunctionDeclaration(
    name="run_bigquery_query",
    description="Run a SQL query on BigQuery and return results",
    parameters={
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "SQL query to execute"},
            "dataset": {"type": "string", "description": "BigQuery dataset name"},
        },
        "required": ["query"],
    },
)
tool = Tool(function_declarations=[get_bq_query])

model = GenerativeModel("gemini-2.0-flash-001", tools=[tool])
response = model.generate_content("How many users signed up last week?")

if response.candidates[0].function_calls:
    fc = response.candidates[0].function_calls[0]
    print(f"Function: {fc.name}, Args: {dict(fc.args)}")
```
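The snippet above only prints the requested call. In a real service you execute the function yourself and return the result to the model via `Part.from_function_response` so it can produce a final answer. A sketch that continues the snippet above (`run_query` is a hypothetical helper you would implement against BigQuery):

```python
from vertexai.generative_models import Content, Part

fc = response.candidates[0].function_calls[0]

# Execute the requested function yourself (hypothetical helper)
result_rows = run_query(fc.args["query"])

# Send the original prompt, the model's function-call turn, and the result back
followup = model.generate_content(
    [
        Content(role="user", parts=[Part.from_text("How many users signed up last week?")]),
        response.candidates[0].content,  # the model's function-call turn
        Content(parts=[Part.from_function_response(
            name=fc.name,
            response={"rows": result_rows},
        )]),
    ],
    tools=[tool],
)
print(followup.text)
```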
Fine-Tuning Gemini
```python
import vertexai
from vertexai.tuning import sft

vertexai.init(project="my-project-id", location="us-central1")

# Training data: JSONL in GCS, one example per line:
# {"messages": [{"role": "user", "content": "..."}, {"role": "model", "content": "..."}]}
tuning_job = sft.train(
    source_model="gemini-2.0-flash-001",
    train_dataset="gs://my-bucket/training-data.jsonl",
    validation_dataset="gs://my-bucket/validation-data.jsonl",
    tuned_model_display_name="my-fine-tuned-gemini",
    epochs=3,
    learning_rate_multiplier=1.0,
)
print(f"Tuning job: {tuning_job.resource_name}")
print(f"State: {tuning_job.state}")

# Block until the job finishes
tuning_job.wait()
print(f"Tuned model: {tuning_job.tuned_model_name}")
```
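Once the job succeeds, the tuned model is served from a Vertex endpoint and can be queried like any other Gemini model. A sketch that continues the snippet above, using the endpoint name exposed on the tuning job object:

```python
from vertexai.generative_models import GenerativeModel

# The tuned model is deployed behind a Vertex endpoint
tuned_model = GenerativeModel(tuning_job.tuned_model_endpoint_name)
response = tuned_model.generate_content("Classify this ticket: 'App crashes on login'")
print(response.text)
```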
Batch Prediction
```python
import time

import vertexai
from vertexai.preview.batch_prediction import BatchPredictionJob

vertexai.init(project="my-project-id", location="us-central1")

# Input JSONL in GCS, one request per line:
# {"request": {"contents": [{"role": "user", "parts": [{"text": "Translate: Hello"}]}]}}
job = BatchPredictionJob.submit(
    source_model="gemini-2.0-flash-001",
    input_dataset="gs://my-bucket/batch-inputs.jsonl",
    output_uri_prefix="gs://my-bucket/batch-outputs/",
)
print(f"Batch job: {job.resource_name}")

# Poll until the job finishes
while not job.has_ended:
    time.sleep(60)
    job.refresh()
print(f"Output: {job.output_location}")
```
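The input file is plain JSONL, one request object per line. A helper like the following (hypothetical, stdlib only) can generate it before uploading to GCS:

```python
import json


def build_batch_requests(prompts):
    """Return JSONL text in the batch-prediction input format, one request per line."""
    lines = []
    for prompt in prompts:
        lines.append(json.dumps({
            "request": {
                "contents": [{"role": "user", "parts": [{"text": prompt}]}]
            }
        }))
    return "\n".join(lines)


jsonl = build_batch_requests(["Translate: Hello", "Translate: Goodbye"])
print(jsonl)
```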
IAM Setup for Service Account
```shell
# Create a service account for your app
gcloud iam service-accounts create gemini-app-sa \
    --display-name="Gemini App Service Account"

# Grant the Vertex AI User role
gcloud projects add-iam-policy-binding my-project-id \
    --member="serviceAccount:gemini-app-sa@my-project-id.iam.gserviceaccount.com" \
    --role="roles/aiplatform.user"

# Download a key (only for non-GCP environments)
gcloud iam service-accounts keys create key.json \
    --iam-account=gemini-app-sa@my-project-id.iam.gserviceaccount.com
```
VPC Service Controls (Enterprise Isolation)
```python
# When VPC Service Controls are enabled, all API calls must originate from
# within the service perimeter. Point the SDK at the regional endpoint:
import vertexai

vertexai.init(
    project="my-project-id",
    location="us-central1",
    api_endpoint="us-central1-aiplatform.googleapis.com",  # regional endpoint
)
```
Available Gemini Models on Vertex AI
| Model ID | Notes |
|---|---|
| `gemini-2.0-flash-001` | Latest Flash, fast + capable |
| `gemini-1.5-pro-002` | 2M context, most capable |
| `gemini-1.5-flash-002` | 1M context, balanced |
| `text-embedding-005` | Latest embeddings (768 dims) |
Use a version-pinned model ID such as `gemini-2.0-flash-001` in production to avoid unexpected model changes.
Examples
Example 1 — Migrate a Python service from google-cloud-aiplatform to google-genai
User has a recommendation service running on Cloud Run that uses the legacy `google-cloud-aiplatform` SDK. Replace `pip install google-cloud-aiplatform` with `pip install google-genai`, swap `vertexai.init(...)` + `GenerativeModel(...)` for `genai.Client()` (with `GOOGLE_GENAI_USE_VERTEXAI=true`), and update `model.generate_content(...)` to `client.models.generate_content(model=..., contents=...)`. Keep the existing service account and IAM bindings: same auth, new SDK. Pin to `gemini-2.5-flash` for cost, and validate parity in staging before cutover.
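The before/after for the core call looks roughly like this (a sketch; model IDs and project values are placeholders):

```python
# Before: legacy google-cloud-aiplatform / vertexai
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-project-id", location="us-central1")
legacy_response = GenerativeModel("gemini-2.0-flash-001").generate_content("Recommend a laptop.")

# After: unified google-genai SDK (set GOOGLE_GENAI_USE_VERTEXAI=true, or pass vertexai=True)
from google import genai

client = genai.Client(vertexai=True, project="my-project-id", location="global")
new_response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Recommend a laptop.",
)
# Same service account and IAM bindings work for both paths.
```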
Example 2 — Run nightly batch translation of 5M product titles
User has 5M product titles in BigQuery to translate into 4 languages. Streaming inference would be slow and expensive. Format the input as JSONL in GCS (`{"request": {"contents": [...]}}`), one line per row × language, submit a `BatchPredictionJob` against `gemini-2.5-flash`, and let it run unattended. Output JSONL lands in GCS; load it back into BigQuery via `bq load`. Cost is roughly half of streaming, and runtime is hours, not days.
Guidelines
- Use `google-genai` for all new code; `google-cloud-aiplatform` and `google-generativeai` are deprecated.
- Always pin model versions in production for stability: `gemini-2.5-flash` is fine for non-prod, but production should target a specific build.
- Use Application Default Credentials (`gcloud auth application-default login`) during development.
- In GKE or Cloud Run, use Workload Identity; no service account keys needed.
- Default to `location="global"` for the Gen AI SDK; pin to a region only for data residency.
- Fine-tuning requires a training JSONL in the `messages` format with at least 100 examples.
- Batch prediction is cost-effective for offline bulk inference (no streaming).
- Enable Cloud Audit Logs on the `aiplatform.googleapis.com` service for compliance.