>
datadog
Configure and manage Datadog for infrastructure monitoring, application performance monitoring (APM), log management, and alerting. Use when a user needs to set up Datadog agents, create dashboards, configure monitors and alerts, integrate services, or query metrics and logs through Datadog's API.
#datadog#monitoring#apm#logs#dashboards#alerts#observability
terminal-skillsv1.0.0
Works with:claude-codeopenai-codexgemini-clicursor
Usage
$
✓ Installed datadog v1.0.0
Getting Started
- Install the skill using the command above
- Open your AI coding agent (Claude Code, Codex, Gemini CLI, or Cursor)
- Reference the skill in your prompt
- The AI will use the skill's capabilities automatically
Example Prompts
- "Deploy the latest build to the staging environment and run smoke tests"
- "Check the CI pipeline status and summarize any recent failures"
Documentation
Overview
Set up and manage Datadog for full-stack observability including infrastructure metrics, APM traces, log aggregation, dashboards, and alerting. Covers agent installation, integration configuration, monitor creation, and API usage.
Instructions
Task A: Install and Configure the Datadog Agent
- Install the agent on the target host
- Configure the main
datadog.yamlwith API key and site - Enable relevant integrations
yaml
# /etc/datadog-agent/datadog.yaml — Main agent configuration
api_key: "<YOUR_DD_API_KEY>"
site: "datadoghq.com"
hostname: "web-server-01"
tags:
- env:production
- service:web-api
- team:platform
logs_enabled: true
apm_config:
enabled: true
env: production
process_config:
process_collection:
enabled: true
bash
# Install Datadog Agent on Ubuntu/Debian
DD_API_KEY="<YOUR_DD_API_KEY>" DD_SITE="datadoghq.com" \
bash -c "$(curl -L https://install.datadoghq.com/scripts/install_script_agent7.sh)"
# Verify agent status
sudo datadog-agent status
Task B: Configure Integrations
yaml
# /etc/datadog-agent/conf.d/postgres.d/conf.yaml — PostgreSQL integration
init_config:
instances:
- host: localhost
port: 5432
username: datadog
password: "<DB_PASSWORD>"
dbname: myapp_production
tags:
- env:production
- service:database
collect_activity_metrics: true
collect_database_size_metrics: true
yaml
# /etc/datadog-agent/conf.d/nginx.d/conf.yaml — Nginx integration
init_config:
instances:
- nginx_status_url: http://localhost:8080/nginx_status
tags:
- env:production
- service:web-proxy
Task C: Create Monitors and Alerts
bash
# Create a metric monitor via API — High CPU alert
curl -X POST "https://api.datadoghq.com/api/v1/monitor" \
-H "Content-Type: application/json" \
-H "DD-API-KEY: ${DD_API_KEY}" \
-H "DD-APPLICATION-KEY: ${DD_APP_KEY}" \
-d '{
"name": "High CPU on {{host.name}}",
"type": "metric alert",
"query": "avg(last_5m):avg:system.cpu.user{env:production} by {host} > 85",
"message": "CPU usage above 85% on {{host.name}}.\n\n@slack-ops-alerts @pagerduty-infra",
"tags": ["env:production", "team:platform"],
"options": {
"thresholds": {
"critical": 85,
"warning": 70
},
"notify_no_data": true,
"no_data_timeframe": 10,
"renotify_interval": 30,
"escalation_message": "CPU still elevated on {{host.name}} — escalating."
}
}'
bash
# Create a log-based monitor — Error rate spike
curl -X POST "https://api.datadoghq.com/api/v1/monitor" \
-H "Content-Type: application/json" \
-H "DD-API-KEY: ${DD_API_KEY}" \
-H "DD-APPLICATION-KEY: ${DD_APP_KEY}" \
-d '{
"name": "Error log spike in payment-service",
"type": "log alert",
"query": "logs(\"service:payment-service status:error\").index(\"main\").rollup(\"count\").by(\"service\").last(\"5m\") > 50",
"message": "More than 50 error logs in 5 minutes for payment-service.\n\n@slack-payments-team",
"options": {
"thresholds": { "critical": 50, "warning": 25 },
"enable_logs_sample": true
}
}'
Task D: Build Dashboards
bash
# Create a dashboard via API — Service overview
curl -X POST "https://api.datadoghq.com/api/v1/dashboard" \
-H "Content-Type: application/json" \
-H "DD-API-KEY: ${DD_API_KEY}" \
-H "DD-APPLICATION-KEY: ${DD_APP_KEY}" \
-d '{
"title": "Web API Service Overview",
"layout_type": "ordered",
"widgets": [
{
"definition": {
"type": "timeseries",
"title": "Request Rate",
"requests": [
{
"q": "sum:trace.http.request.hits{service:web-api,env:production}.as_count()",
"display_type": "bars"
}
]
}
},
{
"definition": {
"type": "query_value",
"title": "P99 Latency",
"requests": [
{
"q": "p99:trace.http.request.duration{service:web-api,env:production}"
}
],
"precision": 2
}
},
{
"definition": {
"type": "toplist",
"title": "Top Endpoints by Error Rate",
"requests": [
{
"q": "sum:trace.http.request.errors{service:web-api,env:production} by {resource_name}.as_count()"
}
]
}
}
]
}'
Task E: APM Instrumentation
python
# app.py — Python APM auto-instrumentation with ddtrace
from ddtrace import tracer, patch_all
# Patch all supported libraries (requests, flask, sqlalchemy, etc.)
patch_all()
tracer.configure(
hostname="localhost",
port=8126,
service="payment-service",
env="production",
version="2.1.0",
)
from flask import Flask
app = Flask(__name__)
@app.route("/charge", methods=["POST"])
def charge():
with tracer.trace("payment.process", service="payment-service") as span:
span.set_tag("payment.provider", "stripe")
result = process_payment()
span.set_metric("payment.amount", result["amount"])
return {"status": "ok"}
bash
# Run with ddtrace auto-instrumentation
pip install ddtrace
ddtrace-run python app.py
Task F: Log Collection and Pipelines
yaml
# /etc/datadog-agent/conf.d/python.d/conf.yaml — Custom log collection
logs:
- type: file
path: /var/log/myapp/*.log
service: web-api
source: python
tags:
- env:production
log_processing_rules:
- type: multi_line
name: python_traceback
pattern: "Traceback \\(most recent call last\\)"
bash
# Query logs via API — Find errors in last hour
curl -X POST "https://api.datadoghq.com/api/v2/logs/events/search" \
-H "Content-Type: application/json" \
-H "DD-API-KEY: ${DD_API_KEY}" \
-H "DD-APPLICATION-KEY: ${DD_APP_KEY}" \
-d '{
"filter": {
"query": "service:web-api status:error",
"from": "now-1h",
"to": "now"
},
"sort": "-timestamp",
"page": { "limit": 25 }
}'
Best Practices
- Use consistent tagging:
env,service,teamon all resources - Set
notify_no_dataon critical monitors to catch silent failures - Use composite monitors to reduce alert noise by correlating signals
- Configure log exclusion filters to control ingestion costs
- Use Unified Service Tagging (
DD_ENV,DD_SERVICE,DD_VERSION) across APM, logs, and metrics
Information
- Version
- 1.0.0
- Author
- terminal-skills
- Category
- DevOps
- License
- Apache-2.0