Terminal.skills
Skills/datadog
>

datadog

Configure and manage Datadog for infrastructure monitoring, application performance monitoring (APM), log management, and alerting. Use when a user needs to set up Datadog agents, create dashboards, configure monitors and alerts, integrate services, or query metrics and logs through Datadog's API.

#datadog#monitoring#apm#logs#dashboards#alerts#observability
terminal-skillsv1.0.0
Works with:claude-codeopenai-codexgemini-clicursor
Source

Usage

$
✓ Installed datadog v1.0.0

Getting Started

  1. Install the skill using the command above
  2. Open your AI coding agent (Claude Code, Codex, Gemini CLI, or Cursor)
  3. Reference the skill in your prompt
  4. The AI will use the skill's capabilities automatically

Example Prompts

  • "Deploy the latest build to the staging environment and run smoke tests"
  • "Check the CI pipeline status and summarize any recent failures"

Documentation

Overview

Set up and manage Datadog for full-stack observability including infrastructure metrics, APM traces, log aggregation, dashboards, and alerting. Covers agent installation, integration configuration, monitor creation, and API usage.

Instructions

Task A: Install and Configure the Datadog Agent

  1. Install the agent on the target host
  2. Configure the main datadog.yaml with API key and site
  3. Enable relevant integrations
yaml
# /etc/datadog-agent/datadog.yaml — Main agent configuration
api_key: "<YOUR_DD_API_KEY>"
site: "datadoghq.com"
hostname: "web-server-01"
tags:
  - env:production
  - service:web-api
  - team:platform
logs_enabled: true
apm_config:
  enabled: true
  env: production
process_config:
  process_collection:
    enabled: true
bash
# Install Datadog Agent on Ubuntu/Debian
DD_API_KEY="<YOUR_DD_API_KEY>" DD_SITE="datadoghq.com" \
  bash -c "$(curl -L https://install.datadoghq.com/scripts/install_script_agent7.sh)"

# Verify agent status
sudo datadog-agent status

Task B: Configure Integrations

yaml
# /etc/datadog-agent/conf.d/postgres.d/conf.yaml — PostgreSQL integration
init_config:

instances:
  - host: localhost
    port: 5432
    username: datadog
    password: "<DB_PASSWORD>"
    dbname: myapp_production
    tags:
      - env:production
      - service:database
    collect_activity_metrics: true
    collect_database_size_metrics: true
yaml
# /etc/datadog-agent/conf.d/nginx.d/conf.yaml — Nginx integration
init_config:

instances:
  - nginx_status_url: http://localhost:8080/nginx_status
    tags:
      - env:production
      - service:web-proxy

Task C: Create Monitors and Alerts

bash
# Create a metric monitor via API — High CPU alert
curl -X POST "https://api.datadoghq.com/api/v1/monitor" \
  -H "Content-Type: application/json" \
  -H "DD-API-KEY: ${DD_API_KEY}" \
  -H "DD-APPLICATION-KEY: ${DD_APP_KEY}" \
  -d '{
    "name": "High CPU on {{host.name}}",
    "type": "metric alert",
    "query": "avg(last_5m):avg:system.cpu.user{env:production} by {host} > 85",
    "message": "CPU usage above 85% on {{host.name}}.\n\n@slack-ops-alerts @pagerduty-infra",
    "tags": ["env:production", "team:platform"],
    "options": {
      "thresholds": {
        "critical": 85,
        "warning": 70
      },
      "notify_no_data": true,
      "no_data_timeframe": 10,
      "renotify_interval": 30,
      "escalation_message": "CPU still elevated on {{host.name}} — escalating."
    }
  }'
bash
# Create a log-based monitor — Error rate spike
curl -X POST "https://api.datadoghq.com/api/v1/monitor" \
  -H "Content-Type: application/json" \
  -H "DD-API-KEY: ${DD_API_KEY}" \
  -H "DD-APPLICATION-KEY: ${DD_APP_KEY}" \
  -d '{
    "name": "Error log spike in payment-service",
    "type": "log alert",
    "query": "logs(\"service:payment-service status:error\").index(\"main\").rollup(\"count\").by(\"service\").last(\"5m\") > 50",
    "message": "More than 50 error logs in 5 minutes for payment-service.\n\n@slack-payments-team",
    "options": {
      "thresholds": { "critical": 50, "warning": 25 },
      "enable_logs_sample": true
    }
  }'

Task D: Build Dashboards

bash
# Create a dashboard via API — Service overview
curl -X POST "https://api.datadoghq.com/api/v1/dashboard" \
  -H "Content-Type: application/json" \
  -H "DD-API-KEY: ${DD_API_KEY}" \
  -H "DD-APPLICATION-KEY: ${DD_APP_KEY}" \
  -d '{
    "title": "Web API Service Overview",
    "layout_type": "ordered",
    "widgets": [
      {
        "definition": {
          "type": "timeseries",
          "title": "Request Rate",
          "requests": [
            {
              "q": "sum:trace.http.request.hits{service:web-api,env:production}.as_count()",
              "display_type": "bars"
            }
          ]
        }
      },
      {
        "definition": {
          "type": "query_value",
          "title": "P99 Latency",
          "requests": [
            {
              "q": "p99:trace.http.request.duration{service:web-api,env:production}"
            }
          ],
          "precision": 2
        }
      },
      {
        "definition": {
          "type": "toplist",
          "title": "Top Endpoints by Error Rate",
          "requests": [
            {
              "q": "sum:trace.http.request.errors{service:web-api,env:production} by {resource_name}.as_count()"
            }
          ]
        }
      }
    ]
  }'

Task E: APM Instrumentation

python
# app.py — Python APM auto-instrumentation with ddtrace
from ddtrace import tracer, patch_all

# Patch all supported libraries (requests, flask, sqlalchemy, etc.)
patch_all()

tracer.configure(
    hostname="localhost",
    port=8126,
    service="payment-service",
    env="production",
    version="2.1.0",
)

from flask import Flask
app = Flask(__name__)

@app.route("/charge", methods=["POST"])
def charge():
    with tracer.trace("payment.process", service="payment-service") as span:
        span.set_tag("payment.provider", "stripe")
        result = process_payment()
        span.set_metric("payment.amount", result["amount"])
        return {"status": "ok"}
bash
# Run with ddtrace auto-instrumentation
pip install ddtrace
ddtrace-run python app.py

Task F: Log Collection and Pipelines

yaml
# /etc/datadog-agent/conf.d/python.d/conf.yaml — Custom log collection
logs:
  - type: file
    path: /var/log/myapp/*.log
    service: web-api
    source: python
    tags:
      - env:production
    log_processing_rules:
      - type: multi_line
        name: python_traceback
        pattern: "Traceback \\(most recent call last\\)"
bash
# Query logs via API — Find errors in last hour
curl -X POST "https://api.datadoghq.com/api/v2/logs/events/search" \
  -H "Content-Type: application/json" \
  -H "DD-API-KEY: ${DD_API_KEY}" \
  -H "DD-APPLICATION-KEY: ${DD_APP_KEY}" \
  -d '{
    "filter": {
      "query": "service:web-api status:error",
      "from": "now-1h",
      "to": "now"
    },
    "sort": "-timestamp",
    "page": { "limit": 25 }
  }'

Best Practices

  • Use consistent tagging: env, service, team on all resources
  • Set notify_no_data on critical monitors to catch silent failures
  • Use composite monitors to reduce alert noise by correlating signals
  • Configure log exclusion filters to control ingestion costs
  • Use Unified Service Tagging (DD_ENV, DD_SERVICE, DD_VERSION) across APM, logs, and metrics

Information

Version
1.0.0
Author
terminal-skills
Category
DevOps
License
Apache-2.0