Name: cassandra
Author: terminal-skills

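Apache Cassandra is a masterless, peer-to-peer distributed database: every node can serve reads and writes, so there is no single point of failure. Rows are spread across nodes by consistent hashing. The partition key is hashed to a token, and each node owns ranges of tokens on a ring.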
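As a rough illustration of consistent hashing, the sketch below places partition keys on a token ring. It simplifies heavily and is not Cassandra's implementation. MD5 stands in for the default Murmur3Partitioner, each node holds a single token (real clusters use virtual nodes), and replicas are simply the next distinct nodes clockwise, which is roughly what SimpleStrategy does. `TokenRing` and `token` are hypothetical names.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Assumption: MD5 replaces Cassandra's Murmur3 hash; any uniform hash shows the idea
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class TokenRing:
    """Toy consistent-hash ring: one token per node, no virtual nodes."""

    def __init__(self, nodes):
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, partition_key: str, rf: int) -> list:
        # The owner is the first node whose token is >= the key's token (wrapping
        # around the ring); the next rf - 1 nodes clockwise hold the other replicas.
        start = bisect.bisect_left(self._tokens, token(partition_key))
        count = min(rf, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = TokenRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.replicas("alice@example.com", rf=3))
```

The property that makes this "consistent": adding a node only moves the keys in the token range it takes over, while every other key keeps its owner.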

Installation

bash

# Docker (quickest way to run a single node locally)
docker run -d --name cassandra -p 9042:9042 cassandra:4

# Wait until the node has finished starting (watch `docker logs cassandra`), then connect with cqlsh
docker exec -it cassandra cqlsh

# Node.js driver
npm install cassandra-driver

# Python driver
pip install cassandra-driver
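One way to script the "wait for startup" step is to poll the published CQL port (9042) until it accepts TCP connections. `wait_for_port` is a hypothetical helper, not part of cassandra-driver, and an open port is only a reasonable proxy for readiness: Cassandra normally starts listening for CQL clients late in its startup sequence.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 120.0, interval: float = 1.0) -> bool:
    """Return True once host:port accepts a TCP connection, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, call `wait_for_port("localhost", 9042)` before running `cqlsh` or starting an application that connects with the driver.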

CQL Basics

sql

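-- keyspace.cql: Create a keyspace with 3 replicas in datacenter1 (the DC name the Docker image reports)
-- A single-node container can hold only one replica: use 'datacenter1': 1 there, or QUORUM/LOCAL_QUORUM requests fail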
CREATE KEYSPACE IF NOT EXISTS myapp
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'datacenter1': 3
  }
  AND durable_writes = true;

USE myapp;
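With `'datacenter1': 3`, every row needs three replicas in that datacenter, which a single-node Docker container cannot provide. The hypothetical `check_replication` helper below (not part of any driver) sketches the arithmetic: a quorum is floor(RF / 2) + 1 replicas, and a datacenter can never hold more replicas of a row than it has nodes.

```python
def quorum(rf: int) -> int:
    """Replicas that must respond for QUORUM / LOCAL_QUORUM over rf replicas."""
    return rf // 2 + 1


def check_replication(replication: dict, nodes_per_dc: dict) -> list:
    """Warnings for a NetworkTopologyStrategy-style {dc: rf} map, given node counts per DC."""
    warnings = []
    for dc, rf in replication.items():
        nodes = nodes_per_dc.get(dc, 0)
        placed = min(rf, nodes)  # one node never stores two replicas of the same row
        if placed < rf:
            warnings.append(f"{dc}: RF={rf} but only {nodes} node(s), so {placed} replica(s) exist")
        if placed < quorum(rf):
            warnings.append(f"{dc}: LOCAL_QUORUM needs {quorum(rf)} replicas, at most {placed} available")
    return warnings


# The keyspace above on a single-node Docker container
for warning in check_replication({"datacenter1": 3}, {"datacenter1": 1}):
    print(warning)
```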

Data Modeling

sql

-- tables.cql: Design tables around query patterns (partition key + clustering key)
-- Rule: one table per query pattern

-- Users by email (partition key: email)
CREATE TABLE users (
  email text PRIMARY KEY,
  name text,
  created_at timestamp
);

-- Posts by user, newest first (partition: user_id; clustering: created_at DESC, then post_id)
CREATE TABLE posts_by_user (
  user_id uuid,
  created_at timestamp,
  post_id uuid,
  title text,
  body text,
  PRIMARY KEY (user_id, created_at, post_id)  -- post_id stops two posts with the same timestamp from overwriting each other
) WITH CLUSTERING ORDER BY (created_at DESC, post_id ASC);

-- Time-series: the composite partition key (sensor_id, day) gives each sensor one partition per day, keeping partitions bounded
CREATE TABLE sensor_readings (
  sensor_id text,
  day text,
  reading_time timestamp,
  value double,
  PRIMARY KEY ((sensor_id, day), reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC);
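Because `day` is part of the partition key, the application has to compute it on every write and enumerate the buckets a time range covers on every read. The sketch below assumes the `day` text column holds the UTC date as `YYYY-MM-DD` (the table above does not fix a format) and expects timezone-aware datetimes; `day_bucket` and `buckets_between` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def day_bucket(ts: datetime) -> str:
    """Value for the `day` column: the UTC date of a timezone-aware timestamp."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")


def buckets_between(start: datetime, end: datetime) -> list:
    """Every day bucket that the range [start, end] touches, oldest first."""
    first = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    return [(first + timedelta(days=i)).isoformat() for i in range((last - first).days + 1)]


start = datetime(2026, 2, 18, 22, 0, tzinfo=timezone.utc)
end = datetime(2026, 2, 20, 1, 30, tzinfo=timezone.utc)
for day in buckets_between(start, end):
    # One query per (sensor_id, day) partition; "sensor-42" is a made-up id
    print(("sensor-42", day))
```

Each bucket then becomes one single-partition query, along the lines of `SELECT * FROM sensor_readings WHERE sensor_id = ? AND day = ? AND reading_time >= ? AND reading_time <= ?`.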

CRUD Operations

sql

-- crud.cql: Basic insert, select, update, delete
INSERT INTO users (email, name, created_at)
VALUES ('alice@example.com', 'Alice', toTimestamp(now()));

SELECT * FROM users WHERE email = 'alice@example.com';

-- Read one partition (user_id) with a range on the clustering key (created_at)
SELECT * FROM posts_by_user
WHERE user_id = 550e8400-e29b-41d4-a716-446655440000
  AND created_at > '2026-01-01'
LIMIT 20;

UPDATE users SET name = 'Alice Smith' WHERE email = 'alice@example.com';

DELETE FROM users WHERE email = 'alice@example.com';

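-- Logged batch: all statements apply or none do; writes to a single partition are also isolated
-- (? bind markers work only in prepared statements sent by a driver, not in cqlsh)
BEGIN BATCH
  INSERT INTO posts_by_user (user_id, created_at, post_id, title) VALUES (?, ?, ?, ?);
  INSERT INTO posts_by_user (user_id, created_at, post_id, title) VALUES (?, ?, ?, ?);
APPLY BATCH;

-- Counter updates cannot share a batch with regular writes (user_stats: a counter table, not defined above)
BEGIN COUNTER BATCH
  UPDATE user_stats SET post_count = post_count + 1 WHERE user_id = ?;
APPLY BATCH;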
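Why the `posts_by_user` range query is cheap: `WHERE user_id = ...` selects exactly one partition, and because rows inside it are stored sorted by `created_at DESC`, the range plus `LIMIT 20` just reads the first matching rows. The in-memory model below is a conceptual sketch only (the class and method names are hypothetical); real storage uses memtables and SSTables.

```python
import bisect
from datetime import datetime, timezone
from itertools import islice, takewhile


class PostsByUser:
    """Toy model: rows grouped by partition key (user_id) and kept sorted
    by clustering key (created_at DESC) inside each partition."""

    def __init__(self):
        self._partitions = {}  # user_id -> sorted list of (-created_at, title)

    def insert(self, user_id: str, created_at: datetime, title: str) -> None:
        rows = self._partitions.setdefault(user_id, [])
        # Negating the timestamp makes ascending list order mean newest first
        bisect.insort(rows, (-created_at.timestamp(), title))

    def select_after(self, user_id: str, after: datetime, limit: int) -> list:
        """Like: SELECT title ... WHERE user_id = ? AND created_at > ? LIMIT n."""
        rows = self._partitions.get(user_id, [])  # touches a single partition
        cutoff = after.timestamp()
        # Rows are already newest first, so stop at the first row that is too old
        newer = takewhile(lambda row: -row[0] > cutoff, rows)
        return [title for _, title in islice(newer, limit)]


posts = PostsByUser()
uid = "550e8400-e29b-41d4-a716-446655440000"
for day in range(1, 6):
    posts.insert(uid, datetime(2026, 1, day, tzinfo=timezone.utc), f"post {day}")
print(posts.select_after(uid, datetime(2026, 1, 2, tzinfo=timezone.utc), limit=2))  # ['post 5', 'post 4']
```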

Node.js Driver

javascript

// db.js: Cassandra client with DataStax Node.js driver
const { Client, types } = require('cassandra-driver');

const client = new Client({
  contactPoints: ['localhost'],
  localDataCenter: 'datacenter1',
  keyspace: 'myapp',
  queryOptions: { consistency: types.consistencies.localQuorum },
});

async function main() {
  await client.connect();

  // Insert
  await client.execute(
    'INSERT INTO users (email, name, created_at) VALUES (?, ?, ?)',
    ['bob@example.com', 'Bob', new Date()],
    { prepare: true }
  );

  // Query
  const result = await client.execute(
    'SELECT * FROM users WHERE email = ?',
    ['bob@example.com'],
    { prepare: true }
  );
  console.log(result.rows[0]);

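  // Paginated query: stream() fetches pages on demand instead of loading every row at once
  const userId = types.Uuid.fromString('550e8400-e29b-41d4-a716-446655440000');
  const query = 'SELECT * FROM posts_by_user WHERE user_id = ?';
  for await (const row of client.stream(query, [userId], { prepare: true })) {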
    console.log(row.title);
  }

  await client.shutdown();
}

main().catch(console.error);
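`client.stream` avoids loading a whole partition into memory: the driver requests one page of rows per round trip (sized by the `fetchSize` query option) and passes back a paging state to ask for the next page. The Python below only simulates that loop over an in-memory list. `fetch_page` and `stream_rows` are hypothetical, and a real paging state is an opaque token returned by the server, not an offset.

```python
def fetch_page(rows, fetch_size, paging_state):
    """Stand-in for one server round trip: returns (page, next_state),
    where next_state is None once the result set is exhausted."""
    start = paging_state or 0
    page = rows[start:start + fetch_size]
    end = start + fetch_size
    next_state = end if end < len(rows) else None
    return page, next_state


def stream_rows(rows, fetch_size):
    """Yield rows one at a time, requesting the next page only when needed."""
    state = None
    while True:
        page, state = fetch_page(rows, fetch_size, state)
        yield from page
        if state is None:
            return


for title in stream_rows(["a", "b", "c", "d", "e", "f", "g"], fetch_size=3):
    print(title)
```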

Python Driver

python

# app.py: Cassandra with Python DataStax driver
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(['localhost'])
session = cluster.connect('myapp')

# Insert
session.execute(
    "INSERT INTO users (email, name, created_at) VALUES (%s, %s, toTimestamp(now()))",
    ('alice@example.com', 'Alice')
)

# Query with consistency level
stmt = SimpleStatement(
    "SELECT * FROM users WHERE email = %s",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM
)
row = session.execute(stmt, ('alice@example.com',)).one()
print(row.name)

cluster.shutdown()
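The Python example sends `SimpleStatement` queries with `%s` placeholders, which the driver fills in client-side. For queries that run repeatedly, `Session.prepare()` with `?` markers is the usual alternative, mirroring `prepare: true` in the Node.js example. `Session.prepare` and `Session.execute` are real driver methods; the `UserQueries` class and its cache are a hypothetical sketch that only assumes a session object exposing those two methods.

```python
class UserQueries:
    """Prepare each CQL string once per session and reuse it.

    `session` is assumed to be a cassandra.cluster.Session; only its
    prepare() and execute() methods are used.
    """

    GET_USER = "SELECT email, name, created_at FROM users WHERE email = ?"

    def __init__(self, session):
        self._session = session
        self._cache = {}  # CQL string -> PreparedStatement

    def _prepared(self, cql):
        if cql not in self._cache:
            # One round trip to prepare; later executions reuse the prepared statement
            self._cache[cql] = self._session.prepare(cql)
        return self._cache[cql]

    def get_user(self, email):
        # execute() binds the parameters to the prepared statement
        return self._session.execute(self._prepared(self.GET_USER), (email,)).one()
```

Usage, with a live session: `row = UserQueries(session).get_user('alice@example.com')`. A per-statement consistency level can be set on the returned `PreparedStatement` through its `consistency_level` attribute.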

Replication and Consistency

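Consistency levels (how many replicas must acknowledge a read or write):
- ONE: One replica. Fastest, but a read may return stale data. Suits logs and metrics.
- QUORUM: A majority of all replicas, counted across every datacenter. Balances latency and consistency.
- LOCAL_QUORUM: A majority of the replicas in the coordinator's datacenter. Avoids cross-DC round trips, so it is the usual choice for multi-DC clusters.
- ALL: Every replica must respond. Strongest consistency, but slowest, and the request fails if any replica is down.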

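Rule of thumb: if the replicas that acknowledge a write plus the replicas consulted on a read exceed the replication factor (W + R > RF), every read overlaps at least one replica holding the latest write, which gives strong consistency.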
Example: RF=3, Write=QUORUM(2), Read=QUORUM(2) → 2+2 > 3 ✓
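The arithmetic behind this rule, as a small Python sketch with illustrative function names. It assumes a single datacenter, where QUORUM and LOCAL_QUORUM coincide; across several datacenters, QUORUM is computed over the sum of all RFs. It also shows the trade-off from the list above: a level that needs more replicas tolerates fewer replica failures.

```python
# Replicas that must respond at each level, given the keyspace's replication
# factor rf in a single datacenter (so QUORUM and LOCAL_QUORUM are equal here)
REPLICAS_NEEDED = {
    "ONE": lambda rf: 1,
    "QUORUM": lambda rf: rf // 2 + 1,
    "LOCAL_QUORUM": lambda rf: rf // 2 + 1,
    "ALL": lambda rf: rf,
}


def replicas_needed(level: str, rf: int) -> int:
    return REPLICAS_NEEDED[level](rf)


def strongly_consistent(write_level: str, read_level: str, rf: int) -> bool:
    """W + R > RF: every read set shares at least one replica with every write set."""
    return replicas_needed(write_level, rf) + replicas_needed(read_level, rf) > rf


def replica_failures_tolerated(level: str, rf: int) -> int:
    """How many replicas can be down while requests at this level still succeed."""
    return rf - replicas_needed(level, rf)


for w, r in [("QUORUM", "QUORUM"), ("ONE", "ALL"), ("ONE", "QUORUM"), ("ONE", "ONE")]:
    print(f"RF=3 write={w} read={r}: strong={strongly_consistent(w, r, 3)}")
```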

Operations

bash

# nodetool.sh: Common operational commands
# Check cluster status
docker exec cassandra nodetool status

# Check ring token distribution
docker exec cassandra nodetool ring

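# Repair: sync replicas (run on every node at least once per gc_grace_seconds, 10 days by default, so deleted data does not reappear)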
docker exec cassandra nodetool repair myapp

# Force a major compaction (rarely needed; with size-tiered compaction it can leave one very large SSTable)
docker exec cassandra nodetool compact myapp posts_by_user

# Take a snapshot backup tagged backup_20260219
docker exec cassandra nodetool snapshot -t backup_20260219 myapp
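To schedule these commands (for example from cron), a thin Python wrapper can build the same `docker exec` invocations. The helper names are hypothetical. The snapshot tag format follows the `backup_20260219` example, and `repair -pr` (primary-range repair, meant to be run on every node in turn) is a real nodetool option used here as a variant of the plain repair above.

```python
import subprocess
from datetime import date


def nodetool(container: str, *args: str) -> list:
    """Build a `docker exec <container> nodetool ...` command line."""
    return ["docker", "exec", container, "nodetool", *args]


def snapshot_cmd(container: str, keyspace: str, day: date) -> list:
    # Dated tag such as backup_20260219, so snapshots sort by day
    return nodetool(container, "snapshot", "-t", f"backup_{day:%Y%m%d}", keyspace)


def primary_range_repair_cmd(container: str, keyspace: str) -> list:
    # -pr repairs only the token ranges this node owns as primary replica
    return nodetool(container, "repair", "-pr", keyspace)


def run(cmd: list) -> str:
    """Execute a command and return its stdout; raises CalledProcessError on failure."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

With the container running, `print(run(nodetool("cassandra", "status")))` prints the same output as the first command above.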
