Migrating Cassandra Database Snapshots Between Clusters

The restore.sh script migrates Cassandra database snapshots from one cluster to another, transferring both the data and the schema of a keyspace. This document explains how to use it.

note

For more information on how this script works, refer to Bulk Loading in the Cassandra documentation.

Prerequisites

Before using this script, ensure the following conditions are met:

Distinct Clusters: The source and target for the snapshot must be different Cassandra clusters. Importing a snapshot back into the same cluster, even under a different keyspace, is not supported and will not work.
Cassandra Installation: Both the source and the target clusters must have Cassandra installed and properly configured.
Network Connectivity: There must be network connectivity between the machine where the script is run and the target Cassandra cluster.
Access Permissions: The user must have read permissions on the source cluster's data directory and sufficient privileges to execute schema changes and data imports on the target cluster.
Snapshot Creation: A snapshot must be created on the source cluster using nodetool snapshot. It is important to specify a name/tag for the snapshot using the -t option. For example:
```
nodetool snapshot hq -t migratesnapshot
```

Snapshot Files: After running the snapshot command, archive the keyspace data directory on the source system. For example:

cd /var/lib/cassandra/data/
tar zcvf cassandra-hq-data.tar.gz hq # hq is the name of the keyspace
# Copy this archive to the system where restore.sh will run and extract it there

Keyspace on Target Cluster: The target keyspace should be created on the target cluster before executing the script. For example:
```
CREATE KEYSPACE hq WITH replication = {'class':'SimpleStrategy', 'replication_factor' : 1};
```
note
The exact command in this step depends on the replication strategy you use.
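
Before running the script, it can help to confirm that the extracted archive actually contains directories for the snapshot tag. The following is a minimal sketch, not part of restore.sh; the function name `list_snapshot_dirs` and the example path are assumptions made for illustration. It relies on the snapshot layout `<keyspace>/<table>-<id>/snapshots/<tag>`.

```shell
# Hypothetical helper (not part of restore.sh): print every snapshot directory
# with the given tag found under an extracted keyspace directory.
list_snapshot_dirs() {
    keyspace_dir="$1"
    tag="$2"
    # Snapshots live at <keyspace>/<table>-<id>/snapshots/<tag>
    find "$keyspace_dir" -type d -path "*/snapshots/$tag" | sort
}

# Example (the path is a placeholder):
#   list_snapshot_dirs /tmp/restore/hq migratesnapshot
```

If the function prints nothing, the tag is probably misspelled or the archive was extracted somewhere else.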

Usage

To run the script, use the following command syntax:

./restore.sh <cassandra_username> <cassandra_password> <cassandra_host> <cassandra_port> <cassandra_storage_port> <cassandra_keyspace> <cassandra_data_dir> <cassandra_snapshot_name> [target_dir]

Parameters

cassandra_username: Username for the Cassandra database.
cassandra_password: Password for the Cassandra database.
cassandra_host: Hostname or IP address of the target Cassandra node.
cassandra_port: Port on which the target Cassandra CQL service is running.
cassandra_storage_port: Storage port used for inter-node communication in the target Cassandra cluster.
cassandra_keyspace: The keyspace to which data will be migrated.
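cassandra_data_dir: Directory that contains the keyspace data copied from the source cluster, on the machine where the script runs. The script searches for snapshots under <cassandra_data_dir>/<cassandra_keyspace>.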
cassandra_snapshot_name: Name of the snapshot to migrate (the same name used when the snapshot was created).
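[target_dir]: (Optional) Directory in which to temporarily store data during migration. If this directory already exists, the script deletes it and recreates it, so do not point it at a directory whose contents you need. If not specified, a temporary directory is created automatically.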
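
Because the script takes positional parameters, a mistyped port is easy to miss. The sketch below shows one possible pre-flight check; `valid_ports` is a hypothetical helper, not something restore.sh provides, and the credentials, host, and paths in the usage comment are placeholders.

```shell
# Hypothetical pre-flight check (not part of restore.sh): succeed only if every
# argument is an integer in the valid TCP port range 1-65535.
valid_ports() {
    for port in "$@"; do
        case "$port" in
            ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit
        esac
        [ "$port" -ge 1 ] && [ "$port" -le 65535 ] || return 1
    done
    return 0
}

# Example with Cassandra's default CQL (9042) and storage (7000) ports:
#   valid_ports 9042 7000 && \
#     ./restore.sh cassandra 'secret' 10.0.0.5 9042 7000 hq /tmp/restore migratesnapshot
```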

Steps

Prepare the Environment:
Ensure that all prerequisites are met, including snapshot creation and keyspace configuration on the target cluster.
Execute the Script:
Run the script using the command provided above. The script will perform the following actions:
- Copy snapshot data from the source to a specified or automatically generated temporary directory.
- Restore the schema on the target cluster using cqlsh.
- Use sstableloader to load the data into the target Cassandra cluster.
Monitor Output:
Pay close attention to the script’s output. It provides information about the progress and will notify you of any errors encountered during execution.
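
To keep a record of the output while still watching it live, the script can be run through `tee`. The wrapper below is a sketch under stated assumptions: `run_logged` is a hypothetical name, and the log path in the usage comment is a placeholder. It writes the command's exit status to a temporary file so that the status is not masked by the pipe.

```shell
# Hypothetical wrapper (not part of restore.sh): run a command, show its output
# live, save stdout and stderr to a log file, and return the command's status.
run_logged() {
    log_file="$1"
    shift
    status_file="$(mktemp)"
    # The exit status is written to a file because a pipeline reports only the
    # status of its last command (tee).
    { "$@" 2>&1; echo "$?" >"$status_file"; } | tee "$log_file"
    status="$(cat "$status_file")"
    rm -f "$status_file"
    return "$status"
}

# Example (arguments are placeholders):
#   run_logged restore.log ./restore.sh cassandra 'secret' 10.0.0.5 9042 7000 hq /tmp/restore migratesnapshot
```

Note that the log file will contain anything the script prints, so store it somewhere appropriate if the output includes connection details.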

Error Handling

If errors occur:

Review the error messages provided by the script for guidance on what went wrong.
Ensure that the Cassandra credentials provided are correct and that the specified user has the necessary permissions.
Verify that all specified ports are open and accessible from the machine where the script is running.
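
One way to check that a port is reachable is a quick TCP connection attempt. This is only a sketch: `port_open` is a hypothetical helper, and it assumes that bash (for its `/dev/tcp` redirection) and the coreutils `timeout` command are available on the machine running the script. The host and ports in the usage comment are placeholders.

```shell
# Hypothetical connectivity probe (not part of restore.sh). Returns 0 if a TCP
# connection to host:port can be opened within 3 seconds, non-zero otherwise.
port_open() {
    host="$1"
    port="$2"
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example:
#   for port in 9042 7000; do
#       port_open 10.0.0.5 "$port" || echo "port $port is not reachable"
#   done
```

A successful connection only shows that something is listening; it does not confirm that the credentials or the keyspace are correct.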

Post-Migration

After the script successfully completes, verify the data integrity and consistency in the target cluster by running appropriate queries via cqlsh.
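
As one example of such a query, a row count per table can be compared between the source and target clusters. The helpers below are a sketch: `count_query` and `count_rows` are hypothetical names, and the keyspace and table in the usage comment are placeholders. Full-table counts can be slow or time out on large tables, so treat this as a spot check rather than a complete verification.

```shell
# Hypothetical helpers (not part of restore.sh).

# Build a CQL count statement for <keyspace>.<table>.
count_query() {
    printf 'SELECT COUNT(*) FROM %s.%s;\n' "$1" "$2"
}

# Run the count against a node; cqlsh's -e flag executes a single statement.
# Arguments: username password host port keyspace table
count_rows() {
    cqlsh -u "$1" -p "$2" -e "$(count_query "$5" "$6")" "$3" "$4"
}

# Example (all values are placeholders):
#   count_rows cassandra 'secret' 10.0.0.5 9042 hq users
```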

Appendix A: restore.sh

#!/bin/bash

# Check for correct input arguments
if [[ $# -lt 8 ]]; then
    echo "Usage: $0 <cassandra_username> <cassandra_password> <cassandra_host> <cassandra_port> <cassandra_storage_port> <cassandra_keyspace> <cassandra_data_dir> <cassandra_snapshot_name> [target_dir]"
    exit 1
fi

# Cassandra credentials and host
CASSANDRA_USERNAME="$1"
CASSANDRA_PASSWORD="$2"
CASSANDRA_HOST="$3"
CASSANDRA_PORT="$4"
CASSANDRA_STORAGE_PORT="$5"
CASSANDRA_KEYSPACE="$6"
CASSANDRA_BASE_DIR="$7"
CASSANDRA_SNAPSHOT_NAME="$8"

# Determine the working directory
if [[ -n "$9" ]]; then
    WORK_DIR="$9"
else
    # Create a temporary directory if no work_dir is specified
    WORK_DIR=$(mktemp -d)
fi

# Start from a clean working directory.
# WARNING: any existing content in $WORK_DIR is deleted.
if [[ -d "$WORK_DIR" ]]; then
    rm -rf "$WORK_DIR"
fi

BASE_DIR="$CASSANDRA_BASE_DIR/$CASSANDRA_KEYSPACE"
TARGET_DIR="$WORK_DIR/$CASSANDRA_KEYSPACE"

mkdir -p "$TARGET_DIR"

# First pass: copy each table's snapshot files and apply its schema
find "$BASE_DIR" -name "$CASSANDRA_SNAPSHOT_NAME" -type d | while IFS= read -r snapshot_dir; do
    path_part="${snapshot_dir%/*}"
    path_part="${path_part%/*}"
    table_with_uuid="${path_part##*/}"

    table_name="${table_with_uuid%-*}"

    if [[ -n "$table_name" ]]; then
        echo "Loading table schema for $table_name..."
        
        table_target="$TARGET_DIR/$table_name"
        
        mkdir -p "$table_target"
        
        cp -a "$snapshot_dir/." "$table_target/"
        
        if [[ -f "$table_target/schema.cql" ]]; then
            cqlsh -u "$CASSANDRA_USERNAME" -p "$CASSANDRA_PASSWORD" -k "$CASSANDRA_KEYSPACE" -f "$table_target/schema.cql" "$CASSANDRA_HOST" "$CASSANDRA_PORT"
            if [ $? -ne 0 ]; then
                echo "Error: Schema execution failed for $table_name."
                exit 1
            fi
        else
            echo "Error: No schema.cql found for $table_name."
            exit 1
        fi
    else
        echo "Failed to extract table name from directory: $snapshot_dir"
        exit 1
    fi
done || exit 1  # the loop runs in a pipeline subshell, so propagate its failure

# Second pass: stream each table's SSTables to the target cluster
find "$BASE_DIR" -name "$CASSANDRA_SNAPSHOT_NAME" -type d | while IFS= read -r snapshot_dir; do
    path_part="${snapshot_dir%/*}"
    path_part="${path_part%/*}"
    table_with_uuid="${path_part##*/}"

    table_name="${table_with_uuid%-*}"

    if [[ -n "$table_name" ]]; then
        echo "Loading data for $table_name from $TARGET_DIR/$table_name..."
        
        table_target="$TARGET_DIR/$table_name"

        # Print the command being run, with the password masked
        echo "sstableloader -d $CASSANDRA_HOST -p $CASSANDRA_PORT -sp $CASSANDRA_STORAGE_PORT -u $CASSANDRA_USERNAME -pw ****** -k $CASSANDRA_KEYSPACE $table_target"
        sstableloader -d "$CASSANDRA_HOST" -p "$CASSANDRA_PORT" -sp "$CASSANDRA_STORAGE_PORT" -u "$CASSANDRA_USERNAME" -pw "$CASSANDRA_PASSWORD" -k "$CASSANDRA_KEYSPACE" "$table_target"
        if [ $? -ne 0 ]; then
            echo "Error: Data loading failed for $table_name."
            exit 1
        fi
    fi
done || exit 1  # the loop runs in a pipeline subshell, so propagate its failure

echo "All snapshots have been copied and schemas executed where applicable."
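
Both loops in the script derive the table name from the snapshot directory path using shell parameter expansion. The sketch below isolates that logic so it can be tried on its own; the function name `table_name_from_snapshot_dir` and the example path are made up for illustration, while the expansions themselves mirror the ones in the script.

```shell
# Illustrative extraction of the script's path handling (the function name is
# hypothetical). Expected input: .../<table>-<id>/snapshots/<snapshot_name>
table_name_from_snapshot_dir() {
    snapshot_dir="$1"
    path_part="${snapshot_dir%/*}"         # drop /<snapshot_name>
    path_part="${path_part%/*}"            # drop /snapshots
    table_with_uuid="${path_part##*/}"     # keep <table>-<id>
    printf '%s\n' "${table_with_uuid%-*}"  # drop -<id>
}

table_name_from_snapshot_dir /var/lib/cassandra/data/hq/users-0a1b2c/snapshots/migratesnapshot
# prints: users
```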
