This document outlines the foundational skills required to deploy and operate CockroachDB in production environments.
The skills are organized into sections based on the following operational domains: configuration, security, cluster maintenance, troubleshooting, and disaster recovery.
Each section includes links to relevant documentation for the listed skills.
Tip:
Cockroach Labs offers Professional Services that can assist you with getting applications into production faster and more efficiently.
Configuration
- Verify vCPU, RAM, storage, and disk IOPS performance
- Configure time synchronization with an NTP server
- Validate network connectivity
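As a rough illustration of the network connectivity check above, the following sketch tests whether each node accepts TCP connections. It assumes the default CockroachDB ports (26257 for SQL and inter-node traffic, 8080 for the DB Console); the function names are hypothetical and are not part of any CockroachDB tooling.

```python
import socket

# Assumption: the cluster uses the default ports. 26257 carries SQL and
# inter-node traffic, and 8080 serves the DB Console. Change these if your
# deployment overrides them.
DEFAULT_PORTS = (26257, 8080)


def port_is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_nodes(hosts, ports=DEFAULT_PORTS, timeout=2.0):
    """Map each (host, port) pair to whether it accepted a TCP connection."""
    return {
        (host, port): port_is_reachable(host, port, timeout)
        for host in hosts
        for port in ports
    }
```

Running `check_nodes([...])` from every node against every other node gives a simple reachability matrix. A successful TCP connection shows only that the port is open, not that the node is healthy.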
Security
- Create and distribute certificates; initialize the cluster
- Configure a load balancer and direct a workload to it
- Configure role-based access control (RBAC)
- Configure encryption at rest
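The certificate and initialization skill above can be sketched as a sequence of `cockroach cert` and `cockroach init` invocations. This is a minimal sketch that only builds the argument lists and does not run them. The directory names, the `root` client, and the function name are illustrative assumptions.

```python
def cert_and_init_commands(node_addresses, certs_dir="certs",
                           ca_key="my-safe-directory/ca.key"):
    """Build argv lists for creating certificates and initializing a cluster.

    node_addresses is a list of lists: for each node, every hostname or IP
    that should appear in that node's certificate. The paths are placeholders.
    """
    common = [f"--certs-dir={certs_dir}", f"--ca-key={ca_key}"]
    commands = [["cockroach", "cert", "create-ca", *common]]
    for addresses in node_addresses:
        # Each node certificate and key must be copied to its node and removed
        # from certs_dir before the next one is generated.
        commands.append(["cockroach", "cert", "create-node", *addresses, *common])
    commands.append(["cockroach", "cert", "create-client", "root", *common])
    # Initialization runs once, against any one node, after all nodes have
    # been started.
    first_host = node_addresses[0][0]
    commands.append(
        ["cockroach", "init", f"--certs-dir={certs_dir}", f"--host={first_host}"]
    )
    return commands
```

Returning argument lists rather than running them keeps the sketch safe to review. Each list can be passed to `subprocess.run(cmd, check=True)` once the paths match your environment.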
Cluster maintenance
- Shut down a node gracefully
- Handle unplanned node outages
- Add nodes
- Remove nodes
- Add a region
- Remove a region
- Perform rolling upgrades
- Downgrade a cluster from a patch version
- Downgrade a cluster from a major version
- Change a cluster setting
- Cluster repaving involves the following individual skills, which are also used during rolling upgrades:
  1. Shut down a node gracefully
  2. Detach the persistent volume (a.k.a. persistent disk) from the removed node's virtual machine (VM) (this step is optional but recommended)
  3. Delete the removed node's VM
  4. Start a new VM
  5. Reattach the persistent disk to the new VM (necessary if you did step #2)
  6. Add a node to the cluster from the new VM
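The repaving steps above can be sketched as an ordered plan for one node. This is an illustrative sketch only. The VM and disk steps depend on your cloud provider, so they are left as descriptions with no command. The function name, store path, and default directory are assumptions. The `--self` flag on `cockroach node drain` may not be available in older versions, so check the reference for your release.

```python
def repave_node_plan(old_host, new_host, join_addrs,
                     certs_dir="certs", store="/mnt/cockroach-data"):
    """Return (description, argv-or-None) pairs for repaving a single node.

    Steps with None have no cockroach command; they are performed with the
    cloud provider's own tooling.
    """
    join = ",".join(join_addrs)
    return [
        # Draining does not stop the process; the node's process manager
        # still has to terminate it afterwards.
        ("Shut down the node gracefully",
         ["cockroach", "node", "drain", "--self",
          f"--certs-dir={certs_dir}", f"--host={old_host}"]),
        ("Detach the persistent disk from the old VM (optional but recommended)", None),
        ("Delete the old VM", None),
        ("Start a new VM", None),
        ("Reattach the persistent disk to the new VM (only if it was detached)", None),
        ("Add a node to the cluster from the new VM",
         ["cockroach", "start", f"--certs-dir={certs_dir}", f"--store={store}",
          f"--advertise-addr={new_host}", f"--join={join}"]),
    ]
```

Repaving one node at a time, and waiting for the cluster to report no under-replicated ranges before moving on, follows the same pattern as a rolling upgrade.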
Troubleshooting
- SQL response time for specific queries
- SQL throughput degradation across the board
- Cluster instability: Dead/suspect nodes
- Out of memory problems
- Imbalanced cluster load
- EOF errors
- Changefeed is falling behind
- Get a "debug zip" file
- Get a "tsdump" (timeseries dump) file
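For the last two skills above, a minimal sketch of collecting a debug zip and a raw timeseries dump with the `cockroach debug` commands might look like the following. The helper names, output paths, and certificate directory are hypothetical. The commands run only when the `run_` helper is called.

```python
import subprocess


def debug_zip_command(output_path, host, certs_dir="certs"):
    """Build the argv for writing a debug zip to output_path."""
    return ["cockroach", "debug", "zip", output_path,
            f"--certs-dir={certs_dir}", f"--host={host}"]


def tsdump_command(host, certs_dir="certs", fmt="raw"):
    """Build the argv for a timeseries dump; the dump is written to stdout."""
    return ["cockroach", "debug", "tsdump", f"--format={fmt}",
            f"--certs-dir={certs_dir}", f"--host={host}"]


def run_tsdump(host, output_path, certs_dir="certs"):
    """Run tsdump and redirect its stdout into output_path."""
    with open(output_path, "wb") as out:
        subprocess.run(tsdump_command(host, certs_dir), stdout=out, check=True)
```

Both files can be large on a busy cluster, so write them to a volume with enough free space.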
Disaster recovery
- Create an AWS IAM access key
- Create an S3 bucket for backup data
- Full cluster backup to S3
- Incremental backup to S3
- Cluster restore from AWS S3
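The backup and restore skills above map onto three SQL statements. The sketch below builds them as strings so that they can be reviewed before being run from a SQL client. The helper names are hypothetical, the bucket, path, and credentials are placeholders, and the `'-10s'` timestamp is an illustrative choice.

```python
from urllib.parse import quote


def s3_uri(bucket, path, access_key_id, secret_access_key):
    """Build an S3 storage URI with URL-encoded IAM credentials."""
    return (
        f"s3://{bucket}/{path}"
        f"?AWS_ACCESS_KEY_ID={quote(access_key_id, safe='')}"
        f"&AWS_SECRET_ACCESS_KEY={quote(secret_access_key, safe='')}"
    )


def full_backup_sql(uri):
    """Full cluster backup into a new backup collection entry."""
    return f"BACKUP INTO '{uri}' AS OF SYSTEM TIME '-10s';"


def incremental_backup_sql(uri):
    """Incremental backup appended to the latest full backup in the collection."""
    return f"BACKUP INTO LATEST IN '{uri}' AS OF SYSTEM TIME '-10s';"


def restore_sql(uri):
    """Full cluster restore from the most recent backup in the collection."""
    return f"RESTORE FROM LATEST IN '{uri}';"
```

A full cluster restore is run against a newly created cluster with no user data. Because the URI embeds a secret key, avoid logging the generated statements.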