Introduction
Apache Cassandra is a distributed NoSQL database management system. Organizations use it to handle large volumes of data across many servers. Its popularity grew because it is highly available and fault-tolerant.
To achieve this, Cassandra replaced the concept of master or named nodes with symmetric peer-to-peer (P2P) distributed nodes. A cluster holds one or more keyspaces, and the data in each keyspace is replicated across the nodes.
In this guide, learn what a keyspace is, its components, and how to create, alter, and drop a keyspace.
Prerequisites
- Cassandra installed on your system
- Access to a terminal or command line
- Necessary permissions to run CQL commands
What is a Keyspace in Cassandra?
A keyspace is a data container in Cassandra, similar to a database in relational database management systems (RDBMS). A cluster typically has one keyspace per application, but you can create as many as needed, depending on requirements and system usage. Keyspaces are entirely separate entities, and the data in one keyspace is unrelated to the data in another.
In a Cassandra cluster, a keyspace is the outermost object, and it determines how data replicates on nodes. Keyspaces consist of core objects called column families (which are like tables in RDBMS), rows indexed by keys, data types, data center awareness, replication factor, and replication strategy.
Note: Learn more about non-relational databases, how they work, and their core features by reading What is NoSQL.
Cassandra Keyspace Components
There are some essential keyspace components you need to specify when you create a keyspace. These components are:
Replication Strategy
When defining a keyspace, the replication strategy specifies the nodes where replicas will be placed. By using multiple nodes to place replicas, you achieve fault tolerance, high availability, and reliability.
There are two possible strategies:
- Simple Strategy. Use this strategy for test and development environments, and if you do not intend to deploy a cluster to more than one data center. The replication factor applies to the whole cluster. The partitioner decides where to put the first replica on a node. Then, other replicas are distributed clockwise on the next nodes irrespective of data center or location.
- Network Topology Strategy. This strategy is suitable when you need to deploy your cluster to multiple data centers. However, you can use it even with a single data center so you can expand later. Network Topology Strategy works for both production and development. It tends to place replicas on nodes that are not in the same rack to avoid issues when one rack goes down. Each data center can have a separate replication factor by using this option.
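The clockwise placement described for Simple Strategy can be illustrated with a short Python sketch. This is a simplified model under stated assumptions, not Cassandra's implementation: the node names, the token values, and the `simple_strategy_replicas` helper are hypothetical, and each node owns a single token (real clusters typically use virtual nodes and hashed partition keys).

```python
from bisect import bisect_left


def simple_strategy_replicas(ring, key_token, replication_factor):
    """Return the nodes that hold replicas for key_token.

    ring: list of (token, node_name) pairs, one token per node (simplified).
    """
    ordered = sorted(ring)
    tokens = [token for token, _ in ordered]
    # First replica: the first node whose token is >= the key's token,
    # wrapping around to the start of the ring if there is none.
    start = bisect_left(tokens, key_token) % len(ordered)
    # A node stores at most one replica of a row.
    count = min(replication_factor, len(ordered))
    # Remaining replicas: the next nodes clockwise, ignoring racks and data centers.
    return [ordered[(start + i) % len(ordered)][1] for i in range(count)]


# Hypothetical four-node ring.
ring = [(0, "node_a"), (25, "node_b"), (50, "node_c"), (75, "node_d")]
print(simple_strategy_replicas(ring, key_token=60, replication_factor=3))
# prints ['node_d', 'node_a', 'node_b']
```

Network Topology Strategy differs from this model in that it also takes racks and data centers into account when choosing the next nodes.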
Replication Factor
This setting defines how many replicas of each row Cassandra stores across the nodes of a cluster or, with Network Topology Strategy, across the nodes of each data center.
The minimum should be two replicas per data center, so that the failure of one node does not make the data unavailable. The recommended setting is three copies of each row on different nodes, which provides satisfactory fault tolerance.
As a rule of thumb, do not set the replication factor higher than the number of nodes.
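The case for three replicas becomes clearer with some quorum arithmetic. The sketch below assumes that reads and writes use the QUORUM consistency level, which this guide does not cover. The helper names are hypothetical.

```python
def quorum(replication_factor):
    # A quorum is a majority of the replicas: floor(RF / 2) + 1.
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor):
    # Number of replicas that can be down while a quorum is still reachable.
    return replication_factor - quorum(replication_factor)


for rf in (1, 2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, tolerated failures={tolerated_failures(rf)}")
```

Under this assumption, a replication factor of 2 needs both replicas for a quorum and tolerates no failed node, while a replication factor of 3 still reaches a quorum with one replica down.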
Basic Keyspace Syntax
You can create a keyspace with different replication settings. Below is the basic syntax for creating a keyspace:
CREATE KEYSPACE keyspace_name WITH replication = {properties};
The properties include the replication strategy and the replication factor. Other keyspace options, such as durable writes, are set in a separate clause after the replication map.
Note: CQL commands end with a semicolon (;). If you do not use a semicolon at the end of a query, the system will wait for additional input.
Create a Keyspace Using Cqlsh
To create a keyspace, launch the CQL shell:
cqlsh
Then, following the basic syntax, create a keyspace with the desired name and replication settings.
In this case, we will create test_keyspace with SimpleStrategy and replication_factor 3:
CREATE KEYSPACE test_keyspace
WITH replication = {'class':'SimpleStrategy', 'replication_factor' : 3};
Use the example above when you do not intend to expand to multiple data centers. Additionally, if you only have one node and you are using Cassandra for testing, you can set replication_factor to 1.
For production environments and multiple data centers, create a keyspace with the network topology replication strategy.
To do so, enter:
CREATE KEYSPACE keyspace_network_topology
WITH replication = {'class':'NetworkTopologyStrategy', 'datacenter1' : 3};
The default datacenter name is datacenter1. To check the name of your datacenter, close the CQL shell and use nodetool:
nodetool status
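If you want to collect the datacenter names programmatically, a small Python sketch can pick them out of the nodetool status text. It assumes that each datacenter section starts with a line of the form `Datacenter: <name>`. The `datacenter_names` helper is hypothetical, and the sample text is a hand-written, abridged stand-in rather than real command output.

```python
def datacenter_names(nodetool_output):
    """Return the datacenter names found in `nodetool status` text, in order."""
    names = []
    for line in nodetool_output.splitlines():
        # Each datacenter section is assumed to start with "Datacenter: <name>".
        if line.startswith("Datacenter:"):
            name = line.split(":", 1)[1].strip()
            if name and name not in names:
                names.append(name)
    return names


# Hand-written, abridged sample (node rows omitted on purpose).
sample_output = """\
Datacenter: datacenter1
=======================
(node rows omitted)
Datacenter: datacenter2
=======================
(node rows omitted)
"""

print(datacenter_names(sample_output))
# prints ['datacenter1', 'datacenter2']
```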
If you have multiple datacenters, list them all in the query with the respective replication factors.
For example, the query for two datacenters looks like this:
CREATE KEYSPACE keyspace_network_topology
WITH replication = {'class':'NetworkTopologyStrategy', 'datacenter1' : 3, 'datacenter2' : 3};
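When the list of datacenters comes from a configuration file or a script, the query can be generated instead of typed by hand. The following Python sketch builds the statement as a string. The `network_topology_keyspace` helper is hypothetical, and its name check is a deliberately conservative assumption (a letter followed by letters, digits, or underscores, up to 48 characters in total), not a full implementation of CQL identifier rules.

```python
import re


def network_topology_keyspace(name, datacenters):
    """Build a CREATE KEYSPACE statement that uses NetworkTopologyStrategy.

    datacenters: dict mapping datacenter name -> replication factor.
    """
    # Conservative identifier check (an assumption of this sketch).
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]{0,47}", name):
        raise ValueError(f"unsupported keyspace name: {name!r}")
    if not datacenters:
        raise ValueError("at least one datacenter is required")
    options = ["'class':'NetworkTopologyStrategy'"]
    for dc, rf in datacenters.items():
        if "'" in dc:
            raise ValueError(f"unsupported datacenter name: {dc!r}")
        if not isinstance(rf, int) or rf < 1:
            raise ValueError(f"replication factor for {dc!r} must be a positive integer")
        options.append(f"'{dc}' : {rf}")
    body = ", ".join(options)
    return "CREATE KEYSPACE " + name + "\nWITH replication = {" + body + "};"


print(network_topology_keyspace(
    "keyspace_network_topology",
    {"datacenter1": 3, "datacenter2": 3},
))
```

The printed statement matches the two-datacenter query above and can be pasted into the CQL shell.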
Verify Keyspace
Since there is no response in the output for successful keyspace creation, use this command to verify the keyspace is on the list:
DESCRIBE KEYSPACES;
The system returns a list of all available Cassandra keyspaces. The list includes the two keyspaces created in the examples above, along with several default keyspaces that come with the Cassandra installation.
Disable Durable Writes
In Cassandra, the durable_writes setting is true by default. You can disable it, but do so only for keyspaces that use NetworkTopologyStrategy. This option tells Cassandra whether to use the commit log when making updates in the selected keyspace.
If you try to disable durable_writes when creating a keyspace with SimpleStrategy, you get a warning not to do it. The reason is that you can lose data if the data center fails before the data is flushed from the memtable to an SSTable.
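The data-loss risk can be pictured with a toy model in Python. It is a teaching sketch under simplifying assumptions, not how Cassandra is implemented: the `ToyNode` class and its methods are hypothetical, the commit log and SSTables are plain in-memory objects that stand in for files on disk, and a crash is modeled as losing only the memtable.

```python
class ToyNode:
    """Toy model of one node's write path (not Cassandra's real code)."""

    def __init__(self, durable_writes=True):
        self.durable_writes = durable_writes
        self.commit_log = []  # stands in for on-disk data that survives a crash
        self.memtable = {}    # in memory, lost on a crash
        self.sstables = {}    # stands in for on-disk data that survives a crash

    def write(self, key, value):
        # With durable writes, the update is logged before it reaches the memtable.
        if self.durable_writes:
            self.commit_log.append((key, value))
        self.memtable[key] = value

    def flush(self):
        # Persist the memtable; the logged entries are no longer needed.
        self.sstables.update(self.memtable)
        self.memtable.clear()
        self.commit_log.clear()

    def crash_and_restart(self):
        # The memtable is lost, then rebuilt by replaying the commit log.
        self.memtable = {}
        for key, value in self.commit_log:
            self.memtable[key] = value

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        return self.sstables.get(key)


for durable in (True, False):
    node = ToyNode(durable_writes=durable)
    node.write("user:1", "alice")
    node.crash_and_restart()
    print(durable, node.read("user:1"))
# prints:
# True alice
# False None
```

In this model, the unflushed write survives the crash only when durable writes are enabled. Data that was flushed before the crash survives in both cases.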
To disable durable_writes while creating a keyspace, enter this query:
CREATE KEYSPACE keyspace_durwrites
WITH replication = {'class':'NetworkTopologyStrategy', 'datacenter1' : 3}
AND DURABLE_WRITES = false;
Verify Durable Writes
You can check the query that was used during keyspace creation by describing the keyspace. The durable_writes part also appears:
DESCRIBE KEYSPACE keyspace_durwrites;
To check the durable_writes settings for all keyspaces, query system_schema:
SELECT * FROM system_schema.keyspaces;
The output shows all keyspaces and their settings, including durable_writes.
Using Keyspace
To select a keyspace in Cassandra and perform actions on it, use the keyword USE.
The syntax is:
USE keyspace_name;
For example:
USE keyspace_durwrites;
The CQL shell switches to the name of the keyspace you specified. To change the current keyspace, use the same command with another name.
Note: Whenever you create a table in Cassandra, you start by defining the keyspace.
Alter Keyspace
After you create a keyspace, you can change the configuration using the keyword ALTER.
The only thing you cannot change is the keyspace name. Other than that, you can alter the replication strategy, replication factor, and durable writes.
To alter a keyspace, follow the same syntax as when creating it, but use ALTER instead of CREATE. Change the values you want.
For example:
ALTER KEYSPACE keyspace_durwrites
WITH replication = {'class':'NetworkTopologyStrategy', 'datacenter1' : 2}
AND DURABLE_WRITES = true;
To verify that the changes took effect, use the DESCRIBE keyword to display the keyspace configuration and compare it with the configuration before the change.
Drop or Delete Keyspace
If you drop a keyspace, it will be deleted from the system. The DROP keyword removes all column families from the keyspace, as well as indexes and data types.
To delete a keyspace in Cassandra, use this syntax:
DROP KEYSPACE keyspace_name;
For example:
DROP KEYSPACE keyspace_durwrites;
To make sure you deleted the keyspace, list the remaining keyspaces with the DESCRIBE KEYSPACES query.
Note: To understand Cassandra better, learn about Cassandra data types.
Conclusion
By following the steps in this guide, you should be able to create a keyspace in Cassandra successfully. The examples in this guide showed you how to create a keyspace for different environments and with different settings.
We also showed you how to alter and drop a keyspace should you need to make any changes.