Many organizations handling large volumes of unstructured data require a highly scalable and highly available database. Apache Cassandra is one of the most popular tools for this task.
Apache Cassandra is an open-source NoSQL database capable of handling large volumes of unstructured data. It was released as an open-source project in 2008 and became an Apache project in 2009. Cassandra uses a peer-to-peer design that draws on two earlier systems, Amazon’s Dynamo and Google’s Bigtable. With this model, all the nodes in the cluster have equal read/write permissions and no master nodes are required. A key feature of Cassandra is that you can keep adding nodes to the cluster and expand it as needed.
Apache Cassandra offers the following key features and benefits:
- Fast writes – Since data handled here is unstructured, you can just chuck the data into the database at ridiculous speeds.
- Highly scalable – You can add nodes to your cluster at any given time. Cassandra is meant to grow horizontally as much as you need it to.
- Fault tolerance – Since all nodes are treated equally, when one goes down, the remaining nodes can continue to serve requests.
- Tunable consistency – The consistency level (for example ONE, QUORUM, or ALL) can be chosen per read or write, trading latency against consistency. Performance tuning can also be performed on top of your typical JVM performance tuning, and table-level compression options can be configured when creating tables.
- Cassandra Query Language – Cassandra is queried through CQL, an SQL-like language. Since Cassandra is NoSQL, data is distributed horizontally across the cluster, which gives the potential for massive scalability without the confines of joins and fixed schemas.
This guide offers a step-by-step illustration of how to install and configure Apache Cassandra on Alma Linux 8|Oracle Linux 8.
Step 1 – Update your System.
Begin by refreshing the repository cache and updating all the packages on your system.
sudo dnf update
Now install the EPEL repository on Alma Linux 8|Oracle Linux 8.
sudo dnf install yum-utils
sudo dnf install epel-release
Enable PowerTools as below.
sudo dnf config-manager --set-enabled powertools
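The repository ID used above is the AlmaLinux one. As a minimal sketch, the lookup below maps the `ID` field of /etc/os-release to a repository ID. The function name `builder_repo_for` is hypothetical, and the Oracle Linux ID `ol8_codeready_builder` is an assumption to verify with `dnf repolist --all` on your system.

```shell
# Map an os-release ID to the repository that provides the PowerTools packages.
# Assumption: Oracle Linux 8 names this repository ol8_codeready_builder.
builder_repo_for() {
    case "$1" in
        almalinux) echo powertools ;;
        ol)        echo ol8_codeready_builder ;;
        *)         return 1 ;;   # unknown distribution: let the caller decide
    esac
}

# Usage (not run here):
# . /etc/os-release
# sudo dnf config-manager --set-enabled "$(builder_repo_for "$ID")"
```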
Step 2 – Install Java on Alma Linux 8|Oracle Linux 8.
Since Apache Cassandra is written in Java, we need to have Java installed on our system before we proceed. In this guide, we will install OpenJDK 11 to provide the Java runtime environment as below.
sudo dnf install java-11-openjdk
Once installed, verify the version.
$ java --version
openjdk 11.0.14 2022-01-18 LTS
OpenJDK Runtime Environment 18.9 (build 11.0.14+9-LTS)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.14+9-LTS, mixed mode, sharing)
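When scripting the installation, it can be handy to extract just the major Java version from this output. The helper below is a sketch and `java_major` is a hypothetical name. It handles both the modern version string shown above and the legacy `1.8.0` style printed by Java 8.

```shell
# Print the Java major version (for example 8 or 11) from version text on stdin.
java_major() {
    # Take the first dotted number on the first line, e.g. 11.0.14 or 1.8.0
    ver=$(head -n 1 | grep -oE '[0-9]+(\.[0-9]+)*' | head -n 1)
    case "$ver" in
        1.*) echo "$ver" | cut -d. -f2 ;;  # legacy scheme: 1.8.0 means Java 8
        *)   echo "$ver" | cut -d. -f1 ;;  # modern scheme: 11.0.14 means Java 11
    esac
}

# Usage (not run here). Java 8 prints its version to stderr, so merge the streams:
# java -version 2>&1 | java_major
```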
Step 3 – Add Apache Cassandra Repository on Alma Linux 8|Oracle Linux 8.
The Cassandra packages are not available in the default Alma Linux 8|Oracle Linux 8 package repositories, so the Cassandra repository needs to be added. The main benefit of installing Cassandra from the official repositories is that we receive the latest software updates using the simple update command.
Create the repository file using your favorite editor.
sudo vi /etc/yum.repos.d/cassandra.repo
In the file, add the lines:
[cassandra]
name=Apache Cassandra
baseurl=https://downloads.apache.org/cassandra/redhat/40x/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://downloads.apache.org/cassandra/KEYS
This repository belongs to the 4.0 series, which is the latest version at the time of writing, although repositories for older Cassandra releases also exist.
Save the file and update your package index.
sudo dnf update -y
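If you prefer not to open an editor, the same repository file can be written from a script. This is a sketch: `write_cassandra_repo` is a hypothetical helper, and the file content is exactly the six lines listed above.

```shell
# Write the Cassandra repository definition to the file given as $1.
write_cassandra_repo() {
    cat > "$1" <<'EOF'
[cassandra]
name=Apache Cassandra
baseurl=https://downloads.apache.org/cassandra/redhat/40x/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://downloads.apache.org/cassandra/KEYS
EOF
}

# Usage (not run here): write to a temporary file, then install it as root.
# write_cassandra_repo /tmp/cassandra.repo
# sudo install -m 0644 /tmp/cassandra.repo /etc/yum.repos.d/cassandra.repo
```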
Step 4 – Install Apache Cassandra on Alma Linux 8|Oracle Linux 8.
Using the added repository above, we can easily install the latest version of Apache Cassandra on Alma Linux 8|Oracle Linux 8 using the command:
sudo dnf install cassandra
Dependency Tree:
Dependencies resolved.
================================================================================
Package Arch Version Repository Size
================================================================================
Installing:
cassandra noarch 4.0.2-1 cassandra 45 M
Installing dependencies:
java-1.8.0-openjdk-headless-slowdebug
x86_64 1:1.8.0.322.b06-2.el8_5 powertools 36 M
java-1.8.0-openjdk-slowdebug x86_64 1:1.8.0.322.b06-2.el8_5 powertools 345 k
Transaction Summary
================================================================================
Install 3 Packages
Total download size: 81 M
Installed size: 194 M
Is this ok [y/N]: y
Accept the GPG key importation and proceed with the installation.
Step 5 – Start and Enable the Cassandra service.
Once installed, we are required to start and enable the Cassandra service to run automatically on system boot. This can be done using the commands below:
sudo systemctl start cassandra
sudo systemctl enable cassandra
Verify if the service is running:
$ systemctl status cassandra
● cassandra.service - LSB: distributed storage system for structured data
Loaded: loaded (/etc/rc.d/init.d/cassandra; generated)
Active: active (running) since Wed 2022-02-16 05:16:34 EST; 11s ago
Docs: man:systemd-sysv-generator(8)
Main PID: 32646 (java)
Tasks: 16 (limit: 36433)
Memory: 1.7G
CGroup: /system.slice/cassandra.service
└─32646 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.322.b06-2.el8_5.x86_64-slowdebug/jre/bin/java -ea -da:net.openhft... -XX:+UseThrea>
You can also check the state of the node with nodetool as below. Remember to wait until Cassandra has finished loading all the modules and is listening for clients on localhost:9042.
nodetool status
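In the `nodetool status` output, a node that is up and in the normal state is listed with the code `UN` at the start of its line. As a sketch, the helper below counts such lines so a script can tell whether the node is ready. `nodes_up_normal` is a hypothetical name.

```shell
# Count nodes reported as Up/Normal (lines starting with "UN") in
# `nodetool status` output read from stdin.
nodes_up_normal() {
    # grep -c exits non-zero when the count is 0, so mask that exit status
    grep -c '^UN[[:space:]]' || true
}

# Usage (not run here):
# nodetool status | nodes_up_normal
```

A result of 0 right after starting the service usually just means Cassandra is still starting. Retry after a few seconds.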
Step 6 – Install the Cassandra Query Language Shell (cqlsh).
To be able to interact with Cassandra, we need the cqlsh tool, which is compatible with Python 2.7 or Python 3.6+. In this guide, we will go for Python 3.8 installed as below.
sudo dnf install python38
If you have multiple Python versions installed, you may be required to set the default Python version as below.
$ sudo update-alternatives --config python3
There are 2 programs which provide 'python3'.
Selection Command
-----------------------------------------------
*+ 1 /usr/bin/python3.6
2 /usr/bin/python3.8
Enter to keep the current selection[+], or type selection number: 2
Check the Python version.
$ python3 --version
Python 3.8.8
Now, using pip, install the cqlsh tool.
pip3 install --user cqlsh
Sample output:
Collecting cqlsh
Downloading https://files.pythonhosted.org/packages/af/62/88bf9200252158871843a1f65c5215a5480f64817b663ff8ece41ad0f977/cqlsh-6.0.0-py3-none-any.whl (106kB)
|████████████████████████████████| 112kB 12.3MB/s
Collecting cql
Downloading https://files.pythonhosted.org/packages/0b/15/523f6008d32f05dd3c6a2e7c2f21505f0a785b6dc8949cad325306858afc/cql-1.4.0.tar.gz (76kB)
|████████████████████████████████| 81kB 2.9MB/s
Collecting six
Downloading https://files.pythonhosted.org/packages/d9/5a/e7c31adbe875f2abbb91bd84cf2dc52d792b5a01506781dbcf25c91daf11/six-1.16.0-py2.py3-none-any.whl
Collecting cassandra-driver
Downloading https://files.pythonhosted.org/packages/0b/c6/77ffe96b897a6dbf867847bf1c8ebf72ca9881fffbc08c06a206a33ce1e1/cassandra_driver-3.25.0-cp38-cp38-manylinux1_x86_64.whl (3.6MB)
|████████████████████████████████| 3.6MB 51.2MB/s
Collecting thrift
Downloading https://files.pythonhosted.org/packages/6e/97/a73a1a62f62375b21464fa45a0093ef0b653cb14f7599cffce35d51c9161/thrift-0.15.0.tar.gz (59kB)
|████████████████████████████████| 61kB 1.8MB/s
Collecting geomet<0.3,>=0.1
Downloading https://files.pythonhosted.org/packages/c9/81/156ca48f950f833ddc392f8e3677ca50a18cb9d5db38ccb4ecea55a9303f/geomet-0.2.1.post1-py3-none-any.whl
Collecting click
......
Verify the installation.
$ cqlsh --version
cqlsh 6.0.0
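If the shell reports that cqlsh cannot be found, note that `pip3 install --user` places launchers under ~/.local/bin, which may not be on your PATH. The check below is a sketch and `path_has_dir` is a hypothetical helper name.

```shell
# Return success if directory $1 is one of the entries in PATH.
path_has_dir() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage (not run here): add ~/.local/bin for the current session if missing.
# path_has_dir "$HOME/.local/bin" || export PATH="$HOME/.local/bin:$PATH"
```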
Step 7 – Configure Apache Cassandra on Alma Linux 8|Oracle Linux 8.
The Apache Cassandra configuration files are located under /etc/cassandra, while Java start-up options can be configured under /etc/default/cassandra.
7.1. Configure Storage
This step is for those who wish to configure a secondary disk to serve as the Apache Cassandra storage. By default, Apache Cassandra stores its data at /var/lib/cassandra. In this guide, we will mount an external disk on this path for the data storage.
First, identify the secondary attached storage.
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 20G 0 disk
sr0 11:0 1 1024M 0 rom
vda 252:0 0 40G 0 disk
├─vda1 252:1 0 1G 0 part /boot
└─vda2 252:2 0 39G 0 part
├─almalinux-root 253:0 0 35G 0 lvm /
└─almalinux-swap 253:1 0 4G 0 lvm [SWAP]
Here, the secondary disk is /dev/sda. Format the disk to EXT4 using the mkfs command.
sudo mkfs.ext4 /dev/sda
Sample output:
mke2fs 1.45.6 (20-Mar-2020)
Discarding device blocks: done
Creating filesystem with 5242880 4k blocks and 1310720 inodes
Filesystem UUID: 5c3c4032-637b-4e07-9772-83fe0425a6bd
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000
Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
Verify that the filesystem has been created.
$ lsblk -f
NAME FSTYPE LABEL UUID MOUNTPOINT
sda ext4 5c3c4032-637b-4e07-9772-83fe0425a6bd
sr0
vda
├─vda1 xfs d64815a0-ceaa-42d9-a5c0-075079daf099 /boot
└─vda2 LVM2_member HH9t2V-12NT-iKyk-sEST-HbsT-Auf1-n14VM1
├─almalinux-root
│ xfs 7872878f-1b01-4717-97e2-1f045e3685e9 /
└─almalinux-swap
swap dbf263b7-aa5a-44d9-8a19-4c1c7adfa966 [SWAP]
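Now we will mount this disk to /var/lib/cassandra as below. Stop Cassandra first so that the data files do not change during the copy, and use cp -a so that ownership and permissions are preserved in the backup.
sudo systemctl stop cassandra
sudo cp -a /var/lib/cassandra /var/lib/cassandra.bak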
sudo mount /dev/sda /var/lib/cassandra
Verify the mounting:
$ df -hT -P /var/lib/cassandra
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda ext4 20G 45M 19G 1% /var/lib/cassandra
Restore the backup and set the right permissions.
sudo mv /var/lib/cassandra.bak/* /var/lib/cassandra
sudo chown -R cassandra:cassandra /var/lib/cassandra
Now configure permanent mounting by adding the line below to /etc/fstab.
$ sudo vim /etc/fstab
/dev/sda /var/lib/cassandra ext4 defaults 0 0
Now we have secondary storage configured as the Apache Cassandra datastore. For the changes to apply, restart Cassandra.
sudo systemctl restart cassandra
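The fstab entry above refers to the disk by device name, which can change when disks are added or removed. A common alternative is to reference the filesystem UUID reported by lsblk -f. The helper below is a sketch that builds such a line with the same mount options as the entry above. `fstab_line` is a hypothetical name.

```shell
# Print an fstab entry that identifies the filesystem by UUID.
# $1: filesystem UUID, $2: mount point, $3: filesystem type
fstab_line() {
    printf 'UUID=%s %s %s defaults 0 0\n' "$1" "$2" "$3"
}

# Usage (not run here): read the UUID with blkid and print the entry.
# fstab_line "$(sudo blkid -s UUID -o value /dev/sda)" /var/lib/cassandra ext4
```

Review the printed line before adding it to /etc/fstab, since a wrong entry can prevent the system from booting normally.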
7.2. Change the cluster name
After the configurations have been made, switch to the CQL Shell using the command:
cqlsh
Change the cluster name using the below steps:
First, run the command below in the CQL Shell
UPDATE system.local SET cluster_name = 'My CLuster' WHERE KEY = 'local';
Exit the shell:
exit;
Now edit the Cassandra configuration file below.
sudo vi /etc/cassandra/default.conf/cassandra.yaml
Replace the cluster name with the name set in the CQL Shell, as below.
# The name of the cluster. This is mainly used to prevent machines in
# one logical cluster from joining another.
cluster_name: 'My CLuster'
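The same edit can be scripted with sed instead of an editor. This is a sketch: `set_cluster_name` is a hypothetical helper, and it assumes the new name contains no single quotes, `|`, `&`, or backslashes, since those would need escaping in the sed expression.

```shell
# Replace the cluster_name line in a cassandra.yaml file.
# $1: path to the YAML file, $2: new cluster name
# A copy of the original file is kept next to it with a .bak suffix.
set_cluster_name() {
    sed -i.bak "s|^cluster_name:.*|cluster_name: '$2'|" "$1"
}

# Usage (not run here):
# sudo sh -c "$(cat <<'EOS'
# sed -i.bak "s|^cluster_name:.*|cluster_name: 'My CLuster'|" /etc/cassandra/default.conf/cassandra.yaml
# EOS
# )"
```

Only the uncommented line starting with `cluster_name:` is rewritten. Comment lines that mention the setting are left untouched.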
Now flush the system keyspace to disk and restart the service.
nodetool flush system
sudo systemctl restart cassandra
Verify if the changes have been made.
cqlsh
Once in the shell, use the command below to check the cluster name.
DESC CLUSTER
7.3. Enable User Authentication
We will begin by taking a backup of the configuration file before we edit it.
sudo cp /etc/cassandra/conf/cassandra.yaml /etc/cassandra/conf/cassandra.yaml.backup
Now open the file:
sudo vi /etc/cassandra/conf/cassandra.yaml
To enable user authentication, make the below changes:
.....
authenticator: org.apache.cassandra.auth.PasswordAuthenticator
.....
authorizer: org.apache.cassandra.auth.CassandraAuthorizer
......
roles_validity_in_ms: 0
......
permissions_validity_in_ms: 0
.......
Save the file and restart Cassandra.
sudo systemctl restart cassandra
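To double-check that the four settings were changed on active lines and not only inside comments, you can print them from the file. The helper below is a sketch and `show_auth_settings` is a hypothetical name.

```shell
# Print the active (uncommented) authentication-related settings of a
# cassandra.yaml file given as $1.
show_auth_settings() {
    grep -E '^(authenticator|authorizer|roles_validity_in_ms|permissions_validity_in_ms):' "$1"
}

# Usage (not run here):
# show_auth_settings /etc/cassandra/conf/cassandra.yaml
```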
7.4. Create an Admin user for your Database
Begin by logging in to the shell with the default user credentials as below:
cqlsh -u cassandra -p cassandra
Now create a user with the command below, replacing appropriately:
CREATE ROLE user1 WITH PASSWORD = 'Passw0rd' AND SUPERUSER = true AND LOGIN = true;
Remember to replace user1 and Passw0rd with the preferred user credentials. Once created, exit the shell.
exit;
Now try logging in using the created user.
cqlsh -u user1 -p Passw0rd
Once in the shell, you can disable the default superuser rights.
ALTER ROLE cassandra WITH PASSWORD = 'cassandra' AND SUPERUSER = false AND LOGIN = false;
Now grant all permissions to the created user.
GRANT ALL PERMISSIONS ON ALL KEYSPACES TO 'user1';
exit;
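If you script the role creation, a password that contains a single quote must have that quote doubled inside the CQL string literal. The sketch below generates the same CREATE ROLE statement used above with that escaping applied. `cql_quote` and `create_superuser_cql` are hypothetical helper names, and the role name is passed through unquoted.

```shell
# Wrap $1 in single quotes, doubling any embedded single quote.
cql_quote() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# Print a CREATE ROLE statement for a superuser. $1: role name, $2: password
create_superuser_cql() {
    printf 'CREATE ROLE %s WITH PASSWORD = %s AND SUPERUSER = true AND LOGIN = true;\n' \
        "$1" "$(cql_quote "$2")"
}

# Usage (not run here): pass the statement to cqlsh with its -e option.
# cqlsh -u cassandra -p cassandra -e "$(create_superuser_cql user1 'Passw0rd')"
```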
7.5. Access Apache Cassandra Remotely.
By default, Apache Cassandra is set to listen on localhost. However, you can configure it to be accessed remotely by making adjustments to the config file as below.
sudo vi /etc/cassandra/default.conf/cassandra.yaml
In the file, make the below changes:
# For security reasons, you should not expose this port to the internet. Firewall it if needed.
rpc_address: 192.168.205.3
Save the file and restart Cassandra.
sudo systemctl restart cassandra
Verify if the service is listening on the set IP address:
$ sudo ss -plunt|grep 9042
tcp LISTEN 0 128 192.168.205.3:9042 0.0.0.0:* users:(("java",pid=39432,fd=261))
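For scripted checks, the listening address can be pulled out of the ss output rather than read by eye. This is a sketch that assumes the column layout shown above, with the local address in the fifth column. `listeners_on_port` is a hypothetical name.

```shell
# Print the local address:port of every LISTEN socket on port $1,
# reading `ss -plunt` style output from stdin.
listeners_on_port() {
    awk -v suffix=":$1" '
        # Compare the tail of column 5 with ":<port>" so that, for example,
        # port 19042 is not mistaken for 9042.
        $2 == "LISTEN" && substr($5, length($5) - length(suffix) + 1) == suffix { print $5 }
    '
}

# Usage (not run here):
# sudo ss -plunt | listeners_on_port 9042
```

If the result shows only 127.0.0.1:9042, Cassandra is still bound to localhost and remote clients will not be able to connect.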
Allow the port through the firewall.
sudo firewall-cmd --permanent --add-port=9042/tcp
sudo firewall-cmd --reload
Now test the connection on a remote system with cqlsh installed.
cqlsh -u user1 192.168.205.3
That is it!
Closing Thoughts
We have walked through how to install and configure Apache Cassandra on Alma Linux 8|Oracle Linux 8. Furthermore, we have configured a secondary disk as the Apache Cassandra datastore, enabled user authentication, and enabled remote access. You can now proceed to scale Cassandra horizontally by adding nodes. I hope this guide was helpful.