Wednesday, October 9, 2024

Install Apache Cassandra on CentOS 8 | Rocky Linux 8

How can I install Apache Cassandra on a CentOS 8 | Rocky Linux 8 machine? Apache Cassandra is a free and open-source NoSQL database management system designed to be distributed and highly available. Cassandra can handle large amounts of data across many commodity servers without any single point of failure.

This guide will walk you through the installation of Cassandra on CentOS 8 | Rocky Linux 8. After the installation, we'll cover the basic configuration and tuning of Cassandra for machines with minimal resources available.

Features of Cassandra

Cassandra provides the Cassandra Query Language (CQL), an SQL-like language,
to create and update database schema and access data. CQL allows users to
organize data within a cluster of Cassandra nodes using:

  • Keyspace: defines how a dataset is replicated, for example in which
    datacenters and how many copies. Keyspaces contain tables.
  • Table: defines the typed schema for a collection of partitions. New
    columns can be added to Cassandra tables with zero downtime. Tables
    contain partitions, which contain rows, which contain columns.
  • Partition: defines the mandatory part of the primary key all rows in
    Cassandra must have. All performant queries supply the partition key in
    the query.
  • Row: contains a collection of columns identified by a unique primary key
    made up of the partition key and optionally additional clustering keys.
  • Column: a single datum with a type that belongs to a row.
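
The hierarchy above can be illustrated with a small schema. The sketch below is only an illustration: the keyspace name demo, the table readings, its columns, and the file name demo_schema.cql are all hypothetical. It writes the CQL statements to a file from the shell so they can be loaded once a node is running.

```shell
# Write a small, hypothetical schema to a file for later use with cqlsh.
schema_file="${TMPDIR:-/tmp}/demo_schema.cql"

cat > "$schema_file" <<'CQL'
-- Keyspace: replication settings (one copy, suitable only for a single test node)
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

-- Table: sensor_id is the partition key, ts is a clustering key
CREATE TABLE IF NOT EXISTS demo.readings (
  sensor_id text,
  ts        timestamp,
  value     double,
  PRIMARY KEY ((sensor_id), ts)
);

-- A performant query supplies the partition key
SELECT ts, value FROM demo.readings WHERE sensor_id = 'sensor-1';
CQL

echo "Wrote $schema_file"
# Once Cassandra is running, the file can be loaded with: cqlsh -f "$schema_file"
```

Here each distinct sensor_id forms one partition, and rows inside a partition are ordered by ts.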

Cassandra has support for the following client drivers:

  • Java
  • Python
  • Ruby
  • C# / .NET
  • Node.js
  • PHP
  • C++
  • Scala
  • Clojure
  • Erlang
  • Go
  • Haskell
  • Rust
  • Perl
  • Elixir
  • Dart

Install Apache Cassandra on CentOS 8 | Rocky Linux 8

Java is required for running Cassandra on CentOS 8 | Rocky Linux 8. But let’s first update and reboot the system.

sudo dnf -y update

Perform a system reboot after the upgrade.

sudo reboot

Step 1: Install Java, Python, and cqlsh

Install Python 3, pip, and OpenJDK 11 on your CentOS / Rocky Linux 8:

sudo dnf -y install python3 python3-pip java-11-openjdk java-11-openjdk-devel

Install cqlsh using the pip3 Python package manager:

sudo pip3 install cqlsh tox

Check the output to ensure the install was successful:

.....
Collecting py>=1.4.17 (from tox)
  Downloading https://files.pythonhosted.org/packages/f6/f0/10642828a8dfb741e5f3fbaac830550a518a775c7fff6f04a007259b0548/py-1.11.0-py2.py3-none-any.whl (98kB)
    100% |████████████████████████████████| 102kB 9.5MB/s
Collecting toml>=0.10.2; python_version <= "3.6" (from tox)
  Downloading https://files.pythonhosted.org/packages/44/6f/7120676b6d73228c96e17f1f794d8ab046fc910d781c8d151120c3f1569e/toml-0.10.2-py2.py3-none-any.whl
Collecting filelock>=3.0.0 (from tox)
  Downloading https://files.pythonhosted.org/packages/84/ce/8916d10ef537f3f3b046843255f9799504aa41862bfa87844b9bdc5361cd/filelock-3.4.1-py3-none-any.whl
Collecting geomet<0.3,>=0.1 (from cassandra-driver->cqlsh)
  Downloading https://files.pythonhosted.org/packages/c9/81/156ca48f950f833ddc392f8e3677ca50a18cb9d5db38ccb4ecea55a9303f/geomet-0.2.1.post1-py3-none-any.whl
Collecting distlib<1,>=0.3.6 (from virtualenv!=20.0.0,!=20.0.1,!=20.0.2,!=20.0.3,!=20.0.4,!=20.0.5,!=20.0.6,!=20.0.7,>=16.0.0->tox)
  Downloading https://files.pythonhosted.org/packages/43/a0/9ba967fdbd55293bacfc1507f58e316f740a3b231fc00e3d86dc39bc185a/distlib-0.3.7-py2.py3-none-any.whl (468kB)
    100% |████████████████████████████████| 471kB 2.2MB/s
Collecting platformdirs<3,>=2.4 (from virtualenv!=20.0.0,!=20.0.1,!=20.0.2,!=20.0.3,!=20.0.4,!=20.0.5,!=20.0.6,!=20.0.7,>=16.0.0->tox)
  Downloading https://files.pythonhosted.org/packages/b1/78/dcfd84d3aabd46a9c77260fb47ea5d244806e4daef83aa6fe5d83adb182c/platformdirs-2.4.0-py3-none-any.whl
Collecting importlib-resources>=5.4; python_version < "3.7" (from virtualenv!=20.0.0,!=20.0.1,!=20.0.2,!=20.0.3,!=20.0.4,!=20.0.5,!=20.0.6,!=20.0.7,>=16.0.0->tox)
  Downloading https://files.pythonhosted.org/packages/24/1b/33e489669a94da3ef4562938cd306e8fa915e13939d7b8277cb5569cb405/importlib_resources-5.4.0-py3-none-any.whl
Collecting typing-extensions>=3.6.4; python_version < "3.8" (from importlib-metadata>=0.12; python_version < "3.8"->tox)
  Downloading https://files.pythonhosted.org/packages/45/6b/44f7f8f1e110027cf88956b59f2fad776cca7e1704396d043f89effd3a0e/typing_extensions-4.1.1-py3-none-any.whl
Collecting zipp>=0.5 (from importlib-metadata>=0.12; python_version < "3.8"->tox)
  Downloading https://files.pythonhosted.org/packages/bd/df/d4a4974a3e3957fd1c1fa3082366d7fff6e428ddb55f074bf64876f8e8ad/zipp-3.6.0-py3-none-any.whl
Collecting pyparsing!=3.0.5,>=2.0.2 (from packaging>=14->tox)
  Downloading https://files.pythonhosted.org/packages/39/92/8486ede85fcc088f1b3dba4ce92dd29d126fd96b0008ea213167940a2475/pyparsing-3.1.1-py3-none-any.whl (103kB)
    100% |████████████████████████████████| 112kB 10.0MB/s
Collecting click (from geomet<0.3,>=0.1->cassandra-driver->cqlsh)
  Downloading https://files.pythonhosted.org/packages/4a/a8/0b2ced25639fb20cc1c9784de90a8c25f9504a7f18cd8b5397bd61696d7d/click-8.0.4-py3-none-any.whl (97kB)
    100% |████████████████████████████████| 102kB 8.5MB/s
Installing collected packages: typing-extensions, zipp, importlib-metadata, click, geomet, cassandra-driver, cqlsh, distlib, filelock, platformdirs, importlib-resources, virtualenv, pluggy, pyparsing, packaging, py, toml, tox
  Running setup.py install for cassandra-driver ... done
Successfully installed cassandra-driver-3.28.0 click-8.0.4 cqlsh-6.1.2 distlib-0.3.7 filelock-3.4.1 geomet-0.2.1.post1 importlib-metadata-4.8.3 importlib-resources-5.4.0 packaging-21.3 platformdirs-2.4.0 pluggy-1.0.0 py-1.11.0 pyparsing-3.1.1 toml-0.10.2 tox-3.28.0 typing-extensions-4.1.1 virtualenv-20.17.1 zipp-3.6.0

Confirm the installation of Java and cqlsh.

$ java -version
openjdk version "11.0.18-ea" 2023-01-17 LTS
OpenJDK Runtime Environment (Red_Hat-11.0.18.0.9-0.3.ea.el8) (build 11.0.18-ea+9-LTS)
OpenJDK 64-Bit Server VM (Red_Hat-11.0.18.0.9-0.3.ea.el8) (build 11.0.18-ea+9-LTS, mixed mode, sharing)

$ cqlsh --version
cqlsh 6.1.0
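
If you want to script this check, the following sketch extracts the major Java version from the first line of the java -version output. The function name java_major is hypothetical, and the parsing assumes the version string is quoted as in the output above. Cassandra 4.1 is documented to run on Java 8 and 11, so a result of 8 or 11 is what you would expect to see.

```shell
# java_major: print the major Java version found in a `java -version` line.
# Handles both the legacy "1.8.0_x" scheme and the newer "11.0.x" scheme.
java_major() {
  local ver
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case "$ver" in
    1.*) ver="${ver#1.}" ;;        # "1.8.0_382" becomes "8.0_382"
  esac
  printf '%s\n' "${ver%%[!0-9]*}"  # keep only the leading digits
}

# Only query the local JVM if java is actually installed.
if command -v java >/dev/null 2>&1; then
  echo "Java major version: $(java_major "$(java -version 2>&1 | head -n 1)")"
fi
```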

Step 2: Install Apache Cassandra on CentOS 8 | Rocky Linux 8

Now that Java and Python are installed, let's add the Cassandra repository to our CentOS / Rocky system. More details on configuring the repository are available in the official documentation pages.

sudo tee  /etc/yum.repos.d/cassandra.repo<<EOF
[cassandra]
name=Apache Cassandra
baseurl=https://redhat.cassandra.apache.org/41x/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://downloads.apache.org/cassandra/KEYS
EOF

Install Apache Cassandra with the command below.

sudo yum -y install cassandra

To confirm the installed version of the cassandra package, use:

$ rpm -qi cassandra
Name        : cassandra
Version     : 4.1.3
Release     : 1
Architecture: noarch
Install Date: Wed 16 Aug 2023 02:16:33 AM UTC
Group       : Development/Libraries
Size        : 58896904
License     : Apache Software License 2.0
Signature   : RSA/SHA512, Tue 18 Jul 2023 08:18:51 PM UTC, Key ID 32f35cb2f546d93e
Source RPM  : cassandra-4.1.3-1.src.rpm
Build Date  : Tue 18 Jul 2023 08:18:30 PM UTC
Build Host  : 9bc3d3817a9e
Relocations : (not relocatable)
URL         : http://cassandra.apache.org/
....

Create a systemd service unit for Cassandra.

sudo tee /etc/systemd/system/cassandra.service<<EOF
[Unit]
Description=Apache Cassandra
After=network.target

[Service]
PIDFile=/var/run/cassandra/cassandra.pid
User=cassandra
Group=cassandra
ExecStart=/usr/sbin/cassandra -f -p /var/run/cassandra/cassandra.pid
Restart=always

[Install]
WantedBy=multi-user.target
EOF

Start the service and enable it to start at boot.

sudo systemctl daemon-reload
sudo systemctl start cassandra.service
sudo systemctl enable cassandra

Check service status:

$ systemctl status cassandra.service
 cassandra.service - Apache Cassandra
   Loaded: loaded (/etc/systemd/system/cassandra.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2023-08-16 01:48:30 UTC; 30s ago
 Main PID: 6371 (java)
    Tasks: 16 (limit: 10843)
   Memory: 1.0G
   CGroup: /system.slice/cassandra.service
           └─6371 /usr/bin/java -ea -da:net.openhft... -XX:+UseThreadPriorities -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:+AlwaysPreTouch -XX:-UseBiasedLocking -XX:+UseTLAB -XX:+ResizeTLAB -XX:+>

Aug 16 01:48:45 cent8.mylab.io cassandra[6371]: INFO  [main] 2023-08-16 01:48:45,049 YamlConfigurationLoader.java:102 - Configuration location: file:/etc/cassandra/default.conf/cassandra.yaml
Aug 16 01:48:49 cent8.mylab.io cassandra[6371]: WARN  [main] 2023-08-16 01:48:49,810 YamlConfigurationLoader.java:422 - [key_cache_save_period, counter_cache_save_period, row_cache_save_period] par>
Aug 16 01:48:50 cent8.mylab.io cassandra[6371]: INFO  [main] 2023-08-16 01:48:49,986 Config.java:1163 - Node configuration:[allocate_tokens_for_keyspace=null; allocate_tokens_for_local_replication_>
Aug 16 01:48:50 cent8.mylab.io cassandra[6371]: INFO  [main] 2023-08-16 01:48:50,015 DatabaseDescriptor.java:467 - DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
Aug 16 01:48:50 cent8.mylab.io cassandra[6371]: INFO  [main] 2023-08-16 01:48:50,017 DatabaseDescriptor.java:521 - Global memtable on-heap threshold is enabled at 214MiB
Aug 16 01:48:50 cent8.mylab.io cassandra[6371]: INFO  [main] 2023-08-16 01:48:50,018 DatabaseDescriptor.java:525 - Global memtable off-heap threshold is enabled at 214MiB
Aug 16 01:48:50 cent8.mylab.io cassandra[6371]: INFO  [main] 2023-08-16 01:48:50,044 DatabaseDescriptor.java:592 - Native transport rate-limiting disabled.
Aug 16 01:48:50 cent8.mylab.io cassandra[6371]: WARN  [main] 2023-08-16 01:48:50,294 DatabaseDescriptor.java:990 - Small commitlog volume detected at '/var/lib/cassandra/commitlog'; setting commitl>
Aug 16 01:48:50 cent8.mylab.io cassandra[6371]: WARN  [main] 2023-08-16 01:48:50,315 DatabaseDescriptor.java:657 - Only 15.499GiB free across all data volumes. Consider adding more capacity to your>
Aug 16 01:48:57 cent8.mylab.io cassandra[6371]: WARN  [main] 2023-08-16 01:48:57,981 YamlConfigurationLoader.java:422 - [key_cache_save_period, counter_cache_save_period, row_cache_save_period] par>
...

After a few minutes, you can also verify that Cassandra is running with the command below.

$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  127.0.0.1  70 KiB     256          100.0%            0daf41fa-22e5-4471-bc00-9aed6f566235  rack1
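
Because the node needs some time to come up, a script may want to poll rather than wait a fixed period. The sketch below is one possible approach. The function names node_is_up and wait_for_node are hypothetical, and the attempt count and delay are placeholder defaults. It treats a line starting with UN (Up/Normal) in the nodetool status output as success.

```shell
# node_is_up: succeed if any line of the given `nodetool status` output
# starts with "UN" (Up / Normal).
node_is_up() {
  printf '%s\n' "$1" | grep -q '^UN[[:space:]]'
}

# wait_for_node: poll `nodetool status` until the node reports UN.
# $1 = maximum attempts (default 30), $2 = seconds between attempts (default 10).
wait_for_node() {
  local attempts="${1:-30}"
  local delay="${2:-10}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if node_is_up "$(nodetool status 2>/dev/null)"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example (not run here): wait_for_node 30 10 && echo "Cassandra is up"
```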

To run a query against Cassandra, invoke the CQL shell with the command below.

$ cqlsh
Connected to Test Cluster at 127.0.0.1:9042
[cqlsh 6.1.0 | Cassandra 4.1.3 | CQL spec 3.4.6 | Native protocol v5]
Use HELP for help.

Note the default locations used by the package:

  • The default location of configuration files is /etc/cassandra.
  • The default locations of the log and data directories are /var/log/cassandra/ and /var/lib/cassandra.

Step 3: Configuring Cassandra on CentOS 8 | Rocky Linux 8

For running Cassandra on a single node, the default configuration file at /etc/cassandra/conf/cassandra.yaml is sufficient. For a multi-node cluster, you may need to modify this file to ensure your cluster is tuned properly.

At a minimum, you should consider setting the following properties:

  • cluster_name: the name of your cluster.
  • seeds: a comma-separated list of the IP addresses of your cluster seeds.
  • storage_port: you don’t necessarily need to change this but make sure that there are no firewalls blocking this port.
  • listen_address: the IP address of your node, this is what allows other nodes to communicate with this node so it is important that you change it.
  • native_transport_port: as for storage_port, make sure this port is not blocked by firewalls as clients will communicate with Cassandra on this port.
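
Before editing, it can help to print the current values of these properties. The helper below is a minimal sketch. The name show_cluster_settings is hypothetical, and the default path is the one mentioned above. It prints only active (uncommented) lines, and it allows leading spaces and dashes because seeds is usually nested under seed_provider.

```shell
# show_cluster_settings: print the uncommented lines that set the
# properties listed above. $1 = path to cassandra.yaml (optional).
show_cluster_settings() {
  local yaml="${1:-/etc/cassandra/conf/cassandra.yaml}"
  grep -E '^[[:space:]-]*(cluster_name|seeds|listen_address|storage_port|native_transport_port):' "$yaml"
}

# Example (not run here): show_cluster_settings
```

Commented-out lines and look-alike keys such as ssl_storage_port are not matched by the pattern.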

Changing the location of directories

The cassandra.yaml configuration file controls the following data directories:

  • data_file_directories: one or more directories where data files are located.
  • commitlog_directory: the directory where commitlog files are located.
  • saved_caches_directory: the directory where saved caches are located.
  • hints_directory: the directory where hints are located.

For performance reasons, if you have multiple disks, consider putting commitlog and data files on different disks.
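
To see whether two directories currently share a filesystem, something like the sketch below may help. The function names device_of and on_same_device are hypothetical. Note that it compares filesystems as reported by df, so two partitions on the same physical disk would still show up as different devices.

```shell
# device_of: print the filesystem (first column of `df -P`) holding a path.
device_of() {
  df -P "$1" | awk 'NR == 2 { print $1 }'
}

# on_same_device: succeed if both paths live on the same filesystem.
on_same_device() {
  [ "$(device_of "$1")" = "$(device_of "$2")" ]
}

# Example (not run here; the data path is an assumed default):
#   on_same_device /var/lib/cassandra/data /var/lib/cassandra/commitlog \
#     && echo "data and commitlog share a filesystem"
```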

Setting Environment variables

JVM-level settings such as heap size are set in cassandra-env.sh. You can add any additional JVM command-line arguments to the JVM_OPTS environment variable. These arguments are passed to the Cassandra service when it starts.
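
As a rough sketch of what such an addition could look like, the lines below append one flag to JVM_OPTS without duplicating it. The helper name add_jvm_opt is hypothetical, and the flag shown is only an example value. Choose flags that suit your own environment.

```shell
# Hypothetical lines for the end of cassandra-env.sh.
# add_jvm_opt: append a flag to JVM_OPTS unless it is already present.
add_jvm_opt() {
  case " ${JVM_OPTS:-} " in
    *" $1 "*) ;;                        # already present, nothing to do
    *) JVM_OPTS="${JVM_OPTS:-} $1" ;;   # append the new flag
  esac
}

# Example flag only; replace it with the arguments you actually need.
add_jvm_opt "-Djava.rmi.server.hostname=127.0.0.1"
```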

Cassandra Logging

The logger in use is logback. You can change logging properties by editing logback.xml. By default, it will log at INFO level into a file called system.log and at DEBUG level into a file called debug.log. When running in the foreground, it will also log at INFO level to the console.
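
For a quick health check, you could count warnings and errors in system.log. The sketch below assumes that each log line starts with its level, as in the service output shown earlier, and that the log lives under /var/log/cassandra/. The function name count_log_level is hypothetical.

```shell
# count_log_level: count lines in a Cassandra log that start with a level.
# $1 = level (for example WARN or ERROR), $2 = log file (optional).
count_log_level() {
  local level="$1"
  local log="${2:-/var/log/cassandra/system.log}"
  grep -c "^${level}[[:space:]]" "$log"
}

# Example (not run here): count_log_level ERROR
```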

Refer to the official guide for client configuration.
