Friday, December 27, 2024
Google search engine
HomeGuest BlogsSetup SeaweedFS Distributed Object Storage Cluster on Ubuntu 20.04

Setup SeaweedFS Distributed Object Storage Cluster on Ubuntu 20.04

.tdi_3.td-a-rec{text-align:center}.tdi_3 .td-element-style{z-index:-1}.tdi_3.td-a-rec-img{text-align:left}.tdi_3.td-a-rec-img img{margin:0 auto 0 0}@media(max-width:767px){.tdi_3.td-a-rec-img{text-align:center}}

With the surge of information in this connectivity age, many applications require more data to describe clients, to feed AI, to come up with grand Data Projects and so on. In order to deal with the burgeoning petabyte scale of information; efficient, reliable and stable systems are required to enable fast processing, storage and retrieval of the data. There are some systems already built to handle such amounts of data at scale including Ceph, GlusterFS, HDFS, MinIO and SeawedFS. They are all brilliant in their nature and in the way they handle storage of objects and in this guide, we shall focus on SeawedFS. We shall get to see its features then immerse ourselves in getting it installed. Before everything else, let us get ourselves acquainted with SeawedFS.

SeaweedFS is a distributed object store and file system to store and serve billions of files fast! Object store has O(1) disk seek, transparent cloud integration. Filer supports cross-cluster active-active replication, Kubernetes, POSIX, S3 API, encryption, Erasure Coding for warm storage, FUSE mount, Hadoop, WebDAV. Source: SeawedFS GitHub Space

SeaweedFS started as an Object Store to handle small files efficiently. Instead of managing all file metadata in a central master, the central master only manages file volumes, and it lets these volume servers manage files and their metadata. This relieves concurrency pressure from the central master and spreads file metadata into volume servers, allowing faster file access (O(1), usually just one disk read operation. Source: SeawedFS GitHub Space

.tdi_2.td-a-rec{text-align:center}.tdi_2 .td-element-style{z-index:-1}.tdi_2.td-a-rec-img{text-align:left}.tdi_2.td-a-rec-img img{margin:0 auto 0 0}@media(max-width:767px){.tdi_2.td-a-rec-img{text-align:center}}

Seaweed has two objectives:

  • To store billions of files!
  • To serve the files fast!

Features of SeaweedFS

Seaweed has the following features to display to the world

  • SeaweedFS can transparently integrate with the cloud: It can achieve both fast local access time and elastic cloud storage capacity, without any client side changes.
  • SeaweedFS is so simple with O(1) disk reads that you are welcome to challenge the performance with your actual use cases. It is only 40 bytes of disk storage overhead for each file’s metadata.
  • Can choose no replication or different replication levels, rack and data center aware.
  • Automatic master servers failover – no single point of failure (SPOF).
  • Automatic Gzip compression depending on file mime type.
  • Automatic compaction to reclaim disk space after deletion or update.
  • Automatic entry TTL expiration.
  • Any server with some disk spaces can add to the total storage space.
  • Adding/Removing servers does not cause any data re-balancing unless triggered by admin commands.
  • Optional picture resizing.

Installation of SeaweedFS on Ubuntu 20.04

Now to the part where we put on our boots, gloves and get into the farm to get SeaweedFS installed, tilled and well watered on our Ubuntu 20.04 server. Before we shove the shovel into the mud, follow the following steps to first install Go which is required by SeeweedFS.

Step 1: Prepare Your Server

This is a very important step since we shall install latest software and patches before proceeding to install SeaweedFS and Go. We shall also install tools we need here as well.

sudo apt update
sudo apt install vim curl wget zip git -y
sudo apt install build-essential autoconf automake gdb git libffi-dev zlib1g-dev libssl-dev -y

Step 2: Fetch and install Go

You can use Golang available in the APT repository or pulling from source.

Method 1: Install from APT repository:

Run the commands below to install Golang on Ubuntu from APT repos.

sudo apt install golang

Method 2: Install Manually

Visit Go download page to fetch the latest Go tarball version available as follows:

cd ~
wget -c https://golang.org/dl/go1.15.5.linux-amd64.tar.gz -O - | sudo tar -xz -C /usr/local

After that is done, we need to add “/usr/local/go/bin” directory to PATH environment variable so that the sever will find Go executable binaries. Do this by by appending the following line either to the /etc/profile file (for a system-wide installation) or the $HOME/.profile file (for the current user installation):

echo "export PATH=$PATH:/usr/local/go/bin" | sudo tee -a /etc/profile

#### For the current user installation

echo "export PATH=$PATH:/usr/local/go/bin" | tee -a $HOME/.profile

Depending on the file you edited source the file, so that the new PATH environment variable can be loaded into the current shell session:

$ source ~/.profile

# Or
$ source /etc/profile

Step 3: Checkout SeaweedFS Repository

In order to get SeaweedFS installed, we need to get the necessary files into our server. All sources are in GitHub and let us therefore clone the repository and proceed to install.

cd ~
git clone https://github.com/chrislusf/seaweedfs.git

Step 4: Download, compile, and install SeaweedFS

Once all the sources have been cloned, navigate into the new directory and install SeaweedFS project by executing the following command

$ cd ~/seaweedfs
$ make install

##Progress of the installation
$ go get  -d ./weed/
go: downloading github.com/chrislusf/raft v1.0.3
go: downloading github.com/golang/protobuf v1.4.2
go: downloading github.com/gorilla/mux v1.7.4
go: downloading google.golang.org/grpc v1.29.1
go: downloading github.com/google/uuid v1.1.1
go: downloading github.com/syndtr/goleveldb v1.0.0
go: downloading go.etcd.io/etcd v0.5.0-alpha.5.0.20200425165423-262c93980547
go: downloading github.com/klauspost/crc32 v1.2.0

Once this is done, you will find the executable “weed” in your $GOPATH/bin directory. Unfortunately when weed is installed, it creates $GOPATH under your current home directory. You will find weed here “~/go/bin/weed“. So to fix this issue, we shall copy SeaweedFS binaries to the earlier location where Go was installed in Step 2 like this:

sudo cp ~/go/bin/weed   /usr/local/bin/

Now “weed” command is in the PATH environment variable and we can now continue to configure SeaweedFS comfortably as illustrated in the following Steps.

$ weed version
version 30GB 2.12 6d30b21b linux amd64

Step 5: Example Usage of SeaweedFS

In order to grasp the simple examples to be presented in this step, it will be great if we first wrap our minds fully about the working mechanism of SeaweedFS. The architecture is fairly simple. The actual data is stored in volumes on storage nodes(which can lie in same server or in different servers). One volume server can have multiple volumes, and can both support read and write access with basic authentication. Source:SeaweedFS Documentation

All volumes are managed by a master server which contains the volume id to volume server mapping.

Instead of managing chunks as distributed file systems do, SeaweedFS manages data volumes in the master server. Each data volume is 32GB or so in size, and can hold a lot of files. And each storage node can have many data volumes. So the master node only needs to store the metadata about the volumes, which is a fairly small amount of data and is generally stable. Source:SeaweedFS Documentation

By default, the master node runs on port 9333, and the volume nodes run on port 8080. To visualize this, we shall start one master node, and two volume nodes on port 8080 and 8081 respectively. Ideally as it had been mentioned, they should be started from different machines but we shall use one server as an example. If you intend to start the volumes on different servers, make sure that the -mserver IP address point to the master. Moreover the port on the master must be reachable from the volume servers/nodes.

SeaweedFS uses HTTP REST operations to read, write, and delete. The responses are in JSON or JSONP format.

Starting Master Server

As it has been made open, the master node runs on port 9333 by default. We can start the master server as follows:

Option 1: The manual way

$ weed master &

I1126 20:22:17  6485 file_util.go:23] Folder /tmp Permission: -rwxrwxrwx
I1126 20:22:17  6485 master.go:168] current: 172.22.3.196:9333 peers:
I1126 20:22:17  6485 master_server.go:107] Volume Size Limit is 30000 MB
I1126 20:22:17  6485 master_server.go:192] adminScripts:
I1126 20:22:17  6485 master.go:122] Start Seaweed Master 30GB 2.12 a1021570 at 0.0.0.0:9333
I1126 20:22:17  6485 raft_server.go:70] Starting RaftServer with 172.22.3.196:9333
I1126 20:22:17  6485 raft_server.go:129] current cluster leader:

Option 2: Start master Using Systemd

You can start the master using Systemd by creating its unit file as follows:

sudo tee /etc/systemd/system/seaweedmaster.service<<EOF
[Unit]
Description=SeaweedFS Master
After=network.target

[Service]
Type=simple
User=root
Group=root

ExecStart=/usr/local/go/bin/weed master
WorkingDirectory=/usr/local/go/bin/
SyslogIdentifier=seaweedfs-master

[Install]
WantedBy=multi-user.target
EOF

After updating the file, you need to reload the daemon and start the master as illustrated

sudo systemctl daemon-reload
sudo systemctl start seaweedmaster
sudo systemctl enable seaweedmaster

Then check the its status

$ systemctl status seaweedmaster -l
● seaweedmaster.service - SeaweedFS Master
     Loaded: loaded (/etc/systemd/system/seaweedmaster.service; disabled; vendor preset: enabled)
     Active: active (running) since Mon 2020-11-30 08:11:37 UTC; 2s ago
   Main PID: 1653 (weed)
      Tasks: 10 (limit: 2204)
     Memory: 11.8M
     CGroup: /system.slice/seaweedmaster.service
             └─1653 /usr/local/go/bin/weed master

Start Volume Servers

Once the master is ready and waiting for the volumes, we can now have the confidence to start the volumes using the commands below. We shall first create sample directories.

mkdir /tmp/{data1,data2,data3,data4}}

Then let us create the first volumes as shown below. (The first is the command and what follows is the shell output)

Option 1: The manual way

$ weed volume -dir="/tmp/data1" -max=5  -mserver="localhost:9333" -port=8080 &


I1126 20:37:24  6595 disk_location.go:133] Store started on dir: /tmp/data1 with 0 volumes max 5        
I1126 20:37:24  6595 disk_location.go:136] Store started on dir: /tmp/data1 with 0 ec shards
I1126 20:37:24  6595 volume.go:331] Start Seaweed volume server 30GB 2.12 a1021570 at 0.0.0.0:8080      
I1126 20:37:24  6595 volume_grpc_client_to_master.go:52] Volume server start with seed master nodes: [localhost:9333]
I1126 20:37:24  6595 volume_grpc_client_to_master.go:114] Heartbeat to: localhost:9333
I1126 20:37:24  6507 node.go:278] topo adds child DefaultDataCenter
I1126 20:37:24  6507 node.go:278] topo:DefaultDataCenter adds child DefaultRack
I1126 20:37:24  6507 node.go:278] topo:DefaultDataCenter:DefaultRack adds child 172.22.3.196:8080       
I1126 20:37:24  6507 master_grpc_server.go:73] added volume server 172.22.3.196:8080
I1126 20:37:24  6595 volume_grpc_client_to_master.go:135] Volume Server found a new master newLeader: 172.22.3.196:9333 instead of localhost:9333
W1126 20:37:24  6507 master_grpc_server.go:57] SendHeartbeat.Recv server 172.22.3.196:8080 : rpc error: 
code = Canceled desc = context canceled
I1126 20:37:24  6507 node.go:294] topo:DefaultDataCenter:DefaultRack removes 172.22.3.196:8080
I1126 20:37:24  6507 master_grpc_server.go:29] unregister disconnected volume server 172.22.3.196:8080  
I1126 20:37:27  6595 volume_grpc_client_to_master.go:114] Heartbeat to: 172.22.3.196:9333
I1126 20:37:27  6507 node.go:278] topo:DefaultDataCenter:DefaultRack adds child 172.22.3.196:8080
I1126 20:37:27  6507 master_grpc_server.go:73] added volume server 172.22.3.196:8080

Then create the second one as again as follows. (The first is the command and what follows is the shell output)

$ weed volume -dir="/tmp/data2" -max=10 -mserver="localhost:9333" -port=8081 &

I1126 20:38:56  6612 disk_location.go:133] Store started on dir: /tmp/data2 with 0 volumes max 10       
I1126 20:38:56  6612 disk_location.go:136] Store started on dir: /tmp/data2 with 0 ec shards
I1126 20:38:56  6612 volume_grpc_client_to_master.go:52] Volume server start with seed master nodes: [localhost:9333]
I1126 20:38:56  6612 volume.go:331] Start Seaweed volume server 30GB 2.12 a1021570 at 0.0.0.0:8081      
I1126 20:38:56  6612 volume_grpc_client_to_master.go:114] Heartbeat to: localhost:9333
I1126 20:38:56  6507 node.go:278] topo:DefaultDataCenter:DefaultRack adds child 172.22.3.196:8081       
I1126 20:38:56  6507 master_grpc_server.go:73] added volume server 172.22.3.196:8081
I1126 20:38:56  6612 volume_grpc_client_to_master.go:135] Volume Server found a new master newLeader: 172.22.3.196:9333 instead of localhost:9333
W1126 20:38:56  6507 master_grpc_server.go:57] SendHeartbeat.Recv server 172.22.3.196:8081 : rpc error: 
code = Canceled desc = context canceled
I1126 20:38:56  6507 node.go:294] topo:DefaultDataCenter:DefaultRack removes 172.22.3.196:8081
I1126 20:38:56  6507 master_grpc_server.go:29] unregister disconnected volume server 172.22.3.196:8081  
I1126 20:38:59  6612 volume_grpc_client_to_master.go:114] Heartbeat to: 172.22.3.196:9333
I1126 20:38:59  6507 node.go:278] topo:DefaultDataCenter:DefaultRack adds child 172.22.3.196:8081
I1126 20:38:59  6507 master_grpc_server.go:73] added volume server 172.22.3.196:8081

Option 2: Using SystemD

To start the voules using Systemd, we will need to create two or more volume files in case you require more. It is as simple as follows:

For Volume1

$ sudo vim /etc/systemd/system/seaweedvolume1.service

[Unit]
Description=SeaweedFS Volume
After=network.target

[Service]
Type=simple
User=root
Group=root

ExecStart=/usr/local/go/bin/weed volume -dir="/tmp/data2" -max=10 -mserver="172.22.3.196:9333" -port=8081
WorkingDirectory=/usr/local/go/bin/
SyslogIdentifier=seaweedfs-volume

[Install]
WantedBy=multi-user.target

Replace volume path with your correct value, then start and enable.

sudo systemctl daemon-reload
sudo systemctl start seaweedvolume1.service
sudo systemctl enable seaweedvolume1.service

Check status:

$ systemctl status seaweedvolume1
● seaweedvolume1.service - SeaweedFS Volume
     Loaded: loaded (/etc/systemd/system/seaweedvolume1.service; disabled; vendor preset: enabled)
     Active: active (running) since Mon 2020-11-30 08:24:43 UTC; 3s ago
   Main PID: 2063 (weed)
      Tasks: 9 (limit: 2204)
     Memory: 9.8M
     CGroup: /system.slice/seaweedvolume1.service
             └─2063 /usr/local/go/bin/weed volume -dir=/tmp/data3 -max=10 -mserver=localhost:9333 -port=8081 -ip=172.22.3.196

For Volume2

$ sudo vim /etc/systemd/system/seaweedvolume2.service

[Unit]
Description=SeaweedFS Volume
After=network.target

[Service]
Type=simple
User=root
Group=root

ExecStart=/usr/local/go/bin/weed volume -dir="/tmp/data1" -max=5  -mserver="172.22.3.196:9333" -port=8080
WorkingDirectory=/usr/local/go/bin/
SyslogIdentifier=seaweedfs-volume2

[Install]
WantedBy=multi-user.target

After updating the files, we will have to reload the daemon as illustrated

sudo systemctl daemon-reload
sudo systemctl start seaweedvolume2
sudo systemctl enable seaweedvolume2

Then check their statuses

sudo systemctl status seaweedvolume2
● seaweedvolume2.service - SeaweedFS Volume
     Loaded: loaded (/etc/systemd/system/seaweedvolume2.service; disabled; vendor preset: enabled)
     Active: active (running) since Mon 2020-11-30 08:29:22 UTC; 5s ago
   Main PID: 2103 (weed)
      Tasks: 10 (limit: 2204)
     Memory: 10.3M
     CGroup: /system.slice/seaweedvolume2.service
             └─2103 /usr/local/go/bin/weed volume -dir=/tmp/data4 -max=5 -mserver=localhost:9333 -port=8080 -ip=172.22.3.196

Write a sample File

Uploading a file to the SeaweedFS object store is interesting. We first have to send a HTTP POST, PUT, or GET request to /dir/assign to get a File ID (fid) and a volume server url:

$ curl http://localhost:9333/dir/assign

{"fid":"7,0101406762","url":"172.22.3.196:8080","publicUrl":"172.22.3.196:8080","count":1}

After we have those details as it has been shown above the next step is to store the file content. To do this, we will have to send a HTTP multi-part POST request to url + ‘/’ + File ID (fid) from the response. Our fid is 7,0101406762, url is 172.22.3.196:8080. Let us send the request like this. You will get a response as shown below it.

$ curl -F file=@/home/tech/teleport-logo.png http://172.22.3.196:8080/7,0101406762

{"name":"teleport-logo.png","size":70974,"eTag":"ef8deb64899176d3de492f2fa9951e14"}

Update a file already sent to the Object Store

Updating is simpler than you can imagine. You simpley need yo send the same command as above but now with the new file you intend to replace the existing one with. You will maintain the fid and url.

curl -F file=@/home/tech/teleport-logo-updated.png http://172.22.3.196:8080/7,0101406762

Deleting a file from the Object Store

In order to get rid of a file that you had already stored in SeaweedFS, you simply have to send an HTTP DELETE request to the same url + ‘/’ + File ID (fid) URL:

curl -X DELETE http://172.22.3.196:8080/7,0101406762

Reading a saved File

Once you have your files stired, you can read them quite easily. You first take a look up the volume server’s URLs by the file’s volumeId as shown in this example:

$ curl http://http://172.22.3.196:9333/dir/lookup?volumeId=7

{"volumeId":"7","locations":[{"url":"172.22.3.196:8080","publicUrl":"172.22.3.196:8080"}]}

Because volumes do not move quite often, you can cache the results most of the time to increase speed and performance of your unique implementation. Depending on the replication type, one volume can have multiple replica locations. Just randomly pick one location to read.

Now open up your browser or application you prefer to use to view the file you stored in SeaweedFS object store and point it to the url we got above. In case you have a firewall running, allow the port you will be accessing

sudo ufw allow 8080

http://172.22.3.196:8080/7,0101406762

A screenshot of the file is shared below in this example

seaweed object on browser

If you want a nicer URL, you can use one of these alternative URL formats:

 http://172.22.3.196:8080/7/0101406762/your_preferred_name.jpg
 http://172.22.3.196:8080/7/0101406762.jpg
 http://172.22.3.196:8080/7,0101406762.jpg
 http://172.22.3.196:8080/7/0101406762
 http://172.22.3.196:8080/7,0101406762

If you would wish to get a scaled version of an image, you can add some parameters. Examples are shared below:

http://172.22.3.196:8080/7/0101406762.jpg?height=200&width=200
http://172.22.3.196:8080/7/0101406762.jpg?height=200&width=200&mode=fit
http://172.22.3.196:8080/7/0101406762.jpg?height=200&width=200&mode=fill

There is so much more that Seaweed is able to do such as accomplishing No point of failure by the use of multiple servers and multiple volumes. Check out SeaweedFS Documentation on GitHub to find our more about this wonderful Object Storage tool.

Closing Remarks

SeaweedFS can turn out to be pretty invaluable in your projects especially if it involves storage and retrieval of large amounts of data in form of objects. In case you have applications that require fetching photos and such kind of data, then SeaweedFS is a good place to settle in. Give it a shot! Meanwhile, we continue to appreciate your time on the blog and the relentless support you have exuded thus far. You can read other similar guides shared below.

EKS Kubernetes Persistent Storage with EFS Storage Service

Setup GlusterFS Storage With Heketi on CentOS 8 / CentOS 7

How To Create & Delete GlusterFS Volumes

Setup S3 Compatible Object Storage Server with Minio

.tdi_4.td-a-rec{text-align:center}.tdi_4 .td-element-style{z-index:-1}.tdi_4.td-a-rec-img{text-align:left}.tdi_4.td-a-rec-img img{margin:0 auto 0 0}@media(max-width:767px){.tdi_4.td-a-rec-img{text-align:center}}

RELATED ARTICLES

Most Popular

Recent Comments