Thursday, September 4, 2025

How to Install Apache Tika on Ubuntu 22.04|20.04|18.04

How can I install Apache Tika on Ubuntu 22.04|20.04|18.04? Apache Tika is an open source toolkit that detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). Tika is useful for search engine indexing, content analysis, translation, and similar tasks.

What is new in Apache Tika 2.2.x

  • Add support for OneNote files downloaded from O365
  • Improve extraction of embedded files from MSOffice files created by non-Microsoft tools
  • Added back ability to ignore load errors in TikaConfig
  • Fix logic bug in PipesServer that prevented concatenation of content from attachments
  • Fix default logging in tika-app in batch mode
  • Fix race condition when starting multiple forked servers on multiple ports
  • Add metadata item for whether or not a PDF has a collection/is a Portfolio PDF
  • Add detection of JPEG XL, MARC, ICC profiles, NES-ROM file types
  • Add optional fetch ranges to FetchEmitTuple to allow range fetching from, e.g., HTTP or S3

In this post, we will discuss the installation of Apache Tika on Ubuntu 22.04|20.04|18.04 LTS.

Apache Tika dependencies

To build and install Apache Tika on Ubuntu 22.04|20.04|18.04 LTS, you need:

  • Java Development Kit (JDK)
  • Apache Maven

We will install these dependencies first, then download and install Tika on the Ubuntu 22.04|20.04|18.04 system.

Step 1: Install required dependencies

Start by updating the package index on your Ubuntu Desktop / Server and installing a few basic utilities.

sudo apt update
sudo apt -y install wget curl vim unzip
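
To confirm that the utilities from this step are actually on the PATH before moving on, a small check like the following can help. This is only a sketch; the function name missing_tools is made up for illustration.

```shell
# Report any of the given commands that are not found on PATH.
# Prints the missing names separated by spaces; prints an empty line if all exist.
missing_tools() {
  missing=""
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      missing="${missing:+$missing }$cmd"
    fi
  done
  printf '%s\n' "$missing"
}

# Check the four tools installed in this step.
result=$(missing_tools wget curl vim unzip)
if [ -n "$result" ]; then
  echo "Still missing: $result"
else
  echo "All required tools are installed"
fi
```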

Step 2: Install Java on Ubuntu 22.04|20.04|18.04

As of Tika 1.19, building with Java 11 is supported. You can install Java on Ubuntu using the following command:

sudo apt install -y default-jdk

Confirm the installed version of Java:

$ java -version
openjdk version "11.0.13" 2021-10-19
OpenJDK Runtime Environment (build 11.0.13+8-Ubuntu-0ubuntu1.20.04)
OpenJDK 64-Bit Server VM (build 11.0.13+8-Ubuntu-0ubuntu1.20.04, mixed mode, sharing)
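
If you want to check the Java version in a script rather than by eye, the major version can be pulled out of that output. The sketch below assumes the first line contains the version in double quotes, as in the sample above; java_major_version is a hypothetical helper name.

```shell
# Extract the major Java version from `java -version` output.
# Handles the legacy "1.8.0_312" scheme (major is 8) and the
# newer "11.0.13" scheme (major is 11).
java_major_version() {
  # Pull out the quoted version string, e.g. 11.0.13 or 1.8.0_312.
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$ver" in
    1.*) ver=${ver#1.} ;;   # legacy scheme: drop the leading "1."
  esac
  # Keep only the digits before the first separator.
  printf '%s\n' "${ver%%[._-]*}"
}

# `java -version` writes to stderr, so redirect it before parsing.
if command -v java >/dev/null 2>&1; then
  major=$(java_major_version "$(java -version 2>&1)")
  echo "Detected Java major version: $major"
fi
```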

Step 3: Install Apache Maven

Install Apache Maven by following our guide:
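
One option, assuming the Maven version packaged in the Ubuntu repositories is recent enough for the Tika build, is to install it with apt. The install_maven wrapper below is a hypothetical helper that skips the installation when mvn is already available.

```shell
# Install Maven from the Ubuntu repositories unless mvn is already on PATH.
# Assumption: the distribution's "maven" package is new enough for this build.
install_maven() {
  if command -v mvn >/dev/null 2>&1; then
    echo "Maven already installed"
  else
    sudo apt install -y maven
  fi
}

# Usage:
#   install_maven
#   mvn -version    # confirm that Maven runs and which JDK it picked up
```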

Step 4: Download and Install Apache Tika

Download the Apache Tika source archive from the Downloads page. The example below uses version 2.2.1; set VER to the release you want.

export VER="2.2.1"
wget https://archive.apache.org/dist/tika/${VER}/tika-${VER}-src.zip
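
It may be worth verifying the archive before building it. The sketch below compares a file's SHA-512 digest with an expected value. The verify_sha512 name is hypothetical. The commented usage assumes that a .sha512 file sits next to the archive on the server and holds the hash as its first field; that assumption may not hold for every release.

```shell
# Compare a file's SHA-512 digest with an expected hex string (case-insensitive).
# Arguments: <file> <expected hex digest>
# Returns 0 on match, non-zero on mismatch.
verify_sha512() {
  actual=$(sha512sum "$1" | awk '{print $1}')
  expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
  [ "$actual" = "$expected" ]
}

# Usage (the .sha512 URL and its layout are assumptions, not confirmed here):
#   wget https://archive.apache.org/dist/tika/${VER}/tika-${VER}-src.zip.sha512
#   expected=$(awk '{print $1}' "tika-${VER}-src.zip.sha512")
#   verify_sha512 "tika-${VER}-src.zip" "$expected" && echo "Checksum OK"
```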

Unzip the downloaded file.

unzip tika-${VER}-src.zip

Change to the new folder and run mvn install:

cd tika-${VER}
mvn install

[Figure: sample installation output]

Wait for the installation to finish, then test Tika from within its base directory.
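
A quick test could look like the sketch below. It assumes the build leaves the command-line application at tika-app/target/tika-app-${VER}.jar, which is an assumption about the 2.x source layout; find_tika_app_jar is a hypothetical helper and sample.pdf stands in for any document you have at hand.

```shell
# Print the path of the tika-app jar produced by the build, if it exists.
# Arguments: <Tika source directory> <Tika version>
# Assumption: the jar is written to tika-app/target/ under the source directory.
find_tika_app_jar() {
  jar="$1/tika-app/target/tika-app-$2.jar"
  if [ -f "$jar" ]; then
    printf '%s\n' "$jar"
  else
    echo "tika-app jar not found at $jar" >&2
    return 1
  fi
}

# Usage from the Tika base directory (sample.pdf is a placeholder file name):
#   APP_JAR=$(find_tika_app_jar . "$VER")
#   java -jar "$APP_JAR" --detect sample.pdf     # print the detected media type
#   java -jar "$APP_JAR" --metadata sample.pdf   # print the extracted metadata
#   java -jar "$APP_JAR" --text sample.pdf       # print the extracted plain text
```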

Reference:

http://tika.apache.org/2.2.1/gettingstarted.html