How to Create an External Table in Hive

Introduction

In Hive terminology, external tables are tables not managed with Hive. Their purpose is to facilitate importing of data from an external file into the metastore.

The external table data is stored externally, while Hive metastore only contains the metadata schema. Consequently, dropping of an external table does not affect the data.

In this tutorial, you will learn how to create, query, and drop an external table in Hive.

Prerequisites

Ubuntu 18.04 LTS or later
Access to command line with sudo privileges
Apache Hadoop installed and running
Apache Hive installed and running

Note: This tutorial uses Ubuntu 20.04. However, Hive works the same on all operating systems. This means the process of creating, querying and dropping external tables can be applied to Hive on Windows, Mac OS, other Linux distributions, etc.

Creating an External Table in Hive – Syntax Explained

When creating an external table in Hive, you need to provide the following information:

Name of the table – The create external table command creates the table. If a table of the same name already exists in the system, this will cause an error. To avoid this, add if not exists to the statement. Table names are case insensitive.
Column names and types – Just like table names, column names are case insensitive. Column types are values such as int, char, string, etc.
Row format – Rows use native or custom SerDe (Serializer/Deserializer) formats. Native SerDe will be used if the row format is not defined, or if it is specified as delimited.
Field termination character – This is a char type character which separates table values in a row.
Storage format – You can specify storage formats such as textfile, sequencefile, jsonfile, etc.
Location – This is the HDFS directory location of the file containing the table data.

The correct syntax for providing this information to Hive is:

create external table if not exists [external-table-name] (
[column1-name] [column1-type], [column2-name] [column2-type], …)
comment '[comment]'
row format [format-type]
fields terminated by '[termination-character]'
stored as [storage-type]
location '[location]';

Create a Hive External Table – Example

For the purpose of a practical example, this tutorial will show you how to import data from a CSV file into an external table.

Step 1: Prepare the Data File

1. Create a CSV file titled ‘countries.csv’:

sudo nano countries.csv

2. For each country in the list, write a row number, the country’s name, its capital city, and its population in millions:

1,USA,Washington,328
2,France,Paris,67
3,Spain,Madrid,47
4,Russia,Moscow,145
5,Indonesia,Jakarta,267
6,Nigeria,Abuja,196

3. Save the file and make a note of its location.

Step 2: Import the File to HDFS

1. Create an HDFS directory. You will use this directory as an HDFS location of the file you created.

hdfs dfs -mkdir [hdfs-directory-name]

2. Import the CSV file into HDFS:

hdfs dfs -put [original-file-location] [hdfs-directory-name]

Using hdfs dfs put command to copy the file in the HDFS directory

3. Use the -ls command to verify that the file is in the HDFS folder:

hdfs dfs -ls [hdfs-directory-name]

Using hdfs dfs ls command to list the contents of an HDFS directory

The output displays all the files currently in the directory.

Note: To learn more about HDFS, refer to What Is HDFS? Hadoop Distributed File System Guide.

Step 3: Create an External Table

1. After you import the data file to HDFS, initiate Hive and use the syntax explained above to create an external table.

2. To verify that the external table creation was successful, type:

select * from [external-table-name];

The output should list the data from the CSV file you imported into the table:

Using the select command to see the contents of an external table

3. If you wish to create a managed table using the data from an external table, type:

create table if not exists [managed-table-name](
[column1-name] [column1-type], [column2-name] [var2-name], …)
comment '[comment]';

4. Next, import the data from the external table:

insert overwrite table [managed-table-name] select * from [external-table-name];

Using the insert command to import data from an external table to a managed table

5. Verify that the data is successfully inserted into the managed table.

select * from [managed-table-name];

Using the select command to see if the content of the external table successfully transferred to the managed table

How to Query a Hive External Table

To display all the data stored in a table, you will use the select * from command followed by the table name. Hive offers an expansive list of query commands to let you narrow down your searches and sort the data according to your preferences.

For example, you can use the where command after select * from to specify a condition:

select * from [table_name] where [condition];

Hive will output only the rows which satisfy the condition given in the query:

Using conditions to narrow down the output of the select command in Hive

Instead of the asterisk character which stands for “all data”, you can use more specific determiners. Replacing the asterisk with a column name (such as CountryName, from the example above) will show you only the data from the chosen column.

Here are some other useful query functions and their syntax:

Function	Syntax
Query a table according to multiple conditions	*`select from [table_name] where [condition1] and [condition2];`**
Order table data	`select [column1_name], [column2_name] from [table_name] order by [column_name];`
Order table data in descending order	`select [column1_name], [column2_name] from [table_name] order by [column_name] desc;`
Show the row count	*`select count() from [table_name];`**

How to Drop a Hive External Table

1. Dropping an external table in Hive is performed using the same drop command used for managed tables:

drop table [table_name];

The output will confirm the success of the operation:

2. Querying the dropped table will return an error:

Querying a dropped external table returns an error.

However, the data from the external table remains in the system and can be retrieved by creating another external table in the same location.

Conclusion

After reading this tutorial, you should have general understanding of the purpose of external tables in Hive, as well as the syntax for their creation, querying, and dropping.

Was this article helpful?

YesNo

How to Create an External Table in Hive

Creating an External Table in Hive – Syntax Explained

Create a Hive External Table – Example

Step 1: Prepare the Data File

Step 2: Import the File to HDFS

Step 3: Create an External Table

How to Query a Hive External Table

How to Drop a Hive External Table

How to Restart Windows Server 2016 – Reboot Commands

How to Restart or Reboot Linux Server from the Command Line

How to Install Hadoop on Ubuntu 18.04 or 20.04

LEAVE A REPLY Cancel reply

Most Popular

7 Best Books for Learning SQL [2024 Edition]

Introduction to Web Scraping

Must Do Coding Questions for Product Based Companies

Algorithm to solve Rubik’s Cube

Recent Comments

EDITOR PICKS

Java Program to Find the Largest of three Numbers

How to Implement Shapeable ImageView in Android?

Php Program to Find maximum element of each row in a matrix

POPULAR POSTS

Python – Consecutive Kth column Difference in Tuple List

Difference Between poll() and remove() method of Queue Interface in Java

numpy.defchararray.encode() in Python

POPULAR CATEGORY

ABOUT US

FOLLOW US