Friday, January 3, 2025

Java Program to Extract Paragraphs From a Word Document

This article demonstrates how to extract paragraphs from a Word document using the getParagraphs() method of the XWPFDocument class provided by the Apache POI package. Apache POI is a project developed and maintained by the Apache Software Foundation that provides Java libraries for reading and writing Microsoft Office files.

To extract paragraphs from a Word file, add the following Apache POI library, together with the libraries it depends on, to the classpath.

poi-ooxml.jar
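
The OOXML module is needed because a .docx file is a ZIP archive of XML parts: the main text lives in word/document.xml, where each paragraph is a w:p element whose text sits in w:t elements. The sketch below uses only the JDK to show that structure. It is not how Apache POI is implemented, and the class DocxPeek and its methods are hypothetical names chosen for illustration. It builds a tiny in-memory archive (too minimal for Word itself to open) and reads the paragraph text back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; this is not Apache POI code.
class DocxPeek {

    // Namespace of WordprocessingML elements such as w:p and w:t
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns one string per w:p element found in word/document.xml
    // (requires Java 9+ for InputStream.readAllBytes)
    static List<String> paragraphs(InputStream docx) throws Exception {
        List<String> result = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().equals("word/document.xml")) {
                    continue; // skip styles, images, and other parts
                }
                // Copy the entry so the XML parser cannot close the ZIP stream
                byte[] xmlBytes = zip.readAllBytes();
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                Document xml = factory.newDocumentBuilder()
                        .parse(new ByteArrayInputStream(xmlBytes));
                NodeList paras = xml.getElementsByTagNameNS(W_NS, "p");
                for (int i = 0; i < paras.getLength(); i++) {
                    Element p = (Element) paras.item(i);
                    NodeList texts = p.getElementsByTagNameNS(W_NS, "t");
                    StringBuilder sb = new StringBuilder();
                    for (int j = 0; j < texts.getLength(); j++) {
                        sb.append(texts.item(j).getTextContent());
                    }
                    result.add(sb.toString());
                }
            }
        }
        return result;
    }

    // Builds a minimal archive holding only word/document.xml.
    // The paragraph texts are assumed to contain no XML special characters.
    static byte[] tinyDocx(String... paragraphTexts) throws Exception {
        StringBuilder xml = new StringBuilder();
        xml.append("<w:document xmlns:w=\"").append(W_NS).append("\"><w:body>");
        for (String t : paragraphTexts) {
            xml.append("<w:p><w:r><w:t>").append(t).append("</w:t></w:r></w:p>");
        }
        xml.append("</w:body></w:document>");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(xml.toString().getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] docx = tinyDocx("First paragraph", "Second paragraph");
        for (String p : paragraphs(new ByteArrayInputStream(docx))) {
            System.out.println(p);
        }
    }
}
```

A real document also carries tables, headers, styles, and relationships between parts, which this sketch ignores; handling those correctly is the reason to rely on XWPFDocument rather than parsing the XML by hand.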

Approach

  1. Build the path to the Word document.
  2. Create a FileInputStream and an XWPFDocument object for the Word document.
  3. Retrieve the list of paragraphs using the getParagraphs() method.
  4. Iterate through the list of paragraphs and print each one.

Implementation

  • Step 1: Getting the path of the current working directory where the Word document is located.
  • Step 2: Opening a FileInputStream on the above-specified path.
  • Step 3: Creating a document object for the Word document.
  • Step 4: Using the getParagraphs() method to retrieve the list of paragraphs from the Word file.
  • Step 5: Iterating through the list of paragraphs.
  • Step 6: Printing the text of each paragraph.
  • Step 7: Closing the opened resources.
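
Word documents often contain empty paragraphs (for example, blank lines used as spacing), so Steps 5 and 6 may print more blank output than expected. The following is a minimal sketch of one way to handle that, assuming the paragraph texts have already been collected as plain strings (in a POI program they would come from getText() on each paragraph). The class ParagraphPrinter and its method numbered are hypothetical names, not part of Apache POI.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of Apache POI.
class ParagraphPrinter {

    // Drops null or blank texts and numbers the remaining ones from 1
    static List<String> numbered(List<String> texts) {
        List<String> lines = new ArrayList<>();
        int n = 0;
        for (String text : texts) {
            if (text == null || text.trim().isEmpty()) {
                continue; // skip empty paragraphs
            }
            n++;
            lines.add(n + ": " + text);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for the texts of three paragraphs, one of them empty
        List<String> texts = Arrays.asList("Introduction", "", "Body text");
        for (String line : numbered(texts)) {
            System.out.println(line);
        }
        // Prints:
        // 1: Introduction
        // 2: Body text
    }
}
```

Keeping the filtering separate from the file handling means it can be checked without a Word document on disk.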

Sample Input

The input is a Word document named WordFile.docx, located in the current working directory.

Example

// Java program to extract paragraphs from a Word document

// Importing IO classes for basic file handling
import java.io.File;
import java.io.FileInputStream;
import java.util.List;

// Importing Apache POI classes
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

// Main class to extract paragraphs from a Word document
public class GFG {

    // Main driver method
    public static void main(String[] args) throws Exception
    {

        // Step 1: Getting the path of the current working
        // directory where the Word document is located
        String path = System.getProperty("user.dir")
                      + File.separator + "WordFile.docx";

        // Step 2: Opening an input stream on the above
        // specified path.
        // Step 3: Creating a document object for the Word
        // document.
        // Step 7: try-with-resources closes the document and
        // the stream automatically, even if an exception occurs.
        try (FileInputStream fin = new FileInputStream(path);
             XWPFDocument document = new XWPFDocument(fin)) {

            // Step 4: Using the getParagraphs() method to
            // retrieve the list of paragraphs from the Word
            // file.
            List<XWPFParagraph> paragraphs
                = document.getParagraphs();

            // Step 5: Iterating through the list of paragraphs
            for (XWPFParagraph para : paragraphs) {

                // Step 6: Printing the paragraph text followed
                // by a blank line
                System.out.println(para.getText() + "\n");
            }
        }
    }
}


Output

The program prints the text of each paragraph in WordFile.docx, with a blank line after each one.

Dominic Rubhabha-Wardslaus