Friday, January 10, 2025
Google search engine
HomeLanguagesJavaSpring Batch Example – Read and Process CSV File to MySQL Database

Spring Batch Example – Read and Process CSV File to MySQL Database

GeekMart is a multinational technology company that focuses on the e-commerce business. It has a large number of suppliers who supply huge volumes of products to them. Suppliers share the product details in CSV file format. Help GeekMart to read the product details from the CSV file and store it in the “Product” table of MySQL database.

  • Requirement 1: Read the records from the Products.csv file and store the data in the Products table
  • Requirement 2: Do not store the products in the database if the quantity of the product is zero.
  • Requirement 3: Before storing the product details in the database, increase the price of the product. Increase the price by 2%.
  • Requirement 4: Skip all the products whose format is not as expected. A maximum of 20 records can be skipped by the application. If more than 20 products are not in the proper format, the job should fail.
  • Requirement 5: If any records are skipped from writing into the database, display all such product details on the console at the end of the job.

Example

Input:

ProductId Name Category Quantity Price
P1 Samsung Galaxy Mobile 10 19999
P2 Samsung Galaxy Note 8 Mobile 8 38999
P3 MacBook Pro Laptop 5 209999
P4 MacBook Pro Air Laptop 4 109999
P5 Sony Bravia TV 25 63690
P6 LG Ultra HD TV 20 37990

(Sample Products.csv file)

Output: The spring batch should create a job that reads data from the Products.csv file, process it & filters records as per the requirements, and finally insert it into the Products tables in MySQL.

Approach: It follows a batch processing approach using the Spring Batch framework to read the data from the Products.csv file, apply some transformations to the data, and then store the data in the Products table of a MySQL database.

Here is a high-level overview of the approach used in the code:

  1. Configure Job and Step: Spring Batch provides a configuration mechanism to define jobs and steps. In this code, the ProductJob is defined as the job that reads the data from the CSV file, processes it, and stores it in the database. The ProductStep is defined as a step within the job that defines the reader, processor, and writer for the batch job.
  2. Define Reader: Spring Batch provides multiple readers for different file formats. In this code, FlatFileItemReader is used to read the data from the Products.csv file. The reader reads the data line-by-line and maps each line to a Product object.
  3. Define Processor: The processor is responsible for processing the data before writing it into the database. In this code, the processor applies two transformations to the data: it increases the price of each product by 2%, and it skips the products whose quantity is zero.
  4. Define Writer: Spring Batch provides multiple writers for different types of databases. In this code, JdbcBatchItemWriter is used to write the data into the MySQL database. The writer takes a list of Product objects and writes them to the database using JDBC.
  5. Define Skip Policy: In this code, a custom skip policy is defined to skip the products whose format is not as expected. If more than 20 products are not in the proper format, the job fails.
  6. Define Listener: Spring Batch provides multiple listeners for different events in a batch job. In this code, a custom listener is defined to display the skipped product details on the console.

Overall, this approach is a standard and effective way of processing large volumes of data in batches, where data can be read, processed, and written in chunks, making it suitable for handling large amounts of data with limited memory resources.

To fulfill the above requirements, we can use Spring Batch to read the CSV file, process the records and write them into the MySQL database. Here is a sample Spring Batch job configuration for the same:

Java




// Configuration file
@Configuration
@EnableBatchProcessing
public class ProductBatchConfiguration {
  
    private final JobBuilderFactory jobBuilderFactory;
    private final StepBuilderFactory stepBuilderFactory;
    private final DataSource dataSource;
  
    public ProductBatchConfiguration(
        JobBuilderFactory jobBuilderFactory,
        StepBuilderFactory stepBuilderFactory,
        DataSource dataSource)
    {
        this.jobBuilderFactory = jobBuilderFactory;
        this.stepBuilderFactory = stepBuilderFactory;
        this.dataSource = dataSource;
    }
  
    @Bean
    public Job importProductJob(
        JobCompletionNotificationListener listener)
    {
        return jobBuilderFactory.get("importProductJob")
            .incrementer(new RunIdIncrementer())
            .listener(listener)
            .flow(productStep())
            .end()
            .build();
    }
  
    @Bean public Step productStep()
    {
        return stepBuilderFactory.get("productStep")
            .<Product, Product>chunk(10)
            .reader(productItemReader())
            .processor(productItemProcessor())
            .writer(productItemWriter())
            .faultTolerant()
            .skipLimit(20)
            .skip(CsvFormatException.class)
            .listener(new ProductSkipListener())
            .build();
    }
  
    @Bean
    public FlatFileItemReader<Product> productItemReader()
    {
        return new FlatFileItemReaderBuilder<Product>()
            .name("productItemReader")
            .resource(new ClassPathResource("Products.csv"))
            .delimited()
            .delimiter(",")
            .names(new String[] { "productId", "name",
                                  "category", "quantity",
                                  "price" })
            .linesToSkip(1)
            .fieldSetMapper(
                new BeanWrapperFieldSetMapper<Product>() {
                    {
                        setTargetType(Product.class);
                    }
                })
            .build();
    }
  
    @Bean
    public ItemProcessor<Product, Product>
    productItemProcessor()
    {
        return new ItemProcessor<Product, Product>() {
            @Override
            public Product process(Product product)
                throws Exception
            {
                if (product.getQuantity() == 0) {
                    return null; // do not store if quantity
                                 // is zero
                }
                product.setPrice(
                    product.getPrice()
                    * 1.02); // increase price by 2%
                return product;
            }
        };
    }
  
    @Bean
    public JdbcBatchItemWriter<Product> productItemWriter()
    {
        return new JdbcBatchItemWriterBuilder<Product>()
            .itemSqlParameterSourceProvider(
                new BeanPropertyItemSqlParameterSourceProvider<>())
            .sql(
                "INSERT INTO Product (productId, name, " "category, quantity, price) VALUES " "(:productId, :name, :category, :quantity, " ":price)")
            .dataSource(dataSource)
            .build();
    }
}
  
// this code is contributed by prophet1999


Here, we have defined a Spring Batch job named importProductJob with a single step named productStep. The job reads the records from the Products.csv file using a FlatFileItemReader, processes each record using an ItemProcessor, and writes the valid records into the MySQL database using a JdbcBatchItemWriter.

In the productStep, we have also defined fault-tolerant behavior by setting the skipLimit to 20 and handling CsvFormatException using the skip method. Additionally, we have added a ProductSkipListener to capture all the skipped records. 

We also have a Product model class defined with getters and setters for each attribute in the CSV file. Finally, we have a JobCompletionNotificationListener to print the job execution status at the end of the job. You can run this Spring Batch job using a JobLauncher instance as follows:

Java




@Autowired
JobLauncher jobLauncher


Complexity Analysis

  • Reading from CSV file: The time complexity of reading from a CSV file using the FlatFileItemReader is O(n), where n is the number of lines in the file. The space complexity of the reader is also O(n), as it reads one line at a time and stores it in memory.
  • Processing: The time complexity of the processing step is O(n), where n is the number of products. The processing step applies two transformations to the data: it increases the price of each product by 2%, which is a constant-time operation, and it skips the products whose quantity is zero, which is also a constant-time operation.
  • Writing to the database: The time complexity of writing to the database using the JdbcBatchItemWriter is O(n), where n is the number of products. The space complexity of the writer is also O(n), as it stores the products in memory before writing them to the database.
  • Skip policy: The time complexity of the skipping policy is O(1), as it checks if the number of skipped records exceeds 20, which is a constant-time operation.
  • Listener: The time complexity of the listener is O(k), where k is the number of skipped products. The space complexity of the listener is also O(k), as it stores the skipped products in memory before displaying them on the console.
  • Overall, the time complexity of the code is O(n), where n is the number of products, 
  • The space complexity is also O(n), as it stores the products in memory before writing them to the database. However, the space complexity of the code could increase to O(n+k), where k is the number of skipped products if a large number of products are skipped.
RELATED ARTICLES

Most Popular

Recent Comments