If you work with files or reports and need to automate a process around them, you have probably wondered what the most efficient way to do it is. Working in finance, I have experienced this and wondered whether there is a library that saves me from writing boilerplate code. File watchers to the rescue! But wait, what is a file watcher, and how does it work? Any file watcher needs three things:
- Listener – listening to an event like create, update, delete
- Observer – configured on a directory or path, it looks for any event happening in that path and calls the corresponding method of the listener.
- Monitor – an active thread that runs the observer and waits for events.
Observer is a behavioral design pattern that specifies how two kinds of objects communicate: observables and observers. An observable is an object that notifies its registered observers about changes in its state or about events it listens for. Let’s look at the theory and at two Java implementations: Java NIO and Apache Commons IO.
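In code, the pattern comes down to an observable that keeps a list of listeners and calls them when something happens. The sketch below uses hypothetical names (FileEventListener, WatchedDirectory, RecordingListener) that are not part of any library; the fire methods stand in for whatever actually detects the change, whether an OS event or a poll.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener (observer) interface: one callback per event kind.
interface FileEventListener {
    void onCreate(String fileName);
    void onDelete(String fileName);
}

// Hypothetical observable (subject): keeps its listeners and notifies them.
class WatchedDirectory {
    private final List<FileEventListener> listeners = new ArrayList<>();

    void addListener(FileEventListener listener) {
        listeners.add(listener);
    }

    // Called by whatever detects the change (an OS event or a poll).
    void fireCreate(String fileName) {
        for (FileEventListener l : listeners) {
            l.onCreate(fileName);
        }
    }

    void fireDelete(String fileName) {
        for (FileEventListener l : listeners) {
            l.onDelete(fileName);
        }
    }
}

// A listener that only records what it was told, for demonstration.
class RecordingListener implements FileEventListener {
    final List<String> log = new ArrayList<>();

    @Override
    public void onCreate(String fileName) {
        log.add("created " + fileName);
    }

    @Override
    public void onDelete(String fileName) {
        log.add("deleted " + fileName);
    }
}
```

Commons IO’s FileAlterationObserver and FileAlterationListener follow this callback shape; the NIO WatchService instead hands events to you through a queue that your own loop reads.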
Java NIO
The java.nio.file package provides the WatchService API, which watches a registered directory for changes to the files in it. To implement a watch service, follow these steps:
- Create a WatchService for the file system – Listener
- Register the directory you want monitored with the watcher, and define which events you are interested in – Observer
- Implement a loop that waits for incoming events, either blocking until one arrives or checking at regular intervals. Each event that occurs is added to the watcher’s queue; process the events as they arrive and execute your code – Monitor.
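The three steps above can be sketched with the standard java.nio.file classes. This is a minimal sketch, not a production watcher: the class and method names (NioDirectoryWatcher, register, pollOnce, watchForever) are hypothetical, and pollOnce exists only so that a single pass of the loop can be exercised on its own.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

import static java.nio.file.StandardWatchEventKinds.*;

class NioDirectoryWatcher {

    // Steps 1 and 2: create a WatchService ("listener") and register the
    // directory for the event kinds we care about ("observer").
    static WatchService register(Path dir) throws IOException {
        WatchService watcher = dir.getFileSystem().newWatchService();
        dir.register(watcher, ENTRY_CREATE, ENTRY_MODIFY, ENTRY_DELETE);
        return watcher;
    }

    // One iteration of step 3: wait up to timeoutMillis for a signalled key,
    // then drain its queued events as "KIND name" strings.
    static List<String> pollOnce(WatchService watcher, long timeoutMillis)
            throws InterruptedException {
        List<String> seen = new ArrayList<>();
        WatchKey key = watcher.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        if (key == null) {
            return seen; // nothing happened within the timeout
        }
        for (WatchEvent<?> event : key.pollEvents()) {
            if (event.kind() == OVERFLOW) {
                seen.add("OVERFLOW"); // events were lost; context() is null here
                continue;
            }
            Path name = (Path) event.context(); // relative to the watched directory
            seen.add(event.kind().name() + " " + name);
        }
        key.reset(); // re-arm the key so further events are delivered
        return seen;
    }

    // Step 3 as a full monitor loop: take() blocks until events arrive.
    static void watchForever(Path dir) throws IOException, InterruptedException {
        try (WatchService watcher = register(dir)) {
            while (true) {
                WatchKey key = watcher.take();
                for (WatchEvent<?> event : key.pollEvents()) {
                    System.out.println(event.kind().name() + " " + event.context());
                }
                if (!key.reset()) {
                    break; // the directory is no longer accessible
                }
            }
        }
    }
}
```

watchForever blocks on take() instead of sleeping and re-checking, which avoids busy polling; poll() with a timeout suits a loop that also has other work to do. The event context is a path relative to the registered directory, so resolve it against that directory before opening the file.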
When should we use the WatchService API?
Applications that need to react to file changes, such as editors and IDEs, use this API; hot deployment of code in an IDE could, for example, be built on it. The WatchService API is driven by events raised at the operating-system level, which are collected in a system buffer. If a large number of files are dropped or updated at once, that buffer can overflow: when the application cannot process the events quickly enough, some of them are lost or discarded before the application reads them, and the watcher signals this with a StandardWatchEventKinds.OVERFLOW event. To handle this situation, we need to implement some form of rate limiting, giving the system time to absorb bursts of activity and avoid OVERFLOW. Note that the API is not designed for indexing an entire storage drive. Most file systems support change notification natively, and the WatchService API uses that support to listen for events sent by the OS. When a file system does not provide this mechanism natively, the WatchService falls back to polling the file system for changes.
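One way to apply the rate-limiting advice above, shown as an assumption rather than a prescribed method, is debouncing: the watch loop only records which paths changed, which keeps it quick to drain events, and each file is processed once it has been quiet for a set period. This also collapses the repeated ENTRY_MODIFY events that a large file can produce while it is still being written. EventDebouncer and its methods are hypothetical names; timestamps are passed in explicitly to keep the sketch deterministic.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical debouncer: a path is released for processing only after it
// has produced no new events for quietMillis.
class EventDebouncer {
    private final long quietMillis;
    // path -> time of its most recent event, kept in order of first arrival
    private final Map<Path, Long> lastEvent = new LinkedHashMap<>();

    EventDebouncer(long quietMillis) {
        this.quietMillis = quietMillis;
    }

    // Cheap: called from the watch loop for every event, so the loop can
    // drain keys quickly and go back to waiting.
    void record(Path path, long nowMillis) {
        lastEvent.put(path, nowMillis);
    }

    // Returns (and forgets) every path that has been quiet long enough.
    List<Path> drainReady(long nowMillis) {
        List<Path> ready = new ArrayList<>();
        Iterator<Map.Entry<Path, Long>> it = lastEvent.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Path, Long> e = it.next();
            if (nowMillis - e.getValue() >= quietMillis) {
                ready.add(e.getKey());
                it.remove();
            }
        }
        return ready;
    }
}
```

In a single-threaded loop, you could call watcher.poll() with a timeout close to quietMillis, record() each event’s path, and then call drainReady(System.currentTimeMillis()) to get the files to process. The class is not thread-safe, so synchronize access if recording and processing run on different threads. Debouncing reduces the work done per event but cannot recover events that were already lost, so the loop should still check for OVERFLOW.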
Apache Commons
Apache Commons IO (package org.apache.commons.io) is a utility library for working with streams, readers, writers, and files; its org.apache.commons.io.monitor package provides a polling-based file watcher. To use it:
- Create FileAlterationListener implementation(s) that process events on the file or directory, such as create, change, and delete events.
- Register the listener(s) with a FileAlterationObserver for the directory or path.
- Register the observer(s) with a FileAlterationMonitor, which runs them periodically on a background thread.
Commons IO does not rely on operating-system events, so there is no event buffer that can overflow. On every poll, the observer lists the files in the directory using File.listFiles() and compares the result with the list obtained from the previous poll:
- If a file appears in the current poll but was not in the previous one, onFileCreate() is invoked on the listener.
- If a file found in the previous poll is missing from the current one, onFileDelete() is invoked on the listener.
- If a file is present in both polls, its attributes, such as last-modified date and length, are compared; if they differ, onFileChange() is invoked on the listener.
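The comparison described in this list can be illustrated with a simplified plain-Java sketch. It is not Commons IO’s actual implementation, which also recurses into sub-directories and reports directory events: SnapshotDiff is a hypothetical class, and a file’s fingerprint is assumed to be just its last-modified time and length.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified version of the snapshot-and-compare idea.
class SnapshotDiff {

    // One poll: file name -> "lastModified:length" fingerprint.
    static Map<String, String> snapshot(File dir) {
        Map<String, String> snap = new TreeMap<>();
        File[] files = dir.listFiles();
        if (files == null) {
            return snap; // not a directory, or an I/O error occurred
        }
        for (File f : files) {
            snap.put(f.getName(), f.lastModified() + ":" + f.length());
        }
        return snap;
    }

    // Compare two polls and name the callback each difference would trigger.
    static List<String> diff(Map<String, String> previous, Map<String, String> current) {
        List<String> events = new ArrayList<>();
        for (Map.Entry<String, String> e : current.entrySet()) {
            String before = previous.get(e.getKey());
            if (before == null) {
                events.add("onFileCreate " + e.getKey()); // new in this poll
            } else if (!before.equals(e.getValue())) {
                events.add("onFileChange " + e.getKey()); // attributes changed
            }
        }
        for (String name : previous.keySet()) {
            if (!current.containsKey(name)) {
                events.add("onFileDelete " + name); // gone since the last poll
            }
        }
        return events;
    }
}
```

A monitor thread would call snapshot() on every poll interval, pass the previous and current maps to diff(), and keep the current map for the next round. The whole directory is listed on every poll whether or not anything changed, which is where the CPU cost of polling comes from.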
Let’s create an example. First, we need an observer for the directory, with our listener registered on it:
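```java
import java.io.File;
import org.apache.commons.io.monitor.FileAlterationObserver;

// Watch the "test" sub-directory of the current working directory
File directory = new File(new File("."), "test");
FileAlterationObserver observer = new FileAlterationObserver(directory);
// "this" must implement FileAlterationListener
observer.addListener(this);
```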
Next, register the observer with a Monitor, which creates a new thread, invoking the observer at the specified interval:
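```java
import org.apache.commons.io.monitor.FileAlterationMonitor;

long pollInterval = 100; // milliseconds between polls
FileAlterationMonitor monitor = new FileAlterationMonitor(pollInterval);
monitor.addObserver(observer);
monitor.start(); // starts the polling thread; declared to throw Exception
// ...
monitor.stop();
```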
The listener holds the file-processing logic. For example, to act on newly created files, we override onFileCreate():
```java
@Override
public void onFileCreate(final File newFile) {
    // processing logic for the newly created file goes here
}
```
Apache Commons IO monitoring uses a polling mechanism: at a fixed interval, the algorithm checks for new events. It works well on Unix and Windows platforms and supports network drives; in my experience, the JDK WatchService gets complicated with network drives. It is also efficient with large volumes of files. However, because it calls listFiles() after every polling interval, it consumes CPU cycles even when few files are arriving. Choosing the right polling interval helps to some extent, but CPU usage remains higher than with the JDK WatchService. The JDK WatchService, on the other hand, is event-based, so no polling is required: it relies on the operating system’s event mechanism and therefore needs less CPU. Its CPU usage depends on the rate at which files change or events are triggered, and a burst of events can lead to event overflow.
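To put the polling cost in numbers, take the 100 ms interval from the monitor example. Each poll lists the whole directory, so the number of listings per day does not depend on how many files actually arrive; a hypothetical 10-second interval is shown for comparison:

$$
\frac{86\,400\ \text{s/day}}{0.1\ \text{s/poll}} = 864\,000\ \text{polls/day}, \qquad \frac{86\,400\ \text{s/day}}{10\ \text{s/poll}} = 8\,640\ \text{polls/day}
$$

Raising the interval to 10 seconds cuts the number of listings a hundredfold, at the cost of detecting a new file up to 10 seconds later.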
Difference Between Apache Commons and Java NIO
| Apache Commons IO | Java NIO |
|---|---|
| Uses a polling mechanism: at fixed intervals, the algorithm checks for new events. | WatchService is event-based, so it does not need polling (unless the file system lacks native support). |
| Useful when large volumes of files are dropped frequently. | Designed for applications that need to be notified about file change events, such as an editor or IDE. |
| Works well on Unix and Windows platforms and supports network drives. | Does not reliably detect changes on network drives. |
| Uses CPU cycles every time it polls. | Relies on the operating system’s event mechanism, so no polling is required. |
| Can handle large volumes of files. | CPU usage depends on the rate at which files change or events are triggered; a burst of events can lead to event overflow. |