The other day I was discussing the finer points of flat file conversion in Application Development (or lack thereof) with one of my colleagues on the Definition 6 Architecture team, and we got into a rather interesting conversation about how much of today’s data still gets transmitted via data documents of some kind. Modern programming practice is to transfer data via on-demand services: it's faster, easier, and in most cases much more secure. The truth of the world we live in is that most older systems, and an uncomfortably large number of new systems, still transmit data to business partners across the world using plain old text files filled with data in whatever standard they were supporting when the system was built. How's that for communication planning? [Note: this practice has a name, Electronic Data Interchange, or EDI, and Microsoft has an exceptional tool for optimizing and synchronizing these efforts, BizTalk Server 2010, but that's another post entirely - JT]
Now, my colleague, who shall remain nameless, pointed out that these systems have been functioning without problem for years, so something must be said for the persistence of such file transfer processes. My carefully worded retort was that this perceived stability was the direct result of poor sods like myself working our tails off to keep them going. At this point I feel it is my duty to disclose that while it doesn’t seem to have affected him personally, my colleague was once a long-time employee of an organization many would point the finger of blame at for the creation and continued use of this file-based data transfer silliness, even in the face of more reliable, efficient alternatives (I won’t disclose the name of this much maligned organization, but it rhymes closely with "aye, be them"). And conceding my colleague’s point, this is surely the reason these processes still exist today. (A clearer case of ‘if it ain’t broke, don’t fix it’ you’ll never find.)
Thus, like them or not, we as developers are stuck with these processes, and it is up to us to implement solutions that accommodate them as best as possible. Now, a simply staggering majority of these systems rely on FTP transfer to distribute their data files, and typically it is up to the receiving party to pick up these files from an FTP "dropbox" and process them in a timely manner. FTP transfer has its own peculiarities that do not help this process, however. The most prominent are unpredictable transfer times and the gap between the moment a file first appears in the destination dropbox and the moment it has finished transferring in its entirety from its origin. Aha! Herein lies our opportunity.
The problem of uncertain FTP delivery schedules forces file recipients to choose between two approaches. They can delay retrieving the files from the dropbox until a time when they are certain the file will be there ("nightly file transfers," etc.), or they can resort to "polling": periodically spinning up a process that looks for the file in the dropbox and, if it is there, begins processing it or passes it along to another process for further manipulation. These practices are inherently flawed. They incur extensive overhead in time and system resources while constantly checking to see if the file has arrived, then lock resources while the file finishes downloading before the processing of the file's data can finally take place.
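For contrast with what follows, here is a minimal sketch of what such a polling process tends to look like. The class and method names (DropboxPoller, WaitForFile) and the parameters are hypothetical, invented purely for illustration; real polling jobs vary widely.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Threading;

public static class DropboxPoller
{
    // Returns the full path of the first file matching searchPattern,
    // or null if nothing matching shows up within maxAttempts checks.
    public static string WaitForFile(string directory, string searchPattern,
                                     TimeSpan interval, int maxAttempts)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            // Every pass re-enumerates the directory, whether or not
            // anything has changed since the last look.
            string found = Directory.EnumerateFiles(directory, searchPattern)
                                    .FirstOrDefault();
            if (found != null)
            {
                return found;
            }

            // Nothing yet: sleep, then go around again.
            Thread.Sleep(interval);
        }
        return null;
    }
}
```

Note that even when this finds a file, it has no idea whether the transfer has actually finished; it only knows the file name exists.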
With .NET, however, these problems can be solved easily enough. If we can’t change the process, at least we can make it better, right? Let’s get started.
The key to our solution is a little-known member of the .NET System.IO namespace, the FileSystemWatcher class. The FileSystemWatcher does exactly what its name implies: it’s a lightweight object that monitors a directory and raises events in response to any changes that occur there. We’re going to build a small console application and use a FileSystemWatcher to monitor our FTP directory for any new files that get created. Code Segment 1 details our System.IO.FileSystemWatcher implementation.
Code Segment 1
using System;
using System.IO;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        // This should be an actual FTP directory path,
        // preferably read from an App.config file
        String ftpPath = "path-to-ftp-directory";

        // Our watcher!
        FileSystemWatcher watcher = new FileSystemWatcher(ftpPath);

        // Add an event handler for the Created event
        watcher.Created += new FileSystemEventHandler(OnCreated);

        // Begin watching.
        // You need to set this to enable the FileSystemWatcher to raise events
        watcher.EnableRaisingEvents = true;

        // Keep the console alive until the user enters 'q'
        Console.WriteLine("Press 'q' to quit the sample.");
        while (Console.Read() != 'q') ;
    }

    // The OnCreated and ProcessFile methods (Segments 2 and 3) belong inside this class
Right, so pretty straight-forward so far: a simple console app in which we set up the directory to be watched, create a new instance of the FileSystemWatcher class, then enable it to raise events whenever anything occurs in that directory. Now, the event we’re clearly interested in here is the Created event, which fires whenever a new file is created in our directory (for an FTP upload, that is generally the moment the file first appears, not the moment the transfer completes). To react to this Created event we wire up an instance of the FileSystemEventHandler delegate to the watcher’s Created event and point it to our method OnCreated, which is outlined in Segment 2.
Code Segment 2
    // Define the event handler
    private static void OnCreated(object source, FileSystemEventArgs e)
    {
        // Write file name and arrival time out to the console when new files arrive
        StringBuilder sb = new StringBuilder();
        sb.Append("File: ");
        sb.Append(e.FullPath);
        sb.Append(" arrived @");
        sb.Append(DateTime.Now.ToShortTimeString());
        sb.Append(". Processing...");
        Console.WriteLine(sb.ToString());

        // Now route the file to where it needs to go.
        ProcessFile(e.FullPath);
    }
Again, fairly straight-forward: the OnCreated method simply reacts to the watcher’s Created event and allows us to kick off whatever further processing we need to, namely writing out to the console the name and arrival time of the new file, then handing the file off to another method for further processing. Notice the FileSystemEventArgs object in the OnCreated method’s signature. Our handler has to accept it in order to match the FileSystemEventHandler delegate, but it also gives us a number of key details about the Created event, in particular the e.FullPath property, which we’ll use to programmatically work with the newly arrived FTP file. The ProcessFile method is outlined in Segment 3.
Code Segment 3
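    private static void ProcessFile(String filepath)
    {
        FileInfo file = new FileInfo(filepath);

        // Switch to handle different file types.
        // FileInfo.Extension includes the leading dot (".txt", not "txt"),
        // and we lower-case it so ".TXT" and ".txt" route the same way.
        switch (file.Extension.ToLowerInvariant())
        {
            case ".txt":
                // process text logic here
                break;
            case ".xml":
                // process xml logic here
                break;
            case ".csv":
                // process csv logic here
                break;
            default:
                break;
        }
    }
} // end of class Program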
Ok, so this is the final piece of our solution. We want our console app to be running constantly on our FTP server to watch our FTP dropbox at all times, so we need to be sure it’s as lightweight as possible and doesn’t maintain any internal state whatsoever. Otherwise we’re adding extra load to our FTP server, and that’s entirely against what we set out to do in the first place, isn't it? So let’s not do that.
The ProcessFile method is our routing method to move the file or notify any further services down the line that the file has arrived [an exceptional opportunity to implement the .NET Event Pattern, discussed in my previous post - JT]. We create a FileInfo object to read the extension of the file and route the file to a final destination based on file type. By doing so, our console app never opens the file, never reads it into memory, and never maintains anything that would drain resources away from our server’s memory or processing pool, so it can run quietly alongside the rest of the server’s workload, catching every new file that arrives in our FTP directory and routing each one to its final destination.
Ooo, Aahh.
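To make the routing idea a bit more concrete, here is one way the "process ... logic here" placeholders might be filled in: a lookup table from extension to destination folder, followed by a plain file move. Treat it as a sketch under assumptions. The FileRouter class, its method names, and the folder names ("text-inbox" and friends) are all hypothetical and would really come from configuration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FileRouter
{
    // Hypothetical destination folders, keyed by extension (leading dot included).
    // The comparer makes ".CSV" and ".csv" route to the same place.
    private static readonly Dictionary<string, string> Routes =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { ".txt", "text-inbox" },
            { ".xml", "xml-inbox" },
            { ".csv", "csv-inbox" }
        };

    // Returns the destination path for a file, or null when its extension is not routed.
    public static string ResolveDestination(string filePath, string routingRoot)
    {
        string extension = Path.GetExtension(filePath);
        string folder;
        if (!Routes.TryGetValue(extension, out folder))
        {
            return null;
        }
        return Path.Combine(routingRoot, folder, Path.GetFileName(filePath));
    }

    // Moves the file to its destination without reading its contents.
    // Returns false when the extension is not routed.
    public static bool Route(string filePath, string routingRoot)
    {
        string destination = ResolveDestination(filePath, routingRoot);
        if (destination == null)
        {
            return false;
        }
        Directory.CreateDirectory(Path.GetDirectoryName(destination));
        File.Move(filePath, destination);
        return true;
    }
}
```

Inside ProcessFile this would boil down to a single call such as FileRouter.Route(filepath, routingRoot). Two caveats: File.Move throws if the destination file already exists, and it can throw an IOException if the FTP server still has the file locked mid-transfer, so production code would want to handle both.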
Now, as simple as this solution is, we need to recognize what this implementation saves us. Every time a typical FTP polling process started up, it would need to first gain access to the FTP directory, declare the directory reference in memory, then enumerate all the child directories (even if there weren’t any), followed by enumerating all the files contained in the directory to check if the file the batch process is looking for is there. And because the file transfer cannot be guaranteed to arrive at a specified time, the process would have to execute repeatedly until the file was finally found. Our FileSystemWatcher class, on the other hand, simply responds to events that occur within the FTP dropbox, consuming substantially fewer resources.
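If we want the watcher to stay as lean as possible, we can also narrow what it listens for and make sure we hear about it when something goes wrong. The sketch below sets the NotifyFilter, IncludeSubdirectories and Filter properties and subscribes to the watcher's Error event, which is raised (among other cases) when the watcher's internal event buffer overflows and notifications are lost. The WatcherFactory class and its Create method are hypothetical names used only for this illustration.

```csharp
using System;
using System.IO;

public static class WatcherFactory
{
    // Builds a watcher that only reports file-name changes
    // (creations, renames, deletions) in a single directory.
    // The caller is responsible for setting EnableRaisingEvents.
    public static FileSystemWatcher Create(string directory,
                                           FileSystemEventHandler onCreated)
    {
        FileSystemWatcher watcher = new FileSystemWatcher(directory)
        {
            // Ignore size, attribute and last-write notifications
            NotifyFilter = NotifyFilters.FileName,
            // Watch the dropbox itself, not its child directories
            IncludeSubdirectories = false,
            // Match every file; narrow to e.g. "*.csv" if only one type is expected
            Filter = "*.*"
        };

        watcher.Created += onCreated;
        watcher.Error += OnError;
        return watcher;
    }

    private static void OnError(object sender, ErrorEventArgs e)
    {
        // An InternalBufferOverflowException here means events were dropped,
        // so a one-off scan of the directory may be needed to catch up.
        Exception ex = e.GetException();
        Console.Error.WriteLine("Watcher error: " + ex.Message);
    }
}
```

In Main, this would replace the direct constructor call: create the watcher through WatcherFactory.Create(ftpPath, OnCreated), then set EnableRaisingEvents to true as before.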
So the key to gracefully processing FTP files without having to wait for the entire file to arrive is to process the file asynchronously. Doing so allows the main program to continue receiving file processing requests without having to wait for the code that actually processes the file. .NET provides a number of different avenues to finish that thought with, particularly in .NET 4.0, but that’s a bigger topic I’ll save for a later post. (Oh yes I did)
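As a small taste of what that could look like, here is a rough sketch that hands each new file to a .NET 4.0 Task and has that background task wait until the file can be opened exclusively before processing it. Everything here is an assumption for illustration: the AsyncHandoff class, its method names, and the retry settings are hypothetical, and treating "can be opened with FileShare.None" as "the transfer is finished" only holds if the FTP server keeps the file locked while writing it. Note also that the check briefly opens a handle on the file, though it never reads the contents.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncHandoff
{
    // Returns true once the file can be opened exclusively, which is taken
    // here as a sign that the writer has finished with it.
    public static bool WaitUntilReady(string path, int maxAttempts, TimeSpan delay)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            try
            {
                // FileShare.None fails while any other process holds the file open
                using (new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.None))
                {
                    return true;
                }
            }
            catch (IOException)
            {
                // Still locked (or not visible yet): wait and retry
                Thread.Sleep(delay);
            }
        }
        return false;
    }

    // Runs the readiness check and the processing on a background task,
    // so the event handler that calls this returns immediately.
    public static Task HandleCreatedAsync(string path, Action<string> process)
    {
        return Task.Factory.StartNew(() =>
        {
            if (WaitUntilReady(path, 60, TimeSpan.FromSeconds(1)))
            {
                process(path);
            }
            else
            {
                Console.Error.WriteLine("Gave up waiting for " + path);
            }
        });
    }
}
```

With this in place, the last line of OnCreated could become AsyncHandoff.HandleCreatedAsync(e.FullPath, ProcessFile), which lets the watcher get back to listening while the file finishes arriving.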
So to recap, yes we still have to deal with flat file transfers, but at least we can do so in a better manner, can’t we? Oh yes we can.