I’ve been working on a time-series analysis project where the data are stored as structures in massive binary files. Importing the files into a database would cause a performance hit with no value added, so dealing with the files in their original binary format is the best option. My initial assumption was that throughput would be limited by disk speed, but I found that my first implementation resulted in 100% CPU utilization on my research box. While there is a wealth of information available on the innumerable ways of reading files with C#, there is virtually no discussion of the performance implications of the various design decisions. Hopefully, this article will help readers improve the performance of binary file reading in their applications and will shed some light on the undocumented performance traps hidden in the System.IO classes.

It may seem silly to have a section on checking for the end of a file (EOF), but programmers employ a plethora of methods, and checking for the EOF improperly can absolutely cripple performance and introduce mysterious errors and exceptions to your application.

If you are using BinaryReader.PeekChar in any application, god save you. Judging by the .NET newsgroups, this method is widely used, but I’m not sure why it even exists. According to Microsoft, the BinaryReader.PeekChar method “Returns the next available character and does not advance the byte or character position.” The return value is an int containing “The next available character, or -1 if no more characters are available or the stream does not support seeking.” Gee, that sounds awfully useful in determining if we’re at the end of the stream. The BinaryReader class is used for reading binary files, which are broken into bytes, not chars, so why peek at the next char rather than the next byte? I could understand if there was an issue implementing a common interface, but the TextReader-derived classes just use Peek. Why doesn’t the BinaryReader include a plain old Peek method that returns the next byte as an int?

By now, you’re probably wondering why I’m ranting so much about this. Who cares? So, you get the next byte for free? Well, something entirely unnatural happens somewhere in the bowels of this method that periodically results in a “Conversion Buffer Overflow” exception. As the result of some dark voodoo process, certain two-byte combinations in your binary file cannot be converted into an appropriate return value by the method. I have no idea why certain byte combinations have been deemed toxic to PeekChar, but prepare for freaky results if you use it.

The obvious alternative is to compare the stream’s Position to its Length. If your current position is greater than or equal to the length of the stream, you’re going to be pretty hard-pressed to read any additional data. As it turns out, this statement is a massive performance bottleneck.

After finishing the initial build of my application, it was time for some optimization. I downloaded the ANTS Profiler Demo from Red Gate Software, and was shocked to find that over half the execution time of my program was being spent in the EOF method of my data reader. Without the profiler results, I never would have imagined that this innocuous-looking line of code was cutting the performance of my application in half. After all, I opened the FileStream using the FileShare.Read option, so there was no danger of the file’s length changing, but it appears as though the position and file length are not cached by the class, so every call to Position or Length results in another file system query. In my benchmarking, I’ve found that calling both Position and Length takes twice as long as calling one or the other.

Tracking the position yourself is the fastest method by a long shot. Get the length of your FileStream once when you open it, and don’t forget to advance your position counter every time you read.
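To make the two end-of-file checks discussed in this article concrete, here is a minimal sketch that counts fixed-size records in a binary file both ways. The class name EofSketch, the method names, and the fixed recordSize parameter are hypothetical and chosen only for illustration; the article does not show its own reader code. How large the difference is on any given runtime is something to measure rather than assume.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only; not the article's actual code.
public static class EofSketch
{
    // Pattern the article warns about: Position and Length are both
    // queried on every pass through the loop.
    public static long CountRecordsSlow(string path, int recordSize)
    {
        if (recordSize <= 0) throw new ArgumentOutOfRangeException(nameof(recordSize));
        long count = 0;
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        using (var reader = new BinaryReader(stream))
        {
            while (reader.BaseStream.Position < reader.BaseStream.Length)
            {
                byte[] record = reader.ReadBytes(recordSize);
                if (record.Length < recordSize) break; // trailing partial record
                count++;
            }
        }
        return count;
    }

    // Pattern the article recommends: read Length once after opening,
    // then advance a local position counter after every read.
    public static long CountRecordsFast(string path, int recordSize)
    {
        if (recordSize <= 0) throw new ArgumentOutOfRangeException(nameof(recordSize));
        long count = 0;
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        using (var reader = new BinaryReader(stream))
        {
            long length = stream.Length; // queried once
            long position = 0;           // maintained by hand
            while (position + recordSize <= length)
            {
                byte[] record = reader.ReadBytes(recordSize);
                position += record.Length;
                if (record.Length < recordSize) break; // file shorter than expected
                count++;
            }
        }
        return count;
    }
}
```

Both methods should return the same count for the same file. The second relies on the length not changing while the file is open, which is why the file is opened with FileShare.Read.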