fgets is slow

Gateway 2000
September 9, 2009 at 12:32:40
Specs: Microsoft Windows XP Professional, AMD Sempron 1.6 GHz / 2 GB of RAM
Hey, I'm facing a strange problem. I've written an application that reads lines from a file using fgets. I developed the previous version of the program in C using Pelles C. For this new version, I converted the program to C++ and built it with Visual C++ 2008 Express Edition, but I still use fgets. The problem is that fgets is much slower in this version than in the C version: the old version reads the lines of a 131 MB text file in 23 seconds, while the new one takes 1 minute 49 seconds, executing the exact same code!! The code is below. I've stripped everything else out to isolate the bottleneck.

int ProcessFile(char* in)
{
	FILE* fh1;
	char line[4097];

	fh1 = fopen(in,"r");
	if(fh1 == NULL)
		return -1;

	while(!feof(fh1))
	{
		fgets(line,4096,fh1); // bottleneck
	}

	fclose(fh1);

	return 0;
}

It's no better with optimizations on or off. It makes no difference if I'm in Debug or Release mode. Is there a compiler option I need to check? Thanks!

WinSimple Software
CompTIA A+ Certified




#1
September 10, 2009 at 04:12:29
I don't know how the performance of the Pelles C run-time library compares with that of Visual Studio's, but one thing to consider: are you running both apps (Pelles C and VS) on the exact same, one-and-only 131 MB file, or on two separate copies of it (e.g. in different directories)? I ask because one copy may be contiguous on disk while the other may be fragmented.


#2
September 10, 2009 at 07:17:03
Other possible sources of the slowdown include the read buffer size, and the fact that in text mode the C run-time library converts every CR LF line terminator ("\r\n") in the file to LF ("\n") as it reads. Perhaps Pelles C uses a larger buffer, or perhaps it opens files in binary mode by default? I'm just speculating, because I don't know anything about Pelles C.
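One way to test both theories on the VS build is a minimal sketch like the one below. count_lines_fast is a hypothetical helper, and the 1 MB buffer size is an arbitrary assumption. It opens the file in binary mode ("rb"), so no CR LF translation takes place, and uses the standard setvbuf call to give the stream a larger buffer than the default. Whether either change actually helps with a given run-time library has to be measured. Note that in binary mode every line still ends in "\r\n", so any code that strips the newline must also strip the "\r".

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts the lines in a file, reading it in binary
   mode (no CR LF -> LF translation) through a larger-than-default stdio
   buffer installed with setvbuf. Returns the number of lines, or -1 if
   the file cannot be opened. */
long count_lines_fast(const char *path)
{
    const size_t bufsize = 1 << 20;  /* 1 MB stdio buffer: an arbitrary choice */
    char line[4096];
    char *iobuf;
    FILE *fp;
    long lines = 0;
    int at_line_start = 1;           /* next chunk begins a new line */

    assert(path != NULL);

    fp = fopen(path, "rb");          /* binary mode: bytes pass through unchanged */
    if (fp == NULL)
        return -1;

    /* setvbuf must be called before any other operation on the stream.
       If allocation or setvbuf fails, the default buffer is kept. */
    iobuf = malloc(bufsize);
    if (iobuf != NULL && setvbuf(fp, iobuf, _IOFBF, bufsize) != 0) {
        free(iobuf);
        iobuf = NULL;
    }

    while (fgets(line, sizeof line, fp) != NULL) {
        size_t len = strlen(line);

        /* A line longer than the buffer arrives in several chunks;
           only the first chunk of each line is counted. */
        if (at_line_start)
            lines++;
        at_line_start = (len > 0 && line[len - 1] == '\n');
    }

    fclose(fp);   /* close first: the stream uses iobuf until fclose */
    free(iobuf);  /* free(NULL) is a no-op */
    return lines;
}
```

Timing this against the original "r" loop on the same file would show whether buffering or text-mode translation accounts for the gap.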

Also, some comments on your function:

int ProcessFile(char* in)

You are not modifying this argument, so why not declare it as const char *in?

char line[4097];

Why 4097? You are passing 4096 to fgets, and fgets never writes more than that many characters, including the terminating null character. You don't need to declare one extra byte.

while(!feof(fh1))
{
	fgets(line,4096,fh1); // bottleneck
}

feof does not report end-of-file until AFTER a read has already run into it. When fgets reads the last line, the end-of-file indicator is usually not set yet, so the loop goes around one more time: fgets fails and returns NULL, but you are still inside the while-loop trying to process data that doesn't exist (line still holds the previous line). This while(!feof(fh1)) technique worked in Pascal, but it doesn't work in C. Test the return value of fgets instead:

int ProcessFile(const char* in)
{
	FILE* fh1;
	char line[4096];

	fh1 = fopen(in, "r");
	if (fh1 == NULL)
		return -1;

	while (fgets(line, sizeof line, fh1))
	{
		//...
	}

	fclose(fh1);

	return 0;
}
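A small sketch that makes the extra pass visible: the two hypothetical functions below count how many times each loop body runs over the same stream. For a file whose last line ends in a newline (the usual case), the feof version runs once more than there are lines. On that extra pass fgets returns NULL and, per the C standard, leaves the array unchanged, so the last line would be processed twice.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical: counts loop iterations using the while(!feof()) pattern. */
int iterations_feof_loop(FILE *fp)
{
    char line[4096];
    int n = 0;

    assert(fp != NULL);
    while (!feof(fp)) {
        fgets(line, sizeof line, fp);  /* return value ignored, as in the original */
        n++;
    }
    return n;
}

/* Hypothetical: counts loop iterations when fgets's return value controls the loop. */
int iterations_fgets_loop(FILE *fp)
{
    char line[4096];
    int n = 0;

    assert(fp != NULL);
    while (fgets(line, sizeof line, fp) != NULL)
        n++;
    return n;
}
```

On a stream containing "one\ntwo\nthree\n", the first function returns 4 and the second returns 3 (call rewind between them, which also clears the end-of-file indicator).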



#3
September 10, 2009 at 12:55:50
Thanks for your reply! The same file was being processed by both versions, yes.

I think I figured it out. I used a profiler, which led me to conclude that fgets was the problem, so I simplified the code to post it here. I didn't test what I pasted here. I should've. The real problem is that ftell is slow.

Here's how it was really done:

while((ftell(fh1)) < filesize)
{
    fgets(LINE, 4096, fh1);
}

I switched it to the loop I pasted before, and now it finishes in 5 seconds.

I used 4097 as the buffer size because using 4096 caused it to crash. Figure that one out.

I noticed the mistake I made with !feof and fixed it.



#4
September 10, 2009 at 15:51:56
How ftell works with text files is unspecified by the ISO C Standard: for a text stream, the value it returns only has to be usable by fseek to get back to the same position. I think Microsoft's ftell has to account for the CR LF to LF translation on every call, scanning the data it has buffered for newline characters and counting each one as the two characters (CR LF) it occupies on disk. Done once per line over a 131 MB file, that bookkeeping adds up. Other versions of ftell simply return the absolute position (as if the file were opened in binary mode), which is obviously much faster.
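If the loop needs to know how far through the file it is (as the ftell(fh1) < filesize condition suggests), one workaround is to keep your own running byte count instead of calling ftell on every line. A minimal sketch; process_with_progress is a hypothetical name, and it assumes the file is opened in binary mode (so the bytes fgets returns match the bytes on disk) and contains no '\0' bytes, since strlen stops at the first one.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: reads a file line by line and tracks how many bytes
   have been consumed without calling ftell. Binary mode keeps CR LF as two
   bytes, so the final count equals the file size. Assumes no embedded '\0'
   bytes. Returns the number of bytes processed, or -1 on open failure. */
long process_with_progress(const char *path)
{
    char line[4096];
    long done = 0;
    FILE *fp;

    assert(path != NULL);
    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        done += (long)strlen(line);  /* bytes consumed so far */
        /* ... process line here; 'done' can be compared with the
           file size to report progress ... */
    }

    fclose(fp);
    return done;
}
```

In text mode on Windows the same count would come out smaller than the file size, because each CR LF pair is delivered as a single '\n'.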

I have no idea why your program was crashing when LINE was 4096 in size. fgets reads at most one character less than the number you pass as its second argument, which leaves room at the end of the buffer for the null terminator.
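A small sketch of that size rule; read_chunk is a hypothetical helper, and the tiny buffer is deliberate so the truncation is easy to see. With char buf[4] and the input "abcdefgh\n", three successive calls store "abc", "def" and "gh\n", and a fourth call returns -1 at end-of-file. A buffer of exactly n bytes is therefore enough for fgets(buf, n, fp).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: calls fgets(buf, n, fp) and returns the number of
   characters stored (not counting the terminating '\0'), or -1 on
   end-of-file or error. fgets stores at most n - 1 characters and always
   terminates them with '\0', so buf needs exactly n bytes. */
int read_chunk(FILE *fp, char *buf, int n)
{
    assert(fp != NULL && buf != NULL && n > 1);
    if (fgets(buf, n, fp) == NULL)
        return -1;
    return (int)strlen(buf);
}
```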



