Opened 14 years ago

Closed 14 years ago

Last modified 2 years ago

#8572 closed patch (wontfix)

scanf for Common::File

Reported by: salty-horse Owned by: fingolfin
Priority: normal Component: --Other--
Keywords: Cc:
Game:

Description

Attached is an implementation of scanf for the file class. It is very basic, and may not meet everyone's needs. It may also be unportable - Are there any C standards being violated?

The reason for the implementation:

During my fiddlings with the Kingdom O' Magic I needed to parse large text files. Some of the data in the files was spread over several lines.

* Using readLine was a bit unreadable since there were several instances where I called readLine over and over just to skip lines. * Reading the entire file into a string and using sscanf was very slow since sscanf calls strlen() each and every time. It's very noticable on large files.

Using the new scanf greatly improved the readability and speed of my text-parsing code.

Ticket imported from: #1553631. Ticket imported from: patches/677.

Attachments (2)

common-file-scanf.patch (1.0 KB ) - added by salty-horse 14 years ago.
common-file-scanf-gcc-warning.patch (1.4 KB ) - added by salty-horse 14 years ago.

Download all attachments as: .zip

Change History (11)

by salty-horse, 14 years ago

Attachment: common-file-scanf.patch added

by salty-horse, 14 years ago

comment:1 by salty-horse, 14 years ago

I'm attaching a new patch that adds gcc scanf warnings to the scanf function.

I'm not sure why cdecl was added to error(), warning() and friends in common/util.h. the scanf warnings work fine without it.

comment:2 by fingolfin, 14 years ago

Owner: set to fingolfin

comment:3 by fingolfin, 14 years ago

I am rather sceptical about this patch. This functionality just doesn't seem to belong into the File class. It's a crude hack in my eyes... maybe it's the only efficient / sane way to do it, in which case I might be convinced, but as it is, I have to wonder if there isn't a better / more elegant solution....

comment:4 by salty-horse, 14 years ago

It's an action on a file, so it naturally belongs in the File object, SeekableReadStream, MemoryReadStream or ReadStream.

Putting it in one of the ReadStream's will lose some of the efficiency, because sscanf() will have to be used instead of scanf(), which as I have already stated, calls strlen() (instead of just using the size() method of those objects).

If the stream and file classes were based on C++'s iostream, the scanf feature would be built-in in all of the input streams - has this been considered, or is the use of stdio a leftover from the C to C++ transition?

Could you specify why you see this as a hack, and why it doesn't belong in the File class?

comment:5 by fingolfin, 14 years ago

We do not use iostreams, or the STL, for portability and efficiency reasons. This has been covered on scummvm-devel in the past, go search the forums. So it's *not* a matter of some kind of "C to C++ transition" or anything like that. Same for our custom container classes, our custom string class, etc.

As for why I consider it a hack: First of, scanf is a function that acts on an arbitrary stream of data. If at all, it should be part of the ReadStream class.

The point that this will cause a performance penalty is valid, but is the wrong argument in my eyes -- just putting something somewhere because it's the quickest way to get a working efficient "solution" usually is not a good drive for the design of a class, and I am deeply convinced that this is just the case here. After all, scanf is a very raw C function. It's powerful, but arcane. It's nothing I'd put into a C++ stream class, even less into a file class (you don't see a "scanf" method in iostreams either).

Personally, I'd think it would make more sense to have such a function as an external function that simply takes a ReadStream as one of its input parameters.

But all in all, I still don't quite understand why you can't read those files line- by-line and user scanf on them. If having to skip empty lines is your only concern, well, writing a wrapper around readLine that does that automatically is a trivial task. So could you elaborate what exactly is problematic about this?

comment:6 by salty-horse, 14 years ago

Using scanf() is not neccesary. I can read the files line-by-line, but I think it's messy. For example, a loop that contains:

f.readLine(line, LINE_BUFFER_SIZE); sscanf(line, "%d", &index); sscanf(line, "%*d %s %d %d", /*vars*/); f.readLine(_characters[index].desc, 50); stripUndies(_characters[index].desc); f.readLine(line, LINE_BUFFER_SIZE); sscanf(line, "%d", /*vars*/); f.readLine(line, LINE_BUFFER_SIZE); sscanf(line, "%d %d %d %d %d",/*vars*/); f.readLine(line, LINE_BUFFER_SIZE); sscanf(line, "%d %d %d", /*vars*/); f.readLine(line, LINE_BUFFER_SIZE); sscanf(line, "%d %d %d %d", /*vars*/); f.readLine(line, LINE_BUFFER_SIZE); sscanf(line, "%d %d %d %d", /*vars*/); // Skip seperator lines f.readLine(line, LINE_BUFFER_SIZE); f.readLine(line, LINE_BUFFER_SIZE);

Can turn into:

f.scanf("%d", &index); f.scanf( "%s %d %d\n" "%[^\r]\n" "%d\n" "%d %d %d %d %d\n" "%d %d %d\n" "%d %d %d %d\n" "%d %d %d %d\n\n", /* vars */); stripUndies(_characters[index].desc);

From 17 function calls to just 3.

All of the text files in the game are parse-able with readLine(). There aren't even any cases where data sometimes appears as "1 2 3" and somtimes as "1 2\n3", so I don't need to check for the number of matches sscanf() made (with File::scanf() I wouldn't have to worry about this at all).

I could make a readLineScanf() that groups some of the calls .

Regarding the file class, scanf() is an advantage to working with files which cannot be emulated with sscanf(). I think it's a shame to miss out on such a great feature. Despite it's complexity, I regard fscanf() on the same level as fread() and fwrite() which are also accesible via the File class. All of these functions modifiy the file handle which is private to the File class.

scanf is not in iostream but the ">>" operator provides most, if not all, of its features.

If an external func that takes a ReadStream is a better solution, then why not implement it as a member of ReadStream? (Using sscanf on MemoryReadStream would require some changes, such as adding a \0 to the end of the buffer)

comment:7 by fingolfin, 14 years ago

fscanf is not at all on the same level as fread/fwrite, but rather above them. Plus it's an extremely arcane yet powerful function. Just the thing for people that love to write tight and hard to read but efficient code (i.e. C coders *g*). I love it and scanf in programming contests. For maintainable production code, though, uhm.... But I digress :-)

Adding zero bytes to the end of memorystreams is no option. A memory stream represents a part of memory (which might even be read-only), not a C-string, and hence shouldn't and couldn't get a zero byte added at the end.

Based on what I read here, it seems to me the easy and simple solution is that you write yourself a readLineScanf() wrapper and use that. Should we ever have to parse large text files in an engine, we can rediscuss this, but I am not willing to dillute the stream classes just because we save a few lines of typing and add support for a hypothetically needed feature...

(And in fact, should we have to parse large files, we could of course use (f)scanf -- or we could simply write a simple Parser class, which takes a stream as parameter, and allows you to call methods like readDouble, readInt, etc.. Or if you really prefer the syntactic sugar, overloards operator >> to emulate iostream. Alas, all that is purely hypothetical raving).

comment:8 by fingolfin, 14 years ago

Resolution: wontfix
Status: newclosed

comment:9 by digitall, 2 years ago

Component: --Other--
Note: See TracTickets for help on using tickets.