Opened 13 years ago

Closed 13 years ago

Last modified 13 months ago

#8572 closed patch (wontfix)

scanf for Common::File

Reported by: salty-horse Owned by: fingolfin
Priority: normal Component: --Other--
Keywords: Cc:
Game:

Description

Attached is an implementation of scanf for the file
class. It is very basic, and may not meet everyone's
needs. It may also be unportable - Are there any C
standards being violated?

The reason for the implementation:

During my fiddlings with the Kingdom O' Magic I needed
to parse large text files. Some of the data in the
files was spread over several lines.

* Using readLine was a bit unreadable since there were
several instances where I called readLine over and over
just to skip lines.
* Reading the entire file into a string and using
sscanf was very slow since sscanf calls strlen() each
and every time. It's very noticable on large files.

Using the new scanf greatly improved the readability
and speed of my text-parsing code.

Ticket imported from: #1553631. Ticket imported from: patches/677.

Attachments (2)

common-file-scanf.patch (1.0 KB ) - added by salty-horse 13 years ago.
common-file-scanf-gcc-warning.patch (1.4 KB ) - added by salty-horse 13 years ago.

Download all attachments as: .zip

Change History (11)

by salty-horse, 13 years ago

Attachment: common-file-scanf.patch added

by salty-horse, 13 years ago

comment:1 by salty-horse, 13 years ago

I'm attaching a new patch that adds gcc scanf warnings to
the scanf function.

I'm not sure why cdecl was added to error(), warning() and
friends in common/util.h. the scanf warnings work fine
without it.

comment:2 by fingolfin, 13 years ago

Owner: set to fingolfin

comment:3 by fingolfin, 13 years ago

I am rather sceptical about this patch. This functionality just doesn't seem to
belong into the File class. It's a crude hack in my eyes... maybe it's the only
efficient / sane way to do it, in which case I might be convinced, but as it is, I
have to wonder if there isn't a better / more elegant solution....

comment:4 by salty-horse, 13 years ago

It's an action on a file, so it naturally belongs in the
File object, SeekableReadStream, MemoryReadStream or ReadStream.

Putting it in one of the ReadStream's will lose some of the
efficiency, because sscanf() will have to be used instead of
scanf(), which as I have already stated, calls strlen()
(instead of just using the size() method of those objects).

If the stream and file classes were based on C++'s iostream,
the scanf feature would be built-in in all of the input
streams - has this been considered, or is the use of stdio a
leftover from the C to C++ transition?

Could you specify why you see this as a hack, and why it
doesn't belong in the File class?

comment:5 by fingolfin, 13 years ago

We do not use iostreams, or the STL, for portability and efficiency reasons.
This has been covered on scummvm-devel in the past, go search the forums.
So it's *not* a matter of some kind of "C to C++ transition" or anything like
that. Same for our custom container classes, our custom string class, etc.

As for why I consider it a hack: First of, scanf is a function that acts on an
arbitrary stream of data. If at all, it should be part of the ReadStream class.

The point that this will cause a performance penalty is valid, but is the wrong
argument in my eyes -- just putting something somewhere because it's the
quickest way to get a working efficient "solution" usually is not a good drive
for the design of a class, and I am deeply convinced that this is just the case
here.
After all, scanf is a very raw C function. It's powerful, but arcane. It's nothing
I'd put into a C++ stream class, even less into a file class (you don't see a
"scanf" method in iostreams either).

Personally, I'd think it would make more sense to have such a function as an
external function that simply takes a ReadStream as one of its input
parameters.

But all in all, I still don't quite understand why you can't read those files line-
by-line and user scanf on them. If having to skip empty lines is your only
concern, well, writing a wrapper around readLine that does that automatically
is a trivial task. So could you elaborate what exactly is problematic about this?

comment:6 by salty-horse, 13 years ago

Using scanf() is not neccesary. I can read the files
line-by-line, but I think it's messy.
For example, a loop that contains:

f.readLine(line, LINE_BUFFER_SIZE);
sscanf(line, "%d", &index);
sscanf(line, "%*d %s %d %d", /*vars*/);
f.readLine(_characters[index].desc, 50);
stripUndies(_characters[index].desc);
f.readLine(line, LINE_BUFFER_SIZE);
sscanf(line, "%d", /*vars*/);
f.readLine(line, LINE_BUFFER_SIZE);
sscanf(line, "%d %d %d %d %d",/*vars*/);
f.readLine(line, LINE_BUFFER_SIZE);
sscanf(line, "%d %d %d", /*vars*/);
f.readLine(line, LINE_BUFFER_SIZE);
sscanf(line, "%d %d %d %d", /*vars*/);
f.readLine(line, LINE_BUFFER_SIZE);
sscanf(line, "%d %d %d %d", /*vars*/);
// Skip seperator lines
f.readLine(line, LINE_BUFFER_SIZE);
f.readLine(line, LINE_BUFFER_SIZE);

Can turn into:

f.scanf("%d", &index);
f.scanf(
"%s %d %d\n"
"%[^\r]\n"
"%d\n"
"%d %d %d %d %d\n"
"%d %d %d\n"
"%d %d %d %d\n"
"%d %d %d %d\n\n",
/* vars */);
stripUndies(_characters[index].desc);

From 17 function calls to just 3.

All of the text files in the game are parse-able with
readLine(). There aren't even any cases where data sometimes
appears as "1 2 3" and somtimes as "1 2\n3", so I don't need
to check for the number of matches sscanf() made (with
File::scanf() I wouldn't have to worry about this at all).

I could make a readLineScanf() that groups some of the calls .

Regarding the file class, scanf() is an advantage to working
with files which cannot be emulated with sscanf(). I think
it's a shame to miss out on such a great feature. Despite
it's complexity, I regard fscanf() on the same level as
fread() and fwrite() which are also accesible via the File
class. All of these functions modifiy the file handle which
is private to the File class.

scanf is not in iostream but the ">>" operator provides
most, if not all, of its features.

If an external func that takes a ReadStream is a better
solution, then why not implement it as a member of
ReadStream? (Using sscanf on MemoryReadStream would require
some changes, such as adding a \0 to the end of the buffer)

comment:7 by fingolfin, 13 years ago

fscanf is not at all on the same level as fread/fwrite,
but rather above them. Plus it's an extremely arcane
yet powerful function. Just the thing for people that
love to write tight and hard to read but efficient code
(i.e. C coders *g*). I love it and scanf in programming
contests. For maintainable production code, though,
uhm.... But I digress :-)

Adding zero bytes to the end of memorystreams is no
option. A memory stream represents a part of memory
(which might even be read-only), not a C-string, and
hence shouldn't and couldn't get a zero byte added at
the end.

Based on what I read here, it seems to me the easy and
simple solution is that you write yourself a
readLineScanf() wrapper and use that. Should we ever
have to parse large text files in an engine, we can
rediscuss this, but I am not willing to dillute the
stream classes just because we save a few lines of
typing and add support for a hypothetically needed
feature...

(And in fact, should we have to parse large files, we
could of course use (f)scanf -- or we could simply
write a simple Parser class, which takes a stream as
parameter, and allows you to call methods like
readDouble, readInt, etc.. Or if you really prefer the
syntactic sugar, overloards operator >> to emulate
iostream. Alas, all that is purely hypothetical raving).

comment:8 by fingolfin, 13 years ago

Resolution: wontfix
Status: newclosed

comment:9 by digitall, 13 months ago

Component: --Other--
Note: See TracTickets for help on using tickets.