Opened 7 years ago

Closed 6 years ago

Last modified 13 months ago

#6319 closed defect (fixed)

WEB: XML Parse Error with RSS feed

Reported by: SF/mase76 Owned by: bluegr
Priority: normal Component: Web
Keywords: Cc:
Game:

Description

Hi!
I got an error with your RSS feed with Tiny Tiny RSS:
This XML document is invalid, likely due to invalid characters. XML error: Undeclared entity error at line 64, column 24
I reported this to TTRSS, but such errors seem to be site related.

Thomas

Ticket imported from: #3612781. Ticket imported from: bugs/6319.

Change History (14)

comment:1 by digitall, 7 years ago

Summary: changed from "Error with RSS feed" to "WEB: XML Parse Error with RSS feed"

comment:2 by digitall, 7 years ago

mase76: The parse error is associated with this line in our current RSS feed output:
<title>Touch&eacute;: The Adventures of the Fifth Musketeer Music Enhanced Soundtrack Released</title>

Specifically with the "&eacute;" token, which is used to provide an e with an acute accent, as Touché is French! :)

I'm not sure myself if this is a bug as I think this is correctly escaped and not malformed XML AFAIK... I suspect your reader is not dealing well with non-ASCII characters?

comment:3 by DrMcCoy, 7 years ago

$ xmllint scummvm.rss
scummvm.rss:64: parser error : Entity 'eacute' not defined
<title>Touch&eacute;: The Adventures of the Fifth Musketeer Music Enhanced So

No, &eacute; is /not/ a valid XML token. You see it a lot in RSS feeds, and it always pisses me off, because this is /invalid XML/.

Since our RSS feed specifies an UTF-8 encoding, we should use the actual UTF-8 encoded é.

comment:4 by DrMcCoy, 7 years ago

See also: https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Predefined_entities_in_XML

Note that there's no eacute.
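The same check can be reproduced without xmllint. The following is a minimal sketch using Python's standard library parser, not anything taken from the ScummVM site code; the helper name parses_as_xml and the sample strings are made up for illustration. It shows that the named HTML entity is rejected, while a numeric character reference, the literal UTF-8 character, and the five predefined XML entities are all accepted.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(fragment: str) -> bool:
    """Return True if the fragment is well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# HTML named entity: not declared in XML, so the parser rejects it.
named = "<title>Touch&eacute;</title>"
# Numeric character reference: always valid in XML.
numeric = "<title>Touch&#233;</title>"
# Literal character: valid as long as the declared encoding matches.
literal = "<title>Touché</title>"
# The only five entities XML predefines.
predefined = "<t>&amp;&lt;&gt;&quot;&apos;</t>"

for name, fragment in [("named", named), ("numeric", numeric),
                       ("literal", literal), ("predefined", predefined)]:
    print(name, parses_as_xml(fragment))
```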

comment:5 by digitall, 7 years ago

drmccoy: I stand corrected.. I didn't know that XML standardised anything outside of the tag format, though I suspected RSS-XML would be a stricter specific definition, including escaping of non-ASCII characters and/or Unicode support.

Since this is generated from our website news feed, I suspect the code in:
https://github.com/scummvm/scummvm-web/blob/master/templates/feed_rss.tpl

will need some work to deal with replacing this and other HTML ISO-8859-1 escaped characters with their Unicode equivalents... or switch the RSS feed to HTML ISO-8859-1?

comment:6 by DrMcCoy, 7 years ago

If we switch the RSS to ISO-8859-1, we have to replace the "&eacute;" with "&#233;".
There are no predefined entities for those characters in XML at all, and neither in RSS > 0.9. And we really don't want to use RSS 0.9. Also, this won't help with our Atom feed (which, by the way, is currently broken with the inclusion of the invalid &eacute; as well).

What we could theoretically do is define those entities ourselves, by adding a DTD with entity definitions (see https://en.wikipedia.org/wiki/SGML_entity#Syntax ), but from what I heard, many RSS readers won't parse that correctly.
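For illustration, this is roughly what the DTD approach would look like. It is only a sketch: the feed skeleton is a made-up minimal document, not the real ScummVM feed, and Python's standard library parser stands in for a feed reader that does honour an internal DTD subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal feed that declares the entity itself in an
# internal DTD subset instead of relying on HTML's entity table.
FEED_WITH_DTD = """<!DOCTYPE rss [
  <!ENTITY eacute "&#233;">
]>
<rss version="2.0">
  <channel>
    <title>Touch&eacute;</title>
  </channel>
</rss>"""

root = ET.fromstring(FEED_WITH_DTD)
title = root.find("channel/title").text
print(title)
```

A parser that reads the internal subset expands the entity to the actual character. A reader that skips the DOCTYPE would still fail, which is the compatibility concern raised above.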

So, we really should fix our RSS generator to convert the HTML entities into UTF-8 characters.
Interestingly, the ScummVM Planet feed already does that somehow.
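As a rough sketch of the conversion meant here, written in Python rather than the site's actual PHP/Smarty code: decode every HTML entity to its Unicode character, then re-escape only the characters XML itself reserves. The function name html_entities_to_xml_text is hypothetical, and the sketch assumes the input is plain text (such as a news title), not a string containing markup that should survive unescaped.

```python
import html
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def html_entities_to_xml_text(text: str) -> str:
    """Turn HTML-escaped text into text that is safe inside an XML element.

    Hypothetical helper: html.unescape() resolves named and numeric HTML
    entities to Unicode characters, and escape() then re-escapes '&', '<'
    and '>' so that the result stays well-formed XML.
    """
    return escape(html.unescape(text))


raw_title = "Touch&eacute;: The Adventures of the Fifth Musketeer"
fixed_title = html_entities_to_xml_text(raw_title)

# Serialise as UTF-8 bytes, matching the encoding the feed declares.
element = "<title>" + fixed_title + "</title>"
parsed = ET.fromstring(element.encode("utf-8"))
print(parsed.text)
```

Re-escaping after decoding matters: a title containing "&amp;" would otherwise end up with a bare "&" in the feed and break it in a different way.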

comment:7 by digitall, 7 years ago

Not surprising, as the planet uses a different web server config. Its templates are here: https://github.com/scummvm/scummvm-sites/tree/web-planet/scummvm_template . Looking at these, it should be possible to update the main site's RSS and Atom feeds to fix this.

comment:8 by digitall, 7 years ago

Owner: set to djwillis

comment:9 by digitall, 7 years ago

djwillis: Since fixing this will need someone who can do Smarty / PHP code to modify the templates, can you look at this as I assume you did this for the Planet code?

comment:10 by SF/mase76, 6 years ago

I tried to add the feed to the owncloud newsreader.
It throws an error of an invalid xml.
So that is the second reader, which cannot handle
the feed.

comment:11 by digitall, 6 years ago

mase76: Thank you for the further information, but as you can see from the comments here, we know the cause of this... however, we don't have many developers with familiarity with PHP/Smarty CMS who can work on the tpl code to fix this.

If you are capable of PHP, then please feel free to provide a patch to our code to fix this:
https://github.com/scummvm/scummvm-web/blob/master/templates/feed_rss.tpl

The planet setup deals with this, but is subtly different and thus the solution can NOT be copied across... hence why this bug is still open...
https://github.com/scummvm/scummvm-sites/tree/web-planet/scummvm_template

If you can't fix this, please be patient as djwillis and most of the other project developers are very busy IRL and thus this may take some time to fix...

comment:12 by bluegr, 6 years ago

Owner: changed from djwillis to bluegr
Resolution: fixed
Status: changed from new to closed

comment:13 by bluegr, 6 years ago

This has now been fixed in commit eb35f1bb8c69066474ff8c07e6fc70a93b4b8193. Thanks for reporting, closing.

comment:14 by digitall, 13 months ago

Component: set to Web