#26 — retrieveSingleFeed fails for certain feeds without guid or link

by Horacio Duran last modified Apr 16, 2010 09:06 PM
State Resolved
Version: 1.0.1
Area Functionality
Issue type Bug
Severity Medium
Submitted by Horacio Duran
Submitted on Apr 16, 2010
Responsible Maurits van Rees
Target release:
I have come across some feeds that the W3C validator marks as valid but that have no guid or link.
This fails when _retrieveSingleFeed in utility.py generates the id.
I did a quick patch for my site, but my knowledge of RSS is limited (the patch is the second try/except and the behavior of its except clause).
I just added another fallback in case link and guid are not present.

    for entry in parsed.entries:
        try:
            sig = md5.new(entry.id)
        except AttributeError:
            # Sometimes, rss providers send items without guid element.
            try:
                sig = md5.new(entry.link)
            except AttributeError:
                # Neither guid nor link is present: fall back to the title.
                sig = md5.new(str(entry.title_detail))
        id = sig.hexdigest()

Steps to reproduce:
Install the product.
Create a feed folder.
Add the following source (it failed at the moment of reporting this bug) http://rss.howardshome.com/[…]/rssnet.aspx?uid=94664&pid=1501042
Update feeds.
Added by Maurits van Rees on Apr 16, 2010 09:06 PM
Issue state: Unconfirmed → Resolved
Responsible manager: (UNASSIGNED) → maurits
Actually, if both the guid element and the link are missing, then we cannot get a unique id, so we cannot know whether this is a new item that should be added or an existing one that should be updated. So we should ignore the entry entirely for safety.
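
The skip-the-entry behaviour described above could look roughly like the sketch below. This is not the change committed in r115532. It is an illustration written for current Python (hashlib instead of the old md5 module), and the helper names entry_signature and identifiable_entries are hypothetical. It assumes entries expose guid as `id` and the link as `link` via attribute access, as in the snippet from the report.

```python
import hashlib
from types import SimpleNamespace


def entry_signature(entry):
    """Return an md5 hex digest for the entry, or None if it has no guid or link."""
    # Prefer the guid (exposed as ``id``), then fall back to the link.
    for attr in ("id", "link"):
        value = getattr(entry, attr, None)
        if value:
            return hashlib.md5(value.encode("utf-8")).hexdigest()
    return None


def identifiable_entries(entries):
    """Yield (signature, entry) pairs, skipping entries that cannot be identified."""
    for entry in entries:
        sig = entry_signature(entry)
        if sig is None:
            # No guid and no link: we cannot tell new items from existing
            # ones, so the entry is ignored instead of guessing an id.
            continue
        yield sig, entry


# Stand-in entries for illustration only; real ones would come from the feed parser.
entries = [
    SimpleNamespace(id="urn:example:1", title="has guid"),
    SimpleNamespace(link="http://example.com/2", title="has link only"),
    SimpleNamespace(title="has neither"),
]
kept = [entry.title for _, entry in identifiable_entries(entries)]
print(kept)  # ['has guid', 'has link only']
```

Unlike the title fallback, this never produces an id from data that may be shared between items, at the cost of silently dropping entries that have neither element.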

Fixed on trunk in r115532; merged to branch 1.0 in r
