#1 — Error updating feeds

by Christian Ledermann last modified Feb 17, 2009 11:29 PM
State Resolved
Version:
Area Functionality
Issue type Bug
Severity Important
Submitted by Christian Ledermann
Submitted on Oct 02, 2006
Responsible Reinout van Rees
Target release: 1.0


I added a feedfolder with two feeds.

After creation this folder is empty.

When I select update feeds, it gives the error:

Traceback (innermost last):
  Module ZPublisher.Publish, line 115, in publish
  Module ZPublisher.mapply, line 88, in mapply
  Module ZPublisher.Publish, line 41, in call_object
  Module Products.feedfeeder.browser.feed, line 23, in __call__
  Module Products.feedfeeder.browser.feed, line 20, in update
  Module Products.feedfeeder.utilities, line 29, in retrieveFeedItems
  Module Products.feedfeeder.utilities, line 53, in _retrieveSingleFeed
  Module Products.feedfeeder.feedparser, line 236, in __getattr__
AttributeError: object has no attribute 'id'
Steps to reproduce:
Plone version overview
Plone 2.5,
Zope (Zope 2.9.4-final, python 2.4.3, linux2),
Python 2.4.3 (#1, Jul 12 2006, 13:47:38) [GCC 3.4.5 20051201 (Red Hat 3.4.5-2)],
PIL 1.1.5
Added by Reinout van Rees on Oct 02, 2006 11:10 AM
Severity: Important → Medium
Target release: None
Responsible manager: reinout → (UNASSIGNED)
That sounds like feedparser complaining about the quality of the feeds: they don't have an ID, or there's an entry that's missing an ID. Are you by chance loading Atom feeds? Feedparser is probably a bit picky about that (rightfully so).
Added by Christian Ledermann on Oct 02, 2006 11:43 AM
Ooops, sorry, these were RSS feeds (I implicitly thought that, as feedfeeder uses feedparser, it would be able to read any syndication format). It works fine with Atom feeds.
Added by Christian Ledermann on Oct 02, 2006 12:01 PM
It would be nice, though, if feedparser had a graceful fallback for feeds without an id. Maybe a hash of the URL could be a solution. Yes, I know some feeds 'reuse' URLs for different messages, but for those feeds that do not do that (all Plone RSS feeds behave well in that respect) it would add some value. As for the RSS feed, the rss:description should be added to the description of the feeditem.
Added by Christian Ledermann on Oct 02, 2006 12:18 PM
Sorry for the noise, but I just had second thoughts about adding rss:description to the description of the feeditem. As some sites place HTML markup in the description, it would be preferable if the user could choose where to put the rss:description: in the body or in the description. This setting could be made on the feedfolder (well yes, this way I have to decide which feeds I put in which folder, but that is only a slight inconvenience). A smart folder can then consolidate the feeds from the different feedfolders.
Added by Reinout van Rees on Oct 02, 2006 01:23 PM
Issue state: unconfirmed → open
Severity: Medium → Important
Responsible manager: (UNASSIGNED) → reinout
URL as fallback: sounds like a good idea.
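
The URL fallback discussed above could look roughly like the following. This is only a minimal sketch in current Python, not the code in feedfeeder: `entry_signature` is a hypothetical helper name, and entries are assumed to be mapping-like objects with optional `id` and `link` keys, as feedparser entries usually are.

```python
import hashlib


def entry_signature(entry):
    """Return a hex digest that identifies a feed entry.

    Hypothetical helper for illustration: prefers the entry's id and
    falls back to its link when the feed supplies no id.
    """
    # Assumption: `entry` supports .get(), like a dict.
    key = entry.get("id") or entry.get("link")
    if not key:
        raise ValueError("entry has neither an id nor a link")
    # MD5 is used only as a fingerprint here, not for security.
    return hashlib.md5(key.encode("utf-8")).hexdigest()
```

As noted in the thread, feeds that reuse one URL for several messages would produce the same signature for each of them, so the fallback only helps for feeds with stable, unique links.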
Added by Christian Ledermann on Dec 28, 2006 10:21 AM
Patch (1.0beta1):

8a9
> import stripogram
56c57,60
< sig = md5.new(entry.id)
---
> try:
> sig = md5.new(entry.id)
> except:
> sig = md5.new(entry.link)
82c86
< summary = getattr(entry, 'summary' , '')
---
> summary = stripogram.html2text(getattr(entry, 'summary' , '').encode('ascii', 'ignore'),ignore_tags=('img',))[:400]



stripogram is used to strip out HTML (as this is no good for descriptions). The length of the description is limited to 400 characters (I think that is enough). Unfortunately, stripogram only works with ASCII, so the .encode strips out all non-ASCII characters, which will cause problems with non-English websites (wfm).

RSS feeds can be processed with this patch
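
For comparison, the summary clean-up can be sketched without the ASCII restriction. Note that this swaps in the standard library's `html.parser` in place of stripogram and targets current Python, so it is not what the patch does; `plain_summary` and `_TextExtractor` are hypothetical names, and the 400-character default simply mirrors the limit chosen above.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text content of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Tags (including <img>) never reach this method, only text does.
        self.parts.append(data)


def plain_summary(html, limit=400):
    """Strip markup from `html`, collapse whitespace, and truncate."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.parts).split())
    return text[:limit]
```

Because the text is never encoded to ASCII, characters such as umlauts survive; the truncation still cuts at a fixed character count and may end mid-word.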
Added by Reinout van Rees on Dec 28, 2006 10:45 AM
Issue state: open → in-progress
Target release: None
Thanks for the patch, I'll apply it next week (bug me if I haven't).

Do you by chance have collective svn access? You're free to commit the patch yourself then.
Added by Christian Ledermann on Dec 29, 2006 12:27 PM
The try/except to generate the id is pretty unintrusive, but I am a bit worried about the

summary = stripogram.html2text(getattr(entry, 'summary' , '').encode('ascii', 'ignore'),ignore_tags=('img',))[:400]

part, which might need more discussion before incorporating it.
Added by Maurits van Rees on Feb 17, 2009 11:29 PM
Issue state: In progress → Resolved
Target release: 1.0
The try/except has been there since r48812 (September 2007) thanks to thomasdesvenain. Closing the issue.
