#166 — UnicodeEncodeError in RSS feed

State: Resolved
Version: 1.1
Area: Functionality
Issue type: Bug
Severity: Medium
Submitted by: (anonymous)
Submitted on: Jan 25, 2008
Responsible: Maurits van Rees
Target release: 1.2
When clicking on the RSS button, I got an error:


Site error

This site encountered an error trying to fulfill your request. The errors were:

Error Type
    UnicodeEncodeError
Error Message
    'charmap' codec can't encode character u'\u2019' in position 2: character maps to <undefined>
Request made at
    2008/01/25 12:06:51.180 US/Pacific

This happens when the text was pasted from Word (mostly the smart quotes). When I don't click on RSS, the issues show up with no problem.

Here is the traceback.

Traceback (innermost last):
  Module ZPublisher.Publish, line 115, in publish
  Module ZPublisher.mapply, line 88, in mapply
  Module ZPublisher.Publish, line 41, in call_object
  Module Shared.DC.Scripts.Bindings, line 311, in __call__
  Module Shared.DC.Scripts.Bindings, line 348, in _bindAndExec
  Module Products.CMFCore.FSPageTemplate, line 195, in _exec
  Module Products.CMFCore.FSPageTemplate, line 134, in pt_render
  Module Products.PageTemplates.PageTemplate, line 104, in pt_render
   - <FSPageTemplate at /Home/poi-issue-search-rss.xml used for /Home/systems/helpdesk>
  Module TAL.TALInterpreter, line 238, in __call__
  Module TAL.TALInterpreter, line 281, in interpret
  Module TAL.TALInterpreter, line 715, in do_condition
  Module TAL.TALInterpreter, line 281, in interpret
  Module TAL.TALInterpreter, line 691, in do_loop_tal
  Module TAL.TALInterpreter, line 281, in interpret
  Module TAL.TALInterpreter, line 457, in do_optTag_tal
  Module TAL.TALInterpreter, line 442, in do_optTag
  Module TAL.TALInterpreter, line 437, in no_tag
  Module TAL.TALInterpreter, line 281, in interpret
  Module TAL.TALInterpreter, line 542, in do_insertText_tal
  Module Products.PlacelessTranslationService.FasterStringIO, line 119, in write
  Module ZPublisher.HTTPResponse, line 454, in _encode_unicode
  Module encodings.iso8859_15, line 18, in encode
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2019' in position 2: character maps to <undefined>
Steps to reproduce:
Type some "smart quotes" in a Word file and paste them into an issue.
Click on RSS for unconfirmed issues.
Added by Maurits van Rees on Jan 25, 2008 10:25 PM
Target release: 1.2 → None
Responsible manager: maurits → (UNASSIGNED)
Hi,

You may want to take a look at issue #124 which is very similar. I rejected that though, as I could not reproduce it.

What this problem boils down to, I think, is that you pasted a character into your site that cannot be represented in the iso-8859-15 encoding.

So you need to configure your Plone Site to output utf-8 instead of iso-8859-15. Or maybe your browser/feed reader should use utf-8.
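A minimal sketch of that explanation, written for Python 3 and run outside Zope (so no u'' prefix is needed): U+2019, the right single quotation mark that Word's smart quotes produce, has no slot in ISO-8859-15, so a strict encode fails the same way as in the traceback, while UTF-8 can represent it. Windows-1252 (cp1252) is included only because it is the usual Western Windows code page, where the character fits in one byte; that Word text arrives via cp1252 is an assumption. The helper `encodes_cleanly` is hypothetical, not part of Zope or Poi.

```python
# Sketch: which encodings can represent Word's smart quote?
# encodes_cleanly is a hypothetical helper, not a Zope/Plone/Poi API.

SMART_QUOTE = "\u2019"  # RIGHT SINGLE QUOTATION MARK


def encodes_cleanly(text: str, encoding: str) -> bool:
    """Return True if text can be encoded in `encoding` with strict error handling."""
    try:
        text.encode(encoding)  # errors="strict" is the default
    except UnicodeEncodeError:
        return False
    return True


for enc in ("iso8859_15", "cp1252", "utf-8"):
    print(f"{enc:>10}: {encodes_cleanly(SMART_QUOTE, enc)}")

# One byte in cp1252, three bytes in UTF-8.
print(SMART_QUOTE.encode("cp1252"))
print(SMART_QUOTE.encode("utf-8"))
```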

I do not think Poi can do anything about this, but I am not 100 percent sure. It *is* a bit suspicious that viewing the issue works fine but viewing the RSS feed fails.

What happens when you paste that character into a normal Document? Can you view that document normally? And when that Document shows up in an RSS feed (for example through a SmartFolder, I think, or the RSS link for search results), do you get that error too?
Added by Maurits van Rees on Jan 25, 2008 10:49 PM
Issue state: unconfirmed → open
Responsible manager: (UNASSIGNED) → maurits
BTW, the poi-issue-search-rss.xml.pt template differs from the other templates in two ways that may be important here.

It starts with an xml tag which has encoding information:

  <?xml version="1.0" encoding="UTF-8"?>

And it sets a header in the request response:

  request.RESPONSE.setHeader('charset', 'UTF-8')

Hm, so apparently here we really want to send out utf-8 and not whatever encoding the client program asks for, which could make sense. But then we should not be encoding anything into iso-8859-15, only into utf-8. Maybe we can "fix" the incoming request somehow to tell the Zope Page Template machinery that we really want utf-8. Sounds a bit hackish, but worth a shot.
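A minimal Python 3 sketch of what "really send out utf-8" would mean, independent of Zope: the XML declaration, the HTTP `Content-Type` charset and the actual bytes must all name the same encoding. `render_rss` is a hypothetical helper, not Poi's template or Zope's `HTTPResponse` API, and the `text/xml` media type is an assumption. Note that in HTTP the charset is normally a parameter of `Content-Type`, not a header of its own.

```python
# Sketch: keep the XML declaration, Content-Type charset and body bytes in agreement.
# render_rss is hypothetical; it is not Poi's template or a Zope API.
from typing import List, Tuple
from xml.sax.saxutils import escape

ENCODING = "utf-8"


def render_rss(channel_title: str, item_titles: List[str]) -> Tuple[str, bytes]:
    """Return (Content-Type header value, encoded body) for a tiny RSS 2.0 feed."""
    items = "".join(
        f"<item><title>{escape(t)}</title></item>" for t in item_titles
    )
    text = (
        f'<?xml version="1.0" encoding="{ENCODING.upper()}"?>'
        f'<rss version="2.0"><channel><title>{escape(channel_title)}</title>'
        f"{items}</channel></rss>"
    )
    # charset travels as a Content-Type parameter, matching the XML declaration.
    content_type = f"text/xml; charset={ENCODING}"
    # Encode only once, and only with the declared encoding.
    return content_type, text.encode(ENCODING)


ctype, body = render_rss(
    "Helpdesk issues", ["\u201cdouble quotes\u201d", "\u2018single\u2019"]
)
print(ctype)
print(body.decode(ENCODING))
```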

I will see if I can try something next week. Or maybe I have given you enough hints to try something yourself. :)
Added by (anonymous) on Jan 28, 2008 05:45 PM
Okay, I did another test. I made sure that my Firefox is set to utf-8, and added an issue with this content:

“double Quotetation marks”
‘single quotes’

The issue shows up fine, but when I click RSS for all unconfirmed issues, I get the same error as in the original post.

Then I pasted the same content into a new Page/Document, putting those smart quotes into the title, description and body text, and saved it. When I did a search and clicked on the RSS link for a result set that included the new document, the RSS feed worked fine.

These were all done on the same computer, with the same browser settings and the same Firefox RSS reader (live bookmark).

--bin
Added by Maurits van Rees on Jan 28, 2008 10:02 PM
I cannot reproduce your error. But when I paste those curly quotes into an issue, I get question marks in the RSS feed, which is also not nice.
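The question marks fit a lenient encode into ISO-8859-15 somewhere in the chain; which layer does this is an assumption, since the thread does not say. A minimal Python 3 sketch of the two behaviours: a strict encode raises, as in the original traceback, while the `"replace"` error handler substitutes `?` for every character the target encoding lacks.

```python
# Sketch: strict vs. lenient encoding of curly quotes into ISO-8859-15.
# Assumption: some layer uses errors="replace"; the thread does not say which.
sample = "\u201cdouble quotes\u201d \u2018single quotes\u2019"

strict_failed = False
try:
    sample.encode("iso8859_15")  # strict: raises, as in the original traceback
except UnicodeEncodeError as exc:
    strict_failed = True
    print("strict:", exc.reason)

# Lenient: each unencodable character becomes b"?".
lenient = sample.encode("iso8859_15", errors="replace")
print("replace:", lenient)
```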

I fixed that on trunk (at least for me) by not setting any response headers.
See r57837. Can you try it and see if that fixes it for you as well?

You have not mentioned your Plone version yet. Trunk is for Plone 3, so if you are on Plone 2.5, you will have to make that change by hand.

Strangely, the rss template has this as first line, both before and after my fix:

  <?xml version="1.0" encoding="UTF-8"?>

But before my fix the resulting feed in the browser had this as first line:

  <?xml version="1.0" encoding="iso-8859-15"?>

How it does that, I do not know...
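Why a declaration that disagrees with the bytes matters: a feed reader decodes the body using the declared encoding. A minimal Python 3 sketch shows UTF-8 bytes read back under an iso-8859-15 declaration, turning the curly apostrophe into three wrong characters. It only illustrates the symptom and does not explain how Zope rewrote the declaration. `declared_encoding` is a hypothetical helper using a simple regex, not a full XML parser.

```python
# Sketch: decoding the same UTF-8 body under a correct and a mismatched declaration.
# declared_encoding is a hypothetical helper, not a Zope or Plone API.
import re

DECL = re.compile(rb'^<\?xml[^>]*encoding="([^"]+)"')


def declared_encoding(feed: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the XML declaration, or `default` if none."""
    match = DECL.match(feed)
    return match.group(1).decode("ascii") if match else default


body = "<title>It\u2019s broken</title>".encode("utf-8")
good = b'<?xml version="1.0" encoding="UTF-8"?>' + body
bad = b'<?xml version="1.0" encoding="iso-8859-15"?>' + body

for feed in (good, bad):
    enc = declared_encoding(feed)
    text = feed.decode(enc)  # what a reader trusting the declaration would see
    print(enc, "->", repr(text[text.index("<title>"):]))
```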
Added by Maurits van Rees on Jan 28, 2008 10:12 PM
Issue state: open → resolved
Target release: None → 1.2
Actually, on Plone 2.5 with branch 1.1 I *can* reproduce that error. And the fix for trunk indeed fixes it. So I merged that to the branch in r57839.

I will consider this fixed, but it would be great if you could report back whether this really fixes it for you too.
Added by (anonymous) on Jan 28, 2008 11:48 PM
It did fix the problem! Thanks!

--bin
