High Tech Sorcery

fixing a bug in simple-rss

by on Apr.05, 2011, under Development

Just walking through some bug fixing.

The original code:


if content =~ /([^-_.!~*'()a-zA-Z\d;\/?:@&=+$,\[\]]%)/un then
  CGI.unescape(content).gsub(/(<!\[CDATA\[|\]\]>)/u,'').strip
else
  content.gsub(/(<!\[CDATA\[|\]\]>)/u,'').strip
end

Running the tests under Ruby 1.9.2 gives this warning:


simple-rss/lib/simple-rss.rb:155: warning: regexp match /.../n against to UTF-8 string

This makes sense, because /un in the first regex is the same as /n: the regexp is
treated as binary (no encoding) while content is a UTF-8 string.  So it was altered to:


if content =~ /([^-_.!~*'()a-zA-Z\d;\/?:@&=+$,\[\]]%)/u then
  CGI.unescape(content).gsub(/(<!\[CDATA\[|\]\]>)/u,'').strip
else
  content.gsub(/(<!\[CDATA\[|\]\]>)/u,'').strip
end

This fixes the warning and looks nice and consistent, with all the regexps being UTF-8.
However, I ran into a problem in production where content was apparently ASCII-8BIT,
and I got the following error:


incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)
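
To make the failure easier to see, here is a minimal sketch that reproduces it outside of simple-rss. The helper name match_outcome and the sample string are made up for illustration; the only assumption is Ruby 1.9 or later, where a /u regexp has a fixed UTF-8 encoding and refuses to match a string tagged ASCII-8BIT that contains non-ASCII bytes.

```ruby
# Hypothetical helper: reports what happens when a regexp meets a string.
def match_outcome(regexp, string)
  (regexp =~ string) ? :matched : :no_match
rescue Encoding::CompatibilityError
  # Raised when a fixed-encoding regexp meets an incompatible string.
  :incompatible
end

# Hypothetical sample data: the same bytes, tagged with two encodings.
utf8_content   = "caf\u00e9"
binary_content = utf8_content.dup.force_encoding(Encoding::ASCII_8BIT)

puts match_outcome(/caf/u, utf8_content)    # UTF-8 regexp, UTF-8 string
puts match_outcome(/caf/u, binary_content)  # UTF-8 regexp, ASCII-8BIT string
```

The second call is the production case: the bytes are identical, only the encoding tag on the string differs.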

So then I wondered: why is the /u needed in the first place?  The patterns contain only ASCII characters, so I changed the code to:


if content =~ /([^-_.!~*'()a-zA-Z\d;\/?:@&=+$,\[\]]%)/ then
  CGI.unescape(content).gsub(/(<!\[CDATA\[|\]\]>)/,'').strip
else
  content.gsub(/(<!\[CDATA\[|\]\]>)/,'').strip
end

This seems to pass the tests in both 1.8.7 and 1.9.2, so I decided to go with it.
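
As a rough check of why dropping the flag helps, the sketch below runs the flagless CDATA regexp against the same bytes tagged as UTF-8 and as ASCII-8BIT. The constant CDATA_MARKERS, the method strip_cdata, and the sample input are names invented here, not part of simple-rss. It assumes Ruby 1.9 or later, where a flagless regexp made only of ASCII characters does not have a fixed encoding and so adapts to the string it is matched against.

```ruby
# The CDATA pattern from the post, without any encoding flag.
CDATA_MARKERS = /(<!\[CDATA\[|\]\]>)/

# Hypothetical wrapper around the gsub/strip step from the post.
def strip_cdata(content)
  content.gsub(CDATA_MARKERS, '').strip
end

# Hypothetical sample data: the same bytes, tagged with two encodings.
utf8_content   = "<![CDATA[ caf\u00e9 ]]>"
binary_content = utf8_content.dup.force_encoding(Encoding::ASCII_8BIT)

# Neither call raises, because the regexp has no fixed encoding.
puts strip_cdata(utf8_content).encoding
puts strip_cdata(binary_content).bytesize

# Regexp#fixed_encoding? shows the difference the /u flag makes.
puts CDATA_MARKERS.fixed_encoding?
puts(/(<!\[CDATA\[|\]\]>)/u.fixed_encoding?)
```

The trade-off is that the result keeps whatever encoding the input carried, so a caller that needs UTF-8 output still has to deal with ASCII-8BIT input somewhere else.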

