fixing a bug in simple-rss

Just walking through some bug fixing.

original:


    if content =~ /([^-_.!~*'()a-zA-Z\d;\/?:@&=+$,\[\]]%)/un then
      CGI.unescape(content).gsub(/(<!\[CDATA\[|\]\]>)/u,'').strip
    else
      content.gsub(/(<!\[CDATA\[|\]\]>)/u,'').strip
    end

The tests under Ruby 1.9.2 give this warning:


simple-rss/lib/simple-rss.rb:155: warning: regexp match /.../n against to UTF-8 string

And this makes sense, because /un in the first regexp is treated the same as /n
(the binary, no-encoding flag), while the content is UTF-8. So it was altered to:


    if content =~ /([^-_.!~*'()a-zA-Z\d;\/?:@&=+$,\[\]]%)/u then
      CGI.unescape(content).gsub(/(<!\[CDATA\[|\]\]>)/u,'').strip
    else
      content.gsub(/(<!\[CDATA\[|\]\]>)/u,'').strip
    end

This fixes the warning and looks nice and consistent, with all the regexps being UTF-8.
However, I ran into a problem in production, where content was apparently ASCII-8BIT,
and got the following error:


incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)
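
To see why the tests passed but production failed, here is a minimal sketch of the encoding rule as I understand it. The strings and the try_match helper are made up for illustration and are not part of simple-rss. It suggests that a /u regexp only rejects a binary string when that string actually contains non-ASCII bytes:

```ruby
# Hypothetical helper: returns the match index, or the exception if matching raised.
def try_match(string, regexp)
  string =~ regexp
rescue Encoding::CompatibilityError => e
  e
end

# Illustrative strings only; in simple-rss, content comes from the parsed feed.
utf8_content   = "caf\u00e9 %20"   # UTF-8 with a non-ASCII character
binary_content = utf8_content.b    # same bytes, tagged ASCII-8BIT
ascii_binary   = "cafe %20".b      # ASCII-8BIT, but only 7-bit bytes

utf8_regexp = /%/u                 # /u pins the regexp to UTF-8

p utf8_regexp.fixed_encoding?               # true
p try_match(utf8_content, utf8_regexp)      # an index: same encoding
p try_match(ascii_binary, utf8_regexp)      # an index: 7-bit data is compatible
p try_match(binary_content, utf8_regexp).class  # Encoding::CompatibilityError
```

So a test suite that only feeds ASCII content would never hit the error.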

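So then I thought: why is the /u needed in the first place? The patterns contain only
ASCII characters, so without the flag they are not tied to a single encoding. So I
changed the code to: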


    if content =~ /([^-_.!~*'()a-zA-Z\d;\/?:@&=+$,\[\]]%)/ then
      CGI.unescape(content).gsub(/(<!\[CDATA\[|\]\]>)/,'').strip
    else
      content.gsub(/(<!\[CDATA\[|\]\]>)/,'').strip
    end

This seems to pass the tests in both 1.8.7 and 1.9.2, so I decided to go with it.
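
As a quick sanity check of the flagless version on Ruby 1.9 or later, the sketch below runs the CDATA-stripping regexp against both a UTF-8 string and the same bytes tagged as binary. CDATA_MARKERS and strip_cdata are hypothetical names used only for this example. They are not in simple-rss, and the CGI.unescape branch is left out to keep it short:

```ruby
# Hypothetical constant and helper for illustration; not part of simple-rss.
CDATA_MARKERS = /(<!\[CDATA\[|\]\]>)/   # no encoding flag, ASCII-only pattern

def strip_cdata(content)
  content.gsub(CDATA_MARKERS, '').strip
end

utf8   = "<![CDATA[caf\u00e9]]>"   # UTF-8 with a non-ASCII character
binary = utf8.b                    # same bytes, tagged ASCII-8BIT

p CDATA_MARKERS.fixed_encoding?    # false: not pinned to one encoding
p strip_cdata(utf8).encoding       # result stays UTF-8
p strip_cdata(binary).encoding     # result stays binary, and no error is raised
```

Each result keeps the encoding of its input, which is the behaviour I wanted: the regexp no longer cares which of the two encodings content arrives in.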
