Just walking through some bug fixing.
Original:
if content =~ /([^-_.!~*'()a-zA-Z\d;\/?:@&=+$,\[\]]%)/un then
  CGI.unescape(content).gsub(/(<!\[CDATA\[|\]\]>)/u,'').strip
else
  content.gsub(/(<!\[CDATA\[|\]\]>)/u,'').strip
end
The tests in Ruby 1.9.2 give this warning:
simple-rss/lib/simple-rss.rb:155: warning: regexp match /.../n against to UTF-8 string
This makes sense, because /un on the first regex is treated the same as /n (no encoding,
i.e. ASCII-8BIT). So it was altered to:
if content =~ /([^-_.!~*'()a-zA-Z\d;\/?:@&=+$,\[\]]%)/u then
  CGI.unescape(content).gsub(/(<!\[CDATA\[|\]\]>)/u,'').strip
else
  content.gsub(/(<!\[CDATA\[|\]\]>)/u,'').strip
end
This fixes the warnings and looks nice and consistent, with all the regexps being UTF-8.
However, I ran into a problem in production where content was apparently ASCII-8BIT
and got the following error:
incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)
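To see where that error comes from, here is a minimal sketch in plain Ruby (1.9 or later). It is not taken from simple-rss: the sample strings and variable names are made up for illustration. As far as I can tell, a regexp carrying the /u flag is pinned to UTF-8, and matching it against an ASCII-8BIT string only blows up when that string actually contains non-ASCII bytes, which would explain why the tests passed and production did not.

```ruby
# Hypothetical sample data: the UTF-8 bytes of "café %41", retagged as ASCII-8BIT.
binary_non_ascii = "caf\u00e9 %41".dup.force_encoding(Encoding::ASCII_8BIT)
# Same idea, but with ASCII-only content.
binary_ascii = "cafe %41".dup.force_encoding(Encoding::ASCII_8BIT)

utf8_regex = /%/u # the /u flag fixes the regexp's encoding to UTF-8

# ASCII-only binary content is still compatible, so this matches quietly.
ascii_result = (utf8_regex =~ binary_ascii)

# Binary content with high bytes is not compatible with a fixed UTF-8 regexp.
error = begin
  utf8_regex =~ binary_non_ascii
  nil
rescue Encoding::CompatibilityError => e
  e
end

puts ascii_result.inspect
puts error.class
```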
That made me wonder why the /u is needed in the first place, since both patterns contain only ASCII characters. So I changed the code to:
if content =~ /([^-_.!~*'()a-zA-Z\d;\/?:@&=+$,\[\]]%)/ then
  CGI.unescape(content).gsub(/(<!\[CDATA\[|\]\]>)/,'').strip
else
  content.gsub(/(<!\[CDATA\[|\]\]>)/,'').strip
end
This seems to pass the tests in 1.8.7 and 1.9.2, so I decided to go with it.
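As a quick sanity check on Ruby 1.9 or later, the sketch below runs the flag-less detection pattern against the same bytes tagged first as UTF-8 and then as ASCII-8BIT. The constant name and the sample strings are hypothetical, chosen only for this illustration. My understanding is that an ASCII-only regexp literal without an encoding flag is not pinned to any encoding, so it adapts to whichever string it is matched against.

```ruby
# The escape-detection pattern from the final version, with no encoding flag.
PATTERN = /([^-_.!~*'()a-zA-Z\d;\/?:@&=+$,\[\]]%)/

# Hypothetical sample data: one string, two encoding tags over the same bytes.
utf8   = "caf\u00e9 %41"
binary = utf8.dup.force_encoding(Encoding::ASCII_8BIT)

# Neither match raises or warns; both find the " %" sequence.
utf8_match   = (utf8 =~ PATTERN)
binary_match = (binary =~ PATTERN)

puts PATTERN.fixed_encoding? # false: the pattern is not tied to one encoding
puts utf8_match.inspect
puts binary_match.inspect
```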