High Tech Sorcery

Ruby regexp surprise: \w is not equivalent to [0-9A-Za-z_]

Posted on Mar. 17, 2011, under Ruby On Rails

Despite ample documentation to the contrary, I was surprised to learn that by default Ruby matches \w against all Unicode characters, not just [0-9A-Za-z_]. This led to some strange results in one of my apps, where Unicode quotes were present in the text and were matching \w. The solution I found is to write /\w+/n, which specifies the n kcode rather than the u kcode. Still, it was distressing to find this as the default in both 1.8.7 and 1.9.2, and trying to find accurate information only made things more confusing. The following websites all report that \w is equivalent to [0-9A-Za-z_]:

According to these links, this behavior follows the spec but is not documented anywhere:
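
One way to sidestep the ambiguity entirely is to spell out the character class instead of relying on \w. This is only a minimal sketch, not the /n fix described above: the constant ASCII_WORD and the helper ascii_words are hypothetical names made up for illustration, and the sample string is invented. An explicit class means the same thing regardless of kcode or string encoding.

```ruby
# Hypothetical constant: an explicit ASCII-only "word" class.
# Unlike \w, its meaning does not depend on the regexp's kcode or encoding.
ASCII_WORD = /[0-9A-Za-z_]+/

# Hypothetical helper: returns every run of ASCII word characters in text.
def ascii_words(text)
  text.scan(ASCII_WORD)
end

# Invented sample: curly quotes (U+201C, U+201D) around a word,
# followed by a word containing a non-ASCII letter (U+00E9).
sample = "\u201Chello\u201D caf\u00E9_1"

p ascii_words(sample)  # => ["hello", "caf", "_1"]
```

The curly quotes and the accented letter are simply skipped, so they can never be mistaken for word characters. The trade-off is that legitimate non-ASCII letters are dropped too, which may or may not be what your app wants.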

