Ruby regexp surprise: \w is not equivalent to [0-9A-Za-z_]
by specialj on Mar.17, 2011, under Ruby On Rails
I was surprised to learn that, despite ample documentation to the contrary, Ruby by default matches \w against all Unicode characters. This led to some strange results in one of my apps, where Unicode quotes were present and matched \w. The solution I found is to use /\w+/n, which specifies the n kcode rather than the u kcode. It was still distressing to find this as the default in 1.8.7 and 1.9.2, and the problem was compounded by how hard it was to find accurate information. The following websites all report that \w is equivalent to [0-9A-Za-z_]:
- Regular expressions (Ruby User’s Guide)
- Rubular: a Ruby regular expression editor and tester
- Ruby Regular Expressions
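If what you want is the behavior those pages describe, one option besides the /n flag is to spell the character class out, so the result does not depend on the kcode at all. The following is only a minimal sketch of that idea, not the /n fix itself: the constant ASCII_WORD and the method ascii_words are names made up for illustration, and the \u escapes in the sample string assume Ruby 1.9 or later.

```ruby
# Hypothetical helper names, for illustration only.
# The explicit class matches ASCII letters, digits and underscore,
# whatever the kcode or string encoding happens to be.
ASCII_WORD = /[0-9A-Za-z_]+/

# Return every run of ASCII word characters found in text.
def ascii_words(text)
  text.scan(ASCII_WORD)
end

# Sample input containing Unicode curly quotes (U+201C and U+201D).
sample = "foo_bar \u201Cquoted\u201D baz42"

p ascii_words(sample)  # the curly quotes are never part of a match
```

With the explicit class, the curly quotes act as separators between matches instead of being swallowed into a word.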
According to these links, this behavior is in line with the spec but is not documented anywhere: