Despite ample documentation to the contrary, I was surprised to learn that by default Ruby matches \w to all Unicode characters. This led to some strange results in one of my apps, where Unicode curly quotes were present and were being matched by \w. The solution I found is to use /\w+/n, which specifies the n (none) kcode rather than the u (UTF-8) kcode. Still, it was distressing to find this as the default behavior in both 1.8.7 and 1.9.2. Finding reliable information made matters worse. The following websites all report that \w is equivalent to [0-9A-Za-z_]:
- Regular expressions (Ruby User’s Guide)
- Rubular: a Ruby regular expression editor and tester
- Ruby Regular Expressions
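Since the sources above describe \w as [0-9A-Za-z_], one way to get exactly that behavior regardless of kcode settings is to spell the character class out explicitly. This is a minimal sketch that assumes a UTF-8 source file. ASCII_WORD and ascii_words are hypothetical names for illustration, not part of Ruby's standard library.

```ruby
# encoding: utf-8
# Hypothetical constant: the ASCII-only class that the sources above
# claim \w is equivalent to, written out so it cannot match Unicode.
ASCII_WORD = /[0-9A-Za-z_]+/

# Hypothetical helper: returns every run of ASCII word characters.
def ascii_words(text)
  text.scan(ASCII_WORD)
end

# The curly quotes are non-ASCII, so they act as separators here.
p ascii_words("He said “hello_world” twice")  # => ["He", "said", "hello_world", "twice"]
```

With the explicit class, any non-ASCII character splits a word, so ascii_words("naïve") returns ["na", "ve"]. That is the trade-off of forcing ASCII-only matching.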
According to these links, this behavior is to spec, but it is not documented anywhere: