Ruby regexp surprise: \w is not equivalent to [0-9A-Za-z_]

Despite ample documentation to the contrary, I was surprised to learn that by default Ruby matches \w against Unicode characters as well. This led to some strange results in one of my apps, where Unicode quotes were present in the text and were matching \w. The solution I found is to write /\w+/n, which specifies the n kcode rather than the u kcode. Still, it was distressing to find this as the default in both 1.8.7 and 1.9.2, and trying to find information about it only made things worse. The following websites all report that \w is equivalent to [0-9A-Za-z_]:

According to these links, this behavior is per spec but not documented anywhere:
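
If you would rather not depend on the kcode at all, one option is to spell out the ASCII class instead of using \w. What follows is only a minimal sketch, assuming a Ruby recent enough to support "\u" escapes in strings (1.9 or later). The constant name ASCII_WORD and the sample text are made up for illustration.

```ruby
# encoding: utf-8

# Hypothetical constant: the explicit ASCII class that \w is usually documented as.
ASCII_WORD = /[0-9A-Za-z_]+/

# Made-up sample text: curly quotes around a word, plus a word with an accented letter.
text = "\u201Chello_world\u201D caf\u00E9"

# The explicit class can only ever match ASCII letters, digits and underscore,
# so the curly quotes and the accented letter are left out of every match.
words = text.scan(ASCII_WORD)
p words  # prints ["hello_world", "caf"]
```

The trade-off is that accented letters are excluded along with the quotes, which is exactly what the documented definition of \w implies.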
