Members of "word character" set?
Posted: Sun Jan 21, 2007 2:00 pm
Other than a-z, A-Z, 0-9, which characters are considered to be part of the "word char" ( \w ) set in the regex used by newsbin?
I'm asking because, for instance, the underscore char '_' is part of the \w set, but the underscore is often used as a separator char. So if you were looking for the artist "john doe", and were semi-clever with regex, you might use the filter "john\W*doe" - but that misses "john_doe" - so you use "john[\W_]*doe" (or "john(\W|_)*doe" if using a negated set in another set strikes you as weird).
So... I'm wondering what other characters might be part of \w - especially those that might be commonly used as separators.
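To make the underscore issue concrete, here is a rough sketch using Python's re module in ASCII mode. That is only a stand-in: the regex engine in newsbin may define \w differently, and the sample names are made up for illustration. It checks which strings each filter catches, then lists the ASCII characters that \w accepts beyond letters and digits.

```python
import re

# Hypothetical sample subjects, not taken from any real newsgroup.
names = ["john doe", "john_doe", "john.doe", "john-doe", "johndoe"]

# re.ASCII restricts \w to [a-zA-Z0-9_]; this is Python's definition,
# assumed here only as a stand-in for whatever engine newsbin uses.
naive = re.compile(r"john\W*doe", re.ASCII)
fixed = re.compile(r"john[\W_]*doe", re.ASCII)


def matches(pattern, candidates):
    """Return the candidates in which the pattern is found."""
    return [s for s in candidates if pattern.search(s)]


naive_hits = matches(naive, names)  # "john_doe" is absent here
fixed_hits = matches(fixed, names)  # all five names are present here

# Enumerate every ASCII character that \w matches, then keep the ones
# that are neither letters nor digits.
word_chars = [chr(c) for c in range(128) if re.fullmatch(r"\w", chr(c), re.ASCII)]
extra = [ch for ch in word_chars if not ch.isalnum()]

print(naive_hits)
print(fixed_hits)
print(extra)
```

In this engine the underscore is the only non-alphanumeric ASCII member of \w; running the same enumeration without re.ASCII over a wider range would show how Unicode-aware engines extend the set.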
... and thanks!
Bob