Help with regex needed

Posted: Sat Jan 29, 2005 1:41 pm
by Quicksilver
I am trying to write a simple regex which will list all subject names which do not begin with a number.

This is for use in the filters bar (rather than in a filter).

I have tried the following expressions but neither seems to work:

^~[0-9]
~^[0-9]

In fact, Newsbin responds by removing the Post List entries as soon as one of these regexes is copied into the dropdown bar (without pressing FIND). Even using brackets like this does not prevent the instant response:

(^(~[0-9]))

What regex should I use to list all subject names which do not begin with a number?

Posted: Sat Jan 29, 2005 5:19 pm
by Smite
Try ^\D, though I give no guarantees.

Posted: Sun Jan 30, 2005 4:16 am
by Quicksilver
Smite wrote: Try ^\D, though I give no guarantees.


I like that line of thinking. From what I understand, \D defines digits.

Then the addition of the caret here, ^\D, means "not digits".

I would then add an extra caret, with a different function than before, to mean start of line: ^^\D. But it doesn't work.

To try and improve this, I inserted square brackets which, I read, are allowed with \D but are not strictly necessary: ^[^\D].

I'm now stuck! Any ideas?

Posted: Sun Jan 30, 2005 4:45 pm
by Smite
Actually, the ^ means "beginning of subject", and the \D means "Not a number" (\d means "a number").

Posted: Mon Jan 31, 2005 8:50 pm
by Quicksilver
Smite wrote: Actually, the ^ means "beginning of subject", and the \D means "Not a number" (\d means "a number").


Thank you, Smite. You can see my regex is coming along but slowly!

However NB seems to interpret things in a different way because when I use ^\D in the filter window then what I get are all the subject lines which begin with either the letter D or d.

In the end I got to filter out all the lines which begin with a digit by using this expression: ^[^0-9]

Seems that the first caret in that expression is to indicate beginning of a line and the second caret (in the square brackets) negates the class of values in the square brackets.
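
For anyone who wants to check that reading outside Newsbin, here is a minimal sketch in Python. Python's `re` module is not the engine Newsbin uses, and the subject lines are made up for illustration, but the two roles of the caret are the same in any Perl-style engine.

```python
import re

# Hypothetical subject lines, invented for this example.
subjects = [
    "01 - holiday pics.jpg",
    "Holiday pics 01.jpg",
    "adele101.jpg",
    "9 lives.avi",
]

# The ^ outside the brackets anchors the match to the start of the subject.
# The ^ as the first character inside the brackets negates the class.
starts_with_non_digit = re.compile(r"^[^0-9]")

kept = [s for s in subjects if starts_with_non_digit.search(s)]
print(kept)
```

Only the two subjects that start with a letter are kept; a digit later in the subject does not matter because the pattern only looks at the first character.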

Posted: Mon Jan 31, 2005 10:19 pm
by Smite
Forgot to mention that only 4.33 has the newer RegEx engine, the older versions might not behave as expected.

Also worth noting is that NewsBin auto-capitalizes everything, so doing a \d for example, is not possible.

Posted: Tue Feb 01, 2005 5:22 pm
by Kiltme
Smite wrote: Actually, the ^ means "beginning of subject", and the \D means "Not a number" (\d means "a number").


Not in Newsbin's implementation. See http://www.newsbin.com/nb33help/regexp.htm


^ by itself means at the start of a line
^ inside of [] means NOT.

\ is the escape character (the following character is treated as a literal character, not a command)

\[ matches [, not the start of a [] pair.

[0-9] means any digit; \D and \d are not defined.

So the example ^[^0-9] means what you would expect: at the start of the line, a character that is NOT a number.

Posted: Tue Feb 01, 2005 10:42 pm
by Smite
As I said, in the newer version. Check the sticky at the top of the forum, that's real (including v4.33) RegEx.

Posted: Wed Feb 02, 2005 12:01 am
by Quade
Yep, pretty sure "\d" and "[0-9]{3}" are legal now. The second one is "three numbers".


/*************************************************
* Perl-Compatible Regular Expressions *
*************************************************/

/*
This is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language. See
the file Tech.Notes for some information on the internals.

Written by: Philip Hazel <ph10@cam.ac.uk>


That is the library I use now: PCRE.

Posted: Mon Feb 07, 2005 8:00 pm
by Kiltme
Quade wrote:Yep, pretty sure "\d" and "[0-9]{3}" are legal now. The second one is "three numbers".


[0-9]{3}-[0-9]{3} works as expected to filter out pedophile spam (123-456).
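
The {3} quantifier repeats the preceding class exactly three times. A small Python sketch of that pattern follows; it uses Python's `re` rather than Newsbin's PCRE build, and the sample strings are invented.

```python
import re

# Three digits, a literal hyphen, then three digits.
three_dash_three = re.compile(r"[0-9]{3}-[0-9]{3}")

# Hypothetical sample strings.
samples = [
    "call 123-456 now",
    "12-345",
    "abc 1234-5678 xyz",
    "no digits here",
]

hits = [s for s in samples if three_dash_three.search(s)]
print(hits)
```

Note that the third sample also matches, because the unanchored pattern finds 234-567 inside the longer run of digits.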

\D does something, but not as expected. (in 4.33B6)

In the find bar:

\d is impossible to enter (auto capitalization)
\D appears to match a number anywhere

For example

adele[0-9] lists adele101, adele02, etc
adele\D lists those plus ones like adeleoutdoors1

So \D isn't [^0-9] as it's defined and it's not exactly [0-9] either.

Plus I just discovered I can't copy with <ctrl>c from the find box?!

<ctrl>v works for paste, but to copy you have to use right-click -> copy.

Posted: Mon Feb 07, 2005 8:27 pm
by Quade
adele[0-9] lists adele101, adele02, etc

This works because it matches some part of the subject. If you just wanted one number you need to bound it, either with a space or some other character.

adele[0-9][A-Za-z]
adele[0-9][ ]

or something like that.
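
To illustrate the bounding idea, here is a rough sketch in Python (not Newsbin itself; the subjects and the `matching` helper are made up for the example):

```python
import re

# Hypothetical subjects.
subjects = ["adele1 live", "adele1a", "adele101", "adele02"]

def matching(pattern):
    """Return the subjects in which the pattern matches anywhere."""
    return [s for s in subjects if re.search(pattern, s)]

# Unbounded: matches any subject containing "adele" plus one digit,
# even when more digits follow.
unbounded = matching(r"adele[0-9]")

# Bounded: the single digit must be followed by a letter or by a space.
then_letter = matching(r"adele[0-9][A-Za-z]")
then_space = matching(r"adele[0-9][ ]")

print(unbounded)
print(then_letter)
print(then_space)
```

The unbounded pattern lists all four subjects, while each bounded pattern keeps only the subject where exactly one digit is followed by the required character.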

What's with all the pedo spam? I was commenting to Dex about it the other day. Smaller groups are absolutely full of it.

Posted: Wed Feb 09, 2005 7:45 pm
by Kiltme
It wasn't the behavior of [0-9] that I thought was weird; it was the different behavior of \D.

From the spec (not the implementation), \d is supposed to be the same as [0-9], but it's not.
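
For comparison, this is what the Perl-style definitions give in another engine. The sketch below uses Python's `re` with the `re.ASCII` flag (without it, Python's \d also matches non-ASCII digits); it shows the documented meaning only and says nothing about what 4.33B6 actually does in the find bar.

```python
import re

# Hypothetical text built from the subject names mentioned above.
text = "adele101 and adeleoutdoors1"

# \d and [0-9] find the same characters.
digits_shorthand = re.findall(r"\d", text, flags=re.ASCII)
digits_class = re.findall(r"[0-9]", text)

# \D and [^0-9] both require a non-digit right after "adele".
after_adele_shorthand = re.findall(r"adele\D", text, flags=re.ASCII)
after_adele_negated = re.findall(r"adele[^0-9]", text)

print(digits_shorthand == digits_class)
print(after_adele_shorthand, after_adele_negated)
```

Under these definitions adele\D matches adeleoutdoors1 but not adele101, which is a different result from the one reported for the find bar.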