Help with regex needed

Tips on writing regular expressions for searching the post list

Moderators: Quade, dexter

Postby Quicksilver » Sat Jan 29, 2005 1:41 pm

I am trying to write a simple regex which will list all subject names which do not begin with a number.

This is for use in the filters bar (rather than in a filter).

I have tried the following expressions, but neither seems to work:

^~[0-9]
~^[0-9]

In fact, Newsbin responds by removing the Post List entries as soon as one of these regexes is copied into the dropdown bar (without pressing FIND). Even using brackets like this does not prevent the instant response:

(^(~[0-9]))

What regex should I use to list all subject names which do not begin with a number?
Quicksilver
Active Participant
 
Posts: 88
Joined: Wed Apr 07, 2004 4:46 pm

Postby Smite » Sat Jan 29, 2005 5:19 pm

Try ^\D, though I give no guarantees.
Please read the FAQ before asking any questions.
If you're new to newsgroups, and the files on them, you can find a very helpful guide here.
Smite
Katamari Damacy Addict
 
Posts: 5318
Joined: Sat May 19, 2001 1:54 am
Location: Alberta, Canada

Registered Newsbin User since: 03/27/03

Postby Quicksilver » Sun Jan 30, 2005 4:16 am

Smite wrote: Try ^\D, though I give no guarantees.


I like that line of thinking. From what I understand, \D defines digits.

Then the addition of the caret here ^\D means "not digits".

I would then add an extra caret, this time with a different function, to mean start of line: ^^\D. But it doesn't work.

To try to improve this, I inserted square brackets which, I read, are allowed with \D but are not strictly necessary: ^[^\D].

I'm now stuck! Any ideas?

Postby Smite » Sun Jan 30, 2005 4:45 pm

Actually, the ^ means "beginning of subject", and the \D means "Not a number" (\d means "a number").
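
For anyone who wants to check these meanings outside NewsBin, here is a minimal sketch that uses Python's re module as a stand-in for a Perl-style engine. The subject lines are made up, and NewsBin's own engine may behave differently depending on the version.

```python
import re

# Hypothetical subject lines, invented for illustration.
subjects = ["01 - intro.mp3", "adele101.jpg", "Readme first", "7zip setup"]

# ^ anchors at the start of the subject; \D is any character that is not a digit.
non_digit_start = re.compile(r"^\D")
# \d is the opposite class: any digit.
digit_start = re.compile(r"^\d")

print([s for s in subjects if non_digit_start.search(s)])
print([s for s in subjects if digit_start.search(s)])
```

The first list holds the subjects that begin with a non-digit, the second those that begin with a digit.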

Postby Quicksilver » Mon Jan 31, 2005 8:50 pm

Smite wrote: Actually, the ^ means "beginning of subject", and the \D means "Not a number" (\d means "a number").


Thank you, Smite. You can see my regex is coming along, but slowly!

However, NB seems to interpret things differently: when I use ^\D in the filter window, I get all the subject lines which begin with the letter D or d.

In the end I managed to filter out all the lines which begin with a digit by using this expression: ^[^0-9]

It seems that the first caret in that expression indicates the beginning of a line, and the second caret (inside the square brackets) negates the class of characters in the brackets.
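
The two roles of the caret can be sketched as follows. This uses Python's re module rather than NewsBin, so treat it as an illustration of the general syntax only; the subject lines and the helper name `matching` are invented.

```python
import re

# Hypothetical subject lines, invented for illustration.
subjects = ["123 holiday pics", "holiday pics 123", "2005 backup"]

def matching(pattern, lines):
    """Return the lines in which the pattern is found anywhere."""
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]

# Caret outside brackets anchors to the start; caret inside brackets negates the class.
print(matching(r"^[^0-9]", subjects))
# Without the leading caret, a non-digit anywhere in the subject is enough.
print(matching(r"[^0-9]", subjects))
# Without the inner caret, the class is not negated: the subject starts with a digit.
print(matching(r"^[0-9]", subjects))
```

Only the first pattern keeps exactly the subjects that do not begin with a digit.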

Postby Smite » Mon Jan 31, 2005 10:19 pm

I forgot to mention that only 4.33 has the newer RegEx engine; older versions might not behave as expected.

Also worth noting: NewsBin auto-capitalizes everything, so entering \d, for example, is not possible.
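
Why auto-capitalization matters here can be sketched in a few lines. This assumes the capitalization is applied to the pattern text itself before it reaches the regex engine, which the thread does not confirm; Python's re module stands in for the engine and the subject line is made up.

```python
import re

typed = r"^\d"           # what the user intends: subject starts with a digit
entered = typed.upper()  # what an auto-capitalizing input box would hold: ^\D

subject = "42 answers"   # hypothetical subject line

print(entered)
print(bool(re.search(typed, subject)))    # intended pattern
print(bool(re.search(entered, subject)))  # capitalized pattern
```

Under that assumption, upper-casing does more than change the look of the pattern: \d becomes \D, which is the opposite character class.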

Postby Kiltme » Tue Feb 01, 2005 5:22 pm

Smite wrote: Actually, the ^ means "beginning of subject", and the \D means "Not a number" (\d means "a number").


Not in Newsbin's implementation. See http://www.newsbin.com/nb33help/regexp.htm


^ by itself means at the start of a line
^ inside of [] means NOT.

\ is the escape character (the character that follows is treated as a literal character, not a command)

\[ matches [, not the start of a [] pair.

[0-9] means any digit. \D and \d are not defined.

So the example ^[^0-9] means what you would expect: at the start of a line, NOT a number.
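
The rules listed above (escaping, character classes, negation) can be tried out with the short sketch below. It runs in Python's re module, not in the older NewsBin engine described on that help page, and the subject line is invented.

```python
import re

subject = "[01/12] holiday.part01.rar"  # hypothetical subject line

# \[ matches a literal "[" instead of opening a character class.
print(bool(re.search(r"^\[", subject)))
# [0-9] is a class matching one digit; this subject does not start with one.
print(bool(re.search(r"^[0-9]", subject)))
# ^[^0-9]: the subject starts with something other than a digit.
print(bool(re.search(r"^[^0-9]", subject)))
```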
Kiltme
Seasoned User
 
Posts: 638
Joined: Mon Jan 05, 2004 2:02 am

Registered Newsbin User since: 01/05/04

Postby Smite » Tue Feb 01, 2005 10:42 pm

As I said, that is in the newer version. Check the sticky at the top of the forum; it describes real RegEx, which v4.33 includes.

Postby Quade » Wed Feb 02, 2005 12:01 am

Yep, pretty sure "\d" and "[0-9]{3}" are legal now. The second one is "three numbers".


/*************************************************
* Perl-Compatible Regular Expressions *
*************************************************/

/*
This is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language. See
the file Tech.Notes for some information on the internals.

Written by: Philip Hazel <ph10@cam.ac.uk>


That is the library I use now: PCRE.
Quade
Eternal n00b
 
Posts: 44984
Joined: Sat May 19, 2001 12:41 am
Location: Virginia, US

Registered Newsbin User since: 10/24/97

Postby Kiltme » Mon Feb 07, 2005 8:00 pm

Quade wrote: Yep, pretty sure "\d" and "[0-9]{3}" are legal now. The second one is "three numbers".


[0-9]{3}-[0-9]{3} works as expected to filter out pedophile spam (123-456).
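
A small sketch of how the {3} quantifier behaves, using Python's re module as a stand-in for the PCRE engine mentioned above. The subject lines are invented for illustration.

```python
import re

# Three digits, a hyphen, three digits, found anywhere in the subject.
pattern = re.compile(r"[0-9]{3}-[0-9]{3}")

# Hypothetical subject lines, invented for illustration.
subjects = ["call 123-456 now", "track 12-345", "build 2005-0207 notes"]

for s in subjects:
    print(s, "->", bool(pattern.search(s)))
```

The third subject also matches because the pattern is not bounded: "005-020" inside "2005-0207" satisfies it.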

\D does something, but not as expected (in 4.33B6).

In the find bar:

\d is impossible to enter (auto capitalization)
\D appears to match a number anywhere

For example:

adele[0-9] lists adele101, adele02, etc
adele\D lists those plus ones like adeleoutdoors1

So \D isn't [^0-9], as it's defined to be, and it's not exactly [0-9] either.

Plus I just discovered I can't copy with <ctrl>c from the find box?!

<ctrl>v works for paste, but to copy you have to use right-click -> copy.

Postby Quade » Mon Feb 07, 2005 8:27 pm

adele[0-9] lists adele101, adele02, etc

This works because it matches some part of the subject. If you want just one digit, you need to bound it, either with a space or some other character.

adele[0-9][A-Za-z]
adele[0-9][ ]

or something like that.
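
The difference between the unbounded and bounded patterns can be sketched like this, with Python's re module standing in for the engine and made-up subject names:

```python
import re

# Hypothetical subjects, invented for illustration.
names = ["adele1 beach", "adele1a", "adele101", "adele02"]

unbounded = re.compile(r"adele[0-9]")
followed_by_letter = re.compile(r"adele[0-9][A-Za-z]")
followed_by_space = re.compile(r"adele[0-9][ ]")

# search() succeeds if the pattern matches any part of the subject.
print([n for n in names if unbounded.search(n)])
print([n for n in names if followed_by_letter.search(n)])
print([n for n in names if followed_by_space.search(n)])
```

The unbounded pattern lists every name, while each bounded pattern keeps only the name with a single digit followed by the required character.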

What's with all the pedo spam? I was commenting to Dex about it the other day. Smaller groups are absolutely full of it.

Postby Kiltme » Wed Feb 09, 2005 7:45 pm

It wasn't the behavior of [0-9] that I thought was weird; it was the different behavior of \D.

According to the spec (not the implementation), \d is supposed to be the same as [0-9], but it's not.
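
For comparison, here is a sketch of what that equivalence looks like in an engine that follows the usual definitions. It uses Python's re module, not NewsBin or PCRE itself, and the re.ASCII flag is needed because Python's \d otherwise also accepts non-ASCII Unicode digits.

```python
import re
import string

# re.ASCII restricts \d to 0-9 and \D to everything else.
d_class = re.compile(r"\d", re.ASCII)
bracket_class = re.compile(r"[0-9]")
not_d_class = re.compile(r"\D", re.ASCII)
not_bracket_class = re.compile(r"[^0-9]")

# Compare the classes one character at a time over printable ASCII.
for ch in string.printable:
    assert bool(d_class.fullmatch(ch)) == bool(bracket_class.fullmatch(ch))
    assert bool(not_d_class.fullmatch(ch)) == bool(not_bracket_class.fullmatch(ch))

print("\\d agrees with [0-9], and \\D with [^0-9], on all printable ASCII characters")
```

If a find bar gives different results for \D and [^0-9], the difference comes from that implementation or its input handling rather than from the definitions themselves.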