Maybe these will help new users

Tips on writing regular expressions for searching the post list

Moderators: Quade, dexter

Postby Gaijin64 » Sat May 30, 2009 7:48 pm

Hi all. I've been using Newsbin for years and watched the improvements. I've also been lurking here a long time and started studying regex for my Newsbin use. Since I think I'm finally getting decent at understanding it, I wanted to start giving back around here. If anyone feels these can be optimized, please let me know.

Usually people ask specific questions and get a specific answer. I thought I might start throwing out some examples of my own use with explanations that people can copy or use themselves.

Here's a quick one: I like lossless music, so I hit the lossless groups quite a bit. Fortunately they aren't nearly as spammed and corrupted as some of the other binary groups, but just to be on the safe side I always include a rule that requires a non-whitespace character right before the dot of the file extension. Anyway, here's my lossless filter. I include some picture extensions to grab the album art if available. I wouldn't use this filter in, say, a multimedia group where I'm looking for movie files.

I've thrown in a comment and included the tree view so you can follow the logic of the expressions. We can get into backreferences and capture groups another time. Honestly, I don't know if there is support for those in Newsbin, but they are useful in other apps anyway.

Hope it helps. Gaijin

Will match:

<filename>.rar
<filename>1.rar
<filename>%.rar
<filename>.gif
<filename>.ape
<filename>.flac
<filename>.zip
<filename>.jpeg
<filename>.jpg


Won't match:

<filename>1. rar
<filename> .rar
<filename>jpg
<filename>gif
<filename>flac
<filename>. flac
<filename> .flac


Require a non-whitespace character before file extensions

[^\s]\.(par2|nfo|gif|jpe?g|ape|flac|zip|txt|[r-z][a0-9][r0-9])
  • Match any character that is NOT a whitespace character (spaces, tabs, line breaks, etc.)
  • Match the character "." literally
  • Match the regular expression below and capture its match into backreference number 1
    • Match either the regular expression below (attempting the next alternative only if this one fails)
      • Match the characters "par2" literally
    • Or match regular expression number 2 below (attempting the next alternative only if this one fails)
      • Match the characters "nfo" literally
    • Or match regular expression number 3 below (attempting the next alternative only if this one fails)
      • Match the characters "gif" literally
    • Or match regular expression number 4 below (attempting the next alternative only if this one fails)
      • Match the characters "jp" literally
      • Match the character "e" literally
        • Between zero and one times, as many times as possible, giving back as needed (greedy)
      • Match the character "g" literally
    • Or match regular expression number 5 below (attempting the next alternative only if this one fails)
      • Match the characters "ape" literally
    • Or match regular expression number 6 below (attempting the next alternative only if this one fails)
      • Match the characters "flac" literally
    • Or match regular expression number 7 below (attempting the next alternative only if this one fails)
      • Match the characters "zip" literally
    • Or match regular expression number 8 below (attempting the next alternative only if this one fails)
      • Match the characters "txt" literally
    • Or match regular expression number 9 below (the entire group fails if this one fails to match)
      • Match a single character in the range between "r" and "z"
      • Match a single character present in the list below
        • The character "a"
        • A character in the range between "0" and "9"
      • Match a single character present in the list below
        • The character "r"
        • A character in the range between "0" and "9"
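
For anyone who wants to check the match lists outside Newsbin, here is a minimal Python sketch. It assumes Python's re module treats this pattern the same way Newsbin's regex engine does, which nothing in this thread confirms, and the sample names are made-up stand-ins for the <filename> placeholders above.

```python
import re

# Pattern copied from the filter above. Python's re module is only a
# stand-in for Newsbin's engine here (an assumption, not a confirmed fact).
LOSSLESS = re.compile(
    r"[^\s]\.(par2|nfo|gif|jpe?g|ape|flac|zip|txt|[r-z][a0-9][r0-9])"
)


def passes(subject: str) -> bool:
    """Return True when the pattern matches anywhere in the subject."""
    return LOSSLESS.search(subject) is not None


# Hypothetical sample names mirroring the two lists in the post.
should_match = [
    "album.rar", "album1.rar", "album%.rar", "album.gif", "album.ape",
    "album.flac", "album.zip", "album.jpeg", "album.jpg",
]
should_not_match = [
    "album1. rar", "album .rar", "albumjpg", "albumgif",
    "albumflac", "album. flac", "album .flac",
]

for name in should_match + should_not_match:
    print(f"{name!r:16} -> {passes(name)}")
```

The pattern is searched rather than anchored, so it accepts a matching extension anywhere in the subject, not only at the end.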
Last edited by Gaijin64 on Sun May 31, 2009 7:09 pm, edited 1 time in total.
Gaijin64
n00b
n00b
 
Posts: 9
Joined: Tue Jul 22, 2003 11:16 am

Registered Newsbin User since: 05/14/03

Never see a .exe file again

Postby Gaijin64 » Sat May 30, 2009 8:51 pm

We all know they suck: keygens, blah blah blah. Well, this one (which has probably been posted a million times) is pretty simple. Put it in the global filter and you should be all good. Newsbin also does a good job of protecting us from ourselves by blocking the launch of this sort of thing from within the app.

Will Match:

<some app>.Incl.Patch-NoPE.exe
<some app>.Incl.Patch- blah blah blah NoPE.exe
<some app>.Incl.Patch-NoPE. exe
<some app>really cool keygen.exe


No Match:
<some app>app. exe.cutable
<my.favorite.application.executable.download.now.its.really.cool>
<Download This App Now crack.exe inside>


Never see a .exe file again

[^/:*?"<>|](exe){1,3}$

Options: case insensitive
    Match a single character NOT present in the list "/:*?"<>|"
    Match the regular expression below and capture its match into backreference number 1
      Between one and 3 times, as many times as possible, giving back as needed (greedy)
      Note: we repeated the capturing group itself. The group will capture only the last iteration.
      Match the characters "exe" literally
    Assert position at the end of the string (or before the line break at the end of the string, if any)
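
A rough way to try this filter outside Newsbin is the Python sketch below. It assumes Python's re module with IGNORECASE behaves like Newsbin's "case insensitive" option, which is not confirmed here, and the subjects are invented examples. It also makes one thing visible: the pattern does not require a dot, so any subject that merely ends in "exe" is rejected too.

```python
import re

# Pattern copied from the post, with the "case insensitive" option mapped
# to re.IGNORECASE (an assumption about how Newsbin applies it).
EXE = re.compile(r'[^/:*?"<>|](exe){1,3}$', re.IGNORECASE)


def rejected(subject: str) -> bool:
    """Return True when the subject would be caught by the .exe filter."""
    return EXE.search(subject) is not None


# Hypothetical subjects, not real post names.
samples = [
    "some app keygen.exe",   # True
    "some app keygen. exe",  # True: a space is an allowed preceding character
    "SETUP.EXE",             # True: case is ignored
    "app. exe.cutable",      # False: "exe" is not at the end
    "crack.exe inside",      # False: "exe" is not at the end
    "annexe",                # True: no dot is required by the pattern
]

for subject in samples:
    print(f"{subject!r:24} -> {rejected(subject)}")
```

If the stray matches matter, a literal dot in front of the group, for example [.](exe)$, would tighten it, at the cost of missing names like "keygen. exe".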
Last edited by Gaijin64 on Sun May 31, 2009 6:47 pm, edited 4 times in total.

Postby Quade » Sat May 30, 2009 10:04 pm

Good information. I'm going to edit out the specific filenames though.

Thanks
Quade
Eternal n00b
Eternal n00b
 
Posts: 44984
Joined: Sat May 19, 2001 12:41 am
Location: Virginia, US

Registered Newsbin User since: 10/24/97

Postby Gaijin64 » Sat May 30, 2009 10:08 pm

No Prob. Sorry, I'll be more careful. I just yanked them from the properties of a post in NB for examples.

Comics

Postby Gaijin64 » Sat May 30, 2009 11:14 pm

I know...I seem really, really boring. But let's just say that you want to look at the comics groups. They tend to be pretty well defined and have conventions for posting. Anyone throwing loose jpegs is pretty much ignored. If you know that, then build the filter around it.

Will match:

<some comic book>.cbr
<some comic book>.cbz
<some comic book>.rar
<some comic book>.zip
<some comic book collection>.rar
<some comic book collection>.r25
<some comic book collection>.nfo
<some comic book collection>.zip


Won't match:
<some comic book not wrapped and has a million loose pics>.jpg
<some comic book not wrapped and has a million loose pics>.gif
<some boner pretending to post a collection> .cbr
<some boner pretending to post a collection> .cbr but has a .exe


Comics maybe?
Regex filter:

[^\s]\.(par2|nfo|cbr|cbz|zip|txt|[r-z][a0-9][r0-9])

Options: ^ and $ match at line breaks
    Match any character that is NOT a whitespace character (spaces, tabs, line breaks, etc.)
    Match the character "." literally
    Match the regular expression below and capture its match into backreference number 1
      Match either the regular expression below (attempting the next alternative only if this one fails)
        Match the characters "par2" literally
      Or match regular expression number 2 below (attempting the next alternative only if this one fails)
        Match the characters "nfo" literally
      Or match regular expression number 3 below (attempting the next alternative only if this one fails)
        Match the characters "cbr" literally
      Or match regular expression number 4 below (attempting the next alternative only if this one fails)
        Match the characters "cbz" literally
      Or match regular expression number 5 below (attempting the next alternative only if this one fails)
        Match the characters "zip" literally
      Or match regular expression number 6 below (attempting the next alternative only if this one fails)
        Match the characters "txt" literally
      Or match regular expression number 7 below (the entire group fails if this one fails to match)
        Match a single character in the range between "r" and "z"
        Match a single character present in the list below
          The character "a"
          A character in the range between "0" and "9"
        Match a single character present in the list below
          The character "r"
          A character in the range between "0" and "9"
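
The last alternative, [r-z][a0-9][r0-9], is the one that picks up rar and split-archive parts, and it is a bit wider than it looks. The Python sketch below tests that piece on its own against a few hand-picked extensions. Python's re module is assumed to be a fair stand-in for Newsbin's engine, and the extension list is illustrative only.

```python
import re

# Only the last alternative of the filter, isolated for inspection.
PART = re.compile(r"[r-z][a0-9][r0-9]")


def is_part(ext: str) -> bool:
    """Return True when the whole extension fits the three character classes."""
    return PART.fullmatch(ext) is not None


# Hand-picked extensions (hypothetical test data).
candidates = ["rar", "r00", "r25", "s01", "tar", "cbr", "q00", "jpg"]

for ext in candidates:
    print(f".{ext:4} -> {is_part(ext)}")
# rar, r00, r25, s01 and tar fit the classes; cbr, q00 and jpg do not,
# because their first letter falls outside the r-z range.
```

So .cbr is only accepted because it is listed explicitly, while something like .tar gets through without being named.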



OK, I'm all done for the day. If no one cares about this stuff, I'm not offended. I will say that if you stick with regexes, one day you wake up and it all makes sense :) Forget just NB: imagine the power of one line of code searching across a hundred files, when before you would have had to write 10 search queries to get your result.

Anyway, let me know if you want more.

Postby Quade » Sun May 31, 2009 1:39 am

Might get Dex to move this to the FAQs page. We have a lot more lurkers than posters.

Postby ozzii » Sun May 31, 2009 6:00 am

Thanks for this.
Useful for those who don't know regex :wink:
ozzii
Seasoned User
Seasoned User
 
Posts: 410
Joined: Thu Feb 23, 2006 6:10 pm
Location: France

Registered Newsbin User since: 02/23/06

Postby Gaijin64 » Sun May 31, 2009 5:42 pm

ozzii wrote: Thanks for this.
Useful for those who don't know regex :wink:


Ha. That would have been me not so long ago :D

And now, I present "Position assertions"

Postby Gaijin64 » Sun May 31, 2009 5:56 pm

So you're using the above example to grab your favorite files, and all of a sudden you start seeing headers that have been named to slip past your filter but are really files that you don't want. Maybe something like <myfavoritebandrecentrelease>.rar.exe.

Oh no! This filter is a huge failure! Well, before you jump on the forums to write me a nasty note, let's just solve this with a single character that tells the expression we only want files whose extension matches AT THE END of the string. Like so:

[^\s]\.(par2|nfo|gif|jpe?g|ape|flac|zip|txt|[r-z][a0-9][r0-9])$

In addition to the above example, this simply tells the regex to

Assert position at the end of a line (at the end of the string or before a line break character) "$"

That's right sports fans. With the press of a single key we have sent those guys to "never display again" hell. Here's what we get now:

Will match:

<filename>.rar
<filename>1.rar
<filename>%.rar
<filename>.gif
<filename>.ape
<filename>.flac
<filename>.zip
<filename>.jpeg
<filename>.jpg
<my.favorite.file.rar.download.now>.rar


Won't match:

<filename>1. rar
<filename> .rar
<filename>jpg
<filename>gif
<filename>.rar.exe
<my.favorite.file.rar.download.now>.exe


Sort of makes you all tingly huh?
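
To see the difference the "$" makes, here is a small Python comparison of the two versions of the filter. It assumes Python's re module handles "$" the way Newsbin's engine does for single-line subjects, and the file names are invented.

```python
import re

# Body of the lossless filter from the earlier post.
BODY = r"[^\s]\.(par2|nfo|gif|jpe?g|ape|flac|zip|txt|[r-z][a0-9][r0-9])"

unanchored = re.compile(BODY)       # original filter
anchored = re.compile(BODY + "$")   # same filter with the end-of-string assertion

# Hypothetical subjects standing in for the placeholders in the lists above.
names = [
    "release.rar",
    "release.rar.exe",
    "my.favorite.file.rar.download.now.rar",
    "my.favorite.file.rar.download.now.exe",
]

for name in names:
    before = unanchored.search(name) is not None
    after = anchored.search(name) is not None
    print(f"{name:40} unanchored={before!s:5} anchored={after}")
```

The unanchored version accepts all four names because ".rar" appears somewhere in each; the anchored version keeps only the two that actually end in .rar.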

Until next time. Gaijin.

Postby Gaijin64 » Sun May 31, 2009 7:23 pm

Quade wrote: Might get Dex to move this to the FAQs page. We have a lot more lurkers than posters.


If you find it useful, I'd like to start throwing out examples for the block list, search field, etc. I think some of those features are woefully underutilized by all except power users, mostly because there aren't enough solid examples that people can cut and paste and then modify.

Like I said, I won't be offended if this isn't something that people care about.

8)

Postby Innuendo » Mon Jun 01, 2009 12:36 am

Keep it coming...I don't know regex and I am not going to have time to learn any anytime soon, but there's going to come a time when I will be ready & having your examples and explanations here waiting for me when I do will be a great help.

I'm sure there are lots of others who feel the same way as me, too.
Innuendo
Seasoned User
Seasoned User
 
Posts: 290
Joined: Thu Jul 14, 2005 12:04 pm

Registered Newsbin User since: 07/13/05

Postby jace808 » Thu Jun 04, 2009 8:05 am

So I chose add filter and made a new one. I put your [^/:*?"<>|](exe){1,3}$ in reject for filename and subject. I did a search and I still see the .exe's.
jace808
n00b
n00b
 
Posts: 5
Joined: Sat Apr 05, 2008 2:00 pm

Registered Newsbin User since: 06/29/03

Postby Quade » Thu Jun 04, 2009 9:50 am

Filters don't apply to search, Internet search that is. Filters are for headers, where you might end up with a mix of unrelated files in the same list. Since search is targeted, I don't see the point of applying filters.

Is this internet or local search?

Postby jace808 » Fri Jun 05, 2009 7:41 am

Newsbin Pro internet search is what I use.

Postby Gaijin64 » Fri Jun 05, 2009 9:38 pm

I've never tried it on internet search...but it does work on local headers.

Quade, according to this post by you

http://forums.newsbin.com/viewtopic.php?t=24526

filtering headers during download is an option. I would assume that offers the same functionality as the other filters, meaning regex would work?

Postby blindguy » Mon Apr 05, 2010 10:11 pm

I have a couple of questions about expressions.

So is there an AND between each line, or is it an OR?

I would like to do something like

((subject contains x264 and freeware)
or
(subject contains x264 and trialware)
or
(subject contains x264 and bogusware))

or maybe it would be like

((subject contains x264 and (freeware or trialware or bogusware))

Would I do:

x264 + freeware
x264 + trialware
x264 + bogusware

Any help is appreciated.
blindguy
Occasional Contributor
Occasional Contributor
 
Posts: 37
Joined: Fri Feb 20, 2004 3:29 am

Registered Newsbin User since: 02/21/04

Postby Quade » Mon Apr 05, 2010 11:27 pm

Between each line it's essentially an OR.

So,

- [.]jpg
- [.]png

On multiple lines or

[.]jpg|[.]png

on a single line are the same.

"Subject" is implied by which list you use. Subject filters work on the subjects in the lists. Filename filters work on the filenames found after download commences.

"x264.*freeware|freeware.*x264"

would catch instances where "freeware" appears before or after "x264". With regular expressions the order matters, which is why both orders are listed. ".*" matches any run of characters (including none) separating the two words.
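
As a rough illustration of both points, the Python sketch below folds separate filter lines into one alternation and builds the "x264 AND one of several words" pattern from the question. Python's re module is only assumed to match Newsbin's behavior, the helper names are made up, and the subjects are invented lower-case examples.

```python
import re

# Separate filter lines act as an OR, so they can be joined with "|".
lines = [r"[.]jpg", r"[.]png"]
combined = re.compile("|".join(lines))


def any_line_matches(subject: str) -> bool:
    """Emulate one filter line per entry: a hit on any line counts."""
    return any(re.search(line, subject) for line in lines)


# "x264 AND (freeware OR trialware OR bogusware)" in either order.
# The word list comes from the question above.
words = "freeware|trialware|bogusware"
both = re.compile(rf"x264.*({words})|({words}).*x264")

# Hypothetical subjects for a quick check.
for subject in ["movie x264 freeware", "trialware pack x264", "x264 only", "freeware only"]:
    print(f"{subject!r:24} -> {both.search(subject) is not None}")
```

Listing both orders is what stands in for AND here; a single "x264.*freeware" would miss subjects where the words appear the other way round.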

Postby blindguy » Tue Apr 06, 2010 4:05 pm

Thanks!! That info helped me 100%!!

I was able to get what I needed going.

Next, to master this SQLite thing!

Thanks for the assistance!!

