NewsBin Filtering Capabilities
Posted: Tue Mar 12, 2002 5:53 pm
NewsBin 4.0 has extensive filtering capabilities: it can match user-defined expressions in subjects or filenames, limit downloads to a file size range, and apply a crosspost threshold. Since different newsgroups may require different filters, NewsBin lets you define a "Filter Set" that you can select on the main screen depending on the type of files you are downloading.
Filtering Basics
For subject and filename filters, NewsBin provides both an "Accept" list and a "Reject" list. You can run only Accept filters, only Reject filters, or both. If both are enabled, the Accept filters are checked first, and anything that passes them is then compared against the Reject filters. Anything matching a Reject filter is filtered out.
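To make the order of evaluation concrete, here is a minimal Python sketch of the Accept-then-Reject logic described above. It is not NewsBin's code: the function name passes_filters, the use of None to mean "this filter type is switched off", and the case-insensitive matching are all assumptions made for illustration.

```python
import re
from typing import Iterable, Optional

def passes_filters(text: str,
                   accept: Optional[Iterable[str]] = None,
                   reject: Optional[Iterable[str]] = None,
                   ignore_case: bool = True) -> bool:
    """Return True if `text` survives the Accept list and then the Reject list.

    Hypothetical helper: passing None for a list means that filter type is off.
    """
    flags = re.IGNORECASE if ignore_case else 0  # case-insensitivity is an assumption
    # Step 1: Accept list. When enabled, the text must match at least one pattern.
    if accept is not None:
        if not any(re.search(p, text, flags) for p in accept):
            return False
    # Step 2: Reject list. Only text that passed step 1 reaches this point;
    # any match filters it out.
    if reject is not None:
        if any(re.search(p, text, flags) for p in reject):
            return False
    return True

print(passes_filters("holiday_001.jpg", accept=[r"\.jpg"], reject=[r"sample"]))     # True
print(passes_filters("holiday_sample.jpg", accept=[r"\.jpg"], reject=[r"sample"]))  # False
print(passes_filters("holiday_001.png", accept=[r"\.jpg"]))                         # False
```

Note that in this sketch an Accept list that is enabled but empty rejects everything, since nothing can match it.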
The filters themselves use Perl-like regular expression (RegEx) syntax, which is not the same as DOS-style wildcards. For example, the DOS-style *.jpg would be written as .*\.jpg in RegEx. The "." means "match any single character," the "*" means "repeat the preceding item zero or more times," and the "\" is an escape character, so the second dot is taken literally instead of being treated as part of the expression syntax. Read more about common regular expressions and how to build basic ones at http://www.newsbin.com/nb33help/regexp.htm.
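The difference between a wildcard and a regular expression is easy to see by testing a few filenames. The sketch below uses Python's standard fnmatch module for the DOS-style pattern and the re module for the RegEx versions; the helper name compare is hypothetical. This post does not say whether NewsBin anchors its expressions to the end of the name, so both the unanchored .*\.jpg and an anchored .*\.jpg$ variant are shown, plus an unescaped .*.jpg to show why the backslash matters.

```python
import fnmatch
import re

def compare(name: str) -> dict:
    """Show how one filename fares under a DOS-style wildcard and three regex variants."""
    return {
        "wildcard *.jpg": fnmatch.fnmatchcase(name, "*.jpg"),         # must match the whole name
        r"regex .*\.jpg": re.search(r".*\.jpg", name) is not None,    # escaped dot, found anywhere
        r"regex .*\.jpg$": re.search(r".*\.jpg$", name) is not None,  # escaped dot, must end the name
        "regex .*.jpg": re.search(r".*.jpg", name) is not None,       # unescaped dot matches any character
    }

for name in ["photo.jpg", "photo.jpg.exe", "photoXjpg"]:
    print(name)
    for label, hit in compare(name).items():
        print(f"  {label:16} {hit}")
```

The interesting rows are photo.jpg.exe, which the unanchored regex accepts but the wildcard does not, and photoXjpg, which only the unescaped pattern accepts.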
Subject Filters
Any expression entered here is applied to the subject line of a post. If a header hits a filter under your current settings, it is filtered out and hidden from the Post List. Many posters include the filename of the attached file in the subject. Filtering on filenames in the Subject Filters is more efficient than using the Filename Filters (read more about this in the next section), but it is not 100% reliable, since the filename in the subject may not match the actual filename. Checking the "Show Filtered Posts" checkbox on the main screen makes filtered posts visible in the Post List.
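Here is a rough Python model of how a subject filter interacts with the "Show Filtered Posts" checkbox: filtered headers are only marked, not discarded, and the checkbox decides whether they are listed. The Header class, the function names, and the quoted-filename subject format are hypothetical, chosen only for illustration.

```python
import re
from dataclasses import dataclass
from typing import List

@dataclass
class Header:
    subject: str
    filtered: bool = False  # set when a subject filter hits

def apply_subject_filters(headers: List[Header], reject_patterns: List[str]) -> List[Header]:
    """Mark headers whose subject matches any Reject pattern (case-insensitive here)."""
    for h in headers:
        h.filtered = any(re.search(p, h.subject, re.IGNORECASE) for p in reject_patterns)
    return headers

def post_list(headers: List[Header], show_filtered: bool = False) -> List[str]:
    """Subjects that would be listed; filtered ones appear only when show_filtered is True."""
    return [h.subject for h in headers if show_filtered or not h.filtered]

headers = apply_subject_filters(
    [Header('Beach pics - "beach01.jpg" (1/1)'), Header('Free stuff - "setup.exe" (1/1)')],
    [r"\.exe\b"],
)
print(post_list(headers))                      # only the beach01.jpg post
print(post_list(headers, show_filtered=True))  # both posts
```

Because this filter only sees the subject, a post whose subject says beach01.jpg but whose attachment is really an .exe would still get through, which is exactly the gap Filename Filters cover.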
Filename Filters
Any expression entered here is applied to the actual filename of the attached file, not to the filename given in the subject, since the two sometimes differ, especially in spam or virus posts. Because NewsBin does not know the actual filename until it starts downloading the binary portion of the post, these posts remain visible in the Post List unless a subject filter catches them. When a filename filter hits, NewsBin must drop the connection to terminate the download and then reconnect to continue downloading the other queued posts. Some news servers do not behave well with dropped connections, especially if you hit a long run of filenames that match your filter. For performance reasons, rely on Subject Filters first, but it is prudent to use Filename Filters as a backup to catch .exe, .scr, .pif, .vbs, or other suspicious or unwanted filenames.
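The sketch below shows a backup Filename Filter for the extensions mentioned above and counts how many dropped connections it would cause. It is a simplified simulation under stated assumptions, not NewsBin's download logic: the queue format of (subject, actual filename) pairs, the function name download_queue, and the rule "one filter hit equals one reconnect" are all hypothetical.

```python
import re

# Extensions named in the post as worth catching; extend as needed.
SUSPICIOUS = re.compile(r"\.(exe|scr|pif|vbs)$", re.IGNORECASE)

def download_queue(queue, filename_filter=SUSPICIOUS):
    """Simulate working through a queue of (subject, actual_filename) pairs.

    The actual filename is only known once the download starts, so a filename
    hit aborts that download; this model counts each abort as one reconnect.
    """
    downloaded, reconnects = [], 0
    for _subject, actual_name in queue:  # the subject plays no part in this filter
        if filename_filter.search(actual_name):
            reconnects += 1  # drop the connection, then reconnect for the next post
            continue
        downloaded.append(actual_name)
    return downloaded, reconnects

queue = [
    ('"beach01.jpg" (1/1)', "beach01.jpg"),
    ('"beach02.jpg" (1/1)', "beach02.JPG.scr"),  # subject lies about the real file
    ('"readme.txt" (1/1)', "readme.vbs"),
]
print(download_queue(queue))
```

Every post a subject filter removes before it reaches this stage is one fewer potential reconnect, which is why the post recommends Subject Filters first.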
File Size and Crossposts
You can set a minimum file size, a maximum file size, or both to limit the size of files that NewsBin will download. The file size filter uses the size reported in the header data, which may not reflect the actual file size for spam or virus postings.
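A size range check can be sketched in a few lines of Python. The function name size_ok, the use of bytes as the unit, and the choice to treat both limits as inclusive are assumptions; this post does not say how NewsBin handles the boundary values.

```python
from typing import Optional

def size_ok(reported_bytes: int,
            min_bytes: Optional[int] = None,
            max_bytes: Optional[int] = None) -> bool:
    """Check a header-reported size against optional minimum and maximum limits.

    The size comes from header data, so a spam or virus post can misreport it.
    Both limits are treated as inclusive (an assumption); None disables a limit.
    """
    if min_bytes is not None and reported_bytes < min_bytes:
        return False
    if max_bytes is not None and reported_bytes > max_bytes:
        return False
    return True

print(size_ok(500_000, min_bytes=100_000, max_bytes=1_000_000))  # True
print(size_ok(50_000, min_bytes=100_000))                        # False
```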
The Crosspost filter tells NewsBin to filter out any post that was posted to more than X different newsgroups, where X is a number you configure.
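Crossposts are visible in a post's Newsgroups header, which lists the target groups separated by commas (see RFC 5536). The Python sketch below counts distinct groups and compares the count with X; the function names are hypothetical, and it assumes NewsBin counts groups from that header in the same way.

```python
def crosspost_count(newsgroups_header: str) -> int:
    """Count distinct groups in a Newsgroups header such as 'alt.a, alt.b'."""
    groups = {g.strip() for g in newsgroups_header.split(",") if g.strip()}
    return len(groups)

def too_crossposted(newsgroups_header: str, max_groups: int) -> bool:
    """True when the post went to more than max_groups newsgroups (the 'X' above)."""
    return crosspost_count(newsgroups_header) > max_groups

header = "alt.binaries.example.a, alt.binaries.example.b,alt.binaries.example.c"
print(crosspost_count(header))     # 3
print(too_crossposted(header, 2))  # True
```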
Defining Filter Sets
To save a set of filters for a specific category of files, give the set a name. Do this by clicking the "New" button under "Filter Options" and entering a name for the new filter set. If an existing filter set is close to what you want, you can specify it as a "Template," and it will be used as the starting point for the new set. Once you have named your filter set, configure your filters and click "OK". To change a filter set, open the filters menu and select the filter set you want to modify; its filters will then be displayed on the screen.
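The "Template" option behaves like copying an existing set and then editing the copy. Here is a small Python model of that idea; the FilterSet fields and the new_filter_set function are hypothetical and only mirror the options described in this post, not how NewsBin actually stores its settings.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class FilterSet:
    name: str
    subject_accept: List[str] = field(default_factory=list)
    subject_reject: List[str] = field(default_factory=list)
    filename_reject: List[str] = field(default_factory=list)
    min_bytes: Optional[int] = None
    max_bytes: Optional[int] = None
    max_crossposts: Optional[int] = None

def new_filter_set(sets: Dict[str, FilterSet], name: str,
                   template: Optional[str] = None) -> FilterSet:
    """Create a named filter set, optionally starting from a copy of a template set."""
    if name in sets:
        raise ValueError(f"filter set {name!r} already exists")
    if template is not None:
        # Deep copy so that editing the new set leaves the template untouched.
        fs = copy.deepcopy(sets[template])
        fs.name = name
    else:
        fs = FilterSet(name)
    sets[name] = fs
    return fs

sets: Dict[str, FilterSet] = {}
pictures = new_filter_set(sets, "Pictures")
pictures.subject_accept.append(r"\.jpg")
wallpapers = new_filter_set(sets, "Wallpapers", template="Pictures")
wallpapers.min_bytes = 200_000
print(sets["Pictures"])
print(sets["Wallpapers"])
```

The deep copy matters here: with a shallow copy, appending a pattern to the new set's lists would silently change the template as well.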
To use a predefined filter set, use the "Apply Filter Profile" option on the main screen. The selected set applies to all downloads performed from that point forward. You can also assign a filter set to a specific group by right-clicking the group name and selecting "Filter Profile". Whenever you download from that group, the assigned filter set will always be used.
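Here is one way to express the selection rule in Python: a set assigned to the group wins, and otherwise the set most recently applied from the main screen is used. The function name, the dictionary of group assignments, the example group names, and the use of None for "no filtering" are assumptions made for illustration.

```python
from typing import Dict, Optional

def profile_for(group: str,
                group_profiles: Dict[str, str],
                applied_profile: Optional[str]) -> Optional[str]:
    """Pick the filter set for a download from `group`.

    A set assigned to the group (via "Filter Profile") always wins; otherwise the
    set chosen with "Apply Filter Profile" is used. None means no filter set.
    """
    return group_profiles.get(group, applied_profile)

assignments = {"alt.binaries.example.wallpaper": "Wallpapers"}  # hypothetical group
print(profile_for("alt.binaries.example.wallpaper", assignments, "Pictures"))  # Wallpapers
print(profile_for("alt.binaries.example.other", assignments, "Pictures"))      # Pictures
```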