BayesFilter.SaveDatabase Method (String, String, Int32, Boolean)
Compacts the database by removing non-significant data and saves the database to disk.

Namespace: MailBee.AntiSpam
Assembly: MailBee.NET (in MailBee.NET.dll) Version: 12.4 build 677 for .NET 4.5
Syntax
public void SaveDatabase(
	string spamFilename,
	string nonSpamFilename,
	int threshold,
	bool saveAlways
)

Parameters

spamFilename
Type: System.String
The full or relative path to the file containing spam samples.
nonSpamFilename
Type: System.String
The full or relative path to the file containing non-spam samples.
threshold
Type: System.Int32
The minimum number of times a word must appear in the database to survive compacting. A word that appears fewer times than specified is considered non-significant and is removed. If zero, no words are removed from the database.
saveAlways
Type: System.Boolean
If true, the method saves the database even if it has not been modified since it was loaded into memory; if false, the method saves it only if it has been modified.
Exceptions
Exception: Condition
MailBeeInvalidArgumentException: spamFilename or nonSpamFilename is an empty string or a null reference (Nothing in Visual Basic).
MailBeeIOException: An I/O error occurred.
Remarks

Compacting the database decreases the file size but also decreases the accuracy of spam recognition. It makes sense for very large databases populated from many thousands of e-mails. In that case, a threshold value of 3-5 will significantly decrease the database size without a big impact on accuracy. In general, the larger the database, the larger the threshold value can be.
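
The effect of threshold can be sketched on a plain word-count table. This is only an illustration of the rule described above, not MailBee's internal implementation: the CompactionSketch class, the Compact helper, and the sample words are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CompactionSketch
{
    // Hypothetical helper, not part of MailBee.NET: keeps only the words
    // whose count is at least the threshold. Because counts are never
    // negative, a threshold of zero keeps every word.
    public static Dictionary<string, int> Compact(Dictionary<string, int> counts, int threshold)
    {
        return counts
            .Where(pair => pair.Value >= threshold)
            .ToDictionary(pair => pair.Key, pair => pair.Value);
    }

    static void Main()
    {
        // Made-up word counts standing in for a trained database.
        var counts = new Dictionary<string, int>
        {
            { "lottery", 12 }, { "meeting", 7 }, { "xqzt", 1 }, { "refund", 2 }
        };

        // With threshold 3, the two words seen fewer than 3 times are dropped.
        var compacted = Compact(counts, 3);
        Console.WriteLine("Words before: {0}, after: {1}", counts.Count, compacted.Count);
    }
}
```

In this sketch the rare words are the ones discarded, which is why a small threshold shrinks a large database considerably while leaving the frequently seen (and most informative) words in place.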

If you save the database to the same location where you already have it, specify saveAlways as false. If you save to a new location, specify saveAlways as true.
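
One way to apply this rule is to derive saveAlways from the paths themselves. The sketch below is a hypothetical helper, not part of MailBee.NET; it assumes a case-insensitive file system (as on Windows) and checks a single pair of paths, so an application would typically call it for both the spam and the non-spam file.

```csharp
using System;
using System.IO;

static class SaveAlwaysSketch
{
    // Hypothetical helper, not part of MailBee.NET: returns true when the
    // target file differs from the file the database was loaded from.
    // Assumes case-insensitive paths, as on Windows.
    public static bool ShouldSaveAlways(string loadedPath, string targetPath)
    {
        string loaded = Path.GetFullPath(loadedPath);
        string target = Path.GetFullPath(targetPath);
        return !string.Equals(loaded, target, StringComparison.OrdinalIgnoreCase);
    }

    static void Main()
    {
        // Same location: saveAlways can be false.
        Console.WriteLine(ShouldSaveAlways(@"C:\AntiSpam\spam.dat", @"C:\AntiSpam\spam.dat"));

        // New location: saveAlways should be true.
        Console.WriteLine(ShouldSaveAlways(@"C:\AntiSpam\spam.dat", @"D:\Backup\spam.dat"));

        // The result would then be passed as the last argument, for example:
        // filter.SaveDatabase(spamTarget, nonSpamTarget, 3, saveAlways);
    }
}
```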

Make sure your application has read/write access to the specified database file locations.

Examples

This sample removes all words that appear fewer than 3 times from the Bayesian database.

It's assumed that the spam and non-spam samples are .EML files located in the C:\AntiSpam\Spam and C:\AntiSpam\NonSpam folders, respectively. The database itself (spam.dat and nonspam.dat) will be saved in the C:\AntiSpam folder.

// To use the code below, import these namespaces at the top of your code.
using System;
using System.IO;
using MailBee.Mime;
using MailBee.AntiSpam;

class Sample
{
    static void Main(string[] args)
    {
        BayesFilter filter = new BayesFilter();
        MailMessage msg = new MailMessage();

        string spamDatabasePath = @"C:\AntiSpam\spam.dat";
        string nonSpamDatabasePath = @"C:\AntiSpam\nonspam.dat";
        filter.LoadDatabase(spamDatabasePath, nonSpamDatabasePath);

        // Train Bayesian filter for spam messages.
        string[] files = Directory.GetFiles(@"C:\AntiSpam\Spam", "*.eml");
        foreach (string file in files)
        {
            msg.LoadMessage(file);
            filter.TrainFilter(msg, true); // Mark as spam.
        }

        // Train Bayesian filter for non-spam messages.
        files = Directory.GetFiles(@"C:\AntiSpam\NonSpam", "*.eml");
        foreach (string file in files)
        {
            msg.LoadMessage(file);
            filter.TrainFilter(msg, false); // Mark as non-spam.
        }

        // Save Bayesian database to disk without compacting.
        filter.SaveDatabase(spamDatabasePath, nonSpamDatabasePath);

        FileInfo fis = new FileInfo(spamDatabasePath);
        FileInfo fins = new FileInfo(nonSpamDatabasePath);
        Console.WriteLine("Size of database before compacting is: {0}", fis.Length + fins.Length);

        // Compact Bayesian database (drop words seen fewer than 3 times) and save it to disk.
        filter.SaveDatabase(spamDatabasePath, nonSpamDatabasePath, 3, false);

        fis = new FileInfo(spamDatabasePath);
        fins = new FileInfo(nonSpamDatabasePath);
        Console.WriteLine("Size of database after compacting is: {0}", fis.Length + fins.Length);
    }
}

// Outputs:
// Size of database before compacting is: 21164
// Size of database after compacting is: 11364 (may differ in your case)
See Also