BayesFilter.SaveDatabase Method (String, String, Int32, Boolean)
Namespace: MailBee.AntiSpam
public void SaveDatabase( string spamFilename, string nonSpamFilename, int threshold, bool saveAlways )
Exception | Condition |
---|---|
MailBeeInvalidArgumentException | spamFilename or nonSpamFilename is an empty string or a null reference (Nothing in Visual Basic). |
MailBeeIOException | An I/O error occurred. |
Compacting the database reduces the file size but also reduces the accuracy of spam recognition. It makes sense for very large databases populated from many thousands of e-mails. In that case, a threshold of 3 to 5 significantly reduces the database size without a big impact on accuracy. In general, the larger the database, the larger the threshold value can be.
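As a rough illustration of what compacting by a threshold means, the sketch below drops every word whose occurrence count is below the threshold, which matches how the sample on this page describes a threshold of 3 (words that appear fewer than 3 times are removed). It does not use MailBee at all: the `ThresholdSketch` class, the `Compact` helper and the dictionary of word counts are hypothetical stand-ins, and the real internal format of the database, as well as whether counts are kept per file or combined, is not documented here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ThresholdSketch
{
    // Hypothetical helper (not part of MailBee): keeps only the words
    // whose occurrence count is at least `threshold`.
    public static Dictionary<string, int> Compact(Dictionary<string, int> wordCounts, int threshold)
    {
        return wordCounts
            .Where(pair => pair.Value >= threshold)
            .ToDictionary(pair => pair.Key, pair => pair.Value);
    }

    static void Main()
    {
        // Made-up word statistics, for illustration only.
        var counts = new Dictionary<string, int>
        {
            { "lottery", 12 }, { "meeting", 7 }, { "xqzt", 1 }, { "invoice", 2 }
        };

        // With a threshold of 3, the words seen once or twice are dropped.
        Dictionary<string, int> compacted = Compact(counts, 3);

        Console.WriteLine("Words before: {0}, after: {1}", counts.Count, compacted.Count);
    }
}
```

Rarely seen words contribute little evidence either way, which is presumably why dropping them costs less accuracy on a large database than on a small one.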
If you save the database to the same location where you already have it, specify saveAlways as false. If you save to a new location, specify saveAlways as true.
Make sure your application has read/write access to the specified database file locations.
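One possible way to check read/write access before saving is to probe the target folder with a temporary file. This is only a sketch using standard System.IO calls: `AccessCheckSketch` and `CanReadWrite` are hypothetical names, not MailBee APIs, and the demo probes the temporary folder rather than a real database location.

```csharp
using System;
using System.IO;

static class AccessCheckSketch
{
    // Hypothetical helper (not part of MailBee): returns true if a file can be
    // created, read back and deleted in the given folder.
    public static bool CanReadWrite(string directory)
    {
        try
        {
            string probe = Path.Combine(directory, Path.GetRandomFileName());
            File.WriteAllText(probe, "probe");
            string text = File.ReadAllText(probe);
            File.Delete(probe);
            return text == "probe";
        }
        catch (IOException)
        {
            // Covers a missing folder and other I/O failures.
            return false;
        }
        catch (UnauthorizedAccessException)
        {
            // The process lacks the required permissions.
            return false;
        }
    }

    static void Main()
    {
        // Substitute the folder that holds your database files.
        string folder = Path.GetTempPath();
        Console.WriteLine("Read/write access to {0}: {1}", folder, CanReadWrite(folder));
    }
}
```

A probe like this can only report the situation at the moment of the check, so handling MailBeeIOException around the actual SaveDatabase call is still advisable.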
This sample removes all words that appear fewer than 3 times from the Bayesian database.
It's assumed the spam and non-spam samples are .EML files located in the C:\AntiSpam\Spam and C:\AntiSpam\NonSpam folders, respectively. The database itself (spam.dat and nonspam.dat) will be saved in the C:\AntiSpam folder.
    // To use the code below, import these namespaces at the top of your code.
    using System;
    using System.IO;
    using MailBee.Mime;
    using MailBee.AntiSpam;

    class Sample
    {
        static void Main(string[] args)
        {
            BayesFilter filter = new BayesFilter();
            MailMessage msg = new MailMessage();

            string spamDatabasePath = @"C:\AntiSpam\spam.dat";
            string nonSpamDatabasePath = @"C:\AntiSpam\nonspam.dat";

            filter.LoadDatabase(spamDatabasePath, nonSpamDatabasePath);

            // Train Bayesian filter for spam messages.
            string[] files = Directory.GetFiles(@"C:\AntiSpam\Spam", "*.eml");
            foreach (string file in files)
            {
                msg.LoadMessage(file);
                filter.TrainFilter(msg, true); // Mark as spam.
            }

            // Train Bayesian filter for non-spam messages.
            files = Directory.GetFiles(@"C:\AntiSpam\NonSpam", "*.eml");
            foreach (string file in files)
            {
                msg.LoadMessage(file);
                filter.TrainFilter(msg, false); // Mark as non-spam.
            }

            // Save Bayesian database to disk without compression.
            filter.SaveDatabase(spamDatabasePath, nonSpamDatabasePath);

            FileInfo fis = new FileInfo(spamDatabasePath);
            FileInfo fins = new FileInfo(nonSpamDatabasePath);
            Console.WriteLine("Size of database before compacting is: {0}", fis.Length + fins.Length);

            // Compress Bayesian database and save it to disk.
            filter.SaveDatabase(spamDatabasePath, nonSpamDatabasePath, 3, false);

            fis = new FileInfo(spamDatabasePath);
            fins = new FileInfo(nonSpamDatabasePath);
            Console.WriteLine("Size of database after compacting is: {0}", fis.Length + fins.Length);
        }
    }

    // Outputs:
    // Size of database before compacting is: 21164
    // Size of database after compacting is: 11364 (may differ in your case)