Compose international e-mail

This guide demonstrates how to create e-mails in languages that use non-Latin alphabets and explains code pages, charsets and character encodings, right-to-left (RTL) languages, HTML META tags, and localized headers and attachment filenames.


Charsets, code pages, encodings, and UTF-8

In most cases, if your message contains international characters, it's enough to specify the correct message charset. UTF-8 has become the standard for Internet communication, and most e-mail programs correctly display e-mails that use the UTF-8 charset:

mailer.Message.Charset = "utf-8";

mailer.Message.Charset = "utf-8"

We assume mailer is an Smtp object instance. In addition, all samples in this guide assume that the MailBee, MailBee.Mime, and MailBee.SmtpMail namespaces are imported and the license key is set. See the Import namespaces and set license key topic for details.

It's recommended to use lowercase charset names. For instance, utf-8 is better than UTF-8: the MIME standard treats charset names as case-insensitive, but some e-mail clients do not recognize names that are not lowercase.

If international characters appear in the headers of your e-mail message, it's recommended to encode those headers with Base64 or Quoted-Printable:

mailer.Message.Charset = "utf-8";
mailer.Message.EncodeAllHeaders(System.Text.Encoding.UTF8, HeaderEncodingOptions.None);

mailer.Message.Charset = "utf-8"
mailer.Message.EncodeAllHeaders(System.Text.Encoding.UTF8, HeaderEncodingOptions.None)

This sample assumes that all headers and attachment filenames containing international characters have already been set by the time the MailMessage.EncodeAllHeaders method is called.

In the sample above, we use the same UTF-8 charset and apply UTF-8 character encoding to the headers. Although you can use different charsets for different parts of the e-mail, there is usually no reason to do so.
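To give a sense of what an encoded header looks like on the wire, here is a minimal sketch of RFC 2047 encoded-words in plain C# (C# 9 top-level statements, .NET base class library only). EncodeWordB and EncodeWordQ are hypothetical helpers, not MailBee.NET APIs; MailBee.NET's actual output may differ in details such as folding and splitting of long values.

```csharp
using System;
using System.Text;

// Hypothetical helper: one RFC 2047 "B" encoded-word (Base64 of the charset bytes).
// Real encoders split long values into several encoded-words of at most
// 75 characters each; this sketch does not.
static string EncodeWordB(string value, Encoding encoding) =>
    "=?" + encoding.WebName + "?B?" +
    Convert.ToBase64String(encoding.GetBytes(value)) + "?=";

// Hypothetical helper: one RFC 2047 "Q" encoded-word (a Quoted-Printable variant).
// Letters and digits stay readable, space becomes '_', every other byte is =XX.
static string EncodeWordQ(string value, Encoding encoding)
{
    var sb = new StringBuilder("=?" + encoding.WebName + "?Q?");
    foreach (byte b in encoding.GetBytes(value))
    {
        if ((b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z') || (b >= '0' && b <= '9'))
            sb.Append((char)b);
        else if (b == ' ')
            sb.Append('_');
        else
            sb.Append('=').Append(b.ToString("X2"));
    }
    return sb.Append("?=").ToString();
}

string russianB = EncodeWordB("Привет", Encoding.UTF8);
string germanQ = EncodeWordQ("Empfänger", Encoding.UTF8);

Console.WriteLine("Subject: " + russianB);
// Subject: =?utf-8?B?0J/RgNC40LLQtdGC?=
Console.WriteLine("To: " + germanQ + " <recipient@company.com>");
// To: =?utf-8?Q?Empf=C3=A4nger?= <recipient@company.com>
```

The Q form keeps mostly-Latin text readable, while the B form is more compact for text made of non-Latin characters.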

Let's look at the difference between charsets, encodings, and codepages. In short, they all refer to the same thing from different points of view: the charset is the string name of an encoding, the codepage is its integer identifier, and the encoding is the object that performs the conversion.

For instance, System.Text.Encoding.UTF8 is the Encoding object which performs byte-to-string and string-to-byte conversions for UTF-8. Its charset name is utf-8 and its codepage number is 65001. Thus, System.Text.Encoding.UTF8, System.Text.Encoding.GetEncoding("utf-8"), and System.Text.Encoding.GetEncoding(65001) all give you the same UTF-8 encoding.
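A minimal sketch of the three views of the same encoding, using only the .NET base class library (C# 9 top-level statements; no MailBee.NET types involved). Note that WebName already returns the lowercase charset name:

```csharp
using System;
using System.Text;

// The same UTF-8 encoding, obtained in three ways.
Encoding byObject = Encoding.UTF8;                   // the Encoding object itself
Encoding byCharset = Encoding.GetEncoding("utf-8");  // by charset name
Encoding byCodepage = Encoding.GetEncoding(65001);   // by codepage number

foreach (Encoding e in new[] { byObject, byCharset, byCodepage })
{
    // WebName is the charset name, CodePage is the codepage number.
    Console.WriteLine($"{e.WebName} {e.CodePage}"); // utf-8 65001 (three times)
}

// The Encoding object performs the actual string <-> byte conversions.
byte[] bytes = byCharset.GetBytes("Привет");  // 6 Cyrillic chars -> 12 bytes
string text = byCodepage.GetString(bytes);    // back to "Привет"
Console.WriteLine($"{bytes.Length} bytes, round trip ok: {text == "Привет"}");
// 12 bytes, round trip ok: True
```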

Note that the above describes encoding, charset, and codepage in their .NET Framework meaning. Internet Explorer, for instance, uses the term "encoding" to refer to a charset. There is no strict definition of these terms, and they are often used interchangeably.


Content-transfer encodings

To make things even more tangled, there is also the content-transfer encoding (typical examples are Base64 and Quoted-Printable). While a character encoding defines how to convert strings (which are actually sequences of 16-bit Unicode characters) into 8-bit bytes, a content-transfer encoding defines how to convert 8-bit bytes into a 7-bit form. Many byte values are not allowed in e-mails (the zero byte, for instance), so an extra encoding step is required to turn arbitrary bytes into values that are safe to transfer over mail channels.
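The two layers can be seen in a minimal sketch with plain .NET (C# 9 top-level statements): the character encoding turns the string into bytes, and the content-transfer encoding (Base64 here) turns those bytes into 7-bit safe text. Decoding runs the same two steps in reverse order.

```csharp
using System;
using System.Text;

string original = "Привет";  // "Hello" in Russian

// Step 1, character encoding: string -> bytes (UTF-8 here).
byte[] bytes = Encoding.UTF8.GetBytes(original);

// Step 2, content-transfer encoding: bytes -> 7-bit safe text (Base64 here).
string transferSafe = Convert.ToBase64String(bytes);
Console.WriteLine(transferSafe);  // 0J/RgNC40LLQtdGC

// The receiving side reverses both steps, in the opposite order.
byte[] received = Convert.FromBase64String(transferSafe);
string decoded = Encoding.UTF8.GetString(received);
Console.WriteLine(decoded == original);  // True
```

If the receiver decoded the bytes with a different character encoding than the sender used, the bytes would still arrive intact but the text would come out garbled, which is why the charset must be declared correctly.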

Base64 produces 4 (four) 7-bit characters from every 3 (three) 8-bit bytes, which increases the data size by about one third. It's efficient when the data contains many unsafe bytes (e.g. binary data or text in non-Latin charsets). Base64 makes text unreadable.
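The 3-to-4 ratio gives a simple size formula, 4 * ceil(n / 3) characters for n bytes. A short sketch (C# 9 top-level statements) that checks it against Convert.ToBase64String:

```csharp
using System;

// Base64 maps every group of 3 input bytes to 4 output characters and pads the
// last group with '=', so the output length is 4 * ceil(n / 3).
static int Base64Length(int byteCount) => 4 * ((byteCount + 2) / 3);

for (int n = 1; n <= 9; n++)
{
    int actual = Convert.ToBase64String(new byte[n]).Length;
    Console.WriteLine($"{n} bytes -> {actual} chars (formula: {Base64Length(n)})");
}
```

In e-mail, Base64 output is additionally broken into lines of at most 76 characters, so the real overhead is slightly above one third.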

Quoted-Printable is more flexible: it leaves printable ASCII characters as is (so Latin text does not change) and encodes only unsafe bytes, each as a three-character =XX sequence. This method works best when the text contains mostly ASCII characters, as is typical for texts in Western European languages and for documents that contain only small parts of non-Latin text. The Latin part of the text remains readable even after Quoted-Printable is applied. MailBee.NET uses Quoted-Printable as the default content-transfer encoding.
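A short sketch comparing the two encodings on a German and a Russian phrase (C# 9 top-level statements). ToQuotedPrintable is a hypothetical, simplified hand-written helper, not part of MailBee.NET; we're not aware of a public Quoted-Printable encoder in the .NET base class library, so the sketch rolls its own.

```csharp
using System;
using System.Text;

// Simplified Quoted-Printable encoder: printable ASCII (except '=') and spaces
// pass through, every other byte becomes =XX. Real encoders also insert soft
// line breaks to keep lines within 76 characters and encode trailing spaces;
// this sketch omits both.
static string ToQuotedPrintable(byte[] data)
{
    var sb = new StringBuilder();
    foreach (byte b in data)
    {
        bool literal = (b >= 33 && b <= 126 && b != (byte)'=') || b == (byte)' ';
        if (literal)
            sb.Append((char)b);
        else
            sb.Append('=').Append(b.ToString("X2"));
    }
    return sb.ToString();
}

byte[] german = Encoding.UTF8.GetBytes("Das Thema enthält ein spezielles Symbol");
byte[] russian = Encoding.UTF8.GetBytes("Привет");

string qpGerman = ToQuotedPrintable(german);
string qpRussian = ToQuotedPrintable(russian);

// Mostly-Latin text stays readable and Quoted-Printable is shorter than Base64.
Console.WriteLine(qpGerman);  // Das Thema enth=C3=A4lt ein spezielles Symbol
Console.WriteLine($"{qpGerman.Length} vs Base64 {Convert.ToBase64String(german).Length}");   // 44 vs Base64 56

// Non-Latin text triples in Quoted-Printable, so Base64 wins.
Console.WriteLine(qpRussian); // =D0=9F=D1=80=D0=B8=D0=B2=D0=B5=D1=82
Console.WriteLine($"{qpRussian.Length} vs Base64 {Convert.ToBase64String(russian).Length}"); // 36 vs Base64 16
```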


UTF-8 or local language-specific charset?

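When you compose a message in a local language, you always have at least two choices:

- UTF-8
- the local, language-specific charset (such as windows-1251 for Russian or gb2312 for Simplified Chinese)

The main advantage of UTF-8 is that it's universal: you can represent any text using this charset.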

But in some cases you may still prefer a local charset. First, UTF-8 is not efficient in terms of the size of encoded data: the same text encoded in a local charset may take half as many bytes as its UTF-8 version. Also, some old e-mail readers still can't correctly display UTF-8. A local charset is fine when you know that your e-mail won't contain any characters outside this charset.
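Both points can be checked with a short sketch (C# 9 top-level statements). It assumes .NET Core 3.0 or later / .NET 5+, where legacy code pages such as windows-1251 must first be registered through CodePagesEncodingProvider; on .NET Framework these code pages are available out of the box and the RegisterProvider line should be removed.

```csharp
using System;
using System.Text;

// Register legacy code pages (needed on .NET Core / .NET 5+ only).
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding cp1251 = Encoding.GetEncoding("windows-1251");

string russian = "Привет, мир";  // "Hello, world": 9 Cyrillic + 2 ASCII chars

int utf8Size = Encoding.UTF8.GetByteCount(russian);
int localSize = cp1251.GetByteCount(russian);
Console.WriteLine(utf8Size);  // 20
Console.WriteLine(localSize); // 11

// A character outside windows-1251 does not survive the round trip:
// the encoder substitutes a fallback character instead.
string chinese = "文";
string roundTrip = cp1251.GetString(cp1251.GetBytes(chinese));
Console.WriteLine(roundTrip == chinese); // False
```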

HTML e-mails, however, offer yet another option, described below.


HTML-encoded international characters ("&#xxxx;" sequences)

When creating HTML e-mails, you may choose not to use international characters at all and stick with the ASCII charset by converting all international characters into their HTML-encoded representation. For instance, the HTML sequence &#1055;&#1088;&#1080;&#1074;&#1077;&#1090; represents "Привет" ("Hello" in Russian).
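A minimal sketch of such a conversion in C# (C# 9 top-level statements). ToHtmlCharacterReferences is a hypothetical helper, not part of MailBee.NET; it assumes the input is already valid HTML, so markup characters are left untouched.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of MailBee.NET): replaces every non-ASCII
// character with a decimal HTML character reference (&#NNNN;). The input is
// assumed to be valid HTML already, so '<', '>' and '&' are left untouched.
static string ToHtmlCharacterReferences(string html)
{
    var sb = new StringBuilder();
    for (int i = 0; i < html.Length; i++)
    {
        // ConvertToUtf32 combines a surrogate pair into one code point.
        int codePoint = char.ConvertToUtf32(html, i);
        if (char.IsHighSurrogate(html[i]))
            i++; // the low surrogate was consumed as part of the pair

        if (codePoint < 128)
            sb.Append((char)codePoint);
        else
            sb.Append("&#").Append(codePoint).Append(';');
    }
    return sb.ToString();
}

string encoded = ToHtmlCharacterReferences("<p>Привет</p>");
Console.WriteLine(encoded);
// <p>&#1055;&#1088;&#1080;&#1074;&#1077;&#1090;</p>

// The 6 Cyrillic letters now take 42 ASCII bytes
// (vs 12 bytes in UTF-8 or 6 bytes in windows-1251).
Console.WriteLine(ToHtmlCharacterReferences("Привет").Length); // 42
```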

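Pros:

- The message can use plain ASCII, so international characters display correctly regardless of the charset declared for the message.

Cons:

- Each character takes several bytes (for instance, &#1055; is 7 bytes), so text that consists mostly of international characters becomes much larger.
- The text is no longer stored as plain characters, which makes it harder to index and search.
- This works only for HTML: the plain-text version of the message still needs a proper charset to represent international characters.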

HTML-encoded sequences are best used for special characters (or strings that can be treated as such): punctuation symbols, math symbols, quotes in Ancient Greek used for decoration, and so on. Such strings do not need to be indexed because they contain no searchable text, and they usually make up only a small part of the document, so HTML-encoding them won't significantly increase its size.

For instance, if your text contains 10,000 Cyrillic characters and 5 Western European symbols, you may use UTF-8 and it will work fine, but the produced text will be about 20,000 bytes long (2 bytes per Cyrillic character in UTF-8). If you use windows-1251 for the Cyrillic characters and HTML encoding for the remaining 5 Western European symbols, you'll get slightly more than 10,000 bytes (1 byte per Cyrillic character in windows-1251).
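The arithmetic above can be checked with a short sketch (C# 9 top-level statements). It assumes .NET Core 3.0+ / .NET 5+ for the code-page provider (remove that line on .NET Framework). The letter 'ж' and the string "éüßàç" are arbitrary stand-ins for the Cyrillic text and the 5 Western European symbols, none of which exist in windows-1251.

```csharp
using System;
using System.Linq;
using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding cp1251 = Encoding.GetEncoding("windows-1251");

// 10,000 Cyrillic letters plus 5 Western European letters.
string cyrillic = new string('ж', 10_000);
string western = "éüßàç";

// Option 1: everything in UTF-8 (2 bytes per Cyrillic or accented letter).
int utf8Size = Encoding.UTF8.GetByteCount(cyrillic + western);

// Option 2: windows-1251 for Cyrillic, &#NNN; references for the rest.
string westernAsRefs = string.Concat(western.Select(c => $"&#{(int)c};"));
int mixedSize = cp1251.GetByteCount(cyrillic + westernAsRefs);

Console.WriteLine(utf8Size);  // 20010
Console.WriteLine(mixedSize); // 10030
```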

Even if you do not care much about the size of produced e-mails, a good understanding of charsets and HTML encodings may still help you quickly resolve issues related to storing and processing non-English data.


HTML META tag with charset declaration

When creating HTML e-mails, pay attention to the HTML META tag containing the charset declaration!

If your HTML includes charset information in a META tag, that charset value must strictly match the charset you set with the MailMessage.Charset property of the e-mail being composed.

This sample correctly sets both charset values (we assume Smtp.Message property denotes the e-mail to be sent):

mailer.Message.Charset = "windows-1252";
mailer.Message.BodyHtmlText =
    @"<meta http-equiv=Content-Type content=""text/html; charset=windows-1252"">";

mailer.Message.Charset = "windows-1252"
mailer.Message.BodyHtmlText = _
    "<meta http-equiv=Content-Type content=""text/html; charset=windows-1252"">"

Alternatively, you can simply remove this META tag altogether, since it's usually needed only for web pages. E-mail messages provide content-type and charset information in their own headers, so there is no need to duplicate it in the HTML.

However, if you do not compose the HTML data manually but rather load it from an external source like an HTML file or web page, you have no control over the META tag and may not even know what's in it. Moreover, you may not know the charset of the HTML data at all (unlike an e-mail, an HTML file has no headers).

In this case, the META tag can be a great help because it lets MailBee.NET determine the charset of the HTML data. When you import HTML data with the MailMessage.LoadBodyText method, MailBee.NET automatically picks up the charset from the META tag (if it's there) and sets it as the charset of the resulting HTML text part.
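To illustrate what picking up the charset from the META tag involves, here is a rough sketch in plain C# (C# 9 top-level statements). GetMetaCharset is a hypothetical helper, not MailBee.NET's actual implementation, and a regular expression is only a heuristic compared to a real HTML parser. The last lines also perform the charset match check described earlier in this section.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not a MailBee.NET API): returns the charset declared in
// an HTML META tag in lowercase, or null if there is none. Covers both
// <meta http-equiv="Content-Type" content="text/html; charset=..."> and the
// HTML5 form <meta charset="...">.
static string? GetMetaCharset(string html)
{
    Match m = Regex.Match(html,
        @"<meta\b[^>]*?\bcharset\s*=\s*[""']?\s*([A-Za-z0-9_.:\-]+)",
        RegexOptions.IgnoreCase);
    return m.Success ? m.Groups[1].Value.ToLowerInvariant() : null;
}

string html4 = @"<meta http-equiv=Content-Type content=""text/html; charset=windows-1252"">";
string html5 = "<meta charset=\"UTF-8\"><p>Hello</p>";
string noMeta = "<p>No charset here</p>";

Console.WriteLine(GetMetaCharset(html4) ?? "(none)");  // windows-1252
Console.WriteLine(GetMetaCharset(html5) ?? "(none)");  // utf-8
Console.WriteLine(GetMetaCharset(noMeta) ?? "(none)"); // (none)

// The META charset must match the charset set for the message.
string messageCharset = "windows-1252";
bool matches = string.Equals(GetMetaCharset(html4), messageCharset,
    StringComparison.OrdinalIgnoreCase);
Console.WriteLine(matches); // True
```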

Thus, when you load HTML from a web page or file and that HTML already contains charset information in a META tag, you don't need to set the MailMessage.Charset property manually. But what if you don't know whether the HTML contains any charset information at all? This is a valid concern, as an HTML document is not required to specify a charset. In this case, you may want to set the charset manually only when it's not specified by the META tag:

// Import HTML body from a file.
mailer.Message.LoadBodyText(@"C:\Temp\example.html", MessageBodyType.Html);

// If the HTML contains charset specification in META tag,
// MailBee.NET will automatically assign it to the message.
// If the tag is not there, use UTF-8 charset for the message.
if (string.IsNullOrEmpty(mailer.Message.BodyParts.Html.Charset))
{
    mailer.Message.Charset = "utf-8";
}

' Import HTML body from a file.
mailer.Message.LoadBodyText("C:\Temp\example.html", MessageBodyType.Html)

' If the HTML contains charset specification in META tag,
' MailBee.NET will automatically assign it to the message.
' If the tag is not there, use UTF-8 charset for the message.
If String.IsNullOrEmpty(mailer.Message.BodyParts.Html.Charset) Then
    mailer.Message.Charset = "utf-8"
End If

It's recommended to use this approach whenever you load HTML content from external sources (rather than compose it manually).


Send plain-text e-mail with Chinese content and "utf-8" or "gb2312" charset

You can use UTF-8 to represent text written in any language. It's usually the only option if you don't know the language of the text.

If you know that the text is written in a certain language (let's say, Simplified Chinese), you can choose its local, more specific charset (like gb2312), which represents only the characters of that language plus Latin. Text encoded with such a local codepage usually takes less space than the same text encoded with UTF-8.
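A short sketch of both size effects for Chinese (C# 9 top-level statements; assumes .NET Core 3.0+ / .NET 5+ for the code-page provider, remove that line on .NET Framework). The Quoted-Printable length is computed with the same simplified rule as a basic encoder would use (printable ASCII stays, everything else becomes =XX):

```csharp
using System;
using System.Linq;
using System.Text;

// Needed on .NET Core / .NET 5+ for gb2312; remove on .NET Framework.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

string chinese = "文本";  // "Text"
byte[] utf8 = Encoding.UTF8.GetBytes(chinese);
byte[] gb2312 = Encoding.GetEncoding("gb2312").GetBytes(chinese);

Console.WriteLine($"utf-8: {utf8.Length} bytes, gb2312: {gb2312.Length} bytes");
// utf-8: 6 bytes, gb2312: 4 bytes

// Every byte of Chinese text is >= 0x80, so Quoted-Printable spends 3 characters
// (=XX) per byte, while Base64 spends about 4/3 of a character per byte.
int qpLength = utf8.Sum(b => b >= 33 && b <= 126 && b != '=' ? 1 : 3);
int base64Length = Convert.ToBase64String(utf8).Length;
Console.WriteLine($"Quoted-Printable: {qpLength} chars, Base64: {base64Length} chars");
// Quoted-Printable: 18 chars, Base64: 8 chars
```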

This sample uses either utf-8 or gb2312 charset to create a plain-text e-mail with headers and body containing Simplified Chinese:

Smtp mailer = new Smtp();

// This time, let's use logging just to remind that it's
// a very effective debug and trouble-shooting tool.
mailer.Log.Enabled = true;
mailer.Log.Filename = @"C:\log.txt";
mailer.Log.Clear();

// Use SMTP relay with authentication.
mailer.SmtpServers.Add("smtp.domain.com", "sender", "secret");

// Use this to produce UTF-8 e-mail.
mailer.Message.Charset = "utf-8";

// Or use this instead to produce e-mail in the gb2312 (Simplified Chinese) charset.
// mailer.Message.Charset = "gb2312";

// "寄件人" means "Sender".
mailer.Message.From = new EmailAddress("sender@domain.com", "寄件人");

// "收件人" means "Recipient".
mailer.Message.To.Add("recipient@company.com", "收件人");

// Set subject and body.
mailer.Message.Subject = "科目 means Subject in Chinese";
mailer.Message.BodyPlainText = "文本 means Text in Chinese";

// Make sure all headers with non-Latin chars are encoded.
// System.Text.Encoding.GetEncoding(mailer.Message.Charset)
// is used because we don't know which charset you'll choose
// (utf-8 or gb2312). If we knew it was, let's say, utf-8,
// we'd simply specify System.Text.Encoding.UTF8.
// Also note that we decided to encode with Base64 rather
// than with Quoted-Printable (default). Chinese texts
// (in either utf-8 or gb2312) are shorter in Base64 than
// in Quoted-Printable.
mailer.Message.EncodeAllHeaders(
    System.Text.Encoding.GetEncoding(mailer.Message.Charset),
    HeaderEncodingOptions.Base64);

// For the same reason (to make the data size in bytes smaller), use
// Base64 instead of Quoted-Printable to encode the message body
// since it contains mostly non-Latin characters, which consume
// a lot of space when encoded with Quoted-Printable. This is, however,
// just an optimization. We could leave Quoted-Printable as is.
mailer.Message.MailTransferEncodingPlain = MailTransferEncoding.Base64;

mailer.Send();

Dim mailer As Smtp = New Smtp

' This time, let's use logging just to remind that it's
' a very effective debug and trouble-shooting tool.
mailer.Log.Enabled = True
mailer.Log.Filename = "C:\log.txt"
mailer.Log.Clear()

' Use SMTP relay with authentication.
mailer.SmtpServers.Add("smtp.domain.com", "sender", "secret")

' Use this to produce UTF-8 e-mail.
mailer.Message.Charset = "utf-8"

' Or use this instead to produce e-mail in the gb2312 (Simplified Chinese) charset.
' mailer.Message.Charset = "gb2312"

' "寄件人" means "Sender".
mailer.Message.From = New EmailAddress("sender@domain.com", "寄件人")

' "收件人" means "Recipient".
mailer.Message.To.Add("recipient@company.com", "收件人")

' Set subject and body.
mailer.Message.Subject = "科目 means Subject in Chinese"
mailer.Message.BodyPlainText = "文本 means Text in Chinese"

' Make sure all headers with non-Latin chars are encoded.
' System.Text.Encoding.GetEncoding(mailer.Message.Charset)
' is used because we don't know which charset you'll choose
' (utf-8 or gb2312). If we knew it was, let's say, utf-8,
' we'd simply specify System.Text.Encoding.UTF8.
' Also note that we decided to encode with Base64 rather
' than with Quoted-Printable (default). Chinese texts
' (in either utf-8 or gb2312) are shorter in Base64 than
' in Quoted-Printable.
mailer.Message.EncodeAllHeaders( _
    System.Text.Encoding.GetEncoding(mailer.Message.Charset), _
    HeaderEncodingOptions.Base64)

' For the same reason (to make the data size in bytes smaller), use
' Base64 instead of Quoted-Printable to encode the message body
' since it contains mostly non-Latin characters, which consume
' a lot of space when encoded with Quoted-Printable. This is, however,
' just an optimization. We could leave Quoted-Printable as is.
mailer.Message.MailTransferEncodingPlain = MailTransferEncoding.Base64

mailer.Send()

For HTML e-mails, make sure the META tag either contains the correct charset or no charset information at all. See the Send HTML e-mail with international characters section for details.


Send plain-text e-mail with Windows-1252 headers (German characters in From, Subject, To)

We'll use the windows-1252 charset in this example, but we could have used UTF-8 as well:

Smtp mailer = new Smtp();

// Use SMTP relay with authentication.
mailer.SmtpServers.Add("smtp.domain.com", "sender", "secret");

// Change to utf-8 if you prefer UTF-8 charset.
mailer.Message.Charset = "windows-1252";

// "Absender" means "Sender".
mailer.Message.From = new EmailAddress("sender@domain.com", "Absender");

// "Empfänger" means "Recipient".
mailer.Message.To.Add("recipient@company.com", "Empfänger");

// "The subject contains a special symbol" in German.
mailer.Message.Subject = "Das Thema enthält ein spezielles Symbol";

// "This text contains a special symbol" in German.
mailer.Message.BodyPlainText = "Dieser Text enthält ein spezielles Symbol";

// Make sure all headers with non-ASCII chars are encoded.
// 1252 is the codepage number of the windows-1252 charset.
// Should you need UTF-8, use the 65001 codepage number or just
// the System.Text.Encoding.UTF8 object.
// Also note that we decided to stay with Quoted-Printable
// (default) while the Chinese sample used Base64.
// Quoted-Printable is perfect if the text mostly contains
// Latin characters with rare occurrences of non-ASCII ones.
// German is a typical "mostly Latin" language.
mailer.Message.EncodeAllHeaders(
    System.Text.Encoding.GetEncoding(1252),
    HeaderEncodingOptions.None);

mailer.Send();

Dim mailer As Smtp = New Smtp

' Use SMTP relay with authentication.
mailer.SmtpServers.Add("smtp.domain.com", "sender", "secret")

' Change to utf-8 if you prefer UTF-8 charset.
mailer.Message.Charset = "windows-1252"

' "Absender" means "Sender".
mailer.Message.From = New EmailAddress("sender@domain.com", "Absender")

' "Empfänger" means "Recipient".
mailer.Message.To.Add("recipient@company.com", "Empfänger")

' "The subject contains a special symbol" in German.
mailer.Message.Subject = "Das Thema enthält ein spezielles Symbol"

' "This text contains a special symbol" in German.
mailer.Message.BodyPlainText = "Dieser Text enthält ein spezielles Symbol"

' Make sure all headers with non-ASCII chars are encoded.
' 1252 is the codepage number of the windows-1252 charset.
' Should you need UTF-8, use the 65001 codepage number or just
' the System.Text.Encoding.UTF8 object.
' Also note that we decided to stay with Quoted-Printable
' (default) while the Chinese sample used Base64.
' Quoted-Printable is perfect if the text mostly contains
' Latin characters with rare occurrences of non-ASCII ones.
' German is a typical "mostly Latin" language.
mailer.Message.EncodeAllHeaders( _
    System.Text.Encoding.GetEncoding(1252), _
    HeaderEncodingOptions.None)

mailer.Send()

You can use the same approach with any other local codepages and languages, not only Windows-1252 and German.


Send HTML e-mail with international characters

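HTML format raises a number of considerations related to international characters and charsets: the charset declared in the HTML META tag (if any) must match the message charset, international characters can optionally be written as &#xxxx; sequences, and the plain-text version generated from the HTML still needs a charset that can represent its characters.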

The sample below creates an HTML e-mail in the UTF-8 charset. However, the HTML data is imported from a web page. If this data specifies a charset in its META tag, that charset will be used for the message body (instead of UTF-8). The message headers will still be encoded in UTF-8:

Smtp mailer = new Smtp();

// Use SMTP relay with authentication.
mailer.SmtpServers.Add("smtp.domain.com", "sender@domain.com", "secret");

// "Αποστολέας" means "Sender".
mailer.Message.From = new EmailAddress("sender@domain.com", "Αποστολέας");

// "Αποδέκτη" means "Recipient".
mailer.Message.To.Add("recipient@company.com", "Αποδέκτη");

// Set subject and body.
mailer.Message.Subject = "Θέμα means Subject in Greek";

// Import HTML body from a web page. Detect encoding from the META tag.
// If the META tag is missing, MailBee.NET will assume windows-1253
// as it's the most widely used codepage for Greek.
mailer.Message.LoadBodyText("http://www.domain.gr", MessageBodyType.Html,
    System.Text.Encoding.GetEncoding(1253),
    ImportBodyOptions.PathIsUri |
    ImportBodyOptions.PreferCharsetFromMetaTag);

// If the HTML contains charset specification in META tag,
// MailBee.NET will automatically assign it to the message.
// If the tag is not there, use UTF-8 charset for the message.
if (string.IsNullOrEmpty(mailer.Message.BodyParts.Html.Charset))
{
    mailer.Message.Charset = "utf-8";
}

// Let's use UTF-8 for headers regardless of what charset will be used
// for the body parts. Use Base64 rather than Quoted-Printable as
// Greek phrases contain mostly non-Latin chars and their Base64
// representation will be shorter than Quoted-Printable.
mailer.Message.EncodeAllHeaders(
    System.Text.Encoding.UTF8,
    HeaderEncodingOptions.Base64);

// Note that MailBee.NET will also create the plain-text version
// from the supplied HTML when the message gets built.
mailer.Send();

Dim mailer As Smtp = New Smtp

' Use SMTP relay with authentication.
mailer.SmtpServers.Add("smtp.domain.com", "sender@domain.com", "secret")

' "Αποστολέας" means "Sender".
mailer.Message.From = New EmailAddress("sender@domain.com", "Αποστολέας")

' "Αποδέκτη" means "Recipient".
mailer.Message.To.Add("recipient@company.com", "Αποδέκτη")

' Set subject and body.
mailer.Message.Subject = "Θέμα means Subject in Greek"

' Import HTML body from a web page. Detect encoding from the META tag.
' If the META tag is missing, MailBee.NET will assume windows-1253
' as it's the most widely used codepage for Greek.
mailer.Message.LoadBodyText("http://www.domain.gr", MessageBodyType.Html, _
    System.Text.Encoding.GetEncoding(1253), _
    ImportBodyOptions.PathIsUri Or _
    ImportBodyOptions.PreferCharsetFromMetaTag)

' If the HTML contains charset specification in META tag,
' MailBee.NET will automatically assign it to the message.
' If the tag is not there, use UTF-8 charset for the message.
If String.IsNullOrEmpty(mailer.Message.BodyParts.Html.Charset) Then
    mailer.Message.Charset = "utf-8"
End If

' Let's use UTF-8 for headers regardless of what charset will be used
' for the body parts. Use Base64 rather than Quoted-Printable as
' Greek phrases contain mostly non-Latin chars and their Base64
' representation will be shorter than Quoted-Printable.
mailer.Message.EncodeAllHeaders( _
    System.Text.Encoding.UTF8, _
    HeaderEncodingOptions.Base64)

' Note that MailBee.NET will also create the plain-text version
' from the supplied HTML when the message gets built.
mailer.Send()

As you can see, the sample does not encode international characters with &#xxxx; sequences. Instead, the correct charset is specified or determined from the META tag. This ensures that the plain-text version of the body also preserves these international characters (plain text has no way to represent international characters other than through its charset).


Compose e-mail in right-to-left language like Hebrew

You can compose both plain-text and HTML e-mails in right-to-left languages like Arabic and Hebrew without any additional considerations. There are no special settings in MIME for right-to-left rendering, and e-mail reader programs automatically select the direction depending on the actual characters displayed.

You only need to specify the correct charset (UTF-8 or the local language charset), just as you would for any other international e-mail in a left-to-right language:

// Set Hebrew's local codepage.
mailer.Message.Charset = "windows-1255";

// Set subject and body.
mailer.Message.Subject = "נושא means Subject in Hebrew";
mailer.Message.BodyPlainText = "טקסט means Text in Hebrew";

' Set Hebrew's local codepage.
mailer.Message.Charset = "windows-1255"

' Set subject and body.
mailer.Message.Subject = "נושא means Subject in Hebrew"
mailer.Message.BodyPlainText = "טקסט means Text in Hebrew"

With HTML e-mails, you may need to apply the dir="rtl" attribute to certain HTML tags when building complex right-to-left documents, but these are general HTML design concerns rather than e-mail-specific issues. HTML design of right-to-left documents is beyond the scope of this guide.


Send e-mail with international characters in headers, body, attachment filenames

The sample below composes an international (Japanese) e-mail with attachments that have Japanese filenames. There's nothing special here because the MailMessage.EncodeAllHeaders method (widely used in the previous samples) also encodes all non-Latin attachment filenames. Just be sure to call MailMessage.EncodeAllHeaders AFTER adding attachments, not BEFORE:

Smtp mailer = new Smtp();

// Use SMTP relay with authentication.
mailer.SmtpServers.Add("mail.domain.com", "sender", "secret");

// Or we could have used "utf-8" as well.
mailer.Message.Charset = "iso-2022-jp";

// "送信者" means "Sender".
mailer.Message.From = new EmailAddress("sender@domain.com", "送信者");

// "受信者" means "Recipient".
mailer.Message.To.Add("recipient@company.com", "受信者");

// Set subject.
mailer.Message.Subject = "題目 means Subject in Japanese";

// Set HTML body. MailBee.NET will generate plain-text version
// automatically (Japanese text will be converted correctly).
mailer.Message.BodyHtmlText =
    "<html><body>テキスト means Text in Japanese</body></html>";

// 1st attachment has normal Latin filename on the filesystem but it will
// have Japanese name in the e-mail.
mailer.Message.Attachments.Add(@"C:\Temp\12345678.tmp", "ドキュメント.doc");

// 2nd attachment has Japanese filename on the filesystem and will retain
// it in the e-mail.
mailer.Message.Attachments.Add(@"C:\Temp\イメージ.jpg");

// It's important to call this AFTER attachments have already been added.
mailer.Message.EncodeAllHeaders(
    System.Text.Encoding.GetEncoding(mailer.Message.Charset),
    HeaderEncodingOptions.Base64);

// Uncomment if the message body consists mainly of non-Latin (Japanese)
// chars to decrease the resulting message size. In the current sample,
// Base64 won't provide any benefit because most body chars are HTML tags
// with Latin content, and only a few chars are Japanese. Thus, we left the
// default Quoted-Printable setting as is.

// mailer.Message.MailTransferEncodingPlain = MailTransferEncoding.Base64;
// mailer.Message.MailTransferEncodingHtml = MailTransferEncoding.Base64;

mailer.Send();

Dim mailer As Smtp = New Smtp()

' Use SMTP relay with authentication.
mailer.SmtpServers.Add("mail.domain.com", "sender", "secret")

' Or we could have used "utf-8" as well.
mailer.Message.Charset = "iso-2022-jp"

' "送信者" means "Sender".
mailer.Message.From = New EmailAddress("sender@domain.com", "送信者")

' "受信者" means "Recipient".
mailer.Message.To.Add("recipient@company.com", "受信者")

' Set subject.
mailer.Message.Subject = "題目 means Subject in Japanese"

' Set HTML body. MailBee.NET will generate plain-text version
' automatically (Japanese text will be converted correctly).
mailer.Message.BodyHtmlText = _
    "<html><body>テキスト means Text in Japanese</body></html>"

' 1st attachment has normal Latin filename on the filesystem but it will
' have Japanese name in the e-mail.
mailer.Message.Attachments.Add("C:\Temp\12345678.tmp", "ドキュメント.doc")

' 2nd attachment has Japanese filename on the filesystem and will retain
' it in the e-mail.
mailer.Message.Attachments.Add("C:\Temp\イメージ.jpg")

' It's important to call this AFTER attachments have already been added.
mailer.Message.EncodeAllHeaders( _
    System.Text.Encoding.GetEncoding(mailer.Message.Charset), _
    HeaderEncodingOptions.Base64)

' Uncomment if the message body consists mainly of non-Latin (Japanese)
' chars to decrease the resulting message size. In the current sample,
' Base64 won't provide any benefit because most body chars are HTML tags
' with Latin content, and only a few chars are Japanese. Thus, we left the
' default Quoted-Printable setting as is.

' mailer.Message.MailTransferEncodingPlain = MailTransferEncoding.Base64
' mailer.Message.MailTransferEncodingHtml = MailTransferEncoding.Base64

mailer.Send()

Also, MailMessage.EncodeAllHeaders method has an option NOT to encode attachment filenames, but you won't need it in most cases.



Copyright © 2006-2017 AfterLogic Corporation. All rights reserved.