--- /dev/null
+rfc1939.txt POP3
+rfc2821.txt SMTP
+rfc2822.txt Internet Message Format
+rfc977.txt NNTP
+rfc2045.txt MIME 1
+rfc2046.txt MIME 2
+rfc2047.txt MIME 3
+rfc2048.txt MIME 4
+rfc2049.txt MIME 5
+rfc2060.txt IMAP4
--- /dev/null
+\r
+\r
+\r
+\r
+\r
+\r
+Network Working Group J. Myers\r
+Request for Comments: 1939 Carnegie Mellon\r
+STD: 53 M. Rose\r
+Obsoletes: 1725 Dover Beach Consulting, Inc.\r
+Category: Standards Track May 1996\r
+\r
+\r
+ Post Office Protocol - Version 3\r
+\r
+Status of this Memo\r
+\r
+ This document specifies an Internet standards track protocol for the\r
+ Internet community, and requests discussion and suggestions for\r
+ improvements. Please refer to the current edition of the "Internet\r
+ Official Protocol Standards" (STD 1) for the standardization state\r
+ and status of this protocol. Distribution of this memo is unlimited.\r
+\r
+Table of Contents\r
+\r
+ 1. Introduction ................................................ 2\r
+ 2. A Short Digression .......................................... 2\r
+ 3. Basic Operation ............................................. 3\r
+ 4. The AUTHORIZATION State ..................................... 4\r
+ QUIT Command ................................................ 5\r
+ 5. The TRANSACTION State ....................................... 5\r
+ STAT Command ................................................ 6\r
+ LIST Command ................................................ 6\r
+ RETR Command ................................................ 8\r
+ DELE Command ................................................ 8\r
+ NOOP Command ................................................ 9\r
+ RSET Command ................................................ 9\r
+ 6. The UPDATE State ............................................ 10\r
+ QUIT Command ................................................ 10\r
+ 7. Optional POP3 Commands ...................................... 11\r
+ TOP Command ................................................. 11\r
+ UIDL Command ................................................ 12\r
+ USER Command ................................................ 13\r
+ PASS Command ................................................ 14\r
+ APOP Command ................................................ 15\r
+ 8. Scaling and Operational Considerations ...................... 16\r
+ 9. POP3 Command Summary ........................................ 18\r
+ 10. Example POP3 Session ....................................... 19\r
+ 11. Message Format ............................................. 19\r
+ 12. References ................................................. 20\r
+ 13. Security Considerations .................................... 20\r
+ 14. Acknowledgements ........................................... 20\r
+ 15. Authors' Addresses ......................................... 21\r
+ Appendix A. Differences from RFC 1725 .......................... 22\r
+\r
+\r
+\r
+Myers & Rose Standards Track [Page 1]\r
+\f\r
+RFC 1939 POP3 May 1996\r
+\r
+\r
+ Appendix B. Command Index ...................................... 23\r
+\r
+1. Introduction\r
+\r
+ On certain types of smaller nodes in the Internet it is often\r
+ impractical to maintain a message transport system (MTS). For\r
+ example, a workstation may not have sufficient resources (cycles,\r
+ disk space) in order to permit an SMTP server [RFC821] and associated\r
+ local mail delivery system to be kept resident and continuously\r
+ running. Similarly, it may be expensive (or impossible) to keep a\r
+ personal computer interconnected to an IP-style network for long\r
+ periods of time (the node is lacking the resource known as\r
+ "connectivity").\r
+\r
+ Despite this, it is often very useful to be able to manage mail on\r
+ these smaller nodes, and they often support a user agent (UA) to aid\r
+ the tasks of mail handling. To solve this problem, a node which can\r
+ support an MTS entity offers a maildrop service to these less endowed\r
+ nodes. The Post Office Protocol - Version 3 (POP3) is intended to\r
+ permit a workstation to dynamically access a maildrop on a server\r
+ host in a useful fashion. Usually, this means that the POP3 protocol\r
+ is used to allow a workstation to retrieve mail that the server is\r
+ holding for it.\r
+\r
+ POP3 is not intended to provide extensive manipulation operations of\r
+ mail on the server; normally, mail is downloaded and then deleted. A\r
+ more advanced (and complex) protocol, IMAP4, is discussed in\r
+ [RFC1730].\r
+\r
+ For the remainder of this memo, the term "client host" refers to a\r
+ host making use of the POP3 service, while the term "server host"\r
+ refers to a host which offers the POP3 service.\r
+\r
+2. A Short Digression\r
+\r
+ This memo does not specify how a client host enters mail into the\r
+ transport system, although a method consistent with the philosophy of\r
+ this memo is presented here:\r
+\r
+ When the user agent on a client host wishes to enter a message\r
+ into the transport system, it establishes an SMTP connection to\r
+ its relay host and sends all mail to it. This relay host could\r
+ be, but need not be, the POP3 server host for the client host. Of\r
+ course, the relay host must accept mail for delivery to arbitrary\r
+ recipient addresses; that functionality is not required of all\r
+ SMTP servers.\r
+\r
+3. Basic Operation\r
+\r
+ Initially, the server host starts the POP3 service by listening on\r
+ TCP port 110. When a client host wishes to make use of the service,\r
+ it establishes a TCP connection with the server host. When the\r
+ connection is established, the POP3 server sends a greeting. The\r
+ client and POP3 server then exchange commands and responses\r
+ (respectively) until the connection is closed or aborted.\r
+\r
+ Commands in the POP3 consist of a case-insensitive keyword, possibly\r
+ followed by one or more arguments. All commands are terminated by a\r
+ CRLF pair. Keywords and arguments consist of printable ASCII\r
+ characters. Keywords and arguments are each separated by a single\r
+ SPACE character. Keywords are three or four characters long. Each\r
+ argument may be up to 40 characters long.\r
+\r
+ Responses in the POP3 consist of a status indicator and a keyword\r
+ possibly followed by additional information. All responses are\r
+ terminated by a CRLF pair. Responses may be up to 512 characters\r
+ long, including the terminating CRLF. There are currently two status\r
+ indicators: positive ("+OK") and negative ("-ERR"). Servers MUST\r
+ send the "+OK" and "-ERR" in upper case.\r
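The command and response syntax above can be illustrated with a minimal Python sketch. It assumes the whole line, including its CRLF, has already been read from the connection; the names parse_command and parse_status are hypothetical. It applies the general single-space rule strictly, whereas the PASS command described later may treat spaces as part of its argument.

```python
# Sketch only: checks the syntax limits stated in the text.
# parse_command and parse_status are hypothetical helper names.


def parse_command(line: bytes) -> tuple[str, list[str]]:
    """Split one CRLF-terminated command into (KEYWORD, args)."""
    if not line.endswith(b"\r\n"):
        raise ValueError("command must end with CRLF")
    text = line[:-2].decode("ascii")
    # Keywords and arguments are printable ASCII; SPACE separates them.
    if any(not 0x20 <= ord(ch) <= 0x7E for ch in text):
        raise ValueError("non-printable character in command")
    keyword, *args = text.split(" ")
    if not 3 <= len(keyword) <= 4:
        raise ValueError("keyword must be 3 or 4 characters long")
    # A single SPACE between fields means no argument may be empty.
    if any(not 1 <= len(arg) <= 40 for arg in args):
        raise ValueError("argument empty or longer than 40 characters")
    # Keywords are case-insensitive, so normalize them to upper case.
    return keyword.upper(), args


def parse_status(line: bytes) -> tuple[bool, str]:
    """Return (is_positive, remainder) for the first line of a response."""
    if len(line) > 512:
        raise ValueError("response longer than 512 characters")
    text = line.rstrip(b"\r\n").decode("ascii", errors="replace")
    if text.startswith("+OK"):
        return True, text[3:].lstrip(" ")
    if text.startswith("-ERR"):
        return False, text[4:].lstrip(" ")
    raise ValueError("response lacks a +OK or -ERR status indicator")
```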
+\r
+ Responses to certain commands are multi-line. In these cases, which\r
+ are clearly indicated below, after sending the first line of the\r
+ response and a CRLF, any additional lines are sent, each terminated\r
+ by a CRLF pair. When all lines of the response have been sent, a\r
+ final line is sent, consisting of a termination octet (decimal code\r
+ 046, ".") and a CRLF pair. If any line of the multi-line response\r
+ begins with the termination octet, the line is "byte-stuffed" by\r
+ pre-pending the termination octet to that line of the response.\r
+ Hence a multi-line response is terminated with the five octets\r
+ "CRLF.CRLF". When examining a multi-line response, the client checks\r
+ to see if the line begins with the termination octet. If so and if\r
+ octets other than CRLF follow, the first octet of the line (the\r
+ termination octet) is stripped away. If so and if CRLF immediately\r
+ follows the termination character, then the response from the POP\r
+ server is ended and the line containing ".CRLF" is not considered\r
+ part of the multi-line response.\r
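A minimal Python sketch of this framing: the server side byte-stuffs lines that begin with the termination octet and appends ".CRLF"; the client side reverses the process. The helpers encode_multiline and decode_multiline are hypothetical, and the sketch assumes the whole response is already in memory rather than read incrementally from a socket.

```python
TERMINATION = b"."


def encode_multiline(lines: list[bytes]) -> bytes:
    """Server side: byte-stuff each line and append the ".CRLF" terminator."""
    out = []
    for line in lines:
        if line.startswith(TERMINATION):
            line = TERMINATION + line  # pre-pend the termination octet
        out.append(line + b"\r\n")
    out.append(TERMINATION + b"\r\n")
    return b"".join(out)


def decode_multiline(data: bytes) -> list[bytes]:
    """Client side: collect lines until ".CRLF", un-stuffing as we go.

    `data` holds everything after the first (+OK) line of the response.
    """
    lines = []
    for raw in data.split(b"\r\n"):
        if raw == TERMINATION:
            return lines  # ".CRLF" ends the response and is not part of it
        if raw.startswith(TERMINATION):
            raw = raw[1:]  # strip the stuffed termination octet
        lines.append(raw)
    raise ValueError("response ended without the CRLF.CRLF terminator")
```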
+\r
+ A POP3 session progresses through a number of states during its\r
+ lifetime. Once the TCP connection has been opened and the POP3\r
+ server has sent the greeting, the session enters the AUTHORIZATION\r
+ state. In this state, the client must identify itself to the POP3\r
+ server. Once the client has successfully done this, the server\r
+ acquires resources associated with the client's maildrop, and the\r
+ session enters the TRANSACTION state. In this state, the client\r
+ requests actions on the part of the POP3 server. When the client has\r
+ issued the QUIT command, the session enters the UPDATE state. In\r
+ this state, the POP3 server releases any resources acquired during\r
+ the TRANSACTION state and says goodbye. The TCP connection is then\r
+ closed.\r
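The state progression can be sketched as a small state machine. The State enum and Session class below are hypothetical illustrations; they omit authentication, locking, and the actual message removal performed in the UPDATE state. The QUIT handling follows the rule, stated in the UPDATE State section, that QUIT from AUTHORIZATION ends the session without entering UPDATE.

```python
from enum import Enum, auto


class State(Enum):
    AUTHORIZATION = auto()
    TRANSACTION = auto()
    UPDATE = auto()
    CLOSED = auto()


class Session:
    """Hypothetical model of a session's lifetime after the greeting."""

    def __init__(self) -> None:
        # The greeting has been sent, so the session is in AUTHORIZATION.
        self.state = State.AUTHORIZATION

    def authorized(self) -> None:
        """Client identified itself and the maildrop was acquired."""
        if self.state is not State.AUTHORIZATION:
            raise RuntimeError("-ERR command not valid in this state")
        self.state = State.TRANSACTION

    def quit(self) -> bool:
        """Handle QUIT; return True if the UPDATE state was entered."""
        entered_update = self.state is State.TRANSACTION
        if entered_update:
            self.state = State.UPDATE
            # Here a server would remove messages marked as deleted,
            # release the maildrop lock, and send its goodbye.
        # QUIT from AUTHORIZATION ends the session without UPDATE.
        self.state = State.CLOSED
        return entered_update
```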
+\r
+ A server MUST respond to an unrecognized, unimplemented, or\r
+ syntactically invalid command by responding with a negative status\r
+ indicator. A server MUST respond to a command issued when the\r
+ session is in an incorrect state by responding with a negative status\r
+ indicator. There is no general method for a client to distinguish\r
+ between a server which does not implement an optional command and a\r
+ server which is unwilling or unable to process the command.\r
+\r
+ A POP3 server MAY have an inactivity autologout timer. Such a timer\r
+ MUST be of at least 10 minutes' duration. The receipt of any command\r
+ from the client during that interval should suffice to reset the\r
+ autologout timer. When the timer expires, the session does NOT enter\r
+ the UPDATE state--the server should close the TCP connection without\r
+ removing any messages or sending any response to the client.\r
+\r
+4. The AUTHORIZATION State\r
+\r
+ Once the TCP connection has been opened by a POP3 client, the POP3\r
+ server issues a one line greeting. This can be any positive\r
+ response. An example might be:\r
+\r
+ S: +OK POP3 server ready\r
+\r
+ The POP3 session is now in the AUTHORIZATION state. The client must\r
+ now identify and authenticate itself to the POP3 server. This\r
+ document describes two mechanisms for doing this, both covered later:\r
+ the USER and PASS command combination and the APOP command.\r
+ Additional authentication mechanisms are described in [RFC1734].\r
+ While there is no single authentication mechanism that is required of\r
+ all POP3 servers, a POP3 server must of course support at least one\r
+ authentication mechanism.\r
+\r
+ Once the POP3 server has determined through the use of any\r
+ authentication command that the client should be given access to the\r
+ appropriate maildrop, the POP3 server then acquires an exclusive-\r
+ access lock on the maildrop, as necessary to prevent messages from\r
+ being modified or removed before the session enters the UPDATE state.\r
+ If the lock is successfully acquired, the POP3 server responds with a\r
+ positive status indicator. The POP3 session now enters the\r
+ TRANSACTION state, with no messages marked as deleted. If the\r
+ maildrop cannot be opened for some reason (for example, a lock cannot\r
+ be acquired, the client is denied access to the appropriate\r
+ maildrop, or the maildrop cannot be parsed), the POP3 server responds\r
+ with a negative status indicator. (If a lock was acquired but the\r
+ POP3 server intends to respond with a negative status indicator, the\r
+ POP3 server must release the lock prior to rejecting the command.)\r
+ After returning a negative status indicator, the server may close the\r
+ connection. If the server does not close the connection, the client\r
+ may either issue a new authentication command and start again, or the\r
+ client may issue the QUIT command.\r
+\r
+ After the POP3 server has opened the maildrop, it assigns a message-\r
+ number to each message, and notes the size of each message in octets.\r
+ The first message in the maildrop is assigned a message-number of\r
+ "1", the second is assigned "2", and so on, so that the nth message\r
+ in a maildrop is assigned a message-number of "n". In POP3 commands\r
+ and responses, all message-numbers and message sizes are expressed in\r
+ base-10 (i.e., decimal).\r
+\r
+ Here is the summary for the QUIT command when used in the\r
+ AUTHORIZATION state:\r
+\r
+ QUIT\r
+\r
+ Arguments: none\r
+\r
+ Restrictions: none\r
+\r
+ Possible Responses:\r
+ +OK\r
+\r
+ Examples:\r
+ C: QUIT\r
+ S: +OK dewey POP3 server signing off\r
+\r
+5. The TRANSACTION State\r
+\r
+ Once the client has successfully identified itself to the POP3 server\r
+ and the POP3 server has locked and opened the appropriate maildrop,\r
+ the POP3 session is now in the TRANSACTION state. The client may now\r
+ issue any of the following POP3 commands repeatedly. After each\r
+ command, the POP3 server issues a response. Eventually, the client\r
+ issues the QUIT command and the POP3 session enters the UPDATE state.\r
+\r
+ Here are the POP3 commands valid in the TRANSACTION state:\r
+\r
+ STAT\r
+\r
+ Arguments: none\r
+\r
+ Restrictions:\r
+ may only be given in the TRANSACTION state\r
+\r
+ Discussion:\r
+ The POP3 server issues a positive response with a line\r
+ containing information for the maildrop. This line is\r
+ called a "drop listing" for that maildrop.\r
+\r
+ In order to simplify parsing, all POP3 servers are\r
+ required to use a certain format for drop listings. The\r
+ positive response consists of "+OK" followed by a single\r
+ space, the number of messages in the maildrop, a single\r
+ space, and the size of the maildrop in octets. This memo\r
+ makes no requirement on what follows the maildrop size.\r
+ Minimal implementations should just end that line of the\r
+ response with a CRLF pair. More advanced implementations\r
+ may include other information.\r
+\r
+ NOTE: This memo STRONGLY discourages implementations\r
+ from supplying additional information in the drop\r
+ listing. Other, optional, facilities are discussed\r
+ later on which permit the client to parse the messages\r
+ in the maildrop.\r
+\r
+ Note that messages marked as deleted are not counted in\r
+ either total.\r
+\r
+ Possible Responses:\r
+ +OK nn mm\r
+\r
+ Examples:\r
+ C: STAT\r
+ S: +OK 2 320\r
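As an illustration rather than a normative implementation, the message numbering from Section 4 and the STAT drop listing could be computed as follows. The sketch assumes message sizes are known in maildrop order and that DELE marks are kept in a set; both function names are hypothetical.

```python
def number_messages(sizes: list[int]) -> dict[int, int]:
    """Assign message-numbers 1..n, mapping each number to its octet size."""
    return {n: size for n, size in enumerate(sizes, start=1)}


def stat_response(messages: dict[int, int], deleted: set[int]) -> str:
    """Build the "+OK nn mm" drop listing, skipping deleted messages."""
    live = [size for n, size in messages.items() if n not in deleted]
    return f"+OK {len(live)} {sum(live)}"
```

With two messages of 120 and 200 octets and nothing deleted, this yields the "+OK 2 320" shown in the example.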
+\r
+\r
+ LIST [msg]\r
+\r
+ Arguments:\r
+ a message-number (optional), which, if present, may NOT\r
+ refer to a message marked as deleted\r
+\r
+ Restrictions:\r
+ may only be given in the TRANSACTION state\r
+\r
+ Discussion:\r
+ If an argument was given and the POP3 server issues a\r
+ positive response, the response is a single line\r
+ containing information for that message. This line is\r
+ called a "scan listing" for that message.\r
+\r
+ If no argument was given and the POP3 server issues a\r
+ positive response, then the response given is multi-line.\r
+ After the initial +OK, for each message in the maildrop,\r
+ the POP3 server responds with a line containing\r
+ information for that message. This line is also called a\r
+ "scan listing" for that message. If there are no\r
+ messages in the maildrop, then the POP3 server responds\r
+ with no scan listings--it issues a positive response\r
+ followed by a line containing a termination octet and a\r
+ CRLF pair.\r
+\r
+ In order to simplify parsing, all POP3 servers are\r
+ required to use a certain format for scan listings. A\r
+ scan listing consists of the message-number of the\r
+ message, followed by a single space and the exact size of\r
+ the message in octets. Methods for calculating the exact\r
+ size of the message are described in the "Message Format"\r
+ section below. This memo makes no requirement on what\r
+ follows the message size in the scan listing. Minimal\r
+ implementations should just end that line of the response\r
+ with a CRLF pair. More advanced implementations may\r
+ include other information, as parsed from the message.\r
+\r
+ NOTE: This memo STRONGLY discourages implementations\r
+ from supplying additional information in the scan\r
+ listing. Other, optional, facilities are discussed\r
+ later on which permit the client to parse the messages\r
+ in the maildrop.\r
+\r
+ Note that messages marked as deleted are not listed.\r
+\r
+ Possible Responses:\r
+ +OK scan listing follows\r
+ -ERR no such message\r
+\r
+ Examples:\r
+ C: LIST\r
+ S: +OK 2 messages (320 octets)\r
+ S: 1 120\r
+ S: 2 200\r
+ S: .\r
+ ...\r
+ C: LIST 2\r
+ S: +OK 2 200\r
+ ...\r
+ C: LIST 3\r
+ S: -ERR no such message, only 2 messages in maildrop\r
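On the client side, a LIST response might be parsed as in this hypothetical sketch. It assumes the multi-line response has already been un-stuffed and its terminating "." line removed, and it reads only the first two fields so that any (discouraged) extra information after the size is ignored.

```python
def parse_scan_listing(line: str) -> tuple[int, int]:
    """Return (message-number, size) from a scan listing such as "2 200"."""
    fields = line.split(" ")
    # Only the first two fields are required; anything after is ignored.
    return int(fields[0]), int(fields[1])


def parse_list_response(lines: list[str]) -> dict[int, int]:
    """Parse the scan listings of a multi-line LIST response."""
    return dict(parse_scan_listing(line) for line in lines)


def parse_single_list(status_line: str) -> tuple[int, int]:
    """Parse the reply to "LIST msg", e.g. "+OK 2 200"."""
    if not status_line.startswith("+OK "):
        raise ValueError(f"negative or malformed reply: {status_line!r}")
    return parse_scan_listing(status_line[4:])
```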
+\r
+\r
+ RETR msg\r
+\r
+ Arguments:\r
+ a message-number (required) which may NOT refer to a\r
+ message marked as deleted\r
+\r
+ Restrictions:\r
+ may only be given in the TRANSACTION state\r
+\r
+ Discussion:\r
+ If the POP3 server issues a positive response, then the\r
+ response given is multi-line. After the initial +OK, the\r
+ POP3 server sends the message corresponding to the given\r
+ message-number, being careful to byte-stuff the termination\r
+ character (as with all multi-line responses).\r
+\r
+ Possible Responses:\r
+ +OK message follows\r
+ -ERR no such message\r
+\r
+ Examples:\r
+ C: RETR 1\r
+ S: +OK 120 octets\r
+ S: <the POP3 server sends the entire message here>\r
+ S: .\r
+\r
+\r
+ DELE msg\r
+\r
+ Arguments:\r
+ a message-number (required) which may NOT refer to a\r
+ message marked as deleted\r
+\r
+ Restrictions:\r
+ may only be given in the TRANSACTION state\r
+\r
+ Discussion:\r
+ The POP3 server marks the message as deleted. Any future\r
+ reference to the message-number associated with the message\r
+ in a POP3 command generates an error. The POP3 server does\r
+ not actually delete the message until the POP3 session\r
+ enters the UPDATE state.\r
+\r
+ Possible Responses:\r
+ +OK message deleted\r
+ -ERR no such message\r
+\r
+ Examples:\r
+ C: DELE 1\r
+ S: +OK message 1 deleted\r
+ ...\r
+ C: DELE 2\r
+ S: -ERR message 2 already deleted\r
+\r
+\r
+ NOOP\r
+\r
+ Arguments: none\r
+\r
+ Restrictions:\r
+ may only be given in the TRANSACTION state\r
+\r
+ Discussion:\r
+ The POP3 server does nothing; it merely replies with a\r
+ positive response.\r
+\r
+ Possible Responses:\r
+ +OK\r
+\r
+ Examples:\r
+ C: NOOP\r
+ S: +OK\r
+\r
+\r
+ RSET\r
+\r
+ Arguments: none\r
+\r
+ Restrictions:\r
+ may only be given in the TRANSACTION state\r
+\r
+ Discussion:\r
+ If any messages have been marked as deleted by the POP3\r
+ server, they are unmarked. The POP3 server then replies\r
+ with a positive response.\r
+\r
+ Possible Responses:\r
+ +OK\r
+\r
+ Examples:\r
+ C: RSET\r
+ S: +OK maildrop has 2 messages (320 octets)\r
+\r
+6. The UPDATE State\r
+\r
+ When the client issues the QUIT command from the TRANSACTION state,\r
+ the POP3 session enters the UPDATE state. (Note that if the client\r
+ issues the QUIT command from the AUTHORIZATION state, the POP3\r
+ session terminates but does NOT enter the UPDATE state.)\r
+\r
+ If a session terminates for some reason other than a client-issued\r
+ QUIT command, the POP3 session does NOT enter the UPDATE state and\r
+ MUST NOT remove any messages from the maildrop.\r
+\r
+ QUIT\r
+\r
+ Arguments: none\r
+\r
+ Restrictions: none\r
+\r
+ Discussion:\r
+ The POP3 server removes all messages marked as deleted\r
+ from the maildrop and replies as to the status of this\r
+ operation. If an error, such as a resource shortage, is\r
+ encountered while removing messages, only some, or none, of\r
+ the messages marked as deleted may actually be removed from\r
+ the maildrop. In no case may the server remove any messages\r
+ not marked as deleted.\r
+\r
+ Whether the removal was successful or not, the server\r
+ then releases any exclusive-access lock on the maildrop\r
+ and closes the TCP connection.\r
+\r
+ Possible Responses:\r
+ +OK\r
+ -ERR some deleted messages not removed\r
+\r
+ Examples:\r
+ C: QUIT\r
+ S: +OK dewey POP3 server signing off (maildrop empty)\r
+ ...\r
+ C: QUIT\r
+ S: +OK dewey POP3 server signing off (2 messages left)\r
+ ...\r
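The deferred-deletion semantics of DELE, RSET, and QUIT can be sketched with an in-memory maildrop. The Maildrop class is hypothetical and ignores locking and persistence; a session that ends without QUIT would simply never call quit_from_transaction, so nothing would be removed.

```python
class Maildrop:
    """Hypothetical in-memory maildrop with DELE marks."""

    def __init__(self, messages: list[bytes]) -> None:
        self.messages = {n: m for n, m in enumerate(messages, start=1)}
        self.deleted: set[int] = set()

    def dele(self, n: int) -> str:
        if n not in self.messages or n in self.deleted:
            return "-ERR no such message"
        self.deleted.add(n)  # only marked; nothing is removed yet
        return f"+OK message {n} deleted"

    def rset(self) -> str:
        self.deleted.clear()  # unmark every message
        return "+OK"

    def quit_from_transaction(self) -> str:
        # UPDATE state: remove exactly the messages marked as deleted.
        # A later session would renumber the survivors 1..n.
        for n in self.deleted:
            del self.messages[n]
        self.deleted.clear()
        return f"+OK signing off ({len(self.messages)} messages left)"
```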
+\r
+7. Optional POP3 Commands\r
+\r
+ The POP3 commands discussed above must be supported by all minimal\r
+ implementations of POP3 servers.\r
+\r
+ The optional POP3 commands described below permit a POP3 client\r
+ greater freedom in message handling, while preserving a simple POP3\r
+ server implementation.\r
+\r
+ NOTE: This memo STRONGLY encourages implementations to support\r
+ these commands in lieu of developing augmented drop and scan\r
+ listings. In short, the philosophy of this memo is to put\r
+ intelligence in the POP3 client rather than in the POP3\r
+ server.\r
+\r
+ TOP msg n\r
+\r
+ Arguments:\r
+ a message-number (required) which may NOT refer to a\r
+ message marked as deleted, and a non-negative number\r
+ of lines (required)\r
+\r
+ Restrictions:\r
+ may only be given in the TRANSACTION state\r
+\r
+ Discussion:\r
+ If the POP3 server issues a positive response, then the\r
+ response given is multi-line. After the initial +OK, the\r
+ POP3 server sends the headers of the message, the blank\r
+ line separating the headers from the body, and then the\r
+ number of lines of the indicated message's body, being\r
+ careful to byte-stuff the termination character (as with\r
+ all multi-line responses).\r
+\r
+ Note that if the number of lines requested by the POP3\r
+ client is greater than the number of lines in the\r
+ body, then the POP3 server sends the entire message.\r
+\r
+ Possible Responses:\r
+ +OK top of message follows\r
+ -ERR no such message\r
+\r
+ Examples:\r
+ C: TOP 1 10\r
+ S: +OK\r
+ S: <the POP3 server sends the headers of the\r
+ message, a blank line, and the first 10 lines\r
+ of the body of the message>\r
+ S: .\r
+ ...\r
+ C: TOP 100 3\r
+ S: -ERR no such message\r
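A hypothetical sketch of how a server might select the lines returned by TOP, assuming the message is held as a list of lines without their CRLF. The result would still need byte-stuffing and the terminating "." line, like any multi-line response.

```python
def top(message_lines: list[bytes], n: int) -> list[bytes]:
    """Return the headers, the blank separator line, and n body lines."""
    if n < 0:
        raise ValueError("the number of lines must be non-negative")
    try:
        blank = message_lines.index(b"")  # first empty line ends the headers
    except ValueError:
        return list(message_lines)  # no body at all: send the headers
    # Slicing past the end yields the whole body, so a request for more
    # lines than the body has returns the entire message.
    return message_lines[: blank + 1 + n]
```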
+\r
+\r
+ UIDL [msg]\r
+\r
+ Arguments:\r
+ a message-number (optional), which, if present, may NOT\r
+ refer to a message marked as deleted\r
+\r
+ Restrictions:\r
+ may only be given in the TRANSACTION state.\r
+\r
+ Discussion:\r
+ If an argument was given and the POP3 server issues a positive\r
+ response, the response is a single line containing information\r
+ for that message. This line is called a "unique-id listing" for\r
+ that message.\r
+\r
+ If no argument was given and the POP3 server issues a positive\r
+ response, then the response given is multi-line. After the\r
+ initial +OK, for each message in the maildrop, the POP3 server\r
+ responds with a line containing information for that message.\r
+ This line is called a "unique-id listing" for that message.\r
+\r
+ In order to simplify parsing, all POP3 servers are required to\r
+ use a certain format for unique-id listings. A unique-id\r
+ listing consists of the message-number of the message,\r
+ followed by a single space and the unique-id of the message.\r
+ No information follows the unique-id in the unique-id listing.\r
+\r
+ The unique-id of a message is an arbitrary server-determined\r
+ string, consisting of one to 70 characters in the range 0x21\r
+ to 0x7E, which uniquely identifies a message within a\r
+ maildrop and which persists across sessions. This\r
+ persistence is required even if a session ends without\r
+ entering the UPDATE state. The server should never reuse a\r
+ unique-id in a given maildrop, for as long as the entity\r
+ using the unique-id exists.\r
+\r
+ Note that messages marked as deleted are not listed.\r
+\r
+ While it is generally preferable for server implementations\r
+ to store arbitrarily assigned unique-ids in the maildrop,\r
+\r
+ this specification is intended to permit unique-ids to be\r
+ calculated as a hash of the message. Clients should be able\r
+ to handle a situation where two identical copies of a\r
+ message in a maildrop have the same unique-id.\r
+\r
+ Possible Responses:\r
+ +OK unique-id listing follows\r
+ -ERR no such message\r
+\r
+ Examples:\r
+ C: UIDL\r
+ S: +OK\r
+ S: 1 whqtswO00WBw418f9t5JxYwZ\r
+ S: 2 QhdPYR:00WBw1Ph7x7\r
+ S: .\r
+ ...\r
+ C: UIDL 2\r
+ S: +OK 2 QhdPYR:00WBw1Ph7x7\r
+ ...\r
+ C: UIDL 3\r
+ S: -ERR no such message, only 2 messages in maildrop\r
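Since unique-ids may be calculated as a hash of the message, one possible sketch uses SHA-256 from Python's hashlib. The choice of SHA-256 is an assumption, not something this memo requires; its 64-character hexadecimal digest happens to satisfy the 1 to 70 character, 0x21 to 0x7E constraint. Under this scheme, identical copies of a message get identical unique-ids, which clients must tolerate as described above.

```python
import hashlib


def unique_id(message: bytes) -> str:
    # Assumed scheme: hex SHA-256 of the raw message (64 characters).
    return hashlib.sha256(message).hexdigest()


def is_valid_unique_id(uid: str) -> bool:
    """Check the 1-70 character, 0x21-0x7E constraint from the text."""
    return 1 <= len(uid) <= 70 and all(0x21 <= ord(c) <= 0x7E for c in uid)


def uidl_listing(messages: dict[int, bytes], deleted: set[int]) -> list[str]:
    """Build "n unique-id" lines, skipping messages marked as deleted."""
    return [f"{n} {unique_id(m)}" for n, m in sorted(messages.items())
            if n not in deleted]
```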
+\r
+\r
+ USER name\r
+\r
+ Arguments:\r
+ a string identifying a mailbox (required), which is of\r
+ significance ONLY to the server\r
+\r
+ Restrictions:\r
+ may only be given in the AUTHORIZATION state after the POP3\r
+ greeting or after an unsuccessful USER or PASS command\r
+\r
+ Discussion:\r
+ To authenticate using the USER and PASS command\r
+ combination, the client must first issue the USER\r
+ command. If the POP3 server responds with a positive\r
+ status indicator ("+OK"), then the client may issue\r
+ either the PASS command to complete the authentication,\r
+ or the QUIT command to terminate the POP3 session. If\r
+ the POP3 server responds with a negative status indicator\r
+ ("-ERR") to the USER command, then the client may either\r
+ issue a new authentication command or may issue the QUIT\r
+ command.\r
+\r
+ The server may return a positive response even though no\r
+ such mailbox exists. The server may return a negative\r
+ response if mailbox exists, but does not permit plaintext\r
+\r
+ password authentication.\r
+\r
+ Possible Responses:\r
+ +OK name is a valid mailbox\r
+ -ERR never heard of mailbox name\r
+\r
+ Examples:\r
+ C: USER frated\r
+ S: -ERR sorry, no mailbox for frated here\r
+ ...\r
+ C: USER mrose\r
+ S: +OK mrose is a real hoopy frood\r
+\r
+\r
+ PASS string\r
+\r
+ Arguments:\r
+ a server/mailbox-specific password (required)\r
+\r
+ Restrictions:\r
+ may only be given in the AUTHORIZATION state immediately\r
+ after a successful USER command\r
+\r
+ Discussion:\r
+ When the client issues the PASS command, the POP3 server\r
+ uses the argument pair from the USER and PASS commands to\r
+ determine if the client should be given access to the\r
+ appropriate maildrop.\r
+\r
+ Since the PASS command has exactly one argument, a POP3\r
+ server may treat spaces in the argument as part of the\r
+ password, instead of as argument separators.\r
+\r
+ Possible Responses:\r
+ +OK maildrop locked and ready\r
+ -ERR invalid password\r
+ -ERR unable to lock maildrop\r
+\r
+ Examples:\r
+ C: USER mrose\r
+ S: +OK mrose is a real hoopy frood\r
+ C: PASS secret\r
+ S: -ERR maildrop already locked\r
+ ...\r
+ C: USER mrose\r
+ S: +OK mrose is a real hoopy frood\r
+ C: PASS secret\r
+ S: +OK mrose's maildrop has 2 messages (320 octets)\r
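The ordering rules for USER and PASS might be enforced as in this hypothetical sketch. The check_password callback stands in for whatever credential store a server uses, and locking the maildrop is omitted. Splitting the command only at the first space lets spaces in the PASS argument remain part of the password.

```python
from typing import Callable, Optional


class UserPassAuth:
    """Hypothetical AUTHORIZATION-state handler for USER and PASS."""

    def __init__(self, check_password: Callable[[str, str], bool]) -> None:
        self.check_password = check_password
        self.pending_user: Optional[str] = None  # set by a successful USER

    def handle(self, line: str) -> str:
        # Split at the first space only, so PASS keeps embedded spaces.
        keyword, _, arg = line.partition(" ")
        keyword = keyword.upper()
        if keyword == "USER":
            if not arg:
                self.pending_user = None
                return "-ERR mailbox name required"
            self.pending_user = arg
            return "+OK"
        if keyword == "PASS":
            # PASS is valid only immediately after a successful USER.
            user, self.pending_user = self.pending_user, None
            if user is None:
                return "-ERR USER first"
            if self.check_password(user, arg):
                return "+OK maildrop locked and ready"
            return "-ERR invalid password"
        # Any other command breaks the USER/PASS pairing.
        self.pending_user = None
        return "-ERR command not handled by this sketch"
```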
+\r
+\r
+ APOP name digest\r
+\r
+ Arguments:\r
+ a string identifying a mailbox and an MD5 digest string\r
+ (both required)\r
+\r
+ Restrictions:\r
+ may only be given in the AUTHORIZATION state after the POP3\r
+ greeting or after an unsuccessful USER or PASS command\r
+\r
+ Discussion:\r
+ Normally, each POP3 session starts with a USER/PASS\r
+ exchange. This results in a server/user-id specific\r
+ password being sent in the clear on the network. For\r
+ intermittent use of POP3, this may not introduce a sizable\r
+ risk. However, many POP3 client implementations connect to\r
+ the POP3 server on a regular basis -- to check for new\r
+ mail. Further, the interval of session initiation may be on\r
+ the order of five minutes. Hence, the risk of password\r
+ capture is greatly enhanced.\r
+\r
+ An alternate method of authentication is required which\r
+ provides for both origin authentication and replay\r
+ protection, but which does not involve sending a password\r
+ in the clear over the network. The APOP command provides\r
+ this functionality.\r
+\r
+ A POP3 server which implements the APOP command will\r
+ include a timestamp in its banner greeting. The syntax of\r
+ the timestamp corresponds to the `msg-id' in [RFC822], and\r
+ MUST be different each time the POP3 server issues a banner\r
+ greeting. For example, on a UNIX implementation in which a\r
+ separate UNIX process is used for each instance of a POP3\r
+ server, the syntax of the timestamp might be:\r
+\r
+ <process-ID.clock@hostname>\r
+\r
+ where `process-ID' is the decimal value of the process's\r
+ PID, clock is the decimal value of the system clock, and\r
+ hostname is the fully-qualified domain-name corresponding\r
+ to the host where the POP3 server is running.\r
+\r
+ The POP3 client makes note of this timestamp, and then\r
+ issues the APOP command. The `name' parameter has\r
+ identical semantics to the `name' parameter of the USER\r
+ command. The `digest' parameter is calculated by applying\r
+ the MD5 algorithm [RFC1321] to a string consisting of the\r
+ timestamp (including angle-brackets) followed by a shared\r
+\r
+ secret. This shared secret is a string known only to the\r
+ POP3 client and server. Great care should be taken to\r
+ prevent unauthorized disclosure of the secret, as knowledge\r
+ of the secret will allow any entity to successfully\r
+ masquerade as the named user. The `digest' parameter\r
+ itself is a 16-octet value which is sent in hexadecimal\r
+ format, using lower-case ASCII characters.\r
+\r
+ When the POP3 server receives the APOP command, it verifies\r
+ the digest provided. If the digest is correct, the POP3\r
+ server issues a positive response, and the POP3 session\r
+ enters the TRANSACTION state. Otherwise, a negative\r
+ response is issued and the POP3 session remains in the\r
+ AUTHORIZATION state.\r
+\r
+ Note that as the length of the shared secret increases, so\r
+ does the difficulty of deriving it. As such, shared\r
+ secrets should be long strings (considerably longer than\r
+ the 8-character example shown below).\r
+\r
+ Possible Responses:\r
+ +OK maildrop locked and ready\r
+ -ERR permission denied\r
+\r
+ Examples:\r
+ S: +OK POP3 server ready <1896.697170952@dbc.mtview.ca.us>\r
+ C: APOP mrose c4c9334bac560ecc979e58001b3e22fb\r
+ S: +OK maildrop has 1 message (369 octets)\r
+\r
+ In this example, the shared secret is the string\r
+ `tanstaaf'. Hence, the MD5 algorithm is applied to the string\r
+\r
+ <1896.697170952@dbc.mtview.ca.us>tanstaaf\r
+\r
+ which produces a digest value of\r
+\r
+ c4c9334bac560ecc979e58001b3e22fb\r
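The digest computation can be sketched with Python's hashlib.md5. Extracting the timestamp with a regular expression that looks for a <...@...> token in the greeting is an assumption about banner layout, not a requirement of this memo; the function name apop_digest is hypothetical. With the example greeting and the shared secret tanstaaf, the function should reproduce the digest shown above.

```python
import hashlib
import re


def apop_digest(greeting: str, shared_secret: str) -> str:
    # Assumed banner layout: the timestamp is the first <...@...> token.
    match = re.search(r"<[^<>]*@[^<>]*>", greeting)
    if match is None:
        raise ValueError("greeting carries no APOP timestamp")
    timestamp = match.group(0)  # the angle brackets are part of the input
    data = (timestamp + shared_secret).encode("ascii")
    return hashlib.md5(data).hexdigest()  # 32 lower-case hex characters
```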
+\r
+8. Scaling and Operational Considerations\r
+\r
+ Since some of the optional features described above were added to the\r
+ POP3 protocol, experience has accumulated in using them in large-\r
+ scale commercial post office operations where most of the users are\r
+ unrelated to each other. In these situations and others, users and\r
+ vendors of POP3 clients have discovered that the combination of using\r
+ the UIDL command and not issuing the DELE command can provide a weak\r
+ version of the "maildrop as semi-permanent repository" functionality\r
+ normally associated with IMAP. Of course the other capabilities of\r
+\r
+ IMAP, such as polling an existing connection for newly arrived\r
+ messages and supporting multiple folders on the server, are not\r
+ present in POP3.\r
+\r
+ When these facilities are used in this way by casual users, there has\r
+ been a tendency for already-read messages to accumulate on the server\r
+ without bound. This is clearly an undesirable behavior pattern from\r
+ the standpoint of the server operator. This situation is aggravated\r
+ by the fact that the limited capabilities of POP3 do not permit\r
+ efficient handling of maildrops which have hundreds or thousands of\r
+ messages.\r
+\r
+ Consequently, it is recommended that operators of large-scale multi-\r
+ user servers, especially ones in which the user's only access to the\r
+ maildrop is via POP3, consider such options as:\r
+\r
+ * Imposing a per-user maildrop storage quota or the like.\r
+\r
+ A disadvantage to this option is that accumulation of messages may\r
+ result in the user's inability to receive new ones into the\r
+ maildrop. Sites which choose this option should be sure to inform\r
+ users of impending or current exhaustion of quota, perhaps by\r
+ inserting an appropriate message into the user's maildrop.\r
+\r
+ * Enforcing a site policy regarding mail retention on the server.\r
+\r
+ Sites are free to establish local policy regarding the storage and\r
+ retention of messages on the server, both read and unread. For\r
+ example, a site might delete unread messages from the server after\r
+ 60 days and delete read messages after 7 days. Such message\r
+ deletions are outside the scope of the POP3 protocol and are not\r
+ considered a protocol violation.\r
+\r
+ Server operators enforcing message deletion policies should take\r
+ care to make all users aware of the policies in force.\r
+\r
+ Clients must not assume that a site policy will automate message\r
+ deletions, and should continue to explicitly delete messages using\r
+ the DELE command when appropriate.\r
+\r
+ It should be noted that enforcing site message deletion policies\r
+ may be confusing to the user community, since their POP3 client\r
+ may contain configuration options to leave mail on the server\r
+ which will not in fact be supported by the server.\r
+\r
+ One special case of a site policy is that messages may only be\r
+ downloaded once from the server, and are deleted after this has\r
+ been accomplished. This could be implemented in POP3 server\r
+ software by the following mechanism: "following a POP3 login by a\r
+ client which was ended by a QUIT, delete all messages downloaded\r
+ during the session with the RETR command". It is important not to\r
+ delete messages in the event of abnormal connection termination\r
+ (i.e., if no QUIT was received from the client) because the client\r
+ may not have successfully received or stored the messages.\r
+ Servers implementing a download-and-delete policy may also wish to\r
+ disable or limit the optional TOP command, since it could be used\r
+ as an alternate mechanism to download entire messages.\r
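
The download-and-delete mechanism quoted above can be sketched as server-side bookkeeping. This is a minimal, hypothetical Python sketch. The class DownloadOnceSession and its methods are illustrative names and not part of POP3, and the maildrop is modeled as a plain dict that maps message numbers to messages. RETR only records the message number. Deletion happens when the session ends with QUIT, and an abnormal termination deletes nothing.

```python
from dataclasses import dataclass, field


@dataclass
class DownloadOnceSession:
    """Hypothetical bookkeeping for a download-and-delete site policy."""

    maildrop: dict[int, bytes]  # message number -> message (shared with server)
    retrieved: set[int] = field(default_factory=set)

    def retr(self, msg: int) -> bytes:
        data = self.maildrop[msg]
        self.retrieved.add(msg)  # remember the download; do not delete yet
        return data

    def quit(self) -> None:
        # Session ended normally with QUIT: delete what was sent via RETR.
        for msg in self.retrieved:
            self.maildrop.pop(msg, None)
        self.retrieved.clear()

    def connection_lost(self) -> None:
        # No QUIT was received: the client may not have stored the
        # messages, so keep all of them.
        self.retrieved.clear()


maildrop = {1: b"first", 2: b"second", 3: b"third"}
session = DownloadOnceSession(maildrop)
session.retr(1)
session.retr(3)
session.quit()
print(sorted(maildrop))
```

Only RETR is tracked here. A server following the advice above would also disable or limit TOP, since TOP could otherwise bypass the policy.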
+\r
+9. POP3 Command Summary\r
+\r
+ Minimal POP3 Commands:\r
+\r
+ USER name valid in the AUTHORIZATION state\r
+ PASS string\r
+ QUIT\r
+\r
+ STAT valid in the TRANSACTION state\r
+ LIST [msg]\r
+ RETR msg\r
+ DELE msg\r
+ NOOP\r
+ RSET\r
+ QUIT\r
+\r
+ Optional POP3 Commands:\r
+\r
+ APOP name digest valid in the AUTHORIZATION state\r
+\r
+ TOP msg n valid in the TRANSACTION state\r
+ UIDL [msg]\r
+\r
+ POP3 Replies:\r
+\r
+ +OK\r
+ -ERR\r
+\r
+ Note that, with the exception of the STAT, LIST, and UIDL commands,\r
+ the only significant part of the POP3 server's reply to any command\r
+ is the initial "+OK" or "-ERR" indicator. Any text following the\r
+ indicator may be ignored by the client.\r
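
A client can act on this rule by looking only at the status indicator. The following minimal Python sketch splits a single-line reply into the indicator and the remaining text. For STAT, one of the exceptions, it also reads the message count and the maildrop size that follow "+OK". The helper names parse_status and parse_stat are hypothetical, and the sketch assumes the CRLF may still be attached to the line.

```python
def parse_status(line: str) -> tuple[bool, str]:
    """Split a POP3 single-line reply into (ok, text).

    Only "+OK" / "-ERR" is significant; the text is kept for logging
    and for the STAT, LIST and UIDL exceptions.
    """
    line = line.rstrip("\r\n")
    indicator, _, text = line.partition(" ")
    if indicator == "+OK":
        return True, text
    if indicator == "-ERR":
        return False, text
    raise ValueError(f"not a POP3 status indicator: {line!r}")


def parse_stat(line: str) -> tuple[int, int]:
    # STAT reply text is "<message count> <maildrop size in octets>".
    ok, text = parse_status(line)
    if not ok:
        raise ValueError(f"STAT failed: {text}")
    count, size = text.split()[:2]
    return int(count), int(size)
```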
+\r
+10. Example POP3 Session\r
+\r
+ S: <wait for connection on TCP port 110>\r
+ C: <open connection>\r
+ S: +OK POP3 server ready <1896.697170952@dbc.mtview.ca.us>\r
+ C: APOP mrose c4c9334bac560ecc979e58001b3e22fb\r
+ S: +OK mrose's maildrop has 2 messages (320 octets)\r
+ C: STAT\r
+ S: +OK 2 320\r
+ C: LIST\r
+ S: +OK 2 messages (320 octets)\r
+ S: 1 120\r
+ S: 2 200\r
+ S: .\r
+ C: RETR 1\r
+ S: +OK 120 octets\r
+ S: <the POP3 server sends message 1>\r
+ S: .\r
+ C: DELE 1\r
+ S: +OK message 1 deleted\r
+ C: RETR 2\r
+ S: +OK 200 octets\r
+ S: <the POP3 server sends message 2>\r
+ S: .\r
+ C: DELE 2\r
+ S: +OK message 2 deleted\r
+ C: QUIT\r
+ S: +OK dewey POP3 server signing off (maildrop empty)\r
+ C: <close connection>\r
+ S: <wait for next connection>\r
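
The numbers in this session are consistent with one another. STAT reports 2 messages and 320 octets, and the LIST scan listing gives 120 + 200 = 320 octets. A minimal Python sketch of that cross-check follows. The helper parse_scan_listing is hypothetical, and its input is assumed to be the lines after the "+OK" line of a multi-line LIST response, with CRLFs already removed.

```python
def parse_scan_listing(lines: list[str]) -> dict[int, int]:
    """Map message number -> size in octets from a multi-line LIST reply.

    `lines` excludes the "+OK" status line; a lone "." ends the listing.
    """
    listing: dict[int, int] = {}
    for line in lines:
        if line == ".":
            break
        msg, octets = line.split()[:2]
        listing[int(msg)] = int(octets)
    return listing


# Values copied from the example session above.
stat_count, stat_octets = 2, 320
listing = parse_scan_listing(["1 120", "2 200", "."])
assert len(listing) == stat_count
assert sum(listing.values()) == stat_octets
```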
+\r
+11. Message Format\r
+\r
+ All messages transmitted during a POP3 session are assumed to conform\r
+ to the standard for the format of Internet text messages [RFC822].\r
+\r
+ It is important to note that the octet count for a message on the\r
+ server host may differ from the octet count assigned to that message\r
+ due to local conventions for designating end-of-line. Usually,\r
+ during the AUTHORIZATION state of the POP3 session, the POP3 server\r
+ can calculate the size of each message in octets when it opens the\r
+ maildrop. For example, if the POP3 server host internally represents\r
+ end-of-line as a single character, then the POP3 server simply counts\r
+ each occurrence of this character in a message as two octets. Note\r
+ that lines in the message which start with the termination octet need\r
+ not (and must not) be counted twice, since the POP3 client will\r
+ remove all byte-stuffed termination characters when it receives a\r
+ multi-line response.\r
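
The counting rule above can be sketched for a server host that stores end-of-line as a single LF. That storage format is an assumption. The wire size is then the stored length plus one octet per LF. The second function shows the client-side removal of byte-stuffing, which is why stuffed octets are not counted. Both function names, pop3_octet_count and dot_unstuff, are hypothetical.

```python
def pop3_octet_count(local_message: bytes) -> int:
    """Wire size of a message stored with LF-only line endings (assumed).

    Each LF is sent as CRLF, so it counts as two octets. Byte-stuffed
    leading "." octets are not added: the client strips them again.
    """
    return len(local_message) + local_message.count(b"\n")


def dot_unstuff(wire_lines: list[bytes]) -> list[bytes]:
    # Client side: stop at the lone "." terminator and strip one leading
    # "." from each line the server byte-stuffed. CRLFs already removed.
    out = []
    for line in wire_lines:
        if line == b".":
            break
        out.append(line[1:] if line.startswith(b".") else line)
    return out
```

For the stored message b"Subject: hi\n\n.hidden\n", the count is 21 stored octets plus 3 for the LFs. That total equals the length of the unstuffed lines with a CRLF added to each one.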
+\r
+12. References\r
+\r
+ [RFC821] Postel, J., "Simple Mail Transfer Protocol", STD 10, RFC\r
+ 821, USC/Information Sciences Institute, August 1982.\r
+\r
+ [RFC822] Crocker, D., "Standard for the Format of ARPA-Internet Text\r
+ Messages", STD 11, RFC 822, University of Delaware, August 1982.\r
+\r
+ [RFC1321] Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321,\r
+ MIT Laboratory for Computer Science, April 1992.\r
+\r
+ [RFC1730] Crispin, M., "Internet Message Access Protocol - Version\r
+ 4", RFC 1730, University of Washington, December 1994.\r
+\r
+ [RFC1734] Myers, J., "POP3 AUTHentication command", RFC 1734,\r
+ Carnegie Mellon, December 1994.\r
+\r
+13. Security Considerations\r
+\r
+ It is conjectured that use of the APOP command provides origin\r
+ identification and replay protection for a POP3 session.\r
+ Accordingly, a POP3 server which implements both the PASS and APOP\r
+ commands should not allow both methods of access for a given user;\r
+ that is, for a given mailbox name, either the USER/PASS command\r
+ sequence or the APOP command is allowed, but not both.\r
+\r
+ Further, note that as the length of the shared secret increases, so\r
+ does the difficulty of deriving it.\r
+\r
+ Servers that answer -ERR to the USER command are giving potential\r
+ attackers clues about which names are valid.\r
+\r
+ Use of the PASS command sends passwords in the clear over the\r
+ network.\r
+\r
+ Use of the RETR and TOP commands sends mail in the clear over the\r
+ network.\r
+\r
+ Otherwise, security issues are not discussed in this memo.\r
+\r
+14. Acknowledgements\r
+\r
+ The POP family has a long and checkered history. Although primarily\r
+ a minor revision to RFC 1460, POP3 is based on the ideas presented in\r
+ RFCs 918, 937, and 1081.\r
+\r
+ In addition, Alfred Grimstad, Keith McCloghrie, and Neil Ostroff\r
+ provided significant comments on the APOP command.\r
+\r
+15. Authors' Addresses\r
+\r
+ John G. Myers\r
+ Carnegie-Mellon University\r
+ 5000 Forbes Ave\r
+ Pittsburgh, PA 15213\r
+\r
+ EMail: jgm+@cmu.edu\r
+\r
+\r
+ Marshall T. Rose\r
+ Dover Beach Consulting, Inc.\r
+ 420 Whisman Court\r
+ Mountain View, CA 94043-2186\r
+\r
+ EMail: mrose@dbc.mtview.ca.us\r
+\r
+Appendix A. Differences from RFC 1725\r
+\r
+ This memo is a revision to RFC 1725, a Draft Standard. It makes the\r
+ following changes from that document:\r
+\r
+ - clarifies that command keywords are case insensitive.\r
+\r
+ - specifies that servers must send "+OK" and "-ERR" in\r
+ upper case.\r
+\r
+ - specifies that the initial greeting is a positive response,\r
+ instead of any string which should be a positive response.\r
+\r
+ - clarifies behavior for unimplemented commands.\r
+\r
+ - makes the USER and PASS commands optional.\r
+\r
+ - clarifies the set of possible responses to the USER command.\r
+\r
+ - reverses the order of the examples in the USER and PASS\r
+ commands, to reduce confusion.\r
+\r
+ - clarifies that the PASS command may only be given immediately\r
+ after a successful USER command.\r
+\r
+ - clarifies the persistence requirements of UIDs and adds some\r
+ implementation notes.\r
+\r
+ - specifies a UID length limitation of one to 70 octets.\r
+\r
+ - specifies a status indicator length limitation\r
+ of 512 octets, including the CRLF.\r
+\r
+ - clarifies that LIST with no arguments on an empty mailbox\r
+ returns success.\r
+\r
+ - adds a reference from the LIST command to the Message Format\r
+ section.\r
+\r
+ - clarifies the behavior of QUIT upon failure.\r
+\r
+ - clarifies the security section to not imply the use of the\r
+ USER command with the APOP command.\r
+\r
+ - adds references to RFCs 1730 and 1734.\r
+\r
+ - clarifies the method by which a UA may enter mail into the\r
+ transport system.\r
+\r
+ - clarifies that the second argument to the TOP command is a\r
+ number of lines.\r
+\r
+ - changes the suggestion in the Security Considerations section\r
+ for a server to not accept both PASS and APOP for a given user\r
+ from a "must" to a "should".\r
+\r
+ - adds a section on scaling and operational considerations\r
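
Two of the limits listed above translate directly into simple checks. A minimal Python sketch follows. The names status_line_ok and uid_ok are hypothetical. The octet range 0x21-0x7E for UIDs comes from the UIDL command description in this memo rather than from this list.

```python
MAX_STATUS_LINE = 512  # octets, including the terminating CRLF
MIN_UID_LEN, MAX_UID_LEN = 1, 70  # octets


def status_line_ok(line: bytes) -> bool:
    """True if a complete status line (CRLF included) fits the limit."""
    return line.endswith(b"\r\n") and len(line) <= MAX_STATUS_LINE


def uid_ok(uid: bytes) -> bool:
    # One to 70 octets, each in the printable range 0x21-0x7E.
    return (MIN_UID_LEN <= len(uid) <= MAX_UID_LEN
            and all(0x21 <= octet <= 0x7E for octet in uid))
```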
+\r
+Appendix B. Command Index\r
+\r
+ APOP ....................................................... 15\r
+ DELE ....................................................... 8\r
+ LIST ....................................................... 6\r
+ NOOP ....................................................... 9\r
+ PASS ....................................................... 14\r
+ QUIT ....................................................... 5\r
+ QUIT ....................................................... 10\r
+ RETR ....................................................... 8\r
+ RSET ....................................................... 9\r
+ STAT ....................................................... 6\r
+ TOP ........................................................ 11\r
+ UIDL ....................................................... 12\r
+ USER ....................................................... 13\r
+\r
--- /dev/null
+
+
+
+
+
+
+Network Working Group N. Freed
+Request for Comments: 2045 Innosoft
+Obsoletes: 1521, 1522, 1590 N. Borenstein
+Category: Standards Track First Virtual
+ November 1996
+
+
+ Multipurpose Internet Mail Extensions
+ (MIME) Part One:
+ Format of Internet Message Bodies
+
+Status of this Memo
+
+ This document specifies an Internet standards track protocol for the
+ Internet community, and requests discussion and suggestions for
+ improvements. Please refer to the current edition of the "Internet
+ Official Protocol Standards" (STD 1) for the standardization state
+ and status of this protocol. Distribution of this memo is unlimited.
+
+Abstract
+
+ STD 11, RFC 822, defines a message representation protocol specifying
+ considerable detail about US-ASCII message headers, and leaves the
+ message content, or message body, as flat US-ASCII text. This set of
+ documents, collectively called the Multipurpose Internet Mail
+ Extensions, or MIME, redefines the format of messages to allow for
+
+ (1) textual message bodies in character sets other than
+ US-ASCII,
+
+ (2) an extensible set of different formats for non-textual
+ message bodies,
+
+ (3) multi-part message bodies, and
+
+ (4) textual header information in character sets other than
+ US-ASCII.
+
+ These documents are based on earlier work documented in RFC 934, STD
+ 11, and RFC 1049, but extend and revise them. Because RFC 822 said
+ so little about message bodies, these documents are largely
+ orthogonal to (rather than a revision of) RFC 822.
+
+ This initial document specifies the various headers used to describe
+ the structure of MIME messages. The second document, RFC 2046,
+ defines the general structure of the MIME media typing system and
+ defines an initial set of media types. The third document, RFC 2047,
+ describes extensions to RFC 822 to allow non-US-ASCII text data in
+
+
+
+Freed & Borenstein Standards Track [Page 1]
+\f
+RFC 2045 Internet Message Bodies November 1996
+
+
+ Internet mail header fields. The fourth document, RFC 2048, specifies
+ various IANA registration procedures for MIME-related facilities. The
+ fifth and final document, RFC 2049, describes MIME conformance
+ criteria as well as providing some illustrative examples of MIME
+ message formats, acknowledgements, and the bibliography.
+
+ These documents are revisions of RFCs 1521, 1522, and 1590, which
+ themselves were revisions of RFCs 1341 and 1342. An appendix in RFC
+ 2049 describes differences and changes from previous versions.
+
+Table of Contents
+
+ 1. Introduction ......................................... 3
+ 2. Definitions, Conventions, and Generic BNF Grammar .... 5
+ 2.1 CRLF ................................................ 5
+ 2.2 Character Set ....................................... 6
+ 2.3 Message ............................................. 6
+ 2.4 Entity .............................................. 6
+ 2.5 Body Part ........................................... 7
+ 2.6 Body ................................................ 7
+ 2.7 7bit Data ........................................... 7
+ 2.8 8bit Data ........................................... 7
+ 2.9 Binary Data ......................................... 7
+ 2.10 Lines .............................................. 7
+ 3. MIME Header Fields ................................... 8
+ 4. MIME-Version Header Field ............................ 8
+ 5. Content-Type Header Field ............................ 10
+ 5.1 Syntax of the Content-Type Header Field ............. 12
+ 5.2 Content-Type Defaults ............................... 14
+ 6. Content-Transfer-Encoding Header Field ............... 14
+ 6.1 Content-Transfer-Encoding Syntax .................... 14
+ 6.2 Content-Transfer-Encodings Semantics ................ 15
+ 6.3 New Content-Transfer-Encodings ...................... 16
+ 6.4 Interpretation and Use .............................. 16
+ 6.5 Translating Encodings ............................... 18
+ 6.6 Canonical Encoding Model ............................ 19
+ 6.7 Quoted-Printable Content-Transfer-Encoding .......... 19
+ 6.8 Base64 Content-Transfer-Encoding .................... 24
+ 7. Content-ID Header Field .............................. 26
+ 8. Content-Description Header Field ..................... 27
+ 9. Additional MIME Header Fields ........................ 27
+ 10. Summary ............................................. 27
+ 11. Security Considerations ............................. 27
+ 12. Authors' Addresses .................................. 28
+ A. Collected Grammar .................................... 29
+
+1. Introduction
+
+ Since its publication in 1982, RFC 822 has defined the standard
+ format of textual mail messages on the Internet. Its success has
+ been such that the RFC 822 format has been adopted, wholly or
+ partially, well beyond the confines of the Internet and the Internet
+ SMTP transport defined by RFC 821. As the format has seen wider use,
+ a number of limitations have proven increasingly restrictive for the
+ user community.
+
+ RFC 822 was intended to specify a format for text messages. As such,
+ non-text messages, such as multimedia messages that might include
+ audio or images, are simply not mentioned. Even in the case of text,
+ however, RFC 822 is inadequate for the needs of mail users whose
+ languages require the use of character sets richer than US-ASCII.
+ Since RFC 822 does not specify mechanisms for mail containing audio,
+ video, Asian language text, or even text in most European languages,
+ additional specifications are needed.
+
+ One of the notable limitations of RFC 821/822 based mail systems is
+ the fact that they limit the contents of electronic mail messages to
+ relatively short lines (e.g. 1000 characters or less [RFC-821]) of
+ 7bit US-ASCII. This forces users to convert any non-textual data
+ that they may wish to send into seven-bit bytes representable as
+ printable US-ASCII characters before invoking a local mail UA (User
+ Agent, a program with which human users send and receive mail).
+ Examples of such encodings currently used in the Internet include
+ pure hexadecimal, uuencode, the 3-in-4 base 64 scheme specified in
+ RFC 1421, the Andrew Toolkit Representation [ATK], and many others.
+
+ The limitations of RFC 822 mail become even more apparent as gateways
+ are designed to allow for the exchange of mail messages between RFC
+ 822 hosts and X.400 hosts. X.400 [X400] specifies mechanisms for the
+ inclusion of non-textual material within electronic mail messages.
+ The current standards for the mapping of X.400 messages to RFC 822
+ messages specify either that X.400 non-textual material must be
+ converted to (not encoded in) IA5Text format, or that it must be
+ discarded, notifying the RFC 822 user that discarding has occurred.
+ This is clearly undesirable, as information that a user may wish to
+ receive is lost. Even though a user agent may not have the
+ capability of dealing with the non-textual material, the user might
+ have some mechanism external to the UA that can extract useful
+ information from the material. Moreover, it does not allow for the
+ fact that the message may eventually be gatewayed back into an X.400
+ message handling system (i.e., the X.400 message is "tunneled"
+ through Internet mail), where the non-textual information would
+ definitely become useful again.
+
+ This document describes several mechanisms that combine to solve most
+ of these problems without introducing any serious incompatibilities
+ with the existing world of RFC 822 mail. In particular, it
+ describes:
+
+ (1) A MIME-Version header field, which uses a version
+ number to declare a message to be conformant with MIME
+ and allows mail processing agents to distinguish
+ between such messages and those generated by older or
+ non-conformant software, which are presumed to lack
+ such a field.
+
+ (2) A Content-Type header field, generalized from RFC 1049,
+ which can be used to specify the media type and subtype
+ of data in the body of a message and to fully specify
+ the native representation (canonical form) of such
+ data.
+
+ (3) A Content-Transfer-Encoding header field, which can be
+ used to specify both the encoding transformation that
+ was applied to the body and the domain of the result.
+ Encoding transformations other than the identity
+ transformation are usually applied to data in order to
+ allow it to pass through mail transport mechanisms
+ which may have data or character set limitations.
+
+ (4) Two additional header fields that can be used to
+ further describe the data in a body, the Content-ID and
+ Content-Description header fields.
+
+ All of the header fields defined in this document are subject to the
+ general syntactic rules for header fields specified in RFC 822. In
+ particular, all of these header fields except for Content-Disposition
+ can include RFC 822 comments, which have no semantic content and
+ should be ignored during MIME processing.
+
+ Finally, to specify and promote interoperability, RFC 2049 provides a
+ basic applicability statement for a subset of the above mechanisms
+ that defines a minimal level of "conformance" with this document.
+
+ HISTORICAL NOTE: Several of the mechanisms described in this set of
+ documents may seem somewhat strange or even baroque at first reading.
+ It is important to note that compatibility with existing standards
+ AND robustness across existing practice were two of the highest
+ priorities of the working group that developed this set of documents.
+ In particular, compatibility was always favored over elegance.
+
+ Please refer to the current edition of the "Internet Official
+ Protocol Standards" for the standardization state and status of this
+ protocol. RFC 822 and STD 3, RFC 1123 also provide essential
+ background for MIME since no conforming implementation of MIME can
+ violate them. In addition, several other informational RFC documents
+ will be of interest to the MIME implementor, in particular RFC 1344,
+ RFC 1345, and RFC 1524.
+
+2. Definitions, Conventions, and Generic BNF Grammar
+
+ Although the mechanisms specified in this set of documents are all
+ described in prose, most are also described formally in the augmented
+ BNF notation of RFC 822. Implementors will need to be familiar with
+ this notation in order to understand this set of documents, and are
+ referred to RFC 822 for a complete explanation of the augmented BNF
+ notation.
+
+ Some of the augmented BNF in this set of documents makes named
+ references to syntax rules defined in RFC 822. A complete formal
+ grammar, then, is obtained by combining the collected grammar
+ appendices in each document in this set with the BNF of RFC 822 plus
+ the modifications to RFC 822 defined in RFC 1123 (which specifically
+ changes the syntax for `return', `date' and `mailbox').
+
+ All numeric and octet values are given in decimal notation in this
+ set of documents. All media type values, subtype values, and
+ parameter names as defined are case-insensitive. However, parameter
+ values are case-sensitive unless otherwise specified for the specific
+ parameter.
+
+ FORMATTING NOTE: Notes, such as this one, provide additional
+ nonessential information which may be skipped by the reader without
+ missing anything essential. The primary purpose of these non-
+ essential notes is to convey information about the rationale of this
+ set of documents, or to place these documents in the proper
+ historical or evolutionary context. Such information may in
+ particular be skipped by those who are focused entirely on building a
+ conformant implementation, but may be of use to those who wish to
+ understand why certain design choices were made.
+
+2.1. CRLF
+
+ The term CRLF, in this set of documents, refers to the sequence of
+ octets corresponding to the two US-ASCII characters CR (decimal value
+ 13) and LF (decimal value 10) which, taken together, in this order,
+ denote a line break in RFC 822 mail.
+
+2.2. Character Set
+
+ The term "character set" is used in MIME to refer to a method of
+ converting a sequence of octets into a sequence of characters. Note
+ that unconditional and unambiguous conversion in the other direction
+ is not required, in that not all characters may be representable by a
+ given character set and a character set may provide more than one
+ sequence of octets to represent a particular sequence of characters.
+
+ This definition is intended to allow various kinds of character
+ encodings, from simple single-table mappings such as US-ASCII to
+ complex table switching methods such as those that use ISO 2022's
+ techniques, to be used as character sets. However, the definition
+ associated with a MIME character set name must fully specify the
+ mapping to be performed. In particular, use of external profiling
+ information to determine the exact mapping is not permitted.
+
+ NOTE: The term "character set" was originally used to describe such
+ straightforward schemes as US-ASCII and ISO-8859-1 which have a
+ simple one-to-one mapping from single octets to single characters.
+ Multi-octet coded character sets and switching techniques make the
+ situation more complex. For example, some communities use the term
+ "character encoding" for what MIME calls a "character set", while
+ using the phrase "coded character set" to denote an abstract mapping
+ from integers (not octets) to characters.
+
+2.3. Message
+
+ The term "message", when not further qualified, means either a
+ (complete or "top-level") RFC 822 message being transferred on a
+ network, or a message encapsulated in a body of type "message/rfc822"
+ or "message/partial".
+
+2.4. Entity
+
+ The term "entity" refers specifically to the MIME-defined header
+ fields and contents of either a message or one of the parts in the
+ body of a multipart entity. The specification of such entities is
+ the essence of MIME. Since the contents of an entity are often
+ called the "body", it makes sense to speak about the body of an
+ entity. Any sort of field may be present in the header of an entity,
+ but only those fields whose names begin with "content-" actually have
+ any MIME-related meaning. Note that this does NOT imply that they
+ have no meaning at all -- an entity that is also a message has non-
+ MIME header fields whose meanings are defined by RFC 822.
+
+2.5. Body Part
+
+ The term "body part" refers to an entity inside of a multipart
+ entity.
+
+2.6. Body
+
+ The term "body", when not further qualified, means the body of an
+ entity, that is, the body of either a message or of a body part.
+
+ NOTE: The previous four definitions are clearly circular. This is
+ unavoidable, since the overall structure of a MIME message is indeed
+ recursive.
+
+2.7. 7bit Data
+
+ "7bit data" refers to data that is all represented as relatively
+ short lines with 998 octets or less between CRLF line separation
+ sequences [RFC-821]. No octets with decimal values greater than 127
+ are allowed and neither are NULs (octets with decimal value 0). CR
+ (decimal value 13) and LF (decimal value 10) octets only occur as
+ part of CRLF line separation sequences.
+
+2.8. 8bit Data
+
+ "8bit data" refers to data that is all represented as relatively
+ short lines with 998 octets or less between CRLF line separation
+ sequences [RFC-821], but octets with decimal values greater than 127
+ may be used. As with "7bit data" CR and LF octets only occur as part
+ of CRLF line separation sequences and no NULs are allowed.
+
+2.9. Binary Data
+
+ "Binary data" refers to data where any sequence of octets whatsoever
+ is allowed.
+
+2.10. Lines
+
+ "Lines" are defined as sequences of octets separated by CRLF
+ sequences. This is consistent with both RFC 821 and RFC 822.
+ "Lines" only refers to a unit of data in a message, which may or may
+ not correspond to something that is actually displayed by a user
+ agent.
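
The definitions of 7bit, 8bit, and binary data and of lines can be combined into a single classification. The following Python function is a minimal sketch. It reports the most restrictive of the three categories that a sequence of octets satisfies. The name classify_data is hypothetical and not part of MIME.

```python
def classify_data(data: bytes) -> str:
    """Return "7bit", "8bit" or "binary" per the definitions above.

    Lines are the octet runs between CRLFs and may hold at most 998
    octets; a NUL, a bare CR or a bare LF makes the data "binary".
    """
    if b"\x00" in data:
        return "binary"
    for line in data.split(b"\r\n"):
        if b"\r" in line or b"\n" in line:  # CR or LF outside a CRLF pair
            return "binary"
        if len(line) > 998:
            return "binary"
    if any(octet > 127 for octet in data):
        return "8bit"
    return "7bit"
```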
+
+3. MIME Header Fields
+
+ MIME defines a number of new RFC 822 header fields that are used to
+ describe the content of a MIME entity. These header fields occur in
+ at least two contexts:
+
+ (1) As part of a regular RFC 822 message header.
+
+ (2) In a MIME body part header within a multipart
+ construct.
+
+ The formal definition of these header fields is as follows:
+
+ entity-headers := [ content CRLF ]
+ [ encoding CRLF ]
+ [ id CRLF ]
+ [ description CRLF ]
+ *( MIME-extension-field CRLF )
+
+ MIME-message-headers := entity-headers
+ fields
+ version CRLF
+ ; The ordering of the header
+ ; fields implied by this BNF
+ ; definition should be ignored.
+
+ MIME-part-headers := entity-headers
+ [ fields ]
+ ; Any field not beginning with
+ ; "content-" can have no defined
+ ; meaning and may be ignored.
+ ; The ordering of the header
+ ; fields implied by this BNF
+ ; definition should be ignored.
+
+ The syntax of the various specific MIME header fields will be
+ described in the following sections.
+
+4. MIME-Version Header Field
+
+ Since RFC 822 was published in 1982, there has really been only one
+ format standard for Internet messages, and there has been little
+ perceived need to declare the format standard in use. This document
+ is an independent specification that complements RFC 822. Although
+ the extensions in this document have been defined in such a way as to
+ be compatible with RFC 822, there are still circumstances in which it
+ might be desirable for a mail-processing agent to know whether a
+ message was composed with the new standard in mind.
+
+ Therefore, this document defines a new header field, "MIME-Version",
+ which is to be used to declare the version of the Internet message
+ body format standard in use.
+
+ Messages composed in accordance with this document MUST include such
+ a header field, with the following verbatim text:
+
+ MIME-Version: 1.0
+
+ The presence of this header field is an assertion that the message
+ has been composed in compliance with this document.
+
+ Since it is possible that a future document might extend the message
+ format standard again, a formal BNF is given for the content of the
+ MIME-Version field:
+
+ version := "MIME-Version" ":" 1*DIGIT "." 1*DIGIT
+
+ Thus, future format specifiers, which might replace or extend "1.0",
+ are constrained to be two integer fields, separated by a period. If
+ a message is received with a MIME-version value other than "1.0", it
+ cannot be assumed to conform with this document.
+
+ Note that the MIME-Version header field is required at the top level
+ of a message. It is not required for each body part of a multipart
+ entity. It is required for the embedded headers of a body of type
+ "message/rfc822" or "message/partial" if and only if the embedded
+ message is itself claimed to be MIME-conformant.
+
+ It is not possible to fully specify how a mail reader that conforms
+ with MIME as defined in this document should treat a message that
+ might arrive in the future with some value of MIME-Version other than
+ "1.0".
+
+ It is also worth noting that version control for specific media types
+ is not accomplished using the MIME-Version mechanism. In particular,
+ some formats (such as application/postscript) have version numbering
+ conventions that are internal to the media format. Where such
+ conventions exist, MIME does nothing to supersede them. Where no
+ such conventions exist, a MIME media type might use a "version"
+ parameter in the content-type field if necessary.
+
+ NOTE TO IMPLEMENTORS: When checking MIME-Version values any RFC 822
+ comment strings that are present must be ignored. In particular, the
+ following four MIME-Version fields are equivalent:
+
+ MIME-Version: 1.0
+
+ MIME-Version: 1.0 (produced by MetaSend Vx.x)
+
+ MIME-Version: (produced by MetaSend Vx.x) 1.0
+
+ MIME-Version: 1.(produced by MetaSend Vx.x)0
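
A minimal Python sketch of the comparison described in this note follows. The names strip_comments and mime_version are hypothetical. The sketch assumes the field body contains no quoted-strings, which the MIME-Version grammar does not use. It removes RFC 822 comments, including nested comments and quoted-pairs inside them, then drops the remaining whitespace before matching the 1*DIGIT "." 1*DIGIT form.

```python
import re


def strip_comments(value: str) -> str:
    """Remove RFC 822 comments, which may nest and contain quoted-pairs.

    Assumes the field body holds no quoted-strings.
    """
    out, depth, i = [], 0, 0
    while i < len(value):
        ch = value[i]
        if depth and ch == "\\":  # quoted-pair inside a comment: skip both
            i += 2
            continue
        if ch == "(":
            depth += 1
        elif ch == ")" and depth:
            depth -= 1
        elif depth == 0:
            out.append(ch)
        i += 1
    return "".join(out)


def mime_version(field_body: str) -> tuple[int, int]:
    # version := "MIME-Version" ":" 1*DIGIT "." 1*DIGIT
    text = "".join(strip_comments(field_body).split())  # drop whitespace
    m = re.fullmatch(r"([0-9]+)\.([0-9]+)", text)
    if m is None:
        raise ValueError(f"malformed MIME-Version: {field_body!r}")
    return int(m.group(1)), int(m.group(2))
```

A receiver would then treat any result other than (1, 0) as a message that cannot be assumed to conform to this document.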
+
+ In the absence of a MIME-Version field, a receiving mail user agent
+ (whether conforming to MIME requirements or not) may optionally
+ choose to interpret the body of the message according to local
+ conventions. Many such conventions are currently in use and it
+ should be noted that in practice non-MIME messages can contain just
+ about anything.
+
+ It is impossible to be certain that a non-MIME mail message is
+ actually plain text in the US-ASCII character set since it might well
+ be a message that, using some set of nonstandard local conventions
+ that predate MIME, includes text in another character set or non-
+ textual data presented in a manner that cannot be automatically
+ recognized (e.g., a uuencoded compressed UNIX tar file).
+
+5. Content-Type Header Field
+
+ The purpose of the Content-Type field is to describe the data
+ contained in the body fully enough that the receiving user agent can
+ pick an appropriate agent or mechanism to present the data to the
+ user, or otherwise deal with the data in an appropriate manner. The
+ value in this field is called a media type.
+
+ HISTORICAL NOTE: The Content-Type header field was first defined in
+ RFC 1049. RFC 1049 used a simpler and less powerful syntax, but one
+ that is largely compatible with the mechanism given here.
+
+ The Content-Type header field specifies the nature of the data in the
+ body of an entity by giving media type and subtype identifiers, and
+ by providing auxiliary information that may be required for certain
+ media types. After the media type and subtype names, the remainder
+ of the header field is simply a set of parameters, specified in an
+ attribute=value notation. The ordering of parameters is not
+ significant.
+
+ In general, the top-level media type is used to declare the general
+ type of data, while the subtype specifies a specific format for that
+ type of data. Thus, a media type of "image/xyz" is enough to tell a
+ user agent that the data is an image, even if the user agent has no
+ knowledge of the specific image format "xyz". Such information can
+ be used, for example, to decide whether or not to show a user the raw
+ data from an unrecognized subtype -- such an action might be
+ reasonable for unrecognized subtypes of text, but not for
+ unrecognized subtypes of image or audio. For this reason, registered
+ subtypes of text, image, audio, and video should not contain embedded
+ information that is really of a different type. Such compound
+ formats should be represented using the "multipart" or "application"
+ types.
+
+ Parameters are modifiers of the media subtype, and as such do not
+ fundamentally affect the nature of the content. The set of
+ meaningful parameters depends on the media type and subtype. Most
+ parameters are associated with a single specific subtype. However, a
+ given top-level media type may define parameters which are applicable
+ to any subtype of that type. Parameters may be required by their
+ defining content type or subtype or they may be optional. MIME
+ implementations must ignore any parameters whose names they do not
+ recognize.
+
+ For example, the "charset" parameter is applicable to any subtype of
+ "text", while the "boundary" parameter is required for any subtype of
+ the "multipart" media type.
+
+ There are NO globally-meaningful parameters that apply to all media
+ types. Truly global mechanisms are best addressed, in the MIME
+ model, by the definition of additional Content-* header fields.
+
+ An initial set of seven top-level media types is defined in RFC 2046.
+ Five of these are discrete types whose content is essentially opaque
+ as far as MIME processing is concerned. The remaining two are
+ composite types whose contents require additional handling by MIME
+ processors.
+
+ This set of top-level media types is intended to be substantially
+ complete. It is expected that additions to the larger set of
+ supported types can generally be accomplished by the creation of new
+ subtypes of these initial types. In the future, more top-level types
+ may be defined only by a standards-track extension to this standard.
+ If another top-level type is to be used for any reason, it must be
+ given a name starting with "X-" to indicate its non-standard status
+ and to avoid a potential conflict with a future official name.
+
+5.1. Syntax of the Content-Type Header Field
+
+ In the Augmented BNF notation of RFC 822, a Content-Type header field
+ value is defined as follows:
+
+ content := "Content-Type" ":" type "/" subtype
+ *(";" parameter)
+ ; Matching of media type and subtype
+ ; is ALWAYS case-insensitive.
+
+ type := discrete-type / composite-type
+
+ discrete-type := "text" / "image" / "audio" / "video" /
+ "application" / extension-token
+
+ composite-type := "message" / "multipart" / extension-token
+
+ extension-token := ietf-token / x-token
+
+ ietf-token := <An extension token defined by a
+ standards-track RFC and registered
+ with IANA.>
+
+ x-token := <The two characters "X-" or "x-" followed, with
+ no intervening white space, by any token>
+
+ subtype := extension-token / iana-token
+
+ iana-token := <A publicly-defined extension token. Tokens
+ of this form must be registered with IANA
+ as specified in RFC 2048.>
+
+ parameter := attribute "=" value
+
+ attribute := token
+ ; Matching of attributes
+ ; is ALWAYS case-insensitive.
+
+ value := token / quoted-string
+
+ token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
+ or tspecials>
+
+ tspecials := "(" / ")" / "<" / ">" / "@" /
+ "," / ";" / ":" / "\" / <"> /
+ "/" / "[" / "]" / "?" / "="
+ ; Must be in quoted-string,
+ ; to use within parameter values
+
+ Note that the definition of "tspecials" is the same as the RFC 822
+ definition of "specials" with the addition of the three characters
+ "/", "?", and "=", and the removal of ".".
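+
+ The token rule can be checked directly. The following Python sketch
+ (the names is_token and format_param are hypothetical) also shows the
+ practical consequence for generators: a parameter value that is not a
+ token, for example one containing SPACE or "=", has to be sent as a
+ quoted-string. Values outside US-ASCII are rejected here rather than
+ encoded, since a quoted-string cannot carry them.
+
```python
from __future__ import annotations

# tspecials: RFC 822 "specials" minus "." plus "/", "?" and "=".
TSPECIALS = frozenset('()<>@,;:\\"/[]?=')


def is_token(s: str) -> bool:
    """True if s is a non-empty token: printable US-ASCII, no SPACE, no tspecials."""
    return bool(s) and all(33 <= ord(c) <= 126 and c not in TSPECIALS for c in s)


def format_param(attribute: str, value: str) -> str:
    """Render attribute=value, quoting the value only when it is not a token."""
    if not is_token(attribute):
        raise ValueError(f"attribute is not a token: {attribute!r}")
    if is_token(value):
        return f"{attribute}={value}"
    if any(ord(c) > 127 or c == "\r" for c in value):
        raise ValueError("value cannot be carried in an RFC 822 quoted-string")
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')  # quoted-pairs
    return f'{attribute}="{escaped}"'
```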
+
+ Note also that a subtype specification is MANDATORY -- it may not be
+ omitted from a Content-Type header field. As such, there are no
+ default subtypes.
+
+ The type, subtype, and parameter names are not case sensitive. For
+ example, TEXT, Text, and TeXt are all equivalent top-level media
+ types. Parameter values are normally case sensitive, but sometimes
+ are interpreted in a case-insensitive fashion, depending on the
+ intended use. (For example, multipart boundaries are case-sensitive,
+ but the "access-type" parameter for message/External-body is not
+ case-sensitive.)
+
+ Note that the value of a quoted string parameter does not include the
+ quotes. That is, the quotation marks in a quoted-string are not a
+ part of the value of the parameter, but are merely used to delimit
+ that parameter value. In addition, comments are allowed in
+ accordance with RFC 822 rules for structured header fields. Thus the
+ following two forms
+
+ Content-type: text/plain; charset=us-ascii (Plain text)
+
+ Content-type: text/plain; charset="us-ascii"
+
+ are completely equivalent.
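+
+ A lenient parser along these lines is sketched below in Python; the
+ function names are hypothetical, and it does not verify that every
+ name is a valid token. It removes comments outside quoted-strings,
+ splits parameters on ";" only outside quoted-strings, lowercases the
+ case-insensitive names, and strips the delimiting quotes from values
+ while leaving the case of the values alone.
+
```python
from __future__ import annotations

import re


def _strip_comments(s: str) -> str:
    """Drop (nested) RFC 822 comments but leave quoted-strings untouched."""
    out: list[str] = []
    depth, in_quote, i = 0, False, 0
    while i < len(s):
        ch = s[i]
        if ch == "\\" and (depth or in_quote):
            if in_quote:
                out.append(s[i:i + 2])    # keep quoted-pairs inside strings
            i += 2
            continue
        if in_quote:
            in_quote = ch != '"'          # closing quote ends the string
            out.append(ch)
        elif depth:
            depth += {"(": 1, ")": -1}.get(ch, 0)
        elif ch == "(":
            depth = 1
        else:
            in_quote = ch == '"'          # opening quote starts a string
            out.append(ch)
        i += 1
    return "".join(out)


def _split_params(s: str) -> list[str]:
    """Split on ';' that appear outside quoted-strings."""
    parts: list[str] = []
    buf: list[str] = []
    in_quote, i = False, 0
    while i < len(s):
        ch = s[i]
        if in_quote and ch == "\\":
            buf.append(s[i:i + 2])
            i += 2
            continue
        if ch == '"':
            in_quote = not in_quote
        if ch == ";" and not in_quote:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
        i += 1
    parts.append("".join(buf))
    return parts


def parse_content_type(value: str) -> tuple[str, str, dict[str, str]] | None:
    """Return (type, subtype, params); names are lowercased, values are not."""
    head, *rest = _split_params(_strip_comments(value))
    mtype, slash, subtype = (p.strip() for p in head.partition("/"))
    if not (mtype and slash and subtype):
        return None                       # a subtype is mandatory
    params: dict[str, str] = {}
    for item in rest:
        attr, eq, val = (p.strip() for p in item.partition("="))
        if not (attr and eq):
            return None
        if len(val) >= 2 and val[0] == val[-1] == '"':
            val = re.sub(r"\\(.)", r"\1", val[1:-1])  # quotes are not part of the value
        params[attr.lower()] = val
    return mtype.lower(), subtype.lower(), params
```
+
+ Both forms above parse to ("text", "plain", {"charset": "us-ascii"}).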
+
+ Beyond this syntax, the only syntactic constraint on the definition
+ of subtype names is the desire that their uses must not conflict.
+ That is, it would be undesirable to have two different communities
+ using "Content-Type: application/foobar" to mean two different
+ things. The process of defining new media subtypes, then, is not
+ intended to be a mechanism for imposing restrictions, but simply a
+ mechanism for publicizing their definition and usage. There are,
+ therefore, two acceptable mechanisms for defining new media subtypes:
+
+ (1) Private values (starting with "X-") may be defined
+ bilaterally between two cooperating agents without
+ outside registration or standardization. Such values
+ cannot be registered or standardized.
+
+ (2) New standard values should be registered with IANA as
+ described in RFC 2048.
+
+ The second document in this set, RFC 2046, defines the initial set of
+ media types for MIME.
+
+5.2. Content-Type Defaults
+
+ Default RFC 822 messages without a MIME Content-Type header are taken
+ by this protocol to be plain text in the US-ASCII character set,
+ which can be explicitly specified as:
+
+ Content-type: text/plain; charset=us-ascii
+
+ This default is assumed if no Content-Type header field is specified.
+ It is also recommended that this default be assumed when a
+ syntactically invalid Content-Type header field is encountered. In
+ the presence of a MIME-Version header field and the absence of any
+ Content-Type header field, a receiving User Agent can also assume
+ that plain US-ASCII text was the sender's intent. Plain US-ASCII
+ text may still be assumed in the absence of a MIME-Version or the
+ presence of a syntactically invalid Content-Type header field, but
+ the sender's intent might have been otherwise.
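+
+ The defaulting rules can be summarized in a few lines of Python. This
+ is a sketch: headers is assumed to be a dict mapping lowercased field
+ names to raw values, only the type/subtype part is checked for syntax,
+ and the returned flag (a hypothetical convention) records whether the
+ plain-text default reflects what the sender meant or is only a guess.
+
```python
from __future__ import annotations

import re

# token = printable US-ASCII except SPACE and tspecials
TOKEN = r"[!#$%&'*+\-.0-9A-Z^_`a-z{|}~]+"
CONTENT_TYPE_RE = re.compile(rf"\s*({TOKEN})\s*/\s*({TOKEN})\s*(?:;.*)?", re.DOTALL)


def effective_content_type(headers: dict[str, str]) -> tuple[str, bool]:
    """Return (type/subtype, intent_known) following the defaults above.

    When the default applies, the charset is us-ascii.
    """
    raw = headers.get("content-type")
    if raw is not None:
        m = CONTENT_TYPE_RE.fullmatch(raw)
        if m:
            return f"{m.group(1)}/{m.group(2)}".lower(), True
    # Missing or syntactically invalid: fall back to text/plain; charset=us-ascii.
    # Only "MIME-Version present, Content-Type absent" says this was intended.
    return "text/plain", raw is None and "mime-version" in headers
```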
+
+6. Content-Transfer-Encoding Header Field
+
+ Many media types which could be usefully transported via email are
+ represented, in their "natural" format, as 8bit character or binary
+ data. Such data cannot be transmitted over some transfer protocols.
+ For example, RFC 821 (SMTP) restricts mail messages to 7bit US-ASCII
+ data with lines no longer than 1000 characters including any trailing
+ CRLF line separator.
+
+ It is necessary, therefore, to define a standard mechanism for
+ encoding such data into a 7bit short line format. Proper labelling
+ of unencoded material in less restrictive formats for direct use over
+ less restrictive transports is also desirable. This document
+ specifies that such encodings will be indicated by a new "Content-
+ Transfer-Encoding" header field. This field has not been defined by
+ any previous standard.
+
+6.1. Content-Transfer-Encoding Syntax
+
+ The Content-Transfer-Encoding field's value is a single token
+ specifying the type of encoding, as enumerated below. Formally:
+
+ encoding := "Content-Transfer-Encoding" ":" mechanism
+
+ mechanism := "7bit" / "8bit" / "binary" /
+ "quoted-printable" / "base64" /
+ ietf-token / x-token
+
+ These values are not case sensitive -- Base64 and BASE64 and bAsE64
+ are all equivalent. An encoding type of 7BIT requires that the body
+ is already in a 7bit mail-ready representation. This is the default
+ value -- that is, "Content-Transfer-Encoding: 7BIT" is assumed if the
+ Content-Transfer-Encoding header field is not present.
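+
+ In code, this amounts to normalizing the value before any comparison
+ and supplying the default when the field is missing. A minimal Python
+ sketch with a hypothetical function name; it does not strip RFC 822
+ comments from the field value.
+
```python
from __future__ import annotations


def normalize_cte(raw: str | None) -> str:
    """Canonical form of a Content-Transfer-Encoding value.

    raw is the field value, or None when the field is absent.
    """
    if raw is None:
        return "7bit"              # absent field: 7bit is assumed
    return raw.strip().lower()     # mechanism names are case-insensitive
```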
+
+6.2. Content-Transfer-Encoding Semantics
+
+ This single Content-Transfer-Encoding token actually provides two
+ pieces of information. It specifies what sort of encoding
+ transformation the body was subjected to and hence what decoding
+ operation must be used to restore it to its original form, and it
+ specifies what the domain of the result is.
+
+ The transformation part of any Content-Transfer-Encoding specifies,
+ either explicitly or implicitly, a single, well-defined decoding
+ algorithm, which for any sequence of encoded octets either transforms
+ it to the original sequence of octets which was encoded, or shows
+ that it is illegal as an encoded sequence. Content-Transfer-Encoding
+ transformations never depend on any additional external profile
+ information for proper operation. Note that while decoders must
+ produce a single, well-defined output for a valid encoding, no such
+ restrictions exist for encoders: encoding a given sequence of octets
+ to different, equivalent encoded sequences is perfectly legal.
+
+ Three transformations are currently defined: identity, the "quoted-
+ printable" encoding, and the "base64" encoding. The domains are
+ "binary", "8bit" and "7bit".
+
+ The Content-Transfer-Encoding values "7bit", "8bit", and "binary" all
+ mean that the identity (i.e. NO) encoding transformation has been
+ performed. As such, they serve simply as indicators of the domain of
+ the body data, and provide useful information about the sort of
+ encoding that might be needed for transmission in a given transport
+ system. The terms "7bit data", "8bit data", and "binary data" are
+ all defined in Section 2.
+
+ The quoted-printable and base64 encodings transform their input from
+ an arbitrary domain into material in the "7bit" range, thus making it
+ safe to carry over restricted transports. The specific definitions of
+ the transformations are given below.
+
+ The proper Content-Transfer-Encoding label must always be used.
+ Labelling unencoded data containing 8bit characters as "7bit" is not
+ allowed, nor is labelling unencoded non-line-oriented data as
+ anything other than "binary" allowed.
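+
+ A labeller can pick the correct identity label by testing the data
+ against the Section 2 definitions. The Python sketch below (function
+ name hypothetical) returns the most restrictive label the data
+ actually satisfies; the 998-octet line limit and the rules about NUL,
+ CR and LF come from those definitions.
+
```python
def classify_domain(data: bytes) -> str:
    """Return the most restrictive honest identity label for unencoded data.

    7bit and 8bit data are lines of at most 998 octets separated by CRLF,
    with no NUL and no bare CR or LF; 7bit data additionally has no octet
    above 127. Anything else is binary.
    """
    line_oriented = all(
        len(line) <= 998 and not any(b in line for b in (b"\r", b"\n", b"\x00"))
        for line in data.split(b"\r\n")
    )
    if not line_oriented:
        return "binary"
    return "7bit" if all(octet < 128 for octet in data) else "8bit"
```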
+
+ Unlike media subtypes, a proliferation of Content-Transfer-Encoding
+ values is both undesirable and unnecessary. However, establishing
+ only a single transformation into the "7bit" domain does not seem
+
+ possible. There is a tradeoff between the desire for a compact and
+ efficient encoding of largely-binary data and the desire for a
+ somewhat readable encoding of data that is mostly, but not entirely,
+ 7bit. For this reason, at least two encoding mechanisms are
+ necessary: a more or less readable encoding (quoted-printable) and a
+ "dense" or "uniform" encoding (base64).
+
+ Mail transport for unencoded 8bit data is defined in RFC 1652. As of
+ the initial publication of this document, there are no standardized
+ Internet mail transports for which it is legitimate to include
+ unencoded binary data in mail bodies. Thus there are no
+ circumstances in which the "binary" Content-Transfer-Encoding is
+ actually valid in Internet mail. However, in the event that binary
+ mail transport becomes a reality in Internet mail, or when MIME is
+ used in conjunction with any other binary-capable mail transport
+ mechanism, binary bodies must be labelled as such using this
+ mechanism.
+
+ NOTE: The five values defined for the Content-Transfer-Encoding field
+ imply nothing about the media type other than the algorithm by which
+ it was encoded or the transport system requirements if unencoded.
+
+6.3. New Content-Transfer-Encodings
+
+ Implementors may, if necessary, define private Content-Transfer-
+ Encoding values, but must use an x-token, which is a name prefixed by
+ "X-", to indicate its non-standard status, e.g., "Content-Transfer-
+ Encoding: x-my-new-encoding". Additional standardized Content-
+ Transfer-Encoding values must be specified by a standards-track RFC.
+ The requirements such specifications must meet are given in RFC 2048.
+ As such, the entire content-transfer-encoding namespace, except for
+ names beginning with "X-", is explicitly reserved to the IETF for
+ future use.
+
+ Unlike media types and subtypes, the creation of new Content-
+ Transfer-Encoding values is STRONGLY discouraged, as it seems likely
+ to hinder interoperability with little potential benefit.
+
+6.4. Interpretation and Use
+
+ If a Content-Transfer-Encoding header field appears as part of a
+ message header, it applies to the entire body of that message. If a
+ Content-Transfer-Encoding header field appears as part of an entity's
+ headers, it applies only to the body of that entity. If an entity is
+ of type "multipart" the Content-Transfer-Encoding is not permitted to
+ have any value other than "7bit", "8bit" or "binary". Even more
+ severe restrictions apply to some subtypes of the "message" type.
+
+ It should be noted that most media types are defined in terms of
+ octets rather than bits, so that the mechanisms described here are
+ mechanisms for encoding arbitrary octet streams, not bit streams. If
+ a bit stream is to be encoded via one of these mechanisms, it must
+ first be converted to an 8bit byte stream using the network standard
+ bit order ("big-endian"), in which the earlier bits in a stream
+ become the higher-order bits in an 8bit byte. A bit stream not ending
+ at an 8bit boundary must be padded with zeroes. RFC 2046 provides a
+ mechanism for noting the addition of such padding in the case of the
+ application/octet-stream media type, which has a "padding" parameter.
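+
+ The conversion is straightforward to sketch in Python. The function
+ name is hypothetical; it returns the padding count so that it can be
+ reported, for example as "Content-Type: application/octet-stream;
+ padding=5".
+
```python
from __future__ import annotations


def bits_to_octets(bits: list[int]) -> tuple[bytes, int]:
    """Pack a bit stream into octets, earliest bit first (big-endian).

    Returns the octets and the number of zero bits appended to reach an
    8bit boundary.
    """
    padding = -len(bits) % 8
    padded = list(bits) + [0] * padding
    out = bytearray()
    for start in range(0, len(padded), 8):
        value = 0
        for bit in padded[start:start + 8]:
            value = (value << 1) | (1 if bit else 0)   # earlier bit -> higher order
        out.append(value)
    return bytes(out), padding
```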
+
+ The encoding mechanisms defined here explicitly encode all data in
+ US-ASCII. Thus, for example, suppose an entity has header fields
+ such as:
+
+ Content-Type: text/plain; charset=ISO-8859-1
+ Content-transfer-encoding: base64
+
+ This must be interpreted to mean that the body is a base64 US-ASCII
+ encoding of data that was originally in ISO-8859-1, and will be in
+ that character set again after decoding.
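+
+ The order of operations in this example, charset first and base64
+ second, can be sketched in Python with the standard base64 module
+ (the helper names are hypothetical):
+
```python
import base64


def encode_body(text: str, charset: str) -> bytes:
    """Convert text to octets in charset, then base64 them (pure US-ASCII)."""
    return base64.b64encode(text.encode(charset))


def decode_body(body: bytes, charset: str) -> str:
    """Undo the base64 step, then interpret the octets in charset."""
    return base64.b64decode(body).decode(charset)
```
+
+ Decoding the resulting octets with any charset other than the one
+ declared would misread the text even though the base64 step succeeded.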
+
+ Certain Content-Transfer-Encoding values may only be used on certain
+ media types. In particular, it is EXPRESSLY FORBIDDEN to use any
+ encodings other than "7bit", "8bit", or "binary" with any composite
+ media type, i.e. one that recursively includes other Content-Type
+ fields. Currently the only composite media types are "multipart" and
+ "message". All encodings that are desired for bodies of type
+ multipart or message must be done at the innermost level, by encoding
+ the actual body that needs to be encoded.
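+
+ Both this restriction and the labelling inconsistency described in
+ the next paragraph can be checked recursively. The Python sketch
+ below assumes a hypothetical Entity model in which "parts" holds the
+ body parts of a multipart entity or the encapsulated message of a
+ message entity. Because quoted-printable and base64 output is 7bit,
+ those encodings rank as 7bit when an enclosed entity is compared with
+ its container.
+
```python
from __future__ import annotations

from dataclasses import dataclass, field

COMPOSITE_OK = {"7bit", "8bit", "binary"}
# Domain of the encoded result: quoted-printable and base64 output is 7bit.
DOMAIN_RANK = {"7bit": 0, "quoted-printable": 0, "base64": 0, "8bit": 1, "binary": 2}


@dataclass
class Entity:                        # hypothetical minimal entity model
    content_type: str                # e.g. "multipart/mixed"
    cte: str = "7bit"
    parts: list[Entity] = field(default_factory=list)


def check_encodings(entity: Entity) -> list[str]:
    """Report forbidden composite encodings and inconsistent nested labels."""
    problems: list[str] = []
    top = entity.content_type.split("/", 1)[0].lower()
    if top not in ("multipart", "message"):
        return problems
    outer = entity.cte.lower()
    if outer not in COMPOSITE_OK:
        problems.append(f"{entity.content_type}: {outer} is forbidden on a composite type")
    for part in entity.parts:
        inner = part.cte.lower()
        # Unknown encodings are not compared (inner defaults low, outer high).
        if DOMAIN_RANK.get(inner, 0) > DOMAIN_RANK.get(outer, 2):
            problems.append(f"{part.content_type}: {inner} inside {outer} {entity.content_type}")
        problems.extend(check_encodings(part))
    return problems
```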
+
+ It should also be noted that, by definition, if a composite entity
+ has a transfer-encoding value such as "7bit", but one of the enclosed
+ entities has a less restrictive value such as "8bit", then either the
+ outer "7bit" labelling is in error, because 8bit data are included,
+ or the inner "8bit" labelling placed an unnecessarily high demand on
+ the transport system because the included data were actually
+ 7bit-safe.
+
+ NOTE ON ENCODING RESTRICTIONS: Though the prohibition against using
+ content-transfer-encodings on composite body data may seem overly
+ restrictive, it is necessary to prevent nested encodings, in which
+ data are passed through an encoding algorithm multiple times, and
+ must be decoded multiple times in order to be properly viewed.
+ Nested encodings add considerable complexity to user agents: Aside
+ from the obvious efficiency problems with such multiple encodings,
+ they can obscure the basic structure of a message. In particular,
+ they can imply that several decoding operations are necessary simply
+
+ to find out what types of bodies a message contains. Banning nested
+ encodings may complicate the job of certain mail gateways, but this
+ seems less of a problem than the effect of nested encodings on user
+ agents.
+
+ Any entity with an unrecognized Content-Transfer-Encoding must be
+ treated as if it has a Content-Type of "application/octet-stream",
+ regardless of what the Content-Type header field actually says.
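+
+ A receiving agent might apply this rule as sketched below in Python
+ (hypothetical names). The set of implemented mechanisms is a
+ parameter because a private "x-" encoding counts as recognized only
+ when the agent has actually agreed to and implemented it.
+
```python
from __future__ import annotations

IMPLEMENTED = frozenset({"7bit", "8bit", "binary", "quoted-printable", "base64"})


def effective_media_type(content_type: str, cte: str | None,
                         implemented: frozenset[str] = IMPLEMENTED) -> str:
    """Media type to act on, given the entity's Content-Transfer-Encoding.

    cte is None when the field is absent (7bit is assumed). Any mechanism
    this agent cannot decode makes the body opaque.
    """
    mechanism = "7bit" if cte is None else cte.strip().lower()
    if mechanism not in implemented:
        return "application/octet-stream"
    return content_type
```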
+
+ NOTE ON THE RELATIONSHIP BETWEEN CONTENT-TYPE AND CONTENT-TRANSFER-
+ ENCODING: It may seem that the Content-Transfer-Encoding could be
+ inferred from the characteristics of the media that is to be encoded,
+ or, at the very least, that certain Content-Transfer-Encodings could
+ be mandated for use with specific media types. There are several
+ reasons why this is not the case. First, given the varying types of
+ transports used for mail, some encodings may be appropriate for some
+ combinations of media types and transports but not for others. (For
+ example, in an 8bit transport, no encoding would be required for text
+ in certain character sets, while such encodings are clearly required
+ for 7bit SMTP.)
+
+ Second, certain media types may require different types of transfer
+ encoding under different circumstances. For example, many PostScript
+ bodies might consist entirely of short lines of 7bit data and hence
+ require no encoding at all. Other PostScript bodies (especially
+ those using Level 2 PostScript's binary encoding mechanism) may only
+ be reasonably represented using a binary transport encoding.
+ Finally, since the Content-Type field is intended to be an open-ended
+ specification mechanism, strict specification of an association
+ between media types and encodings effectively couples the
+ specification of an application protocol with a specific lower-level
+ transport. This is not desirable since the developers of a media
+ type should not have to be aware of all the transports in use and
+ what their limitations are.
+
+6.5. Translating Encodings
+
+ The quoted-printable and base64 encodings are designed so that
+ conversion between them is possible. The only issue that arises in
+ such a conversion is the handling of hard line breaks in quoted-
+ printable encoding output. When converting from quoted-printable to
+ base64 a hard line break in the quoted-printable form represents a
+ CRLF sequence in the canonical form of the data. It must therefore be
+ converted to a corresponding encoded CRLF in the base64 form of the
+ data. Similarly, a CRLF sequence in the canonical form of the data
+ obtained after base64 decoding must be converted to a quoted-
+ printable hard line break, but ONLY when converting text data.
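+
+ For text, the conversion described above can be sketched in Python
+ with the standard base64 and binascii modules (the function name is
+ hypothetical). Hard line breaks are first normalized to CRLF, since
+ each one stands for a CRLF in the canonical form; binascii.a2b_qp then
+ removes soft line breaks and decodes the "=XX" escapes. The sketch is
+ only appropriate for text data.
+
```python
import base64
import binascii


def qp_text_to_base64(qp: bytes) -> bytes:
    """Re-encode a quoted-printable *text* body as base64.

    Each hard line break, whether it arrived as CRLF or bare LF, becomes an
    encoded CRLF; soft line breaks disappear during decoding.
    """
    lines = [line[:-1] if line.endswith(b"\r") else line for line in qp.split(b"\n")]
    # Strip transport padding so "=" followed by spaces is still a soft break.
    canonical = b"\r\n".join(line.rstrip(b" \t") for line in lines)
    return base64.b64encode(binascii.a2b_qp(canonical))
```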
+
+6.6. Canonical Encoding Model
+
+ There was some confusion, in the previous versions of this RFC,
+ regarding the model for when email data was to be converted to
+ canonical form and encoded, and in particular how this process would
+ affect the treatment of CRLFs, given that the representation of
+ newlines varies greatly from system to system, and the relationship
+ between content-transfer-encodings and character sets. A canonical
+ model for encoding is presented in RFC 2049 for this reason.
+
+6.7. Quoted-Printable Content-Transfer-Encoding
+
+ The Quoted-Printable encoding is intended to represent data that
+ largely consists of octets that correspond to printable characters in
+ the US-ASCII character set. It encodes the data in such a way that
+ the resulting octets are unlikely to be modified by mail transport.
+ If the data being encoded are mostly US-ASCII text, the encoded form
+ of the data remains largely recognizable by humans. A body which is
+ entirely US-ASCII may also be encoded in Quoted-Printable to ensure
+ the integrity of the data should the message pass through a
+ character-translating, and/or line-wrapping gateway.
+
+ In this encoding, octets are to be represented as determined by the
+ following rules:
+
+ (1) (General 8bit representation) Any octet, except a CR or
+ LF that is part of a CRLF line break of the canonical
+ (standard) form of the data being encoded, may be
+ represented by an "=" followed by a two digit
+ hexadecimal representation of the octet's value. The
+ digits of the hexadecimal alphabet, for this purpose,
+ are "0123456789ABCDEF". Uppercase letters must be
+ used; lowercase letters are not allowed. Thus, for
+ example, the decimal value 12 (US-ASCII form feed) can
+ be represented by "=0C", and the decimal value 61 (US-
+ ASCII EQUAL SIGN) can be represented by "=3D". This
+ rule must be followed except when the following rules
+ allow an alternative encoding.
+
+ (2) (Literal representation) Octets with decimal values of
+ 33 through 60 inclusive, and 62 through 126, inclusive,
+ MAY be represented as the US-ASCII characters which
+ correspond to those octets (EXCLAMATION POINT through
+ LESS THAN, and GREATER THAN through TILDE,
+ respectively).
+
+ (3) (White Space) Octets with values of 9 and 32 MAY be
+ represented as US-ASCII TAB (HT) and SPACE characters,
+
+ respectively, but MUST NOT be so represented at the end
+ of an encoded line. Any TAB (HT) or SPACE characters
+ on an encoded line MUST thus be followed on that line
+ by a printable character. In particular, an "=" at the
+ end of an encoded line, indicating a soft line break
+ (see rule #5) may follow one or more TAB (HT) or SPACE
+ characters. It follows that an octet with decimal
+ value 9 or 32 appearing at the end of an encoded line
+ must be represented according to Rule #1. This rule is
+ necessary because some MTAs (Message Transport Agents,
+ programs which transport messages from one user to
+ another, or perform a portion of such transfers) are
+ known to pad lines of text with SPACEs, and others are
+ known to remove "white space" characters from the end
+ of a line. Therefore, when decoding a Quoted-Printable
+ body, any trailing white space on a line must be
+ deleted, as it will necessarily have been added by
+ intermediate transport agents.
+
+ (4) (Line Breaks) A line break in a text body, represented
+ as a CRLF sequence in the text canonical form, must be
+ represented by a (RFC 822) line break, which is also a
+ CRLF sequence, in the Quoted-Printable encoding. Since
+ the canonical representation of media types other than
+ text does not generally include the representation of
+ line breaks as CRLF sequences, no hard line breaks
+ (i.e. line breaks that are intended to be meaningful
+ and to be displayed to the user) can occur in the
+ quoted-printable encoding of such types. Sequences
+ like "=0D", "=0A", "=0A=0D" and "=0D=0A" will routinely
+ appear in non-text data represented in quoted-
+ printable, of course.
+
+ Note that many implementations may elect to encode the
+ local representation of various content types directly
+ rather than converting to canonical form first,
+ encoding, and then converting back to local
+ representation. In particular, this may apply to plain
+ text material on systems that use newline conventions
+ other than a CRLF terminator sequence. Such an
+ implementation optimization is permissible, but only
+ when the combined canonicalization-encoding step is
+ equivalent to performing the three steps separately.
+
+ (5) (Soft Line Breaks) The Quoted-Printable encoding
+ REQUIRES that encoded lines be no more than 76
+ characters long. If longer lines are to be encoded
+ with the Quoted-Printable encoding, "soft" line breaks
+
+ must be used. An equal sign as the last character on an
+ encoded line indicates such a non-significant ("soft")
+ line break in the encoded text.
+
+ Thus if the "raw" form of the line is a single unencoded line that
+ says:
+
+ Now's the time for all folk to come to the aid of their country.
+
+ This can be represented, in the Quoted-Printable encoding, as:
+
+ Now's the time =
+ for all folk to come=
+ to the aid of their country.
+
+ This provides a mechanism with which long lines are encoded in such a
+ way as to be restored by the user agent. The 76 character limit does
+ not count the trailing CRLF, but counts all other characters,
+ including any equal signs.
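+
+ Rules 1 through 5 combine into a simple encoder for text in canonical
+ form. The Python sketch below uses hypothetical names; it leaves rule
+ 2 characters and interior white space literal, applies rule 1 to
+ everything else, never splits an "=XX" sequence across a soft line
+ break, and reserves one column on each line for the soft-break "=".
+ The roundtrips helper checks the output against the standard library
+ decoder.
+
```python
from __future__ import annotations

import binascii

# Rule 2: octets 33-60 and 62-126 may appear as themselves.
SAFE = frozenset(range(33, 61)) | frozenset(range(62, 127))


def _wrap(tokens: list[str], limit: int = 76) -> list[str]:
    """Rule 5: pack whole tokens into lines, ending all but the last with "="."""
    lines: list[str] = []
    current = ""
    for token in tokens:
        if len(current) + len(token) > limit - 1:   # keep a column for "="
            lines.append(current + "=")
            current = ""
        current += token
    lines.append(current)
    return lines


def qp_encode_text(data: bytes) -> str:
    """Quoted-printable encode a text body that is already in CRLF form."""
    out: list[str] = []
    for line in data.split(b"\r\n"):               # rule 4: hard breaks stay CRLF
        tokens = []
        for i, octet in enumerate(line):
            at_end = i == len(line) - 1
            if octet in SAFE or (octet in (9, 32) and not at_end):
                tokens.append(chr(octet))          # rules 2 and 3
            else:
                tokens.append(f"={octet:02X}")     # rule 1, uppercase hex
        out.extend(_wrap(tokens))
    return "\r\n".join(out)


def roundtrips(data: bytes) -> bool:
    """Sanity check against the standard library decoder."""
    return binascii.a2b_qp(qp_encode_text(data).encode("ascii")) == data
```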
+
+ Since the hyphen character ("-") may be represented as itself in the
+ Quoted-Printable encoding, care must be taken, when encapsulating a
+ quoted-printable encoded body inside one or more multipart entities,
+ to ensure that the boundary delimiter does not appear anywhere in the
+ encoded body. (A good strategy is to choose a boundary that includes
+ a character sequence such as "=_" which can never appear in a
+ quoted-printable body. See the definition of multipart messages in
+ RFC 2046.)
+
+ NOTE: The quoted-printable encoding represents something of a
+ compromise between readability and reliability in transport. Bodies
+ encoded with the quoted-printable encoding will work reliably over
+ most mail gateways, but may not work perfectly over a few gateways,
+ notably those involving translation into EBCDIC. A higher level of
+ confidence is offered by the base64 Content-Transfer-Encoding. A way
+ to get reasonably reliable transport through EBCDIC gateways is to
+ also quote the US-ASCII characters
+
+ !"#$@[\]^`{|}~
+
+ according to rule #1.
+
+ Because quoted-printable data is generally assumed to be line-
+ oriented, it is to be expected that the representation of the breaks
+ between the lines of quoted-printable data may be altered in
+ transport, in the same manner that plain text mail has always been
+ altered in Internet mail when passing between systems with differing
+ newline conventions. If such alterations are likely to constitute a
+
+ corruption of the data, it is probably more sensible to use the
+ base64 encoding rather than the quoted-printable encoding.
+
+ NOTE: Several kinds of substrings cannot be generated according to
+ the encoding rules for the quoted-printable content-transfer-
+ encoding, and hence are formally illegal if they appear in the output
+ of a quoted-printable encoder. This note enumerates these cases and
+ suggests ways to handle such illegal substrings if any are
+ encountered in quoted-printable data that is to be decoded.
+
+ (1) An "=" followed by two hexadecimal digits, one or both
+ of which are lowercase letters in "abcdef", is formally
+ illegal. A robust implementation might choose to
+ recognize them as the corresponding uppercase letters.
+
+ (2) An "=" followed by a character that is neither a
+ hexadecimal digit (including "abcdef") nor the CR
+ character of a CRLF pair is illegal. This case can be
+ the result of US-ASCII text having been included in a
+ quoted-printable part of a message without itself
+ having been subjected to quoted-printable encoding. A
+ reasonable approach by a robust implementation might be
+ to include the "=" character and the following
+ character in the decoded data without any
+ transformation and, if possible, indicate to the user
+ that proper decoding was not possible at this point in
+ the data.
+
+ (3) An "=" cannot be the ultimate or penultimate character
+ in an encoded object. This could be handled as in case
+ (2) above.
+
+ (4) Control characters other than TAB, or CR and LF as
+ parts of CRLF pairs, must not appear. The same is true
+ for octets with decimal values greater than 126. If
+ found in incoming quoted-printable data by a decoder, a
+ robust implementation might exclude them from the
+ decoded data and warn the user that illegal characters
+ were discovered.
+
+ (5) Encoded lines must not be longer than 76 characters,
+ not counting the trailing CRLF. If longer lines are
+ found in incoming, encoded data, a robust
+ implementation might nevertheless decode the lines, and
+ might report the erroneous encoding to the user.
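+
+ A decoder following these suggestions might look like the Python
+ sketch below (the name and warning texts are hypothetical). It deletes
+ trailing white space as rule 3 requires, accepts lowercase hex, passes
+ malformed "=" sequences through unchanged, drops illegal octets,
+ decodes over-long lines anyway, and returns a list of warnings so a
+ user agent can report the problems.
+
```python
from __future__ import annotations

HEX = frozenset(b"0123456789ABCDEFabcdef")


def decode_qp_robust(encoded: bytes) -> tuple[bytes, list[str]]:
    """Decode quoted-printable leniently, collecting warnings (cases 1-5)."""
    out = bytearray()
    warnings: list[str] = []
    lines = encoded.split(b"\r\n")
    for n, line in enumerate(lines, start=1):
        line = line.rstrip(b" \t")                   # transport padding
        if len(line) > 76:
            warnings.append(f"line {n}: longer than 76 characters")   # case 5
        soft = line.endswith(b"=") and n < len(lines)  # "=" ending the data is case 3
        if soft:
            line = line[:-1]                         # soft line break
        i = 0
        while i < len(line):
            octet = line[i]
            pair = line[i + 1:i + 3]
            if octet == 0x3D and len(pair) == 2 and all(b in HEX for b in pair):
                if pair != pair.upper():
                    warnings.append(f"line {n}: lowercase hex")       # case 1
                out.append(int(pair, 16))
                i += 3
                continue
            if octet == 0x3D:
                warnings.append(f"line {n}: '=' not followed by two hex digits")  # cases 2, 3
                out.append(octet)                    # keep "=" literally
            elif (octet < 32 and octet != 9) or octet > 126:
                warnings.append(f"line {n}: illegal octet {octet}")   # case 4: drop it
            else:
                out.append(octet)
            i += 1
        if not soft and n < len(lines):
            out += b"\r\n"                           # hard line break
    return bytes(out), warnings
```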
+
+ WARNING TO IMPLEMENTORS: If binary data is encoded in quoted-
+ printable, care must be taken to encode CR and LF characters as "=0D"
+ and "=0A", respectively. In particular, a CRLF sequence in binary
+ data should be encoded as "=0D=0A". Otherwise, if CRLF were
+ represented as a hard line break, it might be incorrectly decoded on
+ platforms with different line break conventions.
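+
+ A minimal binary-mode encoder in Python (hypothetical names) follows
+ this advice by quoting every octet outside the rule 2 range, so CR and
+ LF always become "=0D" and "=0A" and the only line breaks in the
+ output are soft ones. Encoding SPACE and TAB as well is a
+ simplification that rule 1 permits.
+
```python
from __future__ import annotations

import binascii

SAFE = frozenset(range(33, 61)) | frozenset(range(62, 127))


def qp_encode_binary(data: bytes, limit: int = 76) -> str:
    """Quoted-printable for non-text data.

    Every octet outside the literal range, including CR, LF, SPACE and TAB,
    becomes =XX, so the only CRLFs in the output are soft line breaks.
    """
    lines: list[str] = []
    current = ""
    for octet in data:
        token = chr(octet) if octet in SAFE else f"={octet:02X}"
        if len(current) + len(token) > limit - 1:    # keep a column for "="
            lines.append(current + "=")
            current = ""
        current += token
    lines.append(current)
    return "\r\n".join(lines)


def decodes_back(data: bytes) -> bool:
    """Check the output with the standard library decoder."""
    return binascii.a2b_qp(qp_encode_binary(data).encode("ascii")) == data
```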
+
+ For formalists, the syntax of quoted-printable data is described by
+ the following grammar:
+
+ quoted-printable := qp-line *(CRLF qp-line)
+
+ qp-line := *(qp-segment transport-padding CRLF)
+ qp-part transport-padding
+
+ qp-part := qp-section
+ ; Maximum length of 76 characters
+
+ qp-segment := qp-section *(SPACE / TAB) "="
+ ; Maximum length of 76 characters
+
+ qp-section := [*(ptext / SPACE / TAB) ptext]
+
+ ptext := hex-octet / safe-char
+
+ safe-char := <any octet with decimal value of 33 through
+ 60 inclusive, and 62 through 126>
+ ; Characters not listed as "mail-safe" in
+ ; RFC 2049 are also not recommended.
+
+ hex-octet := "=" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F")
+ ; Octet must be used for characters > 127, =,
+ ; SPACEs or TABs at the ends of lines, and is
+ ; recommended for any character not listed in
+ ; RFC 2049 as "mail-safe".
+
+ transport-padding := *LWSP-char
+ ; Composers MUST NOT generate
+ ; non-zero length transport
+ ; padding, but receivers MUST
+ ; be able to handle padding
+ ; added by message transports.
+
+ IMPORTANT: The addition of LWSP between the elements shown in this
+ BNF is NOT allowed since this BNF does not specify a structured
+ header field.
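As a rough cross-check, the productions can be transcribed into Python regular expressions for a single encoded line (without its CRLF). This is a strict sketch, not a robust decoder. It accepts only the uppercase hexadecimal digits the grammar lists. The names `QP_SEGMENT`, `QP_PART` and `is_valid_qp_line` are hypothetical.

```python
import re

# ptext := hex-octet / safe-char
# hex-octet: "=" and two hex digits (uppercase, as listed in the grammar);
# safe-char: decimal 33-60 and 62-126, i.e. printable US-ASCII except "=".
PTEXT = r"(?:=[0-9A-F]{2}|[\x21-\x3C\x3E-\x7E])"

# qp-section := [*(ptext / SPACE / TAB) ptext]; it never ends in white space.
QP_SECTION = rf"(?:(?:{PTEXT}|[ \t])*{PTEXT})?"

# qp-segment := qp-section *(SPACE / TAB) "=" (a soft line break);
# qp-part := qp-section. Either may be followed by transport-padding.
QP_SEGMENT = re.compile(rf"({QP_SECTION}[ \t]*=)[ \t]*")
QP_PART = re.compile(rf"({QP_SECTION})[ \t]*")


def is_valid_qp_line(line: str) -> bool:
    """Check one encoded line (CRLF removed) against the grammar above."""
    for pattern in (QP_SEGMENT, QP_PART):
        match = pattern.fullmatch(line)
        # The 76-character maximum applies to the segment or part itself,
        # not to transport padding added in transit.
        if match and len(match.group(1)) <= 76:
            return True
    return False
```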
+
+
+
+
+
+Freed & Borenstein Standards Track [Page 23]
+\f
+RFC 2045 Internet Message Bodies November 1996
+
+
+6.8. Base64 Content-Transfer-Encoding
+
+ The Base64 Content-Transfer-Encoding is designed to represent
+ arbitrary sequences of octets in a form that need not be humanly
+ readable. The encoding and decoding algorithms are simple, but the
+ encoded data are consistently only about 33 percent larger than the
+ unencoded data. This encoding is virtually identical to the one used
+ in Privacy Enhanced Mail (PEM) applications, as defined in RFC 1421.
+
+ A 65-character subset of US-ASCII is used, enabling 6 bits to be
+ represented per printable character. (The extra 65th character, "=",
+ is used to signify a special processing function.)
+
+ NOTE: This subset has the important property that it is represented
+ identically in all versions of ISO 646, including US-ASCII, and all
+ characters in the subset are also represented identically in all
+ versions of EBCDIC. Other popular encodings, such as the encoding
+ used by the uuencode utility, Macintosh binhex 4.0 [RFC-1741], and
+ the base85 encoding specified as part of Level 2 PostScript, do not
+ share these properties, and thus do not fulfill the portability
+ requirements a binary transport encoding for mail must meet.
+
+ The encoding process represents 24-bit groups of input bits as output
+ strings of 4 encoded characters. Proceeding from left to right, a
+ 24-bit input group is formed by concatenating 3 8bit input groups.
+ These 24 bits are then treated as 4 concatenated 6-bit groups, each
+ of which is translated into a single digit in the base64 alphabet.
+ When encoding a bit stream via the base64 encoding, the bit stream
+ must be presumed to be ordered with the most-significant-bit first.
+ That is, the first bit in the stream will be the high-order bit in
+ the first 8bit byte, and the eighth bit will be the low-order bit in
+ the first 8bit byte, and so on.
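The bit manipulation for one complete 24-bit group can be sketched in Python as follows. `ALPHABET` transcribes Table 1 below. `encode_quantum` is a hypothetical name, and the standard library's `base64.b64encode()` serves only as a cross-check.

```python
import base64

# Table 1: 0-25 -> "A"-"Z", 26-51 -> "a"-"z", 52-61 -> "0"-"9", 62 "+", 63 "/".
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_quantum(octets: bytes) -> str:
    """Encode one complete 24-bit input group (three octets) as four characters."""
    if len(octets) != 3:
        raise ValueError("a full quantum is exactly three octets")
    # Concatenate the three octets, the first octet in the high-order bits.
    group = (octets[0] << 16) | (octets[1] << 8) | octets[2]
    # Split into four 6-bit groups, most significant first, and look each up.
    return "".join(ALPHABET[(group >> shift) & 0x3F] for shift in (18, 12, 6, 0))


# Cross-check against the standard library encoder.
assert encode_quantum(b"Man") == base64.b64encode(b"Man").decode("ascii")
```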
+
+ Each 6-bit group is used as an index into an array of 64 printable
+ characters. The character referenced by the index is placed in the
+ output string. These characters, identified in Table 1, below, are
+ selected so as to be universally representable, and the set excludes
+ characters with particular significance to SMTP (e.g., ".", CR, LF)
+ and to the multipart boundary delimiters defined in RFC 2046 (e.g.,
+ "-").
+
+
+
+
+
+
+
+
+
+
+
+Freed & Borenstein Standards Track [Page 24]
+\f
+RFC 2045 Internet Message Bodies November 1996
+
+
+                    Table 1: The Base64 Alphabet
+
+     Value Encoding  Value Encoding  Value Encoding  Value Encoding
+         0 A            17 R            34 i            51 z
+         1 B            18 S            35 j            52 0
+         2 C            19 T            36 k            53 1
+         3 D            20 U            37 l            54 2
+         4 E            21 V            38 m            55 3
+         5 F            22 W            39 n            56 4
+         6 G            23 X            40 o            57 5
+         7 H            24 Y            41 p            58 6
+         8 I            25 Z            42 q            59 7
+         9 J            26 a            43 r            60 8
+        10 K            27 b            44 s            61 9
+        11 L            28 c            45 t            62 +
+        12 M            29 d            46 u            63 /
+        13 N            30 e            47 v
+        14 O            31 f            48 w         (pad) =
+        15 P            32 g            49 x
+        16 Q            33 h            50 y
+
+ The encoded output stream must be represented in lines of no more
+ than 76 characters each. All line breaks or other characters not
+ found in Table 1 must be ignored by decoding software. In base64
+ data, characters other than those in Table 1, line breaks, and other
+ white space probably indicate a transmission error, about which a
+ warning message or even a message rejection might be appropriate
+ under some circumstances.
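Both halves of this paragraph can be sketched with Python's standard `base64` module. `encodebytes()` already wraps its output at 76 characters per line, using LF, which is converted to CRLF here. `b64decode()` with `validate=False` discards characters outside the alphabet. The helper names, and the choice to flag rather than reject suspicious input, are assumptions of this sketch.

```python
import base64
import re


def encode_mime_base64(data: bytes) -> bytes:
    """Base64 encode with lines of at most 76 characters and CRLF line breaks."""
    # encodebytes() wraps at 76 characters but ends lines with a bare LF.
    return base64.encodebytes(data).replace(b"\n", b"\r\n")


def decode_mime_base64(text: bytes) -> tuple[bytes, bool]:
    """Decode base64 body text; also report characters that hint at damage."""
    # Anything that is not in Table 1, not "=", and not white space is suspect.
    suspicious = re.search(rb"[^A-Za-z0-9+/=\s]", text) is not None
    # With validate=False (the default), b64decode() discards non-alphabet
    # characters, including line breaks, before decoding.
    return base64.b64decode(text, validate=False), suspicious
```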
+
+ Special processing is performed if fewer than 24 bits are available
+ at the end of the data being encoded. A full encoding quantum is
+ always completed at the end of a body. When fewer than 24 input bits
+ are available in an input group, zero bits are added (on the right)
+ to form an integral number of 6-bit groups. Padding at the end of
+ the data is performed using the "=" character. Since all base64
+ input is an integral number of octets, only the following cases can
+ arise: (1) the final quantum of encoding input is an integral
+ multiple of 24 bits; here, the final unit of encoded output will be
+ an integral multiple of 4 characters with no "=" padding, (2) the
+ final quantum of encoding input is exactly 8 bits; here, the final
+ unit of encoded output will be two characters followed by two "="
+ padding characters, or (3) the final quantum of encoding input is
+ exactly 16 bits; here, the final unit of encoded output will be three
+ characters followed by one "=" padding character.
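The three padding cases depend only on the input length modulo 3. Here is a short Python illustration. `padding_length` is a hypothetical helper, checked against the standard `base64` module.

```python
import base64


def padding_length(num_octets: int) -> int:
    """Number of "=" characters that end the encoding of num_octets octets."""
    # Leftover octets after the full 24-bit groups: 0 -> case (1),
    # 1 -> case (2), 2 -> case (3).
    return {0: 0, 1: 2, 2: 1}[num_octets % 3]


examples = {b"abc": b"YWJj", b"a": b"YQ==", b"ab": b"YWI="}
for raw, expected in examples.items():
    assert base64.b64encode(raw) == expected
    assert expected.count(b"=") == padding_length(len(raw))
```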
+
+ Because it is used only for padding at the end of the data, the
+ occurrence of any "=" characters may be taken as evidence that the
+ end of the data has been reached (without truncation in transit). No
+
+
+
+Freed & Borenstein Standards Track [Page 25]
+\f
+RFC 2045 Internet Message Bodies November 1996
+
+
+ such assurance is possible, however, when the number of octets
+ transmitted was a multiple of three and no "=" characters are
+ present.
+
+ Any characters outside of the base64 alphabet are to be ignored in
+ base64-encoded data.
+
+ Care must be taken to use the proper octets for line breaks if base64
+ encoding is applied directly to text material that has not been
+ converted to canonical form. In particular, text line breaks must be
+ converted into CRLF sequences prior to base64 encoding. The
+ important thing to note is that this may be done directly by the
+ encoder rather than in a prior canonicalization step in some
+ implementations.
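When the encoder itself performs the line break conversion, it might look like the following Python sketch. The default charset (UTF-8) and the helper name are assumptions. A real sender would also label the charset in the Content-Type field.

```python
import base64
import re


def encode_text_base64(text: str, charset: str = "utf-8") -> bytes:
    """Base64 encode text after converting its line breaks to CRLF."""
    # Match CRLF first so that an existing CRLF is not turned into CR CR LF.
    canonical = re.sub(r"\r\n|\r|\n", "\r\n", text)
    return base64.b64encode(canonical.encode(charset))
```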
+
+ NOTE: There is no need to worry about quoting potential boundary
+ delimiters within base64-encoded bodies within multipart entities
+ because no hyphen characters are used in the base64 encoding.
+
+7. Content-ID Header Field
+
+ In constructing a high-level user agent, it may be desirable to allow
+ one body to make reference to another. Accordingly, bodies may be
+ labelled using the "Content-ID" header field, which is syntactically
+ identical to the "Message-ID" header field:
+
+ id := "Content-ID" ":" msg-id
+
+ Like the Message-ID values, Content-ID values must be generated to be
+ world-unique.
+
+ The Content-ID value may be used for uniquely identifying MIME
+ entities in several contexts, particularly for caching data
+ referenced by the message/external-body mechanism. Although the
+ Content-ID header is generally optional, its use is MANDATORY in
+ implementations which generate data of the optional MIME media type
+ "message/external-body". That is, each message/external-body entity
+ must have a Content-ID field to permit caching of such data.
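Python's standard library offers `email.utils.make_msgid()`. It produces values in msg-id form and can also serve for Content-ID. The wrapper and the `example.com` domain are placeholders. The result is very likely, though not provably, world-unique.

```python
from email.utils import make_msgid


def new_content_id(domain: str) -> str:
    """Return a msg-id style value suitable for a Content-ID header field."""
    # make_msgid() combines a timestamp, the process id and 64 random bits
    # with the given domain; collisions are very unlikely, not impossible.
    return make_msgid(domain=domain)


header = f"Content-ID: {new_content_id('example.com')}"
```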
+
+ It is also worth noting that the Content-ID value has special
+ semantics in the case of the multipart/alternative media type. This
+ is explained in the section of RFC 2046 dealing with
+ multipart/alternative.
+
+
+
+
+
+
+
+
+Freed & Borenstein Standards Track [Page 26]
+\f
+RFC 2045 Internet Message Bodies November 1996
+
+
+8. Content-Description Header Field
+
+ The ability to associate some descriptive information with a given
+ body is often desirable. For example, it may be useful to mark an
+ "image" body as "a picture of the Space Shuttle Endeavor." Such text
+ may be placed in the Content-Description header field. This header
+ field is always optional.
+
+ description := "Content-Description" ":" *text
+
+ The description is presumed to be given in the US-ASCII character
+ set, although the mechanism specified in RFC 2047 may be used for
+ non-US-ASCII Content-Description values.
+
+9. Additional MIME Header Fields
+
+ Future documents may elect to define additional MIME header fields
+ for various purposes. Any new header field that further describes
+ the content of a message should begin with the string "Content-" to
+ allow such fields which appear in a message header to be
+ distinguished from ordinary RFC 822 message header fields.
+
+ MIME-extension-field := <Any RFC 822 header field which
+ begins with the string
+ "Content-">
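A receiving agent can pick out MIME-extension-fields by their name prefix. In this Python sketch the helper name is hypothetical. The comparison is done in lower case because RFC 822 field names are case-insensitive.

```python
from email import message_from_string


def mime_content_fields(raw_message: str) -> list[tuple[str, str]]:
    """Return header fields whose names begin with "Content-" (any case)."""
    msg = message_from_string(raw_message)
    # RFC 822 field names are case-insensitive, so compare in lower case.
    return [(name, value) for name, value in msg.items()
            if name.lower().startswith("content-")]
```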
+
+10. Summary
+
+ Using the MIME-Version, Content-Type, and Content-Transfer-Encoding
+ header fields, it is possible to include, in a standardized way,
+ arbitrary types of data with RFC 822 conformant mail messages. No
+ restrictions imposed by either RFC 821 or RFC 822 are violated, and
+ care has been taken to avoid problems caused by additional
+ restrictions imposed by the characteristics of some Internet mail
+ transport mechanisms (see RFC 2049).
+
+ The next document in this set, RFC 2046, specifies the initial set of
+ media types that can be labelled and transported using these headers.
+
+11. Security Considerations
+
+ Security issues are discussed in the second document in this set, RFC
+ 2046.
+
+
+
+
+
+
+
+
+Freed & Borenstein Standards Track [Page 27]
+\f
+RFC 2045 Internet Message Bodies November 1996
+
+
+12. Authors' Addresses
+
+ For more information, the authors of this document are best contacted
+ via Internet mail:
+
+ Ned Freed
+ Innosoft International, Inc.
+ 1050 East Garvey Avenue South
+ West Covina, CA 91790
+ USA
+
+ Phone: +1 818 919 3600
+ Fax: +1 818 919 3614
+ EMail: ned@innosoft.com
+
+
+ Nathaniel S. Borenstein
+ First Virtual Holdings
+ 25 Washington Avenue
+ Morristown, NJ 07960
+ USA
+
+ Phone: +1 201 540 8967
+ Fax: +1 201 993 3032
+ EMail: nsb@nsb.fv.com
+
+
+ MIME is a result of the work of the Internet Engineering Task Force
+ Working Group on RFC 822 Extensions. The chairman of that group,
+ Greg Vaudreuil, may be reached at:
+
+ Gregory M. Vaudreuil
+ Octel Network Services
+ 17080 Dallas Parkway
+ Dallas, TX 75248-1905
+ USA
+
+ EMail: Greg.Vaudreuil@Octel.Com
+
+
+
+
+
+
+
+
+
+
+
+
+
+Freed & Borenstein Standards Track [Page 28]
+\f
+RFC 2045 Internet Message Bodies November 1996
+
+
+Appendix A -- Collected Grammar
+
+ This appendix contains the complete BNF grammar for all the syntax
+ specified by this document.
+
+ By itself, however, this grammar is incomplete. It refers by name to
+ several syntax rules that are defined by RFC 822. Rather than
+ reproduce those definitions here, and risk unintentional differences
+ between the two, this document simply refers the reader to RFC 822
+ for the remaining definitions. Wherever a term is undefined, it
+ refers to the RFC 822 definition.
+
+ attribute := token
+ ; Matching of attributes
+ ; is ALWAYS case-insensitive.
+
+ composite-type := "message" / "multipart" / extension-token
+
+ content := "Content-Type" ":" type "/" subtype
+ *(";" parameter)
+ ; Matching of media type and subtype
+ ; is ALWAYS case-insensitive.
+
+ description := "Content-Description" ":" *text
+
+ discrete-type := "text" / "image" / "audio" / "video" /
+ "application" / extension-token
+
+ encoding := "Content-Transfer-Encoding" ":" mechanism
+
+ entity-headers := [ content CRLF ]
+ [ encoding CRLF ]
+ [ id CRLF ]
+ [ description CRLF ]
+ *( MIME-extension-field CRLF )
+
+ extension-token := ietf-token / x-token
+
+ hex-octet := "=" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F")
+ ; Octet must be used for characters > 127, =,
+ ; SPACEs or TABs at the ends of lines, and is
+ ; recommended for any character not listed in
+ ; RFC 2049 as "mail-safe".
+
+ iana-token := <A publicly-defined extension token. Tokens
+ of this form must be registered with IANA
+ as specified in RFC 2048.>
+
+
+
+
+Freed & Borenstein Standards Track [Page 29]
+\f
+RFC 2045 Internet Message Bodies November 1996
+
+
+ ietf-token := <An extension token defined by a
+ standards-track RFC and registered
+ with IANA.>
+
+ id := "Content-ID" ":" msg-id
+
+ mechanism := "7bit" / "8bit" / "binary" /
+ "quoted-printable" / "base64" /
+ ietf-token / x-token
+
+ MIME-extension-field := <Any RFC 822 header field which
+ begins with the string
+ "Content-">
+
+ MIME-message-headers := entity-headers
+ fields
+ version CRLF
+ ; The ordering of the header
+ ; fields implied by this BNF
+ ; definition should be ignored.
+
+ MIME-part-headers := entity-headers
+ [fields]
+ ; Any field not beginning with
+ ; "content-" can have no defined
+ ; meaning and may be ignored.
+ ; The ordering of the header
+ ; fields implied by this BNF
+ ; definition should be ignored.
+
+ parameter := attribute "=" value
+
+ ptext := hex-octet / safe-char
+
+ qp-line := *(qp-segment transport-padding CRLF)
+ qp-part transport-padding
+
+ qp-part := qp-section
+ ; Maximum length of 76 characters
+
+ qp-section := [*(ptext / SPACE / TAB) ptext]
+
+ qp-segment := qp-section *(SPACE / TAB) "="
+ ; Maximum length of 76 characters
+
+ quoted-printable := qp-line *(CRLF qp-line)
+
+
+
+
+
+Freed & Borenstein Standards Track [Page 30]
+\f
+RFC 2045 Internet Message Bodies November 1996
+
+
+ safe-char := <any octet with decimal value of 33 through
+ 60 inclusive, and 62 through 126>
+ ; Characters not listed as "mail-safe" in
+ ; RFC 2049 are also not recommended.
+
+ subtype := extension-token / iana-token
+
+ token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
+ or tspecials>
+
+ transport-padding := *LWSP-char
+ ; Composers MUST NOT generate
+ ; non-zero length transport
+ ; padding, but receivers MUST
+ ; be able to handle padding
+ ; added by message transports.
+
+ tspecials := "(" / ")" / "<" / ">" / "@" /
+ "," / ";" / ":" / "\" / <"> /
+ "/" / "[" / "]" / "?" / "="
+ ; Must be in quoted-string,
+ ; to use within parameter values
+
+ type := discrete-type / composite-type
+
+ value := token / quoted-string
+
+ version := "MIME-Version" ":" 1*DIGIT "." 1*DIGIT
+
+ x-token := <The two characters "X-" or "x-" followed, with
+ no intervening white space, by any token>
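The token, tspecials and x-token rules translate directly into small predicates. The Python sketch below is a literal reading of the grammar, treating CHAR as US-ASCII 0-127 and CTLs as 0-31 and 127. The function names are hypothetical.

```python
# tspecials: must appear inside a quoted-string to be used in parameter values.
TSPECIALS = frozenset('()<>@,;:\\"/[]?=')


def is_token(text: str) -> bool:
    """token := 1*<any (US-ASCII) CHAR except SPACE, CTLs, or tspecials>"""
    # Excluding SPACE (32) and CTLs (0-31, 127) leaves decimal 33-126.
    return bool(text) and all(33 <= ord(c) <= 126 and c not in TSPECIALS for c in text)


def is_x_token(text: str) -> bool:
    """x-token := "X-" or "x-" followed, with no white space, by any token."""
    return text[:2] in ("X-", "x-") and is_token(text[2:])
```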
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Freed & Borenstein Standards Track [Page 31]
+\f
--- /dev/null
+
+
+
+
+
+
+Network Working Group N. Freed
+Request for Comments: 2046 Innosoft
+Obsoletes: 1521, 1522, 1590 N. Borenstein
+Category: Standards Track First Virtual
+ November 1996
+
+
+ Multipurpose Internet Mail Extensions
+ (MIME) Part Two:
+ Media Types
+
+Status of this Memo
+
+ This document specifies an Internet standards track protocol for the
+ Internet community, and requests discussion and suggestions for
+ improvements. Please refer to the current edition of the "Internet
+ Official Protocol Standards" (STD 1) for the standardization state
+ and status of this protocol. Distribution of this memo is unlimited.
+
+Abstract
+
+ STD 11, RFC 822 defines a message representation protocol specifying
+ considerable detail about US-ASCII message headers, but which leaves
+ the message content, or message body, as flat US-ASCII text. This
+ set of documents, collectively called the Multipurpose Internet Mail
+ Extensions, or MIME, redefines the format of messages to allow for
+
+ (1) textual message bodies in character sets other than
+ US-ASCII,
+
+ (2) an extensible set of different formats for non-textual
+ message bodies,
+
+ (3) multi-part message bodies, and
+
+ (4) textual header information in character sets other than
+ US-ASCII.
+
+ These documents are based on earlier work documented in RFC 934, STD
+ 11, and RFC 1049, but extend and revise them. Because RFC 822 said
+ so little about message bodies, these documents are largely
+ orthogonal to (rather than a revision of) RFC 822.
+
+ The initial document in this set, RFC 2045, specifies the various
+ headers used to describe the structure of MIME messages. This second
+ document defines the general structure of the MIME media typing
+ system and defines an initial set of media types. The third document,
+ RFC 2047, describes extensions to RFC 822 to allow non-US-ASCII text
+
+
+
+Freed & Borenstein Standards Track [Page 1]
+\f
+RFC 2046 Media Types November 1996
+
+
+ data in Internet mail header fields. The fourth document, RFC 2048,
+ specifies various IANA registration procedures for MIME-related
+ facilities. The fifth and final document, RFC 2049, describes MIME
+ conformance criteria as well as providing some illustrative examples
+ of MIME message formats, acknowledgements, and the bibliography.
+
+ These documents are revisions of RFCs 1521 and 1522, which themselves
+ were revisions of RFCs 1341 and 1342. An appendix in RFC 2049
+ describes differences and changes from previous versions.
+
+Table of Contents
+
+ 1. Introduction ......................................... 3
+ 2. Definition of a Top-Level Media Type ................. 4
+ 3. Overview Of The Initial Top-Level Media Types ........ 4
+ 4. Discrete Media Type Values ........................... 6
+ 4.1 Text Media Type ..................................... 6
+ 4.1.1 Representation of Line Breaks ..................... 7
+ 4.1.2 Charset Parameter ................................. 7
+ 4.1.3 Plain Subtype ..................................... 11
+ 4.1.4 Unrecognized Subtypes ............................. 11
+ 4.2 Image Media Type .................................... 11
+ 4.3 Audio Media Type .................................... 11
+ 4.4 Video Media Type .................................... 12
+ 4.5 Application Media Type .............................. 12
+ 4.5.1 Octet-Stream Subtype .............................. 13
+ 4.5.2 PostScript Subtype ................................ 14
+ 4.5.3 Other Application Subtypes ........................ 17
+ 5. Composite Media Type Values .......................... 17
+ 5.1 Multipart Media Type ................................ 17
+ 5.1.1 Common Syntax ..................................... 19
+ 5.1.2 Handling Nested Messages and Multiparts ........... 24
+ 5.1.3 Mixed Subtype ..................................... 24
+ 5.1.4 Alternative Subtype ............................... 24
+ 5.1.5 Digest Subtype .................................... 26
+ 5.1.6 Parallel Subtype .................................. 27
+ 5.1.7 Other Multipart Subtypes .......................... 28
+ 5.2 Message Media Type .................................. 28
+ 5.2.1 RFC822 Subtype .................................... 28
+ 5.2.2 Partial Subtype ................................... 29
+ 5.2.2.1 Message Fragmentation and Reassembly ............ 30
+ 5.2.2.2 Fragmentation and Reassembly Example ............ 31
+ 5.2.3 External-Body Subtype ............................. 33
+ 5.2.4 Other Message Subtypes ............................ 40
+ 6. Experimental Media Type Values ....................... 40
+ 7. Summary .............................................. 41
+ 8. Security Considerations .............................. 41
+ 9. Authors' Addresses ................................... 42
+
+
+
+Freed & Borenstein Standards Track [Page 2]
+\f
+RFC 2046 Media Types November 1996
+
+
+ A. Collected Grammar .................................... 43
+
+1. Introduction
+
+ The first document in this set, RFC 2045, defines a number of header
+ fields, including Content-Type. The Content-Type field is used to
+ specify the nature of the data in the body of a MIME entity, by
+ giving media type and subtype identifiers, and by providing auxiliary
+ information that may be required for certain media types. After the
+ type and subtype names, the remainder of the header field is simply a
+ set of parameters, specified in an attribute/value notation. The
+ ordering of parameters is not significant.
+
+ In general, the top-level media type is used to declare the general
+ type of data, while the subtype specifies a specific format for that
+ type of data. Thus, a media type of "image/xyz" is enough to tell a
+ user agent that the data is an image, even if the user agent has no
+ knowledge of the specific image format "xyz". Such information can
+ be used, for example, to decide whether or not to show a user the raw
+ data from an unrecognized subtype -- such an action might be
+ reasonable for unrecognized subtypes of "text", but not for
+ unrecognized subtypes of "image" or "audio". For this reason,
+ registered subtypes of "text", "image", "audio", and "video" should
+ not contain embedded information that is really of a different type.
+ Such compound formats should be represented using the "multipart" or
+ "application" types.
+
+ Parameters are modifiers of the media subtype, and as such do not
+ fundamentally affect the nature of the content. The set of
+ meaningful parameters depends on the media type and subtype. Most
+ parameters are associated with a single specific subtype. However, a
+ given top-level media type may define parameters which are applicable
+ to any subtype of that type. Parameters may be required by their
+ defining media type or subtype or they may be optional. MIME
+ implementations must also ignore any parameters whose names they do
+ not recognize.
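Python's `email.message.Message` can illustrate how parameters are read. `get_content_type()` lowercases the type and subtype, and `get_params()` returns attribute/value pairs with the quotes removed from values. This sketch lowercases attribute names itself. The wrapper `content_type_params` is a hypothetical helper. Deciding which parameter names to act on, and ignoring the rest, is left to the caller.

```python
from email.message import Message


def content_type_params(header_value: str) -> tuple[str, dict[str, str]]:
    """Split a Content-Type value into a lowercased type/subtype and parameters."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_params() lists the type/subtype first, then (attribute, value) pairs
    # with surrounding quotes removed; attribute names are lowercased here
    # because attribute matching is case-insensitive.
    params = {name.lower(): value for name, value in msg.get_params()[1:]}
    return msg.get_content_type(), params


ctype, params = content_type_params('Text/Plain; FORMAT=flowed; Charset="us-ascii"')
```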
+
+ MIME's Content-Type header field and media type mechanism have been
+ carefully designed to be extensible, and it is expected that the set
+ of media type/subtype pairs and their associated parameters will grow
+ significantly over time. Several other MIME facilities, such as
+ transfer encodings and "message/external-body" access types, are
+ likely to have new values defined over time. In order to ensure that
+ the set of such values is developed in an orderly, well-specified,
+ and public manner, MIME sets up a registration process which uses the
+ Internet Assigned Numbers Authority (IANA) as a central registry for
+ MIME's various areas of extensibility. The registration process for
+ these areas is described in a companion document, RFC 2048.
+
+
+
+Freed & Borenstein Standards Track [Page 3]
+\f
+RFC 2046 Media Types November 1996
+
+
+ The initial seven standard top-level media types are defined and
+ described in the remainder of this document.
+
+2. Definition of a Top-Level Media Type
+
+ The definition of a top-level media type consists of:
+
+ (1) a name and a description of the type, including
+ criteria for whether a particular type would qualify
+ under that type,
+
+ (2) the names and definitions of parameters, if any, which
+ are defined for all subtypes of that type (including
+ whether such parameters are required or optional),
+
+ (3) how a user agent and/or gateway should handle unknown
+ subtypes of this type,
+
+ (4) general considerations on gatewaying entities of this
+ top-level type, if any, and
+
+ (5) any restrictions on content-transfer-encodings for
+ entities of this top-level type.
+
+3. Overview Of The Initial Top-Level Media Types
+
+ The five discrete top-level media types are:
+
+ (1) text -- textual information. The subtype "plain" in
+ particular indicates plain text containing no
+ formatting commands or directives of any sort. Plain
+ text is intended to be displayed "as-is". No special
+ software is required to get the full meaning of the
+ text, aside from support for the indicated character
+ set. Other subtypes are to be used for enriched text in
+ forms where application software may enhance the
+ appearance of the text, but such software must not be
+ required in order to get the general idea of the
+ content. Possible subtypes of "text" thus include any
+ word processor format that can be read without
+ resorting to software that understands the format. In
+ particular, formats that employ embedded binary
+ formatting information are not considered directly
+ readable. A very simple and portable subtype,
+ "richtext", was defined in RFC 1341, with a further
+ revision in RFC 1896 under the name "enriched".
+
+
+
+
+
+Freed & Borenstein Standards Track [Page 4]
+\f
+RFC 2046 Media Types November 1996
+
+
+ (2) image -- image data. "Image" requires a display device
+ (such as a graphical display, a graphics printer, or a
+ FAX machine) to view the information. An initial
+ subtype is defined for the widely-used image format
+ JPEG.
+
+ (3) audio -- audio data. "Audio" requires an audio output
+ device (such as a speaker or a telephone) to "display"
+ the contents. An initial subtype "basic" is defined in
+ this document.
+
+ (4) video -- video data. "Video" requires the capability
+ to display moving images, typically including
+ specialized hardware and software. An initial subtype
+ "mpeg" is defined in this document.
+
+ (5) application -- some other kind of data, typically
+ either uninterpreted binary data or information to be
+ processed by an application. The subtype "octet-
+ stream" is to be used in the case of uninterpreted
+ binary data, in which case the simplest recommended
+ action is to offer to write the information into a file
+ for the user. The "PostScript" subtype is also defined
+ for the transport of PostScript material. Other
+ expected uses for "application" include spreadsheets,
+ data for mail-based scheduling systems, languages
+ for "active" (computational) messaging, and word
+ processing formats that are not directly readable.
+ Note that security considerations may exist for some
+ types of application data, most notably
+ "application/PostScript" and any form of active
+ messaging. These issues are discussed later in this
+ document.
+
+ The two composite top-level media types are:
+
+ (1) multipart -- data consisting of multiple entities of
+ independent data types. Four subtypes are initially
+ defined, including the basic "mixed" subtype specifying
+ a generic mixed set of parts, "alternative" for
+ representing the same data in multiple formats,
+ "parallel" for parts intended to be viewed
+ simultaneously, and "digest" for multipart entities in
+ which each part has a default type of "message/rfc822".
+
+
+
+
+
+
+Freed & Borenstein Standards Track [Page 5]
+\f
+RFC 2046 Media Types November 1996
+
+
+ (2) message -- an encapsulated message. A body of media
+ type "message" is itself all or a portion of some kind
+ of message object. Such objects may or may not in turn
+ contain other entities. The "rfc822" subtype is used
+ when the encapsulated content is itself an RFC 822
+ message. The "partial" subtype is defined for partial
+ RFC 822 messages, to permit the fragmented transmission
+ of bodies that are thought to be too large to be passed
+ through transport facilities in one piece. Another
+ subtype, "external-body", is defined for specifying
+ large bodies by reference to an external data source.
+
+ It should be noted that the list of media type values given here may
+ be augmented in time, via the mechanisms described above, and that
+ the set of subtypes is expected to grow substantially.
+
+4. Discrete Media Type Values
+
+ Five of the seven initial media type values refer to discrete bodies.
+ The content of these types must be handled by non-MIME mechanisms;
+ they are opaque to MIME processors.
+
+4.1. Text Media Type
+
+ The "text" media type is intended for sending material which is
+ principally textual in form. A "charset" parameter may be used to
+ indicate the character set of the body text for "text" subtypes,
+ notably including the subtype "text/plain", which is a generic
+ subtype for plain text. Plain text does not provide for or allow
+ formatting commands, font attribute specifications, processing
+ instructions, interpretation directives, or content markup. Plain
+ text is seen simply as a linear sequence of characters, possibly
+ interrupted by line breaks or page breaks. Plain text may allow the
+ stacking of several characters in the same position in the text.
+ Plain text in scripts like Arabic and Hebrew may also include
+ facilities that allow the arbitrary mixing of text segments with
+ opposite writing directions.
+
+ Beyond plain text, there are many formats for representing what might
+ be known as "rich text". An interesting characteristic of many such
+ representations is that they are to some extent readable even without
+ the software that interprets them. It is useful, then, to
+ distinguish them, at the highest level, from such unreadable data as
+ images, audio, or text represented in an unreadable form. In the
+ absence of appropriate interpretation software, it is reasonable to
+ show subtypes of "text" to the user, while it is not reasonable to do
+ so with most nontextual data. Such formatted textual data should be
+ represented using subtypes of "text".
+
+
+
+Freed & Borenstein Standards Track [Page 6]
+\f
+RFC 2046 Media Types November 1996
+
+
+4.1.1. Representation of Line Breaks
+
+ The canonical form of any MIME "text" subtype MUST always represent a
+ line break as a CRLF sequence. Similarly, any occurrence of CRLF in
+ MIME "text" MUST represent a line break. Use of CR and LF outside of
+ line break sequences is also forbidden.
+
+ This rule applies regardless of format or character set or sets
+ involved.
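Checking that a body is in this canonical form amounts to making sure that every CR and LF belongs to a CRLF pair. This Python sketch works on octets and therefore assumes an ASCII-compatible character set. `is_canonical_text` is a hypothetical name.

```python
def is_canonical_text(body: bytes) -> bool:
    """True if every CR and LF octet in body is part of a CRLF pair."""
    # Drop the legitimate CRLF pairs; any CR or LF that remains is bare.
    rest = body.replace(b"\r\n", b"")
    return b"\r" not in rest and b"\n" not in rest
```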
+
+ NOTE: The proper interpretation of line breaks when a body is
+ displayed depends on the media type. In particular, while it is
+ appropriate to treat a line break as a transition to a new line when
+ displaying a "text/plain" body, this treatment is actually incorrect
+ for other subtypes of "text" like "text/enriched" [RFC-1896].
+ Similarly, whether or not line breaks should be added during display
+ operations is also a function of the media type. It should not be
+ necessary to add any line breaks to display "text/plain" correctly,
+ whereas proper display of "text/enriched" requires the appropriate
+ addition of line breaks.
+
+ NOTE: Some protocols define a maximum line length. For example,
+ SMTP [RFC-821] allows a maximum of 998 octets before the next CRLF
+ sequence. To be transported by such protocols, data that includes
+ overly long segments without CRLF sequences must be encoded with a
+ suitable content-transfer-encoding.
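A sender could test for the 998-octet limit before choosing an encoding. In this Python sketch the helper and constant names are hypothetical, and the body is assumed to be in canonical CRLF form already.

```python
SMTP_MAX_LINE = 998  # octets allowed before the next CRLF, per the note above


def has_overlong_line(body: bytes, limit: int = SMTP_MAX_LINE) -> bool:
    """True if any line in a CRLF-delimited body exceeds `limit` octets."""
    return any(len(line) > limit for line in body.split(b"\r\n"))
```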
+
+4.1.2. Charset Parameter
+
+ A critical parameter that may be specified in the Content-Type field
+ for "text/plain" data is the character set. This is specified with a
+ "charset" parameter, as in:
+
+ Content-type: text/plain; charset=iso-8859-1
+
+ Unlike some other parameter values, the values of the charset
+ parameter are NOT case sensitive. The default character set, which
+ must be assumed in the absence of a charset parameter, is US-ASCII.
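In Python's `email` package, `Message.get_content_charset()` returns the charset parameter in lower case and accepts a fallback value. This makes the US-ASCII default easy to express. The wrapper is a hypothetical helper and is meant for "text" bodies only.

```python
from email import message_from_string


def text_charset(raw_message: str) -> str:
    """Charset of a text body, falling back to the MIME default of US-ASCII."""
    msg = message_from_string(raw_message)
    # get_content_charset() lowercases the name (charset names are not case
    # sensitive) and returns failobj when no charset parameter is present.
    return msg.get_content_charset(failobj="us-ascii")
```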
+
+ The specification for any future subtypes of "text" must specify
+ whether or not they will also utilize a "charset" parameter, and may
+ possibly restrict its values as well. For subtypes of "text" other
+ than "text/plain", the semantics of the "charset" parameter should be
+ defined to be identical to those specified here for "text/plain",
+ i.e., the body consists entirely of characters in the given charset.
+ In particular, definers of future "text" subtypes should pay close
+ attention to the implications of multioctet character sets for their
+ subtype definitions.
+
+
+
+Freed & Borenstein Standards Track [Page 7]
+\f
+RFC 2046 Media Types November 1996
+
+
+ The charset parameter for subtypes of "text" gives a name of a
+ character set, as "character set" is defined in RFC 2045. The rules
+ regarding line breaks detailed in the previous section must also be
+ observed -- a character set whose definition does not conform to
+ these rules cannot be used in a MIME "text" subtype.
+
+ An initial list of predefined character set names can be found at the
+ end of this section. Additional character sets may be registered
+ with IANA.
+
+ Media types other than subtypes of "text" might choose to employ the
+ charset parameter as defined here, but with the CRLF/line break
+ restriction removed. Therefore, all character sets that conform to
+ the general definition of "character set" in RFC 2045 can be
+ registered for MIME use.
+
+ Note that if the specified character set includes 8-bit characters
+ and such characters are used in the body, a Content-Transfer-Encoding
+ header field and a corresponding encoding on the data are required in
+ order to transmit the body via some mail transfer protocols, such as
+ SMTP [RFC-821].
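Python's `email` package can show the effect of this rule. The sketch uses a policy whose `cte_type` is `"7bit"`, standing in here for a 7-bit SMTP path. With it, `EmailMessage.set_content()` labels pure ASCII text "7bit" but chooses quoted-printable or base64 once 8-bit octets appear. The policy name and the helper are assumptions of this sketch.

```python
from email import policy
from email.message import EmailMessage

# Hypothetical policy standing in for a 7-bit-only transport such as classic SMTP.
SEVEN_BIT = policy.default.clone(cte_type="7bit")


def build_text_part(text: str) -> EmailMessage:
    """Build a text/plain part whose transfer encoding suits a 7-bit path."""
    part = EmailMessage(policy=SEVEN_BIT)
    # set_content() encodes the str as UTF-8; since this policy forbids 8bit,
    # non-ASCII text gets quoted-printable or base64 instead of "8bit".
    part.set_content(text, charset="utf-8")
    return part


ascii_part = build_text_part("plain ascii\n")
accented_part = build_text_part("café\n")
```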
+
+ The default character set, US-ASCII, has been the subject of some
+ confusion and ambiguity in the past. Not only were there some
+ ambiguities in the definition, there have been wide variations in
+ practice. In order to eliminate such ambiguity and variations in the
+ future, it is strongly recommended that new user agents explicitly
+ specify a character set as a media type parameter in the Content-Type
+ header field. "US-ASCII" does not indicate an arbitrary 7-bit
+ character set, but specifies that all octets in the body must be
+ interpreted as characters according to the US-ASCII character set.
+ National and application-oriented versions of ISO 646 [ISO-646] are
+ usually NOT identical to US-ASCII, and in that case their use in
+ Internet mail is explicitly discouraged. The omission of the ISO 646
+ character set from this document is deliberate in this regard. The
+ character set name of "US-ASCII" explicitly refers to the character
+ set defined in ANSI X3.4-1986 [US-ASCII]. The new international
+ reference version (IRV) of the 1991 edition of ISO 646 is identical
+ to US-ASCII. The character set name "ASCII" is reserved and must not
+ be used for any purpose.
+
+ NOTE: RFC 821 explicitly specifies "ASCII", and references an earlier
+ version of the American Standard. Insofar as one of the purposes of
+ specifying a media type and character set is to permit the receiver
+ to unambiguously determine how the sender intended the coded message
+ to be interpreted, assuming anything other than "strict ASCII" as the
+ default would risk unintentional and incompatible changes to the
+ semantics of messages now being transmitted. This also implies that
+ messages containing characters coded according to other versions of
+ ISO 646 than US-ASCII and the 1991 IRV, or using code-switching
+ procedures (e.g., those of ISO 2022), as well as 8bit or multiple
+ octet character encodings MUST use an appropriate character set
+ specification to be consistent with MIME.
+
+ The complete US-ASCII character set is listed in ANSI X3.4-1986.
+ Note that the control characters, including DEL (0-31, 127), have no
+ defined meaning apart from the combination CRLF (US-ASCII values
+ 13 and 10) indicating a new line. Two of the characters have de
+ facto meanings in wide use: FF (12) often means "start subsequent
+ text on the beginning of a new page"; and TAB or HT (9) often (though
+ not always) means "move the cursor to the next available column after
+ the current position where the column number is a multiple of 8
+ (counting the first column as column 0)." Aside from these
+ conventions, any use of the control characters or DEL in a body must
+ either occur
+
+ (1) because a subtype of text other than "plain"
+ specifically assigns some additional meaning, or
+
+ (2) within the context of a private agreement between the
+ sender and recipient. Such private agreements are
+ discouraged and should be replaced by the other
+ capabilities of this document.
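+
+ A receiving or composing agent could check a "text/plain" body against
+ these conventions with something like the following minimal Python
+ sketch. The name stray_controls is hypothetical, and tolerating HT and
+ FF by default is a policy choice based on their de facto meanings, not
+ a requirement of this document.
+
```python
from typing import List


def stray_controls(body: bytes, allow_tab_and_ff: bool = True) -> List[int]:
    """Return offsets of control octets (0-31, 127) not used as CRLF."""
    tolerated = {9, 12} if allow_tab_and_ff else set()
    offsets: List[int] = []
    i = 0
    while i < len(body):
        octet = body[i]
        if octet == 13 and body[i + 1:i + 2] == b"\n":
            i += 2  # CRLF: the only combination with a defined meaning
            continue
        if (octet < 32 or octet == 127) and octet not in tolerated:
            offsets.append(i)  # bare CR, bare LF, DEL, other controls
        i += 1
    return offsets
```
+
+ Any offset reported would need either a subtype that assigns the
+ octet a meaning or a private agreement between sender and recipient.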
+
+ NOTE: An enormous proliferation of character sets exists beyond US-
+ ASCII. A large number of partially or totally overlapping character
+ sets is NOT a good thing. A SINGLE character set that can be used
+ universally for representing all of the world's languages in Internet
+ mail would be preferable. Unfortunately, existing practice in
+ several communities seems to point to the continued use of multiple
+ character sets in the near future. A small number of standard
+ character sets are, therefore, defined for Internet use in this
+ document.
+
+ The defined charset values are:
+
+ (1) US-ASCII -- as defined in ANSI X3.4-1986 [US-ASCII].
+
+ (2) ISO-8859-X -- where "X" is to be replaced, as
+ necessary, for the parts of ISO-8859 [ISO-8859]. Note
+ that the ISO 646 character sets have deliberately been
+ omitted in favor of their 8859 replacements, which are
+ the designated character sets for Internet mail. As of
+ the publication of this document, the legitimate values
+ for "X" are the digits 1 through 10.
+
+ Characters in the range 128-159 have no assigned meaning in
+ ISO-8859-X. Characters with values below 128 in ISO-8859-X have the
+ same assigned meaning as they do in US-ASCII.
+
+ Part 6 of ISO 8859 (Latin/Arabic alphabet) and part 8 (Latin/Hebrew
+ alphabet) include both characters for which the normal writing
+ direction is right to left and characters for which it is left to
+ right, but do not define a canonical ordering method for representing
+ bi-directional text. The charset values "ISO-8859-6" and
+ "ISO-8859-8", however, specify that the visual method is used
+ [RFC-1556].
+
+ All of these character sets are used as pure 7bit or 8bit sets
+ without any shift or escape functions. The meaning of shift and
+ escape sequences in these character sets is not defined.
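+
+ A decoder will not flag such octets by itself; Python's iso-8859-1
+ codec, for example, maps 0x80-0x9F to C1 control characters without
+ error. The minimal sketch below, with the hypothetical name
+ suspicious_8859_octets, reports octets in 128-159 together with SO,
+ SI, and ESC. Treating those three as the relevant shift and escape
+ functions is an assumption based on ISO 2022 usage; this document does
+ not enumerate them.
+
```python
from typing import List

# SO (0x0E), SI (0x0F) and ESC (0x1B): shift/escape functions (assumed set).
SHIFT_AND_ESCAPE = {0x0E, 0x0F, 0x1B}


def suspicious_8859_octets(body: bytes) -> List[int]:
    """Offsets of octets an ISO-8859-X body should not rely on."""
    return [i for i, octet in enumerate(body)
            if 128 <= octet <= 159 or octet in SHIFT_AND_ESCAPE]
```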
+
+ The character sets specified above are the ones that were relatively
+ uncontroversial during the drafting of MIME. This document does not
+ endorse the use of any particular character set other than US-ASCII,
+ and recognizes that the future evolution of world character sets
+ remains unclear.
+
+ Note that the character set used, if anything other than US-ASCII,
+ must always be explicitly specified in the Content-Type field.
+
+ No character set name other than those defined above may be used in
+ Internet mail without the publication of a formal specification and
+ its registration with IANA, or by private agreement, in which case
+ the character set name must begin with "X-".
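+
+ The naming rules above, together with the reservation of the name
+ "ASCII", can be sketched as a Python check. The function name is
+ hypothetical, and the built-in set mirrors only the names defined in
+ this section; an implementation would add names registered with IANA.
+
```python
# Names defined in this section; IANA registrations would be added here.
REGISTERED_CHARSETS = frozenset(
    ["us-ascii"] + [f"iso-8859-{x}" for x in range(1, 11)]
)


def charset_name_allowed(name: str, registered=REGISTERED_CHARSETS) -> bool:
    """Check a charset name against the naming rules of this section."""
    n = name.strip().lower()          # charset values are not case sensitive
    if n == "ascii":
        return False                  # reserved: must not be used for any purpose
    # "X-" names are only for private agreements between the parties.
    return n in registered or n.startswith("x-")
```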
+
+ Implementors are discouraged from defining new character sets unless
+ absolutely necessary.
+
+ The "charset" parameter has been defined primarily for the purpose of
+ textual data, and is described in this section for that reason.
+ However, it is conceivable that non-textual data might also wish to
+ specify a charset value for some purpose, in which case the same
+ syntax and values should be used.
+
+ In general, composition software should always use the "lowest common
+ denominator" character set possible. For example, if a body contains
+ only US-ASCII characters, it SHOULD be marked as being in the US-
+ ASCII character set, not ISO-8859-1, which, like all the ISO-8859
+ family of character sets, is a superset of US-ASCII. More generally,
+ if a widely-used character set is a subset of another character set,
+ and a body contains only characters in the widely-used subset, it
+ should be labelled as being in that subset. This will increase the
+ chances that the recipient will be able to view the resulting entity
+ correctly.
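+
+ The "lowest common denominator" rule can be sketched in Python by
+ trying candidate charsets from narrowest to widest and keeping the
+ first one that can represent the text. The name pick_charset and the
+ default candidate list are assumptions for illustration.
+
```python
from typing import Optional, Sequence


def pick_charset(text: str,
                 candidates: Sequence[str] = ("us-ascii", "iso-8859-1", "iso-8859-2"),
                 ) -> Optional[str]:
    """Return the first (narrowest) candidate charset that can encode text."""
    for name in candidates:  # ordered from most to least widely supported
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue
        return name
    return None  # no candidate fits; the caller must choose another strategy
```
+
+ With this ordering, an all-ASCII body is labelled "us-ascii" even
+ though ISO-8859-1 could also represent it.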
+
+4.1.3. Plain Subtype
+
+ The simplest and most important subtype of "text" is "plain". This
+ indicates plain text that does not contain any formatting commands or
+ directives. Plain text is intended to be displayed "as-is", that is,
+ no interpretation of embedded formatting commands, font attribute
+ specifications, processing instructions, interpretation directives,
+ or content markup should be necessary for proper display. The
+ default media type of "text/plain; charset=us-ascii" for Internet
+ mail describes existing Internet practice. That is, it is the type
+ of body defined by RFC 822.
+
+ No other "text" subtype is defined by this document.
+
+4.1.4. Unrecognized Subtypes
+
+ Unrecognized subtypes of "text" should be treated as subtype "plain"
+ as long as the MIME implementation knows how to handle the charset.
+ Unrecognized subtypes which also specify an unrecognized charset
+ should be treated as "application/octet-stream".
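+
+ A minimal Python sketch of this fallback follows; the function and
+ parameter names are hypothetical. The case of a recognized subtype
+ with an unrecognized charset is not addressed by the rule above, and
+ the sketch simply keeps the subtype.
+
```python
def text_fallback(subtype: str, charset: str,
                  known_subtypes: set, known_charsets: set) -> str:
    """Media type to act on for a received text/<subtype> entity."""
    subtype, charset = subtype.lower(), charset.lower()
    if subtype in known_subtypes:
        return "text/" + subtype
    if charset in known_charsets:
        return "text/plain"                 # unknown subtype, usable charset
    return "application/octet-stream"       # unknown subtype and charset
```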
+
+4.2. Image Media Type
+
+ A media type of "image" indicates that the body contains an image.
+ The subtype names the specific image format. These names are not
+ case sensitive. An initial subtype is "jpeg" for the JPEG format
+ using JFIF encoding [JPEG].
+
+ The list of "image" subtypes given here is neither exclusive nor
+ exhaustive, and is expected to grow as more types are registered with
+ IANA, as described in RFC 2048.
+
+ Unrecognized subtypes of "image" should at a minimum be treated as
+ "application/octet-stream". Implementations may optionally elect to
+ pass subtypes of "image" that they do not specifically recognize to a
+ secure and robust general-purpose image viewing application, if such
+ an application is available.
+
+ NOTE: Using a general-purpose image viewing application in this way
+ inherits the security problems of the most dangerous type supported
+ by the application.
+
+4.3. Audio Media Type
+
+ A media type of "audio" indicates that the body contains audio data.
+ Although there is not yet a consensus on an "ideal" audio format for
+ use with computers, there is a pressing need for a format capable of
+ providing interoperable behavior.
+
+ The initial subtype of "basic" is specified to meet this requirement
+ by providing an absolutely minimal lowest common denominator audio
+ format. It is expected that richer formats for higher quality and/or
+ lower bandwidth audio will be defined by a later document.
+
+ The content of the "audio/basic" subtype is single channel audio
+ encoded using 8bit ISDN mu-law [PCM] at a sample rate of 8000 Hz.
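+
+ For illustration, the Python sketch below decodes one mu-law octet to
+ a signed 16-bit linear sample using the commonly published G.711-style
+ formulation (bit inversion, bias of 132). The [PCM] reference remains
+ authoritative; the function names and the 16-bit output scale are
+ assumptions. Because the format is single channel with one octet per
+ sample at 8000 Hz, playing time follows directly from the body length.
+
```python
def ulaw_to_linear(code: int) -> int:
    """Decode one 8-bit mu-law octet to a signed 16-bit linear sample."""
    code = ~code & 0xFF                  # mu-law octets are stored bit-inverted
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    # 0x84 (132) is the encoder's bias, added before and removed after shifting.
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def duration_seconds(body: bytes, sample_rate: int = 8000) -> float:
    """Playing time of an audio/basic body: one octet per sample, one channel."""
    return len(body) / sample_rate
```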
+
+ Unrecognized subtypes of "audio" should at a minimum be treated as
+ "application/octet-stream". Implementations may optionally elect to
+ pass subtypes of "audio" that they do not specifically recognize to a
+ robust general-purpose audio playing application, if such an
+ application is available.
+
+4.4. Video Media Type
+
+ A media type of "video" indicates that the body contains a time-
+ varying-picture image, possibly with color and coordinated sound.
+ The term 'video' is used in its most generic sense, rather than with
+ reference to any particular technology or format, and is not meant to
+ preclude subtypes such as animated drawings encoded compactly. The
+ subtype "mpeg" refers to video coded according to the MPEG standard
+ [MPEG].
+
+ Note that although in general this document strongly discourages the
+ mixing of multiple media in a single body, it is recognized that many
+ so-called video formats include a representation for synchronized
+ audio, and this is explicitly permitted for subtypes of "video".
+
+ Unrecognized subtypes of "video" should at a minimum be treated as
+ "application/octet-stream". Implementations may optionally elect to
+ pass subtypes of "video" that they do not specifically recognize to a
+ robust general-purpose video display application, if such an
+ application is available.
+
+4.5. Application Media Type
+
+ The "application" media type is to be used for discrete data which do
+ not fit in any of the other categories, and particularly for data to
+ be processed by some type of application program. This is
+ information which must be processed by an application before it is
+ viewable or usable by a user. Expected uses for the "application"
+ media type include file transfer, spreadsheets, data for mail-based
+ scheduling systems, and languages for "active" (computational)
+ material. (The latter, in particular, can pose security problems
+ which must be understood by implementors, and are considered in
+ detail in the discussion of the "application/PostScript" media type.)
+
+ For example, a meeting scheduler might define a standard
+ representation for information about proposed meeting dates. An
+ intelligent user agent would use this information to conduct a dialog
+ with the user, and might then send additional material based on that
+ dialog. More generally, there have been several "active" messaging
+ languages developed in which programs in a suitably specialized
+ language are transported to a remote location and automatically run
+ in the recipient's environment.
+
+ Such applications may be defined as subtypes of the "application"
+ media type. This document defines two subtypes:
+
+ octet-stream, and PostScript.
+
+ The subtype of "application" will often be, or include part of, the
+ name of the application for which the data are intended.
+ This does not mean, however, that any application program name may be
+ used freely as a subtype of "application".
+
+4.5.1. Octet-Stream Subtype
+
+ The "octet-stream" subtype is used to indicate that a body contains
+ arbitrary binary data. The set of currently defined parameters is:
+
+ (1) TYPE -- the general type or category of binary data.
+ This is intended as information for the human recipient
+ rather than for any automatic processing.
+
+ (2) PADDING -- the number of bits of padding that were
+ appended to the bit-stream comprising the actual
+ contents to produce the enclosed 8bit byte-oriented
+ data. This is useful for enclosing a bit-stream in a
+ body when the total number of bits is not a multiple of
+ 8.
+
+ Both of these parameters are optional.
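+
+ The PADDING parameter can be illustrated with a Python sketch that
+ packs a bit-stream into octets and recovers it again. The function
+ names are hypothetical, and two details are assumptions not stated
+ above: the pad bits are zeros, and PADDING therefore lies between 0
+ and 7.
+
```python
from typing import Tuple


def bits_to_octets(bits: str) -> Tuple[bytes, int]:
    """Pack a string of '0'/'1' into octets; return (data, PADDING value)."""
    padding = -len(bits) % 8             # zero bits appended at the end
    padded = bits + "0" * padding
    data = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return data, padding


def octets_to_bits(data: bytes, padding: int = 0) -> str:
    """Recover the original bit-stream by dropping the trailing pad bits."""
    if not 0 <= padding <= 7 or (padding and not data):
        raise ValueError("PADDING must be 0-7 and cannot exceed the data")
    bits = "".join(f"{octet:08b}" for octet in data)
    return bits[:len(bits) - padding]
```
+
+ For example, the 5-bit stream "10110" becomes the single octet 0xB0
+ with PADDING=3.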
+
+ An additional parameter, "CONVERSIONS", was defined in RFC 1341 but
+ has since been removed. RFC 1341 also defined the use of a "NAME"
+ parameter which gave a suggested file name to be used if the data
+ were to be written to a file. This has been deprecated in
+ anticipation of a separate Content-Disposition header field, to be
+ defined in a subsequent RFC.
+
+ The recommended action for an implementation that receives an
+ "application/octet-stream" entity is to simply offer to put the data
+ in a file, with any Content-Transfer-Encoding undone, or perhaps to
+ use it as input to a user-specified process.
+
+ To reduce the danger of transmitting rogue programs, it is strongly
+ recommended that implementations NOT implement a path-search
+ mechanism whereby an arbitrary program named in the Content-Type
+ parameter (e.g., an "interpreter=" parameter) is found and executed
+ using the message body as input.
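+
+ The recommended handling, storing the data with the transfer encoding
+ undone and never executing anything named in the header, might look
+ like the following Python sketch. The name save_octet_stream is
+ hypothetical. It uses the standard library's email parser, where
+ get_payload(decode=True) reverses base64 or quoted-printable.
+
```python
import email
import tempfile
from pathlib import Path


def save_octet_stream(raw_message: bytes, dest_dir: str) -> Path:
    """Write an application/octet-stream body to a new file and return its path."""
    msg = email.message_from_bytes(raw_message)
    if msg.get_content_type() != "application/octet-stream":
        raise ValueError("expected application/octet-stream")
    # decode=True undoes base64 or quoted-printable transfer encoding.
    data = msg.get_payload(decode=True)
    # Content-Type parameters (e.g. a hypothetical "interpreter=") are
    # deliberately ignored: the data is stored, never executed.
    fd, path = tempfile.mkstemp(dir=dest_dir, suffix=".bin")
    with open(fd, "wb") as out:
        out.write(data)
    return Path(path)
```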
+
+4.5.2. PostScript Subtype
+
+ A media type of "application/postscript" indicates a PostScript
+ program. Currently two variants of the PostScript language are
+ allowed; the original level 1 variant is described in [POSTSCRIPT]
+ and the more recent level 2 variant is described in [POSTSCRIPT2].
+
+ PostScript is a registered trademark of Adobe Systems, Inc. Use of
+ the MIME media type "application/postscript" implies recognition of
+ that trademark and all the rights it entails.
+
+ The PostScript language definition provides facilities for internal
+ labelling of the specific language features a given program uses.
+ This labelling, called the PostScript document structuring
+ conventions, or DSC, is very general and provides substantially more
+ information than just the language level. The use of document
+ structuring conventions, while not required, is strongly recommended
+ as an aid to interoperability. Documents which lack proper
+ structuring conventions cannot be tested to see whether or not they
+ will work in a given environment. As such, some systems may assume
+ the worst and refuse to process unstructured documents.
+
+ The execution of general-purpose PostScript interpreters entails
+ serious security risks, and implementors are discouraged from simply
+ sending PostScript bodies to "off-the-shelf" interpreters. While it
+ is usually safe to send PostScript to a printer, where the potential
+ for harm is greatly constrained by typical printer environments,
+ implementors should consider all of the following before they add
+ interactive display of PostScript bodies to their MIME readers.
+
+ The remainder of this section outlines some, though probably not all,
+ of the possible problems with the transport of PostScript entities.
+
+ (1) Dangerous operations in the PostScript language
+ include, but may not be limited to, the PostScript
+ operators "deletefile", "renamefile", "filenameforall",
+ and "file". "File" is only dangerous when applied to
+ something other than standard input or output.
+ Implementations may also define additional nonstandard
+ file operators; these may also pose a threat to
+ security. "Filenameforall", the wildcard file search
+ operator, may appear at first glance to be harmless.
+ Note, however, that this operator has the potential to
+ reveal information about what files the recipient has
+ access to, and this information may itself be
+ sensitive. Message senders should avoid the use of
+ potentially dangerous file operators, since these
+ operators are quite likely to be unavailable in secure
+ PostScript implementations. Message receiving and
+ displaying software should either completely disable
+ all potentially dangerous file operators or take
+ special care not to delegate any special authority to
+ their operation. These operators should be viewed as
+ being done by an outside agency when interpreting
+ PostScript documents. Such disabling and/or checking
+ should be done completely outside of the reach of the
+ PostScript language itself; care should be taken to
+ ensure that no method exists for re-enabling full-function
+ versions of these operators.
+
+ (2) The PostScript language provides facilities for exiting
+ the normal interpreter, or server, loop. Changes made
+ in this "outer" environment are customarily retained
+ across documents, and may in some cases be retained
+ semipermanently in nonvolatile memory. The operators
+ associated with exiting the interpreter loop have the
+ potential to interfere with subsequent document
+ processing. As such, their unrestrained use
+ constitutes a threat of service denial. PostScript
+ operators that exit the interpreter loop include, but
+ may not be limited to, the exitserver and startjob
+ operators. Message sending software should not
+ generate PostScript that depends on exiting the
+ interpreter loop to operate, since the ability to exit
+ will probably be unavailable in secure PostScript
+ implementations. Message receiving and displaying
+ software should completely disable the ability to make
+ retained changes to the PostScript environment by
+ eliminating or disabling the "startjob" and
+ "exitserver" operations. If these operations cannot be
+ eliminated or completely disabled, the password
+ associated with them should at least be set to a hard-
+ to-guess value.
+
+ (3) PostScript provides operators for setting system-wide
+ and device-specific parameters. These parameter
+ settings may be retained across jobs and may
+ potentially pose a threat to the correct operation of
+ the interpreter. The PostScript operators that set
+ system and device parameters include, but may not be
+
+ limited to, the "setsystemparams" and "setdevparams"
+ operators. Message sending software should not
+ generate PostScript that depends on the setting of
+ system or device parameters to operate correctly. The
+ ability to set these parameters will probably be
+ unavailable in secure PostScript implementations.
+ Message receiving and displaying software should
+ disable the ability to change system and device
+ parameters. If these operators cannot be completely
+ disabled, the password associated with them should at
+ least be set to a hard-to-guess value.
+
+ (4) Some PostScript implementations provide nonstandard
+ facilities for the direct loading and execution of
+ machine code. Such facilities are quite obviously open
+ to substantial abuse. Message sending software should
+ not make use of such features. Besides being totally
+ hardware-specific, they are also likely to be
+ unavailable in secure implementations of PostScript.
+ Message receiving and displaying software should not
+ allow such operators to be used if they exist.
+
+ (5) PostScript is an extensible language, and many, if not
+ most, implementations of it provide a number of their
+ own extensions. This document does not deal with such
+ extensions explicitly since they constitute an unknown
+ factor. Message sending software should not make use
+ of nonstandard extensions; they are likely to be
+ missing from some implementations. Message receiving
+ and displaying software should make sure that any
+ nonstandard PostScript operators are secure and don't
+ present any kind of threat.
+
+ (6) It is possible to write PostScript that consumes huge
+ amounts of various system resources. It is also
+ possible to write PostScript programs that loop
+ indefinitely. Both types of programs have the
+ potential to cause damage if sent to unsuspecting
+ recipients. Message-sending software should avoid the
+ construction and dissemination of such programs, which
+ is antisocial. Message receiving and displaying
+ software should provide appropriate mechanisms to abort
+ processing after a reasonable amount of time has
+ elapsed. In addition, PostScript interpreters should be
+ limited to the consumption of only a reasonable amount
+ of any given system resource.
+
+ (7) It is possible to include raw binary information inside
+ PostScript in various forms. This is not recommended
+ for use in Internet mail, both because it is not
+ supported by all PostScript interpreters and because it
+ significantly complicates the use of a MIME Content-
+ Transfer-Encoding. (Without such binary, PostScript
+ may typically be viewed as line-oriented data. The
+ treatment of CRLF sequences becomes extremely
+ problematic if binary and line-oriented data are mixed
+ in a single PostScript data stream.)
+
+ (8) Finally, bugs may exist in some PostScript interpreters
+ which could possibly be exploited to gain unauthorized
+ access to a recipient's system. Apart from noting this
+ possibility, there is no specific action to take to
+ prevent this, other than the timely correction of such
+ bugs if any are found.
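+
+ Message-sending software could run a crude lint over generated
+ PostScript to catch the operators named above. The Python sketch
+ below is hypothetical and is NOT a security control: PostScript can
+ construct operator names at run time, so, as item (1) says, receivers
+ must disable such operators outside the language. The tokenizer is
+ deliberately naive and over-reports names that appear inside strings.
+
```python
import re
from typing import Set

# Operators named in items (1)-(3) above; a real list would be longer.
RISKY_OPERATORS = {
    "deletefile", "renamefile", "filenameforall", "file",
    "exitserver", "startjob", "setsystemparams", "setdevparams",
}
# Crude tokenizer: splits on whitespace and PostScript delimiter characters.
TOKEN = re.compile(r"[^\s()<>\[\]{}/%]+")


def risky_operators(ps_source: str) -> Set[str]:
    """Return risky operator names that appear literally in ps_source."""
    found: Set[str] = set()
    for line in ps_source.splitlines():
        code = line.split("%", 1)[0]  # drop comments (ignores % inside strings)
        found.update(tok for tok in TOKEN.findall(code) if tok in RISKY_OPERATORS)
    return found
```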
+
+4.5.3. Other Application Subtypes
+
+ It is expected that many other subtypes of "application" will be
+ defined in the future. MIME implementations must at a minimum treat
+ any unrecognized subtypes as being equivalent to "application/octet-
+ stream".
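+
+ The minimum-handling rules for unrecognized subtypes of the discrete
+ types "image", "audio", "video", and "application" reduce to one
+ fallback, sketched here in Python. The function name and the
+ "recognized" set are hypothetical; "text" has its own charset-aware
+ rule, and composite types are governed by section 5.
+
```python
DISCRETE_TYPES = {"image", "audio", "video", "application"}


def handling_type(content_type: str, recognized: set) -> str:
    """Media type an implementation should act on for a received entity."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in recognized:
        return media_type
    if media_type.split("/", 1)[0] in DISCRETE_TYPES:
        # Minimum required behavior for an unrecognized discrete subtype.
        return "application/octet-stream"
    return media_type  # text, multipart, message: governed by other rules
```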
+
+5. Composite Media Type Values
+
+ The remaining two of the seven initial Content-Type values refer to
+ composite entities. Composite entities are handled using MIME
+ mechanisms -- a MIME processor typically handles the body directly.
+
+5.1. Multipart Media Type
+
+ In the case of multipart entities, in which one or more different
+ sets of data are combined in a single body, a "multipart" media type
+ field must appear in the entity's header. The body must then contain
+ one or more body parts, each preceded by a boundary delimiter line,
+ and the last one followed by a closing boundary delimiter line.
+ After its boundary delimiter line, each body part then consists of a
+ header area, a blank line, and a body area. Thus a body part is
+ similar to an RFC 822 message in syntax, but different in meaning.
+
+ A body part is an entity and hence is NOT to be interpreted as
+ actually being an RFC 822 message. To begin with, NO header fields
+ are actually required in body parts. A body part that starts with a
+ blank line, therefore, is allowed and is a body part for which all
+ default values are to be assumed. In such a case, the absence of a
+ Content-Type header usually indicates that the corresponding body has
+ a content-type of "text/plain; charset=US-ASCII".
+
+ The only header fields that have defined meaning for body parts are
+ those the names of which begin with "Content-". All other header
+ fields may be ignored in body parts. Although they should generally
+ be retained if at all possible, they may be discarded by gateways if
+ necessary. Such other fields are permitted to appear in body parts
+ but must not be depended on. "X-" fields may be created for
+ experimental or private purposes, with the recognition that the
+ information they contain may be lost at some gateways.
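+
+ A gateway that must shed header fields could keep only the fields
+ with defined meaning in body parts, as in this Python sketch. The
+ name content_fields is hypothetical, and dropping the other fields is
+ shown only as the permitted fallback, not the preferred behavior.
+
```python
from email import message_from_bytes
from typing import List, Tuple


def content_fields(part: bytes) -> List[Tuple[str, str]]:
    """Return only the Content-* header fields of a body part."""
    msg = message_from_bytes(part)
    return [(name, value) for name, value in msg.items()
            if name.lower().startswith("content-")]
```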
+
+ NOTE: The distinction between an RFC 822 message and a body part is
+ subtle, but important. A gateway between Internet and X.400 mail,
+ for example, must be able to tell the difference between a body part
+ that contains an image and a body part that contains an encapsulated
+ message, the body of which is a JPEG image. In order to represent
+ the latter, the body part must have "Content-Type: message/rfc822",
+ and its body (after the blank line) must be the encapsulated message,
+ with its own "Content-Type: image/jpeg" header field. The use of
+ similar syntax facilitates the conversion of messages to body parts,
+ and vice versa, but the distinction between the two must be
+ understood by implementors. (For the special case in which parts
+ actually are messages, a "digest" subtype is also defined.)
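+
+ The distinction can be shown with Python's email.mime classes: one
+ body part is itself an image, while the other is a message/rfc822
+ part whose encapsulated message carries its own image/jpeg type. The
+ placeholder octets are hypothetical and are not a real JPEG image.
+
```python
from email.mime.image import MIMEImage
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart

# Placeholder octets standing in for real JPEG data (hypothetical).
jpeg_octets = b"\xff\xd8\xff\xe0placeholder"

# A body part that IS an image.
image_part = MIMEImage(jpeg_octets, _subtype="jpeg")

# A body part that IS a message, whose own body happens to be an image.
inner_message = MIMEImage(jpeg_octets, _subtype="jpeg")
inner_message["Subject"] = "Forwarded photo"
message_part = MIMEMessage(inner_message)  # Content-Type: message/rfc822

container = MIMEMultipart("mixed")
container.attach(image_part)
container.attach(message_part)

for part in container.get_payload():
    print(part.get_content_type())  # image/jpeg, then message/rfc822
```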
+
+ As stated previously, each body part is preceded by a boundary
+ delimiter line that contains the boundary delimiter. The boundary
+ delimiter MUST NOT appear inside any of the encapsulated parts, on a
+ line by itself or as the prefix of any line. This implies that it is
+ crucial that the composing agent be able to choose and specify a
+ unique boundary parameter value that does not contain the boundary
+ parameter value of an enclosing multipart as a prefix.
+
+ All present and future subtypes of the "multipart" type must use an
+ identical syntax. Subtypes may differ in their semantics, and may
+ impose additional restrictions on syntax, but must conform to the
+ required syntax for the "multipart" type. This requirement ensures
+ that all conformant user agents will at least be able to recognize
+ and separate the parts of any multipart entity, even those of an
+ unrecognized subtype.
+
+ As stated in the definition of the Content-Transfer-Encoding field
+ [RFC 2045], no encoding other than "7bit", "8bit", or "binary" is
+ permitted for entities of type "multipart". The "multipart" boundary
+ delimiters and header fields are always represented as 7bit US-ASCII
+ in any case (though the header fields may encode non-US-ASCII header
+ text as per RFC 2047) and data within the body parts can be encoded
+ on a part-by-part basis, with Content-Transfer-Encoding fields for
+ each appropriate body part.
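+
+ A receiver or gateway might check this restriction with the Python
+ sketch below. The function name is hypothetical; treating an absent
+ Content-Transfer-Encoding field as "7bit" follows the RFC 2045
+ default.
+
```python
from email import message_from_string

ALLOWED_MULTIPART_CTE = {"7bit", "8bit", "binary"}


def multipart_cte_ok(header_block: str) -> bool:
    """True unless a multipart entity declares a disallowed transfer encoding."""
    msg = message_from_string(header_block)
    if msg.get_content_maintype() != "multipart":
        return True  # the restriction only applies to multipart entities
    cte = (msg.get("Content-Transfer-Encoding") or "7bit").strip().lower()
    return cte in ALLOWED_MULTIPART_CTE
```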
+
+5.1.1. Common Syntax
+
+ This section defines a common syntax for subtypes of "multipart".
+ All subtypes of "multipart" must use this syntax. A simple example
+ of a multipart message also appears in this section. An example of a
+ more complex multipart message is given in RFC 2049.
+
+ The Content-Type field for multipart entities requires one parameter,
+ "boundary". The boundary delimiter line is then defined as a line
+ consisting entirely of two hyphen characters ("-", decimal value 45)
+ followed by the boundary parameter value from the Content-Type header
+ field, optional linear whitespace, and a terminating CRLF.
+
+ NOTE: The hyphens are for rough compatibility with the earlier RFC
+ 934 method of message encapsulation, and for ease of searching for
+ the boundaries in some implementations. However, it should be noted
+ that multipart messages are NOT completely compatible with RFC 934
+ encapsulations; in particular, they do not obey RFC 934 quoting
+ conventions for embedded lines that begin with hyphens. This
+ mechanism was chosen over the RFC 934 mechanism because the latter
+ causes lines to grow with each level of quoting. The combination of
+ this growth with the fact that SMTP implementations sometimes wrap
+ long lines made the RFC 934 mechanism unsuitable for use in the event
+ that deeply-nested multipart structuring is ever desired.
+
+ WARNING TO IMPLEMENTORS: The grammar for parameters on the Content-
+ type field is such that it is often necessary to enclose the boundary
+ parameter values in quotes on the Content-type line. This is not
+ always necessary, but never hurts. Implementors should be sure to
+ study the grammar carefully in order to avoid producing invalid
+ Content-type fields. Thus, a typical "multipart" Content-Type header
+ field might look like this:
+
+ Content-Type: multipart/mixed; boundary=gc0p4Jq0M2Yt08j34c0p
+
+ But the following is not valid:
+
+ Content-Type: multipart/mixed; boundary=gc0pJq0M:08jU534c0p
+
+ (because of the colon) and must instead be represented as
+
+ Content-Type: multipart/mixed; boundary="gc0pJq0M:08jU534c0p"
+
+ This Content-Type value indicates that the content consists of one or
+ more parts, each with a structure that is syntactically identical to
+ an RFC 822 message, except that the header area is allowed to be
+ completely empty, and that the parts are each preceded by the line
+
+ --gc0pJq0M:08jU534c0p
+
+ The boundary delimiter MUST occur at the beginning of a line, i.e.,
+ following a CRLF, and the initial CRLF is considered to be attached
+ to the boundary delimiter line rather than part of the preceding
+ part. The boundary may be followed by zero or more characters of
+ linear whitespace. It is then terminated by either another CRLF and
+ the header fields for the next part, or by two CRLFs, in which case
+ there are no header fields for the next part. If no Content-Type
+ field is present it is assumed to be "message/rfc822" in a
+ "multipart/digest" and "text/plain" otherwise.
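+
+ The default-type rule for body parts can be sketched with Python's
+ email package, whose Message.set_default_type() exists for exactly
+ this digest case. The name part_content_type is hypothetical.
+
```python
import email


def part_content_type(part: bytes, parent_subtype: str) -> str:
    """Effective media type of one body part of a multipart entity."""
    msg = email.message_from_bytes(part)
    if parent_subtype.lower() == "digest":
        # Inside multipart/digest the default becomes message/rfc822.
        msg.set_default_type("message/rfc822")
    # get_content_type() lower-cases the value and falls back to the
    # default type when the part has no Content-Type field.
    return msg.get_content_type()
```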
+
+ NOTE: The CRLF preceding the boundary delimiter line is conceptually
+ attached to the boundary so that it is possible to have a part that
+ does not end with a CRLF (line break). Body parts that must be
+ considered to end with line breaks, therefore, must have two CRLFs
+ preceding the boundary delimiter line, the first of which is part of
+ the preceding body part, and the second of which is part of the
+ encapsulation boundary.
+
+ Boundary delimiters must not appear within the encapsulated material,
+ and must be no longer than 70 characters, not counting the two
+ leading hyphens.
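+
+ A composing agent can combine the length limit with the quoting rule
+ from the warning above. In the Python sketch below, the allowed
+ characters are taken from the boundary grammar of this document
+ (bchars) and the characters that force quoting are the RFC 2045
+ tspecials plus space. The function name boundary_param is
+ hypothetical.
+
```python
import string

# bcharsnospace from the multipart boundary grammar; space is also allowed
# except as the last character.
BCHARS_NOSPACE = set(string.ascii_letters + string.digits + "'()+_,-./:=?")
# RFC 2045 tspecials: any of these (or a space) requires a quoted-string.
TSPECIALS = set('()<>@,;:\\"/[]?=')


def boundary_param(boundary: str) -> str:
    """Render a boundary parameter, quoting it only when the grammar requires."""
    if not 1 <= len(boundary) <= 70:
        raise ValueError("boundary must be 1-70 characters")
    if boundary.endswith(" ") or any(
            c not in BCHARS_NOSPACE and c != " " for c in boundary):
        raise ValueError("boundary contains characters outside bchars")
    if any(c in TSPECIALS or c == " " for c in boundary):
        return f'boundary="{boundary}"'
    return f"boundary={boundary}"
```
+
+ Always quoting would also be valid, since quoting "never hurts"; the
+ sketch quotes only when needed to match the examples above.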
+
+ The boundary delimiter line following the last body part is a
+ distinguished delimiter that indicates that no further body parts
+ will follow. Such a delimiter line is identical to the previous
+ delimiter lines, with the addition of two more hyphens after the
+ boundary parameter value.
+
+ --gc0pJq0M:08jU534c0p--
+
+ NOTE TO IMPLEMENTORS: Boundary string comparisons must compare the
+ boundary value with the beginning of each candidate line. An exact
+ match of the entire candidate line is not required; it is sufficient
+ that the boundary appear in its entirety following the CRLF.
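+
+ The prefix comparison can be sketched in Python as below. The name
+ classify_line is hypothetical. The close delimiter is tested first
+ because it also begins with the ordinary delimiter. Note that a line
+ such as "--simple boundaryXYZ" is also classified as a delimiter,
+ which is why the boundary must never begin a line inside a part.
+
```python
def classify_line(line: bytes, boundary: bytes) -> str:
    """Classify one line (CRLF already removed) of a multipart body."""
    dash_boundary = b"--" + boundary
    # Test the close delimiter first: it also starts with dash_boundary.
    if line.startswith(dash_boundary + b"--"):
        return "close-delimiter"
    if line.startswith(dash_boundary):
        return "delimiter"  # whatever follows the boundary is not compared
    return "content"
```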
+
+ There appears to be room for additional information prior to the
+ first boundary delimiter line and following the final boundary
+ delimiter line. These areas should generally be left blank, and
+ implementations must ignore anything that appears before the first
+ boundary delimiter line or after the last one.
+
+ NOTE: These "preamble" and "epilogue" areas are generally not used
+ because of the lack of proper typing of these parts and the lack of
+ clear semantics for handling these areas at gateways, particularly
+ X.400 gateways. However, rather than leaving the preamble area
+ blank, many MIME implementations have found this to be a convenient
+ place to insert an explanatory note for recipients who read the
+ message with pre-MIME software, since such notes will be ignored by
+ MIME-compliant software.
+
+ NOTE: Because boundary delimiters must not appear in the body parts
+ being encapsulated, a user agent must exercise care to choose a
+ unique boundary parameter value. The boundary parameter value in the
+ example above could have been the result of an algorithm designed to
+ produce boundary delimiters with a very low probability of already
+ existing in the data to be encapsulated without having to prescan the
+ data. Alternate algorithms might result in more "readable" boundary
+ delimiters for a recipient with an old user agent, but would require
+ more attention to the possibility that the boundary delimiter might
+ appear at the beginning of some line in the encapsulated part. The
+ simplest boundary delimiter line possible is something like "---",
+ with a closing boundary delimiter line of "-----".
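
One way to follow this advice without prescanning is to draw the boundary at random from the boundary character set. The sketch below assumes Python's `secrets` module as the randomness source. The name `make_boundary` and its optional `avoid` argument are hypothetical. The `avoid` check is an extra guard for callers who happen to have the data at hand, not something the text requires.

```python
import secrets
import string

# bcharsnospace from the boundary grammar: digits, letters, and '()+_,-./:=?
BCHARS_NOSPACE = string.digits + string.ascii_letters + "'()+_,-./:=?"


def make_boundary(length: int = 40, avoid: str = "") -> str:
    """Hypothetical helper: return a random boundary of 1 to 70 characters.

    No space characters are drawn, so the value never ends with white space.
    If `avoid` is given, any candidate whose delimiter ("--" + value) occurs
    in it is rejected and redrawn.
    """
    if not 1 <= length <= 70:
        raise ValueError("a boundary must be 1 to 70 characters long")
    while True:
        candidate = "".join(secrets.choice(BCHARS_NOSPACE) for _ in range(length))
        if "--" + candidate not in avoid:
            return candidate
```

Several of these characters (for example ":" and "?") are tspecials in RFC 2045, so the value should be written as a quoted string, as in `boundary="..."`.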
+
+ As a very simple example, the following multipart message has two
+ parts, both of them plain text, one of them explicitly typed and one
+ of them implicitly typed:
+
+ From: Nathaniel Borenstein <nsb@bellcore.com>
+ To: Ned Freed <ned@innosoft.com>
+ Date: Sun, 21 Mar 1993 23:56:48 -0800 (PST)
+ Subject: Sample message
+ MIME-Version: 1.0
+ Content-type: multipart/mixed; boundary="simple boundary"
+
+ This is the preamble. It is to be ignored, though it
+ is a handy place for composition agents to include an
+ explanatory note to non-MIME conformant readers.
+
+ --simple boundary
+
+ This is implicitly typed plain US-ASCII text.
+ It does NOT end with a linebreak.
+ --simple boundary
+ Content-type: text/plain; charset=us-ascii
+
+ This is explicitly typed plain US-ASCII text.
+ It DOES end with a linebreak.
+
+ --simple boundary--
+
+ This is the epilogue. It is also to be ignored.
+
+
+
+
+
+
+Freed & Borenstein Standards Track [Page 21]
+\f
+RFC 2046 Media Types November 1996
+
+
+ The use of a media type of "multipart" in a body part within another
+ "multipart" entity is explicitly allowed. In such cases, for obvious
+ reasons, care must be taken to ensure that each nested "multipart"
+ entity uses a different boundary delimiter. See RFC 2049 for an
+ example of nested "multipart" entities.
+
+ The use of the "multipart" media type with only a single body part
+ may be useful in certain contexts, and is explicitly permitted.
+
+ NOTE: Experience has shown that a "multipart" media type with a
+ single body part is useful for sending non-text media types. It has
+ the advantage of providing the preamble as a place to include
+ decoding instructions. In addition, a number of SMTP gateways move
+ or remove the MIME headers, and a clever MIME decoder can take a good
+ guess at multipart boundaries even in the absence of the Content-Type
+ header and thereby successfully decode the message.
+
+ The only mandatory global parameter for the "multipart" media type is
+ the boundary parameter, which consists of 1 to 70 characters from a
+ set of characters known to be very robust through mail gateways, and
+ NOT ending with white space. (If a boundary delimiter line appears to
+ end with white space, the white space must be presumed to have been
+ added by a gateway, and must be deleted.) It is formally specified
+ by the following BNF:
+
+ boundary := 0*69<bchars> bcharsnospace
+
+ bchars := bcharsnospace / " "
+
+ bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" /
+                  "+" / "_" / "," / "-" / "." /
+                  "/" / ":" / "=" / "?"
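
The grammar above translates directly into a regular expression. The following is a minimal Python sketch with the hypothetical helper name `is_valid_boundary`. The pattern allows up to 69 characters that may include spaces, followed by one final character that may not be a space.

```python
import re

# bcharsnospace: DIGIT / ALPHA / "'" / "(" / ")" / "+" / "_" / "," / "-" /
# "." / "/" / ":" / "=" / "?"   (bchars additionally allows a space)
_NOSPACE = r"[0-9A-Za-z'()+_,\-./:=?]"
_BCHARS = r"[0-9A-Za-z'()+_,\-./:=? ]"
BOUNDARY_RE = re.compile(_BCHARS + r"{0,69}" + _NOSPACE)


def is_valid_boundary(value: str) -> bool:
    """Hypothetical helper: boundary := 0*69<bchars> bcharsnospace."""
    return BOUNDARY_RE.fullmatch(value) is not None
```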
+
+ Overall, the body of a "multipart" entity may be specified as
+ follows:
+
+ dash-boundary := "--" boundary
+                  ; boundary taken from the value of
+                  ; boundary parameter of the
+                  ; Content-Type field.
+
+ multipart-body := [preamble CRLF]
+                   dash-boundary transport-padding CRLF
+                   body-part *encapsulation
+                   close-delimiter transport-padding
+                   [CRLF epilogue]
+
+ transport-padding := *LWSP-char
+                      ; Composers MUST NOT generate
+                      ; non-zero length transport
+                      ; padding, but receivers MUST
+                      ; be able to handle padding
+                      ; added by message transports.
+
+ encapsulation := delimiter transport-padding
+                  CRLF body-part
+
+ delimiter := CRLF dash-boundary
+
+ close-delimiter := delimiter "--"
+
+ preamble := discard-text
+
+ epilogue := discard-text
+
+ discard-text := *(*text CRLF) *text
+                 ; May be ignored or discarded.
+
+ body-part := MIME-part-headers [CRLF *OCTET]
+              ; Lines in a body-part must not start
+              ; with the specified dash-boundary and
+              ; the delimiter must not appear anywhere
+              ; in the body part. Note that the
+              ; semantics of a body-part differ from
+              ; the semantics of a message, as
+              ; described in the text.
+
+ OCTET := <any 0-255 octet value>
+
+ IMPORTANT: The free insertion of linear-white-space and RFC 822
+ comments between the elements shown in this BNF is NOT allowed since
+ this BNF does not specify a structured header field.
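
To make the grammar concrete, here is a minimal Python sketch that splits a multipart body into its preamble, body parts, and epilogue. It assumes the body is already available as a `str` with CRLF line endings. It uses the prefix comparison from the implementors' note earlier in this section and does not parse part headers. The name `split_multipart` is hypothetical. Note that the CRLF in front of each delimiter is treated as part of the delimiter, not as part of the preceding body part.

```python
from typing import List, Optional, Tuple


def split_multipart(body: str, boundary: str) -> Tuple[str, List[str], str]:
    """Hypothetical helper: split a multipart body into (preamble, parts, epilogue).

    Each part is returned as MIME-part-headers [CRLF *OCTET]. The CRLF that
    precedes a delimiter belongs to the delimiter, so it is not kept at the
    end of the preceding part.
    """
    dash_boundary = "--" + boundary
    preamble: List[str] = []
    epilogue: List[str] = []
    parts: List[str] = []
    current: Optional[List[str]] = None  # lines of the part being collected
    closed = False

    for line in body.split("\r\n"):
        if not closed and line.startswith(dash_boundary):
            if current is not None:
                parts.append("\r\n".join(current))
            # Anything after the boundary other than "--" is transport padding.
            if line[len(dash_boundary):].startswith("--"):
                closed, current = True, None
            else:
                current = []
        elif closed:
            epilogue.append(line)
        elif current is None:
            preamble.append(line)
        else:
            current.append(line)

    if current is not None:
        # No close-delimiter was seen (e.g. a truncated message); keep what we have.
        parts.append("\r\n".join(current))
    return "\r\n".join(preamble), parts, "\r\n".join(epilogue)
```

Applied to the "simple boundary" example above, the first part comes back as a CRLF (empty headers) followed by text without a final line break, and the second part keeps its trailing CRLF.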
+
+ NOTE: In certain transport enclaves, RFC 822 restrictions such as
+ the one that limits bodies to printable US-ASCII characters may not
+ be in force. (That is, transport domains may exist that resemble
+ standard Internet mail transport as specified in RFC 821 and assumed
+ by RFC 822, but without certain restrictions.) The relaxation of
+ these restrictions should be construed as locally extending the
+ definition of bodies, for example to include octets outside of the
+ US-ASCII range, as long as these extensions are supported by the
+ transport and adequately documented in the Content-Transfer-Encoding
+ header field. However, in no event are headers (either message
+ headers or body part headers) allowed to contain anything other than
+ US-ASCII characters.
+
+ NOTE: Conspicuously missing from the "multipart" type is a notion of
+ structured, related body parts. It is recommended that those wishing
+ to provide more structured or integrated multipart messaging
+ facilities should define subtypes of multipart that are syntactically
+ identical but define relationships between the various parts. For
+ example, subtypes of multipart could be defined that include a
+ distinguished part which in turn is used to specify the relationships
+ between the other parts, probably referring to them by their
+ Content-ID field. Old implementations will not recognize the new
+ subtype if this approach is used, but will treat it as
+ multipart/mixed and will thus be able to show the user the parts that
+ are recognized.
+
+5.1.2. Handling Nested Messages and Multiparts
+
+ The "message/rfc822" subtype defined in a subsequent section of this
+ document has no terminating condition other than running out of data.
+ Similarly, an improperly truncated "multipart" entity may not have
+ any terminating boundary marker, and can turn up operationally due to
+ mail system malfunctions.
+
+ It is essential that such entities be handled correctly when they are
+ themselves embedded inside another "multipart" structure. MIME
+ implementations are therefore required to recognize outer level
+ boundary markers at ANY level of inner nesting. It is not sufficient
+ to only check for the next expected marker or other terminating
+ condition.
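
A minimal Python sketch of this requirement. It assumes a parser that keeps the boundaries of all currently open multipart entities on a stack, outermost first. Every line is checked against every open boundary, not just the innermost one. When an outer boundary matches, the inner entities it cuts off are discarded as unterminated. The names `match_boundary` and `handle_line`, and the choice to prefer the longest matching boundary when one boundary is a prefix of another, are assumptions of this sketch.

```python
from typing import List, Optional, Tuple


def match_boundary(line: str, stack: List[str]) -> Optional[int]:
    """Index in `stack` of the boundary that `line` starts with, or None.

    Assumption: if several boundaries match (one is a prefix of another),
    the longest one wins.
    """
    best: Optional[int] = None
    for depth, boundary in enumerate(stack):
        if line.startswith("--" + boundary):
            if best is None or len(boundary) > len(stack[best]):
                best = depth
    return best


def handle_line(line: str, stack: List[str]) -> Tuple[str, Optional[int], int]:
    """Hypothetical helper: classify `line` and update `stack` in place.

    Returns (kind, depth, dropped). kind is "content", "delimiter" or "close";
    depth is the nesting level matched; dropped counts inner entities that
    ended without their own close-delimiter.
    """
    depth = match_boundary(line, stack)
    if depth is None:
        return "content", None, 0
    dropped = len(stack) - 1 - depth
    del stack[depth + 1:]  # inner levels are implicitly terminated
    if line[len("--" + stack[depth]):].startswith("--"):
        stack.pop()  # this level is now closed as well
        return "close", depth, dropped
    return "delimiter", depth, dropped
```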
+
+5.1.3. Mixed Subtype
+
+ The "mixed" subtype of "multipart" is intended for use when the body
+ parts are independent and need to be bundled in a particular order.
+ Any "multipart" subtypes that an implementation does not recognize
+ must be treated as being of subtype "mixed".
+
+5.1.4. Alternative Subtype
+
+ The "multipart/alternative" type is syntactically identical to
+ "multipart/mixed", but the semantics are different. In particular,
+ each of the body parts is an "alternative" version of the same
+ information.
+
+ Systems should recognize that the contents of the various parts are
+ interchangeable. Systems should choose the "best" type based on the
+ local environment and preferences, in some cases even through user
+ interaction. As with "multipart/mixed", the order of body parts is
+ significant. In this case, the alternatives appear in an order of
+ increasing faithfulness to the original content. In general, the
+ best choice is the LAST part of a type supported by the recipient
+ system's local environment.
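
The selection rule can be sketched in a few lines of Python. This sketch assumes the caller has already extracted each part's media type (without parameters) and knows which types it can display. The name `pick_alternative` is a hypothetical helper. Media type names are compared case-insensitively.

```python
from typing import Iterable, List, Optional


def pick_alternative(part_types: List[str], supported: Iterable[str]) -> Optional[int]:
    """Hypothetical helper: index of the last displayable part, else None."""
    displayable = {t.lower() for t in supported}
    # Parts are ordered from plainest to richest, so scan from the end.
    for index in range(len(part_types) - 1, -1, -1):
        if part_types[index].lower() in displayable:
            return index
    return None
```

With the three part types of the "boundary42" example that follows, a reader that understands only "text/plain" and "text/enriched" would pick index 1.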
+
+ "Multipart/alternative" may be used, for example, to send a message
+ in a fancy text format in such a way that it can easily be displayed
+ anywhere:
+
+ From: Nathaniel Borenstein <nsb@bellcore.com>
+ To: Ned Freed <ned@innosoft.com>
+ Date: Mon, 22 Mar 1993 09:41:09 -0800 (PST)
+ Subject: Formatted text mail
+ MIME-Version: 1.0
+ Content-Type: multipart/alternative; boundary=boundary42
+
+ --boundary42
+ Content-Type: text/plain; charset=us-ascii
+
+ ... plain text version of message goes here ...
+
+ --boundary42
+ Content-Type: text/enriched
+
+ ... RFC 1896 text/enriched version of same message
+ goes here ...
+
+ --boundary42
+ Content-Type: application/x-whatever
+
+ ... fanciest version of same message goes here ...
+
+ --boundary42--
+
+ In this example, users whose mail systems understood the
+ "application/x-whatever" format would see only the fancy version,
+ while other users would see only the enriched or plain text version,
+ depending on the capabilities of their system.
+
+ In general, user agents that compose "multipart/alternative" entities
+ must place the body parts in increasing order of preference, that is,
+ with the preferred format last. For fancy text, the sending user
+ agent should put the plainest format first and the richest format
+ last. Receiving user agents should pick and display the last format
+ they are capable of displaying. In the case where one of the
+ alternatives is itself of type "multipart" and contains unrecognized
+ sub-parts, the user agent may choose either to show that alternative,
+ an earlier alternative, or both.
+
+ NOTE: From an implementor's perspective, it might seem more sensible
+ to reverse this ordering, and have the plainest alternative last.
+ However, placing the plainest alternative first is the friendliest
+ possible option when "multipart/alternative" entities are viewed
+ using a non-MIME-conformant viewer. While this approach does impose
+ some burden on conformant MIME viewers, interoperability with older
+ mail readers was deemed to be more important in this case.
+
+ It may be the case that some user agents, if they can recognize more
+ than one of the formats, will prefer to offer the user the choice of
+ which format to view. This makes sense, for example, if a message
+ includes both a nicely-formatted image version and an easily-edited
+ text version. What is most critical, however, is that the user not
+ automatically be shown multiple versions of the same data. Either
+ the user should be shown the last recognized version or should be
+ given the choice.
+
+ THE SEMANTICS OF CONTENT-ID IN MULTIPART/ALTERNATIVE: Each part of a
+ "multipart/alternative" entity represents the same data, but the
+ mappings between the two are not necessarily without information
+ loss. For example, information is lost when translating ODA to
+ PostScript or plain text. It is recommended that each part should
+ have a different Content-ID value in the case where the information
+ content of the two parts is not identical. And when the information
+ content is identical -- for example, where several parts of type
+ "message/external-body" specify alternate ways to access the
+ identical data -- the same Content-ID field value should be used, to
+ optimize any caching mechanisms that might be present on the
+ recipient's end. However, the Content-ID values used by the parts
+ should NOT be the same Content-ID value that describes the
+ "multipart/alternative" as a whole, if there is any such Content-ID
+ field. That is, one Content-ID value will refer to the
+ "multipart/alternative" entity, while one or more other Content-ID
+ values will refer to the parts inside it.
+
+5.1.5. Digest Subtype
+
+ This document defines a "digest" subtype of the "multipart" Content-
+ Type. This type is syntactically identical to "multipart/mixed", but
+ the semantics are different. In particular, in a digest, the default
+ Content-Type value for a body part is changed from "text/plain" to
+ "message/rfc822". This is done to allow a more readable digest
+ format that is largely compatible (except for the quoting convention)
+ with RFC 934.
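
Taken together with the rule for unrecognized subtypes in the "mixed" section, the default type of an untyped body part depends on the enclosing multipart subtype. The following Python sketch assumes an implementation that knows only the subtypes defined in this document. The helper names are hypothetical.

```python
# Assumption: the implementation recognizes only the subtypes defined here.
KNOWN_MULTIPART_SUBTYPES = {"mixed", "alternative", "digest", "parallel"}


def effective_subtype(subtype: str) -> str:
    """Hypothetical helper: unrecognized multipart subtypes are treated as "mixed"."""
    s = subtype.lower()
    return s if s in KNOWN_MULTIPART_SUBTYPES else "mixed"


def default_part_type(parent_subtype: str) -> str:
    """Hypothetical helper: Content-Type assumed for a part that declares none."""
    if effective_subtype(parent_subtype) == "digest":
        return "message/rfc822"
    return "text/plain; charset=us-ascii"
```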
+
+ Note: Though it is possible to specify a Content-Type value for a
+ body part in a digest which is other than "message/rfc822", such as a
+ "text/plain" part containing a description of the material in the
+ digest, actually doing so is undesirable. The "multipart/digest"
+ Content-Type is intended to be used to send collections of messages.
+ If a "text/plain" part is needed, it should be included as a separate
+ part of a "multipart/mixed" message.
+
+ A digest in this format might, then, look something like this:
+
+ From: Moderator-Address
+ To: Recipient-List
+ Date: Mon, 22 Mar 1994 13:34:51 +0000
+ Subject: Internet Digest, volume 42
+ MIME-Version: 1.0
+ Content-Type: multipart/mixed;
+     boundary="---- main boundary ----"
+
+ ------ main boundary ----
+
+ ...Introductory text or table of contents...
+
+ ------ main boundary ----
+ Content-Type: multipart/digest;
+     boundary="---- next message ----"
+
+ ------ next message ----
+
+ From: someone-else
+ Date: Fri, 26 Mar 1993 11:13:32 +0200
+ Subject: my opinion
+
+ ...body goes here ...
+
+ ------ next message ----
+
+ From: someone-else-again
+ Date: Fri, 26 Mar 1993 10:07:13 -0500
+ Subject: my different opinion
+
+ ... another body goes here ...
+
+ ------ next message ------
+
+ ------ main boundary ------
+
+5.1.6. Parallel Subtype
+
+ This document defines a "parallel" subtype of the "multipart"
+ Content-Type. This type is syntactically identical to
+ "multipart/mixed", but the semantics are different. In particular,
+ in a parallel entity, the order of body parts is not significant.
+
+ A common presentation of this type is to display all of the parts
+ simultaneously on hardware and software that are capable of doing so.
+ However, composing agents should be aware that many mail readers will
+ lack this capability and will show the parts serially in any event.
+
+5.1.7. Other Multipart Subtypes
+
+ Other "multipart" subtypes are expected in the future. MIME
+ implementations must in general treat unrecognized subtypes of
+ "multipart" as being equivalent to "multipart/mixed".
+
+5.2. Message Media Type
+
+ It is frequently desirable, in sending mail, to encapsulate another
+ mail message. A special media type, "message", is defined to
+ facilitate this. In particular, the "rfc822" subtype of "message" is
+ used to encapsulate RFC 822 messages.
+
+ NOTE: It has been suggested that subtypes of "message" might be
+ defined for forwarded or rejected messages. However, forwarded and
+ rejected messages can be handled as multipart messages in which the
+ first part contains any control or descriptive information, and a
+ second part, of type "message/rfc822", is the forwarded or rejected
+ message. Composing rejection and forwarding messages in this manner
+ will preserve the type information on the original message and allow
+ it to be correctly presented to the recipient, and hence is strongly
+ encouraged.
+
+ Subtypes of "message" often impose restrictions on what encodings are
+ allowed. These restrictions are described in conjunction with each
+ specific subtype.
+
+ Mail gateways, relays, and other mail handling agents are commonly
+ known to alter the top-level header of an RFC 822 message. In
+ particular, they frequently add, remove, or reorder header fields.
+ These operations are explicitly forbidden for the encapsulated
+ headers embedded in the bodies of messages of type "message."
+
+5.2.1. RFC822 Subtype
+
+ A media type of "message/rfc822" indicates that the body contains an
+ encapsulated message, with the syntax of an RFC 822 message.
+ However, unlike top-level RFC 822 messages, the restriction that each
+ "message/rfc822" body must include a "From", "Date", and at least one
+ destination header is removed and replaced with the requirement that
+ at least one of "From", "Subject", or "Date" must be present.
+
+ It should be noted that, despite the use of the numbers "822", a
+ "message/rfc822" entity isn't restricted to material in strict
+ conformance to RFC822, nor are the semantics of "message/rfc822"
+ objects restricted to the semantics defined in RFC822. More
+ specifically, a "message/rfc822" message could well be a News article
+ or a MIME message.
+
+ No encoding other than "7bit", "8bit", or "binary" is permitted for
+ the body of a "message/rfc822" entity. The message header fields are
+ always US-ASCII in any case, and data within the body can still be
+ encoded, in which case the Content-Transfer-Encoding header field in
+ the encapsulated message will reflect this. Non-US-ASCII text in the
+ headers of an encapsulated message can be specified using the
+ mechanisms described in RFC 2047.
+
+5.2.2. Partial Subtype
+
+ The "partial" subtype is defined to allow large entities to be
+ delivered as several separate pieces of mail and automatically
+ reassembled by a receiving user agent. (The concept is similar to IP
+ fragmentation and reassembly in the basic Internet Protocols.) This
+ mechanism can be used when intermediate transport agents limit the
+ size of individual messages that can be sent. The media type
+ "message/partial" thus indicates that the body contains a fragment of
+ a larger entity.
+
+ Because data of type "message" may never be encoded in base64 or
+ quoted-printable, a problem might arise if "message/partial" entities
+ are constructed in an environment that supports binary or 8bit
+ transport. The problem is that the binary data would be split into
+ multiple "message/partial" messages, each of them requiring binary
+ transport. If such messages were encountered at a gateway into a
+ 7bit transport environment, there would be no way to properly encode
+ them for the 7bit world, aside from waiting for all of the fragments,
+ reassembling the inner message, and then encoding the reassembled
+ data in base64 or quoted-printable. Since it is possible that
+ different fragments might go through different gateways, even this is
+ not an acceptable solution. For this reason, it is specified that
+ entities of type "message/partial" must always have a content-
+ transfer-encoding of 7bit (the default). In particular, even in
+ environments that support binary or 8bit transport, the use of a
+ content-transfer-encoding of "8bit" or "binary" is explicitly
+ prohibited for MIME entities of type "message/partial". This in turn
+ implies that the inner message must not use "8bit" or "binary"
+ encoding.
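
The encoding restrictions stated so far for "message/rfc822" and "message/partial" can be captured in a small lookup table. This Python sketch assumes that a missing Content-Transfer-Encoding means "7bit" (the default). The names are hypothetical, and other "message" subtypes would need their own entries.

```python
from typing import Dict, Optional, Set

# Content-Transfer-Encoding values permitted for the body of each subtype.
ALLOWED_ENCODINGS: Dict[str, Set[str]] = {
    "message/rfc822": {"7bit", "8bit", "binary"},
    "message/partial": {"7bit"},
}


def encoding_allowed(media_type: str, cte: Optional[str]) -> bool:
    """Hypothetical helper: check a Content-Transfer-Encoding against the table."""
    allowed = ALLOWED_ENCODINGS.get(media_type.strip().lower())
    if allowed is None:
        return True  # no restriction recorded in this sketch
    encoding = (cte or "7bit").strip().lower()  # absent means the 7bit default
    return encoding in allowed
```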
+
+ Because some message transfer agents may choose to automatically
+ fragment large messages, and because such agents may use very
+ different fragmentation thresholds, it is possible that the pieces of
+ a partial message, upon reassembly, may prove themselves to comprise
+ a partial message. This is explicitly permitted.
+
+ Three parameters must be specified in the Content-Type field of type
+ "message/partial": The first, "id", is a unique identifier, as close
+ to a world-unique identifier as possible, to be used to match the
+ fragments together. (In general, the identifier is essentially a
+ message-id; if placed in double quotes, it can be ANY message-id, in
+ accordance with the BNF for "parameter" given in RFC 2045.) The
+ second, "number", an integer, is the fragment number, which indicates
+ where this fragment fits into the sequence of fragments. The third,
+ "total", another integer, is the total number of fragments. This
+ third subfield is required on the final fragment, and is optional
+ (though encouraged) on the earlier fragments. Note also that these
+ parameters may be given in any order.
+
+ Thus, the second piece of a 3-piece message may have either of the
+ following header fields:
+
+ Content-Type: Message/Partial; number=2; total=3;
+     id="oc=jpbe0M2Yt4s@thumper.bellcore.com"
+
+ Content-Type: Message/Partial;
+     id="oc=jpbe0M2Yt4s@thumper.bellcore.com";
+     number=2
+
+ But the third piece MUST specify the total number of fragments:
+
+ Content-Type: Message/Partial; number=3; total=3;
+     id="oc=jpbe0M2Yt4s@thumper.bellcore.com"
+
+ Note that fragment numbering begins with 1, not 0.
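
The following sketch shows how a reassembling agent might read these parameters and decide whether it has every fragment. It uses Python's standard `email` package to parse the Content-Type field. The names `fragment_info` and `missing_fragments` are hypothetical helpers, and the sketch assumes fragments have already been grouped by their "id" value.

```python
import email
from typing import Dict, List, Optional, Tuple


def fragment_info(raw_headers: str) -> Tuple[str, int, Optional[int]]:
    """Hypothetical helper: (id, number, total) from a message/partial header."""
    msg = email.message_from_string(raw_headers)
    if msg.get_content_type() != "message/partial":
        raise ValueError("not a message/partial entity")
    frag_id = msg.get_param("id")
    number = msg.get_param("number")
    if frag_id is None or number is None:
        raise ValueError("message/partial requires id and number")
    total = msg.get_param("total")  # mandatory only on the final fragment
    return str(frag_id), int(number), int(total) if total is not None else None


def missing_fragments(seen: Dict[int, Optional[int]]) -> Optional[List[int]]:
    """Given {number: total-or-None} for one id, list the missing numbers.

    Returns None while no fragment has reported the total yet.
    """
    totals = {t for t in seen.values() if t is not None}
    if not totals:
        return None
    if len(totals) > 1:
        raise ValueError(f"fragments disagree about total: {sorted(totals)}")
    total = totals.pop()
    # Fragment numbering starts at 1, not 0.
    return [n for n in range(1, total + 1) if n not in seen]
```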
+
+ When the fragments of an entity broken up in this manner are put
+ together, the result is always a complete MIME entity, which may have
+ its own Content-Type header field, and thus may contain any other
+ data type.
+
+5.2.2.1. Message Fragmentation and Reassembly
+
+ The semantics of a reassembled partial message must be those of the
+ "inner" message, rather than of a message containing the inner
+ message. This makes it possible, for example, to send a large audio
+ message as several partial messages, and still have it appear to the
+ recipient as a simple audio message rather than as an encapsulated
+ message containing an audio message. That is, the encapsulation of
+ the message is considered to be "transparent".
+
+ When generating and reassembling the pieces of a "message/partial"
+ message, the headers of the encapsulated message must be merged with
+ the headers of the enclosing entities. In this process the following
+ rules must be observed:
+
+ (1) Fragmentation agents must split messages at line
+ boundaries only. This restriction is imposed because
+ splits at points other than the ends of lines in turn
+ depend on message transports being able to preserve
+ the semantics of messages that don't end with a CRLF
+ sequence. Many transports are incapable of preserving
+ such semantics.
+
+ (2) All of the header fields from the initial enclosing
+ message, except those that start with "Content-" and
+ the specific header fields "Subject", "Message-ID",
+ "Encrypted", and "MIME-Version", must be copied, in
+ order, to the new message.
+
+ (3) The header fields in the enclosed message which start
+ with "Content-", plus the "Subject", "Message-ID",
+ "Encrypted", and "MIME-Version" fields, must be
+ appended, in order, to the header fields of the new
+ message. Any header fields in the enclosed message
+ which do not start with "Content-" (except for the
+ "Subject", "Message-ID", "Encrypted", and "MIME-
+ Version" fields) will be ignored and dropped.
+
+ (4) All of the header fields from the second and any
+ subsequent enclosing messages are discarded by the
+ reassembly process.
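
Rules (2) through (4) can be sketched in Python as follows. The sketch models header fields as (name, value) pairs and matches names case-insensitively. The names `merge_headers` and `is_inner_field` are hypothetical. Applied to the audio example in the next section, it yields the same fields as the reassembled message shown there. "Message-ID" comes ahead of "Subject" in the output because rule (3) keeps the enclosed message's order.

```python
from typing import List, Tuple

Field = Tuple[str, str]  # (name, value) of one header field

# Besides Content-*, these fields are taken from the enclosed message.
INNER_FIELDS = {"subject", "message-id", "encrypted", "mime-version"}


def is_inner_field(name: str) -> bool:
    lower = name.lower()
    return lower.startswith("content-") or lower in INNER_FIELDS


def merge_headers(first_outer: List[Field], enclosed: List[Field]) -> List[Field]:
    """Hypothetical helper: reassembled header from fragment 1 and the inner message."""
    # Rule (2): copy the first enclosing message's fields, in order,
    # except Content-* and the four fields listed above.
    merged = [f for f in first_outer if not is_inner_field(f[0])]
    # Rule (3): append the enclosed message's Content-* and listed fields,
    # in order; every other enclosed field is dropped.
    merged += [f for f in enclosed if is_inner_field(f[0])]
    # Rule (4): headers of fragments 2..n are never consulted, so they are
    # not parameters of this function at all.
    return merged
```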
+
+5.2.2.2. Fragmentation and Reassembly Example
+
+ If an audio message is broken into two pieces, the first piece might
+ look something like this:
+
+ X-Weird-Header-1: Foo
+ From: Bill@host.com
+ To: joe@otherhost.com
+ Date: Fri, 26 Mar 1993 12:59:38 -0500 (EST)
+ Subject: Audio mail (part 1 of 2)
+ Message-ID: <id1@host.com>
+ MIME-Version: 1.0
+ Content-type: message/partial; id="ABC@host.com";
+     number=1; total=2
+
+ X-Weird-Header-1: Bar
+ X-Weird-Header-2: Hello
+ Message-ID: <anotherid@foo.com>
+ Subject: Audio mail
+ MIME-Version: 1.0
+ Content-type: audio/basic
+ Content-transfer-encoding: base64
+
+ ... first half of encoded audio data goes here ...
+
+ and the second half might look something like this:
+
+ From: Bill@host.com
+ To: joe@otherhost.com
+ Date: Fri, 26 Mar 1993 12:59:38 -0500 (EST)
+ Subject: Audio mail (part 2 of 2)
+ MIME-Version: 1.0
+ Message-ID: <id2@host.com>
+ Content-type: message/partial;
+     id="ABC@host.com"; number=2; total=2
+
+ ... second half of encoded audio data goes here ...
+
+ Then, when the fragmented message is reassembled, the resulting
+ message to be displayed to the user should look something like this:
+
+ X-Weird-Header-1: Foo
+ From: Bill@host.com
+ To: joe@otherhost.com
+ Date: Fri, 26 Mar 1993 12:59:38 -0500 (EST)
+ Subject: Audio mail
+ Message-ID: <anotherid@foo.com>
+ MIME-Version: 1.0
+ Content-type: audio/basic
+ Content-transfer-encoding: base64
+
+ ... first half of encoded audio data goes here ...
+ ... second half of encoded audio data goes here ...
+
+ The inclusion of a "References" field in the headers of the second
+ and subsequent pieces of a fragmented message that references the
+ Message-Id on the previous piece may be of benefit to mail readers
+ that understand and track references. However, the generation of
+ such "References" fields is entirely optional.
+
+ Finally, it should be noted that the "Encrypted" header field has
+ been made obsolete by Privacy Enhanced Messaging (PEM) [RFC-1421,
+ RFC-1422, RFC-1423, RFC-1424], but the rules above are nevertheless
+ believed to describe the correct way to treat it if it is encountered
+ in the context of conversion to and from "message/partial" fragments.
+
+5.2.3. External-Body Subtype
+
+ The external-body subtype indicates that the actual body data are not
+ included, but merely referenced. In this case, the parameters
+ describe a mechanism for accessing the external data.
+
+ When a MIME entity is of type "message/external-body", it consists of
+ a header, two consecutive CRLFs, and the message header for the
+ encapsulated message. If another pair of consecutive CRLFs appears,
+ this of course ends the message header for the encapsulated message.
+ However, since the encapsulated message's body is itself external, it
+ does NOT appear in the area that follows. For example, consider the
+ following message:
+
+ Content-type: message/external-body;
+     access-type=local-file;
+     name="/u/nsb/Me.jpeg"
+
+ Content-type: image/jpeg
+ Content-ID: <id42@guppylake.bellcore.com>
+ Content-Transfer-Encoding: binary
+
+ THIS IS NOT REALLY THE BODY!
+
+ The area at the end, which might be called the "phantom body", is
+ ignored for most external-body messages. However, it may be used to
+ contain auxiliary information for some such messages, as indeed it is
+ when the access-type is "mail-server". The only access-type defined
+ in this document that uses the phantom body is "mail-server", but
+ other access-types may be defined in the future in other
+ specifications that use this area.
+
+ The encapsulated headers in ALL "message/external-body" entities MUST
+ include a Content-ID header field to give a unique identifier by
+ which to reference the data. This identifier may be used for caching
+ mechanisms, and for recognizing the receipt of the data when the
+ access-type is "mail-server".
+
+ Note that, as specified here, the tokens that describe external-body
+ data, such as file names and mail server commands, are required to be
+ in the US-ASCII character set.
+
+ If this proves problematic in practice, a new mechanism may be
+ required as a future extension to MIME, either as newly defined
+ access-types for "message/external-body" or by some other mechanism.
+
+ As with "message/partial", MIME entities of type "message/external-
+ body" MUST have a content-transfer-encoding of 7bit (the default).
+ In particular, even in environments that support binary or 8bit
+ transport, the use of a content-transfer-encoding of "8bit" or
+ "binary" is explicitly prohibited for entities of type
+ "message/external-body".
+
+5.2.3.1. General External-Body Parameters
+
+ The parameters that may be used with any "message/external-body"
+ are:
+
+ (1) ACCESS-TYPE -- A word indicating the supported access
+ mechanism by which the file or data may be obtained.
+ This word is not case sensitive. Values include, but
+ are not limited to, "FTP", "ANON-FTP", "TFTP", "LOCAL-
+ FILE", and "MAIL-SERVER". Future values, except for
+ experimental values beginning with "X-", must be
+ registered with IANA, as described in RFC 2048.
+ This parameter is unconditionally mandatory and MUST be
+ present on EVERY "message/external-body".
+
+ (2) EXPIRATION -- The date (in the RFC 822 "date-time"
+ syntax, as extended by RFC 1123 to permit 4 digits in
+ the year field) after which the existence of the
+ external data is not guaranteed. This parameter may be
+ used with ANY access-type and is ALWAYS optional.
+
+ (3) SIZE -- The size (in octets) of the data. The intent
+ of this parameter is to help the recipient decide
+ whether or not to expend the necessary resources to
+ retrieve the external data. Note that this describes
+ the size of the data in its canonical form, that is,
+ before any Content-Transfer-Encoding has been applied
+ or after the data have been decoded. This parameter
+ may be used with ANY access-type and is ALWAYS
+ optional.
+
+ (4) PERMISSION -- A case-insensitive field that indicates
+ whether or not it is expected that clients might also
+ attempt to overwrite the data. By default, or if
+ permission is "read", the assumption is that they are
+ not, and that if the data is retrieved once, it is
+ never needed again. If PERMISSION is "read-write",
+ this assumption is invalid, and any local copy must be
+ considered no more than a cache. "Read" and "Read-
+ write" are the only defined values of permission. This
+ parameter may be used with ANY access-type and is
+ ALWAYS optional.
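
The following Python sketch reads the general parameters with the standard `email` package and applies the rules above. ACCESS-TYPE is mandatory and case-insensitive, PERMISSION defaults to "read", and SIZE is an octet count. The helper name and the returned dictionary layout are assumptions of the sketch. EXPIRATION is returned unparsed.

```python
import email
from typing import Dict, Union

PERMISSIONS = {"read", "read-write"}


def general_external_params(raw_headers: str) -> Dict[str, Union[str, int, None]]:
    """Hypothetical helper: validate the general message/external-body parameters."""
    msg = email.message_from_string(raw_headers)
    if msg.get_content_type() != "message/external-body":
        raise ValueError("not a message/external-body entity")
    access_type = msg.get_param("access-type")
    if access_type is None:
        raise ValueError("ACCESS-TYPE is mandatory on every message/external-body")
    permission = str(msg.get_param("permission") or "read").lower()
    if permission not in PERMISSIONS:
        raise ValueError(f"undefined PERMISSION value: {permission}")
    size = msg.get_param("size")
    return {
        "access-type": str(access_type).lower(),
        "expiration": msg.get_param("expiration"),  # left as an RFC 822 date-time string
        "size": int(size) if size is not None else None,
        "permission": permission,
    }
```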
+
+ The precise semantics of the access-types defined here are described
+ in the sections that follow.
+
+5.2.3.2. The 'ftp' and 'tftp' Access-Types
+
+ An access-type of FTP or TFTP indicates that the message body is
+ accessible as a file using the FTP [RFC-959] or TFTP [RFC-783]
+ protocols, respectively. For these access-types, the following
+ additional parameters are mandatory:
+
+ (1) NAME -- The name of the file that contains the actual
+ body data.
+
+ (2) SITE -- A machine from which the file may be obtained,
+ using the given protocol. This must be a fully
+ qualified domain name, not a nickname.
+
+ (3) Before any data are retrieved, using FTP, the user will
+ generally need to be asked to provide a login id and a
+ password for the machine named by the site parameter.
+ For security reasons, such an id and password are not
+ specified as content-type parameters, but must be
+ obtained from the user.
+
+ In addition, the following parameters are optional:
+
+ (1) DIRECTORY -- A directory from which the data named by
+ NAME should be retrieved.
+
+ (2) MODE -- A case-insensitive string indicating the mode
+ to be used when retrieving the information. The valid
+ values for access-type "TFTP" are "NETASCII", "OCTET",
+ and "MAIL", as specified by the TFTP protocol [RFC-
+ 783]. The valid values for access-type "FTP" are
+ "ASCII", "EBCDIC", "IMAGE", and "LOCALn" where "n" is a
+ decimal integer, typically 8. These correspond to the
+ representation types "A" "E" "I" and "L n" as specified
+ by the FTP protocol [RFC-959]. Note that "BINARY" and
+ "TENEX" are not valid values for MODE and that "OCTET"
+ or "IMAGE" or "LOCAL8" should be used instead. If MODE
+ is not specified, the default value is "NETASCII" for
+ TFTP and "ASCII" otherwise.
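
The MODE rules can be sketched as a small Python helper. It applies the default, checks the value against the access-type, and maps an FTP mode to its representation-type code. The function names are hypothetical. Treating "anon-ftp" like "ftp" anticipates the next section.

```python
import re
from typing import Optional

TFTP_MODES = {"netascii", "octet", "mail"}
FTP_CODES = {"ascii": "A", "ebcdic": "E", "image": "I"}


def effective_mode(access_type: str, mode: Optional[str]) -> str:
    """Hypothetical helper: apply the MODE default and reject invalid values."""
    access = access_type.lower()
    if access not in {"ftp", "anon-ftp", "tftp"}:
        raise ValueError(f"MODE does not apply to access-type {access_type!r}")
    if mode is None:
        return "netascii" if access == "tftp" else "ascii"
    value = mode.lower()
    if access == "tftp":
        valid = value in TFTP_MODES
    else:
        valid = value in FTP_CODES or re.fullmatch(r"local[0-9]+", value) is not None
    if not valid:
        raise ValueError(f"invalid MODE {mode!r} for access-type {access_type!r}")
    return value


def ftp_representation(mode: str) -> str:
    """Hypothetical helper: map an FTP MODE to "A", "E", "I" or "L n"."""
    value = mode.lower()
    if value in FTP_CODES:
        return FTP_CODES[value]
    match = re.fullmatch(r"local([0-9]+)", value)
    if match is None:
        raise ValueError(f"not an FTP MODE: {mode!r}")
    return "L " + match.group(1)
```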
+
+
+
+Freed & Borenstein Standards Track [Page 35]
+\f
+RFC 2046 Media Types November 1996
+
+
+5.2.3.3. The 'anon-ftp' Access-Type
+
+ The "anon-ftp" access-type is identical to the "ftp" access type,
+ except that the user need not be asked to provide a name and password
+ for the specified site. Instead, the ftp protocol will be used with
+ login "anonymous" and a password that corresponds to the user's mail
+ address.
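The following is a minimal sketch of anonymous retrieval using Python's standard `ftplib`. It assumes the Content-Type parameters have already been parsed into a dict with lower-case keys (`site`, `name`, and an optional `directory`). The `ftp_factory` argument is a hypothetical hook that lets the function run without a network. The sketch always transfers in binary and ignores MODE, which a fuller implementation would honor.

```python
import ftplib
from typing import Callable, Dict


def fetch_anon_ftp(params: Dict[str, str], user_address: str,
                   ftp_factory: Callable[[str], ftplib.FTP] = ftplib.FTP) -> bytes:
    """Sketch: retrieve an anon-ftp external body and return its octets."""
    ftp = ftp_factory(params["site"])  # ftplib.FTP(host) connects immediately
    try:
        # Log in as "anonymous" with the user's mail address as the password.
        ftp.login("anonymous", user_address)
        if "directory" in params:
            ftp.cwd(params["directory"])
        chunks = []
        # Binary transfer (TYPE I) only; MODE is not consulted in this sketch.
        ftp.retrbinary("RETR " + params["name"], chunks.append)
        return b"".join(chunks)
    finally:
        ftp.quit()
```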
+
+5.2.3.4. The 'local-file' Access-Type
+
+ An access-type of "local-file" indicates that the actual body is
+ accessible as a file on the local machine. Two additional parameters
+ are defined for this access type:
+
+ (1) NAME -- The name of the file that contains the actual
+ body data. This parameter is mandatory for the
+ "local-file" access-type.
+
+ (2) SITE -- A domain specifier for a machine or set of
+ machines that are known to have access to the data
+ file. This optional parameter is used to describe the
+ locality of reference for the data, that is, the site
+ or sites at which the file is expected to be visible.
+ Asterisks may be used for wildcard matching to a part
+ of a domain name, such as "*.bellcore.com", to indicate
+ a set of machines on which the data should be directly
+ visible, while a single asterisk may be used to
+ indicate a file that is expected to be universally
+ available, e.g., via a global file system.
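One way to test whether a given machine falls within a SITE pattern is shell-style matching with `fnmatch.fnmatchcase`, after lower-casing both names. This interpretation of the wildcard is an assumption. With `fnmatch`, "*" also matches across dots, so "*.bellcore.com" accepts "a.b.bellcore.com" as well as "thumper.bellcore.com".

```python
from fnmatch import fnmatchcase


def host_in_site(site_pattern: str, hostname: str) -> bool:
    """Hypothetical helper: does hostname fall within a local-file SITE pattern?"""
    # Domain names are case-insensitive; also ignore a trailing root dot.
    pattern = site_pattern.strip().lower().rstrip(".")
    host = hostname.strip().lower().rstrip(".")
    return fnmatchcase(host, pattern)
```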
+
+5.2.3.5. The 'mail-server' Access-Type
+
+ The "mail-server" access-type indicates that the actual body is
+ available from a mail server. Two additional parameters are defined
+ for this access-type:
+
+ (1) SERVER -- The addr-spec of the mail server from which
+ the actual body data can be obtained. This parameter
+ is mandatory for the "mail-server" access-type.
+
+ (2) SUBJECT -- The subject that is to be used in the mail
+ that is sent to obtain the data. Note that keying mail
+ servers on Subject lines is NOT recommended, but such
+ mail servers are known to exist. This is an optional
+ parameter.
+
+ Because mail servers accept a variety of syntaxes, some of which are
+ multiline, the full command to be sent to a mail server is not
+ included as a parameter in the content-type header field. Instead,
+ it is provided as the "phantom body" when the media type is
+ "message/external-body" and the access-type is mail-server.
+
+ Note that MIME does not define a mail server syntax. Rather, it
+ allows the inclusion of arbitrary mail server commands in the phantom
+ body. Implementations must include the phantom body in the body of
+ the message they send to the mail server address to retrieve the
+ relevant data.
+
+ Unlike other access-types, mail-server access is asynchronous and
+ will happen at an unpredictable time in the future. For this reason,
+ it is important that there be a mechanism by which the returned data
+ can be matched up with the original "message/external-body" entity.
+ MIME mail servers must use the same Content-ID field on the returned
+ message that was used in the original "message/external-body"
+ entities, to facilitate such matching.
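The two obligations above are to send the phantom body and then match the reply by Content-ID. The sketch below shows how a user agent might handle both with Python's `email.message.EmailMessage`. The function names, the `pending` dictionary keyed by Content-ID, and the wording of the Comments field are assumptions. The Comments field follows the suggestion made in section 5.2.3.6.

```python
from email.message import EmailMessage
from typing import Any, Dict, Optional


def build_mail_server_request(server: str, phantom_body: str, requester: str,
                              subject: Optional[str] = None) -> EmailMessage:
    """Sketch: compose the request that resolves a mail-server reference."""
    msg = EmailMessage()
    msg["From"] = requester
    msg["To"] = server  # the SERVER parameter (an addr-spec)
    if subject is not None:  # optional SUBJECT parameter
        msg["Subject"] = subject
    # Mark the request as automatically generated (see section 5.2.3.6).
    msg["Comments"] = ("Generated automatically to resolve a MIME "
                       "message/external-body reference")
    msg.set_content(phantom_body)  # the phantom body becomes the text/plain body
    return msg


def match_reply(reply: EmailMessage, pending: Dict[str, Any]) -> Optional[Any]:
    """Return (and forget) the original entity whose Content-ID the reply carries."""
    cid = reply.get("Content-ID")
    if cid is None:
        return None
    return pending.pop(str(cid).strip(), None)
```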
+
+5.2.3.6. External-Body Security Issues
+
+ "Message/external-body" entities give rise to two important security
+ issues:
+
+ (1) Accessing data via a "message/external-body" reference
+ effectively results in the message recipient performing
+ an operation that was specified by the message
+ originator. It is therefore possible for the message
+ originator to trick a recipient into doing something
+ they would not have done otherwise. For example, an
+ originator could specify an action that attempts
+ retrieval of material that the recipient is not
+ authorized to obtain, causing the recipient to
+ unwittingly violate some security policy. For this
+ reason, user agents capable of resolving external
+ references must always take steps to describe the
+ action they are to take to the recipient and ask for
+ explicit permission prior to performing it.
+
+ The 'mail-server' access-type is particularly
+ vulnerable, in that it causes the recipient to send a
+ new message whose contents are specified by the
+ original message's originator. Given the potential for
+ abuse, any such request messages that are constructed
+ should contain a clear indication that they were
+ generated automatically (e.g. in a Comments: header
+ field) in an attempt to resolve a MIME
+ "message/external-body" reference.
+
+ (2) MIME will sometimes be used in environments that
+ provide some guarantee of message integrity and
+ authenticity. If present, such guarantees may apply
+ only to the actual direct content of messages -- they
+ may or may not apply to data accessed through MIME's
+ "message/external-body" mechanism. In particular, it
+ may be possible to subvert certain access mechanisms
+ even when the messaging system itself is secure.
+
+ It should be noted that this problem exists either with
+ or without the availability of MIME mechanisms. A
+ casual reference to an FTP site containing a document
+ in the text of a secure message brings up similar
+ issues -- the only difference is that MIME provides for
+ automatic retrieval of such material, and users may
+ place unwarranted trust in such automatic retrieval
+ mechanisms.
+
+5.2.3.7. Examples and Further Explanations
+
+ When the external-body mechanism is used in conjunction with the
+ "multipart/alternative" media type it extends the functionality of
+ "multipart/alternative" to include the case where the same entity is
+ provided in the same format but via different access mechanisms.
+ When this is done the originator of the message must order the parts
+ first in terms of preferred formats and then by preferred access
+ mechanisms. The recipient's viewer should then evaluate the list
+ both in terms of format and access mechanisms.
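A recipient following this advice might scan the alternatives as sketched below. The sketch assumes that, as with "multipart/alternative" in general (section 5.1.4), parts appear in increasing order of preference. It models each part only as a (media type, access-type) pair, and the two capability callbacks are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Part = Tuple[str, str]  # (media type of the external body, access-type)


def choose_alternative(parts: List[Part],
                       can_render: Callable[[str], bool],
                       can_access: Callable[[str], bool]) -> Optional[Part]:
    """Pick the most preferred part the recipient can both display and fetch.

    Parts are assumed to be listed least preferred first, so scan from the end.
    """
    for media_type, access_type in reversed(parts):
        if can_render(media_type.lower()) and can_access(access_type.lower()):
            return media_type, access_type
    return None
```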
+
+ With the emerging possibility of very wide-area file systems, it
+ becomes very hard to know in advance the set of machines where a file
+ will and will not be accessible directly from the file system.
+ Therefore it may make sense to provide both a file name, to be tried
+ directly, and the name of one or more sites from which the file is
+ known to be accessible. An implementation can try to retrieve remote
+ files using FTP or any other protocol, using anonymous file retrieval
+ or prompting the user for the necessary name and password. If an
+ external body is accessible via multiple mechanisms, the sender may
+ include multiple entities of type "message/external-body" within the
+ body parts of an enclosing "multipart/alternative" entity.
+
+ However, the external-body mechanism is not intended to be limited to
+ file retrieval, as shown by the mail-server access-type. Beyond
+ this, one can imagine, for example, using a video server for external
+ references to video clips.
+
+ The embedded message header fields which appear in the body of the
+ "message/external-body" data must be used to declare the media type
+ of the external body if it is anything other than plain US-ASCII
+ text, since the external body does not have a header section to
+ declare its type. Similarly, any Content-transfer-encoding other
+ than "7bit" must also be declared here. Thus a complete
+ "message/external-body" message, referring to an object in PostScript
+ format, might look like this:
+
+ From: Whomever
+ To: Someone
+ Date: Whenever
+ Subject: whatever
+ MIME-Version: 1.0
+ Message-ID: <id1@host.com>
+ Content-Type: multipart/alternative; boundary=42
+ Content-ID: <id001@guppylake.bellcore.com>
+
+ --42
+ Content-Type: message/external-body; name="BodyFormats.ps";
+ site="thumper.bellcore.com"; mode="image";
+ access-type=ANON-FTP; directory="pub";
+ expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"
+
+ Content-type: application/postscript
+ Content-ID: <id42@guppylake.bellcore.com>
+
+ --42
+ Content-Type: message/external-body; access-type=local-file;
+ name="/u/nsb/writing/rfcs/RFC-MIME.ps";
+ site="thumper.bellcore.com";
+ expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"
+
+ Content-type: application/postscript
+ Content-ID: <id42@guppylake.bellcore.com>
+
+ --42
+ Content-Type: message/external-body;
+ access-type=mail-server;
+ server="listserv@bogus.bitnet";
+ expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"
+
+ Content-type: application/postscript
+ Content-ID: <id42@guppylake.bellcore.com>
+
+ get RFC-MIME.DOC
+
+ --42--
+
+ Note that in the above examples, the default Content-transfer-
+ encoding of "7bit" is assumed for the external postscript data.
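Python's standard `email` package can read this structure. The sketch below parses the example above and lists each external reference. It places a semicolon after `access-type=mail-server`, which the parameter syntax requires. The function name and the layout of the returned dictionaries are illustrative assumptions.

```python
import email
from email.message import Message
from typing import Any, Dict, List

EXAMPLE = """\
From: Whomever
To: Someone
Date: Whenever
Subject: whatever
MIME-Version: 1.0
Message-ID: <id1@host.com>
Content-Type: multipart/alternative; boundary=42
Content-ID: <id001@guppylake.bellcore.com>

--42
Content-Type: message/external-body; name="BodyFormats.ps";
 site="thumper.bellcore.com"; mode="image";
 access-type=ANON-FTP; directory="pub";
 expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"

Content-type: application/postscript
Content-ID: <id42@guppylake.bellcore.com>

--42
Content-Type: message/external-body; access-type=local-file;
 name="/u/nsb/writing/rfcs/RFC-MIME.ps";
 site="thumper.bellcore.com";
 expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"

Content-type: application/postscript
Content-ID: <id42@guppylake.bellcore.com>

--42
Content-Type: message/external-body;
 access-type=mail-server;
 server="listserv@bogus.bitnet";
 expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"

Content-type: application/postscript
Content-ID: <id42@guppylake.bellcore.com>

get RFC-MIME.DOC

--42--
"""


def external_body_refs(msg: Message) -> List[Dict[str, Any]]:
    """Collect the parameters and embedded header of every external-body part."""
    refs = []
    for part in msg.walk():
        if part.get_content_type() != "message/external-body":
            continue
        payload = part.get_payload()
        # The parser treats message/* bodies as embedded messages (a one-item list).
        inner = payload[0] if isinstance(payload, list) else email.message_from_string(payload)
        params = dict(part.get_params()[1:])  # skip the media type itself
        refs.append({
            "access_type": params.get("access-type", "").lower(),
            "params": params,
            "inner_type": inner.get_content_type(),  # type of the external data
            "content_id": inner.get("Content-ID"),
            "phantom_body": inner.get_payload(),  # e.g. mail-server commands
        })
    return refs
```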
+
+ Like the "message/partial" type, the "message/external-body" media
+ type is intended to be transparent, that is, to convey the data type
+ in the external body rather than to convey a message with a body of
+ that type. Thus the headers on the outer and inner parts must be
+ merged using the same rules as for "message/partial". In particular,
+ this means that the Content-type and Subject fields are overridden,
+ but the From field is preserved.
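The merge can be sketched over ordered lists of (name, value) pairs. The set of fields taken from the inner header here is Content-*, Subject, Message-ID, Encrypted, and MIME-Version. That list comes from the message/partial rules in section 5.2.2.1, which are not reproduced in this excerpt, and it should be checked against that section. The function names are hypothetical.

```python
from typing import List, Tuple

Header = Tuple[str, str]

# Fields supplied by the inner (embedded) header, per the message/partial rules.
_INNER_NAMES = {"subject", "message-id", "encrypted", "mime-version"}


def _from_inner(name: str) -> bool:
    n = name.lower()
    return n.startswith("content-") or n in _INNER_NAMES


def merge_headers(outer: List[Header], inner: List[Header]) -> List[Header]:
    """Sketch: keep outer fields such as From, then append the inner Content-* etc."""
    merged = [(n, v) for n, v in outer if not _from_inner(n)]  # e.g. From, To, Date
    merged += [(n, v) for n, v in inner if _from_inner(n)]     # e.g. Content-type
    return merged
```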
+
+ Note that since the external bodies are not transported along with
+ the external body reference, they need not conform to transport
+ limitations that apply to the reference itself. In particular,
+ Internet mail transports may impose 7bit and line length limits, but
+ these do not automatically apply to binary external body references.
+ Thus a Content-Transfer-Encoding is not generally necessary, though
+ it is permitted.
+
+ Note that the body of a message of type "message/external-body" is
+ governed by the basic syntax for an RFC 822 message. In particular,
+ anything before the first consecutive pair of CRLFs is header
+ information, while anything after it is body information, which is
+ ignored for most access-types.
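Splitting such a body at the first empty line can be sketched as follows. It accepts bare LF as well as CRLF line endings. This is a convenience assumption for locally stored text, not something this specification requires. The function name is hypothetical.

```python
import re
from typing import Tuple

# The first empty line: either at the very start, or a line break followed by another.
_HEADER_END = re.compile(r"(?:\A|\r?\n)\r?\n")


def split_external_body(body: str) -> Tuple[str, str]:
    """Return (embedded header text, phantom body) of a message/external-body body."""
    m = _HEADER_END.search(body)
    if m is None:
        return body, ""  # header only, no phantom body
    return body[:m.start()], body[m.end():]
```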
+
+5.2.4. Other Message Subtypes
+
+ MIME implementations must in general treat unrecognized subtypes of
+ "message" as being equivalent to "application/octet-stream".
+
+ Future subtypes of "message" intended for use with email should be
+ restricted to "7bit" encoding. A type other than "message" should be
+ used if restriction to "7bit" is not possible.
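In code, this fallback can be a small mapping step applied before dispatching on the media type. The set of recognized subtypes below is hypothetical. It stands for whatever a given implementation actually supports.

```python
# Hypothetical: the "message" subtypes this particular implementation understands.
KNOWN_MESSAGE_SUBTYPES = {"rfc822", "partial", "external-body"}


def effective_media_type(media_type: str) -> str:
    """Treat unrecognized message/* subtypes as application/octet-stream."""
    mt = media_type.split(";", 1)[0].strip().lower()  # drop any parameters
    top, _, sub = mt.partition("/")
    if top == "message" and sub not in KNOWN_MESSAGE_SUBTYPES:
        return "application/octet-stream"
    return mt
```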
+
+6. Experimental Media Type Values
+
+ A media type value beginning with the characters "X-" is a private
+ value, to be used by consenting systems by mutual agreement. Any
+ format without a rigorous and public definition must be named with an
+ "X-" prefix, and publicly specified values shall never begin with
+ "X-". (Older versions of the widely used Andrew system use the "X-
+ BE2" name, so new systems should probably choose a different name.)
+
+ In general, the use of "X-" top-level types is strongly discouraged.
+ Implementors should invent subtypes of the existing types whenever
+ possible. In many cases, a subtype of "application" will be more
+ appropriate than a new top-level type.
+
+7. Summary
+
+ The five discrete media types provide a standardized
+ mechanism for tagging entities as "audio", "image", or several other
+ kinds of data. The composite "multipart" and "message" media types
+ allow mixing and hierarchical structuring of entities of different
+ types in a single message. A distinguished parameter syntax allows
+ further specification of data format details, particularly the
+ specification of alternate character sets. Additional optional
+ header fields provide mechanisms for certain extensions deemed
+ desirable by many implementors. Finally, a number of useful media
+ types are defined for general use by consenting user agents, notably
+ "message/partial" and "message/external-body".
+
+8. Security Considerations
+
+ Security issues are discussed in the context of the
+ "application/postscript" type, the "message/external-body" type, and
+ in RFC 2048. Implementors should pay special attention to the
+ security implications of any media types that can cause the remote
+ execution of any actions in the recipient's environment. In such
+ cases, the discussion of the "application/postscript" type may serve
+ as a model for considering other media types with remote execution
+ capabilities.
+
+9. Authors' Addresses
+
+ For more information, the authors of this document are best contacted
+ via Internet mail:
+
+ Ned Freed
+ Innosoft International, Inc.
+ 1050 East Garvey Avenue South
+ West Covina, CA 91790
+ USA
+
+ Phone: +1 818 919 3600
+ Fax: +1 818 919 3614
+ EMail: ned@innosoft.com
+
+
+ Nathaniel S. Borenstein
+ First Virtual Holdings
+ 25 Washington Avenue
+ Morristown, NJ 07960
+ USA
+
+ Phone: +1 201 540 8967
+ Fax: +1 201 993 3032
+ EMail: nsb@nsb.fv.com
+
+
+ MIME is a result of the work of the Internet Engineering Task Force
+ Working Group on RFC 822 Extensions. The chairman of that group,
+ Greg Vaudreuil, may be reached at:
+
+ Gregory M. Vaudreuil
+ Octel Network Services
+ 17080 Dallas Parkway
+ Dallas, TX 75248-1905
+ USA
+
+ EMail: Greg.Vaudreuil@Octel.Com
+
+Appendix A -- Collected Grammar
+
+ This appendix contains the complete BNF grammar for all the syntax
+ specified by this document.
+
+ By itself, however, this grammar is incomplete. It refers by name to
+ several syntax rules that are defined by RFC 822. Rather than
+ reproduce those definitions here, and risk unintentional differences
+ between the two, this document simply refers the reader to RFC 822
+ for the remaining definitions. Wherever a term is undefined, it
+ refers to the RFC 822 definition.
+
+ boundary := 0*69<bchars> bcharsnospace
+
+ bchars := bcharsnospace / " "
+
+ bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" /
+ "+" / "_" / "," / "-" / "." /
+ "/" / ":" / "=" / "?"
+
+ body-part := <"message" as defined in RFC 822, with all
+ header fields optional, not starting with the
+ specified dash-boundary, and with the
+ delimiter not occurring anywhere in the
+ body part. Note that the semantics of a
+ part differ from the semantics of a message,
+ as described in the text.>
+
+ close-delimiter := delimiter "--"
+
+ dash-boundary := "--" boundary
+ ; boundary taken from the value of
+ ; boundary parameter of the
+ ; Content-Type field.
+
+ delimiter := CRLF dash-boundary
+
+ discard-text := *(*text CRLF)
+ ; May be ignored or discarded.
+
+ encapsulation := delimiter transport-padding
+ CRLF body-part
+
+ epilogue := discard-text
+
+ multipart-body := [preamble CRLF]
+ dash-boundary transport-padding CRLF
+ body-part *encapsulation
+ close-delimiter transport-padding
+ [CRLF epilogue]
+
+ preamble := discard-text
+
+ transport-padding := *LWSP-char
+ ; Composers MUST NOT generate
+ ; non-zero length transport
+ ; padding, but receivers MUST
+ ; be able to handle padding
+ ; added by message transports.
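The Python sketch below exercises both the boundary rule and the delimiter structure of this grammar. It assumes the body is a `str` with CRLF line endings. A line counts as a delimiter only when the dash-boundary is followed by nothing but transport padding. The function names are hypothetical, and a production parser would also need to handle nested multiparts and header parsing.

```python
import re
from typing import List, Optional, Tuple

# bcharsnospace from the grammar above; bchars adds SPACE.
_NOSPACE = r"0-9A-Za-z'()+_,\-./:=?"
_BOUNDARY = re.compile(rf"[{_NOSPACE} ]{{0,69}}[{_NOSPACE}]")


def is_valid_boundary(boundary: str) -> bool:
    """boundary := 0*69<bchars> bcharsnospace (1 to 70 chars, no trailing space)."""
    return _BOUNDARY.fullmatch(boundary) is not None


def split_multipart(body: str, boundary: str) -> Tuple[str, List[str], str]:
    """Sketch: split a multipart-body into (preamble, body parts, epilogue).

    The CRLF before each dash-boundary belongs to the delimiter, so it is not
    kept in the preceding part. SPACE/HTAB transport padding is tolerated.
    """
    dash = "--" + boundary
    lines = body.split("\r\n")
    preamble: List[str] = []
    parts: List[str] = []
    current: Optional[List[str]] = None  # lines of the part being collected
    for i, line in enumerate(lines):
        if line.startswith(dash):
            rest = line[len(dash):]
            closing = rest.startswith("--")  # close-delimiter
            if closing:
                rest = rest[2:]
            if rest.strip(" \t") == "":  # only transport-padding may follow
                if current is not None:
                    parts.append("\r\n".join(current))
                if closing:
                    return "\r\n".join(preamble), parts, "\r\n".join(lines[i + 1:])
                current = []
                continue
        (preamble if current is None else current).append(line)
    raise ValueError("no close-delimiter found")
```

A part with no header fields begins with an empty line, which is why the second part in a body like "--42 CRLF CRLF world" comes back as "\r\nworld".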
--- /dev/null
+
+
+
+
+
+
+Network Working Group K. Moore
+Request for Comments: 2047 University of Tennessee
+Obsoletes: 1521, 1522, 1590 November 1996
+Category: Standards Track
+
+
+ MIME (Multipurpose Internet Mail Extensions) Part Three:
+ Message Header Extensions for Non-ASCII Text
+
+Status of this Memo
+
+ This document specifies an Internet standards track protocol for the
+ Internet community, and requests discussion and suggestions for
+ improvements. Please refer to the current edition of the "Internet
+ Official Protocol Standards" (STD 1) for the standardization state
+ and status of this protocol. Distribution of this memo is unlimited.
+
+Abstract
+
+ STD 11, RFC 822, defines a message representation protocol specifying
+ considerable detail about US-ASCII message headers, and leaves the
+ message content, or message body, as flat US-ASCII text. This set of
+ documents, collectively called the Multipurpose Internet Mail
+ Extensions, or MIME, redefines the format of messages to allow for
+
+ (1) textual message bodies in character sets other than US-ASCII,
+
+ (2) an extensible set of different formats for non-textual message
+ bodies,
+
+ (3) multi-part message bodies, and
+
+ (4) textual header information in character sets other than US-ASCII.
+
+ These documents are based on earlier work documented in RFC 934, STD
+ 11, and RFC 1049, but extend and revise them. Because RFC 822 said
+ so little about message bodies, these documents are largely
+ orthogonal to (rather than a revision of) RFC 822.
+
+ This particular document is the third document in the series. It
+ describes extensions to RFC 822 to allow non-US-ASCII text data in
+ Internet mail header fields.
+
+
+
+
+
+
+
+
+
+Moore Standards Track [Page 1]
+\f
+RFC 2047 Message Header Extensions November 1996
+
+
+ Other documents in this series include:
+
+ + RFC 2045, which specifies the various headers used to describe
+ the structure of MIME messages.
+
+ + RFC 2046, which defines the general structure of the MIME media
+ typing system and defines an initial set of media types,
+
+ + RFC 2048, which specifies various IANA registration procedures
+ for MIME-related facilities, and
+
+ + RFC 2049, which describes MIME conformance criteria and
+ provides some illustrative examples of MIME message formats,
+ acknowledgements, and the bibliography.
+
+ These documents are revisions of RFCs 1521, 1522, and 1590, which
+ themselves were revisions of RFCs 1341 and 1342. An appendix in RFC
+ 2049 describes differences and changes from previous versions.
+
+1. Introduction
+
+ RFC 2045 describes a mechanism for denoting textual body parts which
+ are coded in various character sets, as well as methods for encoding
+ such body parts as sequences of printable US-ASCII characters. This
+ memo describes similar techniques to allow the encoding of non-ASCII
+ text in various portions of an RFC 822 [2] message header, in a manner
+ which is unlikely to confuse existing message handling software.
+
+ Like the encoding techniques described in RFC 2045, the techniques
+ outlined here were designed to allow the use of non-ASCII characters
+ in message headers in a way which is unlikely to be disturbed by the
+ quirks of existing Internet mail handling programs. In particular,
+ some mail relaying programs are known to (a) delete some message
+ header fields while retaining others, (b) rearrange the order of
+ addresses in To or Cc fields, (c) rearrange the (vertical) order of
+ header fields, and/or (d) "wrap" message headers at different places
+ than those in the original message. In addition, some mail reading
+ programs are known to have difficulty correctly parsing message
+ headers which, while legal according to RFC 822, make use of
+ backslash-quoting to "hide" special characters such as "<", ",", or
+ ":", or which exploit other infrequently-used features of that
+ specification.
+
+ While it is unfortunate that these programs do not correctly
+ interpret RFC 822 headers, to "break" these programs would cause
+ severe operational problems for the Internet mail system. The
+ extensions described in this memo therefore do not rely on little-
+ used features of RFC 822.
+
+
+ Instead, certain sequences of "ordinary" printable ASCII characters
+ (known as "encoded-words") are reserved for use as encoded data. The
+ syntax of encoded-words is such that they are unlikely to
+ "accidentally" appear as normal text in message headers.
+ Furthermore, the characters used in encoded-words are restricted to
+ those which do not have special meanings in the context in which the
+ encoded-word appears.
+
+ Generally, an "encoded-word" is a sequence of printable ASCII
+ characters that begins with "=?", ends with "?=", and has two "?"s in
+ between. It specifies a character set and an encoding method, and
+ also includes the original text encoded as graphic ASCII characters,
+ according to the rules for that encoding method.
+
+ A mail composer that implements this specification will provide a
+ means of inputting non-ASCII text in header fields, but will
+ translate these fields (or appropriate portions of these fields) into
+ encoded-words before inserting them into the message header.
+
+ A mail reader that implements this specification will recognize
+ encoded-words when they appear in certain portions of the message
+ header. Instead of displaying the encoded-word "as is", it will
+ reverse the encoding and display the original text in the designated
+ character set.
+
+NOTES
+
+ This memo relies heavily on notation and terms defined in RFC 822 and
+ RFC 2045. In particular, the ABNF syntax used in this memo is
+ defined in RFC 822, and many of the terminal and nonterminal symbols
+ from RFC 822 are used in the grammar for the header extensions
+ defined here. Among the symbols defined in RFC 822 and referenced in
+ this memo are: 'addr-spec', 'atom', 'CHAR', 'comment', 'CTLs',
+ 'ctext', 'linear-white-space', 'phrase', 'quoted-pair',
+ 'quoted-string', 'SPACE', and 'word'. Successful implementation of
+ this protocol extension requires careful attention to the RFC 822
+ definitions of these terms.
+
+ When the term "ASCII" appears in this memo, it refers to the "7-Bit
+ American Standard Code for Information Interchange", ANSI X3.4-1986.
+ The MIME charset name for this character set is "US-ASCII". When not
+ specifically referring to the MIME charset name, this document uses
+ the term "ASCII", both for brevity and for consistency with RFC 822.
+ However, implementors are warned that the character set name must be
+ spelled "US-ASCII" in MIME message and body part headers.
+
+ This memo specifies a protocol for the representation of non-ASCII
+ text in message headers. It specifically DOES NOT define any
+ translation between "8-bit headers" and pure ASCII headers, nor is
+ any such translation assumed to be possible.
+
+2. Syntax of encoded-words
+
+ An 'encoded-word' is defined by the following ABNF grammar. The
+ notation of RFC 822 is used, with the exception that white space
+ characters MUST NOT appear between components of an 'encoded-word'.
+
+ encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
+
+ charset = token ; see section 3
+
+ encoding = token ; see section 4
+
+ token = 1*<Any CHAR except SPACE, CTLs, and especials>
+
+ especials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "\" /
+ <"> / "/" / "[" / "]" / "?" / "." / "="
+
+ encoded-text = 1*<Any printable ASCII character other than "?"
+ or SPACE>
+ ; (but see "Use of encoded-words in message
+ ; headers", section 5)
+
+ Both 'encoding' and 'charset' names are case-independent. Thus the
+ charset name "ISO-8859-1" is equivalent to "iso-8859-1", and the
+ encoding named "Q" may be spelled either "Q" or "q".
+
+ An 'encoded-word' may not be more than 75 characters long, including
+ 'charset', 'encoding', 'encoded-text', and delimiters. If it is
+ desirable to encode more text than will fit in an 'encoded-word' of
+ 75 characters, multiple 'encoded-word's (separated by CRLF SPACE) may
+ be used.
+
+ While there is no limit to the length of a multiple-line header
+ field, each line of a header field that contains one or more
+ 'encoded-word's is limited to 76 characters.
+
+ The length restrictions are included both to ease interoperability
+ through internetwork mail gateways, and to impose a limit on the
+ amount of lookahead a header parser must employ (while looking for a
+ final ?= delimiter) before it can decide whether a token is an
+ "encoded-word" or something else.
+
+ IMPORTANT: 'encoded-word's are designed to be recognized as 'atom's
+ by an RFC 822 parser. As a consequence, unencoded white space
+ characters (such as SPACE and HTAB) are FORBIDDEN within an
+ 'encoded-word'. For example, the character sequence
+
+ =?iso-8859-1?q?this is some text?=
+
+ would be parsed as four 'atom's, rather than as a single 'atom' (by
+ an RFC 822 parser) or 'encoded-word' (by a parser which understands
+ 'encoded-words'). The correct way to encode the string "this is some
+ text" is to encode the SPACE characters as well, e.g.
+
+ =?iso-8859-1?q?this=20is=20some=20text?=
+
+ The characters which may appear in 'encoded-text' are further
+ restricted by the rules in section 5.
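A recognizer for this grammar fits in a regular expression. The Python sketch below checks the 75-character limit and the character classes for 'token' and 'encoded-text'. It does not check whether the charset or encoding is one the reader supports, and it does not apply the further restrictions of section 5. The function name is hypothetical.

```python
import re
from typing import Optional, Tuple

# 'especials' from section 2; a token is any printable ASCII CHAR except these.
_ESPECIALS = set('()<>@,;:\\"/[]?.=')
_TOKEN_CHARS = "".join(chr(c) for c in range(0x21, 0x7F) if chr(c) not in _ESPECIALS)
_TOKEN = "[" + re.escape(_TOKEN_CHARS) + "]+"
# encoded-text: printable ASCII other than "?" (SPACE lies outside 0x21-0x7E anyway).
_ENCODED_WORD = re.compile(
    r"=\?(" + _TOKEN + r")\?(" + _TOKEN + r")\?([!->@-~]+)\?="
)


def parse_encoded_word(word: str) -> Optional[Tuple[str, str, str]]:
    """Return (charset, encoding, encoded-text) if `word` is an encoded-word."""
    if len(word) > 75:  # the limit includes charset, encoding and delimiters
        return None
    m = _ENCODED_WORD.fullmatch(word)
    if m is None:
        return None
    charset, encoding, text = m.groups()
    # Both names are case-independent; normalize them for comparison.
    return charset.lower(), encoding.upper(), text
```

Applied to the two examples above, the version with encoded spaces is accepted and the version with literal spaces is rejected.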
+
+3. Character sets
+
+ The 'charset' portion of an 'encoded-word' specifies the character
+ set associated with the unencoded text. A 'charset' can be any of
+ the character set names allowed in a MIME "charset" parameter of a
+ "text/plain" body part, or any character set name registered with
+ IANA for use with the MIME text/plain content-type.
+
+ Some character sets use code-switching techniques to switch between
+ "ASCII mode" and other modes. If unencoded text in an 'encoded-word'
+ contains a sequence which causes the charset interpreter to switch
+ out of ASCII mode, it MUST contain additional control codes such that
+ ASCII mode is again selected at the end of the 'encoded-word'. (This
+ rule applies separately to each 'encoded-word', including adjacent
+ 'encoded-word's within a single header field.)
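For a stateful charset such as ISO-2022-JP, this rule can be checked on the octets of each 'encoded-word' before they are encoded with "B" or "Q". The sketch assumes that mode changes are signalled only by ESC sequences and that ESC ( B is the sequence that selects ASCII. That holds for ISO-2022-JP but not, for example, for charsets that switch modes with SO/SI. The function name is hypothetical.

```python
def ends_in_ascii_mode(encoded: bytes) -> bool:
    """Sketch for ISO-2022-JP-style charsets: True if no escape sequence is
    used, or if the last one is ESC ( B, which designates ASCII."""
    last_escape = encoded.rfind(b"\x1b")
    if last_escape == -1:
        return True  # never left ASCII mode
    return encoded[last_escape:last_escape + 3] == b"\x1b(B"
```

Python's `iso2022_jp` codec emits the closing ESC ( B itself when a complete string is encoded, so `ends_in_ascii_mode("\u65e5\u672c".encode("iso2022_jp"))` is true.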
+
+ When there is a possibility of using more than one character set to
+ represent the text in an 'encoded-word', and in the absence of
+ private agreements between sender and recipients of a message, it is
+ recommended that members of the ISO-8859-* series be used in
+ preference to other character sets.
+
+4. Encodings
+
+ Initially, the legal values for "encoding" are "Q" and "B". These
+ encodings are described below. The "Q" encoding is recommended for
+ use when most of the characters to be encoded are in the ASCII
+ character set; otherwise, the "B" encoding should be used.
+ Nevertheless, a mail reader which claims to recognize 'encoded-word's
+ MUST be able to accept either encoding for any character set which it
+ supports.
+
+ Only a subset of the printable ASCII characters may be used in
+ 'encoded-text'. Space and tab characters are not allowed, so that
+ the beginning and end of an 'encoded-word' are obvious. The "?"
+ character is used within an 'encoded-word' to separate the various
+ portions of the 'encoded-word' from one another, and thus cannot
+ appear in the 'encoded-text' portion. Other characters are also
+ illegal in certain contexts. For example, an 'encoded-word' in a
+ 'phrase' preceding an address in a From header field may not contain
+ any of the "specials" defined in RFC 822. Finally, certain other
+ characters are disallowed in some contexts, to ensure reliability for
+ messages that pass through internetwork mail gateways.
+
+ The "B" encoding automatically meets these requirements. The "Q"
+ encoding allows a wide range of printable characters to be used in
+ non-critical locations in the message header (e.g., Subject), with
+ fewer characters available for use in other locations.
+
+4.1. The "B" encoding
+
+ The "B" encoding is identical to the "BASE64" encoding defined by RFC
+ 2045.
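The following sketch produces "B" encoded-words with the standard `base64` module. It also splits long text so that no 'encoded-word' exceeds 75 characters and no character is split across words, as section 5 requires. The function names and the greedy, character-at-a-time splitting strategy are illustrative choices.

```python
import base64
from typing import List


def b_encode_word(text: str, charset: str = "utf-8") -> str:
    """Encode `text` as one "B" encoded-word (no length check)."""
    data = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset}?B?{data}?="


def b_encode_words(text: str, charset: str = "utf-8", limit: int = 75) -> List[str]:
    """Split `text` into "B" encoded-words of at most `limit` characters,
    adding whole characters only, so each word decodes on its own."""
    overhead = len(f"=?{charset}?B??=")  # delimiters, charset and encoding
    words: List[str] = []
    chunk = ""
    for ch in text:
        candidate = chunk + ch
        # Base64 turns n octets into 4 * ceil(n / 3) characters.
        encoded_len = 4 * ((len(candidate.encode(charset)) + 2) // 3)
        if chunk and overhead + encoded_len > limit:
            words.append(b_encode_word(chunk, charset))
            chunk = ch
        else:
            chunk = candidate
    if chunk:
        words.append(b_encode_word(chunk, charset))
    return words
```

The resulting words could then be joined with CRLF SPACE, as described in section 2, for example `"\r\n ".join(b_encode_words(subject))`.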
+
+4.2. The "Q" encoding
+
+ The "Q" encoding is similar to the "Quoted-Printable" content-
+ transfer-encoding defined in RFC 2045. It is designed to allow text
+ containing mostly ASCII characters to be decipherable on an ASCII
+ terminal without decoding.
+
+ (1) Any 8-bit value may be represented by a "=" followed by two
+ hexadecimal digits. For example, if the character set in use
+ were ISO-8859-1, the "=" character would thus be encoded as
+ "=3D", and a SPACE by "=20". (Upper case should be used for
+ hexadecimal digits "A" through "F".)
+
+ (2) The 8-bit hexadecimal value 20 (e.g., ISO-8859-1 SPACE) may be
+ represented as "_" (underscore, ASCII 95.). (This character may
+ not pass through some internetwork mail gateways, but its use
+ will greatly enhance readability of "Q" encoded data with mail
+ readers that do not support this encoding.) Note that the "_"
+ always represents hexadecimal 20, even if the SPACE character
+ occupies a different code position in the character set in use.
+
+ (3) 8-bit values which correspond to printable ASCII characters other
+ than "=", "?", and "_" (underscore), MAY be represented as those
+ characters. (But see section 5 for restrictions.) In
+ particular, SPACE and TAB MUST NOT be represented as themselves
+ within encoded words.
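Rules (1) to (3) can be sketched as a byte-by-byte loop. The `phrase` flag switches to the narrower character set that section 5 allows in a 'phrase'. The sketch does not implement the extra restrictions for 'comment's, and it does not enforce the 75-character limit. The function name is hypothetical.

```python
import string

# Characters that may stand for themselves in a 'phrase' (section 5, rule 3).
_PHRASE_SAFE = set(string.ascii_letters + string.digits + "!*+-/")
# Elsewhere (e.g. Subject): printable ASCII except "=", "?" and "_".
_TEXT_SAFE = {chr(c) for c in range(0x21, 0x7F)} - set("=?_")


def q_encode_word(text: str, charset: str = "utf-8", phrase: bool = False) -> str:
    """Sketch: encode `text` as a single "Q" encoded-word."""
    safe = _PHRASE_SAFE if phrase else _TEXT_SAFE
    out = []
    for byte in text.encode(charset):
        if byte == 0x20:
            out.append("_")  # rule (2): "_" always means hex 20
        elif byte < 0x80 and chr(byte) in safe:
            out.append(chr(byte))  # rule (3): printable ASCII as itself
        else:
            out.append(f"={byte:02X}")  # rule (1): "=" and two upper-case hex digits
    return f"=?{charset}?Q?{''.join(out)}?="
```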
+
+5. Use of encoded-words in message headers
+
+ An 'encoded-word' may appear in a message header or body part header
+ according to the following rules:
+
+(1) An 'encoded-word' may replace a 'text' token (as defined by RFC 822)
+ in any Subject or Comments header field, any extension message
+ header field, or any MIME body part field for which the field body
+ is defined as '*text'. An 'encoded-word' may also appear in any
+ user-defined ("X-") message or body part header field.
+
+ Ordinary ASCII text and 'encoded-word's may appear together in the
+ same header field. However, an 'encoded-word' that appears in a
+ header field defined as '*text' MUST be separated from any adjacent
+ 'encoded-word' or 'text' by 'linear-white-space'.
+
+(2) An 'encoded-word' may appear within a 'comment' delimited by "(" and
+ ")", i.e., wherever a 'ctext' is allowed. More precisely, the RFC
+ 822 ABNF definition for 'comment' is amended as follows:
+
+ comment = "(" *(ctext / quoted-pair / comment / encoded-word) ")"
+
+ A "Q"-encoded 'encoded-word' which appears in a 'comment' MUST NOT
+ contain the characters "(", ")" or "\". An
+ 'encoded-word' that appears in a 'comment' MUST be separated from
+ any adjacent 'encoded-word' or 'ctext' by 'linear-white-space'.
+
+ It is important to note that 'comment's are only recognized inside
+ "structured" field bodies. In fields whose bodies are defined as
+ '*text', "(" and ")" are treated as ordinary characters rather than
+ comment delimiters, and rule (1) of this section applies. (See RFC
+ 822, sections 3.1.2 and 3.1.3)
+
+(3) As a replacement for a 'word' entity within a 'phrase', for example,
+ one that precedes an address in a From, To, or Cc header. The ABNF
+ definition for 'phrase' from RFC 822 thus becomes:
+
+ phrase = 1*( encoded-word / word )
+
+ In this case the set of characters that may be used in a "Q"-encoded
+ 'encoded-word' is restricted to: <upper and lower case ASCII
+ letters, decimal digits, "!", "*", "+", "-", "/", "=", and "_"
+ (underscore, ASCII 95.)>. An 'encoded-word' that appears within a
+ 'phrase' MUST be separated from any adjacent 'word', 'text' or
+ 'special' by 'linear-white-space'.
+
+ These are the ONLY locations where an 'encoded-word' may appear. In
+ particular:
+
+ + An 'encoded-word' MUST NOT appear in any portion of an 'addr-spec'.
+
+ + An 'encoded-word' MUST NOT appear within a 'quoted-string'.
+
+ + An 'encoded-word' MUST NOT be used in a Received header field.
+
+ + An 'encoded-word' MUST NOT be used in a parameter of a MIME
+ Content-Type or Content-Disposition field, or in any structured
+ field body except within a 'comment' or 'phrase'.
+
+ The 'encoded-text' in an 'encoded-word' must be self-contained;
+ 'encoded-text' MUST NOT be continued from one 'encoded-word' to
+ another. This implies that the 'encoded-text' portion of a "B"
+ 'encoded-word' will be a multiple of 4 characters long; for a "Q"
+ 'encoded-word', any "=" character that appears in the 'encoded-text'
+ portion will be followed by two hexadecimal characters.
+
+ Each 'encoded-word' MUST encode an integral number of octets. The
+ 'encoded-text' in each 'encoded-word' must be well-formed according
+ to the encoding specified; the 'encoded-text' may not be continued in
+ the next 'encoded-word'. (For example, "=?charset?Q?=?=
+ =?charset?Q?AB?=" would be illegal, because the two hex digits "AB"
+ must follow the "=" in the same 'encoded-word'.)
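+
+ The self-containment rules above can be checked one 'encoded-word' at
+ a time. In this Python sketch (illustrative only; the function name is
+ hypothetical), a "B" 'encoded-text' must be a multiple of 4 base64
+ characters that decode cleanly, and in a "Q" 'encoded-text' every "="
+ must be followed by two hexadecimal digits.
+
```python
import base64
import binascii
import re

# Whole-string shape of a "B" 'encoded-text': alphabet, then 0-2 "=" pads.
B_SHAPE = re.compile(r"[A-Za-z0-9+/]*={0,2}")
# An "=" that is NOT followed by two hexadecimal digits.
Q_BAD_ESCAPE = re.compile(r"=(?![0-9A-Fa-f]{2})")


def is_self_contained(encoding: str, encoded_text: str) -> bool:
    """Check that one 'encoded-text' does not rely on a neighbouring word."""
    if encoding.upper() == "B":
        if len(encoded_text) % 4 or not B_SHAPE.fullmatch(encoded_text):
            return False
        try:
            base64.b64decode(encoded_text, validate=True)
        except binascii.Error:
            return False
        return True
    if encoding.upper() == "Q":
        return Q_BAD_ESCAPE.search(encoded_text) is None
    raise ValueError(f"unsupported encoding: {encoding!r}")


# The illegal split from the example above: "=" in one word, "AB" in the next.
print(is_self_contained("Q", "="), is_self_contained("Q", "AB"))  # False True
```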
+
+ Each 'encoded-word' MUST represent an integral number of characters.
+ A multi-octet character may not be split across adjacent 'encoded-
+ word's.
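+
+ One way for a composer to honour both rules is to build each
+ 'encoded-word' from whole characters and to start a new word when the
+ next character would push the current one past 75 characters. A
+ Python sketch under those assumptions (it always uses "B", and the
+ function name is hypothetical):
+
```python
import base64


def encode_words(text: str, charset: str = "utf-8", max_len: int = 75) -> list[str]:
    """Split text into "B" encoded-words of at most max_len characters,
    breaking only between characters so no multi-octet sequence is split."""
    prefix, suffix = f"=?{charset}?B?", "?="

    def word(chunk: str) -> str:
        return prefix + base64.b64encode(chunk.encode(charset)).decode("ascii") + suffix

    words, chunk = [], ""
    for ch in text:
        if chunk and len(word(chunk + ch)) > max_len:
            words.append(word(chunk))  # close the current word before it overflows
            chunk = ch
        else:
            chunk += ch
    if chunk:
        words.append(word(chunk))
    return words


for w in encode_words("ü" * 40):
    print(w)
```
+
+ Because each word is encoded from a complete substring, every word's
+ octets decode on their own, which is exactly what the rule requires.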
+
+ Only printable and white space character data should be encoded using
+ this scheme. However, since these encoding schemes allow the
+ encoding of arbitrary octet values, mail readers that implement this
+ decoding should also ensure that display of the decoded data on the
+ recipient's terminal will not cause unwanted side-effects.
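+
+ As one illustration of that precaution, a reader could neutralise
+ control characters in decoded text before sending it to a terminal.
+ The Python sketch below reflects an assumed policy, not a requirement
+ of this memo: it keeps SPACE and HTAB and replaces Unicode "Cc"
+ (control) and "Cf" (format, e.g. bidirectional overrides) characters
+ with U+FFFD. Replacing every "Cf" character also removes harmless ones
+ such as zero-width joiners, so a real reader may want a narrower list.
+
```python
import unicodedata


def display_safe(decoded: str) -> str:
    """Neutralise characters that could alter terminal state or layout."""
    return "".join(
        ch if ch in " \t" or unicodedata.category(ch) not in ("Cc", "Cf")
        else "\ufffd"
        for ch in decoded
    )


print(display_safe("Subject\x1b[2J"))  # ESC (a terminal escape) becomes U+FFFD
```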
+
+ Use of these methods to encode non-textual data (e.g., pictures or
+ sounds) is not defined by this memo. Use of 'encoded-word's to
+ represent strings of purely ASCII characters is allowed, but
+ discouraged. In rare cases it may be necessary to encode ordinary
+ text that looks like an 'encoded-word'.
+
+6. Support of 'encoded-word's by mail readers
+
+6.1. Recognition of 'encoded-word's in message headers
+
+ A mail reader must parse the message and body part headers according
+ to the rules in RFC 822 to correctly recognize 'encoded-word's.
+
+ 'encoded-word's are to be recognized as follows:
+
+ (1) Any message or body part header field defined as '*text', or any
+ user-defined header field, should be parsed as follows: Beginning
+ at the start of the field-body and immediately following each
+ occurrence of 'linear-white-space', each sequence of up to 75
+ printable characters (not containing any 'linear-white-space')
+ should be examined to see if it is an 'encoded-word' according to
+ the syntax rules in section 2. Any other sequence of printable
+ characters should be treated as ordinary ASCII text.
+
+ (2) Any header field not defined as '*text' should be parsed
+ according to the syntax rules for that header field. However,
+ any 'word' that appears within a 'phrase' should be treated as an
+ 'encoded-word' if it meets the syntax rules in section 2.
+ Otherwise it should be treated as an ordinary 'word'.
+
+ (3) Within a 'comment', any sequence of up to 75 printable characters
+ (not containing 'linear-white-space'), that meets the syntax
+ rules in section 2, should be treated as an 'encoded-word'.
+ Otherwise it should be treated as normal comment text.
+
+ (4) A MIME-Version header field is NOT required to be present for
+ 'encoded-word's to be interpreted according to this
+ specification. One reason for this is that the mail reader is
+ not expected to parse the entire message header before displaying
+ lines that may contain 'encoded-word's.
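+
+ Rule (1) can be sketched as follows (Python, illustrative; the names
+ are hypothetical). The field body is split at 'linear-white-space',
+ and each run is flagged as an 'encoded-word' only if it is at most 75
+ characters long and matches the section 2 syntax in full. In a '*text'
+ field "(" is an ordinary character, so "(=?ISO-8859-1?Q?b?=)" is not
+ recognized.
+
```python
import re

# 'token' per RFC 2047 section 2: no SPACE, CTLs, or especials.
TOKEN = r'[^\x00-\x20()<>@,;:"/\[\]?.=\x7f]+'
# 'encoded-text': printable ASCII other than "?" and SPACE.
ENCODED_TEXT = r"[\x21-\x3e\x40-\x7e]*"
ENCODED_WORD = re.compile(rf"=\?{TOKEN}\?{TOKEN}\?{ENCODED_TEXT}\?=")


def scan_text_field(field_body: str) -> list[tuple[str, bool]]:
    """Return (run, is_encoded_word) for each whitespace-free run."""
    runs = [r for r in re.split(r"[ \t\r\n]+", field_body) if r]
    return [(r, len(r) <= 75 and ENCODED_WORD.fullmatch(r) is not None) for r in runs]


print(scan_text_field("=?ISO-8859-1?Q?a?= plain (=?ISO-8859-1?Q?b?=)"))
```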
+
+6.2. Display of 'encoded-word's
+
+ Any 'encoded-word's so recognized are decoded, and if possible, the
+ resulting unencoded text is displayed in the original character set.
+
+ NOTE: Decoding and display of encoded-words occurs *after* a
+ structured field body is parsed into tokens. It is therefore
+ possible to hide 'special' characters in encoded-words which, when
+ displayed, will be indistinguishable from 'special' characters in the
+ surrounding text. For this and other reasons, it is NOT generally
+ possible to translate a message header containing 'encoded-word's to
+ an unencoded form which can be parsed by an RFC 822 mail reader.
+
+ When displaying a particular header field that contains multiple
+ 'encoded-word's, any 'linear-white-space' that separates a pair of
+ adjacent 'encoded-word's is ignored. (This is to allow the use of
+ multiple 'encoded-word's to represent long strings of unencoded text,
+ without having to separate 'encoded-word's where spaces occur in the
+ unencoded text.)
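+
+ The following Python sketch (illustrative only; names are
+ hypothetical) decodes "B" and "Q" 'encoded-word's and drops any
+ 'linear-white-space' that lies between two adjacent ones, while
+ keeping white space next to ordinary text. For brevity it finds
+ 'encoded-word's with a regular expression anywhere in the string
+ instead of applying the full recognition rules of section 6.1, and it
+ does not guard against malformed input.
+
```python
import base64
import binascii
import re

ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=")


def decode_word(charset: str, encoding: str, text: str) -> str:
    if encoding.upper() == "B":
        raw = base64.b64decode(text)
    else:
        # header=True makes "_" decode to SPACE, as "Q" requires.
        raw = binascii.a2b_qp(text.encode("ascii"), header=True)
    return raw.decode(charset)


def decode_for_display(value: str) -> str:
    pieces, pos, after_word = [], 0, False
    for m in ENCODED_WORD.finditer(value):
        gap = value[pos:m.start()]
        # Ignore white space that only separates two 'encoded-word's.
        if not (after_word and gap.strip(" \t\r\n") == ""):
            pieces.append(gap)
        pieces.append(decode_word(*m.groups()))
        pos, after_word = m.end(), True
    pieces.append(value[pos:])
    return "".join(pieces)


print(decode_for_display("(=?ISO-8859-1?Q?a?= =?ISO-8859-1?Q?b?=)"))  # (ab)
```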
+
+ In the event other encodings are defined in the future, and the mail
+ reader does not support the encoding used, it may either (a) display
+ the 'encoded-word' as ordinary text, or (b) substitute an appropriate
+ message indicating that the text could not be decoded.
+
+ If the mail reader does not support the character set used, it may
+ (a) display the 'encoded-word' as ordinary text (i.e., as it appears
+ in the header), (b) make a "best effort" to display using such
+ characters as are available, or (c) substitute an appropriate message
+ indicating that the decoded text could not be displayed.
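+
+ The three options for an unsupported character set can be expressed
+ as a display policy. In this Python sketch (illustrative; the enum,
+ the function and the placeholder wording are hypothetical), "support"
+ means that Python has a codec for the charset name, and the "best
+ effort" option assumes the unknown charset is a superset of ASCII, so
+ octets outside ASCII are shown as U+FFFD.
+
```python
import codecs
from enum import Enum


class Fallback(Enum):
    AS_IS = "a"        # (a) show the 'encoded-word' as it appears
    BEST_EFFORT = "b"  # (b) show the characters that can be shown
    MESSAGE = "c"      # (c) show a note that the text cannot be displayed


def render(raw: bytes, charset: str, original: str, policy: Fallback) -> str:
    """Turn decoded octets into display text, or fall back per policy."""
    try:
        codecs.lookup(charset)
    except LookupError:
        if policy is Fallback.AS_IS:
            return original
        if policy is Fallback.BEST_EFFORT:
            return raw.decode("ascii", errors="replace")
        return f"[text in unsupported charset {charset!r}]"
    return raw.decode(charset, errors="replace")
```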
+
+ If the character set being used employs code-switching techniques,
+ display of the encoded text implicitly begins in "ASCII mode". In
+ addition, the mail reader must ensure that the output device is once
+ again in "ASCII mode" after the 'encoded-word' is displayed.
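+
+ On the composing side, the matching obligation is that the octets
+ inside an 'encoded-word' end in ASCII mode. A Python sketch for
+ ISO-2022-JP (illustrative; the function name is hypothetical): in
+ CPython the codec emits ESC ( B when encoding finishes, and the check
+ confirms that the last escape sequence in the octets selects ASCII
+ before the 'encoded-word' is built.
+
```python
import base64

ESC_ASCII = b"\x1b(B"  # ISO-2022-JP designation of ASCII


def iso2022jp_word(text: str) -> str:
    """Build a "B" 'encoded-word' whose octets finish in ASCII mode."""
    raw = text.encode("iso2022_jp")
    last_escape = raw.rfind(b"\x1b")
    if last_escape != -1 and raw[last_escape:last_escape + 3] != ESC_ASCII:
        raise ValueError("octets do not switch back to ASCII")
    return "=?ISO-2022-JP?B?" + base64.b64encode(raw).decode("ascii") + "?="


print(iso2022jp_word("日本"))
```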
+
+6.3. Mail reader handling of incorrectly formed 'encoded-word's
+
+ It is possible that an 'encoded-word' that is legal according to the
+ syntax defined in section 2 is incorrectly formed according to the
+ rules for the encoding being used. For example:
+
+ (1) An 'encoded-word' which contains characters which are not legal
+ for a particular encoding (for example, a "-" in the "B"
+ encoding, or a SPACE or HTAB in either the "B" or "Q" encoding),
+ is incorrectly formed.
+
+ (2) Any 'encoded-word' which encodes a non-integral number of
+ characters or octets is incorrectly formed.
+
+ A mail reader need not attempt to display the text associated with an
+ 'encoded-word' that is incorrectly formed. However, a mail reader
+ MUST NOT prevent the display or handling of a message because an
+ 'encoded-word' is incorrectly formed.
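+
+ A reader can satisfy this by decoding each 'encoded-word' defensively
+ and showing its literal text whenever decoding fails. In this Python
+ sketch (illustrative; the function name is hypothetical), SPACE and
+ HTAB in the 'encoded-text' are rejected explicitly, strict base64
+ validation rejects characters such as "-" and bad padding, and strict
+ character set decoding rejects a word that stops part-way through a
+ multi-octet character.
+
```python
import base64
import binascii
import re

WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?]*)\?=")


def decode_or_keep(word: str) -> str:
    """Decode one 'encoded-word', or return it unchanged if malformed."""
    m = WORD.fullmatch(word)
    if m is None:
        return word
    charset, encoding, text = m.groups()
    try:
        if " " in text or "\t" in text:
            raise ValueError("SPACE or HTAB inside 'encoded-text'")
        if encoding.upper() == "B":
            raw = base64.b64decode(text, validate=True)  # rejects "-", bad padding
        else:
            raw = binascii.a2b_qp(text.encode("ascii"), header=True)
        return raw.decode(charset)  # strict: fails on a partial character
    except (ValueError, LookupError):
        # binascii.Error and UnicodeError are both ValueError subclasses.
        return word
```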
+
+7. Conformance
+
+ A mail composing program claiming compliance with this specification
+ MUST ensure that any string of non-white-space printable ASCII
+ characters within a '*text' or '*ctext' that begins with "=?" and
+ ends with "?=" be a valid 'encoded-word'. ("begins" means: at the
+ start of the field-body, immediately following 'linear-white-space',
+ or immediately following a "(" for an 'encoded-word' within '*ctext';
+ "ends" means: at the end of the field-body, immediately preceding
+ 'linear-white-space', or immediately preceding a ")" for an
+ 'encoded-word' within '*ctext'.) In addition, any 'word' within a
+ 'phrase' that begins with "=?" and ends with "?=" must be a valid
+ 'encoded-word'.
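+
+ A composer can look for the strings this paragraph forbids before it
+ sends a header. The Python sketch below (illustrative; the name is
+ hypothetical) covers only the '*text' case with white-space
+ delimiters, not the "(" and ")" delimiters used within '*ctext', and
+ treats "B" and "Q" as the only valid encodings. Any token it reports
+ would have to be encoded, for example as a "Q" 'encoded-word', before
+ the header is sent.
+
```python
import re

VALID_WORD = re.compile(
    r'=\?[^\x00-\x20()<>@,;:"/\[\]?.=\x7f]+\?[BbQq]\?[\x21-\x3e\x40-\x7e]*\?='
)


def offending_tokens(field_body: str) -> list[str]:
    """Tokens that begin with "=?" and end with "?=" but are not valid
    'encoded-word's (too long or malformed)."""
    return [
        tok for tok in field_body.split()
        if tok.startswith("=?") and tok.endswith("?=")
        and (len(tok) > 75 or VALID_WORD.fullmatch(tok) is None)
    ]


print(offending_tokens("Re: =?ISO-8859-1?Q?a?= and =?not-one?="))  # ['=?not-one?=']
```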
+
+ A mail reading program claiming compliance with this specification
+ must be able to distinguish 'encoded-word's from 'text', 'ctext', or
+ 'word's, according to the rules in section 6, anytime they appear in
+ appropriate places in message headers. It must support both the "B"
+ and "Q" encodings for any character set which it supports. The
+ program must be able to display the unencoded text if the character
+ set is "US-ASCII". For the ISO-8859-* character sets, the mail
+ reading program must at least be able to display the characters which
+ are also in the ASCII set.
+
+8. Examples
+
+ The following are examples of message headers containing 'encoded-
+ word's:
+
+ From: =?US-ASCII?Q?Keith_Moore?= <moore@cs.utk.edu>
+ To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>
+ CC: =?ISO-8859-1?Q?Andr=E9?= Pirard <PIRARD@vm1.ulg.ac.be>
+ Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
+ =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=
+
+ Note: In the first 'encoded-word' of the Subject field above, the
+ last "=" at the end of the 'encoded-text' is necessary because each
+ 'encoded-word' must be self-contained (the "=" character completes a
+ group of 4 base64 characters representing 2 octets). An additional
+ octet could have been encoded in the first 'encoded-word' (so that
+ the encoded-word would contain an exact multiple of 3 encoded
+ octets), except that the second 'encoded-word' uses a different
+ 'charset' than the first one.
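+
+ The octet arithmetic in this note can be checked with Python's
+ standard base64 module (an illustration, not part of the memo): the
+ 32 characters of the first 'encoded-text' form 8 groups of 4, the
+ last of which ("eW8=") carries only 2 octets, for 23 octets in all.
+
```python
import base64

first = base64.b64decode("SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=")      # ISO-8859-1
second = base64.b64decode("dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==")  # ISO-8859-2

# 7 full groups of 3 octets plus a final 2-octet group.
print(len(first), divmod(len(first), 3))  # 23 (7, 2)

# White space between the two adjacent 'encoded-word's is not displayed.
print(first.decode("iso-8859-1") + second.decode("iso-8859-2"))
# If you can read this you understand the example.
```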
+
+ From: =?ISO-8859-1?Q?Olle_J=E4rnefors?= <ojarnef@admin.kth.se>
+ To: ietf-822@dimacs.rutgers.edu, ojarnef@admin.kth.se
+ Subject: Time for ISO 10646?
+
+ To: Dave Crocker <dcrocker@mordor.stanford.edu>
+ Cc: ietf-822@dimacs.rutgers.edu, paf@comsol.se
+ From: =?ISO-8859-1?Q?Patrik_F=E4ltstr=F6m?= <paf@nada.kth.se>
+ Subject: Re: RFC-HDR care and feeding
+
+ From: Nathaniel Borenstein <nsb@thumper.bellcore.com>
+ (=?iso-8859-8?b?7eXs+SDv4SDp7Oj08A==?=)
+ To: Greg Vaudreuil <gvaudre@NRI.Reston.VA.US>, Ned Freed
+ <ned@innosoft.com>, Keith Moore <moore@cs.utk.edu>
+ Subject: Test of new header generator
+ MIME-Version: 1.0
+ Content-type: text/plain; charset=ISO-8859-1
+
+ The following examples illustrate how text containing 'encoded-word's
+ which appear in a structured field body is displayed. The rules are slightly
+ different for fields defined as '*text' because "(" and ")" are not
+ recognized as 'comment' delimiters. [Section 5, paragraph (1)].
+
+ In each of the following examples, if the same sequence were to occur
+ in a '*text' field, the "displayed as" form would NOT be treated as
+ encoded words, but be identical to the "encoded form". This is
+ because each of the encoded-words in the following examples is
+ adjacent to a "(" or ")" character.
+
+ encoded form displayed as
+ ---------------------------------------------------------------------
+ (=?ISO-8859-1?Q?a?=) (a)
+
+ (=?ISO-8859-1?Q?a?= b) (a b)
+
+ Within a 'comment', white space MUST appear between an
+ 'encoded-word' and surrounding text. [Section 5,
+ paragraph (2)]. However, white space is not needed between
+ the initial "(" that begins the 'comment', and the
+ 'encoded-word'.
+
+
+ (=?ISO-8859-1?Q?a?= =?ISO-8859-1?Q?b?=) (ab)
+
+ White space between adjacent 'encoded-word's is not
+ displayed.
+
+ (=?ISO-8859-1?Q?a?=    =?ISO-8859-1?Q?b?=) (ab)
+
+ Even multiple SPACEs between 'encoded-word's are ignored
+ for the purpose of display.
+
+ (=?ISO-8859-1?Q?a?= (ab)
+ =?ISO-8859-1?Q?b?=)
+
+ Any amount of linear-white-space between 'encoded-word's,
+ even if it includes a CRLF followed by one or more SPACEs,
+ is ignored for the purposes of display.
+
+ (=?ISO-8859-1?Q?a_b?=) (a b)
+
+ In order to cause a SPACE to be displayed within a portion
+ of encoded text, the SPACE MUST be encoded as part of the
+ 'encoded-word'.
+
+ (=?ISO-8859-1?Q?a?= =?ISO-8859-2?Q?_b?=) (a b)
+
+ In order to cause a SPACE to be displayed between two strings
+ of encoded text, the SPACE MAY be encoded as part of one of
+ the 'encoded-word's.
+
+9. References
+
+ [RFC 822] Crocker, D., "Standard for the Format of ARPA Internet Text
+ Messages", STD 11, RFC 822, UDEL, August 1982.
+
+ [RFC 2049] Borenstein, N., and N. Freed, "Multipurpose Internet Mail
+ Extensions (MIME) Part Five: Conformance Criteria and Examples",
+ RFC 2049, November 1996.
+
+ [RFC 2045] Borenstein, N., and N. Freed, "Multipurpose Internet Mail
+ Extensions (MIME) Part One: Format of Internet Message Bodies",
+ RFC 2045, November 1996.
+
+ [RFC 2046] Borenstein, N., and N. Freed, "Multipurpose Internet Mail
+ Extensions (MIME) Part Two: Media Types", RFC 2046,
+ November 1996.
+
+ [RFC 2048] Freed, N., Klensin, J., and J. Postel, "Multipurpose
+ Internet Mail Extensions (MIME) Part Four: Registration
+ Procedures", RFC 2048, November 1996.
+
+10. Security Considerations
+
+ Security issues are not discussed in this memo.
+
+11. Acknowledgements
+
+ The author wishes to thank Nathaniel Borenstein, Issac Chan, Lutz
+ Donnerhacke, Paul Eggert, Ned Freed, Andreas M. Kirchwitz, Olle
+ Jarnefors, Mike Rosin, Yutaka Sato, Bart Schaefer, and Kazuhiko
+ Yamamoto, for their helpful advice, insightful comments, and
+ illuminating questions in response to earlier versions of this
+ specification.
+
+12. Author's Address
+
+ Keith Moore
+ University of Tennessee
+ 107 Ayres Hall
+ Knoxville TN 37996-1301
+
+ EMail: moore@cs.utk.edu
+
+Appendix - changes since RFC 1522 (in no particular order)
+
+ + explicitly state that the MIME-Version header field is not required to use
+ 'encoded-word's.
+
+ + add explicit note that SPACEs and TABs are not allowed within
+ 'encoded-word's, explaining that an 'encoded-word' must look like an
+ 'atom' to an RFC822 parser.
+
+ + add examples from Olle Jarnefors (thanks!) which illustrate how
+ encoded-words with adjacent linear-white-space are displayed.
+
+ + explicitly list terms defined in RFC822 and referenced in this memo
+
+ + fix transcription typos that caused one or two lines and a couple of
+ characters to disappear in the resulting text, due to nroff quirks.
+
+ + clarify that encoded-words are allowed in '*text' fields in both
+ RFC822 headers and MIME body part headers, but NOT as parameter
+ values.
+
+ + clarify the requirement to switch back to ASCII within the encoded
+ portion of an 'encoded-word', for any charset that uses code switching
+ sequences.
+
+ + add a note about 'encoded-word's being delimited by "(" and ")"
+ within a comment, but not in a *text (how bizarre!).
+
+ + fix the Andre Pirard example to get rid of the trailing "_" after
+ the =E9. (no longer needed post-1342).
+
+ + clarification: an 'encoded-word' may appear immediately following
+ the initial "(" or immediately before the final ")" that delimits a
+ comment, not just adjacent to "(" and ")" *within* *ctext.
+
+ + add a note to explain that a "B" 'encoded-word' will always have a
+ multiple of 4 characters in the 'encoded-text' portion.
+
+ + add note about the "=" in the examples
+
+ + note that processing of 'encoded-word's occurs *after* parsing, and
+ some of the implications thereof.
+
+ + explicitly state that you can't expect to translate between
+ 1522 and either vanilla 822 or so-called "8-bit headers".
+
+ + explicitly state that 'encoded-word's are not valid within a
+ 'quoted-string'.
--- /dev/null
+
+
+
+
+
+
+Network Working Group N. Freed
+Request for Comments: 2048 Innosoft
+BCP: 13 J. Klensin
+Obsoletes: 1521, 1522, 1590 MCI
+Category: Best Current Practice J. Postel
+ ISI
+ November 1996
+
+
+ Multipurpose Internet Mail Extensions
+ (MIME) Part Four:
+ Registration Procedures
+
+Status of this Memo
+
+ This document specifies an Internet Best Current Practice for the
+ Internet Community, and requests discussion and suggestions for
+ improvements. Distribution of this memo is unlimited.
+
+Abstract
+
+ STD 11, RFC 822, defines a message representation protocol specifying
+ considerable detail about US-ASCII message headers, and leaves the
+ message content, or message body, as flat US-ASCII text. This set of
+ documents, collectively called the Multipurpose Internet Mail
+ Extensions, or MIME, redefines the format of messages to allow for
+
+ (1) textual message bodies in character sets other than
+ US-ASCII,
+
+ (2) an extensible set of different formats for non-textual
+ message bodies,
+
+ (3) multi-part message bodies, and
+
+ (4) textual header information in character sets other than
+ US-ASCII.
+
+ These documents are based on earlier work documented in RFC 934, STD
+ 11, and RFC 1049, but extend and revise them. Because RFC 822 said
+ so little about message bodies, these documents are largely
+ orthogonal to (rather than a revision of) RFC 822.
+
+
+
+
+
+
+
+
+
+Freed, et. al. Best Current Practice [Page 1]
+\f
+RFC 2048 MIME Registration Procedures November 1996
+
+
+ This fourth document, RFC 2048, specifies various IANA registration
+ procedures for the following MIME facilities:
+
+ (1) media types,
+
+ (2) external body access types,
+
+ (3) content-transfer-encodings.
+
+ Registration of character sets for use in MIME is covered elsewhere
+ and is no longer addressed by this document.
+
+ These documents are revisions of RFCs 1521 and 1522, which themselves
+ were revisions of RFCs 1341 and 1342. An appendix in RFC 2049
+ describes differences and changes from previous versions.
+
+Table of Contents
+
+ 1. Introduction ......................................... 3
+ 2. Media Type Registration .............................. 4
+ 2.1 Registration Trees and Subtype Names ................ 4
+ 2.1.1 IETF Tree ......................................... 4
+ 2.1.2 Vendor Tree ....................................... 4
+ 2.1.3 Personal or Vanity Tree ........................... 5
+ 2.1.4 Special `x.' Tree ................................. 5
+ 2.1.5 Additional Registration Trees ..................... 6
+ 2.2 Registration Requirements ........................... 6
+ 2.2.1 Functionality Requirement ......................... 6
+ 2.2.2 Naming Requirements ............................... 6
+ 2.2.3 Parameter Requirements ............................ 7
+ 2.2.4 Canonicalization and Format Requirements .......... 7
+ 2.2.5 Interchange Recommendations ....................... 8
+ 2.2.6 Security Requirements ............................. 8
+ 2.2.7 Usage and Implementation Non-requirements ......... 9
+ 2.2.8 Publication Requirements .......................... 10
+ 2.2.9 Additional Information ............................ 10
+ 2.3 Registration Procedure .............................. 11
+ 2.3.1 Present the Media Type to the Community for Review 11
+ 2.3.2 IESG Approval ..................................... 12
+ 2.3.3 IANA Registration ................................. 12
+ 2.4 Comments on Media Type Registrations ................ 12
+ 2.5 Location of Registered Media Type List .............. 12
+ 2.6 IANA Procedures for Registering Media Types ......... 12
+ 2.7 Change Control ...................................... 13
+ 2.8 Registration Template ............................... 14
+ 3. External Body Access Types ........................... 14
+ 3.1 Registration Requirements ........................... 15
+ 3.1.1 Naming Requirements ............................... 15
+ 3.1.2 Mechanism Specification Requirements .............. 15
+ 3.1.3 Publication Requirements .......................... 15
+ 3.1.4 Security Requirements ............................. 15
+ 3.2 Registration Procedure .............................. 15
+ 3.2.1 Present the Access Type to the Community .......... 16
+ 3.2.2 Access Type Reviewer .............................. 16
+ 3.2.3 IANA Registration ................................. 16
+ 3.3 Location of Registered Access Type List ............. 16
+ 3.4 IANA Procedures for Registering Access Types ........ 16
+ 4. Transfer Encodings ................................... 17
+ 4.1 Transfer Encoding Requirements ...................... 17
+ 4.1.1 Naming Requirements ............................... 17
+ 4.1.2 Algorithm Specification Requirements .............. 18
+ 4.1.3 Input Domain Requirements ......................... 18
+ 4.1.4 Output Range Requirements ......................... 18
+ 4.1.5 Data Integrity and Generality Requirements ........ 18
+ 4.1.6 New Functionality Requirements .................... 18
+ 4.2 Transfer Encoding Definition Procedure .............. 19
+ 4.3 IANA Procedures for Transfer Encoding Registration... 19
+ 4.4 Location of Registered Transfer Encodings List ...... 19
+ 5. Authors' Addresses ................................... 20
+ A. Grandfathered Media Types ............................ 21
+
+1. Introduction
+
+ Recent Internet protocols have been carefully designed to be easily
+ extensible in certain areas. In particular, MIME [RFC 2045] is an
+ open-ended framework and can accommodate additional object types,
+ character sets, and access methods without any changes to the basic
+ protocol. A registration process is needed, however, to ensure that
+ the set of such values is developed in an orderly, well-specified,
+ and public manner.
+
+ This document defines registration procedures which use the Internet
+ Assigned Numbers Authority (IANA) as a central registry for such
+ values.
+
+ Historical Note: The registration process for media types was
+ initially defined in the context of the asynchronous Internet mail
+ environment. In this mail environment there is a need to limit the
+ number of possible media types to increase the likelihood of
+ interoperability when the capabilities of the remote mail system are
+ not known. As media types are used in new environments, where the
+ proliferation of media types is not a hindrance to interoperability,
+ the original procedure was excessively restrictive and had to be
+ generalized.
+
+2. Media Type Registration
+
+ Registration of a new media type or types starts with the
+ construction of a registration proposal. Registration may occur in
+ several different registration trees, which have different
+ requirements as discussed below. In general, the new registration
+ proposal is circulated and reviewed in a fashion appropriate to the
+ tree involved. The media type is then registered if the proposal is
+ acceptable. The following sections describe the requirements and
+ procedures used for each of the different registration trees.
+
+2.1. Registration Trees and Subtype Names
+
+ In order to increase the efficiency and flexibility of the
+ registration process, different structures of subtype names may be
+ registered to accommodate the different natural requirements for,
+ e.g., a subtype that will be recommended for wide support and
+ implementation by the Internet Community or a subtype that is used to
+ move files associated with proprietary software. The following
+ subsections define registration "trees", distinguished by the use of
+ faceted names (e.g., names of the form "tree.subtree...type"). Note
+ that some media types defined prior to this document do not conform
+ to the naming conventions described below. See Appendix A for a
+ discussion of them.
+
+2.1.1. IETF Tree
+
+ The IETF tree is intended for types of general interest to the
+ Internet Community. Registration in the IETF tree requires approval
+ by the IESG and publication of the media type registration as some
+ form of RFC.
+
+ Media types in the IETF tree are normally denoted by names that are
+ not explicitly faceted, i.e., do not contain period (".", full stop)
+ characters.
+
+ The "owner" of a media type registration in the IETF tree is assumed
+ to be the IETF itself. Modification or alteration of the
+ specification requires the same level of processing (e.g. standards
+ track) required for the initial registration.
+
+2.1.2. Vendor Tree
+
+ The vendor tree is used for media types associated with commercially
+ available products. "Vendor" or "producer" are construed as
+ equivalent and very broadly in this context.
+
+ A registration may be placed in the vendor tree by anyone who has
+ need to interchange files associated with the particular product.
+ However, the registration formally belongs to the vendor or
+ organization producing the software or file format. Changes to the
+ specification will be made at their request, as discussed in
+ subsequent sections.
+
+ Registrations in the vendor tree will be distinguished by the leading
+ facet "vnd.". That may be followed, at the discretion of the
+ registration, by either a media type name from a well-known producer
+ (e.g., "vnd.mudpie") or by an IANA-approved designation of the
+ producer's name which is then followed by a media type or product
+ designation (e.g., vnd.bigcompany.funnypictures).
+
+ While public exposure and review of media types to be registered in
+ the vendor tree is not required, using the ietf-types list for review
+ is strongly encouraged to improve the quality of those
+ specifications. Registrations in the vendor tree may be submitted
+ directly to the IANA.
+
+2.1.3. Personal or Vanity Tree
+
+ Registrations for media types created experimentally or as part of
+ products that are not distributed commercially may be registered in
+ the personal or vanity tree. The registrations are distinguished by
+ the leading facet "prs.".
+
+ The owner of "personal" registrations and associated specifications
+ is the person or entity making the registration, or one to whom
+ responsibility has been transferred as described below.
+
+ While public exposure and review of media types to be registered in
+ the personal tree is not required, using the ietf-types list for
+ review is strongly encouraged to improve the quality of those
+ specifications. Registrations in the personal tree may be submitted
+ directly to the IANA.
+
+2.1.4. Special `x.' Tree
+
+ For convenience and symmetry with this registration scheme, media
+ type names with "x." as the first facet may be used for the same
+ purposes for which names starting in "x-" are normally used. These
+ types are unregistered, experimental, and should be used only with
+ the active agreement of the parties exchanging them.
+
+ However, with the simplified registration procedures described above
+ for vendor and personal trees, it should rarely, if ever, be
+ necessary to use unregistered experimental types, and as such use of
+ both "x-" and "x." forms is discouraged.
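+
+ The tree a subtype belongs to follows from its leading facet. The
+ Python sketch below (illustrative; the function name and return
+ labels are hypothetical) classifies a subtype, or a full
+ "type/subtype" name, using the facets described in sections 2.1.1
+ through 2.1.4. Faceted names under a tree created per section 2.1.5,
+ and the grandfathered types of Appendix A, are not distinguished.
+
```python
def registration_tree(media_type: str) -> str:
    """Classify a subtype name by its leading facet (RFC 2048, 2.1)."""
    subtype = media_type.split("/", 1)[-1].lower()
    if subtype.startswith("vnd."):
        return "vendor"
    if subtype.startswith("prs."):
        return "personal"
    if subtype.startswith(("x.", "x-")):
        return "unregistered"
    if "." not in subtype:
        return "IETF"
    return "other"  # e.g. a tree created under section 2.1.5


for name in ("image/jpeg", "application/vnd.bigcompany.funnypictures",
             "prs.example", "x.experiment"):
    print(name, "->", registration_tree(name))
```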
+
+2.1.5. Additional Registration Trees
+
+ From time to time and as required by the community, the IANA may,
+ with the advice and consent of the IESG, create new top-level
+ registration trees. It is explicitly assumed that these trees may be
+ created for external registration and management by well-known
+ permanent bodies, such as scientific societies for media types
+ specific to the sciences they cover. In general, the quality of
+ review of specifications for one of these additional registration
+ trees is expected to be equivalent to that which IETF would give to
+ registrations in its own tree. Establishment of these new trees will
+ be announced through RFC publication approved by the IESG.
+
+2.2. Registration Requirements
+
+ Media type registration proposals are all expected to conform to
+ various requirements laid out in the following sections. Note that
+ requirement specifics sometimes vary depending on the registration
+ tree, again as detailed in the following sections.
+
+2.2.1. Functionality Requirement
+
+ Media types must function as an actual media format: Registration of
+ things that are better thought of as a transfer encoding, as a
+ character set, or as a collection of separate entities of another
+ type, is not allowed. For example, although applications exist to
+ decode the base64 transfer encoding [RFC 2045], base64 cannot be
+ registered as a media type.
+
+ This requirement applies regardless of the registration tree
+ involved.
+
+2.2.2. Naming Requirements
+
+ All registered media types must be assigned MIME type and subtype
+ names. The combination of these names then serves to uniquely
+ identify the media type and the format of the subtype name identifies
+ the registration tree.
+
+ The choice of top-level type name must take the nature of the media type
+ involved into account. For example, media normally used for
+ representing still images should be a subtype of the image content
+ type, whereas media capable of representing audio information belongs
+ under the audio content type. See RFC 2046 for additional information
+ on the basic set of top-level types and their characteristics.
+
+ New subtypes of top-level types must conform to the restrictions of
+ the top-level type, if any. For example, all subtypes of the
+ multipart content type must use the same encapsulation syntax.
+
+ In some cases a new media type may not "fit" under any currently
+ defined top-level content type. Such cases are expected to be quite
+ rare. However, if such a case arises a new top-level type can be
+ defined to accommodate it. Such a definition must be done via
+ standards-track RFC; no other mechanism can be used to define
+ additional top-level content types.
+
+ These requirements apply regardless of the registration tree
+ involved.
+
+2.2.3. Parameter Requirements
+
+ Media types may elect to use one or more MIME content type
+ parameters, or some parameters may be automatically made available to
+ the media type by virtue of being a subtype of a content type that
+ defines a set of parameters applicable to any of its subtypes. In
+ either case, the names, values, and meanings of any parameters must
+ be fully specified when a media type is registered in the IETF tree,
+ and should be specified as completely as possible when media types
+ are registered in the vendor or personal trees.
+
+ New parameters must not be defined as a way to introduce new
+ functionality in types registered in the IETF tree, although new
+ parameters may be added to convey additional information that does
+ not otherwise change existing functionality. An example of this
+ would be a "revision" parameter to indicate a revision level of an
+ external specification such as JPEG. Similar behavior is encouraged
+ for media types registered in the vendor or personal trees but is not
+ required.
+
+2.2.4. Canonicalization and Format Requirements
+
+ All registered media types must employ a single, canonical data
+ format, regardless of registration tree.
+
+ A precise and openly available specification of the format of each
+ media type is required for all types registered in the IETF tree and
+ must at a minimum be referenced by, if it isn't actually included in,
+ the media type registration proposal itself.
+
+ The specifications of format and processing particulars may or may
+ not be publicly available for media types registered in the vendor
+ tree, and such registration proposals are explicitly permitted to
+ include only a specification of which software and version produce or
+ process such media types. References to or inclusion of format
+ specifications in registration proposals are encouraged but not
+ required.
+
+ Format specifications are still required for registration in the
+ personal tree, but may be either published as RFCs or otherwise
+ deposited with IANA. The deposited specifications will meet the same
+ criteria as those required to register a well-known TCP port and, in
+ particular, need not be made public.
+
+ Some media types involve the use of patented technology. The
+ registration of media types involving patented technology is
+ specifically permitted. However, the restrictions set forth in RFC
+ 1602 on the use of patented technology in standards-track protocols
+ must be respected when the specification of a media type is part of a
+ standards-track protocol.
+
+2.2.5. Interchange Recommendations
+
+ Media types should, whenever possible, interoperate across as many
+ systems and applications as possible. However, some media types will
+ inevitably have problems interoperating across different platforms.
+ Problems with different versions, byte ordering, and specifics of
+ gateway handling can and will arise.
+
+ Universal interoperability of media types is not required, but known
+ interoperability issues should be identified whenever possible.
+ Publication of a media type does not require an exhaustive review of
+ interoperability, and the interoperability considerations section is
+ subject to continuing evaluation.
+
+ These recommendations apply regardless of the registration tree
+ involved.
+
+2.2.6. Security Requirements
+
+ An analysis of security issues is required for all types
+ registered in the IETF tree. (This is in accordance with the basic
+ requirements for all IETF protocols.) A similar analysis for media
+ types registered in the vendor or personal trees is encouraged but
+ not required. However, regardless of what security analysis has or
+ has not been done, all descriptions of security issues must be as
+ accurate as possible regardless of registration tree. In particular,
+ a statement that there are "no security issues associated with this
+ type" must not be confused with "the security issues associated with
+ this type have not been assessed".
+
+ There is absolutely no requirement that media types registered in any
+ tree be secure or completely free from risks. Nevertheless, all
+ known security risks must be identified in the registration of a
+ media type, again regardless of registration tree.
+
+ The security considerations section of all registrations is subject
+ to continuing evaluation and modification, and in particular may be
+ extended by use of the "comments on media types" mechanism described
+ in subsequent sections.
+
+ Some of the issues that should be looked at in a security analysis of
+ a media type are:
+
+ (1) Complex media types may include provisions for
+ directives that institute actions on a recipient's
+ files or other resources. In many cases provision is
+ made for originators to specify arbitrary actions in an
+ unrestricted fashion which may then have devastating
+ effects. See the registration of the
+ application/postscript media type in RFC 2046 for
+ an example of such directives and how to handle them.
+
+ (2) Complex media types may include provisions for
+ directives that institute actions which, while not
+ directly harmful to the recipient, may result in
+ disclosure of information that either facilitates a
+ subsequent attack or else violates a recipient's
+ privacy in some way. Again, the registration of the
+ application/postscript media type illustrates how such
+ directives can be handled.
+
+ (3) A media type might be targeted for applications that
+ require some sort of security assurance but not provide
+ the necessary security mechanisms themselves. For
+ example, a media type could be defined for storage of
+ confidential medical information which in turn requires
+ an external confidentiality service.
+
+2.2.7. Usage and Implementation Non-requirements
+
+ In the asynchronous mail environment, where information on the
+ capabilities of the remote mail agent is frequently not available to
+ the sender, maximum interoperability is attained by restricting the
+ number of media types used to those "common" formats expected to be
+ widely implemented. This was asserted in the past as a reason to
+ limit the number of possible media types and resulted in a
+ registration process with a significant hurdle and delay for those
+ registering media types.
+
+ However, the need for "common" media types does not require limiting
+ the registration of new media types. If a limited set of media types
+ is recommended for a particular application, that should be asserted
+ by a separate applicability statement specific to the application
+ and/or environment.
+
+ As such, universal support and implementation of a media type is NOT
+ a requirement for registration. If, however, a media type is
+ explicitly intended for limited use, this should be noted in its
+ registration.
+
+2.2.8. Publication Requirements
+
+ Proposals for media types registered in the IETF tree must be
+ published as RFCs. RFC publication of vendor and personal media type
+ proposals is encouraged but not required. In all cases IANA will
+ retain copies of all media type proposals and "publish" them as part
+ of the media types registration tree itself.
+
+ Other than in the IETF tree, the registration of a data type does not
+ imply endorsement, approval, or recommendation by IANA or IETF or
+ even certification that the specification is adequate. To become
+ Internet Standards, protocols, data objects, or whatever must go
+ through the IETF standards process. This is too difficult and too
+ lengthy a process for the convenient registration of media types.
+
+ The IETF tree exists for media types that do require a
+ substantive review and approval process, while the vendor and personal
+ trees exist for those that do not. It is expected that applicability
+ statements for particular applications will be published from time to
+ time that recommend implementation of, and support for, media types
+ that have proven particularly useful in those contexts.
+
+ As discussed above, registration of a top-level type requires
+ standards-track processing and, hence, RFC publication.
+
+2.2.9. Additional Information
+
+ Various sorts of optional information may be included in the
+ specification of a media type if it is available:
+
+ (1) Magic number(s) (length, octet values). Magic numbers
+ are byte sequences that are always present and thus can
+ be used to identify entities as being of a given media
+ type.
+
+ (2) File extension(s) commonly used on one or more
+ platforms to indicate that a file contains a given
+ type of media.
+
+ (3) Macintosh File Type code(s) (4 octets) used to label
+ files containing a given type of media.
+
+ Such information is often quite useful to implementors and if
+ available should be provided.
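
As an illustration of how the magic numbers in item (1) can be used, the Python sketch below guesses a media type from the leading octets of some data. The signature table and the function name sniff_media_type are assumptions made for this example. The signatures are widely used ones, but none of them is taken from a registration in this document.

```python
# Illustrative magic-number table. This is an assumption for the sketch:
# the signatures are well known but are not defined by RFC 2048 itself.
MAGIC_NUMBERS = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
]


def sniff_media_type(data: bytes, default: str = "application/octet-stream") -> str:
    """Return the media type whose magic number prefixes data, else default."""
    for magic, media_type in MAGIC_NUMBERS:
        if data.startswith(magic):
            return media_type
    # No signature matched, so fall back to the generic binary type.
    return default


if __name__ == "__main__":
    print(sniff_media_type(b"GIF89a\x01\x00"))  # image/gif
    print(sniff_media_type(b"plain bytes"))     # application/octet-stream
```

A magic number only suggests a candidate type. A receiving agent would normally still prefer an explicit Content-Type label when one is present. The fallback to application/octet-stream matches the way MIME treats unrecognized data elsewhere.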
+
+2.3. Registration Procedure
+
+ The following procedure has been implemented by the IANA for review
+ and approval of new media types. This is not a formal standards
+ process, but rather an administrative procedure intended to allow
+ community comment and sanity checking without excessive time delay.
+ For registration in the IETF tree, the normal IETF processes should
+ be followed, treating posting of an internet-draft and announcement
+ on the ietf-types list (as described in the next subsection) as a
+ first step. For registrations in the vendor or personal tree, the
+ initial review step described below may be omitted and the type
+ registered directly by submitting the template and an explanation
+ directly to IANA (at iana@iana.org). However, authors of vendor or
+ personal media type specifications are encouraged to seek community
+ review and comment whenever that is feasible.
+
+2.3.1. Present the Media Type to the Community for Review
+
+ Send a proposed media type registration to the "ietf-types@iana.org"
+ mailing list for a two week review period. This mailing list has
+ been established for the purpose of reviewing proposed media and
+ access types. Proposed media types are not formally registered and
+ must not be used; the "x-" prefix specified in RFC 2045 can be used
+ until registration is complete.
+
+ The intent of the public posting is to solicit comments and feedback
+ on the choice of type/subtype name, the unambiguity of the references
+ with respect to versions and external profiling information, and a
+ review of any interoperability or security considerations. The
+ submitter may submit a revised registration, or withdraw the
+ registration completely, at any time.
+
+2.3.2. IESG Approval
+
+ Media types registered in the IETF tree must be submitted to the IESG
+ for approval.
+
+2.3.3. IANA Registration
+
+ Provided that the media type meets the requirements for media types
+ and has obtained any approval that is necessary, the author may submit
+ the registration request to the IANA, which will register the media
+ type and make the media type registration available to the community.
+
+2.4. Comments on Media Type Registrations
+
+ Comments on registered media types may be submitted by members of the
+ community to IANA. These comments will be passed on to the "owner"
+ of the media type if possible. Submitters of comments may request
+ that their comment be attached to the media type registration itself,
+ and if IANA approves of this the comment will be made accessible in
+ conjunction with the type registration itself.
+
+2.5. Location of Registered Media Type List
+
+ Media type registrations will be posted in the anonymous FTP
+ directory "ftp://ftp.isi.edu/in-notes/iana/assignments/media-types/"
+ and all registered media types will be listed in the periodically
+ issued "Assigned Numbers" RFC [currently STD 2, RFC 1700]. The media
+ type description and other supporting material may also be published
+ as an Informational RFC by sending it to "rfc-editor@isi.edu" (please
+ follow the instructions to RFC authors [RFC-1543]).
+
+2.6. IANA Procedures for Registering Media Types
+
+ The IANA will only register media types in the IETF tree in response
+ to a communication from the IESG stating that a given registration
+ has been approved. Vendor and personal types will be registered by
+ the IANA automatically and without any formal review as long as the
+ following minimal conditions are met:
+
+ (1) Media types must function as an actual media format.
+ In particular, character sets and transfer encodings
+ may not be registered as media types.
+
+ (2) All media types must have properly formed type and
+ subtype names. All type names must be defined by a
+ standards-track RFC. All subtype names must be unique,
+ must conform to the MIME grammar for such names, and
+ must contain the proper tree prefix.
+
+ (3) Types registered in the personal tree must either
+ provide a format specification or a pointer to one.
+
+ (4) Any security considerations given must not be obviously
+ bogus. (It is neither possible nor necessary for the
+ IANA to conduct a comprehensive security review of
+ media type registrations. Nevertheless, IANA has the
+ authority to identify obviously incompetent material
+ and exclude it.)
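
The name checks in conditions (1) and (2) can be automated. The Python sketch below is an illustration built on stated assumptions, not IANA's actual procedure. It tests a proposed type/subtype pair against the RFC 2045 token grammar and reports which registration tree the subtype prefix selects: "vnd." for vendor, "prs." for personal, and no prefix for IETF. The function names and the fixed list of top-level types are hypothetical.

```python
# RFC 2045 "tspecials": characters that may not appear in a token.
TSPECIALS = set('()<>@,;:\\"/[]?=')

# Top-level types defined in RFC 2046. This fixed list is an assumption
# of the sketch; later RFCs may define more.
TOP_LEVEL_TYPES = {"text", "image", "audio", "video",
                   "application", "multipart", "message"}


def is_token(s: str) -> bool:
    """True if s is an RFC 2045 token: printable US-ASCII with no SPACE,
    no CTLs and no tspecials."""
    return bool(s) and all(33 <= ord(c) <= 126 and c not in TSPECIALS for c in s)


def registration_tree(media_type: str) -> str:
    """Classify "type/subtype" as 'ietf', 'vendor' or 'personal'.

    Raises ValueError for names that fail the minimal conditions above.
    """
    type_, sep, subtype = media_type.partition("/")
    if not sep or not is_token(type_) or not is_token(subtype):
        raise ValueError(f"malformed media type name: {media_type!r}")
    if type_.lower() not in TOP_LEVEL_TYPES:
        raise ValueError(f"top-level type not defined by an RFC: {type_!r}")
    sub = subtype.lower()
    if sub.startswith("x-"):
        # "x-" names are reserved for unregistered use.
        raise ValueError(f"x- subtypes cannot be registered: {subtype!r}")
    if sub.startswith("vnd."):
        return "vendor"
    if sub.startswith("prs."):
        return "personal"
    return "ietf"
```

Rejecting "x-" subtypes follows from RFC 2045, which sets that prefix aside for values that are never registered.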
+
+2.7. Change Control
+
+ Once a media type has been published by IANA, the author may request
+ a change to its definition. The descriptions of the different
+ registration trees above designate the "owners" of each type of
+ registration. The change request follows the same procedure as the
+ registration request:
+
+ (1) Publish the revised template on the ietf-types list.
+
+ (2) Leave at least two weeks for comments.
+
+ (3) Publish using IANA after formal review if required.
+
+ Changes should be requested only when there are serious omissions or
+ errors in the published specification. When review is required, a
+ change request may be denied if it renders entities that were valid
+ under the previous definition invalid under the new definition.
+
+ The owner of a content type may pass responsibility for the content
+ type to another person or agency by informing IANA and the ietf-types
+ list; this can be done without discussion or review.
+
+ The IESG may reassign responsibility for a media type. The most
+ common case of this will be to enable changes to be made to types
+ where the author of the registration has died, moved out of contact
+ or is otherwise unable to make changes that are important to the
+ community.
+
+ Media type registrations may not be deleted; media types which are no
+ longer believed appropriate for use can be declared OBSOLETE by a
+ change to their "intended use" field; such media types will be
+ clearly marked in the lists published by IANA.
+
+2.8. Registration Template
+
+ To: ietf-types@iana.org
+ Subject: Registration of MIME media type XXX/YYY
+
+ MIME media type name:
+
+ MIME subtype name:
+
+ Required parameters:
+
+ Optional parameters:
+
+ Encoding considerations:
+
+ Security considerations:
+
+ Interoperability considerations:
+
+ Published specification:
+
+ Applications which use this media type:
+
+ Additional information:
+
+ Magic number(s):
+ File extension(s):
+ Macintosh File Type Code(s):
+
+ Person & email address to contact for further information:
+
+ Intended usage:
+
+ (One of COMMON, LIMITED USE or OBSOLETE)
+
+ Author/Change controller:
+
+ (Any other information that the author deems interesting may be
+ added below this line.)
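
A submitter may find it convenient to generate the template mechanically. The Python sketch below uses the standard email package to assemble the fields above into a message addressed to the ietf-types list. Several details are assumptions of this example: the function name, the flattening of the "Additional information" sub-fields into ordinary lines, and the check on "Intended usage".

```python
from email.message import EmailMessage

# Field labels taken from the template above, in order. The
# "Additional information" sub-fields are flattened into plain lines.
TEMPLATE_FIELDS = [
    "Required parameters",
    "Optional parameters",
    "Encoding considerations",
    "Security considerations",
    "Interoperability considerations",
    "Published specification",
    "Applications which use this media type",
    "Magic number(s)",
    "File extension(s)",
    "Macintosh File Type Code(s)",
    "Person & email address to contact for further information",
    "Intended usage",
    "Author/Change controller",
]

INTENDED_USAGE = {"COMMON", "LIMITED USE", "OBSOLETE"}


def build_registration(type_: str, subtype: str, answers: dict) -> EmailMessage:
    """Fill in the registration template as a message to ietf-types."""
    usage = answers.get("Intended usage", "")
    if usage not in INTENDED_USAGE:
        raise ValueError(f"Intended usage must be one of {sorted(INTENDED_USAGE)}")
    lines = [f"MIME media type name: {type_}",
             f"MIME subtype name: {subtype}"]
    for field in TEMPLATE_FIELDS:
        # Unanswered fields are emitted with an empty value.
        lines.append(f"{field}: {answers.get(field, '')}".rstrip())
    msg = EmailMessage()
    msg["To"] = "ietf-types@iana.org"
    msg["Subject"] = f"Registration of MIME media type {type_}/{subtype}"
    msg.set_content("\n".join(lines) + "\n")
    return msg


if __name__ == "__main__":
    print(build_registration("application", "vnd.example.viewer",
                             {"Intended usage": "LIMITED USE"}))
```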
+
+3. External Body Access Types
+
+ RFC 2046 defines the message/external-body media type, whereby a MIME
+ entity can act as a pointer to the actual body data in lieu of
+ including the data directly in the entity body. Each
+ message/external-body reference specifies an access type, which
+ determines the mechanism used to retrieve the actual body data. RFC
+ 2046 defines an initial set of access types, but allows for the
+ registration of additional access types to accommodate new retrieval
+ mechanisms.
+
+3.1. Registration Requirements
+
+ New access type specifications must conform to a number of
+ requirements as described below.
+
+3.1.1. Naming Requirements
+
+ Each access type must have a unique name. This name appears in the
+ access-type parameter in the message/external-body content-type
+ header field, and must conform to MIME content type parameter syntax.
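
The access type travels as an ordinary Content-Type parameter, so a receiver can extract it with any MIME parser. The Python sketch below uses the standard email package (message_from_string, get_content_type, get_param). It lower-cases the result so it can be compared against the access types that RFC 2046 defines. The function name and the sample site and file names are hypothetical.

```python
import email

# Access types defined by RFC 2046. Values compare case-insensitively.
RFC2046_ACCESS_TYPES = {"ftp", "anon-ftp", "tftp", "local-file", "mail-server"}


def access_type_of(raw_message: str) -> str:
    """Return the lower-cased access-type of a message/external-body entity."""
    msg = email.message_from_string(raw_message)
    if msg.get_content_type() != "message/external-body":
        raise ValueError("not a message/external-body entity")
    access = msg.get_param("access-type")
    # get_param returns None when the parameter is absent, and a tuple
    # for RFC 2231-encoded values. This sketch rejects both cases.
    if not isinstance(access, str) or not access:
        raise ValueError("missing or unsupported access-type parameter")
    return access.lower()


if __name__ == "__main__":
    sample = (
        'Content-Type: message/external-body; access-type=ANON-FTP;\n'
        ' site="ftp.example.org"; directory="pub"; name="readme.txt"\n'
        "\n"
        "Content-Type: text/plain\n"
    )
    kind = access_type_of(sample)
    print(kind, kind in RFC2046_ACCESS_TYPES)  # anon-ftp True
```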
+
+3.1.2. Mechanism Specification Requirements
+
+ All of the protocols, transports, and procedures used by a given
+ access type must be described, either in the specification of the
+ access type itself or in some other publicly available specification,
+ in sufficient detail for the access type to be implemented by any
+ competent implementor. Use of secret and/or proprietary methods in
+ access types is expressly prohibited. The restrictions imposed by
+ RFC 1602 on the standardization of patented algorithms must be
+ respected as well.
+
+3.1.3. Publication Requirements
+
+ All access types must be described by an RFC. The RFC may be
+ informational rather than standards-track, although standards-track
+ review and approval are encouraged for all access types.
+
+3.1.4. Security Requirements
+
+ Any known security issues that arise from the use of the access type
+ must be completely and fully described. It is not required that the
+ access type be secure or that it be free from risks, but that the
+ known risks be identified. Publication of a new access type does not
+ require an exhaustive security review, and the security
+ considerations section is subject to continuing evaluation.
+ Additional security considerations should be addressed by publishing
+ revised versions of the access type specification.
+
+3.2. Registration Procedure
+
+ Registration of a new access type starts with the construction of a
+ draft of an RFC.
+
+3.2.1. Present the Access Type to the Community
+
+ Send a proposed access type specification to the
+ "ietf-types@iana.org" mailing list for a two week review period. This
+ mailing list has been established for the purpose of reviewing
+ proposed access and media types. Proposed access types are not
+ formally registered and must not be used.
+
+ The intent of the public posting is to solicit comments and feedback
+ on the access type specification and a review of any security
+ considerations.
+
+3.2.2. Access Type Reviewer
+
+ When the two week period has passed, the access type reviewer, who is
+ appointed by the IETF Applications Area Director, either forwards the
+ request to iana@isi.edu, or rejects it because of significant
+ objections raised on the list.
+
+ Decisions made by the reviewer must be posted to the ietf-types
+ mailing list within 14 days. Decisions made by the reviewer may be
+ appealed to the IESG.
+
+3.2.3. IANA Registration
+
+ Provided that the access type has either passed review or has been
+ successfully appealed to the IESG, the IANA will register the access
+ type and make the registration available to the community. The
+ specification of the access type must also be published as an RFC.
+ Informational RFCs are published by sending them to
+ "rfc-editor@isi.edu" (please follow the instructions to RFC authors
+ [RFC-1543]).
+
+3.3. Location of Registered Access Type List
+
+ Access type registrations will be posted in the anonymous FTP
+ directory "ftp://ftp.isi.edu/in-notes/iana/assignments/access-types/"
+ and all registered access types will be listed in the periodically
+ issued "Assigned Numbers" RFC [currently RFC-1700].
+
+3.4. IANA Procedures for Registering Access Types
+
+ The identity of the access type reviewer is communicated to the IANA
+ by the IESG. The IANA then only acts in response to access type
+ definitions that either are approved by the access type reviewer and
+ forwarded by the reviewer to the IANA for registration, or in
+ response to a communication from the IESG that an access type
+ definition appeal has overturned the access type reviewer's ruling.
+
+4. Transfer Encodings
+
+ Transfer encodings are transformations applied to MIME media types
+ after conversion to the media type's canonical form. Transfer
+ encodings are used for several purposes:
+
+ (1) Many transports, especially message transports, can
+ only handle data consisting of relatively short lines
+ of text. There can also be severe restrictions on what
+ characters can be used in these lines of text -- some
+ transports are restricted to a small subset of US-ASCII
+ and others cannot handle certain character sequences.
+ Transfer encodings are used to transform binary data
+ into textual form that can survive such transports.
+ Examples of this sort of transfer encoding include the
+ base64 and quoted-printable transfer encodings defined
+ in RFC 2045.
+
+ (2) Image, audio, video, and even application entities are
+ sometimes quite large. Compression algorithms are often
+ quite effective in reducing the size of large entities.
+ Transfer encodings can be used to apply general-purpose
+ non-lossy compression algorithms to MIME entities.
+
+ (3) Transfer encodings can be defined as a means of
+ representing existing encoding formats in a MIME
+ context.
+
+ IMPORTANT: The standardization of a large number of different
+ transfer encodings is seen as a significant barrier to widespread
+ interoperability and is expressly discouraged. Nevertheless, the
+ following procedure has been defined to provide a means of defining
+ additional transfer encodings, should standardization actually be
+ justified.
+
+4.1. Transfer Encoding Requirements
+
+ Transfer encoding specifications must conform to a number of
+ requirements as described below.
+
+4.1.1. Naming Requirements
+
+ Each transfer encoding must have a unique name. This name appears in
+ the Content-Transfer-Encoding header field and must conform to the
+ syntax of that field.
+
+4.1.2. Algorithm Specification Requirements
+
+ All of the algorithms used in a transfer encoding (e.g. conversion
+ to printable form, compression) must be described in their entirety
+ in the transfer encoding specification. Use of secret and/or
+ proprietary algorithms in standardized transfer encodings is
+ expressly prohibited. The restrictions imposed by RFC 1602 on the
+ standardization of patented algorithms must be respected as well.
+
+4.1.3. Input Domain Requirements
+
+ All transfer encodings must be applicable to an arbitrary sequence of
+ octets of any length. Dependence on particular input forms is not
+ allowed.
+
+ It should be noted that the 7bit and 8bit encodings do not conform to
+ this requirement. Aside from the undesirability of having
+ specialized encodings, the intent here is to forbid the addition of
+ further encodings along the lines of 7bit and 8bit.
+
+4.1.4. Output Range Requirements
+
+ There is no requirement that a particular transfer encoding produce a
+ particular form of encoded output. However, the output format for
+ each transfer encoding must be fully and completely documented. In
+ particular, each specification must clearly state whether the output
+ format always lies within the confines of 7bit data, 8bit data, or is
+ simply pure binary data.
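
RFC 2045 defines the three output classes named above precisely. 7bit data consists of lines of at most 998 octets separated by CRLF. It contains no NUL octets and no octets above 127, and CR and LF appear only as a CRLF pair. 8bit data relaxes only the restriction on octets above 127, and anything else is binary. A minimal Python sketch of that classification might look like this (the function name is hypothetical):

```python
def output_range(data: bytes) -> str:
    """Classify data as '7bit', '8bit' or 'binary' per the RFC 2045
    definitions of 7bit, 8bit and binary data."""
    has_high_octets = False
    for line in data.split(b"\r\n"):
        if len(line) > 998:
            return "binary"  # line too long for 7bit or 8bit data
        if b"\r" in line or b"\n" in line or b"\x00" in line:
            return "binary"  # bare CR or LF, or a NUL octet
        if any(octet > 127 for octet in line):
            has_high_octets = True
    return "8bit" if has_high_octets else "7bit"
```

Running the classifier over an encoder's own output is a cheap way to confirm the claim that a specification makes in this section.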
+
+4.1.5. Data Integrity and Generality Requirements
+
+ All transfer encodings must be fully invertible on any platform; it
+ must be possible for anyone to recover the original data by
+ performing the corresponding decoding operation. Note that this
+ requirement effectively excludes all forms of lossy compression as
+ well as all forms of encryption from use as a transfer encoding.
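
Requirements 4.1.3 and 4.1.5 together suggest a simple test for any candidate transfer encoding. Feed it arbitrary octet strings of arbitrary length, including the empty string, and confirm that decoding restores the input exactly. The Python sketch below applies such a randomized round-trip check to base64 from RFC 2045 and to a deliberately lossy transform. All function names are hypothetical, and a passing result is evidence of invertibility, not proof.

```python
import base64
import random


def b64_encode(data: bytes) -> bytes:
    """Base64 with 76-character lines separated by CRLF (RFC 2045 style)."""
    raw = base64.b64encode(data)
    return b"\r\n".join(raw[i:i + 76] for i in range(0, len(raw), 76))


def b64_decode(encoded: bytes) -> bytes:
    """Inverse of b64_encode: drop the line breaks, then decode."""
    return base64.b64decode(encoded.replace(b"\r\n", b""))


def strip_high_bit(data: bytes) -> bytes:
    """A lossy 'encoding' that clears bit 7 of every octet."""
    return bytes(octet & 0x7F for octet in data)


def is_invertible(encode, decode, trials: int = 200,
                  max_len: int = 300, seed: int = 0) -> bool:
    """Check decode(encode(x)) == x over arbitrary octet strings.

    A randomized check can only find counterexamples. It cannot prove
    that an encoding is invertible.
    """
    rng = random.Random(seed)
    samples = [b"", bytes(range(256))]  # empty input and every octet value
    for _ in range(trials):
        length = rng.randrange(max_len)
        samples.append(bytes(rng.randrange(256) for _ in range(length)))
    return all(decode(encode(s)) == s for s in samples)


if __name__ == "__main__":
    print(is_invertible(b64_encode, b64_decode))          # True
    print(is_invertible(strip_high_bit, lambda d: d))     # False
```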
+
+4.1.6. New Functionality Requirements
+
+ All transfer encodings must provide some sort of new functionality.
+ Some degree of functionality overlap with previously defined transfer
+ encodings is acceptable, but any new transfer encoding must also
+ offer something no other transfer encoding provides.
+
+4.2. Transfer Encoding Definition Procedure
+
+ Definition of a new transfer encoding starts with the construction of
+ a draft of a standards-track RFC. The RFC must define the transfer
+ encoding precisely and completely, and must also provide substantial
+ justification for defining and standardizing a new transfer encoding.
+ This specification must then be presented to the IESG for
+ consideration. The IESG can
+
+ (1) reject the specification outright as being
+ inappropriate for standardization,
+
+ (2) approve the formation of an IETF working group to work
+ on the specification in accordance with IETF
+ procedures, or,
+
+ (3) accept the specification as-is and put it directly on
+ the standards track.
+
+ Transfer encoding specifications on the standards track follow normal
+ IETF rules for standards track documents. A transfer encoding is
+ considered to be defined and available for use once it is on the
+ standards track.
+
+4.3. IANA Procedures for Transfer Encoding Registration
+
+ There is no need for a special procedure for registering Transfer
+ Encodings with the IANA. All legitimate transfer encoding
+ registrations must appear as a standards-track RFC, so it is the
+ IESG's responsibility to notify the IANA when a new transfer encoding
+ has been approved.
+
+4.4. Location of Registered Transfer Encodings List
+
+ Transfer encoding registrations will be posted in the anonymous FTP
+ directory "ftp://ftp.isi.edu/in-notes/iana/assignments/transfer-encodings/"
+ and all registered transfer encodings will be listed in
+ the periodically issued "Assigned Numbers" RFC [currently RFC-1700].
+
+5. Authors' Addresses
+
+ For more information, the authors of this document are best
+ contacted via Internet mail:
+
+ Ned Freed
+ Innosoft International, Inc.
+ 1050 East Garvey Avenue South
+ West Covina, CA 91790
+ USA
+
+ Phone: +1 818 919 3600
+ Fax: +1 818 919 3614
+ EMail: ned@innosoft.com
+
+
+ John Klensin
+ MCI
+ 2100 Reston Parkway
+ Reston, VA 22091
+
+ Phone: +1 703 715-7361
+ Fax: +1 703 715-7436
+ EMail: klensin@mci.net
+
+
+ Jon Postel
+ USC/Information Sciences Institute
+ 4676 Admiralty Way
+ Marina del Rey, CA 90292
+ USA
+
+
+ Phone: +1 310 822 1511
+ Fax: +1 310 823 6714
+ EMail: Postel@ISI.EDU
+
+Appendix A -- Grandfathered Media Types
+
+ A number of media types, registered prior to 1996, would, if
+ registered under the guidelines in this document, be placed into
+ either the vendor or personal trees. Reregistration of those types
+ to reflect the appropriate trees is encouraged, but not required.
+ Ownership and change control principles outlined in this document
+ apply to those types as if they had been registered in the trees
+ described above.
+
--- /dev/null
+
+
+
+
+
+
+Network Working Group N. Freed
+Request for Comments: 2049 Innosoft
+Obsoletes: 1521, 1522, 1590 N. Borenstein
+Category: Standards Track First Virtual
+ November 1996
+
+
+ Multipurpose Internet Mail Extensions
+ (MIME) Part Five:
+ Conformance Criteria and Examples
+
+Status of this Memo
+
+ This document specifies an Internet standards track protocol for the
+ Internet community, and requests discussion and suggestions for
+ improvements. Please refer to the current edition of the "Internet
+ Official Protocol Standards" (STD 1) for the standardization state
+ and status of this protocol. Distribution of this memo is unlimited.
+
+Abstract
+
+ STD 11, RFC 822, defines a message representation protocol specifying
+ considerable detail about US-ASCII message headers, and leaves the
+ message content, or message body, as flat US-ASCII text. This set of
+ documents, collectively called the Multipurpose Internet Mail
+ Extensions, or MIME, redefines the format of messages to allow for
+
+ (1) textual message bodies in character sets other than
+ US-ASCII,
+
+ (2) an extensible set of different formats for non-textual
+ message bodies,
+
+ (3) multi-part message bodies, and
+
+ (4) textual header information in character sets other than
+ US-ASCII.
+
+ These documents are based on earlier work documented in RFC 934, STD
+ 11, and RFC 1049, but extend and revise them. Because RFC 822 said
+ so little about message bodies, these documents are largely
+ orthogonal to (rather than a revision of) RFC 822.
+
+ The initial document in this set, RFC 2045, specifies the various
+ headers used to describe the structure of MIME messages. The second
+ document defines the general structure of the MIME media typing
+ system and defines an initial set of media types. The third
+ document, RFC 2047, describes extensions to RFC 822 to allow non-US-
+
+
+
+Freed & Borenstein Standards Track [Page 1]
+\f
+RFC 2049 MIME Conformance November 1996
+
+
+ ASCII text data in Internet mail header fields. The fourth document,
+ RFC 2048, specifies various IANA registration procedures for MIME-
+ related facilities. This fifth and final document describes MIME
+ conformance criteria as well as providing some illustrative examples
+ of MIME message formats, acknowledgements, and the bibliography.
+
+ These documents are revisions of RFCs 1521, 1522, and 1590, which
+ themselves were revisions of RFCs 1341 and 1342. Appendix B of this
+ document describes differences and changes from previous versions.
+
+Table of Contents
+
+ 1. Introduction .......................................... 2
+ 2. MIME Conformance ...................................... 2
+ 3. Guidelines for Sending Email Data ..................... 6
+ 4. Canonical Encoding Model .............................. 9
+ 5. Summary ............................................... 12
+ 6. Security Considerations ............................... 12
+ 7. Authors' Addresses .................................... 12
+ 8. Acknowledgements ...................................... 13
+ A. A Complex Multipart Example ........................... 15
+ B. Changes from RFC 1521, 1522, and 1590 ................. 16
+ C. References ............................................ 20
+
+1. Introduction
+
+ The first and second documents in this set define MIME header fields
+ and the initial set of MIME media types. The third document
+ describes extensions to RFC822 formats to allow for character sets
+ other than US-ASCII. This document describes what portions of MIME
+ must be supported by a conformant MIME implementation. It also
+ describes various pitfalls of contemporary messaging systems as well
+ as the canonical encoding model MIME is based on.
+
+2. MIME Conformance
+
+ The mechanisms described in these documents are open-ended. It is
+ definitely not expected that all implementations will support all
+ available media types, nor that they will all share the same
+ extensions. In order to promote interoperability, however, it is
+ useful to define the concept of "MIME-conformance" to define a
+ certain level of implementation that allows the useful interworking
+ of messages with content that differs from US-ASCII text. In this
+ section, we specify the requirements for such conformance.
+
+ A mail user agent that is MIME-conformant MUST:
+
+ (1) Always generate a "MIME-Version: 1.0" header field in
+ any message it creates.
+
+ (2) Recognize the Content-Transfer-Encoding header field
+ and decode all received data encoded by either quoted-
+ printable or base64 implementations. The identity
+ transformations 7bit, 8bit, and binary must also be
+ recognized.
+
+ Any non-7bit data that is sent without encoding must be
+ properly labelled with a content-transfer-encoding of
+ 8bit or binary, as appropriate. If the underlying
+ transport does not support 8bit or binary (as SMTP
+ [RFC-821] does not), the sender is required to both
+ encode and label data using an appropriate Content-
+ Transfer-Encoding such as quoted-printable or base64.
+
+ (3) Treat any unrecognized Content-Transfer-Encoding
+ as if it had a Content-Type of "application/octet-
+ stream", regardless of whether or not the actual
+ Content-Type is recognized.
+
+ (4) Recognize and interpret the Content-Type header field,
+ and avoid showing users raw data with a Content-Type
+ field other than text. Implementations must be able
+ to send at least text/plain messages, with the
+ character set specified with the charset parameter if
+ it is not US-ASCII.
+
+ (5) Ignore any content type parameters whose names they do
+ not recognize.
+
+ (6) Explicitly handle the following media type values, to
+ at least the following extents:
+
+ Text:
+
+ -- Recognize and display "text" mail with the
+ character set "US-ASCII."
+
+ -- Recognize other character sets at least to the
+ extent of being able to inform the user about what
+ character set the message uses.
+
+ -- Recognize the "ISO-8859-*" character sets to the
+ extent of being able to display those characters that
+ are common to ISO-8859-* and US-ASCII, namely all
+ characters represented by octet values 1-127.
+
+ -- For unrecognized subtypes in a known character
+ set, show or offer to show the user the "raw" version
+ of the data after conversion of the content from
+ canonical form to local form.
+
+ -- Treat material in an unknown character set as if
+ it were "application/octet-stream".
+
+ Image, audio, and video:
+
+ -- At a minimum provide facilities to treat any
+ unrecognized subtypes as if they were
+ "application/octet-stream".
+
+ Application:
+
+ -- Offer the ability to remove the quoted-printable or
+ base64 encoding defined in RFC 2045, if either was
+ used, and put the resulting information in a user
+ file.
+
+ Multipart:
+
+ -- Recognize the mixed subtype. Display all relevant
+ information on the message level and the body part
+ header level and then display or offer to display
+ each of the body parts individually.
+
+ -- Recognize the "alternative" subtype, and avoid
+ showing the user redundant parts of
+ multipart/alternative mail.
+
+ -- Recognize the "multipart/digest" subtype,
+ specifically using "message/rfc822" rather than
+ "text/plain" as the default media type for body parts
+ inside "multipart/digest" entities.
+
+ -- Treat any unrecognized subtypes as if they were
+ "mixed".
+
+ Message:
+
+ -- Recognize and display at least the RFC822 message
+ encapsulation (message/rfc822) in such a way as to
+ preserve any recursive structure, that is, displaying
+ or offering to display the encapsulated data in
+ accordance with its media type.
+
+ -- Treat any unrecognized subtypes as if they were
+ "application/octet-stream".
+
+ (7) Upon encountering any unrecognized Content-Type field,
+ an implementation must treat it as if it had a media
+ type of "application/octet-stream" with no parameter
+ sub-arguments. How such data are handled is up to an
+ implementation, but likely options for handling such
+ unrecognized data include offering to write it into a
+ file for the user (decoded from its mail transport
+ format) or letting the user name a program to which
+ the decoded data should be passed as input.
+
+ (8) If conformant user agents provide non-standard
+ support for non-MIME messages employing character sets
+ other than US-ASCII, they must do so on received
+ messages only. Conforming user agents must not send
+ non-MIME messages containing anything other than
+ US-ASCII text.
+
+ In particular, the use of non-US-ASCII text in mail
+ messages without a MIME-Version field is strongly
+ discouraged as it impedes interoperability when sending
+ messages between regions with different localization
+ conventions. Conforming user agents MUST include proper
+ MIME labelling when sending anything other than plain
+ text in the US-ASCII character set.
+
+ In addition, non-MIME user agents should be upgraded if
+ at all possible to include appropriate MIME header
+ information in the messages they send even if nothing
+ else in MIME is supported. This upgrade will have
+ little, if any, effect on non-MIME recipients and will
+ help MIME user agents display such messages correctly.
+ It also provides a smooth transition path to eventual
+ adoption of other MIME capabilities.
+
+ (9) Conforming user agents must ensure that any string of
+ non-white-space printable US-ASCII characters within a
+ "*text" or "*ctext" that begins with "=?" and ends with
+ "?=" be a valid encoded-word. ("begins" means: At the
+ start of the field-body or immediately following
+ linear-white-space; "ends" means: At the end of the
+ field-body or immediately preceding linear-white-
+ space.) In addition, any "word" within a "phrase" that
+ begins with "=?" and ends with "?=" must be a valid
+ encoded-word.
+
+ (10) Conforming user agents must be able to distinguish
+ encoded-words from "text", "ctext", or "word"s,
+ according to the rules in section 4, anytime they
+ appear in appropriate places in message headers. They
+ must support both the "B" and "Q" encodings for any
+ character set they support. They must be able to
+ display the unencoded text if the character set is
+ "US-ASCII". For the ISO-8859-* character sets, they
+ must at least be able to display the characters that
+ are also in the US-ASCII set.
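+
+ As an illustration of rules (9) and (10), the following Python
+ sketch recognizes and decodes encoded-words of the form
+ =?charset?encoding?encoded-text?= in an unstructured field body. It
+ is a simplified, hypothetical decoder rather than the full RFC 2047
+ grammar: it does not enforce the 75-character limit on encoded-words,
+ and it collapses runs of linear-white-space to a single SPACE. The
+ names decode_encoded_word and decode_text are assumptions of this
+ sketch.
+
```python
import base64
import binascii
import re

# Simplified encoded-word pattern: =?charset?B-or-Q?encoded-text?=
# (the 75-character limit and other RFC 2047 details are not checked).
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=")


def decode_encoded_word(word: str) -> str:
    """Decode one encoded-word, or raise ValueError if it is malformed."""
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        raise ValueError(f"not a valid encoded-word: {word!r}")
    charset, encoding, text = match.groups()
    raw = text.encode("ascii")
    if encoding in "Bb":
        data = base64.b64decode(raw, validate=True)
    else:
        # "Q" is quoted-printable in which "_" stands for SPACE.
        data = binascii.a2b_qp(raw, header=True)
    return data.decode(charset)


def decode_text(field_body: str) -> str:
    """Decode every token that begins with "=?" and ends with "?="."""
    pieces = []
    prev_encoded = False
    for token in field_body.split():  # collapses linear-white-space
        is_encoded = token.startswith("=?") and token.endswith("?=")
        text = decode_encoded_word(token) if is_encoded else token
        # White space between two adjacent encoded-words is dropped.
        if pieces and not (prev_encoded and is_encoded):
            pieces.append(" ")
        pieces.append(text)
        prev_encoded = is_encoded
    return "".join(pieces)


print(decode_text("=?ISO-8859-1?Q?Andr=E9?= Pirard"))  # André Pirard
```
+
+ The sketch raises an error on a token that looks like an
+ encoded-word but is malformed; rule (9) forbids generating such
+ tokens, and how a reader treats one it nonetheless receives is left
+ open here.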
+
+ A user agent that meets the above conditions is said to be MIME-
+ conformant. The meaning of this phrase is that it is assumed to be
+ "safe" to send virtually any kind of properly-marked data to users of
+ such mail systems, because such systems will at least be able to
+ treat the data as undifferentiated binary, and will not simply splash
+ it onto the screen of unsuspecting users.
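+
+ The fallback rules in (3), (6), and (7), which are what let such a
+ system treat unknown data as undifferentiated binary, can be
+ summarized as a single lookup. The Python sketch below is a
+ hypothetical illustration, not a required algorithm: the capability
+ tables (KNOWN_CTES, KNOWN_SUBTYPES, KNOWN_CHARSETS) describe one
+ imaginary user agent, and returning "text/plain" for an unrecognized
+ text subtype stands for "show the raw data" rather than a real change
+ of media type.
+
```python
# Hypothetical capability tables for one imaginary user agent.
KNOWN_CTES = {"7bit", "8bit", "binary", "quoted-printable", "base64"}
KNOWN_SUBTYPES = {
    "text": {"plain"},
    "image": {"gif", "jpeg"},
    "audio": {"basic"},
    "video": {"mpeg"},
    "application": {"octet-stream"},
    "multipart": {"mixed", "alternative", "digest"},
    "message": {"rfc822"},
}
KNOWN_CHARSETS = {"us-ascii", "iso-8859-1"}
OCTET_STREAM = "application/octet-stream"


def effective_type(media_type: str, cte: str = "7bit",
                   charset: str = "us-ascii") -> str:
    """Return the type/subtype to act on; media_type has no parameters."""
    if cte.lower() not in KNOWN_CTES:
        return OCTET_STREAM            # rule (3): unknown transfer encoding
    top, _, sub = media_type.lower().partition("/")
    if top not in KNOWN_SUBTYPES:
        return OCTET_STREAM            # rule (7): unknown media type
    if top == "text":
        if charset.lower() not in KNOWN_CHARSETS:
            return OCTET_STREAM        # rule (6): unknown character set
        if sub not in KNOWN_SUBTYPES["text"]:
            return "text/plain"        # rule (6): offer the "raw" text
    elif sub not in KNOWN_SUBTYPES[top]:
        # Rules (6) and (7): unknown multipart subtypes act as "mixed";
        # any other unknown subtype acts as application/octet-stream.
        return "multipart/mixed" if top == "multipart" else OCTET_STREAM
    return f"{top}/{sub}"


print(effective_type("multipart/parallel"))  # multipart/mixed
print(effective_type("image/png"))           # application/octet-stream
```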
+
+ There is another sense in which it is always "safe" to send data in a
+ format that is MIME-conformant, which is that such data will not
+ break or be broken by any known systems that are conformant with RFC
+ 821 and RFC 822. User agents that are MIME-conformant have the
+ additional guarantee that the user will not be shown data that were
+ never intended to be viewed as text.
+
+3. Guidelines for Sending Email Data
+
+ Internet email is not a perfect, homogeneous system. Mail may become
+ corrupted at several stages in its travel to a final destination.
+ Specifically, email sent throughout the Internet may travel across
+ many networking technologies. Many networking and mail technologies
+ do not support the full functionality possible in the SMTP transport
+ environment. Mail traversing these systems is likely to be modified
+ so that it can be transported.
+
+ There exist many widely-deployed non-conformant MTAs in the Internet.
+ These MTAs, speaking the SMTP protocol, alter messages on the fly to
+ take advantage of the internal data structure of the hosts they are
+ implemented on, or are just plain broken.
+
+ The following guidelines may be useful to anyone devising a data
+ format (media type) that is supposed to survive the widest range of
+ networking technologies and known broken MTAs unscathed. Note that
+ anything encoded in the base64 encoding will satisfy these rules, but
+ that some well-known mechanisms, notably the UNIX uuencode facility,
+ will not. Note also that anything encoded in the Quoted-Printable
+ encoding will survive most gateways intact, but possibly not some
+ gateways to systems that use the EBCDIC character set.
+
+ (1) Under some circumstances the encoding used for data may
+ change as part of normal gateway or user agent
+ operation. In particular, conversion from base64 to
+ quoted-printable and vice versa may be necessary. This
+ may result in the confusion of CRLF sequences with line
+ breaks in text bodies. As such, the persistence of
+ CRLF as something other than a line break must not be
+ relied on.
+
+ (2) Many systems may elect to represent and store text data
+ using local newline conventions. Local newline
+ conventions may not match the RFC822 CRLF convention --
+ systems are known that use plain CR, plain LF, CRLF, or
+ counted records. The result is that isolated CR and LF
+ characters are not well tolerated in general; they may
+ be lost or converted to delimiters on some systems, and
+ hence must not be relied on.
+
+ (3) The transmission of NULs (US-ASCII value 0) is
+ problematic in Internet mail. (This is largely the
+ result of NULs being used as a termination character by
+ many of the standard runtime library routines in the C
+ programming language.) The practice of using NULs as
+ termination characters is so entrenched now that
+ messages should not rely on them being preserved.
+
+ (4) TAB (HT) characters may be misinterpreted or may be
+ automatically converted to variable numbers of spaces.
+ This is unavoidable in some environments, notably those
+ not based on the US-ASCII character set. Such
+ conversion is STRONGLY DISCOURAGED, but it may occur,
+ and mail formats must not rely on the persistence of
+ TAB (HT) characters.
+
+ (5) Lines longer than 76 characters may be wrapped or
+ truncated in some environments. Line wrapping or line
+ truncation imposed by mail transports is STRONGLY
+ DISCOURAGED, but unavoidable in some cases.
+ Applications which require long lines must somehow
+
+ differentiate between soft and hard line breaks. (A
+ simple way to do this is to use the quoted-printable
+ encoding.)
+
+ (6) Trailing "white space" characters (SPACE, TAB (HT)) on
+ a line may be discarded by some transport agents, while
+ other transport agents may pad lines with these
+ characters so that all lines in a mail file are of
+ equal length. The persistence of trailing white space,
+ therefore, must not be relied on.
+
+ (7) Many mail domains use variations on the US-ASCII
+ character set, or use character sets such as EBCDIC
+ which contain most but not all of the US-ASCII
+ characters. The correct translation of characters not
+ in the "invariant" set cannot be depended on across
+ character converting gateways. For example, this
+ situation is a problem when sending uuencoded
+ information across BITNET, an EBCDIC system. Similar
+ problems can occur without crossing a gateway, since
+ many Internet hosts use character sets other than US-
+ ASCII internally. The definition of Printable Strings
+ in X.400 adds further restrictions in certain special
+ cases. In particular, the only characters that are
+ known to be consistent across all gateways are the 73
+ characters that correspond to the upper and lower case
+ letters A-Z and a-z, the 10 digits 0-9, and the
+ following eleven special characters:
+
+ "'" (US-ASCII decimal value 39)
+ "(" (US-ASCII decimal value 40)
+ ")" (US-ASCII decimal value 41)
+ "+" (US-ASCII decimal value 43)
+ "," (US-ASCII decimal value 44)
+ "-" (US-ASCII decimal value 45)
+ "." (US-ASCII decimal value 46)
+ "/" (US-ASCII decimal value 47)
+ ":" (US-ASCII decimal value 58)
+ "=" (US-ASCII decimal value 61)
+ "?" (US-ASCII decimal value 63)
+
+ A maximally portable mail representation will confine
+ itself to relatively short lines of text in which the
+ only meaningful characters are taken from this set of
+ 73 characters. The base64 encoding follows this rule.
+
+ (8) Some mail transport agents will corrupt data that
+ includes certain literal strings. In particular, a
+
+ period (".") alone on a line is known to be corrupted
+ by some (incorrect) SMTP implementations, and a line
+ that starts with the five characters "From " (the fifth
+ character is a SPACE) is commonly corrupted as well.
+ A careful composition agent can prevent these
+ corruptions by encoding the data (e.g., in the quoted-
+ printable encoding using "=46rom " in place of "From "
+ at the start of a line, and "=2E" in place of "." alone
+ on a line).
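+
+ The guidelines above can be turned into a rough pre-flight check.
+ The following Python sketch, assuming the body is already in
+ canonical form with CRLF line breaks, flags each hazard from items
+ (2) through (8). The function name portability_issues is
+ hypothetical, and the invariant-set test is deliberately strict:
+ ordinary text fails it because SPACE is not among the 73 characters,
+ whereas base64 output passes.
+
```python
import base64
import string

# The 73 characters listed in item (7) as consistent across all gateways.
INVARIANT = frozenset(string.ascii_letters + string.digits + "'()+,-./:=?")


def portability_issues(body: bytes) -> list[str]:
    """Report hazards in a canonical (CRLF-delimited) body."""
    issues = []
    if b"\x00" in body:
        issues.append("contains NUL")                          # item (3)
    for n, line in enumerate(body.split(b"\r\n"), start=1):
        if b"\r" in line or b"\n" in line:
            issues.append(f"line {n}: isolated CR or LF")      # item (2)
        if b"\t" in line:
            issues.append(f"line {n}: TAB")                    # item (4)
        if len(line) > 76:
            issues.append(f"line {n}: longer than 76")         # item (5)
        if line.endswith((b" ", b"\t")):
            issues.append(f"line {n}: trailing white space")   # item (6)
        if not set(line.decode("latin-1")) <= INVARIANT:
            issues.append(f"line {n}: outside invariant set")  # item (7)
        if line == b".":
            issues.append(f"line {n}: lone period")            # item (8)
        if line.startswith(b"From "):
            issues.append(f"line {n}: starts with 'From '")    # item (8)
    return issues


# base64 output stays inside the invariant set and the 76-character limit.
sample = base64.encodebytes(bytes(range(256))).replace(b"\n", b"\r\n")
print(portability_issues(sample))  # []
```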
+
+ Please note that the above list is NOT a list of recommended
+ practices for MTAs. RFC 821 MTAs are prohibited from altering the
+ character of white space or wrapping long lines. These BAD and
+ invalid practices are known to occur on established networks, and
+ implementations should be robust in dealing with the bad effects they
+ can cause.
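+
+ Item (8) above suggests protecting "From " and lone "." lines with
+ the quoted-printable encoding. A minimal Python sketch follows, under
+ the stated assumption that every input line is already printable
+ US-ASCII without trailing white space and short enough to stay within
+ 76 characters after escaping, so that the only quoted-printable rules
+ still in play are escaping "=" itself and the two risky cases. The
+ function name protect_lines is hypothetical, and its output must be
+ labelled "Content-Transfer-Encoding: quoted-printable".
+
```python
import binascii


def protect_lines(text: str) -> str:
    """Quoted-printable-escape "=", "From " at line start, and lone "."."""
    out = []
    for line in text.split("\r\n"):
        line = line.replace("=", "=3D")  # "=" must always be escaped in QP
        if line.startswith("From "):
            line = "=46rom " + line[5:]  # 0x46 is "F"
        elif line == ".":
            line = "=2E"  # 0x2E is "."
        out.append(line)
    return "\r\n".join(out)


original = "From the desk of\r\n.\r\n2+2=4"
protected = protect_lines(original)
# The standard quoted-printable decoder restores the input exactly.
print(binascii.a2b_qp(protected.encode("ascii")).decode("ascii") == original)  # True
```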
+
+4. Canonical Encoding Model
+
+ There was some confusion, in earlier versions of these documents,
+ regarding the model for when email data was to be converted to
+ canonical form and encoded, and in particular how this process would
+ affect the treatment of CRLFs, given that the representation of
+ newlines varies greatly from system to system. For this reason, a
+ canonical model for encoding is presented below.
+
+ The process of composing a MIME entity can be modeled as being done
+ in a number of steps. Note that these steps are roughly similar to
+ those steps used in PEM [RFC-1421] and are performed for each
+ "innermost level" body:
+
+ (1) Creation of local form.
+
+ The body to be transmitted is created in the system's
+ native format. The native character set is used and,
+ where appropriate, local end of line conventions are
+ used as well. The body may be a UNIX-style text file,
+ or a Sun raster image, or a VMS indexed file, or audio
+ data in a system-dependent format stored only in
+ memory, or anything else that corresponds to the local
+ model for the representation of some form of
+ information. Fundamentally, the data is created in the
+ "native" form that corresponds to the type specified by
+ the media type.
+
+ (2) Conversion to canonical form.
+
+ The entire body, including "out-of-band" information
+ such as record lengths and possibly file attribute
+ information, is converted to a universal canonical
+ form. The specific media type of the body as well as
+ its associated attributes dictate the nature of the
+ canonical form that is used. Conversion to the proper
+ canonical form may involve character set conversion,
+ transformation of audio data, compression, or various
+ other operations specific to the various media types.
+ If character set conversion is involved, however, care
+ must be taken to understand the semantics of the media
+ type, which may have strong implications for any
+ character set conversion, e.g. with regard to
+ syntactically meaningful characters in a text subtype
+ other than "plain".
+
+ For example, in the case of text/plain data, the text
+ must be converted to a supported character set and
+ lines must be delimited with CRLF delimiters in
+ accordance with RFC 822. Note that the restriction on
+ line lengths implied by RFC 822 is eliminated if the
+ next step employs either quoted-printable or base64
+ encoding.
+
+ (3) Apply transfer encoding.
+
+ A Content-Transfer-Encoding appropriate for this body
+ is applied. Note that there is no fixed relationship
+ between the media type and the transfer encoding. In
+ particular, it may be appropriate to base the choice of
+ base64 or quoted-printable on character frequency
+ counts which are specific to a given instance of a
+ body.
+
+ (4) Insertion into entity.
+
+ The encoded body is inserted into a MIME entity with
+ appropriate headers. The entity is then inserted into
+ the body of a higher-level entity (message or
+ multipart) as needed.
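+
+ The four steps can be traced for a small text/plain body. In the
+ following Python sketch, which is illustrative only, the local form
+ is assumed to be a UNIX-style string with LF line endings, the
+ charset defaults to ISO-8859-1, and selecting base64 when more than
+ one third of the octets are non-ASCII is an arbitrary, assumed
+ threshold standing in for the "character frequency counts" of step
+ (3). The function name to_entity is hypothetical.
+
```python
import base64
import quopri


def to_entity(local_text: str, charset: str = "iso-8859-1") -> bytes:
    """Model steps (1)-(4) for a text/plain body (illustrative only)."""
    # (1) Local form: local_text uses LF ("\n") as its line ending.
    # (2) Canonical form: convert the charset and use CRLF delimiters.
    canonical = local_text.replace("\n", "\r\n").encode(charset)
    # (3) Transfer encoding chosen from a character frequency count
    #     (the one-third threshold is an assumption of this sketch).
    non_ascii = sum(1 for octet in canonical if octet > 127)
    if non_ascii * 3 > len(canonical):
        cte = "base64"
        body = base64.encodebytes(canonical).replace(b"\n", b"\r\n")
    else:
        cte = "quoted-printable"
        body = quopri.encodestring(canonical)
    # (4) Insertion into an entity with appropriate header fields.
    header = (
        f"Content-Type: text/plain; charset={charset}\r\n"
        f"Content-Transfer-Encoding: {cte}\r\n"
        "\r\n"
    ).encode("ascii")
    return header + body


print(to_entity("Caf\u00e9 menu\nsoup\n").decode("ascii"))
```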
+
+ Conversion from entity form to local form is accomplished by
+ reversing these steps. Note that reversal of these steps may produce
+ differing results since there is no guarantee that the original and
+ final local forms are the same.
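+
+ A small Python sketch shows why the reversed steps need not
+ reproduce the original local form. It assumes a hypothetical sender
+ whose local text may end lines with either LF or CRLF; both map to
+ the same canonical CRLF, so a receiver that restores LF line endings
+ cannot tell which the sender started with. The helper names
+ canonicalize and to_local are illustrative.
+
```python
import base64


def canonicalize(local: str, charset: str) -> bytes:
    # Hypothetical local convention: a line may end in LF or CRLF;
    # either way it becomes a single CRLF in canonical form.
    return local.replace("\r\n", "\n").replace("\n", "\r\n").encode(charset)


def to_local(encoded: bytes, charset: str) -> str:
    # Reverse the steps: undo base64, undo the charset, restore LF.
    return base64.b64decode(encoded).decode(charset).replace("\r\n", "\n")


original = "line one\r\nline two\r\n"  # a local file that already held CRLF
sent = base64.b64encode(canonicalize(original, "us-ascii"))
received = to_local(sent, "us-ascii")
print(received == original)  # False: the receiver gets LF-only lines
```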
+
+ It is vital to note that these steps are only a model; they are
+ specifically NOT a blueprint for how an actual system would be built.
+ In particular, the model fails to account for two common designs:
+
+ (1) In many cases the conversion to a canonical form prior
+ to encoding will be subsumed into the encoder itself,
+ which understands local formats directly. For example,
+ the local newline convention for text bodies might be
+ carried through to the encoder itself along with
+ knowledge of what that format is.
+
+ (2) The output of the encoders may have to pass through one
+ or more additional steps prior to being transmitted as
+ a message. As such, the output of the encoder may not
+ be conformant with the formats specified by RFC 822.
+ In particular, once again it may be appropriate for the
+ converter's output to be expressed using local newline
+ conventions rather than using the standard RFC 822 CRLF
+ delimiters.
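+
+ To make design (1) concrete, the Python sketch below contrasts a
+ literal reading of the model (canonicalize the whole body, then
+ base64 it) with a hypothetical streaming encoder that understands
+ the local LF convention itself and inserts CRs as it encodes. Both
+ functions, and the choice of UTF-8 as the canonical charset, are
+ assumptions for illustration; the point is that the fused encoder's
+ output is identical to the model's.
+
```python
import base64
from typing import Iterable, Iterator


def model_encode(local_text: str) -> bytes:
    """Literal model: canonicalize first (CRLF, UTF-8), then base64."""
    return base64.b64encode(local_text.replace("\n", "\r\n").encode("utf-8"))


def fused_encode(lines: Iterable[str]) -> Iterator[bytes]:
    """Streaming encoder that canonicalizes LF lines on the fly."""
    pending = b""
    for line in lines:
        pending += line.replace("\n", "\r\n").encode("utf-8")
        # base64 maps each 3-octet group independently, so any prefix
        # whose length is a multiple of 3 can be emitted immediately.
        cut = len(pending) - len(pending) % 3
        if cut:
            yield base64.b64encode(pending[:cut])
            pending = pending[cut:]
    if pending:
        yield base64.b64encode(pending)  # final group, possibly padded


text = "h\u00e9llo\nworld\n"
streamed = b"".join(fused_encode(text.splitlines(keepends=True)))
print(streamed == model_encode(text))  # True
```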
+
+ Other implementation variations are conceivable as well. The vital
+ aspect of this discussion is that, in spite of any optimizations,
+ collapsings of required steps, or insertion of additional processing,
+ the resulting messages must be consistent with those produced by the
+ model described here. For example, a message with the following
+ header fields:
+
+ Content-type: text/foo; charset=bar
+ Content-Transfer-Encoding: base64
+
+ must be first represented in the text/foo form, then (if necessary)
+ represented in the "bar" character set, and finally transformed via
+ the base64 algorithm into a mail-safe form.
+
+ NOTE: Some confusion has been caused by systems that represent
+ messages in a format which uses local newline conventions which
+ differ from the RFC822 CRLF convention. It is important to note that
+ these formats are not canonical RFC822/MIME. These formats are
+ instead *encodings* of RFC822, where CRLF sequences in the canonical
+ representation of the message are encoded as the local newline
+ convention. Note that formats which encode CRLF sequences as, for
+ example, LF are not capable of representing MIME messages containing
+ binary data which contains LF octets not part of CRLF line separation
+ sequences.
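+
+ The limitation just described can be demonstrated in a few lines of
+ Python. The sketch assumes a hypothetical local store that encodes
+ each canonical CRLF as a bare LF and maps every LF back to CRLF on
+ reading; the round trip is exact for text but not for a binary body
+ containing an LF octet outside a CRLF pair. The names store_locally
+ and load_canonical are illustrative.
+
```python
def store_locally(canonical: bytes) -> bytes:
    # Hypothetical local store: each CRLF is kept as a bare LF.
    return canonical.replace(b"\r\n", b"\n")


def load_canonical(stored: bytes) -> bytes:
    # Reading back: every LF is taken to be a line break, i.e. CRLF.
    return stored.replace(b"\n", b"\r\n")


text_body = b"hello\r\nworld\r\n"
binary_body = b"\x00\x0a\x01\r\n"  # one LF octet is not part of a CRLF

print(load_canonical(store_locally(text_body)) == text_body)      # True
print(load_canonical(store_locally(binary_body)) == binary_body)  # False
```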
+
+5. Summary
+
+ This document defines what is meant by MIME Conformance. It also
+ details various problems known to exist in the Internet email system
+ and how to use MIME to overcome them. Finally, it describes MIME's
+ canonical encoding model.
+
+6. Security Considerations
+
+ Security issues are discussed in the second document in this set, RFC
+ 2046.
+
+7. Authors' Addresses
+
+ For more information, the authors of this document are best contacted
+ via Internet mail:
+
+ Ned Freed
+ Innosoft International, Inc.
+ 1050 East Garvey Avenue South
+ West Covina, CA 91790
+ USA
+
+ Phone: +1 818 919 3600
+ Fax: +1 818 919 3614
+ EMail: ned@innosoft.com
+
+ Nathaniel S. Borenstein
+ First Virtual Holdings
+ 25 Washington Avenue
+ Morristown, NJ 07960
+ USA
+
+ Phone: +1 201 540 8967
+ Fax: +1 201 993 3032
+ EMail: nsb@nsb.fv.com
+
+ MIME is a result of the work of the Internet Engineering Task Force
+ Working Group on RFC 822 Extensions. The chairman of that group,
+ Greg Vaudreuil, may be reached at:
+
+ Gregory M. Vaudreuil
+ Octel Network Services
+ 17080 Dallas Parkway
+ Dallas, TX 75248-1905
+ USA
+
+ EMail: Greg.Vaudreuil@Octel.Com
+
+8. Acknowledgements
+
+ This document is the result of th