MIME content-transfer-encoding type "Hexa" - mime-types

I recently ran across a piece of spam-mail where most of the attachments had a content-transfer-encoding of Hexa.
What is this? Or what is it supposed to be?
The content of these attachments appears to actually be Base64 encoded.
After quite a bit of web searching, I can't find any documentation about this encoding. I'd ordinarily just assume that it's bogus, but Gmail seemed to have no problem decoding it.

tl;dr: "Hexa" is an invalid Content-Transfer-Encoding value. Your spammer is sending broken emails.
There are only five valid values for the Content-Transfer-Encoding header: "7bit", "8bit", "base64", "quoted-printable" and "binary". (Private implementations can use other values with an "X-" prefix but obviously no other implementation will recognise these.)
This was originally specified way back in 1992 in RFC 1341 (since superseded by RFC 2045), but the list hasn't changed since then. As that RFC points out:
The definition of new content-transfer-encodings is explicitly discouraged
So you'll find the same five values described in modern documentation of the header, e.g. IBM's.
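The rule above is easy to check mechanically. Below is a minimal Python sketch, assuming you only need to classify the header value; `classify_cte` and `looks_like_base64` are hypothetical helper names, and the base64 sniffing is only a guess at how a lenient client such as Gmail might cope with a bogus value like Hexa, not documented behavior.

```python
import base64
import binascii

# The five Content-Transfer-Encoding values defined by MIME
# (RFC 1341, carried forward unchanged into RFC 2045).
STANDARD_CTES = {"7bit", "8bit", "binary", "quoted-printable", "base64"}


def classify_cte(value: str) -> str:
    """Return 'standard', 'private' (X- token) or 'invalid'."""
    token = value.strip().lower()  # CTE values are case-insensitive
    if token in STANDARD_CTES:
        return "standard"
    if token.startswith("x-"):
        return "private"
    return "invalid"


def looks_like_base64(body: str) -> bool:
    """Heuristic only: does the body decode cleanly as strict base64?"""
    compact = "".join(body.split())  # drop line breaks and other whitespace
    if not compact:
        return False
    try:
        base64.b64decode(compact, validate=True)
    except binascii.Error:
        return False
    return True
```

With this, "Content-Transfer-Encoding: Hexa" classifies as invalid; whether the receiver then falls back to sniffing the body is a policy decision, not something the RFC sanctions.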

Related

Correct MIME type for data URL

I have an image which is encoded as a Data URL (RFC 2397, formerly "data URI", commonly used in browsers), so it looks like "...". It has its own media type and encoding specified internally, but is itself an ASCII-compatible string. What's the right content type to use to describe data in this already-wrapped format?
I know I could unwrap it from the data URL format and use the underlying content type (in this case image/jpeg), but for reasons outside the scope of this question, that's complicated in my scenario. Plus, this format should have its own content type, right?
Since there might not be a correct answer to this question yet, and this SO question might itself define a reasonable answer for others asking this question, I'll document here my own proposal that I'm using:
application/dataurl
This is inspired by the accepted MIME type for JSON, which is application/json. This seems appropriate to me as both are data formats which can wrap arbitrary content.
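For comparison, the unwrapping route mentioned in the question is mechanical. The minimal Python sketch below follows the RFC 2397 syntax `data:[<mediatype>][;base64],<data>`; `parse_data_url` is a hypothetical helper, and it does not validate the media type itself.

```python
import base64
from urllib.parse import unquote_to_bytes


def parse_data_url(url: str) -> tuple[str, bytes]:
    """Split an RFC 2397 data URL into (media type, decoded payload)."""
    if not url.startswith("data:"):
        raise ValueError("not a data URL")
    header, sep, payload = url[len("data:"):].partition(",")
    if not sep:
        raise ValueError("missing ',' between header and data")
    params = header.split(";")
    # A trailing ";base64" token marks base64-encoded data.
    is_base64 = len(params) > 1 and params[-1].lower() == "base64"
    if is_base64:
        params = params[:-1]
    # An omitted media type defaults to text/plain;charset=US-ASCII.
    mime = params[0] or "text/plain"
    extra = params[1:]
    if not params[0] and not extra:
        extra = ["charset=US-ASCII"]
    media_type = ";".join([mime] + extra)
    raw = unquote_to_bytes(payload)  # undo %-escapes first
    data = base64.b64decode(raw) if is_base64 else raw
    return media_type, data
```

For the image in the question, the first element of the result would be image/jpeg, which is the type you would otherwise send if unwrapping were practical.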

What is the correct Protobuf content type?

JSON has application/json as a standard. For protobuf some people use application/x-protobuf, but I saw something as odd as application/vnd.google.protobuf being proposed. Do we have an RFC or some other standard that I can use as a reference for this?
There's an expired IETF proposal that suggests application/protobuf. It does not address the question of how the receiving side could determine the particular message type. Previous discussions suggested using a parameter to specify package and message, e.g. application/protobuf; proto=org.some.Message
In practice, the types you listed seem to be indeed the ones in use, for example the monitoring system Prometheus uses application/vnd.google.protobuf, and the Charles web debugging proxy recognizes application/x-protobuf; messageType="x.y.Z".
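If you do adopt a parameter for the message type, the receiver has to parse it back out. A minimal Python sketch using the standard `email.message.Message` header parser; `protobuf_message_type` is a hypothetical helper, and looking for both `proto` (from the draft discussion) and `messageType` (as Charles uses) is an assumption, not a standard.

```python
from email.message import Message
from typing import Optional, Tuple


def protobuf_message_type(content_type: str) -> Tuple[str, Optional[str]]:
    """Return (media type, fully qualified message name or None)."""
    msg = Message()
    msg["Content-Type"] = content_type  # reuse the stdlib MIME header parser
    # Parameter names are matched case-insensitively; quotes are stripped.
    name = msg.get_param("proto") or msg.get_param("messageType")
    return msg.get_content_type(), name
```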
At the risk of being overly pedantic, the other answers only make sense if we're assuming that "protobuf content type" means the standard wire format for protos.
Content types should map to encoding schemes, and there are multiple encoding schemes for protos. The wire format is the most common and important one, but there's also text format, and potentially others. That being said, I am unable to find any standard content type for proto text format, so I don't know of any other options to add here.
Protos are just a way of describing schema, and are not tightly coupled with any particular way of encoding data in that schema.
The Content-Type representation header is used to indicate the original media type of the resource (prior to any content encoding applied for sending). Protobuf, meanwhile, is a serialization/deserialization schema language and library.

What is the correct mime type for esoteric languages

What is the correct MIME type for esoteric languages?
I've googled everywhere, I even tried to ask Chuck Norris, but I didn't find the answer anywhere.
I have tried these for Brainfuck:
application/brainfuck
application/x-brainfuck
application/x+brainfuck
x-esoteric/x-brainfuck
chuck-norris-choice/brainfuck
x-you-lost-the-game/x-fuck-your-brain
42/++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.
But none of them seemed to work.
As far as I'm aware, there is no 'official' media type for brainfuck (the official types are listed in the IANA media types registry). You are of course free to make up your own without officially registering the type, but you should take a few things into consideration before choosing what name to use. All the information you need is in RFC 2046. I'll discuss the relevant parts below.
Top Level Media Type
As far as I can see, the two options you might choose from are text and application:
text
According to Section 3:
The subtype "plain" in particular indicates plain text containing no formatting commands or directives of any sort. Plain text is intended to be displayed "as-is". No special software is required to get the full meaning of the text, aside from support for the indicated character set.
If you intend for the data to be displayed rather than interpreted by an application, I would use this.
Section 4.1.4 mentions the following about unrecognised subtypes:
Unrecognized subtypes of "text" should be treated as subtype "plain" as long as the MIME implementation knows how to handle the charset.
Setting your top level media type to text will ensure that compliant applications that do not recognise the full type will still render the data as text.
application
If you intend your data to be interpreted or processed further, you should use the application top-level media type. As in the argument above, if you label your data as application, any programs that receive it are more likely to behave in a sensible fashion.
Section 4.5.3 deals with unrecognised application types:
It is expected that many other subtypes of "application" will be defined in the future. MIME implementations must at a minimum treat any unrecognized subtypes as being equivalent to "application/octet-stream".
Reading the appropriate section (Section 4.5.1) we find out how applications are supposed to handle octet streams:
The recommended action for an implementation that receives an "application/octet-stream" entity is to simply offer to put the data in a file, with any Content-Transfer-Encoding undone, or perhaps to use it as input to a user-specified process.
If this seems like the most logical way to handle your data when it is unrecognised, then application is for you.
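To make these fallback rules concrete, here is a minimal Python sketch of how a receiving program might pick a handler. `effective_type` and `KNOWN_TYPES` are hypothetical names, the contents of `KNOWN_TYPES` stand in for whatever a given client actually understands, and using `codecs.lookup` as the test for "knows how to handle the charset" is an assumption of this sketch.

```python
import codecs
from typing import Optional

# Hypothetical: the media types this particular client knows how to handle.
KNOWN_TYPES = {"text/plain", "text/html", "application/octet-stream"}


def effective_type(media_type: str, charset: Optional[str] = None) -> str:
    """Pick the type to handle the data as, following RFC 2046 fallbacks."""
    mt = media_type.strip().lower()
    if mt in KNOWN_TYPES:
        return mt
    top = mt.partition("/")[0]
    if top == "text":
        # Section 4.1.4: unknown text subtypes are shown as text/plain, provided
        # the charset is understood; otherwise treat as application/octet-stream.
        if charset is None:
            return "text/plain"  # default charset assumed to be supported here
        try:
            codecs.lookup(charset)  # stand-in for "knows how to handle the charset"
        except LookupError:
            return "application/octet-stream"
        return "text/plain"
    if top == "application":
        # Section 4.5.3: unknown application subtypes act as application/octet-stream.
        return "application/octet-stream"
    return mt  # other top-level types are outside this sketch
```

Under these rules a text/ label falls back to plain-text display and an application/ label falls back to the save-or-pipe handling of application/octet-stream, which is exactly the distinction between the two options.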
Sub-type
Choosing the subtype is much easier. Section 6 covers experimental media types:
A media type value beginning with the characters "X-" is a private value, to be used by consenting systems by mutual agreement. Any format without a rigorous and public definition must be named with an "X-" prefix, and publicly specified values shall never begin with "X-".
So your subtype should be X-brainfuck.
Summary
You have two options:
1. text/X-brainfuck
2. application/X-brainfuck
If you intend for applications to treat the data as plain text and display it, choose 1. If you intend the data to be interpreted or executed, choose 2. If you're unsure what you want to happen, choose 2, because the default expectation is that an application will prompt the user for what to do if it does not recognise the type.
I have no clue why you think application/... is an appropriate MIME type for a text file.
One generally accepted MIME type for .bf is text/x-brainfuck. This is a language, not an executable.

What is the MIME type for Markdown?

Does anyone know if there exists a MIME type for Markdown? I guess it is text/plain, but is there a more specific one?
tl;dr: text/markdown since March 2016
In March 2016, text/markdown was registered in RFC 7763 at the IETF.
Previously, it should have been text/x-markdown. The text below describes the situation before March 2016, when RFC7763 was still a draft.
There is no official recommendation on Gruber’s definition, but the topic was discussed quite heavily on the official mailing list, and the discussion settled on text/x-markdown.
This conclusion was later challenged, but it was confirmed and can, IMO, be considered consensus.
This is the only logical conclusion in the absence of an official MIME type: text/ because it provides a proper default almost everywhere, x- because we're not using an official type, and markdown rather than gruber. or anything else because the format is now so common.
There are still unknowns regarding the different “flavors” of Markdown, though. I guess someone should register an official type, which is supposedly easy, but I doubt anyone dares do it beyond John Gruber, as he very recently proved his attachment to Markdown.
There is a draft on the IETF for text/markdown, but the contents do not seem to describe Markdown at all, so I wouldn't use it until it gets more complete.
There is no official standard type, but text/markdown seems to be the most common de facto type. Most browsers and other reasonably sophisticated clients will likely see the text/ part and default to text/plain anyway, so there's not much difference.
One caveat, though: all types under the text/ hierarchy default to ISO-8859-1 for their character set when served over HTTP (per RFC 2616). Most of the world has since moved on to UTF-8. So unless you're positive you won't be using any funny characters (or live in an old Windows world), you might want to specify it as follows:
text/markdown; charset=UTF-8
Looks like text/markdown is going to be the standard.
http://www.iana.org/go/draft-ietf-appsawg-text-markdown
https://www.iana.org/assignments/media-types/media-types.xhtml
Search for markdown.
According to RFC7763 “The text/markdown type” from 2016, the general MIME type is
text/markdown; charset=UTF-8
where the charset parameter is required but need not be UTF-8.
That RFC also specifies an optional variant parameter, and the Internet Assigned Numbers Authority maintains a registry of Markdown Variants by which the specific variant of Markdown can be specified, e.g.,
text/markdown; charset=UTF-8; variant=Original
text/markdown; charset=UTF-8; variant=GFM
text/markdown; charset=UTF-8; variant=CommonMark
Some variants allow further parameters, as specified in RFC7764 “Guidance on Markdown”; e.g., you could add extensions=-startnum with the pandoc variant to specify a tweak to the dialect, although I do not know how or whether pandoc might actually interpret that.
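A receiver can pull these parameters out with Python's standard `email.message.Message` header parsing. This is a minimal sketch; `markdown_params` is a hypothetical helper, and rejecting a missing charset mirrors the "required" status in RFC 7763 rather than what most clients actually do.

```python
from email.message import Message
from typing import Optional, Tuple


def markdown_params(content_type: str) -> Tuple[str, Optional[str]]:
    """Return (charset, variant) from a text/markdown Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type  # reuse the stdlib MIME header parser
    if msg.get_content_type() != "text/markdown":
        raise ValueError("not a text/markdown Content-Type")
    charset = msg.get_param("charset")
    if charset is None:
        # RFC 7763 marks charset as required for text/markdown.
        raise ValueError("text/markdown requires a charset parameter")
    variant = msg.get_param("variant")  # optional; None when absent
    return charset, variant
```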
Why is the character set required?
RFC2046 “MIME Part Two” from 1996 set US-ASCII as the default character set, but also said:
The specification for any future subtypes of "text" must specify whether or not they will also utilize a "charset" parameter, and may possibly restrict its values as well.
Then RFC2616 “HTTP/1.1” from 1999 specified ISO-8859-1 as the default character set for text/* transported over HTTP, and with the web becoming a dominant mode of communication, this became the presumed default encoding for text/* media types.
Without an explicit character set or a registered media-type-specific default, text/* is considered to be US-ASCII, unless it is transported over HTTP, in which case it is considered to be ISO-8859-1.
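That fallback order can be written as a short function. This is a sketch of the legacy rules exactly as stated here (RFC 7231 later removed the ISO-8859-1 default from HTTP), it ignores media-type-specific defaults, and `effective_text_charset` with its `over_http` flag is a hypothetical name.

```python
from email.message import Message


def effective_text_charset(content_type: str, over_http: bool) -> str:
    """Charset a legacy reader assumes for a text/* body (rules as described above)."""
    msg = Message()
    msg["Content-Type"] = content_type
    explicit = msg.get_content_charset()  # lower-cased charset parameter, or None
    if explicit:
        return explicit
    # No charset parameter: RFC 2616 default over HTTP, RFC 2046 default otherwise.
    return "iso-8859-1" if over_http else "us-ascii"
```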
RFC 6657 “Update to MIME regarding "charset" Parameter Handling in Textual Media Types” attempted to clarify this discrepancy by requiring all new text/* media type registrations to explicitly specify how to determine the character set, preferably by including it in the payload as HTML allows with <meta charset=UTF-8>.
The text/markdown registration specifies the charset parameter as “Required.” Therefore a Content-Type of bare text/markdown, with no charset, is technically invalid, and the character set of such content may legitimately be interpreted as any of undefined, invalid, US-ASCII, ISO-8859-1, or the UTF-8 that in practice it will almost always be.
Found this thread from 2008: http://www.mail-archive.com/markdown-discuss#six.pairlist.net/msg00973.html
It seems the MIME type text/vnd.daringfireball.markdown should be registered by the author of Markdown; until then, the Markdown MIME type can be specified as text/x-markdown.

What is the possible mimetype hierarchy of an email message?

I'm working with a snippet of code that recursively calls itself and tries to pull out a MIME Type part of text/html from an email (if it exists) for further processing.
The "text/html" could exist inside other content such as multipart/alternative, so I'm trying to find out if there is a defined hierarchy for email MIME Types.
Anybody know if there is and what it is? i.e. what types can parent other types?
In theory, only multipart/ and message/ can parent other types (per RFC 2046).
Your question assumes that mail clients follow the RFC standards for MIME encoding, which they don't. I'd advise you to collect a bunch of mail from real-world sources and try to process it as it exists. The problem you are facing is extremely difficult (perhaps impossible) to solve 100%.
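For the recursive search in the question, a depth-first walk that only descends into container parts is enough. Below is a minimal Python sketch using the standard `email` package; `find_html_part` and `html_source` are hypothetical names. `Message.is_multipart()` is true for multipart/* parts and also for message/rfc822 parts (whose payload is the enclosed message), so both container kinds from RFC 2046 are covered, and the walk does not assume any particular nesting order.

```python
from email import message_from_string
from email.message import Message
from typing import Optional


def find_html_part(part: Message) -> Optional[Message]:
    """Depth-first search for the first text/html part."""
    if part.get_content_type() == "text/html":
        return part
    if part.is_multipart():  # multipart/* and message/rfc822 containers
        for child in part.get_payload():
            found = find_html_part(child)
            if found is not None:
                return found
    return None


def html_source(raw: str) -> Optional[str]:
    """Parse a raw message and return the first text/html payload as stored."""
    part = find_html_part(message_from_string(raw))
    # get_payload() keeps any transfer encoding; pass decode=True to get bytes.
    return None if part is None else part.get_payload()
```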
