Correct MIME type for data URL

I have an image encoded as a Data URL (RFC 2397; sometimes called a "data URI", and commonly used in browsers), so it looks like "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQA...". It has its own media type and encoding specified internally, but is itself an ASCII-compatible string. What's the right content type to use to describe data in this already-wrapped format?
I know I could unwrap it from the data-url format and use the underlying content-type (in this case image/jpeg), but for reasons out-of-scope of this question, that's complicated in my scenario. Plus, this format should have its own content type, right?

Since there might not be a correct answer to this question yet, and this SO question might itself end up defining a reasonable answer for others, I'll document here the proposal I'm using:
application/dataurl
This is inspired by the accepted MIME-type for JSON, which is application/json. This seems appropriate to me as both are data formats which can wrap arbitrary content.
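For completeness, here is a minimal Python sketch of the unwrapping step the question mentions, in case anyone reading this does want to recover the underlying media type and bytes. It handles the plain and base64 forms of RFC 2397 but not extra media-type parameters such as `;charset=`.

```python
import base64
import re
from urllib.parse import unquote_to_bytes

def parse_data_url(url):
    """Split an RFC 2397 data URL into (media type, decoded bytes).

    A minimal sketch: handles the plain and base64 forms only.
    """
    match = re.match(r"data:([^;,]+)?(;base64)?,(.*)", url, re.DOTALL)
    if match is None:
        raise ValueError("not a data URL")
    media_type = match.group(1) or "text/plain"  # RFC 2397 default
    if match.group(2):  # ";base64" flag present
        return media_type, base64.b64decode(match.group(3))
    return media_type, unquote_to_bytes(match.group(3))

mime, data = parse_data_url("data:image/jpeg;base64,/9j/4AA=")
# mime is "image/jpeg"; data begins with the JPEG magic bytes FF D8 FF
```

(Recent Python versions can also open `data:` URLs directly with `urllib.request.urlopen`, which performs the same unwrapping.)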

Related

MIME content-transfer-encoding type "Hexa"

I recently ran across a piece of spam-mail where most of the attachments had a content-transfer-encoding of Hexa.
What is this? Or what is it supposed to be?
The content of these attachments appears to actually be Base64 encoded.
After quite a bit of web searching, I can't find any documentation about this encoding. I'd ordinarily just assume that it's bogus, but GMail seemed to have no problem decoding it.
tl;dr: "Hexa" is an invalid Content-Transfer-Encoding value. Your spammer is sending broken emails.
There are only five valid values for the Content-Transfer-Encoding header: "7bit", "8bit", "base64", "quoted-printable" and "binary". (Private implementations can use other values with an "X-" prefix but obviously no other implementation will recognise these.)
This was originally specified way back in 1992 in RFC 1341, but it hasn't changed since then. As that RFC points out:
The definition of new content-transfer-encodings is explicitly discouraged
So you'll find the same five values described in modern documentation of the header, e.g. IBM's.
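To make the rule concrete, here is a small Python sketch using the stdlib `email` package that flags Content-Transfer-Encoding values (like the spammer's "Hexa") that fall outside the five valid values and the "X-" escape hatch:

```python
from email import message_from_string

# The five values permitted by RFC 1341 and its successors
VALID_CTE = {"7bit", "8bit", "base64", "quoted-printable", "binary"}

def invalid_ctes(raw_message):
    """Return any Content-Transfer-Encoding values in the message
    (or its MIME parts) that standards-compliant software is not
    required to understand."""
    msg = message_from_string(raw_message)
    bad = []
    for part in msg.walk():
        cte = part.get("Content-Transfer-Encoding", "7bit").lower()
        if cte not in VALID_CTE and not cte.startswith("x-"):
            bad.append(cte)
    return bad

raw = "Content-Transfer-Encoding: Hexa\n\nZGF0YQ=="
# invalid_ctes(raw) reports the bogus "hexa" value
```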

How to parse JSON-LD data in java and turn it into a java object

I don't know what structure the JSON-LD uploaded by the client will have, but it needs to be converted into a well-defined object.
Is it possible to convert such data into an object with a well-defined format? Does anyone have a solution?
If you don't know the structure of the object beforehand, you will probably have to use some generic structure to hold the data in Java. If you use a library like jsonld-java, it will do exactly that. You will work with Maps and Lists and it should be able to accommodate basically any JSON-LD data.
If you did know the target structure (for example, if it were one of several types for which you have a Java class), you could use a library like JB4JSON-LD to load it into an object of that class.
Disclaimer: I am the author of the JB4JSON-LD library.

What is the correct Protobuf content type?

JSON has application/json as a standard. For protobuf some people use application/x-protobuf, but I saw something as odd as application/vnd.google.protobuf being proposed. Do we have an RFC or some other standard that I can use as a reference for this?
There's an expired IETF proposal that suggests application/protobuf. It does not address the question of how the receiving side could determine the particular message type. Previous discussions suggested using a parameter to specify the package and message, e.g. application/protobuf; proto=org.some.Message
In practice, the types you listed seem to be indeed the ones in use, for example the monitoring system Prometheus uses application/vnd.google.protobuf, and the Charles web debugging proxy recognizes application/x-protobuf; messageType="x.y.Z".
At the risk of being overly pedantic, the other answers only make sense if we're assuming that "protobuf content type" means the standard wire format for protos.
Content types should map to encoding schemes, and there are multiple encoding schemes for protos. The wire format is the most common and important one, but there's also text format, and potentially others. That being said, I am unable to find any standard content type for proto text format, so I don't know of any other options to add here.
Protos are just a way of describing schema, and are not tightly coupled with any particular way of encoding data in that schema.
The Content-Type representation header indicates the original media type of the resource (prior to any content encoding applied for sending). Protobuf, meanwhile, is a serialization/deserialization schema language and library.
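Whichever media type you settle on, the `proto=` parameter from the expired draft can be handled with ordinary MIME tooling. A Python sketch (the parameter name `proto` follows the draft discussed above; this is not a standardized API):

```python
from email.message import EmailMessage

def parse_protobuf_content_type(value):
    """Split a Content-Type value such as
    'application/protobuf; proto=org.some.Message'
    into the media type and the fully qualified message
    name (None if the parameter is absent)."""
    msg = EmailMessage()
    msg["Content-Type"] = value
    return msg.get_content_type(), msg.get_param("proto")

media, message_type = parse_protobuf_content_type(
    "application/protobuf; proto=org.some.Message"
)
# media is "application/protobuf", message_type is "org.some.Message"
```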

What is the MIME type for Markdown?

Does anyone know if there exists a MIME type for Markdown? I guess it is text/plain, but is there a more specific one?
tl;dr: text/markdown since March 2016
In March 2016, text/markdown was registered with IANA, as specified by the IETF in RFC 7763.
Previously, it should have been text/x-markdown. The text below describes the situation before March 2016, when RFC7763 was still a draft.
There is no official recommendation on Gruber’s definition, but the topic was discussed quite heavily on the official mailing-list, and reached the choice of text/x-markdown.
This conclusion was challenged later, but it has since been confirmed and can, IMO, be considered consensus.
It is the only logical conclusion in the absence of an official MIME type: text/ provides a proper default almost everywhere, x- because we're not using an officially registered type, and markdown rather than gruber or anything else because the name is now so common.
There are still unknowns regarding the different “flavors” of Markdown, though. I guess someone should register an official type, which is supposedly easy, but I doubt anyone dares do it beyond John Gruber, as he very recently proved his attachment to Markdown.
There is a draft on the IETF for text/markdown, but the contents do not seem to describe Markdown at all, so I wouldn't use it until it gets more complete.
There is no official standard type, but text/markdown seems to be the most common de facto type. Most browsers and other reasonably sophisticated clients will likely see the text/ part and default to text/plain anyway, so there's not much difference.
One caveat, though: all types under the text/ hierarchy default to ISO-8859-1 as their character set in the relevant RFC standards, while most of the world has since moved on to UTF-8. So unless you're positive you won't be using any characters outside ASCII (or live in an old Windows world), you might want to specify it as follows:
text/markdown; charset=UTF-8
Looks like text/markdown is going to be the standard.
http://www.iana.org/go/draft-ietf-appsawg-text-markdown
https://www.iana.org/assignments/media-types/media-types.xhtml
Search for markdown.
According to RFC7763 “The text/markdown type” from 2016, the general MIME type is
text/markdown; charset=UTF-8
where the charset parameter is required but need not be UTF-8.
That RFC also specifies an optional variant parameter, and the Internet Assigned Numbers Authority maintains a registry of Markdown Variants by which the specific variant of Markdown can be specified, e.g.,
text/markdown; charset=UTF-8; variant=Original
text/markdown; charset=UTF-8; variant=GFM
text/markdown; charset=UTF-8; variant=CommonMark
Some variants allow further parameters, as specified in
RFC7764 “Guidance on Markdown”,
e.g., you could add extensions=-startnum with the pandoc variant to specify a tweak to the dialect,
although I do not know how/whether pandoc might actually interpret that.
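Putting those pieces together, here is a small Python helper (a sketch, not a standard API) for building a Content-Type value that satisfies the RFC 7763 registration, where charset is required and variant is optional:

```python
def markdown_content_type(variant=None, charset="UTF-8"):
    """Build a text/markdown Content-Type value per RFC 7763.
    The charset parameter is required by the registration;
    the variant parameter is optional (RFC 7764)."""
    value = "text/markdown; charset=" + charset
    if variant is not None:
        value += "; variant=" + variant
    return value

# markdown_content_type("CommonMark") yields
# "text/markdown; charset=UTF-8; variant=CommonMark"
```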
Why is the character set required?
RFC2046 “MIME Part Two” from 1996
set US-ASCII as the default character set, but also said
The specification for any future subtypes of "text" must specify
whether or not they will also utilize a "charset" parameter, and may
possibly restrict its values as well.
Then RFC2616 “HTTP/1.1” from 1999
specified ISO-8859-1 as the default character set for text/* transported over
HTTP, and with the web becoming a dominant mode of communication,
this became the presumed default encoding for text/* media types.
Without an explicit character set or registered mime-type-specific default, text/* is considered to be
US-ASCII, unless said text is transported over HTTP in which case it is
considered to be ISO-8859-1.
RFC 6657 “Update to MIME regarding "charset" Parameter Handling
in Textual Media Types”
attempted to clarify this discrepancy
by requiring all new media type registrations
to explicitly specify how
to determine the character set,
preferably by including it in the payload as HTML allows with
<meta charset=UTF-8>.
The text/markdown
registration
specifies the charset parameter as “Required.” Therefore a bare content-type of
text/markdown without a charset parameter is technically invalid, and the character set of such content may
legitimately be interpreted as any of undefined, invalid, US-ASCII,
ISO-8859-1, or the UTF-8 that in practice it will almost always be.
Found this thread from 2008: http://www.mail-archive.com/markdown-discuss@six.pairlist.net/msg00973.html
It seems the MIME type text/vnd.daringfireball.markdown should be registered by the author of Markdown; until then, the Markdown MIME type can be specified as text/x-markdown.

How can I tell a string is encoded from db.model_to_protobuf

I use db.model_to_protobuf in my App Engine project. Is there a way to tell whether a string was encoded by db.model_to_protobuf? I haven't had time to read the source code; can anyone do me a favour?
Thanks~
In principle, yes - the protocol buffer encoding format is documented here. In practice, this is a terrible idea: you should use metadata to identify the format of your data, and decode it based on that, not based on trying to guess the content type.
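One way to follow that advice is to store an explicit format tag alongside the bytes instead of sniffing them. A hypothetical envelope (a sketch of the metadata idea, not an App Engine API):

```python
import json

def wrap(fmt, payload):
    """Prefix serialized bytes with a one-line JSON header naming
    the format, so readers never have to guess the encoding."""
    return json.dumps({"format": fmt}).encode("ascii") + b"\n" + payload

def unwrap(blob):
    """Split the envelope back into (format name, payload bytes)."""
    header, _, payload = blob.partition(b"\n")
    return json.loads(header)["format"], payload

fmt, payload = unwrap(wrap("protobuf", b"\x08\x01"))
# fmt is "protobuf", payload is the original bytes
```

In a datastore you would more likely keep the format tag in a separate property, but the principle is the same: record what the data is, rather than inferring it.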
