What is the correct MIME type for esoteric languages?

What is the correct MIME type for esoteric languages?
I've googled everywhere, I even tried to ask Chuck Norris, but I didn't find the answer anywhere.
I have tried these for Brainfuck:
application/brainfuck
application/x-brainfuck
application/x+brainfuck
x-esoteric/x-brainfuck
chuck-norris-choice/brainfuck
x-you-lost-the-game/x-fuck-your-brain
42/++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.
But none of them seemed to work.

As far as I'm aware, there is no 'official' media type for brainfuck (official types are listed here). You are of course free to make up your own without officially registering the type, but you should take a few things into consideration before choosing what name to use. All the information you need is in RFC2046. I'll discuss the relevant parts below.
Top Level Media Type
As far as I can see, the two options you might choose from are text and application:
text
According to Section 3:
The subtype "plain" in particular indicates plain text containing no formatting commands or directives of any sort. Plain text is intended to be displayed "as-is". No special software is required to get the full meaning of the text, aside from support for the indicated character set.
If you intend for the data to be displayed rather than interpreted by an application, I would use this.
Section 4.1.4 mentions the following about unrecognised subtypes:
Unrecognized subtypes of "text" should be treated as subtype "plain" as long as the MIME implementation knows how to handle the charset.
Setting your top level media type to text will ensure that compliant applications that do not recognise the full type will still render the data as text.
application
If you intend your data to be interpreted or processed further, you should use the application top-level media type. As in the argument above, if you label your data as application, any programs that receive it are more likely to behave in a sensible fashion.
Section 4.5.3 deals with unrecognised application types:
It is expected that many other subtypes of "application" will be defined in the future. MIME implementations must at a minimum treat any unrecognized subtypes as being equivalent to "application/octet-stream".
Reading the appropriate section (Section 4.5.1) we find out how applications are supposed to handle octet streams:
The recommended action for an implementation that receives an "application/octet-stream" entity is to simply offer to put the data in a file, with any Content-Transfer-Encoding undone, or perhaps to use it as input to a user-specified process.
If this seems like the most logical way to handle your data when it is unrecognised, then application is for you.
Sub-type
Choosing the subtype is much easier. Section 6 covers experimental media types:
A media type value beginning with the characters "X-" is a private value, to be used by consenting systems by mutual agreement. Any format without a rigorous and public definition must be named with an "X-" prefix, and publicly specified values shall never begin with "X-".
So your subtype should be X-brainfuck.
Summary
You have two options:
text/X-brainfuck
application/X-brainfuck
If you intend for applications to treat the data as plain text and display it, choose 1. If you intend the data to be interpreted or executed, choose 2. If you're unsure what you want to happen, choose 2, because the default expectation is that an application will prompt the user for what to do if it does not recognise the type.
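For example, in a Python application you could register the unofficial type locally so that mimetypes lookups return it. This is a minimal sketch; the .bf extension is an assumption, and media types are case-insensitive, so text/x-brainfuck and text/X-brainfuck are equivalent:

import mimetypes

# Map the assumed .bf extension to the unofficial Brainfuck type.
# Use text/x-brainfuck if unaware clients should fall back to plain text,
# or application/x-brainfuck if they should treat the data as opaque bytes.
mimetypes.add_type("text/x-brainfuck", ".bf")

print(mimetypes.guess_type("hello.bf"))  # ('text/x-brainfuck', None)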

I have no clue why you think application/... is an appropriate mime type for a text file.
One generally accepted MIME type for .bf is text/x-brainfuck. This is a language, not an executable.

Related

Voice XML -- need a field filled with raw ASR input

I'm trying to build a voice XML interface to a machine translation system. Most of the menu design is simple enough, but when the user actually says the phrase to be translated, I need to be able to intake whatever text comes from the ASR without trying to match it to a finite grammar. Is there a standard way to do this in voice XML?
If by standard way you mean VoiceXML with SRGS/SISR, you could build a grammar that contained every word of the target language and use the SI (semantic interpretation) to reassemble the content into a slot. Not a practical solution, but a possible one within the constraints of the specification.
If you are just looking at VoiceXML, the only constraint would be building the capability into the browser, as VoiceXML doesn't place any relevant restrictions on how $lastresult is populated.
Knowing your implementation constraints and what you are trying to achieve would help in coming up with a practical solution.
Standard VoiceXML does not allow you to capture free text (because you always use a grammar with strict rules), so you are planning to go outside the initial scope of the specification.
If you can control your VoiceXML interpreter implementation, you can use the same method we do. With our Voximal VoiceXML interpreter we solve this by using a builtin grammar:
<field name="text" type="text">: this uses the builtin grammar builtin:grammar/text.
You can extend it by adding parameters like "text?lang=en-US" or "text?model=MyWatsonModel".
The recognized text is returned in the field variable, and you can expose extra values in the shadow variables.
All of this is platform dependent and outside the scope of the VoiceXML standard, but I think it is the best way to integrate speech-to-text into VoiceXML.

What is the correct Protobuf content type?

JSON has application/json as a standard. For protobuf some people use application/x-protobuf, but I saw something as odd as application/vnd.google.protobuf being proposed. Do we have an RFC or some other standard that I can use as a reference for this?
There's an expired IETF proposal that suggests application/protobuf. It does not address the question of how the receiving side could determine the particular message type. Previous discussions suggested using a parameter to specify the package and message, e.g. application/protobuf; proto=org.some.Message
In practice, the types you listed seem to be indeed the ones in use, for example the monitoring system Prometheus uses application/vnd.google.protobuf, and the Charles web debugging proxy recognizes application/x-protobuf; messageType="x.y.Z".
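As a hedged illustration rather than an official convention, here is a Python sketch of posting the binary wire format with such a content type; the myproto_pb2 module, SomeMessage class, its fields, the endpoint URL, and the messageType value are all made up:

import requests  # third-party HTTP client
from myproto_pb2 import SomeMessage  # hypothetical protoc-generated module

msg = SomeMessage(id=42, name="example")

resp = requests.post(
    "https://api.example.com/things",  # made-up endpoint
    data=msg.SerializeToString(),      # standard binary wire format
    headers={
        # messageType mirrors the x-protobuf convention mentioned above;
        # servers that don't parse the parameter will simply ignore it.
        "Content-Type": 'application/x-protobuf; messageType="myproto.SomeMessage"',
    },
)
resp.raise_for_status()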
At the risk of being overly pedantic, the other answers only make sense if we're assuming that "protobuf content type" means the standard wire format for protos.
Content types should map to encoding schemes, and there are multiple encoding schemes for protos. The wire format is the most common and important one, but there's also text format, and potentially others. That being said, I am unable to find any standard content type for proto text format, so I don't know of any other options to add here.
Protos are just a way of describing schema, and are not tightly coupled with any particular way of encoding data in that schema.
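To make the distinction concrete, here is a small Python sketch (again using a hypothetical protoc-generated SomeMessage) producing two different encodings of the same message; only the first has a commonly used content type:

from google.protobuf import text_format
from myproto_pb2 import SomeMessage  # hypothetical protoc-generated module

msg = SomeMessage(id=42, name="example")

wire_bytes = msg.SerializeToString()          # binary wire format
text_repr = text_format.MessageToString(msg)  # human-readable text format

# application/x-protobuf (or application/vnd.google.protobuf) is generally
# understood to mean the binary wire format; there is no widely used
# content type for the text format held in text_repr.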
The Content-Type representation header is used to indicate the original media type of the resource (prior to any content encoding applied for sending). Meanwhile, protobuf is a serialization/de-serialization schema language and library.

Validate only a part of XML file using an XSD file

Is there a way to validate only a part of an XML file using an XSD file and ignore the other contents of the XML file? I want to validate only a couple of tags in the XML file using the XSD file. My XML file contains many tags, but the XSD contains elements for only a few of them.
Is it possible to attain this somehow?
There are two ways (at least) to achieve this, in principle.
First, you can in principle tell the validator which elements in the document you want to validate; the XSD spec does not require validation to start at the document root. In practice, command-line validators almost never provide run-time options for starting validation anywhere but the root. I think validation libraries are more likely to provide that functionality; they often (or at least sometimes) provide functions to allow you to pass in the element at which validation should start, together with the necessary schema information.
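For instance, a minimal Python sketch with lxml (the file names, namespace, and element name are assumptions) that selects one element and validates only that subtree:

from lxml import etree  # third-party XML library

schema = etree.XMLSchema(etree.parse("partial.xsd"))  # declares only the elements you care about
doc = etree.parse("data.xml")

# Pick the element you actually want to validate (hypothetical name/namespace).
target = doc.find(".//{http://example.com/ns}order")

# lxml treats the element you pass in as the root of validation,
# so everything outside this subtree is never examined.
if not schema.validate(target):
    for error in schema.error_log:
        print(error.line, error.message)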
If your validator doesn't allow you to validate selectively, you can write a schema that contains declarations for just those elements and attributes you want to validate, and invoke a validator on the document root in "lax validation mode" -- which means, essentially "If you find in the schema a declaration for an element in the document, then validate the element against its declaration, otherwise accept it (pretend it matches a lax wildcard in the declaration of its parent) and move on." The validator will thus ignore elements for which you provide no declarations and validate elements for which you do provide declarations. (Note that conforming XSD processors are not required to provide lax-validation mode, and the definition of lax validation in the spec is a little underspecified, but I believe most available processors do support it and do the same thing in lax mode.)
An ugly hack would be to construct a "validatable" document from the main one by omitting that which you don't want validated, and validate that one. I don't endorse this approach, but it's an answer at least.
The easiest way to achieve this in practice is probably to do the validation using the validate expression in XQuery (or copy-of with validation in XSLT). This allows you to select the element you want to validate, and perform the validation, in one go.
The downside might be that validate in XQuery is defined to be a fatal error if the document is invalid, so the implementation might simply stop on the first error rather than focusing on giving you as much information about the invalidity as it can. At this stage you need to find out how it's implemented in a particular processor and/or how to configure that processor.

What is the MIME type for Markdown?

Does anyone know if there exists a MIME type for Markdown? I guess it is text/plain, but is there a more specific one?
tl;dr: text/markdown since March 2016
In March 2016, text/markdown was registered as RFC7763 at IETF.
Previously, it should have been text/x-markdown. The text below describes the situation before March 2016, when RFC7763 was still a draft.
There is no official recommendation in Gruber’s definition, but the topic was discussed quite heavily on the official mailing list, which settled on text/x-markdown.
This conclusion was challenged later, but it was confirmed and can, IMO, be considered consensus.
It is the only logical conclusion in the absence of an official MIME type: text/ will provide a proper default almost everywhere; x- because we're not using an official type; markdown (and not gruber or whatever) because the type is now so common.
There are still unknowns regarding the different “flavors” of Markdown, though. I guess someone should register an official type, which is supposedly easy, but I doubt anyone dares do it beyond John Gruber, as he very recently proved his attachment to Markdown.
There is a draft on the IETF for text/markdown, but the contents do not seem to describe Markdown at all, so I wouldn't use it until it gets more complete.
There is no official standard type, but text/markdown seems to be the most common de facto type. Most browsers and other reasonably sophisticated clients will likely see the text/ part and default to text/plain anyway, so there's not much difference.
One caveat, though: all types under the text/ hierarchy default to ISO-8859-1 as their character set in the relevant RFC standards. Most of the world has since moved on to UTF-8. So unless you're positive you won't be using any funny characters (or live in an old Windows world) you might want to specify it as follows:
text/markdown; charset=UTF-8
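For example, a minimal sketch using Python's standard http.server (the README.md file name is an assumption) that sends the explicit charset:

from http.server import BaseHTTPRequestHandler, HTTPServer

class MarkdownHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        with open("README.md", "rb") as f:  # assumed file to serve
            body = f.read()
        self.send_response(200)
        # Explicit charset so clients don't fall back to ISO-8859-1/US-ASCII defaults.
        self.send_header("Content-Type", "text/markdown; charset=UTF-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), MarkdownHandler).serve_forever()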
Looks like text/markdown is going to be the standard.
http://www.iana.org/go/draft-ietf-appsawg-text-markdown
https://www.iana.org/assignments/media-types/media-types.xhtml
Search for markdown.
According to RFC7763 “The text/markdown type” from 2016, the general MIME type is
text/markdown; charset=UTF-8
where the charset parameter is required but need not be UTF-8.
That RFC also specifies an optional variant parameter, and the Internet Assigned Numbers Authority maintains a registry of Markdown variants by which the specific variant of Markdown can be specified, e.g.,
text/markdown; charset=UTF-8; variant=Original
text/markdown; charset=UTF-8; variant=GFM
text/markdown; charset=UTF-8; variant=CommonMark
Some variants allow further parameters, as specified in RFC7764 “Guidance on Markdown”; e.g., you could add extensions=-startnum with the pandoc variant to specify a tweak to the dialect, although I do not know how/whether pandoc might actually interpret that.
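As a small Python illustration of how a client could read those parameters back out of a Content-Type value (the header value here is just an example), the standard email.message module can parse MIME parameters:

from email.message import Message

m = Message()
m["Content-Type"] = "text/markdown; charset=UTF-8; variant=CommonMark"

print(m.get_content_type())    # text/markdown
print(m.get_param("charset"))  # UTF-8
print(m.get_param("variant"))  # CommonMark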
Why is the character set required?
RFC2046 “MIME Part Two” from 1996 set US-ASCII as the default character set, but also said:
The specification for any future subtypes of "text" must specify whether or not they will also utilize a "charset" parameter, and may possibly restrict its values as well.
Then RFC2616 “HTTP/1.1” from 1999 specified ISO-8859-1 as the default character set for text/* transported over HTTP, and with the web becoming a dominant mode of communication, this became the presumed default encoding for text/* media types. Without an explicit character set or a registered mime-type-specific default, text/* is considered to be US-ASCII, unless said text is transported over HTTP, in which case it is considered to be ISO-8859-1.
RFC 6657 “Update to MIME regarding "charset" Parameter Handling in Textual Media Types” attempted to clarify this discrepancy by requiring all new media type registrations to explicitly specify how to determine the character set, preferably by including it in the payload, as HTML allows with <meta charset=UTF-8>.
The text/markdown registration specifies the charset parameter as “Required.” Therefore using a bare content type of text/markdown without a charset is technically invalid, and the character set of such content may legitimately be interpreted as any of undefined, invalid, US-ASCII, ISO-8859-1, or the UTF-8 that in practice it will almost always be.
Found this thread from 2008: http://www.mail-archive.com/markdown-discuss@six.pairlist.net/msg00973.html
It seems the MIME type text/vnd.daringfireball.markdown should be registered by the author of Markdown; until then, the Markdown MIME type can be specified as text/x-markdown.

WPF InkCanvas - how to determine if it has been "signed"

I'm using a WPF InkCanvas control to capture signatures in a Tablet PC application.
One of my requirements is to validate whether or not the application has really been "signed". Right now I'm doing this by checking the Strokes collection of the InkCanvas - if there are 0 strokes, then I know the user has not "signed".
However, if the user enters a single slash, or even a single dot, this counts as a stroke and my validation test will pass, even though the signature isn't really valid.
Any ideas about how to build a better test for this? Granted, the use case for what is and is not a valid signature is pretty fuzzy, but I want to try to eliminate obviously bad signatures.
Or is this simply unsolvable in any straightforward way?
I know there are algorithms to test if a signature is a valid match for an existing signature, like the one outlined here. However, finding out if something is a signature in the first place seems to be much more complex.
From Wikipedia:
On legal documents, an illiterate signatory can make a "mark" (often an "X" but occasionally a personalized symbol)
...
Several cultures whose languages use writing systems other than alphabets do not share the Western notion of signatures per se: the "signing" of one's name results in a written product no different from the result of "writing" one's name in the standard way. For these languages, to write or to sign involves the same written characters. Three such examples are Chinese, Japanese, and Korean.
While this could be approached using Intelligent Character Recognition, I also know that my signature rarely looks like it has any characters in it. It's even worse if I am using one of those UPS or FedEx package signing pads. Next time a package arrives, though, I will try signing with just a dash and a dot and see if it is accepted (which I think it will be, since my signature is already close enough to that).
Because a signature may not match any recognizable words or characters, trying to 'validate' it any further than you currently are could actually discard some valid signatures. If for some reason you still need to know whether there is something more than a dot, consider validating instead that the signature fills a certain sized rectangle. That may still invalidate results that shouldn't be, so if you do attempt to add any validation, make sure to document fully the expectations of a valid signature.
