Validate only a part of an XML file using an XSD file

Is there a way to validate only part of an XML file using an XSD file, ignoring the other contents of the file? I want to validate only a couple of tags in the XML file. My XML file contains many tags, but the XSD contains elements for only a few of them.
Is it possible to achieve this somehow?

There are two ways (at least) to achieve this, in principle.
First, you can in principle tell the validator which elements in the document you want to validate; the XSD spec does not require validation to start at the document root. In practice, command-line validators almost never provide run-time options for starting validation anywhere but the root. I think validation libraries are more likely to provide that functionality; they often (or at least sometimes) provide functions to allow you to pass in the element at which validation should start, together with the necessary schema information.
If your validator doesn't allow you to validate selectively, you can write a schema that contains declarations for just those elements and attributes you want to validate, and invoke a validator on the document root in "lax validation mode" -- which means, essentially "If you find in the schema a declaration for an element in the document, then validate the element against its declaration, otherwise accept it (pretend it matches a lax wildcard in the declaration of its parent) and move on." The validator will thus ignore elements for which you provide no declarations and validate elements for which you do provide declarations. (Note that conforming XSD processors are not required to provide lax-validation mode, and the definition of lax validation in the spec is a little underspecified, but I believe most available processors do support it and do the same thing in lax mode.)

An ugly hack would be to construct a "validatable" document from the main one by omitting that which you don't want validated, and validate that one. I don't endorse this approach, but it's an answer at least.
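For what it's worth, the extraction step of that hack can be sketched with Python's standard library. This is a minimal sketch, assuming non-namespaced tags and an XSD that declares the extracted elements as global (top-level) elements. The function name extract_for_validation is hypothetical, and the actual XSD validation is left to whatever validator you already use.

```python
import copy
import xml.etree.ElementTree as ET


def extract_for_validation(xml_text: str, tags: set[str]) -> list[str]:
    """Serialize every element whose tag is in `tags` as a standalone document.

    Each returned string has the selected element as its root, so it can be
    passed to any XSD validator whose schema declares that element globally.
    """
    root = ET.fromstring(xml_text)
    documents = []
    for elem in root.iter():
        if elem.tag in tags:
            standalone = copy.deepcopy(elem)
            # Drop the text that followed the element inside its old parent.
            standalone.tail = None
            documents.append(ET.tostring(standalone, encoding="unicode"))
    return documents
```

Everything not named in tags is simply left out, which is exactly why this is a hack: the validator never sees the rest of the document, including the context the selected elements live in.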

The easiest way to achieve this in practice is probably to do the validation using the validate expression in XQuery (or xsl:copy-of with a validation attribute in XSLT). This allows you to select the element you want to validate and perform the validation in one go.
The downside might be that validate in XQuery is defined to raise a fatal error if the document is invalid, so an implementation might simply stop at the first error rather than trying to report as much about the invalidity as it can. So you need to find out how this is implemented in your particular processor and/or how to configure that processor.

Related

Voice XML -- need a field filled with raw ASR input

I'm trying to build a voice XML interface to a machine translation system. Most of the menu design is simple enough, but when the user actually says the phrase to be translated, I need to be able to intake whatever text comes from the ASR without trying to match it to a finite grammar. Is there a standard way to do this in voice XML?
If by "standard way" you mean VoiceXML with SRGS/SISR, you could build a grammar that contains every word of the target language, plus semantic interpretation (SI) tags to reassemble the content into a slot. Not a practical solution, but a possible one within the constraints of the specifications.
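To make the shape of such a grammar concrete, here is a hedged sketch that generates it from a word list using Python's standard library. It assumes the SRGS XML form with SISR tags (tag-format semantics/1.0) and uses rules.latest() to append each recognized word to a single string; the function name build_word_loop_grammar and the rule ids phrase and word are hypothetical. A real vocabulary would make the grammar enormous, which is why it is not practical.

```python
import json
import xml.etree.ElementTree as ET


def build_word_loop_grammar(words: list[str], lang: str = "en-US") -> str:
    """Build an SRGS XML grammar matching any sequence of the given words.

    The SISR tags in the root rule concatenate the recognized words into one
    string, which becomes the slot value.
    """
    grammar = ET.Element("grammar", {
        "xmlns": "http://www.w3.org/2001/06/grammar",
        "version": "1.0",
        "xml:lang": lang,
        "root": "phrase",
        "tag-format": "semantics/1.0",
    })
    # Public root rule: one or more words, appended to a single string.
    phrase = ET.SubElement(grammar, "rule", {"id": "phrase", "scope": "public"})
    ET.SubElement(phrase, "tag").text = 'out = "";'
    loop = ET.SubElement(phrase, "item", {"repeat": "1-"})
    ET.SubElement(loop, "ruleref", {"uri": "#word"})
    ET.SubElement(loop, "tag").text = 'out = (out ? out + " " : "") + rules.latest();'
    # Private rule listing every word; each alternative returns its own text.
    word_rule = ET.SubElement(grammar, "rule", {"id": "word"})
    one_of = ET.SubElement(word_rule, "one-of")
    for word in words:
        item = ET.SubElement(one_of, "item")
        item.text = word
        # json.dumps yields a quoted, escaped string literal usable in the tag script.
        ET.SubElement(item, "tag").text = f"out = {json.dumps(word)};"
    return ET.tostring(grammar, encoding="unicode")
```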
If you are looking at VoiceXML alone, the only constraint would be building the capability into the browser, as VoiceXML doesn't place any relevant restrictions on how $lastresult is populated.
Knowing your implementation constraints and what you are trying to achieve would help in finding a practical solution.
"Standard" VoiceXML does not allow you to capture free text (because you always use a grammar with strict rules), so what you plan is outside the initial scope of the specification.
If you can control your VoiceXML interpreter implementation, you can use the same method we do. In our Voximal VoiceXML interpreter we solve this with a builtin grammar:
<field name="text" type="text"> : it uses builtin:grammar/text
You can extend it by adding parameters like "text?lang=en-US" or "text?model=MyWatsonModel".
The text result is in the field variable, and you can add extra values in the shadow variables.
All this is platform-dependent and outside the scope of the VoiceXML standard, but I think it is the best way to integrate speech-to-text into VoiceXML.

Terminology - one-time code generation directives

Is there a such thing as a preprocessor whose statements, once processed, disappear completely and get replaced by the target language syntax permanently?
I want to research it on the web but I don't know what term to search for. If I search for "code generator", "templating language", "preprocessor directives", "mixins", "annotations" I get generators whose input becomes the source of truth.
The closest thing I can think of is a macro.
What I'm trying to do
I often have to write code that is verbose and requires unnecessary manual labor, and I am looking for a smarter way to input at least most of it, have it transformed automatically, and source-control only the output (hand-editing it if necessary). For example:
Java code: instead of writing getters/setters and Javadoc by hand (perhaps the transformer could be a Maven plugin)
HTML - I just want to add URLs, and have my preprocessor automatically convert them to links, images, videos, audio etc. depending on the file extension with some regex substitution (currently I run a perl script via a cron job)
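A one-shot version of that URL expansion is easy to sketch in Python. This is a minimal sketch, assuming bare http(s) URLs in the input and a hypothetical extension-to-template mapping (TEMPLATES, expand_urls and render_url are illustrative names). It only looks at the text after the last dot, so URLs with query strings fall back to a plain link, and running it twice would re-expand URLs already inside tags.

```python
import re

# Illustrative mapping from file extension to HTML template (an assumption).
TEMPLATES = {
    ("png", "jpg", "jpeg", "gif"): '<img src="{url}">',
    ("mp4", "webm"): '<video src="{url}" controls></video>',
    ("mp3", "ogg"): '<audio src="{url}" controls></audio>',
}

URL_RE = re.compile(r'https?://[^\s<>"]+')


def render_url(match: re.Match) -> str:
    """Pick an HTML snippet for one URL based on its extension."""
    url = match.group(0)
    ext = url.rsplit(".", 1)[-1].lower()
    for extensions, template in TEMPLATES.items():
        if ext in extensions:
            return template.format(url=url)
    # Anything unrecognized becomes an ordinary link.
    return f'<a href="{url}">{url}</a>'


def expand_urls(text: str) -> str:
    """Replace every bare URL in `text` with the matching HTML element."""
    return URL_RE.sub(render_url, text)
```

Since only the output is kept, the shorthand leaves no trace in the committed HTML.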
I just want to use it as my own shorthand, not enforce it in my project, and keep the output editable, so that others don't have to learn a new framework or language (like Protobuf, StringTemplate, GWT, C #defines, PHP, JSP etc).
There should be no direct clue that I used a template/preprocessor to generate it.
What you want is a "program transformation system". See https://en.wikipedia.org/wiki/Program_transformation. (This is a superset of "transpilers" [ugly term]).
A good source-to-source transformation system will let you apply rewrite rules of the form of:
if you see *this*, replace it by *that* if *this_condition*.
You can then take your source code, and run a set of rewrite rules across that code to change it.
The resulting code is "transformed"; the rewrite rules are not visible.
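As a small illustration of a conditional rewrite rule applied to syntax trees rather than raw text, here is a sketch using Python's own ast module (Python 3.9+ for ast.unparse). The rule, "if you see a == None, replace it by a is None, provided the right operand is the constant None", is just an example chosen for illustration; dedicated transformation systems also preserve comments and layout, which ast.unparse does not.

```python
import ast


class EqNoneToIsNone(ast.NodeTransformer):
    """Rewrite rule: `a == None` -> `a is None`.

    Condition: a single comparison whose right operand is the constant None.
    """

    def visit_Compare(self, node: ast.Compare) -> ast.AST:
        self.generic_visit(node)  # rewrite nested expressions first
        if (len(node.ops) == 1
                and isinstance(node.ops[0], ast.Eq)
                and isinstance(node.comparators[0], ast.Constant)
                and node.comparators[0].value is None):
            new_node = ast.Compare(left=node.left, ops=[ast.Is()],
                                   comparators=node.comparators)
            return ast.copy_location(new_node, node)
        return node


def transform(source: str) -> str:
    """Parse, apply the rule everywhere, and print the result back as code."""
    tree = EqNoneToIsNone().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))
```

The output is ordinary Python with no trace of the rule, e.g. transform("if x == None:\n    y = 1") gives "if x is None:\n    y = 1", while comparisons that fail the condition, like x == 0, are left alone.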
It seems like Transpiler is one way to describe it.

Document MISRA/QA-C message suppression with Doxygen

I'm currently working on a project that has to be MISRA 2012 compliant. But in the embedded world, you can't fulfill every MISRA rule, so I have to suppress some messages generated by QA-C. What's the best way to document this?
I was thinking about making a table in every module header file with references (\ref and \anchor) to the relevant code lines, a description, etc. The first problem is that I can't use the Doxygen Markdown table feature, because then each description has to fit on one line, since Doxygen tables don't support line breaks. So I thought about using a simple verbatim table. What do you think?
Or is there a way to generate such a table automatically?
Greetings
m0nKeY
According to MISRA, all such undesired rules must be handled by your deviation procedure, given that they are either "required" or "advisory". You are not allowed to deviate from "mandatory" rules. (Strictly speaking, you don't need to invoke the deviation procedure for advisory rules.)
In my experience, the safest and smoothest way by far to do this, is to not allow individual deviations on case-by-case basis. All deviations from MISRA should be stated in your company coding standard, and in order to deviate you have to update that document. Which in turn enforces approval from the document owner, who is preferably the most hardened C veteran you have in the team.
That way, you prevent less experienced team members from misinterpreting the rules and ignoring important rules, simply because they don't understand them and mistake them for false positives. There should be a rationale in the document stating why the rule you deviate from is not feasible for your company.
This means that everyone in the dev team is allowed to deviate from the listed rules at any point, without the need to invoke any form of bureaucracy.
Once you have a setup like this, simply customize your static analyser and remove/ignore the undesired warnings. That way, you get rid of a lot of noise and false warnings from the tool.
To answer your question generally: To create an aggregate occurrence list of anything in doxygen, use \xrefitem
We use this as a tool in our code review process. I tag code with a custom tag \reviewme, which adds the function to a list of all code in need of peer review. The next person can come along and clear that tag. We have another custom tag, \reviewedby, which does not use \xrefitem but simply puts the reviewer's name and the date in the code block, saying who reviewed it and when. This has gotten a bit clunky as things have scaled with larger code bases and more developers, and we're now looking into tools that integrate with our version control process to handle this better. But when we started, it worked well and fit a shoestring budget. Either way, that example should give you an idea of what \xrefitem is capable of.
Here is a screenshot of what the output looks like (proprietary stuff and auto names redacted):
Here is how we added this custom tag as an alias to \xrefitem in our Doxyfile:
ALIASES = "reviewme = \xrefitem reviewme \"This section needs peer review\" \"Documentation block or code sections that need peer review\""
To add it from the GUI, you would go to Expert->Project->Aliases and add a line like this
reviewme = \xrefitem reviewme "This section needs peer review" "Documentation block or code sections that need peer review"
Same thing, just no need to put quotes around the whole thing and escape out the inner quotes.
\xrefitem is the underpinning of how things like \todo or \bug work in doxygen. You can make a list of just about anything your heart desires.
Speaking specifically to MISRA exceptions: Lundin's post has lots of merit, and I would consider it. I think a better place to document exceptions to coding standards is in the static analysis tool itself. Many tools have their own annotations with which you can categorize a rule violation as "excused" or similar. Generally this does not remove them from the list; it just allows you to filter or sort them. If you are really concerned, perhaps you can use a regex in a script that runs before Doxygen to replace the tool-specific annotation with a custom \xrefitem. Or vice versa: replace the Doxygen annotation with your tool's annotation.
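A rough sketch of that pre-Doxygen regex script in Python follows. It assumes QA-C suppression comments of the form /* PRQA S 3408 1 */ or /* PRQA S 1234, 5678 */ (check your QA-C manual for the exact syntax) and a hypothetical \misradeviation alias that you would define via \xrefitem in ALIASES, the same way \reviewme is defined above. Doxygen's INPUT_FILTER option runs a filter program with the input file name and reads the filtered source from its standard output, so a two-line wrapper script that prints filter_file(sys.argv[1]) could be plugged in there, leaving your real sources untouched. Whether Doxygen picks up the generated comments still depends on where they sit; blocks inside function bodies, for instance, may not end up in the list.

```python
import re

# Assumed QA-C suppression comment shape, e.g. /* PRQA S 3408 1 */ or
# /* PRQA S 1234, 5678 */. Only C-style comments are handled here.
PRQA_RE = re.compile(r"/\*\s*PRQA\s+S\s+(\d+(?:\s*,\s*\d+)*)[^*]*\*/")


def to_xrefitem(source: str) -> str:
    """Replace each QA-C suppression comment with a Doxygen comment that uses
    a hypothetical \\misradeviation alias (an \\xrefitem defined in ALIASES)."""
    def replace(match: re.Match) -> str:
        messages = re.sub(r"\s+", "", match.group(1))
        return f"/** \\misradeviation QA-C message(s) {messages} suppressed here. */"
    return PRQA_RE.sub(replace, source)


def filter_file(path: str) -> str:
    """Read one source file and return its filtered text, for use by a
    wrapper script configured as Doxygen's INPUT_FILTER."""
    with open(path, encoding="utf-8") as f:
        return to_xrefitem(f.read())
```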

What is the correct mime type for esoteric languages

What is the correct MIME type for esoteric languages?
I've googled everywhere, I even tried to ask Chuck Norris, but I didn't find the answer anywhere.
I have tried these for Brainfuck:
application/brainfuck
application/x-brainfuck
application/x+brainfuck
x-esoteric/x-brainfuck
chuck-norris-choice/brainfuck
x-you-lost-the-game/x-fuck-your-brain
42/++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.
But none of them seemed to work.
As far as I'm aware, there is no "official" media type for Brainfuck (the official types are listed in the IANA Media Types registry). You are of course free to make up your own without officially registering it, but you should take a few things into consideration before choosing a name. All the information you need is in RFC 2046; I'll discuss the relevant parts below.
Top Level Media Type
As far as I can see, the two options you might choose from are text and application:
text
According to Section 3:
The subtype "plain" in particular indicates plain text containing no formatting commands or directives of any sort. Plain text is intended to be displayed "as-is". No special software is required to get the full meaning of the text, aside from support for the indicated character set.
If you intend for the data to be displayed rather than interpreted by an application, I would use this.
Section 4.1.4 mentions the following about unrecognised subtypes:
Unrecognized subtypes of "text" should be treated as subtype "plain" as long as the MIME implementation knows how to handle the charset.
Setting your top level media type to text will ensure that compliant applications that do not recognise the full type will still render the data as text.
application
If you intend your data to be interpreted or processed further, you should use the application top-level media type. As in the argument above, if you label your data as application, any programs that receive it are more likely to behave in a sensible fashion.
Section 4.5.3 deals with unrecognised application types:
It is expected that many other subtypes of "application" will be defined in the future. MIME implementations must at a minimum treat any unrecognized subtypes as being equivalent to "application/octet-stream".
Reading the appropriate section (Section 4.5.1) we find out how applications are supposed to handle octet streams:
The recommended action for an implementation that receives an "application/octet-stream" entity is to simply offer to put the data in a file, with any Content-Transfer-Encoding undone, or perhaps to use it as input to a user-specified process.
If this seems like the most logical way to handle your data when it is unrecognised, then application is for you.
Sub-type
Choosing the subtype is much easier. Section 6 covers experimental media types:
A media type value beginning with the characters "X-" is a private value, to be used by consenting systems by mutual agreement. Any format without a rigorous and public definition must be named with an "X-" prefix, and publicly specified values shall never begin with "X-".
So your subtype should be X-brainfuck.
Summary
You have two options:
1. text/X-brainfuck
2. application/X-brainfuck
If you intend for applications to treat the data as plain text and display it, choose 1. If you intend the data to be interpreted or executed, choose 2. If you're unsure what you want to happen, choose 2, because the default expectation is that an application will prompt the user for what to do if it does not recognise the type.
I have no clue why you think application/... is an appropriate mime type for a text file.
One generally accepted MIME type for .bf is text/x-brainfuck. This is a language, not an executable.
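Since no registry entry exists, any mapping has to be configured locally by whoever serves or consumes the files. As a small example of what that looks like, Python's standard mimetypes module lets you register the private type on a MimeTypes instance without touching the global defaults (make_registry is a hypothetical helper name).

```python
import mimetypes


def make_registry() -> mimetypes.MimeTypes:
    """Return a MimeTypes instance that also knows the private Brainfuck type."""
    registry = mimetypes.MimeTypes()
    # Local, unregistered mapping; the x- prefix marks the subtype as private.
    registry.add_type("text/x-brainfuck", ".bf")
    return registry
```

With that, make_registry().guess_type("hello.bf") returns ("text/x-brainfuck", None), and guess_extension("text/x-brainfuck") returns ".bf". Web servers need the equivalent entry in their own configuration.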

How to avoid a series of "if" statements?

Assume I have a form (let's say a WinForm) with n controls.
I populate them with default values during Load. Then the user gets to play with all the controls, set different values and submit the form. This is where I find myself writing a series of "if" statements handling the value of each control, for things like (but not restricted to) avoiding nulls, validation, etc.
Though it works, is there a more efficient way of doing this than a bunch of disparate "ifs"?
You may not avoid the "ifs" entirely, but sometimes it helps to gather related groups of controls on your Form into User Controls. Then you can move the validation and related logic from the Form class into the individual User Controls, reducing clutter.
You should know that WinForms has built-in facilities for both validation and data binding. Using these built-in capabilities will definitely result in code that is better structured and easier to write and maintain than hand-coded data and validation operations. Beth Massi has done a series of videos that demonstrate these features; you can find them on the MSDN website.
I don't have a catch-all, as this will vary from form to form, but some general advice.
By the way, I love this question because it's all about keeping your code clean, readable, and doing things as simply as possible.
Use the included validation controls when possible rather than writing if statements to validate input (see the instruction videos for WinForms; based on the question, I'm assuming you mean .NET WinForms).
Always look to see if you can write a function to handle repetitive tasks. It takes one line of code to call a function, so if your function is only five lines long but you call it ten times, you've saved yourself a lot of duplicate code.
If you can write that function to be smart enough and be able to loop through your controls, so much the better.
In short, look at your code and determine to try to do the job with the least amount of code possible while making it easily readable and understandable, and without resorting to bad practices. Experiment in your spare time on non-production "test" code to refine your technique as you learn, but if you get used to thinking about clean code you get better at writing it.
Create a set of Validators that match your controls one-for-one. From the base Validator, derive a ControlXValidator, which takes a ControlX in its constructor, implements isValid() in whatever special way ControlX must be evaluated, and implements getDiagnosticMessage() to return an appropriate message if the validation fails. Then, at the end of your form construction code, create a list of Validators containing the Validator subclass for each control.
Then your validateForm() method can just do something like:
bool allValid = true;
foreach (Validator vtor in allValidators)
{
    if (!vtor.isValid())
    {
        // Report the first failure and stop checking.
        StatusBar.Caption = vtor.getDiagnosticMessage();
        allValid = false;
        break;
    }
}
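The same pattern, sketched in Python so it can be run on its own. This is an illustration under assumptions: plain values stand in for the form's controls, and RequiredTextValidator and IntRangeValidator are hypothetical examples of ControlXValidator subclasses.

```python
from abc import ABC, abstractmethod
from typing import Optional


class Validator(ABC):
    """Base class: one validator per control."""

    @abstractmethod
    def is_valid(self) -> bool: ...

    @abstractmethod
    def diagnostic_message(self) -> str: ...


class RequiredTextValidator(Validator):
    """Hypothetical validator for a text field that must not be blank."""

    def __init__(self, name: str, value: Optional[str]):
        self.name, self.value = name, value

    def is_valid(self) -> bool:
        return bool(self.value and self.value.strip())

    def diagnostic_message(self) -> str:
        return f"{self.name} must not be empty"


class IntRangeValidator(Validator):
    """Hypothetical validator for a whole number within [low, high]."""

    def __init__(self, name: str, value: Optional[str], low: int, high: int):
        self.name, self.value, self.low, self.high = name, value, low, high

    def is_valid(self) -> bool:
        try:
            number = int(self.value)
        except (TypeError, ValueError):
            return False
        return self.low <= number <= self.high

    def diagnostic_message(self) -> str:
        return f"{self.name} must be a whole number between {self.low} and {self.high}"


def validate_form(validators: list[Validator]) -> Optional[str]:
    """Return None if every validator passes, else the first failure's message."""
    for validator in validators:
        if not validator.is_valid():
            return validator.diagnostic_message()
    return None
```

The form code then contains a single check on the result of validate_form instead of one if per control.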
If you are validating by data-type (dates should look like dates), you could use a function that validates your data and pass the function both the user input and a "sample" of valid data. Valid samples could be stored in an array, keyed by the data-type.
And if the data is not valid, the function returns false and you have one if statement that says "if function returns false, punch the user".
Assume a decently strong language:
Create a hash (a.k.a. map) with the control identities as keys and functions as values. Retrieve the function and call it.
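In Python, for example, that lookup table could look like the sketch below. The control ids nameTextBox and ageTextBox and the helper names are hypothetical; controls without an entry are treated as always valid.

```python
from typing import Callable


def not_empty(value: str) -> bool:
    return bool(value.strip())


def is_whole_number(value: str) -> bool:
    try:
        int(value)
    except ValueError:
        return False
    return True


# Map from (hypothetical) control id to its validation function.
VALIDATORS: dict[str, Callable[[str], bool]] = {
    "nameTextBox": not_empty,
    "ageTextBox": is_whole_number,
}


def invalid_controls(values: dict[str, str]) -> list[str]:
    """Return the ids of controls whose value fails its validator."""
    return [
        control_id
        for control_id, value in values.items()
        # Controls without a registered validator are accepted.
        if not VALIDATORS.get(control_id, lambda _: True)(value)
    ]
```

Adding validation for a new control means adding one entry to the map rather than another if branch.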
Restrict your controls: for example, on a text box you can set a limit on the number of input characters, etc.
Not specific to any language: using guard clauses is usually a good way to get rid of nested ifs. It is an excellent way to check for nulls and do validation.
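A short Python sketch of the guard-clause style: each invalid case returns early, so the happy path at the bottom stays unindented. The function parse_quantity and its messages are hypothetical.

```python
from typing import Optional


def parse_quantity(raw: Optional[str]) -> tuple[Optional[int], Optional[str]]:
    """Return (quantity, None) on success or (None, error message) on failure."""
    # Guard clauses: reject bad input as early as possible.
    if raw is None or not raw.strip():
        return None, "Quantity is required"
    raw = raw.strip()
    if not raw.isdecimal():
        return None, "Quantity must be a whole number"
    quantity = int(raw)
    if quantity == 0:
        return None, "Quantity must be at least 1"
    # Happy path: no nesting left.
    return quantity, None
```

The number of checks is the same, but each one is flat and independent, which is what makes the form code easier to read.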
