Textual Protocol which is not a regular language? - theory

The usual way to describe the grammar of a textual network protocol is ABNF.
Like any EBNF-related meta-syntax, ABNF can describe context-free grammars.
These context-free grammars can represent a non-regular language, right?
The usual way to implement a network stack is to develop a state machine. Is there any textual network protocol that is not a regular language?

I suppose you are referring to "traditional" line-based textual protocols; otherwise the answer is easy, since any protocol that uses XML is not regular, because XML is not a regular language (in fact, XML is not even context-free if you look at the level of individual characters). Restricted to line-based protocols, I cannot really think of any non-regular ones. The most common way to go non-regular in language syntax is to require the parser to be able to count, and it seems to me that a protocol requiring that kind of ability to parse its messages would be either needlessly complex (e.g., matching parentheses) or limited (e.g., by having explicit counts instead of allowing arbitrarily long lists).
The use of BNF is probably because it is easy to understand as a syntax description and not because context-freedom gives you any necessary additional power. The main benefit, I would think, is in the ability to use variables to stand for common pieces of syntax. If you look at the BNFs in common Internet protocol specifications, you'll note that they really use just features of regular languages: unbounded repetition, choice, optionality.
Your statement about state-machine implementations of protocols sounds to me like a misunderstanding. It is not the parser that is implemented as a state machine but the protocol engine, and state transitions are triggered not by individual characters or tokens in the input but by complete messages. And usually, the states in a protocol state machine are more concerned with the establishment and tear-down of communication than with the actual communication. For instance, the TCP state machine has 11 states, of which a single one (ESTABLISHED) covers the connection-established phase, which is where all the actual data transmission happens; the rest are all about opening and closing the connection. (Yes, I know TCP is not textual, but it is a well-known protocol with an established state machine, so it serves as a good example; at the level of protocol engines, it does not matter whether the message syntax is text or binary.)
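To illustrate the point that transitions are driven by complete messages rather than by characters, here is a minimal sketch in C. The states and the HELO/QUIT commands are hypothetical (loosely SMTP-flavoured) and chosen only for illustration; a real protocol engine would have more states and proper error handling.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical session states for a line-based protocol. */
typedef enum { ST_GREETING, ST_READY, ST_CLOSED } session_state;

/* Advance the session by ONE complete message (a line without CR/LF).
   Unknown or out-of-order messages leave the state unchanged. */
session_state session_step(session_state st, const char *msg)
{
    assert(msg != NULL);
    switch (st) {
    case ST_GREETING:
        /* Only a greeting moves us out of the set-up phase. */
        if (strncmp(msg, "HELO ", 5) == 0)
            return ST_READY;
        break;
    case ST_READY:
        /* QUIT tears the session down; every other command is
           "actual communication" and stays in the same state. */
        if (strcmp(msg, "QUIT") == 0)
            return ST_CLOSED;
        break;
    case ST_CLOSED:
        break;
    }
    return st;
}
```

Note that the parser which splits the byte stream into lines is a separate concern; the engine only ever sees whole messages.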

Related

How feasible is it to virtualise the FILE* interfaces of C?

I have often noticed that I would have been able to solve practical problems in C elegantly if there had been a way of creating a ‘virtual FILE’ and attaching the necessary callbacks for events such as buffer full, input requested, close, flush. It should then be possible to use a large part of the stdio.h functions, e.g. fprintf, unchanged. Is there a framework enabling one to do this? If not, is it feasible with a moderate amount of effort, on at least some platforms?
Possible applications would be:
To write to or read from a dynamic or static region of memory.
To write to multiple files in parallel.
To read from a thread or co-routine generating data.
To apply a filter to another (virtual or real) FILE.
Support for file formats with indirection (like #include).
A C pre-processor(?).
I am less interested in solutions for specific cases than in a framework to let you roll your own FILE. I am also not looking for a virtual filesystem, but rather virtual FILE*s that I can pass to the CRT.
To my disappointment I have never seen anything of the sort; as far as I can see C11 considers FILE entirely up to the language implementer, which is perhaps reasonable if one wishes to keep the language (+library) specifications small but sad if you compare it with Java I/O streams.
I feel sure that virtual FILEs must be possible with any (fully) open source implementation of the C run-time, but I imagine there might be a large number of details making it trickier than it seems, and if it has already been done it would be a shame to reduplicate the effort. It would also be greatly preferable not to have to modify the CRT code. Without open source one might be able to reverse engineer the functions supplied, but I fear the result would be far too vulnerable to changes in unsupported features, unless there were a commitment to a set of interfaces. I suppose too that any system for which one can write a device driver would allow one to create a virtual device, but I suspect that of being unnecessarily low-level and of requiring one to write privileged code.
I have to admit that while I have code that would have benefited from virtual FILEs, I have no current requirement for it; nonetheless it is something I have often wondered about and that I imagine could be of interest to others.
This is somewhat similar to a-reader-interface-that-consumes-files-and-char-in-c, but there the questioner did not hope to return a virtual FILE; the answer, however, using fmemopen, did.
There is no standard C interface for creating virtual FILE*s, but both the GNU and the BSD standard libraries include one. On Linux (glibc), you can use fopencookie; on most *BSD systems (including Mac OS X), funopen. (See Note 1)
The two interfaces are similar but slightly different in some details. However, it is usually very simple to adapt code written for one interface to the other.
These are not complete virtualizations. They associate the FILE* with four callbacks and a void* context (the "cookie" in fopencookie). The callbacks are read, write, seek and close; there are no callbacks for flush or tell operations. Still, this is sufficient for many simple FILE* adaptors.
For a simple example, see the two answers to Write simultaneousely to two streams.
Notes:
funopen is derived from "functional open", not from "file unopen".
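As a rough sketch of the glibc interface, the following wraps two existing streams in one virtual FILE* so that every write is copied to both. The names tee_cookie, tee_write, tee_close and tee_open are made up for this example; only fopencookie and cookie_io_functions_t come from glibc. A funopen version for BSD would need the callback signatures adapted.

```c
#define _GNU_SOURCE            /* fopencookie is a glibc extension */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical cookie: the two streams every write is copied to. */
struct tee_cookie { FILE *a, *b; };

/* Write callback: glibc expects the number of bytes consumed, or 0 on error. */
static ssize_t tee_write(void *cookie, const char *buf, size_t size)
{
    struct tee_cookie *t = cookie;
    if (fwrite(buf, 1, size, t->a) != size) return 0;
    if (fwrite(buf, 1, size, t->b) != size) return 0;
    return (ssize_t)size;
}

/* Close callback: flush both targets but leave them open for the caller. */
static int tee_close(void *cookie)
{
    struct tee_cookie *t = cookie;
    int ra = fflush(t->a);
    int rb = fflush(t->b);
    free(t);
    return (ra == EOF || rb == EOF) ? EOF : 0;
}

/* Returns a write-only FILE* that duplicates output to a and b, or NULL. */
FILE *tee_open(FILE *a, FILE *b)
{
    assert(a != NULL && b != NULL);
    struct tee_cookie *t = malloc(sizeof *t);
    if (t == NULL) return NULL;
    t->a = a;
    t->b = b;
    cookie_io_functions_t io = {
        .read = NULL, .write = tee_write, .seek = NULL, .close = tee_close
    };
    FILE *f = fopencookie(t, "w", io);
    if (f == NULL) free(t);
    return f;
}
```

The returned stream can be passed to fprintf, fputs and friends unchanged; calling fclose on it flushes the two underlying streams, which the caller still owns and must close.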

c - sockets, why do ip are sent in integer format?

Question
I am wondering why we connect to sockets using functions like hton to take care of endianness when we could have sent the IP as a plain char array.
Say we want to connect to 184.54.12.169
There is an explanation for this, but I cannot figure out why we use integers instead of chars, and so get ourselves into endianness hell.
I think char out_ip[] = "184.54.12.169" could theoretically have done it.
Please explain the subtleties I don't get here.
The basic networking APIs are low-level functions: very thin wrappers around kernel system calls. Removing these low-level functions and forcing everything to use strings would be rather bad for such a low-level API, especially considering how tedious string handling is in C. As a concrete hurdle, IP strings would not even be fixed length, so handling them is a lot more complex than handling plain 32-bit integers. And moving string handling into the kernel goes against what a kernel is supposed to do; handling arbitrary user strings is really a user-space problem.
So, you want to create higher-level functions which would accept strings and do the conversion in the library. But, adding such higher level "convenience" functions all over the place in the core libraries would bloat them, because certainly passing IP numbers is not the only place for such convenience. These functions would need to be maintained forever and included everywhere, after they became part of standard (official like POSIX, or de-facto) libraries.
So, removing the low-level functions is not really an option, and adding more functions for higher-level API in the same library is not a good option either.
So the solution is to use another library to provide a higher-level networking API, which could, for example, handle address strings directly. I'm not sure what's out there for C, but it's almost a given for other languages, which also have "real" strings built in, so using them is not a hassle.
Because that's how an IP is transmitted in a packet. The "www.xxx.yyy.zzz" string form is really just a human readable form of a 4 byte integer that allows us to see the hierarchical nature a little easier. Sending a whole string would take up a lot more space as well.
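For what it's worth, the standard sockets API already includes a string-to-binary converter, inet_pton, so the text form only has to be parsed once, in user space. A minimal sketch, where make_sockaddr is a hypothetical helper name and not part of any library:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Fill *sa from a dotted-quad string and a host-order port.
   Returns 1 on success, 0 if ip is not a valid IPv4 address. */
int make_sockaddr(const char *ip, uint16_t port, struct sockaddr_in *sa)
{
    assert(ip != NULL && sa != NULL);
    memset(sa, 0, sizeof *sa);
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port);   /* 16-bit port, host to network order */
    /* inet_pton stores the 4 address bytes already in network order. */
    return inet_pton(AF_INET, ip, &sa->sin_addr) == 1;
}
```

After a successful call, the four bytes of sa->sin_addr are exactly the bytes that appear in the packet header, which is why no further string handling is needed below this layer.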
Take the number 127536: as a string it requires 7 bytes (six digits plus a terminator), not four. In addition, you need to parse it.
I.e. an integer is more efficient, and you do not have to deal with invalid values.

Ultra-portable, small complex config file library in ANSI C?

I'm looking for a very portable, minimalistic/small XML/configuration language library in ANSI C with no external dependencies (or very few), compiling down to less than 100K. I need it for a moderately complex configuration file, and it must support Unicode.
Some more requirements:
OK to use/embed/statically link into proprietary code. Credit will always be given where credit is due.
Not necessarily XML.
Really, clean code/no weird or inconsistent string handling.
UTF-8.
Thank you fellas.
This is somewhat similar to this question: Is there a good tiny XML parser for an embedded C project?
I was able to tweak the compilation flags of the following XML parser libraries for C, and cut down more than 50% of their size on my Ubuntu machine. Mini-XML is the only one close to what you requested:
Mini-XML (36K)
Expat (124K)
RXP (184K)
IMO Protocol Buffers are a much, much better solution for this kind of use case than XML. With protocol buffers you get real types and a schema without having to layer them on top of base XML. Also the syntax is nicer.
// The schema file: can serve as documentation for what
// configuration values are available.
message MyAppConfig {
  // Set to control the port the app listens on.
  optional int32 port = 1 [default=1234];
  // Set to control the local hostname.
  optional string hostname = 2 [default="localhost"];
}
Then the user's actual config would look like this:
# I want to listen on a very high port.
port: 50000
The main protocol buffer library does not fit your criteria because it is in C++ and is very large. I am working on a much smaller implementation of the same called upb (i.e. "micro" protobufs). It is written in ~5k lines of ANSI C and compiles to <50k.
Protocol buffers have both a binary and a text format, which are equivalent. My library does not (yet) support reading the text format, but you could have your users use the Google standard tool for converting the text version of their config to binary format ahead of time. Then your app itself would only need to read the binary format, and could just use upb.
upb is just now getting to the point where adventurous users could try it out, but it's a bit rough around the edges still and the APIs are still changing somewhat. If you're ok with this, you might try diving in now. If you prefer something more stable, at least keep upb on your radar.

XML messages over a TCP/IP socket

I am using C and want to know whether XML messages are preferable to plain text messages as far as communication over a socket connection is concerned.
Is there any other good option available rather than going for XML?
Which is the best parser(or parsing option) available for parsing XML in C?
Is there any standard library which comes with C and helps to parse XML messages?
You design the protocol, so you decide. You can use text or binary communication. Whatever format you use, you decide how to serialize/de-serialize and interpret the data. If you use XML, you can build on XMLRPC or SOAP. You can use JSONRPC as well. In my last project, I used binary in a very simple yet efficient way: the first byte identifies the method/function to call, the next 2 bytes give the length of the data (up to 64K - 1 bytes), and the rest is data. Take note of big/little endianness.
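A framing scheme like the one described (one byte for the method, two bytes for the length, then the data) could be sketched as follows. The function names and the choice of big-endian for the length field are assumptions for illustration, not the code from that project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { FRAME_HEADER_LEN = 3 };   /* 1 byte method + 2 bytes length */

/* Write [method][len hi][len lo][payload] into out.
   Returns the total frame size, or 0 if out is too small. */
size_t frame_encode(uint8_t method, const void *data, uint16_t len,
                    uint8_t *out, size_t cap)
{
    assert(out != NULL);
    size_t total = FRAME_HEADER_LEN + (size_t)len;
    if (cap < total) return 0;
    out[0] = method;
    out[1] = (uint8_t)(len >> 8);      /* explicit byte order, so the   */
    out[2] = (uint8_t)(len & 0xFF);    /* host's endianness is irrelevant */
    if (len > 0) memcpy(out + FRAME_HEADER_LEN, data, len);
    return total;
}

/* Read a frame from in[0..n). Returns a pointer to the payload and fills
   *method and *len, or returns NULL if the frame is not yet complete. */
const uint8_t *frame_decode(const uint8_t *in, size_t n,
                            uint8_t *method, uint16_t *len)
{
    if (n < FRAME_HEADER_LEN) return NULL;
    uint16_t l = (uint16_t)((in[1] << 8) | in[2]);
    if (n - FRAME_HEADER_LEN < l) return NULL;
    *method = in[0];
    *len = l;
    return in + FRAME_HEADER_LEN;
}
```

On a TCP socket the receiver would keep appending received bytes to a buffer and call frame_decode until it stops returning NULL, since one recv call may deliver less than a full frame.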
It's very subjective. You could use validating or non-validating parsers. TinyXML is a lightweight one. You can look into MiniXML and Expat. libxml2 is heavier.
So far XML parsing is not in standard libraries of C or C++. You could use the aforementioned libraries.
Good luck!
EDIT:
By the way, if you want to use binary format to exchange data, just use any of these 3:
http://tpl.sourceforge.net/ - C serialization library.
http://www.s11n.net/c11n/ - A powerful and complicated C serialization library.
Siseria - http://sourceforge.net/projects/siseria/ , purely in C. I wrote it for an embedded system project. It runs without any dependency and is very fast! Compared to the other two, mine is very simple and does not use the heap or dynamic memory at all. Everything is on the stack!
There are any number of possible solutions. I would look at a few other options before picking XML, I think. XML has quite a lot of overhead; unless you're going to compress your streams it might be a bit costly. XML also isn't easy to edit for humans, although of course more so than a binary format.
You might want to look at JSON, it's a very popular format and is far simpler than XML. There are plenty of implementations available.
I can highly recommend Protobuf, Google's data interchange format. We're using it for communicating between two processes, and it works great. It has built-in support for C++, Python, and Java, and third-party libraries for a bunch of others (Jon Skeet maintains the C# port).
The main question is what are your performance requirements.
If you are going to send one message per second, feel free. If you have a human interface on one end, use XML or any other text format.
If you design a machine-to-machine interface, you'd rather consider binary data. Remember to convert everything to network-standard byte order in this case.

Sample read/write handling of packets in C

I'm a bit new to C, but I've done my homework (some tutorials, books, etc.) and I need to program a simple server to handle requests from clients and interact with a db. I've gone through Beej's Guide to Network programming, but I'm a bit unsure how to piece together and handle different parts of the data getting sent back and forth.
For instance, say the client is sending some information that the server will put in multiple fields. How do I piece together that data to be sent and then break it back up on the server side?
Thanks,
Eric
If I understand correctly, you're asking, "how does the server understand the information the client sends it"?
If that's what you're asking, the answer is simple: it's mutually agreed upon ahead of time that the data structures each uses will be compatible. I.e. you decide upon what your communication protocol will be ahead of time.
So, for example, if I have a client-server application where the client connects and can ask for things such as "time", "date" and can say "settime " and "setdate ", I need to write my server in such a way that it will understand those commands.
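A minimal sketch of how a server might recognise those four commands once it has received a complete line. The enum, the function name, and the rule that the argument follows a single space are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    CMD_TIME, CMD_DATE, CMD_SETTIME, CMD_SETDATE, CMD_UNKNOWN
} command;

/* Classify one line (without the trailing newline). For the two "set"
   commands, *arg is pointed at the text after the space; else NULL. */
command parse_command(const char *line, const char **arg)
{
    assert(line != NULL && arg != NULL);
    *arg = NULL;
    if (strcmp(line, "time") == 0) return CMD_TIME;
    if (strcmp(line, "date") == 0) return CMD_DATE;
    if (strncmp(line, "settime ", 8) == 0) {
        *arg = line + 8;
        return CMD_SETTIME;
    }
    if (strncmp(line, "setdate ", 8) == 0) {
        *arg = line + 8;
        return CMD_SETDATE;
    }
    return CMD_UNKNOWN;   /* the server should answer with an error */
}
```

The important part is not the code but the agreement: both sides must share the same list of commands and the same rule for where a message ends.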
Obviously, in the above case it's trivial, since it'd just be a text-based protocol. But let's say you're writing an application that will return a struct of information, i.e.
struct Person {
    char* name;
    int age;
    int heightInInches;
    // ... other fields ...
};
You might write the entire struct out from the server/client. In this case there are a few things to be aware of:
You need to hton/ntoh properly
You need to make sure that your client and server both can understand the struct in question.
You may or may not have to align on a 4-byte boundary (because if you don't, different C compilers may do different things, which may burn you between the client and the server, or it may not).
In general, though, when writing a client/server app, the most important thing to get right is the communication protocol.
I'm not sure if this quite answers your question, though. Is this what you were after, or were you asking more about how, exactly, do you use the send/recv functions?
First, you define how the packet will look - what information will be in it. Make sure the definition is in an architecture-neutral format. That means that you specify it in a sequence that does not depend on whether the machine is big-endian or little-endian, for example, nor on whether you are compiling with 32-bit long or 64-bit long values. If the content is of variable length, make sure the definition contains the information needed to tell how long each part is - in particular, each variable length part should be preceded by a suitable count of its length.
When you need to package the data for transmission, you will take the raw (machine-specific) values and write them into a buffer (think 'character array') at the appropriate positions, in the appropriate format.
This buffer will be sent across the wire to the receiver, which will read it into another buffer, and then reverse the process to obtain the information from the buffer into local variables.
There are functions such as ntohs() to convert from a network ('n') to host ('h') format for a 'short' (meaning 16-bit) integer, and htonl() to convert from a host 'long' (32-bit integer) to network format - etc.
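The packing and unpacking steps described above can be sketched like this for a record similar to the Person struct from earlier in this thread. The wire layout (2-byte name length, name bytes, then two 4-byte integers, all in network order) and the function names are assumptions for illustration. Note that a pointer field such as name cannot be sent as-is; the bytes it points to have to be copied into the buffer.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: [name_len:2][name bytes][age:4][height:4], big-endian.
   Returns the number of bytes written, or 0 if it does not fit. */
size_t person_pack(const char *name, uint32_t age, uint32_t height_in,
                   uint8_t *out, size_t cap)
{
    assert(name != NULL && out != NULL);
    size_t name_len = strlen(name);
    if (name_len > UINT16_MAX || cap < 2 + name_len + 8) return 0;
    uint16_t n16 = htons((uint16_t)name_len);
    uint32_t a32 = htonl(age), h32 = htonl(height_in);
    uint8_t *p = out;
    memcpy(p, &n16, 2);         p += 2;   /* length prefix comes first */
    memcpy(p, name, name_len);  p += name_len;
    memcpy(p, &a32, 4);         p += 4;
    memcpy(p, &h32, 4);         p += 4;
    return (size_t)(p - out);
}

/* Reverse of person_pack. Returns 1 on success, 0 on a short or
   oversized message. name receives a NUL-terminated copy. */
int person_unpack(const uint8_t *in, size_t n, char *name, size_t name_cap,
                  uint32_t *age, uint32_t *height_in)
{
    uint16_t n16;
    uint32_t a32, h32;
    if (n < 2) return 0;
    memcpy(&n16, in, 2);
    size_t name_len = ntohs(n16);
    if (n < 2 + name_len + 8 || name_cap < name_len + 1) return 0;
    memcpy(name, in + 2, name_len);
    name[name_len] = '\0';
    memcpy(&a32, in + 2 + name_len, 4);
    memcpy(&h32, in + 2 + name_len + 4, 4);
    *age = ntohl(a32);
    *height_in = ntohl(h32);
    return 1;
}
```

Using fixed-width types such as uint32_t instead of int or long keeps the layout independent of the compiler, and copying with memcpy avoids any alignment assumptions about the buffer.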
One good book for networking is Stevens' "UNIX Network Programming, Vol 1, 3rd Edn". You can find out more about it at its web site, including example code.
As already mentioned above, what you need is a previously agreed means of communication. One thing that helps me is to use XML to communicate.
E.g., if you need to send the time to the client, include it in a tag called time.
Then parse it on the client side and read the tag value.
The biggest advantage is that once you have a parser in place on the client side, then even if you have to send some new information, you just have to agree on a tag name that will be parsed on the client side.
It helps me, and I hope it helps you too.
