I'd like to determine the time complexity of a printf call such as:
{
printf("%d",
i);
}
Or:
{
printf("%c",
array[i]);
}
Is it correct to assume that time complexity of a printf is always O(1) or not?
[EDIT] Let's take a function that swaps two values:
void swap(...)
{
tmp = x;
x = y;
y = tmp;
}
Every assignment expression costs 1 (in terms of time complexity), so T(n) = 1 + 1 + 1 = 3 which means O(1). But what can I say about this function?
void swap(...)
{
tmp = x;
x = y;
y = tmp;
printf("Value of x: %d", x);
printf("Value of y: %d", y);
}
Can I say that T(n) is still O(1) in this case?
I don't think this is really a sensible question to ask, because printf's behavior is mostly implementation-defined. C doesn't place any restrictions on what the system decides to do once it hits printf. It does have a notion of a stream. Section 7.21 of the C11 standard states that printf acts over a stream.
C lets the implementation do anything that it wants with streams after they're written to (7.21.2.2):
Characters may have to be added, altered, or deleted on input and output to conform to differing conventions for representing text in the host environment. Thus, there need not be a one-to-one correspondence between the characters in a stream and those in the external representation.
So your call to printf is allowed to write out 1 TB whenever a char is printed, and 1 byte whenever an int is printed.
The standard doesn't even require that the write happen when printf is actually called (7.21.3.3):
When a stream is unbuffered, characters are intended to appear from the source or at the destination as soon as possible. Otherwise characters may be accumulated and transmitted to or from the host environment as a block. When a stream is fully buffered, characters are intended to be transmitted to or from the host environment as a block when a buffer is filled... Support for these characteristics is implementation-defined.
And the standard doesn't specify whether stdout is buffered or unbuffered. So C allows printf to do pretty much whatever it feels like once you ask it for a write.
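Because the default is left to the implementation, a program that cares about buffering can request a mode explicitly with setvbuf. A minimal sketch, assuming a made-up helper name (make_fully_buffered) and an arbitrary 4096-byte buffer size; the standard expects setvbuf to be called before any other operation on the stream:

```c
#include <stdio.h>

/* Hypothetical helper: request full buffering for a stream.
 * Passing NULL lets the library allocate the buffer itself.
 * Returns 0 on success, nonzero if the request cannot be honored. */
int make_fully_buffered(FILE *stream)
{
    return setvbuf(stream, NULL, _IOFBF, 4096);
}
```

Even with this call, when the buffered characters actually reach the host environment is still up to the implementation.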
It is strange to try to evaluate time complexity of printf() as it's a blocking input/output operation that performs some text processing and performs a write operation via a series of write() system calls through an intermediate buffering layer.
The best guess about the text processing part is that the whole input string must be read and all arguments processed, so unless there's some black magic, you can assume O(n) in the number of characters. You're usually not expected to build the format argument of printf() dynamically, so its size is known and bounded, and the complexity is indeed O(1).
On the other hand, the time complexity of a blocking output operation is not bounded. In blocking mode, all write() operations return either with an error or with at least one byte written. Assuming the system is ready to accept new data in a constant time, you're getting O(1) as well.
Any other transformations also take time linear in the typically constant size of the format or result string, so with a number of assumptions, you can say it's O(1).
Also your code suggests that the output only occurs to actually test the functionality and shouldn't be considered part of the computation at all. The best way is to move the I/O out of the functions you want to consider for the purpose of complexity, e.g. to the main() function to stress that the input and output is there just for testing out the code.
Implementation of the O(1) swap function without I/O:
void swap(int *a, int *b)
{
int tmp = *a;
*a = *b;
*b = tmp;
}
Alternative implementation without a temporary variable (just for fun):
void swap(int *a, int *b)
{
if (a == b) return; /* XOR-ing an object with itself would zero it */
*a ^= *b;
*b ^= *a;
*a ^= *b;
}
Implementation of the main function:
int main(int argc, char **argv)
{
int a = 3, b = 5;
printf("a = %d; b = %d\n", a, b);
swap(&a, &b);
printf("a = %d; b = %d\n", a, b);
return 0;
}
Generally, the complexity of printf() is O(N) with N being the number of characters that are output. And this amount is not necessarily a small constant, as in these two calls:
printf("%s", myString);
printf("%*d", width, num);
The length of myString does not necessarily have an upper bound, so complexity of the first call is O(strlen(myString)), and the second call will output width characters, which can be expected to take O(width) time.
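The amount of output for these two calls can be measured without printing anything, because snprintf with a zero-size buffer only counts the characters it would have written. A small sketch (the helper names are made up for illustration):

```c
#include <stdio.h>

/* Number of characters printf("%s", s) would emit. */
int chars_for_string(const char *s)
{
    return snprintf(NULL, 0, "%s", s);
}

/* Number of characters printf("%*d", width, num) would emit. */
int chars_for_padded_int(int width, int num)
{
    return snprintf(NULL, 0, "%*d", width, num);
}
```

The first result grows with strlen(s) and the second is at least width, which is the N in the O(N) estimate.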
However, in most cases the amount of output written by printf() will be bounded by a small constant: format strings are generally compile time constants and computed field widths as above are rarely used. String arguments are more frequent, but oftentimes allow giving an upper limit as well (like when you output a string from a list of error messages). So, I'd wager that at least 90% of the real world printf() calls are O(1).
Time complexity does not measure how much time a particular program takes to run. It is measured by how many times the program's basic operations are executed. If you are considering a single simple printf() statement, it is executed once, so its time complexity is O(1).
Refer:
Time Complexity For Algorithm
In C99, is there a big difference between these two?:
int main() {
int n , m;
scanf("%d %d", &n, &m);
int X[n][m];
X[n-1][m-1] = 5;
printf("%d", X[n-1][m-1]);
}
and:
int main(int argc, char *argv[]) {
int n , m;
int X[n][m];
scanf("%d %d", &n, &m);
X[n-1][m-1] = 5;
printf("%d", X[n-1][m-1]);
}
The first one seems to always work, whereas the second one appears to work for most inputs, but gives a segfault for the inputs 5 5 and 6 6 and returns a different value than 5 for the input 9 9. So do you need to make sure to get the values before declaring them with variable length arrays or is there something else going on here?
When the second one works, it's pure chance. The fact that it ever works proves that, thankfully, compilers can't yet make demons fly out of your nose.
Declaring a variable doesn't necessarily initialize it. int n, m; leaves both n and m with indeterminate values in this case, and attempting to use those values is undefined behavior. If the raw bits in the memory those variables occupy happen to be interpreted as values at least as large as the values later entered for n and m -- which is very, very far from guaranteed -- then your code will appear to work; if not, it won't. Your compiler could also have made this segfault, or made it melt your CPU; it's undefined behavior, so anything can happen.
For example, let's say that the area of memory that the compiler dedicates to n happened to contain the number 10589231, and m got 14. If you then entered an n of 12 and an m of 6, you're golden -- the array happens to be big enough. On the other hand, if n got 4 and m got 2, then your code will look past the end of the array, and you'll get undefined behavior -- which might not even break, since it's entirely possible that the bits stored in four-byte segments after the end of the array are both accessible to your program and valid integers according to your compiler/the C standard. In addition, it's possible for n and m to end up with negative values, which leads to... weird stuff. Probably.
Of course, this is all fluff and speculation depending on the compiler, OS, time of day, and phase of the moon [1], and you can't rely on any numbers happening to be initialized to the right ones.
With the first one, on the other hand, you're assigning the values through scanf, so (assuming it doesn't error) (and the entered numbers aren't negative) (or zero) you're going to have valid indices, because the array is guaranteed to be big enough because the variables are initialized properly.
Just to be clear, the fact that variables are required to be zero-initialized under some circumstances doesn't mean you should rely on that behavior. You should always explicitly give variables a default value, or initialize them as soon as possible after their declaration (in the case of using something like scanf). This makes your code clearer, and prevents people from wondering if you're relying on this type of UB.
[1] Source: Ryan Bemrose, in chat
int X[n][m]; means to declare an array whose dimensions are the values that n and m currently have. C code doesn't look into the future; statements and declarations are executed in the order they are encountered.
In your second code you did not give n or m values, so this is undefined behaviour which means that anything may happen.
Here is another example of sequential execution:
int x = 5;
printf("%d\n", x);
x = 7;
This will print 5, not 7.
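The same ordering rule applied to the variable length array can be sketched as follows. The helper name is made up for illustration; the point is only that n and m already hold valid values at the moment the array is declared:

```c
/* Hypothetical helper: the dimensions are known before the VLA is
 * declared, so X really has n rows and m columns.
 * Returns -1 for dimensions that would make the declaration invalid. */
int write_last_cell(int n, int m)
{
    if (n < 1 || m < 1)
        return -1;

    int X[n][m];            /* n and m are evaluated here, right now */
    X[n - 1][m - 1] = 5;
    return X[n - 1][m - 1];
}
```

In the question's first program, the scanf call plays the role of the parameters here: it runs before the declaration, so the sizes are already set.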
The second one should produce bugs because n and m hold pretty much random values if they're local variables, since locals are not initialized automatically. If they're global, they'll have the value 0.
I would like to know if I can choose the storage location of arrays in C. There are a couple of questions already on here with some helpful info, but I'm looking for some extra info.
I have an embedded system with a soft-core ARM Cortex implemented on an FPGA.
Upon start-up, code is loaded from memory and executed by the processor. My code is in assembly and contains some C functions. One particular function is a UART interrupt handler, which I have included below:
void UART_ISR()
{
int count, n=1000, t1=0, t2=1, display=0, y, z;
int x[1000]; //storage array for first 1000 terms of Fibonacci series
x[1] = t1;
x[2] = t2;
printf("\n\nFibonacci Series: \n\n %d \n %d \n ", t1, t2);
count=2; /* count=2 because first two terms are already displayed. */
while (count<n)
{
display=t1+t2;
t1=t2;
t2=display;
x[count] = t2;
++count;
printf(" %d \n",display);
}
printf("\n\n Finished. Sequence written to memory. Reading sequence from memory.....:\n\n");
for (z=0; z<10000; z++){} // Delay
for (y=0; y<1000; y++) { //Read variables from memory
printf("%d \n",x[y]);
}
}
So basically the first 1000 values of the Fibonacci series are printed and stored in array X and then values from the array are printed to the screen again after a short delay.
Please correct me if I'm wrong but the values in the array X are stored on the stack as they are computed in the for loop and retrieved from the stack when the array is read from memory.
Here is the memory map of the system:
0x0000_0000 to 0x0000_0be0 is the code
0x0000_0be0 to 0x0010_0be0 is 1MB heap
0x0010_0be0 to 0x0014_0be0 is 256KB stack
0x0014_0be0 to 0x03F_FFFF is off-chip RAM
Is there a function in C that allows me to store the array X in the off-chip RAM for later retrieval?
Please let me know if you need any more info
Thanks very much for helping
--W
No, not "in C" as in "specified by the language".
The C language doesn't care about where things are stored; it specifies nothing about the existence of RAM at particular addresses.
But, actual implementations in the form of compilers, assemblers and linkers, often care a great deal about this.
With gcc for instance, you can use the section variable attribute to force a variable into a particular section.
You can then control the linker to map that section to a particular memory area.
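A rough sketch of the first half of that approach, assuming GCC or Clang targeting an ELF platform. The section name ".ext_ram", the array name, and the helper functions are all made up for illustration, and the linker script that maps the section onto the off-chip RAM range is toolchain-specific and not shown:

```c
#include <stddef.h>

/* Assumption: ".ext_ram" is a hypothetical section name. The linker
 * script must place this section in the off-chip RAM region. */
__attribute__((section(".ext_ram")))
static int ext_buffer[1000];

/* Store one value; returns 0 on success, -1 if index is out of range. */
int store_value(size_t index, int value)
{
    if (index >= sizeof ext_buffer / sizeof ext_buffer[0])
        return -1;
    ext_buffer[index] = value;
    return 0;
}

/* Read one value back; the caller must pass an index below 1000. */
int load_value(size_t index)
{
    return ext_buffer[index];
}
```

From the C code's point of view ext_buffer is an ordinary array; only the linker decides which addresses it ends up at.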
UPDATE:
The other way to do this is manually, by not letting the compiler in on the secret and doing it yourself.
Something like:
int *external_array = (int *) 0x00140be0;
memcpy(external_array, x, sizeof x);
will copy the required number of bytes to the external memory. You can then read it back by swapping the two first arguments in the memcpy() call.
Note that this is way more manual, low-level and fragile, compared to letting the compiler/linker dynamic duo Just Make it Work for you.
Also, it seems very unlikely that you want to do all of that work from an ISR.
With the following code, I am trying to make an array of numbers and then sort them. But if I set a high array size (MAX), the program stops at the last 'randomly' generated number and does not continue to the sorting at all. Could anyone please give me a hand with this?
#include <stdio.h>
#define MAX 2000000
int a[MAX];
int rand_seed=10;
/* from K&R
- returns random number between 0 and 62000.*/
int rand();
int bubble_sort();
int main()
{
int i;
/* fill array */
for (i=0; i < MAX; i++)
{
a[i]=rand();
printf(">%d= %d\n", i, a[i]);
}
bubble_sort();
/* print sorted array */
printf("--------------------\n");
for (i=0; i < MAX; i++)
printf("%d\n",a[i]);
return 0;
}
int rand()
{
rand_seed = rand_seed * 1103515245 +12345;
return (unsigned int)(rand_seed / 65536) % 62000;
}
int bubble_sort(void)
{
int t, x, y;
/* bubble sort the array */
for (x=0; x < MAX-1; x++)
for (y=0; y < MAX-x-1; y++)
if (a[y] > a[y+1])
{
t=a[y];
a[y]=a[y+1];
a[y+1]=t;
}
return 0;
}
The problem is that you are storing the array in the global section. C doesn't give any guarantee about the maximum size of the global section it can support; that depends on the OS, architecture, and compiler.
So instead of creating a global array, create a global pointer and allocate a large chunk using malloc. The memory then comes from the heap, which is much bigger and can grow at runtime.
Your array will land in the BSS section for static variables. It will not be part of the image, but the program loader will allocate the required space and fill it with zeros before your program starts 'real' execution. You can even control this process if you use an embedded compiler, and fill your static data with anything you like. This array may occupy 2GB of your RAM and yet your exe file may be a few kilobytes. I've just managed to use an array of over 2GB this way and my exe was 34KB. I can believe a compiler may warn you when you approach maybe 2^31-1 elements (if your int is 32-bit), but static arrays with 2 million elements are not a problem nowadays (unless it is an embedded system, but I bet it is not).
The problem might be that your bubble sort has 2 nested loops (as all bubble sorts do), so trying to sort this array of 2 million elements causes the program to loop 2*10^12 times (arithmetic sequence):
inner loop:
1: 1999999 times
2: 1999998 times
...
2000000: 1 time
So you must swap elements
2000000 * (1999999+1) / 2 = (4 / 2) * 1000000^2 = 2*10^12 times
(correct me if I am wrong above)
Your program simply spends too long in the sort routine and you are not even aware of it. What you see is just the last random number printed and the program not responding. Even on my really fast PC, sorting a 200K array this way took around 1 minute.
It is not related to your OS, compiler, heap, etc. Your program is just stuck because your loop executes 2*10^12 times if you have 2 million elements.
To verify this, print "sort started" before sorting and "sort finished" after it. I bet the last thing you'll see is "sort started". In addition, you may print the current x value before the inner loop in bubble_sort; you'll see that it is still working.
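The iteration count quoted above can be double-checked with a few lines. A sketch (the function name is made up for illustration) that evaluates the arithmetic series (n-1) + (n-2) + ... + 1 for any array size:

```c
/* Number of times the inner comparison in the question's bubble_sort
 * runs for an array of n elements: n * (n - 1) / 2. */
unsigned long long bubble_comparisons(unsigned long long n)
{
    return n * (n - 1) / 2;
}
```

For n = 2000000 this gives 1999999000000, which is roughly 2*10^12.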
Dynamic Array
int *Array;
Array= malloc (sizeof(int) * Size);
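A slightly fuller sketch of the same idea, with the failure case made visible. The helper name is made up, and calloc is used instead of malloc only so that the elements start out as zero, like the global array in the question:

```c
#include <stdlib.h>

/* Hypothetical helper: allocate a zero-filled array of `size` ints on
 * the heap. Returns NULL if the allocation fails; the caller is
 * responsible for calling free() on the result. */
int *make_int_array(size_t size)
{
    return calloc(size, sizeof(int));
}
```

The caller should check the returned pointer against NULL before using it, since a request this large is allowed to fail.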
The original C standard (ANSI 1989/ISO 1990) required that a compiler successfully translate at least one program containing at least one example of a set of environmental limits. One of those limits was being able to create an object of at least 32,767 bytes.
This minimum limit was raised in the 1999 update to the C standard to be at least 65,535 bytes.
No C implementation is required to provide for objects greater than that size, which means that they don't need to allow for an array of ints greater than
(int)(65535 / sizeof(int)).
In very practical terms, on modern computers, it is not possible to say in advance how large an array can be created. It can depend on things like the amount of physical memory installed in the computer, the amount of virtual memory provided by the OS, the number of other tasks, drivers, and programs already running, and how much memory they are using. So your program may be able to use more or less memory running today than it could use yesterday or will be able to use tomorrow.
Many platforms place their strictest limits on automatic objects, that is those defined inside of a function without the use of the 'static' keyword. On some platforms you can create larger arrays if they are static or by dynamic allocation.
How to get and evaluate the Expressions from a string in C
char *str = "2*8-5+6";
This should give the result as 17 after evaluation.
Try it yourself. You can use a stack data structure to evaluate this string. Here is a reference implementation (it's in C++):
stack data structure for string calculation
You have to do it yourself, C does not provide any way to do this. C is a very low level language. Simplest way to do it would be to find a library that does it, or if that does not exist use lex + yacc to create your own interpreter.
A quick google suggests the following:
http://www.gnu.org/software/libmatheval/
http://expreval.sourceforge.net/
You should try TinyExpr. It's a single C source code file (with no dependencies) that you can add to your project.
Using it to solve your problem is just:
#include <stdio.h>
#include "tinyexpr.h"
int main()
{
double result = te_interp("2*8-5+6", 0);
printf("Result: %f\n", result);
return 0;
}
That will print out: Result: 17.000000
C does not have a standard eval() function.
There are lots of libraries and other tools out there that can do this.
But if you'd like to learn how to write an expression evaluator yourself, it can be surprisingly easy. It is not trivial: it is actually a pretty deeply theoretical problem, because you're basically writing a miniature parser, perhaps built on a miniature lexical analyzer, just like a real compiler.
One straightforward way of writing a parser involves a technique called recursive descent. Writing a recursive descent parser has a lot in common with another great technique for solving big or hard problems, namely by breaking the big, hard problem up into smaller and hopefully easier subproblems.
So let's see what we can come up with. We're going to write a function int eval(const char * expr) that takes a string containing an expression, and returns the int result of evaluating it. But first let's write a tiny main program to test it with. We'll read a line of text typed by the user using fgets, pass it to our eval() function, and print the result.
#include <stdio.h>
int eval(const char *expr);
int main()
{
char line[100];
while(1) {
printf("Expression? ");
if(fgets(line, sizeof line, stdin) == NULL) break;
printf(" -> %d\n", eval(line));
}
}
So now we start writing eval(). The first question is, how will we keep track of how far we've read through the string as we parse it? A simple (although mildly cryptic) way of doing this will be to pass around a pointer to a pointer to the next character. That way any function can move forwards (or occasionally backwards) through the string. So our eval() function is going to do almost nothing, except take the address of the pointer to the string to be parsed, resulting in the char ** we just decided we need, and call a function evalexpr() to do the work. (But don't worry, I'm not procrastinating; in just a second we'll start doing something interesting.)
int evalexpr(const char **);
int eval(const char *expr)
{
return evalexpr(&expr);
}
So now it's time to write evalexpr(), which is going to start doing some actual work. Its job is to do the first, top-level parse of the expression. It's going to look for a series of "terms" being added or subtracted. So it wants to get one or more subexpressions, with + or - operators between them. That is, it's going to handle expressions like
1 + 2
or
1 + 2 - 3
or
1 + 2 - 3 + 4
Or it can read a single expression like
1
Or any of the terms being added or subtracted can be a more-complicated subexpression, so it will also be able to (indirectly) handle things like
2*3 + 4*5 - 9/3
But the bottom line is that it wants to take an expression, then maybe a + or - followed by another subexpression, then maybe a + or - followed by another subexpression, and so on, as long as it keeps seeing a + or -. Here is the code. Since it's adding up the additive "terms" of the expression, it gets subexpressions by calling a function evalterm(). It also needs to look for the + and - operators, and it does this by calling a function gettok(). Sometimes it will see an operator other than + or -, but those are not its job to handle, so if it sees one of those it "ungets" it, and returns, because it's done. All of these functions pass the pointer-to-pointer p around, because as I said earlier, that's how all of these functions keep track of how they're moving through the string as they parse it.
int evalterm(const char **);
int gettok(const char **, int *);
void ungettok(int, const char **);
int evalexpr(const char **p)
{
int r = evalterm(p);
while(1) {
int op = gettok(p, NULL);
switch(op) {
case '+': r += evalterm(p); break;
case '-': r -= evalterm(p); break;
default: ungettok(op, p); return r;
}
}
}
This is some pretty dense code. Stare at it carefully and convince yourself that it's doing what I described. It calls evalterm() once, to get the first subexpression, and assigns the result to the local variable r. Then it enters a potentially infinite loop, because it can handle an arbitrary number of added or subtracted terms. Inside the loop, it gets the next operator in the expression, and uses it to decide what to do. (Don't worry about the second, NULL argument to gettok; we'll get to that in a minute.)
If it sees a +, it gets another subexpression (another term) and adds it to the running sum. Similarly, if it sees a -, it gets another term and subtracts it from the running sum. If it gets anything else, this means it's done, so it "ungets" the operator it doesn't want to deal with, and returns the running sum -- which is literally the value it has evaluated. (The return statement also breaks the "infinite" loop.)
At this point you're probably feeling a mixture of "Okay, this is starting to make sense" but also "Wait a minute, you're playing pretty fast and loose here, this is never going to work, is it?" But it is going to work, as we'll see.
The next function we need is the one that collects the "terms" or subexpressions to be added (and subtracted) together by evalexpr(). That function is evalterm(), and it ends up being very similar -- very similar -- to evalexpr. Its job is to collect a series of one or more subexpressions joined by * and/or /, and multiply and divide them. At this point, we're going to call those subexpressions "primaries". Here is the code:
int evalpri(const char **);
int evalterm(const char **p)
{
int r = evalpri(p);
while(1) {
int op = gettok(p, NULL);
switch(op) {
case '*': r *= evalpri(p); break;
case '/': r /= evalpri(p); break;
default: ungettok(op, p); return r;
}
}
}
There's actually nothing more to say here, because the structure of evalterm ends up being exactly like evalexpr, except that it does things with * and /, and it calls evalpri to get/evaluate its subexpressions.
So now let's look at evalpri. Its job is to evaluate the three lowest-level, but highest-precedence elements of an expression: actual numbers, and parenthesized subexpressions, and the unary - operator.
int evalpri(const char **p)
{
int v;
int op = gettok(p, &v);
switch(op) {
case '1': return v;
case '-': return -evalpri(p);
case '(':
v = evalexpr(p);
op = gettok(p, NULL);
if(op != ')') {
fprintf(stderr, "missing ')'\n");
ungettok(op, p);
}
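return v;
default: /* anything else is not a valid primary */
fprintf(stderr, "unexpected token\n");
ungettok(op, p);
return 0;
}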
}
The first thing to do is call the same gettok function we used in evalexpr and evalterm. But now it's time to say a little more about it. It is actually the lexical analyzer used by our little parser. A lexical analyzer returns primitive "tokens". Tokens are the basic syntactic elements of a programming language. Tokens can be single characters, like + or -, or they can also be multi-character entities. An integer constant like 123 is considered a single token. In C, other tokens are keywords like while, and identifiers like printf, and multi-character operators like <= and ++. (Our little expression evaluator doesn't have any of those, though.)
So gettok has to return two things. First it has to return a code for what kind of token it found. For single-character tokens like + and - we're going to say that the code is just the character. For numeric constants (that is, any numeric constant), we're going to say that gettok is going to return the character 1. But we're going to need some way of knowing what the value of the numeric constant was, and that, as you may have guessed, is what the second, pointer argument to the gettok function is for. When gettok returns 1 to indicate a numeric constant, and if the caller passes a pointer to an int value, gettok will fill in the integer value there. (We'll see the definition of the gettok function in a moment.)
At any rate, with that explanation of gettok out of the way, we can understand evalpri. It gets one token, passing the address of a local variable v in which the value of the token can be returned, if necessary. If the token is a 1 indicating an integer constant, we simply return the value of that integer constant. If the token is a -, this is a unary minus sign, so we get another subexpression, negate it, and return it. Finally, if the token is a (, we get another whole expression, and return its value, checking to make sure that there's another ) token after it. And, as you may notice, inside the parentheses we make a recursive call back to the top-level evalexpr function to get the subexpression, because obviously we want to allow any subexpression, even one containing lower-precedence operators like + and -, inside the parentheses.
And we're almost done. Next we can look at gettok. As I mentioned, gettok is the lexical analyzer, inspecting individual characters and constructing full tokens from them. We're now, finally, starting to see how the passed-around pointer-to-pointer p is used.
#include <stdlib.h>
#include <ctype.h>
void skipwhite(const char **);
int gettok(const char **p, int *vp)
{
skipwhite(p);
char c = **p;
if(isdigit((unsigned char)c)) {
char *p2;
int v = strtoul(*p, &p2, 0);
*p = p2;
if(vp) *vp = v;
return '1';
}
(*p)++;
return c;
}
Expressions can contain arbitrary whitespace, which is ignored, so we skip over that with an auxiliary function skipwhite. And now we look at the next character. p is a pointer to pointer to that character, so the character itself is **p. If it's a digit, we call strtoul to convert it. strtoul helpfully returns a pointer to the character following the number it scans, so we use that to update p. We fill in the passed pointer vp with the actual value strtoul computed for us, and we return the code 1 indicating an integer constant.
Otherwise -- if the next character isn't a digit -- it's an ordinary character, presumably an operator like + or - or punctuation like ( ), so we simply return the character, after incrementing *p to record the fact that we've consumed it. Properly "incrementing" p is mildly tricky: it's a pointer to a pointer, and we want to increment the pointed-to pointer. If we wrote p++ or *p++ it would increment the pointer p, so we need (*p)++ to say that it's the pointed-to pointer that we want to increment. (See also C FAQ 4.3.)
Two more utility functions, and then we're done. Here's skipwhite:
void skipwhite(const char **p)
{
while(isspace((unsigned char)**p))
(*p)++;
}
This simply skips over zero or more whitespace characters, as determined by the isspace function from <ctype.h>. (Again, we're taking care to remember that p is a pointer-to-pointer.)
Finally, we come to ungettok. It's a hallmark of a recursive descent parser (or, indeed, almost any parser) that it has to "look ahead" in the input, making a decision based on the next token. Sometimes, however, it decides that it's not ready to deal with the next token after all, so it wants to leave it there on the input for some other part of the parser to deal with later.
Stuffing input "back on the input stream", so to speak, can be tricky. This implementation of ungettok is simple, but it's decidedly imperfect:
void ungettok(int op, const char **p)
{
(*p)--;
}
It doesn't even look at the token it's been asked to put back; it just backs the pointer up by 1. This will work if (but only if) the token it's being asked to unget is in fact the most recent token that was gotten, and if it's not an integer constant token. In fact, for the program as written, and as long as the expression it's parsing is well-formed, this will always be the case. It would be possible to write a more complicated version of ungettok that explicitly checked for these assumptions, and that would be able to back up over multi-character tokens (such as integer constants) if necessary, but this post has gotten much longer than I had intended, so I'm not going to worry about that for now.
But if you're still with me, we're done! And if you haven't already, I encourage you to copy all the code I've presented into your friendly neighborhood C compiler, and try it out. You'll find, for example, that 1 + 2 * 3 gives 7 (not 9), because the parser "knows" that * and / have higher precedence than + and -. Just like in a real compiler, you can override the default precedence using parentheses: (1 + 2) * 3 gives 9. Left-to-right associativity works, too: 1 - 2 - 3 is -4, not +2. It handles plenty of complicated, and perhaps surprising (but legal) cases, too: (((((5))))) evaluates to just 5, and ----4 evaluates to just 4 (it's parsed as "negative negative negative negative four", since our simplified parser doesn't have C's -- operator).
This parser does have a pretty big limitation, however: its error handling is terrible. It will handle legal expressions, but for illegal expressions, it will either do something bizarre, or just ignore the problem. For example, it simply ignores any trailing garbage it doesn't recognize or wasn't expecting -- the expressions 4 + 5 x, 4 + 5 %, and 4 + 5 ) all evaluate to 9.
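One way the trailing-garbage problem could be tightened up is sketched below. This is not the code from this post but a compact, self-contained variant of the same recursive descent idea: it peeks at the next character instead of using gettok/ungettok, threads an error flag through the calls, and rejects input that is not fully consumed. All function names here are made up for the sketch.

```c
#include <ctype.h>
#include <stdlib.h>

static long parse_expr(const char **p, int *err);

static void skip_ws(const char **p)
{
    while (isspace((unsigned char)**p))
        (*p)++;
}

/* primary: number | '-' primary | '(' expr ')' */
static long parse_primary(const char **p, int *err)
{
    skip_ws(p);
    if (isdigit((unsigned char)**p)) {
        char *end;
        long v = strtol(*p, &end, 10);
        *p = end;
        return v;
    }
    if (**p == '-') {
        (*p)++;
        return -parse_primary(p, err);
    }
    if (**p == '(') {
        (*p)++;
        long v = parse_expr(p, err);
        skip_ws(p);
        if (**p == ')')
            (*p)++;
        else
            *err = 1;               /* missing ')' */
        return v;
    }
    *err = 1;                       /* unexpected character or end of input */
    return 0;
}

/* term: primary { ('*' | '/') primary } */
static long parse_term(const char **p, int *err)
{
    long r = parse_primary(p, err);
    for (;;) {
        skip_ws(p);
        if (**p == '*') {
            (*p)++;
            r *= parse_primary(p, err);
        } else if (**p == '/') {
            (*p)++;
            long d = parse_primary(p, err);
            if (d == 0) {           /* refuse to divide by zero */
                *err = 1;
                return 0;
            }
            r /= d;
        } else {
            return r;
        }
    }
}

/* expr: term { ('+' | '-') term } */
static long parse_expr(const char **p, int *err)
{
    long r = parse_term(p, err);
    for (;;) {
        skip_ws(p);
        if (**p == '+') {
            (*p)++;
            r += parse_term(p, err);
        } else if (**p == '-') {
            (*p)++;
            r -= parse_term(p, err);
        } else {
            return r;
        }
    }
}

/* Returns 0 and stores the value in *result on success. Returns -1 on a
 * syntax error, including trailing garbage after the expression. */
int eval_checked(const char *s, long *result)
{
    int err = 0;
    long v = parse_expr(&s, &err);
    skip_ws(&s);
    if (err || *s != '\0')
        return -1;
    *result = v;
    return 0;
}
```

With this variant, 2*8-5+6 still evaluates to 17, while 4 + 5 x and 4 + 5 ) are reported as errors instead of quietly evaluating to 9. Overflow is still not checked.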
Despite being somewhat of a "toy", this is also a very real parser, and as we've seen it can parse a lot of real expressions. You can learn a lot about how expressions are parsed (and about how compilers can be written) by studying this code. (One footnote: recursive descent is not the only way to write a parser, and in fact real compilers will usually use considerably more sophisticated techniques.)
You might even want to try extending this code, to handle other operators or other "primaries" (such as settable variables). Once upon a time, in fact, I started with something like this and extended it all the way into an actual C interpreter.
I was doing massive parsing of positive integers using scanf("%d", &someint). As I wanted to see if scanf was a bottleneck, I implemented a naive integer parsing function using fread, just like:
int result;
char c;
while (fread(&c, sizeof c, 1, stdin), c == ' ' || c == '\n')
;
result = c - '0';
while (fread(&c, sizeof c, 1, stdin), c >= '0' && c <= '9') {
result *= 10;
result += c - '0';
}
return result;
But to my astonishment, the performance of this function is (even with inlining) about 50% worse. Shouldn't there be a possibility to improve on scanf for specialized cases? Isn't fread supposed to be fast (additional hint: Integers are (edit: mostly) 1 or 2 digits)?
The overhead I was encountering was not the parsing itself but the many calls to fread (same with fgetc, and friends). For each call, the libc has to lock the input stream to make sure two threads aren't stepping on each other's feet. Locking is a very costly operation.
What we're looking for is a function that gives us buffered input (reimplementing buffering is just too much effort) but avoids the huge locking overhead of fgetc.
If we can guarantee that there is only a single thread using the input stream, we can use the functions from unlocked_stdio(3), such as getchar_unlocked(3). Here is an example:
static int parseint(void)
{
int c, n;
n = getchar_unlocked() - '0';
while (isdigit((c = getchar_unlocked())))
n = 10*n + c-'0';
return n;
}
The above version doesn't check for errors. But it's guaranteed to terminate. If error handling is needed it might be enough to check feof(stdin) and ferror(stdin) at the end, or let the caller do it.
My original purpose was submitting solutions to programming problems at SPOJ, where the input is only whitespace and digits. So there is still room for improvement, namely the isdigit check.
static int parseint(void)
{
int c, n;
n = getchar_unlocked() - '0';
while ((c = getchar_unlocked()) >= '0')
n = 10*n + c-'0';
return n;
}
It's very, very hard to beat this parsing routine, both performance-wise and in terms of convenience and maintainability.
You'll be able to improve significantly on your example by buffering - read a large number of characters into memory, and then parse them from the in-memory version.
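The in-memory parsing step might look roughly like this. It is a sketch under the question's assumptions (the input holds only digits and whitespace); the function name and signature are made up, and no overflow checking is done:

```c
#include <stddef.h>

/* Parse non-negative integers from a buffer of `len` bytes. Stores up to
 * `max` values in `out` and returns how many were stored. */
size_t parse_ints(const char *buf, size_t len, int *out, size_t max)
{
    size_t i = 0, count = 0;

    while (i < len && count < max) {
        /* skip everything that is not a digit */
        while (i < len && (buf[i] < '0' || buf[i] > '9'))
            i++;
        if (i == len)
            break;

        /* accumulate one run of digits */
        int n = 0;
        while (i < len && buf[i] >= '0' && buf[i] <= '9')
            n = 10 * n + (buf[i++] - '0');
        out[count++] = n;
    }
    return count;
}
```

The buffer would typically be filled by one large fread call. A number that happens to be split across two chunks needs extra handling that this sketch leaves out.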
If you're reading from disk you might get a performance increase by your buffer being a multiple of the block size.
Edit: You can let the kernel handle this for you by using mmap to map the file into memory.
Here's something I use.
#define scan(x) do{while((x=getchar())<'0'); for(x-='0'; '0'<=(_=getchar()); x=(x<<3)+(x<<1)+_-'0');}while(0)
char _;
However, this only works with non-negative integers.
From what you say, I derive the following facts:
numbers are in the range of 0-99, which accounts for 10+100 different strings (including leading zeros)
you trust that your input stream adheres to some sort of specification and won't contain any unexpected character sequences
In that case, I'd use a lookup table to convert strings to numbers. Given a string s[2], the index to your lookup table can be computed by s[1]*10 + s[0], swapping the digits and making use of the fact that '\0' equals 0 in ASCII.
Then, you can read your input in the following way:
// given our lookup method, this table may need padding entries
int lookup_table[] = { /*...*/ };
// no need to call superfluous functions
#define str2int(x) (lookup_table[(x)[1]*10 + (x)[0]])
while(read_token_from_stream(stdin, buf))
next_int = str2int(buf);
On today's machines, it will be hard to come up with a faster technique. My guess is that this method will likely run 2 to 10 times faster than any scanf()-based approach.
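To make the indexing scheme concrete, here is one way the table could be filled. This is a sketch assuming an ASCII-compatible character set; LOOKUP_SIZE and init_lookup_table are names made up for illustration, and the unused slots are the padding entries mentioned in the comment above:

```c
/* Largest index is '9' * 10 + '9' = 627 in ASCII, so 628 entries. */
#define LOOKUP_SIZE ('9' * 10 + '9' + 1)

static int lookup_table[LOOKUP_SIZE];

/* Same indexing as in the text: second character times ten plus the
 * first character. For a one-digit string, (x)[1] is '\0', i.e. 0. */
#define str2int(x) (lookup_table[(x)[1] * 10 + (x)[0]])

/* Fill the entries for "0".."9" and "00".."99"; other slots stay 0. */
void init_lookup_table(void)
{
    for (int d = 0; d < 10; d++)
        lookup_table['0' + d] = d;

    for (int tens = 0; tens < 10; tens++)
        for (int ones = 0; ones < 10; ones++)
            lookup_table[('0' + ones) * 10 + ('0' + tens)] = tens * 10 + ones;
}
```

After init_lookup_table() has run, str2int("7") yields 7 and str2int("42") yields 42. The lookup is only safe for strings that match the stated assumptions.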