Faster than scanf? - c

I was doing massive parsing of positive integers using scanf("%d", &someint). As I wanted to see if scanf was a bottleneck, I implemented a naive integer parsing function using fread, just like:
int result;
char c;

/* skip whitespace */
while (fread(&c, sizeof c, 1, stdin), c == ' ' || c == '\n')
    ;
result = c - '0';
/* note: the condition must use &&, not ||, or the loop never terminates */
while (fread(&c, sizeof c, 1, stdin), c >= '0' && c <= '9') {
    result *= 10;
    result += c - '0';
}
return result;
But to my astonishment, this function performs (even with inlining) about 50% worse. Shouldn't it be possible to improve on scanf for specialized cases? Isn't fread supposed to be fast (additional hint: the integers are (edit: mostly) 1 or 2 digits)?

The overhead I was encountering was not the parsing itself but the many calls to fread (same with fgetc, and friends). For each call, the libc has to lock the input stream to make sure two threads aren't stepping on each other's feet. Locking is a very costly operation.
What we're looking for is a function that gives us buffered input (reimplementing buffering is just too much effort) but avoids the huge locking overhead of fgetc.
If we can guarantee that there is only a single thread using the input stream, we can use the functions from unlocked_stdio(3), such as getchar_unlocked(3). Here is an example:
static int parseint(void)
{
    int c, n;

    n = getchar_unlocked() - '0';
    while (isdigit(c = getchar_unlocked()))
        n = 10*n + c - '0';
    return n;
}
The above version doesn't check for errors. But it's guaranteed to terminate. If error handling is needed it might be enough to check feof(stdin) and ferror(stdin) at the end, or let the caller do it.
My original purpose was submitting solutions to programming problems at SPOJ, where the input is only whitespace and digits. So there is still room for improvement, namely the isdigit check.
static int parseint(void)
{
    int c, n;

    n = getchar_unlocked() - '0';
    while ((c = getchar_unlocked()) >= '0')
        n = 10*n + c - '0';
    return n;
}
It's very, very hard to beat this parsing routine, both performance-wise and in terms of convenience and maintainability.

You'll be able to improve significantly on your example by buffering - read a large number of characters into memory, and then parse them from the in-memory version.
If you're reading from disk you might get a performance increase by your buffer being a multiple of the block size.
Edit: You can let the kernel handle this for you by using mmap to map the file into memory.

Here's something I use.
char _;
/* (x<<3) + (x<<1) == 10*x; skips anything below '0', then accumulates digits */
#define scan(x) do { while ((x = getchar()) < '0'); for (x -= '0'; '0' <= (_ = getchar()); x = (x<<3) + (x<<1) + _ - '0'); } while (0)
However, this only works with non-negative integers.

From what you say, I derive the following facts:

- numbers are in the range 0-99, which accounts for 10 + 100 different strings (including leading zeros)
- you trust that your input stream adheres to some sort of specification and won't contain any unexpected character sequences
In that case, I'd use a lookup table to convert strings to numbers. Given a string s[2], the index into your lookup table can be computed as s[1]*10 + s[0], swapping the digits and making use of the fact that '\0' equals 0 in ASCII (so single-digit tokens land at indices 48-57, below all two-digit tokens).
Then, you can read your input in the following way:
// given our lookup method, this table may need padding entries
int lookup_table[] = { /*...*/ };

// no need to call superfluous functions
#define str2int(x) (lookup_table[(x)[1]*10 + (x)[0]])

while (read_token_from_stream(stdin, buf))
    next_int = str2int(buf);
On today's machines, it will be hard to come up with a faster technique. My guess is that this method will likely run 2 to 10 times faster than any scanf()-based approach.

Related

scanf inside function to return value (or other function)

so I was going to run a function in an infinite loop which takes a number input, but then I remembered I couldn't do

while (true) {
    myfunc(scanf("%d"));
}

because I need to put the scanf input into a variable. I can't do scanf("%*d") because that doesn't return the value at all. I don't want to have to do

int temp;
while (true) {
    scanf("%d", &temp);
    myfunc(temp);
}

or include more libraries. Is there any standard single function like gets? (I could do myfunc((int) strtol(gets(), (char**) NULL, 10)); but that's kind of messy.)
Sorry if I'm asking too much or being pedantic.
By the way, an unrelated question: is there any way to declare a string as an int, or even better, a single function for converting an int to a string? I usually use

// num is some number
char* str = (char*) malloc(12);
sprintf(str, "%d", num);
func(str);

but wouldn't func(str(num)); be easier?
For starters, the return value of scanf (and similar functions) is the number of conversions that took place. That return value is also used to signify if an error occurred.
In C you must manually manage these errors.
if ((retv = scanf("%d", &n)) != 1) {
    /* Something went wrong. */
}
What you seem to be looking for are conveniences found in higher-level languages. Languages & runtimes that can hide the details from you with garbage collection strategies, exception nets (try .. catch), etc. C is not that kind of language, as by today's standards it is quite a low-level language. If you want "non-messy" functions, you will have to build them up from scratch, but you will have to decide what kinds of tradeoffs you can live with.
For example, perhaps you want a simple function that just gets an int from the user. A tradeoff you could make is that it simply returns 0 on any error whatsoever, in exchange for never knowing if this was an error, or the user actually input 0.
int getint(void) {
    int n;
    if (scanf("%d", &n) != 1)
        return 0;
    return n;
}
This means that if a user makes a mistake on input, you have no way of retrying, and the program must simply roll on ahead.
This naive approach works even less well for your second question, because in C you must manage memory manually: it is up to you to free any memory you dynamically allocate.
You could certainly write a simple function like
char *itostr(int n) {
    char *r = malloc(12);
    if (r && sprintf(r, "%d", n) < 1) {
        r[0] = '0';
        r[1] = '\0';
    }
    return r;
}
which does the most minimal of error checking (Again, we don't know if "0" is an error, or a valid input).
The problem comes when you write something like func(itostr(51));, unless func is to be expected to free its argument (which would rule out passing non-dynamically allocated strings), you will constantly be leaking memory with this pattern.
So no there is no real "easy" way to do these things. You will have to get "messy" (handle errors, manage memory, etc.) if you want to build anything with complexity.

Time Complexity of a printf()?

I'd like to determine time complexity of a printf such as:
printf("%d", i);

Or:

printf("%c", array[i]);
Is it correct to assume that time complexity of a printf is always O(1) or not?
[EDIT] Let's take a function that swaps two values:
void swap(...)
{
    tmp = x;
    x = y;
    y = tmp;
}
Every assignment expression costs 1 (in terms of time complexity), so T(n) = 1 + 1 + 1 = 3 which means O(1). But what can I say about this function?
void swap(...)
{
    tmp = x;
    x = y;
    y = tmp;
    printf("Value of x: %d", x);
    printf("Value of y: %d", y);
}
Can I say that T(n) is still O(1) in this case?
I don't think this is really a sensible question to ask, because printf's behavior is mostly implementation-defined. C doesn't place any restrictions on what the system decides to do once it hits printf. It does have a notion of a stream. Section 7.21 of the C11 standard states that printf acts over a stream.
C lets the implementation do anything that it wants with streams after they're written to (7.21.2.2):
Characters may have to be added, altered, or deleted on input and output to conform to differing conventions for representing text in the host environment. Thus, there need not be a one-to-one correspondence between the characters in a stream and those in the external representation.
So your call to printf is allowed to write out 1 TB whenever a char is printed, and 1 byte whenever an int is printed.
The standard doesn't even require that the write happen when printf is actually called (7.21.3.3):
When a stream is unbuffered, characters are intended to appear from the source or at the destination as soon as possible. Otherwise characters may be accumulated and transmitted to or from the host environment as a block. When a stream is fully buffered, characters are intended to be transmitted to or from the host environment as a block when a buffer is filled... Support for these characteristics is implementation-defined.
And the standard doesn't specify whether stdout is buffered or unbuffered. So C allows printf to do pretty much whatever it feels like once you ask it for a write.
It is strange to try to evaluate the time complexity of printf(), as it's a blocking input/output operation that performs some text processing and then a write operation via a series of write() system calls through an intermediate buffering layer.
The best guess about the text-processing part is that the whole format string must be read and all arguments processed, so unless there's some black magic you can assume O(n) in the number of characters. You're usually not expected to feed the format argument of printf() dynamically, so its size is known, therefore finite, and therefore the complexity is indeed O(1).
On the other hand, the time complexity of a blocking output operation is not bounded. In blocking mode, all write() operations return either with an error or with at least one byte written. Assuming the system is ready to accept new data in a constant time, you're getting O(1) as well.
Any other transformations also occur linearly in the typically constant size of the format or result string, so with a number of assumptions, you can say it's O(1).
Also your code suggests that the output only occurs to actually test the functionality and shouldn't be considered part of the computation at all. The best way is to move the I/O out of the functions you want to consider for the purpose of complexity, e.g. to the main() function to stress that the input and output is there just for testing out the code.
Implementation of the O(1) swap function without I/O:
void swap(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}
Alternative implementation without a temporary variable (just for fun; note that it zeroes the value if a and b point to the same object):

void swap(int *a, int *b)
{
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}
Implementation of the main function:
int main(int argc, char **argv)
{
    int a = 3, b = 5;

    printf("a = %d; b = %d\n", a, b);
    swap(&a, &b);
    printf("a = %d; b = %d\n", a, b);
    return 0;
}
Generally, the complexity of printf() is O(N) with N being the number of characters that are output. And this amount is not necessarily a small constant, as in these two calls:
printf("%s", myString);
printf("%*d", width, num);
The length of myString does not necessarily have an upper bound, so complexity of the first call is O(strlen(myString)), and the second call will output width characters, which can be expected to take O(width) time.
However, in most cases the amount of output written by printf() will be bounded by a small constant: format strings are generally compile time constants and computed field widths as above are rarely used. String arguments are more frequent, but oftentimes allow giving an upper limit as well (like when you output a string from a list of error messages). So, I'd wager that at least 90% of the real world printf() calls are O(1).
Time complexity is not about how much wall-clock time a particular program needs; it describes how the number of elementary operations grows with the input size. For a simple printf() statement with a constant format string, the time complexity is O(1).

I want to filter a stream of numbers for values within a range

I have a critical section of code which examines each char in many strings to ensure it falls in an acceptable range.
Is there any way I can perform such filtering without branching?
...
int i, c;
int sl = strnlen(s, 1023);

for (i = 0; i < sl; i++) {
    c = s[i];
    if (c < 68 || c > 88)
        return E_INVALID;
}
if (0 == i)
    return E_INVALID;
... do something with s ...
I was thinking some kind of filtering using bitwise operations might be possible, but in practice I can't see how to make it work. Bitwise AND with 95 trims the range down to 0-31, 64-95. I can't see how to progress without introducing an if test, which would defeat the point of skipping the branch.
Assuming your strings are really unsigned chars, not ints, you could have a 256 byte lookup table of unacceptable characters, which would make your test if(table[s[i]]) { return E_INVALID; }
However, if you are trying to speed up a critical function, you should do other things for much bigger payoff. To start, you can skip the strnlen entirely, and terminate the loop on a 0 char. That alone will probably get you a factor of 2. Next unroll the loop by a factor of 10 or so, which ought to get another factor of 2.
A single bitwise AND cannot express an arbitrary range test, but you can collapse the two comparisons into one with the unsigned wrap-around trick:

if ((unsigned)(c - 68) > 88 - 68)
    return E_INVALID;

Subtracting the lower bound makes every value below 68 wrap around to a huge unsigned number, so a single comparison checks both ends of the range at once.
The order is important: subtract the lower bound and compare against the width of the range (88 - 68), or the behaviour is wrong.

Converting ASCII string to integer using bitwise operators in C & vice versa

I am able to do this without using bitwise operators as below
int AsciiToInteger()
{
    char s[] = "Stack Overflow";
    int i, n = 0;

    for (i = 0; s[i] != '\0'; i++)
    {
        n += s[i];
    }
    return n;
}
How can I achieve the same using bitwise operators in C without using for loop?
You can achieve the same without a for loop using recursion:
int AsciiToInteger(const char * Str)
{
    if (*Str)
        return (int)*Str + AsciiToInteger(Str + 1);
    else
        return 0;
}

/* ... */
int n = AsciiToInteger("Stack Overflow");
I don't know what bitwise operators have to do with this; you surely cannot use only them, without a loop and without recursion, for arbitrary-length strings (for fixed-length strings the result would probably amount to unrolling the loop).
... but now that I read the comments I'm quite sure I didn't get the sense of the question... :S
Except as an exercise in building higher level operations from bitwise operations, the task you're trying to accomplish is foolish. Don't do it.
As an exercise, the most important thing to realize is that you don't have to go back to the start every time you need to implement something new in terms of the building blocks. Instead you could write addition and subtraction functions in terms of bitwise building blocks, and put those together using the existing higher-level algorithm you've already got.
As for eliminating the loop, you could just unroll it to support a fixed maximum number of digits (the longest value that will fit in an int, for example) unless you need to support an arbitrary number of leading zeros. Recursion is a very bad approach in general and contrary to the whole "close to the metal" aspect of this exercise. Perhaps they just want you to avoid adding/incrementing a counter in the loop with "high level" addition, in which case you could use your bitwise adder function...
One of the main reasons that loops exist is so that you can do operations an unknown number of times. If you don't know how long your string is, you have no way of doing this without a loop. Even if you do know the length of the string, why would you want to do it without a loop?

Best way to convert whole file to lowercase in C

I was wondering if there's a really good (performant) way to convert a whole file to lowercase in C.
I use fgetc, convert the char to lowercase, and write it into another temp file with fputc. At the end I remove the original and rename the temp file to the original's name. But I think there must be a better solution.
This doesn't really answer the question (community wiki), but here's an (over?)-optimized function to convert text to lowercase:
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
int fast_lowercase(FILE *in, FILE *out)
{
    char buffer[65536];
    size_t readlen, wrotelen;
    char *p, *e;
    char conversion_table[256];
    int i;

    for (i = 0; i < 256; i++)
        conversion_table[i] = tolower(i);

    for (;;) {
        readlen = fread(buffer, 1, sizeof(buffer), in);
        if (readlen == 0) {
            if (ferror(in))
                return 1;
            assert(feof(in));
            return 0;
        }
        for (p = buffer, e = buffer + readlen; p < e; p++)
            *p = conversion_table[(unsigned char) *p];
        wrotelen = fwrite(buffer, 1, readlen, out);
        if (wrotelen != readlen)
            return 1;
    }
}
This isn't Unicode-aware, of course.
I benchmarked this on an Intel Core 2 T5500 (1.66GHz), using GCC 4.6.0 and i686 (32-bit) Linux. Some interesting observations:
It's about 75% as fast when buffer is allocated with malloc rather than on the stack.
It's about 65% as fast using a conditional rather than a conversion table.
I'd say you've hit the nail on the head. Temp file means that you don't delete the original until you're sure that you're done processing it which means upon error the original remains. I'd say that's the correct way of doing it.
As suggested by another answer (if file size permits) you can do a memory mapping of the file via the mmap function and have it readily available in memory (no real performance difference if the file is less than the size of a page as it's probably going to get read into memory once you do the first read anyway)
You can usually get a little bit faster on big inputs by using fread and fwrite to read and write big chunks of the input/output. Also, you should probably read a bigger chunk (the whole file if possible) into memory, convert it there, and then write it all out at once.
edit: I just remembered one more thing. Sometimes programs can be faster if you select a prime number (or at the very least not a power of 2) as the buffer size. I seem to recall this has to do with specifics of the caching mechanism.
If you're processing big files (big as in, say, multi-megabytes) and this operation is absolutely speed-critical, then it might make sense to go beyond what you've inquired about. One thing to consider in particular is that a character-by-character operation will perform less well than using SIMD instructions.
I.e. if you used SSE2, you could code a tolower_parallel like (pseudocode):
for (cur_parallel_word = begin_of_block;
     cur_parallel_word < end_of_block;
     cur_parallel_word += parallel_word_width) {
    /*
     * in SSE2, parallel compares are either about 'greater' or 'equal',
     * so '>=' and '<=' have to be constructed. This would use 'PCMPGTB'.
     * The 'ALL' macro is supposed to replicate a byte into all parallel bytes.
     */
    mask1 = parallel_compare_greater_than(*cur_parallel_word, ALL('A' - 1));
    mask2 = parallel_compare_greater_than(ALL('Z' + 1), *cur_parallel_word);
    /*
     * vector op - AND all bytes in the two vectors, 'PAND'
     */
    mask = mask1 & mask2;
    /*
     * vector op - add a vector of bytes. Would use 'PADDB'.
     */
    new = parallel_add(cur_parallel_word, ALL('a' - 'A'));
    /*
     * vector op - zero the bytes in the original vector that will be replaced
     */
    *cur_parallel_word &= ~mask;   /* that'd become 'PANDN' */
    /*
     * vector op - extract characters from 'new' that replace old, then OR in.
     */
    *cur_parallel_word |= (new & mask);   /* PAND / POR */
}
I.e. you'd use parallel comparisons to check which bytes are uppercase, and then mask both the original value and the lowercased version (one with the mask, the other with its inverse) before you OR them together to form the result.
If you use mmap'ed file access, this could even be performed in-place, saving on the bounce buffer, and saving on many function and/or system calls.
There is a lot to optimize when your starting point is a character-by-character 'fgetc' / 'fputc' loop; even shell utilities are highly likely to perform better than that.
But I agree that if your need is very special-purpose (i.e. something as clear-cut as ASCII input to be converted to uppercase) then a handcrafted loop as above, using vector instruction sets (like SSE intrinsics/assembly, or ARM NEON, or PPC Altivec), is likely to make a significant speedup possible over existing general-purpose utilities.
Well, you can definitely speed this up a lot if you know what the character encoding is. Since you're using Linux and C, I'm going to go out on a limb here and assume that you're using ASCII.
In ASCII, we know A-Z and a-z are contiguous and always 32 apart. So what we can do is skip the safety checks and locale checks of the tolower() function and do something like this:
(pseudo code)

foreach (int) char c in the file:
    c += 32   // 'a' == 'A' + 32, so this converts to lowercase

Or, if the file may contain both upper- and lowercase letters, add a check like

if (c > 64 && c < 91) // the uppercase ASCII range

and only then do the addition before writing the character out to the file.
Also, batch writes are faster, so I would suggest first writing to an array, then all at once writing the contents of the array to the file.
This should be considerably faster.
