How would you implement the pilloried function in the Daily WTF? - c

The Daily WTF for 2008-11-28 pillories the following code:
static char *nice_num(long n)
{
int neg = 0, d = 3;
char *buffer = prtbuf;
int bufsize = 20;
if (n < 0)
{
neg = 1;
n = -n;
}
buffer += bufsize;
*--buffer = '\0';
do
{
*--buffer = '0' + (n % 10);
n /= 10;
if (--d == 0)
{
d = 3;
*--buffer = ',';
}
}
while (n);
if (*buffer == ',') ++buffer;
if (neg) *--buffer = '-';
return buffer;
}
How would you write it?

If you're a seasoned C programmer, you'll realize this code isn't actually that bad. It's relatively straightforward (for C), and it's blazingly fast. It has three problems:
It fails on the edge case of LONG_MIN (-2,147,483,648 with 32-bit longs), since negating that value overflows (formally undefined behavior) and in practice yields LONG_MIN again on two's complement machines
It assumes 32-bit integers - for 64-bit longs, a 20-byte buffer is not big enough
It's not thread-safe - it uses a global static buffer, so multiple threads calling it at the same time will result in a race condition
Problem #1 is easily solved with a special case. To address #2, I'd separate the code into two functions, one for 32-bit integers and one for 64-bit integers. #3 is a little harder - we have to change the interface to make it completely thread-safe.
Here is my solution, based on this code but modified to address these problems:
static int nice_num(char *buffer, size_t len, int32_t n)
{
int neg = 0, d = 3;
char buf[16];
size_t bufsize = sizeof(buf);
char *pbuf = buf + bufsize;
if(n < 0)
{
if(n == INT32_MIN)
{
strncpy(buffer, "-2,147,483,648", len);
return len <= 14;
}
neg = 1;
n = -n;
}
*--pbuf = '\0';
do
{
*--pbuf = '0' + (n % 10);
n /= 10;
if(--d == 0)
{
d = 3;
*--pbuf = ',';
}
}
while(n > 0);
if(*pbuf == ',') ++pbuf;
if(neg) *--pbuf = '-';
strncpy(buffer, pbuf, len);
return len <= strlen(pbuf);
}
Explanation: it creates a local buffer on the stack and then fills that in in the same method as the initial code. Then, it copies it into a parameter passed into the function, making sure not to overflow the buffer. It also has a special case for INT32_MIN. The return value is 0 if the original buffer was large enough, or 1 if the buffer was too small and the resulting string was truncated.
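The 64-bit function mentioned above is not shown in this answer. A minimal sketch of what it could look like follows; the name nice_num64 and the 32-byte scratch buffer are assumptions of this sketch. It takes the magnitude in unsigned arithmetic rather than special-casing the minimum value, and always leaves the caller's buffer terminated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 64-bit counterpart of nice_num(); the name is an assumption.
 * Returns 0 on success, 1 if the caller's buffer was too small. */
static int nice_num64(char *buffer, size_t len, int64_t n)
{
    /* 19 digits + 6 commas + sign + terminator = 27 bytes at most. */
    char buf[32];
    char *pbuf = buf + sizeof(buf);
    int d = 3;
    /* 0 - (uint64_t)n is well defined even for INT64_MIN, so the
     * minimum value needs no special case. */
    uint64_t u = (n < 0) ? (uint64_t)0 - (uint64_t)n : (uint64_t)n;

    assert(buffer != NULL);
    *--pbuf = '\0';
    do {
        *--pbuf = (char)('0' + (int)(u % 10));
        u /= 10;
        if (--d == 0 && u != 0) {     /* add a comma only if digits remain */
            d = 3;
            *--pbuf = ',';
        }
    } while (u != 0);
    if (n < 0)
        *--pbuf = '-';

    size_t need = strlen(pbuf) + 1;   /* including the terminator */
    if (len == 0)
        return 1;
    if (need > len) {
        memcpy(buffer, pbuf, len - 1);
        buffer[len - 1] = '\0';       /* truncated, but still terminated */
        return 1;
    }
    memcpy(buffer, pbuf, need);
    return 0;
}
```

Calling nice_num64(out, sizeof out, 1234567) with a 32-byte out leaves "1,234,567" in it and returns 0. Unlike strncpy, the memcpy path never leaves the destination unterminated when the text is truncated.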

Hmm... I guess I shouldn't admit this, but my int to string routine for an embedded system works in pretty much exactly the same way (but without putting in the commas).
It's not particularly straightforward, but I wouldn't call it a WTF if you're working on a system that you can't use snprintf() on.
The guy who wrote the above probably noted that the printf() family of routines can't do comma grouping, so he came up with his own.
Footnote: there are some libraries where the printf() style formatting does support grouping, but they are not standard. And I know that the posted code doesn't support other locales that group using '.'. But that's hardly a WTF, just a bug possibly.

That's probably pretty close to the way I would write it, actually. The only thing I can immediately see that is wrong with the solution is that it doesn't work for LONG_MIN on machines where LONG_MIN is -(LONG_MAX + 1), which is most machines nowadays. I might use localeconv to get the thousands separator instead of assuming a comma, and I might calculate the buffer size more carefully, but the algorithm and implementation seem pretty straightforward to me, not really much of a WTF for C (there are much better solutions for C++).
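For what it's worth, the localeconv idea could look roughly like this. The helper name thousands_separator and the fallback to "," are assumptions of this sketch; localeconv() and the thousands_sep field are standard C.

```c
#include <assert.h>
#include <locale.h>

/* Returns the current locale's thousands separator, or "," when the locale
 * does not define one (as in the default "C" locale). The fallback is an
 * assumption made for this sketch, not something the original code did. */
static const char *thousands_separator(void)
{
    const struct lconv *lc = localeconv();
    assert(lc != NULL);
    if (lc->thousands_sep != NULL && lc->thousands_sep[0] != '\0')
        return lc->thousands_sep;
    return ",";
}
```

A program would normally call setlocale(LC_NUMERIC, "") once at startup so that the user's locale is actually picked up. The separator is a string, not a single char, so it may be more than one byte long, and the grouping field of struct lconv describes group sizes that are not always 3.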

Lisp:
(defun pretty-number (x) (format t "~:D" x))
I'm surprised how easily I could do this. I'm not even past the first chapter in my Lisp book. xD (Or should I say, ~:D)

size_t
signed_as_text_grouped_on_powers_of_1000(char *s, ssize_t max, int n)
{
if (max <= 0)
return 0;
size_t r=0;
bool more_groups = n/1000 != 0;
if (more_groups)
{
r = signed_as_text_grouped_on_powers_of_1000(s, max, n/1000);
r += snprintf(s+r, max-r, ",");
n = abs(n%1000);
r += snprintf(s+r, max-r, "%03d",n);
} else
r += snprintf(s+r, max-r, "% 3d", n);
return r;
}
Unfortunately, this is about 10x slower than the original.

In pure C:
#include <stdio.h>
#include <limits.h>
static char *prettyNumber(long num, int base, char separator)
{
#define bufferSize (sizeof(long) * CHAR_BIT)
static char buffer[bufferSize + 1];
unsigned int pos = 0;
/* We're walking backwards because numbers are right to left. */
char *p = buffer + bufferSize;
*p = '\0';
int negative = num < 0;
do
{
char digit = num % base;
digit += '0';
*(--p) = digit;
++pos;
num /= base;
/* This the last of a digit group? */
if(pos % 3 == 0)
{
/* TODO Make this a user setting. */
#ifndef IM_AMERICAN
# define IM_AMERICAN_BOOL 0
#else
# define IM_AMERICAN_BOOL 1
#endif
/* Handle special thousands case. */
if(!IM_AMERICAN_BOOL && pos == 3 && num < base)
{
/* DO NOTHING */
}
else
{
*(--p) = separator;
}
}
} while(num);
if(negative)
*(--p) = '-';
return p;
#undef bufferSize
}
int main(int argc, char **argv)
{
while(argc > 1)
{
long num = 0;
if(sscanf(argv[1], "%ld", &num) == 1)
{
printf("%ld = %s\n", num, prettyNumber(num, 10, ' '));
}
/* Always advance, so an unparsable argument cannot loop forever. */
--argc;
++argv;
};
return 0;
}
Normally I'd return an alloc'd buffer, which would need to be free'd by the user. This addition is trivial.

I got bored and made this naive implementation in Perl. Works.
sub pretify {
my $num = $_[0];
my $numstring = sprintf( "%f", $num );
# Split into whole/decimal
my ( $whole, $decimal ) = ( $numstring =~ /(^\d*)(.\d+)?/ );
my @chunks;
my $output = '';
# Pad whole into multiples of 3
$whole = q{ } x ( 3 - ( length $whole ) % 3 ) . $whole;
# Create an array of all 3 parts.
@chunks = $whole =~ /(.{3})/g;
# Reassemble with commas
$output = join ',', @chunks;
if ($decimal) {
$output .= $decimal;
}
# Strip Padding ( and spurious commas )
$output =~ s/^[ ,]+//;
# Strip excess tailing zeros
$output =~ s/0+$//;
# Ending with . is ugly
$output =~ s/\.$//;
return $output;
}
print "\n", pretify 100000000000000000000000000.0000;
print "\n", pretify 10_202_030.45;
print "\n", pretify 10_101;
print "\n", pretify 0;
print "\n", pretify 0.1;
print "\n", pretify 0.0001;
print "\n";

Related

Adding 2 binary strings

I'm passing almost all LeetCode tests with this, but I don't understand why the output is wrong ("/0") when the input is:
a = "10100000100100110110010000010101111011011001101110111111111101000000101111001110001111100001101"
b = "110101001011101110001111100110001010100001101011101010000011011011001011101111001100000011011110011"
Does anyone have an idea what is not working?
Thanks
#include <stdio.h>
#include <stdlib.h>
char * sumBinary(long int binary1, long int binary2, char * result);
char * addBinary(char * a, char * b)
{
char * result;
long int a_int;
long int b_int;
a_int = atoi(a);
b_int = atoi(b);
result = malloc(sizeof(*result) * 1000);
if (!result)
return (NULL);
sumBinary(a_int, b_int, result);
return (result);
}
char * sumBinary(long int binary1, long int binary2, char * result)
{
int i;
int t;
int rem;
int sum[1000];
i = 0;
t = 0;
rem = 0;
if ((binary1 == 0) && (binary2 == 0))
{
result[0] = '0';
result[1] = '\0';
}
else
{
while (binary1 != 0 || binary2 != 0)
{
sum[i++] = (binary1 %10 + binary2 % 10 + rem) % 2;
rem = (binary1 %10 + binary2 % 10 + rem) / 2;
binary1 = binary1 / 10;
binary2 = binary2 / 10;
}
if (rem != 0)
sum[i++] = rem;
--i;
while (i >= 0)
{
result[t] = sum[i] + '0';
t++;
i--;
}
result[t] = '\0';
}
return (result);
}
For a start, you should be using atol(3), not atoi(3) if you're using long int. But that's not the main issue here.
atol(3) and atoi(3) expect strings containing decimal numbers, not binary, so that's not going to work well for you. You would need strtol(3), which you can tell to expect a string in ASCII binary. But again, this is not the main issue.
You don't give the question text, but I'm guessing they want you to add two arbitrarily-long ASCII-binary strings, resulting in an ASCII-binary string.
I imagine their expectation, given it's arbitrarily-long, is that you would be working entirely in the string domain. So you'd allocate for a string whose length is two greater than the longer of the two you get as parameters (+1 for the terminal NUL, the other +1 for a potential overflow digit).
Then you start from the end, working back to the start, adding the corresponding digits of the parameter strings, placing the results into the result string starting from its end (allowing for that terminal NUL), adding as if you were doing it by hand.
Don't forget to add a leading zero to the result string, if you don't overflow into that position.
Note that I'm not going to write the code for you. This is either a learning exercise or a test: either way, you need to do the coding so you can learn from it.
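For readers who want something to compare their own attempt against afterwards, here is one possible sketch of the steps described above. The function name add_binary_strings is made up for illustration, and instead of leaving a '0' in the unused overflow position it drops that position, which is an assumption about what the grader expects.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Adds two strings of '0'/'1' characters and returns a newly allocated
 * string that the caller must free(), or NULL if allocation fails.
 * No input validation is done; both arguments are assumed non-empty. */
static char *add_binary_strings(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    size_t la = strlen(a), lb = strlen(b);
    size_t n = (la > lb ? la : lb) + 1;   /* +1 for a possible carry digit */
    char *result = malloc(n + 1);         /* +1 for the terminating NUL */
    if (result == NULL)
        return NULL;
    result[n] = '\0';

    int carry = 0;
    for (size_t k = 0; k < n; k++) {
        /* k counts digits from the right-hand end of each string. */
        int sum = carry;
        if (k < la) sum += a[la - 1 - k] - '0';
        if (k < lb) sum += b[lb - 1 - k] - '0';
        result[n - 1 - k] = (char)('0' + (sum & 1));
        carry = sum >> 1;
    }

    /* Drop the leading zero if the carry position was not needed,
     * but keep a lone "0". */
    if (result[0] == '0' && n > 1)
        memmove(result, result + 1, n);   /* n bytes includes the NUL */
    return result;
}
```

Because everything stays in the string domain, the length of the inputs is limited only by available memory, which is why the 95-digit and 99-digit inputs from the question are no problem here while any conversion to long int overflows.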

How to output numbers with write() (only #include <unistd.h> allowed) [duplicate]

Is it possible to convert an integer to a string in C without sprintf?
There's a nonstandard function:
char *string = itoa(numberToConvert, 10); // assuming you want a base-10 representation
Edit: it seems you want some algorithm to do this. Here's how in base-10:
#include <stdio.h>
int main(void) {
    int anInteger = -13765; // or whatever
    int negative = anInteger < 0;
    // Work on the magnitude as an unsigned value: 0u - (unsigned)x is well
    // defined even for INT_MIN, so no special case is needed.
    unsigned int magnitude = negative ? 0u - (unsigned int)anInteger
                                      : (unsigned int)anInteger;
    char str[128] = { 0 }; // large enough for an int even on 64-bit
    int i = 126;
    do { // do-while, so that 0 still produces "0"
        str[i--] = (char)('0' + magnitude % 10);
        magnitude /= 10;
    } while (magnitude != 0);
    if (negative) str[i--] = '-';
    printf("The number was: %s\n", str + i + 1);
    return 0;
}
Here's an example of how it might work. Given a buffer and a size, we'll keep dividing by 10 and fill the buffer with digits. We'll return -1 if there is not enough space in the buffer.
int
integer_to_string(char *buf, size_t bufsize, int n)
{
char *start;
// Handle negative numbers.
//
if (n < 0)
{
if (!bufsize)
return -1;
*buf++ = '-';
bufsize--;
}
// Remember the start of the string... This will come into play
// at the end.
//
start = buf;
do
{
// Handle the current digit.
//
int digit;
if (!bufsize)
return -1;
digit = n % 10;
if (digit < 0)
digit *= -1;
*buf++ = digit + '0';
bufsize--;
n /= 10;
} while (n);
// Terminate the string.
//
if (!bufsize)
return -1;
*buf = 0;
// We wrote the string backwards, i.e. with least significant digits first.
// Now reverse the string.
//
--buf;
while (start < buf)
{
char a = *start;
*start = *buf;
*buf = a;
++start;
--buf;
}
return 0;
}
Unfortunately, none of the answers above really works cleanly in a situation where you need to concoct a string of alphanumeric characters. There are some really weird cases I've seen, especially in interviews and at work.
The only bad part of the code is that you need to know the bounds of the integer so you can allocate "string" properly.
In spite of C being hailed as predictable, it can behave strangely in a large system if you get lost in the coding.
The solution below returns a string of the integer representation with a null terminating character. This does not rely on any outer functions and works on negative integers as well!!
#include <stdio.h>
#include <stdlib.h>
void IntegertoString(char * string, int number) {
if(number == 0) { string[0] = '0'; string[1] = '\0'; return; }
int divide = 0;
int modResult;
int length = 0;
int isNegative = 0;
int copyOfNumber;
int offset = 0;
copyOfNumber = number;
if( number < 0 ) {
isNegative = 1;
number = 0 - number;
length++;
}
while(copyOfNumber != 0)
{
length++;
copyOfNumber /= 10;
}
for(divide = 0; divide < length; divide++) {
modResult = number % 10;
number = number / 10;
string[length - (divide + 1)] = modResult + '0';
}
if(isNegative) {
string[0] = '-';
}
string[length] = '\0';
}
int main(void) {
char string[10];
int number = -131230;
IntegertoString(string, number);
printf("%s\n", string);
return 0;
}
You can use itoa where available. If it is not available on your platform, the following implementation may be of interest:
https://web.archive.org/web/20130722203238/https://www.student.cs.uwaterloo.ca/~cs350/common/os161-src-html/atoi_8c-source.html
Usage:
char *numberAsString = itoa(integerValue);
UPDATE
Based on R..'s comments, it may be worth modifying an existing itoa implementation to accept a result buffer from the caller, rather than having itoa allocate and return a buffer.
Such an implementation should accept both a buffer and the length of the buffer, taking care not to write past the end of the caller-provided buffer.
int i = 24344; /*integer*/
char *str = itoa(i);
/*allocates required memory and
then converts integer to string and the address of first byte of memory is returned to str pointer.*/
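Coming back to the question in the title, printing a number with nothing but write() might be sketched like this. The names format_int and put_int are made up for illustration. The assert is only a debugging aid and can be deleted, together with its header, if strictly nothing but <unistd.h> may be included.

```c
#include <assert.h>
#include <unistd.h>

/* Writes the decimal digits of n backwards, ending just before `end`, and
 * returns a pointer to the first character. The caller must provide at least
 * 11 bytes before `end` for a 32-bit int (sign plus 10 digits). */
static char *format_int(char *end, int n)
{
    assert(end != NULL);
    /* Unsigned magnitude, so that INT_MIN needs no special case. */
    unsigned int u = (n < 0) ? 0u - (unsigned int)n : (unsigned int)n;
    char *p = end;
    do {
        *--p = (char)('0' + (int)(u % 10));
        u /= 10;
    } while (u != 0);
    if (n < 0)
        *--p = '-';
    return p;
}

/* Prints n followed by a newline to file descriptor fd using only write().
 * Returns the number of bytes written, or -1 on error. */
static long put_int(int fd, int n)
{
    char buf[24];                     /* comfortably larger than needed */
    char *end = buf + sizeof buf;
    *--end = '\n';
    char *start = format_int(end, n);
    return (long)write(fd, start, (size_t)(end + 1 - start));
}
```

Filling the buffer from the back means no reversal step is needed, and since the buffer lives on the stack there is nothing for the caller to free. put_int(1, 42) writes "42" and a newline to standard output.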

Converting negative decimal to two's complement in C

I'm writing a program that, assuming the input is always a valid negative decimal integer, returns the two's complement binary representation (16 bit).
My logic here is that I take the input from the command line, convert it to binary with a simple conversion, and store the bits in the initialized binary array. Then I take the one's complement (just change 0's to 1's and vice versa) and put that in the onesCom array. The part where I add 1 to get the two's complement is where I think the issue is, but I'm struggling to find it. I am performing binary addition starting at the least significant bit.
When converting from one's complement to two's complement, i.e. adding 1, your loop should start from the LSB, not from the MSB.
Therefore,
for (j=15; j>=0; j--) { // <-- Error Here
if (onesCom[j] == 1 && carryOver == 1) {
twosCom[j] = 0;
} else if (onesCom[j] == 0 && carryOver == 1) {
twosCom[j] = 1;
carryOver = 0;
} else {
twosCom[j] = onesCom[j];
}
}
Should be replaced by:
for (j=0; j<=15; j++) {
if (onesCom[j] == 1 && carryOver == 1) {
twosCom[j] = 0;
} else if (onesCom[j] == 0 && carryOver == 1) {
twosCom[j] = 1;
carryOver = 0;
} else {
twosCom[j] = onesCom[j];
}
}
In your code, you calculate the one's complement and then derive the two's complement from it. Note that if you don't need the one's complement, it is easier to calculate the two's complement directly, like this:
#include <stdlib.h>
#include <stdio.h>
int main(int argc, char *argv[]) {
int binary[16] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
if (argc == 1) return 1;
int decimal = atoi(argv[1]);
int counter = 0;
if (decimal >= -32768 && decimal < 0) { /* -32768 is a valid 16-bit value too */
decimal = 65536 + decimal;
while(decimal > 0) {
binary[counter] = decimal%2;
decimal = decimal/2;
counter++;
}
for (int length = 15; length >=0; length--) {
printf("%d", binary[length]);
}
printf ("\n");
}
return 0;
}
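Another way to get the same bits, sketched here under the assumption that a 16-bit result is wanted, is to let the conversion to an unsigned type do the work and then read the bits off with shifts. The function name to_twos_complement16 is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Fills out[0..15] with '0'/'1' (most significant bit first) and out[16]
 * with a terminator. out must have room for 17 bytes. */
static void to_twos_complement16(int value, char out[17])
{
    assert(value >= -32768 && value <= 32767);
    /* Converting a negative value to an unsigned type is defined to wrap
     * modulo 2^16, which is exactly the two's complement bit pattern. */
    uint16_t bits = (uint16_t)value;
    for (int i = 0; i < 16; i++)
        out[i] = ((bits >> (15 - i)) & 1u) ? '1' : '0';
    out[16] = '\0';
}
```

This avoids both the intermediate one's complement array and the carry loop. The conversion rule is part of the C standard, so it does not depend on how the machine stores negative numbers.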
As your snippet is completely blurred, I can only suggest two approaches to the problem:
The first assumes you are doing two's complement arithmetic all the time, in which case each digit must be added with its sign.
The second parses only unsigned values and remembers the sign, applying it at the end.
Probably both approaches will lead to almost the same efficiency and be compiled into very similar code. I have no preference for any of them.
int decode(char *str, int base)
{
int result = 0,
c,
neg = FALSE;
/* skip whitespace, delete this if you don't
* want to cope with whitespace */
for (; isspace(c = *str); str++) {
continue;
}
if (*str == '-') {
neg = TRUE; /* negative */
str++; /* skip it */
}
/* the next characters might be all digits */
for (; isdigit(c = *str); str++) {
/* multiply by the base */
result *= base;
/* add positive for positives and
* subtract it for negatives */
int d = c - '0'; /* convert c to the digit value */
/* negative if number is negative */
if (neg) d = -d;
/* and add/subtract it */
result = result + d;
}
/* :) got it!! */
return result;
}
and the second approach is:
int decode(char *str, int base)
{
int result = 0,
c,
neg = FALSE;
/* skip whitespace, delete this if you don't
* want to cope with whitespace */
for (; isspace(c = *str); str++) {
continue;
}
if (*str == '-') {
neg = TRUE; /* negative */
str++; /* skip it */
}
/* the next characters might be all digits */
for (; isdigit(c = *str); str++) {
/* multiply by the base */
result *= base;
/* add positive for positives and
* subtract it for negatives */
int d = c - '0'; /* convert c to the digit value */
/* and add/subtract it */
result = result + d;
}
/* :) got it!! */
return neg ? -result : result;
}
Can you see the differences? (hint, I have eliminated one line in the loop and changed one line at the end :) )
If you want to run this code in a full, complete and verifiable example, there's one below, just put one of the above functions in place of the other, and run it.
#include <stdio.h>
#include <ctype.h>
/* these macros are for easy printing, and outputting the file, line and
* function name where the trace is being made */
#define F(_f) __FILE__":%d:%s:"_f, __LINE__, __func__
#define P(_f, ...) printf(F(_f), ##__VA_ARGS__)
/* I use these for portability, as <stdbool.h> is not always available */
#define FALSE (0)
#define TRUE (!FALSE)
int decode(char *str, int base)
{
/* substitute here the body of the function above you want to test */
}
int main()
{
static char *tests[] = {
"0", "-1", "-210", "-211", "-222", "1",
"210", "211", "222", "5400",
/* add more testing cases to your wish */
NULL,
};
int i, passed = 0;
for (i = 0; tests[i]; i++) {
char *test = tests[i];
int expected, actual;
P("Testing '%s' conversion\n", test);
/* expected, decoded with system routines */
if (sscanf(test, "%i", &expected) != 1) {
P("problem scanning %s\n", test);
continue;
}
/* actual, decoded with our function */
actual = decode(test, 10);
char *operator = actual == expected ? "==" : "!=";
P("Test result: actual(%i) %s expected(%i)\n",
actual, operator, expected);
if (actual == expected)
passed++;
}
P("passed %d/%d tests\n", passed, i);
}
Edit
The following code will allow you to easily convert your value to binary:
#define CHK(_n) ((_n) <= sz)
char *to_binary(int p_val, char *buf, size_t sz)
{
if (!CHK(2)) return NULL; /* need at least two bytes of buffer space */
buf += sz; /* we start from the end, backwards to avoid having to use
* one bit masks moving all the time around */
*--buf = '\0'; /* this is the last '\0' that should end the string */
sz--; /* update buffer size */
/* we operate better with unsigned, as the
* sign doesn't get involved in shifts (we are reinterpreting
* the sign bit as a normal bit, which makes the assumption that
* integers are stored in two's complement. This is essentially
* nonportable code, but it will work in the stated assumptions. */
unsigned val = (unsigned) p_val;
/* the first below is the second char we check
* above */
do {
*--buf = val & 1 ? '1' : '0';
sz--;
val >>= 1;
} while (CHK(1) && val);
return buf; /* return what we have */
}
And the final main() code looks like this:
int main()
{
static char *tests[] = {
"0", "-1", "-210", "-211", "-222", "1",
"210", "211", "222", "5400",
NULL,
};
int i, passed = 0;
for (i = 0; tests[i]; i++) {
char *test = tests[i];
int expected, actual;
P("Testing '%s' conversion\n", test);
/* expected, decoded with system routines */
if (sscanf(test, "%i", &expected) != 1) {
P("problem scanning %s\n", test);
continue;
}
/* actual, decoded with our function */
actual = decode(test, 10);
char *operator = actual == expected ? "==" : "!=";
char buff[100]; /* temporary variable to hold the
* converted value to binary */
P("Test result: actual(%i/0b%s)\n",
actual,
to_binary(actual, buff, sizeof buff));
P(" %s expected(%i/0b%s)\n",
operator,
expected,
to_binary(expected, buff, sizeof buff));
if (actual == expected)
passed++;
}
P("passed %d/%d tests\n", passed, i);
}

Multiplying large numbers through strings

I'm trying to write a program that will receive 2 strings representing numbers of any length
(for instance, char *a = "10000000000000";, char *b = "9999999999999999";) and multiply them.
This is what I came up with so far, not sure how to continue (nullify simply fills the whole string with '0'):
char *multiply(char *hnum, const char *other)
{
int num1=0, num2=0, carry=0, hnumL=0, otherL=0, i=0, temp1L=0, temp2L=0, n=0;
char *temp1, *temp2;
if(!hnum || !other) return NULL;
for(hnumL=0; hnum[hnumL] != '\0'; hnumL++);
for(otherL=0; other[otherL] != '\0'; otherL++);
temp1 = (char*)malloc(otherL+hnumL);
if(!temp1) return NULL;
temp2 = (char*)malloc(otherL+hnumL);
if(!temp2) return NULL;
nullify(temp1);
nullify(temp2);
hnumL--;
otherL--;
for(otherL; otherL >= 0; otherL--)
{
carry = 0;
num1 = other[otherL] - '0';
for(hnumL; hnumL >= 0; hnumL--)
{
num2 = hnum[hnumL] - '0';
temp1[i+n] = (char)(((int)'0') + ((num1 * num2 + carry) % 10));
carry = (num1 * num2 + carry) / 10;
i++;
temp1L++;
}
if(carry > 0)
{
temp1[i+n] = (char)(((int)'0') + carry);
temp1L++;
}
p.s. Is there a library that handles this already? Couldn't find anything like it.
On paper, you would probably do as follows:
  999 x 99
----------
      8991
     8991
==========
     98901
The process is to multiply individual digits starting from the right of each number and add them up, keeping a carry in mind each time ("9 times 9 equals 81, write 1, keep 8 in mind"). I'm pretty sure you covered that in elementary school, didn't you?
The process can be easily put into an algorithm:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
struct result
{
int carry;
int res;
};
/*
* multiply two numbers between 0 and 9 into result.res. If there is a carry, put it into
* result.carry
*/
struct result mul(int a, int b)
{
struct result res;
res.res = a * b;
if (res.res > 9)
{
res.carry = res.res / 10;
res.res %= 10;
}
else
res.carry = 0;
return res;
}
/*
* add
* adds a digit (b) to str at pos. If the result generates a carry,
* it's added also (recursively)
*/
void add(char str[], int pos, int b)
{
int res;
int carry;
res = str[pos] - '0' + b;
if (res > 9)
{
carry = res / 10;
res %= 10;
add(str, pos - 1, carry);
}
str[pos] = res + '0';
}
void nullify(char *numstr, int len)
{
while (--len >= 0)
numstr[len] = '0';
}
int main(void)
{
struct result res;
char *mp1 = "999";
char *mp2 = "999";
char sum[strlen(mp1) + strlen(mp2) + 1];
int i;
int j;
nullify(sum, strlen(mp1) + strlen(mp2));
sum[strlen(mp1) + strlen(mp2)] = '\0'; /* terminate the string before printing it */
for (i = strlen(mp2) - 1; i >= 0; i--)
{
/* iterate from right over second multiplicand */
for (j = strlen(mp1) - 1; j >= 0; j--)
{
/* iterate from right over first multiplicand */
res = mul((mp2[i] - '0'), (mp1[j] - '0'));
add(sum, i + j + 1, res.res); /* add sum */
add(sum, i + j, res.carry); /* add carry */
}
}
printf("%s * %s = %s\n", mp1, mp2, sum);
return 0;
}
This is just the same as on paper, except that you don't need to remember individual summands since we add up everything on the fly.
This might not be the fastest way to do it, but it doesn't need malloc() (provided you have a C99 compiler, otherwise you would need to dynamically allocate sum) and works for arbitrarily long numbers (up to the stack limit since add() is implemented as recursive function).
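If the stack limit is a concern, the same schoolbook method can be sketched with heap allocation and plain loops instead of a variable-length array and recursion. The function name multiply_decimal_strings is an assumption of this sketch, and the inputs are assumed to be non-empty strings of digits.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Multiplies two non-negative decimal strings and returns a newly allocated
 * result that the caller must free(), or NULL if allocation fails.
 * Inputs are assumed to contain only the characters '0'..'9'. */
static char *multiply_decimal_strings(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    size_t la = strlen(a), lb = strlen(b);
    assert(la > 0 && lb > 0);
    size_t n = la + lb;                 /* a product never needs more digits */
    int *acc = calloc(n, sizeof *acc);  /* acc[k] holds the digit at position k */
    char *out = malloc(n + 1);
    if (acc == NULL || out == NULL) {
        free(acc);
        free(out);
        return NULL;
    }

    /* Schoolbook multiplication, carrying as we go so each cell stays 0..9. */
    for (size_t i = la; i-- > 0; ) {
        int carry = 0;
        for (size_t j = lb; j-- > 0; ) {
            int t = acc[i + j + 1] + (a[i] - '0') * (b[j] - '0') + carry;
            acc[i + j + 1] = t % 10;
            carry = t / 10;
        }
        acc[i] += carry;                /* leftmost cell touched by this row */
    }

    /* Skip leading zeros, but keep at least one digit. */
    size_t start = 0;
    while (start + 1 < n && acc[start] == 0)
        start++;
    size_t len = n - start;
    for (size_t k = 0; k < len; k++)
        out[k] = (char)('0' + acc[start + k]);
    out[len] = '\0';
    free(acc);
    return out;
}
```

With this, multiply_decimal_strings("999", "999") returns "998001". It is still O(n^2) in the number of digits, so it only changes where the memory comes from, not how fast the method is.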
Yes, there are libraries that handle this. It's actually a pretty big subject area that a lot of research has gone into. I haven't looked through your code that closely, but I know that the library implementations of bignum operations use very efficient algorithms that you're unlikely to discover on your own. For example, the multiplication routine we all learned in grade school (pre-Common Core) is an O(n^2) solution to multiplication, but there are ways to solve it in roughly O(n^1.58), such as Karatsuba's algorithm.
The standard GNU C bignum library is GNU MP:
https://gmplib.org/

Does anyone know how to convert a very large char array to float with better performance than atof/strtod/sscanf?

I have a huge char array, char p[n], read from a text file like this:
//1.txt
194.919 -241.808 234.896
195.569 -246.179 234.482
194.919 -241.808 234.896
...
foo(char *p, float x, float y, float z)
{
}
I tried atof and strtod, but they are really time-consuming when the array is huge, because they call strlen(). sscanf is also very slow.
I stepped through the code in the debugger and found that both atof() and strtod() call strlen() in Visual Studio; you can check the CRT source.
strtod() call:
answer = _fltin2( &answerstruct, ptr, (int)strlen(ptr), 0, 0, _loc_update.GetLocaleT());
atof() call:
return( *(double *)&(_fltin2( &fltstruct, nptr, (int)strlen(nptr), 0, 0, _loc_update.GetLocaleT())->dval) );
I also tried strtok, but the data read from 1.txt must not be modified.
So does anyone know the best way to convert all of these to float x, y, z?
Visual studio 2008 + WIN7
If you can make additional assumptions about the format of the floating point values, parsing them yourself might increase performance.
Example code for parsing ' ' or '\n'-separated values without exponents and no input validation:
float parsef(const char **str)
{
const char *cc = *str;
_Bool neg = (*cc == '-');
if(neg) ++cc;
float value = 0, e = 1;
for(; *cc != '.'; ++cc)
{
if(*cc == ' ' || *cc == '\n' || !*cc)
{
*str = cc;
return neg ? -value : value;
}
value *= 10;
value += *cc - '0';
}
for(++cc;; ++cc)
{
if(*cc == ' ' || *cc == '\n' || !*cc)
{
*str = cc;
return neg ? -value : value;
}
e /= 10;
value += (*cc - '0') * e;
}
}
Example code:
const char *str = "42 -15.4\n23.001";
do printf("%f\n", parsef(&str));
while(*str++);
Okay, how about doing the tokenization yourself and then calling strtod.
What I'm thinking is something like this:
char *current = ...; // initialized to the head of your character array
while (*current != '\0')
{
char buffer[64];
unsigned int idx = 0;
// copy over current number
while (*current != '\0' && !isspace(*current))
{
buffer[idx++] = *current++;
}
buffer[idx] = '\0';
// move forward to next number
while (*current != '\0' && isspace(*current))
{
current++;
}
// use strtod to convert buffer
}
One issue with this is that the tokenization is very simple. It will work for the format you posted, but if the format varies (say, another line uses : to separate the numbers), it won't work.
Another issue is that the code assumes all numbers have fewer than 64 characters. If they are longer, you'll get a buffer overflow.
Also, the copying to a temporary buffer will add some overhead (but hopefully less than the overhead of constantly doing a strlen on the entire buffer). I know you said you can't change the original buffer, but can you make a temporary change (i.e. the buffer can change as long as you return it to its original state before you return)?
char *current = ...; // initialized to the head of your character array
while (*current != '\0')
{
char *next_sep = current;
while (*next_sep != '\0' && !isspace(*next_sep))
{
next_sep++;
}
// save the separator before overwriting it
char tmp = *next_sep;
*next_sep = '\0';
// use strtod on current
// Restore the separator.
*next_sep = tmp;
current = next_sep;
// move forward to next number
while (*current != '\0' && isspace(*current))
{
current++;
}
}
This technique means no copying and no worries about buffer overflow. You do need to temporarily modify the buffer; hopefully that is acceptable.
Check out this code.
It can be further optimized if there's no need to support scientific representation, '+' sign, or leading tabs.
It doesn't use strlen, or any other standard library string routine.
// convert floating-point value in string representation to its numerical value
// return false if NaN
// F is float/double
// T is char or wchar_t
// '1234.567' -> 1234.567
template <class F, class T> inline bool StrToDouble(const T* pczSrc, F& f)
{
f= 0;
if (!pczSrc)
return false;
while ((32 == *pczSrc) || (9 == *pczSrc))
pczSrc++;
bool bNegative= (_T('-') == *pczSrc);
if ( (_T('-') == *pczSrc) || (_T('+') == *pczSrc) )
pczSrc++;
if ( (*pczSrc < _T('0')) || (*pczSrc > _T('9')) )
return false;
// todo: return false if number of digits is too large
while ( (*pczSrc >= _T('0')) && (*pczSrc<=_T('9')) )
{
f= f*10. + (*pczSrc-_T('0'));
pczSrc++;
}
if (_T('.') == *pczSrc)
{
pczSrc++;
double e= 0.;
double g= 1.;
while ( (*pczSrc >= _T('0')) && (*pczSrc<=_T('9')) )
{
e= e*10. + (*pczSrc-_T('0'));
g= g*10. ;
pczSrc++;
}
f+= e/g;
}
if ( (_T('e') == *pczSrc) || (_T('E') == *pczSrc) ) // exponent, such in 7.32e-2
{
pczSrc++;
bool bNegativeExp= (_T('-') == *pczSrc);
if ( (_T('-') == *pczSrc) || (_T('+') == *pczSrc) )
pczSrc++;
int nExp= 0;
while ( (*pczSrc >= _T('0')) && (*pczSrc <= _T('9')) )
{
nExp= nExp*10 + (*pczSrc-_T('0'));
pczSrc++;
}
if (bNegativeExp)
nExp= -nExp;
// todo: return false if exponent / number of digits of exponent is too large
f*= pow(10., nExp);
}
if (bNegative)
f= -f;
return true;
}
As long as you are not using a particularly bad standard library (unlikely these days; they are all good), it's not possible to do it faster than atof.
I don't see any reason why strtod() should call strlen(). Of course it might, but nothing in its specification requires it, and I'd be surprised if it did. And I'd say that strtod() is about as fast as you'll get, short of writing some FPU- or processor-specific code yourself.
Why do you think atof, strtod use strlen? I've never implemented them, but I can't imagine why they'd need to know the length of the input string. It would be of no value to them. I'd use strtod as per Jason's answer. That's what it's for.
And yes, if you have a very large amount of text, it's going to take some time to convert. That's just the way it is.
Use strtod. It almost certainly does not call strlen. Why would it need to know the length of the input? It merely runs past leading whitespace, then consumes as many characters as possible that make sense for a floating point literal, and then returns a pointer just past that. Perhaps you're using it non-optimally? Here's a sample of how to use strtod:
#include <stdio.h>
#include <stdlib.h>
int main() {
char *p = "1.txt 194.919 -241.808 234.896 195.569 -246.179 234.482 194.919 -241.808 234.896";
char *end = p;
char *q;
double d;
while(*end++ != ' '); // move past "1.txt"
do {
q = end;
d = strtod(q, &end);
printf("%g\n", d);
} while(*end != '\0');
}
This outputs:
194.919
-241.808
234.896
195.569
-246.179
234.482
194.919
-241.808
234.896
on my machine.
As others have said, I don't think you're going to do much better than the standard library calls. They have been around for a long time and are quite highly optimized (well, they should be, at least in good implementations).
That said, there are some things that aren't clear to me. Are you reading the whole file into memory and then converting the array to another array? If so, you might want to check that the system you are running on has enough memory to do that without swapping. If you are doing this, would it be possible to just convert one line at a time as you read them off disk instead of storing them?
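Converting one line at a time could be sketched as follows. The names parse_xyz and convert_file are made up for illustration, and lines are assumed to be shorter than 256 bytes with three numbers each, as in the sample from the question.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Parses three whitespace-separated numbers from one line.
 * Returns 1 on success, 0 if fewer than three numbers were found. */
static int parse_xyz(const char *line, float *x, float *y, float *z)
{
    assert(line != NULL && x != NULL && y != NULL && z != NULL);
    float *out[3] = { x, y, z };
    const char *p = line;
    for (int i = 0; i < 3; i++) {
        char *end;
        double v = strtod(p, &end);
        if (end == p)          /* no conversion happened */
            return 0;
        *out[i] = (float)v;
        p = end;               /* continue right after the parsed number */
    }
    return 1;
}

/* Reads `path` line by line; returns the number of lines parsed, or -1
 * if the file could not be opened. */
static long convert_file(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    char line[256];
    long count = 0;
    float x, y, z;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (parse_xyz(line, &x, &y, &z))
            count++;           /* a real program would store or use x, y, z here */
    }
    fclose(fp);
    return count;
}
```

Because each call only ever sees one short, terminated line, even a strtod() that does call strlen() internally would scan a few dozen bytes at a time rather than the rest of a huge buffer.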
You could consider multithreading your program: one thread to read and buffer lines off disk, and n threads to process the lines. Dr. Dobb's Journal published a great single-reader/single-writer lockless queue implementation you could use. I've used this in a similar app. My worker threads each have an input queue, and the reader thread reads data off disk and places it into these queues in round-robin style.
How about something like:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
static float frac[] =
{
0.000,
0.001,
0.002,
... // fill in
0.997,
0.998,
0.999,
};
static float exp[] =
{
1e-38,
1e-37,
1e-36,
... // fill in
1e+36,
1e+37,
1e+38,
};
float cvt(char* p)
{
char* d = strchr(p, '.'); // Find the decimal point.
char* e = strchr(p, 'e'); // Find the exponent.
if (e == NULL)
e = strchr(p, 'E');
float num = atoi(p);
if (num > 0) {
num += frac[atoi(d + 1)];
} else {
num -= frac[atoi(d + 1)];
}
if (e)
num *= exp[atoi(e)];
return num;
}
int main()
{
char line[100];
while(fgets(line, sizeof line, stdin)) { // gets() is unsafe and was removed in C11
printf("in %s, out %g\n", line, cvt(line));
}
}
Should be good to three significant digits.
Edit: watch out for big mantissas.
Edit again: and negative exponents. :-(
I doubt if strlen is costing you much.
If you can take advantage of your numbers falling in a relatively restricted range, then what I suggest is to parse it yourself, doing as little computation as possible, such as:
#define DIGIT(c) ((c)>='0' && (c)<='9')
BOOL parseNum(char* *p0, float *f){
char* p = *p0;
int n = 0, frac = 1;
BOOL bNeg = FALSE;
while(*p == ' ') p++;
if (*p == '-'){p++; bNeg = TRUE;}
if (!(DIGIT(*p) || *p=='.')) return FALSE;
while(DIGIT(*p)){
n = n * 10 + (*p++ - '0');
}
if (*p == '.'){
p++;
while(DIGIT(*p)){
n = n * 10 + (*p++ - '0');
frac *= 10;
}
}
*f = (float)n/(float)frac;
if (bNeg) *f = -*f;
*p0 = p;
return TRUE;
}

Resources