Detect and prevent overflow for unsigned long in C - c

I have some lines in my code to check whether the resulting value overflows (by comparing it to its previous iteration), and therefore whether the input value is too large. This works for some values, but not for values whose increment is so large that it not only overflows, but overflows so much that the resulting value is larger than the previous iteration. For example, it triggers for 18446744073709551616 (MAX_ULONG + 1), but not for 184467440737095516150 (MAX_ULONG * 10). How can I address this issue? The code is as follows:
unsigned long result = 0;
unsigned long overflowCheck = 0;
int power = 0;
for (int i = (strlen(input) - 1); i >= 0; i--) {
if ((input[i] > ('0' - 1)) && (input[i] < ('9' + 1))) {
result += (input[i] - '0') * (unsigned long)pow(iBase, power++);
} else {
printf("Invalid input string.");
valid = 0;
return -1;
}
if (result < overflowCheck) {
printf("Input value too large.");
valid = 0;
return -1;
}
overflowCheck = result;
}
return result;

There are multiple problems in your code:
you should not use pow to perform integer arithmetic: type double may have fewer value bits than type unsigned long (for example, on 64-bit Linux, double has 53 value bits and unsigned long has 64). It is simpler to multiply the current value by iBase and add the digit value for each new digit parsed.
it is easier to detect potential overflow before multiplying or adding the values.
Here is a modified version:
#include <errno.h>
#include <limits.h>
unsigned long atoul(const char *input, unsigned int iBase) {
if (iBase < 2 || iBase > 36) {
errno = EINVAL;
return 0;
}
unsigned long result = 0;
unsigned long maxval = ULONG_MAX / iBase;
int maxdigit = ULONG_MAX % iBase;
for (;;) {
int c = *input++;
int digit;
if (c >= '0' && c <= '9') {
digit = c - '0';
} else
if (c >= 'A' && c <= 'Z') {
digit = c - 'A' + 10;
} else
if (c >= 'a' && c <= 'z') {
digit = c - 'a' + 10;
} else {
break;
}
if (digit >= iBase)
break;
if (result > maxval || (result == maxval && digit > maxdigit)) {
/* overflow detected */
errno = ERANGE;
return ULONG_MAX;
}
result = result * iBase + digit;
}
return result;
}
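To make the point about double's 53 value bits concrete, here is a minimal sketch that round-trips a 64-bit value through double. The helper name survives_double is made up for illustration, uint64_t is used instead of unsigned long so the numbers are fixed, and an IEEE-754 double is assumed.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if v survives a round trip through double.
   Assumes IEEE-754 double (53-bit significand). */
static int survives_double(uint64_t v) {
    double d = (double)v;               /* may round to a nearby representable value */
    if (d >= 18446744073709551616.0)    /* 2^64: converting back would be out of range */
        return 0;
    return (uint64_t)d == v;
}
```

Under these assumptions 2^53 survives the round trip but 2^53 + 1 does not, which is why large digit weights computed with pow can silently lose low-order bits.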

Suppose you want to check if x + y overflows where x and y are both unsigned long. The naive approach would be to do this:
if (ULONG_MAX < x + y)
But this will always be false because of overflow. Instead, you would do this:
if (ULONG_MAX - x < y)
This check is algebraically the same as the first attempt but avoids issues of overflow. You can do a similar check in your case:
if ((input[i] > ('0' - 1)) && (input[i] < ('9' + 1))) {
int digit = input[i] - '0';
if (ULONG_MAX / 10 < result) {
printf("overflow");
return -1;
}
result *= 10;
if (ULONG_MAX - digit < result) {
printf("overflow");
return -1;
}
result += digit;
} else {
printf("Invalid input string.");
valid = 0;
return -1;
}
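The two checks above can be packaged as small helpers. This is only a sketch: the names add_would_overflow, mul_would_overflow and append_digit are illustrative and not part of the original code, and base 10 is assumed as in the snippet above.

```c
#include <limits.h>
#include <stdbool.h>

/* true if x + y would exceed ULONG_MAX */
static bool add_would_overflow(unsigned long x, unsigned long y) {
    return y > ULONG_MAX - x;
}

/* true if x * y would exceed ULONG_MAX */
static bool mul_would_overflow(unsigned long x, unsigned long y) {
    return y != 0 && x > ULONG_MAX / y;
}

/* Appends one decimal digit (0..9) to *result.
   Returns false and leaves *result unchanged if the value would not fit. */
static bool append_digit(unsigned long *result, unsigned digit) {
    if (mul_would_overflow(*result, 10))
        return false;
    unsigned long shifted = *result * 10;
    if (add_would_overflow(shifted, digit))
        return false;
    *result = shifted + digit;
    return true;
}
```

A parsing loop would call append_digit once per character, starting from the most significant digit, and report "Input value too large." as soon as it returns false.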

result < 0 will always return false since result is unsigned (and can never be less than 0). One way to check for overflow is to make sure pow() (as a double) is within the bounds for long. However, the real solution here is to not use pow() and keep everything as integers. If you work starting with the most significant digit, you can multiply result by the base (16 in this case) and add the new digit each time. This works because 1234 = base*(base*(base*(0 + 1) + 2) + 3) + 4.
Some (incomplete) code would be:
int input_len = strlen(input);
for (int i = 0; i < input_len; i++) {
// After finding out which digit group input[i] is in:
result = result * iBase + (input[i] - '0');
}
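Comparing with the previous result after the fact is not a reliable overflow check: multiplying by the base can wrap around to a value that is still larger than the previous one, which is exactly the case the question describes. Check before adding in the next digit instead:
unsigned long previous = result;
// digit is the value of the next character; check before adding it in
if (previous > (ULONG_MAX - digit) / iBase) {
// Overflow
}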
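Whether a comparison with the previous value notices a wrapped multiplication can be checked with a few lines. The helper below is illustrative only (its name is made up), and uint64_t is used so the numbers are the same on every platform.

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative only: does the "result < previous" test notice that
   previous * base wrapped around? */
static bool wrap_is_noticed(uint64_t previous, uint64_t base) {
    uint64_t result = previous * base;   /* unsigned arithmetic wraps modulo 2^64 */
    return result < previous;
}
```

With previous = 0x3000000000000000, multiplying by 16 wraps to 0 and is noticed, but multiplying by 10 wraps to 0xE000000000000000, which is larger than previous and slips through. A test made before the multiplication does not have this blind spot.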

Related

Looking for the absolute FASTEST way to write integers as individual digits - chars to a file in C - including microoptimizations

I'm working on a program in C where the main objective is absolute speed - it's a code performance competition. There are more ways to speed up the program, however, the largest speedup potential is in I/O operations, specifically, saving to text file. The file is structured as follows: 3 integers of arbitrary digit count per line, separated by whitespaces. The integers are known beforehand, they just need to be converted to a string and written to the output buffer.
The integers only range from -1 to INT_MAX.
The buffer size varies (I set it) based on the data being written but most of the time, the written file size is in orders of 100s of megabytes to something over a gigabyte and buffer is between 4 and 8 MB. The main write loop is this:
int fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
const size_t w_bufsize = get_bufsize(param);
void *buf = NULL;
posix_memalign(&buf, sysconf(_SC_PAGE_SIZE), w_bufsize);
posix_fadvise(fd, 0, 0, POSIX_FADV_NOREUSE);
size_t num_written = 0;
size_t f_idx = 0;
for (int i = 0; i < num_ints; ++i) {
myStruct *str = main_struct->structs + i;
f_idx = fast_write_3_ints(buf, str->int1, str->int2, str->int3, f_idx);
if (f_idx + BYTES_PER_ROW > w_bufsize) {
if (write(fd, buf, f_idx) != (ssize_t)f_idx) { /* error handling abbreviated */ }
if (num_written)
posix_fadvise(fd, (num_written - 1) * w_bufsize, w_bufsize,
POSIX_FADV_DONTNEED);
f_idx = 0;
++num_written;
}
}
(Return value checking and frees/closes abbreviated for readability)
For converting the integers to text, I use this method:
https://kenny-peng.com/2021/05/28/printing_integers_fast.html
I further improved it by bypassing the temporary buffer and memcpy-ing the characters directly to the output buffer (10-15 % perf increase on my machine).
Here is an abbreviated (where possible) version of my code:
size_t fast_write_3_ints(char *out_buf, int num1, int num2, int num3,
size_t idx)
{
char *temp_ptr = NULL;
int n_digits = 0;
if (num1 < 0) {
out_buf[idx++] = '-';
num1 = -num1;
}
if (num1 < 10) {
out_buf[idx++] = num1 + '0';
} else {
idx += count_digits(num1);
temp_ptr = out_buf + idx;
for (; num1 >= 1000; num1 /= 1000) {
temp_ptr -= 3;
lookup_digits(temp_ptr, num1 % 1000, 3);
}
if (num1) {
num1 %= 1000;
n_digits = count_digits(num1);
lookup_digits(temp_ptr - n_digits, num1, n_digits);
}
}
out_buf[idx++] = ' ';
// write int 2 and 3 - abbreviated
out_buf[idx++] = '\n';
return idx;
}
static void lookup_digits(char *arr, int num, char write_size)
{
static const char table[3000] __attribute__((aligned(64))) =
"000001002003004005006007008009"
"010011012013014015016017018019"
"020021022023024025026027028029"
"030031032033034035036037038039"
"040041042043044045046047048049"
"050051052053054055056057058059"
"060061062063064065066067068069"
"070071072073074075076077078079"
"080081082083084085086087088089"
"090091092093094095096097098099"
"100101102103104105106107108109"
"110111112113114115116117118119"
"120121122123124125126127128129"
"130131132133134135136137138139"
"140141142143144145146147148149"
"150151152153154155156157158159"
"160161162163164165166167168169"
"170171172173174175176177178179"
"180181182183184185186187188189"
"190191192193194195196197198199"
"200201202203204205206207208209"
"210211212213214215216217218219"
"220221222223224225226227228229"
"230231232233234235236237238239"
"240241242243244245246247248249"
"250251252253254255256257258259"
"260261262263264265266267268269"
"270271272273274275276277278279"
"280281282283284285286287288289"
"290291292293294295296297298299"
"300301302303304305306307308309"
"310311312313314315316317318319"
"320321322323324325326327328329"
"330331332333334335336337338339"
"340341342343344345346347348349"
"350351352353354355356357358359"
"360361362363364365366367368369"
"370371372373374375376377378379"
"380381382383384385386387388389"
"390391392393394395396397398399"
"400401402403404405406407408409"
"410411412413414415416417418419"
"420421422423424425426427428429"
"430431432433434435436437438439"
"440441442443444445446447448449"
"450451452453454455456457458459"
"460461462463464465466467468469"
"470471472473474475476477478479"
"480481482483484485486487488489"
"490491492493494495496497498499"
"500501502503504505506507508509"
"510511512513514515516517518519"
"520521522523524525526527528529"
"530531532533534535536537538539"
"540541542543544545546547548549"
"550551552553554555556557558559"
"560561562563564565566567568569"
"570571572573574575576577578579"
"580581582583584585586587588589"
"590591592593594595596597598599"
"600601602603604605606607608609"
"610611612613614615616617618619"
"620621622623624625626627628629"
"630631632633634635636637638639"
"640641642643644645646647648649"
"650651652653654655656657658659"
"660661662663664665666667668669"
"670671672673674675676677678679"
"680681682683684685686687688689"
"690691692693694695696697698699"
"700701702703704705706707708709"
"710711712713714715716717718719"
"720721722723724725726727728729"
"730731732733734735736737738739"
"740741742743744745746747748749"
"750751752753754755756757758759"
"760761762763764765766767768769"
"770771772773774775776777778779"
"780781782783784785786787788789"
"790791792793794795796797798799"
"800801802803804805806807808809"
"810811812813814815816817818819"
"820821822823824825826827828829"
"830831832833834835836837838839"
"840841842843844845846847848849"
"850851852853854855856857858859"
"860861862863864865866867868869"
"870871872873874875876877878879"
"880881882883884885886887888889"
"890891892893894895896897898899"
"900901902903904905906907908909"
"910911912913914915916917918919"
"920921922923924925926927928929"
"930931932933934935936937938939"
"940941942943944945946947948949"
"950951952953954955956957958959"
"960961962963964965966967968969"
"970971972973974975976977978979"
"980981982983984985986987988989"
"990991992993994995996997998999";
memcpy(arr, table + 3 * num + 3 - write_size, write_size);
}
static int count_digits(int num)
{
if (num < 100000)
if (num < 1000)
if (num < 100)
if (num < 10)
return 1;
else
return 2;
else
return 3;
else if (num < 10000)
return 4;
else
return 5;
else if (num < 10000000)
if (num < 1000000)
return 6;
else
return 7;
else if (num < 100000000)
return 8;
else if (num < 1000000000)
return 9;
else
return 10;
}
This is the main production code right now. Below I describe what alternatives I tried and how it turned out.
I also have to note that my computer is a 14" Macbook Pro with the M1 Pro chip and very fast SSD, which makes IO operations totally negligible compared to the main computation. However, the evaluation server/machine is of very different specs (likely), and there, saving the file is by far the slowest bit. I also noted that some changes made it perform better on my machine but worse on the actual evaluator (likely cache size/memory speed dependent).
I also tried implementing lookup-free int-to-string processing as described here:
https://johnnylee-sde.github.io/Fast-unsigned-integer-to-string/
this did not improve performance by more than run-to-run variance on my machine.
I also tried extending the table to the 4*10000 numbers, but it improved performance on my machine by only 3-5 % and actually made it a little worse in the evaluation system (likely a lot slower CPU/memory).
Is there anything else I can optimize for? I am running out of ideas. The historically fastest version of the code saves to the file 18 % faster than my implementation.
A thread solving the exact same problem, but with different functions that are (in my eyes) slower and perform a lot more ops: The fastest way to save graph to file in C
Or should I attempt to integrate the single large buffer routine into my algorithm and write in st_blksize sized buffers instead?
Thanks so much for any help or suggestions
EDIT: Function that determines output buffer size (consider param to be the amount of lines to be written)
size_t get_bufsize(int param)
{
size_t bufsize = 4096;
if (param >= 1000 && param < 10000)
bufsize <<= 4;
else if (param >= 10000 && param < 100000)
bufsize <<= 6;
else if (param >= 100000 && param < 1000000)
bufsize <<= 8;
else if (param >= 1000000 && param <= 5000000)
bufsize <<= 10;
else if (param > 5000000)
bufsize <<= 11;
// printf("Buffer size: %zu\n", bufsize);
return bufsize;
}
EDIT 2:
The integers only range from -1 to INT_MAX.
Here are some directions to try to improve your code's efficiency:
if running on a legacy system, you should specify O_BINARY to ensure the write system call does not perform some system-specific conversion.
when flushing the buffer to disk, you should try to write only a whole number of pages and shift the remaining chunk to the beginning of the buffer. Allocating a decent number of 4K pages plus some slack and writing whole 4K pages is a better approach than allocating a huge number of pages and issuing partial writes.
Your function fast_write_3_ints has a redundant statement num1 %= 1000; as well as a redundant if (num1) test. It can be further simplified to improve speed on small values:
size_t fast_write_3_ints(char *out_buf, int num1, int num2, int num3,
size_t idx)
{
char *temp_ptr;
int n_digits;
if (num1 < 0) {
out_buf[idx++] = '-';
num1 = -num1;
}
if (num1 < 1000) {
if (num1 < 10) {
out_buf[idx++] = num1 + '0';
} else {
n_digits = 2 + (num1 >= 100);
lookup_digits(out_buf + idx, num1, n_digits);
idx += n_digits;
}
} else {
n_digits = count_digits(num1);
idx += n_digits;
temp_ptr = out_buf + idx;
while (n_digits > 3) {
int digits = num1 % 1000;
num1 /= 1000; // group division and modulo
temp_ptr -= 3;
lookup_digits(temp_ptr, digits, 3);
n_digits -= 3;
}
lookup_digits(temp_ptr - n_digits, num1, n_digits);
}
out_buf[idx++] = ' ';
// write int 2 and 3 - abbreviated
out_buf[idx++] = '\n';
return idx;
}
using branchless code for count_digits might get you some speed gains:
static int count_digits(int num) {
return 1 + (num > 9) + (num > 99) + (num > 999) +
(num > 9999) + (num > 99999) + (num > 999999) +
(num > 9999999) + (num > 99999999) + (num > 999999999);
}
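The suggestion to write only whole pages and shift the remainder to the front of the buffer could look roughly like this. It is a sketch, not a drop-in replacement: flush_whole_pages is a hypothetical name, page_size is assumed to be a power of two such as 4096, and interrupted writes (EINTR) are not retried.

```c
#include <string.h>
#include <unistd.h>

/* Writes the whole pages contained in buf[0..used) to fd and moves the
   leftover partial page to the start of buf.
   Returns the number of bytes left in buf, or (size_t)-1 on a write error.
   Assumes page_size is a power of two. */
static size_t flush_whole_pages(int fd, char *buf, size_t used, size_t page_size) {
    size_t whole = used & ~(page_size - 1);   /* largest multiple of page_size <= used */
    size_t done = 0;
    while (done < whole) {                    /* write() may write fewer bytes than asked */
        ssize_t n = write(fd, buf + done, whole - done);
        if (n < 0)
            return (size_t)-1;
        done += (size_t)n;
    }
    memmove(buf, buf + whole, used - whole);  /* shift the tail to the front */
    return used - whole;
}
```

In the write loop from the question, f_idx would then be set to the return value instead of 0, and the buffer needs at least one row of slack beyond the last whole page.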
int vs. int_fast32_t
Rather than int, consider int_fast32_t, as it may be a 64-bit type that is faster.
Avoid interval tests with 2 values
A simplified if tree may give a small improvement.
Also, favor testing large values first, as they are more likely to match.
uint_fast32_t get_bufsize(int param) {
#define BLOCK ((uint_fast32_t) 4096)
if (param >= 5000000) {
return BLOCK << 11;
}
if (param >= 1000000) {
return BLOCK << 10;
}
if (param >= 100000) {
return BLOCK << 8;
}
if (param >= 10000) {
return BLOCK << 6;
}
if (param >= 1000) {
return BLOCK << 4;
}
return BLOCK;
}
unsigned vs. int
I have never encountered int being faster than unsigned, yet using unsigned has some potential for faster code. Something to try. After the if (num1 < 0) test, the code could move to unsigned math and maybe see a marginal improvement.
I doubt any of these will dramatically improve performance, yet they may nudge the code toward being faster.
If you're trying to optimise to avoid unnecessarily executing code AND the only negative value is -1, change:
if (num1 < 0) {
out_buf[idx++] = '-';
num1 = -num1;
}
if (num1 < 10) {
out_buf[idx++] = num1 + '0';
} else {
to
if (num1 < 10) {
if (num1 < 0) num1 = 1, out_buf[idx++] = '-';
out_buf[idx++] = num1 + '0';
} else {
Further, it seems you try to handle the residual 1, 2 or 3 digits in some special case. This is unnecessary.
The example code below "borrows" the branchless function from @chqrlie. It also computes double/triple digits instead of indexing into a LUT. Think about that LUT... Slice off the first 100 values into a second "two digit" function, trim the leading zeros, and stop performing arcane calculations on pointers and counts. (I'm not suggesting you use these functions. Too much arithmetic happening. You could use two distinct conversion functions... or not.) Finally, this example only deals with positive numbers and only translates one.
void lookup_2_digits( char *p, int n ) { // Use a LUT... I didn't for this example
p[1] = (char)(n % 10 + '0'); n /= 10;
p[0] = (char)(n + '0');
}
void lookup_3_digits( char *p, int n ) { // Use a LUT... I didn't for this example
p[2] = (char)(n % 10 + '0'); n /= 10;
p[1] = (char)(n % 10 + '0'); n /= 10;
p[0] = (char)(n + '0');
}
int count_digits(int n) {
return 1+ (n > 9) + (n > 99) + (n > 999)
+ (n > 9999) + (n > 99999) + (n > 999999)
+ (n > 9999999) + (n > 99999999) + (n > 999999999);
}
void doit( int num1 ) {
char out_buf[512] = {0};
int idx = 0;
idx += count_digits( num1 );
char *temp_ptr = out_buf + idx;
do {
if( num1 <= 99 ) {
if( num1 <= 9 )
/* Can deal with -1 here */
*--temp_ptr = num1 + '0';
else
lookup_2_digits( temp_ptr-2, num1 );
num1 = 0;
} else {
lookup_3_digits( temp_ptr -= 3, num1 % 1000 );
num1 /= 1000;
}
} while( num1 > 0 );
puts( out_buf );
}
int main( void ) {
doit( 2165536 );
return 0;
}
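The "Use a LUT" comments above could be realised for the two-digit case along these lines. This is a sketch under assumptions: the table is filled at start-up by a made-up init_pairs function to keep the example short, whereas a hand-written string literal like the one in the question would avoid the initialisation step.

```c
#include <string.h>

/* 100 two-character pairs, "00" .. "99", stored back to back. */
static char pairs[200];

/* Fills the table once; must be called before lookup_2_digits. */
static void init_pairs(void) {
    for (int i = 0; i < 100; i++) {
        pairs[2 * i]     = (char)('0' + i / 10);
        pairs[2 * i + 1] = (char)('0' + i % 10);
    }
}

/* Writes exactly two characters for n in 0..99 (no terminator). */
static void lookup_2_digits(char *p, int n) {
    memcpy(p, pairs + 2 * n, 2);
}
```

Whether this beats the arithmetic version depends on the machine; as noted in the question, table size and cache behaviour can tip the result either way, so it is worth measuring on the target system.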

When inserting exponents (n) above 10, this function gives the wrong values

Just a quick issue I have in C. In this case x is always 10, and n varies. The function works fine when n < 11, but as soon as it goes above 11 the function churns out random values which are sometimes negative too. Any tips?
int myPow(int x,int n)
{
int i; /* Variable used in loop counter */
long number = 1;
for (i = 0; i < n; i++)
number *= x;
return(number);
}
Typically code like this is undefined behavior (UB) due to int overflow.
Yet OP is using long in the multiplication, so it is UB once the product exceeds LONG_MAX.
When INT_MAX and LONG_MAX are 2,147,483,647 (2^31 - 1), the code suffers UB with myPow(10,10).
When LONG_MAX is 9,223,372,036,854,775,807 (2^63 - 1), the code suffers UB with myPow(10,19), and myPow(10,10) has implementation-defined behavior when the large long is converted to int on return.
Code could use wider integer types to cope with larger values, but integer math only goes so far with an exponential function.
Some dusty old sample code that considers all int values and uses exponentiation by squaring with protection against signed integer overflow:
#include <stdbool.h>
#include <stdint.h>
#define DIV0 INTMAX_MIN
// Likely needs work ...
static bool safe_mul(intmax_t a, intmax_t b, intmax_t *product) {
if (a > 0) {
if (b > 0) {
if (a > INTMAX_MAX / b)
return true; // a positive, b positive
} else if (b < INTMAX_MIN / a)
return true; // a positive, b not positive
} else if (b > 0) {
if (a < INTMAX_MIN / b)
return true; // a not positive, b positive
} else if (a != 0 && b < INTMAX_MAX / a)
return true; // a not positive, b not positive
*product = a * b;
return false;
}
intmax_t pow_i(int base, int expo) {
if (expo < 0) {
if (base == 1)
return 1;
if (base == -1)
return (expo % 2) ? -1 : 1;
if (base == 0)
return DIV0; //
return 0; // round toward 0
}
intmax_t wide_base = base;
intmax_t pow = 1;
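while (expo > 0) {
if (expo % 2) {
if (safe_mul(pow, wide_base, &pow))
return 0;
}
expo /= 2;
// Square only if another bit remains; otherwise a needless overflow would report failure
if (expo > 0 && safe_mul(wide_base, wide_base, &wide_base))
return 0;
}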
return pow;
}
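For the case in the question (a positive base and a non-negative exponent), the same protection reduces to one comparison before each multiplication. The sketch below uses int64_t so the limits are fixed; the name pow_checked and its signature are made up for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

/* Computes base^expo for base > 0 and expo >= 0.
   Returns false, leaving *out untouched, if the result does not fit in int64_t. */
static bool pow_checked(int64_t base, int expo, int64_t *out) {
    int64_t result = 1;
    for (int i = 0; i < expo; i++) {
        if (result > INT64_MAX / base)   /* result * base would not fit */
            return false;
        result *= base;
    }
    *out = result;
    return true;
}
```

With a 64-bit type, 10^18 is the largest power of ten that fits; a caller that needs an int result would additionally compare the value against INT_MAX before narrowing it.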

Print an integer digit by digit

I have a function print_number.
The function checks if in front of the number there exists '-', then it reverses the number and takes every digit and prints it. The algorithm works pretty well, but if I give it -2.147.483.648 (which should be the bottom limit of an integer) it prints -0 and I don't know why.
#include<stdio.h>
void print_char(char character)
{
printf("%c",character);
}
void print_number(int nr)
{
int reverse=0;
if (nr < 0)
{
print_char('-');
nr *= -1;
}
while(nr > 9)
{
reverse = reverse * 10 + nr % 10;
nr = nr / 10;
}
print_char(nr + '0');
while(reverse)
{
print_char(reverse % 10 + '0');
reverse = reverse / 10;
}
}
When you are doing
if (nr < 0)
{
print_char('-');
nr *= -1;
}
This negates a negative number to make it positive.
If you run it for -2147483648, the mathematical result would be
nr = 2.147.483.648 // == binary 1 0000000000000000000000000000000
int is a 32-bit type on your architecture (and at least 16 bits by the spec), so this value does not fit: the '1' ends up in the sign bit. On typical gcc-like implementations nr therefore stays at -2147483648, the while (nr > 9) loop is skipped, and print_char(nr + '0') prints '0' because only the low byte of the sum survives the conversion to char.
According to the ISO 9899 spec, signed int overflow is undefined behaviour, so the result cannot be predicted in general.
Use a long long value if you need your program to handle larger values.
Something like:
#include<stdio.h>
void print_char(char character)
{
printf("%c",character);
}
void print_number(long long nr)
{
long long reverse=0;
if (nr < 0)
{
print_char('-');
nr *= -1;
}
while(nr > 9)
{
reverse = reverse * 10 + nr % 10;
nr = nr / 10;
}
print_char(nr + '0');
while(reverse)
{
print_char(reverse % 10 + '0');
reverse = reverse / 10;
}
}
int main(void){
print_number(-2147483648LL);
}
And test:
> gcc test.c
> ./a.out
-2147483648
Firstly, the MIN and MAX values for an int are -2,147,483,648 and 2,147,483,647 respectively.
Negating -2,147,483,648 would give the positive value 2,147,483,648, which overflows by 1 as it is above the MAX value.
In practice this operation usually results in the same value of -2,147,483,648.
Secondly, you might encounter an overflow during the integer reversing process.
For example, reversing 2147483647 causes an overflow after the intermediate result of 746384741.
Therefore, you should handle that by reporting an error or returning 0.
Thirdly, your loop for reversing the number is inaccurate. It should loop with while(nr != 0).
Here's the complete code.
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <limits.h>
int main()
{
void reverseNumber(int);
reverseNumber(124249732);
return 0;
}
void reverseNumber(int nr)
{
printf("nr = %d\n", nr);
int reverse = 0;
bool neg = false;
if (nr < 0) {
neg = true;
nr *= -1;
}
while (nr != 0) {
int digit = nr % 10;
// Check before computing reverse * 10 + digit: signed overflow is undefined behavior
if (reverse > (INT_MAX - digit) / 10) {
printf("ERROR\n");
exit(0);
}
int result = reverse * 10 + digit;
reverse = result;
nr = nr / 10;
}
if(neg) {
printf("%c", '-');
}
printf("%d\n", reverse);
}
nr *= -1; is a problem when nr == INT_MIN as that is signed integer overflow. The result is undefined behavior (UB). Best to avoid.
Wider integers are not always available.
Using OP's general approach, do not change the sign of nr until it is reduced.
void print_number(int nr) {
int reverse = 0;
if (nr < 0) {
print_char('-');
//nr *= -1;
}
while (nr/10) { // new test
reverse = reverse * 10 + nr % 10;
nr = nr / 10;
}
reverse = abs(reverse); // reverse = |reverse|
nr = abs(nr); // nr = |nr|
print_char(nr + '0');
while (reverse) {
print_char(reverse % 10 + '0');
reverse = reverse / 10;
}
}
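A different route, shown here only as a sketch, is to produce the digits from the right into a small buffer. Then no reversed copy of the number is needed, nr is never negated, and trailing zeros are kept (the reversal approach turns 100 into 1). The name format_int and the buffer convention are assumptions made for this example.

```c
#include <stddef.h>

/* Writes the decimal text of nr into out and returns out.
   out must have room for the digits, an optional '-', and the terminator
   (12 bytes are enough for a 32-bit int). */
static char *format_int(int nr, char *out) {
    char tmp[3 * sizeof(int) + 2];       /* generous upper bound on sign + digits */
    size_t pos = sizeof tmp;
    int negative = nr < 0;
    do {
        int digit = nr % 10;             /* -9..9: the sign follows nr (C99) */
        tmp[--pos] = (char)('0' + (digit < 0 ? -digit : digit));
        nr /= 10;
    } while (nr != 0);
    if (negative)
        tmp[--pos] = '-';
    size_t len = sizeof tmp - pos;
    for (size_t i = 0; i < len; i++)     /* copy the finished text to the caller */
        out[i] = tmp[pos + i];
    out[len] = '\0';
    return out;
}
```

The characters can then be passed to print_char one by one, or written with a single call.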

C string to int without any libraries

I'm trying to write my first kernel module so I'm not able to include libraries for atoi, strtol, etc. How can I convert a string to int without these built-in functions? I tried:
int num;
num = string[0] - '0';
which works for the first character, but if I remove the [0] to try and convert the full string it gives me a warning: assignment makes integer from pointer without a cast. So what do I do?
When creating your own string to int function, make sure you check and protect against overflow. For example:
/* an atoi replacement performing the conversion in a single
pass and incorporating 'err' to indicate a failed conversion.
passing NULL as error causes it to be ignored */
int strtoi (const char *s, unsigned char *err)
{
char *p = (char *)s;
int nmax = (1ULL << 31) - 1; /* INT_MAX */
int nmin = -nmax - 1; /* INT_MIN */
long long sum = 0;
char sign = *p;
if (*p == '-' || *p == '+') p++;
while (*p >= '0' && *p <= '9') {
sum = sum * 10 - (*p - '0');
if (sum < nmin || (sign != '-' && -sum > nmax)) goto error;
p++;
}
if (sign != '-') sum = -sum;
return (int)sum;
error:
fprintf (stderr, "strtoi() error: invalid conversion for type int.\n");
if (err) *err = 1;
return 0;
}
You can't remove the [0]. That means that you are subtracting '0' from the pointer string, which is meaningless. You still need to dereference it:
num = string[i] - '0';
A string is an array of characters, represented by an address (a.k.a. a pointer).
A pointer has a value that might look something like 0xa1de2bdf. This value tells you where the start of the array is.
Subtracting a character from a pointer does not give you a number (e.g. 0xa1de2bdf - 'b' does not really make sense); the result is still a pointer, hence the warning.
To convert a string to a number, you could try this:
//Find the length of the string
int len = 0;
while (str[len] != '\0') {
len++;
}
//Loop through the string
int num = 0, i = 0, digit;
for (i=0; i<len; i++) {
//Extract the digit
digit = str[i] - '0';
//Multiply the digit with its correct position (ones, tens, hundreds, etc.)
num += digit * pow(10, (len-1)-i);
}
Of course if you are not allowed to use math.h library, you could write your own pow(a,b) function which gives you the value of a^b.
int mypowfunc(int a, int b) {
int i=0, ans=1;
//multiply the value a for b number of times
for (i=0; i<b; i++) {
ans *= a;
}
return ans;
}
I have written the code above in a way that is simple to understand. It assumes that your string has a null character ('\0') right after the last useful character (which is good practice).
Also, you might want to check that the string is actually a valid string with only digits (e.g '0', '1', '2', etc.). You could do this by including an if... else.. statement while looping through the string.
In modern kernels you want to use kstrto*:
http://lxr.free-electrons.com/source/include/linux/kernel.h#L274
/**
 * kstrtoul - convert a string to an unsigned long
 * @s: The start of the string. The string must be null-terminated, and may also
 * include a single newline before its terminating null. The first character
 * may also be a plus sign, but not a minus sign.
 * @base: The number base to use. The maximum supported base is 16. If base is
 * given as 0, then the base of the string is automatically detected with the
 * conventional semantics - If it begins with 0x the number will be parsed as a
 * hexadecimal (case insensitive), if it otherwise begins with 0, it will be
 * parsed as an octal number. Otherwise it will be parsed as a decimal.
 * @res: Where to write the result of the conversion on success.
 *
 * Returns 0 on success, -ERANGE on overflow and -EINVAL on parsing error.
 * Used as a replacement for the obsolete simple_strtoull. Return code must
 * be checked.
 */
This function skips leading and trailing whitespace, handles one optional + / - sign, and returns 0 on invalid input:
// Convert standard null-terminated string to an integer
// - Skips leading whitespaces.
// - Skips trailing whitespaces.
// - Allows for one, optional +/- sign at the front.
// - Returns zero if any non-+/-, non-numeric, non-space character is encountered.
// - Returns zero if digits are separated by spaces (eg "123 45")
// - Range is checked against Overflow/Underflow (INT_MAX / INT_MIN), and returns 0.
int StrToInt(const char* s)
{
int minInt = 1 << (sizeof(int)*CHAR_BIT-1);
int maxInt = -(minInt+1);
const char* w;
do { // Skip any leading whitespace
for(w=" \t\n\v\f\r"; *w && *s != *w; ++w) ;
if (*w && *s == *w) ++s; else break; // *w check: do not step past the terminating null
} while(*s);
int sign = 1;
if ('-' == *s) sign = -1;
if ('+' == *s || '-' == *s) ++s;
long long i=0;
while('0' <= *s && *s <= '9')
{
i = 10*i + *s++ - '0';
if (sign*i < minInt || maxInt < sign*i)
{
i = 0;
break;
}
}
while (*s) // Skip any trailing whitespace
{
for(w=" \t\n\v\f\r"; *w && *s != *w; ++w) ;
if (*w && *s == *w) ++s; else break;
}
return (int)(!*s*sign*i);
}
" not able to include libraries" --> Unclear if code is allowed access to INT_MAX, INT_MIN. There is no way to determine the minimum/maximum signed integer in a completely portable fashion without using the language provided macros like INT_MAX, INT_MIN.
Use INT_MAX and INT_MIN if available. Otherwise we could guess that the char width is 8, that there are no padding bits, and that integers are 2's complement. With these reasonable assumptions, minimum and maximum are defined below.
Note: Shifting into the sign bit is undefined behavior (UB), so don't do that.
Let us add another restriction: make a solution that works for any signed integer from signed char to intmax_t. This disallows code from using a wider type, as there may not be a wider type.
typedef int Austin_int;
#define Austin_INT_MAXMID ( ((Austin_int)1) << (sizeof(Austin_int)*8 - 2) )
#define Austin_INT_MAX (Austin_INT_MAXMID - 1 + Austin_INT_MAXMID)
#define Austin_INT_MIN (-Austin_INT_MAX - 1)
int Austin_isspace(int ch) {
const char *ws = " \t\n\r\f\v";
while (*ws) {
if (*ws == ch) return 1;
ws++;
}
return 0;
}
// *endptr points to where parsing stopped
// *errorptr indicates overflow
Austin_int Austin_strtoi(const char *s, char **endptr, int *errorptr) {
int error = 0;
while (Austin_isspace(*s)) {
s++;
}
char sign = *s;
if (*s == '-' || *s == '+') {
s++;
}
Austin_int sum = 0;
while (*s >= '0' && *s <= '9') {
int ch = *s - '0';
if (sum <= Austin_INT_MIN / 10 &&
(sum < Austin_INT_MIN / 10 || -ch < Austin_INT_MIN % 10)) {
sum = Austin_INT_MIN;
error = 1;
} else {
sum = sum * 10 - ch;
}
s++;
}
if (sign != '-') {
if (sum < -Austin_INT_MAX) {
sum = Austin_INT_MAX;
error = 1;
} else {
sum = -sum;
}
}
if (endptr) {
*endptr = (char *) s;
}
if (errorptr) {
*errorptr = error;
}
return sum;
}
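The above depends on C99 or later in the Austin_INT_MIN / 10 and Austin_INT_MIN % 10 parts, because only since C99 are division and remainder of negative values guaranteed to truncate toward zero.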
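The pre-check used when accumulating on the negative side can be isolated into a small predicate. This sketch uses plain int and the limits.h macros for brevity; the function name is made up, and it relies on the C99 truncation rule (for example, -17 / 10 is -1 and -17 % 10 is -7).

```c
#include <limits.h>
#include <stdbool.h>

/* true if sum * 10 - digit would fall below INT_MIN.
   Expects sum <= 0 and digit in 0..9. */
static bool next_digit_underflows(int sum, int digit) {
    return sum < INT_MIN / 10 ||
           (sum == INT_MIN / 10 && -digit < INT_MIN % 10);
}
```

Accumulating negatively is what lets INT_MIN itself be parsed, since its magnitude is one larger than INT_MAX.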
This is the cleanest and safest way I could come up with
int str_to_int(const char * str, size_t n, int * int_value) {
int i;
int cvalue;
int value_muliplier = 1;
int res_value = 0;
int neg = 1; // -1 for negative and 1 for whole.
size_t str_len; // String length.
int end_at = 0; // Where loop should end.
if (str == NULL || int_value == NULL || n <= 0)
return -1;
// Get string length
str_len = strnlen(str, n);
if (str_len <= 0)
return -1;
// Is negative.
if (str[0] == '-') {
neg = -1;
end_at = 1; // If negative 0 item in 'str' is skipped.
}
// Do the math.
for (i = str_len - 1; i >= end_at; i--) {
cvalue = char_to_int(str[i]);
// Character not a number.
if (cvalue == -1)
return -1;
// Do the same math that is down below.
res_value += cvalue * value_muliplier;
value_muliplier *= 10;
}
/*
* "436"
* res_value = (6 * 1) + (3 * 10) + (4 * 100)
*/
*int_value = (res_value * neg);
return 0;
}
int char_to_int(char c) {
int cvalue = (int)c;
// Not a number.
// 48 to 57 is 0 to 9 in ascii.
if (cvalue < 48 || cvalue > 57)
return -1;
return cvalue - 48; // 48 is the value of zero in ascii.
}

atoi implementation in C

I can't understand the following atoi implementation code, specifically this line:
k = (k << 3) + (k << 1) + (*p) - '0';
Here is the code:
int my_atoi(char *p) {
int k = 0;
while (*p) {
k = (k << 3) + (k << 1) + (*p) - '0';
p++;
}
return k;
}
Can someone explain it to me ?
Another question: what should be the algorithm of atof implementation ?
<< is bit shift, (k<<3)+(k<<1) is k*10, written by someone who thought he was more clever than a compiler (well, he was wrong...)
(*p) - '0' is subtracting the value of character 0 from the character pointed by p, effectively converting the character to a number.
I hope you can figure out the rest... just remember how the decimal system works.
Here is a specification for the standard function atoi. Sorry for not quoting the standard, but this will work just as fine (from: http://www.cplusplus.com/reference/clibrary/cstdlib/atoi/ )
The function first discards as many whitespace characters (as in
isspace) as necessary until the first non-whitespace character is
found. Then, starting from this character, takes an optional initial
plus or minus sign followed by as many base-10 digits as possible, and
interprets them as a numerical value.
The string can contain additional characters after those that form the
integral number, which are ignored and have no effect on the behavior
of this function.
If the first sequence of non-whitespace characters in str is not a
valid integral number, or if no such sequence exists because either
str is empty or it contains only whitespace characters, no conversion
is performed and zero is returned.
k = (k << 3) + (k << 1);
means
k = k * 2³ + k * 2¹ = k * 8 + k * 2 = k * 10
Does that help?
The *p - '0' term adds the value of the next digit; this works because C requires that the digit characters have consecutive values, so that '1' == '0' + 1, '2' == '0' + 2, etc.
As for your second question (atof), that should be its own question, and it's the subject for a thesis, not something simple to answer...
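The equivalence of the shift expression and a plain multiplication is easy to check. In this sketch unsigned is used so that the shifts are well defined for every input, and parse_digits (an illustrative name) stops at the first non-digit instead of running to the end of the string; like the original, it does no overflow checking.

```c
/* (k << 3) + (k << 1) is k*8 + k*2; with unsigned both sides wrap identically. */
static unsigned times10_shift(unsigned k) {
    return (k << 3) + (k << 1);
}

/* Parses a run of leading decimal digits using the plain multiplication. */
static unsigned parse_digits(const char *p) {
    unsigned k = 0;
    while (*p >= '0' && *p <= '9') {
        k = k * 10u + (unsigned)(*p - '0');
        p++;
    }
    return k;
}
```

Since both forms compute the same value, the plain k * 10 is the more readable choice and leaves the instruction selection to the compiler.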
#include <stdio.h>
#include <stdlib.h> /* for atoi */
#include <errno.h>
#include <limits.h>
double atof(const char *string);
int debug=1;
int main(int argc, char **argv)
{
char *str1="3.14159",*str2="3",*str3="0.707106",*str4="-5.2";
double f1,f2,f3,f4;
if (debug) printf("convert %s, %s, %s, %s\n",str1,str2,str3,str4);
f1=atof(str1);
f2=atof(str2);
f3=atof(str3);
f4=atof(str4);
if (debug) printf("converted values=%f, %f, %f, %f\n",f1,f2,f3,f4);
if (argc > 1)
{
printf("string %s is floating point %f\n",argv[1],atof(argv[1]));
}
}
double atof(const char *string)
{
double result=0.0;
double multiplier=1;
double divisor=1.0;
int integer_portion=0;
if (!string) return result;
integer_portion=atoi(string);
result = (double)integer_portion;
if (debug) printf("so far %s looks like %f\n",string,result);
/* capture whether string is negative, don't use "result" as it could be 0 */
if (*string == '-')
{
result *= -1; /* won't care if it was 0 in integer portion */
multiplier = -1;
}
while (*string && (*string != '.'))
{
string++;
}
if (debug) printf("fractional part=%s\n",string);
// if we haven't hit end of string, go past the decimal point
if (*string)
{
string++;
if (debug) printf("first char after decimal=%c\n",*string);
}
while (*string)
{
if (*string < '0' || *string > '9') return result*multiplier; /* keep the sign on early exit */
divisor *= 10.0;
result += (double)(*string - '0')/divisor;
if (debug) printf("result so far=%f\n",result);
string++;
}
return result*multiplier;
}
Interestingly, the man page for atoi doesn't mention setting errno, so if you pass any number greater than (2^31)-1 or less than -2^31 (assuming 32-bit int), you're out of luck: you'll get back an answer, but it won't be what you want. Here's one that accepts the range -((2^31)-1) to (2^31)-1 and returns INT_MIN (-(2^31)) on error. errno can then be checked to see whether it overflowed.
#include <stdio.h>
#include <errno.h> /* for errno */
#include <limits.h> /* for INT_MIN */
#include <string.h> /* for strerror */
int debug=0;
int atoi(const char *c)
{
int previous_result=0, result=0;
int multiplier=1;
if (debug) printf("converting %s to integer\n",c?c:"");
if (c && *c == '-')
{
multiplier = -1;
c++;
}
else
{
multiplier = 1;
}
if (debug) printf("multiplier = %d\n",multiplier);
while (*c)
{
if (*c < '0' || *c > '9')
{
return result * multiplier;
}
result *= 10;
if (result < previous_result)
{
if (debug) printf("number overflowed - return INT_MIN, errno=%d\n",errno);
errno = EOVERFLOW;
return(INT_MIN);
}
else
{
previous_result *= 10;
}
if (debug) printf("%c\n",*c);
result += *c - '0';
if (result < previous_result)
{
if (debug) printf("number overflowed - return INT_MIN\n");
errno = EOVERFLOW;
return(INT_MIN);
}
else
{
previous_result += *c - '0';
}
c++;
}
return(result * multiplier);
}
int main(int argc,char **argv)
{
int result;
printf("INT_MIN=%d will be output when number too high or too low, and errno set\n",INT_MIN);
printf("string=%s, int=%d\n","563",atoi("563"));
printf("string=%s, int=%d\n","-563",atoi("-563"));
printf("string=%s, int=%d\n","-5a3",atoi("-5a3"));
if (argc > 1)
{
result=atoi(argv[1]);
printf("atoi(%s)=%d %s",argv[1],result,(result==INT_MIN)?", errno=":"");
if (errno) printf("%d - %s\n",errno,strerror(errno));
else printf("\n");
}
return(errno);
}
Here is my implementation (tested successfully with inputs containing and starting with letters, +, - and zeros).
I tried to reverse-engineer the atoi function in Visual Studio. If the input string contained only numeric characters, it could be implemented in one loop, but it gets complicated because you have to take care of -, + and letters.
int atoi(char *s)
{
int c=1, a=0, sign, start, end, base=1;
//Determine if the number is negative or positive
if (s[0] == '-')
sign = -1;
else if (s[0] <= '9' && s[0] >= '0')
sign = 1;
else if (s[0] == '+')
sign = 2;
//No further processing if it starts with a letter
else
return 0;
//Scanning the string to find the position of the last consecutive number
while (s[c] != '\n' && s[c] <= '9' && s[c] >= '0')
c++;
//Index of the last consecutive number from beginning
start = c - 1;
//Based on sign, index of the 1st number is set
if (sign==-1)
end = 1;
else if (sign==1)
end = 0;
//When it starts with +, it is actually positive but with a different index
//for the 1st number
else
{
end = 1;
sign = 1;
}
//This the main loop of algorithm which generates the absolute value of the
//number from consecutive numerical characters.
for (int i = start; i >=end ; i--)
{
a += (s[i]-'0') * base;
base *= 10;
}
//The correct sign of generated absolute value is applied
return sign*a;
}
The atoi() hint code is from here:
and, based on that atoi(), here is my implementation of atof():
[it has the same limitations as the original code: it doesn't check length, etc.]
double atof(const char* s)
{
double value_h = 0;
double value_l = 0;
double sign = 1;
if (*s == '+' || *s == '-')
{
if (*s == '-') sign = -1;
++s;
}
while (*s >= 0x30 && *s <= 0x39)
{
value_h *= 10;
value_h += (double)(*s - 0x30);
++s;
}
// 0x2E == '.'
if (*s == 0x2E)
{
double divider = 1;
++s;
while (*s >= 0x30 && *s <= 0x39)
{
divider *= 10;
value_l *= 10;
value_l += (double)(*s - 0x30);
++s;
}
return (value_h + value_l/divider) * sign;
}
else
{
return value_h * sign;
}
}

Resources