Convert from char array to 16bit signed int and 32bit unsigned int - c

I'm working on some embedded C for a PCB I've developed, but my C is a little rusty!
I'm looking to do some conversions from a char array to various integer types.
First Example:
[input] " 1234" (note the space before the 1)
[convert to] (int16_t) 1234
Second Example:
[input] "-1234"
[convert to] (int16_t) -1234
Third Example:
[input] "2017061234"
[convert to] (uint32_t) 2017061234
I've tried playing around with atoi(), but I don't seem to be getting the result I expected. Any suggestions?
[EDIT]
This is the code for the conversions:
char *c_sensor_id = "0061176056";
char *c_reading1 = " 3630";
char *c_reading2 = "-24.30";
uint32_t sensor_id = atoi(c_sensor_id); // comes out as 536880136
uint16_t reading1 = atoi(c_reading1); // comes out as 9224
uint16_t reading2 = atoi(c_reading2); // comes out as 9224

A couple of things:
Never use the atoi family of functions since they have no error handling and may crash if the input format is bad. Instead, use the strtol family of functions.
Both of these function families can be somewhat resource-heavy on resource-constrained microcontrollers, so you might have to roll your own version of strtol.
Example:
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <inttypes.h>
int main()
{
    const char* c_sensor_id = "0061176056";
    const char* c_reading1 = " 3630";
    const char* c_reading2 = "-1234";

    c_reading1++; // fix the weird string format

    uint32_t sensor_id = (uint32_t)strtoul(c_sensor_id, NULL, 10);
    uint16_t reading1  = (uint16_t)strtoul(c_reading1, NULL, 10);
    int16_t  reading2  = (int16_t) strtol (c_reading2, NULL, 10);

    printf("%"PRIu32 "\n", sensor_id);
    printf("%"PRIu16 "\n", reading1);
    printf("%"PRId16 "\n", reading2);
}
Output:
61176056
3630
-1234

The observed behavior is very surprising. I suggest writing functions to convert character strings to int32_t and uint32_t and use them instead of atoi():
uint32_t atou32(const char *s) {
    uint32_t v = 0;
    while (*s == ' ')
        s++;
    while (*s >= '0' && *s <= '9')
        v = v * 10 + *s++ - '0';
    return v;
}

int32_t atos32(const char *s) {
    while (*s == ' ')
        s++;
    return (*s == '-') ? -atou32(s + 1) : atou32(s);
}
There is no error handling and a leading + is not even supported, but at least the value is converted as 32 bits, which would not be the case with atoi() if type int only has 16 bits on your target platform.
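For completeness, one hypothetical way to also accept an optional leading + (the name atos32_plus is made up for illustration; there is still no overflow or error checking) would be:

int32_t atos32_plus(const char *s) {
    while (*s == ' ')
        s++;
    if (*s == '+')      /* tolerate an optional leading '+' */
        s++;
    return (*s == '-') ? -atou32(s + 1) : atou32(s);
}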

Try initializing your strings like this:
char c_sensor_id[] = "0061176056";
char c_reading1[] = " 3630";
char c_reading2[] = "-24.30";

Related

Memory corruption when attempting to use strtok

When I run this code on my microcontroller, it crashes when attempting to printf on "price_right_of_period".
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define DEBUG
char *str = "$3.45";
int main()
{
    char *decimal_pos;            // where the decimal place is in the string
    char buffer[64];              // temporary working buffer
    int half_price_auto_calc = 1;

    // the cost of the product price difference for half price mode
    int price_save_left, price_save_right = 0;
    int price_half_left, price_half_right = 0;

    // EG: let's say we get the string "$2.89"
    char *price_left_of_period;   // this will contain "2"
    char *price_right_of_period;  // this will contain "89"

    // find where the decimal is
    decimal_pos = strstr(str, ".");
    if (decimal_pos != NULL)
    {
        printf("\nThe decimal point was found at array index %d\n", decimal_pos - str);
        printf("Splitting the string \"%s\" into left and right segments\n\n", str);
    }

    // get everything before the period
    strcpy(buffer, str); // copy the string
    price_left_of_period = strtok(buffer, ".");

    // if the dollar sign exists, skip over it
    if (price_left_of_period[0] == '$') price_left_of_period++;

#ifdef DEBUG
    printf("price_left_of_period = \"%s\"\n", price_left_of_period);
#endif

    // get everything after the period
    //
    // strtok remembers the last string it worked with and where it ended
    // to get the next string, call it again with NULL as the first argument
    price_right_of_period = strtok(NULL, "");

#ifdef DEBUG
    printf("price_right_of_period = \"%s\"\n\n", price_right_of_period);
#endif

    if (half_price_auto_calc == 1)
    {
        // calculate the amount we saved (before the decimal)
        price_save_left = atoi((const char *)price_left_of_period);
        // halve the value if half price value is true
        price_half_left = price_save_left / 2;

        // calculate the amount we saved (before the decimal)
        price_save_right = atoi((const char *)price_right_of_period);
        // halve the value if half price value is true
        price_half_right = price_save_right / 2;

#ifdef DEBUG
        printf("price_half_left = \"%d\"\n", price_half_left);
        printf("price_half_right = \"%d\"", price_half_right);
#endif
    }

    return 0;
}
The code runs and works fine here: https://onlinegdb.com/kDAw2cJyz. However, as mentioned above, it crashes on my MCU.
Would anyone have any ideas why this might be happening as a result of my code? The code looks right to me but it's always nice to get a second opinion from other C experts :)
SOLUTION:
Your code does have one bug: decimal_pos - str doesn't work as an argument for %d because it has the wrong type to print through %d. You need to cast it (I doubt that's causing the crash, but you can test by commenting it out and re-testing).
The code indeed has a bug here:
printf("\nThe decimal point was found at array index %d\n", decimal_pos - str);
The difference of 2 pointers has type ptrdiff_t, which may be different from the int expected for %d. You should either use %td or cast the difference as (int)(decimal_pos - str). It is however very surprising that this type mismatch would be the cause of your problem.
Note that you copy the string without testing its length in strcpy(buffer, str); which for this example is OK but might have undefined behavior if str points to a longer string.
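For example (a sketch, not part of the original program), the copy could be bounded to the destination buffer; snprintf always nul-terminates:

    snprintf(buffer, sizeof buffer, "%s", str);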
The code is too complicated: there is no need for strtok() as you already have the offset to the decimal point if any. You can use atoi() with a pointer to the beginning of the integral portion without patching the . with a null byte. You could also use strtol() to avoid strstr() too.
Note also that the code will compute an incorrect price in most cases: "$3.45" will be changed to "$1.22" which is substantially more than a 50% rebate.
You should just convert the number as an integral number of cents and use that to compute the reduced price.
Here is a simplified version:
#include <stdio.h>
#include <stdlib.h>

int half_price_auto_calc = 1;
char *str = "$3.45";

int main() {
    int cents;
    char *p = str;

    if (*p == '$')
        p++;

    // use strtol to convert dollars and cents
    cents = strtol(p, &p, 10) * 100;
    if (*p == '.') {
        cents += strtol(p + 1, NULL, 10);
    }
    if (half_price_auto_calc) {
        cents /= 2;
    }
    printf("reduced price: $%d.%02d\n", cents / 100, cents % 100);
    return 0;
}

SegFault due to invalid assignment?

I have an array that is declared inside a public struct like this:
uint16_t *registers;
In a function I'm retrieving a char string (stored in buffer, see code below) that contains numerical values separated by a comma (e.g., "1,12,0,136,5,76,1243"). My goal is to get each individual numerical value and store it in the array, one after another.
i = 0;
const char delimiter[] = ",";
char *end;

tmp.vals = strtok(buffer, delimiter);
while (tmp.vals != NULL) {
    tmp.registers[i] = strtol(tmp.vals, &end, 10);
    tmp.vals = strtok(NULL, delimiter);
    i++;
}
The problem is that the line containing strtol is producing a Segmentation fault (core dumped) error. I'm pretty sure it's caused by trying to fit unsigned long values into uint16_t array slots but no matter what I try I can't get it fixed.
Changing the code as follows seems to have solved the problem:
unsigned long num = 0;
size_t size = 0;

i = 0;
size = 1;
tmp.vals = (char *)calloc(strlen(buffer) + 1, sizeof(char));
tmp.registers = (uint16_t *)calloc(size, sizeof(uint16_t));
tmp.vals = strtok(buffer, delimiter);
while (tmp.vals != NULL) {
    num = strtoul(tmp.vals, &end, 10);
    if (0 <= num && num < 65536) {
        tmp.registers = (uint16_t *)realloc(tmp.registers, size + i);
        tmp.registers[i] = (uint16_t)num;
    } else {
        fprintf(stderr, "==> %lu is too large to fit in register[%d]\n", num, i);
    }
    tmp.vals = strtok(NULL, delimiter);
    i++;
}
A long integer is at least 32 bits, so yes, you're going to lose information trying to shove a signed 32 bit integer into an unsigned 16 bit integer. If you have compiler warnings on (I use -Wall -Wshadow -Wwrite-strings -Wextra -Wconversion -std=c99 -pedantic) it should tell you that.
test.c:20:28: warning: implicit conversion loses integer precision: 'long' to 'uint16_t'
(aka 'unsigned short') [-Wconversion]
tmp.registers[i] = strtol(tmp.vals, &end, 10);
~ ^~~~~~~~~~~~~~~~~~~~~~~~~~
However, this isn't going to cause a segfault. You'll lose 16 bits and the change in sign will do funny things.
#include <stdio.h>
#include <inttypes.h>
int main() {
    long big = 1234567;
    uint16_t small = big;
    printf("big = %ld, small = %" PRIu16 "\n", big, small);
}
If you know what you're reading will fit into 16 bits, you can make things a little safer first by using strtoul to read an unsigned long, verify that it's small enough to fit, and explicitly cast it.
unsigned long num = strtoul(tmp.vals, &end, 10);
if( 0 <= num && num < 65536 ) {
    tmp.registers[i] = (uint16_t)num;
}
else {
    fprintf(stderr, "%lu is too large to fit in the register\n", num);
}
More likely, tmp.registers (and possibly buffer) was never properly initialized and allocated, so it points to garbage. If you simply declared tmp on the stack like so:
Registers tmp;
This only allocates memory for tmp, not the things it points to. And it will contain garbage. tmp.registers will point to some random spot in memory. When you try to write to it it will segfault... eventually.
The register array needs to be allocated.
size_t how_many = 10;
uint16_t *registers = malloc( sizeof(uint16_t) * how_many );
Registers tmp = {
    .registers = registers,
    .vals = NULL
};
This is fine so long as your loop only ever runs how_many times. But you can't be sure of that when reading input. Your loop is potentially reading an infinite number of registers. If it goes over the 10 we've allocated it will again start writing into someone else's memory and segfault.
Dynamic memory is too big a topic for here, but we can at least limit the loop to the size of the array by tracking the maximum size of registers and how far in it is. We could do it in the loop, but it really belongs in the struct.
typedef struct {
    uint16_t *registers;
    char *vals;
    size_t max;
    size_t size;
} Registers;
While we're at it, put initialization into a function so we're sure it's done reliably each time.
void Registers_init( Registers *registers, size_t size ) {
    registers->registers = malloc( sizeof(uint16_t) * size );
    registers->max = size;
    registers->size = 0;
}
And same with our bounds check.
void Registers_push( Registers *registers, uint16_t num ) {
    if( registers->size == registers->max ) {
        fprintf(stderr, "Register has reached its limit of %zu\n", registers->max);
        exit(1);
    }

    registers->registers[ registers->size ] = (uint16_t)num;
    registers->size++;
}
Now we can add registers safely. Or at least it will error nicely.
Registers tmp;
Registers_init( &tmp, 10 );

tmp.vals = strtok(buffer, delimiter);
while (tmp.vals != NULL) {
    unsigned long num = strtoul(tmp.vals, &end, 10);
    if( 0 <= num && num < 65536 ) {
        Registers_push( &tmp, (uint16_t)num );
    }
    else {
        fprintf(stderr, "%lu is too large to fit in the register\n", num);
    }
    tmp.vals = strtok(NULL, delimiter);
    i++;
}
At this point we're re-implementing a size-bound array. It's a good exercise, but for production code use an existing library such as GLib which provides self-growing arrays and a lot more features.
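As a rough illustration of that suggestion (assuming GLib is actually available on the target, which it usually is not on a bare-metal MCU), the same "push a register" step could look like this with GArray:

#include <glib.h>
#include <inttypes.h>
#include <stdio.h>

int main(void) {
    GArray *regs = g_array_new(FALSE, FALSE, sizeof(uint16_t));
    uint16_t v = 1234;

    g_array_append_val(regs, v);   /* the array grows automatically */
    printf("%" PRIu16 "\n", g_array_index(regs, uint16_t, 0));

    g_array_free(regs, TRUE);
    return 0;
}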

String Allocated with malloc Inaccessible after Function Return

I have run into a strange bug and I cannot for the life of me get it figured out. I have a function that decodes a byte array into a string based on another encoding function. The function that decodes looks roughly like this:
char *decode_string( uint8_t *encoded_string, uint32_t length,
                     uint8_t encoding_bits ) {
    char *sequence_string;
    uint32_t idx = 0;
    uint32_t posn_in_buffer;
    uint32_t posn_in_cell;
    uint32_t encoded_nucleotide;
    uint32_t bit_mask;

    // Useful Constants
    const uint8_t CELL_SIZE = 8;
    const uint8_t NUCL_PER_CELL = CELL_SIZE / encoding_bits;

    sequence_string = malloc( sizeof(char) * (length + 1) );
    if ( !sequence_string ) {
        ERR_PRINT("could not allocate enough space to decode the string\n");
        return NULL;
    }

    // Iterate over the buffer, converting one nucleotide at a time.
    while ( idx < length ) {
        posn_in_buffer = idx / NUCL_PER_CELL;
        posn_in_cell = idx % NUCL_PER_CELL;

        encoded_nucleotide = encoded_string[posn_in_buffer];
        encoded_nucleotide >>= (CELL_SIZE - encoding_bits*(posn_in_cell+1));

        bit_mask = (1 << encoding_bits) - 1;
        encoded_nucleotide &= bit_mask;

        sequence_string[idx] = decode_nucleotide( encoded_nucleotide );
        // decode_nucleotide returns a char on integer input.
        idx++;
    }
    sequence_string[idx] = '\0';

    printf("%s", sequence_string); // prints the correct string
    return sequence_string;
}
The bug is that the return pointer, if I try to print it, causes a segmentation fault. But calling printf("%s\n", sequence_string) inside of the function will print everything just fine. If I call the function like this:
const char *seq = "AA";
uint8_t *encoded_seq;
encode_string( &encoded_seq, seq, 2, 2);

char *decoded_seq = decode_string( encoded_seq, 2, 2);
if ( decoded_seq ) {
    printf("%s\n", decoded_seq); // this crashes
    if ( !strcmp(decoded_seq, seq) ) {
        printf("Success!");
    }
}
then it will crash on the print.
A few notes: the other functions (decode_nucleotide, encode_string) all seem to work; I've tested them fairly thoroughly. The string also prints correctly inside the function. It is only after the function returns that it stops working.
My question is, what might cause this memory to become invalid just by returning the pointer from a function? Thanks in advance!
First (not that important, but worth noting): in the statement
sequence_string = malloc( sizeof(char) * (length + 1) );
sizeof(char) is by definition always == 1, so the statement can simply be:
sequence_string = malloc(length + 1);
In this section of your post:
char *decoded_seq = decode_string( encoded_seq, 2, 2);
...since I cannot see your implementation of decode_string, I can only make assumptions about how you are verifying its output before returning it. I do however understand that you are expecting the return value to contain values that would be legal contents for a C string. I can also assume that because you are working with coding and decoding, that the output type is likely unsigned char. If I am correct, then a legal range of characters for an output type of unsigned char is 0-255.
You are not checking the output before sending it to the printf statement. If decoded_seq happens to be NULL, your program will crash: string functions do not work well with null pointers.
You should verify the return of decode_string before sending it to printf:
char *decoded_seq = decode_string( encoded_seq, 2, 2);
if(decoded_seq != NULL)
{
...

fastest way to convert C string to all one case processor wise

The following code which I whipped up in 7 minutes takes a short string and converts all letters to lower case:
void tolower(char *out, const char *in){
    int l=strlen(in); int cc; int i;
    for (i=0; i<l; i++){
        cc=(int)in[i]-0;
        if (cc >=65 && cc <=90){ cc+=0x20; }
        out[i]=(char)cc;
    }
}

int main(int argc, char *argv[]){
    const char *w="aBcDe";
    char w2[6]="     ";
    tolower(w2,w);
    printf("x=%s %s\n",w,w2);
    return EXIT_SUCCESS;
}
The problem with it is that I will be dealing with large sets of data (approximately 10KB worth of data per second) and I want to be able to make a function that works the fastest possible.
I have seen code out there that can deal with machine registers and when I used code like that in the past with Quick Basic, things worked faster.
So I'm curious as to how I could use machine registers (like eax) compatible with both 32 and 64 bit processors in my C program.
If I could take at least 4 bytes of the string at a time and then act on all 4 bytes simultaneously, then that would be best.
In Quick Basic, I could achieve what I need with the help of its mkd$() and cvd() functions.
Anyone know how I can make the function I posted work faster? and please don't say upgrade the computer processor.
Two approaches; which one is faster depends on profiling on your system.
// tolower() -- needs <ctype.h>
void Mike_tolower1(char *out, const char *in) {
    while ((*out++ = tolower((unsigned char) (*in++) )) != 0);
}

// table lookup -- UCHAR_MAX from <limits.h>; the table must cover every
// unsigned char value since it is indexed with one
void Mike_tolower2(char *out, const char *in) {
    // fill in the table
    static const char lwr[UCHAR_MAX + 1] = { '\0', '\1', '\2', ...
        'a', 'b' ...
        'a', 'b' ...
    };
    while (*in) {
        *out++ = lwr[(unsigned char) (*in++)];
    }
    *out = '\0'; // terminate the copy
}
Fastest way: don't use strlen(), since it does exactly what the following code does anyway. It counts how many characters there are before the '\0' appears, so you would be iterating through the string twice. Do it this way:
#include <ctype.h>
void string_tolower(char *string)
{
    while (*string != '\0')
    {
        // cast avoids undefined behavior if plain char is signed and negative
        *string = tolower((unsigned char)*string);
        string++;
    }
}
And don't call your function tolower; that's a standard function declared in ctype.h which converts a single ASCII character to lower case.
Fastest depends on the processor, but a pretty speedy version is:
int c;
char *cp;

for (cp = out; 0 != (c = *cp); ++cp) {
    if ((c >= 'A') && (c <= 'Z'))
        *cp = (char)(c + 'a' - 'A');
}
That includes Jester's suggestion of avoiding strlen.
If you have some RAM to burn, you can compute a lookup table. So for example:
pseudocode for creating a tolower lookup table
lookup["AA"] = "aa"
lookup["bb"] = "bb"
This way you can lowercase 2 bytes at a time and with no if statements needed.
If you really want to go nuts, you could write a GPGPU implementation that would scream.
See Chux answer for an example of implementing a 1 char at a time look-up table.
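For the record, here is a rough sketch (not from any of the answers above; all names and sizes are illustrative) of what a 2-bytes-at-a-time lookup table could look like. The full 64K-entry table costs 128 KB of RAM:

#include <ctype.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

static uint16_t lower_pair[65536];   /* maps any byte pair to its lowercased pair */

static void init_lower_pair(void) {
    for (unsigned i = 0; i < 65536; i++) {
        unsigned char lo = (unsigned char)(i & 0xFF);
        unsigned char hi = (unsigned char)(i >> 8);
        lower_pair[i] = (uint16_t)(tolower(lo) | (tolower(hi) << 8));
    }
}

static void tolower_pairs(char *out, const char *in, size_t len) {
    size_t i = 0;
    for (; i + 1 < len; i += 2) {
        uint16_t pair;
        memcpy(&pair, in + i, 2);      /* memcpy sidesteps alignment issues */
        pair = lower_pair[pair];
        memcpy(out + i, &pair, 2);
    }
    if (i < len)                        /* odd trailing byte */
        out[i] = (char)tolower((unsigned char)in[i]);
}

Because the table covers every 16-bit pattern and the mapping is applied byte-wise, it works regardless of endianness.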
I'm gonna use this code I just constructed now, based on various sources including this one: https://code.google.com/p/stringencoders/wiki/PerformanceAscii.
void tolower1(char *out, const char *in, int lg){
    // Processes 4 bytes at a time. Assumes in/out are suitably aligned for
    // uint32_t access and that lg is a multiple of 4 (a scalar loop would be
    // needed for any tail bytes).
    const uint32_t* s = (const uint32_t*) in;
    uint32_t* d = (uint32_t*) out;
    int l = lg / (int)sizeof(uint32_t);
    int i;
    for (i = 0; i < l; ++i){
        uint32_t x = s[i];
        // per-byte test for 'A'..'Z', then add 0x20 to exactly those bytes
        uint32_t m = (0x7f7f7f7fu & x) + 0x25252525u;
        m = (0x7f7f7f7fu & m) + 0x1a1a1a1au;
        m = ((m & ~x) >> 2) & 0x20202020u;
        d[i] = x + m;
    }
}
I suggest something more like:
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

int main()
{
    const char *w = "aBcDe";

    // allow room for nul termination byte
    char w2[6] = {'\0'};

    // this 'for' statement may need tweaking if
    // w[] contains a '\0' byte anywhere except at the end
    for( int i = 0; w[i]; i++)
    {
        w2[i] = tolower(w[i]);
    }

    printf("x=%s %s\n", w, w2);
    return EXIT_SUCCESS;
} // end function: main
Or, for a callable function that also allows for nul bytes within the string:
char *myToLower( int byteCount, char *originalArray )
{
    char *lowerArray = NULL;

    if( NULL == (lowerArray = malloc(byteCount) ) )
    { // then, malloc failed
        perror( "malloc failed" );
        exit( EXIT_FAILURE );
    }

    // implied else, malloc successful

    for( int i = 0; i < byteCount; i++ )
    {
        lowerArray[i] = tolower(originalArray[i]);
    }

    return( lowerArray );
} // end function: myToLower

Printing hex characters in c giving weird output

I have a network capture tool that captures packets and does some processing on them. Now, here is a small fragment of the code. u_char is the name given to unsigned char. When I try to print the variable smac to stdout, all I get is 000000000000 on the screen. I want to print the actual smac and dmac.
char *str;
char converted[6*2 + 1];
u_char smac[6], dmac[6];
int i;

if (ntop_lua_check(vm, __FUNCTION__, id, LUA_TSTRING))
    return(CONST_LUA_PARAM_ERROR);
if ((str = (char*)lua_tostring(vm, id)) == NULL)
    return(CONST_LUA_PARAM_ERROR);

sscanf(str, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
       &smac[0], &smac[1], &smac[2], &smac[3], &smac[4], &smac[5]);

for (i = 0; i < 6; i++)
{
    sprintf(&converted[i*2], "%02X", smac[i]);
}
printf("%s\n", converted);
Is the problem with unsigned char getting promoted to int or something by sprintf, printing two unnecessary bytes? I am not sure. Any help would be of great value. Thank you.
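One way to narrow this down (a standalone sketch with a made-up input string, rather than whatever lua_tostring actually returns) is to exercise the same sscanf/sprintf round trip outside of the Lua glue:

#include <stdio.h>

int main(void) {
    const char *str = "aa:bb:cc:dd:ee:ff";   /* example MAC-style input */
    unsigned char smac[6];
    char converted[6 * 2 + 1];
    int i;

    int n = sscanf(str, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
                   &smac[0], &smac[1], &smac[2], &smac[3], &smac[4], &smac[5]);
    printf("fields parsed: %d\n", n);        /* expect 6 */

    for (i = 0; i < 6; i++)
        sprintf(&converted[i * 2], "%02X", smac[i]);
    printf("%s\n", converted);               /* expect AABBCCDDEEFF */
    return 0;
}

If this prints correctly in isolation, the conversion itself is fine and the all-zero output points at the input string instead.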
