I am developing a program that's a sort of heartbeat designed to run on a variety of servers. The function in question, reproduced below, retrieves the list of "friends," and for each "friend" in the list, it executes a handshake operation (via ping_and_report, not shown).
The problem is that on the first call to this routine, strtok_r seems to return more strings than are present in the source, and I have not been able to determine why. The code:
void pingServerList(int dummy) {
    char *p ;
    char *my_friends ;
    char *nextSvr, *savePtr ;
    char *separators = ",; \t" ;
    server_list_t *ent = NULL ;
    static long round_nbr = 0 ;
    unsigned int len ;
    time_t now ;
    char message[4096] ;
    char *hex ;

    round_nbr++ ;
    p = get_server_list() ;
    if (p) {
        len = strlen(p) ;
        my_friends = malloc(len+1) ;
        strncpy(my_friends, p, len) ;
    }
    nextSvr = strtok_r(my_friends, separators, &savePtr) ;
    while (nextSvr) {
        // Ensure that nobody messes with nextSvr. . .
        char *workSvr = malloc(strlen(nextSvr) + 1) ;
        strcpy(workSvr, nextSvr) ;
        if (debug) {
            len = strlen(workSvr) * 2 + 3 ;
            hex = malloc(len) ;
            get_hex_val(workSvr, hex, len) ;
            write_log(fp_debug
                     , "Server: %s (x'%s')"
                     , workSvr, hex) ;
            free(hex) ;
        }
        ping_and_report(workSvr, round_nbr) ;
        free(workSvr) ;
        nextSvr = strtok_r(NULL, separators, &savePtr) ;
    }
... is not too complex at that point, I think. And I don't see any room for mucking with the values. But the log file reveals the issue here:
2012-07-09 23:26 Debug activated...
2012-07-09 23:26 get_server_list() returning velmicro, stora-2 (x'76656C6D6963726F2C2073746F72612D32')
2012-07-09 23:26 Server: velmicro (x'76656C6D6963726F')
2012-07-09 23:26 Server: stora-2 (x'73746F72612D32')
2012-07-09 23:26 Server: re (x'726519')
The crazy thing is that (at least across several executions of the code) this only fails on the first call. Calls 2 through n (where n is in the hundreds) do not exhibit this problem.
Do any of you folks see what I'm obviously missing? (BTW: this fails exactly the same way on four different systems running different versions of Linux.)
When you write this
strncpy(my_friends, p, len) ;
you are not ensuring that my_friends ends with a '\0'. Try
strncpy(my_friends, p, len)[len] = '\0';
Alternatively, use calloc to allocate my_friends so the buffer starts out zeroed.
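For completeness, a minimal sketch of the corrected allocation and copy, reusing the question's variable names (it swaps strncpy for memcpy plus an explicit terminator, since the length is already known):
p = get_server_list() ;
if (p) {
    len = strlen(p) ;
    my_friends = malloc(len + 1) ;
    if (my_friends) {
        memcpy(my_friends, p, len) ;   /* copy the bytes...            */
        my_friends[len] = '\0' ;       /* ...and terminate explicitly  */
    }
}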
I googled and searched here a bunch without finding a fitting solution. The title is maybe a bit weird or not fully accurate, but let me explain:
My IoT device collects a bunch of data every second that I can represent as a list of integers. Here is an example of one row of sensor reads (the zeros are not always 0, btw):
230982 0 4294753011 -9 4294198951 -1 4294225518 0 0 0 524789 0 934585 0 4 0 0 0 0
On trigger I want to send the whole table (all rows collected until then) to my computer. I could just stringify it and concatenate everything, but I wonder if there is a more efficient encoding/compression to reduce the byte count, both when storing in RAM/flash and for reduced transfer volume. Ideally this could be achieved with built-in functions, i.e. no external compression libraries. I am not that strong with encoding/compression; I hope you can give me a hint.
Zlib/Zstd libraries are better suited for general-purpose compression. Assuming you don't want to use any third-party libraries, here is a hand-coded version of a naive compression method that saves half of the bytes of the input string.
The basic idea is very simple: your strings will contain at most 16 different characters, each of which can be mapped to 4 bits rather than the typical 8 bits. See the assumptions below. You could try base16, base64, or base128 encodings too, but this is the simplest.
Assumptions:
First you'll convert all your numbers into a string in decimal format.
The string won't contain any characters other than 0-9, +, -, ., space, and comma.
============================================================================
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
static inline char map(char c)
{
    /* Map each allowed character to a 4-bit value relative to '*';
       space reuses the slot of '/' (which never occurs in the input). */
    switch(c) {
    case ' ' : return ('/' - '*');
    case '\0': return 0;
    default  : return c - '*';
    }
    return 0;
}
static inline char revmap(char c)
{
    switch(c) {
    case '\0'      : return 0;
    case '/' - '*' : return ' ';
    default        : return c + '*';
    }
    return 0;
}
char *compress(const char *s, int len)
{
    int i, j;
    char *compr = malloc((len+1)/2 + 1);
    j = 0;
    for (i = 1; i < len; i += 2)
        compr[j++] = map(s[i-1]) << 4 | map(s[i]);
    if (i-1 < len)                     /* odd length: last char goes in the high nibble */
        compr[j++] = map(s[i-1]) << 4;
    compr[j] = '\0';
    return compr;
}
char *decompress(const char *s, int len)
{
    int i, j;
    char *decompr = malloc(2*len + 1);
    for (i = j = 0; i < len; i++) {    /* each byte unpacks to two characters */
        decompr[j++] = revmap((s[i] & 0xf0) >> 4);
        decompr[j++] = revmap(s[i] & 0xf);
    }
    decompr[j] = '\0';
    return decompr;
}
int main()
{
    const char *input = "230982 0 4294753011 -9 4294198951 -1 4294225518 0 0 0 524789 0 934585 0 4 0 0 0 0 ";
    int plen = strlen(input);
    printf("plain(len=%d): %s\n", plen, input);
    char *compr = compress(input, plen);
    int clen = strlen(compr);
    char *decompr = decompress(compr, clen);
    int dlen = strlen(decompr);
    printf("decompressed(len=%d): %s\n", dlen, decompr);
    free(compr);
    free(decompr);
}
The simplest solution is to just dump the data out in binary form. It may be smaller or bigger than the string form depending on your data, but you don't have to do any data processing on the device.
If most of your values are small, you can use a variable-length encoding for serialization. There are several; CBOR is fairly simple.
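To illustrate the variable-length idea, here is a minimal sketch of a LEB128-style varint with ZigZag mapping (this is not CBOR's exact format, just the same principle): small magnitudes cost one byte, large ones up to five.
#include <stdint.h>
#include <stddef.h>

/* ZigZag-map a signed value (0,-1,1,-2,... -> 0,1,2,3,...) so small
   negatives also stay small, then emit 7 bits per byte with the high
   bit meaning "more bytes follow". Returns the number of bytes written. */
static size_t varint_encode(int32_t v, uint8_t *out)
{
    uint32_t u = (v < 0) ? (((uint32_t)~v << 1) | 1u) : ((uint32_t)v << 1);
    size_t n = 0;
    do {
        uint8_t byte = u & 0x7f;
        u >>= 7;
        out[n++] = byte | (u ? 0x80 : 0);
    } while (u);
    return n;
}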
If your data changes only very little from row to row, you could send only the first row as absolute values and the remaining rows as deltas from the previous row. This results in many small numbers, which typically come out shorter in the previously mentioned encodings.
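A minimal sketch of that delta idea, assuming fixed-width rows of 32-bit signed values (the ROW_LEN of 19 matches the sample row above; pick a width that actually fits your readings):
#include <stdint.h>
#include <stddef.h>

#define ROW_LEN 19   /* values per row, as in the sample row above */

/* Turn rows[1..n-1] into deltas against the previous row, in place.
   The first row keeps its absolute values. Walk backwards so the
   previous row is still untouched when we subtract it. */
static void delta_encode(int32_t rows[][ROW_LEN], size_t n_rows)
{
    for (size_t r = n_rows; r-- > 1; )
        for (size_t c = 0; c < ROW_LEN; c++)
            rows[r][c] -= rows[r - 1][c];
}

/* Inverse on the receiving side: a running sum restores absolute values. */
static void delta_decode(int32_t rows[][ROW_LEN], size_t n_rows)
{
    for (size_t r = 1; r < n_rows; r++)
        for (size_t c = 0; c < ROW_LEN; c++)
            rows[r][c] += rows[r - 1][c];
}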
I wouldn't try to implement a general-purpose compression algorithm without experience or an external library unless you absolutely need to. Finding a suitable algorithm that compresses your data well enough with reasonable resource usage can be time-consuming.
I am sending some raw bytes over the wire in C (using HTTP). I'm currently doing it like this:
// response is a large buffer
int n = 0; // response length
int x = 42; // want client to read x
int y = 43; // and y
// write a simple HTTP response containing a 200 status code then x and y in binary format
strcpy(response, "HTTP/1.1 200\r\n\r\n");
n += 16; // status line we just wrote is 16 bytes long
memcpy(response + n, &x, sizeof(x));
n += sizeof(x);
memcpy(response + n, &y, sizeof(y));
n += sizeof(y);
write(client, response, n);
In JavaScript, I then read this data using code like this:
request = new XMLHttpRequest();
request.responseType = "arraybuffer";
request.open("GET", "/test");
request.onreadystatechange = function() {
    if (this.readyState === XMLHttpRequest.DONE) {
        console.log(new Int32Array(this.response));
    }
};
request.send();
which prints [42, 43] as it should.
I'm wondering if there is a more elegant way to do this on the server-side though, e.g.
n += sprintf(response, "HTTP/1.1 200\r\n\r\n%4b%4b", &x, &y);
Where %4b is a made-up format specifier that just says: copy the 4 bytes from that address into the string (which would be "*\0\0\0"). Is there a format specifier like the fictional %4b that does something like this?
This is an XY problem: you are asking how to use sprintf() to solve your problem rather than simply asking how to solve your problem. Your actual problem is how to make that code more "elegant".
There is no particular reason to send the data in a single write operation - the network stack buffering will ensure that the data is packetised efficiently:
static const char header[] = "HTTP/1.1 200\r\n\r\n" ;
write( client, header, sizeof(header) - 1 ) ;
write( client, &x, sizeof(x) ) ;
write( client, &y, sizeof(y) ) ;
Note that x and y will be written in the native machine byte order, which may be wrong at the receiver. More generally then:
static const char header[] = "HTTP/1.1 200\r\n\r\n" ;
write( client, header, sizeof(header) - 1 ) ;
uint32_t nl = htonl( x ) ;
write( client, &nl, sizeof(nl) ) ;
nl = htonl( y ) ;
write( client, &nl, sizeof(nl) ) ;
Is there a format specifier like the fictional %4b?
No, there is not, and your method is fine. I would suggest using snprintf plus a bounds check to avoid buffer overflow, and adding e.g. static_assert(sizeof(int) == 4, "") along with similar environment, error-handling, and undefined-behaviour checks.
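For example, a sketch of the original memcpy approach with the size assertion and a bounds check bolted on (assuming C11 for static_assert; the build_response helper is made up for illustration):
#include <assert.h>   /* static_assert (C11) */
#include <stdio.h>
#include <string.h>

static_assert(sizeof(int) == 4, "wire format assumes 4-byte int");

/* Append the status line and two raw ints into buf; returns the byte
   count to pass to write(), or -1 if the buffer would overflow. */
static int build_response(char *buf, size_t cap, int x, int y)
{
    int n = snprintf(buf, cap, "HTTP/1.1 200\r\n\r\n");
    if (n < 0 || (size_t)n + 2 * sizeof(int) > cap)
        return -1;
    memcpy(buf + n, &x, sizeof(x));
    n += sizeof(x);
    memcpy(buf + n, &y, sizeof(y));
    n += sizeof(y);
    return n;
}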
That said, you can use the %c printf specifier multiple times, like "%c%c%c%c", ((char*)&x)[3], ((char*)&x)[2], ((char*)&x)[1], ((char*)&x)[0], to print 4 bytes. You can wrap it in macros and do:
#include <stdio.h>
#define PRI_BYTES_4 "%c%c%c%c"
#define ARG_BYTES_BE_4(var) \
((const char*)&(var))[3], \
((const char*)&(var))[2], \
((const char*)&(var))[1], \
((const char*)&(var))[0]
int main() {
    int var =
        'l' << 24 |
        'a' << 16 |
        'm' << 8 |
        'e';
    printf("Hello, I am " PRI_BYTES_4 ".\n",
           ARG_BYTES_BE_4(var));
    // will print `Hello, I am lame.`
}
I want to set a conditional breakpoint when the value of the 4th argument is equal to "abc".
void FunctionA(char* a, char* b, char* c, char* d)
{
    //some code here
}
I use the following command but it doesn't work. Could you help?
bp app!FunctionA "as /mu ${/v:MyAlias} poi(d);.block{.if ($spat(\"${MyAlias}\", \"abc\") == 0) { } .else { gc } }"
Note: app.exe is my application name.
You cannot use /mu on a char *. /mu is for a null-terminated Unicode string, not an ASCII string; for an ASCII string, use /ma.
I assume you have descriptive argument names and not an argument literally named d,
which would obviously clash with 0xd aka 0n13.
Is d a number, a string, or a symbol?
What would poi(d) resolve to in your case? Is it poi(0xd), which is obviously a bad dereference,
or a local symbol illogically named d?
Also, aliases are not interpreted when you break.
When using an alias you should always put it in a script file and execute
the script file on each break.
Here is an example of such a script file:
as /ma ${/v:MyAlias} poi(k)
.block {
r $t0 = $spat("${MyAlias}" , "tiger")
.printf "%x\t${MyAlias}\n" , #$t0
.if(#$t0 != 1) {gc}
}
Here is the code this is run against, compiled in debug mode with optimizations turned off;
in release mode the compiler will be smart enough to inline the printf() call.
#include <stdio.h>
#include <stdlib.h> //msvc _countof
void func(char* h, char* i, char* j, char* k) {
    printf("%s %s %s %s\n", h, i, j, k);
    return;
}
int main(void) {
    char* foo[] = {"goat","dog","sheep","cat","lion","tiger",0,"vampire"};
    for (int x = 0; x < _countof(foo); x++) {
        func("this", "is", "a", foo[x]);
    }
    return 0;
}
Usage:
windbg app.exe
Set the breakpoint and run. Keep in mind that this, or any script that uses an alias, will fail on
evaluating the null entry before the char * "vampire";
if you want to break on "vampire" you may need to improvise without using an alias at all.
0:000> bl
0:000> bp strbp!func "$$>a< strbpcond.txt"
0:000> bl
0 e 00171260 0001 (0001) 0:**** strbp!func "$$>a< strbpcond.txt"
0:000> g
ModLoad: 72670000 72673000 C:\Windows\system32\api-ms-win-core-synch-l1-2-0.DLL
0 goat
0 dog
0 sheep
0 cat
0 lion
1 tiger
eax=00000005 ebx=7ffd7000 ecx=00000005 edx=001ac1e0 esi=001b6678 edi=001b667c
eip=00171260 esp=002bfa54 ebp=002bfa90 iopl=0 nv up ei ng nz ac po cy
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000293
strbp!func:
00171260 55 push ebp
0:000> dv
h = 0x001ac1f8 "this"
i = 0x001ac1f4 "is"
j = 0x001ac1f0 "a"
k = 0x001ac1e0 "tiger"
I'm trying to come up with a fast and simple "Get a string from stdin and convert to an integer. If you can't, just pretend we got zero".
This is a Linux embedded system, CPU and memory are at a premium. Performance is important, accuracy not so much. This should be able to do multiple ingests per second. I will eventually turn it into a daemon and store latest 1024 values in an array.
Here's my take using atoi:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
    char *c = argv[1];
    unsigned int i = 1; /* on atoi() failure, i = 0 */
    if (i = atoi(c)) {
        puts("atoi() success");
    }
    else {
        puts("atoi() FAILED");
    }
    printf("argv[1] = %s\n", argv[1]);
    printf(" i = %d\n", i);
}
A few test runs / fuzzing:
# ./test_atoi 3
atoi() success
argv[1] = 3
i = 3
# ./test_atoi 99999999999999999999
atoi() success
argv[1] = 99999999999999999999
i = 2147483647
# ./test_atoi 3.14159
atoi() success
argv[1] = 3.14159
i = 3
# ./test_atoi $(echo -ne "\u2605")
atoi() FAILED
argv[1] = ★
i = 0
This fails:
# ./test_atoi $(echo -e "\0")
Segmentation fault
I'll add a check for NUL then:
if (argv[1] == '\0') {
i = 0;
}
Will this be enough? Have I just (badly) re-implemented strtol?
Should I just go ahead and use strtol? If yes, is there anything I should be checking for that strtol isn't already?
What I really, really care about is not dying because of bad input. I can happily live with getting occasional garbage from the conversion.
EDIT: int i = 1 just because I want to see if atoi() makes it 0.
Quick-and-dirty profiling with time
EDIT: I've dropped the print statements and wrapped reading from stdin into atoi/strtol in a for loop.
# time seq 0 999888 | ./test_atoi
real 0m5.245s
user 0m5.870s
sys 0m0.030s
# time seq 0 999888 | ./test_atoi
real 0m5.230s
user 0m5.960s
sys 0m0.050s
# time seq 0 999888 | ./test_atoi
real 0m5.395s
user 0m5.920s
sys 0m0.080s
# time seq 0 999888 | ./test_strtol
real 0m5.332s
user 0m5.860s
sys 0m0.030s
# time seq 0 999888 | ./test_strtol
real 0m5.023s
user 0m5.790s
sys 0m0.060s
# time seq 0 999888 | ./test_strtol
real 0m5.286s
user 0m5.970s
sys 0m0.010s
Alright, this is insane. I should do something more productive with my time, and yours!
This is a Linux embedded system, CPU and memory are at a premium.
Yes. Err, no. If you're running a normal Linux, your kernel will use atoi and its inverse in a few thousand places. Your single number parser will hardly make any impact, unless you're intending to call it several thousand times per second...
Should i just go ahead and use strtol?
for the reasons above: yes.
If yes, anything i should be checking for, that strtol isn't already?
You should check strtol's return value. I really don't sympathize with your "don't need precision" approach; something like this is either done right or catastrophically wrong.
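For reference, a minimal sketch of a checked strtol wrapper that still honours the "pretend we got zero" requirement (the parse_or_zero name is just for illustration):
#include <errno.h>
#include <stdlib.h>

/* Parse a decimal integer; fall back to 0 on any failure. */
static long parse_or_zero(const char *s)
{
    if (s == NULL)
        return 0;
    char *end;
    errno = 0;
    long val = strtol(s, &end, 10);
    if (end == s)            /* no digits at all */
        return 0;
    if (errno == ERANGE)     /* overflow/underflow */
        return 0;
    return val;
}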
EDIT You said:
don't need precision = i only care about values 0 - 100
This means (a) you just need atoi, not atol/strtol; there, CPU cycles saved. Next, do you actually need to convert strings that might look like 13.288 to integers, or can you assume that all strings are one to three characters long? In that case, and for raw performance, maybe:
#include <string.h>  /* strnlen */

static inline unsigned char char2digit(const char *c) {
    unsigned char v = *c - '0';
    return (v < 1 || v > 9) ? 0 : v;
}
static inline signed char characters2number(const char *string)
{
    size_t len = strnlen(string, 4);
    if (len < 1 || len > 3)
        return -1;
    signed char val = 0;
    signed char power_of_ten = 1;
    for (unsigned char idx = 1; idx <= len; ++idx)
    {
        val += power_of_ten * char2digit(string + len - idx);
        power_of_ten *= 10;
    }
    return val;
}
I mean, if you're on a toaster. Otherwise atoi has your back. You might still want to check strnlen.
#include <stdio.h>

int main(int argc, char **argv)
{
    int i = 0;
    if (argc > 1)
        sscanf(argv[1], "%d", &i);
    printf("i = %d\n", i);
}
Context
Debian 64.
Core 2 duo.
Fiddling with a loop, I came up with different variations of the same loop, but I would like to avoid conditional branching if possible, even though I think the plain version will be difficult to beat.
I thought about SSE or bit shifting, but it would still require a jump (look at the computed goto below). Spoiler: a computed jump doesn't seem to be the way to go.
The code is compiled without PGO, because on this piece of code PGO makes it slower.
Flags:
gcc -march=native -O3 -std=c11 test_comp.c
Unrolling the loop didn't help here.
63 in ASCII is '?'.
The printf is here to force the code to execute, nothing more.
My need:
Logic to avoid the condition. I take this as a challenge for my holidays :)
The code:
Tested with the sentence below; the character '?' is guaranteed to be there, but at a random position.
hjkjhqsjhdjshnbcvvyzayuazeioufdhkjbvcxmlkdqijebdvyxjgqddsyduge?iorfe
#include <stdlib.h>
#include <stdio.h>
int main(int argc, char **argv){
    /* This is quite slow. Average actually.
       Executes in 369,041 cycles here (cachegrind) */
    for (int x = 0; x < 100; ++x){
        if (argv[1][x] == 63){
            printf("%d\n", x);
            break;
        }
    }

    /* This is the slowest.
       Executes in 370,385 cycles here (cachegrind) */
    register unsigned int i = 0;
    static void * restrict table[] = {&&keep, &&end};
keep:
    ++i;
    goto *table[(argv[1][i-1] == 63)];
end:
    printf("i = %d", i-1);

    /* This is slower. Because of the calculation..
       Executes in 369,109 cycles here (cachegrind) */
    for (int x = 100; ; --x){
        if (argv[1][100 - x] == 63){
            printf("%d\n", 100 - x);
            break;
        }
    }

    return 0;
}
Question
Is there a way to make it faster, maybe by avoiding the branch?
The branch miss rate is huge at 11.3% (cachegrind with --branch-sim=yes).
I can't believe this is the best one can achieve.
If some of you know assembly well enough, please chime in.
Assuming you have a buffer of well-known size that is able to hold the maximum number of chars to test against, like
char buffer[100];
make it one byte larger
char buffer[100 + 1];
then fill it with the sequence to test against
read(fileno(stdin), buffer, 100);
and put your test-char '?' at the very end
buffer[100] = '?';
This allows you to use a loop with only one test condition:
size_t i = 0;
while ('?' != buffer[i])
{
    ++i;
}
if (100 == i)
{
    /* test failed */
}
else
{
    /* test passed for i */
}
Leave all other optimisation to the compiler.
However, I couldn't resist, so here's a possible approach to micro-optimisation:
char buffer[100 + 1];
read(fileno(stdin), buffer, 100);
buffer[100] = '?';
char * p = buffer;
while ('?' != *p)
{
    ++p;
}
if ((p - buffer) == 100)
{
    /* test failed */
}
else
{
    /* test passed for (p - buffer) */
}