C extract some data from my string with sscanf or else


I am new to C and just learning it. I have a string like:
a 322 4.1 5.2
(the fields are separated by whitespace: spaces or tabs)
or
b 1.22 4.1 5.2 4.11
How can I get each field as a separate string, without the whitespace?
For example:
string[0]="a";
string[1]="322";
string[2]="4.1";
etc...
Edit:
I am just trying to find the best/fastest way to do it for a large number of lines (70,000-100,000 strings).
Working on Android (Galaxy S), using a linked list.
Test: 71,000 arrays took about 7-8 seconds with C++ (without string/std) and 14 seconds with Java.

What the original poster asked for, using sscanf():
#include <stdio.h>
#include <string.h>
int main(void)
{
    /* 5 elements of 32 bytes each: 31 for characters, the 32nd for the terminating \0 */
    char string[5][32];
    const char *inputString = "a 322 4.1 5.2";
    memset(string, 0, sizeof(string)); /* zero every byte so unused entries are empty strings */
    /* %31s limits each field to 31 characters so a long token cannot overflow its buffer */
    sscanf(inputString, "%31s%31s%31s%31s", string[0], string[1], string[2], string[3]);
    printf("res0= %s\n", string[0]);
    printf("res1= %s\n", string[1]);
    printf("res2= %s\n", string[2]);
    printf("res3= %s\n", string[3]);
    return 0;
}
This will print:
res0= a
res1= 322
res2= 4.1
res3= 5.2
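The sscanf() call above handles exactly four fields, while the question also shows a line with five. A minimal sketch of one way to handle a variable number of fields is to read one token per call and use %n to learn how far sscanf() got. The names split_fields, MAX_FIELDS and FIELD_LEN are made up for this illustration, and the 31 in the format string must be kept in step with FIELD_LEN by hand.

```c
#include <stdio.h>

#define MAX_FIELDS 8   /* assumed upper bound on fields per line */
#define FIELD_LEN  32  /* 31 characters plus the terminating '\0' */

/* Split `line` on whitespace into `fields`; returns the number of fields stored. */
int split_fields(const char *line, char fields[][FIELD_LEN], int max_fields)
{
    int count = 0;
    int consumed = 0;

    /* %31s skips leading whitespace and reads one token; %n stores how many
       characters were consumed so the next call can resume after them. */
    while (count < max_fields &&
           sscanf(line, "%31s%n", fields[count], &consumed) == 1) {
        line += consumed;
        count++;
    }
    return count;
}
```

Called on "b 1.22 4.1 5.2 4.11" this stores five fields. A token longer than 31 characters is not rejected; it simply gets split across consecutive fields, so a real program would want to check for that.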

You can use strtok, as Martin Beckett said, which is recommended for portability. However, if your system has strsep available, I'd go with it. Its man page on BSD has the solution to your question in the examples section.
#include <string.h>
int main(void)
{
    char input[] = " a 322 4.1 5.2";
    char **ap, *argv[5], *inputstring = input;
    /* strsep returns an empty string between adjacent delimiters; skip those */
    for (ap = argv; (*ap = strsep(&inputstring, " \t")) != NULL;)
        if (**ap != '\0')
            if (++ap >= &argv[5]) /* stop before writing past the end of argv */
                break;
    /* debugger output for `p argv':
     *
     * $1 = {
     * 0x1001009a1 "a",
     * 0x1001009ac "322",
     * 0x1001009b1 "4.1",
     * 0x1001009b7 "5.2",
     * 0x0
     * }
     */
    return 0;
}

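The safest way to split a string (especially if you don't know what the string may contain) is strtok. Note that strtok modifies the string it scans, so pass it a writable copy rather than a string literal.
You might also need to check how you are creating the string[] array in C.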
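As a rough sketch of the strtok approach (the function name split_tokens and the delimiter set are assumptions chosen for this example, not something from the question):

```c
#include <string.h>

/* Split `line` in place on whitespace; stores up to `max` token pointers
   in `tokens` and returns how many were stored. */
int split_tokens(char *line, char *tokens[], int max)
{
    int count = 0;
    char *tok = strtok(line, " \t\r\n");

    while (tok != NULL && count < max) {
        tokens[count++] = tok;          /* points into `line`, nothing is copied */
        tok = strtok(NULL, " \t\r\n");  /* NULL means "continue with the same string" */
    }
    return count;
}
```

Because the tokens point into the original buffer, they stay valid only as long as that buffer does. strtok also keeps hidden state between calls, so it should not be used on two strings at once or from several threads.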

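int i, j;
for (i = 0, j = 0; str[i]; i++)
{
    if (str[i] == ' ')
        continue;
    str2[j++] = str[i];
}
str2[j] = '\0'; /* terminate the copy */
In the above code, str is your original string and str2 is the new one, which must be at least as large as str. Note that this removes the spaces rather than splitting the string into separate fields.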

Related

Integer data compression for transfer in C without external libraries

I googled and searched here a lot without finding a fitting solution. The title may be a bit odd or not fully accurate, but let me explain:
My IoT device collects a batch of data every second that I can represent as a list of integers. Here is an example of one row of sensor readings (the zeros are not always 0, by the way):
230982 0 4294753011 -9 4294198951 -1 4294225518 0 0 0 524789 0 934585 0 4 0 0 0 0
On a trigger, I want to send the whole table (all rows collected so far) to my computer. I could just stringify it and concatenate everything, but I wonder if there is a more efficient encoding/compression to reduce the byte count, both when storing in RAM/flash and for reduced transfer volume. Ideally this could be achieved with built-in functions, i.e. no external compression libraries. I am not that strong with encoding/compression, so I hope you can give me a hint.

Zlib/Zstd libraries are better suited for general-purpose compression. Assuming you don't want to use any third-party libraries, here is a hand-coded version of a naive compression method, which saves half of the bytes of the input string.
The basic idea is very simple. Your strings will have at most 16 different characters (the 15 listed below plus the terminating NUL), which can be mapped to 4 bits each rather than the typical 8 bits. See the assumptions below. You can try base16, base64, base128 encodings too, but this is the simplest.
Assumptions:
First you'll convert all your numbers into a string in decimal format.
The string won't contain any other characters than 0,1,2,3,4,5,6,7,8,9,+,-,.,space, and a comma.
============================================================================
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
/* Map a character to a 4-bit code: '+' ',' '-' '.' become 1..4, space takes
   the otherwise unused '/' slot (5), '0'..'9' become 6..15; 0 means "nothing". */
static inline char map(char c)
{
    switch(c) {
    case ' ' : return ('/' - '*');
    case '\0': return 0;
    default : return c - '*';
    }
    return 0;
}
/* Inverse of map(): turn a 4-bit code back into its character. */
static inline char revmap(char c)
{
    switch(c) {
    case '\0' : return 0;
    case '/' - '*': return ' ';
    default : return c + '*';
    }
    return 0;
}
/* Pack two characters into each output byte; the caller frees the result. */
char *compress(const char *s, int len)
{
    int i, j;
    char *compr = malloc((len+1)/2 + 1);
    j = 0;
    for (i = 1; i < len; i += 2)
        compr[j++] = map(s[i-1]) << 4 | map(s[i]);
    if (i-1 < len) /* odd length: the last character fills only the high nibble */
        compr[j++] = map(s[i-1]) << 4;
    compr[j] = '\0';
    return compr;
}
/* Unpack each byte into two characters; the caller frees the result. */
char *decompress(const char *s, int len)
{
    int i, j;
    char *decompr = malloc(2*len + 1);
    for (i = j = 0; i < len; i++) {
        decompr[j++] = revmap((s[i] & 0xf0) >> 4);
        decompr[j++] = revmap(s[i] & 0xf);
    }
    decompr[j] = '\0';
    return decompr;
}
int main()
{
    const char *input = "230982 0 4294753011 -9 4294198951 -1 4294225518 0 0 0 524789 0 934585 0 4 0 0 0 0 ";
    int plen = strlen(input);
    printf("plain(len=%d): %s\n", plen, input);
    char *compr = compress(input, plen);
    int clen = strlen(compr);
    char *decompr = decompress(compr, clen);
    int dlen = strlen(decompr);
    printf("decompressed(len=%d): %s\n", dlen, decompr);
    free(compr);
    free(decompr);
    return 0;
}

Simplest solution is to simply dump data out in binary form. It may be smaller or bigger than string form depending on your data, but you don't have to do any data processing on device.
If most of your data is small, you can use variable length data encoding for serialization. There are several, but CBOR is fairly simple.
If your data changes only very little, you could send only first row as absolute values, and remaining rows as delta of previous row. This would result in many small numbers, which typically are more efficient in previously mentioned encoding systems.
I wouldn't try to implement any general purpose compression algorithms without any experience and external libraries, unless you absolutely need it. Finding suitable algorithm that compresses your data well enough and with reasonable resource usage can be time consuming.
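To make the delta and variable-length ideas above concrete, here is a minimal sketch. It is not CBOR: it uses a simpler scheme, zigzag mapping followed by a base-128 varint (7 data bits per byte), picked only because it fits in a few lines. All function names are invented for this example, and values are assumed to fit in int64_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Zigzag: map values near zero, of either sign, to small unsigned numbers
   (0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...). */
static uint64_t zigzag(int64_t v)
{
    return ((uint64_t)v << 1) ^ (v < 0 ? UINT64_MAX : 0);
}

static int64_t unzigzag(uint64_t u)
{
    return (int64_t)(u >> 1) ^ -(int64_t)(u & 1);
}

/* Write u in 7-bit groups, lowest group first; a set high bit means
   "more bytes follow". Returns the number of bytes written (1 to 10). */
static size_t put_varint(uint8_t *out, uint64_t u)
{
    size_t n = 0;
    while (u >= 0x80) {
        out[n++] = (uint8_t)(u | 0x80);
        u >>= 7;
    }
    out[n++] = (uint8_t)u;
    return n;
}

/* Read one varint from `in` into *u; returns the number of bytes consumed. */
static size_t get_varint(const uint8_t *in, uint64_t *u)
{
    size_t n = 0;
    unsigned shift = 0;
    uint8_t b;
    *u = 0;
    do {
        b = in[n++];
        *u |= (uint64_t)(b & 0x7f) << shift;
        shift += 7;
    } while (b & 0x80);
    return n;
}

/* Encode `row` as differences from `prev` (same length). For the first row,
   pass a `prev` filled with zeros. `out` needs up to 10 * cols bytes. */
size_t encode_row(const int64_t *row, const int64_t *prev, size_t cols, uint8_t *out)
{
    size_t n = 0;
    for (size_t i = 0; i < cols; i++)
        n += put_varint(out + n, zigzag(row[i] - prev[i]));
    return n;
}

/* Inverse of encode_row: rebuild `row` from `in` and the previous row. */
size_t decode_row(const uint8_t *in, const int64_t *prev, size_t cols, int64_t *row)
{
    size_t n = 0;
    for (size_t i = 0; i < cols; i++) {
        uint64_t u;
        n += get_varint(in + n, &u);
        row[i] = prev[i] + unzigzag(u);
    }
    return n;
}
```

With this scheme a delta between -64 and 63 takes one byte, so rows that change little from one second to the next shrink to roughly one byte per column. How well it does on the real sensor data depends on how much the values actually move between rows.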

Computing websocket Sec-WebSocket-Accept value using libtomcrypt

RFC6455 specifies a method of computing the Sec-WebSocket-Accept response header from the value of the Sec-WebSocket-Key header. This method is based on SHA-1 hashing and Base64-encoding the result.
How can I implement this method in plain C using libtomcrypt for SHA-1 and Base64?
Note: This question intentionally does not show any effort because I immediately answered it myself. See below for my effort.

Here's a full compilable example that uses only libtomcrypt without any dynamic memory allocation and successfully computes the reference example from RFC6455:
//This file is licensed under CC0 1.0 Universal (public domain)
//Compile like this: gcc -o wsencodetest wsencodetest.c -ltomcrypt
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <ctype.h>
#include <tomcrypt.h>
#define SHA1_HASHSIZE 20
//Magic GUID as defined in RFC6455 section 1.3
static const char magicGUID[] = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";
/**
 * Compute the value of the Sec-WebSocket-Accept response header
 * from the value of the Sec-WebSocket-Key header.
 * @param key The whitespace or NUL terminated Sec-WebSocket-Key value
 * @param dst Where to store the base64-encoded output. Must provide 29 bytes of memory.
 * The 29 bytes starting at dst contain the resulting value (plus a terminating NUL byte)
 */
void computeWebsocketSecAccept(const char* key, char* dst) {
    /**
     * Determine start & length of key minus leading/trailing whitespace
     * See RFC6455 section 1.3
     */
    //Skip leading whitespace
    while(isspace((unsigned char)*key)) {
        key++;
    }
    //Determine key size.
    size_t keySize = 0;
    while(!isspace((unsigned char)key[keySize]) && key[keySize] != 0) {
        keySize++;
    }
    //Compute SHA1 hash. See RFC6455 section 1.3
    unsigned char hashOut[SHA1_HASHSIZE];
    hash_state md;
    sha1_desc.init(&md);
    sha1_desc.process(&md, (const unsigned char*)key, keySize);
    //sizeof - 1: hash the GUID text only, not its terminating NUL byte
    sha1_desc.process(&md, (const unsigned char*)magicGUID, sizeof(magicGUID) - 1);
    sha1_desc.done(&md, hashOut);
    //Encode hash to output buffer
    unsigned long outlen = 29; //28 base64 characters plus the terminating NUL
    base64_encode(hashOut, SHA1_HASHSIZE, dst, &outlen);
}
/**
 * Usage example
 */
int main(int argc, char** argv) {
    //Whitespace needs to be removed according to RFC6455
    //Example from RFC6455
    const char* key = " dGhlIHNhbXBsZSBub25jZQ== ";
    char buf[29];
    //Perform computation
    computeWebsocketSecAccept(key, buf);
    //Should print s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
    printf("%s\n", buf);
    return 0;
}

Having issues iterating through machine code

I'm attempting to recreate the wc command in c and having issues getting the proper number of words in any file containing machine code (core files or compiled c). The number of logged words always comes up around 90% short of the amount returned by wc.
For reference here is the project info
Compile statement
gcc -ggdb wordCount.c -o wordCount -std=c99
wordCount.c
/*
 * Author(s) - Colin McGrath
 * Description - Lab 3 - WC LINUX
 * Date - January 28, 2015
 */
#include<stdio.h>
#include<string.h>
#include<dirent.h>
#include<sys/stat.h>
#include<ctype.h>
struct counterStruct {
    int newlines;
    int words;
    int bt;
};
typedef struct counterStruct ct;
ct totals = {0};
struct stat st;
void wc(ct counter, char *arg)
{
    printf("%6lu %6lu %6lu %s\n", counter.newlines, counter.words, counter.bt, arg);
}
void process(char *arg)
{
    lstat(arg, &st);
    if (S_ISDIR(st.st_mode))
    {
        char message[4056] = "wc: ";
        strcat(message, arg);
        strcat(message, ": Is a directory\n");
        printf(message);
        ct counter = {0};
        wc(counter, arg);
    }
    else if (S_ISREG(st.st_mode))
    {
        FILE *file;
        file = fopen(arg, "r");
        ct currentCount = {0};
        if (file != NULL)
        {
            char holder[65536];
            while (fgets(holder, 65536, file) != NULL)
            {
                totals.newlines++;
                currentCount.newlines++;
                int c = 0;
                for (int i=0; i<strlen(holder); i++)
                {
                    if (isspace(holder[i]))
                    {
                        if (c != 0)
                        {
                            totals.words++;
                            currentCount.words++;
                            c = 0;
                        }
                    }
                    else
                        c = 1;
                }
            }
        }
        currentCount.bt = st.st_size;
        totals.bt = totals.bt + st.st_size;
        wc(currentCount, arg);
    }
}
int main(int argc, char *argv[])
{
    if (argc > 1)
    {
        for (int i=1; i<argc; i++)
        {
            //printf("%s\n", argv[i]);
            process(argv[i]);
        }
    }
    wc(totals, "total");
    return 0;
}
Sample wc output:
135 742 360448 /home/cpmcgrat/53/labs/lab-2/core.22321
231 1189 192512 /home/cpmcgrat/53/labs/lab-2/core.26554
5372 40960 365441 /home/cpmcgrat/53/labs/lab-2/file
24 224 12494 /home/cpmcgrat/53/labs/lab-2/frequency
45 116 869 /home/cpmcgrat/53/labs/lab-2/frequency.c
5372 40960 365441 /home/cpmcgrat/53/labs/lab-2/lineIn
12 50 1013 /home/cpmcgrat/53/labs/lab-2/lineIn2
0 0 0 /home/cpmcgrat/53/labs/lab-2/lineOut
39 247 11225 /home/cpmcgrat/53/labs/lab-2/parseURL
138 318 2151 /home/cpmcgrat/53/labs/lab-2/parseURL.c
41 230 10942 /home/cpmcgrat/53/labs/lab-2/roman
66 162 1164 /home/cpmcgrat/53/labs/lab-2/roman.c
13 13 83 /home/cpmcgrat/53/labs/lab-2/romanIn
13 39 169 /home/cpmcgrat/53/labs/lab-2/romanOut
7 6 287 /home/cpmcgrat/53/labs/lab-2/URLs
11508 85256 1324239 total
Sample rebuild output (./wordCount):
139 76 360448 /home/cpmcgrat/53/labs/lab-2/core.22321
233 493 192512 /home/cpmcgrat/53/labs/lab-2/core.26554
5372 40960 365441 /home/cpmcgrat/53/labs/lab-2/file
25 3 12494 /home/cpmcgrat/53/labs/lab-2/frequency
45 116 869 /home/cpmcgrat/53/labs/lab-2/frequency.c
5372 40960 365441 /home/cpmcgrat/53/labs/lab-2/lineIn
12 50 1013 /home/cpmcgrat/53/labs/lab-2/lineIn2
0 0 0 /home/cpmcgrat/53/labs/lab-2/lineOut
40 6 11225 /home/cpmcgrat/53/labs/lab-2/parseURL
138 318 2151 /home/cpmcgrat/53/labs/lab-2/parseURL.c
42 3 10942 /home/cpmcgrat/53/labs/lab-2/roman
66 162 1164 /home/cpmcgrat/53/labs/lab-2/roman.c
13 13 83 /home/cpmcgrat/53/labs/lab-2/romanIn
13 39 169 /home/cpmcgrat/53/labs/lab-2/romanOut
7 6 287 /home/cpmcgrat/53/labs/lab-2/URLs
11517 83205 1324239 total
Notice the difference in the word count (second int) from the first two files (core files) as well as the roman file and parseURL files (machine code, no extension).

C strings do not store their length. They are terminated by a single NUL (0) byte.
Consequently, strlen needs to scan the entire string, character by character, until it reaches the NUL. That makes this:
for (int i=0; i<strlen(holder); i++)
desperately inefficient: for every character in holder, it needs to count all the characters in holder in order to test whether i is still in range. That transforms a simple linear Θ(N) algorithm into a Θ(N²) cycle-burner.
But in this case, it also produces the wrong result, since binary files typically include lots of NUL characters. Since strlen will actually tell you where the first NUL is, rather than how long the "line" is, you'll end up skipping a lot of bytes in the file. (On the bright side, that makes the scan quadratically faster, but computing the wrong result more rapidly is not really a win.)
You cannot use fgets to read binary files because the fgets interface doesn't tell you how much it read. You can use the POSIX 2008 getline interface instead, or you can do binary input with fread, which is more efficient but will force you to count newlines yourself. (Not the worst thing in the world; you seem to be getting that count wrong, too.)
Or, of course, you could read the file one character at a time with fgetc. For a school exercise, that's not a bad solution; the resulting code is easy to write and understand, and typical implementations of fgetc are more efficient than the FUD would indicate.
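A minimal sketch of the fgetc approach might look like the following. The struct and function names are made up for illustration, and the file is assumed to have been opened by the caller in binary mode ("rb"). Because every byte is examined individually, NUL bytes no longer cut the scan short.

```c
#include <stdio.h>
#include <ctype.h>

struct counts {
    unsigned long lines;
    unsigned long words;
    unsigned long bytes;
};

/* Count newlines, whitespace-separated words and bytes in an open stream. */
struct counts count_stream(FILE *fp)
{
    struct counts c = {0, 0, 0};
    int ch;
    int in_word = 0;

    /* fgetc returns each byte as an unsigned char value, or EOF at the end,
       so a NUL byte is just another non-space character here. */
    while ((ch = fgetc(fp)) != EOF) {
        c.bytes++;
        if (ch == '\n')
            c.lines++;
        if (isspace(ch)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            c.words++;
        }
    }
    return c;
}
```

The line and byte counts should agree with the system wc. The word count on binary data may still differ somewhat, since a given wc implementation can apply its own locale-dependent rules about which bytes count as part of a word.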

Trouble with populating student's marks

This follows up on a similar thread I posted the other day about reading a file into the requisite data structure. The file data looks like the sample below; I can't remember who said it, but yes, there are four subjects. (I wanted to post an overall reply to all responses but could only comment on each post made.)
131782 Mathematics 59
075160 Mathematics 92
580313 Physics 63
073241 Mathematics 32
487476 Mathematics 73
075160 Physics 98
472832 English 44
...
I'm using fscanf() now to parse the data, and this is a much better approach. I made another thread yesterday about removing duplicate strings. I've scrapped that idea and just used qsort on the student IDs, then created a for loop that skips every four elements and brings the unique student IDs into the structure. I checked with printf() earlier and they're successfully stored. Now that the IDs are stored, I'm ready to search for each ID and populate that student's marks. I think it's almost there, except for a slight problem inside the update_student() function.
If you look at my code, or even compile it, the compiler does not like the line that's supposed to populate the mark for the student, student_data[idx].marks[buffer_subjects]=marks. buffer_subjects is a string, but if you look at my defines, the subject names are integer constants, which is the whole idea at this stage.
How can I fix this?
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define STUDENTS 20
#define COMPUTING 0
#define ENGLISH 1
#define MATHEMATICS 2
#define PHYSICS 3
#define SUBJECTS 4
#define ROWS 80
#define SIZE 100
int string_compare(void const *x, void const *y)
{
    return strcmp(*(char**)x, *(char**)y);
}
struct student
{
    char student_ID[SIZE];
    int marks[SUBJECTS];
};
struct student student_data[STUDENTS];
int find_student(char buffer_IDs[])
{
    int j;
    for(j=0;j<STUDENTS;j++)
        if(strcmp(student_data[j].student_ID,buffer_IDs)==0)
            return j;
}
void update_student(char buffer_IDs[], char buffer_subjects[], int marks[])
{
    int idx = find_student(buffer_IDs);
    student_data[idx].marks[buffer_subjects] = marks;
}
int main(void)
{
    FILE *input;
    int i,j, data_items;
    char buffer_IDs[ROWS][SIZE];
    char buffer_subject[ROWS][SIZE];
    int marks[ROWS][SIZE];
    char *string_ptrs[ROWS];
    if((input=fopen("C:\\marks\\marks.txt","r"))==NULL)
        perror("File open failed!");
    else
    {
        for(i=0;i<ROWS;i++)
        {
            while((data_items=fscanf(input, "%s %s %d", buffer_IDs[i], buffer_subject[i], marks[i])!=3));
            printf("%s %s %d\n", buffer_IDs[i], buffer_subject[i], *marks[i]);
            string_ptrs[i]=buffer_IDs[i];
        }
        putchar('\n');
        qsort(string_ptrs, sizeof(string_ptrs)/sizeof(char*), sizeof(char*), string_compare);
        for(i=0;i<ROWS;i=i+4)
        {
            j=0;
            strcpy(student_data[j].student_ID,string_ptrs[i]);
            printf("%s\n",student_data[j].student_ID);
            j++;
        }
        for(i=0;i<ROWS;i++)
            update_student(buffer_IDs[i], buffer_subject[i], marks[i]);
    }
    return 0;
}

There are numerous failures in this block of code. I would seriously recommend building it up block by block, and using a debugger to confirm that the data in each stage is as expected, and not proceeding till each of the lower level blocks work (and do not produce compile errors).
Running it through GDB with some sample data suggests problems even at the point of reading in and parsing data from the file on disk.
We can assist with individual issues as they arise from this approach.
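As one example of such an individual issue: the line the question asks about fails because an array index has to be an integer, and the subject read from the file is a string. A minimal sketch of a lookup that bridges the two is shown below. The function name subject_index is made up for this example, and the spelling "Computing" is an assumption, since the sample data only shows the other three subjects.

```c
#include <string.h>

#define COMPUTING   0
#define ENGLISH     1
#define MATHEMATICS 2
#define PHYSICS     3
#define SUBJECTS    4

/* Map a subject name as read from the file to its index in marks[];
   returns -1 if the name is not recognised. */
int subject_index(const char *name)
{
    /* Order must match the #define values above. */
    static const char *const names[SUBJECTS] = {
        "Computing", "English", "Mathematics", "Physics"
    };
    for (int i = 0; i < SUBJECTS; i++)
        if (strcmp(name, names[i]) == 0)
            return i;
    return -1;
}
```

update_student() could then take the mark as a plain int and store it in student_data[idx].marks[subject_index(buffer_subjects)], after checking that neither index is negative. This addresses only that one line; the other problems mentioned above still need to be worked through separately.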

Why am I getting different results from different skein hash APIs?

I've tried a few. Python's pyskein; a javascript skein calculator I found online somewhere; and the skein calculator being used for xkcd's april fools' comic all give the same output for a given input.
But when I download version 1.3 of the reference C source here I get different results. Worst of all, the results I get from the C API perfectly match the "known answer test" examples that come with the source code, so I assume I'm using it right.
My C code:
#include <stdio.h>
#include <stdlib.h>
#include "SHA3api_ref.h"
int main(int argc, const char * argv[])
{
    const int BITS = 256; // length of hash in bits
    const int LENGTH = 32; // length of data in bits
    BitSequence *hashval = calloc(BITS/8, 1);
    const BitSequence content[] = {0xC1, 0xEC, 0xFD, 0xFC};
    Hash(BITS, content, LENGTH, hashval);
    for (int i = 0; i < BITS/8; i++) {
        printf("%02X", hashval[i]);
    }
    return 0;
}
result hex: 2638B1711F1346D08BF02B5D1A575CD924140A608512AF5B8E4475632599A896
Python code for the same hash on the same data:
import skein
print( skein.skein256(bytes([0xC1, 0xEC, 0xFD, 0xFC])).hexdigest() )
result hex: 07e785ce898fa5cfa22e15294481717935923985ea90f67fc65cb5b3cb718190
Note that the C answer is the expected answer according to the KAT_MCT/ShortMsgKAT_256.txt file that comes with the code. But pyskein gives results that everyone else seems to agree are correct. What am I missing?
