Using strcat to append spaces. Compiles but overwrites string - c

The language I am working in is C.
I am trying to use a mix of built in c string functions in order to take a list of tokens (space separated) and "convert" it into a list of tokens that is split by quotations.
A string like
echo "Hello 1 2 3 4" test test2
gets converted to
[echo] ["Hello] [1] [2] [3] [4"] [test] [test2]
I then use my code (at bottom) to attempt to convert it into something like
[echo] [Hello 1 2 3 4] [test] [test2]
For some reason the second 'token' in the quoted statement gets overridden.
Here's a snippet of the code that runs over the token list and converts it to the new one.
for (int i = 0; i < counter; i++) {
    if ( (strstr(tokenized[i],"\"") != NULL) && (inQuotes == 0)) {
        inQuotes = 1;
        tokenizedQuoted[quoteCounter] = tokenized[i];
        strcat(tokenizedQuoted[quoteCounter]," ");
    } else if ( (strstr(tokenized[i],"\"") != NULL) && (inQuotes == 1)) {
        inQuotes = 0;
        strcat(tokenizedQuoted[quoteCounter],tokenized[i]);
        quoteCounter++;
    } else {
        if (inQuotes == 0) {
            tokenizedQuoted[quoteCounter] = tokenized[i];
            quoteCounter++;
        } else if (inQuotes == 1) {
            strcat(tokenizedQuoted[quoteCounter], tokenized[i]);
            strcat(tokenizedQuoted[quoteCounter], " ");
        }
    }

}

In short, appending a space to the string a char * points to means that the memory it points to needs more bytes. Since you do not provide them, you are overwriting the first byte of the following "word" with \0, so the char * to it is interpreted as the empty string. Note that writing to a location that has not been reserved is undefined behavior, so really ANYTHING could happen (from a segmentation fault to "correct" results with no errors).
Use malloc to create a new buffer for the expanded result with enough bytes for it (do not forget to free the old buffers if they were malloc'd).
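A minimal sketch of that approach, assuming the tokens are NUL-terminated strings held in an array as in the question. The helper name join_tokens is hypothetical and not part of the original code. It measures the pieces first, allocates one buffer large enough for all of them plus the separating spaces and the terminator, and leaves the input tokens untouched.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: joins tokens[first..last] (inclusive) with single
 * spaces into a freshly malloc'd string. The caller must free() the result.
 * Returns NULL if the range is invalid or the allocation fails. */
char *join_tokens(char *const tokens[], size_t first, size_t last)
{
    if (first > last)
        return NULL;

    /* One byte per separating space, plus one for the terminating '\0'. */
    size_t total = (last - first) + 1;
    for (size_t i = first; i <= last; i++)
        total += strlen(tokens[i]);

    char *joined = malloc(total);
    if (joined == NULL)
        return NULL;

    joined[0] = '\0';
    for (size_t i = first; i <= last; i++) {
        strcat(joined, tokens[i]);   /* safe: total was sized for every piece */
        if (i < last)
            strcat(joined, " ");
    }
    return joined;
}
```

For the example in the question, join_tokens(tokenized, 1, 5) would produce "Hello 1 2 3 4" with the quote characters still attached; removing them is a separate step.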

Related

Reading data from a text file line by line into arrays using strtok in C

I'm currently trying to read data from a text file line by line, using strtok with a space as the delimiter, and save the info into different arrays. I'm using the FatFs library to read the file from an SD card. At the moment I'm only trying to read the first 2 elements from each line.
My text file looks like this:
223 895 200 200 87 700 700 700
222 895 200 200 87 700 700 700
221 895 200 200 87 700 700 700
222 895 200 200 87 700 700 700
My current code is something like this:
void sd_card_read()
{
    char buffer[30];
    char buffer2[10];
    char buffer3[10];
    int i=0;
    int k=0;
    int l=0;
    int16 temp_array[500];
    int16 hum_array[500];
    char *p;
    FIL fileO;
    uint8 resultF;
    resultF = f_open(&fileO, "dados.txt", FA_READ);
    if(resultF == FR_OK)
    {
        UART_UartPutString("Reading...");
        UART_UartPutString("\n\r");
        while(f_gets(buffer, sizeof(buffer), &fileO))
        {
            p = strtok(buffer, " ");
            temp_array[i] = atoi(p);
            UART_UartPutString(p);
            UART_UartPutString("\r\n");
            p = strtok(NULL, " ");
            hum_array[i] = atoi(p);
            UART_UartPutString(p);
            UART_UartPutString("\r\n");
            i++;
        }
        UART_UartPutString("Done reading");
        resultF = f_close(&fileO);
    }
    UART_UartPutString("Printing");
    UART_UartPutString("\r\n");
    for (k = 0; k < 10; k++)
    {
        itoa(temp_array[k], buffer2, 10);
        UART_UartPutString(buffer2);
        UART_UartPutString("\r\n");
    }
    for (l = 0; l < 10; l++)
    {
        itoa(hum_array[l], buffer3, 10);
        UART_UartPutString(buffer3);
        UART_UartPutString("\r\n");
    }
}
The output atm is this:
223
0
222
0
etc..
895
0
895
0
etc..
After each successful read, the next position in both arrays is filled with the value 0, which is not what I want. It's probably something basic but I can't see what is wrong.
Any help is valuable!
If we take the first line of the file
223 895 200 200 87 700 700 700
That line is, including spaces and the newline (assuming a single '\n'), 31 characters long. And since strings in C need to be terminated by '\0', the line requires at least 32 characters (if f_gets works like the standard fgets function and stores the newline).
The buffer you read into only fits 30 characters, which means only 29 characters of your line are read before the terminator is added. So you only read
223 895 200 200 87 700 700 70
The next time you call f_gets the function will read the remaining
0
You need to increase the size of the buffer to be able to fit all of the line. With the current data it needs to be at least 32 characters. But be careful since an extra character in one of the lines will give you the same problem again.
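A rough sketch of how the reading loop could guard against both problems, assuming (as above) that f_gets behaves like the standard fgets. The helper names line_is_complete and parse_first_two are hypothetical. The code uses only standard C so it can be tried off-target.

```c
#include <stdlib.h>
#include <string.h>

/* Returns 1 if buf holds a whole line (it ends in '\n'), 0 if the line was
 * cut off because the buffer was too small (or the input lacks a final
 * newline). */
int line_is_complete(const char *buf)
{
    size_t len = strlen(buf);
    return len > 0 && buf[len - 1] == '\n';
}

/* Hypothetical helper: parses the first two space-separated integers of a
 * line. Modifies line (strtok writes '\0' into it). Returns the number of
 * values stored (0, 1 or 2), so a missing token is never passed to atoi. */
int parse_first_two(char *line, int *first, int *second)
{
    char *p = strtok(line, " \r\n");
    if (p == NULL)
        return 0;
    *first = atoi(p);

    p = strtok(NULL, " \r\n");
    if (p == NULL)
        return 1;
    *second = atoi(p);
    return 2;
}
```

With something like char buffer[64]; in sd_card_read, a line for which line_is_complete returns 0, or for which parse_first_two returns less than 2, can be skipped or reported instead of silently storing a 0.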

strstr() function returns the address 0x0000

I'm trying to check whether (and where) a substring ("DATA") is located in a big string (held in a buffer, linearBuffer) with the strstr() function, but it doesn't seem to work and I don't know why, even though my source string (in linearBuffer) is null terminated.
What really happens is that a ring buffer (buf) fills with characters on every USART interrupt. Then, at some point in the code, its content is copied into a linear buffer (through ringBuff_to_linearBuff()) and I apply strstr() to it in order to find the wanted substring. The value I get when strstr() returns is 244 and not the location of the substring, even though I know it's there from setting a breakpoint.
** Note that my code is spread on many files so I tried to gather all question related code together.
#include <string.h>
#define BUFFER_SIZE 400
#define LINEAR_BUFFER_SIZE (BUFFER_SIZE+1)
#define WIFI_CMD_DATA "DATA"
typedef RingBuff_Data_t uint8_t;
typedef struct
{
    RingBuff_Data_t Buffer[BUFFER_SIZE]; /**< Internal ring buffer data, referenced by the buffer pointers. */
    RingBuff_Data_t* In; /**< Current storage location in the circular buffer */
    RingBuff_Data_t* Out; /**< Current retrieval location in the circular buffer */
} RingBuff_t;
volatile RingBuff_t buf;
uint8_t linearBuffer[LINEAR_BUFFER_SIZE]="";
static inline void RingBuffer_Insert(RingBuff_t* const Buffer, const RingBuff_Data_t Data)
{
    *Buffer->In = Data;
    if (++Buffer->In == &Buffer->Buffer[BUFFER_SIZE])
        Buffer->In = Buffer->Buffer;
    ATOMIC_BLOCK(ATOMIC_RESTORESTATE)
    {
        Buffer->Count++;
    }
}
ISR(USART1_RX_vect)
{
    //code to be executed when the rx pin of the USART receives a char
    uint8_t c = UDR_N;
    if (c != '\n')
        RingBuffer_Insert(&buf,c);
    else
        RingBuffer_Insert(&buf,'\0');
}
void ringBuff_to_linearBuff(uint8_t linearBuffer[])
{
    memset(linearBuffer,0,LINEAR_BUFFER_SIZE);
    RingBuff_Data_t* tempIn = buf.In;
    if (buf.Out < tempIn){
        memcpy(linearBuffer, buf.Out, tempIn - buf.Out);
    }
    else if (buf.Out > tempIn){
        size_t s1 = buf.Buffer + BUFFER_SIZE - buf.Out;
        size_t s2 = buf.In - buf.Buffer;
        memcpy(linearBuffer, buf.Out, s1);
        memcpy(linearBuffer + s1, buf.Buffer, s2);
    }
}
void main ()
{
    uint8_t* linearBufferp;
    while (1)
    {
        if (buf.Out != buf.In)
        {
            ringBuff_to_linearBuff(linearBuffer);
            linearBufferp = strstr(linearBuffer, WIFI_CMD_DATA); // Checking if a new DATA msg from a client had arrived
            if (linearBufferp != NULL)
            {
                //do something
            }
        }
    }
}
debugging
When strstr returns NULL it means that it didn't find the substring (and when that happens, the value it points to has no meaning at all, so forget about the 244).
So your question should be:
Why doesn't strstr find my substring?
Looking at the debug picture you posted, your linearBuffer contains:
13
0
13
0
43
....
....
68 <---- This is what you want to find
65
....
However, there are multiple strings in your buffer:
13 <----- Start of first string
0 <----- End of first string
13 <----- Start of second string
0 <----- End of second string
43 <----- Start of third string
....
....
68 <---- This is what you want to find
65
....
strstr will only search the first string. When strstr sees the first 0 (index [1]), it returns NULL because it didn't find what it was looking for.
In other words - strstr never looks at the part of the buffer where the match is. It returns long before that.
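A small demonstration of that behaviour, as a sketch. The buffer starts with the same 13 0 13 0 bytes as the dump above, while the bytes after them are made up for the demo and are not the real message.

```c
#include <string.h>

/* Illustrative buffer: 13 0 13 0 followed by an invented message that
 * contains "DATA". Only the first four bytes come from the debug dump. */
static const char demo_buf[] = { 13, 0, 13, 0, '+', 'x', 'D', 'A', 'T', 'A', 0 };

/* strstr stops at the first 0 (index 1), so it only ever sees "\r". */
const char *search_from_start(void)
{
    return strstr(demo_buf, "DATA");
}

/* Starting after the two "13 0" pairs, the same call finds the match. */
const char *search_after_prefix(void)
{
    return strstr(demo_buf + 4, "DATA");
}
```

Here search_from_start() returns NULL, while search_after_prefix() returns a pointer to the 'D' at index 6.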
So what's wrong with your code?
It is hard to say since you haven't posted a complete code base. So this is a guess. I think you receive a number of "newlines" in the form:
13 10 13 10
before the message. So you receive:
13 10 13 10 43 ...... 68 65 .....
Your ISR turns the 10 into 0 so the buffer becomes
13 0 13 0 43 ...... 68 65 .....
which is 3 strings instead of 1 string.
What to do?
Well, there could be several different solutions. The correct one depends on your system requirements. A simple solution would be to skip the extra 13 0 before calling strstr. Something like:
ringBuff_to_linearBuff(linearBuffer);
// Skip leading "13 0" pairs. linearBuffer is an array and cannot be
// incremented itself, so walk a separate pointer instead.
uint8_t *start = linearBuffer;
while (*start == 13 && *(start+1) == 0)
{
    start += 2;
}
linearBufferp = (uint8_t *)strstr((char *)start, WIFI_CMD_DATA);
Note: You should also add a range check so that the pointer isn't advanced so far that you read out of bounds.
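One way to get that range check without hand-counting skipped bytes is a length-bounded search that treats 0 bytes as ordinary data. This swaps in a different technique from the skip loop, and the helper name find_in_buffer is hypothetical. A minimal sketch:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: looks for the NUL-terminated string needle anywhere
 * in the first len bytes of buf, treating embedded 0 bytes as ordinary data.
 * Returns a pointer to the first match, or NULL if there is none. */
const unsigned char *find_in_buffer(const unsigned char *buf, size_t len,
                                    const char *needle)
{
    size_t nlen = strlen(needle);
    if (nlen == 0 || nlen > len)
        return NULL;

    /* The last valid start is len - nlen, so no read goes past buf + len. */
    for (size_t i = 0; i + nlen <= len; i++) {
        if (memcmp(buf + i, needle, nlen) == 0)
            return buf + i;
    }
    return NULL;
}
```

In the question's code this would be called as find_in_buffer(linearBuffer, LINEAR_BUFFER_SIZE, WIFI_CMD_DATA). Whether scanning past the terminators is acceptable depends on how the rest of the protocol uses them.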

Having issues iterating through machine code

I'm attempting to recreate the wc command in c and having issues getting the proper number of words in any file containing machine code (core files or compiled c). The number of logged words always comes up around 90% short of the amount returned by wc.
For reference here is the project info
Compile statement
gcc -ggdb wordCount.c -o wordCount -std=c99
wordCount.c
/*
 * Author(s) - Colin McGrath
 * Description - Lab 3 - WC LINUX
 * Date - January 28, 2015
 */
#include<stdio.h>
#include<string.h>
#include<dirent.h>
#include<sys/stat.h>
#include<ctype.h>
struct counterStruct {
    int newlines;
    int words;
    int bt;
};
typedef struct counterStruct ct;
ct totals = {0};
struct stat st;
void wc(ct counter, char *arg)
{
    printf("%6lu %6lu %6lu %s\n", counter.newlines, counter.words, counter.bt, arg);
}
void process(char *arg)
{
    lstat(arg, &st);
    if (S_ISDIR(st.st_mode))
    {
        char message[4056] = "wc: ";
        strcat(message, arg);
        strcat(message, ": Is a directory\n");
        printf(message);
        ct counter = {0};
        wc(counter, arg);
    }
    else if (S_ISREG(st.st_mode))
    {
        FILE *file;
        file = fopen(arg, "r");
        ct currentCount = {0};
        if (file != NULL)
        {
            char holder[65536];
            while (fgets(holder, 65536, file) != NULL)
            {
                totals.newlines++;
                currentCount.newlines++;
                int c = 0;
                for (int i=0; i<strlen(holder); i++)
                {
                    if (isspace(holder[i]))
                    {
                        if (c != 0)
                        {
                            totals.words++;
                            currentCount.words++;
                            c = 0;
                        }
                    }
                    else
                        c = 1;
                }
            }
        }
        currentCount.bt = st.st_size;
        totals.bt = totals.bt + st.st_size;
        wc(currentCount, arg);
    }
}
int main(int argc, char *argv[])
{
    if (argc > 1)
    {
        for (int i=1; i<argc; i++)
        {
            //printf("%s\n", argv[i]);
            process(argv[i]);
        }
    }
    wc(totals, "total");
    return 0;
}
Sample wc output:
135 742 360448 /home/cpmcgrat/53/labs/lab-2/core.22321
231 1189 192512 /home/cpmcgrat/53/labs/lab-2/core.26554
5372 40960 365441 /home/cpmcgrat/53/labs/lab-2/file
24 224 12494 /home/cpmcgrat/53/labs/lab-2/frequency
45 116 869 /home/cpmcgrat/53/labs/lab-2/frequency.c
5372 40960 365441 /home/cpmcgrat/53/labs/lab-2/lineIn
12 50 1013 /home/cpmcgrat/53/labs/lab-2/lineIn2
0 0 0 /home/cpmcgrat/53/labs/lab-2/lineOut
39 247 11225 /home/cpmcgrat/53/labs/lab-2/parseURL
138 318 2151 /home/cpmcgrat/53/labs/lab-2/parseURL.c
41 230 10942 /home/cpmcgrat/53/labs/lab-2/roman
66 162 1164 /home/cpmcgrat/53/labs/lab-2/roman.c
13 13 83 /home/cpmcgrat/53/labs/lab-2/romanIn
13 39 169 /home/cpmcgrat/53/labs/lab-2/romanOut
7 6 287 /home/cpmcgrat/53/labs/lab-2/URLs
11508 85256 1324239 total
Sample rebuild output (./wordCount):
139 76 360448 /home/cpmcgrat/53/labs/lab-2/core.22321
233 493 192512 /home/cpmcgrat/53/labs/lab-2/core.26554
5372 40960 365441 /home/cpmcgrat/53/labs/lab-2/file
25 3 12494 /home/cpmcgrat/53/labs/lab-2/frequency
45 116 869 /home/cpmcgrat/53/labs/lab-2/frequency.c
5372 40960 365441 /home/cpmcgrat/53/labs/lab-2/lineIn
12 50 1013 /home/cpmcgrat/53/labs/lab-2/lineIn2
0 0 0 /home/cpmcgrat/53/labs/lab-2/lineOut
40 6 11225 /home/cpmcgrat/53/labs/lab-2/parseURL
138 318 2151 /home/cpmcgrat/53/labs/lab-2/parseURL.c
42 3 10942 /home/cpmcgrat/53/labs/lab-2/roman
66 162 1164 /home/cpmcgrat/53/labs/lab-2/roman.c
13 13 83 /home/cpmcgrat/53/labs/lab-2/romanIn
13 39 169 /home/cpmcgrat/53/labs/lab-2/romanOut
7 6 287 /home/cpmcgrat/53/labs/lab-2/URLs
11517 83205 1324239 total
Notice the difference in the word count (second int) from the first two files (core files) as well as the roman file and parseURL files (machine code, no extension).
C strings do not store their length. They are terminated by a single NUL (0) byte.
Consequently, strlen needs to scan the entire string, character by character, until it reaches the NUL. That makes this:
for (int i=0; i<strlen(holder); i++)
desperately inefficient: for every character in holder, it needs to count all the characters in holder in order to test whether i is still in range. That transforms a simple linear Θ(N) algorithm into a Θ(N²) cycle-burner.
But in this case, it also produces the wrong result, since binary files typically include lots of NUL characters. Since strlen will actually tell you where the first NUL is, rather than how long the "line" is, you'll end up skipping a lot of bytes in the file. (On the bright side, that makes the scan quadratically faster, but computing the wrong result more rapidly is not really a win.)
You cannot use fgets to read binary files because the fgets interface doesn't tell you how much it read. You can use the POSIX 2008 getline interface instead, or you can do binary input with fread, which is more efficient but will force you to count newlines yourself. (Not the worst thing in the world; you seem to be getting that count wrong, too.)
Or, of course, you could read the file one character at a time with fgetc. For a school exercise, that's not a bad solution; the resulting code is easy to write and understand, and typical implementations of fgetc are more efficient than the FUD would indicate.
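A minimal sketch of the fgetc variant, assuming a word is any run of non-whitespace bytes. The struct and function names are made up for this example and do not match the lab code.

```c
#include <ctype.h>
#include <stdio.h>

struct counts {
    unsigned long newlines;
    unsigned long words;
    unsigned long bytes;
};

/* Counts newlines, words and bytes from an open stream one character at a
 * time. fgetc returns each byte as an int, so embedded NUL bytes are just
 * non-space characters here and do not cut the scan short. */
struct counts count_stream(FILE *fp)
{
    struct counts c = {0, 0, 0};
    int in_word = 0;
    int ch;

    while ((ch = fgetc(fp)) != EOF) {
        c.bytes++;
        if (ch == '\n')
            c.newlines++;
        if (isspace(ch)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;   /* first character of a new word */
            c.words++;
        }
    }
    return c;
}
```

Counting a word when it starts, rather than when the following space is seen, also picks up a final word that is not followed by whitespace. How non-text bytes are classified as word characters is implementation-dependent, so small differences from the system wc remain possible on binary files.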

C reading file using ./a.out<filename and how to stop reading

In my class today we were assigned a project that involves reading in a file using the ./a.out"<"filename command. The contents of the file look like this
16915 46.25 32 32
10492 34.05 56 52
10027 98.53 94 44
13926 32.94 19 65
15736 87.67 5 1
16429 31.00 58 25
15123 49.93 65 38
19802 37.89 10 20
-1
but larger
My issue is that any scanf used afterwards is completely ignored and just scans in what looks like garbage when printed out, rather than taking in user input. In my actual program this is causing an issue with a menu that requires input.
How do I get the program to stop reading the file provided by the ./a.out"<"filename command?
also I stop searching at -1 rather than EOF for the sake of not having an extra set of array data starting with -1
ex
-1 0 0 0
in my real program the class size is a constant that is adjustable and is used to calculate class averages, I'd rather not have a set of 0's skewing that data.
#include <stdio.h>
int main(void)
{
    int i = 0,j = 1,d,euid[200],num;
    int tester = 0;
    float hw[200],ex1[200],ex2[200];
    while(j)
    {
        scanf("%d",&tester);
        if( tester == -1)
        {
            j = 0;
        }
        else
        {
            euid[i] = tester;
        }
        scanf("%f",hw+i);
        scanf("%f",ex1+i);
        scanf("%f",ex2+i);
        i++;
    }
    for(d = 0;d < 50;d++) /*50 because the actual file size contains much more than example*/
    {
        printf("euid = %d\n",euid[d]);
        printf("hw = %f\n",hw[d]);
        printf("ex1 = %f\n",ex1[d]);
        printf("ex2 = %f\n",ex2[d]);
    }
    printf("input something user\n");
    scanf("%d",&num);
    printf("This is what is being printed out -> %d\n",num);
    return 0;
}
I'm having the exact same problem. Tried every method I could find to eat the remaining input in the buffer, but it never ends.
Got it to work using fopen and fscanf, but the prof. said he prefers the code using a.out < filename
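Turns out this is in fact not possible: with ./a.out < filename, stdin is the file rather than the keyboard, so once the file has been consumed every later scanf on stdin fails with EOF instead of waiting for user input.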
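The garbage values themselves come from ignoring scanf's return value. A sketch of a reading loop that checks it, assuming the record layout from the question; the function name read_records and the MAX_RECORDS constant are hypothetical:

```c
#include <stdio.h>

#define MAX_RECORDS 200

/* Reads "euid hw ex1 ex2" records from in until the sentinel -1, end of
 * input, a malformed record, or MAX_RECORDS entries. Returns the number of
 * complete records stored; nothing is stored for the sentinel itself. */
size_t read_records(FILE *in, int euid[], float hw[], float ex1[], float ex2[])
{
    size_t n = 0;
    while (n < MAX_RECORDS) {
        int id;
        if (fscanf(in, "%d", &id) != 1 || id == -1)
            break;                       /* EOF, bad input, or sentinel */
        if (fscanf(in, "%f %f %f", &hw[n], &ex1[n], &ex2[n]) != 3)
            break;                       /* incomplete record */
        euid[n] = id;
        n++;
    }
    return n;
}
```

Called as read_records(stdin, euid, hw, ex1, ex2), this stops cleanly at -1 and reports how many records are valid, so the printing loop can run up to that count instead of a fixed 50. On POSIX systems a program can still reach the keyboard by opening the controlling terminal, for example with fopen("/dev/tty", "r"), but that is platform-specific and may not be what the assignment intends.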

strtok() appends some character to my string

I'm using strtok() to parse a string I get from fgets() that is separated by the ~ character
e.g. data_1~data_2
Here's a sample of my code:
fgets(buff, LINELEN, stdin);
pch = strtok(buff, " ~\n");
//do stuff
pch = strtok(NULL, " ~\n");
//do stuff
The first instance of strtok breaks it apart fine, I get data_1 as is, and strlen(data_1) provides the correct length of it. However, the second instance of strtok returns the string, with something appended to it.
With an input of andrewjohn ~ jamessmith, I printed out each character and the index, and I get this output:
a0
n1
d2
r3
e4
w5
j6
o7
h8
n9
j0
a1
m2
e3
s4
s5
m6
i7
t8
h9
10
What is that "11th" value corresponding to?
EDIT:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
    char buff[100];
    char * pch;
    int i;
    fgets(buff, 100, stdin);
    pch = strtok(buff, " ~\n");
    printf("FIRST NAME\n");
    for(i = 0; i < strlen(pch); i++)
    {
        printf("%c %d %d\n", *(pch+i), *(pch+i), i);
    }
    printf("SECOND NAME\n");
    pch = strtok(NULL, " ~\n");
    for(i = 0; i < strlen(pch); i++)
    {
        printf("%c %d %d\n", *(pch+i), *(pch+i), i);
    }
}
I ran it by:
cat sample.in | ./myfile
Where sample.in had
andrewjohn ~ johnsmith
Output was:
FIRST NAME
a 97 0
n 110 1
d 100 2
r 114 3
e 101 4
w 119 5
j 106 6
o 111 7
h 104 8
n 110 9
SECOND NAME
j 106 0
o 111 1
h 104 2
n 110 3
s 115 4
m 109 5
i 105 6
t 116 7
h 104 8
13 9
So the last character is ASCII value 13, which says it's a carriage return ('\r'). Why is this coming up?
Based on your edit, the input line ends in \r\n. As a workaround you could just add \r to your list of delimiters in strtok.
However, this should be investigated further. \r\n is the line ending in a Windows file, but stdin is a text stream, so on Windows \r\n in a file would be converted to just \n in the fgets result.
Are you perhaps piping in a file that contains something weird like \r\r\n ? Try hex-dumping the file you're piping in to check this.
Another possible explanation might be that your Cygwin (or whatever) environment has somehow been configured not to translate line endings in a file piped in.
edit: Joachim's suggestion is much more likely: using a \r\n file on a non-Windows system. If this is the case, you can fix it by running dos2unix on the file. But in accordance with the principle "accept everything, generate correctly" it would be useful for your program to handle this file.
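A small sketch of one way to make the program tolerate both line endings: cut the line at the first \r or \n right after fgets, before tokenizing. The function name strip_line_ending is hypothetical.

```c
#include <string.h>

/* Cuts the string at the first '\r' or '\n', so "abc\r\n", "abc\n" and
 * "abc" all become "abc". strcspn returns the length of the leading part
 * that contains none of the listed characters. */
void strip_line_ending(char *line)
{
    line[strcspn(line, "\r\n")] = '\0';
}
```

After fgets(buff, 100, stdin); strip_line_ending(buff); the delimiter string passed to strtok only needs " ~". Simply using " ~\r\n" as the delimiter list, as suggested above, has the same effect for this input.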
