strtok disappearing when returning -1 - c

So I'm writing code to put strings into arrays and it's working perfectly, however I want it to terminate the reading of the strings when I hit a ## in the file. I'm running a loop and parsing the strings line by line. Within my string parser I put a loop to check for the ##. It's at the very end of my parser function and it goes:
for (i = 0; i < strlen(line)); i++)
{
if ((buffer[i] == '#') && (buffer[i+1] == '#'))
{
return -1;
}
}
The problem is that when it hits the line with the ## at the end it doesn't parse the string into my array. It seems like it's just ignoring the code before this loop.
As additional information I'm using strtok to put the tokens in positions in my char* array before this for loop.
EDIT: Here's my parseString function:
int parseString(char* line, char*** inString)
{
char* buffer;
int Token, i;
buffer = (char*) malloc(strlen(line) * sizeof(char));
strcpy(buffer,line);
(*inString) = (char**) malloc(MAX_TOKS * sizeof(char**));
Token = 0;
(*inString)[Token++] = strtok(buffer, DELIMITERS);
while ((((*inString)[token] = strtok(NULL, DELIMITERS)) != NULL) && (Token < MAX_TOKS))
Token++;
for(i=0; i<strlen(line); i++)
{
if ((buffer[i] == '#') && (buffer[i+1] == '#'))
{
return -1;
}
}
return Token;
}

First of all, you are reading out of bounds on an array, because array[-1] is not good. Secondly, use a variable to hold the string length, as the way you do it causes the for loop to re-evaluate strlen(line) for each iteration.
Now, for your problem, it seems like you're putting it before the code that adds it to an array. If you could give us a bit more code, that would help.

Insufficient buffer allocation
// buffer = (char*) malloc(strlen(line) * sizeof(char));
buffer = malloc(strlen(line) + 1); // +1 for the \0
strcpy(buffer,line);
Memory Leak
The allocated 'buffer' may be lost. The *inString array may have a pointer to the beginning of 'buffer', allowing it to be freed in the calling routine, but that is iffy. Suggest using the first element of *inString to save that buffer explicitly.
Algorithm hole
(*inString)[token-1] == NULL should be asserted before for().
O(n*n) via strlen()
Suggestion:
// for(i=0; i<strlen(line); i++)
int length = strlen(line); // `length` should be used in `malloc()` too.
for(i=0; i<length; i++)
OP's early edit approach was almost OK
Just needed to start indexing at 1, rather than 0. There are only length-1 adjacent pairs to test, not length. So (i = 1; i<length; i++) or (i = 0; i<length-1; i++).
// for (i = 0; i < strlen(line)); i++) {
int length = strlen(line);
for (i = 1; i<length; i++) { // start at 1
if ((buffer[i-1] == '#') && (buffer[i] == '#')) {
return -1;
}
}
For better assistance, recommend OP provide sample line, line with the ## at the end, MAX_TOKS and DELIMITERS.
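Putting the points above together, a minimal sketch might look like the following. It assumes the caller still wants the tokens from the line that contains ##, so the marker is reported through a separate flag instead of replacing the token count. The names parse_line, storage and saw_marker, and the values of MAX_TOKS and DELIMITERS, are assumptions, since the question does not show them. The marker is searched for in line rather than in the copy, because strtok() overwrites delimiters in the copy with '\0'.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_TOKS 16          /* assumed value; not shown in the question */
#define DELIMITERS " \t\n"   /* assumed value; not shown in the question */

/* Returns the number of tokens stored, or -1 if allocation fails.
 * *saw_marker is set to 1 when the line contains "##", else 0.
 * On success the caller frees both *tokens and *storage. */
int parse_line(const char *line, char ***tokens, char **storage, int *saw_marker)
{
    size_t length = strlen(line);
    int count = 0;
    char *tok;

    /* Search the untouched input, not the buffer strtok() will modify. */
    *saw_marker = (strstr(line, "##") != NULL);

    *storage = malloc(length + 1);                 /* +1 for the '\0' */
    *tokens = malloc(MAX_TOKS * sizeof **tokens);  /* MAX_TOKS char pointers */
    if (*storage == NULL || *tokens == NULL) {
        free(*storage);
        free(*tokens);
        *storage = NULL;
        *tokens = NULL;
        return -1;
    }
    memcpy(*storage, line, length + 1);

    /* Check the bound before storing, so the array is never overrun. */
    for (tok = strtok(*storage, DELIMITERS);
         tok != NULL && count < MAX_TOKS;
         tok = strtok(NULL, DELIMITERS))
        (*tokens)[count++] = tok;

    return count;
}
```

The reading loop can then stop once saw_marker is set, after it has used the tokens from that last line.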

Related

Can someone please explain why I'm getting a seg fault error

This code compiles successfully but when I debug it shows a SIGSEGV seg fault error. Can someone help please?
char *_strdup(char *str)
{
int i, size = 0;
char *mp;
if (str == NULL)
{
return (NULL);
}
for (; str[size] != '0'; size++)
mp = malloc(size * sizeof(str) + 1);
/* + 1 to get last part of the str */
if (mp == 0)
{
return (NULL);
}
else
{
for (; i < size; i++)
{
mp[i] = str[i];
}
}
return (mp);
}
First, just because it compiles successfully, this does not mean that your code is correct. It just means that syntactically the compiler is fine. I hope you use the maximum warning level and correct your code until all warnings and errors are gone.
You have multiple problems:
You seem to be looking for the terminating end-of-string marker, but instead of the correct '\0' you typed '0'. The loop therefore runs until it happens to find a zero digit, which can yield a size that is far too large. Depending on your system, a segmentation fault is also possible.
sizeof is an operator that yields the size of its argument, in your case the size of a pointer. str is of type char *. Effectively you allocate too much, but this is harmless.
The for loop uses the memory allocation as its body. I'm sure you didn't mean this, but there is no empty statement. So you are allocating multiple memory spaces, which are leaks in the end.
An empty statement is a single semicolon or an empty pair of curly braces.
What you most probably want to achieve is to find the number of characters that str points to. You can get it by calling strlen(str).
i is not initialized, it can have any value. This can lead to a segmentation fault, if it starts with a negative value.
You did not add the end-of-string marker in the duplicate. Depending on the other code we don't see, this can lead to segmentation faults.
This is a possible solution without calling strlen():
char *_strdup(const char *str)
{
int i;
int size;
char *mp;
if (str == NULL)
{
return NULL;
}
for (size = 0; str[size] != '\0'; size++)
{
/* just looking for the end of the string */
}
size++;
/* + 1 for the end-of-string marker */
mp = malloc(size);
if (mp == NULL)
{
return NULL;
}
for (i = 0; i < size; i++)
{
mp[i] = str[i];
}
return mp;
}
I made a bit more:
Use separate variable definitions, it avoids errors and eases maintenance.
return is not a function and needs no parentheses for its expression.
Put the initialization of the index variable where it belongs, in the initializing statement of for. This way everything about this index is at one place.
Consider the end-of-string marker by incrementing size. This eases the following code.
Since sizeof (char) is 1, it can be omitted from the calculation of the needed memory size.
Compare mp with NULL instead of 0. It is a pointer, and this is C, not C++.
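For comparison, here is a sketch of the variant that does call strlen(), as mentioned earlier. The name dup_string is hypothetical; it is used only because names starting with an underscore, such as _strdup, are reserved at file scope.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of str, or NULL if str is NULL or
 * the allocation fails. The caller frees the result. */
char *dup_string(const char *str)
{
    size_t size;
    char *copy;

    if (str == NULL)
        return NULL;

    size = strlen(str) + 1;      /* +1 for the end-of-string marker */
    copy = malloc(size);
    if (copy == NULL)
        return NULL;

    memcpy(copy, str, size);     /* copies the marker as well */
    return copy;
}
```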
Your variable i has been declared but not initialized so a random number is used in your for(; i < size;
Just add int i = 0, size = 0; at the beginning or change your for statement to for(i = 0; i < size; i++)
This was the reason for your segmentation fault. Some other issues:
As mentioned in comments string termination character is not '0'. It's either 0 or '\0'.
You are calling malloc on each iteration of your for statement. This causes a memory leak. Just call it once after you have got your string size right. This is fixed by putting a semicolon after the for.
Maybe something like this.
char *_strdup (char *str)
{
int i, size;
char *mp;
if (str == NULL)
{
return (NULL);
}
for (size = 0; str[size] != 0; size++);
mp = malloc (size * sizeof (str) + 1);
/* + 1 to get last part of the str */
if (mp == 0)
{
return (NULL);
}
else
{
for (i = 0; i < size; i++)
{
mp[i] = str[i];
}
mp[size] = '\0'; /* terminate the copy */
}
return (mp);
}

buffer overrun while trying to link two strings together, why do I have this error?

(in C, using visual studio 2022 preview), I have to do a program that link two strings together. Here's what I did:
I wrote two for-loops to count characters of first string and second string,
I checked (inside the link function if the pointers are null (first and second). If they are null, then "return NULL".
I created "char *result". this is a new string and this is the string to be returned. I allocated enough memory to store nprime, nsecond, and 1 more character (the zero terminator). I used a malloc.
then, I checked if result is null. if it's null then "return NULL".
then, I wrote 2 for-loops to perform the linking between the first string and the second string. And here I got a compiler warning (because I think it's in compile time not in debug time): buffer overrun, the writable size is "nprime+nsecond+1" but 2 bytes might be written.
my theory is that the program is trying to write outside the result-array, so there could be a loss of data, I tried to edit my code, therefore I write "nprime+nsecond+2" instead but it doesn't work, and it keeps showing me the same buffer overrun error.
#include <stdlib.h>
char* link( const char* first, const char* second) {
size_t nprime = 0;
size_t nsecond = 0;
if (first == NULL) {
return NULL;
}
if (second == NULL) {
return NULL;
}
for (size_t i = 0; first[i] < '\0'; i++) {
nprime++;
}
for (size_t i = 0; second[i] < '\0'; i++) {
nsecond++;
}
char* result = malloc(nprime + nsecond + 1);
if (result == NULL) {
return NULL;
}
for (size_t i = 0; i < nprime; i++) {
result[i] = first[i];
}
for (size_t i = 0; i < nsecond; i++) {
result[nprime + i] = second[i];
}
result[nprime + nsecond] = 0;
return result;
}
this is the main:
int main(void) {
char s1[] = "this is a general string ";
char s2[] = "this is a general test.";
char* s;
s = link(s1, s2);
return 0;
}
The warning is given due to the wrong conditions you defined in the first 2 for loops. The right loops should be as follows:
for (size_t i = 0; first[i] != '\0'; i++) {
nprime++;
}
for (size_t i = 0; second[i] != '\0'; i++) {
nsecond++;
}
With the conditions you defined (i.e. first[i] < '\0') you are just counting how many chars in the given string have an ASCII code lower than the ASCII code of \0 and exit the loop as soon as you find a char not fulfilling such condition.
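Since '\0' has ASCII value 0 and the characters of ordinary ASCII text are all greater than 0, the condition is already false for the first character. Your nprime and nsecond are therefore never incremented, so malloc reserves a single byte and there is no room for the chars you actually need.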
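As a sketch of the whole function with the lengths counted correctly, the version below lets strlen() do the counting and memcpy() do the copying. The name link_strings is an assumption, chosen here because link is also the name of a POSIX function; the rest follows the question's code.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated string holding first followed by second,
 * or NULL if either argument is NULL or the allocation fails. */
char *link_strings(const char *first, const char *second)
{
    size_t nfirst;
    size_t nsecond;
    char *result;

    if (first == NULL || second == NULL)
        return NULL;

    nfirst = strlen(first);      /* counts up to, not including, '\0' */
    nsecond = strlen(second);

    result = malloc(nfirst + nsecond + 1);   /* +1 for the terminator */
    if (result == NULL)
        return NULL;

    memcpy(result, first, nfirst);
    memcpy(result + nfirst, second, nsecond + 1);  /* includes the '\0' */
    return result;
}
```

The caller owns the returned memory and should free() it when done, which the main in the question does not do yet.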

Odd behavior removing duplicate characters in a C string

I am using the following method in a program used for simple substitution-based encryption. This method is specifically used for removing duplicate characters in the encryption/decryption key.
The method is functional, as is the rest of the program, and it works for 99% of the keys I've tried. However, when I pass it the key "goodmorning" or any key consisting of the same letters in any order (e.g. "dggimnnooor"), it fails. Further, keys containing more characters than "goodmorning" work, as well as keys with less characters.
I ran the executable through lldb with the same arguments and it works. I've cloned my repository on a machine running CentOS, and it works as is.
But I get no warnings or errors on compile.
//setting the key in main method
char * key;
key = removeDuplicates(argv[2]);
//return 1 if char in word
int targetFound(char * charArr, int num, char target){
int found = 0;
if(strchr(charArr,target))
found = 1;
return found;
}
//remove duplicate chars
char * removeDuplicates(char * word){
char * result;
int len = strlen(word);
result = malloc (len * sizeof(char));
if (result == NULL)
errorHandler(2);
char ch;
int i;
int j;
for( i = 0, j = 0; i < len; i++){
ch = word[i];
if(!targetFound(result, i, ch)){
result[j] = ch;
j++;
}
}
return result;
}
Per request: if "feather" was passed in to this function the resulting string would be "feathr".
As R Sahu already said, you are not terminating your string with a NUL character. Now I'm not going to explain why you need to do this, but you always need to terminate your strings with a NUL character, which is '\0'. If you want to know why, head over here for a good explanation. However this is not the only problem with your code.
The main problem is that the function strchr that you are calling to find out if your result already contains some character expects you to pass a NUL terminated string, but your variable is not NUL terminated, because you keep appending characters to it.
To solve your problem, I would suggest you to use a map instead. Map all the characters you already used and if they aren't in the map add them both to the map and the result. This is simpler (no need to call strchr or any other function), faster (no need to scan all the string every time), and most importantly correct.
Here's a simple solution:
char *removeDuplicates(char *word){
char *result, *map, ch;
int i, j;
map = calloc(256, 1);
if (map == NULL)
// Maybe you want some other number here?
errorHandler(2);
// Add one char for the NUL terminator:
result = malloc(strlen(word) + 1);
if (result == NULL)
errorHandler(2);
for(i = 0, j = 0; word[i] != '\0'; i++) {
ch = word[i];
// Check if you already saw this character:
if(map[(unsigned char)ch] == 0) {
// If not, add it to the map (the cast keeps the index in 0..255):
map[(unsigned char)ch] = 1;
// And to your result string:
result[j] = ch;
j++;
}
}
// Correctly NUL terminate the new string;
result[j] = '\0';
// The map is no longer needed:
free(map);
return result;
}
Why does this work on other machines, but not on your machine?
You are a victim of undefined behavior. Different compilers on different systems treat undefined behavior differently. For example, with GCC strchr may simply keep searching through memory until it finds a '\0' character, and this is exactly what happens here: the search runs past the end of your string, because who knows where a '\0' could be in memory after it? This is both dangerous and incorrect, because the program is reading outside the memory reserved for it. On another system the search may happen to stop at the right place and give you a correct result. This however is not something to take for granted, and you should always avoid undefined behavior.
I see a couple of problems in your code:
You are not terminating the output with the null character.
You are not allocating enough memory to hold the null character when there are no duplicate characters in the input.
As a consequence, your program has undefined behavior.
Change
result = malloc (len * sizeof(char));
to
result = malloc (len+1); // No need for sizeof(char)
Add the following before the function returns.
result[j] = '\0';
The other problem, the main one, is that you are using strchr on result, which is not a null terminated string when you call targetFound. That also caused undefined behavior. You need to use:
char * removeDuplicates(char * word){
char * result;
int len = strlen(word);
result = malloc (len+1);
if (result == NULL)
{
errorHandler(2);
}
char ch;
int i;
int j;
// Make result an empty string.
result[0] = '\0';
for( i = 0, j = 0; i < len; i++){
ch = word[i];
if(!targetFound(result, i, ch)){
result[j] = ch;
j++;
// Null terminate again so that next call to targetFound()
// will work.
result[j] = '\0';
}
}
return result;
}
A second option is to not use strchr in targetFound. Use num instead and implement the equivalent functionality.
int targetFound(char * charArr, int num, char target)
{
for ( int i = 0; i < num; ++i )
{
if ( charArr[i] == target )
{
return 1;
}
}
return 0;
}
That will allow you to avoid assigning the null character to result so many times. You will need to null terminate result only at the end.
char * removeDuplicates(char * word){
char * result;
int len = strlen(word);
result = malloc (len+1);
if (result == NULL)
{
errorHandler(2);
}
char ch;
int i;
int j;
for( i = 0, j = 0; i < len; i++){
ch = word[i];
if(!targetFound(result, j, ch)){ // only the first j chars of result are valid
result[j] = ch;
j++;
}
}
result[j] = '\0';
return result;
}
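A self-contained sketch that combines the fixes above (room for the terminator, no search past the characters written so far, and a terminator at the end) could look like this. It uses a small lookup table on the stack instead of strchr; the name remove_duplicates and the choice to return NULL on allocation failure rather than call errorHandler are assumptions made for the example.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of word with repeated characters
 * dropped, or NULL if allocation fails. The caller frees the result. */
char *remove_duplicates(const char *word)
{
    unsigned char seen[UCHAR_MAX + 1] = {0};  /* one flag per char value */
    char *result = malloc(strlen(word) + 1);  /* worst case: no duplicates */
    size_t i;
    size_t j = 0;

    if (result == NULL)
        return NULL;

    for (i = 0; word[i] != '\0'; i++) {
        unsigned char ch = (unsigned char)word[i];  /* never a negative index */
        if (!seen[ch]) {
            seen[ch] = 1;
            result[j++] = word[i];
        }
    }
    result[j] = '\0';   /* make the result a proper string */
    return result;
}
```

With this version "feather" becomes "feathr", as requested in the question, and "goodmorning" becomes "godmrni".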

Parsing character array to words held in pointer array (C-programming)

I am trying to separate each word from a character array and put them into a pointer array, one word for each slot. Also, I am supposed to use isspace() to detect blanks. But if there is a better way, I am all ears. At the end of the code I want to print out the content of the parameter array.
Let's say the line is: "this is a sentence". What happens is that it prints out "sentence" (the last word in the line, and usually followed by some random character) 4 times (the number of words). Then I get "Segmentation fault (core dumped)".
Where am I going wrong?
int split_line(char line[120])
{
char *param[21]; // Here I want to put one word for each slot
char buffer[120]; // Word buffer
int i; // For characters in line
int j = 0; // For param words
int k = 0; // For buffer chars
for(i = 0; i < 120; i++)
{
if(line[i] == '\0')
break;
else if(!isspace(line[i]))
{
buffer[k] = line[i];
k++;
}
else if(isspace(line[i]))
{
buffer[k+1] = '\0';
param[j] = buffer; // Puts word into pointer array
j++;
k = 0;
}
else if(j == 21)
{
param[j] = NULL;
break;
}
}
i = 0;
while(param[i] != NULL)
{
printf("%s\n", param[i]);
i++;
}
return 0;
}
There are many little problems in this code :
param[j] = buffer; k = 0; : you rewrite at the beginning of buffer erasing previous words
if(!isspace(line[i])) ... else if(isspace(line[i])) ... else ... : isspace(line[i]) is either true or false, so you always take one of the first 2 choices and never the third.
if (line[i] == '\0') : you forget to terminate current word by a '\0'
if there are multiple white spaces, you currently (try to) add empty words in param
Here is a working version :
int split_line(char line[120])
{
char *param[21]; // Here I want to put one word for each slot
char buffer[120]; // Word buffer
int i; // For characters in line
int j = 0; // For param words
int k = 0; // For buffer chars
int inspace = 0;
param[j] = buffer;
for(i = 0; i < 120; i++) {
if(line[i] == '\0') {
param[j++][k] = '\0';
param[j] = NULL;
break;
}
else if(!isspace(line[i])) {
inspace = 0;
param[j][k++] = line[i];
}
else if (! inspace) {
inspace = 1;
param[j++][k] = '\0';
if(j == 20) { // param[20] is the last slot; keep it for NULL
param[j] = NULL;
break;
}
param[j] = &(param[j-1][k+1]);
k = 0;
}
}
i = 0;
while(param[i] != NULL)
{
printf("%s\n", param[i]);
i++;
}
return 0;
}
I only fixed the errors. I leave for you as an exercise the following improvements :
the split_line routine should not print itself but rather return an array of words - beware you cannot return an automatic array, but it would be another question
you should not have magic constants in your code (120), you should at least have a #define and use symbolic constants, or better accept a line of any size - here again it is not simple because you will have to malloc and free at appropriate places, and again would be a different question
Anyway good luck in learning that good old C :-)
This line does not seem right to me
param[j] = buffer;
because you keep assigning the same value buffer to different param[j] s .
I would suggest you copy all the chars from line[120] to buffer[120], then point each param[j] to the location buffer + Next_Word_Position.
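A rough sketch of that idea, using isspace() as the question requires: the buffer is split in place by overwriting whitespace with '\0', and each slot points at the start of a word inside the buffer. The names split_words, words and MAX_WORDS are made up for this example, and the caller is assumed to pass a writable copy of the line.

```c
#include <ctype.h>
#include <stddef.h>

#define MAX_WORDS 20   /* assumed limit: char *param[21] minus the NULL slot */

/* Splits buffer in place and stores a pointer to each word in words[],
 * followed by NULL. words must have room for MAX_WORDS + 1 pointers.
 * Returns the number of words found. */
int split_words(char *buffer, char *words[])
{
    int count = 0;
    char *p = buffer;

    while (*p != '\0' && count < MAX_WORDS) {
        while (isspace((unsigned char)*p))   /* skip runs of blanks */
            p++;
        if (*p == '\0')
            break;
        words[count++] = p;                  /* a word starts here */
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;
        if (*p != '\0')
            *p++ = '\0';                     /* end the word, move on */
    }
    words[count] = NULL;
    return count;
}
```

Because every pointer refers to a different position in the buffer, the words no longer overwrite each other, and repeated blanks do not produce empty words.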
You may want to look at strtok in string.h. It sounds like this is what you are looking for, as it will separate words/tokens based on the delimiter you choose. To separate by spaces, simply use:
dest = strtok(src, " ");
Where src is the source string and dest is the destination for the first token on the source string. Looping through until dest == NULL will give you all of the separated words, and all you have to do is change dest each time based on your pointer array. It is also nice to note that passing NULL for the src argument will continue parsing from where strtok left off, so after an initial strtok outside of your loop, just use src = NULL inside. I hope that helps. Good luck!
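The loop described above might be sketched like this. The function name tokenize and its parameters are only for illustration; note that strtok modifies the string it is given, so src must be writable.

```c
#include <string.h>

/* Fills tokens[] with pointers into src and returns how many were
 * stored; at most max_tokens are kept. */
int tokenize(char *src, char *tokens[], int max_tokens)
{
    int count = 0;
    char *tok = strtok(src, " ");       /* first call passes the string */

    while (tok != NULL && count < max_tokens) {
        tokens[count++] = tok;
        tok = strtok(NULL, " ");        /* later calls pass NULL */
    }
    return count;
}
```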

C - Array of Char Arrays

I'm trying to work with the example in the K and R book for this topic, but struggling.
I want an array of Char Arrays, whereby each element of the 'Father' Array points to an array of characters (string). Basically, I am reading from a file, line at a time, storing each line into an array, and then trying to store that array, into another array, which I can then sort via qsort.
But I can't seem to get anywhere with this! Any help on my code is much appreciated, i.e. where to go from where I am!
EDIT: The problem is that the printing function isn't printing out the words that should be within the array of arrays; instead it's just printing garbage. I'm not sure whether I am de-referencing things correctly, or at all, or whether I am adding the lines to the array of arrays correctly.
Regards.
#define MAXLINES 5000 /* max no. lines to be stored */
#define MAXLEN 1000 /* max length of single line */
char *lineptr[MAXLINES];
void writelines(char *lineptr[], int nlines);
int main(int argc, char *argv[]) {
int nlines = 0, i, j, k;
char line[MAXLEN];
FILE *fpIn;
fpIn = fopen(argv[1], "rb");
while((fgets(line, 65, fpIn)) != NULL) {
j = strlen(line);
if (j > 0 && (line[j-1] == '\n')) {
line[j-1] = '\0';
}
if (j > 8) {
lineptr[nlines++] = line;
}
}
for(i = 0; i < nlines; i++)
printf("%s\n", lineptr[i] );
return 0;
}
A problem is that line[MAXLEN] is an automatic variable, and so each time through the while loop it refers to the same array. You should dynamically allocate line each time through the while loop (line = calloc(MAXLEN, sizeof(char)) before calling fgets). Otherwise fgets always writes to the same memory location and lineptr always points to the same array.
Dan definitely found one error, the identical storage. But I think there are more bugs here:
while((fgets(line, 65, fpIn)) != NULL) {
Why only 65? You've got MAXLEN space to work with, you might as well let your input be a bit longer.
j = strlen(line);
if (j > 0 && (line[j-1] == '\n')) {
line[j-1] = '\0';
}
if (j > 8) {
lineptr[nlines++] = line;
}
}
Why exactly j > 8? Are you supposed to be throwing away short lines? Don't forget to deallocate the memory for the line in this case, once you've moved to the dynamic allocation that Dan suggests.
Update
ott recommends strdup(3) -- this would be easy to fit into your existing system:
while((fgets(line, 65, fpIn)) != NULL) {
j = strlen(line);
if (j > 0 && (line[j-1] == '\n')) {
line[j-1] = '\0';
}
if (j > 8) {
lineptr[nlines++] = strdup(line);
}
}
Dan recommended calloc(3), that would be only slightly more work:
line = calloc(MAXLEN, sizeof(char)); /* line must now be declared as char *line */
while((fgets(line, 65, fpIn)) != NULL) {
j = strlen(line);
if (j > 0 && (line[j-1] == '\n')) {
line[j-1] = '\0';
}
if (j > 8) {
lineptr[nlines++] = line;
line = calloc(MAXLEN, sizeof(char));
}
}
Of course, both these approaches will blow up if the memory allocation fails -- checking error returns from memory allocation is always a good idea. And there's something distinctly unbeautiful about the second mechanism.
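To show what checking the allocation could look like, here is a sketch that stores a private copy of each line and reports failure to the caller, together with a comparison function for the qsort step the question mentions. The helpers copy_line, add_line and compare_lines are hypothetical names, not part of the original code.

```c
#include <stdlib.h>
#include <string.h>

/* A checked stand-in for strdup(): returns NULL if malloc fails. */
char *copy_line(const char *line)
{
    size_t size = strlen(line) + 1;
    char *copy = malloc(size);

    if (copy != NULL)
        memcpy(copy, line, size);
    return copy;
}

/* Stores a copy of line in lineptr[*nlines]. Returns 0 on success,
 * -1 if the array is full or memory runs out. */
int add_line(char *lineptr[], int *nlines, int maxlines, const char *line)
{
    char *copy;

    if (*nlines >= maxlines)
        return -1;
    copy = copy_line(line);
    if (copy == NULL)
        return -1;
    lineptr[(*nlines)++] = copy;
    return 0;
}

/* qsort() passes pointers to the array elements, i.e. char ** here. */
int compare_lines(const void *a, const void *b)
{
    const char *left = *(const char *const *)a;
    const char *right = *(const char *const *)b;

    return strcmp(left, right);
}
```

Inside the fgets loop you would call add_line(lineptr, &nlines, MAXLINES, line) and stop reading if it returns -1; afterwards qsort(lineptr, nlines, sizeof lineptr[0], compare_lines) sorts the stored lines.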
