Parsing and dynamically allocating substrings with inconsistent sizes using sscanf - c

I implemented a read_line function:
#include<stdlib.h>
#include<stdio.h>
#include<stdbool.h>
char* read_line(){
const char UNIX_LINEBREAK = '\n';
const char WINDOWS_LINEBREAK = '\r';
const char C_STRING_TERMINATOR = '\0';
char extra_linebreak;
char current_letter;
char* line = NULL;
int position = 0;
bool reading_line = true;
while(reading_line){
scanf("%c", &current_letter);
if(current_letter == UNIX_LINEBREAK || current_letter == EOF){
reading_line = false;
}
else if(current_letter == WINDOWS_LINEBREAK) {
reading_line = false;
extra_linebreak = (char)getchar();
}
else {
line = (char*) realloc(line, sizeof(char) * (position + 1));
line[position] = current_letter;
position ++;
}
}
line = (char*) realloc(line, sizeof(char) * (position + 1));
line[position] = C_STRING_TERMINATOR;
return line;
}
Which I'm using for reading strings in the format:
operation number number
for example:
sum 13 13
However I'm implementing operations with numbers that may (and will) overflow the max int size. For example:
sum 23879238932898239832983298329839229383928329 239823983298392893289238932883290312803291832109230189
This forces me to read them as strings, parse them, and finally work with them through a linked list (there may be better approaches, but that's not the point yet). For now, I'm trying to use auxiliary buffers (operation, first_number_buffer and second_number_buffer) with sscanf to split the line read with read_line into three substrings.
#include <stdio.h>
#include <readline.h>
int main (){
char* line = read_line();
char operation[4];
char* first_number_buffer;
char* second_number_buffer;
sscanf(line, "%s %s %s", operation, first_number_buffer, second_number_buffer);
printf("%s\n%s\n%s\n", line,first_number_buffer,second_number_buffer);
}
The code above doesn't work, since I'm not allocating first_number_buffer and second_number_buffer yet. I would like to know if there's an efficient way to use sscanf in that situation. I didn't manage to find good results on Google, since results for scanf crowd out those for sscanf.
The problem seems to be this: usually, to dynamically allocate a string, one uses realloc to grow its size one character at a time. However, sscanf tries to "throw" all the content of the parsed substring at once. Since the strings have inconsistent sizes, I cannot simply give them a fixed size, like I did with operation.
Yes, I could use a big static buffer, but this seems to be an important task, and since I'm an undergrad I would like to know the proper way to do that. Thanks in advance!

I believe I managed to do what I initially intended.
It's hard to use sscanf for the task, since one would have to allocate the memory sscanf needs beforehand, and in the context of the question the required size is unknown.
However, @Cheatah suggested using strtok, which worked fine. Here's the final version of the code, using it:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdbool.h>
char* read_line(){
    const char UNIX_LINEBREAK = '\n';
    const char WINDOWS_LINEBREAK = '\r';
    const char C_STRING_TERMINATOR = '\0';
    char extra_linebreak;
    char current_letter;
    char* line = NULL;
    int position = 0;
    bool reading_line = true;
    while(reading_line){
        scanf("%c", &current_letter);
        if(current_letter == UNIX_LINEBREAK || current_letter == EOF){
            reading_line = false;
        }
        else if(current_letter == WINDOWS_LINEBREAK) {
            reading_line = false;
            extra_linebreak = (char)getchar();
        }
        else {
            line = (char*) realloc(line, sizeof(char) * (position + 1));
            line[position] = current_letter;
            position ++;
        }
    }
    line = (char*) realloc(line, sizeof(char) * (position + 1));
    line[position] = C_STRING_TERMINATOR;
    return line;
}
int main (){
    char* line = read_line();
    char* operation;
    char* first_number_buffer;
    char* second_number_buffer;
    // strtok overwrites each separator in line with '\0' and returns the token
    char *line_split = strtok(line, " ");
    // +1 leaves room for the terminating '\0' that strcpy writes
    operation = (char *) malloc((strlen(line_split) + 1) * sizeof(char));
    strcpy(operation, line_split);
    line_split = strtok(NULL, " ");
    first_number_buffer = (char *) malloc((strlen(line_split) + 1) * sizeof(char));
    strcpy(first_number_buffer, line_split);
    line_split = strtok(NULL, " ");
    second_number_buffer = (char *) malloc((strlen(line_split) + 1) * sizeof(char));
    strcpy(second_number_buffer, line_split);
    printf("%s\n%s\n%s\n", operation,first_number_buffer,second_number_buffer);
}
input:
sum 23879238932898239832983298329839229383928329 239823983298392893289238932883290312803291832109230189
output:
sum
23879238932898239832983298329839229383928329
239823983298392893289238932883290312803291832109230189
The code can be improved in many ways. Some people pointed out that EOF is not properly checked within read_line(), and main can definitely be refactored into smaller functions.
However, the idea of using strtok() as a substitute for sscanf(), even with an indefinite number of tokens, works for this case. See an example of strtok inside a while loop: https://www.cplusplus.com/reference/cstring/strtok/

Related

sscanf loop only reads first input multiple times [duplicate]

I used sscanf to segment one string taken from the input and store every token in a structure. The problem is that sscanf only reads the first word of the string and doesn't move ahead to the next word, printing the same token over and over. Here's the code.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define dim 30
typedef struct string {
char* token[dim];
}string;
int main() {
string* New = (string *)malloc(dim*sizeof(string));
char* s;
char buffer[dim];
int i = 0, r = 0, j = 0;
s = (char*)malloc(sizeof(char*));
printf("\nString to read:\n");
fgets(s, dim, stdin);
printf("\nThe string is: %s", s);
while(sscanf(s, " %s ", buffer) != EOF) {
New->token[i] = malloc(dim*sizeof(char));
strcpy(New->token[i], buffer);
printf("\nAdded: %s", New->token[i]);
++i;
}
}
For example, if I give "this is a string" as input, sscanf will only get the word "this" multiple times without moving on to the next word.
You need to advance the pointer that sscanf() reads from, so that it doesn't read from the same point again and again.
Furthermore, the memory you dynamically allocated for s didn't make sense: sizeof(char*) bytes is too small for the input in any case. From the later call to fgets() I can see you meant s = malloc(dim * sizeof(char));, so I went ahead and fixed that.
Example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define dim 30
typedef struct string {
    char* token[dim];
} string;
int main() {
    string* New = malloc(dim*sizeof(string));
    char* s;
    char buffer[dim];
    int i = 0;
    s = malloc(dim * sizeof(char));
    fgets(s, dim, stdin);
    printf("The string is: %s\n", s);
    char* ptr = s;
    int offset;
    // stop after dim tokens so token[] is never overrun
    while (i < dim && sscanf(ptr, "%s%n", buffer, &offset) == 1) {
        ptr += offset; // %n stored how many characters were consumed
        New->token[i] = malloc(strlen(buffer) + 1);
        strcpy(New->token[i], buffer);
        printf("Added: %s\n", New->token[i]);
        ++i;
    }
    // do work
    for(int j = 0; j < i; ++j)
        free(New->token[j]); // index with j, not i
    free(New);
    free(s);
    return 0;
}
Output:
The string is: this is a string
Added: this
Added: is
Added: a
Added: string
PS: I am not sure about the structure layout you have in mind; it may be worth spending a moment to think over whether your design approach is meaningful.
PPS: Unrelated to your problem: Do I cast the result of malloc? No!
Edit: As @chux said, the leading " " in " %s%n" of sscanf() serves no purpose, since %s already skips leading whitespace. I changed it to "%s%n".
Moreover, in order to reserve exactly as much memory as needed (which is the thing to do when dealing with dynamic memory allocation), New->token[i] = malloc(dim*sizeof(char)); was changed to New->token[i] = malloc(strlen(buffer) + 1);.

Manipulation array in function - c - Segmentation fault

So I started to learn how to code a few weeks ago, and this site helped me so much, thank you for that. But this time I got stuck and can't really figure out why... Hope you can help me.
Basically I have a function prototype I have to use in my program and I have my troubles with it. The function should receive a string and then only copy every second char of that string and return the result...
This is what I've got so far:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define max_size 1000
char * everySecondChar(char * dest, char * input);
int main() {
char inputstr[max_size] = {0};
char *input[max_size] = {0};
char *dest[max_size] = {0};
char temp[max_size] = {0};
int i = 0;
while (fgets(inputstr, max_size, stdin) != NULL)
{
input[i] = strndup(inputstr, max_size);
strcat(temp,inputstr);
i++;
}
input[0] = strndup(temp, max_size);
printf("Inputted text:\n%s", *input);
printf("\n");
printf("\n");
printf("Resulting string:\n");
everySecondChar(*dest, *input);
printf("%s", *dest);
return 0;
}
char * everySecondChar(char * dest, char * input)
{
int i = 0;
for(i = 0; i < max_size; i+=2) {
strcat(dest,input);
}
return dest;
}
I know this is probably a 1-min challenge for the most of you, but I am having my troubles whenever I see those nasty * in a function prototype :(
Congrats on getting started with programming!
To your question: there's quite a few things that could be addressed, but since there seems to be some more basic confusion and misunderstanding, I'll address what makes sense given the context of your issue.
First, you're using strcat which concatenates strings (e.g. adds to the string), when you just need simple character assignment.
Next, you have a lot of pointers to arrays and there seems to be some confusion regarding pointers; in your main function, you don't need all of the temporary variables to do what you're wanting.
You could have simply:
char inputstr[MAX_SIZE] = {0};
char dest[MAX_SIZE] = {0};
You could have less (realistically) but we'll stick with the basics for now.
Next, you're looping to get user input:
while (fgets(inputstr, max_size, stdin) != NULL)
{
    input[i] = strndup(inputstr, max_size);
    strcat(temp,inputstr);
    i++;
}
Here, you don't check if i exceeds max_size which your input variable has been allocated for; if i exceeds max_size when you go to assign input[i] to the memory location returned by strndup (which calls malloc), you are writing beyond your memory bounds, which is also known as a buffer overflow. This is potentially where your segmentation fault is happening. You could also have some issues when you do strcat(temp,inputstr); since strcat:
Appends a copy of the source string to the destination string. The terminating null character in destination is overwritten by the first character of source, and a null-character is included at the end of the new string formed by the concatenation of both in destination.
If you're simply just trying to get what the user entered, and print every 2nd character with your function, you don't need to loop:
if (fgets(inputstr, MAX_SIZE, stdin) != NULL) {
    everySecondChar(dest, inputstr);
    printf("Inputted text:\n%s\n\nResulting string:\n%s\n", inputstr, dest);
}
Lastly, in your everySecondChar function, you're using strcat again when all you need to do is simple assignment (which does a 'copy'):
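char * everySecondChar(char * dest, char * input)
{
    int i = 0, j = 0;
    while (i < MAX_SIZE && input[i] != '\0') {
        dest[j++] = input[i];
        // never step over the terminator of an odd-length string
        if (i + 1 >= MAX_SIZE || input[i + 1] == '\0') break;
        i += 2;
    }
    dest[j] = '\0'; // terminate explicitly instead of relying on a zeroed dest
    return dest;
}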
Putting all of it together, you get:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MAX_SIZE 1000
char * everySecondChar(char * dest, char * input);
int main(void)
{
    char inputstr[MAX_SIZE] = {0};
    char dest[MAX_SIZE] = {0};
    printf("Enter some text: ");
    if (fgets(inputstr, MAX_SIZE, stdin) != NULL) {
        everySecondChar(dest, inputstr);
        printf("Inputted text:\n%s\n\nResulting string:\n%s\n", inputstr, dest);
    }
    return 0;
}
char * everySecondChar(char * dest, char * input)
{
    int i = 0, j = 0;
    while (i < MAX_SIZE && input[i] != '\0') {
        dest[j++] = input[i];
        // never step over the terminator of an odd-length string
        if (i + 1 >= MAX_SIZE || input[i + 1] == '\0') break;
        i += 2;
    }
    dest[j] = '\0'; // terminate explicitly instead of relying on a zeroed dest
    return dest;
}
That aside, I'll address some other things; typically, if you have a constant value, like your max_size macro, it's considered "best practice" to write the name in all capitals:
`#define MAX_SIZE 1000`
"I am having my troubles whenever I see those nasty * in a function prototype :("
Those nasty *'s in your function prototype (and variable declarations) are part of a pointer declarator; they indicate that the variable is a pointer to the specified type. A pointer isn't something to be scared of. You're learning C, so it's highly important that you understand what a pointer is and its utility.
I won't dive into all of the specificities of pointers, aliases, etc. etc. since that is beyond the scope of this Q&A, but WikiBooks has a great intro and explanation covering a lot of those concepts.
Hope that can help!

C string nested splitting

I'm a beginner at C and I'm stuck on a simple problem. Here it goes:
I have a string formatted like this: "first1:second1\nsecond2\nfirst3:second3" ... and so on.
As you can see from the example, the first field is optional ([firstx:]secondx).
I need to get a resulting string which contains only the second field. Like this: "second1\nsecond2\nsecond3".
I did some research here on stack (string splitting in C) and I found that there are two main functions in C for string splitting: strtok (obsolete) and strsep.
I tried to write the code using both functions (plus strdup) without success. Most of the time I get some unpredictable result.
Better ideas?
Thanks in advance
EDIT:
This was my first try
int main(int argc, char** argv){
char * stri = "ciao:come\nva\nquialla:grande\n";
char * strcopy = strdup(stri); // since strsep and strtok both modify the input string
char * token;
while((token = strsep(&strcopy, "\n"))){
if(token[0] != '\0'){ // I don't want the last match of '\n'
char * sub_copy = strdup(token);
char * sub_token = strtok(sub_copy, ":");
sub_token = strtok(NULL, ":");
if(sub_token[0] != '\0'){
printf("%s\n", sub_token);
}
}
free(sub_copy);
}
free(strcopy);
}
Expected output: "come", "va", "grande"
Here's a solution with strcspn:
#include <stdio.h>
#include <string.h>
int main(void) {
    const char *str = "ciao:come\nva\nquialla:grande\n";
    const char *p = str;
    while (*p) {
        size_t n = strcspn(p, ":\n");
        if (p[n] == ':') {
            p += n + 1;
            n = strcspn(p , "\n");
        }
        if (p[n] == '\n') {
            n++;
        }
        fwrite(p, 1, n, stdout);
        p += n;
    }
    return 0;
}
We compute the size of the initial segment not containing : or \n. If it's followed by a :, we skip over it and get the next segment that doesn't contain \n.
If it's followed by \n, we include the newline character in the segment. Then we just need to output the current segment and update p to continue processing the rest of the string in the same way.
We stop when *p is '\0', i.e. when the end of the string is reached.

Using strncpy to remove part of a char*

I am trying to remove a certain part of my string using strncpy but I am facing some issues here.
This is what my two char* variables hold.
trimmed has, for example, "127.0.0.1/8|rubbish|rubbish2|", which is a prefix of an address.
backportion contains "|rubbish|rubbish2|"
What I want to do is remove the backportion from trimmed. So far I have this:
char* extractPrefix(char buf[1024]){
int count = 0;
const char *divider = "|";
char *c = buf;
char *trimmed;
char *backportionl;
while(*c){
if(strchr(divider,*c)){
count++;
if(count == 5){
++c;
trimmed = c;
//printf("Statement: %s\n",trimmed);
}
if(count == 6){
backportionl = c;
}
}
c++;
}
strncpy(trimmed,backportionl,sizeof(backportionl));
printf("Statement 2: %s\n", trimmed);
This gives me an error about backportionl being a char* instead of a char.
Is there any way I can fix this issue, or a better way to trim this char* to achieve my aim?
Here's one way that works for a list of dividers, similar to how strtok works the first time it's called:
char *extractPrefix(char *buf, const char *dividers)
{
    size_t div_idx = strcspn(buf, dividers);
    if (buf[div_idx] != 0)
        buf[div_idx] = 0;
    return buf;
}
If you don't want the original buffer modified, you can use strndup, assuming your platform supports the function (Windows doesn't; you'd need to code it yourself). Don't forget to free the pointer that is returned when you're done with it:
char *extractPrefix(const char *buf, const char *dividers)
{
    size_t div_idx = strcspn(buf, dividers);
    return strndup(buf, div_idx);
}
Alternatively, you could just return the number of characters (or some value less than 0 if the number of characters in the prefix won't fit in an int):
/* INT_MAX is defined in <limits.h> */
int pfxlen(const char *buf, const char *dividers)
{
    size_t div_idx = strcspn(buf, dividers);
    if (div_idx > (size_t)INT_MAX)
        return -1;
    return (int)div_idx;
}
and use it like this:
int n;
const char *example = "127.0.0.1/8|rubbish|rubbish2|";
n = pfxlen(example, "|");
if (n >= 0)
    printf("Prefix: %.*s\n", n, example);
else
    fprintf(stderr, "prefix too long\n");
Obviously you have a number of options. It's really up to you which one you want to use.
Welp, this is stupid, but I fixed my issue in basically one line. So here goes:
trimmed[strchr(trimmed,'|')-trimmed] = '\0';
printf("Statement 2: %s\n", trimmed);
So by getting the index of 'backportion' from the trimmed char* using strchr, I was effectively able to fix the issue.
Thanks internet, for not much.
Disclaimer: I'm not sure whether I correctly understood what you actually want to achieve. Some examples would probably be helpful.
I am trying to remove a certain part of my string [..]
I have no idea what you're trying in your code, but this is pretty easy to achieve with strstr, strlen and memmove:
First, find the position of the string you want to remove using strstr. Then copy what's behind that found string to the position where the found string starts.
char cut_out_first(char * input, char const * unwanted) {
    assert(input); assert(unwanted);
    char * start = strstr(input, unwanted);
    if (start == NULL) {
        return 0;
    }
    char * rest = start + strlen(unwanted);
    memmove(start, rest, strlen(rest) + 1);
    return 1;
}

C - Is this the right way to use strtok in the following situation

If I have a string that contains 10X15 and I want to separate the 10 and the 15, would the following code be correct? I am concerned about the second part of the code: is putting "NULL" there the right thing to do?
char * stringSixrows = strtok(stringSix[0], "X");
char * stringSixcollumns = strtok(NULL, "NULL");
//I put the second null there cause its the end of string, im not sure if its right though.
I'd say the "canonical" way to obtain the "pointer to the remaining string" is:
strtok(NULL, "")
strtok searches for any of the delimiters in the provided string, so if you don't provide any delimiters, it cannot find anything and thus only stops at the end of the input string.
example
#include <stdio.h>
#include <string.h>
int main(void){
    char stringSix[] = "10X15";
    char *stringSixrows = strtok(stringSix, "X");
    char *stringSixcolumns = strtok(NULL, "X");
    printf("%s, %s\n", stringSixrows, stringSixcolumns);
    return 0;
}
another way
char stringSix[] = "10X15";
char stringSixrows[3];
char stringSixcolumns[3];
char *p;
if(NULL!=(p = strchr(stringSix, 'X')))
    *p = '\0';
else {
    printf("invalid format\n");
    return -1;
}
strcpy(stringSixrows, stringSix);
strcpy(stringSixcolumns, p+1);
printf("%s, %s\n", stringSixrows, stringSixcolumns);
another way, with sscanf
const char *stringSix = "10X15";
int stringSixrows;
int stringSixcolumns;
if(2==sscanf(stringSix, "%dX%d", &stringSixrows, &stringSixcolumns))
    printf("%d, %d\n", stringSixrows, stringSixcolumns);
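You can use strtol to convert the string to numbers as well as to seek to the next part of the string. After the first call, pEnd points at the 'X' separator, which has to be skipped before the second call; otherwise strtol finds no digits and returns 0:
char stringSix[] = "10X15";
char * pEnd;
long firstNumber = strtol (stringSix,&pEnd, 10);
long secondNumber = 0;
if (*pEnd == 'X') /* strtol stopped at the separator; skip it */
    secondNumber = strtol (pEnd + 1,&pEnd, 10);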

Resources