Split string with multiple delimiters using strtok in C

I have a problem splitting a string. The code below works, but only if the strings are separated by ' ' (spaces). I need to split the strings on any whitespace character. Is strtok() even necessary?
char input[1024];
char *string[3];
int i=0;
if(fgets(input,1024,stdin)!='\0') //get input
{
string[0]=strtok(input," "); //parse first string
while(string[i]!=NULL) //parse others
{
printf("string [%d]=%s\n",i,string[i]);
i++;
string[i]=strtok(NULL," ");
}
}

A simple example that shows how to use multiple delimiters and potential improvements in your code. See embedded comments for explanation.
Be warned about the general shortcomings of strtok() (from manual):
These functions modify their first argument.
These functions cannot be used on constant strings.
The identity of the delimiting byte is lost.
The strtok() function uses a static buffer while parsing, so it's not thread
safe. Use strtok_r() if this matters to you.
#include <stdio.h>
#include<string.h>
int main(void)
{
char input[1024];
char *string[256]; // 1) 3 is dangerously small,256 can hold a while;-)
// You may want to dynamically allocate the pointers
// in a general, robust case.
char delimit[]=" \t\r\n\v\f"; // 2) POSIX whitespace characters
int i = 0, j = 0;
if(fgets(input, sizeof input, stdin)) // 3) fgets() returns NULL on error.
// 4) Better practice to use sizeof
// input rather hard-coding size
{
string[i]=strtok(input,delimit); // 5) Make use of i to be explicit
while(string[i]!=NULL)
{
printf("string [%d]=%s\n",i,string[i]);
i++;
string[i]=strtok(NULL,delimit);
}
for (j=0;j<i;j++)
printf("%s\n", string[j]); // 6) index with j; string[i] is NULL at this point
}
return 0;
}

Related

Allocating string with malloc

I'm new in programming in C and now I'm studying strings.
My question is: if I allocate a string using malloc (as in the code below), is the NULL character automatically inserted at the end of the string?
I found an answer in another question here, and it seems that the NULL character is not automatically included.
But here comes the problem: I know functions like strlen don't work if there isn't the NULL character, and in this code I use it and it works. So I think there is \0 at the end of my string, even if I don't write it anywhere.
What's the answer?
Here's the code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char** argv) {
char *stringa1;
int n;
int i;
printf("How many characters in the string? ");
scanf("%d", &n);
stringa1 = (char*) malloc(n*sizeof(char));
printf("Insert the string: ");
scanf("%s", stringa1);
free(stringa1);
return 0;
}
malloc() returns a void* pointer to a block of memory allocated on the heap. Allocating with malloc() does not initialize any string; it only reserves space waiting to be occupied. To add a null-terminating character, you either have to do this yourself, or use a function like scanf(), which adds this character for you. Having said this, you need to allocate space for this \0 character beforehand.
Your malloc() call should be this instead:
stringa1 = (char*) malloc((n+1)*sizeof(char)); /*+1 for '\0' character */
Note: You don't need to cast the return value of malloc().
Another thing to point out is sizeof(char) is 1, so multiplying this in your malloc() call is not necessary.
You also need to check if malloc() returns NULL. This can be done like this:
if (stringa1 == NULL) {
/* handle exit */
}
Also, you can only use strlen() on a null-terminated string, otherwise this ends up being undefined behaviour.
Once scanf() is called, and the stringa1 contains some characters, you can call strlen() on it.
Additionally, checking return of scanf() is also a good idea. You can check it like this:
if (scanf("%d", &n) != 1) {
/* handle exit */
}
Your code with these changes:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
char *stringa1 = NULL;
size_t n, slen;
printf("How many characters in the string? ");
if (scanf("%zu", &n) != 1) {
printf("Invalid input\n");
exit(EXIT_FAILURE);
}
stringa1 = malloc(n+1);
if (stringa1 == NULL) {
printf("Cannot allocate %zu bytes for string\n", n+1);
exit(EXIT_FAILURE);
}
printf("Insert the string: ");
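if (scanf("%s", stringa1) != 1) { /* note: %s alone does not limit the input to n characters */
printf("Invalid input\n");
free(stringa1);
exit(EXIT_FAILURE);
}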
slen = strlen(stringa1);
printf("String: %s Length: %zu\n", stringa1, slen);
free(stringa1);
stringa1 = NULL;
return 0;
}
if I allocate a string using malloc (as in the code below), is the NULL character automatically inserted at the end of the string?
No. malloc() returns a block of uninitialized memory.
I know functions like 'strlen' don't work if there isn't the NULL character, and in this code I use it and it works. So I think there is '\0' at the end of my string, even if I don't write it anywhere.
scanf() inserts the null byte ('\0') for you when you use %s format specifier (assuming scanf() succeeded).
From man scanf():
s Matches a sequence of non-white-space characters; the next
pointer must be a pointer to the initial element of a
character array that is long enough to hold the input sequence
and the terminating null byte ('\0'), which is added
automatically. The input string stops at white space or at
the maximum field width, whichever occurs first.
(emphasis mine).
By the way, you should do error checking for scanf() and malloc() calls.
malloc returns a pointer to an uninitialized block of memory.
If you want the memory to be zero-initialized, you can use another standard function, calloc, instead of malloc.
Take into account that a prompt like this
printf("How many characters in the string? ");
usually implies that the terminating zero is not counted. So you have to allocate one more byte of memory. For example
stringa1 = ( char* )malloc( ( n + 1 ) *sizeof( char ) );
or
stringa1 = ( char* )calloc( n + 1, sizeof( char ) );
In the latter case you can call strlen on the buffer right away; it returns 0 because the memory is zero-initialized.
This call of scanf
scanf("%s", stringa1);
is unsafe because it does not limit how many characters are written to stringa1. It is better to use fgets instead. For example
fgets( stringa1, n + 1, stdin );
This function can store the newline character in the string. To remove it from the string you can write
stringa1[strcspn( stringa1, "\n" )] = '\0';
The definition of "string" in C is a sequence of characters, terminated by a null character.
To allocate memory for a string, count the characters (e.g. strlen) and add 1 for this terminating null character.
Functions like scanf and strcpy add the null character; a function like strncpy doesn't always do that.
The easy way to achieve this is to include cs50 library.
Just use get_string function:
#include <stdio.h>
#include <cs50.h>
int main(void) {
// input the string to stringa1
char *stringa1 = get_string("Insert the string: ");
// call the string
printf("The string you type was: %s\n", stringa1);
return 0;
}
Sample output:
Insert the string: Hello World, I am newbie!
The string you type was: Hello World, I am newbie!

How to correctly input a string in C

I am currently learning C, so I wanted to make a program that asks the user to input a string and outputs the number of characters that were entered. The code compiles fine, and when I enter just 1 character it works. But when I enter 2 or more characters, no matter how many, it always says there is just one character and crashes after that. This is my code and I can't figure out what is wrong.
int main(void)
{
int siz;
char i[] = "";
printf("Enter a string.\n");
scanf("%s", i);
siz = sizeof(i)/sizeof(char);
printf("%d", siz);
getch();
return 0;
}
I am currently learning to program, so if there is a way to do it using the same scanf() function I will appreciate that since I haven't learned how to use any other function and probably won't understand how it works.
Please, FORGET that scanf exists. The problem you are running into, whilst caused mostly by your understandable inexperience, will continue to BITE you even when you have experience - until you stop.
Here is why:
scanf will read the input, and put the result in the char buffer you provided. However, it will make no check to make sure there is enough space. If it needs more space than you provided, it will overwrite other memory locations - often with disastrous consequences.
A safer method uses fgets - this is a function that does broadly the same thing as scanf, but it will only read in as many characters as you created space for (or: as you say you created space for).
Other observation: sizeof can only evaluate the size known at compile time : the number of bytes taken by a primitive type (int, double, etc) or size of a fixed array (like int i[100];). It cannot be used to determine the size during the program (if the "size" is a thing that changes).
Your program would look like this:
#include <stdio.h>
#include <string.h>
#define BUFLEN 100 // your buffer length
int main(void) // <<< for correctness, include 'void'
{
int siz;
char i[BUFLEN]; // <<< now you have space for a 99 character string plus the '\0'
printf("Enter a string.\n");
fgets(i, BUFLEN, stdin); // read the input, storing at most BUFLEN-1 characters in i
siz = sizeof(i)/sizeof(char); // it turns out that this will give you the answer BUFLEN
// probably not what you wanted. 'sizeof' gives size of array in
// this case, not size of string
// use strlen instead:
siz = strlen(i) - 1; // strlen is a function that is declared in string.h
// it produces the string length
// subtract 1 if you don't want to count \n
printf("The string length is %d\n", siz); // don't just print the number, say what it is
// and end with a newline: \n
printf("hit <return> to exit program\n"); // tell user what to do next!
getc(stdin);
return 0;
}
I hope this helps.
Update: you asked the reasonable follow-up question: "how do I know the string was too long?".
See this code snippet for inspiration:
#include <stdio.h>
#include <string.h>
#define N 50
int main(void) {
char a[N];
char *b;
printf("enter a string:\n");
b = fgets(a, N, stdin);
if(b == NULL) {
printf("an error occurred reading input!\n"); // e.g. end of input was reached before anything was read
return 0;
}
if (strlen(a) == N-1 && a[N-2] != '\n') { // used all space, didn't get to end of line
printf("string is too long!\n");
}
else {
printf("The string is %s which is %zu characters long\n", a, strlen(a)-1); // all went according to plan; %zu because strlen returns size_t
}
}
Remember that when you have space for N characters, the last character (at location N-1) must be a '\0' and since fgets includes the '\n' the largest string you can input is really N-2 characters long.
This line:
char i[] = "";
is equivalent to:
char i[1] = {'\0'};
The array i has only one element, so the program crashes because of a buffer overflow.
I suggest using fgets() instead of scanf(), like this:
#include <stdio.h>
#include <string.h>
#define MAX_LEN 1024
int main(void)
{
char line[MAX_LEN];
if (fgets(line, sizeof(line), stdin) != NULL)
printf("%zu\n", strlen(line) - 1);
return 0;
}
The length is decremented by 1 because fgets() would store the new line character at the end.
The problem is here:
char i[] = "";
You are essentially creating a char array of size 1 by initializing it with "".
Instead, use a buffer with a larger size:
char i[128]; /* You can also malloc space if you desire. */
scanf("%127s", i); /* the field width keeps scanf from writing past the end of i */
See the link below to a similar question if you want to include spaces in your input string. There is also some good input there regarding scanf alternatives.
How do you allow spaces to be entered using scanf?
That's because char i[] = ""; is actually a one-element array.
Strings in C are stored as text that ends with \0 (a char of value 0). You should use a bigger buffer, as others said, for example:
char i[100];
scanf("%99s", i);
Then, when calculating length of this string you need to search for the \0 char.
int length = 0;
while (i[length] != '\0')
{
length++;
}
After running this code, length contains the length of the input.
You need to allocate space where scanf() can put the input data. In your program, you can allocate space like:
char i[100];
which will be OK as long as the input fits in 99 characters. Using malloc is an alternative when the size is only known at run time. Check out the man pages.

Reading formatted strings from file into Array in C

I am new to the C programming language and trying to improve by solving problems from the Project Euler website using only C and its standard libraries. I have covered basic C fundamentals (I think), functions, pointers, and some basic file I/O, but now I am running into some issues.
The question is about reading a text file of first names and calculating a "name score" blah blah, I know the algorithm I am going to use and have most of the program setup but just cannot figure out how to read the file correctly.
The file is in the format
"Nameone","Nametwo","billy","bobby","frank"...
I have searched and searched and tried countless things but cannot seem to read these as individual names into an array of strings (I think that's the right way to store them individually?). I have tried using sscanf/fscanf with %[^\",]. I have tried different combinations of those functions and fgets, but my understanding of fgets is that every time I call it, it gets a new line, and this is a text file with over 45,000 characters all on the same line.
I am unsure if I am running into problems with my misunderstanding of the scanf functions, or my misunderstanding with storing an array of strings. As far as the array of strings goes, I (think) I have realized that when I declare an array of strings it does not allocate memory for the strings themselves, something that I need to do. But I still cannot get anything to work.
Here is the code I have now to try to just read in some names I enter from the command line to test my methods.
This code works to input any string up to buffer size(100):
int main(void)
{
int i;
char input[100];
char* names[10];
printf("\nEnter up to 10 names\nEnter an empty string to terminate input: \n");
for(int i = 0; i < 10; i++)
{
int length = 0;
printf("%d: ", i);
fgets(input, 100, stdin);
length = (int)strlen(input);
input[length-1] = 0; // Delete newline character
length--;
if(length < 1)
{
break;
}
names[i] = malloc(length+1);
assert(names[i] != NULL);
strcpy(names[i], input);
}
}
However, I simply cannot make this work for reading in the formatted strings.
PLEASE advise me as to how to read it in with a format. I have previously used sscanf on the input buffer and that has worked fine, but I don't feel like I can do that on a 45000+ char line. Am I correct in assuming this? Is this even an acceptable way to read strings into an array?
I apologize if this is long and/or not clear, it is very late and I am very frustrated.
Thank anyone and everyone for helping, and I am looking forward to finally becoming an active member on this site!
There are really two basic issues here:
Whether scanning string input is the proper strategy here. I would argue not because while it might work on this task you are going to run into more complicated scenarios where it too easily breaks.
How to handle a 45k string.
In reality you won't run into too many strings of this size, but it is nothing that a modern computer of any capacity can't easily handle. Insofar as this is for learning purposes, learn iteratively.
The easiest first approach is to fread() the entire line/file into an appropriately sized buffer and parse it yourself. You can use strtok() to break up the comma-delimited tokens and then pass the tokens to a function that strips the quotes and returns the word. Add the word to your array.
For a second pass you can do away with strtok() and just parse the string yourself by iterating over the buffer and breaking up the comma tokens yourself.
Last but not least you can write a version that reads smaller chunks of the file into a smaller buffer and parses them. This has the added complexity of handling multiple reads and managing the buffers to account for half-read tokens at the end of a buffer and so on.
In any case, break the problem into chunks and learn with each refinement.
EDIT
#include <string.h>
#define MAX_STRINGS 5000
#define MAX_NAME_LENGTH 30
char* stripQuotes(char *str, char *newstr)
{
char *temp = newstr;
while (*str)
{
if (*str != '"')
{
*temp = *str;
temp++;
}
str++;
}
*temp = '\0'; // terminate the copy instead of relying on newstr being pre-zeroed
return(newstr);
}
int main(int argc, char *argv[])
{
char fakeline[] = "\"Nameone\",\"Nametwo\",\"billy\",\"bobby\",\"frank\"";
char *token;
char namebuffer[MAX_NAME_LENGTH] = {'\0'};
char *name;
int index = 0;
char nameArray[MAX_STRINGS][MAX_NAME_LENGTH];
token = strtok(fakeline, ",");
if (token)
{
name = stripQuotes(token, namebuffer);
strcpy(nameArray[index++], name);
}
while (token != NULL)
{
token = strtok(NULL, ",");
if (token)
{
memset(namebuffer, '\0', sizeof(namebuffer));
name = stripQuotes(token, namebuffer);
strcpy(nameArray[index++], name);
}
}
return(0);
}
fscanf(stdin, "%99s", input) reads one token (a run of characters delimited by whitespace) at a time; the width 99 assumes the 100-byte input buffer from the question. You can either scan the input until you encounter a specific "end-of-input" string, such as "!", or you can wait for the end-of-file signal, which is sent by pressing "Ctrl+D" on a Unix console or "Ctrl+Z" followed by Enter on a Windows console.
The first option:
fscanf(stdin, "%99s", input);
if (input[0] == '!') {
break;
}
// Put input on the array...
The second option:
result = fscanf(stdin, "%99s", input);
if (result == EOF) {
break;
}
// Put input on the array...
Either way, as you read one token at a time, there is no limit on the total size of the input, only on the length of a single token.
Why not search the giant string for quote characters instead? Something like this:
#include <stdio.h>
#include <string.h>
int main(void)
{
char mydata[] = "\"John\",\"Smith\",\"Foo\",\"Bar\"";
char namebuffer[20];
unsigned int i, j;
int begin = 1;
unsigned int beginName, endName;
for (i = 0; i < sizeof(mydata); i++)
{
if (mydata[i] == '"')
{
if (begin)
{
beginName = i;
}
else
{
endName = i;
for (j = beginName + 1; j < endName; j++)
{
namebuffer[j-beginName-1] = mydata[j];
}
namebuffer[endName-beginName-1] = '\0';
printf("%s\n", namebuffer);
}
begin = !begin;
}
}
}
You find the first double quote, then the second, and then read out the characters in between to your name string. Then you process those characters as needed for the problem in question.

How to concatenate string and int in C?

I need to form a string, inside each iteration of the loop, which contains the loop index i:
for(i=0;i<100;i++) {
// Shown in java-like code which I need working in c!
String prefix = "pre_";
String suffix = "_suff";
// This is the string I need formed:
// e.g. "pre_3_suff"
String result = prefix + i + suffix;
}
I tried using various combinations of strcat and itoa with no luck.
Strings are hard work in C.
#include <stdio.h>
int main()
{
int i;
char buf[12];
for (i = 0; i < 100; i++) {
snprintf(buf, 12, "pre_%d_suff", i); // puts string into buffer
printf("%s\n", buf); // outputs so you can see it
}
}
The 12 is enough bytes to store the text "pre_", the text "_suff", a number of up to two digits ("99") and the null terminator that goes on the end of C string buffers.
This will tell you how to use snprintf, but I suggest a good C book!
Use sprintf (or snprintf if like me you can't count) with format string "pre_%d_suff".
For what it's worth, with itoa/strcat you could do the following (note that itoa is not part of standard C, so it may not be available everywhere):
char dst[12] = "pre_";
itoa(i, dst+4, 10);
strcat(dst, "_suff");
Look at snprintf or, if GNU extensions are OK, asprintf (which will allocate memory for you).
