Let's say I have a string "file1.h: file2.c,file3.cpp" and I want to split it into "file1.h" and "file2.c,file3.cpp", that is, using ": " (a colon followed by whitespace) as the delimiter. How can I do it?
I tried this code with no luck:
int main(int argc, char *argv[]) {
char str[] = "file1.h: file2.c,file3.cpp";
char name[100];
char depends[100];
sscanf(str, "%s: %s", name, depends);
printf("Name: %s\n", name);
printf("Deps: %s\n", depends);
}
And the output I get is:
Name: file1.h:
Deps:
What you seem to need is strtok(). Read about it in the man page. Related quote from C11, chapter §7.24.5.8
A sequence of calls to the strtok function breaks the string pointed to by s1 into a
sequence of tokens, each of which is delimited by a character from the string pointed to
by s2. [...]
In your case, you can use a delimiter like
char * delim = ": "; //combination of : and a space
to get the job done.
Additional things to mention:
the input needs to be modifiable (which it is, in your case) for strtok()
strtok() actually destroys the input fed to it, so keep a copy around if you need the original later.
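As a minimal sketch of the strtok() approach for the string in the question: the helper name split_rule is made up for illustration, and it assumes the buffer is writable and that neither part contains a colon or a space.

```c
#include <string.h>

/* Hypothetical helper: splits "name: deps" in place.
 * Returns 0 on success, -1 if either part is missing.
 * The buffer passed in is modified by strtok(). */
static int split_rule(char *line, char **name, char **deps)
{
    const char *delim = ": ";   /* any run of ':' and ' ' separates tokens */

    *name = strtok(line, delim);
    /* only ask for a second token if the first one was found */
    *deps = (*name != NULL) ? strtok(NULL, delim) : NULL;

    return (*name != NULL && *deps != NULL) ? 0 : -1;
}
```

Called on a writable array holding "file1.h: file2.c,file3.cpp", this should leave name pointing at "file1.h" and deps pointing at "file2.c,file3.cpp". Both pointers refer into the original buffer, so no copying is involved.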
This is an alternative way to do it, it uses strchr(), but this assumes that the input string always has the format
name: item1,item2,item3,...,itemN
Here is the program
#include <ctype.h> // for isspace()
#include <string.h>
#include <stdio.h>
int
main(void)
{
const char *const string = "file1.h: file2.c,file3.cpp ";
const char *head;
const char *tail;
const char *next;
// Start at the beginning of the string
head = string;
// If there is no `:' this string does not follow
// the assumption that the format is
//
// name: item1,item2,item3,...,itemN
//
if ((tail = strchr(head, ':')) == NULL)
return -1;
// Save a pointer to the next character after the `:'
next = tail + 1;
// Strip leading spaces
while (isspace((unsigned char) *head) != 0)
++head;
// Strip trailing spaces
while (tail > head && isspace((unsigned char) *(tail - 1)) != 0)
--tail;
fputc('*', stdout);
// Simply print the characters between `head' and `tail'
// you could as well copy them, or whatever
fwrite(head, 1, tail - head, stdout);
fputc('*', stdout);
fputc('\n', stdout);
head = next;
while (head != NULL) {
tail = strchr(head, ',');
if (tail == NULL) {
// This means there are no more `,'
// so we now try to point to the end
// of the string
tail = strchr(head, '\0');
}
// This is basically the same algorithm
// just with a different delimiter which
// will presumably be the same from
// here
next = tail + 1;
// Strip leading spaces
while (isspace((unsigned char) *head) != 0)
++head;
// Strip trailing spaces
while (tail > head && isspace((unsigned char) *(tail - 1)) != 0)
--tail;
// Here is where you can extract the string
// I print it surrounded by `*' to show that
// it's stripping white spaces
fputc('*', stdout);
fwrite(head, 1, tail - head, stdout);
fputc('*', stdout);
fputc('\n', stdout);
// Try to point to the next one
// or make head `NULL' if this is
// the end of the string
//
// Note that the original `tail' pointer
// that was pointing to the next `,' or
// the end of the string, has changed but
// we have saved its original value
// plus one, we now inspect what was
// there
if (*(next - 1) == '\0') {
head = NULL;
} else {
head = next;
}
}
fputc('\n', stderr);
return 0;
}
It's excessively commented to guide the reader.
As Sourav says, you really need to use strtok for tokenizing strings. But this doesn't explain why your existing code is not working.
The answer lies in the specification for sscanf and how it handles a '%s' in the format string.
From the man page:
s Matches a sequence of non-white-space characters;
So, the presence of a colon-space in your format string is largely irrelevant for matching the first '%s'. When sscanf sees the first %s it simply consumes the input string until a whitespace character is encountered, giving you your value for name of "file1.h:" (note the inclusion of the colon).
Next it tries to deal with the colon-space sequence in your format string.
Again, from the man page
The format string consists of a sequence of directives which describe how to process the sequence of input characters.
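The colon in your format string is not a conversion specification (i.e. "%" followed by something), so it is an ordinary character that must match the next input character literally. The first %s has already consumed the colon, so the next input character is the space; the match fails and sscanf stops there.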
If, instead, your format string was simply "%s%s", then sscanf will get you almost exactly what you want.
int main(int argc, char *argv[]) {
char str[] = "file1.h: file2.c,file3.cpp";
char name[100];
char depends[100];
sscanf(str, "%s%s", name, depends);
printf("str: '%s'\n", str);
printf("Name: %s\n", name);
printf("Deps: %s\n", depends);
return 0;
}
Which gives this output:
str: 'file1.h: file2.c,file3.cpp'
Name: file1.h:
Deps: file2.c,file3.cpp
At this point, you can simply check that sscanf gave a return value of 2 (i.e. it found two values), and that the last character of name is a colon. Then just truncate name and you have your answer.
Of course, by this logic, you aren't going to be able to use sscanf to parse your depends variable into multiple strings ... which is why others are recommending using strtok, strpbrk etc because you are both parsing and tokenizing your input.
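The check described above (return value of 2, then a trailing colon on name) could be sketched like this. The helper name parse_rule and the fixed buffer size of 100 are assumptions carried over from the question's arrays, not part of any library.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parses "name: deps" without modifying line.
 * name and deps must each have room for 100 chars.
 * Returns 0 on success, -1 if the line does not have the expected shape. */
static int parse_rule(const char *line, char name[100], char deps[100])
{
    size_t len;

    /* %99s limits each conversion to 99 chars plus the terminating '\0' */
    if (sscanf(line, "%99s%99s", name, deps) != 2)
        return -1;

    len = strlen(name);
    if (len == 0 || name[len - 1] != ':')
        return -1;              /* first word did not end in a colon */

    name[len - 1] = '\0';       /* drop the trailing colon */
    return 0;
}
```

The field widths are worth keeping even in a sketch: a bare %s writes past the end of the array if a word is longer than the buffer.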
Well, I am pretty late. I do not have much knowledge on inbuilt functions in C. So I started writing a solution for you. I don't think you need this now. But, anyway here it is and modify it as per your need. If you find any bug feel free to tell.
I need to try and fix sentences from an input in c, so I tried separating tokens and making new strings and then I wanted to access the first char of each string and make it a capital letter.
Now I am having trouble understanding how to access only one char of each new string, like trying to access only 'e' in hello which is in str1[0] second char.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void main()
{
char str1[601], * str2[601];
int i = 0, j = 0;
printf_s("*************** Welcome to the text cleaner ***************\n\n");
printf_s("Please enter text:\n");
gets_s(str1, sizeof(str1));
char* sentence=NULL,*next_sentence=NULL;
sentence = strtok_s(str1,".",&next_sentence);
while (sentence != NULL)
{
printf(" %s\n", sentence);
str2[i++] = sentence;
sentence = strtok_s(NULL, ".", &next_sentence);
}
str2[i++] = '\0';
printf_s("%s", str2[1]);
}
Here is my take on what you are trying to do. I'm showing the code and the results. I have simplified your effort since you are mixing printf and printf_s. You use the _s variant for buffer overflow control. That does not seem to be your concern while simply learning about arrays.
Here is the code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void main() {
char str1[601]; // This is an array of chars.
// If storing a string, final elem is 0x0
char *str2[601]; // This is an array of pointers to char.
//
int i = 0;
int j = 0;
// I removed your _s variants of standard libraries. Let's keep
// things simple.
printf("*************** Welcome to the text cleaner ***************\n\n");
printf("Please enter text:\n");
// ditto for gets to fgets
//
// Excerpt from the manpage
//
// char *fgets(char *s, int size, FILE *stream);
//
// fgets() reads in at most one less than size characters from
// stream and stores them into the buffer pointed to by s.
// Reading stops after an EOF or a newline. If a newline is read,
// it is stored into the buffer. A terminating null byte ('\0')
// is stored after the last character in the buffer.
//
// fgets() returns s on success, and NULL on error or when end of
// file occurs while no characters have been read.
//str1 = fgets(str1, sizeof(str1), stdin);
// I would do a null check here.
if (NULL == fgets(str1, sizeof(str1), stdin)) {
return; // graceful exit
}
// Notice on the bracket print of your text, the closing >
// is shown on the next line. This is because it's capturing the
// newline character.
printf("You entered %zu chars and the text was:\n<%s>\n", strlen(str1), str1);
// These are for your strtok operation
// I would call them tokens or words, but
// whatever.
char *sentence=NULL;
char *next_sentence=NULL;
// wants to parse a string
// Excerpt from manpage
//
// char *strtok(char *str, const char *delim);
// Ahh, now I see why you named it sentence. You
// are looking for periods to separate sentences.
printf("Lets use strtok\n");
sentence = strtok(str1, ".");
while (sentence != NULL) {
printf("A sentence is:\n %s\n", sentence);
str2[i++] = sentence;
sentence = strtok(NULL, ".");
}
// So now, your individual sentences are stored
// in the array str2.
// str2[0] is the first sentence.
// str2[1] is the next sentence.
//
// To access the characters, specify a sentence and
// then specify the character.
//
// You can do the math, but do a man ascii, look at
// difference in lowercase a and uppercase A in terms
// of ascii. If it's not capitalized already, simply
// subtract that offset ('a' - 'A') or error out if not in set a-z.
//
// Here I will just make the first letter of the second
// sentence to be J.
str2[1][0] = 'J';
// Note, since the 'space' after each '.' stays in the
// sentence when you are delimiting on '.', this will have the
// effect of replacing 'space' with 'J'.
printf("Sentence two is: \n%s\n", str2[1]);
}
Here is the code in action.
*************** Welcome to the text cleaner ***************
Please enter text:
John was here. and here.
You entered 25 chars and the text was:
<John was here. and here.
>
Lets use strtok
A sentence is:
John was here
A sentence is:
and here
A sentence is:
Sentence two is:
Jand here
I hope that helps. TLDR use str2[x][y] to access a string x at character y.
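For the capitalization itself, toupper() from <ctype.h> saves doing the ASCII arithmetic by hand. A minimal sketch follows; the helper name capitalize_first is made up here, and it skips the blank left over after each '.' so that the letter gets changed rather than the space.

```c
#include <ctype.h>

/* Hypothetical helper: upper-cases the first non-space character of s in place. */
static void capitalize_first(char *s)
{
    /* skip the blank that follows the '.' of the previous sentence */
    while (*s != '\0' && isspace((unsigned char)*s))
        s++;

    /* toupper() leaves anything that is not a lowercase letter unchanged */
    if (*s != '\0')
        *s = (char)toupper((unsigned char)*s);
}
```

After the strtok loop you would call it once per stored sentence, for example for (j = 0; j < i; j++) capitalize_first(str2[j]);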
I have the user enter a username, and I then go to this file and extract the values corresponding to that particular user. I know the fault is with the way that I am using strtok, as it only works for the first user.
Once I find the user, I want to stop searching in the file.
int fd;
fd=open(fileName,O_RDONLY,0744);
if (fd==-1)
{
printf("The file userDetails.txt failed to open.\n");
exit(1);
}
int fileSize = sizeof(fileOutput)/sizeof(fileOutput[0]); //size of file
printf("%d\n",fileSize);
int bytesRead = read(fd,&fileOutput,fileSize);
//TERMINATING BUFFER PROPERLY
fileOutput[bytesRead] = '\0';
printf("%s\n",fileOutput);
//READING LINE BY LINE IN FILE
char *line;
char *data;
char *name;
char *saltValue;
char *encryptedValue;
line = strtok(fileOutput,"\n"); //SPLIT ACCORDING TO LINE
while (line != NULL)
{
data = strtok(line, ":");
while (data != NULL)
{
name = data;
if (strcmp(name,userName)==0)
{
printf("%s\n","User exists");
saltValue = strtok(NULL,":");
printf("%s\n",saltValue);
encryptedValue = strtok(NULL, ":");
printf("%s\n",encryptedValue);
break;
}
else
{
break;
}
}
if (strcmp(name,userName)==0) //user found
{
break;
}
else //user not found
{
strtok(NULL,"\n");
}
}
If you are limited to read, that's fine, but you can only use strtok once on "\n" to parse each line from fileOutput, not nested again to parse the ':'. Otherwise, since strtok modifies the string by inserting '\0' at the delimiter found, you will be writing the nul-terminating character within lines that will cause the outer strtok to consider the string finished on the next iteration.
Instead, use a single pointer on each line with strchr (line, ':') to locate the first ':' within the line, and then call strncmp() using the pointer to the start of the line and the pointer locating ':'. For example, if you have a function to check if the userName is contained in your file (returning 0 on success and 1 on failure) you could do:
...
for (char *line = strtok(fileOutput,"\n"); line; line = strtok (NULL, "\n"))
{
char *p = strchr (line, ':'); /* find first ':' */
if (!p) /* if not found, bail */
break;
if (strlen (userName) == (size_t)(p - line) && /* same length, so no prefix-only match */
strncmp (userName, line, p - line) == 0) { /* check name */
printf ("found user: %s hash: %s\n", userName, p+1);
return 0;
}
}
fputs ("user not found.\n", stdout);
return 1;
This is probably one of the simpler approaches you could take.
strtok() modifies its input string and remembers its position internally, which makes it impossible to nest: the inner loop's calls destroy the state of the outer strtok(), so the outer loop cannot continue.
Apart from this, strtok() is not adequate for your problem for another reason: if you try to use it to parse the /etc/passwd file (or one of the similarly formatted files we deal with today) you'll run into trouble with empty fields. If you have an empty field (two consecutive : chars), strtok() will skip over both delimiters, and the empty field goes completely undetected. strtok() is an old, legacy function that was written to cope with the three characters (space, \t and \n) that are used to separate arguments in the Bourne shell. In the case of /etc/passwd you need to cope with possibly empty fields, and that makes it impossible to use strtok() to parse them.
You can easily use strchr() instead to search for the : of /etc/passwd in a non-skipping way, just write something like (you can encapsulate this in a function):
char *not_strtok_but_almost(char *s, char *delim)
{
static char *p = NULL; /* this makes the function non-reentrant, like strtok() */
char *saved;
if (s)
p = s; /* first call: start scanning at s */
if (!p)
return NULL; /* no fields left */
/* the field starts here, even if it is empty */
saved = p;
/* search for the delimiter that ends the field */
while (*p && !strchr(delim, *p)) /* while *p not null, and not in the delimiter set */
p++;
/* *p is at the end of the string or p points to one of the delimiters */
if (*p)
*p++ = '\0'; /* terminate the field and step past the delimiter */
else
p = NULL; /* end of string: the next call returns NULL */
return saved;
}
This function has all the trouble of strtok(3) but you can use it (taking care of its non-reentrancy and that it modifies the source string, making it not nestable on several loops) because it doesn't skip all the separators in one shot, but just stops after the first separator found.
To solve the nesting problem, you can proceed in a different way. Let's assume you have several identifiers separated by commas inside one field (as in the /etc/group file), which would seem to require nesting. (It doesn't really: as the names field is the last one, the only outer strtok() call left after it just returns NULL.) You can process your file level first instead of depth first: first find all the fields at the first level, and then go field by field, reading the subfields (which will probably use a different delimiter).
As all of these modifications are done in the same string, there is no need to allocate a buffer for each field and strdup() the strings before use: the work can be done in place, strdup()ing the string once at the beginning if you need to store the different subfields.
Comment if you have any doubts about this (be careful, as I have not tested the routine above; it may well have a bug).
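A sketch of the level-first idea, under the assumption of an /etc/group style line. The helper split_in_place() is hypothetical (it is not a library function); it is built on strchr(), keeps empty fields, and has no hidden state, so it can be called again on any field it produced.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits s in place on every occurrence of delim,
 * keeping empty fields. Stores up to max field pointers in out and
 * returns the number stored. Fields beyond max are not split off. */
static size_t split_in_place(char *s, char delim, char **out, size_t max)
{
    size_t n = 0;

    while (n < max) {
        char *sep = strchr(s, delim);

        out[n++] = s;           /* field starts here, possibly empty */
        if (sep == NULL)
            break;              /* that was the last field */
        *sep = '\0';            /* terminate this field */
        s = sep + 1;            /* next field starts after the delimiter */
    }
    return n;
}
```

You would first split the line on ':' to get the top-level fields, and only then split the last field on ',' to get the member names. Because the first pass has already finished by the time the second one starts, nothing is nested.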
I have encountered a problem with my homework. I need to scan some data from a text file, to a struct.
The text file looks like this.
012345678;danny;cohen;22;M;danny1993;123;1,2,4,8;Nice person
223325222;or;dan;25;M;ordan10;1234;3,5,6,7;Singer and dancer
203484758;shani;israel;25;F;shaninush;12345;4,5,6,7;Happy and cool girl
349950234;nadav;cohen;50;M;nd50;nadav;3,6,7,8;Engineer very smart
345656974;oshrit;hasson;30;F;osh321;111;3,4,5,7;Layer and a painter
Each item of data to its matching variable.
id = 012345678
first_name = danny
etc...
Now I can't use fscanf because there is no spacing, and fgets scans the whole line.
I found a solution with %[^;]s, but then I would need to write one block of code and copy and paste it 9 times, once for each item of data.
Is there any other option without changing the text file, that similar to the code I would write with fscanf, if there was spacing between each item of data?
************* UPDATE **************
Hey, first of all, thanks everyone for the help, I really appreciate it.
I didn't understand all your answers, but here is something I did use.
Here's my code :
#define _CRT_SECURE_NO_WARNINGS
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct
{
char *idP, *firstNameP, *lastNameP;
int age;
char gender, *userNameP, *passwordP, hobbies, *descriptionP;
}user;
void main() {
FILE *fileP;
user temp;
char test[99];
temp.idP = (char *)malloc(99);
temp.firstNameP = (char *)malloc(99);
temp.lastNameP = (char *)malloc(99);
temp.age = (int )malloc(4);
temp.gender = (char )malloc(sizeof(char));
temp.userNameP = (char *)malloc(99);
fileP = fopen("input.txt", "r");
fscanf(fileP, "%9[^;];%99[^;];%99[^;];%d;%c", temp.idP,temp.firstNameP,temp.lastNameP,&temp.age, temp.gender);
printf("%s\n%s\n%s\n%d\n%c", temp.idP, temp.firstNameP, temp.lastNameP, temp.age, temp.gender);
fgets(test, 60, fileP); // Just testing where it stop scanning
printf("\n\n%s", test);
fclose(fileP);
getchar();
}
It all works well until I scan the int variable, right after that it doesn't scan anything, and I get an error.
Thanks a lot.
As discussed in the comments, fscanf is probably the shortest option (although fgets followed by strtok, and manual parsing are viable options).
You need to use the %[^;] specifier for the string fields (meaning: a string of characters other than ;), with the fields separated by ; to consume the actual semicolons (which we specifically requested not to be consumed as part of the string field). The last field should be %[^\n] to consume up to the newline, since the input doesn't have a terminating semicolon.
You should also (always) limit the length of each string field read with a scanf family function to one less than the available space (the terminating NUL byte is the +1). So, for example, if the first field is at most 9 characters long, you would need char field1[10] and the format would be %9[^;].
It is usually a good idea to put a single space in the beginning of the format string to consume any whitespace (such as the previous newline).
And, of course you should check the return value of fscanf, e.g., if you have 9 fields as per the example, it should return 9.
So, the end result would be something like:
if (fscanf(file, " %9[^;];%99[^;];%99[^;];%d;%c;%99[^;];%99[^;];%99[^;];%99[^\n]", // 7th field (password) is text: one sample record has "nadav"
s.field1, s.field2, s.field3, &s.field4, …, s.field9) != 9) {
// error
break;
}
(Alternatively, the field with numbers separated by commas could be read as four separate fields as %d,%d,%d,%d, in which case the count would go up to 12.)
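To make the format above concrete, here is a sketch that parses one record with sscanf() (for example after reading the line with fgets). The struct layout, field names and sizes are assumptions for illustration. The password is read as text because one of the sample records has "nadav" in that position, and the hobbies are kept as a single string.

```c
#include <stdio.h>

/* Hypothetical record layout; the field names and sizes are assumptions. */
struct user {
    char id[10];
    char first_name[100];
    char last_name[100];
    int  age;
    char gender;
    char user_name[100];
    char password[100];
    char hobbies[100];      /* kept as text, e.g. "1,2,4,8" */
    char description[100];
};

/* Returns 0 if all nine fields were converted, -1 otherwise. */
static int parse_user(const char *line, struct user *u)
{
    int n = sscanf(line,
                   " %9[^;];%99[^;];%99[^;];%d;%c;%99[^;];%99[^;];%99[^;];%99[^\n]",
                   u->id, u->first_name, u->last_name,
                   &u->age, &u->gender,    /* scalars need the & */
                   u->user_name, u->password, u->hobbies, u->description);

    return n == 9 ? 0 : -1;
}
```

Note that the arrays are passed as they are, while age and gender are passed by address. Passing temp.gender instead of &temp.gender, as in the updated question, hands sscanf a char value where it expects a pointer.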
Here is a simple tokenizer. As I see it, you have more than one delimiter here (; and ,).
str - string to be tokenized
del - string containing delimiters (in your case ";," or ";" only)
allowempty - if true allows empty tokens if there are two or more consecutive delimiters
return value is a NULL terminated table of pointers to the tokens.
char **mystrtok(const char *str, const char *del, int allowempty)
{
char **result = NULL;
const char *end = str;
size_t size = 0;
int extrachar;
while(*end)
{
if((extrachar = !!strchr(del, *end)) || !*(end + 1))
{
/* add temp variable and malloc / realloc checks */
/* free allocated memory on error */
if(!(!allowempty && !(end - str)))
{
extrachar = !extrachar * !*(end + 1);
result = realloc(result, (++size + 1) * sizeof(*result));
result[size] = NULL;
result[size -1] = malloc(end - str + 1 + extrachar);
strncpy(result[size -1], str, end - str + extrachar);
result[size -1][end - str + extrachar] = 0;
}
str = end + 1;
}
end++;
}
return result;
}
To free the memory allocated by the tokenizer:
void myfree(char **ptr)
{
char **savedptr = ptr;
while(*ptr)
{
free(*ptr++);
}
free(savedptr);
}
The function is simple, but you can use any separators and any number of separators.
I have tried this code to separate my Str[] string into 2 strings, but my problem is that I want to get John (the name) as a string and 100 (the marks) as an integer. How can I do it? Any suggestions?
#include <stdio.h>
#include <string.h>
void main()
{
char Str[] = "John,100";
int i, j, xchange;
char name[50];
char marks[10];
j = 0; xchange = 0;
for(i=0; Str[i]!='\0'; i++){
if(Str[i]!=',' && xchange!=-1){
name[i] = Str[i];
}else{
xchange = -1;
}
if(xchange==-1){
marks[j++] = Str[i+1];
}
}
printf("Student name is %s\n", name);
printf("Student marks is %s", marks);
}
How to separate "John,100" into 2 strings?
There are three common approaches:
Use strtok() to split the string into individual tokens. This will modify the original string, but is quite simple to implement:
int main(void)
{
char line[] = "John,100;passed";
char *name, *score, *status;
/* Detach the initial part of the line,
up to the first comma, and set name
to point to that part. */
name = strtok(line, ",");
/* Detach the next part of the line,
up to the next comma or semicolon,
setting score to point to that part. */
score = strtok(NULL, ",;");
/* Detach the final part of the line,
setting status to point to it. */
status = strtok(NULL, "");
Note that if you change char line[] = "John,100"; then status will be NULL, but the code is otherwise safe to run.
So, in practice, if you required all three fields to exist in line, it would be sufficient to ensure the last one was not NULL:
if (!status) {
fprintf(stderr, "line[] did not have three fields!\n");
return EXIT_FAILURE;
}
Use sscanf() to convert the string. For example,
char line[] = "John,100";
char name[20];
int score;
if (sscanf(line, "%19[^,],%d", name, &score) != 2) {
fprintf(stderr, "Cannot parse line[] correctly.\n");
return EXIT_FAILURE;
}
Here, the 19 is the maximum number of chars stored in name (one more is always reserved for the end-of-string nul char, '\0', which is why name has 20 elements), and [^,] is a string conversion, consuming everything except a comma. %d converts an int. The return value is the number of successful conversions.
This approach does not modify the original string, and it allows you to try a number of different parsing patterns; as long as you try them the most complex one first, you can allow multiple input formats with very little added code. I do this regularly when taking 2D or 3D vectors as inputs.
The downside is that sscanf() (all functions in the scanf family) ignores overflow. For example, on 32-bit architectures, the largest int is 2147483647, but scanf functions will happily convert e.g. 9999999999 to 1410065407 (or some other value!) without returning an error. You can only assume the numerical inputs are sane and within the limits; you cannot verify.
Use helper functions to tokenise and/or parse the string.
Typically, the helper functions are something like
char *parse_string(char *source, char **to);
char *parse_long(char *source, long *to);
where source is a pointer to the next character in the string to be parsed, and to is a pointer to where the parsed value will be stored; or
char *next_string(char **source);
long next_long(char **source);
where source is a pointer to a pointer to the next character in the string to be parsed, and the return value is the value of the extracted token.
These tend to be longer than above, and if written by me, then quite paranoid about the inputs they accept. (I want my programs to complain if their input cannot be reliably parsed, rather than silently produce garbage.)
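As an illustration of the first helper style, here is one possible parse_long() built on strtol(), which, unlike the scanf family, reports out-of-range input through errno. This is only a sketch under the signature given above; the exact return convention (NULL on failure) is an assumption.

```c
#include <errno.h>
#include <stdlib.h>

/* Sketch of a parse_long() helper: converts the number at source into *to.
 * Returns a pointer to the first unparsed character, or NULL if there is
 * no number there or it does not fit in a long. */
static char *parse_long(char *source, long *to)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(source, &end, 10);
    if (end == source)
        return NULL;            /* no digits at all */
    if (errno == ERANGE)
        return NULL;            /* out of range for long */

    *to = value;
    return end;                 /* caller checks which separator follows */
}
```

The caller can then look at the character the returned pointer refers to (a comma, a semicolon, or the end of the string) to decide what to parse next.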
If the data is some variant of CSV (comma-separated values) read from a file, then the proper approach is a different one: instead of reading the file line by line, you read the file token by token.
The only "trick" is to remember the separator character that ended the token (you can use ungetc() for this), and use a different function to (read and ignore the rest of the tokens in the current record, and) consume the newline separator.
I want to take input of a particular part of a string like
"First (helloWorld): last"
From that string I want to take input only "helloWorld" by regular expression. I am using
%*[^(] (%s):"
But that does not serve my purpose. Please somebody help me to solve this problem.
The format specifiers in the scanf family of functions are not generally considered to be a species of regular expression.
However, you can do what you want something like this.
#include <stdio.h>
int main() {
char str[256];
sscanf("First (helloWorld): last", "%*[^(](%255[^)]%*[^\n]", str);
printf("%s\n", str);
return 0;
}
%*[^(] read and discard everything up to opening paren
( read and discard the opening paren
%[^)] read and store up to (but not including) the closing paren
%*[^\n] read and discard up to (but not including) the newline
The last format specifier is not necessary in the context of the above sscanf, but would be useful if reading from a stream and you want it positioned at the end of the current line for the next read. Note that the newline is still left in the stream, though.
Rather than use fscanf (or scanf) to read from a stream directly, it's pretty much always better to read a line with fgets and then extract the fields of interest with sscanf.
// Read lines, extracting the first parenthesized substring.
#include <stdio.h>
int main() {
char line[256], str[128];
while (fgets(line, sizeof line, stdin)) {
if (sscanf(line, "%*[^(](%127[^)]", str) == 1) // skip lines without a parenthesized part
printf("|%s|\n", str);
}
return 0;
}
Sample run:
one (two) three
|two|
four (five) six
|five|
seven eight (nine) ten
|nine|
Sorry, no true regular expression parser in standard C.
Using the format in the scanf() family is not a full-fledged regular expression, but can do the job. "%n" tells sscanf() to save the current scanning offset.
#include <stdio.h>
#include <stdlib.h>
char *foo(char *buf) {
#define NotOpenParen "%*[^(]"
#define NotCloseParen "%*[^)]"
int start;
int end = 0;
sscanf(buf, NotOpenParen "(%n" NotCloseParen ")%n", &start, &end);
if (end == 0) {
return NULL; // End never found
}
buf[end-1] = '\0';
return &buf[start];
}
// Usage example
char buf[] = "First (helloWorld): last";
printf("%s\n", foo(buf));
But this approach fails on "First (): last". More code would be needed.
A pair of strchr() calls is better.
char *foo(char *buf) {
char *start = strchr(buf, '(');
if (start == NULL) {
return NULL; // start never found
}
char *end = strchr(start, ')');
if (end == NULL) {
return NULL; // end never found
}
*end = '\0';
return &start[1];
}
Otherwise, one needs a solution that is not part of the C standard.
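If the input must stay untouched (a string literal, for instance), the same pair of strchr() calls can copy the text out instead of writing a '\0' into it. A sketch, with the hypothetical name extract_parens:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the text between the first '(' and the next ')'
 * into out (at most outsize - 1 chars). Returns 0 on success, -1 if the
 * parentheses are missing or the text does not fit. The input is not modified. */
static int extract_parens(const char *s, char *out, size_t outsize)
{
    const char *start = strchr(s, '(');
    const char *end;
    size_t len;

    if (start == NULL)
        return -1;
    start++;                    /* skip the '(' itself */

    end = strchr(start, ')');
    if (end == NULL)
        return -1;

    len = (size_t)(end - start);
    if (len >= outsize)
        return -1;              /* would not fit together with the '\0' */

    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

This variant also handles "First (): last", yielding an empty string rather than failing.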