splitting string in C

I have a file where each line looks like this:
cc ssssssss,n
where the first two 'c's are individual characters (possibly spaces), followed by a space; the 's's are a string that is 8 or 9 characters long, followed by a comma and then an integer.
I'm really new to C and I'm trying to figure out how to put this into 4 separate variables per line (each of the first two characters, the string, and the number).
Any suggestions? I've looked at fscanf and strtok but I'm not sure how to make them work for this.
Thank you.

I'm assuming this is a C question, as the question suggests, not C++ as the tags perhaps suggest.
Read the whole line in.
Use strchr to find the comma.
Do whatever you want with the first two characters.
Switch the comma for a zero, marking the end of a string.
Call strcpy from the fourth character on to extract the sssssss part.
Call atoi on one character past where the comma was to extract the integer.

A string is a sequence of characters that ends at the first '\0'. Keep this in mind. What you have in the file you described isn't a string.
I presume n is an integer that could span multiple decimal digits and could be negative. If that's the case, I believe the format string you require is "%2[^ ] %9[^,\n],%d". You'll want to pass fscanf the following expressions:
Your FILE *,
The format string,
An array of 3 chars silently converted to a pointer,
An array of 10 chars (9 plus the terminating '\0') silently converted to a pointer,
... and a pointer to int.
Store the return value of fscanf into an int. If fscanf returns negative, you have a problem such as EOF or some other read error. Otherwise, fscanf tells you how many objects it assigned values into. The "success" value you're looking for in this case is 3. Anything else means incorrectly formed input.
I suggest reading the fscanf manual for more information, and/or for clarification.

The fscanf function is very powerful and can be used to solve your task:
We need to read two chars - the format is "%c%c".
Then skip a space (just add it to the format string) - "%c%c ".
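Then read a string until we hit a comma. Don't forget to specify the maximum string size. So, the format is "%c%c %10[^,]". 10 is the maximum number of characters to read (the array needs one more byte for the '\0'). [^,] is a scan set, the list of allowed characters; the leading ^ negates it, so it means any character except a comma.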
Then skip a comma - "%c%c %10[^,],".
And finally read an integer - "%c%c %10[^,],%d".
The last step is to be sure that all 4 tokens are read: check the fscanf return value.
Here is the complete solution:
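#include <stdio.h>

int main(void)
{
    FILE *f = fopen("input_file", "r");
    if (f == NULL)
        return 1; // could not open the file
    char c1 = 0;
    char c2 = 0;
    char str[11] = {0}; // an empty {} initializer is only valid C from C23 on
    int d = 0;
    int ch;
    // Loop on the fscanf result instead of !feof(f): feof() only becomes
    // true after a read has already failed
    while (4 == fscanf(f, "%c%c %10[^,],%d", &c1, &c2, str, &d))
    {
        // successfully got 4 values from the file

        // skip the rest of the line, including its '\n', so that the next
        // %c%c starts at the first character of the following line
        while ((ch = fgetc(f)) != '\n' && ch != EOF)
            ;
    }
    fclose(f);
    return 0;
}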

Related

read 2 bytes in hexadecimal base and convert into decimal using C language fscanf

Well, as said, I'm using the C language and fscanf for this task, but it seems to make the program crash each time, so surely I did something wrong here. I haven't dealt a lot with this type of input read, so even after reading several topics here I still can't find the right way. I have this array to read the 2 bytes:
char p[2];
and this line to read them. Of course, fopen was called earlier with file pointer fp. I used "rb" as the read mode, but tried other options too when I noticed this was crashing; I'm just saving space and focusing on the trouble itself.
fscanf(fp,"%x%x",p[0],p[1]);
Later, to convert into decimal, I have this line (if it's not the EOF that we reached):
v = strtol(p, 0, 10);
v is a mere integer to store the final value we are seeking. But the program keeps crashing when scanf is called, or I think that's the case. I'm not compiling to console, so it's a pity that I can't output what has been done and what hasn't, but in the debugger it seems to crash there.
Well, I hope you can help me out with this. I'm a bit lost regarding this type of read/conversion; any clue will help me greatly, thanks =).
PS: I forgot to add that this is not homework. A friend wants to make some file conversion for a game and this code will manipulate the files needed alone, so while I could be using any language or environment for this, I always feel better in C.
char strings in C are really called null-terminated byte strings. That null-terminated part is important, as it means a string of two characters needs space for three characters to include the null-terminator character '\0'. Not having the terminator means string functions will go out of bounds in their search for it, leading to undefined behavior.
Furthermore, the "%x" format reads a hexadecimal integer and stores it in an unsigned int, so the matching argument must be an unsigned int *. Mismatching format specifiers and arguments leads to undefined behavior.
Lastly, and probably what's causing the crash: the scanf family of functions expects pointers as arguments. Not providing pointers will again lead to undefined behavior.
There are two solutions to the above problems:
Going with code similar to what you already use, first of all you must make space for the terminator in the array. Then you need to read two characters. Lastly you need to add the terminator:
char p[3] = { 0 }; // String for two characters, initialized to zero
// The initialization means that we don't need to explicitly add the terminator
// Read two characters, skipping possible leading white-space
fscanf(fp, " %c%c", &p[0], &p[1]); // pass the addresses, not the values
// Now convert the string to an integer value
// The string is in base-16 (two hexadecimal characters)
v = strtol(p, 0, 16);
Read the hexadecimal value into an integer directly:
unsigned int v;
fscanf(fp, "%2x", &v); // Read as hexadecimal
The second alternative is what I would recommend. It reads two characters, parses them as a hexadecimal value, and stores the result in the variable v. It's important to note that the value in v is stored in binary! Hexadecimal, decimal or octal are just presentation formats; internally the computer still stores it as binary ones and zeros (which is true for the first alternative as well). To print it as decimal use e.g.
printf("%u\n", v); // v is an unsigned int, so %u
You need to pass to fscanf() the address of the variable(s) to scan into.
Also, the conversion specifier needs to suit the variable provided. In your case those are chars. x expects a pointer to unsigned int; to scan into a char, use the appropriate length modifier, h twice here:
fscanf(fp, "%hhx%hhx", &p[0], &p[1]);
strtol() expects a C-string as 1st parameter.
What you pass isn't a C-string, as a C-string ought to be 0-terminated, which p isn't.
To fix this you could do the following:
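char p[3];
fscanf(fp, "%c%c", &p[0], &p[1]); // pass addresses; read the two characters as text
p[2] = '\0';                       // now p is a C-string
long v = strtol(p, 0, 16);         // the characters are hexadecimal digits, so base 16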

Using fscanf to get 2 string in 2 columns

I'm trying to make my program read a file with 2 columns, where the first column contains some strings, and I can't make it store them into an array.
Here is my code:
fp2=fopen("Symbol Table.txt","r");char str[100];
while(fscanf(fp2,"%s %s",str,stemp[scnt])!=NULL) {
puts(stemp[scnt++]);getch(); //This is just here to display conents of second col
}
fclose(fp2)
and here is my txt file:
void void
main Main
( Left Parenthesis
) Right Parenthesis
{ Left Brace
S Identifier
: Colon
$% Start of Block Comment
This program is a simple calculatorFuctions:ADD,SUB,MULT,DIV String
%$ End of Block Comment
Unsigned Noise Words
int Integer
The code splits the long strings into pieces before they go into the array.
When you read strings with %s in scanf or fscanf, it reads the string until it encounters a space, a tab, or a newline. In this case, as I can see, in the first loop str and stemp will be assigned the strings "void" and "void", in the second loop "main" and "Main", in the third loop "(" and "Left", and so on.
You need to separate the two columns with a special character, such as the tab (\t) character, so that you know where the first column ends. Then read with the getc function instead of scanf until you reach that special character in your line, and combine all the characters (i.e. letters) into a string. You can use the same method to read the second column with getc until you encounter the newline (\n).
while(fscanf(fp2,"%s %s",str,stemp[scnt])!=NULL) {
is not going to work
"%s" scans for non- white-space text. "This program is" and "Left Parenthesis" have embedded white-space.
You need to read a line at a time and then parse the line into 2 strings for the 2 columns.
char buf[100];
if (fgets(buf, sizeof buf, fp2) == NULL) Handle_EOForIOError(); // placeholder for your own error handling
char column[2][100];
size_t len = strlen(buf);
if (len < 62) Handle_ShortLine(); // placeholder: line too short to hold both columns
buf[61] = '\0'; // cut the line in 2
column[0][0] = '\0'; // In case column is all white-space
sscanf(buf, " %99[^\n]", column[0]);
column[1][0] = '\0';
sscanf(&buf[62], " %99[^\n]", column[1]);
Other answers contain statements which are not true. Actually,
fscanf does not always read strings only until a white-space character. The conversion specifier [ also reads a string, and includes the set of expected characters.
You do not need to separate the two columns with a special character, since the columns are of known width 62.
You do not need to read a line at a time and then parse it; though, that might be easier.
So, if you prefer fscanf, you can use
while (fscanf(fp2, "%62[^\n]%[^\n]\n", str, stemp[scnt]) == 2)
(your comparing with NULL is wrong, because NULL is a pointer constant and fscanf doesn't return a pointer but the number of input items assigned or EOF).
A version without fscanf:
while (fgets(str, 62+1, fp2) && fgets(stemp[scnt], 62+1, fp2) && getc(fp2))
(The getc reads the \n.)
The nature of while(fscanf(fp2,"%s %s",str,stemp[scnt])!=NULL) is that each call reads only two strings separated by white space. If your text file contains long strings like this
This program is a simple calculatorFuctions:ADD,SUB,MULT,DIV
then reading will split them into separate white-space-delimited pieces, two per call. So change your text file in the following way (no spaces inside a column) and try:
void void
main Main
( LeftParenthesis
) RightParenthesis
{ LeftBrace
S Identifier
: Colon
$% StartofBlockComment
ThisProgramIsASimpleCalculatorFuctions:ADD,SUB,MULT,DIV String
%$ EndofBlockComment
Unsigned NoiseWords
int Integer

Sscanf not returning what I want

I have the following problem:
sscanf is not returning the way I want it to.
This is the sscanf:
sscanf(naru,
"%s[^;]%s[^;]%s[^;]%s[^;]%f[^';']%f[^';']%[^;]%[^;]%[^;]%[^;]"
"%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]"
"%[^;]%[^;]%[^;]%[^;]%[^;]%[^;]",
&jokeri, &paiva1, &keskilampo1, &minlampo1, &maxlampo1,
&paiva2, &keskilampo2, &minlampo2, &maxlampo2, &paiva3,
&keskilampo3, &minlampo3, &maxlampo3, &paiva4, &keskilampo4,
&minlampo4, &maxlampo4, &paiva5, &keskilampo5, &minlampo5,
&maxlampo5, &paiva6, &keskilampo6, &minlampo6, &maxlampo6,
&paiva7, &keskilampo7, &minlampo7, &maxlampo7);
The string it's scanning:
const char *str = "city;"
"2014-04-14;7.61;4.76;7.61;"
"2014-04-15;5.7;5.26;6.63;"
"2014-04-16;4.84;2.49;5.26;"
"2014-04-17;2.13;1.22;3.45;"
"2014-04-18;3;2.15;3.01;"
"2014-04-19;7.28;3.82;7.28;"
"2014-04-20;10.62;5.5;10.62;";
All of the variables are declared as char paiva1[22] etc.; however, sscanf isn't storing anything except the city correctly. I've been trying to stop each variable at ;.
Any help how to get it to store the dates etc correctly would be appreciated.
Or if there's a smarter way to do this, I'm open to suggestions.
There are multiple problems, but BLUEPIXY hit the first one — the scan-set notation doesn't follow %s.
Your first line of the format is:
"%s[^;]%s[^;]%s[^;]%s[^;]%f[^';']%f[^';']%[^;]%[^;]%[^;]%[^;]"
As it stands, it looks for a space separated word, followed by a [, a ^, a ;, and a ] (which is self-contradictory; the character after the string is a space or end of string).
The first fixup would be to use scan-sets properly:
"%[^;]%[^;]%[^;]%[^;]%f[^';']%f[^';']%[^;]%[^;]%[^;]%[^;]"
Now you have a problem that the first %[^;] scans everything up to the end of string or first semicolon, leaving nothing for the second %[^;] to match (it immediately sees the semicolon). The semicolons have to be matched explicitly:
"%[^;]; %[^;]; %[^;]; %[^;]; %f[^';']%f[^';']%[^;]%[^;]%[^;]%[^;]"
This looks for a string up to a semicolon, then for the semicolon, then optional white space, then repeats for three items. Apart from adding a length to limit the size of string, preventing overflow, these are fine. The %f is OK. The following material looks for an odd sequence of characters again.
However, when the data is looked at, it seems to consist of a city, and then seven sets of 'a date plus three numbers'.
You'd do better with an array of structures (if you've worked with those yet), or a set of 4 parallel arrays, and a loop:
char jokeri[30];
char paiva[7][30];
float keskilampo[7];
float minlampo[7];
float maxlampo[7];
int eoc; // End of conversion
int offset = 0;
char sep;
if (sscanf(str + offset, "%29[^;]%c%n", jokeri, &sep, &eoc) != 2 || sep != ';')
...report error...
offset += eoc;
for (int i = 0; i < 7; i++)
{
if (sscanf(str + offset, "%29[^;];%f;%f;%f%c%n", paiva[i],
&keskilampo[i], &minlampo[i], &maxlampo[i], &sep, &eoc) != 5 ||
sep != ';')
...report error...
offset += eoc;
}
See also How to use sscanf() in loops.
Now you have data that can be managed. The set of 29 separately named variables is a ghastly thought; the code using them will be horrid.
Note that the scan-set conversion specifications limit the string to a maximum length one shorter than the size of jokeri and the paiva array elements.
You might legitimately be wondering about why the code uses %c%n and &sep before &eoc. There is a reason, but it is subtle. Suppose that the sscanf() format string is:
"%29[^;];%f;%f;%f;%n"
Further, suppose there's a problem in the data that the semicolon after the third number is missing. The call to sscanf() will report that it made 4 successful conversions, but it doesn't count the %n as an assignment, so you can't tell that sscanf() didn't find a semicolon and therefore did not set eoc at all; the value is left over from a previous call to sscanf(), or simply uninitialized. By using the %c to scan a value into sep, we get 5 returned on success, and we can be sure the %n was successful too. The code checks that the value in sep is in fact a semicolon and not something else.
You might want to consider a space before the semi-colons, and before the %c. They'll allow some other data strings to be converted that would not be matched otherwise. Spaces in a format string (outside a scan-set) indicate where optional white space may appear.
I would use the strtok function to break your string into pieces using ; as a delimiter. Such a long format string may be a source of problems in the future.

moving elements of array in c, is there a better option

I am reading input from a file (line by line). Each line is the state of a game board. Below is an example of the input:
(8,7,1,0,0,0,b,b,b,b,b,b,b,b,b,b,b,b,s,s,r,r,g,b,r,g,r,r,r,r,b,r,r,s,b,b,b,b,r,s,s,r,b,b,r,s,s,s,r,b,g,b,r,r,r,r,r,r,r,r,r,s) 0
I have used fgets() and strtok() to split the string at (),
My problem:
I want the first 6 integers in their individual variables such as:
int column = 8
int row = 7
so on..
I want to get rid of the last integer at the end of the input (0)
and the chars should be stored in an array, because they represent pieces of a board.
Right now, I have an array with all the integers and chars stored together.
I can iterate through my array, and copy the integers to their variables and then chars to a new array. But that's inefficient.
Is there another way to do it?
I used fscanf() but don't know how to split the string using delimiters.
Thanks
WELL-FORMED INPUT ONLY
if (fscanf(FILE_PTR, " (%d,%d,...,%c,%c,%c,...,%c) %*d", &column, &row, ..., &chars[0], &chars[1], ...) == 62) // 6 ints + 56 chars in the sample line; the leading space skips the previous line's newline
or something like that
the %*d specifier will discard that input (you didn't want the last number)
for the chars, give pointers to the elements of a preallocated array
for the ints, give the variable pointer/ref
Thank you to Jon Leffler for the reminder that you should test the return value of *scanf (the number of things read)!
More information
int fscanf ( FILE * stream, const char * format, ... );
format: C string that contains a sequence of characters that control how characters extracted from the stream are treated:
Whitespace character: the function will read and ignore any whitespace characters encountered before the next non-whitespace character (whitespace characters include spaces, newline and tab characters -- see isspace). A single whitespace in the format string validates any quantity of whitespace characters extracted from the stream (including none).
Non-whitespace character, except format specifier (%): Any character that is not either a whitespace character (blank, newline or tab) or part of a format specifier (which begin with a % character) causes the function to read the next character from the stream, compare it to this non-whitespace character and if it matches, it is discarded and the function continues with the next character of format. If the character does not match, the function fails, returning and leaving subsequent characters of the stream unread.
Format specifiers: A sequence formed by an initial percentage sign (%) indicates a format specifier, which is used to specify the type and format of the data to be retrieved from the stream and stored into the locations pointed by the additional arguments.
The quote above is from cplusplus.com. I am aware of the hostility towards cplusplus.com here, but I do not have access to the standard; please feel free to edit if you do.
I have used fgets() and strtok() to split the string at "()"
later
I used fscanf() but don't know how to split the string using delimiters.
I guess if strtok() worked for parentheses, it would work for commas too.
Apart from that: you have several possibilities for doing what you want. Without much context provided, I can't really tell you which one you want, but here we go:
Grab a pointer to the first non-integer, and use it as if it was a pointer to the first element of another array, containing only the board characters. This avoids all copying and/or moving overhead.
Use memcpy() to copy only the necessary parts of the array to another array. memcpy() is generally highly optimized and faster than the naive for-loop-with-assignment approach.
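If you have a char *, you can think of it as an array or as a string, as the memory layout is the same:
#include <assert.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    const char * input = "(8,7,1,0,0,0,b,b,b,b,b,b,b,b,b,b,b,b,s,s,r,r,g,b,r,g,r,r,r,r,b,r,r,s,b,b,b,b,r,s,s,r,b,b,r,s,s,s,r,b,g,b,r,r,r,r,r,r,r,r,r,s) 0";
    size_t len = strlen(input);
    size_t currentIndex = 0;
    char * output = calloc(1, len); // zero-filled, so output stays '\0'-terminated
    for (size_t i = 0; i < len; i++)
    {
        if (input[i] == '(' || input[i] == ')' || input[i] == ',' || input[i] == ' ') {
            continue; // skip the punctuation, keep the tokens
        }
        output[currentIndex++] = input[i];
    }
    assert(strlen(output) == 63); // well formatted? 6 integers + 56 cells + trailing number
    // The six integers are single digits in this input, so each is one character
    int column = output[0] - '0';
    int row = output[1] - '0';
    // output[2] .. output[5] hold the other four integers
    char (* board)[56] = malloc(56); // pointer to an array of 56 cells; not a string
    memcpy(board, output + 6, 56);
    char last = output[62]; // the trailing number you want to drop
    free(board);
    free(output);
    return 0;
}
The main thing that I would add: if you want to use it more like a string, then you have to make the array 1 bigger (57 bytes) and set (*board)[56] = '\0';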

Invalid output with fscanf()

The language I am using is C
I am trying to scan data from a file, and the code segment is like:
char lsm;
long unsigned int address;
int objsize;
while(fscanf(mem_trace,"%c %lx,%d\n",&lsm,&address,&objsize)!=EOF){
printf("%c %lx %d\n",lsm,address,objsize);
}
The file I read from begins with the following lines:
S 00600aa0,1
I 004005b6,5
I 004005bb,5
I 004005c0,5
S 7ff000398,8
The results that show in stdout is:
8048350 134524916
S 600aa0 1
I 4005b6 5
I 4005bb 5
I 4005c0 5
S 7ff000398,8
Obviously, the results have an extra line that comes from nowhere. Does anybody know how this could happen?
Thx!
This works for me on the data you supply:
#include <stdio.h>
int main(void)
{
char lsm[2];
long unsigned int address;
int objsize;
while (scanf("%1s %lx,%d\n", lsm, &address, &objsize) == 3)
printf("%s %9lx %d\n", lsm, address, objsize);
return 0;
}
There are multiple changes. The simplest and least consequential is the change from fscanf() to scanf(); that's for my convenience.
One important change is the type of lsm from a single char to an array of two characters. The format string then uses %1s, which reads one character (plus a terminating NUL '\0') into the string, but it also (and this is crucial) skips leading blanks.
Another change is the use of == 3 instead of != EOF in the condition. If something goes wrong, scanf() returns the number of successful matches. Suppose that it managed to read a letter but what followed was not a hex number; it would return 1 (not EOF). Further, it would return 1 on each iteration until it could find something that matched a hex number. Always test for the number of values you expect.
The output format was tidied up with the %9lx. I was testing on a 64-bit system, so the 9-digit hex converts fine. One problem with scanf() is that if you get an overflow on a conversion, the behaviour is undefined.
Output:
S 600aa0 1
I 4005b6 5
I 4005bb 5
I 4005c0 5
S 7ff000398 8
Why did you get the results you got?
The first conversion read a space into lsm, but then failed to convert S into a hex number, so it was left behind for the next cycle. So, you got the left-over garbage printed in the address and object size columns. The second iteration read the S and was then in synchrony with the data until the last line. The newline at the end of the format (like any other white space in the format string) eats white space, which is why the last line worked despite the leading blank.
The C standard's description of fscanf says:
A directive that is a conversion specification defines a set of matching input sequences, as described below for each specifier. A conversion specification is executed in the following steps:
Input white-space characters (as specified by the isspace function) are skipped, unless the specification includes a [, c, or n specifier.
An input item is read from the stream, unless the specification includes an n specifier.
[...]
The first time you call fscanf, your %c reads the first blank space in the file. The white-space character after %c in your format reads zero or more white-space characters, this time zero of them. Your %lx fails to match the S character in the file, so fscanf returns. You don't check the result, so your variables still contain whatever values they had before.
The second time you call fscanf, your %c reads the first S character in the file. From that point on, everything else succeeds too.
Added in editing, here is the simplest change to your format string to solve your problem:
" %c %lx,%d\n"
The space at the beginning will read zero or more characters of white-space and then %c will read the first non-white-space character in the file.
Here is another format string that will also solve your problem:
" %c %lx,%d"
The reason is that if you read and discard zero or more white-space characters twice in a row, the result is the same as doing it just once.
I think that fscanf reads the first character (a space) into lsm, then fails to read address and objsize because the rest of the line no longer matches the format.
Then it prints a space, then whatever happened to be in address and objsize when they were declared (they are uninitialized).
EDIT--
fscanf consumes the white space after each call (the \n at the end of your format does that); if you call ftell you'll see it:
printf("%c %lx %d %ld\n",lsm,address,objsize,ftell(mem_trace));
