how to write out a tokenized string

how to write out a tokenized string - c

I am quite new to C and any examples I have found of my problem didn't seem to work, or I completely misunderstood what that solution was. I have a large file with data that looks like:
LYS 24L HB2 45.212 39.585 124.457 SC0 0.145 -0.795 0.585 0.157
on each line. I have tokenized the data already using strtok. What I need is from the second field, I wish to have the 24 stored as an integer for comparison, and the L to be stored as a char for comparison as well.
I tried using
sscanf(token[1], "%d%s", number, letter);
but I keep getting Segmentation fault error. Also upon further experimenting with sscanf i tried simply to print out "LYS" (in attempt to further understand my problem) however my program would only print L using the following command:
sscanf(token[0], "%c", &stemp);
letter = stemp;
printf("%c \n", letter);
However if change %c ---> %s (hoping to obtain the whole string) then I obtain the Segmentation fault error once again ... Is there something I do not understand about the sscanf command? Why can i not read in a full string?? Thank you in advance for your time and help!!
Paul

I suspect the issue is actually that number and letter are of type int and char, respectively. scanf() needs addresses of memory locations in which to store the values, not the variables themselves; i.e.,
int number;
char letter[2];
sscanf(token[1], "%d%s", &number, letter);
I've made letter into an array of two characters, and am passing the address of the array; that matches the %s scan conversion that you've used.

Related

segmentation fault with c program

The program is getting input from a text file and then parsing them by space and then storing them in an array. I'm using the "<" in linux to read in the file.I needed to create a dynamically allocated array to store the words
if the input strings are "I am the cat in the hat"
then the array should look like:
index:
"I" "am" "the" "cat" "in" "the" "hat"
0 1 2 3 4 5 6
I am getting a segmentation fault and I can't figure out why I am getting the error? I need to Any help would be appreciated.
code:
#include <stdio.h>
#include <time.h>
#include <stdlib.h>
int main()
{
int wordCount = 1, i;
char ch,* wordsArray;
while((ch = getc(stdin)) != EOF)
{
if(ch == ' ') {++wordCount;}
}
wordsArray =(char*)malloc(wordCount * sizeof(char));
for(i = 0; i<wordCount; i++)
{
scanf("%s",&wordsArray[i]);
printf("%s\n", wordsArray[i])
}
return 0;
}
Edit: I have posted the actual code. I am no longer getting segmentation falls, but the words are not going in the array.
example output:
a.out<dict.txt
(null)
(null)
(null)
(null)
(null)
(null)
(null)

Briefly, you're asking scanf to scan strings into memory space that you've only allocated big enough for a few integers. Observe that your malloc call is given a number of ints to allocate and then you're assigning that to a char *. Those two things don't mix, and what's happening is you're scanning a bunch of bytes into your buffer that is too small, running off the end, and stomping on something else, then crashing. The compiler is letting you get away with this because both malloc and scanf are designed to work with various types (void*), so the compiler doesn't know to enforce the type consistency in this case.
If your goal is to save the word tokens from the input string into an array of words, you can do that but you need to allocate something quite different up front and manage it a bit differently.
But if your goal is simply to produce the output you want, there are actually simpler ways of doing it, like with using strtok or just looping through the input char by char and tracking where you hit spaces, etc.

What you're trying to accomplish is parsing standard input. Some functions are available to assist you, such as strtok() and strsep().
You will need an input buffer, which you can mallocate or use an array (I would use an array). As the functions parse the input string, you can fling off the words' addresses into a char pointer array. You could also build a linked list.
Finally, you must buffer input, which adds some delicious complexity to the operation. Still, it's nothing a good C programmer can't handle. Have fun.

Why can i not use %s instead of %c?

The whole function the question is about is about giving a two dimensional array initialized with {0} as output and making a user able to move a 1 over the field with
char wasd;
scanf("%c", &wasd);
(the function to move by changing the value of the variable wasd is not important i think)
now my question is why using
scanf("%s", &wasd);
does only work partly(sometimes the 1 keeps being at a field and appears a 2nd time at the new place though it actually should be deleted)
and
scanf("%.1s", &wasd);
leads to the field being printed out without stop until closing the execution program. I came up with using %.1s after researching the difference between %c and %s here Why does C's printf format string have both %c and %s?? If one can figure out the answer by reading through that, i am not clever or far enough with c learning to get it.
I also found this fscanf() in C - difference between %s and %c but i do not know anything about EOF which one answer says is the cause of the problem so i would prefer getting an answer without it.
Thank you for an answer

Simple as that, %s is the conversion for a (non-empty) string. A string in C always ends with a 0 byte, so any non-empty string needs at least two bytes. If you pass a pointer to a single char variable, scanf() will just overwrite whatever is in memory after that variable -- you cause undefined behavior and anything can happen.
Side note, scanf("%s", ..), even if you give it an array of char, will always overflow the buffer if something longer is entered, therefore causing undefined behavior. You have to include a field width like
char str[10];
scanf("%9s", str);
Best is not to use scanf() at all. For your single character input, you can just use getchar() (be aware it returns an int). You might also want to read my beginners' guide away from scanf.

A char variable can hold only one byte of memory to hold a single character. But a string (array of characters) is different from a char variable as it is always ended with a null character \0 or numeric 0. So in scanf you specifically mentioned whether you are reading a character or a string so that scanf can add a null character at the end of a string. So you are not suppose to use a %s to read a value for a char variable

Why in C, when reading files, do we have to use a character array?

I've literally just started programming in C. Coming from a little understanding of Python.
Just had a lecture on C, the lecture was about this:
#include <stdio.h>
int main() {
FILE *file;
char name[10], degree[5];
int mark;
file = fopen("file.txt", "r");
while (fscan(file("%s %s %d", name, degree, &mark) != EOF);
printf("%s %s %d", name, degree, mark);
fclose(file);
}
I'm specifically asking why we the lecturer would have used an array rather than just declaring two string variables. Searching for a deeper answer than just, "that's just C for you".

There are multiple typos on this line:
while (fscan(file("%s %s %d", name, degree, &mark) != EOF);
printf("%s %s %d", name, degree, mark);
It should read:
while (fscanf(file, "%s %s %d", name, degree, &mark) == 3)
printf("%s %s %d", name, degree, mark);
Can you spot the 4 mistakes?
The function is called fscanf
file is an argument followed by ,, not a function name followed by (
you should keep looping for as long as fscanf converts 3 values. If conversion fails, for example because the third field is not a number, it will return a short count, not necessarily EOF.
you typed an extra ; after the condition. This is parsed as an empty statement: the loop keeps reading until end of file, doing nothing, and finally executes printf just once, with potentially invalid arguments.
The programmer uses char arrays and passes their address to fscanf. If he had used pointers (char *), he would have needed to allocate memory to make them point to something and pass their values to fscanf, a different approach that is not needed for fixed array sizes.
Note that the code should prevent potential buffer overflows by specifying the maximum number of characters to store into the arrays:
while (fscanf(file, "%9s %4s %d", name, degree, &mark) == 3) {
printf("%s %s %d", name, degree, mark);
}
Note also that these hard-coded numbers must match the array sizes minus 1, an there is no direct way to pass the array sizes to fscanf(), a common source of bugs when the code is modified. This function has many quirks and shortcomings, use with extreme care.

Take name[] for example. It's an array of chars, a collection of chars if you like.
There is no string type in c, so we use an array an array of chars when we want to use a string.
The code is written as such, so that we can read the actual string in a line of the file, in our array.
As a side note, this program will produce syntax errors.

What you have in every non-empty file is a series of bytes. Therefor, what your C program has to do is read bytes. Since the variable type char is used to represent a byte, and since you want to read multiple bytes at once for efficiency purposes, you read an array of chars. That's for the general understanding of what reading from a file means.
Going back to your example, there is no string type in C. A string is an array of bytes (char[]) ended by a nul character.
What the lecturer is doing is:
define an array of chars (which will contain a string) => char name[10] is an array that may contain a string of at most 9 characters (the last byte would be used for '\0').
ask fscanf to read a string, which means it will read multiple bytes (multiple chars) until it finds a nul character ('\0') and put all of that in the given array.
To understand what's happening, forget about string as an opaque data type (which might be true in other programming languages) and see them as what they really are: arrays of chars.

why i have exc_bad_access doing cast?

i am programming in C and i have a problem while casting an int into a char.
I am using my mac with Xcode to program c.
The code is:
int main(){
int t = 2;
printf("test %s\n", (char)t); //EXC_BAD_ACCESS
return 0;
}
I tried all I found in many post, I really don't know what is going on... any suggestion?

Please include the goal of your code in your question not in a comment below.
If you want the output
test 2
your have to change %s to %d
printf("test %d\n", t);
I guess you got the wrong idea about the %s. It does not tell the printf that you want to have the int as string, it does tell printf that the parameter is a string! It is obviously not, so you got the exception.
if you use %c you tell the printf function that you want to output your number as character from your current ASCII table. For example 65 is 'A'.
If you have a concatenation situation like
strcpy(str_buscar, "controlID='");
strcat(str_buscar, (char) t);
strcat(str_buscar, "'");
you need itoa instead of the cast:
strcat(str_buscar, (char) t);
you need the follow:
char buffer[32]; // enough space for a "number as string"
itoa(t,buffer,10);
strcat(str_buscar, buffer);
a (IMHO) shortcut is to "print" to a buffer with sprintf
sprintf(str_buscar,"controlID='%d'",t);
instead of printing to a console sprintf prints into the given buffer. Make shure that your buffer str_buscar is big enough.

The %s format specifier represents a string, not an individual character. Printf thinks that the number 2 you're passing it is a string's address. It tries to access the memory address 2 and fails, because that address doesn't exist.
If you want to print a character, you'll want the %c specifier. This tells printf to print the character whose ASCII code is 2. The ASCII character number 2 is, according to the ASCII table, a control character that cannot be printed, which is why you're getting strange output.
If you actually want to print the character '2' (which has a different code, 50), you will want to use something like:
printf("test: %c", (char)('0' + c));
This example leverages the fact that all ASCII characters have consecutive codes, starting with 48 ('0'). This way, if you wanted to print the digit 0, you'd end up printing the '0' character (ASCII code 48 = 48 + 0). If you want to print the digit 2, you'll end up printing the '2' character (50 = 48 + 2).
This way, however, is a bit clunky and fails when encountering numbers larger than 9 (i.e. it only works with digits). The easier way consists of no longer working with characters at all and, instead, using the '%d' specifier (used for printing whole number):
int t = 0;
printf("test: %d", t);

Why does C's printf format string have both %c and %s?

Why does C's printf format string have both %c and %s?
I know that %c represents a single character and %s represents a null-terminated string of characters, but wouldn't the string representation alone be enough?

Probably to distinguish between null terminated string and a character. If they just had %s, then every single character must also be null terminated.
char c = 'a';
In the above case, c must be null terminated. This is my assumption though :)

%s prints out chars until it reaches a 0 (or '\0', same thing).
If you just have a char x;, printing it with printf("%s", &x); - you'd have to provide the address, since %s expects a char* - would yield unexpected results, as &x + 1 might not be 0.
So you couldn't just print a single character unless it was null-terminated (very inefficent).
EDIT: As other have pointed out, the two expect different things in the var args parameters - one a pointer, the other a single char. But that difference is somewhat clear.

The issue that is mentioned by others that a single character would have to be null terminated isn't a real one. This could be dealt with by providing a precision to the format %.1s would do the trick.
What is more important in my view is that for %s in any of its forms you'd have to provide a pointer to one or several characters. That would mean that you wouldn't be able to print rvalues (computed expressions, function returns etc) or register variables.
Edit: I am really pissed off by the reaction to this answer, so I will probably delete this, this is really not worth it. It seems that people react on this without even having read the question or knowing how to appreciate the technicality of the question.
To make that clear: I don't say that you should prefer %.1s over %c. I only say that reasons why %c cannot be replaced by that are different than the other answer pretend to tell. These other answers are just technically wrong. Null termination is not an issue with %s.

The printf function is a variadic function, meaning that it has variable number of arguments. Arguments are pushed on the stack before the function (printf) is called. In order for the function printf to use the stack, it needs to know information about what is in the stack, the format string is used for that purpose.
e.g.
printf( "%c", ch ); tells the function the argument 'ch'
is to be interpreted as a character and sizeof(char)
whereas
printf( "%s", s ); tells the function the argument 's' is a pointer
to a null terminated string sizeof(char*)
it is not possible inside the printf function to otherwise determine stack contents e.g. distinguishing between 'ch' and 's' because in C there is no type checking during runtime.

%s says print all the characters until you find a null (treat the variable as a pointer).
%c says print just one character (treat the variable as a character code)
Using %s for a character doesn't work because the character is going to be treated like a pointer, then it's going to try to print all the characters following that place in memory until it finds a null
Stealing from the other answers to explain it in a different way.
If you wanted to print a character using %s, you could use the following to properly pass it an address of a char and to keep it from writing garbage on the screen until finding a null.
char c = 'c';
printf('%.1s', &c);

For %s, we need provide the address of string, not its value.
For %c, we provide the value of characters.
If we used the %s instead of %c, how would we provide a '\0' after the characters?

Id like to add another point of perspective to this fun question.
Really this comes down to data typing. I have seen answers on here that state that you could provide a pointer to the char, and provide a
"%.1s"
This could indeed be true. But the answer lies in the C designer's trying to provide flexibility to the programmer, and indeed a (albeit small) way of decreasing footprint of your application.
Sometimes a programmer might like to run a series of if-else statements or a switch-case, where the need is to simply output a character based upon the state. For this, hard coding the the characters could indeed take less actual space in memory as the single characters are 8 bits versus the pointer which is 32 or 64 bits (for 64 bit computers). A pointer will take up more space in memory.
If you would like to decrease the size through using actual chars versus pointers to chars, then there are two ways one could think to do this within printf types of operators. One would be to key off of the .1s, but how is the routine supposed to know for certain that you are truly providing a char type versus a pointer to a char or pointer to a string (array of chars)? This is why they went with the "%c", as it is different.
Fun Question :-)

C has the %c and %s format specifiers because they handle different types.
A char and a string are about as different as night and 1.

%c expects a char, which is an integer value and prints it according to encoding rules.
%s expects a pointer to a location of memory that contains char values, and prints the characters in that location according to encoding rules until it finds a 0 (null) character.
So you see, under the hood, the two cases while they look alike they have not much in common, as one works with values and the other with pointers. One is instructions for interpreting a specific integer value as an ascii char, and the other is iterating the contents of a memory location char by char and interpreting them until a zero value is encountered.

I have done a experiment with printf("%.1s", &c) and printf("%c", c).
I used the code below to test, and the bash's time utility the get the runing time.
#include<stdio.h>
int main(){
char c = 'a';
int i;
for(i = 0; i < 40000000; i++){
//printf("%.1s", &c); get a result of 4.3s
//printf("%c", c); get a result of 0.67s
}
return 0;
}
The result says that using %c is 10 times faster than %.1s. So, althought %s can do the job of %c, %c is still needed for performance.

Since no one has provided an answer with ANY reference whatsoever, here is a printf specification from pubs.opengroup.com which is similar to the format definition from IBM
%c
The int argument shall be converted to an unsigned char, and the resulting byte shall be written.
%s
The argument shall be a pointer to an array of char. Bytes from the array shall be written up to (but not including) any terminating null byte. If the precision is specified, no more than that many bytes shall be written. If the precision is not specified or is greater than the size of the array, the application shall ensure that the array contains a null byte.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

how to write out a tokenized string - c

Related

segmentation fault with c program

Why can i not use %s instead of %c?

Why in C, when reading files, do we have to use a character array?

why i have exc_bad_access doing cast?

Why does C's printf format string have both %c and %s?

Categories

Resources