Tokenizing a phone number in C

I'm trying to tokenize a phone number and split it into two arrays. It starts out in a string in the form of "(515) 555-5555". I'm looking to tokenize the area code, the first 3 digits, and the last 4 digits. The area code I would store in one array, and the other 7 digits in another one. Both arrays are to hold just the numbers themselves.
My code seems to work... sort of. The issue is that when I print the two storage arrays, I find some quirks:
My array aCode: it stores the first 3 digits as I ask it to, but when I print it, some garbage values are tacked onto the end. I walked through it in the debugger, and the array only stores what I'm asking it to store: the 515. So why is it printing those garbage values? What gives?
My array aNum: I can append the tokens I need to the end of it; the only problem is that I end up with an extra space at the front (which makes sense; I'm adding on to an empty array, i.e. adding on to empty space). Just to experiment, I modified the code to hold only 7 characters, stepped into the debugger, and it told me that the array holds an empty space and 6 of the digits I need; there's no room for the last one. Yet when I print it, the space AND all 7 digits are printed. How does that happen?
And how could I set up my strtok calls so that they first copy the 3 digits before the "-", then append the last 4 I need? All the tokenization examples I've seen use a while loop, which would mean choosing either strcat or strcpy for every token. I could add an if statement that checks the size of the current token each time, but that seems too crude, and I feel like there's a simpler method. Thanks all!
#include <stdio.h>
#include <string.h>
int main() {
    char phoneNum[]= "(515) 555-5555";
    char aCode[3];
    char aNum[7];
    char *numPtr;
    numPtr = strtok(phoneNum, " ");
    strncpy(aCode, &numPtr[1], 3);
    printf("%s\n", aCode);
    numPtr = strtok(&phoneNum[6], "-");
    while (numPtr != NULL) {
        strcat(aNum, numPtr);
        numPtr = strtok(NULL, "-");
    }
    printf("%s", aNum);
}

I can see two main errors:
Being an array of 3 chars, aCode is not null-terminated here (strncpy() copies exactly 3 characters and adds no '\0'). Using it as the argument for the %s format specifier in printf() invokes undefined behaviour. The same applies, in a different way, to aNum.
strcat() expects null-terminated arrays for both arguments. aNum is not null-terminated when it is used for the first time, so that also results in UB. Always initialize your local variables.
Also, see the other answers for complete, bug-free code.

The biggest problem in your code is undefined behavior: since you are reading a three-character constant into a three-character array, you have left no space for null terminator.
Since you are tokenizing a value in a very specific format of fixed length, you could get away with a very concise implementation that employs sscanf:
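#include <stdio.h>
int main(void) {
    const char *phoneNum = "(515) 555-5555";
    char aCode[3+1];
    char aNum[7+1];
    /* sscanf returns the number of successful conversions */
    if (sscanf(phoneNum, "(%3[0-9]) %3[0-9]-%4[0-9]", aCode, aNum, &aNum[3]) == 3)
        printf("%s %s\n", aCode, aNum);
    return 0;
}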
This solution passes the format (###) ###-#### directly to sscanf, and tells the function where each value needs to be placed. The only "trick" used above is passing &aNum[3] for the last argument, instructing sscanf to place data for the third segment into the same storage as the second segment, but starting at position 3.

Your code has multiple issues:
You allocate the wrong size for aCode: you should add 1 for the nul terminator byte and initialize the whole array to '\0' so the string is always terminated.
char aCode[4] = {'\0'};
You don't check if strtok() returns NULL.
numPtr = strtok(phoneNum, " ");
strncpy(aCode, &numPtr[1], 3);
Point 1 also applies to aNum: strcat(aNum, numPtr) will fail because aNum is not initialized before the first call.
Subsequent calls to strtok() must have NULL as the first parameter, hence
numPtr = strtok(&phoneNum[6], "-");
is wrong, it should be
numPtr = strtok(NULL, "-");

Other answers have already mentioned the major issue, which is insufficient space in aCode and aNum for the terminating NUL character. The sscanf answer is also the cleanest for solving the problem, but given the restriction of using strtok, here's one possible solution to consider:
#include <stdio.h>
#include <string.h>
int main(void) {
    char phone_number[]= "(515) 555-1234";
    char area[3+1] = "";
    char digits[7+1] = "";
    const char *separators = " (-)";
    char *p = strtok(phone_number, separators);
    if (p) {
        size_t len = 0;
        (void) snprintf(area, sizeof(area), "%s", p);
        while (len < sizeof(digits) && (p = strtok(NULL, separators))) {
            len += snprintf(digits + len, sizeof(digits) - len, "%s", p);
        }
    }
    (void) printf("(%s) %s\n", area, digits);
    return 0;
}

Related

string gets filled with garbage

I have a string and a scanf that reads input until it finds a *, which is the character I picked to mark the end of the text. After the *, all the remaining cells get filled with random characters.
I know that a string after the \0 character, if not filled completely until the last cell, will fill all the remaining empty ones with \0. Why is this not the case, and how can I make it so that after the last letter given in input all the remaining cells hold the same value?
char string1 [100];
scanf("%[^*]s", string1);
for (int i = 0; i < 100; ++i) {
printf("\n %d=%d",i,string1[i]);
}
If I enter something like hello*, here's the output:
0=104
1=101
2=108
3=108
4=111
5=0
6=0
7=0
8=92
9=0
10=68
You have an uninitialized array:
char string1 [100];
that has indeterminate values. You could initialize the array like
char string1 [100] = { 0 };
or
char string1 [100] = "";
In this call
scanf("%[^*]s", string1);
you need to remove the trailing character s, because %[] and %s are distinct format specifiers. There is no %[]s format specifier. It should look like this:
scanf("%[^*]", string1);
The array contains a string terminated by the zero character '\0'.
So to output the string you should write for example
for ( int i = 0; string1[i] != '\0'; ++i) {
    printf( "%c", string1[i] ); // or putchar( string1[i] );
}
putchar( '\n' );
or like
for ( int i = 0; string1[i] != '\0'; ++i) {
    printf("\n %d=%c",i,string1[i]);
}
putchar( '\n' );
or just
puts( string1 );
As for your statement
printf("\n %d=%d",i,string1[i]);
it outputs each character (including the uninitialized ones) as an integer, because it uses the conversion specifier d instead of c. That is, the function prints the characters' numeric codes (their ASCII values on typical systems).
I know that a string after the \0 character, if not filled completely
until the last cell, will fill all the remaining empty ones with \0
No, that's not true.
It couldn't be true: nothing records how big the memory holding a string is. Neither the compiler nor any library function can know the size of that buffer; only you do. So no, strings don't autofill with '\0'.
Keep in mind that there is no string type in C, just arrays of char and pointers to char (an array passed to a function decays into a pointer to its first element, so the function only receives a pointer). We know where a string starts, but there is no way (other than deciding it yourself and being consistent while coding) to know where the memory holding it stops.
Sure, most of the time the answer is obvious, and any reader of the code can see the size of the allocated memory.
For example, when you code
char string1[20];
sprintf(string1, "hello");
It is quite obvious to a reader of that code that the allocated memory is 20 bytes. So you may think the compiler should know, when sprintf-ing or sscanf-ing into it, that it must fill the unused part of the 20 bytes with 0. But first of all, the compiler is no longer there when you call sscanf or sprintf: that happens at run time, while the compiler works at compile time. At run time, sprintf receives only a pointer; there is no trace of that 20.
Plus, it can be more complicated than that:
void fillString(char *p){
sprintf(p, "hello");
}
int main(){
char string1[20];
string1[0]='O';
string1[1]='t';
fillString(&(string1[2]));
}
How, in this case, is sprintf supposed to know that it has 18 bytes to fill with the string and then '\0'?
And that is for normal usage. I haven't even started on convoluted but legal usages, such as using char buffer[1000]; as an array of 50 strings of length 20 (buffer, buffer+20, buffer+40, ...), or things like
union {
    char str[40];
    struct {
        char substr1[20];
        char substr2[20];
    } s;
} u;
So no, strings are not filled up with '\0'. That is not the case. It is not the habit in C to have implicit things happening under the hood, and it could not be the case even if we wanted it to be.
Your "star-terminated string" behaves exactly as a "null-terminated string" does. Sometimes the rest of the allocated memory is full of 0, sometimes it is not. scanf won't touch anything other than what is strictly needed (the characters it reads plus the terminating '\0'); the rest of the allocated memory remains untouched. If that memory happened to be full of '\0' before the call to scanf, it stays that way. Otherwise, it doesn't.
Which leads me to my last remark: you seem to believe that it is scanf that fills the memory with non-null chars. It is not. Those chars were already there before. If you had the feeling that some other methods fill the rest of the memory with '\0', that was just an impression. It is a natural one, since newly allocated memory is often 0, not because a rule says so, but because 0 is the most frequent byte to find in an arbitrary area of memory. That is why uninitialized-variable bugs are so painful: they only show up from time to time, because very often uninitialized variables happen to be 0 just by chance, but they are still uninitialized.
The easiest way to create a zeroed array is to use calloc.
Try replacing
char string1 [100];
with
char *string1=calloc(1,100);

String concatenation in C?

I am trying to understand how strings behave in C, and it is bothering me that the following two code snippets produce different output:
(For the sake of this question, let us assume the user enters 12.)
int main(void)
{
char L_Red[2];
char temp[] = "I";
printf("Enter pin connected to red: ");
scanf("%s", L_Red);
strcat(temp,L_Red);
printf("%s \n", temp);
return 0;
}
This yields 12 as output (and not I12). Why?
int main(void)
{
char L_Red[2];
printf("Enter pin connected to red: ");
scanf("%s", L_Red);
char temp[] = "I";
strcat(temp,L_Red);
printf("%s \n", temp);
return 0;
}
This yields I12I (and not I12). Why?
I have read about strings in C, and as far as I understand, I am neither allocating temp a fixed size and changing it later to get these odd outputs, nor using strings in a way they are not supposed to be used. Is there some other concept at play here?
The array temp is an array of two characters (the 'I' and the string terminator '\0'). That's it. Attempting to append more characters to that array will write out of bounds and lead to undefined behavior.
You need to make sure that the destination array temp has enough space to fit its original content plus the string you want to append (plus the terminator).
Also, if you want to input more than one character for the "string" L_Red you need to increase its size as well.
I also recommend you use a limit in the format specifier so you can't write out of bounds:
char L_Red[3]; // Space for two characters, plus terminator
scanf("%2s", L_Red); // Read at most two characters of input
You are getting strange answers because your destination string (i.e. the first argument to strcat) is not long enough to hold both strings plus a null termination character. L_Red is also too short, as it does not have enough space for the null termination character either.

C - How can I concatenate an array of strings into a buffer?

I am trying to concatenate a random number of lines from the song "Twinkle, Twinkle, Little Star" into a buffer before sending it out, because I need to count the size of the buffer.
My code:
char temp_buffer[10000];
char lyrics_buffer[10000];
char *twinkle[20];
int arr_num;
int i;
twinkle[0] = "Twinkle, twinkle, little star,";
twinkle[1] = "How I wonder what you are!";
twinkle[2] = "Up above the world so high,";
twinkle[3] = "Like a diamond in the sky.";
twinkle[4] = "When the blazing sun is gone,";
twinkle[5] = "When he nothing shines upon,";
srand(time(NULL));
arr_num = rand() % 5;
for (i=0; i<arr_num; i++);
{
sprintf(temp_buffer, "%s\n", twinkle[i]);
strcat(lyrics_buffer, temp_buffer);
}
printf("%s%d\n", lyrics_buffer, arr_num);
My current code only prints 1 line even when I get a number greater than 0.
There are two problems. The first was found by BLUEPIXY: the semicolon right after for (i=0; i<arr_num; i++) gives the loop an empty body, so the block in braces runs only once, after the loop has finished, with i equal to arr_num. You would have found this very easily if you had used a debugger to step through the code (please do that first in the future).
The second problem is that the contents of non-static local variables (like your lyrics_buffer) are indeterminate. Using such variables without initialization leads to undefined behavior. This matters because strcat finds the end of the destination string by looking for the terminating '\0' character. If the contents of the destination are indeterminate, they will look random, and the terminator may not be anywhere in the array.
To initialize the array you simply do e.g.
char lyrics_buffer[10000] = { 0 };
That will make the compiler initialize it all to zero, which is what '\0' is.
This initialization is not needed for temp_buffer because sprintf unconditionally starts to write at the first location, it doesn't examine the content in any way. It does, in other words, initialize the buffer.
Initialize the buffer with 0, then write each line at the current end of the buffer (temp_buffer + strlen(temp_buffer)):
char temp_buffer[10000] = {0};
for (i=0; i<arr_num; i++) //removed semicolon from here
{
sprintf(temp_buffer + strlen(temp_buffer), "%s\n", twinkle[i]);
}
temp_buffer will then contain the final output. Make sure the buffer is large enough.
You don't need strcat.

Splitting a string into an array of words in C

I'm trying to make a function that splits a C string into an array of words. E.g. if I send in "Hello world", I'd get an array with two places, where the 1st place holds "Hello" and the second "world". I'm getting a segmentation fault and I can't for the life of me figure out what's wrong.
In the for loop I count the spaces, which determines how many words there are in total (there are always N-1 spaces, where N is the number of words). Then I add 1 to the counter (because of N-1).
I declare a char* array[] to keep each of the individual words separate; I'm not sure whether I need counter+1 here (because of the \0?).
This is where the tricky part comes in. I use strtok to separate the words (using " "). I then malloc each position of char* array[] so that it has enough memory to store its word. This is repeated until there are no more words to add.
I would really appreciate it if someone could give me a hint of which part is causing the segmentation fault!
void split(char* s){
    int counter = 0;
    int pos = 0;
    for(int i=0; s[i]!='\0'; i++){
        if(s[i] == ' '){
            counter ++;
        }
    }
    counter += 1;
    char* array[counter+1];
    char *token = strtok(s, " ");
    while(token != NULL){
        array[pos] = malloc(strlen(token));
        strcpy(array[pos], token);
        token = strtok(NULL, " ");
        pos++;
    }
}
if I send in "Hello world" then I'd get an array with two places where
the 1st place has the element "Hello", second "world".
No. You can't pass a string literal to that function, because strtok() modifies its input. It would attempt to modify the string literal, resulting in undefined behaviour (most likely the segmentation fault you are seeing).
Be aware of the limitations of strtok(). From the man page of strtok():
Be cautious when using these functions. If you do use them, note
that:
* These functions modify their first argument.
* These functions cannot be used on constant strings.
* The identity of the delimiting byte is lost.
* The strtok() function uses a static buffer while parsing, so it's
not thread safe. Use strtok_r() if this matters to you.
So, you need to pass a pointer to a modifiable memory location if you want to use strtok().
As BLUEPIXY pointed out, your malloc() call doesn't allocate sufficient space. You need one more byte than the string length (for the terminating nul byte).

Why is fgets() and strncmp() not working in this C code for string comparison?

This is a very fun problem I am running into. I did a lot of searching on Stack Overflow and found others with similar problems, so I wrote my code accordingly. I originally had fscanf() and strcmp(), but that completely bombed on me, so other posts suggested fgets() and strncmp(), using the length to compare them.
I tried to debug what I was doing by printing out the size of my two strings. I thought maybe they had a \n floating in there or something messing it up (another post talked about that, but I don't think that is happening here). So if the size is the same, the limit for strncmp() should be the same, right? I did that just to make sure they are being compared correctly. Now, I know that strncmp() returns 0 if the strings are the same, and a nonzero value otherwise. But it's not working.
Here is the output I am getting:
perk
repk
Enter your guess: perk
Word size: 8 and Guess size: 8
Your guess is wrong
Enter your guess:
Here is my code:
void guess(char *word, char *jumbleWord)
{
size_t wordLen = strlen(word);
size_t guessLen;
printf("word is: %s\n",word);
printf("jumble is: %s\n", jumbleWord);
char *guess = malloc(sizeof(char) * (MAX_WORD_LENGTH + 1));
do
{
printf("Enter your guess: ");
fgets(guess, MAX_WORD_LENGTH, stdin);
printf("\nword: -%s- and guess: -%s-", word, guess);
guessLen = strlen(guess);
//int size1 = strlen(word);
//int size2 = strlen(guess);
//printf("Word size: %d and Guess size: %d\n",size1,size2);
if(strncmp(guess,word,wordLen) == 0)
{
printf("Your guess is correct\n");
break;
}
}while(1);
}
I updated it based on the suggestions below, especially after learning the difference between a char * pointer and a string. However, it's still giving me the same error.
Please note that MAX_WORD_LENGTH is a define statement used at the top of my program as
#define MAX_WORD_LENGTH 25
Use strlen, not sizeof. Also, you shouldn't use strncmp here, if your guess is a prefix of the word it will mistakenly report a match. Use strcmp.
sizeof(guess) is returning the size of a char * not the length of the string guess. Your problem is that you're using sizeof to manage string lengths. C has a function for string length: strlen.
sizeof is used to determine the size of data types and arrays. sizeof only works for strings in one very specific case - I won't go into that here - but even then, always use strlen to work with string lengths.
You'll want to decide how many characters you'll allow for your words. This is a property of your game, e.g. words in the game are never more than 11 characters long.
So:
// define this somewhere, a header, or near top of your file
#define MAX_WORD_LENGTH 11
// ...
size_t wordlen = strlen(word);
size_t guessLen;
// MAX_WORD_LENGTH + 1, 1 more for the null-terminator:
char *guess = malloc(sizeof(char) * (MAX_WORD_LENGTH + 1));
printf("Enter your guess: ");
fgets(guess, MAX_WORD_LENGTH, stdin);
guessLen = strlen(guess);
Also review the docs for fgets and note that the newline character is retained in the input, so you'll need to account for that if you want to compare the two words. One quick fix is to compare only up to the length of word, not the length of guess: if( strncmp(guess, word, wordLen) == 0). The problem with this quick fix is that it will accept invalid inputs, e.g. if word is eject and guess is ejection, the comparison will pass.
Finally, there's no reason to allocate memory for a new guess in each iteration of the loop, just use the string that you've already allocated. You could change your function setup to:
char guess(char *word, char *jumbledWord)
{
int exit;
size_t wordLen = strlen(word);
size_t guessLen;
char *guess = malloc(sizeof(char) * (MAX_WORD_LENGTH + 1));
do
{
printf("Enter your guess: ");
// ...
As everyone else has stated, use strlen not sizeof. The reason this is happening though, is a fundamental concept of C that is different from Java.
Java does not give you access to pointers. Not only does C have pointers, but they are fundamental to the design of the language. If you don't understand and use pointers properly in C then things won't make sense, and you will have quite a bit of trouble.
So, in this case, sizeof is returning the size of the char * pointer, which is (usually) 4 or 8 bytes. What you want is the length of the data structure "at the other end" of the pointer. This is what strlen encapsulates for you.
If you didn't have strlen, you would need to dereference the pointer, then walk the string until you find the null byte marking the end.
size_t i = 0;
const char *p = guess; /* walk a copy so guess still points at the start */
while (*p++) { i++; }
Afterwards, i will hold the length of your string.
Update:
Your code is fine, except for one minor detail. The docs for fgets note that it will keep the trailing newline char.
To fix this, add the following code in between the fgets and strncmp sections:
if ( guessLen > 0 && guess[guessLen-1] == '\n' ) {
    guess[guessLen-1] = '\0';
}
That way the trailing newline, if any, gets removed and you are no longer off by one.
Some list of problems / advices for your code, much too long to fit in a comment:
your function returns a char, which is strange. I don't see the logic, and more importantly, you never actually return a value. Don't do that; it will bring you trouble.
look into other control structures in C; in particular, don't do your exit thing. First, exit in C is a function, and it does what it says: it exits the program. Second, there is a break statement to leave a loop.
A common idiom is
do {
    if (something) break;
} while(1);
you allocate a buffer in each iteration, but you never free it. this will give you big memory leaks, buffers that will be wasted and inaccessible to your code
your strncmp approach is only correct if the strings have the same length, so you'd have to test that first
