Appending Characters to an Empty String in C

I'm relatively new to C, so any help understanding what's going on would be awesome!!!
I have a struct called Token that is as follows:
//Token struct
struct Token {
char type[16];
char value[1024];
};
I am trying to read from a file and append characters read from the file into Token.value like so:
struct Token newToken;
char ch;
ch = fgetc(file);
strncat(newToken.value, &ch, 1);
THIS WORKS!
My problem is that Token.value begins with several values I don't understand, preceding the characters that I appended. When I print the result of newToken.value to the console, I get #�����TheCharactersIWantedToAppend. I could probably figure out a band-aid solution to retroactively remove or work around these characters, but I'd rather not if I don't have to.
In analyzing the � characters, I see them as (in order from index 1-5): \330, \377, \377, \377, \177. I read that \377 is a special character for EOF in C, but also 255 in decimal? Do these values make up a memory address? Am I adding the address to newToken.value by using &ch in strncat? If so, how can I keep them from getting into newToken.value?
Note: I get a segmentation fault if I use strncat(newToken.value, ch, 1) instead of strncat(newToken.value, &ch, 1) (ch vs. &ch).

I'll try to consolidate the answers already given in the comments.
This version of the code uses strncat(), like yours, but fixes the problems noted by Nick (the target must be initialized) and Dúthomhas (the second parameter to strncat() should be a string, not a pointer to a single char). Strictly speaking, strncat() with a count of 1 reads at most one char from the source, so &ch is tolerated; but passing a real string, that is, a char array of at least two chars whose last one is '\0', avoids the confusion.
Please be aware that strncat(), strncpy() and all related functions are tricky. strncpy() never writes more than N chars, and it only adds the final '\0' to the target string when the source has fewer than N chars. strncat() appends at most N chars and then always adds a '\0', even if the source has exactly N chars or more, so it can write up to N+1 chars (edited; thanks, @Clifford).
#include <stdio.h>
#include <string.h>
int main() {
FILE* file = stdin; // fopen("test.txt", "r");
if (file) {
struct Token {
char type[16];
char value[1024];
};
struct Token newToken;
newToken.value[0] = '\0'; // A '\0' at the first position means "empty"
int aux;
char source[2] = ""; // A literal "" has a single char with value '\0', but this syntax fills the entire array with '\0's
while ((aux = fgetc(file)) != EOF) {
source[0] = (char)aux;
strncat(newToken.value, source, 1); // This appends AT MOST 1 CHAR (and always adds a final '\0')
}
strncat(newToken.value, "", 1); // As the source string is empty, it just adds a final '\0' (superfluous in this case)
printf("%s", newToken.value); // Never pass data read from input as the format string
}
return 0;
}
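To make the terminator rules above concrete, here is a minimal sketch. The helper names copy_terminated and append_terminated are made up for this illustration (they are not standard functions), and the buffer sizes are whatever the caller passes in.

```c
#include <string.h>

/* Hypothetical helper: copy src into dst (dstsize bytes), truncating if
   needed. strncpy() alone leaves dst unterminated when src has
   dstsize - 1 chars or more, so the '\0' is written explicitly. */
void copy_terminated(char *dst, size_t dstsize, const char *src)
{
    if (dstsize == 0)
        return;
    strncpy(dst, src, dstsize - 1);
    dst[dstsize - 1] = '\0';
}

/* Hypothetical helper: append src to dst (dstsize bytes in total).
   strncat() always writes a '\0' after the appended chars, so the
   count passed to it must leave one byte free for that terminator. */
void append_terminated(char *dst, size_t dstsize, const char *src)
{
    size_t used = strlen(dst);

    if (used + 1 < dstsize)
        strncat(dst, src, dstsize - used - 1);
}
```

Both helpers assume dst already points to valid storage, and append_terminated additionally assumes dst holds a terminated string, which is exactly the precondition the original question violated.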
This other version uses an index variable and writes each single char directly into the "current" position of the target string, without using strncat(). I think it is simpler and safer, because it doesn't mix the confusing semantics of single chars and strings.
#include <stdio.h>
#include <string.h>
int main() {
FILE* file = stdin; // fopen("test.txt", "r");
if (file) {
struct Token {
size_t index; // C has no default member initializers, so this is set to 0 below
char type[16];
char value[1024]; // Max size is 1023 chars + '\0'
};
struct Token newToken;
newToken.index = 0;
newToken.value[0] = '\0'; // A '\0' at the first position means "empty". This is not really necessary anymore
int aux;
while ((aux = fgetc(file)) != EOF)
// Index will stop at 1024-1 (value[1022] will be the last "real" char, leaving space for a final '\0')
if (newToken.index < sizeof newToken.value - 1)
newToken.value[newToken.index++] = (char)aux;
newToken.value[newToken.index] = '\0';
printf("%s", newToken.value);
}
return 0;
}
Edited: fgetc() returns an int and we should check for EOF before casting it to a char (thanks, @chqrlie).

You are appending to a string that is not initialised, so it can contain anything. The end of a string is indicated by a NUL (0) character, and in your example there happened to be one after 6 bytes, but there need not be any within the value array, so the code is seriously flawed and will result in non-deterministic behaviour.
You need to initialise the newToken instance to empty string. For example:
struct Token newToken = { "", "" } ;
or to zero initialise the whole structure:
struct Token newToken = { 0 } ;
The point is that C does not initialise non-static objects without an explicit initialiser.
Furthermore using strncat() is very inefficient and has non-deterministic execution time that depends on the length of the destination string (see https://www.joelonsoftware.com/2001/12/11/back-to-basics/).
In this case you would do better to maintain a count of the number of characters added, and write the character and terminator directly to the array. For example:
size_t index = 0 ;
int ch = 0 ;
do
{
ch = fgetc(file);
if( ch != EOF )
{
newToken.value[index] = (char)ch ;
index++ ;
newToken.value[index] = '\0' ;
}
} while( ch != EOF &&
index < sizeof(newToken.value) - 1 ) ;
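The same idea can be wrapped in a small function. This is only a sketch: token_append is a hypothetical name, and the character count is kept in a separate variable owned by the caller, which is an assumption of this example rather than something the original code does.

```c
#include <stddef.h>

struct Token {
    char type[16];
    char value[1024];
};

/* Hypothetical helper: store ch at position *len of tok->value and keep
   the string terminated. Returns 1 on success, 0 when value is full
   (one byte is always reserved for the '\0'). */
int token_append(struct Token *tok, size_t *len, char ch)
{
    if (*len >= sizeof tok->value - 1)
        return 0;
    tok->value[(*len)++] = ch;
    tok->value[*len] = '\0';
    return 1;
}
```

A caller would start with a zero-initialised token and a length of 0, then call token_append for each character returned by fgetc() until it sees EOF or the function returns 0. Each append is constant time, unlike strncat(), which rescans the destination on every call.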

Related

Convert string to boolean array

I need to convert a string that consists of a million 'zero' or 'one' characters (1039680 characters to be specific) to a boolean array. The way I have it now takes a few seconds for a 300000 character string and that is too long. I need to be able to do the whole million character conversion in less than a second.
The way I tried to do it was to read a file with one line of (in this trial case) 300000 zeros.
I know my code will act funky for strings that contain stuff other than zeros or ones, but I know that the string will only contain those.
I also looked at atoi, but I don't think it would suit my needs.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
#define BUFFERSIZE 1039680
int main ()
{
int i ;
char buffer[BUFFERSIZE];
bool boolList[BUFFERSIZE] ;
// READ FILE WITH A LOT OF ZEROS
FILE *fptr;
if ((fptr=fopen("300000zeros.txt","r"))==NULL){
printf("Error! opening file\n");
exit(1);
}
fscanf(fptr,"%[^\n]",buffer);
fclose(fptr);
// CONVERT STRING TO BOOLEAN ARRAY
for (i=0 ; i<strlen(buffer) ; i++) {
if (buffer[i] == '1') boolList[i] = 1 ;
}
return 0;
}
Try
char *sptr = buffer;
bool *bptr = boolList;
while (*sptr != '\0')
*bptr++ = *sptr++ == '1'? 1:0;
If the string length is always 1039680 characters, as you said, then why do you use strlen(buffer) in your code? Why not just loop BUFFERSIZE times? And if the string length can vary, you should cache the length in a variable, as others said, instead of calling strlen again on every loop iteration.
More importantly, you haven't included space for the null terminator in the buffer, so when you read exactly BUFFERSIZE characters, the char array is not a valid null-terminated string, and calling strlen on it invokes undefined behavior.
If you want to read the file as text then you must add one more char to buffer
char buffer[BUFFERSIZE + 1];
Otherwise, open the file as binary and read the whole 1039680-byte block at once. That'll be much faster
fread(buffer, sizeof(buffer[0]), BUFFERSIZE, fptr);
Then just loop over the BUFFERSIZE bytes and turn each '0' or '1' character into the value 0 or 1 without a branch:
for (i = 0 ; i < BUFFERSIZE; i++)
{
buffer[i] -= '0';
}
You don't need a separate boolList; just use buffer as the boolean list, or rename it to boolList and discard the old array.
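If a separate bool array is still wanted, the conversion can be written as a single pass with no strlen() call in the loop condition. This is a sketch of a variant, not the in-place subtraction shown above; digits_to_bools is a hypothetical name, and the caller is assumed to know the number of characters in advance.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: set dst[i] to true where src[i] is '1' and to
   false otherwise, for exactly n characters. The length is passed in,
   so the string is never rescanned. */
void digits_to_bools(const char *src, bool *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (src[i] == '1');
}
```

Every element of dst is written, so the destination does not need to be initialised first. Arrays of about a million elements, like the ones in the question, may be too large for the stack on some platforms; declaring them static or allocating them with malloc() is a common workaround.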

How to use getchar() function from a stored string array?

I wrote a simple C program in Linux that reads a single character from a string. I get some error regarding string functions. This is my code:
#include <stdio.h>
#include <string.h>
void main () {
char arr[10], vv[10];
int i = 0, len;
printf("enter the staement\n");
scanf("%s", arr);
len = strlen(arr);
printf("String laength=%d\n", len);
while ((vv[i] = getchar(arr)) != '\n') {
printf("%d charcter\n");
i++;
}
}
I don't want to use getchar() directly on the input text like this:
arr[i] = getchar();
I want to use getchar() from a stored string like this:
getchar(string array);
But unfortunately I get an error. Can I use the getchar() function directly from a stored string array?
Read the documentation for getchar. It says that getchar is a function that gets a character (an unsigned char converted to int) from stdin, and that it takes no arguments. This means that you cannot use getchar to copy each character of one array into another. Just copy it directly using
while( (vv[i] = arr[i]) != '\n')
But I don't think this loop will end, as scanf does not include the newline character when scanning a string (%s). So you have two options:
Use fgets to get input.
Use the following
while( (vv[i] = arr[i]) != '\0')
When you have string in C, it is actually an array of chars which is terminated by '\0'. You do not need any method to get chars from it. Simply get the char as if you were accessing an array.
while((vv[i] = arr[i])!='\0')
Since arr is declared as arr[10], it will cause issues when your input is longer than 10 characters including the '\0'. So it is better to declare it with enough space!
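Putting both points together (stop at '\0', and never write past the destination), a bounded copy loop could look like the sketch below. The function name copy_chars is made up for this example.

```c
#include <stddef.h>

/* Hypothetical helper: copy src into dst one char at a time, stopping at
   the '\0' of src or when dst (dstsize bytes) is full. dst is always
   terminated. Returns the number of chars copied, not counting '\0'. */
size_t copy_chars(char *dst, size_t dstsize, const char *src)
{
    size_t i = 0;

    if (dstsize == 0)
        return 0;
    while (i < dstsize - 1 && src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
    return i;
}
```

With the arrays from the question this would be called as copy_chars(vv, sizeof vv, arr). Note that it does not protect arr itself: scanf("%s", arr) can still overflow arr unless a field width such as %9s is used.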
vv is a single char. You may not write vv[i].
Also, are you sure you want \n and not \0 [null]? scanf() won't give you a string with \n in it.
EDIT:
It is still unclear what you want to achieve, but if you want to check the presence of valid characters in the arr or vv, you can
take the base address of the arr or vv into a char *p.
check if (*p++) and do something.
EDIT:
You may try out something like
char * ip = NULL;
char * op = NULL;
int i = 10; //same as array size.
ip = arr;
op = vv;
while( (*op++ = *ip++) && i--)
{
//do something
};

Array of strings being overwritten

I have a program that is trying to take a text file that consists of the following and feed it to my other program.
Bruce, Wayne
Bruce, Banner
Princess, Diana
Austin, Powers
This is my C code. It is trying to get the number of lines in the file, parse the comma-separated keys and values, and put them all in a list of strings. Lastly, it is trying to iterate through the list of strings and print them out. The output of this is just Austin Powers over and over again. I'm not sure if the problem is how I'm appending the strings to the list or how I'm reading them off.
#include<stdio.h>
#include <stdlib.h>
int main(){
char* fileName = "Example.txt";
FILE *fp = fopen(fileName, "r");
char line[512];
char * keyname = (char*)(malloc(sizeof(char)*80));
char * val = (char*)(malloc(sizeof(char)*80));
int i = 0;
int ch, lines;
while(!feof(fp)){
ch = fgetc(fp);
if(ch == '\n'){ //counts how many lines there are
lines++;
}
}
rewind(fp);
char* targets[lines*2];
while (fgets(line, sizeof(line), fp)){
strtok(line,"\n");
sscanf(line, "%[^','], %[^',']%s\n", keyname, val);
targets[i] = keyname;
targets[i+1] = val;
i+=2;
}
int q = 0;
while (q!=i){
printf("%s\n", targets[q]);
q++;
}
return 0;
}
The problem is with the two lines:
targets[i] = keyname;
targets[i+1] = val;
These do not make copies of the string - they only copy the address of whatever memory they point to. So, at the end of the while loop, each pair of target elements point to the same two blocks.
To make copies of the string, you'll either have to use strdup (if provided), or implement it yourself with strlen, malloc, and strcpy.
Also, as @mch mentioned, you never initialize lines, so while it may be zero, it may also be any garbage value (which can cause char* targets[lines*2]; to fail).
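For reference, a hand-rolled copy built from strlen, malloc, and strcpy might look like this. It is a minimal sketch; dup_string is a hypothetical name chosen to avoid clashing with a system-provided strdup.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a freshly allocated copy of s, or NULL if
   the allocation fails. The caller must free() the result. */
char *dup_string(const char *s)
{
    char *copy = malloc(strlen(s) + 1); /* +1 for the '\0' */

    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}
```

In the loop from the question, the assignments would then become targets[i] = dup_string(keyname); and targets[i+1] = dup_string(val);, so that each element owns its own block instead of sharing the two scratch buffers.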
First you open the file. Then, in the while loop, check for \n or EOF to end the loop. In the loop, if you get anything other than an alphanumeric character, separate the token and store it in the string array. Increment the count when you encounter \n or EOF. It is better to use do{}while(ch!=EOF);

How do I parse a string in C?

I am a beginner learning C; so, please go easy on me. :)
I am trying to write a very simple program that takes each word of a string into a "Hi (input)!" sentence (it assumes you type in names). Also, I am using arrays because I need to practice them.
My problem is that some garbage gets put into the arrays somewhere, and it messes up the program. I tried to figure out the problem but to no avail; so, it is time to ask for expert help. Where have I made mistakes?
p.s.: It also has an infinite loop somewhere, but it is probably the result of the garbage that is put into the array.
#include <stdio.h>
#define MAX 500 //Maximum Array size.
int main(int argc, const char * argv[])
{
int stringArray [MAX];
int wordArray [MAX];
int counter = 0;
int wordCounter = 0;
printf("Please type in a list of names then hit ENTER:\n");
// Fill up the stringArray with user input.
stringArray[counter] = getchar();
while (stringArray[counter] != '\n') {
stringArray[++counter] = getchar();
}
// Main function.
counter = 0;
while (stringArray[wordCounter] != '\n') {
// Puts first word into temporary wordArray.
while ((stringArray[wordCounter] != ' ') && (stringArray[wordCounter] != '\n')) {
wordArray[counter++] = stringArray[wordCounter++];
}
wordArray[counter] = '\0';
//Prints out the content of wordArray.
counter = 0;
printf("Hi ");
while (wordArray[counter] != '\0') {
putchar(wordArray[counter]);
counter++;
}
printf("!\n");
//Clears temporary wordArray for new use.
for (counter = 0; counter == MAX; counter++) {
wordArray[counter] = '\0';
}
wordCounter++;
counter = 0;
}
return 0;
}
Solved it! I needed to add the following if statement at the end, where I incremented the wordCounter. :)
if (stringArray[wordCounter] != '\n') {
wordCounter++;
}
You are using int arrays to represent strings, probably because getchar() returns an int. However, strings are better represented as char arrays, since that's what they are in C. The fact that getchar() returns an int is certainly confusing; it's because it needs to be able to return the special value EOF, which doesn't fit in a char. Therefore it uses int, which is a "larger" type (able to represent more different values), so it can fit all the char values and EOF.
With char arrays, you can use C's string functions directly:
char stringArray[MAX];
if(fgets(stringArray, sizeof stringArray, stdin) != NULL)
printf("You entered %s", stringArray);
Note that fgets() will leave the end-of-line character in the string, so you might want to strip it out. I suggest implementing an in-place function that trims off leading and trailing whitespace; it's a good exercise as well.
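One possible shape for such a trimming function is sketched below. The name trim_in_place is made up for this example, and "whitespace" is taken to mean whatever isspace() reports in the current locale.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: remove leading and trailing whitespace from s,
   modifying the buffer in place. Returns s for convenience. */
char *trim_in_place(char *s)
{
    size_t start = 0;
    size_t end = strlen(s);

    /* Skip leading whitespace (the cast avoids undefined behaviour
       for negative char values). */
    while (s[start] != '\0' && isspace((unsigned char)s[start]))
        start++;
    /* Back up over trailing whitespace, including the '\n' from fgets(). */
    while (end > start && isspace((unsigned char)s[end - 1]))
        end--;
    /* Shift the kept part to the front; memmove allows the overlap. */
    memmove(s, s + start, end - start);
    s[end - start] = '\0';
    return s;
}
```

It can be called right after a successful fgets(), for example trim_in_place(stringArray). It must only be given a writable buffer, never a string literal.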
for (counter = 0; counter == MAX; counter++) {
wordArray[counter] = '\0';
}
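You never enter this loop: the condition counter == MAX is already false on the first test (counter is 0), so the body never runs. The condition should be counter < MAX.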
user1799795,
For what it's worth (now that you've solved your problem) I took the liberty of showing you how I'd do this given the restriction "use arrays", and explaining a bit about why I'd do it that way... Just beware that while I am an experienced programmer I'm no C guru... I've worked with guys who absolutely blew me into the C-weeds (pun intended).
#include <stdio.h>
#include <string.h>
#define LINE_SIZE 500
#define MAX_WORDS 50
#define WORD_SIZE 20
// Main function.
int main(int argc, const char * argv[])
{
int counter = 0;
// ----------------------------------
// Read a line of input from the user (ie stdin)
// ----------------------------------
char line[LINE_SIZE];
printf("Please type in a list of names then hit ENTER:\n");
while ( fgets(line, LINE_SIZE, stdin) == NULL )
fprintf(stderr, "You must enter something. Pretty please!");
// A note on that LINE_SIZE parameter to the fgets function:
// wherever possible it's a good idea to use the version of the standard
// library function that allows you to specify the maximum length of the
// string (or indeed any array) because that dramatically reduces the
// incidence of "string overruns", which are a major source of bugs in C
// programmes.
// Also note that fgets includes the end-of-line character/sequence in
// the returned string, so you have to ensure there's room for it in the
// destination string, and remember to handle it in your string processing.
// -------------------------
// split the line into words
// -------------------------
// the current word
char word[WORD_SIZE];
int wordLength = 0;
// the list of words
char words[MAX_WORDS][WORD_SIZE]; // an array of up to 50 words of
// up to 19 characters each (plus the '\0')
int wordCount = 0; // the number of words in the array.
// The below loop syntax is a bit cryptic.
// The "char *c=line;" initialises the char-pointer "c" to the start of "line".
// The " *c;" is ultra-shorthand for: "is the-char-at-c not equal to zero".
// All strings in C end with a "null terminator" character, which has the
// integer value of zero, and is commonly expressed as '\0' or 0 (not NULL,
// which is a null pointer constant). In the C language any integer may be evaluated as a
// boolean (true|false) expression, where 0 is false, and (pretty obviously)
// everything-else is true. So: If the character at the address-c is not
// zero (the null terminator) then go-round the loop again. Capiche?
// The "++c" moves the char-pointer to the next character in the line. I use
// the pre-increment "++c" in preference to the more common post-increment
// "c++" because it's a smidge more efficient.
//
// Note that this syntax is commonly used by "low level programmers" to loop
// through strings. There is an alternative which is less cryptic and is
// therefore preferred by most programmers, even though it's not quite as
// efficient. In this case the loop would be:
// int lineLength = strlen(line);
// for ( int i=0; i<lineLength; ++i)
// and then to get the current character
// char ch = line[i];
// We get the length of the line once, because the strlen function has to
// loop through the characters in the array looking for the null-terminator
// character at its end (guess what its implementation looks like ;-)...
// which is inherently an "expensive" operation (totally dependent on the
// length of the string) so we at least avoid repeating this operation.
//
// I know I might sound like I'm banging on about not-very-much but once you
// start dealing with "real world" magnitude datasets then such habits,
// formed early on, pay huge dividends in the ability to write performant
// code the first time round. Premature optimisation is evil, but my code
// doesn't hardly ever NEED optimising, because it was "fairly efficient"
// to start with. Yeah?
for ( char *c=line; *c && wordCount<MAX_WORDS; ++c ) { // foreach char in line, while the words array has room.
char ch = *c; // "ch" is the character value-at the-char-pointer "c".
if ( ch==' ' // if this char is a space,
|| ch=='\n' // or we've reached the EOL char
) {
// 1. add the word to the end of the words list.
// note that we copy only wordLength characters, because "word" has
// no null-terminator, and then terminate the copy ourselves, since
// strncpy does not add one when it copies exactly wordLength chars.
strncpy(words[wordCount], word, wordLength);
words[wordCount++][wordLength] = '\0';
// 2. and "clear" the word buffer.
wordLength=0;
} else if (wordLength==WORD_SIZE-1) { // this word is too long
// so split this word into two words.
strncpy(words[wordCount], word, wordLength);
words[wordCount++][wordLength] = '\0';
wordLength=0;
word[wordLength++] = ch;
} else {
// otherwise: append this character to the end of the word.
word[wordLength++] = ch;
}
}
// -------------------------
// print out the words
// -------------------------
for ( int w=0; w<wordCount; ++w ) {
printf("Hi %s!\n", words[w]);
}
return 0;
}
In the real world one can't make such restrictive assumptions about the maximum length of words, or how many there will be, and if such restrictions are given they're almost always arbitrary and therefore proven wrong all too soon... so straight-off-the-bat for this problem, I'd be inclined to use a linked-list instead of the "words" array... wait till you get to "dynamic data structures"... You'll love em ;-)
Cheers. Keith.
PS: You're going pretty well... My advice is "just keep on truckin"... this gets a LOT easier with practice.

How to get the start of an email address

I have two strings, one with an email address, and the other is empty.
If the email address is e.g. "abc123#gmail.com", I need to pass the start of the email address, just before the #, into the second string. For example:
first string: "abc123#gmail.com"
second string: "abc123"
I've written a loop, but it doesn't work:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
char email[256] = "abc123#gmail.com";
char temp[256];
int i = 0;
while (email[i] != '#')
{
temp = strcat(temp, email[i]);
i++;
}
printf ("%s\n", temp);
system ("PAUSE");
return 0;
}
Basically, I took every time one char from the email address, and added it into the new string. For example if the new string has a on it, now I'll put b with it too using strcat....
Pointers. Firstly, strcat() returns a char pointer, and C does not let you assign anything to an array such as temp (arrays are not assignable), so temp = strcat(...) does not compile. Secondly, the second argument to strcat() is supposed to be a char pointer, not a char.
Replacing temp = strcat(temp, email[i]); with temp[i] = email[i]; should do the trick.
Also, after the loop ends, terminate the string with a null character.
temp[i] = '\0';
(After the loop ends, i is equal to the length of your extracted string, so temp[i] is where the terminator should go.)
There are better ways to solve this problem (e.g. by finding the index of the # (by strcspn or otherwise) and doing a memcpy), but your method is very close to working, so we can just make a few small adjustments.
As others have identified, the problem is with this line:
temp = strcat(temp, email[i]);
Presumably, you are attempting to copy the character at the ith position of email into the corresponding position of temp. However, strcat is not the correct way to do so: strcat copies data from one char* to another char*, that is, it copies strings. You just want to copy a single character, which is exactly what = does.
Looking at it from a higher level (so that I don't just tell you the answer), you want to set the appropriate character of temp to the appropriate character of email (you will need to use i to index both email and temp).
Also, remember that strings in C have to be terminated by '\0', so you have to set the next character of temp to '\0' after you have finished copying the string. (On this line of thought, you should consider what happens if your email string doesn't have an # in it, your while loop will keep going past the end of the string email: remember that you can tell if you are at the end of a string by character == '\0' or just using character as a condition.)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
char email[256] = "abc123#gmail.com";
char temp[256];
size_t i = 0;
#if 0
for (i=0; email[i] && email[i] != '#'; i++) {;}
/* at the end of the loop, email[i] is either the first '#'
** or the terminating '\0' (in which case i equals strlen(email))
*/
#else
i = strcspn(email, "#" );
/* the return value for strcspn() is either the index of the first '#'
* or of the terminating '\0'
*/
#endif
memcpy (temp, email, i);
temp[i] = 0;
printf ("%s\n", temp);
system ("PAUSE");
return 0;
}
UPDATE: a totally different approach would be to do the copying inside the loop (I guess this was the OP's intention):
for (i=0; temp[i] = (email[i] == '#' ? '\0' : email[i]) ; i++) {;}
You may want to try using strtok()
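A sketch of the strtok() route follows. The function name local_part and the 256-byte scratch buffer are assumptions made for this example. Because strtok() overwrites the delimiter in the string it scans, the sketch works on a copy rather than on the caller's string.

```c
#include <string.h>

/* Hypothetical helper: copy the part of email before the first '#' into
   out (outsize bytes). Returns 1 on success, 0 if the input is empty,
   too long for the scratch buffer, or the result does not fit in out. */
int local_part(const char *email, char *out, size_t outsize)
{
    char copy[256];
    char *token;

    if (outsize == 0 || strlen(email) >= sizeof copy)
        return 0;
    strcpy(copy, email);       /* strtok() modifies its argument */
    token = strtok(copy, "#"); /* first token, up to the first '#' */
    if (token == NULL || strlen(token) >= outsize)
        return 0;
    strcpy(out, token);
    return 1;
}
```

Two caveats: strtok() skips leading delimiters, so an input that starts with '#' yields the text after it, and an input with no '#' at all yields the whole string. If either case matters, the strcspn() version above is easier to reason about.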
