K&R - Recursive descent parser - strcat - c

What would be the reason for out[0] = '\0'; on the main() function?
It does seem to be working without it.
Code
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#define MAXTOKEN 100
enum { NAME, PARENS, BRACKETS };
int tokentype;
char token[MAXTOKEN]; /*last token string */
char name[MAXTOKEN]; /*identifier name */
char datatype[MAXTOKEN]; /*data type = char, int, etc. */
char out[1000];
void dcl(void);
void dirdcl(void);
int gettoken(void);
/*
Grammar:
dcl: optional * direct-dcl
direct-dcl: name
(dcl)
direct-dcl()
direct-dcl[optional size]
*/
int main() /* convert declaration to words */
{
while (gettoken() != EOF) { /* 1st token on line */
/* 1. gettoken() gets the datatype from the token */
strcpy(datatype, token);
/* 2. Init out to end of the line? */
/* out[0] = '\0'; */
/* parse rest of line */
dcl();
if (tokentype != '\n')
printf("syntax error\n");
printf("%s: %s %s\n", name, out, datatype);
}
return 0;
}
int gettoken(void) /* return next token */
{
int c, getch(void);
void ungetch(int);
char *p = token;
/* Skip blank spaces and tabs */
while ((c = getch()) == ' ' || c == '\t')
;
if (c == '(') {
if ((c = getch()) == ')') {
strcpy(token, "()");
return tokentype = PARENS;
} else {
ungetch(c);
return tokentype = '(';
}
} else if (c == '[') {
for (*p++ = c; (*p++ = getch()) != ']'; )
;
*p = '\0';
return tokentype = BRACKETS;
} else if (isalpha(c)) {
/* Reads the next character of input */
for (*p++ = c; isalnum(c = getch()); ) {
*p++ = c;
}
*p = '\0';
ungetch(c); /* Get back the space, tab */
return tokentype = NAME;
} else
return tokentype = c;
}
/* dcl: parse a declarator */
void dcl(void)
{
int ns;
for (ns = 0; gettoken() == '*'; ) /* count *'s */
ns++;
dirdcl();
while (ns-- > 0)
strcat(out, " pointer to");
}
/* dirdcl: parse a direct declarator */
void dirdcl(void)
{
int type;
if (tokentype == '(') {
dcl();
if (tokentype != ')')
printf("error: missing )\n");
}
else if (tokentype == NAME) /* variable name */ {
strcpy(name, token);
printf("token: %s\n", token);
}
else
printf("error: expected name or (dcl)\n");
while ((type = gettoken()) == PARENS || type == BRACKETS) {
if (type == PARENS)
strcat(out, " function returning");
else {
strcat(out, " array");
strcat(out, token);
strcat(out, " of");
}
}
}

You need out[0] to be zero in order for strcat to work.
While this line
out[0] = '\0';
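is not needed before the first call to dcl(), because arrays with static storage duration, such as out[], are initialized to all zeros, it is still needed on every later pass through the loop: strcat only appends, so without the reset each declaration's description would be tacked onto the previous one.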
According to initialization rules of C99,
...
if it has arithmetic type, it is initialized to (positive or unsigned) zero.
if it is an aggregate, every member is initialized (recursively) according to these rules.

It resets the char array (the string) to an empty string, so that no junk values are left in play.
It is like writing:
int i = 0;
before doing something like:
i += 1;
so that a junk value doesn't get added in.
A '\0' at index 0 tells strcat that the string is empty, so it starts appending at index 0 and overwrites whatever junk is in the rest of the array.
If the program works without resetting the array, that is not your IDE's doing: out is a global array, and C initializes those to all zeros. It is still good practice to reset it.
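As a rough sketch of why the reset sits inside the loop in the program above: dcl() only ever appends to out, so anything left over from the previous declaration stays in front of the new text. The names describe() and two_lines() below are hypothetical stand-ins for illustration and are not part of the K&R program.

```c
#include <string.h>

static char out[1000];   /* static storage: starts out as all zeros */

/* Hypothetical stand-in for dcl(): it only ever appends to out. */
static void describe(const char *words)
{
    strcat(out, words);
}

/* Process two "declarations" in a row, optionally resetting out in between. */
const char *two_lines(int reset_between)
{
    out[0] = '\0';                 /* known starting state for the demo */
    describe(" pointer to");
    if (reset_between)
        out[0] = '\0';             /* what main() does on each loop pass */
    describe(" array[3] of");
    return out;
}
```

With the reset, two_lines(1) yields only the second description; without it, two_lines(0) yields both descriptions run together.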

In short: In this particular case it's not strictly necessary, but in many other cases that look suspiciously similar, it is, so most people do it as "good style". So why would it be necessary?
There is no such thing as "empty" memory. There is no such thing as a "length". Unless you explicitly keep track of it, or define your own.
Memory is just bytes, which are numbers from 0 to 255. Since 0 is just as valid a number as 255, there is no way to tell whether a byte is used or not. You can "add up" several bytes if you need larger numbers, but everything is built out of bytes, in the end. Text is simply mapped to a number. A couple decades ago it was decided which number represents which character. So if you see a byte with the value 32, it could be a 32. Or it could be the 32nd letter in the computer's alphabet (which is the space character).
When you receive a string and you don't know how much text you will be dealing with, what you usually do is you reserve a large block of bytes. This is what char out[1000]; above does. But how do you tell where the text ends? How much of the 1000 bytes you've already used?
Well, in the old days, some people would just declare another variable, say, int length; and keep track of how many bytes they've used so far. The designers of C went a different route. They decided to pick a very rare character and use that as a marker. They picked the character with the value 0 for that (That is not the character '0'. The character '0' actually is the 48th letter of a computer's alphabet).
So you can just look at all the bytes in your string from the start, and if a character is > 0, you know it is used. If you reach a 0 character, you know this is the end of your string. There are various advantages to either approach. An int uses 4 bytes, an additional 0-character only 1. On the other hand, if you use an int, a string can also contain a 0-character, it's just another character, nobody cares.
Whenever you write "foo" in C, what C actually does is reserve room for 4 bytes, for 'f', 'o', 'o' and for the 0 to indicate the end. When you write "" in C, what it does is reserve room for a single byte, the 0. So that you can tell that the string is empty.
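The byte counts described above can be checked with sizeof and strlen. This is a small sketch; the function names are made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Bytes reserved for the literal, including the terminating 0. */
size_t storage_of_foo(void)   { return sizeof "foo"; }  /* 'f','o','o','\0' */
size_t storage_of_empty(void) { return sizeof ""; }     /* just '\0' */

/* strlen counts characters up to, but not including, the terminating 0. */
size_t length_of_foo(void)    { return strlen("foo"); }
size_t length_of_empty(void)  { return strlen(""); }
```

So "foo" occupies 4 bytes but has length 3, and "" occupies 1 byte and has length 0.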
So, what is memory filled with before you put something into it at startup? Well, in most cases, it is just garbage. Whatever was in that memory the last time it was used (after all, you have limited RAM, so when you quit one application on your computer, its memory can get re-used for the next app you launch after that). These will be random numbers, often outside of the range of common characters.
So, if you want strcat to see out as an empty string, you need to give it a block of memory that starts with this 0 value character. If you just leave memory like it is, there might be some random characters in it. Your buffer might contain "jbhasugaudq7e1723876123798dbkda0skno§§^^%$#-9H0HWDZmwus0/usr/local/bin"
or whatever was in that memory before. If you now appended some text to it, it would think the stuff before the first 0 (which is just randomly in this place) was a valid string, and append it to that. It will only know that this string is supposed to be empty, if you put a 0 right at the start.
So why did I say it is "not strictly necessary"? Well, because in your case, out is a global variable, and global variables are special because they automatically get cleared to 0 when your application starts up (or assigned any value that you assign them when you declare them).
However, this is only true for variables with static storage duration (regular globals, static globals, and static locals), not for ordinary local variables. So many programmers make it a habit to always initialize their blocks of bytes. That way, if someone later decides to change a global into a local variable, or copy-and-pastes the code to another spot to use with a local variable, they do not have to worry about forgetting to add this statement.
This is especially useful as random memory often contains 0 characters. So depending on what program you previously used, you might not notice you forgot the initial 0 because there happened to be one already in there. And only later, when one of your users runs this application, they get garbage at the start of their string.
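A minimal sketch of the habit described above, this time with a local buffer where the initial 0 really is required. The function name join_words and the buffer sizes are assumptions chosen for the example.

```c
#include <string.h>

/* Builds "<a> <b>" in a local buffer, then copies it to dst (dstsize bytes).
   join_words is a hypothetical name used only for this sketch. */
void join_words(char *dst, size_t dstsize, const char *a, const char *b)
{
    char buf[256];          /* automatic storage: contents are indeterminate */

    buf[0] = '\0';          /* make it an empty string before any strcat */
    strncat(buf, a, 100);   /* append at most 100 chars of each word ... */
    strcat(buf, " ");
    strncat(buf, b, 100);   /* ... so buf (256 bytes) cannot overflow */

    if (dstsize > 0) {
        dst[0] = '\0';      /* same idea for the caller's buffer */
        strncat(dst, buf, dstsize - 1);
    }
}
```

Without the two assignments of '\0', strcat and strncat would first search for a terminator in whatever bytes happen to be in the buffers.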
Does that clarify things a bit?

Related

I am trying to create a code polisher program in C

I am trying to create the function delete_comments(). The read_file() and main functions are given.
Implement function char *delete_comments(char *input) that removes C comments from program stored at input. input variable points to dynamically allocated memory. The function returns pointer to the polished program. You may allocate a new memory block for the output, or modify the content directly in the input buffer.
You’ll need to process two types of comments:
Traditional block comments delimited by /* and */. These comments may span multiple lines. You should remove only characters starting from /* and ending to */ and for example leave any following newlines untouched.
Line comments starting with // until the newline character. In this case, newline character must also be removed.
The function calling delete_comments() only handles return pointer from delete_comments(). It does not allocate memory for any pointers. One way to implement delete_comments() function is to allocate memory for destination string. However, if new memory is allocated then the original memory in input must be released after use.
I'm having trouble understanding why my current approach is wrong or what is the specific problem that I'm getting weird output. I'm approaching the problem by trying to create a new array where to copy the input string with the new rules.
#include "source.h"
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
/* Remove C comments from the program stored in memory block <input>.
* Returns pointer to code after removal of comments.
* Calling code is responsible of freeing only the memory block returned by
* the function.
*/
char *delete_comments(char *input)
{
input = malloc(strlen(input) * sizeof (char));
char *secondarray = malloc(strlen(input) * sizeof (char));
int x, y = 0;
for (x = 0, y = 0; input[x] != '\0'; x++) {
if ((input[x] == '/') && (input[x + 1] == '*')) {
int i = 0;
while ((input[x + i] != '*') && (input[x + i + 1] != '/')) {
y++;
i++;
}
}
else if ((input[x] == '/') && (input[x + 1] == '/')) {
int j = 0;
while (input[x + j] != '\n') {
y++;
j++;
}
}
else {
secondarray[x] = input[y];
y++;
}
}
return secondarray;
}
/* Read given file <filename> to dynamically allocated memory.
* Return pointer to the allocated memory with file content, or
* NULL on errors.
*/
char *read_file(const char *filename)
{
FILE *f = fopen(filename, "r");
if (!f)
return NULL;
char *buf = NULL;
unsigned int count = 0;
const unsigned int ReadBlock = 100;
unsigned int n;
do {
buf = realloc(buf, count + ReadBlock + 1);
n = fread(buf + count, 1, ReadBlock, f);
count += n;
} while (n == ReadBlock);
buf[count] = 0;
return buf;
}
int main(void)
{
char *code = read_file("testfile.c");
if (!code) {
printf("No code read");
return -1;
}
printf("-- Original:\n");
fputs(code, stdout);
code = delete_comments(code);
printf("-- Comments removed:\n");
fputs(code, stdout);
free(code);
}
Your program has fundamental issues.
It fails to tokenize the input. Comment start sequences can occur inside string literals, in which case they do not denote comments: "/* not a comment".
You have some basic bugs:
if ((input[x] == '/') && (input[x + 1] == '*')) {
int i = 0;
while ((input[x + i] != '*') && (input[x + i + 1] != '/')) {
y++;
i++;
}
}
Here, when we enter the loop, with i = 0, input + x is still pointing to the opening /. We did not skip over the opening * and are already looking for a closing *. This means that the sequence /*/ will be recognized as a complete comment, which it isn't.
This loop also assumes that every /* comment is properly closed. It does not check for the null character that terminates the input, so if the comment is not closed, it will march beyond the end of the buffer.
C has line continuations. In ISO C translation phase 2, all backslash-newline sequences are deleted, converting one or more physical lines into logical lines. What that means is that a // comment can span multiple physical lines:
// this is an \
extended comment
You can see, by the way, that StackOverflow's automatic language detector for syntax highlighting is getting this right!
Line continuations are independent of tokenization, which doesn't happen until translation phase 3, which means:
/\
/\
this is an extended \
comment
That one has defeated StackOverflow's syntax highlighting.
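To make the splicing step concrete, here is a minimal sketch that deletes backslash-newline pairs in place, the way translation phase 2 does. The name splice_lines is hypothetical, and the sketch assumes plain '\n' line endings (no "\r\n"). Note that a comment stripper that must preserve the original line layout could not simply run this over its input first.

```c
#include <string.h>   /* size_t */

/* Remove every backslash-newline pair from s in place and return s.
   splice_lines is a hypothetical helper name for this sketch. */
char *splice_lines(char *s)
{
    size_t r = 0, w = 0;    /* read and write positions */

    while (s[r] != '\0') {
        if (s[r] == '\\' && s[r + 1] == '\n')
            r += 2;         /* drop the continuation */
        else
            s[w++] = s[r++];
    }
    s[w] = '\0';
    return s;
}
```

After splicing, the split slashes of the example above sit next to each other and the whole thing is visibly one // comment.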
Furthermore, a line continuation can happen in any token, possibly multiple times:
"\
this is a string literal\
"
If you really want to make this work 100% correctly, you need to parse the input. By "parse" I mean a more formal, rigorous detection routine that understands what it is reading, in the context it is reading it.
For example, there are many times where this code could be defeated.
printf("the answer is %d // %d\n", a, b);
would likely trip your // detection and strip the end of the printf.
There are two general approaches to the problem above:
Find every corner case where comment-like characters could be used, and write conditional statements to avoid them before stripping.
Fully parse the language, so you will know if you are within a string or some other context that's wrapping comment like characters, or if you are in the top level context where the characters really mean "this is a comment"
To learn about parsing, I generally recommend "The Dragon Book" but it is a hard read, unless you have studied a bit of Discrete Mathematics. It covers a lot of different parsing techniques, and in doing so it doesn't have many pages left for examples. This means that it's the kind of book where you have to read, think, and then program a mini-example. If you follow that path, there is no input you can't tackle.
If you are pragmatic in your solution, and it is not about learning parsing, but about stripping comments, I recommend that you find a well constructed parser for C, and then learn how to walk the Abstract Syntax Tree in an Emitter, which fails to emit the comments.
There are some projects that do this already, but I don't know if they have the right structure for easy modification. lint comes to mind, as well as other "pretty-printers". GCC certainly has the parsing code in there, but I've heard that GCC's Abstract Syntax Tree isn't easy to learn.
Your solution has several problems:
The worst issue
As the first instruction in delete_comments() you overwrite input with a new pointer returned by malloc(), which points to memory of random contents.
In consequence the address to the real input is lost.
Oh, and please check the returned value, if you call malloc().
Failing to increment the scanned position in comments correctly
You are scanning the input by the index x, but if you detect a comment, you don't change it.
You are actually advancing y but this is only used for the copying.
Think about lines like these:
int x; /* some /* weird /* comment */
///////////////////////////////
for (;;) { }
Ignoring character and string literals
Your solution should take character and string literals into account.
For example:
int c_plus_plus_comment_start = '//'; /* multi character constant */
const char* c_comment_start = "/*";
Note: There are more. Learn to use a debugger, or at least insert lots of printf()s in "interesting" places.
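Pulling the points from these answers together, one possible shape for delete_comments() is a small state machine that edits the buffer in place. This is a sketch under stated assumptions, not a full C lexer: it tracks string and character literals (including backslash escapes), stops cleanly at an unterminated comment, and removes the newline after a // comment as the assignment asks. It deliberately ignores backslash-newline continuations and trigraphs.

```c
#include <string.h>   /* size_t */

/* Strip comments in place and return the same buffer, so the caller's
   free() on the returned pointer stays valid. */
char *delete_comments(char *input)
{
    enum { CODE, STRING, CHARLIT } state = CODE;
    size_t r = 0, w = 0;                     /* read and write positions */

    while (input[r] != '\0') {
        char c = input[r];

        if (state == CODE && c == '/' && input[r + 1] == '*') {
            r += 2;                          /* skip the opening delimiter */
            while (input[r] != '\0' &&
                   !(input[r] == '*' && input[r + 1] == '/'))
                r++;
            if (input[r] != '\0')
                r += 2;                      /* skip the closing delimiter */
        } else if (state == CODE && c == '/' && input[r + 1] == '/') {
            while (input[r] != '\0' && input[r] != '\n')
                r++;
            if (input[r] == '\n')
                r++;                         /* the newline goes too */
        } else {
            if (state == CODE) {
                if (c == '"')  state = STRING;
                if (c == '\'') state = CHARLIT;
            } else if (c == '\\' && input[r + 1] != '\0') {
                input[w++] = input[r++];     /* copy the backslash; the
                                                escaped char follows below */
            } else if ((state == STRING && c == '"') ||
                       (state == CHARLIT && c == '\'')) {
                state = CODE;                /* literal closed */
            }
            input[w++] = input[r++];
        }
    }
    input[w] = '\0';
    return input;
}
```

Because the write position never overtakes the read position, no second buffer is needed, and the main() from the question can keep calling free() on the returned pointer.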

How to check if an index contains a symbol?

I want to check to make sure that a given string contained in an array called secretWord has no symbols in it (e.g. $ % & #). If it does have a symbol in it, I make the user re-enter the string. It takes advantage of recursion to keep asking until they enter a string that does not contain a symbol.
The only symbol I do accept is the NULL symbol (the symbol represented by the ASCII value of zero). This is because I fill all the empty space in the array with NULL symbols.
My function is as follows:
void checkForSymbols(char *array, int arraysize){ //Checks for symbols in the array and if there are any it recursively calls this function until it gets input without them.
for (int i = 0; i < arraysize; i++){
if (!isdigit(array[i]) && !isalpha(array[i]) && array[i] != (char) 0){
flushArray(array, arraysize);
printf("No symbols are allowed in the word. Please try again: ");
fgets(secretWord, sizeof(secretWord) - 1, stdin);
checkForSymbols(secretWord, sizeof(secretWord));
}//end if (!isdigit(array[i]) && !isalpha(array[i]) && array[i] != 0)
else
continue;
}//end for(i = 0; i < sizeof(string[]); i++){
}//end checkForSymbols
The problem: When I enter any input (see example below), the if statement runs (it prints No symbols are allowed in the word. Please try again: and asks for new input).
I assume the problem obviously stems from the statement if (!isdigit(array[i]) && !isalpha(array[i]) && array[i] != (char) 0). But I have tried changing the (char) 0 part to '\0' and 0 as well and neither change had any effect.
How do I compare if what is in the index is a symbol, then? Why are strings without symbols setting this if statement off?
And if any of you are wondering what the "flushArray" method I used was, here it is:
void flushArray(char *array, int arraysize){ //Fills in the entire passed array with NULL characters
for (int i = 0; i < arraysize; i++){
array[i] = 0;
}
}//end flushArray
This function is called on the third line of my main() method, right after a print statement on the first line that asks users to input a word, and an fgets() statement on the second line that gets the input that this checkForSymbols function is used on.
As per request, an example would be if I input "Hello" as the secretWord string. The program then runs the function on it, and the if statement is for some reason triggered, causing it to
Replace all values stored in the secretWord array with the ASCII value of 0. (AKA NULL)
Prints No symbols are allowed in the word. Please try again: to the console.
Waits for new input that it will store in the secretWord array.
Calls the checkForSymbols() method on these new values stored in secretWord.
And no matter what you input as new secretWord, the checkForSymbols() method's if statement fires and it repeats steps 1 - 4 all over again.
Thank you for being patient and understanding with your help!
You can do something like this to find symbols in your input; put the code at the appropriate place in your program:
#include <stdio.h>
#include <string.h>
int main () {
char invalids[] = "#.<#>";
char * temp;
temp=strchr(invalids,'s');//is s an invalid character?
if (temp!=NULL) {
printf ("Invalid character");
} else {
printf("Valid character");
}
return 0;
}
This checks whether s is a valid entry. Similarly, you can create an array of invalid characters and do something like this if the array is not null-terminated:
#include <stdio.h>
#include <string.h>
int main (void) {
char invalids[] = { '#', '#', '&', '$', '<' }; // note last element isn't '\0'
if (memchr(invalids, 'a', sizeof(invalids))) { // is a an invalid character?
printf("Invalid character");
}
return 0;
}
memchr is used if your array is not null terminated.
As suggested by @David C. Rankin, you can also use strpbrk, like this:
#include <stdio.h>
#include <string.h>
int main () {
const char str1[] = ",*##_$&+.!";
const char str2[] = "##"; //input string
char *ret;
ret = strpbrk(str1, str2);
if(ret) {
printf("First matching character: %c\n", *ret);
} else {
printf("Continue");
}
return(0);
}
The only symbol I do accept is the NULL symbol (the symbol represented by the ASCII value of zero). This is because I fill all the empty space in the array with NULL symbols.
NULL is a pointer; if you want a character value 0, you should use 0 or '\0'. I assume you're using memset or strncpy to ensure the trailing bytes are zero? Nope... What a shame, your MCVE could be so much shorter (and complete). :(
void checkForSymbols(char *array, int arraysize){
/* ... */
if (!isdigit(array[i]) && !isalpha(array[i]) /* ... */
As per section 7.4p1 of the C standard, ...
In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.
Not all char values are representable as an unsigned char or equal to EOF, and so it's possible (and highly likely given the nature of this question) that the code above invokes undefined behaviour.
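A common way to stay within that requirement is to convert each char to unsigned char before handing it to a <ctype.h> function. The sketch below assumes the goal is "letters and digits only"; is_all_alnum is a hypothetical name.

```c
#include <ctype.h>

/* Returns 1 if every character of s is a letter or digit, 0 otherwise.
   is_all_alnum is a hypothetical name for this sketch. */
int is_all_alnum(const char *s)
{
    for (; *s != '\0'; s++) {
        /* Convert to unsigned char first so that a negative char value
           is never passed to isalnum(). */
        if (!isalnum((unsigned char)*s))
            return 0;
    }
    return 1;
}
```

Note that a string read with fgets still ends in '\n', which this check rejects, so the newline has to be removed before calling it.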
As you haven't completed your question (by providing an MCVE, and describing what errors are occurring) I'm assuming that the question you're trying to ask might be a duplicate of this question, this question, this question, this question and probably a whole lot of others... If so, did you try Googling the error message? That's probably the first thing you should've done. Should that fail in the future, ask a question about the error message!
As per request, an example would be if I input "Hello" as the secretWord string.
I assume secretWord is declared as char secretWord[] = "Hello"; in your example, and not char *secretWord = "Hello";. The two types are distinct, and your book should clarify that. If not, which book are you reading? I can probably recommend a better book, if you'd like.
Any attempt to modify a string literal (i.e. char *array = "Hello"; flushArray(array, ...)) is undefined behaviour, as explained by answers to this question (among many others, I'm sure).
It seems a solution to this problem might be available by using something like this...
In response to your comment, you are probably making it a bit tougher on yourself than it needs to be. You have two issues to deal with (one you are not seeing). The first being to check the input to validate only a-zA-Z0-9 are entered. (you know that). The second being you need to identify and remove the trailing '\n' read and included in your input by fgets. (that one may be tripping you up)
You don't show how the initial array is filled, but given your use of fgets on secretWord (see footnote 1), I suspect you are also using fgets for array. Which is exactly what you should be using. However, you need to remove the '\n' included at the end of the buffer filled by fgets before you call checkforsymbols. Otherwise you have character 0xa (the '\n') at the end, which, of course, is not a-zA-Z0-9 and will cause your check to fail.
To remove the trailing '\n', all you need to do is check the last character in your buffer. If it is a '\n', then simply overwrite it with the nul-terminating character (either 0 or the equivalent character representation '\0' -- your choice). You simply need the length of the string (which you get with strlen from string.h) and then check if (string[len - 1] == '\n'). For example:
size_t len = strlen (str); /* get length of str */
if (len > 0 && str[len - 1] == '\n') /* check for trailing '\n' */
str[--len] = 0; /* overwrite with nul-byte */
A third issue, important, but not directly related to the comparison, is to always choose a type for your function that will return an indication of Success/Failure as needed. In your case the choice of void gives you nothing to check to determine whether there were any symbols found or not. You can choose any type you like (int, char, char *, etc.); all of them allow the return of a value to gauge success or failure. For testing strings, the normal choice is char *, returning a valid pointer on success or NULL on failure.
A fourth issue when taking input is that you always need to handle the case where the user cancels input by generating a manual EOF with either ctrl+d on Linux or ctrl+z on Windows. The return of NULL by fgets gives you that ability. But with it (and every other input function), you have to check the return and make use of the return information in order to validate the user input. Simply check whether fgets returns NULL on your request for input, e.g.
if (!fgets (str, MAXS, stdin)) { /* read/validate input */
fprintf (stderr, "EOF received -> user canceled input.\n");
return 1; /* change as needed */
}
For your specific case where you only want a-zA-Z0-9, all you need to do is iterate down the string the user entered, checking each character to make sure it is a-zA-Z0-9 and return failure if anything else is encountered. This is made easy given that every string in C is nul-terminated. So you simply assign a pointer to the start of your string (e.g. char *p = str;) and then use either a for or while loop to check each character, e.g.
for (; *p != 0; p++) { do stuff }
that can be written in shorthand:
for (; *p; p++) { do stuff }
or use while:
while (*p) { do stuff; p++; }
Putting all of those pieces together, you could write your function to take a string as its only parameter and return NULL if a symbol is encountered, or return a pointer to your original string on success, e.g.
char *checkforsymbols (char *s)
{
if (!s || !*s) return NULL; /* validate string and not empty */
char *p = s; /* pointer to iterate over string */
for (; *p; p++) /* for each char in s */
if ((*p < 'a' || *p > 'z') && /* char is not a-z */
(*p < 'A' || *p > 'Z') && /* char is not A-Z */
(*p < '0' || *p > '9')) { /* char is not 0-9 */
fprintf (stderr, "error: '%c' not allowed in input.\n", *p);
return NULL; /* indicate failure */
}
return s; /* indicate success */
}
A short complete test routine could be:
#include <stdio.h>
#include <string.h>
#define MAXS 256
char *checkforsymbols (char *s);
int main (void) {
char str[MAXS] = "";
size_t len = 0;
for (;;) { /* loop until str w/o symbols */
printf (" enter string: "); /* prompt for user input */
if (!fgets (str, MAXS, stdin)) { /* read/validate input */
fprintf (stderr, "EOF received -> user canceled input.\n");
return 1;
}
len = strlen (str); /* get length of str */
if (len > 0 && str[len - 1] == '\n') /* check for trailing '\n' */
str[--len] = 0; /* overwrite with nul-byte */
if (checkforsymbols (str)) /* check for symbols */
break;
}
printf (" valid str: '%s'\n", str);
return 0;
}
char *checkforsymbols (char *s)
{
if (!s || !*s) return NULL; /* validate string and not empty */
char *p = s; /* pointer to iterate over string */
for (; *p; p++) /* for each char in s */
if ((*p < 'a' || *p > 'z') && /* char is not a-z */
(*p < 'A' || *p > 'Z') && /* char is not A-Z */
(*p < '0' || *p > '9')) { /* char is not 0-9 */
fprintf (stderr, "error: '%c' not allowed in input.\n", *p);
return NULL; /* indicate failure */
}
return s; /* indicate success */
}
Example Use/Output
$ ./bin/str_chksym
enter string: mydoghas$20worthoffleas
error: '$' not allowed in input.
enter string: Baddog!
error: '!' not allowed in input.
enter string: Okheisagood10yearolddog
valid str: 'Okheisagood10yearolddog'
or if the user cancels user input:
$ ./bin/str_chksym
enter string: EOF received -> user canceled input.
footnote 1.
C generally prefers the use of all lower-case variable names, while reserving all upper-case for macros and defines. Leave MixedCase or camelCase variable names for C++ and Java. However, since this is a matter of style, this is completely up to you.

Comparing 2 Strings, one in a struct other not C programming

I have this database and I need to check whether a Product Name is already in the database; if it is, I ask the user to input another one.
The problem is this:
I'm trying to compare a string (the Product Name) found inside the struct with the string the user inputs.
The coding of the struct, the user input part and the search method are here below:
product Structure
typedef struct
{
char pName[100];
char pDescription [100];
float pPrice;
int pStock;
int pOrder;
}product;
the checkProduct method:
int checkProduct (char nameCheck[100])
{
product temp;
p.pName = nameCheck;
rewind (pfp);
while (fread(&temp,STRUCTSIZE,1,pfp)==1)
{
if (strcmp (temp.pName,p.pName))
{
return 1;
}
}
return 0;
}
and the user input part [part of the code]:
char nameCheck[100];
gets (nameCheck);
checkProduct (nameCheck);
while (checkProduct == 1)
{
printf ("Product Already Exists!\n Enter another!\n");
while (getchar() !='\n')
{
continue;
}
}
p.pName = nameCheck;
Now I am having the following errors (I Use ECLIPSE):
on the line
while (checkProduct == 1) [found in the user input] is giving me:
"comparison between pointer and integer - enabled by default" marked by a yellow warning triangle
p.pName = nameCheck; is marked as a red cross and stopping my compiling saying:
"incompatible types when assigning to type 'char [100] from type 'char*'
^---- Is giving me trouble BOTH in the userinput AND when I'm comparing strings.
Any suggestions how I can fix it or maybe how I can deference it? I can't understand why in the struct the char pName is being marked as '*' whereas in the char[100] it's not.
Any brief explanation please?
Thank you in advance
EDIT: After emending the code with some of below:
THIS Is the INPUT NAME OF PRODUCT section;
char *nameCheck;
nameCheck = "";
fgets(nameCheck,sizeof nameCheck, stdin);
checkProduct (nameCheck);
int value = checkProduct (nameCheck);
while (value == 1)
{
printf ("Product Already Exists!\n Enter another!\n");
while (getchar() !='\n')
{
}
}
strcpy (p.pName, nameCheck);
this is the new checkName method
int checkProduct (char *nameCheck)
{
product temp;
strcpy (p.pName, nameCheck);
rewind (pfp);
while (fread(&temp,STRUCTSIZE,1,pfp)==1)
{
if (strcmp (temp.pName,p.pName) == 0)
{
return 1;
}
}
return 0;
}
p.pName = nameCheck;
is wrong as you try to assign address of one array to another. What you probably want is to copy it.
Use strcpy() instead.
strcpy(p.pName, nameCheck);
while (checkProduct == 1)
Since checkProduct is a function, the above condition will always be false as the address of function won't be equal to 1. You can store the return value in another integer like this:
int value = checkProduct(nameCheck);
while (value == 1)
/* rest of the code */
Or rather simply:
while ( checkProduct(nameCheck) == 1 ) {
...
Note - I've not checked entire code, there might be other bugs apart from this one. Btw, if you are new to programming, you can start with small examples from textbooks and then work towards slightly complex stuff.
int checkProduct (char nameCheck[100])
Note that the type signature is a lie. The signature should be
int checkProduct(char *nameCheck)
since the argument the function expects and receives is a pointer to a char, or, to document it for the user that the argument should be a pointer to the first element of a 0-terminated char array
int checkProduct(char nameCheck[])
Arrays are never passed as arguments to functions. As function arguments, and in most other circumstances [the exceptions are when the array is the operand of sizeof, _Alignof or the address operator &], they are converted to pointers to their first element.
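A short sketch of that conversion, with made-up function names: the two declarations of size_in_callee below are compatible, which shows that the [100] in a parameter is only decoration, and sizeof gives different answers on the two sides of the call.

```c
#include <stddef.h>

/* These two declarations are compatible: the [100] in a parameter is
   adjusted to a pointer and the 100 is ignored. */
size_t size_in_callee(char name[100]);
size_t size_in_callee(char *name)
{
    return sizeof name;                 /* size of a pointer to char */
}

size_t size_in_caller(void)
{
    char name[100] = "";
    size_t array_size = sizeof name;    /* 100: name is a real array here */
    (void)size_in_callee(name);         /* name converts to &name[0] */
    return array_size;
}
```

This is also why sizeof stops being useful for fgets as soon as the buffer is reached through a pointer instead of the array itself.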
{
product temp;
p.pName = nameCheck;
Arrays are not assignable. The only time you can have an array name on the left of a = is initialisation at the point where the array is declared.
You probably want
strcpy(p.pName, nameCheck);
there.
rewind (pfp);
while (fread(&temp,STRUCTSIZE,1,pfp)==1)
{
if (strcmp (temp.pName,p.pName))
strcmp returns a negative value if the first argument is lexicographically smaller than the second, 0 if both arguments are equal, and a positive value if the first is lexicographically larger than the second.
You probably want
if (strcmp(temp.pName, p.pName) == 0)
there.
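To see the difference between the two conditions side by side, here is a tiny sketch with hypothetical helper names. The first function is the corrected test; the second mirrors what if (strcmp(...)) actually asks.

```c
#include <string.h>

/* Returns 1 when both names are identical, 0 otherwise. */
int same_name(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* What "if (strcmp(a, b))" really tests: true whenever the names DIFFER. */
int differs(const char *a, const char *b)
{
    return strcmp(a, b) != 0;
}
```

With the original condition, checkProduct would report "already exists" as soon as it read a record whose name was different from the one entered.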
gets (nameCheck);
Never use gets. It is extremely unsafe (and was removed from the language in C11, yay). Use
fgets(nameCheck, sizeof nameCheck, stdin);
but that stores the newline in the buffer if there is enough space, so you have to overwrite that with 0 if present.
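If you are on a POSIX system and don't need to care about portability, you can use getline() to read a line of any length into a buffer that it allocates for you. Note that getline() also stores the trailing newline, so you still have to remove it.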
checkProduct (nameCheck);
You check whether the product is known, but throw away the result. Store it in a variable.
while (checkProduct == 1)
checkProduct is a function. In almost all circumstances, a function designator is converted into a pointer, hence the warning about the comparison between a pointer and an integer. You meant to compare to the value of the call you should have stored above.
{
printf ("Product Already Exists!\n Enter another!\n");
while (getchar() !='\n')
You read in characters without storing them. So you will never change the contents of nameCheck, and then be trapped in an infinite loop.
{
continue;
}
If the only statement in a loop body is continue;, you should leave the body empty.
}
p.pName = nameCheck;
Once again, you can't assign to an array.
Concerning the edit,
char *nameCheck;
nameCheck = "";
fgets(nameCheck,sizeof nameCheck, stdin);
you have changed nameCheck from an array to a pointer. That means that sizeof nameCheck now doesn't give the number of chars you can store in the array, but the size of a pointer to char, which is independent of what it points to (usually 4 on 32-bit systems and 8 on 64-bit systems).
And you let that pointer point to a string literal "", which is the reason for the crash. Attempting to modify string literals is undefined behaviour, and more often than not leads to a crash, since string literals are usually stored in a read-only segment of the memory nowadays.
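The difference in what sizeof reports can be seen in a small sketch (the two function names are invented for the example):

```c
#include <stddef.h>

/* sizeof applied to an array yields the size of the whole array. */
size_t array_size(void)
{
    char nameCheck[100];
    return sizeof nameCheck;    /* 100 */
}

/* sizeof applied to a pointer yields the size of the pointer itself,
   whatever it points to. */
size_t pointer_size(void)
{
    char *nameCheck = "";
    return sizeof nameCheck;    /* same as sizeof(char *), typically 4 or 8 */
}
```

So with the pointer version, fgets was told it may store only a handful of bytes, and even those went into memory it must not write to.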
You should have left it at
char nameCheck[100];
fgets(nameCheck, sizeof nameCheck, stdin);
and then you can use sizeof nameCheck to tell fgets how many characters it may read, or, alternatively, you could have a pointer and malloc some memory,
#define NAME_LENGTH 100
char *nameCheck = malloc(NAME_LENGTH);
if (nameCheck == NULL) {
// malloc failed, handle it if possible, or
exit(EXIT_FAILURE);
}
fgets(nameCheck, NAME_LENGTH, stdin);
Either way, after getting input, remove the newline if there is one:
size_t len = strlen(nameCheck);
if (len > 0 && nameCheck[len-1] == '\n') {
nameCheck[--len] = 0;
// Does windows also add a '\r' when reading from stdin?
// Only look for it if a newline was actually removed.
if (len > 0 && nameCheck[len-1] == '\r') {
nameCheck[--len] = 0;
}
}
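If you prefer a reusable helper, the same clean-up can be sketched with strcspn. The name stripLineEnd is made up for the example, and note that it cuts at the first '\r' or '\n' it finds rather than only at the end of the string.

```c
#include <string.h>

/* Hypothetical helper: cut the string at the first '\r' or '\n'. */
void stripLineEnd(char *s)
{
    /* strcspn returns the length of the initial part of s that contains
       none of the listed characters, i.e. the index of the first '\r' or
       '\n', or the index of the terminating 0 if there is none. */
    s[strcspn(s, "\r\n")] = '\0';
}
```

Because strcspn returns the index of the terminator when nothing matches, the assignment is harmless for input that has no line ending at all.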

How do I parse a string in C?

I am a beginner learning C; so, please go easy on me. :)
I am trying to write a very simple program that puts each word of a string into a "Hi (input)!" sentence (it assumes you type in names). Also, I am using arrays because I need to practice them.
My problem is that some garbage gets put into the arrays somewhere, and it messes up the program. I tried to figure out the problem but to no avail; so, it is time to ask for expert help. Where have I made mistakes?
p.s.: It also has an infinite loop somewhere, but it is probably the result of the garbage that is put into the array.
#include <stdio.h>
#define MAX 500 //Maximum Array size.
int main(int argc, const char * argv[])
{
int stringArray [MAX];
int wordArray [MAX];
int counter = 0;
int wordCounter = 0;
printf("Please type in a list of names then hit ENTER:\n");
// Fill up the stringArray with user input.
stringArray[counter] = getchar();
while (stringArray[counter] != '\n') {
stringArray[++counter] = getchar();
}
// Main function.
counter = 0;
while (stringArray[wordCounter] != '\n') {
// Puts first word into temporary wordArray.
while ((stringArray[wordCounter] != ' ') && (stringArray[wordCounter] != '\n')) {
wordArray[counter++] = stringArray[wordCounter++];
}
wordArray[counter] = '\0';
//Prints out the content of wordArray.
counter = 0;
printf("Hi ");
while (wordArray[counter] != '\0') {
putchar(wordArray[counter]);
counter++;
}
printf("!\n");
//Clears temporary wordArray for new use.
for (counter = 0; counter == MAX; counter++) {
wordArray[counter] = '\0';
}
wordCounter++;
counter = 0;
}
return 0;
}
Solved it! I needed to add the following if statement at the end, where I incremented the wordCounter. :)
if (stringArray[wordCounter] != '\n') {
wordCounter++;
}
You are using int arrays to represent strings, probably because getchar() returns an int. However, strings are better represented as char arrays, since that's what they are in C. The fact that getchar() returns an int is certainly confusing; it does so because it needs to be able to return the special value EOF, which doesn't fit in a char. Therefore it uses int, which is a "larger" type (able to represent more different values), so it can hold all the char values and EOF.
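A minimal sketch of that pattern: the character is received in an int, checked against EOF, and only then stored in a char array. The function name readLine is invented for the example, and it uses fgetc on a FILE * (getchar() behaves like fgetc(stdin)) so it can be tried on any stream.

```c
#include <stdio.h>

/* Reads one line from `in` into buf (at most size-1 characters) and
   terminates it. Returns the number of characters stored. */
size_t readLine(FILE *in, char *buf, size_t size)
{
    size_t n = 0;
    int c;                      /* int, so that EOF can be told apart */

    if (size == 0)
        return 0;
    while (n < size - 1 && (c = fgetc(in)) != EOF && c != '\n')
        buf[n++] = (char)c;     /* safe to narrow once EOF is ruled out */
    buf[n] = '\0';
    return n;
}
```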
With char arrays, you can use C's string functions directly:
char stringArray[MAX];
if(fgets(stringArray, sizeof stringArray, stdin) != NULL)
printf("You entered %s", stringArray);
Note that fgets() will leave the end of line character(s) in the string, so you might want to strip them out. I suggest implementing an in-place function that trims off leading and trailing whitespace; it's a good exercise as well.
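If you want to check your own attempt at the trimming exercise, here is one possible sketch. The name trim and the choice to return the same pointer are mine; there are many other reasonable designs.

```c
#include <ctype.h>
#include <string.h>

/* Removes leading and trailing whitespace from s in place and returns s. */
char *trim(char *s)
{
    char *start = s;
    size_t len;

    /* Skip leading whitespace. The cast avoids undefined behaviour when
       char is signed and holds a negative value. */
    while (isspace((unsigned char)*start))
        start++;

    /* Drop trailing whitespace, including any '\n' left in the buffer. */
    len = strlen(start);
    while (len > 0 && isspace((unsigned char)start[len - 1]))
        len--;

    /* Shift the remaining text to the front; memmove copes with overlap. */
    memmove(s, start, len);
    s[len] = '\0';
    return s;
}
```

Since isspace treats '\n' and '\r' as whitespace, this also takes care of the line ending.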
for (counter = 0; counter == MAX; counter++) {
wordArray[counter] = '\0';
}
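You never enter this loop: the condition counter == MAX is already false when counter is 0, so the body never runs. You meant counter < MAX.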
user1799795,
For what it's worth (now that you've solved your problem) I took the liberty of showing you how I'd do this given the restriction "use arrays", and explaining a bit about why I'd do it that way... Just beware that while I am an experienced programmer I'm no C guru... I've worked with guys who absolutely blew me into the C-weeds (pun intended).
#include <stdio.h>
#include <string.h>
#define LINE_SIZE 500
#define MAX_WORDS 50
#define WORD_SIZE 20
// Main function.
int main(int argc, const char * argv[])
{
int counter = 0;
// ----------------------------------
// Read a line of input from the user (ie stdin)
// ----------------------------------
char line[LINE_SIZE];
printf("Please type in a list of names then hit ENTER:\n");
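if ( fgets(line, LINE_SIZE, stdin) == NULL ) {
// fgets returns NULL on end-of-input or error, so retrying in a loop
// would never terminate; report the problem and give up instead.
fprintf(stderr, "You must enter something. Pretty please!\n");
return 1;
}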
// A note on that LINE_SIZE parameter to the fgets function:
// wherever possible it's a good idea to use the version of the standard
// library function that allows you to specify the maximum length of the
// string (or indeed any array) because that dramatically reduces the
// incidence of "string overruns", which are a major source of bugs in c
// programmes.
// Also note that fgets includes the end-of-line character/sequence in
// the returned string, so you have to ensure there's room for it in the
// destination string, and remember to handle it in your string processing.
// -------------------------
// split the line into words
// -------------------------
// the current word
char word[WORD_SIZE];
int wordLength = 0;
// the list of words
char words[MAX_WORDS][WORD_SIZE]; // an array of up to 50 words of
// up to 19 characters each (plus terminator)
int wordCount = 0; // the number of words in the array.
// The below loop syntax is a bit cryptic.
// The "char *c=line;" initialises the char-pointer "c" to the start of "line".
// The " *c;" is ultra-shorthand for: "is the-char-at-c not equal to zero".
// All strings in c end with a "null terminator" character, which has the
// integer value of zero, and is commonly written as '\0' or 0 (not NULL,
// which is the null pointer constant). In the C language any integer may be
// evaluated as a boolean (true|false) expression, where 0 is false, and
// (pretty obviously) everything-else is true. So: If the character at the
// address-c is not zero (the null terminator) then go-round the loop again.
// Capiche?
// The "++c" moves the char-pointer to the next character in the line. I use
// the pre-increment "++c" in preference to the more common post-increment
// "c++" because it's a smidge more efficient.
//
// Note that this syntax is commonly used by "low level programmers" to loop
// through strings. There is an alternative which is less cryptic and is
// therefore preferred by most programmers, even though it's not quite as
// efficient. In this case the loop would be:
// int lineLength = strlen(line);
// for ( int i=0; i<lineLength; ++i)
// and then to get the current character
// char ch = line[i];
// We get the length of the line once, because the strlen function has to
// loop through the characters in the array looking for the null-terminator
// character at its end (guess what its implementation looks like ;-)...
// which is inherently an "expensive" operation (totally dependent on the
// length of the string) so we at least avoid repeating this operation.
//
// I know I might sound like I'm banging on about not-very-much but once you
// start dealing with "real world" magnitude datasets then such habits,
// formed early on, pay huge dividends in the ability to write performant
// code the first time round. Premature optimisation is evil, but my code
// doesn't hardly ever NEED optimising, because it was "fairly efficient"
// to start with. Yeah?
for ( char *c=line; *c; ++c ) { // foreach char in line.
char ch = *c; // "ch" is the character value-at the-char-pointer "c".
if ( ch==' ' // if this char is a space,
|| ch=='\n' // or we've reached the EOL char
) {
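// 1. add the word to the end of the words list.
// note that we copy only wordLength characters, instead of
// relying on a null-terminator (which doesn't exist), as we
// would do if we called the more usual strcpy function instead.
// strncpy won't terminate the copy either, so we do that by hand.
// Empty words (repeated spaces) and words beyond MAX_WORDS are skipped.
if ( wordLength>0 && wordCount<MAX_WORDS ) {
strncpy(words[wordCount], word, wordLength);
words[wordCount++][wordLength] = '\0';
}
// 2. and "clear" the word buffer.
wordLength=0;
} else if (wordLength==WORD_SIZE-1) { // this word is too long
// so split this word into two words.
if ( wordCount<MAX_WORDS ) {
strncpy(words[wordCount], word, wordLength);
words[wordCount++][wordLength] = '\0';
}
wordLength=0;
word[wordLength++] = ch;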
} else {
// otherwise: append this character to the end of the word.
word[wordLength++] = ch;
}
}
// -------------------------
// print out the words
// -------------------------
for ( int w=0; w<wordCount; ++w ) {
printf("Hi %s!\n", words[w]);
}
return 0;
}
In the real world one can't make such restrictive assumptions about the maximum length of words, or how many there will be, and if such restrictions are given they're almost always arbitrary and therefore proven wrong all too soon... so straight-off-the-bat for this problem, I'd be inclined to use a linked-list instead of the "words" array... wait till you get to "dynamic data structures"... You'll love em ;-)
Cheers. Keith.
PS: You're going pretty well... My advice is "just keep on truckin"... this gets a LOT easier with practice.
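For the curious, the linked-list idea mentioned above might look something like the following. Treat it as a rough sketch rather than the one true way: the names WordNode, splitWords and freeWords are invented for the example, and it assumes words are separated only by spaces and newlines.

```c
#include <stdlib.h>
#include <string.h>

/* One node per word; the text is allocated to fit the word exactly. */
struct WordNode {
    char *text;
    struct WordNode *next;
};

/* Frees a whole list, including the word texts. */
void freeWords(struct WordNode *head)
{
    while (head != NULL) {
        struct WordNode *next = head->next;
        free(head->text);
        free(head);
        head = next;
    }
}

/* Splits line at spaces and newlines. Returns the head of a newly
   allocated list, or NULL if there are no words or memory runs out. */
struct WordNode *splitWords(const char *line)
{
    struct WordNode *head = NULL;
    struct WordNode **tail = &head;     /* where the next node gets linked */
    const char *sep = " \n";

    for (;;) {
        line += strspn(line, sep);          /* skip separators */
        size_t len = strcspn(line, sep);    /* length of this word */
        if (len == 0)
            break;                          /* end of line reached */

        struct WordNode *node = malloc(sizeof *node);
        if (node == NULL) { freeWords(head); return NULL; }
        node->text = malloc(len + 1);
        if (node->text == NULL) { free(node); freeWords(head); return NULL; }
        memcpy(node->text, line, len);
        node->text[len] = '\0';
        node->next = NULL;

        *tail = node;                       /* append at the end */
        tail = &node->next;
        line += len;
    }
    return head;
}
```

Printing is then a walk along the list, calling printf("Hi %s!\n", node->text) for each node, followed by one call to freeWords. There is no longer any limit on word length or word count other than available memory.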

Skip white space and return one word at a time in C

This code is supposed to skip white space and return one word at a time. A couple of questions on this code: when the code gets to the *word++=c; line I get a core dump. Have I written this line correctly, and is the return correct? And do I need to somehow allocate memory to store the word?
//get_word
int get_word(char *word,int lim){
int i=0;
int c;
int quotes=0;
int inword = 1;
while(
inword &&
(i < (lim-1)) &&
((c=getchar()) != EOF)
){
if(c==('\"')){//this is so i can get a "string"
if (quotes) {
inword = 0;
}
quotes = ! quotes;
}
else if(quotes){ //if in a string keep storing til the end of the string
*word++=c;//pointer word gets c and increments the pointer
i++;
}
else if(!isspace(c)) {//if not in string store
*word++=c;
i++;
}
else {
// Only end if we have read some character ...
if (i)
inword = 0;
}
}
*word='\0'; //null at the end to signify
return i; //value
}
It's impossible to tell why this core dumps without seeing the code that calls get_word. The failure at the line you named implies that you are passing it something invalid in the first parameter. There's nothing wrong with that line in and of itself, but if word does not point to writable memory large enough to hold your output characters, you are in trouble.
The answer to your question about allocating memory to hold it is yes. However, this could be local (e.g. a char array in the caller's local variables), global, or heap-based (e.g. from char * wordHolder = malloc(wordLimit);). The fact you are asking this supports the guess that your parameter 1 value is the problem.
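As an illustration of what a valid first argument looks like, here is a sketch of two callers. Since your calling code isn't shown, everything here is an assumption: getWordFrom is a simplified stand-in for get_word (same word/lim contract, but it reads from a FILE * and ignores quotes), and firstWordLength and firstWordCopy are made-up caller names.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-in for get_word: same (word, lim) contract, but reads
   from `in` and has no quote handling. */
int getWordFrom(FILE *in, char *word, int lim)
{
    int i = 0, c;

    if (lim <= 0)
        return 0;                   /* no room even for the terminator */
    while (i < lim - 1 && (c = fgetc(in)) != EOF) {
        if (!isspace(c))
            word[i++] = (char)c;
        else if (i > 0)
            break;                  /* whitespace after a word ends it */
    }
    word[i] = '\0';
    return i;
}

/* Option 1: the caller owns an array, so word points at real storage. */
int firstWordLength(FILE *in)
{
    char word[64];
    return getWordFrom(in, word, (int)sizeof word);
}

/* Option 2: heap storage; the caller must free the result. */
char *firstWordCopy(FILE *in, int wordLimit)
{
    char *wordHolder;

    if (wordLimit < 1)
        return NULL;
    wordHolder = malloc((size_t)wordLimit);
    if (wordHolder == NULL)
        return NULL;
    getWordFrom(in, wordHolder, wordLimit);
    return wordHolder;
}

/* What typically crashes:  char *word; get_word(word, 64);
   word is uninitialised there and points nowhere valid. */
```

If your caller looks like the commented-out case at the bottom, that would explain the core dump at *word++=c;.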

Resources