How can I check file extensions in C?

I've been struggling with this for several days now.
I want to create a function that goes through a directory, picks all the files with the extension *.csv and reads the data in them.
I created a function that is supposed to check whether each file name is legal, by checking that the string ends with .csv. To do this, I want to go to the end of the char * array. I tried this:
char * point = file_name;
while (point != "\0"){
point += 1;
}
which goes through the char * array without ever finding a "\0".
If I write
*point != "\0"
The compiler warns me that I'm comparing an int to a char.
I should note that I get the filename using
dirent->d_name

point != "\0" compares a pointer to another pointer, which is not what you want.
You want to compare whatever point points to with a char whose value is 0. So use e.g.
*point != '\0';
Note, if you want to find the end of the string, you could also do
point = file_name + strlen(file_name);
If you want to check whether a string ends in .csv, you could also do something like
if((point = strrchr(file_name,'.')) != NULL ) {
if(strcmp(point,".csv") == 0) {
//ends with csv
}
}

You must dereference the pointer to look at the value it points at:
while (*point != '\0')
point++;
Note that the right hand side is a character literal (single quotes), not a string (double quotes). This fixes the warning you got when trying to use a string.
Also note that it's really unnecessary to find the end of the string to check this. A better approach would be:
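#include <string.h>

int ends_with(const char* name, const char* extension, size_t length)
{
const char* ldot = strrchr(name, '.');
if (ldot != NULL)
{
if (length == 0)
length = strlen(extension);
/* the text after the last dot must have exactly this length, so that
   "test.foobar" does not match "foo" */
return strlen(ldot + 1) == length && strncmp(ldot + 1, extension, length) == 0;
}
return 0;
}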
Call the above with, e.g., ends_with("test.foo", "foo", 3) and you will get 1; if no match is found, it returns 0. Passing 0 as the length makes the function compute it with strlen().
This is not faster, but it's a lot clearer since it operates at a higher level, using only well-known standard string functions.

To get the extension from a file name you can use the strrchr function to get the position of the last occurrence of a character in a string:
char * end = strrchr(filename, '.');
if(end != NULL && strcmp(end, ".csv") == 0) // strrchr returns NULL if there is no '.'
{
// Yippee! A CSV file! Let's do something with it!
}
(And have a look at what others have said, as they address several problems with your code.)

unwind is right; you just interpreted his code wrong.
What many people (especially newcomers, though I don't want to call you one) don't notice, especially when coming from languages like JavaScript or PHP, is that "" and '' aren't the same in C (or C++). (They aren't the same in PHP either, but the difference is bigger here.)
"a" is a string literal: essentially a char[2] holding 'a' and '\0', which you must not modify.
'a' is a single character constant, not an array (in C it actually has type int).
What you want is to compare a single character with a single character, so you have to use '\0'.
If you use "\0", you compare your character with the address of a (nearly) empty string literal, which is a pointer, not a character.

Try single quotes, not double quotes: single quotes make the constant a character, while double quotes are for strings. You also need to dereference point using * to get at the character:
char * point = file_name;
while (*point != '\0')
++point;

Based on @unwind's answer, I propose this version to check whether a name ends with a specific extension. Note that the inputs are declared as const pointers to const char.
#include <string.h>

int stringEndsWith(
char const * const name,
char const * const extension)
{
size_t name_len = 0;
size_t ext_len = 0;
if (name == NULL) return 0;
if (extension == NULL) return 0;
name_len = strlen(name);
ext_len = strlen(extension);
if (ext_len == 0 || ext_len > name_len) return 0;
/* compare the last ext_len characters of name with extension, so that
   "file.csvx" does not match ".csv" (a prefix-only strncmp would accept it) */
return strcmp(name + name_len - ext_len, extension) == 0;
}

Related

Appending Characters to an Empty String in C

I'm relatively new to C, so any help understanding what's going on would be awesome!!!
I have a struct called Token that is as follows:
//Token struct
struct Token {
char type[16];
char value[1024];
};
I am trying to read from a file and append characters read from the file into Token.value like so:
struct Token newToken;
char ch;
ch = fgetc(file);
strncat(newToken.value, &ch, 1);
THIS WORKS!
My problem is that Token.value begins with several values I don't understand, preceding the characters that I appended. When I print the result of newToken.value to the console, I get #�����TheCharactersIWantedToAppend. I could probably figure out a band-aid solution to retroactively remove or work around these characters, but I'd rather not if I don't have to.
In analyzing the � characters, I see them as (in order from index 1-5): \330, \377, \377, \377, \177. I read that \377 is a special character for EOF in C, but also 255 in decimal? Do these values make up a memory address? Am I adding the address to newToken.value by using &ch in strncat? If so, how can I keep them from getting into newToken.value?
Note: I get a segmentation fault if I use strncat(newToken.value, ch, 1) instead of strncat(newToken.value, &ch, 1) (ch vs. &ch).
I'll try to consolidate the answers already given in the comments.
This version of the code uses strncat(), as yours does, but fixes the problems noted by Nick (we must initialize the target) and Dúthomhas (the second parameter to strncat() should be a string, not a pointer to a single char). Strictly speaking, strncat() never reads more than N chars from the source, so &ch with N = 1 happens to work, but a real two-char string such as {ch, '\0'} makes the intent clear and also works with functions like strcat() that rely on the terminator.
Please be aware that strncat(), strncpy() and all related functions are tricky. They don't copy more than N chars from the source. But strncpy() only adds the final '\0' to the target when the source has fewer than N chars, while strncat() always adds it, even if the source has exactly N chars or more, so the target needs room for N + 1 more bytes (edited; thanks, @Clifford).
#include <stdio.h>
#include <string.h>
int main() {
FILE* file = stdin; // fopen("test.txt", "r");
if (file) {
struct Token {
char type[16];
char value[1024];
};
struct Token newToken;
newToken.value[0] = '\0'; // A '\0' at the first position means "empty"
int aux;
char source[2] = ""; // A literal "" has a single char with value '\0', but this syntax fills the entire array with '\0's
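while (strlen(newToken.value) < sizeof newToken.value - 1 && (aux = fgetc(file)) != EOF) { // only read while value[] has room for one more char plus '\0'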
source[0] = (char)aux;
strncat(newToken.value, source, 1); // This appends AT MOST 1 CHAR (and always adds a final '\0')
}
strncat(newToken.value, "", 1); // As the source string is empty, it just adds a final '\0' (superfluous in this case)
printf("%s", newToken.value); // never pass input data as the format string
}
return 0;
}
This other version uses an index variable and writes each single char directly into the "current" position of the target string, without using strncat(). I think it is simpler and safer, because it doesn't mix the confusing semantics of single chars and strings.
#include <stdio.h>
#include <string.h>
int main() {
FILE* file = stdin; // fopen("test.txt", "r");
if (file) {
struct Token {
size_t index; // C does not allow "= 0" here; the initializer below zeroes it
char type[16];
char value[1024]; // Max size is 1023 chars + '\0'
};
struct Token newToken = { 0 }; // index = 0 and every char = '\0', so value starts "empty"
int aux;
while ((aux = fgetc(file)) != EOF) {
// Index will stop BEFORE 1024-1 (value[1022] will be the last "real" char, leaving space for a final '\0')
if (newToken.index < sizeof newToken.value - 1)
newToken.value[newToken.index++] = (char)aux;
}
newToken.value[newToken.index] = '\0';
printf("%s", newToken.value);
}
return 0;
}
Edited: fgetc() returns an int, and we should check for EOF before casting it to a char (thanks, @chqrlie).
You are appending to a string that is not initialised, so it can contain anything. The end of a string is indicated by a NUL (0) character; in your example there happened to be one after 6 bytes, but there need not be any within the value array, so the code is seriously flawed and has undefined behaviour.
You need to initialise the newToken instance to empty string. For example:
struct Token newToken = { "", "" } ;
or to zero initialise the whole structure:
struct Token newToken = { 0 } ;
The point is that C does not initialise non-static objects without an explicit initialiser.
Furthermore, using strncat() this way is very inefficient: each call must scan the whole destination to find its end, so its execution time grows with the length of the destination string (see https://www.joelonsoftware.com/2001/12/11/back-to-basics/).
In this case you would do better to maintain a count of the number of characters added, and write the character and terminator directly to the array. For example:
size_t index = 0 ;
int ch = 0 ;
newToken.value[0] = '\0' ; // empty string even if the file is empty
do
{
ch = fgetc(file);
if( ch != EOF )
{
newToken.value[index] = (char)ch ;
index++ ;
newToken.value[index] = '\0' ;
}
} while( ch != EOF &&
index < sizeof(newToken.value) - 1 ) ;

K&R - Recursive descent parser - strcat

What would be the reason for out[0] = '\0'; on the main() function?
It does seem to be working without it.
Code
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#define MAXTOKEN 100
enum { NAME, PARENS, BRACKETS };
int tokentype;
char token[MAXTOKEN]; /*last token string */
char name[MAXTOKEN]; /*identifier name */
char datatype[MAXTOKEN]; /*data type = char, int, etc. */
char out[1000];
void dcl(void);
void dirdcl(void);
int gettoken(void);
/*
Grammar:
dcl: optional * direct-dcl
direct-dcl: name
(dcl)
direct-dcl()
direct-dcl[optional size]
*/
int main() /* convert declaration to words */
{
while (gettoken() != EOF) { /* 1st token on line */
/* 1. gettoken() gets the datatype from the token */
strcpy(datatype, token);
/* 2. Init out to end of the line? */
/* out[0] = '\0'; */
/* parse rest of line */
dcl();
if (tokentype != '\n')
printf("syntax error\n");
printf("%s: %s %s\n", name, out, datatype);
}
return 0;
}
int gettoken(void) /* return next token */
{
int c, getch(void);
void ungetch(int);
char *p = token;
/* Skip blank spaces and tabs */
while ((c = getch()) == ' ' || c == '\t')
;
if (c == '(') {
if ((c = getch()) == ')') {
strcpy(token, "()");
return tokentype = PARENS;
} else {
ungetch(c);
return tokentype = '(';
}
} else if (c == '[') {
for (*p++ = c; (*p++ = getch()) != ']'; )
;
*p = '\0';
return tokentype = BRACKETS;
} else if (isalpha(c)) {
/* Reads the next character of input */
for (*p++ = c; isalnum(c = getch()); ) {
*p++ = c;
}
*p = '\0';
ungetch(c); /* Get back the space, tab */
return tokentype = NAME;
} else
return tokentype = c;
}
/* dcl: parse a declarator */
void dcl(void)
{
int ns;
for (ns = 0; gettoken() == '*'; ) /* count *'s */
ns++;
dirdcl();
while (ns-- > 0)
strcat(out, " pointer to");
}
/* dirdcl: parse a direct declarator */
void dirdcl(void)
{
int type;
if (tokentype == '(') {
dcl();
if (tokentype != ')')
printf("error: missing )\n");
}
else if (tokentype == NAME) /* variable name */ {
strcpy(name, token);
printf("token: %s\n", token);
}
else
printf("error: expected name or (dcl)\n");
while ((type = gettoken()) == PARENS || type == BRACKETS) {
if (type == PARENS)
strcat(out, " function returning");
else {
strcat(out, " array");
strcat(out, token);
strcat(out, " of");
}
}
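}
/* getch/ungetch: the character push-back buffer from K&R section 4.3,
   needed for this program to link */
#define BUFSIZE 100
char buf[BUFSIZE]; /* buffer for ungetch */
int bufp = 0; /* next free position in buf */
int getch(void) /* get a (possibly pushed back) character */
{
return (bufp > 0) ? buf[--bufp] : getchar();
}
void ungetch(int c) /* push character back on input */
{
if (bufp >= BUFSIZE)
printf("ungetch: too many characters\n");
else
buf[bufp++] = c;
}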
You need out[0] to be zero in order for strcat to work.
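This line
out[0] = '\0';
is not needed for the first line of input, because arrays with static storage duration, such as the global out[], are initialized to all zeros. It is still needed on every later pass through the while loop, though: nothing else clears out, so without it each line's description would be appended to the previous one.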
According to the initialization rules of C99:
...
if it has arithmetic type, it is initialized to (positive or unsigned) zero.
if it is an aggregate, every member is initialized (recursively) according to these rules.
It resets the char array (i.e. the string) to an empty string, so that junk values are not used.
It is like writing:
int i = 0;
before doing something like:
i += 1;
so that no junk value gets added.
Putting '\0' at index 0 of the array marks the string as empty, so strcat() starts appending at index 0, overwriting whatever junk is in the rest of the array.
If the program works without resetting the array, that is not your IDE doing it for you: out is a global variable, and C initializes globals to zero. It is still good practice to reset it.
In short: In this particular case it's not strictly necessary, but in many other cases that look suspiciously similar, it is, so most people do it as "good style". So why would it be necessary?
There is no such thing as "empty" memory. There is no such thing as a "length". Unless you explicitly keep track of it, or define your own.
Memory is just bytes, which are numbers from 0 to 255. Since 0 is just as valid a number as 255, there is no way to tell whether a byte is used or not. You can "add up" several bytes if you need larger numbers, but everything is built out of bytes, in the end. Text is simply mapped to numbers. Decades ago it was decided which number represents which character. So if you see a byte with the value 32, it could be the number 32. Or it could be character number 32 in the computer's alphabet (in ASCII, that is the space character).
When you receive a string and you don't know how much text you will be dealing with, what you usually do is you reserve a large block of bytes. This is what char out[1000]; above does. But how do you tell where the text ends? How much of the 1000 bytes you've already used?
Well, in the old days, some people would just declare another variable, say, int length; and keep track of how many bytes they've used so far. The designers of C went a different route. They decided to pick a very rare character and use it as a marker. They picked the character with the value 0 (that is not the character '0', which has the value 48 in ASCII).
So you can just look at all the bytes in your string from the start: if a character is not 0, you know it is used, and if you reach a 0 character, you know this is the end of your string. There are various advantages to either approach. An int typically uses 4 bytes, an additional 0 character only 1. On the other hand, if you use an int, a string can also contain a 0 character; it's just another character, and nobody cares.
Whenever you write "foo" in C, what C actually does is reserve room for 4 bytes, for 'f', 'o', 'o' and for the 0 to indicate the end. When you write "" in C, what it does is reserve room for a single byte, the 0. So that you can tell that the string is empty.
So, what is memory filled with before you put something into it? For local variables, it is usually just garbage: whatever was left in that memory the last time it was used, for example by an earlier function call whose stack frame occupied the same spot. These can be any byte values, often outside the range of common characters.
So, if you want strcat to see out as an empty string, you need to give it a block of memory that starts with this 0 value character. If you just leave memory like it is, there might be some random characters in it. Your buffer might contain "jbhasugaudq7e1723876123798dbkda0skno§§^^%$#-9H0HWDZmwus0/usr/local/bin"
or whatever was in that memory before. If you now appended some text to it, it would think the stuff before the first 0 (which is just randomly in this place) was a valid string, and append it to that. It will only know that this string is supposed to be empty, if you put a 0 right at the start.
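So why did I say it is "not strictly necessary"? Well, because in your case, out is a global variable, and global variables are special because they automatically get cleared to 0 when your application starts up (or get whatever value you assign them when you declare them). That happens only once, though, so it covers only the first line your loop processes; for every later line, out[0] = '\0' is what throws away the previous line's text.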
However, this is only true for variables with static storage duration (globals, whether declared static or not, and local variables declared static). So many programmers make it a habit to always initialize their blocks of bytes. That way, if someone later decides to change a global into a local variable, or copies and pastes the code to another spot to use with a local variable, they do not have to worry about forgetting to add this statement.
This is especially useful as random memory often contains 0 characters. So depending on what program you previously used, you might not notice you forgot the initial 0 because there happened to be one already in there. And only later, when one of your users runs this application, they get garbage at the start of their string.
Does that clarify things a bit?

Changing the extension of a passed filename

My function is passed a filename of the type
char *myFilename;
I want to change the existing extension to ".sav", or if there is no extension, simply add ".sav" to the end of the file name. But I need to consider files named such as "myfile.ver1.dat".
Can anyone give me an idea of the best way to achieve this?
I was considering using a function to find the last "." and replace all the characters after it with "sav", or, if no "." is found, simply add ".sav" to the end of the string. But I'm not sure how to do it, as I get confused by the '\0' part of the string and whether strlen counts the '\0' or whether I need to add 1 to the length.
I want to eventually end up with a filename to pass to fopen().
Maybe something like this:
char *ptrFile = strrchr(myFilename, '/');
ptrFile = (ptrFile == NULL) ? myFilename : ptrFile + 1; // start of the file-name part
char *ptrExt = strrchr(ptrFile, '.');
if (ptrExt != NULL)
strcpy(ptrExt, ".sav"); // myFilename's buffer must have room for the new extension
else
strcat(ptrFile, ".sav");
And then the traditional way: remove and rename.
Here's something lazy I've whipped up; it makes minimal use of the standard library functions (maybe you'd like something that does?):
#include <stdio.h>
#include <string.h>
void change_type(char* input, const char* new_extension, size_t size)
{
char* end = input; // will point at the terminating '\0'
char* p;
while (*end != '\0') // move pointer to final position
end++;
p = end;
while (p > input && *p != '.') // go backwards until we find a dot or reach the start
p--;
if (*p != '.') // no dot found: append to the end of the original string
p = end;
// replace (or append) the extension only if the result fits in size bytes
if ((size_t)(p - input) + strlen(new_extension) < size)
strcpy(p, new_extension);
}
int main()
{
char input[10] = "file";
change_type(input, ".bff", sizeof(input));
printf("%s\n", input);
return 0;
}
And it indeed prints file.bff. Extensions of any length work as long as the result fits in the buffer; if it doesn't, the name is left unchanged.
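strlen returns the number of characters in the string, not counting the terminating null, but arrays are indexed from 0, so
filename[strlen(filename)]
is the terminating null (and a buffer for the string needs strlen(filename) + 1 bytes).
int p;
for (p = (int)strlen(filename) - 1; (p > 0) && (filename[p] != '.'); p--)
; // empty body: the loop only moves p
will loop down to zero if there is no extension and stop at the dot otherwise.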

Comparing 2 Strings, one in a struct other not C programming

I have this database and I need to check whether a product name is already in it; if it is, I ask the user to input another one.
The problem is this:
I'm trying to compare a string (the Product Name) found inside the struct with the string the user inputs.
The coding of the struct, the user input part and the search method are here below:
product Structure
typedef struct
{
char pName[100];
char pDescription [100];
float pPrice;
int pStock;
int pOrder;
}product;
the checkProduct method:
int checkProduct (char nameCheck[100])
{
product temp;
p.pName = nameCheck;
rewind (pfp);
while (fread(&temp,STRUCTSIZE,1,pfp)==1)
{
if (strcmp (temp.pName,p.pName))
{
return 1;
}
}
return 0;
}
and the user input part [part of the code]:
char nameCheck[100];
gets (nameCheck);
checkProduct (nameCheck);
while (checkProduct == 1)
{
printf ("Product Already Exists!\n Enter another!\n");
while (getchar() !='\n')
{
continue;
}
}
p.pName = nameCheck;
Now I am getting the following errors (I use Eclipse):
on the line
while (checkProduct == 1) [found in the user input] is giving me:
"comparison between pointer and integer - enabled by default" marked by a yellow warning triangle
p.pName = nameCheck; is marked as a red cross and stopping my compiling saying:
"incompatible types when assigning to type 'char [100] from type 'char*'
^---- Is giving me trouble BOTH in the userinput AND when I'm comparing strings.
Any suggestions on how I can fix it, or maybe how I can dereference it? I can't understand why in the struct the char pName is being marked as '*' whereas in the char[100] it's not.
Any brief explanation please?
Thank you in advance
EDIT: After amending the code with some of the suggestions below:
THIS Is the INPUT NAME OF PRODUCT section;
char *nameCheck;
nameCheck = "";
fgets(nameCheck,sizeof nameCheck, stdin);
checkProduct (nameCheck);
int value = checkProduct (nameCheck);
while (value == 1)
{
printf ("Product Already Exists!\n Enter another!\n");
while (getchar() !='\n')
{
}
}
strcpy (p.pName, nameCheck);
this is the new checkProduct method
int checkProduct (char *nameCheck)
{
product temp;
strcpy (p.pName, nameCheck);
rewind (pfp);
while (fread(&temp,STRUCTSIZE,1,pfp)==1)
{
if (strcmp (temp.pName,p.pName) == 0)
{
return 1;
}
}
return 0;
}
p.pName = nameCheck;
is wrong, as you are trying to assign the address of one array to another. What you probably want is to copy it.
Use strcpy() instead.
strcpy(p.pName, nameCheck);
while (checkProduct == 1)
Since checkProduct is a function, the above condition will always be false as the address of function won't be equal to 1. You can store the return value in another integer like this:
int value = checkProduct(nameCheck);
while (value == 1)
/* rest of the code */
Or rather simply:
while ( checkProduct(nameCheck) == 1 ) {
...
Note - I've not checked entire code, there might be other bugs apart from this one. Btw, if you are new to programming, you can start with small examples from textbooks and then work towards slightly complex stuff.
int checkProduct (char nameCheck[100])
Note that the type signature is a lie. The signature should be
int checkProduct(char *nameCheck)
since the argument the function expects and receives is a pointer to a char, or, to document it for the user that the argument should be a pointer to the first element of a 0-terminated char array
int checkProduct(char nameCheck[])
Arrays are never passed to functions as such: as function arguments, and in most other circumstances (the exceptions are when the array is the operand of sizeof, _Alignof or the address operator &), they are converted to pointers to their first element.
{
product temp;
p.pName = nameCheck;
Arrays are not assignable. The only time you can have an array name on the left of a = is initialisation at the point where the array is declared.
You probably want
strcpy(p.pName, nameCheck);
there.
rewind (pfp);
while (fread(&temp,STRUCTSIZE,1,pfp)==1)
{
if (strcmp (temp.pName,p.pName))
strcmp returns a negative value if the first argument is lexicographically smaller than the second, 0 if both arguments are equal, and a positive value if the first is lexicographically larger than the second.
You probably want
if (strcmp(temp.pName, p.pName) == 0)
there.
gets (nameCheck);
Never use gets. It is extremely unsafe (and it was removed from the language in C11, yay). Use
fgets(nameCheck, sizeof nameCheck, stdin);
but that stores the newline in the buffer if there is enough space, so you have to overwrite that with 0 if present.
If you are on a POSIX system and don't need to care about portability, you can use getline() to read in a line without storing the trailing newline.
checkProduct (nameCheck);
You check whether the product is known, but throw away the result. Store it in a variable.
while (checkProduct == 1)
checkProduct is a function. In almost all circumstances, a function designator is converted into a pointer, hence the warning about the comparison between a pointer and an integer. You meant to compare to the value of the call you should have stored above.
{
printf ("Product Already Exists!\n Enter another!\n");
while (getchar() !='\n')
You read in characters without storing them. So you will never change the contents of nameCheck, and then be trapped in an infinite loop.
{
continue;
}
If the only statement in a loop body is continue;, you should leave the body empty.
}
p.pName = nameCheck;
Once again, you can't assign to an array.
Concerning the edit,
char *nameCheck;
nameCheck = "";
fgets(nameCheck,sizeof nameCheck, stdin);
you have changed nameCheck from an array to a pointer. That means that sizeof nameCheck now doesn't give the number of chars you can store in the array, but the size of a pointer to char, which is independent of what it points to (usually 4 on 32-bit systems and 8 on 64-bit systems).
And you let that pointer point to a string literal "", which is the reason for the crash. Attempting to modify string literals is undefined behaviour, and more often than not leads to a crash, since string literals are usually stored in a read-only segment of the memory nowadays.
You should have left it at
char nameCheck[100];
fgets(nameCheck, sizeof nameCheck, stdin);
and then you can use sizeof nameCheck to tell fgets how many characters it may read, or, alternatively, you could have a pointer and malloc some memory,
#define NAME_LENGTH 100
char *nameCheck = malloc(NAME_LENGTH);
if (nameCheck == NULL) {
// malloc failed, handle it if possible, or
exit(EXIT_FAILURE);
}
fgets(nameCheck, NAME_LENGTH, stdin);
Either way, after getting input, remove the newline if there is one:
size_t len = strlen(nameCheck);
if (len > 0 && nameCheck[len-1] == '\n') {
nameCheck[--len] = 0;
// A text-mode stdin on Windows already turns "\r\n" into "\n", but input with
// Windows line endings read elsewhere can still leave a '\r' before it.
if (len > 0 && nameCheck[len-1] == '\r') {
nameCheck[--len] = 0;
}
}

How to write a function to search for a string within an array

This is what I have so far:
for(i = 0; i <= 9; i++){
printf("%d", i);
found = strpbrk(nameholder[i], searchterm);
if(strpbrk(nameholder[i], searchterm) == searchterm){
printf("found\n");
foundwhere = i + 1;
break;
}
}// end for
When I run the program, the strpbrk function finds the string, but for some reason it never triggers the if statement. What am I missing?
According to http://en.cppreference.com/w/c/string/byte/strpbrk , strpbrk() is for
char *strpbrk( const char *dest, const char *str );
Finds the first character in byte string pointed to by dest, that is also in byte string pointed to by str.
Thus, if you want to find the whole searchterm rather than any single character of searchterm in nameholder[i], you should use strcmp (exact match) or strstr (substring).
Also note that the operator == cannot be used to compare two char* strings for equality, since it simply compares the addresses, disregarding the string contents. Use strcmp() instead.
If I correctly understood (your description is vague) what you are trying to do, then you seem to be using a wrong function.
Quoting cpp docs on strpbrk:
Returns a pointer to the first occurrence in str1 of any of the characters that are part of str2, or a null pointer if there are no matches.
That's not what you want it to do, right? You should be looking at the strcmp function.
http://www.cplusplus.com/reference/clibrary/cstring/strcmp/
Your code should look like:
for(i = 0; i <= 9; i++){
if(strcmp(nameholder[i], searchterm) == 0){
printf("found\n");
foundwhere = i + 1;
break;
}
}// end for
