C regex validate filename under a folder - c

I am new to regular expressions in C and I am trying to check whether a given filename is under a folder, using the regex.h library. This is what I have tried:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <regex.h>
int checkregex(const char *regex_str, const char *test) {
    regex_t regex;
    printf("regex_str: %s\n\n", regex_str);
    int reti = regcomp(&regex, regex_str, REG_EXTENDED | REG_ICASE);
    if (reti) {
        fprintf(stderr, "Could not compile regex\n");
        exit(1);
    }
    /* regexec() takes execution flags (REG_NOTBOL, REG_NOTEOL), not compile flags */
    reti = regexec(&regex, test, 0, NULL, 0);
    regfree(&regex);
    return reti;
}
int main(int argc, char *argv[]) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <path>\n", argv[0]);
        return 1;
    }
    const char *safepath = "/home"; /* not used yet */
    size_t spl = strlen(safepath);  /* not used yet */
    char *fn = argv[1];
    int noDoubleDots = checkregex("[^..\\/]", fn);
    int allowedChars = checkregex("^[[:alnum:]\\/._ -]*$", fn);
    int backslashWithSpace = checkregex(".*(\\ ).*", fn);
    puts("noDoubleDots");
    puts((noDoubleDots == 0 ? "Match\n" : "No Match\n"));
    puts("allowedChars");
    puts((allowedChars == 0 ? "Match\n" : "No Match\n"));
    puts("backslashWithSpace");
    puts((backslashWithSpace == 0 ? "Match\n" : "No Match\n"));
    return 0;
}
My first attempt, noDoubleDots, was simply to reject anything that includes .. (I couldn't even manage that). But then I tested and saw that file names and folder names can have .. in them, like folder..name/. So I wanted to exclude only the ones with /.. or ../. But if a folder is named something like folder .. and it has another folder inside named folder2/, then the path will be folder\ ../folder2 and excluding ../ would give the wrong result.
In the code, allowedChars works fine. I think if I also checked if the file name has exactly .., \ .. or \ ([:alnum:])* to validate the file path, it would be done. But my regular expression doesn't seem to be working. For example, backslashWithSpace matches with asd / and asd\ /.
How can I check and make sure that the given path is under a folder using regular expressions? Thanks in advance.

POSIX offers a handy function, realpath(). From its Linux man page:
realpath() expands all symbolic links and resolves references to /./,
/../ and extra '/' characters in the null-terminated string named by
path to produce a canonicalized absolute pathname. The resulting
pathname is stored as a null-terminated string, up to a maximum of
PATH_MAX bytes, in the buffer pointed to by resolved_path. The
resulting path will have no symbolic link, /./ or /../ components.
If you can use it, I think it will fit your need; if not, maybe you could copy its source code.

Related

C realpath return NULL vector, can't find the path

I wanted a function to return the absolute path given a relative path to an existing file.
Searching online I came across realpath here: https://stackoverflow.com/a/229038/19637794
And followed the example here: Example of realpath function in C
So I wrote:
#include <limits.h>
#include <stdlib.h>
char path[PATH_MAX];
int test(void)
{
    char *ppath = realpath("../../test/src/file", path);
    if (ppath != NULL)
    {
        /* do stuff, read the file */
    }
    else
    {
        return -1;
    }
    /* no free() here: ppath points at the static buffer path, not at malloc'd memory */
    return 0;
}
int main(void)
{
    return test();
}
Where my working directories are as follow
dir
|---CMakeLists.txt
|---build/
| |---test/
| |
| |---executable
|
|---test/
| |---CMakeLists.txt
| |---src/
| |
| |---test.c
| |---file
I launched the executable from build directory with ./test/executable but I kept getting -1, checked with gdb and verified that ppath was 0x0.
I read My realpath return null for files
But didn't seem to fit my problem
Then I read about #include <errno.h>, so I added these two lines, and it said "No such file or directory":
else
{
char* errStr = strerror(errno);
printf("%s" ,errStr);
}
After that I added printf("%s", path); which printed dir/test, the directory above the one I wanted; ppath was still NULL.
I tried adding a file in the directory above, but that doesn't work either, neither with its original path nor with the new one.
I also read that realpath may return NULL if the path exceeds the size of the buffer it is given, so I also tried dropping path and passing NULL, as done at https://www.demo2s.com/c/c-char-real-realpath-p-null.html, as follows:
char *ppath = realpath("../../test/src/file", NULL );
Which didn't work either.
What am I doing wrong?
Edit: few modify based on comment
Edit2: added parenthesis between test and ;
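As your comment says "from the build directory I run ./test/executable", the working directory is dir/build. Relative paths are resolved against the working directory, not against the location of the executable or of the source file, so realpath() is looking for dir/build/../../test/src/file, i.e. test/src/file in the parent of dir, and it is right to tell you that this does not exist. From dir/build, the file is at ../test/src/file.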

Browse a whole folder in C

I'm trying to list all the contents of a folder (including subfolder and its files)
Like ls -R with Linux
(I am using windows 10)
I already have this basic code with "dirent.h"
#include <stdio.h>
#include <dirent.h>
int main()
{
DIR *rep ;
struct dirent *file ;
rep = opendir ("c:\\test") ; /* backslash must be escaped: "\t" is a tab */
if (rep != NULL)
{
while (file = readdir(rep))
printf ("%s\n", file->d_name) ;
(void) closedir (rep) ;
}
return 0;
}
It lists the contents of a folder fine but does not descend into the sub-folders.
For example, it should be able to browse a whole hard drive,
like C:/
I can't use d_type to detect whether an entry is a file or a folder,
because on Windows the struct is:
struct dirent
{
long d_ino; /* Always zero. */
unsigned short d_reclen; /* Always zero. */
unsigned short d_namlen; /* Length of name in d_name. */
char d_name[260]; /* [FILENAME_MAX] */ /* File name. */
};
So I'm stuck on this problem, if anyone has an idea, or even a code
COMPILER: MinGW32 1.5.0
Here is an example of directory list for Windows.
I used Microsoft Visual Studio Community 2019 to build it. It works as a Unicode Windows application; that is, files and folders whose names contain non-ASCII characters are handled correctly.
To achieve that, I used Windows typical data types and functions:
char -> WCHAR
strcpy -> wcscpy
strcat -> wcscat
strncmp -> wcsncmp
printf -> wprintf
Depending on the compiler you use, you may use the standard data types and functions.
String constants are prefixed with L to make them wide (Unicode) strings; on Windows, WCHAR is 16 bits.
The main function is ScanDir, which takes the starting directory and a file mask. Example call:
ScanDir(L"C:\\Users\\fpiette\\Documents", L"*.jpg");
ScanDir will scan the specified folder for all files and then scan again for all directories, calling ScanDir recursively. For each file, the size and filename are displayed (Of course you may display other properties like time stamp and attributes). For each directory, the name is displayed.
Basically, iterating over a directory is done using the Windows functions FindFirstFile and FindNextFile.
Source code:
#include <stdio.h>
#include <stdlib.h>
#include <io.h>
#include <fcntl.h>
#include <Windows.h>
BOOL ScanDir(
WCHAR* srcDir,
WCHAR* fileMask)
{
WIN32_FIND_DATA fd;
HANDLE fh;
BOOL more;
WCHAR fromDir[MAX_PATH];
BOOL found;
_int64 fileSize;
wcscpy(fromDir, srcDir);
wcscat(fromDir, L"\\");
wcscat(fromDir, fileMask);
// First step: process files in current dir
fh = FindFirstFile(fromDir, &fd);
more = fh != INVALID_HANDLE_VALUE;
found = FALSE;
while (more) {
// Ignore directories in first step
if (0 == (fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)) {
if (!found) {
// For the first file found, display the title
found = TRUE;
wprintf(L"\nDirectory %s\n\n", srcDir);
}
fileSize = ((_int64)fd.nFileSizeHigh << 32) + fd.nFileSizeLow;
// display file information
wprintf(L"%12lld %s\n", fileSize, fd.cFileName);
}
more = FindNextFile(fh, &fd);
}
FindClose(fh);
// Second step: recursively process subfolders
wcscpy(fromDir, srcDir);
wcscat(fromDir, L"\\*.*");
fh = FindFirstFile(fromDir, &fd);
more = fh != INVALID_HANDLE_VALUE;
while (more) {
// Ignore files in second step
if (0 != (fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)) {
// We have a directory, process it recursively
if (wcsncmp(fd.cFileName, L".", 2) && // Ignore current directory "."
wcsncmp(fd.cFileName, L"..", 3)) { // Ignore parent directory ".."
wcscpy(fromDir, srcDir);
wcscat(fromDir, L"\\");
wcscat(fromDir, fd.cFileName);
if (!ScanDir(fromDir, fileMask)) {
FindClose(fh); // release the search handle before bailing out
return FALSE;
}
}
}
more = FindNextFile(fh, &fd);
}
FindClose(fh);
return TRUE;
}
int main()
{
// Change console output to unicode 16 bit (default is OEM)
_setmode(_fileno(stdout), _O_U16TEXT);
ScanDir(L"C:\\Users\\fpiette\\Documents", L"*.jpg");
return 0;
}
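If you would rather stay with dirent.h (for example under MinGW, where struct dirent has no d_type), one possible approach is to call stat() on each entry and test S_ISDIR. What follows is only a sketch: the helper name list_tree and the fixed 4096-byte path buffer are choices made for this example, and it joins names with '/', which Windows APIs generally accept as well.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: prints every entry below dir (like ls -R, flattened)
 * and returns how many entries were printed, or -1 if dir cannot be opened. */
long list_tree(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    long count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        /* skip the self and parent links, or the recursion never ends */
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;

        char child[4096];
        int n = snprintf(child, sizeof child, "%s/%s", dir, e->d_name);
        if (n < 0 || (size_t)n >= sizeof child)
            continue;                     /* path too long for this sketch */

        printf("%s\n", child);
        count++;

        /* no d_type needed: ask stat() whether the entry is a directory */
        struct stat st;
        if (stat(child, &st) == 0 && S_ISDIR(st.st_mode)) {
            long sub = list_tree(child);
            if (sub > 0)
                count += sub;
        }
    }
    closedir(d);
    return count;
}
```

Note that stat() follows symbolic links, so on systems that have them a link pointing back up the tree could make this loop; lstat() would be the usual way around that where it is available.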

How to get normalized (canonical) file path on Linux "even if the filepath is not existing on the file system"? (In a C program)

I have researched a lot on this topic but could not get anything substantial.
By normalize/canonicalize I mean to remove all the "..", ".", multiple slashes etc from a file path and get a simple absolute path.
e.g.
"/rootdir/dir1/dir2/dir3/../././././dir4//////////" to
"/rootdir/dir1/dir2/dir4"
On windows I have GetFullPathName() and I can get the canonical filepath name, but for Linux I cannot find any such API which can do the same work for me,
realpath() is there, but even realpath() needs the filepath to be present on the file system to be able to output normalized path, e.g. if the path /rootdir/dir1/dir2/dir4 is not on file system - realpath() will throw error on the above specified complex filepath input.
Is there any way by which one could get the normalized file path even if it is not existing on the file system?
realpath(3) does not resolve missing filenames.
But the GNU core utilities (https://www.gnu.org/software/coreutils/) include a program, realpath(1), which is similar to the realpath(3) function but has this option:
-m, --canonicalize-missing no components of the path need exist
Your task can be done with the canonicalize_filename_mode() function from lib/canonicalize.c in the coreutils source.
canonicalize_filename_mode() from Gnulib is a great option, but its GPL license rules it out for closed-source commercial software.
We use the following implementation that depends on cwalk library:
#define _GNU_SOURCE
#include <unistd.h>
#include <stdlib.h>
#include "cwalk.h"
/* extended version of canonicalize_file_name(3) that can handle non existing paths*/
static char *canonicalize_file_name_missing(const char *path) {
    char *resolved_path = canonicalize_file_name(path);
    if (resolved_path != NULL) {
        return resolved_path;
    }
    /* handle missing files */
    char *cwd = get_current_dir_name();
    if (cwd == NULL) {
        /* cannot detect current working directory */
        return NULL;
    }
    /* first call only computes the required length */
    size_t resolved_path_len = cwk_path_get_absolute(cwd, path, NULL, 0);
    if (resolved_path_len == 0) {
        free(cwd); /* do not leak cwd on the error path */
        return NULL;
    }
    resolved_path = malloc(resolved_path_len + 1);
    if (resolved_path != NULL) {
        cwk_path_get_absolute(cwd, path, resolved_path, resolved_path_len + 1);
    }
    free(cwd);
    return resolved_path; /* caller must free(); NULL if malloc failed */
}
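If pulling in a library is not an option, a purely lexical normalization can be sketched in a few lines. This is only an illustration under stated assumptions: the name normalize_abs_path is invented, it accepts absolute paths only (a relative path would first need the working directory prepended), and because it never touches the filesystem it ignores symbolic links, so its result can differ from realpath() when a component is a symlink.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: collapses "//", "." and ".." in an absolute path
 * without consulting the filesystem.
 * Returns 0 on success, -1 if path is not absolute or out is too small. */
int normalize_abs_path(const char *path, char *out, size_t outsz)
{
    if (path == NULL || path[0] != '/' || outsz < 2)
        return -1;

    size_t len = 0;                       /* bytes written to out so far */
    const char *p = path;
    while (*p != '\0') {
        while (*p == '/')                 /* skip runs of slashes */
            p++;
        const char *start = p;
        while (*p != '\0' && *p != '/')   /* find the end of the component */
            p++;
        size_t n = (size_t)(p - start);

        if (n == 0 || (n == 1 && start[0] == '.'))
            continue;                     /* nothing left, or a "." component */
        if (n == 2 && start[0] == '.' && start[1] == '.') {
            /* drop the last component already written; stays at root if none */
            while (len > 0 && out[--len] != '/')
                ;
            continue;
        }
        if (len + 1 + n + 1 > outsz)      /* '/' + component + final NUL */
            return -1;
        out[len++] = '/';
        memcpy(out + len, start, n);
        len += n;
    }
    if (len == 0)
        out[len++] = '/';                 /* everything collapsed to the root */
    out[len] = '\0';
    return 0;
}
```

For the path from the question, "/rootdir/dir1/dir2/dir3/../././././dir4//////////", this yields "/rootdir/dir1/dir2/dir4" whether or not any of those directories exist.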

How to remove the path to get the filename

How does one remove the path of a filepath, leaving only the filename?
I want to extract only the filename from a fts_path and store this in a char *fileName.
Here's a function to remove the path on POSIX-style (/-separated) pathnames:
const char *base_name(const char *pathname)
{
    /* strrchr finds the last '/', or returns NULL if there is none */
    const char *lastsep = strrchr(pathname, '/');
    return lastsep ? lastsep + 1 : pathname;
}
If you need to support legacy systems with odd path separators (like MacOS 9 or Windows), you might need to adapt the above to search for multiple possible separators. For example on Windows, both / and \ are path separators and any mix of them can be used.
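One way that adaptation might look, as a sketch only: scan the string once and remember the position just past the last separator of either kind. The name base_name_any is made up for this example, and treating '\\' as a separator is an assumption that is wrong on POSIX systems, where a backslash is a legal character in a file name.

```c
#include <stddef.h>

/* Hypothetical variant of base_name() that accepts both '/' and '\\'.
 * Returns a pointer into pathname; nothing is copied or modified. */
const char *base_name_any(const char *pathname)
{
    const char *base = pathname;
    for (const char *p = pathname; *p != '\0'; p++) {
        if (*p == '/' || *p == '\\')
            base = p + 1;                 /* name starts after this separator */
    }
    return base;
}
```

A path with a trailing separator gives an empty string here, the same as the strrchr() version above.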
You want basename(3).
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <libgen.h>
int main(void)
{
char * path = "/homes/mk08/Desktop/lala.c";
char * tmp = strdup(path);
if(tmp) {
printf("%s\n", basename(tmp));
free(tmp);
}
return EXIT_SUCCESS;
}
This will output:
lala.c
I'm sure there is a less roundabout way of doing this, but you could always search through the filepath (I assume it is stored as a char array?), get the position of the final '\', and then erase everything prior to that.
Edit: See R's comment.

How to extract filename from path

There should be something elegant in Linux API/POSIX to extract base file name from full path
See char *basename(char *path).
Or run the command "man 3 basename" on your target UNIX/POSIX system.
Use basename (which has odd corner case semantics) or do it yourself by calling strrchr(pathname, '/') and treating the whole string as a basename if it does not contain a '/' character.
Here's an example of a one-liner (given char * whoami) which illustrates the basic algorithm:
(whoami = strrchr(argv[0], '/')) ? ++whoami : (whoami = argv[0]);
An additional check is needed if the input can be NULL. Also note that the result just points into the original string, so a strdup() may be appropriate.
You could use strstr in case you are interested in the directory names too:
char *path ="ab/cde/fg.out";
char *ssc;
int l = 0;
ssc = strstr(path, "/");
do{
l = strlen(ssc) + 1;
path = &path[strlen(path)-l+2];
ssc = strstr(path, "/");
}while(ssc);
printf("%s\n", path);
The basename() function returns the last component of a path, which could be a folder name and not a file name. There are two versions of the basename() function: the GNU version and the POSIX version.
The GNU version is declared in string.h once you define _GNU_SOURCE before including it:
#define _GNU_SOURCE
#include <string.h>
The GNU version uses const and does not modify the argument.
char * basename (const char *path)
This function is overridden by the XPG (POSIX) version if libgen.h is included.
char * basename (char *path)
This function may modify the argument by removing trailing '/' bytes. The result may be different from the GNU version in this case:
basename("foo/bar/")
will return the string "bar" if you use the XPG version and an empty string if you use the GNU version.
References:
basename (3) - Linux Man Pages
Function: char * basename (const char *filename), Finding Tokens in a String.
Of course, if this is a GNU/Linux-only question then you could use the library functions.
https://linux.die.net/man/3/basename
And though some may disapprove, these POSIX-compliant GNU library functions do not use const, as library utility functions rarely do. If that is important to you, I guess you will have to stick to your own functionality, or maybe the following will be more to your taste?
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[])
{
char *fn;
char *input;
if (argc > 1)
input = argv[1];
else
input = argv[0];
/* handle trailing '/' e.g.
input == "/home/me/myprogram/" */
size_t len = strlen(input);
if (len > 0 && input[len - 1] == '/') /* len > 0 guards against an empty string */
input[len - 1] = '\0';
(fn = strrchr(input, '/')) ? ++fn : (fn = input);
printf("%s\n", fn);
return 0;
}
template<typename charType>
charType* getFileNameFromPath( charType* path )
{
if( path == NULL )
return NULL;
charType * pFileName = path;
for( charType * pCur = path; *pCur != '\0'; pCur++)
{
if( *pCur == '/' || *pCur == '\\' )
pFileName = pCur+1;
}
return pFileName;
}
call:
wchar_t * fileName = getFileNameFromPath < wchar_t > ( filePath );
(This is C++.)
You can write the path with escaped backslashes and use this code:
#include <stdio.h>
#include <string.h>
int main(void)
{
char path[] = "C:\\etc\\passwd.c"; //string with escaped slashes
char temp[256]; //result here
char *ch; //define this
ch = strtok(path, "\\"); //first split
while (ch != NULL) {
strcpy(temp, ch);//copy result
printf("%s\n", ch);
ch = strtok(NULL, "\\");//next split
}
printf("last filename: %s", temp);//result filename
return 0;
}
I used a simpler way to get just the filename or last part in a path.
#include <string.h>
char *extract_file_name(char *path)
{
    size_t len = strlen(path);
    /* scan backwards for the last path separator, including index 0 */
    for (size_t i = len; i > 0; i--)
    {
        if (path[i - 1] == '\\' || path[i - 1] == '/')
        {
            return path + i;
        }
    }
    return path; /* no separator: the whole string is the name */
}
Input path = "C:/Users/me/Documents/somefile.txt"
Output = "somefile.txt"
@Nikolay Khilyuk offers the best solution, except:
1) Go back to using char *, there is absolutely no good reason for using const.
2) This code is not portable and is likely to fail on non-POSIX systems where / is not the file system delimiter, depending on the compiler implementation. For some Windows compilers you might want to test for '\\' instead of '/'. You might even test for the system and set the delimiter based on the result.
The function name is long but descriptive, no problem there. There is no way to ever be sure that a function will return a filename; you can only be sure that it can if the function is coded correctly, which you achieved. Though if someone uses it on a string that is not a path, obviously it will fail. I would probably have named it basename, as that would convey its purpose to many programmers. That is just my preference, though; your name is fine. As for the length of the string this function can handle, and why anyone thought that would be a point: you are unlikely to deal with a path name longer than what this function can handle. size_t is an implementation-defined unsigned type, typically 32 or 64 bits wide, so even on a 32-bit platform its range is 0 to 4,294,967,295.
I tested your function with the following.
#include <stdio.h>
#include <string.h>
char* getFileNameFromPath(char* path);
int main(int argc, char *argv[])
{
    char *fn;
    fn = getFileNameFromPath(argv[0]);
    printf("%s\n", fn);
    return 0;
}
char* getFileNameFromPath(char* path)
{
    /* start one past the end so that an empty string and a
       separator at index 0 are both handled */
    for (size_t i = strlen(path); i > 0; i--)
    {
        if (path[i - 1] == '/')
        {
            return &path[i];
        }
    }
    return path;
}
Worked great, though Daniel Kamil Kozar did find an off-by-one error that I corrected above. The error would only show with a malformed absolute path, but the function should still be able to handle bogus input. Do not listen to everyone that critiques you. Some people just like to have an opinion, even when it is not worth anything.
I do not like the strstr() solution, as it will fail if the file name is the same as a directory name in the path. That can and does happen, especially on a POSIX system where executable files often do not have an extension. At best it finds the first occurrence, which means you have to do multiple tests, and searching for the delimiter with strstr() is even more cumbersome, as there is no way of knowing how many delimiters there might be. If you are wondering why a person would want the basename of an executable, think busybox, egrep, fgrep, etc.
strrchr() would be cumbersome to use as it searches for characters, not strings, so I did not find it nearly as viable or succinct as this solution. I stand corrected by Rad Lexus: this would not be as cumbersome as I thought, as strrchr() returns a pointer to the last occurrence of the character, so the file name starts one character beyond it.
Take Care
My example (improved; this is C++, since it uses a default argument and nullptr):
#include <string.h>
const char* getFileNameFromPath(const char* path, char separator = '/')
{
if(path != nullptr)
{
for(size_t i = strlen(path); i > 0; --i)
{
if (path[i-1] == separator)
{
return &path[i];
}
}
}
return path;
}
