How can I convert a path containing a wildcard to the corresponding file entries in a C program? - c

I'm trying to implement the ls command with the wildcard *.
I have just learned that most shells convert an ls argument containing * to the corresponding entries before running ls.
For example, the directory foo contains a.file, b.file, and the directory bar.
The directory bar contains c.file, d.file, and e.file.
Assume that the current directory is foo.
The argument */* is then converted to the following entries:
"bar/c.file", "bar/d.file", "bar/e.file"
How can a program perform this? I don't know where to start, and there are many possible cases:
*/../*, ../../*, */*/*, etc.
Any advice would be awesome. Thank you.

You can of course use glob() to do a lot of this work.
Such patterns are called globs, for some reason I won't dig up now. :)

POSIX provides glob(3) for programmatic wildcard path expansion.
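As a minimal sketch of what that could look like, the function below expands one pattern with glob(3) and prints each match. The name expand_pattern is made up for this example; only glob(), globfree(), and the glob_t fields come from POSIX.

```c
#include <glob.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a library function): expands `pattern` the way a
 * shell would and prints one match per line.
 * Returns the number of matches, 0 if nothing matched, or -1 on error. */
int expand_pattern(const char *pattern)
{
    glob_t results;
    int count = -1;
    int rc;

    /* Zero the structure so that globfree() is safe even if glob() fails. */
    memset(&results, 0, sizeof results);

    rc = glob(pattern, 0, NULL, &results);
    if (rc == 0) {
        /* gl_pathv holds gl_pathc matches, sorted by default. */
        for (size_t i = 0; i < results.gl_pathc; i++)
            puts(results.gl_pathv[i]);
        count = (int)results.gl_pathc;
    } else if (rc == GLOB_NOMATCH) {
        count = 0;
    }

    globfree(&results);
    return count;
}
```

With the layout from the question and foo as the current directory, expand_pattern("*/*") would be expected to print the three entries under bar/. Patterns such as */../* or ../../* go through the same call, since glob() resolves each path component itself.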

Related

own shell in C using execv

I am trying to build my own shell in C as part of a class project. We are required to use execv and implement our own path. For better understanding here is the question:
The list of paths is empty by default, but may grow to any arbitrary size. You should implement a built-in command to control this variable:
path [+|- /some/dir]
path (without arguments) displays all the entries in the list separated by colons, e.g. "/bin:/usr/bin".
path + /some/dir appends the given pathname to the path list.
path - /some/dir removes the given pathname from the path list.
I have misread the assignment and used execvp so far. Please can you shed some light on how to create my own path variable, and for each command executed search the directory it is in and add it to the path? Or is there any simple shell written using execv I can take a look at?
I saw http://linuxgazette.net/111/ramankutty.html, but I found the search a little too complex, and he uses execve.
So far I have a char *mypath variable, which is initially NULL. The user can add or remove entries using path + /some/dir or path - /some/dir. The syntax for execv is execv("/some/dir", argv). How do I search my path for the executable and pass it to execv? For example, with mypath=/bin/ls, when I call execv(mypath, argv) it does not work. So how do I pass the path to execv?
I'm guessing the reason you are supposed to use execv is precisely that it doesn't take the PATH of the environment into account; the call has to provide a full path to the executable.
Since this is a class project, you are supposed to write your code - writing code is how you learn how to do things, much more than copy-and-paste from the internet, so I'm not going to write code to solve the problem but instead describe the solution.
You will need to keep a list of path entries - adjusted through the path + some/dir and path - some/dir mechanism - so these commands need to be handled inside your shell, of course, and they should add/remove from your list of path entries.
When you then come to executing something, say "mycommand" is entered, you will have to scan the list of path entries and check whether the directory named by each entry contains a file called "mycommand" that can be executed (has an execute permission bit set). If so, call execv on the concatenation of the current path entry, a "/", and "mycommand". (You can build the concatenated string and use the stat function to get the information about the file, for example.)
Do check for errors, and report if something goes wrong.
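For readers who only want to see the shape of that lookup, here is a rough sketch, not a finished shell. The name find_in_path and its parameters are invented for this example, and it assumes the path list is already held as an array of directory strings.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: looks for `cmd` in each of the `n` directories in
 * `dirs`, in order. On success the full path is written to `out` and 0 is
 * returned; if no directory contains an executable file of that name, the
 * function returns -1. */
int find_in_path(const char *const *dirs, size_t n, const char *cmd,
                 char *out, size_t outsz)
{
    for (size_t i = 0; i < n; i++) {
        struct stat st;
        int len = snprintf(out, outsz, "%s/%s", dirs[i], cmd);

        if (len < 0 || (size_t)len >= outsz)
            continue;   /* candidate does not fit in the buffer, skip it */

        /* Must exist, be a regular file, and be executable by this user. */
        if (stat(out, &st) == 0 && S_ISREG(st.st_mode) &&
            access(out, X_OK) == 0)
            return 0;
    }
    return -1;
}
```

In the child process after fork(), the result would then be handed to execv(full_path, argv), followed by an error report such as perror("execv"), because execv only returns when it fails.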
Please do not try to find someone else's shell on the internet. That is not how you learn, and if you don't actually learn from the class exercises, you will most likely not succeed once you finish school - and that's ultimately WHY you are going to school, right?

What corner cases must we consider when parsing $PATH on Linux?

I'm working on a C application that has to walk $PATH to find full pathnames for binaries, and the only allowed dependency is glibc (i.e. no calling external programs like which). In the normal case, this just entails splitting getenv("PATH") by colons and checking each directory one by one, but I want to be sure I cover all of the possible corner cases. What gotchas should I look out for? In particular, are relative paths, paths starting with ~ meant to be expanded to $HOME, or paths containing the : char allowed?
One thing that once surprised me is that the empty string in PATH means the current directory. Two adjacent colons or a colon at the end or beginning of PATH means the current directory is included. This is documented in man bash for instance.
It also is in the POSIX specification.
So
PATH=:/bin
PATH=/bin:
PATH=/bin::/usr/bin
All mean the current directory is in PATH
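One way to handle those empty entries is sketched below. The iterator next_path_dir is a made-up name for this example; it walks a colon-separated string and reports each empty entry as ".". An entirely empty string is reported as a single "." entry, which is an assumption of this sketch.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical iterator: copies the next entry of the colon-separated list
 * at *cursor into `out`, mapping an empty entry to ".". Entries longer than
 * the buffer are truncated. Advances *cursor and returns 1 when an entry
 * was produced, or 0 when the list is exhausted. */
int next_path_dir(const char **cursor, char *out, size_t outsz)
{
    const char *p = *cursor;
    size_t len;

    if (p == NULL)
        return 0;

    len = strcspn(p, ":");          /* length of the entry before ':' or end */
    if (len == 0)
        snprintf(out, outsz, ".");  /* empty entry means current directory */
    else
        snprintf(out, outsz, "%.*s", (int)len, p);

    /* A trailing ':' leaves one more (empty) entry to report. */
    *cursor = (p[len] == ':') ? p + len + 1 : NULL;
    return 1;
}
```

A caller would start with a cursor pointing at the value of getenv("PATH") and loop until the function returns 0.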
I'm not sure this is a problem with Linux in general, but make sure that your code works if PATH has some funky (like, UTF-8) encoding to deal with directories with fancy letters. I suspect this might depend on the filesystem encoding.
I remember working on a bug report from a Russian user who had non-ASCII letters in his user name (and hence in his home directory name, which appeared in PATH).
This is minor, but I'll add it since it hasn't already been mentioned. $PATH can include both absolute and relative paths. If you're crawling the path list by chdir(2)ing into each directory, you need to keep track of the original working directory (getcwd(3)) and chdir(2) back to it at each iteration of the crawl.
The existing answers cover most of it, but it's worth covering the parts of the question that weren't answered yet:
$ and ~ are not special in the value of $PATH.
If $PATH is not set at all, execvp() will use a default value.

implementing globbing in a shell prototype

I'm implementing a Linux shell for my weekend assignment and I am having some problems implementing wildcard matching as a feature in the shell. As we all know, shells are a complete language by themselves, e.g. bash, ksh, etc. I don't need to implement the complete features like control structures, jobs etc. But how do I implement the *?
A quick analysis gives you the following result:
echo *
lists all the files in the current directory. Is this the only logical manifestation of the shell? I mean, not considering the language-specific features of bash, is this what a shell does, internally? Replace a * with all the files in the current directory matching the pattern?
Also, I have heard about Perl Compatible Regular Expressions, but it seems too complex to use a third-party library.
Any suggestions, links, etc.? I will try to look at the source code as well, for bash.
This is called "globbing" and the function performing this is named the same: glob(3)
Yes, that's what the shell does. It replaces a '*' with all the file and folder names in the cwd that match the pattern. It is in fact a very basic pattern language, much simpler than regular expressions: '?' matches one character, '*' matches any run of characters, and '[...]' matches one character from a set, all applied to the file and folder names in the cwd.
Note that a backslashed \* and a '*' enclosed in single or double quotes (' or ") are not replaced (the backslash and quotes are removed before the argument is passed to the command being executed).
If you want more control than glob gives, the standard function fnmatch performs just glob matching.
Note that shells also perform word expansion (e.g. "~" → "/home/user"), which should be done before glob expansion if you're doing filename matching manually. (Or use wordexp.)
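If you go the manual route, a minimal sketch could combine readdir() with fnmatch() as follows. The helper name list_matching is invented for this example; FNM_PERIOD is the standard fnmatch flag that makes a leading '.' require an explicit match, as in a shell.

```c
#include <dirent.h>
#include <fnmatch.h>
#include <stdio.h>

/* Hypothetical helper: prints every entry of `dir` whose name matches the
 * glob `pattern`. Unlike glob(), the output follows readdir() order and is
 * not sorted. Returns the number of matches, or -1 if `dir` cannot be
 * opened. */
int list_matching(const char *dir, const char *pattern)
{
    DIR *d = opendir(dir);
    struct dirent *e;
    int count = 0;

    if (d == NULL)
        return -1;

    while ((e = readdir(d)) != NULL) {
        /* FNM_PERIOD keeps "*" from matching ".", ".." and dot-files. */
        if (fnmatch(pattern, e->d_name, FNM_PERIOD) == 0) {
            puts(e->d_name);
            count++;
        }
    }

    closedir(d);
    return count;
}
```

Calling list_matching(".", "*") would then behave roughly like echo *, apart from the ordering. A pattern containing '/' needs one such pass per path component, which is the work glob() already does for you.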

What can I do if getcwd() and getenv("PWD") don't match?

I have a build system tool that is using getcwd() to get the current working directory. That's great, except that sometimes people have spaces in their paths, which isn't supported by the build system. You'd think that you could just make a symbolic link:
ln -s "Directory With Spaces" DirectoryWithoutSpaces
And then be happy. But unfortunately for me, getcwd() resolves all the symbolic links. I tried to use getenv("PWD"), but it is not pointing at the same path as I get back from getcwd(). I blame make -C for not updating the environment variable, I think. Right now, getcwd() gives me back a path like this:
/Users/carl/Directory With Spaces/Some/Other/Directories
And getenv("PWD") gives me:
/Users/carl/DirectoryWithoutSpaces
So - is there any function like getcwd() that doesn't resolve the symbolic links?
Edit:
I changed
make -C Some/Other/Directories
to
cd Some/Other/Directories ; make
And then getenv("PWD") works. If there's no other solution, I can use that.
According to the Advanced Programming in the UNIX Environment bible by Stevens, p.112:
Since the kernel must maintain knowledge of the current working directory, we should be able to fetch its current value. Unfortunately, all the kernel maintains for each process is the i-node number and device identification for the current working directory. The kernel does not maintain the full pathname of the directory.
Sorry, looks like you do need to work around this in another way.
There is no way for getcwd() to determine the path you followed via symbolic links. The basic implementation of getcwd() stats the current directory '.', and then opens the parent directory '..' and scans the entries until it finds the directory name with the same inode number as '.' has. It then repeats the process upwards until it finds the root directory, at which point it has the full path. At no point does it ever traverse a symbolic link. So the goal of having getcwd() calculate the path followed via symlinks is impossible, whether it is implemented as a system call or as a library function.
The best resolution is to ensure that the build system handles path names containing spaces. That means quoting pathnames passed through the shell. C programs don't care about the spaces in the name; it is only when a program like the shell interprets the strings that you run into problems. (Compilers implemented as shell scripts that run pre-processors often have problems with pathnames that contain spaces - speaking from experience.)
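To make the parent-directory scan described above concrete, here is a sketch of a single step of it: finding a directory's name inside its parent by comparing device and inode numbers. The function name_in_parent and its parameters are made up for this example, and this is not how any particular libc implements getcwd().

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: scans the parent of `dir` for the entry that has the
 * same device and inode as `dir`, and copies that entry's name into `out`.
 * Returns 0 on success, -1 if no entry matches (for example at the root). */
int name_in_parent(const char *dir, char *out, size_t outsz)
{
    struct stat target, cand;
    char path[4096];
    DIR *parent;
    struct dirent *e;
    int rc = -1;

    if (stat(dir, &target) != 0)
        return -1;
    if (snprintf(path, sizeof path, "%s/..", dir) >= (int)sizeof path)
        return -1;
    parent = opendir(path);
    if (parent == NULL)
        return -1;

    while (rc != 0 && (e = readdir(parent)) != NULL) {
        int len;

        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        len = snprintf(path, sizeof path, "%s/../%s", dir, e->d_name);
        if (len < 0 || (size_t)len >= sizeof path)
            continue;
        /* lstat() so that symbolic links in the parent are not followed. */
        if (lstat(path, &cand) != 0)
            continue;
        if (cand.st_dev == target.st_dev && cand.st_ino == target.st_ino) {
            snprintf(out, outsz, "%s", e->d_name);
            rc = 0;
        }
    }

    closedir(parent);
    return rc;
}
```

Called with "." it yields the last component of the physical working directory. Because candidates are examined with lstat(), a symbolic link such as DirectoryWithoutSpaces has its own inode and never matches, which is why the name the user typed cannot be recovered this way.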

C - Reading multiple files

Just had a general question about how to approach a certain problem I'm facing. I'm fairly new to C, so bear with me here. Say I have a folder with 1000+ text files; the files are not named in any kind of numbered order, but they are alphabetical. For my problem I have files of stock data, and each file is named after the company's ticker. I want to write a program that will open each file, read the data, find the historical low, compare it to the current price, calculate the percent change, and then print it. Searching and calculating are not a problem; the problem is getting the program to go through and open each file. The only way I can see to attack this is to create a text file containing all of the ticker symbols, have the program read that into an array, and then run a loop that opens the first filename in the array, performs the calculations, prints the output, closes the file, and then loops back around, moving to the second element (the next ticker symbol) in the array. This would be fairly simple to set up (I think), but I'd really like to avoid typing out over a thousand file names into a text file. Is there a better way to approach this? Not really asking for code (unless there is some amazing function in C that will do this for me ;) ), just some advice from more experienced C programmers.
Thanks :)
Edit: This is on Linux, sorry I forgot to mention that!
Under Linux/Unix (BSD, OS X, POSIX, etc.) you can use opendir / readdir to go through the directory structure. No need to generate static files that need to be updated, when the file system has the information you want. If you only want a sub-set of stocks at a given time, then using glob would be quicker, there is also scandir.
I don't know what Win32 (Windows / Platform SDK) functions are called, if you are developing using Visual C++ as your C compiler. Searching MSDN Library should help you.
Assuming you're running on linux...
ls /path/to/text/files > names.txt
is exactly what you want.
Use opendir() on Linux.
http://linux.die.net/man/3/opendir
Example:
http://snippets.dzone.com/posts/show/5734
In pseudo code it would look like this, I cannot define the code as I'm not 100% sure if this is the correct approach...
for each directory entry
    scan the filename
    extract the ticker name from the filename
    open the file
    read the data
    create a record consisting of the filename, data.....
    close the file
    add the record to a list/array...
sort the list/array into alphabetical order based on the ticker name in the filename...
You could vary it slightly if you wish, scan the filenames in the directory entries and sort them first by building a record with the filenames first, then go back to the start of the list/array and open each one individually reading the data and putting it into the record then....
Hope this helps,
best regards,
Tom.
There are no functions in standard C that have any notion of a "directory". You will need to use some kind of platform-specific function to do this. For some examples, take a look at this post from Cprogramming.com.
Personally, I prefer using the opendir()/readdir() approach as shown in the second example. It works natively under Linux and also on Windows if you are using Cygwin.
Approach 1) I would just have a specific directory in which I have ONLY these files containing the ticker data and nothing else. I would then use the C readdir API to list all files in the directory and iterate over each one performing the data processing that you require. Which ticker the file applies to is determined only by the filename.
Pros: Easy to code
Cons: It really depends where the files are stored and where they come from.
Approach 2) Change the file format so the ticker files start with a magic code identifying that this is a ticker file, and a string containing the name. As before use readdir to iterate through all files in the folder and open each file, ensure that the magic number is set and read the ticker name from the file, and process the data as before
Pros: More flexible than before. Filename needn't reflect name of ticker
Cons: Harder to code, file format may be fixed.
but I'd really like to avoid typing out over a thousand file names into a text file. Is there a better way to approach this?
I have solved the exact same problem a while back, albeit for personal uses :)
What I did was to use the OS shell commands to generate a list of those files and redirected the output to a text file and had my program run through them.
On UNIX, there's the handy glob function:
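#include <glob.h>
#include <stdio.h>
#include <string.h>

glob_t results;
size_t i;
memset(&results, 0, sizeof(results));
/* glob() returns 0 on success; gl_pathv then holds gl_pathc sorted matches */
if (glob("*.txt", 0, NULL, &results) == 0) {
    for (i = 0; i < results.gl_pathc; i++)
        printf("%s\n", results.gl_pathv[i]);
}
globfree(&results);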
On Linux or a related system, you could use the fts library, which is designed for traversing file hierarchies (see man fts),
or even something as simple as readdir.
If on Windows, you can use the Directory Management APIs, more specifically the FindFirstFile function, used with wildcards, in conjunction with FindNextFile.
