stat() giving wrong directory size in C

I need to find the size of a file or directory, whichever is given on the command line, using stat(). It works fine for files (both relative and absolute paths), but when I give a directory, it always returns the size as 512 or 1024.
If I print the files in the directory, I get the following:
Name : .
Name : ..
Name : new
Name : new.c
but only new and new.c are actually in there. For this directory, the size is returned as 512 even if I place more files in it.
Here's my code fragment:
    if (stat(request.data, &st) >= 0) {
        request.msgType = (short)0xfe21;
        printf("\n Size : %lld\n", (long long)st.st_size);
        snprintf(reply.data, sizeof(reply.data), "%lld", (long long)st.st_size);
        reply.dataLen = strlen(reply.data);
    }
    else {
        perror("\n Stat()");
    }
}
Where did I go wrong?
Here are my request and reply structures:
struct message {
    unsigned short msgType;
    unsigned int offset;
    unsigned int serverDelay;
    unsigned int dataLen;
    char data[100];
};
struct message request, reply;
I compile it with gcc on a Unix OS.

stat() on a directory doesn't return the sum of the file sizes in it. The size field instead represents how much space is taken by the directory itself (the list of its entries), and it varies depending on a few factors, such as the file system. If you want to know how much space is taken by all files below a specific directory, then you have to recurse down the tree, adding up the space taken by all files. This is how tools like du work.
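A minimal sketch to see this for yourself (raw_stat_size is a hypothetical name, not a library function): it just returns st_size as stat() reports it. Even when a directory contains a 100000-byte file, the value reported for the directory is only the directory's own size, typically a single block such as 4096 on ext4.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: returns st_size exactly as stat() reports it, or -1
   on error. For a regular file that is its length in bytes; for a directory
   it is the size of the directory file itself (its table of entries), which
   depends on the file system rather than on the files stored inside. */
long long raw_stat_size(const char *path)
{
    struct stat st;
    if (stat(path, &st) < 0) {
        perror(path);
        return -1;
    }
    return (long long)st.st_size;
}
```

For example, raw_stat_size("some/file") gives the file's length in bytes, while raw_stat_size("some/dir") gives the directory file's size, never the total of what it contains.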

Yes. opendir() + loop on readdir()/stat() will give you the file/directory sizes which you can sum to get a total. If you have sub-directories you will also have to loop on those and the files within them.
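A minimal recursive sketch of that loop, under a few stated assumptions: tree_size is a hypothetical name, it uses lstat() rather than stat() so that symbolic links are not followed (avoiding loops and double counting), and it sums st_size of regular files only. The result is therefore the apparent size of the contents and will not match du's default block-based figures exactly.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: sums st_size of every regular file at or below
   `path`. Directories' own sizes are not counted, and symbolic links are
   examined with lstat() but never followed. Returns -1 on error. */
long long tree_size(const char *path)
{
    struct stat st;
    if (lstat(path, &st) < 0) {
        perror(path);
        return -1;
    }
    if (S_ISREG(st.st_mode))
        return (long long)st.st_size;
    if (!S_ISDIR(st.st_mode))
        return 0;               /* symlinks, devices, FIFOs: not counted */

    DIR *dir = opendir(path);
    if (dir == NULL) {
        perror(path);
        return -1;
    }
    long long total = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        /* Skip "." and "..", the extra entries seen in the question. */
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        char child[4096];
        int n = snprintf(child, sizeof child, "%s/%s", path, ent->d_name);
        if (n < 0 || (size_t)n >= sizeof child)
            continue;           /* path too long for this simple sketch */
        long long sub = tree_size(child);
        if (sub > 0)
            total += sub;       /* unreadable entries (-1) are skipped */
    }
    closedir(dir);
    return total;
}
```

For a directory holding a 10-byte file and a subdirectory with a 20-byte file, tree_size("dirname") returns 30.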
To use du, you could call it with the system() function. This only returns a result code to the calling program, so you could save du's output to a file and then read the file. The code would be something like:
system("du -sb dirname > du_res_file");
Then you can read the file du_res_file (assuming it has been created successfully) to get your answer. This would give the size of the directory + sub-directories + files in one go.
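A sketch of the system() route described above, assuming GNU du (the -b option is a GNU coreutils extension) and paths that contain no single quote, since they are only wrapped in '...' for the shell. du_bytes and both of its arguments are hypothetical; the result file is left in place for the caller to remove.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: runs `du -sb DIR > RESFILE` via system(), then
   parses the first number from RESFILE. Assumes GNU du and paths without
   single quotes. Returns the byte count, or -1 on failure. */
long long du_bytes(const char *dir, const char *resfile)
{
    char cmd[1024];
    if (strchr(dir, '\'') || strchr(resfile, '\''))
        return -1;                      /* refuse paths we cannot quote safely */
    int n = snprintf(cmd, sizeof cmd, "du -sb '%s' > '%s'", dir, resfile);
    if (n < 0 || (size_t)n >= sizeof cmd)
        return -1;
    if (system(cmd) != 0)               /* system() only reports a status */
        return -1;

    FILE *f = fopen(resfile, "r");
    if (f == NULL)
        return -1;
    long long bytes = -1;
    if (fscanf(f, "%lld", &bytes) != 1) /* du prints "<bytes>\t<dir>" */
        bytes = -1;
    fclose(f);
    return bytes;
}
```

Note that du -sb also counts the directories' own sizes, so the figure is slightly larger than the plain sum of the file sizes.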

I'm sorry, I missed it the first time; stat only gives the size of files, not directories:
These functions return information about a file. No permissions are required on the file itself, but (in the case of stat() and lstat()) execute (search) permission is required on all of the directories in path that lead to the file.
The st_size field gives the size of the file (if it is a regular file or a symbolic link) in bytes. The size of a symbolic link is the length of the pathname it contains, without a terminating null byte.
See the man page for fstat/stat.
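To illustrate that man page wording, here is a small sketch (meaningful_size is a hypothetical helper) that uses lstat() so a symbolic link is examined itself, and returns st_size only where the man page gives it a meaning: the byte count of a regular file, or the length of the path a symlink contains.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns st_size where the man page gives it a
   meaning, or -1 otherwise. lstat() is used so that a symbolic link is
   examined itself instead of the file it points to. */
long long meaningful_size(const char *path)
{
    struct stat st;
    if (lstat(path, &st) < 0) {
        perror(path);
        return -1;
    }
    if (S_ISREG(st.st_mode))            /* regular file: length in bytes */
        return (long long)st.st_size;
    if (S_ISLNK(st.st_mode))            /* symlink: length of stored path, no NUL */
        return (long long)st.st_size;
    fprintf(stderr, "%s: st_size is not a content size here\n", path);
    return -1;                          /* directories, devices, FIFOs, ... */
}
```

For a link created with symlink("/some/target/path", "lnk"), meaningful_size("lnk") returns 17, the length of "/some/target/path", whether or not that target exists.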

Related

Why does the "GetShortPathName" function sometimes not give me the short path?

I am trying to use the GetShortPathName() function to give me the short version of two paths, but it succeeds in only one path and fails in the other path.
// Get the game directory path
wchar_t GameDirPath[MAX_PATH] = L"\0";
GetCurrentDirectory(MAX_PATH, GameDirPath);
// Get the engine directory path
wchar_t EngineDirPath[MAX_PATH] = L"\0";
wcscat(EngineDirPath, GameDirPath);
wcscat(EngineDirPath, L"\\Assets\\Engine\\");
// Get the short path of the engine directory
wchar_t EngineShortPath[MAX_PATH] = L"\0";
GetShortPathName(EngineDirPath, EngineShortPath, MAX_PATH);
The following gives me the correct short path:
D:\Games\NEEDFO~1\Assets\Engine\
But this one doesn't:
D:\Games\FIFA 97\Assets\Engine\
Note that both directories are inside the same folder, "Games".
In short:
I want to pass the path to "DOSBox.exe" as a parameter, but it doesn't accept Windows paths like "D:\Games\FIFA 97\Assets\Engine", so the path must be converted to a DOS path like "D:\Games\FIFA97~1\Assets\Engine". I'm trying to use the GetShortPathName() function to do that.
Why does this problem happen, and how can I solve it?
As the documentation explicitly states:
If the specified path is already in its short form and conversion is not needed, the function simply copies the specified path to the buffer specified by the lpszShortPath parameter.
The API behaves as documented. There's nothing that needs to be fixed.
Back in the 90s, when DOS was widely used, file and directory names were limited to the 8.3 format: at most 8 characters for the file name and 3 for the file extension.
So hello.txt was allowed, and helloguys.txt (9 characters long) was "illegal".
With Windows this limitation was removed, and short names were introduced so that paths could be converted to a DOS-compliant format.
Now that we know what a short path is, we can analyze your case. In path
D:\Games\Fifa 97\Assets\Engine\
every token is DOS compliant. So what is the short version of this path? Well, the path itself. And that's why GetShortPathName( ) returns an unchanged path.
You can find a detailed description on the documentation page.

How to check if a symbolic link refers to a directory

I am currently reimplementing the "ls" command to learn. However, when I browse files, I sometimes get an error when I try to open the "folder" at the path a symbolic link points to, because it's not a directory (I thought all symbolic links pointed to folders).
How can I check whether it points to a directory? (I've looked at the manuals for stat, dir, ...)
I thought all symbolic links pointed to folders
Nope. A symbolic link is an indirect reference to another path. That other path can refer to any kind of file that can be represented in any mounted file system, or to no file at all (i.e. it can be a broken link).
How to check that it points to a directory?
You mention the stat() function, but for reimplementing ls you should mostly be using lstat(), instead. The difference is that when the specified path refers to a symbolic link, stat returns information about the link's target path, whereas lstat returns information about the link itself (including information about the file type, from which you can tell that it is a link).
In the event that you encounter a symbolic link, you can simply check the same path again with stat() to find out what kind of file it points to. stat() recursively resolves symbolic links to discover the information for the ultimate target; if the link is broken, stat() fails instead (typically with ENOENT, or ELOOP for a link loop). Either way, you don't need to distinguish between a broken link and any other form of non-directory for your particular purpose.
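The lstat()-then-stat() approach from this answer can be sketched as follows (is_dir_or_link_to_dir and its was_link out-parameter are hypothetical names for illustration):

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper for an ls-like program: true if `path` is a directory
   or a symbolic link that ultimately resolves to one. *was_link reports
   whether `path` itself is a symbolic link. */
bool is_dir_or_link_to_dir(const char *path, bool *was_link)
{
    struct stat st;
    *was_link = false;
    if (lstat(path, &st) < 0)           /* inspect the entry itself */
        return false;
    if (!S_ISLNK(st.st_mode))
        return S_ISDIR(st.st_mode);
    *was_link = true;
    /* stat() follows the whole chain of links. It fails (e.g. ENOENT or
       ELOOP) if the link is broken or loops; treat that as "not a dir". */
    if (stat(path, &st) < 0)
        return false;
    return S_ISDIR(st.st_mode);
}
```

A link to /tmp yields true with was_link set; a dangling link yields false with was_link set, so an ls clone can still list the link without trying to open it as a folder.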
I just ran into the same problem, and here is my solution:
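#include <string>
#include <sys/stat.h>

bool IsDir(const char *path)
{
    std::string tmp = path;
    tmp += '/';   // the trailing slash makes lstat() resolve a final symlink
    struct stat statbuf;
    return (lstat(tmp.c_str(), &statbuf) >= 0) && S_ISDIR(statbuf.st_mode);
}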
The key is the trailing / in the path.
However, I have no idea whether it's portable.

stat alternative for long file paths

I'm writing a program that iterates through a directory tree depth first (similar to the GNU find program) by recursively constructing paths to each file in the tree and stores the relative paths of encountered files. It also collects some statistics about these files. For this purpose I'm using the stat function.
I've noticed that this fails for very deep directory hierarchies (i.e. long file paths), in accordance with stat's documentation.
Now my question is: what alternative approach could I use here that is guaranteed to work for paths of any length? (I don't need working code, just a rough outline would be sufficient).
As you are traversing, open each directory you traverse.
You can then get information about a file in that directory using fstatat. The fstatat function takes an additional parameter, dirfd. If you pass a handle to an open directory in that parameter, the path is interpreted as relative to that directory.
int fstatat(int dirfd, const char *pathname, struct stat *buf, int flags);
The basic usage is:
int dirfd = open("directory path", O_RDONLY);
struct stat st;
int r = fstatat(dirfd, "relative file path", &st, 0);
You can, of course, also use openat instead of open, as you recurse. And the special value AT_FDCWD can be passed as dirfd to refer to the current working directory.
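A sketch of the traversal outlined above, assuming a POSIX.1-2008 system (fdopendir, openat, fstatat). count_files_at is a hypothetical name; it counts regular files, takes ownership of the descriptor it is given, and uses AT_SYMLINK_NOFOLLOW and O_NOFOLLOW so symbolic links are never followed, which also sidesteps the loop problem in the caveat below. No long path string is ever built, but one descriptor stays open per directory level, so an extremely deep tree can still hit the per-process descriptor limit.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical sketch: counts regular files below the directory open as
   `fd`. Every lookup is relative to an open directory descriptor. Takes
   ownership of `fd` (it is closed before returning). Returns -1 if `fd`
   cannot be read as a directory. */
long count_files_at(int fd)
{
    DIR *dir = fdopendir(fd);           /* on success, dir owns fd */
    if (dir == NULL) {
        if (fd >= 0)
            close(fd);
        return -1;
    }
    long count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        const char *name = ent->d_name;
        if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
            continue;
        struct stat st;
        /* Do not follow symlinks: a link such as /usr/bin/X11 -> /usr/bin
           would otherwise send the recursion round in a loop. */
        if (fstatat(dirfd(dir), name, &st, AT_SYMLINK_NOFOLLOW) < 0)
            continue;
        if (S_ISREG(st.st_mode)) {
            count++;
        } else if (S_ISDIR(st.st_mode)) {
            /* O_NOFOLLOW guards against the entry being swapped for a
               link between fstatat() and openat(). */
            long n = count_files_at(openat(dirfd(dir), name,
                                           O_RDONLY | O_DIRECTORY | O_NOFOLLOW));
            if (n > 0)
                count += n;
        }
    }
    closedir(dir);                      /* also closes fd */
    return count;
}
```

Start it with count_files_at(open("some/dir", O_RDONLY | O_DIRECTORY)); if open() fails, fdopendir() rejects the invalid descriptor and the function returns -1.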
Caveats
It is easy to get into symlink loops and recurse forever. It is not uncommon to find symlink loops in practice. On my system, /usr/bin/X11 is a symlink to /usr/bin.
Alternatives
There are easier ways to traverse file hierarchies. Use ftw or fts instead, if you can.
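For comparison, a sketch with fts (a BSD interface that glibc also provides; it is not part of POSIX and is not available on every libc). fts_tree_size is a hypothetical name; it sums the sizes of regular files, and FTS_PHYSICAL makes fts report symbolic links as links instead of following them.

```c
#include <fts.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical sketch: sums st_size of the regular files under `root` using
   fts. FTS_PHYSICAL reports symbolic links as links (FTS_SL) rather than
   following them. Returns -1 if the walk cannot be started. */
long long fts_tree_size(const char *root)
{
    /* fts_open() takes a NULL-terminated array of starting paths. */
    char *paths[] = { (char *)root, NULL };
    FTS *fts = fts_open(paths, FTS_PHYSICAL, NULL);
    if (fts == NULL) {
        perror("fts_open");
        return -1;
    }
    long long total = 0;
    FTSENT *ent;
    while ((ent = fts_read(fts)) != NULL) {
        switch (ent->fts_info) {
        case FTS_F:                     /* regular file */
            total += (long long)ent->fts_statp->st_size;
            break;
        case FTS_DNR:                   /* unreadable directory */
        case FTS_ERR:                   /* other error */
        case FTS_NS:                    /* no stat information */
            fprintf(stderr, "%s: %s\n", ent->fts_path,
                    strerror(ent->fts_errno));
            break;
        default:                        /* directories, links, etc. */
            break;
        }
    }
    fts_close(fts);
    return total;
}
```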

#include "~/file_name" doesn't compile

I have 2 programs:
#include "file1"
int main(void)
{
return (1);
}
where file1 is just an empty file located in the same directory as the program.
Then I have:
#include "~/file2"
int main(void)
{
return (1);
}
where file2 is an empty file but this time is located in my home directory.
The first program compiles, the second program complains and says file not found
Can someone explain what is going on here?
A directive like:
#include "file.h"
searches for a file whose name is file.h. The string ~/file2 in your example is not actually the name of a file. The ~ is expanded by the shell to the path of your home directory; the actual file name is something like /home/username/file2.
The mapping of strings to files can be complex, and it can vary from system to system, but in general there's a fair amount of syntax that's recognized by the shell and converted to a file name, but that doesn't form a file name itself. Variable names like $HOME are similar; you couldn't use $HOME in a #include directive.
Actually
#include "~/file2"
could be valid -- if you have a directory whose name is literally ~ containing a file whose name is file2. That would be legal but confusing.
You could use
#include "/home/username/file2"
but that ties the source to your particular home directory and would make it difficult for anyone else to use it.
Usually a file to be included should have a name ending in .h, and its location should be either relative to the directory containing the source file, or in one of several locations searched by the compiler.

Using File Descriptors with readlink()

I have a situation where I need to get a file name so that I can call the readlink() function. All I have is an integer that was originally stored as a file descriptor via an open() command. Problem is, I don't have access to the function where the open() command executed (if I did, then I wouldn't be posting this). The return value from open() was stored in a struct that I do have access to.
char buf[PATH_MAX];
char tempFD[2]; //file descriptor number of the temporary file created
tempFD[0] = fi->fh + '0';
tempFD[1] = '\0';
char parentFD[2]; //file descriptor number of the original file
parentFD[0] = (fi->fh - 1) + '0';
parentFD[1] = '\0';
if (readlink(tempFD, buf, sizeof(buf)) < 0) {
log_msg("\treadlink() error\n");
perror("readlink() error");
} else
log_msg("readlink() returned '%s' for '%s'\n", buf, tempFD);
This is part of the FUSE file system. The struct is called fi, and the file descriptor is stored in fh, which is of type uint64_t. Because of the way this program executes, I know that the two linked files have file descriptor numbers that are always 1 apart. At least that's my working assumption, which I am trying to verify with this code.
This compiles, but when I run it, my log file shows a readlink error every time. My file descriptors have the correct integer values stored in them, but it's not working.
Does anyone know how I can get the file name from these integer values? Thanks!
If it's acceptable that your code becomes non portable and is tied to being run on a somewhat modern version of Linux, then you can use /proc/<pid>/fd/<fd>. However, I would recommend against adding '0' to the fd as a means to get the string representing the number, because it uses the assumption that fd < 10.
However, it would be best if you were able to just pick up the filename instead of relying on /proc. At the very least, you can replace calls to the library's function with a wrapper function using a linker flag. For example, with gcc program.c -Wl,-wrap,theFunctionToBeOverriden -o program, all calls to the library function will be linked against __wrap_theFunctionToBeOverriden, and the original function is accessible under the name __real_theFunctionToBeOverriden. See this answer https://stackoverflow.com/a/617606/111160 for details.
But, back to the answer not involving linker rerouting: you can do something like
char fd_path[100];
snprintf(fd_path, sizeof(fd_path), "/proc/%d/fd/%d", (int)getpid(), (int)fi->fh);
You should now use this /proc/... path (it is a softlink) rather than using the path it links to.
You can call readlink to find the actual path in the filesystem. However, doing so introduces a security vulnerability and I suggest against using the path readlink returns.
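If you do call readlink() on the /proc path, note that readlink() does not NUL-terminate its output. A minimal Linux-only sketch (fd_to_path is a hypothetical helper; it uses /proc/self, which names the calling process, in place of /proc/<pid>):

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical Linux-only helper: stores the path that /proc/self/fd/<fd>
   points to in buf (bufsize must be at least 2). readlink() does not add a
   terminating NUL, so one is added using the length it returns. A result
   that fills the buffer may have been truncated and is rejected.
   Returns 0 on success, -1 on error. */
int fd_to_path(int fd, char *buf, size_t bufsize)
{
    char proc_path[64];
    snprintf(proc_path, sizeof proc_path, "/proc/self/fd/%d", fd);
    ssize_t len = readlink(proc_path, buf, bufsize - 1);
    if (len < 0 || (size_t)len >= bufsize - 1)
        return -1;
    buf[len] = '\0';
    return 0;
}
```

Given the security concern in this answer, the returned text is best treated as informational (for example, for the log messages in the question) rather than reopened.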
When the file the descriptor points at is deleted (unlinked), you can still access it through the /proc/... path. However, when you call readlink on it, you get the original pathname with ' (deleted)' appended.
If your file was /tmp/a.txt and it gets deleted, readlink on the /proc/... path returns /tmp/a.txt (deleted). If a file with that literal name exists, you will be able to open it, even though you wanted the other file (the deleted /tmp/a.txt). An attacker may be able to place hostile contents in a file named /tmp/a.txt (deleted).
On the other hand, if you just access the file through the /proc/... path, you will access the correct (unlinked but still alive) file, even if the path claims to be a link to something else.
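A Linux-only sketch of accessing the file through the /proc path (reopen_fd is a hypothetical name): even after the file has been unlinked, opening /proc/self/fd/<fd> yields a new descriptor for the same, still-alive file, with its own file offset.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical Linux-only helper: opens a new descriptor for whatever file
   `fd` refers to, by opening /proc/self/fd/<fd> instead of the path text
   readlink() would return. This still reaches the right file after it has
   been unlinked. Returns the new descriptor, or -1 on error. */
int reopen_fd(int fd, int flags)
{
    char proc_path[64];
    snprintf(proc_path, sizeof proc_path, "/proc/self/fd/%d", fd);
    return open(proc_path, flags);
}
```

For example, after writing "hello" to a file and unlinking it, read() on reopen_fd(fd, O_RDONLY) still returns "hello".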
