Cross-platform way of storing settings/cache - C

I'm currently writing a program in C that needs to store configuration files and various cache files on disk, as well as some static data files that never change and can safely be shared by all users.
I want to store these files in the appropriate locations for each OS, in a portable way if possible…
For example, on GNU/Linux:
$XDG_CONFIG_HOME/<program_name>/ -- for configuration files
$XDG_CACHE_HOME/<program_name>/ -- for cache files
$XDG_DATA_DIRS/<program_name>/ -- for static files
And, if these are unset, I can use the recommended default values ("~/.config", "~/.cache" and "/usr/share"). GNU/Linux is the easy one.
However, on Windows, I have no idea how to find the equivalent of these. I think it's going to be somewhere in AppData, but where? And, most importantly, how do I get this programmatically? And what about other OSes (especially Mac OS, since most other UNIXes could use the "GNU/Linux" method as well)?
I'm open to all solutions, whether they require a library or not. Thanks in advance for your help!

Windows has so-called known folder IDs, formerly (before Vista) known as special item IDs, or CSIDLs. For instance, the folder referenced by FOLDERID_LocalAppData (that is, %USERPROFILE%\AppData\Local) might be what you're looking for. I'm not on Windows and thus can't check firsthand which C API provides these values. However, chances are that these folders are also available as environment variables (e.g. %LOCALAPPDATA%).
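In C, the documented entry point for known folder IDs is SHGetKnownFolderPath from <shlobj.h>. Here is a minimal sketch of resolving the per-user config directory on both platforms; the program name "myprog" is a placeholder and error handling is kept minimal:

/* Sketch: resolve the per-user configuration directory.
   On Windows, link with shell32, ole32 and uuid.
   "myprog" is a placeholder program name. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
#include <shlobj.h>   /* SHGetKnownFolderPath, FOLDERID_LocalAppData */

/* Returns a malloc'd path, or NULL on failure. */
static char *config_dir (void)
{
    PWSTR wpath = NULL;
    char buf[MAX_PATH * 4];
    if (SHGetKnownFolderPath (&FOLDERID_LocalAppData, 0, NULL, &wpath) != S_OK)
        return NULL;
    /* Narrow conversion for brevity; real code should stay in UTF-16. */
    snprintf (buf, sizeof buf, "%ls\\myprog", wpath);
    CoTaskMemFree (wpath);
    return strdup (buf);
}
#else
static char *config_dir (void)
{
    const char *xdg = getenv ("XDG_CONFIG_HOME");
    const char *home = getenv ("HOME");
    char buf[4096];
    if (xdg && *xdg)
        snprintf (buf, sizeof buf, "%s/myprog", xdg);
    else if (home)
        snprintf (buf, sizeof buf, "%s/.config/myprog", home);
    else
        return NULL;
    return strdup (buf);
}
#endif

The same pattern applies to the cache directory with $XDG_CACHE_HOME; on Mac OS the conventional per-user locations are ~/Library/Application Support and ~/Library/Caches.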

Related

Case Sensitive Directory Path in Windows

I have reviewed the questions/answers asking whether or not directory/file names are case-sensitive in a Windows environment, as well as those discussing a need for case-sensitive searching [usually in Python, not C], so I think I understand the essential facts, but none of the postings cover my particular application architecture or the problem I am having.
So, let me briefly explain the application architecture of which I am speaking. The heart of the application is built using Adobe AIR. Yes, that means that much of the U/I involves the Flex framework, but the file handling problem I am needing help with has no dependency upon the Flex U/I part of the application.
As I am trying to process a very large set of recursive directory structures, I am using the low-level C runtime API via a well-behaved mechanism which AIR provides for cases where access to the host's native environment is needed.
The suite of functions which I am using is FindFirstFile, FindNextFile and FindClose. If I write a stand-alone test program, it nicely lists the directories, sub-directories and files. The case of the directories and files is correctly shown -- just as it is in Windows Explorer, or using the dir command.
If, however, I launch precisely the same function via the Adobe ANE interface, I receive exactly the same output with the exception that all directory names will be reduced to lower case.
Now, I should clarify that when this code is being executed as a Native Extension, it is not passing data back to AIR, it is directly outputting the results in a file that is opened and closed entirely in the CRT world, so we are not talking about any sort of communication confusion via the passing of either text or byte arrays between two different worlds.
Without kludging up this forum with lots and lots of extraneous code, I think what will help anyone who is able to help me is these snippets:
// This is where the output gets written.
FILE* textFile = _wfopen (L"Peek.txt", L"wt,ccs=UTF-8");
WIN32_FIND_DATAW fdf;
HANDLE hFind = NULL;
wchar_t fullPath[2048];
// I am just showing the third argument as a literal to exemplify
// what, in reality, is passed into the recursively-called function
// as a variable.
wsprintf (fullPath, L"\\\\?\\%ls\\*.*", L"F:\\");
hFind = FindFirstFileW (fullPath, &fdf);
// After checking for success there appears a do..while loop
// inside which there is the expected check for the "." and ".."
// pseudo directories and a test of fdf.dwFileAttributes for
// file versus sub-directory.
// When the NextFile is a file a function is called to format
// the output in the textFile, like this:
fwprintf (textFile, L"%ls\t%ls\t%2.2x\t%4d/%02d/%02d/%02d/%02d/%02d \t%9ld.\n",
parentPath, fdf.cFileName,
(fdf.dwFileAttributes & 0x0f),
st.wYear, st.wMonth, st.wDay,
st.wHour, st.wMinute, st.wSecond,
fSize);
At that point parentPath will be a concatenated wide character string and
the other file attributes will be of the types shown.
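For context, here is a minimal self-contained version of the traversal those comments describe; the names are illustrative, and the recursion, date conversion and size computation are elided:

#include <windows.h>
#include <stdio.h>

void list_dir (const wchar_t *root, FILE *textFile)
{
    wchar_t fullPath[2048];
    WIN32_FIND_DATAW fdf;

    /* root is passed without a trailing backslash, e.g. L"F:" */
    swprintf (fullPath, 2048, L"\\\\?\\%ls\\*.*", root);
    HANDLE hFind = FindFirstFileW (fullPath, &fdf);
    if (hFind == INVALID_HANDLE_VALUE)
        return;

    do {
        /* Skip the "." and ".." pseudo directories. */
        if (wcscmp (fdf.cFileName, L".") == 0 ||
            wcscmp (fdf.cFileName, L"..") == 0)
            continue;

        if (fdf.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) {
            /* Recurse into the sub-directory here. */
        } else {
            fwprintf (textFile, L"%ls\\%ls\n", root, fdf.cFileName);
        }
    } while (FindNextFileW (hFind, &fdf));

    FindClose (hFind);
}

int main (void)
{
    FILE *textFile = _wfopen (L"Peek.txt", L"wt,ccs=UTF-8");
    if (textFile) {
        list_dir (L"F:", textFile);
        fclose (textFile);
    }
    return 0;
}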
So, to summarize: all of this code works perfectly if I just write a stand-alone test. When, however, the code is running as a task called from an Adobe ANE, the names of all the sub-directory parts are reduced to lower case. I have tested every combination of file type -- binary and text -- and encoding -- UTF-8 and UTF-16LE -- but no matter what configuration I choose, the result remains the same: standalone, the API delivers case-correct strings; running as a task in a DLL invoked from AIR, the same API delivers only lower-case strings.
First, my thanks to Messrs Ogilvie and Passant for helpful suggestions.
Second, I apologize for not really knowing the protocol here as a very infrequent visitor. If I am supposed to flag either response as helpful and therefore correct, let these words at least reflect that fact.
I am providing an answer which was discovered by taking the advice above.
A. I discovered several tools that helped me get a handle on the contents of the .exe and .dll files. I should add some detail that was not part of the original posting: I have purposely been using the mingw-w64 toolchain rather than Visual Studio for this development work. So, as it turns out, both ldd and dumpbin helped me work out whether or not the two slightly different build environments were leaving me with different dependencies.
B. When I saw that one output included a reference to FindFirstFileExW, a function I had once tried in order to solve what I thought was the problem, I thought I had perhaps found a reason for the different results. In the event, that was just a red herring, and I do not mean to waste the forum's time with my low level of experience and understanding, but it seems useful to note this sort of troubleshooting methodology as a possible assist to others.
C. So what was the problem? There was, indeed, a small difference in the code between the stand-alone and the ANE-integrated implementations of the recursive directory search. In the production ANE use case, there is logic to apply a level of filtering to the returned results. The actual application allows the user to qualify a search for duplicate files by interrogating parts of the parent string in addition to the filename string itself.
In one corner condition, the filter may be case-sensitive or case-insensitive, and I was using _wcslwr in the mistaken belief that it behaved in the nice, Unicode-compliant way that string-handling methods do in AIR/ActionScript 3. I did not notice that the function actually does an in-place replacement of the original string with one reduced to lower case.
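For anyone hitting the same thing, here is a minimal illustration of the pitfall and one way to avoid it (names and the buffer size are illustrative); _wcsicmp is also an option when all you need is a case-insensitive comparison:

#include <string.h>
#include <wchar.h>
#include <stdio.h>

int main (void)
{
    wchar_t name[] = L"SubDir\\File.TXT";
    wchar_t lowered[260];

    /* Wrong: _wcslwr (name) would lower-case `name` itself, so any
       later output of `name` loses the original case. */

    /* Right: lower-case a copy, leaving the original untouched. */
    wcsncpy (lowered, name, 259);
    lowered[259] = L'\0';
    _wcslwr (lowered);

    wprintf (L"original: %ls\nlowered:  %ls\n", name, lowered);
    return 0;
}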
User error, not any untoward linking of non-standard CRT kernel functions by Adobe's Native Extension interoperability, was the culprit.

Finding file type in Linux programmatically

I am trying to find the file type of a file (.pdf, .doc, .docx, etc.), but programmatically, not using a shell command. Actually, I have to make an application which blocks access to files of a particular extension. I have already hooked sys_call_table in an LKM, and now I want my LKM to check the file type when an open/read system call is triggered.
I know that we have a current pointer which gives access to the current process structure, and that we can use it to find the file name stored in the dentry structure; also, in Linux a file type is identified by a magic number stored in the starting bytes of the file. But I don't know how to find the file type, or exactly where it is stored.
Linux doesn't "store" the file type for its files (unlike Mac OS with its resource forks, which I think is the best-known platform to do this). Files are just named streams of bytes; they have no structure implied by the operating system.
Either you just tell programs which file to use (and then it Does What You Say), or programs use higher-level features to figure it out.
There are programs that re-invent this particular wheel (I'm responsible for one of those), but you can also use e.g. file(1). Of course that requires your program to parse and "understand" the textual output you'll get, which in a sense only moves the problem.
However, I don't think calling into file from kernel space is very wise, so it's probably best to re-create the test for whatever set of types you need, to keep it small.
In other words, I mean you should simply re-implement the required tests. This is quite complicated in general, so if you really need to do it for as large a set of types as possible, it might not be a very good idea. :/
Actually i have to make an application which blocks access to files of a particular extension.
That's a flawed requirement. If you check by file extension, you'll miss files that don't use the extension, which is quite common on Linux since it does not rely on file extensions.
The officially sanctioned way of detecting a file's type on Linux is by its magic number. The shell command file is basically just a wrapper for libmagic, so you have the option of linking to that library.
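From userspace, a minimal sketch might look like the following (compile with -lmagic); note that libmagic is a userspace library, so it cannot be called from inside an LKM:

#include <stdio.h>
#include <magic.h>

int main (int argc, char **argv)
{
    if (argc < 2) {
        fprintf (stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    /* MAGIC_MIME_TYPE yields e.g. "application/pdf" rather than prose. */
    magic_t cookie = magic_open (MAGIC_MIME_TYPE);
    if (cookie == NULL) {
        fprintf (stderr, "magic_open failed\n");
        return 1;
    }
    if (magic_load (cookie, NULL) != 0) {   /* NULL = default magic database */
        fprintf (stderr, "magic_load failed\n");
        magic_close (cookie);
        return 1;
    }

    const char *type = magic_file (cookie, argv[1]);
    printf ("%s: %s\n", argv[1], type ? type : "unknown");
    magic_close (cookie);
    return 0;
}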

Storing folder's paths

Where can I store folder's paths, which can be accessed from every function/variable in a C program?
Ex. I have an executable called do_input.exe in the path c:\tests\myprog\bin\do_input.exe,
and another one in C:\tools\degreesToDms.exe, etc. How and where should I store these paths?
I stored them as strings in a header file which I included in every project file, but someone discouraged me from doing this. Are they right?
I stored them as strings in a header file which I included in every project file, but someone discouraged me from doing this. Are they right?
Yes, they are absolutely right: "baking" installation-specific path strings into compiled code is not a good decision, because you must recompile simply to change the locations of some key files. This limits the flexibility of other members of your team to run your tests, and may prevent your tests from being run automatically in an automated testing environment.
A better solution would use a plain-text configuration file with the locations of the key directories, and functions that read that file and produce the correct locations at run-time.
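A minimal sketch of that approach, assuming a hypothetical paths.cfg containing key=value lines such as do_input=c:\tests\myprog\bin\do_input.exe:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd copy of the value for `key`, or NULL if absent. */
char *lookup_path (const char *cfg_file, const char *key)
{
    FILE *fp = fopen (cfg_file, "r");
    if (!fp)
        return NULL;

    char line[1024];
    char *result = NULL;
    size_t klen = strlen (key);
    while (fgets (line, sizeof line, fp)) {
        if (strncmp (line, key, klen) == 0 && line[klen] == '=') {
            line[strcspn (line, "\r\n")] = '\0';  /* strip the newline */
            result = strdup (line + klen + 1);
            break;
        }
    }
    fclose (fp);
    return result;
}

A caller would then write char *p = lookup_path ("paths.cfg", "do_input"); and free the result when done.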
Alternatively, you could provide locations of key directories as command-line parameters to your program. This way, users who run your program would be able to set correct locations without recompiling.
If they stay the same, then I don't see any problem defining these paths in a ".h" header file included in all the various .c files that reference them. But every computer this thing runs on may have different paths ("Tests" instead of "test"), so this is super risky programming and probably only safe if you're running it on a single machine or a set of machines that you control directly.
If the paths will change, then you need to create a storage place for them (e.g. a static character array) and methods that allow them to be fetched and possibly reset dynamically (e.g. instead of writing output files to "results", maybe the user wants to write them to "/tmp"). It totally depends on what your code and the tools you're writing will be doing.
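A minimal sketch of that storage-plus-accessors idea (the names and buffer size are hypothetical):

#include <string.h>

static char results_dir[1024] = "results";   /* default location */

const char *get_results_dir (void)
{
    return results_dir;
}

void set_results_dir (const char *dir)
{
    strncpy (results_dir, dir, sizeof results_dir - 1);
    results_dir[sizeof results_dir - 1] = '\0';
}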

How to ensure unused symbols are not linked into the final executable?

First of all my apologies to those of you who would have followed my questions posted in the last few days. This might sound a little repetitive as I had been asking questions related to -ffunction-sections & -fdata-sections and this one is on the same line. Those questions and their answers didn't solve my problem, so I realized it is best for me to state the full problem here and let SO experts ponder about it. Sorry for not doing so earlier.
So, here goes my problem:
I build a set of static libraries which provide a lot of functionalities. These static libraries will be provided to many products. Not all products will use all of the functionalities provided by my libs. The problem is that the library sizes are quite big and the products want it to be reduced. The main goal is to reduce the final executable size and not the library size itself.
Now, I did some research and found out that if there are 4 functions in a source file and only one of them is used by the application, the linker will still include the other 3 functions in the final executable, as they all belong to the same object file. I analyzed further and found that -ffunction-sections, -fdata-sections and --gc-sections (the last one a linker option) will ensure that only that one function gets linked.
But, these options for some reasons beyond my control cannot be used now.
Is there any other way in which I can ensure that the linker will link only the function which is strictly required and exclude all other functions even if they are in the same object file?
Are there any other ways of dealing with the problem?
Note: Reorganizing my code is almost ruled out as it is legacy code, and big.
I am dealing mainly with VxWorks & GCC here.
Thanks for any help!
Ultimately, the only way to ensure that only the functions you want are linked is to ensure that each source (object) file in the library only exports one function symbol - one (visible) function per file. Typically, there are some files which export several functions which are always all used together - the initialization and finalization functions for a package, for example. Also, there are often functions used by the exported function that do not need to be visible outside the source (object) file - make sure they are static.
If you look at Plauger's "The Standard C Library", you'll find that every function is implemented in a separate file, even if the file ends up only a few lines long (a header include, the function signature, an open brace, one line of code, and a close brace).
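As a toy illustration of that layout (file and function names are hypothetical), each translation unit exports exactly one visible symbol and keeps its helpers static:

/* File: str_reverse.c -- the only exported symbol is str_reverse(). */
#include <string.h>

/* Helper visible only inside this object file. */
static void swap_chars (char *a, char *b)
{
    char t = *a;
    *a = *b;
    *b = t;
}

/* The single exported function of this translation unit. */
void str_reverse (char *s)
{
    size_t i, n = strlen (s);
    for (i = 0; i < n / 2; i++)
        swap_chars (&s[i], &s[n - 1 - i]);
}

An executable that never references str_reverse simply never pulls this object file out of the archive.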
Jay asked:
In the case of a big project, doesn't it become difficult to manage with so many files? Also, I don't find many open source projects following this model. OpenSSL is one example.
I didn't say it was widely used - it isn't. But it is the way to make sure that binaries are minimized. The compiler (linker) won't do the minimization for you - at least, I'm not aware of any that do. On a large project, you design the source files so that closely related functions that will normally all be used together are grouped in single source files. Functions that are only occasionally used should be placed in separate files. Ideally, the rarely used functions should each be in their own file; failing that, group small numbers of them into small (but non-minimal) files. That way, if one of the rarely used functions is used, you only get a limited amount of extra unused code linked.
As to number of files - yes, the technique espoused does mean a lot of files. You have to weigh the workload of managing (naming) lots of files against the benefit of minimal code size. Automatic build systems remove most of the pain; VCS systems handle lots of files.
Another alternative is to put the library code into a shared object - or dynamic link library (DLL). The programs then link with the shared object, which is loaded into memory just once and shared between programs using it. The (non-constant) data is replicated for each process. This reduces the size of the programs on disk, at the cost of fixups during the load process. However, you then don't need to worry about executable size; the executables do not include the shared objects. And you can update the library (if you're careful) without recompiling the main programs that use it. The reduced size of the executables is one reason shared libraries are popular.

Moving libraries and headers

I have some C code which provides libfoo.so and libfoo.a along with the header file foo.h. A large number of clients currently use these libraries from the /old_location/lib and /old_location/include directories, which is where they are distributed.
Now I want to move this code to /new_location, yet I am not in a position to inform the clients about this change. I would want the old clients to continue accessing the libs and headers from /old_location.
For this, will creating symlinks from the old locations to the new ones work?
/old_location/lib/libfoo.so -> /new_location/lib/libnewfoo.so
/old_location/lib/libfoo.a -> /new_location/lib/libnewfoo.a
/old_location/include/foo.h -> /new_location/include/foo.h
[Note that I need to name the new lib as libnewfoo and not libfoo due to some constraints. Can this renaming cause any problem? Yet the C code that generates these has not changed.]
It seems to work for the few simple cases I tried, but could there be cases where clients are using the libs and headers in a way which would break as a result of this change? Please let me know what kind of intricacies can be involved. Sorry if this seems a novice question; I've hardly worked with C before and am a Java person.
You have to differentiate between compile time and run time.
For compile time, clients need to update their Makefile and / or configure logic.
For run time, you simply tell ld.so via ld.so.conf where to find the .so library (or tell your clients to adjust LD_LIBRARY_PATH, a second-best choice). The static library does not matter, as its code is already built into the executable.
And yes, by providing symbolic links you can make the move 'disappear' as well and provide all files via the old location.
And all this is pretty testable from your end before roll-out.
I don't see any reason why this would break; this is more a question about symlinks than C. To an unsuspecting user program (one which doesn't have special code to detect symlinks and complain), a symlink is transparent.
If you do experience errors feel free to post them and we'll do our best to advise. However I see nothing off the top of my head that would cause issues.
The only problem with the symlinks could be if some clients mount the new location with a different path, which is possible in a networked Unix-type environment. For example, you could have the location as:
/var/stuff/new_location/include/...
and the client could be mounting that as:
/auto/var/stuff/new_location/include/..
In which case a relative symlink might work better, i.e.:
old_location/include/foo.h -> ../new_location/include/foo.h
Another thing to consider is to replace /old_location/include/foo.h with:
/*
* Please note that this library has moved to a new location...
*/
#include "/new_location/include/foo.h"
The symlinks will work on any operating system and file system that supports symlinks.
