Are there any naming conventions when creating your own file suffix? - filesystems

I'm working on a little game and figured I'd pack images and other resources into my own files to keep the directories neat. Then I thought: is there any convention for what I should call my file, or can I simply call it whatever without anyone caring? There are probably not a lot of strong opinions about this rather innocent subject, but I thought I'd better get some kind of response on it.
The most obvious would be to not use reserved characters.
< > : " / \ | ? *
Those are the ones for Windows. Anyone care to add which characters are reserved on other systems?
There are some standard suffixes that I'm guessing shouldn't be used unless the file actually conforms to the suffix's standard.
.bat .exe .dll .lib .cmd
And then there are all the image file types and whatnot, but those are just as obvious. What else?
What is your opinion? Should I name my suffix as uniquely as possible, say .maf (My Awesome File) or whatever... or should I be more informative and stick to a known suffix that might reveal what my file is actually doing there? Or perhaps a bland old .dat or .bin?

If you want to create something that is associated with your program, you do, of course, want it to be as unique as possible. When you come up with an extension, check with FILExt to see whether it conflicts with anything major.
If you just want to convey "this is a binary file, don't try to open it in Notepad or tamper with it", I'd go with something like .bin, yes.

Unix platforms don't have filename restrictions (other than NUL and the forward slash), so don't worry about any characters beyond what Windows doesn't like.
You can worry about using an extension that hasn't been used before, but unless you want a really long one, I'd say don't bother; you can always go with something generic like .dat or .bin. You don't actually need an extension at all, which (IMO) is just as good, unless you will be distributing some of these files separately from the program (for example, user-made maps); in that case you will want an extension, since users will be passing the files around.
Another minor point you might want to consider is that MS-DOS extensions can be at most three characters after the dot. Being DOS-compatible isn't a huge issue (not an issue at all, really), but that's why you'll see that a lot of extensions are three characters.

Use what makes sense to you. I would avoid well-known extensions, as you have proposed, so the files don't get accidentally opened by another application.
Most applications/games use an extension related to the application's name or purpose (.doc, .psd, etc.).

Unless users are going to double-click the files from Explorer, having a nice, informative, unique extension is not important, so you might want to go with .bin or .dat. However, good mechanisms for packing files together already exist (.zip or .7z), so you might want to go for a standard packer with its standard extension.
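If you do roll your own container, what really identifies the file is a magic number in its first bytes, not the extension. Here is a minimal sketch in C; the "MAF1" signature and the header layout are invented for illustration:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PACK_MAGIC "MAF1"          /* invented 4-byte signature */

struct pack_header {
    char     magic[4];             /* identifies the file regardless of suffix */
    uint32_t version;
    uint32_t entry_count;
};

/* Returns 1 if the file starts with our signature, 0 otherwise. */
static int is_pack_file (const char *path)
{
    struct pack_header h;
    size_t n;
    FILE *f = fopen (path, "rb");
    if (!f) return 0;
    n = fread (&h, sizeof h, 1, f);
    fclose (f);
    return n == 1 && memcmp (h.magic, PACK_MAGIC, 4) == 0;
}

That way your program can verify its own files no matter what suffix the user (or the filesystem) gives them.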

Related

Case Sensitive Directory Path in Windows

I have reviewed the questions/answers asking whether directory/file names are case-sensitive in a Windows environment, as well as those discussing case-sensitive searching [usually in Python, not C], so I think I understand the essential facts. None of those postings, however, cover my particular application architecture or the problem I am having.
So, let me briefly explain the application architecture of which I am speaking. The heart of the application is built using Adobe AIR. Yes, that means that much of the U/I involves the Flex framework, but the file handling problem I am needing help with has no dependency upon the Flex U/I part of the application.
As I am trying to process a very large list of recursive directory structures, I am using the low level C RunTime API via a well-behaved mechanism which AIR provides for such cases where access to the host's Native Environment is needed.
The suite of functions I am using is FindFirstFile, FindNextFile and FindClose. If I write a stand-alone test program, it nicely lists the directories, sub-directories and files. The case of the directories and files is correctly shown, just as it is in Windows Explorer or with the dir command.
If, however, I launch precisely the same function via the Adobe ANE interface, I receive exactly the same output with the exception that all directory names will be reduced to lower case.
Now, I should clarify that when this code is being executed as a Native Extension, it is not passing data back to AIR, it is directly outputting the results in a file that is opened and closed entirely in the CRT world, so we are not talking about any sort of communication confusion via the passing of either text or byte arrays between two different worlds.
Without kludging up this forum with lots and lots of extraneous code, I think what will help anyone who is able to help me is these snippets:
// This is where the output gets written.
FILE* textFile = _wfopen (L"Peek.txt", L"wt,ccs=UTF-8");
WIN32_FIND_DATAW fdf;
HANDLE hFind = NULL;
wchar_t fullPath[2048];
// I am just showing the third argument as a literal to exemplify
// what, in reality, is passed into the recursively-called function
// as a variable.
wsprintf (fullPath, L"\\\\?\\%ls\\*.*", L"F:\\");
hFind = FindFirstFileW (fullPath, &fdf);
// After checking for success there is a do..while loop,
// inside which there is the expected check for the "." and ".."
// pseudo-directories and a test of fdf.dwFileAttributes for
// file versus sub-directory.
// When the next entry is a file, a function is called to format
// the output into textFile, like this (st is a SYSTEMTIME and
// fSize a long, both derived from the find data):
fwprintf (textFile, L"%ls\t%ls\t%2.2x\t%4d/%02d/%02d/%02d/%02d/%02d \t%9ld.\n",
          parentPath, fdf.cFileName,
          (fdf.dwFileAttributes & 0x0f),
          st.wYear, st.wMonth, st.wDay,
          st.wHour, st.wMinute, st.wSecond,
          fSize);
At that point parentPath will be a concatenated wide character string and
the other file attributes will be of the types shown.
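For completeness, the elided loop has the standard Find-loop shape; this sketch is an assumption about the omitted code, with the recursion and formatting reduced to comments:

if (hFind != INVALID_HANDLE_VALUE) {
    do {
        if (wcscmp (fdf.cFileName, L".") == 0 ||
            wcscmp (fdf.cFileName, L"..") == 0)
            continue;                            /* skip pseudo-directories */
        if (fdf.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) {
            /* recurse with the extended path */
        } else {
            /* format one line into textFile, as shown above */
        }
    } while (FindNextFileW (hFind, &fdf));
    FindClose (hFind);
}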
So, to summarize: all of this code works perfectly if I just write a stand-alone test. When, however, the code runs as a task called from an Adobe ANE, the names of all the sub-directory parts are reduced to lower case. I have tested every combination of file-type attribute (binary and text) and encoding (UTF-8 and UTF-16LE), but no matter what configuration I choose, the result remains the same: stand-alone, the API delivers case-correct strings; running as a task in a DLL invoked from AIR, the same API delivers only lower-case strings.
First, my thanks to Messrs Ogilvie and Passant for helpful suggestions.
Second, I apologize for not really knowing the protocol here as a very infrequent visitor. If I am supposed to flag either response as helpful and therefore correct, let these words at least reflect that fact.
I am providing an answer which was discovered by taking the advice above.
A. I discovered several tools that helped me get a handle on the contents of the .exe and .dll files. I should add a detail that was not part of the original posting: I have purposely been using the mingw-w64 toolchain rather than Visual Studio for this development work. As it turns out, both ldd and dumpbin helped me determine whether the two slightly different build environments were leaving me with different dependencies.
B. When I saw that one output included a reference to FindFirstFileExW, a function I had once tried in order to solve what I thought was the problem, I thought I had perhaps found a reason for the different results. In the event, that was just a red herring, and I do not mean to waste the forum's time with my low level of experience and understanding, but it seems useful to note this sort of troubleshooting methodology as a possible assist to others.
C. So what was the problem? There was, indeed, a small difference in the code between the stand-alone and the ANE-integrated implementations of the recursive directory search. In the production ANE use case, there is logic to apply a level of filtering to the returned results. The actual application allows the user to qualify a search for duplicate files by interrogating parts of the parent string in addition to the filename string itself.
In one corner condition, the filter may be case-sensitive or case-insensitive, and I was using _wcslwr in the mistaken belief that it behaved like the nice, Unicode-compliant, non-destructive string methods of AIR/ActionScript 3. I did not notice that the function actually replaces the original string in place with one reduced to lower case.
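A minimal sketch of the trap, with invented buffer names (_wcslwr is the CRT function, and the in-place behavior is the documented one):

#include <wchar.h>
#include <string.h>

void filter_demo (void)
{
    wchar_t name[] = L"MixedCase";   /* case-correct original */
    wchar_t lowered[16];

    /* Wrong for this use case: destroys the original in place. */
    /* _wcslwr (name); */

    /* Safe: lowercase a copy and keep the original intact. */
    wcscpy (lowered, name);
    _wcslwr (lowered);
}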
User error, not any untoward linking of non-standard CRT kernel functions by Adobe's Native Extension interoperability, was the culprit.

Split C file by its functions

How can I automatically split a single C file with various functions in it into various files with only a single function each? Does anyone have a script, or say a Notepad++ plugin, that could do it? Thank you.
It may not even be possible. If a file-scope static variable exists in the file, it is shared by all the functions of that file but is not accessible (even with the extern modifier) from functions in other files. And even without that, processing of includes and global variables will be a nightmare.
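A tiny illustration of that first point (names invented):

/* one file: both functions share this file-scope static */
static int counter = 0;     /* internal linkage: visible in this file only */

void bump (void)     { counter++; }
int  current (void)  { return counter; }

/* Split bump() and current() into two files, and whichever file loses
   the definition of counter can no longer reach it; "extern int counter;"
   will not link against a static definition in another file. */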
Anyway, on Unix/Linux, the good old ctags command should be close to your requirements: it does not split the files, but creates an index file (called a tags file) which contains the file and position of all functions from the specified C, Pascal, Fortran, yacc, lex, and Lisp sources. The man page says:
Using the tags file, ex [or vi, vim, etc.] can quickly locate these object definitions.
Depending upon the options provided to ctags, objects will consist of
subroutines, typedefs, defines, structs, enums and unions.
You can either use it (if you are in the Unix world) or mimic it on Windows, for example.
For reasons explained in Serge Ballesta's answer, splitting a single C file into smaller pieces is not automatable in general.
And having several small files instead of one larger one is generally a bad idea. The code becomes less readable, and its execution could be slower (because the compiler gets fewer inlining and optimization opportunities).
In some cases, you might want to split a big C file (e.g. more than ten thousand lines of source code) into a few smaller ones (e.g. at least a thousand lines of code each). This may require some work, like renaming static functions or variables to longer (and globally unique) names declared as extern, moving some short functions (or adding some macros) into header files and declaring them static inline, etc. This cannot really be automated in the general case.
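For instance (names invented), a static helper needed by both halves of a split has to be promoted to external linkage under a project-unique name:

/* before the split, in big.c: */
static int helper (int x) { return x * 2; }

/* after the split, declared in a shared header big_internal.h ... */
int big_helper (int x);

/* ... and defined in exactly one of the new files: */
int big_helper (int x) { return x * 2; }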
My recommendation is often to merge a few small (but related) files into one single bigger one. As a rule of thumb, I would suggest having files of more than a thousand lines each, but YMMV.
In particular, there is no reason to have only one function definition in each of your source files. That practically forbids inlining (unless you compile with link-time optimization, a very expensive approach).
Look into existing free software projects (e.g. on github) for inspiration. Or look into the Linux kernel source code.
Splitting a C file into smaller ones (or conversely, merging several source files in a single bigger one) generally requires some code refactoring. In many cases, it is quite simple (perhaps even as trivial as copy & pasting some functions one by one); in some cases, it could be difficult. You should do it manually and incrementally (and enable all warnings in your compiler, to help you find mistakes in your refactoring; don't forget to recompile often!). You may want to improve your naming conventions in your code while you split it.
Of course you need a version control system (I recommend git), and you'll compile and commit your code several times while splitting it. You also need a good source code editor (I recommend GNU Emacs, but it is a matter of taste; some people prefer vim, etc.).
You certainly don't want to automate C file splitting (you might write some scripts to help you, but generally it is not worth the trouble). You need to control that split.

Why is it mandatory to specify the module name at start of source file?

GHC insists that the module name equal the file name. But if they are the same, why does a Haskell compiler need both? It seems redundant to me. Is this just a language design mistake?
Besides the inconvenience, it also raises a problem: if I want to use two libraries that accidentally have the same top module name, I cannot disambiguate simply by renaming the folder of one of them. What is the idiomatic solution to this problem?
The Haskell language specification doesn't talk about files. It only talks about modules and their syntax. So there's clearly no language design mistake.
The GHC compiler (and many others) chose to follow a pattern of one module per file, searching for modules in files with matching names. That seems like a decent strategy to me. Otherwise you'd need to provide the compiler with some mapping from module name to file name, or an explicit list of every file in use.
I would say that one of the big reasons is that you don't always want the module name to be the file's path appended with its name. This is the same as in Java, C#, and many other languages that prefer an explicit namespace declaration in the source code; explicit is better than implicit in many cases. It gives programmers maximum control over their filenames without tying the module name to the filename alone.
Imagine that I was a Japanese Haskell programmer, and my OS used Japanese characters for file names. I can write my source code using Japanese characters where possible, but I also want to export an API that uses ASCII characters. If module name and filename had to be identical, this would be impossible and would make it very difficult for people in other countries to use my library.
And as @chi has pointed out, if you have two packages with conflicting module names (a very rare occurrence, in my experience), you can always use package-qualified imports.
The Haskell language specification requires that modules start with a module header, and it does not mention files; it leaves the implementing compilers total freedom regarding files. So the Haskell language lacks the ability to express where the files containing modules are. Because of this, some compilers [including the most important one, GHC] use a simple solution: the name of the module must match the path from an include directory to the file. This introduces the redundancy.
To avoid the redundancy, compilers could drop the language specification's requirement to start each module with a header. However, they chose not to, simply for the sake of conforming to the specification. Perhaps a GHC language extension could do this, but currently no such extension exists.
So the problem is a language design mistake, and lives on as legacy.
To combat possible name collisions among independent libraries, the GHC extension for package-qualified imports seems the best way.

Finding file type in Linux programmatically

I am trying to find the file type of a file (.pdf, .doc, .docx, etc.), but programmatically, not using a shell command. Actually, I have to make an application which blocks access to files of a particular extension. I have already hooked sys_call_table in an LKM, and now I want my LKM to check the file type when an open/read system call is triggered.
I know that we have a current pointer which gives access to the current process structure, and we can use it to find the file name stored in the dentry structure. Also, in Linux a file type is identified by a magic number stored in the starting bytes of the file. But I don't know how to find the file type or exactly where it is stored.
Linux doesn't "store" the file type for its files (unlike Mac OS' resource fork, which I think is the most well-known platform to do this). Files are just named streams of bytes, they have no structure implied by the operating system.
Either you just tell programs which file to use (and then it Does What You Say), or programs use higher-level features to figure it out.
There are programs that re-invent this particular wheel (I'm responsible for one of those), but you can also use e.g. file(1). Of course that requires your program to parse and "understand" the textual output you'll get, which in a sense only moves the problem.
However, I don't think calling into file from kernel space is very wise, so it's probably best to re-create the test for whatever set of types you need, to keep it small.
In other words, I mean you should simply re-implement the required tests. This is quite complicated in general, so if you really need to do it for as large a set of types as possible, it might not be a very good idea. :/
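As an illustration, here is a user-space sketch of such hand-rolled tests; a kernel module would instead have to read the leading bytes through the VFS. The signatures below are the well-known ones for PDF, ZIP (which covers .docx) and OLE2 (legacy .doc):

#include <stdio.h>
#include <string.h>

static const char *detect (const unsigned char *buf, size_t n)
{
    if (n >= 5 && memcmp (buf, "%PDF-", 5) == 0)
        return "pdf";
    if (n >= 4 && memcmp (buf, "PK\x03\x04", 4) == 0)
        return "zip (also .docx, .xlsx, ...)";
    if (n >= 8 && memcmp (buf, "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", 8) == 0)
        return "OLE2 (legacy .doc, .xls, ...)";
    return "unknown";
}

int main (int argc, char **argv)
{
    unsigned char buf[8];
    FILE *f;
    size_t n;
    if (argc < 2 || (f = fopen (argv[1], "rb")) == NULL)
        return 1;
    n = fread (buf, 1, sizeof buf, f);
    fclose (f);
    puts (detect (buf, n));
    return 0;
}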
Actually i have to make an application which blocks access to files of a particular extension.
That's a flawed requirement. If you check by file extension, you'll miss files that don't use the extension, which is quite common on Linux, since it does not rely on file extensions.
The officially sanctioned way of detecting a file type on Linux is by its magic number. The shell command file is basically just a wrapper for libmagic, so you have the option of linking to that library.
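A minimal user-space sketch of that option (compile with -lmagic):

#include <magic.h>
#include <stdio.h>

int main (int argc, char **argv)
{
    magic_t m;
    if (argc < 2) return 1;
    m = magic_open (MAGIC_MIME_TYPE);           /* report the MIME type */
    if (m == NULL || magic_load (m, NULL) != 0) /* NULL = default magic db */
        return 1;
    printf ("%s\n", magic_file (m, argv[1]));
    magic_close (m);
    return 0;
}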

Storing folder's paths

Where can I store folder paths so that they can be accessed from every function in a C program?
For example, I have an executable at c:\tests\myprog\bin\do_input.exe, another at C:\tools\degreesToDms.exe, etc. How and where should I store these paths?
I stored them as strings in a header file which I included in every project file, but someone discouraged me from doing this. Are they right?
I stored them as strings in a header file which I included in every project file, but someone discouraged me from doing this. Are they right?
Yes, they are absolutely right: "baking" installation-specific strings with file-system paths into compiled code is not a good decision, because you must recompile simply to change the locations of some key files. This limits the flexibility of other members of your team to run your tests, and may prevent your tests from being run automatically in an automated testing environment.
A better solution would use a plain text configuration file with the locations of the key directories, and functions that read that file and produce correct locations at run-time.
Alternatively, you could provide locations of key directories as command-line parameters to your program. This way, users who run your program would be able to set correct locations without recompiling.
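A minimal sketch of the configuration-file idea mentioned above; the file name, key and buffer size are invented:

#include <stdio.h>
#include <string.h>

static char g_tools_dir[260];    /* filled once at startup */

/* Reads lines like "TOOLS_DIR=C:\tools" from a plain-text file. */
static int load_config (const char *path)
{
    char line[512];
    FILE *f = fopen (path, "r");
    if (!f) return -1;
    while (fgets (line, sizeof line, f)) {
        line[strcspn (line, "\r\n")] = '\0';    /* strip the newline */
        if (strncmp (line, "TOOLS_DIR=", 10) == 0)
            strncpy (g_tools_dir, line + 10, sizeof g_tools_dir - 1);
    }
    fclose (f);
    return 0;
}

Call load_config("myprog.cfg") once at startup, and have the rest of the code read g_tools_dir instead of a baked-in literal.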
If the paths stay the same, then I don't see any problem defining them in a ".h" header file included in all the various .c files that reference the paths. But every computer this thing runs on may have different paths ("Tests" instead of "test"), so this is super risky programming, probably only safe if you're running it on a single machine or a set of machines that you control directly.
If the paths will change, then you need to create a storage place for them (e.g. a static character array) and have methods to fetch and possibly reset them dynamically (e.g. instead of writing output files to "results", maybe the user wants files written to "/tmp"). It totally depends on what your code and tools will be doing.