Order files by creation time to the millisecond in Bash

I need to create a list of files which are located on my hard disk in order of when they arrived on the hard disk. To do so, I have used the following:
ls -lat
which lists all the files in date/time order. However, it only orders them to the nearest second, and the problem is that there are thousands of files and every so often a few of them come clumped together in the same second. I need the exact correct ordering. I'm guessing the easiest way to do this is to get the creation time to the milli- (or perhaps nano-) second. To do this, I have tried using the following:
stat $myfile
to look at the modification time, but it always shows hour:minute:second.00000000000.
Is there a way to do this?
Thanks,
Rik

The accuracy depends on the file system you are using, but even with a high-accuracy file system such as ext4, the classic time_t fields of struct stat only carry 1-second resolution (the nanosecond-capable st_mtim fields arrived later, with POSIX.1-2008).
If you have access to the source of the program spitting out all those files, try setting a timestamp as part of the filename instead and then sort on the filename rather than the modification time.
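For illustration, a minimal sketch of that idea, assuming the producing program runs on a POSIX system with clock_gettime(2); the "out-...dat" name format is made up:

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec ts;
    char name[64];

    /* Wall-clock time with nanosecond resolution. */
    clock_gettime(CLOCK_REALTIME, &ts);
    /* Produces e.g. "out-1262304000.123456789.dat", so sorting on
       the filename reproduces creation order. */
    snprintf(name, sizeof name, "out-%lld.%09ld.dat",
             (long long)ts.tv_sec, (long)ts.tv_nsec);
    puts(name);
    return 0;
}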

You'll probably have to write your own stat command using the stat(2) function.
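On a POSIX.1-2008 system, struct stat carries struct timespec members (st_mtim and friends) whose tv_nsec field holds the sub-second part, so such a tool is short. A minimal sketch (the actual granularity still depends on the filesystem):

#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    struct stat sb;

    if (argc != 2 || stat(argv[1], &sb) != 0) {
        perror("stat");
        return 1;
    }
    /* Seconds since the Epoch, then the nanosecond remainder;
       run it over each file and pipe the output through sort -n. */
    printf("%lld.%09ld %s\n",
           (long long)sb.st_mtim.tv_sec, (long)sb.st_mtim.tv_nsec,
           argv[1]);
    return 0;
}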

I'm not sure this is possible. My reasoning:
If you look at the stat() function call, you see that it returns a struct containing information about a file. One of its members is this:
time_t st_mtime; /* time of last modification */
And if you look at the time_t structure, well, wikipedia says this:
Unix and POSIX-compliant systems implement time_t as an integer or real-floating type (typically a 32- or 64-bit integer) which represents the number of seconds since the start of the Unix epoch...
Which means that stat()'s time is in terms of seconds, not milliseconds. I haven't looked at how each inode stores file information, but it might not store info up to the millisecond.
An alternative might be to append the milli/microsecond value to the filename itself when the files are being created, and order them that way?

Related

Generating load time serial number for PCB application

I am trying to generate an incrementing value at load time to be used to "serialize" a PCB with a unique code value. Not an expert in ld or preprocessor commands, so looking for some help.
The value will be used in a unique ID for each board that the code is loaded on and will also be used as a counter for boards in the field.
I have no preconceived idea of how I might accomplish this, so any workable answer to get me started, including a pre-preprocessor macro, is fine. In my olden days, I recollect adding a couple of lines to the linker file that would accomplish this, but I have been unable to resurrect that information anywhere (including from my brain's memory cells).
The simpler the answer, the better.
My solution to the problem was remarkably simple.
The binary contained
const char *serial = "XY-00000";
I then wrote a short program that boiled down to:
char uniqueserial[8];
/* Generate serial - this was an SQL call to the manufacturing DB */
char *array;
size_t array_len;
/* Read binary into array; array_len is the size of the binary in bytes */
/* memmem(3) is a GNU extension taking haystack, haystack length,
   needle, needle length */
memcpy(memmem(array, array_len, "XY-00000", 8), uniqueserial, 8);
/* Write array to temp bin file for flashing */
This depends on the serial template string being unique in the binary; use the strings command to check. I disable CRC protection on object files as a matter of taste: I like my embedded binaries being exact memory dumps.
The linker is not the right place, for two reasons:
The executable can be loaded into several devices with the same id, making your approach void.
You would have to link the executable for every device you program, which wastes CPU resources.
The best place is to patch the executable at load time with the serial number:
Select a data pattern as the token that marks your device-id variable (a pattern unlikely to occur elsewhere in your program binary) and initialize your serial number variable to that pattern (better if you do it by statically initializing an array variable or something similar).
Make a program, executed on each download to a device, that searches for the pattern in the executable file before the binary is loaded into the device and writes the correct value to be programmed (beware that you are patching a binary, so you cannot use variable-length strings or the like, which would trash all the work done by the linker).
Once the binary executable is patched, you can download it to the device.
Another solution is to reserve a fixed area in your linker script for all this kind of information, and put all your device-information variables there. Then get the exact positions in ROM of the individual variables and include the proper data in the loaded image. In this case the linker is your friend: reserve a fixed segment in your device's ROM for storing the device's individual data (you can put MAC addresses, serial numbers, default configuration, etc. there).
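A hedged sketch of that fixed-area variant, assuming GCC and GNU ld; the section name .device_info, the field layout, and the example address are mine, for illustration only:

#include <stdint.h>

/* Placed in its own section so the linker script can pin it to a
   fixed ROM address, e.g.:
   .device_info 0x0800FC00 : { KEEP(*(.device_info)) }            */
struct device_info {
    char     serial[9];       /* "XY-00000" template, patched on download */
    uint8_t  mac[6];          /* MAC address */
    uint32_t default_config;  /* default configuration word */
};

__attribute__((section(".device_info")))
const struct device_info device_info = { "XY-00000", {0}, 0 };

With the address fixed, the download tool can overwrite those bytes in the image directly instead of searching for a pattern.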

API to set the timestamps on files & directories in btrfs

BTRFS files/directories contain the timestamps:
Creation (otime)
Modification (mtime)
Attribute modification (ctime)
Access (atime)
Is there some API I could use to set all these timestamps for a file? I googled a bit but haven't found anything yet.
Programming language doesn't matter; I would expect there to be some C API, but Python is fine too and would be nicer.
From C, the mtime and atime can be set using utime(2) and its relatives. utime(2) itself gives you seconds precision, utimes(2) has microseconds, and utimensat(2) gives you nanoseconds. There are variants like futimes()/futimens() if you have a file descriptor instead of a file name.
Python can provide the same via the os.utime function.
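For illustration, a minimal C sketch of the nanosecond route via utimensat(2); the file name and timestamps are made up:

#include <fcntl.h>     /* AT_FDCWD */
#include <stdio.h>
#include <sys/stat.h>  /* utimensat(), struct timespec */

int main(void)
{
    /* times[0] is the new atime, times[1] the new mtime. */
    struct timespec times[2] = {
        { .tv_sec = 1262304000, .tv_nsec = 0 },          /* 2010-01-01 00:00:00Z */
        { .tv_sec = 1262304000, .tv_nsec = 500000000 },  /* same instant + 0.5 s */
    };

    if (utimensat(AT_FDCWD, "example.txt", times, 0) != 0) {
        perror("utimensat");
        return 1;
    }
    return 0;
}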
Traditionally it is not possible to arbitrarily modify the otime or ctime, other than by manually editing the raw filesystem. I am not aware that Linux has provided any kernel API to modify them. Of course, you can update the ctime to the current time by changing its status in some way, and you can update the otime to the current time by deleting and recreating the file. In principle you can set them to a different time by changing the system clock first (if you are root), but this is likely to mess up lots of other stuff on the system and is probably a bad idea.

Loading thousands of files into the same memory chunk in C

Box: Linux, gcc.
Problem :
Finding the file signature of a home folder, which contains thousands of items, by scanning the folder recursively.
Done so far:
Using the mmap() system call to load the first 1 KiB of each file and check the file's magic number.
The drawback of that method is that for each file encountered I've got to make two extra system calls (i.e. mmap() and munmap()).
Best solution if possible:
I would like to allocate a single chunk of memory, load each file in turn into this unique buffer, and deallocate it when processing is completed, meaning that for each folder scanned I would only use two system calls.
I can't figure out which system call would achieve that, nor even whether this solution is realistic!
Any advice would be greatly appreciated.
Don't worry about performance until you know it isn't enough. Your time is much more valuable than the gains in program run time (except in extremely rare cases). And when the performance isn't enough, measure before digging in. There are numerous war stories of "performance optimizations" that were a complete waste (if not positively harmful).
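That said, if you do want to drop the per-file mmap()/munmap() pair, here is a minimal sketch of the single-buffer idea, assuming plain read(2) probes are acceptable in place of mmap() (open/read/close is three calls per file, against four for open/mmap/munmap/close):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static unsigned char buf[1024];   /* one buffer, reused for every file */

/* Probe one file: read its first 1 KiB and test a magic number
   (ELF is used as the example signature). */
static void check_magic(const char *path)
{
    int fd = open(path, O_RDONLY);
    ssize_t n;

    if (fd < 0)
        return;
    n = read(fd, buf, sizeof buf);
    close(fd);
    if (n >= 4 && buf[0] == 0x7f && buf[1] == 'E'
               && buf[2] == 'L' && buf[3] == 'F')
        printf("%s: ELF\n", path);
}

int main(int argc, char **argv)
{
    for (int i = 1; i < argc; i++)
        check_magic(argv[i]);
    return 0;
}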

C: creating an archive file header

I am creating a file archiver/extractor (like tar), using POSIX API system calls in C. I have done part of the archiving bit.
I would like to know if anyone could help me with some C source code (using the above) to create a file header for a file (where the header acts as an index), describing the file's attributes/metadata (name, date and time, etc.). All I have done so far is understand (and I'm not sure that's even correct) that to create a file header it needs
a struct to hold the meta data, and lseek is needed to seek to beginning/end of file
like:
FileName=file.txt FileSize=0
FileDir=./blah/blah
FilePerms=000
\n\n
The archiving part of the program has this process:
1. Get a list of all the files from the command line. (I can do this part)
2. Create a structure to hold the meta data about each file: name (255 char), size (64-bit int), date and time, and permissions.
3. For each file, get its stats.
4. Store the stats of each file within an array of structures.
5. Open the archive for writing. (I can do this part)
6. Write the header structure.
7. For each file, append its content to the archive file (at the end/start of each file).
8. Close the archive file. (I can do this part)
I am having difficulty creating the header file as a whole, even though I know what it needs to do; the bits I can't do are points 2, 3, 4, 6 and 7 above.
Any help will be appreciated.
Thanks.
As ijw notes, there are several ways to create an archive file header. If cross-platform portability is going to be an issue at all - or if you need to switch between 32-bit and 64-bit builds of the software on the same platform, even - then you need to ensure that the sizes and layouts of the fields are fully understood on all platforms.
Per-file Metadata
One way to do that is to use a fixed format binary header with types of known size and endianness. This is what ijw suggested. However, you will need to handle long file names, and so you will need to store a length (probably in a 2-byte unsigned integer) and then follow that with the actual pathname.
The alternative, and generally now favoured technique, is to use printable fields (often called ASCII format, though that is something of a misnomer). The time is recorded as the decimal number of seconds since the Epoch converted to a string, etc. This is what modern ar archives use; it is what GNU tar does (more or less; there are some historical quirks that make that more confusing); it is what cpio -c (which is usually the default these days) does. The fields might be separated by nulls or spaces; there is an easy way to detect the end of the header; the header contains information about the file name (not necessarily as directly as you'd like or expect, but again, that is usually because the format has evolved over the years), and then is followed by the actual data. Somehow, you know the size of each field, and the file which the header describes, so that you can read the data reliably.
Efficiency is a red herring. The conversion to/from the text format is so swift by comparison with the first disk access that there is essentially no measurable performance issue. And the guaranteed portability typically far outweighs the (microscopic) performance benefit from using binary data format instead - doubly so when the binary data has to be transformed on input or output anyway to get it into an architecture-neutral format.
Central Index vs Distributed Index
The other issue to consider is whether the index of files in the archive is centralized (at the front, or at the end) or distributed (the metadata for each file immediately precedes the data for the file). There are some advantages to each format - generally, systems use the distributed version because you can write the information for each file without knowing how many files there are to process in total (for example, because you are recursively archiving a directory's contents). Having a central index up front means you can list the files without reading the whole archive - distributed metadata means you have to read the whole file. However, the central index complicates the building of the archive.
Note that even with a distributed index, you will normally need a header for the archive as a whole so that you can detect that the file is in the format you expect. Typically, there is some sort of marker information (!<arch>\n for an ar archive, usually; %PDF-1.2\n at the start of a PDF file, etc) to reassure you that the file contains what you expect. There might be some overall (archive-level) metadata. Then you will have the first file metadata followed by the file data, repeating until the end of archive (which might, or might not, have a formal end marker - more metadata).
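For illustration, a tiny sketch of verifying such a marker before reading any entries; the ar marker is used as the example, and a format of your own would define its own marker:

#include <stdio.h>
#include <string.h>

/* Return non-zero if fp starts with the expected archive marker. */
static int check_marker(FILE *fp)
{
    char magic[8];

    if (fread(magic, 1, sizeof magic, fp) != sizeof magic)
        return 0;
    return memcmp(magic, "!<arch>\n", sizeof magic) == 0;
}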
[H]ow would I go about implementing it in the 'fixed format binary header' you suggested. I am having trouble with deciding what commands/functions are needed.
I intended to suggest that you do not go with a fixed format binary header; you should use a text-based header format. If you can work out how to do the binary format, be my guest (I've done it numerous times over the years - that doesn't mean I think it is a good idea).
So, some pointers here towards the 'text header' format.
For the file metadata, you might define that you include:
size
mode (permissions, type)
owner
group
modification time
length of name
name
You might reasonably decide that your file sizes are limited to 64-bit unsigned integer quantities, which means 20 decimal digits. The mode might be printed as a 16-bit octal number, requiring 6 octal digits. The owner and group might be printed as UID and GID values (rather than names), in which case you could use 10 digits for each. Alternatively, you could decide to use names, but you should then allow for names up to, say, 32 characters each. Note that names are typically more portable than numbers; neither name nor number is of much relevance on the receiving machine unless you extract the data as root (but why would you want to do that?).
The modification time is classically a 32-bit signed integer, representing the number of seconds since the Epoch (1970-01-01 00:00:00Z). You should allow for the Y2038 bug by allowing the number of seconds to grow bigger than a 32-bit quantity; you might decide that 12 leading digits will take you beyond the Y10K crisis (by a factor of 4 or so) and that this is good enough; you might decide to allow for fractional seconds too. Together, this suggests that 26 spaces for the timestamp should be overkill.
You can decide that each field will be separated from the next by a space (for legibility - think 'ease of debugging'!). You might reasonably restrict the name-length field to 4 decimal digits, allowing file names of up to 9999 bytes.
You need to know how to format the types portably - #include <inttypes.h> is your friend.
You then devise a format string for printing (writing) the file metadata, and a parallel string for scanning (reading) the file metadata.
Printing:
"%20" PRIu64 " %06o %-.32s %-.32s %26" PRIu64 " %-4d %s\n"
This prints the name too. It terminates the header with a newline. The total size is 127 bytes plus the length of the file name. That's probably excessive, but you can tweak the numbers to suit yourself.
Scanning:
"%" SCNu64 " %o %.32s %.32s %" SCNu64 "%d"
This does not scan the name; you need to create the scanner for the name carefully, not least because you need to read spaces in the name. In fact, the code for scanning the user name and group name both assume no spaces too. If that is not acceptable (that is, names may contain spaces), then you need a more complex scan format, or something other than sscanf() to process the input data.
I'm assuming a 64-bit integer for the time field, rather than mixing up fractional seconds, etc, even though there's enough space to allow for fractional seconds. You'd likely save some space here.
You can get the information for each file with the stat() system call.
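Putting points 3 and 6 together, a hedged sketch; write_header and its placeholder owner/group strings are mine, not part of the question (getpwuid()/getgrgid() would supply real names):

#include <inttypes.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* stat() one file and emit the text-format header described above. */
int write_header(FILE *archive, const char *path)
{
    struct stat sb;

    if (stat(path, &sb) != 0)
        return -1;
    fprintf(archive,
            "%20" PRIu64 " %06o %-32.32s %-32.32s %26" PRIu64 " %-4d %s\n",
            (uint64_t)sb.st_size,
            (unsigned)(sb.st_mode & 0177777),
            "owner", "group",
            (uint64_t)sb.st_mtime,
            (int)strlen(path), path);
    return 0;
}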
For the writing of the header, here are two solutions.
Trivial but evil:
struct file_header {
... data you want to put in
} fhdr;
fwrite(&fhdr, sizeof(fhdr), 1, file);
This is evil because structure packing varies from machine to machine, as does byte order and the size of basic types like 'int'. A file written by your program may not be readable by your program when it's compiled on another machine, or even with another compiler on the same machine in some cases.
Non-trivial but safe:
char name[xxx];
uint32_t length; /* Fixed byte length across architectures */
...
fwrite(name, sizeof(name), 1, file);
length = htonl(length); /* Or something else that converts
                           the length to a known endianness */
fwrite(&length, sizeof(length), 1, file);
Personally I'm not a fan of htonl() and friends; I prefer to write something that converts a uint32_t to an unsigned char[4] using shift operators (which is trivial to write), because C doesn't pin down the in-memory format of even an integer. In practice you'd be hard pushed to find something that doesn't store a uint32_t as 4 bytes of 8 bits, but it's a thing to consider.
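For what it's worth, that shift-based serializer amounts to something like this sketch:

#include <stdint.h>

/* Emit a uint32_t as four big-endian bytes, independent of the
   host's byte order and of how the host stores integers. */
static void put_u32_be(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)(v & 0xff);
}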
The variables listed above can be structure members in your structure. Reversing the process on read is left as an exercise to the reader.

Using modified date of file for comparison. Is it safe?

I want to make a procedure that does one-way synchronization of a folder between a server and a client. I was thinking of using the modified date as the criterion, provided that only the dates of the server files are used. The procedure will not use the modified dates of files on the client at all; it will read dates from the server and compare them with the dates read from the server the last time the procedure ran.
Do you think this is safe?
Is there any possibility that the modified date will not be changed when a file is edited, or that it will be changed without the contents of the file being touched (e.g. by some strange antivirus program)?
Don't count on the modification date of a file.
Strange programs (antiviruses and such) are less of a problem than the fact that you simply can't count on the client and server clocks being synchronized.
Why not do a straightforward diff or hash calculation? You can't get a better comparison than that.
Taking performance considerations into account, you can use the following heuristic:
If the date hasn't changed, then the file is obviously the same.
If the date has changed, the file contents might have changed, and might not have (for example, the file may merely have been touched). In this case, to get a definitive answer you must examine the file somehow.
Bottom line: the modification date can always give you a true negative (file not changed), but may sometimes yield a false positive, in which case you must verify.
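A minimal sketch of that heuristic in C; the last_seen_mtime bookkeeping and the follow-up content check are assumed to live elsewhere:

#include <stdbool.h>
#include <sys/stat.h>

/* An unchanged mtime is trusted as "same"; anything else must be
   verified by examining content (diff, hash, ...). */
bool needs_content_check(const char *path, time_t last_seen_mtime)
{
    struct stat sb;

    if (stat(path, &sb) != 0)
        return true;                       /* missing or unreadable: verify */
    return sb.st_mtime != last_seen_mtime; /* changed mtime: verify */
}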
You didn't mention what OS you're on, but on UNIX platforms the modification time can be set by client code to any value it wants (see the utimes() API or touch command). Therefore you shouldn't rely on modification times to tell you whether a file has changed or not. I imagine Windows is somewhat similar.
