File Repository (with file history) - What implementation to use? - filesystems

I'm writing a file backup utility that:
(1) Backs up your current files; and
(2) Allows you to retrieve past versions of such files (similar to code repository revisions).
I'm considering using a source code repository (SVN, Git, and Mercurial are the main candidates), since they provide similar functionality, just applied to source code.
What are the advantages/disadvantages of that compared to writing my own proprietary code (e.g. for each file, keep the current file and maintain a binary diff chain down to the oldest revision)?
What method would you recommend, in light of performance considerations?
If it matters, the server program will be written in Python, with performance-critical areas done by C extensions.

Your requirement can be met perfectly by a source code repository. You can simply reuse one.
Many of these projects are open source, so you can modify them if you want.
EDIT:
Regarding small, frequent commits: I think it depends on the commit frequency and on how large the repository is. If the repository is very large and committed to frequently, it will be very difficult to reach your goal. But if the number of files to back up is small, or the frequency is not high, it will be fine.
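As a rough illustration of the "just reuse a repository" approach, here is a minimal Python sketch (Python being the language the asker plans to use) that snapshots a backup directory by shelling out to the Git command line, Git being one of the candidates mentioned. The function name and layout are invented for the example; it is a sketch, not a complete backup tool.

    import subprocess
    from pathlib import Path

    def snapshot(backup_dir: str, message: str) -> None:
        # Turn the backup directory into a git repository on first use,
        # then record the current state of all files as one revision.
        repo = Path(backup_dir)
        if not (repo / ".git").exists():
            subprocess.run(["git", "init"], cwd=repo, check=True)
        subprocess.run(["git", "add", "-A"], cwd=repo, check=True)
        # Only commit if something actually changed since the last snapshot.
        status = subprocess.run(["git", "status", "--porcelain"],
                                cwd=repo, check=True, capture_output=True, text=True)
        if status.stdout.strip():
            subprocess.run(["git", "commit", "-m", message], cwd=repo, check=True)

Past versions can then be retrieved with ordinary Git commands such as git log and git show <revision>:<path>, which covers the "retrieve past versions" requirement essentially for free.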

Related

How to automatically extract all the source code and header files that a small project depends on, within a big project?

A small project lives inside a big project. The small project, which can be compiled with make on its own, only uses a small part of the big project's files. Are there any tools that can automatically extract all the source code and header files that this small project depends on? Picking them manually is definitely feasible, but it is inefficient and error-prone, since the Makefiles are complex and deeply nested. Modern IDEs usually build indexes, but I don't know whether any IDE offers this feature of extracting all dependencies.
I ended up using the Process Monitor tool to track and filter all the OpenFile system calls of the build system on Windows, then exported the report and wrote a script to copy the files. But such an approach is not so elegant >_<
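For what it's worth, the copy step can be a very small script. Below is a hedged Python sketch that assumes a Process Monitor CSV export with "Operation", "Path", and "Result" columns and filters on successful CreateFile operations; adjust the column and operation names to whatever your export actually contains.

    import csv
    import shutil
    from pathlib import Path

    def copy_opened_files(procmon_csv: str, dest_root: str) -> None:
        # Copy every file the build opened, preserving the directory layout.
        copied = set()
        with open(procmon_csv, newline="", encoding="utf-8-sig") as f:
            for row in csv.DictReader(f):
                if row.get("Operation") != "CreateFile" or row.get("Result") != "SUCCESS":
                    continue
                src = Path(row["Path"])
                if src in copied or not src.is_absolute() or not src.is_file():
                    continue  # skip duplicates, directories, and device paths
                copied.add(src)
                dest = Path(dest_root) / src.relative_to(src.anchor)
                dest.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src, dest)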

Is there a way to prevent a file from being completely loaded by a program?

Is there a way to stop a hard drive from reading a certain file? For example: Program A is told to open a .txt file. Program B overloads the same .txt file, opening it hundreds of times a second. Program A is then unable to open the file.
So I'm trying to stress test a game engine that relies on extracting all of the used textures from a single file at once. I think this extraction method is causing some core problems in the overall game development experience with the engine. My theory is that the problem is caused by the slow read times of some hard drives, but I'm not sure I'm right about this, and I needed a way to test it.
Most operating systems support file locking and file sharing so that you can establish rules for processes that share access to a file.
.NET, for example (which runs on Windows, Linux, and MacOS), provides the facility to open a file in a variety of sharing modes.
For very rapid access like you describe, you may want to consider a memory-mapped file. They are supported on many operating systems and via various programming languages. .NET also provides support.
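The answer above points at .NET; as one illustration in another language that supports it, here is a minimal Python sketch of reading a file through a memory mapping. It is only a sketch of the idea, not the engine's actual loading code.

    import mmap

    def read_mapped(path: str) -> bytes:
        # Map the whole file read-only; the OS pages the data in on demand.
        # (Note: mapping a zero-length file raises ValueError.)
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
                return bytes(mapped)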

How would I read the NTFS master file table in C (*not* C++)?

I need a simple, lightweight way to read the NTFS MFT on a Windows server, using only C. My goal is to return a collection of directories and their permissions in a programmatic way for an application my company is building.
Every other answer I've researched on Stack Overflow and elsewhere has involved using C++ or other languages, and is typically very bloated. I'm pretty sure that what I want can be done in just a few lines of code, using the Windows API to call CreateFile (to get a handle to the root volume) and DeviceIoControl (to read the MFT). But I can't find a simple C solution for doing this.
Note that, although I've been a C#/.NET developer for many years (and also know other languages, including Java and Python), I am fairly new to low-level C programming and Windows API calls. I also realize that there is a free tool, Mft2Csv, that does exactly this. But the actual source code isn't available for me to reverse-engineer (the GitHub repository has only the executable and supporting files).
I also realize I could just parse the directory tree in C# using the .NET namespaces System.IO and System.Security.AccessControl. But that is far too slow for my purposes.

What is the best way to bundle static resources in Nim?

Currently, I am writing a web application using Jester and would like to facilitate the deployment by bundling all static resources (CSS, HTML, JS).
What is the best way to do this in Nim?
The basic way is to use staticRead (aka slurp) to read a file at compile time and store it as a constant in your program. This can get tedious pretty fast, since you would either need to do it for each file manually, or generate a .nim file full of staticRead() calls from the current contents of your directory before shipping, and then use those constants.
Another way might be to zip all the files and have your program read/unpack them at runtime. The zip can be created without compression if you just want it to reduce file clutter in your deployment, though you could experiment with fast compression settings, which typically improve overall speed (I/O is slow, so your program spends less time waiting for reads to complete, and today's CPUs are very good at decompressing).
Combining the above, you might want to embed the zip file into your binary and use it as a kind of embedded virtual filesystem.

Maintain a separate branch for each platform

I'm trying to port my project to another platform and I've found a few differences between this new platform and the one I started on. I've seen the autotools package and configure scripts which are supposed to help with that, but I was wondering how feasible it would be to just have a separate branch for each new platform.
The only problem I see is how to do development on the target platform and then merge in changes to other branches without getting the platform-dependent changes. If there is a way to do that, it seems to me it'd be much cleaner.
Has anyone done this who can recommend/discourage this approach?
I would definitely discourage that approach.
You're just asking for trouble if you keep the same code in branches that can't be merged. It's going to be incredibly confusing to keep track of what changes have been applied to what branches and a nightmare should you forget to apply a change to one of your platform branches.
You didn't mention the language, but whatever it is, use the features it provides to separate platform-specific code while staying on a single branch. For example, in C++ you should first use file-based separation: if you have sound code for the Mac, Linux, and Windows platforms, create sound_mac.cpp, sound_windows.cpp, and sound_linux.cpp files, each containing the same classes and methods but very different platform-specific implementations. Obviously, you only add the appropriate file to the IDE project on each platform, so your Xcode project gets the sound_mac.cpp file, while your Visual Studio project uses the sound_windows.cpp file. The files which reference those classes and methods will use #ifdefs to determine which headers to include.
You'll use a similar approach for things like installer scripts. You may have a different installer on the Mac than on Windows, but the files for both will be in the branch. Your build script on the Mac will simply utilize the Mac-specific installer files and ignore the Windows-specific files.
Keeping things in one branch and just ignoring what doesn't apply to the current platform allows you merge back and forth between topic branches and the master, making your life much more sane.
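If your build script happens to be in Python, the "pick the platform-specific files, ignore the rest" step can be as small as the hypothetical helper below; the _windows/_mac/_linux suffixes simply mirror the sound_*.cpp naming used above, and the function name is made up for the example.

    import sys
    from pathlib import Path

    # Map the interpreter's platform to the file-name suffix convention used above.
    PLATFORM = {"win32": "windows", "darwin": "mac"}.get(sys.platform, "linux")

    def platform_sources(src_dir: str) -> list[Path]:
        # Keep generic sources plus the variants matching the current platform.
        sources = []
        for path in sorted(Path(src_dir).glob("*.cpp")):
            stem = path.stem
            if stem.endswith(("_windows", "_mac", "_linux")) and not stem.endswith("_" + PLATFORM):
                continue  # another platform's implementation file
            sources.append(path)
        return sources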
Branching to work out compatibility for a target platform is doable. Just be sure to separate out any changes that aren't specific to the target platform into another branch.
