How to copy a hard disk partition? [closed] - c

Hi,
I want to copy an NTFS partition to another partition of the same type and size. I tried the Windows function CopyFile() and it worked, but the slow speed is a problem. Then I used ReadFile() and WriteFile() instead of CopyFile(); again the speed is a problem.
How can I get better speed?
I did the same operation in kernel mode using ZwCreateFile(), ZwReadFile() and ZwWriteFile(), and I am still getting slow performance.
How can I get better speed?
I want to copy a hard disk partition into another partition. My source and destination partitions are both NTFS and the same size. For that purpose I first copied all sectors, and that works, but I want to copy only the used sectors.
Then I found the used clusters by reading the volume bitmap with FSCTL_GET_VOLUME_BITMAP, but this is also slow. I tried to get the used clusters with FSCTL_GET_RETRIEVAL_POINTERS as well, but that is slow too.
Finally I also tried the Windows API CopyFile(), but everything gives slow performance.
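For reference, a rough sketch of the kind of used-cluster query I mean (not my exact code; the volume name is a placeholder and the bitmap is read in one fixed-size chunk for brevity):

// Query the cluster allocation bitmap with FSCTL_GET_VOLUME_BITMAP so that
// only clusters marked as in-use need to be copied.
#include <windows.h>
#include <winioctl.h>
#include <vector>
#include <cstdio>

int main() {
    HANDLE vol = CreateFileW(L"\\\\.\\E:", GENERIC_READ,
                             FILE_SHARE_READ | FILE_SHARE_WRITE, nullptr,
                             OPEN_EXISTING, 0, nullptr);
    if (vol == INVALID_HANDLE_VALUE) return 1;

    STARTING_LCN_INPUT_BUFFER in = {};
    in.StartingLcn.QuadPart = 0;

    std::vector<unsigned char> out(1 << 20);   // 1 MB chunk of the bitmap
    DWORD bytes = 0;
    // Returns a VOLUME_BITMAP_BUFFER: StartingLcn, BitmapSize, then one bit per cluster.
    BOOL ok = DeviceIoControl(vol, FSCTL_GET_VOLUME_BITMAP, &in, sizeof in,
                              out.data(), (DWORD)out.size(), &bytes, nullptr);
    if (ok || GetLastError() == ERROR_MORE_DATA) {
        auto *bmp = reinterpret_cast<VOLUME_BITMAP_BUFFER *>(out.data());
        const LONGLONG bitmapBytes =
            (LONGLONG)bytes - FIELD_OFFSET(VOLUME_BITMAP_BUFFER, Buffer);
        LONGLONG used = 0;
        for (LONGLONG c = 0; c < bmp->BitmapSize.QuadPart && c / 8 < bitmapBytes; ++c)
            if (bmp->Buffer[c / 8] & (1 << (c % 8)))   // bit set => cluster in use
                ++used;
        std::printf("used clusters (in this chunk): %lld\n", used);
    }
    CloseHandle(vol);
    return 0;
}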
I know that, fundamentally, kernel mode (ring 0) is not faster than user mode for this kind of I/O, even though ring 0 can access the hardware directly.
Apart from these, I also tried asynchronous operation by opening the files with FILE_FLAG_OVERLAPPED in CreateFile(), which gave a small improvement.
I have also taken a snapshot of the volume (Volume Shadow Copy) and copied the files the way HoboCopy does, but everything gives about the same speed.
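For reference, a stripped-down sketch of the raw-volume copy loop I am timing, opening both volumes unbuffered and copying in large, sector-aligned chunks (volume names are placeholders; locking/dismounting the destination with FSCTL_LOCK_VOLUME / FSCTL_DISMOUNT_VOLUME is required in practice and omitted here):

#include <windows.h>
#include <cstdio>

int main() {
    HANDLE src = CreateFileW(L"\\\\.\\E:", GENERIC_READ,
                             FILE_SHARE_READ | FILE_SHARE_WRITE, nullptr,
                             OPEN_EXISTING, FILE_FLAG_NO_BUFFERING, nullptr);
    HANDLE dst = CreateFileW(L"\\\\.\\F:", GENERIC_WRITE,
                             FILE_SHARE_READ | FILE_SHARE_WRITE, nullptr,
                             OPEN_EXISTING, FILE_FLAG_NO_BUFFERING, nullptr);
    if (src == INVALID_HANDLE_VALUE || dst == INVALID_HANDLE_VALUE) return 1;

    const DWORD CHUNK = 4 * 1024 * 1024;   // large buffer, a multiple of the sector size
    void *buf = VirtualAlloc(nullptr, CHUNK, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);

    DWORD got = 0, put = 0;
    while (ReadFile(src, buf, CHUNK, &got, nullptr) && got > 0) {
        if (!WriteFile(dst, buf, got, &put, nullptr) || put != got) {
            std::fprintf(stderr, "write failed: %lu\n", GetLastError());
            break;
        }
    }

    VirtualFree(buf, 0, MEM_RELEASE);
    CloseHandle(src);
    CloseHandle(dst);
    return 0;
}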
Any ideas to help?
I have used the software Acronis Disk Director Suite and was amazed by its speed.
Any ideas to help me get good speed?
Any links to white papers related to this topic?
Thank you.

I think the easiest way is to use a Linux live distribution or a Linux rescue disk.
After booting, type the following in a terminal (if "/dev/hda1" is the source partition and "/dev/hdb1" is the destination):
dd if=/dev/hda1 of=/dev/hdb1 bs=64k
With some rescue distributions you can also use "dd_rescue" instead of "dd".
Be careful to use the right devices! Apart from that, it works very well!
Werner

To help you, we need your definition of "better speed".
To calculate a rough expected speed you need to know:
1. The raw performance of your block devices (hard disks in your case?)
2. The size of the data you need to transfer
So if your partitions sustain X1 and X2 MB/s, there are Y MB to copy, and the two partitions are not on the same physical device, you should expect the copy to take about Y / min(X1, X2) seconds. For example, with X1 = 120 MB/s, X2 = 100 MB/s and Y = 100,000 MB, that is about 1,000 seconds. Again, this is a rough estimate, just a reference point so we can give meaning to the words "better speed".
How much slower than that estimate are the results you are getting?

Related

Fast and robust checksum algo for a small data file (~10KB) [closed]

I have a data file that needs to be pushed to an embedded device. The typical size of the file ranges from a few bytes to about 10K max. My intention is to detect tampering with the contents of this file (the checksum will be the last element in the data file). The data is a mix of strings and signed and unsigned integers. I am looking for a robust algorithm that avoids a lot of collisions and does not use a lot of cycles to compute. I am considering Fletcher16(), CRC-32 and the solution discussed in this post.
Any suggestions for a simple algorithm for my kind of data size/contents?
Thanks in advance!
EDIT:
Thanks everyone for the insightful answers and suggestions.
Some background: this is not a hyper-secure data file. I just want to be able to detect whether someone has edited it by hand. The file gets generated by a module and should only be read by the software. Recently there have been a few instances where folks have pulled it from the target file system, edited it and pushed it back to the target hoping that would fix their problems (which, by the way, it would if edited carefully). But this defeats the very purpose of auto-generating this file and the existence of this module. I would like to detect any such playful "hacks" and abort gracefully.
My intention is to detect tampering with the contents of this file
If you need to detect intentional tampering with a file, you need some sort of cryptographic signature -- not just a hash.
If you can protect a key within the device, using HMAC as a signature algorithm may be sufficient. However, if the secret is extracted from the device, users will be able to use this to forge signatures.
If you cannot protect a key within the device, you will need to use an asymmetric signature algorithm. Libsodium's crypto_sign APIs provide a nice API for this. Alternatively, if you want to use the underlying algorithms directly, EdDSA is a decent choice.
Either of these options will require a relatively large amount of space (32 to 64 bytes) to be allocated for the signature, and verifying that signature will take significantly more time than verifying a non-cryptographic checksum. This is largely unavoidable if you need to effectively prevent tampering.
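If you go the asymmetric route, a minimal sketch of detached Ed25519 signatures with libsodium's crypto_sign API could look like this (file handling omitted; in practice the key pair is generated once, offline, and only the public key is embedded in the device):

#include <sodium.h>
#include <vector>
#include <cstdio>

// Sign the payload; the 64-byte signature would be appended to the data file.
bool sign_payload(const std::vector<unsigned char> &payload,
                  const unsigned char sk[crypto_sign_SECRETKEYBYTES],
                  unsigned char sig[crypto_sign_BYTES]) {
    return crypto_sign_detached(sig, nullptr,
                                payload.data(), payload.size(), sk) == 0;
}

// Verify on the device using the embedded public key.
bool verify_payload(const std::vector<unsigned char> &payload,
                    const unsigned char sig[crypto_sign_BYTES],
                    const unsigned char pk[crypto_sign_PUBLICKEYBYTES]) {
    return crypto_sign_verify_detached(sig, payload.data(),
                                       payload.size(), pk) == 0;
}

int main() {
    if (sodium_init() < 0) return 1;      // library must initialize successfully
    unsigned char pk[crypto_sign_PUBLICKEYBYTES], sk[crypto_sign_SECRETKEYBYTES];
    crypto_sign_keypair(pk, sk);          // for illustration only; generate offline in practice
    std::vector<unsigned char> data = {'h', 'e', 'l', 'l', 'o'};
    unsigned char sig[crypto_sign_BYTES];
    if (!sign_payload(data, sk, sig)) return 1;
    std::printf("verified: %d\n", verify_payload(data, sig, pk));
    return 0;
}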
For your purpose, you can use a cryptographic hash such as SHA-256. It is very reliable and collisions are astronomically unlikely, but you should test whether the speed is OK.
There is a sample implementation in this response: https://stackoverflow.com/a/55033209/4593267
To detect intentional tampering with the data, you can add a secret key to the hashed data. The device will need to have a copy of the secret key, so it is not a very secure method as the key could be extracted from the device through reverse engineering or other methods. If the device is well protected against that, for example if it is inside a secure location, a secure chip or in a very remote location such as a satellite in space and you are confident there are no flaws providing remote access, this may be sufficient.
Otherwise an asymmetrical cryptographic system is required, with a private key known only to the legitimate source(s) of those data files, and a public key used by the device to verify the cryptographic hash, as documented in duskwuff's answer.
If you're only concerned about accidental or non-malicious tampering, a CRC should be sufficient.
(I'm using a somewhat circular definition of 'malicious' here: if somebody goes to the trouble of recalculating or manipulating the CRC to get their edits to work, that counts as 'malicious' and we don't defend against it.)
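To make the CRC suggestion concrete, here is a minimal, table-free CRC-32 sketch (IEEE polynomial, reflected form). It is not tuned for speed; a table-driven or hardware-assisted implementation would be faster on most targets:

#include <cstdint>
#include <cstddef>
#include <cstdio>

std::uint32_t crc32(const unsigned char *data, std::size_t len) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;   // reflected IEEE polynomial
            else
                crc >>= 1;
        }
    }
    return ~crc;
}

int main() {
    const unsigned char msg[] = "123456789";
    // The standard check value for "123456789" is 0xCBF43926.
    std::printf("%08X\n", crc32(msg, sizeof msg - 1));
    return 0;
}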

how much time for opening a file? [closed]

In my program I'm using file.open(path_to_file);.
On the server side I have a directory that contains plenty of files, and I'm afraid the program will take longer to run as the directory gets bigger and bigger, because of file.open().
// code:
#include <fstream>

std::ofstream file;
file.open("/mnt/srv/links/154"); // 154 is the link id; the directory /mnt/srv/links contains many files
// write to file
file.close();
Question: can the time to execute file.open() vary according to the number of files in the directory?
I'm using debian, and I believe my filesystem is ext3.
I'm going to try to answer this - however, it is rather difficult, as it would depend on, for example:
What filesystem is used. In some filesystems a directory is an unsorted list of files, in which case the time to find a particular file is O(n), so with 900,000 files that is a long list to search. Others use a hash table or a sorted tree, allowing O(1) and O(log2(n)) lookups respectively (and each component of the path has to be looked up individually). With n = 900k, O(n) is 900,000 times slower than O(1), and log2(900k) is just under 20, so tens of thousands of times faster than a linear scan. However, with 900k files even a binary search takes some doing: if each directory entry is around 100 bytes [1], we are talking about 85 MB of directory data, so several sectors have to be read in even if we only touch 19 or 20 different places.
The location of the file itself - a file located on my own hard-disk will be much quicker to get to than a file on my Austin,TX colleague's file-server, when I'm in England.
The load of any file-server and comms links involved - naturally, if I'm the only one using a decent setup of a NFS or SAMBA server, it's going to be much quicker than using a file-server that is serving a cluster of 2000 machines that are all busy requesting files.
The amount of memory and overall memory usage on the system with the file, and/or the amount of memory available in the local machine. Most modern OS's will have a file-cache locally, and if you are using a server, also a file-cache on the server. More memory -> more space to cache things -> quicker access. Particularly, it may well cache the directory structure and content.
The overall performance of your local machine. Although nearly all of the above factors are important, the simple effort of searching files may well be enough to make some difference with a huge number of files - especially if the search is linear.
[1] A directory entry will have, at least:
A date/time for access, creation and update. With 64-bit timestamps, that's 24 bytes.
Filesize - at least 64-bits, so 8 bytes
Some sort of reference to where the file is - another 8 bytes at least.
A filename - variable length, but one can assume an average of 20 bytes.
Access control bits, at least 6 bytes.
That comes to 66 bytes. But I feel that 100 bytes is probably more typical.
Yes, it can. That depends entirely on the filesystem, not on the language. The times for opening/reading/writing/closing files are all dominated by the times of the corresponding syscalls. C++ should add relatively little overhead, even though you can get surprises from your C++ implementation.
There are a lot of variables which might affect the answer to this, but the general answer is that the number of files will influence the time taken to open a file.
The biggest variable is the filesystem used. Modern filesystems use directory index structures such as B-Trees, to allow searching for known files to be a relatively fast operation. On the other hand, listing all the files in the directory or searching for subsets using wildcards can take much longer.
Other factors include:
Whether symlinks need to be traversed to identify the file
Whether the file is local or mounted over a network
Caching
In my experience, with a modern filesystem an individual file can be located in a directory containing hundreds of thousands of files in well under a second.
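If in doubt, it is easy to measure directly. A small, hypothetical timing sketch (same path as in the question) could look like this:

#include <chrono>
#include <fstream>
#include <iostream>

int main() {
    using clock = std::chrono::steady_clock;
    const auto t0 = clock::now();
    std::ofstream file;
    file.open("/mnt/srv/links/154");   // same call as in the question
    const auto t1 = clock::now();
    file.close();
    std::cout << "open() took "
              << std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count()
              << " us\n";
    return 0;
}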

Atomic file replacement in Clojure

I have an app that updates a disk file, but I want to make sure, as much as possible, that the previous version of the file doesn't get corrupted.
The most straightforward way to update a file, of course, is simply to write:
(spit "myfile.txt" mystring)
However, if the PC (or java process) dies in the middle of writing, this has a small chance of corrupting the file.
A better solution is probably to write:
(require '[clojure.java.io :as io])
(do (spit "tempfile" mystring)
    (.renameTo (io/file "tempfile") (io/file "myfile.txt")))
This uses Java's file rename function, which I gather is typically atomic when the source and destination are on the same storage device.
Do any Clojurians with some deeper knowledge of Clojure file IO have any advice on whether this is the best approach, or if there's a better way to minimize the risk of file corruption when updating a disk file?
Thanks!
This is not specific to Clojure; a temp-rename-delete scenario does not guarantee an atomic replace under the POSIX standard. This is due to the possibility of write reordering - the rename might get to the physical disk before the temp writes do, so when a power failure happens within this time window, data loss happens. This is not a purely theoretical possibility:
http://en.wikipedia.org/wiki/Ext4#Delayed_allocation_and_potential_data_loss
You need an fsync() after writing the temp file. This question discusses calling fsync() from Java.
The example you give is to my understanding completely idiomatic and correct. I would just do a delete on tempfile first in case the previous run failed and add some error detection.
Based on the feedback from your comment, I would recommend that you avoid trying to roll your own file-backed database, based on a couple of observations:
Persistent storage of data structures in the filesystem that is consistent in the case of crashes is a tough problem to solve. Lots of really smart people have spent lots of time thinking about this problem.
Small databases tend to grow into big databases and collect extra features over time. If you roll your own, you'll find yourself reinventing the wheel over the course of the project.
If you're truly interested in maintaining consistency of your application's data in the event of a crash, then I'd recommend you look at embedding one of the many freely available databases - you could start by looking at Berkeley DB, HyperSQL, or for one with a more Clojure flavor, Datomic.

Performance issues in writing to large files?

I have recently been involved in handling the console logs for a server, and I was wondering, out of curiosity, whether there is a performance issue in writing to a large file compared to small ones.
For instance, is it a good idea to keep log files small instead of letting them grow bulky? I was not able to argue much in favor of either approach.
There might be problems in reading or searching the file, but right now I am more interested in knowing whether writing can be affected in any way.
Looking for expert advice.
Edit:
The way I thought about it was that the OS only has to open a file handle and push the data to the file system. There is little correlation with the file size, since you keep appending data to the end of the file, and whenever a block is full the OS assigns another block to the file. As I said earlier, there can be problems in reading and searching because of fragmentation of the file's blocks, but I could not find much difference for writing.
As a general rule, there should be no practical difference between appending a block to a small file (or writing the first block which is appending to a zero-length file) or appending a block to a large file.
There are special cases (like faulting in a triple-indirect block, or the initial open having to read all the mapping information) which could add additional I/Os, but the steady state should be the same.
I'd be more worried about the manageability of having huge files: slow to backup, slow to copy, slow to view, etc.
I am not an expert, but I will try to answer anyway.
Larger files may take longer to write to disk, and it is not really a programming issue but a filesystem issue. Perhaps there are filesystems that do not have this problem, but on Windows a large file cannot always be written in one contiguous piece, so writing its fragments takes time (for the simple reason that the head has to move to another cylinder). That is assuming we are talking about "classic" hard drives...
If you want my advice, I would write smaller files and rotate them either daily or when they hit some size (or both, actually). That is a rather common approach I have seen in enterprise-grade products.
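As an illustration only, a hypothetical sketch of size-based rotation (file names and the size limit are placeholders):

#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

void log_line(const std::string &line,
              const fs::path &log = "server.log",
              std::uintmax_t max_bytes = 10 * 1024 * 1024) {
    std::error_code ec;
    if (fs::exists(log, ec) && fs::file_size(log, ec) >= max_bytes) {
        // Rotate: server.log -> server.log.1 (replacing any previous .1).
        fs::rename(log, fs::path(log.string() + ".1"), ec);
    }
    std::ofstream out(log, std::ios::app);   // append to the current file
    out << line << '\n';
}

int main() {
    log_line("service started");
    return 0;
}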

Is it possible to delete both ends of a large file without copying?

I would like to know if it is possible, using Windows and c++, to take a large video file (several gigabytes in length) and delete the first and last few hundred megabytes of it “in-place”.
The traditional approach of copying the useful data to a new file often takes upwards of 20 minutes of seemingly needless copying.
Is there anything clever that can be done low-level with the disk to make this happen?
Sure, it's possible in theory. But if your filesystem is NTFS, be prepared to spend a few months learning about all the data structures that you'll need to update. (All of which are officially undocumented BTW.)
Also, you'll need to either
Somehow unmount the volume and make your changes then; or
Learn how to write a kernel filesystem driver, buy a license from MS, develop the driver and use it to make changes to a live filesystem.
It's a bit easier if your filesystem is something simpler like FAT32. But either way: in short, it might be possible, but even if it is it'll take years out of your life. My advice: don't bother.
Instead, look at other ways you could solve the problem: e.g. by using an avisynth script to serve just the frames from the region you are interested in.
Are you hoping to just fiddle around with sector addresses in the directory entry? It's virtually inconceivable that such a plan would work.
First of all, it would require that the amount of data you wish to delete be an exact multiple of the sector size. That's not very likely, considering that there is probably some header data at the very start that must remain there.
Even if it met those requirements, it would take low-level modifications, which Windows tries very hard to prevent you from doing.
Maybe your file format allows you to 'skip' bytes, so that you could simply overwrite (e.g. with memory mapping) the necessary parts. This would of course still use up unnecessarily much disk space.
Yes, you can do this on NTFS.
The end you remove by truncating the file: move the file pointer with SetFilePointerEx and call SetEndOfFile.
The beginning, or any other large consecutive region of the file, you deallocate by marking the file "sparse" and zeroing the region with FSCTL_SET_ZERO_DATA, which allows the file system to reclaim those clusters.
Note that this won't actually change the offset of the data relative to the beginning of the file; it only prevents the filesystem from wasting space storing unneeded data.
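A hypothetical sketch of that approach (path and sizes are placeholders, error handling is minimal):

#include <windows.h>
#include <winioctl.h>
#include <cstdio>

int main() {
    HANDLE h = CreateFileW(L"C:\\videos\\big.mkv", GENERIC_READ | GENERIC_WRITE,
                           0, nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h == INVALID_HANDLE_VALUE) return 1;

    DWORD bytes = 0;

    // 1. Mark the file sparse so zeroed ranges can be deallocated.
    DeviceIoControl(h, FSCTL_SET_SPARSE, nullptr, 0, nullptr, 0, &bytes, nullptr);

    // 2. Deallocate the first 200 MB (the range still reads back as zeros).
    FILE_ZERO_DATA_INFORMATION zero = {};
    zero.FileOffset.QuadPart = 0;
    zero.BeyondFinalZero.QuadPart = 200LL * 1024 * 1024;
    DeviceIoControl(h, FSCTL_SET_ZERO_DATA, &zero, sizeof zero,
                    nullptr, 0, &bytes, nullptr);

    // 3. Truncate the last 200 MB: move the file pointer, then set EOF there.
    LARGE_INTEGER size = {}, newEnd = {};
    GetFileSizeEx(h, &size);
    newEnd.QuadPart = size.QuadPart - 200LL * 1024 * 1024;
    SetFilePointerEx(h, newEnd, nullptr, FILE_BEGIN);
    SetEndOfFile(h);

    CloseHandle(h);
    std::puts("done");
    return 0;
}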
Even if low-level filesystem operations were easy, editing a video file is not simply a matter of deleting unwanted megabytes. You still have to consider concepts such as compression, frames, audio and video muxing, media file containers, and many others...
Your best solution is to simply accept your idle twenty minutes.

Resources