I wish to compare a value at a particular location in a binary file (say, the value at index n * i, where i = 0, 1, 2, 3, ... and n is any number, say 10).
I want to see if that value is equal to another, say "m". The location of that value in the file is always at a multiple of n (n * i).
I can think of three methods to do this:
I maintain a temp variable which stores the value of n * i and directly use fseek to go to that index and check whether the value there is equal to m.
I search the whole file for the value of m.
I search for the value of m at locations 0, n, 2n, 3n, ... using fseek.
I don't know how each of these operations works internally, but which one of these is the most efficient with respect to space and time?
Edit:
This process is a part of a bigger process, which has many more files and hence time is important.
If there is any other way than using fseek, please do tell.
Any help appreciated. Thanks!
Without any prior knowledge of the values and ordering in the file you are searching, the quickest way is just to look through the file linearly and compare values.
This might be best done by calling fseek() repeatedly, but repeated fseek and read calls may be slower than reading big chunks of the file and searching them in memory, because system calls have a lot of overhead.
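For illustration, here is a minimal sketch of the chunked approach in C, assuming 4-byte values stored at byte offsets 0, n, 2n, ... and written with the same byte order as the machine that reads them; the function name and buffer size are just placeholders:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Scan a binary file for a 4-byte value m stored at byte offsets 0, n, 2n, ...
   The file is read in large chunks so we do not pay one fseek/fread per record.
   Returns the byte offset where m was found, or -1 if it is not present. */
long find_value(const char *path, long n, int32_t m)
{
    static unsigned char buf[1 << 16];             /* 64 KiB I/O buffer          */
    size_t chunk = (sizeof buf / (size_t)n) * n;   /* whole records per read     */
    FILE *f = fopen(path, "rb");
    if (!f || chunk == 0) { if (f) fclose(f); return -1; }

    long base = 0;                                 /* file offset of buf[0]      */
    size_t got;
    while ((got = fread(buf, 1, chunk, f)) > 0) {
        for (size_t off = 0; off + sizeof(int32_t) <= got; off += (size_t)n) {
            int32_t v;
            memcpy(&v, buf + off, sizeof v);       /* avoids alignment problems  */
            if (v == m) { fclose(f); return base + (long)off; }
        }
        base += (long)got;
    }
    fclose(f);
    return -1;
}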
However, if you are doing a lot of searches on the same files, you would be better off building an index and/or sorting your records. One way to do this would be to put the data into a relational database with built-in indexes (pretty much any SQL database).
Edit:
Since you know your file is sorted, you can use a binary search.
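A hedged sketch of that idea, assuming fixed-size records of n bytes sorted by a 4-byte key stored at the start of each record (the function name and record layout are assumptions, not something stated in the question):

#include <stdio.h>
#include <stdint.h>

/* Binary search in a file of fixed-size records sorted by their leading
   4-byte key. Returns the record index holding m, or -1 if it is absent.
   Assumes the file size is an exact multiple of the record size n. */
long bsearch_file(FILE *f, long n, int32_t m)
{
    if (fseek(f, 0, SEEK_END) != 0) return -1;
    long count = ftell(f) / n;                /* number of records */
    long lo = 0, hi = count - 1;

    while (lo <= hi) {
        long mid = lo + (hi - lo) / 2;
        int32_t key;
        if (fseek(f, mid * n, SEEK_SET) != 0) return -1;
        if (fread(&key, sizeof key, 1, f) != 1) return -1;

        if (key == m)      return mid;        /* found at record 'mid' */
        else if (key < m)  lo = mid + 1;
        else               hi = mid - 1;
    }
    return -1;                                /* not found */
}

This needs only O(log N) seeks instead of a full linear scan.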
Related
I have a large CSV file in the following format:
ID,Hash
abc,123
def,456
ghij,7890
I want to efficiently read the line corresponding to a given ID and make changes to the corresponding hash. I am allowed to store some information in an initial pass, but the changes need to be dynamic. What can I do?
I don't want to iterate over all the lines while making changes. No assumptions can be made about the size of any entry in general; it may also change. The file has no particular order.
This seems difficult, but please provide some code by which I can access a part of the file in constant time. I think I can figure out a heuristic. It would be best if the offset could be iterated in both directions from a given point.
Michael Walz asks "Does the hash have a fixed size? The answer depends heavily on this. If the size is fixed, then it's easy to update the hash directly in the file, otherwise the file must be read and rewritten".
More generally, if the records in the file have a fixed length, then you can seek to the record and replace it in place; if not, you have to rewrite the file. Assuming a fixed length, you can speed up a possible search if the file has some order (is sorted), as you can then use binary search to find the record quickly (O(log N)). A sketch of the in-place update follows below.
See Klas Lindbach's solution for a basic binary search at "Fastest array lookup algorithm in C for embedded system?". The same idea holds for a file (an array of records, but on disk).
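Here is a minimal sketch of the fixed-length in-place update mentioned above; REC_LEN and the function name are illustrative assumptions:

#include <stdio.h>

/* Overwrite record number 'index' in place, assuming every record is
   exactly REC_LEN bytes (REC_LEN and the record layout are illustrative).
   Opening with "r+b" lets us update the record without rewriting the file. */
enum { REC_LEN = 64 };

int update_record(const char *path, long index, const char rec[REC_LEN])
{
    FILE *f = fopen(path, "r+b");           /* read/update, binary */
    if (!f) return -1;

    if (fseek(f, index * (long)REC_LEN, SEEK_SET) != 0 ||
        fwrite(rec, REC_LEN, 1, f) != 1) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return 0;
}

If the hash is variable-length, this trick no longer works and the file has to be rewritten, as Michael Walz points out.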
There are two computers holding the same large set of files. How do we find out whether there is a slight change in any one of the files on one computer? The network communication between these computers is very slow.
You can use the md5sum utility. For Windows please check [this](https://support.microsoft.com/en-us/help/889768/how-to-compute-the-md5-or-sha-1-cryptographic-hash-values-for-a-file), and for Linux use md5sum filename, then compare the hash values.
One idea would be to generate a hash for each file. A hash maps a file of arbitrary length to a fixed-size value. You could further hash the hashes together, then transfer that and compare. Hashing is used extensively to ensure downloads are not corrupt.
You could hash the files and compare the hashes via the network.
A good hash function is designed so that even a small difference in its input produces a completely different output. Furthermore, most hash functions nowadays have an output length of 160-512 bits. This means that even if the two files you want to compare are several gigabytes big, you only need to send a small string of at most 512 bits over the network to see whether the hashes match.
If you have millions of files, even that might already be too much. A solution would look like this:
1. Hash each file on each computer.
2. Concatenate the hashes and hash the concatenated string again.
3. Compare this output; if it differs, you know that there is a difference somewhere in those files.
To find which file differs (or even where exactly in the file) you can use binary search:
4. Split the millions of files into two halves and apply steps 1-3 to each half (if you have enough space you can save the hash of each file to speed this up).
5. For each half whose hashes differ, repeat steps 4-6 recursively.
6. Once you have located the files that differ, you can again split each file up by the number of lines and work as in steps 4-6.
At some point the number of lines will be so small that the hash would be longer than the actual content of the lines. At that point it is of course more efficient to compare the actual content in the naive way.
Assuming only one file differs, this needs only logarithmically many hashes to be sent over the network and therefore minimizes the network traffic.
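As an illustration of steps 1-2, here is a sketch in C that uses 64-bit FNV-1a purely as a stand-in fingerprint; for a real integrity check you would substitute a cryptographic hash (e.g. MD5 or SHA-256 from a library such as OpenSSL). The function names are illustrative:

#include <stdio.h>
#include <stdint.h>

/* Step 1: hash one file with 64-bit FNV-1a. NOT cryptographic -- it only
   illustrates "one fixed-size fingerprint per file"; use MD5/SHA-256 in
   practice. Returns 0 if the file cannot be opened. */
uint64_t hash_file(const char *path)
{
    const uint64_t FNV_OFFSET = 0xcbf29ce484222325ULL;
    const uint64_t FNV_PRIME  = 0x100000001b3ULL;

    FILE *f = fopen(path, "rb");
    if (!f) return 0;

    uint64_t h = FNV_OFFSET;
    unsigned char buf[8192];
    size_t got;
    while ((got = fread(buf, 1, sizeof buf, f)) > 0)
        for (size_t i = 0; i < got; i++) {
            h ^= buf[i];
            h *= FNV_PRIME;
        }
    fclose(f);
    return h;
}

/* Step 2: combine the per-file hashes into one hash for the whole set. */
uint64_t combine_hashes(const uint64_t *hashes, size_t count)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < count; i++)
        for (int b = 0; b < 8; b++) {
            h ^= (hashes[i] >> (8 * b)) & 0xFF;
            h *= 0x100000001b3ULL;
        }
    return h;
}

Each computer runs this over its files in the same order; only the final 64-bit values (or the cached per-file hashes) need to cross the slow network.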
In C, I am using the libsndfile library to help me read the values of a WAV file so that I can do some computations on them afterwards. However, when I get the output, I am not sure what these numbers mean. Why are the numbers in the millions? At first I thought they were Hz, but that did not make sense to me. The information regarding the WAV file can be seen below. Under that, I am using the function sf_read_int() to read the values into memory.
What does sf_read_int() do? This was obtained from the API documentation of libsndfile:
The file write items functions write the data in the array pointed to by ptr to the file. The items parameter must be an integer product of the number of channels or an error will occur.
I decided to plot some of these huge values on a graph, and it looks very similar to what the WAV file should look like (if I imported it into Audacity and zoomed in on a specific location, I would see this). Note that the values shown are not the same values as on the graph; I sampled the values at a random point in time. So I guess the real question is: why are these values so big (in the millions), and what do they represent? (Are they bytes?)
In limits.h you can probably find two definitions like these (among other things):
#define INT_MAX 0x7FFFFFFF
#define INT_MIN 0x80000000
which correspond to the decimal range between -2147483648 and 2147483647.
The libsndfile manual says:
sf_count_t sf_read_int(SNDFILE *sndfile, int *ptr, sf_count_t items);
i.e., it reads sound file content into the integer values pointed to by int *ptr. Any value between INT_MIN and INT_MAX is a legitimate sample value. In the libsndfile API, the data type used by the calling program and the data format of the file do not need to be the same.
Please also observe there's no such thing as "frequency" in a sound file. Linear PCM files only consist of raw sample data preceded by a header, whereas "frequency" is a mathematical abstraction or analysis result.
This might be of interest to you:
When converting between integer PCM formats of differing size (e.g. using sf_read_int() to read a 16 bit PCM encoded WAV file) libsndfile obeys one simple rule:
Whenever integer data is moved from one sized container to another sized container, the most significant bit in the source container will become the most significant bit in the destination container.
Be sure to read the manual thoroughly, especially when it is clearly written.
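As an illustration of the scaling rule quoted above, here is a hedged sketch that reads a 16-bit PCM WAV through sf_read_int() and shifts each sample back down to the 16-bit range (a 16-bit sample lands in the top 16 bits of the int, hence the shift); the file name and buffer size are placeholders:

#include <stdio.h>
#include <sndfile.h>

int main(void)
{
    SF_INFO info = {0};
    SNDFILE *snd = sf_open("input.wav", SFM_READ, &info);   /* placeholder name */
    if (!snd) { fprintf(stderr, "%s\n", sf_strerror(NULL)); return 1; }

    int buf[1024];
    sf_count_t items = sizeof buf / sizeof buf[0];
    /* sf_read_int requires the item count to be a multiple of the channel count */
    sf_count_t got = sf_read_int(snd, buf, items - items % info.channels);

    for (sf_count_t i = 0; i < got && i < 8; i++)
        /* >> 16 assumes the usual arithmetic right shift for negative samples */
        printf("raw = %11d   as 16-bit = %6d\n", buf[i], buf[i] >> 16);

    sf_close(snd);
    return 0;
}

For a 16-bit file, the "raw" column shows the values in the millions that the question asks about, and the shifted column shows the familiar -32768..32767 sample range.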
It's said that DES is insecure. I guess it's because the key is 55 bits long, so brute force would take at most 2^55 iterations to find the key, which is not that many nowadays. But if we iterate over all 2^55 keys, how do we know when to stop?
It's 2^56, not 2^55.
There are a couple of options for how to know when to stop. One is that you're doing a known-plain-text attack -- i.e., you know the actual text of a specific message, use that to learn the key, then use that to be able to read other messages. In many cases, you won't know the full text, but may know some pieces anyway -- for example, you may know something about an address block that's used with all messages of the type you care about, or if a file has been encrypted may have a recognizable header even though the actual content is unknown.
If you don't know (any of) the text for any message, you generally depend on the fact that natural languages are generally fairly redundant -- quite a bit is known about their structure. For a few examples, in English, you know that a space is generally the most common character, e is the most common letter, almost no word has more than two identical letters in a row, nearly all words contain at least one vowel, etc. In a typical case, you do a couple different levels of statistical analysis -- a really simple one that rules out most possibilities very quickly. For those that pass that test you do a second analysis that rules out the vast majority of the rest fairly quickly as well.
When you're done, it's possible you may need human judgement to choose between a few possibilities -- but in all honesty, that's fairly unusual. Statistical analysis is generally entirely adequate.
I should probably add that some people find statistical analysis problematic enough that they attempt to prevent it, such as by compressing the data with an algorithm like Huffman compression to maximize entropy in the compressed data.
It depends on the content. Any key will produce some output, so there's no automatic way to know you've found the correct key unless you can guess what sort of thing you're looking for. If you expect that the encrypted data is text, you can check whether each decrypt contains mostly ASCII letters; similarly, if you expect that it's a JPEG file, you can check whether the decrypt starts with the characters "JFIF".
If you expect that the data is not compressed, you can run various entropy tests on the decrypts, looking for decrypts with low entropy.
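As a rough illustration of such a plausibility test, here is a small sketch in C; the 95% threshold is an arbitrary example value, not part of any standard attack:

#include <stddef.h>
#include <ctype.h>

/* Crude plausibility test for a candidate decryption: returns 1 if at
   least 95% of the bytes are printable ASCII or common whitespace.
   Real attacks would follow up with letter-frequency or entropy
   statistics on the few candidates that survive this filter. */
int looks_like_text(const unsigned char *buf, size_t len)
{
    if (len == 0) return 0;
    size_t printable = 0;
    for (size_t i = 0; i < len; i++)
        if (isprint(buf[i]) || buf[i] == '\n' || buf[i] == '\r' || buf[i] == '\t')
            printable++;
    return printable * 100 >= len * 95;
}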
Hi,
I want to copy an NTFS partition to another partition of the same type and size. I tried the Windows function CopyFile() and it worked, but the speed is a problem. Then I used ReadFile() and WriteFile() instead of CopyFile(), and again the speed is a problem.
I also did the same operation in kernel mode using ZwCreateFile(), ZwReadFile() and ZwWriteFile(), and I still get slow performance.
How can I get better speed?
I want to copy a hard disk partition into another partition. My source and destination partitions are both NTFS and the same size. For that purpose I first copied all sectors, and that works, but I want to copy only the used sectors.
So I found the used clusters by reading the volume bitmap (FSCTL_GET_VOLUME_BITMAP), but this is also slow. I also tried to get the used clusters using FSCTL_GET_RETRIEVAL_POINTERS, but that is slow as well.
Finally I tried the Windows API CopyFile() too, but everything gives slow performance.
I know that fundamentally kernel mode (ring 0) is slower than user mode in speed (even if ring 0 can access hardware directly).
Apart from these, I also tried asynchronous operation by setting the OVERLAPPED flag in CreateFile() and got a small improvement.
I have also taken a snapshot (Volume Shadow Copy) of the volume and copied the files using the HoboCopy method, but everything gives the same speed.
Any ideas to help?
I have used the software Acronis Disk Director Suite and was amazed when I saw its speed.
Any ideas to help me get good speed? Any links to white papers related to this topic?
Thank you.
I think the easiest way is to use a Linux Live Distribution or a Linux Rescue Disk.
After it starts, in a terminal you have to type (if "/dev/hda1" is the source partition and "/dev/hdb1" is the destination):
dd if=/dev/hda1 of=/dev/hdb1 bs=64k
Instead of "dd" with some rescue distributions you can also use "dd_rescue".
Be careful to use the right devices! Apart of this it works very well!
Werner
In order to help you, you have to share with us your definition of "better speed".
To calculate a rough expected speed you need to know:
1. The raw performance of your block devices (hard disks in your case)
2. The size of the data you need to transfer
So if your partitions sustain X1 and X2 MB/s, there are Y MB to copy, and the two partitions are not on the same physical device, you should expect the copy to be done in roughly Y / min(X1, X2) seconds. For example, copying 100 GB between disks that each sustain 80 MB/s works out to about 100000 / 80 ≈ 1250 seconds, or roughly 21 minutes. Again, this is a rough estimate, just a reference point so we can give meaning to the words "better speed".
How much slower than that estimate are the results you are getting?