In C, I am using the libsndfile library to help me read the values of a WAV file so that I can do some computations on them afterwards. However, when I get the output of the file, I am not sure what these numbers mean. Why are the numbers in the millions? At first I thought it was Hz, but that did not make sense to me. The information regarding the WAV file can be seen below. Under that, I am using the function sf_read_int() to read the values into memory.
What does sf_read_int() do? This was obtained from the API documentation of libsndfile:
The file read items functions fill the array pointed to by ptr with the requested number of items. The items parameter must be an integer product of the number of channels or an error will occur.
I decided to plot some of these huge values on a graph, and it looks very similar to what the WAV file should look like (if I imported it into Audacity and zoomed in on a specific location, I would see this). Note that the values shown are not the same values as on the graph; I sampled the values at a random point in time. So I guess the real question is: why are these values so big (in the millions), and what do they represent? (Are they bytes?)
In limits.h you can probably find two definitions like these (among other things):
#define INT_MAX 0x7FFFFFFF
#define INT_MIN (-INT_MAX - 1)
which correspond to the decimal range from -2147483648 to 2147483647.
The libsndfile manual says:
sf_count_t sf_read_int(SNDFILE *sndfile, int *ptr, sf_count_t items);
i.e., it reads sound file content into integer values pointed to by int *ptr. Any value between INT_MIN and INT_MAX is a legitimate sample value. In the libsndfile API, the data type used by the calling program and the data format of the file do not need to be the same.
Please also observe that there is no such thing as "frequency" in a sound file. Linear PCM files consist only of raw sample data preceded by a header, whereas "frequency" is a mathematical abstraction or analysis result.
This might be of interest to you:
When converting between integer PCM formats of differing size (e.g. using sf_read_int() to read a 16 bit PCM encoded WAV file) libsndfile obeys one simple rule:
Whenever integer data is moved from one sized container to another sized container, the most significant bit in the source container will become the most significant bit in the destination container.
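In other words, when a 16-bit PCM file is read with sf_read_int(), each 16-bit sample value v arrives left-justified in the 32-bit int, i.e., as roughly v << 16, which is exactly why modest sample values show up in the millions. Here is a minimal sketch under that assumption (the file name and buffer size are placeholders, not from the question):

#include <sndfile.h>
#include <stdio.h>

int main(void) {
    SF_INFO info = {0};
    SNDFILE *snd = sf_open("input.wav", SFM_READ, &info); /* placeholder name */
    if (!snd) {
        fprintf(stderr, "%s\n", sf_strerror(NULL));
        return 1;
    }

    int buf[1024];
    /* items must be an integer multiple of the channel count */
    sf_count_t items = (1024 / info.channels) * info.channels;
    sf_count_t n = sf_read_int(snd, buf, items);

    for (sf_count_t i = 0; i < n; i++) {
        /* For a 16-bit source file the sample sits in the top 16 bits;
           shifting right by 16 recovers the familiar 16-bit value
           (arithmetic shift assumed, as on typical compilers). */
        printf("%d -> %d\n", buf[i], buf[i] >> 16);
    }

    sf_close(snd);
    return 0;
}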
Be sure to read the manual thoroughly, especially when it is this clearly written.
I have a file with 1,000,000 integers, and I need to find the position of an integer X. Is there a way to move between integer positions? I need a way to say, for example, "check the 500,000th integer and compare it with X". All I know is that the integers inside the file are ordered from lowest to highest.
I need to use binary search without loading the integers into memory.
I've thought about using lseek(), fseek() and so on, but I am not sure how to use them, and the integers seem to occupy different numbers of bytes too.
Thanks in advance for your time.
If the numerals in the file are different lengths, e.g., “34” is two characters, and “1733” is four characters, there is generally no way to move directly to a specific numeral in the file, because neither you nor the file system knows exactly where in a file a particular numeral begins until you or it examines the file.
Since the numerals are ordered by value, you could find a particular value by guessing where it might be in the file, seeking to that location, reading to find a delimiting character (something that is not part of a numeral, such as a space, comma, or new-line), and then reading the numeral that follows. The value of that numeral would tell you whether to move further back or ahead in the file.
To find numerals by index (which numeral it is in the file—first, second, third, and so on), one option would be to read the file, counting numerals, and recording the position within the file of every, say, 100th numeral. So you could build an index table that you kept in memory. Then, when you wanted to find some numeral, say the 3437th, you could use the table to seek to the 3400th numeral and then read characters until you counted 37 numerals.
Another option would be to read the file and rewrite the numerals to a new file using a fixed width for them (adding leading zeros or spaces so every numeral had the same number of characters). Then you could easily seek in the new file, since the position of numeral x would be x * w, where w is the number of characters used for each numeral, including delimiters.
Note when seeking with a calculated offset: in C, you should use streams in binary mode, which support seeking a specified number of bytes from the beginning of the file. Text-mode streams generally do not support this.
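A minimal sketch of that fixed-width scheme, assuming each numeral has been rewritten to occupy exactly W characters including its delimiter (the width, return convention, and error handling are illustrative, not part of the original question):

#include <stdio.h>
#include <stdlib.h>

#define W 12 /* assumed fixed record width, delimiter included */

/* Read the k-th numeral from a fixed-width file opened in binary mode. */
long read_kth(FILE *f, long k) {
    char buf[W + 1];
    if (fseek(f, k * (long)W, SEEK_SET) != 0) return -1; /* -1 doubles as an error value here */
    if (fread(buf, 1, W, f) != (size_t)W) return -1;
    buf[W] = '\0';
    return strtol(buf, NULL, 10);
}

With this helper, a standard binary search over indices 0..count-1 needs only O(log n) calls to read_kth().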
A binary search would work even with a text file, provided that your integers have clear separators in the file. It still achieves the required O(log n) search depth, though each probe is more expensive than an array access: seek to the approximate byte offset, read characters until you find a separator, then read the number that follows.
This works even better if you can memory-map the file.
But this is still solving the wrong problem: if you can, write the numbers in a fixed-length binary format, memory-map that file, and use the standard library bsearch() function to realize the binary search!
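A sketch of that last suggestion, assuming the numbers have already been rewritten as sorted raw 32-bit ints in a file named data.bin (the name is a placeholder, and the mmap() part is POSIX-specific):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y); /* avoids the overflow that x - y could cause */
}

int main(void) {
    int fd = open("data.bin", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }
    size_t count = (size_t)st.st_size / sizeof(int);

    int *data = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); return 1; }

    int key = 42; /* the value X we are looking for */
    int *hit = bsearch(&key, data, count, sizeof(int), cmp_int);
    if (hit)
        printf("found at index %ld\n", (long)(hit - data));
    else
        printf("not found\n");

    munmap(data, (size_t)st.st_size);
    close(fd);
    return 0;
}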
I want to write the content of an array into a file. The code below is a simplified program of mine. It does create the file "test.txt", but it only writes a bunch of 0 and \ into the file. What am I doing wrong?
#include <stdio.h>
#include <stdlib.h>

int main() {
    int a[10], i;
    for (i = 0; i < 10; i++) a[i] = i;
    FILE* f = NULL;
    f = fopen("test.txt", "ab+");
    fwrite(&a, sizeof(int), sizeof(a), f);
    return 0;
}
These are the contents of the file:
\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00 \00\00\00\90#\00\00\00\00\00P8\00\00\00\00\00\00\00\00 \00\00\00\00\00\00\00\00\00\00\00\B9\C5?\00\00\00\00\00\00 \00\00\008%j\FC\00\00\00\00\00\00\00\00\00\80#\00\00\00\00\00\00\00\00\00\00\00\00\00\AD\E6\FE\8A\F6T\90#\00\00\00\00\000%j\FC\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00
You are failing to distinguish between values and representations. Understanding this difference is a vital skill every programmer needs; it's as important as understanding the distinction between a number and the numerals that represent it.
If you want to write the values of the array to a file, you'll have to convert them into a sensible representation to write to the file. Your code never does that.
Suppose I want to tell you how many cars I own. And it's two cars. I have to convert the number two into some form that I can communicate to you. Since we both speak English, I can say the word "two". But if I just tried to present to you the internal way my brain encodes the number of cars I have, you wouldn't be able to make any sense out of it. That encoding will only make sense to my brain.
Your code tries to write the internal encoding of the array to the file directly. But who knows how to make sense of that internal encoding? It will look like gibberish.
It's like trying to actually write the number three on a piece of paper. You can write a representation of it, like "three" or "3" or "III". But you can't just take the number itself and somehow write it on the paper. That's a category error.
Similarly, if you are happy and want to tell people that you are happy, you have to pick a language and a way to encode that language (spoken or written) and represent the idea that you are happy in that chosen language and form. You can't just output the fact that you are happy without first choosing an appropriate representation and encoding the idea appropriately for transport to others. Again, that's a category error.
Punch "serialization" -- the process of converting internal encodings into sensible streams of bytes with a well-defined format -- into your favorite search engine.
Suppose I'm designing a file manager and, hypothetically, want to implement searching for files by type. Which one of these methods would be more efficient?
use the name of the file and trim the extension of each file, or
use specific bytes that identify the type of file we are searching for. For example, in the case of JPEG images:
bytes 0xFF, 0xD8 indicate the start of the image
bytes 0xFF, 0xD9 indicate the end of the image
Since you have to know a file's name before opening it, the name-trimming option will probably be faster. However, you can get false results with that method if an extension does not match the actual file type.
Doing it that way saves you some system calls (open, read, maybe fseek, close).
Assuming your goal really is "search for a file by its type", without further limitations, you have to do it by checking the actual data.
But you might be OK with some false positives and false negatives. If you search for image files by looking at extensions only, a name like "image.jpg?width=1024&height=800" gives you a false negative for what really is an image, and a file named "image.jpg" that is really an executable gives you a false positive.
You can, on the other hand, check the first couple of bytes of the file; most schemes for image data have a distinctive header. This method has far fewer points of failure. You can get a false positive if you hit a chunk of random data whose first bytes resemble the header of an image file: possible, but highly unlikely. You can get a false negative if the header got stripped (e.g., in transport, somehow, or by a bad script that produced the file): also possible and also unlikely, even more so.
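A minimal sketch of that header check for JPEG, whose files start with the SOI marker 0xFF 0xD8 mentioned in the question (error handling kept deliberately simple):

#include <stdio.h>

/* Returns 1 if the file starts with the JPEG SOI marker, else 0. */
int looks_like_jpeg(const char *path) {
    unsigned char magic[2];
    FILE *f = fopen(path, "rb");
    if (!f) return 0;
    size_t n = fread(magic, 1, 2, f);
    fclose(f);
    return n == 2 && magic[0] == 0xFF && magic[1] == 0xD8;
}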
The small Unix tool file does exactly that, and it once had an easy-to-parse text file you could use for your own project. Nowadays it is a large folder with several individual files that doesn't even get installed; only a precompiled form does. You can find the folder with the text files online, e.g.: http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/saucy/file/saucy/files/head:/magic/Magdir/ The format is described in the manpage magic(5), which is also online, e.g.: https://linux.die.net/man/5/magic
There are two computers holding the same large set of files. How do we find out if there is a slight change in any one of the files on one computer? The network connection between these computers is very slow.
You can use the md5sum utility. For Windows, see this Microsoft article: https://support.microsoft.com/en-us/help/889768/how-to-compute-the-md5-or-sha-1-cryptographic-hash-values-for-a-file For Linux, use md5sum filename, and then compare the hash values.
One idea would be to generate a hash for each file. Hashes convert a file of arbitrary length to a fixed size. You could further hash the hashes together, then send that single hash over and compare. Hashing is used extensively to ensure downloads are not corrupt.
You could hash the files and compare the hashes via the network.
A good hash function is designed so that even a small difference in its input makes the output totally different. Furthermore, most hash functions nowadays have an output length of 160-512 bits. So although you might want to compare two files that are several gigabytes big, you only need to send a small string of at most 512 bits over the network to see whether the hashes match.
If you have millions of files, sending one hash per file may already be too much. A solution would look like this:
1. Hash each file on each computer.
2. Concatenate the hashes and hash the concatenated string again.
3. Compare this output; if it differs, you know that there is a difference somewhere in those files.
To find which file differs (or even where exactly in the file) you can use binary search:
4. Split the millions of files into two parts and apply steps 1-3 to each part (if you have enough space, you can save the hash of each file to speed this up).
5. For each of the two combined hashes that differ, apply steps 4-6 recursively.
6. Once you have located the file that differs, split it up again by number of lines and work as in steps 4-6.
At some point the number of lines will be so small that a hash would be longer than the actual content of the lines; then it is of course more efficient to compare the actual content in the naive way.
Assuming only one file differs, this needs only logarithmically many hashes to be sent over the network and therefore minimizes the network traffic.
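A minimal sketch of the per-file hashing step (step 1), using OpenSSL's SHA-256 as one possible hash function; the choice of library is an assumption, and you would link with -lcrypto:

#include <stdio.h>
#include <openssl/sha.h>

/* Computes the SHA-256 digest of a file; returns 0 on success, -1 on error. */
int hash_file(const char *path, unsigned char out[SHA256_DIGEST_LENGTH]) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;

    SHA256_CTX ctx;
    SHA256_Init(&ctx);

    unsigned char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        SHA256_Update(&ctx, buf, n); /* stream the file through the hash */

    SHA256_Final(out, &ctx);
    fclose(f);
    return 0;
}

Step 2 then amounts to running the concatenated digests through SHA-256 once more.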
I wish to compare a value at a particular location in a binary file (say, the value at index n * i, where i = 0, 1, 2, 3, ... and n is some fixed number, say 10). I want to see if that value is equal to another value, say m. The location of that value in the file is always a multiple of n.
I can think of three methods to do this:
I maintain a temp variable which stores the value of n * i, and I directly use fseek to go to that index and see if the value there is equal to m.
I do an fseek for the value of m in the file.
I search for the value of m at locations 0, n, 2n, 3n, ... and so on using fseek.
I don't know how each of these operations works, but which one of them is the most efficient with respect to space and time?
Edit:
This process is a part of a bigger process, which has many more files and hence time is important.
If there is any other way than using fseek, please do tell.
Any help appreciated. Thanks!
Without any prior knowledge of the values and ordering in the file you are searching, the quickest way is just to look through the file linearly and compare values.
This might be best done using fseek() repeatedly, but repeated fseek-and-read cycles may be slower than just reading big chunks of the file and looking through them in memory, because system calls have a lot of overhead.
However, if you are doing a lot of searches over the same files, you would be better off building an index and/or sorting your records. One way to do this would be to put the data into a relational database with built-in indexes (pretty much any SQL database).
Edit:
Since you know your file is sorted, you can use a binary search.
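A sketch of that binary search over the file directly, assuming the records are raw ints written with fwrite() and sorted ascending (the record type is an assumption; adjust the sizeof accordingly):

#include <stdio.h>

/* Binary search for m in a sorted binary file of raw ints.
   Returns the record index, or -1 if not found. */
long find_in_file(FILE *f, int m) {
    fseek(f, 0, SEEK_END);
    long count = ftell(f) / (long)sizeof(int);
    long lo = 0, hi = count - 1;

    while (lo <= hi) {
        long mid = lo + (hi - lo) / 2;
        int v;
        fseek(f, mid * (long)sizeof(int), SEEK_SET);
        if (fread(&v, sizeof v, 1, f) != 1) return -1;
        if (v == m) return mid;
        if (v < m)  lo = mid + 1;
        else        hi = mid - 1;
    }
    return -1;
}

Each probe costs one fseek() and one small fread(), so only O(log n) seek/read pairs are needed per lookup.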