I am working on a project in which I have to merge two 8-bit .wav files using C, and I still have no clue how to do it.
I have read about wav files and I want to start by reading one of the files.
There's one thing I didn't understand:
Let's say I have an 8-bit WAV audio file and I am able to read (even though I am still working on that) the data that starts after byte 44. Logically, I will get numbers between 0 and 255.
My question is:
What do those numbers mean?
If I get 255 or 0 what do they mean?
Are they samples from the wave?
Can anyone please explain?
Thanks in advance
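Assuming we're not dealing with file format issues, getting values between 0 and 255 means that the audio samples are in unsigned eight-bit format, as you have put it. Each value is one sample of the waveform, with 128 as the zero (silence) level, 255 the most positive amplitude and 0 the most negative.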
One way of merging data would consist of reading data from files into buffers, arrays a and b and summing them value by value: c[i] = a[i] + b[i]. By doing so, you'd have to take care of the following:
length of the files may not be equal
summing unsigned 8-bit buffers such as yours will almost certainly overflow
This is usually achieved using a for loop. You first get the sizes of the chunks. Your for loop has to be written in such a way that it neither reads past the array boundary, nor ignores what can be read. For preventing overflows you can either:
divide values by two on reading
or
read (convert) into a format which wouldn't overflow, then normalize and convert the merged data back into the original format or whichever format desired.
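The second option above can be sketched as follows. This is a minimal illustration, not a definitive mixer: the function name `mix_u8` is hypothetical, the shorter input is treated as silence once it ends, and the sum is clamped rather than normalized, which is a simpler technique than the normalization mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Mix two unsigned 8-bit PCM buffers into out, which must hold at least
 * max(na, nb) samples. Where one input has ended it contributes silence.
 * Returns the number of samples written. (Hypothetical helper.) */
size_t mix_u8(const uint8_t *a, size_t na,
              const uint8_t *b, size_t nb, uint8_t *out)
{
    size_t n = na > nb ? na : nb;
    for (size_t i = 0; i < n; i++) {
        /* Widen to int and remove the 128 bias so that silence is 0. */
        int sa = i < na ? (int)a[i] - 128 : 0;
        int sb = i < nb ? (int)b[i] - 128 : 0;
        int sum = sa + sb;              /* cannot overflow an int */
        /* Clamp to the signed 8-bit range instead of wrapping around. */
        if (sum > 127)  sum = 127;
        if (sum < -128) sum = -128;
        out[i] = (uint8_t)(sum + 128);  /* restore the unsigned bias */
    }
    return n;
}
```

Clamping keeps quiet passages at full level but distorts loud peaks; halving both inputs before summing avoids clipping at the cost of a quieter result.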
For all particulars of reading from and writing to a .wav format file you may use some of the existing audio file libraries, or write your own routine. Dealing with audio file format is not a trivial thing, though. Here's a reference on .wav format.
Here are a few audio file APIs worth looking at:
libsndfile
sndlib
Hope this can help.
See any good guide to WAVE for information on the format of samples in the data chunk, such as this one I found: http://www.neurophys.wisc.edu/auditory/riff-format.txt
Relevant excerpts:
In a single-channel WAVE file, samples are stored consecutively. For
stereo WAVE files, channel 0 represents the left channel, and channel
1 represents the right channel. The speaker position mapping for more
than two channels is currently undefined. In multiple-channel WAVE
files, samples are interleaved.
Data Format of the Samples
Each sample is contained in an integer i. The size of i is the
smallest number of bytes required to contain the specified sample
size. The least significant byte is stored first. The bits that
represent the sample amplitude are stored in the most significant bits
of i, and the remaining bits are set to zero.
For example, if the sample size (recorded in nBitsPerSample) is 12
bits, then each sample is stored in a two-byte integer. The least
significant four bits of the first (least significant) byte is set to
zero.
The data format and maximum and minimums values for PCM waveform
samples of various sizes are as follows:
Sample Size          Data Format        Maximum Value                 Minimum Value
One to eight bits    Unsigned integer   255 (0xFF)                    0
Nine or more bits    Signed integer i   Largest positive value of i   Most negative value of i
N.B.: Even if the file has >8 bits of audio resolution, you should read the file as an array of unsigned char and reconstitute the larger samples manually as per the above spec. Don't try to do anything like reading the samples directly over an array of native C ints, as their layout and size is platform-dependent and therefore should not be relied upon in any code.
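As a sketch of what reconstituting a sample manually could look like for the common 16-bit case: the helper name `sample16_le` is made up for illustration, and it assumes two bytes per sample stored least significant byte first, as the excerpt describes.

```c
#include <stdint.h>

/* Build one signed 16-bit sample from two bytes stored least significant
 * byte first, without depending on the host's integer layout.
 * (Hypothetical helper.) */
int16_t sample16_le(const unsigned char *p)
{
    long v = (long)p[0] | ((long)p[1] << 8);   /* 0 .. 65535 */
    if (v >= 0x8000L)
        v -= 0x10000L;                         /* map upper half to negatives */
    return (int16_t)v;
}
```

Because the bytes are combined arithmetically, the same code gives the same result on little-endian and big-endian machines.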
Note also that the header is not guaranteed to be 44 bytes long (see the question "How can I detect whether a WAV file has a 44 or 46-byte header?"). You need to read the length and process the header based on that, not on any assumption.
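One way to avoid assuming a header length is to walk the RIFF chunks until the "data" chunk is found. The sketch below works on a file already loaded into memory; the names `le32` and `find_data_chunk` are hypothetical, and it assumes a plain RIFF/WAVE file in which each chunk is a 4-byte ID followed by a 4-byte little-endian size and chunks are padded to an even length.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian value byte by byte. */
static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return the offset of the first sample byte and store the number of
 * sample bytes in *data_len, or return 0 if no "data" chunk is found.
 * (Hypothetical helper.) */
size_t find_data_chunk(const unsigned char *buf, size_t len, size_t *data_len)
{
    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 ||
        memcmp(buf + 8, "WAVE", 4) != 0)
        return 0;

    size_t pos = 12;                        /* first chunk after "WAVE" */
    while (pos + 8 <= len) {
        uint32_t size = le32(buf + pos + 4);
        if (memcmp(buf + pos, "data", 4) == 0) {
            size_t avail = len - (pos + 8);
            *data_len = size < avail ? size : avail;   /* tolerate truncation */
            return pos + 8;
        }
        size_t skip = 8 + (size_t)size + (size & 1u);  /* header + payload + pad */
        if (skip > len - pos)
            break;                          /* chunk runs past end of buffer */
        pos += skip;
    }
    return 0;
}
```

With a 16-byte "fmt " chunk this returns 44, and with an 18-byte one it returns 46, which is where the two header sizes come from.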
This question was migrated from Stack Overflow because it can be answered on Super User.
Migrated 19 days ago.
Whenever we try to open a JPEG or PDF file with a text editor, we find strange symbols outside ASCII. Isn't ASCII the most efficient, since its limited number of possible characters takes less space?
I was working with a database file in Linux with plocate and I found something similar.
Isn't Ascii most efficient because of less space consumption by limited number of possible characters available.
Not at all. Where did you get that idea from?
ASCII characters are 7 bits long, but hardware doesn't support storing 7-bit items, so ASCII is stored in 8 bits, with the first bit always 0. Furthermore, ASCII includes a number of control characters that can cause issues in some situations. Therefore, the most prominent ASCII encoding (base64) uses only 6 bits per character. This means that to encode 3 bytes (3 × 8 = 24 bits) of data you need 4 ASCII characters (4 × 6 = 24 bits). Those 4 ASCII characters are then stored using 4 bytes on disk. Hence, converting a file to ASCII increases disk usage by 33%.
You can test this with the base64 command:
base64 pic.jpg > b64_jpeg.txt
ls -lh pic.jpg b64_jpeg.txt
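The 33% figure follows directly from the 3-bytes-to-4-characters grouping. As a small worked check (the function name `base64_encoded_len` is hypothetical, and it assumes standard padded base64 with no line breaks):

```c
#include <stddef.h>

/* Number of characters padded base64 needs for n input bytes:
 * every group of up to 3 bytes becomes exactly 4 characters.
 * (Hypothetical helper.) */
size_t base64_encoded_len(size_t n)
{
    return 4 * ((n + 2) / 3);
}
```

The file produced by the base64 command will usually be slightly larger than this, because by default it also inserts line breaks into the output.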
Of course, you could try to use an ASCII encoding other than the standard base64 and use all 7 bits available in ASCII. You would still get only 7 bits of data per byte on disk, and thus about a 14% increase in disk usage for the same data.
All modern storage uses 8-bit bytes. ASCII is an obsolete 7-bit standard, so storing arbitrary data as ASCII would take 8/7 as much storage (+14%).
It has nothing to do with the number of bits as such: all files are made of the same two-valued bits (0 or 1). What makes an image or PDF look different from ASCII text is that the bytes are compressed in groups for efficiency. Those strings of symbols may well be ASCII data, but compressed to about 10% of its size.
Take a pdf of a graph as follows
ASCII = 394,132 bytes
ZIP = 88,367 bytes
PDF = 75,753 bytes
DocX = 32,940 bytes its text and lines (there are no images)
Take an image
PNG = 265,490 bytes
ZIP = 265,028 bytes
PDF = 220,152 bytes
PDF as ASCII = 3,250,970 bytes
3 0 obj
<</Length 3120001/Type/XObject/Subtype/Image/Width 640/Height 800/BitsPerComponent 8/SMask 4 0 R/ColorSpace/DeviceRGB/Filter/ASCIIHexDecode>>
stream
9cb6c79cb6c79cb6c79cb6c79db7c89db7c89db7c89fb7c9a0b8caa1b8caa1b8
caa1b8caa2b9cba2b9cba2b9cba2b9cba3bacba3bacaa4bbcba4bbcba6bccca7
...to infinity and beyond
The ASCII image is bigger than all the rest because those repeated values are written out in full, whereas they could be tokenised as 4 x 9cb6c7, 3 x 9db7c8, etc. That's roughly how run-length encoding works, but zip is better than that.
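To make the tokenising idea concrete, here is a minimal run-length encoder for 3-byte RGB pixels. It is only a sketch of the general technique, not the RunLengthDecode format PDF actually uses; the function name `rle_rgb` and the output layout (one count byte followed by one pixel) are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Run-length encode 3-byte pixels. Each run of identical pixels becomes a
 * count byte (1..255) followed by the pixel itself. out must have room for
 * 4 * npixels bytes in the worst case. Returns the number of bytes written.
 * (Hypothetical helper.) */
size_t rle_rgb(const unsigned char *px, size_t npixels, unsigned char *out)
{
    size_t o = 0, i = 0;
    while (i < npixels) {
        size_t run = 1;
        /* Extend the run while the following pixel matches the current one. */
        while (i + run < npixels && run < 255 &&
               memcmp(px + 3 * i, px + 3 * (i + run), 3) == 0)
            run++;
        out[o++] = (unsigned char)run;
        memcpy(out + o, px + 3 * i, 3);
        o += 3;
        i += run;
    }
    return o;
}
```

Applied to the first seven pixels of the stream above (4 x 9cb6c7, then 3 x 9db7c8), the 21 raw bytes shrink to 8.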
So PARTS of a pdf may be compressed (needing slower decompression to view) in a zip style of coding (used for lossless fonts and bitmaps), whilst others may keep their optimal native photographic lossy compression (like jpeg). Overall for PDF parsing a higher percentage needs to be 8 bit ANSI (compatible with uni-coding or variable per platform) or 7bit ASCII for simplistic parsing.
Short answer: compression is the means to reduce transmission time or the amount of storage needed. However, decompression adds overhead, so compressed data is slower to display as graphics than raw ASCII. Avoid exotic wavelets in a PDF, where most objects need fast decompression.
I am currently reading about PNG file format. It turns out that the first byte of the file is specified to be equal to 0x89.
I am wondering what the reasons for that byte's value are.
What I've already learned about the format is that the first byte is used to detect transmission over a 7-bit channel. If the value were 0x80 (1000 0000), it would make sense (if after transmission the first byte is 0, then 7-bit mode was used and the file is corrupted). But what is the purpose of the ones in bit positions zero and three of 0x89 (1000 1001)?
Extract from http://www.libpng.org/pub/png/spec/1.2/PNG-Rationale.html#R.PNG-file-signature
The first two bytes distinguish PNG files on systems that expect the
first two bytes to identify the file type uniquely. The first byte is
chosen as a non-ASCII value to reduce the probability that a text file
may be misrecognized as a PNG file; also, it catches bad file
transfers that clear bit 7
So the remaining low-order bits of the first byte are simply part of the value used for file type identification.
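A short sketch of how a reader might use the signature: `is_png` compares the first eight bytes against the PNG signature, and `looks_7bit_stripped` checks for the pattern that results when bit 7 has been cleared (0x89 becomes 0x09). Both function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* The 8-byte PNG signature: 0x89 'P' 'N' 'G' CR LF 0x1A LF. */
static const unsigned char PNG_SIG[8] =
    {0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};

/* Nonzero if buf starts with an intact PNG signature. */
int is_png(const unsigned char *buf, size_t len)
{
    return len >= 8 && memcmp(buf, PNG_SIG, 8) == 0;
}

/* Nonzero if buf looks like a PNG whose first byte lost bit 7 in transfer. */
int looks_7bit_stripped(const unsigned char *buf, size_t len)
{
    return len >= 4 && buf[0] == (0x89 & 0x7F) &&
           memcmp(buf + 1, "PNG", 3) == 0;
}
```

Since the cleared form 0x09 is still distinct from 0x00, a value other than 0x80 does not weaken the 7-bit check.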
I've read through numerous articles on GIF LZW decompression, but I'm still confused as to how it works and how to handle the more fiddly parts in code.
As I understand it, when I get to the byte stream in the GIF for the LZW compressed data, the stream tells me:
Minimum code size, AKA number of bits the first byte starts off with.
Now, as I understand it, I have to either add one to this for the clear code, or add two for clear code and EOI code. But I'm confused as to which of these it is?
So say I have 3 colour codes (01, 10, 11), with EOI code assumed (as 00) will the byte that follows the minimum code size (of 2) be 2 bits, or will it be 3 bits factoring in the clear code? Or is the clear code/EOI code both already factored into the minimum size?
The second question is, what is the easiest way to read in dynamically sized groups of bits from a file? Reading an odd number of bits (3 bits, 12 bits, etc.) from 8-bit bytes sounds like it could be messy and buggy.
To start with your second question: yes you have to read the dynamically sized bits from an 8bit bytestream. You have to keep track of the size you are reading, and the number of unused bits left from previous read operations (used for correctly putting the 'next byte' from the file).
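The bookkeeping described above can be sketched with a small bit accumulator. The `BitReader` struct and `read_code` function are hypothetical names; the sketch assumes GIF's packing order (codes are stored starting at the least significant bit of each byte) and that the length-prefixed sub-blocks of the image data have already been concatenated into one buffer.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const unsigned char *data;  /* concatenated LZW data bytes */
    size_t len;                 /* number of bytes in data */
    size_t pos;                 /* index of the next unread byte */
    uint32_t acc;               /* bits read but not yet consumed */
    int nbits;                  /* how many bits in acc are valid */
} BitReader;

/* Return the next code of `width` bits (1..16), or -1 when the data runs
 * out. Leftover bits stay in acc for the next call. */
int read_code(BitReader *br, int width)
{
    while (br->nbits < width) {
        if (br->pos >= br->len)
            return -1;
        /* Place the new byte above the bits that are still unused. */
        br->acc |= (uint32_t)br->data[br->pos++] << br->nbits;
        br->nbits += 8;
    }
    int code = (int)(br->acc & ((1u << width) - 1u));
    br->acc >>= width;
    br->nbits -= width;
    return code;
}
```

The caller passes the current code width on every call, so growing the width as the code table fills up needs no change to the reader itself.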
IIRC, for images with a minimum code size of 8 bits you get a clear code of 256 (base 10) and an End Of Input code of 257. The first stored code is then 258. In general the clear code is 2^(minimum code size), the End Of Input code is the clear code plus one, and reading starts at a code width of minimum code size + 1 bits, so the two special codes are not counted in the minimum code size itself.
I am not sure why you did not look up the source of one of the public domain graphics libraries. I know I did not because back in 1989 (!) there were no libraries to use and no internet with complete descriptions. I had to implement a decoder from an example executable (for MS-DOS from Compuserve) that could display images and a few GIF files, so I know that can be done (but it is not the most efficient way of spending your time).
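In the GIF specification the special codes are derived from the minimum code size stored in the file rather than fixed. A small sketch of that derivation, with the struct `LzwParams` and function `lzw_params` being hypothetical names:

```c
/* Starting values for a GIF LZW decoder, derived from the minimum code
 * size byte that precedes the image data. (Hypothetical helper.) */
typedef struct {
    int clear_code;     /* resets the code table */
    int eoi_code;       /* end of information */
    int first_free;     /* first code available for new table entries */
    int initial_width;  /* bits per code at the start of the stream */
} LzwParams;

LzwParams lzw_params(int min_code_size)
{
    LzwParams p;
    p.clear_code    = 1 << min_code_size;
    p.eoi_code      = p.clear_code + 1;
    p.first_free    = p.clear_code + 2;
    p.initial_width = min_code_size + 1;
    return p;
}
```

For a minimum code size of 8 this gives 256, 257 and 258 with 9-bit codes; for a minimum code size of 2 it gives 4, 5 and 6 with 3-bit codes.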
I have a situation where there is a corrupt WAV file from which I'm trying to recover data.
My colleagues have sliced up the large WAV file into smaller WAV files with proper headers. This has produced some interesting results.
Sliced into 1MB segments we get these results:
The first wave file segment is all noise.
The second wave file segment is distorted.
The third wave file segment is clear.
This pattern is repeated for the entire length of the file (after it's been broken into smaller files).
For 20MB slices:
The first wave file segment is all noise.
The second wave file segment is clear.
The third wave file segment is distorted.
Again, this pattern is repeated for the entire length of the file (after it's been broken into smaller files).
Would anyone know why this is occurring?
Assuming the WAV contains uncompressed (raw) samples, recovery should be easy. You need to know the sample format. For example: 16 bits, two channels, 44100 Hz (which is cd quality). Because one of the segments is okay, then you can look at this to figure out what the right values are.
Then just open the WAV using these values in, e.g., Adobe Audition (formerly Cool Edit), or any other wave editor that supports import of raw data.
Edit: Okay, now to answer your question. Some segments are clear, because then the alignment is right. Take the cd quality again, as I described before. The bytes of one sample look like this:
left_channel_high | left_channel_low | right_channel_high | right_channel_low
(I'm not sure about the ordering here! But it's just an example.) So the first data byte had better be the most significant byte of the left channel, or else you'll end up with fragments of two samples being interpreted as one whole sample:
left_channel_low | right_channel_high | right_channel_low || left_channel_high
-------------------part of first sample------------------ || --second sample--
You can see that everything "shifted" here, which happens because the size of your file slices is not a multiple of the sample size in bytes.
If you're lucky, this just causes the channels to be swapped. If you're unlucky, high and low bytes get swapped. Interestingly, this does lead to kind-of recognizable, but severely distorted audio.
What puzzles me is that the pattern you report repeats in blocks of three. From the above, I'd expect either two or four. Perhaps you are using an unusual sample format, such as 24-bit (3 bytes)?
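The alignment argument can be checked with a little arithmetic: the byte position within a sample frame at which slice k begins is (k × slice size) mod frame size. The sketch below uses a hypothetical function name and assumes the slices are cut back to back starting at the first sample byte, and that "1MB" means 1,048,576 bytes.

```c
#include <stddef.h>

/* Byte offset within a sample frame at which slice number slice_index
 * starts; 0 means the slice is aligned. (Hypothetical helper.) */
size_t slice_misalignment(size_t slice_index, size_t slice_bytes,
                          size_t frame_bytes)
{
    /* Reduce both factors first so the product cannot overflow. */
    return ((slice_index % frame_bytes) * (slice_bytes % frame_bytes))
           % frame_bytes;
}
```

Under those assumptions, a 3-byte frame gives offsets 0, 1, 2, 0, ... for 1MB slices and 0, 2, 1, 0, ... for 20MB slices, a period of three in both cases, whereas a 4-byte frame would leave every slice aligned.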
I'm trying to read a 16-bit greyscale TIFF file (BitsPerSample=16) using a small C program to convert into an array of floating point numbers for further analysis. The pixel data are, according to the header information, in a single strip of 2048x2048 pixels. Encoding is little-endian.
With that header information, I was expecting to be able to read a single block of 2048x2048x2 bytes and interpret it as 2048x2048 2-byte integers. What I in fact get is a picture split into four quadrants of 1024x1024 pixels each, the lower two of which contain only zeros. Each of the top two quadrants looks like I expected the whole picture to look (image: http://users.aber.ac.uk/ruw/unlinked/15_inRT_0p457.png).
If I read the same file into Gimp or Imagemagick, both tell me that they have to reduce to 8-bit (which doesn't help me - I need the full range), but the pixels turn up in the right places (image: http://users.aber.ac.uk/ruw/unlinked/15_inRT_0p457_gimp.png).
This would suggest that my idea about how the data are arranged within the one strip is wrong. On the other hand, the file must be correctly formatted in terms of the header information as otherwise Gimp wouldn't get it right. Where am I going wrong?
Output from tiffdump:
15_inRT_0p457.tiff:
Magic: 0x4949 Version: 0x2a
Directory 0: offset 8 (0x8) next 0 (0)
ImageWidth (256) LONG (4) 1<2048>
ImageLength (257) LONG (4) 1<2048>
BitsPerSample (258) SHORT (3) 1<16>
Compression (259) SHORT (3) 1<1>
Photometric (262) SHORT (3) 1<1>
StripOffsets (273) LONG (4) 1<4096>
Orientation (274) SHORT (3) 1<1>
RowsPerStrip (278) LONG (4) 1<2048>
StripByteCounts (279) LONG (4) 1<8388608>
XResolution (282) RATIONAL (5) 1<126.582>
YResolution (283) RATIONAL (5) 1<126.582>
ResolutionUnit (296) SHORT (3) 1<3>
34710 (0x8796) LONG (4) 1<0>
(Tag 34710 is camera information; to make sure this doesn't somehow make any difference, I've zeroed the whole range from the end of the image file directory to the start of data at 0x1000, and that in fact doesn't make any difference.)
I've found the problem - it was in my C program...
I had allocated memory for an array of longs and used fread() to read in the data:
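/* Problematic version: 2-byte pixels are read into an array of longs. */
#define PPR 2048   /* pixels per row; a #define takes no trailing semicolon */
#define BPP 2      /* bytes per pixel */
long *pix;
pix = malloc(PPR * PPR * sizeof(long));
fread(pix, BPP, PPR * PPR, in);   /* fills memory in 2-byte units, not one long per pixel */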
Since the data come in 2-byte chunks (BPP=2) but sizeof(long)=4 on my platform, fread() packs the data densely into the allocated memory rather than placing each pixel in its own long-sized slot. Thus I ended up with two rows packed together into one and the second half of the picture empty.
I've changed it to loop over the number of pixels and read two bytes each time and store them in the allocated memory instead:
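int b1, b2;   /* int rather than char, so EOF can be detected */
for (m = 0; m < PPR * PPR; m++) {
    b1 = fgetc(in);   /* low byte comes first: the file is little-endian (II) */
    b2 = fgetc(in);   /* high byte */
    if (b1 == EOF || b2 == EOF) break;
    pix[m] = b1 + 256L * b2;
}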
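Since the stated goal was an array of floating point numbers, another option is to read the whole strip into a byte buffer and convert it in one pass. The sketch below assumes uncompressed, little-endian, 16-bit unsigned pixels, as in the tiffdump output above; the function name `u16le_to_float` is hypothetical.

```c
#include <stddef.h>

/* Convert npixels little-endian 16-bit unsigned values in raw (2 bytes
 * each) to floats, independent of the host's byte order and integer sizes.
 * (Hypothetical helper.) */
void u16le_to_float(const unsigned char *raw, size_t npixels, float *out)
{
    for (size_t i = 0; i < npixels; i++) {
        unsigned v = (unsigned)raw[2 * i] |
                     ((unsigned)raw[2 * i + 1] << 8);
        out[i] = (float)v;   /* 0 .. 65535 is exactly representable */
    }
}
```

The byte buffer can be filled with a single fread() of StripByteCounts bytes after seeking to the StripOffsets value, which avoids one fgetc() call per byte.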
You understand that if StripOffsets is an array, it is an offset to an array of offsets, right? You might not be doing that dereference properly.
What's your platform? What are you trying to do? If you're willing to work in .NET on Windows, my company sells an image processing toolkit that includes a TIFF codec that works on pretty much anything you can throw at it and will return 16 bpp images. We also have many tools that operate natively on 16bpp images.