One of the big deals in Silverlight v4 is audio/video capture... but I haven't found an example yet that does what I want to do. So:
How do you capture audio/video with Silverlight (from a webcam), and then save it as a compressed format (WMV or MP4)? The idea here is to upload it after compression.
I have already looked at this blog post for the capture piece, but I need to find a way to compress the audio/video for upload.
Silverlight does not support video encoding, and it is unlikely that Microsoft, at least, will ever implement it. To transmit video over a network, some people use a "pseudo-MJPEG" codec, compressing individual frames as regular JPEG images. Some have even improved on that idea by dividing frames into fixed blocks (say 8x8) and transmitting only the changed blocks (using a lossy comparison).
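A minimal sketch of the changed-block test, assuming 8-bit grayscale frames whose width and height are multiples of 8. The sum-of-absolute-differences (SAD) measure, the `threshold` value and the function names are illustrative choices of this sketch, not something the approach prescribes; only the flagged blocks would then be JPEG-compressed and sent.

```c
#include <stdint.h>
#include <stdlib.h>

#define BLOCK 8

/* Returns 1 if the 8x8 block at block coordinates (bx, by) differs "enough"
   between prev and cur: the sum of absolute pixel differences (SAD) must
   exceed threshold. Frames are 8-bit grayscale, row-major, `width` px/row. */
int block_changed(const uint8_t *prev, const uint8_t *cur, int width,
                  int bx, int by, unsigned threshold)
{
    unsigned sad = 0;
    for (int y = 0; y < BLOCK; y++) {
        size_t row = (size_t)(by * BLOCK + y) * (size_t)width + (size_t)(bx * BLOCK);
        for (int x = 0; x < BLOCK; x++)
            sad += (unsigned)abs((int)cur[row + x] - (int)prev[row + x]);
    }
    return sad > threshold;
}

/* Sets flags[i] (one per block, row-major) to 1 for changed blocks and
   returns how many changed; only those would be JPEG-encoded and sent. */
int mark_changed_blocks(const uint8_t *prev, const uint8_t *cur,
                        int width, int height, unsigned threshold,
                        uint8_t *flags)
{
    int changed = 0, bw = width / BLOCK, bh = height / BLOCK;
    for (int by = 0; by < bh; by++)
        for (int bx = 0; bx < bw; bx++) {
            int c = block_changed(prev, cur, width, bx, by, threshold);
            flags[by * bw + bx] = (uint8_t)c;
            changed += c;
        }
    return changed;
}
```

A single-pixel change of 10 levels stays under a threshold of 64 and is ignored, which is the "lossy comparison": small sensor noise does not trigger a retransmission.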
If you're a veteran programmer and enjoy coding, here is another, slightly improved version of the "pseudo-MJPEG" idea:
Divide the current frame into fixed 8x8 blocks
Apply an RGB -> YCbCr color space conversion to each block
Downsample the Cb and Cr planes by half
Apply the DCT to the Y, Cb and Cr blocks
Quantize the DCT coefficients with a quantization matrix
Compare these quantized DCT coefficients with the previous frame's block. This way you make a "perceptually lossy" comparison between consecutive frames.
Use a bitwise range coder and encode a flag for each unchanged block
For changed blocks, transmit the DCT coefficients by modeling them (you can use JPEG's standard zig-zag pattern and zero-run model) and encoding them with the range coder.
This is actually more or less the standard JPEG algorithm. Its advantages over plain JPEG are:
Perceptually lossy comparison for blocks
Stronger compression due to both smaller overhead and a stronger entropy coder (range coder)
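For the coefficient modeling step (the JPEG zig-zag pattern and zero-run model), a sketch that turns a quantized block into (zero-run, level) pairs could look like this. The range coder itself is not shown; these pairs are what you would feed it. The `RunLevel` type and the end-of-block convention are assumptions of this sketch, and unlike real JPEG, runs are not capped at 15.

```c
#include <stddef.h>

/* ZIGZAG[k] is the row-major index (row * 8 + col) of the k-th coefficient
   in JPEG's zig-zag scan order. */
static const unsigned char ZIGZAG[64] = {
     0,  1,  8, 16,  9,  2,  3, 10, 17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34, 27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36, 29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46, 53, 60, 61, 54, 47, 55, 62, 63
};

typedef struct { int run; int level; } RunLevel;

/* Splits a quantized block (row-major) into its DC value and (zero-run, level)
   pairs for the 63 AC coefficients in zig-zag order. A pair (0, 0) marks
   end-of-block: all remaining coefficients are zero. Returns the pair count. */
size_t zigzag_runs(const int block[64], int *dc, RunLevel out[64])
{
    size_t n = 0;
    int run = 0;
    *dc = block[0];
    for (int k = 1; k < 64; k++) {
        int v = block[ZIGZAG[k]];
        if (v == 0) {
            run++;                      /* extend the current run of zeros */
        } else {
            out[n].run = run;
            out[n].level = v;
            n++;
            run = 0;
        }
    }
    if (run > 0) {                      /* trailing zeros become one marker */
        out[n].run = 0;
        out[n].level = 0;
        n++;
    }
    return n;
}
```

Since quantization zeroes most high-frequency coefficients, a typical block collapses to a few pairs plus the end-of-block marker.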
Another option is to pay for third-party software (sorry, I don't know of any free software). I found that product. I haven't used it at all, but I believe it could be useful for you.
edit: I've tried to rephrase this to make it as clear as I can :)
I need to find a suitable way (a suitable compression method) to store a blob of data (say approx. 900KB) in a ROM where the available free space is only about 700KB. If I compress the blob with a modern compression tool (e.g. WinZip/WinRAR), I can easily achieve the required compression.
The catch is that decompression will take place on very, VERY limited hardware where I can't afford more than a few bytes of RAM (say no more than 100 bytes, for the sake of argument).
I have already tried RLE'ing the data... it hardly compresses.
While I'm working on changing the data blob format so that it could have more redundancy and compress better, I'm also looking for a compression method that will let me decompress on my limited hardware. I have limited knowledge of compression algorithms, so I'm looking for suggestions/pointers to continue my hunt.
Thanks!
The original question was: "I need info/pointers on decompression algorithms that can work without using the uncompressed data, as this will be unavailable right after decompression. LZ-like approaches would still be preferred."
I'm afraid this is off topic because it is too broad.
LZW uses a sizable state that is not very different from keeping a slice of uncompressed data. Even if the state is constant and read from ROM, handling it with just registers seems difficult. There are many different algorithms that can use a constant state, but if you really have NO RAM, then only the most basic algorithms can be used.
Look up RLE, run length encoding.
EDIT: OK, no sliding window, but if you can access ROM, 100 bytes of RAM gives you quite a few possibilities. You want to implement this in assembly, so stick with very simple algorithms: RLE plus a dictionary. Given your requirements, the choice of algorithm should be based on the type of data you need to decompress.
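As a rough illustration of "RLE plus a dictionary", here is a decoder for a hypothetical token format (not any standard one): 0x00-0x7F copies the next t+1 bytes literally, 0x80-0xBF repeats the next byte (t-0x80+3) times, and 0xC0-0xFF emits dictionary entry t-0xC0 from a table in ROM. It keeps only a handful of bytes of state and hands each output byte to a callback, so it never has to read back what it has already decompressed. `RomDict`, `rle_dict_decode`, `Sink` and `sink_emit` are names made up for this sketch, and it is written in C for readability; the same loop translates directly to assembly.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*emit_fn)(uint8_t byte, void *ctx);

/* Hypothetical ROM dictionary: entry i is data[off[i]] .. data[off[i + 1] - 1]. */
typedef struct {
    const uint8_t *data;
    const uint16_t *off;        /* n_entries + 1 offsets */
    uint8_t n_entries;
} RomDict;

/* Decodes len bytes of compressed input. Returns 0 on success, -1 if malformed. */
int rle_dict_decode(const uint8_t *in, size_t len, const RomDict *d,
                    emit_fn emit, void *ctx)
{
    size_t i = 0;
    while (i < len) {
        uint8_t t = in[i++];
        if (t < 0x80) {                             /* literal run: t + 1 bytes */
            size_t n = (size_t)t + 1;
            if (len - i < n)
                return -1;
            while (n--)
                emit(in[i++], ctx);
        } else if (t < 0xC0) {                      /* repeat next byte t - 0x80 + 3 times */
            if (i >= len)
                return -1;
            uint8_t b = in[i++];
            for (unsigned n = (unsigned)(t - 0x80) + 3; n; n--)
                emit(b, ctx);
        } else {                                    /* dictionary entry t - 0xC0 */
            uint8_t e = (uint8_t)(t - 0xC0);
            if (e >= d->n_entries)
                return -1;
            for (uint16_t k = d->off[e]; k < d->off[e + 1]; k++)
                emit(d->data[k], ctx);
        }
    }
    return 0;
}

/* Example sink for testing on a PC: appends to a caller-provided buffer.
   On the target, emit would write straight to the destination device. */
typedef struct { uint8_t *buf; size_t len; } Sink;
static void sink_emit(uint8_t b, void *ctx) { Sink *s = ctx; s->buf[s->len++] = b; }
```

Which token ranges get how many codes is exactly where "the type of data you need to decompress" comes in: a data set with long runs would reserve more codes for repeats, one with recurring phrases more for dictionary entries.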
I'm working on radio-to-radio communications where bandwidth is really, really precious. It's all done in bare-metal C code (no OS, small Atmel 8-bit microprocessors). So the idea of compression becomes appealing for some large, but rare, transmissions.
I'm no compression expert. I've used the command-line tools to shrink files and looked at how much I get, and I've linked a library or two over the years, but never anything this low level.
In one example, I want to move about 28K over the air between processors. If I just do a simple bzip2 -9 on a representative file, I get about 65% of the original size.
But I'm curious whether I can do better. I am (naively?) under the impression that most basic compression formats must start with some declaration of metadata that describes how to inflate the bitstream that follows. What I don't know is how much space that metadata itself takes up. I histogrammed that same file, and a number of others, and found that due to the nature of what's being transmitted, the histogram is almost always about the same. So I'm curious whether I could hard-code these frequencies in my code, so that they were no longer dynamic but also weren't transmitted as part of the packet.
For example, my understanding of Huffman encoding is that usually there's a "dictionary" up front, followed by a bitstream, and that if a compressor works in blocks, each block will have its own dictionary.
On top of this, it's a small processor with a small footprint; I'd like to keep whatever I do small, simple, and straightforward.
So I guess the basic question is: what basic compression algorithm, if any, would you implement in this kind of environment/scenario, especially taking into account that you can basically precompute a representative histogram of the bytes per transmission?
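For reference, the "hard-coded histogram" idea amounts to a static canonical Huffman code whose code lengths are compiled into the firmware, so only the bitstream goes over the air. A sketch, assuming a tiny hypothetical 4-symbol alphabet with fixed lengths (a real table would have 256 entries derived from the representative histogram); the decoding loop follows the approach of zlib's `puff.c`, and the code assignment follows RFC 1951. All type and function names are made up for this sketch. Whether this actually beats adaptive codes is questioned in the answer that follows.

```c
#include <stddef.h>
#include <stdint.h>

#define MAXBITS 15
#define NSYM 4   /* hypothetical tiny alphabet; 256 for byte data */

/* Code lengths fixed at compile time (hypothetical: symbol 0 most frequent). */
static const uint8_t LENGTHS[NSYM] = {1, 2, 3, 3};

typedef struct {
    uint16_t count[MAXBITS + 1];  /* number of codes of each length */
    uint16_t symbol[NSYM];        /* symbols sorted by (length, value) */
    uint16_t code[NSYM];          /* canonical code per symbol, for the encoder */
} Huff;

/* Builds the canonical code tables; these could also be precomputed into ROM. */
void huff_build(Huff *h)
{
    uint16_t offs[MAXBITS + 2], next[MAXBITS + 1], code = 0;
    for (int len = 0; len <= MAXBITS; len++)
        h->count[len] = 0;
    for (int s = 0; s < NSYM; s++)
        if (LENGTHS[s])
            h->count[LENGTHS[s]]++;
    offs[1] = 0;
    for (int len = 1; len <= MAXBITS; len++)
        offs[len + 1] = (uint16_t)(offs[len] + h->count[len]);
    for (int s = 0; s < NSYM; s++)
        if (LENGTHS[s])
            h->symbol[offs[LENGTHS[s]]++] = (uint16_t)s;
    for (int len = 1; len <= MAXBITS; len++) {      /* RFC 1951, section 3.2.2 */
        code = (uint16_t)((code + h->count[len - 1]) << 1);
        next[len] = code;
    }
    for (int s = 0; s < NSYM; s++)
        if (LENGTHS[s])
            h->code[s] = next[LENGTHS[s]]++;
}

/* MSB-first bit writer/reader over a byte buffer. */
typedef struct { uint8_t *buf; size_t bitpos; } BitW;
typedef struct { const uint8_t *buf; size_t bitpos, nbits; } BitR;

static void put_bit(BitW *w, int bit)
{
    if (w->bitpos % 8 == 0)
        w->buf[w->bitpos / 8] = 0;
    if (bit)
        w->buf[w->bitpos / 8] |= (uint8_t)(0x80u >> (w->bitpos % 8));
    w->bitpos++;
}

static int get_bit(BitR *r)
{
    if (r->bitpos >= r->nbits)
        return -1;
    int bit = (r->buf[r->bitpos / 8] >> (7 - r->bitpos % 8)) & 1;
    r->bitpos++;
    return bit;
}

/* Encodes n symbols into out; returns the number of bits written. */
size_t huff_encode(const Huff *h, const uint8_t *syms, size_t n, uint8_t *out)
{
    BitW w = {out, 0};
    for (size_t i = 0; i < n; i++)
        for (int b = LENGTHS[syms[i]] - 1; b >= 0; b--)
            put_bit(&w, (h->code[syms[i]] >> b) & 1);
    return w.bitpos;
}

/* Decodes one symbol, or returns -1 at end of input. */
int huff_decode(const Huff *h, BitR *r)
{
    int code = 0, first = 0, index = 0;
    for (int len = 1; len <= MAXBITS; len++) {
        int bit = get_bit(r);
        if (bit < 0)
            return -1;
        code |= bit;
        int count = h->count[len];
        if (code - count < first)               /* code falls in this length's range */
            return h->symbol[index + (code - first)];
        index += count;
        first = (first + count) << 1;
        code <<= 1;
    }
    return -1;
}
```

The decoder needs only the `count` and `symbol` arrays (about 290 bytes for a 256-symbol alphabet with 16-bit entries), which can live in flash.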
What you are suggesting, providing preset frequency data, would help very little. More likely it would hurt, since you would take a hit by not using optimal codes. As an example, only about 80 bytes at the start of a deflate block are needed to represent the literal/length and distance Huffman codes. A slight increase in the, say, 18 KB of your compressed data could easily cancel that.
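Putting that answer's own figures side by side (about 80 bytes of code tables per deflate block against roughly 18 KB of compressed output, taking 1 KB as 1024 bytes):

```latex
\frac{80\ \text{bytes}}{18 \times 1024\ \text{bytes}} = \frac{80}{18432} \approx 0.0043 \approx 0.43\%
```

So a preset table only comes out ahead if the fixed codes make the bitstream itself less than roughly 0.4% longer than codes tuned to each message would.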
With zlib, you could use a representative one of your 28K messages as a dictionary in which to search for matching strings. This could help the compression quite a bit, if there are many common strings in your messages. See deflateSetDictionary() and inflateSetDictionary().
This question already has an answer here:
How to generate the audio spectrum using fft in C++? [closed]
How would I go about implementing a spectrum analyser like the ones in WinAmp?
Just by looking at it, I think that these bars are rendered to display the 'volume level' of a specific frequency band of the incoming audio data; however, I'm not sure how to actually calculate this data needed for the rather easy task of drawing the bars.
From what I've been told and understand, calculating these values can be done by using an FFT — however, I'm not exactly sure how to calculate those, given a buffer of input data — am I on the right track about FFTs? How would I apply an FFT on the input data and get, say, an integer out of the FFT that represents the 'volume' of a specific frequency band?
The drawing part isn't a problem, since I can just draw directly to my framebuffer and render that out. I'm doing this as a project on an FPGA, using a Nios II soft-CPU, in case anyone's wondering about potential hardware limitations. Audio data comes in as 24-bit data at 96kHz.
You're probably looking for FFTW.
edit:
To elaborate on your question:
You wrote that "calculating these values can be done by using an FFT" but weren't sure how to do it given a buffer of input data. Yes, you're right; that's exactly how it's done. You take a (necessarily small, due to the time-frequency uncertainty principle) sample segment out of the currently playing audio data and feed it to a (typically) discrete, real-input FFT (one of the best known, most widely used and fastest being the DCT family of DFTs; in fact, there are highly optimized versions of most DCTs in FFTW). Then you take the next sample segment and repeat the process.
The output of the FFT will be the frequency decomposition of the audio signal that was fed in. You then need to decide how to display it (i.e. which function to apply to the outputs of the FFT, common candidates being f(x) = x, f(x) = sqrt(x) and f(x) = log(x)) and also how to present/animate the following readings (e.g. you could average each band over time, or you could have the maximums "fall off" slowly).
rage-edit:
Additional links, since it appears somebody knows how to downvote but not how to use Google:
http://en.wikipedia.org/wiki/FFTW
http://webcache.googleusercontent.com/search?q=cache:m6ou54tn_soJ:www.fftw.org/+&cd=1&hl=en&ct=clnk&gl=it&client=firefox-a
http://web.archive.org/web/20130123131356/http://fftw.org/
It's pretty straightforward: just use one of the many FFT algorithms! Most of them require floating-point calculations, but a Google search brings up integer-only methods. You are spot on, though: FFTs are what you want.
http://en.wikipedia.org/wiki/Fast_Fourier_transform
To understand how to apply an FFT, you should read this page on discrete Fourier transforms, although it's quite heavy on maths:
http://en.wikipedia.org/wiki/Discrete_Fourier_transform
To implement it on your FPGA, I'd have a look at the source code of this project:
http://qt-project.org/doc/qt-4.8/demos-spectrum.html
Here's a previous SO question that gives a summary of how it works (in any language).
How to generate the audio spectrum using fft in C++?
There's an immense amount of information on creating what, by the way, is called a "spectrum analyser"; there must be dozens of complete implementations whose source code is freely available. Just page through a Google search for "Spectrum analyser source code C", for example.
What is the fastest algorithm for compressing RGBA 32 bit image data? I am working in C, but am happy for examples in other programming languages.
Right now I am using LZ4 but I am considering run length / delta encoding.
Lossless encoding, a mix of real-life images and computer-generated/clipart images. The alpha channel always exists, but is usually constant.
I ended up just using LZ4. Nothing else was even close to as fast and LZ4 usually got at least 50% size reduction.
Lossy or lossless?
"Real" images or computer graphics?
Do you actually have an alpha channel?
If you need lossless (or semi-lossless) compression, then converting into YUV and compressing that will probably reduce the size by about half (on top of already having gone down to 2 bytes/pixel); try Huffyuv.
If you have real images, then H264 can do very high compression, and there are optimised libraries and hardware support, so it can be very fast.
If you have computer-graphics-type images with few colours but need to preserve edges, or you actually have an A channel, then run length might be good; try splitting the image into per-colour frames first.
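Huffyuv's own transforms are not reproduced here. As an illustration of a strictly lossless RGB-to-YUV-style conversion, this sketch uses the reversible colour transform (RCT) from JPEG 2000, which maps integers to integers and inverts exactly; decorrelating the channels this way tends to leave smaller values for the entropy coder. Alpha is passed through unchanged, and `YUVA`, `rct_forward` and `rct_inverse` are names invented for the sketch. Chroma subsampling, by contrast, throws information away, so it is not lossless.

```c
#include <stdint.h>

typedef struct { int16_t y, u, v; uint8_t a; } YUVA;

/* floor(x / 4) that is also correct for negative x. */
static int floor_div4(int x) { return x >= 0 ? x / 4 : -((-x + 3) / 4); }

/* JPEG 2000 RCT: Y = floor((R + 2G + B) / 4), U = B - G, V = R - G. */
YUVA rct_forward(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    YUVA p;
    p.y = (int16_t)floor_div4(r + 2 * g + b);   /* 0..255 */
    p.u = (int16_t)(b - g);                     /* -255..255: needs 9 bits */
    p.v = (int16_t)(r - g);
    p.a = a;                                    /* alpha passed through */
    return p;
}

/* Exact inverse: G = Y - floor((U + V) / 4), R = V + G, B = U + G. */
void rct_inverse(YUVA p, uint8_t *r, uint8_t *g, uint8_t *b, uint8_t *a)
{
    int gg = p.y - floor_div4(p.u + p.v);
    *g = (uint8_t)gg;
    *r = (uint8_t)(p.v + gg);
    *b = (uint8_t)(p.u + gg);
    *a = p.a;
}
```

The inverse works because Y equals G + floor((U + V) / 4) exactly, so no rounding error ever accumulates.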
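A minimal sketch of the split-then-RLE suggestion, assuming interleaved 8-bit RGBA input and a simple hypothetical (count, value) RLE format with runs of up to 255. `split_planes` and `rle_encode` are names made up for the sketch. A constant alpha plane shrinks to 2 bytes per 255 pixels, while a noisy photographic plane can double in size, which is why this suits computer-generated images better.

```c
#include <stddef.h>
#include <stdint.h>

/* Splits interleaved RGBA into four planes: planes[c][i] = rgba[4 * i + c]. */
void split_planes(const uint8_t *rgba, size_t npix, uint8_t *planes[4])
{
    for (size_t i = 0; i < npix; i++)
        for (int c = 0; c < 4; c++)
            planes[c][i] = rgba[4 * i + c];
}

/* Byte-wise RLE as (count, value) pairs with count 1..255.
   The worst-case output is 2 * n bytes. Returns the output length. */
size_t rle_encode(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; ) {
        uint8_t v = in[i];
        size_t run = 1;
        while (i + run < n && in[i + run] == v && run < 255)
            run++;
        out[o++] = (uint8_t)run;
        out[o++] = v;
        i += run;
    }
    return o;
}
```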
LZ4 belongs to the LZ77 family and is only a few lines of code, though I've never implemented it myself. But I guess you are right that run-length or delta coding is the fastest, and it is also good for images. There is also the Snappy algorithm. Recently I tried the exdupe utility to compress my virtual machines; it is also incredibly fast: http://www.exdupe.com. exdupe seems to use an rzip-like approach: http://encode.ru/threads/1354-Data-deduplication.
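A minimal sketch of the delta coding mentioned in the question and in this answer, assuming interleaved 8-bit RGBA. It is a reversible filter, not a compressor: each byte is replaced by its difference (mod 256) from the same channel of the previous pixel, so smooth gradients turn into runs of small, repeated values that RLE or LZ4 then compress better. The function names are made up for the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* In-place delta filter. Iterates backwards so each pixel is differenced
   against the still-unmodified previous pixel. */
void delta_encode_rgba(uint8_t *px, size_t npix)
{
    for (size_t i = npix; i-- > 1; )
        for (int c = 0; c < 4; c++)
            px[4 * i + c] = (uint8_t)(px[4 * i + c] - px[4 * (i - 1) + c]);
}

/* Inverse: iterates forwards, adding back the already-restored previous pixel. */
void delta_decode_rgba(uint8_t *px, size_t npix)
{
    for (size_t i = 1; i < npix; i++)
        for (int c = 0; c < 4; c++)
            px[4 * i + c] = (uint8_t)(px[4 * i + c] + px[4 * (i - 1) + c]);
}
```

A constant alpha channel becomes all zeros after the first pixel, which any of the compressors mentioned above handles almost for free.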
Where can I find algorithm details for holistic word recognition? I need to build a simple OCR system in hardware (FPGAs, actually), and the scientific journals seem so abstract.
Are there any open source (open core) codes for holistic word recognition?
Thanks
For an algorithm that is quite suitable for FPGA implementation (embarrassingly parallel) you might look at:
http://en.wikipedia.org/wiki/Cross-correlation
It is fast, and easily implemented.
The only catch is that it recognizes a shape (in your case, some text) in a way that is DEPENDENT on rotation and size/stretch/skew etc. But if that isn't a problem, it can be very fast and is quite robust. You should only watch out for interpretation problems with similar characters (like o and c).
I used it to find default texts on scanned forms, to get my bearings on where the regions of interest are; searching those images (6M pixels) only took around 15 ms with our implementation on a Core2 CPU in a single thread.