How to loop an mp3 in a browser with zero gap? - silverlight

I'm trying to make a guitar practice website, and a critical piece of functionality is looping over very short mp3 files (a few seconds long) with absolutely zero gap in between. For example, it could be a four-measure chord progression, and I want to allow the user to loop over it seamlessly.
I tried using the HTML5 <audio> tag with the loop attribute. Google Chrome leaves a small gap between loops, but it's still far too noticeable for my purpose. I haven't tested the other browsers, but I suspect they won't fare any better.
A possible workaround is to use ffmpeg to stream repetitions of the same audio as a single mp3. However, this costs a lot of bandwidth.
For myself I use Audacity to loop without gaps, but unfortunately Audacity doesn't have a web version.
So, do you have any ideas how I may loop over an mp3 in a browser with zero gap? I prefer non-Flash solutions, but if nothing else works I'll use Flash.
Edit:
Thank you for all your suggestions. Flash turns out to work decently. I've made a toy demo at http://vmlucid.lcm.hk/~netvope/audio/flash.html. To my surprise (I used to associate Flash only with resource hogs and browser crashes), Flash and ActionScript are rather well designed and easy to use. It took me only 3 hours to finish my first Flash project :)

Have a look at this page. After listening for a while in Google Chrome 7, I found that Method 1 works decently, while Method 3 gives the best results, though it's a bit of a hack. Ultimately, all browsers behave differently, especially since HTML5 isn't finalized yet. If possible, opt for a Flash version, which I'd expect to give you the cleanest loop.

In Flash AS3 you can extract raw sample data with Sound.extract() and hand it to a dynamically created Sound object exactly when it's needed (each time its SampleDataEvent fires), which allows sample-accurate, gapless looping.

I am not sure how well this will work, but if you knew your loop lasted 800 milliseconds, you could have the browser call the play method every 800 ms... it still wouldn't be guaranteed to be perfect, though. I don't think the browser is natively capable of reliable audio looping at this point.
setInterval(function() {
    document.getElementById("loop").play();
}, 800);
Rumor has it that the best way to pull this off in the most gapless fashion is to use multiple audio tags and alternate between them.
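Since the question is tagged silverlight, here's a rough sketch of that alternation idea using two Silverlight MediaElements (all names are mine, not from the original posts; two HTML5 <audio> elements can be swapped the same way). This narrows the gap by avoiding a rewind of a single element, but it doesn't guarantee eliminating it:

MediaElement[] players;   // two elements, both preloaded with the same mp3
int current = 0;

void StartLoop()
{
    players[0].MediaEnded += OnMediaEnded;
    players[1].MediaEnded += OnMediaEnded;
    players[current].Play();
}

void OnMediaEnded(object sender, RoutedEventArgs e)
{
    ((MediaElement)sender).Position = TimeSpan.Zero; // rewind for its next turn
    current = 1 - current;                           // flip to the idle element
    players[current].Play();                         // hand off immediately
}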

Or check out this utility: http://www.compuphase.com/mp3/mp3loops.htm. I used it successfully in my Flash projects when music had to be looped without gaps, and 99% of the time it worked. It takes WAV as input.
Basically it's a kind of front-end for the LAME mp3 encoder that picks settings which prevent gaps from appearing. It won't work on very short sound effects (less than 0.5 seconds, I believe).
Afterward all you have to do is use:
var sound:Sound = new MySoundEffect();
sound.play(0, 1000);
and it will loop one thousand times.

Related

About attempting to sync audio and video

I've got a little side project going using SDL2/SDL_mixer and a couple of other sound libraries. I've been trying for a while to synchronize my audio and video, but haven't gotten anywhere near success. I'm new to all of this, so forgive the poor man's logic and coding. At first I tried calling SDL_Delay(30) after every frame, and then a few other numbers in that range. Not quite right. Then I tried doing it with ticks: I'd take the difference between current_ticks and last_ticks, and if the delta was <= 30, delay for 30 - delta. Still not quite right (by far). I'm hoping someone with more experience can point me in the right direction. As for the video, it's a visualizer, of course; it seems like a popular beginner's project.
The basic way you synchronize audio and video is to choose one as the timer source and present the other according to that timer. The easiest choice is generally audio, but because audio is typically buffered ahead, you need some way of measuring what time in the audio stream is actually coming out of the speakers. Once you have that, it's just a matter of waiting until the audio reaches the right time for the next video frame and then displaying it.
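To make that concrete, here's a minimal C# sketch of the structure (SDL2/SDL_mixer are C APIs, so treat this as pseudocode; running, AudioClockSeconds and RenderNextFrame are hypothetical placeholders):

const double Fps = 30.0;          // target video frame rate
double nextFrameTime = 0.0;       // timestamp of the next video frame

while (running)
{
    // Hypothetical: current audible position in the audio stream.
    // With SDL_mixer you'd estimate it as (samples handed to the
    // device / sample rate) minus the output buffer's latency.
    double audioTime = AudioClockSeconds();

    if (audioTime >= nextFrameTime)
    {
        RenderNextFrame();            // hypothetical: draw one frame
        nextFrameTime += 1.0 / Fps;
    }
    else
    {
        System.Threading.Thread.Sleep(1);  // yield instead of guessing
    }                                      // fixed 30 ms delays
}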

Robotic Navigation using Kinect

Until now I have been able to create an application where the Kinect sensor stays in one place. I have used speech recognition, EmguCV (OpenCV), and AForge.NET to help me process images and learn and recognize objects. It all works fine, but there is always room for improvement, so here are my problems: [Ignore the first three; I want the answer to the fourth.]
The frame rate is horrible. It's like 5 fps even though it should be around 30 fps. (This is WITHOUT any of the processing.) My application runs fine; it gets color as well as depth frames from the camera and displays them. Still, the frame rate is bad. The samples run great, around 25 fps, yet even when I run the exact same code from the samples, mine won't budge. :-( [There is no need for code; please tell me the possible problems.]
I would like to create a little robot on which the Kinect and my laptop will be mounted. I tried using the Mindstorms kit, but the low-torque motors don't do the trick. Please tell me how I can achieve this.
How do I supply power on board? I know the Kinect's motor uses 12 volts, but it gets that from an AC adapter. [I would not like to cut my cable and replace it with a 12-volt battery.]
The biggest question: how in the world will it navigate? I have implemented A* and flood-fill algorithms. I've read this paper about a thousand times and got nothing from it. I have the navigation algorithm in mind, but how on earth will the robot localize itself? [It should not use GPS or any other sensors, just its eyes, i.e. the Kinect.]
Any help would be awesome. I'm a newbie, so please don't expect me to know everything. I've been searching the internet for two weeks with no luck.
Thanks a lot!
Localisation is a tricky task, as it depends on having prior knowledge of the environment in which your robot will be placed (i.e. a map of your house). While algorithms exist for simultaneous localisation and mapping, they tend to be domain-specific and as such not applicable to the general case of placing a robot in an arbitrary location and having it map its environment autonomously.
However, if your robot does have a rough (probabilistic) idea of what its environment looks like, Monte Carlo localisation is a good choice. On a high level, it goes something like:
Firstly, the robot should make a large number of random guesses (called particles) as to where it could possibly be within its known environment.
With each update from the sensor (i.e. after the robot has moved a short distance), it adjusts the probability that each of its random guesses is correct using a statistical model of its current sensor data. This works especially well if the robot takes 360° sensor measurements, but it is not strictly necessary.
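A toy C# sketch of those two steps on a 1-D corridor (every name here is made up for illustration; real Kinect localisation would be 2-D with a much richer sensor model):

Random rng = new Random();
int n = 1000;
double mapLength = 10.0;                       // known map: a 10 m corridor

// Step 1: scatter particles uniformly over the known map.
double[] particles = new double[n];
double[] weights = new double[n];
for (int i = 0; i < n; i++)
{
    particles[i] = rng.NextDouble() * mapLength;
    weights[i] = 1.0 / n;
}

// Step 2 (each sensor update): move every particle by the commanded
// motion plus noise, then reweight it by how well the reading it
// predicts matches the actual sensor reading.
double motion = 0.5;                           // robot drove ~0.5 m
double actual = ReadDepthSensor();             // hypothetical
for (int i = 0; i < n; i++)
{
    particles[i] += motion + 0.05 * (rng.NextDouble() - 0.5);
    double err = PredictReading(particles[i]) - actual;  // hypothetical
    weights[i] *= Math.Exp(-err * err / (2 * 0.2 * 0.2)); // Gaussian model
}
// Then resample in proportion to weight, so plausible guesses
// multiply and implausible ones die out.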
This lecture by Andrew Davison at Imperial College London gives a good overview of the mathematics involved. (The rest of the course will most likely be very interesting to you as well, given what you are trying to create). Good luck!

Sound of a rolling ball

I'm looking for the most realistic way of playing the sound of a rolling ball. Currently I'm using a WAV sample that I play over and over as long as the ball is moving, which just doesn't feel right.
I've been thinking about completely synthesizing the sound, which I know very little about (almost nothing). I'd be grateful for any tutorials/research materials/samples concerning the synthesis of the sound of a ball of one material rolling on a surface of another material. If this idea is completely wrong, please suggest another way of doing it.
Thanks!
I would guess that you'll get the biggest bang for your buck by doing a dynamic frequency adjustment on the sound that makes the playback frequency proportional to the velocity of the ball. I don't know what type of sound library you use, but most will support some variant of this.
For example, in FMOD you could use the Channel::setFrequency method. Ideally, you would compute your desired playback frequency based on your WAV's original sample frequency (Fo), the ball's current velocity (Vc), and the ball's 'ideal' velocity at which the default WAV sounds right (Vi). Something generally like:
F = Fo * ( Vc / Vi )
This will tend to break down as the ball gets farther away from the 'ideal' velocity. You might want to have several different WAVs that are appropriate for different speed ranges that you switch to at certain threshold velocities. Within each WAV's bracket, you'd do the same kind of frequency adjustment.
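In C#, that update might look something like this (FMOD's official C# wrapper mirrors the C++ API, but treat the exact call as an assumption and check your own library's docs):

const float Fo = 44100f;   // sample frequency the WAV was recorded at (Hz)
const float Vi = 2.0f;     // velocity at which the raw WAV sounds right (m/s)

void UpdateRollingSound(FMOD.Channel channel, float ballVelocity)
{
    float f = Fo * (ballVelocity / Vi);   // F = Fo * (Vc / Vi)
    channel.setFrequency(f);              // retune playback to match speed
}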
Another note: this is probably not something that is worth doing every frame. I'd guess that doing this more than 20 times per second would be a waste of time.
ADDENDUM: Playback frequency scaling like this can also be used to simulate the Doppler effect. Once you have your adjusted playback frequency, you'd scale it again based on the velocity of the ball relative to the 'listener' (the camera).
Have you tried playing the sound forward, then playing it backward, and looping that? I use this trick graphically to create repeating patterns. I don't know much about sound, but it might work.
One approach might be to analyze the sound of a rolling ball, and decompose it into its component waveforms. Then you'd be able to generate your own wav file with synthesized waves.
You should be able to do this using an FFT on a sample of the sound.
One drawback is that the sound will likely sound synthesized - you'll have to add noise and such to make it sound more realistic. Getting it to sound real enough may be the hardest part.
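As a toy C# sketch of the resynthesis step (the partial frequencies and amplitudes below are invented; you'd read them off the peaks of your own FFT):

double[] freqs = { 220.0, 447.0, 913.0 };   // Hz, taken from your FFT peaks
double[] amps  = { 0.6,   0.25,  0.1  };    // relative amplitudes
int sampleRate = 44100;
Random rng = new Random();

short[] samples = new short[sampleRate];    // one second of audio
for (int i = 0; i < samples.Length; i++)
{
    double t = (double)i / sampleRate;
    double v = 0.0;
    for (int j = 0; j < freqs.Length; j++)
        v += amps[j] * Math.Sin(2 * Math.PI * freqs[j] * t);
    v += 0.05 * (2 * rng.NextDouble() - 1);  // a little noise, for realism
    samples[i] = (short)(v * 0.5 * short.MaxValue);
}
// Write 'samples' out as a 16-bit mono WAV and compare to the original.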
I don't think it's worth the trouble of synthesizing it; making it sound even remotely convincing would be very hard.
Depending on your scene, you could loop the sound forward/backward and simulate a Doppler effect by applying a low-pass filter and/or changing its pitch.
By the way, freesound.org is a great place for free samples. They are not professionally recorded, but they're a good starting point for manipulation. On the other hand, Sound Ideas has some great sample CDs (they're actually industry standard) if you can find them on the cheap. You just have to search for which one has rolling-ball sounds.
I really like the approach described in this SIGGRAPH paper:
http://www.cs.ubc.ca/~kvdoel/publications/foleyautomatic.pdf
It describes synthesizing the sound of a rock rolling in a wok (no, really :). The idea is to use modal synthesis (i.e. convolved impulse responses), and the results can be very convincing.
Here's a link to the video demo that goes with the paper:
http://www.cs.ubc.ca/~kvdoel/publications/foleyautomatic.mpeg
And here's a link to the JASS library (written by one of the authors), which was used to create the sound for the video:
http://www.cs.ubc.ca/~kvdoel/jass/jass.html
I'm not sure if you could make it run on a smart phone, but with an efficient enough convolution routine/approximation you might be able to do something interesting...
My question is 'why?' Do you see some benefit in this, or is it just for fun? Your question implies that you aren't happy with the WAV you're using, but I strongly believe that synthesizing your own will sound far inferior.
If your WAV sample doesn't sound right, I'd suggest trying to find another sample. Synthesizing a sound is not easy and is never going to sound as realistic as a good sample.
Real-time synthesis may also require considerable processing and computation; you may very well end up prerendering your synthesized sound into a WAV file and just playing that back.
If you want to simulate the sound of different materials, you can apply some DSP, or even simple tricks like slowing down or speeding up the WAV playback. The simplest way is to prerender these variants in another application and store one copy of the file for each use.

Streaming a webcam from Silverlight 4 (Beta)

The new webcam stuff in Silverlight 4 is darned cool. By exposing it as a brush, it allows scenarios that are way beyond anything that Flash has.
At the same time, accessing the webcam locally seems like only half the story. Nobody buys a webcam so they can take pictures of themselves and make funny faces out of them. They buy a webcam because they want other people to see the resulting video stream, i.e., they want to stream that video out to the Internet, à la Skype or any of the dozens of other video chat sites/applications. And so far, I haven't figured out how to do that with Silverlight.
It turns out that it's pretty simple to get a hold of the raw (Format32bppArgb formatted) bytestream, as demonstrated here.
But unless we want to transmit that raw bytestream to a server (which would chew up way too much bandwidth), we need to encode it in some fashion. And that's more complicated. MS has implemented several codecs in Silverlight, but as far as I can tell, they're all focused on decoding a video stream, not encoding it. And that's apart from the fact that I can't figure out how to get direct access to, say, the H.264 codec in the first place.
There are a ton of open-source codecs (for instance, in the ffmpeg project here), but they're all written in C, and they don't look easy to port to C#. Unless translating 10000+ lines of code that look like this is your idea of fun :-)
const int b_xy= h->mb2b_xy[left_xy[i]] + 3;
const int b8_xy= h->mb2b8_xy[left_xy[i]] + 1;
*(uint32_t*)h->mv_cache[list][cache_idx ]= *(uint32_t*)s->current_picture.motion_val[list][b_xy + h->b_stride*left_block[0+i*2]];
*(uint32_t*)h->mv_cache[list][cache_idx+8]= *(uint32_t*)s->current_picture.motion_val[list][b_xy + h->b_stride*left_block[1+i*2]];
h->ref_cache[list][cache_idx ]= s->current_picture.ref_index[list][b8_xy + h->b8_stride*(left_block[0+i*2]>>1)];
h->ref_cache[list][cache_idx+8]= s->current_picture.ref_index[list][b8_xy + h->b8_stride*(left_block[1+i*2]>>1)];
The mooncodecs folder within the Mono project (here) has several audio codecs in C# (ADPCM and Ogg Vorbis), and one video codec (Dirac), but they all seem to implement just the decode portion of their respective formats, as do the java implementations from which they were ported.
I found a C# codec for Ogg Theora (csTheora, http://www.wreckedgames.com/forum/index.php?topic=1053.0), but again, it's decode only, as is the jheora codec on which it's based.
Of course, it would presumably be easier to port a codec from Java than from C or C++, but the only Java video codecs I found were decode-only (such as jheora or jirac).
So I'm kinda back at square one. It looks like our options for hooking up a webcam (or microphone) through Silverlight to the Internet are:
(1) Wait for Microsoft to provide some guidance on this;
(2) Spend the brain cycles porting one of the C or C++ codecs over to Silverlight-compatible C#;
(3) Send the raw, uncompressed bytestream up to a server (or perhaps compressed slightly with something like zlib), and then encode it server-side; or
(4) Wait for someone smarter than me to figure this out and provide a solution.
Does anybody else have any better guidance? Have I missed something that's just blindingly obvious to everyone else? (For instance, does Silverlight 4 somewhere have some classes I've missed that take care of this?)
I just received this response from Jason Clary on my blog:
Saw your post on Mike Taulty's blog about VideoSink/AudioSink in Silverlight 4 beta.
I thought I'd point out that VideoSink's OnSample gives you a single uncompressed 32bpp ARGB frame, which can be copied straight into a WriteableBitmap.
With that in hand, grab FJCore, a JPEG codec in C#, and modify it to not output the JFIF header. Then just write the frames out one after another and you've got yourself a Motion JPEG codec. RFC 2435 explains how to stuff that into RTP packets for RTSP streaming.
Compressing PCM audio to ADPCM is fairly easy as well, but I haven't found a ready-made implementation yet. RFC 3551 explains how to put either PCM or ADPCM into RTP packets.
It should also be reasonably easy to stuff MJPEG and PCM or ADPCM into an AVI file. MS has some decent docs on AVI's modified RIFF format, and both MJPEG and ADPCM are widely supported codecs.
It's a start anyway.
Of course, once you've gone through all that trouble, the next Beta will probably come out with native support for compressing and streaming to WMS with the much better WMV codecs.
Thought I'd post it. It's the best suggestion I've seen so far.
I thought I'd let interested folks know the approach I actually took. I'm using CSpeex to encode the voice, but I wrote my own block-based video codec to encode the video. It divides each frame up into 16x16 blocks, determines which blocks have sufficiently changed to warrant transmitting, and then Jpeg-encodes the changed blocks using a heavily modified version of FJCore. (FJCore is generally well done, but it needed to be modified to not write the JFIF headers, and to speed up initialization of the various objects.) All of this is being passed up to a proprietary media server using a proprietary protocol roughly based on RTP.
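I can't share the codec itself, but the change-detection idea is simple enough to sketch (hypothetical names; frames are the 32bpp ARGB buffers Silverlight's VideoSink delivers, so stride is width * 4 bytes):

const int BlockSize = 16;

// Returns true if block (bx, by) differs enough from the previously
// transmitted frame to be worth re-encoding and sending.
bool BlockChanged(byte[] prevFrame, byte[] currFrame, int stride,
                  int bx, int by, int threshold)
{
    int diff = 0;
    for (int y = by * BlockSize; y < (by + 1) * BlockSize; y++)
    {
        int rowStart = y * stride + bx * BlockSize * 4;
        for (int i = rowStart; i < rowStart + BlockSize * 4; i++)
            diff += Math.Abs(prevFrame[i] - currFrame[i]);
    }
    return diff > threshold;   // below the threshold = treat as unchanged
}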
With one stream up and four streams down at 144x176, I'm currently getting 5 frames per second, using a total of 474 Kbps (~82 Kbps / video stream + 32 Kbps / audio), and chewing up about 30% CPU on my dev box. The quality's not great, but it's acceptable for most video chat applications.
Since I posted my original question, there have been several attempts to implement a solution. Probably the best is at the SocketCoder website here (and here).
However, because the SocketCoder motion-JPEG-style video codec encodes and transmits the entirety of every frame rather than just the blocks that have changed, my assumption is that its CPU and bandwidth requirements will be prohibitive for most applications.
Unfortunately, my own solution is going to have to remain proprietary for the foreseeable future :-(.
Edit 7/3/10: I just got permissions to share my modifications to the FJCore library. I've posted the project (without any sample code, unfortunately) here:
http://www.alanta.com/Alanta.Client.Media.Jpeg.zip
A (very rough) example of how to use it:
public void EncodeAsJpeg()
{
    byte[][,] raster = GetSubsampledRaster();
    var image = new Alanta.Client.Media.Jpeg.Image(colorModel, raster);
    EncodedStream = new MemoryStream();
    var encoder = new JpegFrameEncoder(image, MediaConstants.JpegQuality, EncodedStream);
    encoder.Encode();
}

public void DecodeFromJpeg()
{
    EncodedStream.Seek(0, SeekOrigin.Begin);
    var decoder = new JpegFrameDecoder(EncodedStream, height, width, MediaConstants.JpegQuality);
    var raster = decoder.Decode();
}
Most of my changes are around the two new classes JpegFrameEncoder (instead of JpegEncoder) and JpegFrameDecoder (instead of JpegDecoder). Basically, the JpegFrameEncoder writes the encoded frame without any JFIF headers, and the JpegFrameDecoder decodes the frame without expecting any JFIF headers to tell it what values to use (it assumes you'll share the values in some other, out-of-band manner). It also instantiates whatever objects it needs just once (as "static"), so that you can instantiate the JpegFrameEncoder and JpegFrameDecoder quickly, with minimal overhead. The pre-existing JpegEncoder and JpegDecoder classes should work pretty much the same as they always have, though I've only done a very little bit of testing to confirm that.
There are lots of things I'd like to improve about it (I don't like the static objects -- they should be instantiated and passed in separately), but it works well enough for our purposes at the moment. Hopefully it's helpful for someone else. I'll see if I can improve the code/documentation/sample code/etc. if I have time.
I'll add one other comment. I just heard today from a Microsoft contact that Microsoft is not planning to add any support for upstream audio and video encoding/streaming to Silverlight, so option #1 appears to be off the table, at least for right now. My guess is that figuring out support for this will be the community's responsibility, i.e., up to you and me.
Stop-Gap?
Would it be possible to use Windows Media Encoder to compress the raw video Silverlight provides? After capturing to isolated storage, encode with WME and send the result to the server via the WebClient. Two big issues are:
Requires a user to install the encoder
WME will no longer be supported
It seems like that might be a stop-gap solution until something better comes along. I haven't worked w/ WME before though so I don't know how feasible this would be. Thoughts?
Have you tried the new Expression 4 Encoders?
http://www.microsoft.com/expression/products/EncoderPro_Overview.aspx

Using Silverlight 2 for short audio caching

I'm attempting to use a large number of short sound samples in a game I'm creating in Silverlight 2. The samples are less than 2 seconds long.
I would prefer to load all the audio samples during initialization. I have been adding a MediaElement for each sample to the canvas and using a generic list to manage them. So far, it appears to work.
When I play a sample the first time, it plays perfectly. If it has finished playing and I want to re-use the same element, it cuts off the first part of the sound. (To play the sample again, I stop and then play the media element.)
Is there another method I should use to play the samples so that the audio is not clipped and good performance is obtained?
Also, it's probably a good idea to make sure that all of your audio samples are brought down to the client side initially. Depending on how you set it up, it's possible that the MediaElements are using their progressive download functionality to get the media files from the server. While there's nothing wrong with this per se (browser caching should be helping you out after the initial download), it does mean that you have to deal with the browser cache, and there are some potential issues there.
Possible steps to try:
Mark your audio files as "Content". This will get them balled up in the .xap.
Load your audio files into MemoryStreams (see Application.GetResourceStream method) and call MediaElement.SetSource().
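A minimal sketch of step 2, assuming a file built into the .xap as Content at "sounds/chord.mp3" (the MediaElement still has to be in the visual tree to play):

void LoadSample(Panel canvas)
{
    var sample = new MediaElement { AutoPlay = false };
    canvas.Children.Add(sample);

    // Pulls the file out of the .xap rather than progressively
    // downloading it from the server.
    StreamResourceInfo sri = Application.GetResourceStream(
        new Uri("sounds/chord.mp3", UriKind.Relative));
    sample.SetSource(sri.Stream);
}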
HTH,
Erik
Some comments:
From MSDN:
Try to limit the number of MediaElement objects you have in your application at once. If you have over one hundred MediaElement objects in your application tree, regardless of whether they are playing concurrently or not, MediaFailed events may be raised. The way to work around this is to add MediaElement objects to the tree as they are needed and remove them when they are not.
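In practice, that add-as-needed pattern might look something like this sketch (names are mine, not from MSDN):

void PlayOneShot(Panel root, Stream encodedAudio)
{
    var me = new MediaElement { AutoPlay = true };
    me.MediaEnded += (s, e) => root.Children.Remove(me); // remove when done
    root.Children.Add(me);                               // must be in the tree to play
    me.SetSource(encodedAudio);
}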
You could try seeking to the start of the sample, resetting the current playback position before re-using the element:
mediaelement.Position = new TimeSpan();
See also MSDNs MediaElement.Position.
One technique you can use, although I'm not sure how well it will work in Silverlight, is to create one large file with all of your samples joined together (probably with a half-second or so of silence between each). Figure out the timecode for each sample, then seek the media element to that position and play. You'll only need as many media elements as simultaneous sounds you want to play.
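Roughly, with made-up offsets (you'd also pause again once the sample's known duration has elapsed, e.g. via a DispatcherTimer, so playback doesn't run into the next sample):

// Each sample's start time within the big concatenated file.
Dictionary<string, TimeSpan> offsets = new Dictionary<string, TimeSpan>
{
    { "click", TimeSpan.Zero },
    { "ding",  TimeSpan.FromSeconds(2.5) },   // 2 s sample + 0.5 s silence
};

void PlaySample(MediaElement player, string name)
{
    player.Pause();
    player.Position = offsets[name];   // seek to the sample's timecode
    player.Play();
}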
