How does Ludwig encode images - ludwig

I’m looking to understand how Ludwig encodes images. Does it run the images through a pretrained model without running a loss or does it run a loss? If so, what type of loss is ran for a large feature set?

All the documentation regarding image pre-processing and encoding can be found here.
In summary:
Currently there are two encoders supported for images: Convolutional Stack Encoder and ResNet encoder which can be set by setting encoder parameter to stacked_cnn or resnet in the input feature dictionary in the model definition (stacked_cnn is the default one).
Each of the above encoders have configurable hyper-parameters. It does not run them through a pre-trained model as that would defeat the purpose - you are training your own model with your own data.

Related

Non Redundant Image Extraction From Video

I am collecting data for a project. The data collection is done by recording videos of the subjects and the environment. However, while training the network, I would not want to train it with all the images collected in the video sequence.
The main objective is to not train the network with redundant images. The video sequence collected at 30 frames/sec can have redundant images (images that are very similar) within the short intervals. T(th) frame and (T+1)th frame can be similar.
Can someone suggest ways to extract only the images that can be useful for training ?
Update #2: Further resources,
https://github.com/JohannesBuchner/imagehash
https://www.pyimagesearch.com/2017/11/27/image-hashing-opencv-python/
https://www.pyimagesearch.com/2020/04/20/detect-and-remove-duplicate-images-from-a-dataset-for-deep-learning/
Update #1: You can use this repo to calculate similarity between given images. https://github.com/quickgrid/image-similarity**
If frames with certain objects(e.g., vehicle, device) are important, then use pretrained object detectors if available, to extract important frames.
Next, use a similarity method to remove similar images in nearby frames. Until a chosen threshold is exceeded keep removing nearby N frames.
This link should be helpful in finding right method for your case,
https://datascience.stackexchange.com/questions/48642/how-to-measure-the-similarity-between-two-images
This repository below should help implement the idea with few lines of code. It uses CNN to extract features then calculates there cosine distance as mentioned there.
https://github.com/ryanfwy/image-similarity

Quantization for object detection

How quantization for object detection models varies from that of classification models?
Since detection models need to handle the bbox coordinates(multiple objects in an input),there must be some scaling trick in quantization.
You can look at SSD of Tensorflow model in the model zoo API here
SSD is single shot detection model, it takes the features of detection when looking overthe image to provide the classification score according to label. This type of application is very useful for multiple types of object detection.

Feeding multiple sample rates to the same buffer source : ffmpeg filters

I have a remote source of PCM audio samples which keeps changing the sample rate. It sometimes supplies 16Khz and the later 48Khz depending upon the bandwidth. I would like to convert them to FLTP through a filter, before feeding to an audio decoder. When I do that I get the error "Changing audio frame properties on the fly is not supported. [Invalid argument]".
Can someone please suggest a way this can be done?
Is it possible to create a filter graph with multiple buffer sources but only one sink?
I used swr_convert_frame() keeping an array of SwrContext * for different input sample rates.

Recording HD from webcam using Expression encoder sdk in WPF

I am trying to record a stream from a webcam using Expression Encoder 4 SDK in WPF I can capture the video & audio streams and record these to disk however they are only recording at a base resolution of 320x240 the webcam is capable of capturing at 720p, how can I record at this resolution. Any help would be appreciated, I have been pulling my hair out trying to solve this all week.
Know this is a bit late but all questions need answers:
These might be a possible solution:
Check to see if your camera has it's own settings on the camera or comes with an installation disk.
for the expression encoder 4 put the video profile quality to max.
Good luck. If you are still around tell me, how it goes.
to change the "size" you can use the following line :
LiveJob.OutputFormat.VideoProfile.Streams[0].Size = new Size(1280,1080)
Or whatever you want it to be.
Encoder also offers a setting page that you can use.
That's what I did and then after setting the outputsize you can do that :
currentJob.OutputFormat.VideoProfile.Streams[0].Size = ((LiveSource)LiveDeviceSource).CropRect.Size;
Only 1 small limitation, you can't change the size while it's recording if you are publishing the source.

Streaming a webcam from Silverlight 4 (Beta)

The new webcam stuff in Silverlight 4 is darned cool. By exposing it as a brush, it allows scenarios that are way beyond anything that Flash has.
At the same time, accessing the webcam locally seems like it's only half the story. Nobody buys a webcam so they can take pictures of themselves and make funny faces out of them. They buy a webcam because they want other people to see the resulting video stream, i.e., they want to stream that video out to the Internet, a lay Skype or any of the dozens of other video chat sites/applications. And so far, I haven't figured out how to do that with
It turns out that it's pretty simple to get a hold of the raw (Format32bppArgb formatted) bytestream, as demonstrated here.
But unless we want to transmit that raw bytestream to a server (which would chew up way too much bandwidth), we need to encode that in some fashion. And that's more complicated. MS has implemented several codecs in Silverlight, but so far as I can tell, they're all focused on decoding a video stream, not encoding it in the first place. And that's apart from the fact that I can't figure out how to get direct access to, say, the H.264 codec in the first place.
There are a ton of open-source codecs (for instance, in the ffmpeg project here), but they're all written in C, and they don't look easy to port to C#. Unless translating 10000+ lines of code that look like this is your idea of fun :-)
const int b_xy= h->mb2b_xy[left_xy[i]] + 3;
const int b8_xy= h->mb2b8_xy[left_xy[i]] + 1;
*(uint32_t*)h->mv_cache[list][cache_idx ]= *(uint32_t*)s->current_picture.motion_val[list][b_xy + h->b_stride*left_block[0+i*2]];
*(uint32_t*)h->mv_cache[list][cache_idx+8]= *(uint32_t*)s->current_picture.motion_val[list][b_xy + h->b_stride*left_block[1+i*2]];
h->ref_cache[list][cache_idx ]= s->current_picture.ref_index[list][b8_xy + h->b8_stride*(left_block[0+i*2]>>1)];
h->ref_cache[list][cache_idx+8]= s->current_picture.ref_index[list][b8_xy + h->b8_stride*(left_block[1+i*2]>>1)];
The mooncodecs folder within the Mono project (here) has several audio codecs in C# (ADPCM and Ogg Vorbis), and one video codec (Dirac), but they all seem to implement just the decode portion of their respective formats, as do the java implementations from which they were ported.
I found a C# codec for Ogg Theora (csTheora, http://www.wreckedgames.com/forum/index.php?topic=1053.0), but again, it's decode only, as is the jheora codec on which it's based.
Of course, it would presumably be easier to port a codec from Java than from C or C++, but the only java video codecs that I found were decode-only (such as jheora, or jirac).
So I'm kinda back at square one. It looks like our options for hooking up a webcam (or microphone) through Silverlight to the Internet are:
(1) Wait for Microsoft to provide some guidance on this;
(2) Spend the brain cycles porting one of the C or C++ codecs over to Silverlight-compatible C#;
(3) Send the raw, uncompressed bytestream up to a server (or perhaps compressed slightly with something like zlib), and then encode it server-side; or
(4) Wait for someone smarter than me to figure this out and provide a solution.
Does anybody else have any better guidance? Have I missed something that's just blindingly obvious to everyone else? (For instance, does Silverlight 4 somewhere have some classes I've missed that take care of this?)
I just received this response from Jason Clary on my blog:
Saw your post on Mike Taulty's blog about VideoSink/AudioSink in Silverlight 4 beta.
I thought I'd point out that VideoSink's OnSample gives you a single uncompressed 32bpp ARGB frame which can be copied straight into a WritableBitmap.
With that in hand grab FJCore, a jpeg codec in C#, and modify it to not output the JFIF header. Then just write them out one after the other and you've got yourself an Motion JPEG codec. RFC2435 explains how to stuff that into RTP packets for RTSP streaming.
Compressing PCM audio to ADPCM is fairly easy, as well, but I haven't found a ready-made implementation as yet. RFC3551 explains how to put either PCM or ADPCM into RTP packets.
It should also be reasonably easy to stuff MJPEG and PCM or ADPCM into an AVI file. MS has some decent docs on AVI's modified RIFF format and both MJPEG and ADPCM are widely supported codecs.
It's a start anyway.
Of course, once you've gone through all that trouble, the next Beta will probably come out with native support for compressing and streaming to WMS with the much better WMV codecs.
Thought I'd post it. It's the best suggestion I've seen so far.
I thought I'd let interested folks know the approach I actually took. I'm using CSpeex to encode the voice, but I wrote my own block-based video codec to encode the video. It divides each frame up into 16x16 blocks, determines which blocks have sufficiently changed to warrant transmitting, and then Jpeg-encodes the changed blocks using a heavily modified version of FJCore. (FJCore is generally well done, but it needed to be modified to not write the JFIF headers, and to speed up initialization of the various objects.) All of this is being passed up to a proprietary media server using a proprietary protocol roughly based on RTP.
With one stream up and four streams down at 144x176, I'm currently getting 5 frames per second, using a total of 474 Kbps (~82 Kbps / video stream + 32 Kbps / audio), and chewing up about 30% CPU on my dev box. The quality's not great, but it's acceptable for most video chat applications.
Since I posted my original question, there have been several attempts to implement a solution. Probably the best is at the SocketCoder website here (and here).
However, because the SocketCoder motion JPEG-style video codec translates the entirety of every frame rather than just the blocks that have changed, my assumption is that CPU and bandwidth requirements are going to be prohibitive for most applications.
Unfortunately, my own solution is going to have to remain proprietary for the foreseeable future :-(.
Edit 7/3/10: I just got permissions to share my modifications to the FJCore library. I've posted the project (without any sample code, unfortunately) here:
http://www.alanta.com/Alanta.Client.Media.Jpeg.zip
A (very rough) example of how to use it:
public void EncodeAsJpeg()
{
byte[][,] raster = GetSubsampledRaster();
var image = new Alanta.Client.Media.Jpeg.Image(colorModel, raster);
EncodedStream = new MemoryStream();
var encoder = new JpegFrameEncoder(image, MediaConstants.JpegQuality, EncodedStream);
encoder.Encode();
}
public void DecodeFromJpeg()
{
EncodedStream.Seek(0, SeekOrigin.Begin);
var decoder = new JpegFrameDecoder(EncodedStream, height, width, MediaConstants.JpegQuality);
var raster = decoder.Decode();
}
Most of my changes are around the two new classes JpegFrameEncoder (instead of JpegEncoder) and JpegFrameDecoder (instead of JpegDecoder). Basically, the JpegFrameEncoder writes the encoded frame without any JFIF headers, and the JpegFrameDecoder decodes the frame without expecting any JFIF headers to tell it what values to use (it assumes you'll share the values in some other, out-of-band manner). It also instantiates whatever objects it needs just once (as "static"), so that you can instantiate the JpegFrameEncoder and JpegFrameDecoder quickly, with minimal overhead. The pre-existing JpegEncoder and JpegDecoder classes should work pretty much the same as they always have, though I've only done a very little bit of testing to confirm that.
There are lots of things I'd like to improve about it (I don't like the static objects -- they should be instantiated and passed in separately), but it works well enough for our purposes at the moment. Hopefully it's helpful for someone else. I'll see if I can improve the code/documentation/sample code/etc. if I have time.
I'll add one other comment. I just heard today from a Microsoft contact that Microsoft is not planning to add any support for upstream audio and video encoding/streaming to Silverlight, so option #1 appears to be off the table, at least for right now. My guess is that figuring out support for this will be the community's responsibility, i.e., up to you and me.
Stop-Gap?
Would it be possible to use the Windows Media Encoder as a compression method for the raw video Silverlight provides? After capture to ISO Storage, encode w/ WME and send to the server via the WebClient. Two big issues are:
Requires a user to install the encoder
WME will no longer be supported
It seems like that might be a stop-gap solution until something better comes along. I haven't worked w/ WME before though so I don't know how feasible this would be. Thoughts?
Have you tried the new Expression 4 Encoders?
http://www.microsoft.com/expression/products/EncoderPro_Overview.aspx

Resources