On some audio files the value of MediaElement.NaturalDuration is less than the actual duration of the audio. When I open the file in Windows Media Player the duration is correct (also when I look at the properties of the file). Although the value of the NaturalDuration property is incorrect, the audio is played fully, but at some point the value of the Position property becomes greater than the value of the NaturalDuration property, which, as I understand, should never happen.
I have created a simple application to reproduce the problem: https://skydrive.live.com/redir?resid=ACF8BFD4384116CE!2908&authkey=!AG-wF6Ae-7EAYk8
The duration of the audio file used in the application is 00:02:54, but the value of the NaturalDuration property is 00:01:59.
Does anyone know why and if there is a workaround for this?
Thanks in advance for any help.
OK, this is not an answer, but the results of a short investigation that give some clues as to why it behaves like that and where those numbers (2:58 and 1:59) come from. First, look at this thread: Calculating the length of MP3 Frames in milliseconds
Two things that we will use from there:
1) Frame length (in ms) = (samples per frame / sample rate (in Hz)) * 1000, and
Duration in sec = Frame length (in ms) * number of frames / 1000
2) There are some standards regarding number of samples for different MPEG versions:
Samples per frame:
MPEG Version 1
384, // Layer1
1152, // Layer2
1152 // Layer3
MPEG Version 2 & 2.5
384, // Layer1
1152, // Layer2
576 // Layer3
Now let's check what Winamp reports as the file's format info:
MPEG-2.5 layer 3
16 kbps, 2482 frames
Now if you take frames = 2482 and samples per frame = 576 (MPEG-2.5 layer 3), you get a duration of 2:58. But it looks like, for some reason, Silverlight and iTunes use samples per frame = 384, which gives 1:59. The next step could be to check the actual values in the file's headers. If they are correct and the right duration can be calculated from them, then you could cook up a hack to obtain the duration separately (from the server, for example). But I'm fairly sure that the file has some defects (headers inconsistent with the content), which some players can handle and others cannot.
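The arithmetic above can be sketched as follows. The 8000 Hz sample rate is an assumption: it is not stated in the thread, but it is one of the rates MPEG-2.5 allows and it reproduces both figures. The frame count is the one Winamp reported, and the function names are only illustrative.

```python
def mp3_duration_seconds(frames, samples_per_frame, sample_rate_hz):
    """Duration = number of frames * (samples per frame / sample rate)."""
    return frames * samples_per_frame / sample_rate_hz


def as_min_sec(seconds):
    """Format a duration as M:SS, truncating the fractional part."""
    whole = int(seconds)
    return f"{whole // 60}:{whole % 60:02d}"


FRAMES = 2482          # frame count reported by Winamp
SAMPLE_RATE_HZ = 8000  # assumption, not stated in the thread

expected = mp3_duration_seconds(FRAMES, 576, SAMPLE_RATE_HZ)  # MPEG-2.5 Layer 3
reported = mp3_duration_seconds(FRAMES, 384, SAMPLE_RATE_HZ)  # Layer 1 value

print(as_min_sec(expected), as_min_sec(reported))
```

If the real sample rate differs, both durations scale by the same factor, so their ratio of 576/384 = 1.5 stays the same.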
I'm using the H.264 library to compress a video frame by frame. It works, I can replay it back locally without any issue.
However, I need to send that video over the LAN and that LAN is rather busy already so I need to limit the size of each frame to a maximum of about 250Kb.
I use the following code to setup the parameters, but changing the bit rate values does not seem to have any effect on what the library does with the input frames:
x264_param_t param = {};
if(x264_param_default_preset(&param, "faster", nullptr) < 0)
{
return -1;
}
param.i_csp = X264_CSP_I420;
param.i_width = 3840;
param.i_height = 2160;
param.i_keyint_max = static_cast<int>(f_frame_header.f_fps);
param.i_threads = X264_THREADS_AUTO;
param.b_vfr_input = 0;
param.b_repeat_headers = 1;
param.b_annexb = 1;
// the following three parameters are the ones I tried to change with no results
param.rc.i_bitrate = 100000;
param.rc.i_vbv_max_bitrate = 100000;
param.rc.i_vbv_buffer_size = 125000;
if(x264_param_apply_profile(&param, "high") < 0)
{
return -1;
}
...enter loop reading frames and compressing them...
Changing the i_bitrate, i_vbv_max_bitrate and i_vbv_buffer_size parameters seems to have absolutely no effect on the size of the resulting frames. I still get some frames over 500Kb, and in many cases several rather large frames one after the other, as the following sizes show:
20264
358875
218429
20728
25215
310230
36127
9077
29785
341541
222778
23542
21356
276772
25339
32459
421036
11179
6172
286070
193849
What I need is for the largest frame to be around 250,000 at most. I understand that once in a while it may go over a bit, but not by 2×. That's just too much for my currently available bandwidth.
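As a quick way to quantify the problem, here is a minimal sketch that checks the sizes listed above against the 250,000 budget. It only inspects the numbers from the post and says nothing about how x264 should be configured. The variable names are illustrative.

```python
# Frame sizes copied from the list in the question.
FRAME_SIZES = [
    20264, 358875, 218429, 20728, 25215, 310230, 36127,
    9077, 29785, 341541, 222778, 23542, 21356, 276772,
    25339, 32459, 421036, 11179, 6172, 286070, 193849,
]
BUDGET = 250_000  # target maximum frame size from the question

# Collect every frame that is larger than the budget.
over_budget = [size for size in FRAME_SIZES if size > BUDGET]

print(f"{len(over_budget)} of {len(FRAME_SIZES)} frames exceed {BUDGET}")
print(f"largest frame in this sample: {max(FRAME_SIZES)}")
```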
What am I doing wrong in the parameters setup above?
I've seen this command line:
ffmpeg -i input -c:v libx264 -b:v 2M -maxrate 2M -bufsize 1M output.mp4
which would suggest that what I'm doing above should work (I tried all sorts of values, including the ones on that command line). Yet the frame size does not really change between my runs.
I tried applying a blur to each frame to see whether it would help. Yes! It did. The result is a movie which is 2.44 times smaller than the original.
To load each JPEG image from the original, I use ImageMagick++ (in C++), so I just do the following blur on each image:
image.blur(0.0, 5.0);
and that took about 10 hours total (without the blur the same processing took about 40 minutes) but it was worth it since in the end the compressed movie went from 1,293,272,023 bytes to only 529,556,265 bytes (2.44218 times smaller). The blur added about 3.3 seconds of processing per frame and there are a little over 11,000 frames in the original.
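The figures quoted above can be checked with a few lines of arithmetic. This is only a sketch: the frame count is rounded to 11,000 because the post says "a little over 11,000" without giving the exact number.

```python
ORIGINAL_BYTES = 1_293_272_023  # compressed movie without the blur
BLURRED_BYTES = 529_556_265     # compressed movie with the blur

# How many times smaller the blurred result is.
ratio = ORIGINAL_BYTES / BLURRED_BYTES

BLUR_SECONDS_PER_FRAME = 3.3
FRAME_COUNT = 11_000  # assumption: rounded down from "a little over 11,000"

# Extra processing time attributable to the blur, in hours.
extra_hours = BLUR_SECONDS_PER_FRAME * FRAME_COUNT / 3600

print(f"size ratio: {ratio:.5f}")
print(f"extra processing time: about {extra_hours:.1f} hours")
```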
Note: I used 5.0 for the blur because I have 4K images and although I can see a sharp difference when I look at one frame, when playing back the resulting movie, I don't notice the final blur. If you have smaller images, you probably want to use a smaller number. It looks like many people use a blur of just 0.05 and already have good results in compression ratios.
In C, use the BlurImage() function:
Image *BlurImage(const Image *image,const double radius,
const double sigma,ExceptionInfo *exception)
Here are some references about using a blur to further compress JPEG images, as it helps eliminate sharp edges, which do not compress well in the JPEG format (sharp edges are not as natural):
Recommendation for compressing JPG files with ImageMagick
How do I reduce the file size of an image? (search on "blur" to find the section)
Could I blur an image to dramatically reduce the file size?
There is a very good chance that I am going down a pointless path on this, so I apologize if this is a waste of time. I have been trying to write uncompressed video to an FLV file, and I am not sure whether it is possible.
According to Wikipedia, a valid video encoding option is 0, which indicates an "RGB" video encoding: https://en.wikipedia.org/wiki/Flash_Video#Packets. However, I don't see any mention of this Codec ID option in Adobe's documentation; neither "Video File Format Specification Version 10" nor "Adobe Flash Video File Format Specification Version 10.1".
I proceeded under the assumption that a 0/RGB Codec ID is allowed. I hard-coded an array of unsigned char in C and used fwrite to write the following Double/Number metadata to a new, binary FLV file (which admittedly, I am assuming I wrote correctly):
duration: 4 (seconds)
width: 16 (pixels)
height: 16 (pixels)
videodatarate: 6 (Kbps)
framerate: 1 (fps)
videocodecid: 0
filesize: 3323 (bytes)
I then added 4 VIDEODATA tags, 1 for each RGB frame I was hoping to write. Their timestamps are 0, 1000, 2000, and 3000 (milliseconds). All four of them have a 769-byte payload: the first byte to specify it is a keyframe with a Codec ID of 0, and the remaining 768 are to represent a 16x16x3 (RGB) image. I wrote 255/0xFF for all values in hopes of seeing a small, white screen appear for 4 seconds.
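For readers who want to reproduce the layout just described, here is a minimal sketch of how one such tag could be assembled. It assumes the standard FLV tag framing (11-byte tag header, payload, 4-byte PreviousTagSize). The frame type and Codec ID values are the ones from this question, so the sketch says nothing about whether any player accepts Codec ID 0. The helper names are made up for illustration.

```python
import struct


def u24(value):
    """Encode an unsigned 24-bit big-endian integer."""
    return struct.pack(">I", value)[1:]


def flv_video_tag(timestamp_ms, frame_type, codec_id, frame_bytes):
    """Build one FLV video tag followed by its PreviousTagSize field."""
    # First payload byte: frame type in the high nibble, codec ID in the low.
    payload = bytes([(frame_type << 4) | codec_id]) + frame_bytes
    header = (
        bytes([9])                              # tag type 9 = video
        + u24(len(payload))                     # DataSize
        + u24(timestamp_ms & 0xFFFFFF)          # Timestamp (lower 24 bits)
        + bytes([(timestamp_ms >> 24) & 0xFF])  # TimestampExtended
        + u24(0)                                # StreamID, always 0
    )
    tag = header + payload
    return tag + struct.pack(">I", len(tag))    # PreviousTagSize


# 16x16 pixels, 3 bytes each, all 0xFF (white), as in the question.
WHITE_FRAME = bytes([0xFF]) * (16 * 16 * 3)
tags = [flv_video_tag(t, 1, 0, WHITE_FRAME) for t in (0, 1000, 2000, 3000)]

print(len(tags[0]), sum(len(t) for t in tags))
```

With these sizes the four video tags account for 3136 of the 3323 bytes given in the metadata, which leaves 187 bytes for the FLV header, the first PreviousTagSize field and the metadata tag.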
When that did not play correctly in VLC Media Player, as I feared, I tried using RGBA colors for each frame. I also changed the videodatarate and filesize metadata to Number values 8 (Kbps) and 4347 (bytes) respectively.
Unfortunately, this did not play in VLC Media Player either. I was wondering if anyone knew for certain whether uncompressed video in an FLV file is possible? If so, I was curious what format the video data should be in (RGB, RGBA, multiple VIDEODATA tags, just one VIDEODATA tag, etc.)?
My C code is mostly one, giant array of unsigned char, but if anyone would like to see it, I can try adding it. Any advice is greatly appreciated.
Thank you,
Mitchell A
As per SirDarius, "the video encoding types listed in the Wikipedia page do not come from an official source. I would not recommend relying on those." This makes sense given that the FLV Format documentation from Adobe itself makes no mention of an uncompressed, RGB option for video encoding.
I was holding out hope that Wikipedia editors and other people knew of some undocumented easter egg in the FLV format, but I'm now convinced that's not the case.
"...The FLV Format documentation from Adobe itself makes no mention of an uncompressed, RGB option for video encoding."
For RGB (raw bitmap data) you must use the Screen Video codec (id=3).
Strangely, it's hidden in the SWF Format documentation (not the FLV Format docs).
See Chapter 14 (page 204) which is the Video section...
You want specifically page 208 for the Screen Video codec to be explained.
Check this example code (AS3) of encoding RGB into Screen Video.
Apply the logic, especially the function videoData(), which could be adjusted to read each pixel's uint value (via some getPixel-type call) or just read from an Array.
Example:
for (var x2:int = 0; x2 < xLimit; x2++)
{
var px:int = (x1 * blockWidth) + x2;
var py:int = frameHeight - ((y1 * blockHeight) + y2); // (flv's save image from bottom to top)
var p:uint = YOUR_INPUT_BITMAP.getPixel(px, py); // sample a pixel's RGB (3-bytes unsigned int)
//# IF reading from Pixel's uint value
block.writeByte( p & 0xff ); // blue
block.writeByte( p >> 8 & 0xff ); // green
block.writeByte( p >> 16 ); // red
//# ELSE IF reading from Array of R-G-B values(FLV writes in BGR format)
block.writeByte( myRGB_Array[x+2] ); // blue
block.writeByte( myRGB_Array[x+1] ); // green
block.writeByte( myRGB_Array[x] ); // red
}
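The byte ordering used in the loop above can be illustrated outside ActionScript as well. The following is a small sketch of just two details from that snippet: unpacking a 0xRRGGBB value into blue, green, red order, and walking the rows from bottom to top. It ignores the block structure of Screen Video entirely, and the function names are hypothetical.

```python
def pixel_to_bgr(pixel):
    """Split a packed 0xRRGGBB integer into bytes in B, G, R order."""
    return bytes([pixel & 0xFF, (pixel >> 8) & 0xFF, (pixel >> 16) & 0xFF])


def rows_bottom_to_top(rows):
    """Serialise rows of packed pixels, starting with the bottom row."""
    out = bytearray()
    for row in reversed(rows):
        for pixel in row:
            out += pixel_to_bgr(pixel)
    return bytes(out)


# A 1x2 image: a red pixel on top, a blue pixel below it.
image = [[0xFF0000], [0x0000FF]]
print(rows_bottom_to_top(image).hex())
```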
I am learning machine learning using TensorFlow. I have been through a couple of tutorials, but I still have a hard time finding good ways to train a model. Recently I implemented a CNN model I found in the literature. The model must take a crop of a certain size centered on a given pixel and predict the label of this pixel. It does that for each pixel of the image. I used:
classifier = tf.learn.Estimator(model_fn=cnn_model_fn, model_dir="./cnn")
with cnn_model_fn being a function I implemented.
For each training image, we take 3000 crops randomly, so I can't load all these images and their crops into memory. The way I found is to load one image at a time, extract the 3000 crops and then call classifier.fit() to train on those 3000 crops. Then I loop over each image in my dataset.
for i in range(len(filenames)):
    ...
    image = misc.imread(filenames[i])
    labels = misc.imread(groundTruth[i])  # labels for each pixel
    input_classifier = preprocess(image, ...)  # extracts 3000 crops from the image and does other things
    input_labels = preprocess_labels(labels, ...)  # takes the corresponding 3000 labels
    classifier.fit(x=input_classifier,
                   y=input_labels,
                   batch_size=30,
                   steps=100)
It worked fine for 100 images, but if I try it on the whole dataset (2000 images), it always stops with a ResourceExhausted error.
...
[everything goes well]
...
iteration :227/2000
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating
TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus
id: 0000:01:00.0)
INFO:tensorflow:Create CheckpointSaverHook.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating
TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus
id: 0000:01:00.0)
Traceback (most recent call last):
File "train-cnn.py", line 78, in <module>
classifier.fit(x= input_classifier, y=input_labels,batch_size=30, steps=100)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 280, in new_func
...
...
...
tensorflow.python.framework.errors_impl.ResourceExhaustedError: cnn/graph.pbtxt.tmp32bcc6311c164c29b91177d17d05d669
I don't see why it gets OOM... I suspect it is because of the way I call fit() in a loop. After each fit(), a checkpoint is saved, and it must be restored right afterwards to train on the next image. So is this a bad way to train a model?
Running estimator.fit in a loop with a small number of steps is not a good idea. I would put all the input logic into an input_fn, then run estimator.fit only once with more steps.
An example of reading data from different files can be found here: tf.contrib.learn.read_batch_examples
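To illustrate the idea of moving the input logic into one place, here is a framework-independent sketch of a generator that loads one image at a time and yields batches of crops, so that a single training call can consume the whole dataset. It deliberately does not use any TensorFlow API. `crop_batches`, `load_crops` and `fake_load_crops` are hypothetical names, and how such a generator is wired into an `input_fn` depends on the TensorFlow version in use.

```python
import random


def crop_batches(filenames, load_crops, batch_size, seed=0):
    """Yield (crops, labels) batches while holding only one image's crops."""
    rng = random.Random(seed)
    for name in filenames:
        # load_crops is hypothetical: it returns two equal-length lists.
        crops, labels = load_crops(name)
        order = list(range(len(crops)))
        rng.shuffle(order)  # shuffle crops within the current image
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            yield [crops[i] for i in idx], [labels[i] for i in idx]


def fake_load_crops(name):
    """Stand-in loader: 3000 dummy crops and dummy labels per image."""
    return [(name, i) for i in range(3000)], [i % 2 for i in range(3000)]


batches = crop_batches(["a.png", "b.png"], fake_load_crops, batch_size=30)
print(sum(1 for _ in batches))  # number of batches produced
```

With 3000 crops per image and a batch size of 30, each image contributes 100 batches, which matches the 100 steps per image used in the question.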
This problem deals with HLS playlists, and it may help to understand how HLS works before diving in.
HLS (HTTP Live Streaming) is built around a playlist, similar to an iTunes .m3u playlist. HLS takes a video file (such as an .mpg) and splits it into multiple equal-length segment files (.ts, which stands for Transport Stream). For simplicity, you can think of these segment files as chunks of the original .mpg file, which are to be played consecutively.
There are many ways to name these segment files. Sometimes you have…
file_segment_0.ts
file_segment_1.ts
file_segment_2.ts
But sometimes you have something like,
22/51/04.ts
22/51/09.ts
22/51/14.ts
(H/M/S)
The client (such as VLC) knows how to handle these files. It’s up to the producer to decide how they want to name their files.
An HLS playlist can also be “VOD” (Video On-Demand) or “Live”. If the playlist is “Live”, the client will jump to the current live time. Inside the playlist, a header will define the program’s (in terms of the streamed event) start datetime, like so:
#EXT-X-PROGRAM-DATE-TIME:2016-09-16T21:59:09+00:00
The playlist will also tell the client how far apart the segmented files are, in terms of seconds.
My issue is with the H/M/S format. You can find an example playlist here: http://pastebin.com/raw/rS84YJwN
The segments are 5.005s apart, as defined by #EXTINF:5.005.
At first glance, it doesn’t look so bad. Start at the EXT-X-PROGRAM-DATE-TIME, increment by 5.005, round accordingly, and format the date as H/M/S.ts
But there’s a bigger question: Why are there sometimes 207 segments between EXT-X-PROGRAM-DATE-TIME + 5.005, and why are there sometimes 196 segments between EXT-X-PROGRAM-DATE-TIME + 5.005?
My math tells me that I will increment by 6 (instead of 5) every 200 segments, which I can confirm with some quick and dirty Ruby code[0], which produces this output:
139 22/14/40.ts
339 22/31/21.ts
539 22/48/02.ts
746 23/05/18.ts
946 23/21/59.ts
1146 23/38/40.ts
1346 23/55/21.ts
1402 00/00/01.ts
1542 00/11/42.ts
1742 00/28/23.ts
1942 00/45/04.ts
2142 01/01/45.ts
2342 01/18/26.ts
2542 01/35/07.ts
2742 01/51/48.ts
2942 02/08/29.ts
3142 02/25/10.ts
3342 02/41/51.ts
3542 02/58/32.ts
3749 03/15/48.ts
3949 03/32/29.ts
4149 03/49/10.ts
Where 139 is a line number, and 22/14/40.ts is the segment file.
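The "every 200 segments" expectation can be reproduced with a small model. This is only a sketch, and it assumes that segment names come from truncating an accumulated timestamp to whole seconds, which the playlist does not confirm. It explains the regular 200-segment spacing but not the gaps of 207 seen in the output above.

```python
SEGMENT_MS = 5005  # 5.005 s per segment, kept in integer milliseconds


def six_second_jumps(segment_count):
    """Return the segment indices whose name is 6 s after the previous one."""
    jumps = []
    previous = 0
    for k in range(1, segment_count + 1):
        # Assumed naming rule: truncate the accumulated time to whole seconds.
        current = (k * SEGMENT_MS) // 1000
        if current - previous == 6:
            jumps.append(k)
        previous = current
    return jumps


print(six_second_jumps(1000))
```

Under this model the extra 0.005 s per segment adds up to one full second after exactly 200 segments, so the jumps are perfectly regular.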
My question is this: What’s going on here, and how can I reproduce it accurately? I obviously don’t/won’t have access to the actual input video file, and I need to rebuild these playlist files.
[0]
require 'date'
file = `curl 'http://pastebin.com/raw/rS84YJwN'`
start_date = DateTime.parse(file.scan(%r{#EXT-X-PROGRAM-DATE-TIME:(.*)$}).to_a.first.first)
lines = file.split("\n").select { |line| !line.index('.ts').nil? }
date_lines = []
lines.each_with_index do |line, i|
str = line.gsub("/", ':').split(".ts")[0]
date = "#{start_date.to_date} #{str}"
date_obj = DateTime.parse(date)
date_lines << date_obj
next unless i > 0
diff = date_obj.to_time - date_lines[i - 1].to_time
if diff != 5.0
puts "#{i} #{line}"
end
end
I am writing software that processes audio files. I am using the libsndfile library to read wave file data, and I have a question that their documentation did not answer: what is the difference between functions that read items and functions that read frames? Or, in other words, do I get the same results if I interchange sf_read_short and sf_readf_short?
I have read in some questions that an audio frame equals a single sample, so I thought that what libsndfile calls items might be the same thing. During my tests they seemed to be the same.
I was concerned too and found the answer.
Q12 : I'm looking at sf_read*. What are items? What are frames?
An item is a single sample of the data type you are reading; ie a
single short value for sf_read_short or a single float for
sf_read_float. For a sound file with only one channel, a frame is the
same as a item (ie a single sample) while for multi channel sound
files, a single frame contains a single item for each channel.
Here are two simple, correct examples, both of which are assumed to be
working on a stereo file, first using items:
#define CHANNELS 2
short data [CHANNELS * 100] ;
sf_count_t items_read = sf_read_short (file, data, 200) ;
assert (items_read == 200) ;
and now reading the exact same amount of data using frames:
#define CHANNELS 2
short data [CHANNELS * 100] ;
sf_count_t frames_read = sf_readf_short (file, data, 100) ;
assert (frames_read == 100) ;
This is a copy&paste from:
libsndfile FAQ, question 12.
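The relationship described in the FAQ (one frame holds one item per channel) can be sketched in a few lines. This does not call libsndfile. It only models the counting and the interleaved layout, with illustrative function names.

```python
def items_for_frames(frames, channels):
    """Number of items (single samples) contained in a number of frames."""
    return frames * channels


def split_frames(interleaved, channels):
    """Group an interleaved sample list into per-frame tuples."""
    return [
        tuple(interleaved[i:i + channels])
        for i in range(0, len(interleaved), channels)
    ]


# Stereo example: 100 frames hold 200 items, as in the FAQ snippets.
print(items_for_frames(100, 2))

# Interleaved stereo data L0, R0, L1, R1 becomes two frames.
print(split_frames([10, 11, 20, 21], 2))
```

For a mono file the two counts coincide, which is why the item-based and frame-based functions can appear interchangeable in tests on single-channel files.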