I am writing a software which processes audio files. I am using libsndfile library for reading wave file data, and I come across a doubt that wasn't solved by their documentation: what is the difference between functions that read items and functions that read frames? Or, in other words, am I getting the same results if I interchange both sf_read_short and sf_readf_short?
I have read in some questions that an audio frame equals a single sample, so I thought that what libsndfile calls items might be the same thing. During my tests they seemed to be the same.
I was concerned too and found the answer.
Q12 : I'm looking at sf_read*. What are items? What are frames?
An item is a single sample of the data type you are reading; ie a
single short value for sf_read_short or a single float for
sf_read_float. For a sound file with only one channel, a frame is the
same as a item (ie a single sample) while for multi channel sound
files, a single frame contains a single item for each channel.
Here are two simple, correct examples, both of which are assumed to be
working on a stereo file, first using items:
#define CHANNELS 2
short data [CHANNELS * 100] ;
sf_count items_read = sf_read_short (file, data, 200) ;
assert (items_read == 200) ;
and now readng the exact same amount of data using frames:
#define CHANNELS 2
short data [CHANNELS * 100] ;
sf_count frames_read = sf_readf_short (file, data, 100) ;
assert (frames_read == 100) ;
This is a copy&paste from:
libsndfile FAQ, question 12.
Related
I'm using the H.264 library to compress a video frame by frame. It works, I can replay it back locally without any issue.
However, I need to send that video over the LAN and that LAN is rather busy already so I need to limit the size of each frame to a maximum of about 250Kb.
I use the following code to setup the parameters, but changing the bit rate values does not seem to have any effect on what the library does with the input frames:
x264_param_t param = {};
if(x264_param_default_preset(¶m, "faster", nullptr) < 0)
{
return -1;
}
param.i_csp = X264_CSP_I420;
param.i_width = 3840;
param.i_height = 2160;
param.i_keyint_max = static_cast<int>(f_frame_header.f_fps);
param.i_threads = X264_THREADS_AUTO;
param.b_vfr_input = 0;
param.b_repeat_headers = 1;
param.b_annexb = 1;
// the following three parameters are the ones I tried to change with no results
param.rc.i_bitrate = 100000;
param.rc.i_vbv_max_bitrate = 100000;
param.rc.i_vbv_buffer_size = 125000;
if(x264_param_apply_profile(¶m, "high") < 0)
{
return -1;
}
...enter loop reading frames and compressing them...
Changing the i_bitrate, i_vbv_max_bitrate and i_vbv_buffer_size parameters seems to have absolutely no effect on the size of the resulting frames. I still get some frames over 500Kb and in many even, rather large frames one after the other as the following sizes show:
20264
358875
218429
20728
25215
310230
36127
9077
29785
341541
222778
23542
21356
276772
25339
32459
421036
11179
6172
286070
193849
What I would need is the largest frame to be around 250,000 at its maximum. Now I understand that once in a while it go over a bit, but not 2×. That's just too much for my current available bandwidth.
What am I doing wrong in the parameters setup above?
I've seen this command line:
ffmpeg -i input -c:v libx264 -b:v 2M -maxrate 2M -bufsize 1M output.mp4
which would suggest that what I'm doing above should work (I tried all sorts of values including the ones one that command line). Yet the frame size does not really change between my runs.
I tried with a blur applied to each frame to see whether it work help. Yes! It did. The result is a movie which is 2.44 times smaller than the original.
To load each JPEG image from the original, I use ImageMagick++ (in C++), so I just do the following blur on each image:
image.blur(0.0, 5.0);
and that took about 10 hours total (without the blur the same processing took about 40 minutes) but it was worth it since in the end the compressed movie went from 1,293,272,023 bytes to only 529,556,265 bytes (2.44218 times smaller). The blur added about 3.3 seconds of processing per frame and there are a little over 11,000 frames in the original.
Note: I used 5.0 for the blur because I have 4K images and although I can see a sharp difference when I look at one frame, when playing back the resulting movie, I don't notice the final blur. If you have smaller images, you probably want to use a smaller number. It looks like many people use a blur of just 0.05 and already have good results in compression ratios.
In C, use the BlurImage() function:
Image *BlurImage(const Image *image,const double radius,
const double sigma,ExceptionInfo *exception)
Here are some references about using a blur to further compress JPEG images as it helps eliminates sharp edges which do not compress well in the JPEG format (as sharp edge are not as natural):
Recommendation for compressing JPG files with ImageMagick
How do I reduce the file size of an image? (search on "blur" to find the section)
Could I blur an image to dramatically reduce the file size?
I am working on a project,in which I need to extract data from the device: InertialUnit.
I get a single value in real time, but I need data for the first 10 s and in 1 ms increments, or all the data for the entire cycle of the device. Please help me implement this if possible.
Webots controllers are like any other programs, so you can easily get the values of the inertial unit and save them in a file at each step.
Here is a very simple example in Python:
from controller import Robot
robot = Robot()
inertial_unit = robot.getInertialUnit('inertial unit')
inertial_unit.enable(10)
while robot.step(10) != -1:
values = inertial_unit.getValues()
with open('values.txt','a') as f:
f.write('\n'.join(values))
There is a very good chance that I am going down a pointless path on this, so I apologize if this is a waste of time. I have been trying to write uncompressed video to an FLV file, and I am not sure whether it is possible.
According to Wikipedia, a valid video encoding option is 0, which indicates an "RGB" video encoding: https://en.wikipedia.org/wiki/Flash_Video#Packets. However, I don't see any mention of this Codec ID option in Adobe's documentation; neither "Video File Format Specification Version 10" nor "Adobe Flash Video File Format Specification Version 10.1".
I proceeded under the assumption that a 0/RGB Codec ID is allowed. I hard-coded an array of unsigned char in C and used fwrite to write the following Double/Number metadata to a new, binary FLV file (which admittedly, I am assuming I wrote correctly):
duration: 4 (seconds)
width: 16 (pixels)
height: 16 (pixels)
videodatarate: 6 (Kbps)
framerate: 1 (fps)
videocodecid: 0
filesize: 3323 (bytes)
I then added 4 VIDEODATA tags, 1 for each RGB frame I was hoping to write. Their timestamps are 0, 1000, 2000, and 3000 (milliseconds). All four of them have a 769-byte payload: the first byte to specify it is a keyframe with a Codec ID of 0, and the remaining 768 are to represent a 16x16x3 (RGB) image. I wrote 255/0xFF for all values in hopes of seeing a small, white screen appear for 4 seconds.
When that did not play correctly in VLC Media Player, as I feared, I tried using RGBA colors for each frame. I also changed the videodatarate and filesize metadata to Number values 8 (Kbps) and 4347 (bytes) respectively.
Unfortunately, this did not play in VLC Media Player either. I was wondering if anyone knew for certain whether uncompressed video in an FLV file is possible? If so, I was curious what format the video data should be in (RGB, RGBA, multiple VIDEODATA tags, just one VIDEODATA tag, etc.)?
My C code is mostly one, giant array of unsigned char, but if anyone would like to see it, I can try adding it. Any advice is greatly appreciated.
Thank you,
Mitchell A
As per SirDarius, "the video encoding types listed in the Wikipedia page do not come from an official source. I would not recommend relying on those." This makes sense given that the FLV Format documentation from Adobe itself makes no mention of an uncompressed, RGB option for video encoding.
I was holding out hope that Wikipedia editors and other people knew of some undocumented easter egg in the FLV format, but I'm now convinced that's not the case.
"...The FLV Format documentation from Adobe itself makes no mention of an uncompressed, RGB option for video encoding."
For RGB (raw bitmap data) you must use theScreen 1 codec (id=3).
Strangely, it's hidden in the SWF Format documentation (not the FLV Format docs).
See Chapter 14 (page 204) which is the Video section...
You want specifically page 208 for the Screen Video codec to be explained.
Check this example code (AS3) of encoding RGB into Screen Video.
Apply the logic, especially function videoData(), which could be adjusted to read pixels uints (via some getPixel type call) or just read from an Array.
Example:
for (var x2:int = 0; x2 < xLimit; x2++)
{
var px:int = (x1 * blockWidth) + x2;
var py:int = frameHeight - ((y1 * blockHeight) + y2); // (flv's save image from bottom to top)
var p:uint = YOUR_INPUT_BITMAP.getPixel(px, py); // sample a pixel's RGB (3-bytes unsigned int)
//# IF reading from Pixel's uint value
block.writeByte( p & 0xff ); // blue
block.writeByte( p >> 8 & 0xff ); // green
block.writeByte( p >> 16 ); // red
//# ELSE IF reading from Array of R-G-B values(FLV writes in BGR format)
block.writeByte( myRGB_Array[x+2] ); // blue
block.writeByte( myRGB_Array[x+1] ); // green
block.writeByte( myRGB_Array[x] ); // red
}
Currently I am sending the UART the strings I want to log and reading it on the host with any terminal.
In order to reduce the logging time and hopefully the image size as well (my flash is tiny), I figured out that the strings are unused in the embedded system, so why storing them on the flash?
I want to implement a server, whom I can send a hashed-key of any string (for example - it's ROM address) and the string will be output to file or screen.
My questions are:
How to create the key2string converter out of the image file (the OS is CMX, but can be answered generally)
Is there a recomended way to generate image, that will know the strings addresses but will exclude them from ROM?
Is there a known generic (open-source or other) that implemented a similar logger?
Thanks
Rather than holding hard-coded strings, then trying to hash the answers and sent it via a UART, then somehow remove the strings from the resulting image, I suggest the following.
Just send an index for an error code. The PC side can look up that index and determine what the string is for that condition. If you want the device code to be more clear, the index can be an enumeration.
For example:
enum errorStrings
{
ES_valueOutOfLimits = 1,
ES_wowItsGettingWarm = 2,
ES_randomError = 3,
ES_passwordFailure = 4
};
So, if you were sending data to the UART via printf, you could do the following:
printf("%d\n",(int)ES_wowItsGettingWarm);
Then your PC software just needs to decode the "2" that comes across the UART back into a useful string of "Wow it's getting warm."
This keeps the firmware small, but you need to manually keep the file containing the enum and the file with the strings in sync.
My solution is sending file name and line (which should be 14-20 Byte) and having a source parser on the server side, which will generate map of the actual texts. This way the actual code will contain no "format" strings, but single "filename" string for each file. Furthermore, file names can be easily replaced with enum (unlike replacing every string in the code) to reduce the COMM throughput.
I hope the sample psaudo-code will help clarifying the idea:
/* target code */
#define PRINT(format,...) send(__FILE__,__LINE__,__VA_ARGS__)
...
/* host code (c++) */
void PrintComm(istream& in)
{
string fileName;
int line,nParams;
int* params;
in>>fileName>>line>>nParams;
if (nParams>0)
{
params = new int[nParams];
for (int i=0; i<nParams; ++i)
in>>params[i];
}
const char* format = FindFormat(fileName,line);
...
delete[] params;
}
On some audio files the value of MediaElement.NaturalDuration is less than the actual duration of the audio. When I open the file in Windows Media Player the duration is correct (also when I look at the properties of the file). Although the value of the NaturalDuration property is incorrect, the audio is played fully, but at some point the value of the Position property becomes greater than the value of the NaturalDuration property, which, as I understand, should never happen.
I have created a simple application to reproduce the problem: https://skydrive.live.com/redir?resid=ACF8BFD4384116CE!2908&authkey=!AG-wF6Ae-7EAYk8
The duration of the audio file used in the application is 00:02:54, but the value of the NaturalDuration property is 00:01:59.
Does anyone know why and if there is a workaround for this?
Thanks in advance for any help.
Ok, this is not an answer but some results of a short investigation that give some clues why it behaves like that and where those numbers come from (2:58 and 1:59). First look at this thread: Calculating the length of MP3 Frames in milliseconds
Two things that we will use from there:
1) Frame length (in ms) = (samples per frame / sample rate (in hz)) * 1000, and
Duration in sec = Frame length (in ms) * number of frames / 1000
2) There are some standards regarding number of samples for different MPEG versions:
Samples per frame:
MPEG Version 1
384, // Layer1
1152, // Layer2
1152 // Layer3
MPEG Version 2 & 2.5
384, // Layer1
1152, // Layer2
576 // Layer3
Now lets check in winamp what it says about files format info:
MPEG-2.5 layer 3
16 kbps, 2482 frames
Now if you take frames = 2482 and samples per frame = 576 (MPEG-2.5 layer 3) you'll get duration 2:58. But it looks like for some reason silverlight and iTunes uses samples per frame = 384 which gives us 1:59. Next step could be to check the real values of file's headers and if they are correct and it is possible to calculate correct duration - well than you could cook up some hack to get durations separately (from the server for example). But I'm pretty sure - that file has some defects (inconsistent headers and content) and some players can handle it, others - not.