How to find the average of compression ratios of video manually? - video-processing

If the video coding standard is MPEG-1, the given frame sequence is 'IBBPBBPBBPBBI', the compression ratios for I, P and B frames are 0.1, 0.05 and 0.02, and the video sequence is longer than 12 frames, what will the average compression be?
I am new to image processing and am having difficulty relating all the terms. How do I find it?

This looks like homework, and so I'm tempted to not give you a straight answer.
That GOP structure you've shown has an I at both the front and the end. That indicates that this is then repeated. So your GOP will repeat the structure IBBPBBPBBPBB indefinitely.
The average compression would be how much each GOP is compressed compared to the original size of the 12 images. The incoming images are all the same size, since they are uncompressed video frames. The I frame is compressed to 0.1 of its original size, the P frames to 0.05 of their original size, and the B frames to 0.02. For a GOP, you have 1 I picture, 3 P pictures and 8 B pictures. So the average ratio over a GOP is...?

So the answer will be
(0.1*1 + 0.05*3 + 0.02*8)/12
= 0.41/12
≈ 0.0342
That is, 0.1 for the 1 I frame, 0.05 for each of the 3 P frames and 0.02 for each of the 8 B frames, divided by the total number of frames (12).
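The same arithmetic can be checked with a short Python sketch. It assumes, as the answer above does, that every uncompressed frame has the same size and that the repeating GOP is IBBPBBPBBPBB; the function name `average_compression` is only illustrative.

```python
def average_compression(gop, ratios):
    """Mean compression ratio over one GOP, assuming equal-sized input frames."""
    return sum(ratios[frame] for frame in gop) / len(gop)

# One repeating GOP: the trailing I in the question starts the next GOP.
gop = "IBBPBBPBBPBB"
ratios = {"I": 0.1, "P": 0.05, "B": 0.02}

avg = average_compression(gop, ratios)
print(f"frames per GOP: {len(gop)}, average ratio: {avg:.4f}")
```

Because the sequence is longer than 12 frames and the GOP repeats, the average over the whole video approaches the per-GOP average.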

Related

How can I write a loop to get samples from different range of a variable to draw histograms for them in another dataset?

In a data frame that has 2 columns of name and pvalue, I need to write a loop to get 20 samples (samples are gene set names which are sometimes too long) from different range of p-values including:
Less than or equal to 0.001
Between 0.001 and 0.01
Between 0.01 and 0.05
Between 0.05 and 0.10
Between 0.10 and 0.20
Between 0.20 and 0.50
Larger than 0.50
and then, for each sampling range, I want to find the names of these 20 samples in another dataset and draw a histogram for each sample on one sheet. Finally I need to draw the histograms of these 20 names in 4 rows and 5 columns. I would like to write a loop to do this in a smart way, as I need to repeat this process several times. I am also new to R programming and not familiar with writing loops, so what I want to do is a little bit complicated for me. I appreciate any help. Thank you!
I think I have to start with getting 20 samples.
MAIN <- sample(DATA$name[DATA$pvalue <= 0.001], 20, replace = FALSE)
It gives me the names of 20 samples.
Now I want to find each name in a new dataset. The new dataset is like the previous one, including name and pvalue, but each name is repeated about 100 times. I want to draw a histogram for each name, so in total I would like to have 20 histograms on one sheet. I don't have any idea how to do this part.

Finding groups of clusters in binary array

I have a binary image from which I have extracted all the pixels and written them to a txt file. I am trying to find how many clusters of 25 or more 1's there are in the array, and where they are.
I am using DBSCAN with Euclidean distances:
db_scan = DBSCAN(eps=1, min_samples=25,metric='euclidean', metric_params=None, algorithm='auto').fit(im_bw)
I expect to find the i, j location of the center of each cluster. I expect to find the number of clusters, but it says I only have 1.
It probably says 0 clusters, only noise.
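Because within a radius of 1 pixel you won't have 25 pixels: if the points are integer pixel coordinates, a pixel has at most 5 points within Euclidean distance 1 (itself plus its 4 direct neighbours), so min_samples=25 can never be reached.
Note that you need to make sure you choose the right representation and don't make false assumptions about what the algorithm does. For example, it does not produce centers.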
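As a rough illustration of the representation issue, here is a sketch that does not use DBSCAN at all: it works on the (i, j) coordinates of the 1-pixels, groups them by plain 4-connected component labelling, keeps groups of at least 25 pixels, and computes each group's center as the mean coordinate. This is a swapped-in technique, and the function name `find_clusters` and the `min_size` parameter are hypothetical, not part of scikit-learn.

```python
from collections import deque

def find_clusters(image, min_size=25):
    """Return (center_i, center_j, size) for each 4-connected group of 1s
    that contains at least min_size pixels."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    clusters = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited 1-pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                i, j = queue.popleft()
                pixels.append((i, j))
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < rows and 0 <= nj < cols
                            and image[ni][nj] == 1 and not seen[ni][nj]):
                        seen[ni][nj] = True
                        queue.append((ni, nj))
            if len(pixels) >= min_size:
                # The center is the mean coordinate of the group's pixels.
                ci = sum(p[0] for p in pixels) / len(pixels)
                cj = sum(p[1] for p in pixels) / len(pixels)
                clusters.append((ci, cj, len(pixels)))
    return clusters

# Small test image: one 5x5 block of 1s plus an isolated pixel.
image = [[0] * 10 for _ in range(10)]
for i in range(2, 7):
    for j in range(3, 8):
        image[i][j] = 1
image[9][9] = 1

clusters = find_clusters(image)
print(len(clusters), clusters)
```

If DBSCAN is still preferred, the same idea applies: pass it the list of pixel coordinates rather than the image rows, and choose eps and min_samples so that a neighbourhood can actually contain min_samples points.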

Sampling Calculation / multimedia

I have a serious exam tomorrow and this is one of the sample questions provided. I tried to solve this problem many times but could never get an accurate answer. There is no information about these calculations in my lecture materials. I googled many things and looked for ways of calculating this in two different books that I have, but could not find anything related. I do not know the exact subject name for this sort of calculation, but I think it is multimedia/sampling. I would greatly appreciate any information regarding the problem; seriously, any briefing would do. I just want to be able to solve it. I have quoted the question below.
"A supermarket must store text, image and video information on 2,000
items. There is text information associated with each item occupying 0.5
Kb. For 200 items, it is also necessary to store an image consisting of 1
million pixels. Each pixel represents one of 255 colours. For 10 items, it is
also necessary to store a 4 second colour video (25 frames per second), to
be viewed on a screen with a resolution of 1000 x 1000 pixels. The total
storage required for the database is:"
TOTAL = (2,000 items x 0.5 kilobytes) +
(200 items x (1,000,000 pixels x 1 byte each)) +
(10 items x (25 frames x 4 seconds) x (1,000 pixels x 1,000 pixels x 1 byte each))
= 1,000,000 + 200,000,000 + 1,000,000,000
= 1,201,000,000 bytes = 1.201 GB
Notes:
Kb could represent either 1,000 or 1,024 bytes, depending on how coherent your syllabus is. Given the choice of the other numbers, I imagine it is 1,000.
Each of 255 colours fits in a single byte (for example an unsigned TINYINT), since one byte can hold 256 distinct values (0 to 255).
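The calculation can be reproduced with a small Python sketch. It follows the assumptions made in the answer above: 1 byte per pixel for both the images and the video frames, no compression, and a configurable number of bytes per kilobyte. The function name `total_storage_bytes` is only illustrative.

```python
def total_storage_bytes(bytes_per_kb=1000):
    """Total database size in bytes under the answer's assumptions."""
    text = 2000 * bytes_per_kb // 2            # 2,000 items x 0.5 KB
    images = 200 * 1_000_000 * 1               # 200 images, 1 byte per pixel
    frames = 25 * 4                            # 25 fps x 4 seconds
    video = 10 * frames * (1000 * 1000) * 1    # 10 clips, 1 byte per pixel
    return text + images + video

total = total_storage_bytes()
print(f"{total:,} bytes = {total / 1e9:.3f} GB")
```

Calling `total_storage_bytes(1024)` shows how little the choice of kilobyte definition matters here, since the text portion is tiny compared with the video.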

Is the Leptonica implementation of 'Modified Median Cut' not using the median at all?

I'm playing around a bit with image processing and decided to read up on how color quantization worked and after a bit of reading I found the Modified Median Cut Quantization algorithm.
I've been reading the code of the C implementation in Leptonica library and came across something I thought was a bit odd.
Now I want to stress that I am far from an expert in this area, nor am I a math-head, so I am predicting that this all comes down to me not understanding all of it, and not that the implementation of the algorithm is wrong at all.
The algorithm states that the vbox should be split along the largest axis, and that it should be split using the following logic:
The largest axis is divided by locating the bin with the median pixel
(by population), selecting the longer side, and dividing in the center
of that side. We could have simply put the bin with the median pixel
in the shorter side, but in the early stages of subdivision, this
tends to put low density clusters (that are not considered in the
subdivision) in the same vbox as part of a high density cluster that
will outvote it in median vbox color, even with future median-based
subdivisions. The algorithm used here is particularly important in
early subdivisions, and is useful for giving visible but low
population color clusters their own vbox. This has little effect on
the subdivision of high density clusters, which ultimately will have
roughly equal population in their vboxes.
For the sake of the argument, let's assume that we have a vbox that we are in the process of splitting and that the red axis is the largest. In the Leptonica algorithm, on line 01297, the code appears to do the following
Iterate over all the possible green and blue variations of the red color
For each iteration it adds to the total number of pixels (population) it's found along the red axis
For each red color it sums up the population of the current red and the previous ones, thus storing an accumulated value for each red
note: when I say 'red' I mean each point along the axis that is covered by the iteration, the actual color may not be red but contains a certain amount of red
So for the sake of illustration, assume we have 9 "bins" along the red axis and that they have the following populations
4 8 20 16 1 9 12 8 8
After the iteration of all red bins, the partialsum array will contain the following count for the bins mentioned above
4 12 32 48 49 58 70 78 86
And total would have a value of 86
Once that's done it's time to perform the actual median cut and for the red axis this is performed on line 01346
It iterates over the bins and checks their accumulated sums. And here's the part that throws me off from the description of the algorithm: it looks for the first bin that has a value greater than total/2.
Wouldn't total/2 mean that it is looking for a bin that has a value greater than the average value and not the median? The median for the above bins would be 49.
The use of 43 or 49 could potentially have a huge impact on how the boxes are split, even though the algorithm then proceeds by moving to the center of the larger side of where the matched value was.
Another thing that puzzles me a bit is that the paper specifies that the bin with the median value should be located, but does not mention how to proceed if there is an even number of bins. The median would be the result of (a+b)/2, and it's not guaranteed that any of the bins contains that population count. So this is what makes me think that there are some approximations going on that are negligible because of how the split actually takes place at the center of the larger side of the selected bin.
Sorry if it got a bit long-winded, but I wanted to be as thorough as I could because it's been driving me nuts for a couple of days now ;)
In the 9-bin example, 49 is the number of pixels in the first 5 bins. 49 is the median number in the set of 9 partial sums, but we want the median pixel in the set of 86 pixels, which is 43 (or 44), and it resides in the 4th bin.
Inspection of the modified median cut algorithm in colorquant2.c of leptonica shows that the actual cut location for the 3d box does not necessarily occur adjacent to the bin containing the median pixel. The reasons for this are explained in the function medianCutApply(). This is one of the "modifications" to Paul Heckbert's original method. The other significant modification is to make the decision of which 3d box to cut next based on a combination of both population and the product (population * volume), thus permitting splitting of large but sparsely populated regions of color space.
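To make the distinction concrete, here is a minimal Python sketch of the "first partial sum greater than total/2" test described in the question, applied to the 9-bin example. It is not Leptonica's code, and the function name `bin_with_median_pixel` is hypothetical; it only shows that the test locates the bin holding the median pixel, not the median of the partial sums.

```python
from itertools import accumulate

def bin_with_median_pixel(populations):
    """Index of the first bin whose cumulative population exceeds total/2."""
    partial_sums = list(accumulate(populations))
    total = partial_sums[-1]
    for index, running in enumerate(partial_sums):
        if running > total / 2:
            return index, partial_sums, total

populations = [4, 8, 20, 16, 1, 9, 12, 8, 8]
index, partial_sums, total = bin_with_median_pixel(populations)
print(partial_sums, total, index)
```

With these populations the 43rd and 44th pixels both fall in the bin at index 3 (the 4th bin), whose partial sum is 48.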
I do not know the algo, but I would assume your array contains the population of each red; let's explain this with an example:
Assume you have four gradations of red: A,B,C and D
And you have the following sequence of red values:
AABDCADBBBAAA
To find the median, you would have to sort them according to red value and take the middle:
    median
      v
AAAAAABBBBCDD
Now let's use their approach:
A:6 => 6
B:4 => 10
C:1 => 11
D:2 => 13
13/2 = 6.5 => B
I think the mismatch happened because you are counting the population; the average color would be:
(6*A+4*B+1*C+2*D)/13
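The A/B/C/D example can be checked with a short Python sketch, which compares the median found by sorting the full sequence with the one found from cumulative counts. The function names are illustrative only, and the letters simply stand in for ordered red levels.

```python
from collections import Counter

def median_by_sorting(values):
    """Middle element of the sorted sequence."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]

def median_by_cumulative_count(values):
    """First level whose cumulative count exceeds half the population."""
    counts = Counter(values)
    half = len(values) / 2
    running = 0
    for level in sorted(counts):
        running += counts[level]
        if running > half:
            return level

reds = "AABDCADBBBAAA"
print(median_by_sorting(reds), median_by_cumulative_count(reds))
```

Both approaches pick the same level because walking the cumulative counts is just a compact way of walking the sorted sequence.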

Increasing the pitch of audio using a varied value

Okay, this is a bit of a maths and DSP question.
Let us say I have 20,000 samples which I want to resample at a different pitch. Twice the normal rate for example. Using an Interpolate cubic method found here I would set my new array index values by multiplying the i variable in an iteration by the new pitch (in this case 2.0). This would also set my new array of samples to total 10,000. As the interpolation is going double the speed it only needs half the amount of time to finish.
But what if I want my pitch to vary throughout the recording? Basically I would like it to slowly increase from the normal rate to 8 times faster (at the 10,000 sample mark) and then go back to 1.0. It would be an arc. My questions are:
How do I calculate how many samples the final audio track would have?
How do I create an array of pitch values that represents this increase from 1.0 to 8.0 and back to 1.0?
Mind you this is not for live audio output, but for transforming recorded sound. I mainly work in C, but I don't know if that is relevant.
I know this probably is complicated, so please feel free to ask for clarifications.
To represent an increase from 1.0 to 8.0 and back, you could use a function of this form:
f(x) = 1 + 7/2*(1 - cos(2*pi*x/y))
Where y is the number of samples in the resulting track.
It will start at 1 for x=0, increase to 8 for x=y/2, then decrease back to 1 for x=y.
[Figure omitted: plot of f(x) for y=10]
Now we need to find the value of y as a function of z, the original number of samples (20,000 in this case, but let's be general). For this we solve: integral from 0 to y of 1 + 7/2*(1 - cos(2*pi*x/y)) dx = z. The cosine term integrates to zero over a full period, so the left side is y + 7/2*y = 9*y/2, and the solution is y = 2*z/9 = z/4.5, nice and simple :)
Therefore, for an input with 20,000 samples, you'll get 4,444 samples in the output.
Finally, instead of multiplying the output index by the pitch value, you can access the original samples like this: output[i] = input[g(i)], where g is the integral of the above function f:
g(x) = (9*x)/2-(7*y*sin((2*pi*x)/y))/(4*pi)
[Figure omitted: plot of g(x) for y=4444]
In order not to end up with aliasing in the result, you will also need to low pass filter before or during interpolation using either a filter with a variable transition frequency lower than half the local sample rate, or with a fixed cutoff frequency more than 16X lower than the current sample rate (for an 8X peak pitch increase). This will require a more sophisticated interpolator than a cubic spline. For best results, you might want to try a variable width windowed sinc kernel interpolator.
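The formulas for f and g can be put together in a small Python sketch. It is only an illustration: it uses linear interpolation instead of the cubic interpolator from the question, it includes no low-pass filtering (so it will alias on real audio), and the function names are hypothetical.

```python
import math

def pitch(x, y):
    """f(x): pitch factor at output index x, for an output of y samples."""
    return 1.0 + 3.5 * (1.0 - math.cos(2.0 * math.pi * x / y))

def source_position(x, y):
    """g(x): integral of f from 0 to x, the read position in the input."""
    return 4.5 * x - 7.0 * y * math.sin(2.0 * math.pi * x / y) / (4.0 * math.pi)

def resample_variable_pitch(samples):
    """Resample so the pitch rises from 1x to 8x and back (no anti-aliasing)."""
    z = len(samples)
    y = (2 * z) // 9                  # output length, y = z / 4.5 rounded down
    output = []
    for i in range(y):
        pos = source_position(i, y)
        lo = min(int(pos), z - 1)     # clamp to stay inside the input
        hi = min(lo + 1, z - 1)
        frac = pos - lo
        # Linear interpolation between the two nearest input samples.
        output.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return output

# A ramp input makes the mapping easy to inspect: output[i] is close to g(i).
ramp = [float(k) for k in range(20000)]
out = resample_variable_pitch(ramp)
print(len(out), pitch(0, len(out)), pitch(len(out) / 2, len(out)))
```

For real audio, the linear interpolation step would be replaced by a band-limited interpolator as described above.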

Resources