Can a pixel be divided into smaller pixels? Or is it possible to have 1.54 pixels instead of 1 or 2? - c

I am doing an exercise where I have to resize an image "f" times. "f" is a float, so I have to consider 1.45, 3.54, and so on. I don't want you to solve the problem, but I have some doubts about it.
A pixel is 24 bits in a BMP file, right? It is RGB, so it has 1 byte for red, 1 byte for green and 1 byte for blue. So how am I supposed to divide a pixel? If I have a factor of 2.67, for example, how would the 0.67 part work? Dividing a pixel means dividing 3 bytes, but there is a limit to how far I can divide them. Also, RGB would disappear, because if I divided a pixel in half I would only have 12 bits, which is not enough to store RGB.
Also, when I am copying pixel by pixel, is it possible to copy 0.01 of a pixel at a time instead of a whole pixel? That would mean that if it takes 1 step to copy 1 pixel, copying 0.01 of a pixel at a time would take 100 times as long. It sounds completely weird to me, because copying 0.01 of a pixel at a time means copying a fraction of a byte at a time, and that may mess up the image when resizing (I think).
I have tried it with integer factors, but a for loop does not work for floating-point factors, because of all the possibilities.

I don't think you're being asked to split an individual pixel. It sounds like you're being asked to add or remove pixels when an image is resized. For example, suppose you have an image that is 12 x 12 pixels and you are given a factor of 1.3 to expand by. This gives you a new image size of 15.6 x 15.6, which rounds to 16 x 16.
Then you need to perform a mapping of pixels in the original image to pixels in the resized image. A simple way to do this is to take the x and y coordinates of each pixel in the new image and divide them by the scaling factor to get the corresponding coordinates in the original image, then copy the whole pixel from the old image to the new one. Given the above example, pixel (13,14) in the larger image corresponds to x = 13/1.3 = 10 and y = 14 / 1.3 = ~10.76 (rounds to 11), so copy pixel (10,11) in the old image to (13,14) in the new image.
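Here is a minimal sketch of that nearest-neighbour mapping in C, assuming a hypothetical 24-bit Pixel struct and row-major pixel buffers already read from the BMP (BMP row padding and BGR byte order are ignored here):

#include <math.h>
#include <stdlib.h>

typedef struct { unsigned char r, g, b; } Pixel;   /* 24-bit pixel */

/* Nearest-neighbour resize: for every pixel of the new image, find the
   corresponding source pixel by dividing its coordinates by the factor. */
Pixel *resize_nearest(const Pixel *src, int w, int h, float f,
                      int *out_w, int *out_h)
{
    int nw = (int)roundf(w * f), nh = (int)roundf(h * f);
    Pixel *dst = malloc((size_t)nw * nh * sizeof *dst);
    if (!dst) return NULL;

    for (int y = 0; y < nh; y++) {
        for (int x = 0; x < nw; x++) {
            int sx = (int)roundf(x / f);   /* source column */
            int sy = (int)roundf(y / f);   /* source row    */
            if (sx >= w) sx = w - 1;       /* clamp at the border */
            if (sy >= h) sy = h - 1;
            dst[y * nw + x] = src[sy * w + sx];   /* copy a whole pixel */
        }
    }
    *out_w = nw;
    *out_h = nh;
    return dst;
}

Note that whole pixels are always copied; the fractional factor only affects which pixel gets copied where.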

@dbush was very clear. But you can also build a more refined scaling algorithm with these two observations.
Observation 1
In @dbush's example, he expands a 12 x 12 image to 16 x 16 because 15.6 x 15.6 is impossible to make (pixels are a discrete unit). But by doing this the scale factor is no longer 1.3; it is now 16/12 = 1.3333.... So you should use that number when making the adjustments he describes.
Observation 2
In @dbush's example, the pixel (13, 14) (counting pixels from 0 to 15, I suppose) is mapped to the point (10, 10.76). Since this pixel doesn't exist, he rounds its coordinates and uses (10, 11) instead. But (10, 10.76) is the coordinate of the upper left corner of a little rectangle inside the original image. A normal pixel is a square of size 1 x 1, but this little rectangle is a pixel scaled down by the same factor of 1.3: its side is 1/1.3 = 0.77 (approx.). That means the little rectangle has its lower right corner at (10.77, 11.53).
This little rectangle, which has to be mapped to the new image, has 11 - 10.76 = 0.24 units of its height inside pixel (10, 10), and 11.53 - 11 = 0.53 units of its height inside pixel (10, 11). So the RGB values for the new pixel must be a weighted average of the RGB values of pixels (10, 10) and (10, 11), using 0.24 and 0.53 as weights (normalised so they sum to 1). This is what gives your code the power to scale images by factors smaller than 1.
Notes
I used the word "rectangle" because I was considering that an image could have a horizontal scale factor different from the vertical one. In this particular case the scale was 1.3 both horizontally and vertically.
The weighted sum uses only heights as weights because the little rectangle only crosses a pixel boundary along the vertical axis; it happened that along the horizontal axis the little rectangle stayed inside a single pixel column. But there could be a scenario where the rectangle crosses pixel boundaries both horizontally and vertically, or even spans more than 2 pixels along the same axis. So the weighted sum should be prepared to consider more than 2 pixels per axis and to use overlap areas instead of widths or heights when both axes are involved for a single rectangle.
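To make Observation 2 concrete, here is a C sketch (same hypothetical Pixel struct and row-major buffer as in the sketch above) that computes one destination pixel as an area-weighted average of every source pixel touched by the back-projected rectangle; it handles both axes and factors smaller than 1:

#include <math.h>

typedef struct { unsigned char r, g, b; } Pixel;

/* Back-project destination pixel (dx, dy) into the source image as a
   rectangle of size 1/f x 1/f and average all source pixels it touches,
   weighted by the overlapped area (weights are normalised by wsum). */
Pixel sample_area(const Pixel *src, int w, int h, float f, int dx, int dy)
{
    float x0 = dx / f, y0 = dy / f;              /* upper left corner  */
    float x1 = (dx + 1) / f, y1 = (dy + 1) / f;  /* lower right corner */
    float rsum = 0, gsum = 0, bsum = 0, wsum = 0;

    for (int sy = (int)floorf(y0); sy < (int)ceilf(y1); sy++) {
        for (int sx = (int)floorf(x0); sx < (int)ceilf(x1); sx++) {
            /* overlap of source pixel [sx,sx+1]x[sy,sy+1] with the rectangle */
            float ox = fminf(x1, sx + 1.0f) - fmaxf(x0, (float)sx);
            float oy = fminf(y1, sy + 1.0f) - fmaxf(y0, (float)sy);
            if (ox <= 0 || oy <= 0) continue;
            int cx = sx < 0 ? 0 : (sx >= w ? w - 1 : sx);  /* clamp at border */
            int cy = sy < 0 ? 0 : (sy >= h ? h - 1 : sy);
            const Pixel *p = &src[cy * w + cx];
            rsum += p->r * ox * oy;
            gsum += p->g * ox * oy;
            bsum += p->b * ox * oy;
            wsum += ox * oy;
        }
    }
    Pixel out = { (unsigned char)(rsum / wsum + 0.5f),
                  (unsigned char)(gsum / wsum + 0.5f),
                  (unsigned char)(bsum / wsum + 0.5f) };
    return out;
}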

A pixel represents a point on the screen and it is atomic. If you need to resize the image, you need an algorithm that increases or decreases the number of rows and columns, so that you never deal with partial pixels.

Related

Robustly finding the local maximum of an image patch with sub-pixel accuracy

I am developing a SLAM algorithm in C, and I have implemented the FAST corner finding method which gives me some strong keypoints in the image. The next step is to get the center of the keypoints with a sub-pixel accuracy, therefore I extract a 3x3 patch around each of them, and do a Least Squares fit of a two dimensional quadratic:
f(x, y) = a*x^2 + b*x*y + c*y^2 + d*x + e*y + g

where f(x,y) is the corner saliency measure of each pixel, similar to the FAST score proposed in the original paper, but modified to also provide a saliency measure for non-corner pixels.
And the least squares estimate:

b_hat = (X^T X)^-1 X^T z

with b_hat = [a, b, c, d, e, g]^T being the estimated parameters.
I can now calculate the location of the peak of the fitted quadratic, by taking the gradient equal to zero, achieving my original goal.
The issue arises in some corner cases, where the local peak is close to the edge of the window, resulting in a fit with low residuals but with the peak of the quadratic far outside the window.
An example (figures omitted): the corner saliency with a contour of the fitted quadratic, and the saliency (blue) and fit (red) as 3D meshes.
Numeric values of this example are (row-major ordering):
[336, 522, 483, 423, 539, 153, 221, 412, 234]
The resulting sub-pixel center of (2.6, -17.1) is clearly wrong.
How can I constrain the fit so the center is within the window?
I'm open to alternative methods for finding the sub pixel peak.
The obvious answer is to reject 3x3 (or 5x5, whatever you use) boxes whose discrete maximum is not at the center. In other words, to use a quadratic approximation only to refine the location of a maximum that must be located inside the box.
More generally, in such cases the first questions to ask are not "How do I constrain my model-fitting procedure to shoehorn a solution for this edge case?", but rather
"Does my model apply to this edge case?" and "Is this edge case even worth spending time on, or can I just ignore it?"
I tried my own code to fit a 2D quadratic function to the 3x3 values, using a stable least-squares solving algorithm, and also found a maximum outside of the domain. The 3x3 patch of data does not match a quadratic function, and therefore the fit is not useful.
Fitting a 2D quadratic to a 3x3 neighborhood requires a degree of smoothness in the data that you don't seem to have in your FAST output.
There are many other methods to find the sub-pixel location of the maximum. One that I like, because it is more stable and less computationally intensive, is fitting a "separable" quadratic function. In short, you fit a quadratic function to the three values around the local maximum in one dimension, and then another in the other dimension. Instead of solving for 6 parameters with 9 values, this solves for 3 parameters with 3 values, twice. The solution is guaranteed to be stable, as long as the center pixel is larger than or equal to all pixels in its 4-connected neighborhood.
z1 = [f(-1,0), f(0,0), f(1,0)]^T

    [1, -1, 1]
X = [0,  0, 1]
    [1,  1, 1]

solve: X b1 = z1

and

z2 = [f(0,-1), f(0,0), f(0,1)]^T

    [1, -1, 1]
X = [0,  0, 1]
    [1,  1, 1]

solve: X b2 = z2
Now you get the x-coordinate of the peak from b1 and the y-coordinate from b2: each solution describes a 1-D parabola a*t^2 + b*t + c, whose peak lies at t = -b / (2a).
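For reference, those two small solves have a closed form: a parabola through (-1, z[-1]), (0, z[0]) and (+1, z[+1]) peaks at t = (z[-1] - z[+1]) / (2 * (z[-1] - 2*z[0] + z[+1])). A small C sketch applying this to the 3x3 patch from the question (the x = column, y = row convention is an assumption):

#include <stdio.h>

/* Sub-pixel offset of the peak of a parabola through (-1,zm), (0,z0), (+1,zp).
   Returns 0 if the centre value is not a strict local maximum. */
static double parabolic_offset(double zm, double z0, double zp)
{
    double denom = zm - 2.0 * z0 + zp;    /* equals 2a of a*t^2 + b*t + c */
    if (denom >= 0.0) return 0.0;         /* flat or not a maximum        */
    return 0.5 * (zm - zp) / denom;       /* -b / (2a)                    */
}

int main(void)
{
    /* 3x3 saliency patch from the question, row-major */
    double f[3][3] = { {336, 522, 483},
                       {423, 539, 153},
                       {221, 412, 234} };
    /* one 1-D fit per axis, both passing through the centre pixel (1,1) */
    double dx = parabolic_offset(f[1][0], f[1][1], f[1][2]);
    double dy = parabolic_offset(f[0][1], f[1][1], f[2][1]);
    printf("sub-pixel peak at (%.3f, %.3f)\n", 1.0 + dx, 1.0 + dy);
    return 0;
}

Because the centre pixel is required to be a local maximum, both offsets stay within [-0.5, +0.5], so the refined peak can never leave the window.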

RGB 0-1 nomenclature

Stoopid question time!
RGB colours have three values (red, green and blue, ranged from 0 to 255). If those values are ranged from 0 to 1, what is the name for this colourspace*?
Is it RGB 0-1? RGB digital? Unreal RGB?
*and if alpha channel is included RGBA 0-1.
Unfortunately there is no real nomenclature, and sometimes the same word is interpreted differently depending on which field you studied (computer graphics, TV broadcasting, digital video formats, photography).
First: RGB is not a colour space but a colour model, so it gives just an idea of how colours are made up, but nothing precise.
When we say RGB, we usually mean either linear RGB or gamma-corrected RGB (R'G'B'). Linear values indicate the intensity of light; gamma-corrected values follow more closely how we perceive colours. So a half grey (which looks half way between white and black) is around 18% in linear RGB, or 50% in a gamma-corrected space.
Then we have colour spaces, like sRGB. A colour space defines the chromaticities of R, G and B and the chromaticity of white (usually the chromaticities are given as the x,y coordinates of CIE xyY). Rec.709 (HDTV) has the same R, G and B chromaticities and the same white point as sRGB, but a different gamma.
Often a colour space defines further characteristics. sRGB (originally) defines values from 0 to 255, i.e. a byte, always gamma corrected. Previously it was common to use 100 or 1.0 as the value for white (a triplet of such values). Note: values above 1.0 or 100 can be valid; on old analogue TV such values occur (limited to part of the screen and to a few frames, but still allowed).
For digital signals (e.g. HDMI) we have full-range RGB (0 to 255) or limited-range RGB (16 to 235).
I do not know a good nomenclature, but usually it is obvious from context. In general, linear RGB has (0.0, 0.0, 0.0) for black and (1.0, 1.0, 1.0) for white, stored as floating-point numbers (half or single precision). Floating-point numbers are already a sort of exponential representation, like gamma correction, yet still linear: adding light is just an addition (and it doesn't produce an unwanted colour cast).
In non-linear (gamma-corrected) colour spaces we tend to use integers, often 0 to 255 (or 0 to 1023 in 10-bit); the colour depth is sometimes given as a total and sometimes as bits per channel. So if you are given a colour depth, you are working with integers, with values from 0 to 2**channel_depth - 1.
It is always good to specify the values, also because the lack of nomenclature often causes confusion. You see many problems caused by people not realizing whether an HDMI image is full range or limited range.
My take: "8-bit sRGB" (or 24-bit) means 0 to 255, and analogously for 10-bit, etc. "Linear RGB" (or anything with floating-point constants, or where you also see R'G'B' in the text) means 0.0 to 1.0. If you also see YCC somewhere, start worrying, because you never know whether you have full range or limited range. The 0-to-100 convention is found in textbooks (especially old ones), but in that case I prefer to add a % sign, which effectively makes the value 0 to 1.0 again.
But as with file formats, somewhere you should describe your colour space fully: the chromaticities (e.g. by referring to sRGB, DCI-P3, Apple P3, AdobeRGB, etc.), which gamma correction you use (there are various functions; Apple used a different one on old hardware), black as 0,0,0 and white as 1.0,1.0,1.0 (possibly with the allowed range, e.g. negative numbers for out-of-gamut colours, or values above 1.0 for ultra-bright ones), and the precision (16-bit [half precision] or 32-bit [single precision] per channel, or the classic 8-bit, 10-bit, 12-bit for integers). You need to do this only once, but it is better to be explicit, especially considering that people in different fields have different expectations.
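As an illustration of the 8-bit gamma-corrected vs. linear 0.0-1.0 distinction above, here is a small C sketch of the standard piecewise sRGB transfer function:

#include <math.h>

/* Decode an 8-bit sRGB channel value (0..255, gamma corrected)
   into linear light in the 0.0..1.0 range. */
double srgb8_to_linear(unsigned char v)
{
    double c = v / 255.0;            /* normalised, still gamma corrected */
    return (c <= 0.04045) ? c / 12.92
                          : pow((c + 0.055) / 1.055, 2.4);
}

/* Encode linear light (0.0..1.0) back into an 8-bit sRGB value. */
unsigned char linear_to_srgb8(double lin)
{
    if (lin <= 0.0) return 0;
    if (lin >= 1.0) return 255;
    double c = (lin <= 0.0031308) ? lin * 12.92
                                  : 1.055 * pow(lin, 1.0 / 2.4) - 0.055;
    return (unsigned char)(c * 255.0 + 0.5);
}

For example, the 8-bit value 128 (about 50% of full scale) decodes to roughly 0.22 in linear light, which matches the ~18%/50% mid-grey relationship mentioned above.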
RGB is a color space, in which colors are defined as proportions of its components, so RGB colors are 3 values from 0 to 1. True color is sometimes referred to as RGB, but that's not technically correct, since it's a combination of a color space (RGB) and a color depth (3*8 bits).
RGB 0-1 is fine, but in my opinion RGB digital is more fitting for the 0-255 range, and RGB unreal is really ill-named, since it uses real numbers instead of integers for the color representation.
With 0 and 1 as RGB values you get these combinations:
000
001
010
100
110
101
011
111
Eight different colors: Black, then blue, green, red, then yellow, magenta, cyan, then white.
(In the order that I listed the rgb color bits in my list)
If you double the set with a bold attribute, you get the 16 official Linux console colors. And I guess that color set existed before: it is what you get when you start with 3 "base" colors and mix them to get 6.
A bit of number magic also: 2^3 = 2*3 + 1 + 1.
8 minus black minus white is 6 colors, in two groups.
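A tiny C sketch that enumerates those eight combinations, with the bold/bright doubling that gives the 16 classic console colours (the bright names here are simplified):

#include <stdio.h>

int main(void)
{
    /* names for the eight 0/1 RGB combinations, indexed by r*4 + g*2 + b
       so the bit order matches the R,G,B listing above */
    const char *name[8] = { "black", "blue",    "green",  "cyan",
                            "red",   "magenta", "yellow", "white" };
    for (int i = 0; i < 8; i++) {
        int r = (i >> 2) & 1, g = (i >> 1) & 1, b = i & 1;
        printf("%d%d%d  normal: %-7s  bold: bright %s\n",
               r, g, b, name[i], name[i]);
    }
    return 0;
}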

GBA dev - Affine sprite pauses every 90 degrees, BG unaffected

EDIT:
I have stripped the program down to its basics: https://github.com/aidan-aidan/temp/blob/master/source/vpp.c
I have posted this question over at the gba-dev forums but they seem pretty dead and I have not gotten a response after many days.
Here is a video showing the problem: http://youtu.be/8gweFiSobwc (different from the code above, you can run the rom if you want to see it)
As you can see the BG rotation is unaffected, though they use the same LUT and they have the same data types in their structures as each other.
I have redone the LUT a few times to no avail, and this problem only surfaced after switching to 2048 circle divisions rather than 256.
Looking at the memory viewer in VBA, I can see that pb is not behaving the same as pa. pa ranges from 0x0100 to 0 as I would expect (the same as 1 to 0), but pb ranges from 0 to 0x0096 when the gun is right of the center line, and jumps up to 0xFFFF as soon as it goes left of the center line. The only thing I can figure is that it is going negative (which makes sense, as the cosine of that angle should be negative), but I don't fully understand two's complement so I can't be sure. 0 "degrees" is horizontal on the right for the gun. 512 would be ninety degrees.
I have included everything I can think of, any help is appreciated.
Thanks!
As near as I can tell the program is technically working correctly; the visual artifacts stem from the limited 8-bit fixed-point precision of the affine transformation used by the hardware.
Essentially the difference between a 0 and a 1 in the sine tables makes a rather large and visible jump in the rotation for the right angles of your rectangular sprite, while the 0°/90°/180°/270° corners are not that special for the round earth background.
The slope of the (co)sine function at its zero crossings is at its steepest, so if you look at your tables there is only a single spot where they are precisely 0. If the rotation doesn't fall precisely on this index the result looks visibly skewed.
What you may try, however, is to cheat by fiddling with the rounding to produce more consecutive zeroes. At the moment your table generator is scaling by 256 and rounding towards the nearest integer:
sine[i] = floor(sin(i * M_PI / 1024.0) * 256.0 + 0.5);
As an alternative we may skew the table a little by rounding towards zero, as is the default in C, and increasing the scale to ensure that the table still reaches 256 for more than one entry.
int value = sin(i * M_PI / 1024.0) * 257.0;
if(value < -256) value = -256;
if(value > +256) value = +256;
sine[i] = value;
The table may then be flattened even further by increasing the scaling factor some more.
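Putting that together, here is a sketch of a complete table generator along those lines, assuming 2048 circle divisions and an 8.8 fixed-point range of -256..+256 as in the question (the array name and output format are arbitrary):

#include <math.h>
#include <stdio.h>

#define DIVISIONS 2048   /* circle divisions, per the question          */
#define SCALE     257.0  /* slightly above 256 so that truncation still */
                         /* reaches +/-256 for more than one entry      */

int main(void)
{
    int sine[DIVISIONS];

    for (int i = 0; i < DIVISIONS; i++) {
        /* truncate towards zero instead of rounding to nearest, so a small
           band of angles around each zero crossing maps to exactly 0 */
        int value = (int)(sin(i * 2.0 * M_PI / DIVISIONS) * SCALE);
        if (value < -256) value = -256;
        if (value > +256) value = +256;
        sine[i] = value;
    }

    /* dump the table as C source */
    for (int i = 0; i < DIVISIONS; i++)
        printf("%d,%c", sine[i], (i % 8 == 7) ? '\n' : ' ');
    return 0;
}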
The problem is also that for small sprites near right angles there is nowhere for the rotation to go.
Picture a 64x1 straight-line sprite pointing almost exactly vertically, rotated by only the minimum 1/256th fractional step from the sine table. Keep in mind that the rotation is always centered, so you get one jagged horizontal step precisely at the center.
For each of the 32 pixels from the center outwards you then add up another 1/256th fraction to the texture coordinate. However all of these together only make for 1/8th of a pixel in total and so no more horizontal steps are taken. In fact for this sprite shape you will see no visible change until the rotation angle reaches a full 8/256th fraction.
On the other hand, your large earth object is sufficiently big to always show multiple visible integer "steps." This means that an increased rotation angle will always at least shift the position of the non-centered steps, even if it is insufficient to add up to another full pixel along an entire side. The result is a smoother overall effect due to the continual animation.

How to convert longitude and latitude to a grid with (nearly) equal area on the Earth sphere in Matlab or C/C++?

The ranges of longitude and latitude on the Earth sphere are [-180, 180] and [-90, 90] respectively. I want to get an equal-area grid whose cells have the area of a 0.5 deg * 0.5 deg cell at the equator.
Since the distortion increases when approaching the poles, the cells should have the same latitude extent but different longitude extents.
How should I do this?
First, what you ask for, if interpreted literally, is impossible for three reasons. One, the area of the surface of a perfect sphere is about 82506.97413 times the area of a portion 30' (thirty minutes of arc, or half a degree) by 30' at the equator. Since that is not an integer, you cannot partition the surface into a whole number of regions of that size. Two, if you constrain the latitude span to be equal, then the rings at different latitudes must have different numbers of segments, so you cannot make a grid: the edges of segments in different rings cannot coincide. Three, the Earth is not a perfect sphere, and regions of equal area on a sphere will not map to equal areas on the Earth. Imperfections in the Earth would prevent us from knowing the area of each region (and those areas would change as the surface changes).
Supposing that what you actually want is an approximate solution that is not a grid, I suggest you examine the Google search results for “partition sphere into equal areas“. One of the top results is this paper, in which Figure 1.1 appears to show a sphere that has been partitioned into regions of similar, possibly equal, latitude span but different longitude spans. Another result is this page, which is a Matlab package for exploring sphere partitioning.
You are trying to put equal area tiles on the earth's surface with fixed extent like the mirrors on a disco ball.
So if you start at the Equator with a 0.5deg*0.5deg tile, your next tile to the north or south would have a longitude extent of 0.5deg/cos(0.5deg) to have the same area, so slightly above 0.5deg.
With that tile you cannot fill the full circle with an integer number of tiles.
Ending at the pole, your tile's longitude extent would be 0.5deg/cos(89.5deg) = 57.29..deg, which also does not fit exactly into 360deg.
If you decrease the size of your tiles you might get an acceptably small error, but still no real "grid", because towards the poles there will always be fewer tiles than at the Equator.
Maybe something like "equal area map projection" might help? http://en.wikipedia.org/wiki/Map_projection#Equal-area
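If a near-equal-area partition (rather than a strict grid) is acceptable, one common compromise is to keep 0.5-degree latitude rings and give each ring its own integer number of equal longitude cells, chosen so the cell area approximates the equatorial 0.5deg x 0.5deg area. A C sketch on the unit sphere (the bookkeeping is only illustrative):

#include <math.h>
#include <stdio.h>

int main(void)
{
    const double d  = 0.5 * M_PI / 180.0;        /* 0.5 degrees in radians */
    /* area of a 0.5 x 0.5 degree cell centred on the equator (unit sphere) */
    const double a0 = d * 2.0 * sin(d / 2.0);

    /* 360 latitude rings of 0.5 degrees each, from south pole to north pole */
    for (int i = 0; i < 360; i++) {
        double lat1 = -M_PI / 2.0 + i * d;
        double lat2 = lat1 + d;
        double ring = 2.0 * M_PI * (sin(lat2) - sin(lat1)); /* ring area     */
        int    n    = (int)floor(ring / a0 + 0.5);          /* cells in ring */
        if (n < 1) n = 1;
        printf("ring %3d: lat %6.1f..%5.1f deg, %4d cells of %7.3f deg\n",
               i, lat1 * 180.0 / M_PI, lat2 * 180.0 / M_PI, n, 360.0 / n);
    }
    return 0;
}

Near the equator this produces 720 cells of exactly 0.5 degrees; towards the poles the cell count drops, which is exactly why the result is not a grid.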
Two possible solutions here (formulas in R):
lat<-seq(-89.75,89.75,by=0.5)
Formula 1, based on the size of a grid cell at the equator (1 degree ≈ 111.11 km); it gives the area in m² of each 0.5 deg * 0.5 deg cell as a function of latitude.
r<-(111*10^3*0.5)*(111*10^3*0.5)*cos(lat*pi/180)
Formula 2, based on the radius of the Earth.
ER<-6371*1000
r2<-(ER^2)*(0.5*pi/180)^2*cos(lat*pi/180)

Antipole Clustering

I made a photo mosaic script (PHP). This script takes one picture and changes it into a photo build-up of little pictures: from a distance it looks like the real picture, but when you move closer you see that it is all little pictures. I take a square of a fixed number of pixels and determine the average color of that square. Then I compare this with my database, which contains the average color of a couple of thousand pictures. I determine the color distance to all available images. But running this script fully takes a couple of minutes.
The bottleneck is matching the best picture with a part of the main picture. I have been searching online for ways to reduce this and came across "Antipole Clustering." Of course I tried to find some information on how to use this method myself, but I can't seem to figure out what to do.
There are two steps: 1. Database acquisition and 2. Photomosaic creation.
Let's start with step one; once that is all clear, maybe I can figure out step 2 myself.
Step 1:
partition each image of the database into 9 equal rectangles arranged in a 3x3 grid
compute the RGB mean values for each rectangle
construct a vector x composed by 27 components (three RGB components for each rectangle)
x is the feature vector of the image in the data structure
Well, points 1 and 2 are easy, but what should I do at point 3? How do I compose a vector x out of the 27 components (9 × R mean, G mean, B mean)?
And once I have composed the vector, what is the next step I should take with it?
Peter
Here is how I think the feature vector is computed:
You have 3 x 3 = 9 rectangles.
Each pixel is essentially 3 numbers, 1 for each of the Red, Green, and Blue color channels.
For each rectangle you compute the mean for the red, green, and blue colors for all the pixels in that rectangle. This gives you 3 numbers for each rectangle.
In total, you have 9 (rectangles) x 3 (mean for R, G, B) = 27 numbers.
Simply concatenate these 27 numbers into a single 27 by 1 (often written as 27 x 1) vector, i.e. 27 numbers grouped together. This vector of 27 numbers is the feature vector x that represents the color statistics of your photo. In code, if you are using C++, this will probably be an array of 27 numbers, or perhaps an instance of the (aptly named) vector class. You can think of this feature vector as a form of "summary" of what the color in the photo is like. Roughly, it looks like this: [R1, G1, B1, R2, G2, B2, ..., R9, G9, B9], where R1 is the mean/average of the red pixels in the first rectangle, and so on.
I believe step 2 involves some form of comparing these feature vectors so that those with similar feature vectors (and hence similar color) will be placed together. Comparison will likely involve the use of the Euclidean distance (see here), or some other metric, to compare how similar the feature vectors (and hence the photos' color) are to each other.
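A C sketch of the feature-vector construction and the Euclidean comparison, assuming a simple Pixel struct and a row-major image buffer at least 3 pixels wide and tall (the original script is PHP; this is only meant to show the arithmetic):

#include <math.h>

typedef struct { unsigned char r, g, b; } Pixel;

/* Build the 27-component feature vector x: split the image into a 3x3
   grid of rectangles and append the mean R, G and B of each rectangle,
   giving [R1,G1,B1, R2,G2,B2, ..., R9,G9,B9]. */
void feature_vector(const Pixel *img, int w, int h, double x[27])
{
    for (int ry = 0; ry < 3; ry++) {
        for (int rx = 0; rx < 3; rx++) {
            long long rs = 0, gs = 0, bs = 0, count = 0;
            int x0 = rx * w / 3, x1 = (rx + 1) * w / 3;
            int y0 = ry * h / 3, y1 = (ry + 1) * h / 3;
            for (int py = y0; py < y1; py++)
                for (int px = x0; px < x1; px++) {
                    const Pixel *p = &img[py * w + px];
                    rs += p->r; gs += p->g; bs += p->b; count++;
                }
            int k = 3 * (ry * 3 + rx);
            x[k]     = (double)rs / count;
            x[k + 1] = (double)gs / count;
            x[k + 2] = (double)bs / count;
        }
    }
}

/* Euclidean distance between two feature vectors: the smaller the
   distance, the more similar the colour layout of the two images. */
double feature_distance(const double a[27], const double b[27])
{
    double sum = 0.0;
    for (int i = 0; i < 27; i++) {
        double diff = a[i] - b[i];
        sum += diff * diff;
    }
    return sqrt(sum);
}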
Lastly, as Anony-Mousse suggested, converting your pixels from RGB to HSB/HSV color would be preferable. If you use OpenCV or have access to it, this is a simple one-liner. Otherwise the HSV Wikipedia article etc. will give you the math formulas to perform the conversion.
Hope this helps.
Instead of using RGB, you might want to use HSB space. It gives better results for a wide variety of use cases. Put more weight on hue to get better color matches for photos, or on brightness when composing high-contrast images (logos etc.).
I have never heard of antipole clustering. But the obvious next step would be to put all the images you have into a large index. Say, an R-Tree. Maybe bulk-load it via STR. Then you can quickly find matches.
Maybe it means vector quantization (VQ). In VQ the image isn't subdivided into rectangles but into density areas. Then you can take the mean point of each cluster. First you need to take all the colors and pixels separately and transfer them to a vector with XY coordinates. Then you can use density clustering, like Voronoi cells, and get the mean point. You can compare this point with the other pictures in the database. Read about VQ here: http://www.gamasutra.com/view/feature/3090/image_compression_with_vector_.php.
How to compute a vector from adjacent pixels:
d(x) = I(x+1,y) - I(x,y)
d(y) = I(x,y+1) - I(x,y)
Here's another link: http://www.leptonica.com/color-quantization.html.
Update: When you have already computed the mean color of your thumbnails, you can proceed to sort all the mean colors in an RGB map and use the formula I gave you to compute the vector x. Now that you have a vector for each of your thumbnails, you can use the antipole tree to search for a thumbnail. This is possible because the antipole tree is something like a kd-tree and subdivides the space. Read about the antipole tree here: http://matt.eifelle.com/2012/01/17/qtmosaic-0-2-faster-mosaics/. Maybe you can ask the author and download the source code?

Resources