How to show the DRAWING process in Processing? - arrays

We all know how to draw a line in Processing.
But when we draw a line, the line is shown immediately.
What if I want to witness the drawing process, namely, to see the line moving forward and gradually completing a whole line?
Here's what I want to achieve: to DRAW several lines and curves which finally turn into some pattern.
So how do I make that happen? Using an array?
Many thanks.

In Processing, all of the drawing happens in a loop. An easy way to create animated sequences like you describe is to use frameCount to drive them; the modulo operator % is a good way to make the animation loop. For example, to animate along the x axis:
void draw() {
  float x = 50;
  float y = 50;
  float lineLength = 50;
  int framesToAnimate = 60;
  line(x, y, x + float(frameCount % framesToAnimate) / framesToAnimate * lineLength, y);
}
Note: strange things will happen if you don't cast/convert frameCount to a float (integer division would otherwise truncate the result to 0).
I use this pretty often to animate other features such as the color.
fill(color(127 + sin(float(frameCount)/90)*127, 0, 0, 127));
If you want to get more advanced, try setting vectors and coordinates with PVector. There is a pretty good tutorial on Daniel Shiffman's site.

If you want to make your animation independent of the frame rate, you can use millis() instead. That returns the number of milliseconds since the sketch started, so you can make something happen after a given amount of time in seconds.
For example:
long initialTime;

void setup() {
  size(400, 200);
  initialTime = millis();
}

void draw() {
  float x = 50;
  float y = 50; // set the multiplier to adjust speed
  line(x, y, x + (millis() - initialTime) * 0.01, y);             // 10 px/sec
  line(x, y + 50, x + (millis() - initialTime) * 0.05, y + 50);   // 50 px/sec
  line(x, y + 100, x + (millis() - initialTime) * 0.001, y + 100); // 1 px/sec
}
There are also some animation libraries; I've seen impressive results with some of them, but I've never used them myself. Here's a list.

Related

OpenGL - Is it efficient to call glBufferSubData (nearly) each frame?

I have a spritesheet that contains a simple sprite animation. I can successfully extract each animation frame sprite and display it as a texture. However, I managed to do that by calling glBufferSubData to change the texture coordinates (before the game loop, at initialization). I want to play a simple sprite animation using these extracted textures, and I guess I will do it by changing the texture coordinates each frame (except when the animation is triggered by user input). Anyway, this results in calling glBufferSubData almost every frame (to change the texture data), and my question is: is this approach efficient? If not, how can I solve the issue? (On Reddit, I saw a comment saying that the goal should be to minimize the traffic between CPU and GPU memory in modern OpenGL, and I guess my approach violates this goal.) Thanks in advance.
For anyone interested, here is my approach:
void set_sprite_texture_from_spritesheet(Sprite* sprite, const char* path, int x_offset, int y_offset, int sprite_width, int sprite_height)
{
    float* uv_coords = get_uv_coords_from_spritesheet(path, x_offset, y_offset, sprite_width, sprite_height);
    for (int i = 0; i < 8; i++)
    {
        /* 8 means that I am changing a total of 8 texture coordinates (2 for each of the 4 vertices) */
        edit_vertex_data_by_index(sprite, &uv_coords[i], (i / 2) * 5 + 3 + (i % 2 != 0));
        /*
            the last argument in this function gives the index of the desired texture coordinate
            (5 is the stride, 3 is the offset of the texture coordinates in each row)
        */
    }
    free(uv_coords);
    sprite->texture = load_texture(path); /* loads the texture -
                                             since the UV coordinates are
                                             adjusted based on the spritesheet,
                                             I am loading the entire spritesheet
                                             as a texture. */
}
void edit_vertex_data_by_index(Sprite *sprite, float *data, unsigned int start_index)
{
    glBindBuffer(GL_ARRAY_BUFFER, sprite->vbo);
    /* note: sizeof(data) would be the size of a pointer; each call here updates a single float */
    glBufferSubData(GL_ARRAY_BUFFER, start_index * sizeof(float), sizeof(float), data);
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    /*
        My concern is that if I call this almost every frame, it could be inefficient, but I am not sure.
    */
}
Editing buffers is fine. Literally every game has buffers that change every frame. Buffers are how you get the data to the GPU so it can render it! (And uniforms. Your driver is likely to secretly put uniforms in buffers though!)
Yes, you should minimize the amount of buffer updates. You should minimize everything, really. The less stuff the computer does, the faster it can do it! That doesn't mean you should avoid doing stuff entirely. It means you should only do as much stuff as you need to, instead of doing wasteful stuff that you don't need.
Every time you call an OpenGL function, the driver takes some time to check how to process your request, which buffer is bound, that it's big enough, that the GPU isn't using it at the same time, etc. You want to do as few calls as possible, because that way, the driver has to check all this stuff less often.
You are doing 8 separate glBufferSubData calls in this function. If you put the UV coordinates all next to each other in the buffer, you could update them all at once with 1 call. And if you have lots of animated sprites, you should try to put all of their UV coordinates in one big array, and update the whole array in one call - all the sprites at once.
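For illustration, here is a minimal sketch of that one-call update, under the assumption that the sprite's UV data lives contiguously in its own buffer (a hypothetical sprite->uv_vbo field) rather than interleaved with the positions as in the question:
/* Hypothetical sketch: all 8 UV floats for the sprite's 4 vertices are stored
   contiguously in a separate buffer, so one glBufferSubData call updates the
   whole animation frame. `uv_vbo` is an assumed field, not from the question. */
void set_sprite_frame_uvs(Sprite *sprite, const float uv_coords[8])
{
    glBindBuffer(GL_ARRAY_BUFFER, sprite->uv_vbo);
    glBufferSubData(GL_ARRAY_BUFFER, 0, 8 * sizeof(float), uv_coords);
    glBindBuffer(GL_ARRAY_BUFFER, 0);
}
With the interleaved layout from the question (stride 5), the alternative is to rebuild the sprite's 20 floats of vertex data on the CPU and upload them with one glBufferSubData call covering the whole sprite.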
And loading textures from paths is really slow. Maybe your program can load 100 textures per second but that still means you blew half your frame time budget on texture loading. The texture hasn't changed anyway so why would you load it again?
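As a hedged sketch of that last point: cache the GL texture by path so load_texture (from the question) runs once per file instead of once per sprite setup. The cache table and the assumed unsigned-int return type are illustrative, not part of the original code.
#include <stdlib.h>
#include <string.h>

unsigned int load_texture(const char *path);   /* from the question; return type assumed */

#define MAX_CACHED_TEXTURES 64

static struct { char *path; unsigned int tex; } tex_cache[MAX_CACHED_TEXTURES];
static int tex_cache_count = 0;

/* Return a previously loaded texture for `path` if we have one; otherwise
   load it once and remember it. */
unsigned int get_texture_cached(const char *path)
{
    for (int i = 0; i < tex_cache_count; i++)
        if (strcmp(tex_cache[i].path, path) == 0)
            return tex_cache[i].tex;

    unsigned int tex = load_texture(path);
    if (tex_cache_count < MAX_CACHED_TEXTURES) {
        tex_cache[tex_cache_count].path = strdup(path);
        tex_cache[tex_cache_count].tex = tex;
        tex_cache_count++;
    }
    return tex;
}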

What is the most efficient way to put multiple colours on a window, especially in frame-by-frame format?

I am making a game with C and X11. I've been trying for quite a while to find a way to put different coloured pixels on a window, frame by frame. I've seen fully developed games get thousands of frames per second. What is the most efficient way of doing this?
I have seen 2-coloured bitmaps with XImages, allocating 256 colours on a fade of black-white, and using XPutPixel with XImages (though I wasn't able to figure out how to create an XImage properly that could later have pixels put on it).
I have made this for loop that creates a random image, but it is, obviously, pixel-by-pixel instead of frame-by-frame and takes 18 seconds to render one entire frame.
XColor pixel;

for (int x = 0; x < currentWindowWidth; x++) {
    for (int y = 0; y < currentWindowHeight; y++) {
        pixel.red = rand() % 256 * 256;   // Scaling 8-bit colour up to X11's 16-bit colour range
        pixel.green = rand() % 256 * 256;
        pixel.blue = rand() % 256 * 256;
        XAllocColor(display, XDefaultColormap(display, screenNumber), &pixel); // This probably takes the most time,
        XSetForeground(display, graphics, pixel.pixel);                        // as does this.
        XDrawPoint(display, window, graphics, x, y);
    }
}
After three or so more weeks of testing things off and on, I finally figured out how to do it, and it was rather simple. As I said in the OP, XAllocColor and XSetForeground take quite a bit of time (relatively) to work. XDrawPoint was also slow, as it does more than just put a pixel at a point on an image.
First I tested how Xlib's colour format works (for the unsigned long int represented as pixel.pixel, which was what I needed XAllocColor for), and it appears to have 100% red set to 16711680, 100% green set to 65280, and 100% blue set to 255, which is obviously a pattern. I found the maximum to be 50% of all colours, 4286019447, which is a solid grey.
Next, I made sure my XVisualInfo would be supported by my system with a test using XMatchVisualInfo([expected visual info values]). That ensures the depth I will use and the TrueColor class work.
Finally, I made an XImage copied from the root window's image for manipulation. I used XPutPixel for each pixel on the window and set it to a random value between 0 and 4286019448, creating the random image. I then used XPutImage to paste the image to the window.
Here's the final code:
if (!XMatchVisualInfo(display, screenNumber, 24, TrueColor, &visualInfo)) {
    exit(0);
}

frameImage = XGetImage(display, rootWindow, 0, 0, screenWidth, screenHeight, AllPlanes, ZPixmap);

while (1) {
    for (unsigned short x = 0; x < currentWindowWidth; x += pixelSize) {
        for (unsigned short y = 0; y < currentWindowHeight; y += pixelSize) {
            XPutPixel(frameImage, x, y, rand() % 4286019447);
        }
    }
    XPutImage(display, window, graphics, frameImage, 0, 0, 0, 0, currentWindowWidth, currentWindowHeight);
}
This puts a random image on the screen, at a stable 140 frames per second on fullscreen. I don't necessarily know if this is the most efficient way, but it works way better than anything else I've tried. Let me know if there is any way to make it better.
Thousands of frames per second is not possible. The monitor frequency is about 100 Hz, or 100 cycles per second, which is roughly the maximum frame rate. This is still very fast. The human eye wouldn't pick up faster frame rates.
The monitor response time is about 5ms, so any single point on the screen cannot be refreshed more than 200 times per second.
8-bit is 1 byte, so an 8-bit image uses one byte per pixel; each pixel value is from 0 to 255. The pixel doesn't have red, blue, and green components. Instead, each pixel points to an index in the color table. The color table holds 256 colors. There is a trick where you keep the pixels the same and change the color table; this makes the image fade in and out or do other weird things.
In a 24-bit image, each pixel has blue, red, and green components. Each color is 1 byte, so each pixel is 3 bytes, or 24 bits.
uint8_t red = rand() % 256;
uint8_t grn = rand() % 256;
uint8_t blu = rand() % 256;
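If the visual really is the 24-bit TrueColor layout described in the self-answer above (red = 16711680 = 0xFF0000, green = 65280 = 0x00FF00, blue = 255 = 0xFF), the three components can be packed directly into the pixel value that XPutPixel takes; for other visuals you would use the red_mask/green_mask/blue_mask fields of XVisualInfo instead. A minimal sketch:
#include <stdint.h>
#include <stdlib.h>

/* Pack 8-bit components into a 0x00RRGGBB pixel value for a 24-bit
   TrueColor visual (matching the constants quoted above). */
unsigned long pack_rgb24(uint8_t red, uint8_t grn, uint8_t blu)
{
    return ((unsigned long)red << 16) | ((unsigned long)grn << 8) | blu;
}

/* usage in the loop above:
   XPutPixel(frameImage, x, y, pack_rgb24(rand() % 256, rand() % 256, rand() % 256)); */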
A 16-bit image uses an odd format to store red, blue, and green. 16 is not divisible by 3, so often two colors are assigned 5 bits each and the third color gets 6 bits. Then you have to fit these colors into one uint16_t-sized pixel. It's probably not worth it to explore this.
The slowness of your routine is because you are painting one pixel at a time. You should paint to a buffer instead, and render the buffer once per frame. You might consider using other frameworks like SDL. Other games may use things like OpenGL, which takes advantage of GPU optimizations for matrix operations, etc.
You must use a GPU. GPUs have a highly parallel architecture optimized for graphics (hence the name). To access the GPU you will use an API like OpenGL or Vulkan or make use of a Game Engine.

Suggestions on optimizing a Z-buffer implementation?

I'm writing a 3D graphics library as part of a project of mine, and I'm at the point where everything works, but not well enough.
In particular, my main headache is that my pixel fill-rate is horribly slow -- I can't even manage 30 FPS when drawing a triangle that spans half of an 800x600 window on my target machine (which is admittedly an older computer, but it should be able to manage this . . .)
I ran gprof on my executable, and I end up with the following interesting lines:
  %   cumulative    self               self     total
 time    seconds   seconds    calls  ms/call  ms/call  name
43.51       9.50      9.50                             vSwap
34.86      17.11      7.61   179944     0.04     0.04  grInterpolateHLine
13.99      20.17      3.06                             grClearDepthBuffer
<snip>
 0.76      21.78      0.17      624     0.27    12.46  grScanlineFill
The function vSwap is my double-buffer swapping function, and it also performs vsyncing, so it makes sense to me that the test program will spend much of its time waiting in there. grScanlineFill is my triangle-drawing function, which creates an edge list and then calls grInterpolateHLine to actually fill in the triangle.
My engine is currently using a Z-buffer to perform hidden surface removal. If we discount the (presumed) vsynch overhead, then it turns out that the test program is spending something like 85% of its execution time either clearing the depth buffer, or writing pixels according to the values in the depth buffer. My depth buffer clearing function is simplicity itself: copy the maximum value of a float into each element. The function grInterpolateHLine is:
void grInterpolateHLine(int x1, int x2, int y, float z, float zstep, int colour) {
    for (; x1 <= x2; x1++, z += zstep) {
        if (z < grDepthBuffer[x1 + y * VIDEO_WIDTH]) {
            vSetPixel(x1, y, colour);
            grDepthBuffer[x1 + y * VIDEO_WIDTH] = z;
        }
    }
}
I really don't see how I can improve that, especially considering that vSetPixel is a macro.
My entire stock of ideas for optimization has been whittled down to precisely one:
Use an integer/fixed-point depth buffer.
The problem that I have with integer/fixed-point depth buffers is that interpolation can be very annoying, and I don't actually have a fixed-point number library yet. Any further thoughts out there? Any advice would be most appreciated.
You should have a look at the source code to something like Quake - considering what it could achieve on a Pentium, 15 years ago. Its z-buffer implementation used spans rather than per-pixel (or fragment) depth. Otherwise, you could look at the rasterization code in Mesa.
It's hard to tell what higher-order optimizations can be done without seeing the rest of the code. I have a couple of minor observations, though.
There's no need to calculate x1 + y * VIDEO_WIDTH more than once in grInterpolateHLine. i.e.:
void grInterpolateHLine(int x1, int x2, int y, float z, float zstep, int colour) {
    int offset = x1 + (y * VIDEO_WIDTH);
    for (; x1 <= x2; x1++, z += zstep, offset++) {
        if (z < grDepthBuffer[offset]) {
            vSetPixel(x1, y, colour);
            grDepthBuffer[offset] = z;
        }
    }
}
Likewise, I'm guessing that your vSetPixel does a similar calculation, so you should be able to use the same offset there as well, and then you only need to increment offset and not x1 in each loop iteration. Chances are this can be extended back to the function that calls grInterpolateHLine, and you would then only need to do the multiplication once per triangle.
There are some other things you could do with the depth buffer. Most of the time if the first pixel of the line either fails or passes the depth test, then the rest of the line will have the same result. So after the first test you can write a more efficient assembly block to test the entire line in one shot, then if it passes you can use a more efficient block memory setter to block-set the pixel and depth values instead of doing them one at a time. You would only need to test/set per pixel if the line is only partially occluded.
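Here is a rough scalar sketch of that span idea, reusing grDepthBuffer, vSetPixel and VIDEO_WIDTH from the question and falling back to the original grInterpolateHLine for spans that aren't fully visible; the real gain described above comes from doing the test pass and the fill pass with SIMD/block operations rather than this plain C:
void grInterpolateHLineSpan(int x1, int x2, int y, float z, float zstep, int colour)
{
    int offset = x1 + y * VIDEO_WIDTH;
    int count = x2 - x1 + 1;
    float zt = z;
    int allPass = 1;

    /* Read-only test pass over the whole span. */
    for (int i = 0; i < count; i++, zt += zstep)
        allPass &= (zt < grDepthBuffer[offset + i]);

    if (allPass) {
        /* Whole span is visible: branch-free fill of depth and colour. */
        for (int i = 0; i < count; i++, z += zstep) {
            grDepthBuffer[offset + i] = z;
            vSetPixel(x1 + i, y, colour);
        }
    } else {
        /* Not fully visible: fall back to the original per-pixel loop. */
        grInterpolateHLine(x1, x2, y, z, zstep, colour);
    }
}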
Also, not sure what you mean by older computer, but if your target computer is multi-core then you can break it up among multiple cores. You can do this for the buffer clearing function as well. It can help quite a bit.
I ended up solving this by replacing the Z-buffer with the Painter's Algorithm. I used SSE to write a Z-buffer implementation that created a bitmask with the pixels to paint (plus the range optimization suggested by Gerald), and it still ran far too slowly.
Thank you, everyone, for your input.

How can this function be optimized? (Uses almost all of the processing power)

I'm in the process of writing a little game to teach myself OpenGL rendering as it's one of the things I haven't tackled yet. I used SDL before and this same function, while still performing badly, didn't go as over the top as it does now.
Basically, there is not much going on in my game yet, just some basic movement and background drawing. When I switched to OpenGL, it appears as if it's way too fast. My frames per second exceed 2000 and this function uses up most of the processing power.
What is interesting is that the program in its SDL version used 100% CPU but ran smoothly, while the OpenGL version uses only about 40% - 60% CPU but seems to tax my graphics card in such a way that my whole desktop becomes unresponsive. Bad.
It's not too complex a function: it renders a 1024x1024 background tile according to the player's X and Y coordinates to give the impression of movement while the player graphic itself stays locked in the center. Because it's a small tile for a bigger screen, I have to render it multiple times to stitch the tiles together for a full background. The two for loops in the code below iterate 12 times combined, so I can see why this is inefficient when called 2000 times per second.
So to get to the point, this is the evil-doer:
void render_background(game_t *game)
{
    int bgw;
    int bgh;
    int x, y;

    glBindTexture(GL_TEXTURE_2D, game->art_background);
    glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, GL_TEXTURE_WIDTH, &bgw);
    glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, GL_TEXTURE_HEIGHT, &bgh);

    glBegin(GL_QUADS);
    /*
     * Start one background tile too early and end one too late
     * so the player can not outrun the background
     */
    for (x = -bgw; x < root->w + bgw; x += bgw)
    {
        for (y = -bgh; y < root->h + bgh; y += bgh)
        {
            /* Offsets */
            int ox = x + (int)game->player->x % bgw;
            int oy = y + (int)game->player->y % bgh;

            /* Top Left */
            glTexCoord2f(0, 0);
            glVertex3f(ox, oy, 0);
            /* Top Right */
            glTexCoord2f(1, 0);
            glVertex3f(ox + bgw, oy, 0);
            /* Bottom Right */
            glTexCoord2f(1, 1);
            glVertex3f(ox + bgw, oy + bgh, 0);
            /* Bottom Left */
            glTexCoord2f(0, 1);
            glVertex3f(ox, oy + bgh, 0);
        }
    }
    glEnd();
}
If I artificially limit the speed by calling SDL_Delay(1) in the game loop, I cut the FPS down to ~660 ± 20 and get no "performance overkill". But I doubt that is the correct way to go about this.
For the sake of completion, these are my general rendering and game loop functions:
void game_main()
{
    long current_ticks = 0;
    long elapsed_ticks;
    long last_ticks = SDL_GetTicks();
    game_t game;
    object_t player;

    if (init_game(&game) != 0)
        return;

    init_player(&player);
    game.player = &player;

    /* game_init() */
    while (!game.quit)
    {
        /* Update number of ticks since last loop */
        current_ticks = SDL_GetTicks();
        elapsed_ticks = current_ticks - last_ticks;
        last_ticks = current_ticks;

        game_handle_inputs(elapsed_ticks, &game);
        game_update(elapsed_ticks, &game);
        game_render(elapsed_ticks, &game);

        /* Lagging stops if I enable this */
        /* SDL_Delay(1); */
    }
    cleanup_game(&game);
    return;
}
void game_render(long elapsed_ticks, game_t *game)
{
    game->tick_counter += elapsed_ticks;

    if (game->tick_counter >= 1000)
    {
        game->fps = game->frame_counter;
        game->tick_counter = 0;
        game->frame_counter = 0;
        printf("FPS: %d\n", game->fps);
    }

    render_background(game);
    render_objects(game);

    SDL_GL_SwapBuffers();
    game->frame_counter++;
    return;
}
According to gprof profiling, even when I limit the execution with SDL_Delay(), it still spends about 50% of the time rendering my background.
Turn on VSYNC. That way you'll calculate graphics data exactly as fast as the display can present it to the user, and you won't waste CPU or GPU cycles calculating extra frames inbetween that will just be discarded because the monitor is still busy displaying a previous frame.
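How you enable it depends on the SDL version; with the SDL 1.2 API used in the question (SDL_GL_SwapBuffers), a sketch along these lines usually works, though the driver can still override the setting (with SDL2 the equivalent is SDL_GL_SetSwapInterval(1)):
#include <SDL/SDL.h>

/* Request vsync before the GL window is created (SDL 1.2.10+). */
SDL_Surface *create_vsynced_window(int w, int h)
{
    SDL_GL_SetAttribute(SDL_GL_SWAP_CONTROL, 1);   /* 1 = swap waits for vertical retrace */
    return SDL_SetVideoMode(w, h, 32, SDL_OPENGL);
}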
First of all, you don't need to render the tile x*y times - you can render it once for the entire area it should cover and use GL_REPEAT to have OpenGL cover the entire area with it. All you need to do is to compute the proper texture coordinates once, so that the tile doesn't get distorted (stretched). To make it appear to be moving, increase the texture coordinates by a small margin every frame.
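As a sketch of that single-quad version, in the same immediate-mode style as the question (game->art_background, game->player and root->w/h are the question's fields; the 1024 tile size and the sign of the scroll offset may need adjusting to match the original behaviour):
void render_background_repeat(game_t *game)
{
    float bgw = 1024.0f, bgh = 1024.0f;          /* tile size mentioned in the question */
    float u = (float)root->w / bgw;              /* how many tile widths span the screen */
    float v = (float)root->h / bgh;
    float ou = game->player->x / bgw;            /* scroll offset in tile units */
    float ov = game->player->y / bgh;

    glBindTexture(GL_TEXTURE_2D, game->art_background);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT);

    glBegin(GL_QUADS);
    glTexCoord2f(ou,     ov);     glVertex3f(0,       0,       0);
    glTexCoord2f(ou + u, ov);     glVertex3f(root->w, 0,       0);
    glTexCoord2f(ou + u, ov + v); glVertex3f(root->w, root->h, 0);
    glTexCoord2f(ou,     ov + v); glVertex3f(0,       root->h, 0);
    glEnd();
}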
Now down to limiting the speed. What you want to do is not to just plug a sleep() call in there, but measure the time it takes to render one complete frame:
void FrameCap (time_t desiredFPS, time_t actualFrameTime)
{
    time_t desiredFrameTime = 1000 / desiredFPS;        /* target frame time in ms */
    if (desiredFrameTime > actualFrameTime)
        SDL_Delay (desiredFrameTime - actualFrameTime); /* there is a small imprecision here */
}

time_t startTime = (time_t) SDL_GetTicks ();
/* render frame */
FrameCap (60, (time_t) SDL_GetTicks () - startTime);    /* e.g. cap at 60 FPS */
There are ways to make this more precise (e.g. by using the performance counter functions on Windows 7, or using microsecond resolution on Linux), but I think you get the general idea. This approach also has the advantage of being driver independent and - unlike coupling to V-Sync - allowing an arbitrary frame rate.
At 2000 FPS it only takes 0.5 ms to render the entire frame. If you want to get 60 FPS then each frame should take about 16 ms. To do this, first render your frame (about 0.5 ms), then use SDL_Delay() to use up the rest of the 16 ms.
Also, if you are interested in profiling your code (which isn't needed if you are getting 2000 FPS!) then you may want to use High Resolution Timers. That way you could tell exactly how long any block of code takes, not just how much time your program spends in it.
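For example, a minimal POSIX sketch using clock_gettime (on Windows, QueryPerformanceCounter plays the same role) to time one block of code:
#include <stdio.h>
#include <time.h>

/* Milliseconds between two timespec samples. */
double elapsed_ms(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1000.0 + (b.tv_nsec - a.tv_nsec) / 1.0e6;
}

/* usage:
   struct timespec t0, t1;
   clock_gettime(CLOCK_MONOTONIC, &t0);
   render_background(game);
   clock_gettime(CLOCK_MONOTONIC, &t1);
   printf("render_background: %.3f ms\n", elapsed_ms(t0, t1));   */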

Intermittent bugs - sometimes this code works and sometimes it doesn't!

This code intermittently works. It's running on a small microcontroller. It will work fine even after restarting the processor, but if I change some part of the code, it breaks. This makes me think that it's some kind of pointer bug or memory corruption. What's happening is that the coordinate p_res.pos.x is sometimes read as 0 (the incorrect value) and sometimes as 96 (the correct value) when it is passed to write_circle_outlined. y seems to be correct most of the time. If anyone can spot anything obviously wrong, please point it out!
int demo_game()
{
    long int d;
    int x, y;
    struct WorldCamera p_viewer;
    struct Point3D_LLA p_subj;
    struct Point2D_CalcRes p_res;

    p_viewer.hfov = 27;
    p_viewer.vfov = 32;
    p_viewer.width = 192;
    p_viewer.height = 128;
    p_viewer.p.lat = 51.26f;
    p_viewer.p.lon = -1.0862f;
    p_viewer.p.alt = 100.0f;

    p_subj.lat = 51.20f;
    p_subj.lon = -1.0862f;
    p_subj.alt = 100.0f;

    while (1)
    {
        fill_buffer(draw_buffer_mask, 0x0000);
        fill_buffer(draw_buffer_level, 0xffff);
        compute_3d_transform(&p_viewer, &p_subj, &p_res, 10000.0f);
        x = p_res.pos.x;
        y = p_res.pos.y;
        write_circle_outlined(x, y, 1.0f / p_res.est_dist, 0, 0, 0, 1);
        p_viewer.p.lat -= 0.0001f;
        //p_viewer.p.alt -= 0.00001f;
        d = 20000;
        while (d--);
    }
    return 1;
}
The code for compute_3d_transform is:
void compute_3d_transform(struct WorldCamera *p_viewer, struct Point3D_LLA *p_subj, struct Point2D_CalcRes *res, float cliph)
{
    // Estimate the distance to the waypoint. This isn't intended to replace
    // proper lat/lon distance algorithms, but provides a general indication
    // of how far away our subject is from the camera. It works accurately for
    // short distances of less than 1km, but doesn't give distances in any
    // meaningful unit (lat/lon distance?)
    res->est_dist = hypot2(p_viewer->p.lat - p_subj->lat, p_viewer->p.lon - p_subj->lon);

    // Save precious cycles if outside of visible world.
    if (res->est_dist > cliph)
        goto quick_exit;

    // Compute the horizontal angle to the point.
    // atan2(y,x) so atan2(lon,lat) and not atan2(lat,lon)!
    res->h_angle = RAD2DEG(angle_dist(atan2(p_viewer->p.lon - p_subj->lon, p_viewer->p.lat - p_subj->lat), p_viewer->yaw));
    res->small_dist = res->est_dist * 0.0025f; // by trial and error this works well.

    // Using the estimated distance and altitude delta we can calculate
    // the vertical angle.
    res->v_angle = RAD2DEG(atan2(p_viewer->p.alt - p_subj->alt, res->est_dist));

    // Normalize the results to fit in the field of view of the camera if
    // the point is visible. If they are outside of (0,hfov] or (0,vfov]
    // then the point is not visible.
    res->h_angle += p_viewer->hfov / 2;
    res->v_angle += p_viewer->vfov / 2;

    // Set flags.
    if (res->h_angle < 0 || res->h_angle > p_viewer->hfov)
        res->flags |= X_OVER;
    if (res->v_angle < 0 || res->v_angle > p_viewer->vfov)
        res->flags |= Y_OVER;

    res->pos.x = (res->h_angle / p_viewer->hfov) * p_viewer->width;
    res->pos.y = (res->v_angle / p_viewer->vfov) * p_viewer->height;
    return;

quick_exit:
    res->flags |= X_OVER | Y_OVER;
    return;
}
Structure for the results:
typedef struct Point2D_Pixel { unsigned int x, y; };

// Structure for storing calculated results (from camera transforms.)
typedef struct Point2D_CalcRes
{
    struct Point2D_Pixel pos;
    float h_angle, v_angle, est_dist, small_dist;
    int flags;
};
The code is part of an open source project of mine so it's okay to post a lot of code here.
I see that some of your calculations depend on p_viewer->yaw, but I do not see any initialization for p_viewer->yaw. Is this your problem?
A couple of things that seem sketchy:
You can return from compute_3d_transform without setting many of the fields in p_res/res but the caller never checks for this situation.
You consistently read from res->flags without initializing it first.
Whenever the output differs between runs, it possibly means some value is not initialized and the outcome depends on the garbage value present in a variable. Keeping that in mind, I looked for uninitialized variables: the structure p_res is not initialized.
if (res->est_dist > cliph)
    goto quick_exit;
That means the if condition may turn out to be true or false depending on what garbage value is stored in res->est_dist. When the condition turns out to be true, execution goes straight to the quick_exit label and doesn't update p_res.pos.x; when it turns out to be false, the position is updated.
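A minimal sketch of a fix along those lines, reusing the question's types and treating X_OVER/Y_OVER as "do not draw" (an assumption about the intended semantics): zero the result struct before each transform and check the flags before using the position.
#include <string.h>   /* memset */

/* Inside the while(1) loop of demo_game(): */
memset(&p_res, 0, sizeof p_res);                 /* flags (and pos) start at 0 each pass */
compute_3d_transform(&p_viewer, &p_subj, &p_res, 10000.0f);
if (!(p_res.flags & (X_OVER | Y_OVER)))          /* only draw when the point is visible */
{
    write_circle_outlined(p_res.pos.x, p_res.pos.y, 1.0f / p_res.est_dist, 0, 0, 0, 1);
}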
When I used to program C, I would use a divide and conquer debugging technique for this kind of problem to try to isolate the offending operation (paying attention to whether the symptoms change as debugging code is added, which is indicative of dangling pointer type bugs).
Essentially, start with the first line where the value is known to be good (and prove that it is consistently good at that line). Then identify where it is known to be bad. Then, approximately halfway between the two points, insert a test to see if it's bad there. If not, insert a test halfway between the mid-point and the known bad location; if it is bad, insert a test halfway between the mid-point and the known good location, and so on.
If the line identified is itself a function call, this process can be repeated in that called function, and so on.
When using this kind of approach, it's important to minimize the amount of added code and the artificial "noise", which can create timing changes.
Use this if you don't have (or can't use) an interactive debugger, or if the problem does not manifest when using one.
