I'm using the device STM32F746. I know it has a hardware 2D Graphics accelerator.
I know how to do animation using double buffering.
But according to this
https://www.touchgfx.com/news/high-quality-graphics-using-only-internal-memory/
They are claiming that they use only one framebuffer for animation.
How is that possible and what techniques that are used using that STM32F746 ?
It is the double buffering. One buffer is stored in the MCU memory, where the next frame is prepared and composed. Another buffer is in LCD driver memory, to where data being transferred from the MCU when it is ready, and displayed on the LCD with the required refresh rate.
That's why that library requires so much of MCU memory.
Despite the answer was accepted it is wrong.
In fact those controllers have their own LCD-driving circuit, thus, do not require external driver. They use part of internal memory as the screen buffer and constantly refresh the image on the LCD.
In the library, that only part of memory is used. The write operatation are synchronized with LCD refresh, so they avoid flickering.
So, the only one buffer is used: the same buffer contains the output image and used to compose the next frame.
I have run into a problem that I am rather stumped on because every solution I can think of has an issue that makes it not work fully. I am working on a game on the MSP430FF529 that when first powered up has two images drawn to the screen infinitely using a loop and cycle delays. I would like to have it so that when the user presses the start button (a simple high-edge trigger on a port) that the program immediately stops drawing those screens, no matter what part of the process its in, and starts executing the rest of the code that runs the game.
I could put the function that puts the images on screen in a do while loop but then it wouldn't be asynchronous as the current image being drawn would have to finish before it moved on.
I'd use the break command but I don't think that works in ISRs and only when its directly in the loop.
I could put the entire rest of the program in the ISR I use for the start button press so that the screen drawing is essentially never returned to but thats really messes, poor coding, and would cause a lot of problems later.
Essentially, I want to make it so that when the button is pressed the program will immediately jump to the part of the program that is the actual game and forget about drawing those images on the screen. Is it possible to somehow have an ISR that doesn't return to what was currently happening after the code in the routine is executed? Basically, once the program starts moving forward (the start button is pressed) I don't want to come back to the function that draws the images unless I explicitly call it again.
The only thing I can think of is the goto command, which I feel in this particular instance would not actually be too bad, though I want to avoid using it for fear of it becoming a habit due to it being a poor solution in most cases. However, that might not even work because I have a feeling that using goto in a ISR would really mess up the stack.
Any ideas? Any suggestions are appreciated.
What you want is basically a "context switch". You should modify the program counter pointer and stack pointer which will be restored when you return from the ISR, and then do the normal ISR return so the interrupt mask is cleared, stack is restored, etc. As noted in the comments to your question, this likely requires some manual assembly code.
I'm not familiar with the MSP430, but on other architectures this is in a structure of saved registers on the kernel stack or interrupt-context stack (or maybe just "the stack" on some microcontrollers), or it might be in some special registers, and it's saved automatically by the CPU when it jumps to your ISR. So you have to change these register pointers where they are.
If you relax your requirement from "immediately" to "so fast that the user doesn't notice", you can put the if (button_pressed) into some loop in the image drawing routine.
If you really want to abort the image drawing immediately, you can do so by resetting the MCU (for example, by writing a wrong password to the WDT). In the application initialization code, check if one of the causes of the reset was your own software:
bool start_button = false;
for (;;) {
int cause = SYSRSTIV;
if (cause == SYSRSTIV_WDTKEY)
start_button = true;
if (cause == SYSRSTIV_NONE)
break;
// you might handle debugging of other reset causes here ...
}
if (!start_button)
draw_images();
else
actual_game();
(This assumes that your code never accidentally writes a wrong WDT password, but even if that happens, you're only skipping the intro images.)
My problem is the following:
I have a pointer that stores a framebuffer which is constantly changed by some thread.
I want to display this framebuffer via OpenGL APIs. My trivial choice is to use glTexImage2D and load the framebuffer again and again at every time. This loading over the framebuffer is necessary because the framebuffer is changed by outside of the OpenGL APIs. I think that there could be some methods or tricks to speed up such as:
By finding out the changes inside the framebuffer (Is it even possible?)
Some method for fast re-loading of image
OpenGL could directly using the pointer of the framebuffer? (So less framebuffer copying)
I'm not sure the above approaches are valid or not. I hope if you could give some advices.
By finding out the changes inside the framebuffer (Is it even possible?)
That would reduce the required bandwidth, because it's effectively video compression. However the CPU load is notably higher and its much slower than to just DMA copy the data.
Some method for fast re-loading of image
Use glTexSubImage2D instead glTexImage2D (note the …Sub…). glTexImage2D goes through a full texture initialization each time its called, which is rather costly.
If your program does additional things with OpenGL rather than just displaying the image, you can speed things up further, by reducing the time the program is waiting for things to complete by using Pixel Buffer Objects. The essential gist is
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pboID);
void *pbuf = glMapBuffer();
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
thread image_load_thread = start_thread_copy_image_to_pbo(image, pbuf);
do_other_things_with_opengl();
join_thread(image_load_thread();
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pboID);
glUnmapBuffer();
glBindTexture(…);
glTexSubImage2D(…, NULL); /* will read from PBO */
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
draw_textured_quad();
Instead of creating and joining with the thread you may as well use a thread pool and condition variables for inter thread synchronization.
OpenGL could directly using the pointer of the framebuffer? (So less framebuffer copying)
Stuff almost always has to be copied around. Don't worry about copies, if they are necessary anyway. The Pixel Buffer Objects I outlined above may or may not work with a shader copy in system RAM. Essentially you can glBufferMap into your process' address space and directly decode into that buffer you're given. But that's no guarantee that a further copy is avoided.
I have an image generator which would benefit from running in threads. I am intending to use POSIX threads, and have written some mock up code based on https://computing.llnl.gov/tutorials/pthreads/#ConVarSignal to test things out.
In the intended program, when the GUI is in use, I want the generated lines to appear from the top to the bottom one by one (the image generation can be very slow).
It should also be noted, the data generated in the threads is not the actual image data. The thread data is read and transformed into RGB data and placed into the actual image buffer. And within the GUI, the way the thread generated data is translated to RGB data can be changed during image generation without stopping image generation.
However, there is no guarantee from the thread scheduler that the threads will run in the order I want, which unfortunately makes the transformations of the thread generated data trickier, implying the undesirable solution of keeping an array to hold a bool value to indicate which lines are done.
How should I deal with this?
Currently I have a watcher thread to report when the image is complete (which really should be for a progress bar but I've not got that far yet, it instead uses pthread_cond_wait). And several render threads doing while(next_line());
next_line() does a mutex lock, and gets the value of img_next_line before incrementing it and unlocking the mutex. it then renders the line and does a mutex lock (different to first) to get lines_done checks against height, signals if complete, unlocks and returns 0 if complete or 1 if not.
Given that threads may well be executing in parallel on different cores it's pretty much inevitable that the results will arrive out of order. I think your appraoch of tracking what's complete with a set of flags is quite reasonable.
It's possible that the overall effect might be nicer if used threads in a different granularity. Say give each thread (say) 20 lines to work on rather than one. Then on completion you'd have bigger blocks available to draw, and maybe drawing stripes would look ok?
Just accept that the rows will be done in a non-deterministic order; it sounds like that is happening because they take different lengths of time to render, in which case forcing a completion order will waste CPU time.
This may sound silly but as a user I don't want to see one line rendered slowly from top to bottom. It makes a slow process seem even slower because the user already has completely predicted what will happen next. Better to just render when ready even if it is scattered over the place (either as single lines or better yet as blocks as some have suggested). It makes it look more random and therefore more captivating and less boring to a user like me.
I have a small C program to calculate hashes (for hash tables). The code looks quite clean I hope, but there's something unrelated to it that's bugging me.
I can easily generate about one million hashes in about 0.2-0.3 seconds (benchmarked with /usr/bin/time). However, when I'm printf()inging them in the for loop, the program slows down to about 5 seconds.
Why is this?
How to make it faster? mmapp()ing stdout maybe?
How is stdlibc designed in regards to this, and how may it be improved?
How could the kernel support it better? How would it need to be modified to make the throughput on local "files" (sockets,pipes,etc) REALLY fast?
I'm looking forward for interesting and detailed replies. Thanks.
PS: this is for a compiler construction toolset, so don't by shy to get into details. While that has nothing to do with the problem itself, I just wanted to point out that details interest me.
Addendum
I'm looking for more programatic approaches for solutions and explanations. Indeed, piping does the job, but I don't have control over what the "user" does.
Of course, I'm doing a testing right now, which wouldn't be done by "normal users". BUT that doesn't change the fact that a simple printf() slows down a process, which is the problem I'm trying to find an optimal programmatic solution for.
Addendum - Astonishing results
The reference time is for plain printf() calls inside a TTY and takes about 4 mins 20 secs.
Testing under a /dev/pts (e.g. Konsole) speeds up the output to about 5 seconds.
It takes about the same amount of time when using setbuffer() in my testing code to a size of 16384, almost the same for 8192: about 6 seconds.
setbuffer() has apparently no effect when using it: it takes the same amount of time (on a TTY about 4 mins, on a PTS about 5 seconds).
The astonishing thing is, if I'm starting the test on TTY1 and then switch to another TTY, it does take just the same as on a PTS: about 5 seconds.
Conclusion: the kernel does something which has to do with accessibility and user friendliness. HUH!
Normally, it should be equally slow no matter if you stare at the TTY while its active, or you switch over to another TTY.
Lesson: when running output-intensive programs, switch to another TTY!
Unbuffered output is very slow.
By default stdout is fully-buffered, however when attached to terminal, stdout is either unbuffered or line-buffered.
Try to switch on buffering for stdout using setvbuf(), like this:
char buffer[8192];
setvbuf(stdout, buffer, _IOFBF, sizeof(buffer));
You could store your strings in a buffer and output them to a file (or console) at the end or periodically, when your buffer is full.
If outputting to a console, scrolling is usually a killer.
If you are printf()ing to the console it's usually extremely slow. I'm not sure why but I believe it doesn't return until the console graphically shows the outputted string. Additionally you can't mmap() to stdout.
Writing to a file should be much faster (but still orders of magnitude slower than computing a hash, all I/O is slow).
You can try to redirect output in shell from console to a file. Using this, logs with gigabytes in size can be created in just seconds.
I/O is always slow in comparison to
straight computation. The system has
to wait for more components to be
available in order to use them. It
then has to wait for the response
before it can carry on. Conversely
if it's simply computing, then it's
only really moving data between the
RAM and CPU registers.
I've not tested this, but it may be quicker to append your hashes onto a string, and then just print the string at the end. Although if you're using C, not C++, this may prove to be a pain!
3 and 4 are beyond me I'm afraid.
As I/O is always much slower than CPU computation, you might store all values in fastest possible I/O first. So use RAM if you have enough, use Files if not, but it is much slower than RAM.
Printing out the values can now be done afterwards or in parallel by another thread. So the calculation thread(s) may not need to wait until printf has returned.
I discovered long ago using this technique something that should have been obvious.
Not only is I/O slow, especially to the console, but formatting decimal numbers is not fast either. If you can put the numbers in binary into big buffers, and write those to a file, you'll find it's a lot faster.
Besides, who's going to read them? There's no point printing them all in a human-readable format if nobody needs to read all of them.
Why not create the strings on demand rather that at the point of construction? There is no point in outputting 40 screens of data in one second how can you possibly read it? Why not create the output as required and just display the last screen full and then as required it the user scrolls???
Why not use sprintf to print to a string and then build a concatenated string of all the results in memory and print at the end?
By switching to sprintf you can clearly see how much time is spent in the format conversion and how much is spent displaying the result to the console and change the code appropriately.
Console output is by definition slow, creating a hash is only manipulating a few bytes of memory. Console output needs to go through many layers of the operating system, which will have code to handle thread/process locking etc. once it eventually gets to the display driver which maybe a 9600 baud device! or large bitmap display, simple functions like scrolling the screen may involve manipulating megabytes of memory.
I guess the terminal type is using some buffered output operations, so when you do a printf it does not happen to output in split micro-seconds, it is stored in the buffer memory of the terminal subsystem.
This could be impacted by other things that could cause a slow down, perhaps there's a memory intensive operation running on it other than your program. In short there's far too many things that could all be happening at the same time, paging, swapping, heavy i/o by another process, configuration of memory utilized, maybe memory upgrade, and so on.
It might be better to concatenate the strings until a certain limit is reached, then when it is, write it all out at once. Or even using pthreads to carry out the desired process execution.
Edited:
As for 2,3 it is beyond me. For 4, I am not familiar with Sun, but do know of and have messed with Solaris, There may be a kernel option to use a virtual tty.. i'll admit its been a while since messing with the kernel configs and recompiling it. As such my memory may not be great on this, have a root around with the options to see.
user#host:/usr/src/linux $ make; make menuconfig **OR kconfig if from X**
This will fire up the kernel menu, have a dig around in to see the video settings section under the devices sub-tree..
Edited:
but there's a tweak you put into the kernel by adding a file into the proc filesystem (if a such thing does exist), or possibly a switch passed into the kernel, something like this (this is imaginative and does not imply it actually exists), fastio
Hope this helps,
Best regards,
Tom.