Write-back buffer in LPC (ARM)

I was reading the LPC2148 manual, and in the static RAM section I came across the write-back buffer:
The SRAM controller incorporates a write-back buffer in order to prevent CPU stalls
during back-to-back writes. The write-back buffer always holds the last data sent by
software to the SRAM. This data is only written to the SRAM when another write is
requested by software (the data is only written to the SRAM when software does another
write). If a chip reset occurs, actual SRAM contents will not reflect the most recent write
request (i.e. after a "warm" chip reset, the SRAM does not reflect the last write operation).
Any software that checks SRAM contents after reset must take this into account. Two
identical writes to a location guarantee that the data will be present after a Reset.
What does this mean? And what is meant by "CPU stalls" and "back-to-back writes"?

I'm not an EE, so this is a layman's analogy. You are the only shopper at a supermarket. Because business is slow, there is only one cashier working this shift. There is no checkout counter - only a cashier and a barcode scanner. You hand items, one at a time, to the cashier. While the cashier is holding an item, they cannot take another; only when they are done scanning an item can they accept the next one. If you don't have a bag or a cart and you bring individual items from the shelves to the cashier, there is no problem. But if you bring more than one item to the cashier and try to hand them all at once (back to back), you can't. You hand them over one by one and wait for each to be processed. This is called a stall.
Suddenly, the checkout counter with the conveyor belt is invented. Now you place your shopping at the counter and are free to go shop for more stuff. The cashier scans items at their own (slow) pace, because there is both a place for you to put them and a way for the cashier to reach them. The number of items you can put on the counter is limited, but it does allow you to drop off some stuff and continue shopping, making your shopping much more efficient.
There is a slight problem: before the invention of the checkout counter, when you wanted to know how much the shopping spree was going to cost you, you could just look at the total displayed on the cash register. But now, you need to look at both the cash register and the items on the counter that have not yet been processed.
That's why the read-from-SRAM instruction first surreptitiously checks whether the address you're reading from is one of the addresses to-be-written-to in the write queue/buffer. If so, it takes the value from the latest write-queue entry with the same address instead of actually reading from SRAM. Reads from addresses that are in the write queue can be faster than reads from SRAM, but reads from addresses that are not currently in the write queue are made a little slower by the overhead (or at least less energy-efficient, if SRAM reads and buffer searches are done in parallel). Overall, this makes reads slightly worse, but the gains from not having to wait on writes are worth it.
What they are telling you is that their cashier has an off-by-one bug: it drains the write queue not until it is empty but until there is only one item left on the counter. A Snickers bar. And then the cashier will look at that Snickers bar forever and not put it through checkout. If you need to purchase the Snickers bar, you need to put another item on the counter. Then the cashier will happily move the conveyor belt and take the Snickers bar. The text suggests you use another Snickers bar, but you don't have to. In general, the last item you put on the counter will never be processed by the cashier.
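For concreteness, here's what the manual's suggested workaround looks like in code - a minimal C sketch where the address and the crash-marker use case are made up for illustration; only the two-identical-writes rule comes from the manual:

    #include <stdint.h>

    /* Illustrative on-chip SRAM address, not from the manual. */
    #define CRASH_MARKER ((volatile uint32_t *)0x40003FF0)

    void set_crash_marker(uint32_t value)
    {
        /* First write: lands in the write-back buffer and may sit there
         * indefinitely, so a warm reset would lose it. */
        *CRASH_MARKER = value;
        /* Second, identical write: this new write request pushes the first
         * one out of the buffer into the SRAM array. The buffer now holds
         * the same value again, so SRAM is correct either way. The volatile
         * qualifier stops the compiler from merging the two stores. */
        *CRASH_MARKER = value;
    }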

Related

Estimate rate of memory and CPU stalls

I'm trying to estimate a few things. For example, I have a specific function that executes after another, and I want to know whether it's CPU-bound, in which case it would be OK to move it farther away from that function, or cache/memory-bound, meaning I shouldn't move it farther and may want to split some work.
Some of the things I want to know are listed below. My question is: what events might I want to look at in order to estimate the ratio of memory stalls to CPU stalls, and what gotchas should I know about for the suggested events? I'm reading through perf_event_open and it isn't easy to understand what the data is measuring. Here's my starting point (a measurement sketch follows the list):
Memory- vs. CPU-bound (is there an easy way to know?)
Backend stalls (PERF_COUNT_HW_STALLED_CYCLES_BACKEND - does this report per CPU or per process?)
Unique L1 data lines accessed (Is this available/possible to get?)
Instruction count (PERF_COUNT_HW_INSTRUCTIONS)
Cycle count (PERF_COUNT_HW_CPU_CYCLES or __rdtscp)
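Not an authoritative answer to the event-selection question, but here is a minimal sketch of how two of those counters are wired up with perf_event_open (error handling omitted; event support varies by CPU and kernel). Note that pid = 0, cpu = -1 scopes a counter to the calling process on whatever CPU it migrates to, which is one answer to the per-CPU-or-per-process question - the event counts for this process, not for a whole CPU:

    #include <linux/perf_event.h>
    #include <sys/syscall.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <string.h>
    #include <stdint.h>
    #include <stdio.h>

    static int open_counter(uint64_t config)
    {
        struct perf_event_attr attr;
        memset(&attr, 0, sizeof(attr));
        attr.type = PERF_TYPE_HARDWARE;
        attr.size = sizeof(attr);
        attr.config = config;
        attr.disabled = 1;
        attr.exclude_kernel = 1;           /* user-space cycles only */
        attr.exclude_hv = 1;
        /* pid = 0: this process; cpu = -1: any CPU it runs on. */
        return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
    }

    int main(void)
    {
        int stalls = open_counter(PERF_COUNT_HW_STALLED_CYCLES_BACKEND);
        int cycles = open_counter(PERF_COUNT_HW_CPU_CYCLES);

        ioctl(stalls, PERF_EVENT_IOC_RESET, 0);
        ioctl(cycles, PERF_EVENT_IOC_RESET, 0);
        ioctl(stalls, PERF_EVENT_IOC_ENABLE, 0);
        ioctl(cycles, PERF_EVENT_IOC_ENABLE, 0);

        /* ... run the function under test here ... */

        ioctl(stalls, PERF_EVENT_IOC_DISABLE, 0);
        ioctl(cycles, PERF_EVENT_IOC_DISABLE, 0);

        long long ns = 0, nc = 0;
        read(stalls, &ns, sizeof(ns));
        read(cycles, &nc, sizeof(nc));
        printf("backend-stall cycles / total cycles = %lld / %lld\n", ns, nc);
        return 0;
    }

A high stall-to-cycle ratio suggests the region is memory- (or otherwise backend-) bound; a low one suggests it is mostly retiring instructions and is more likely CPU-bound.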

Read consistency on page split

For simplicity, let's suppose we have some non-leaf page A where the key is an int.
We want to find key 4812, and at this point page A has entries 2311 and 5974.
So the current thread acquires a shared latch on that page and works out that it needs leaf page B (for data between 2311 and 5974).
At the same time, some other thread is inserting into page B, having previously acquired an exclusive latch on it.
Because of the insert, it has to split the page at entry 3742 and create a new page C with the upper half of the data.
The first thread has finished reading and releases the latch on page A.
If it tries to find key 4812 on page B (after the exclusive latch is released), it won't find it, because that key was moved to page C during the page split.
If I understand correctly, a latch is implemented with a spinlock and should be short-lived.
To prevent this kind of problem, the writer thread would have to keep latches on all traversed non-leaf pages, which would be extremely inefficient.
I have basically 2 questions:
Is the latch at page level only, or can it also be at row level? I couldn't find information about that. If it were row-level, the impact wouldn't be that big, but it would still be wasteful when there are no page splits (and that's mostly the case).
Is there some other mechanism to cover this?
My question is about SQL Server because I'm familiar with its internals, but this should apply to most other databases.
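For what it's worth, the textbook mechanism here is latch coupling ("crabbing"): a thread descending the tree acquires the child's latch before releasing the parent's, so a splitter - which needs the parent exclusively to publish the new page - can never restructure the path between those two steps. A rough C sketch with hypothetical page/latch helpers; this is not SQL Server's actual API:

    typedef struct page page_t;

    /* Hypothetical helpers, assumed to exist elsewhere. */
    void    latch_shared(page_t *p);
    void    unlatch(page_t *p);
    page_t *child_for_key(page_t *parent, int key);  /* search in page */
    int     is_leaf(page_t *p);

    /* Descend from the root to the leaf that should hold 'key',
     * never letting go of one level before holding the next. */
    page_t *descend(page_t *root, int key)
    {
        page_t *cur = root;
        latch_shared(cur);
        while (!is_leaf(cur)) {
            page_t *next = child_for_key(cur, key);
            latch_shared(next);   /* grab the child first...           */
            unlatch(cur);         /* ...then release the parent        */
            cur = next;
        }
        return cur;               /* leaf returned still share-latched */
    }

At most two latches are held at a time, which avoids the "keep latches on all traversed pages" cost described above. Writers crab the same way top-down, keeping exclusive latches on any parent that might need to absorb a split, which is why readers and splitters don't deadlock. (B-link trees go further and let readers chase a right-sibling pointer after a split instead.)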

How to efficiently wait for data becoming available in an RDBMS (PostgreSQL) table?

I'm building a web service which reserves unique items for users.
The service is required to handle a high volume of concurrent requests that should avoid blocking each other as much as possible. Each incoming request must reserve n unique items of the desired type and then either process them successfully or release them back to the reservables list so they can be reserved by another request. Successful processing involves multiple steps, like communicating with integrated services, and other time-consuming work, so keeping items reserved inside a DB transaction until the end would not be an efficient solution.
Currently I've implemented a solution where reservable items are stored in a buffer DB table, from which incoming requests lock and delete items with SELECT FOR UPDATE SKIP LOCKED. As the service must support multiple item types, this buffer table holds only n items per type at a time; otherwise the table would grow too big, since there are about ten thousand different types. When all items of a certain type have been reserved (selected and removed), the request locks the item type and adds more reservable items to the buffer. This fill operation requires integration calls and may take some time. During the fill, all other operations need to wait until the filling operation finishes and items become available. This is where the problem arises: when thousands of requests are waiting for the same item type to become available in the buffer, each needs to poll for this information somehow.
What could be an efficient solution for this kind of polling?
I think the "real" answer is to start the refill process when the stock gets low, rather than when it is completely depleted. Then it would already be refilled by the time anyone needs to block on it. Or perhaps you could make the refill process work asynchronously, so that the new rows are generated near-instantly and then the integrations are called later. So you would enqueue the integrations, rather than the consumers.
But barring that, it seems like you want the waiters to lock the "item type" in a mode incompatible with how the refiller locks it. Then they will naturally block, and be released once the refiller is done. The problem is: if you want to assemble an order of 50 things and the 47th is depleted, do you want to maintain the reservation on the previous 46 things while you wait?
Presumably your reservation is not blocking anyone else, unless the one you have reserved is the last one available - in which case you are not really blocking them, just forcing them to go through the refill process, which would have had to be done eventually anyway.
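To make the incompatible-lock-mode idea concrete, here's a sketch in C with libpq; the item_types table and column names are made up for illustration. The refiller holds the item-type row FOR UPDATE for the duration of the fill; a waiter that finds the buffer empty touches the same row FOR SHARE, which conflicts with FOR UPDATE, so it blocks until the refill commits - while the waiters do not block one another:

    /* Refiller side (plain SQL, run once per depleted type):
     *   BEGIN;
     *   SELECT 1 FROM item_types WHERE id = $1 FOR UPDATE;
     *   -- slow integration calls, INSERT new buffer rows ...
     *   COMMIT;
     */
    #include <libpq-fe.h>

    /* Block (without polling) until the refill of this type commits. */
    int wait_for_refill(PGconn *conn, const char *type_id)
    {
        const char *params[1] = { type_id };
        PGresult *res = PQexecParams(conn,
            "SELECT 1 FROM item_types WHERE id = $1 FOR SHARE",
            1, NULL, params, NULL, NULL, 0);
        int ok = (PQresultStatus(res) == PGRES_TUPLES_OK);
        PQclear(res);
        return ok;  /* now retry SELECT ... FOR UPDATE SKIP LOCKED */
    }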

Transactional counter with 5+ writes per second in Google App Engine datastore

I'm developing a tournament version of a game where I expect 1000+ simultaneous players. When the tournament begins, players will be eliminated quite fast (possibly more than 5 per second), but the process will slow down as the tournament progresses. Depending on when a player is eliminated from the tournament, a certain number of points is awarded. For example, a player who drops out first gets nothing, while the player who finishes 500th receives 1 point and the first-place winner receives, say, 200 points. I'd like to award and display the points immediately after a player has been eliminated.
The problem is that when I push a new row into the datastore after a player has been eliminated, the row entity has to be in a separate entity group so I don't hit the GAE datastore limit of 1-5 writes per second per entity group. I also need to be able to read and write a count of rows consistently so I can determine the prize correctly for all the players that get eliminated.
What would be the best way to implement the data model to support this?
Since there's a limited number of players, contention issues over a few writes a second are not likely to be sustained for very long, so you have two options:
Simply ignore the issue. Clusters of eliminations will occur, but as long as it's not a sustained situation, the retry mechanics for transactions will ensure they all get executed.
When someone goes out, record this independently, and update the tournament status, assigning ranks, asynchronously. This means you can't inform them of their rank immediately, but rather need to make an asynchronous reply or have them poll for it.
I would suggest the former, frankly: even if half your 1000-person tournament went out in the first 5 minutes - a preposterously unlikely event - you're still looking at fewer than 2 eliminations per second. In reality, any spikes will be smaller and shorter-lived than that.
One thing to bear in mind is that due to how transaction retries work, transactions on the same entity group that occur together will be resolved in semi-random order - that is, it's not a strict FIFO queue. If you require that, you'll have to enforce it yourself, though that's a far from trivial thing to do in a distributed system of any sort.
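As a loose conceptual analogy (not App Engine's actual implementation), contended transactions on one entity group behave like a compare-and-swap retry loop: every increment eventually lands, but concurrent attempts commit in semi-random order rather than FIFO. A C11 sketch:

    #include <stdatomic.h>

    static _Atomic long eliminated_count;

    /* Returns this player's elimination number. Retries on conflict,
     * just as a datastore transaction is retried; which of several
     * concurrent callers "wins" each round is effectively random. */
    long record_elimination(void)
    {
        long seen, next;
        do {
            seen = atomic_load(&eliminated_count);
            next = seen + 1;
        } while (!atomic_compare_exchange_weak(&eliminated_count,
                                               &seen, next));
        return next;
    }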
The existing comments and answers address the specific question pretty well.
At a higher level, take a look at this post and open-source library from the Google Code Jam team. They had a similar problem and ended up developing a scalable scoreboard based on the datastore that handles both updates and requests for arbitrary pages efficiently.

Alarm history stack or queue?

I'm trying to develop an alarm history structure to be stored in non-volatile flash memory. Flash memory has a limited number of write cycles, so I need a way to add records to the structure without rewriting all of the structure's flash pages each time, and without writing out updated pointers to the head/tail of the queue.
Additionally, once the available flash memory space has been used, I want to begin overwriting the records previously stored in flash, starting with the first record added (first in, first out). This makes me think a circular buffer would work best for adding items. When viewing records, however, I want the structure to work like a stack: the records would be displayed in reverse chronological order (last in, first out).
Structure size, head, tail, and indexes cannot be stored unless they are stored in the records themselves, since writing them to a fixed location each time would exceed the maximum write cycles on the page where they were stored.
So should I use a stack, a queue, or some hybrid structure? How should I store the head, tail, and size information in flash so that it can be re-initialized after power-up?
See the related question "Circular buffer in Flash".
Look up "ring buffer".
Assuming you can work out which is the last entry (from a timestamp etc., so you don't need to write a marker), this also gives the best wear-leveling performance.
Edit: Doesn't apply to the OP's flash controller: You shouldn't have to worry about wear leveling in your code. The flash memory controller should handle this behind the scenes.
However, if you still want to go ahead and do this, just use a regular circular buffer, and keep pointers to the head and tail of the stack.
You could also consider using a Least Recently Used cache to manage where on flash to store data.
You definitely want a ring buffer. But you're right, the meta-information is a bit... interesting.
Map your entries onto several sections. When the sections are full, overwrite starting with the first section. Add a sequence number (with the range of sequence numbers > 2 * the number of entries), so on reboot you know which entry is the first.
You could do a variant of the ring buffer where the first element stored in each page is the number of times that page has been written. This lets you determine where to write next by finding the first page whose count is lower than the previous page's. If they're all the same, you start from the beginning with the next number.
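A sketch of that last scheme in C, assuming a hypothetical flash_read_gen helper, fixed-size records, and ignoring erase handling and counter wraparound. Each page's first word is its write-generation number; after power-up, the next page to write is the first one whose generation is lower than its predecessor's:

    #include <stdint.h>

    #define NUM_PAGES 16

    /* Hypothetical helper: read the first word of a flash page. */
    uint32_t flash_read_gen(unsigned page);

    /* Find the page to write next; *gen_out is the generation number
     * to stamp into it. The newest record sits just before this page,
     * so last-in-first-out display walks backwards from here. */
    unsigned find_next_page(uint32_t *gen_out)
    {
        uint32_t prev = flash_read_gen(0);
        for (unsigned p = 1; p < NUM_PAGES; p++) {
            uint32_t g = flash_read_gen(p);
            if (g < prev) {       /* first "older" page: resume here   */
                *gen_out = prev;  /* bring it up to current generation */
                return p;
            }
            prev = g;
        }
        *gen_out = prev + 1;      /* all equal: wrap to page 0         */
        return 0;
    }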
