Write once read many in memory key value store - database

I have a particular use case for multiple in memory key value maps that need very fast lookup time. They are set just set once a day so can be considered immutable for all practical purposes. Redis is not an option since it gets CPU throttled in case of multiple threads accessing it. Multi instance redis takes up too much memory because of data replication. The important thing to consider here is that the read rate is very high in bursts. Around 10 million requests in bursts from around 40-50 workers simultaneously.
I was thinking of creating a simple client server architecture with multiple readers connecting to a server to read from shared memory maps. However I wonder if such an architecture already exists and has been tested profusely for this use case in which case I should not be reinventing the wheel.
So to sum up what is my best alternative? TIA.

Might not be suitable for you but you could try RBLDNSD and store your values in DNS. It's high performance and results will be cached, and it's easy to read the values from pretty much any programming environment. To write values to it you'll need to write directly to its zone files, but the format is simple and easy to write.

You don't mention the size of your maps, but given that performance is so critical, it sounds like you may want to consider keeping copies of your 'multiple in memory key value maps' with each worker.
You could then implement a simple mechanism to notify each worker that it's time to refresh their maps (e.g. Redis PUBLISH, or any other pubsub type framework).

At the risk of running afoul of the stackoverlow self-promotion police :-) eXtremeDB might be a consideration. It's not schema-less, but your schema can simply define a key-value pair. It supports MVCC (optimistic, non-blocking) concurrency so even the relatively infrequent writes won't get in the way of readers, and you'll be able to utilize all the CPU cores.

Related

Best DB architecture to maintain/update counters in near real time

I am at the beginning of a project where we will need to manage a near real-time flow of messages containing some ids (e.g. sender's id, receiver's id, etc.). We expect a throughput of about 100 messages per second.
What we will need to do is to keep track of the number of times these ids appeared in a specific time frame (e.g. last hour or last day) and store these values somewhere.
We will use the values to perform some real time analysis (i.e. apply a predictive model) and update them when needed while parsing the messages.
Considering the high throughput and the need to be in real time what DB solution would be the better choice?
I was thinking about a key-value in memory DB that will persist data on disk periodically (like Redis).
Thanks in advance for the help.
The best choice depends on many factors we don’t know, like what tech stack is your team already using, how open are they to learning new things, how much operational burden are you willing to take on, etc.
That being said, I would build a counter on top of DynamoDB. Since DynamoDB is fully managed, you have no operational burden (no database server upgrades, etc.). It can handle very high throughput, and it has single-digit millisecond latency for writes and reads to a single row. AWS even has documentation describing how to use DynamoDB as a counter.
I’m not as familiar with other cloud platforms, but you can probably find something in Azure or GCP that offers similar functionality.

How expensive is access to database? How often do we access to it?

I'm about to write an application for Android, and it will use Mysql.
I know that access to DB is really expensive in terms of time, and would like to know how often do applications like instant messaging, online gaming access to databases?
For example in a game, we would like to save the positions of a player in the world, when he's moving all the time.
Is the database access actually not expensive, and there is a way to be connected to it all the time and just do request that are actually not expensive?
Or is IT really expensive in anyway, and there are techniques to access to it for example every X interval of time, and saving it locally in the meantime?
I Know that my question is really general, and it depends always on what we need and want.
My question came out because i made a really simple login application that connects and does 1 request to database, and it takes 1 second (a lot!!) to get the result, so how online applications can be so fast?
Thank you
Before answering this I would recommend simulating the process as much as possible, benchmarking and you can work towards the best solution for your use case.
e.g. If I have an application submitting data to a database simulate the submission so I can easily run multiple submissions at the same time and see what the bottle neck is...and see how it compares when I using caching, replication, indexes, etc.
Also reading company blogs can be helpful as they often share success stories that support the usage of a particular approach
How expensive is access to database?
Accessing a database can be a pretty quick operation
SELECT 1; // 0.005 Secs :D
However there are situations that can lead to poor performance (slow reads, writes and updates) but there are some relatively simple ways to combat this
Indexes
The best way to improve the performance of SELECT operations is to
create indexes on one or more of the columns that are tested in the
query. The index entries act like pointers to the table rows, allowing
the query to quickly determine which rows match a condition in the
WHERE clause, and retrieve the other column values for those rows.
Replication
spreading the load among multiple slaves to improve performance. In
this environment, all writes and updates must take place on the master
server. Reads, however, may take place on one or more slaves. This
model can improve the performance of writes (since the master is
dedicated to updates), while dramatically increasing read speed across
an increasing number of slaves.
How often do we access to it?
If you are solely using a database you will access it every time you n position and every time you need to find out their position.
This is where you would explore options to prevent accessing the database.
Memory caches such as redis or memcache
Replication - Only read from slaves
It depends on your design and requirement.
1) Most of the applications manage Connection Pools to minimize the initialization time.
2) Most of the ORM frameworks have external Cache to improve the reading performance. So if you do heavy data reading in your application then don't worry about storing it in locally. The Cache will be effective in this case.
3) When you store locally either in File (or) some format, then it will also add extra performance delay.
4) If you keep the data in primary memory, then obviously Game performance would be better. That's why Gamers prefer high end graphics card, and huge RAM.
For most databases there is the option of batch insertions. Obviously even a small overhead will accumulate if you have to many connections over time. And performing single insertions will have a greater overhead than on batch. The only issue is how often?.... And you should test how often you wan't to insert and how much information you should store locally before doing a batch insertion.

What are the low hanging fruit for optimizing google app engine with respect to quota usage?

Everyone learns to use Memcache pretty quick. Another one I've learned recently is setting indexed=False for Model properties that I am not going to query against. What are some others? What are the big ones?
Don't use offset in queries. Use cursors instead.
Explanations: offset loads all data up to offset+limit and charges you for it, but only returns limit entities.
Minimize instance use, by tweaking idle instances and pending latency appropriately for your app.
A couple helped us (not all may be low-hanging at first). First, we denormalized our datastore to reduce joins. I'm using SQL terms because I came from a SQL background. By spreading commonly queried elements around, we reduced the number of reads we had to make considerably, even after factoring in Memcache. Potentially increases writes but for most apps, the number of reads far outweighs the number of writes.
Next, we started using task queues, backends, and the channel API more often. I don't remember specific examples but I do remember we were able to reduce our front-end usage down below the free quota mark by moving some processing around to queues and backends and by sending data down via channel rather than having the client poll.
Also, we use objectify for our data access which we configure to automatically use memcache wherever appropriate.

Trading applications trade-offs : Databases and low-latency

Databases are required for almost every business application to store data and the transactions done on that data. The transactions typically take a time of the order of milliseconds. At the same time, in a trading application the one thing which is not at all acceptable is "latency". So, what are the trade-offs made in such applications which require an upper limit on latency?
For example, a trade has been placed by the customer, it must pass a few checks, which are stored in the database, requiring a DB fetch. Then, the trade should be passed on to an OMS/ORS or the exchange. And, at each layer, it would be required to store some sort of transaction data in the database. How should one maintain a balance between transaction persistence and low-latency?
Two things:
Caching: Cache the rules in your application, so you don't need to hit the database for every trade
Threading: Put the code that stores the transaction data in the database into another thread. Like this, you can route the trade to the exchange immediately and safe the data in the database parallel to that.
Now a days people use in-memory transaction system, so you cut down network latency that you get by database.
Here are some things that you can consider for low latency
To achieve low latency in java you have to take control of GC in java, there are many ways to do that for eg pre-allocate objects(i.e use flyweight design pattern), use primitive objects - trove is very good for that, all data structure are based on primitive, Reuse object instance for eg create system wide dictionary to reduce creating new objects, very good option when reading data from stream/socket/db
-Try to reduce contention use wait-free algo( which is bit difficult), lock free algo. You can find tons of example for that
-Use in-memory computing. Memory is cheap, you can have tera byte of data in memory.
-Use mechnical sympathy - Refer lmax disruptor, excellent framework

What's more costly on every page view - Database Writes or File Writes?

What is the most efficient solution when you need to record some data on every page view in your application - should you write to a file or write to the database?
Or maybe neither - perhaps you should cache the data in memory or a file and only write it to the database (or file system if you use a memory cache) occasionally?
If it's purely recording a small amount of data with no subsequent lookups, straight file I/O is almost guaranteed to be more efficient. You're losing all the advantages of a DBMS though -- indexing, transactional integrity (really, ACID in general), concurrent access, etc..
It almost sounds like you're talking about what amounts to simple logging. If that's the case, and you don't need to do frequent complex queries on the resulting data, you're probably better off with straight file I/O if performance is a serious issue. Be careful of concurrent-write issues, though.
If the properties of an RDBMS are desirable, you might think about using SQLite, which for simplistic loads will get you better performance than most RDBMSs with less overhead, at the cost of some of the benefits (highly concurrent access and availability over the network to other machines are a couple of the "biggies"). It still wouldn't be as fast as straight file I/O in the general case, though.
Your later mention of it being for page view tracking causes me to ask: Are you incrementing a counter, rather than logging data about the page view? If so, I'd strongly suggest going with something like SQLite (doing something like UPDATE tbl SET counter = counter+1). You really don't want to get into the timing issues involved in doing this by hand -- if you don't do it right, you'll start losing counts on simultaneous access (A reads "100", B reads "100", A writes "101", B writes "101"; B should have written 102, but has no way of knowing that).
Conceptually, writing to the database is always slower than writing to a file.
The database has to write to a file too, with the extra overhead of communication to get the data to the database, so it can write it to a file. Therefore, it must be slower.
That said, databases do disk I/O very well, probably better than you will. Don't be surprised if you find out that a simple file logger is slower than writing it to a database. The database has a lot of I/O optimizations, and has some tricks available that you may not (depending on your web lanaguage and environment).
Don't be surprised if the answer changes over time. When your site is small, logging to a database is very fast. As your site grows, the logging table can become a major pain: It uses a lot of disk space, makes the backups take forever, and consumes all the disk I/O when you try to query it. This is why you should benchmark both methods yourself. Then you can re-test in the future, when conditions change.
Hitting the database is most likely going to be more expensive than writing to a file.
If your pageviews per second are high, and if the data doesn't need to be available in the database right away, then writing to a file and periodically loading the data into the DB will be a more optimal solution.
However it all depends on the nature of the data you're recording per page view and how critical it is to whatever business function it serves.
That highly depends on your needs for data safety. If you can afford to lose some data in case of a crash then keeping the data in memory and writing it periodically to a persistent store is certainly the most efficient way to go.
Edit: You mentioned pageviews. In that case I would keep the counters in memory and periodically update a database table (like every minute or so).
That depends.
Ands it really does: it depends on the DBMS and/or the OS+filesystem you use. In other words: your mileage varies.
If you just append data somewhere modern DBMS/OS+filesystems should handle this equally well and fast. Problems arise when you want to change data.
Caching - depends too on what kind of caching granularity you can afford (need to have every stepped logged crash-safe versus potential saving).
Use a hybrid solution like redis its designed for this sort of stuff

Resources