Kyoto Tycoon: removing expired records from the in-memory database

We have a small setup of Kyoto Tycoon [Kyoto Tycoon 0.9.55 (2.18) on Linux (Kyoto Cabinet 1.2.75)] running as a fully in-memory DB, split into 3 shards with a master-slave architecture for each shard.
Presently we have an issue with expired records that stay in memory, so memory utilization keeps going up.
I checked this doc, http://fallabs.com/kyototycoon/spex.html#tips, where I found "ktremotemgr vacuum", which according to the description performs a full GC operation.
But I was looking for another way, such as a configuration parameter that takes care of removing expired records from memory.
Any help on this would be appreciated.
Thanks.

Kyoto Tycoon will do this at random, and in some cases it is LRU-based. Yes, memory utilization will go up for some time.
Following is some documentation from the same link.
In addition, automatic deletion by the capacity limit is performed at random. In that case, fresh records may also be deleted soon. So, setting effectual expiration time not to reach the limit is very important. If you cannot calculate effectual expiration time beforehand, use the cache hash database instead of the default stash database. The following setting is suggested.
$ ktserver '*#bnum=20000000#capsiz=8g'
Note that the space efficiency of the cache hash database is worse than that of the stash database. The limit should be up to 50% of the total memory size of the machine. However, automatic deletion by the "capsiz" parameter (not "ktcapsiz") of the cache hash database is based on the LRU algorithm, which prevents fresh records from sudden deletion.
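If you would rather force the sweep yourself than rely on capsiz-based eviction, a small scheduler can invoke "ktremotemgr vacuum" periodically. Below is a minimal Python sketch; it assumes ktremotemgr is on the PATH and accepts the usual -host/-port options, and the host, port, and interval values are placeholders.

    # Periodically trigger Kyoto Tycoon's full GC instead of waiting for
    # capacity-based eviction. Host, port, and interval are placeholders.
    import subprocess
    import time

    KT_HOST = "127.0.0.1"    # placeholder: your ktserver host
    KT_PORT = "1978"         # placeholder: your ktserver port
    INTERVAL_SECONDS = 300   # placeholder: how often to sweep expired records

    while True:
        # "ktremotemgr vacuum" scans the database and removes expired records.
        subprocess.run(
            ["ktremotemgr", "vacuum", "-host", KT_HOST, "-port", KT_PORT],
            check=False,  # don't kill the loop if one sweep fails
        )
        time.sleep(INTERVAL_SECONDS)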

Related

Snowflake query result processing

Can someone please help me understand which layer in Snowflake the data is being fetched from in the plan below? I understand Snowflake uses one of three sources (besides results from metadata for queries like SELECT COUNT(*)): the result cache, the warehouse cache, or disk I/O. In the plan below it is not the result cache (otherwise the plan would say 'query result reuse'), it is not showing any remote disk I/O, and the cache usage is 0%.
So it's not very clear how the data is being processed here. Any thoughts or pointers would be helpful.
The picture says that 0.44MB were scanned.
The picture says that 0% of those 0.44MB came from the local cache.
Hence 0.44MB were read from the main storage layer.
The data is read from the storage layer. I will assume AWS, thus from the S3 bucket where your table is stored. There are three primary reasons for a remote read:
It is the first time this warehouse has used this data. This is the same thing that happens if you stop/start the warehouse.
The data has changed (which can be anything from a 0%-100% change of partitions); given that in your example there is only one partition, any insertion happening in the background will cause 100% cache invalidation.
The data was flushed from the local cache by more active data: if you read this table once every 30 minutes but in between read gigabytes of other tables, then, as with all caches, the low-usage data gets dropped.
The result cache can be used, but it can also be turned off for a session; the local disk cache still applies either way. Your WHERE 20 = 20 in theory might bust the result cache, but as it's a meaningless predicate it might not. Given your results, it seems that at this point in time it's enough to trick the result cache. Which implies that if you do not want to avoid the result cache, stop changing the number, and if you want to avoid it, this approach seems to work.
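If you want to control this explicitly rather than fiddling with the literal, the result cache can be switched off per session via the USE_CACHED_RESULT parameter. A rough Python sketch using the Snowflake connector follows; the connection details and the query are placeholders.

    # Toggle result reuse explicitly instead of relying on a changing literal.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account",    # placeholder
        user="my_user",          # placeholder
        password="my_password",  # placeholder
        warehouse="my_wh",       # placeholder
    )
    cur = conn.cursor()

    # Disable result reuse so the query exercises the warehouse cache /
    # remote storage path you are trying to observe in the profile.
    cur.execute("ALTER SESSION SET USE_CACHED_RESULT = FALSE")
    cur.execute("SELECT COUNT(*) FROM my_table")  # placeholder query
    print(cur.fetchone())

    # Re-enable it when you do want 'query result reuse' to show up.
    cur.execute("ALTER SESSION SET USE_CACHED_RESULT = TRUE")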
I see you have highlighted the two spilling metrics. Those occur when working-state data is too large for memory, or too large for local disk and so is sent to remote storage (S3). The former is a sign your warehouse is undersized, and both are a hint that something in your query is rather bloated. Maybe that is what you want/need, but it slows things down very much. As for whether there is perhaps "another way": if the profile plan has some step that goes 100M rows -> 100GB rows -> 42 rows, that implies a giant mess was made and then some filter threw nearly all of it away, which in turn implies the work could be done differently to avoid that large explosion and filtering.

What operations are O(n) on the number of tables in PostgreSQL?

Let's say, theoretically, I have a database with an absurd number of tables (100,000+). Would that lead to any sort of performance issues, provided most queries (99%+) only touch 2-3 tables at a time?
Therefore, my question is this:
What operations are O(n) on the number of tables in PostgreSQL?
Please note: no answers about how this is bad design, or how I need to plan my design out more carefully. Just assume that for my situation, having a huge number of tables is the best design.
pg_dump, pg_restore, and pg_upgrade are actually worse than that, being O(N^2). That used to be a huge problem, although in recent versions the constant on that N^2 has been reduced so low that for 100,000 tables it is probably not enough to be your biggest problem. However, there are worse cases: dumping tables can be O(M^2) (maybe M^3, I don't recall the exact details anymore) per table, where M is the number of columns in the table. This only applies when the columns have check constraints or defaults or other additional info beyond a name and type. All of these problems are particularly nasty when you have no operational problems to warn you, but then suddenly discover you can't upgrade within a reasonable time frame.
Some physical backup methods, like barman using rsync, are also O(N^2) in the number of files, which is at least as great as the number of tables.
During normal operations, the stats collector can be a big bottleneck. Every time someone requests updated stats on some table, it has to write out a file covering all tables in that database. Writing this out is O(N) in the number of tables in that database. (It used to be worse, writing out one file for the whole instance, not just the database.) This can be made even worse on some filesystems which, when renaming one file over the top of an existing one, implicitly fsync the file; putting the stats file on a RAM disk can at least ameliorate that.
The autovacuum workers loop over every table (roughly once per autovacuum_naptime) to decide whether it needs to be vacuumed, so a huge number of tables can slow this down. This can also be worse than O(N), because for each table there is some possibility that it will request updated stats on it. Worse, it could block all concurrent autovacuum workers while doing so (this last part has been fixed in a back-patch for all supported versions).
Another problem you might run into is that each database backend maintains a cache of metadata on each table (or other object) it has accessed during its lifetime. There is no mechanism for expiring this cache, so if each connection touches a huge number of tables it will start consuming a lot of memory, with one copy per backend, as it is not shared. If you have a connection pooler which holds connections open indefinitely, this can really add up, because each connection lives long enough to touch many tables.
pg_dump with some options, probably -s, is O(n) in the number of tables. Other options make it depend more on the size of the data.
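As a rough way to see how that scales, you can time a schema-only dump as the table count grows. A minimal Python sketch, assuming pg_dump is on the PATH, connection details come from the usual PG* environment variables, and "mydb" is a placeholder database name:

    # Time a schema-only dump; with -s the runtime is driven by the number of
    # tables (and their constraints/defaults), not by the size of the data.
    import subprocess
    import time

    start = time.monotonic()
    subprocess.run(
        ["pg_dump", "-s", "-f", "/dev/null", "mydb"],  # "mydb" is a placeholder
        check=True,
    )
    print(f"pg_dump -s took {time.monotonic() - start:.1f}s")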

Solr indexing issue (out of memory) - looking for a solution

I have a large index of 50 million docs, all running on the same machine (no sharding).
I don't have an ID that would allow me to update only the wanted docs, so for each update I must delete the whole index, index everything from scratch, and commit only at the end when I'm done indexing.
My problem is that every few indexing runs, my Solr crashes with an out-of-memory exception. I am running with 12.5 GB of memory.
From what I understand, until the commit everything is kept in memory, so I'm holding 100M docs in memory instead of 50M. Am I right?
But I cannot commit while I'm indexing, because I deleted all the docs at the beginning, and then I'd be serving a partial index, which is bad.
Are there any known solutions for that? Can sharding solve it, or will I still have the same problem?
Is there a flag that allows me to make soft commits without changing the visible index until the hard commit?
You can use master-slave replication. Just dedicate one machine to do your indexing (the master Solr), and then, when it's finished, tell the slave to replicate the index from the master machine. The slave will download the new index, and it will only delete the old index if the download is successful, so it's quite safe.
http://wiki.apache.org/solr/SolrReplication
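As an illustration of the "tell the slave to replicate" step, the slave's replication handler can be asked to pull the index on demand once the master has committed. A minimal Python sketch; the host, port, and core name are placeholders, and it assumes the ReplicationHandler is configured on both nodes as described in the wiki page above.

    # Ask the slave to fetch the freshly built index from the master now,
    # instead of waiting for its polling interval.
    from urllib.request import urlopen

    SLAVE_CORE_URL = "http://slave-host:8983/solr/mycore"  # placeholder

    # "fetchindex" makes the slave download the master's index; the old index
    # is only replaced if the download succeeds.
    with urlopen(SLAVE_CORE_URL + "/replication?command=fetchindex") as resp:
        print(resp.read().decode())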
Another solution that avoids the replication set-up is to use a reverse proxy: put nginx or something similar in front of your Solr. Use one machine for indexing the new data and the other for searching, and just make the reverse proxy always point at the one not currently doing any indexing.
If you do either of these, you can commit as often as you want.
And because it's generally a bad idea to do indexing and searching on the same machine, I would prefer the master-slave solution (not to mention you have 50M docs).
The out-of-memory error can be solved by giving more memory to the JVM of your container; it has nothing to do with your cache.
Use better garbage-collection options, because the source of the error is your JVM memory being full.
Increase the number of threads, because if the thread limit for a process is reached, a new process is spawned (with the same number of threads and the same memory allocation as the prior one).
Please also write about CPU spikes and any other caching mechanisms you are using.
One thing you can try is to set all autowarm counts to 0; it should speed up commit time.
Regards,
Rajat

Measuring impact of sql server index on writes

I have a large table which is both heavily read and heavily written (append-only, actually).
I'd like to get an understanding of how the indexes are affecting write speed, ideally the time spent updating them (vs. the time spent inserting), or otherwise some sort of feel for the resources used solely for index maintenance.
Is this something that exists in SQL Server / Profiler somewhere?
Thanks.
Look at the various ...wait... columns in sys.dm_db_index_operational_stats. These account for waits on locks and latches; however, they do not account for log write times. For log writes you can do simple math based on row size (i.e. a new index that is 10 bytes wide on a table that is 100 bytes wide will add roughly 10% to the log writes), since log write time is driven just by the number of bytes written. The Log Flush... counters under the Databases performance object measure the current overall DB-wide log wait times.
Ultimately, the best measurement is a baseline comparison under a well-controlled test load.
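For reference, a rough Python/pyodbc sketch of pulling those wait columns for a single table; the connection string and table name are placeholders, and the column list is just a starting point.

    # Pull per-index insert counts and lock/latch wait times for one table.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes"  # placeholder
    )

    sql = """
    SELECT  i.name AS index_name,
            s.leaf_insert_count,
            s.row_lock_wait_in_ms,
            s.page_latch_wait_in_ms,
            s.page_io_latch_wait_in_ms
    FROM    sys.dm_db_index_operational_stats(DB_ID(), OBJECT_ID('dbo.MyBigTable'), NULL, NULL) AS s
    JOIN    sys.indexes AS i
            ON i.object_id = s.object_id AND i.index_id = s.index_id
    ORDER BY s.leaf_insert_count DESC;
    """

    for row in conn.cursor().execute(sql):
        print(row)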
I don't believe there is a way to find out the duration of the updates, but you can check the last user update on the index by querying sys.dm_db_index_usage_stats. This will give you some key information about how often the index is queried and updated, along with the datetime stamps of that activity.

Comment post scalability: Top n per user, 1 update, heavy read

Here's the situation. Multi-million user website. Each user's page has a message section. Anyone can visit a user's page, where they can leave a message or view the last 100 messages.
Messages are short pieces of text with some extra metadata. Every message has to be stored permanently; the only thing that must be real-time fast is writing and reading messages (people use it as chat). A count of messages will be read very often to check for changes. Periodically, it's OK to archive off the old messages (those beyond the last 100), but they must remain accessible.
Currently it's all in one big DB table, and contention between people reading the message lists and posting more messages is becoming an issue.
If you had to re-architect the system, what storage mechanism / caching would you use? What kind of computer-science knowledge can be applied here (e.g. collections, list access, etc.)?
Some general thoughts, not particular to any specific technology:
Partition the data by user ID. The idea is that you can uniformly divide the user space into distinct partitions of roughly the same size, using an appropriate hashing function to spread users across partitions. Ultimately, each partition belongs on a separate machine; however, even as different tables/databases on the same machine this will eliminate some of the contention. Partitioning limits contention and opens the door to scaling "linearly" in the future. It helps with load distribution and scale-out too.
When picking a hashing function to partition the records, look for one that minimizes the number of records that will have to be moved should partitions be added/removed.
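One common choice that satisfies this requirement is consistent hashing: adding or removing a partition only remaps the keys that fall between neighbouring points on the ring, instead of reshuffling every user as a plain "hash(user_id) % n" scheme would. A minimal Python sketch, with the shard names as placeholders:

    import bisect
    import hashlib

    class ConsistentHashRing:
        def __init__(self, partitions, vnodes=100):
            # Place several virtual points per partition on the ring so the
            # key space is split into roughly equal shares.
            self._points = sorted(
                (self._hash(f"{p}#{i}"), p)
                for p in partitions
                for i in range(vnodes)
            )
            self._keys = [h for h, _ in self._points]

        @staticmethod
        def _hash(value):
            return int(hashlib.md5(str(value).encode()).hexdigest(), 16)

        def partition_for(self, user_id):
            # Walk clockwise to the first point at or after the key's hash.
            idx = bisect.bisect(self._keys, self._hash(user_id)) % len(self._points)
            return self._points[idx][1]

    ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])  # placeholder names
    print(ring.partition_for(1234567))  # a given user always maps to the same shard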
Like many other applications, we can assume that use of the service follows a power-law curve: a few of the user pages generate most of the traffic, followed by a long tail. A caching scheme can take advantage of that; the steeper the curve, the more effective caching will be. Given the short messages, if each page shows 100 messages and each message is 100 bytes on average, you could fit about 100,000 top pages in 1 GB of RAM cache. Those cached pages could be written lazily to the database. Out of 10 million users, 100,000 is in the ballpark for making a difference.
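As a rough illustration of that cache, here is a bounded LRU of per-user pages; the capacity mirrors the back-of-the-envelope numbers above and is a placeholder.

    from collections import OrderedDict

    class PageCache:
        # Holds the "last 100 messages" page for the hottest users;
        # ~100,000 pages x ~10 KB each is roughly the 1 GB estimate above.
        def __init__(self, max_pages=100_000):
            self._pages = OrderedDict()  # user_id -> list of recent messages
            self._max_pages = max_pages

        def get(self, user_id):
            page = self._pages.get(user_id)
            if page is not None:
                self._pages.move_to_end(user_id)  # mark as recently used
            return page

        def put(self, user_id, messages):
            self._pages[user_id] = messages[-100:]  # keep only the newest 100
            self._pages.move_to_end(user_id)
            if len(self._pages) > self._max_pages:
                self._pages.popitem(last=False)  # evict the least recently used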
Partition the web servers, possibly using the same hashing scheme. This lets you hold separate RAM caches without contention. The potential benefit is increasing the cache size as the number of users grows.
If appropriate for your environment, one approach for ensuring new messages are eventually written to the database is to place them in a persistent message queue, right after placing them in the RAM cache. The queue suffers no contention, and helps ensure messages are not lost upon machine failure.
One simple solution could be to denormalize your data and store pre-calculated aggregates in a separate table, e.g. a MESSAGE_COUNTS table with a column for the user ID and a column for their message count. When the main messages table is updated, re-calculate the aggregate.
It's just shifting the bottleneck from one place to another, but it might move it somewhere that's less of a burden.
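For illustration, a minimal sketch of the MESSAGE_COUNTS idea using Python's built-in sqlite3 (the schema and names are only illustrative): the counter is bumped in the same transaction as the insert, so the frequent "has anything changed?" poll reads one tiny row instead of counting the big messages table.

    import sqlite3

    db = sqlite3.connect(":memory:")  # stand-in for the real database
    db.executescript("""
        CREATE TABLE messages (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT);
        CREATE TABLE message_counts (user_id INTEGER PRIMARY KEY, message_count INTEGER NOT NULL);
    """)

    def post_message(user_id, body):
        with db:  # one transaction: the insert and the counter bump stay consistent
            db.execute("INSERT INTO messages (user_id, body) VALUES (?, ?)", (user_id, body))
            db.execute("INSERT OR IGNORE INTO message_counts (user_id, message_count) VALUES (?, 0)", (user_id,))
            db.execute("UPDATE message_counts SET message_count = message_count + 1 WHERE user_id = ?", (user_id,))

    post_message(42, "hello")
    post_message(42, "world")
    print(db.execute("SELECT message_count FROM message_counts WHERE user_id = ?", (42,)).fetchone())  # (2,)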
