Performance issue with loading a list of cached elements - database

This is a general cache question, regardless of the code behind it, but for the record I am using Ehcache for Java.
In a classic situation where a system has to load a dynamic list of elements from a database (so the query is based on some criteria), are there any known tricks to improve loading performance by leveraging the cache system?
My guess would be to load a list of IDs instead of a list of elements and then fetch each of them individually, so we can leverage the caching of the entities.
Thanks for your help.
PS: I hope the question is clear enough. Any suggestion is welcome.

The generic answer is: "It all depends".
So, yes, if you expect the entities to be cached by ID, loading a list of IDs and then fetching each entity from the cache can be faster. However, you need to be sure the cache will actually hit, because otherwise fetching one entity after another from the database is tremendously slow.
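A minimal sketch of that idea, assuming a hypothetical `item` table and a plain dict standing in for Ehcache (or any cache keyed by ID). The query returns IDs only, hits are served from the cache, and all misses are loaded in one batch query rather than one query per ID, which avoids the slow case described above:

```python
import sqlite3

# Hypothetical schema; the dict below stands in for a real keyed cache.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO item VALUES (?, ?, ?)",
    [(1, "apple", "fruit"), (2, "pear", "fruit"), (3, "leek", "vegetable"), (4, "plum", "fruit")],
)

entity_cache = {}   # id -> row tuple
queries_run = []    # records each SQL statement, only to make the cost visible


def run(sql, params=()):
    queries_run.append(sql)
    return conn.execute(sql, params).fetchall()


def load_by_category(category):
    # Step 1: the dynamic query returns IDs only.
    ids = [row[0] for row in run("SELECT id FROM item WHERE category = ? ORDER BY id", (category,))]

    # Step 2: work out which IDs are not cached yet.
    missing = [i for i in ids if i not in entity_cache]

    # Step 3: load all misses in ONE query rather than one query per ID.
    if missing:
        placeholders = ",".join("?" * len(missing))
        for row in run(f"SELECT id, name, category FROM item WHERE id IN ({placeholders})", missing):
            entity_cache[row[0]] = row

    return [entity_cache[i] for i in ids]


first = load_by_category("fruit")    # cold cache: ID query plus one batch query
second = load_by_category("fruit")   # warm cache: ID query only
```

Whether this beats a single full query depends on the hit rate and on how expensive the entities are to build, so it is worth measuring both.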
But then, just querying the DB can be fast too, if it doesn't happen too often. Optimizing the DB-to-Java mapping can improve performance as well.
Another trick is to retrieve only the data you need, not the entire entity. I usually start with that.
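To illustrate retrieving only the data you need, here is a small sketch against a hypothetical `article` table whose `body` column is large. The table and column names are assumptions made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.execute("INSERT INTO article VALUES (1, 'Caching 101', ?)", ("x" * 100_000,))

# Full entity: drags the large body along even if a list view never shows it.
full_row = conn.execute("SELECT * FROM article WHERE id = 1").fetchone()

# Projection: only the columns the list view needs.
summary_row = conn.execute("SELECT id, title FROM article WHERE id = 1").fetchone()
```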
So as I said, it depends.

Related

Is repeatedly upserting the same values bad?

I'm working on an app that needs to keep track of YouTube videos. I want to periodically pull the info on the relevant videos into Datomic and then serve them as embeds with titles, descriptions, etc. A naive way to do that would be to periodically fetch all the info I want and upsert it into my db.
But most of the time, the information won't have changed. Titles and descriptions can change (and I want to notice when they do), but usually they won't. Using the naive approach, I'd be updating entities with the same value over and over again.
Is that bad? Will I just fill up my storage with history? Will it cause a lot of reindexing? Or should I not worry about that, and let Datomic take care of itself?
A less-naive approach would look at the current values and see if they need updating. If that's a better idea, is there an easy way to do that, or should I expect to be writing a lot of custom code for it?
Upserting too often is definitely an issue for database performance. Yes, it will cause indexing issues, and in terms of speed it's not an ideal solution either.
If time is an important factor in your app's performance, I'd write custom code to check the current values and update only if necessary.
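A rough sketch of the check-then-update approach. This is not Datomic's API: a plain dict stands in for the database, and `upsert_if_changed` is a hypothetical helper that writes only the attributes whose values differ from what is stored:

```python
# Hypothetical in-memory store keyed by video ID; a stand-in for the real database.
store = {}
writes = []  # log of the writes actually performed


def upsert_if_changed(video_id, fetched):
    """Write only the attributes whose values differ from what is stored."""
    current = store.get(video_id, {})
    changed = {k: v for k, v in fetched.items() if current.get(k) != v}
    if changed:
        store[video_id] = {**current, **changed}
        writes.append((video_id, changed))
    return changed


upsert_if_changed("abc123", {"title": "Intro", "description": "First video"})
upsert_if_changed("abc123", {"title": "Intro", "description": "First video"})  # nothing changed, no write
upsert_if_changed("abc123", {"title": "Intro (updated)", "description": "First video"})
```

The returned dict of changed attributes also gives you a natural place to notice when a title or description changes.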

HttpContext.Current.Cache VS. SQL Table Performance

We have a poorly designed shopping cart database. All processed objects that will be used by the front-end site are stored in HttpContext.Current.Cache on Application_Start. By processed objects I mean the results of SQL scripts that have many joins and WHERE conditions.
I'm looking for the best way to remove caching or improve the current caching process. I'm thinking of storing the processed objects in a SQL Server table that will be repopulated every midnight, then using the Dapper ORM to retrieve data from that table and implementing output caching.
I hope someone can share a fast and maintainable solution for this problem. :)
Thanks!
What you are describing is really duplicating the data into a second (technically redundant) model that is more suitable for querying. If that is the case, then sure, have fun with that; it isn't exactly uncommon. However, before doing all that, you might want to try indexed views. It could be that they solve almost everything without you having to write all the maintenance code.
I would suggest, however, not to "remove caching" - but simply "make the cache expire at some point"; there's an important difference. Hitting the database for the same data on every single request is not a great idea.
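The idea of letting the cache expire instead of removing it can be sketched independently of ASP.NET. `ExpiringCache` and `load_products` below are hypothetical names, and the clock is injectable only so the behaviour is easy to follow:

```python
import time


class ExpiringCache:
    """Cache whose entries are reloaded once they are older than ttl_seconds."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, stored_at)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]
        value = self._loader(key)
        self._entries[key] = (value, now)
        return value


# Usage with a fake clock and a fake loader.
fake_now = [0.0]
load_calls = []


def load_products(key):
    load_calls.append(key)   # stands in for the expensive multi-join query
    return f"products for {key}"


cache = ExpiringCache(load_products, ttl_seconds=60, clock=lambda: fake_now[0])
cache.get("front-page")      # miss: runs the loader
cache.get("front-page")      # hit: served from memory
fake_now[0] = 61.0
cache.get("front-page")      # expired: runs the loader again
```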

Using application's internal cache while working with Cassandra

As I've been working with traditional relational databases for a long time, moving to NoSQL, especially Cassandra, is a big change. I usually design my application so that everything in the database is loaded into the application's internal caches on startup, and if there is any update to a database table, its corresponding cache is updated as well. For example, if I have a table Student, on startup all data in that table is loaded into StudentCache, and when I want to insert/update/delete, I call a service which updates both of them at the same time. The aim of my design is to avoid selecting directly from the database.
In Cassandra, since the idea is to build tables containing all the needed data so that joins are unnecessary, I wonder if my favorite design is still useful, or whether it is more effective to query data directly from the database (i.e. from one table) when required.
Based on the use case you describe, I'd say that querying data as you need it avoids storing data you don't need. Plus, what if your dataset is 5 GB? Are you still going to load the entire dataset?
Maybe consider a design where you don't load all the data on startup, but load it as needed, store it, and check this store before querying again, like what a cache does!
Cassandra is built to scale, but your design can't handle scaling: you'll reach a point where your dataset is too large. Based on that, you should think about a tradeoff: lots of on-the-fly querying vs. storing everything in the client. I would advise direct queries, but store the data when you do carry out a query; don't discard it and then carry out the same query again!
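A sketch of that load-as-needed design, with a size bound so the store cannot outgrow memory. `LoadOnDemandCache` and `query_student` are hypothetical names, and `query_student` merely stands in for a Cassandra lookup by key:

```python
from collections import OrderedDict


class LoadOnDemandCache:
    """Loads a row the first time it is asked for and keeps at most max_entries rows."""

    def __init__(self, query, max_entries):
        self._query = query
        self._max = max_entries
        self._rows = OrderedDict()

    def get(self, key):
        if key in self._rows:
            self._rows.move_to_end(key)      # mark as most recently used
            return self._rows[key]
        row = self._query(key)               # only hit the database on a miss
        self._rows[key] = row
        if len(self._rows) > self._max:
            self._rows.popitem(last=False)   # evict the least recently used row
        return row


queried = []


def query_student(student_id):
    queried.append(student_id)               # stands in for a SELECT by primary key
    return {"id": student_id, "name": f"student-{student_id}"}


students = LoadOnDemandCache(query_student, max_entries=2)
students.get(1)
students.get(1)   # served from the cache
students.get(2)
students.get(3)   # cache is full, so student 1 is evicted
students.get(1)   # queried again
```

Writes would still need to update or invalidate the cached row, as in the original design.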
I would suggest querying the data directly, since keeping all the data in the application makes the application's performance depend on the size of the input. This might be acceptable if you know that the amount of data will never exceed your target machine's memory.
Should you later decide that this limit has to go up, however, you will be faced with a problem. Loading everything will be fast when it comes to searching (assuming you sort the data at startup) but will pretty much kill maintainability.
Your former favorite approach is still useful, however, should you choose to go that way.

Symfony2 Doctrine Use Cache Tables

For the project I'm working on, we have a fully normalized database where no information is redundant.
I'd like to keep this method, but also add "cache" tables, which are essentially tables which have pre-computed information. I'd love to be able to have this information in separate tables (which could then be blown away and regenerated as needed).
For example, part of this involves a forum. One "cached" value would be the number of posts a user has made. There is no need to keep this in any of the normalized tables, because it can be calculated based on a count of posts linked with that user. However, this is a (relatively) expensive call, so the cache table would keep track of this value for me and I can pull from it as needed.
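To make the idea concrete, here is a sketch of such a cache table in plain SQL (run through SQLite here, not Doctrine). The table names `forum_user`, `post` and `cache_user_post_count` are assumptions for the example. The cache table holds derived data only, so it can be emptied and rebuilt from the normalized tables at any time:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE forum_user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES forum_user(id), body TEXT);
    -- Derived data only: safe to drop and rebuild at any time.
    CREATE TABLE cache_user_post_count (user_id INTEGER PRIMARY KEY, post_count INTEGER NOT NULL);

    INSERT INTO forum_user VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO post (user_id, body) VALUES (1, 'hi'), (1, 'again'), (2, 'hello');
""")


def rebuild_post_counts(conn):
    """Empty the cache table and recompute it from the normalized tables."""
    with conn:  # run the delete and the insert as one transaction
        conn.execute("DELETE FROM cache_user_post_count")
        conn.execute("""
            INSERT INTO cache_user_post_count (user_id, post_count)
            SELECT u.id, COUNT(p.id)
            FROM forum_user u LEFT JOIN post p ON p.user_id = u.id
            GROUP BY u.id
        """)


rebuild_post_counts(conn)
counts = dict(conn.execute("SELECT user_id, post_count FROM cache_user_post_count"))
```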
I'm also strongly considering using a NoSQL database like MongoDB for this, because the cached tables would essentially have no joins or foreign keys (making it perfect for MongoDB).
Any ideas how I should approach this using Doctrine in Symfony2? Anyone done this before?
Thanks a ton!
Update
As greg0ire comments, it looks like Doctrine has some built in caching functionality: http://docs.doctrine-project.org/projects/doctrine-orm/en/latest/reference/caching.html
Does anyone know if I can employ this to cache my values without storing them in the database?
For example, if I had an unmapped property $postCount, can I use Doctrine to cache that value (or I guess, the object with that value populated)?
The only problem with this approach (caching to memory instead of a database) is that we're working in a clustered environment, so I'd either have to build the cache multiple times (once on each server the user hits) or get a shared caching server set up (which is a bit tricky).
I'll continue to investigate this route, but does anyone know of any database stored methods?
Thanks.
I think you may be looking for Doctrine's result cache.
Here is the related part of the sf2 configuration.

Optimize database for web usage (lots more reading than writing)

I am trying to lay out the tables for use in a new public-facing website. Seeing as there will be a lot more reading than writing of data (guessing >85% reading), I would like to optimize the database for reading.
Whenever we list members we are planning on showing summary information about the members, something akin to the reputation points and badges that Stack Overflow uses. Instead of doing a subquery to find the information each time we do a search, I wanted to have a "calculated" field in the member table.
Whenever an action is initiated that would affect this field, say the member gets more points, we simply update this field by running a query to calculate the new values.
Obviously, there would be the need to keep this field up to date, but even if the field gets out of sync, we can always rerun the query to update this field.
My question: Is this an appropriate approach to optimizing the database? Or are the subqueries fast enough that performance would not suffer?
There are two parts:
1. Caching
2. Tuned Query
   - Indexed Views (AKA Materialized views)
   - Tuned table
The best solution requires querying the database as little as possible, which would require caching. But you still need a query to fill that cache, and the cache needs to be refreshed when it is stale...
Indexed views are the next consideration. Because they are indexed, querying against them is faster than querying an ordinary view (which is equivalent to a subquery). Nonclustered indexes can be applied to indexed views as well. The problem is that indexed views (materialized views in general) are very constrained in what they support: they can't use non-deterministic functions (e.g. GETDATE()), have extremely limited aggregate support, etc.
If what you need can't be handled by an indexed view, a table where the data is dumped and refreshed via a SQL Server job is the next alternative. As with the indexed view, indexes would be applied to make fetching data faster. But data changes mean maintaining the indexes to ensure the query runs as well as it can, and this maintenance can take time.
The least expensive database query is the one that you don't have to run against the database at all.
In the scenario you describe, using a high-performance cache technology (for example memcached) to store query results in your application can be a much better strategy than trying to trick out the database to be highly scalable.
The First Rule of Program Optimization: Don't do it.
The Second Rule of Program Optimization (for experts only!): Don't do it yet.
Michael A. Jackson
If you are just designing the tables, I'd say, it's definitely premature to optimize.
You might want to redesign your database a few days later; you might find out that things work pretty fast without any clever hacks; or you might find out they are slow, but in a different way than you expected. In any of these cases you would have wasted your time if you start optimizing now.
The approach you describe is generally fine; you could keep some pre-computed values, either using triggers/SPs to preserve data consistency or running a job to update these values from time to time.
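Both options can be sketched in a few lines. The example below uses SQLite rather than SQL Server, and the `member` and `point_event` tables are assumptions made up for illustration. A trigger keeps the pre-computed `points` column in step with each new event, and a separate statement recomputes it from the source rows if it ever drifts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE member (id INTEGER PRIMARY KEY, name TEXT, points INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE point_event (id INTEGER PRIMARY KEY, member_id INTEGER, delta INTEGER);

    -- Keep the pre-computed column in step with every new event.
    CREATE TRIGGER point_event_ai AFTER INSERT ON point_event
    BEGIN
        UPDATE member SET points = points + NEW.delta WHERE id = NEW.member_id;
    END;

    INSERT INTO member (id, name) VALUES (1, 'ann');
    INSERT INTO point_event (member_id, delta) VALUES (1, 10), (1, 5);
""")

points_after_events = conn.execute("SELECT points FROM member WHERE id = 1").fetchone()[0]

# Simulate drift, then repair it by recomputing from the source rows.
conn.execute("UPDATE member SET points = 999 WHERE id = 1")
conn.execute("""
    UPDATE member
    SET points = (SELECT COALESCE(SUM(delta), 0) FROM point_event WHERE member_id = member.id)
""")
points_after_resync = conn.execute("SELECT points FROM member WHERE id = 1").fetchone()[0]
```

The resync statement is the kind of thing a scheduled job could run from time to time.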
All databases are more than 85% read only! Usually high nineties too.
Tune it when you need to and not before.
