EF Core 6: inspection through SQL Server Profiler is shocking - sql-server

I have inspected EF Core 6 with SQL Server Profiler, and the results are shocking from a performance point of view.
Here nothing is profiled, which is very good:
var user = App.Ctx.LoginUsers;
This code also produces nothing in the profiler, which is also very good:
user = App.Ctx.LoginUsers;
Here, the code is profiled, which is also good:
var users = App.Ctx.LoginUsers.ToList();
But this code, when run, is profiled again, which is very bad because the data is already in the context's memory from the call above:
users = App.Ctx.LoginUsers.ToList();
If every query makes a round trip to SQL Server even when the data is already in the context's memory, won't performance be a disaster?
We expected a round trip to SQL Server only for the differential data that has changed:
inserted rows to be added to the DbContext's memory,
only modified rows to be updated in the DbContext's memory,
and only deleted rows to be removed from the DbContext's memory,
with any requested data served from the DbContext's memory.
We thought performance would be boosted drastically this way.

if every query makes a round trip to SQL Server
Yes, that's what queries do. They fetch data from the database. If you want to access data that was loaded by previous queries, use DbSet<TEntity>.Local, or store it in your own collection.
The DbContext Change Tracker stores the entities retrieved from your database, but it's not designed as a read-through cache. If you run another query, another query will be sent to the database.

What do you mean by "disaster"?
By default, EF tracks references to entities it has already fetched. Do not confuse this tracking with caching for performance reasons. Loading entire sets of entities as tracked references is not a good idea, not only because it still means extra round trips to the database even though those instances are already tracked, but because the more instances the EF DbContext is tracking, the more memory is in use and the longer it can take to fetch additional data, since those operations automatically try to associate any already tracked instances with the relationships in the data being returned.
If you fetch a significant amount of data and don't need it to be tracked by the DbContext (i.e. you don't intend to update it, so you don't need change tracking), then use AsNoTracking():
var users = App.Ctx.LoginUsers.AsNoTracking().ToList();
This will still fetch all users from the DB but the Context will not be tracking these instances.
If you know you have already loaded the desired data, or want to check for and use any pre-loaded, tracked instances before going to the database, then use the Local set on the DbSet to tell EF to look only at the tracked instances:
// Look for a tracked instance:
var user = App.Ctx.LoginUsers.Local.SingleOrDefault(x => x.UserId == userId);
if (user == null)
    user = App.Ctx.LoginUsers.Single(x => x.UserId == userId);
This is a common strategy when you know some data might already be tracked and you are dealing with detached entities (i.e. a user loaded with AsNoTracking or deserialized). The Local call checks the tracking store for that entity, so there is no round trip to the DB; if we don't find it there, we load and track it from the DB.
The other detail to be aware of is that while a query against the DbContext that would return an already tracked instance still triggers an SQL query, the tracked instance is not updated with the data returned by that query. For instance, if you load your Users into the DbContext and some other process not using that DbContext instance then modifies one or more of those users in the database, fetching those users from the DbContext will return the tracked data as it was when it was loaded. An SQL query will be run against the database; however, any modified data state does not automatically update the already tracked entities.
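If you do need a tracked instance to reflect the latest database values, one option is to reload its entry explicitly. A minimal sketch, reusing the App.Ctx context, LoginUsers set and userId from the examples above:
// Re-read the row from the database, overwriting the stale tracked values:
var trackedUser = App.Ctx.LoginUsers.Single(x => x.UserId == userId);
App.Ctx.Entry(trackedUser).Reload();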

Related

A copy of the database on the front-end side (Store) to avoid fetching data multiple times

I have a large database, so I decided to create a copy of it on the front-end side in my Store (I am using Vuex with Vue.js), i.e. each table in the database has a corresponding array in the Store. Every time I make an update, I persist it to the real database and to my copy in the Store at the same time, to avoid fetching the data again after each update.
Is this a good idea or not (in terms of performance)?
I think it's totally fine to create a cache of your data on the client side, and it is even recommended. But you do need to be aware of some things:
Make sure you update the client side only when the data has been successfully saved in the database, so the client does not see false information.
If you have multiple users who can change the data, and you need to use the data in real time, make sure you send the updated data to the other users as well.
Check which tables from the database you actually need at any time - maybe you need different tables in different views of the application, so you do not have to keep a copy of the whole database all the time. This helps reduce memory usage.
Consider using lazy loading, which means you load a table only when you need it and then save it in the cache. The next time you need that table you won't load it from the server, but use the cached data instead.
When you put data in the Vuex store, Vue makes it reactive, which can cause performance issues, especially if you have a lot of data. If you have data that you know will not change, or that changes rarely, consider using Object.freeze(), which basically tells Vue not to put any watchers on the object. This can improve performance considerably.
EDIT:
If you are concerned about performance, I would implement the cache using lazy loading together with Object.freeze(). Freezing means you cannot change the data on the client side, so for every change you send the update to the server and receive the full updated table back, then assign the new value to your cache with Object.freeze() again. That way you don't have to request the table from the server for every read, only for updates, which keeps performance good.
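A minimal Vuex sketch of the lazy-load plus Object.freeze() approach described above; the module name, state shape and /api/users endpoint are assumptions for illustration:
const usersModule = {
  state: () => ({ users: null }),
  mutations: {
    setUsers(state, users) {
      // Freeze so Vue skips making this data reactive.
      state.users = Object.freeze(users);
    }
  },
  actions: {
    async loadUsers({ state, commit }) {
      // Lazy: reuse the cached copy if we already loaded it.
      if (state.users !== null) return state.users;
      const response = await fetch('/api/users'); // hypothetical endpoint
      commit('setUsers', await response.json());
      return state.users;
    }
  }
};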

Realtime Database returning deleted values [EDITED]

I deleted everything in my database. Then I queried the data that was just deleted and I got results! How is this possible?
I am using the Realtime Database Unity SDK. For testing purposes, I want to regularly purge the whole database and populate it with new data. Imagine my surprise when my queries returned some old, deleted data. It is as if the deleted data persists in some void that can still be accessed.
I have been tinkering with this issue for days now. Here are my steps:
I'm using GetReference(item).Push().Key, which auto-generates a unique key.
I write the new item to the database with GetReference(item).SetValueAsync().
I check my Firebase console, and indeed, the data was correctly recorded.
I create a query that returns the JSON value of item. It works fine.
I delete item from the database.
I run the query again and item is returned. ITEM IS NOT SUPPOSED TO EXIST ANYMORE!
Out of curiosity, I write a query to return all the data in my database (which should be empty) and it returns every object I have created over the last few days. This is literally hundreds of items... from an empty database.
It seems like data persists for a few days after it is deleted.
Realizing this, I decided to test what would happen if I manually made an object that reuses an existing key from one of the deleted objects.
My query returns the new object. Yay!
I take a break and come back 15 minutes later. I run the exact query again. I get the old, deleted object and not the new one. WHAT THE HECK IS GOING ON?
At this point I am questioning whether Realtime Database is even a real database. It seems to break the rules of both consistency and integrity.
I've also considered that I might be deleting the data incorrectly. I was mostly doing it manually, through the browser. I also have tried RemoveValueAsync() and SetRawJsonValueAsync(null). Nothing seems to make a difference.
Please, please, please can someone tell me what is going on? I will be forever grateful.
EDIT: It turns out that the phantom data was coming from the cache on my device. Turning persistence off solved the problem. Apparently, performing the same query multiple times only retrieves the data from the database the first time; subsequent queries look into the cache.
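For reference, a minimal sketch of turning persistence off with the Firebase Unity SDK, assuming the default database instance; the call has to happen before any other use of the database:
// Disable the on-device cache so queries always go to the server.
FirebaseDatabase.DefaultInstance.SetPersistenceEnabled(false);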

How efficient can Meteor be while sharing a huge collection among many clients?

Imagine the following case:
1,000 clients are connected to a Meteor page displaying the content of the "Somestuff" collection.
"Somestuff" is a collection holding 1,000 items.
Someone inserts a new item into the "Somestuff" collection
What will happen:
All Meteor.Collections on the clients will be updated, i.e. the insertion is forwarded to all of them (which means one insertion message sent to 1,000 clients)
What is the cost in terms of CPU for the server to determine which client needs to be updated?
Is it accurate that only the inserted value will be forwarded to the clients, and not the whole list?
How does this work in real life? Are there any benchmarks or experiments of such scale available?
The short answer is that only new data gets sent down the wire. Here's
how it works.
There are three important parts of the Meteor server that manage
subscriptions: the publish function, which defines the logic for what
data the subscription provides; the Mongo driver, which watches the
database for changes; and the merge box, which combines all of a
client's active subscriptions and sends them out over the network to the
client.
Publish functions
Each time a Meteor client subscribes to a collection, the server runs a
publish function. The publish function's job is to figure out the set
of documents that its client should have and send each document property
into the merge box. It runs once for each new subscribing client. You
can put any JavaScript you want in the publish function, such as
arbitrarily complex access control using this.userId. The publish
function sends data into the merge box by calling this.added, this.changed and
this.removed. See the
full publish documentation for
more details.
Most publish functions don't have to muck around with the low-level
added, changed and removed API, though. If a publish function returns a Mongo
cursor, the Meteor server automatically connects the output of the Mongo
driver (insert, update, and remove callbacks) to the input of the
merge box (this.added, this.changed and this.removed). It's pretty neat
that you can do all the permission checks up front in a publish function and
then directly connect the database driver to the merge box without any user
code in the way. And when autopublish is turned on, even this little bit is
hidden: the server automatically sets up a query for all documents in each
collection and pushes them into the merge box.
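For example, a minimal sketch of a publish function that returns a cursor for the Somestuff collection from the question (the owner field and the access check are assumptions for illustration):
Meteor.publish("somestuff", function () {
  // Arbitrary access control can go here, e.g. using this.userId.
  if (!this.userId) {
    return this.ready();
  }
  // Returning a cursor wires the Mongo driver straight into the merge box.
  return Somestuff.find({ owner: this.userId });
});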
On the other hand, you aren't limited to publishing database queries.
For example, you can write a publish function that reads a GPS position
from a device inside a Meteor.setInterval, or polls a legacy REST API
from another web service. In those cases, you'd emit changes to the
merge box by calling the low-level added, changed and removed DDP API.
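A rough sketch of that low-level pattern, assuming a hypothetical readGps() helper and a client-side "positions" collection (neither is part of Meteor itself):
Meteor.publish("gpsPosition", function () {
  var self = this;
  var id = "current";
  // Send the initial document, then mark the subscription ready.
  self.added("positions", id, readGps());
  self.ready();
  // Poll the device and push changes into the merge box.
  var handle = Meteor.setInterval(function () {
    self.changed("positions", id, readGps());
  }, 1000);
  // Stop polling when the client unsubscribes or disconnects.
  self.onStop(function () {
    Meteor.clearInterval(handle);
  });
});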
The Mongo driver
The Mongo driver's job is to watch the Mongo database for changes to
live queries. These queries run continuously and return updates as the
results change by calling added, removed, and changed callbacks.
Mongo is not a real time database. So the driver polls. It keeps an
in-memory copy of the last query result for each active live query. On
each polling cycle, it compares the new result with the previous saved
result, computing the minimum set of added, removed, and changed
events that describe the difference. If multiple callers register
callbacks for the same live query, the driver only watches one copy of
the query, calling each registered callback with the same result.
Each time the server updates a collection, the driver recalculates each
live query on that collection. (Future versions of Meteor will expose a
scaling API for limiting which live queries recalculate on update.) The
driver also polls each live query on a 10 second timer to catch
out-of-band database updates that bypassed the Meteor server.
The merge box
The job of the merge box is to combine the results (added, changed and removed
calls) of all of a client's active publish functions into a single data
stream. There is one merge box for each connected client. It holds a
complete copy of the client's minimongo cache.
In your example with just a single subscription, the merge box is
essentially a pass-through. But a more complex app can have multiple
subscriptions which might overlap. If two subscriptions both set the
same attribute on the same document, the merge box decides which value
takes priority and only sends that to the client. We haven't exposed
the API for setting subscription priority yet. For now, priority is
determined by the order the client subscribes to data sets. The first
subscription a client makes has the highest priority, the second
subscription is next highest, and so on.
Because the merge box holds the client's state, it can send the minimum
amount of data to keep each client up to date, no matter what a publish
function feeds it.
What happens on an update
So now we've set the stage for your scenario.
We have 1,000 connected clients. Each is subscribed to the same live
Mongo query (Somestuff.find({})). Since the query is the same for each client, the driver is
only running one live query. There are 1,000 active merge boxes. And
each client's publish function registered added, changed, and removed
callbacks on that live query, feeding into one of the merge boxes.
Nothing else is connected to the merge boxes.
First the Mongo driver. When one of the clients inserts a new document
into Somestuff, it triggers a recomputation. The Mongo driver reruns
the query for all documents in Somestuff, compares the result to the
previous result in memory, finds that there is one new document, and
calls each of the 1,000 registered insert callbacks.
Next, the publish functions. There's very little happening here: each
of the 1,000 insert callbacks pushes data into the merge box by
calling added.
Finally, each merge box checks these new attributes against its
in-memory copy of its client's cache. In each case, it finds that the
values aren't yet on the client and don't shadow an existing value. So
the merge box emits a DDP DATA message on the SockJS connection to its
client and updates its server-side in-memory copy.
Total CPU cost is the cost to diff one Mongo query, plus the cost of
1,000 merge boxes checking their clients' state and constructing a new
DDP message payload. The only data that flows over the wire is a single
JSON object sent to each of the 1,000 clients, corresponding to the new
document in the database, plus one RPC message to the server from the
client that made the original insert.
Optimizations
Here's what we definitely have planned.
More efficient Mongo driver. We
optimized the driver
in 0.5.1 to only run a single observer per distinct query.
Not every DB change should trigger a recomputation of a query. We
can make some automated improvements, but the best approach is an API
that lets the developer specify which queries need to rerun. For
example, it's obvious to a developer that inserting a message into
one chatroom should not invalidate a live query for the messages in a
second room.
The Mongo driver, publish function, and merge box don't need to run
in the same process, or even on the same machine. Some applications
run complex live queries and need more CPU to watch the database.
Others have only a few distinct queries (imagine a blog engine), but
possibly many connected clients -- these need more CPU for merge
boxes. Separating these components will let us scale each piece
independently.
Many databases support triggers that fire when a row is updated and
provide the old and new rows. With that feature, a database driver
could register a trigger instead of polling for changes.
From my experience, sharing a huge collection among many clients in Meteor is essentially unworkable, as of version 0.7.0.1. I'll try to explain why.
As described in the post above and also in https://github.com/meteor/meteor/issues/1821, the Meteor server has to keep a copy of the published data for each client in the merge box. This is what allows the Meteor magic to happen, but it also means that any large shared database is repeatedly kept in the memory of the Node process. Even when using a possible optimization for static collections, such as in "Is there a way to tell meteor a collection is static (will never change)?", we experienced a huge problem with the CPU and memory usage of the Node process.
In our case, we were publishing a collection of 15k documents to each client that was completely static. The problem is that copying these documents to a client's merge box (in memory) upon connection basically brought the Node process to 100% CPU for almost a second, and resulted in a large additional usage of memory. This is inherently unscalable, because any connecting client will bring the server to its knees (and simultaneous connections will block each other) and memory usage will go up linearly in the number of clients. In our case, each client caused an additional ~60MB of memory usage, even though the raw data transferred was only about 5MB.
In our case, because the collection was static, we solved this problem by sending all the documents as a .json file, which was gzipped by nginx, and loading them into an anonymous collection, resulting in only a ~1MB transfer of data with no additional CPU or memory in the node process and a much faster load time. All operations over this collection were done by using _ids from much smaller publications on the server, allowing for retaining most of the benefits of Meteor. This allowed the app to scale to many more clients. In addition, because our app is mostly read-only, we further improved the scalability by running multiple Meteor instances behind nginx with load balancing (though with a single Mongo), as each Node instance is single-threaded.
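A rough sketch of that anonymous-collection idea (the file name, URL and collection name are assumptions, not the original code), using the browser's fetch to pull the gzipped JSON and load it into a client-only collection:
// Client-only collection: passing null means it is never synced over DDP.
var StaticStuff = new Mongo.Collection(null);
Meteor.startup(function () {
  fetch("/static/somestuff.json")            // served and gzipped by nginx
    .then(function (response) { return response.json(); })
    .then(function (docs) {
      docs.forEach(function (doc) { StaticStuff.insert(doc); });
    });
});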
However, the issue of sharing large, writeable collections among multiple clients is an engineering problem that needs to be solved by Meteor. There is probably a better way than keeping a copy of everything for each client, but that requires some serious thought as a distributed systems problem. The current issues of massive CPU and memory usage just won't scale.
The experiment that you can use to answer this question:
Install a test meteor: meteor create --example todos
Run it under Webkit inspector (WKI).
Examine the contents of the XHR messages moving across the wire.
Observe that the entire collection is not moved across the wire.
For tips on how to use WKI check out this article. It's a little out of date, but mostly still valid, especially for this question.
This answer is a year old now, and therefore I think pre-"Meteor 1.0" knowledge, so things may have changed again; I'm still looking into this.
http://meteorhacks.com/does-meteor-scale.html leads to a "How to scale Meteor?" article: http://meteorhacks.com/how-to-scale-meteor.html

Hibernate HQL only hits session cache

I am having some trouble understanding where an HQL query gets its information from. My project uses multiple threads, and each thread reads from and writes to the database. Threads do not share Session objects; instead, I am using a HibernateUtil class which creates sessions for me.
Until recently, I would only close a session after writing but not after reading. Changes to objects would be immediately visible in the database, but when reading on other threads (a different Session object than the one used for writing) I would get stale information. Reading and writing always happened on different threads, which means different Session objects and different session caches.
I always thought that by using HQL instead of Criteria I would always hit the database (or the second-level cache) and not the session cache, but while debugging my code it became clear that the HQL query was looking for the object in the session cache and retrieved an old, outdated object.
Was I wrong in assuming that HQL always targets the database? Or at least the second level cache?
PS: I am using only one SessionFactory object.
Hibernate has different concepts of caching: entity caching and query caching. Entity caching is what the session cache (and the 2nd level cache, if enabled) does.
Assuming query caching is not enabled (which it's not, by default), then your HQL would have been executed against the database. This would have returned the IDs of the entities that match the query. If those entities were already in the session cache, then Hibernate would have returned those, rather than rebuilding them from the database. If your session has stale copies of them (because another session has updated the database), then that's the problem you have.
I would advise against using long-lived sessions, mainly for that reason. You should limit the lifespan of the session to the specific unit of work that you're trying to do, and then close it. There's little or no performance penalty to doing this (assuming you use a database connection pool). Alternatively, to make sure you don't get stale entities, you can call Session.clear(), but you may end up with unexpected performance side-effects.
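A minimal sketch of the unit-of-work pattern described above, assuming your HibernateUtil exposes the shared SessionFactory via a getSessionFactory() method and that a User entity with an active flag exists (both are assumptions for illustration):
Session session = HibernateUtil.getSessionFactory().openSession();
Transaction tx = null;
try {
    tx = session.beginTransaction();
    // A fresh session has an empty first-level cache, so the entities returned
    // here are rebuilt from the query results rather than stale cached copies.
    List<User> users = session
            .createQuery("from User u where u.active = true", User.class)
            .list();
    tx.commit();
} catch (RuntimeException e) {
    if (tx != null) tx.rollback();
    throw e;
} finally {
    session.close();
}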

Hibernate and multiple threads, synchronize changes between multiple users

I am using Hibernate in an Eclipse RAP application. I have database tables mapped to classes with Hibernate, and these classes have properties that are fetched lazily (if they weren't fetched lazily, I would probably end up loading the whole database into memory on my first query). I do not synchronize database access, so there are multiple Hibernate Sessions for the users, and I let the DBMS handle transaction isolation. This means different instances of the fetched data belong to different users. There are things that, if a user changes them, I would like to propagate to the other users. Currently I am thinking about using Hibernate's session.refresh(object) in these cases to refresh the data, but I'm unsure how this will impact performance when refreshing multiple objects, or whether it's the right way to go.
I hope my problem is clear. Is my approach OK, is it fundamentally flawed, or am I missing something? Is there a general solution for this kind of problem?
I would appreciate any comments on this.
The general solution is
to have transactions as short as possible
to link the session lifecycle to the transaction lifecycle (this is the default: the session is closed when the transaction is committed or rolled back)
to use optimistic locking to avoid two transactions updating the same object at the same time (a version-column sketch follows below).
If each transaction is very short and transaction A updates some object from O to O', then a concurrent transaction B will only see O until A commits or rolls back, and any transaction started after A commits will see O', because a new session starts with each transaction.
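A minimal sketch of optimistic locking via a version column; the Document entity and its fields are assumptions for illustration:
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class Document {
    @Id
    @GeneratedValue
    private Long id;

    private String title;

    // Hibernate increments this on every update and adds "where version = ?"
    // to the UPDATE statement; if another transaction changed the row in the
    // meantime, the losing update fails with a StaleObjectStateException /
    // OptimisticLockException instead of silently overwriting it.
    @Version
    private long version;

    // getters and setters omitted
}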
We maintain an application that does exactly what you are trying to accomplish. Yes, every session.refresh() will hit the database, but since all sessions will refresh the same row at the same time, the DB server will answer all of these queries from memory.
The only thing that you still need to solve is how to propagate the information that something has changed and needs reloading to all the other sessions, possibly even to sessions on a different host.
For our application, we have about 30 users on RCP and 10-100 users on RAP instances that all connect to the very same DB backend (though through pgpool). We use a small network service that every runtime connects to; when a transaction commits, the application tells this change service that "row id X of table T" has changed and this is then propagated to all other "change subscribers", even across JVMs.
But: make sure that session.refresh() is called within the Thread that belongs to that session, possibly the RAP-Display thread. Do not call refresh() from Jobs or other unrelated threads.
As long as you don't have a large number of users updating large numbers of rows in a short time, I guess you won't have to worry about performance.

Resources