Pagination and Entity Framework (SQL Server)

In my mobile app I'm trying to retrieve data from a table in my SQL Server database. I'm using EF, and I'm trying to use pagination for better performance. I need to retrieve data starting from the last element of the table: if the table has 20 rows, page 0 should return IDs 20, 19, 18, 17, 16, then page 1 IDs 15, 14, 13, 12, 11, and so on.
The problem is this: what if, while user "A" is downloading data from the table, user "B" adds a row? If user "A" gets page 0 (IDs 20, 19, 18, 17, 16) and user "B" adds a row (ID 21) at the same moment, then with the classic query user "A" will get IDs 16, 15, 14, 13, 12 for page 1, so ID 16 appears a second time.
My code is very simple:
int RecordsForPagination = 5;
var list = _context.NameTable
    .Where(my_condition)
    .OrderByDescending(my_condition_for_ordering)
    .Skip(RecordsForPagination * Page)
    .Take(RecordsForPagination)
    .ToList();
Of course, Page is an int that comes from the frontend.
How can I solve the problem?
I found a possible solution, but I don't know if it's the right one. I could use
.SkipWhile(x => x.ID >= LastID)
instead of
.Skip(RecordsForPagination * Page)
where LastID is always sent from the frontend.
Do you think the performance of this code is always good? Is there a better solution?

The performance impact will depend greatly on your SQL index implementation and the ORDER BY clause, but this is not so much a question about performance as it is about getting the expected results.
Stack Overflow is a great example: there is such a volume of activity that when you get to the end of any page, the next page may contain records from the page you just viewed, because the underlying recordset has changed (more posts have been added).
I bring this up because in a live system this is generally accepted, and in some cases expected, behaviour. As developers we appreciate the added overhead of trying to maintain a single result set, and we recognise that there is usually much lower value in trying to prevent what look like duplications as you iterate the pages.
It is often enough to explain to users why this occurs; in many cases they will accept it.
If it is important to you to maintain your place in the original result set, then you should constrain the query with a Where clause, but you'll need to retrieve either the Id or a timestamp in the original query. In your case you are attempting to use LastID, but getting the last ID would require a separate query of its own, because the OrderBy clause affects it.
You can't really use .SkipWhile(x => x.ID >= LastID) for this, because SkipWhile is a sequential process that is affected by the order and disengages the first time the expression evaluates to false; if your order is not based on Id, your SkipWhile might end up skipping no records at all. (EF also cannot translate SkipWhile to SQL; depending on the version it will either throw or silently evaluate the query on the client.)
int RecordsForPagination = 5;
int? MaxId = null;
...
var query = _context.NameTable.Where(my_condition);
// We need the Id to constrain the original search
if (!MaxId.HasValue)
    MaxId = query.Max(x => x.ID);
var list = query.Where(x => x.ID <= MaxId)
    .OrderByDescending(my_condition_for_ordering)
    .Skip(RecordsForPagination * Page)
    .Take(RecordsForPagination)
    .ToList();
It is generally simpler to filter by a point in time, as the client already knows it without a round trip to the DB, but depending on the implementation, filtering on dates can be less efficient.
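A minimal sketch of the timestamp approach, keeping the question's placeholder conditions; it assumes the entity exposes a CreatedAt column (a hypothetical name) and that the client captures the time it loaded page 0 and sends it back with every page request:
public List<NameTable> GetPage(DateTime snapshotTime, int page)
{
    const int RecordsForPagination = 5;
    return _context.NameTable
        .Where(my_condition)
        .Where(x => x.CreatedAt <= snapshotTime) // rows added after page 0 are excluded
        .OrderByDescending(my_condition_for_ordering)
        .Skip(RecordsForPagination * page)
        .Take(RecordsForPagination)
        .ToList();
}
As long as new rows always carry a newer CreatedAt, they cannot shift the pages, so user "A" will never see ID 16 twice.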


ActiveRecord - Find rows with array and update column with array of values?

Let's say I have some Users and I want to update their ages because I entered them wrong at first. I could do them one at a time:
User.find(20).update(age: 25)
But can I pull the users with an array, and then update each individual with the corresponding place in another array? I'm imagining something like:
User.find([20,46,78]).update_all(age: [25, 50, 43])
I have a list of User ids and what their correct ages are supposed to be, so I want to change the existing ones for those users. What's the best way to do this?
One way of doing this, if your backing storage is MySQL/MariaDB, is something like:
arr = { 1 => 22, 2 => 55 }.map { |id, age| "(#{id}, #{age})" }
User.connection.execute("INSERT INTO users (id, age) VALUES #{arr.join(',')} ON DUPLICATE KEY UPDATE age = VALUES(age)")
This assumes your table name is users. Be careful: since the SQL is composed manually, watch out for injection problems. Depending on what you need, this solution might or might not work for you. It mostly bypasses ActiveRecord, so you will not get any validations or callbacks.
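If you would rather stay inside ActiveRecord, here is a sketch of an alternative: one UPDATE per row, so it is slower, but it keeps validations and callbacks and composes no SQL by hand (the id => age mapping is taken from the question):
corrections = { 20 => 25, 46 => 50, 78 => 43 }

User.transaction do
  User.where(id: corrections.keys).find_each do |user|
    user.update!(age: corrections.fetch(user.id))
  end
end
The transaction ensures that either all ages are corrected or none are.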

Increase performance of a LINQ query using Contains

I have a WinForms app with a Telerik dropdownchecklist that lets the user select a group of state names.
I'm using EF, and the database is stored in Azure SQL.
The code hits a database of about 17,000 records and filters the results to include only states that are checked.
It works fine. I want to update a count on the screen whenever they change the list box.
This is the code, in the itemCheckChanged event:
var states = stateDropDownList.CheckedItems.Select(i => i.Value.ToString()).ToList();
var filteredStops = (from stop in aDb.Stop_address_details
                     where states.Contains(stop.Stop_state)
                     select stop).ToArray();
ExportInfo_tb.Text = "Current Stop Count: " + filteredStops.Count();
It works, but it is slow.
I tried loading everything into a memory variable and then querying that instead of the database, but I couldn't figure out how to do that.
Any suggestions?
Improvement:
I picked up a noticeable improvement by limiting the amount of data coming down:
var filteredStops = (from stop in aDb.Stop_address_details
                     where states.Contains(stop.Stop_state)
                     select stop.Stop_state).ToList();
And better yet --
int count = (from stop in aDb.Stop_address_details
             where states.Contains(stop.Stop_state)
             select stop).Count();
ExportInfo_tb.Text = "Current Stop Count: " + count.ToString();
The performance of your query actually has nothing to do with Contains in this case; Contains is quite performant here. The problem, as you discovered in your third solution, is that you were pulling far more data over the network than required.
In your first solution you pull back all of the rows from the server with a matching stop state and perform the count locally. This is the worst approach: you pull back data just to count it, and far more data than you need.
In your second solution you limit the data coming back to a single field, which is why the performance improved; if your table is really wide, the improvement can be significant. The problem is that you are still pulling back all the rows just to count them locally.
In your third solution EF translates the .Count() method into a query that performs the count for you, so the count happens on the server and the only data returned is a single value. Since network transfer is often (but not always) the slowest step of a query, returning less data usually yields a significant gain in query speed.
The query translation of your final solution should look something like this:
SELECT COUNT(*) AS [value]
FROM [Stop_address_details] AS [t0]
WHERE [t0].[Stop_state] IN (@p0, @p1, ...)
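If you still want the in-memory approach mentioned in the question (worthwhile when the table changes rarely but the checklist changes often), here is a minimal sketch, assuming the state column can be cached once on the form; the cache field and handler signature are hypothetical:
private List<string> _stopStatesCache;

private void itemCheckChanged(object sender, EventArgs e)
{
    // Load just the state column once; later counts are pure in-memory LINQ.
    _stopStatesCache = _stopStatesCache
        ?? aDb.Stop_address_details.Select(s => s.Stop_state).ToList();

    // A HashSet makes the Contains check O(1) per row.
    var states = new HashSet<string>(
        stateDropDownList.CheckedItems.Select(i => i.Value.ToString()));

    int count = _stopStatesCache.Count(s => states.Contains(s));
    ExportInfo_tb.Text = "Current Stop Count: " + count;
}
The trade-off is staleness: the cached list will not reflect rows added after it was loaded.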

Google Datastore queries and eventual consistency

I would like to confirm my understanding of eventual consistency in the Google datastore. Suppose that I have an entity defined as follows (using ndb):
class Record(ndb.Model):
    name = ndb.StringProperty()
    content = ndb.BlobProperty()
I think I understand Scenario 1, but I have doubts about Scenarios 2 and 3, so some advice would be highly appreciated.
Scenario 1: I insert a new Record with name "Luca" and a given content. Then, I query the datastore:
qry = Record.query(Record.name == "Luca")
for r in qry.iter():
    logger.info("I got this content: %r" % r.content)
I understand that, due to eventual consistency, the just-inserted record might not be part of the result set. I know about using ancestor queries to overcome this if needed.
Scenario 2: I read an existing Record with name "Luca", update the content, and write it back. For instance, assuming I have the key "k" of this record:
r = k.get()
r.content = "new content"
r.put()
Then, I run the same query as in Scenario 1. When I get the results, assume that the record is part of the result set (for instance, because the index already contained the record with name "Luca" and key k). Am I then guaranteed that the field content will have its new value "new content"?
In other words, if I update a record, leaving its key and indexed fields alone, am I guaranteed to read the most recent value?
Scenario 3: I do similarly to Scenario 2, again where k is the key of a record with name "Luca":
r = k.get()
r.content = "new content"
r.put()
but then I run a modified version of the query:
qry = Record.query(Record.name == "Luca")
for k in qry.iter(keys_only=True):
    r = k.get()
    logger.info("I got this content: %r" % r.content)
In this case, logic tells me I should be getting the latest value of the content, because reading by key guarantees strong consistency. I would appreciate confirmation.
Scenario 1. Yes, your understanding is correct.
Scenario 2. No; it is the same kind of query, so the result is still eventually consistent.
Scenario 3. Yes, your understanding is correct.
Also, you can avoid eventual consistency by doing everything in the same transaction, but of course this may not be applicable.
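For completeness, a sketch of the transactional route; the assumption here is that the records involved share a common ancestor, since a query inside a datastore transaction must be an ancestor query within a single entity group:
from google.appengine.ext import ndb

group = ndb.Key("RecordGroup", "luca")  # hypothetical ancestor key

@ndb.transactional
def update_content(new_content):
    # An ancestor query inside a transaction is strongly consistent.
    records = Record.query(Record.name == "Luca", ancestor=group).fetch()
    for r in records:
        r.content = new_content
    ndb.put_multi(records)

update_content("new content")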

Cookie: how can I count each page visit without duplicates?

My page URL carries 3 variables, id=N, num=N and item=N, where N = 1..100. They are integers; an example URL:
page.php?id=1&num=24&item=12
I want to count visits with no duplicates. Here is how I think it should work:
If the cookie exists, don't add a new value to the database; otherwise, set the cookie and increment the value in the database.
I used the $_COOKIE array to identify whether the page was visited:
$_COOKIE['product[id]'] = $_GET['id'];
$_COOKIE['product[num]'] = $_GET['num'];
$_COOKIE['product[item]'] = $_GET['item'];
The problem appears when the path is different:
page.php?id=1&num=24&item=12&page=0#topView
I can't query the database every time a person accesses the page, because there are 1000+ unique visits.
My question is: how can I count each page visit uniquely?
Note:
page.php?id=1&num=24&item=12
or
page.php?id=1&num=24&item=15
or
page.php?id=2&num=24&item=12
each of these links gives unique product info, depending on the variables.
Thank you!
Since my comment is the solution to your problem, I am converting it into an answer.
"I can't query the database every time a person accesses the page, because there are 1000+ unique visits." - I doubt that; in my opinion, you should. Count all the page accesses, and when you want the final results, group by ip, id, num and item. Putting all the data into the database will also give you an overview of the most popular pages. Even further, you will be able to see which pages are accessed repeatedly by a single unique user and identify the reasons. More data is better, and it won't take much of your database.
There is also a mistake in the algorithm you finally decided on. Imagine id is 11, num is 5 and item is 7; now imagine another visit with id = 1, num = 15 and item = 7. Concatenated without a separator, both give "1157". Put it like this:
md5($_GET['id'].'-'.$_GET['num'].'-'.$_GET['item']);
so it is really unique.
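Putting the pieces together, a minimal sketch; the cookie prefix, the page_visits table and the $pdo connection are all assumptions:
<?php
// Build one unambiguous key per product page.
$key = md5($_GET['id'] . '-' . $_GET['num'] . '-' . $_GET['item']);

if (!isset($_COOKIE['visited_' . $key])) {
    // First visit from this browser: remember it for 30 days...
    setcookie('visited_' . $key, '1', time() + 30 * 24 * 3600, '/');

    // ...and count it (hypothetical page_visits table, $pdo is a PDO handle).
    $stmt = $pdo->prepare(
        'INSERT INTO page_visits (page_key, visits) VALUES (?, 1)
         ON DUPLICATE KEY UPDATE visits = visits + 1'
    );
    $stmt->execute([$key]);
}
Extra query-string parameters such as page=0 or a #topView fragment no longer matter, because only id, num and item go into the key.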

Composite key with CouchDB, finding multiple records

I know you can pass a key or a range to return records in CouchDB, but I want to do something like this: find the records whose keys match a given set of values.
So for example, in regular SQL, let's say I wanted to return records with IDs 5, 7, 29 and 102. I would do something like this:
SELECT * FROM sometable WHERE id = 5 OR id = 7 OR id = 29 OR id = 102
Is it possible to do this in CouchDB, where I toss all the values I want to find in the key array, and then CouchDB searches for all of those records that could exist in the "key parameter"?
You can do a POST, as documented on the CouchDB wiki. You pass the list of keys in the body of the request:
{"keys": ["key1", "key2", ...]}
The downside is that a POST request is not cached by the browser.
Alternatively, you can obtain the same response using a GET with the keys parameter. For example, you can query the view _all_docs with:
/DB/_all_docs?keys=["ID1","ID2"]&include_docs=true
which, properly URL encoded, becomes:
/DB/_all_docs?keys=%5B%22ID1%22,%22ID2%22%5D&include_docs=true
This should give better cacheability, but keep in mind that _all_docs changes at each doc update. Sometimes you can work around this by defining your own view containing only the needed documents.
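Such a view's map function might look like this; the type field is an assumption about how your documents are marked:
// Emit only the documents you care about, keyed by _id, so the view
// changes only when those documents change.
function (doc) {
  if (doc.type === "product") {
    emit(doc._id, null);
  }
}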
With a straight view function, this will not be possible. However, you can use a _list function to accomplish the same result.
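A sketch of that _list idea; the design document, view and ids parameter are all hypothetical names, queried as /DB/_design/app/_list/by_ids/some_view?ids=["5","7","29","102"]:
function (head, req) {
  // Keep only the view rows whose key is in the ids query parameter.
  var ids = JSON.parse(req.query.ids || "[]");
  var row, first = true;
  start({ headers: { "Content-Type": "application/json" } });
  send("[");
  while ((row = getRow())) {
    if (ids.indexOf(row.key) !== -1) {
      send((first ? "" : ",") + JSON.stringify(row));
      first = false;
    }
  }
  send("]");
}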
