Sorting in batches - database

I have a Java servlet which queries a DB and shows a table in the browser. I have implemented pagination so that new requests are made only when the user scrolls the table. The problem is that if the user chooses to sort the table in the UI by some column, the request takes a long time, because the actual table in the DB is quite big: the DB sorts the entire table and only then sends the sorted, paged data to the client/browser. So suppose the table has 100k rows and I have a page size of 100 rows; is there a way to tweak the sorting in the DB, or the pagination in the servlet, so that sorting the entire 100k rows is not required?

Pagination may help. Here is how this is usually done. Unlike old-style page-by-page web pages, you have a single scrolling page, and usually a drop-down which lists the sorting columns.
You first load the first page; as soon as the page's bottom appears, you request the next page via AJAX. Up to here, I guess you are fine.
What seems to be troubling you is that if the user has scrolled 10 pages deep and then sorts, you will have to load 10 pages of data in one go. That assumption is wrong.
Two things:
When you change the sorting criteria, you change the context in which you were viewing the table.
Suppose you did load 10 pages and kept the user at the 10th page: he would be surprised at what happened.
So, as soon as the user changes the sort criteria, you clear the DIV and load page 1 by the new criteria. You see, you do not have that burden any more. Do you?
A couple of quick tips:
I personally think it's better to leave the sorting and pagination to the DBMS; it is built for exactly that. Write optimized queries (see the sketch just below these tips).
Indexing the columns being sorted by helps a lot.
If items do not change frequently, you may want to cache the pages (results from the DB) with an appropriate TTL.
Leverage DBMS-provided functions to optimize the query; there are good articles covering this for MySQL.
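To make the first two tips concrete, here is a minimal sketch of keyset ("seek method") pagination for MySQL through a Python DB-API cursor. The orders table and its columns are invented for illustration; the point is that with an index on (created_at, id), the database can serve any page without sorting all 100k rows:

PAGE_SIZE = 100

def fetch_page(conn, last_created_at=None, last_id=None):
    cur = conn.cursor()
    if last_id is None:
        # First page: the index on (created_at, id) lets MySQL read rows
        # already in order and stop after PAGE_SIZE, instead of sorting
        # the whole table.
        cur.execute(
            "SELECT id, created_at, amount FROM orders"
            " ORDER BY created_at, id LIMIT %s",
            (PAGE_SIZE,))
    else:
        # Later pages: seek past the last row the client already has. The
        # row-value comparison keeps the order stable when created_at ties.
        cur.execute(
            "SELECT id, created_at, amount FROM orders"
            " WHERE (created_at, id) > (%s, %s)"
            " ORDER BY created_at, id LIMIT %s",
            (last_created_at, last_id, PAGE_SIZE))
    return cur.fetchall()

Plain LIMIT ... OFFSET n is simpler and also works, but the database still walks past the n skipped rows, so deep scrolling gets slower and slower; the seek method stays fast at any depth, which suits infinite scroll well.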

Related

TClientDataset to edit a table with 100k+ records

A client wants to build a worksheet-like application to show data from a database (presumably in a TDbGrid or similar), allowing free search and editing of all cells, as you would do in a spreadsheet. The underlying table will have more than 100k rows.
The problem with using TClientDataset is that it tends to load all data into memory, violating the user's requirements, which are these three:
The user must be able to navigate from the first to the last record at any moment using the scroll bar, keyboard, or a search filter (note that TClientDataset will load all records if you go to the last record, AFAIK...).
The connection will be through an external VPN / the internet (possibly slow), so only the records actually visible on screen should be loaded. Never all of them.
Edits must be kept inside a transaction, so they can be committed or rolled back at the end, reconciling if necessary.
Is it possible to accomplish these 3 points using TClientDataset?
If not, what are the alternatives?
Answering just your last line regarding alternatives, I can add some suggestions:
1- You can use some creativity: provide pagination and fetch, let's say, 100 rows per page on a background thread, equipped with a nice progress bar in the UI (see the sketch below). With this method you must manage search and filters with some smart queries, reloading data at times, etc...
2- Use third-party components optimized for this purpose, like SDAC + EhLib DBGrid.
SDAC is a dataset library that is useful for cached updates, and EhDBGrid has a MemTable component inside it which is very powerful: free search and fuzzy or approximate matching work nicely, and it is possible to revert, undo and redo, etc...
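The question is Delphi-specific, but suggestion 1 boils down to a language-agnostic pattern: fetch one page at a time on a background thread and report progress to the UI. A rough sketch of the pattern in Python, with every name invented for illustration:

import threading

PAGE_SIZE = 100

def fetch_pages_in_background(fetch_page, total_pages, on_page, on_progress):
    # fetch_page(n) runs the LIMIT/OFFSET query for page n; on_page and
    # on_progress are callbacks that marshal results back to the UI thread.
    def worker():
        for n in range(total_pages):
            rows = fetch_page(n)
            on_page(n, rows)                           # feed the grid
            on_progress((n + 1) * 100 // total_pages)  # progress bar %
    t = threading.Thread(target=worker)
    t.daemon = True  # don't keep the app alive just for a background fetch
    t.start()
    return t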

AngularJS - Returning 100k rows with 200 characters in each

I have a web page control panel that returns different data sets depending on which page you are looking at. These data sets can vary greatly in size, from a couple of thousand rows up to a million. While it is unreasonable to expect someone to look at a million rows in a table, I have to make it work, and I am unsure of the best way to go about it.
I make a $http.get request and all the data is returned, but it takes about a minute to load; is it possible for all of this to be done on the server side? I read that angular-datatables should be able to do this, but I can't find any working JSFiddles of it. How should I go about doing this?

What is the best way to handle a large amount of data in a dashboard?

I have created a simple dashboard which has 10 to 15 widgets, and each widget is built from more than 100,000 records, so there are more than 1,500,000 records in total. How can the browser handle this?
The dashboard I have created simply hangs.
I don't think you can do much on the frontend, but on the backend, if you are able to change something there, I would suggest you query only the data that is required.
When you use charts, say to show a timeline of sales per month, use GROUP BY in your SQL. This reduces the amount of data significantly, because you receive only the records needed for display instead of aggregating the result in code.
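As a hedged illustration (table and column names are invented), the aggregation can be pushed entirely into SQL, so that the browser receives a handful of rows per chart instead of every underlying record:

cur = conn.cursor()
cur.execute(
    "SELECT YEAR(sold_at) AS yr, MONTH(sold_at) AS mon,"
    "       SUM(amount) AS total"
    " FROM sales"
    " GROUP BY YEAR(sold_at), MONTH(sold_at)"
    " ORDER BY yr, mon")
monthly_totals = cur.fetchall()  # ~12 rows per year instead of every sale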
If you use a datatable, handle pagination within your query instead of pulling all the data from the database, which hurts performance and load time: pull, for example, the first 100 records, then load the next 100 when the user clicks the next page or scrolls down (like Facebook does with its timeline). You can also consider putting an in-memory store like Redis in front of the database, as sketched below.
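For the Redis idea, a minimal caching sketch with redis-py; the key scheme, the TTL, and the query_page_from_db helper are all assumptions for illustration:

import json
import redis

r = redis.Redis()
TTL_SECONDS = 60  # short TTL: dashboard data may go slightly stale

def get_page_cached(page_no):
    key = "dashboard:sales:page:%d" % page_no
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)       # cache hit: no database round trip
    rows = query_page_from_db(page_no)  # hypothetical DB query helper
    r.setex(key, TTL_SECONDS, json.dumps(rows))
    return rows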
Hope this helps.

How do I fetch data in batches using backbone for an "infinite scroll"?

I'm attempting to fetch data from MySQL (with a PHP-based API) using Backbone. I want to retrieve items in batches of 20 though. For example, I have 10,000 records, but only want to show 20 records upon page load. As the user scrolls down I'd like to load the next 20 every time they reach the bottom of the page. My query is not sorted by ID (by design), so that may add one level of complexity. How is this possible in Backbone?
I recommend using this plugin: https://github.com/backbone-paginator/backbone.paginator
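The plugin covers the client side; the server just has to answer "give me batch N". The API in the question is PHP, but the contract is simple enough to sketch in Python (table and column names invented). Since the sort is deliberately not by ID, appending the ID as a tiebreaker keeps batches stable as the user scrolls:

PER_PAGE = 20

def get_batch(cur, page):
    # page is 0-based; rows with equal score never shuffle between batches
    # because id breaks the tie deterministically.
    cur.execute(
        "SELECT id, name, score FROM items"
        " ORDER BY score DESC, id"
        " LIMIT %s OFFSET %s",
        (PER_PAGE, page * PER_PAGE))
    return cur.fetchall()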

Looking for ideas/alternatives to providing a page/item count/navigation of items matching a GAE datastore query

I like the datastore simplicity, scalability and ease of use; and the enhancements found in the new ndb library are fabulous.
As I understand datastore best practices, one should not write code to provide item and/or page counts of matching query results when the number of matches is large, because the only way to do this is to retrieve all the results, which is resource intensive.
However, in many applications, including ours, it is a common desire to see a count of matching items and provide the user with the ability to navigate to a specific page of those results. The datastore paging issue is further complicated by the requirement to work around limitations of fetch(limit, offset=X) as outlined in the article Paging Through Large Datasets. To support the recommended approach, the data must include a uniquely valued column that can be ordered in the way the results are to be displayed. This column will define a starting value for each page of results; saving it, we can fetch the corresponding page efficiently, allowing navigation to a specific or next page as requested. Therefore, if you want to show results ordered in multiple ways, several such columns may need to be maintained.
It should be noted that as of SDK v1.3.1, Query Cursors are the recommended way to do datastore paging. They have some limitations, including lack of support for IN and != filter operators. Currently some of our important queries use IN, but we'll try writing them using OR for use with query cursors.
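For reference, cursor-based paging with ndb looks roughly like this; the Article model is hypothetical, and the urlsafe cursor string is what gets handed to the client so the next request can resume where this page ended:

from google.appengine.datastore.datastore_query import Cursor
from google.appengine.ext import ndb

PAGE_SIZE = 20

def articles_page(cursor_str=None):
    # Article is an invented ndb.Model with a 'published' DateTimeProperty.
    cursor = Cursor(urlsafe=cursor_str) if cursor_str else None
    items, next_cursor, more = (Article.query()
                                .order(-Article.published)
                                .fetch_page(PAGE_SIZE, start_cursor=cursor))
    return items, next_cursor.urlsafe() if more else None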
Following the guidelines suggested, a user could be given a (Next) and (Prev) navigation buttons, as well as specific page buttons as navigation proceeded. For example if the user pressed (Next) 3 times, the app could show the following buttons, remembering the unique starting record or cursor for each to keep the navigation efficient: (Prev) (Page-1) (Page-2) (Page-3) (Page-4) (Next).
Some have suggested keeping track of counts separately, but this approach isn't practical when users will be allowed to query on a rich set of fields that will vary the results returned.
I'm looking for insights on these issues in general and the following questions specifically:
What navigational options of query results do you provide in your datastore apps to work around these limitations?
If providing users with efficient result counts and page navigation of the entire query result set is a priority, should use of the datastore be abandoned in favor of the GAE MySQL solution now being offered?
Are there any upcoming changes in the Bigtable architecture or datastore implementation that will provide additional capability for counting the results of a query efficiently?
Many thanks in advance for your help.
It all depends on how many results you typically get. For example, by passing .count() a suitable limit you can provide an exact count when the number of items is, say, <= 100, and "many" when there are more. It sounds like you cannot pre-compute all possible counts, but at least you could cache them, thereby saving many datastore ops.
Using NDB, the most efficient approach may either be to request the first page of entities using fetch_page(), and then using the resulting cursor as a starting point for a count() call; or alternatively, you may be better off running the fetch() of the first page and the count() concurrently using its async facilities. The second option may be your only choice if your query does not support cursors. Most IN / OR queries don't currently support cursors, but they do if you order by __key__.
In terms of UI options, I think it's sufficient to offer next and previous page options; the "Gooooooogle" UI that affords skipping ahead several pages is cute but I almost never use it myself. (To implement "previous page", reverse the order of the query and use the same cursor you used for the current page. I'm pretty sure this is guaranteed to work.)
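A hedged sketch of the ideas in this answer, again with a hypothetical Article model: run a bounded count concurrently with the first page using ndb's async facilities, and build "previous page" by reversing the order and reusing the current page's cursor:

from google.appengine.ext import ndb

PAGE_SIZE = 20
COUNT_LIMIT = 100  # exact count up to here, "many" beyond it

def first_page_with_count():
    q = Article.query().order(-Article.published)
    count_future = q.count_async(limit=COUNT_LIMIT + 1)  # runs concurrently
    items, cursor, more = q.fetch_page(PAGE_SIZE)        # overlaps the count
    n = count_future.get_result()
    label = "many" if n > COUNT_LIMIT else str(n)
    return items, cursor, label

def previous_page(current_cursor):
    # Reverse the sort order and resume from the current page's cursor.
    rev = Article.query().order(Article.published)
    items, prev_cursor, more = rev.fetch_page(PAGE_SIZE,
                                              start_cursor=current_cursor)
    return list(reversed(items)), prev_cursor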
Maybe just aim for this style of paging:
(first)(Prev)(Page1)(Page2)(Page3)....(Last)(next)
That way the total number is not required; your code only needs to know that there are enough results for another 3+ pages. With a page size of 10 items per page, you just need to know there are 30+ more items.
If you have 60 items (enough for 6 pages) and you're already on page 4, your code would look ahead, realise there are only another 20 records to go, and could then show the last page number:
(first)(Prev)(Page4)(Page5)(Page6)(next)(last)
Basically, for each fetch of the current page, just fetch enough records for another 3 pages of data, count them to see how many more pages you actually have, then display your pager accordingly.
Also, if you fetch just the keys, it will be more efficient than fetching the full items; the sketch below does exactly that.
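A sketch of that look-ahead, assuming a hypothetical Article model: one keys-only fetch covers the current page plus up to three more, and the pager is sized from what actually came back:

from google.appengine.ext import ndb

PAGE_SIZE = 10
LOOKAHEAD_PAGES = 3

def page_and_pager(q, offset=0):
    # q: an ndb query already ordered for display; offset: current position.
    keys = q.fetch(PAGE_SIZE * (1 + LOOKAHEAD_PAGES),
                   offset=offset, keys_only=True)
    items = ndb.get_multi(keys[:PAGE_SIZE])  # entities for this page only
    pages_ahead = max(0, (len(keys) - 1) // PAGE_SIZE)  # extra page buttons
    return items, pages_ahead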
Hope that makes some sense! :)
I notice that gmail is ready with some counts - it can tell you how many total emails you've received, and how many are in your inbox, etc - but on other counts, like full-text searches it says you're looking at "1-20 of many" or "1-20 of about 130". Do you really need to display counts for every query, or could you pre-calculate just the important ones?
Since the question was "looking for ideas/alternatives to providing a page", maybe the very simple alternative of fetching 10 pages' worth of keys-only items, then handling navigation within this set, is worth considering.
I have elaborated on this in answering a similar question; you will find sample code there:
Backward pagination with cursor is working but missing an item
The sample code would be more appropriate for this question. Here is a piece of it:
from flask import request, render_template

def session_list():
    # Which page the client asked for (defaults to the first one).
    page = request.args.get('page', 0, type=int)
    # A single keys-only query fetches up to 100 keys, covering several pages.
    sessions_keys = Session.query().order(-Session.time_opened).fetch(
        100, keys_only=True)
    # generic_list_paging (from the referenced answer) selects the proper
    # sublist of keys for this page and builds the paging descriptor.
    sessions_keys, paging = generic_list_paging(sessions_keys, page)
    # Entities are fetched only for the current page's keys.
    sessions = [sk.get() for sk in sessions_keys]
    return render_template('generic_list.html', objects=sessions, paging=paging)
See the referenced question for more code.
Of course, if the result set is potentially huge, some limit must still be put on the fetch, the hard limit being 1000 items, I think. Obviously, if the result is more than some 10 pages long, the user will be asked to refine the query by adding criteria.
Dealing with paging within a few hundred keys-only items is really so much simpler that it's definitely worth considering. It makes it quite easy to provide direct page navigation, as mentioned in the question. The actual entities are fetched only for the current page; the rest are just keys, so it's not so costly. And you may consider keeping the keys-only result set in memcache for a few minutes, so that a user quickly browsing through pages does not cause the same query to be performed again, as sketched below.
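A hedged sketch of that memcache variant, reusing the Session model from the code above; the cache key name and the five-minute TTL are invented:

from google.appengine.api import memcache

def cached_session_keys():
    # Reuse the keys-only list for up to 5 minutes; paging back and forth
    # then avoids rerunning the same query.
    keys = memcache.get('session-list-keys')
    if keys is None:
        keys = (Session.query().order(-Session.time_opened)
                .fetch(100, keys_only=True))
        memcache.set('session-list-keys', keys, time=300)
    return keys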
