TClientDataset to edit a table with 100k+ records - sql-server

A client wants to build a worksheet-like application to show data from a database (presumably in a TDbGrid or similar), allowing free search and editing of all cells, as you would do in a worksheet. The underlying table will have more than 100k rows.
The problem with TClientDataset is that it tends to load all the data into memory, violating the user's requirements, which are these 3:
The user must be able to navigate from the first to the last record at any moment using the scroll bar, the keyboard, or a search filter (note that TClientDataset will load all records if you go to the last record, AFAIK...).
The connection will be over an external VPN / the internet (possibly slow), so only the records actually visible on screen should be loaded. Never all of them.
Edits must be kept inside a transaction, so they can be committed or rolled back at the end, reconciling if necessary.
Is it possible to accomplish these 3 points using TClientDataset?
If not, what are the alternatives?

I'm answering just your last line regarding alternatives; I can add some suggestions:
1- You can use some creativity: provide pagination and fetch, say, 100 rows per page using a thread, with a nice progress bar in the UI. With this method you must handle search and filters with some smart queries, reloading data when needed, etc. (see the paging sketch after this list).
2- Use third-party components optimized for this purpose, like SDAC + the EhLib DBGrid.
SDAC is a dataset library that is useful for cached updates, and EhDBGrid has a MemTable component inside it which is very powerful: free search and fuzzy or approximate matching work nicely, and it is possible to revert, undo and redo, etc.
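For suggestion 1, the essential piece is a paging query that fetches only one screenful of rows at a time. Below is a minimal sketch of that idea, written in Java/JDBC purely to illustrate the query shape (a Delphi dataset would issue the same SQL); the table and column names (dbo.BigTable, Id, Name, Amount) are hypothetical, and it assumes SQL Server 2012+ for OFFSET/FETCH:

```java
import java.sql.*;
import java.util.*;

/** Illustrative page fetcher: loads only one page of rows, ordered by a stable key. */
public class PageFetcher {
    private final String url;      // e.g. "jdbc:sqlserver://host;databaseName=db" (placeholder)
    private final String user;
    private final String password;

    public PageFetcher(String url, String user, String password) {
        this.url = url;
        this.user = user;
        this.password = password;
    }

    /** Fetches rows [offset, offset + pageSize) using SQL Server 2012+ OFFSET/FETCH. */
    public List<Map<String, Object>> fetchPage(int offset, int pageSize) throws SQLException {
        // A deterministic ORDER BY is required for OFFSET/FETCH paging to be stable.
        String sql = "SELECT Id, Name, Amount FROM dbo.BigTable "      // hypothetical table/columns
                   + "ORDER BY Id OFFSET ? ROWS FETCH NEXT ? ROWS ONLY";
        try (Connection con = DriverManager.getConnection(url, user, password);
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setInt(1, offset);
            ps.setInt(2, pageSize);
            try (ResultSet rs = ps.executeQuery()) {
                List<Map<String, Object>> page = new ArrayList<>();
                while (rs.next()) {
                    Map<String, Object> row = new LinkedHashMap<>();
                    row.put("Id", rs.getObject("Id"));
                    row.put("Name", rs.getObject("Name"));
                    row.put("Amount", rs.getObject("Amount"));
                    page.add(row);
                }
                return page;
            }
        }
    }
}
```

For very deep scrolling, keyset pagination (WHERE Id > last seen key, ORDER BY Id) avoids the growing cost of large offsets, which matters over a slow VPN link.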

Related

Keeping Data Fresh via Azure Data Factory

What is the proper way to keep data slices refreshed? Imagine I have a table with various columns, but importantly a DATE_CREATED and a DATE_MODIFIED column.
If my data slicing strategy is based on DATE_CREATED, I could periodically reprocess old slices. This follows the ADF guidance of "repeatability". I don't think ADF has a way of doing this automatically, but I could externally trigger the refresh via the API (I'm guessing.) This seems like perhaps the most correct way, but given that ADF doesn't seem to support this as a feature, it makes me feel like there's a better way of doing it ... it also seems mildly wasteful.
If my data slicing strategy is based on DATE_MODIFIED, I run into issues with the ADF activities not being repeatable. An old slice, when refreshed, would give different results because rows that were within the window may have moved to a different window. On the other hand, the latest slice would always catch rows that have changed. The other issue is preventing row duplication. The pre-activity clean up actions would need to somehow be able to clean up records in the destination table prior to the copy. Or some type of UPSERT method must be employed.
The final option is to TRUNCATE the destination table every day. This is fine for smaller tables but has its own downsides: (1) we're not really "slicing" at all anymore, this is just scorched earth; (2) any time any slice is being processed, all downstream slices from all dates are in danger of failing, due to the table being blown away; (3) it's practically impossible if your table has any respectable amount of data in it.
No option seems excellent but the first option seems better. Looking for advice from someone who has solved this problem or is experienced with ADF.
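Not a full answer to which slicing strategy is best, but for the DATE_MODIFIED option the key requirement is that reprocessing a slice is idempotent: rows already copied get updated, never duplicated. Below is a minimal sketch of that upsert step, assuming a hypothetical staging table the slice is first landed into, shown here as plain JDBC against SQL Server (in ADF this would normally live in a stored procedure or pre-copy script invoked by the pipeline):

```java
import java.sql.*;

/** Illustrative idempotent slice load: upserts a landed slice into the destination table. */
public class SliceUpsert {
    // Hypothetical tables: dbo.Target (destination) and dbo.Staging (rows copied for this slice).
    private static final String MERGE_SQL =
        "MERGE dbo.Target AS t "
      + "USING (SELECT Id, Payload, DATE_CREATED, DATE_MODIFIED FROM dbo.Staging "
      + "       WHERE DATE_MODIFIED >= ? AND DATE_MODIFIED < ?) AS s "
      + "ON t.Id = s.Id "
      + "WHEN MATCHED THEN UPDATE SET t.Payload = s.Payload, t.DATE_MODIFIED = s.DATE_MODIFIED "
      + "WHEN NOT MATCHED THEN INSERT (Id, Payload, DATE_CREATED, DATE_MODIFIED) "
      + "VALUES (s.Id, s.Payload, s.DATE_CREATED, s.DATE_MODIFIED);";

    /** Re-running this for the same [sliceStart, sliceEnd) window updates rows instead of duplicating them. */
    public static void upsertSlice(Connection con, Timestamp sliceStart, Timestamp sliceEnd)
            throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(MERGE_SQL)) {
            ps.setTimestamp(1, sliceStart);
            ps.setTimestamp(2, sliceEnd);
            ps.executeUpdate();
        }
    }
}
```

With this shape, the pre-copy cleanup the question mentions reduces to clearing only the staging table, and a failed slice can simply be re-run.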

Insert New Record Before Existing Records in MS SQL Server

I'm working with an existing MS SQL database and ASP.NET web application. An update is needed in a table, but in order to add the new data and have it display correctly in the site, I need to be able to take a series of existing records, essentially "push them down", and then add the new data in the open space created.
Is there a cleaner, more efficient method than just creating a new record that's a copy of the last related record, and then essentially doing a copy-and-paste for the remaining records until I reach the insertion point? There are quite a few records to move and I'd prefer something that isn't as mind-numbing and potentially error-prone as that.
I know the existing site and database aren't designed optimally for inserting new data into this table unless it's added at the end, but reconfiguring the database and stored procedures is not an option I presently have.
-- EDIT --
For additional requested information...
Screen shot of table definition:
Screen shot of some table data (filtered by TemplateID):
When looking at the table data, there are a couple of other template ID values that bring back somewhat more complex data. The issue is that this data needs to maintain this order, which happens to be the order in which it was entered, since it gets returned and displayed in the order shown. The new data needs to be entered prior to one of these lettered subject headers. Honestly, I think this is not the best way to do this, but I had no hand in the design. It was created by a different company, and mine was hired to handle updates and maintenance after the creators became unpleasant to work with. A different template ID value brings back two levels of headers, which doesn't make my task any easier or the alterations much cleaner, considering the C# code that calls the stored procedures is completely separated from the code that builds the contents of the pages, and the organizational structure is tough to follow. There are some very poor naming conventions in places.
At any rate, there needs to be an insertion into this group of data under the "A" header value. The same needs to occur with another chunk associated with a different template ID, and there is another main header below the insertion point:
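Since the actual table definition isn't shown, the following is only a sketch of the set-based "push down" the question describes, assuming a numeric ordering column (hypothetically named SortOrder) per TemplateID; if the order is really driven by the identity key itself, the same idea applies but is messier. One UPDATE shifts every row at or after the insertion point, then the new row takes the freed slot, all inside one transaction:

```java
import java.sql.*;

/** Illustrative "push down then insert" for a table ordered by a numeric SortOrder column. */
public class InsertBefore {
    /** Inserts newText at position insertAt for the given templateId, shifting later rows down. */
    public static void insertAt(Connection con, int templateId, int insertAt, String newText)
            throws SQLException {
        boolean oldAutoCommit = con.getAutoCommit();
        con.setAutoCommit(false);
        try {
            // 1) Make room: push every row at or after the insertion point down by one.
            //    Table and column names (dbo.TemplateContent, SortOrder, ContentText) are hypothetical.
            try (PreparedStatement shift = con.prepareStatement(
                    "UPDATE dbo.TemplateContent SET SortOrder = SortOrder + 1 "
                  + "WHERE TemplateID = ? AND SortOrder >= ?")) {
                shift.setInt(1, templateId);
                shift.setInt(2, insertAt);
                shift.executeUpdate();
            }
            // 2) Insert the new record into the freed slot.
            try (PreparedStatement insert = con.prepareStatement(
                    "INSERT INTO dbo.TemplateContent (TemplateID, SortOrder, ContentText) "
                  + "VALUES (?, ?, ?)")) {
                insert.setInt(1, templateId);
                insert.setInt(2, insertAt);
                insert.setString(3, newText);
                insert.executeUpdate();
            }
            con.commit();
        } catch (SQLException e) {
            con.rollback();
            throw e;
        } finally {
            con.setAutoCommit(oldAutoCommit);
        }
    }
}
```

If there is no explicit ordering column, adding one and backfilling it once from the current display order is usually the least painful fix, since it avoids touching the identity keys.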

How to fetch thousands of records from a database without slowing down?

I want an auto-search option in a textbox, with the data fetched from the database. I have thousands of records in my database table (almost 8,000-10,000 rows). I know how to achieve this, but since I am fetching thousands of records it will take a lot of time. How can I achieve this without it slowing down? Should I follow some other methodology apart from simple fetching methods? I am using Oracle SQL Developer for the database.
Besides the obvious solutions involving indexes and caching, if this is web technology then, depending on your tool, you can sometimes set a minimum length before the server call is made. Here is a jQuery UI example: https://api.jqueryui.com/autocomplete/#option-minLength
"The minimum number of characters a user must type before a search is performed. Zero is useful for local data with just a few items, but a higher value should be used when a single character search could match a few thousand items."
It depends on your web interface, but you can use two techniques:
Paginate your data: if your requirements are to accept empty values and to show all the results, load them in blocks of a predefined size. Google, for example, paginates search results. On Oracle, pagination is done using the ROWNUM pseudo-column (see this response); beware that you must first issue a query with an ORDER BY and then enclose it in a new one that uses ROWNUM (see the sketch after this list). Other databases that use the LIMIT keyword behave differently. If you apply the pagination technique to a drop-down you end up with an infinite scroll (see this response for example).
Limit your data by imposing some filter that restricts the number of rows returned; your search displays results only after the user has typed at least n characters in the field.
You can combine 1 & 2, but unless you find an existing web component (a jQuery one, for example) it may be a difficult task if you don't have JavaScript knowledge.
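Here is a minimal sketch of the ROWNUM pattern from point 1, expressed as JDBC against Oracle; the table and column names (locations, name) are hypothetical. The innermost query does the ORDER BY, the middle one caps ROWNUM at the end of the page, and the outer one drops the rows before the page start:

```java
import java.sql.*;
import java.util.*;

/** Illustrative Oracle ROWNUM pagination for an autocomplete-style search. */
public class OraclePager {
    /** Returns one page (1-based) of names starting with the typed prefix. */
    public static List<String> searchPage(Connection con, String prefix, int page, int pageSize)
            throws SQLException {
        int last = page * pageSize;
        int first = last - pageSize;
        // Table LOCATIONS and column NAME are hypothetical. The innermost query must do the
        // ORDER BY; ROWNUM is applied only in the wrapping queries (classic pre-12c pattern).
        String sql =
              "SELECT name FROM ("
            + "  SELECT q.name, ROWNUM rn FROM ("
            + "    SELECT name FROM locations WHERE UPPER(name) LIKE UPPER(?) || '%' ORDER BY name"
            + "  ) q WHERE ROWNUM <= ?"
            + ") WHERE rn > ?";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, prefix);
            ps.setInt(2, last);
            ps.setInt(3, first);
            try (ResultSet rs = ps.executeQuery()) {
                List<String> names = new ArrayList<>();
                while (rs.next()) {
                    names.add(rs.getString("name"));
                }
                return names;
            }
        }
    }
}
```

On Oracle 12c and later the same thing can be written with OFFSET ... FETCH, but the nested ROWNUM form above also works on older versions.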

Sorting in batches

I have a Java servlet which queries a DB and shows a table in the browser. I have implemented pagination so that new requests are made only when the user scrolls the table. The problem is that if the user chooses to sort the table in the UI by some column, the request takes a long time, because the actual table in the DB is quite big: it sorts the entire table and then sends the sorted, paged data to the client/browser. So if the table has 100k rows and I have a page size of 100 rows, is there a way to tweak the sorting in the DB, or the pagination in the servlet, so that sorting the entire 100k rows is not required?
Pagination may help. Here is how it is usually done. Unlike the old page-by-page web pages, with a single-page scroll you usually have a drop-down which lists the sorting columns.
You first load the first page; as soon as the page's bottom appears, you request the next page via AJAX. Up to here, I guess you are fine.
What seems to be troubling you is that if the user has scrolled 10 pages deep and then sorts, you would have to load 10 pages of data in one go. This is wrong.
Two things:
When you change the sorting criteria, you change the context in which you were viewing the table.
If you load 10 pages and keep the user at the 10th page, he will be surprised by what happened.
So, as soon as the user changes the sort criteria, clear the DIV and load page 1 with the new criteria. You see, you do not have that burden now. Do you?
A couple of quick tips:
I personally think it's better to let the DBMS do the sorting and pagination; it's made for that. You should write optimized queries (see the sketch after these tips).
Indexing the columns you sort by helps.
If items do not change frequently, you may want to cache the pages (results from the DB) with an appropriate TTL.
Leverage the special functions provided by your DBMS to optimize the query. Here is an article for MySQL.
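A minimal sketch of the server side of the first tip, assuming MySQL (since the linked article targets MySQL) and hypothetical table/column names. The important details are the whitelist for the sort column (never splice raw user input into ORDER BY) and letting LIMIT/OFFSET return just one sorted page:

```java
import java.sql.*;
import java.util.*;

/** Illustrative DB-side sort + pagination for the servlet: only one page is sorted and returned. */
public class SortedPageDao {
    // Whitelist of sortable columns; never splice the request parameter into the SQL directly.
    private static final Set<String> SORTABLE = Set.of("id", "name", "created_at");

    public static List<Map<String, Object>> fetchPage(Connection con, String sortBy,
                                                      boolean ascending, int page, int pageSize)
            throws SQLException {
        String column = SORTABLE.contains(sortBy) ? sortBy : "id";      // fall back to a safe default
        String sql = "SELECT id, name, created_at FROM big_table "      // hypothetical table/columns
                   + "ORDER BY " + column + (ascending ? " ASC" : " DESC")
                   + " LIMIT ? OFFSET ?";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setInt(1, pageSize);
            ps.setInt(2, page * pageSize);                              // page is 0-based
            try (ResultSet rs = ps.executeQuery()) {
                List<Map<String, Object>> rows = new ArrayList<>();
                while (rs.next()) {
                    Map<String, Object> row = new LinkedHashMap<>();
                    row.put("id", rs.getObject("id"));
                    row.put("name", rs.getObject("name"));
                    row.put("created_at", rs.getObject("created_at"));
                    rows.add(row);
                }
                return rows;
            }
        }
    }
}
```

With an index on each whitelisted sort column, the database can often satisfy the ORDER BY ... LIMIT from the index rather than sorting all 100k rows, which is exactly the behavior the question is after.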

Millions of rows auto-complete field - implementation ideas?

I have a location auto-complete field which has auto-complete for all countries, cities, neighborhoods, villages, and zip codes. This is part of a location tracking feature I am building for my website. So you can imagine this list will be in the multi-millions of rows; I'm expecting over 20 million at least with all the villages and postal codes. To make the auto-complete work well I will use memcached so we don't always hit the database to get this list. It will be used a lot, as this is the primary feature on the site. But the question is:
Is only one instance of the list stored in memcached irrespective of the users pulling the info, or does it need to maintain a separate instance for each? So if, say, 20 million people are using it at the same time, will that differ from just one person using the location auto-complete? I am also open to other ideas on how to implement this location auto-complete so it performs well.
Or can I do something like this: when a user logs in, I send them the list in the background anyway, so by the time they reach the auto-complete text field their computer already has it ready to load instantly?
Take a look at Solr (or Lucene itself); using NGram (or EdgeNGram) tokenizers you can get good autocomplete performance on massive datasets.
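A minimal sketch of what that lookup could look like from Java with SolrJ, assuming a hypothetical core named "locations" whose schema copies the name field into a name_ngram field analyzed with an EdgeNGram filter at index time, so a plain term query on it is already prefix-matched:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.util.ClientUtils;
import org.apache.solr.common.SolrDocument;

/** Illustrative autocomplete lookup against a Solr core with an edge-ngram analyzed field. */
public class LocationSuggester implements AutoCloseable {
    private final HttpSolrClient solr;

    public LocationSuggester(String coreUrl) {   // e.g. "http://localhost:8983/solr/locations" (placeholder)
        this.solr = new HttpSolrClient.Builder(coreUrl).build();
    }

    /** Returns up to maxResults location names matching what the user has typed so far. */
    public List<String> suggest(String typed, int maxResults) throws SolrServerException, IOException {
        SolrQuery query = new SolrQuery();
        // name_ngram is a hypothetical field indexed with an EdgeNGram filter, so this prefix
        // lookup hits pre-built ngram terms instead of scanning millions of rows.
        query.setQuery("name_ngram:" + ClientUtils.escapeQueryChars(typed.toLowerCase()));
        query.setRows(maxResults);
        query.setFields("name");
        QueryResponse response = solr.query(query);
        List<String> suggestions = new ArrayList<>();
        for (SolrDocument doc : response.getResults()) {
            suggestions.add((String) doc.getFieldValue("name"));
        }
        return suggestions;
    }

    @Override
    public void close() throws IOException {
        solr.close();
    }
}
```

The point of doing the ngram work at index time is that, with 20+ million rows, each keystroke becomes a cheap term lookup, and the index is a single shared resource on the server, so no per-user copy of the list is needed.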
