Paging of Query Result Set - database

Greetings Overflowers,
I'm wondering if there is a way to query some kind of a database and only fetch a certain window in the full result set without having to actually go through them all.
For example, if I query my database and want only results number 100 to 200, would the database fetch all the results (say 0 to 1000) that match my query and then filter them to exclude anything outside my specified window?
Actually, I'm working on a full text search problem (not really relational db stuff).
So how about Google and other search engines: do they get the full result set and then filter, or do they have direct access to only the needed window?
Thank you all !

Your question is probably best answered in two parts.
For a traditional relational database, a query usually contains a number of "where" clauses, which cause the database engine to limit the number of rows it returns. So if you specify a where clause that restricts the primary key to a range between 2 values,
SELECT * FROM table WHERE id > 99 AND id < 201;
you'll get what you're asking for, as long as the ids are sequential with no gaps.
For a search engine, a query you make to get the results will always paginate - using various techniques, all the results will be pre-split into pages and a few will be cached. Other pages will be generated on demand. So if you want pages 100-200 then you only ever fetch those that are needed.
The option to filter is not very efficient because large data sources never want to load all their data into memory and slice - you only want to load what's needed.
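To make the "only load what's needed" point concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database. The docs table and both function names are made up for illustration. It shows two common ways to ask for a window: by position (LIMIT/OFFSET) and by seeking past the last key already shown. It says nothing about how any particular search engine does this internally.

```python
import sqlite3

# In-memory stand-in database; the `docs` table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO docs (id, title) VALUES (?, ?)",
    [(i, f"doc {i}") for i in range(1, 1001)],
)


def page_by_offset(conn, offset, size):
    """Window by position: skip `offset` rows in id order, then read `size` rows."""
    return conn.execute(
        "SELECT id, title FROM docs ORDER BY id LIMIT ? OFFSET ?",
        (size, offset),
    ).fetchall()


def page_after(conn, last_id, size):
    """Window by key: seek past the last id already shown, then read `size` rows."""
    return conn.execute(
        "SELECT id, title FROM docs WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, size),
    ).fetchall()


window = page_by_offset(conn, 99, 101)   # results number 100 to 200
same_window = page_after(conn, 99, 101)  # same rows, found by seeking on the key
```

Both calls return the same rows here because the ids have no gaps. With an index on the sort column, the second form lets the engine jump straight to the start of the window instead of counting past the skipped rows.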

Related

How to use only the search set generated by the previous query for searching the current query in ElasticSearch

I am working on a Search Engine for searching strings. There is around 600 GB of data stored at ElasticSearch index with 4 nodes, having 5 primary shards and 1 replica set.
Assume there is a complete data set A (600 GB). The first search on it yields a new data set B (30 GB), and filters are then applied to data set B.
As of now I create a new query every time a filter is applied, so the whole of set A (600 GB) is searched again. I want to know if there is a way to use only data set B (30 GB) for the subsequent filtering. If yes, how can I achieve this?
I am asking because I think there should be some way to achieve this: when I do the search it shows the number of hits, which I think means ElasticSearch knows exactly which documents I want to use for subsequent filtering.
I am new to ElasticSearch so please pardon me if you find this question trivial.

Creating an Efficient (Dynamic) Data Source to Support Custom Application Grid Views

In the application I am working on, we have data grids that have the capability to display custom views of the data. As a point of reference, we modeled this feature using the concept of views as they exist in SharePoint.
The custom views should have the following capabilities:
Be able to define which subset of columns (of those that are available) should be displayed in the view.
Be able to define one or more filters for retrieving data. These filters are not constrained to use only the columns that are in the result set but must use one of the available columns. Standard logical conditions and operators apply to these filters. For example, ColumnA Equals Value1 or ColumnB >= Value2.
Be able to define a set of columns that the data will be sorted by. This set of columns can be one or more columns from the set of columns that will be returned in the result set.
Be able to define a set of columns that the data will be grouped by. This set of columns can be one or more columns from the set of columns that will be returned in the result set.
I have application code that will dynamically generate the necessary SQL to retrieve the appropriate set of data. However, it appears to perform poorly. When I run across a poorly performing query, my first thought is to determine where indexes might help. The problem here is that I won't necessarily know which indexes need to be created as the underlying query could retrieve data in many different ways.
Essentially, the SQL that is currently being used does the following:
Creates a temporary table variable to hold the filtered data. This table contains a column for each column that should be returned in the result set.
Inserts data that matches the filter into the table variable.
Queries the table variable to determine the total number of rows of data.
If requested, determines the grouping values of the data in the table variable using the specified grouping columns.
Returns the requested page of the requested page size of data from the table variable, sorted by any specified sort columns.
My question is what are some ways that I may improve this process? For example, one idea I had was to have my table variable only contain the columns of data that are used to group and sort and then join in the source table at the end to get the rest of the displayed data. I am not sure if this would make any difference which is the reason for this post.
I need to support versions 2014, 2016 and 2017 of SQL Server in addition to SQL Azure. Essentially, I will not be able to use a specific feature of an edition of SQL Server unless that feature is available in all of the aforementioned platforms.
(This is not really an "answer" - I just can't add comments yet because my reputation score isn't high enough yet.)
I think your general approach is fine - essentially you are making a GUI generator for SQL. However a few things:
This type of feature is best suited for a warehouse or read-only replica database. Do not build this on a live production transactional database. Your users will find permutations you haven't thought of that will kill your database (that's also true of a warehouse, but a warehouse usually doesn't carry the same response time expectations as a transactional database).
The method you described for doing paging is not efficient from a database standpoint. You are essentially querying, filtering, grouping, and sorting the exact same dataset multiple times just to cherry-pick a few rows each time. If you have the data cached, that might be OK, but you shouldn't make that assumption. If you have the know-how, figure out how to snapshot the entire final data set with an extra column to keep the data physically sorted in the order the user requested. That way you can quickly query the results for your paging.
If you have a Repository/DAL layer, design your solution so that in the future certain combinations of tables/columns can use hardcoded queries/stored procedures. There will inevitably be certain queries that cause you performance issues, and you may have to build a custom solution for those specific queries to get performance that your dynamic SQL can't deliver.
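The snapshot idea from the second point can be sketched roughly as follows. This is only an illustration under assumptions: it uses Python's sqlite3 module (SQLite 3.25 or later for ROW_NUMBER) rather than SQL Server, and the orders table, the snapshot table, and fetch_page are hypothetical names. The filter and sort run once, only the keys are stored together with a position column, and each page is then a cheap range lookup joined back to the source table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [(f"cust{i % 7}", float(i)) for i in range(1, 201)],
)

# Run the filter and the sort once; keep only the key and its position.
conn.execute("CREATE TEMP TABLE snapshot (pos INTEGER PRIMARY KEY, order_id INTEGER)")
conn.execute(
    "INSERT INTO snapshot (pos, order_id) "
    "SELECT ROW_NUMBER() OVER (ORDER BY total DESC), id "
    "FROM orders WHERE total >= ?",
    (50.0,),
)


def fetch_page(conn, page, size):
    """Return one page by position range, joining back for the displayed columns."""
    first = (page - 1) * size + 1
    return conn.execute(
        "SELECT o.id, o.customer, o.total FROM snapshot s "
        "JOIN orders o ON o.id = s.order_id "
        "WHERE s.pos BETWEEN ? AND ? ORDER BY s.pos",
        (first, first + size - 1),
    ).fetchall()


total_rows = conn.execute("SELECT COUNT(*) FROM snapshot").fetchone()[0]
page_two = fetch_page(conn, 2, 10)
```

The total row count comes from the snapshot as well, so the expensive filter is not repeated for the count or for later pages. Whether storing only keys beats storing every displayed column depends on the data, so it is worth measuring both.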

How to fetch thousands of data from database without getting slow down?

I want an autocomplete option in a textbox, with the suggestions fetched from a database. My database table has thousands of rows (almost 8,000-10,000). I know how to achieve this, but because I am fetching thousands of rows it takes a lot of time. How can I do this without the slowdown? Should I use some other approach apart from simple fetching? I am using Oracle SQL Developer for the database.
Besides the obvious solutions involving indexes and caching, if this is web technology and depending on your tool you can sometimes set a minimum length before the server call is made. Here is a jquery UI example: https://api.jqueryui.com/autocomplete/#option-minLength
"The minimum number of characters a user must type before a search is performed. Zero is useful for local data with just a few items, but a higher value should be used when a single character search could match a few thousand items."
It depends on your web interface, but you can use two techniques:
Paginate your data: if your requirements are to accept empty values and to show all the results, load them in blocks of a predefined size. Google, for example, paginates search results. On Oracle, pagination is done using the rownum pseudocolumn (see this response). Beware: you must first issue a query with an order by and then enclose it in a new one that uses rownum. Other databases that use the limit keyword behave in a different way. If you apply the pagination technique to a drop-down you end up with an infinite scroll (see this response for example).
Limit your data by imposing a filter that restricts the number of rows returned: your search displays results only after the user has typed at least n chars in the field.
You can combine 1 & 2, but unless you find an existing web component (a jQuery one, for example) it may be a difficult task if you don't know JavaScript.
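A rough server-side sketch of combining the two techniques, written with Python's sqlite3 module as a stand-in (the question is about Oracle, where the row limit would be expressed differently). The items table, MIN_CHARS, MAX_ROWS and suggest are all hypothetical names chosen for the example.

```python
import sqlite3

MIN_CHARS = 2   # assumed threshold: do not query until this many characters are typed
MAX_ROWS = 10   # assumed cap on suggestions returned per request

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")
conn.execute("CREATE INDEX idx_items_name ON items (name)")
conn.executemany(
    "INSERT INTO items VALUES (?)",
    [(f"item{i:05d}",) for i in range(10000)],
)


def suggest(conn, typed):
    """Return at most MAX_ROWS names starting with `typed`, or nothing if too short."""
    if len(typed) < MIN_CHARS:
        return []
    # Escape LIKE wildcards so the user's input is matched literally as a prefix.
    escaped = typed.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT name FROM items WHERE name LIKE ? ESCAPE '\\' ORDER BY name LIMIT ?",
        (escaped + "%", MAX_ROWS),
    ).fetchall()
    return [row[0] for row in rows]
```

With this shape the server never sends back more than MAX_ROWS rows per keystroke, however large the table grows.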

Paged results when selecting data from 2 databases

Hi
I have one web service connected to one db that has a table called clients which has some data.
I have another web service connected to another db that has a table called clientdetails which has some other data.
I have to return a paged list of clients and every client object contains the information from both tables.
But I have a problem.
The search criteria has to be applied on both tables.
So basically in the clients table I can have the properties:
cprop1, cprop2
in the clientdetails table I can have cdprop1,cdprop2
and my search criteria can be cprop1 = something, cdprop2 = somethingelse
I call the first web service and send it the criterion cprop1 = something.
It returns some info, and then I call the method in the second web service. But if I have to return, say, 10 items on a page, and the criteria of the second web service (cdprop2 = somethingelse) are applied to the 10 items selected by the first web service, then I may be left with 8 items or none at all.
So what do I do in this case?
How can I make sure I always get the right number of items(that is as much as the user says he wants on a page)?
Until you have both responses you don't know how many records you are going to have to display.
You don't say what kind of database access you are using, but you imply that you ask for "N records matching criterion X", with N set to 10. In some DB access mechanisms you can ask for all matching records and then advance a "cursor" through the set, so you don't need to set any upper bound; we assume that the DB manages resources efficiently for such a query.
If you can't do that, then you need to be able to revisit the first database asking for the next 10 records, repeating until you have a full page or no more records can be found. This requires that you have some way to specify a query for "next 10".
You need the ability to get to all records matching the criteria in some efficient way, either by some cursor mechanism offered by your DB or by your own "paged" queries. Without that capability I don't see a way to guarantee an accurate result.
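The "keep asking for the next 10 until the page is full" loop could look something like this. It is a sketch only: the two web services are replaced by in-memory stand-ins, and fill_page, first_service and second_service_matches are invented names, not part of any real API.

```python
from typing import Callable, List


def fill_page(
    fetch_candidates: Callable[[int, int], List[dict]],
    matches_second: Callable[[dict], bool],
    page_size: int,
    batch_size: int = 10,
) -> List[dict]:
    """Pull batches from the first source until page_size items pass the second filter."""
    page: List[dict] = []
    offset = 0
    while len(page) < page_size:
        batch = fetch_candidates(offset, batch_size)
        if not batch:
            break  # first source exhausted: return a short (possibly empty) page
        offset += len(batch)
        for client in batch:
            if matches_second(client):
                page.append(client)
                if len(page) == page_size:
                    break
    return page


# In-memory stand-ins for the two services (assumed data, for illustration only).
clients = [{"id": i, "cprop1": "something"} for i in range(1, 101)]
details = {
    i: {"cdprop2": "somethingelse" if i % 3 == 0 else "other"}
    for i in range(1, 101)
}


def first_service(offset, limit):
    return clients[offset:offset + limit]


def second_service_matches(client):
    return details[client["id"]]["cdprop2"] == "somethingelse"


page = fill_page(first_service, second_service_matches, page_size=10)
```

To serve the following page you would also need to remember where in the first source the previous page stopped, which this sketch leaves out.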
I found that in instances like this it's better not to use identity primary keys in the second database, but primary keys whose values are generated in the first database.
As for searching you should search for the first 1000 items that fit your criteria from the first database, intersect them with the first 1000 that match the given criteria from the second database and return the needed amount of items from this intersection.
Your queries should never return an unlimited number of items anyway, so 1000 should do. The number could be bigger or smaller, of course.
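As a small sketch of that capped intersection, assuming each service can return a list of matching client ids in a stable order (intersect_page and its parameters are hypothetical names):

```python
def intersect_page(ids_from_first, ids_from_second, page_size, cap=1000):
    """Intersect the first `cap` ids from each source, keeping the first source's order."""
    second = set(ids_from_second[:cap])
    matches = [cid for cid in ids_from_first[:cap] if cid in second]
    return matches[:page_size]


first_ids = list(range(1, 2001))       # stand-in for ids matching cprop1
second_ids = list(range(0, 4000, 4))   # stand-in for ids matching cdprop2
page = intersect_page(first_ids, second_ids, page_size=10)
```

One limitation follows directly from the cap: a client that matches both criteria but falls outside the first 1000 of either list is never found.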

Creating an efficient search capability using SQL Server (and/or coldfusion)

I am trying to visualize how to create a search for an application that we are building. I would like a suggestion on how to approach 'searching' through large sets of data.
For instance, this particular search would be on a table of at least 750k records, holding product SKUs, sizing, material type, create date, etc.
Is anyone aware of a 'plugin' solution for ColdFusion to do this? I envision a Google-like single entry search where a customer can type in the part number, or the sizing, etc., and get hits on any or all relevant results.
Currently if I run a 'LIKE' comparison query, it seems to take ages (OK, a few seconds, but still), and it is too long. At times it makes a user sit there and wait up to 10 seconds for queries and page loads.
Or are there any SQL formulas to help accomplish this? I want to use a proven method to search the data, not just a simple SQL like or = comparison operation.
So this is a multi-approach question, should I attack this at the SQL level (as it ultimately looks to be) or is there a plug in/module for ColdFusion that I can grab that will give me speedy, advanced search capability.
You could try indexing your db records with a Verity (or Solr, if CF9) search.
I'm not sure it would be faster, and whether even trying it would be worthwhile would depend a lot on how often you update the records you need to search. If you update them rarely, you could do a Verity index update whenever you update them. If you update the records constantly, that's going to be a drag on the web server, and will certainly offset any possible gains in search speed.
I've never indexed a database via Verity, but I've indexed large collections of PDFs, Word Docs, etc, and I recall the search being pretty fast. I don't know if it will help your current situation, but it might be worth further research.
If your slowdown is specifically in searching textual fields (as I surmise from your mention of LIKE), the best solution is building an index table (not to be confused with DB table indexes, which are also part of the answer).
Build an index table mapping the unique ID of your records from main table to a set of words (1 word per row) of the textual field. If it matters, add the field of origin as a 3rd column in the index table, and if you want "relevance" features you may want to consider word count.
Populate the index table with either a trigger (using splitting) or from your app - the latter might be better, simply call a stored proc with both the actual data to insert/update and the list of words already split up.
This will immediately drastically speed up textual search as it will no longer do "LIKE", AND will be able to use indexes on index table (no pun intended) without interfering with indexing on SKU and the like on the main table.
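Here is a minimal sketch of such an index table, using Python's sqlite3 module in place of SQL Server and populating the index from the app rather than from a trigger. The products and word_index tables, the word-splitting rule, and both function names are assumptions made for the example.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, sku TEXT, description TEXT)")
# One row per (word, record): the "index table" described above.
conn.execute("CREATE TABLE word_index (word TEXT, product_id INTEGER)")
conn.execute("CREATE INDEX idx_word ON word_index (word, product_id)")


def add_product(conn, sku, description):
    """Insert the record, then one index row per distinct lowercased word."""
    cur = conn.execute(
        "INSERT INTO products (sku, description) VALUES (?, ?)", (sku, description)
    )
    product_id = cur.lastrowid
    words = set(re.findall(r"[a-z0-9]+", description.lower()))
    conn.executemany(
        "INSERT INTO word_index VALUES (?, ?)", [(w, product_id) for w in words]
    )
    return product_id


def search(conn, text):
    """Return ids of products whose description contains every word in `text`."""
    words = sorted(set(re.findall(r"[a-z0-9]+", text.lower())))
    if not words:
        return []
    placeholders = ",".join("?" * len(words))
    rows = conn.execute(
        f"SELECT product_id FROM word_index WHERE word IN ({placeholders}) "
        "GROUP BY product_id HAVING COUNT(DISTINCT word) = ? ORDER BY product_id",
        (*words, len(words)),
    ).fetchall()
    return [row[0] for row in rows]


bolt = add_product(conn, "A-1", "Steel hex bolt 10mm")
nut = add_product(conn, "B-2", "Brass hex nut 10mm")
```

The search becomes equality lookups on word, which an ordinary index can serve, instead of a LIKE scan over the text column. This version matches whole words only, so partial-word matching would need extra work.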
Also, ensure that all the relevant fields are indexed fully - not necessarily in the same compound index (SKU, sizing etc...), and any field that is searched as a range field (sizing or date) is a good candidate for a clustered index (as long as the records are inserted in approximate order of that field's increase or you don't care about insert/update speed as much).
For anything more detailed, you will need to post your table structure, existing indexes, the queries that are slow and the query plans you have now for those slow queries.
Another item is to ensure that as few of the fields as possible are textual, especially ones that are "decodable" - your comment mentioned "is it boxed" in the text fields set. If so, I assume the values are "yes"/"no" or some other very limited data set. If so, simply store a numeric code for valid values and do en/de-coding in your app, and search by the numeric code. Not a tremendous speed improvement but still an improvement.
I've done this using SQL Server's full text indexes. This will require very few application changes and no changes to the database schema except for the addition of the full text index.
First, add the Full Text index to the table. Include in the full text index all of the columns the search should perform against. I'd also recommend having the index auto update; this shouldn't be a problem unless your SQL Server is already being highly taxed.
Second, to do the actual search, you need to convert your query to use a full text search. The first step is to convert the search string into a full text search string. I do this by splitting the search string into words (using the Split method) and then building a search string formatted as:
"Word1*" AND "Word2*" AND "Word3*"
The double-quotes are critical; they tell the full text index where the words begin and end.
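The string-building step can be sketched like this in Python (the answer's own Split call is presumably .NET, so this is a translation of the idea, not the author's code). The function name is made up, and stripping quotes and asterisks out of the user's input is my own assumption about how to keep the generated expression well-formed.

```python
def to_fulltext_query(user_input: str) -> str:
    """Turn 'hex bolt' into '"hex*" AND "bolt*"' for a full text prefix search."""
    terms = []
    for word in user_input.split():
        # Assumption: drop characters that would break the quoted prefix term.
        cleaned = word.replace('"', "").replace("*", "")
        if cleaned:
            terms.append(f'"{cleaned}*"')
    return " AND ".join(terms)


query = to_fulltext_query("hex bolt 10mm")
```

The resulting string is best passed to the database as a bound parameter rather than concatenated into the SQL text.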
Next, to actually execute the full text search, use the ContainsTable command in your query:
SELECT *
FROM CONTAINSTABLE(Bugs, *, '"Word1*" AND "Word2*" AND "Word3*"')
This will return two columns:
Key - The column identified as the primary key of the full text search
Rank - A relative rank of the match (1 - 1000 with a higher ranking meaning a better match).
I've used approaches similar to this many times and I've had good luck with it.
If you want a truly plug-in solution then you should just go with Google itself. It sounds like you're doing some kind of e-commerce or commercial site (given the use of the term 'SKU'), so you probably have a catalog of some kind with product pages. If you have consistent markup then you can configure a Google appliance or service to do exactly what you want. It will send a bot in to index your pages and find your fields. No SQL, little coding, and it will not be dependent on your database, or even ColdFusion. It will also be quite fast and familiar to customers.
I was able to do this with a ColdFusion site in about 6 hours, done! The only thing to watch out for is that Google's index is limited to what the bot can see, so if you have a situation where you want to limit access based on a user's role, permissions or group, then it may not be the solution for you (although you can configure a permission service for Google to check with).
Because SQL Server is where your data is, that is where your search performance is going to be a possible issue. Make sure you have indexes on the columns you are searching on. If you are using LIKE, the query can't use an index if you do this: SELECT * FROM TABLEX WHERE last_name LIKE '%FR%'
But it can use an index if you do it like this: SELECT * FROM TABLEX WHERE last_name LIKE 'FR%'. The key here is to keep as many of the leading characters as possible free of wildcards.
Here is a link to a site with some general tips. https://web.archive.org/web/1/http://blogs.techrepublic%2ecom%2ecom/datacenter/?p=173

Resources