Paged results when selecting data from 2 databases

Hi
I have one web service connected to one db that has a table called clients which has some data.
I have another web service connected to another db that has a table called clientdetails which has some other data.
I have to return a paged list of clients, where each client object contains information from both tables.
But I have a problem.
The search criteria has to be applied on both tables.
So basically in the clients table I can have the properties:
cprop1, cprop2
in the clientdetails table I can have cdprop1, cdprop2
and my search criteria can be cprop1 = something, cdprop2 = somethingelse.
I call the first web service with the criterion cprop1 = something, and it returns some matches. Then I call the method in the second web service, but if I have to return, say, 10 items per page and the second service's criterion (cdprop2 = somethingelse) is applied only to the 10 items selected by the first service, I may be left with 8 items, or none at all.
So what do I do in this case?
How can I make sure I always get the right number of items (that is, as many as the user wants on a page)?

Until you have both responses you don't know how many records you are going to have to display.
You don't say what kind of database access you are using, but you imply that you ask for "N records matching criterion X", with N set to 10. Some DB access mechanisms let you ask for all matching records and then advance a "cursor" through the result set, so you don't need to set any upper bound; we assume the DB manages resources efficiently for such a query.
If you can't do that, then you need to be able to revisit the first database asking for the next 10 records, and repeat until you have a full page or no more records can be found. This requires some way to phrase a "next 10" query; a sketch follows below.
You need some efficient way to reach all records matching the criteria, either via a cursor mechanism offered by your DB or via your own "paged" queries. Without that capability I don't see a way to guarantee an accurate result.
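A minimal sketch of that keep-fetching loop, assuming hypothetical stubs for the two services (clientService.fetchClients pulls the next batch matching the first criterion; detailService.filterByDetails keeps only those whose details match the second). None of these names are an existing API:

import java.util.ArrayList;
import java.util.List;

// Fills one page by pulling batches from the first service and keeping
// only the clients whose details also match the second criterion.
List<Client> loadPage(int pageSize) {
    List<Client> page = new ArrayList<>();
    int offset = 0;
    while (page.size() < pageSize) {
        // next batch matching cprop1 = something from the first database
        List<Client> batch = clientService.fetchClients("cprop1=something", offset, pageSize);
        if (batch.isEmpty()) break; // first database is exhausted
        // keep only items also matching cdprop2 = somethingelse in the second database
        page.addAll(detailService.filterByDetails(batch, "cdprop2=somethingelse"));
        offset += batch.size();
    }
    // the final batch may overshoot the page size
    return page.size() > pageSize ? new ArrayList<>(page.subList(0, pageSize)) : page;
}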

I have found that in cases like this it is better not to use identity primary keys, but primary keys whose values are generated in the first database and reused in the second, so the same key identifies a client in both.
As for searching, fetch the first 1000 items that match the criteria from the first database, intersect them with the first 1000 that match the criteria from the second database, and return the needed number of items from that intersection.
Your queries should never return an unlimited number of items anyway, so a cap of 1000 should do; it can of course be bigger or smaller.
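A rough sketch of that intersection step, assuming both services expose a hypothetical matchingIds(criteria, cap) call returning client IDs (shared keys, as suggested above):

import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// up to 1000 candidate client IDs from each side
Set<Long> fromDb1 = new LinkedHashSet<>(clientService.matchingIds("cprop1=something", 1000));
Set<Long> fromDb2 = new HashSet<>(detailService.matchingIds("cdprop2=somethingelse", 1000));

fromDb1.retainAll(fromDb2); // intersect, preserving the first database's ordering

// first page of 10 items from the intersection
List<Long> pageIds = fromDb1.stream().limit(10).collect(Collectors.toList());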

Related

Creating an Efficient (Dynamic) Data Source to Support Custom Application Grid Views

In the application I am working on, we have data grids that have the capability to display custom views of the data. As a point of reference, we modeled this feature using the concept of views as they exist in SharePoint.
The custom views should have the following capabilities:
Be able to define which subset of the available columns should be displayed in the view.
Be able to define one or more filters for retrieving data. These filters are not constrained to the columns in the result set, but must use one of the available columns. Standard logical conditions and operators apply to these filters, for example ColumnA Equals Value1 or ColumnB >= Value2.
Be able to define a set of columns that the data will be sorted by. This set can be one or more of the columns returned in the result set.
Be able to define a set of columns that the data will be grouped by. This set can be one or more of the columns returned in the result set.
I have application code that will dynamically generate the necessary SQL to retrieve the appropriate set of data. However, it appears to perform poorly. When I run across a poorly performing query, my first thought is to determine where indexes might help. The problem here is that I won't necessarily know which indexes need to be created as the underlying query could retrieve data in many different ways.
Essentially, the SQL that is currently being used does the following:
Creates a temporary table variable to hold the filtered data. This table contains a column for each column that should be returned in the result set.
Inserts data that matches the filter into the table variable.
Queries the table variable to determine the total number of rows of data.
If requested, determines the grouping values of the data in the table variable using the specified grouping columns.
Returns the requested page, at the requested page size, from the table variable, sorted by any specified sort columns.
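A rough sketch of the kind of SQL this process generates (the table, column, and variable names here are hypothetical stand-ins for whatever the generator emits):

DECLARE @Value1 nvarchar(50) = N'foo', @Value2 int = 10,
        @PageNumber int = 1, @PageSize int = 25;

DECLARE @Filtered TABLE (Id int, ColumnA nvarchar(50), ColumnB int);

-- capture the filtered rows
INSERT INTO @Filtered (Id, ColumnA, ColumnB)
SELECT Id, ColumnA, ColumnB
FROM dbo.SourceTable
WHERE ColumnA = @Value1 OR ColumnB >= @Value2;

-- total row count
SELECT COUNT(*) AS TotalRows FROM @Filtered;

-- one page, sorted; OFFSET/FETCH works on SQL Server 2012+ and SQL Azure
SELECT Id, ColumnA, ColumnB
FROM @Filtered
ORDER BY ColumnB DESC
OFFSET (@PageNumber - 1) * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY;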
My question is what are some ways that I may improve this process? For example, one idea I had was to have my table variable only contain the columns of data that are used to group and sort and then join in the source table at the end to get the rest of the displayed data. I am not sure if this would make any difference which is the reason for this post.
I need to support versions 2014, 2016 and 2017 of SQL Server in addition to SQL Azure. Essentially, I will not be able to use a specific feature of an edition of SQL Server unless that feature is available in all of the aforementioned platforms.
(This is not really an "answer": I just can't add comments because my reputation score isn't high enough yet.)
I think your general approach is fine - essentially you are making a GUI generator for SQL. However a few things:
This type of feature is best suited to a warehouse or a read-only replica database. Do not build this on a live production transactional database: your users will find permutations you haven't thought of that will kill your database. (That is also true for a warehouse, but a warehouse usually doesn't carry the response-time expectations of a transactional database.)
The method you described for paging is not efficient from a database standpoint. You are querying, filtering, grouping, and sorting the same dataset multiple times just to cherry-pick a few rows each time. That might be acceptable if the data is cached, but you shouldn't assume it is. If you have the know-how, figure out how to snapshot the entire final data set with an extra column that keeps the data physically sorted in the order the user requested; then each page is a quick query against the snapshot, as sketched below.
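One hedged sketch of that snapshot idea: materialize the filtered, sorted result once with a row-number column, then page with cheap range predicates (the snapshot table and all names are hypothetical):

-- build the snapshot once per user request
SELECT ROW_NUMBER() OVER (ORDER BY ColumnB DESC) AS RowNum,
       Id, ColumnA, ColumnB
INTO   dbo.ResultSnapshot_1234
FROM   dbo.SourceTable
WHERE  ColumnA = N'foo' OR ColumnB >= 10;

-- each page is then a simple range seek on RowNum
SELECT Id, ColumnA, ColumnB
FROM   dbo.ResultSnapshot_1234
WHERE  RowNum BETWEEN 26 AND 50   -- page 2 at 25 rows per page
ORDER  BY RowNum;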
If you have a Repository/DAL layer, design your solution so that, in the future, certain combinations of tables/columns can use hardcoded queries or stored procedures. Certain queries will inevitably pop up that cause performance problems, and you may have to build a custom solution for them to get performance the dynamic SQL can't deliver.

How do I paginate Parse.com pointers? A list of message senders and receivers, for example

I'm trying to set up a scalable messaging system following the example offered by Parse.com. I want to paginate the users that the user has interacted with, that is, the list of senders and receivers.
I can query for all messages:
var senderQuery = new Parse.Query("Messages");
var receiverQuery = new Parse.Query("Messages");
senderQuery.equalTo("sender", request.user);
receiverQuery.equalTo("receiver", request.user);
var query = Parse.Query.or(senderQuery, receiverQuery);
But if I paginate this query and have many messages with one user, I might not see any other users until a few paginations later.
I'd like to query for just a list of distinct users, but there's no built-in query constraint that would return distinct values based on a column. Parse.com instead recommends that I:
query for all the rows, then iterate through them and track the distinct values for the desired column
But Parse won't let me query for more than 1000 objects at a time, and even if it did, this isn't a scalable approach.
How can I make messaging scalable?
For a start, you do not want to query a message table that grows this quickly; that is a bad idea in any database that has to scale. I suggest you create three Relation fields on every user:
Messages sent by that user,
Messages received by that user,
UserInteractions, a list of all users they have interacted with so far.
Now what you want is easy, because you can exploit a neat feature of Parse Relations: when you add an object to a Relation field that already contains it, you never end up with a duplicate record. This means you can read the list of distinct users you've interacted with straight from the third Relation, even if the last several hundred messages you sent or received all involve a single user.
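A short sketch of that third Relation in Cloud Code, assuming the column is named userInteractions (the name is illustrative, not a built-in):

// Whenever a message is sent, record the interaction on the sender.
// Relation.add() is idempotent, so re-adding the same user never duplicates.
var interactions = sender.relation("userInteractions");
interactions.add(receiver);
sender.save();

// Paginating the distinct users is then an ordinary relation query:
var query = sender.relation("userInteractions").query();
query.limit(20);  // page size
query.skip(40);   // third page
query.find().then(function(users) {
  // users: the distinct people this user has interacted with
});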

How to fetch thousands of rows from a database without slowing down?

I want an auto-search option in a textbox, with the data fetched from the database. I have thousands of rows in my database table (roughly 8,000-10,000). I know how to achieve this, but since I am fetching thousands of rows it takes a lot of time. How do I achieve this without slowing down? Should I follow some methodology other than simple fetching? I am using Oracle SQL Developer for the database.
Besides the obvious solutions involving indexes and caching: if this is web technology, then depending on your tooling you can often set a minimum input length before the server call is made. Here is a jQuery UI example: https://api.jqueryui.com/autocomplete/#option-minLength
"The minimum number of characters a user must type before a search is performed. Zero is useful for local data with just a few items, but a higher value should be used when a single character search could match a few thousand items."
It depends on your web interface, but you can use two techniques:
Paginate your data: if your requirements are to accept empty values and show all results, load them in blocks of a predefined size; Google, for example, paginates search results. On Oracle, pagination is done with the ROWNUM pseudo-column (see this response): you must first issue a query with an ORDER BY and then enclose it in a new one that applies ROWNUM, as sketched after this list. Databases that use the LIMIT keyword behave differently. If you apply the pagination technique to a drop-down you end up with an infinite scroll (see this response for an example).
Limit your data by imposing a filter that bounds the number of rows returned: your search displays results only after the user has typed at least n characters in the field.
You can combine 1 and 2, but unless you find an existing web component (a jQuery one, for example) it may be a difficult task without JavaScript knowledge.
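A minimal sketch of that Oracle ROWNUM pattern for rows 21-40 (a hypothetical clients table; the ORDER BY must sit in the inner query, because ROWNUM is assigned before sorting within the same query block):

SELECT name, email
FROM  (SELECT c.*, ROWNUM rn
       FROM  (SELECT name, email
              FROM   clients
              WHERE  name LIKE :prefix || '%'
              ORDER  BY name) c
       WHERE  ROWNUM <= 40)   -- upper bound: last row of the page
WHERE rn > 20;                -- lower bound: rows already shown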

Choosing the right model for storing and querying data?

I am working on my first GAE project, using Java and the Datastore, and this is my first try with a NoSQL database. Like a lot of people I have trouble choosing the right model. So far I've come up with two models and I need help choosing between them.
All the data is represented in two classes, User.class and Word.class.
User: a couple of strings with user data (username, email, ...)
Word: two strings
Which is better:
1. Search 10,000,000 Word entities for the 100 I need. For instance, every Word entity has a string property owner and I query (owner = 'John').
2. In User.class, add a property List<Word> and a method getWords() that returns the list of words. So I query 1000 users for the one I need and then call getWords() to get the 100 words I need.
Which one uses fewer resources? Or am I going about this the wrong way?
The answer is to use AppStats, and then you can find out:
AppStats
To keep your application fast, you need to know:
Is your application making unnecessary RPC calls? Should it be caching data instead of making repeated RPC calls to get the same data? Will your application perform better if multiple requests are executed in parallel rather than serially?
Run some tests, try it both ways, and see what AppStats says.
But I'd say your option 2 is better, simply because you don't need to search millions of entities. Who knows for sure, though? The trouble is that "resources" means a dozen different things in App Engine: CPU, datastore reads, datastore writes, and so on.
For your User class, set a unique ID for each user (such as a username or email address). For the Word class, set the parent of each Word class as a specific User.
So, if you wanted to look up words from a specific user, you would do an ancestor query for all words belonging to that specific user.
By setting an ID for each user, you can get that user by ID as opposed to doing an additional query.
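A brief sketch with the low-level Datastore API, assuming the email address serves as the user's ID (all names illustrative):

import com.google.appengine.api.datastore.*;
import java.util.List;

DatastoreService ds = DatastoreServiceFactory.getDatastoreService();

// Key the user by a natural unique ID, so no query is needed to fetch it.
Key userKey = KeyFactory.createKey("User", "john@example.com");
Entity user = ds.get(userKey); // may throw EntityNotFoundException

// Words are stored as children of the user...
Entity word = new Entity("Word", userKey);
word.setProperty("text", "hello");
ds.put(word);

// ...so one ancestor query returns all of that user's words.
Query q = new Query("Word").setAncestor(userKey);
List<Entity> words = ds.prepare(q).asList(FetchOptions.Builder.withLimit(100));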
More info on ancestor queries:
https://developers.google.com/appengine/docs/java/datastore/queries#Ancestor_Queries
More info on IDs:
https://developers.google.com/appengine/docs/java/datastore/entities#Kinds_and_Identifiers
It really depends on the queries you're using. I assume that you want to find all the words given a certain owner.
Most likely, option 2 would be cheaper, since you'll fetch the user entity instead of running a query.
Option 2 will be a bit more work on your part, though, since you'll need to manually keep the list synchronized with the Word instances.
Off the top of my head I can think of 2 problems with #2, which may or may not apply to you:
A. If you want to find all the owners given a certain word, you'll need to keep that list of words indexed. This affects your costs. If you mostly find words by owner, and rarely find owners by words, it'll still make sense to do it this way. However, if your search pattern flips around and you're searching for owners by words a lot, this may be the wrong design. As you see, you need to design the models based on the queries you will be using.
B. Entities are limited to 1MB, and there's a limit on the number of indexed properties (5000, I think?). Those two limits cap the number of words you can store in the list. Make sure you won't need more words per user than that. Method 1 allows unlimited words per user.

Paging of Query Result Set

Greetings Overflowers,
I'm wondering if there is a way to query some kind of database and fetch only a certain window of the full result set, without having to go through all of it.
For example, if I query my database and want only results 100 to 200, would the database fetch all the results that match my query (say 0 to 1000) and only later filter out everything outside my specified window?
Actually, I'm working on a full text search problem (not really relational db stuff).
So how about Google and other search engines, do they get full result then filter or do they have direct access to only the needed window frame ?
Thank you all !
Your question is probably best answered in two parts.
For a (traditional, relational) database, the query you execute contains a number of "where" clauses, which the database engine uses to limit the rows it returns. So if you specify a where clause that limits between two values of the primary key,
SELECT * FROM some_table WHERE id > 99 AND id < 201;
you'll get what you're asking for.
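Most engines can also address the window directly, without knowing key values; a generic sketch (the LIMIT/OFFSET syntax varies by engine, and Oracle uses ROWNUM as shown earlier):

SELECT *
FROM   some_table
ORDER  BY id
LIMIT  100 OFFSET 100;  -- rows 101 to 200 of the ordered result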
For a search engine, a query always paginates: using various techniques, all the results are pre-split into pages, a few of which are cached while the rest are generated on demand. So if you want results 100 to 200, only the pages containing them are ever fetched.
The option to filter is not very efficient because large data sources never want to load all their data into memory and slice - you only want to load what's needed.
