One big query vs. many small ones? - database

I'd like to know which option is more expensive in terms of bandwidth and overall efficiency.
Let's say I have a class Client in my application and a table client in my database.
Is it better to have one static function Client.getById that retrieves the whole client record or many (Client.getNameById, Client.getMobileNumberById, etc.) that retrieve individual fields?
If a single record has a lot of fields and I end up using one or two in the current script, is it still better to retrieve everything and decide inside the application what to do with all the data?
I'm using PHP and MySQL by the way.

Is it better to have one static function Client.getById that retrieves the whole client record or many (Client.getNameById, Client.getMobileNumberById, etc.) that retrieve individual fields?
Yes — one function that retrieves the whole record is better.
Network latency and the overhead of establishing a connection mean that making as few database calls as possible is the best way to keep the database from saturating.
If the size of the data really is large enough that you see an effect, you can consider retrieving only the specific fields you need in a single query (tailor the query to the data).
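A minimal sketch of the two variants described above, using Python and SQLite for illustration (the question is about PHP/MySQL, and the schema and function names here are assumptions, not from the original post):

```python
import sqlite3

# Hypothetical client table for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE client (id INTEGER PRIMARY KEY, name TEXT, mobile TEXT, email TEXT)")
conn.execute("INSERT INTO client VALUES (1, 'Alice', '555-0100', 'alice@example.com')")

# One round trip: fetch the whole record at once (the Client.getById style).
def get_client_by_id(client_id):
    row = conn.execute(
        "SELECT id, name, mobile, email FROM client WHERE id = ?", (client_id,)
    ).fetchone()
    return dict(zip(("id", "name", "mobile", "email"), row))

# If the row is wide and you only need a couple of fields, tailor
# one query to them instead of issuing several single-field queries:
def get_client_contact(client_id):
    return conn.execute(
        "SELECT name, mobile FROM client WHERE id = ?", (client_id,)
    ).fetchone()

client = get_client_by_id(1)          # one query, every field
name, mobile = get_client_contact(1)  # still one query, only two fields
```

Either way there is exactly one round trip per lookup; the per-field variant (`getNameById`, `getMobileNumberById`, ...) would cost one round trip per field.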

Related

Which one is better, iterate and sort data in backend or let the database handle it?

I'm trying to design a database schema for a Django REST Framework web application.
At some point, I have two choices:
1- Choose a schema in which, in one or several APIs, I have to get a queryset from the database and iterate over and order it with Python. (For example, I can store some data in an array-typed column, get it from the database and sort it with Python.)
2- Store the data in another table and insert a fairly large number of rows with each insert. This way, I can get the data in my preferred format with much less ORM code.
I tried some basic tests and benchmarking to see which way is faster, and letting the database handle more of the job (the second way) didn't let me down. But I don't have the means of setting up a more realistic situation, so here's the question:
Is it still a good idea to let the database handle the job when it also has to handle hundreds of requests from other APIs and clients each second?
Is the database (and the ORM) usually faster and more reliable than the backend?
As a general rule, you want to let the database do work when the work is appropriate for the database. Sorting result sets would be in that category.
Keep in mind:
The database is running on a server, often on a distributed system, and so it has access to more resources.
Databases are designed to handle large data sets, so they are not limited by the memory available to a single application thread.
When this question comes up, it is often because more data is being passed back to the application than is strictly needed. Consider a problem such as getting the top 10 of something.
Mixing processing in the application and the database often requires multiple queries and passing data back and forth, which is expensive.
(And there are no doubt other considerations.)
There are some situations where it might be more efficient or convenient to do work in the application. A common example is formatting result sets for the application -- say turning 1234.56 into $1,234.56. Other examples would be when the application language has capabilities that are not directly in SQL or are hard to implement in SQL.
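The "top 10" point above can be sketched concretely (SQLite in Python used here purely for illustration; the question is about Django/ORM code, but the trade-off is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (player TEXT, points INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)",
                 [(f"p{i}", i) for i in range(1000)])

# Let the database sort: only 10 rows come back to the application.
top10_db = conn.execute(
    "SELECT player, points FROM scores ORDER BY points DESC LIMIT 10"
).fetchall()

# The application-side alternative pulls all 1000 rows back first,
# then sorts and slices in Python.
all_rows = conn.execute("SELECT player, points FROM scores").fetchall()
top10_app = sorted(all_rows, key=lambda r: r[1], reverse=True)[:10]
```

Both produce the same answer, but the database-side version transfers 10 rows instead of 1000 — exactly the "more data than strictly needed" cost described above.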

Using application's internal cache while working with Cassandra

As I've been working with traditional relational databases for a long time, moving to NoSQL, especially Cassandra, is a big change. I usually design my application so that everything in the database is loaded into the application's internal caches on startup, and if there is any update to a database table, its corresponding cache is updated as well. For example, if I have a table Student, on startup all the data in that table is loaded into StudentCache, and when I want to insert/update/delete, I call a service which updates both of them at the same time. The aim of my design is to avoid selecting directly from the database.
In Cassandra, since the idea is to build tables containing all the needed data so that joins are unnecessary, I wonder if my favorite design is still useful, or whether it is more effective to query data directly from the database (i.e. from one table) when required.
Based on your described use case, I'd say that querying data as you need it prevents storing data you don't need. Plus, what if your dataset is 5 GB? Are you still going to load the entire dataset?
Maybe consider a design where you don't load all the data on startup, but load it as needed, store it, and check this store before querying again -- like what a cache does!
Cassandra is built to scale; your design can't, and you'll reach a point where your dataset is too large. Based on that, you should think about a trade-off: lots of on-the-fly querying vs. storing everything in the client. I would advise direct queries, but store data when you do carry out a query -- don't discard it and then carry out the same query again!
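The load-as-needed pattern described above can be sketched as a read-through cache (names like `StudentRepository` are illustrative, not from the question; the fetch function would be a Cassandra query by key in practice):

```python
# Minimal read-through cache sketch, assuming lookups by a single key.
class StudentRepository:
    def __init__(self, fetch_from_db):
        self._fetch = fetch_from_db  # e.g. a Cassandra SELECT by partition key
        self._cache = {}

    def get(self, student_id):
        if student_id not in self._cache:  # only hit the database on a miss
            self._cache[student_id] = self._fetch(student_id)
        return self._cache[student_id]

# Fake "database" standing in for Cassandra, plus a call log:
db = {"s1": {"name": "Ann"}, "s2": {"name": "Bo"}}
calls = []
def fetch(student_id):
    calls.append(student_id)
    return db[student_id]

repo = StudentRepository(fetch)
repo.get("s1")
repo.get("s1")  # second call is served from the cache, no DB hit
```

Unlike the load-everything-on-startup design, memory grows only with the records actually used, and a 5 GB table costs nothing until you touch it.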
I would suggest querying the data directly, as keeping all the data in the application makes the application's performance dependent on the size of the input. Now, this might be fine if you know that the amount of data will never exceed your target machine's memory.
Should you, however, decide that this limit needs to change (grow!), you will be faced with a problem. This approach will be fast when it comes to searching (assuming you sort the data at startup) but will pretty much kill maintainability.
Your favorite approach is, however, still useful should you choose to accept those limits.

Heavy vs light queries

When you have an application that is constantly querying a database for information, in terms of performance and database usage, is it better to have one big query that pulls in all the data at once, or a bunch of smaller queries that pull in the data one piece at a time? Does it matter?
I'm trying to figure out whether I should query my entire class once any value in the class is requested, or only query individual values as they are needed.
If others are paying for your usage, I would recommend pulling as much data as you can in a single connection and working with it locally. However, I don't know how 'usage' is defined there. It could be connections, bandwidth, operations, etc.

What is the best alternative to storing large amounts of data in the server session in an ASP MVC Application?

OK, so I'm working on an ASP MVC web application that queries a fairly large amount of data from a SQL Server 2008 database. When the application starts, the user is presented with a search mask that includes several fields. Using the search mask, the user can search for data in the database and also filter the search by specifying parameters in the mask. In order to speed up searching, I'm storing the result set returned by the database query in the server session. During subsequent searches I can then search the data I have in the session, thus avoiding unnecessary trips to the DB.
Since the amount of data that can be returned by a database query can be quite large, the scalability of the web application is severely limited. If there are, let's say, 100 users using the application at the same time, the server will keep search results in its session for each separate user. This will eventually eat up quite a bit of memory. My question now is: what's the best alternative to storing the data in the session? The query in the DB can take quite a while at times, so, at the moment, I would like to avoid having to run the query on subsequent searches if the data I already retrieved earlier contains the data that is now being searched for. One option I've considered is having my search query create a temp table in the DB that stores the retrieved data and can be used for subsequent searches. The problem with that is, I don't have all too much experience with SQL Server, so I don't know whether SQL Server would create temp tables for each user if there are multiple users performing the search. Are there any other possibilities? Could the idea with the temp table in SQL Server work, or would it only lead to memory issues on the SQL Server? Thanks for the help! :)
Edit: Thanks a lot for the helpful and insightful answers, guys! However, I failed to mention a detail that's kind of important. When I query the database, the format of the result set can vary from user to user. This is because the user can decide which columns the result table has by selecting columns from a predefined multiselect box in the search mask. If user A wants ColA, ColB and ColC to be displayed in his result table, he selects those values from the multiselect box in the search mask. User B, however, might select ColA and ColC only. Therefore, caching the results in a single table for all users might be a bit tricky, since the table columns are not necessarily going to be the same for all users. Therefore, I'm thinking I'll almost have to use an alternative that saves each user's cached table separately. The HTML5 Local Storage option mentioned below sounds interesting. Since this is an intranet application, it might be fair to assume (or require) that users have an up-to-date browser that supports HTML5. What do you guys think? Again, thanks for the help :)
If you want to cache query results, they'll have to be either on the web server or client in some form or another. All options will require memory, and since search results are user-specific, that memory usage will increase as a linear function of the number of current users.
My suggestions are to limit the number of rows returned from SQL (with TOP) and/or to look into optimizing your query on the SQL end. If your DB query takes a noticeable amount of time there's a good chance it can be optimized in SQL.
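Capping the row count is the simplest lever here. A sketch of the idea (SQLite's `LIMIT` used in place of SQL Server's `TOP`, purely for illustration; table and column names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (id INTEGER, label TEXT)")
conn.executemany("INSERT INTO results VALUES (?, ?)",
                 [(i, f"row{i}") for i in range(10_000)])

# Cap what a single search is allowed to bring back into the session.
MAX_ROWS = 100
rows = conn.execute(
    "SELECT id, label FROM results ORDER BY id LIMIT ?", (MAX_ROWS,)
).fetchall()
```

In T-SQL the equivalent would be `SELECT TOP (100) id, label FROM results ORDER BY id`. With a cap, per-user session memory is bounded regardless of how broad the search was.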
Have you already thought about NoSQL databases?
The idea of a NoSQL database is to store information in a way that is optimized for reading or writing and is accessed with 'easy queries' (for example, a look-up on search terms). They scale easily horizontally and would allow you to search through a whole lot of data very fast (think of Google's Bigtable, for example!)
If HTML5 is a possibility, you could use Local Storage.
You could try turning on sql session state.
http://support.microsoft.com/kb/317604
Plus: it's effortless, and you will find out whether this fixes the memory pressure and has acceptable performance (i.e. reading and writing the cache to the DB). If the performance is okay, then you may want to implement the sort of thing that SQL session state does yourself, because there is a ...
Downside: if you aren't aggressive about removing it from the session, it will be serialized and deserialized on each request, even on unrelated pages.

Dataset retrieving data from another dataset

I work with an application that is switching from file-based data storage to database-based storage. It has a very large amount of code that is written specifically for the file-based system. To make the switch, I am implementing functionality that behaves like the old system; the plan is then to make more optimal use of the database in new code.
One problem is that the file-based system often read single records, and read them repeatedly for reports. This has turned into a lot of queries to the database, which is slow.
The idea I have been trying to flesh out is using two datasets: one dataset to retrieve an entire table, and another dataset to query against the first, thereby decreasing communication overhead with the database server.
I've tried to look at the DataSource property of TADODataSet, but the dataset still seems to require a connection, and it queries the database directly if Connection is assigned.
The reason I would prefer to get the result in another dataset, rather than navigating the first one, is that a good amount of logic for emulating the old system is already implemented. This logic is based on having a dataset containing only the results as queried with the old interface.
The functionality only has to support reading data, not writing it back.
How can I use one dataset to supply values for another dataset to select from?
I am using Delphi 2007 and MSSQL.
You can use a ClientDataSet/DataSetProvider pair to fetch data from an existing DataSet. You can use filters on the source dataset, filters on the ClientDataSet and provider events to trim the dataset only to the interesting records.
I've used this technique with success in a couple of migration projects and to mitigate a similar situation, where an old SQL Server 7 database was queried thousands of times to retrieve individual records, with painful performance costs. Querying it only once and then fetching individual records from the client dataset was, at the time, not only an elegant solution but a great performance boost for that particular application: the best example was an 8-hour process reduced to 15 minutes... the users loved me for that.
A ClientDataSet is just a TDataSet you can seamlessly integrate into existing code and UI.
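The general pattern behind the ClientDataSet/DataSetProvider pair -- fetch the table from the server once, then answer the many repeated single-record lookups from a local copy -- can be sketched outside Delphi too. A Python/SQLite analog, with both "server" and "local" databases faked in memory and all names made up for illustration:

```python
import sqlite3

# Stand-in for the remote MSSQL server.
server = sqlite3.connect(":memory:")
server.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
server.executemany("INSERT INTO item VALUES (?, ?)",
                   [(i, f"item{i}") for i in range(500)])

# One bulk fetch from the server into a local in-memory copy.
local = sqlite3.connect(":memory:")
local.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
local.executemany("INSERT INTO item VALUES (?, ?)",
                  server.execute("SELECT id, name FROM item"))

# Reports can now "re-read" individual records with no server round trips.
name = local.execute("SELECT name FROM item WHERE id = ?", (42,)).fetchone()[0]
```

One query against the server replaces thousands of per-record queries; the repeated lookups become cheap local reads, which is exactly the speed-up described in the answer above.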
