I have created a Windows Form that uses a SQL Server database. The form contains a search grid which brings up all the bank account information for a person. The grid includes a special field, "Number of Accounts", which displays the number of accounts a person has with a bank.
There are more than 100,000 records in the table the data is fetched from. I just want to know how I should decrease the response time, or the search time, when getting the data from the table into the search grid.
When I run the page it takes a very long time for the records to be displayed in the search grid. Moreover, it does not fetch the data at all unless I provide search criteria (a from and to date).
Is there any possible way to decrease the search time so that the data gets displayed in the grid faster?
There are a few things that you can do:
Only fetch the minimum amount of data that you need for your results - this means only select the needed columns and limit the number of rows.
In addition to the above, consider using paging on the UI, so you can further limit the amount of data returned. There is no point in showing a user 100,000 rows.
If you haven't done so already, add indexes to the table (though at 100,000 rows, things shouldn't be that slow anyway). I can't go into detail about how to do that here.
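As a rough illustration of the first and last points - a minimal sketch, assuming a hypothetical dbo.Accounts table whose OpenedDate column is what the date criteria filter on:

    -- Hypothetical table/column names; adjust to the real schema.
    -- Index the date column used by the search criteria, covering the
    -- columns the grid displays so the query never touches the base table:
    CREATE INDEX IX_Accounts_OpenedDate
        ON dbo.Accounts (OpenedDate)
        INCLUDE (PersonName, AccountNumber);

    -- Fetch only the columns the grid shows, and only one page of rows:
    SELECT TOP (100) PersonName, AccountNumber, OpenedDate
    FROM dbo.Accounts
    WHERE OpenedDate BETWEEN @FromDate AND @ToDate
    ORDER BY OpenedDate;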
I'm using Tableau with a BigQuery data source that has 500M rows and 30 columns. In order to have this BQ data used by my workbooks, I refresh an extract (Hyper) every day.
In my workbooks I have 6 parameters and one filter that is a user filter.
I notice that the workbooks' loading time is slow. It also gets slow when I change parameter values.
When using a performance recording I see around 40 seconds per query, while rendering time is on the order of milliseconds.
Is this normal even though I'm using extracts with few quick filters? How can I improve the query performance?
Tableau Server info: I'm using Tableau 2022.1 on a two-node server with 256 GB of RAM.
How many marks are you pulling into the Dashboard? If it is a summary query/Dashboard, it shouldn't take this long. The more marks you pull, the longer it will take. The user filter could also be a problem; try removing it to see the impact.
TL;DR version: I have a query linked to a database. I need to add some columns to it for data that isn't linked to the database, but don't know how.
I'm quite new to SQL and Access (got a reasonable grasp of Excel and VBA though) and have a pretty complex reporting task. I've got halfway (I think) but am stuck.
Purpose
A report showing how many (or what percentage of) delivery lines were late in a time period, with reasons for their being late from a set list, and analysis of what's the biggest cause of lateness.
Vague Plan
Create a table/query showing delivery lines with customer, required date and delivery date, plus a column to show whether they were on time, plus another to detail lateness reason. Summaries can be done afterwards in Excel. I'd like to be able to cycle through said table in form view entering lateness reasons (they'll be from a linked table, maybe 4 or 5 options).
Sticking Point
I have the data, but not the analysis. I've created the original data output query, it's linked to a live SQL database (Orderwise) so it keeps updating. However I don't know:
how to add extra columns and lookups to that query to work out / record whether each line is on time, and the lateness reason
how I'll be able to cycle through the late ones in form view to add reasons
How do I structure the Access database so it can do this, please?
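For illustration, this is roughly the shape of the output I'm imagining, with a small local table holding the lateness reasons so the linked Orderwise data stays read-only (all table and field names below are made up):

    -- qryDeliveryLines is the existing linked query; tblLatenessDetail is a
    -- local Access table keyed on the delivery line, holding the chosen reason.
    SELECT d.Customer,
           d.RequiredDate,
           d.DeliveryDate,
           IIf(d.DeliveryDate <= d.RequiredDate, "On Time", "Late") AS OnTimeStatus,
           r.LatenessReason
    FROM qryDeliveryLines AS d
    LEFT JOIN tblLatenessDetail AS r
           ON d.DeliveryLineID = r.DeliveryLineID;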
Are there any techniques to calculate the actual data size used per SQL table row, including enabled indexes and log records?
Summing the field sizes would not be correct, because some fields can be empty or contain less data than the field size.
The target is to know exactly how much data is used per user.
I could probably do this on the handler side.
With the word "exactly", I have to say "no".
Change that to "approximately", and I say
SHOW TABLE STATUS
and look at Avg_row_length. This info is also available in information_schema.TABLES.
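For example (the database and table names here are placeholders):

    -- Approximate average row size for one table:
    SHOW TABLE STATUS LIKE 'mytable';

    -- The same figures from information_schema; AVG_ROW_LENGTH covers the
    -- data only, while INDEX_LENGTH covers all enabled indexes on the table:
    SELECT TABLE_NAME, TABLE_ROWS, AVG_ROW_LENGTH, DATA_LENGTH, INDEX_LENGTH
    FROM information_schema.TABLES
    WHERE TABLE_SCHEMA = 'mydb'
      AND TABLE_NAME = 'mytable';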
But, that is just an average. And not a very accurate average at that.
Do you care about a hundred bytes here or there? Do users own rows in a single table? What the heck is going on?
There are some crude formulas for computing the size of Data rows and Index rows, but nothing on Log records. One of the problems is that if there is a "block split" in a BTree because someone else inserted a row, do you divvy up the new block evenly across all users? Or what?
I'm looking for design and/or index recommendations for the problem listed below.
I have a couple of denormalized tables in an Azure S1 Standard (20 DTU) database. One of those tables has ~20 columns and a million rows. My application requires me to support sub-second (or at least close to it) querying of this table by any combination of columns in my WHERE clause, as well as sub-second (or close to it) querying of the DISTINCT values in each column.
In order to picture the use case behind this, here is an example. Imagine you were using an HR application that allowed you to search for employees and view employee information. The employee table might have 5 columns and millions of rows. The application allows you to filter by any column, and provides an interface to allow this. Therefore, the underlying SQL queries that must be made are:
A GROUP BY (or DISTINCT) query for each column, which provides the interface with the available filter options
A general employee search query that filters all rows by any combination of filters (both query shapes are sketched below)
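For concreteness, the two query shapes look roughly like this (the table and column names are purely illustrative):

    -- 1) One DISTINCT (or GROUP BY) query per column, to populate the filter options:
    SELECT DISTINCT Department
    FROM dbo.Employee;

    -- 2) The general search query, with whichever filters the user has selected:
    SELECT EmployeeId, FirstName, LastName, Department, Location
    FROM dbo.Employee
    WHERE Department = @Department
      AND Location = @Location;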
In order to solve performance issues on the first set of queries, I've implemented the following (sketched after this list):
Index columns with a large variety of values
Full-Text index columns that require string matching (So CONTAINS querying instead of LIKE)
Do not index columns with a small variety of values
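As a sketch of the above (dbo.Employee, PK_Employee and the column names are placeholders, not my real schema):

    -- Plain nonclustered index on a column with a large variety of values:
    CREATE INDEX IX_Employee_HireDate ON dbo.Employee (HireDate);

    -- Full-text index for the columns that need string matching
    -- (requires a full-text catalog and a unique key index on the table):
    CREATE FULLTEXT CATALOG ftEmployeeCatalog;
    CREATE FULLTEXT INDEX ON dbo.Employee (LastName)
        KEY INDEX PK_Employee ON ftEmployeeCatalog;

    -- CONTAINS instead of LIKE '%...%':
    SELECT EmployeeId, FirstName, LastName
    FROM dbo.Employee
    WHERE CONTAINS(LastName, '"smi*"');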
In order to solve the performance issues on the second query, I've implemented the following:
Forcing the front end to use pagination, implemented using SELECT * FROM table ORDER BY col OFFSET 0 ROWS FETCH NEXT n ROWS ONLY, and ensuring the ORDER BY column is indexed (see the sketch below)
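Concretely, the paged query looks roughly like this (table, column and parameter names are simplified placeholders):

    -- The ORDER BY column is backed by an index so the sort is cheap:
    CREATE INDEX IX_Employee_LastName ON dbo.Employee (LastName);

    SELECT EmployeeId, FirstName, LastName, Department
    FROM dbo.Employee
    ORDER BY LastName
    OFFSET (@PageNumber - 1) * @PageSize ROWS
    FETCH NEXT @PageSize ROWS ONLY;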
Locally, this seemed to work fine. Unfortunately, an Azure Standard database doesn't have the same performance as my local machine, and I'm seeing issues. Specifically, the columns I am not indexing (the ones with a very small set of distinct values) are taking 30+ seconds to query. Additionally, while the paging is initially very quick, the query takes longer and longer the higher I increase the offset.
So I have two targeted questions, but any other advice or design suggestions would be most welcome:
How bad is it to index every column in the table? Note that the table does need to be updated, but the columns that I update won't actually be part of any filters or WHERE clauses. Will the indexes still need to be rebuilt on update? You can also safely assume that the table will not see any inserts/deletes, except for once a month when the entire table is truncated and rebuilt from scratch.
Regarding the paging getting slower and slower the deeper I go, I've read this is expected, but the performance becomes unacceptable at a certain point. Other than making the sort-by column my clustered index, are there any other suggestions to get this working?
Thanks,
-Tim
I need some inspiration for a solution...
We are running an online game with around 80,000 active users - we are hoping to expand this and are therefore setting a target of reaching up to 500,000 users.
The game includes a highscore for all the users, which is based on a large set of data. This data needs to be processed in code to calculate the values for each user.
After the values are calculated we need to rank the users, and write the data to a highscore table.
My problem is that in order to generate a highscore for 500,000 users we need to load data from the database on the order of 25-30 million rows, totalling around 1.5-2 GB of raw data. Also, in order to rank the values we need to have the complete set of values.
Also we need to generate the highscore as often as possible - preferably every 30 minutes.
Now we could just use brute force - load the 30 million records every 30 minutes, calculate the values and rank them, and write them into the database - but I'm worried about the strain this will put on the database, the application server and the network, and whether it's even possible.
I'm thinking the solution might be to break the problem up somehow, but I can't see how. So I'm seeking some inspiration for possible alternative solutions, based on this information:
We need a complete highscore of all ~500,000 teams - we can't (won't, unless absolutely necessary) shard it.
I'm assuming that there is no way to rank users without having a list of all users' values.
Calculating the value for each team has to be done in code - we can't do it in SQL alone.
Our current method loads each user's data individually (3 calls to the database) to calculate the value - it takes around 20 minutes to load the data and generate the highscore for 25,000 users, which is too slow if this is to scale to 500,000.
I'm assuming that hardware size will not be an issue (within reasonable limits)
We are already using memcached to store and retrieve cached data
Any suggestions, links to good articles about similar issues are welcome.
Interesting problem. In my experience, batch processes should only be used as a last resort. You are usually better off having your software calculate values as it inserts/updates the database with the new data. For your scenario, this would mean running the score calculation code every time you insert or update any of the data that goes into calculating a team's score. Store the calculated value in the DB with the team's record. Put an index on the calculated value field. You can then ask the database to sort on that field and it will be relatively fast. Even with millions of records, it should be able to return the top n rows very quickly by reading that index. I don't think you'll even need a high scores table at all, since the query will be fast enough (unless you have some other need for the high scores table other than as a cache). This solution also gives you real-time results.
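A minimal sketch of that idea, with made-up table and column names (PostgreSQL/MySQL-style syntax; the UPDATE is what your application code would run after recalculating a team's score):

    -- Each team row carries its precomputed score.
    ALTER TABLE teams ADD COLUMN score BIGINT NOT NULL DEFAULT 0;

    -- Index the calculated value so ordering by it is cheap:
    CREATE INDEX idx_teams_score ON teams (score DESC);

    -- Run by the application whenever the data feeding a team's score changes
    -- (the ? marks are ordinary driver bind parameters):
    UPDATE teams SET score = ? WHERE team_id = ?;

    -- The "highscore table" then becomes a fast query:
    SELECT team_id, name, score
    FROM teams
    ORDER BY score DESC
    LIMIT 100;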
Assuming that most of your 2 GB of data is not changing that frequently, you can calculate and cache (in the DB or elsewhere) the totals each day, and then just add the difference based on the new records provided since the last calculation.
In PostgreSQL you could cluster the table on the column that records when each row was inserted, and create an index on that column. You can then run the calculations on recent data without having to scan the entire table.
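A rough PostgreSQL sketch of that approach (the table and column names are invented):

    -- Raw data table with an insertion timestamp column.
    CREATE INDEX idx_game_events_created_at ON game_events (created_at);

    -- Physically reorder the table by that index (re-run occasionally to
    -- keep the ordering tight, e.g. during a maintenance window):
    CLUSTER game_events USING idx_game_events_created_at;

    -- Each recalculation only reads rows added since the previous run:
    SELECT team_id, SUM(points) AS points_since_last_run
    FROM game_events
    WHERE created_at > :last_run_at   -- bind parameter: timestamp of the previous run
    GROUP BY team_id;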
First and foremost:
The computation has to take place somewhere.
User experience impact should be as low as possible.
One possible solution is:
Replicate (mirror) the database in real time.
Pull the data from the mirrored DB.
Do the analysis on the mirror or on a third, dedicated, machine.
Push the results to the main database.
Results are still going to take a while, but at least performance won't be impacted as much.
How about saving those scores in a database, and then simply querying the database for the top scores (so that the computation is done on the server side, not on the client side, and thus there is no need to move the millions of records)?
It sounds pretty straightforward... unless I'm missing your point... let me know.
Calculate and store the score of each active team on a rolling basis. Once you've stored the score, you should be able to do the sorting/ordering/retrieval in the SQL. Why is this not an option?
It might prove fruitless, but I'd at least take a gander at the way sorting is done on a lower level and see if you can't manage to get some inspiration from it. You might be able to grab more manageable amounts of data for processing at a time.
Have you run tests to see whether or not your concerns with the data size are valid? On a mid-range server throwing around 2GB isn't too difficult if the software is optimized for it.
Seems to me this is clearly a job for caching, because you should be able to keep the half-million score records semi-local, if not in RAM. Every time you update data in the big DB, make the corresponding adjustment to the local score record.
Sorting the local score records should be trivial. (They are nearly in order to begin with.)
If you only need to know the top 100-or-so scores, then the sorting is even easier. All you have to do is scan the list and insertion-sort each element into a 100-element list. If the element is lower than the first element, which it is 99.98% of the time, you don't have to do anything.
Then run a big update from the whole DB once every day or so, just to eliminate any creeping inconsistencies.