I currently have a client-server application. The client needs to be able to generate reports. These reports can have millions of records. What is the best way to get the data to the client? Sending a million records at once is not very efficient. Let's say I just want to display 20 at a time and page through all the results. Should I have the server run the query each time a user clicks on the next page, or should I have the server fetch the entire result set and then send parts of it to the client?
Use a cursor - it is perfect for these purposes.
Note: a report with a million rows is not a well-designed report. Nobody can process it. But that is a question for the analysts.
http://www.commandprompt.com/ppbook/x15040
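For a PostgreSQL backend (which the linked chapter covers), a server-side cursor lets the client fetch just one page per round trip. Below is a minimal sketch with Python and psycopg2; the connection string, table name, and display() call are illustrative placeholders, not part of the original answer.

```python
# Sketch: page through a huge result set with a server-side (named) cursor.
# Assumes PostgreSQL + psycopg2; table/column names are made up for illustration.
import psycopg2

PAGE_SIZE = 20

conn = psycopg2.connect("dbname=reports user=app")   # placeholder DSN

# A named cursor stays open on the server, so each fetch transfers only the
# next page of rows instead of the full million-row result set.
cur = conn.cursor(name="report_cursor")
cur.execute("SELECT * FROM report_rows ORDER BY id")

page = cur.fetchmany(PAGE_SIZE)       # rows for the first page
while page:
    display(page)                     # hypothetical UI call: render 20 rows
    page = cur.fetchmany(PAGE_SIZE)   # "next page" pulls the next 20 rows

cur.close()
conn.close()
```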
I used LIMIT in the SQL query to set the range of records fetched, but the query has to originate from the client, and the client must pass the range.
It depends on your client code. If you use .NET with the Npgsql connector (see the sketch below):
I would generate a page number in a combo box on the form.
Then multiply to get the starting row number: Rows * (pagenum - 1) (20 * 1 for page 2).
Pass that to the database as part of the SQL.
Retrieve the rows and display them in a DataGridView.
Although, I might just add, searching through a million records page by page is not wise. I hope there are some filters to narrow the search down to a few pages, because views beyond page 10 tend to drop off sharply.
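Here is the same page arithmetic as a short sketch, using Python and psycopg2 rather than the .NET/Npgsql setup the answer assumes; the SQL and the offset math are identical, and the table and column names are illustrative.

```python
# Sketch: LIMIT/OFFSET paging. The original answer assumes .NET + Npgsql and a
# DataGridView; here the same idea is shown with Python + psycopg2.
import psycopg2

ROWS_PER_PAGE = 20

def fetch_page(conn, page_num):
    """Return the rows for 1-based page `page_num` (table name is illustrative)."""
    offset = ROWS_PER_PAGE * (page_num - 1)   # page 2 -> offset 20, and so on
    with conn.cursor() as cur:
        cur.execute(
            "SELECT * FROM report_rows ORDER BY id LIMIT %s OFFSET %s",
            (ROWS_PER_PAGE, offset),
        )
        return cur.fetchall()

conn = psycopg2.connect("dbname=reports user=app")   # placeholder DSN
rows = fetch_page(conn, page_num=2)                   # rows 21-40
```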
I have a SQL table with close to 2 million rows and I am trying to export this data into an Excel file so the stakeholders can manipulate the data, see charts, and so on.
The issue is, when I hit refresh, it fails after getting all the data, saying the number of rows exceeds the maximum row limit in Excel. This table is going to keep growing every day.
What I am looking for here is a way to refresh the data, then add rows to Sheet 1 until the maximum row limit is reached. Once it is maxed out, I want the rows to start getting inserted into Sheet 2. Once that is maxed out, move to a third sheet, all from the single SQL table, from a single refresh.
This does not have to happen in Excel (Data -> Refresh option), I can have this as a part of the SSIS package that I am already using to populate rows in the SQL table.
I am also open to any alternate ways to export SQL table into a different format that can be used by said stakeholders to create charts, analyze data, and whatever else pleases them.
Without sounding too facetious, you are suggesting a very inefficient method.
The best approach is not to use .xlsx files for the data storage at all.
Assuming your destination stakeholders don't have read access to the SQL Server, export the data to .csv and then use Power Query in some sort of 'Dashboard.xlsx' type file to load the .csv into the data model, which can handle hundreds of millions of rows instead of just 1.05m.
This will allow for the use of Power Pivot and DAX for analysis, and the data will also be visible in the data model table view if users do want raw rows (or they can refer to the .csv file).
If they do have SQL read access, then you can query the server directly, so you don't need to store any rows at all.
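A short sketch of the export-to-.csv step with Python and pandas, streaming the table in chunks so the roughly 2M rows never have to fit in memory at once; the connection string and table name are placeholders.

```python
# Sketch: dump a large SQL table to a single .csv for Power Query to consume.
# The connection string and table name are placeholders for illustration.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mssql+pyodbc://user:pass@dsn")   # placeholder connection

first_chunk = True
for chunk in pd.read_sql("SELECT * FROM big_table", engine, chunksize=100_000):
    chunk.to_csv(
        "export.csv",
        mode="w" if first_chunk else "a",   # overwrite once, then append
        header=first_chunk,                 # write the header only once
        index=False,
    )
    first_chunk = False
```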
Failing all that, if you decide to do it your way, I would suggest the following.
Read your table into a pandas DataFrame and iterate over each row of the DataFrame, writing to your xlsx [Sheet1] using openpyxl; then, once the row number reaches 1,048,576, simply move on to xlsx [Sheet2].
In short: openpyxl allows you to create workbooks, worksheets, and write to cells directly.
But depending on how many columns you have, it could take an incredibly long time.
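A rough sketch of that approach, using pandas to stream the table in chunks and openpyxl's write-only mode to roll over to a new sheet when the row limit is hit; the connection string and table name are placeholders.

```python
# Sketch: write a >1M-row SQL table to Sheet1, Sheet2, ... with openpyxl.
# Connection string and table name are placeholders for illustration.
import pandas as pd
from openpyxl import Workbook
from sqlalchemy import create_engine

EXCEL_MAX_ROWS = 1_048_576                                # per-sheet limit in Excel 2007+
engine = create_engine("mssql+pyodbc://user:pass@dsn")    # placeholder connection

wb = Workbook(write_only=True)                 # write-only mode keeps memory low
sheet_index = 1
ws = wb.create_sheet(f"Sheet{sheet_index}")
rows_in_sheet = 0
columns = None

# Stream the table in chunks so the whole 2M-row table never sits in memory.
for chunk in pd.read_sql("SELECT * FROM big_table", engine, chunksize=50_000):
    if columns is None:
        columns = list(chunk.columns)
        ws.append(columns)                     # header row on the first sheet
        rows_in_sheet = 1
    for row in chunk.itertuples(index=False):
        if rows_in_sheet >= EXCEL_MAX_ROWS:    # sheet full: roll over to the next
            sheet_index += 1
            ws = wb.create_sheet(f"Sheet{sheet_index}")
            ws.append(columns)
            rows_in_sheet = 1
        ws.append(list(row))
        rows_in_sheet += 1

wb.save("export.xlsx")
```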
Product limitations: Excel 2007+ is limited to 1,048,576 rows by 16,384 columns per worksheet.
A challenge with your suggestion of filling a worksheet with the max number of rows and then splitting is "How are they going to work with that data?" and "Did you split data that should have been together to make an informed choice?"
If Excel is the tool the users want to use and they must have access to all the data, then you're going to need to put the data into a Power Pivot data model (and yes, that's going to impact the availability of some data visualizations). A Power Pivot model is an in-memory tabular data set. What that means is that the data engine, xVelocity, is going to use a lot of memory but can get past the 1 million row limitation. Depending on how much memory is required, you might need to switch from the default 32-bit Office install to a 64-bit install (and I've seen clients have to max out the RAM on old, low-end desktops because they went cheap for business users).
Power Pivot will have a connection to your SQL Server (or other provider). When it refreshes data, it's going to fire off queries, determine the unique values in each column, and then create a dictionary of unique values. This allows it to compress low-cardinality data really well - sales dates are likely going to be repeated heavily within your set, so the compression is good. Assuming your customers are typically not repeat customers, a customer surrogate key would have high cardinality and thus not compress well, since there's little to no repetition. The refresh is going to depend on your use case and environment. Maybe the user has to kick it off manually; maybe you have SharePoint with Excel Services installed and can have it refresh the data at various intervals.
If they're good analysts, you might try turning them on to Power BI. Same-ish engine behind the scenes, but built from the ground up to be a responsive reporting tool. If they're just wading through tables of data, they're not ready for PBI. If they are making visuals out of the data, PBI is likely a better fit.
I have a series of combo boxes on a form - and I have connected them to lists of data in the past by using a pass-through query to a sql server, and while this performs well in most cases, many of my users have been accessing the database using a VPN, and the response to populate them seems slow.
I started programming the system to populate a local table from the pass-through query on form load, then used a local query to show the results in the proper order. Once they select a value in the first combo box, the second combo box filters from there, then the third, and the fourth (in some cases I have more than four, too).
All of this is intended so I don't have to connect to the server each time they change selections. However, I am thinking that a hard-coded value list might perform better for the lists that don't change, such as the highest level in the selection hierarchy (think of 3 or 4 combo boxes that affect each other and filter the next one's results).
Does anyone know if using a hardcoded value list in a combobox has a performance gain over a local query to a local table in MS Access? I am using MS Access 2016 if it helps.
OK, a few things: you do NOT want to use a pass-through query for a list box/combo box. Never!
Why? The Access client cannot filter a PT query. This may seem contradictory given the next point.
First, let's be clear: the fastest way to pull data is a PT query - hands down.
However, let's assume the combo box's row source is large - say 3,000 rows.
AND ALSO assume the combo box is bound (that means it saves a value to the form's bound recordset).
Well, remember the rule above: the Access client is never able to filter a PT query.
So, say you have a report that you run. If your PT query is not pre-filtered (the criteria exist in the PT query itself - or even in a stored procedure), then adding a WHERE clause to the form (or report), or in this case to a combo box, will not work - the full data set is returned and cannot be filtered.
So, if you have a PT query with 500,000 rows, and you do this:
DoCmd.OpenReport "rtpInvoice", acViewPreview, , "InvoiceNumber = 1234"
The above will pull 500,000 rows.
However, if you used a plain-Jane linked table, or better yet a link to a view, and it ALSO has 500,000 rows? Well, only one record comes down the network pipe!
So, say that combo box's row source has 5,000 rows and a simple navigation to the next record occurs. The combo box will of course hold a single value - from the underlying recordset of the bound form. But to display it, Access has to find that ONE value in the combo box's row source (most combo boxes are multi-column - 1st column = PK, 2nd column = some kind of description).
So, Access is actually quite smart, and it will attempt to pull ONLY that one value from the 5,000 rows driving the combo box. But the exception to the rule is, of course, a PT query!
Again: the Access client cannot filter a PT query.
So, if the combo box has a large data set AND is fed by a PT query, then things run slow as a dog. I had one system in which the PT query was returning about 100,000 rows. Just a simple navigation to the next record was painfully slow.
Replacing the PT query with a linked view made the form work near instantly. This is because, out of the 100,000 rows, Access is able to pull ONLY the one row needed to display the combo box and that 2nd column.
So, is a PT query the fastest way to pull data? Yes, 100% true, but you only want to use that speed advantage when you need ALL of the records of that PT query. If you are going to filter after or against the PT query (as is done for a bound combo box), then all of the records driving that combo box will come down the pipe.
The best approach? Dump the PT queries for combo boxes and replace them with linked views. You can even specify a sort in the view (yes, it does shove in a TOP 100 PERCENT). Then place ONLY the view name in the combo box's row source - don't fire up the query builder.
And you can even get away with a direct link to a table. Again, the Access client can filter that data set for the combo box.
So, there is no need to waste all that developer time building a PT query for a combo box, and there is no need (nor speed advantage) in using code to load up and fill a combo box. But if the PT query is going to return more than about 30 rows, then you want to dump the PT query and use a linked table, or better yet, a linked view.
I would not mess around with pulling a table from the server to a local table - you could do that, but it's just extra code and extra pain, and it gains you very little. Often it can make things worse. Why pull a whole table with lots of rows for a combo box when, on launching the form, Access is smart enough to pull only the one row required to display the combo box's current value? It will not pull the rest of the data until you click on the combo box to make a choice.
If your PT queries return, say, over 50 or 100 rows to fill the combo box, and that combo box is a bound combo box, then you should dump the PT query and use a linked table/view - you will get much better performance.
Rather than using tables linked from SQL server, consider using local tables for any values that rarely change, such as countries.
Have the master tables held within SQL server. When changes are needed, update the master table, and then import to your front end before deploying.
This way you will get better speed than from non-local tables, and are able to use these tables within queries and the like. And as the same data is held both within the Access front end and SQL Server back end, you can still create views within SQL server using the same data.
I am making a report that requires a complex query. The report has about 50 pages, and each page consists of a header and a table with around 30 rows.
The question is whether I should create a single dataset (SQL), even if the header information is repeated for each record, or whether it is better to create one dataset for the headers and another to list the records of each header.
Which of the two options performs better?
I have tested with the software "Neor Profile Sql" (free) and verified that when I run only one query, I need fewer resources than when I run a query for each subdataset.
Thanks.
I have two different data sets using two different SQL queries. Essentially, one data set is day/caller stats rolled up; the other set is call data. So the call data rolls up to produce the day/caller data.
I needed to separate these two queries for performance, because I needed one extract and one parameterized custom query for the call data. So essentially I will always bring in this month's and last month's data for the day/caller set.
What I need to do is create one dashboard that has the callers and all of their stats aggregated for the time period. Then I need to be able to click a row to bring up all the call data in a different sheet on the same dashboard.
I am in the home stretch and need a way to connect these two sheets and update the call data. Right now I only have a parameter for the callers' unique ID, not time, so I bring in all the same days of calls even though that is really not needed. In a perfect world, I would click the caller in the report and my second query would update to the appropriate day range and unique ID and produce only that caller's calls. My problem right now is that no matter what I do, I cannot get the one sheet to update the second call sheet. I have successfully created a manually functioning report, but I need the action to filter to a time period and the specific caller.
Let me know if you have any feedback. My two issues are the two separate queries: caller data (225k rows, held in an extract) and call data (7 million rows if unfiltered), which needs to be a live connection so that when the sheet is clicked the parameters update and those calls populate. Anything would help!
The solution I can think of is to use an action filter; there is an option below it to select the fields to map between the sheets. Choose selected fields instead of all fields and map the ID and time between the two data sources.
Apart from this, I don't really get what the issue is. If you need further clarification, please rephrase your question and provide examples and your data structure.
Currently we create Jasper PDF Reports from a single simple database table for our customers. This has been achieved programmatically. It's static. If the user wishes to change the query, he/she creates a change request, which we cannot deliver before the end of the next sprint (SCRUM).
The tool/library should be straightforward (e.g. convention over configuration), employable from within a Java EE container, and open source.
Is there a dynamic tool that allows our customers to create the simple queries/reports themselves without knowing SQL? Meaning, they should be able to see the table and then create a query from it, execute it, and print it (we could use Jasper Reports for the last part).
E.g. select only data from the year 2014, aggregate it by customer group, and select columns x, y, and z.
All of these criteria and the query structure may change, though, so it's not just a value like the year 2014.
Questions:
1) Is there a tool that presents the data in some kind of SAP-cube or something similar where the user could select the structure and attributes?
2) Can that tool save template queries (queries that the user has invoked before)?
Thanks.
With BIRT you could use parameters in the report - for example, have one report that shows the whole data set or data cube (or at least a bit of all of the fields). Then you could add JavaScript to the report (or do all of the presentation in JavaScript, for that matter) that shows the parameters a user can select from. These parameter values can then either be sent to a new report or can update the existing report. Parameters can be put into database queries too.
If that were exposed in JavaScript on a web page, you could save the parameter values to an array and store them in the browser or on the server.