I am building a report that requires a complex query. The report has about 50 pages, and each page consists of a header and a table with around 30 rows.
The question is whether I should create a single dataset (SQL), even if that means repeating the header information for each record, or whether it is better to create one dataset for the headers and a separate subdataset for the records of each header.
Which of the two options is more efficient?
I have tested with the software Neor Profile SQL (free) and confirmed that running a single query uses fewer resources than running one query per subdataset.
Thanks.
I have a SQL table with close to 2 million rows and I am trying to export this data into an Excel file so the stakeholders can manipulate the data, see charts, and so on.
The issue is, when I hit refresh, it fails after retrieving all the data, saying the number of rows exceeds the maximum row limit in Excel. This table is going to keep growing every day.
What I am looking for is a way to refresh the data and add rows to Sheet 1 until the row limit is reached; once it is maxed out, the rows should start getting inserted into Sheet 2, then a third sheet, and so on, all from the single SQL table and a single refresh.
This does not have to happen in Excel (Data -> Refresh option), I can have this as a part of the SSIS package that I am already using to populate rows in the SQL table.
I am also open to any alternate ways to export SQL table into a different format that can be used by said stakeholders to create charts, analyze data, and whatever else pleases them.
Without sounding too facetious, you are suggesting a very inefficient method.
The best approach here is not to use .xlsx files for the data storage at all.
Assuming your stakeholders don't have read access to the SQL Server, export the data to .csv and then use Power Query in some sort of 'Dashboard.xlsx' file to load the .csv into the data model, which can handle hundreds of millions of rows instead of just ~1.05 million.
This allows the use of Power Pivot and DAX for analysis, and the data will also be visible in the data model table view if users do want raw rows (or they can refer to the .csv file).
If they do have SQL read access, you can query the server directly, so you don't need to store any rows at all; Power Query will read them straight from the server.
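If it helps, the SQL-to-.csv export step could be as simple as something like this (a sketch only; pyodbc is assumed, and the server, database, and table names are placeholders for your environment):

```python
# Sketch only: dump a SQL Server table to .csv in batches to keep memory bounded.
# "YourServer", "YourDb" and "dbo.BigTable" are placeholders.
import csv
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=YourServer;DATABASE=YourDb;Trusted_Connection=yes;"
)
cur = conn.cursor()
cur.execute("SELECT * FROM dbo.BigTable")

with open("big_table.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow([col[0] for col in cur.description])  # header row
    while True:
        rows = cur.fetchmany(50_000)   # stream in batches instead of all at once
        if not rows:
            break
        writer.writerows(rows)

conn.close()
```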
Failing all that, if you decide to do it your way, I would suggest the following.
Read your table into a pandas DataFrame and iterate over each row and cell, writing to your xlsx[sheet1] using openpyxl; once the row number reaches 1,048,560, simply move on to xlsx[sheet2].
In short: openpyxl allows you to create workbooks, worksheets, and write to cells directly.
But depending on how many columns you have, it could take an incredibly long time.
Product limitation: Excel 2007+ supports 1,048,576 rows by 16,384 columns per worksheet.
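Roughly, the pandas/openpyxl split described above could look like this (a sketch only; the connection string and dbo.BigTable are placeholders, and the exact row cutoff is up to you):

```python
# Sketch: read the table into a DataFrame, then write it out with openpyxl,
# rolling over to a new sheet before the per-sheet row limit is hit.
import pandas as pd
import pyodbc
from openpyxl import Workbook

MAX_ROWS_PER_SHEET = 1_048_560   # a little headroom under Excel's 1,048,576 limit

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=YourServer;DATABASE=YourDb;Trusted_Connection=yes;"
)
df = pd.read_sql("SELECT * FROM dbo.BigTable", conn)   # ~2M rows into memory
conn.close()

header = list(df.columns)
wb = Workbook(write_only=True)   # write-only mode keeps openpyxl's memory use low
ws = wb.create_sheet("Sheet1")
ws.append(header)
rows_in_sheet = 1
sheet_no = 1

for row in df.itertuples(index=False, name=None):
    if rows_in_sheet >= MAX_ROWS_PER_SHEET:
        sheet_no += 1
        ws = wb.create_sheet(f"Sheet{sheet_no}")
        ws.append(header)       # repeat the header on each new sheet
        rows_in_sheet = 1
    ws.append(list(row))
    rows_in_sheet += 1

wb.save("big_table.xlsx")
```

Note that a write-only workbook cannot go back and edit earlier rows, which is fine for a straight dump like this.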
A challenge with your suggestion of filling a worksheet with the max number of rows and then splitting is "How are they going to work with that data?" and "Did you split data that should have been together to make an informed choice?"
If Excel is the tool the users want to use and they must have access to all the data, then you're going to need to put the data into a Power Pivot data model (and yes, that's going to impact the availability of some data visualizations). A Power Pivot model is an in-memory tabular data set. What that means is that the data engine, xVelocity, is going to use a lot of memory but can get past the 1 million row limitation. Depending on how much memory is required, you might need to switch from the default 32-bit Office install to a 64-bit install (and I've seen clients have to max out the RAM on old, low-end desktops because they went cheap for business users).
Power Pivot will have a connection to your SQL Server (or other provider). When it refreshes data, it fires off queries, determines the unique values in each column, and then builds a dictionary of those unique values. This allows it to compress low-cardinality data really well: sales dates are likely to be repeated heavily within your set, so the compression is good. Assuming your customers are typically not repeat customers, a customer surrogate key would have high cardinality and thus not compress well, since there is little to no repetition. How the refresh happens depends on your use case and environment: maybe the user has to kick it off manually, or maybe you have SharePoint with Excel Services installed and can have it refresh the data on a schedule.
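Purely to illustrate why cardinality matters (this is a toy dictionary encoding, not the actual xVelocity implementation):

```python
# Toy dictionary encoding: store each distinct value once, plus small integer ids.
def dictionary_encode(values):
    dictionary = {}          # distinct value -> id
    ids = []
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)
        ids.append(dictionary[v])
    return dictionary, ids

# Low cardinality (a date column): huge repetition, tiny dictionary -> compresses well.
dates = ["2023-01-01", "2023-01-02"] * 500_000
date_dict, _ = dictionary_encode(dates)
print(len(date_dict))        # 2 distinct values for 1,000,000 rows

# High cardinality (a customer surrogate key): almost no repetition -> poor compression.
keys = range(1_000_000)
key_dict, _ = dictionary_encode(keys)
print(len(key_dict))         # 1,000,000 distinct values for 1,000,000 rows
```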
If they're good analysts, you might try turning them on to Power BI. It is the same-ish engine behind the scenes, but built from the ground up to be a responsive reporting tool. If they're just wading through tables of data, they're not ready for PBI. If they are making visuals out of the data, PBI is likely a better fit.
I was given a task to develop a report that would present the following details (as separate columns in ALV):
1) Name of the DB table (like MSEG, EKPO etc.)
2) Size of the DB table in megabytes
3) Number of records
4) Number of read requests performed on the table
5) Number of write requests performed on the table
There are DB* tables that contain this kind of information; specifically, I am referring to DB6PMHST and DB6HISTBS. When I try to view them via SE11 or SE16, the system reports that these tables do not hold any records. I tried all three landscapes: development, testing and production.
Could you please provide guidance on what I am doing wrong? Maybe there are other system tables that contain the necessary information?
P.S. I tried to debug transaction ST04 in order to see which tables the report selects its data from, but I wasn't able to find those lines in the source code.
I would deeply appreciate your kind assistance.
P.P.S. I also checked the table MSSDBSTATT - it is empty as well (our enterprise uses a MS SQL database).
Go to SE38 and run the report RSTABLESIZE, enter the table ID and see the magic.
The number of reads and writes on a table is specific to the type of database (MSSQL), so please tag your question accordingly.
If you get an answer from an MSSQL expert saying that the data is stored in some MSSQL tables, then you cannot query those tables using Open SQL, but you can query them using native SQL (e.g. via EXEC SQL or ADBC).
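Just to illustrate what the MSSQL side might look like (this is outside the ABAP stack and purely a sketch; in ABAP you would send the same statement via ADBC or EXEC SQL, and the connection details below are placeholders): SQL Server keeps cumulative per-index read/write counters in sys.dm_db_index_usage_stats, which reset when the instance restarts.

```python
# Sketch only: per-table read/write counts from SQL Server's DMV, shown here
# via pyodbc just to demonstrate the SQL; server/database names are placeholders.
import pyodbc

SQL = """
SELECT  OBJECT_NAME(s.object_id)                          AS table_name,
        SUM(s.user_seeks + s.user_scans + s.user_lookups) AS reads,
        SUM(s.user_updates)                               AS writes
FROM    sys.dm_db_index_usage_stats AS s
WHERE   s.database_id = DB_ID()
GROUP BY s.object_id
ORDER BY reads DESC;
"""

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=YourServer;DATABASE=YourSapDb;Trusted_Connection=yes;"
)
for table_name, reads, writes in conn.cursor().execute(SQL):
    print(table_name, reads, writes)
conn.close()
```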
We are trying to create a report to link the Tag Items to their TestCaseIds.
However, the tables in the TFSWarehouse are empty. How would we link these sets of data?
It was very interesting - I did a search on every field of every table in the TFS database and found that the tags are actually embedded as XML inside the rows. This is an unusual way of storing data: the XML can be readily 'read' by a browser, but it is very difficult to SELECT from and perform JOINs on.
What I am currently looking to do is use our existing UI to select a number of columns from various tables (yes, multiple tables) and pass them into a BIRT report as parameters. From there, I plan to build a query that dynamically substitutes those columns into the query and pulls the results automatically. I'll have to hide columns with no value passed to them as well. I also expect I'll have to set up the query to be a little heavy-handed and already know all of the logical connections in the database (e.g. how to join the proper tables).
My question is: is this the best way to manage dynamic columns/tables in a dataset, or is there a better way? I've seen some information online about the "ad-hoc" BIRT report designer that lets non-programmers create reports, but I am not looking for other people to actually build the report, just to generate one from an existing template with interchangeable columns.
I think the easiest way is to first build a report with all the columns you need.
Then apply some logic to the visibility of each column; you can use the report parameters there as well.
Select your table, select a column, open the properties window and take a look at the visibility tab. Just add some logic that evaluates to true or false.
If you are using a crosstab to display the information, you could use the filter options to include or exclude columns.
Yes, this will load data that is not used, but you would need to build really big reports before performance becomes a real issue.
If you try to add this logic in the actual dataset, you have to make the query and the fetch script dynamic, and then you still have the problem of visualising the columns. I think you'll end up using the visibility script anyway (to show/hide the columns on the report), so you might as well start from there and have a working report fast.
I currently have a client-server application. The client needs to be able to generate reports, and these reports can have millions of records. What is the best way to get the data to the client? Sending all million records at once is not very efficient. Let's say I just want to display 20 at a time and page through all the results. Should I have the server run the query each time a user clicks "next page", or should I have the server get the entire result set and then send parts of it to the client?
Use a cursor - it is perfect for these purposes.
Note: a report with a million rows is not a well-designed report - nobody can process that much. But that is a question for the analysts.
http://www.commandprompt.com/ppbook/x15040
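A minimal sketch of the cursor approach, assuming a PostgreSQL backend (as the link above suggests) and psycopg2; the DSN, table name, and query are placeholders:

```python
# Sketch only: a server-side ("named") cursor keeps the result set on the server
# and streams one page of rows at a time to the client.
import psycopg2

PAGE_SIZE = 20

conn = psycopg2.connect("dbname=reports user=report_user")  # placeholder DSN
cur = conn.cursor(name="report_cursor")   # naming the cursor makes it server-side
cur.itersize = PAGE_SIZE
cur.execute("SELECT * FROM big_report_table ORDER BY id")

page = cur.fetchmany(PAGE_SIZE)           # first page of 20 rows
while page:
    for row in page:
        print(row)                        # hand the page to the UI here
    # ... wait for the user to click "next page" ...
    page = cur.fetchmany(PAGE_SIZE)       # next page of 20 rows

cur.close()
conn.close()
```

The named cursor keeps its position between fetchmany calls, so each "next page" click only pulls the next 20 rows over the wire.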
I used LIMIT in the SQL query to set the range of records fetched, but the query has to originate from the client and must pass the range.
It depends on your scripting; if you use .NET with the Npgsql connector:
I would generate a page number in a combo box on the form,
then compute the starting row as rows * (pagenum - 1) (20 * 1 for page 2),
pass it to the database as part of the SQL,
and retrieve the rows and display them in a DataGridView (a rough sketch of the idea follows below).
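Something along these lines, shown here in Python/psycopg2 rather than the .NET/Npgsql stack mentioned above (table and column names are placeholders):

```python
# Sketch only: classic LIMIT/OFFSET paging. page_num is 1-based, so page 2
# starts at offset rows_per_page * (page_num - 1) = 20.
import psycopg2

ROWS_PER_PAGE = 20

def fetch_page(conn, page_num):
    offset = ROWS_PER_PAGE * (page_num - 1)
    with conn.cursor() as cur:
        cur.execute(
            "SELECT * FROM big_report_table ORDER BY id LIMIT %s OFFSET %s",
            (ROWS_PER_PAGE, offset),
        )
        return cur.fetchall()

conn = psycopg2.connect("dbname=reports user=report_user")  # placeholder DSN
rows = fetch_page(conn, page_num=2)   # rows 21-40
```

Keep in mind that OFFSET still makes the server walk past all the skipped rows, so deep pages get slower; keyset paging (WHERE id > last_seen_id) avoids that if it becomes a problem.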
Although, I might just add, searching through a million records page by page is not wise. I hope there are some filters to narrow the search down to a few pages, because research shows that views beyond page 10 drop off exponentially.