I use numerous SQL data connections to import data into Excel for use in pivot tables / slicers. Some of these take a while to update and display. Is there any advantage in swapping some of these larger queries to Power Pivot imports? Is Power Pivot more efficient, or is it essentially doing the same job as a SQL data connection?
Really, it depends on the setup.
If the pivot table is using the SQL database directly, e.g. a change in the slicers results in a SQL statement being issued to the database server, then yes, Power Pivot would be more efficient. This is because the pivot table would then query the Power Pivot data model, which is a static snapshot of the data. Only when the Power Pivot data model is refreshed would it query the SQL back-end.
The main advantages of Power Pivot would be the following:
Anything involving the pivot table hits the Power Pivot data model, which is processed locally on the computer running Excel.
Loading data directly into the Power Pivot data model lets you bypass the maximum number of rows in an Excel sheet (1,048,576).
In addition, the data within the data model is typically compressed by a factor of about 10x, with data sets whose values repeat frequently compressing better; row IDs, being unique, compress poorly.
As a real-life example, I have managed to load 4.8 GB of CSV files of a retailer's by-store, by-item, by-week POS data (34M rows) using Excel 2016 on my low-powered work laptop. Since the data was fairly repetitive, it ended up creating a 280 MB Excel file.
The Excel version, Power BI Desktop, the Power BI web service and SSAS Tabular models all use the same calculation language and design. In fact, an Excel Power Pivot model can be loaded directly into Power BI Desktop and then used to build dashboards.
It allows complex math to be performed within the pivot table.
Downsides
The compression of the data model means that someone could walk away with a lot of data in a small file.
It may not be as useful if people are looking for real-time numbers pulled directly from the source system.
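Going back to the direct-connection scenario above: the kind of query sketched below (table and column names are hypothetical) can be re-issued to the server as slicers change when the pivot is wired straight to SQL, whereas with Power Pivot it runs only when the data model is refreshed.

    -- Hypothetical source query for the pivot table / data model
    SELECT d.CalendarMonth,
           s.Region,
           SUM(f.SalesAmount) AS TotalSales
    FROM dbo.FactSales AS f
    JOIN dbo.DimDate  AS d ON d.DateKey  = f.DateKey
    JOIN dbo.DimStore AS s ON s.StoreKey = f.StoreKey
    GROUP BY d.CalendarMonth, s.Region;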
I have an SSAS multidimensional cube (SQL Server 2019) of 350 GB with a retention of 10 years of data.
I noticed that users often use the cube to extract data at the leaf level (Excel tables with multiple columns).
I think that SSAS is not suited for producing this type of report.
What is the best tool / solution to let users generate flat reports? I know that SQL is good for that, but the users aren't SQL developers.
Could a Power BI model with DirectQuery be more efficient than the actual SSAS cube?
SSAS Multidimensional is exceptionally bad at generating large flattened results. Almost anything will be better. A Power BI or SSAS Tabular DirectQuery model is much better, but not ideal for very large extracts. But be sure to extract through DAX, not MDX. A Paginated Report exported to CSV or Excel is a good choice too.
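If you go the Paginated Report route, the dataset behind the report is just a plain query against the warehouse, shaped exactly like the flat extract the users want. A minimal sketch, assuming hypothetical FactSales / DimStore / DimItem / DimDate tables:

    -- Leaf-level flattened extract, one row per store / item / week
    SELECT s.StoreName,
           i.ItemNumber,
           d.WeekEndingDate,
           f.UnitsSold,
           f.SalesAmount
    FROM dbo.FactSales AS f
    JOIN dbo.DimStore AS s ON s.StoreKey = f.StoreKey
    JOIN dbo.DimItem  AS i ON i.ItemKey  = f.ItemKey
    JOIN dbo.DimDate  AS d ON d.DateKey  = f.DateKey
    WHERE d.WeekEndingDate >= DATEADD(YEAR, -1, CAST(GETDATE() AS date));  -- bound the extract window

Exported to CSV, this sidesteps the cube entirely for these extracts.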
I have a SQL Server database where we have created some views based on dim and fact tables. I need to build an SSAS tabular model based on my tables and views. But one of the views takes 1.5 hours to run as a SQL query (in SSMS). Now I need to use this same view to build my SSAS tabular model, but 1.5 hours is not acceptable. This view is made up of more than 10 table joins and a lot of WHERE conditions.
1) Can I bring all the tables used in this view into my SSAS tabular model? But then I am not sure how to join them all and apply the WHERE clauses inside SSAS to build something similar to my view. Is that possible? If yes, how?
or
2) If I build the SSAS model from that view once and then want to incrementally load the data daily, what is the best way to do that?
The best option is to set up a proper ETL process. That is:
Extract the tables from your source SQL database into a new SQL database that you control.
Transform the data into a star schema (see the sketch below).
Load the data from the star schema into SSAS.
On SQL Server, the most common approach is to use SSIS packages for data extraction, movement, and orchestration, and SQL Server Agent jobs for scheduling.
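As a rough illustration of the transform step, here is a minimal T-SQL sketch, assuming hypothetical staging tables (stg.Customer, stg.Sales) and a dw schema holding the star; your real tables, keys, and the logic currently buried in the slow view would go here instead:

    -- Add any customers that are not yet in the dimension
    INSERT INTO dw.DimCustomer (CustomerBK, CustomerName, Country)
    SELECT DISTINCT c.CustomerID, c.CustomerName, c.Country
    FROM stg.Customer AS c
    WHERE NOT EXISTS (SELECT 1 FROM dw.DimCustomer AS d
                      WHERE d.CustomerBK = c.CustomerID);

    -- Load the fact table, resolving surrogate keys through the dimension
    INSERT INTO dw.FactSales (CustomerKey, DateKey, SalesAmount, Quantity)
    SELECT d.CustomerKey,
           CONVERT(int, CONVERT(char(8), s.OrderDate, 112)) AS DateKey,  -- yyyymmdd
           s.SalesAmount,
           s.Quantity
    FROM stg.Sales AS s
    JOIN dw.DimCustomer AS d ON d.CustomerBK = s.CustomerID;

The expensive multi-table join then happens once, in the ETL, rather than every time the model is processed.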
To answer your questions:
Yes, it is certainly possible to bring all of the tables directly from your source system into your tabular model, but please don't do this! You will only create problems for yourself later on when writing DAX calculations.
Incrementally loading data is something you decide for each table that is imported into your tabular model. Again, this is much easier if you have a proper star schema, as you would typically run full processing on all your dimension tables, and then do incremental processing only on the largest fact tables.
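One common pattern for that incremental processing is to partition the large fact tables in the tabular model and point the most recent partition at a date-bounded source query, so only that partition is processed daily. A sketch of such a partition query, assuming a hypothetical dw.FactSales table with an integer DateKey:

    -- Source query for the "current month" partition; older partitions keep
    -- their previously processed data and are not touched by the daily job.
    SELECT CustomerKey, DateKey, SalesAmount, Quantity
    FROM dw.FactSales
    WHERE DateKey >= CONVERT(int, CONVERT(char(6), GETDATE(), 112) + '01');  -- first day of the current month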
I am using SSIS packages to extract data from SAP database tables into SQL Server tables. I am using OLEDB source/destination connections to achieve this.
The problem now is that a table in SAP has 5 million records and it's taking around 2 hours to extract this data into my SQL Server table. I have used the trunc-and-dump method (truncating the table in SQL Server and dumping data into it from the SAP table) and have also tried using the Multiple Hash component to bring in only the updated/new records.
The problem with the hash-key approach is that it still has to scan the entire table to look for changed/new records, and hence takes almost the same time as the trunc-and-dump method.
I am looking for a new way or changing the existing way to reduce the time taken to complete this extraction.
You mentioned you are using an OLEDB source connection to access SAP. If that means you are accessing SAP's underlying database directly, you should pause doing that until there is explicit IT approval, for three reasons:
You are skipping SAP's application-layer security. There can be an enterprise security compliance issue;
Your company's SAP license may not allow you to do that. If your company only has an SAP indirect-access license, then you may have to stay at the application layer;
You will not get SAP's official support by accessing the underlying database directly.
You have multiple options to fetch data using SSIS through SAP application layer:
Use commercial SSIS custom components for this job (disclaimer: AecorSoft is one of the leading vendors offering such connectivity components);
Look into SAP's own OData Gateway interface to consume data.
Request your SAP ABAP team to write custom ABAP programs to dump SAP data into CSV files, and then use SSIS to fetch them.
Let's now look at the performance side:
SAP ETL performance depends on many factors, but in general, even for SAP transactional tables with 100+ columns, taking a couple of hours to extract 5 million rows is considered very slow. For example, we've seen cases of extracting the standard SAP General Ledger header table BKPF (almost 100 columns) at a consistent rate of 1M rows every 1-2 minutes. Of course such performance is achieved through a commercial component and SSIS, but you should expect at least 1M rows per 10 minutes even for the third option above, going through an intermediate CSV file. Under the hood, through the SAP application layer, all three options would leverage SAP Open SQL (in contrast to the "Native SQL" which the underlying database offers) to access SAP tables; therefore, if you experience application-layer performance issues, you can analyze the Open SQL side.
As you also mentioned the updated/new records scenario, this is a typical delta extraction problem. Normally, SAP transactional tables have Created Date and Changed Date fields which can help you capture the delta. In this case, in order to avoid a full table scan, apply indexes through the SAP application layer on those "delta fields". For example, if you need to extract the Sales Document Header table VBAK, you can filter by ERDAT (Created on) and AEDAT (Changed on).
Delta is a complex subject in SAP. There is no simple statement that describes the delta solution, as SAP data models are complex and very different across functional modules; the delta analysis is always a case-by-case effort. Some people may also simply recommend using "delta extractors", but don't treat those as a silver bullet, because extractors have their own problems. In short, if you are looking into table-based extraction, focus on that, and work with your SAP functional team to determine the suitable delta fields. Avoid full table scans and hashing. Do an incremental load with some optional overlap with the previous extract (e.g. loading today's and yesterday's records), and use MERGE to absorb the changes.
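A minimal sketch of that MERGE step, assuming the daily delta has been landed in a staging table stg.VBAK_Delta and the target is dbo.VBAK (the column list is trimmed down for illustration):

    -- Absorb new and changed sales document headers into the target table.
    -- Because the extract overlaps the previous run, re-loaded rows are simply
    -- updated to the same values, so the load stays idempotent.
    MERGE dbo.VBAK AS tgt
    USING stg.VBAK_Delta AS src
        ON tgt.VBELN = src.VBELN                 -- sales document number (business key)
    WHEN MATCHED THEN
        UPDATE SET tgt.ERDAT = src.ERDAT,
                   tgt.AEDAT = src.AEDAT,
                   tgt.NETWR = src.NETWR
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (VBELN, ERDAT, AEDAT, NETWR)
        VALUES (src.VBELN, src.ERDAT, src.AEDAT, src.NETWR);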
There are a few cases where you may not be able to find any delta field, and it is not practical to do a full load all the time. One great example is the address master data table ADRC. In this case, if you are required to do a delta load on such a table, you either have to ask your SAP functional team to figure out the delta for you (meaning they inject custom logic into every place where address master data can be created, updated, or deleted), or you have to ask your SAP Basis team to create a DB trigger on the underlying database table and expose the trigger table at the application layer. This way, you can create an application-layer view over the main table and the trigger table to do the delta. Still, there is no direct database access in your solution: the DB-layer trigger is fully managed and controlled by your SAP Basis team, who also support the database.
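Purely to illustrate the shape of that trigger approach, here is a sketch in T-SQL; in reality the object would be created and owned by the Basis team on whatever database platform sits under your SAP system, and the column list is reduced to the ADRC business key:

    -- Change-log table: one row per insert/update/delete on the address master
    CREATE TABLE dbo.ADRC_ChangeLog (
        ADDRNUMBER nvarchar(10) NOT NULL,                  -- address number (business key)
        ChangeType char(1)      NOT NULL,                  -- I / U / D
        ChangedAt  datetime2    NOT NULL DEFAULT SYSUTCDATETIME()
    );
    GO

    CREATE TRIGGER dbo.trg_ADRC_Delta
    ON dbo.ADRC
    AFTER INSERT, UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Inserts and updates
        INSERT INTO dbo.ADRC_ChangeLog (ADDRNUMBER, ChangeType)
        SELECT i.ADDRNUMBER,
               CASE WHEN EXISTS (SELECT 1 FROM deleted AS d
                                 WHERE d.ADDRNUMBER = i.ADDRNUMBER)
                    THEN 'U' ELSE 'I' END
        FROM inserted AS i;

        -- Deletes
        INSERT INTO dbo.ADRC_ChangeLog (ADDRNUMBER, ChangeType)
        SELECT d.ADDRNUMBER, 'D'
        FROM deleted AS d
        WHERE NOT EXISTS (SELECT 1 FROM inserted AS i
                          WHERE i.ADDRNUMBER = d.ADDRNUMBER);
    END;

The extraction job then only needs to read the change-log (or the application-layer view built on top of it) instead of scanning the full table.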
Hope this helps!
Hi, I have several cube tables in an Oracle 12c database. How do I represent them with MicroStrategy? The MicroStrategy Intelligent Cube object doesn't represent these cubes correctly, and it saves the SQL results in memory. I need to execute SQL against the cube tables in real time.
A MicroStrategy cube is an in-memory copy of the results of an SQL query executed against your data warehouse. It's not intended to be a representation of the Oracle cubes.
I assume both these "cubes" organize data in a way that is easy and fast to use for dimensional queries, but I don't think you can directly import an Oracle cube into MicroStrategy IServer memory.
I'm not an expert on Oracle cubes, but I think you need to map dimensions and facts as you would with any other Oracle table. In the end, an Oracle cube is a tool that Oracle provides to organize your data (once dimensions and metrics are defined) and speed up your queries, but you still need to query it: MicroStrategy will write your queries, but MicroStrategy also needs to be aware of your dimensions and metrics (MicroStrategy facts).
In the end, a cube speeds up your queries by organizing and aggregating your data, and it seems to me that you have achieved this already with your Oracle cube. A MicroStrategy cube is an in-memory structure that also saves the time required by a query against the database.
If your requirement is that SQL is executed against your database at all times, then you need to disable caching on the MicroStrategy side (this can be done on a report-by-report basis, or at the project level).
MicroStrategy Intelligent Cubes aren't going to be a good fit for you here, because they explicitly cache data, in order to decrease response time, and reduce load on your source database.
I'm going to use a single table to aggregate historical data about our (very big) virtual infrastructure. The table will be composed of 15 to 30 fields, and I estimate from 500 to 1000 records a day.
Why a single table? A couple of reasons:
Data is extracted to CSV using PowerShell scripts, so a bulk load into a single table is very easy and fast.
I will use the table to connect Excel and report through pivot tables, so a single table is perfect (otherwise I would have to create views).
Now my question:
If I'm planning to build cubes on top of this table in the future, is the "single-table" choice a bad solution?
Do cubes rely on relational databases, or can they easily be built upon a single-table database?
Thanks for any suggestion
I can't tell you specifically about SQL Server Analysis Services, but for OLAP you typically use denormalized and aggregated data. That means fewer tables than in a normal relational scenario. And as your data volume is not really big (365k rows/year is small even for OLAP), I don't see any problem with using a single table for your data.
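As a rough sketch of what that single table and its load could look like (all names here are made up for illustration): one wide, denormalized row per VM per day, bulk-loaded from the PowerShell-generated CSV, and readable directly by Excel pivot tables or a later cube.

    -- One denormalized row per VM per day; dimension attributes and measures side by side
    CREATE TABLE dbo.VmDailyStats (
        CaptureDate   date          NOT NULL,
        Datacenter    nvarchar(50)  NOT NULL,
        ClusterName   nvarchar(50)  NOT NULL,
        VmName        nvarchar(100) NOT NULL,
        CpuCount      int           NOT NULL,
        MemoryGB      decimal(9,2)  NOT NULL,
        StorageUsedGB decimal(12,2) NOT NULL
    );

    -- Daily load of the CSV produced by the PowerShell extract
    BULK INSERT dbo.VmDailyStats
    FROM 'C:\exports\vm_daily_stats.csv'
    WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

If you later build a cube on top of it, this table is effectively already a one-table star schema (attributes plus measures), so the dimensions and facts can be mapped straight from it.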