I am accessing OLAP SSAS cubes on a SQL Server 2005 instance using Excel 2007 pivot tables and finding that refreshing some of the tables takes more than 10 minutes. My coworkers seem to think this is a sad reality, but I am wondering if there are alternatives I should be looking into.
Some thoughts I have had:
Obviously if I could upgrade the server hardware I would, but I am merely an analyst with no such powers, so I don't think hardware improvements are a great option. The same is true of moving to a newer SQL server, which I imagine would also speed up the process.
Would updating to a newer version of Excel speed up the process?
I came across this add-in: http://olappivottableextend.codeplex.com/, which gives me access to the MDX behind the pivot tables, and that MDX is apparently comically inefficient (it sounds like the VBA macro recorder to me). Would rewriting the MDX be an option? I know a bit of it, and the queries it generates for the pivot tables don't seem that complicated.
Would running the MDX outside of Excel be an option? I can write the queries, but I imagine it would not be as simple to use as the pivot table.
It just seems like OLAP cubes are a great solution in a lot of ways, and these are massive pivot tables processing quite a bit of information, but if there is a reasonable way to speed up the whole process I would love to know more about it.
Thanks for your thoughts, SO.
There are many ways to access SSAS cubes, but it depends on what you are trying to achieve.
Excel tends to be used by business because:
It's already installed
It is a familiar business tool
It is easy to use
It requires no developer intervention
Other alternatives to Excel for accessing the cube include:
SQL Server Analysis Services (via Management Studio), using the cube browser or MDX directly
SQL Server Reporting Services
Bespoke development (such as C#) utilising AdomdConnection
SQL Server (via Management Studio) using OPENQUERY over a linked server (see the sketch below)
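If you want to experiment with MDX outside of Excel, a minimal sketch of the OPENQUERY route (run from Management Studio) might look like this. The linked server name, SSAS instance, cube, measure, and hierarchy below are placeholders, not anything from your environment:

    -- One-time setup: a linked server pointing at the SSAS instance
    -- ('SSAS_LINK', 'MyOlapServer', and 'MySalesCube' are hypothetical names).
    EXEC sp_addlinkedserver
        @server = 'SSAS_LINK',
        @srvproduct = '',
        @provider = 'MSOLAP',
        @datasrc = 'MyOlapServer',
        @catalog = 'MySalesCube';

    -- Pass an MDX query through to the cube and get a relational result back.
    SELECT *
    FROM OPENQUERY(SSAS_LINK,
        'SELECT { [Measures].[Sales Amount] } ON COLUMNS,
                NON EMPTY [Date].[Calendar Year].MEMBERS ON ROWS
         FROM [Sales]');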
If you have been using Excel to access the cube so far, you will probably decide that none of the other tools quite cover your needs and you will end up sticking with it.
Assuming that Excel is the right tool for you, you should then move on to why it is slow. The list of possibilities (not including hardware / software) is long, but here are some:
It could be external contention (outside your project) for network / database / disk resources. The volume of data may be accumulating over time.
The cube may not be partitioned.
The questions you ask of it may be getting more complex.
The cube aggregations may not be utilised by the queries you run.
The cube structure may be inefficient because it supports many-to-many relationships.
User / query volume may have increased
To try to address the problem I would:
Assess the data that you require within the cube, and maybe limit the cube to a rolling x-month window (see the sketch after this list)
Log your queries and apply Usage Based Optimisation
Monitor cube usage via SQL Server Profiler
Review the structure of your cube design
Attempt similar queries with other tools (both across the network and local to the cube) to establish where the issue lies
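As an illustration of the rolling window idea, the source query behind a measure group partition can be limited to recent data only. This is just a sketch with made-up table and column names:

    -- Hypothetical partition source query: only the last 3 months of fact rows.
    SELECT f.*
    FROM dbo.FactSales AS f
    WHERE f.OrderDate >= DATEADD(MONTH, -3, CAST(GETDATE() AS date));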
This resource may help if you establish that Excel is the weak point: Excel, Cube Formulas, Analysis Services, Performance, Network Latency, and Connection Strings (also reproduced on page 57 of SQLCAT's Guide to BI and Analytics).
Related
I want to load my data into QlikView, but I am unsure whether to process the data through a cube or pull it directly from SQL.
Can anyone tell me which gives better performance: a cube or SQL?
Note: I have millions of rows in the database.
Generally, as the volume of data grows, the advantages of SSAS tend to become more apparent than those of using SQL Server directly as the source. How will the data be used? When it comes to large-scale aggregations, SSAS becomes very beneficial. SSAS will also enforce a structured layout, as the relationships are predefined in the cube as opposed to joins. Some additional features that SSAS brings are hierarchical analysis (hierarchies) as well as ease of use with tools such as Excel and SSRS, although it sounds like you're only looking to use QlikView for this. However, your best option would be to baseline both SSAS and SQL Server in your environment with queries that best represent what would be run once this is implemented, and assess the results from there.
From the BI tool's perspective it doesn't matter, as you can connect to either source (SQL is more common, but it depends on your expertise). Regarding performance, the best strategy is to have a separate extract layer and store the data incrementally as QVD files (for example, loading the previous day every night). With incremental reloads, source performance matters much less, since even large data sets should load quickly.
If your original source of data is SQL, in my opinion it doesn't make sense to replicate the data in three places (SQL, the cube, and QlikView). It is better to connect directly to the source, save the raw data incrementally as QVD files, and then have a transform layer that models the data.
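For what it's worth, the nightly incremental pull from the SQL source might look something like the sketch below; the table and column names are made up, and the extract layer would append the result to its stored extract:

    -- Hypothetical incremental extract: only the previous day's changed rows.
    SELECT t.*
    FROM dbo.Transactions AS t
    WHERE t.ModifiedDate >= CAST(DATEADD(DAY, -1, GETDATE()) AS date)
      AND t.ModifiedDate <  CAST(GETDATE() AS date);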
I have a 16GB CSV that I have imported into Power BI desktop. The workstation I am using is an Azure VM running Windows Server 2016 (64GB Memory). The import of the file takes a few seconds, however, when I try to filter the data set in query editor to a specific date range, it takes a fairly long time (it is still running and has been around 30 minutes so far). The source file (16GB CSV) is being read from a RAM disk that has been created on the VM.
What is the best approach/practice when working with data sets of this size? Would I get better performance by importing the CSV into SQL Server and then using DirectQuery when filtering the data set to a date range? I would have thought it would run fairly quickly with my current setup, as I have 64GB of memory available on that VM.
When the data size is significant, you also need appropriate computing power to process it. When you import these rows into Power BI, Power BI itself needs this computing power. If you import the data into SQL Server (or into Analysis Services, or elsewhere) and you use DirectQuery or a Live Connection, you can delegate computations to the database engine. With a Live Connection all your modeling is done on the database engine, while with DirectQuery modeling is also done in Power BI and you can add computed columns and measures. So if you use DirectQuery, you still must be careful about what is computed where.
You ask for "the best", which is always a bit vague. You must decide for yourself depending on many other factors. Power BI is Analysis Services by itself (when you run Power BI Desktop you can see the Microsoft SQL Server Analysis Services child process running), so importing the data in Power BI should give you similar performance as if it was imported in SSAS. To improve the performance in this case, you need to tune your model. If you import the data in SQL Server, you need to tune the database (proper indexing and modeling).
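If you do try the SQL Server route, the rough shape of it is to bulk load the CSV and index the date column so that a DirectQuery date-range filter can seek instead of scanning the whole table. This is only a sketch; the file path, table, and column names are placeholders:

    -- Hypothetical staging table for the CSV (add the remaining columns).
    CREATE TABLE dbo.BigFact
    (
        EventDate date NOT NULL,
        Amount decimal(18, 2) NULL
    );

    -- Load the file, skipping the header row.
    BULK INSERT dbo.BigFact
    FROM 'D:\data\bigfile.csv'
    WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

    -- Cluster on the date column so date-range filters are cheap.
    CREATE CLUSTERED INDEX IX_BigFact_EventDate ON dbo.BigFact (EventDate);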
So to reach a final decision you must test these solutions, consider pricing and hardware requirements and depending on that, decide what is the best for your case.
Recently, Microsoft made a demo with 1 trillion rows of data. You may want to take a look at it. I would also recommend taking a look at aggregations, which could help you improve the performance of your model.
I am starting a new project using SQL Server for a medical office. Their current database (SQL Server 2008) has over 500,000 rows spread across 15+ tables. Currently they are complaining that their data entry application is very slow to generate reports and insert new data.
For my new system I was thinking of developing a two-tiered database approach, where the primary SQL Server 2012 instance would contain only 3 months' worth of rows and the second SQL Server 2012 instance would maintain all the data for the system. This way, when users insert new data it will be entered into a much smaller system, and when they query recent data the query should execute much faster. This system will also have reporting, but I think the reports will have to be generated from the larger data set.
My questions are as follows:
Will a solution like this improve the overall performance of the database?
Are there any scalability concerns with this solution?
What is the best way to transfer that data between the two servers each night?
If my solution makes no sense please feel free to offer any other solutions.
Don't do this. Splitting your app into multiple databases will be a management nightmare. Plus, 500k records isn't that many, assuming that the records are of reasonable size.
Instead, go after the low-hanging fruit. Turn on logging and look at the access patterns. Which queries are slow? Figure out why. Do they lack indexes? Can the queries be simplified? Debug the problem.
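For example, the plan cache can tell you which statements are the most expensive on average. A minimal sketch (requires VIEW SERVER STATE permission):

    -- Top 10 cached queries by average elapsed time (microseconds).
    SELECT TOP (10)
        qs.total_elapsed_time / qs.execution_count AS avg_elapsed_microseconds,
        qs.execution_count,
        SUBSTRING(st.text, 1, 200) AS query_text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    ORDER BY avg_elapsed_microseconds DESC;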
Keep in mind that sometimes throwing hardware at the problem is the right solution. If you can solve the problem with an $800 server, do it. That's a lot cheaper than your time.
To chime in: 500K records is not so big. You ought to be able to make the db work very speedily as is with some tuning.
For a large database, which one is better in performance? Which one will return data fastest when executing MDX queries?
For a 200GB data set, SSAS on an ordinary Wintel server will be fine. It plays nicely with readily available front-end tools and will be much, much cheaper than Oracle unless you already have incumbent OBIEE licensing.
I was trying to find a tool to improve the performance of the reports in our application, and I heard about OLAP + Reporting Services, which is described as an excellent combination for this work. However, I couldn't find a way to keep the OLAP cube up to date, since the data in the original DB can change (it's a transactional application, and a pending record can be marked as paid, etc.).
Is this the best way to do this, or should I use another technology?
If the suggestion is still to use OLAP + Reporting Services, how can I keep the information up to date?
I have never used them, but I've heard that astrology + fortune telling are far cheaper, faster, more efficient, work magic, and require even less input than you provided in this question.
"Anyway I didn't find the way to keep the OLAP cube up-to-date since the data in the original DB can change."
It is called ROLAP storage mode: the cube reads the underlying relational tables at query time rather than pre-processed storage, so results reflect the current data.
It is usual for an OLAP database to be populated on a regular schedule from your OLTP database using some form of ETL (Extract, Transform, Load).
In the SQL Server world, that is often accomplished using SSIS.
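As a rough illustration of the ETL step (whether it lives in an SSIS package or plain T-SQL), changed OLTP rows are upserted into the warehouse table that feeds the cube, and the cube is then reprocessed on the same schedule. All object names here are placeholders:

    -- Hypothetical nightly upsert of changed invoice rows into the fact table.
    MERGE dbo.FactInvoices AS tgt
    USING (
        SELECT InvoiceID, Amount, Status
        FROM oltp.Invoices
        WHERE ModifiedDate >= DATEADD(DAY, -1, GETDATE())
    ) AS src
        ON tgt.InvoiceID = src.InvoiceID
    WHEN MATCHED THEN
        UPDATE SET tgt.Amount = src.Amount,
                   tgt.Status = src.Status
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (InvoiceID, Amount, Status)
        VALUES (src.InvoiceID, src.Amount, src.Status);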
I suggest you read these books:
The Data Warehouse Toolkit
The Data Warehouse ETL Toolkit