Do I need a cube? - sql-server

We have a content ingestion system that receives (mobile) digital content of different types (Music, Ringtone, Video, Game, Wallpaper etc.) from various providers (Sony, Universal Music, EA Games etc.) and then dispatches it across several online stores (e.g. Store1, Store2 etc.).
The managers want to know how many items of each content type came in from each supplier in a given time window, and which stores they went to.
To me it seems like a report that needs an OLAP cube. Am I correct? The problem is that I am a .NET developer without much skill in BI or SQL Server Analysis Services, so I want to keep this simple yet flexible and meaningful. Is there an easier way to get a reporting cube and a data mart to produce reports like this? (I am not sure if we can purchase SSAS and SSIS licenses at all.)
And for such data mart and cube, what structure is suggested?

From your description, a cube isn't necessary. Assuming this data is in a database, you can just write a query to get that result. If you've bought a licence of SQL Server (i.e. not the free edition) then you already have SSAS, SSIS and SSRS.
Some of a cube's main advantages are:
It's easier for end users to do ad hoc reporting
Performance is often better than a relational (SQL Query) source
Some disadvantages are:
You need to spend processing time 'building' the cube
The query language (MDX) can be a challenge to learn
You don't have an ad hoc user analysis requirement here
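
The plain-query approach from this answer can be sketched as a single GROUP BY over the dispatch records. This is only an illustration: it uses Python's built-in sqlite3 module so that it runs anywhere, and the table and column names (dispatch, content_type, provider, store, dispatched_at) and the sample rows are assumptions, not taken from the question. The SELECT itself is standard SQL, so something very similar should work against SQL Server.

```python
import sqlite3

# Hypothetical schema: one row per content item dispatched to a store.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dispatch (
        content_type  TEXT,
        provider      TEXT,
        store         TEXT,
        dispatched_at TEXT   -- ISO date, e.g. '2013-05-01'
    )
""")
conn.executemany(
    "INSERT INTO dispatch VALUES (?, ?, ?, ?)",
    [
        ("Music", "Sony", "Store1", "2013-05-01"),
        ("Music", "Sony", "Store1", "2013-05-02"),
        ("Game", "EA Games", "Store2", "2013-05-03"),
        ("Music", "Sony", "Store2", "2013-06-10"),  # outside the window below
    ],
)


def content_counts(conn, start, end):
    """Count items per (content type, provider, store) with start <= date < end."""
    return conn.execute(
        """
        SELECT content_type, provider, store, COUNT(*) AS items
        FROM dispatch
        WHERE dispatched_at >= ? AND dispatched_at < ?
        GROUP BY content_type, provider, store
        ORDER BY content_type, provider, store
        """,
        (start, end),
    ).fetchall()


report = content_counts(conn, "2013-05-01", "2013-06-01")
for row in report:
    print(row)
```

Wrapping the SELECT in a view or stored procedure would let a report tool reuse it with different time windows.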

An SSAS cube presented in Excel Pivot Tables is probably still the most powerful and flexible end-user query tool out there, with a very low learning curve (most managers/analysts can already use Excel). Once they have a cube they can satisfy many requirements themselves, without you needing to constantly tweak queries. Even when they do want something more complex, you have a perfect source for report/query design and testing.
But designing and building an SSAS cube is difficult, and cubes are quite obscure to debug.
I suggest starting with Power Pivot - it's a free Excel Add-In that builds an in-memory cube, and presents the results as Excel Pivot Tables. It scales well through advanced compression and the resulting Model can be published to an SSAS Tabular server. The calculation language is DAX which is an improvement on the horrible MDX - DAX reads more like Excel functions.
This site is probably the best starting point for Power Pivot:
http://www.powerpivotpro.com/

You can solve this with just standard queries or views in SQL Server. Tools such as PowerPivot for Excel also allow you to create local cubes with very little effort.
Of course, purchasing an SSAS license and moving to a cube environment has several advantages, despite the extra cost:
Cubes are faster and allow for more complex calculations than SQL queries
With the introduction of the SSAS Tabular Model, making cubes really isn't hard anymore
Creating cubes often forces you to clean up your data model, which has a positive effect on your architecture overall in most cases

Creating a cube might be overkill for your scenario, as your data is neither very complicated nor very big. But Excel might not be enough either, as it is hard to pivot data in your database directly.
You can try embedding WebPivotTable into your website or application. It provides all the functions of an Excel pivot table and can connect to CSV/Excel files, or to a database through a web service interface. It is web based, and the front-end user interface is quite intuitive, so users can easily get what they want with simple drag and drop. Here is demo and Documents.
Of course, if you still want to create a cube, this tool can also be very helpful as it can connect to SSAS cubes directly.

Related

Power BI and SQL Server Indexes

I've done some research without getting valuable information about my question.
I'm working on a data warehouse project, and one of my customer's requirements is to use Power BI Pro for data visualisation.
What is not clear to me is whether Power BI, while acquiring data into its data model, would benefit from the indexing structure developed in SQL Server.
Thank you in advance for recommendation/tips on this subject.
It somewhat depends on whether you are using a live connection.
Existing indexes may speed up data loading when using PowerBI in import mode where the data source is a view, query, or stored procedure.
They will also be used in Live mode when connecting to the above sources, and might be used when connecting directly to multiple tables.
As the comments state, if you are bringing entire tables into PowerBI with import mode, then the existing indexes will not benefit you, and the internal SSAS instance that PBI uses is a whole different kettle of fish.
One caveat is that columnstore indexes can be used to get around some of the data size limitations when dealing with the gateway as described here: https://community.powerbi.com/t5/Power-Query/Using-SQL-Server-with-Nonclustered-Columnstore-Index/td-p/563787, but that's not directly related to your question.
Indexes help with retrieval speed on the server end. The answer to how much they will help depends on the specifics of your situation. If you are doing a lot of data transformation and mashup in the Power BI query editor, indexes will only help where there is a step that selects rows from the SQL Server. They won't help with steps where the processing is being done on the Power BI end (such as merging with data from an Excel file, adding custom columns, or some forms of substituting values). However, since you mention a data warehouse rather than a simple database, I'm going to assume you're barely doing any transformation on the Power BI end, relying instead on the server end to do the heavy lifting. In that case indexes will definitely help speed things up if they're done strategically.
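
To make the point about row-selecting steps concrete, here is a rough sketch of how an index changes the way a database answers a filtered query. It is an assumption-laden stand-in: it uses SQLite (through Python's built-in sqlite3 module) rather than SQL Server, and the sales table, its columns, and the index name are invented for illustration. In SQL Server you would inspect the execution plan instead, but the scan-versus-seek idea is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("North" if i % 2 else "South", float(i)) for i in range(1000)],
)


def query_plan(conn, sql, params=()):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


# A row-selecting step, like one a reporting tool might send to the server.
sql = "SELECT id, amount FROM sales WHERE region = ?"

plan_without_index = query_plan(conn, sql, ("North",))
conn.execute("CREATE INDEX idx_sales_region ON sales (region)")
plan_with_index = query_plan(conn, sql, ("North",))

print(plan_without_index)  # a full table scan
print(plan_with_index)     # a search that uses idx_sales_region
```

Pulling a whole table with no filter gives the planner nothing to look up, which is why indexes do little for a plain full-table import.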
There are some differences between Import mode and Connect live mode.
Import mode:
Data import can be used against any data source type, and it can combine data from different sources. The Power BI service currently limits the published file size to 1 GB.
When using import, data is stored in the Power BI file/service. Therefore there is no need to set up permissions on the data source side (a service account for the load is enough), and you can share data publicly or with people outside the organization. On the other hand, all data is stored in Power BI. Full DAX expressions and full Power Query transformations are supported.
Connect live mode:
Live connection has more limitations. It doesn't work against all data sources (the current list can be seen here), and it cannot combine data from multiple sources.
You are also limited to the single data source/database you selected. If you are connected to a SQL database, you can still create logical relationships between objects from that database, as well as measures and calculated columns. When you are connected to SQL Server Analysis Services, you are limited to report layout: you cannot create calculated columns, and currently you can only create measures. When using a live connection, users must have access to the underlying data source, which means you can't share outside your organization or publicly. Full DAX expressions are not supported, only report-level measures (to learn more about report-level measures, watch this great video from Patrick), and there are no Power Query transformations.
You can learn more: directquery-live-connection-or-import-data-tough-decision

Should OLAP cubes always be built upon a data warehouse?

I have an OLTP database whose data I want to analyze in complex ways. I recently learned about OLAP cubes and SQL Server Analysis Services. Building a cube for analyzing the data seems like the right way to go.
However, when looking through the Microsoft SSAS tutorial, I wasn't able to clarify whether the cube is only meant to be built upon data warehouses or OLTP databases. I realize that a data warehouse could be as simple as a database (like I have). If I want to build a cube, will I have to create a warehouse of what I currently have? Should I even be thinking about data warehousing? Both seem like must-haves for data analysis.

Does Process Cube eliminate the need for SSIS?

I am trying to understand how SQL Server Analysis Services fits into the Business Intelligence field.
I have used SSIS to create copy databases and then SSRS to produce reports, which are accessed by the users.
I know that SSAS is a database engine, which allows you to create Cubes. There is an option in SSAS to process cube (http://technet.microsoft.com/en-us/library/aa216366(v=sql.80).aspx). Is SSAS a replacement for SSIS as it seems to do the ETL for you (using process cube)?
SSIS is an ETL tool providing you with the ability to move, manipulate and consolidate (from multiple sources) data. SSIS tends to be a developer tool used to get the data in the correct shape either for an application or a reporting tool.
SSAS is a cube-building tool that gives the business the ability to slice and dice the data ad hoc. Developers build the cubes; however, the consumers tend to be the business.
I have seen instances of SSAS cubes built pulling data directly from source, but these tend not to work very well, due to the load on the source systems and the complexity involved in structuring the data correctly.
A more typical approach is to utilise SSIS to pull (possibly only daily differences) and stage the data into a dimensional model that can then be cleanly consumed by SSAS. This way both tools are playing to their strengths - SSIS moves the data around and SSAS presents the data in an efficient and user friendly way.
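
As a rough sketch of that staging step (not an SSIS package), the snippet below copies only recent rows from a stand-in source table into a tiny dimensional model with one dimension and one fact table. It uses Python's built-in sqlite3 module so that it is runnable; every table, column, and function name here (source_orders, dim_product, fact_sales, load_increment) and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for the OLTP source system.
    CREATE TABLE source_orders (order_id INTEGER PRIMARY KEY, product TEXT,
                                amount REAL, order_date TEXT);
    -- Minimal dimensional model: one dimension and one fact table.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product TEXT UNIQUE);
    CREATE TABLE fact_sales  (order_id INTEGER PRIMARY KEY, product_key INTEGER,
                              amount REAL, order_date TEXT);
""")


def fact_count(conn):
    return conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]


def load_increment(conn, since):
    """Copy source rows dated on or after `since`; return how many facts were added."""
    before = fact_count(conn)
    # Add any products not yet present in the dimension.
    conn.execute("""
        INSERT OR IGNORE INTO dim_product (product)
        SELECT DISTINCT product FROM source_orders WHERE order_date >= ?
    """, (since,))
    # Insert new facts, swapping the product name for its surrogate key.
    # INSERT OR IGNORE makes a rerun for the same day harmless.
    conn.execute("""
        INSERT OR IGNORE INTO fact_sales (order_id, product_key, amount, order_date)
        SELECT s.order_id, d.product_key, s.amount, s.order_date
        FROM source_orders AS s
        JOIN dim_product AS d ON d.product = s.product
        WHERE s.order_date >= ?
    """, (since,))
    return fact_count(conn) - before


conn.executemany("INSERT INTO source_orders VALUES (?, ?, ?, ?)", [
    (1, "Ringtone", 0.99, "2014-03-01"),
    (2, "Game", 4.99, "2014-03-01"),
])
first = load_increment(conn, "2014-03-01")
conn.execute("INSERT INTO source_orders VALUES (3, 'Game', 4.99, '2014-03-02')")
second = load_increment(conn, "2014-03-02")
print(first, second)
```

A cube would then be processed from dim_product and fact_sales rather than from the source tables, which keeps the load off the source system.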

Analysis Services with excel as front end - is it possible to get the nicer UI that powerpivot provides

I have been looking into PowerPivot and concluded that for "self service BI" and ad hoc building of cubes it has its uses. In particular I like the enhanced UI that you get from using PowerPivot rather than just using a PivotTable hooked up to an Analysis Services data source.
However, it seems that hooking up PowerPivot to an existing Analysis Services cube is not a solution for "organisational BI". It is not always desirable to suck millions of rows into Excel at once, and the interface between PowerPivot and Analysis Services is very poor in my book.
Hence the question: can an existing Analysis Services solution get the enhanced UI features that PowerPivot brings, without using PowerPivot as the design tool? If PowerPivot is aimed at self service/personal BI then it seems bizarre that the UI for this is better than for bigger/more costly Analysis Services solutions.
Although I agree that PowerPivot has a nicer UI than using Analysis Services via standard pivot tables, PowerPivot through the Excel client has some really bad drawbacks when trying to use it in lieu of Analysis Services.
You have to download all the rows into your spreadsheet to "refresh" the data. In large data warehouses, this is equivalent to having users run SELECT * queries directly against your database. It's horribly slow for the user and has a high resource usage cost to your server.
It is extremely easy for someone to either intentionally or unintentionally walk out of the office with your entire data warehouse in a non-secure manner. Ouch!
The end-user machines need to be pretty powerful. I tried using PowerPivot with a few small tables (5 million rows or less) on our standard company machine build and it did not have sufficient memory to refresh PowerPivot. The only way I can see to deploy PowerPivot across the enterprise is to upgrade all of the analyst machines to 64-bit Windows 7 with at least 6GB to 8GB of RAM. Although this can be feasible in a small organization, it is not a reasonable solution in a large enterprise.
You won't have any good metrics on how people are using your data if you hand out PowerPivot with unrestricted access to your data warehouse. Yes, you may have metrics on how frequently people hit the refresh button and you may be able to log which tables they are querying, but you won't see how they use the data unless you audit their spreadsheets directly. And even then, you will only get their final result -- not their path to how they got to the final result.
PowerPivot generates really, really big files. Even if someone drills the data down to a small subset of the total data, it is still difficult to share the files with others since large PowerPivot files generally exceed minimum Exchange server file size limits. I've encountered this at my organization despite never having had this problem with Analysis Services files.
PowerPivot does not have a very good security model. Sure, you can restrict who gets to the data the first time, but you can't restrict it once it is in the spreadsheet. Analysis Services prevents users from making changes to the spreadsheet if they don't have access to the underlying cube. It's just so easy to compromise the security of your most valuable business data with PowerPivot.
PowerPivot does not currently scale for very large data sources. I have several multi-billion-row fact tables that just can't be downloaded by PowerPivot unless I pre-aggregate them down to a few hundred million rows. PowerPivot works really well for small data warehouses, but it doesn't elegantly scale to large data warehouses.
Please note my above comments don't apply to PowerPivot via SharePoint. I haven't tried the SharePoint integrated product out, but many of the above concerns seem to have been addressed from the documentation and demonstrations that I've seen of the SharePoint version of the product.
Despite all of the above comments, PowerPivot could work as a replacement for Analysis Services if you have a very small or immature data warehouse. If your largest fact table is a few million rows, then the overhead of building and maintaining a data warehouse may not be cost effective if you are a BI team of 1-2 people. PowerPivot is probably a great new feature for a department that doesn't have a dedicated BI team and only has a handful of Excel junkie analysts. It doesn't take much sophistication to put together a virtual data mart from disparate data sources with PowerPivot. But if you want to build a truly professional data warehouse that is secure, scalable, and highly manageable, then I would recommend building cubes in Analysis Services and using either Excel or a 3rd party vendor's tools to connect to the Analysis Services OLAP cubes.
Now that SQL Server 2012 is released, you may want to take a look at using one or more SSAS BISM models, rather than PowerPivot. You get interop with PowerPivot, but you can now build your model using SSDT (in Visual Studio) and can get more control over security and can host on a dedicated server.
I'll be presenting live and online this spring and summer on the BISM - here's my latest deck on slideshare - http://www.slideshare.net/lynnlangit/sql-2012-bism
Now that Office 2013 preview is out, you can check out PowerView inside of Excel (PowerPivot) without the need to have SharePoint. It remains to be seen when MSFT will remove the dependency on Silverlight (i.e. move to HTML5). The preview release of Office 2013 that I got in September still included PowerPivot which required Silverlight. I am looking forward to the release built on HTML5. Here's a deck by Jen Underwood to give you an idea of what PowerView looks like.
WebPivotTable is a pure JavaScript pivot table and pivot chart component which can be used to pivot CSV data and all kinds of OLAP cubes, including Microsoft SSAS. It mimics the functionality of Excel but is web based, with no dependence on any other plugins, drivers, or server-side components. It can easily be integrated into any web application or website.
Here is Demo and Documents
I know that Powerpivot is a free download for Excel 2010, but for a better desktop client experience you should look at the ProClarity client.
Also worth looking at Analyzer by Strategy Companion (http://www.strategycompanion.com).
I've found it provides a smooth web-based interface for slicing and dicing in pivot tables (and charts) that is nicer than what is provided by Excel 2007.
ProClarity was the runaway best option until Microsoft bought them and killed the product. Some of the features are making their way into other tools, but the product itself is no longer supported. Panorama or Tableau are probably the best 3rd party options.
This is the best I've found so far that is up-to-date: http://www.varigence.com/products/vivid/videos
Edit: http://silverlight.galantis.com is also a possible solution - a WPF version comes out next month that could be used as a VSTO add-in.

How to aggregate data from SQL Server 2005

I have about 150 000 rows of data written to a database every day. These rows represent outgoing articles, for example. Now I need to show a graph using SSRS that shows the average number of articles per day over time. I also need information about the actual number of articles from yesterday.
The idea is to have an aggregated view of all our transactions and something that can indicate that something is wrong (for example, that we sent out 20% fewer articles than the average).
My idea is to have yesterday's data moved into SSAS every night and store there the aggregated number of transactions and the actual number of transactions from yesterday's data. Using SSAS would hopefully speed up the reports.
Do you think this is the right idea? Should I skip SSAS and have reports straight on the raw data? I know how to use Reporting Services on raw data using standard SQL queries, but how would this change when querying SSAS? I don't know SSAS - where do I start?
The neat thing with SSAS is that you can get those indicators that you talk about quite easily either by creating calculated measures or by using KPIs.
I started with Delivering Business Intelligence with Microsoft SQL Server 2005. It had some good introduction, but unfortunately it's too verbose when it comes to the details. But if you want to understand SSAS, OLAP and reporting using this framework it's a good start.
Mosha Pasumansky has a blog on SSAS and MDX with great links.
Other than that I would recommend Microsoft's Books Online.
Are you sure you aren't mixing up SSAS (Analysis Services) and SSIS (integration services)?
SSAS is not an ETL, it is an OLAP tool.
SSIS is an ETL tool.
I agree with everything that Rowan said. I'm just confused by the terms.
SSAS is an ETL tool. Basically you get data from somewhere (your outgoing articles), do something to it (aggregate), and put it somewhere else (your aggregates table, data warehouse, etc). Check the link for details.
You probably won't be keeping all of the rows in the DB indefinitely, and if you want to be able to report on longer trends you will in any case need to do some kind of aggregation of historical data. So making the reports use this historical data store as their source makes sense. You can then use it to do all kinds of fancy reporting.
TL;DR: Define your aggregated history table with your future reporting needs in mind. Use the SSAS to populate the table and refresh it from the daily updates. Report from that table. Further reading: Star Schemas and data warehousing.
#Sergio and #Rowan
Yes, we're not talking about loading and transforming data into the database (like a SSIS tool would do). That's solved using our integration platform.
#Riri maybe SSAS is overkill for the situation you presented. If you only need to populate summarization tables daily, you can accomplish it by creating a regular job in SQL Server that runs a regular T-SQL script.
I've used this approach for several years in a daily process to calculate business indicators from about 9GB of new data / day. It works, it's fast, it's simple and it uses a technology you're already used to. If your daily process gets more complicated (it needs to read from files, use FTP, send emails) you can move to an SSIS package (or any other ETL tool you like), but I cannot recommend using SSAS unless you need to provide OLAP capabilities to your users.
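
The nightly summarization job described in this answer might look roughly like the sketch below. It is written with Python's built-in sqlite3 module so that it runs as-is; in SQL Server the same two statements would live in a T-SQL script run by a scheduled job. The tables (articles, daily_summary), the function names, and the sample data are all assumptions made for illustration, and the 20% threshold comes from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (id INTEGER PRIMARY KEY, sent_on TEXT);
    CREATE TABLE daily_summary (sent_on TEXT PRIMARY KEY, article_count INTEGER);
""")


def summarize_day(conn, day):
    """Nightly step: (re)compute the article count for one day."""
    conn.execute("""
        INSERT OR REPLACE INTO daily_summary (sent_on, article_count)
        SELECT ?, COUNT(*) FROM articles WHERE sent_on = ?
    """, (day, day))


def is_below_average(conn, day, threshold=0.20):
    """True if `day` is more than `threshold` below the average of earlier days."""
    (count,) = conn.execute(
        "SELECT article_count FROM daily_summary WHERE sent_on = ?", (day,)
    ).fetchone()
    (average,) = conn.execute(
        "SELECT AVG(article_count) FROM daily_summary WHERE sent_on < ?", (day,)
    ).fetchone()
    return average is not None and count < (1 - threshold) * average


# Made-up sample: two normal days of 10 articles, then a day with only 7.
rows = [("2008-10-01",)] * 10 + [("2008-10-02",)] * 10 + [("2008-10-03",)] * 7
conn.executemany("INSERT INTO articles (sent_on) VALUES (?)", rows)
for day in ("2008-10-01", "2008-10-02", "2008-10-03"):
    summarize_day(conn, day)

print(is_below_average(conn, "2008-10-03"))
```

Reports can then read from the small daily_summary table instead of scanning the raw rows each time.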

Resources