Does Process Cube eliminate the need for SSIS? - sql-server

I am trying to understand how SQL Server Analysis Services fits into the Business Intelligence field.
I have used SSIS to copy databases, and then SSRS to produce reports, which are accessed by the users.
I know that SSAS is a database engine, which allows you to create cubes. There is an option in SSAS to process a cube (http://technet.microsoft.com/en-us/library/aa216366(v=sql.80).aspx). Is SSAS a replacement for SSIS, since it seems to do the ETL for you (using Process Cube)?

SSIS is an ETL tool, providing you with the ability to move, manipulate and consolidate data (from multiple sources). SSIS tends to be a developer tool used to get the data into the correct shape, either for an application or a reporting tool.
SSAS is a cube-building tool, providing the business with the ability to slice and dice the data ad hoc. Developers will build the cubes, but the consumers will tend to be the business.
I have seen instances of SSAS cubes built pulling data directly from source, but these tend not to work very well, due to the load on the source systems and the complexity involved in structuring the data correctly.
A more typical approach is to utilise SSIS to pull the data (possibly only daily differences) and stage it into a dimensional model that can then be cleanly consumed by SSAS. This way both tools are playing to their strengths - SSIS moves the data around and SSAS presents the data in an efficient and user-friendly way.
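As an illustration only (every object name below - SourceDb, stg.Orders, dw.FactOrders, dw.LoadLog - is hypothetical, not from the post), the daily-differences staging step that SSIS would orchestrate often boils down to T-SQL along these lines:

    -- Hypothetical incremental staging load; all object names are illustrative.
    DECLARE @LastLoad datetime;
    SELECT @LastLoad = MAX(LoadedAt) FROM dw.LoadLog;

    -- Extract: pull only the rows changed since the last load.
    INSERT INTO stg.Orders (OrderID, CustomerID, Amount, ModifiedDate)
    SELECT o.OrderID, o.CustomerID, o.Amount, o.ModifiedDate
    FROM SourceDb.Sales.Orders AS o
    WHERE o.ModifiedDate > @LastLoad;

    -- Load: merge the staged rows into the fact table that SSAS will consume.
    MERGE dw.FactOrders AS tgt
    USING stg.Orders AS src
        ON tgt.OrderID = src.OrderID
    WHEN MATCHED THEN
        UPDATE SET tgt.Amount = src.Amount
    WHEN NOT MATCHED THEN
        INSERT (OrderID, CustomerID, Amount)
        VALUES (src.OrderID, src.CustomerID, src.Amount);

    INSERT INTO dw.LoadLog (LoadedAt) VALUES (GETDATE());

An SSAS cube processed on top of dw.FactOrders then only has to aggregate clean, pre-shaped data.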

Related

Should OLAP cubes always be built upon a data warehouse?

I have an OLTP database whose data I want to do complex analyses of. I recently learned about OLAP cubes and SQL Server Analysis Services. Building a cube for analyzing the data seems like the right way to go.
However, when looking through the Microsoft SSAS tutorial, I wasn't able to work out whether a cube is only meant to be built upon a data warehouse, or whether it can sit on an OLTP database. I realize that a data warehouse could be as simple as a database (like I have). If I want to build a cube, will I have to create a warehouse of what I currently have? Should I even be thinking about data warehousing? Both seem like must-haves for data analysis.

Migrating data between two SQL Server 2014 DBs

I have two SQL Server 2014 DBs with different schemas. These DBs serve two distinct web applications operating in the same area of interest, hence I have similar tables in the two DBs. What is the easiest way to migrate data between them? I was thinking about a Transact-SQL script. Is there a tool that could solve this task more easily?
If the migration is relatively simple, or if you want to reduce the number of tools involved, you can stick with a T-SQL script. If you want to run it on a schedule, you can execute it with SQL Server Agent as a T-SQL step, or wrap it in a stored procedure and call that from the agent. If different servers are involved, you can create a linked server.
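For instance, a minimal sketch of the linked-server route (the server, database, table and column names are placeholders, not taken from the question):

    -- Run on the target server; all names here are placeholders.
    EXEC sp_addlinkedserver @server = N'SourceServer', @srvproduct = N'SQL Server';

    -- Copy only the rows that do not already exist in the target table.
    INSERT INTO TargetDb.dbo.Customers (CustomerID, Name, Email)
    SELECT s.CustomerID, s.Name, s.Email
    FROM SourceServer.SourceDb.dbo.Customers AS s
    WHERE NOT EXISTS (SELECT 1
                      FROM TargetDb.dbo.Customers AS t
                      WHERE t.CustomerID = s.CustomerID);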
If you prefer a visual tool, or if the process is very complex and you do not want to write T-SQL scripts, then SSIS is a great tool that specializes in taking data from disparate sources, applying transformations/conversions, and importing it. Some people also like to use SSIS for simple tasks because of the visual design surface.
Without more details it is hard to say which is the best route. If I had two DBs that were very similar, I would consider merging the designs to accommodate both business lines/customers, and to add the flexibility to bring more business lines/customers into the same design in the future.

Do I need a cube?

We have a content ingestion system which receives (mobile) digital contents of different types (Music, Ringtone, Video, Game, Wallpaper etc) from various providers (Sony, Universal Music, EA Games etc) and then dispatches them across several online stores (e.g. Store1, Store2 etc).
The managers want to know how many items of each content type came through from each supplier in a given time window, and to which stores they went.
To me it seems like a report that needs an OLAP cube. Am I correct? The problem is that I am a .NET developer, not much skilled in BI and SQL Server Analysis Services, so I want to make this simple yet flexible and meaningful. Is there an easier way of having a reporting cube and a data mart to produce reports like this? (I am not sure if we can purchase SSAS and SSIS licenses at all.)
And for such a data mart and cube, what structure is suggested?
From your description, a cube isn't necessary. Assuming this data is in a database, you can just write a query to get that result. If you've bought a licence of SQL Server (i.e. not the free edition) then you already have SSAS, SSIS and SSRS.
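For example, a single GROUP BY query covers it; the dbo.ContentDispatch table and its columns are assumptions made up for illustration:

    -- Hypothetical table and columns; adjust to your actual schema.
    DECLARE @FromDate datetime, @ToDate datetime;
    SET @FromDate = '20240101';
    SET @ToDate   = '20240201';

    SELECT d.ContentType, d.Provider, d.Store, COUNT(*) AS ItemCount
    FROM dbo.ContentDispatch AS d
    WHERE d.DispatchedAt >= @FromDate
      AND d.DispatchedAt <  @ToDate
    GROUP BY d.ContentType, d.Provider, d.Store
    ORDER BY d.ContentType, d.Provider, d.Store;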
Some of a cube's main advantages are:
It's easier for end users to do ad-hoc reporting
Performance is often better than a relational (SQL Query) source
Some disadvantages are:
You need to spend processing time 'building' the cube
The query language (MDX) can be a challenge to learn
You don't have an ad-hoc user analysis requirement here
An SSAS cube presented in Excel Pivot Tables is probably still the most powerful and flexible end-user query tool out there, with a very low learning curve (most managers/analysts can already use Excel). Once they have a cube they can satisfy many requirements themselves, without you needing to constantly tweak queries. Even when they do want something more complex, you have a perfect source for report/query design and testing.
But designing and building an SSAS cube is quite difficult, and cubes are obscure to debug.
I suggest starting with Power Pivot - it's a free Excel Add-In that builds an in-memory cube, and presents the results as Excel Pivot Tables. It scales well through advanced compression and the resulting Model can be published to an SSAS Tabular server. The calculation language is DAX which is an improvement on the horrible MDX - DAX reads more like Excel functions.
This site is probably the best starting point for Power Pivot:
http://www.powerpivotpro.com/
You can solve this with just standard queries or views in SQL Server. Tools such as PowerPivot for Excel also allow you to create local cubes with very little effort.
Of course, purchasing an SSAS license and moving to a cube environment has several advantages, despite the extra cost:
Cubes are faster and allow for more complex calculations than SQL queries
With the introduction of the SSAS Tabular Model, making cubes really isn't hard anymore
Creating cubes often forces you to clean up your data model, which has a positive effect on your architecture overall in most cases
Creating a cube might be overkill for your scenario, as your data is not that complicated and not that big. But Excel might not be enough, as it is hard to pivot data in your database directly.
You can try embedding WebPivotTable into your website or your application. It provides all the functions of an Excel pivot table and can connect to CSV/Excel files, or to a database through a web service interface. It is web based, and the front-end user interface is quite intuitive, so users can easily get what they want with simple drag and drop. There is a demo and documentation.
Of course, if you still want to create a cube, this tool can also be very helpful, as it can connect to SSAS cubes directly.

Practical Implementation for Data Warehouse

Data warehousing seems to be a big trend these days, and is very interesting to me. I'm trying to acquaint myself with its concepts, and am having a problem "seeing the forest for the trees" because all of the data warehouse models and descriptions I can find online are theoretical, and don't give examples with actual technologies being used. I'm a contextual learner, so abstract, theoretical explanations don't really help me out all that much.
Now, there seem to be many "data warehousing models", but all of them share some similar characteristics. There is usually an "ODS" (operational data store) that aggregates data from multiple sources into the same place. A process known as "ETL" then converts the data in this ODS into a "data vault", and again into "data" and/or "strategy" marts.
Can someone provide an example of the technologies that would be used for each of these components (ODS, ETL, data vault, data/strategy marts)?
It sounds like the ODS could just be any ordinary database, but the data vault seems to have some special things going on, because these "marts" pull their data from it.
ETL is the biggest thing I'm choking on by far. Is this a language? A framework? An algorithm?
I think once I see a concrete example of what's going on at each step of the way, I'll finally get it. Thanks in advance!
ETL is a process. The abbreviation stands for Extract-Transform-Load, which describes what is being done with the data during the process. The process can be implemented anywhere you need to create a bridge between two systems with different data formats. First, you pull (extract) data from a source system (database, flat files, web service, etc.). Then the data is processed (transform) to comply with the format of a target storage (again, this can vary: databases, files, API calls). During the transform step, further actions can be performed on the data set, such as enriching it with data from other sources, cleansing it, and improving its quality. The last step is loading the transformed data into the target storage.
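A toy version of the whole process can even be expressed in plain T-SQL (the file path, staging table and target table are all made up for illustration):

    -- Extract: bulk-load the raw source file into a staging table.
    BULK INSERT stg.RawCustomers
    FROM 'C:\feeds\customers.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);

    -- Transform + Load: cleanse and conform the data on its way into the target.
    INSERT INTO dbo.Customers (CustomerID, FullName, Country)
    SELECT CAST(r.CustomerID AS int),
           LTRIM(RTRIM(r.FirstName)) + ' ' + LTRIM(RTRIM(r.LastName)),
           UPPER(r.CountryCode)            -- standardize country codes
    FROM stg.RawCustomers AS r
    WHERE r.CustomerID IS NOT NULL;        -- basic data-quality filter

Dedicated ETL tools wrap these three steps in reusable, monitored, visually designed packages.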
Typically, an ETL process is employed for loading a data warehouse, migrating data from one system or database to another when moving from a legacy system to a new one, or synchronizing data between two or more systems. It is also used as an intermediate layer in broader MDM and BI solutions.
In terms of specific software, there are many ETL tools on the market, ranging from robust solutions from big players such as Informatica, IBM DataStage and Oracle Data Integrator, to more affordable and open-source providers such as CloverETL, Talend, or Pentaho. Most of these tools offer a GUI where the flow and processing of data is defined through diagrams.
For Microsoft SQL Server 2005 and later, the ETL tool is called SSIS (SQL Server Integration Services). If you install at least the Standard edition of SQL Server, you get the Business Intelligence Development Studio, with which you can design your data flows. Basically, what an ETL tool does is take data from one or more sources (tables, flat files, ...), then transform it (add columns, join, filter, map to different data types, etc.) and finally store it again in one or more tables or files.
To get a basic understanding of how something works you can watch e.g. this video or this one (both from midnightdba). They're a bit lengthy, but you get an idea. They certainly helped me in understanding the basic functionality of an ETL tool.
Unfortunately I have not yet dug into other platforms or tools.
I'd highly recommend checking out some of the books by Ralph Kimball and Margy Ross (The Data Warehouse Toolkit, The Data Warehouse Lifecycle Toolkit) for an introduction to data warehousing.
My company's data warehouse is built using the Oracle Warehouse Builder tool for ETL. The OWB is a GUI tool that generates PL/SQL code on the database to manipulate the data. After manipulation and cleansing, the data is published to an Oracle datamart. The datamart is a database instance that users access for ad-hoc querying via Oracle Discoverer (Java software).

How to aggregate data from SQL Server 2005

I have about 150 000 rows of data written to a database every day. These rows represent outgoing articles, for example. Now I need to show a graph using SSRS that shows the average number of articles per day over time. I also need information about the actual number of articles from yesterday.
The idea is to have an aggregated view of all our transactions, and something that can indicate that something is wrong (for example, that we sent out 20% fewer articles than the average).
My idea is to move yesterday's data into SSAS every night, and there store the aggregated number of transactions along with the actual number of transactions from yesterday's data. Using SSAS would hopefully speed up the reports.
Do you think this is the right idea? Should I skip SSAS and run reports straight on the raw data? I know how to use Reporting Services on raw data using standard SQL queries, but how would this change when querying SSAS? I don't know SSAS - where do I start?
The neat thing with SSAS is that you can get those indicators that you talk about quite easily either by creating calculated measures or by using KPIs.
I started with Delivering Business Intelligence with Microsoft SQL Server 2005. It has a good introduction, but unfortunately it's too verbose when it comes to the details. Still, if you want to understand SSAS, OLAP and reporting using this framework, it's a good start.
Mosha Pasumansky has a blog on SSAS and MDX with great links.
Other than that, I would recommend Microsoft's Books Online.
Are you sure you aren't mixing up SSAS (Analysis Services) and SSIS (integration services)?
SSAS is not an ETL, it is an OLAP tool.
SSIS is an ETL tool.
I agree with everything that Rowan said. I'm just confused by the terms.
SSIS is an ETL tool. Basically you get data from somewhere (your outgoing articles), do something to it (aggregate), and put it somewhere else (your aggregates table, data warehouse, etc). Check the link for details.
You probably won't be keeping all of the rows in the DB indefinitely, and if you want to be able to report on longer trends, you will in any case need to do some kind of aggregation of the historical data. So making the reports use this historical data store as their source makes sense. You can then use it to do all kinds of fancy reporting.
TL;DR: Define your aggregated history table with your future reporting needs in mind. Use SSIS to populate the table and refresh it from the daily updates. Report from that table. Further reading: star schemas and data warehousing.
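A minimal sketch of such a table and its nightly refresh (all names are hypothetical; the DATEADD/DATEDIFF idiom keeps it compatible with SQL Server 2005, which has no date type):

    -- Hypothetical aggregated history table.
    CREATE TABLE dbo.DailyArticleSummary (
        SummaryDate  datetime NOT NULL PRIMARY KEY,
        ArticleCount int      NOT NULL
    );

    -- Nightly step: roll yesterday's raw rows up into one summary row.
    DECLARE @Today datetime, @Yesterday datetime;
    SET @Today     = DATEADD(day, DATEDIFF(day, 0, GETDATE()), 0);  -- midnight today
    SET @Yesterday = DATEADD(day, -1, @Today);

    INSERT INTO dbo.DailyArticleSummary (SummaryDate, ArticleCount)
    SELECT @Yesterday, COUNT(*)
    FROM dbo.OutgoingArticles AS a
    WHERE a.SentAt >= @Yesterday AND a.SentAt < @Today;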
@Sergio and @Rowan
Yes, we're not talking about loading and transforming data into the database (like an SSIS tool would do). That's solved using our integration platform.
@Riri, maybe SSAS is overkill for the situation you presented. If you only need to populate summarization tables daily, you can accomplish that by creating a regular job in SQL Server and doing the work in a regular T-SQL script.
I've used this approach for several years in a daily process to calculate business indicators from about 9GB of new data per day. It works, it's fast, it's simple, and it uses a technology you're already used to. If your daily process gets more complicated (it needs to read from files, use FTP, send emails), you can move to an SSIS package (or any other ETL tool you like), but I cannot recommend using SSAS unless you need to provide OLAP capabilities to your users.
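As a sketch of that kind of T-SQL indicator (reusing the hypothetical dbo.DailyArticleSummary table sketched earlier, and flagging a day that falls 20% below the trailing 30-day average):

    -- Illustrative only; assumes the dbo.DailyArticleSummary table from above.
    SELECT y.ArticleCount AS Yesterday,
           a.AvgCount     AS Avg30Days,
           CASE WHEN y.ArticleCount < 0.8 * a.AvgCount
                THEN 'ALERT' ELSE 'OK' END AS Status
    FROM (SELECT ArticleCount
          FROM dbo.DailyArticleSummary
          WHERE SummaryDate = DATEADD(day, DATEDIFF(day, 0, GETDATE()) - 1, 0)) AS y
    CROSS JOIN
         (SELECT AVG(CAST(ArticleCount AS float)) AS AvgCount
          FROM dbo.DailyArticleSummary
          WHERE SummaryDate >= DATEADD(day, DATEDIFF(day, 0, GETDATE()) - 31, 0)) AS a;

A SQL Server Agent job can run the refresh nightly, and an SSRS report can render this query directly.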
