Do cubes use resources outside of the SQL Server Analysis Services server?

I have a cube that I have built that has data across multiple servers. After the cube is deployed to the SSAS server, does it still interact with the SQL Server instances that contain the source data the cube was based on? The reason I ask is that I potentially have a lot of users, and some of the source data is on one of our production servers, which we don't want to be hit every time someone queries the cube.
Thanks,
Ethan

A typical SSAS cube copies all the data available to it (as per the tables/views you pull into the DSV) to its own storage location; you can verify this by going to the storage path defined in the SSAS server options and looking at the folder sizes. When you query the cube, it uses this copied data.
Having said that, there are exceptions:
If you have ROLAP dimensions, queries can go through to the underlying data source:
http://technet.microsoft.com/en-us/library/ms174915.aspx
If your cube is set up for proactive caching, then it could query the underlying databases itself in order to stay up-to-date:
http://msdn.microsoft.com/en-us/library/ms174769.aspx
Those are the only two I'm familiar with.
Do bear in mind that deployment will generally require processing afterwards, unless you're restoring from a backup you've processed elsewhere. Also bear in mind that at some point you'll probably want to load new data into the cube, which you say comes from the production databases you don't want to interrupt.

Related

Advice on Azure platform to host Data Warehouse

I am a Data Warehouse developer currently looking into using the Azure platform to host a new Data Warehouse.
My experience is with on-premises servers hosting standard SQL Server databases, one for the staging database and one for the Data Warehouse. Typically I would use a combination of SSIS and stored procedures running in a scheduled SQL Server Agent job for the ETL.
How can I replicate this kind of setup within Azure?
The storage size will be less than 1 TB, so could I just use Azure SQL Database rather than Azure SQL Data Warehouse?
If so, would I need separate databases for staging and the data warehouse using the elastic pool option?
The data that I will be loading into staging is all on-premises. Will SSIS still be suitable for loading into Azure, or would Azure Data Factory be a better fit?
Any help at all would be greatly appreciated! Thanks.
Leon has lots of good information there. But from a Data Warehouse perspective, I wouldn't use Data Sync for ETL purposes (it's mentioned as "not preferred" in the link Leon provided, under "When to use Data Sync").
For DW, Azure DB is a good option. Azure SQL Data Warehouse (known as Azure Synapse Analytics nowadays) is a heavy-duty beast for handling DW workloads. Are you really sure you need that kind of system with < 1 TB of data? I'd personally leave Azure Synapse for now and try Azure DB first. It's a LOT cheaper, and you can upgrade later if necessary.
One thing to note about Azure DB though: it doesn't support cross-database queries. That's not a deal breaker, as everything can be handled in the same database. I personally use a schema to differentiate staging from the DW (and of course I use other schemas in the DW as well). It's not very difficult to use separate databases either, but the boundary between them is a lot harder to cross in Azure DB than in on-premises SQL Server or other Azure options (Managed Instance, for example).
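To make the schema idea concrete, here is a rough sketch, assuming a single Azure SQL database; the schema and table names are illustrative ones of my own, not from the question:

    -- Staging and warehouse areas side by side in one Azure SQL database.
    CREATE SCHEMA stg;   -- landing area for raw extracts
    GO
    CREATE SCHEMA dw;    -- dimensional model
    GO

    CREATE TABLE stg.Customer
    (
        CustomerId   INT           NOT NULL,
        CustomerName NVARCHAR(200) NULL,
        LoadDate     DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME()
    );

    CREATE TABLE dw.DimCustomer
    (
        CustomerKey  INT IDENTITY(1,1) PRIMARY KEY,
        CustomerId   INT           NOT NULL,
        CustomerName NVARCHAR(200) NULL
    );

    -- The ETL procs read from stg and write to dw within the same database,
    -- so no cross-database query is ever needed.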
SSIS is still an option, but the problem is: what do you use to run the packages? There are options like:
continue running them from on-premises (all the hard work is still done in the cloud)
rent a VM with SQL Server from Azure, deploy the packages to the VM and run them from there
use Data Factory to run the SSIS packages
None of those is a perfect solution for every use case. The first two options come with quite a heavy cost if running SSIS is the only thing you need them for. Using Data Factory to run SSIS is a bit cumbersome at the moment, but it's an option anyway.
Data Factory itself is a good option as well (I haven't personally tried it, but I have heard good things about it). If you would use Data Factory to run your SSIS, why not start using Data Factory without SSIS packages in the first place? Of course, Data Factory has some limitations compared to SSIS, which might be the reason, but if your SSIS packages are simple enough, why not give Data Factory a try?
I would suggest using Azure SQL Database. It provides many price tiers with different storage limits, so you can select the tier that suits you best. Azure SQL Database also supports scaling up/down based on usage.
Ref: Service tiers in the DTU-based purchase model
And as you said, the data you will be loading into staging is all on-premises.
Azure SQL Database has a feature, Data Sync, that can help you do that:
Data Sync is useful in cases where data needs to be kept updated across several Azure SQL databases or SQL Server databases. Here are the main use cases for Data Sync:
Hybrid Data Synchronization: With Data Sync, you can keep data synchronized between your on-premises databases and Azure SQL databases to enable hybrid applications. This capability may appeal to customers who are considering moving to the cloud and would like to put some of their application in Azure.
Distributed Applications: In many cases, it's beneficial to separate different workloads across different databases. For example, if you have a large production database, but you also need to run a reporting or analytics workload on this data, it's helpful to have a second database for this additional workload. This approach minimizes the performance impact on your production workload. You can use Data Sync to keep these two databases synchronized.
Globally Distributed Applications: Many businesses span several regions and even several countries/regions. To minimize network latency, it's best to have your data in a region close to you. With Data Sync, you can easily keep databases in regions around the world synchronized.
When you create the SQL database, you can migrate the schema and data to Azure with many tools, such as the Data Migration Assistant (DMA).
Then set up SQL Data Sync between Azure SQL Database and SQL Server on-premises; it will sync the data automatically every 5 minutes.
Hope this helps.
If you want to start on the less expensive options in Azure, go with a general purpose SQL database and an Azure Data Factory pipeline with a few activities.
Dynamic Resource Scaling ETL
You can scale up the database by issuing an ALTER DATABASE statement and then move on to your stored-procedure-based ETL. I would even use a "master" proc to call the dimension and fact procs to control the execution flow. Then scale the database back down with another ALTER DATABASE statement. I even created my own stored proc to issue these scaling statements.
You also cannot predict when the scaling will be completed, so I have a wait activity. You could be a little more nerdy with a loop that checks the service objective property and then proceeds when the change is complete, but it was just easier to wait for 10 minutes. I have only been burnt a couple of times when the scaling took longer.
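A minimal T-SQL sketch of that pattern, assuming a database called MyDwDb, an S4 target tier and a master ETL proc called dw.usp_MasterEtl (all of these names are placeholders):

    -- Scale up before the ETL; the resize happens asynchronously.
    ALTER DATABASE [MyDwDb] MODIFY (SERVICE_OBJECTIVE = 'S4');

    -- Either wait a fixed time, or poll the service objective property.
    -- Note: existing connections can be dropped at the moment the scale completes.
    WHILE CAST(DATABASEPROPERTYEX('MyDwDb', 'ServiceObjective') AS sysname) <> 'S4'
        WAITFOR DELAY '00:01:00';   -- check once a minute

    EXEC dw.usp_MasterEtl;   -- the "master" proc calls the dimension and fact procs

    -- Scale back down once the load is finished.
    ALTER DATABASE [MyDwDb] MODIFY (SERVICE_OBJECTIVE = 'S1');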
Data Pipeline Activities:
Scale up, proceed if successful
Wait about 10 minutes, proceed always
Execute the ETL, proceed always
Scale down
Elastic Query
You can query across databases with Elastic Query (vertical partitioning). Performance isn't great, and it isn't recommended for ETL, but it will work. To improve performance, try dumping any large table you need into a temp table and then transforming the data locally.
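For reference, a rough sketch of the vertical-partitioning setup; the server, credential, schema and table names below are all placeholders:

    -- Run in the database that issues the cross-database queries.
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

    CREATE DATABASE SCOPED CREDENTIAL ElasticCred
        WITH IDENTITY = 'etl_user', SECRET = '<password>';

    CREATE EXTERNAL DATA SOURCE RemoteStagingDb WITH
    (
        TYPE = RDBMS,
        LOCATION = 'myserver.database.windows.net',
        DATABASE_NAME = 'StagingDb',
        CREDENTIAL = ElasticCred
    );

    -- Mirrors the remote table's definition; no data is stored locally.
    CREATE EXTERNAL TABLE stg.ProductRemote
    (
        ProductId INT,
        Name      NVARCHAR(200)
    )
    WITH (DATA_SOURCE = RemoteStagingDb);

    -- Pull the rows you need into a local temp table before transforming;
    -- this usually beats joining directly against the external table.
    SELECT ProductId, Name
    INTO   #Product
    FROM   stg.ProductRemote;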

Which gives better performance for a QlikView connection: SQL Server or an SSAS cube?

I want to load my data into QlikView, but I am not sure whether to feed it from a cube or directly from SQL.
Can anyone tell me which gives better performance: the cube or SQL?
Note: I have millions of rows in the database.
Generally, as the volume of data grows, the advantages of SSAS tend to become more apparent than those of using SQL Server as the source. How will the data be used? When it comes to large-scale aggregations, SSAS becomes very beneficial. SSAS also enforces a structured layout, as the relationships are predefined in the cube rather than expressed as joins. Some additional features that SSAS brings are hierarchical analysis (hierarchies) as well as ease of use with tools such as Excel and SSRS, although it sounds like you're only looking to use QlikView for this. However, your best option would be to baseline both SSAS and SQL Server in your environment, with queries that best represent what would be run once this is implemented, and assess the results from there.
From the BI tool perspective it doesn't matter, as you can connect to either source (SQL is more common, but it depends on your expertise). Regarding performance, the best strategy is to have a separate extract layer and store the data incrementally as QVD files (for example, loading the previous day every night). With incremental reloads, source performance is not that important, as even big data sets should load quickly.
If your original source of data is SQL, in my opinion it doesn't make sense to replicate the data in three places (SQL, the cube and QlikView). It's better to connect directly to the source, save the raw data incrementally as QVD files, and then have a transformer layer that models that data.

SQL storage sizing - How to get statistics of what data is being accessed

How can I monitor which data is being accessed, and how frequently?
I need to migrate several (very) small SQL Server instances, each with several small databases. The current configuration is based on a lot of equally small servers with local storage. The new configuration is based on a single server with a single NAS.
So far, the SQL Server memory and CPU sizing is OK, as are the DB sizes and total IOPS. But there's no existing documentation of which data sets are actually being accessed. So, basically, I don't have a clue what the real storage requirements are, since the total IOPS may hit only a couple of tables (in which case it would work like a charm with just a couple of SSDs), or the whole set of databases may be scanned all the time, in which case I'll need several dozen disks.
So, back to the question: how can I "profile" and get statistics on what data is being accessed, either at the SQL or the Windows level?
The best way to see how much a table or group of tables is being used is SQL Server Audit. It has very little impact on SQL Server's performance and, unlike triggers, can easily be set up to monitor selects in addition to inserts/updates/deletes.
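A minimal sketch of what that can look like; the audit name, file path and table names are placeholders:

    -- Server-level audit writing to a file target.
    USE master;
    GO
    CREATE SERVER AUDIT DataAccessAudit
        TO FILE (FILEPATH = N'D:\Audit\');
    ALTER SERVER AUDIT DataAccessAudit WITH (STATE = ON);
    GO

    -- Database-level specification: which actions on which objects to capture.
    USE MyDatabase;
    GO
    CREATE DATABASE AUDIT SPECIFICATION TableAccessSpec
        FOR SERVER AUDIT DataAccessAudit
        ADD (SELECT, INSERT, UPDATE, DELETE ON dbo.Orders BY public),
        ADD (SELECT ON dbo.Customers BY public)
        WITH (STATE = ON);
    GO

    -- Later, read the captured events to see which objects are actually touched.
    SELECT event_time, action_id, database_name, object_name, statement
    FROM   sys.fn_get_audit_file(N'D:\Audit\*.sqlaudit', DEFAULT, DEFAULT);

From there you can aggregate by object to see which tables take the bulk of the reads and writes.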

How to make my SSAS cube production ready

I am developing a data warehouse plus a data cube for a certain client using Microsoft technologies (SSIS, SSAS and SSRS). I have almost finished building the data warehouse. I have created a data cube in SSAS and have already done the initial setup of the cube dimensions and facts. So far this has all been in a development environment. We are expecting to deploy the solution to a production server in about two weeks from now.
My question is (as this is my first enterprise-level cube), I do not know whether the cube is production ready yet. Can anyone let me know whether there is anything specific I should do to the cube before deploying it? Also, I have kept the system-suggested names for the cube dimensions and measures; do I need to change them before the cube goes to the users?
Any help is deeply appreciated.
I'm not sure what you mean by "enterprise level": are there specific security/audit requirements, availability levels, backup schedules or support procedures that you need to comply with?
And if your users have accepted the dimension names in the test environment and you have developed reports and even code using them, then why would you want to change them in production?
Assuming that you've already deployed cubes and packages successfully to different environments, then the things to check should be exactly the same every time: accounts and permissions, package configurations, scheduled jobs for batch processing etc.
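For the scheduled batch processing piece, one common pattern is a SQL Agent job step that sends an XMLA process command to the SSAS instance. A rough sketch, assuming the Agent service account has permission to process the cube; every name below is a placeholder:

    USE msdb;
    EXEC dbo.sp_add_job @job_name = N'Process DW cube';

    -- Job step that sends a full-process XMLA command to Analysis Services.
    EXEC dbo.sp_add_jobstep
        @job_name  = N'Process DW cube',
        @step_name = N'Full process',
        @subsystem = N'ANALYSISCOMMAND',
        @server    = N'MySsasServer',
        @command   = N'<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
                         <Object><DatabaseID>MyOlapDb</DatabaseID></Object>
                         <Type>ProcessFull</Type>
                       </Process>';

    -- Run nightly at 02:00.
    EXEC dbo.sp_add_jobschedule
        @job_name  = N'Process DW cube',
        @name      = N'Nightly',
        @freq_type = 4,               -- daily
        @freq_interval = 1,
        @active_start_time = 020000;

    EXEC dbo.sp_add_jobserver @job_name = N'Process DW cube';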

Copying data from a local database to a remote one

I'm writing a system at the moment that needs to copy data from a client's locally hosted SQL database to a hosted server database. Most of the data in the local database is copied to the live one, though optimisations are made to reduce the amount of data that actually has to be sent.
What is the best way of sending this data from one database to the other? At the moment I can see a few possible options, but none of them yet stands out as the prime candidate.
Replication, though this is not ideal, and we cannot expect it to be supported in the version of SQL Server we use in the hosted environment.
Linked server, copying data directly - a slow and somewhat insecure method
Webservices to transmit the data
Exporting the data we require as XML and transferring to the server to be imported in bulk.
The data copied goes into copies of the tables, without identity fields, so data can be inserted/updated without any violations in that respect. This data transfer does not have to be done at the database level; it can be done from .NET or other facilities.
More information
The frequency of the updates will depend entirely on how often records are changed. But the basic idea is that if a record is changed, the user can publish it to the live database. Alternatively, we'll record the changes and send them across in a batch at a configurable frequency.
The number of records we're talking about is around 4,000 rows per table for the core tables (the product catalog) at the moment, but this is completely variable depending on the client we deploy this to, as each would have their own product catalog, ranging from hundreds to thousands of products. To clarify, each client is on a separate local/hosted database combination; they are not combined into one system.
As well as the individual publishing of items, we would also require a complete re-sync of data to be done on demand.
Another aspect of the system is that some of the data being copied from the local server is stored in a secondary database, so we're effectively merging the data from two databases into the one live database.
Well, I'm biased, I have to admit: I'd like to hypnotize you into shelling out for SQL Compare to do this. I've been faced with exactly this sort of problem in all its open-ended frightfulness. I got a copy of SQL Compare and never looked back. SQL Compare is actually a silly name for a piece of software that synchronizes databases. It will also do it from the command line once you have a working project together with all the right knobs and buttons. Of course, you can only do this for reasonably small databases, but it really is a tool I wouldn't want to be seen in public without.
My only concern with your requirements is where you are collecting product catalogs from a number of clients. If they are all in separate tables, then all is fine, whereas if they are all in the same table, then this would make things more complicated.
How much data are you talking about? How many 'client' DBs are there? And how often does this need to happen? The answers to those questions will make a big difference to the path you should take.
There is an almost infinite number of solutions for this problem. In order to narrow it down, you'd have to tell us a bit about your requirements and priorities.
Bulk operations would probably cover a wide range of scenarios, and you should add that to the top of your list.
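To give a flavour of the bulk route for the XML option the question mentions, here is a rough sketch of the import side; the file path, staging table and element names are all illustrative:

    -- Load the exported XML file in one go on the hosted server...
    DECLARE @xml XML;

    SELECT @xml = CAST(BulkColumn AS XML)
    FROM   OPENROWSET(BULK N'C:\Transfer\products.xml', SINGLE_BLOB) AS x;

    -- ...then shred it into the staging copy of the table.
    INSERT INTO dbo.Product_Staging (ProductId, Name, Price)
    SELECT  p.value('(ProductId)[1]', 'int'),
            p.value('(Name)[1]',      'nvarchar(200)'),
            p.value('(Price)[1]',     'decimal(10,2)')
    FROM    @xml.nodes('/Products/Product') AS t(p);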
I would recommend using Data Transformation Services (DTS) for this. You could create a DTS package for appending and one for re-creating the data.
It is possible to invoke DTS package operations from your code, so you may want to create a wrapper around the packages that you can call from your application.
In the end I opted for a set of triggers to capture data modifications in a change log table. An application then polls this table and generates XML files for submission to a web service running at the remote location.
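A rough sketch of that trigger/change-log pattern, with illustrative table and column names rather than the actual schema:

    CREATE TABLE dbo.ChangeLog
    (
        ChangeLogId BIGINT IDENTITY(1,1) PRIMARY KEY,
        TableName   SYSNAME   NOT NULL,
        KeyValue    INT       NOT NULL,
        ChangeType  CHAR(1)   NOT NULL,   -- I, U or D
        ChangedAt   DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
        Processed   BIT       NOT NULL DEFAULT 0
    );
    GO

    CREATE TRIGGER trg_Product_Log
    ON dbo.Product
    AFTER INSERT, UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Inserts and updates
        INSERT INTO dbo.ChangeLog (TableName, KeyValue, ChangeType)
        SELECT 'Product', i.ProductId,
               CASE WHEN EXISTS (SELECT 1 FROM deleted) THEN 'U' ELSE 'I' END
        FROM   inserted AS i;

        -- Deletes
        INSERT INTO dbo.ChangeLog (TableName, KeyValue, ChangeType)
        SELECT 'Product', d.ProductId, 'D'
        FROM   deleted AS d
        WHERE  NOT EXISTS (SELECT 1 FROM inserted);
    END;

The polling application then reads the unprocessed rows, builds the XML for the web service, and flips the Processed flag once the remote side has acknowledged them.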
