Power BI and SQL Server Indexes

I've done some research but haven't found useful information on my question.
I'm working on a data warehouse project, and one of my customer's requirements is to use Power BI Pro for data visualisation.
What is not clear to me is whether Power BI, while acquiring data into its data model, would benefit from the indexing structures developed in SQL Server.
Thank you in advance for any recommendations/tips on this subject.

It somewhat depends on whether you are using a live connection.
Existing indexes may speed up data loading when using PowerBI in import mode where the data source is a view, query, or stored procedure.
They will also be used in Live mode when connecting to the above sources, and might be used when connecting directly to multiple tables.
As the comments state, if you are bringing entire tables into PowerBI with import mode, then the existing indexes will not benefit you, and the internal SSAS instance that PBI uses is a whole different kettle of fish.
One caveat is that columnstore indexes can be used to get around some of the data size limitations when dealing with the gateway as described here: https://community.powerbi.com/t5/Power-Query/Using-SQL-Server-with-Nonclustered-Columnstore-Index/td-p/563787, but that's not directly related to your question.
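To make that concrete, here is a minimal, hypothetical T-SQL sketch of a source that import mode can benefit from: a view that Power BI imports, plus a nonclustered columnstore index on the underlying table (all object and column names are invented for illustration).

    -- Hypothetical fact table and view; Power BI would import from the view.
    CREATE VIEW dbo.vw_SalesForPowerBI
    AS
    SELECT s.SaleDate, s.StoreId, s.ProductId, s.Amount
    FROM dbo.FactSales AS s
    WHERE s.SaleDate >= '2017-01-01';
    GO

    -- A nonclustered columnstore index that large scans of the view can use,
    -- which also relates to the data size workaround mentioned in the link above.
    CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactSales_PowerBI
    ON dbo.FactSales (SaleDate, StoreId, ProductId, Amount);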

Indexes help with retrieval speed on the server end. How much they will help depends on the specifics of your situation.

If you are doing a lot of data transformation and mashup in the Power BI query editor, indexes will only help where there is a step that selects rows from the SQL Server. They won't help with steps where the processing is done on the Power BI end (such as merging with data from an Excel file, adding custom columns, or some forms of substituting values).

However, since you mention a data warehouse rather than a simple database, I'm going to assume you're barely doing any transformation on the Power BI end, relying instead on the server end to do the heavy lifting. In that case indexes will definitely help speed things up if they're done strategically.
There are some differences between Import mode and Connect live mode.
Import mode:
Import can be used against any data source type, and it can combine data from different sources. The current Power BI service limit on published file size is 1 GB.
When using import, data is stored in the Power BI file/service. Therefore, there is no need to set up permissions on the data source side (a service account for the load is enough) and you can share data publicly or with people outside your organization. On the other hand, all data is stored in Power BI. Full DAX expressions and full Power Query transformations are supported.
Connect live mode:
There are more limitations in place for live connections. It doesn't work against all data sources (the current list can be seen here), and it cannot combine data from multiple sources.
You are also limited to just the one data source/database you selected; you can't combine data from multiple data sources anymore. If you are connected to a SQL database, you can still create logical relationships between objects from that database, as well as measures and calculated columns. When you are connected to SQL Server Analysis Services, you are limited to just the report layout: you can't make calculated columns and can currently only create measures. When using a live connection, users have to have access to the underlying data source. This means you can't share outside your organization or publicly. Full DAX expressions are not supported, only report-level measures (to learn more about report-level measures, watch this great video from Patrick), and there are no Power Query transformations.
You can learn more here: directquery-live-connection-or-import-data-tough-decision

Related

I have text data, and want to get it into AWS

I have what is essentially a traditional relational database, consisting of four tables, all related with IDs. Currently this database resides in four tab-delimited text files, in an S3 bucket. Very little, if any, data will ever be added to these tables. It is an unchanging reference database. So it will be exclusively read from, never added to or edited.
I would like to access this database in an Alexa skill. I've built a few skills already, using NodeJS, so I know how that all works. But I'm anxious to learn how to link up a skill with a back-end DB. This skill will need to run SQL SELECT statements against this DB, based on user-provided parameters, and, depending on the query filter, pull a set of records into an array that can be used by my skill's Lambda function.
Each of the current text files holds one of the four tables. The largest table is about 35k rows. The whole DB is maybe 5 MB, 90% of which is one of the four. Like I said, they are all connected with ID columns like a traditional RDBMS. This will not be for commercial purposes. Probably.
I am already familiar with SQL Server, it's the DB I know, and I'm comfortable with SQL Server Express and can whip something up there, but I'm open to learning NoSQL or some other method if it's more appropriate for this use case. And as this is mostly a learning exercise, if something is "just as good", it's good for me to know.
What is my best DB solution?
* NoSQL such as DynamoDB?
* Some sort of MySQL?
* SQL Server?
* Leave them as tab-delimited text and use them from the Lambda function directly?
Thanks, I don't want to start down the wrong road here.
A few options...
S3 Select
S3 Select (in Preview at the time of writing this) "enables applications to retrieve only a subset of data from an object by using simple SQL expressions. By using S3 Select to retrieve only the data needed by your application, you can achieve drastic performance increases – in many cases you can get as much as a 400% improvement."
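Just to illustrate the idea, an S3 Select expression against one of your tab-delimited files might look roughly like this. The column names are assumptions, and the tab delimiter and header row are configured in the request's input serialization, not in the SQL itself:

    -- Hypothetical S3 Select expression; S3Object is the alias S3 gives the object.
    SELECT s.artist_id, s.title
    FROM S3Object s
    WHERE s.artist_id = '12345'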
DynamoDB
The benefit of using DynamoDB is that there is no need to run a database server -- it is a fully-managed service. While it doesn't support SQL syntax, it is very fast and can suit many use-cases.
In fact, most projects should consider using a NoSQL database like DynamoDB for every situation, unless there is a particular reason to use SQL (such as business reporting).
Cost is based upon storage and provisioned capacity (which can scale-up and down based on demand).
SQL Database
Yes, you can certainly run an SQL database, either through Amazon RDS (Relational Database Service) or on your own EC2 instance (eg MySQL or even Apache Derby). However, you are then paying for the server even when it isn't being used.
Using Microsoft SQL Server is probably too much for your use-case (and more expensive than using an open-source product).
I wonder if you could incorporate SQLite in your app, which would provide SQL capabilities without much overhead?
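As a rough sketch of what that could look like: the four files could be loaded into a small SQLite database once, and the skill would then run ordinary SQL against it (the table and column names below are invented; your real schema would come from the text files):

    -- Hypothetical join across two of the four related tables.
    SELECT a.title, ar.name AS artist
    FROM albums AS a
    JOIN artists AS ar ON ar.artist_id = a.artist_id
    WHERE ar.name = 'Some Artist';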
Do it in memory
5 MB is, quite frankly, not much data. You could simply load all the data into memory and do your manipulations from there. While the load might consume a few cycles, data access will be very quick after that.

Options for loading a bunch of disparate data sets into a 'target' schema

Background
5-10 data sources
Various formats (csv, psv, xml)
Different update schedules (weekly, monthly, quarterly)
Requirements
Only interested in some of the fields from each data source
Want to build a model from the various sources, into a single database (SQL Server)
Current platform/skillset
Azure
SQL Server
Considerations
Minimal code. Hopefully I can do this all via a UI/drag-and-drop interface.
Automation. Hoping I can drop the files onto a server when they need to be updated, then "things" kick off (Azure Functions blob/FTP trigger?)
Questions
I haven't done much in the ETL space, but my initial thoughts point to something like SQL Server Integration Services, mainly because that's the only thing I've ever had experience with, ETL-wise.
Now that we have things like Azure Data Factory, SQL Data Warehouse, etc., would that be a better solution? Obviously the answer is "it depends", so what questions do I need to ask myself in order to clarify that? Can someone please point me to a good article to get started in this space?
TIA
The main question is where do you want to stage the data.
Many people are talking about Azure Data Lake as a staging area. There are pros and cons to this solution.
The pros: Azure Active Directory can be federated with your on-premises forest. Once that is done, regular Access Control Lists can be used to restrict access.
The cons: you are using premium storage (SSD), which can cost a lot of money for a small-to-medium-sized company.
On the other hand, Azure Blob Storage has been around for a long time. One of the pros is the cost of this storage. A shared access signature (SAS) can be used to give anyone access to the account.
The con is that the SAS is the key to the whole kingdom. Unlike ADLS, you cannot assign privileges at the file level.
If you like SQL Server OPENROWSET or BULK INSERT, you are in for a treat. Support for those functions was added earlier this year.
Check out my article on MS SQL TIPS for the details.
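As a rough sketch of the pattern (names, URLs, and file layout are all invented here; the exact setup, including the database master key the credential needs, is covered in the article), loading a dropped file from Blob Storage can be as simple as:

    -- Hypothetical example: a database-scoped credential holding a SAS token,
    -- an external data source pointing at the landing container, and a load.
    CREATE DATABASE SCOPED CREDENTIAL LandingCred
    WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
         SECRET = '<SAS token without the leading ?>';

    CREATE EXTERNAL DATA SOURCE LandingZone
    WITH (TYPE = BLOB_STORAGE,
          LOCATION = 'https://mystorageaccount.blob.core.windows.net/landing',
          CREDENTIAL = LandingCred);

    BULK INSERT dbo.StagingCustomers
    FROM 'customers_2018_01.psv'
    WITH (DATA_SOURCE = 'LandingZone',
          FIELDTERMINATOR = '|',   -- pipe-separated source file
          FIRSTROW = 2);           -- skip the header row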
As for scheduling, you can use a very simple PowerShell script in Azure Automation to create a hands-off process.
Azure Data Factory might be able to do some of these tasks; however, you are adding a lot more complexity than a simple T-SQL statement to load data into a table.
Last but not least, learn to love PowerShell. You can pretty much do any type of file processing with that language and the right .NET components.
Happy coding.
John Miner
The Crafty DBA

Do I need a cube?

We have a content ingestion system which receives (mobile) digital contents of different types (Music, Ringtone, Video, Game, Wallpaper etc) from various providers (Sony, Universal Music, EA Games etc) and then dispatches them across several online stores (e.g. Store1, Store2 etc).
The managers want to know how many items of each content type, in a given time window, have come through from each supplier and which store they have gone to.
To me it seems like a report that needs an OLAP cube. Am I correct? The problem is that I am a .NET developer and not much skilled in BI or SQL Server Analysis Services, so I want to make this simple yet flexible and meaningful. Is there an easier way of having a reporting cube and a data mart to produce reports like this? (I am not sure if we can purchase SSAS and SSIS licenses at all.)
And for such data mart and cube, what structure is suggested?
From your description, a cube isn't necessary. Assuming this data is in a database, you can just write a query to get that result. If you've bought a licence of SQL Server (i.e. not the free edition) then you already have SSAS, SSIS, and SSRS.
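As a sketch of what that query could look like (the table and column names are guesses at your ingestion schema, not a prescription):

    -- One row per dispatched content item, counted by type, provider and store.
    DECLARE @FromDate date = '2018-01-01', @ToDate date = '2018-02-01';

    SELECT c.ContentType, p.ProviderName, s.StoreName, COUNT(*) AS ItemsDispatched
    FROM dbo.Dispatches AS d
    JOIN dbo.Contents   AS c ON c.ContentId  = d.ContentId
    JOIN dbo.Providers  AS p ON p.ProviderId = c.ProviderId
    JOIN dbo.Stores     AS s ON s.StoreId    = d.StoreId
    WHERE d.DispatchedAt >= @FromDate AND d.DispatchedAt < @ToDate
    GROUP BY c.ContentType, p.ProviderName, s.StoreName
    ORDER BY c.ContentType, p.ProviderName, s.StoreName;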
Some of a cube's main advantages are:
It's easier for end users to do ad hoc reporting
Performance is often better than a relational (SQL Query) source
Some disadvantages are:
You need to spend processing time 'building' the cube
The query language (MDX) can be a challenge to learn
You don't have an ad hoc user analysis requirement here
An SSAS cube presented in Excel Pivot Tables is probably still the most powerful and flexible end-user query tool out there, with a very low learning curve (most managers/analysts can already use Excel). Once they have a cube they can satisfy many requirements themselves, without you needing to constantly tweak queries. Even when they do want something more complex, you have a perfect source for report/query design and testing.
But designing and building an SSAS cube is quite difficult, and cubes can be obscure to debug.
I suggest starting with Power Pivot - it's a free Excel Add-In that builds an in-memory cube, and presents the results as Excel Pivot Tables. It scales well through advanced compression and the resulting Model can be published to an SSAS Tabular server. The calculation language is DAX which is an improvement on the horrible MDX - DAX reads more like Excel functions.
This site is probably the best starting point for Power Pivot:
http://www.powerpivotpro.com/
You can solve this with just standard queries or views in SQL Server. Tools such as PowerPivot for Excel also allow you to create local cubes with very little effort.
Of course, purchasing an SSAS license and moving to a cube environment has several advantages, despite the extra cost:
Cubes are faster and allow for more complex calculations than SQL queries
With the introduction of the SSAS Tabular Model, making cubes really isn't hard anymore
Creating cubes often forces you to clean up your data model, which has a positive effect on your architecture overall in most cases
Creating a cube might be overkill for your scenario, as your data is not very complicated and not that big. But Excel alone might not be enough, as it is hard to pivot data in your database directly.
You could try embedding WebPivotTable into your website or application. It provides all the functions of an Excel pivot table and can connect to CSV/Excel files, or to a database via a web service interface. It is web based and the front-end user interface is quite intuitive, so users can easily get what they want with simple drag and drop. Here are the demo and documents.
Of course, if you still want to create a cube, this tool can also be very helpful as it can connect to SSAS cubes directly.

MS Access database with no relations

Can anyone recommend a tool or suggest the approach when dealing with MS Access database with no relationships between tables?
As part of a data migration project I am creating data mapping definition rules, but it is becoming more and more difficult and time-consuming to correctly identify the source tables/fields for extraction.
I have many tables with the same data appearing in different places. Furthermore, as there were no validation rules when data was input, many entries contain spelling errors or generally do not match the expected data type. Most of the tables, however, already have keys (primary & foreign) created.
I am looking for a quick solution to rebuild the database (*.mdb), ideally with the use of some software which could identify all potential data issues, suggest corrections, allow for adjustments, and finally leave me with a fully relational database where the data can easily be identified and is not scattered all over the place.
I have some general knowledge of databases and SQL but haven't used Access much before, so I'm trying to save myself some time. And - if it matters - I don't care about database performance at all... Only the data itself. I will be extracting it to *.csv files later anyway...
Comments, suggestions and/or other considerations will be appreciated.
Thanks in advance
J.
I don't believe there is any software that will analyze an Access database and use some kind of artificial intelligence to generate a new database with good data and strong relationships.
My recommendation though is to export all the data into SQL Server (or even MySQL) and then work with it there. It's much easier to manipulate the data with a real query language instead of trying to scrub data in Access.
You can do mass updates, comparisons, joins, etc. with SQL Server. You can query the schema easily (write queries to see if a field appears in a table), change schemas/table definitions with code, etc.
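For example, once the Access tables are imported, a quick schema query like this finds every table containing a given column (the column name here is just illustrative):

    -- Find all imported tables that have a column whose name contains 'CustomerName'.
    SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE COLUMN_NAME LIKE '%CustomerName%'
    ORDER BY TABLE_NAME, COLUMN_NAME;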
Then once you're done you can use jobs (SSIS) to export the data to CSV.
(You can download SQL Express if you don't have/can't afford SQL Server.)

How to aggregate data from SQL Server 2005

I have about 150,000 rows of data written to a database every day. These rows represent outgoing articles, for example. Now I need to show a graph using SSRS that shows the average number of articles per day over time. I also need information about the actual number of articles from yesterday.
The idea is to have an aggregated view of all our transactions and have something that can indicate that something is wrong (for example, that we sent out 20% fewer articles than the average).
My idea is to have yesterday's data moved into SSAS every night and to store there the aggregated number of transactions as well as the actual number of transactions from yesterday's data. Using SSAS would hopefully speed up the reports.
Do you think this is the right idea? Should I skip SSAS and run reports straight on the raw data? I know how to use Reporting Services on raw data using standard SQL queries, but how would this change when querying SSAS? I don't know SSAS - where do I start?
The neat thing with SSAS is that you can get those indicators that you talk about quite easily either by creating calculated measures or by using KPIs.
I started with Delivering Business Intelligence with Microsoft SQL Server 2005. It has a good introduction, but unfortunately it's too verbose when it comes to the details. Still, if you want to understand SSAS, OLAP and reporting with this framework, it's a good start.
Mosha Pasumansky has a blog on SSAS and MDX with great links.
Other than that I would recommend Microsoft's Books Online.
Are you sure you aren't mixing up SSAS (Analysis Services) and SSIS (integration services)?
SSAS is not an ETL, it is an OLAP tool.
SSIS is an ETL tool.
I agree with everything that Rowan said; I was just confused by the terms.
SSIS is the ETL tool here: basically you get data from somewhere (your outgoing articles), do something to it (aggregate), and put it somewhere else (your aggregates table, data warehouse, etc). Check the link for details.
You probably won't be keeping all of the rows in the DB indefinitely, and if you want to be able to report on longer trends you will in any case need to do some kind of aggregation of historical data. So making the reports use this historical data store as their source makes sense. You can then use it to do all kinds of fancy reporting.
TL;DR: Define your aggregated history table with your future reporting needs in mind. Use SSIS to populate the table and refresh it with the daily updates. Report from that table. Further reading: star schemas and data warehousing.
#Sergio and #Rowan
Yes, we're not talking about loading and transforming data into the database (like an SSIS tool would do). That's solved using our integration platform.
#Riri, maybe SSAS is overkill for the situation you presented. If you only need to populate summarization tables daily, you can accomplish it by creating a regular job in SQL Server Agent and doing it in a regular T-SQL script.
I've used this approach for several years in a daily process to calculate business indicators from about 9 GB of new data per day. It works, it's fast, it's simple, and it uses a technology you're already used to. If your daily process gets more complicated (it needs to read from files, use FTP, send emails), you can move to an SSIS package (or any other ETL tool you like), but I cannot recommend using SSAS unless you need to provide OLAP capabilities to your users.
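To make the job idea concrete, here is a minimal sketch of the kind of T-SQL a daily SQL Server Agent job step could run, with invented table and column names (the date arithmetic is deliberately SQL Server 2005-friendly, since that's the version in the question):

    -- Summarize yesterday's outgoing articles into a small reporting table.
    DECLARE @Yesterday datetime;
    SET @Yesterday = DATEADD(day, DATEDIFF(day, 0, GETDATE()) - 1, 0);  -- midnight, yesterday

    INSERT INTO dbo.DailyArticleSummary (SummaryDate, ArticleCount)
    SELECT @Yesterday, COUNT(*)
    FROM dbo.OutgoingArticles
    WHERE SentDate >= @Yesterday
      AND SentDate <  DATEADD(day, 1, @Yesterday);

The SSRS report (and the "is yesterday 20% below average?" indicator) can then read from the summary table instead of the raw 150,000-row-per-day data.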
