Power BI dealing with a 16 GB CSV file

I have a 16 GB CSV that I have imported into Power BI Desktop. The workstation I am using is an Azure VM running Windows Server 2016 (64 GB of memory). The import of the file takes a few seconds; however, when I try to filter the data set in the Query Editor to a specific date range, it takes a fairly long time (it is still running and has been going for around 30 minutes so far). The source file (16 GB CSV) is being read from a RAM disk that has been created on the VM.
What is the best approach/practice when working with data sets of this size? Would I get better performance importing the CSV into SQL Server and then using DirectQuery when filtering the data set to a date range? I would have thought it would run fairly quickly with my current setup, as I have 64 GB of memory available on that VM.

When the data size is significant, you also need appropriate computing power to process it. When you import these rows into Power BI, Power BI itself needs this computing power. If you import the data into SQL Server (or Analysis Services, or another engine) and use DirectQuery or a Live Connection, you can delegate the computation to the database engine. With a Live Connection all your modeling is done on the database engine, while with DirectQuery some modeling is also done in Power BI and you can add computed columns and measures. So if you use DirectQuery, you still must be careful about what is computed where.
You ask for "the best", which is always a bit vague; you must decide for yourself depending on many other factors. Power BI is Analysis Services by itself (when you run Power BI Desktop you can see the Microsoft SQL Server Analysis Services child process running), so importing the data into Power BI should give you similar performance to importing it into SSAS. To improve performance in this case, you need to tune your model. If you import the data into SQL Server, you need to tune the database (proper indexing and modeling).
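For example, a minimal sketch of the SQL Server route (the table, file path, and column names below are illustrative, not taken from your data) is to bulk load the CSV and index the column you filter on:

    -- Illustrative schema; adjust the path, columns and types to your file
    CREATE TABLE dbo.Readings
    (
        ReadingDate date          NOT NULL,
        Category    varchar(50)   NOT NULL,
        Amount      decimal(18,2) NOT NULL
    );

    BULK INSERT dbo.Readings
    FROM 'D:\data\readings.csv'
    WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

    -- An index on the date column lets a DirectQuery date-range filter
    -- seek instead of scanning the whole table
    CREATE INDEX IX_Readings_ReadingDate ON dbo.Readings (ReadingDate);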
So to reach a final decision you must test these solutions, consider pricing and hardware requirements, and based on that decide what is best for your case.
Recently, Microsoft gave a demo with 1 trillion rows of data; you may want to take a look at it. I would also recommend taking a look at aggregations, which could help you improve the performance of your model.

Related

Long loading times after creating Availability Groups and migrating SQL Server

So I have this issue. Our client uses MS SQL databases. Two months ago they migrated their databases from an earlier version on Standard edition to SQL Server 2019 Enterprise.
The major reason was to secure high availability through the Availability Groups feature in MS SQL.
After that, our application got really slow. Simply put, the customer starts the app, selects a workspace, and then it takes around 15 seconds to load the data.
The first step just sends a request to the database to select data - no inserts, deletes, or any heavy processing.
The app works with geographical and geometry data; every geo object is saved in the database as the geometry data type. The first big select is what causes the slowness.
When I looked at Activity Monitor, the only thing under wait categories that seemed suspicious to me was the type Other.
In the database I don't see any high-cost queries, and the availability group mode is set to synchronous.
If I'm getting this right, synchronous mode should not be the cause of this problem, because this database is clearly used for reading data, not, as I mentioned, for modifying it.
I made changes to some instance parameters: I set Optimize for Ad hoc Workloads to True and raised the cost threshold for parallelism from 5 to 20.
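For reference, the changes were along these lines (a sketch, not the exact script I ran):

    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    -- Cache only a plan stub the first time an ad hoc query is seen
    EXEC sp_configure 'optimize for ad hoc workloads', 1;
    RECONFIGURE;
    -- Raise the cost threshold for parallelism from the default 5 to 20
    EXEC sp_configure 'cost threshold for parallelism', 20;
    RECONFIGURE;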
Another thing I tried was creating a new app source database and a database containing the geo data inside that SQL instance, without adding them to availability groups.
From the application, for test purposes, we use a connection to that one instance with the new test databases.
Neither of these changes worked. So if you have any idea or any experience with this, please help me.
Here is a screenshot of the top 10 waits from the sys DMVs.
1 - Stats recompute...
When you move from one SQL Server version to a higher one, you must first change the compatibility level (to get the performance benefits) and then recompute all statistics in the database with a FULLSCAN. Why? Because each version of SQL Server comes with a new optimizer that has new operators, new algorithms, and many improvements. To keep pace with the new optimizer, the method of computing statistics and the form of their results is rethought with each revision of the engine, so much so that using old statistics with a new engine is like using the 1930 census to plan roads, schools, and hospitals for today's population.
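A rough sketch of those two steps (the database name is a placeholder; 150 is the SQL Server 2019 compatibility level, and sp_MSforeachtable is undocumented but commonly used for this):

    -- Placeholder database name; 150 = SQL Server 2019 compatibility level
    ALTER DATABASE [YourDatabase] SET COMPATIBILITY_LEVEL = 150;

    -- Recompute every statistic in the current database with a full scan
    EXEC sp_MSforeachtable 'UPDATE STATISTICS ? WITH FULLSCAN';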
2 - SQL Server Editions...
When moving SQL Server from Standard to Enterprise, you need to increase the "hardware" (even if it is a VM), because many features that run under the Enterprise edition and do not exist in Standard need more computational resources. As an example, enabling AUTO_UPDATE_STATISTICS_ASYNC automatically uses one more thread, to the detriment of other processes. By comparison, a Rolls Royce or a Hummer is arguably more comfortable and faster than a Volkswagen, but it needs more fuel and more expensive insurance!
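For reference, that option is a per-database setting and looks roughly like this (the database name is a placeholder; it only has an effect when AUTO_UPDATE_STATISTICS is also ON):

    -- Update statistics on a background thread instead of making the
    -- triggering query wait for the update to finish
    ALTER DATABASE [YourDatabase] SET AUTO_UPDATE_STATISTICS_ASYNC ON;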
3 - Synchronous AGs...
Synchronous AlwaysOn availability groups need a very fast and faultless network. If that is not the case, the replication of update requests can drag performance down, especially if you are using pessimistic locking (the default mode).
4 - Transaction logs...
One common global performance problem is latency when writing to the transaction log.
5 - Tempdb files...
Another common global performance problem is latency when accessing the tempdb files.
For those two file problems, use Glenn Berry's file latency query, which will give you an indication. Good values are under 7 ms for reads and under 15 ms for writes.
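A minimal sketch of that kind of file-latency check (not Glenn Berry's exact script) built on sys.dm_io_virtual_file_stats:

    SELECT  DB_NAME(vfs.database_id) AS database_name,
            mf.physical_name,
            -- average latency per I/O, in milliseconds
            CASE WHEN vfs.num_of_reads  = 0 THEN 0
                 ELSE vfs.io_stall_read_ms  / vfs.num_of_reads  END AS avg_read_ms,
            CASE WHEN vfs.num_of_writes = 0 THEN 0
                 ELSE vfs.io_stall_write_ms / vfs.num_of_writes END AS avg_write_ms
    FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    JOIN    sys.master_files AS mf
            ON  mf.database_id = vfs.database_id
            AND mf.file_id     = vfs.file_id
    ORDER BY avg_write_ms DESC;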
CONCLUSION
Many other factors can contribute to slowing down your system, but without more information we cannot help you further...

SQL Server Analysis Services still needed if using Power BI?

I have a project that requires using SQL Server Analysis Services, but we've also started looking at PowerBI.
I'm not entirely clear on how PowerBI functions, and where the computations/data storage takes place. If we use PowerBI for generating the analytics, is there still a benefit to having an Analysis Services layer?
To Use Analysis Services or Not?
It depends. If you already have an Analysis Services (SSAS) model, as Caio mentions, then I wouldn't get rid of it. Power BI works very well with Analysis Services and Analysis Services is going to offer a lot of enterprise-grade options that Power BI isn't going to improve upon (such as the ability to handle millions of new rows each day).
However, if you don't have an Analysis Services model already, SSAS isn't a prerequisite for using Power BI. As Mike mentions, Power BI is fully featured by itself and can easily handle most needs (importing data, modeling the data, and then visualizing the data).
To answer your question about computation and storage, Power BI has a number of layers:
An ETL layer (M). This is how data is brought into your model.
A modeling layer (DAX). This is where the data is stored, and where calculations run.
A visualization layer.
When you use Power BI with Analysis Services in Direct Query mode, then the ETL & data modeling side of things are handled by SSAS. All computation & data storage happens in Analysis Services and Power BI becomes a visualization layer only, sending queries to Analysis Services as needed for your reports.
When you don't have Analysis Services (and are using Data Import mode), then the data is stored in Power BI and all the computations run inside Power BI too.
Pros & Cons of Each Option
The advantage of using Power BI without SSAS is speed of delivery. Everything is handled in one file by one person. If you need to change your data model to make a report work, you can do that within Power BI. When you have a SSAS model, making changes to your data model can be cumbersome (partly because you have to use another tool and partly because any changes will affect all users).
The advantage of using Power BI with SSAS is scalability. Configured correctly, a single Analysis Services model can grow to handle hundreds of gigabytes, hundreds of reports, and hundreds of users with no issue. Analysis Services offers a level of enterprise robustness that goes beyond what you'd want a Power BI file to handle.
That said, introducing Analysis Services brings a number of disadvantages: most importantly, licensing & maintaining a SQL Server & keeping that server up-to-date. Power BI Desktop is updated monthly and is a quick download to get the latest & greatest DAX features. Using SSAS means you have to wait for new releases of SQL Server that include the same DAX features, then test & install them.
Conclusion
If you're not dealing with vast amounts of data (e.g. millions of new rows each month), one way to know whether you need the enterprise-grade features of Analysis Services is to think about the reports needed at the end of the project. If there are a dozen or fewer reports and you plan to build them all yourself, then Power BI alone offers a lot of advantages. If, on the other hand, there's a whole department of report writers waiting for you to build a data model, then Analysis Services is the way to go.
Sidenote
What's more important than Analysis Services vs. Power BI for ETL/modeling is getting your data model right. A poor data model will be slow using either tool. A well-designed data model will be fast using either option. Make sure to spend plenty of time understanding best practices when it comes to modeling your data. "Analyzing Data with Power BI and Power Pivot for Excel" by Alberto Ferrari & Marco Russo is well worth picking up if you're new to data modeling & BI in general. (Not saying you are.)
Yes, you absolutely need to keep your Analysis Services layer (and other data sources you might have). Power BI is a reporting tool and should receive data pre-aggregated as much as possible, enough to be able to plot charts, display tables, apply filters, etc. The heavy lifting is done at the data source level.
There are a number of limitations in Power BI, and you should plan for that.
For instance:
There is a 1 million row limit for returning data when using DirectQuery. This does not affect aggregations or calculations used to create the dataset returned using DirectQuery, only the rows returned. For example, you can aggregate 10 million rows with your query that runs on the data source, and accurately return the results of that aggregation to Power BI using DirectQuery as long as the data returned to Power BI is less than 1 million rows. If more than 1 million rows would be returned from DirectQuery, Power BI returns an error.
https://powerbi.microsoft.com/en-us/documentation/powerbi-desktop-use-directquery/
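In practice that means pushing the aggregation to the source so that only the grouped rows cross the 1-million-row boundary. A hypothetical example (the table and column names are made up):

    -- The source engine aggregates millions of detail rows;
    -- DirectQuery only has to return one row per day and category
    SELECT  SalesDate,
            Category,
            SUM(Amount) AS TotalAmount,
            COUNT(*)    AS DetailRows
    FROM    dbo.FactSales
    GROUP BY SalesDate, Category;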
You probably don't need a separate Analysis Services instance - that is only for very large models. In the default Import mode you are only limited by a model size of 1GB for Free or Pro accounts. Due to effective data compression this can be many millions of rows; a rough basis for estimation would be 50m rows in 1GB. Performance is excellent.
Power BI actually spins up an internal Analysis Services instance when a model is in use, which handles all the analytic/calculation requirements. When using Power BI Desktop this runs on your PC (you can watch it in the Task Manager). When using the web service it runs in the cloud. With Power BI Report Server it runs on an on-premise server. You can connect to any of those using Excel Pivot Tables etc, just as you would with regular Analysis Services.

Alternatives to OLAP SSAS Cube Pivot Tables in Excel

I am accessing OLAP SSAS Cubes on a 2005 SQL Server using Excel 2007 pivot tables and finding that refreshing some of the tables is taking >10 minutes. My coworkers seem to think it is a sad reality, but I am wondering if there are alternatives I should be looking into.
Some thoughts I have had:
Obviously if I could upgrade the server hardware I would, but I am merely an analyst with no such powers, so I don't think hardware improvements are a great option. The same is true of moving to a newer version of SQL Server, which I imagine would also speed up the process.
Would updating to a newer version of Excel speed up the process?
I came across this: http://olappivottableextend.codeplex.com/, which gives me access to the MDX, which is apparently comically inefficient (sounds like the VBA macro recorder to me). Would changing the MDX around be an option? I know a bit of it, and the queries it generates for the pivot tables don't seem that complicated.
Would running the MDX outside of Excel be an option? I can write the queries, but I imagine it would not be as simple as the pivot table is.
It just seems like OLAP Cubes are a great solution in a lot of ways and these are some massive pivot tables processing quite a bit of information, but if there is a reasonable way to speed up the whole process I would love to know more about it.
Thanks for your thoughts SO.
There are many ways to access SSAS cubes, but it depends on what you are trying to achieve.
Excel tends to be used by business because
It's already installed
It is a familiar business tool
Easy to use
Requires no developer intervention
Other alternatives to Excel to access the cube include
SQL Server Analysis Services (Management Studio) via the cube browser or MDX directly
SQL Server Reporting Services
Bespoke development (such as C#) utilising an AdomdConnection
SQL Server (Management Studio) via OPENQUERY over a linked server (see the sketch after this list)
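As a rough sketch of the OPENQUERY route (the linked server name SSAS_LINK and the cube/measure names are placeholders, not something from your environment):

    -- Query the cube from the relational engine through a linked server
    SELECT *
    FROM OPENQUERY(SSAS_LINK,
        'SELECT [Measures].[Sales Amount] ON COLUMNS,
                [Date].[Calendar Year].MEMBERS ON ROWS
         FROM [Adventure Works]');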
If you have been using Excel to access the cube so far, you will probably decide that none of the other tools quite cover your needs and you will end up sticking with it.
Assuming that Excel is the right tool for you, you should then move on to why it is slow. The list of possibilities (not including hardware / software) is long, but here are some:
It could be external contention (outside your project) on network / database / disk resources. The volume of data may be accumulating over time.
The cube may not be partitioned.
The questions you ask of it may be getting more complex.
The cube aggregations may not be suited to your needs.
The cube structure may be inefficient because it supports many-to-many relationships.
User / query volume may have increased.
To try to address the problem I would
Assess the data that you require within the cube (and maybe limit the cube to a rolling x month window)
Log your queries and apply Usage Based Optimisation
Monitor cube usage via SQL Server Profiler
Review the structure of your cube design
Attempt similar queries with other tools (both across the network and local to the cube) to establish where the issue lies
These two resources may help if you establish that Excel is the weak point: "Excel, Cube Formulas, Analysis Services, Performance, Network Latency, and Connection Strings" (the blog post, or the version on page 57 of SQLCAT's Guide to BI and Analytics).

How to decide which BI tool is suitable in a particular scenario, SSAS or OBIEE?

For a large database, which one gives better performance? Which one will return data fastest when executing MDX queries?
For a 200GB data set, SSAS on an ordinary wintel server will be fine. It plays nicely with readily available front-end tools and will be much, much cheaper than Oracle unless you already have incumbent OBIEE licensing.

realtime system database use

Given a .NET environment with Windows CE, can you persist thousands of records per second in a local database (SQL Server 2008 Standard or CE)?
What are the performance issues with persisting realtime instrument data in a database versus a log file?
SQL Server 2008 Standard is more than capable of those insertion rates, PROVIDED you have hardware capable of supporting it.
The question you really need to be asking is: do I need the ability to search the captured data quickly?
This SO answer might be of interest: What does database query and insert speed depend on?
The number (and width) of indexes on a table will obviously have an impact on insertion rate.
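As an illustration (the table and column names are made up), the kind of design that sustains high insert rates is a narrow table with a single clustered index and no secondary indexes:

    -- One ever-increasing clustered key, no secondary indexes:
    -- each insert appends to a single B-tree and stays cheap
    CREATE TABLE dbo.InstrumentReadings
    (
        ReadingId    bigint IDENTITY(1,1) NOT NULL,
        InstrumentId int          NOT NULL,
        ReadingTime  datetime2(3) NOT NULL,
        Value        float        NOT NULL,
        CONSTRAINT PK_InstrumentReadings PRIMARY KEY CLUSTERED (ReadingId)
    );
    -- Add reporting indexes later (or on a copy of the data) once capture
    -- requirements are met; every extra index slows every insert.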
If you are considering open-source, then MySQL is often cited as being able to handle high volumes.
