Good Practice Question: Using DAX vs Doing Everything in the Database - sql-server

Is it better practice to do all of the calculations in SQL Server (or whatever database you are using) instead of DAX, avoiding DAX at all costs or using it only for minor things?
I'm inexperienced with DAX.
We are having some performance issues with DAX, so I was wondering what the best practice would be.

Data transformations and calculations should be performed at the lowest level at which they make sense, and in an environment you are comfortable and productive in.
So if all your data comes from a SQL Server you control, and you are comfortable performing data transformation and calculation tasks using TSQL, then you should do much of the prep work, modeling, and basic calculations there.
Note that TSQL is incapable of expressing complex business calculations that can be applied across arbitrary "filter contexts", so you will still use DAX for some measure calculations.
On the other hand, if you don't have a SQL Server you control, or you're mashing up data directly in Power BI, or you don't have a TSQL skillset, then you should do the data transformation in Power Query/DAX and the business calculations in DAX.
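As a hedged illustration of that division of labor (all table and column names here are invented), a small TSQL view can do the row-level cleanup and basic calculations before Power BI ever sees the data, leaving only the filter-context-dependent measures to DAX:

    -- Hypothetical source table dbo.RawSales(SaleDate, Region, Qty, UnitPrice)
    CREATE VIEW dbo.vw_SalesForBI
    AS
    SELECT  CAST(SaleDate AS date)      AS SaleDate,
            UPPER(LTRIM(RTRIM(Region))) AS Region,
            Qty,
            UnitPrice,
            Qty * UnitPrice             AS LineAmount  -- basic calculation done in TSQL
    FROM    dbo.RawSales
    WHERE   Qty IS NOT NULL;            -- simple cleansing done in TSQL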

Related

DWH: Why build cubes, when you already have a DB star schema?

My ETL process collects and transforms data into a database with facts and dimensions. So why should I build cubes out of this? Is there more to it than the speed benefit of queries and the pre-aggregation of values?
Thank you for helping
Facts and dimensions are tables that can be built in almost any relational database. To improve performance you can build aggregate fact tables in the same relational database. These tables are generic in that, with very little effort, you could move these tables from Oracle to SQL Server, as an example.
At the risk of over simplifying, a cube is a type of aggregate fact table but is built in a multi-dimensional database and is, normally, specific to that flavour of database. So if you build a cube in SSAS you couldn't move it to Hyperion Essbase.
For a simple query, such as sum transaction amount by date, cubes would not give you much/any benefit over facts. For complex queries, the performance tends to be significantly better than with facts.
Cubes normally support their own query languages (e.g., MDX and DAX for SSAS) that allow much more complex queries than can normally be written in SQL (without a lot of effort).
So whether you should build cubes depends on a lot of factors, such as:
Are you running a lot of complex queries that would perform better in a cube?
Would the improved performance be worth the effort/cost involved?
Is there a cost/benefit case for deploying a MOLAP (cube) database as well as your dimensional database? Cubes are normally populated from facts/dimensions so they are an addition to, rather than a replacement for, facts/dimensions
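To make the "aggregate fact table" idea from the answer above concrete, here is a minimal TSQL sketch against a hypothetical fact table; a cube takes the same idea further by pre-aggregating many more combinations:

    -- Hypothetical fact table: dbo.FactTransaction(TransactionDate, ProductKey, Amount)
    SELECT  TransactionDate,
            SUM(Amount) AS TotalAmount
    INTO    dbo.AggFactTransactionByDate   -- the aggregate fact table
    FROM    dbo.FactTransaction
    GROUP BY TransactionDate;
    -- Simple "sum by date" queries can now hit the small aggregate table
    -- instead of scanning the detailed facts.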

Which gives better performance for a connection with QlikView: SQL Server or an SSAS cube?

I want to process my data in QlikView, but I am confused about whether to process the data through a cube or directly from SQL.
Can anyone tell me which gives better performance: a cube or SQL?
Note: I have millions of rows in the database.
Generally as the volume of data grows, the advantages of SSAS tend to become more apparent than those from using SQL Server as the source. How will the data be used? When it comes to large scale aggregations SSAS becomes very beneficial. SSAS will also force a structured layout, as the relationships are predefined in the cube as opposed to joins. Some additional features that SSAS brings are hierarchical analysis (hierarchies) as well as ease of use with tools such as Excel and SSRS, although it sounds like you're only looking to use Qlikview for this. However, your best option would be to do a baseline for both SSAS and SQL Server in your environment with queries that best represent what would be run when this is implemented, and assess the results from there.
From the BI tool's perspective it doesn't matter, as you can connect to both sources (SQL is more common, but it depends on your expertise). Regarding performance, the best strategy is to have a separate extract layer and store the data incrementally as QVD files (for example, loading the previous day's data every night). With an incremental reload, source performance is not as important, since even big data sets should load quickly.
If your original source of data is SQL, in my opinion it doesn't make sense to replicate the data in three places (SQL, cube, and QlikView). It is better to connect directly to the source, save the raw data incrementally as QVD files, and then have a transformer layer which models that data.
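A rough TSQL sketch of the nightly incremental extract described above, assuming the source rows carry a LoadDate column (the table name is made up; the BI tool would append the result to a QVD file):

    -- Pull only the previous day's rows each night.
    DECLARE @LastLoad date = DATEADD(day, -1, CAST(GETDATE() AS date));

    SELECT  *
    FROM    dbo.FactSales          -- hypothetical source table
    WHERE   LoadDate >= @LastLoad; -- the incremental slice stored as QVD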

Native query performance in Hibernate

It's obvious that an HQL query is slower than a native one. There is a project in mind which will perform a huge number of small transactions. So the question is which will perform better:
a native query in JPA
a native query through JDBC
How big is the difference? Because of JPA's mapping capabilities and prepared statements, I would prefer it. But given the performance requirements, it could be that Hibernate will not be fast enough...
EDIT
replaced "Huge amount of data to process" to "huge amount of small transactions to perform"
You have given way too little info.
Huge amount of data to process
How huge is that? Like a few tens of millions of records, or a few hundred gigabytes?
How do you plan to process that data? If you pull it to the Java side via Hibernate (or a native query, or JDBC), then you are probably on the wrong track. You should keep the data in the database and process it there with the tools the database offers for that; they are light-years more performant than any client-side processing. Consider database-side processing (Oracle's PL/SQL, SQL Server's Transact-SQL).
What are you planning to optimize? Data insertion? Data retrieval? Are you planning to use select statements which dig through a lot of data? Then consider an OLAP solution over classic OLTP. OLAP solutions are built for business intelligence and for analyzing huge volumes of data with a few clever tricks. Google it (OLAP, decision cubes).
Can you use any of the capabilities of the underlying SQL engine? For example, if you are using Oracle, you have far more features available than you can actually reach through Hibernate. You simply cannot issue an Oracle Text query from Hibernate, for instance, and there are really a lot of things you cannot do.
I could sum this up by saying that the performance difference is not between native SQL and HQL. Instead:
Hibernate is powerful at what it was built for: handling relatively few records of data and optimistically caching them locally on the Java side, as a groundwork for data-processing systems built on databases which are not capable of data processing themselves (only of selects, inserts, updates, and deletes).
Once you really have to move huge amounts of data, Java-side processing is not an option. Programmers in the '80s invented stored procedures for exactly this reason. Pick a database which supports database-side processing to shortcut all the network round trips and the imperative for-loop processing in your Java code. Prefer as much declarative SQL as you can, and run the processing on your database.
Once you are about to start using the features of your database, Hibernate will be pretty much in your way. It is really built as an ORM wrapper, but data-processing problems are not always CRUD- and ORM-able problems. For example, Hibernate is not too useful for an OLAP use case.
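To make the "declarative SQL instead of Java for-loops" point concrete, here is a minimal TSQL sketch with a made-up table: one set-based statement does the work a client-side loop would do row by row, in a single network round trip.

    -- Processed entirely inside the database engine, instead of
    -- SELECT-all, loop in Java, UPDATE row by row.
    UPDATE  o
    SET     o.Status = 'EXPIRED'
    FROM    dbo.Orders AS o           -- hypothetical table
    WHERE   o.ExpiryDate < GETDATE();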
Huge amount of small transactions
In the case of having lots of small transactions (inserting/updating data in the database), Hibernate does not have any performance advantage, since no Java-side caching can be utilized, regardless of the database in use. However, you may still prefer Hibernate, since it is a nice tool for converting Java objects into SQL statements.
But with Hibernate, you are not at all in control of what is happening. For example, Hibernate + Oracle, inserting new entities into the database: this is the worst performance nightmare you can imagine.
This is what Hibernate does:
select one new id from a sequence
execute one insert
repeat a zillion times
This performs very badly. (The sequence reference should be part of the insert statement, and the whole insert should use JDBC batching.)
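In Oracle-flavored SQL (the sequence, table, and bind names are invented for illustration), the contrast looks roughly like this:

    -- What Hibernate does per entity: two round trips, repeated for every row.
    SELECT my_seq.NEXTVAL FROM dual;                            -- fetch one id
    INSERT INTO my_table (id, payload) VALUES (:id, :payload);  -- insert one row

    -- What is recommended instead: fold the sequence into the insert itself,
    -- then send many of these statements down in a single JDBC batch.
    INSERT INTO my_table (id, payload) VALUES (my_seq.NEXTVAL, :payload);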
I found that a JDBC-based approach runs about 1000 times quicker than Hibernate in this particular use case (prefetching the next 100 sequence values, using JDBC batch mode for Oracle, binding variables to the batch, sending down records in batches of 100, and using asynchronous commit, which is yet again something you cannot control in Hibernate).
I found that if I want to squeeze the most out of my tools, I need to learn them in depth. Unfortunately you can find lots of opinion-based comments, especially in the Hibernate vs. non-Hibernate wars on the net. Most of them are written by Java developers who have literally no idea what happens behind the scenes. So don't believe, measure :)

SSRS Best Practice - Data Calculations/Aggregation in SQL SP or in SSRS Expressions (VS/Report Builder)

Should I try to do all (or as many as possible) of the necessary calculations for an SSRS report in SQL code (stored procedures), such as sums and percentages, or should I do the calculations using expressions in Report Builder/VS?
Is there an advantage to doing one over the other?
In other words, should I try to keep the data in my Datasets very granular, detailed, low level and then just use Report Builder 3.0/VS to do all the necessary calculations/aggregations?
There is no one-size-fits-all best approach. In a lot of cases, SQL will be faster at performing aggregations than SSRS. SSRS will be faster at performing the kind of operations that would cause a table scan instead of an index seek when it's done in SQL.
Experience, common sense, and testing are the best guides.
Almost always you want to do your filtering and calculations on the server side. If you do it through a stored procedure, SQL Server can optimize the query and create a well-prepared, reusable query plan. You can examine the resulting query plan and optimize it. None of this is possible if you create and run the code on the client side; how would it use indexes on the client? If your report uses a lot of data, it will take much longer to run, and your users will blame you. The expression editor in BIDS is much poorer than the one in SSMS. Procs can be backed up and managed through SVN or TFS. Unless you know for sure that it runs faster on the client (and this is very rare), learn how to create stored procedures.
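As a hedged sketch of the server-side approach recommended above (all object names are hypothetical), the SSRS dataset would point at a parameterized procedure like this, keeping the filtering, summing, and percentages out of the report expressions:

    CREATE PROCEDURE dbo.rpt_SalesSummary
        @StartDate date,
        @EndDate   date
    AS
    BEGIN
        SET NOCOUNT ON;

        SELECT  Region,
                SUM(LineAmount) AS TotalSales,
                SUM(LineAmount) * 100.0
                  / SUM(SUM(LineAmount)) OVER () AS PctOfTotal  -- percentage in SQL, not in an SSRS expression
        FROM    dbo.FactSales                                   -- hypothetical fact table
        WHERE   SaleDate >= @StartDate
          AND   SaleDate <  DATEADD(day, 1, @EndDate)           -- index-friendly date filter
        GROUP BY Region;
    END;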

Need for OLAP cubes if we can Build views based directly off the RAW table

Assume that the tables in the source data are clean and in a state where they can be used directly.
I am trying to understand whether building views based off the RAW table is better than creating cubes. To make the views dynamic, we can have a .NET application which takes parameters for the view, executes it, and gets the data back for reporting and analysis.
Say I want to view the sales of a product for the United States in the month of February. I can create a view joining Product and Customer to get the sales for a particular day in February, instead of forming a star schema with Product, Date, and Customer dimensions. I am really trying to understand what standard a company should go with.
I have folks telling me that cubes are only good for analysis, not for reporting, and that whatever information we want, we can get by creating dynamic views.
Any advice or ideas on this?
Thanks!!
As the name suggests, SSAS (SQL Server Analysis Services) is indeed built for analysis. The reason for this is the denormalized dimensional structure (e.g., the star schema), which allows for super-efficient indexing combined with pre-processing of aggregated values.
Views are a great way to take data that already exists within your OLTP (as compared to OLAP) database and transform it in a manner that better fits your querying needs. This works in the same manner as "get" stored procedures.
Now for my opinion:
If you have a small amount of data (relative to the power of your server, as well as many other factors) and you're not performing intense aggregations of the data, consider using stored procedures to access your database. You can specify the parameter in .NET like any other function, making this method super easy.
If you have a lot of data (like, over 100 million rows), consider creating a cube. This will allow your queries to fly. There's a lot more work that goes into these, but the speed payoff is huge.
End note:
If the data in your reports is pretty similar to the data you already have in your database (including JOINing the tables) and you have under half a billion rows, just use a stored proc, and look into using SSRS (or not). If you have a ton of data that needs to be aggregated and transformed, look into SSAS OLAP cubes.
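One practical note on the question's "view with parameters" idea: plain SQL Server views cannot take parameters, so the dynamic view usually ends up as an inline table-valued function or a "get" stored procedure. A hedged sketch covering the question's own example (all schema names invented):

    -- Sales of one product, for one country, in one month,
    -- callable from .NET with ordinary parameters.
    CREATE PROCEDURE dbo.usp_GetProductSales
        @ProductName nvarchar(100),
        @Country     nvarchar(50),
        @Year        int,
        @Month       int
    AS
    BEGIN
        SET NOCOUNT ON;

        SELECT  s.SaleDate,
                SUM(s.Amount) AS SalesAmount
        FROM    dbo.Sales    AS s
        JOIN    dbo.Product  AS p ON p.ProductId  = s.ProductId
        JOIN    dbo.Customer AS c ON c.CustomerId = s.CustomerId
        WHERE   p.ProductName     = @ProductName
          AND   c.Country         = @Country
          AND   YEAR(s.SaleDate)  = @Year
          AND   MONTH(s.SaleDate) = @Month
        GROUP BY s.SaleDate;
    END;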
From my limited experience with Microsoft's Analysis Services, I would agree with Norla. If the execution time of the view is reasonable, that would be the way to go. Cubes can certainly be reported against, as SQL Reporting Services accommodates them fairly well, but the development process can often be much more involved when using a cube as your data source.
Building views can be an alternative for small datasets. You could consider going that route, BUT:
1) once the reports start taking a long time to load, or
2) the reporting slows the transactional systems,
then you'll have to consider cubes.
