Tableau Peformance- Custom SQL Queries - sql-server

I am essentially building gone report that ingests two types of data. One is the receptionists data. Which is each receptionists stats by day. But then the data gets a little more granular and is each call for each receptionist.
Essentially the report does two things gives receptionist performance then a person can click and prompt the same dashboard sheet to update with specific call log etc.
So basically this data set is huge and held as an export so it will be faster an I limit the data to this month and last month (minimum requirement). I have also eliminated any unnecessary columns.
I am curious if I should create two separate custom queries in Tableau then create referential field or should I bring both custom queries inside of one workbook and join them together. At first I had the two connections separate but now I brought them together and am noticing some performance issues. What are some of my options?

It would be better to have two seperate queries since for the first view doesnt need all the additional details you want to show in the drill down.
Use an action filter and link the two sheets(which use different data sources) by selecting the specific fields when configuring the action filter.
Performance wise this is a good approach.

Related

Creating an Efficient (Dynamic) Data Source to Support Custom Application Grid Views

In the application I am working on, we have data grids that have the capability to display custom views of the data. As a point of reference, we modeled this feature using the concept of views as they exist in SharePoint.
The custom views should have the following capabilities:
Be able to define which subset of columns (of those that are
available) should be displayed in the view.
Be able to define one or
more filters for retrieving data. These filters are not constrained
to use only the columns that are in the result set but must use one
of the available columns. Standard logical conditions and operators
apply to these filters. For example, ColumnA Equals Value1 or
ColumnB >= Value2.
Be able to define a set of columns that the data will be sorted by. This set of columns can be one or more columns
from the set of columns that will be returned in the result set.
Be
able to define a set of columns that the data will be grouped by.
This set of columns can be one or more columns from the set of
columns that will be returned in the result set.
I have application code that will dynamically generate the necessary SQL to retrieve the appropriate set of data. However, it appears to perform poorly. When I run across a poorly performing query, my first thought is to determine where indexes might help. The problem here is that I won't necessarily know which indexes need to be created as the underlying query could retrieve data in many different ways.
Essentially, the SQL that is currently being used does the following:
Creates a temporary table variable to hold the filtered data. This table contains a column for each column that should be returned in the result set.
Inserts data that matches the filter into the table variable.
Queries the table variable to determine the total number of rows of data.
If requested, determines the grouping values of the data in the table variable using the specified grouping columns.
Returns the requested page of the requested page size of data from the table variable, sorted by any specified sort columns.
My question is what are some ways that I may improve this process? For example, one idea I had was to have my table variable only contain the columns of data that are used to group and sort and then join in the source table at the end to get the rest of the displayed data. I am not sure if this would make any difference which is the reason for this post.
I need to support versions 2014, 2016 and 2017 of SQL Server in addition to SQL Azure. Essentially, I will not be able to use a specific feature of an edition of SQL Server unless that feature is available in all of the aforementioned platforms.
(This is not really an "answer" - I just can't add comments yet because my reputation score isn't high enough yet.)
I think your general approach is fine - essentially you are making a GUI generator for SQL. However a few things:
This type of feature is best suited for a warehouse or read only replica database. Do not build this on a live production transactional database. There are permutations that you haven't thought of that your users will find that will kill your database (it's also true from a warehouse standpoint, but they usually don't have response time expectations as a transactional database)
The method you described for doing paging is not efficient from a database standpoint. You are essentially querying, filtering, grouping, and sorting the same exact dataset multiple times just to cherry pick a few rows each time. If you have the data cached, that might be ok, but you shouldn't make that assumption. If you have the know how, figure out how to snapshot the entire final data set with an extra column to keep the data physically sorted in the order the user requested. That way you can quickly query the results for your paging.
If you have a Repository/DAL layer, design your solution so that in the future certain combinations of tables/columns can utilize hardcoded queries/stored procedures. There will inevitably be certain queries that pop up that cause you performance issues and you may have to build a custom solution for specific queries in order to get the desired performance that can't be obtained by your dynamic sql

Excel Scalability and Speed Issues (VBA, Array and Comboboxes)

Context
There are two excel.workbooks in the same location: database and dashboards. Whereas database.workbook has as many tabs as clients I manage, dashboard.workbook has as many tabs as reports are required.
Navigation across report's (dashboard.worksheets) it's pretty simple. On each report there's a combobox that contains every dashboard.worksheets' names. Selecting any report on that combobox hides the current worksheet/report and open the desired one.
In each tab/report there is a second combobox that allows you to select a client, populating the report with the selected client's data.
The report
The information in the database looks like this:
Date|Device|Group|Subgroup|metric1|metric2|metric3|etc.
The information displayed in the report (in the one I'm having issues with) looks like this:
Group|metric1|2|3|...
The issues
1) Currently the group is displayed like this:
=IFERROR(LOOKUP(2,1/(COUNTIF($C$17:C18,IF($C$8="Goldsmiths",Client1_GroupName,IF($C$8="Client2",Client2_GroupName,IF($C$8="Client3",Client3_GroupName,IF($C$8="Client4",Client4_GroupName)))))=0),IF($C$8="Client1",Client1_GroupName,IF($C$8="Client2",Client2_GroupName,IF($C$8="Client3",Z2,Client3_GroupName($C$8="Client4",Client4_GroupName))))),"")
The combobox prints its value into Range("C8"). Through a nested ifs structure the formula identifies the client and then pulls a unique list of groups from the selected client tab (from database.workbook).
One issue is that it is very messy and hard to escalate (the more clients I get, its complexity growth exponentially). I bet there are easiest ways to do it (maybe VBA?).
It can be quite slow, the more "groups" we get and more days recorded into the database, more slow it will get.
2) Pulling the data
Most of the data to pull can be done through array formulas like this one:
={SUM((Client1_GroupName=C20)*Metric1)}
It sums all the Metric1 for the group matching C20,C21,22,23 (in that c20:xx range we have the first formula pulling the Group list.
I haven't added the nested ifs yet. It's going to be a pain to do it across 5 more columns. Again very hard to escalate.
This can be terribly slow. It comes a point that changing client means waiting 2 or 3 minutes to process the array.
Conclusions
I guess what I'm seeking is some advice on how to face this issues, which essentially are: scalability and speed.

Insert New Record Before Existing Recordds in MS SQL Server

I'm working with an existing MS SQL database and ASP.NET web application. An update is needed in a table, but in order to add the new data and have it display correctly in the site, I need to be able to take a series of existing records, essentially "push them down", and then add the new data in the open space created.
Is there a cleaner, more efficient method than by just creating a new record that's a copy of the last related record, and then essentially doing a copy-and-paste for the remaining records until I reach the insertion point? There are quite a few records to move and I'd prefer something that isn't as mind-numbing and potentially error-prone as that.
I know the existing site and database isn't designed optimally for inserting new data into this table unless it's added to the end, but reconfiguring the database and stored procedures is not an option I presently have.
-- EDIT --
For additional requested information...
Screen shot of table definition:
Screen shot of some table data (filtered by TemplateID):
When looking at the table data, there are a couple other template ID values that bring back a bit more complex data. The issue is that this data needs to maintain this order, which happens to be the order in which it has been entered, since it gets returned and displayed in the shown order. The new data needs to be entered prior to one of these lettered subject headers. Honestly, I think this is not the best way to do this, but I had no hand in the design. It was created by a different company, and mine was hired to handle updates and maintenance after the creators became unpleasant to work with. A different template ID value brings back two levels of headers, which doesn't make my task any easier or alterations much cleaner considering the CS code that calls the stored procedures is completely separated from the code that builds the contents of the pages, and the organizational structure i tough to follow. There are some very poor naming conventions in places.
At any rate, there needs to be an insertion into this group of data under the "A" header value. The same needs to occur with another chunk associated with a different template ID, and there is another main header below the insertion point:

To use or not to use computed columns for performance and maintainability

I have a table where am storing a startingDate in a DateTime column.
Once i have the startingDate value, am supposed to calculate the
number_of_days,
number_of_weeks
number_of_months and
number_of_years
all from the startingDate to the current date.
If you are going to use these values in two or more places in the application and you do care much about the applications response time, would you rather make the calculations in a view or create computed columns for each so you can query the table directly?
Computed columns are easy to maintain and provide an ideal solution to your problem – I have used such a solution recently. However, be aware the values are calculated when requested (when they are SELECTed), not when the row is INSERTed into the table – so performance might still be an issue. This might be acceptable if you can off-load work from the application server to the database server. Views also don’t exist until they are requested (unless they are materialised) so, again, there will be an overhead at runtime, but, again it’s on the database server, not the application server.
Like nearly everything: It depends.
As #RedX suggest it probably not much of a performance difference either way, so it becomes a question of how will use them. To me this is more of a feel thing.
Using them more than once doesn't wouldn't necessary drive me immediately to either a view or computed columns. If I only use them in a few places or low volume code paths I might calc them in-line in those places or use a CTE. But if the are in wide spread or heavy use I would agree with a view or computed column.
You would also want them in a view or cc if you want them available via ORM tools.
Am I using those "computed columns" individual in places or am I using them in sets? If using them in sets I probably want a view of the table that shows included them all.
When i need them do I usually want them associated with data from a particular other table? If so that would suggest a view.
Am I basing updates on the original table of those computed values? If so then I want computed columns to avoid joining the view in these case.
Calculated columns may seem an easy solution at first, but I have seen companies have trouble with them because when they try to do ETL with CDC for real-time Change Data Capture with tools like Attunity it will not recognize the calculated columns since the values are not there permanently. So there are some issues. Also if the columns will be retrieve many, many times by users, you will save time in the long run by putting that logic in the ETL tool or procedure and write it once to the database instead of calculating it many times for each request.

Bitemporal Database Design Question

I am designing a database that needs to store transaction time and valid time, and I am struggling with how to effectively store the data and whether or not to fully time-normalize attributes. For instance I have a table Client that has the following attributes: ID, Name, ClientType (e.g. corporation), RelationshipType (e.g. client, prospect), RelationshipStatus (e.g. Active, Inactive, Closed). ClientType, RelationshipType, and RelationshipStatus are time varying fields. Performance is a concern as this information will link to large datasets from legacy systems. At the same time the database structure needs to be easily maintainable and modifiable.
I am planning on splitting out audit trail and point-in-time history into separate tables, but I’m struggling with how to best do this.
Some ideas I have:
1)Three tables: Client, ClientHist, and ClientAudit. Client will contain the current state. ClientHist will contain any previously valid states, and ClientAudit will be for auditing purposes. For ease of discussion, let’s forget about ClientAudit and assume the user never makes a data entry mistake. Doing it this way, I have two ways I can update the data. First, I could always require the user to provide an effective date and save a record out to ClientHist, which would result in a record being written to ClientHist each time a field is changed. Alternatively, I could only require the user to provide an effective date when one of the time varying attributes (i.e. ClientType, RelationshipType, RelationshipStatus) changes. This would result in a record being written to ClientHist only when a time varying attribute is changed.
2) I could split out the time varying attributes into one or more tables. If I go this route, do I put all three in one table or create two tables (one for RelationshipType and RelationshipStatus and one for ClientType). Creating multiple tables for time varying attributes does significantly increase the complexity of the database design. Each table will have associated audit tables as well.
Any thoughts?
A lot depends (or so I think) on how frequently the time-sensitive data will be changed. If changes are infrequent, then I'd go with (1), but if changes happen a lot and not necessarily to all the time-sensitive values at once, then (2) might be more efficient--but I'd want to think that over very carefully first, since it would be hard to manage and maintain.
I like the idea of requiring users to enter effective daes, because this could serve to reduce just how much detail you are saving--for example, however many changes they make today, it only produces that one History row that comes into effect tomorrow (though the audit table might get pretty big). But can you actually get users to enter what is somewhat abstract data?
you might want to try a single Client table with 4 date columns to handle the 2 temporal dimensions.
Something like (client_id, ..., valid_dt_start, valid_dt_end, audit_dt_start, audit_dt_end).
This design is very simple to work with and I would try and see how ot scales before going with somethin more complicated.

Resources