Mutually exclusive facts. Should I create a new dimension in this case? - sql-server

There is a star schema that contains 3 dimensions (Distributor, Brand, SaleDate) and a fact table with two fact columns: SalesAmountB measured in boxes as the integer type and SalesAmountH measured in hectolitres as the numeric type. The end user wants to select which fact to show in a report. The report is going to be presented via SharePoint 2010 PPS.
Please help me determine which option suits this case best:
1) Add a new dimension such as "Units" with two values, Boxes and Hectolitres, and use the built-in filter for this dimension. (The data types of the two facts are incompatible, though.)
2) Make two separate tables for the two facts and build two cubes. Then select either as the datasource.
3) Leave the model as it is and use the PPS API in SharePoint to select the fact to show.
So any ideas?

I think the best way to implement this is to keep separate columns for SalesAmountB and SalesAmountH in the fact table, then create two separate measures in BIDS and control their visibility through MDX. This avoids the complexity of duplicating the whole data set or creating separate cubes.

Related

Creating an Efficient (Dynamic) Data Source to Support Custom Application Grid Views

In the application I am working on, we have data grids that have the capability to display custom views of the data. As a point of reference, we modeled this feature using the concept of views as they exist in SharePoint.
The custom views should have the following capabilities:
Be able to define which subset of columns (of those that are available) should be displayed in the view.
Be able to define one or more filters for retrieving data. These filters are not constrained to use only the columns that are in the result set but must use one of the available columns. Standard logical conditions and operators apply to these filters. For example, ColumnA Equals Value1 or ColumnB >= Value2.
Be able to define a set of columns that the data will be sorted by. This set of columns can be one or more columns from the set of columns that will be returned in the result set.
Be able to define a set of columns that the data will be grouped by. This set of columns can be one or more columns from the set of columns that will be returned in the result set.
I have application code that will dynamically generate the necessary SQL to retrieve the appropriate set of data. However, it appears to perform poorly. When I run across a poorly performing query, my first thought is to determine where indexes might help. The problem here is that I won't necessarily know which indexes need to be created as the underlying query could retrieve data in many different ways.
Essentially, the SQL that is currently being used does the following:
Creates a temporary table variable to hold the filtered data. This table contains a column for each column that should be returned in the result set.
Inserts data that matches the filter into the table variable.
Queries the table variable to determine the total number of rows of data.
If requested, determines the grouping values of the data in the table variable using the specified grouping columns.
Returns the requested page of the requested page size of data from the table variable, sorted by any specified sort columns.
My question is what are some ways that I may improve this process? For example, one idea I had was to have my table variable only contain the columns of data that are used to group and sort and then join in the source table at the end to get the rest of the displayed data. I am not sure if this would make any difference which is the reason for this post.
I need to support versions 2014, 2016 and 2017 of SQL Server in addition to SQL Azure. Essentially, I will not be able to use a specific feature of an edition of SQL Server unless that feature is available in all of the aforementioned platforms.
(This is not really an "answer" - I just can't add comments yet because my reputation score isn't high enough yet.)
I think your general approach is fine - essentially you are making a GUI generator for SQL. However a few things:
This type of feature is best suited to a warehouse or read-only replica database. Do not build this on a live production transactional database. Your users will find permutations you haven't thought of that will kill your database (this is also true of a warehouse, but a warehouse usually doesn't carry the response-time expectations of a transactional database).
The method you described for paging is not efficient from a database standpoint. You are essentially querying, filtering, grouping, and sorting the exact same dataset multiple times just to cherry-pick a few rows each time. If you have the data cached, that might be OK, but you shouldn't make that assumption. If you have the know-how, figure out how to snapshot the entire final data set with an extra column that records the order the user requested. That way you can quickly query the results for your paging.
If you have a Repository/DAL layer, design your solution so that in the future certain combinations of tables/columns can use hardcoded queries or stored procedures. Certain queries will inevitably cause performance issues, and you may have to build a custom solution for those specific queries to get performance that your dynamic SQL can't deliver.
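The snapshot idea from the paging point above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the asker's code: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the table names (orders, view_snapshot), the columns and the helper functions are all hypothetical. The filtered and sorted result is materialized once with a sequence column, and every page after that is a range lookup on that column.

```python
import sqlite3


def build_snapshot(conn, where_sql, params, order_sql):
    """Run the expensive filter + sort once and store the result with a
    sequence number (seq) that records the requested order."""
    # Assumption: where_sql / order_sql come from a generator that only
    # emits whitelisted column names; user values travel through params.
    conn.execute("DROP TABLE IF EXISTS view_snapshot")
    conn.execute(
        "CREATE TABLE view_snapshot ("
        "seq INTEGER PRIMARY KEY, id INTEGER, customer TEXT, amount REAL)"
    )
    rows = conn.execute(
        f"SELECT id, customer, amount FROM orders "
        f"WHERE {where_sql} ORDER BY {order_sql}",
        params,
    ).fetchall()
    conn.executemany(
        "INSERT INTO view_snapshot (seq, id, customer, amount) "
        "VALUES (?, ?, ?, ?)",
        [(seq, *row) for seq, row in enumerate(rows, start=1)],
    )
    return len(rows)  # total row count, known without a second query


def fetch_page(conn, page, page_size):
    """Return one page with a range lookup on the seq primary key."""
    first = (page - 1) * page_size + 1
    last = page * page_size
    return conn.execute(
        "SELECT id, customer, amount FROM view_snapshot "
        "WHERE seq BETWEEN ? AND ? ORDER BY seq",
        (first, last),
    ).fetchall()


def demo():
    """Hypothetical data: ten orders, filtered and sorted once, paged twice."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(i, f"cust{i % 3}", float(i * 10)) for i in range(1, 11)],
    )
    total = build_snapshot(conn, "amount >= ?", (30.0,), "amount DESC")
    return total, fetch_page(conn, 1, 3), fetch_page(conn, 3, 3)
```

On SQL Server the same shape would probably use a temp table with a clustered key on the sequence column; whether that beats the table-variable approach would need to be measured on real data.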

Database designing for query builder project

I have a project in which users can create queries through a web-based UI that lists columns and applicable operators. Users can later edit these queries too, so I need to store them in the database.
I could store each query in a table as a simple string, but then editing would not be possible, so I need to store it some other way. I have come up with the following design.
So let's say I have to store this query:
C1 > 8 AND (C2 <= 7 OR (C4 LIKE '%all%' AND (C1 > 15 OR C2 <= 3)))
where: C denotes some column
If I have to store it in DB as shown in image,
I would group each condition and store it in sub_operand table
then there will be recursive mapping entry in op_master table for each entry in sub_operand table
finally there will be master entry in op_master
But it seems too complicated to handle inserts and updates. Can someone help me with this? I am very much stuck here.
UPDATE: I think I am missing something here in schema. It won't work as I have thought. I will update question as soon as I can correct it.
I'm not quite sure about the way you will use your data structure to represent the tree structure of a formula. See my answer to Logical Expressions rules in relational datamodel for this aspect. (But your question is not a duplicate of that one.)
I don't see the complication in inserting and updating. The only complicated aspect I can see is the GUI for your users to enter and edit these recursive formulas. It's somewhat complicated as, due to the unbounded width and depth of the formula, you can't just define one set of drop-down fields for column and operand selection, but the count of GUI elements will need to grow as the user increases the width and depth.
Once this is solved, you will have the following architecture:
Formula GUI --buildFormula--> Formula --storeFormula--> Database
            <--display-------         <-readFormula----
This means you have some abstract representation of a Formula in your domain layer, again some tree, that you use to actually evaluate these formulas. And you need to persist Formulas in the database. The operations propagating formulas from the GUI to the domain and further to the database, and the other way round, are also shown.
As I said, the GUI is the most complex part. Having a formula representation on the GUI, sending it to the domain and building its identically structured counterpart in your programming language isn't a problem. If the formula got edited, i.e. if it had a previous structure that now got modified by the user, I wouldn't try to incrementally update the domain object, but just throw it away and build it up from scratch. The same holds for storing in the database: Delete all parts of it and store it as a whole.
Reading is straightforward too, again with some effort in building the GUI representation.
By the way, it isn't strictly necessary to represent all subterms of your formula as records in the database. If you will never query on those subterms, but just store and read the formula as a whole, and if you never have a query like select all formulas using a specific column, it would be sufficient to store a formula as a single string.
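The tree-shaped formula and the "delete everything and store it as a whole" approach described above can be sketched like this. It is a minimal illustration, not the schema from the question: the Condition and Group classes, the single node table layout (node_id, parent_id, position, ...) and the function names are all assumptions made for the example. Each node becomes one row that points at its parent, so saving an edited formula means deleting its old rows and inserting the output of to_rows again.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class Condition:
    column: str
    operator: str
    value: str  # kept as text, exactly as the user typed it


@dataclass
class Group:
    connective: str  # "AND" or "OR"
    children: List[Union["Group", Condition]]


Node = Union[Group, Condition]
# (node_id, parent_id, position, kind, connective, column, operator, value)
Row = Tuple[int, Optional[int], int, str, Optional[str],
            Optional[str], Optional[str], Optional[str]]


def to_rows(root: Node) -> List[Row]:
    """Flatten the tree into parent-pointer rows, one row per node."""
    rows: List[Row] = []

    def visit(node: Node, parent_id: Optional[int], position: int) -> None:
        node_id = len(rows) + 1
        if isinstance(node, Condition):
            rows.append((node_id, parent_id, position, "cond", None,
                         node.column, node.operator, node.value))
        else:
            rows.append((node_id, parent_id, position, "group",
                         node.connective, None, None, None))
            for i, child in enumerate(node.children):
                visit(child, node_id, i)

    visit(root, None, 0)
    return rows


def from_rows(rows: List[Row]) -> Node:
    """Rebuild the tree from the stored rows."""
    by_parent = {}
    for row in rows:
        by_parent.setdefault(row[1], []).append(row)

    def build(row: Row) -> Node:
        node_id, _, _, kind, connective, column, operator, value = row
        if kind == "cond":
            return Condition(column, operator, value)
        kids = sorted(by_parent.get(node_id, []), key=lambda r: r[2])
        return Group(connective, [build(k) for k in kids])

    (root_row,) = by_parent[None]  # exactly one row has no parent
    return build(root_row)


def render(node: Node, top: bool = True) -> str:
    """Turn the tree back into the textual form shown in the question."""
    if isinstance(node, Condition):
        return f"{node.column} {node.operator} {node.value}"
    text = f" {node.connective} ".join(render(c, False) for c in node.children)
    return text if top else f"({text})"


# The example query from the question, built as a tree.
EXAMPLE = Group("AND", [
    Condition("C1", ">", "8"),
    Group("OR", [
        Condition("C2", "<=", "7"),
        Group("AND", [
            Condition("C4", "LIKE", "'%all%'"),
            Group("OR", [Condition("C1", ">", "15"),
                         Condition("C2", "<=", "3")]),
        ]),
    ]),
])
```

Because the round trip goes through the whole tree, the GUI never has to work out which individual rows changed.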

SSAS roles - OR rather than AND

We are using SSAS 2008 and have a data-related permissions issue. We can successfully apply SSAS roles to an incoming user's (Active Directory [AD]) context, i.e. the results returned are based on the user's SSAS role, which limits the data returned from a dimension.
As far as I am aware, you can apply multiple roles to a user so that each dimension takes the relevant role and limits its results. However, this condition is applied as an intersection (i.e. an AND). Can it be applied as a union (i.e. an OR) instead?
There are a couple of things to note.
- we are accessing our cube via Excel, so relying on the above intelligence within some MDX is not necessarily achievable because the user can query on any dimension (some may be limited by role while others not)
- we have toyed with the idea of having two cubes coupled with two different userIds (in different AD groups) per user; the user would extract data from Excel depending on the data they would like to see (and hence the cube they would be querying); this is messy because we would like the user's results wrapped up in one result set rather than two separate ones
Has anyone experienced this or found a resolution? Is it possible, or is there an alternative?
You can add the Roles property to an SSAS connection string. It holds a comma-delimited list of database roles to be evaluated, and only the roles in that list will then be applied by the server.
Data Source=localhost;Initial Catalog=MySSASDb;Roles=RoleA,RoleB
If you need an OR condition on 2 dimensions, you can add another dimension with 2 levels, each matching the level used in one of the 2 initial dimensions:
For instance, if you want to limit the role to [Country].[France] or [Currency].[USD], then you can add a [CountryThenCurrency] dimension and allow the 2 following paths: [CountryThenCurrency].[France].[*] and [CountryThenCurrency].[*].[USD]
In other words: add dimensions so that you can express all your OR conditions in a single dimension.
I have been searching for a solution to this problem and found a great article by Chris Webb.
http://cwebbbi.wordpress.com/2011/12/22/replacing-cell-security-with-dimension-security/
The idea is to create a junk dimension named DIM_Security and put all distinct combinations of both dimensions in there. Then, via an MDX expression, you get the UNION of all members having either dimension1 = 'allowedvalue' or dimension2 = 'allowedvalue', and thus you have OR permissions.
Moreover, Webb explains how to do this with dynamic security.
I am not copying all the MDX, as the original post is several pages long with many screenshots and has been linked many times all over the web.
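The set logic behind the combined security dimension can be illustrated outside of SSAS. The sketch below is plain Python rather than MDX, and the function name and sample members are hypothetical. It only shows why a single dimension holding the distinct (country, currency) combinations can express an OR: the allowed set is the union of the members matching either condition.

```python
from itertools import product


def allowed_combinations(combos, allowed_countries, allowed_currencies):
    """Members of the combined security dimension that a role may see:
    the union (OR) of the two per-dimension conditions."""
    return {
        (country, currency)
        for country, currency in combos
        if country in allowed_countries or currency in allowed_currencies
    }


def demo():
    countries = ["France", "Germany"]
    currencies = ["EUR", "USD"]
    # Assumption: in a real cube these would be the distinct pairs that
    # actually occur in the fact table, not the full cross product.
    combos = set(product(countries, currencies))
    return allowed_combinations(combos, {"France"}, {"USD"})
```

With separate roles on Country and Currency, only (France, USD) would survive the intersection; the combined dimension also keeps (France, EUR) and (Germany, USD).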

Detail Dimension for Drillthrough

I have created a number of type 1 dimensions to hold customer/subscription level details. These dimensions are very large compared to any other dimensions I am using with nearly a 1 to 1 relationship to facts. The dimensions are only being used to provide drillthrough details.
It's all working, but these dimensions are quite large and I'm running into some memory issues when processing. I'm wondering if there are some properties I should be setting, since these are only used for drillthrough. NonAggregateable?
Would it be better to include details as nonAggregateable Measures since there is nearly a 1 to 1 relationship?
An example would be SubscriptionDetail which has values like email, userUID, activationcode. If users are looking at the subscription fact they can drillthrough to pull these details.
You won't be able to use strings as measures, so Email will be out.
I have had success using hidden datetime measures for drillthrough though, to get the exact datetime when the fact table generally keys off to a date dimension.
If processing is an issue, and there is a 1:1 with the fact, does the dimension change historically? If not, have you tried ProcessAdd to only add the new rows? If you have enterprise SSIS there is a component for this, or you can generate your own XMLA and send it to the server as part of the processing: http://www.artisconsulting.com/Blogs/tabid/94/EntryID/3/Default.aspx

SQL Server Dynamic Columns Problem

I use a table GadgetData to store the properties of gadgets in my application. These gadgets are basically a sort of custom control, and about 80% of their properties are common, such as height, width, color, type etc. There is also a set of properties per gadget type that is unique to that type. All of this data has to be stored in the database. Currently I am storing only the common properties. What design approach should I use to store this kind of data, where the columns are dynamic?
1) Create a table with the common properties as columns and add an extra column of type Text to store all the unique properties of each gadget type in XML format.
2) Create a table with all possible columns of all the gadget types.
3) Create a separate table for each type of gadget.
4) Any other better way you recommend?
(Note: The number of gadget types could grow even beyond 100 and )
Option 3 is a very normalized option, but will come back and bite you if you have to query across multiple types - every SELECT will have another join if a new type is added. A maintenance nightmare.
Option 2 (sparse table) will have a lot of NULL values and take up extra space. The table definition will also need updating if another type is added in the future. Not so bad but still painful.
I use Option 1 in production (using an xml type instead of text). It allows me to serialize any type derived from my common type, extracting the common properties and leaving the unique ones in the XmlProperties column. This can be done in the application or in the database (e.g. a stored procedure).
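Doing the split in the application, as described above, might look roughly like this. It is a sketch under assumptions, not the answerer's production code: the list of common columns, the XML layout (a properties root with property elements) and the function names are invented for the example. Only the XmlProperties column name comes from the answer.

```python
import xml.etree.ElementTree as ET

# Assumption: these are the properties every gadget type shares.
COMMON_COLUMNS = ("height", "width", "color", "type")


def split_gadget(props):
    """Split a gadget's properties into regular column values and an XML
    string destined for the XmlProperties column."""
    common = {name: props.get(name) for name in COMMON_COLUMNS}
    root = ET.Element("properties")
    for name in sorted(props):
        if name not in COMMON_COLUMNS:
            ET.SubElement(root, "property", name=name).text = str(props[name])
    return common, ET.tostring(root, encoding="unicode")


def merge_gadget(common, xml_text):
    """Rebuild the full property dict from a stored row.
    Type-specific values come back as strings."""
    props = dict(common)
    for elem in ET.fromstring(xml_text).findall("property"):
        props[elem.get("name")] = elem.text
    return props


def demo():
    clock = {"height": 100, "width": 200, "color": "red", "type": "clock",
             "timezone": "UTC", "show_seconds": True}
    common, xml_text = split_gadget(clock)
    return common, xml_text, merge_gadget(common, xml_text)
```

Adding a new gadget type then needs no schema change; only the XML content differs from row to row.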
Your options:
1) Good one. You could even enforce a schema etc.
2) You cannot make those columns NOT NULL, so you will lose some data integrity there.
3) As long as you do not allow searching for more than one type of gadget, it is good enough, but option 1 is better.
4) I would use an ORM.
Notes:
If you would like to keep your database 'relational', but are not afraid to use ORM tools, then I would use one. In that case you can store the data (almost) however you want and still have it handled properly, as long as you map it correctly.
See:
Single Table Inheritance
Concrete Table Inheritance
If you need an SQL-only solution, then depending on your RDBMS, I would probably use an XML column to store all the data that is specific to the gadget type: you can have validation and extend it easily with new attributes. Then you can have everything in one table, search quickly on all common attributes, and also search fairly easily on one gadget type's attributes.
If all types of gadgets have many common mandatory properties that can be stored in one table and just several optional properties, you'd better use the first approach: you get the best of the relational schema and ease your life with XML. And don't forget to create an XML schema collection and link the XML column to it: you'll have full indexing and XQuery capabilities.
If gadget types have very different descriptions and only 1-3 common columns among 5 or more different sets of properties, use the 3rd approach.
But for the situation of 100+ types of gadgets I'd use the 1st approach: it offers flexibility together with good performance and ease of support and further development.
Depending on how different the "Gadgets" are, I wouldn't like option 2: there would be a lot of NULLs floating around, which could get bad if you had a column that was mandatory for one gadget but not even used for another.
I would only go with option 3 if the number of gadget types changes infrequently, since it would require altering the database each time.
The unmentioned option is to store the gadgets with a child table that holds each gadget's unique values. But this would require a fair amount of work to return a gadget's details, or multiple database calls.
That leaves option 1, except I would use SQL Server's XML type instead of text; you can then use XQuery within your stored procedures.
