Defining measure with a varchar field in OLAP Cube - sql-server

I'm trying to create a multidimensional database from a preexisting database using SQL Server Analysis Services. My problem is that the original database stores all information in a varchar field called "value". What's in that field depends on another field that holds the type of statistic. So I can have, for example, a fact with statistic_type "number of products sold" with value 1000 and another with type "cost of material bought" with value 5000. The values can have completely different meanings: some are numeric values, others are percentages and others are strings.
How do I turn those into measures? Should the statistic_type be a dimension of the cube and have the value as a measure? Does a measure always need to have a numeric value? Should I split the fact table among several tables, one for each type of statistic? Or is there some sensible way to create a cube using just the one table?
It's the first time I'm working with multidimensional databases and SSAS, so I'm a little lost.

A measure always needs to have a numeric value. In fact, you will probably have to cast the value column as a numeric datatype in your Data Source View in order for it to even be a candidate for a measure in your cube.
You should make statistic_type a dimension and "value" a measure. It's ok to just use the one table, although it might be easier to work with if you make a lookup table of the distinct statistic_types.
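As a rough sketch of the two suggestions above (a numeric cast of "value" and a lookup table of distinct statistic types), the following uses Python's built-in sqlite3 module as a stand-in for SQL Server. The table names stats, dim_statistic_type and fact_statistic, the is_numeric flag, and the sample rows are all assumptions made for illustration; in SSAS the cast would normally live in a named query or named calculation in the Data Source View.

```python
import sqlite3


def build_cube_source(conn):
    """Create a one-table fact source plus a type lookup and a numeric view."""
    conn.executescript("""
        -- Original layout: every statistic lives in one varchar column.
        CREATE TABLE stats (
            id             INTEGER PRIMARY KEY,
            statistic_type TEXT NOT NULL,
            value          TEXT NOT NULL
        );
        INSERT INTO stats (statistic_type, value) VALUES
            ('number of products sold', '1000'),
            ('cost of material bought', '5000'),
            ('number of products sold', '250'),
            ('supplier name', 'Acme');

        -- Lookup table of the distinct statistic types (the dimension).
        -- is_numeric is a hypothetical flag, maintained by hand here.
        CREATE TABLE dim_statistic_type (
            statistic_type_id INTEGER PRIMARY KEY,
            name              TEXT NOT NULL UNIQUE,
            is_numeric        INTEGER NOT NULL DEFAULT 1
        );
        INSERT INTO dim_statistic_type (name)
            SELECT DISTINCT statistic_type FROM stats ORDER BY statistic_type;
        UPDATE dim_statistic_type SET is_numeric = 0 WHERE name = 'supplier name';

        -- Fact view: numeric rows only, with value cast to a numeric type.
        CREATE VIEW fact_statistic AS
            SELECT s.id, d.statistic_type_id, CAST(s.value AS REAL) AS value
            FROM stats AS s
            JOIN dim_statistic_type AS d ON d.name = s.statistic_type
            WHERE d.is_numeric = 1;
    """)


def totals_by_type(conn):
    """Sum the measure per statistic type, as a cube would per dimension member."""
    rows = conn.execute("""
        SELECT d.name, SUM(f.value)
        FROM fact_statistic AS f
        JOIN dim_statistic_type AS d USING (statistic_type_id)
        GROUP BY d.name
        ORDER BY d.name
    """).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
build_cube_source(conn)
totals = totals_by_type(conn)
print(totals)
```

String-valued statistics are simply filtered out of the fact view here; whether they belong in the cube at all (for example as dimension attributes) is a design decision the sketch does not settle.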

Related

Dimension Creation - Multiple Uses

We received some generic training related to TM1 and dimension creation and we were informed we'd need separate dimensions for the same values.
Let me describe: we transport goods, so we have an origin and a destination province. In typical database design I'd expect one "province" reference table, but we were informed we'd need an "origin" dimension and a "destination" dimension. This seems cumbersome, and it seems we'd encounter the same issue with customers, services, etc.
Can someone clarify how this could work for us?
Again, I'd expect to see a "lookup" table in the database which contains all possible provinces (assumption is values in both columns would be the same), then you'd have an ID value in any column that used the "province" and join to the "lookup" table based on ID.
You wrote: "in typical database design I'd expect we'd have one 'province' reference table, but we were informed we'd need an 'origin' dimension and a 'destination' dimension".
Following regular DB design, it makes sense to keep the two data entities separate: one defines the source, the other defines the target. I think we'd both agree on this. If you could give more details, it would be better.
Imagine a drop-down list: two lists populated by one single "source", but representing two different values in the DB.
You wrote: "assumption is values in both columns would be the same".
If destination = origin, then you don't need two dimensions? :) This point needs clarification.
Besides your solution (a combination of all sources and destinations in a table with a unique ID, which could be one way of solving this), it seems resolvable by changes to the cube or dimension structure.
If in some dimension you used e.g. ProvinceOrigin and ProvinceDestination as string-type elements, and populated them from one single dimension (dynamic attribute), then whenever you save the cube you'd have these two fields populated from one single dimension.
Obviously the best solution for you depends on your system architecture.
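For the relational side of this discussion, the single lookup table described in the question can be sketched as one province table joined twice, once per role. This is only an illustration of the relational idea, not TM1 syntax; the table names, column names and sample provinces are assumptions, and SQLite (via Python's sqlite3) stands in for whatever database is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One reference table holding every province exactly once.
    CREATE TABLE province (
        province_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    -- The fact table refers to the same lookup in two different roles.
    CREATE TABLE shipment (
        shipment_id             INTEGER PRIMARY KEY,
        origin_province_id      INTEGER NOT NULL REFERENCES province(province_id),
        destination_province_id INTEGER NOT NULL REFERENCES province(province_id)
    );
    INSERT INTO province VALUES (1, 'Ontario'), (2, 'Quebec'), (3, 'Alberta');
    INSERT INTO shipment VALUES (100, 1, 2), (101, 3, 1);
""")


def shipment_routes(conn):
    """Resolve both roles by joining the lookup table under two aliases."""
    return conn.execute("""
        SELECT s.shipment_id, o.name AS origin, d.name AS destination
        FROM shipment AS s
        JOIN province AS o ON o.province_id = s.origin_province_id
        JOIN province AS d ON d.province_id = s.destination_province_id
        ORDER BY s.shipment_id
    """).fetchall()


routes = shipment_routes(conn)
print(routes)
```

The two aliases play the part of the separate "origin" and "destination" dimensions: the data is stored once, but each role is addressed under its own name.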

Manage numeric Intervals on SQL

I want to manage some data by intervals in my database, like this:
Is it possible to do that in a single table, or do I need 3 tables, one for each color (with FKs)?
Real example:
Currently, in my app, I use this in a dataGridView and in my database:
It is possible to set or modify everything across three databases. I add the equivalency (green) manually, but numbers that differ only slightly have the same equivalency, so it would be useful for me to use numeric intervals.
I'm not an expert on modeling databases but this is how I solve your scenario.
I'd create two range tables, one for storing column values and the other for row values. Both tables have the same structure, but since you need to present the final values as a matrix, I decided to use two tables (merging them into one is possible, but then showing the data from "Values" takes more effort). As you can see, I've included an IdEquivalency column, which will be useful for showing the data as needed.
Finally, the table Values (for the green values) has two FKs (one for each range table) and the stored value.
This is still a basic idea, but I'm sure you get the point.
Considerations:
Change the table names according to what their values represent.
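A minimal sketch of the layout described above: two range tables and a values table with one FK to each. The original diagrams are not available, so the table names, column names, interval bounds and stored values below are all assumptions. Intervals are treated as half-open (low inclusive, high exclusive), which is also an assumption. SQLite via Python's sqlite3 is used so the example runs anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE row_range (
        id_equivalency INTEGER PRIMARY KEY,
        low  REAL NOT NULL,
        high REAL NOT NULL,
        CHECK (low < high)
    );
    CREATE TABLE column_range (
        id_equivalency INTEGER PRIMARY KEY,
        low  REAL NOT NULL,
        high REAL NOT NULL,
        CHECK (low < high)
    );
    -- One stored value per (row range, column range) cell of the matrix.
    CREATE TABLE equivalence_value (
        row_range_id    INTEGER NOT NULL REFERENCES row_range(id_equivalency),
        column_range_id INTEGER NOT NULL REFERENCES column_range(id_equivalency),
        value           REAL NOT NULL,
        PRIMARY KEY (row_range_id, column_range_id)
    );
    INSERT INTO row_range    VALUES (1, 0, 10), (2, 10, 20);
    INSERT INTO column_range VALUES (1, 0, 5),  (2, 5, 15);
    INSERT INTO equivalence_value VALUES
        (1, 1, 1.5), (1, 2, 2.5), (2, 1, 3.5), (2, 2, 4.5);
""")


def lookup(conn, row_number, column_number):
    """Return the value of the cell whose intervals contain both numbers."""
    row = conn.execute("""
        SELECT v.value
        FROM equivalence_value AS v
        JOIN row_range    AS r ON r.id_equivalency = v.row_range_id
        JOIN column_range AS c ON c.id_equivalency = v.column_range_id
        WHERE ? >= r.low AND ? < r.high
          AND ? >= c.low AND ? < c.high
    """, (row_number, row_number, column_number, column_number)).fetchone()
    return row[0] if row else None


print(lookup(conn, 9.7, 5))   # falls in row range 1 and column range 2
print(lookup(conn, 25, 1))    # no row range contains 25
```

Nothing in this schema stops two ranges from overlapping; if that matters, it would need an extra check at insert time.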

When creating a database - Is it advisable to create a column for data I can calculate based on other fields?

I am building an application for a lawyer, in which he can create a client portfolio. In this portfolio, there is the ID of the portfolio, the creation date, the client's name, telephone, etc.
Besides all these fields, there is another field: "portfolio name". This field contains some information about the client from the other fields, as formatted text.
So, for example, if:
ID = 271
client_name = "John Doe"
creation_date = 18/02/2016
the portfolio_name will be 271/John Doe/18022016.
Now, since portfolio_name does not really contain new data, only formatted data from other fields, should it really exist in the database table as a column? Is that data duplication or not?
Storing a value derived from other columns is a textbook normalization violation (the column depends on non-key columns, which breaks 3NF) and should generally be avoided. It's acceptable in some cases -- for example, where the calculated value is very difficult or time-consuming to obtain. However, since string concatenation is so simple (you can even do it right in your query, without the definition of any pseudo-fields) I wouldn't ever recommend doing this unless the field simply contains an initial default value and the client has the ability to customize it later. Otherwise, it will eventually become inconsistent. E.g., what happens when a client's name changes?
It depends on the size of the table and how you query it.
If the table is large, you can create a column for the calculated field so that it is easy to query.
If the table is small, you can calculate the value in the query.
Most database engines allow you to create a computed column for this exact purpose. Depending on the engine and how you set up the computed column it may or may not be saved to disk, but it will be guaranteed to be always up to date. The nice thing is that you can treat it like it's a read only column.
https://technet.microsoft.com/en-us/library/ms191250%28v=sql.105%29.aspx
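To illustrate why deriving the name at read time stays consistent, here is a small sketch that builds portfolio_name in a view instead of storing it. A view is used rather than a SQL Server computed column only so the example runs on Python's built-in sqlite3; the table and view names are assumptions, and the date is assumed to be stored in ISO format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE portfolio (
        id            INTEGER PRIMARY KEY,
        client_name   TEXT NOT NULL,
        creation_date TEXT NOT NULL   -- ISO 8601 date, e.g. 2016-02-18
    );
    -- portfolio_name is never stored; it is rebuilt from the base columns.
    CREATE VIEW portfolio_with_name AS
        SELECT id, client_name, creation_date,
               id || '/' || client_name || '/' || strftime('%d%m%Y', creation_date)
                   AS portfolio_name
        FROM portfolio;
    INSERT INTO portfolio VALUES (271, 'John Doe', '2016-02-18');
""")


def portfolio_name(conn, portfolio_id):
    """Read the derived name for one portfolio, or None if it does not exist."""
    row = conn.execute(
        "SELECT portfolio_name FROM portfolio_with_name WHERE id = ?",
        (portfolio_id,),
    ).fetchone()
    return row[0] if row else None


before = portfolio_name(conn, 271)

# A name change is picked up automatically, with no second column to fix.
conn.execute("UPDATE portfolio SET client_name = 'Jane Doe' WHERE id = 271")
after = portfolio_name(conn, 271)
print(before, after)
```

If the client is meant to edit the portfolio name freely after creation, it is no longer derived data, and a real stored column becomes the right choice.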

Database design: how to represent a field which can be of several types?

In my database model I want to represent a table PROPERTY which, among other fields, should have a field called "value". This value should be able to contain any type: integer, decimal, text, date, etc.
What is the best approach to achieve this? Having one field for each type (valueInteger, valueDate, etc) and filling only the desired one -> lots of empty fields? Or maybe store only string field which then should be parsed to the correct type?
Besides the PROPERTY table, I will also need a PROPERTY_RANGE table, which will contain lowerValue and upperValue fields. If I choose to have a field for each type, this table will need 2 fields (lower, upper) for each supported type.
I can also think about using some sort of table inheritance to distinguish the "value" field type (although this may be killing a fly with a shotgun).
What is the best approach to solve this question?
FWIW, I would go with a table that has one column for each possible data type of the property, both for correctness and maintainability. This helps especially when querying, because the filters differ by data type. Converging all values into a single VARCHAR field means converting them to the appropriate type at runtime, which could prove costly depending on the size of the table.
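A sketch of the one-column-per-type approach, with a constraint that exactly one value column is filled per row. The column names follow the valueInteger/valueDate idea from the question, but the exact names, the CHECK constraint and the sample rows are assumptions. The constraint relies on SQLite treating boolean results as 0/1; SQL Server would need CASE expressions instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE property (
        property_id   INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        value_integer INTEGER,
        value_decimal REAL,
        value_text    TEXT,
        value_date    TEXT,
        -- Exactly one typed column may be non-NULL (SQLite booleans are 0/1).
        CHECK (
            (value_integer IS NOT NULL) + (value_decimal IS NOT NULL)
          + (value_text IS NOT NULL)    + (value_date IS NOT NULL) = 1
        )
    );
    INSERT INTO property (name, value_integer) VALUES ('max_users', 50);
    INSERT INTO property (name, value_decimal) VALUES ('tax_rate', 0.21);
    INSERT INTO property (name, value_text)    VALUES ('label', 'primary');
""")


def names_with_integer_above(conn, threshold):
    """Typed filter: no casting needed because the column is already numeric."""
    rows = conn.execute(
        "SELECT name FROM property WHERE value_integer > ? ORDER BY name",
        (threshold,),
    ).fetchall()
    return [name for (name,) in rows]


def two_values_rejected(conn):
    """Try to fill two typed columns at once; the CHECK should refuse it."""
    try:
        conn.execute(
            "INSERT INTO property (name, value_integer, value_text) "
            "VALUES ('bad', 1, 'one')"
        )
    except sqlite3.IntegrityError:
        return True
    return False


print(names_with_integer_above(conn, 10))
print(two_values_rejected(conn))
```

The same pattern would extend to PROPERTY_RANGE with a lower/upper pair per type, at the cost of twice as many mostly empty columns.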

How should I model a field that can contain both numeric and string values in SQL Server 2005?

I have a new database table I need to create...
It logically contains an ID, a name, and a "value".
That value field could be either numeric or a character string in nature.
I don't think I want to just make the field a varchar, because I also want to be able to query with filters like WHERE value > 0.5 and such.
What's the best way to model this concept in SQL Server 2005?
EDIT:
I'm not opposed to creating multiple fields here (one for numbers, one for non-numbers), but since they're all really the same concept, I wasn't sure that was a great idea.
I guess I could create separate fields, then have a view that sort of coalesces them into a single logical column.
Any opinions on that?
What I want to achieve is really pretty simple... usually this data will just be blindly displayed in a grid-type view.
I want to be also able to filter on the numeric values in that grid. This table will end up being in the tens of millions of records, so I don't want to paint myself into a corner with querying performance.
That querying performance is my main concern.
A good way to get the query support you want is to have two columns: numvalue that stores a number and textvalue that stores characters. They should be nullable or at least have some default that represents no value. Your application can then decide which column to store its value and which to leave with no value.
Your issue with mixing data may be how SQL Server 2005 sorts text data. It's not a 'natural' sort.
If you have a varchar field and you do:
where value > '20.5'
Values like "5" will be in your result (as in a character based sort "5" comes after "20.5")
You're going to be better off with separate columns for storage.
Use Coalesce to merge them into one column if you need them merged in your results:
select [ID], [Name], Coalesce( [value_str], Cast( [value_num] as varchar(50) ) ) as [value]
from [tablename]
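The difference between text comparison and numeric comparison, and the merged display column, can be tried out with the sketch below. It runs on SQLite through Python's sqlite3 rather than SQL Server 2005, so collation details may differ; the table names mixed_single and mixed_split and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Everything in one text column.
    CREATE TABLE mixed_single (id INTEGER PRIMARY KEY, name TEXT, value TEXT);
    INSERT INTO mixed_single (name, value) VALUES
        ('a', '5'), ('b', '20.5'), ('c', '100'), ('d', 'n/a');

    -- Same data split into a numeric and a string column.
    CREATE TABLE mixed_split (
        id INTEGER PRIMARY KEY, name TEXT, value_num REAL, value_str TEXT
    );
    INSERT INTO mixed_split (name, value_num, value_str) VALUES
        ('a', 5, NULL), ('b', 20.5, NULL), ('c', 100, NULL), ('d', NULL, 'n/a');
""")

# Character-based comparison: '5' sorts after '20.5', '100' sorts before it.
text_hits = [v for (v,) in conn.execute(
    "SELECT value FROM mixed_single WHERE value > '20.5' ORDER BY id")]

# Numeric comparison on the dedicated column gives the intended result.
numeric_hits = [v for (v,) in conn.execute(
    "SELECT value_num FROM mixed_split WHERE value_num > 20.5 ORDER BY id")]

# Merge both columns back into one display column, casting the number to text.
merged = dict(conn.execute(
    "SELECT name, COALESCE(value_str, CAST(value_num AS TEXT)) FROM mixed_split"))

print(text_hits, numeric_hits, merged)
```

The cast inside COALESCE keeps both branches as text, so the merged column never tries to convert a string such as 'n/a' to a number.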
If you want to store numeric and string values in the same column, I am not sure you can avoid doing a lot of casts and converts when using that column as a query filter.
Two columns.
Table: (ValueLabel as char(x), Value as numeric(p,s))
I don't think it's possible to have a column with both varchar and int type. You could save your value as a varchar and cast it to int during your query. But this way you could get an exception if your value contains any non-numeric character. What are you trying to achieve?
If you want it to be able to hold a character string, I think you have to make the column varchar, or similar.
An alternative could be to have 2 or 3 columns instead of the one value column. Maybe have the three columns, value_type (enum between "number" and "string"), number_value, string_value. Then you could reconstruct that query to be
WHERE value_type = 'number' AND number_value > 0.5
I don't think you're going to be able to get around using VARCHAR or NVARCHAR as your data type. With mixed data like you're describing, you'll have to test the value when you pull the field out of the db and perform the appropriate CAST or CONVERT based on the data type.
You wrote: "I guess I could create separate fields, then have a view that sort of coalesces them into a single logical column. Any opinions on that?"
It depends on the source of the data. If you are getting the data from users (or some other system) in some free-form manner and don't really care what type of data it is, then the best way to store it is the most generic manner (varchar, etc). If the incoming data is more structured and you care about that structure, then it makes more sense to keep that structure in the database by using separate fields.
From the viewpoint of a SELECT it doesn't really matter; you can store it either way and read it as the same schema. Once you get into filters (as you mention) things get a bit more hairy, but still easily doable. However, you don't mention if you need to be able to update this data and if so, if you need to enforce any validation on the data.
From the sounds of it, you need to do different types of searches based on the "type" of value being stored. As such, it may make sense to add a Type field so that any filters can be quickly limited to the type of values that you care about. Note, by Type I mean a more logical, application scope, Type; not the actual datatype being stored.
My recommendation would be to use a single field with a Type column if you need to easily support UPDATEs or use multiple fields (or tables, if these are totally different data sets) if SELECTing and filtering is all that is needed.
You might consider using two columns, one "string" and one "numeric" (whatever variants of those are appropriate), with the "string" column NOT NULL and the "numeric" column allowing NULL values. When inserting a value, always populate the "string" column independent of the type; however, if the value is numeric, ALSO store it in the "numeric" column. Now you have a built-in indicator of the type (if the "numeric" column is populated it is numeric, if not it is a string), can always just pull the value for display from the "string" column, and can use the "numeric" value in calculations or for proper numeric sorting / comparison as needed. You could always add a third column indicating the value type, but this approach eliminates the need for that. Note that you might consider maintaining the numeric and string values using a set of INSERT and UPDATE triggers.
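A small sketch of that last approach: the string column is always filled and the numeric column only when the value parses as a number. The answer suggests triggers; this example does the same work in application code instead, purely to keep it short and runnable on Python's sqlite3. The table name setting, the helper names and the sample values are assumptions, and Python's float() is used as a simple (somewhat permissive) numeric test.

```python
import sqlite3


def to_number(text):
    """Return the value as a float if it parses as one, otherwise None."""
    try:
        return float(text)
    except ValueError:
        return None


def store_value(conn, name, raw):
    """Always store the raw string; also store the number when there is one."""
    conn.execute(
        "INSERT INTO setting (name, string_value, numeric_value) VALUES (?, ?, ?)",
        (name, raw, to_number(raw)),
    )


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE setting (
        id            INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        string_value  TEXT NOT NULL,
        numeric_value REAL
    )
""")
for name, raw in [("a", "5"), ("b", "20.5"), ("c", "hello"), ("d", "100")]:
    store_value(conn, name, raw)

# Proper numeric ordering, while still displaying the original strings.
numeric_sorted = [s for (s,) in conn.execute(
    "SELECT string_value FROM setting "
    "WHERE numeric_value IS NOT NULL ORDER BY numeric_value")]

# The numeric column doubles as the type indicator.
kinds = {
    name: ("number" if num is not None else "string")
    for name, num in conn.execute("SELECT name, numeric_value FROM setting")
}
print(numeric_sorted, kinds)
```

Filters such as WHERE numeric_value > 0.5 then run against a real numeric column, which can also be indexed.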
