I have a data model based on a star schema. It stores three date elements. I integrated them into one role play dimension to avoid redundant dates. I would like to store my data into a data vault model in the core DWH and show the star schema as a view. But right now, im not sure how to handle the problem with the role play model. Should I implement three separate Hubs and Sats fpr the dates? and put them together in the View Layer? or can i implement one date hub + sat and reference them to the link table three times (to the three different dates)?
best regards
I consider the dates as reference table. I have draw a logical model If i understood your problem correctly, Then next logical model would be a possible solution on how to use the same in Hub satelitte or link satellite.
A Role playing Dimension or you could have 3 views on this dimension :
Solution :
Note: This is logical model, So "NO" physical foreign keys.
Dan's Definition of "Reference tables" are referenced from Satellites, but never bound with physical foreign keys. There is no prescribed structure for reference tables: use what works best in your specific case, ranging from simple lookup tables to small data vaults or even stars. They can be historical or have no history, but it is recommended that you stick to the natural keys and not create surrogate keys in that case.[20] Normally, data vaults have a lot of reference tables, just like any other Data Warehouse.
https://en.wikipedia.org/wiki/Data_vault_modeling#Reference_tables
Related
I am struggling in figuring out how to create a star schema from multiple source tables. I work at a trading firm so the data is related to user trading activity. The issue I am having is that our datasets do not have primary ids for every field that could be a dimension. Instead, we usually relate our data together using the combination of date and account number. Here is an example of 3 source tables...
I would like to turn this into a star schema, something that looks like ...
Is my only option to denormalize my source tables into one wide table (joining trades to position on account number and date, and joining the users table on account number), create keys for each dimension, then re normalizing it into the star schema? Are star schema's ever built from multiple source tables?
Star schemas are almost always created from multiple source tables.
The normal process is:
Populate your dimension tables
Create a temporary/virtual fact record using your source data
Using this fact record, look up the relevant dimension keys
Write the actual fact record to your target fact table
Data-warehousing is about query speed. The data-warehouse should not be concerned with data integrity. IT SHOULD NOT CLEAN OR CORRECT BAD DATA. It only needs to gather all the data together into a single record to present to the model for analysis. Denormalizing the data is how this is done.
In a star schema, dimensions do not know about each other and have no relationships with other dimensions. In a snowflake, dimensions are related to other dimensions. That is the primary difference between star and snowflake.
All the metadata options for events are rolled up into dimensions and used for slicing/filtering. All the measurable/calculation data for an event are in the event fact, along with a reference to the dimension(s) containing the relevant metadata. The Metadata/Dimension is reused across multiple fact records.
Based on the limited example you've provided, I'd suggest you research degenerate dimensions and junk dimensions. Your Trade and Position data may need to be turned into a fact and a dimension (degenerate), and some of your flag attributes may be best placed into a junk dimension.
You should also make sure your dimension keys are clear. You should not have multiple paths to a dimension (accountnumber: trade -> position -> user & trade -> user ) as that will cause inconsistent results when querying depending on which relationship you traverse.
We have a situation in a database design in our company. We are trying to figure out the best way to design the database to store transactional data. I need expert’s advice on the best relational design to achieve it. Problem: We have different kind of “Entities” in our system, for example; Customers, Services, Dealers etc. These Entities are doing transfer of funds between each other. We need to store the history of the transfers in database.
Solutions:
One table of transfers and another table to keep “Accounts” information. There are three tables “Customers”, “Services”, “Dealers”. There is another table “Accounts”. An account can be related to any of the “Entities” mentioned above; it means (and that’s the requirement) that logically there should be a one-to-one relationship to/from Entities and Accounts. However, we can only store the Account_ID in the Entities table, but we cannot store the foreign key of Entities in Accounts table. Here the problem happens in terms of database design. Because if there is a customer’s account, it is not restricted by the database design to not be stored in Services table etc. Now we can keep all transfers in one table only since Accounts are unified among all the entities.
Keep the balance information in the table primary Entities table and separate tables for all transfers. Here for all kind of transfers between the entities, we are keeping separate tables. For example, a transfer between a Customer and Service provider will be stored in a table called “Spending”. Another table will have transfer data for transfer between Service and Dealers called “Commission” etc. In this case, we are not storing all the transfers of the funds in a single table, but the foreign keys are properly defined since the tables “Spending” and “Commission” are only between two specific entities.
According to the best practices, which one of the above given solutions is correct, and why?
If you are simply looking for schemas that claim to deal with cases like yours, there is a website with hundreds of published schemas. Some of these pertain to storing transaction data concerning customers and suppliers. You can take one of these and adapt it.
http://www.databaseanswers.org/data_models/
If your question is about how to relate accounts to business contacts, read on.
Customers, Services, and Dealers are all sub classes of some super class that I'll call Contacts. There are two well known design patterns for modeling sub classes in database tables. And there is a technique called Shared primary Key that can be used with one of them to good advantage.
Take a look at the info and the questions grouped under these three tags:
single-table-inheritance class-table-inheritance shared-primary-key
If you use class table inheritance and shared primary key, you will end up with four tables pertaining to contacts: Contacts, Customers, Dealers, and Services. Every entry in Contacts will have a corresponding entry in one of the three subclass tables.
An FK in the accounts table, let's call it Accounts.ContactID will not only reference a row in Contacts, but also a row in whichever of Customers, Dealers, Services pertains to the case at hand.
This may work outwell for you. Alternatively, single table table inheritance works out well in some of the simpler cases. It depends on details about your data and your intended use of it.
You can make table Accounts with three fields with FK to Customers,Dealers and Services and it's will close problem. But also you can make three table for each type of entity with accounting data. You have the deal with multi-system case in system design. Each system solve the task. But for deсision you need make pros and con analyses about algorithm complexity, performance and other system requirements. For example one table will be more simple to code, but three table give more performance of sql database.
I'm trying to figure out how to implement this relationship in coldfusion. Also if anyone knows the name for this kind of relationship I'd be curious to know it.
I'm trying to create the brown table.
Recreating the table from the values is not the problem, the problem that I've been stuck with for a couple of days now is how to create an editing environment.
I'm thinking that I should have a table with all the Tenants and TenantValues (TenantValues that match TenantID I'm editing) and have the empty values as well (the green table)
any other suggestions?
The name of this relationship is called an Entity Attribute Value model (EAV). In your case Tenant, TenantVariable, TenantValues are the entity, attribute and value tables, respectively. EAV is attempt to allow for the runtime definition or entities and is most found in my experience backing content managements systems. It has been referred to an as anti pattern database model because you lose certain RDBMS advantages, while gaining disadvantages such as having to lock several tables on delete or save. Often a suitable persistence alternative is a NoSQL solution such as Couch.
As for edits, the paradigm I typically see is deleting all the value records for a given ID and inserting inside a loop, and then updating the entity table record. Do this inside of a transaction to ensure consistency. The upshot of this approach is that it's must easier to figure out than delta detection algorithm. Another option is using the MERGE statement if your database supports it.
You may want to consider an RDF Triple Store for this problem. It's an alternative to Relational DBs that's particularly good for sparse categorical data. The data is represented as triples - directed graph edges consisting of a subject, an object, and the predicate that describes the property connecting them:
(subject) (predicate) (object)
Some example triples from your data set would look something like:
<Apple> rdf:type <Red_Fruit>
<Apple> hasWeight "1"^^xsd:integer
RDF triple stores provide the SPARQL query language to retrieve data from your store much like you would use SQL.
We are working on a mapping application that uses Google Maps API to display points on a map. All points are currently fetched from a MySQL database (holding some 5M + records). Currently all entities are stored in separate tables with attributes representing individual properties.
This presents following problems:
Every time there's a new property we have to make changes in the database, application code and the front-end. This is all fine but some properties have to be added for all entities so that's when it becomes a nightmare to go through 50+ different tables and add new properties.
There's no way to find all entities which share any given property e.g. no way to find all schools/colleges or universities that have a geography dept (without querying schools,uni's and colleges separately).
Removing a property is equally painful.
No standards for defining properties in individual tables. Same property can exist with different name or data type in another table.
No way to link or group points based on their properties (somehow related to point 2).
We are thinking to redesign the whole database but without DBA's help and lack of professional DB design experience we are really struggling.
Another problem we're facing with the new design is that there are lot of shared attributes/properties between entities.
For example:
An entity called "university" has 100+ attributes. Other entities (e.g. hospitals,banks,etc) share quite a few attributes with universities for example atm machines, parking, cafeteria etc etc.
We dont really want to have properties in separate table [and then linking them back to entities w/ foreign keys] as it will require us adding/removing manually. Also generalizing properties will results in groups containing 50+ attributes. Not all records (i.e. entities) require those properties.
So with keeping that in mind here's what we are thinking about the new design:
Have separate tables for each entity containing some basic info e.g. id,name,etc etc.
Have 2 tables attribute type and attribute to store properties information.
Link each entity (or a table if you like) to attribute using a many-to-many relation.
Store addresses in different table called addresses link entities via foreign keys.
We think this will allow us to be more flexible when adding, removing or querying on attributes.
This design, however, will result in increased number of joins when fetching data e.g.to display all "attributes" for a given university we might have a query with 20+ joins to fetch all related attributes in a single row.
We desperately need to know some opinions or possible flaws in this design approach.
Thanks for your time.
In trying to generalize your question without more specific examples, it's hard to truly critique your approach. If you'd like some more in depth analysis, try whipping up an ER diagram.
If your data model is changing so much that you're constantly adding/removing properties and many of these properties overlap, you might be better off using EAV.
Otherwise, if you want to maintain a relational approach but are finding a lot of overlap with properties, you can analyze the entities and look for abstractions that link to them.
Ex) My Db has Puppies, Kittens, and Walruses all with a hasFur and furColor attribute. Remove those attributes from the 3 tables and create a FurryAnimal table that links to each of those 3.
Of course, the simplest answer is to not touch the data model. Instead, create Views on the underlying tables that you can use to address (5), (4) and (2)
1 cannot be an issue. There is one place where your objects are defined. Everything else is generated/derived from that. Just refactor your code until this is the case.
2 is solved by having a metamodel, where you describe which properties are where. This is probably needed for 1 too.
You might want to totally avoid the problem by programming this in Smalltalk with Seaside on a Gemstone object oriented database. Then you can just have objects with collections and don't need so many joins.
I asked about Choosing a method to store user profiles the other day and received an interesting response from David Thomas Garcia suggesting I use the Table Module design pattern. It looks like this is probably the direction I want to take. Everything I've turned up with Google seems to be fairly high level discussion, so if anyone could point me in the direction of some examples or give me a better idea of the nuts and bolts involved that would be awesome.
The best reference is "Patterns of Enterprise Application Architecture" by Martin Fowler:
Here's an excerpt from the section on Table Module:
A Table Module organizes domain
logic with one class per table in the
database, and a single instance of a
class contains the various procedures
that will act on the data. The
primary distinction with Domain
Model is that, if you have many
orders, a Domain Model will have one
order object per order while a Table
Module will have one object to handle
all orders.
Table Module would be particularly useful in the flexible database architecture you have described for your user profile data, basically the Entity-Attribute-Value design.
Typically, if you use Domain Model, each row in the underlying table becomes one object instance. Since you are storing user profile information in multiple rows, then you end up having to create many Domain Model objects, whereas what you really want is one object that encapsulates all the user properties.
Instead, the Table Module makes it easier for you to code logic that applies to multiple rows in the underlying database table. If you create a profile for a given user, you'd specify all those properties, and the Table Module class would have the code to translate that into a series of INSERT statements, one row per property.
$table->setUserProfile( $userid, array('firstname'=>'Kevin', 'lastname'=>'Loney') );
Likewise, querying a given user's profile would use the Table Module to map the multiple rows of the query result set to object members.
$hashArray = $table->getUserProfile( $userid );