I have the following dimensional tables:
DimUser
DimClient
DimLocation
DimDate
DimTime
DimLog
DimStatuses
How could I represent the following events in the fact table?
Logs by user over time
User status change over time
Let's say the fact table is something like this:

application_id | location_id | user_id | client_id | log_id | date_id | time_id | status_id

Record 1: 3, 19, 3, 2, 69, 45, 64, 1
Record 2: 23, 1, 1, 10, 207, 1
Is it a valid representation where the first record refers to a log event and the second record refers to a user status change?
Every measure needs to have its grain defined. If two measures have the same grain, they can (but don't have to) be stored in the same fact table. If two measures don't have the same grain, they can't be stored in the same fact table.
Different facts should be represented in the schema by different fact tables. For this case there should be one fact table for logs by user and another fact table for user status changes over time; this is known as a galaxy schema or multi-fact schema.
So it's feasible, because you can create your measures according to your grain:
Number of users by application (user status change over time fact table)
Number of logs or events by user (logs by user fact table)
Active/inactive users (user status change over time fact table)
User logs over time (logs by user fact table)
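As a sketch, the two fact tables sharing conformed dimensions could look like this in SQLite. Table and column names follow the question where possible; everything else is an illustrative assumption, not a fixed standard:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
-- Shared (conformed) dimensions
CREATE TABLE DimUser   (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE DimDate   (date_id INTEGER PRIMARY KEY, full_date TEXT);
CREATE TABLE DimStatus (status_id INTEGER PRIMARY KEY, description TEXT);

-- One fact table per grain: log events by user ...
CREATE TABLE FactUserLog (
    user_id INTEGER REFERENCES DimUser(user_id),
    log_id  INTEGER,
    date_id INTEGER REFERENCES DimDate(date_id)
);

-- ... and user status changes, which sit at a different grain.
CREATE TABLE FactUserStatusChange (
    user_id   INTEGER REFERENCES DimUser(user_id),
    status_id INTEGER REFERENCES DimStatus(status_id),
    date_id   INTEGER REFERENCES DimDate(date_id)
);
""")

cur.execute("INSERT INTO DimUser VALUES (3, 'alice')")
cur.execute("INSERT INTO FactUserLog VALUES (3, 69, 45)")
cur.execute("INSERT INTO FactUserStatusChange VALUES (3, 1, 45)")

# Measure from the logs fact table: number of log events per user.
logs_per_user = cur.execute(
    "SELECT user_id, COUNT(*) FROM FactUserLog GROUP BY user_id"
).fetchall()
```

Each measure then comes from exactly one fact table, while the dimension tables are reused across both.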
My goal is to design a portion of a database that captures the time an activity occurs or the time a record is updated. I would also like the database to set certain field values of new records in one table based on field values from the record of another table or query.
For the first goal, there will be 4 entities: user, subject, activityLog (intermediate entity for a many-to-many between user and subject with an updatedTime field in addition to the primary keys), and a violation entity. The violation entity will also have both users and subjects as foreign keys.
Each time a user adds a new subject record into the violation table or updates an existing record in the violation table, I would like the database to programmatically select the current record's values and copy those values into a new record (essentially duplicate the entire record or certain field values I choose) in the activityLog and set the current system date/time in its updatedTime field.
For the second goal, my agency has business rules that impose penalties for violations and the penalties are assessed based on first, second, third offenses. For example, if a subject commits 5 violations and 2 of the 5 are of the same violation type, then the penalty for the 2nd occurrence of the 2 violations that are the same should be elevated to a second offense penalty (all others will remain at 1st offense if no other violation types are 2 or more).
What I'd like the database to do is select the subjectID and violationID from the activityLog table, group by subjectID, and count the number of violationIDs. After typing this out, I am realizing that is basically a query. So the results of this query will tell me how many times an individual committed a violation, and I'd write VBA code to update a record in a permanent table containing the queried data (I have no clue what type of query this would be...an update query, perhaps).
Based on the descriptions I have provided above, how would this design be rated as far as good/bad/efficient/inefficient? Please advise.
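A hypothetical sketch of the counting query the question describes, with table and column names taken from the question and invented sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE activityLog (subjectID INTEGER, violationID INTEGER, updatedTime TEXT)"
)
# Invented sample data: subject 1 commits violation 10 twice.
rows = [(1, 10, '2023-01-01'), (1, 10, '2023-02-01'),
        (1, 11, '2023-03-01'), (2, 10, '2023-01-15')]
cur.executemany("INSERT INTO activityLog VALUES (?, ?, ?)", rows)

# How many times has each subject committed each violation type?
offense_counts = cur.execute("""
    SELECT subjectID, violationID, COUNT(*) AS occurrences
    FROM activityLog
    GROUP BY subjectID, violationID
    ORDER BY subjectID, violationID
""").fetchall()
# A count of 2 or more for a (subject, violation) pair is what would
# trigger the elevated second-offense penalty described above.
```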
The problem
Consider these database tables:
Product
Order
Order Details
User
Product has columns:
Product_Name, Product_Description, Product_Size, Product_Cost, Product_Unit
Order has columns:
Order_number, Order_Total, Order_Status, Order_Payment_Status, Order_UserId (Fk of user table), Order_date
Order Details has columns:
OrderDetails_OrderId(Fk of Order table), OrderDetails_ProductId (Fk of Product table), OrderDetails_Quantity
User has columns:
User_Name, User_Phone (unique), User_Email (unique), User_Address
Consider the order statuses to be: placed, packed, delivered, canceled, closed.
Now there are three orders for user u1:
order O1 -> Placed status (Editable by user)
order O2 -> Placed status (Editable by user)
order O3 -> Closed status (Non-editable by the user, but editable from admin)
Now the scenario is that user u1 updates his information. This updated information should be reflected only in O1 and O2, because they are still in the placed status, while O3 was already closed for the user and is now open only for the admin's edits - so O3 should still reflect the old user information that was there previously. With the current database structure this is not possible.
Similarly, if the admin edits product that was there in closed order, then the edits should not be displayed in the closed order.
As you may have figured out, the structure depicted above is a simple foreign-key-related structure, wherein an edit in one place will obviously be reflected directly in all related entities.
What solutions did I figure out?
Solution 1: Versioning
Never update any row/entry. Always add a new row for any change (soft updates). Keep adding rows with a tag/id/timestamp/audit trail (who edited) and map the version to the order table using a mapping table.
i.e.
User_Name | User_Phone | User_Email    | User_Address | Version/Timestamp
abc       | 123        | abc@email.com | someaddress  | v1
abc       | 234        | abc@email.com | someaddress  | v2

new mapping table

version | order_id
v1      | o3
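The versioning idea can be sketched like this; names and values mirror the example above, while the schema itself is an assumption:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE user_version (
    version    TEXT PRIMARY KEY,
    user_name  TEXT,
    user_phone TEXT,   -- note: can no longer be declared UNIQUE
    user_email TEXT    -- same problem here
);
CREATE TABLE order_user_version (
    order_id TEXT,
    version  TEXT REFERENCES user_version(version)
);
""")
cur.execute("INSERT INTO user_version VALUES ('v1', 'abc', '123', 'abc@email.com')")
cur.execute("INSERT INTO user_version VALUES ('v2', 'abc', '234', 'abc@email.com')")
cur.execute("INSERT INTO order_user_version VALUES ('o3', 'v1')")

# A closed order keeps pointing at the version that was current at closure.
phone_for_o3 = cur.execute("""
    SELECT uv.user_phone
    FROM order_user_version ouv
    JOIN user_version uv ON uv.version = ouv.version
    WHERE ouv.order_id = 'o3'
""").fetchone()[0]
```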
Drawbacks of this solution are:
Multiple entries in the same table for the same entity mean we can't use unique keys. Here phone and email are unique, but with this approach the unique indices have to be removed.
All tables (even those unrelated to orders) that hold foreign keys to the User table are impacted. For example, the user_feedback table has only the user's foreign key, but now that there are multiple entries for different versions of the same user, this table is affected needlessly.
As the number of users grows, the performance of select queries will suffer.
The user's email is the identity used for logging in; duplicating it within the same table is not possible anyway.
No, this is not audit trailing! As per our requirement, the old information we want to preserve for o3 must still be editable, so those edits will also have to be audited. Audit trailing will therefore be a separate wrapper altogether.
Solution 2: When the order closes, create a new table whose columns save a JSON dump of all respective tables
i.e.
new table
order_id | JsonOfUser | JsonOfProductDetails | ...
o3 | {"name":"abc",...} | ... |
Drawbacks of this solution:
The dumped data is supposed to remain editable, but it is now difficult to edit: the table has changed, a string/jsonb column is what effectively gets edited, and the usual navigations are gone (denormalized), so all recalculations that edits would normally trigger have to be done manually.
Audit trailing of edits in this table will be cumbersome, because we would be auditing JSON edits.
Deeply nested child JSON increases code complexity.
Solution 3: Create copies of all order-related tables, structure intact, according to status events
i.e.
User_Common
User_Closed
For order O3, upon closure all details from User_Common will be copied to User_Closed, and order O3, which held the foreign key of the User_Common table, will be changed to the foreign key of the User_Closed table. Any changes to O3 now effectively apply to the old data, while all other open/placed orders still get updated information from User_Common.
Drawbacks of this solution:
Suppose there are 10 such tables related to the order with this requirement; then a copy of each table has to be made.
Each entity is now effectively represented by two tables depending on the event/status of the order, so syncing and data-keeping issues may arise - i.e. maintainability suffers.
The foreign keys of the order table change here. Effectively the order table needs two foreign key columns, one for User_Common and one for User_Closed. While the order is open, the User_Closed foreign key stays null; when the order closes it gets filled, preceded by one extra data operation to copy the information from User_Common to User_Closed.
In code we'll always have to check the order status (another DB call) to decide whether to look up the common table or the closed table, which leads to code-level cognitive complexity.
This was a minimal dummy replication of our requirement and of the solutions we came up with in research. What practical design can satisfy this requirement without adding needless complexity?
Use the approach where the User table gets another valid_since column, and apply the "never delete" strategy to Users.
If you measure performance issues, add a persisted/materialized (in-memory) view of User that only shows the most current address; use it to get the user_id for newly placed orders and in joins that show open orders. Joins for existing orders that use foreign keys into User mostly don't care how many actual user_ids there are (simplified).
Use an after-insert trigger on User to propagate the new user_id to all entries of the Order table that should reflect those changes, ignoring closed orders. Relative to the orders table this is a rather small update - how many open orders is one user allowed to have? 10? 20? 50?
Clean up user data regularly in case users change their details but never order anything - those User entries can be erased.
That way you ensure integrity at the database level. If you want, add a report for users that change their details more than three times a day (or limit those changes on the frontend).
Most of your user fields should also be 1:N relations - I have at least 3 telephone numbers and might use 2 addresses (living + shipping), and more than one email is a given.
Moving those out into their own tables with an "active" flag might alleviate the need to create full User copies. Based on business needs, the address used to ship to might be worth remembering; not so much the mobile number used to order or the email used to send confirmations - but that is a business decision.
Unless you have a very active shop with millions of users that each place hundreds of orders and change their details twenty times a month, you won't run into problems using any of this with a current state-of-the-art database system. It seems to me this is more a thought experiment than something based on actual needs?
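A rough sketch of this valid_since / never-delete approach, using SQLite and invented table names; the after-insert trigger's effect is simulated with a plain UPDATE for clarity:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE users (
    user_id     INTEGER PRIMARY KEY,
    person_id   INTEGER,          -- stable identity across versions
    address     TEXT,
    valid_since TEXT
);
CREATE TABLE orders (
    order_id TEXT PRIMARY KEY,
    user_id  INTEGER REFERENCES users(user_id),
    status   TEXT
);
""")
cur.execute("INSERT INTO users VALUES (1, 100, 'old address', '2022-01-01')")
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [('O1', 1, 'placed'), ('O2', 1, 'placed'), ('O3', 1, 'closed')])

# User 100 edits their address: insert a new version row ...
cur.execute("INSERT INTO users VALUES (2, 100, 'new address', '2023-06-01')")
# ... and repoint only the still-open orders (the trigger's job).
cur.execute("UPDATE orders SET user_id = 2 WHERE user_id = 1 AND status != 'closed'")

addr = dict(cur.execute("""
    SELECT o.order_id, u.address FROM orders o JOIN users u USING (user_id)
""").fetchall())
```

O1 and O2 now see the new address while O3 keeps the address it was closed with, with no table copies and no JSON dumps.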
The users I am concerned with can either be "unconfirmed" or "confirmed". The latter means they get full access, where the former means they are pending on approval from a moderator. I am unsure how to design the database to account for this structure.
One thought I had was to have 2 different tables: confirmedUser and unconfirmedUser that are pretty similar except that unconfirmedUser has extra fields (such as "emailConfirmed" or "confirmationCode"). This is slightly impractical as I have to copy over all the info when a user does get accepted (although I imagine it won't be that bad - not expecting heavy traffic).
The second way I imagined this would be to actually put all the users in the same table and have a key towards a table with the extra "unconfirmed" data if need be (perhaps also add a "confirmed" flag in the user table).
What are the advantages and disadvantages of each approach, and is there perhaps a better way to design the database?
The first approach means you'll need to write every query you have for two tables - for everything that's common. Bad (tm). The second option is definitely better. That way you can add a simple where confirmed = True (or False) as required for specific access.
What you could actually ponder is whether the confirmation data (not the user, just the data) is stored in the same table. Perhaps it would be cleaner and more normalized to have all confirmation data in a separate table, so you left join confirmation on confirmation.userid = users.id where confirmation.userid is not null (or similar, or an inner join, or get all + filter in a server-side script, etc.) to get only confirmed users. The additional data like confirmation email, date, etc. can be stored there.
Personally I would go for your second option: 1 users table with a confirmed/pending column of type boolean. Copying over data from one table to another identical table is impractical.
You can then create groups and attach specific access rights to each group and assign each user to a specific group if the need arises.
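A minimal sketch of the single-table option described above (one users table with a confirmed flag, confirmation-only data in a side table); all names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE users (
    id        INTEGER PRIMARY KEY,
    email     TEXT UNIQUE,
    confirmed INTEGER NOT NULL DEFAULT 0   -- boolean flag
);
-- Optional side table holding confirmation-only data (1:1 or 1:N).
CREATE TABLE confirmation (
    user_id           INTEGER REFERENCES users(id),
    confirmation_code TEXT,
    email_confirmed   INTEGER
);
""")
cur.execute("INSERT INTO users VALUES (1, 'a@example.com', 1)")
cur.execute("INSERT INTO users VALUES (2, 'b@example.com', 0)")
cur.execute("INSERT INTO confirmation VALUES (2, 'XYZ123', 0)")

# Every common query stays against one table; filtering confirmed
# users is a single WHERE clause, no duplicated queries.
confirmed = [r[0] for r in cur.execute(
    "SELECT email FROM users WHERE confirmed = 1")]
```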
Logically, this is inheritance (aka. category, subclassing, subtype, generalization hierarchy etc.).
Physically, inheritance can be implemented in 3 ways, as discussed in several other questions on SO.
In this particular case, the strategy with all types in the same table seems most appropriate1, since the hierarchy is simple and unlikely to gain new subclasses, subclasses differ by only a few fields and you need to maintain the parent-level key (i.e. unconfirmed and confirmed user should not have overlapping keys).
1 I.e. the "second way" mentioned in your question. Whether to also put the confirmation data in the same table depends on the needed cardinality - i.e. is there a 1:N relationship there?
The best way to do this is to have a table for the users with a StatusID as a foreign key; the status table would have all the different types of confirmations, all the different combinations you could have. This is, in my opinion, the best way to structure the database for normalization and for your programming needs.
so your Status Table would look like this
StatusID | Description
=============================================
1 | confirmed
2 | unconfirmed
3 | CC confirmed
4 | CC unconfirmed
5 | acct confirmed CC unconfirmed
6 | all confirmed
user table
userID | StatusID
=================
456 | 1
457 | 2
458 | 2
459 | 1
If you need the confirmation code, you can store it inside the user table and program it to change after it is used, so you can reuse the same field when they need to reset a password or whatever.
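The status-lookup design above can be sketched like this, with the values from the example tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE status (StatusID INTEGER PRIMARY KEY, Description TEXT);
CREATE TABLE user   (userID INTEGER PRIMARY KEY,
                     StatusID INTEGER REFERENCES status(StatusID));
""")
cur.executemany("INSERT INTO status VALUES (?, ?)",
                [(1, 'confirmed'), (2, 'unconfirmed')])
cur.executemany("INSERT INTO user VALUES (?, ?)",
                [(456, 1), (457, 2), (458, 2), (459, 1)])

# Which users are still pending confirmation?
unconfirmed_users = [r[0] for r in cur.execute("""
    SELECT u.userID FROM user u
    JOIN status s ON s.StatusID = u.StatusID
    WHERE s.Description = 'unconfirmed'
    ORDER BY u.userID
""")]
```

Adding a new confirmation combination is then just one new row in the status table, not a schema change.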
maybe I am assuming too much?
I am wondering what is the best way to make bank transaction table.
I know that a user can have many accounts, so I added AccountID instead of UserID, but how do I name the other, foreign account? And how do I know whether it is an incoming or outgoing transaction? I have an example here, but I think it can be done better, so I ask for your advice.
In my example I store all transactions in one table and add a bool isOutgoing. If it is set to true, I know that the user sent money to ForeignAccount; if it's false, ForeignAccount sent money to the user.
My example
Please note that this is not for real bank, of course. I am just trying things out and figuring best practices.
My opinion:
Make the ID not null, Identity(1,1), and the primary key.
UserAccountID is fine. Don't forget to create the FK to the Accounts table.
You could make ForeignAccount an integer as well, if every transaction is between 2 accounts and both accounts are internal to the organization.
Don't create nvarchar fields unless necessary (they occupy twice as much space), and don't make it 1024. If you need more than 900 chars, use varchar(max), because if the column is under 900 chars you can still create an index on it.
Create the datetime columns with default getdate(), unless you can create transactions on a date other than the actual date.
Amount should be numeric, not integer.
Usually, I think, you would see a column reflecting a DEBIT or CREDIT to the account, rather than an outgoing flag.
There are probably several tables, something like these:
ACCOUNT
-------
account_id
account_no
account_type
OWNER
-------
owner_id
name
other_info
ACCOUNT_OWNER
--------------
account_id
owner_id
TRANSACTION
------------
transaction_id
account_id
transaction_type
amount
transaction_date
Here you would get 2 records per transfer - one showing a debit and one showing a credit.
If you really wanted, you could link these two transactions in another table:
TRANSACTION_LINK
----------------
transaction_id1
transaction_id2
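A sketch of this debit/credit design in SQLite; `txn` stands in for TRANSACTION (a keyword in many SQL dialects), and the column types are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE account (account_id INTEGER PRIMARY KEY, account_no TEXT);
CREATE TABLE txn (
    transaction_id   INTEGER PRIMARY KEY,
    account_id       INTEGER REFERENCES account(account_id),
    transaction_type TEXT CHECK (transaction_type IN ('DEBIT', 'CREDIT')),
    amount           NUMERIC,
    transaction_date TEXT
);
CREATE TABLE transaction_link (transaction_id1 INTEGER, transaction_id2 INTEGER);
""")
cur.executemany("INSERT INTO account VALUES (?, ?)", [(1, 'A-1'), (2, 'A-2')])

# Transfer 50 from account 1 to account 2: two rows, linked together.
cur.execute("INSERT INTO txn VALUES (10, 1, 'DEBIT', 50, '2023-01-01')")
cur.execute("INSERT INTO txn VALUES (11, 2, 'CREDIT', 50, '2023-01-01')")
cur.execute("INSERT INTO transaction_link VALUES (10, 11)")

# An account balance is then credits minus debits - no direction flag needed.
balance_acct2 = cur.execute("""
    SELECT SUM(CASE transaction_type WHEN 'CREDIT' THEN amount ELSE -amount END)
    FROM txn WHERE account_id = 2
""").fetchone()[0]
```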
I'd agree with the comment about the isOutgoing flag - it's far too easy for an insert/update to set it incorrectly (although the name of the column is clear, as a column it could be overlooked and therefore left at a default value).
Another approach for a transaction table could be along the lines of:
TransactionID (unique key)
OwnerID
FromAccount
ToAccount
TransactionDate
Amount
Alternatively you can have a "LocalAccount" and a "ForeignAccount" and the sign of the Amount field represents the direction.
If you are doing transactions involving multiple currencies then the following columns would be required/considered
Currency
AmountInBaseCcy
FxRate
If multiple currencies are involved, you either want an FX rate per currency combination (or to a common currency) per date, or you store the rate per transaction - that depends on how it is calculated.
I think what you are looking for is how to handle a many-to-many relationship (accounts can have multiple owners, owners can have multiple accounts).
You do this through a joining table. So you have account with all the details needed for an account, user with all the details needed for a user, and then account_user, which contains just the ids from the other two tables.
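The joining-table idea can be sketched like this; all names and sample data are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE account (account_id INTEGER PRIMARY KEY, account_no TEXT);
CREATE TABLE user    (user_id INTEGER PRIMARY KEY, name TEXT);
-- The joining table: one row per (account, owner) pair.
CREATE TABLE account_user (
    account_id INTEGER REFERENCES account(account_id),
    user_id    INTEGER REFERENCES user(user_id),
    PRIMARY KEY (account_id, user_id)
);
""")
cur.executemany("INSERT INTO user VALUES (?, ?)", [(1, 'alice'), (2, 'bob')])
cur.execute("INSERT INTO account VALUES (100, 'JOINT-1')")
# A joint account owned by both users:
cur.executemany("INSERT INTO account_user VALUES (?, ?)", [(100, 1), (100, 2)])

owners = [r[0] for r in cur.execute("""
    SELECT u.name FROM account_user au
    JOIN user u ON u.user_id = au.user_id
    WHERE au.account_id = 100 ORDER BY u.name
""")]
```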
I have a problem creating a table for a database. I want to record many statuses for each farmer: for example, a farmer will perform many procedures in paddy farming, about 26 of them from cultivation to harvesting.
So each farmer must follow a schedule for each procedure according to dates fixed by the Agriculture assistant. My problem is how to record this procedure status so as to know whether the farmer is following the schedule or not. For now I use the 26 procedures as the attributes of the activity table, so in the activity table I have the attributes
farmerID, status1 (for activity 1, e.g. cultivation),
status2 (for activity 2, e.g. fertilization),
status3,
and so on until status26. Is this the correct way? My lecturer says it is incorrect because there are so many attributes. Can you help me out of this problem? I can't think about this any more.
Not a good way of handling it, especially since it doesn't scale without adding new fields (and having your code map those new fields). I'd do something like this:
tbl_farmer
- farmerId
tbl_status
- statusId
- name (i.e. Cultivation, etc.)
tbl_activity
- farmerId
- statusId
And each time a farmer performs a status update, you place the entry inside tbl_activity. Basically tbl_activity is a reference table
An alternative approach would be to give each activity (procedure) an id and instead of many columns only have three.
farmer_id
activity_id
status
Assuming that your activities are stored in a separate table.
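The three-column alternative can be sketched like this; the status values and table names are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
-- One lookup table for the 26 procedures ...
CREATE TABLE activity_type (activity_id INTEGER PRIMARY KEY, name TEXT);
-- ... and one row per (farmer, activity) instead of 26 status columns.
CREATE TABLE farmer_activity (
    farmer_id   INTEGER,
    activity_id INTEGER REFERENCES activity_type(activity_id),
    status      TEXT   -- e.g. 'done', 'pending', 'missed'
);
""")
cur.executemany("INSERT INTO activity_type VALUES (?, ?)",
                [(1, 'Cultivation'), (2, 'Fertilization')])
cur.executemany("INSERT INTO farmer_activity VALUES (?, ?, ?)",
                [(7, 1, 'done'), (7, 2, 'pending')])

# Which scheduled activities has farmer 7 not completed yet?
pending = [r[0] for r in cur.execute("""
    SELECT t.name FROM farmer_activity fa
    JOIN activity_type t ON t.activity_id = fa.activity_id
    WHERE fa.farmer_id = 7 AND fa.status != 'done'
""")]
```

Adding a 27th procedure is then a single row in activity_type, with no schema change at all.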