Bank transactions table - can this be done better?

Bank transactions table - can this be done better? - sql-server

I am wondering what is the best way to make bank transaction table.
I know that user can have many accounts so I add AccountID instead of UserID, but how do I name the other, foreign account. And how do I know if it is incoming or outgoing transaction. I have an example here but I think it can be done better so I ask for your advice.
In my example I store all transactions in one table and add bool isOutgoing. So if it is set to true than I know that user sent money to ForeignAccount if it's false then I know that ForeignAccount sent money to user.
My example
Please note that this is not for real bank, of course. I am just trying things out and figuring best practices.

My opinion:
make the ID not null, Identity(1,1) and primary key
UserAccountID is fine. Dont forget to create the FK to the Accounts table;
You could make the foreignAccount a integer as well if every transaction is between 2 accounts and both accounts are internal to the organization
Do not create Nvarchar fields unless necessary (the occupy twice as much space) and don't create it 1024. If you need more than 900 chars, use varchar(max), because if the column is less than 900 you can still create an index on it
create the datetime columns as default getdate(), unless you can create transactions on a different date that the actual date;
Amount should be numeric, not integer

usually, i think, you would see a column to reflect DEBIT, or CREDIT to the account, not outgoing.
there are probably several tables something like these:
ACCOUNT
-------
account_id
account_no
account_type
OWNER
-------
owner_id
name
other_info
ACCOUNT_OWNER
--------------
account_id
owner_id
TRANSACTION
------------
transaction_id
account_id
transaction_type
amount
transaction_date
here you would get 2 records for transactions - one showing a debit, and one for a credit
if you really wanted, you could link these two transactions in another table
TRANSACTION_LINK
----------------
transaction_id1
transaction_id2

I'd agree with the comment about the isOutgoing flag - its far too easy for an insert/update to incorrectly set this (although the name of the column is clear, as a column it could be overlooked and therefore set to a default value).
Another approach for a transaction table could be along the lines of:
TransactionID (unique key)
OwnerID
FromAccount
ToAccount
TransactionDate
Amount
Alternatively you can have a "LocalAccount" and a "ForeignAccount" and the sign of the Amount field represents the direction.
If you are doing transactions involving multiple currencies then the following columns would be required/considered
Currency
AmountInBaseCcy
FxRate
If involving multiple currencies then you either want to have an fx rate per ccy combination/to a common ccy on a date or store it per transaction - that depends on how it would be calculated

I think what you are looking for is how to handle a many-tomany relationship (accounts can have multiple owners, owners can have mulitple accounts)
You do this through a joining table. So you have account with all the details needed for an account, you have user for all teh details needed for a user and then you have account USer which contains just the ids from both the other two tables.

Related

Either of 2 columns is always redundant -- is there a better solution?

Say, I want to create a form for a feedback. If a registered user submits a feedback, his email address is used automatically because he's authenticated. If an anonymous user does that, he has to enter his email manually. My table would look like this:
feedbacks(id, user_id, email, body)
As you can see, it has a redundant column: either user_id or email. And for those who's not familiar with the database structure it'll be confusing: why both email and user_id? can they both be null? or both have a value at the same time? in reality, only one of them must have a value, which isn't possibly to achieve on database level using constraints. Also, what if I by mistake insert values in both columns?
Thus, I wonder, is there any way to change its structure so that it's more wise and that issue described above has become resolved? Using a trigger isn't what I'm looking for.
In other words, the issue is "either of 2 columns is always redundant".

If you had several mutually exclusive columns, then you might have a good case for something called entity sub-typing. As it is, there is no good design reason for adding all of the extra overhead of this design pattern.
These are the basic options that you have:
Two mutually exclusive columns in one table - This is your current design. This is a good design because it lets you define a proper foreign key constraint on your user_id. You mention that it may be confusing for people that don't know the database well because the same kind of information might appear in one or the other place in the table. However, it's important to remember that even though both columns contain a string that happens to be in the form of an email address, to your system these things are semantically distinct. One is a foreign key to your user table. The other is a means of contacting (or identifying?) a non-member. You could avoid this apparent confusion in one of two ways: (a) give a more descriptive name to your email column, such as non_member_email or (b) create a view that coalesces user_id and email into a single column for displaying this information to people who would otherwise be confused.
Entity Subtyping - This approach has you create separate tables for logically separate groups of predicates (columns). These are joined together by a supertype table which gives a common primary key for all logical subtypes, as well as holding all other common predicates. You can google around to learn more about this design pattern. As I've already mentioned, this is overkill for your case because you only have one pair of mutually exclusive columns. If you think it's confusing to have this then having three tables (supertype, member subtype, non-member subtype) will really be confusing.
Column Overloading - This approach would have you combine both columns into a single one. This is feasible because you only need room in your table for one email address at a time. This is a terrible idea because it prevents you from creating a declarative referential constraint on user_id which is a very important tool for maintaining your data's referential integrity. It also conflates two semantically different pieces of information, which violates good database design principles.
The best choice is number 1. Don't worry about having two mutually exclusive columns or if you think you can't "comment" your way around the confusion you think this might cause with more descriptive column names, then use a view to hide the "complexity" of storing two things that look similar in two separate columns.

If one must be exclusively filled:
create table feedbacks (
id integer,
user_id text,
email text,
body text,
check ((user_id is null)::int + (email is null)::int = 1)
);
The cast from boolean to integer yields either 1 or 0, so the sum must be 1.

Remove the email field. If the user is registered, enter their user_id as you do now. If the user is not registered, search the user table for an anonymous entry with that email address. If exists, use that user_id. Otherwise, create an entry in the user table named 'Anonymous', storing the address and use the newly created user_id. There are two advantages:
You don't need mutually exclusive fields. As you have already noticed, these can be the cause of a lot of confusion and extra work to keep the data clean.
If an anonymous poster later registers, the existing "anonymous" user entry can be updated, thus preserving the user_id value and preserving all feedback (and any other activity you track for anonymous users) entered before registering. That is, if a user anonymously enters a few feedbacks then registers, the previous feedbacks remain associated with the now named user.

I might misunderstand the question, but why you say it is impossible to do with constraints?..
t=# CREATE TABLE feedbacks (
t(# id integer,
t(# user_id text CHECK (case when email is null then user_id is distinct from null else user_id is null end),
t(# email text CHECK (case when user_id is null then email is distinct from null else email is null end),
t(# body text
t(# );
CREATE TABLE
t=# insert into feedbacks select 1,null,null,'t';
ERROR: new row for relation "feedbacks" violates check constraint "feedbacks_check1"
DETAIL: Failing row contains (1, null, null, t).
t=# insert into feedbacks select 1,'t','t','t';
ERROR: new row for relation "feedbacks" violates check constraint "feedbacks_check1"
DETAIL: Failing row contains (1, t, t, t).
t=# insert into feedbacks select 1,'t',null,'t';
INSERT 0 1
t=# insert into feedbacks select 1,null,'t','t';
INSERT 0 1
t=# select * from feedbacks ;
id | user_id | email | body
----+---------+-------+------
1 | t | | t
1 | | t | t
(2 rows)

Database schema about social networking like Facebook

Like Facebook, I have posts, comments and user profiles.
I THINK THAT
Posts and comments do not need the details of user
ONLY user profiles need the details
So I separate the user information into main and detail
Here is the schema.
Question
Is it necessary to separate user data into main and details?
WHY not or WHY yes?
Thanks for applying!

I would recommend using separate tables because you may not need all that information at one time. You could do it either way but I think of it as do you need all of the data at once.
Table 1 (User Auth)
This table would hold only information for log-in and have three columns (user_name, hashed_password, UID)
So your query would select UID where user_name and hashed_password matched. I would also recommend never storing a readable password in a database table because that can become a security issue.
Table 2 (Basic Information)
This table would hold the least amount of information that you would get at signup to make a basic profile. The fields would consist of UID, name, DOB, zip, link_to_profile_photo, email and whatever basic information you would like. email is kind of special because if you require the user_name to be an email address there is no reason to have it twice.
Table 3 (Extended Information)
This table would hold any optional information that the user could enter like phone_number, bio or address assigned by UID.
Then after that you can add as many other tables that you would like. One for Post, one for comments, ect.
An Example of a Post table would be like:
post_id, UID, the_post, date_of_post, likes, ect.
Then for Comments
comment_id, for_post_id, UID, the_comment, date_of_comment, likes, ect.
Breaking it down in to small sections would be more efficient in the long run.

Database performance is associated with disk seek time. Disk seek time is a bottleneck of database performance. For large table, you may need large seek time to locate and read an entry. As for post and comments you do not need user details, just user main info, you may get reduced read time when you read only user Id for post and comments. Also joins with user_main_info will be faster. You may keep the smallest portion of data you need to read most frequently on one table and other detailed information on another table. But, in a scenario like when you will always need to read all the user information together, this won't give you any benefit.

1)the userinformation table will be added
ex:create table fb_users
(intuserid primary key,
username varchar(50),
phoneno int,
emailid varchar(max))
2)the sending of the friend request would be
2.a)create the table name called friends, friend requestor, friend requested by, status b/w both of them, Active flag
ex:create table fb_friends
(intfriendid primary key,
intfriendrequestor int (foreign key with fb_users's(intuserid)),
intfriendrequestedby int (foreign key with fb_users's(intuserid)),
statusid varchar(max)(use the status id from the below table which is a look up table),
active bit)
3)creating the table for the status the status
3.a)create the table name called status, statusname, statusdesc, Active flag
ex:create table fb_staus
(intstatusid primary key,
statusname varchar,
statusdesc varchar,
active bit)
the status could be
pending
approval
deleted
..etc
4)similarly for creating the groups,likes,comments also
a table will be created respectively for each one of them and the foreign key of the intuserid from user table
are linked for each of them

Database Design Questions

I have 2 questions regarding a project. Would appreciate if I get clarifications on that.
I have decomposed Address into individual entities by breaking down to the smallest
unit. Bur addresses are repeated in a few tables. Like Address fields are there in the
Client table as well as Employee table. Should we separate the Address into a separate table with just a linking field
For Example
Create an ADDRESS Table with the following attributes :
Entity_ID ( It could be a employee ID(Home Address) or a client ID(Office Address) )
Unit
Building
Street
Locality
City
State
Country
Zipcode
Remove all the address fields from the Employee table and the Client table
We can obtain the address by getting the employee ID and referring the ADDRESS table for the address
Which approach is better ? Having the address fields in all tables or separate as shown above. Any thoughts on which design in better ?

Ya definitely separating address is better Because people can have multiple addresses so it will be increasing data redundancy.
You can design the database for this problem in two ways according to me.
A. Using one table
Table name --- ADDRESS
Column Names
Serial No. (unique id or primary key)
Client / Employee ID
Address.
B. Using Two tables
Table name --- CLIENT_ADDRESS
Column Names
Serial No. (unique id or primary key)
Client ID (foreign key to client table)
Address.
Table name --- EMPLOYEE_ADDRESS
Column Names
Serial No. (unique id or primary key)
Client ID (foreign key to employee table)
Address.
Definitely you can use as many number of columns instead of address like what you mentioned Unit,Building, Street e.t.c
Also there is one suggestion from my experience
Please add this five Columns in your each and every table.
CREATED_BY (Who has created this row means an user of the application)
CREATED_ON (At what time and date table row was created)
MODIFIED_ON (Who has modified this row means an user of the application)
MODIFIED_BY (At what time and date table row was modified)
DELETE_FLAG (0 -- deleted and 1 -- Active)
The reason for this from point of view of most of the developers is, Your client can any time demand records of any time period. So If you are deleting in reality then it will be a serious situation for you. So every time when a application user deleted an record from gui you have to set the flag as 0 instead of practically deleting it. The default value is 1 which means the row is still active.
At time of retrieval you can select with where condition like this
select * from EMPOLOYEE_TABLE where DELETE_FLAG = 1;
Note : This is an suggestion from my experience. I am not at all enforcing you to adopt this. So please add it according to your requirement.
ALSO tables which don't have any significant purpose doesn't need this.

Separating address into a seperate table is a better design decision as it means any db-side validation logic etc. only needs to be maintained in one place.

approaches to design of database, which one?

I'm designing a database that will hold a list of transactions. There are two types of transactions, I'll name them credit (add to balance) and debit (take from balance).
Credit transactions most probably will have an expiry, after which this credit balance is no longer valid, and is lost.
Debit transactions must store from which credit transaction they come from.
There is always room for leniency with expiry dates. It does not need to be exact (till the rest of the day for example).
My friend and I have came up with two different solutions, but we can't decide on which to use, maybe some of you folks can help us out:
Solution 1:
3 tables: Debit, Credit, DebitFromCredit
Debit: id | time | amount | type | account_id (fk)
Credit: id | time | amount | expiry | amount_debited | accountId (fk)
DebitFromCredit: amount | debit_id (fk) | credit_id (fk)
In table Credit, amount_debited can be updated whenever a debit transaction occurs.
When a debit transaction occurs, DebitFromCredit holds information of which credit transaction(s) has this debit transaction been withdrawn.
There is a function getBalance(), that will get the balance according to expiry date, amount and amount_debited. So there is no physical storage of the balance; it is calculated every time.
There is also a chance to add a cron job that will check expired transactions and possibly add a Debit transaction with "expired" as a type.
Solution 2
3 tables: Transactions, CreditInfo, DebitInfo
Transactions: id | time | amount (+ or -) | account_id (fk)<br />
CreditInfo: trans_id (fk) | expiry | amount | remaining | isConsumed<br />
DebitInfo: trans_id (fk) | from_credit_id (fk) | amount<br />
Table Account adds a "balance" column, which will store the balance. (another possibility is to sum up the rows in transactions for this account).
Any transaction (credit or debit) is stored in table transactions, the sign of the amount differentiates between them.
On credit, a row is added to creditInfo.
On debit one or more rows are added to DebitInfo (to handle debiting from multiple credits, if needed). Also, Credit info row updates the column "remaining".
A cron job works on CreditInfo table, and whenever an expired row is found, it adds a debit record with the expired amount.
Debate
Solution 1 offers distinction between the two tables, and getting data is pretty simple for each. Also, as there is not a real need for a cron job (except if to add expired data as a debit), getBalance() gets the correct current balance. Requires some kind of join to get data for reporting. No redundant data.
Solution 2 holds both transactions in one table, with + and - for amounts, and no updates are occurring to this table; only inserts. Credit Info is being updated on expiry (cron job) or debiting. Single table query to get data for reporting. Some redundancy.
Choice?
Which solution do you think is better? Should the balance be stored physically or should it be calculated (considering that it might be updated with cron jobs)? Which one would be faster?
Also, if you guys have a better suggestion, we'd love to hear it as well.

Which solution do you think is better?
Solution 2. A transaction table with just inserts is easier for financial auditing.
Should the balance be stored physically or should it be calculated (considering that it might be updated with cron jobs)?
The balance should be stored physically. It's much faster than calculating the balance by reading all of the transaction rows every time you need it.

I am IT student that has passed a course called databases, pardon my inexperience.
I made this using MySQL workbench can send you model via email to you do not lose time recreating the model from picture.
This schema was made in 10 minutes. Its holding transactions for a common shop.
Schema explanation
I have a person who can have multiple phones and addresses.
Person makes transactions when he is making a transaction,
You input card name e.g. american express,
card type credit or debit (MySQL workbench does not have domains or constrains as power-designer as far as i know so i left field type as varchar) should have limited input of string debit or credit,
Card expiry date e.g. 8/12,
Card number e.g. 1111111111
Amount for which to decrease e.g. 20.0
time-stamp of transaction
program enters it when entering data
And link it to person that has made the transsaction
via person_idperson field.
e.g.
1
While id 1 in table person has name John Smith
What all this offers:
transaction is uniquely using a card. Credit or Debit cant be both cant be none.
Speed less table joins more speed for system has.
What program requires:
Continuous comparion of fields variable exactTimestamp is less then variable cardExpiery when entering a transaction.
Constant entering of card details.
Whats system does not have
Saving amount that is used in the special field, however that can be accomplished with Sql query
Saving amount that remains, I find that a personal information, What I mean is you come to the shop and shopkeeper asks me how much money do you still have?
system does not tie person to the card explicitly.
The person must be present and use the card with card details, and keeping anonymity of the person. (Its should be a query complex enough not to type by hand by an amateur e.g. shopkeeper) , if i want to know which card person used last time i get his last transaction and extract card fields.
I hope you think of this just as a proposition.

Database design and large tables?

Are tables with lots of columns indicative of bad design? For example say I have the following table that stores user information and user settings:
[Users table]
userId
name
address
somesetting1
...
somesetting50
As the site requires more settings the table gets larger. In my mind this table is normalized, all the settings are dependent on the userId.
I have a thing against tables with lots of columns it just seems wrong to me, but then I remembered that you can select what data to return from the table, so If the table is large I could still break it into several different objects in code. For example
[User object]
[UserSetting object]
and return only the data to fill those objects.
Is the above common practice, or are their other techniques that deal with tables with lots of columns that are more suitable to use?

I think you should use multiple tables like this:
[Users table]
userId
name
address
[Settings table]
settingId
userId
settingKey
settingValue
The tables are related by the userId column which you can use to retrieve the settings for the user you need to.

I would say that it is bad table design. If a user doesn't have an entry for 47 of those 50 settings then you will have a large number of NULL's in the table which isn't good practice and will also slow down performance (NULL's have to be handled in a special way).
Instead, have the following:
USER TABLE
Id,
FirstName
LastName
etc
SETTINGS
Id,
SettingName
USER SETTINGS
Id,
SettingId,
UserId,
SettingValue
You then have a many to many join, and eliminate NULL's

first, don't put spaces in table names! all the [braces] will be a real pain!
if you have 50 columns how meaningful will all that data be for each user? will there be lots of nulls? Most data may not even apply to any given user. Think 1 to 1 tables, where you break down the "settings" into logical groups:
Users: --main table where most values will be stored
userId
name
address
somesetting1 ---please note that I'm using "somesetting1", don't
... --- name the columns like this, use meaningful names!!
somesetting5
UserWidgets --all widget settings for the user
userId
somesetting6
....
somesetting12
UserAccounting --all accounting settings for the user
userId
somesetting13
....
somesetting23
--etc..
you only need to have a Users row for each user, and then a row in each table where that data applies to the given user. I f a user doesn't have any widget settings then no row for that user. You can LEFT join each table as necessary to get all the settings as needed. Usually you only need to work on a sub set of settings based on which part of the application that is running, which means you won't need to join in all of the tables, just the one or tow that you need at that time.

You could consider an attributes table. As long as your indexes are good, then you wouldn't have too much of a performance issue:
[AttributeDef]
AttributeDefId int (primary key)
GroupKey varchar(50)
ItemKey varchar(50)
...
[AttributeVal]
AttributeValId int (primary key)
AttributeDefId int (FK -> AttributeDef.AttributeDefId)
UserId int (probably FK to users table?)
Val varchar(255)
...
basically you're "pivoting" your table with many columns into 2 tables with less columns. You can write views and table functions around this structure to give you data for a group of related items or just a specific item, etc. You could also add other things to the attribute definition table to indicate required data elements, restrictions on the data elements, etc.
What's your thought on this type of design?

Use several tables with matching indexes to get the best SELECT speed. Use the indexes as a way to relate the information between tables using a JOIN.