Designing an inventory management database?

I am designing a database for an inventory management system which will be used by roughly 10 to 15 companies. The database contains about 25 tables. For the design I'm planning to use a shared-schema architecture (i.e. each schema corresponds to a company, and all of these schemas are placed in a single database).
I want to know whether it is reliable to use a shared-schema architecture.
Can anyone please tell me whether the above-mentioned architecture is the correct decision?
Thanks in advance.

If I read your question correctly, you are suggesting that each company has its own schema. This means two things:
If you decide to implement a basic change to the schema (i.e. not a change that one company requests), then you will have to implement this change in all the schemas.
You will probably have to implement different logic in your front-end program for each company.
Better to develop one schema for the entire database; each table would have a field called 'CompanyID' which naturally defines which company each row belongs to. This field would be a foreign key to the Companies table.
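As a minimal sketch of that single-schema layout (the table and column names here are illustrative, not from the question):

```sql
CREATE TABLE Companies (
    CompanyID INT PRIMARY KEY,
    Name      VARCHAR(200) NOT NULL
);

CREATE TABLE InventoryItems (
    ItemID    INT PRIMARY KEY,
    CompanyID INT NOT NULL,           -- identifies the owning company
    Sku       VARCHAR(50) NOT NULL,
    Quantity  INT NOT NULL DEFAULT 0,
    FOREIGN KEY (CompanyID) REFERENCES Companies (CompanyID)
);

-- Every query is then scoped to one company:
-- SELECT Sku, Quantity FROM InventoryItems WHERE CompanyID = ?;
```

One schema change then rolls out to all companies at once, and the front end stays identical for every tenant.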

Related

Securely store data for multiple parties in a single table

I'm certainly no DBA and only a beginner when it comes to software development, so any help is appreciated. What is the most secure structure for storing data from multiple parties in one database? For instance, if three people have access to the same tables, I want to make sure that each person can only see their own data. Is it best to create a unique ID for each person and store that along with the data, then query based on that ID? Are there other considerations I should take into account as well?
You are on the right track, but mapping the user ID into the table is probably not what you want, because in practice many users have access to a corporation's data. In those cases you would store "CorpID" as a column, or more generically "ContextID". But yes, to limit access to data, each row should be able to convey who the data is for, either directly (the row actually contains a reference to CorpID, UserID, ContextID or the like), or it can be inferred by joining to other tables that reference the qualifier.
In practice, these rules are enforced by a middle tier that queries the database, providing the user context in some way so that only the correct records are selected out of the database and ultimately presented to the user.
...three people have access to the same tables...
If these persons can query the tables directly through some query tool like Toad, then we have a serious problem. If not, that is, if they access the data through some middle tier/service layer, then #wagregg's solution above holds.
Coming to the case where they have direct access rights, one approach is:
Create database-level user accounts for each of the users.
Have another table with row-level grant information. Say your_table has a primary key column MY_PK_COL; then the structure of the GRANTS_TABLE table would be {USER_ID, MY_PK_COL}, with MY_PK_COL a foreign key to your_table.
Remove all privileges of the concerned users from your_table.
Create a view: SELECT * FROM your_table WHERE user_id = getCurrentUserID();
Give your users SELECT/INSERT/UPDATE rights on this view.
Most database systems (MySQL, Oracle, SQL Server) provide a way to get the currently logged-in user (the one used in the connection string). They also provide ways to restrict access to certain tables. To your users the view will behave like a normal table; they will never know the difference. A sketch of this setup follows below.
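Here is a minimal sketch of those steps in MySQL-flavored SQL. It filters through the grants table rather than a user_id column on the base table, and all names (your_table, grants_table, the view, the account) are illustrative:

```sql
-- Base table: end users get no direct grants on it.
CREATE TABLE your_table (
    my_pk_col INT PRIMARY KEY,
    payload   VARCHAR(255)
);

-- Row-level grant info: which database account may see which row.
CREATE TABLE grants_table (
    user_id   VARCHAR(100) NOT NULL,  -- account as CURRENT_USER() returns it, e.g. 'alice@%'
    my_pk_col INT NOT NULL,
    PRIMARY KEY (user_id, my_pk_col),
    FOREIGN KEY (my_pk_col) REFERENCES your_table (my_pk_col)
);

-- SQL SECURITY INVOKER makes CURRENT_USER() resolve to the caller,
-- not to the account that created the view.
CREATE SQL SECURITY INVOKER VIEW your_table_v AS
    SELECT t.my_pk_col, t.payload
    FROM your_table t
    JOIN grants_table g
      ON g.my_pk_col = t.my_pk_col
     AND g.user_id   = CURRENT_USER();

-- Users touch only the view, never the base table.
GRANT SELECT ON your_table_v TO 'alice'@'%';
```

Note that a join view like this is generally read-only in MySQL, so granting INSERT/UPDATE would require a single-table view (or stored procedures) instead.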
A problem arises when there are too many users: provisioning a database-level user account for every one of them may turn out to be difficult. But then a DBMS like MS SQL Server can use Windows authentication, thereby reducing the user-creation problem.
In most scenarios the filter-at-the-middle-tier approach is the best way, but there are times when security is paramount. Also, a bug in the middle tier may allow malicious users to bypass the security; SQL injection is one thing to name. Then you have to do what you have to do.
It sounds like you're talking about a multi-tenant architecture, but I can't tell for sure.
This SO answer has a summary of the issues, and links to an online article containing details about the trade-offs.

How to handle different organization with single DB?

Background
I am building an online information system which users can access through any computer. I don't want to replicate the DB and code for every university or organization.
I just want a user to hit a domain like www.example.com, sign in and use it.
A second user will also hit the same domain www.example.com, sign in and use it, but the data for them is different.
Scenario
Suppose one university has 200 employees, a second university has 150, and so on.
Question
Do I need a separate employee table for each university, or is it OK to have a single table with a column that holds the university ID?
I assume the second is best, but suppose I have 20 universities or organizations and a total of thousands of employees.
What is the best approach?
Does the same apply to all tables? This is just to give you an example.
Thanks
The approach will depend upon the data, usage, and client requirements/restrictions.
Use an integrated model, as suggested by duffymo. This may be appropriate if each organization is part of a larger whole (i.e. all colleges are part of a state college board) and security concerns about cross-query access are minimal². This approach has a minimal amount of separation between the organizations, as the same schema¹ and relations are "openly" shared. It leads to a very simple model initially, but it can become very complicated (with compound FKs and correct usage of such) if relations are needed for organization-specific values, because that adds another dimension to the data.
Implement multi-tenancy. This can be achieved with implicit filters on the relations (perhaps hidden behind views and stored procedures), different schemas, or other database-specific support. Depending upon the implementation, this may or may not share schemas or relations, even though all data may reside in the same database. With implicit isolation, some complicated keys or relationships can be hidden/eliminated. Multi-tenancy isolation also generally makes it harder or impossible to cross-query.
Silo the databases entirely. Each customer or "organization" has a separate database. This implies separate relations and schema groups. I have found this approach to be relatively simple with automated tooling, but it does require managing multiple databases. Direct cross-querying is impossible, although "linked databases" can be used if there is a need.
Even though it's not "a single DB": in our case we had the following restrictions, 1) never allowed to share/expose data between organizations, and 2) each organization wanted its own local database. Thus our product ended up using the silo approach. Make sure that the approach chosen meets customer requirements.
None of these approaches will have any issue with "thousands", "hundreds of thousands", or even "millions" of records as long as the indices and queries are correctly planned. However, switching from one approach to another can violate many assumed constraints, so the decision should be made early on.
¹ In this response I am using "schema" to refer to the security grouping of database objects (e.g. tables, views) and not the database model itself. The actual database model can be common/shared, as we do even when using separate databases.
² An integrated approach is not necessarily insecure, but it doesn't inherently have some of the built-in isolation of the other designs.
I would normalize it to have UNIVERSITY and EMPLOYEE tables, with a one-to-many relationship between them.
You'll have to take care to make sure that only people associated with a given university can see their data. Role based access will be important.
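A minimal sketch of that one-to-many design (the column names are illustrative):

```sql
CREATE TABLE University (
    UniversityID INT PRIMARY KEY,
    Name         VARCHAR(200) NOT NULL
);

CREATE TABLE Employee (
    EmployeeID   INT PRIMARY KEY,
    UniversityID INT NOT NULL,          -- each employee belongs to exactly one university
    FullName     VARCHAR(200) NOT NULL,
    FOREIGN KEY (UniversityID) REFERENCES University (UniversityID)
);

-- Every query is then scoped to the caller's university:
-- SELECT FullName FROM Employee WHERE UniversityID = ?;
```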
This is called a multi-tenant architecture. You should read this:
http://msdn.microsoft.com/en-us/library/aa479086.aspx
I would go with Tenant Per Schema, which means copying the structure across different schemas; however, as you should keep all your SQL DDL in source control anyway, this is very easy to script.
It's easy to screw up and "leak" information between tenants if doing it all in the same table.

Data independence in relational database

Can anybody explain how data independence is ensured in a relational database? What guarantees that nothing will change for the user if the database structure changes?
For example, I have relation R (and have created an application which uses this relation) and the database admin decides to decompose R into R1 and R2. How is application inalterability ensured for the end user?
I asked myself exactly the same question during my Database class.
According to Codd's 12 rules, there are two kinds of data independence:
Physical Data Independence requires that changes at the physical level (like data structures) have no impact on the applications that consume the database. For example, let's say you decide to stop using a Hash Index on your table and use a B-Tree Index instead: your application that executes queries against this table doesn't have to change at all.
Logical Data Independence states that changes at the logical level (tables, columns, rows) will have no impact on the applications that access the database. As you already noticed, this feature is harder to implement than Physical Data Independence, but there are still cases where it works. For example, if you add tables, columns or rows to your current schema, the already-working queries aren't affected at all.
Your question is not phrased very clearly. I don't see the relationship between "data independence" and "application inalterability".
A proper relational structure decomposes data into entities and relationships. The idea is that when a value changes, it only changes in one place. This is the reasoning behind the various "normal forms" of data.
Most user applications do not want to see data in a normalized form. They want to see data in a denormalized form, often with lots of fields gathered together on one line. Similarly, an update might involve several fields in different entities, but to a user, it is just one thing.
A relational database can maintain the structure of the data and allow you to combine data for different viewpoints. It has nothing to do with your second point. Application independence (I think this is a better term than "inalterability") depends on how the application is designed. A well-designed application has a well-designed application programming interface (also known as an API).
It seems that a lot of database developers think that the physical data structure is a good enough API. However, this is often a bad design decision. Often, a better decision is to have all database operations performed through stored procedures, views, and user-defined functions. In other words, don't directly update a table; create a stored procedure called something like usp_table_update that takes the fields and updates the table.
With such a structure, you can modify the underlying database structure and maintain user applications at the same time.
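A minimal sketch of that idea in SQL Server syntax (the table, column, and procedure names are illustrative, not from the answer):

```sql
CREATE PROCEDURE usp_customer_update
    @CustomerID INT,
    @Email      VARCHAR(255)
AS
BEGIN
    -- Callers never touch the table directly; this procedure is the API.
    UPDATE Customer
    SET Email = @Email
    WHERE CustomerID = @CustomerID;
END;
```

If the Customer table is later split in two, only the body of this procedure changes; every caller keeps working unchanged.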
What says that nothing will change for the user if the database structure changes?
Well, database structures can change for many reasons. On a high level, I see two possibilities:
Performance / internal database reasons
Business rules / the world outside the application changed
#1: in this case the DBA has decided to change some structure for performance or other internal reasons. In that case an extra layer, for example using stored procedures, views etc., can help to "hide" the change from the application/user. A good data layer on the application side could also be helpful.
#2: if the outside world changes, or your business rules change, NOTHING you can do on the database level can keep that away from the user. For example, a company that has always used only ONE currency in the database suddenly goes international: in that case your database has to be adapted to support multiple currencies, and that will require serious alteration of the database and changes for the user.
For example, I have relation R (and created an application which uses this relation) and the database admin decides to decompose R into R1 and R2. How is application inalterability ensured for the end user?
The admin should create a view which would represent R1 and R2 as the original R.
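A minimal sketch of that, assuming R(id, a, b) is split so that R1 keeps column a and R2 keeps column b (the names are illustrative):

```sql
CREATE TABLE R1 (
    id INT PRIMARY KEY,
    a  VARCHAR(100)
);

CREATE TABLE R2 (
    id INT PRIMARY KEY,
    b  VARCHAR(100),
    FOREIGN KEY (id) REFERENCES R1 (id)
);

-- The view presents the original relation R to existing applications.
CREATE VIEW R AS
    SELECT R1.id, R1.a, R2.b
    FROM R1
    JOIN R2 ON R2.id = R1.id;

-- Old queries such as SELECT a, b FROM R WHERE id = 42 keep working.
```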

A few database design questions relating to a user content site

I am designing a user content website (kind of similar to Yelp but for a different market, and with photo sharing) and had a few database questions:
1. Does each user get their own set of tables, or are we storing multiple users' data in common tables? Since this is even a social network, databases are usually partitioned off for scalability when the user base grows, with different sets of users served separately, so what is the best approach? I guess some data like user accounts can be in common tables, but for wall posts, photos etc. each user will get their own tables? If so, then if we have 10 million users, does that mean 10 million times whatever number of tables per user? This is currently being designed in MySQL.
2. How do the user tables know what to create each time a user joins the site? I am assuming there may be a system table template from which the fields are pulled?
3. In addition to the above question, if tomorrow we modify tables or add/remove features, how do we roll the changes down to all the live user accounts/tables? I know from a page point of view we have the master template, but for the database, how will the user tables be updated? Is that something we do manually, or will the tables keep checking, say every 24 hours, against the system tables for updates to their structure?
4. If the above is all true, that means we are maintaining one master set of tables with system default values, and each user gets the same values copied to their tables? Take a field like the maximum failed login attempts before the system locks the account: the system default is 5 login attempts within 30 minutes, but I want to allow users to specify their own number to customize their own security, so that means they can overwrite the system default in their own table?
Thanks.
1. Users should not get their own set of tables. It will most likely not perform as well as one (properly indexed) table, and schema changes would have to be deployed to all user tables.
2. You could have default values specified on the table for things that are optional.
3. With difficulty. With one set of tables it will be a lot easier, and probably faster.
4. That sort of data should be stored in a user preferences table that stores all preferences for all users. Again, don't duplicate the schema for each user.
Generally, creating separate tables for each entity (in this case each user) is not a good idea. If every user's data lives in a separate table, querying becomes cumbersome.
If your table is large you should optimize the table with indexes. If it gets very large, you also may want to look into partitioning tables.
This allows you to see the table as 1 object, though it is logically split up - the DBMS handles most of the work and presents you with 1 object. This way you SELECT, INSERT, UPDATE, ALTER etc as normal, and the DB figures out which partition the SQL refers to and performs the command.
Not splitting up the tables by user, and instead using indexes and partitions, deals with scalability while maintaining performance. If you don't split up the tables manually, this also makes points 2, 3, and 4 moot.
Here's a link to partitioning tables (SQL Server-specific):
http://databases.about.com/od/sqlserver/a/partitioning.htm
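For illustration, a minimal MySQL (5.1+) sketch of hash-partitioning a shared table by user; the table and column names are made up:

```sql
CREATE TABLE wall_post (
    post_id INT NOT NULL,
    user_id INT NOT NULL,
    body    TEXT,
    -- MySQL requires the partitioning column in every unique key.
    PRIMARY KEY (post_id, user_id)
)
PARTITION BY HASH (user_id)
PARTITIONS 16;

-- Queries stay exactly the same; queries that filter on user_id
-- are pruned to a single partition automatically:
-- SELECT body FROM wall_post WHERE user_id = ?;
```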
It doesn't make any kind of sense to me to create a set of tables for each user. If you have a common set of tables for all users then I think that avoids all the issues you are asking about.
It sounds like you need to locate a primer on relational database design basics. Regardless of the type of application you are designing, you should start there. Learn how joins work, indices, primary and foreign keys, and so on. Learn about basic database normalization.
It's not customary to create new tables on the fly in an application; it's usually unnecessary in a properly designed schema. Usually schema changes are done at deployment time. The only time "users" get their own tables is as an artifact of a provisioning decision, wherein each "user" is effectively a tenant in a walled-off garden; this only makes sense if each "user" (more likely, a company or organization) never needs access to anything that other users in the system have stored.
There are mechanisms for dealing with loosely structured information in databases, but if you find yourself reaching for them often (the most common method is called Entity-Attribute-Value), your problem is either not quite correctly modeled, or you may not actually need a relational database, in which case you might be better off with a document-oriented database like CouchDB/MongoDB.
Adding, based on your updated comments/notes:
Your concerns about the number of records in a particular table are most likely premature. Get something working first. Most modern DBMSes, including newer versions of MySQL, support mechanisms beyond indices and clustered indices that can help deal with large numbers of records. To wit, in MS SQL Server you can create a partition function on fields of a table; MySQL 5.1+ has a few similar partitioning options based on hash functions, ranges, or other mechanisms. Follow well-established conventions for database design, modeling your domain as sensibly as possible, then adjust when you run into problems. First adjust using the tools available within your choice of database, then consider more drastic measures only when you can prove they are needed. There are other kinds of denormalization that are more likely to make sense before you would even want to consider something as unidiomatic to database systems as a "table per user" model; even if I were to look at that route, I'd probably consider something like materialized views first.
I agree with the comments above that say a table per user is a bad idea. Also, while it's a good idea to have strategies in mind now for how you can cope when things get really big, I'd concentrate on getting things right for a small number of users first; if no one wants to, or is able to, use your service, then unfortunately you won't be faced with the problem of lots of users.
A common approach among very large sites is database sharding. In summary: you have N instances of your database in parallel (on separate machines), and each holds 1/N of the total data. There's some shared way of knowing which instance holds a given bit of data. To access some data you then have two steps, rather than the one you might expect:
Work out which shard holds the data
Go to that shard for the data
There are problems with this, such as: you set up e.g. 8 shards and they all fill up, so you want to spread the data over e.g. 20 shards, which means migrating data between shards. A sketch of the lookup step follows below.
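One way to implement the "which shard holds the data?" step, purely as an illustrative sketch (the directory table and all names are assumptions, not from the answer):

```sql
-- A small "directory" database maps each user to a shard.
CREATE TABLE shard_map (
    user_id  INT PRIMARY KEY,
    shard_no SMALLINT NOT NULL   -- which of the N databases holds this user's rows
);

-- Step 1: look up the shard in the directory database.
-- SELECT shard_no FROM shard_map WHERE user_id = ?;
-- Step 2: connect to that shard's database and run the real query there.
-- SELECT ... FROM wall_post WHERE user_id = ?;
```

A directory table like this also eases the 8-to-20-shard migration mentioned above, since users can be remapped one at a time instead of changing a global hash function.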

Why use database schemas?

I'm working on a single database with multiple database schemas, e.g.
[Baz].[Table3],
[Foo].[Table1],
[Foo].[Table2]
I'm wondering why the tables are separated this way besides organisation and permissions.
How common is this, and are there any other benefits?
You have the main benefits in terms of logically grouping objects together and allowing permissions to be set at the schema level.
It does add complexity to programming, in that you must always know which schema you intend to get something from, or rely on the default schema of the user being correct. Equally, you can use this to allow the same object name in different schemas, so that the code only writes against one object name, whilst the schema the user defaults to decides which object that is.
I wouldn't say it was that common, anecdotally most people still drop everything in the dbo schema.
I'm not aware of any other possible reasons besides organization and permissions. Are these not good enough? :)
For the record - I always use a single schema - but then I'm creating web applications and there is also just a single user.
Update, 10 years later!
There's one more reason, actually. You can have "copies" of your schema for different purposes. For example, imagine you are creating a blog platform. People can sign up and create their own blogs. Each blog needs a table for posts, tags, images, settings etc. One way to do this is to add a blog_id column to each table and use that to differentiate between blogs. Or... you could create a new schema for each blog, with fresh new tables for each of them. This has several benefits:
Programming is easier. You just select the appropriate schema at the beginning and then write all your queries without worrying about forgetting to add WHERE blog_id = #currentBlog somewhere.
You avoid a whole class of potential bugs where a foreign key in one blog points to an object in another blog (accidental data disclosure!)
If you want to wipe a blog, you just drop the schema with all the tables in it. Much faster than seeking out and deleting records from dozens of different tables (in the right order, no less!)
Each blog's performance depends only (well, mostly anyway) on how much data there is in that blog.
Exporting data is easier - just dump all the objects in the schema.
There are also drawbacks, of course.
When you update your platform and need to perform schema changes, you need to update each blog separately. (Added later: this could actually be a feature! You can do "rolling updates" where instead of updating ALL the blogs at the same time, you update them in batches, checking for bugs or complaints before updating the next batch.)
The same goes for fixing corrupted data, if that happens for whatever reason.
Statistics for the platform as a whole are harder to calculate.
All in all, this is a pretty niche use case, but it can be handy!
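An illustrative sketch of provisioning one such schema, in SQL Server syntax to match the [Foo].[Table1] style above (blog_42 and the table definitions are made up):

```sql
-- One schema per blog, created when the blog is signed up.
CREATE SCHEMA blog_42;
GO

CREATE TABLE blog_42.Posts (
    PostId INT PRIMARY KEY,
    Title  VARCHAR(200) NOT NULL,
    Body   VARCHAR(MAX)
);

CREATE TABLE blog_42.Tags (
    TagId INT PRIMARY KEY,
    Name  VARCHAR(100) NOT NULL
);

-- Wiping the blog later: drop its tables, then the (now empty) schema.
-- DROP TABLE blog_42.Posts, blog_42.Tags;
-- DROP SCHEMA blog_42;
```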
To me, they can cause more problems because they break ownership chaining.
Example:
Stored procedure tom.uspFoo uses table tom.bar easily, but extra rights would be needed on dick.AnotherTable. This means I'd have to grant SELECT rights on dick.AnotherTable to the callers of tom.uspFoo... which exposes direct table access.
Unless I'm completely missing something...
Edit, Feb 2012
I asked a question about this: SQL Server: How to permission schemas?
The key is "same owner": so if dbo owns both dick and tom schema, then ownership chaining does apply. My previous answer was wrong.
There can be several reasons why this is beneficial:
Sharing data between several (instances of) an application. This could be the case if you have a group of reference data that is shared between applications and a group of data that is specific to the instance. Be careful not to have circular references between entities in different schemas; that is, don't have a foreign key from an entity in schema 1 to an entity in schema 2 AND another foreign key from schema 2 to schema 1 in other entities.
Data partitioning: allows data to be stored on different servers more easily.
As you mentioned, access control on the DB level.
