How to avoid repeated information in multiple tables - database

First of all, sorry for my English; it is not my native language.
Well, I have a problem with my database design, I mean:
I have a Java web application (using JSP, servlets, classes and the MySQL RDBMS) in which I have mainly been storing data about properties and their owners (there are other entities, but those are the important ones here).
Now, I need to create a new module to store data about events, something like this:
Event: name, location, date, topic, etc.
Participants: identification, name, location, type of participant (speaker or assistant)
I've been thinking about my database design, and most of the assistants are already stored in the Owner entity, but others are not.
The problem is:
If I create an Assistant entity, I'm going to repeat data that is already stored in the Owner entity (for those assistants who already exist as owners). And if I later need to store data about surveyors, salespeople or whatever, I'm going to have the same data in several different tables.
I was thinking of creating a Person entity and using it to store the properties that are common to assistants and owners (and even to my user table), but I have read about inheritance in databases, and people say it's not good practice for database design.
How can I solve this problem?
What's the best practice in this case?
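For illustration only, the Person idea described above could be sketched as follows. This is a minimal sketch, not an answer from the thread: it uses Python's built-in sqlite3 module instead of MySQL so that it runs anywhere, and every table and column name (person, owner, event, participant, kind) is an assumption made up for the example.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE person (
    id             INTEGER PRIMARY KEY,
    identification TEXT NOT NULL UNIQUE,
    name           TEXT NOT NULL,
    location       TEXT
);
-- Being an owner is a role: one row per person who owns something.
CREATE TABLE owner (
    person_id INTEGER PRIMARY KEY REFERENCES person(id)
);
CREATE TABLE event (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    event_date TEXT
);
-- Participation links a person to an event; no personal data is copied.
CREATE TABLE participant (
    event_id  INTEGER NOT NULL REFERENCES event(id),
    person_id INTEGER NOT NULL REFERENCES person(id),
    kind      TEXT NOT NULL CHECK (kind IN ('speaker', 'assistant')),
    PRIMARY KEY (event_id, person_id)
);
""")

conn.execute("INSERT INTO person VALUES (1, 'ID-1', 'Ana', 'Bogota')")
conn.execute("INSERT INTO person VALUES (2, 'ID-2', 'Luis', 'Cali')")
conn.execute("INSERT INTO owner (person_id) VALUES (1)")  # only Ana is an owner
conn.execute("INSERT INTO event VALUES (1, 'Housing fair', '2024-05-01')")
conn.executemany(
    "INSERT INTO participant VALUES (?, ?, ?)",
    [(1, 1, "assistant"), (1, 2, "speaker")],
)

# List participants and flag which of them are also owners.
rows = conn.execute("""
    SELECT p.name, pa.kind, o.person_id IS NOT NULL AS is_owner
    FROM participant pa
    JOIN person p ON p.id = pa.person_id
    LEFT JOIN owner o ON o.person_id = p.id
    ORDER BY p.id
""").fetchall()
print(rows)
```

In this layout each person's name and location live in exactly one row, whether that person is an owner, a participant, or both; whether this fits the existing application is a separate judgment.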

Related

How to persist data in microservices?

I am getting started in microservices architectures and I have a couple of questions about the data persistence and databases.
So my understanding is that each microservice has its own database (not necessarily, but usually). Given that, consider a typical social media platform with users, posts and comments. There will be two microservices, a user's microservice and a posts' microservice. The user's database has a users table and the posts' database has posts and comments tables.
My question is on the posts microservice, because each post and comment has an author, so usually we would create the foreign key pointing to the user's table, however this is in a different database. What to do then? From my perspective there are 2 options:
Add the authorId column to the table but without the foreign key constraint. If so, what would happen in the application when we retrieve the user's data from the user's microservice using the authorId and that data is gone?
Create an author's table in the posts' database. If so, what data should that table contain other than the user's id?
It just doesn't feel right to duplicate the data that is already in the user's database but it also doesn't feel right to use the user's id without the FK constraint.
One thing to note, data growth is quite different
Users -> relatively static data.
Posts & Comments -> dynamic data whose volume could be far higher than the users data.
The two-microservice design looks good. I would prefer option 1 from your design.
Duplication is not bad. Even in conventional database design it is normal to denormalize for better read performance. It also helps decouple the posts service from the users table, and may let you choose a different database if required. As for your question about what happens if a user's data is missing while their posts are still available, this can be handled with business logic and API design.
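As a rough illustration of option 1 and of handling a missing user in business logic, here is a minimal sketch. It assumes two separate in-memory SQLite databases standing in for the two services' stores; the function fetch_user_name is a hypothetical stand-in for a call to the users service, and the "[deleted user]" placeholder is just one possible policy.

```python
import sqlite3

# Two independent databases, one per (hypothetical) microservice.
users_db = sqlite3.connect(":memory:")
posts_db = sqlite3.connect(":memory:")

users_db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
# author_id is a plain column: no foreign key can span the two databases.
posts_db.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER NOT NULL, body TEXT NOT NULL)"
)

users_db.execute("INSERT INTO users VALUES (1, 'alice')")
posts_db.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [(10, 1, "hello"), (11, 2, "post by a user who no longer exists")],
)


def fetch_user_name(user_id):
    """Stand-in for an API call to the users service (hypothetical)."""
    row = users_db.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0] if row else None


def list_posts():
    """Return posts with author names, tolerating authors that are gone."""
    result = []
    query = "SELECT id, author_id, body FROM posts ORDER BY id"
    for post_id, author_id, body in posts_db.execute(query):
        name = fetch_user_name(author_id)
        result.append({"id": post_id, "author": name or "[deleted user]", "body": body})
    return result


posts = list_posts()
print(posts)
```

The posts service never fails because of a dangling authorId; it decides what to show instead, which is the kind of rule the answer leaves to business logic and API design.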

Designing inventory management database?

I am designing a database for an inventory management system that will be used by roughly 10 to 15 companies. The database contains about 25 tables. I'm planning to use a shared-schema architecture (i.e. one schema per company, with all of these schemas placed in a single database).
I want to know whether it is reliable to use this shared-schema architecture.
Can anyone please tell me whether this architecture is the right decision?
Thanks in advance..
If I read your question correctly, you are suggesting that each company has its own schema. This means two things:
If you decide to implement a basic change in the schema (i.e. not a change that one company requests), then you will have to implement this change in all the schemas.
You will probably have to implement different logic in your front-end program for each company.
It would be better to develop one schema for the entire database; each table would have a field called 'CompanyID', which defines the company each row belongs to. This field would be a foreign key to the Companies table.
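A minimal sketch of the single-schema approach with a 'CompanyID' column might look like this. It assumes SQLite via Python's sqlite3 module purely so the example is runnable; the Items table, its columns, and the items_for helper are hypothetical names, while Companies and CompanyID come from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE Companies (
    CompanyID INTEGER PRIMARY KEY,
    Name      TEXT NOT NULL
);
-- Every tenant-owned table carries CompanyID as a foreign key.
CREATE TABLE Items (
    ItemID    INTEGER PRIMARY KEY,
    CompanyID INTEGER NOT NULL REFERENCES Companies(CompanyID),
    Name      TEXT NOT NULL,
    Quantity  INTEGER NOT NULL DEFAULT 0
);
INSERT INTO Companies VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO Items (CompanyID, Name, Quantity)
VALUES (1, 'bolt', 100), (1, 'nut', 250), (2, 'bolt', 40);
""")


def items_for(company_id):
    """Return (name, quantity) rows belonging to one company only."""
    return conn.execute(
        "SELECT Name, Quantity FROM Items WHERE CompanyID = ? ORDER BY Name",
        (company_id,),
    ).fetchall()


acme_items = items_for(1)
print(acme_items)
```

With this layout a schema change is made once, and the same front-end logic serves every company; the cost is that every query must filter on CompanyID.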

need of a separate database

I am working on my first web project. I have consulted many tutorials and PDFs, but they all had simple examples of a login and sign-up feature for a web page, using only a single database. I am very confused about whether or not login and sign-up should have separate databases.
My main question is this: the project takes in users' personal information (name, email, address, telephone number, etc.) along with information specific to their vehicles (model, company, make, manufacture date, etc.). After logging into the website, both sets of data are important, but only some fields are in use, such as the user's name, his/her address, the vehicle model, and the company. So should I maintain separate databases for the two and reference each element with a foreign key? Or should I keep it simple, use a single database, and complete my login and sign-up function? I ask because the number of columns I have is very large.
This might be a bit too academic, but a word you'll want to learn well is normalization. Here is a link to a pretty stiff definition: https://en.wikipedia.org/wiki/Database_normalization
This being your first web project, my advice would be the following:
Don't be afraid to make mistakes. I would strongly encourage trying approaches you think are good and then don't be afraid to change your mind. The lessons learned will stick with you.
Keep everything simple up front. Only add complexity when you need it.
Definitely don't be afraid to grow horizontally with tables (add more and more tables). When I first started working with databases I was afraid to have too many tables because it felt wrong. Try to resist the temptation to cram everything in one table.
Definitely separate login, users and vehicle information. Not a bad idea to also separate out user address information since people can have more than one address.
You should use the same database to hold all the information for your project. Two different databases is not really a good idea; you can create many tables in one database, and each table is designed to hold different information. For your example, you might choose the following tables in the same database:
UserLogin [store login information]
User [ store personal info]
Vehicle
and so on
There must be a one-to-one relationship between the UserLogin and User tables, and a one-to-many relationship between the User and Vehicle tables:
one user may have many vehicles.
Hopefully this helps.
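The table layout suggested above could be sketched like this. It is only an illustration, assuming SQLite through Python's sqlite3 module; the column names (email, password_hash, company, model and so on) are hypothetical, while the table names UserLogin, User and Vehicle come from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE User (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    address   TEXT,
    telephone TEXT
);
-- One-to-one: the primary key of UserLogin is also a foreign key to User.
CREATE TABLE UserLogin (
    user_id       INTEGER PRIMARY KEY REFERENCES User(id),
    email         TEXT NOT NULL UNIQUE,
    password_hash TEXT NOT NULL
);
-- One-to-many: several Vehicle rows may point at the same User.
CREATE TABLE Vehicle (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES User(id),
    company TEXT NOT NULL,
    model   TEXT NOT NULL
);
""")

conn.execute("INSERT INTO User VALUES (1, 'Sam', '1 Main St', '555-0100')")
conn.execute("INSERT INTO UserLogin VALUES (1, 'sam@example.com', 'not-a-real-hash')")
conn.executemany(
    "INSERT INTO Vehicle (user_id, company, model) VALUES (?, ?, ?)",
    [(1, "Toyota", "Corolla"), (1, "Honda", "Civic")],
)

# After login, fetch only the fields the page needs, joined across tables.
rows = conn.execute("""
    SELECT u.name, v.company, v.model
    FROM UserLogin l
    JOIN User u    ON u.id = l.user_id
    JOIN Vehicle v ON v.user_id = u.id
    WHERE l.email = ?
    ORDER BY v.id
""", ("sam@example.com",)).fetchall()
print(rows)
```

Everything lives in one database, and the joins pull together only the columns that a given page uses.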

how to tell if specifications are modelled using database oriented approach or class design oriented approach

Given a problem specification, how can you tell whether it is a database design problem or a class design (object-oriented design) problem?
What comes to my mind is that in OOP, classes (objects) contain methods, whereas a database is just a collection of relationships and values.
Therefore:
If you can say a problem is about how "things" in the specification relate to each other you have a database design problem.
If it is about what the "things" in the specification can do, you're going to be modeling more along object oriented programming.
If you're using a database and creating domain objects, it's both. Database design and class design are two different things, and both are necessary if you're using a database and classes. It's not like you choose one or the other.
This is where an ORM comes into play. When your data layer retrieves information from the database, a typical approach is to transform the relational data into your domain object(s) and pass that to the business logic layer so the rest of your application can deal with domain objects instead of a relational model.
Then your ORM does the opposite when persisting data: it takes a domain entity and turns it back into a relational structure that can be saved to the database.
Note: I'm assuming a relational database here. If not, substitute relational for whatever type of persistence layer you're using.
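To make the round trip between relational rows and domain objects concrete, here is a hand-written sketch of what an ORM automates. It is not any particular ORM's API: the Customer class, the CustomerRepository class, and the customers table are all hypothetical names, and SQLite is used only so the example runs without setup.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Customer:
    """Domain object: carries data plus behaviour the database knows nothing about."""
    id: int
    name: str
    email: str

    def display_label(self) -> str:
        return f"{self.name} <{self.email}>"


class CustomerRepository:
    """Data layer: converts between table rows and Customer objects."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS customers "
            "(id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL)"
        )

    def save(self, customer):
        # Domain object -> relational row.
        self.conn.execute(
            "INSERT OR REPLACE INTO customers (id, name, email) VALUES (?, ?, ?)",
            (customer.id, customer.name, customer.email),
        )

    def get(self, customer_id):
        # Relational row -> domain object (or None when no row matches).
        row = self.conn.execute(
            "SELECT id, name, email FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return Customer(*row) if row else None


repo = CustomerRepository(sqlite3.connect(":memory:"))
repo.save(Customer(1, "Ada", "ada@example.com"))
loaded = repo.get(1)
print(loaded.display_label())
```

The business logic only ever sees Customer objects; the table design and the class design remain two separate pieces of work, as the answer says.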
I believe that the only specifications which should be addressed as database-oriented problems are those which are focused on the manipulation of structured data types. If your specification is all about "store a customer record", "delete an order record", "change the value of price from 12 to 33 for record matching specification", you've got a database project.
I haven't seen that kind of problem specification since the Cobol team I worked in employed a systems ~~anarchist~~ analyst. Almost every project I've worked on since has had requirements that were not about how data was stored, but what the data meant.
If you get a requirement that says "Users may create Customers. Customers can place orders. Orders contain products. Orders can have delivery methods, payment methods, and status. Status follows a business process", you have an OO problem. You probably need a storage mechanism - and a database would be an excellent choice - but you have business logic that cannot be exclusively implemented by creating structured data types and relationships.

Why use database schemas?

I'm working on a single database with multiple database schemas,
e.g
[Baz].[Table3],
[Foo].[Table1],
[Foo].[Table2]
I'm wondering why the tables are separated this way besides organisation and permissions.
How common is this, and are there any other benefits?
The main benefit is that you can logically group objects together and set permissions at the schema level.
It does add complexity in programming, in that you must always know which schema you intend to get something from, or rely on the user's default schema being correct. Equally, you can use this to allow the same object name in different schemas, so that the code is written against one object name, while the user's default schema decides which object that is.
I wouldn't say it is that common; anecdotally, most people still drop everything in the dbo schema.
I'm not aware of any other possible reasons besides organization and permissions. Are these not good enough? :)
For the record - I always use a single schema - but then I'm creating web applications and there is also just a single user.
Update, 10 years later!
There's one more reason, actually. You can have "copies" of your schema for different purposes. For example, imagine you are creating a blog platform. People can sign up and create their own blogs. Each blog needs a table for posts, tags, images, settings etc. One way to do this is to add a column blog_id to each table and use that to differentiate between blogs. Or... you could create a new schema for each blog and fresh new tables for each of them. This has several benefits:
Programming is easier. You just select the appropriate schema at the beginning and then write all your queries without worrying about forgetting to add where blog_id=#currentBlog somewhere.
You avoid a whole class of potential bugs where a foreign key in one blog points to an object in another blog (accidental data disclosure!)
If you want to wipe a blog, you just drop the schema with all the tables in it. Much faster than seeking and deleting records from dozens of different tables (in the right order, no less!)
Each blog's performance depends only (well, mostly anyway) on how much data there is in that blog.
Exporting data is easier - just dump all the objects in the schema.
There are also drawbacks, of course.
When you update your platform and need to perform schema changes, you need to update each blog separately. (Added yet later: This could actually be a feature! You can do "rolling updates" where instead of updating ALL the blogs at the same time, you update them in batches, seeing if there are any bugs or complaints before updating the next batch)
Same about fixing corrupted data if that happens for whatever reason.
Statistics for all the platform together are harder to calculate
All in all, this is a pretty niche use case, but it can be handy!
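The schema-per-blog idea can be approximated in a small runnable sketch. Note the assumptions: SQLite has no CREATE SCHEMA, so each attached in-memory database stands in for a schema here, and DETACH stands in for dropping one; in a server such as SQL Server or PostgreSQL you would use real schemas instead. The helper names (create_blog, add_post, titles, drop_blog) are hypothetical.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so ATTACH and
# DETACH are never issued inside an open transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)


def create_blog(schema):
    """Create a fresh namespace with its own tables for one blog.

    The schema name is interpolated into SQL, so it must be a trusted identifier.
    """
    conn.execute(f"ATTACH DATABASE ':memory:' AS {schema}")
    conn.execute(f"CREATE TABLE {schema}.posts (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")


def add_post(schema, title):
    # No blog_id column: the schema itself says which blog the row belongs to.
    conn.execute(f"INSERT INTO {schema}.posts (title) VALUES (?)", (title,))


def titles(schema):
    return [row[0] for row in conn.execute(f"SELECT title FROM {schema}.posts ORDER BY id")]


def drop_blog(schema):
    """Wipe a whole blog in one step instead of deleting rows table by table."""
    conn.execute(f"DETACH DATABASE {schema}")


create_blog("blog_a")
create_blog("blog_b")
add_post("blog_a", "Hello from A")
add_post("blog_b", "Hello from B")
print(titles("blog_a"), titles("blog_b"))

drop_blog("blog_a")  # blog_b is untouched
```

The same query text works for every blog once the schema name is chosen, and removing a blog does not touch the others; cross-blog statistics, by contrast, would need a query per schema.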
To me, they can cause more problems because they break ownership chaining.
Example:
Stored procedure tom.uspFoo uses table tom.bar easily but extra rights would be needed on dick.AnotherTable. This means I have to grant select rights on dick.AnotherTable to the callers of tom.uspFoo... which exposes direct table access.
Unless I'm completely missing something...
Edit, Feb 2012
I asked a question about this: SQL Server: How to permission schemas?
The key is "same owner": so if dbo owns both dick and tom schema, then ownership chaining does apply. My previous answer was wrong.
There can be several reasons why this is beneficial:
Share data between several (instances of) an application. This could be the case if you have a group of reference data that is shared between applications, and a group of data that is specific to the instance. Be careful not to have circular references between entities in different schemas, meaning: don't have a foreign key from an entity in schema 1 to an entity in schema 2 AND another foreign key from schema 2 to schema 1 in other entities.
Data partitioning: allows data to be stored on different servers more easily.
As you mentioned, access control at the DB level.
