Aggregating all relations into one table SQL Server - sql-server

I'm trying to design an enterprise-level database architecture, and I have an issue at the ERD level.
Many of my tables are related to each other. There may be further development in the future, so my design should be flexible and also fast at retrieving results.
Recently I created a parent table named Node, and all of my functional tables have a one-to-one relation with this table.
(Functional tables are those that hold real-life data such as Content, User, Folder, Role, ..., not those related to the application's life cycle.)
So before adding a record to any of these tables, we must add a row to the Node table and use the new NodeId in the secondary table.
The Node table has a many-to-many relation with itself, so I designed this table to hold all of my relationship concerns.
All of the other entities are like User and are related to the Node table as shown above.
The problem is: does this design make my relational queries on the NodeAssoc table faster, or is it better to keep the relations separate?

You say:
There may be some developments in the future and my design should be flexible and also fast on gathering the results.
Flexibility and performance are two separate things, with different ways to approach and solve them. When you are designing a database, you have to consider database principles. Normalization is very important to keep in mind. One-to-one and many-to-many relations are, by design, not common. You mention both one-to-one and many-to-many relations, which worries me.
Advice one -> Denormalize (merge) one-to-one tables into one table.
This reduces the number of joins.
Advice two -> Introduce a bridge table for each many-to-many relation,
because there can be multiple matches. Resolving multiple matches means
complex queries, which leads to a performance drop.
Advice three -> Use proper indexes to improve performance.
Flexibility can be increased by using database views, each of which is a stored query. The structure of the database may change in the future, and modifying a view to match can be very fast.
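As a rough illustration of advice two and three together with the view idea, here is a minimal sketch using Python's built-in sqlite3 module so that it runs anywhere. The table, index and view names (AppUser, Role, UserRole, IX_UserRole_RoleId, v_user_roles) are hypothetical and not taken from the question, and SQL Server syntax would differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AppUser (UserId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Role    (RoleId INTEGER PRIMARY KEY, Name TEXT NOT NULL);

    -- Bridge table: one row per (user, role) pair.
    CREATE TABLE UserRole (
        UserId INTEGER NOT NULL REFERENCES AppUser(UserId),
        RoleId INTEGER NOT NULL REFERENCES Role(RoleId),
        PRIMARY KEY (UserId, RoleId)
    );
    -- Extra index for lookups that start from the role side.
    CREATE INDEX IX_UserRole_RoleId ON UserRole (RoleId, UserId);

    -- View that hides the joins from callers; if the tables change
    -- later, only the view definition needs to be updated.
    CREATE VIEW v_user_roles AS
        SELECT u.Name AS UserName, r.Name AS RoleName
        FROM UserRole ur
        JOIN AppUser u ON u.UserId = ur.UserId
        JOIN Role r ON r.RoleId = ur.RoleId;

    INSERT INTO AppUser VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO Role VALUES (10, 'admin'), (20, 'editor');
    INSERT INTO UserRole VALUES (1, 10), (1, 20), (2, 20);
""")

# Callers query the view, not the underlying tables.
editors = [row[0] for row in conn.execute(
    "SELECT UserName FROM v_user_roles WHERE RoleName = ? ORDER BY UserName",
    ("editor",),
)]
print(editors)
```

Compared with a single generic NodeAssoc table, a dedicated bridge table per relationship lets the database enforce which kinds of rows may be linked and keeps each index small and specific.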

Related

hybris - one to many and many to many relationship

In our production system, we have an existing relationship which is one-to-many. We would like to change this relationship to many-to-many for business/data reasons.
What steps do we need to take to avoid losing data and to have no impact on production data, given that we need to change the *-items.xml file within the hybris system?
Appreciate your inputs.
Thanks!
The database structure for one-to-many and many-to-many is different. One-to-many needs no extra table (the reference to the "one" side is stored with the "many" records), but many-to-many uses an extra table.
I suggest exporting the existing data, updating the items.xml (with a platform update), and reimporting the data.
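The structural difference can be sketched in plain SQL, here driven from Python's sqlite3 module. This is not how hybris itself performs the change (hybris generates its tables from items.xml during a platform update); it only illustrates, with hypothetical Product and Category tables, how existing one-to-many links can be copied into the extra table so that no data is lost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Old shape: each product points at exactly one category.
    CREATE TABLE Category (CategoryId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Product (
        ProductId  INTEGER PRIMARY KEY,
        Name       TEXT,
        CategoryId INTEGER REFERENCES Category(CategoryId)
    );
    INSERT INTO Category VALUES (1, 'books'), (2, 'gifts');
    INSERT INTO Product VALUES (100, 'atlas', 1), (101, 'mug', 2);
""")

# New shape: a link table holds one row per (product, category) pair.
# Existing links are copied inside one transaction before anything is dropped.
conn.executescript("""
    BEGIN;
    CREATE TABLE ProductCategory (
        ProductId  INTEGER NOT NULL REFERENCES Product(ProductId),
        CategoryId INTEGER NOT NULL REFERENCES Category(CategoryId),
        PRIMARY KEY (ProductId, CategoryId)
    );
    INSERT INTO ProductCategory (ProductId, CategoryId)
        SELECT ProductId, CategoryId FROM Product
        WHERE CategoryId IS NOT NULL;
    COMMIT;
""")

# A product can now belong to a second category.
conn.execute("INSERT INTO ProductCategory VALUES (100, 2)")
links = conn.execute(
    "SELECT ProductId, CategoryId FROM ProductCategory ORDER BY 1, 2"
).fetchall()
print(links)
```

Whatever tool performs the migration, taking a backup and rehearsing on a copy of production first is the safer route.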

Database table creation for large data

I am making a client management application in which I store data about employees, admins and companies. In the future the database will have hundreds of companies registered. I want to choose the best approach to the database design.
I can think of 2 approaches:
Creating a separate set of the app's tables for each company
Storing all data in one app database
Can you suggest the best way to do that?
Please note that all 3 tables are linked by IDs, there will be hundreds of companies, each company will have many admins, and each admin will have hundreds of employees. What would be the best approach with regard to security and query performance?
With the partial information you provided, it looks like 3 normalized tables are what you need, plus auxiliary data such as lookup tables.
But when you design a database you need to consider many more points, such as security, visibility, client access methods, etc.
For example, if you want to ensure isolation and not give users any visibility into other companies' data, you could dynamically create a schema per company, and dynamically create users and access rights for each schema. You would then need to support all of this in the DAL, which will end up quite fat.
Another approach for the DAL could be exposing views that always return the subset for one company.
A big reason that I would suggest going for the normalized approach is that maintenance will be much easier this way.
From a SQL point of view I don't see any performance advantage in having many tables rather than just 3; efficient indexes and a smart DAL will make the difference.
Query performance depends less on the size of the table than on the indexes you have on it. So you need to create clustered and non-clustered indexes as your requirements dictate; in my experience you should not face any problem with up to 10 GB of data.
This is a classic problem shared by most web business services: for discussions of the factors involved, Google "multi-tenant architecture."
You almost certainly want to put all companies into a common set of tables: each data table should reference the company key, and all queries should join on that key, among their other criteria. This allows the best overall performance, and saves you the potential maintenance nightmare of duplicating views, stored procedures and so on hundreds of times, or of having to apply the same structural changes to hundreds of tables should you wish to add a field or a table.
To help assure that you don't inadvertently intermingle data from different customers, it might be useful to do all data access through a validated set of stored procedures (all of which take the company ID as a parameter).
Hundreds of parallel databases will not scale very well: the DB server will constantly be pushing tables and indexes out of memory to accommodate the next query, resulting in disk thrashing and poor performance, as well. There is only pain down that path.
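A minimal sketch of the shared-table approach, assuming hypothetical Company, Admin and Employee tables and using Python's sqlite3 module. Since SQLite has no stored procedures, a Python function that always takes the company ID stands in for the validated data-access layer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Company (CompanyId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Admin (
        AdminId   INTEGER PRIMARY KEY,
        CompanyId INTEGER NOT NULL REFERENCES Company(CompanyId),
        Name      TEXT NOT NULL
    );
    CREATE TABLE Employee (
        EmployeeId INTEGER PRIMARY KEY,
        CompanyId  INTEGER NOT NULL REFERENCES Company(CompanyId),
        AdminId    INTEGER NOT NULL REFERENCES Admin(AdminId),
        Name       TEXT NOT NULL
    );
    -- Every tenant-scoped query filters on CompanyId, so index it first.
    CREATE INDEX IX_Employee_Company ON Employee (CompanyId, AdminId);

    INSERT INTO Company VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO Admin VALUES (1, 1, 'root1'), (2, 2, 'root2');
    INSERT INTO Employee VALUES
        (1, 1, 1, 'ann'), (2, 1, 1, 'bo'), (3, 2, 2, 'cy');
""")

def employees_for_company(conn, company_id):
    """Return employee names for one company only.

    The company ID is a required parameter, so a caller cannot
    accidentally read rows that belong to another tenant.
    """
    rows = conn.execute(
        "SELECT Name FROM Employee WHERE CompanyId = ? ORDER BY Name",
        (company_id,),
    )
    return [row[0] for row in rows]

print(employees_for_company(conn, 1))
print(employees_for_company(conn, 2))
```

Adding a column or a table later means one schema change here, rather than hundreds of identical changes across per-company copies.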
Depending on the use cases of your application, there is no single "best" way.
Please explain the operations your application will provide so we can get further insight into your problem.
The data to be stored seems to be structured, so at first glance a relational database would work out well, but keep in mind the point I made above.
You have not said how this data links at all or if there are even any links between them. However, at a guess, you need 3 tables.
EmployeeTable
AdminTable
CompanyTable
Each would hold the properties it requires. Without additional information I'm not able to provide any more guidance.

Why are relational sets important?

A friend is developing a website and has to build a database using SQL. He asks why you need "has-a" or "is-a" relationships, since you can take the primary keys of one entity set and place them in the other appropriate entity set (and vice versa) to find the relations.
I could not answer the question because I was just taught that relationship sets are how databases work.
Edit: I did not want to go into normalization. He made the point that the information is replicated in the relationship set.
Your question mixes two different levels of abstraction together, namely the conceptual level and the logical level.
At the conceptual level, one is interested in describing the information requirements on the proposed database. It's useful to do this without tilting the description towards one solution or another. One model that is useful for this purpose is the Entity-Relationship (ER) model. In this model, the subject matter is broken down into entities (subjects) and relationships among those entities. All data is seen as describing some aspect of one of the entities or one of the relationships.
"Is-a" and "has-a" relationships are relevant at this level of abstraction. At this level, relationships are identified, but not implemented.
After creating a conceptual model of the database, but before creating the database itself, it's useful to go through a logical design phase, resulting in a logical model of the database. If the database is to be relational, it's useful to make the logical model a relational one. The relational model is the next level of abstraction.
This is where primary keys and foreign keys come in. These keys implement the relationships that were identified at the conceptual stage. This is how the relational model implements relationships. At this stage, you get involved with design issues like junction tables, table composition, and normalization.
In addition to the conceptual level and the logical level, there are the physical level and the script level. But these are outside the scope of your question.
The two kinds of relationships are features of the problem to be solved. Foreign key references to primary keys are features of the proposed solution.

Referential Integrity and HBase

One of the first sample schemas you read about in the HBase FAQ is the Student-Course example for a many-many relationship. The schema has a Courses column in the Student table and a Students column in the Course table.
But I don't understand how in HBase you guarantee integrity between these two objects. If something were to crash between updating one table and before another, we'd have a problem.
I see there is a transaction facility, but what is the cost of using this on what might be every Put? Or are there other ways to think about the problem?
We hit the same issue.
I have developed a commercial plugin for hbase that handles transactions and the relationship issues that you mention. Specifically, we utilize DataNucleus for a JDO Compliant environment. Our plugin is listed on this page http://www.datanucleus.org/products/accessplatform_3_0/datastores.html or you can go directly to our small blog http://www.inciteretail.com/?page_id=236.
We utilize JTA for our transaction service. So in your case, we would handle the relationship issue and also any inserts for index tables (Hard to have an app without index lookup and sorting!).
Without an additional log you won't be able to guarantee integrity between these two objects. HBase only has atomic updates at the row level. You could probably use that property though to create a Tx log that could recover after a failure.
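The transaction-log idea can be sketched without any HBase client at all. In the toy model below, plain Python dicts stand in for the Student table, the Course table and a log table, and each dict update stands in for one atomic row-level write. All names are hypothetical and nothing here is an HBase API; the sketch only shows the recovery logic.

```python
# Plain dicts stand in for tables; one dict update models one atomic row write.
students = {}  # student rowkey -> set of course ids
courses = {}   # course rowkey  -> set of student ids
tx_log = {}    # tx id -> {"student": ..., "course": ..., "done": bool}

def log_intent(tx_id, student, course):
    """Record what is about to happen as a single row in the log table."""
    tx_log[tx_id] = {"student": student, "course": course, "done": False}

def apply_tx(tx_id):
    """Apply both sides of the relationship, then mark the entry done.

    Adding to a set is idempotent, so re-applying after a partial
    failure cannot create duplicates.
    """
    entry = tx_log[tx_id]
    students.setdefault(entry["student"], set()).add(entry["course"])
    courses.setdefault(entry["course"], set()).add(entry["student"])
    entry["done"] = True

def recover():
    """Replay every log entry that was never marked done."""
    for tx_id, entry in tx_log.items():
        if not entry["done"]:
            apply_tx(tx_id)

# Simulate a crash: the intent is logged and the Student row is written,
# but the process dies before the Course row is updated.
log_intent("tx1", "s1", "c1")
students.setdefault("s1", set()).add("c1")

# On restart, recovery finishes the half-applied update.
recover()
print(students, courses)
```

The design relies on the log write itself being a single-row operation and on every replayed step being safe to repeat.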
If you have to perform two INSERTs as a single unit of work, that means you have to use a transaction manager to preserve ACID properties. There's no other way to think about the problem that I know of.
The cost is less of a concern than referential integrity. Code it properly and don't worry about performance. Your code will be the first place to look for performance problems, not the transaction manager.
Logical relational models use two main varieties of relationships: one-to-many and many-to-many. Relational databases model the former directly as foreign keys (whether explicitly enforced by the database as constraints, or implicitly referenced by your application as join columns in queries) and the latter as junction tables (additional tables where each row represents one instance of a relationship between the two main tables). There is no direct mapping of these in HBase, and often it comes down to denormalizing the data.
The first thing to note is that HBase, not having any built-in joins or constraints, has little use for explicit relationships. You can just as easily place data that is one-to-many in nature into HBase tables. But this is only a relationship in that some parts of the row in the former table happen to correspond to parts of rowkeys in the latter table. HBase knows nothing of this relationship, so it's up to your application to do things with it (if anything).

Is it acceptable to cross between databases?

I'm not sure what this practice is actually called, so perhaps someone can edit the title to more accurately reflect my question.
Let's say we have a site that stores objects of different types. Each type of object has its own database (a database of books and assorted information with its tables, a database of CDs and information with its tables, and so on). However, all of the objects have keywords and the keywords should be uniform across all objects, regardless of type. A new database with a few tables is made to store keywords, however each object database is responsible for mapping the object ID to a keyword.
Is that a good practice?
Is there a reason to have separate databases for each type of object? You would be better off using multiple tables, and joining them. For example, you may have a table GENERIC_OBJECT which holds things that are common across all types, and then a table called BOOK_OBJECT where BOOK_OBJECT.ID = GENERIC_OBJECT.ID for a given book. Another table would be CD_OBJECT where CD_OBJECT.ID = GENERIC_OBJECT.ID for a given CD. Then things like keywords that are common across all objects would be stored in the GENERIC_OBJECT table, and things that are specific to the item would go in the item's corresponding table.
By separating them into different databases, you lose:
the ability to do ACID transactions (assuming you aren't using a two-phase commit solution).
the ability to have referential integrity.
JOINs across tables.
Thomas, what you're missing in your comment responses to our concerns about referential integrity is that you can't define a foreign key across two databases. If the two tables are in one database, then you can use foreign key constraints to ensure that when you delete an object, anything that relies upon its object ID is also deleted, and other similar things.
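The single-database layout described in this answer might look like the following sketch, written against Python's sqlite3 module. GENERIC_OBJECT, BOOK_OBJECT and CD_OBJECT come from the answer; the column names and the separate OBJECT_KEYWORD table are assumptions added for illustration (the answer itself suggests keeping keywords on GENERIC_OBJECT), and SQLite needs foreign keys switched on explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
    CREATE TABLE GENERIC_OBJECT (ID INTEGER PRIMARY KEY, TITLE TEXT NOT NULL);

    -- Type-specific tables share the generic row's ID.
    CREATE TABLE BOOK_OBJECT (
        ID INTEGER PRIMARY KEY REFERENCES GENERIC_OBJECT(ID) ON DELETE CASCADE,
        AUTHOR TEXT
    );
    CREATE TABLE CD_OBJECT (
        ID INTEGER PRIMARY KEY REFERENCES GENERIC_OBJECT(ID) ON DELETE CASCADE,
        ARTIST TEXT
    );

    -- Keywords hang off the generic row, so they are uniform across types.
    CREATE TABLE OBJECT_KEYWORD (
        OBJECT_ID INTEGER NOT NULL REFERENCES GENERIC_OBJECT(ID) ON DELETE CASCADE,
        KEYWORD   TEXT NOT NULL,
        PRIMARY KEY (OBJECT_ID, KEYWORD)
    );

    INSERT INTO GENERIC_OBJECT VALUES (1, 'Some Book'), (2, 'Some CD');
    INSERT INTO BOOK_OBJECT VALUES (1, 'An Author');
    INSERT INTO CD_OBJECT VALUES (2, 'An Artist');
    INSERT INTO OBJECT_KEYWORD VALUES (1, 'classic'), (2, 'jazz');
""")

# Deleting the generic row removes everything that depends on it.
conn.execute("DELETE FROM GENERIC_OBJECT WHERE ID = 1")
remaining_books = conn.execute("SELECT COUNT(*) FROM BOOK_OBJECT").fetchone()[0]
remaining_keywords = conn.execute(
    "SELECT OBJECT_ID, KEYWORD FROM OBJECT_KEYWORD"
).fetchall()
print(remaining_books, remaining_keywords)
```

None of these constraints could be declared if the book, CD and keyword tables lived in separate databases.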
While it is possible to do joins across databases, I wouldn't generally split the data across databases just because they are of slightly different categories. Others have also mentioned the inability to use referential integrity across databases.
On the other hand, if each type of product has radically different front-end applications, or if you expect each database to become massively large, those might be reasons to consider leaving them in separate databases. (Although scaling isn't a problem for most modern databases).
Syntax example for cross-database joins:
SELECT *
FROM books b
INNER JOIN KeywordDB.dbo.Keywords k
ON b.keywordID = k.keywordID
In this example, you are performing the query from the local database that contains the books table, and you are joining to the other database. (This is a MS SQL syntax example)
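For readers without SQL Server at hand, roughly the same join can be tried with Python's sqlite3 module, where ATTACH DATABASE plays the role of the second database. This is an analogue rather than the MS SQL syntax above: SQLite uses two-part names (KeywordDB.Keywords) instead of KeywordDB.dbo.Keywords, and the column names other than keywordID are made up for the example.

```python
import sqlite3

# The main connection holds the books; a second in-memory database
# is attached under the name KeywordDB.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS KeywordDB")
conn.executescript("""
    CREATE TABLE books (
        bookID INTEGER PRIMARY KEY, title TEXT, keywordID INTEGER
    );
    CREATE TABLE KeywordDB.Keywords (
        keywordID INTEGER PRIMARY KEY, keyword TEXT
    );
    INSERT INTO books VALUES (1, 'Dune', 7);
    INSERT INTO KeywordDB.Keywords VALUES (7, 'sci-fi');
""")

# The join itself works across the two databases...
rows = conn.execute("""
    SELECT b.title, k.keyword
    FROM books b
    INNER JOIN KeywordDB.Keywords k
        ON b.keywordID = k.keywordID
""").fetchall()
print(rows)
# ...but no foreign key ties books.keywordID to KeywordDB.Keywords,
# so nothing stops a book from pointing at a keyword that does not exist.
```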
No, it's a bad idea. By separating them into different databases, you significantly impair your ability to do JOIN queries.
It does seem a little too separated, but with some well-designed views it could work, especially if the views are simply lookups.
Why such separation in the first place?
As everyone has mentioned, this is in general not a good idea. However, to play devil's advocate, I've seen other developers do this. I'm sure there are some reasons why one might want to do it. If it is absolutely needed (I'm not sure whether you're asking for solutions), you might want to use some sort of synchronization to keep the data in step: keep all of the data (or whatever is needed) in both databases.
This also isn't an ideal solution, but if you must use two different databases, this might be a better way to go about it.
It could at least solve the issues that everyone has been outlining, though keep in mind that it presents a new problem: is everything in sync?
Good Luck,Frank
If the decision about whether to use one database or two is yours, I recommend going with just one database. The data in the two tables appears closely related, judging from your question. The size and complexity don't seem to merit splitting into two databases.
What's your DBMS? If it's Oracle, DB2, SQL Server, or even MS Access, you shouldn't have any trouble administering a single database with keyword data and object data in logically related tables.

Resources