hybris - one to many and many to many relationship - database

In our production system, we have an existing one-to-many relationship. For business/data reasons we would like to change this relationship to many-to-many.
What steps do we need to take to avoid losing data and to avoid any impact on production data, given that we need to change the *-items.xml file within the hybris system?
Appreciate your inputs.
Thanks!

The database structure for one-to-many and many-to-many is different. A one-to-many relationship uses a single table (the foreign key is stored on the "many" side), but many-to-many requires an extra link table.
I suggest exporting the existing data, updating the items.xml (followed by a platform update), and reimporting the data.
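To make the before/after shapes concrete, here is a sketch using SQLite as a stand-in (the product/category tables and column names are hypothetical; in hybris itself the link table is generated from the many-to-many <relation> definition in *-items.xml during the platform update). The "reimport" step is the INSERT ... SELECT that copies the existing pairs into the new link table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Before: one-to-many, the "many" side carries a foreign key column.
cur.executescript("""
CREATE TABLE category (pk INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product  (pk INTEGER PRIMARY KEY, name TEXT,
                       category_pk INTEGER REFERENCES category(pk));
INSERT INTO category VALUES (1, 'tools'), (2, 'toys');
INSERT INTO product  VALUES (10, 'hammer', 1), (11, 'kite', 2);
""")

# After: many-to-many, the relation moves into a separate link table.
cur.executescript("""
CREATE TABLE product2category (
    product_pk  INTEGER REFERENCES product(pk),
    category_pk INTEGER REFERENCES category(pk),
    PRIMARY KEY (product_pk, category_pk)
);
-- "Reimport" step: copy every existing pair into the link table,
-- so no relationship data is lost in the migration.
INSERT INTO product2category
    SELECT pk, category_pk FROM product WHERE category_pk IS NOT NULL;
""")

rows = cur.execute(
    "SELECT product_pk, category_pk FROM product2category ORDER BY 1"
).fetchall()
print(rows)  # the old one-to-many pairs, preserved
```

After verifying the link table, the old category_pk column on product can be dropped.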

Related

ERD M:N relationship with multiple dependencies

I am in the process of designing a reporting tool. The interface will be C# with a backend database. The tool will allow users to enter and edit data through the interface and save it to the database. Additionally, it will provide specific reports based on the data retrieved from the DB.
Currently, I have been trying to solve an M:N relationship for my DB tables.
The tool lets a user enter daily item amounts (Steel and Mesh) based on a Project. I have solved the M:N relationship in the following diagram, but I am not sure whether this is actually workable and whether I need to break down the daily stats table further, due to a composite key containing 4 PKs from other tables. This is the current diagram I have.
I am wondering whether the diagram solves the M:N relationship correctly and whether there is a better way to utilise the date table.

Aggregating all relations into one table SQL Server

I'm trying to design an enterprise-level database architecture, and at the ERD level I have an issue.
Many of my tables have relations with each other. There may be further development in the future, so my design should be flexible and also fast at gathering results.
Recently I created a parent table named Node, and all of my functional tables have a one-to-one relation with this table.
(Functional tables are those that keep real-life data like Content, User, Folder, Role, and so on, not those related to the application's life cycle.)
So before adding a record to any table, we must add a Node to the Node table and use the new NodeId in the secondary table.
The Node table alone has a many-to-many relation with itself, so I designed this table to hold all of my relationship concerns.
All of the other entities are like User and are related to the Node table as shown above.
The problem is: does this design make my relational queries on the NodeAssoc table faster, or is it better to keep the relations separate?
You say:
There may be some developments in the future and my design should be flexible and also fast on gathering the results.
Flexibility and performance are two separate things, each with its own ways to approach and solve it. When you are designing a database, you have to consider database principles. Normalization is very important to keep in mind. One-to-one and many-to-many relations are, by design, not common. In your case you are mentioning both one-to-one and many-to-many relations, which worries me.
Advice one -> Denormalize (merge) one-to-one tables to one table.
This reduces the amount of joins.
Advice two -> Introduce a bridge table for each many-to-many relation,
because there can be multiple matches. Handling multiple matches without
one means complex queries, which leads to a performance drop.
Advice three -> Use proper indexes in order to improve the performance
Flexibility can be increased by using database views, which are stored queries. The underlying structure of the database may change in the future, while modifying the view to match is quick.
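As a sketch of advice one and the view idea (using SQLite as a stand-in; the users/profiles tables and column names are hypothetical): the one-to-one profiles table is a candidate for merging into users, and the view shields application queries from that future change:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
-- A one-to-one table: per advice one, a candidate for merging into users.
CREATE TABLE profiles (user_id INTEGER PRIMARY KEY REFERENCES users(id),
                       bio TEXT);
INSERT INTO users VALUES (1, 'ada');
INSERT INTO profiles VALUES (1, 'mathematician');

-- The view hides the join. If profiles is later merged into users,
-- only this view definition changes, not the application queries.
CREATE VIEW user_info AS
    SELECT u.id, u.name, p.bio
    FROM users u LEFT JOIN profiles p ON p.user_id = u.id;
""")

info = con.execute("SELECT name, bio FROM user_info").fetchall()
print(info)
```

Applications query user_info and never need to know how many physical tables sit behind it.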

Using GAE datastore relationship vs flat kind design

We have a requirement to implement in the GAE datastore. There is a set of documents (millions of them), and each document has an owner, some comments, and revisions associated with it.
If the owner of a document leaves the organization, we need to change the ownership of the document to the person who made the last revision. We also need to retain the revisions and comments for each document. This ownership change is to be implemented by a job which will process every document one by one.
Is it the right approach to have parent-child relationships between the entities Document, Comment, and Revision, with Document as the parent and Comment and Revision as its children? Or, in typical NoSQL fashion, should we flatten the data into a single entity?
A typical NoSQL implementation needs only inserts and reads but no updates. Is this the way the Google datastore works? Please clarify.
Our research says that we can have relationships, but that will look more like an RDBMS.
To choose a proper schema design, you should clarify how you plan to work with the data and keep the datastore limitations in mind. In brief:
NoSQL approach (single entity)
one update per second per entity group
you read and write the whole entity every time (except for projection queries)
Parent-child relations (ancestor relationships)
cannot be changed in the future
form a single entity group, so you get strongly consistent reads when querying
one update per second per entity group! (So if you have a case with lots of live comments, this won't work for you.)
RDBMS approach (tables and relations)
the datastore has no joins across multiple tables (so only split data into tables you do not intend to query together)
eventually consistent reads

Few database design questions relating to user content site

Designing a user content website (kind of similar to Yelp but for a different market and with photo sharing), and I had a few database questions:
1. Does each user get their own set of tables, or are we storing multiple users' data in common tables? Since this is akin to a social network, databases are usually partitioned off for scalability as the user base grows, with different sets of users served separately, so what is the best approach? I guess some data like user accounts can live in common tables, but for wall posts, photos, etc. each user would get their own table? If so, then with 10 million users that means 10 million times however many tables per user? This is currently being designed in MySQL.
2. How does the system know which user tables to create each time a user joins the site? I am assuming there may be a system table template from which it pulls the fields?
3. Following on from the above, if tomorrow we modify tables or add/remove features, how do the changes roll down to all the live user accounts/tables? From a page point of view we have the master template, but for the database, how will the user tables be updated? Is that something we do manually, or will each table keep checking, say every 24 hours, against the system tables for updates to its structure?
4. If the above is all true, that means we maintain one master set of tables with system default values, and each user gets the same values copied into their tables? Take a field like "maximum failed login attempts before the system locks the account": say the system default is 5 attempts within 30 minutes, but I want to allow users to specify their own number to customize their own security. Does that mean they overwrite the system default in their own table?
Thanks.
Users should not get their own set of tables. It will most likely not perform as well as one table (properly indexed), and schema changes will have to be deployed to all user tables.
You could have default values specified on the table for things that are optional.
With difficulty. With one set of tables it will be a lot easier, and probably faster.
That sort of data should be stored in a User Preferences table that stores all preferences for all users. Again, don't duplicate the schema for all users.
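A minimal sketch of such a preferences table (SQLite as a stand-in; table and column names are hypothetical): the system default lives in one table, a user's override (if any) in another, and COALESCE picks the override when it exists:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE system_defaults (setting TEXT PRIMARY KEY, value INTEGER);
CREATE TABLE user_preferences (user_id INTEGER, setting TEXT, value INTEGER,
                               PRIMARY KEY (user_id, setting));
INSERT INTO system_defaults VALUES ('max_failed_logins', 5);
-- Only users who customize a setting get a row here.
INSERT INTO user_preferences VALUES (42, 'max_failed_logins', 3);
""")

def effective(con, user_id, setting):
    # User override wins; otherwise fall back to the system default.
    row = con.execute("""
        SELECT COALESCE(p.value, d.value)
        FROM system_defaults d
        LEFT JOIN user_preferences p
               ON p.setting = d.setting AND p.user_id = ?
        WHERE d.setting = ?""", (user_id, setting)).fetchone()
    return row[0] if row else None

print(effective(con, 42, 'max_failed_logins'))  # 3 -- user override
print(effective(con, 7,  'max_failed_logins'))  # 5 -- system default
```

One pair of tables serves every user; no per-user schema is ever created or maintained.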
Generally, the idea of creating separate tables for each entity (in this case, each user) is not a good idea. If each table is separate, querying becomes cumbersome.
If your table is large you should optimize it with indexes. If it gets very large, you may also want to look into partitioning the table.
Partitioning allows you to see the table as one object even though it is physically split up: the DBMS handles most of the work and presents you with a single object. You SELECT, INSERT, UPDATE, ALTER etc. as normal, and the DB figures out which partition the SQL refers to and performs the command.
Not splitting the tables up by user, and instead using indexes and partitions, deals with scalability while maintaining performance. If you don't split up the tables manually, points 2, 3, and 4 also become moot.
Here's a link to partitioning tables (SQL Server-specific):
http://databases.about.com/od/sqlserver/a/partitioning.htm
It doesn't make any kind of sense to me to create a set of tables for each user. If you have a common set of tables for all users then I think that avoids all the issues you are asking about.
It sounds like you need to locate a primer on relational database design basics. Regardless of the type of application you are designing, you should start there. Learn how joins work, indices, primary and foreign keys, and so on. Learn about basic database normalization.
It's not customary to create new tables on-the-fly in an application; it's usually unnecessary in a properly designed schema. Usually schema changes are done at deployment time. The only time "users" get their own tables is an artifact of a provisioning decision, wherein each "user" is effectively a tenant in a walled-off garden; this only makes sense if each "user" (more likely, a company or organization) never needs access to anything that other users in the system have stored.
There are mechanisms for dealing with loosely structured types of information in databases, but if you find yourself reaching for this often (the most common method is called Entity-Attribute-Value), your problem is either not quite correctly modeled, or you may not actually need a relational database, in which case you might be better off with a document-oriented database like CouchDB or MongoDB.
Adding, based on your updated comments/notes:
Your concerns about the number of records in a particular table are most likely premature. Get something working first. Most modern DBMSes, including newer versions of MySQL, support mechanisms beyond indices and clustered indices that can help deal with large numbers of records. To wit, in MS SQL Server you can create a partition function on fields of a table; MySQL 5.1+ has a few similar partitioning options based on hash functions, ranges, or other mechanisms.
Follow well-established conventions for database design, modeling your domain as sensibly as possible, then adjust when you run into problems. First adjust using the tools available within your choice of database, then consider more drastic measures only when you can prove they are needed.
There are other kinds of denormalization that are more likely to make sense before you would even want to consider something as unidiomatic to database systems as a "table per user" model; even if I were to look at that route, I'd probably consider something like materialized views first.
I agree with the comments above that say that a table per user is a bad idea. Also, while it's a good idea to have strategies in mind now for how you can cope when things get really big, I'd concentrate on getting things right for a small number of users first - if no-one wants to / is able to use your service, then unfortunately you won't be faced with the problem of lots of users.
A common approach among very large sites is database sharding. The summary is: you have N instances of your database in parallel (on separate machines), and each holds 1/N of the total data. There's some shared way of knowing which instance holds a given bit of data. To access some data you have 2 steps, rather than the 1 you might expect:
Work out which shard holds the data
Go to that shard for the data
There are problems with this, such as: you set up, say, 8 shards and they all fill up, so you want to spread the data over, say, 20 shards, which means migrating data between shards.
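The two-step lookup can be sketched as hash-based routing (everything here is illustrative; key format and shard count are made up). Note that a plain "hash mod N" scheme is exactly what makes resharding painful, which is why large deployments often move to consistent hashing instead:

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    # Step 1: work out which shard holds the data. Python's built-in
    # hash() is salted per process, so use a deterministic digest.
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

# Step 2: go to that shard (in a real system, pick the matching
# database connection from a pool keyed by shard index).
shard = shard_for("user:12345", 8)
print(shard)  # a stable index in [0, 8)
```

When num_shards changes from 8 to 20, almost every key maps to a different shard, which is the migration problem described above.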

Referential Integrity and HBase

One of the first sample schemas you read about in the HBase FAQ is the Student-Course example for a many-many relationship. The schema has a Courses column in the Student table and a Students column in the Course table.
But I don't understand how HBase guarantees integrity between these two objects. If something were to crash after updating one table and before updating the other, we'd have a problem.
I see there is a transaction facility, but what is the cost of using this on what might be every Put? Or are there other ways to think about the problem?
We hit the same issue.
I have developed a commercial plugin for HBase that handles transactions and the relationship issues that you mention. Specifically, we utilize DataNucleus for a JDO-compliant environment. Our plugin is listed on this page http://www.datanucleus.org/products/accessplatform_3_0/datastores.html or you can go directly to our small blog http://www.inciteretail.com/?page_id=236.
We utilize JTA for our transaction service. So in your case, we would handle the relationship issue and also any inserts into index tables (hard to have an app without index lookups and sorting!).
Without an additional log you won't be able to guarantee integrity between these two objects. HBase only has atomic updates at the row level. You could, however, use that property to build a transaction log that can recover after a failure.
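A toy sketch of that idea, with plain dicts standing in for HBase tables (everything here is illustrative, not an HBase API): log the intent first with a single atomic write, apply the two table updates, then mark the entry done, so a recovery job can replay anything still pending after a crash:

```python
class TxLog:
    """Minimal intent log: record the planned change before touching
    either table, then mark it done; unfinished entries are replayed
    by a recovery job after a crash."""
    def __init__(self):
        self.entries = []

    def begin(self, tx):
        # In HBase this would be one atomic single-row Put.
        self.entries.append({"tx": tx, "done": False})

    def commit(self, tx):
        for e in self.entries:
            if e["tx"] == tx:
                e["done"] = True

    def pending(self):
        return [e["tx"] for e in self.entries if not e["done"]]

students, courses = {}, {}
log = TxLog()

tx = {"student": "alice", "course": "db101"}
log.begin(tx)                                       # 1. log intent
students.setdefault("alice", []).append("db101")    # 2. update table one
# -- a crash here leaves the entry pending; recovery replays it --
courses.setdefault("db101", []).append("alice")     # 3. update table two
log.commit(tx)                                      # 4. mark complete

print(log.pending())  # [] -- nothing left to recover
```

The guarantee comes from ordering: the log row is written before either table, so a crash at any point leaves enough information to finish or undo the update.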
If you have to perform two INSERTs as a single unit of work, that means you have to use a transaction manager to preserve ACID properties. There's no other way to think about the problem that I know of.
The cost is less of a concern than referential integrity. Code it properly and don't worry about performance. Your code will be the first place to look for performance problems, not the transaction manager.
Logical relational models use two main varieties of relationships: one-to-many and many-to-many. Relational databases model the former directly as foreign keys (whether explicitly enforced by the database as constraints, or implicitly referenced by your application as join columns in queries) and the latter as junction tables (additional tables where each row represents one instance of a relationship between the two main tables). There is no direct mapping of these in HBase, and often it comes down to denormalizing the data.
The first thing to note is that HBase, not having any built-in joins or constraints, has little use for explicit relationships. You can just as easily place data that is one-to-many in nature into HBase tables. But this is only a relationship in that some parts of the row in the former table happen to correspond to parts of rowkeys in the latter table. HBase knows nothing of this relationship, so it's up to your application to do things with it (if anything).
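A toy illustration of that point (a Python dict standing in for an HBase table; the rowkey format is hypothetical): the one-to-many relationship exists only as a rowkey convention, and retrieving a parent's children is a prefix scan that the application has to know about:

```python
# Dict standing in for an HBase table; keys play the role of rowkeys.
# Convention: "<owner>:<revision-id>" embeds the parent id in the child
# rowkey, so all of a parent's children sort together.
table = {
    "user123:rev0001": {"text": "first draft"},
    "user123:rev0002": {"text": "second draft"},
    "user999:rev0001": {"text": "someone else's doc"},
}

# The "join" is a prefix scan over sorted rowkeys; HBase itself knows
# nothing about the relationship encoded in the key.
revisions = sorted(k for k in table if k.startswith("user123:"))
print(revisions)
```

If the convention changes (say, owners become reassignable, as in the GAE question above), every key has to be rewritten, which is the price of encoding relationships in rowkeys.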
