Database Design - Best way to manage these relationships - database

I am working on a project (based in Django although that's not really relevant to my question) and I am struggling to work out the best way to represent the data models.
I have the following four models: User, Client, Meeting, and Location.
User and Client have a many-to-many relationship through the Meeting model. The Meeting model has a one-to-one relationship with the Location model.
Meetings will take place at either:
The address defined in the User (or UserProfile) model
The address defined in the Client model.
Some other location which has to be defined at a later date.
I'm struggling to work out the best way to store the Location data in order to make it as clean and reusable as possible.
I considered making Location as a field in the Meetings model rather than a model in its own right - although this could also lead to redundant data if lots of Meetings are created at the same location, so this is probably a non-starter.
I could automatically create Location records for each User and Client that gets created and use a generic relationship between the relevant records, however, I understand that this can lead to inefficient database performance. Also, not every Client / User would be able to hold meetings at their Location.
Can anyone see a tidier alternative?
Any advice appreciated.
Thanks.

I considered making Location as a field in the Meetings model rather than a model in its own right - although this could also lead to redundant data if lots of Meetings are created at the same location, so this is probably a non-starter.
No, that's a really good thought, because it points you straight at the real problem.
The real problem is that there's a difference between a meeting and the parties that attend a meeting. A meeting has some attributes that have nothing to do with the attendees: it has at the very least a time and a place.
So I think you should change your thinking about the Meeting model.
Instead of users having a M:N relationship with clients through the Meeting model, they should have a M:N relationship through, say, an Attendance model. (A Registration or Reservation or MightAttend model might be more appropriate for you.) And the Meeting model should change to reflect the unique attributes of a real-world meeting: time and place.
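As a rough sketch of that split, here is one way it might look using plain SQLite from Python rather than Django models, so that it runs on its own. All table and column names (`app_user`, `attendance`, `starts_at`, and so on) are assumptions made for illustration, not anything from the original project.

```python
import sqlite3

# Hypothetical schema: every table and column name here is illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE client   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE location (id INTEGER PRIMARY KEY, address TEXT NOT NULL);

-- A meeting holds only its own attributes: a time and a place.
-- location_id is nullable because the place may be decided later.
CREATE TABLE meeting (
    id          INTEGER PRIMARY KEY,
    starts_at   TEXT NOT NULL,
    location_id INTEGER REFERENCES location(id)
);

-- Attendance carries the M:N link between users and clients.
CREATE TABLE attendance (
    meeting_id INTEGER NOT NULL REFERENCES meeting(id),
    user_id    INTEGER NOT NULL REFERENCES app_user(id),
    client_id  INTEGER NOT NULL REFERENCES client(id),
    PRIMARY KEY (meeting_id, user_id, client_id)
);
""")

conn.execute("INSERT INTO app_user VALUES (1, 'Alice')")
conn.execute("INSERT INTO client VALUES (1, 'Acme Ltd')")
conn.execute("INSERT INTO location VALUES (1, '1 High Street')")
conn.executemany(
    "INSERT INTO meeting VALUES (?, ?, ?)",
    [
        (1, "2024-01-10 09:00", 1),
        (2, "2024-01-17 09:00", 1),     # same place, stored once
        (3, "2024-01-24 09:00", None),  # place to be defined later
    ],
)
conn.executemany(
    "INSERT INTO attendance VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 1), (3, 1, 1)],
)

meetings_at_location = conn.execute(
    "SELECT COUNT(*) FROM meeting WHERE location_id = 1"
).fetchone()[0]
location_rows = conn.execute("SELECT COUNT(*) FROM location").fetchone()[0]
```

Two meetings share the single location row, so the address is not repeated, and the third meeting can exist before its place is known.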

I would expect Meetings and Locations to have a many-to-one relationship. Can't a location be used for more than one meeting? (at different times, of course)
It seems to me that a location has attributes that persist beyond its use for a single meeting. Example: seating capacity.

Related

Database Design ~ have no idea where to start

I have something that completely confuses me and I have no idea how to store this much data in a database. Below I'll explain exactly what I think I need to store in the database and how I plan to use that data (to store it efficiently).
Okay, so. I have around 40 points on a grid. I'll call them "objects". They have information associated with them such as coordinates (x,y), ID, number, resources, and then a lot of other objects and an amount that "defends" that point on the grid. There are over 100 different types of units that can defend the point. These units can be owned by any number of players. ID and number can be derived from each other easily (so both may not need to be stored).
What I need to do, is store all this information every time I scan these points with the time I'm scanning them. I'll need to then take this information out of the database to create graphs of a player's units over time to see if it is increasing or decreasing. I'd also like to plot the objects total defense over time to track how that is changing overall.
The frequency I scan these objects can vary, to even be at most once a minute. I can't even conceive how I'll store all this information in a database.
Any help is appreciated! Ask any and all questions you need.. I know it's a wall of text, but please read it!
Edit: The number of objects on the grid can change at any instant. We can gain one or we can lose one.
The starting point is really to understand Entity Relationship Modeling. Although your requirements look unique to you, in terms of an entity relationship model they are old hat. Basically, it is all about the types of relationships between the objects that matter. Learn about one-to-one, one-to-many, and many-to-many relationships. The entity model is the place to start, and some tools even let you generate the tables from it. Once you understand how a given relationship translates to a relational database model, you are on your way. For example, one team has many baseball players, so this is a one-to-many relationship. Once you get this, it will be a lot easier to understand why you need foreign keys in tables, a unique id per row, and so on. As you build out your tables, remember to model the relationships first and the attributes later.
The other approach is to design your object model first, using, say, UML. Still, it's about relationships, inheritance, composition, etc., which will also translate into a database design. But if you want to design from the database side, then entity relationship modeling is the way to go.
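To make the team/players example concrete, here is a minimal sketch of how a one-to-many relationship usually turns into tables: the "many" side carries a foreign key to the "one" side. The table names, column names, and sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE team (
    id   INTEGER PRIMARY KEY,   -- unique id per row
    name TEXT NOT NULL
);
CREATE TABLE player (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    team_id INTEGER NOT NULL REFERENCES team(id)  -- the "many" side points at the "one"
);
""")

conn.execute("INSERT INTO team VALUES (1, 'Sluggers')")
conn.executemany(
    "INSERT INTO player (name, team_id) VALUES (?, ?)",
    [("Ann", 1), ("Bob", 1)],
)

roster = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM player WHERE team_id = 1 ORDER BY name"
    )
]

# The foreign key rejects a player that points at a team that does not exist.
try:
    conn.execute("INSERT INTO player (name, team_id) VALUES ('Cy', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The same pattern would apply to the grid question (for instance, one scan having many recorded defenders), although that particular mapping is only a guess at the poster's data.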

Geolocation and database schema

I am building an application that stores the following: People, Places and Posts.
People can create Posts and live in a Place, and Posts also belong to a Place.
Users of the application when viewing posts will be able to see the location of the post that was made, e.g. London, UK. They will then be able to click on that place and see a list of other posts that are also posted in that location.
On the home page of the application I want to show a map that using geolocation will get the current users location and then show an overlay of bubbles of posts that have been posted near them that they can then click on to view that post.
That all being said I'm trying to figure out the best way to build the database. This is the schema I have in my head so far:
**Posts**
id
title
datetime
content
author_id
**People**
id
firstname
lastname
**Places**
id
name
lon
lat
As you can see, there is a relationship between the Posts and People with the author_id foreign key, but I also need to build a relationship between the Places and Posts and People, but I don't want data to get repeated, e.g. have London stored twice in the DB.
I have thought about doing a linker table but that could get messy as the id of a person and a post may be the same so I'd need some sort of additional id to tell them apart.
Can anyone offer any suggestions/best practices for building such an app?
Should I be even saving all this data in the places table as it would take a while to build up the locations so not sure how people like: http://www.touristeye.com/London-p-1066
Thanks
I think that your Places table is not quite right. For example, it suggests that a place such as New York would have a unique lat/long -- which is a perfectly sensible way to analyse the data for some applications but possibly not for yours. I'd suggest making lat and long attributes of Posts and modelling the relationship between Places and Posts some other way. I'd then modify Places to hold the attributes necessary to record some idea of the area that a Place occupies -- perhaps a simple polygon, perhaps something more complex.
If you are happy with a simple idea of Place, ie that every lat/long tuple is in only one Place (eg London) and that there is no interesting relationship between Places (eg Westminster is inside London) then you could model the relationship between Places and Posts by a foreign key. But this would mean that all Posts within a Place were given the same lat/long tuple, which may not be what you want at all.
At a guess, you probably don't intend to (or need to) implement anything approaching a spatial database so don't let the re-modelling of Places get out of hand.
EDIT after comment
It's too simple to think 'duplication of data is a bad thing'. For one, I don't think that you are duplicating data, for another there are reasons why you (or anyone else designing a database) might want to. Broadly speaking, those reasons relate to query performance. But turning to your issue:
I think that the location from which a post is made is not the same thing as a Place. From what you have written you want to, for example, record the lat/long of posts made 100ft apart but tie them to the same place (I'm guessing that Times Square is more than 100ft across). If you have a simple concept of Place you could implement the relationship by using a foreign key. But the definition of Place, in terms of lat/long, is independent of the locations of Posts made from within it. If you forced all posts made in Times Sq to have the same lat/long you would be losing information about their precise location.
And losing information is another of those bad things that we are not supposed to do with databases (unless, of course, there is a good reason for it).
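Here is a small sketch of the "simple idea of Place" variant described above: each post keeps its own lat/long and also points at a Place through a foreign-key-style `place_id`. The class names, the `posts_near` helper, and the sample coordinates are assumptions for illustration; the distance uses the standard haversine formula with a mean Earth radius of 6371 km.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt
from typing import List


@dataclass
class Place:
    id: int
    name: str


@dataclass
class Post:
    id: int
    title: str
    lat: float      # precise location of the post itself
    lon: float
    place_id: int   # plays the role of a foreign key to Place.id


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two lat/lon points."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def posts_in_place(posts: List[Post], place_id: int) -> List[Post]:
    """All posts tied to one Place, wherever exactly they were made."""
    return [p for p in posts if p.place_id == place_id]


def posts_near(posts: List[Post], lat: float, lon: float, radius_km: float) -> List[Post]:
    """Posts whose own coordinates fall within radius_km of the given point."""
    return [p for p in posts if haversine_km(lat, lon, p.lat, p.lon) <= radius_km]


london = Place(1, "London, UK")
paris = Place(2, "Paris, France")
posts = [
    Post(1, "Trafalgar Square", 51.5080, -0.1281, london.id),
    Post(2, "Tower Bridge", 51.5055, -0.0754, london.id),
    Post(3, "Eiffel Tower", 48.8584, 2.2945, paris.id),
]

london_ids = [p.id for p in posts_in_place(posts, london.id)]
nearby_ids = [p.id for p in posts_near(posts, 51.5074, -0.1278, radius_km=1.0)]
```

London is stored once, both London posts reference it, and the map overlay can still use each post's own coordinates.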

Data Modeling Verification

Looking for advice on the best way to model the following generic requirements. Since these are just generic only basic entities/attributes are included in the model.
The purpose is to capture and list websites for businesses that may or may not have franchises.
A business may have zero, one, or many websites
Franchises (reason for including ExternalBusinessId) of the business may or may not share the same websites as the Business itself or other franchises
In my attempt to fulfill these requirements I removed ExternalBusinessId from the PK of Website. Not sure if it is that simple to meet these requirements, but it looks like it would still leave a lot of dups.
Another approach that I may need to take is to move the franchises to their own table, which could make this problem easier to solve but complicate the rest of my model (not shown here). If having franchises in their own table is the right approach, I would rather go that route and go through the exercise of making it fit into my complete model. In my current model, businesses without franchises are handled by giving them a default ExternalBusinessId of 001.
Any thoughts?
Thanks
A franchise is a business.
The word franchise describes a relationship between two businesses.
Every business has zero, one, or many websites.
If I understand you correctly, you seem to think something like this.
Storing franchises in a separate table implies eliminating them from the table "business".
Franchises are businesses. Store them in the table "business", just like every other business. Store the relationship between a franchise and its franchiser in another table.
Information related to the franchise as a business should reference a key in the table "business". Such information might include its mailing address and phone numbers.
Information related only to the franchise as a franchise should reference a key in the table of franchises. Such information might include the franchise license number and franchise termination date.
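One possible shape for this, sketched with SQLite from Python. The table names, column names, and sample rows are hypothetical, and the `business_website` link table is an added assumption to cover the requirement that franchises may share websites.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Every business, franchise or not, lives here.
CREATE TABLE business (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- The franchise relationship between two businesses, plus franchise-only facts.
CREATE TABLE franchise (
    franchisee_id  INTEGER PRIMARY KEY REFERENCES business(id),
    franchiser_id  INTEGER NOT NULL REFERENCES business(id),
    license_number TEXT,
    terminates_on  TEXT
);

CREATE TABLE website (id INTEGER PRIMARY KEY, url TEXT NOT NULL UNIQUE);

-- Zero, one, or many websites per business; a website may be shared.
CREATE TABLE business_website (
    business_id INTEGER NOT NULL REFERENCES business(id),
    website_id  INTEGER NOT NULL REFERENCES website(id),
    PRIMARY KEY (business_id, website_id)
);
""")

conn.executemany(
    "INSERT INTO business VALUES (?, ?)",
    [(1, "Burger Co"), (2, "Burger Co Springfield"), (3, "Corner Shop")],
)
conn.execute("INSERT INTO franchise VALUES (2, 1, 'LIC-42', NULL)")
conn.execute("INSERT INTO website VALUES (1, 'https://burgerco.example')")
conn.executemany("INSERT INTO business_website VALUES (?, ?)", [(1, 1), (2, 1)])

# Count websites per business, including businesses that have none.
site_counts = dict(
    conn.execute("""
        SELECT b.name, COUNT(bw.website_id)
        FROM business b
        LEFT JOIN business_website bw ON bw.business_id = b.id
        GROUP BY b.id
    """)
)
website_rows = conn.execute("SELECT COUNT(*) FROM website").fetchone()[0]
```

The shared website is stored once, and a business without franchises needs no placeholder id because it simply has no row in `franchise`.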

Database design rules to follow for a programmer

We are working on a mapping application that uses Google Maps API to display points on a map. All points are currently fetched from a MySQL database (holding some 5M + records). Currently all entities are stored in separate tables with attributes representing individual properties.
This presents following problems:
Every time there's a new property we have to make changes in the database, application code and the front-end. This is all fine but some properties have to be added for all entities so that's when it becomes a nightmare to go through 50+ different tables and add new properties.
There's no way to find all entities which share any given property e.g. no way to find all schools/colleges or universities that have a geography dept (without querying schools,uni's and colleges separately).
Removing a property is equally painful.
No standards for defining properties in individual tables. Same property can exist with different name or data type in another table.
No way to link or group points based on their properties (somehow related to point 2).
We are thinking to redesign the whole database but without DBA's help and lack of professional DB design experience we are really struggling.
Another problem we're facing with the new design is that there are lot of shared attributes/properties between entities.
For example:
An entity called "university" has 100+ attributes. Other entities (e.g. hospitals,banks,etc) share quite a few attributes with universities for example atm machines, parking, cafeteria etc etc.
We don't really want to have properties in a separate table [and then link them back to entities with foreign keys], as that would require us to add and remove them manually. Also, generalizing properties will result in groups containing 50+ attributes, and not all records (i.e. entities) require those properties.
So with keeping that in mind here's what we are thinking about the new design:
Have separate tables for each entity containing some basic info e.g. id,name,etc etc.
Have 2 tables attribute type and attribute to store properties information.
Link each entity (or a table if you like) to attribute using a many-to-many relation.
Store addresses in different table called addresses link entities via foreign keys.
We think this will allow us to be more flexible when adding, removing or querying on attributes.
This design, however, will result in an increased number of joins when fetching data, e.g. to display all "attributes" for a given university we might have a query with 20+ joins to fetch all related attributes in a single row.
We desperately need to know some opinions or possible flaws in this design approach.
Thanks for your time.
In trying to generalize your question without more specific examples, it's hard to truly critique your approach. If you'd like some more in depth analysis, try whipping up an ER diagram.
If your data model is changing so much that you're constantly adding/removing properties and many of these properties overlap, you might be better off using EAV.
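For reference, a minimal EAV (entity-attribute-value) layout might look like the sketch below. The table names, the `entities_with` helper, and the sample data are all hypothetical. Note that values are stored as untyped text here, which is one of the usual trade-offs of EAV.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity    (id INTEGER PRIMARY KEY, kind TEXT NOT NULL, name TEXT NOT NULL);
CREATE TABLE attribute (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);

-- One row per (entity, attribute) pair; adding a property needs no schema change.
CREATE TABLE entity_attribute (
    entity_id    INTEGER NOT NULL REFERENCES entity(id),
    attribute_id INTEGER NOT NULL REFERENCES attribute(id),
    value        TEXT NOT NULL,
    PRIMARY KEY (entity_id, attribute_id)
);
""")

conn.executemany(
    "INSERT INTO entity VALUES (?, ?, ?)",
    [
        (1, "university", "Northfield University"),
        (2, "college", "Hill College"),
        (3, "hospital", "City Hospital"),
    ],
)
conn.executemany(
    "INSERT INTO attribute VALUES (?, ?)",
    [(1, "geography_dept"), (2, "parking")],
)
conn.executemany(
    "INSERT INTO entity_attribute VALUES (?, ?, ?)",
    [(1, 1, "yes"), (2, 1, "yes"), (1, 2, "yes"), (3, 2, "yes")],
)


def entities_with(attribute_name):
    """Names of all entities, of any kind, that have the given attribute."""
    rows = conn.execute(
        """
        SELECT e.name
        FROM entity e
        JOIN entity_attribute ea ON ea.entity_id = e.id
        JOIN attribute a ON a.id = ea.attribute_id
        WHERE a.name = ?
        ORDER BY e.name
        """,
        (attribute_name,),
    )
    return [row[0] for row in rows]


with_geography = entities_with("geography_dept")
with_parking = entities_with("parking")
```

A single query now finds every entity sharing a property, regardless of its kind, at the cost of extra joins and weaker type checking.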
Otherwise, if you want to maintain a relational approach but are finding a lot of overlap with properties, you can analyze the entities and look for abstractions that link to them.
Ex) My Db has Puppies, Kittens, and Walruses all with a hasFur and furColor attribute. Remove those attributes from the 3 tables and create a FurryAnimal table that links to each of those 3.
Of course, the simplest answer is to not touch the data model. Instead, create Views on the underlying tables that you can use to address (5), (4) and (2)
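A sketch of the view idea, assuming three existing tables that each name the "has a geography department" column differently. The table and column names are invented; the original question uses MySQL, but the same `CREATE VIEW ... UNION ALL` pattern is shown here in SQLite so it runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Existing tables, left untouched; note the inconsistent column names.
CREATE TABLE school     (id INTEGER PRIMARY KEY, name TEXT, has_geography_dept INTEGER);
CREATE TABLE college    (id INTEGER PRIMARY KEY, name TEXT, geography INTEGER);
CREATE TABLE university (id INTEGER PRIMARY KEY, name TEXT, geog_dept INTEGER);

-- The view gives one standard name to the shared property.
CREATE VIEW institution AS
    SELECT 'school' AS kind, id, name, has_geography_dept FROM school
    UNION ALL
    SELECT 'college', id, name, geography FROM college
    UNION ALL
    SELECT 'university', id, name, geog_dept FROM university;
""")

conn.execute("INSERT INTO school VALUES (1, 'Oak School', 0)")
conn.execute("INSERT INTO college VALUES (1, 'Hill College', 1)")
conn.execute("INSERT INTO university VALUES (1, 'Northfield University', 1)")

# One query across all three tables instead of three separate ones.
with_geography = conn.execute(
    "SELECT kind, name FROM institution WHERE has_geography_dept = 1 ORDER BY name"
).fetchall()
```

The underlying tables stay as they are, and the view hides the naming differences from application code.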
1 cannot be an issue. There is one place where your objects are defined. Everything else is generated/derived from that. Just refactor your code until this is the case.
2 is solved by having a metamodel, where you describe which properties are where. This is probably needed for 1 too.
You might want to totally avoid the problem by programming this in Smalltalk with Seaside on a Gemstone object oriented database. Then you can just have objects with collections and don't need so many joins.

Google Appengine: Is This a Good set of Entity Groups?

I am trying to wrap my head around Entity Groups in Google AppEngine. I understand them in general, but since it sounds like you can not change the relationships once the object is created AND I have a big data migration to do, I want to try to get it right the first time.
I am making an Art site where members can sign up as a regular Member or as one of a handful of non-polymorphic Entity "types" (Artist, Venue, Organization, ArtistRepresentative, etc). Artists, for example, can have Artwork, which can in turn have other Relationships (Gallery, Media, etc). All these things are connected via References and I understand that you don't need Entity Groups to merely do References. However, some of the References NEED to exist, which is why I am looking at Entity Groups.
From the docs:
"A good rule of thumb for entity groups is that they should be about the size of a single user's worth of data or smaller."
That said, I have a couple hopefully yes/no questions.
Question 0: I gather you don't need Entity Groups just to do transactions. However, since Entity Groups are stored in the same region of Big Table, this helps cut down on consistency issues and race conditions. Is this a fair look at Entity Groups and Transactions together?
Question 1: When a child Entity is saved, do any parent objects get implicitly accessed/saved? i.e. If I set up an Entity Group with path Member/Artist/Artwork, if I save an Artwork object, do the Member and Artist objects get updated/accessed? I would think not, but I am just making sure.
Question 2: If the answer to Question 1 is yes, does the accessing/updating only travel up the path and not affect other children. i.e. If I update Artwork, no other Artwork child of Member is updated.
Question 3: Assuming it is very important that the Member and its associated account type entity exist when a user signs up and that only the user will be updating its Member and associated account type Entity, does it make sense to put these in Entity Groups together?
i.e. Member/Artist, Member/Organization, Member/Venue.
Similarly, assuming only the user will be able to update the Artwork entities, does it make sense to include those as well? Note: Media/Gallery/etc which are references to Artwork may be related to lots of Artwork, not just those owned by the user (i.e. many to many relations).
It makes sense to have all the user's bits in an entity group if it works the way I suspect (i.e. Q1/Q2 are "no"), since they will all be in the same region of BigTable. However, adding the Artwork to the entity group seems like it might violate the "keep it small" principle and honestly, may not need to be in Transactions aside from saving bandwidth/retries when users are uploading artwork images.
Any thoughts? Am I approaching Entity Groups wrong?
0: You do need entity groups for transactions among multiple entities
1: Modifying/accessing children does not modify/access a parent
2: N/A
3: Sounds reasonable. My feeling is, entity groups should not be used unless you need transactions among them.
It is not necessary to have the Artwork as a child for permission purposes. But if you need transactional modification to them (including e.g. creation and deletion) it might be better. For example: if you delete an account, you delete the user entity but before you delete the child, you get DeadlineExceeded or the server crashes. Now you have an orphaned Artwork. If you have more than 1,000 Artworks for an Artist, you must delete in batches.
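The batching itself can be sketched in plain Python. This is not App Engine code: `delete_batch` is a hypothetical stand-in for whatever datastore delete call you use, and the batch size of 500 is an arbitrary choice for the example.

```python
from typing import Callable, Iterator, List, Sequence


def batches(keys: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` keys."""
    for start in range(0, len(keys), size):
        yield list(keys[start:start + size])


def delete_in_batches(
    keys: Sequence[str],
    delete_batch: Callable[[List[str]], None],  # hypothetical datastore delete
    size: int = 500,
) -> int:
    """Call delete_batch once per chunk and return the number of calls made."""
    calls = 0
    for chunk in batches(keys, size):
        delete_batch(chunk)
        calls += 1
    return calls


# Stand-in usage: "deleting" just records the keys in a list.
artwork_keys = [f"artwork-{i}" for i in range(1200)]
deleted: List[str] = []
calls_made = delete_in_batches(artwork_keys, deleted.extend, size=500)
```

Each batch is a separate call, so a crash partway through can still leave some Artworks behind; the loop would need to be safe to re-run.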
Good luck!
