I am building an application that stores the following: People, Places and Posts.
People can create Posts and live in a Place, and Posts also belong to a Place.
Users of the application when viewing posts will be able to see the location of the post that was made, e.g. London, UK. They will then be able to click on that place and see a list of other posts that are also posted in that location.
On the home page of the application I want to show a map that uses geolocation to get the current user's location and then shows an overlay of bubbles for posts made near them, which they can click to view the post.
That all being said I'm trying to figure out the best way to build the database. This is the schema I have in my head so far:
**Posts**
id
title
datetime
content
author_id
**People**
id
firstname
lastname
**Places**
id
name
lon
lat
As you can see there is a relationship between Posts and People through the author_id foreign key, but I also need to build a relationship between Places and both Posts and People, and I don't want data to be repeated, e.g. having London stored twice in the DB.
I have thought about doing a linker table but that could get messy as the id of a person and a post may be the same so I'd need some sort of additional id to tell them apart.
Can anyone offer any suggestions/best practices for building such an app?
Should I even be saving all this data in the Places table? It would take a while to build up the locations, so I'm not sure how sites like http://www.touristeye.com/London-p-1066 do it.
Thanks
I think that your Places table is not quite right. For example, it suggests that a place such as New York would have a unique lat/long -- which is a perfectly sensible way to analyse the data for some applications but possibly not for yours. I'd suggest making lat and long attributes of Posts and modelling the relationship between Places and Posts some other way. I'd then modify Places to hold the attributes necessary to record some idea of the area that a Place occupies -- perhaps a simple polygon, perhaps something more complex.
If you are happy with a simple idea of Place, ie that every lat/long tuple is in only one Place (eg London) and that there is no interesting relationship between Places (eg Westminster is inside London) then you could model the relationship between Places and Posts by a foreign key. But this would mean that all Posts within a Place were given the same lat/long tuple, which may not be what you want at all.
At a guess, you probably don't intend to (or need to) implement anything approaching a spatial database so don't let the re-modelling of Places get out of hand.
EDIT after comment
It's too simple to think 'duplication of data is a bad thing'. For one, I don't think that you are duplicating data, for another there are reasons why you (or anyone else designing a database) might want to. Broadly speaking, those reasons relate to query performance. But turning to your issue:
I think that the location from which a post is made is not the same thing as a Place. From what you have written you want to, for example, record the lat/long of posts made 100ft apart but tie them to the same place (I'm guessing that Times Square is more than 100ft across). If you have a simple concept of Place you could implement the relationship by using a foreign key. But the definition of Place, in terms of lat/long, is independent of the locations of Posts made from within it. If you forced all posts made in Times Sq to have the same lat/long you would be losing information about their precise location.
And losing information is another of those bad things that we are not supposed to do with databases (unless, of course, there is a good reason for it).
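One way to picture this, as a rough sketch rather than a definitive design: each post keeps its own lat/lon and also points at a Place through a foreign key, so "London, UK" is stored once. SQLite (via Python's sqlite3) stands in for whatever database you use; the `place_id`, `lat` and `lon` columns on posts, the sample rows, and the helper names `distance_km` and `posts_near` are assumptions made for illustration.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE places (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE people (
    id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT,
    place_id INTEGER REFERENCES places(id));       -- where the person lives
CREATE TABLE posts (
    id INTEGER PRIMARY KEY, title TEXT, datetime TEXT, content TEXT,
    author_id INTEGER NOT NULL REFERENCES people(id),
    place_id  INTEGER NOT NULL REFERENCES places(id),
    lat REAL NOT NULL, lon REAL NOT NULL);         -- exact spot of this post
""")
conn.execute("INSERT INTO places (id, name) VALUES (1, 'London, UK')")
conn.execute("INSERT INTO people VALUES (1, 'Ann', 'Lee', 1)")
conn.executemany(
    "INSERT INTO posts (title, datetime, content, author_id, place_id, lat, lon) "
    "VALUES (?, ?, 'text', 1, 1, ?, ?)",
    [("Big Ben", "2024-01-01T10:00", 51.5007, -0.1246),
     ("Tower Bridge", "2024-01-02T11:00", 51.5055, -0.0754)],
)

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    earth_radius_km = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def posts_near(lat, lon, radius_km):
    """Titles of posts within radius_km of the given point (naive full scan)."""
    rows = conn.execute("SELECT title, lat, lon FROM posts ORDER BY id").fetchall()
    return [t for t, plat, plon in rows if distance_km(lat, lon, plat, plon) <= radius_km]

# All posts tied to one Place, regardless of their exact coordinates.
posts_in_london = [title for (title,) in conn.execute(
    "SELECT p.title FROM posts p JOIN places pl ON pl.id = p.place_id "
    "WHERE pl.name = ? ORDER BY p.id", ("London, UK",))]
```

The full scan in `posts_near` is fine for a small table; a bounding-box filter in SQL or a spatial index would be the next step if it ever became slow.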
I'm an intern student at a company that does both wiring and aircon services. The job that they gave me was to make a database for them. I don't have any experience in anything related to databases.
So I started looking up videos and other material to learn at least a bit about databases, and after 1.5 months of learning I made something that works.
In the database that I created,
I have 1 table (CustomerDetailsT):
CustomerID (pk)
CustomerName
PhoneNumber
Address
Aircond (type and model of ac,ex: WM daikin 1.0HP)
AcDetails (what has been done for the ac.)
Others (yes/no) (Wiring, installing a fan and so on)
WhatHasBeenDone (shows what has been done for others)
Then 3 queries (CustomerOthersDetailsQ, CustomerAcDetailsQ, CustomerDetailsQ). CustomerAcDetailsQ has CustomerName, PhoneNumber, Address, Aircond and AcDetails. CustomerOthersDetailsQ has CustomerName, PhoneNumber, Address, Others, and WhatHasBeenDone. CustomerDetailsQ has CustomerID, CustomerName, PhoneNumber and Address.
And 1 form with 3 subforms.
It's a search form that searches for customers as we type in their name or phone number and shows what has been done for the customer.
With this, I have created what the company wants, but now they want to add dates. Dates which would show when we have done something for a customer. Dates for Aircond and the Others stuff.
I've tried with what I know and it didn't work. I tried searching YouTube and Google, but still couldn't find an answer.
How can I go about doing this? I have tried having separate tables for each service, but it became a hassle when I wanted to create a new customer. I hope I can get some help; I can send pictures if someone needs them.
[1]: https://i.stack.imgur.com/mtrmC.png [The Customer search form]
[2]: https://i.stack.imgur.com/A3Y9d.png [example of a customer that has ac installation]
[3]: https://i.stack.imgur.com/dsGL5.png [example of a customer that has both ac and wiring done]
Acknowledging that the question is too broad, here is some guidance. One of the nice things about Access is that each database is a single file. First protect your work by finding that file and making two copies: a backup and a play-around version. Only mess with the play-around version.
Your question indicates you are still learning table normalization and one-to-many relationships. Both of these topics are general to all databases, so you don't have to restrict yourself to just Access when looking for guides and YouTube videos.
Part of normalization is putting separate entities into their own tables. Also, in Access there is a big payoff for using the Relationship Tool, so here is a rather lame example of normalization:
Make sure to select the checkboxes when setting up relationships.
WhatHasBeenDone should also have WhatHasBeenDoneDate. I've wrapped AC and Other as Unit because later it will be easier than having two WhatHasBeenDone tables (one for AC, one for Other).
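For readers without the diagram, here is a minimal sketch of what such a normalized layout might look like. SQLite via Python stands in for Access purely so the example is runnable; the exact table split, the `Details` column and the sample rows are assumptions, while `UnitTypeID`, `UnitDescription` and `WhatHasBeenDoneDate` are the names used in this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Customers (
    CustomerID   INTEGER PRIMARY KEY,
    CustomerName TEXT NOT NULL,
    PhoneNumber  TEXT,
    Address      TEXT);
CREATE TABLE UnitTypes (                 -- AC models and 'other' services alike
    UnitTypeID      INTEGER PRIMARY KEY,
    UnitDescription TEXT NOT NULL);
CREATE TABLE WhatHasBeenDone (           -- one row per job, so one date per job
    WhatHasBeenDoneID   INTEGER PRIMARY KEY,
    CustomerID          INTEGER NOT NULL REFERENCES Customers(CustomerID),
    UnitTypeID          INTEGER NOT NULL REFERENCES UnitTypes(UnitTypeID),
    Details             TEXT NOT NULL,
    WhatHasBeenDoneDate TEXT NOT NULL);
""")
conn.execute(
    "INSERT INTO Customers VALUES (1, 'Example Customer', '012-3456789', '1 Example Road')")
conn.executemany("INSERT INTO UnitTypes VALUES (?, ?)",
                 [(1, "WM Daikin 1.0HP"), (2, "Wiring")])
conn.executemany(
    "INSERT INTO WhatHasBeenDone (CustomerID, UnitTypeID, Details, WhatHasBeenDoneDate) "
    "VALUES (?, ?, ?, ?)",
    [(1, 1, "Installed", "2024-03-01"),
     (1, 1, "Serviced", "2024-09-15"),
     (1, 2, "Installed a fan", "2024-09-15")])

# Everything done for customers whose name starts with the search text, by date.
history = conn.execute("""
    SELECT u.UnitDescription, w.Details, w.WhatHasBeenDoneDate
    FROM WhatHasBeenDone w
    JOIN Customers c ON c.CustomerID = w.CustomerID
    JOIN UnitTypes u ON u.UnitTypeID = w.UnitTypeID
    WHERE c.CustomerName LIKE ?
    ORDER BY w.WhatHasBeenDoneDate, w.WhatHasBeenDoneID
""", ("Example%",)).fetchall()
```

Because each job is its own row, adding a date is just one more column, and a customer can have any number of dated jobs without changing the Customers table.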
Now imagine someone taking the customer request call. They just want to see a form to enter the customer details, request, unit-type, etc. They don't want to see those tables. Even with training entering data in the tables is error prone. The person fulfilling the request just wants to enter what they did and when. That's how you start to figure out what your final Data entry forms will look like.
Since we normalized the tables and used the relationships tool, the payoff is Access can give us an assortment of working starter forms. Select Each Table and then hit Create and then hit Form. Choose your Favorites and start playing around from there. While playing, keep in mind that Access will not let you add an item on the many side of a relationship unless there is an item on the 1 side.
For example I selected the customers table and hit create form:
Access uses a concept of form and subform based on separate but related tables. So, to get a form that shows what has been done for each customer I created a form for the What has been done table, and dragged it onto the customers form:
Unless an ID is also being used as a part number or something similar, there is probably no reason for the person entering data to see it. So I removed the textboxes bound to IDs, except for UnitTypeID, where I replaced the textbox with a combobox that displays the user-friendly UnitDescription. The IDs are still part of the form record sources; Access is still adding new IDs and using those IDs to put the appropriate data in the right tables.
Oh, didn't we need dates? I went back and added a date to the table and adjusted the subform accordingly. I also changed the subform format from single record to continuous records to show multiple dates:
In conclusion, and in my opinion, your final forms will use VBA behind the scenes to insert data from the forms into the tables. This is because either you will want to rapidly insert multiple records, or how the end users think about the data will not match the default form-and-subform approach that Access depends upon to figure out how to insert the data. However, the default approach is fast and I always use it for version 1 of my Access databases.
P.S. For simplicity I avoided including any Many to Many relationships
For a project, I have a database with some tables.
All of them are related between them. The table organization has a relationship with offer and user, etc.
However, I have some columns that only serve to link two tables together.
Take a look at the diagram. I use users_interests to link the user to his interests. Same thing for badges and group.
It doesn't really feel efficient. When I try to get the interests of a user, I must first request that user's rows in users_interests and then, from those, request the interest details.
Is there a way to request the interests of a user without having to go through a second table?
https://dbdiagram.io/d/5da3f119ff5115114db53551
There are a couple of things you should consider when looking at this question and at what you're going to be doing with the data.
Do you have a unique set of interests that will remain the same or be added to, without duplicates?
Do you need to have multiple interests for a single user?
If you answered "yes" to both of these questions, then you should leave things the way they are. You'll be able to reuse the same interests entry for multiple users as well as each user being able to have multiple interests.
If you only want one interest per user, but want to keep a "master list" (aka: lookup table) of interests, then you can move the interest_id to the users table.
If you don't care about duplicates, but you still want each user to have multiple interests, move the name column to the users_interests table.
Finally, if you don't care about duplicates and you want each user to have a single interest, move the name column to the users table.
FYI, how you currently have the tables structured will likely take up less disk space for a large number of users.
It's unclear what you're trying to do, so I included all options. Each option has its own set of pros and cons, and each will be required by some project or another at some point. You might want to learn the differences and the reasoning behind why you would use one option over another now, so you don't have to think too hard about it later.
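If you keep the current structure, the "second table" does not have to mean a second request: a single JOIN walks through the link table for you. A minimal sketch, using SQLite through Python as a stand-in; the column names and sample rows are assumptions, since only the table name users_interests comes from the question, and `interests_of` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE interests (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE users_interests (           -- link table: one row per (user, interest)
    user_id     INTEGER NOT NULL REFERENCES users(id),
    interest_id INTEGER NOT NULL REFERENCES interests(id),
    PRIMARY KEY (user_id, interest_id));
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO interests VALUES (1, 'chess'), (2, 'cycling');
INSERT INTO users_interests VALUES (1, 1), (1, 2), (2, 2);
""")

def interests_of(user_id):
    """Return a user's interest names with one query through the link table."""
    rows = conn.execute("""
        SELECT i.name
        FROM interests i
        JOIN users_interests ui ON ui.interest_id = i.id
        WHERE ui.user_id = ?
        ORDER BY i.name""", (user_id,))
    return [name for (name,) in rows]
```

Here 'cycling' is stored once even though two users share it, which is the point of keeping the link table.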
You can do it if you make the interests table like this:
id
user_id
name
but then you will lose the uniqueness of your interests: you will have many rows with the same name only because they connect to different users. As far as I understand it, your database is perfectly OK as it is now. It is as it should be.
I am working on a project (based in Django although that's not really relevant to my question) and I am struggling to work out the best way to represent the data models.
I have the four following models:
User,
Client,
Meeting,
Location
User and Client have a many-to-many relationship through the Meeting model. The Meeting model has a one-to-one relationship with the Location model.
Meetings will take place at either:
The address defined in the User (or UserProfile) model
The address defined in the Client model.
Some other location which has to be defined at a later date.
I'm struggling to work out the best way to store the Location data in order to make it as clean and reusable as possible.
I considered making Location as a field in the Meetings model rather than a model in its own right - although this could also lead to redundant data if lots of Meetings are created at the same location, so this is probably a non-starter.
I could automatically create Location records for each User and Client that gets created and use a generic relationship between the relevant records, however, I understand that this can lead to inefficient database performance. Also, not every Client / User would be able to hold meetings at their Location.
Can anyone see a tidier alternative?
Any advice appreciated.
Thanks.
> I considered making Location as a field in the Meetings model rather than a model in its own right - although this could also lead to redundant data if lots of Meetings are created at the same location, so this is probably a non-starter.
No, that's a really good thought, because it points you straight at the real problem.
The real problem is that there's a difference between a meeting and the parties that attend a meeting. A meeting has some attributes that have nothing to do with the attendees: it has at the very least a time and a place.
So I think you should change your thinking about the Meeting model.
Instead of users having a M:N relationship with clients through the Meeting model, they should have a M:N relationship through, say, an Attendance model. (A Registration or Reservation or MightAttend model might be more appropriate for you.) And the Meeting model should change to reflect the unique attributes of a real-world meeting: time and place.
I would expect Meetings and Locations to have a many-to-one relationship. Can't a location be used for more than one meeting? (at different times, of course)
It seems to me that a location has attributes that persist beyond its use for a single meeting. Example: seating capacity.
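To make the suggested shape concrete, here is a rough sketch in plain SQL (SQLite through Python, not Django models, so it runs on its own). The table and column names, including `starts_at` and `seating_capacity`, are assumptions for illustration: a meeting owns its time and place, many meetings can share one location, and an attendance table links users and clients to a meeting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE locations (
    id INTEGER PRIMARY KEY,
    address TEXT NOT NULL,
    seating_capacity INTEGER);           -- outlives any single meeting
CREATE TABLE users (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    location_id INTEGER REFERENCES locations(id));   -- NULL: cannot host
CREATE TABLE clients (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    location_id INTEGER REFERENCES locations(id));   -- NULL: cannot host
CREATE TABLE meetings (
    id INTEGER PRIMARY KEY,
    starts_at TEXT NOT NULL,
    location_id INTEGER REFERENCES locations(id));   -- NULL: decided later
CREATE TABLE attendance (
    meeting_id INTEGER NOT NULL REFERENCES meetings(id),
    user_id    INTEGER NOT NULL REFERENCES users(id),
    client_id  INTEGER NOT NULL REFERENCES clients(id),
    PRIMARY KEY (meeting_id, user_id, client_id));
INSERT INTO locations VALUES (1, '1 Client Street', 8);
INSERT INTO users VALUES (1, 'Uma', NULL);
INSERT INTO clients VALUES (1, 'Acme', 1);
INSERT INTO meetings VALUES
    (1, '2024-05-01T09:00', 1),
    (2, '2024-05-08T09:00', 1),
    (3, '2024-05-15T09:00', NULL);
INSERT INTO attendance VALUES (1, 1, 1), (2, 1, 1), (3, 1, 1);
""")

# Many meetings, one stored location row.
(meetings_at_location,) = conn.execute(
    "SELECT COUNT(*) FROM meetings WHERE location_id = 1").fetchone()

# Meetings whose place has not been defined yet.
undecided = [mid for (mid,) in conn.execute(
    "SELECT id FROM meetings WHERE location_id IS NULL ORDER BY id")]
```

A meeting held at the client's address simply reuses the client's location row, so no address is copied into the meeting itself.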
I have a form that reveals user IDs to the public. I was wondering whether this is dangerous. Personally I do not see anything bad about it. The ID is just used to reference a single database record.
If it were dangerous, Stack Overflow wouldn't be displaying user IDs in their URLs in order to make user profile lookups work: https://stackoverflow.com/users/104826/rfactor
Edit of seriousness of immense levels: if user IDs are themselves sensitive data; for example your primary keys for some reason happen to be social security numbers, that'll definitely be a security and privacy liability. If your user IDs are just auto-increment numbers though, you're clear.
Generally it's not a problem but it can give away hints on how active your site is, like how many users you have etc. If you consider this sensitive information or maybe even good marketing is completely up to you.
There's a story that this was one of the reasons the Germans lost WW2. They had sequential serial numbers from production written on each tank. By collecting serial numbers from tanks taken out of action, the British could estimate how many tanks the whole German army had and adjust their strategy accordingly.
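The same trick works on auto-increment user IDs. A small sketch of the usual estimator for this "German tank problem": with k observed IDs whose maximum is m, the total is estimated as m + m/k - 1. This assumes IDs start at 1, have no gaps, and the observed IDs are a uniform sample without replacement; the function name `estimate_total` and the sample numbers are made up for illustration.

```python
def estimate_total(observed_ids):
    """Estimate how many sequential IDs exist from a handful of observed ones."""
    k = len(observed_ids)
    if k == 0:
        raise ValueError("need at least one observed ID")
    m = max(observed_ids)
    # Largest ID seen, plus the average gap between observed IDs.
    return m + m / k - 1

estimate = estimate_total([19, 40, 42, 60])  # 60 + 60/4 - 1 = 74.0
```

So anyone who sees a few of your user IDs can make a reasonable guess at your total user count.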
I have found that exposing primary keys that identify physical entities can create headaches.
Imagine if two blood samples come into a laboratory and test results are generated for each sample. Many different kinds of test might be done and each record representing a test result will have the sample_id as a foreign key.
If you share the database ID with the customer and you discover that two samples were accidentally switched, you will have to update the foreign keys in all the detail records representing the tests. If you instead exposed some other unique name outside your system, you will just need to switch the two unique names on the sample records in the master table.
There are other advantages related to data migration and there are advantages when entities are represented in more than one database in which it is difficult to create records with identical database ID's.
In my experience it is always best to expose a unique identifier other than the primary key outside your system. It gives you more flexibility in resolving data mix-ups, dealing with data migration issues, and in otherwise future-proofing your system.
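A minimal sketch of that idea, using SQLite through Python: the integer primary key stays internal, and a separate unique `public_code` is what customers see. The table layout, the use of `uuid4` for the public code, and the helper names are assumptions for illustration.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE samples (
    id          INTEGER PRIMARY KEY,     -- internal only, never shown
    public_code TEXT NOT NULL UNIQUE);   -- what the customer sees
CREATE TABLE test_results (
    id        INTEGER PRIMARY KEY,
    sample_id INTEGER NOT NULL REFERENCES samples(id),
    test_name TEXT NOT NULL,
    value     REAL NOT NULL);
""")
code_a, code_b = uuid.uuid4().hex, uuid.uuid4().hex
conn.executemany("INSERT INTO samples VALUES (?, ?)", [(1, code_a), (2, code_b)])
conn.executemany(
    "INSERT INTO test_results (sample_id, test_name, value) VALUES (?, ?, ?)",
    [(1, "glucose", 5.1), (1, "iron", 14.0), (2, "glucose", 7.9)])

def swap_public_codes(id_1, id_2):
    """Fix a mix-up by swapping two public codes; detail rows are untouched."""
    with conn:  # one transaction: commit on success, roll back on error
        (c1,) = conn.execute(
            "SELECT public_code FROM samples WHERE id = ?", (id_1,)).fetchone()
        (c2,) = conn.execute(
            "SELECT public_code FROM samples WHERE id = ?", (id_2,)).fetchone()
        # Go through a temporary value so the UNIQUE constraint is never violated.
        conn.execute("UPDATE samples SET public_code = ? WHERE id = ?", ("tmp-" + c1, id_1))
        conn.execute("UPDATE samples SET public_code = ? WHERE id = ?", (c1, id_2))
        conn.execute("UPDATE samples SET public_code = ? WHERE id = ?", (c2, id_1))

def results_for(public_code):
    """Look up test results by the public code rather than the primary key."""
    return conn.execute("""
        SELECT r.test_name, r.value
        FROM test_results r JOIN samples s ON s.id = r.sample_id
        WHERE s.public_code = ? ORDER BY r.id""", (public_code,)).fetchall()
```

Calling `swap_public_codes(1, 2)` changes two rows in samples, however many test_results rows hang off each sample.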
As for me, showing an ID is as dangerous as showing a user name.
Exposing a user ID is not, in and of itself, bad. It depends on the level of privacy and security needed. If the user ID does not expose and cannot be tied to any other personal data that should otherwise be private, it may not be a problem.
But don't think that public user IDs can never be a problem.
Make sure you don't allow anyone to break in to any private data just by knowing user IDs. Facebook has had problems like that. Here's just one example. While revealing user IDs wasn't the whole story, it was part of the equation.
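In code terms, the ID should only ever select a record, never authorize access to it. A toy sketch of that separation follows; the `User` class, the in-memory `USERS` store and both function names are hypothetical, and `requester_id` is assumed to come from an authenticated session rather than from the request itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class User:
    id: int
    display_name: str  # fine to show publicly
    email: str         # private

# Stand-in for a real user table.
USERS = {
    1: User(1, "alice", "alice@example.com"),
    2: User(2, "bob", "bob@example.com"),
}

def public_profile(target_id):
    """Anyone who knows the ID may see this much and no more."""
    user = USERS[target_id]
    return {"id": user.id, "display_name": user.display_name}

def private_profile(requester_id, target_id):
    """Private fields are returned only to the account's owner."""
    if requester_id != target_id:
        raise PermissionError("not allowed to read another user's private data")
    user = USERS[target_id]
    return {"id": user.id, "display_name": user.display_name, "email": user.email}
```

Knowing that user 2 exists gets a caller `public_profile(2)` at most; `private_profile(1, 2)` raises `PermissionError`.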
Will it hurt anything? Only you can decide that, and you should think that through.
But in general, it is poor form to display the User ID without having a business reason to do so. (Saves you work is probably not a good business reason.)
If it is a generated database id with no other meaning, it's not dangerous. Though I don't think revealing an id is elegant either. It's a technical detail and I can't understand why you would like to show it to users.
I am wondering when and when not to pull a data structure into a separate database table when it appears in several tables.
I have pulled the 12 attribute address structure into a separate table because I have a couple of different entities containing a single address in this format.
But how about my 3 attribute person name structure (given, middle, surname)?
Should this be put into its own table referenced with a foreign key for all the entities containing a name... e.g. the company table has a contact person name, the citizen table has a person name etc.
Are these best left as attributes in the main tables or should they be extracted?
I would usually keep the address on the Person table, unless there was an unusual need for absolutely uniform addresses on each entity, or if an entity could have an arbitrary number of addresses, or if addresses need to be shared between entities, or if it was a large enterprise product where I know I have to invest in infrastructure all over the place or I will end up gutting everything down the road.
Having your addresses in a separate table is interesting because it's flexible, but in the context of a small project lacking a special need like the ones mentioned above, it's probably a slight waste. Always be aware of the balance between complexity and flexibility. Flexibility is important, but be discriminating... It's easy to invest way too much there!
In concrete terms, the times that I experimented with (for instance) one-to-one relationships for things like addresses, I ended up refactoring them back into the table because it introduced a bunch of headaches including more complex queries, dealing with situations where the address does not exist, etc. More entities also increase your cognitive load -- they make the project harder to think about. In my case, it was an unnecessary cost because there was no concrete need and, in truth, not even a gain in flexibility.
So, based on my experiences, I would "try" to keep the addresses in the same table, and I would definitely keep the names on them - again, unless there was a special need.
So to paraphrase Einstein, make it as simple as possible and no simpler. But in the short term, experiment. It's the best way to learn these lessons.
It's about not repeating information, so you don't want to store the same information in two places when one will do.
Another useful rule of thumb is one entity per table. If you find that one table contains, say, "person" AND "order" then you probably should split those into two tables.
And (putting myself at risk of repeating information...) you might find it helpful to review some database design basics; there are plenty of related questions here on Stack Overflow.
Start with these...
What is normalisation?
What is important to keep in mind when designing a database
How many fields is 'too many'?
More tables or more columns?
Creating a person entity across your data model will give you these present and future advantages:
The same person occurring as a contact, or individual in different contexts. Saves redundancy.
Info can be maintained and kept current with far-less effort.
Easier to search for a person and identify them - i.e. is it the same John Smith?
You can expand the information - i.e. maintain addresses for this person far more easily.
Programming will be more consistent and debugging will be easier as well.
Moves you closer to a 'self-documenting' system.
As a counterpoint to the other (entirely valid) replies: within your application's current structure, how likely will it be for a given individual (not just name, the actual "person" -- multiple people could be "John Smith") to appear in more than one table? The less likely this is to happen, the less likely you are to get benefits from normalization.
Another way to think of it is in terms of entities. Outside of labels (names), is there any overlap between a "customer" entity and an "employee" entity?
Extract them. Your aim should be to have no repeating data in your database.
Read about Normalization
It really depends on the problem you are trying to solve. In general it is probably a good idea to have some sort of 'person' table which holds details of people. However, there are occasions where that is potentially a very bad idea.
One example would be if you are holding details of prescriptions written out to people by a doctor. In some countries it is a legal requirement that the prescription details are held with the name under which they were prescribed, NOT the name the person is currently going by. For instance, a woman might be prescribed a drug as Miss X, but then she gets married and becomes Mrs Y. If you had a person table that was linked to the prescriptions table you would now have the wrong details and would possibly face legal consequences. In that case you would probably need to copy the relevant details of the person into the prescription table, even though this would be duplicating data.
So again - it depends on the problem you are trying to solve. Don't just blindly follow what people consider to be best practices. Understand your data and any issues surrounding it, then try to follow best practices that fit.
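The prescription case can be sketched as follows, with SQLite through Python standing in for any database. The name is deliberately copied into the prescription row at the time of prescribing, so a later change in the person table does not rewrite history. Table and column names and the `prescribe` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons (
    id INTEGER PRIMARY KEY, title TEXT, surname TEXT NOT NULL);
CREATE TABLE prescriptions (
    id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES persons(id),
    drug TEXT NOT NULL,
    prescribed_title   TEXT,             -- snapshot, intentionally duplicated
    prescribed_surname TEXT NOT NULL);
INSERT INTO persons VALUES (1, 'Miss', 'X');
""")

def prescribe(person_id, drug):
    """Record a prescription, copying the person's current name into the row."""
    conn.execute("""
        INSERT INTO prescriptions (person_id, drug, prescribed_title, prescribed_surname)
        SELECT id, ?, title, surname FROM persons WHERE id = ?""",
        (drug, person_id))

prescribe(1, "example drug")

# The person later changes her name; the prescription must not change with it.
conn.execute("UPDATE persons SET title = 'Mrs', surname = 'Y' WHERE id = 1")

on_record = conn.execute(
    "SELECT prescribed_title, prescribed_surname FROM prescriptions "
    "WHERE person_id = 1").fetchone()
current = conn.execute("SELECT title, surname FROM persons WHERE id = 1").fetchone()
```

After the update, `on_record` still reads Miss X while `current` reads Mrs Y, which is the behaviour the legal requirement asks for.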
Depends on what you're using the database for.
If you want fast queries on your tables you should de-normalize your tables. Having to run multiple JOINs will take longer and make your queries more complex.
On the other hand if your intention is to have a flexible storage database which is not meant to be hit with a ton of fast-response queries, then normalizing the tables by splitting them out into multiple xref'ed tables will provide more flexibility in your design and reduce the need for submitting duplicated data.
Since de-normalization is "optimization", I would suggest you normalize the tables first, index them properly and see if you're getting any bottlenecks on your queries. If so, flatten the affected tables where needed.
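As a small illustration of "normalize, index, then measure", here is a sketch using SQLite through Python. The person and orders tables, the index name and the `plan` helper are assumptions; `EXPLAIN QUERY PLAN` is SQLite's way of showing how a query would run, and other databases have their own equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, surname TEXT NOT NULL);
CREATE TABLE orders (
    id        INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(id),
    total     REAL NOT NULL);
""")

QUERY = "SELECT id, total FROM orders WHERE person_id = ?"

def plan(sql, params=()):
    """Return SQLite's query plan as one string (the last column is the detail text)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

plan_before = plan(QUERY, (1,))   # no index yet: a scan of orders
conn.execute("CREATE INDEX idx_orders_person ON orders(person_id)")
plan_after = plan(QUERY, (1,))    # now a search using idx_orders_person
```

Only if the plan and real timings still show a bottleneck after indexing would flattening the tables be worth considering.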
You should really consider your whole database structure and do a ER diagram (entity relationship diagram) first. OF COURSE there should be another table called "Person" where the concept of a person is stored...