What is a good way to model cities, regions and postcodes?

I am thinking about the "best" way to model cities, regions and postcodes for use in different countries. (The application should be usable in different languages.)
My first guess is:
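class City {
    String name

    // The map-style belongsTo already defines the `region` property (of type
    // Region), so a separate `String region` field would conflict with it.
    static hasMany = [postcodes: PostCode]
    static belongsTo = [region: Region]

    static constraints = {
        name size: 2..100
        region nullable: true
    }
}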
Now the questions are:
Can one city belong to more than one region? (States/Bundesländer/Communities/...)
Since all of this is final (no new cities, regions or postcodes), I thought about not using the database and putting it into enums instead!?
As I want to separate data for different countries, I have to add the country to a city. I took a look at the multiTenant-plugin, but I am not quite sure whether that is too much!?
Probably someone has dealt with this before and can share some insights.
Thanks a lot
Sebastian

Can one city belong to more than one region? (States/Bundesländer/Communities/...)
That depends on you. You haven't explicitly stated what a region is. A region could be the suburbs of x city or Western Europe depending on how you define it. If a region isn't defined by state boundaries, then it's possible that a city can belong to more than one region. For example, "Paris" could be contained in the "France" region as well as the "Western Europe" region.
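If a city really can belong to several regions, one common way to model that is a join table. The following is only a minimal sketch using Python's built-in sqlite3 module; the table and column names (city, region, city_region) are assumptions for illustration and do not come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE region (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    -- Join table (hypothetical name): one row per (city, region) membership.
    CREATE TABLE city_region (
        city_id   INTEGER NOT NULL REFERENCES city(id),
        region_id INTEGER NOT NULL REFERENCES region(id),
        PRIMARY KEY (city_id, region_id)
    );
    INSERT INTO city   VALUES (1, 'Paris');
    INSERT INTO region VALUES (1, 'France'), (2, 'Western Europe');
    INSERT INTO city_region VALUES (1, 1), (1, 2);
""")

def regions_of(city_name):
    """Return the names of all regions a city belongs to, sorted by name."""
    rows = conn.execute(
        """SELECT r.name
           FROM region r
           JOIN city_region cr ON cr.region_id = r.id
           JOIN city c ON c.id = cr.city_id
           WHERE c.name = ?
           ORDER BY r.name""",
        (city_name,),
    )
    return [name for (name,) in rows]

paris_regions = regions_of("Paris")
print(paris_regions)
```

If regions are restricted to state boundaries instead, the join table collapses back into a single foreign key on the city.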
A simple solution would be to limit regions to state boundaries, but that might not be a good solution depending on what you need the data for. You could create an array of strings to store your regions, but it's difficult to pick an alternative if you don't say what the data is needed for. Based off of what you've written, I think a separate "String country" would be appropriate since you can then sort by country, and your regions can remain as they are.
Since all of this is final (no new cities, regions or postcodes), I thought about not using the database and putting it into enums instead!?
You could do that, but it's much easier to manage in a DB.
As I want to separate data for different countries, I have to add the country to a city. I took a look at the multiTenant-plugin, but I am not quite sure whether that is too much!?
I haven't used the plugin, so I can't help you there, but if you just want to separate data by country, then adding a country field and sorting would be pretty straightforward.
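To make the "country field" suggestion concrete, here is a rough sketch with Python's sqlite3 module. The column name country_code, the use of ISO 3166-1 alpha-2 codes, the index, and the sample rows are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (
        id           INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        region       TEXT,               -- nullable, as in the question
        country_code TEXT NOT NULL       -- assumed: ISO 3166-1 alpha-2 code
    );
    -- An index keeps per-country lookups cheap as the table grows.
    CREATE INDEX idx_city_country ON city(country_code);
""")
conn.executemany(
    "INSERT INTO city (name, region, country_code) VALUES (?, ?, ?)",
    [
        ("Munich", "Bavaria", "DE"),
        ("Paris", "Île-de-France", "FR"),
        ("Berlin", "Berlin", "DE"),
        ("Monaco", None, "MC"),          # a city stored without a region
    ],
)

def cities_in(country_code):
    """Return the city names for one country, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM city WHERE country_code = ? ORDER BY name",
        (country_code,),
    )
    return [name for (name,) in rows]

german_cities = cities_in("DE")
print(german_cities)
```

Filtering on one column like this is much lighter than a multi-tenancy plugin, at the cost of every query having to remember the country filter.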

Related

How to efficiently save and retrieve City, Region, Country

I need to save and lookup information regarding the Place of my database entities. Place can be either a city, region or country.
I am looking for the most efficient solution to save and retrieve this information, in terms of both speed and development time.
I identified two viable options, but both have significant drawbacks.
DIY approach: populate (and maintain) three tables, one for world cities, one for world regions and one for world countries. Cities would have a foreign key reference to regions, and regions to countries. If a country does not have regions, I would simply link the city straight to the country. The problem with this approach is that the cities and regions tables would be massive and simply impractical to maintain. Also, I haven't found a good resource to populate the data in the first place.
Rely on a third-party API: another approach I thought about was to rely on the Google Places API. I could use the API to search for cities, regions and countries only, and as a result I get JSON in this format:
{
  ...
  "geometry": {
    ...
  },
  "icon": "https://icons/geocode-71.png",
  "id": "3a4638f694c001b0aa4f710e9b7925bf914e4f56",
  "name": "Arco"
}
As an option, I thought I could save the Google place ID in my database. This would make it very easy to store the information and retrieve all related data when needed. But what if Google decides to change its IDs?
Are there any other options I am missing? Have you faced a similar issue? Are there workarounds to the outlined problems?
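For reference, the DIY option could be sketched roughly as follows with Python's sqlite3 module. All table and column names are assumptions, and the sample rows are only illustrative. The CHECK constraint is one possible way to express "a city links either to a region or straight to a country"; it is not something stated in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE country (code TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE region (
        id           INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        country_code TEXT NOT NULL REFERENCES country(code)
    );
    CREATE TABLE city (
        id           INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        region_id    INTEGER REFERENCES region(id),
        country_code TEXT REFERENCES country(code),
        -- Exactly one parent: a region, or (for region-less countries) a country.
        CHECK ((region_id IS NULL) <> (country_code IS NULL))
    );
    INSERT INTO country VALUES ('IT', 'Italy'), ('MC', 'Monaco');
    INSERT INTO region  VALUES (1, 'Trentino', 'IT');
    INSERT INTO city (name, region_id, country_code) VALUES
        ('Arco', 1, NULL), ('Monaco', NULL, 'MC');
""")

def country_of(city_name):
    """Resolve a city's country via its region, or via the direct link."""
    row = conn.execute(
        """SELECT co.name
           FROM city c
           LEFT JOIN region r ON r.id = c.region_id
           JOIN country co ON co.code = COALESCE(c.country_code, r.country_code)
           WHERE c.name = ?""",
        (city_name,),
    ).fetchone()
    return row[0] if row else None

print(country_of("Arco"), country_of("Monaco"))
```

A nullable column for an external identifier (such as a Google place ID) could be added to each table later without changing this structure.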

Why would the specifications for this database use an aggregation instead of attributes on an entity?

I'm trying to better understand designing a database schema. After reviewing the solution for a problem that I'm working on, I don't understand why the solution chooses to use an aggregation for the attributes "address" and "phone number" for a given "musician". Here are the specifications, I'm only interested in bullet point 1:
Each musician that records at Notown has an SSN, a name, an address, and a phone number. Poorly paid musicians often share the same address, and no address has more than one phone.
Each instrument used in songs recorded at Notown has a name (e.g., guitar, synthesizer, flute) and a musical key (e.g., C, B-flat, E-flat).
Each album recorded on the Notown label has a title, a copyright date, a format (e.g., CD or MC), and an album identifier.
Each song recorded at Notown has a title and an author.
Each musician may play several instruments, and a given instrument may be played by several musicians.
Each album has a number of songs on it, but no song may appear on more than one album.
Each song is performed by one or more musicians, and a musician may perform a number of songs.
Each album has exactly one musician who acts as its producer. A musician may produce several albums, of course.
Here is a solution that I found:
The ER Diagram I created looks almost exactly the same, except for the fact that I made "address" and "phone number" attributes of "musician" instead of giving each of them an entity set of their own, creating a relationship, and turning it into an aggregation. I don't understand why this would be done in this situation. Can anyone explain?? Thank you!
I'm not able to see the image you linked to, but anyway...
no address has more than one phone
This means we should make the phone number an attribute of the address - unless we want to allow for multiple phones per address in the future.
So it would not be completely wrong to make phones a table. But then, we know little about the future. Would multiple musicians share the same address and the same phones? (Then the phone numbers would be linked to an address.) Or would multiple musicians share the same address while each has their own phone? (Then the phone number would be linked to a musician. A phone table linked to musicians would only be necessary if a musician could have multiple phone numbers; otherwise we would still not make a phone table, but rather make the phone an attribute of the musician.)
poorly paid musicians often share the same address
This means we make the address a table of its own. Thus there is only one row to change if the phone number or some other attribute changes. If we made the address a musician's attribute instead, we'd store the address redundantly and could get inconsistent data (e.g. same address, but different phone numbers).
A possible data model:
address (address_id, street, city, phone, ...)
musician (musician_id, ssn, name, address_id, ...)
This is a 1:n relation. A musician has one address; an address can belong to multiple musicians.
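As a rough illustration, the two tables above could be declared like this using Python's sqlite3 module. The column types, the NOT NULL and UNIQUE constraints, and the sample rows are assumptions; only the table and column names come from the model above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE address (
        address_id INTEGER PRIMARY KEY,
        street     TEXT NOT NULL,
        city       TEXT NOT NULL,
        phone      TEXT            -- at most one phone per address: a plain column
    );
    CREATE TABLE musician (
        musician_id INTEGER PRIMARY KEY,
        ssn         TEXT NOT NULL UNIQUE,
        name        TEXT NOT NULL,
        address_id  INTEGER NOT NULL REFERENCES address(address_id)
    );
    -- Hypothetical sample data: two musicians sharing one address.
    INSERT INTO address  VALUES (1, '12 Hypothetical St', 'Notown', '555-0100');
    INSERT INTO musician VALUES (1, '000-00-0001', 'Ann', 1),
                                (2, '000-00-0002', 'Bob', 1);
""")

# The n side of the 1:n relation: everyone living at address 1.
housemates = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM musician WHERE address_id = ? ORDER BY name", (1,)
    )
]
print(housemates)
```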
The primary purpose of database normalization is to make it more difficult for anomalous data to get into the database. Reading the first bullet point, we see that each address may have zero or one phone numbers associated with it. In other words, the phone number is an attribute of/identified by the address. Which normalization level does this violate?
To illustrate how not normalizing the address fields (including phone number) increases the chances of anomalous data, let's say you have four musicians staying at that address. This means you have four rows where the address data exists. Suppose the phone number changes. You have to make sure you change all four copies of the data. I said there were four musicians, but suppose there are actually five and I just missed one? Or suppose you found only three when you went to make the change? An address may have at most one phone number, yet now you have several copies of the same address with different phone numbers. This is anomalous data.
If this data is normalized, you have only one copy to change. Since this data is referenced by all the musicians who live there, no matter how many, the change is "propagated" to all of them. The integrity of the data is maintained.
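The anomaly is easy to reproduce. The sketch below uses Python's sqlite3 module with a deliberately denormalized table; the table name musician_flat and all rows are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Denormalized on purpose: address and phone are repeated on every row.
    CREATE TABLE musician_flat (
        name    TEXT PRIMARY KEY,
        address TEXT NOT NULL,
        phone   TEXT
    );
    INSERT INTO musician_flat VALUES
        ('Ann', '12 Hypothetical St', '555-0100'),
        ('Bob', '12 Hypothetical St', '555-0100'),
        ('Cy',  '12 Hypothetical St', '555-0100');
""")

# The phone number changes, but the update only reaches two of the three copies.
conn.execute(
    "UPDATE musician_flat SET phone = '555-0199' WHERE name IN ('Ann', 'Bob')"
)

# Addresses that now violate "no address has more than one phone".
conflicting = conn.execute("""
    SELECT address, COUNT(DISTINCT phone)
    FROM musician_flat
    GROUP BY address
    HAVING COUNT(DISTINCT phone) > 1
""").fetchall()
print(conflicting)
```

With a separate address table there is a single phone value per address, so this query could never return a row.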

Reusing a database table for many other entities? Is this possible?

Say for example, I have an ADDRESS table, that will store similar attributes of other entities like address, city, zip, country, etc. The entities are USER, COMPANY, BANK, BRANCH, etc. I would like to use this one table ADDRESS to store the addresses of the other entities rather than creating other tables for each entity to store the ADDRESS like so, USER_ADDRESS, COMPANY_ADDRESS, BANK_ADDRESS, BRANCH_ADDRESS.
Is this possible? Am I breaking any laws or conventions? What are the consequences, if any?
Each entity (USER, COMPANY, etc.) should contain a reference to an entry in the ADDRESS table.
There are a few issues:
If 2 users have the same address, they should reference the same address id.
You will need to normalise addresses so that you're not duplicating information (e.g. in many countries the zip largely determines the city and country, so those need not be repeated on every row).
Of course, you may not want a well-normalised database. Saving the entire address as a string will improve read performance by reducing the number of join operations.
A lot of things depend on the exact use of the database.
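A minimal sketch of that layout, using Python's sqlite3 module, might look like this. The names app_user, line and address_id_for are hypothetical, and matching addresses by exact string equality is a simplifying assumption; real address matching is considerably messier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE address (
        id           INTEGER PRIMARY KEY,
        line         TEXT NOT NULL,
        zip          TEXT NOT NULL,
        country_code TEXT NOT NULL,
        UNIQUE (line, zip, country_code)   -- one row per distinct address
    );
    CREATE TABLE app_user (
        id INTEGER PRIMARY KEY, name TEXT NOT NULL,
        address_id INTEGER REFERENCES address(id)
    );
    CREATE TABLE company (
        id INTEGER PRIMARY KEY, name TEXT NOT NULL,
        address_id INTEGER REFERENCES address(id)
    );
""")

def address_id_for(line, zip_code, country_code):
    """Return the id of the matching address row, creating it if needed."""
    key = (line, zip_code, country_code)
    conn.execute(
        "INSERT OR IGNORE INTO address (line, zip, country_code) VALUES (?, ?, ?)",
        key,
    )
    row = conn.execute(
        "SELECT id FROM address WHERE line = ? AND zip = ? AND country_code = ?",
        key,
    ).fetchone()
    return row[0]

# Two users (or a user and a company) at the same address share one row.
first = address_id_for("1 Example Road", "00000", "US")
second = address_id_for("1 Example Road", "00000", "US")
conn.execute("INSERT INTO app_user (name, address_id) VALUES ('Alice', ?)", (first,))
conn.execute("INSERT INTO company (name, address_id) VALUES ('Acme', ?)", (second,))
address_count = conn.execute("SELECT COUNT(*) FROM address").fetchone()[0]
print(first == second, address_count)
```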
It is fine to use a single ADDRESS table for that purpose and have an ADDRESS_ID in each of the other entities. Depends on the use case and the way you prefer to implement it. I most probably wouldn't do it. I also wouldn't do the other solution you're suggesting (an address table per entity).
So, let's say you want to implement a function to search for all the addresses, where it doesn't matter what type of entity is connected to it. You will have to search the ADDRESS table. If you get results, then you have to search the other four tables to see which record is connected to that address.
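That reverse lookup can be written as a single UNION ALL query rather than separate searches. This is only a sketch with Python's sqlite3 module; the table names, the sample rows and the omission of BRANCH are assumptions made to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE address  (id INTEGER PRIMARY KEY, city TEXT NOT NULL);
    CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT,
                           address_id INTEGER REFERENCES address(id));
    CREATE TABLE company  (id INTEGER PRIMARY KEY, name TEXT,
                           address_id INTEGER REFERENCES address(id));
    CREATE TABLE bank     (id INTEGER PRIMARY KEY, name TEXT,
                           address_id INTEGER REFERENCES address(id));
    -- Hypothetical sample data.
    INSERT INTO address  VALUES (14, 'Springfield');
    INSERT INTO app_user VALUES (17, 'Alice', 14);
    INSERT INTO bank     VALUES (3, 'First Hypothetical Bank', 14);
""")

# One branch per entity table; the literal in the first column tells the
# caller which table each row came from.
OWNERS_SQL = """
    SELECT 'USER' AS entity_type, id, name FROM app_user WHERE address_id = :aid
    UNION ALL
    SELECT 'COMPANY', id, name FROM company WHERE address_id = :aid
    UNION ALL
    SELECT 'BANK', id, name FROM bank WHERE address_id = :aid
    ORDER BY entity_type, id
"""
owners = conn.execute(OWNERS_SQL, {"aid": 14}).fetchall()
print(owners)
```

Here the entity type is derived from the table each row lives in, so it cannot drift out of sync the way a stored type column can.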
You could add a field ENTITY_TYPE in the ADDRESS table where you specify which type of entity it is connected to, so you don't have to search the four tables, but I don't recommend this since you can have consistency errors (USER 17 points to ADDRESS 14, but ADDRESS 14 has ENTITY_TYPE = BANK).
Now, with your other solution (having four separate tables to store the addresses of the four different entities) you're just going to have to search those four tables and then search the corresponding entity table to get the entity you're looking for.
My solution in most cases is adding the address fields to the entity tables themselves. Having ADDRESS, ZIP_CODE and COUNTRY_CODE (always use proper country codes, not country names; see https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes) will make it simple. When you present a list of items (users, banks, companies, offices, whatever), it's really common to show the name and the address at the same time in a table. Having no JOINs makes it faster and easier to process. If you want to update an address, it's on the table itself. No lookups!
Of course, like most things in programming, it depends on what your needs are.
Also, please, don't try to split the ADDRESS into more fields. I've seen ADDRESS_TYPE (street, road, avenue, square, ...), STREET_NAME, STREET_NUMBER, BLOCK_NUMBER, BLOCK_FLOOR, BLOCK_LETTER. I'm pretty sure you're never going to need something like SELECT * FROM USER WHERE STREET_NUMBER = 74.

Separate tables for name and email addresses?

I've been fighting my initial decision since I began work on my database. I'm debating whether or not I need a separate table for email addresses. My database looks like this:
people(id, first_name, last_name, email)
addresses(id, address, street, city, state, zip, latitude, longitude)
addresses_people(id, person_id, address_id)
phone_numbers(id, person_id, phone_number, type)
I figured I didn't need a separate table for email addresses, as I only wanted one per person even if they have more. The problem I seem to have, though, is that some people will NOT have email addresses. Very often I will be storing children in the people table. Now it seems like it would be better design to put the email addresses in a separate table to avoid the thousands of empty email fields I'll have.
It's a huge hassle to change this now as the app is already in production somewhat, but changing it now as opposed to a year or two from now would be exponentially easier. Is it worth a day or two to change the emails to another table?
In my opinion you are over-engineering. An optional email field is fine. Actually, having a separate table might introduce much bigger overhead.
The only reason for a separate table is to model a 1-N relationship, if you expect a user to have more than one e-mail.
I believe there is no need to change it. Having some fields in a row be empty is not a big deal most of the time. In fact, it is common in most databases. The design of the database should depend on the objects you are trying to model primarily and not be concerned with separating things into tables simply for storage reasons unless you have serious storage constraints or other extenuating circumstances.
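For what it's worth, an optional column is cheap to work with. Below is a small sketch using Python's sqlite3 module, based on the people table from the question. The UNIQUE constraint on email is an added assumption, included to show that SQLite allows any number of NULLs in a UNIQUE column; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE people (
        id         INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL,
        email      TEXT UNIQUE    -- nullable; NULLs do not collide under UNIQUE
    )
""")
conn.executemany(
    "INSERT INTO people (first_name, last_name, email) VALUES (?, ?, ?)",
    [
        ("Pat", "Example", "pat@example.com"),
        ("Kid", "Example", None),   # children without an email address
        ("Tot", "Example", None),
    ],
)

# Missing emails are simply NULL and can be filtered either way.
without_email = conn.execute(
    "SELECT COUNT(*) FROM people WHERE email IS NULL"
).fetchone()[0]
reachable = [
    first_name
    for (first_name,) in conn.execute(
        "SELECT first_name FROM people WHERE email IS NOT NULL ORDER BY first_name"
    )
]
print(without_email, reachable)
```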
See "Database design - empty fields", which also sheds a lot of light on your question.

Data constraints in US state/county/city/zip database?

I have a database of US zip codes and their corresponding states, cities and counties. It was supplied as a flat file and I'm trying to normalize the data and figure out exactly which entities depend on which others.
One problem I've come across is that some cities seem to exist in more than one county. I was under the impression that in the US, there is a hierarchy of State -> County -> City -> Zip.
However, this data seems to show otherwise for some cities:
Is my data set incorrect or is this actually a feature of US geography?
I am working on this same topic. I have learned that Virginia has cities that are not within a county. Such a city functions as both a city and a county but is not within any county boundary. Also, Alaska has no counties. The equivalent there is the borough, but the whole state is not divided into boroughs. Any area not within a borough is referred to as the "unorganized borough".
No, there isn't a clean hierarchy like that.
You're also liable to find cities that straddle state borders (cities in two states), and ZIP codes that take in more than one city. Not long ago, there were ZIP codes that straddled state borders, too. (ZIP codes are more about the route followed to deliver mail than about geography.) There might still be some.
As far as I know, no county is split between two states. But if there happened to be one, it wouldn't surprise me.
Depending on your application, you might discover even weirder things. I used to have to deal with addresses in the mountains that were "in" one county geographically, but were "in" a second county for emergency services (fire, police), and "in" yet a third county for non-emergency services (water, sewer, garbage collection). It depended on where the address was in relation to mountain ridges and roads.
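Given all that, one way to normalize the flat file is to treat city-to-county and ZIP-to-city as many-to-many links instead of a strict hierarchy. The sketch below uses Python's sqlite3 module; every table name, place name and code in it is invented for illustration, and it ignores cities that straddle state borders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE county (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                         state TEXT NOT NULL);
    CREATE TABLE city   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- A city may sit in zero, one or several counties.
    CREATE TABLE city_county (
        city_id   INTEGER NOT NULL REFERENCES city(id),
        county_id INTEGER NOT NULL REFERENCES county(id),
        PRIMARY KEY (city_id, county_id)
    );
    -- A ZIP code may cover more than one city, and a city many ZIP codes.
    CREATE TABLE zip_city (
        zip     TEXT NOT NULL,
        city_id INTEGER NOT NULL REFERENCES city(id),
        PRIMARY KEY (zip, city_id)
    );
    -- Invented sample data: Rivertown spans two counties, Freeport has none.
    INSERT INTO county VALUES (1, 'Ash County', 'ZZ'), (2, 'Birch County', 'ZZ');
    INSERT INTO city   VALUES (1, 'Rivertown'), (2, 'Freeport');
    INSERT INTO city_county VALUES (1, 1), (1, 2);
    INSERT INTO zip_city VALUES ('00001', 1), ('00001', 2);
""")

# Number of counties per city; the LEFT JOIN keeps county-less cities at 0.
county_counts = dict(conn.execute("""
    SELECT c.name, COUNT(cc.county_id)
    FROM city c
    LEFT JOIN city_county cc ON cc.city_id = c.id
    GROUP BY c.id, c.name
"""))

cities_for_zip = [
    name
    for (name,) in conn.execute(
        """SELECT c.name FROM zip_city z
           JOIN city c ON c.id = z.city_id
           WHERE z.zip = ? ORDER BY c.name""",
        ("00001",),
    )
]
print(county_counts, cities_for_zip)
```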
