Storing phone numbers country code - database

I split the phone number in 2 parts:
country prefix (e.g. +49)
phone number without leading 0
My question is: which is the best approach to store the country code, as it is (+49) or as a foreign key to a countries table?

You should use the normal forms for databases:
https://en.wikipedia.org/wiki/Database_normalization
They give rules for dealing with exactly this kind of problem.

The choice depends on:
Number of records
The database used
Number of relationships with other tables
Since the country code would be a repeating value, it could be stored as it is in a varchar column, e.g. +91-9654637268. This allows different phone number formats, but the database will not check that the entered value is a number, so you will need to validate it at code level. A varchar should be the first choice for storing phone numbers with their country codes, as it is faster because it avoids joins.
But if you need a good amount of manipulation, use a bigint, which stores the number as e.g. 9764536377443, where the first two digits are the country code and the rest of the digits are the phone number part. (Country codes are one to three digits long, so the split point is not the same for every number.)
Or you could have a separate column for the country code, which would add an unnecessary join but could be helpful if the country code is needed in several places and must be well validated and constrained, although that could also be achieved with either of the techniques above.
Hope it is helpful.
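The code-level validation mentioned above could look roughly like the sketch below. It assumes that numbers should follow the E.164 shape (a leading +, then at most 15 digits, the first of which is not 0); that rule, the pattern name and the function name are assumptions for illustration and do not come from the answer.

```python
import re

# Assumption: E.164-style numbers, "+" followed by 2 to 15 digits, no leading zero.
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")


def normalize_phone(raw: str) -> str:
    """Strip common separators, then check the shape of what is left."""
    cleaned = re.sub(r"[\s\-().]", "", raw)
    if not E164_PATTERN.fullmatch(cleaned):
        raise ValueError(f"not a valid international number: {raw!r}")
    return cleaned
```

Storing only the normalized string in the varchar column keeps the format consistent while the database itself stays unaware of the rule.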

Transactional Database
If this is a transactional database (lots of updating), or a general purpose database (querying and updating), then use database normalisation as Jonathan says. So have a table called Country with the structure
| ID | CountryCode | CountryName |
| 1 | +49 | Germany |
| 2 | +1 | USA |
This way you can keep the country code and the information describing it away from the data about the telephone number. So if a country changes its name or country code, rather than having to update each affected row in the telephone number table, you just update the one row in the Country table.
Then have one or more tables for the rest of the telephone number (depending on whether you want to split up the area code etc.) with a column that references the Country ID as a foreign key
| ID | CountryCodeID | TelNumber |
| 1 | 1 | 12345 |
However, bear in mind that this is a general purpose way of doing things. In query-heavy situations with larger amounts of data (data mart, data warehouse) a different approach is best; see star schemas.
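The Country/telephone split described above can be sketched with an in-memory SQLite database. This is only an illustration: the snake_case table and column names and the two helper functions are assumptions, and SQLite stands in for whatever database is actually used.

```python
import sqlite3


def build_phone_db() -> sqlite3.Connection:
    """Create the two tables from the answer and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript("""
        CREATE TABLE country (
            id           INTEGER PRIMARY KEY,
            country_code TEXT NOT NULL,
            country_name TEXT NOT NULL
        );
        CREATE TABLE telephone (
            id         INTEGER PRIMARY KEY,
            country_id INTEGER NOT NULL REFERENCES country(id),
            tel_number TEXT NOT NULL
        );
        INSERT INTO country VALUES (1, '+49', 'Germany'), (2, '+1', 'USA');
        INSERT INTO telephone VALUES (1, 1, '12345');
    """)
    return conn


def full_number(conn: sqlite3.Connection, telephone_id: int):
    """Join the two tables to rebuild the complete number."""
    row = conn.execute(
        """SELECT c.country_code || t.tel_number
           FROM telephone AS t
           JOIN country AS c ON c.id = t.country_id
           WHERE t.id = ?""",
        (telephone_id,),
    ).fetchone()
    return row[0] if row else None
```

With this layout a change to a country code is a single-row UPDATE on country, and the foreign key rejects telephone rows that point at a country that does not exist.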


How do i prevent the duplicate id from the imported table in MariaDB?

(First of all, I apologize for my bad English.)
I have a case like this:
I am currently having trouble with my web application. I made a web application for a certain company, using CodeIgniter 3.
I built the database using MariaDB. Every table in my application database uses an auto-increment id. I usually deploy the web app to a cloud server (sometimes the company has its own dedicated server, but sometimes it doesn't). One day, a company did not want to deploy the app to the cloud (for security purposes, they said).
This company wanted to deploy the app on each employee's PC in the office, with the PCs not connected to each other (i.e. stand-alone PCs or laptops). They said that every 5 months they would collect all of the data from the employees' computers into the company's data center, and of course the data center is not connected to the internet. I told them that this is not a good way to store their data, because the ids will be duplicated when I try to merge all of the data into one database, since the id column of every table is an auto-increment primary key. Unfortunately, the company still wants to keep the app that way, and I don't know how to solve this.
They have at least 10 employees who would use this web app, so I have to deploy the app to each of the 10 PCs.
Additional info: each employee has their own unique id from the company, and I also made an auto-increment id for each employee, as in the table below:
| id | employee_id | employee_name |
| 1 | 156901010 | emp1 |
| 2 | 156901039 | emp2 |
| 3 | 156901019 | emp3 |
| 4 | 156901015 | emp4 |
| 5 | 156901009 | emp5 |
| 6 | 156901038 | emp6 |
The problem is that whenever they fill in the form in the application, some of the tables do not store the employee's id but a new id that comes from the auto-increment.
For example, take the electronic_parts table. It has the columns below:
| id | electronic_part_name | kind_of_electronic_part_id |
If emp1 fills in the form in the web app, the table's content would look like this:
| id | electronic_part_name | kind_of_electronic_part_id |
| 1 | switch | 1 |
And if emp2 fills in the form in the web app, the table's content would look like this:
| id | electronic_part_name | kind_of_electronic_part_id |
| 1 | duct tape | 10 |
When I tried to merge the contents of the tables in the data center, it fell apart because of the duplicate ids.
It gets worse when I think about the foreign keys in other tables, for example the customer_order table.
The customer_order table looks like this (just a sample, not the actual table, but similar). The cashier column holds the employee's auto-increment id, not the id that the employee got from the company as described above.
| id | customer_name | electronic_parts_id | cashier |
| 1 | Henry | 1 | 10 |
| 2 | Julie | 2 | 9 |
Does anyone know how to solve this problem, or can someone recommend a good way to solve it?
NOTE: Each employee has their own database for the app, so the database is not centralized. It's a stand-alone database, which means I have to install the database on the employees' PCs one by one.
This is an unconventional situation, so you can use an unconventional solution.
I can suggest two methods to solve this issue.
Instead of using auto-increment for the primary key, generate a UUID and use it as the primary key. Regarding the probability of duplicates in random UUIDs: only after generating 1 billion UUIDs every second for the next 100 years would the probability of creating just one duplicate be about 50%.
In CodeIgniter you could do this with the following code snippet.
$this->db->set('id', 'UUID()', FALSE);
This generates a 36-character key (32 hexadecimal digits plus 4 dashes).
ac689561-f7c9-4f7e-be94-33c6c0fb0672
As you can see, the string contains dashes. The CodeIgniter DB function will insert it into the database with the dashes, and it will still work. If that does not look clean, you could remove the dashes and convert the string to a 32-character key.
You can use the following function with the help of the [CodeIgniter UUID library][1].
function uuid_key() {
    $this->load->library('uuid');
    // Generate a v4 UUID
    $id = $this->uuid->v4();
    // Strip the dashes to get a 32-character key
    $id = str_replace('-', '', $id);
    // Let CodeIgniter escape and quote the string value
    $this->db->set('id', $id);
}
Now we have a 32-character key,
ac689561f7c94f7ebe9433c6c0fb0672
An alternative unconventional method is to add a function that logs all INSERT, UPDATE and DELETE queries processed by the site to a local file. This way, each local installation generates a log file with the actual list of queries that modified the DB over time, in the right sequential order. At any point in time, the state of the database is the result of all the queries that happened up to that date.
So every 5 months, when you are ready to collect data from the employees' personal computers, take this query log file instead of a data dump. (Note: such a query log won't contain auto-increment ids, as they are created only at the moment a query is executed against a database.)
Use these files to import the data into your data center. This will not conflict, as the auto-increment ids will be generated in your data center at import time. (Hopefully you will not have to link your local databases to the data center at any point in the future.)
[1]: https://github.com/Repox/codeigniter-uuid
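The UUID idea above can be tried out without CodeIgniter. The sketch below uses Python's uuid module and in-memory SQLite databases as stand-ins for the stand-alone PCs and the data center; the helper names are hypothetical and only one simplified table is modelled.

```python
import sqlite3
import uuid


def new_key() -> str:
    """Random (version 4) UUID as 32 hex characters, without dashes."""
    return uuid.uuid4().hex


def make_local_db(part_names):
    """One stand-alone database, as installed on a single employee's PC."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE electronic_parts ("
        " id TEXT PRIMARY KEY,"
        " electronic_part_name TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO electronic_parts VALUES (?, ?)",
        [(new_key(), name) for name in part_names],
    )
    conn.commit()
    return conn


def merge_into(central, local) -> int:
    """Copy every row, keys included; the primary key rejects any clash."""
    rows = local.execute(
        "SELECT id, electronic_part_name FROM electronic_parts"
    ).fetchall()
    central.executemany("INSERT INTO electronic_parts VALUES (?, ?)", rows)
    central.commit()
    return len(rows)
```

Because the keys are generated independently on each machine, rows from any number of PCs can be copied into the central table as they are, and foreign keys that point at them stay valid.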
Is that id used in any other tables? It would probably be involved in a JOIN. If so, you have a big problem of unraveling the ids.
If the id is not used anywhere else, then the values are irrelevant, and the rows can be renumbered. This would be done (roughly speaking) by loading the data from the various sources into the same table, but not include the id in the load.
Or, if there is some other column (or combination of columns) that is UNIQUE, then make that the PRIMARY KEY and get rid of id.
Which case applies? We can pursue in more detail. Please provide SHOW CREATE TABLE for any table(s) that are relevant.
In my first case (where id is used as a FK elsewhere), do something like this:
While inserting the rows into the table with id, increment the values by enough to avoid colliding with the existing ids. Then do (in the same transaction):
UPDATE the_other_table SET fk_id = fk_id + same_increment.
Repeat for each other table and each id, as needed.
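The renumbering described above can be sketched as follows, with SQLite standing in for MariaDB. As a small variation, this version adds the offset while copying the rows instead of running a separate UPDATE afterwards; the effect on ids and foreign keys is the same. The two-table schema is a cut-down version of the one in the question and the function names are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE electronic_parts (
    id INTEGER PRIMARY KEY,
    electronic_part_name TEXT NOT NULL
);
CREATE TABLE customer_order (
    id INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL,
    electronic_parts_id INTEGER NOT NULL REFERENCES electronic_parts(id)
);
"""


def make_db(parts, orders):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO electronic_parts VALUES (?, ?)", parts)
    conn.executemany("INSERT INTO customer_order VALUES (?, ?, ?)", orders)
    conn.commit()
    return conn


def merge_with_offset(central, local) -> int:
    """Copy local rows into central, shifting each id and FK by a fixed offset."""
    part_offset = central.execute(
        "SELECT COALESCE(MAX(id), 0) FROM electronic_parts").fetchone()[0]
    order_offset = central.execute(
        "SELECT COALESCE(MAX(id), 0) FROM customer_order").fetchone()[0]
    with central:  # one transaction: commit on success, roll back on error
        for pid, name in local.execute(
                "SELECT id, electronic_part_name FROM electronic_parts"):
            central.execute("INSERT INTO electronic_parts VALUES (?, ?)",
                            (pid + part_offset, name))
        for oid, customer, pid in local.execute(
                "SELECT id, customer_name, electronic_parts_id FROM customer_order"):
            central.execute("INSERT INTO customer_order VALUES (?, ?, ?)",
                            (oid + order_offset, customer, pid + part_offset))
    return part_offset
```

Every table that references a renumbered id has to be shifted by the same offset as the table it points to, which is why the order rows use part_offset for their foreign key.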
I think your problem comes from your database: you didn't design it well.
It's a bug if two different users have the same id.
If you had made the id field unique in your database, two employees wouldn't have the same id, so your problem is in your table design.
Just define your id field like this and your problem will be solved.
CREATE TABLE [YOUR TABLE NAME](
[ID] int NOT NULL IDENTITY(1,1) PRIMARY KEY,
....
Is the id required to be an integer? If not, maybe you can use a prefix on the id so that the input from each employee will be unique overall. That means you have to give up the auto-increment and just do a count on the table data (assuming you're not deleting any of the records).
You may need to write some code in PHP to handle this. If the other tables already follow a unique/primary key, that is fine.
You can also do it after import, like this:
Find duplicates in the same table in MySQL

Drawback of 3rd Normal Form Databases

I was asked this question in an interview.
What is the drawback of using 3rd Normal form in databases?
I know its main advantages which are
1. Duplication is reduced
2. Data integrity
Is there any Drawback of using 3rd Normal form?
Third normal form is violated when a non-key field is a fact about another non-key field, as in
| EMPLOYEE | DEPARTMENT | LOCATION |
The EMPLOYEE field is the key. If each department is located in one place, then the LOCATION field is a fact about the DEPARTMENT -- in addition to being a fact about the EMPLOYEE. The problems with this design are the same as those caused by violations of second normal form:
To satisfy third normal form, the record shown above should be decomposed into the two records:
| EMPLOYEE | DEPARTMENT |
| DEPARTMENT | LOCATION |
So the answer to your question is:
In the unnormalized form, the application searches one record type. With the normalized design, the application has to search two record types and connect the appropriate pairs. So there is some possible performance cost for certain retrieval applications.
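The difference between the two retrievals can be seen in a small SQLite sketch. The table names, sample rows and helper functions are made up for illustration; only the EMPLOYEE, DEPARTMENT and LOCATION fields come from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Unnormalized: the location is repeated for every employee of a department.
    CREATE TABLE employee_flat (
        employee TEXT PRIMARY KEY, department TEXT, location TEXT);
    INSERT INTO employee_flat VALUES
        ('alice', 'sales', 'Berlin'), ('bob', 'sales', 'Berlin');

    -- Third normal form: the location is stored once per department.
    CREATE TABLE employee (employee TEXT PRIMARY KEY, department TEXT);
    CREATE TABLE department (department TEXT PRIMARY KEY, location TEXT);
    INSERT INTO employee VALUES ('alice', 'sales'), ('bob', 'sales');
    INSERT INTO department VALUES ('sales', 'Berlin');
""")


def location_flat(employee):
    """One record type: a single lookup."""
    row = conn.execute(
        "SELECT location FROM employee_flat WHERE employee = ?",
        (employee,)).fetchone()
    return row[0] if row else None


def location_3nf(employee):
    """Two record types: the rows have to be paired up with a join."""
    row = conn.execute(
        """SELECT d.location
           FROM employee AS e
           JOIN department AS d ON d.department = e.department
           WHERE e.employee = ?""",
        (employee,)).fetchone()
    return row[0] if row else None
```

Both functions return the same answer; the normalized version pays for the join on reads, while a department move is a single-row update instead of one update per employee.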

Database structure and relationships

I'm building this gaming portal and I have some database concerns. Currently I have about 10 tables, but I think they will be more than 20 when I'm finished programming. Anyway, I want to create some sort of relationships between the different tables (somewhat like WordPress). That table will hold any relation that one row from table A has to a row in table B. And what I came up with is the following:
table relationships
| rs_id | rs_type | rs_alpha | rs_beta |
rs_id -> just an id
rs_type -> the type of relation
rs_alpha -> related table #1 and row id
rs_beta -> related table #2 and row id
examples:
| 1 | cover | games:153 | images:318 |
| 2 | tag | news:183 | tags:18 |
| 3 | group_admin | users:918 | group:75 |
...
This might just do it, but here come my concerns:
1. This table is going to grow so fast that in no time there might be over 100,000 rows which will slow the load time.
2. To extract info I'll have to explode every call which might slow down the load time.
3. I might divide table name from id (rs_alpha, rs_beta), yet that might also slow down the load time.
Thank you and I'm open to any other solutions that might be better than this one :)
If you have time you can download my db structure from here to see what it looks like:
demirevdesign.com/public/pcanvil.sql.gz
(The addon_ tables will become the relationships table)
As far as I understand, the relationship type itself defines the tables involved, so there is no need to store table names.
Also, if you refactor your schema and add a common parent table for all entities that might be involved in a relationship, you won't need to care about the table name at all; you just store the id from that new table.
Finally, a relationship always has a start date and may have an end date. I'd suggest adding these attributes to the relationships table.
As to performance, it's hard to answer without seeing how you are going to query the table. I guess that, in general, partitioning by the relationship type column will be beneficial.
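A minimal sketch of the first suggestion, assuming the relationship type alone tells the application which two tables are involved, so only integer ids are stored. The column names alpha_id and beta_id, the index and the helper function are assumptions; an index is used here in place of partitioning, which SQLite does not offer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE relationships (
        rs_id    INTEGER PRIMARY KEY,
        rs_type  TEXT    NOT NULL,  -- e.g. 'cover' implies games -> images
        alpha_id INTEGER NOT NULL,  -- row id in the first table implied by rs_type
        beta_id  INTEGER NOT NULL   -- row id in the second table implied by rs_type
    );
    -- Lookups by type and left-hand id can use this index instead of a scan.
    CREATE INDEX idx_rel_lookup ON relationships (rs_type, alpha_id);
    INSERT INTO relationships (rs_type, alpha_id, beta_id) VALUES
        ('cover', 153, 318),
        ('tag', 183, 18),
        ('group_admin', 918, 75);
""")


def related_ids(rs_type, alpha_id):
    """Ids on the right-hand side of a relation; no string splitting needed."""
    rows = conn.execute(
        "SELECT beta_id FROM relationships "
        "WHERE rs_type = ? AND alpha_id = ? ORDER BY beta_id",
        (rs_type, alpha_id))
    return [row[0] for row in rows]
```

Keeping the ids as plain integers removes the need to explode values such as games:153 on every call.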

Alternative approach to database design: "verticality"

I would like to ask someone, who has experiences in database design. This is my idea, and I can't assess deep consequences of such approach to, let's say, common problem. I appreciate your comments in advance...
Imagine:
- patients in hospital
- each patient should have:
1. personal data - Name, Surname, Street, SecurityID, contact, and many more (which could be changed over time)
2. personal records - a heap of various forms (also changing over time)
Typically I would design a table for the patient's personal data:
personaldata_tbl
| ID | SecurityID | Name | Surname ... | TimeOfEntry
and similar tables for each form in program. This could be very hard task, because it could reach several hundreds of such tables. In addition to it, probably their count will be increasingly growing.
And yes, all of them should be relationally connected for example:
releaseform_tbl
| ID | personaldata_tbl_ID | DateOfRelease | CauseOfRelease ... | TimeOfEntry
My intention is to revert 2D tables to single 1D table - all data about patients would be stored in one table! Other tables will describe (referentially) what kind of data is stored in the main table. Look at this:
data_info_tbl
| ID | Description |
| 1 | Name |
| 2 | Surname |
patient_data_tbl
| ID | patient_ID | data_info_ID | form_ID | TimeOfEntry | Value
| 1 | 121 | 1 | 7 | 17.12.2011 14:34 | John
| 2 | 121 | 2 | 7 | 17.12.2011 14:34 | Smith
The main reason, why this approach attracts me is:
- simplicity
- ability to store any data with appropriate specification and precision
- no table jungle
Cons:
- SQL querying could be problematic in some cases
- there should be a reliable algorithm to delete, update and insert data (one way is to dynamically create a table, perform operations on it, and finally store it)
- data-aware controls can't be used.
So what would you say?
Thanks for your time and answers.
The most obvious problems . . .
You lose control of size. The "Value" column must be big enough to hold the largest type you use, which in the general case must be a blob. (X-ray images, in a hospital database.)
You lose data types. PostgreSQL, for example, includes the data types "point", bit string, internet address, cidr address, MAC address, and UUID. Storing all values in a column of a single type means you lose all the type-safety built into the specific data types.
You lose constraints. Some integers need to be constrained to between 1 and 10, others between 1000 and 3000. Some text strings need to be all numbers (ZIP codes), some need to be a particular mix of alpha and numerics (tire sizes).
You lose scalability. If there are 1000 attributes in a person's medical records, each person's data will take 1000 rows in the table. If you have 100,000 patients--an easily manageable number even in Microsoft Access and SQLite--your table suddenly balloons from a manageable 100,000 rows to 100,000,000 rows. Any query that does a table scan will have to scan 100 million rows, every time. Any single query that needs to return, say, 30 attributes will need 30 joins.
What you're proposing is the EAV anti-pattern. (Starts on slide 30.)
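The join cost mentioned above shows up even with the two sample rows from the question. In this SQLite sketch (a trimmed version of the question's tables, with a hypothetical helper name) reading a patient's name and surname as one row already needs a self-join, and each further attribute would add another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data_info_tbl (ID INTEGER PRIMARY KEY, Description TEXT NOT NULL);
    CREATE TABLE patient_data_tbl (
        ID INTEGER PRIMARY KEY,
        patient_ID INTEGER,
        data_info_ID INTEGER REFERENCES data_info_tbl(ID),
        Value TEXT  -- one untyped column for every kind of value
    );
    INSERT INTO data_info_tbl VALUES (1, 'Name'), (2, 'Surname');
    INSERT INTO patient_data_tbl VALUES
        (1, 121, 1, 'John'),
        (2, 121, 2, 'Smith');
""")


def name_and_surname(patient_id):
    """Two attributes on one row: the table is joined to itself once."""
    return conn.execute(
        """SELECT n.Value, s.Value
           FROM patient_data_tbl AS n
           JOIN patient_data_tbl AS s
             ON s.patient_ID = n.patient_ID AND s.data_info_ID = 2
           WHERE n.patient_ID = ? AND n.data_info_ID = 1""",
        (patient_id,)).fetchone()
```

In a conventional personaldata_tbl the same result is a plain SELECT of two columns from one row.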
I disagree with Bert Evans (in the sense that I don't find this terribly valid).
First of all it's not clear to me what problem you are trying to solve. The three "benefits" you list:
simplicity
ability to store any data with appropriate specification and precision
no table jungle
don't really make a lot of sense if the application is small, and if it isn't (like the hospital records you mention in your example), any possible "gain" is lost once you factor in that this will make any sort of query very inefficient, and that human operators trying to design reports or data extractions, or to extend the DB, will have to put in a lot of extra effort.
Example: I suppose your hospital patient has an address and therefore a ZIP code... have you considered what hoops you will have to jump through to create a foreign key to the ZIP code/state table?
Another example: as soon as you realize that the patient may have a middle name, and that on the form it will be placed between the first and last name, what will you do? Renumber all the last name fields? Or place the middle name at the bottom of the pile, so that your form needs special logic to show it in the "correct" position?
You may want to check some alternatives to SQL DBs, for example XML-based data stores, or even MUMPS, but I really can't see any benefit in the approach you are proposing. (And please consider that I have seen an over-zealous DBA try to do something very similar when designing a web application backed by an Oracle DB: every field, label and image on the web page had just a numeric reference to a sequence-based ID record in the DB, making the whole web app a nightmare to maintain. So I am not just being a "purist" here.)

relational VS parametrized Data modeling when building semantic web applications?

Here is a summary of my question; then I'll describe it in more detail:
I read about using the parametrized data modeling method instead of standard relational data modeling when building a semantic web application. I think we'll lose 90% of normalization if we use this method. If I want to design the database of my semantic web application, should I use this approach? What is the practical value?
In more detail:
I've read a lot of articles about this. In the book "Programming the Semantic Web" by Toby Segaran, Colin Evans, and Jamie Taylor, on page 14, the authors tell us to use parametrized data modeling to get semantic relationships instead of the standard relational database, as described by this example:
in the standard Relational Database :
Venue : [ ID(PK), Name, Address ]
Restaurant : [ ID(PK), VenueID(FK), CuisineID]
Bar : [ ID(PK), VenueID(FK), DJ?, Specialty ]
Hours : [ VenueID(FK), Day, Open, Close ]
For Semantic Relationships : One table only !!! Fully parameterized venues
Properties : [ VenueID,Field, Value ]
Example:
| VenueID | Field | Value |
| 1 | Cuisine | Deli |
| 1 | Price | $ |
| 1 | Name | Deli Llama |
| 1 | Address | Peachtree Rd |
| 2 | Cuisine | Chinese |
| 2 | Price | $$$ |
| 2 | Specialty Cocktail | Scorpion Bowl |
| 2 | DJ? | No |
| 2 | Name | Peking Inn |
| 2 | Address | Lake St |
| 3 | Live Music? | Yes |
| 3 | Music Genre | Jazz |
| 3 | Name | Thai Tanic |
| 3 | Address | Branch Dr |
Then the authors say:
Now each datum is described alongside the property that defines it. In doing this, we’ve
taken the semantic relationships that previously were inferred from the table and column
and made them data in the table. This is the essence of semantic data modeling:
flexible schemas where the relationships are described by the data itself.
If I want to design the database of my semantic web application, should I use this approach? What is the practical value?
What you lose in immediate clarity, you gain in flexibility. Notice that with the more parametrized approach you gain the ability to easily add fields without altering any tables. This allows you to give different fields to different venues as it suits your application. By association, this also makes it easy for you or for future maintainers (if you intend to release it) to extend your web application down the road.
Just be careful when it comes to performance. Don't adopt a fully parametrized design when it is easier to use a standard relational design. Let's say, for a moment, that you have two different users tables, one relational and the other parametrized:
Table: users_relational
+---------+----------+------------------+----------+
| user_id | username | email | password |
+---------+----------+------------------+----------+
| 1 | Sam | sam@example.com | ******** |
| 2 | John | john@example.com | ******** |
| 3 | Jane | jane@example.com | ******** |
+---------+----------+------------------+----------+
Table: users_parametrized
+---------+----------+------------------+
| user_id | field | value |
+---------+----------+------------------+
| 1 | username | Sam |
| 1 | email | sam@example.com |
| 1 | password | ******** |
| 2 | username | John |
| 2 | email | john@example.com |
| 2 | password | ******** |
| 3 | username | Jane |
| 3 | email | jane@example.com |
| 3 | password | ******** |
+---------+----------+------------------+
Now you want to select a single user. With the relational table, you select only one row, while the parametrized version selects as many rows as there are fields associated with that user, in this case 3.
The next issue is searchability (at times). Say you have the same users table from the example above, but instead of knowing the user ID, you only know the username. You may need two queries, one to find the user ID and the other to get the data associated with the user.
The last con comes up when selecting only a few fields at a time. Taking the users tables example again, we can limit the number of fields easily with the relational one:
SELECT username, email FROM users_relational WHERE user_id = 2
We should get a single result with two columns.
Now, for the parametrized table:
SELECT field, value FROM users_parametrized WHERE user_id = 2 AND field IN('username','email')
It's a little more verbose and will become less readable than the first one, especially as you add more and more fields to select.
Additionally, the parametrized version will be slower for a few reasons. It now has to do text comparisons on the varchar in the field column, instead of a single lookup on the numerically indexed user_id. With the first query, the database knows when to stop looking for the record because you're selecting by a primary key. In the parametrized version, you are not selecting by a primary key, so you will take a performance hit because your database must look through all the records.
This leads me to the final real difference (as far as your DBMS sees it). There is no primary key in the parametrized table, which (as you saw above) can be a performance issue, especially if you already have a considerable number of records. For something like a users table, where you can have thousands of records, your record count would be that number times 3 in this case alone (as we have three non-user_id fields). That's a lot of data for the database to search through.
There are quite a few things to consider when designing your application. Don't be afraid to mix your database with parametrized and relational style - it just has to make sense practically. In the case you gave, it makes perfect sense to do so; in the case I displayed, it would be pointless.
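To make the comparison concrete, here is a hedged sketch of how an application might read the parametrized users table back as one row per user. It uses SQLite and conditional aggregation (MAX over CASE), which is one common way to pivot such a table and is not something the answer itself prescribes; the helper names and the example addresses are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users_parametrized (user_id INTEGER, field TEXT, value TEXT);
    INSERT INTO users_parametrized VALUES
        (1, 'username', 'Sam'),  (1, 'email', 'sam@example.com'),
        (2, 'username', 'John'), (2, 'email', 'john@example.com');
""")


def find_user_id(username):
    """First query: locate the user id when only the username is known."""
    row = conn.execute(
        "SELECT user_id FROM users_parametrized "
        "WHERE field = 'username' AND value = ?",
        (username,)).fetchone()
    return row[0] if row else None


def user_row(user_id):
    """Second query: fold the field/value rows back into one (username, email) row."""
    return conn.execute(
        """SELECT MAX(CASE WHEN field = 'username' THEN value END),
                  MAX(CASE WHEN field = 'email'    THEN value END)
           FROM users_parametrized
           WHERE user_id = ?""",
        (user_id,)).fetchone()
```

Each extra field means another CASE expression in the query, whereas the relational table only needs another column name in the SELECT list.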
It is possible to stay fully relational while pursuing the intent of storing data in a parameterized fashion. The following is a greatly oversimplified demonstration, but should suffice to show the main tricks that are needed -- in a nutshell, additional levels of abstraction, some surrogate primary keys, and some tables with composite primary keys. I will leave out exact description of foreign key constraints assuming the reader can grasp the obvious relations between tables below.
Your first table is only to establish the entities you want to store information about, and a key to look up what sorts of information will be stored:
entity_id | entity_type
---------------------------
1 | lawn mower
2 | toothbrush
3 | bicycle
4 | restaurant
5 | person
The next table relates entity type to the fields you wish to store for each entity type:
entity_type | attribute
------------------------
lawn mower | horsepower
lawn mower | retail price
lawn mower | gas_or_electric
lawn mower | ...etc
toothbrush | bristle stiffness
toothbrush | weight
toothbrush | head size
toothbrush | retail price
toothbrush | ...etc
person | name
person | email
person | birth date
person | ...etc
This is expandable to as many fields as you like for each entity type. It's still relational; this table does have a primary key, it's just a composite key composed of both columns.
This example is oversimplified for brevity; in actual practice you have to confront the namespacing issues with attributes and you probably want certain attribute names to be per-entity-type in case the same name means something different on an entirely different kind of entity. Use a surrogate primary key for the attributes in order to solve the namespacing issue, if you don't mind the decrease in readability when looking directly at the tables.
Meanwhile, and opposite of the preceding point, it's useful to make common and unambiguous attributes (such as "weight in grams" or "retail price in USD") available for reuse across multiple entity types. To handle this, add a level of abstraction between attributes and entity types. Make a table of "attribute sets", with each set linked to 1..n attributes. Then each entity type in the table above would be linked not directly to attributes, but to one or more attribute sets.
You'll need to either guarantee that attribute sets do not overlap in what attributes they point to, or create a means of resolving conflicts by hierarchy, composition, set union, or whatever fits your needs.
So at this point a lookup for a particular entity goes as follows. From the entity id we get the entity type. From entity type we get 1..n attribute sets, which yield a resulting attribute set that is held by the entity. Finally there is the big table with the actual data in it as follows:
entity_id | attribute_id | value
---------------------------------------
923 | 1049272 | green
923 | 1049273 | 206.55
924 | 1049274 | 843-219-2862
924 | 1049275 | Smith
929 | 1049276 | soft
929 | 1049277 | ...etc
As with all of these tables, this one has a primary key, in this case composed of the entity_id and attribute_id columns. The values are stored in a plain-text column without units. The units are stored in a separate table linking attributes to units. More tables can be established if you need to get more specific on that; you can set up additional levels of abstraction to establish an "attribute type" system similar to the entity type system described above.
If needed, you can go as far as storing relationships such as "attribute X is numerically convertible to attribute Y by the following formula", for numerical attributes. Or for non-numerical attributes you can establish equivalence tables to manage alternate spellings or formats for the allowed values of an attribute.
As you can imagine, the farther you go with your "attribute types and units" system, and the more you use that additional machinery in computation, the slower this all will be. In the worst case you're looking at many joins. But that problem can be addressed with caching and views, if your situation allows you to make tradeoffs such as slowing write speed to gain a great increase in read speed. Also, many of your queries to the database will be in situations where you already know what entity type you're working with at the moment and what its resulting attributes are and their types; so you only have to grab the literal values out of the entity/attribute/value table, and that is plenty fast.
In conclusion, hopefully I have shown how you can get as parameterized as you wish while remaining fully relational. It just requires more tables for more levels of abstraction than some of the simpler approaches do; yet it avoids the disadvantages of the "one-big-table" style. This style of entity>type>attribute>value storage is powerful, flexible, and can be extended as far as you need.
And thanks to a relational/normalized table setup, you can do all sorts of reorganizing along the way as your entity schema evolves, without losing data. The additional levels of abstraction allow you to re-parent attributes from one attribute set to another, change their names if needed, and change which sets of attributes an entity type makes use of, without losing stored values, as long as you write appropriate migrations. The other day I realized I needed to store a certain product attribute on a per-brand basis instead of per-product, and was able to make the schema change in five minutes with only a couple of updated rows in the database. In many other setups, particularly in a one-big-table setup, it could have been a lot more work, requiring as much as one or more updated rows per entity affected by the change.
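A reduced version of the layout described above might look like the sketch below. It keeps the entity, entity type, attribute and value tables with their surrogate and composite keys, but leaves out attribute sets and units to stay short. All table names, ids and sample rows are assumptions made for the example, and SQLite stands in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE entity_type (entity_type TEXT PRIMARY KEY);
    CREATE TABLE entity (
        entity_id   INTEGER PRIMARY KEY,
        entity_type TEXT NOT NULL REFERENCES entity_type(entity_type)
    );
    CREATE TABLE attribute (
        attribute_id INTEGER PRIMARY KEY,        -- surrogate key
        entity_type  TEXT NOT NULL REFERENCES entity_type(entity_type),
        name         TEXT NOT NULL,
        UNIQUE (entity_type, name)               -- names are scoped per entity type
    );
    CREATE TABLE entity_value (
        entity_id    INTEGER NOT NULL REFERENCES entity(entity_id),
        attribute_id INTEGER NOT NULL REFERENCES attribute(attribute_id),
        value        TEXT NOT NULL,
        PRIMARY KEY (entity_id, attribute_id)    -- composite key
    );
    INSERT INTO entity_type VALUES ('toothbrush'), ('person');
    INSERT INTO entity VALUES (1, 'person'), (2, 'toothbrush');
    INSERT INTO attribute VALUES
        (1, 'toothbrush', 'bristle stiffness'),
        (2, 'toothbrush', 'retail price'),
        (3, 'person', 'name');
    INSERT INTO entity_value VALUES (2, 1, 'soft'), (2, 2, '3.50'), (1, 3, 'Smith');
""")


def attributes_of(entity_id):
    """Attribute name -> stored value for one entity."""
    rows = conn.execute(
        """SELECT a.name, v.value
           FROM entity_value AS v
           JOIN attribute AS a ON a.attribute_id = v.attribute_id
           WHERE v.entity_id = ?""",
        (entity_id,))
    return dict(rows)
```

The composite primary key on entity_value allows at most one value per entity and attribute, and renaming an attribute is a one-row update in attribute that leaves the stored values untouched.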
