I need to implement one user ID per user. After looking around, the closest approach I can see is verifying mobile phone numbers by text message. Assuming this is the best available method, how should I handle mobile numbers being recycled by telephone companies? Any ideas?
The following is the best attempt I found at answering how to implement one ID per person: http://stackoverflow.com/questions/5964664/account-verification-only-1-account-per-person
Thank you.
Back when I worked for a marketing client, our business was based on determining whether two people were the same - "our records show Bob Smith. Is the Robert Smith who responded the same as Bob Smith?"
Our data could be segmented into name, method of contact (address, phone, email) and relationship (Alonzo works for Square Tires Inc). Once the data had been scrubbed/standardized, we would then compare that entity to possible matches in the system. Matching on a single segment was insufficient to categorize as a match. We required two different segment matches for an entity to be matched to another. This was important as matching phone and address might yield different members of the same household. Depending on your criteria, that might be sufficient but generally it is not.
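The two-segment rule described above can be sketched roughly as follows. This is only an illustration, not the system we actually used: the `Entity` class, the field names and the exact-equality comparison are all assumptions, and real data would need the scrubbing/standardization step first. Phone and address both count as the single "contact" segment, which is why sharing a household is not enough on its own.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    # Each attribute is one "segment" of already-standardized data (hypothetical layout).
    name: str = ""
    contacts: set = field(default_factory=set)       # addresses, phones, emails
    relationships: set = field(default_factory=set)  # e.g. "works for Square Tires Inc"


def matching_segments(a: Entity, b: Entity) -> set:
    """Return the names of the segments on which the two entities agree."""
    matched = set()
    if a.name and a.name == b.name:
        matched.add("name")
    if a.contacts & b.contacts:
        matched.add("contact")
    if a.relationships & b.relationships:
        matched.add("relationship")
    return matched


def is_same_entity(a: Entity, b: Entity, required: int = 2) -> bool:
    """Treat two entities as a match only if enough different segments agree."""
    return len(matching_segments(a, b)) >= required


on_file = Entity("robert smith", {"555-0100", "1 main st"})
responder = Entity("robert smith", {"555-0100"})
housemate = Entity("alice smith", {"555-0100", "1 main st"})

same_person = is_same_entity(on_file, responder)   # name + contact agree
same_household = is_same_entity(on_file, housemate)  # only contact agrees
```

Here `same_person` is true and `same_household` is false, even though the housemate shares both the phone and the address.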
With all that said, I don't think you have enough information to make a semi-authoritative guess as to whether phone number 1234567 is the original user or a person who has had their phone number reassigned. I would attempt to capture more personally identifiable information about the entity registering the phone number. The registration of an existing phone number with different PII would not cause the original holder of the number to become invalidated, it merely signifies that a new user has come onboard. Until you get enough information back about the original holder of that number having a new method of contact, you should probably retain the association to the phone number. It could be the case the second registrant supplied a bogus number or either applicant fat-fingered their number and you wouldn't know which was accurate.
You can still run into issues with say corporate issued mobile phones. A company has a pool of phones they distribute to folks. People quit and get hired and those phone numbers remain the domain of the company but are assigned to people that then register for your service.
Hope this helps give you some starting points for solving the problem.
Related
I'm trying to better understand designing a database schema. After reviewing the solution for a problem that I'm working on, I don't understand why the solution chooses to use an aggregation for the attributes "address" and "phone number" for a given "musician". Here are the specifications, I'm only interested in bullet point 1:
Each musician that records at Notown has an SSN, a name, an address, and a phone number. Poorly paid musicians often share the same address, and no address has more than one phone.
Each instrument used in songs recorded at Notown has a name (e.g., guitar, synthesizer, flute) and a musical key (e.g., C, B-flat, E-flat).
Each album recorded on the Notown label has a title, a copyright date, a format (e.g., CD or MC), and an album identifier.
Each song recorded at Notown has a title and an author.
Each musician may play several instruments, and a given instrument may be played by several musicians.
Each album has a number of songs on it, but no song may appear on more than one album.
Each song is performed by one or more musicians, and a musician may perform a number of songs.
Each album has exactly one musician who acts as its producer. A musician may produce several albums, of course.
Here is a solution that I found:
The ER Diagram I created looks almost exactly the same, except for the fact that I made "address" and "phone number" attributes of "musician" instead of giving each of them an entity set of their own, creating a relationship, and turning it into an aggregation. I don't understand why this would be done in this situation. Can anyone explain?? Thank you!
I'm not able to see the image you linked to, but anyway...
no address has more than one phone
This means we should make the phone number an attribute of the address - unless we want to allow for multiple phones per address in the future.
So it would not be completely wrong to make phones a table. But then, we know little about the future. Would there be multiple musicians sharing the same address and the same phones? (I.e. the phone number would be linked to an address.) Or would there be multiple musicians sharing the same address, but each would have their own phone? (I.e. the phone number would be linked to a musician. To use a phone table and link the phones to musicians, however, would only be necessary if a musician could have multiple phone numbers. Otherwise we'd still not make a phone table, but rather make the phone a musician's attribute.)
poorly paid musicians often share the same address
This means we make the address a table of its own. Thus there is only one row to change in case the phone number or some other attribute changes. If we made the address a musician's attribute instead, we'd store the address redundantly and could get inconsistent data (e.g. same address, but different phone numbers).
A possible data model:
address (address_id, street, city, phone, ...)
musician (musician_id, ssn, name, address_id, ...)
This is a 1:n relation. A musician has one address; an address can belong to multiple musicians.
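A minimal sketch of that model, using Python's built-in sqlite3 module so it can be run as-is. The column types, constraints and sample rows are my own assumptions; only the table and column names come from the model above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE address (
        address_id INTEGER PRIMARY KEY,
        street     TEXT NOT NULL,
        city       TEXT NOT NULL,
        phone      TEXT              -- at most one phone per address
    );
    CREATE TABLE musician (
        musician_id INTEGER PRIMARY KEY,
        ssn         TEXT NOT NULL UNIQUE,
        name        TEXT NOT NULL,
        address_id  INTEGER NOT NULL REFERENCES address(address_id)
    );
""")

# Two musicians share one address row (made-up sample data).
conn.execute("INSERT INTO address VALUES (1, '12 Beale St', 'Memphis', '555-0100')")
conn.executemany(
    "INSERT INTO musician (ssn, name, address_id) VALUES (?, ?, ?)",
    [("111-11-1111", "Ann", 1), ("222-22-2222", "Ben", 1)],
)

# The phone number changes: one row is updated, and every musician sees it.
conn.execute("UPDATE address SET phone = '555-0199' WHERE address_id = 1")

phones = conn.execute("""
    SELECT m.name, a.phone
    FROM musician AS m JOIN address AS a USING (address_id)
    ORDER BY m.name
""").fetchall()
```

After the single UPDATE, `phones` holds the new number for both Ann and Ben, so the two can never disagree about the phone at their shared address.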
The primary purpose of database normalization is to make it more difficult for anomalous data to get into the database. Reading the first bullet point, we see that each address may have zero or one phone numbers associated with it. In other words, the phone number is an attribute of/identified by the address. Which normalization level does this violate?
To illustrate how not normalizing the address fields (including phone number) increases the chances of anomalous data, let's say you have four students staying at that address. This means you have four rows where the address data exists. Suppose the phone number changes. You have to make sure you change all four copies of the data. I said there were four students, but suppose there are actually five and I just missed one? Or suppose you found only three when you went to make the change? An address may have at most one phone number, yet you now have several copies of the same address with different phone numbers. This is anomalous data.
If this data is normalized, you would have only one copy to change. Since this data is referenced by all the students who live there, no matter how many, this change is "propagated" to all of them. The integrity of the data is maintained.
Problem statement: Suppose you have a large set of customer data containing demographic information such as first name, last name, phone and email. Now suppose we also have a Twitter handle. Is there any way to tie that Twitter handle to the matching contact/customer record?
Basically, I want to get the phone number, email, or first name and last name from the handle so that I can match against our internal database and identify the existing customer.
Any pointers would be helpful here.
As I understand it, you want a complete view of your customers rather than corrupt or missing information.
I don't think that is possible, because Twitter hides telephone numbers and email addresses for security and privacy reasons.
Not sure which is the best Stack Exchange site for this, so will try my hand here.
I have a web application that stores user disciplinary data for organisations. Rather than have clients enter their staff into multiple systems, some want to push the basic personnel data (such as First Name, Surname, DOB, Job Title, etc.) into ours from their source (e.g. HR/ERP) databases.
Our clients are using a range of existing systems to store their data, such as Oracle, SAP, JD Edwards, etc.
I am familiar with the technical methods to get this data (e.g. web service, web API), but not for a case such as when a person's surname changes (e.g. Janet Smith gets married and becomes Janet Doe). Unless there is a unique identifier for that person across both systems, I can't see how that change can be managed reliably.
How is this process best-managed please? Is an additional field added to the destination database that contains the UID of the source data? Or, do both parties agree on a common field, e.g. employee number, that never changes?
This issue arises in many circumstances. One case is in the Texas school system where students are tracked longitudinally through numerous education subsystems. A social security number, providing a unique identifier in some cases (although not all) was considered too sensitive for use. Thus, a unique identifier has been generated for each student and staff member. This is part of the permanent information associated with each individual, regardless of employment change, location change, or name change.
This link describes the rationale for the unique id.
This link is the documentation on the Texas Student Data System (TSDS) unique identifier. You might find the XML examples at the end of the document of most interest. Much of the information involves submitting requests for an id where demographic information is needed for disambiguation.
Basically, something similar to a Java UUID as an extra field in the database should be sufficient to achieve your aim.
Hope this helps.
Yes, the UID is the only solution. This problem comes up in medical systems too, for example. Another is photos, I'm not sure which causes more problems!
One approach I know of is to use an "external_id" field for this. If there are several source systems, you can store several external IDs.
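As a rough sketch of the external ID idea, assuming the source system can send a key that never changes (an employee number, say): the destination keeps its own `person_id` and stores the source key alongside it. The table layout, the `push_person` helper and the `'HR'` label are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person (
        person_id     INTEGER PRIMARY KEY,
        source_system TEXT NOT NULL,   -- which client system sent the record
        external_id   TEXT NOT NULL,   -- that system's own stable key
        first_name    TEXT NOT NULL,
        surname       TEXT NOT NULL,
        UNIQUE (source_system, external_id)
    )
""")


def push_person(source_system, external_id, first_name, surname):
    """Insert a new person, or update the one already known under this external key."""
    row = conn.execute(
        "SELECT person_id FROM person WHERE source_system = ? AND external_id = ?",
        (source_system, external_id),
    ).fetchone()
    if row is None:
        cur = conn.execute(
            "INSERT INTO person (source_system, external_id, first_name, surname) "
            "VALUES (?, ?, ?, ?)",
            (source_system, external_id, first_name, surname),
        )
        return cur.lastrowid
    conn.execute(
        "UPDATE person SET first_name = ?, surname = ? WHERE person_id = ?",
        (first_name, surname, row[0]),
    )
    return row[0]


before = push_person("HR", "E1001", "Janet", "Smith")
after = push_person("HR", "E1001", "Janet", "Doe")   # same person, new surname
current_surname = conn.execute(
    "SELECT surname FROM person WHERE person_id = ?", (after,)
).fetchone()[0]
```

Both calls return the same `person_id`, so the disciplinary history stays attached to one record while the surname becomes "Doe".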
I am working on a medical php application which will be implemented at national level.
It will be used by multiple hospitals and the patient record will be centralized i.e every hospital will be accessing and adding the patient records into same database.
I want there to be only one record per patient, without any duplication. Simply put, no hospital should be able to enter a second record for the same patient. To make that possible, I need to know which criteria stay fixed throughout a patient's entire lifetime. I can only think of two: name and date of birth.
What other criteria are there? I don't want to use mobile or other phone numbers, and infants won't have one anyway. I need criteria that exist for every patient and are unique.
Please give me your suggestions or any other better way to implement this functionality?
I'll take a shot because I've been involved in some data matching and validation, although not specifically in the medical industry. You haven't specified a particular country, just mentioned Asia, so I'll use an example from my home country of Australia just because I'm familiar with the rules and I believe the same would apply to many Asian countries:
We have a unique Medicare number used for health care, but it's not mandatory and while the free / discounted care means I expect 99%+ of people would have one you can't rely on it.
There is also a tax file number, likewise not mandatory even if you work, and people who have never had a job wouldn't normally have one.
You might be dealing with foreign people that aren't residents.
Drivers licenses are of course not mandatory to get healthcare.
It's perfectly legal to have "no fixed address". Plus some people will lie to get treatments and repeats of drugs etc. Not to mention many people move often.
Changing name is common in case of marriage / divorce, and unless done for illegal purposes someone can change their name just because they don't like their original. Not to mention people use common substitutions for various things like Jim versus James.
Typing mistakes will be very common over a large dataset.
In short, I think the 'perfect' scheme you are asking for is impossible. The best you can do is apply a weighting rule to find likely duplicates. For example, two different people sharing the same name, date of birth and place of birth is unlikely but possible, so show the data entry operator a warning that the record is a likely duplicate and let them see the details of the existing record. Even a mismatch on something like a driver's license number, which should be unique, may indicate that the original entry just had a data entry error, not that this is a different person.
From my experience the best thing is a report that lists likely duplicates that must be reviewed by someone higher up the chain, and give them an easy option to merge the duplicates. Then you can start to use more vague regex expressions that throw a few false positives that can be dismissed when a human reviews them. You can also refine the model over time to get the best match results.
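A very rough sketch of such a weighting rule is below. The field names, the weights and the review threshold are numbers I made up for illustration; in practice you would tune them against your own data and probably use fuzzier comparisons than the exact match shown here.

```python
# Hypothetical weights per field; a higher weight means stronger evidence of a duplicate.
WEIGHTS = {
    "name": 40,
    "date_of_birth": 30,
    "place_of_birth": 20,
    "licence_number": 50,
}
REVIEW_THRESHOLD = 60  # assumed cut-off for showing a warning to the operator


def normalise(value):
    """Lower-case and collapse whitespace so trivial typing differences don't matter."""
    return " ".join(str(value).lower().split())


def duplicate_score(new, existing):
    """Add up the weights of every field present in both records with the same value."""
    score = 0
    for field_name, weight in WEIGHTS.items():
        a, b = new.get(field_name), existing.get(field_name)
        if a and b and normalise(a) == normalise(b):
            score += weight
    return score


def likely_duplicates(new, records):
    """Return (score, record) pairs at or above the threshold, best match first."""
    scored = [(duplicate_score(new, r), r) for r in records]
    hits = [(s, r) for s, r in scored if s >= REVIEW_THRESHOLD]
    return sorted(hits, key=lambda pair: pair[0], reverse=True)


existing_patients = [
    {"name": "James Wong", "date_of_birth": "1980-02-01", "place_of_birth": "Perth"},
    {"name": "Mary Lee", "date_of_birth": "1980-02-01", "place_of_birth": "Perth"},
]
new_patient = {"name": "james  wong", "date_of_birth": "1980-02-01", "place_of_birth": "Sydney"}
warnings = likely_duplicates(new_patient, existing_patients)
```

Only the first existing record is flagged for review: name and date of birth agree, while the second record shares nothing but the date of birth. The operator, or someone higher up the chain, still makes the final call on whether to merge.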
Combination of name, date of birth, blood group, place of birth etc., can be tried.
You need to use some nationwide ID, like a passport number or health insurance number.
Social Insurance Number with country.
Given a schema for a DVD rental store, should customers' phone numbers belong to the addresses table, or the users table, and why? Are there any benefits associated with one approach or the other?
Why do you even have an addresses table (unless you want more than one address for a given customer)?
Your primary "client" is a customer. You don't rent DVDs to an address, you rent them to a person. You can't take a block of land to court when the occupant runs off with your prized "Free Willy" collector's edition trilogy.
In a world where a person only lived at one place, the address would be part of the customer table (and so would the phone in a one-phone-per-customer scenario).
If you want multiple addresses, that's fine, have a separate addresses table tying those addresses back to the customer.
But you should probably also have a similar setup for phones. Either allow up to N phone numbers per customer (with N columns) in the customers table, or (much better) have a separate phones table allowing any number of phone numbers per customer.
Something like:
customers:
    cust_id
    cust_stuff
addresses:
    cust_id references customers(cust_id)
    addr_seq_num
    addr_stuff
phones:
    cust_id references customers(cust_id)
    phon_seq_num
    phon_stuff
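For what it's worth, here is one way the layout above could look as real tables, sketched with Python's sqlite3 module. The concrete columns standing in for the "stuff" placeholders (`name`, `street`, `phone_type`, `number`), the composite primary keys and the sample data are my assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
    CREATE TABLE customers (
        cust_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE addresses (
        cust_id      INTEGER NOT NULL REFERENCES customers(cust_id),
        addr_seq_num INTEGER NOT NULL,
        street       TEXT NOT NULL,
        PRIMARY KEY (cust_id, addr_seq_num)
    );
    CREATE TABLE phones (
        cust_id      INTEGER NOT NULL REFERENCES customers(cust_id),
        phon_seq_num INTEGER NOT NULL,
        phone_type   TEXT NOT NULL,      -- e.g. 'home', 'mobile'
        number       TEXT NOT NULL,
        PRIMARY KEY (cust_id, phon_seq_num)
    );
""")

conn.execute("INSERT INTO customers VALUES (1, 'Pat')")
conn.executemany(
    "INSERT INTO phones VALUES (1, ?, ?, ?)",
    [(1, "home", "555-0100"), (2, "mobile", "555-0101")],
)

# A phone row must belong to an existing customer.
try:
    conn.execute("INSERT INTO phones VALUES (99, 1, 'home', '555-0102')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

phone_count = conn.execute(
    "SELECT COUNT(*) FROM phones WHERE cust_id = 1"
).fetchone()[0]
```

The customer ends up with two phone rows, and the insert for the non-existent customer 99 is rejected by the foreign key.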
There's no one correct answer to this, except to say "it depends".
It really depends on what you're modelling with your database schema. Does a phone number logically belong to a user, or an address that could potentially be shared by multiple users?
Example - a mobile phone number might be tied to a particular person, and so be part of the users table. A land-line number might be tied to a particular location or residence, and so be part of the address.
Basic cases of information modeling:
Case A. Each customer can have more than one phone number.
In this case, phone number belongs in a separate table.
Case A1. It is not the case that a customer is required to have a phone number. i.e. the "relationship" is 1-1 to 0-n (i.e. assuming all phone number must always "be for" some customer). Nothing to do.
Case A2. It is the case that each customer is indeed required to have a phone number. You can model this as a relationship that is 1-1 to 1-n, but the "1" of the 1-n part is very hard to enforce in SQL systems (and in the cheapest of them, probably just impossible). That does not mean that you shouldn't be documenting the business rule properly as it is.
Case B. Each customer has AT MOST one phone number.
Case B1. Each customer is required to have a phone number. This means that each customer always has exactly one phone number. Phone number is best put in the customer table. (Note that "to have a phone number" means "to have a phone number THAT IS KNOWN TO THE STORE in question"!)
Case B2. It is not required for a customer to have a phone number. In formal relational theory, it is required that you define a separate table which will hold only the known phone numbers. In informal modeling techniques such as ER and UML, you can model this as an "optional attribute". In SQL systems, many would define a nullable attribute for this.
As for "phone numbers 'belonging' to addresses": is there any kind of "connection" between phone numbers and addresses that is relevant to your business? I mean, let's say some customer has two addresses and two phone numbers. Is it important to know which of those two phone numbers belongs to which one of those two addresses? What address would a cellphone number 'belong to'?
Just an assumption about your site/app, but usually I'd say "Addresses", because user information tends to be info that you pull out frequently to run the site (ID, username, visits etc) whereas phone number may not be?
What do you mean by "traditional?" Since a user can have an arbitrarily large number of contact phone numbers (home, work, personal mobile, work mobile, fax, etc.), it seems like there should be a separate phone numbers table, each row of which includes a number and a value that says what type of number it is.
Contact information is notoriously difficult to model in a relational schema. In order to keep your sanity, I would advise that you make a minimum number of assumptions with respect to phone numbers. Allowing multiple phone numbers for one customer/account is good; beyond that it's hard to apply rules to phone numbers.
There is one well known exception: many pizza delivery shops use phone numbers as primary keys for customers. This works because in general there is one phone associated with the place to which one delivers pizza. On the other hand, many people no longer have land lines, so perhaps even that system is breaking down. In any case, I don't think this applies to DVD rental.
More than one customer may have the same phone number: perhaps multiple people from the same building buy from you.
What happens when you update the phone number for one customer; should it update that number for the other people that supposedly share that number?
Yes if the address for both parties is still the same (but that might actually break some privacy laws).
No if there are now two different addresses.