Data mapping across websites? - database

I am starting work on a project which has multiple websites for a client. For analytics purposes I need to measure metrics across the websites over a period of time. This means I need to centralize the data model and all the possible questions/answers/lookup values so they can be used across websites. The question I have is:
Example: the age range of a user visiting website 1 is, say, 30-39 years old (we ask for the age range when they enter). In the data model I have a lookup table for answers which has all possible answers used across all websites, so (30-39) has a PK ID of, say, 102. Now on website 2, same thing, so again (30-39) has a PK of 102. This way I can measure the same age range across websites. But the problem is where or how to store the user's answer and map it to this ID?
If I have a table called, say, UserAnswers, it has an AgeRange column. Do I make this a FK to the Answer table at PK 102 to store (30-39) for the user? If yes, then what value gets written in the UserAnswer table, would it be 102?
Secondly, I need to measure text fields also, like how many are complete across websites. Say the "email address" field: I give this text field a field ID of 10. Again, when I write the consumer's email of say xyz#abc.com in the 'email' column of the answers table, how will I link this to the field ID 10?

I'd probably make an API for each website that dumps the data I want in a consistent format, which I could then load into a separate database where I would run the metrics.
This becomes infeasible if the data is so large that it takes hours every night to migrate it.
But otherwise it's a solution with the benefit that you don't have to touch the current schemas.
If the sites don't exist already, then I'd just enforce a consistent data model on all sites, which is trivial. It doesn't matter where the mapping between "30-39" and 102 lives; the only thing that matters is that it's the same everywhere, which you set up when you create the databases.
If you expect the values and schemas to change a lot, then using just one database for all sites would probably be better, but if you can't do that, then make an export for each site.
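For the lookup question itself, here is a minimal sketch of what the shared model could look like; the table and column names (and the WebsiteId/AnswerText columns) are only illustrative:

-- Shared lookup of every possible answer, reused by all websites
CREATE TABLE Answer (
    AnswerId   INT PRIMARY KEY,            -- e.g. 102
    AnswerText VARCHAR(100)                -- e.g. '30-39'
);

-- Shared definition of the fields you measure, e.g. FieldId 10 = 'email address'
CREATE TABLE Field (
    FieldId   INT PRIMARY KEY,
    FieldName VARCHAR(100)
);

-- One row per answer a user gives on a given website
CREATE TABLE UserAnswer (
    UserAnswerId INT PRIMARY KEY,
    WebsiteId    INT,
    UserId       INT,
    FieldId      INT REFERENCES Field(FieldId),        -- 10 for 'email address'
    AnswerId     INT NULL REFERENCES Answer(AnswerId),  -- 102 for '30-39'; NULL for free text
    AnswerText   VARCHAR(255) NULL                      -- 'xyz#abc.com' for free-text fields
);

So yes, the value written in the user answer row is the foreign key 102, and a free-text answer is tied to the shared FieldId (10) while the literal text goes into its own column; counting completed email fields across sites is then just counting UserAnswer rows where FieldId = 10 and AnswerText is not null.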

Is there a way to have 2 dates under a customer's name in Microsoft Access

I'm an intern at a company that does both wiring and air-con services. The job they gave me was to make a database for them. I don't have any experience with anything related to databases.
So I started looking up videos and such to at least learn a bit about databases, and after 1.5 months of learning I made something that works.
In the database that I created, I have 1 table (CustomerDetailsT):
CustomerID (pk)
CustomerName
PhoneNumber
Address
Aircond (type and model of AC, e.g. WM Daikin 1.0HP)
AcDetails (what has been done for the AC)
Others (yes/no) (wiring, installing a fan and so on)
WhatHasBeenDone (shows what has been done for Others)
Then 3 queries (CustomerOthersDetailsQ, CustomerAcDetailsQ, CustomerDetailsQ). CustomerAcDetailsQ has CustomerName, PhoneNumber, Address, Aircond and AcDetails. CustomerOthersDetailsQ has CustomerName, PhoneNumber, Address, Others, and WhatHasBeenDone. CustomerDetailsQ has CustomerID, CustomerName, PhoneNumber and Address.
And 1 form with 3 subforms.
It's a search form, which searches for customers as we type in their name/phone number and shows what has been done for that customer.
With this, I have created what the company wants, but now they want to add dates: dates that show when we did something for a customer, for both the Aircond and the Others work.
I've tried with what I know and it didn't work, and I tried searching on YouTube and Google but still couldn't find it.
How can I go about doing this? I have tried having separate tables for each service, but it became a hassle when I wanted to create a new customer. I hope I can get some help; I can send pictures if someone needs them.
Images: the Customer search form: https://i.stack.imgur.com/mtrmC.png; example of a customer that has AC installation: https://i.stack.imgur.com/A3Y9d.png; example of a customer that has both AC and wiring done: https://i.stack.imgur.com/dsGL5.png
Acknowledging the question is too broad, here is some guidance. One of the nice things about Access is that each database is a single file. First protect your work by finding that file and making two copies: a backup and a play-around version. Only mess with the play-around version.
Your question indicates you are still learning table normalization and one-to-many relationships. Both topics are general to all databases, so you don't have to restrict yourself to just Access when looking for guides and YouTube videos.
Part of normalization is putting separate entities into their own tables. Also, in Access there is a big payoff for using the Relationship Tool, so here is a rather lame example of normalization:
Make sure to select the checkboxes when setting up relationships.
WhatHasBeenDone should also have a WhatHasBeenDoneDate. I've wrapped AC and Other into a Unit table because later that will be easier than having two separate WhatHasBeenDone tables (one for AC, one for Other).
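As a rough sketch of the kind of layout described above, written in generic SQL only as a compact notation (the exact table and column names here, like Units and WorkDescription, are my own guesses; in Access you would build these in table Design view and link them with the Relationship Tool rather than running DDL):

CREATE TABLE Customers (
    CustomerID   INT PRIMARY KEY,           -- AutoNumber in Access
    CustomerName VARCHAR(100),
    PhoneNumber  VARCHAR(30),
    Address      VARCHAR(255)
);

CREATE TABLE UnitTypes (
    UnitTypeID      INT PRIMARY KEY,
    UnitDescription VARCHAR(50)              -- 'AC' or 'Other' (wiring, fans, ...)
);

CREATE TABLE Units (
    UnitID      INT PRIMARY KEY,
    CustomerID  INT REFERENCES Customers(CustomerID),   -- one customer, many units
    UnitTypeID  INT REFERENCES UnitTypes(UnitTypeID),
    UnitDetails VARCHAR(255)                 -- e.g. 'WM Daikin 1.0HP'
);

CREATE TABLE WhatHasBeenDone (
    WhatHasBeenDoneID   INT PRIMARY KEY,
    UnitID              INT REFERENCES Units(UnitID),    -- one unit, many jobs
    WorkDescription     VARCHAR(255),
    WhatHasBeenDoneDate DATE                 -- the date the company asked for
);

Each job done for a unit becomes its own row with its own date, which is what makes "two dates under a customer" possible.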
Now imagine someone taking the customer request call. They just want to see a form to enter the customer details, request, unit type, etc. They don't want to see those tables. Even with training, entering data directly in the tables is error-prone. The person fulfilling the request just wants to enter what they did and when. That's how you start to figure out what your final data entry forms will look like.
Since we normalized the tables and used the Relationship Tool, the payoff is that Access can give us an assortment of working starter forms. Select each table, then hit Create, then hit Form. Choose your favorites and start playing around from there. While playing, keep in mind that Access will not let you add an item on the many side of a relationship unless there is an item on the one side.
For example I selected the customers table and hit create form:
Access uses a concept of form and subform based on separate but related tables. So, to get a form that shows what has been done for each customer I created a form for the What has been done table, and dragged it onto the customers form:
Unless an ID is also being used as a part number or something, there is probably no reason for the person entering data to see it, so I removed the textboxes bound to IDs, except for UnitTypeID, where I replaced the textbox with a combobox that displays the user-friendly UnitDescription. The IDs are still part of the form record sources; Access is still adding new IDs and using them to put the appropriate data in the right tables.
Oh, didn't we need dates? (I went back and added a date to the table and adjusted the subform accordingly.) I also changed the subform format from single record to continuous records to show multiple dates:
In conclusion, and in my opinion, your final forms will use VBA behind the scenes to insert data from the forms into the tables, either because you will want to rapidly insert multiple records or because how the end users think about the data will not match the default form-and-subform approach Access depends upon to figure out how to insert the data. However, the default approach is fast, and I always use it for version 1 of my Access databases.
P.S. For simplicity I avoided including any many-to-many relationships.

Too few items for a table?

I am creating a website, and one of the things the user should specify is their country. I suppose there are around 200 countries. Which approach is better: storing the 200 countries in a List/Array in the application (therefore a bit more RAM usage) or storing them as a table in the database? How are these kinds of problems generally solved?
A 200-row table is not bad practice per se. The main issue is the technical complexity of having your country column be a foreign key to this table. Do countries come and go on a regular basis? Do they frequently change names in a way that requires centralizing all the names in a single table?
My usual approach is to simply store the country's two-letter country code in a char(2) column. This gives you the ability to quickly infer the actual country's name (as opposed to checking what country ID 37 means) without introducing an extra table and an extra join.
Now, if your application is actually centered on country data, it makes sense to have a country table with several columns to store information about each country, but then I would advocate using the two-letter country code as the primary key for that table.
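A minimal sketch of that approach (the table and column names are just illustrative):

-- The ISO 3166-1 alpha-2 code lives directly on the user row
CREATE TABLE users (
    user_id      INT PRIMARY KEY,
    name         VARCHAR(100),
    country_code CHAR(2)                   -- e.g. 'DE', 'JP', 'BR'
);

-- Optional: only needed if you want to store extra per-country data
CREATE TABLE countries (
    country_code CHAR(2) PRIMARY KEY,      -- natural key instead of a surrogate ID
    country_name VARCHAR(100)
);

If you do add the countries table later, users.country_code can become a foreign key to it without changing any stored values.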
Having a small number of rows is not an issue. If a table is small the database server can cache the whole thing in memory. Databases are fine with small tables.
If you store the countries in a table, you can always set up an application-level cache that keeps a copy of the contents in memory. It's easy to add new entries in the database when needed, and you can set an expiration time on the cache so new entries get picked up.
My preferred way to build an application is to have an application-level cache that is configured separately from the application code, so that I can tune the cache later without code changes and I don't have to code with caching in mind.

How can I store an indefinite amount of stuff in a field of my database table?

Here's a simple version of the website I'm designing: users can belong to one or more groups, as many groups as they want. When they log in they are presented with the groups they belong to. Ideally, in my Users table I'd like an array or something unbounded to which I can keep adding the IDs of the groups that the user joins.
Additionally, although I realize this isn't necessary, I might want a column in my Group table which holds an indefinite number of user IDs belonging to that group. (Side question: would that be more efficient than getting all the users of the group by querying the user table for users belonging to a certain group ID?)
Does my question make sense? Mainly I want to be able to fill a column up with an indefinite list of IDs... The only way I can think of is making it like some super long varchar and having the list JSON encoded in there or something, but ewww
Please and thanks
Oh, and it's a MySQL database (my website is in PHP), but after 2 years of PHP development I've recently decided PHP sucks and I hate it and ASP.NET web applications are the only way for me, so I guess I'll be implementing this on whatever kind of database I'll need for that.
Your intuition is correct; you don't want to have one column of unbounded length just to hold the user's groups. Instead, create a table such as user_group_membership with the columns:
user_id
group_id
A single user_id could have multiple rows, each with the same user_id but a different group_id. You would represent membership in multiple groups by adding multiple rows to this table.
What you have here is a many-to-many relationship. A "many-to-many" relationship is represented by a third, joining table that contains both primary keys of the related entities. You might also hear this called a bridge table, a junction table, or an associative entity.
You have the following relationships:
A User belongs to many Groups
A Group can have many Users
In database design, this might be represented as follows:
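Since the question mentions MySQL, here is a minimal MySQL-flavored sketch of those three tables (the column names are illustrative; backticks because GROUP is a reserved word):

CREATE TABLE `User` (
    UserId   INT AUTO_INCREMENT PRIMARY KEY,
    UserName VARCHAR(100)
);

CREATE TABLE `Group` (
    GroupId   INT AUTO_INCREMENT PRIMARY KEY,
    GroupName VARCHAR(100)
);

-- The junction table: one row per (user, group) membership
CREATE TABLE UserGroup (
    UserId  INT NOT NULL,
    GroupId INT NOT NULL,
    PRIMARY KEY (UserId, GroupId),                     -- prevents duplicate memberships
    FOREIGN KEY (UserId)  REFERENCES `User`(UserId),
    FOREIGN KEY (GroupId) REFERENCES `Group`(GroupId)  -- InnoDB also indexes GroupId for this FK
);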
This way, a UserGroup represents any combination of a User and a Group without the problem of having "infinite columns."
If you store an indefinite amount of data in one field, your design does not conform to First Normal Form. FNF is the first step in a discipline called data normalization. Data normalization is a major aspect of database design. A normalized design is usually a good design, although there are some situations where a different design pattern is better suited.
If your data is not in FNF, you will end up doing sequential scans for some queries where a normalized database would be accessed via a quick lookup. For a table with a billion rows, this could mean a delay of an hour rather than a few seconds. FNF guarantees a direct-access lookup path for each item of data.
As other responders have indicated, such a design will involve more than one table, to be joined at retrieval time. Joining takes some time, but it's tiny compared to the time wasted in sequential scans, if the data volume is large.
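To the side question about efficiency: with the junction table from the sketch above (names are illustrative), both directions of the lookup are indexed joins rather than scans, so there is no need to duplicate a list of user IDs inside the group row. For example:

-- All members of group 42 (uses the index InnoDB keeps on GroupId)
SELECT u.UserId, u.UserName
FROM `User` u
JOIN UserGroup ug ON ug.UserId = u.UserId
WHERE ug.GroupId = 42;

-- All groups for user 7 (the list shown after login; uses the primary key prefix on UserId)
SELECT g.GroupId, g.GroupName
FROM `Group` g
JOIN UserGroup ug ON ug.GroupId = g.GroupId
WHERE ug.UserId = 7;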

Extendable database schema for contacts (social)

I have an old application that needs upgrading. Doesn't everything nowadays?
The existing DB schema consists of predefined fields like phone, fax, and email. Obviously, with the social explosion over the last 5-7 years (or longer, depending on your country), end users need more control over creating contact cards the way they see fit, rather than just what I think might be useful.
I'm concerned here with "digital" addresses, i.e. one-line addresses: phone = ccc ccc ccc ccc, etc.
Since physical addresses are pretty standard in terms of requirements, in this case users will have to use what they are given (location, postal, delivery) in order to keep the scope manageable.
So I'm wondering what the best practice format for storing digital info is. To me it seems I have two choices:
A simple 4 field table (ContactId, AddressTypeId, Address, FormatterId)
1000, "phone", "ccc ccc ccc ccc", phoneformatter
1000, "facebook", "myfacebook", facebookformatter
This would then be JOINed anywhere it's needed. The table would get massive though, and the join performance would degrade over time, I suspect.
A JSON blob that would require additional processing once read (ContactId, Addresses)
1000, {{"phone": "ccc ccc ccc ccc"}, {"facebook": "myfacebook"}}
Or ... something else.
This DB is for use in a given country by customers only trading domestically, with client bases ranging from 3000-12000 accounts and then however many contacts per account; the average is about 10 in the current system.
My primary concern is user flexibility, but performance is a key consideration in that. So I dunno, just do whatever and throw heaps of hardware at it ;)
The application is in C#, if that makes any difference re: post-query processing.
I would not go for the JSON blob. This will be nasty if you need to answer any queries like:
Does anyone have me in their Facebook contacts?
What's the most popular type of social media contact?
You would be forced to parse the JSON for every record and be unable to create a simple index.
Your first solution is nearly correct; however, FormatterId would need to be on an AddressType table. What you have is not normalised, as FormatterId depends only on AddressTypeId. So you would have three tables:
Contact
ContactAddress
AddressType
You haven't stated whether you need to store two addresses of the same type against a single contact, e.g. if someone has two Twitter accounts. Answering this question will allow you to define the correct primary key on ContactAddress: it would either be (ContactId, AddressTypeId) if you can only have one of each type per contact, or a synthetic key (ContactAddressId) otherwise.
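A sketch of those three tables, assuming one address per type per contact (the column types and the sample query are my own illustration):

CREATE TABLE Contact (
    ContactId INT PRIMARY KEY,
    Name      VARCHAR(100)
);

CREATE TABLE AddressType (
    AddressTypeId INT PRIMARY KEY,
    TypeName      VARCHAR(50),        -- 'phone', 'facebook', ...
    FormatterId   INT                 -- the formatter depends only on the type, so it lives here
);

CREATE TABLE ContactAddress (
    ContactId     INT REFERENCES Contact(ContactId),
    AddressTypeId INT REFERENCES AddressType(AddressTypeId),
    Address       VARCHAR(255),       -- 'ccc ccc ccc ccc', 'myfacebook', ...
    PRIMARY KEY (ContactId, AddressTypeId)   -- swap for ContactAddressId if multiples per type are allowed
);

-- "What's the most popular type of contact?" becomes a plain indexed aggregate
SELECT t.TypeName, COUNT(*) AS uses
FROM ContactAddress a
JOIN AddressType t ON t.AddressTypeId = a.AddressTypeId
GROUP BY t.TypeName
ORDER BY uses DESC;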
Well, I believe you have a table named contact:
contact(contactid, contact details, other details)
and now you want to move the contact details out of the contact table, because the contact details may contain a digital address, phone number and so on.
But the table you are considering, (ContactId, AddressTypeId, Address, FormatterId), is not in normal form: you can't uniquely identify a tuple until you read all four columns, which is bad, and in this case indexing is also not going to help you.
So it is better to have a separate table for each type of digital address, with an index on contactID:
facebookdetails(contactid, rest of the details)
phonedetails(contactid, rest of the details)
And then a query can join all of these tables; it will not degrade performance.
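A rough sketch of what that per-type layout might look like (all names and columns here are just illustrative):

CREATE TABLE contact (
    contactid INT PRIMARY KEY,
    name      VARCHAR(100)
);

CREATE TABLE facebookdetails (
    contactid       INT REFERENCES contact(contactid),
    facebook_handle VARCHAR(100)      -- ...rest of the details
);
CREATE INDEX ix_facebookdetails_contact ON facebookdetails (contactid);

CREATE TABLE phonedetails (
    contactid    INT REFERENCES contact(contactid),
    phone_number VARCHAR(30)          -- ...rest of the details
);
CREATE INDEX ix_phonedetails_contact ON phonedetails (contactid);

-- Everything for one contact
SELECT c.contactid, f.facebook_handle, p.phone_number
FROM contact c
LEFT JOIN facebookdetails f ON f.contactid = c.contactid
LEFT JOIN phonedetails    p ON p.contactid = c.contactid
WHERE c.contactid = 1000;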
Hope this will help :)

Dynamic database structure

I would like some database/programming suggestion on a specific issue.
I have 5 different people (who live in different parts of the world) that provide me with data. This data is given to me in a wide variety of ways, following a standard structure/layout. However, it's not always harmonized; the data might have extra things that are not in the standard, so I'd like the structure to be as dynamic as possible to accommodate what each person wants to use.
These 5 data sources are then placed inside a central database I host. So basically I have 5 data sources that are formatted following a standard structure, and they are uploaded to my local database.
I want to automate the upload of this data as much as possible for the person providing the data, so I want them to upload new sets of data that are automatically inserted in my local db.
My questions are:
How should I keep the structure dynamic without having to revisit my standard layout to accommodate new fields of data, or different structure?
How do I make them upload data in a way that is incremental? For example they might be uploading an XML version of their data, my upload code should figure out what already exists.
My final and most important question. Are there better ways of going about this instead of having an upload infrastructure?
How should I keep the structure dynamic without having to revisit my standard layout to accommodate new fields of data, or different structure?
Basically, you pivot the normal database idea of columns and rows.
You have a data name table, which consists of the unique names of the fields of data, and an indicator to tell the import process what type of data is stored, like a date, timestamp, or integer.
You have a data table, which contains the data name id, a sequence number, the data field, and a foreign key to identifying information.
The sequence number is used to differentiate between different values of the same data name.
The data field holds every type of data possible. This would be a VARCHAR(MAX) in most databases. It's up to the upload process to convert dates and numbers to strings.
Your data table will have a foreign key to the rest of the information that identifies who the data field belongs to.
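A minimal sketch of that name/value layout (often called entity-attribute-value); the table called submission here stands in for the "identifying information" and is my own illustration, as are all the names:

-- One row per distinct field name, plus how the import process should interpret the string
CREATE TABLE data_name (
    data_name_id INT PRIMARY KEY,
    name         VARCHAR(100),            -- 'collection_date', 'sample_count', ...
    data_type    VARCHAR(20)              -- 'date', 'timestamp', 'integer', 'text'
);

-- The identifying information: who sent the data and when
CREATE TABLE submission (
    submission_id INT PRIMARY KEY,
    source_name   VARCHAR(100),
    received_at   TIMESTAMP
);

-- The pivoted data itself: every value stored as a string
CREATE TABLE data_value (
    submission_id INT REFERENCES submission(submission_id),
    data_name_id  INT REFERENCES data_name(data_name_id),
    sequence_no   INT,                    -- differentiates repeated values of the same name
    data_field    VARCHAR(4000),          -- the upload process converts dates and numbers to strings
    PRIMARY KEY (submission_id, data_name_id, sequence_no)
);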
How do I make them upload data in a way that is incremental? For example they might be uploading an XML version of their data, my upload code should figure out what already exists.
The short answer is that you can't.
Your upload process has to identify duplicate data and not store it on the database.
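One common way to do that duplicate check is to load each upload into a staging table first and copy across only the rows that are not already present; a hedged sketch, reusing the illustrative data_value layout from above:

-- Staging table holding the freshly parsed upload (same shape as data_value)
CREATE TABLE staging_value (
    submission_id INT,
    data_name_id  INT,
    sequence_no   INT,
    data_field    VARCHAR(4000)
);

-- Insert only rows whose (submission, name, sequence) slot is not already filled
INSERT INTO data_value (submission_id, data_name_id, sequence_no, data_field)
SELECT s.submission_id, s.data_name_id, s.sequence_no, s.data_field
FROM staging_value s
WHERE NOT EXISTS (
    SELECT 1
    FROM data_value d
    WHERE d.submission_id = s.submission_id
      AND d.data_name_id  = s.data_name_id
      AND d.sequence_no   = s.sequence_no
);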
My final and most important question. Are there better ways of going about this instead of having an upload infrastructure?
This is a hard question to answer without knowing more about the type of data you're receiving, but there is software that allows you to load databases without a lot of programming, by defining the input data structure and mapping that structure to your database tables.
This is a very general question, but I think I have a general answer. What I think solves your problem is a relational design in which the properties attached to the master record are not pre-determined. Here is an example involving a phone book application.
Common method using a non-relational table:
Table PERSON has columns Name, HomePhone, OfficePhone.
All well and good, but what do you do if the occasional person shows up with a mobile phone, more than one mobile phone, a fax line, etc.?
Instead what you do is:
Table Person has columns Person_ID, Name.
Table Phones has columns Person_ID, Phone_Type, PhoneNumber.
There is a one-to-many relationship between Person and Phones, and there can be any number of them from zero to a zillion. The tables are JOINed by Person_ID. You have to have business and presentation logic that enumerates the Phone_Type column (or just let it be free-form, which is not as useful but easier).
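A sketch of those two tables and the JOIN in SQL (the column types are guesses):

CREATE TABLE Person (
    Person_ID INT PRIMARY KEY,
    Name      VARCHAR(100)
);

CREATE TABLE Phones (
    Person_ID   INT REFERENCES Person(Person_ID),
    Phone_Type  VARCHAR(20),          -- 'home', 'office', 'mobile', 'fax', ...
    PhoneNumber VARCHAR(30)
);

-- All numbers for one person, anywhere from zero rows to a zillion
SELECT p.Name, ph.Phone_Type, ph.PhoneNumber
FROM Person p
JOIN Phones ph ON ph.Person_ID = p.Person_ID
WHERE p.Person_ID = 1;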
You can do that for any property, and that is what relational databases are all about. I hope this helps.
As others have said, EAV tables can handle a dynamic structure (but be aware of performance issues on large tables).
But is it in your interest to have your database fields dictated by the client? You can't write business logic to act upon those new fields because they don't exist yet; they could be anything.
Can you force the client to conform to your model? This allows you to know the fields ahead of time and have business logic act upon the fields. It allows you to write meaningful reports as well, rather than just pivoted data dumps.
