Storing user profile data from multiple lookup tables: how?

I have a user table which holds about 50 pieces of data per user: religion, political party, ethnicity, city, favorite movies, etc. Each of these items is a lookup value, either from its own lookup table or from a common lookup table that I use for the small items like gender, sex preference, etc. Even favorite movie comes from a movie lookup table.
I assume that in the member table all of these will be stored as IDs and not text? So, my questions:
1) Should they or should they not have FKs to the lookup tables?
2) If we store IDs, how do we get the actual answer text for output on the page (e.g. ID 6 in the city table = New York, ID 10 in the nationality table = American)? Do we need to SELECT from each lookup table in read mode to output the text value? This scares me because about 40 of the 50 pieces of data are lookup-based, which means 40 different SELECTs on 40 tables in page read mode, and again in edit mode when the user edits the values.
How is this implemented in real-world sites with detailed user profiles? (I have search and analytics on each value, so I need to store them as IDs.)

Depends on the scope, but this sounds like a sync process: set up a weekly/daily/hourly job to resync the extended user information into a master table with a foreign key to the core "user" table (username, password, email, update stamps, etc.).

What you've described is the big tradeoff between a normalized DB design and more of a flat-table design: the queries are a lot more complicated with the normalized design, which it sounds like you have.
I'd think that you'd be reading from the table a lot more than you'd be writing to it? (How often does a person's religion, gender, city, etc. change?) In that case, and only if you're running into performance issues on the read end, you might maintain two representations of the table: an extensible, normalized one like you have, and a plain-text, flat version that's fast and a piece of cake to query and read. Whenever you update a record in the normalized one, you update the matching record in the flat one.
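As a rough illustration of both points, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (member, city, nationality, member_flat) and the helper functions are hypothetical, and only two of the roughly 40 lookups are shown. It assumes the lookups are stored as IDs with foreign keys, reads a profile with one joined query instead of one SELECT per lookup table, and refreshes a flat copy whenever the normalized row is written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE city        (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE nationality (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE member (
    id             INTEGER PRIMARY KEY,
    username       TEXT NOT NULL,
    city_id        INTEGER REFERENCES city(id),
    nationality_id INTEGER REFERENCES nationality(id)
);
-- Hypothetical flat, read-optimised copy that stores the display text
CREATE TABLE member_flat (
    member_id   INTEGER PRIMARY KEY,
    username    TEXT,
    city        TEXT,
    nationality TEXT
);
INSERT INTO city VALUES (6, 'New York');
INSERT INTO nationality VALUES (10, 'American');
""")

# One query with joins replaces one SELECT per lookup table.
PROFILE_SQL = """
SELECT m.id, m.username, c.name AS city, n.name AS nationality
FROM member m
LEFT JOIN city c        ON c.id = m.city_id
LEFT JOIN nationality n ON n.id = m.nationality_id
WHERE m.id = ?
"""

def read_profile(member_id):
    """Return (id, username, city text, nationality text) for one member."""
    return conn.execute(PROFILE_SQL, (member_id,)).fetchone()

def save_member(member_id, username, city_id, nationality_id):
    """Write the normalized row, then refresh the flat copy in one transaction."""
    with conn:
        cur = conn.execute(
            "UPDATE member SET username = ?, city_id = ?, nationality_id = ? "
            "WHERE id = ?",
            (username, city_id, nationality_id, member_id),
        )
        if cur.rowcount == 0:  # no existing row, so insert a new one
            conn.execute(
                "INSERT INTO member VALUES (?, ?, ?, ?)",
                (member_id, username, city_id, nationality_id),
            )
        conn.execute(
            "INSERT OR REPLACE INTO member_flat "
            "(member_id, username, city, nationality)" + PROFILE_SQL,
            (member_id,),
        )

save_member(1, "alice", 6, 10)
print(read_profile(1))  # (1, 'alice', 'New York', 'American')
```

With the foreign keys in place, saving a member with a city ID that does not exist in the lookup table fails instead of silently storing a dangling ID. The flat copy is only worth maintaining if the joined read actually turns out to be a bottleneck.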

Related

Oracle APEX - Data Modeling & Primary Keys

I'm creating a rather large APEX application which allows managers to go in and record statistics for associates in the company. Currently we have an Oracle database with data from AD which holds all the associates' information: name, manager, employee ID, etc.
Now I'm responsible for creating and modeling a table that will house all the stats for each employee. The table I have created has over 90 columns in it. Some contain data such as:
Documents Processed
Calls Received
Amount of Doc 1 Processed
Amount of Doc 2 Processed
and the list goes on for well over 90 attributes. So here is my question:
When creating this table in my application with so many different columns, how would I go about choosing an appropriate primary key? Should I link it to our employee table using the employee's identification, which is unique (each has an associate number)?
Secondly, how can I create these tables (and possibly a form) so that I can associate the statistic I am entering with the actual individual?
I have ordered two books from Amazon on data modeling since I am new to APEX and database design. I'm not a complete beginner, but I'm new enough to need some guidance. An additional problem I am running into is that each form can have only 60 fields. So I had thought about splitting my 90+ columns into separate tables for different functions.
Thanks
APEX 4.2 allows for 200 items per page (see the Oracle APEX component limits).
A couple of questions come to mind:
Are you sure that the employee IDs are not recyclable? If these IDs are unique and not recycled, you've found yourself a good primary key.
What do you plan on doing when you decide to add a new metric? It seems like you would have to add a new column to your rather large and likely not normalized table.
I'd recommend a vertical table for your metrics. You can use Oracle's PIVOT clause to make your data appear more like a horizontal table.
If you went this route you would store your employee ID in one column, your metric key in another, and value...
I'd recommend that you create a metric table consisting of a primary key, a metric label, an active indicator, a creation timestamp, a creation user ID, a modified timestamp, and a modified user ID.
This metric table will allow you to add new metrics, change the name of a metric, deactivate a metric, and determine who changed what and when.
This would be a much more flexible approach, in my opinion. You may also want to think about audit logs.
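The vertical layout suggested above might look roughly like the following sketch, written with Python's built-in sqlite3 module rather than Oracle. All table and column names and the sample values are hypothetical, and the composite primary key is an assumption (a real design would probably also record the reporting period). Because SQLite has no PIVOT clause, the sketch swaps in conditional aggregation to produce one column per metric.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Metric definitions: adding a metric is an INSERT, not a new column
CREATE TABLE metric (
    metric_id   INTEGER PRIMARY KEY,
    label       TEXT NOT NULL UNIQUE,
    active      INTEGER NOT NULL DEFAULT 1,
    created_at  TEXT DEFAULT CURRENT_TIMESTAMP,
    created_by  TEXT,
    modified_at TEXT,
    modified_by TEXT
);
-- Vertical table: one row per employee per metric (assumed key)
CREATE TABLE employee_metric (
    employee_id INTEGER NOT NULL,
    metric_id   INTEGER NOT NULL REFERENCES metric(metric_id),
    value       NUMERIC NOT NULL,
    PRIMARY KEY (employee_id, metric_id)
);
INSERT INTO metric (metric_id, label) VALUES
    (1, 'Documents Processed'), (2, 'Calls Received');
INSERT INTO employee_metric VALUES (1001, 1, 42), (1001, 2, 17), (1002, 1, 8);
""")

def pivot(labels):
    """Return one row per employee with one column per requested metric label."""
    cols = ", ".join(
        f"MAX(CASE WHEN m.label = ? THEN em.value END) AS c{i}"
        for i, _ in enumerate(labels)
    )
    sql = f"""
        SELECT em.employee_id, {cols}
        FROM employee_metric em
        JOIN metric m ON m.metric_id = em.metric_id
        WHERE m.active = 1
        GROUP BY em.employee_id
        ORDER BY em.employee_id
    """
    return conn.execute(sql, list(labels)).fetchall()

rows = pivot(["Documents Processed", "Calls Received"])
print(rows)  # [(1001, 42, 17), (1002, 8, None)]
```

An employee with no value for a metric simply has no row, which shows up as NULL (None) in the pivoted result.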

T-SQL - Using GUID's Across tables for storing common information?

I started building a database to manage things like vehicles, finances, bills, work history, residence history, etc.
I'm mostly doing this as a learning exercise to teach myself different methods of schema design and ground-up development. I've been a database developer for 4 years, but I've only ever worked on the same system/schema. Even my ground-up development of new features still has to abide by the existing schema.
Anyway, I have tables for things like Vehicle Registration, Vehicle Loans/Purchases, Bills, etc. But rather than storing their billing info (payment amount, occurrence, etc.) in each table, I thought maybe I could put a GUID column on every table, store the billing info by GUID, and then have some sort of view or function that lets me look up the object_id (table) for a particular GUID.
Is this design method a good way to do things?
How would I go about designing the function/view to return the object_id / table name for specific GUIDs?
EDIT: I guess this is a poor explanation, so here's a quick example:
Table: VehicleRegistration
This table would have information like license plate, when the registration is good for, etc.
Table: VehicleLoan
This table would have loan information about each vehicle (amount financed, term, apr, date of purchase, etc).
Both of those tables would also have information like Billing Date, Occurrence, Estimated Amount, etc. But instead of storing that data in the two tables, I would store it in another table called BillingInfo or something like that. Obviously I could add an FK on the two tables that points to the PK of the BillingInfo table. But that would mean every table that requires billing information would need that FK on it. Rather than creating that FK on every table, what if instead every row in VehicleRegistration and VehicleLoan had a unique ID, and I stored the billing info by that unique ID?
And since it's unique across tables, I would have a function or view to tell me which table that GUID is in. (Keep in mind, this is a very small personal database, so for now, speed and optimization are not a concern.)
If I applied this method to all common info, like billing information, then I could avoid having to put an FK on every table that needs it. I could just create the table and use the unique IDs instead?
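To make the idea concrete, here is one way the lookup described in the question could be sketched. It uses Python's built-in sqlite3 and uuid modules rather than T-SQL, and the view name GuidOwner, the column names, and the sample values are all made up for illustration. The view is a plain UNION ALL over every table that carries a GUID, so it has to be extended by hand whenever a new table joins the scheme.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE VehicleRegistration (guid TEXT PRIMARY KEY, license_plate TEXT);
CREATE TABLE VehicleLoan         (guid TEXT PRIMARY KEY, amount_financed REAL);
-- Billing rows are keyed by the GUID of whatever row they describe
CREATE TABLE BillingInfo (
    owner_guid       TEXT PRIMARY KEY,
    billing_date     TEXT,
    estimated_amount REAL
);
-- Hypothetical lookup view: which table does a GUID live in?
CREATE VIEW GuidOwner AS
    SELECT guid, 'VehicleRegistration' AS table_name FROM VehicleRegistration
    UNION ALL
    SELECT guid, 'VehicleLoan' FROM VehicleLoan;
""")

reg_guid = str(uuid.uuid4())
loan_guid = str(uuid.uuid4())
conn.execute("INSERT INTO VehicleRegistration VALUES (?, ?)", (reg_guid, "ABC-123"))
conn.execute("INSERT INTO VehicleLoan VALUES (?, ?)", (loan_guid, 18000.0))
conn.execute(
    "INSERT INTO BillingInfo VALUES (?, ?, ?)", (loan_guid, "2024-01-15", 350.0)
)

def table_for(guid):
    """Return the name of the table that owns this GUID, or None if unknown."""
    row = conn.execute(
        "SELECT table_name FROM GuidOwner WHERE guid = ?", (guid,)
    ).fetchone()
    return row[0] if row else None

print(table_for(loan_guid))  # VehicleLoan
```

One consequence of this design is that BillingInfo.owner_guid cannot be declared as a foreign key, because it may point into several different tables; the database will not stop a billing row from referring to a GUID that no longer exists.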

Where should I store repetitive data in Access?

I'm creating a small Access DB for the HR department to store all data related to the training sessions that the company organizes for its employees.
So, I have a Training Session table with information like date, subject, place, observations, trainer, etc., and a unique ID number.
Then there's the Personnel table, with employee ID (which is also the table's unique ID number), name and working department.
After that, I need another table that keeps a record of all the attendees of each training session. And here's the question: should I use a table for that in the first place? Does there have to be one table per training session to store the attendees?
I've used Excel for quite some time now, but I'm very new to Access and databases (even small ones like this). Any information will be highly appreciated.
Thanks in advance!
It should be one table for persons, one table for trainings, and one for participation/attendance, to minimize (or best: avoid) repetition. Your tables should use primary and foreign keys, so that there are one-to-many relationships between trainings and attendances as well as people and attendances (the attendances table would then have a column referring to the person who attended, and another column referring to the training session).
Google "database normalization" for more detail and variations of that principle (https://en.wikipedia.org/wiki/Database_normalization).
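The three-table layout described in this answer could be sketched as follows. The sketch uses Python's built-in sqlite3 module instead of Access, and the table names, column names, and sample rows are hypothetical. The attendance table holds nothing but the two foreign keys, so one table serves every training session.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE personnel (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    department  TEXT
);
CREATE TABLE training_session (
    session_id INTEGER PRIMARY KEY,
    held_on    TEXT,
    subject    TEXT
);
-- One row per (session, attendee); a single table covers all sessions
CREATE TABLE attendance (
    session_id  INTEGER NOT NULL REFERENCES training_session(session_id),
    employee_id INTEGER NOT NULL REFERENCES personnel(employee_id),
    PRIMARY KEY (session_id, employee_id)
);
INSERT INTO personnel VALUES (1, 'Ana', 'Sales'), (2, 'Ben', 'IT');
INSERT INTO training_session VALUES
    (10, '2024-03-01', 'Fire safety'), (11, '2024-04-01', 'First aid');
INSERT INTO attendance VALUES (10, 1), (10, 2), (11, 2);
""")

def attendees(session_id):
    """Names of everyone who attended the given session, alphabetically."""
    rows = conn.execute(
        """
        SELECT p.name
        FROM attendance a
        JOIN personnel p ON p.employee_id = a.employee_id
        WHERE a.session_id = ?
        ORDER BY p.name
        """,
        (session_id,),
    )
    return [name for (name,) in rows]

print(attendees(10))  # ['Ana', 'Ben']
```

The composite primary key on attendance also prevents the same person from being recorded twice for the same session.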

What is the best way of storing a geographical information in a relation db?

I want to save geographical data in a relational DB and be able to query for data based on location (country, state or similar, not coordinates).
My current solution is to have 4 extra fields (all countries I'm interested in have 2 or 3 administrative divisions) in my table and filter on strings. But I realize that this is a bad solution and would like to normalize my table.
I will also use that data to determine which page my users want to visit, so it must be simple to look up a request like "/usa/california/san_fransisco/..."
The only other solution I can come up with is to store those 4 extra fields in another table and link them with a foreign key, but that would still mean some data duplication, as the country name would be duplicated in a lot of rows.
Is there any better way of doing this?
Normalizing is definitely the way to go. Databases are designed to function that way. Yes, the query might look long, but it's not that bad. It might look something like this:
select * -- or whatever fields you need
from Customer
left outer join City on (Customer.CityID = City.CityID)
left outer join State on (City.StateID = State.StateID)
left outer join Country on (State.CountryID = Country.CountryID)
where Customer.CustomerID = 1234
You're on the right track with putting the info in tables. They're called lookup tables. If you want to go the full relational route, you can have the entity link with a foreign key to the city lookup table. The city table links to the state table. The state table links to the country table. You could also store a text version of the complete location in the entity's original table for data display.
> My current solution is to have 4 extra fields (all countries I'm interested in have 2 or 3 administrative divisions) in my table and filter on strings. But I realize that this is a bad solution and would like to normalize my table.
I don't think that this is a bad solution. Storing simple geographical/address-based information per row and using WHERE to fetch all records that match is fairly standard procedure. Using a foreign key to link to a separate table is going to be additional work and won't be any faster.
The searching/request using a RESTful interface (as you suggested) is a good idea, however.
Go the normalized route. Joining tables is NOT slow or complicated. The PK of each table will be an integer with a clustered index. Foreign keys should have an index too (add one yourself if your database does not create it automatically). The join is going to fly.
If you want to list cities in a drop-down list, you don't want duplicates. You may list all the cities under a state. With the de-normalized design you need "distinct" for that, and I guarantee you that is slower than the normalized route. Ironic?
But there is a case for de-normalized. There are millions of addresses. It will probably not be feasible to enter all addresses in your application. So you are going to rely on free-text input from the user. In this case you don't care about exact correctness or duplicates; you are forced to accept whatever data is thrown at you, because you cannot have exhaustive data to validate against. And you would rather not bother inserting into "lookup" tables, as you don't trust the input to begin with.
You could go for a recursive model if you want ultra flexibility to handle different countries. Some countries may not have states, counties, etc. They all have their own hierarchy.
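A recursive model of this kind could be sketched as a single self-referencing table, shown here with Python's built-in sqlite3 module. The location table, its slug column, the sample rows, and the helper functions are all hypothetical. Each row points at its parent, so a country with three levels and a country with one level fit in the same table, and a request path is resolved one segment at a time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Adjacency list: every place points at its parent (NULL for a country)
CREATE TABLE location (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES location(id),
    slug      TEXT NOT NULL,
    UNIQUE (parent_id, slug)
);
INSERT INTO location VALUES
    (1, NULL, 'usa'),
    (2, 1,    'california'),
    (3, 2,    'san_francisco'),
    (4, NULL, 'monaco');
""")

def resolve(path):
    """Map a path such as '/usa/california/san_francisco' to a location id."""
    current = None
    for slug in path.strip("/").split("/"):
        row = conn.execute(
            # IS (rather than =) also matches a NULL parent for top-level rows
            "SELECT id FROM location WHERE slug = ? AND parent_id IS ?",
            (slug, current),
        ).fetchone()
        if row is None:
            return None
        current = row[0]
    return current

def full_path(location_id):
    """Rebuild the path for a location by walking up to the root."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(id, parent_id, slug, depth) AS (
            SELECT id, parent_id, slug, 0 FROM location WHERE id = ?
            UNION ALL
            SELECT l.id, l.parent_id, l.slug, c.depth + 1
            FROM location l JOIN chain c ON l.id = c.parent_id
        )
        SELECT slug FROM chain ORDER BY depth DESC
        """,
        (location_id,),
    ).fetchall()
    return "/" + "/".join(slug for (slug,) in rows)

print(resolve("/usa/california/san_francisco"))  # 3
print(full_path(3))  # /usa/california/san_francisco
```

An entity such as a customer would then store a single location_id pointing at the most specific level it knows, and the country or state can be found by walking up the chain.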

Storing Preferences/One-to-One Relationships in Database

What is the best way to store settings for certain objects in my database?
Method one: Using a single table
Table: Company {CompanyID, CompanyName, AutoEmail, AutoEmailAddress, AutoPrint, AutoPrintPrinter}
Method two: Using two tables
Table: Company {CompanyID, CompanyName}
Table: CompanySettings {CompanyID, AutoEmail, AutoEmailAddress, AutoPrint, AutoPrintPrinter}
I would take things a step further...
Table 1 - Company
  CompanyID (int)
  CompanyName (string)
  Example: CompanyID = 1, CompanyName = "Swift Point"
Table 2 - Contact Types
  ContactTypeID (int)
  ContactType (string)
  Example: ContactTypeID = 1, ContactType = "AutoEmail"
Table 3 - Company Contact
  CompanyID (int)
  ContactTypeID (int)
  Addressing (string)
  Example: CompanyID = 1, ContactTypeID = 1, Addressing = "name#address.blah"
This solution gives you extensibility, as you won't need to add columns to cope with new contact types in the future.
SELECT
[company].CompanyID,
[company].CompanyName,
[contacttype].ContactTypeID,
[contacttype].ContactType,
[companycontact].Addressing
FROM
[company]
INNER JOIN
[companycontact] ON [companycontact].CompanyID = [company].CompanyID
INNER JOIN
[contacttype] ON [contacttype].ContactTypeID = [companycontact].ContactTypeID
This would give you multiple rows for each company: a row for "AutoEmail", a row for "AutoPrint", and maybe in the future a row for "ManualEmail", "AutoFax" or even "AutoTeleport".
Response to HLEM.
Yes, this is indeed the EAV model. It is useful where you want to have an extensible list of attributes with similar data. In this case, varying methods of contact with a string that represents the "address" of the contact.
If you didn't want to use the EAV model, you should next consider relational tables, rather than storing the data in flat tables. This is because this data will almost certainly extend.
Neither EAV model nor the relational model significantly slow queries. Joins are actually very fast, compared with (for example) a sort. Returning a record for a company with all of its associated contact types, or indeed a specific contact type would be very fast. I am working on a financial MS SQL database with millions of rows and similar data models and have no problem returning significant amounts of data in sub-second timings.
In terms of complexity, this isn't the most technical design in database modelling, and the concept of joining tables is most definitely below what I would consider "intermediate"-level database development.
I would decide whether you need one or two tables based on the following criteria:
First, are you close to the record storage limit? Then two tables, definitely.
Second, will you usually be querying the information you plan to put in the second table most of the time you query the first table? Then one table might make more sense. If you usually do not need the extended information, a separate (and less wide) table should improve performance on the main data queries.
Third, how strong a possibility is it that you will ever need multiple values? If it is one-to-one now, but it is something like an email address or phone number that has a strong possibility of morphing into multiple rows, go ahead and make it a related table. If you know there is no chance or only a small chance, then it is OK to keep it in one table, assuming the table isn't too wide.
EAV tables look like they are nice and will save future work, but in reality they don't. Generally, if you need to add another type, you need to do future work to adjust queries etc. Writing a script to add a column takes all of five minutes; the other work will need to be done regardless of the structure. EAV tables are also very hard to query when you don't know how many records you will need to pull, because normally you want them on one line and will get the information by joining to the same table multiple times. This causes performance problems and locking, especially if this table is central to your design. Don't use this method.
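To show what "joining to the same table multiple times" means in practice, here is a minimal sketch using Python's built-in sqlite3 module and the Company / ContactType / CompanyContact tables from the earlier answer. The composite primary key and the 'Printer-A' value are assumptions added for illustration. Getting both settings on one line needs one extra join per attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company     (CompanyID INTEGER PRIMARY KEY, CompanyName TEXT);
CREATE TABLE ContactType (ContactTypeID INTEGER PRIMARY KEY, ContactType TEXT);
CREATE TABLE CompanyContact (
    CompanyID     INTEGER REFERENCES Company(CompanyID),
    ContactTypeID INTEGER REFERENCES ContactType(ContactTypeID),
    Addressing    TEXT,
    PRIMARY KEY (CompanyID, ContactTypeID)  -- assumed: one value per type
);
INSERT INTO Company VALUES (1, 'Swift Point');
INSERT INTO ContactType VALUES (1, 'AutoEmail'), (2, 'AutoPrint');
INSERT INTO CompanyContact VALUES
    (1, 1, 'name#address.blah'),
    (1, 2, 'Printer-A');  -- hypothetical printer name
""")

# Each attribute wanted on the same output row costs another join
# against CompanyContact, filtered to a single contact type.
ONE_ROW_SQL = """
SELECT c.CompanyName,
       e.Addressing AS AutoEmailAddress,
       p.Addressing AS AutoPrintPrinter
FROM Company c
LEFT JOIN CompanyContact e
       ON e.CompanyID = c.CompanyID AND e.ContactTypeID = 1
LEFT JOIN CompanyContact p
       ON p.CompanyID = c.CompanyID AND p.ContactTypeID = 2
WHERE c.CompanyID = ?
"""

row = conn.execute(ONE_ROW_SQL, (1,)).fetchone()
print(row)  # ('Swift Point', 'name#address.blah', 'Printer-A')
```

Adding a third contact type means editing this query again, which is the kind of follow-up work the answer argues the EAV layout does not save.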
It depends on whether you will ever need more information about a company. If you notice yourself adding fields like companyphonenumber1, companyphonenumber2, etc., then method 2 is better, as you would separate your entities and just reference a company ID. If you do not plan to make these changes and you feel that this table will never change, then method 1 is fine.
Usually, if you don't have data duplication, then a single table is fine.
In your case you don't, so the first method is OK.
I use one table if I estimate the data from the "second" table will be used in more than 50% of my queries. I use two tables if I need multiple copies of the data (e.g. multiple phone numbers, email addresses, etc.).
