Questions about DB modelling - database

How would you model these relationships in a db?
You have a Page entity that can contain PageElements.
A PageElement can, for instance, be an Article or a Picture. An Article table obviously has different members / columns than a Picture. An Article could have, e.g., "Title", "Lead", and "Body" columns that are all of type nvarchar, while a Picture might have something like "AltText", "Path", "Width", "Height". I'd like this to be extensible; who knows what PageElements I might need in 3 months? So I guess I'd need a PageElementTypes table.
For the relationships, what about tables like these:
Pages with an Id, and other mumbo jumbo. (Create Date, Visible, what not)
Pages_PageElements with PageId and PageElementId.
PageElements with an Id and a PageElementTypeId and more mumbojumbo (SortOrder, Visibility etc.).
PageElementTypes with an Id and a Name (for instance "Article", "Picture", "AddressBlock")
Now, should I create a PageElementId column in each of the Articles, Pictures, and AddressBlocks tables to finish things up? That's where I'm a bit stuck; it's a simple 1:1 relationship, so this should work, but I have the feeling I might be missing something.
Follow up:
Wouldn't the recommended solutions below with separate attribute tables force me to store all attributes as the same type? What if one PageElement has attributes that are nvarchar(255) and some that are nvarchar(1000)? What if some are integers?
If I go the EAV way, I would have to create tons of tables to hold the attribute values for all the different data types out there.

The two common choices are Single Table Inheritance and Multi Table Inheritance. Other approaches include having a table for each concrete class (which I've never used), and what I'd call a meta-table implementation, where the attribute definitions are moved into data rather than into any sort of schema.
I've had generally good experiences with STI, and provided you don't expect a plethora of classes and attributes it's the simplest solution. Simple is very good in my book.
Unless new page element types need to be created by users at runtime, I'd avoid the meta-tables approach and anything that begins to look like it. In my experience such code quickly becomes a quagmire and rarely delivers much value compared to a more concrete implementation updated at regular intervals by developers.
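For reference, a minimal single table inheritance sketch in SQL, using the column names from the question; the discriminator column, types, and defaults are my assumptions rather than anything prescribed here:

-- STI: one table holds every element type; a discriminator column records
-- which type a row is, and columns that don't apply simply stay NULL.
CREATE TABLE PageElements (
    Id          int IDENTITY(1,1) PRIMARY KEY,
    ElementType nvarchar(50) NOT NULL,   -- 'Article', 'Picture', ...
    SortOrder   int NOT NULL DEFAULT 0,
    -- Article columns (NULL for other types)
    Title       nvarchar(255)  NULL,
    Lead        nvarchar(1000) NULL,
    Body        nvarchar(max)  NULL,
    -- Picture columns (NULL for other types)
    AltText     nvarchar(255)  NULL,
    Path        nvarchar(255)  NULL,
    Width       int            NULL,
    Height      int            NULL
);

Adding a new element type later means adding a few nullable columns - a schema change, but a small and explicit one.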

Just as you have configured Page Elements, you need to configure the Attributes associated with the Page Elements.
So we have two items that are extensible Page Elements & their Attributes.
I suggest the following tables:
Page : Page ID | ...
Page Elements : Page Element ID | Element Type ID | Page ID | ...
Page Element Type : Element Type ID | Page Element Type Label
Page Element Attribute Type : Attribute Type ID | Element Type ID | Attribute Label
Page Element Attributes : Page Element ID | Attribute Type ID | Attribute Value
The Page Element Attribute Type table will contain the list of attributes associated with an element. Example:
Attribute Type ID 1 | Article | "Title"
Attribute Type ID 2 | Article | "Lead"
Attribute Type ID 3 | Picture | "AltText"
The Page Element Attributes table will store the actual values for the attributes associated with a page element. Example:
Page Element ID 1 | Attribute Type ID 1 | "Everybody Loves Raymond"
Page Element ID 2 | Attribute Type ID 3 | "World Map"
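A rough translation of those tables into SQL could look like the following; the column types and key constraints are my assumptions, since the answer only names the columns:

CREATE TABLE Page (
    PageID int PRIMARY KEY
    -- create date, visibility, ...
);

CREATE TABLE PageElementType (
    ElementTypeID        int PRIMARY KEY,
    PageElementTypeLabel nvarchar(50) NOT NULL        -- 'Article', 'Picture', ...
);

CREATE TABLE PageElement (
    PageElementID int PRIMARY KEY,
    ElementTypeID int NOT NULL REFERENCES PageElementType(ElementTypeID),
    PageID        int NOT NULL REFERENCES Page(PageID)
);

CREATE TABLE PageElementAttributeType (
    AttributeTypeID int PRIMARY KEY,
    ElementTypeID   int NOT NULL REFERENCES PageElementType(ElementTypeID),
    AttributeLabel  nvarchar(100) NOT NULL             -- 'Title', 'Lead', 'AltText', ...
);

CREATE TABLE PageElementAttribute (
    PageElementID   int NOT NULL REFERENCES PageElement(PageElementID),
    AttributeTypeID int NOT NULL REFERENCES PageElementAttributeType(AttributeTypeID),
    AttributeValue  nvarchar(max) NULL,
    PRIMARY KEY (PageElementID, AttributeTypeID)
);

Note that AttributeValue ends up as one wide catch-all type (nvarchar(max) here), which is exactly the typing concern raised in the follow-up question above.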

The universal solution would be:
PageElementType: ID, Name, [Mumbo Jumbo]
PageElementTypeParameter: ID, PageElementTypeID, [Mumbo Jumbo]
Page: ID, [Mumbo Jumbo]
PageElement: ID, PageElementTypeID, [Mumbo Jumbo]
PageElementParameters: ID, PageElementID, PageElementTypeParameterID, Value, [Mumbo Jumbo]
In plain words: there is a table for page element types, and an associated table which lists the possible parameters for each page element (like SRC and ALT for an image; TEXT for an article; etc.).
Then there is a table with all the pages; an associated table which lists elements in each page; and a table which lists parameter values for each element.
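Assuming the page-to-element link lives on PageElement as a PageID column (it could equally be hidden in the [Mumbo Jumbo] or in a separate link table), fetching everything needed to render one page might look roughly like this sketch:

-- All elements of page 1 with their parameter values; the parameter's
-- name/label presumably lives in PageElementTypeParameter's [Mumbo Jumbo].
SELECT pe.ID   AS PageElementID,
       pet.Name AS ElementTypeName,
       pep.PageElementTypeParameterID,
       pep.Value
FROM PageElement pe
INNER JOIN PageElementType pet
    ON pet.ID = pe.PageElementTypeID
INNER JOIN PageElementParameters pep
    ON pep.PageElementID = pe.ID
INNER JOIN PageElementTypeParameter petp
    ON petp.ID = pep.PageElementTypeParameterID
WHERE pe.PageID = 1              -- assumed page link
ORDER BY pe.ID, petp.ID;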

I use a different naming convention than you, but this is essentially what I would do:
PageElementType(PageElementTypeID, PageElementTypeName)
PageElement(PageElementID, PageElementTypeID)
Article(ArticleID, PageElementID, ...)
Picture(PictureID, PageElementID, ...)
Page(PageID, ...)
PageHasPageElement(PageHasPageElementID, PageID, PageElementID) => {PageID, PageElementID} are unique
This is what I do; it seems to be fairly well normalized and performs fine.
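In SQL, the 1:1 link between each concrete table and PageElement can be enforced with a unique foreign key. A sketch under that assumption (the types and the Article/Picture columns are illustrative, not part of this answer):

CREATE TABLE PageElementType (
    PageElementTypeID   int PRIMARY KEY,
    PageElementTypeName nvarchar(50) NOT NULL
);

CREATE TABLE PageElement (
    PageElementID     int PRIMARY KEY,
    PageElementTypeID int NOT NULL REFERENCES PageElementType(PageElementTypeID)
);

CREATE TABLE Article (
    ArticleID     int PRIMARY KEY,
    PageElementID int NOT NULL UNIQUE            -- UNIQUE makes it a true 1:1
                  REFERENCES PageElement(PageElementID),
    Title         nvarchar(255)  NULL,
    Lead          nvarchar(1000) NULL,
    Body          nvarchar(max)  NULL
);

CREATE TABLE Picture (
    PictureID     int PRIMARY KEY,
    PageElementID int NOT NULL UNIQUE
                  REFERENCES PageElement(PageElementID),
    AltText       nvarchar(255) NULL,
    Path          nvarchar(255) NULL
);

CREATE TABLE Page (
    PageID int PRIMARY KEY
);

CREATE TABLE PageHasPageElement (
    PageHasPageElementID int PRIMARY KEY,
    PageID               int NOT NULL REFERENCES Page(PageID),
    PageElementID        int NOT NULL REFERENCES PageElement(PageElementID),
    CONSTRAINT UQ_Page_PageElement UNIQUE (PageID, PageElementID)
);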

I guess I'll just go with what I've got; EAV is not an option for me. What I have now is a somewhat hybrid approach.

Related

How can I upsert to multiple external Ids in Salesforce?

I have an Account object in Salesforce and a custom field on it called ExternalText. I have marked the field as an External Id and
"Set this field as the unique record identifier from an external system"
There are 2 accounts that have this field set to a value of E1 in Salesforce.
I want to do an upsert from a csv file using DataLoader and the csv looks something like this:
External | Description
E1       | Description 1
E1       | Description 2
But when I do the upsert I get the error:
ExternalTest: more than one record found for external id field: [<id1>, <id2>]
I would have expected the Description field for both to be updated to Description 1 and then Description 2, so if I view the object in Salesforce the Description field would say Description 2.
How can I do this?
You can't do it like that. Upsert has to find 0 or exactly 1 record with that external id. On 0 it'll try to create, on 1 it'll try to update, anything else - error.
For most normal usages you'll want fields marked as external id to also be marked unique. If the value isn't unique at the source, you need a different value in your field, or bite the bullet, learn SF record IDs, and do a plain old query + update, for example.
There's one edge case that explains why an external id field isn't automatically marked unique, but if you rely on that technicality I'd say you have bigger problems. Imagine a system where both the UK and Germany created customer ID 123 and they both want to push it to Salesforce. They both claim they were first and absolutely won't change their unique ID. The trick is that you can pull it off with the right sharing rules: an upsert done as a user that only sees UK data will work and update only the UK customer. As I said, it's a technicality, in a "you think you're clever but you just made the admin's job trickier" area.

In Cucumber, background steps pass for the first scenario outline but fail for the second scenario outline

Feature: search by customer

  Background:
    Given user selects search type as customer

  Scenario Outline: search customer
    When slects customer type as customer
    Then enter the customer id as "<customer>" in search
    And clicks on search icon to search
    Examples:
      | customer |
      | 248069   |

  Scenario Outline: Search hierarchy
    When slects customer type as hierarchy
    Then enter the hierarchy id as "<hierarchy>" in search
    And clicks on search icon to search
    Examples:
      | hierarchy |
      | 3779213   |
If the second scenario results in an error when executed, I would modify the first scenario outline so that it could run both scenarios. You will need to parameterize the step definition for the first scenario outline something like this:
Scenario Outline: search user types
  When selects customer type as <type>
  Then enter the customer id as <id> in search
  And clicks on search icon to search
  Examples:
    | type      | id      |
    | customer  | 248069  |
    | hierarchy | 3779213 |
You will need to modify the step definitions that work (the ones used by the first scenario outline). My suspicion is that the step definition for the first step in the second scenario (slects customer type as hierarchy) is broken and causing your issue. Even if it is not defective, there is no good reason to have two step definitions that basically do the same thing. Pass a parameter and branch inside the method body based on the parameter passed if you need to execute an alternate path.
If you make these changes and the second scenario example fails, you can assume that it is due to a bad parameter being passed. In this case, the id parameter is one character longer in the second scenario example. It is possible this could be the problem.
Since you haven't provided a specific description of the error you are getting, it is impossible to say for certain what solution will work for you. That said, this is my best guess.

Database structure: how to best design for this issue?

I have users that have several objects and can upload images for those objects. Each object has several items. The photos the user uploads can be assigned to those items. The thing is, one object can have one specific item more than once.
To give an example: objects are cars and items are seats, windows, doors, etc. A car may have 5 seats, but all seats are the same item. The description of the image should, however, still be "seat 1", "seat 2", etc. and the user can upload multiple images for seat 2 as well.
Till now I have the following tables:
objects: id, name
items: id, name
assigned_items: id, object_id, item_id, quantity
images: id, object_id, item_id
How would you best solve this issue?
The reason I use quantity is that if the type of the item changes, it most probably changes for all of the items. E.g., 4 seats can become 4 wheels, etc. So if there were a row for each assigned_item, let's say seat1, seat2, seat3, etc., then this would be more difficult to change, no?
Take a look at this model:
It allows you to:
Connect multiple items to multiple objects (thanks to the OBJECT_ITEM table).
Connect the same item multiple times to the same object (thanks to the OBJECT_ITEM.POSITION field).
Connect multiple images to an object-item connection (thanks to the OBJECT_ITEM_IMAGE table). So, we are connecting to a connection, not directly to an item.
Name the image specifically for the object-item connection (thanks to the OBJECT_ITEM_IMAGE.IMAGE_NAME field), instead of just for the image.
Ensure the image name is unique per object-item connection (thanks to the UNIQUE constraint "U1").
NOTE: This model can be simplified in case the OBJECT:ITEM relationship is 1:N instead of M:N, but your own attempted model seems to suggest it is M:N.
NOTE: To connect an image directly to OBJECT (instead of OBJECT_ITEM), you'd need an additional link table (OBJECT_IMAGE) "between" OBJECT and IMAGE.
Example data:
OBJECT:
Car
ITEM:
Seat
OBJECT_ITEM:
Car-Seat-1
Car-Seat-2
Car-Seat-3
Car-Seat-4
Car-Seat-5
OBJECT_ITEM_IMAGE:
Car-Seat-1-Image1 "Seat1 Image"
Car-Seat-2-Image1 "Seat2 Image"
Car-Seat-2-Image2 "Seat2 Alternate Image"
Car-Seat-3-Image1 "Seat3 Image"
Car-Seat-4-Image1 "Seat4 Image"
Car-Seat-5-Image1 "Seat5 Image"
IMAGE:
Image1
Image2
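A DDL sketch of that model as I read the description above; the names follow the answer, the types are assumed, and OBJECT/ITEM/IMAGE are bracketed purely defensively:

CREATE TABLE [OBJECT] (
    OBJECT_ID int PRIMARY KEY,
    NAME      nvarchar(100) NOT NULL                -- e.g. 'Car'
);

CREATE TABLE [ITEM] (
    ITEM_ID int PRIMARY KEY,
    NAME    nvarchar(100) NOT NULL                  -- e.g. 'Seat'
);

CREATE TABLE [IMAGE] (
    IMAGE_ID int PRIMARY KEY
    -- path, upload date, ...
);

CREATE TABLE OBJECT_ITEM (
    OBJECT_ITEM_ID int PRIMARY KEY,
    OBJECT_ID      int NOT NULL REFERENCES [OBJECT](OBJECT_ID),
    ITEM_ID        int NOT NULL REFERENCES [ITEM](ITEM_ID),
    POSITION       int NOT NULL,                    -- seat 1, seat 2, ...
    CONSTRAINT U_OBJECT_ITEM UNIQUE (OBJECT_ID, ITEM_ID, POSITION)
);

CREATE TABLE OBJECT_ITEM_IMAGE (
    OBJECT_ITEM_ID int NOT NULL REFERENCES OBJECT_ITEM(OBJECT_ITEM_ID),
    IMAGE_ID       int NOT NULL REFERENCES [IMAGE](IMAGE_ID),
    IMAGE_NAME     nvarchar(100) NOT NULL,          -- e.g. 'Seat2 Alternate Image'
    PRIMARY KEY (OBJECT_ITEM_ID, IMAGE_ID),
    CONSTRAINT U1 UNIQUE (OBJECT_ITEM_ID, IMAGE_NAME)  -- image name unique per object-item
);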
Unless you actually mean that items can belong to multiple objects, using assigned_items is not helpful. If I understand you correctly, your main concern is that you sometimes have images that are for part of an item, so how do you describe the image?
Here is what I suggest:
OBJECT: id, name
ITEM: id, name, quantity, object_id
IMAGE: id, name (null), object_id (null), item_id (null)
If your DBMS supports constraints, have a constraint on IMAGE to enforce one or the other of object_id or item_id (but not both). This allows you to define the image as being either for an item or for the object as a whole.
When you query for the name of an image, you would use the COALESCE function (or your DB's equivalent) to pick up the image override name (if it exists) or the object/item name (if the override doesn't exist).
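A standalone sketch of this alternative design with the check constraint and the name lookup spelled out; the table layout follows the three lines above, while everything else (types, constraint names, bracketed identifiers) is my assumption:

CREATE TABLE [OBJECT] (
    id   int PRIMARY KEY,
    name nvarchar(100) NOT NULL
);

CREATE TABLE [ITEM] (
    id        int PRIMARY KEY,
    name      nvarchar(100) NOT NULL,
    quantity  int NOT NULL,
    object_id int NOT NULL REFERENCES [OBJECT](id)
);

CREATE TABLE [IMAGE] (
    id        int PRIMARY KEY,
    name      nvarchar(100) NULL,                 -- optional override for the display name
    object_id int NULL REFERENCES [OBJECT](id),   -- set when the image covers the whole object
    item_id   int NULL REFERENCES [ITEM](id),     -- set when the image is for one item
    -- exactly one of object_id / item_id must be set
    CONSTRAINT CK_IMAGE_target CHECK (
        (object_id IS NOT NULL AND item_id IS NULL) OR
        (object_id IS NULL AND item_id IS NOT NULL)
    )
);

-- Display name: the per-image override if present, otherwise the item/object name.
SELECT i.id,
       COALESCE(i.name, it.name, o.name) AS display_name
FROM [IMAGE] i
LEFT JOIN [ITEM]   it ON it.id = i.item_id
LEFT JOIN [OBJECT] o  ON o.id  = i.object_id;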

Best way to store user-submitted item names (and their synonyms)

Consider an e-commerce application with multiple stores. Each store owner can edit the item catalog of his store.
My current database schema is as follows:
item_names: id | name | description | picture | common(BOOL)
items: id | item_name_id | picture | price | description
item_synonyms: id | item_name_id | name | error(BOOL)
Notes: error indicates a wrong spelling (e.g. "Ericson"). description and picture of the item_names table are "globals" that can optionally be overridden by "local" description and picture fields of the items table (in case the store owner wants to supply a different picture for an item). common helps separate unique item names ("Jimmy Joe's Cheese Pizza" from "Cheese Pizza")
I think the bright side of this schema is:
Optimized searching & Handling Synonyms: I can query the item_names & item_synonyms tables using name LIKE '%QUERY%' and obtain the list of item_name_ids that need to be joined with the items table (a sketch follows this list). (Examples of synonyms: "Sony Ericsson", "Sony Ericson", "X10", "X 10")
Autocompletion: Again, a simple query to the item_names table. I can avoid the usage of DISTINCT and it minimizes number of variations ("Sony Ericsson Xperia™ X10", "Sony Ericsson - Xperia X10", "Xperia X10, Sony Ericsson")
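For concreteness, the synonym-aware lookup described in the first point might look like this against the schema above (the SQL itself is mine, not the poster's):

DECLARE @query nvarchar(100) = N'X10';   -- hypothetical search term

SELECT i.id, i.price
FROM items i
WHERE i.item_name_id IN (
    SELECT n.id
    FROM item_names n
    WHERE n.name LIKE '%' + @query + '%'
    UNION
    SELECT s.item_name_id
    FROM item_synonyms s
    WHERE s.name LIKE '%' + @query + '%'
);

Keep in mind that a leading-wildcard LIKE cannot use an ordinary index, which is one more argument for the full-text / Solr route suggested below.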
The down side would be:
Overhead: When inserting an item, I query item_names to see if this name already exists. If not, I create a new entry. When deleting an item, I count the number of entries with the same name. If this is the only item with that name, I delete the entry from the item_names table (just to keep things clean; accounts for possible erroneous submissions). And updating is the combination of both.
Weird Item Names: Store owners sometimes use sentences like "Harry Potter 1, 2 Books + CDs + Magic Hat". There's something off about having so much overhead to accommodate cases like this. This would perhaps be the prime reason I'm tempted to go for a schema like this:
items: id | name | picture | price | description
(... with item_names and item_synonyms as utility tables that I could query)
Is there a better schema you would suggested?
Should item names be normalized for autocomplete? Is this probably what Facebook does for "School", "City" entries?
Is the first schema or the second better/optimal for search?
Thanks in advance!
References: (1) Is normalizing a person's name going too far?, (2) Avoiding DISTINCT
EDIT: In the event of 2 items being entered with similar names, an Admin who sees this simply clicks "Make Synonym" which will convert one of the names into the synonym of the other. I don't require a way to automatically detect if an entered name is the synonym of the other. I'm hoping the autocomplete will take care of 95% of such cases. As the table set increases in size, the need to "Make Synonym" will decrease. Hope that clears the confusion.
UPDATE: To those who would like to know what I went ahead with... I've gone with the second schema but removed the item_names and item_synonyms tables in hopes that Solr will provide me with the ability to perform all the remaining tasks I need:
items: id | name | picture | price | description
Thanks everyone for the help!
The requirements you state in your comment ("Optimized searching", "Handling Synonyms" and "Autocomplete") are not things that are generally associated with an RDBMS. It sounds like what you're trying to solve is a searching problem, not a data storage and normalization problem. You might want to start looking at some search architectures like Solr.
Excerpted from the solr feature list:
Faceted Searching based on unique field values, explicit queries, or date ranges
Spelling suggestions for user queries
More Like This suggestions for given document
Auto-suggest functionality
Performance Optimizations
If there were more attributes exposed for mapping, I would suggest using a fast search index system. There is no need to set up aliases as the records are added; the attributes simply get indexed, and each search issued returns matches with a relevance score. Take the top X% as valid matches and display those.
Creating and storing aliases seems like a brute-force, labor intensive approach that probably won't be able to adjust to the needs of your users.
Just an idea.
One thing that comes to mind is sorting the characters in the name and synonym, throwing away all whitespace. This is similar to the solution for finding all anagrams of a word. The end result is the ability to quickly find similar entries. As you pointed out, all synonyms should converge into one single term, or name. The search is then performed against the synonyms using the sorted input string.

How to best represent items with variable # of attributes in a database?

Let's say you want to create a listing of widgets.
The widget manufacturers all create widgets with different numbers and types of attributes, and the widget sellers all have different preferences about which types and numbers of attributes they want to store in the database and display.
The problem is that each time you add a new widget, it may have attributes that do not currently exist for any other widget, and currently you handle this by modifying the table to add a new column for that attribute and then modifying all forms and reports to reflect this change.
How do you go about creating a database which takes into account that attributes on a widget are fluid and can change from widget to widget?
Ideally the widget attributes should be something the user can define according to his/her preferences and needs.
I would have a table for widgets and one for widget attributes. For example:
Widgets
- Id
- Name
WidgetAttributes
- Id
- Name
Then, you would have another table which maps which widgets have which attributes:
WidgetAttributeMap
- Id
- WidgetId
(a value from the Id column in the Widget table)
- WidgetAttributeId
(a value from the Id column in the WidgetAttribute table)
This way, you can add attributes to widgets by modifying rows in the WidgetAttributeMap table, not by modifying the structure of your widget table.
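A minimal sketch of those three tables in SQL (the types and sample rows are mine):

CREATE TABLE Widgets (
    Id   int PRIMARY KEY,
    Name nvarchar(100) NOT NULL
);

CREATE TABLE WidgetAttributes (
    Id   int PRIMARY KEY,
    Name nvarchar(100) NOT NULL
);

CREATE TABLE WidgetAttributeMap (
    Id                int PRIMARY KEY,
    WidgetId          int NOT NULL REFERENCES Widgets(Id),
    WidgetAttributeId int NOT NULL REFERENCES WidgetAttributes(Id)
);

-- Giving a widget a brand-new attribute is just data, not a schema change:
INSERT INTO Widgets            (Id, Name) VALUES (1, N'Widget1');
INSERT INTO WidgetAttributes   (Id, Name) VALUES (1, N'Color');
INSERT INTO WidgetAttributeMap (Id, WidgetId, WidgetAttributeId) VALUES (1, 1, 1);

The next answer extends this with a table for the attribute values themselves.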
casperOne is showing the way, although I would personally add yet one more table for the attribute values, ending up with
Widgets
-WidgetID (pk)
-Name
WidgetAttributes
-AttributeID (pk)
-Name
WidgetHasAttribute
-WidgetID (pk)
-AttributeID (pk)
WidgetAttributeValues
-ValueID (pk)
-WidgetID
-AttributeID
-Value
In order to retrieve the results, you want to join the tables and perform an aggregate concatenation, so you can end up with data looking like (for example):
Name Properties
Widget1 Attr1:Value1;Attr2:Value2;...etc
Then you could split the Properties string in your Business Logic Layer and use as you wish.
A suggestion on how to join the data:
SELECT w.Name, wa.Name + ':' + wav.Value
FROM Widgets w
INNER JOIN WidgetHasAttribute wha
    ON w.WidgetID = wha.WidgetID
INNER JOIN WidgetAttributes wa
    ON wha.AttributeID = wa.AttributeID
INNER JOIN WidgetAttributeValues wav
    ON w.WidgetID = wav.WidgetID
   AND wa.AttributeID = wav.AttributeID
You can read more on aggregate concatenation here.
As far as performance is concerned, it shouldn't be a problem as long as you make sure to index all columns that will be frequently read (example index statements are sketched after this list) - that is:
All the ID columns, as they will be compared in the join clauses
WidgetAttributes.Name and WidgetAttributeValues.Value, as they will be concatenated
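For example, index statements along these lines (the names are illustrative, and they assume the tables defined above):

-- Composite indexes to support the join above; Value is added as an included
-- column so the concatenation can be served straight from the index.
CREATE NONCLUSTERED INDEX IX_WidgetHasAttribute_Widget
    ON WidgetHasAttribute (WidgetID, AttributeID);
CREATE NONCLUSTERED INDEX IX_WidgetAttributeValues_Widget
    ON WidgetAttributeValues (WidgetID, AttributeID) INCLUDE (Value);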
