I have a field in a table in which I will be storing different kinds of data, like: X-Large, Medium, Small... or I might store: 22-March-2009, 1 Year, 2 Years, 3 Years... or 06 Months, 12 Months, 1 Year, or I might store: "33-36", "37-40"... and that data is not fixed; I might need to add new categories in the future.
The obvious choice of data type would be nvarchar(length), but are there any other suggestions? Is there a way around this?
Sounds like you're trying to store a "size". Maybe you need a "Size" table with those values in it (X-Large, Medium, Small, 1 Year, etc.) and an ID field that goes in the other table.
Why you would also want to store a date in the same field is a bit confusing to me. Are you sure you shouldn't have two different fields there?
ETA:
Based on your comment, I would suggest creating a couple additional tables:
SizeType - Would define the type of "size" you were working with (e.g. children's clothing, children's shoes, men's shoes, women's shoes, men's shirts, men's pants, women's shirts, women's pants, etc.). Would have two columns - an ID and a Description.
Size - Would define the individual sizes (e.g. "Size 5", XL, 33-34, 0-6 Months, etc.). Would have three columns - an ID, a Description, and the corresponding SizeType ID from SizeType.
Now on your product table, you would put the ID from the size table. This gives you some flexibility in terms of adding new sizes, figuring out which sizes go with which type of products, etc. You could break it down further as well to make the design even better, but I don't want to overcomplicate things here.
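As a rough sketch of the three tables described above, here is one way they could look, using Python's built-in sqlite3 module so it can be run anywhere. The Product table, the lowercase column names, and the sample rows are my own assumptions for illustration; on SQL Server the text columns would be nvarchar.

```python
import sqlite3

# In-memory database. Table names follow the answer; column names and
# sample rows are assumptions made up for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE SizeType (
    id          INTEGER PRIMARY KEY,
    description TEXT NOT NULL UNIQUE
);
CREATE TABLE Size (
    id           INTEGER PRIMARY KEY,
    description  TEXT NOT NULL,
    size_type_id INTEGER NOT NULL REFERENCES SizeType(id)
);
CREATE TABLE Product (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    size_id INTEGER NOT NULL REFERENCES Size(id)
);
""")

conn.executemany("INSERT INTO SizeType (id, description) VALUES (?, ?)",
                 [(1, "mens shirts"), (2, "childrens clothing")])
conn.executemany("INSERT INTO Size (id, description, size_type_id) VALUES (?, ?, ?)",
                 [(1, "X-Large", 1), (2, "Medium", 1), (3, "0-6 Months", 2)])
conn.execute("INSERT INTO Product (id, name, size_id) VALUES (1, 'Oxford shirt', 1)")


def sizes_for_type(type_description):
    """Return the size descriptions that belong to one size type."""
    rows = conn.execute(
        """SELECT s.description
           FROM Size s JOIN SizeType t ON t.id = s.size_type_id
           WHERE t.description = ?
           ORDER BY s.id""",
        (type_description,),
    )
    return [r[0] for r in rows]


def product_size(product_id):
    """Return (product name, size, size type) for one product."""
    return conn.execute(
        """SELECT p.name, s.description, t.description
           FROM Product p
           JOIN Size s ON s.id = p.size_id
           JOIN SizeType t ON t.id = s.size_type_id
           WHERE p.id = ?""",
        (product_id,),
    ).fetchone()
```

Adding a new category is then just a new row in SizeType plus its rows in Size; the product table does not change.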
No matter what you do, such a database design does not look good.
Still, you can use a BLOB data type to store any data in a column, or a Text type if it’s text (search will work better that way, since text comparison can handle upper and lower case and such).
nvarchar(max) would work. Otherwise, you might have multiple columns, one of each possible type. That would keep you from converting things like double-precision numbers into strings and back.
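nvarchar(max) if the data is restricted to strings of up to 2 GB.
ntext is not a way around that limit: it also tops out at about 2 GB (2^30 - 1 characters) and is deprecated in favor of nvarchar(max).
varbinary(max) if you need to store binary data (binary is limited to 8,000 bytes, and image is deprecated).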
I am trying to create a database to store my recipes. However, I am not sure how to implement it. I looked at other questions like this one, but they do not have the same focus as mine.
I assume any dish is actually just an ingredient, which can then be used in other dishes, or in this case in other ingredients. Any ingredient may have multiple recipes. For now, each recipe indicates how much of each ingredient is needed, but I also want to know how these ingredients are combined without having a long text description of it.
For example, in text, I would describe one (very bad) scrambled eggs recipe like this:
Scrambled eggs:
Cooked for 5 minutes(
  1g Butter,
  Whisked(
    1g Salt,
    1g Pepper,
    2 Eggs
  )
)
and then Scrambled eggs could be used in another recipe as an ingredient.
But how would that translate in a database? I don't need that database to be SQL based since this is a personal project, but I don't know any other kind of databases so far.
I thought about defining an Ingredient, as having an optional Technique associated with it but that means Whisked(1g salt, 1g pepper, 2 eggs) would have to be an Ingredient. Which I guess could work and I could also make the name of ingredients optional, but it seems awkward.
I also thought about defining a Recipe as having multiple TransformedIngredients which would contain a Technique applied to many Ingredients but sometimes a Recipe contains raw, untransformed, Ingredients and sometimes TransformedIngredients would need to be applied to TransformedIngredient. From what I know of databases that wouldn't work.
PS: I stumbled onto a functional programming Tiramisu recipe which, though very much focused on the techniques, displays fairly well what I'm trying to implement for my database.
I think what's confusing is that there are two different things to think about with a recipe, 'Items' and 'Steps'.
One database structure that comes to mind for this is a Star Schema structure which separates these ideas nicely (into Dimension and Fact tables, respectively).
A quick description of each:
Dimension
"The state of something" i.e. a record is merely there to describe what the thing is. A customer's address table would be an example of a dimension table.
Fact
"Things changing over time" i.e. each record relates to a dimension table, but has changing values. An example would be shipped purchases from a website to a customer's address. The address stays the same, but the shipments are getting constantly added to the table.
This isn't to say that Dimension tables don't change, too; obviously new users sign up for websites all the time. In the above address example, if a customer were to change his address, a new primary key value would be added for the new address.
Now on to your recipe examples:
Imagine you're cooking something. I would put anything that you hold in your hands in a "dimension" table. For example: DIM_INGREDIENT (with columns such as INGREDIENT_ID, INGREDIENT_NAME), and DIM_AMOUNT (AMOUNT_ID, AMOUNT, UNITS) to describe the amounts. And DIM_ACTION (ACTION_ID, TYPE, LENGTH, UNITS) to describe the action. There are more you can come up with; these are a few to get started.
Any steps I'd be taking could go in a FACT_RECIPE_STEPS table that would map to all the dimension tables. Any column that doesn't logically apply to a step would hold a null value (e.g. "stir for 5 minutes" would have null for INGREDIENT_ID).
The FACT_RECIPE_STEPS could look like this:
RECIPE_ID, RECIPE_STEP, ACTION_STEP_ID, INGREDIENT_ID, AMOUNT_ID, ACTION_ID
What gets confusing is the "substep" of whisking the stuff together. I put that in another fact table called FACT_ACTION_STEP, since "whisking" is one action in the recipe list, but to perform the action you actually need to do three things.
I think the following is what some of the tables would look like with your data:
DIM_INGREDIENT
INGREDIENT_ID | INGREDIENT_NAME
1 | 'Scrambled eggs'
2 | 'Salt'
3 | 'Pepper'
4 | 'Eggs'
5 | 'Butter'
DIM_ACTION
ACTION_ID | TYPE | LENGTH | UNITS
1 | 'Cook' | 5 | 'minutes'
2 | 'Whisk' | null | null
FACT_ACTION_STEP
STEP_ID | ACTION_ID
1 | 2
DIM_AMOUNT
AMOUNT_ID | AMOUNT | UNITS
1 | 1 | 'grams'
2 | 2 | null
FACT_RECIPE_STEPS
RECIPE_ID, RECIPE_STEP, ACTION_STEP_ID, INGREDIENT_ID, AMOUNT_ID, ACTION_ID
EDIT:
I was a bit unsure myself as to how to do the "Whisked" part of the recipe and thought that, when you add the whisked mixture to the final result, it's like adding one ingredient to the recipe. However, you need to prepare the mixture beforehand, and that takes three steps. It's basically its own little recipe, and the FACT_ACTION_STEP table takes that other 'recipe' into account so that the result can be added as one row in the FACT_RECIPE_STEPS table.
Now that I think about it a bit more, it might be better to just assign "Whisked" as its own recipe in FACT_RECIPE_STEPS and DIM_INGREDIENT (called something like "Whisked spices for eggs") and get rid of the FACT_ACTION_STEP table altogether. That way you can easily make more complex recipes, such as "Eggs and Pancake Breakfast" where the Eggs part is the result of this recipe.
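To make the "a sub-recipe is just another ingredient" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (ingredient, recipe_step) are simplified stand-ins I made up rather than the exact DIM/FACT tables above, and the action dimension is left out to keep it short. A recursive query walks from a recipe down to the ingredients that have no recipe of their own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ingredient (
    ingredient_id   INTEGER PRIMARY KEY,
    ingredient_name TEXT NOT NULL
);
-- One row per input of a recipe; recipe_id is the ingredient the recipe produces.
CREATE TABLE recipe_step (
    recipe_id     INTEGER NOT NULL REFERENCES ingredient(ingredient_id),
    step_no       INTEGER NOT NULL,
    ingredient_id INTEGER NOT NULL REFERENCES ingredient(ingredient_id),
    amount        REAL,
    units         TEXT,
    PRIMARY KEY (recipe_id, step_no)
);
""")
conn.executemany("INSERT INTO ingredient VALUES (?, ?)", [
    (1, "Scrambled eggs"), (2, "Salt"), (3, "Pepper"),
    (4, "Eggs"), (5, "Butter"), (6, "Whisked spices for eggs"),
])
conn.executemany("INSERT INTO recipe_step VALUES (?, ?, ?, ?, ?)", [
    (6, 1, 2, 1, "grams"),  # whisked mixture: 1 g salt
    (6, 2, 3, 1, "grams"),  # whisked mixture: 1 g pepper
    (6, 3, 4, 2, None),     # whisked mixture: 2 eggs
    (1, 1, 5, 1, "grams"),  # scrambled eggs: 1 g butter
    (1, 2, 6, 1, None),     # scrambled eggs: one batch of the whisked mixture
])


def raw_ingredients(recipe_id):
    """Expand a recipe recursively down to ingredients with no recipe of their own."""
    rows = conn.execute("""
        WITH RECURSIVE expanded(ingredient_id, amount, units) AS (
            SELECT ingredient_id, amount, units
            FROM recipe_step WHERE recipe_id = ?
            UNION ALL
            SELECT s.ingredient_id, s.amount * e.amount, s.units
            FROM recipe_step s JOIN expanded e ON s.recipe_id = e.ingredient_id
        )
        SELECT i.ingredient_name, e.amount, e.units
        FROM expanded e JOIN ingredient i ON i.ingredient_id = e.ingredient_id
        WHERE e.ingredient_id NOT IN (SELECT recipe_id FROM recipe_step)
        ORDER BY i.ingredient_name
    """, (recipe_id,))
    return rows.fetchall()
```

Calling raw_ingredients(1) lists butter, eggs, pepper and salt, because the whisked mixture is itself expanded. A recipe that accidentally contains itself would make the recursion loop, so a real schema would want a guard against cycles.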
You can add some other fields to tables but I believe this schema works for you.
recipe
------------
r_id PK
recipe_name
cooking_time
recipe_of_recipes
-----------------
ror_id PK
ror_name
recipe_ror (table for many to many relation-> defining a recipe as an ingredient)
-------------
r_ror_id PK
r_id FK
ror_id FK
ingredients
-------------
i_id PK
t_id FK
r_id FK
ror_id FK (added later)
ingredient_name
quantity
technique
-------------
t_id PK
technique_name
EDIT
Let's say you want to store a recipe X (big X) that combines recipes x and y plus ingredient z.
To prepare recipe X,
in the recipe, ingredients and technique tables you store
the x recipe and its w, t, r ingredients with technique p
the y recipe and its b, n, m ingredients with technique v
and the z ingredient with technique f (this is why the ingredients table needs the ror_id field as an FK, which I had forgotten at first)
You can define two different recipes (x and y) as ingredients of a recipe (X) using the recipe_ror table. This table ties different recipes together as one (a many-to-many relationship between the recipe and recipe_of_recipes tables).
If you also want to store the technique for the X, x or y recipes (like "cook" in your example), you can add a t_id field as an FK to the recipe and recipe_of_recipes tables.
If I have the following data:
Results Table
.[Required]
I want one grape
I want one orange
I want one apple
I want one carrot
I want one watermelon
Fruit Table
.[Name]
grape
orange
apple
What I want to do is essentially say "give me all results where users are looking for a fruit". This is all just an example; I am looking at a table with roughly 1 million records and a string field of 4000+ characters. I am expecting a somewhat slow result, and I know that the table could DEFINITELY be structured better, but I have no control over that. Here is the query I would essentially have, but it doesn't seem to do what I want. It gives every record. And yes, [#Fruit] is a temp table.
SELECT * FROM [Results]
JOIN [#Fruit] ON
'%'+[Results].[Required]+'%' LIKE [#Fruit].[Name]
Ideally my output should be the following 3 rows:
I want one grape
I want one orange
I want one apple
If that kind of thing is doable, I would try it the other way round:
SELECT * FROM [Results]
JOIN [#Fruit] ON
[Results].[Required] LIKE '%'+[#Fruit].[Name]+'%'
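Here is a small runnable check of the swapped comparison against the sample data from the question. It is only a sketch: it uses Python's built-in sqlite3 module, ordinary tables instead of a temp table, and SQLite's || concatenation where T-SQL uses +.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Results (Required TEXT);
CREATE TABLE Fruit (Name TEXT);
""")
conn.executemany("INSERT INTO Results VALUES (?)", [
    ("I want one grape",), ("I want one orange",), ("I want one apple",),
    ("I want one carrot",), ("I want one watermelon",),
])
conn.executemany("INSERT INTO Fruit VALUES (?)", [("grape",), ("orange",), ("apple",)])

# The long text goes on the left of LIKE; the pattern is built from the fruit name.
matches = [row[0] for row in conn.execute("""
    SELECT r.Required
    FROM Results r
    JOIN Fruit f ON r.Required LIKE '%' || f.Name || '%'
    ORDER BY r.rowid
""")]
```

With the sample data, matches holds the grape, orange and apple rows only. Keep in mind this is plain substring matching, so a row mentioning "pineapple" would also match "apple", and a row naming two fruits would come back twice.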
This topic interests me, so I did a little bit of searching.
Suggestion 1: Full-Text Search
I think what you are trying to do is full-text search.
You will need a full-text index created on the table if it is not already there (CREATE FULLTEXT INDEX).
This should be faster than performing LIKE.
Suggestion 2: Metadata Search
Another approach I'd take is to create a metadata table and maintain the information myself whenever the [Results].[Required] values are updated (or created).
This looks more or less doable, but I'd start from the Fruit table just for conceptual clarity.
Here's roughly how I would structure this, ignoring all performance / speed / normalization issues (note also that I've switched around the variables in the LIKE comparison):
SELECT f.name, r.required
FROM fruits f
JOIN results r ON r.required LIKE CONCAT('%', f.name, '%')
...and perhaps add a LIMIT 10 to keep the query from wasting time while you're testing it out.
This structure will:
give you one record per "match" (per Result row that matches a Fruit)
exclude Result rows that don't have a Fruit
probably be ungodly slow.
Good luck!
Trying to figure out how to change a structure from what I currently have which is this:
tblHaulLogs
intLogID
intHaulType
intSerial
intOriginSource
intOrigin
intDestinationSource
intDestination
dtmHaulDate
ccyLogPay
intHauler
txtLogNotes
intInvoiceID
In this table, what I am doing is using the origin and destination source fields to determine which table the fk for the origin and destination comes from. This feels very wrong to me.
tblHaulTypes
intHaulTypeID
chrHaulType
intOriginSourceType
intDestinationSourceType
Data in the Haul Types Table:
LOT, 1, 1
DEL, 1, 2
RPO, 2, 1
Now let me explain:
The first type happens when an item goes from a sales lot to another sales lot.
The second type happens when an item goes from a sales lot to a customer(sale gets delivered).
The third type happens when an item returns from the customer back to the sales lot.
Then the Item can be resold/returned/resold/returned(rent-to-own system).
Now, here are the problems I have:
A haul log's origin will always be the destination of the last move. Therefore I thought that the origin field is redundant. However, it's the relation between the destination of the last move and the destination of the new move that defines what the shipper gets paid and what type of haul it is.
In other words, even though the first type and the third type technically have the same fields, the type of move is not the same because of the previous move type. What do I need to do here? Am I totally missing the boat on what the structure should be?
The questions I need to answer based on this data are:
How many Items do I have on my sales lots that are new inventory(have never been sold).
How many Items do I have that have been sold and returned(doesn't matter how many times).
I'm guessing at the relationship between the various fields and tables.
Your tblHaulTypes table looks fine.
intHaulTypeID
chrHaulType
intOriginSourceType
intDestinationSourceType
You're missing a haul type that accounts for deliveries from suppliers to your lots.
There has to be some table that lists your lots. I'd call it tblHaulLot.
intLotNumber
txtLotName
...
I'd make a tblHaulTransaction table that looks like this.
intTransactionID
intHaulTypeID
intHauler
intOriginOrganizationID
intDestinationOrganizationID
intOriginLot (null if origin is supplier)
intDestinationLot (null if destination is customer)
dtmHaulDate
txtLogNotes
Now, we need a tblOrganization table.
intOrganizationID
txtOrganizationName
txtOrganizationAddress
...
The organization at ID 0 is your organization. Suppliers and customers would fill the rest of the table.
I'd make a tblHaulInvoice table that looks like this.
intInvoiceID
intTransactionID
ccyTransactionPay
dtmDateInvoiced
AmountInvoiced
The amount invoiced (and amount paid) have to be accounted for in some table. I don't know what ccy stands for, and I don't know your 3 letter code for a decimal (money) field.
How many Items do I have on my sales lots that are new inventory(have never been sold). How many Items do I have that have been sold and returned(doesn't matter how many times).
Nowhere in your data model is there any kind of inventory table. I'd need to know a lot more about your business to create one or more inventory tables.
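That said, if every item can be identified by the intSerial value in the haul log and the log holds each item's complete move history, the two counts could in principle be derived from the hauls alone. The following Python sketch shows the idea; the sample rows are invented, and it assumes the three haul types from the question (LOT, DEL, RPO).

```python
from collections import defaultdict

# Hypothetical haul history rows: (intSerial, dtmHaulDate, chrHaulType).
# LOT = lot to lot, DEL = lot to customer, RPO = customer back to lot.
hauls = [
    (101, "2012-01-05", "LOT"),
    (102, "2012-01-06", "LOT"),
    (102, "2012-02-01", "DEL"),
    (102, "2012-03-15", "RPO"),
    (103, "2012-01-07", "LOT"),
    (103, "2012-02-10", "DEL"),
]


def classify_inventory(rows):
    """Count items sitting on a lot, split by whether they were ever delivered."""
    history = defaultdict(list)
    # ISO dates sort correctly as strings, so this orders each item's moves in time.
    for serial, _date, haul_type in sorted(rows, key=lambda r: (r[0], r[1])):
        history[serial].append(haul_type)

    counts = {"new_on_lot": 0, "returned_on_lot": 0}
    for types in history.values():
        if types[-1] == "DEL":
            continue  # last move went to a customer, so the item is not on a lot
        if "DEL" in types:
            counts["returned_on_lot"] += 1
        else:
            counts["new_on_lot"] += 1
    return counts
```

Here item 101 counts as new inventory, item 102 as sold and returned, and item 103 is skipped because it is currently with a customer. Items that arrived on a lot without any logged haul would be invisible to this approach, which is one reason a proper inventory table is still the better answer.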
I have a view setup with a map reduce. Right now this code works great:
function(doc) {
  if (doc.type == 'test') {
    if (doc.trash != 1) {
      for (var id in doc.items) {
        emit([id, doc.items[id].name], 1);
      }
    }
  }
}
function(keys, prices) {
  return sum(prices);
}
I get a return and when using the group parameter, it condenses everything just fine.
My issue/question, I want to add a third key.... DATE, so I may only reduce records from certain dates. So for example:
function(doc) {
  if (doc.type == 'test') {
    if (doc.trash != 1) {
      for (var id in doc.items) {
        emit([date, id, doc.items[id].name], 1);
      }
    }
  }
}
My issue is that since date is at the beginning of the array, the reduce groups by date, id, etc. I know I can use group_level and say just take the first key from the array or the first 2 keys, but that doesn't help either because, afaik, group_level goes from left to right in the array. I could put the date at the end of the emit array, but that doesn't help either because I need to have values at the beginning of my startkey and endkey to search on.
Here is an example of the output of data:
{"key":["2012-03-13","356752b8a5f6871f3","Apple"],"value":1},
{"key":["2012-03-20","123752b8a76986857","Pear"],"value":1},
{"key":["2012-04-12","3013531de05871194","Grapefruit"],"value":1},
{"key":["2012-04-12","356752b8a5f6871f3","Apple"],"value":1},
I want APPLE to be added up in one row, here it's adding up apples by date first. I was able to successfully just add up all the apples if I remove DATE as the first key in the array, but then I can't search by date range.
Any ideas on how to accomplish this?
If I understand correctly what you want to do, then you'd want to put the date as the first element of your array, and use group_level as well as startkey and endkey.
E.g. startkey=[1, "someid"] endkey=[1,"someid",{}] group_level=2
Will get you all items from date 1 (obviously choose your own format here), with id "someid" and any name. It seems funny that you emit id's before names, and without having more information about what you're actually trying to accomplish, it's hard to advise your general data model. If ID is a "type" id meaning that many items share the same ID then this makes sense. If ID is a unique per item ID, then it does not. In that case, you'd want to emit "name" before ID...
Edit 1
As per your comment, to do a range of dates you do this:
startkey=[1] endkey=[5,{}] group_level=2
You will get everything from date 1 to date 5, grouped by date and then by id (apples, oranges, etc.). I use this exact technique in a very large-scale production application. I actually formatted the dates as easily human-readable integers of the format yyyymmdd, so 20140624 would sort to the top. If I want everything from the start of the month till now grouped by my group ids, I call
startkey=[20140601] endkey=[20140624,{}] group_level=2
It works perfectly and as far as I can tell that's what you're looking to do. I also have a third key layer "detail" which allows me to provide a deeper level of grouping for items that need it. I can then call
startkey=[20140601, "someid"] endkey=[20140624, "someid",{}] group_level=3
To drill to the detail level for a particular id, or just use the previous query with group_level=3 if I want the details for every id. I'm certain you can make this work - I've solved this exact problem in a production application using the techniques described.
Edit 2
If you want to group all apples regardless of date, then you'll need to let apples be the first element in the key. You can then get all apples over all time as a single row in the view result using group_level=1, and Apples over a date range using group_level=2. The difference here is that you'll only be able to do the group_level=2 query on a single item type at a time. If you want the best of both worlds, you unfortunately just need to make 2 views. That's just how key ordering works... If you need fast response times for both types of queries, all item types over a date range, and all of a particular item not grouped by date, I believe 2 views is the only way to achieve that.
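To see why the key order matters, here is a plain Python simulation of grouping by key prefix. It is not CouchDB's API, just a model of how group_level truncates keys from the left; the key order [id, name, date] is an assumption on my part, and the ids and dates are taken from the sample output in the question.

```python
from collections import defaultdict

# Rows as a map function might emit them with the key order [id, name, date].
rows = [
    (("356752b8a5f6871f3", "Apple", "2012-03-13"), 1),
    (("123752b8a76986857", "Pear", "2012-03-20"), 1),
    (("3013531de05871194", "Grapefruit", "2012-04-12"), 1),
    (("356752b8a5f6871f3", "Apple", "2012-04-12"), 1),
]


def reduce_sum(rows, group_level):
    """Sum values per key prefix, mimicking a _sum reduce with group_level."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key[:group_level]] += value
    return dict(totals)


def in_date_range(rows, item_id, name, start, end):
    """Keep the rows of one item whose date falls in [start, end].

    This stands in for a startkey/endkey range, which is only contiguous
    because the leading key elements (id and name) are held fixed.
    """
    return [(k, v) for k, v in rows
            if k[:2] == (item_id, name) and start <= k[2] <= end]


all_time = reduce_sum(rows, 2)
apples_in_april = reduce_sum(
    in_date_range(rows, "356752b8a5f6871f3", "Apple", "2012-04-01", "2012-04-30"), 2)
```

With this key order all apples collapse into one row, and a date range works for one item at a time. Summing every item over a date range in a single request is the case that needs the second, date-first view.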
Note
Another thing to note is about your reduce function. Wherever possible, it is highly recommended that you use the built-in reduce functions. They're implemented in Erlang and are highly optimized compared to custom JavaScript reduce functions.
In your case, just replace your reduce function with this
_sum
Easy hey?
If you post more info about your application, data model etc. then I'd be happy to help out more with your database design.
I have a search in solr that is returning about 1500 documents. These documents are basically products. For example, I have a bunch of womens shoes in my dataset. My dataset has a wide variety of shoes for women, but it also has some very similar results, for instance, size 11 womens nike trainers, size 10 womens nike trainers, etc... Now, when I search for womens shoes, solr scoring causes a certain set of these results to bubble to the top that are all very similar.. For instance, all the colors of one particular shoe model might come to the top. They are definitely different products, but I would prefer to get a wider variety of results than just every color of nike trainer shoes.
Does anyone have any suggestions? Note, I don't want to eliminate all the individually colored products. When someone searches for blue womens nike trainers, I want them to get the blue model as the top result. I'm using the dismax query as my main query. What I would like to do is basically boost on some kind of "uniqueness of name compared to other results" factor.
You could either collapse on fields like color or so:
http://wiki.apache.org/solr/FieldCollapsing
or you can use near duplicate detection when indexing:
http://wiki.apache.org/solr/Deduplication
http://karussell.wordpress.com/2010/12/23/detect-stolen-and-duplicate-tweets-with-solr/
The latter algorithm is implemented in jetwick for tweets, so it should work for titles, but it is not performant enough for big documents (so it only suits plagiarism detection for 'short' strings). For long text you'll need locality-sensitive hashing:
http://en.wikipedia.org/wiki/Locality_sensitive_hashing
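To illustrate what collapsing on a field does to a result list, here is a small client-side Python sketch. It is not Solr's API; Solr would do this on the server via field collapsing. The field names (model, color, score) and the documents are made up for the example.

```python
def collapse(results, field, per_group=1):
    """Keep at most `per_group` top-scoring documents for each value of `field`."""
    kept, seen = [], {}
    for doc in sorted(results, key=lambda d: d["score"], reverse=True):
        value = doc[field]
        if seen.get(value, 0) < per_group:
            kept.append(doc)
            seen[value] = seen.get(value, 0) + 1
    return kept


# Hypothetical search results for "womens shoes", already scored.
results = [
    {"id": 1, "model": "nike trainer", "color": "blue", "score": 9.1},
    {"id": 2, "model": "nike trainer", "color": "red", "score": 9.0},
    {"id": 3, "model": "nike trainer", "color": "black", "score": 8.9},
    {"id": 4, "model": "leather boot", "color": "brown", "score": 7.5},
    {"id": 5, "model": "canvas flat", "color": "white", "score": 7.2},
]
top = collapse(results, "model")
```

After collapsing on model, only the best-scoring trainer survives and the other models move up. A search for "blue womens nike trainers" would still rank the blue one first, since the best-scoring document in each group is the one that is kept.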