Assign rows to a category in an OpenRefine dataset

I have a dataset like this, and I'm looking for a way to add a category based on what kind of product each row contains.
Can I search for Apple and Orange and assign them to a category named Fruits, and similarly search for Milk and Wine and assign them to another category named Drinks?
| Item   | Category | Desired category |
|--------|----------|------------------|
| Apple  |          | Fruits           |
| Orange |          | Fruits           |
| Milk   |          | Drinks           |
| Wine   |          | Drinks           |
Or maybe a simpler method: find any rows containing Milk and assign them to category Drinks?

This is something you can do without code.
Filter or facet on the Item field for the values you want to group (for example, Apple and Orange).
Create a text facet on the Category field.
Click the edit button next to the blank value in the Category facet and type the category you want to assign (for example, Fruits).
Change your Item facet or filter to select the next group of values, and repeat this process until all your items are categorized.
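To make the facet-and-edit steps concrete, here is a small Python model of what they do to the data. It is only an illustration, not OpenRefine code: rows are assumed to be dicts with "Item" and "Category" keys, and `facet_edit` is a hypothetical helper standing in for "select these Item values in a facet, then edit the blank Category value".

```python
# Hypothetical model of the facet-and-edit workflow (not OpenRefine code).
# Each row is assumed to be a dict with "Item" and "Category" keys.

def facet_edit(rows, selected_items, category):
    """Set Category on every row whose Item is among the selected facet
    values and whose Category is still blank, like editing the blank value
    in the Category facet while the Item facet restricts the rows."""
    for row in rows:
        if row["Item"] in selected_items and row["Category"] == "":
            row["Category"] = category
    return rows

rows = [
    {"Item": "Apple", "Category": ""},
    {"Item": "Orange", "Category": ""},
    {"Item": "Milk", "Category": ""},
    {"Item": "Wine", "Category": ""},
]

# One pass per category, like re-selecting the Item facet and repeating.
facet_edit(rows, {"Apple", "Orange"}, "Fruits")
facet_edit(rows, {"Milk", "Wine"}, "Drinks")

for row in rows:
    print(row["Item"], row["Category"])
```

Because only blank Category cells are edited, rows you have already categorized are left alone on later passes.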

As magdmartin says, you can do this using facets and edits; the solution he describes is probably the simplest and least error-prone approach. However, if you prefer to do it in a single step, you can use GREL to test the content of the Item cell and set the value of the Category cell accordingly, for example by applying this expression to the Category column with Edit cells > Transform...:
with(cells["Item"].value.toLowercase(),w,if(or(w=="orange",w=="apple"),"Fruits",if(or(w=="milk",w=="wine"),"Drinks","")))
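For comparison, the nested if() expression amounts to a case-insensitive lookup table. The sketch below is plain Python, not GREL, and the `CATEGORIES` mapping and `categorize` helper are hypothetical names; it shows the same logic as a dictionary, which is easier to extend than nested if() calls as the list of items grows.

```python
# Hypothetical Python equivalent of the nested if() GREL expression.
CATEGORIES = {
    "apple": "Fruits",
    "orange": "Fruits",
    "milk": "Drinks",
    "wine": "Drinks",
}

def categorize(item):
    # Lowercasing first mirrors cells["Item"].value.toLowercase(),
    # so the match is case-insensitive; unknown items get "".
    return CATEGORIES.get(item.lower(), "")

for item in ["Apple", "ORANGE", "Milk", "Wine", "Bread"]:
    print(item, "->", categorize(item))
```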
This is the same approach as given by Ettore Rizza above but in GREL rather than Jython.

magdmartin and Owen Stephens give good answers. Another simple way using GREL:
From the dropdown menu of your 'Item' column, choose Edit column > Add column based on this column...
Name the new column 'Category' and enter this expression:
value.replace("Apple","Fruit").replace("Orange","Fruit").replace("Milk","Drink").replace("Wine","Drink")
You can keep chaining .replace("whatever food","whatever category") ad nauseam.
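One caveat with the replace() chain: each replace() matches substrings and is case-sensitive, so an item that merely contains a keyword is rewritten too, and an item with no match keeps its original name in the new column. The plain-Python sketch below mirrors the chain to show this; `chained_replace` is a hypothetical helper, not part of OpenRefine.

```python
# Hypothetical mirror of the GREL replace() chain using Python's str.replace,
# which, like the GREL string form, replaces every substring occurrence.
def chained_replace(value):
    return (value.replace("Apple", "Fruit")
                 .replace("Orange", "Fruit")
                 .replace("Milk", "Drink")
                 .replace("Wine", "Drink"))

print(chained_replace("Apple"))        # Fruit
print(chained_replace("Apple juice"))  # Fruit juice (substring rewritten)
print(chained_replace("Soy Milk"))     # Soy Drink
print(chained_replace("Bread"))        # Bread (no match, name kept)
print(chained_replace("apple"))        # apple (case-sensitive)
```

If partial matches matter for your data, an exact-match test such as the nested if() expression is the safer choice.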

Related

Fill a text field with a preset value based on the selected value of a list in XLS Forms

I am building an XLSForm (.xlsx) for ODK. I have a dropdown list and a text-input field. When I select a value from ListA, I would like a specific code to be filled in the text-input field.
For example, if I choose "valueA" from the dropdown list, the value "codeA" should appear in the text input.
Currently I have an Excel sheet with all the matches between the values in ListA and the corresponding codes (around 300). All the values in ListA are unique.
I have tried using the "calculation" column of the XLSForm, without success so far.
Has anyone done something like this with XLSForm? Is it possible?
You could add a calculate field between the dropdown list and the text input that pulls the code from a CSV file holding the correspondence between the ListA values and their codes, using the value selected in the dropdown as the lookup key. You can find an explanation of how the pulldata(...) calculation works in the ODK documentation.
Here is a brief example. In the survey tab:
|type |name |label |calculation|
|select_one keys|value_a|Value A | |
|calculate |code_a | |pulldata('data', 'value_column', 'key_column', ${value_a})|
|note |note |Code A value: ${code_a}| |
In the choices tab:
|list_name|name |label |
|keys |valueA_1|valueA_1|
|keys |valueA_2|valueA_2|
(.......)
And the file data.csv should look like this:
|key_column|value_column|
|valueA_1 |codeA_1 |
|valueA_2 |codeA_2 |
(.......)
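The lookup that pulldata() performs can be sketched in Python as below. This is an approximation for illustration, not ODK's implementation: the inline CSV stands in for data.csv, with key_column holding the dropdown values and value_column holding the codes (matching the argument order of the calculation above), and returning an empty string when nothing matches is an assumption of this sketch.

```python
import csv
import io

# Hypothetical stand-in for data.csv (in ODK the file is attached as form media).
DATA_CSV = """key_column,value_column
valueA_1,codeA_1
valueA_2,codeA_2
"""

def pulldata(csv_text, column_to_return, column_to_match, value):
    """Rough emulation of pulldata(): return column_to_return from the first
    row whose column_to_match equals value; "" if no row matches (assumed)."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[column_to_match] == value:
            return row[column_to_return]
    return ""

print(pulldata(DATA_CSV, "value_column", "key_column", "valueA_2"))  # codeA_2
```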
Finally, I would advise considering a cascading select for value_a, because selecting one out of 300 values can be challenging. Another option is to sort the choices alphabetically or in some other meaningful order, so that whoever fills in your form can easily locate the appropriate choice.
Hope this helps!

Concatenate values in one row with comma as delimiter in Tableau worksheet

Currently my sheet looks like:
Type | Product
A | p1
B | p2
A | p2
C | p3
I want my sheet to look like:
Type | Product
A |p1,p2
B |p2
C |p3
I want to show all products of Type 'A' in one row, to avoid duplicate 'A' entries.
You will need to create a couple of table calculations to do this.
Create one named Products:
IF INDEX() = 1
THEN ATTR([Product])
ELSE PREVIOUS_VALUE(ATTR([Product])) + ", " + ATTR([Product])
END
This calculation needs to be set to compute using Pane (down).
Then create another one called Rank:
RANK([Products])
Put Type, Rank (you will need to change it to discrete to place it between Type and Product), and your original Product field on your Rows shelf.
Right-click the "Rank" and "Product" fields on Rows and deselect "Show Header".
Put "Rank" on your Filters shelf and set it to keep only the value "1".
Then right-click Rank and set it to compute using "Pane (across then down)".
Put your new "Products" measure on the Text mark.
With this you should get a sheet like the desired output shown in the question.
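The Products calculation builds a running string down each pane. The Python sketch below simulates that on the sample data; it is plain Python, not Tableau, and grouping with `groupby` stands in for the pane partitioning. Within each Type the first row takes its own product and each later row appends to the previous value, so the last row carries the full list; keeping only that row is roughly what the Rank filter in the steps above achieves.

```python
from itertools import groupby

rows = [("A", "p1"), ("B", "p2"), ("A", "p2"), ("C", "p3")]

def running_concat(products):
    """Mimic the Products table calculation within one pane: row 1 takes
    its own product, each later row appends to the previous row's value."""
    values = []
    for index, product in enumerate(products, start=1):
        if index == 1:
            values.append(product)
        else:
            values.append(values[-1] + ", " + product)
    return values

result = {}
# Sorting then grouping by Type stands in for Tableau's per-pane partitioning.
for type_, group in groupby(sorted(rows), key=lambda r: r[0]):
    running = running_concat([product for _, product in group])
    # The last running value holds the complete list for this Type.
    result[type_] = running[-1]

for type_, products in result.items():
    print(type_, "|", products)
```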

CakePHP 2.9 converting CSV string to HABTM data

I am trying to convert a large database (~3M rows) containing the following table, named "Posts":
+----+---------------+---------------------+
| id | name          | tags                |
+----+---------------+---------------------+
| 1  | post title    | tag_a, tag_b        |
| 2  | another title | tag_b, tag_e, tag_j |
+----+---------------+---------------------+
I also have an empty "tags" table with the columns id and title, and a "posts_tags" table with the columns id, post_id and tag_id.
Post <-- Habtm --> Tag
My question:
What is the most efficient way (preferably, but not necessarily, the Cake way) to populate the "tags" table and the "posts_tags" HABTM join table while keeping the tags table free of duplicates?
Many Thanks SO Team!
I have no time to write code right now.
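You could fetch all posts (I recommend paginating the results) and, for each post, take its tags string and explode it on commas, trimming the whitespace around each tag.
Then look up each tag by title and create it only if it doesn't exist yet, so the tags table stays free of duplicates. Finally, build the HABTM data array from the tags and the current post, and save it.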
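Here is a rough sketch of that procedure in plain Python rather than CakePHP, using the two sample rows from the question. The `build_tag_tables` helper is hypothetical: it splits each tags string on commas, trims the pieces, assigns an id the first time a title is seen, and emits (post_id, tag_id) rows for posts_tags. With ~3M posts you would process pages of rows and bulk-insert the results; a UNIQUE index on tags.title could serve as an extra guard against duplicates.

```python
# Hypothetical sketch: deduplicated tags plus join rows from CSV tag strings.
posts = [
    (1, "post title", "tag_a, tag_b"),
    (2, "another title", "tag_b, tag_e, tag_j"),
]

def build_tag_tables(posts):
    tag_ids = {}      # title -> id; keeps the tags table free of duplicates
    posts_tags = []   # (post_id, tag_id) rows for the join table
    for post_id, _name, tags in posts:
        # Like PHP explode(',') followed by trim() on each piece;
        # the set also drops a tag repeated within one post.
        titles = {t.strip() for t in tags.split(",") if t.strip()}
        for title in sorted(titles):
            if title not in tag_ids:
                tag_ids[title] = len(tag_ids) + 1
            posts_tags.append((post_id, tag_ids[title]))
    tags_table = [(tag_id, title) for title, tag_id in tag_ids.items()]
    return tags_table, posts_tags

tags_table, posts_tags = build_tag_tables(posts)
print(tags_table)
print(posts_tags)
```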

How to merge two Excel sheets

I have an Excel document with 10000 rows of data in two sheets: one sheet has the product costs, and the other has the category and other information. Both are imported automatically from SQL Server, so I don't want to move them to Access, but I still want to link them by product code so that when I merge the product tables (product name and cost in the same table), I can be sure I'm getting the right information.
For example:
Code | name | category
------------------------------
1 | mouse | OEM
4 | keyboard | OEM
2 | monitor | screen
Code | cost |
------------------------------
1 | 123 |
4 | 1234 |
2 | 1232 |
7 | 587 |
Say my two sheets have tables like these. As you can see, the second table has a code (7) that doesn't exist in the first; I put it there because in reality one sheet has a few more rows, preventing a perfect match. Therefore I can't just sort both tables A-Z and copy the costs across: as I said, there are more than 10000 products, and I don't want to risk a slight shift in the costs caused by those extra entries, which would ruin the whole table.
So what would be a good way to take the entry from the other sheet and insert it into the right row when merging? Linking the two tables by a field name? Checking a field and trying to match it against the other sheet? Anything at all.
Note: In Access I would create relationships, and running a query would match the rows automatically. I was wondering if there's a way to do that in Excel too.
Why not use VLOOKUP? If there is a match, it returns the cost. Assuming the top table is on Sheet1 and the other on Sheet2, and both start at cell A1, you just need this in cell D2:
=VLOOKUP(A2,Sheet2!A:B,2,0)
The final 0 asks for an exact match, so a code that is missing from Sheet2 returns #N/A instead of a wrong cost. You can then fill the formula down. The easiest way to fill all 10000 rows is to hover over the bottom-right corner of the cell; the cursor turns from a white plus sign into a thin black one. Then simply double-click.
Just use VLOOKUP: you can add a column to your first sheet and find the cost based on the code in the other sheet.
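The key point is that VLOOKUP matches on the code, not on row position, so extra or reordered rows in the cost sheet don't shift anything. The Python sketch below imitates an exact-match lookup on the sample tables to illustrate that; it is plain Python, not Excel, and `vlookup_exact` is a hypothetical helper.

```python
# Sample data from the question; codes are the lookup keys.
sheet1 = [(1, "mouse", "OEM"), (4, "keyboard", "OEM"), (2, "monitor", "screen")]
sheet2 = [(1, 123), (4, 1234), (2, 1232), (7, 587)]

def vlookup_exact(key, table, col_index):
    """Return column col_index (1-based, like VLOOKUP) of the first row whose
    first column equals key, or "#N/A" when there is no exact match."""
    for row in table:
        if row[0] == key:
            return row[col_index - 1]
    return "#N/A"

# Equivalent of filling =VLOOKUP(A2,Sheet2!A:B,2,0) down column D.
merged = [(code, name, category, vlookup_exact(code, sheet2, 2))
          for code, name, category in sheet1]
for row in merged:
    print(row)
```

Code 7 exists only in the cost sheet, so it is simply never looked up; a code missing from the cost sheet would come back as "#N/A".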

Database relationships - 1:1 but not always?

Apologies for the fairly unhelpful title; if you have a better suggestion, please feel free to edit it.
I'm using CakePHP and the bake functionality (I don't need to bake however).
What's the best way of achieving the following:
table schema:
table ranges
id | name | description
table images
id | range_id | picture
table info (here I am confused)
id | range_id | height | width | colour
Basically, one range may have many images (1:Many). I can show this fine.
Now, each range will have an entry in the info table (1:1) with some attributes about the range, such as height, colour and width. But not always...
Let's say I have a range foo. foo has five images that all have the same height, width and colour. However, foo has one image that is a different size and a different colour.
When the attributes differ, I need to show this information with the respective image rather than the range's default information, so this image will need its own entry in the info table.
Does this even make sense? Or am I going about this entirely the wrong way?
My application, in brief:
(If it helps, think of "range" as a product)
User selects a range
User views images in the range
User can click an image, and the information from info about that range pops up.
Some images have different attributes, but still belong to the same range.
How can I make this distinction and store it appropriately?
Please let me know if I can clarify further. Thank you.
I've needed to do this on occasion where a parent entity has a value that can get "overridden" by a child entity.
There are a couple of approaches you can take; the structure is the easiest part.
Consider the following structure:
table ranges
id | name | description | default_info_id
table images
id | range_id | picture | info_id
table info
id | height | width | colour
When does images.info_id have a value? There are two choices:
Populate images.info_id with the default_info_id from the parent range. The user can then override it on the image.
Pros
You never need to look at the range to figure out the image's info.
Cons
You need to decide what to do when ranges.default_info_id changes: does it affect existing images, or only future ones?
Only populate images.info_id when it differs from the parent.
Pros
If ranges.default_info_id changes, images whose info_id is null automatically pick up the change.
Cons
You need to decide what to do when ranges.default_info_id changes: do you now need to null out any images.info_id values that are the same as the new default?
You need to look at the ranges table to figure out an image's info_id when its own value is null.
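With the second option, an image's effective info can be resolved in a single query using COALESCE. The sketch below uses Python's built-in sqlite3 module so it runs on its own; the column types and sample rows (range foo with five images using the default and one override) are hypothetical, and in a CakePHP app you would express the same join through your own models or a custom query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE info (id INTEGER PRIMARY KEY, height INTEGER, width INTEGER, colour TEXT);
CREATE TABLE ranges (id INTEGER PRIMARY KEY, name TEXT, description TEXT,
                     default_info_id INTEGER REFERENCES info(id));
CREATE TABLE images (id INTEGER PRIMARY KEY, range_id INTEGER REFERENCES ranges(id),
                     picture TEXT, info_id INTEGER REFERENCES info(id));

INSERT INTO info VALUES (1, 100, 50, 'red'), (2, 200, 80, 'blue');
INSERT INTO ranges VALUES (1, 'foo', 'example range', 1);
-- Five images rely on the range default (info_id NULL); one overrides it.
INSERT INTO images VALUES
  (1, 1, 'a.jpg', NULL), (2, 1, 'b.jpg', NULL), (3, 1, 'c.jpg', NULL),
  (4, 1, 'd.jpg', NULL), (5, 1, 'e.jpg', NULL), (6, 1, 'f.jpg', 2);
""")

# Effective info: the image's own info_id if set, otherwise the range default.
query = """
SELECT images.picture, info.height, info.width, info.colour
FROM images
JOIN ranges ON ranges.id = images.range_id
JOIN info ON info.id = COALESCE(images.info_id, ranges.default_info_id)
ORDER BY images.id
"""
for row in conn.execute(query):
    print(row)
```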
You can have several variations of the above data structures, but you'll still need to figure out when to populate what. Here are two others you could consider that are valid (but less optimal, in my opinion):
Info has an FK to both tables but one is always null
table ranges
id | name | description
table images
id | range_id | picture
table info
id | range_id | image_id | height | width | colour
No Info Table at all
table ranges
id | name | description | default_height | default_width | default_colour
table images
id | range_id | picture | height | width | colour
