Google Refine Reconciliation: How to auto-replace with result text

Let's say I have cell like this:
"Nat. Taiwan Normal Univ., Taipei"
Once I do reconciliation, I get this:
Nat. Taiwan Normal Univ., Taipei
V V Create new topic
Search for match
Then I click on "Search for match" and get a drop down result:
"National Taiwan Normal University"
2 questions:
- How do I output the result text ("National Taiwan Normal University") in a separate or the same column?
- Is it possible to output the country, which the result university is in, to a separate column?
Thank you!

The GREL expression cell.recon.match.name will get you the name. Put it in a separate column, so you don't lose the recon object.
Using "Add column from Freebase" will allow you to add the property /location/location/contained_by with a constraint of "type":"/location/country"


Show UniData SELECT results that are not record keys

I'm looking over some UniData fields for distinct values but I'm hoping to find a simpler way of doing it. The values aren't keys to anything so right now I'm selecting the records I'm interested in and selecting the data I need with SAVING UNIQUE. The problem is, in order to see what I have all I know to do is save it out to a savedlist and then read through the savedlist file I created.
Is there a way to see the contents of a select without running it against a file?
If you just want to look over the data visually, use LIST instead of SELECT.
The general syntax of the command is something like:
LIST filename WITH [criteria] [sort] [attributes | ALL]
So let's say you have a table called questions and want to look over all the authors of questions that used the tag unidata. Your query might look something like:
LIST questions WITH tag = "unidata" BY author author
Note: The second author isn't a mistake; it's the start of the list of attributes you want displayed. In this case that is just author, but you might want the record id as well, so you could do @ID author instead. Or just do ALL to display everything in each record.
I did BY author here as it will make spotting uniques easier, but you can also use other query features like BREAK.ON to help here as well.
I don't know why I didn't think of it at the time but I basically needed something like SQL's DISTINCT statement since I just needed to view the unique values. Replicating DISTINCT in UniData is explained here, https://forum.precisonline.com/index.php?topic=318.0.
The trick is to sort on the values using BY, get a single unique value of each using BREAK-ON, and then suppress everything except those unique values using DET-SUP.
LIST BUILDINGS BY CITY BREAK-ON CITY DET-SUP
CITY.............
Albuquerque
Arlington
Ashland
Clinton
Franklin
Greenville
Madison
Milton
Springfield
Washington
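For anyone more at home in a general-purpose language, the same idea (sort the values, break on each change, suppress the detail lines) can be sketched in Python. This is only an illustration of the logic, not UniData code; the function name distinct_values and the sample list are made up for the example.

```python
from itertools import groupby

def distinct_values(values):
    """Mimic BY + BREAK-ON + DET-SUP: sort, then keep one entry per group."""
    ordered = sorted(values)                  # BY CITY
    return [key for key, _ in groupby(ordered)]  # BREAK-ON CITY, detail suppressed

# Hypothetical sample data standing in for the CITY attribute
cities = ["Milton", "Ashland", "Milton", "Albuquerque", "Ashland"]
for city in distinct_values(cities):
    print(city)
```

groupby only merges adjacent equal values, which is why the sort has to come first, just as BY has to come before BREAK-ON.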

Laravel show records as flat array or single record

I have 2 columns in my table setting
with the following values
KEY VALUE
company ABC
phone 14344
address Somerset City
I need to display this as a single record or a flattened
array in the view/blade page
something like
{{$sett->company}}
{{$sett->phone}}
or an array with lookup
{{$myarray['company']}}
{{$myarray['phone']}}
The idea is that if I add another setting, like a contact-us email address
for my website, I don't want to add another column.
I know this is achievable in the controller by creating different variables
and executing different queries, but I'm looking for other options here.
Thanks for the help really appreciated.
You can use $settings->pluck('value', 'key') to get your result. Read more here: https://laravel.com/docs/5.4/collections#method-pluck
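To show what that call produces, here is a rough Python sketch of the same transformation: key/value rows turned into a single lookup. It is not Laravel code; the pluck helper and the rows list are stand-ins written for this example, using the values from the question.

```python
# Rows as they would come back from the setting table (values from the question)
rows = [
    {"key": "company", "value": "ABC"},
    {"key": "phone", "value": "14344"},
    {"key": "address", "value": "Somerset City"},
]

def pluck(rows, value_field, key_field):
    """Hypothetical helper: build {key_field: value_field} from a list of rows."""
    return {row[key_field]: row[value_field] for row in rows}

settings = pluck(rows, "value", "key")
print(settings["company"])  # look up a setting by its key
print(settings["phone"])
```

Adding a new setting then only means adding a row; the lookup picks it up without any schema change.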

SSRS 2008: How to hide a table row (conditionally) based on category field

I am new to SQL Server Reporting Services. I have created the following report.
I want to remove/hide the Brand Total rows whenever the brand does not exist in the Brand list. As in the following picture, I want to remove/hide "Ethnic Total" because the "Ethnic" brand does not exist in "Sample Store 1".
Similarly, I want to remove/hide the "Outfitters Total" and "Junior Total" rows from Section Two because "Outfitters" and "Junior" don't exist in "Sample Store 2".
This is the structure of my report.
And following is the expression for Net Qty of a Single Brand total.
=Sum(IIf(Fields!Brand.Value = "Outfitters", Fields!Quantity.Value, Nothing))
What should I do?
What condition should I write in the expression for Row Visibility?
Thanks in advance for the help.
I hope the steps below are what you are looking for.
Step 1: Select each total row (Outfitters Total, Junior Total, Ethnic Total, Store Total), one at a time, then right-click and select the Row Visibility option.
Step 2: A dialog box appears with 3 options:
1. Show
2. Hide
3. Show or hide based on an expression
Select option 3 and copy the expression below into the Expression dialog box.
=iif((Sum(IIf(Fields!Brand.Value = "Outfitters", Fields!Quantity.Value, Nothing))) is nothing ,True,False)
I hope this helps.
=IIF(Fields!TotalRems.Value=0, True, False)
Replace TotalRems with your correct field name
You can do it this way:
=IIF(Fields!YourField.Value like "YourValue",false,true)
Replace "YourField" with your own one and also change "YourValue" to whatever you need.
Note: " " and '' are not treated as Nothing.
For more explanation:
SSRS – Hide Rows in a Group based on a Value
Another possibility for the hiding expression is to use a text box reference. In place of "Textbox1" in the expression below, use the name of the text box at the intersection of the "Net Qty" column and the "Ethnic Total" row (or one of the other total rows you mentioned).
=Iif(IsNothing(ReportItems!Textbox1.Value),True,False)

Best way to store user-submitted item names (and their synonyms)

Consider an e-commerce application with multiple stores. Each store owner can edit the item catalog of his store.
My current database schema is as follows:
item_names: id | name | description | picture | common(BOOL)
items: id | item_name_id | picture | price | description
item_synonyms: id | item_name_id | name | error(BOOL)
Notes: error indicates a wrong spelling (eg. "Ericson"). description and picture of the item_names table are "globals" that can optionally be overridden by "local" description and picture fields of the items table (in case the store owner wants to supply a different picture for an item). common helps separate unique item names ("Jimmy Joe's Cheese Pizza" from "Cheese Pizza")
I think the bright side of this schema is:
Optimized searching & Handling Synonyms: I can query the item_names & item_synonyms tables using name LIKE %QUERY% and obtain the list of item_name_ids that need to be joined with the items table. (Examples of synonyms: "Sony Ericsson", "Sony Ericson", "X10", "X 10")
Autocompletion: Again, a simple query to the item_names table. I can avoid the usage of DISTINCT and it minimizes number of variations ("Sony Ericsson Xperia™ X10", "Sony Ericsson - Xperia X10", "Xperia X10, Sony Ericsson")
The down side would be:
Overhead: When inserting an item, I query item_names to see if this name already exists. If not, I create a new entry. When deleting an item, I count the number of entries with the same name. If this is the only item with that name, I delete the entry from the item_names table (just to keep things clean; accounts for possible erroneous submissions). And updating is the combination of both.
Weird Item Names: Store owners sometimes use sentences like "Harry Potter 1, 2 Books + CDs + Magic Hat". There's something off about having so much overhead to accommodate cases like this. This would perhaps be the prime reason I'm tempted to go for a schema like this:
items: id | name | picture | price | description
(... with item_names and item_synonyms as utility tables that I could query)
Is there a better schema you would suggest?
Should item names be normalized for autocomplete? Is this probably what Facebook does for "School", "City" entries?
Is the first schema or the second better/optimal for search?
Thanks in advance!
References: (1) Is normalizing a person's name going too far?, (2) Avoiding DISTINCT
EDIT: In the event of 2 items being entered with similar names, an Admin who sees this simply clicks "Make Synonym" which will convert one of the names into the synonym of the other. I don't require a way to automatically detect if an entered name is the synonym of the other. I'm hoping the autocomplete will take care of 95% of such cases. As the table set increases in size, the need to "Make Synonym" will decrease. Hope that clears the confusion.
UPDATE: To those who would like to know what I went ahead with... I've gone with the second schema but removed the item_names and item_synonyms tables in hopes that Solr will provide me with the ability to perform all the remaining tasks I need:
items: id | name | picture | price | description
Thanks everyone for the help!
The requirements you state in your comment ("Optimized searching", "Handling Synonyms" and "Autocomplete") are not things that are generally associated with an RDBMS. It sounds like what you're trying to solve is a searching problem, not a data storage and normalization problem. You might want to start looking at search architectures like Solr.
Excerpted from the Solr feature list:
Faceted Searching based on unique field values, explicit queries, or date ranges
Spelling suggestions for user queries
More Like This suggestions for given document
Auto-suggest functionality
Performance Optimizations
If there were more attributes exposed for mapping, I would suggest using a fast search index system. No need to set aliases up as the records are added, the attributes simply get indexed and each search issued returns matches with a relevance score. Take the top X% as valid matches and display those.
Creating and storing aliases seems like a brute-force, labor intensive approach that probably won't be able to adjust to the needs of your users.
Just an idea.
One thing that comes to my mind is sorting the characters in the name and in each synonym, after throwing away all white space. This is similar to the solution for finding all anagrams of a word. The end result is the ability to quickly find similar entries. As you pointed out, all synonyms should converge into one single term, or name. The search is then performed against the synonyms, again using the sorted input string.
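A minimal Python sketch of that sorted-character key, assuming names are also lowercased first (the lowercasing, the function names and the sample list are my own additions for illustration):

```python
from collections import defaultdict

def sort_key(name):
    """Drop whitespace, lowercase, and sort the remaining characters."""
    return "".join(sorted(ch for ch in name.lower() if not ch.isspace()))

def group_by_key(names):
    """Group names that collapse to the same sorted-character key."""
    groups = defaultdict(list)
    for name in names:
        groups[sort_key(name)].append(name)
    return dict(groups)

# Hypothetical sample taken from the synonyms mentioned in the question
names = ["X10", "X 10", "Sony Ericsson", "Sony Ericson"]
for key, members in group_by_key(names).items():
    print(key, members)
```

Note the limits: "X10" and "X 10" land in the same group, but a misspelling such as "Sony Ericson" has a different set of letters and gets its own key, so this catches spacing and ordering variants rather than typos.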

"2d Search" in Solr or how to get the best item of the multivalued field 'items'?

The title is a bit awkward but I couldn't find a better one. My problem is as follows:
I have several users stored as documents and I am storing several key-value pairs or items (which have an id) for each document. Now, if I apply highlighting with hl.snippets=5 I can get the first 5 items. But every user could have several hundred items, so you will not get the 5 most relevant items. You will get the first 5 items ...
Another problem is that the highlighted text won't contain the id, so retrieving additional information about the highlighted item is ugly.
Example where items are emails:
user1 has item1 { text:"developers developers developers", id:1, title:"ms" }
item2 { text:"c# development", id:2, title:"nice!" }
...
item77 ...
user2 has item1 { text:"nice restaurant", id:3, title:"bla"}
item2 { text:"best cafe", id:4, title:"blup"}
...
item223 ...
Now if I use highlighting for the text field and query against "restaurant" I get user2 and the text nice <b>restaurant</b>. But how can I determine the id of the highlighted text to display e.g. the title of this item? And what happens if more relevant items are listed at the end of the item-list? Highlighting won't display those ...
So how can I find the best items of a document with multiple such items?
I added my two findings as answers, but as I will point out each of them has its own drawbacks.
Could anyone point me to a better solution?
One of my rules of thumb for designing Solr schemas is: the document is what you will search for.
If you want to search for 'items', then these 'items' are your documents. How you store other stuff, like 'users', is secondary. So 'users' could be in another index like you mentioned, they could be "denormalized" (e.g. their information duplicated in each document), in a relational database, etc. depending on RDBMS availability, how many 'users' there are, how many fields these 'users' have, etc.
EDIT: now you explain that the 'items' are emails, and a possible search is 'restaurant X' and you want to find the best 'items' (emails). Therefore, the document is the email. The schema could be as simple as this: (id, title, text, user).
You could enable highlighting to get snippets of the 'text' or 'title' fields matching the 'restaurant X' query.
If you want to give the end-user information about the users that wrote about 'restaurant X', you could facet the 'user' field. Then the end-user would see that John wrote 10 emails about 'restaurant X' and Robert wrote 6. The end-user thinks "This John dude must know a lot about this restaurant" so he drills down into a search by 'restaurant x' with a filter query user:John
You could use two indices: users->items as described in the question, and an index with 'pure items' referencing back to the user.
Then you will need 2 queries (that's the reason I called the question '2d Search in Solr'):
1. query the user index => list of e.g. 10 users
2. query the items index for each user from step 1 => best items
Assume the following example:
userA emails are "restaurant X is bad but restaurant X is cheap", "different topic", "different topicB" and
userB emails are "restaurant X is not nice", "revisited restaurant X and it was ok now", "again in restaurant X and I think it is the best".
Now I query the user index for "restaurant X" and the first user will be userB, which is what I want. If I queried only the item index, I would get item1 of the less relevant userA.
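The two-step lookup can be sketched in plain Python using the example emails above. This is a toy model, not Solr: counting occurrences of the query phrase is only an assumed stand-in for real relevance scoring, and the names emails, score, top_users, best_items and two_step_search are made up for the illustration.

```python
# Toy stand-in for the two indices; phrase counting replaces real relevance scoring.
emails = {
    "userA": [
        "restaurant X is bad but restaurant X is cheap",
        "different topic",
        "different topicB",
    ],
    "userB": [
        "restaurant X is not nice",
        "revisited restaurant X and it was ok now",
        "again in restaurant X and I think it is the best",
    ],
}

def score(text, query):
    """Assumed scoring: how often the query phrase occurs in the text."""
    return text.lower().count(query.lower())

def top_users(query, limit=10):
    """Step 1: rank users by the summed score of all their items."""
    totals = {u: sum(score(t, query) for t in texts) for u, texts in emails.items()}
    ranked = sorted(totals, key=totals.get, reverse=True)
    return [u for u in ranked if totals[u] > 0][:limit]

def best_items(user, query, limit=5):
    """Step 2: rank one user's matching items by their own score."""
    hits = [t for t in emails[user] if score(t, query) > 0]
    return sorted(hits, key=lambda t: score(t, query), reverse=True)[:limit]

def two_step_search(query):
    """One user-level query, then one item-level query per returned user."""
    return {user: best_items(user, query) for user in top_users(query)}

for user, items in two_step_search("restaurant X").items():
    print(user, items)
```

With this scoring userB comes first at the user level, even though the single highest-scoring item belongs to userA, which is the effect described above. The loop inside two_step_search is also where the extra queries come from.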
Drawbacks:
bad performance, because you will need one query against the user index and e.g. 10 more to get the most relevant items for each user.
maintaining two indices.
Update: to avoid many queries I will try the following: use the user index to get some highlighted snippets, then offer a 'get relevant items' button for every user, which triggers a query against the item index.
You can use the collapse patch and store each item as separate document linking back to the user.
The problem with that approach is that you won't get the most relevant user, i.e. the most relevant item is not necessarily from the most relevant user (because he can have several slightly less relevant items).
See the "Assume the following example:" part in my second answer.
