I understand GQL does not support the LIKE operator, so how would I do autocomplete? For example, I'm storing "People" with a "Name", and I have a box in which I type "Nig". How would I look for all people whose name contains what was typed, not just those whose name starts with it?
Assuming I already have all the handling code to pass the content of the box to the back end, I'm just wondering how to query the data.
Thanks.
Google Cloud Datastore queries do not support arbitrary substring matches (such as the CONTAINS operator in SQL). You can do prefix matching by executing range queries.
If you want full-text search capabilities and are running in App Engine, check out the Search API.
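For the prefix-matching part, here is a minimal sketch of the range-query idea, simulated on a sorted in-memory list rather than a real Datastore. The helper names (`prefix_bounds`, `prefix_search`) and the sample names are made up for illustration; the `"\ufffd"` upper bound is a commonly used sentinel and is an assumption here, not something stated in the answer above.

```python
import bisect

def prefix_bounds(prefix):
    """Return (low, high) so that low <= s < high for strings starting with prefix."""
    # "\ufffd" is a very high code point used as a sentinel upper bound.
    # Caveat: values whose next character sorts above U+FFFD would be missed.
    return prefix, prefix + "\ufffd"

def prefix_search(sorted_names, prefix):
    """Simulate a range scan over an index sorted by name (case-sensitive)."""
    low, high = prefix_bounds(prefix)
    start = bisect.bisect_left(sorted_names, low)
    end = bisect.bisect_left(sorted_names, high)
    return sorted_names[start:end]

# Hypothetical data standing in for the "Name" property of "People" entities.
names = sorted(["Nigel", "Nigella", "Nina", "Anigel", "Nick"])

# In GQL the same bounds would be used roughly as:
#   SELECT * FROM People WHERE Name >= :1 AND Name < :2
print(prefix_search(names, "Nig"))
```

Note that "Anigel" is not returned: a range scan only finds values that start with the typed text, which is exactly the limitation described above.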
Related
I'm new to Azure Cognitive Services, and while I'm pretty sure it can help me solve my problem, I don't quite understand which part of it to use for it...
Here's what I want to do:
We have blog posts, say ~1k, and those blog posts all have categories and tags (multiple each). What I want to do is "guess" the right categories/tags for each article based on its content, and then present them to the editor as suggestions at the time of input ("looks like this article is about: health, well-being, ..."). The ~1k articles we already have in the system are correctly tagged/categorized, so I'd like to use these as a data source for this "guessing".
I've used Azure Search before, and it seems like some combination of EntityRecognition and KeyPhraseExtraction might be a way in the right direction? Azure Cognitive Services also seems to have an API that supports TextAnalytics that would do something similar. I'm a bit confused about why these are two different things (or are they not?)
This also seems like an entirely common problem (matching text against pre-defined categories based on other text that is categorized), so I'm wondering if I'm just missing an obvious solution here?
Thanks in advance.
I think the Azure Cognitive Text Analytics API is your best bet as you are looking for real-time analysis prior to tagging/categorizing for storage.
Text Analytics could return a list of named entities that you could map to your available tags/categories and present to the user.
Azure Cognitive Search requires an indexer and skillset to process target text with an end result of storing the processed results to an index specifically for searching.
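The mapping step (entities to your existing tags) could look something like the sketch below. It makes no calls to Azure: the `entities` list is a hard-coded stand-in for whatever the service returns, and `suggest_tags` and the sample tags are hypothetical names used only for illustration.

```python
def suggest_tags(entities, known_tags):
    """Return the known tags that match an extracted entity, case-insensitively."""
    # Map lowercased tag -> tag as stored, so suggestions use the stored spelling.
    lookup = {tag.lower(): tag for tag in known_tags}
    suggestions = []
    for entity in entities:
        tag = lookup.get(entity.strip().lower())
        # Keep first-seen order and skip duplicates.
        if tag is not None and tag not in suggestions:
            suggestions.append(tag)
    return suggestions

# Stand-in for entities/key phrases returned by a text analysis call (assumed shape).
entities = ["Health", "well-being", "London", "health"]
# Tags already in use on the existing articles (made-up examples).
known_tags = ["health", "well-being", "nutrition"]

print(suggest_tags(entities, known_tags))
```

Exact matching like this is the simplest option; in practice you would probably want some normalization (plurals, synonyms) on top of it.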
Scenario:
Blob storage: contains PDF, Word, and image files (about 70 files)
I used default fields and predefined skills to create an Azure search instance through the Azure Portal.
But the results for querying any text in these files are not very good. I made content and key phrases searchable and retrievable. I tried using Lucene analyzers, but they were not much help.
The main concern is that if I type even a single letter, for example "u", in the search explorer, it returns the file. As far as I can tell, there is no such word in my files, so what is it doing?
How can I refine the search, and how can I manipulate the results?
I am not an expert in document processing, so I am using the unstructured documents in the blob instead of JSON-formatted documents.
Another thing: how do I define a field in the index, let's say chapter-name or title-name, that relates to the chapter or title names in the PDFs?
Please suggest some ideas or example links. I am using .NET Core to develop this.
Use a custom skillset to extract the fields you require, and make sure those fields are defined in the index.
Maybe I am missing something, but is there any way to use the new text search features described in the 2011 presentation http://www.youtube.com/watch?v=7B7FyU9wW8Y (approx. 30-minute mark) with Objectify, Entities, and Java? I realize it is an experimental release, but the text search features that are present don't seem to cover the full extent of what was discussed in the presentation. I don't want to have to write my own code to manage the creation and updating of documents, but I don't currently see another way.
Currently, you can't use Full Text Search to search through entities in Datastore; you'll need to create search documents in a search index to use the Full Text Search API, as described in these docs.
I have a User class and store the user's submitted full name as a single String:
class User {
    private String mFullName;
}
There's no partial string matching available in App Engine, though. I am looking through the current options, and it looks like the nonrel-search project has come up with a pretty good solution.
I'm using Java, though, and I don't think there's a Java port of that project. What are other Java users doing?
I'm thinking of just writing user info to a separate MySQL database on account creation, and just hitting that database with a %like% search when users need it (this should be pretty infrequent).
Duplicating user info to a separate MySQL database would be a temporary solution until something built-in comes along for App Engine Java, but I haven't heard of anything like this on the roadmap.
Any info/ideas would be great!
Thanks
On the Python side of things, I've seen one or two full-text search implementations mentioned. The way they work is to tokenize the text to be searched and then create App Engine indexes that can be used for queries. I'm not sure if anything comparable has been created for Java at this time.
If you can make do with searches that are equivalent to 'string%' or '%string', it shouldn't be too difficult to make your own index and do a range scan. In the case of '%string', you would store the strings to be searched in reverse, so that the search becomes a prefix scan for 'gnirts%'.
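To illustrate the reversed-string trick for '%string', here is a rough sketch in Python (shown only to convey the idea; the same approach carries over to Java). It simulates the range scan on a sorted in-memory list. The function names and sample full names are hypothetical, and the `"\ufffd"` upper bound is an assumed sentinel.

```python
import bisect

def build_reversed_index(values):
    """Store every value reversed and sorted, like an index on a reversed property."""
    return sorted(v[::-1] for v in values)

def suffix_search(reversed_index, suffix):
    """Find values ending in `suffix` via a prefix range scan on reversed strings."""
    low = suffix[::-1]       # 'string' becomes 'gnirts'
    high = low + "\ufffd"    # sentinel upper bound for the range scan
    start = bisect.bisect_left(reversed_index, low)
    end = bisect.bisect_left(reversed_index, high)
    # Reverse the hits again to get the original values back.
    return [r[::-1] for r in reversed_index[start:end]]

# Made-up full names standing in for stored user names.
index = build_reversed_index(["John Smith", "Jane Smith", "Sam Jones"])
print(suffix_search(index, "Smith"))
```

This still only covers "ends with"; a match in the middle of the string needs a different approach, such as a tokenized index.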
Background:
I'm building a poetry site with user submitted content. The relevant user actions for my questions are that users can:
a. Go to fancysitename.com/view to see all poems so far
b. Go to fancysitename.com/submit to submit your own poem.
c. Go to fancysitename.com/apoemid to view a particular poem you've bookmarked before.
d. Go to fancysitename.com/search to enter a word to search for in all the poems.
All the poems are stored as text fields in a database and referenced by a poem id. So the "apoemid" in step c will be the primary key of the tuple and I'll just pull up the text after getting the key from the url.
Question:
1. The poems exist nowhere except in a database. My webapp is literally 4 html files. Will this approach affect my search engine rankings?
2. Is there a more efficient way to do 'd' than doing a SELECT * on the db and manually parsing the text on the server? Each poem will be at most 10 lines long, so I would imagine using a full-text search engine like Lucene would be overkill.
Caveat
I'm running this on Google App Engine for now, so my database customization options are pretty limited. So while I'd certainly be interested in hearing about the ideal way to do this, this is a pet side project, so my budget is limited :(
Thanks!
Edit: Apparently I don't google so well at 7am. I've since found a solution for question 2 here so please disregard question 2.
App Engine currently doesn't support full-text indexing, but it does have a better-than-nothing SearchableModel.
Some details of SearchableModel can be found here:
http://groups.google.com/group/google-appengine/browse_thread/thread/f64eacbd31629668/8dac5499bd58a6b7?lnk=gst&q=searchablemodel
Regarding search engine ranking: yes, having all your poems in the datastore can affect your ranking. This is generally overcome through the use of a sitemap. Here is an article about how StackOverflow uses a sitemap to help its search ranking.
http://www.codinghorror.com/blog/archives/001174.html
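As a rough idea of what generating a sitemap for the poem pages might look like, here is a small sketch using only the Python standard library. The poem ids are made up, and `build_sitemap` is a hypothetical helper; in the real app the ids would come from a datastore query, and the result would be served at a URL such as /sitemap.xml (also an assumption).

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(base_url, poem_ids):
    """Return a sitemap XML string with one <url><loc> entry per poem page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for poem_id in poem_ids:
        url = ET.SubElement(urlset, "url")
        loc = ET.SubElement(url, "loc")
        loc.text = "{}/{}".format(base_url.rstrip("/"), poem_id)
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical poem ids; the site name is the placeholder used in the question.
sitemap_xml = build_sitemap("http://fancysitename.com", ["poem1", "poem2"])
print(sitemap_xml)
```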
In most database engines, you can accomplish this kind of searching. For example, MySQL does have full-text searching. I am not sure how App Engine works, but you can always have a stored procedure do this search.
Where you store your data will not affect your site's ranking, only how you serve it up (on what URLs, etc). There's absolutely no way for an arbitrary search spider to tell where you store your data, and no reason for it to care, either.
Regardless of the length of your text, you will need full-text searching if you want to search inside a string. As Sam points out, SearchableModel ought to work just fine for that.
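To make the idea of word search over short texts concrete, here is a minimal sketch of a token index in plain Python. This is not how SearchableModel is implemented; it only illustrates the general approach of splitting each poem into words and recording which poems contain each word. All names and the sample poems are made up.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def build_index(poems):
    """Map each word to the set of poem ids whose text contains it."""
    index = defaultdict(set)
    for poem_id, text in poems.items():
        for word in tokenize(text):
            index[word].add(poem_id)
    return index

def search(index, word):
    """Return the sorted ids of poems containing the whole word (case-insensitive)."""
    return sorted(index.get(word.lower(), set()))

# Hypothetical poems keyed by poem id.
poems = {
    "p1": "The rose is red",
    "p2": "A red, red rose",
    "p3": "Violets are blue",
}
index = build_index(poems)
print(search(index, "red"))
```

This matches whole words only, so searching for "ros" would not find "rose"; it also avoids scanning every poem on each search, since the lookup goes through the index.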