Single search box in web2py: union usage

I am trying to create a single search box on my website.
First, I split the search input into multiple strings using split().
Then I loop over those strings and create a query for each one. These queries are stored in a list.
Next, I execute all of those queries and store the results (rows) in another list.
Finally, I want to union all of these results (rows), so that the final result is the output of a query covering all the different keywords used in the search box.
This is my code:
def ajaxlivesearch():
    keyword = request.vars.values()[0]  # renamed from 'str' to avoid shadowing the built-in
    a = keyword.split()
    items = []
    q = []
    r = []
    # build one query per search term, matching any of the three fields
    for partialstr in a:
        q.append((db.profiel.sport.like('%' + partialstr + '%')) |
                 (db.profiel.speelsterkte.like('%' + partialstr + '%')) |
                 (db.profiel.plaats.like('%' + partialstr + '%')))
    # run each query separately and collect the Rows objects
    for query in q:
        r.append(db(query).select(groupby=db.profiel.id))
    # render every row of every result as a clickable entry
    for results in r:
        for (i, row) in enumerate(results):
            items.append(DIV(A(B(row.id_user.first_name), NBSP(1), B(row.id_user.last_name), BR(),
                               I(row.sport), I(','), NBSP(1), I(row.speelsterkte), I(','), NBSP(1), I(row.plaats), HR(),
                               _id="res%s" % i, _href=row.id_user,
                               _onclick="copyToBox($('#res%s').html())" % i),
                             _id="resultLiveSearch"))
    return TAG[''](*items)
My question is: how do I union the multiple results (rows)?

You can get the union of two Rows objects (removing duplicates) as follows:
rows_union = rows1 | rows2
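To fold the whole list of Rows objects from the question into one, the same operator extends over the list. A minimal sketch (note that in Python 3, reduce must be imported from functools):
from functools import reduce

# r is the list of Rows objects built in the question's loop
rows_union = reduce(lambda a, b: a | b, r)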
However, it would be more efficient to get all the records in a single query. To simplify, you can also use the .contains method rather than using .like and wrapping each term in % wildcards.
fields = ['sport', 'speelsterkte', 'plaats']
query_terms = [db.profiel[f].contains(term) for f in fields for term in a]
query = reduce(lambda a, b: a | b, query_terms)
results = db(query).select()
Also, you are not using any aggregation functions, so it is not clear why you have specified the groupby argument (and in any case, each record has a unique id, so grouping would have no effect). Perhaps you instead meant orderby=db.profiel.id.
Finally, it is probably not a good idea to use request.vars.values()[0]: request.vars is a dictionary-like object, and the particular value of interest is not guaranteed to be the first item in .values(). Instead, refer to the variable by name (e.g., request.vars.keyword), which is also more efficient because you extract a single item rather than converting all values to a list.
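Putting these suggestions together, a minimal sketch of the revised action might look like this (assuming the search box posts its text as request.vars.keyword; the rendering of items is unchanged from the question):
def ajaxlivesearch():
    terms = (request.vars.keyword or '').split()
    fields = ['sport', 'speelsterkte', 'plaats']
    query_terms = [db.profiel[f].contains(term) for f in fields for term in terms]
    if not query_terms:
        return ''
    # one query, one database round trip
    query = reduce(lambda a, b: a | b, query_terms)
    rows = db(query).select(orderby=db.profiel.id)
    # build the result DIVs from rows exactly as in the original code
    ...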

Related

Snowflake Flatten Query for array

A Snowflake table has one VARIANT column, loaded with three JSON records. The records are as follows.
{"address":{"City":"Lexington","Address1":"316 Tarrar Springs Rd","Address2":null} {"address":{"City":"Hartford","Address1":"318 Springs Rd","Address2":"319 Springs Rd"} {"address":{"City":"Avon","Address1":"38 Springs Rd","Address2":[{"txtvalue":null},{"txtvalue":"Line 1"},{"Line1":"Line 1"}]}
If you look at the Address2 field in the JSON, the first record holds null, the second a string, and the third an array.
When I execute the flatten query for Address2, only the third record gets exploded, since it is the only one holding an array. How do I get all the records, with the exploded values, in a single query?
select data:address:City::string, data:address:Address1::string, value:txtvalue::string
from add1, lateral flatten( input => data:address:Address2 );
When I execute the flatten query for Address2, as only one record holds an array, I get only the 3rd record exploded
The default behaviour of the FLATTEN table function in Snowflake will skip any columns that do not have a structure to expand, and the OUTER argument controls this behaviour. Quoting the relevant portion from the documentation link above (emphasis mine):
OUTER => TRUE | FALSE
If FALSE, any input rows that cannot be expanded, either because they cannot be accessed in the path or because they have zero fields or entries, are completely omitted from the output.
If TRUE, exactly one row is generated for zero-row expansions (with NULL in the KEY, INDEX, and VALUE columns).
Default: FALSE
Since your VARIANT data is oddly formed, you'll need to leverage conditional expressions and data type predicates to check if the column in the expanded row is of an ARRAY type, a VARCHAR, or something else, and use the result to emit the right value.
A sample query illustrating all of the above:
SELECT
  t.data:address.City AS city
  , t.data:address.Address1 AS address1
  , CASE
      WHEN IS_ARRAY(t.data:address.Address2) THEN f.value:txtvalue::string
      ELSE t.data:address.Address2::string
    END AS address2
FROM
  add1 t
  , LATERAL FLATTEN(INPUT => t.data:address.Address2, OUTER => TRUE) f;
P.S. Consider standardizing your input at ingest or at the source to reduce your query complexity.
Note: Your data example is inconsistent (the array of objects does not have homogeneous keys), but going by your example query I've assumed that all keys of objects in the array will be named txtvalue.

Google Search API - is it possible to do an IN query?

Using the Search API, is it possible to do an IN query, i.e., to query for documents where a string field's value is contained in a given array of strings?
The Search API doesn't have an IN operator, but one can be emulated using the OR operator. For example, pattern IN [word1, word2, word3] and pattern IN word_list can be written as:
index.search('word1 OR word2 OR word3')
and, respectively:
index.search(' OR '.join(word_list))
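If the IN test should apply to a specific document field rather than the whole document, the same trick works with the field-search syntax; here tag is a hypothetical field name:
# emulate: tag IN word_list, by OR-ing one field:value term per word
index.search(' OR '.join('tag:%s' % word for word in word_list))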
Yes, it is possible to query documents by their contents.
There are several possibilities; you can find this example in "Searching for documents by their contents":
def query_index():
    index = search.Index('products')
    query_string = 'product: piano OR price < 5000'
    results = index.search(query_string)
    for scored_document in results:
        print(scored_document)
You can also find more information about the Query class and its options, as well as more documentation about how queries work. For example:
The simplest query, sometimes called a "global search", is a string that contains only field values. This search uses a string that searches for documents that contain the words "rose" and "water":
def simple_search(index):
    index.search('rose water')
If the array of strings is too big, using a categories array of string arrays could be an option to slightly reduce the cost. Some categories could be mutually exclusive, which would help to reduce the query processing time. For example:
categories = [['animal', 'dog', 'cat', 'fish'],
              ['positive', 'good', 'fine', 'great'],
              ['negative', 'horrible', 'disgusting', 'awful'],
              ['water recreation', 'relax', 'holidays', 'spa', 'beach', 'sea']]
def query_index():
    index = search.Index('products')
    for categ in categories:
        query_string = ' OR '.join(categ)
        results = index.search(query_string)
        if len(results) > 0:
            print(categ[0])
            break
    for scored_document in results:
        print(scored_document)

PostgreSQL count results within jsonb array across multiple rows

As stated in the title, I am in a situation where I need to return a count of occurrences within an array that is inside a jsonb column. A pseudo example is as follows:
CREATE TABLE users (id int primary key, tags jsonb);
INSERT INTO users (id, tags) VALUES
(1, '{"Friends": ["foo", "bar", "baz"]}'),
(2, '{"Friends": ["bar", "bar"]}');
Please note that the array under "Friends" can contain the same value more than once. This will be relevant later (in this case, the second row contains the name "bar" twice in its jsonb column under the key "Friends").
Question:
For the example above, if I were to search for the value "bar" (with a query that I need help to construct), I want the number of times "bar" appears in the tags (jsonb) column within the key "Friends"; in this case, the end result I am looking for is the integer 3, as the term "bar" appears 3 times across the 2 rows.
Where I'm at:
Currently I have SQL written that returns all of the Friends values (from the multiple selected rows) as a single, one-dimensional column of text. That SQL is as follows:
SELECT jsonb_array_elements_text(tags->'Friends') FROM users;
yielding the following result:
jsonb_array_elements_text
-------------------------
foo
bar
baz
bar
bar
Given that this is an array, is it possible to filter this by the term "bar" in some fashion in order to get the count of the number of times it appears? Or am I way off in my approach?
Other Details:
Version: psql (PostgreSQL) 9.5.2
The table in question has a GIN index on it.
Please let me know if any additional information is needed, thanks in advance.
You need to use the result of the function as a proper table, then you can easily count the number of times the value appears.
select count(x.val)
from users
cross join lateral jsonb_array_elements_text(tags->'Friends') as x(val)
where x.val = 'bar'
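If you are running this from Python, a minimal sketch with psycopg2 might look as follows (the connection parameters are placeholders, and the search term is passed as a query parameter):
import psycopg2

conn = psycopg2.connect(dbname='mydb')  # placeholder connection details
with conn, conn.cursor() as cur:
    cur.execute(
        """
        select count(x.val)
        from users
        cross join lateral jsonb_array_elements_text(tags->'Friends') as x(val)
        where x.val = %s
        """,
        ('bar',),
    )
    print(cur.fetchone()[0])  # prints 3 for the sample data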

Google App Engine: IN filter and argument position

I have a table where one of the columns contains a list. I want to know if it is possible to select all rows where the list contains a specific element.
More concretely, I have a guests column containing a list of strings and I want to know if a specific guest string is part of this list. I would like to write a query like this:
q = TableName.gql('WHERE :g IN guests', g=guest)
It seems, however, that I can't put variables in this position. For instance, this query (where ownerid is a string and not a string list) is also disallowed:
q = TableName.gql('WHERE :g = ownerid', g=guest)
I seem to have to write it this way:
q = TableName.gql('WHERE ownerid = :g', g=guest)
Thus I have the following questions:
How can I construct a query that gets rows where a list-cell contains a specific member?
Are arguments for GQL queries restricted to the right-hand side of operators? What is the restriction?
I am using Google App Engine with Python 2.7. Thanks!
You have misunderstood what the IN operator is for. It is not for querying against a repeated field: you just use the normal = for that. IN is for querying against a list of values, e.g. guest IN [1, 2, 3, 4]. Your query should be:
q = TableName.gql('WHERE guests = :g', g=guest)
or better, since GQL doesn't give you anything that the standard DB syntax doesn't:
q = TableName.all().filter('guests =', guest)
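For context, here is a minimal sketch of how a list property behaves with the equality filter (the model and field names are hypothetical):
from google.appengine.ext import db

class TableName(db.Model):
    guests = db.StringListProperty()  # repeated field: a list of strings

guest = 'alice'  # example value
# '=' on a list property matches entities whose list contains the value
q = TableName.all().filter('guests =', guest)
for entity in q:
    print(entity.guests)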

Searching for and matching elements across arrays

I have two tables.
In one table there are two columns: one has the ID, and the other has the abstract of a document, about 300-500 words long. There are about 500 rows.
The other table has only one column and >18000 rows. Each cell of that column contains a distinct acronym such as NGF, EPO, TPO, etc.
I am interested in a script that will scan each abstract in table 1 and identify any acronyms present in it that are also present in table 2.
Finally, the program will create a separate table where the first column contains the content of the first column of table 1 (i.e., the ID) and the acronyms found in the document associated with that ID.
Can someone with expertise in Python, Perl, or any other scripting language help?
It seems to me that you are trying to join the two tables where the acronym appears in the abstract, i.e. (pseudo-SQL):
SELECT acronym.id, document.id
FROM acronym, document
WHERE acronym.value IN explode(documents.abstract)
Given the desired semantics, you can use the most straightforward approach:
acronyms = ['ABC', ...]
documents = [(0, "Document zero discusses the value of ABC in the context of..."), ...]

joins = []
for id, abstract in documents:
    for word in abstract.split():
        try:
            # linear search over the acronym list
            index = acronyms.index(word)
            joins.append((id, index))
        except ValueError:
            pass  # word not an acronym
This is a straightforward implementation; however, its running time grows with documents × words × acronyms, as acronyms.index performs a linear search (over our largest list, no less). We can improve the algorithm by first building a hash index of the acronyms:
acronyms = ['ABC', ...]
documents = [(0, "Document zero discusses the value of ABC in the context of..."), ...]

# map each acronym to its position for constant-time lookups
index = dict((acronym, idx) for idx, acronym in enumerate(acronyms))

joins = []
for id, abstract in documents:
    for word in abstract.split():
        try:
            joins.append((id, index[word]))
        except KeyError:
            pass  # word not an acronym
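To turn joins into the table the question asks for (ID plus the acronyms found in that document), a short, hypothetical continuation could be:
from collections import defaultdict

# collapse (document id, acronym index) pairs into id -> acronyms found
found = defaultdict(list)
for doc_id, acro_idx in joins:
    found[doc_id].append(acronyms[acro_idx])

for doc_id, acros in sorted(found.items()):
    print(doc_id, ', '.join(acros))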
Of course, you might want to consider using an actual database. That way you won't have to implement your joins by hand.
Thanks a lot for the quick response.
I assume the pseudo-SQL solution is for MySQL, etc. However, it did not work in Microsoft Access.
The second and third are for Python, I assume. Can I feed the acronyms and documents as input files?
babru
It didn't work in Access because tables are accessed differently (e.g. acronym.[id])
