Indexing in room prepackaged database not working - database

I have a table in a prepackaged database whose index I created using this query
CREATE INDEX index_less_equal_to_L ON entries_less_equal_to_L(entry_word);
And I specificied in my room entity using this
#Entity(tableName = "entries_less_equal_to_L", indices = {#Index("index_less_ equal_to_L"), #Index(value = "entry_word")})
But it doesn't work, shows me this
[build error output]: https://i.stack.imgur.com/lQmtp.png
What I'm I doing wrong?

You are trying to create 2 indicies, 1 for each #Index.
The first will be named index_less_ equal_to_L but have no columns (hence the error message).
The second will be created (successfully) with a generated name and be on the entry_word column. The name of the index will be as per :-
If not set, Room will set it to the list of columns joined by '' and prefixed by "index${tableName}". So if you have a table with name "Foo" and with an index of {"bar", "baz"}, generated index name will be "index_Foo_bar_baz".
https://developer.android.com/reference/androidx/room
So you need to combine the two into a single index.
You could use either (note the space between _ and equal_to_L removed as I assume that this is a typing error)
indices = {#Index(name = "index_less_equal_to_L",value = "entry_word")}
or
indices = {Index(name ="index_less_equal_to_L",value = {"entry_word"})}
The latter being the more flexible as it allows for composite indicies (multiple columns).

Related

How to get the maximum number in DynamoDB column?

I have a DynamoDB table called URLArray that contains a list of URL's (myURL) and a unique video number (myNum).
I use AWS Amplify to query my table like so for example:
URLData = await API.graphql(graphqlOperation(getUrlArray, { id: "173883db-9ff1-4...."}));
Also myNum is a GSI, so i can also query the row using it, for example:
URLData = await API.graphql(graphqlOperation(getURLinfofromMyNum, { myNum: 5 }));
My question is, I would like to simply query this table to know what the maximum number of myNum is. So in this picture it'd return myNum = 12. How do i query my table to get this?
DynamoDb does not have the equivalent of the SQL expression select MAX(myNum), so you cannot do what you are asking with your table as-is.
A few suggestions:
Record the highest value of myNum as you insert items into the table. For example, you could create an item with PK = "METADATA" and an attribute named maxMyNum. The maxMyNum attribute could be updated conditionally if you have a value that is higher than what is stored in DDB.
You could build a secondary index with myNum as the sort key in a single partition. This would allow you to execute a query operation with ScanIndexForward set to false (descending order), and pick the first returned entry (the max value)
If you are generating an auto incrementing value in your application code, consider checking out the documentation regarding atomic counters.

MongoDB numeric index

I was wondering if it's possible to create a numeric count index where the first document would be 1 and as new documents are inserted the count would increase. If possible are you also able to apply it to documents imported via mongoimport? I have created and index via db.collection.createIndex( {index : 1} ) but it doesn't seem to be applying.
I would strongly recommend using ObjectId as your _id field. This has the benefit of being a good value for distributed systems, but also based on the date it was created. It also has a built-in index inside MongoDB.
Example using Morphia:
Date d = ...;
QueryImpl<MyClass> query = datastore.createQuery(MyClass);
query.field("_id").greaterThanOrEq(new ObjectId(d));
query.sort("_id");
query.limit(100);
List<MyClass> myDocs = query.asList();
This would fetch all documents created since date d in order of creation.
To load the next batch, change to:
query.field("_id").greaterThan(lastDoc.getId());
This will very efficiently load the next batch based on the ID of the last document from the previous batch.

Single search box Web2py, union usage

I am trying to create a single search box on my website.
First I split up the search input in multiple strings using split().
Then I am looping over the multiple strings I created with split(), with every string I create a query. These query's will be stored in a list.
In the next step I am trying to execute all those query's and store the results (rows) in another list.
The next thing I want to do is union all these results(rows). In this case the final result will be an output of a query containing all the different keywords used in the searchbox.
This is my code:
def ajaxlivesearch():
str = request.vars.values()[0]
a=str.split()
items = []
q = []
r =[]
for partialstr in a:
q.append((db.profiel.sport.like('%'+partialstr+'%'))|(db.profiel.speelsterkte.like('%'+partialstr+'%'))|(db.profiel.plaats.like('%'+partialstr+'%')))
for query in q:
r.append(db(query).select(groupby=db.profiel.id))
for results in r:
for (i,row) in enumerate(results):
items.append(DIV(A(B(row.id_user.first_name) ,NBSP(1), B(row.id_user.last_name),BR(), I(row.sport),I(','), NBSP(1), I(row.speelsterkte),I(','), NBSP(1),I(row.plaats),HR(), _id="res%s"%i, _href=row.id_user, _onclick="copyToBox($('#res%s').html())"%i), _id="resultLiveSearch"))
return TAG[''](*items)
My question is: How do I union the multiple results(rows)?
You can get the union of two Rows objects (removing duplicates) as follows:
rows_union = rows1 | rows2
However, it would be more efficient to get all the records in a single query. To simplify, you can also use the .contains method rather than using .like and wrapping each term with %s.
fields = ['sport', 'speelsterkte', 'plaats']
query_terms = [db.profiel[f].contains(term) for f in fields for term in a]
query = reduce(lambda a, b: a | b, query_terms)
results = db(query).select()
Also, you are not using any aggregation functions, so it is not clear why you have specified the groupby argument (and in any case, each record has a unique id, so grouping would have no effect). Perhaps you instead meant orderby=db.profiel.id.
Finally, it is probably not a good idea to do request.vars.values()[0], as request.vars is a dictionary-like object, and the particular value of interest is not guaranteed to be the first item in .values(). Instead, just refer to the name of the particular variable (e.g., request.vars.keyword), which is also more efficient because you are extracting a single item rather than converting all values to a list.

Best database design (model) for user tables

I'm developping a web application using google appengine and django, but I think my problem is more general.
The users have the possibility to create tables, look: tables are not represented as TABLES in the database. I give you an example:
First form:
Name of the the table: __________
First column name: __________
Second column name: _________
...
The number of columns is not fixed, but there is a maximum (100 for example). The type in every columns is the same.
Second form (after choosing a particular table the user can fill the table):
column_name1: _____________
column_name2: _____________
....
I'm using this solution, but it's wrong:
class Table(db.Model):
name = db.StringProperty(required = True)
class Column(db.Model):
name = db.StringProperty(required = True)
number = db.IntegerProperty()
table = db.ReferenceProperty(table, collection_name="columns")
class Value(db.Model):
time = db.TimeProperty()
column = db.ReferenceProperty(Column, collection_name="values")
when I want to list a table I take its columns and from every columns I take their values:
data = []
for column in data.columns:
column_data = []
for value in column.values:
column_data.append(value.time)
data.append(column_data)
data = zip(*data)
I think that the problem is the order of the values, because it is not true that the order for one column is the same for the others. I'm waiting for this bug (but until now I never seen it):
Table as I want: as I will got:
a z c a e c
d e f d h f
g h i g z i
Better solutions? Maybe using ListProperty?
Here's a data model that might do the trick for you:
class Table(db.Model):
name = db.StringProperty(required=True)
owner = db.UserProperty()
column_names = db.StringListProperty()
class Row(db.Model):
values = db.ListProperty(yourtype)
table = db.ReferenceProperty(Table, collection_name='rows')
My reasoning:
You don't really need a separate entity to store column names. Since all columns are of the same data type, you only need to store the name, and the fact that they are stored in a list gives you an implicit order number.
By storing the values in a list in the Row entity, you can use an index into the column_names property to find the matching value in the values property.
By storing all of the values for a row together in a single entity, there is no possibility of values appearing out of their correct order.
Caveat emptor:
This model will not work well if the table can have columns added to it after it has been populated with data. To make that possible, every time that a column is added, every existing row belonging to that table would have to have a value appended to its values list. If it were possible to efficiently store dictionaries in the datastore, this would not be a problem, but list can really only be appended to.
Alternatively, you could use Expando...
Another possibility is that you could define the Row model as an Expando, which allows you to dynamically create properties on an entity. You could set column values only for the columns that have values in them, and that you could also add columns to the table after it has data in it and not break anything:
class Row(db.Expando):
table = db.ReferenceProperty(Table, collection_name='rows')
#staticmethod
def __name_for_column_index(index):
return "column_%d" % index
def __getitem__(self, key):
# Allows one to get at the columns of Row entities with
# subscript syntax:
# first_row = Row.get()
# col1 = first_row[1]
# col12 = first_row[12]
value = None
try:
value = self.__dict__[Row.__name_for_column_index]
catch KeyError:
# The given column is not defined for this Row
pass
return value
def __setitem__(self, key, value):
# Allows one to set the columns of Row entities with
# subscript syntax:
# first_row = Row.get()
# first_row[5] = "New values for column 5"
self.__dict__[Row.__name_for_column_index] = value
# In order to allow efficient multiple column changes,
# the put() can go somewhere else.
self.put()
Why don't you add an IntegerProperty to Value for rowNumber and increment it every time you add a new row of values and then you can reconstruct the table by sorting by rowNumber.
You're going to make life very hard for yourself unless your user's 'tables' are actually stored as real tables in a relational database. Find some way of actually creating tables and use the power of an RDBMS, or you're reinventing a very complex and sophisticated wheel.
This is the conceptual idea I would use:
I would create two classes for the data-store:
table this would serve as a
dictionary, storing the structure of
the pseudo-tables your app would
create. it would have two fields :
table_name, column_name,
column_order . where column_order
would give the position of the
column within the table
data
this would store the actual data in
the pseudo-tables. it would have
four fields : row_id, table_name,
column_name , column_data. row_id
would be the same for data
pertaining to the same row and would
be unique for data across the
various pseudo-tables.
Put the data in a LongBlob.
The power of a database is to be able to search and organise data so that you are able to get only the part you want for performances and simplicity issues : you don't want the whole database, you just want a part of it and want it fast. But from what I understand, when you retrieve a user's data, you retrieve it all and display it. So you don't need to sotre the data in a normal "database" way.
What I would suggest is to simply format and store the whole data from a single user in a single column with a suitable type (LongBlob for example). The format would be an object with a list of columns and rows of type. And you define the object in whatever language you use to communicate with the database.
The columns in your (real) database would be : User int, TableNo int, Table Longblob.
If user8 has 3 tables, you will have the following rows :
8, 1, objectcontaintingtable1;
8, 2, objectcontaintingtable2;
8, 3, objectcontaintingtable3;

Searching for and matching elements across arrays

I have two tables.
In one table there are two columns, one has the ID and the other the abstracts of a document about 300-500 words long. There are about 500 rows.
The other table has only one column and >18000 rows. Each cell of that column contains a distinct acronym such as NGF, EPO, TPO etc.
I am interested in a script that will scan each abstract of the table 1 and identify one or more of the acronyms present in it, which are also present in table 2.
Finally the program will create a separate table where the first column contains the content of the first column of the table 1 (i.e. ID) and the acronyms found in the document associated with that ID.
Can some one with expertise in Python, Perl or any other scripting language help?
It seems to me that you are trying to join the two tables where the acronym appears in the abstract. ie (pseudo SQL):
SELECT acronym.id, document.id
FROM acronym, document
WHERE acronym.value IN explode(documents.abstract)
Given the desired semantics you can use the most straight forward approach:
acronyms = ['ABC', ...]
documents = [(0, "Document zeros discusses the value of ABC in the context of..."), ...]
joins = []
for id, abstract in documents:
for word in abstract.split():
try:
index = acronyms.index(word)
joins.append((id, index))
except ValueError:
pass # word not an acronym
This is a straightforward implementation; however, it has n cubed running time as acronyms.index performs a linear search (of our largest array, no less). We can improve the algorithm by first building a hash index of the acronyms:
acronyms = ['ABC', ...]
documents = [(0, "Document zeros discusses the value of ABC in the context of..."), ...]
index = dict((acronym, idx) for idx, acronym in enumberate(acronyms))
joins = []
for id, abstract in documents:
for word in abstract.split():
try
joins.append((id, index[word]))
except KeyError:
pass # word not an acronym
Of course, you might want to consider using an actual database. That way you won't have to implement your joins by hand.
Thanks a lot for the quick response.
I assume the pseudo SQL solution is for MYSQL etc. However it did not work in Microsoft ACCESS.
the second and the third are for Python I assume. Can I feed acronym and document as input files?
babru
It didn't work in Access because tables are accessed differently (e.g. acronym.[id])

Resources