I have a DynamoDB table called URLArray that contains a list of URLs (myURL) and a unique video number (myNum).
I use AWS Amplify to query my table like so for example:
URLData = await API.graphql(graphqlOperation(getUrlArray, { id: "173883db-9ff1-4...."}));
Also, myNum is a GSI, so I can also query the row using it, for example:
URLData = await API.graphql(graphqlOperation(getURLinfofromMyNum, { myNum: 5 }));
My question is: I would like to simply query this table to find the maximum value of myNum. For example, if the highest value stored were 12, the query should return myNum = 12. How do I query my table to get this?
DynamoDB does not have an equivalent of the SQL expression SELECT MAX(myNum), so you cannot do what you are asking with your table as-is.
A few suggestions:
Record the highest value of myNum as you insert items into the table. For example, you could create an item with PK = "METADATA" and an attribute named maxMyNum. The maxMyNum attribute could be updated conditionally, only when you insert a value higher than what is currently stored in DDB.
You could build a secondary index with myNum as the sort key in a single partition. This would allow you to execute a query operation with ScanIndexForward set to false (descending order) and pick the first returned entry (the max value). Both of these first two options are sketched in code after this list.
If you are generating an auto-incrementing value in your application code, consider checking out the documentation regarding atomic counters.
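Here is a minimal boto3 sketch of the first two suggestions. The table name URLArray and the id key come from the question; the METADATA item layout, the GSI name byMyNum, and its constant partition key gsiPk are hypothetical:

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("URLArray")

def record_max(new_num):
    """Suggestion 1: conditionally track the running maximum on every insert."""
    try:
        table.update_item(
            Key={"id": "METADATA"},
            UpdateExpression="SET maxMyNum = :n",
            # Only overwrite when the new value is strictly higher,
            # or when no maximum has been recorded yet.
            ConditionExpression="attribute_not_exists(maxMyNum) OR maxMyNum < :n",
            ExpressionAttributeValues={":n": new_num},
        )
    except table.meta.client.exceptions.ConditionalCheckFailedException:
        pass  # an equal or higher maximum is already stored

def current_max():
    """Suggestion 2: descending query on a GSI that keeps every row in one
    partition with myNum as the sort key; the first item is the maximum."""
    resp = table.query(
        IndexName="byMyNum",
        KeyConditionExpression=Key("gsiPk").eq("ALL"),
        ScanIndexForward=False,  # descending by myNum
        Limit=1,
    )
    items = resp["Items"]
    return items[0]["myNum"] if items else None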
Related
Can I somehow get the index of a query response in DynamoDB?
[hashKey exists, sortKey exists]
query { KeyCondExp = "hashKey = smthin1", FilterExp = "nonPrimeKey = smthin2" }
I need the index of the row, according to the sort key, for the selected document.
When a DynamoDB Query request returns an item - in your example chosen by a specific filter - it will return the full item, including the sort key. If that is what you call "the index of row according to sortKey", then you are done.
If, however, by "index" you mean the numeric index - i.e., if the item is the 100th sort key in this partition (hash key), you want to return the number 100 - well, that you can't do. DynamoDB keeps rows inside a partition sorted by the sort key, but not numbered. You can insert an item in the middle of a million-row partition, and it will be inserted in the right place but DynamoDB won't bother to renumber the million-row list just to maintain numeric indexes.
But there is something else you should know. In the query you described, you are using a FilterExpression to return only specific rows out of the entire partition. With such a request, Amazon will charge you for reading the entire partition, not just the specific rows returned after the filter. If you're charged for reading the entire partition, you might as well just read it all, without a filter, and then you can actually count the rows and get the numeric index of the match if that's what you want. Reading the entire partition will cause you more work at the client (and more network traffic), but will not increase your DynamoDB RCU bill.
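To make that concrete, here is a hedged boto3 sketch (the table name is hypothetical; hashKey and sortKey are the attribute names from the question) that reads the whole partition without a FilterExpression and computes the numeric position client-side:

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("MyTable")  # hypothetical name

def numeric_index(hash_key_value, target_sort_key):
    # Returns the 0-based position of target_sort_key within the
    # partition, counting items in sort-key order across result pages.
    index = 0
    kwargs = {"KeyConditionExpression": Key("hashKey").eq(hash_key_value)}
    while True:
        resp = table.query(**kwargs)
        for item in resp["Items"]:
            if item["sortKey"] == target_sort_key:
                return index
            index += 1
        if "LastEvaluatedKey" not in resp:
            return None  # sort key not present in this partition
        kwargs["ExclusiveStartKey"] = resp["LastEvaluatedKey"]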
I used to use Milvus 1.0, where I could get all IDs by using the get_collection_stats and list_id_in_segment APIs.
These days I am trying Milvus 2.0, and I also want to get all IDs from it, but I can't find any way to do so.
Milvus v2.0.x supports queries using boolean expressions. This can be used to return IDs by checking whether the primary-key field is greater than or equal to zero.
Let's assume you are using this schema for your collection.
referencing: https://github.com/milvus-io/pymilvus/blob/master/examples/hello_milvus.py
as of 3/8/2022
from pymilvus import (
    connections, FieldSchema, CollectionSchema, DataType, Collection
)

connections.connect("default", host="localhost", port="19530")

dim = 8  # dimension of the example vectors
fields = [
    FieldSchema(name="pk", dtype=DataType.INT64, is_primary=True, auto_id=False),
    FieldSchema(name="random", dtype=DataType.DOUBLE),
    FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=dim)
]
schema = CollectionSchema(fields, "hello_milvus is the simplest demo to introduce the APIs")
hello_milvus = Collection("hello_milvus", schema, consistency_level="Strong")
Remember to insert something into your collection first... see the pymilvus example.
Here you query out all IDs (pk). You cannot currently list the IDs of a specific segment, but this returns all IDs in the collection:
hello_milvus.load()  # the collection must be loaded before it can be queried
res = hello_milvus.query(
    expr="pk >= 0",
    output_fields=["pk", "embeddings"]
)
for x in res:
    print(x["pk"], x["embeddings"])
I think this is the only way to do it now, since they removed list_id_in_segment.
I was wondering if it's possible to create a numeric count index, where the first document would be 1 and the count would increase as new documents are inserted. If so, can it also be applied to documents imported via mongoimport? I have created an index via db.collection.createIndex({ index: 1 }), but it doesn't seem to be applying.
I would strongly recommend using ObjectId as your _id field. It is a good value for distributed systems, and it also encodes the date at which it was created. On top of that, _id has a built-in index inside MongoDB.
Example using Morphia:
Date d = ...;
Query<MyClass> query = datastore.createQuery(MyClass.class);
query.field("_id").greaterThanOrEq(new ObjectId(d));
query.sort("_id");
query.limit(100);
List<MyClass> myDocs = query.asList();
This would fetch all documents created since date d in order of creation.
To load the next batch, change to:
query.field("_id").greaterThan(lastDoc.getId());
This will very efficiently load the next batch based on the ID of the last document from the previous batch.
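For reference, here is the same pattern as a minimal pymongo sketch (the database and collection names are hypothetical); bson's ObjectId.from_datetime builds the starting _id from a date:

from datetime import datetime, timezone
from bson.objectid import ObjectId
from pymongo import MongoClient

coll = MongoClient()["mydb"]["mycoll"]  # hypothetical names

# First batch: everything created since d, in creation order.
d = datetime(2024, 1, 1, tzinfo=timezone.utc)
batch = list(
    coll.find({"_id": {"$gte": ObjectId.from_datetime(d)}})
        .sort("_id", 1)
        .limit(100)
)

# Next batch: strictly after the last _id of the previous batch.
if batch:
    next_batch = list(
        coll.find({"_id": {"$gt": batch[-1]["_id"]}})
            .sort("_id", 1)
            .limit(100)
    )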
As stated in the title, I am in a situation where I need to return a count of occurrences of a value within an array that lives inside a jsonb column. A pseudo example is as follows:
CREATE TABLE users (id int primary key, tags jsonb);
INSERT INTO users (id, tags) VALUES
(1, '{"Friends": ["foo", "bar", "baz"]}'),
(2, '{"Friends": ["bar", "bar"]}');
Please note that the Friends array can contain the same value more than once; this will be relevant later. (In this case the second row contains the name "bar" twice in its jsonb column under the key "Friends".)
Question:
For the example above, if I were to search for the value "bar" (with a query that I need help writing), I want the number of times "bar" appears in the tags (jsonb) column within the key "Friends". In this case the end result I am looking for is the integer 3, as the term "bar" appears 3 times across the 2 rows.
Where I'm at:
Currently I have SQL written that returns a text array containing all of the Friends values (from the multiple selected rows) in a single, one-dimensional array. That SQL is as follows:
SELECT jsonb_array_elements_text(tags->'Friends') FROM users;
yielding the following result:
jsonb_array_elements_text
-------------------------
foo
bar
baz
bar
bar
Given that this is an array, is it possible to filter this by the term "bar" in some fashion in order to get the count of the number of times it appears? Or am I way off in my approach?
Other Details:
Version: psql (PostgreSQL) 9.5.2
The table in question has a GIN index on it.
Please let me know if any additional information is needed, thanks in advance.
You need to use the result of the function as a proper table, then you can easily count the number of times the value appears.
select count(x.val)
from users
cross join lateral jsonb_array_elements_text(tags->'Friends') as x(val)
where x.val = 'bar';
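Against the sample data above this returns 3: the unnested values are foo, bar, baz, bar, bar, and 'bar' appears three times. count(*) would behave identically here, since the WHERE clause already restricts the joined rows to matches.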
I'm developing a web application using Google App Engine and Django, but I think my problem is more general.
Users can create their own tables, but note that these are not represented as TABLES in the database. Here is an example:
First form:
Name of the table: __________
First column name: __________
Second column name: _________
...
The number of columns is not fixed, but there is a maximum (100, for example). The type of every column is the same.
Second form (after choosing a particular table the user can fill the table):
column_name1: _____________
column_name2: _____________
....
I'm using this solution, but it's wrong:
class Table(db.Model):
    name = db.StringProperty(required=True)

class Column(db.Model):
    name = db.StringProperty(required=True)
    number = db.IntegerProperty()
    table = db.ReferenceProperty(Table, collection_name="columns")

class Value(db.Model):
    time = db.TimeProperty()
    column = db.ReferenceProperty(Column, collection_name="values")
When I want to list a table, I take its columns, and from every column I take its values:
data = []
for column in table.columns:  # the Table entity whose data we are listing
    column_data = []
    for value in column.values:
        column_data.append(value.time)
    data.append(column_data)
data = zip(*data)
I think the problem is the order of the values, because it is not guaranteed that the order of one column matches the order of the others. I'm expecting this bug (though so far I have never seen it):
Table as I want:    Table as I will get:
a z c               a e c
d e f               d h f
g h i               g z i
Better solutions? Maybe using ListProperty?
Here's a data model that might do the trick for you:
class Table(db.Model):
    name = db.StringProperty(required=True)
    owner = db.UserProperty()
    column_names = db.StringListProperty()

class Row(db.Model):
    values = db.ListProperty(yourtype)
    table = db.ReferenceProperty(Table, collection_name='rows')
My reasoning:
You don't really need a separate entity to store column names. Since all columns are of the same data type, you only need to store the name, and the fact that they are stored in a list gives you an implicit order number.
By storing the values in a list in the Row entity, you can use an index into the column_names property to find the matching value in the values property, as sketched after this list.
By storing all of the values for a row together in a single entity, there is no possibility of values appearing out of their correct order.
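For instance, here is a minimal, hypothetical sketch of reading and writing one cell under this model (the helper names are mine, not part of the answer):

def get_cell(table, row, column_name):
    # The position of the name in column_names is the implicit column
    # number, so the same index locates the value in row.values.
    index = table.column_names.index(column_name)
    return row.values[index]

def set_cell(table, row, column_name, value):
    index = table.column_names.index(column_name)
    row.values[index] = value
    row.put()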
Caveat emptor:
This model will not work well if columns can be added to a table after it has been populated with data. To make that possible, every time a column is added, every existing row belonging to that table would have to have a value appended to its values list. If it were possible to efficiently store dictionaries in the datastore, this would not be a problem, but lists can really only be appended to.
Alternatively, you could use Expando...
Another possibility is to define the Row model as an Expando, which allows you to dynamically create properties on an entity. You could set column values only for the columns that have values in them, and you could also add columns to the table after it has data in it without breaking anything:
class Row(db.Expando):
    table = db.ReferenceProperty(Table, collection_name='rows')

    @staticmethod
    def __name_for_column_index(index):
        return "column_%d" % index

    def __getitem__(self, key):
        # Allows one to get at the columns of Row entities with
        # subscript syntax:
        #   first_row = Row.get()
        #   col1 = first_row[1]
        #   col12 = first_row[12]
        value = None
        try:
            value = self.__dict__[Row.__name_for_column_index(key)]
        except KeyError:
            # The given column is not defined for this Row
            pass
        return value

    def __setitem__(self, key, value):
        # Allows one to set the columns of Row entities with
        # subscript syntax:
        #   first_row = Row.get()
        #   first_row[5] = "New values for column 5"
        self.__dict__[Row.__name_for_column_index(key)] = value
        # In order to allow efficient multiple column changes,
        # the put() could go somewhere else.
        self.put()
Why not add an IntegerProperty to Value for rowNumber and increment it every time you add a new row of values? Then you could reconstruct the table by sorting on rowNumber.
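A hedged sketch of that tweak on the question's model (the attribute name is mine):

class Value(db.Model):
    time = db.TimeProperty()
    row_number = db.IntegerProperty()  # identical for all values in one logical row
    column = db.ReferenceProperty(Column, collection_name="values")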
You're going to make life very hard for yourself unless your users' 'tables' are actually stored as real tables in a relational database. Find some way of actually creating tables and use the power of an RDBMS, or you'll be reinventing a very complex and sophisticated wheel.
This is the conceptual idea I would use:
I would create two classes for the data-store:

table: this would serve as a dictionary, storing the structure of the pseudo-tables your app would create. It would have three fields: table_name, column_name, column_order, where column_order gives the position of the column within the table.

data: this would store the actual data in the pseudo-tables. It would have four fields: row_id, table_name, column_name, column_data. row_id would be the same for data pertaining to the same row and would be unique across the various pseudo-tables.
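A minimal sketch of that conceptual model, written with the App Engine datastore classes used elsewhere in this question (the class and property spellings are just one possible rendering of the idea):

class PseudoTableColumn(db.Model):
    # One entity per column of a pseudo-table: the "dictionary".
    table_name = db.StringProperty(required=True)
    column_name = db.StringProperty(required=True)
    column_order = db.IntegerProperty(required=True)

class PseudoTableData(db.Model):
    # One entity per cell; row_id groups the cells of one row.
    row_id = db.StringProperty(required=True)
    table_name = db.StringProperty(required=True)
    column_name = db.StringProperty(required=True)
    column_data = db.StringProperty()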
Put the data in a LongBlob.
The power of a database is being able to search and organise data so that you can fetch only the part you want, for performance and simplicity: you don't want the whole database, you just want part of it, and you want it fast. But from what I understand, when you retrieve a user's data, you retrieve all of it and display it. So you don't need to store the data in the normal "database" way.
What I would suggest is to simply format and store the whole of a single user's data in a single column with a suitable type (LongBlob, for example). The format would be an object with a list of columns and a list of rows of values, and you would define that object in whatever language you use to communicate with the database.
The columns in your (real) database would be : User int, TableNo int, Table Longblob.
If user 8 has 3 tables, you will have the following rows:
8, 1, objectcontainingtable1;
8, 2, objectcontainingtable2;
8, 3, objectcontainingtable3;
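As a hedged illustration of the idea (the table and column names are hypothetical, and JSON stands in for whatever object format you choose):

import json
import sqlite3  # stand-in for your actual RDBMS

conn = sqlite3.connect("app.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS UserTables "
    "(User INTEGER, TableNo INTEGER, TableBlob BLOB)"
)

# Serialize a whole pseudo-table (columns plus rows) into one value.
pseudo_table = {
    "columns": ["time", "temperature"],
    "rows": [["08:00", 19.5], ["09:00", 21.0]],
}
blob = json.dumps(pseudo_table).encode("utf-8")
conn.execute("INSERT INTO UserTables VALUES (?, ?, ?)", (8, 1, blob))
conn.commit()

# Reading it back restores the full pseudo-table in one fetch.
row = conn.execute(
    "SELECT TableBlob FROM UserTables WHERE User = ? AND TableNo = ?",
    (8, 1),
).fetchone()
restored = json.loads(row[0].decode("utf-8"))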