DynamoDB update entire column efficiently

I have a DynamoDB table called 'Table'. It has a few columns, including one called 'updated'. I want to set every 'updated' field to '0' without having to provide a key, to avoid fetching and searching the table.
I tried a batch write, but it seems update_item requires Key inputs. How can I efficiently update the entire column so that every value is 0?
I am using a Python script.
Thanks a lot.

At this point you cannot do this: you have to pass a key (partition key, or partition key and sort key) to update an item.
Currently, the only way is to scan the table, optionally with a filter that returns only the items whose "updated" value is not already 0, and collect their keys.
Then pass each key to an update call and set the value.
Hopefully AWS will come up with something better in the future.
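
The scan-then-update approach described above could look roughly like the sketch below. It assumes `table` is a boto3 DynamoDB Table resource; the function name `reset_column` and its parameters are hypothetical, and you have to supply your own key attribute names.

```python
def reset_column(table, key_names, attr="updated", value=0):
    """Set `attr` to `value` on every item that does not already hold it.

    `table` is assumed to be a boto3 DynamoDB Table resource;
    `key_names` lists the table's key attributes (partition key,
    plus sort key if the table has one).
    """
    # Placeholders avoid clashes with DynamoDB reserved words.
    names = {"#a": attr}
    for i, key in enumerate(key_names):
        names[f"#k{i}"] = key

    scan_kwargs = {
        # Fetch only the key attributes; that is all update_item needs.
        "ProjectionExpression": ", ".join(f"#k{i}" for i in range(len(key_names))),
        # Skip items that already hold the target value.
        "FilterExpression": "attribute_not_exists(#a) OR #a <> :v",
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": {":v": value},
    }

    updated = 0
    while True:
        page = table.scan(**scan_kwargs)
        for item in page.get("Items", []):
            table.update_item(
                Key={key: item[key] for key in key_names},
                UpdateExpression="SET #a = :v",
                ExpressionAttributeNames={"#a": attr},
                ExpressionAttributeValues={":v": value},
            )
            updated += 1
        # A scan is paginated; keep going until there is no next page.
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            break
        scan_kwargs["ExclusiveStartKey"] = last_key
    return updated
```

With boto3 installed, this might be called as reset_column(boto3.resource("dynamodb").Table("Table"), ["id"]), where "id" stands in for the real partition key name. Pass value="0" if the column stores strings. The scan still reads every item and each update consumes write capacity, so this is not cheap on a large table.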

If you can get the partition key of each item, you can update the items one by one.

Related

Is sorting a table by a time field (where auto_now_add=True), equivalent to sorting it by the said table's primary key ID?

Imagine a database table with a time_of_insert attribute, which is auto-filled by the current time for every INSERT (e.g. in a Django model's example, the attribute has auto_now_add=True).
In that case, is sorting the said table by time_of_insert equivalent to sorting it by each row's ID (primary key)?
Background: I ask because I have a table with an auto-created time_of_insert attribute. I'm currently sorting the table by time_of_insert; this field isn't indexed. I feel I can simply sort it by id instead of indexing time_of_insert. That way I get fast results AND I don't incur the overhead of indexing one more table column. My DB is Postgres.
What am I missing?

No, it's not.
id guarantees uniqueness, and your datetime column does not.
So if two rows have the same time_of_insert value, the order of those rows in the result set is not guaranteed.
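
As a small illustration of the tie problem, the sketch below uses an in-memory SQLite database as a stand-in for Postgres. The table name `event` and the helper `ids_in_insert_order` are made up for the example. Adding id as a second sort key is one possible way to make the order deterministic when timestamps collide; it is not something the answer itself prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event (id INTEGER PRIMARY KEY, time_of_insert TEXT NOT NULL)"
)
# The first two rows share the same timestamp, so ordering by
# time_of_insert alone leaves their relative order undefined.
conn.executemany(
    "INSERT INTO event (time_of_insert) VALUES (?)",
    [
        ("2015-01-22 17:43:55",),
        ("2015-01-22 17:43:55",),
        ("2015-01-22 17:43:56",),
    ],
)


def ids_in_insert_order(conn):
    """Order by timestamp, using the unique id to break ties."""
    rows = conn.execute(
        "SELECT id FROM event ORDER BY time_of_insert, id"
    ).fetchall()
    return [row[0] for row in rows]
```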

Cassandra, counters and delete by field

I've this specific use case. I'm storing counters in a table associated with a timestamp:
CREATE TABLE IF NOT EXISTS metrics(
timestamp timestamp,
value counter,
PRIMARY KEY ((timestamp))
);
And I would like to delete all the metrics whose timestamp is lower than a specific value, for example:
DELETE FROM metrics WHERE timestamp < '2015-01-22 17:43:55-0800';
But this command is returning the following error:
code=2200 [Invalid query] message="Invalid operator < for PRIMARY KEY part timestamp"
How could I implement this functionality?

For a delete to work, you will need to provide a precise key with an equals operator. Deleting with a greater/less than operator does not work. Basically, you would have to obtain a list of the timestamps that you want to delete, and iterate through them with a (Python?) script or short (Java/C#) program.
One possible solution (if you happen to know how long you want to keep the data for) would be to set a time to live (TTL) on the data. On a table with counter columns, you cannot do it as part of an UPDATE command. The only option is to set it when you create the table:
CREATE TABLE IF NOT EXISTS metrics(
timestamp timestamp,
value counter,
PRIMARY KEY ((timestamp))
) WITH default_time_to_live=259200;
This would remove each row 3 days (259200 seconds) after it is written.
EDIT
And it turns out that the possible solution really isn't possible. Even though Cassandra lets you create a counter table with a default_time_to_live set, it doesn't enforce it.
Back to my original paragraph, the only way to execute a DELETE is to provide the specific key that you are deleting. And for counter tables, it looks like that is probably the only way possible.
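
The "obtain the timestamps and iterate through them" approach could be sketched in Python as below. It assumes `session` behaves like a Session from the DataStax cassandra-driver package, where execute(query, parameters) accepts %s placeholders and rows expose columns as attributes. The function name `delete_metrics_older_than` is hypothetical.

```python
def delete_metrics_older_than(session, cutoff):
    """Delete every row of `metrics` whose timestamp is before `cutoff`.

    `session` is assumed to be a cassandra-driver Session (or anything
    with a compatible execute method). `cutoff` should be comparable
    with the values the driver returns for the timestamp column.
    """
    deleted = 0
    # Range predicates are not allowed on the partition key, so read
    # all keys and filter on the client side.
    for row in session.execute("SELECT timestamp FROM metrics"):
        if row.timestamp < cutoff:
            # Each DELETE names one exact key with an equals operator.
            session.execute(
                "DELETE FROM metrics WHERE timestamp = %s",
                (row.timestamp,),
            )
            deleted += 1
    return deleted
```

This reads every partition key in the table, so it is only reasonable when the table is small enough to scan.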

Spring Batch Update: Insert only if does not exist otherwise update

I need to write a batch update statement, and I am able to do that. I don't have a primary key in my table, and there is a chance that duplicate data will be sent to the database.
I want to write the batch update so that it inserts only if the data does not exist. By "does not exist" I mean a combination of 3 columns that uniquely identifies a row. I don't want to make a primary key out of these 3 columns.
Is there a way to write a batch update that inserts only if the data does not exist, and otherwise does an update?
I have tried a merge query but could not get it to work.
Thanks

You can use an ItemProcessor to filter out duplicate items with a query: just return null if the item is already present in the database. Objects that pass the processor can be written with an ItemWriter, and you can be sure they are not duplicates.

Inserting a record into a database with looping foreign keys

I have this relationship:
Where CurrentVersionID points to the current active version of the game.
In ArcadeGameVersion the GameID property points to the associated ArcadeGame record.
Problem is, I can't insert either record:
The INSERT statement conflicted with the FOREIGN KEY constraint "FK_ArcadeGame_ArcadeGameVersions". The conflict occurred in database "Scirra", table "dbo.ArcadeGameVersions", column 'ID'.
Is this a badly formed data structure? Otherwise what is the best solution to overcome this?

This structure can work if you need it to be this way. Assuming the IDs are identity fields, I believe you will need to do this in 5 steps:
1. Insert an ArcadeGame record with a null value for CurrentVersionID
2. Determine the ID value of the record just added, using a statement like: SELECT @arcadeGameId = SCOPE_IDENTITY()
3. Insert an ArcadeGameVersion record, setting the GameID to the value determined in the previous step
4. Determine the ID value of the record just added (again using SCOPE_IDENTITY())
5. Update the ArcadeGame record (where the ID matches the one determined in step 2) and set CurrentVersionID to the value determined in the previous step
You will (most likely) want to do the above within a transaction.
If the IDs aren't identity fields and you know the values ahead of time, you can mostly follow the same steps as above but skip the SELECT SCOPE_IDENTITY() steps.
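
The five steps above can be sketched in Python as follows. This uses an in-memory SQLite database instead of SQL Server, so cursor.lastrowid plays the role of SCOPE_IDENTITY(). The column definitions are assumptions based only on the column names mentioned in the question, and `insert_game_with_version` is a made-up helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE ArcadeGame (
        ID INTEGER PRIMARY KEY,
        CurrentVersionID INTEGER REFERENCES ArcadeGameVersion(ID)  -- nullable
    );
    CREATE TABLE ArcadeGameVersion (
        ID INTEGER PRIMARY KEY,
        GameID INTEGER NOT NULL REFERENCES ArcadeGame(ID)
    );
    """
)


def insert_game_with_version(conn):
    """Insert a game and its first version, then link them together."""
    with conn:  # one transaction: commit on success, roll back on error
        # Steps 1-2: insert the game without a current version, keep its ID.
        cur = conn.execute("INSERT INTO ArcadeGame (CurrentVersionID) VALUES (NULL)")
        game_id = cur.lastrowid
        # Steps 3-4: insert the version pointing at the game, keep its ID.
        cur = conn.execute(
            "INSERT INTO ArcadeGameVersion (GameID) VALUES (?)", (game_id,)
        )
        version_id = cur.lastrowid
        # Step 5: close the loop by pointing the game at its version.
        conn.execute(
            "UPDATE ArcadeGame SET CurrentVersionID = ? WHERE ID = ?",
            (version_id, game_id),
        )
    return game_id, version_id


game_id, version_id = insert_game_with_version(conn)
```

The key design point is that CurrentVersionID must allow NULL, otherwise the first insert has nothing valid to reference.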

This seems badly formed. I cannot see why you need this circular reference.
I would use only one table, ArcadeGame, with additional fields CurrentVersion and UploadDate.
You can query it based on UploadDate, for example, depending on your needs. If you explain what you want from that database, the answer could be more specific.

Sort records by time inserted

How can I query data in the order it was created?
I don't have a date-created field in this table.

If you don't have a field storing the time of insertion, or any other meta-data regarding the order of insertion, there is no reliable way to get this information.
You could maybe depend on a clustered index key, but these are not guaranteed. Neither are IDENTITY fields or other auto-generated fields.
To clarify, an IDENTITY field does auto-increment, but...
You can insert explicit values with IDENTITY_INSERT
You can reseed and start reusing values
There is no built-in enforcement of uniqueness for an identity field
If the ID field is your PK, you can probably use that to get a rough idea:
SELECT *
FROM MyTable
ORDER BY IdField ASC
Per your comment, the field is a GUID. In that case, there is no way to return any sort of reliable order since GUIDs are inherently random and non-sequential.

You can use NEWSEQUENTIALID().

I came across this question because I was facing the same issue, and here is the solution that worked for me.
Alter the table and add an IDENTITY(1,1) column. The identity column is auto-populated for existing rows when you add it, and in my case the values followed the order in which the records were inserted.
It worked for me, but I'm not sure whether it works in all cases.