How to make a UUID in DynamoDB? - database

In my DB schema, I need an auto-increment primary key. How can I implement this?
PS: To access DynamoDB, I use dynode, a module for Node.js.

Disclaimer: I am the maintainer of the Dynamodb-mapper project
Intuitive workflow of an auto-increment key:
get the last counter position
add 1
use the new number as the index of the object
save the new counter value
save the object
This is just to explain the underlying idea. Never do it this way, because the sequence is not atomic: under concurrent load, two or more objects can be allocated the same ID, and one write will overwrite the other, resulting in data loss.
The solution is to use the atomic ADD operation along with ALL_NEW of UpdateItem:
atomically generate an ID
use the new number as the index of the object
save the object
In the worst case, the application crashes after an ID is allocated but before the object is saved, which leaves a gap in the sequence; the same ID is never allocated twice.
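As a rough Node.js illustration of the atomic ADD plus ALL_NEW idea, here is a minimal sketch that only builds the request parameters for the low-level UpdateItem call. It uses the UpdateExpression form of ADD, which is an assumption on my part (the Python implementation linked below may use a different request shape). The table name 'MyTable' and the attribute name 'max_id' are hypothetical placeholders; the hash_key value of -1 follows the convention described in this answer.

```javascript
// Builds low-level DynamoDB UpdateItem parameters that atomically add 1 to a
// counter attribute and return the item as it looks after the update.
// tableName and counterAttribute are hypothetical placeholders.
function buildAllocateIdParams(tableName, counterAttribute) {
  return {
    TableName: tableName,
    // The counter lives in one well-known item (hash_key = -1 in this answer).
    Key: { hash_key: { N: '-1' } },
    // ADD creates the attribute if it is missing and increments it otherwise.
    UpdateExpression: 'ADD #counter :one',
    ExpressionAttributeNames: { '#counter': counterAttribute },
    ExpressionAttributeValues: { ':one': { N: '1' } },
    // ALL_NEW returns the item after the update, so the new ID comes back
    // in the same request that reserved it.
    ReturnValues: 'ALL_NEW'
  };
}

// Reads the freshly allocated ID out of an UpdateItem response.
function extractAllocatedId(response, counterAttribute) {
  return Number(response.Attributes[counterAttribute].N);
}

const params = buildAllocateIdParams('MyTable', 'max_id');
console.log(params.UpdateExpression);
```

Because the increment and the read happen in one request, two concurrent callers cannot receive the same value; the object itself is then saved under the returned ID in a second request.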
There is one remaining problem: where to store the last ID value ? We chose:
{
"hash_key"=-1, #0 was judged too risky as it is the default value for integers.
"__max_hash_key__y"=N
}
Of course, to work reliably, all applications inserting data MUST be aware of this system; otherwise you might (again) overwrite data.
The last step is to automate the process. For example:
When hash_key is 0:
atomically_allocate_ID()
actual_save()
For implementation details (Python, sorry), see https://bitbucket.org/Ludia/dynamodb-mapper/src/8173d0e8b55d/dynamodb_mapper/model.py#cl-67
To tell you the truth, my company does not use it in production because, most of the time, it is better to find another natural key: an ID for a user, a datetime for a transaction, and so on.
I wrote some examples in dynamodb-mapper's documentation, and they can easily be extrapolated to Node.js.
If you have any questions, feel free to ask.

Another approach is to use a UUID generator for primary keys, as these are highly unlikely to clash.
IMO you are more likely to experience errors consolidating primary key counters across highly available DynamoDB tables than from clashes in generated UUIDs.
For example, in Node:
npm install uuid
var uuid = require('uuid');
// Generate a v1 (time-based) id
uuid.v1(); // -> '6c84fb90-12c4-11e1-840d-7b25c5ee775a'
// Generate a v4 (random) id
uuid.v4(); // -> '110ec58a-a0f2-4ac4-8393-c866d813b8d1'
Taken from SO answer.

If you're okay with gaps in your incrementing id, and you're okay with it only roughly corresponding to the order in which the rows were added, you can roll your own: Create a separate table called NextIdTable, with one primary key (numeric), call it Counter.
Each time you want to generate a new id, you would do the following:
Do a GetItem on NextIdTable to read the current value of Counter --> curValue
Do a PutItem on NextIdTable to set the value of Counter to curValue + 1. Make this a conditional PutItem so that it will fail if the value of Counter has changed.
If that conditional PutItem failed, it means someone else was doing this at the same time as you were. Start over.
If it succeeded, then curValue is your new unique ID.
Of course, if your process crashes before actually applying that ID anywhere, you'll "leak" it and have a gap in your sequence of IDs. And if you're doing this concurrently with some other process, one of you will get value 39 and one of you will get value 40, and there are no guarantees about which order they will actually be applied in your data table; the guy who got 40 might write it before the guy who got 39. But it does give you a rough ordering.
Parameters for a conditional PutItem in node.js are detailed here. http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/frames.html#!AWS/DynamoDB.html. If you had previously read a value of 38 from Counter, your conditional PutItem request might look like this.
var conditionalPutParams = {
  TableName: 'NextIdTable',
  Item: {
    Counter: {
      N: '39'
    }
  },
  Expected: {
    Counter: {
      AttributeValueList: [
        {
          N: '38'
        }
      ],
      ComparisonOperator: 'EQ'
    }
  }
};
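The read / conditional write / start over loop described above can be sketched as follows. This is only an illustration under stated assumptions: the store object with read() and compareAndSet() is a hypothetical interface standing in for the GetItem and conditional PutItem calls, and createMemoryStore() is an in-memory stand-in so the sketch runs without AWS.

```javascript
// Hypothetical counter store: read() resolves to the current value and
// compareAndSet(expected, next) resolves to true only if the stored value
// still equals `expected` (the role the conditional PutItem plays).
async function allocateId(store, maxAttempts = 10) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const curValue = await store.read();
    const won = await store.compareAndSet(curValue, curValue + 1);
    if (won) {
      return curValue; // nobody else changed the counter: the ID is ours
    }
    // Someone else incremented the counter first: start over.
  }
  throw new Error('Could not allocate an ID after ' + maxAttempts + ' attempts');
}

// In-memory stand-in for NextIdTable, for illustration only.
function createMemoryStore(initial) {
  let value = initial;
  return {
    read: async () => value,
    compareAndSet: async (expected, next) => {
      if (value !== expected) return false;
      value = next;
      return true;
    }
  };
}

allocateId(createMemoryStore(38)).then(id => console.log('allocated', id));
```

Capping the number of attempts is a design choice of this sketch; under heavy contention you may prefer a backoff between retries.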

For those coding in Java, DynamoDBMapper can now generate unique UUIDs on your behalf.
DynamoDBAutoGeneratedKey
Marks a partition key or sort key property as being auto-generated.
DynamoDBMapper will generate a random UUID when saving these
attributes. Only String properties can be marked as auto-generated
keys.
Use the DynamoDBAutoGeneratedKey annotation like this:
@DynamoDBTable(tableName="AutoGeneratedKeysExample")
public class AutoGeneratedKeys {
    private String id;

    @DynamoDBHashKey(attributeName = "Id")
    @DynamoDBAutoGeneratedKey
    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
}
As you can see in the example above, you can apply both the DynamoDBAutoGeneratedKey and DynamoDBHashKey annotation to the same attribute to generate a unique hash key.

Addition to @yadutaf's answer
AWS supports Atomic Counters.
Create a separate table (order_id) with a row holding the latest order_number:
+----+--------------+
| id | order_number |
+----+--------------+
| 0 | 5000 |
+----+--------------+
This lets you increment order_number by 1 and get the incremented result in a callback from AWS DynamoDB:
const AWS = require('aws-sdk');

const config = {
  region: 'us-east-1',
  endpoint: "http://localhost:8000"
};
const docClient = new AWS.DynamoDB.DocumentClient(config);

const params = {
  TableName: 'order_id',
  Key: {
    "id": 0
  },
  // Atomically add :val to the stored counter
  UpdateExpression: "set order_number = order_number + :val",
  ExpressionAttributeValues: {
    ":val": 1
  },
  // Return only the updated attributes, as they appear after the update
  ReturnValues: "UPDATED_NEW"
};

docClient.update(params, function(err, data) {
  if (err) {
    console.log("Unable to update the table. Error JSON:", JSON.stringify(err, null, 2));
  } else {
    console.log(data);
    console.log(data.Attributes.order_number); // <= here is our incremented result
  }
});
🛈 Be aware that in some rare cases there might be problems with the connection between your caller and the AWS API. The DynamoDB row is then incremented while you get a connection error, so some incremented values may end up unused.
You can use the incremented data.Attributes.order_number in your table, e.g. to insert {id: data.Attributes.order_number, otherfields:{}} into the order table.

I don't believe it is possible to do a SQL-style auto-increment, because the tables are partitioned across multiple machines. I generate my own UUID in PHP, which does the job; I'm sure you could come up with something similar in JavaScript.

I've had the same problem and created a small web service just for this purpose. See this blog post, that explains how I'm using stateful.co with DynamoDB in order to simulate auto-increment functionality: http://www.yegor256.com/2014/05/18/cloud-autoincrement-counters.html
Basically, you register an atomic counter at stateful.co and increment it every time you need a new value, through RESTful API. The service is free.

Auto-increment is not good from a performance perspective, as it will overload specific shards while keeping others idle; the data is not evenly distributed when you store it in DynamoDB.
awsRequestId looks like it is actually a v4 (random) UUID. Here is a code snippet to try it:
exports.handler = function(event, context, callback) {
console.log('remaining time =', context.getRemainingTimeInMillis());
console.log('functionName =', context.functionName);
console.log('AWSrequestID =', context.awsRequestId);
callback(null, context.functionName);
};
In case you want to generate this yourself, you can use https://www.npmjs.com/package/uuid or Ulide to generate different versions of UUID based on RFC-4122
V1 (timestamp based)
V3 (Namespace)
V4 (Random)
For Go developers, you can use these packages from Google's UUID, Pborman, or Satori. Pborman is better in performance, check these articles and benchmarks for more details.
More Info on Universal Unique Identifier Specification could be found here.

Create a new file, file.js, and put this code in it:
exports.guid = function () {
function _p8(s) {
var p = (Math.random().toString(16)+"000000000").substr(2,8);
return s ? "-" + p.substr(0,4) + "-" + p.substr(4,4) : p ;
}
return (_p8() + _p8(true) + _p8(true)+new Date().toISOString().slice(0,10)).replace(/-/g,"");
}
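Then you can apply this function to the primary key id. It generates a 32-character identifier: 24 random hexadecimal characters followed by the current date (YYYYMMDD). Note that this is not an RFC 4122 UUID.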

If you are using DynamoDB with the Dynamoose ORM, you can easily set a unique id. Here is a simple user creation example:
// user.model.js
const dynamoose = require("dynamoose");
const userSchema = new dynamoose.Schema(
{
id: {
type: String,
hashKey: true,
},
displayName: String,
firstName: String,
lastName: String,
},
{ timestamps: true },
);
const User = dynamoose.model("User", userSchema);
module.exports = User;
// User.controller.js
const { v4: uuidv4 } = require("uuid");
const User = require("./user.model");
exports.create = async (req, res) => {
const user = new User({ id: uuidv4(), ...req.body }); // set unique id
// to(), badRes() and goodRes() are helpers that are not defined in this snippet
const [err, response] = await to(user.save());
if (err) {
return badRes(res, err);
}
return goodRes(res, response);
};

Instead of using UUID use KSUID for ids. Naturally ordered by generation time.
https://www.npmjs.com/package/ksuid?activeTab=readme

Related

firebase firestore how to read from a subcollection [duplicate]

I thought I read that you can query subcollections with the new Firebase Firestore, but I don't see any examples. For example I have my Firestore setup in the following way:
Dances [collection]
danceName
Songs [collection]
songName
How would I be able to query "Find all dances where songName == 'X'"
Update 2019-05-07
Today we released collection group queries, and these allow you to query across subcollections.
So, for example in the web SDK:
db.collectionGroup('Songs')
.where('songName', '==', 'X')
.get()
This would match documents in any collection where the last part of the collection path is 'Songs'.
Your original question was about finding dances where songName == 'X', and this still isn't possible directly, however, for each Song that matched you can load its parent.
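To illustrate the "load its parent" step: in the web SDK each matched snapshot exposes its reference, and ref.parent.parent points at the containing document. The sketch below shows the same idea on plain path strings so that it runs without Firestore; the function names and the example paths are hypothetical.

```javascript
// Given the path of a matched song document, e.g. "Dances/d1/Songs/s1",
// return the path of the dance document that contains it.
function parentDocumentPath(documentPath) {
  const segments = documentPath.split('/');
  // A document path alternates collection/document, so it has an even number
  // of segments; a subcollection document needs at least four.
  if (segments.length < 4 || segments.length % 2 !== 0) {
    throw new Error('Not a subcollection document path: ' + documentPath);
  }
  return segments.slice(0, -2).join('/');
}

// Collect the distinct parent dances for a list of matched song paths.
function parentDancePaths(songPaths) {
  return Array.from(new Set(songPaths.map(parentDocumentPath)));
}

console.log(parentDancePaths(['Dances/d1/Songs/s1', 'Dances/d1/Songs/s2', 'Dances/d2/Songs/s9']));
```

Deduplicating the parents first avoids loading the same dance once per matching song.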
Original answer
This is a feature which does not yet exist. It's called a "collection group query" and would allow you to query all songs regardless of which dance contained them. This is something we intend to support but don't have a concrete timeline on when it's coming.
The alternative structure at this point is to make songs a top-level collection and make which dance the song is a part of a property of the song.
UPDATE
Now Firestore supports array-contains
Having these documents
{danceName: 'Danca name 1', songName: ['Title1','Title2']}
{danceName: 'Danca name 2', songName: ['Title3']}
do it this way
collection("Dances")
.where("songName", "array-contains", "Title1")
.get()...
@Nelson.b.austin Since Firestore does not have that yet, I suggest a flat structure, meaning:
Dances = {
danceName: 'Dance name 1',
songName_Title1: true,
songName_Title2: true,
songName_Title3: false
}
Having it in that way, you can get it done:
var songTitle = 'Title1';
var dances = db.collection("Dances");
var query = dances.where("songName_"+songTitle, "==", true);
I hope this helps.
UPDATE 2019
Firestore has released Collection Group Queries. See Gil's answer above or the official Collection Group Query documentation.
Previous Answer
As stated by Gil Gilbert, it seems that collection group queries are currently in the works. In the meantime, it is probably better to use root-level collections and just link between these collections using the document UIDs.
For those who don't already know, Jeff Delaney has some incredible guides and resources for anyone working with Firebase (and Angular) on AngularFirebase.
Firestore NoSQL Relational Data Modeling - Here he breaks down the basics of NoSQL and Firestore DB structuring
Advanced Data Modeling With Firestore by Example - These are more advanced techniques to keep in the back of your mind. A great read for those wanting to take their Firestore skills to the next level
What if you store songs as an object instead of as a collection? Each dance is a document with songs as a field of type Object (not a collection):
{
danceName: "My Dance",
songs: {
"aNameOfASong": true,
"aNameOfAnotherSong": true,
}
}
then you could query for all dances with aNameOfASong:
db.collection('Dances')
.where('songs.aNameOfASong', '==', true)
.get()
.then(function(querySnapshot) {
querySnapshot.forEach(function(doc) {
console.log(doc.id, " => ", doc.data());
});
})
.catch(function(error) {
console.log("Error getting documents: ", error);
});
NEW UPDATE July 8, 2019:
db.collectionGroup('Songs')
.where('songName', isEqualTo:'X')
.get()
I have found a solution.
Please check this.
var museums = Firestore.instance.collectionGroup('Songs').where('songName', isEqualTo: "X");
museums.getDocuments().then((querySnapshot) {
setState(() {
songCounts= querySnapshot.documents.length.toString();
});
});
And then you can see Data, Rules, Indexes, Usage tabs in your cloud firestore from console.firebase.google.com.
Finally, you should set indexes in the indexes tab.
Fill in collection ID and some field value here.
Then Select the collection group option.
Enjoy it. Thanks
You can always search like this:-
this.key$ = new BehaviorSubject(null);
return this.key$.switchMap(key =>
this.angFirestore
.collection("dances").doc("danceName").collection("songs", ref =>
ref
.where("songName", "==", X)
)
.snapshotChanges()
.map(actions => {
if (actions.toString()) {
return actions.map(a => {
const data = a.payload.doc.data() as Dance;
const id = a.payload.doc.id;
return { id, ...data };
});
} else {
return false;
}
})
);
Query limitations
Cloud Firestore does not support the following types of queries:
Queries with range filters on different fields.
Single queries across multiple collections or subcollections. Each query runs against a single collection of documents. For more information about how your data structure affects your queries, see Choose a Data Structure.
Logical OR queries. In this case, you should create a separate query for each OR condition and merge the query results in your app.
Queries with a != clause. In this case, you should split the query into a greater-than query and a less-than query. For example, although the query clause where("age", "!=", "30") is not supported, you can get the same result set by combining two queries, one with the clause where("age", "<", "30") and one with the clause where("age", ">", 30).
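The "merge the query results in your app" step from the quoted limitations could look like the sketch below. It assumes each result has already been converted to a plain object with an id field (for example from doc.id and doc.data()); the function name and the sample data are hypothetical.

```javascript
// Merge the results of several separate queries into one list, keeping the
// first copy of each document so a document matching two conditions
// appears only once.
function mergeResultsById(...resultSets) {
  const byId = new Map();
  for (const results of resultSets) {
    for (const doc of results) {
      if (!byId.has(doc.id)) {
        byId.set(doc.id, doc);
      }
    }
  }
  return Array.from(byId.values());
}

const younger = [{ id: 'a', age: 25 }, { id: 'b', age: 29 }];
const older = [{ id: 'c', age: 31 }];
console.log(mergeResultsById(younger, older).map(doc => doc.id));
```

For the two-query replacement of != the result sets cannot overlap, but for OR conditions they can, which is why the merge is keyed by document id.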
I'm working with Observables here and the AngularFire wrapper but here's how I managed to do that.
It's kind of crazy, I'm still learning about observables and I possibly overdid it. But it was a nice exercise.
Some explanation (not an RxJS expert):
songId$ is an observable that will emit ids.
dance$ is an observable that reads that id and then takes only the first value.
It then queries the collectionGroup of all songs to find all instances of it.
Based on the instances, it traverses to the parent Dances and gets their ids.
Now that we have all the Dance ids, we need to query them to get their data. But I wanted it to perform well, so instead of querying one by one I batch them in buckets of 10 (the maximum an in query will take).
We end up with N buckets and need to do N queries on Firestore to get their values.
Once we do the queries on Firestore, we still need to actually parse the data from them.
Finally, we can merge all the query results to get a single array with all the Dances in it.
type Song = {id: string, name: string};
type Dance = {id: string, name: string, songs: Song[]};
const songId$: Observable<Song> = new Observable();
const dance$ = songId$.pipe(
take(1), // Only take 1 song name
switchMap( v =>
// Query across collectionGroup to get all instances.
this.db.collectionGroup('songs', ref =>
ref.where('id', '==', v.id)).get()
),
switchMap( v => {
// map the Song to the parent Dance, return the Dance ids
const obs: string[] = [];
v.docs.forEach(docRef => {
// We invoke parent twice to go from doc->collection->doc
obs.push(docRef.ref.parent.parent.id);
});
// Because we return an array here this one emit becomes N
return obs;
}),
// Firebase IN support up to 10 values so we partition the data to query the Dances
bufferCount(10),
mergeMap( v => { // query every partition in parallel
return this.db.collection('dances', ref => {
return ref.where( firebase.firestore.FieldPath.documentId(), 'in', v);
}).get();
}),
switchMap( v => {
// Almost there now just need to extract the data from the QuerySnapshots
const obs: Dance[] = [];
v.docs.forEach(docRef => {
obs.push({
...docRef.data(),
id: docRef.id
} as Dance);
});
return of(obs);
}),
// And finally we reduce the docs fetched into a single array.
reduce((acc, value) => acc.concat(value), []),
);
const parentDances = await dance$.toPromise();
I copy pasted my code and changed the variable names to yours, not sure if there are any errors, but it worked fine for me. Let me know if you find any errors or can suggest a better way to test it with maybe some mock firestore.
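For readers not using RxJS, the bucketing step that bufferCount(10) performs in the pipeline above can be sketched with a plain helper. This is a minimal illustration; the chunk name and the sample ids are hypothetical.

```javascript
// Split a list of ids into buckets of at most `size` items, so that each
// bucket can be passed to a single "in" query.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const buckets = [];
  for (let i = 0; i < items.length; i += size) {
    buckets.push(items.slice(i, i + size));
  }
  return buckets;
}

const danceIds = Array.from({ length: 23 }, (_, i) => 'dance' + i);
console.log(chunk(danceIds, 10).map(bucket => bucket.length)); // [ 10, 10, 3 ]
```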
var songs = []
db.collection('Dances')
.where('songs.aNameOfASong', '==', true)
.get()
.then(function(querySnapshot) {
var songLength = querySnapshot.size
var i=0;
querySnapshot.forEach(function(doc) {
songs.push(doc.data())
i ++;
if(songLength===i){
console.log(songs);
}
console.log(doc.id, " => ", doc.data());
});
})
.catch(function(error) {
console.log("Error getting documents: ", error);
});
It could be better to use a flat data structure.
The docs specify the pros and cons of different data structures on this page.
Specifically about the limitations of structures with sub-collections:
You can't easily delete subcollections, or perform compound queries across subcollections.
Contrasted with the purported advantages of a flat data structure:
Root-level collections offer the most flexibility and scalability, along with powerful querying within each collection.

When should I use _id in MongoDB?

MongoDB has a field for every document called "_id". I see people using it everywhere as a primary key, and using it in queries to find documents by the _id.
This field defaults to using an ObjectId which is auto-generated, an example is:
db.tasks.findOne()
{
_id: ObjectID("ADF9"),
description: "Write lesson plan",
due_date: ISODate("2014-04-01"),
owner: ObjectID("AAF1") // Reference to another document
}
But in JavaScript, a leading underscore on a field name is a convention for private members, and since MongoDB uses JSON (specifically, BSON), should I be using these _ids for querying, finding, and describing relationships between documents? It doesn't seem right.
I saw that MongoDB has a way to generate UUID https://docs.mongodb.com/manual/reference/method/UUID
Should I forget that _id property, and create my own indexed id property with an UUID?
Use UUIDs for user-generated content, e.g. to name image uploads. UUIDs can be exposed to the user in an URL or when the user inspects an image on the client-side. For everything that is on the server/not exposed to the user, there is no need to generate a UUID, and using the auto-generated _id is preferred.
A simple example of using a UUID would be:
const uuid = require('uuid');
exports.nameFile= async (req, res, next) => {
req.body.photo = `${uuid.v4()}.${extension}`;
next();
};
How MongoDB names its things should not interfere in how you name your things. If data sent by third-party hurts the conventions you agreed to follow, you have to transform that data into the format you want as soon as it arrives in your application.
An example based in your case:
function findTaskById(id) {
var result = db.tasks.findOne({"_id": id});
var task = {
id: result._id,
description: result.description,
something: result.something
};
return task;
}
This way you isolate the use of Mongo's _id in the layer of your application that is responsible for interacting with the database. Everywhere else you need a task, you can use task.id.

How to ensure unique key while using Parse database

I need unique records in my Parse class, but because of saveInBackground a simple find() on the client didn't do the job. Even adding and setting a boolean like bSavingInbackGround, and skipping an additional save if it is true, wouldn't prevent my app from creating duplicates.
Ensuring unique keys is obviously very helpful in many (multi-user) situations.
Parse Cloud Code would be the right way, but I didn't find a proper solution.
After some trial-and-error testing I finally got it to work using Cloud Code. Hope it helps someone else.
My 'table' is myObjectClass and the field that needs to be unique is 'myKey'.
Add this to main.js and upload it to Parse Server Cloud Code.
Change myObjectClass and 'myKey' to suit your needs:
Parse.Cloud.beforeSave("myObjectClass", function(request, response) {
var myObject = request.object;
if (myObject.isNew()){ // only check new records, else its a update
var query = new Parse.Query("myObjectClass");
query.equalTo("myKey",myObject.get("myKey"));
query.count({
success:function(number){ //record found, don't save
//sResult = number;
if (number > 0 ){
response.error("Record already exists");
} else {
response.success();
}
},
error:function(error){ // no record found -> save
response.success();
}
})
} else {
response.success();
}
});
Your approach is correct, but from a performance point of view I think query.first() is faster than query.count(): query.first() stops as soon as it finds a matching record, whereas query.count() has to go through all the records of the class to return the number of matching records. This can be costly if you have a huge class.

An approach to deal with dependency resolution and optimistic updates in react applications

In an architecture where objects have many complex relationships, what are some maintainable approaches to dealing with
Resolving Dependencies
Optimistic Updates
in react applications?
For example, given this type of schema:
```
type Foo {
...
otherFooID: String,
bars: List<Bar>
}
type Bar {
...
bizID: String,
}
type Biz {
...
}
```
A user might want to save the following ->
firstBiz = Biz();
secondBiz = Biz();
firstFoo = Foo({bars: [Bar({biz: firstBiz})]});
secondFoo = Foo({bars: [Bar({biz: secondBiz})], otherFooId: firstFoo.id});
First Problem: Choosing real ids
The first problem with above is having the correct id. i.e in order for secondFoo to save, it needs to know the actual id of firstFoo.
To solve this, we could make the tradeoff of letting the client choose the id, using something like a UUID. I don't see anything terribly wrong with this, so we can say this can work.
Second Problem: Saving in order
Even if we determine id's from the frontend, the server still needs to receive these save requests in order.
```
- save firstFoo
// okay. now firstFoo.id is valid
- save secondFoo
// okay, it was able to resolve otherFooID to firstFoo
```
The reasoning here is that the backend must guarantee that any id that is being referenced is valid.
```
- save secondFoo
// backend throws an error otherFooId is invalid
- save firstfoo
// okay
```
I am unsure what the best way to attack this problem is
The current approaches that come to mind
Have custom actions, that do the coordination via promises
save(biz).then(_ => save(bar)).then(_ => save(firstFoo)).then(_ => save(secondFoo))
The downside here is that it is quite complex, and the number of these kinds of combinations will continue to grow
Create a pending / resolve helper
```
const pending = {};
// Wait for every record that obj references to finish saving
const resolve = (obj, refFn) => {
  return Promise.all(refFn(obj).map(id => pending[id]));
};
const fooRefs = (foo) => {
  return foo.bars.map(bar => bar.id).concat(foo.otherFooId);
};
pending[firstFoo.id] = resolve(firstFoo, fooRefs).then(_ => save(firstFoo));
```
The problem with 2. is that it can cause a bunch of errors easily, if we forget to resolve or to add to pending.
Potential Solutions
It seems like Relay or Om Next can solve these issues, but I would like something more lightweight. Perhaps something that can work with Redux, or maybe there is some concept I am missing.
Thoughts much appreciated
I have a JS/PHP implementation of such a system
My approach is to serialize records both on the client and server using a reference system
For example unsaved Foo1 has GUID eeffa3, and a second Foo references its id key as {otherFooId: '#Foo#eeffa3[id]' }
Similarly, you can reference a whole object like this:
Foo#eefa3:{bars:['#Baz#ffg4', '#Baz#ffg5']}
Now the client-side serializer would build a tree of relations and model attributes like this:
{
  modelsToSave: {
    'Foo#effe3': {
      attribs: {name: 'John', title: 'Mr.'},
      relations: {bars: ['#Bar#ffg4']}
    },
    'Bar#ffg4': {
      attribs: {id: 5},
      relations: {parentFoo: '#Foo#effe3'}
    }
  }
}
As you can see in this example I have described circular relations between unsaved objects in pure JSON.
The key here is to hold these "record" objects in client-side memory and never mutate their GUID
The server can figure out the order of saving by saving first records without "parent" dependencies, then records which depend on those parents
After saving, the server will return the same reference map, but now the attribs will also include primary keys and foreign keys.
JS walks the received map twice (first pass just update server-received attributes, second pass substitute record references and attribute references to real records and attributes).
So there are 2 mechanisms for referencing a record, a client-side GUID and a server-side PK
When receiving a server JSON, you match your GUID with the server primary key
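The ordering rule described in this answer (save records without "parent" dependencies first, then the records that depend on them) is essentially a topological sort. Here is a minimal sketch of that idea; it is not the JS/PHP implementation mentioned above, and the saveOrder name and the shape of the dependencies map are assumptions made for illustration.

```javascript
// dependencies maps each unsaved record key to the keys it must be saved after,
// e.g. { 'Foo#2': ['Foo#1'], 'Foo#1': [] }.
// References to keys outside the map are assumed to be saved already.
function saveOrder(dependencies) {
  const has = key => Object.prototype.hasOwnProperty.call(dependencies, key);
  const remaining = new Map();
  for (const [key, deps] of Object.entries(dependencies)) {
    remaining.set(key, new Set(deps.filter(has)));
  }
  const order = [];
  while (remaining.size > 0) {
    // Records whose dependencies have all been scheduled can be saved now.
    const ready = [...remaining.keys()].filter(key => remaining.get(key).size === 0);
    if (ready.length === 0) {
      throw new Error('Circular dependency among: ' + [...remaining.keys()].join(', '));
    }
    for (const key of ready) {
      order.push(key);
      remaining.delete(key);
    }
    for (const deps of remaining.values()) {
      for (const key of ready) {
        deps.delete(key);
      }
    }
  }
  return order;
}

console.log(saveOrder({ 'Foo#2': ['Foo#1'], 'Foo#1': ['Biz#1'], 'Biz#1': [] }));
```

This sketch simply rejects circular references; handling them (for example by saving both records first and filling in the foreign keys afterwards) would need an extra pass that is not shown here.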

Slick lifted update on an Object

I do updates on lifted entities using Slick. This code updates the firstName of a Contact object:
def updateContact(id: Int, firstName: Option[String]): Unit = {
val q1 = for {
c <- Contacts
if c.id is id
} yield c.firstName
// Update value with same or new value
q1.update(firstName.getOrElse(q1.list().head))
}
The option here is already useful for updating the value in case it is a Some (although it would be nicer if the update only happened if there is a new value).
What I am looking for is a way to query the object by ID, then do all the updates in memory using getOrElse and then do an update on the whole object.
Else I have to run the above for each field of the object which works but you know, feels like a dirty hack.
Instead of q1.update(firstName.getOrElse(q1.list().head))
you can write firstName.foreach{ fn => q1.update(fn) }
which is shorter, simpler, one instead of two queries :).
Using foreach on Option stops looking weird when you think of it as a collection with one or zero elements.
Regarding your idea to fetch the whole object, modify it and save it back, you can do it like this:
def updateContact(id: Int, firstName: Option[String], lastName:Option[String], ...): Unit = {
val q1 = Query(Contacts).filter(_.id === id)
val c = q1.first
val modifiedC = c.copy(
firstName = firstName.getOrElse(c.firstName),
lastName = lastName.getOrElse(c.lastName),
...
)
q1.update(modifiedC)
}
Here is another example: http://sysgears.com/notes/how-to-update-entire-database-record-using-slick/
This is clean and simple and probably the best way to do it if performance is not mission critical as this always transfers all columns of Contacts. You can save some traffic by only transferring selected columns.
