How to simulate "multi-versioning" regarding database MVCC in JavaScript?

Here is a demo that demonstrates the problem of not having transaction "locking". It roughly simulates async/concurrency using setTimeout. I have never dealt with concurrency in languages like C, Go, or Rust, so I am not sure how it works at the implementation level, but I am trying to grasp the concept of MVCC.
const db = {
  table1: {
    records: [
      { id: 1, name: 'foo', other: 'hello' },
      { id: 2, name: 'bar', other: 'world' },
    ]
  }
}

function readTable1(id) {
  return db.table1.records.find(x => x.id === id)
}

function writeTable1(id) {
  const record = readTable1(id)
  return new Promise((res, rej) => {
    console.log('transaction 1 start')
    setTimeout(() => {
      record.other = 'qwerty'
      setTimeout(() => {
        record.name = 'asdf'
        console.log('transaction 1 done')
        res()
      }, 1000)
    }, 1000)
  })
}

function wait(ms) {
  return new Promise((res) => setTimeout(res, ms))
}

async function test1() {
  writeTable1(1)
  console.log(readTable1(1))
  await wait(1100)
  console.log(readTable1(1))
  await wait(2200)
  console.log(readTable1(1))
}

test1()
It logs
transaction 1 start
{ id: 1, name: 'foo', other: 'hello' } // read
{ id: 1, name: 'foo', other: 'qwerty' } // read
transaction 1 done
{ id: 1, name: 'asdf', other: 'qwerty' } // read
While the transaction is still in progress, it mutates the real record, which can be read concurrently. There are no locks on it, nor whatever MVCC does instead of locks (keeping multiple versions of records). Next I try to implement how I think MVCC works, in the hope that you can correct my understanding. Here it is.
const db = {
  table1: {
    records: [
      [{ id: 1, name: 'foo', other: 'hello' }],
      [{ id: 2, name: 'bar', other: 'world' }],
    ]
  }
}

function readTable1(id) {
  const idx = db.table1.records.findIndex(x => x[0].id === id)
  return [idx, db.table1.records[idx][0]]
}

// this is a long transaction.
function writeTable1(id) {
  const [idx, record] = readTable1(id)
  // create a new version of the record for the transaction to act on.
  const newRecordVersion = {}
  Object.keys(record).forEach(key => newRecordVersion[key] = record[key])
  db.table1.records[idx].push(newRecordVersion)
  return new Promise((res, rej) => {
    console.log('transaction 2 start')
    setTimeout(() => {
      newRecordVersion.other = 'qwerty'
      setTimeout(() => {
        newRecordVersion.name = 'asdf'
        console.log('transaction 2 done')
        // now "commit" the changes
        commit()
        res()
      }, 1000)
    }, 1000)
  })

  function commit() {
    db.table1.records[idx].shift()
  }
}

function wait(ms) {
  return new Promise((res) => setTimeout(res, ms))
}

async function test1() {
  writeTable1(1)
  console.log(readTable1(1)[1])
  await wait(1100)
  console.log(readTable1(1)[1])
  await wait(2200)
  console.log(readTable1(1)[1])
  console.log(db.table1.records)
}

test1()
That outputs this, which seems correct.
transaction 2 start
{ id: 1, name: 'foo', other: 'hello' }
{ id: 1, name: 'foo', other: 'hello' }
transaction 2 done
{ id: 1, name: 'asdf', other: 'qwerty' }
[
  [ { id: 1, name: 'asdf', other: 'qwerty' } ],
  [ { id: 2, name: 'bar', other: 'world' } ]
]
Is this correct, generally how it works? Mainly: how many versions per record are created in a real implementation? Can there be more than 2 versions at a time, and if so, in what situations does that generally occur? And how do the timestamps work? I read about the timestamps on the wiki page, but it doesn't really register to me how to implement them, nor the incrementing transaction IDs. So basically: how do those 3 pieces fit together (versioning, timestamps, and transaction IDs)?
I am looking for some sort of simulation of the timestamps and versioning in JavaScript, so I can make sure I understand the general concepts at a high level, yet at a rough approximation of the implementation level. Just knowing what MVCC is and reading a few papers doesn't get far enough into the weeds to know how to implement it.
In my example there will only ever be 2 versions of a record during a transaction. I am not sure if there are cases where you would need more than that, and I am not sure how to plug in the timestamps.

Short answer: Multiversion concurrency control in "databases" involves several different things; different vendors implement each of these things in several different ways.
Here's a list of databases using MVCC (both RDBMs and No-SQL DBs), and which versions first supported MVCC: https://en.wikipedia.org/wiki/List_of_databases_using_MVCC
Here's a good answer for MSSQL (Microsoft's enterprise RDBMS):
https://dba.stackexchange.com/questions/174791/does-sql-server-use-multiversion-concurrency-control-mvcc
Does SQL Server really implement MVCC anywhere?

Yes, since SQL Server 2005. The SQL Server terminology is "row-versioning isolation levels". See the product documentation tree starting at Locking and Row Versioning. Note in particular that there are two separate "MVCC" implementations: read committed isolation using row versioning (RCSI) and snapshot isolation (SI).

And how does that reconcile with the idea of with (tablock, holdlock), if it does?

Using that combination of hints serializes access to whole table(s). It is the least concurrent option available, so using these hints should be very rare. Whether a particular use could be replaced with RCSI or SI isolation depends on the specific circumstances. You could ask a follow-up question with a specific example if you want us to address that aspect in detail.

You might also like to read my series of articles on SQL Server isolation levels.
Here is another good link: Well-known Databases Use Different Approaches for MVCC. It discusses things like:
rollback segments (Oracle)
row versioning (Microsoft)
Finally, here is a good paper from Microsoft Research:
https://www.microsoft.com/en-us/research/wp-content/uploads/2011/12/MVCC-published-revised.pdf
It's a big topic, with no short, simple "one size fits all" reply.
As far as "simulating MVCC in JavaScript" goes:
- JavaScript is inherently single-threaded; there is no concept of "threads" or "locking" in the language itself.
- Promises (as you're using) are an excellent way to order asynchronous callbacks.
- We'd need more details on exactly what you're trying to accomplish, but a rough sketch of the versioning/timestamp/transaction-ID machinery follows below.
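Here is a minimal sketch of how the three pieces you asked about can fit together, in the spirit of your second snippet. All names are invented for illustration, and a single counter serves as both transaction ID and logical timestamp, which is essentially what the "incrementing transaction IDs" on the wiki page are. Real engines differ in many details (PostgreSQL, for instance, stamps rows with xmin/xmax and reclaims old versions with VACUUM).

let nextTxId = 1                 // doubles as a logical timestamp
const committed = new Set()      // IDs of committed transactions
const table = new Map()          // record id -> array of versions, oldest first

function begin() {
  // the snapshot is the set of transactions committed when we start;
  // this transaction only ever sees versions written by those, or by itself
  return { id: nextTxId++, snapshot: new Set(committed) }
}

function commit(tx) {
  committed.add(tx.id)
}

function read(tx, id) {
  const versions = table.get(id) || []
  // walk the version chain newest-first; return the first visible version
  for (let i = versions.length - 1; i >= 0; i--) {
    const v = versions[i]
    if (v.createdBy === tx.id || tx.snapshot.has(v.createdBy)) return v
  }
  return undefined
}

function write(tx, id, fields) {
  // never mutate an existing version: copy it and stamp the copy with our ID
  const current = read(tx, id) || { id }
  const chain = table.get(id) || []
  chain.push({ ...current, ...fields, createdBy: tx.id })
  table.set(id, chain)
}

// demo: a reader that began before a concurrent write keeps its old view
const t1 = begin()
write(t1, 1, { name: 'foo', other: 'hello' })
commit(t1)

const reader = begin()
const writer = begin()
write(writer, 1, { name: 'asdf', other: 'qwerty' })
commit(writer)

console.log(read(reader, 1))  // name: 'foo'  -- old version; snapshot predates the write
console.log(read(begin(), 1)) // name: 'asdf' -- a fresh snapshot sees the commit

This also addresses the "more than 2 versions" question: a chain grows by one version per writing transaction, so with several overlapping writers (or long-running readers whose snapshots still need old versions) a record can easily carry three or more versions at once. Real databases garbage-collect the versions that no live snapshot can reach.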

Related

Update cached data for multiple related queries on a single mutation in react-query?

I have the following queries in my codebase:
Query all the articles:

useInfiniteQuery(
  ['articles', { pageSize: props.pageSize }],
  queryFn
);

Query articles of a single category:

useInfiniteQuery(
  ['articles', { categoryId: props.categoryId, pageSize: props.pageSize }],
  queryFn
);

Query articles related to a single user:

useInfiniteQuery(
  ['articles', { username: props.username, pageSize: props.pageSize }],
  queryFn
);
And for every article there is a 'Like' feature, so I have created a mutation for it.

useMutation(
  articleApi.likePost(props),
  {
    onMutate: () => {
      // I want to implement the cache update here.
      // Is there any way to update the liked article
      // across all 3 queries at the same time, if it is
      // present in all of them or only some of them?
    },
  }
);

My question is: is there any way to update the liked article in onMutate across all 3 queries at the same time, if it is present in all of them or only some of them?
Have a look at setQueriesData
It will call setQueryData for all matching queries, and you can use fuzzy matching to find your entries. Especially if all 3 entries have the same structure, you can do:
queryClient.setQueriesData(['articles'], newData)
to update them all.
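For instance, the onMutate above could look roughly like this (a sketch: it assumes the mutation is called with the article's id, that each cached page is a plain array of articles, that queryClient comes from useQueryClient(), and that liked is whatever flag your articles actually carry):

useMutation(articleApi.likePost(props), {
  onMutate: (articleId) => {
    // fuzzy match: hits every query whose key starts with 'articles',
    // whatever the { pageSize, categoryId, username } part looks like
    queryClient.setQueriesData(['articles'], (old) => {
      if (!old) return old
      // useInfiniteQuery caches data as { pages, pageParams }
      return {
        ...old,
        pages: old.pages.map((page) =>
          page.map((article) =>
            article.id === articleId ? { ...article, liked: true } : article
          )
        ),
      }
    })
  },
})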

useFirestoreConnect with populates - usage/implementation

DB Setup:

- users
  - A: { // private user info }
- usersPublic
  - A: { // public user info }
- rooms
  - 1: { readAccess: ['A'] }
I have a component that displays all rooms and am fetching that in the following way:
useFirestoreConnect(() => [{collection: 'rooms'}] )
This is working fine, but I now want to also load in the info from usersPublic for each user in the rooms readAccess array.
I'm attempting to use populates in the following way:
useFirestoreConnect(() => [{
  collection: 'rooms',
  populates: [{
    root: 'usersPublic',
    child: 'A'
  }]
}])
I'm pretty sure my implementation of populates is wrong and I'm failing to understand exactly how to make this work.
I could return a bunch of other query configs for all users with read access once I have the room object but that seems inefficient and it seems that populates is meant to solve exactly this problem.
I'm also open to suggestions on modeling the DB structure - the above made sense to me and offers a nice separation between private/public user info but there might be a better way.
The way to do it is:

populates: [{ root: 'usersPublic', child: 'readAccess' }]
This results in a redux state that looks like:

...etc
data: {
  rooms: { ... rooms ... },
  usersPublic: { A: { ... usersPublic[A] ... }, ...etc }
}
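To read the populated result back out of that state, react-redux-firebase also exports a populate helper. A sketch, assuming react-redux-firebase v3 and that the child key is your readAccess property:

import { useSelector } from 'react-redux'
import { populate, useFirestoreConnect } from 'react-redux-firebase'

const populates = [{ root: 'usersPublic', child: 'readAccess' }]

function Rooms() {
  useFirestoreConnect(() => [{ collection: 'rooms', populates }])
  // populate() swaps each id under readAccess for the matching
  // usersPublic document already present in the same state slice
  const rooms = useSelector((state) =>
    populate(state.firestore, 'rooms', populates)
  )
  return null // render rooms here
}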

Tell apollo-client what gets returned from X query with Y arguments?

I have a list of Items of whatever type. I can query all of them with query items, or one with query item(id).
I realize Apollo can't know what will be returned. It knows the type, but it doesn't know the exact data. Maybe there is a way to avoid the additional request? Map one query onto another?
Pseudo-code:
// somewhere in Menu.tsx (renders first)
let items = useQuery(GET_ITEMS);
return items.map(item => <MenuItemRepresentation item={item} />);

// meanwhile in the apollo cache (de-normalized for readability):
{
  ROOT_QUERY: {
    items: [ // query name per schema
      { id: 1, data: {...}, __typename: "Item" },
      { id: 2, data: {...}, __typename: "Item" },
      { id: 3, data: {...}, __typename: "Item" },
    ]
  }
}

// somewhere in MainView.tsx (renders afterwards)
let neededId = getNeededId(); // 2
let item = useQuery(GET_ITEM, { variables: { id: neededId } });
return <MainViewRepresentation item={item} />;
Code like this will do two fetches, even though the data is already in the cache, because Apollo thinks at the query level. I would like a way to tell it: "If I make the item query, look over here at the items query you did before. If it has no item with that id, go ahead and make the request."
Something akin to this can be done by querying items in MainView.tsx and combing through the results. That might work in pseudo-code, but in a real app it's not that simple: the cache might be empty in some cases, or not sufficient to satisfy the required fields, which means we would have to load all items when we need just one.
Upon further research, Apollo Link looks promising. It might be possible to intercept outgoing queries. Will investigate tomorrow.
Never mind Apollo Link. What I was looking for is called cacheRedirects.
It's an option on the ApolloClient or Cache constructor.

cacheRedirects: {
  Query: {
    // the key must match the query's field name (here "node")
    node: (_, args, { getCacheKey }) => {
      const cacheKey = getCacheKey({
        __typename: "Item",
        id: args.id,
      });
      return cacheKey;
    },
  },
},
I'd link to documentation but it's never stable. I've seen too many dead links from questions such as this.
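For reference, this is roughly how it wires into an apollo-client 2.x setup (a sketch; the item field name comes from my pseudo-code above, so substitute your schema's query name):

import { ApolloClient } from 'apollo-client'
import { InMemoryCache } from 'apollo-cache-inmemory'
import { HttpLink } from 'apollo-link-http'

const client = new ApolloClient({
  link: new HttpLink({ uri: '/graphql' }),
  cache: new InMemoryCache({
    cacheRedirects: {
      Query: {
        // the key must match the field name in the query document
        item: (_, args, { getCacheKey }) =>
          getCacheKey({ __typename: 'Item', id: args.id }),
      },
    },
  }),
})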

Firebase: Multi Location Update using Firebase Object Observable

I'm trying to work out how to do a multi-location update using the FirebaseObjectObservable.
This is what my data looks like.
recipes: {
  -R1: {
    name: 'Omelette',
    ingredients: ['-I1']
  }
}
ingredients: {
  -I1: {
    name: 'Eggs',
    recipes: ['-R1']
  },
  -I2: {
    name: 'Cheese',
    recipes: []
  }
}
I want to then update that recipe and add an extra ingredient.
const recipe = this.af.database.object(`${this.path}/${key}`);
recipe.update({
  name: 'Cheesy Omelette',
  ingredients: ['-I1', '-I2']
});
And to do multi-location updates accordingly:
recipes: {
  -R1: {
    name: 'Cheesy Omelette',
    ingredients: ['-I1', '-I2'] // UPDATED
  }
}
ingredients: {
  -I1: {
    name: 'Eggs',
    recipes: ['-R1']
  },
  -I2: {
    name: 'Cheese',
    recipes: ['-R1'] // UPDATED
  }
}
Is this possible in Firebase? And what about the scenario where an update causes 1000 writes?
Storing your ingredients in an array makes it pretty hard to add an ingredient. This is because arrays are index-based: in order to add an item to an array, you must know how many items are already in that array.
Since that number requires a read from the database, the code becomes pretty tricky. The best code I can think of is:
recipe.child("ingredients").orderByKey().limitToLast(1).once("child_added", function(snapshot) {
var updates = {};
updates[parseNum(snapshot.key)+1] = "-I2";
recipe.child("ingredients").update(updates);
});
And while this is plenty tricky to read, it's still not very good. If multiple users are trying to change the ingredients of a recipe at almost the same time, this code will fail. So you really should be using a transaction (see the sketch below), which reads more data and hurts the scalability of your app.
This is one of the reasons why Firebase has always recommended against using arrays.
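For completeness, the transaction variant might look like this (a sketch against the plain Realtime Database SDK, reusing the recipe ref from above):

recipe.child("ingredients").transaction(function (ingredients) {
  ingredients = ingredients || [];
  if (ingredients.indexOf("-I2") === -1) {
    ingredients.push("-I2");
  }
  // whatever this function returns is written atomically; Firebase re-runs
  // it if another client changed the data in the meantime
  return ingredients;
});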
A better structure for storing the ingredients of a recipe is a set. With such a structure your recipes would look like this:

recipes: {
  -R1: {
    name: 'Omelette',
    ingredients: {
      "-I1": true
    }
  }
}
And you can easily add a new ingredient to the recipe with:
recipe.update({ "ingredients/-I2": true });
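And to answer the multi-location part of the question: with the set structure, updating the recipe and the reverse index on the ingredient becomes a single atomic update() on a common ancestor. A sketch with the plain SDK, paths taken from the data above:

var updates = {};
updates["recipes/-R1/name"] = "Cheesy Omelette";
updates["recipes/-R1/ingredients/-I2"] = true;
updates["ingredients/-I2/recipes/-R1"] = true;
// all three paths succeed or fail together as one write
firebase.database().ref().update(updates);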

Exclude items from a first-level filter when child levels are empty in Loopback find()

When using Strongloop Loopback, we can make a data request (with relations) to the database in these ways:

(1) Using lb-service (at the front-end):

Model.find({
  filter: {
    where: { id: 1 },
    include: {
      relation: 'relationship',
      scope: { where: { id: 2 } }
    }
  }
}, function (instances) {
}, function (err) {
});

(2) Using node.js (at the server-side):

Model.find({
  where: { id: 1 },
  include: {
    relation: 'relationship',
    scope: { where: { id: 2 } }
  }
}, function (err, instances) {
});
What I need: exclude items from the first filter when another (nested) filter fails.
There is one obvious solution: filtering the response, this way:
instances = instances.filter(function (instance) {
  return typeof instance.relationship !== "undefined";
});
But... using filter() to eliminate items is not a scalable solution, because it always iterates over the whole array. Doing this at the front-end is not good, because the size of the array will drag down performance. Bringing it to the server-side could be a solution. But... each model has its own particular set of relations... so it is not scalable either!
Main question: Is there some way to overcome this situation, excluding items from the first filter when a second (third, or more) filter fails, simultaneously (or not)?
Something like this, defined on the filter object:

var filter = {
  where: { id: 1 },
  include: {
    relation: { name: 'relationship', required: true }, // required means this filter *needs* to be satisfied
    scope: { where: { id: 2 } }
  }
};
Requirements:
(1) SQL query is not an option ;)
(2) I am using MySQL as the database, so things like

{ where: { id: 1, relationship.id: 2 } }

will not work as desired.
I don't know of a way to do this within the filter syntax itself. I think you would have to write a custom remote method to do the filtering yourself after the initial query was complete. Here's what that might look like:
// in /common/models/model.js
Model.filterResults = function filterResults(filter, next) {
  Model.find(filter, function doFilter(err, data) {
    if (err) { return next(err); }
    var filteredData = data.filter(function (model) {
      return model.otherThings && model.otherThings().length;
    });
    next(null, filteredData);
  });
};

Model.remoteMethod(
  'filterResults',
  {
    accepts: { arg: 'filter', type: 'object', http: { source: 'query' } },
    returns: { arg: 'results', type: 'array' },
    http: { verb: 'get', path: '/no-empties' }
  }
);
Now you can hit .../api/Models/no-empties?filter={"include":"otherThings"} and you will only get back Models that have a related OtherThing. Note that this is for a one-to-many relationship, but hopefully you can see how to change it to fit your needs.
