Concurrent request processing issue with Redis + PHP - phpredis

I am using PHP 5.3 with Redis to store PHP array data for every request. I set one Redis variable and increment its value by 1 on every request, using that value as the key for each array element. But under concurrent requests this creates problems: 1) a few records are skipped, and 2) values are stored under duplicate keys.
Right now I am using "Predis" as the PHP Redis client. Please help me with this and let me know how I can achieve it.

Did you try using a set (or sorted set) instead? You don't need to worry about managing an external index, as it's done automatically by Redis.
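Either way, the core fix is to let Redis do the work atomically instead of reading and writing the counter from PHP. Roughly something like this (shown with Python's redis-py for illustration; Predis exposes the same command names, and the key names and payload here are made up):

    import time
    import redis

    r = redis.Redis()  # assumes a local Redis server

    # Option 1: keep a counter, but let Redis increment it atomically.
    # INCR is atomic, so two concurrent requests can never receive the
    # same value and no value is skipped.
    record_id = r.incr("records:next_id")
    r.set("record:%d" % record_id, "payload for this request")

    # Option 2: drop the manual index entirely. RPUSH appends atomically,
    # and a sorted set scored by timestamp preserves ordering without
    # any counter to manage.
    r.rpush("records:list", "payload for this request")
    r.zadd("records:by_time", {"payload for this request": time.time()})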

Related

How can I make custom auto-generated IDs for documents in Firestore?

For my project I need IDs that can be easily shared, so Firestore's default auto-generated IDs won't work.
I am looking for a way to auto-generate an ID like 8329423 that is either incremented or randomly chosen in the range 0 to 9999999.
Firestore's auto-ID fields are designed to statistically guarantee that no two clients will ever generate the same value. This is why they're as long as they are: it's to ensure there is enough randomness (entropy) in them.
This allows Firestore to determine these keys completely client-side without needing to look up on the server whether the key it generated was already generated on another client before. And this in turn has these main benefits:
Since the keys are generated client-side, they can also be generated when the client is not connected to any server.
Since the keys are generated client-side, there is no need for a roundtrip to the server to generate a new key. This significantly speeds up the process.
Since the keys are generated client-side, there is no contention between clients generating keys. Each client just generates keys as needed.
If these benefits are important to your use-case, then you should strongly consider whether you're likely to create a better unique ID than Firestore already does. For example, Firestore's IDs have 62^20 unique values, which is why they're statistically guaranteed to never generate the same value over a very long period of time. Your proposed range of 0 - 9999999 has only 10 million unique values, which is far more likely to generate duplicates.
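To make the difference concrete, a quick back-of-the-envelope comparison (plain Python, nothing Firestore-specific): the birthday approximation says collisions become likely after roughly the square root of the keyspace has been handed out.

    import math

    firestore_keyspace = 62 ** 20    # 20 characters drawn from [A-Za-z0-9]
    custom_keyspace = 10_000_000     # IDs 0 .. 9,999,999

    # Rough birthday-problem estimate of how many random IDs can be
    # generated before a collision becomes likely (~sqrt of the keyspace).
    print(f"Firestore: ~{math.isqrt(firestore_keyspace):.3e} IDs before a likely collision")
    print(f"Custom:    ~{math.isqrt(custom_keyspace)} IDs before a likely collision")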
If you really want this scheme for IDs, you will need to store the IDs that you've already given out on the server (likely in Firestore), so that you can check against it when generating a new key. A very common way to do this is to keep a counter of the last ID you've already handed out in a document. To generate a new unique ID, you:
Read the latest counter value from the document.
Increment the counter.
Write the updated counter value to the document.
Use the updated counter value in your code.
Since this read-update-write happens from multiple clients, you will need to use a transaction for it. Also note that the clients now are coordinating the key-generation, so you're going to experience throughput limits on the number of keys you can generate.
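A minimal sketch of that transactional counter using the google-cloud-firestore Python client; the collection and field names ("meta", "idCounter", "last", "orders") are placeholders, not anything Firestore prescribes:

    from google.cloud import firestore

    db = firestore.Client()
    counter_ref = db.collection("meta").document("idCounter")

    @firestore.transactional
    def next_id(transaction, ref):
        # Read the latest counter value inside the transaction so that
        # two clients can never be handed the same number.
        snapshot = ref.get(transaction=transaction)
        current = snapshot.to_dict().get("last", 0) if snapshot.exists else 0
        new_value = current + 1
        transaction.set(ref, {"last": new_value})
        return new_value

    new_doc_id = str(next_id(db.transaction(), counter_ref))
    db.collection("orders").document(new_doc_id).set({"created": firestore.SERVER_TIMESTAMP})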

I have URL and ID pairings, and I send the URLs to an API which returns them in no specific order. How should I pair them back up?

This is a programming problem that I haven't come across yet, and I want to make sure I'm tackling it in an efficient manner.
I have an array of dictionaries, each dictionary consisting of a "url" key and an "id" key with their corresponding values.
I then iterate over this array and throw each URL value into a request object that I then send to an API to be processed.
The API sends them all back at once, not necessarily in the order I sent them with other attributes that I use the API for gathering.
My question is: Now that I have all these URLs and their data, how do I match them back with the IDs that they corresponded to in order to add all these attributes (the ID + the new attributes the API returned) into a database?
My solution: Create one dictionary with the URL of each item as the key and the ID as the value, and then when I get the URLs back, just look up the value that corresponds to that URL key.
Is there a better solution to this problem? Maybe architecturally I should be doing this all differently in a way that better facilitates an answer?
The best solution is the one you suggested. If you build a key-value mapping, each lookup is O(1), so mapping the results back won't be a problem even if your API call is asynchronous.
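A minimal sketch of that mapping approach in Python; the response shape here (a "url" plus a "score" attribute) is made up for illustration:

    # Items being sent out: each has a url and our own id.
    items = [
        {"id": 1, "url": "https://example.com/a"},
        {"id": 2, "url": "https://example.com/b"},
    ]

    # Build the lookup once: url -> id (O(1) per lookup afterwards).
    id_by_url = {item["url"]: item["id"] for item in items}

    # Hypothetical API response: same urls, arbitrary order, extra attributes.
    api_results = [
        {"url": "https://example.com/b", "score": 0.9},
        {"url": "https://example.com/a", "score": 0.4},
    ]

    # Re-attach our ids before writing to the database.
    rows = [{"id": id_by_url[r["url"]], **r} for r in api_results]
    print(rows)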

What's the difference between findAndModify and update in MongoDB?

I'm a little bit confused by the findAndModify method in MongoDB. What's the advantage of it over the update method? For me, it seems that it just returns the item first and then updates it. But why do I need to return the item first? I read the MongoDB: the definitive guide and it says that it is handy for manipulating queues and performing other operations that need get-and-set style atomicity. But I didn't understand how it achieves this. Can somebody explain this to me?
If you fetch an item and then update it, there may be an update by another thread between those two steps. If you update an item first and then fetch it, there may be another update in-between and you will get back a different item than what you updated.
Doing it "atomically" means you are guaranteed that you are getting back the exact same item you are updating - i.e. no other operation can happen in between.
findAndModify returns the document, update does not.
If I understood Dwight Merriman (one of the original authors of MongoDB) correctly, using update to modify a single document (i.e. {multi: false}) is also atomic. Currently, it should also be faster than doing the equivalent update using findAndModify.
From the MongoDB docs (emphasis added):
By default, both operations modify a single document. However, the update() method with its multi option can modify more than one document.
If multiple documents match the update criteria, for findAndModify(), you can specify a sort to provide some measure of control on which document to update.
With the default behavior of the update() method, you cannot specify which single document to update when multiple documents match.
By default, findAndModify() method returns the pre-modified version of the document. To obtain the updated document, use the new option.
The update() method returns a WriteResult object that contains the status of the operation. To return the updated document, use the find() method. However, other updates may have modified the document between your update and the document retrieval. Also, if the update modified only a single document but multiple documents matched, you will need to use additional logic to identify the updated document.
Before MongoDB 3.2 you could not specify a write concern for findAndModify() to override the default write concern, whereas you have been able to specify a write concern for the update() method since MongoDB 2.6.
When modifying a single document, both findAndModify() and the update() method atomically update the document.
One useful class of use cases is counters and similar cases. For example, take a look at this code (one of the MongoDB tests):
find_and_modify4.js.
Thus, with findAndModify you increment the counter and get its incremented value in one step. Compare: if you (A) perform this operation in two steps and somebody else (B) does the same operation between your steps, then A and B may get back the same counter value instead of two different ones (just one example of the possible issues).
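As an illustration of that one-step counter, here is a sketch using PyMongo's equivalent, find_one_and_update; the collection and field names are just placeholders:

    from pymongo import MongoClient, ReturnDocument

    client = MongoClient()              # assumes a local mongod
    counters = client.testdb.counters

    # Increment and read back the counter in a single atomic operation.
    doc = counters.find_one_and_update(
        {"_id": "order_id"},
        {"$inc": {"seq": 1}},
        upsert=True,
        return_document=ReturnDocument.AFTER,
    )
    print(doc["seq"])                   # concurrent callers always see different values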
This is an old question but an important one and the other answers just led me to more questions until I realized: The two methods are quite similar and in many cases you could use either.
Both findAndModify and update perform atomic changes within a single request, such as incrementing a counter; in fact the <query> and <update> parameters are largely identical.
With both, the atomic change takes place directly on a document matching the query when the server finds it, i.e. the server holds an internal write lock on that document for the fraction of a millisecond it takes to confirm the query matches and apply the update.
There is no system-level write lock or semaphore which a user can acquire. Full stop. MongoDB deliberately doesn't make it easy to check out a document then change it then write it back while somehow preventing others from changing that document in the meantime. (While a developer might think they want that, it's often an anti-pattern in terms of scalability and concurrency ... as a simple example imagine a client acquires the write lock then is killed while holding it. If you really want a write lock, you can make one in the documents and use atomic changes to compare-and-set it, and then determine your own recovery process to deal with abandoned locks, etc. But go with caution if you go that way.)
From what I can tell there are two main ways the methods differ:
If you want a copy of the document when your update was made: only findAndModify allows this, returning either the original (default) or new record after the update, as mentioned; with update you only get a WriteResult, not the document, and of course reading the document immediately before or after doesn't guard you against another process also changing the record in between your read and update
If there are potentially multiple matching documents: findAndModify only changes one, and allows you customize the sort to indicate which one should be changed; update can change all with multi although it defaults to just one, but does not let you say which one
Thus what HungryCoder says makes sense: update is more efficient where you can live with its restrictions (e.g. you don't need the document back, or of course you are changing multiple records). But for many atomic updates you do want the document, and findAndModify is necessary there.
We used findAndModify() for counter operations (increment or decrement) and other single-field mutation cases. While migrating our application from Couchbase to MongoDB, I found this API replaced code that did GetAndlock(), modified the content locally, called replace() to save, and then Get() again to fetch the updated document back. With MongoDB, I just used this single API, which returns the updated document.

Delayed Table Initialization

Using the net-snmp API and using mib2c to generate the skeleton code, is it possible to support delayed initialization of tables? What I mean is, the table would not be initialized until any of its members were queried directly. The reason for this is that the member data is obtained from another server, and I'd like to be able to start the snmpd daemon without requiring the other server to be online/ready for requests. I thought of maybe initializing the table with dummy data that gets updated with the real values when a member is queried, but I'm not sure if this is the best way.
The table also has only one row of entries, so using mib2c.iterate.conf to generate table iterators and dealing with all of that just seems unnecessary. I thought of maybe just implementing the sequence defined in the MIB and not the actual table, but that's not usually how it's done in all the examples I've seen. I looked at /mibgroup/examples/delayed_instance.c, but that's not quite what I'm looking for. Using mib2c with the mib2c.create-dataset.conf config file was the closest I got to getting this to work easily, but this config file assumes the data is static and not external (both of which are not true in my case), so it won't work. If it's not easily done, I'll probably just implement the sequence and not the table, but I'm hoping there's an easy way. Thanks in advance.
The iterator method will work just fine. It won't load any data until it calls your _first and _next routines. So it's up to you, in those routines and in the _handler routine, to request the data from the remote server. In fact, by default, it doesn't cache data at all so it'll make you query your remote server for every request. That can be slow if you have a lot of data in the table, so adding a cache to store the data for N seconds is advisable in that case.
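The net-snmp handlers themselves are C, but the caching idea is independent of the API. Here is a rough Python sketch of the "fetch lazily, keep for N seconds" pattern the answer suggests; fetch_from_remote_server and the OID names are hypothetical:

    import time

    CACHE_TTL_SECONDS = 30
    _cache = {"data": None, "fetched_at": 0.0}

    def fetch_from_remote_server():
        # Hypothetical: query the other server only when actually needed.
        return {"ifInOctets": 12345, "ifOutOctets": 67890}

    def get_table_row():
        """Return the (single) table row, refreshing it at most every N seconds."""
        now = time.time()
        if _cache["data"] is None or now - _cache["fetched_at"] > CACHE_TTL_SECONDS:
            _cache["data"] = fetch_from_remote_server()
            _cache["fetched_at"] = now
        return _cache["data"]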

How to get just the size of the value in BerkeleyDB?

Is there a way to get only the length (in bytes) of a value stored in BDB? I don't need the entire data array, only its size.
If you don't want to have to retrieve the entire entry and aren't using DPL, I'd say you should add a secondary index on the size of the stored byte array and make sure that your DAO properly updates this value on any save or updates. You could add a KeyCreator which creates a secondary size key in a secondary database based on the record.
What type of query are you trying to perform? Are you wanting to search for all records of a given size? Or do you want to know the size of a certain record before you retrieve it? I think the latter question is harder to answer.
I'm assuming you're using the JE version (or the Java binding of BDB) in which case, once you get the DatabaseEntry of the desired key, getSize() should give you what you want.
If you're using the C binding, check the DBT handle's size field.
If you store your document ids as duplicate data items, instead of as one blob data item value, then you can use DBC->count() to detect the number of matching documents without actually retrieving the long list of ids. Otherwise, the Berkeley DB API does not seem to support what you're asking for (even though you'd think it could be efficient for them to add it). I puzzled over this as well, and that was the solution I came up with for my own project.
For your problem, using the DB_DBT_PARTIAL flag and asking for the beginning of the record will give you your first IDs, and the DBT.size field can be used to compute the total number of IDs.
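For completeness, a small sketch using the Python bsddb3 binding; it assumes your build exposes the DB.get_size() helper (which I believe performs a zero-length partial read under the hood), so treat it as a rough illustration rather than a guaranteed API:

    from bsddb3 import db

    d = db.DB()
    d.open("values.db", dbtype=db.DB_HASH, flags=db.DB_CREATE)

    d.put(b"key1", b"some fairly long value we do not want to read back")

    # Assumption: get_size() returns the stored data length without
    # copying the whole value into Python.
    print(d.get_size(b"key1"))

    d.close()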
