I have URL and ID pairings, and I send the URLs to an API which returns them in no specific order. How should I pair them back up?

This is a programming problem I haven't come across yet, and I want to make sure I'm tackling it in an efficient manner.
I have an array of dictionaries, with each dictionary consisting of a "url" key and an "id" key with their corresponding values.
I iterate over this array and put each URL value into a request object, which I then send to an API to be processed.
The API sends them all back at once, not necessarily in the order I sent them, along with the other attributes I use the API to gather.
My question is: now that I have all these URLs and their data, how do I match them back up with the IDs they corresponded to, so that I can add everything (the ID plus the new attributes the API returned) to a database?
My solution: create one dictionary with the URL of each item as the key and the ID as the value, and then when I get the URLs back, just look up the value that corresponds to that URL key.
Is there a better solution to this problem? Maybe architecturally I should be doing this all differently in a way that better facilitates an answer?

The best solution is the one you suggested. If you build a key-value mapping, the lookup time will be O(1), so matching the responses back won't be a problem even if your API call is asynchronous.
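A minimal sketch of that approach in Python, where call_api stands in for the real request logic and each result is assumed to echo back the URL it was fetched for:
import random

def call_api(urls):
    # Stand-in for the real API call: returns results in arbitrary
    # order, each carrying the url it corresponds to.
    results = [{"url": u, "attribute": len(u)} for u in urls]
    random.shuffle(results)
    return results

pairs = [{"url": "https://example.com/a", "id": 1},
         {"url": "https://example.com/b", "id": 2}]

# Build the url -> id lookup once.
url_to_id = {item["url"]: item["id"] for item in pairs}

# Rejoin each out-of-order result with its original id in O(1) per lookup.
for result in call_api([item["url"] for item in pairs]):
    result["id"] = url_to_id[result["url"]]
    # ...write result (id + new attributes) to the database here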

Related

Fetching multiple IDs in one request with React Query

We have a GET API endpoint which given an array of IDs returns an array of objects matching those IDs.
We have a form where we need to look up an arbitrary array of IDs using this API based on user selection.
We would like to use React Query to allow us to pass an array of IDs to it and it will return objects from the cache and/or fetch data, as needed.
One option we've considered is a custom hook which, given an array of IDs, splits them into those already in the cache and those that still need to be fetched (using what getQueryData returns). After the fetch, it iterates through the response and calls setQueryData on each item. This feels a little convoluted and might not be leveraging React Query as much as it could.
Essentially our question is, how can we best use React Query when requesting multiple IDs in one GET request?
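Framework aside, the split-then-batch-fetch pattern described above can be sketched in Python, with a plain dict standing in for the query cache (every name here is hypothetical):
def get_objects(ids, cache, fetch_batch):
    # Split requested ids into cached and missing (the getQueryData step).
    missing = [i for i in ids if i not in cache]
    if missing:
        # One GET request for all missing ids, as the endpoint allows.
        for obj in fetch_batch(missing):
            cache[obj["id"]] = obj  # the setQueryData step
    return [cache[i] for i in ids]

# Example usage against a fake backend:
backend = {1: {"id": 1, "name": "a"}, 2: {"id": 2, "name": "b"}}
cache = {}
print(get_objects([1, 2], cache, lambda ids: [backend[i] for i in ids]))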

Get the "Place in line" of records in the realtime db?

Basically I'm creating a system to manage requests from users in a Firebase Realtime Database, via an Express app loaded into Firebase Functions. The request table will be in a queue format, so FIFO. Records would be ordered chronologically by their keys, i.e. timestamps created by POST requests. I'd like to be able to tell a user their place in line, and I'd like to be able to list this place in line in my app. I expect this queue to have requests numbering in the thousands, so iterating over the entire length of the queue every time a client requests its place in line seems unattractive.
I've thought of doing a query for the user's UID (which would be saved in each request, naturally), but I can't figure out how to structure that query while maintaining the chronological order of the requests. Something like requestsReference.orderByKey().endAt({UID}, key: "requestorUid") doesn't seem like it'd work from what I'm seeing in the docs; but if I pulled off a query like that then I'd be able to get the place in line just from the length of the returned object. It's worth saying now that I have no idea how efficient this would be compared to just iterating the entire queue in its original chronological order.
I've also thought of taking an arithmetic approach, basically adding the "place in line when requested" and "total fulfillments when requested" as data in the request records. So then I'd be able to retrieve a record by its UID, and determine the place in line via placeInLineAtRequestTime - (totalCurrentFulfillments - totalFulfillmentsAtRequestTime). It'd be a rough approach, and I'd need to fetch the entire fulfillments table in order to get the current count. So again, I'm not sure how this compares.
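For what it's worth, that arithmetic boils down to something like this (a Python sketch; the field names are hypothetical):
def place_in_line(request, total_current_fulfillments):
    # Values captured on the request record at creation time:
    #   place_at_request: queue position when the request was made
    #   fulfillments_at_request: total fulfillments at that moment
    fulfilled_since = total_current_fulfillments - request["fulfillments_at_request"]
    return request["place_at_request"] - fulfilled_since

req = {"place_at_request": 1500, "fulfillments_at_request": 1200}
print(place_in_line(req, 1450))  # 250 fulfilled since, so now 1250th in line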
Anyway, any thoughts? Am I missing some real easy way I could do this? Or would iterating it be cheaper than I think it'd be?

What's the best way to store event data in Redshift?

I'm new to Redshift and am looking at the best way to store event data. The data consists of an identifier, time and JSON metadata about the current state.
I'm considering three approaches:
1. Create a table for each event type, with a column for each piece of data.
2. Create a single table for events and store the metadata as a JSON field.
3. Create a single table with a column for every possible piece of data I might want to store.
The advantage of #1 is I can filter on all data fields and the solution is more flexible. The disadvantage is every time I want to add a new event I have to create a new table.
The advantage of #2 is I can put all types of events into a single table. The disadvantage is to filter on any of the data in the metadata I would need to use a JSON function on every row.
The advantage of #3 is I can easily access all the fields without running a function and don't have to create a new table for each type. The disadvantage is whoever is using the data needs to remember which columns to ignore.
Is one of these ways better than the others or am I missing something entirely?
This is a classic dilemma. After thinking it over for a while, at my company we ended up keeping the common properties of the events in separate columns and the unique properties in a JSON field. Examples of the common properties:
event type, timestamp (every event has it)
URL (this will be missing for backend and mobile app events but is present for all frontend events, and is worth having in a separate column)
client properties: device, browser, OS (missing in backend events but present in mobile app and frontend events)
Examples of unique properties (no such properties in other events):
test name and variant in AB test event
product name or ID in purchase event
The borderline between common and unique properties is your own judgment call, based on how many events share the property and how often it will be used in analytics queries to filter or group data. If a property is just "nice to have" and is not involved in regular analysis use cases (yeah, we all love to store anything trackable just in case), the burden of maintaining a separate column is overkill.
Also, if you have some unique property that you use extensively in queries, there is a hacky way to optimize. You can place this property at the beginning of your JSON column (a Python dict historically didn't guarantee key order, but in Redshift the JSON is just a string, so the order of keys can be fixed if you want) and use LIKE with a wildcard only at the end of the field:
select *
from event_table
where event_type='Start experiment'
and event_json like '{"test_name":"my_awesome_test"%' -- instead of below
-- and json_extract_path_text(event_json,'test_name')='my_awesome_test'
LIKE used this way works much faster than the JSON lookup (2-3x faster) because it doesn't need to decode the JSON, find the key, and check the value for every row; it just checks whether the string starts with a given prefix, a much cheaper operation.
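A sketch of fixing the key order on the producer side, assuming Python 3.7+ (where dicts and json.dumps preserve insertion order); the compact separators make the stored string match the LIKE prefix above exactly:
import json

event = {
    "test_name": "my_awesome_test",  # frequently-filtered key goes first
    "variant": "B",
    "other_props": "just in case",
}
# separators=(",", ":") avoids spaces, so the stored string starts with
# exactly the prefix the LIKE pattern expects.
event_json = json.dumps(event, separators=(",", ":"))
assert event_json.startswith('{"test_name":"my_awesome_test"')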

Alternate string ID for Guid ID objects

I currently use Guid as the primary key for my ContentItems in my code-first Entity Framework context. However, since Guids are so unwieldy, I would like to also set an alternate, friendly ID for each ContentItem (or descendant of ContentItem) according to the following logic:
Use the Name property, lowercased, replacing whitespace with a -, and end the prefix with a - as well
Look in the database to see which other ContentItems have a FriendlyID with the same prefix, and find the one with the highest numeric suffix
Increment that by 1 and add it as the suffix
So the first item with name "Great Story" would have FriendlyID of great-story-1, the next one great-story-2, and so forth.
I realize there are a number of ways to implement this sort of thing, but here are my questions:
Is it advisable to explicitly set a new field with the alternate ID according to this logic, or should I just run a query each time, applying the same rules I would use to generate the ID, to find the right object?
How should I enforce the setting of the alternate ID? Should I do it in my service methods for each content item at creation time? (This concerns me because if someone forgets to add that logic to the service method, the object won't have a FriendlyID.) Or should I do it in the model itself, with a property whose manually-defined getters/setters have to query the DB to find the next available FriendlyID?
Are there alternatives to this sort of FriendlyID for making human-friendly URLs and web service requests? The ultimate purpose is for users to go to http://awesomewebsite.com/Content/great-story-1 and get sent to the right content item, rather than http://awesomewebsite.com/Content/f0be271e-ee01-48de-8599-ddd602e777b6.
Pre-generate them. This allows you to index them. I understand your concern but there's no alternative in practice. (I have done this.)
I don't know the architecture of your app. Just note that generating such an ID requires database query access, so it probably shouldn't be done as a property or method on the entity itself.
You could use a combination, putting both a "speaking name" and an ID into the URL. I have seen sites do this. For GUID IDs this is not exactly pretty, though.
Write yourself a few helper methods to generate such string IDs in a convenient and robust way; that way it isn't much trouble.
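For illustration, such a helper might look like the following Python sketch (the question is C#/EF, so treat this as pseudocode; existing_ids stands in for the database query in step 2):
import re

def next_friendly_id(name, existing_ids):
    # Step 1: lowercase the name and turn whitespace runs into hyphens.
    prefix = re.sub(r"\s+", "-", name.strip().lower()) + "-"
    # Step 2: find the highest numeric suffix among ids with this prefix.
    suffixes = [int(fid[len(prefix):]) for fid in existing_ids
                if fid.startswith(prefix) and fid[len(prefix):].isdigit()]
    # Step 3: increment and append.
    return prefix + str(max(suffixes, default=0) + 1)

print(next_friendly_id("Great Story", []))                 # great-story-1
print(next_friendly_id("Great Story", ["great-story-1"]))  # great-story-2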

mapping encoded keys to shorter identifiers in appengine

I want to send unique references to the client so that the client can refer back to specific objects. The encoded keys App Engine provides are sometimes 50 bytes long, and I probably only need two or three bytes (I could hope to need four or five, but that won't be for a while!).
Sending the larger keys is actually prohibitively expensive, since I might be sending 400 references at a time.
So, I want to map these long keys to much shorter keys. An obvious solution is to store a mapping in the datastore, but then when I'm sending 400 objects I'm doing 400 additional queries, right? Maybe I could mitigate the expense by keeping copies of the mappings in memcache as well. Is there a better way?
Can I just yank the number out of the unencoded keys that appengine creates and use that? I only need whatever id I use to be unique per entity kind, not across the whole app.
Thanks,
Riley
Datastore keys include extra information you don't need, like the app ID, so you definitely do not need to send the full keys.
If these references are to a particular kind in your datastore, then you can do even better and just send the key_name or numeric ID (whichever your keys use). In the latter case, you could transmit each key in just a few bytes, using either a variable-length or fixed-length integer encoding depending on which is more compact for your specific case (probably the former, until most of the IDs you're sending get quite large).
When you receive these partial keys back from the user, it should be easy to reconstruct the full key which you need to retrieve the entities from the datastore. If you are using the Python runtime, you could use db.Key.from_path(kind_name, numeric_id_or_key_name).
A scheme like this should be both simpler and (a lot) faster than trying to use the datastore/memcache to store a custom mapping.
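A sketch of that round trip with the legacy Python db API, assuming a hypothetical kind MyKind whose keys use numeric IDs:
from google.appengine.ext import db

class MyKind(db.Model):
    pass  # hypothetical kind, for illustration

# Outbound: send only the compact numeric ids, not the full encoded keys.
short_ids = [e.key().id() for e in MyKind.all().fetch(400)]

# Inbound: rebuild each full key from kind name + numeric id, then batch-get.
keys = [db.Key.from_path("MyKind", sid) for sid in short_ids]
entities = db.get(keys)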
You don't need a custom mapping mechanism. Just use entity key names to store your short identifiers:
entity = MyKind(key_name=your_short_id)
entity.put()
Then you can fetch these short identifiers in one query:
keys = MyKind.all(keys_only=True).filter(...).fetch(400)
short_ids = [key.name() for key in keys]
Finally, use MyKind.get_by_key_name(short_id) in order to retrieve entities from identifiers sent back by your users.
