My understanding is that Couchbase views are built incrementally, but I can't seem to find an answer to whether a document can exist in a view multiple times. For example, say I want to create a view keyed on an updatedAt timestamp that changes every time a document of this type is updated.
If the view is built incrementally, that seems to imply that if document id "1234" is updated several times and that updatedAt timestamp changed each time, I'd end up with several entries in the view for the same document, when what I want is just one entry, for the latest value.
It does seem like Couchbase is limiting it to a single copy of any given document id within the view, but I can't find firm confirmation of that anywhere. I want to make sure I'm not designing something for a production system around a behavior that might not work the way it seems to on a small scale.
Yes. When a view index is refreshed, any documents modified since the last refresh have their associated rows removed from the view, and the map function is invoked again to emit the new row(s).
A single document can generate multiple view rows, but only if the view's map function calls emit multiple times.
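For illustration, a map function for the updatedAt case could look like the sketch below (Couchbase map functions are written in JavaScript; the doc type and field names here are assumptions, not your schema):

function (doc, meta) {
  // Emit one row per document, keyed on the timestamp. When the
  // document is updated, its old row is removed at the next index
  // refresh and this function runs again against the new value.
  if (doc.type === "order" && doc.updatedAt) {
    emit(doc.updatedAt, null);
  }
}

Since emit is called at most once per document here, the view can never hold more than one row for a given document id.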
I've currently got a bookings and a bookable collection. Each document in bookings holds a date range (check-in and check-out) and an array of references to bookable documents.
I'm a bit stumped at how to guarantee two overlapping bookings for the same bookables aren't written at the same time. From what I understand I can't technically lock a collection via something like a transaction, so I'm wondering what my options are (perhaps restructuring how I'm storing data, etc).
Any pointers or advice would be much appreciated.
EDIT:
Say User A wants to make a booking for the same two items as User B does and for the same time range. They both load the booking UI at around the same time and confirm their selection.
Prior to creating a new document inside the bookings collection for each of their requests, the app would perform a get query to check for any overlaps and if none exist insert the new booking documents. That fraction of time between the app's check for overlaps across the booking collection and the creation of new documents is what seems to open up a window for inconsistencies (e.g. potentially allowing two documents with overlapping time ranges and items to be created).
Could a transaction help prevent a new document from being written to a collection based on the existence of other documents in that collection that fit specific criteria?
To prevent users from accidentally overwriting each other's data, you'll want to use a transaction.
To prevent users from intentionally overwriting each other's data, you'll want to use security rules. Key to this is to use the information that you want to be unique as the ID of the documents.
So if you identify time slots by date and start time, you could have a document ID like "20210420T0900". If a user tries to write to that document when it already exists, you can reject that write in the security rules of your database.
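As a sketch of that idea (the collection name and rule details are assumptions, not your schema), security rules can allow a slot document to be created but never overwritten, so the second writer is rejected:

rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    // Hypothetical "slots" collection keyed by date + start time, e.g. "20210420T0900".
    match /slots/{slotId} {
      allow create: if request.auth != null;  // first writer wins
      allow update, delete: if false;         // an existing slot can't be replaced
    }
  }
}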
I am facing the exact same problem, and here is my best option at the moment....
I need a collection of all bookings (bookingscollection), regardless of date, time, resource booked, etc. This collection is usable in many parts of my UI, as I list upcoming bookings etc.
I need to avoid writes being made to this collection where there is an overlap.
I am considering adding an additional collection, where each document describes the bookings for a specific resource on a specific day (lockcollection). It could be a doc with the resource id, the date it covers, and an array of start/stop times of bookings already made.
Then, when adding a new booking to my bookingscollection, I would first run a transaction against the relevant document in the lockcollection: if there is an overlap the transaction fails, and if not, I add the new interval to the lock document within the transaction.
Once this succeeds, I know that I can just plainly add the booking to the bookings collection, as the lock is already there...
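To make that concrete, here is a minimal sketch of such a transaction using the Python client (collection, document and field names are assumptions; the same pattern applies in any Firestore SDK):

from google.cloud import firestore

db = firestore.Client()

@firestore.transactional
def reserve(transaction, lock_ref, start, stop):
    # Read the lock document for this resource/day inside the transaction.
    snapshot = lock_ref.get(transaction=transaction)
    data = snapshot.to_dict() if snapshot.exists else {}
    intervals = data.get("intervals", [])
    # Fail if the requested interval overlaps an existing one.
    for iv in intervals:
        if start < iv["stop"] and stop > iv["start"]:
            raise ValueError("overlapping booking")
    intervals.append({"start": start, "stop": stop})
    transaction.set(lock_ref, {"intervals": intervals})

# One lock document per resource per day (naming is hypothetical).
lock_ref = db.collection("lockcollection").document("resource123_2021-04-20")
reserve(db.transaction(), lock_ref, start=9, stop=11)

If two requests race on the same lock document, Firestore retries or fails one of them, so only one interval can win.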
Similar logic would be applied to the procedure of deleting or changing bookings.
This idea is new to me, but I wanted to share it so I can hear your input.
I'm new to Redshift and am looking at the best way to store event data. The data consists of an identifier, time and JSON metadata about the current state.
I'm considering three approaches:
1. Create a table for each event type with a column for each piece of data.
2. Create a single table for events and store metadata as a JSON field.
3. Create a single table with a column for every possible piece of data I might want to store.
The advantage of #1 is I can filter on all data fields and the solution is more flexible. The disadvantage is every time I want to add a new event I have to create a new table.
The advantage of #2 is I can put all types of events into a single table. The disadvantage is to filter on any of the data in the metadata I would need to use a JSON function on every row.
The advantage of #3 is I can easily access all the fields without running a function and don't have to create a new table for each type. The disadvantage is whoever is using the data needs to remember which columns to ignore.
Is one of these ways better than the others or am I missing something entirely?
This is a classic dilemma. After thinking for a while, in my company we ended up keeping the common properties of the events in separate columns and the unique properties in the JSON field. Examples of the common properties:
event type, timestamp (every event has it)
URL (this will be missing for backend and mobile app events but is present for all frontend events and is worth having in a separate column)
client properties: device, browser, OS (will be missing in backend but present in mobile app events and frontend events)
Examples of unique properties (no such properties in other events):
test name and variant in AB test event
product name or ID in purchase event
The borderline between a common and a unique property is your own judgement call, based on how many events share the property and how often it will be used in analytics queries to filter or group data. If a property is just "nice-to-have" and not involved in regular analysis use cases (yeah, we all love to store anything that is trackable, just in case), the burden of maintaining a separate column is overkill.
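To illustrate the hybrid layout, a sketch of such a table (column names and sizes are assumptions, not our actual schema):

create table events (
    event_type  varchar(64)   not null,  -- common: every event has it
    event_ts    timestamp     not null,  -- common: every event has it
    url         varchar(2048),           -- common-ish: frontend only
    device      varchar(64),
    browser     varchar(64),
    os          varchar(64),
    event_json  varchar(max)             -- unique, event-specific properties
);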
Also, if you have some unique property that you use extensively in queries, there is a hacky way to optimize. You can place this property at the beginning of your JSON column (JSON objects are nominally unordered, but in Redshift the column is just a string, so the key order is whatever you serialized) and use LIKE with a wildcard only at the end of the field:
select *
from event_table
where event_type='Start experiment'
and event_json like '{"test_name":"my_awesome_test"%' -- instead of below
-- and json_extract_path_text(event_json,'test_name')='my_awesome_test'
LIKE used this way works much faster than the JSON lookup (2-3x faster) because it doesn't need to decode the JSON, find the key and check the value for every row; it just checks whether the string starts with a given prefix, which is a much cheaper operation.
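If you build the JSON in Python, dict insertion order is preserved when serializing (Python 3.7+), so you can reliably put the hot key first; a small sketch with assumed field names:

import json

# Put the property you filter on most often first, so the stored
# string always starts with it and the LIKE prefix above matches.
event_json = json.dumps(
    {"test_name": "my_awesome_test", "variant": "B", "user_id": 42},
    separators=(",", ":"),  # compact form, matching the LIKE pattern
)
# -> '{"test_name":"my_awesome_test","variant":"B","user_id":42}'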
I have a datastore entity called lineItems, which consists of individual line items to be invoiced. The users find the line items and attach a purchase order number to them. These are then displayed on the web page where they can create the invoice.
I would display my code for fetching the entities, but I don't think it matters, as this also happened a couple of times when I was using managed VMs a few months ago and the code was completely different (I was using Objectify before; now I am using the Datastore API). In a nutshell, I am currently querying with a composite filter equivalent to StructuredQuery.CompositeFilter.and(PropertyFilter.eq("POnum", ponum), PropertyFilter.eq("Invoiced", false)) (the real code accepts a list of PropertyFilters and builds the composite filter from it).
What happened this morning was the admin person created the invoice, and all but two of the lines were on the invoice. There were two lines which the code never fetched, and those lines were stuck in the "invoices to create" section.
The admin person simply created the invoice again for the given purchase order number, but the second time it DID pick up the two remaining lines and created a second invoice.
Note that the entities were created/edited almost 24 hours earlier (when she assigned the purchase order number to them), so they had been sitting in the database for quite a while (I checked my logs). This is not a case where they were just created and then accessed within a short period of time. It is also NOT a case of failing to update the entities: the code creates the invoice in a 3rd-party accounting package, and those lines simply were not there. Upon success of the invoice creation, all of the entities are then updated with "invoiced = true" and written back to the datastore. So the lines which were not on the invoice in the accounting program are the ones that weren't updated in the datastore. (This is not a "smart" check either; it does not check line by line. It simply checks whether the invoice creation was successful, and then updates all of the entities it has in memory.)
As far as I can tell, the datastore simply did not return all of the entities which matched the query the first time but it did the second time.
There are approximately 40,000 lineItem entities.
What are the conditions which can cause a datastore fetch to randomly fail to grab all of the entities which meet the search parameters of a StructuredQuery? (Note that this also happened twice while using Objectify on the now deprecated Managed VM architecture.) How can I stop this from happening, or check to see if it has happened?
You may be seeing eventual consistency because you are not using an ancestor query.
See: https://cloud.google.com/datastore/docs/articles/balancing-strong-and-eventual-consistency-with-google-cloud-datastore/
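For what it's worth, here is a hedged sketch of an ancestor query with the Python client (the Customer ancestor kind and values are made up; the Java client has an equivalent ancestor filter). Entities that share an ancestor form an entity group, and queries within a group are strongly consistent:

from google.cloud import datastore

client = datastore.Client()

# Hypothetical design: all lineItems for one customer share a parent key.
parent_key = client.key("Customer", "acme")
ponum = "PO-1234"  # example value

query = client.query(kind="lineItems", ancestor=parent_key)
query.add_filter("POnum", "=", ponum)
query.add_filter("Invoiced", "=", False)
line_items = list(query.fetch())

The trade-off is that a single entity group has limited write throughput (roughly one write per second sustained), so this only works if lineItems can be partitioned sensibly.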
I have a custom object 'Subject_c' with 3 fields, and I have created those objects by uploading a CSV file. Subject_c has a lookup relationship with Leads (it's general for the same user regardless of which lead they are viewing). I am able to insert a related list, and I can see that the objects are created under Data Management/Storage Usage. But the related list shows up blank.
You're saying that the custom object has a lookup to Lead, but then you say Subjects are generic and somehow should be displayed on every Lead page? I don't think it'll work.
Stuff appears in the related list only when the field Subject_c.Lead_c is populated with "this" Lead's Id (please note I've made a best guess at the field name). So you'd need to insert separate data for each Lead, which can quickly blow up your storage usage and will be a pain in the a$$ to maintain later. Is it only for display? Or do you plan to later capture some kind of survey results for each Lead?
If it's just for display, I think you'll need to embed a Visualforce page in the Lead page layout to achieve that in a saner way. Are the subjects specific to the current viewing user? Or is it more like a general list, just 3 subjects for the whole organisation?
P.S. "object" is like a table in normal database. I think you mixed a bit the difference between table and records / rows of data stored in it.
Using AppEngine datastore, but this might be agnostic, no idea.
Assume a database entity called Comment. Each Comment belongs to a User. Every Comment has a date property, pretty standard so far.
I want something that will let me: specify a User and get back a dictionary-ish (coming from a Python background, pardon. Hash table, map, however it should be called in this context) data structure where:
keys: every date on which the User commented
values: the Comments that were made on that date.
I guess I could just iterate over a range of dates and build a map like this myself, but I seriously doubt I need to "invent" my own solution here.
Is there a way/tool/technique to do this?
Datastore supports both references and list properties. This lets you build one-to-many relationships in two ways:
Parent (User) has a list property containing keys of Child entities (Comment).
Child has a key property pointing to Parent.
Since you need to limit Comments by date, you'd best go with option two. Then you could query Comments which have date=somedate (or date range) and where user=someuserkey.
There is no native grouping functionality in Datastore, so to also "group" by date you can add a sort on date to the query. Then, when you iterate over the result, each time the date changes you can use/store it as a new grouping key.
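If you're on Python with NDB, a minimal sketch of that query-then-group approach (model and property names are assumptions):

from collections import defaultdict
from google.appengine.ext import ndb

class Comment(ndb.Model):
    user = ndb.KeyProperty()   # points at the User entity
    date = ndb.DateTimeProperty()
    text = ndb.TextProperty()

def comments_by_date(user_key):
    grouped = defaultdict(list)
    # One query: this user's comments, sorted by date.
    query = Comment.query(Comment.user == user_key).order(Comment.date)
    for comment in query:
        grouped[comment.date.date()].append(comment)  # group key = calendar date
    return grouped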
Update
Designing NoSQL databases should be access-oriented (versus data-model oriented in SQL): for often-used operations you should get data out as cheaply (= in as few operations) as possible.
So, as a rule of thumb, in one operation you should only get the data that is needed at that moment (= shown on that page to the user). I'm not sure about your app's design, but I doubt you need all of a user's full comments (with text and everything) at one time.
I'd start by saying you shouldn't apologize for having a Python background. App Engine originally supported only Python. Using the db module, you could have a User entity as the parent of several DailyCommentBatch entities, each the parent of a handful of Comment entities. IIRC, this will keep all related entities stored together (or close).
If you are using NDB (I love it) you may employ a StructuredProperty at either the User or DailyCommentBatch level.
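A rough sketch of the StructuredProperty variant (names are assumptions, not a prescribed schema):

from google.appengine.ext import ndb

class Comment(ndb.Model):
    date = ndb.DateTimeProperty()
    text = ndb.TextProperty()

class DailyCommentBatch(ndb.Model):
    user = ndb.KeyProperty(kind="User")
    day = ndb.DateProperty()
    # Comments are stored inline in the batch entity, so one get()
    # per day returns the whole group.
    comments = ndb.StructuredProperty(Comment, repeated=True)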