How do I query Firebase for specific times?
Adopt a lexicographically sortable date format such as YYYY-MM-DD-HH:MM for your keys. Firebase orders string keys lexicographically, so a year-first format sorts chronologically and the filtering methods (e.g. startAt()) work as expected.
If you cannot change the key (it must stay like 11-20-2020 11:30 for some business or app-architecture reason), you can add a YYYY-MM-DD-HH:MM value as a child node and order your query on that child instead.
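For instance, with the JavaScript SDK, a range query over such a sortable child could look like the sketch below (the events path and the sortTimestamp child name are illustrative assumptions, not from the original answer):

import { initializeApp } from "firebase/app";
import { getDatabase, ref, query, orderByChild, startAt, endAt, get } from "firebase/database";

const app = initializeApp({ /* your firebase config */ });
const db = getDatabase(app);

// The key itself can stay in the legacy "11-20-2020 11:30" format,
// because we order on the sortable child value instead.
const eventsBetween = query(
  ref(db, "events"),
  orderByChild("sortTimestamp"), // assumed child holding "YYYY-MM-DD-HH:MM"
  startAt("2020-11-20-11:00"),
  endAt("2020-11-20-12:00")
);

get(eventsBetween).then((snapshot) => {
  snapshot.forEach((child) => {
    console.log(child.key, child.val());
  });
});

For larger datasets you would also add an .indexOn rule for that child in your database rules, so the ordering is done server-side.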
My understanding is that Couchbase views are built incrementally, but I can't seem to find an answer to whether a document can exist in a view multiple times. For example, say I want to create a view based on an updatedAt timestamp, that is changed every time I update this document type.
If the view is built incrementally, that seems to imply that if document id "1234" is updated several times and that updatedAt timestamp changed each time, I'd end up with several entries in the view for the same document, when what I want is just one entry, for the latest value.
It does seem like Couchbase is limiting it to a single copy of any given document id within the view, but I can't find firm confirmation of that anywhere. I want to make sure I'm not designing something for a production system around a behavior that might not work the way it seems to on a small scale.
Yes. When a view index is refreshed, any documents modified since the last refresh have their associated rows removed from the view, and the map function is invoked again to emit the new row(s).
A single document can generate multiple view rows, but only if the view's map function calls emit multiple times.
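View map functions are written in JavaScript. A minimal sketch of a map function that emits at most one row per document, keyed on the updatedAt timestamp from the question (the doc.type guard is an illustrative assumption):

// Entered as the view's map function. One emit per document means at most
// one row per document id; after an update, the old row is removed and this re-runs.
function (doc, meta) {
  if (doc.type === "event" && doc.updatedAt) {
    emit(doc.updatedAt, null);
  }
}

A map function that called emit once per element of, say, an array inside the document would instead produce multiple rows for that document id.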
I'm redesigning my data structure for an organizational application. The problem is coming up with the optimal structure, and it boils down to indexing and keeping the structure flexible. It is based on a JSON structure and starts with the question of a map of objects versus an array of objects, [{}] vs {{}}: should each top-level object be indexed by a key, or should the key live inside the object, with an index generated separately?
The app contains user tasks, appointments, events, and notes. I used localStorage on the client and MongoDB on the server. For the client, I'm changing to IndexedDB and will take this opportunity to also redesign my local JSON data structure.
When using the Google Calendar API, I noticed many of the results are just a flat list of calendar events. The list is an array of objects with the relevant event info. Granted, these are the result of a REST request, not the actual data-storage structure itself, but it got me thinking... previously my data was all key:value pairs, sometimes nested, but always starting with a key. {{}}
For example, using a startTime key represented by an epoch number (or it could be an ISO dateTime string):
{{}}
"events": {
(EPOCH NUMBER): {
creationDate: (EPOCH NUMBER),
UID: (STRING),
summary: (STRING),
endDateTime: (EPOCH NUMBER)
}
...
}
vs
[{}]
"events": [{
startDateTime: (EPOCH NUMBER)
creationDate: (EPOCH NUMBER),
UID: (STRING),
summary: (STRING),
endDateTime: (EPOCH NUMBER)
}
...
]
In the first, I can easily get date ranges of events, test whether an event exists on a certain day, get all keys, etc. I can save to localStorage or MongoDB directly using my unique key. I also have a key generator which increments the key in case two events would otherwise overlap (JavaScript epoch timestamps have millisecond resolution, so there are 1000 distinct values per second, and I'm not concerned about overlapping keys). Problem: if I change an event's start time, I'd need to change my key or generate a new object with the right key. Overall it seems efficient, but a brittle approach.
In the second, on application initialization I could run an indexing function which orders by startDateTime and points each index entry to the associated object. Saving to storage would be a little more interesting, since I don't have an obvious key/value pair. I could save the array under the key "events", but I'm not sure how updates would work unless I also kept an index of all the array positions. This could be more flexible: I can easily change my startTime field, and I could have multiple indexes, which could also easily be changed.
So, two questions. First, between the two options, {{}} and [{}], which is the recommended approach for saving nested data that needs to be indexed? Second, I'm saving all dateTime data as UTC (converting to the local time zone on the client when rendering); should I use the ISO dateTime string or just the epoch number?
Any recommendations or feedback greatly appreciated; I've been scribbling down different scenarios and algorithms for days now. I really want to get this right.
Thanks,
Paul
My first instinct is to create an object store of events. Give each event an auto-incremented id. For each event, make sure you store a few basic properties like start date, end date, etc. Then create indices on the properties involved in the particular queries you want to run quickly; see the sketch below.
The events will be sorted by id when iterating over the store, but sorted by date (or whatever the index key path is) when iterating over the index.
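A minimal sketch of that setup, assuming epoch-number timestamps as you suggest; the database, store, and index names here are made up for illustration:

// Open (or create) the database and define the schema on upgrade.
const request = indexedDB.open("organizer", 1);

request.onupgradeneeded = (e) => {
  const db = e.target.result;
  // Auto-incremented id as the primary key; events are otherwise a bag of props.
  const store = db.createObjectStore("events", { keyPath: "id", autoIncrement: true });
  // Secondary index for date-range queries; epoch numbers sort correctly as keys.
  store.createIndex("byStart", "startDateTime");
};

request.onsuccess = (e) => {
  const db = e.target.result;
  const index = db.transaction("events").objectStore("events").index("byStart");
  // All events on a given (UTC) day, already sorted by startDateTime.
  const dayStart = Date.UTC(2020, 10, 20); // 2020-11-20T00:00Z
  const dayEnd = dayStart + 24 * 60 * 60 * 1000;
  index.getAll(IDBKeyRange.bound(dayStart, dayEnd, false, true)).onsuccess = (ev) =>
    console.log(ev.target.result);
};

Changing an event's start time is then just a put() of the modified object; the index is maintained automatically, which avoids the brittle change-the-key problem from your first option.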
If you want to export to json, you would export an object containing an array of event objects.
For NoSQL, it isn't important that each event has the same properties. Only the object type itself, and a minimal set of properties like the key path, are important. The rest of the properties are completely variable and should be understood as just a 'bag' of misc. props.
If this doesn't help then I guess I misunderstood the question.
I wish to create a generic component which can save the object name and field names, with old and new values, in a Big Object.
The brute-force approach would be: on every update of each object, get the field API names using describe and compare the old and new values of those fields. If a field was modified, insert a record into the Big Object.
But that would consume a lot of CPU time, and I am looking for a more optimal solution.
Any suggestions are appreciated.
Well, do you have any code written already? Maybe benchmark it and then see what you can optimise, instead of overdesigning it from the start... Keep it simple, write a test harness, and then try to optimise (without breaking the unit tests).
A couple of random ideas:
You'd be doing this in a trigger? Then your describe could happen only once. You don't need to describe every single field; you need only one describe call, outside the trigger's main loop:
Set<String> fieldNames = Account.sObjectType.getDescribe().fields.getMap().keyset();
System.debug(fieldNames);
This will get you "only" the field names, but that's enough; you don't care whether they're picklists or dates or whatever. Use that with the generic sObject.get('fieldNameHere') and it's a good start.
Or maybe skip describe altogether: sObject's getPopulatedFieldsAsMap() will give you a Map which you can easily iterate over and compare.
Or JSON.serialize the old and new versions of the object, and if the strings aren't identical, you know what to do. No idea whether they'll always serialise with the same field order, though, so checking whether the maps are identical might be safer.
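The comparison step in the last two ideas is language-agnostic; here is a minimal sketch of the "iterate the populated fields and record changes" loop in TypeScript (names are illustrative; in Apex you would swap in getPopulatedFieldsAsMap(), sObject.get(), and your Big Object's fields):

// A record is just a bag of field name -> value pairs.
type FieldMap = { [field: string]: unknown };

interface FieldChange {
  objectName: string;
  fieldName: string;
  oldValue: unknown;
  newValue: unknown;
}

// Compare the populated fields of the old and new versions of one record.
function diffRecord(objectName: string, oldRec: FieldMap, newRec: FieldMap): FieldChange[] {
  const changes: FieldChange[] = [];
  for (const fieldName of Object.keys(newRec)) {
    const oldValue = oldRec[fieldName];
    const newValue = newRec[fieldName];
    // Serialize both sides so dates, numbers, and nested values compare consistently.
    if (JSON.stringify(oldValue) !== JSON.stringify(newValue)) {
      changes.push({ objectName, fieldName, oldValue, newValue });
    }
  }
  return changes;
}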
And do you really need to hand-craft this field history tracking like that? You get 1M Big Object records of free storage, but it could explode really easily in a busier SF org, especially if you have workflows, processes, or other triggers that translate into multiple updates (= multiple trigger runs) in the same transaction. Perhaps normal field history tracking + Chatter feed tracking + maybe Salesforce Shield (it comes with 60 more tracked fields, I think) would be more sensible for your business needs.
I'm new to Redshift and am looking at the best way to store event data. The data consists of an identifier, time and JSON metadata about the current state.
I'm considering three approaches:
Create a table for each event type with a column for each piece of data.
Create a single table for events and store metadata as a JSON field.
Create a single table with a column for every possible piece of data I might want to store.
The advantage of #1 is that I can filter on all data fields and the solution is more flexible. The disadvantage is that every time I want to add a new event type I have to create a new table.
The advantage of #2 is that I can put all types of events into a single table. The disadvantage is that to filter on anything in the metadata I would need to run a JSON function on every row.
The advantage of #3 is I can easily access all the fields without running a function and don't have to create a new table for each type. The disadvantage is whoever is using the data needs to remember which columns to ignore.
Is one of these ways better than the others or am I missing something entirely?
This is a classic dilemma. After thinking for a while, in my company we ended up keeping the common properties of the events in separate columns and the unique properties in the JSON field. Examples of the common properties:
event type, timestamp (every event has it)
URL (this will be missing for backend and mobile-app events but is present for all frontend events, and is worth having in a separate column)
client properties: device, browser, OS (missing for backend events but present for mobile-app and frontend events)
Examples of unique properties (no such properties in other events):
test name and variant in AB test event
product name or ID in purchase event
The borderline between a common and a unique property is your own judgement, based on how many events share the property and how often it will be used in analytics queries to filter or group data. If some property is just "nice to have" and isn't involved in regular analysis use cases (yeah, we all love to store anything trackable, just in case), the burden of maintaining a separate column is overkill.
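As a sketch, the resulting hybrid table could look something like this (column names and sizes are illustrative, not from the original answer):

-- Common properties as real columns, event-specific properties as one JSON string.
CREATE TABLE event_table (
    event_type  VARCHAR(64)   NOT NULL,
    event_ts    TIMESTAMP     NOT NULL,
    url         VARCHAR(2048),
    device      VARCHAR(64),
    browser     VARCHAR(64),
    os          VARCHAR(64),
    event_json  VARCHAR(65535)  -- unique properties, e.g. test_name, product_id
)
SORTKEY (event_ts);

Sorting on the timestamp keeps the time-range scans that dominate event analytics cheap.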
Also, if you have some unique property that you use extensively in queries, there is a hacky way to optimize. You can place this property at the beginning of your JSON column (yes, JSON keys are nominally unordered, but in Redshift the column is just a string, so the key order can be fixed if you want) and use LIKE with a wildcard only at the end of the pattern:
select *
from event_table
where event_type='Start experiment'
and event_json like '{"test_name":"my_awesome_test"%' -- instead of below
-- and json_extract_path_text(event_json,'test_name')='my_awesome_test'
LIKE used this way works much faster than a JSON lookup (2-3x faster) because it doesn't have to decode the JSON in every row, find the key, and check the value; it just checks whether the string starts with a given prefix, which is a much cheaper operation.
Using the App Engine datastore, but this might be storage-agnostic, no idea.
Assume a database entity called Comment. Each Comment belongs to a User. Every Comment has a date property, pretty standard so far.
I want something that will let me specify a User and get back a dictionary-ish data structure (coming from a Python background, pardon; hash table, map, however it should be called in this context) where:
keys: every date appearing in the User's comments
values: the Comments that were made on that date.
I guess I could just iterate over a range of dates and build such a map myself, but I seriously doubt I need to "invent" my own solution here.
Is there a way/tool/technique to do this?
Datastore supports both references and list properties. This lets you build one-to-many relationships in two ways:
Parent (User) has a list property containing keys of Child entities (Comment).
Child has a key property pointing to Parent.
Since you need to limit Comments by date, you'd best go with option two. Then you could query Comments which have date=somedate (or date range) and where user=someuserkey.
There is no native grouping functionality in Datastore, so to also "group" by date, add a sort on date to the query. Then, as you iterate over the results, each time the date changes you can use/store it as a new grouping key.
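The grouping loop itself is storage-agnostic (as you suspected); a minimal sketch in TypeScript, assuming the query has already returned the user's comments sorted by date (the Comment shape is illustrative):

interface Comment {
  user: string;
  date: string; // e.g. "2020-11-20"; day precision, used as the grouping key
  text: string;
}

// Build the date -> comments map the question asks for.
function groupByDate(comments: Comment[]): Map<string, Comment[]> {
  const grouped = new Map<string, Comment[]>();
  for (const c of comments) {
    const bucket = grouped.get(c.date);
    if (bucket) {
      bucket.push(c);
    } else {
      grouped.set(c.date, [c]);
    }
  }
  return grouped;
}

Because a Map preserves insertion order, feeding it the date-sorted query result yields the groups in chronological order.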
Update
Designing NoSQL databases should be access-oriented (versus data-model-oriented in SQL): for often-used operations you should be getting data out as cheaply (= in as few operations) as possible.
So, as a rule of thumb, in one operation you should only fetch the data that is needed at that moment (= shown on that page to the user). I'm not sure about your app's design, but I doubt you need all of a user's full comments (with text and everything) at one time.
I'd start by saying you shouldn't apologize for having a Python background; App Engine originally supported only Python. Using the db module, you could have a User entity as the parent of several DailyCommentBatch entities, each the parent of a couple of Comment entities. IIRC, this will keep all related entities stored together (or close together).
If you are using NDB (I love it), you could employ a StructuredProperty at either the User or DailyCommentBatch level.