What does Datomic do when adding the same value to the same attribute?

I am very new to Datomic, but I see value in Datomic's history keeping, considering what we need in our application.
My very basic question is: does Datomic prevent storing duplicate values for an attribute (let us say I keep adding name="my name" again and again because the user presses the save button without making any change to the name)?
Another thing we need in our app is the ability to query the approved information rather than the latest change. My question is: can I add attributes ("approved", "proposed", etc.) to the transaction and query data based on these attributes?
Thanks!

In Datomic, entities (which look like Clojure maps) are broken into datoms: tuples that associate an attribute and a value with their owning entity, plus the transaction that asserted them.
For question one, you can set your name attribute to :db/unique in your Datomic schema. By the way, it's recommended to use namespace-qualified keywords for attributes, something like :user/name.
Just set the attribute for approved information to cardinality "many" in your schema, for example (pick the :db/valueType that fits your data; keyword is an assumption here):
{:db/ident :comment/approved
 :db/valueType :db.type/keyword
 :db/cardinality :db.cardinality/many}

Repeated datoms will not be recorded again: asserting a fact that already holds is redundant, and Datomic drops it. Calls to add attributes/entities that already exist are handled as upserts. If you define unique attributes, then depending on whether you choose :db.unique/value or :db.unique/identity, repeated assertions will either fail or upsert; see the Datomic docs on identity and uniqueness.
You can annotate transactions. Instructions for doing so are in the docs, and you can find an example in the GitHub repository for day-of-datomic, specifically this section:
(def db (:db-after @(d/transact
                      conn
                      [{:db/id (d/tempid :db.part/user)
                        :story/title "ElastiCache in 5 minutes"
                        :story/url "http://blog.datomic.com/2012/09/elasticache-in-5-minutes.html"}
                       ;; this second map asserts facts about the transaction itself
                       ;; (`editor` is an entity id defined earlier in the tutorial)
                       {:db/id (d/tempid :db.part/tx)
                        :source/user editor}])))

Once transactions carry attributes like :source/user (or an approval status), your queries can bind the transaction position of each datom and filter on those attributes, which is what your "approved" vs. "proposed" use case needs.

Related

In Azure Search, can an indexer combine information from different documents into a single index item without them overwriting each other?

My goal is to create a single searchable Azure Index that has all of the relevant information currently stored in many different sql tables.
I'm also using an Azure Cognitive Service to add additional info from related documents. Each document is tied to only a single item in my Index, but each item in the index will be tied to many documents.
According to my understanding, if two documents have the same value for the indexer's Key, then the index will overwrite the extracted information from the first document with the information extracted from the second. I'm hoping there's a way to append the information instead of overwriting it. For example: if two documents relate to the same index item, I want the values mapped to keyphrases for that item to include the keyphrases found in the first document and the keyphrases found in the second document.
Is this possible? Is there a different way I should be approaching this?
If it is possible, can I do it without having duplicate values?
Currently I have multiple indexes and I'm combining the search results from each one, but this seems inefficient and likely messes up the default scoring algorithm.
Every code example I find only has one document for each index item and doesn't address my problem. Admittedly, I haven't tried to set up my index as described above, because it would take a lot of refactoring, and I'm confident it would just overwrite itself.
I am currently creating my indexes and indexers programmatically using dotnet. I'm assuming my code isn't relevant to my question, but I can provide it if need be.
Thank you so much! I'd appreciate any feedback you can give.
Edit: I'm thinking about creating a custom skill to do the aggregation for me, but I don't know how the skill would access everything it needs. It needs the extracted info from the current document, and it needs the previously aggregated info from earlier documents. I guess the custom skill could perform a search on the index and get the item that way, but that sounds dangerously hacky. Any thoughts would be appreciated.
Pasting from docs:
Indexing actions: upload, merge, mergeOrUpload, delete
You can control the type of indexing action on a per-document basis, specifying whether the document should be uploaded in full, merged with existing document content, or deleted.
Whether you use the REST API or an SDK, the following document operations are supported for data import:
Upload, similar to an "upsert" where the document is inserted if it is new, and updated or replaced if it exists. If the document is missing values that the index requires, the document field's value is set to null.
merge updates a document that already exists, and fails a document that cannot be found. Merge replaces existing values. For this reason, be sure to check for collection fields that contain multiple values, such as fields of type Collection(Edm.String). For example, if a tags field starts with a value of ["budget"] and you execute a merge with ["economy", "pool"], the final value of the tags field is ["economy", "pool"]. It won't be ["budget", "economy", "pool"].
mergeOrUpload behaves like merge if the document exists, and upload if the document is new.
delete removes the entire document from the index. If you want to remove an individual field, use merge instead, setting the field in question to null.
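In other words, the aggregation has to happen on your side: read the item's current collection value, union in the new values, and write the result back with mergeOrUpload. A rough sketch with the Python SDK (azure-search-documents); the index name, the key field (id), and the keyphrases field are assumptions, and note the read-modify-write is itself racy if two runs touch the same item concurrently:

from azure.core.credentials import AzureKeyCredential
from azure.core.exceptions import ResourceNotFoundError
from azure.search.documents import SearchClient

client = SearchClient(endpoint="https://<service>.search.windows.net",
                      index_name="content-index",
                      credential=AzureKeyCredential("<admin-key>"))

def append_keyphrases(doc_key, new_phrases):
    # merge replaces collection fields wholesale, so fetch the current
    # values first and write back the union ourselves
    try:
        current = client.get_document(key=doc_key).get("keyphrases") or []
    except ResourceNotFoundError:
        current = []  # item not indexed yet; mergeOrUpload will create it
    merged = sorted(set(current) | set(new_phrases))
    client.merge_or_upload_documents(documents=[{"id": doc_key,
                                                 "keyphrases": merged}])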

Alternate string ID for Guid ID objects

I currently use Guid as the primary key for my ContentItems in my code-first Entity Framework context. However, since Guids are so unwieldy, I would also like to set an alternate, friendly ID for each ContentItem (or descendant of ContentItem) according to the following logic:
Take the Name property, lower-cased, replacing whitespace with a -, and end the prefix with a - as well
Look in the database to see which other ContentItems have a FriendlyID with the same prefix, and find the one with the highest numeric suffix
Increment that by 1 and use it as the suffix
So the first item with name "Great Story" would have FriendlyID of great-story-1, the next one great-story-2, and so forth.
I realize there are a number of ways to implement this sort of thing, but here are my questions:
Is it advisable to explicitly store a new field with the alternate ID set according to this logic, or should I just run a query each time, applying the same rules I would use to generate the ID, to find the right object?
How should I enforce the setting of the alternate ID? Should I do it in my service methods for each content item at creation time? (This concerns me because if someone forgets to add that logic to a service method, the object won't have a FriendlyID.) Or should I do it in the model itself, with a property whose manually defined getters/setters query the DB to find the next available FriendlyID?
Are there alternatives to using this sort of FriendlyID for the purpose of making human-friendly URLs and web service requests? The ultimate purpose of this thing is really so that we can have users go to http://awesomewebsite.com/Content/great-story-1 and get sent to the right content item, rather than http://awesomewebsite.com/Content/f0be271e-ee01-48de-8599-ddd602e777b6, etc.
Pre-generate them. This allows you to index them. I understand your concern but there's no alternative in practice. (I have done this.)
I don't know the architecture of your app. Just note that generating such an ID requires database query access, so it probably shouldn't be done as a property or method on the entity itself.
You could use a combination by putting both a "speaking name" and an ID into the URL. I have seen sites do this. For GUID IDs this is not exactly pretty, though.
Write yourself a few helper methods to generate such string IDs in a convenient and robust way; that keeps the extra work manageable.
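To make that concrete, here is a minimal sketch of the helper logic (in Python for brevity; the names, and how you fetch the existing IDs, are placeholders, and the same shape ports directly to C#):

import re

def slug_prefix(name):
    # "Great Story" -> "great-story-"
    return re.sub(r"\s+", "-", name.strip().lower()) + "-"

def next_friendly_id(name, existing_ids):
    # existing_ids: the FriendlyIDs already in the database that share
    # this prefix, e.g. from a LIKE 'great-story-%' query
    prefix = slug_prefix(name)
    suffixes = [int(fid[len(prefix):]) for fid in existing_ids
                if fid.startswith(prefix) and fid[len(prefix):].isdigit()]
    return prefix + str(max(suffixes, default=0) + 1)

# next_friendly_id("Great Story", ["great-story-1", "great-story-2"])
# => "great-story-3"

Note that the read-max-then-insert step should run under a unique index on FriendlyID (or inside a transaction), or two concurrent saves can generate the same suffix.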

Winforms DevExpress UnitOfWork - Check if the values to save from UnitOfWork aren't already in the database

In a Winforms application we inherited, based on DevExpress, an object of type UnitOfWork is used to keep track of and save multiple records in the database.
Usually around 100 objects are saved to the database on a button click via the method UnitOfWork.CommitChanges();
The table the records are inserted into has a unique constraint on one column.
It can happen that different users work on the same entities and try to enter the same value into that unique column.
So, before calling UnitOfWork.CommitChanges(), we should test whether one or more of the values already exist in the database.
What would be the best approach to test whether one or more objects are already in the database before calling UnitOfWork.CommitChanges(), so that we can reliably warn the user during validation?
Thank you
It's not really the DevExpress way of working; see the help topic on auto-generated keys.
However, since you have inherited the code, you could use UnitOfWork.GetObjectsToSave() to obtain an ICollection of the IXPObjects you are about to commit. From this collection you could generate a list of potential key conflicts. Then you could use UnitOfWork.ExecuteScalar() to run a direct SQL command to determine if there are conflicts and resolve them.
The UnitOfWork.CommitChanges() calls CommitTransaction, so you could perform this check in one of the UnitOfWork events such as UnitOfWork.BeforeCommitTransaction.
As with everything DevExpress, your best support option is the DevExpress Support Center.
The functionality you ask for is not possible to implement reliably in a multiuser system. Since every user necessarily works in his own database transaction, there is always a chance that a user inserts a row with a key another user inserted just before; any check you run before the commit can be invalidated between the check and the commit. You could only avoid this with database mechanisms like autogenerated keys, but I think this is not applicable for you.
So I suggest catching the exception thrown by CommitChanges and acting upon it.
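As a language-neutral illustration of that catch-on-commit pattern (sketched here in Python with sqlite3; in your app the equivalents are the XPO commit and the unique-constraint exception it surfaces):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (code TEXT UNIQUE)")

def save(code):
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO items (code) VALUES (?)", (code,))
        return True
    except sqlite3.IntegrityError:
        # another user already committed this value: warn instead of crashing
        return False

save("A1")   # True
save("A1")   # False -> tell the user the value is already taken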

Data storage: "grouping" entities by property value? (like a dictionary/map?)

Using AppEngine datastore, but this might be agnostic, no idea.
Assume a database entity called Comment. Each Comment belongs to a User. Every Comment has a date property, pretty standard so far.
I want something that will let me specify a User and get back a dictionary-ish data structure (coming from a Python background, pardon; hash table, map, however it should be called in this context) where:
keys: every date appearing in the User's comments
values: the Comments that were made on that date.
I guess I could just iterate over a range of dates and build a map like this myself, but I seriously doubt I need to "invent" my own solution here.
Is there a way/tool/technique to do this?
Datastore supports both references and list properties. This lets you build one-to-many relationships in two ways:
Parent (User) has a list property containing keys of Child entities (Comment).
Child has a key property pointing to Parent.
Since you need to limit Comments by date, you'd best go with option two. Then you can query Comments which have date=somedate (or a date range) and where user=someuserkey.
There is no native grouping functionality in the Datastore, so to also "group" by date you can add a sort on date to the query. Then, as you iterate over the results, each change of date starts a new group, as in the sketch below.
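A minimal sketch of that with the Python NDB client (the model and property names are assumptions):

from itertools import groupby
from google.appengine.ext import ndb

class Comment(ndb.Model):
    user = ndb.KeyProperty(kind="User")  # option two: child points to parent
    date = ndb.DateProperty()
    text = ndb.TextProperty()

def comments_by_date(user_key):
    # results come back sorted by date, so each change of date starts a new group
    q = Comment.query(Comment.user == user_key).order(Comment.date)
    return {date: list(group)
            for date, group in groupby(q, key=lambda c: c.date)}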
Update
Designing NoSQL databases should be access-oriented (versus data-model-oriented in SQL): for often-used operations you should be getting data out as cheaply (= in as few operations) as possible.
So, as a rule of thumb, you should, in one operation, only get the data that is needed at that moment (= shown on that page to the user). I'm not sure about your app's design, but I doubt you need all of a user's full comments (with text and everything) at one time.
I'd start by saying you shouldn't apologize for having a Python background; App Engine originally supported only Python. Using the db module, you could have a User entity as the parent of several DailyCommentBatch entities, each of them the parent of a couple of Comment entities. IIRC, this will keep all related entities stored together (or close).
If you are using NDB (I love it) you might employ a StructuredProperty at either the User or DailyCommentBatch level, as sketched below.
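For instance, a rough NDB sketch of the DailyCommentBatch idea (all names are illustrative):

from google.appengine.ext import ndb

class Comment(ndb.Model):
    date = ndb.DateProperty()
    text = ndb.TextProperty()

class DailyCommentBatch(ndb.Model):
    # one batch per user per day; the comments are stored inline on the batch
    day = ndb.DateProperty()
    comments = ndb.StructuredProperty(Comment, repeated=True)

# batch = DailyCommentBatch(parent=user_key, day=today, comments=[...])
# batch.put()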

Clarification: can I put all of a user's data in a single entity group by making up an ancestor key?

I want to do several operations on a user's data in a single transaction, but won't need to update multiple users' data in a single transaction. I see from http://code.google.com/appengine/docs/python/datastore/keysandentitygroups.html#Entity_Groups_Ancestors_and_Paths that "A good rule of thumb for entity groups is that [entity groups] should be about the size of a single user's worth of data or smaller," so I think the correct choice is to use a single parent key when building the keys for the other entities related to a user.
Does this seem like a good idea?
Is it easy to code? Something like KeyBuilder.setParent(theKeyOfMyUserEntity)?
1) It is hard to comment without some additional details about the data. There are several things you should be aware of with entity groups; the biggest is that the whole group is stored together and updates to one group are serialized. That means if you are trying to do many (separate) updates you could face contention, limiting your app's performance.
2) Yes, it is easy to code. The syntax is pretty close to what you posted.
There are other options for transactions. Check out Nick Johnson's article on distributed transactions. If you want transactions for aggregates, you should also check out Brett Slatkin's IO talk on high-throughput data pipelines.
Yes, it seems reasonable to store some user data as child entities of a User entity.
Why do you need to manually create keys? The db.Model() constructor already has a convenient "parent" argument which automatically puts both the parent entity and the child entity in the same entity group.
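A minimal sketch with the old db module (the kinds and properties are illustrative):

from google.appengine.ext import db

class User(db.Model):
    name = db.StringProperty()

class Comment(db.Model):
    text = db.TextProperty()

user = User(name="alice")
user.put()

# parent= places the comment in the user's entity group, so both
# entities can be read and written inside one transaction
Comment(parent=user, text="hello").put()

def rename_and_comment(user_key, new_name, text):
    u = db.get(user_key)
    u.name = new_name
    u.put()
    Comment(parent=u, text=text).put()

db.run_in_transaction(rename_and_comment, user.key(), "bob", "and hello again")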
