Viewing a "log" of new (distinct) events in a database

I have an application which has several unrelated tables in its db. I'll explain by using an "auto-updating" version of the SO homepage as an example, so let's say I have the tables "users", "comments" and "questions".
The homepage client side needs to periodically poll the server, and get a log of all the new "events" that have happened. I.e., I'd like to display (somehow) the new questions, comments and users that have been added to SO since the last poll.
One way would be to simply keep a variable on the client side containing the last index of each of my tables, send it to the server, and have the server send me the new users, comments and questions.
The problem is, what happens when I add a new type of information, say, votes? Now I have to store another variable on the client side, and the server has to poll another table. And so on, for every new type of information I keep.
I'm looking for a solution that helps me avoid this.
Another problem - say I'd like to see all the "events" that have happened since last time, but sorted according to when they took place.
One direction I had is to have a single "events" table, which contains the info about when each event happened. I can then poll only this table, and get a list of all the new events that have happened. The problem is that each event is pretty different (a new comment has different columns than a new upvote, etc.) So I'm not sure how to implement this, or if this is even a good idea.
Does anybody have any ideas how I can solve this? This seems like something that would come up a lot, but I don't really have much experience with databases, unfortunately.
Thanks!

It sounds to me like you're trying to future-proof via database design. While this can be done through something like an EAV (entity-attribute-value) model, I caution against that because the value it adds tends not to be worth the cost.
Instead, you should model the database as closely to reality as possible, and not according to how you intend to use it.
Then use SQL to project the data into the shape you need. You can do this with a statement that delivers just the metadata you need, e.g.
    SELECT COUNT(ID), 'Comments' AS Type
    FROM Comments
    WHERE lastUpdate > #InputParameter1#
    UNION
    SELECT COUNT(ID), 'Questions' AS Type
    FROM Questions
    WHERE lastUpdate > #InputParameter1#
Or (and this doesn't get used often enough) return more than one result set from your database in one go:
    SELECT userId, CommentText
    FROM Comments
    WHERE lastUpdate > #InputParameter1#;
    SELECT userId, Questions, Tags
    FROM Questions
    WHERE lastUpdate > #InputParameter1#
That said, you will still have to write some code when you add new stuff, but it should be limited to updating your SQL, adding new containers for your data, and then writing the code to display it to the end users and to validate and store it.
Honestly the idea of adding new stuff requiring some work doesn't seem that awful to me.
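The question also asked for all new events sorted by when they took place. One way to get that without a separate "events" table is to project each table onto a common shape and combine them with UNION ALL. The following is only a minimal sketch using Python's sqlite3 module; the table layout, the Title and Summary columns, and the events_since helper are assumptions made for illustration, not part of the original schema.

```python
import sqlite3

# Hypothetical schema: each table carries its own lastUpdate timestamp.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Comments  (ID INTEGER PRIMARY KEY, CommentText TEXT, lastUpdate TEXT);
    CREATE TABLE Questions (ID INTEGER PRIMARY KEY, Title TEXT, lastUpdate TEXT);
    INSERT INTO Comments  VALUES (1, 'Nice answer', '2024-01-01 10:00:00'),
                                 (2, 'Thanks',      '2024-01-01 12:00:00');
    INSERT INTO Questions VALUES (1, 'How do I poll?', '2024-01-01 11:00:00');
""")

# Each branch projects its table onto the same four columns, so the
# client only needs to remember a single "since" timestamp.
FEED_SQL = """
    SELECT 'Comment' AS Type, ID, CommentText AS Summary, lastUpdate
    FROM Comments WHERE lastUpdate > :since
    UNION ALL
    SELECT 'Question' AS Type, ID, Title AS Summary, lastUpdate
    FROM Questions WHERE lastUpdate > :since
    ORDER BY lastUpdate
"""

def events_since(since):
    """Return all events newer than `since`, oldest first."""
    return conn.execute(FEED_SQL, {"since": since}).fetchall()

feed = events_since("2024-01-01 10:30:00")
for event_type, event_id, summary, when in feed:
    print(when, event_type, event_id, summary)
```

Adding a new event type (votes, say) then means adding one more branch to the query rather than another client-side variable.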

Related

Salesforce Admin Sharing Rule

[Note: There is a Teacher object with fields such as Teacher Name and DateofJoining, and also a formula field called Experience.]
My task was to create a Public Group containing another user,
and this user should only see teachers who have more than 2 years of experience.
But when I create a criteria-based sharing rule, the Experience field doesn't show up because it is a formula field.
So I had the idea of creating a new field (maybe a text or number data type) which would hold the value of Experience. (But I have no idea how to implement this.)
Is there a way to implement this?
Any other solution is also appreciated!

Hard to say.
The normal trick would be to create a helper field (text, number, whatever) and have a piece of functionality that populates it. Ideally an "early flow" or a "before insert, before update" trigger. Worst case a normal flow, process builder or an "after insert, after update" trigger. Something like "if Experience__c != 'your formula here' then Experience__c = 'your formula here'". Consult the normal SF help and Trailhead if you have never used early flows.
You'd make a one-off data fix to populate existing records and the job is done; a normal field should be selectable as sharing rule criteria.
=====
But I smell trouble with your formula. What exactly do you have there, something like Experience__c = (TODAY() - DateofJoining__c) / 365? That's a bit evil. Formulas with TODAY(), NOW() or anything with $ (roughly speaking, anything about who's looking at the data, such as the user's name, profile or role, rather than what's actually on the record itself) are "nondeterministic". Unpredictable.
The value of TODAY() changes just like that, without the record being updated. Sure, when you view the record a fresh value will be calculated, but other than that LastModifiedDate doesn't change, and there's no magical trigger running at midnight that rechecks sharing (especially as there's no single midnight; you could have users in multiple timezones). SF just doesn't allow nondeterministic fields in many places, see https://salesforce.stackexchange.com/q/32122/799
So if you do rely on TODAY() in your formula, you might have to make a "scheduled flow" or read about schedulable, batchable Apex. Create a nightly job that runs and recalculates your helper field with the right experience. You'd probably even need both solutions: a "before save" flow for new data created today, and a nightly job to advance the clock on existing old data...
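In Salesforce the nightly job would be a scheduled flow or batch Apex, so the following is only a language-neutral illustration, written in Python, of what such a job has to compute. The record shape and the names date_of_joining, experience_years and nightly_refresh are hypothetical; the point is that the stored helper value is rewritten only when the passage of time has made it stale.

```python
from datetime import date

def whole_years_between(start, today):
    """Completed years from `start` to `today` (anniversary-based, not days / 365)."""
    years = today.year - start.year
    if (today.month, today.day) < (start.month, start.day):
        years -= 1  # this year's anniversary has not happened yet
    return years

def nightly_refresh(teachers, today):
    """Rewrite the stored helper value where it is stale; return the changed records."""
    changed = []
    for teacher in teachers:
        fresh = whole_years_between(teacher["date_of_joining"], today)
        if teacher.get("experience_years") != fresh:
            teacher["experience_years"] = fresh
            changed.append(teacher)
    return changed

teachers = [{"name": "A", "date_of_joining": date(2020, 6, 15), "experience_years": 1}]
nightly_refresh(teachers, date(2022, 6, 14))  # no change: still 1 completed year
nightly_refresh(teachers, date(2022, 6, 15))  # anniversary: value becomes 2
```

Only records whose value actually changed are written back, which keeps the number of updates (and any sharing recalculation they cause) small.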

rest: modify the order of the resources associated to a resource

I'm trying to write a REST interface for an application.
The Model
In the model of this application I already have a situation like the following:
A person can have multiple activities.
An activity can be referred by several people.
The activities of the people are ordered with a specific field (position).
I've represented this situation in my db with three tables: people, activities, people_activities. The table people_activities features three fields: person_id, activity_id, position.
The new Requirement
Everything worked fine until now, but I've got a new requirement: I should be able to insert an activity for a person on top of the others.
So if the content of the people_activities table is for instance this one:
"mark reading 0"
"mark running 1"
if I received the request to add "mark juggling" the result should be:
"mark juggling 0"
"mark reading 1"
"mark running 2"
The solutions
The dilemma for the REST interface now comes from the fact that I identify only two options:
using a single URL like /people/mark/addActivity, but this doesn't seem RESTful at all, since it modifies resources not referred to by the URL and contains one "verb" too many.
using some URLs like the ones I'm already using (something like /people/mark/activities or /people_activities?person=mark) and posting the new change to all these resources. This seems to be RESTful but very sloppy in my opinion.
What's the proper way to deal with this situation in your opinion? Is there a third option I'm not considering?
First Edit
Thinking more carefully about the question, I realize that another reasonable solution would be to end up with this situation in the database:
"mark juggling -1"
"mark reading 0"
"mark running 1"
The position is just a number that has no "real" meaning for me. In practice I cannot do that, but it looked like valuable information and a way to add a new resource without modifying the other associations, which is not something I need to do from my business logic point of view.
Maybe another mistake is that I'm letting too much database detail spill into my interface. In this case, what I really need from a business perspective is the order of the elements, not their position. The position is just a technical detail I have used in the database to accomplish the ordering.
So maybe another question is:
Is it reasonable, in your opinion, to modify some information in your database if this information is just a technical detail and is not exposed to the users of the interface?

I think your solution of inserting the new entity with a position of min(position) - 1 sounds good.
We are also looking to build a RESTful API backed by a SQL database that allows the resources to be reordered and have faced this problem with the implementation details of how to add a new entry to the start or in the middle of the list without having to update the positions in all the entities.
In our implementation we plan to use a floating point number for the position field and to follow these rules:
If the user wants to insert the entity to the start of the list, insert with a position of min(position) - 1.0
If the user wants to insert the entity to the end of the list, insert with a position of max(position) + 1.0
If the user wants to insert the entity anywhere else, insert with a position equal to the average of the positions of the two entities either side.
With the technology we are using we get 1073 inserts between two entities before we need to rebalance, by updating the position fields in all the entities to be all spaced 1.0 apart from each other. This is fine for our use case.
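The three rules above can be sketched in a few lines of Python. This is a minimal illustration assuming the existing positions are already loaded as a sorted list of floats; the function names position_for_insert, is_exhausted and rebalance are made up for the example.

```python
def position_for_insert(positions, index):
    """Position value for a new item that should land at `index` (0..len(positions)).

    `positions` is the sorted list of existing position values.
    """
    if not positions:
        return 0.0
    if index <= 0:                      # start of the list
        return positions[0] - 1.0
    if index >= len(positions):         # end of the list
        return positions[-1] + 1.0
    # anywhere else: average of the two neighbours
    return (positions[index - 1] + positions[index]) / 2.0

def is_exhausted(left, right):
    """True when no float fits strictly between two neighbours, so a rebalance is due."""
    mid = (left + right) / 2.0
    return mid == left or mid == right

def rebalance(count):
    """Fresh positions spaced 1.0 apart for `count` items, kept in their current order."""
    return [float(i) for i in range(count)]

positions = [0.0, 1.0]
print(position_for_insert(positions, 0))  # before everything
print(position_for_insert(positions, 1))  # between the two
print(position_for_insert(positions, 2))  # after everything
```

How many inserts fit between two neighbours before is_exhausted returns True depends on the float type and on where in the gap the inserts keep landing, so it is worth checking rather than assuming a fixed number.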
Side note: with a graph database you don't have this problem. It is very easy to add a new entity anywhere within the list by just updating the two relationships of the nodes either side, just like when you insert something into a linked list. So that is something else to consider if you are not tied to a SQL database.
All of the above is implementation details. With regards to how the API should look to an end user, I would vote to make it appear however is easiest for your clients. In our use case I hope for us to make the position field appear as a 0 based integer index (just like an array), as that will be easiest for our clients. This would mean that inserting a new entity at the beginning of the list will make it appear that the position field in all the other entities has changed, but I think that is fine. It doesn't fly in the face of the REST philosophy too much. It's pragmatic.
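Presenting the stored positions as a 0-based index can be done at read time, so the stored values never have to be contiguous. A minimal Python sketch, assuming rows arrive as (activity, stored_position) pairs; the function name with_index is hypothetical.

```python
def with_index(rows):
    """Turn (activity, stored_position) rows into (activity, index) pairs for clients.

    The stored position may be negative or fractional; clients only ever
    see a dense 0-based index derived from the sort order.
    """
    ordered = sorted(rows, key=lambda row: row[1])
    return [(activity, index) for index, (activity, _position) in enumerate(ordered)]

rows = [("reading", 0.0), ("running", 1.0), ("juggling", -1.0)]
print(with_index(rows))
```

With the data from the question, juggling is stored at -1 but is reported to clients at index 0, with reading and running following at 1 and 2.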

Best practices for managing updating a database with a complex set of changes

I am writing an application where I have some publicly available information in a database which I want the users to be able to edit. The information is not textual like a wiki but is similar in concept because the edits bring the public information increasingly closer to the truth. The changes will affect multiple tables and the update needs to be automatically checked before affecting the public tables.
I'm working on the design and I'm wondering if there are any best practices that might help with some particular issues.
I want to provide undo capability.
I want to show the user the combined result of all their changes.
When the user says they're done, I need to check the underlying public data to make sure it hasn't been changed by somebody else.
My current plan is to have the user work in a set of tables set up as a private working area. Once they're ready, they can kick off a process to check everything and update the public tables. Undo can be recorded using the Command pattern, saving to a table.
Are there any techniques I might have missed or useful papers or patterns?
Thanks in advance!

I would do it like this:
Use insert-only tables: you never update the data itself, you only add new rows
Each row has a valid-from date, a valid-to date and a record of who made the change
Read the data through a query where the valid-to date IS NULL and the row is approved; this gives the most recent row
When a user adds data, he can see his changes by selecting the last row that he added
When the user is happy with the changes he has made he can mark them as approved
When they are approved they can be seen by other users
Undo is not a problem since you have all the previous versions, you can mark a row as no longer being approved and revert to a previous version.
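The steps above can be sketched with Python's sqlite3 module. This is one possible reading under stated assumptions: the table item_versions, its columns and the helper functions are all invented for the example, and only the bookkeeping columns (valid_to, approved) are ever updated while the data values themselves are insert-only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE item_versions (
        version_id INTEGER PRIMARY KEY AUTOINCREMENT,
        item_key   TEXT NOT NULL,
        value      TEXT NOT NULL,
        changed_by TEXT NOT NULL,
        valid_from TEXT NOT NULL,
        valid_to   TEXT,                      -- NULL means "still open"
        approved   INTEGER NOT NULL DEFAULT 0
    )""")

def add_version(item_key, value, user, now):
    """Record a change as a new, not yet approved row."""
    cur = conn.execute(
        "INSERT INTO item_versions (item_key, value, changed_by, valid_from) "
        "VALUES (?, ?, ?, ?)", (item_key, value, user, now))
    return cur.lastrowid

def approve(version_id, now):
    """Close the currently public row for the item, then publish this one."""
    (item_key,) = conn.execute(
        "SELECT item_key FROM item_versions WHERE version_id = ?",
        (version_id,)).fetchone()
    conn.execute(
        "UPDATE item_versions SET valid_to = ? "
        "WHERE item_key = ? AND approved = 1 AND valid_to IS NULL", (now, item_key))
    conn.execute("UPDATE item_versions SET approved = 1 WHERE version_id = ?",
                 (version_id,))

def public_value(item_key):
    """What other users see: the approved row whose valid_to IS NULL."""
    row = conn.execute(
        "SELECT value FROM item_versions "
        "WHERE item_key = ? AND approved = 1 AND valid_to IS NULL",
        (item_key,)).fetchone()
    return row[0] if row else None

def latest_draft(item_key, user):
    """What the editing user sees: the last row they added."""
    row = conn.execute(
        "SELECT value FROM item_versions WHERE item_key = ? AND changed_by = ? "
        "ORDER BY version_id DESC LIMIT 1", (item_key, user)).fetchone()
    return row[0] if row else None

def undo(item_key):
    """Withdraw the public row and reopen the previously approved one."""
    conn.execute(
        "UPDATE item_versions SET approved = 0 "
        "WHERE item_key = ? AND approved = 1 AND valid_to IS NULL", (item_key,))
    conn.execute(
        "UPDATE item_versions SET valid_to = NULL WHERE version_id = ("
        "  SELECT MAX(version_id) FROM item_versions"
        "  WHERE item_key = ? AND approved = 1 AND valid_to IS NOT NULL)",
        (item_key,))
```

Note that the comparison is written valid_to IS NULL; in SQL, valid_to = NULL never matches any row.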

Database design question. BIT column for deletions

Part of my table design is to include an IsDeleted BIT column that is set to 1 whenever a user deletes a record. Therefore all SELECTs are inevitably accompanied by a WHERE IsDeleted = 0 condition.
I read in a previous question (I cannot for the love of God find that post again to reference it) that this might not be the best design and that an 'Audit Trail' table might be better.
How are you guys dealing with this problem?
Update
I'm on SQL Server. Solutions for other DBs are welcome, albeit not as useful for me, but maybe for other people.
Update2
Just to summarize what everyone has said so far: there seem to be basically 3 ways to deal with this.
Leave it as it is
Create an audit table to keep track of all the changes
Use of views with WHERE IsDeleted = 0

"Therefore all SELECTs are inevitably accompanied by a WHERE IsDeleted = 0 condition."
This is not a really good way to do it; as you have probably noticed, it is quite error-prone.
You could create a VIEW which is simply
CREATE VIEW myview AS SELECT * FROM mytable WHERE IsDeleted = 0;
Then you just use myview instead of mytable and you don't have to think about this damn column in SELECTs.
Or, you could move deleted records to a separate "archive" table, which, depending on the proportion of deleted versus active records, might make your "active" table a lot smaller and better cached in RAM, i.e. faster.
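To see the view approach end to end, here is a small runnable sketch using Python's sqlite3 module. The table and view names follow the answer; the name column and the sample rows are made up, and SQLite stands in for SQL Server only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        IsDeleted INTEGER NOT NULL DEFAULT 0   -- stands in for a BIT column
    );
    -- The filter lives in one place instead of in every query.
    CREATE VIEW myview AS SELECT id, name FROM mytable WHERE IsDeleted = 0;

    INSERT INTO mytable (name) VALUES ('kept'), ('removed');
    UPDATE mytable SET IsDeleted = 1 WHERE name = 'removed';   -- soft delete
""")

active = conn.execute("SELECT name FROM myview ORDER BY id").fetchall()
everything = conn.execute("SELECT name FROM mytable ORDER BY id").fetchall()
print(active)      # only the rows that are not soft-deleted
print(everything)  # the base table still holds both rows
```

Queries written against myview cannot forget the filter, while reports that need deleted rows can still go to the base table.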

If you have to have this kind of Deleted Bit column, then you really should consider setting up some VIEWs with the WHERE clause in it, and use those rather than the underlying tables. Much less error prone.
For example, if you have this view:
CREATE VIEW [Current Product List] AS
SELECT ProductID,ProductName
FROM Products
WHERE Discontinued=No
Then someone who wants to see current products can simply write:
SELECT * FROM [Current Product List]
This is much less error prone than writing:
SELECT ProductID,ProductName
FROM Products
WHERE Discontinued=No
As you say, people will forget that WHERE clause, and get confusing and incorrect results.
P.S. the example SQL comes from Microsoft's Northwind database. Normally I would recommend NOT using spaces in column and table names.

We're actively using the "Deleted" column in our enterprise software. It is however a source of constant errors when forgetting to add "WHERE Deleted = 0" to an SQL query.
Not sure what is meant by "Audit Trail". You may wish to have a table to track all deleted records. Or there may be an option of moving the deleted content to paired tables (like Customer_Deleted) to remove the passive content from tables to minimize their size and optimize performance.

A while ago there was some blog uproar on this issue, Ayende and Udi Dahan both posted on this.

Nai, this is totally up to you.
Do you need to be able to see who has deleted / modified / inserted what and when? If so, you should design the tables for this and adjust your procs to write these values when they are called.
If you don't need an audit trail, don't waste time on one. Just carry on as you are with IsDeleted.
Personally, I flag things right now, as an audit trail wasn't specified in my spec, but that said, I don't like to actually delete things. Hence, I chose to flag it. I'm not going to waste a client's time writing something they didn't request. I won't mess about with other tables, because that's another thing for me to think about. I'd just make sure my indexes were up to the job.
Ask your manager or client. Plan out how long the audit trail would take so they can cost it, and let them make the decision for you ;)

Udi Dahan said this:
Model the task, not the data
Looking back at the story our friend from marketing told us, his intent is to discontinue the product – not to delete it in any technical sense of the word. As such, we probably should provide a more explicit representation of this task in the user interface than just selecting a row in some grid and clicking the ‘delete’ button (and “Are you sure?” isn’t it).
As we broaden our perspective to more parts of the system, we see this same pattern repeating:
Orders aren’t deleted – they’re cancelled. There may also be fees incurred if the order is canceled too late.
Employees aren’t deleted – they’re fired (or possibly retired). A compensation package often needs to be handled.
Jobs aren’t deleted – they’re filled (or their requisition is revoked).
In all cases, the thing we should focus on is the task the user wishes to perform, rather than on the technical action to be performed on one entity or another. In almost all cases, more than one entity needs to be considered.

If you are on Oracle, you can use its audit trail for auditing. Check the Audit Vault tool from OTN. It even supports SQL Server.

Views (or stored procs) to get at the underlying table data are the best way. However, if you have the problem with "too many cooks in the kitchen" like we do (too many people have rights to the data and may just use the table without knowing enough to use the view/proc) you should try using another table.
We have a complete mimic of the base table with a few extra columns for tracking. So Employee table has an EmployeeDeleted table with the same schema but extra columns for when it was deleted and who deleted it and sometimes even the reason for deletion. You can even get fancy and have triggers do the insertion directly instead of going through applications/procs.
Biggest Advantage: no flag to worry about during selects
Biggest Disadvantage: any schema changes to the base table also have to be made on the "deleted" table
Best for: situations where for whatever reason (usually political with us) many not-as-experienced people have rights to the data but still expect it to be accurate without having to understand flags or schemas, etc
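The trigger variant mentioned above can be sketched as follows, again with Python's sqlite3 module so that it runs anywhere. The column names and the trigger name Employee_archive are assumptions, and the trigger syntax is SQLite's; SQL Server triggers are written differently, so treat this as an illustration of the idea only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (
        EmployeeID INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL
    );
    -- Same columns as Employee plus tracking columns.
    CREATE TABLE EmployeeDeleted (
        EmployeeID INTEGER,
        Name       TEXT NOT NULL,
        DeletedAt  TEXT NOT NULL,
        DeletedBy  TEXT            -- left NULL here: the trigger cannot see the app user
    );
    -- Copy the row into the archive just before it disappears.
    CREATE TRIGGER Employee_archive BEFORE DELETE ON Employee
    BEGIN
        INSERT INTO EmployeeDeleted (EmployeeID, Name, DeletedAt)
        VALUES (OLD.EmployeeID, OLD.Name, datetime('now'));
    END;

    INSERT INTO Employee (Name) VALUES ('Ann'), ('Bob');
    DELETE FROM Employee WHERE Name = 'Bob';
""")

remaining = conn.execute("SELECT Name FROM Employee").fetchall()
archived = conn.execute("SELECT Name FROM EmployeeDeleted").fetchall()
print(remaining, archived)
```

Because the copy happens in the database, people who query or delete from Employee directly get correct results without knowing about any flag; the cost is that every schema change to Employee has to be repeated on EmployeeDeleted and in the trigger.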

I've used soft deletes before on a number of applications I've worked on, and overall it's worked out quite well. Yes, there is the issue of always having to remember to add AND IsActive = 1 to all of your SELECT queries, but really that's not so bad. You can create views if you don't want to have to remember to always do that.
The reason we've done this is because we had very specific business needs to be able to report on records that have been deleted. The reporting needs varied widely - sometimes they'd need to see just the active records, or just the inactive records, or sometimes a mix of both - so pushing all the deleted records into an audit table wasn't a very good option.
So, depending on your particular business needs, I think this approach is certainly a viable option.

Creating a Notifications type feed in GAE Objectify

I'm working on a notification feed for my mobile app and am looking for some help on an issue.
The app is a Twitter/Facebook like app where users can post statuses and other users can like, comment, or subscribe to them.
One thing I want in my app is a notifications feed where users can see who liked or commented on their post, or subscribed to them.
The first part of this system I have figured out: when a user likes/comments/subscribes, a Notification entity is written to the datastore with details about the event. To show a user's Notifications, all I have to do is query for all Notifications for that user, sorted by date created descending, and we have a nice little feed of the actions other users took on a specific user's account.
The issue I have is what to do when someone unlikes a post, unsubscribes or deletes a comment. Currently, if I were to query for that specific notification, it is possible that nothing would be returned from the datastore because of eventual consistency. We could imagine someone liking, then immediately unliking a post (b/c who hasn't done that? =P). The query to find that Notification might return null, and nothing would get deleted when calling ofy().delete().entity(notification).now(); And now the user has a notification in their feed saying Sally liked his post, when in reality she liked and then quickly unliked it!
A wrench in this whole system is that I cannot delete by Key<Notification>, because I don't really have a way to know the id of the Notification when trying to delete it.
A potential solution I am experimenting with is to not delete any Notifications. Instead I would always write Notification's and simply indicate if the notification was positive or negative. Then in my query to display notifications to a specific user, I could somehow only display the sum-positive Notification's. This would save some money on datastore too because deleting entities is expensive.

There are three main ways I've solved this problem before:
1. Deterministic key
For example:
{user-Id}-{post-id}-{liked-by} for likes
{user-id}-{post-id}-{comment-by}-{comment-index} for comments
This will work for most basic use cases for the problem you defined, but you'll have some hairy edge cases to figure out (like managing indexes of comments as they get edited and deleted). This will allow get and delete by key.
2. Parallel data structures
The idea here is to create more than one entity at a time in a transaction, but to make sure they have related keys. For example, when someone comments on a feed item, create a Comment entity, then create a CommentedOn entity which has the same ID, but make it have a parent key of the commenter user.
Then, you can make a strongly consistent query for the CommentedOn, and use the same id to do a get by key on the Comment. You can also just store a key, rather than having matching IDs, if that's too hard. Having matching IDs in practice was easier each time I did this.
The main limitation of this approach is that you're effectively creating an index yourself out of entities, and while this can give you strongly consistent queries where you need them, the throughput limitations of transactional writes can become harder to understand. You also need to manage state changes (like deletes) carefully.
3. State flags on entities
Assuming the Notification object just shows the user that something happened but links to another entity for the actual data, you could store a state flag (deleted, hidden, private etc) on that entity. Then listing your notifications would be a matter of loading the entities server side and filtering in code (or possibly subsequent filtered queries).
At the end of the day, the complexity of the solution should mirror the complexity of the problem. I would start with approach 3, then migrate to approach 2 when the fuller set of requirements is understood. Approach 2 is more robust and flexible, but the complexity of XG transaction limitations will rear its head. Ultimately, a distributed feed like this is a hard problem.
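The deterministic-key idea (the first approach) is independent of the datastore API, so here is a minimal sketch in plain Python in which a dict stands in for get/put/delete by key. The function names and the dict store are hypothetical, and the key format assumes the individual ids never contain a hyphen themselves.

```python
# Stand-in for the datastore: key -> notification. Lookups by key are the
# operations that stay strongly consistent, unlike queries.
store = {}

def like_notification_key(owner_id, post_id, liked_by):
    """Build the same key on like and on unlike, so no query is needed."""
    return f"{owner_id}-{post_id}-{liked_by}"

def like(owner_id, post_id, liked_by):
    key = like_notification_key(owner_id, post_id, liked_by)
    store[key] = {"type": "like", "post": post_id, "by": liked_by}

def unlike(owner_id, post_id, liked_by):
    # Delete by key; a missing key (already removed) is not an error.
    store.pop(like_notification_key(owner_id, post_id, liked_by), None)

like("bob", "post42", "sally")
like("bob", "post42", "sally")    # liking twice still yields a single notification
unlike("bob", "post42", "sally")  # quick unlike removes it without any query
print(store)
```

Because the key can be recomputed from the like itself, the unlike path never has to find the Notification through an eventually consistent query.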

What I ended up doing, and what worked for my specific model, was that before creating a Notification entity I would first allocate an ID for it:
// Allocate an ID for a Notification
final Key<Notification> notificationKey = factory().allocateId(Notification.class);
final Long notificationId = notificationKey.getId();
Then when creating my Like or Follow Entity, I would set the property Like.notificationId = notificationId; or Follow.notificationId = notificationId;
Then I would save both Entities.
Later, when I want to delete the Like or Follow, I can do so and at the same time get the ID of the Notification, load the Notification by key (a get by key is strongly consistent), and delete it too.
Just another approach that may help someone =D
