Determining calendar availability, search algorithm and db structure

I need to design an availability application.
The users would mark all their events on a calendar, just like a regular calendar. Then I need to be able to search for something like, "I need someone from next Thursday afternoon through Saturday morning". So what I'm searching for is the negative of what the user puts in: the user enters the time slots when they're NOT available, and I search for the slots when they are.
The simplest thing I can think of is to just put the calendar info in these 2 tables:
User (id, name, etc.)
Events (id, user_id_foreign_key, time_start, time_end, type)
The "type" in Event is probably going to be used for something like "daily", "weekly", etc. for infinitely repeating events. Haven't really figured out how to handle those yet...
So all I really have to do is check, for each user, whether any of their events overlap the time range I was given (i.e., an event's start time is earlier than the range's end time AND its end time is later than the range's start time). If any do, the user is unavailable. I'd of course have to cycle through all the users to do this.
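A minimal sketch of that check as a single query per range (SQLite and the sample data are just for illustration; the tables follow the schema above):

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id),
                     time_start TEXT, time_end TEXT, type TEXT);
""")
con.execute("INSERT INTO users VALUES (1, 'alice'), (2, 'bob')")
con.execute("INSERT INTO events VALUES (1, 1, '2024-05-16 13:00', '2024-05-18 10:00', NULL)")

range_start, range_end = '2024-05-16 12:00', '2024-05-18 12:00'

# a user is available iff none of their events overlap the range;
# two intervals overlap iff each one starts before the other ends
available = con.execute("""
    SELECT u.id, u.name
    FROM users u
    WHERE NOT EXISTS (
        SELECT 1 FROM events e
        WHERE e.user_id = u.id
          AND e.time_start < ?   -- event starts before the range ends
          AND e.time_end > ?     -- and ends after the range starts
    )""", (range_end, range_start)).fetchall()
print(available)  # [(2, 'bob')]; alice's event overlaps the range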
Does this sound like the most efficient way to do this? Came up with this myself so would like to get some feedback. Thanks!

No, it doesn't sound like the most efficient way to do this if the number of users and events grows large in your system.
For example, if you stored the FREE time intervals in your database instead, you could find all the available users with a single database query: find all free intervals that contain the interval you're searching for, and join the result with the users table. The free-intervals table should be indexed on both the start and end times of the intervals.
Maintaining free intervals is a bit more complicated, because adding an event normally means splitting a free interval. But for searching, it is the better data structure.
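For example, a rough sketch of both the search and the split (table and column names invented for illustration):

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE free_intervals (user_id INTEGER, time_start TEXT, time_end TEXT);
CREATE INDEX free_idx ON free_intervals (time_start, time_end);
""")
con.execute("INSERT INTO free_intervals VALUES (2, '2024-05-10 00:00', '2024-06-01 00:00')")

# one indexed query: users having a free interval that contains the whole range
available = con.execute("""
    SELECT user_id FROM free_intervals
    WHERE time_start <= ? AND time_end >= ?""",
    ('2024-05-16 12:00', '2024-05-18 12:00')).fetchall()
print(available)  # [(2,)]

# adding an event splits the free interval it falls inside
def add_event(user_id, ev_start, ev_end):
    row = con.execute("""
        SELECT rowid, time_start, time_end FROM free_intervals
        WHERE user_id = ? AND time_start <= ? AND time_end >= ?""",
        (user_id, ev_start, ev_end)).fetchone()
    if row is None:
        return  # event does not fall inside a single free interval
    rowid, free_start, free_end = row
    con.execute("DELETE FROM free_intervals WHERE rowid = ?", (rowid,))
    if free_start < ev_start:  # keep the free time before the event
        con.execute("INSERT INTO free_intervals VALUES (?, ?, ?)",
                    (user_id, free_start, ev_start))
    if ev_end < free_end:      # ...and the free time after it
        con.execute("INSERT INTO free_intervals VALUES (?, ?, ?)",
                    (user_id, ev_end, free_end))

add_event(2, '2024-05-20 09:00', '2024-05-20 17:00')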

Related

Ways to design a digest email implementation?

I'm looking for some thoughts about how to design a digest email feature. I'm not concerned about the actual business code; instead I'd like to focus on the gist of it.
Let's tackle this with a known example: articles. Here's a general overview of some important features:
The user is able to choose the digest frequency (e.g. daily or weekly);
The digest only contains new articles;
"New articles" are to be considered relative to the previous digest that was sent to a specific user;
I've been thinking about the following:
Introduce per-user tracking of articles previously included in a digest and filter those out?
Requires a new database table;
Could become expensive when the table contains millions of rows;
What to do in case of including multiple types of models in the digest? Multiple tracking tables? Polymorphic table? ...?
Use article creation dates to include articles between current date and the chosen digest frequency?
Uses current date and information already present in the database, so no new tables required;
What happens when a user changes from daily to weekly emails? He could receive the same article again in the weekly digest. Should this edge case be considered? If so, how to mitigate?
For some reason the creation date of an article gets updated to today, causing the date comparison to match it again. Should this edge case be considered? If so, how to mitigate?
Or can you think of other ways to implement this feature?
I'm eager to learn your insights.
You can add a table that holds each user's digest subscription. This makes the database design cleaner and more universal, because mailing is a separate logical module. Aside from that, the additional table makes it easy to extend the stored subscription data in the future. For example, the table (call it digest_subscription) might hold the user id, the interval type ('daily', 'weekly', ...) and the date of the last digest sent to that user.
Using this table, you can manage the data easily. For example, you can select all recipients of the daily digest:
SELECT *
FROM digest_subscription
WHERE interval_type = 'daily'
AND last_date_distribution <= NOW() - INTERVAL 1 DAY
or all recipients of the weekly digest:
SELECT *
FROM digest_subscription
WHERE interval_type = 'weekly'
AND last_date_distribution <= NOW() - INTERVAL 7 DAY
Filtering by interval type and comparing the last distribution date with "less than or equal" avoids the problems of untimely sending of emails (for example, after technical failures on a server, etc.).
Also, you can build the correct article list with the help of the last distribution date. Using the last distribution date avoids the problems caused by an interval change. For example:
SELECT *
FROM articles
WHERE created_at >= <the last date distribution of the user>
Of course, this doesn't avoid the problem of an updated creation date, but you should minimize the reasons for that happening. For example, your code may update the modification date, but it should never modify the creation date.
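Putting it together, a minimal end-to-end sketch (the digest_subscription columns are inferred from the queries above; SQLite stands in for MySQL here, so the NOW()/INTERVAL arithmetic is done in Python):

import sqlite3
from datetime import datetime, timedelta

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE digest_subscription (user_id INTEGER, interval_type TEXT,
                                  last_date_distribution TEXT);
CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, created_at TEXT);
""")

def send_digests(interval_type, min_age):
    # recipients whose last digest is at least min_age old (cf. the queries above)
    cutoff = (datetime.now() - min_age).isoformat(" ")
    due = con.execute("""
        SELECT user_id, last_date_distribution FROM digest_subscription
        WHERE interval_type = ? AND last_date_distribution <= ?""",
        (interval_type, cutoff)).fetchall()
    for user_id, last_sent in due:
        new_articles = con.execute(
            "SELECT id, title FROM articles WHERE created_at >= ?",
            (last_sent,)).fetchall()
        # ... render and send the email with new_articles here ...
        con.execute("""
            UPDATE digest_subscription SET last_date_distribution = ?
            WHERE user_id = ?""", (datetime.now().isoformat(" "), user_id))

send_digests("daily", timedelta(days=1))
send_digests("weekly", timedelta(days=7))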

memgraphdb: Support for time travel queries in graph databases

Let's say I want to model a graph with sales people. They belong to an organisation, have a manager, etc. They are assigned to specific territories and/or client accounts. Your company may work with external partners, which must be managed, and so on. A nice, non-trivial graph.
Elements in this graph keep changing all the time: sales people come and go, or move within the organisation and thus change responsibilities; customers sign contracts or cancel them; ...
In my specific use cases, the point in time is very important. What did the graph look like at the end of last month? At the end of the last fiscal year? Last Monday, when we ran job ABC? E.g. what was the manager hierarchy at the end of last month? Which clients did the salesperson manage at the end of last month? And so on.
In our use cases, DELETE doesn't delete anything; instead some sort of end_date gets updated. UPDATE doesn't update anything; instead a new version of the record is created.
I'm sure I can add CREATED and START-/END_DATE properties to nodes as well as relationships, and of course I can also write the queries. But these queries are a pain to write and almost unreadable, with tons of repeating WHERE clauses everywhere.
I wish graph databases (and their graphical query builders) would allow me to travel in time more easily, e.g. by setting a session variable to a point in time and having all the WHERE clauses added automatically for every node and relationship that has the start/end date properties. The algorithm should not fail for objects that don't have these properties, but should consider the condition met.
What are your thoughts about this use case, and what help does memgraph provide for it?
thanks a lot
Juergen
As far as I am aware, there is not any graph database that supports the type of functionality you are asking about directly, although as @buda points out you can model and query against time-series data. I agree with @buda that the way you would like this to work seems a bit underspecified and very application-specific, so I would not expect this to be a feature of any database.
The closest thing I can think of to out-of-the-box support for something like this would be to use a TinkerPop-enabled database with a PartitionStrategy or SubgraphStrategy to create a subgraph of only the times you want, and then query against that. Another option would be creating a domain-specific language to minimize how often you need to repeat code in your queries (see the sketch after the links below).
PartitionStrategy
SubgraphStrategy
Domain Specific Languages
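To illustrate the DSL option, even a tiny helper removes most of the repetition. A sketch in Python that generates the validity predicate once (the start_date/end_date property names and the Cypher-style query text are assumptions, not memgraph features):

def valid_at(alias, point_in_time):
    # objects without the properties count as always valid, per the question
    return (f"({alias}.start_date IS NULL OR {alias}.start_date <= '{point_in_time}') "
            f"AND ({alias}.end_date IS NULL OR {alias}.end_date > '{point_in_time}')")

t = "2024-04-30"
query = ("MATCH (p:Person)-[r:REPORTS_TO]->(m:Person) "
         f"WHERE {valid_at('p', t)} AND {valid_at('r', t)} AND {valid_at('m', t)} "
         "RETURN p.name, m.name")
print(query)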

"Tinder Data Model": how to select candidates?

I have been thinking about how Tinder might have set up their data model, especially the part that selects the candidates to be shown (I'm not talking about the algorithm that determines the order, but only about how to get all possible candidates in the first place). This process should only display profiles that the current user has not already voted on.
So I could imagine this:
A table for the users (>40 million entries), and another one for the swipes (>1.5 billion new entries each day).
When selecting the candidates, one could join the two tables (and obviously apply certain other selection criteria like location, age range, etc.) and return only the users that the current user has not yet swiped on.
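Roughly, I imagine something like this (schema invented; SQLite is used only to make the idea concrete and runnable):

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, age INTEGER, location TEXT);
CREATE TABLE swipes (swiper_id INTEGER, swipee_id INTEGER,
                     PRIMARY KEY (swiper_id, swipee_id));
""")
con.execute("INSERT INTO users VALUES (1, 30, 'Berlin'), (2, 28, 'Berlin'), (3, 29, 'Berlin')")
con.execute("INSERT INTO swipes VALUES (1, 2)")

def candidates(current_user_id, min_age, max_age, location):
    # anti-join: candidates matching the filters whom the user hasn't swiped on
    return con.execute("""
        SELECT u.id FROM users u
        WHERE u.id <> ?
          AND u.age BETWEEN ? AND ?
          AND u.location = ?
          AND NOT EXISTS (SELECT 1 FROM swipes s
                          WHERE s.swiper_id = ? AND s.swipee_id = u.id)""",
        (current_user_id, min_age, max_age, location, current_user_id)).fetchall()

print(candidates(1, 25, 35, 'Berlin'))  # [(3,)]; user 2 was already swiped on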
But does that scale? Both of those tables are rather huge, so I guess at some point you'd run into problems, right?
Furthermore, I read that Tinder is using AWS DynamoDB - so not a relational model. And this makes it even harder I guess...
So my question is: do you have an idea on how Tinder accomplished this?

Finding unique products (never seen before by a user) in a datastore sorted by a dynamically changing value (i.e. product rating)

I've been trying to solve this problem for a week and couldn't come up with any solutions in all my research, so I thought I'd ask you all.
I have a "Product" table and a "productSent" table, here's a quick scheme to help explain:
class Product(ndb.Model):
    name = ndb.StringProperty()
    rating = ndb.IntegerProperty()

class productSent(ndb.Model):  # the key name here is md5(Product Key + UUID)
    pId = ndb.KeyProperty(kind=Product)
    uuId = ndb.KeyProperty(kind=userData)
    action = ndb.StringProperty()
    date = ndb.DateTimeProperty(auto_now_add=True)
My goal is to show users the highest-rated product that they've never seen before, and to do it fast. To keep track of the products users have seen, I use the productSent table. I created this table instead of using cursors because every time the rating order changes, there's a possibility that the cursor skips the new, higher-ranking product. An example: assume the user has seen products 1-24 in the db. Next, 5 users like product #25, making it the #10 product in the database. I'm worried that the product will then never be shown to the user (and possibly mess things up on a higher scale).
The problem with the way I'm doing it right now is that once the user has blown past the first 1,000 products, query performance really starts to slow down, because I'm literally pulling 1,000+ results, checking whether they've been sent by querying against the productSent table (doing a key-name lookup to speed things up), and looping until 15 new ones have been found.
One solution I thought of is to add a repeated property (ListProperty) to the Product table containing all the users who have seen the product. Or, if I don't want inequality filters, I could instead keep a repeated property of all the users who haven't seen the product, so that when I query I can filter on it directly. But I'm afraid of what happens when I have 1,000+ users:
a) I'll go through the roof on the limit of repeated properties in one entity.
b) The index size will drive up storage costs
Has anyone dealt with this problem before? (I'm sure someone has!) Any tips on the best way to structure it?
update
Okay, so I had another idea. To minimize the changes that take place when a rating (number of likes) changes, I could add a secondary column that only has 3 possible values: positive, neutral, negative, and sort by that. Of course, items that have a rating of 0 and get a 'like' (making them positive) would still have a chance of being out of order or skipped by the cursor, but it'd be less likely. What do y'all think?
Sounds like the inverse, productNotSent, would work well here. Every time you add a new product, you would add a new productNotSent entity for each user. When the user wants to see the highest-rated product they have not seen, you will only have to query over the productNotSent entities that match that user. If you put the rating directly on the productNotSent entity, you could speed the query up even more, since you would only have to query against one model.
Another idea would be to limit the number of productNotSent entities per user, so each user only has ~100 of these entities at a time. This would make your query constant for each user, regardless of the number of products or users you have. The creation of new productNotSent entities would become more complex, though. You'd need a cron job or something that "tops up" a user's collection of productNotSent entities as they get used up. You may also want to double-check that products rated higher than those already within the user's set of productNotSent entities get pushed in there. These parts are a little more difficult and will require some design trade-offs.
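A plain-Python sketch of that capped pool (the heap, the 100-entity cap, and the top-up routine are illustrative stand-ins, not actual ndb code):

import heapq

CAP = 100
not_sent = {}  # user_id -> heap of (-rating, product_id)

def top_up(user_id, all_products, seen):
    # cron-style job: refill a user's candidate pool with unseen products
    pool = not_sent.setdefault(user_id, [])
    have = {pid for _, pid in pool}
    for pid, rating in all_products.items():
        if pid not in seen and pid not in have and len(pool) < CAP:
            heapq.heappush(pool, (-rating, pid))

def next_products(user_id, n=15):
    # serve the n highest-rated unseen products; cost depends on CAP,
    # not on the total number of products in the datastore
    pool = not_sent.get(user_id, [])
    return [heapq.heappop(pool)[1] for _ in range(min(n, len(pool)))]

top_up(1, {"p1": 40, "p2": 90, "p3": 70}, seen={"p1"})
print(next_products(1, 2))  # ['p2', 'p3']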
Hope this helps!
I do not know your expected volumes and exact issues (I only did a quick perusal of your question), but you may consider using JSON TextProperty storage as part of your plan. Create dictionaries/lists and store them in records by json.dump()ing them to a TextProperty. When the client calls, simply send the TextProperties to the client and figure everything out on the client side once you JSON.parse() them. We have done some very large array/object processing in JS this way, and it is very fast (particularly for indexed arrays). When the user clicks on something, send a transaction back to update their record. Set up some pull or push queue processes to handle your overall product listing updates, major customer record updates, etc.
One downside is higher bandwidth going out of your app, but I think this cost will be minimal given the potential processing savings on GAE. If you structure this right, you may be able to use get_by_id() to replace all or most of your planned indices and queries. We have found json.loads() and json.dumps() to be very fast inside the app, but we only use simple dictionary/list structures. This approach should be a big quantum measure cheaper than your planned use of queries. The other potential issue is that very large objects may run into soft memory limits. Be sure that your JSON objects are fairly simple and lightweight to avoid this (e.g. do not include product descriptions, sub-objects, etc. in the JSON item, just the basics such as the product number). HTH, -stevep

Design question: How would you design a recurring event system? [closed]

If you were tasked with building an event scheduling system that supported recurring events, how would you do it? How do you handle it when a recurring event is removed? How can you see when future events will happen?
i.e. When creating an event, you could pick "repeating daily" (or weekly, yearly, etc).
One design per response please. I'm used to Ruby/Rails, but use whatever you want to express the design.
I was asked this at an interview, and couldn't come up with a really good response that I liked.
Note: this was already asked/answered here. But I was hoping to get some more practical details, as outlined below:
If it was necessary to be able to comment or otherwise add data to just one instance of the recurring event, how would that work?
How would event changes and deletions work?
How do you calculate when future events happen?
I started by implementing some temporal expressions as outlined by Martin Fowler. These take care of figuring out when a scheduled item should actually occur. It is a very elegant way of doing it. What I ended up with was just a build-up on what is in the article.
The next problem was figuring out how in the world to store the expressions. The other issue is that when you read an expression back out, how does it fit into a not-so-dynamic user interface? There was talk of just serializing the expressions into a BLOB, but it would be difficult to walk the expression tree to know what it meant.
The solution (in my case) was to store parameters that fit the limited number of cases the user interface supports, and from there use that information to generate the temporal expressions on the fly (they could be serialized at creation time as an optimization). So the Schedule class ends up having several parameters like offset, start date, end date, day of week, and so on, and from those you can generate the temporal expressions to do the hard work.
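A minimal sketch of one such temporal expression in Python (the class and parameter names are illustrative, in the spirit of Fowler's paper, not the actual code described here):

from datetime import date
import calendar

class DayInMonth:
    # matches e.g. "the second Saturday of the month" (count=2, weekday=5)
    def __init__(self, count, weekday):
        self.count = count      # 1 = first, 2 = second, ..., -1 = last
        self.weekday = weekday  # Monday = 0 ... Sunday = 6 (datetime convention)

    def includes(self, d):
        return d.weekday() == self.weekday and self._week_matches(d)

    def _week_matches(self, d):
        if self.count > 0:
            return (d.day - 1) // 7 + 1 == self.count
        last = calendar.monthrange(d.year, d.month)[1]  # days in this month
        return (last - d.day) // 7 + 1 == -self.count

second_saturday = DayInMonth(2, 5)
print(second_saturday.includes(date(2024, 5, 11)))  # True

A Schedule would then combine a few such expressions and expose only the parameters the user interface supports.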
As for having instances of the tasks, there is a 'service' that generates tasks for the next N days. Since this is an integration with an existing system and all instances are needed, this makes sense. However, an API like this can easily be used to project the recurrences without storing all the instances.
I've had to do this before, when I was managing the database end of a project. I requested that each occurrence be stored as a separate event. This allows you to remove just one occurrence, or to move a whole span. It's a lot easier to remove multiples than to try to modify a single occurrence and turn it into two. We then added another table that simply had a recurrenceID and contained the information about the recurrence.
@Joe Van Dyk asked: "Could you look in the future and see when the upcoming events would be?"
If you wanted to see/display the next n occurrences of an event, they would have to either a) be calculated in advance and stored somewhere, or b) be calculated on the fly and displayed. This would be the same for any eventing framework.
The disadvantage of a) is that you have to put a limit on it somewhere, and after that you have to use b) anyway. Easier just to use b) to begin with.
The scheduling system does not need this information, it just needs to know when the next event is.
When saving the event, I would save the schedule to a store (let's call it "Schedules"), and I'd calculate when the event was to fire next and save that as well, for instance in "Events". Then I'd look in "Events", figure out when the next event was to take place, and go to sleep until then.
When the app "wakes up" it would calculate when the event should take place again, store this in "Events" again and then perform the event.
Repeat.
If an event is created while the app is sleeping, the sleep is interrupted and the wake-up time recalculated.
If the app is starting or recovering from a sleep event or similar, check "Events" for passed events and act accordingly (depending on what you want to do with missed events).
Something like this would be flexible and would not take unnecessary CPU cycles.
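A rough sketch of that loop, with the storage reduced to an in-memory heap (the "Schedules"/"Events" naming follows the description above; everything else is assumed):

import heapq
import time
from datetime import datetime, timedelta

# "Schedules": what the user saved (a name and a recurrence interval)
schedules = [("daily backup", timedelta(days=1)),
             ("weekly digest", timedelta(weeks=1))]

# "Events": when each event next fires, kept as a min-heap
events = [(datetime.now() + interval, name, interval)
          for name, interval in schedules]
heapq.heapify(events)

while True:
    fire_at, name, interval = heapq.heappop(events)
    delay = (fire_at - datetime.now()).total_seconds()
    if delay > 0:
        time.sleep(delay)  # a newly created event would interrupt this sleep
    print("firing", name, "at", datetime.now())
    # calculate the next occurrence and store it back in "Events"
    heapq.heappush(events, (fire_at + interval, name, interval))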
Off the top of my head (after revising a couple things while typing/thinking):
Determine the minimum recurrence-resolution needed; that's how often the app runs. Maybe it's daily, maybe every five minutes.
For each recurring event, store the most recent run time, the run-interval and other goodies like expiration time if that's desirable.
Every time the app runs, it checks all events, comparing (today/now + recurrenceResolution) to (recentRunTime + runInterval); if they coincide, it fires the event.
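The comparison in that last step, as a sketch (the names mirror the bullets above; the exact tolerance policy is an assumption):

from datetime import datetime, timedelta

recurrence_resolution = timedelta(minutes=5)  # how often the app runs

def is_due(recent_run_time, run_interval, now):
    # fire if the next scheduled run falls within the current polling window
    return recent_run_time + run_interval <= now + recurrence_resolution

print(is_due(datetime(2024, 5, 1, 12, 0), timedelta(hours=1),
             now=datetime(2024, 5, 1, 12, 58)))  # True: next run is at 13:00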
When I wrote a calendar app for myself mumble years ago, I basically just stole the scheduling mechanism from cron and used that for recurring events. E.g., something taking place on the second Saturday of every month except January would include the instruction "repeat=* 2-12 8-14 6" (every year; months 2-12; days 8-14, since the 2nd week of a month runs from the 8th to the 14th; and 6 for Saturday, because I used 0-based numbering for the days of the week).
While this makes it quite easy to determine whether the event occurs on any given date, it cannot handle "every N days" recurrence, and it is also rather less than intuitive for users who aren't unix-savvy.
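For illustration, the date test for that "repeat=* 2-12 8-14 6" instruction might look like this (field layout inferred from the example; the weekday is converted from datetime's Monday=0 convention to the answer's Sunday=0):

from datetime import date

def matches(d, months, monthdays, weekdays):
    # each argument is the set of allowed values ('*' means every value)
    return (d.month in months and d.day in monthdays
            and (d.weekday() + 1) % 7 in weekdays)  # 0 = Sunday ... 6 = Saturday

# second Saturday of every month except January:
print(matches(date(2024, 5, 11),
              months=set(range(2, 13)),      # 2-12
              monthdays=set(range(8, 15)),   # 8-14
              weekdays={6}))                 # Saturday -> True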
To deal with unique data for individual event instances and with removal/rescheduling, I just kept track of how far out events had been calculated and stored the resulting instances in the database, where they could be modified, moved, or deleted without affecting the original recurring-event information. When a new recurring event was added, all its instances were immediately calculated out to the existing "last calculated" date.
I make no claim that this is the best way to do it, but it is a way, and one which works quite well within the limitations I mentioned earlier.
If you have a simple recurring event, such as daily, weekly, or a couple of days a week, what's wrong with using the built-in scheduler/cron/at functionality? Create an executable/console app and set up when to run it? No complicated calendar, event, or time management.
:)
//W
