I want to implement sessions on my server so that I can store some temporary data needed to handle users' requests properly. I would like advice on how to do it the right way.
What I need is:
Each user on my website has his own session.
The session is alive as long as the user is surfing my website, and it gets deleted once he leaves the website.
The session is thread-safe: it is shared across multiple threads, and if one thread changes it, the others see those changes.
The session is meant to store a small amount of data.
What I tried to do:
As I'm using Flask to implement my back-end, I used flask-session, which was perfect for my use case until I ran into problems with the third point, as flask-session isn't thread-safe. Is there an analogous package in Python, like flask-session, that is thread-safe? I find flask-session very easy and straightforward.
I tried using a global dictionary as the session, since its entries can be deleted when a user leaves the website, but I ran into the same problem, and worse than with flask-session: global variables in Python aren't thread-safe, and they aren't process-safe either.
Finally I thought of two solutions and I would like to get advice about the implementation:
Implement the session as a global dictionary in Python, but use a multiprocessing/multithreading package to make sure all threads share the same dictionary (I don't have much experience with these packages, so I don't know which one to use or how, but it seems easy to learn; maybe the multiprocessing or threading modules in Python).
Implement the session in a database. Here there are two things I find hard to determine:
How do I identify each user uniquely? I thought of generating a unique key on the server side the first time a user requests the website and sending it to him; he then sends this ID back with each request so I can load his session and manipulate it.
How can I clear the sessions of users who are no longer on my website? I can't find a way for the server to know that a user has left, and I can't keep every session in the database forever because it will keep growing.
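A minimal sketch of the plan described above (a random session ID handed to the client, data kept on the server, idle sessions expired), assuming a single server process. Since the server cannot detect that a user "left", the usual substitute is an inactivity timeout. SessionStore and its methods are hypothetical names; with several worker processes or machines, the same idea would need shared storage instead (for example Redis keys with an expiry, or a database table with a last-seen column cleaned up by a periodic job).

```python
import secrets
import threading
import time


class SessionStore:
    """In-memory session store shared by all threads of ONE process.

    Sessions expire after `ttl` seconds without activity, which stands in
    for "the user left the website" since HTTP never reports that directly.
    """

    def __init__(self, ttl=1800.0, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._lock = threading.Lock()
        self._data = {}  # session_id -> (last_seen, dict of values)

    def create(self):
        # 32 random bytes, URL-safe: hard to guess, so it can live in a cookie.
        sid = secrets.token_urlsafe(32)
        with self._lock:
            self._data[sid] = (self._clock(), {})
        return sid

    def get(self, sid, key, default=None):
        with self._lock:
            values = self._touch(sid)
            return default if values is None else values.get(key, default)

    def set(self, sid, key, value):
        with self._lock:
            values = self._touch(sid)
            if values is None:
                raise KeyError("unknown or expired session")
            values[key] = value

    def purge_expired(self):
        # Call periodically (e.g. from a background thread) to reclaim memory.
        now = self._clock()
        with self._lock:
            stale = [s for s, (seen, _) in self._data.items() if now - seen > self._ttl]
            for s in stale:
                del self._data[s]
        return len(stale)

    def _touch(self, sid):
        # Caller must hold the lock. Refreshes last_seen; drops an expired session.
        item = self._data.get(sid)
        if item is None:
            return None
        seen, values = item
        now = self._clock()
        if now - seen > self._ttl:
            del self._data[sid]
            return None
        self._data[sid] = (now, values)
        return values
```

The single lock makes each get/set atomic with respect to other threads; a read-modify-write sequence (such as incrementing a counter) would still need its own method that does both steps under the lock.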
Related
I am making a bus prediction web application for a college project. The application will use GTFS-R data, which is essentially a transit delay API that is updated regularly. In my application, I plan to use a cron job and a Python script to make regular GET requests and write the response to a JSON file, essentially creating a feed of transit updates. I have set up a GET endpoint where the user inputs trip data that will be searched against the feed to determine whether there are transit delays associated with their specific trip.
My question is - if the user sends a request at the same time as the JSON file is being updated, could this lead to issues?
One solution I was thinking of is writing to an intermediary JSON file which, once fully written, replaces the file used by the search function.
I am not sure if this is a good solution or if it is even needed. I am also not sure of the terminology to use when searching for solutions to similar problems, so pointers in the right direction would be useful.
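The intermediary-file idea is the common write-then-rename pattern. A sketch, assuming the cron script and the web app run on the same machine and the temporary file is created in the same directory as the real one (so the rename stays on one filesystem); the function names are illustrative:

```python
import json
import os
import tempfile


def write_feed_atomically(path, data):
    """Write JSON to a temp file in the same directory, then swap it in.

    On POSIX systems os.replace() is atomic, so a reader opening `path`
    sees either the old file or the new one, never a half-written mix.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes hit disk before the swap
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # don't leave stray temp files behind on failure
        raise


def read_feed(path):
    with open(path) as f:
        return json.load(f)
```

The cron script would call write_feed_atomically() after each fetch and the request handler would call read_feed(); because the swap is a single rename, no lock between the two is needed.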
The Internet says using a database as a queue is an anti-pattern and that you should use a dedicated queue (RabbitMQ, Beanstalkd, etc.).
But I want all requests stored, so I can later look up how long they took, any failed attempts or errors or notes logged, who requested them and with what metadata, what the end result was, etc.
It looks like none of the queue libraries have this option: you can't persist the data so that you can query it later.
I want what those queues do, but with a "persist to database" option. Does this not exist? How do people deal with this? Do you use a queue library and copy over all request information into your database when the request finishes?
(the language/database I'm using is anything, whatever works best for this)
If you want to log requests, and meta-data about how long they took etc, then do so - log it to the database when you know the relevant results, and run your analytic queries as you would expect to.
The reason not to use the database as a temporary store is that under high traffic, searching for and locking unprocessed jobs, and then updating or deleting them when they are complete, can take a great deal of effort. That is especially true if you don't remove jobs from the active table and so have to search through ever more completed jobs to find those that have yet to be done.
One can implement the task queue oneself using a persistent backend (like a database) to store the queued tasks. But the problem is that it may not scale well, and it is generally better to use a proven implementation than to reinvent the wheel. These are tough problems to solve, and it is better to use the existing frameworks.
For instance, if you are implementing in Python, the typical choice is Celery with Redis or RabbitMQ as the message broker.
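A sketch of the split this answer recommends: the queue only holds pending work, and one row per job is written to the database once its outcome is known. Here queue.Queue and an in-memory SQLite database stand in for a real broker and a real database, and handle() is a hypothetical job body:

```python
import queue
import sqlite3
import threading
import time

# queue.Queue stands in for a real broker (RabbitMQ, Beanstalkd, ...):
# it only holds work that is still pending. The database keeps the history.
jobs = queue.Queue()
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute(
    """CREATE TABLE job_log (
           id INTEGER PRIMARY KEY,
           requested_by TEXT, payload TEXT,
           status TEXT, result TEXT, error TEXT,
           started REAL, finished REAL)"""
)
db_lock = threading.Lock()


def handle(payload):
    # Hypothetical job body; replace with the real work.
    if payload == "bad":
        raise ValueError("cannot process payload")
    return payload.upper()


def worker():
    while True:
        item = jobs.get()
        if item is None:  # sentinel value: stop the worker
            jobs.task_done()
            break
        requested_by, payload = item
        started = time.time()
        status, result, error = "ok", None, None
        try:
            result = handle(payload)
        except Exception as exc:
            status, error = "failed", str(exc)
        # One INSERT per finished job; "with db" commits the transaction.
        with db_lock, db:
            db.execute(
                "INSERT INTO job_log (requested_by, payload, status, result, error,"
                " started, finished) VALUES (?, ?, ?, ?, ?, ?, ?)",
                (requested_by, payload, status, result, error, started, time.time()),
            )
        jobs.task_done()
```

Because the log table is only appended to, the hot path never has to search or lock unprocessed rows, and analytic queries (durations, failures, who requested what) run against job_log afterwards.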
We need to keep our Firebase data in sync with other databases for full-text search (in ElasticSearch) and other kinds of queries that Firebase doesn't easily support.
This needs to be as close to real-time as possible, we can't just export a nightly dump of the Firebase JSON or anything like that, aside from the fact that this will get rather large.
My initial thought was to run a Node.js client which listens to child_changed, child_added, child_removed, etc. events on all the main lists, but this could get a bit unwieldy, and would it be a reliable way of syncing if the client reconnects after a period of time?
My next thought was to maintain a list of "items changed" events and write to that every time an item is created/updated, similar to the Firebase work queue example. The queue could contain the full path to the data which has changed and the worker just consumes that and updates the local database accordingly.
The problem here is every bit of code which makes updates has to remember to write to this queue otherwise the two systems will get out of sync. Some proxy code shouldn't be too hard to write though.
Has anyone else done anything similar with any success?
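A sketch of the "proxy code" idea, with a plain dict and list standing in for Firebase and the change queue (ChangeTrackingStore and sync() are hypothetical names). All writes go through one wrapper that records the changed path, and the sync worker remembers the last sequence number it applied, so after a reconnect it simply resumes from there. With Firebase itself you would want the data write and the queue entry in a single multi-path update so they cannot diverge; that part is not shown.

```python
import itertools
import threading


class ChangeTrackingStore:
    """Every write also appends (seq, op, path) to a change queue.

    Application code calls set()/delete() here instead of on the primary
    database directly, so nobody can forget to record a change.
    """

    def __init__(self):
        self._data = {}       # stand-in for the primary (Firebase) data
        self._changes = []    # stand-in for the "items changed" queue
        self._seq = itertools.count(1)
        self._lock = threading.Lock()

    def set(self, path, value):
        with self._lock:
            self._data[path] = value
            self._changes.append((next(self._seq), "set", path))

    def delete(self, path):
        with self._lock:
            self._data.pop(path, None)
            self._changes.append((next(self._seq), "delete", path))

    def get(self, path):
        with self._lock:
            return self._data.get(path)

    def changes_since(self, seq):
        with self._lock:
            return [c for c in self._changes if c[0] > seq]


def sync(store, target, last_seq):
    """Apply pending changes to a secondary index (here just a dict)."""
    for seq, op, path in store.changes_since(last_seq):
        if op == "set":
            target[path] = store.get(path)  # fetch the latest value at that path
        else:
            target.pop(path, None)
        last_seq = seq
    return last_seq  # persist this so a restarted worker can resume
```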
For search queries, you can integrate directly with ElasticSearch; there is no need to sync with a secondary database. Firebase has a blog post about integrating and a lib, Flashlight, to make this quick and painless.
Another option is to use the logstash-input-firebase Logstash plugin in order to listen to changes in your Firebase real-time database(s) and forward the data in real-time to Elasticsearch using an elasticsearch output.
I am trying to implement a 2-player turn-based game with a GAE backend. The first thing this game requires is a very simple match making system that operates like this:
User A asks the backend for a match. The back ends tells him to come back later
User B asks the backend for a match. He will be matched with A.
User C asks the backend for a match. The back ends tells him to come back later
User D asks the backend for a match. He will be matched with C.
and so on...
(edit: my assumption is that if I can figure this one out, most other operations in a turn-based game can use the same implementation)
This can be done quite easily in Apple Gamecenter and Xbox Live, however I would rather implement this on an open and platform independent backend like GAE. After some research, I have found the following options for a GAE implementation:
Use memcache. However, there is no guarantee that memcache is synchronized across different instances. I did some tests and could actually see match requests disappearing due to memcache mis-synchronization.
Harden memcache with sharding counters. This does not always solve the multiple-instance problem and may result in high memcache quota usage.
Use memcache with compare-and-set. This does not solve the multiple-instance problem when used as a mutex.
Task queues. I have no idea how to use these, but someone mentioned them as a possible solution. However, I am afraid that queues will eat my GAE quota very quickly.
Push queues. Same as above.
Transactions. Same as above. Also probably very expensive.
Channels. Same as above. Also probably very expensive.
Given that the match making is a very basic operation in online games, I cannot be the first one encountering this. Hence my questions:
Do you know of any safe mechanism for match making?
If multiple solutions exist, which is the cheapest (in terms of GAE quota usage) solution?
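You could accomplish this using a cron task in a scheme like this. First, define a MatchRequest model with App Engine's db API:

    from google.appengine.ext import db

    class MatchRequest(db.Model):
        requestor = db.StringProperty()
        opponent = db.StringProperty(default='')
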
User A asks for a match; a MatchRequest entity is created with A as the requestor and the opponent blank.
User A polls to see when the opponent field has been filled.
User B asks for a match; a MatchRequest entity is created with B as the requestor.
User B polls to see when the opponent field has been filled.
A cron job that runs frequently (every 20 seconds or so, if the cron scheduler allows an interval that short) does the following:
Grab all MatchRequests where opponent == ''
Make all appropriate matches
Put all the updated MatchRequests in a transaction
Now when A and B next poll, they will see that they have an opponent.
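The "make all appropriate matches" step can be a pure function, which keeps the cron handler small and easy to test. A sketch in plain Python, with a dataclass standing in for the datastore entity and make_matches() as a hypothetical helper:

```python
from dataclasses import dataclass


@dataclass
class MatchRequest:
    # Plain-Python stand-in for the datastore entity defined above.
    requestor: str
    opponent: str = ""


def make_matches(requests):
    """Pair unmatched requests in arrival order; return the ones changed.

    In the cron handler, `requests` would be the query result for
    opponent == '', and the returned entities would be written back
    together (in a transaction, as suggested above).
    """
    waiting = [r for r in requests if r.opponent == ""]
    changed = []
    # Take requests two at a time; an odd one out waits for the next run.
    for a, b in zip(waiting[0::2], waiting[1::2]):
        a.opponent, b.opponent = b.requestor, a.requestor
        changed.extend([a, b])
    return changed
```

As long as only one cron run is active at a time, it is the only writer of the opponent field, so two app instances cannot pair the same waiting user twice; the polling requests only read.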
According to the GAE docs on cron, free apps can have up to 20 free cron tasks. The computation these cron jobs require for a small number of users should be small.
This would be a safe approach, but I'm not sure it is the cheapest. It's also pretty easy to implement.
What is the best way to program an immediate reaction to an update to data in a database?
The simplest method I could think of offhand is a thread that checks the database for a particular change to some data and continually waits to check it again for some predefined length of time. This solution seems to be wasteful and suboptimal to me, so I was wondering if there is a better way.
I figure there must be some way; after all, a web application like Gmail seems to be able to update my inbox almost immediately after a new email is sent to me. Surely my client isn't continually checking for updates. I think they do this with AJAX, but I don't know how AJAX can behave like a remote function call. I'd be curious to know how Gmail does this, but what I'd most like to know is how to do this in the general case with a database.
Edit:
Please note I want to immediately react to the update in the client code, not in the database itself, so as far as I know triggers can't do this. Basically I want the USER to get a notification or have his screen updated once the change in the database has been made.
You basically have two issues here:
You want a browser to be able to receive asynchronous events from the web application server without polling in a tight loop.
You want the web application to be able to receive asynchronous events from the database without polling in a tight loop.
For Problem #1
See these wikipedia links for the type of techniques I think you are looking for:
Comet
Reverse AJAX
HTTP Server Push
EDIT: 19 Mar 2009 - Just came across ReverseHTTP which might be of interest for Problem #1.
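Long polling is one of the Comet techniques linked above: the server holds the request open until there is news or a timeout, and the browser re-issues it as soon as a response arrives. A sketch of the server-side waiting part using threading.Condition (LongPollChannel is a hypothetical name, and wiring it into a particular web framework is not shown):

```python
import threading


class LongPollChannel:
    """A request handler blocks in wait_for_update() until news or timeout."""

    def __init__(self):
        self._cond = threading.Condition()
        self._version = 0
        self._payload = None

    def publish(self, payload):
        # Called by whatever learns about a change (e.g. a DB notification).
        with self._cond:
            self._version += 1
            self._payload = payload
            self._cond.notify_all()  # wake every waiting request

    def wait_for_update(self, last_seen_version, timeout=30.0):
        """Return (version, payload) once newer than last_seen_version, else None."""
        with self._cond:
            ready = self._cond.wait_for(
                lambda: self._version > last_seen_version, timeout=timeout
            )
            return (self._version, self._payload) if ready else None
```

The client sends the last version it saw, so nothing is missed between two requests. Each waiting request occupies a thread in this sketch; with many concurrent clients an asynchronous server is usually a better fit.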
For Problem #2
The solution is going to be specific to which database you are using and probably the database driver your server uses too. For instance, with PostgreSQL you would use LISTEN and NOTIFY. (And at the risk of being down-voted, you'd probably use database triggers to call the NOTIFY command upon changes to the table's data.)
Another possible way to do this is if the database has an interface to create stored procedures or triggers that link to a dynamic library (i.e., a DLL or .so file). Then you could write the server signalling code in C or whatever.
On the same theme, some databases allow you to write stored procedures in languages such as Java, Ruby, Python and others. You might be able to use one of these (instead of something that compiles to a machine code DLL like C does) for the signalling mechanism.
Hope that gives you enough ideas to get started.
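SQLite gives a small, runnable illustration of the "trigger calls host-language code" idea, because Python's sqlite3 create_function() lets SQL, including a trigger body, call a Python function. This is a stand-in, not PostgreSQL's LISTEN/NOTIFY: the function exists only on the connection that registered it, so it fires only for writes made through that connection in the same process. notify() is a hypothetical hook.

```python
import sqlite3

notifications = []


def notify(table, row_id):
    # In a real server this might wake up waiting long-poll requests.
    notifications.append((table, row_id))
    return None  # becomes SQL NULL


conn = sqlite3.connect(":memory:")
# Register notify() so SQL on this connection can call notify(x, y).
conn.create_function("notify", 2, notify)
conn.executescript(
    """
    CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TRIGGER messages_notify AFTER INSERT ON messages
    BEGIN
        SELECT notify('messages', NEW.id);
    END;
    """
)
conn.execute("INSERT INTO messages (body) VALUES ('hello')")
```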
"I figure there must be some way, after all, web application like gmail seem to update my inbox almost immediately after a new email was sent to me. Surely my client isn't continually checking for updates all the time. I think the way they do this is with AJAX, but how AJAX can behave like a remote function call I don't know. I'd be curious to know how gmail does this, but what I'd most like to know is how to do this in the general case with a database."
Take a peek with Wireshark sometime: there appears to be some Google traffic going on quite regularly.
Depending on your DB, triggers might help. An app I wrote relies on triggers but I use a polling mechanism to actually 'know' that something has changed. Unless you can communicate the change out of the DB, some polling mechanism is necessary, I would say.
Just my two cents.
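A sketch of the trigger-plus-polling approach from this answer, using SQLite: a trigger appends to a change_log table, and each poll only asks "is there anything newer than the last sequence number I saw?", which an index keeps cheap in the common no-news case. Table, index, and function names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE inbox (id INTEGER PRIMARY KEY, owner TEXT, subject TEXT);
    -- The trigger only records *that* something changed for an owner.
    CREATE TABLE change_log (seq INTEGER PRIMARY KEY AUTOINCREMENT, owner TEXT);
    CREATE INDEX change_log_owner_seq ON change_log (owner, seq);
    CREATE TRIGGER inbox_changed AFTER INSERT ON inbox
    BEGIN
        INSERT INTO change_log (owner) VALUES (NEW.owner);
    END;
    """
)


def poll(owner, last_seq):
    """Cheap check run on each client poll; returns (new_last_seq, has_news)."""
    row = conn.execute(
        "SELECT MAX(seq) FROM change_log WHERE owner = ? AND seq > ?",
        (owner, last_seq),
    ).fetchone()
    if row[0] is None:
        return last_seq, False  # the fast "no news" path
    return row[0], True
```

Only when has_news is true does the client need to fetch the actual new rows, so the frequent polls stay small on both ends.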
Well, the best way is a database trigger. It depends on whether your DBMS, which you haven't specified, supports them.
Re your edit: The way applications like Gmail do it is, in fact, with AJAX polling. Install the Tamper Data Firefox extension to see it in action. The trick there is to keep your polling query blindingly fast in the "no news" case.
Unfortunately there's no way to push data to a web browser - you can only ever send data as a response to a request - that's just the way HTTP works.
AJAX is what you want to use though: calling a web service once a second isn't excessive, provided you design the web service to ensure it receives a small amount of data, sends a small amount back, and can run very quickly to generate that response.