servlet/database: how to do fine-grained database connection and statement management (not bound to servlet lifetime)

Question/Environment
The goal of my web application is to be a handy interface to the database at our company.
I'm using:
Scalatra (as minimal web framework)
Jetty (as servlet container)
SBT (Simple Build Tool)
JDBC (to interface with the database)
One of the requirements is that each user can manage a number of concurrent queries, and that even when he/she logs off, the queries keep running and can be retrieved later on (or their completion status checked if they stopped for any reason).
I suppose queries will likely have to run in their own separate threads.
I'm not even sure whether this issue is orthogonal to connection pooling (which I'm definitely going to use; BoneCP and C3P0 seem nice).
Summary
In short: I need very fine-grained control over the lifetime of database requests, and they cannot be bound to the servlet lifetime.
What ways are there to fulfill these requirements? I've searched quite a bit on Google and Stack Overflow and haven't found anything that addresses my problem. Is it even possible?

What is missing from your stack is a scheduler, e.g. http://www.quartz-scheduler.org/
A rough explanation (a sketch follows the list):
Your connection pool (e.g. C3P0) will be bound to the application's lifecycle.
Your servlets will send query requests to the scheduler (these will be associated with the user requesting the query).
The scheduler will execute queries as soon as possible, using connections from the connection pool. It may also do so in a synchronized/serialized order (for each user).
The user will be able to see all query requests associated with him, probably with a status (pending, completed with results, etc.).
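A minimal sketch of this arrangement in Scala, against the Quartz 2 API. The registry object, the job-data keys, and the status strings are illustrative assumptions; the scheduler is assumed to have been started at application startup with the pooled DataSource placed in its context (scheduler.getContext.put("dataSource", pool)):

```scala
import java.util.UUID
import java.util.concurrent.ConcurrentHashMap
import javax.sql.DataSource

import org.quartz.{Job, JobBuilder, JobExecutionContext, TriggerBuilder}
import org.quartz.impl.StdSchedulerFactory

// Hypothetical status registry, application-scoped rather than
// session-scoped, so results survive the user logging off.
object QueryRegistry {
  val status = new ConcurrentHashMap[String, String]() // queryId -> state
}

// Each ad-hoc query runs as a Quartz job on the scheduler's own
// thread pool, independently of any servlet request.
class AdHocQueryJob extends Job {
  override def execute(ctx: JobExecutionContext): Unit = {
    val data    = ctx.getJobDetail.getJobDataMap
    val queryId = data.getString("queryId")
    val sql     = data.getString("sql")
    // Assumes the DataSource was put into the scheduler context at startup.
    val ds = ctx.getScheduler.getContext.get("dataSource").asInstanceOf[DataSource]

    QueryRegistry.status.put(queryId, "running")
    val conn = ds.getConnection // borrowed from the pool
    try {
      val rs = conn.createStatement().executeQuery(sql)
      // ... persist or cache rs somewhere retrievable by queryId ...
      QueryRegistry.status.put(queryId, "completed")
    } catch {
      case e: Exception =>
        QueryRegistry.status.put(queryId, s"failed: ${e.getMessage}")
    } finally {
      conn.close() // returns the connection to the pool
    }
  }
}

// Called from a Scalatra action: submit the query and return immediately.
object QuerySubmitter {
  def submitQuery(user: String, sql: String): String = {
    val queryId   = UUID.randomUUID().toString
    val scheduler = StdSchedulerFactory.getDefaultScheduler
    val job = JobBuilder.newJob(classOf[AdHocQueryJob])
      .withIdentity(queryId, user) // jobs grouped per user
      .usingJobData("queryId", queryId)
      .usingJobData("sql", sql)
      .build()
    scheduler.scheduleJob(job, TriggerBuilder.newTrigger().startNow().build())
    queryId // the client polls QueryRegistry with this id later
  }
}
```

Because the job runs on the scheduler's thread pool rather than the request thread, it keeps running after the user logs off, and its status can be looked up again later by queryId.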

Related

"Real Time" data change detection in SQL Server

We have a requirement for notifying external systems of changes in data in various tables in a SQL Server database. The choice of which data to monitor is somewhat under the control of the user (they get to choose from a list of what we support). The recipients of the notifications may be on a locally connected network (i.e., in the same data center) or they may be remote.
We currently handle this by application code within our data access layer that detects changes and queues notifications on a Service Broker queue which is monitored by a Windows service that performs the actual notification. Not quite real time but close enough.
This has proven to have some maintenance problems so we are looking at using one of the change detection mechanisms that are built into SQL Server. Unfortunately none of the ones I have looked at (I think I looked at them all) seem to fit very well:
Change Data Capture and Change Tracking: Major problem is that they require polling the captured information to determine changes that are to be passed on to recipients. I suspect that will introduce too much overhead.
Notification Services: Essentially uses SQL Server as a web server, which is a horrible waste of licenses. It also requires access through at least two firewalls in the network, which is unacceptable from a security perspective.
Query Notification: Seems the most likely candidate, but it does not seem to lend itself particularly well to dynamically choosing the data elements to watch. The need to re-register the query after each notification is sent means that we would keep SQL Server busy managing the registrations.
Event Notification: Designed to notify on database or instance level events, not really applicable to data change detection.
About the best idea I have come up with is to use CDC and put insert triggers on the change data tables. The triggers would queue something to a Service Broker queue that would be handled by some other code to perform the notifications. This is essentially what we do now except using a SQL Server feature to do the change detection. I'm not even sure that you can add triggers to those tables but I thought I'd get feedback before spending a lot of time with a POC.
That seems like an awfully roundabout way to get the job done. Is there something I've missed that would make the job easier, or have I misinterpreted one of these features?
Thanks and I apologize for the length of this question.
Why don't you use update and insert triggers? A trigger can execute CLR code.

Reducing impact on server load caused by long but non-priority adhoc queries

I have a couple of applications that run long queries against an OLTP database, and they have a significant impact on database server load.
Is it possible to run them at low priority? I still intend to allow users to make ad-hoc queries, but response time is not critical.
Please advise on solutions for Oracle and/or SQL Server.
If you're using 11g, then perhaps the Database Resource Manager will help you out. The resource manager allows you to change consumer groups based on I/O consumption, something that was unavailable in prior releases. If not, the best you can do is lower priority based on CPU use.
Place resource limits on their accounts via profiles. Here is a link:
http://psoug.org/reference/profiles.html
For this you can use Oracle Resource Manager.
Most important for this is that you need to have an idea of how Resource Manager can pick out which sessions to throttle. You can have lots of criteria to assign a user to a resource consumer group. Often the username is used, but it can also be other attributes such as machine or module. See Creating Consumer Group Mapping Rules (http://download.oracle.com/docs/cd/B28359_01/server.111/b28310/dbrm004.htm#CHDEDAIB).
Specifying Automatic Switching by Setting Resource Limits (http://download.oracle.com/docs/cd/B28359_01/server.111/b28310/dbrm004.htm#CHDDCGGG) could be very useful for you, since all users start in the same OLTP group. Some start long-running ad-hoc queries; you want those sessions to switch to a lower-priority group for the duration of that call.
There could be one little snag: if that throttled session has locks, those locks will stay longer and might cause problems elsewhere.

Storing database connections in session, in a small scale webapp

I have a J2EE webapp that's being used internally by ~20-30 people.
There is no chance of significant growth in the number of users.
From what I understood, there's a trade-off between opening a new DB connection for each request made to the webapp (expensive, but doesn't block other users when the DB is in use) and using the singleton pattern (doesn't open new connections, but only allows one user at a time).
I thought that since I know only 30 users will ever use my webapp at the same time, maybe the simplest and best solution would be to store the connection as a session attribute, reducing openings to a minimum while still allocating one connection per user.
What do you think?
From what I understood there's a trade-off between opening a new DB connection for each request made to the webapp
That is what connection pools are for. If you use a connection pool in your application, the pool once initialized, is in charge of providing connections for use in the application as and when needed. In a properly tuned connection pool, there are going to be enough connections created on reserve that can be provided to the application, mitigating the need to create and open a connection only when the application requests for it.
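As a rough sketch, assuming C3P0 (the driver class, JDBC URL, and pool sizes below are illustrative): the pool is initialized once for the whole application, and each request borrows a connection and returns it when done, instead of anything being stored in the session.

```scala
import java.sql.Connection
import com.mchange.v2.c3p0.ComboPooledDataSource

// One pool for the whole application (e.g. created from a
// ServletContextListener), never stored in a user's session.
object Db {
  private val pool = new ComboPooledDataSource()
  pool.setDriverClass("org.postgresql.Driver")    // illustrative driver
  pool.setJdbcUrl("jdbc:postgresql://dbhost/app") // illustrative URL
  pool.setMinPoolSize(5)
  pool.setMaxPoolSize(30)

  // Borrow a connection for the duration of one request, then return it.
  def withConnection[A](f: Connection => A): A = {
    val conn = pool.getConnection
    try f(conn)
    finally conn.close() // close() hands the connection back to the pool
  }
}

// Per-request usage inside a servlet/action:
// Db.withConnection { conn =>
//   val rs = conn.createStatement().executeQuery("SELECT COUNT(*) FROM users")
// }
```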
I thought that since I know that only 30 users will ever use my webapp at the same time, maybe the simplest and best solution would be to store the connection as a session attribute
Per-user connections are not a good idea, primarily when a web application is concerned. In a web application, it is perfectly possible for users to initiate multiple requests to the server (think multi-tabbed browsing). In such a case, the use of a single connection per user will result in weird application behavior, unless you synchronize access to the connection.
One must also consider the side effect of putting non-serializable attributes into the session: Connection objects are not serializable and hence must be marked transient. If the session is deserialized at some point, one has to account for the fact that the Connection object will not be available and must be re-initialized.
I think you're getting into premature optimization especially given the scale of the application. Opening a new connection is not that expensive and like Makach says, most modern RDBMSs handle connection pooling and will hold connections open for subsequent requests. You'd be trying to write better code than the compiler, so to speak.
No. Don't do that. It's perfectly OK to reconnect to the database every time you need to; any database management system will do its own connection pool caching, I think.
If you try to keep connections open yourself, you'll make it incredibly hard to manage them in a secure, bug-free, safe way.

Why use Singleton to manage db connection?

I know this has been asked before, here, there, and everywhere, but I can't get a clear explanation, so I'm going to pitch it again. What is all the fuss about using a singleton to control the DB connection in your web app? Some like it, some hate it; I don't understand it. From what I've read, "it's to ensure that there is always only one active connection to your DB". I mean, why is that a good thing? One active DB connection on a data-driven web app processing multiple requests per second spells trouble, doesn't it? For whatever reason, nobody can properly explain this. I've been all over the web. I know I'm thick.
Assuming Java here, but this is relevant to most other technologies as well.
I'm not sure whether you've confused the use of a plain singleton with a service locator. Both of them are design patterns. The service locator pattern is used by applications to ensure that there is a single class entrusted with the responsibility of obtaining and providing access to databases, files, JMS queues, etc.
Most service locators are implemented as singletons, since there is no need for multiple service locators to do the same job. Besides, it is useful to cache information obtained from the first lookup that can be later used by other clients of the service locator.
By the way, the argument about
"it's to ensure that there is always only one active connection to your DB"
is false and misleading. It is quite possible that the connection will be closed/reclaimed if left inactive for a long period of time, so caching a connection to the database is frowned upon. There is one deviation from this argument: "re-using" the connection obtained from the connection pool is encouraged as long as you do so within the same context, i.e. within the same HTTP request or user request (whichever is applicable). This is done, obviously, for performance, since establishing new connections can prove to be an expensive operation.
High-performance (or even medium-performance) web apps use database connection pooling, so one DB connection can be shared among many web requests. The singleton is usually the object which manages this pool. I think the motivation for using a singleton is to idiot-proof against maintenance programmers that might otherwise instantiate many of these objects needlessly.
"it's to ensure that there is always only one active connection to your DB." I think that would be better stated as to ensure each CLIENT has only one active connection to your DB. The reason why this is incredibly important is because you want to prevent deadlocks. If I have TWO open database connections (as a client) I might be updating on one connection, then I might try to update the same row in another connection. This will a deadlock which the database cannot detect. So, the idea of the singleton is basically to make sure that there is ONE object who is charge of handing out database connections to each client. Basically. You don't HAVE to have a singleton for this, but most people will tell you it just makes sense that the system only has one.
You're right--usually this isn't what you want.
However, there are plenty of cases where you need to throttle yourself down to a single connection. By serializing your access to the database through a singleton, you can address other issues or constraints like load, bandwidth, etc.
I've done something similar in the past for a bulk processing app. Instead, though, I used a semaphore to synchronize access to the database so I could allow n concurrent db operations.
One might want to use a singleton due to database server constraints, for example, a server might limit the number of connections.
My main reason is that you know which connections can be managed/closed, etc.; it just makes things a bit more organised when you don't have unnecessary, redundant connections.
I don't think it's a simple answer. For instance on ASP.NET, the platform implements connection pooling by default, so it will automatically adjust a "pool" of connections and re-use them so you're not constantly creating and destroying expensive objects.
However, let's say you were writing a data collection application that monitored 200 separate input sources. Every time one of those inputs changed, you'd fire off a thread that records the event to the database. I would say that could be a bad design if there's a chance that even a fraction of those could fire at the same time: suddenly having 20 or 40 active database connections is inefficient. It might be better to queue the updates and, as long as there are updates left in the queue, have a singleton connection pick them off the queue and execute them on the server. It's more efficient because you only have to negotiate the connection and authentication once. Once there's no activity for a while, you could choose to close down the connection. This kind of behavior would be hard to implement without a central resource manager like a singleton (sketched below).
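A rough sketch of that queued approach, with an in-memory H2 URL and raw SQL strings purely as placeholders: producers enqueue updates from any thread, while a single worker thread owns the one connection and drains the queue.

```scala
import java.sql.DriverManager
import java.util.concurrent.LinkedBlockingQueue

object QueuedWriter {
  private val queue = new LinkedBlockingQueue[String]()

  // Called by the many input-source threads; returns immediately.
  def enqueue(sql: String): Unit = queue.put(sql)

  // One connection, negotiated and authenticated exactly once.
  private val worker = new Thread(() => {
    val conn = DriverManager.getConnection("jdbc:h2:mem:demo")
    val stmt = conn.createStatement()
    while (true) {
      val sql = queue.take() // blocks until an update is queued
      stmt.executeUpdate(sql)
    }
  })
  worker.setDaemon(true)
  worker.start()
}
```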
"only one active connection" is a very narrow statement for illustration. It could just as well be a singleton managing a pool of connection. The point of a singleton for database connections is that you don't want every consumer making it's own connection or set of connections.
I think you might want to be more specific about "using a singleton to control the db connection in your web app." A java.sql.Connection object is not thread-safe, but your javax.sql.DataSource may want to pool connections, so you should go to a single instance of it to share the pooling.
You are more likely looking for one connection per request, not one connection for the entire application. You can still control access to it through a singleton, though (storing the connection in the HttpContext.Items collection).
It guarantees that each client using your site only gets one connection to the db.
You really do not want a new connection being made every time a user performs an action that triggers a DB query: not only for the performance cost of connection handshaking, but also to decrease load on the DB server.
DB connections are a precious commodity, and this technique helps minimize the amount used at any given time.

.NET CF mobile device application - best methodology to handle potential offline-ness?

I'm building a mobile application in VB.NET (Compact Framework), and I'm wondering what the best way is to approach potential offline interactions on the device. Basically, the devices have cellular and 802.11 but may still be offline (where there's poor reception, etc.). A driver will scan boxes as they leave his truck, and I want to update the new location: immediately if there's a network signal, or queued and handled later if it's offline. It made me think, though, about how to handle offline-ness in general.
Do I cache as much data to the device as I can so that I can use it if it's offline? Essentially, each device would have a copy of the (relevant) production data on it. Or is it better to disable certain functionality when it's offline, so as to avoid the headache of synchronization later? I know this is a pretty specific question that depends on my app, but I'm curious to see if others have taken this route.
Do I build the application itself to act as though it's always offline, submitting everything to a local queue of sorts that's owned by a local class (essentially abstracting away the online/offline thing), and then have the class submit things to the server as it can? What about data lookups: how can those be handled in a "semi-live" fashion?
Or should I have the application attempt to submit requests to the server directly, in real time, and handle it if the request itself fails? I can see a potential problem of making the user wait for the timeout, but is this the most reliable way to do it?
I'm not looking for a specific solution, but really just stories of how developers accomplish this with the smoothest user experience possible, with a link to a how-to or heres-what-to-consider or something like that. Thanks for your pointers on this!
We can't give you a definitive answer because there is no "right" answer that fits all usage scenarios. For example if you're using SQL Server on the back end and SQL CE locally, you could always set up merge replication and have the data engine handle all of this for you. That's pretty clean. Using the offline application block might solve it. Using store and forward might be an option.
You could store locally and then roll your own synchronization with a direct connection, web service, or WCF service used when a network is detected. You could use MSMQ for delivery.
What you have to think about is not what the "right" way is, but how your implementation will affect application usability. If you disable features due to lack of connectivity, is the app still usable? If you have stale data, is that a problem? Maybe some critical data needs to be transferred when you have GSM/GPRS (which typically isn't free) and more would be done when you have 802.11. Maybe you can run all day with lookup tables pulled down in the morning and upload only transactions, with the device tracking what changes it's made.
Basically it really depends on how it's used, the nature of the data, the importance of data transactions between fielded devices, the effect of data latency, and probably other factors I can't think of offhand.
So the first step is to determine how the app needs to be used, then determine the infrastructure and architecture to provide the connectivity and data access required.
I haven't used it myself, but have you looked into the "store and forward" capabilities of the CF? It may suit your needs. I believe it uses an Exchange mailbox as a message queue to send SOAP packets to and from the device.
The best way to approach this is to always work offline, then use message queues to handle sending changes to and from the device. When the driver marks something as delivered, for example, update the item as delivered in your local store and also place a message in an outgoing queue to tell the server it's been delivered. When the connection is up, send any queued items back to the server and get any messages that have been queued up from the server.
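A rough sketch of that pattern, rendered in Scala to match the other examples in this thread (the original stack is VB.NET CF; the event shape, method names, and local-store stub are all hypothetical): record locally first, queue the outgoing message, and drain the queue whenever connectivity is detected.

```scala
import java.util.concurrent.ConcurrentLinkedQueue

case class Event(kind: String, boxId: String) // hypothetical payload shape

object Outbox {
  private val pending = new ConcurrentLinkedQueue[Event]()

  // Always record locally first, then queue the server notification.
  def markDelivered(boxId: String): Unit = {
    saveToLocalStore(boxId, delivered = true)
    pending.add(Event("delivered", boxId))
  }

  // Called whenever connectivity is detected; `send` returns true
  // only when the server acknowledges, so unsent items stay queued.
  def flush(send: Event => Boolean): Unit = {
    var next = pending.peek()
    while (next != null && send(next)) {
      pending.poll()
      next = pending.peek()
    }
  }

  private def saveToLocalStore(boxId: String, delivered: Boolean): Unit = {
    // write to the device's local database (e.g. SQL CE in the original stack)
  }
}
```

flush would be hooked to whatever connectivity-detection mechanism the platform provides; items are only removed from the queue after the server acknowledges them, so nothing is lost if the connection drops mid-sync.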
