Should we use web services or direct database access?

Should we use web services or direct database access? Of course, direct DB access is relatively faster, but web services are good if we have to build for multiple platforms.
Is the time significantly higher when accessing data through a web service as opposed to a direct DB call, or only marginally higher?

I would have to disagree with TruthOf42's claim that web services are best practice for data access. There is certainly a big shift towards that approach these days, but I don't think common use is the same as best practice. Just because something is common/popular doesn't mean it's the best fit for all situations.
Why use web services?
If you plan on having more than one application use a generic data access layer.
If you plan on exposing your data to external clients.
If you want to draw some hard physical lines between your application and the database.
I would argue making web service calls will always be slower than just writing queries against the database. But you can mitigate the issues with wise network planning and caching.
I agree with Aphelion in that if it's a simple application, then keep it simple.
A good idea would be to define an interface in your code for getting data, and start with a database implementation. If you later want to introduce web services, you can keep the same interface and just add an implementation that makes web service calls instead of dialing the database directly, as sketched below.
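A minimal sketch of that interface approach in Python (all names, and the sqlite3/requests choices, are illustrative assumptions, not part of the original answer):

```python
# Sketch of the interface idea: the application codes against
# UserRepository only, so implementations can be swapped freely.
from abc import ABC, abstractmethod
import sqlite3


class UserRepository(ABC):
    @abstractmethod
    def get_user(self, user_id: int) -> dict: ...


class DatabaseUserRepository(UserRepository):
    """First implementation: talks to the database directly."""

    def __init__(self, db_path: str):
        self.db_path = db_path

    def get_user(self, user_id: int) -> dict:
        with sqlite3.connect(self.db_path) as conn:
            row = conn.execute(
                "SELECT id, name FROM users WHERE id = ?", (user_id,)
            ).fetchone()
        return {"id": row[0], "name": row[1]} if row else {}


class WebServiceUserRepository(UserRepository):
    """Later implementation: same interface, backed by a web service."""

    def __init__(self, base_url: str):
        self.base_url = base_url

    def get_user(self, user_id: int) -> dict:
        import requests  # third-party HTTP client
        resp = requests.get(f"{self.base_url}/users/{user_id}", timeout=5)
        resp.raise_for_status()
        return resp.json()
```

Since the rest of the code depends only on UserRepository, swapping implementations later is a one-line change at the point where the repository is constructed.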

Don't try to optimize something you haven't even written yet. It's best practice to use web services; making direct calls to the database just opens you up to more security issues.
The most important thing when writing software is writing it well; speed is usually the last concern.

Related

Best practices to structure a database to be scaling-ready

I know this is a very generic and subjective question, so feel free to vote to close it if it does not meet the StackOverflow netiquette... but for me, it's worth trying ;)
I've never built a high-traffic application until now, so I'm not aware (except for some reading on the web) of scaling practices.
How can I design a database so that, when scaling is needed, I don't have to refactor the database structure or the application code?
I know that development (and optimization) should come step by step, optimizing bottlenecks as they happen, and that it's nearly impossible to design the perfect structure when you don't know how many users you'll have or how they'll use the database (e.g. read/write ratio); I'm just looking for a good base to start from.
What are the best practices for making a structure almost ready to be scaled with partitioning and sharding, and what hacks must be absolutely avoided?
Edit: some details about my application:
The application will run with multisite behavior
I'll have a database for each application version (db_0_0_1, db_0_0_2, etc.)
Every 'site' will have a schema inside a database, and a role that can access only its own schemas
Application code will be mostly PHP, with a few things (daemons and maintenance tasks) in Python
The web server will probably be Nginx, with lighttpd or node.js as support for long-polling tasks (e.g. chat)
Caching will be done with memcached (plus APC for things strictly related to the PHP code; memcached, unlike APC, can be used outside PHP)
The question is really generic, but here are a few tips:
Do not use any session variables (pg_backend_pid(), inet_client_addr()) or per-session control (SET ROLE, SET SESSION) in application code.
Do not use explicit transaction control (BEGIN/COMMIT/SET TRANSACTION) in application code. All such logic should be wrapped in UDFs. This enables stateless, statement-mode pooling, which enables the fastest possible DB pooling. (See the pgbouncer docs and the PostgreSQL wiki for more info.)
Encapsulate all App<->DB communication in a well-defined DB API of UDFs - this will let you use PL/Proxy. If doing this for all SELECTs is too hard, do it at least for all data writes (INSERT/UPDATE/DELETE). Example: instead of INSERT INTO users(name) VALUES('Joe') you need SELECT create_user('Joe') (see the sketch after this list).
Check your DB schema - is it easy to separate all data belonging to a given user? (Most probably this will be the partitioning key.) All that's left is common, shared data, which will need to be replicated to all nodes.
Think of caching before you need it. What will be the caching key? What will be the cache timeout? Will you use memcached?
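To illustrate the second and third tips, here is a hedged sketch (the create_user function and users table are taken from the example above; the DSN is a placeholder): the write is wrapped in a PostgreSQL UDF and called as a single statement from Python with psycopg2, which is exactly what allows statement-mode pooling with pgbouncer.

```python
# Hedged sketch: assumes a modern PostgreSQL, psycopg2, and a
# users(id serial, name text) table.
import psycopg2

DDL = """
CREATE OR REPLACE FUNCTION create_user(p_name text) RETURNS integer AS $$
    INSERT INTO users(name) VALUES (p_name) RETURNING id;
$$ LANGUAGE sql;
"""

conn = psycopg2.connect("dbname=app user=app")  # placeholder DSN
conn.autocommit = True  # no BEGIN/COMMIT issued by the application

with conn.cursor() as cur:
    cur.execute(DDL)  # normally deployed via migrations; shown inline here
    # The whole write is a single statement, pooler-friendly:
    cur.execute("SELECT create_user(%s)", ("Joe",))
    user_id = cur.fetchone()[0]
```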

Best way to access a remote database: via webservice or direct DB-access?

I'm looking to develop an application for Mac and iOS-devices. The application will rely on information stored in a remote database. It needs both read (select) and write (insert, update, delete) access to the database. The application will be a multi-user application.
Now I'm looking at two different approaches to access the database:
- via a web service: the application accesses the web service (REST, JSON), which accesses the database. Authentication will be done via HTTP authentication over SSL (https); a sketch of such a call follows these two options.
- access the remote database directly over a VPN.
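For the web-service option, a client call could look like this minimal Python sketch (the URL, endpoint paths, and credentials are placeholders, not part of the question):

```python
# Sketch of the first option: REST + JSON over HTTPS with HTTP Basic
# authentication. All names and URLs are placeholders.
import requests

BASE_URL = "https://api.example.com"
AUTH = ("alice", "secret")  # only ever sent over TLS

# Read (select)
resp = requests.get(f"{BASE_URL}/tasks/42", auth=AUTH, timeout=10)
resp.raise_for_status()
task = resp.json()

# Write (update)
resp = requests.put(
    f"{BASE_URL}/tasks/42",
    json={"title": "Revised title"},
    auth=AUTH,
    timeout=10,
)
resp.raise_for_status()
```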
The app will be used by a maximum of let's say 100 people and is aimed at small groups/organizations/businesses.
So my question is: what would be the best approach to access the database? What about security and performance? What would a typical implementation for a small business look like?
Any advice will be appreciated.
Thanks
Using web services adds a level of indirection between the clients and the database. This has several advantages, all due to the fact that the clients need no knowledge of the database, only of your web service interface. Since client applications are more complicated to control and update than your server-side code, it pays to add a level of business logic on the server that lets you tweak your system without pushing updates to the clients. Main advantages:
Flexibility - you can change the database configuration / replace the data layer altogether and change nothing on the client apps as long as you keep the same web service interface.
Security - implement some authentication mechanism for your web services, and avoid giving clients access credentials to your database engine.
There are some disadvantages too: you pay for that flexibility by adding a level of complexity - it'd probably be faster to just code the database access into the clients and be done with it. Consider the web services layer an investment that might pay dividends down the road. Whether it's worth it really depends on your business requirements and outlook.
Given the information you have provided, the answer is almost certainly web services, unless the VPN is fast.
If the VPN is fast enough to handle the traffic, you will save a lot of time, effort and expense by accessing the database directly from your application.
You can also provide remote access to virtual PC sessions, if that's your thing.
So it's all going to depend on what your requirements are. There are a lot of ways to do this, and each has its advantages and disadvantages. Making the right decision will require a fair amount of systems analysis, probably beyond the scope of a question posted on StackOverflow.

Webservice/Entity objects or datareader for my winform app?

Is it best to use a web service to pull data from a database and load it into my entity objects, then send the entity objects to my winform app?
Will this make any performance difference over going directly to the database, pulling a datareader back to the winform client, and loading the entities on the client? Some of the users will be in China accessing a database in the US.
Are there better options?
Thanks
This is subjective, but in general, you will find better performance going directly to the database. That's not good for separation of concerns, however.
Given the highly distributed nature of your system, using web services (or at least an SOA approach) makes sense to me. However, I would go an extra step and put the business logic in the web services tier, not just data access - but again, that depends highly on the situation. I just think that the fewer places you have to change code and re-deploy when changes are needed, the better.
Is there a reason this has to be a client app and not a web application? That would make keeping your distributed users up to date a bit easier.
The best option is probably to have distributed databases and/or distributed servers. No matter how you go from a client app in China to a database in the US, the network will be a massive bottleneck and performance will likely be pretty horrible. If you can put a replicated database in China, that would make a huge positive difference.
Whether you have a web service or not is not going to be a huge factor here. Sure, adding a web service adds a network hop, which will negatively impact performance, but as I said, I don't think that will be your performance bottleneck.

An API which allows users to connect directly to the database

I've worked with many APIs and it's never usually an easy task. Messing about with POST requests and then trying to handle the XML is a pain. And I thought: wouldn't it be easier for both user and developer if they could just interact directly with the database?
Is it possible to create a database user that API consumers would connect as, and then assign that user certain privileges? For example, they would only be able to select from particular tables and columns. Basically, make it so they can't do anything malicious or anything you don't want.
I realise that an API involves a lot more than just reading data, so there would be certain limitations there; however, selecting is probably what goes on the most when it comes to API usage.
Is this a practical idea? Is it secure? I'm really not sure - I'm the furthest thing from a professional here, it's just an idea.
You could set up a RESTful API that speaks directly to a MySQL database, like PHPRestSQL. It can do all the dirty work for you, and you would have full freedom to implement new functions or restrictions.
What do you mean exactly by API? Which API are you talking about?
This sounds more like a design decision. If I understand correctly, you want the user layer to interact directly with the database / persistence layer of an application. In general this is a bad idea. First, it really reduces code reuse. This may not be a concern at your point in development, but it is a good idea to learn best practices. The layers I usually follow are:
Model-View-Controller
Service
Persistence
Model / Domain
You can see here that MVC (the user interface) is separated from the model by at least two layers. This is usually more secure, and promotes code reuse.
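Here is a toy Python sketch of that layering (all class names are made up), showing the user interface reaching the data only through the service layer:

```python
# Toy illustration of the layering above; all names are assumptions.

class UserDAO:                      # Persistence layer
    def find_name(self, user_id: int) -> str:
        # A real implementation would query the database here.
        return {1: "Joe"}.get(user_id, "unknown")


class UserService:                  # Service layer: business rules live here
    def __init__(self, dao: UserDAO):
        self.dao = dao

    def display_name(self, user_id: int) -> str:
        return self.dao.find_name(user_id).title()  # a (trivial) rule


class UserController:               # MVC layer: only ever sees the service
    def __init__(self, service: UserService):
        self.service = service

    def show(self, user_id: int) -> str:
        return f"Hello, {self.service.display_name(user_id)}"


print(UserController(UserService(UserDAO())).show(1))  # Hello, Joe
```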
Yes, you can do this with any client/server database system (if it is a database server, there must be a way to connect to it).
It is not done much, for a number of reasons:
Maintenance is hard
Security is worse
In general there is no benefit
Basically it causes headaches and does not really provide anything good in return.
The two most important counter-questions are:
1) Is the underlying DB already determined, or can you choose one?
2) What sort of DB operations do your users really need to perform? If "select" is really enough, then yes, it probably does make sense to expose the data via a "read only" web service (a tiny read-only endpoint is sketched below). But if you want to update, delete, make stored procedure calls, etc., then you're going to need something like SQL, and it's very hard to build a web services API for that.
If the answer to counter-question 1 is "I can choose", then take a look at CouchDB, which already has a RESTful API (http://wiki.apache.org/couchdb/HTTP_REST_AP) built for it.
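If "select" really is enough, the read-only web service can be very small. A hedged Python/Flask sketch (the table, columns, and database file are made up for illustration):

```python
# Minimal read-only web service sketch (Flask + sqlite3; names assumed).
import sqlite3
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/products/<int:product_id>", methods=["GET"])
def get_product(product_id):
    conn = sqlite3.connect("shop.db")
    try:
        row = conn.execute(
            "SELECT id, name, price FROM products WHERE id = ?",
            (product_id,),  # parameterized, so no SQL injection
        ).fetchone()
    finally:
        conn.close()
    if row is None:
        return jsonify(error="not found"), 404
    return jsonify(id=row[0], name=row[1], price=row[2])

# Only GET routes are defined, so the service is read-only by construction.
```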
Yes, almost all databases allow you to create users with only select access to a specific schema. I've used this to give advanced Excel users ODBC access without worrying that they will mess anything up. Use it very sparingly, though: it has always created maintenance difficulty, because people end up using parts of your schema in ways that you didn't intend (or had planned to replace).
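For reference, a select-only account is typically set up like this (PostgreSQL syntax driven from Python here; the role, database, and table names are made up):

```python
# Sketch: a role that can only SELECT from explicitly chosen tables.
import psycopg2

conn = psycopg2.connect("dbname=app user=admin")  # placeholder DSN
conn.autocommit = True
with conn.cursor() as cur:
    cur.execute("CREATE ROLE api_reader LOGIN PASSWORD 'change-me'")
    cur.execute("GRANT CONNECT ON DATABASE app TO api_reader")
    cur.execute("GRANT USAGE ON SCHEMA public TO api_reader")
    # Grant SELECT table by table; everything else stays off-limits.
    cur.execute("GRANT SELECT ON public.products TO api_reader")
```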
You can connect Access to any database - Oracle, for example.
However, it's not necessarily a good idea, for security and data integrity reasons.

In Memory Database

I'm using SQL Server to drive a WPF application. I'm currently using NHibernate and pre-reading all the data so it's cached, for performance reasons. That works for a single-client app, but I was wondering if there's an in-memory database that I could use to share the information across multiple apps on the same machine. Ideally this would sit below my NHibernate stack, so my code wouldn't have to change. Effectively I'm looking to move my DB from its traditional format on the server to an in-memory DB on the client.
Note I only need select functionality.
I would be incredibly surprised if you even need to load all your information into memory. I say this because, just as one example, I'm working on a web app at the moment that (for various reasons) loads thousands of records on many pages. This is PHP + MySQL, and even so it can render a page in well under 100 ms.
Before you go down this route, make sure that you have to. First make your database as performant as possible. Now, obviously this includes things like having appropriate indexes and tuning your database, but even those steps are putting the cart before the horse.
First and foremost you need to make sure you have a good relational data model: one that lends itself to performant queries. This is as much art as it is science.
Also, you may like NHibernate, but ORMs are not always the best choice. There are some corner cases, for example, where hand-coded SQL will be vastly superior.
Now assuming you have a good data model and assuming you've then optimized your indexes and database parameters and then you've properly configured NHibernate, then and only then should you consider storing data in memory if and only if performance is still an issue.
To put this in perspective, the only times I've needed to do this are on systems that need to perform millions of transactions per day.
One reason to avoid in-memory caching is because it adds a lot of complexity. You have to deal with issues like cache expiry, independent updates to the underlying data store, whether you use synchronous or asynchronous updates, how you give the client a consistent (if not up-to-date) view of your data, how you deal with failover and replication and so on. There is a huge complexity cost to be paid.
Assuming you've done all the above and you still need it, it sounds to me like what you need is a cache or grid solution. There are overviews of Java grid/cluster solutions out there, and many of the products (e.g. Coherence, memcached) apply to .NET as well. Another choice for .NET is Velocity.
It needs to be pointed out and stressed that something like NHibernate is only consistent so long as nothing updates the database externally and there is exactly one NHibernate-enabled process (barring clustered solutions). If two desktop apps on two different PCs are both updating the same database with NHibernate, the caching simply won't work, because neither persistence unit will be aware of the changes the other is making.
http://www.db4o.com/ can be your friend!
Velocity is an out-of-process object caching server designed by Microsoft to do pretty much what you want, although it's only in CTP form at the moment.
I believe there are also wrappers for memcached, which can likewise be used to cache objects.
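To illustrate the cache-aside pattern those tools implement, here is a hedged Python sketch using pymemcache (the key scheme, the 5-minute timeout, and the db.fetch_user helper are all assumptions):

```python
# Cache-aside sketch with memcached via pymemcache; names are assumptions.
import json
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))

def load_user(user_id, db):
    key = f"user:{user_id}"            # explicit, documented cache key
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)      # cache hit
    user = db.fetch_user(user_id)      # cache miss: go to the database
    cache.set(key, json.dumps(user), expire=300)  # expiry bounds staleness
    return user
```

Note how the expiry time is the knob that trades freshness for load; the cache-consistency caveats in the preceding answers still apply whenever another process writes to the same database.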
You can use SAP HANA, express edition. You can download it for free; it's in-memory and columnar, and it offers further analytics capabilities such as text analytics, geospatial, or predictive. You can also access it via ODBC, JDBC, the node.js hdb library, and REST APIs, among others.
