Databases Software for Rapid Queries - database

I'm writing a Comet application that has to keep track of each open connection to the server. I want to write an entry to the database for each connection, and I will have to search the database for the proper connections every time the application receives new data (often), which is why I don't want to start off on the wrong foot by choosing slow database software. Any suggestions for a database that favors rapid, small pieces of data (rather than occasional large pieces of data)?

I suggest instead using a server platform that allows the creation of persistent server processes, which keep all such info in memory. All database access is then limited to writes (if you want to actually save any information permanently), which are usually significantly less frequent in typical Comet apps (such as chats/games).
Databases are not made to keep such data. Accessing a database directly always means composing query strings, often sending them to a db server (sometimes even over the network), the lookup itself, serialization of the results, sending them back, deserialization, and traversing the fetched results. There is no way this can be even nearly as fast as simply retrieving a value from memory.
If you really want to stick with PHP, then I suggest you have a look at memcached and similar caching servers.
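To make the in-memory approach concrete, here is a minimal Python sketch of such a connection registry (all names are hypothetical); a PHP setup would keep the same data in memcached instead. Lookups become dictionary operations rather than database round-trips, and only events you actually need to persist would ever touch disk.

```python
import time

class ConnectionRegistry:
    """Tracks open Comet connections in process memory.

    Hypothetical sketch: lookups are dict operations instead of
    database round-trips; only durable events would be written to disk.
    """

    def __init__(self):
        self._connections = {}  # connection_id -> metadata

    def register(self, connection_id, user_id):
        self._connections[connection_id] = {
            "user": user_id,
            "opened_at": time.time(),
        }

    def unregister(self, connection_id):
        self._connections.pop(connection_id, None)

    def connections_for_user(self, user_id):
        # Linear scan is fine at Comet scale; a second index
        # (user -> set of connection ids) would make this O(1).
        return [cid for cid, meta in self._connections.items()
                if meta["user"] == user_id]

registry = ConnectionRegistry()
registry.register("conn-1", user_id="alice")
registry.register("conn-2", user_id="alice")
registry.register("conn-3", user_id="bob")
print(registry.connections_for_user("alice"))  # ['conn-1', 'conn-2']
```

The same idea carries over to memcached or Redis when the registry must be shared between several server processes.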
greetz
back2dos

SQL Server 2008 has a FILESTREAM data type that can be used for rapid, small pieces of data. McLaren Electronic Systems uses it to capture and analyze telemetry/sensor data from Formula One race cars.

Hypersonic: http://hsqldb.org/
MySQL (for webapps)

Related

How to sync SQL Server data to MongoDB

I have a table in SQL Server, and I have to sync its data to MongoDB for read/write separation. The table is inserted into, updated, and deleted from so often that SQL Server cannot be taken offline.
Is there an experienced way to implement that?
A proper answer to that is probably beyond the scope of the question as posed, but things like workload (lots of data? heavy volume? network topology?), SLAs (how fast, how often, how foolproof, can you have downtime?) and synchronicity (see Brewer's CAP theorem) will dramatically change what is or isn't reasonable as an approach.
A naive answer might be as simple as "write the data to a CSV file once a day, then feed it into Mongo via a script".
On the other side of the coin, there are ETL tools and libraries out there which specialize in moving lots and lots of data as quickly as possible between two storage engines. In fact, I believe there's an ODBC driver for Mongo; ODBC is a standard you'll find supported in products from Microsoft, Oracle, open source, and everything in between.
A happy middle ground might just be a lightweight application or script whose job is to fire off documents one by one (or in batches) from a queue until it's empty. The app reads from SQL, writes to Mongo, and handles any distributed-transaction logic you may wish to impose.
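That queue-draining worker can be sketched as follows. This is a hedged illustration only: `sqlite3` stands in for SQL Server, an in-memory list stands in for the Mongo collection (a real implementation would call pymongo's `insert_many`), and the table and column names are made up.

```python
import sqlite3

def drain_queue(conn, sink, batch_size=100):
    """Hypothetical worker: reads pending rows from a change queue
    and writes them to the destination store in batches."""
    cur = conn.cursor()
    while True:
        rows = cur.execute(
            "SELECT id, payload FROM change_queue ORDER BY id LIMIT ?",
            (batch_size,)).fetchall()
        if not rows:
            break  # queue is empty
        # Stand-in for collection.insert_many(...) against Mongo.
        sink.extend({"id": r[0], "payload": r[1]} for r in rows)
        # Only delete rows after the destination write succeeds.
        cur.execute("DELETE FROM change_queue WHERE id <= ?", (rows[-1][0],))
        conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE change_queue (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO change_queue (payload) VALUES (?)",
                 [("doc-%d" % i,) for i in range(250)])
mongo_stand_in = []
drain_queue(conn, mongo_stand_in, batch_size=100)
print(len(mongo_stand_in))  # 250
```

Deleting the batch only after the write succeeds gives at-least-once delivery, which is usually acceptable when the Mongo side upserts by key.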

Load balancer and multiple instance of database design

The current single application server can handle about 5,000 concurrent requests. However, the user base will be over a million, and I may need two application servers to handle the load.
So the design is to add a load balancer in the hope of handling over 10,000 concurrent requests. However, each user's data is stored in one single database. With two or more servers, shall I do the following?
Having two instances of the database
Real-time syncing between the two databases
Is this correct? If so, will the sync process degrade server performance, since database replication seems costly?
Thank you.
You probably want to think of your service in "tiers". In this instance, you've got two tiers; the application tier and the database tier.
Typically, your application tier is going to be considerably easier to scale horizontally (i.e. by adding more application servers behind a load balancer) than your database tier.
With that in mind, the best approach is probably to overprovision your database (i.e. put it on its own, meaty server) and have your application servers all connect to that same database. Depending on the database software you're using, you could also look at using read replicas (AWS docs) to reduce the strain on your database.
You can also look at caching via Memcached / Redis to reduce the amount of load you're placing on the database.
So, tl;dr: put your DB on its own big server, and spread your application code across many small servers, all connecting to that same DB server.
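One common way for the application tier to share that single database tier is a small read/write router in the data-access layer: writes go to the primary, reads are spread over replicas. A minimal sketch, with hypothetical server names:

```python
import random

class ReadWriteRouter:
    """Hypothetical sketch: every app server behind the load balancer
    sends writes to one primary and spreads reads over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas

    def for_query(self, sql):
        # Crude classification by statement keyword; a real router
        # would also account for transactions and replication lag.
        if sql.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE")):
            return self.primary
        return random.choice(self.replicas)

router = ReadWriteRouter("primary-db", ["replica-1", "replica-2"])
print(router.for_query("UPDATE users SET name = 'x'"))  # primary-db
print(router.for_query("SELECT * FROM users"))          # one of the replicas
```

Note that replicas lag the primary slightly, so read-your-own-writes paths should still hit the primary.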
A cost-effective option could be synchronizing a standby node with data from the active node, since this is achievable with an open-source relational database (e.g. MariaDB).
Do not store computed results and statistics that can easily be derived at run time; this helps reduce the data size.
If historical data is not needed urgently for queries, it can be written to a text file in a format that is easy to import into the database (e.g. CSV).
Data objects that are updated very often can be kept in an in-memory database as key-value pairs; use a scheduled task to perform batch updates/inserts into the relational database to achieve persistence.
Implement retry logic for database batch-update tasks to handle database downtime or network errors.
Consider writing data to the relational database as serialized objects.
Cache configuration data in memory, refreshing it from the database either periodically or via an API when it changes.
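The batch-update and retry points above can be sketched together. This is an illustration under assumptions: `sqlite3` stands in for the relational database, and the table/column names are made up.

```python
import sqlite3
import time

def flush_with_retry(db, pending, attempts=3, delay=0.01):
    """Batch-writes in-memory key/value pairs to the database,
    retrying on transient errors (hypothetical sketch)."""
    for attempt in range(1, attempts + 1):
        try:
            with db:  # one transaction per batch
                db.executemany(
                    "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
                    pending.items())
            pending.clear()  # only clear once the batch is durable
            return True
        except sqlite3.OperationalError:
            if attempt == attempts:
                raise  # give up after the final attempt
            time.sleep(delay * 2 ** attempt)  # exponential backoff

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")
cache = {"user:1": "alice", "user:2": "bob"}
flush_with_retry(db, cache)
print(db.execute("SELECT COUNT(*) FROM kv").fetchone()[0])  # 2
```

A scheduled task would call `flush_with_retry` on the in-memory store at a fixed interval; clearing the cache only after a successful commit means a failed flush simply retries the same batch later.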

SQL Server table or flat text file for infrequently changed data

I'm building a .net web app that will involve validating an email input field against a list of acceptable email addresses. There could be up to 10,000 acceptable values and they will not change very often. When they do, the entire list would be replaced, not individual entries.
I'm debating the best way to implement this. We have a SQL Server database, but since these records will be relatively static and only replaced in bulk, I'm considering just referencing/searching text files containing the string values. That seems like it would make the upload process easier, and there is little benefit to having this info in an RDBMS.
Feedback appreciated.
If the database is already there then use it. What you are talking about is exactly what databases are designed to do. If down the road you decided you need do something slightly more complex you will be very glad you went with the DB.
I'm going to make a few assumptions about your situation.
Your data set contains more than 1 column of usable data.
Your record set contains more rows than you will always display.
Your data will need to be formatted into some kind of output view (e.g. HTML).
Here are some specific reasons why your semi-static data should stay in SQL and not in a text file.
You will not have to parse the text data every time you wish to read and process it. (String parsing is a relatively heavy memory and CPU load). SQL will store your columns as structured data which are pre-parsed.
You will not have to develop your own row-filtering or searching algorithm (or implement a library that does it for you). SQL is already a sophisticated engine that applies advanced caching and query optimization, with many underlying algorithms for seeks, scans, indexes, hashing, etc.
You have the option with SQL to expand your solution's robustness and integration with other tools over time. (Putting the data into a text or XML file will limit future possibilities).
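As a rough illustration of how cheap the lookup side becomes when the data stays structured: load the acceptable addresses from the database once, then validate inputs by set membership. This is a hedged sketch where `sqlite3` stands in for SQL Server and the table name is hypothetical; the same pattern applies in a .NET app with a `HashSet<string>`.

```python
import sqlite3

# Load the ~10,000 acceptable addresses once at startup,
# then validate each input with O(1) set membership.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE acceptable_emails (address TEXT PRIMARY KEY)")
db.executemany("INSERT INTO acceptable_emails VALUES (?)",
               [("alice@example.com",), ("bob@example.com",)])

acceptable = {row[0] for row in db.execute("SELECT address FROM acceptable_emails")}

def is_acceptable(email):
    # Normalize the same way the stored list was normalized.
    return email.strip().lower() in acceptable

print(is_acceptable("Alice@example.com"))    # True
print(is_acceptable("mallory@example.com"))  # False
```

The bulk-replace requirement maps naturally to a `TRUNCATE`-then-bulk-insert inside one transaction, after which the in-memory set is rebuilt.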
Caveats:
Your particular SQL Server implementation can be affected by disk IO and network latency performance. To tune disk IO performance, a well constructed SQL Server places the data files (.mdf) on fast multi-spindle disk arrays tuned for fast reads, and separates them from the Log Files (.log) on spindles that are tuned for fast writes.
You might find that the latency and busyness of the SQL server can affect performance. If you're running on a very busy or slow SQL server, you might be in a situation where you'd look to a local-file alternative, in which case I would recommend a structured format such as XML. (However, if you find yourself looking for workarounds to avoid using your SQL server, it would probably be best to invest some time/money into improving your SQL implementation.)

Merge multiple Access database into one big database

I have multiple ~50MB Access 2000-2003 databases (MDB files) that only contain tables with data. These data-databases are located on a server in my enterprise that can take ~1-2 seconds to respond (and about 10 seconds to actually open a 50MB MDB file manually while browsing in the file explorer). I have other databases that only contain forms. Most of those forms-databases (still MDB files) are actually copied from the server to the client with a batch file before execution (after some testing, the execution looks smoother). Most of those forms-databases use table links to fetch the data from the data-databases.
Now, my question is: is there any advantage/disadvantage to merging the data from my ~50MB databases into one big database (let's say 500MB)? Will it be slower? It would actually help to clean up my code if I didn't have to connect to all those different databases, and I don't think 500MB is a lot, but I don't pretend to be really used to Access by any means, and that's why I'm asking. If Access needs to read the whole MDB file to get the data from a specific table, then it would be slower. It wouldn't be really that surprising from Microsoft, but I've been pleased so far with MS Access database performance.
There will never be more than ~50 people connected to the database at the same time (most likely, this number won't in fact be more than 10, but I prefer being a little bit conservative here just to be sure).
The db engine does not read the entire MDB file to get information from a specific table. It must read information from the system tables (hidden tables whose names start with MSys) to determine where the data you need is stored. Furthermore, if you're using a query to retrieve information from the table, and the db engine can use an index to determine which rows satisfy the query's WHERE clause, it may read only those rows from the table.
However, you have issues with your network's performance. When those lead to dropped connections, you risk corrupting the MDB. That is why Access is not well suited for use in wide area networks or with wireless connections. And even on a wired LAN, you can suffer such problems when the network is flaky.
So while reducing the amount of data you pull across the network is a good thing, it is not the best remedy for Access on a flaky network. Instead you should migrate the data to a client-server db so it can be kept safe in spite of dropped connections.
You are walking on thin ice here.
Access will handle your scenario, but is not really meant to allow so many concurrent connections.
Merging everything in a big database (500mb) is not a wise move.
Have you tried to open it from a network location?
If I may suggest, I would use a SQL Server Express backend to merge all the tables into a single real client-server database.
The changes required to the client MDB front-end should not be very pervasive.

Copying data from a local database to a remote one

I'm writing a system at the moment that needs to copy data from a client's locally hosted SQL database to a hosted server database. Most of the data in the local database is copied to the live one, though optimisations are made to reduce the amount of data that actually has to be sent.
What is the best way of sending this data from one database to the other? At the moment I can see a few possible options, none of which yet stands out as the prime candidate.
Replication, though this is not ideal, and we cannot expect it to be supported in the version of SQL Server we use in the hosted environment.
Linked server, copying data direct - a slow and somewhat insecure method
Webservices to transmit the data
Exporting the data we require as XML and transferring to the server to be imported in bulk.
The data copied goes into copies of the tables, without identity fields, so data can be inserted/updated without any violations in that respect. This data transfer does not have to be done at the database level, it can be done from .net or other facilities.
More information
The frequency of the updates will vary completely on how often records are updated. But the basic idea is that if a record is changed then the user can publish it to the live database. Alternatively we'll record the changes and send them across in a batch on a configurable frequency.
The amount of records we're talking are around 4000 rows per table for the core tables (product catalog) at the moment, but this is completely variable dependent on the client we deploy this to as each would have their own product catalog, ranging from 100's to 1000's of products. To clarify, each client is on a separate local/hosted database combination, they are not combined into one system.
As well as the individual publishing of items, we would also require a complete re-sync of data to be done on demand.
Another aspect of the system is that some of the data being copied from the local server is stored in a secondary database, so we're effectively merging the data from two databases into the one live database.
Well, I'm biased. I have to admit. I'd like to hypnotize you into shelling out for SQL Compare to do this. I've been faced with exactly this sort of problem in all its open-ended frightfulness. I got a copy of SQL Compare and never looked back. SQL Compare is actually a silly name for a piece of software that synchronizes databases. It will also do it from the command line once you have a working project together with all the right knobs and buttons. Of course, you can only do this for reasonably small databases, but it really is a tool I wouldn't want to be seen in public without.
My only concern with your requirements is where you are collecting product catalogs from a number of clients. If they are all in separate tables, then all is fine, whereas if they are all in the same table, then this would make things more complicated.
How much data are you talking about? How many 'client' DBs are there? And how often does this need to happen? The answers to those questions will make a big difference to the path you should take.
There is an almost infinite number of solutions for this problem. In order to narrow it down, you'd have to tell us a bit about your requirements and priorities.
Bulk operations would probably cover a wide range of scenarios, and you should add that to the top of your list.
I would recommend using Data Transformation Services (DTS) for this. You could create a DTS package for appending and one for re-creating the data.
It is possible to invoke DTS package operations from your code so you may want to create a wrapper to control the packages that you can call from your application.
In the end I opted for a set of triggers to capture data modifications to a change log table. There is then an application that polls this table and generates XML files for submission to a webservice running at the remote location.
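That trigger-plus-polling design can be sketched as follows. This is an illustration under assumptions: `sqlite3` stands in for SQL Server, the schema is made up, and the real system would POST the generated XML to the web service rather than just returning it.

```python
import sqlite3
import xml.etree.ElementTree as ET

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE change_log (id INTEGER PRIMARY KEY AUTOINCREMENT,
                             product_id INTEGER, action TEXT);
    -- Trigger captures data modifications into the change-log table.
    CREATE TRIGGER products_update AFTER UPDATE ON products
    BEGIN
        INSERT INTO change_log (product_id, action) VALUES (NEW.id, 'update');
    END;
""")
db.execute("INSERT INTO products VALUES (1, 'widget')")
db.execute("UPDATE products SET name = 'gadget' WHERE id = 1")

def poll_changes():
    """Builds an XML document from pending change-log rows;
    the polling application would submit this to the web service."""
    root = ET.Element("changes")
    for pid, action in db.execute("SELECT product_id, action FROM change_log"):
        ET.SubElement(root, "change", product_id=str(pid), action=action)
    return ET.tostring(root, encoding="unicode")

print(poll_changes())
```

After a successful submission, the poller would delete (or mark as sent) the rows it exported, so each change is shipped exactly once.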
