Is Google Cloud SQL highly available by default?

On this page it says
You can access a familiar, highly available SQL database
but further down the same page it lists
Unsupported Features: MySQL replication
Can anyone clarify this?

Each Cloud SQL instance replicates your data at the storage level rather than within MySQL, so a single MySQL instance writes every byte multiple times, in multiple geographic locations. Even if an entire datacenter becomes unavailable, a new MySQL instance can be spun up in a new location and carry on serving your data.
See https://developers.google.com/cloud-sql/faq#replication

Related

Presto integration with MSSQL

I'm looking for a tutorial or something that allows me to learn Presto step by step.
The idea is to start by integrating files and MSSQL, which is my knowledge area.
Unfortunately, since it is a relatively new area, I didn't find anything more than the Facebook page or the Presto.io page, and that is not good enough for someone who wants to get to know the big data world from scratch.
I would appreciate your help and/or orientation in this area.
Presto has 2 primary use cases:
querying data stored in a cluster (on Hadoop's HDFS) or in a cloud (e.g. Amazon S3)
data federation, i.e. querying (and joining) data from multiple data sources (e.g. HDFS, S3, traditional RDBMS like PostgreSQL or SQL Server)
As far as SQL Server support is concerned, Presto has supported connecting to SQL Server since this commit: https://github.com/prestosql/presto/commit/072440cbb2c8df2a689c4c903dd325013eae41a0.
When it comes to querying files, Presto uses Hive's Metastore to keep track of metadata (everything besides actually reading the data). Thus the files must reside on HDFS or S3 to be accessible (other cloud data stores like Azure Blob Storage are, AFAIK, not supported yet).
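If you want to experiment from code rather than the Presto CLI, here is a minimal sketch using the presto-python-client package (pip install presto-python-client). The host, catalog, schema and table names are assumptions for illustration; the sqlserver catalog presumes a SQL Server connector configured on the coordinator.

```python
# Minimal sketch: querying Presto from Python.
# All names below (host, catalogs, tables) are hypothetical.
import prestodb

conn = prestodb.dbapi.connect(
    host='presto.example.com',  # hypothetical coordinator address
    port=8080,
    user='analyst',
    catalog='hive',
    schema='default',
)
cur = conn.cursor()

# A federation-style query: join a Hive table (files on HDFS/S3)
# against a SQL Server table through their respective catalogs.
cur.execute("""
    SELECT o.order_id, c.customer_name
    FROM hive.default.orders o
    JOIN sqlserver.dbo.customers c ON o.customer_id = c.id
    LIMIT 10
""")
for row in cur.fetchall():
    print(row)
```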

Data warehousing using local DB - Beginner

I want to get an idea of how to achieve this:
I have an application that runs at 5 different geographical locations, e.g. Texas, NY, California, Boston, Washington.
Each instance of this application saves data to a local database at its own location.
I want to do data warehousing. Is it a must to have just one database (where all 5 applications save their data to a single database, without having local DBs)?
Or is it possible to keep the 5 local databases and do data warehousing by retrieving data from those local DBs into a central DB, then performing the warehousing there?
Please give me your thoughts and references.
You have three options for this:
you use a single, centrally hosted database server. Typical relational database servers can be accessed directly over the network these days: MySQL, PostgreSQL, Oracle, ... This means you can implement an application which opens a network connection to the database server and uses that remote server to store and retrieve the data as required. Multiple connections are possible at the same time.
you use a single, central database server but put a wrapper around it: a small application layer acting as a broker. This way you can address that central instance over the network, but via standard protocols such as HTTP.
you use a decentralized approach and install a database instance at each location. Then you need some additional tool to perform synchronization. For most modern database servers (see above) such tools exist, but the setup is not trivial.
If in doubt, and if the load is not that high, go with the first alternative (sketched below).
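As an illustration of the first alternative, here is a minimal sketch of an application at one location writing to a single central PostgreSQL server over the network (pip install psycopg2-binary). The host name, credentials and table layout are assumptions for illustration.

```python
# Minimal sketch: every location opens a network connection to one
# central database server; the server handles concurrent connections.
import psycopg2

conn = psycopg2.connect(
    host='central-db.example.com',  # hypothetical central server
    dbname='warehouse',
    user='app',
    password='secret',
)
with conn, conn.cursor() as cur:
    # Record a sale from this location directly in the central DB.
    cur.execute(
        "INSERT INTO sales (location, amount) VALUES (%s, %s)",
        ('Texas', 199.99),
    )
conn.close()
```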

How do I interface an xBase based ERP to a web application?

I am required to setup a web application that will interact with an existing ERP system (WinMagi). The ERP is basically a front-end to an xBase (FoxPro) database. The database is located on an in-house server. The ERP, as far as I'm aware, doesn't have an API but can accept purchase orders, etc through an EDI module. The web application should be able to accept online orders and query data for reporting.
My plan so far:
Synchronize the xBase DB to a SQL server instance on a cloud hosted VM.
(one-way from ERP -> SQL Server)
Use this sync process as an interface between the ERP and web application.
Push purchase orders back to the ERP using EDI.
My thinking here is that it would be safer from a data concurrency perspective to create or update data in the ERP through a controlled and accepted (by the ERP) interface.
Questions/Concerns:
What is the best way to update the SQL DB from the xBase DB? Are there any pre-existing libraries that can do this so I don't have to reinvent the wheel?
Would the xBase DB become locked during sync, or otherwise cause any issues for the live ERP?
How do I avoid data concurrency / integrity problems during the sync?
This system wouldn't be serving live data to the web app. What sort of issues can I expect due to this?
Should I prefer one language over another for this sort of project? My plan was to use Java/Hibernate MVC.
Am I perhaps going about this the wrong way? Would I be better off interfacing my web app directly with the xBase DB? Some problems that immediately spring to mind with this approach are networking issues between the office and the cloud-based VM and potential security vulnerabilities from opening up the ERP directly to the internet.
Any advice or suggestions you might be able to provide would be greatly appreciated!! Thanks in advance.
UPDATE - 3 Sep 2012
How I'm currently doing the data copy (it's not a synchronization) - runs nightly:
A Linux box in the office copies the required DBFs from a read-only share on the ERP server to local storage.
The DBFs are converted to CSV using Dave Burton's fantastic dbf2csv Perl script.
The resulting CSVs are rsync'd to the remote VM. There are only small changes in the data so this is quite fast.
Once the rsync is complete the remote VM does a mysqlimport to the production DB.
Advantages of this approach
The ERP cannot be damaged in any way as the network access is read-only.
No custom logic has to be implemented to sync data and hence there are no concerns that the data could be wrong on the remote VM.
As the data copy runs at night the run time isn't too important.
Current run time is approx 7 minutes for over 1 million records with approx 20-30 fields per record.
Longest phases are the DBF copy and conversion to CSV.
Disadvantages
The DBFs have to be copied in full every time.
The DBFs have to be converted in full every time.
Tables that are being copied are locked during the mysqlimport. This isn't really too much of an issue though as the import runs during the night and the mysqlimport only takes about 20 seconds.
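For comparison, the DBF-to-CSV step could also be done in a few lines of Python with the dbfread package (pip install dbfread) instead of the Perl script; the file names are assumptions for illustration, and the rsync and mysqlimport steps would stay exactly the same.

```python
# Minimal sketch: stream a FoxPro DBF out to CSV, read-only.
import csv
from dbfread import DBF

table = DBF('orders.dbf')  # hypothetical table copied from the share
with open('orders.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(table.field_names)    # header row from the DBF schema
    for record in table:                  # records stream as dict-like rows
        writer.writerow(list(record.values()))
```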
If you are using Visual FoxPro 3.0 or greater, you could use the built-in database container (DBC) to create a connection to the SQL Server DB. The views in the .DBC would then do the heavy lifting of reading and updating the SQL Server tables.
I would envision a routine that loops through your FoxPro table, reads the rows, and then makes the updates to the SQL Server DB. That way the FoxPro tables shouldn't be locked. To ensure this, you could first query the DBFs into a cursor, then loop through the cursor.
I would suggest adding a procedure to do concurrency checking.
Another option to serve live FoxPro data in your web apps would be to create a linked server in SQL Server to your FoxPro database. That way your FoxPro data could be accessed in real time.
I am currently doing something similar - I have to make invoice transactions from a FoxPro-based system available through a web application that will be on a remote, hosted VM running SQL Server.
I will answer your first point based on what I'm doing - you can decide for yourself whether it would work for you!
What is the best way to update the SQL DB from the xBase DB? Are there any pre-existing libraries that can do this so I don't have to reinvent the wheel?
I didn't really look for any shared libraries. What I did was (somewhat simplified):
Added a field to the ERP-side transaction table that holds a CRC32 value based on other fields that I want to detect changes to (for example, the transaction balance).
Wrote a standalone EXE that scans the ERP-side transaction table on a timer, calculates a CRC32 value based on some fields, and compares this to the last CRC32 value stored in the new field from point 1; if they differ, something has changed and the transaction needs to be re-sent. This EXE was written in VFP for simplicity in accessing DBF files, and it runs as a Windows service. When I get time it will be redone in C#.
Still in this EXE, once I have a list of new or changed transactions I convert them to JSON. I rolled my own JSON functions, but you could use Craig Boyd's from Sweet Potato Software or a number of others. There may be a PDF document associated with the transaction; if so, it is encoded and embedded in the JSON.
I send the JSON to a web service on the remote side using a class that leverages the standard Windows WinHTTP library (WinHttp.WinHttpRequest.5.1). The remote web service is essentially running Java. It decodes it all and updates the SQL Server.
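To make the idea concrete, here is a minimal sketch of the same change-detection-and-push flow in Python rather than VFP. The field names and the endpoint URL are assumptions for illustration, not the actual system described above.

```python
# Minimal sketch: re-send a transaction only when a CRC32 over the
# fields of interest differs from the last value we stored for it.
import json
import zlib
import urllib.request

def row_crc(row, fields=('invoice_no', 'balance', 'status')):
    # Hash only the fields whose changes should trigger a re-send.
    payload = '|'.join(str(row[f]) for f in fields)
    return zlib.crc32(payload.encode('utf-8'))

def sync_row(row, stored_crc):
    crc = row_crc(row)
    if crc == stored_crc:
        return stored_crc            # unchanged -> nothing to send
    body = json.dumps(row).encode('utf-8')
    req = urllib.request.Request(
        'https://example.com/api/transactions',  # hypothetical web service
        data=body,
        headers={'Content-Type': 'application/json'},
    )
    urllib.request.urlopen(req)      # POST the changed transaction
    return crc                       # caller persists the new CRC
```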

SQL Server Express vs MS Access

A colleague I work with recently told me that SQL Express and MS Access were essentially the same thing; that does not seem to be an accurate statement. I know you can convert Access to a SQL DB and maybe under the covers they are similar, but I would assume that the SQL DB engine and what is used to run access are not the same. Not only that, but the SQL statement syntax, etc. I know are not the same.
I am mainly trying to understand so that I am more informed about the versions.
Um, no, not the same.
First off, I need to clear up some terminology. MS Access is a Rapid Application Development (RAD) tool that allows you to quickly build forms and reports that are bound to relational data. It comes with a file-based database engine (Jet/ACE).
Access, the RAD tool, can be used with many different backend databases (Jet, SQL Server, any DB that supports ODBC, etc). I have to assume your colleague was specifically commenting on Jet/ACE, i.e. the database engine that MS Access uses.
I think the single biggest difference between the Jet/ACE database engine and MS SQL Server Express is that Jet/ACE is file-based and SQL Server Express uses a client/server model. This means that SQL Server Express requires a running service to provide access to the datastore. This can complicate deployment in some scenarios.
SQL Server Express is really just a throttled-back version of SQL Server: max database size of 4GB (10GB in 2008 R2), only uses a single physical CPU, etc. These limitations are imposed to prevent large organizations from using the freely available Express edition in place of a full-blown SQL Server install. The upshot is that SQL Server Express offers a truly seamless upgrade path to SQL Server. It is also (generally speaking) a more robust and fully featured database management system than Jet/ACE.
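To make the file-based vs client/server distinction concrete, here is a minimal sketch of the two connection styles through pyodbc (pip install pyodbc); the file path and instance name are assumptions for illustration.

```python
import pyodbc

# File-based: the Access/Jet/ACE driver opens a database *file* directly;
# no service needs to be running.
access = pyodbc.connect(
    r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};'
    r'DBQ=C:\data\northwind.accdb;'       # hypothetical file path
)

# Client/server: SQL Server Express is a running *service* you connect to.
express = pyodbc.connect(
    r'Driver={ODBC Driver 17 for SQL Server};'
    r'Server=localhost\SQLEXPRESS;'        # hypothetical instance name
    r'Database=Northwind;Trusted_Connection=yes;'
)
```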
Similarities
- relational database management systems
- written by Microsoft
Differences
MS Access
- file-based
- free distributable runtime (2007 or later)
- RAD tools (form/report designer)
- uses Jet SQL
- max file size 2GB
SQL Server Express
- client/server model
- free
- no RAD tools
- uses Transact-SQL
- max database size 4GB (10GB for SSE R2), max one physical CPU
I think what your colleague had in mind was SQL Server CE, which is a super-lightweight embedded database, and which is still (IMO) far superior to Access in the database-management aspect. SQL Express cannot even be compared with Access without offending the former.
Here are the datasheets for both products so you can see some hard facts on the difference between the two databases.
Access:
http://office.microsoft.com/en-us/access-help/access-specifications-HP005186808.aspx
SQL (Express is listed on the far right column):
http://www.microsoft.com/sqlserver/2008/en/us/editions-compare.aspx
The comment I have always read is that Access is great for single-user, single-access database use; the minute you scale beyond a single user, look elsewhere. While that may be a "bit" of a stretch, Access really does not do well in a multi-user environment. From experience, we've had a client who ignored and ignored our requests to migrate a backend database from Access to SQL, and there have been numerous occasions where we have had to restore from backups or take the Access database offline due to corruption.
They are two completely different technologies with two different target markets. The database engines are indeed different, as you mention T-SQL is different than Access SQL.
You can "scale up" an Access database to SQL by creating an SSIS package or other tool to do the import, but this takes the Access schema and data and migrates it to a true SQL database. It does more than just attach the Access database or the like.
Anytime you need a "real" database I'd highly recommend looking at any of the SQL versions that are available over Access.
Just remember that with MS Access you don't have size limitations if you play your cards right. There is no reason, for example, not to have many 2 to 4 GB tables, each contained singly in its own database. Your ODBC applications can open a connection to multiple MS Access databases and query the single table in each. So you can have a database containing trillions of records, stored in multiple MDB files.

One company I went to work for was using a single MS Access database to run an issue tracking system done in MS Access forms. They could only use it one person at a time because of sharing issues that would lock MS Access up. I wrote a Win32 Perl native Windows GUI user interface to the database that was better at field/record validation, and my ODBC code was able to manage the connection for simultaneous user access. I managed the opening, reading, writing and closing of the database for each user through my Perl program. I did not leave the database open or maintain a persistent connection for each user; instead I only held a connection long enough to retrieve a record for edit, then closed it until it was time to write the record back to the database.

I also wrote my own record-locking logic by maintaining a user login table that contained the record ID of the record a user was currently editing, then erased that entry when the user was no longer editing that record. When another user went to edit the same record, the program checked whether that record was currently open for edit by another user (see the sketch below). The system worked flawlessly: MS Access never locked up via ODBC and multi-user access. I even embedded the password to the database in my compiled Perl program so that no one could get to the data in the Access database other than through my Perl program.
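Here is a minimal sketch of that soft-locking pattern translated to Python with pyodbc for illustration; the edit_locks table and its columns are assumptions, and like the original it is an advisory check rather than a race-free lock.

```python
# Minimal sketch: a lock table records who is editing which record.
import pyodbc

def try_checkout(conn, user, record_id):
    cur = conn.cursor()
    cur.execute("SELECT user_name FROM edit_locks WHERE record_id = ?",
                record_id)
    if cur.fetchone() is not None:
        return False                 # someone else is editing this record
    cur.execute("INSERT INTO edit_locks (record_id, user_name) VALUES (?, ?)",
                record_id, user)
    conn.commit()
    return True                      # caller may now fetch and edit the row

def checkin(conn, user, record_id):
    # Release the soft lock once the record is written back.
    cur = conn.cursor()
    cur.execute("DELETE FROM edit_locks WHERE record_id = ? AND user_name = ?",
                record_id, user)
    conn.commit()
```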

Export from a standalone database to an embedded database

I have a two-part application, where there is a central database that is edited, and then at certain times, the data is released and distributed as its own application. I would like to use a standalone database for the central database (MySQL, Postgres, Oracle, SQL Server, etc.) and then have a reliable export to an embedded database (probably SQLite) for distribution.
What tools/processes are available for such an export, or is it a practice to be avoided?
EDIT: A couple of additional pieces of information. The distributed application should be able to run without having to connect to another server (e.g. your spellchecker still works even when you don't have internet access), and I don't want to install a full DB server just for read-only access to the data.
If you really only want your clients to have read access to the offline data, it should not be that difficult to update your client data manually.
A good practice would be to use the same product for the server database and the client database. You wouldn't have to write SQL statements twice, since both use the same SQL dialect and the same features.
Firebird, for example, offers both a server and an embedded version. Microsoft also offers MS SQL Server as a mobile version (Compact Edition), and there are Synchronization Services provided by Microsoft (a good blog post describing Sync Services in Visual Studio: http://keithelder.net/blog/archive/2007/09/23/Sync-Services-for-SQL-Server-Compact-Edition-3.5-in-Visual.aspx).
MySQL has a product which is called "MySQLMobile" but I never actually used it.
I can also recommend SQLite as an embedded database since it is very easy to use.
Depending on your bandwidth and the amount of data, you could even download the whole database and delete the old one (in Firebird, for example, you only copy the database files, and this also works with the mobile version). Very easy, BUT you have to know whether it will work for your scenario. If you have more data you will need something more flexible and sophisticated that only updates the data that really changed. A sketch of a simple full export to SQLite follows.
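As one possible shape for that export step, here is a minimal sketch that copies released rows from a central PostgreSQL server into a fresh SQLite file to ship with the application; the connection details and table layout are assumptions for illustration.

```python
# Minimal sketch: central server -> embedded SQLite release file.
import sqlite3
import psycopg2

src = psycopg2.connect(host='central-db.example.com', dbname='master',
                       user='export', password='secret')  # hypothetical
dst = sqlite3.connect('release.db')   # the embedded DB you distribute

dst.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, price REAL)")

with src.cursor() as cur:
    cur.execute("SELECT id, name, price FROM items WHERE released")
    dst.executemany("INSERT INTO items (id, name, price) VALUES (?, ?, ?)",
                    cur.fetchall())

dst.commit()
dst.close()
src.close()
```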
