Data warehousing using local DB - Beginner

Data warehousing using local DB - Beginner - database

I want to get an idea on how to achieve this;
I have an application that runs at 5 different geographical locations. Eg: Texas, NY,California, Boston, Washington
This application saves data to a local database, which is located at that location.
I want to do data warehousing, So is it a must to have just have one database (Where all the 5 applications will now save its data in a single database - without having local DBs)
Or is it possible to have 5 local databases, and do data warehousing by retrieving data from those local DBs to a central DB and then performing data warehousing.
Please give me your thoughts and references.

You have three options for this:
you use a single, centrally hosted database server. Typical relational database servers can be directly accessed via network these days: mySQL, Postgresql, Oracle, ... This means you can implement an application which opens a network connection to the database server and uses that remote server to store and retrieve the data as required. Multiple connections are possible at the same time.
you use a single, central database server but put a wrapper around it. So some small network layer application layer acting as a broker. This way you can address that central instance over network, but via standard protocols like for example http.
you use a decentralized approach and install a database instance at each location. Then you need some additional tool to perform a synchronization. For most modern database servers (see above) such tools exist, but the setup is not trivial.
If in doubt and if the load is not that high go with the first alternative.

Related

postgresql database server and many users

I'm relatively new to databases - I've used postgresql in the past to create databases stored on my computer and accessed only by myself.
I'm currently designing a database that will be used and edited by multiple people (10-15 max) living in different parts of the world. What is the best way to ensure we will all have access to the most current version of the database? Is it best to continue storing the database on my individual computer? Should I host the database on a cloud server? I've read that it is dangerous to store databases on Dropbox.
We are social science researchers organizing our data into a single database.

Based on your comment about not always be on and connected, it seems to me that the cloud service is the way to go for you. There are two approaches there, just rent a machine ("AWS EC2") and install the database software and manage the database yourself, or use a cloud provider's managed database service ("AWS RDS"). The names are just by way of a concrete examples, there are other providers of each type of service.

How to connect a database with files

What would be the best way to insert metadata into a database that need to be logicaly connected files that are stored locally on a web-server?

In general, databases control their own storage. The proper procedure is to load data into tables in the database. This is important, because databases manage storage and memory. In a typical configuration, you don't want to be accessing files being updated by another application. And, you typically don't want to be storing database data over the network.
The general answer to the question is that you want to load data into the database.
That said, many database engines allow you to remotely access data in other databases or through a technology such as ODBC. You can get drivers for flat files, even those stored remotely on the network. However, this is not an optimal set up for querying. Alternatively, databases can be used to manage metadata for remote files, such as image files stored on disk. The purpose is to allow searches through the metadata which, in essence, retrieve file names that are then resolved (either on the client side or server side, depending on the architecture).
You should, perhaps, ask another question with a lot more detail about what you are trying to accomplish and about which database you are using.

Approaches for Database Synchronization

I am currently been assigned to develop a sync application for my company. We have SQL server on our database server which will be synced with the client database. Client databases are not known, they can be SQLite or MYSQL or whatever.
What this sync app does is, detect changes that occur on server & client databases. Save these changes and sync. If changes occur on server database it will be synced with the client database and vice versa.
I did some research on it and came to know many solutions. One of them is to use a Microsoft Sync Framework. But I hardly found a good implementation example on it for syncing with remote databases.
Then I came across Change Data Capture(CDC) on SQL Server 2008. CDC works by detecting the change on the source table through triggers and put these changes on a separate table called sync_table, this table is then used for syncing.
Since, I cannot use the CDC feature because I don't have sufficient database rights on my machine, I have started to develop my own solution which works like how CDC does. I create separate sync_table for each source table, create triggers to detect data change and put this data in the sync_table.
However, I am advised to do some more research on it for choosing the best implementation methodology.
I need to keep the following things in mind,
Databases may/may not be on the same network.
On server side, the user must be able to select which tables will take part in the sync process.
Devices that will sync with the server database need to be registered first. Meaning that all client devices will be registered by the user before they can start syncing.
As usual any help will be appreciated :)

There is an open source project called SymmetricDS with many of the same goals. Take a look at the documentation and data model to see how the problem was solved, and maybe you will get some ideas. Instead of a separate shadow table for each source table, there is a single sym_data table where all the data is captured in comma separated value format. The advantage is one place to look for captured data and retrieve changes that were part of the same transaction. The table is kept small by purging it often after data is transferred successfully. It uses web protocols (HTTP) for data transfer. The advantage is leveraging existing web servers for performance, administration, and known filtering through firewalls. There is also a registration protocol used before clients are allowed to sync. The server admin "opens registration" for a client ID, which allows the client to connect for the first time. It supports many different databases, so you'll find examples of how to write triggers and retrieve unique transaction IDs on those systems.

How do I interface an xBase based ERP to a web application?

I am required to setup a web application that will interact with an existing ERP system (WinMagi). The ERP is basically a front-end to an xBase (FoxPro) database. The database is located on an in-house server. The ERP, as far as I'm aware, doesn't have an API but can accept purchase orders, etc through an EDI module. The web application should be able to accept online orders and query data for reporting.
My plan so far:
Synchronize the xBase DB to a SQL server instance on a cloud hosted VM.
(one-way from ERP -> SQL Server)
Use this sync process as an interface between the ERP and web application.
Push purchase orders back to the ERP using EDI.
My thinking here is that it would be safer from a data concurrency perspective to create or update data in the ERP through a controlled and accepted (by the ERP) interface.
Questions/Concerns:
What is the best way to update the SQL DB from the xBase DB? Are there any pre-existing libraries that can do this so I don't have to reinvent the wheel?
Would the xBase DB become locked during sync? Or otherwise cause an issues for the live ERP?
How do I avoid data concurrency / integrity problems during the sync?
This system wouldn't be serving live data to the web app. What sort of issues can I expect due to this?
Should I prefer one language over another for this sort of project? My plan was to use Java/Hibernate MVC.
Am I perhaps going about this the wrong way? Would I be better off interfacing my web app directly with the xBase DB? Some problems that immediately spring to mind with this approach are networking issues between the office and the cloud-based VM and potential security vulnerabilities from opening up the ERP directly to the internet.
Any advice or suggestions you might be able to provide would be greatly appreciated!! Thanks in advance.
UPDATE - 3 Sep 2012
How I'm currently doing the data copy (it's not a synchronization) - runs nightly:
A linux box in the office copies the required DBFs from a read-only share on the ERP server to local storage.
The DBFs are converted to CSV using Dave Burton's fantastic dbf2csv perl script
The resulting CSVs are rsync'd to the remote VM. There are only small changes in the data so this is quite fast.
Once the rsync is complete the remote VM does a mysqlimport to the production DB.
Advantages of this approach
The ERP cannot be damaged in any way as the network access is read-only.
No custom logic has to be implemented to sync data and hence there are no concerns that the data could be wrong on the remote VM.
As the data copy runs at night the run time isn't too important.
Current run time is approx 7 minutes for over 1 million records with approx 20-30 fields per record.
Longest phases are the DBF copy and conversion to CSV.
Disadvantages
The DBFs have to be copied in full every time.
The DBFs have to be converted in full every time.
Tables that are being copied are locked during the mysqlimport. This isn't really too much of an issue though as the import runs during the night and the mysqlimport only takes about 20 seconds.

If you are using Visual Foxpro 3.0 or greater, you could use the built in DataBase container to create a connection to the SQL Server DB. Then the Views in the .DBC would do the heavy lifting of reading and updating the SQL Server tables.
I would envision a routine that looped through your Foxpro table and reading the rows and then making the updates to the SQL Server DB. So, the Foxpro tables shouldn't be lock. To ensure this, you could first query the DBFs into a cursor, then loop through the cursor.
I would suggest adding procedure to do concurrency checking.
Another option to server live Foxpro data in your web apps would be to create a linked server in SQL Server to your Foxpro database. That way your Foxpro data could be accessed real time.

I am currently doing something similar - I have to make invoice transactions from a FoxPro-based system available through a web application that will be on a remote, hosted VM running SQL Server.
I will answer your first point based on what I'm doing - you can decide for yourself whether it would work for you!
What is the best way to update the SQL DB from the xBase DB? Are there any pre-existing libraries that can do this so I don't have to reinvent the wheel?
I didn't really look for any shared libraries. What I did was (somewhat simplified):
Added a field to the ERP-side transaction table that holds a CRC32 value based on other fields that I want to detect changes to (for example, the transaction balance).
Wrote a standalone EXE that scans the ERP-side transaction table on a timer, calculates a CRC32 value based on some fields, compares this to the last CRC32 value stored in the new field from point 1, and if different then something has changed and the transaction needs to be re-sent. This EXE was written in VFP for simplicity in accessing DBF files, and it runs as a Windows service. When I get time it will be re-done in C#.
Still in this EXE, once I have a list of new or changed transactions I convert them to JSON. I rolled my own JSON functions, but you could use Craig Boyd's from [Sweet Potato Software][1] or a number of others. There may be a PDF document associated with the transaction, if so it is encoded and embedded in the JSON.
I send the JSON to a web service on the remote side using a class that leverages the standard Windows WinHTTP library (WinHttp.WinHttpRequest.5.1) . The remote web service is essentially running Java. It decodes it all and updates the SQL Server.

Suitable method For synchronising online and offline Data

I have two applications with own database.
1.) Desktop application which has vb.net winforms interface, runs in offline enterprise network and stores data in central database [SQL Server]
**All the data entry and other office operations are carried out and stored in central database
2.) Second application has been build on php. it has html pages and runs as website in online environment. It stores all data in mysql database.
**This application is accessed by registered members only and they are facilitied with different reports of the data processed by 1st application.
Now I have to synchronize data between online and offline database servers. I am planning for following:
1.) Write a small program to export all the data of SQL Server [offline server] to a file in CVS format.
2.) Login to admin Section of live server.
3.) Upload the exported cvs file to the server.
4.) Import the data from cvs file to mysql database.
Is the method i am planning good or it can be tunned to perform good. I would also appreciate for other nice ways for data synchronisation other than changing applications.. ie. network application to some other using mysql database

What you are asking for does not actually sound like bidirectional sync (or movement of data both ways from SQL Server to MySQL and from MySQL to SQL Server) which is a good thing as it really simplifies things for you. Although I suspect your method of using CSV's (which I would assume you would use something like BCP to do this) would work, one of the issues is that you are moving ALL of the data every time you run the process and you are basically overwriting the whole MySQL db everytime. This is obviously somewhat inefficient. Not to mention during that time the MySQL db would not be in a usable state.
One alternative (assuming you have SQL Server 2008 or higher) would be to look into using this technique along with Integrated Change Tracking or Integrated Change Capture. This is a capability within SQL Server that allows you to determine data that has changed since a certain point of time. What you could do is create a process that just extracts the changes since the last time you checked to a CSV file and then apply those to MySQL. If you do this, don't forget to also apply the deletes as well.

I don't think there's an off the shelf solution for what you want that you can use without customization - but the MS Sync framework (http://msdn.microsoft.com/en-us/sync/default) sounds close.
You will probably need to write a provider for MySQL to make it go - which may well be less work than writing the whole data synchronization logic from scratch. Voclare is right about the challenges you could face with writing your own synchronization mechanism...

Do look into SQL Server Integration Service as a good alternate.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight