Error connecting with SAP ERP System with Pentaho ETL tool - database

I am trying connect SAP ERP System with Pentaho Data Integration tools by required properties. As like below image. But didn't connect with SAP ERP System.
If I click on Test button then nothing saying. But, If I try to execute query then showing this error.

Try installing the drivers again. Below links should provide you some additional info.
https://blogs.sap.com/2014/09/04/creating-a-connection-to-sap-hana-using-pentaho-pdi/
https://www.danielpradilla.info/blog/how-to-connect-pentaho-data-integration-to-sap-hana/

The step in the background of your first screen shot, SAP ERP Table Input is not a standard Pentaho step, but a Plugin developed by IT-Novum. It is a product that has to be purchased (see here). But even that step doesn't allow direct queries using the table input step from Pentaho, it delivers its own input step. The reason is that SAP ERP doesn't allow direct database access at all, you have to retrieve data through RFC modules if you want to go through the SAP Netweaver server instead of accessing the database directly. This youtube video contains a demo around 7 minutes into the video, showing the plugin specific step.
You can use remote function modules like RFC_READ_TABLE to access tables and views with pentaho (and the sapjco library) alone, but it doesn't work through the regular Table Input step in Pentaho, you need to use the SAP Input step. And RFC_READ_TABLE has a number of limitations that make it pretty much useless for extended ETL tasks without a few modifications (see for instance this SCN thread)
If you only need to access a very small number of tables with few fields and ideally only in a few transformations, the SAP input step can help you do that. But if this a cornerstone of a large ETL process, start looking into commercial tools to help you. Or, if this is a viable alternative for you, access the database directly. But there are possible license limitations (the Oracle db licensed by SAP only allows access for the SAPSR3 user) as well as technical reasons (for instance security risks) to avoid doing that.

Related

Best approach to move data between SQL Servers

Ok, let me explain the environment we are facing here:
We have an ASP.NET MVC 4 app that uses a SQL Server database.
This app isolates data in "projects", so when any user connects to it only can work on the data of one of this projects.
Sometimes... a group of users have to travel to remote regions for some days to retrieve data for a single project, and quite often they won't be able to have an internet connection (even mobile or satellite solutions are often out of reach).
While the displaced team works on a project, people at the office still can work on the rest of the projects (but no on the one that is abroad).
So... we are pondering the possibility of using a laptop to act as a "mobile server", where users can download the data from a specific project before travelling. While abroad, they can work against this "mobile server", update any data on their project and, when they come back, they could upload their updated data to the main server.
Our idea is to create stored procedures on both servers (main and mobile) that executes different queries to update data from a project between them, passing the project identifier as a parameter. Probably using Linked servers to allow main and mobile to see themselves during update operations.
Our questions here are:
Is this a good aproach?
Is there any other better approach that we're not seeing?
Are there any risks we should pay attention to in this or other approachs?
I've never used Bidirectional transaction replication so if that works for you, problem solved. I do have quite a bit of experience with data migration, including merging large data sets into software driven systems. And from that experience, replication has hurt us more than it has helped us (from a migration/merge view).
The biggest challenge in my opinion is going to be conflict resolution. I know you say that all of the data is in project specific databases, but there is no shared data at all? What about multiple remote users updating the same data? In that case you're going to need a little more than just replication.
Instead of maintaining two databases at all times (one for mobile, one as the regular in-house db), why not a system where a job is called to your main system indicating that a project needs to be prepared for "offline mode" (the job could be stored procedures or SSIS packages or straight T-SQL). Whatever the technology used, this job would copy all of the requested project data to a new database on the remote server/laptop and mark it somehow in the main database as read-only to prevent users in the office from updating that data.
Once the data is in offline mode on the remote server, the users can update and use the data as much as they want from that remote server. Then when the users get an internet connection or they are back in the office they can kick off another job that syncs the data to the main server, removes offline mode, and deletes/archives the remote database. Almost like a temporary project database.
Seriously, it sounds like a fun project.
Technologies to look at:
SSIS (Sql Server Integration Services) - In my experience, this is extremely fast at moving data and allows you the ability to add logic to handle conflict resolution, error logic, etc. It's free (with certain Sql Server editions) and the community is huge so supporting it should be easy. SSIS is not as dynamic as some of the specialized solutions out there.
A data migration suite like Pervasive's Data Integrator - I loved this but it's expensive. You could right an entire solution in this product that could handle the processing of your data bidirectionally and like SSIS it allows for complex programming logic.
T-SQL - With a linked server you could just write straight queries (using stored procedures if you wanted). The problem here is security on the linked server. We don't use them because of this issue. Linked Servers: Good or Bad?
Start using some of Microsoft's built in change detection technologies right off the bat. It's harder to implement when you're already using the system. Change Data Capture (CDC) will give you a full history of the records updated while Change Tracking will give you a light-weight summary of your changes. Using either technology will make syncing the data many times easier.
Change Tracking: http://msdn.microsoft.com/en-us/library/bb933874.aspx
Change Data Capture: http://msdn.microsoft.com/en-us/library/cc645937.aspx
SSIS: http://msdn.microsoft.com/en-us/library/ms169917.aspx
SQL Server Agent Jobs: http://msdn.microsoft.com/en-us/library/ms189237.aspx

How do I interface an xBase based ERP to a web application?

I am required to setup a web application that will interact with an existing ERP system (WinMagi). The ERP is basically a front-end to an xBase (FoxPro) database. The database is located on an in-house server. The ERP, as far as I'm aware, doesn't have an API but can accept purchase orders, etc through an EDI module. The web application should be able to accept online orders and query data for reporting.
My plan so far:
Synchronize the xBase DB to a SQL server instance on a cloud hosted VM.
(one-way from ERP -> SQL Server)
Use this sync process as an interface between the ERP and web application.
Push purchase orders back to the ERP using EDI.
My thinking here is that it would be safer from a data concurrency perspective to create or update data in the ERP through a controlled and accepted (by the ERP) interface.
Questions/Concerns:
What is the best way to update the SQL DB from the xBase DB? Are there any pre-existing libraries that can do this so I don't have to reinvent the wheel?
Would the xBase DB become locked during sync? Or otherwise cause an issues for the live ERP?
How do I avoid data concurrency / integrity problems during the sync?
This system wouldn't be serving live data to the web app. What sort of issues can I expect due to this?
Should I prefer one language over another for this sort of project? My plan was to use Java/Hibernate MVC.
Am I perhaps going about this the wrong way? Would I be better off interfacing my web app directly with the xBase DB? Some problems that immediately spring to mind with this approach are networking issues between the office and the cloud-based VM and potential security vulnerabilities from opening up the ERP directly to the internet.
Any advice or suggestions you might be able to provide would be greatly appreciated!! Thanks in advance.
UPDATE - 3 Sep 2012
How I'm currently doing the data copy (it's not a synchronization) - runs nightly:
A linux box in the office copies the required DBFs from a read-only share on the ERP server to local storage.
The DBFs are converted to CSV using Dave Burton's fantastic dbf2csv perl script
The resulting CSVs are rsync'd to the remote VM. There are only small changes in the data so this is quite fast.
Once the rsync is complete the remote VM does a mysqlimport to the production DB.
Advantages of this approach
The ERP cannot be damaged in any way as the network access is read-only.
No custom logic has to be implemented to sync data and hence there are no concerns that the data could be wrong on the remote VM.
As the data copy runs at night the run time isn't too important.
Current run time is approx 7 minutes for over 1 million records with approx 20-30 fields per record.
Longest phases are the DBF copy and conversion to CSV.
Disadvantages
The DBFs have to be copied in full every time.
The DBFs have to be converted in full every time.
Tables that are being copied are locked during the mysqlimport. This isn't really too much of an issue though as the import runs during the night and the mysqlimport only takes about 20 seconds.
If you are using Visual Foxpro 3.0 or greater, you could use the built in DataBase container to create a connection to the SQL Server DB. Then the Views in the .DBC would do the heavy lifting of reading and updating the SQL Server tables.
I would envision a routine that looped through your Foxpro table and reading the rows and then making the updates to the SQL Server DB. So, the Foxpro tables shouldn't be lock. To ensure this, you could first query the DBFs into a cursor, then loop through the cursor.
I would suggest adding procedure to do concurrency checking.
Another option to server live Foxpro data in your web apps would be to create a linked server in SQL Server to your Foxpro database. That way your Foxpro data could be accessed real time.
I am currently doing something similar - I have to make invoice transactions from a FoxPro-based system available through a web application that will be on a remote, hosted VM running SQL Server.
I will answer your first point based on what I'm doing - you can decide for yourself whether it would work for you!
What is the best way to update the SQL DB from the xBase DB? Are there any pre-existing libraries that can do this so I don't have to reinvent the wheel?
I didn't really look for any shared libraries. What I did was (somewhat simplified):
Added a field to the ERP-side transaction table that holds a CRC32 value based on other fields that I want to detect changes to (for example, the transaction balance).
Wrote a standalone EXE that scans the ERP-side transaction table on a timer, calculates a CRC32 value based on some fields, compares this to the last CRC32 value stored in the new field from point 1, and if different then something has changed and the transaction needs to be re-sent. This EXE was written in VFP for simplicity in accessing DBF files, and it runs as a Windows service. When I get time it will be re-done in C#.
Still in this EXE, once I have a list of new or changed transactions I convert them to JSON. I rolled my own JSON functions, but you could use Craig Boyd's from [Sweet Potato Software][1] or a number of others. There may be a PDF document associated with the transaction, if so it is encoded and embedded in the JSON.
I send the JSON to a web service on the remote side using a class that leverages the standard Windows WinHTTP library (WinHttp.WinHttpRequest.5.1) . The remote web service is essentially running Java. It decodes it all and updates the SQL Server.

C# How to Synch Data Between Database Instances Over the Internet

We have an application that requires our customers to have a SQL server instance on site. At their request, the application needs to synchronize the data in their database with a copy in our datacenter.
We're using .Net 3.5 SP1. We need to synchronize the data exactly, including IDENTITY columns.
We'd prefer to use something like LINQ to SQL that would let us make some simple select and insert/update calls against mapped entities. However, the IDENTITY columns seem to be a problem with LINQ and similar approaches.
We can do this all with built-up SQL statements and turn IDENTITY INSERT on / off as needed, but I'd prefer a more elegant solution.
Thanks!
** Edit - We DO need to write our own solution, and we do need to use .Net 3.5 SP1 to do it. I won't spend your time explaining all the reasons why, but please limit suggestions to options within the .Net playground.
Microsoft Sync Framework can be your solution. This is framework description from Microsoft:
Microsoft Sync Framework is a data synchronization platform from Microsoft that can be used to synchronize data across multiple data stores. Sync Framework includes a transport-agnostic architecture, into which data store-specific synchronization providers, modelled on the ADO.NET data provider API, can be plugged in.
Sync Framework is a comprehensive data synchronization solution that enables developers to build solutions that support synchronization of any database, on any data protocol over any network topology. msdn.microsoft.com
For your convinience providing link to good tutorial on the subject
If it is just a couple of tables that need to be synchronized and there is not a lot of data in the tables (now and future) you could develop some sort of bulk copy from your servers and bulk insert routine on the customer's server.
Since you said you can't use SQL Server replication services or SSIS, then perhaps a backup/restore procedure could be written. You could take a scheduled backup of your database and make it available to calling applications which could then copy the backup, restore it to another instance on the customers server, then pull all data you need via any number of methods and it would exist locally on the customers servers.
Beyond that, I think you may be asking for a maintenance and synchronization nightmare if you can't base your solution on tools that are made to do this sort of thing.

Suitable method For synchronising online and offline Data

I have two applications with own database.
1.) Desktop application which has vb.net winforms interface, runs in offline enterprise network and stores data in central database [SQL Server]
**All the data entry and other office operations are carried out and stored in central database
2.) Second application has been build on php. it has html pages and runs as website in online environment. It stores all data in mysql database.
**This application is accessed by registered members only and they are facilitied with different reports of the data processed by 1st application.
Now I have to synchronize data between online and offline database servers. I am planning for following:
1.) Write a small program to export all the data of SQL Server [offline server] to a file in CVS format.
2.) Login to admin Section of live server.
3.) Upload the exported cvs file to the server.
4.) Import the data from cvs file to mysql database.
Is the method i am planning good or it can be tunned to perform good. I would also appreciate for other nice ways for data synchronisation other than changing applications.. ie. network application to some other using mysql database
What you are asking for does not actually sound like bidirectional sync (or movement of data both ways from SQL Server to MySQL and from MySQL to SQL Server) which is a good thing as it really simplifies things for you. Although I suspect your method of using CSV's (which I would assume you would use something like BCP to do this) would work, one of the issues is that you are moving ALL of the data every time you run the process and you are basically overwriting the whole MySQL db everytime. This is obviously somewhat inefficient. Not to mention during that time the MySQL db would not be in a usable state.
One alternative (assuming you have SQL Server 2008 or higher) would be to look into using this technique along with Integrated Change Tracking or Integrated Change Capture. This is a capability within SQL Server that allows you to determine data that has changed since a certain point of time. What you could do is create a process that just extracts the changes since the last time you checked to a CSV file and then apply those to MySQL. If you do this, don't forget to also apply the deletes as well.
I don't think there's an off the shelf solution for what you want that you can use without customization - but the MS Sync framework (http://msdn.microsoft.com/en-us/sync/default) sounds close.
You will probably need to write a provider for MySQL to make it go - which may well be less work than writing the whole data synchronization logic from scratch. Voclare is right about the challenges you could face with writing your own synchronization mechanism...
Do look into SQL Server Integration Service as a good alternate.

How do I query the Sharepoint database?

I want to retrieve some data. How can I make a query on a Sharepoint database?
You shouldn't because of these reasons:
This is completely unsupported by the EULA you agreed to when you installed SharePoint. (I have to add a note that changing or calling triggers (except some) directly is unsupported, but not selecting)
Your queries are not guaranteed to work after applying any patches or service packs to SharePoint since Microsoft could change the database schema anytime.
Directly querying the database can place extra load on a server and hence performance issues.
Direct SELECT statements against the database take shared read locks at the default transaction level so your custom queries might cause deadlocks and hence stability issues.
Your custom queries might lead to incorrect data being retrieved.
Let me clarify, that #1 DOES NOT ALLOW you to modify sharepoint database in any way. SELECT`ing is permitted, however, as mentioned, that may lead to other problems.
However, if you are not interested in these points, then just use Visual Studio to connect to existing database, just do the regular procedure on how you connect to any other database.
But you can make your own database and store some additional information there.
Access SharePoint data the right way
Use SharePoint Object Model (Code can only be run on SharePoint server)
Use SharePoint WebServices (Run code from anywhere, from any application)
SharePoint 2013 now features REST API.
I have one thing to add. If you do decide to query sharepoint content databases directy, use the NOLOCK hint to prevent shared lock being taken out and potentially creating dead locks in the application.
If you don't mind using other proprietary Microsoft programs, Access/Excel/PowerBI all offer native connectivity to data stored in sharepoint lists/document libraries/meta data.

Resources