I have been tasked with creating an application or using an existing one (Access, Excel, Power Apps) that allows users to read Snowflake data and also allow update, insert and delete operations. I am pretty sure Excel, Access and PowerApps are read only. PowerApps would also run 10 bucks a month for an app that currently only needs to be used once a quarter.
I was hoping I could used ODBC, but it looks like that only reads, no writeback. I do have the ability to use a SQL server as a middle man. I thought I would use ADF to mirror the data being modified with truncate and loads to Snowflake. But if I could skip that link in the chain it would be preferable.
Thoughts?
There are a couple of tools that can help you and business users read and write back to Snowflake. Many users then use Streams and Tasks on the table that is updated to automate further processing on Snowflake.
Some examples:
Open-source Excelerator - Excel plug-in to write to Snowflake
Sigma Computing - a cloud-native, serverless Excel / BI tool
Related
What are the steps to be taken to migrate historical data load from Teradata to Snowflake?
Imagine there is 200TB+ of historical data combined from all tables.
I am thinking of two approaches. But I don't have enough expertise and experience on how to execute them. So looking for someone to fill in the gaps and throw some suggestions
Approach 1- Using TPT/FEXP scripts
I know that TPT/FEXP scripts can be written to generate files for a table. How can I create a single script that can generate files for all the tables in the database. (Because imagine creating 500 odd scripts for all the tables is impractical).
Once you have this script ready, how is this executed in real-time? Do we create a shell script and schedule it through some Enterprise scheduler like Autosys/Tidal?
Once these files are generated , how do you split them in Linux machine if each file is huge in size (because the recommended size is between 100-250MB for data loading in Snowflake)
How to move these files to Azure Data Lake?
Use COPY INTO / Snowpipe to load into Snowflake Tables.
Approach 2
Using ADF copy activity to extract data from Teradata and create files in ADLS.
Use COPY INTO/ Snowpipe to load into Snowflake Tables.
Which of these two is the best suggested approach ?
In general, what are the challenges faced in each of these approaches.
Using ADF will be a much better solution. This also allows you to design DataLake as part of your solution.
You can design a generic solution that will import all the tables provided in the configuration. For this you can choose the recommended file format (parquet) and the size of these files and parallel loading.
The challenges you will encounter are probably a poorly working ADF connector to Snowflake, here you will find my recommendations on how to bypass the connector problem and how to use DataLake Gen2:
Trouble loading data into Snowflake using Azure Data Factory
More about the recommendation on how to build Azure Data Lake Storage Gen2 structures can be found here: Best practices for using Azure Data Lake Storage Gen2
I am trying connect SAP ERP System with Pentaho Data Integration tools by required properties. As like below image. But didn't connect with SAP ERP System.
If I click on Test button then nothing saying. But, If I try to execute query then showing this error.
Try installing the drivers again. Below links should provide you some additional info.
https://blogs.sap.com/2014/09/04/creating-a-connection-to-sap-hana-using-pentaho-pdi/
https://www.danielpradilla.info/blog/how-to-connect-pentaho-data-integration-to-sap-hana/
The step in the background of your first screen shot, SAP ERP Table Input is not a standard Pentaho step, but a Plugin developed by IT-Novum. It is a product that has to be purchased (see here). But even that step doesn't allow direct queries using the table input step from Pentaho, it delivers its own input step. The reason is that SAP ERP doesn't allow direct database access at all, you have to retrieve data through RFC modules if you want to go through the SAP Netweaver server instead of accessing the database directly. This youtube video contains a demo around 7 minutes into the video, showing the plugin specific step.
You can use remote function modules like RFC_READ_TABLE to access tables and views with pentaho (and the sapjco library) alone, but it doesn't work through the regular Table Input step in Pentaho, you need to use the SAP Input step. And RFC_READ_TABLE has a number of limitations that make it pretty much useless for extended ETL tasks without a few modifications (see for instance this SCN thread)
If you only need to access a very small number of tables with few fields and ideally only in a few transformations, the SAP input step can help you do that. But if this a cornerstone of a large ETL process, start looking into commercial tools to help you. Or, if this is a viable alternative for you, access the database directly. But there are possible license limitations (the Oracle db licensed by SAP only allows access for the SAPSR3 user) as well as technical reasons (for instance security risks) to avoid doing that.
I've done some research without getting valuable information about my question.
I'm working on a data warehouse project and one my customer's requirement is that to use power bi pro for data visualisation.
What is not clear to me is if power bi, while acquiring data in its data model, would or not benefit from the indexing structure developed in SQL Server.
Thank you in advance for recommendation/tips on this subject.
It somewhat depends on whether you are using a live connection.
Existing indexes may speed up data loading when using PowerBI in import mode where the data source is a view, query, or stored procedure.
They will also be used in Live mode when connecting to the above sources, and might be used when connecting directly to multiple tables.
As the comments state, if you are bringing entire tables into PowerBI with import mode, then the existing indexes will not benefit you, and the internal SSAS instance that PBI uses is a whole different kettle of fish.
One caveat is that columnstore indexes can be used to get around some of the data size limitations when dealing with the gateway as described here: https://community.powerbi.com/t5/Power-Query/Using-SQL-Server-with-Nonclustered-Columnstore-Index/td-p/563787, but that's not directly related to your question.
Indexes help with retrieval speed on the server end. The answer to how much it will help depends on the specifics of your situation. If you are doing a lot of data transformation and mashup in the Power BI query editor, indexes will only help where there is a step that selects rows from the SQL Server. It won't help with steps where the processing is being done on the Power BI end (such as merging with data from an Excel file or adding custom columns or some forms of substituting values). However, since you mention a data warehouse rather than a simple database, I'm going to assume you're barely doing any transformation on the Power BI end, relying instead on the server end to do the heavy lifting. In that case indexes will definitely help speed things up if they're done strategically
There are some difference between Import mode and Connect live mode.
Import mode:
Data import can be used against any data source type, it can combining Data from different sources. Current Power BI service limitation published file size is 1 GB.
When using import, data are stored in Power BI file/service. Therefore, there is no need to setup permissions on data source side (service account for load is enough) and you can share data publically or with people outside organization. On the other hand, all data are stored on Power BI. It is supported to implement full DAX expressions and full Power Query transformations.
Connect live mode:
There are more limitations for live connection in place. It doesn’t work against all data sources. Current list can be seen here, it cannot combine data from multiple sources.
You are also limited to just one data source/database you selected. You can’t combine data from multiple data sources anymore. If you are connected to SQL Database, you can still create logical relationships between objects from that database as well as measures and calculated columns. When you are connected to SQL Server Analysis Services, you are limited just to report layout and even can’t make calculated columns ,while you can only create measures currently. When using live connection, users have to have access to underlying data source. This means you can’t share outside of your organization or publically. And It is not supported to implement full DAX expressions, only Report Level Measures, to learn more about report level measures, watch this great video from Patrick, and there is no Power Query transformations.
You can learn more: directquery-live-connection-or-import-data-tough-decision
Yes, I am a learning Access and am not familiar with Joomla!, but I am working on creating an Access database (in Access 2013) so multiple users can have a user friendly way to look at and edit our data via forms, use the queries, etc. We are trying to transfer our Schedule worksheet from Excel to Access.
1) My co-worker is working on Joomla! version 3.6.2, and I would like to know if Joomla! has forms and the ability to do queries, etc., like Access, so we can use Joomla! on the front end and house our tables in SQL server on the back end?
2) I don't know if Joomla! is compatible with SQL server, but do you recommend us sticking with Access as our database or using Joomla! on the front end? We have other things in Joomla! and would like to see if we can view and make changes to our Schedule via forms and queries, etc. in Joomla!, making it the one place to go to for all of our needs.
Thank you for your help.
Microsoft Access is a good tool for learning what a relational database is but I suspect there are less limitations with SQL (or MySQL) in terms of the amount of records, the size of the database and with sharing the data with multiple users.
There are quite a few forms extensions that enable you to not only submit data to the database but to also retrieve it in whatever way you wish (sometimes with a little custom coding).
Joomla supports SQL but MySQL is probably the preferred database.
I am required to setup a web application that will interact with an existing ERP system (WinMagi). The ERP is basically a front-end to an xBase (FoxPro) database. The database is located on an in-house server. The ERP, as far as I'm aware, doesn't have an API but can accept purchase orders, etc through an EDI module. The web application should be able to accept online orders and query data for reporting.
My plan so far:
Synchronize the xBase DB to a SQL server instance on a cloud hosted VM.
(one-way from ERP -> SQL Server)
Use this sync process as an interface between the ERP and web application.
Push purchase orders back to the ERP using EDI.
My thinking here is that it would be safer from a data concurrency perspective to create or update data in the ERP through a controlled and accepted (by the ERP) interface.
Questions/Concerns:
What is the best way to update the SQL DB from the xBase DB? Are there any pre-existing libraries that can do this so I don't have to reinvent the wheel?
Would the xBase DB become locked during sync? Or otherwise cause an issues for the live ERP?
How do I avoid data concurrency / integrity problems during the sync?
This system wouldn't be serving live data to the web app. What sort of issues can I expect due to this?
Should I prefer one language over another for this sort of project? My plan was to use Java/Hibernate MVC.
Am I perhaps going about this the wrong way? Would I be better off interfacing my web app directly with the xBase DB? Some problems that immediately spring to mind with this approach are networking issues between the office and the cloud-based VM and potential security vulnerabilities from opening up the ERP directly to the internet.
Any advice or suggestions you might be able to provide would be greatly appreciated!! Thanks in advance.
UPDATE - 3 Sep 2012
How I'm currently doing the data copy (it's not a synchronization) - runs nightly:
A linux box in the office copies the required DBFs from a read-only share on the ERP server to local storage.
The DBFs are converted to CSV using Dave Burton's fantastic dbf2csv perl script
The resulting CSVs are rsync'd to the remote VM. There are only small changes in the data so this is quite fast.
Once the rsync is complete the remote VM does a mysqlimport to the production DB.
Advantages of this approach
The ERP cannot be damaged in any way as the network access is read-only.
No custom logic has to be implemented to sync data and hence there are no concerns that the data could be wrong on the remote VM.
As the data copy runs at night the run time isn't too important.
Current run time is approx 7 minutes for over 1 million records with approx 20-30 fields per record.
Longest phases are the DBF copy and conversion to CSV.
Disadvantages
The DBFs have to be copied in full every time.
The DBFs have to be converted in full every time.
Tables that are being copied are locked during the mysqlimport. This isn't really too much of an issue though as the import runs during the night and the mysqlimport only takes about 20 seconds.
If you are using Visual Foxpro 3.0 or greater, you could use the built in DataBase container to create a connection to the SQL Server DB. Then the Views in the .DBC would do the heavy lifting of reading and updating the SQL Server tables.
I would envision a routine that looped through your Foxpro table and reading the rows and then making the updates to the SQL Server DB. So, the Foxpro tables shouldn't be lock. To ensure this, you could first query the DBFs into a cursor, then loop through the cursor.
I would suggest adding procedure to do concurrency checking.
Another option to server live Foxpro data in your web apps would be to create a linked server in SQL Server to your Foxpro database. That way your Foxpro data could be accessed real time.
I am currently doing something similar - I have to make invoice transactions from a FoxPro-based system available through a web application that will be on a remote, hosted VM running SQL Server.
I will answer your first point based on what I'm doing - you can decide for yourself whether it would work for you!
What is the best way to update the SQL DB from the xBase DB? Are there any pre-existing libraries that can do this so I don't have to reinvent the wheel?
I didn't really look for any shared libraries. What I did was (somewhat simplified):
Added a field to the ERP-side transaction table that holds a CRC32 value based on other fields that I want to detect changes to (for example, the transaction balance).
Wrote a standalone EXE that scans the ERP-side transaction table on a timer, calculates a CRC32 value based on some fields, compares this to the last CRC32 value stored in the new field from point 1, and if different then something has changed and the transaction needs to be re-sent. This EXE was written in VFP for simplicity in accessing DBF files, and it runs as a Windows service. When I get time it will be re-done in C#.
Still in this EXE, once I have a list of new or changed transactions I convert them to JSON. I rolled my own JSON functions, but you could use Craig Boyd's from [Sweet Potato Software][1] or a number of others. There may be a PDF document associated with the transaction, if so it is encoded and embedded in the JSON.
I send the JSON to a web service on the remote side using a class that leverages the standard Windows WinHTTP library (WinHttp.WinHttpRequest.5.1) . The remote web service is essentially running Java. It decodes it all and updates the SQL Server.