I was wondering if, as with OpenSearch, I can insert data into a database or data lake without having to create a table first (from what I know I can't do that, but I'm not sure). Thank you.
Snowflake just recently released schema detection to public preview
https://www.snowflake.com/blog/schema-detection-public-preview/
You can also load files directly to S3 or to an internal stage, with various file formats supported.
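A minimal sketch of what schema detection looks like, assuming Parquet files are already sitting on a stage (the stage, file format and table names below are placeholders):

    -- Create a file format and let INFER_SCHEMA derive the column definitions
    CREATE OR REPLACE FILE FORMAT my_parquet_format TYPE = PARQUET;

    -- Build the table from the detected schema, then load it
    CREATE OR REPLACE TABLE my_table
      USING TEMPLATE (
        SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
        FROM TABLE(INFER_SCHEMA(LOCATION => '@my_stage/data/',
                                FILE_FORMAT => 'my_parquet_format')));

    COPY INTO my_table
      FROM @my_stage/data/
      FILE_FORMAT = (FORMAT_NAME = 'my_parquet_format')
      MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

So a table still exists under the hood, but you don't have to hand-write its DDL.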
I'm exploring options on how to one-way sync from a table available via API to an SQL database. Does anyone have any suggestions on how to achieve this?
The data from the "Source" is often updated and should be copied to the "Destination" as the changes happen (live).
Source
Read-only table from an ERP, available via an API. Webhooks on the source are not possible. Entries in this table may be created, updated or deleted. There are approximately 150,000 entries in the table, with about 1,000 changes per day.
Destination
Azure MS SQL database which I have full control over.
I'm looking for best practice or any ideas on how to achieve this. There seem to be very few articles I can find with anything helpful.
I'm open to using any tool on Azure, including Logic Apps and Azure Functions, but I want to stay away from 3rd-party tools.
If you are trying to achieve this through Logic Apps, below is the flow that you can follow.
Note: make sure you preprocess the data before sending it to the SQL database, using appropriate actions based on the type of data you are receiving.
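One way to lay the flow out (just a sketch; the trigger, actions and all names below are assumptions, not a tested design): a Recurrence trigger, an HTTP action that calls the ERP API, a Parse JSON action, then a SQL action that loads the rows into a staging table and finally runs a T-SQL MERGE to apply the creates, updates and deletes against the destination:

    -- Hypothetical tables: dbo.ErpStaging holds the latest API snapshot,
    -- dbo.ErpDestination is the synced copy.
    MERGE dbo.ErpDestination AS target
    USING dbo.ErpStaging AS source
        ON target.ErpId = source.ErpId
    WHEN MATCHED THEN
        UPDATE SET target.Name   = source.Name,
                   target.Amount = source.Amount
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (ErpId, Name, Amount)
        VALUES (source.ErpId, source.Name, source.Amount)
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE;  -- removes rows that were deleted in the ERP

With roughly 150,000 rows and about 1,000 changes a day, pulling the full snapshot on a schedule and merging it like this is usually simpler than trying to detect individual changes without webhooks.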
What are the steps to be taken to migrate historical data load from Teradata to Snowflake?
Imagine there is 200TB+ of historical data combined from all tables.
I am thinking of two approaches, but I don't have enough expertise and experience to execute them, so I'm looking for someone to fill in the gaps and offer some suggestions.
Approach 1- Using TPT/FEXP scripts
I know that TPT/FEXP scripts can be written to generate files for a table. How can I create a single script that generates files for all the tables in the database? (Creating 500-odd scripts, one per table, is impractical.)
Once this script is ready, how is it executed in practice? Do we create a shell script and schedule it through an enterprise scheduler like Autosys/Tidal?
Once these files are generated, how do you split them on a Linux machine if each file is huge (the recommended file size for loading into Snowflake is about 100-250 MB)?
How do I move these files to Azure Data Lake?
Use COPY INTO / Snowpipe to load into Snowflake Tables.
Approach 2
Using ADF copy activity to extract data from Teradata and create files in ADLS.
Use COPY INTO/ Snowpipe to load into Snowflake Tables.
Which of these two is the best suggested approach?
In general, what are the challenges faced with each of these approaches?
Using ADF will be a much better solution. It also allows you to design the data lake as part of your solution.
You can design a generic solution that imports all the tables listed in a configuration, choosing the recommended file format (Parquet), controlling the size of the files, and loading in parallel.
The main challenge you will encounter is the poorly working ADF connector for Snowflake; here you will find my recommendations on how to work around the connector problem and how to use Data Lake Gen2:
Trouble loading data into Snowflake using Azure Data Factory
More recommendations on how to structure Azure Data Lake Storage Gen2 can be found here: Best practices for using Azure Data Lake Storage Gen2
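Whichever route produces the files, the final load into Snowflake boils down to an external stage over the ADLS Gen2 container plus one COPY INTO per table. A rough sketch (account, container, token and table names are placeholders; a storage integration is generally preferable to a raw SAS token):

    -- External stage over the ADLS Gen2 container that ADF (or TPT) writes to
    CREATE OR REPLACE STAGE adls_stage
      URL = 'azure://myaccount.blob.core.windows.net/landing/teradata/'
      CREDENTIALS = (AZURE_SAS_TOKEN = '<sas-token>')
      FILE_FORMAT = (TYPE = PARQUET);

    -- One COPY INTO per table; these can be generated from the same
    -- configuration that drives the extraction
    COPY INTO edw.staging.customer
      FROM @adls_stage/customer/
      MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
      ON_ERROR = ABORT_STATEMENT;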
We chose Snowflake as our DWH and we would like to connect different data sources (Salesforce, HubSpot and Zendesk).
Is there a way to extract data from these sources and store it in Snowflake in a staging schema, without having to first store the data in cloud storage like S3 and then read it into Snowflake?
Many thanks in advance.
You can use any of the connectors Snowflake provides (ODBC, JDBC, Python, etc.) and any tool that can use one of these connectors. However, they won't perform well compared to the COPY INTO approach, which is optimised for bulk loading.
There are ETL tools, such as Matillion, that use the stage/COPY INTO approach but do it in the background, so it appears that you are loading directly into Snowflake.
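If you want to see what those tools are doing under the hood, it is essentially an internal stage, a PUT from the client machine, and a COPY INTO. A minimal sketch (stage, file and table names are made up; PUT is run from SnowSQL or a connector session, not from the web UI):

    -- Internal stage to hold the extracted files
    CREATE OR REPLACE STAGE staging.salesforce_stage
      FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"');

    -- Upload a locally extracted file (compressed automatically)
    PUT file:///tmp/accounts.csv @staging.salesforce_stage AUTO_COMPRESS = TRUE;

    -- Bulk load it into the staging table
    COPY INTO staging.salesforce_accounts
      FROM @staging.salesforce_stage/accounts.csv.gz
      ON_ERROR = CONTINUE;

You still need somewhere to land the extracted files temporarily (local disk or a VM), but nothing has to live permanently in S3 or blob storage that you manage.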
I have a Dynamics CRM 2011 instance where the database has become corrupted. The corrupted data appears to be isolated to a few tables (e.g. PrincipalObjectAccess) and the instance still functions normally to all appearances. The data is irretrievable (all forms of DBCC CHECKDB, etc. have been run) and a backup is not available (preaching on backups will not help resolve the issue).
I've tried using schema and data synchronization tools like those offered by dbForge and Red Gate; the schema sync works, but the data sync always seems to come up inconsistent.
At this juncture I think my best route is probably to export all data from Dynamics CRM 2011 and then import it into a new instance of Dynamics CRM 2011. Any thoughts on the best way to accomplish this? Or alternative methods of rectifying the situation?
Exporting all data and importing it into a new organization will likely create more errors, and I wouldn't really go with that option unless everything else fails.
You said data synchronization failed: have you tried deleting all data from the new instance first and then running the data synchronization? It should be simpler than synchronizing when data already exists there.
Have you tried synchronizing data using ApexSQL Data Diff?
Another option that doesn't require you to create a new organization is reading your SQL Server transaction logs and checking whether the corrupted data can be found there. If you can retrieve the data, you can just re-create the tables with valid data and you'll be all good. Unfortunately this is only possible using 3rd-party tools such as ApexSQL Log.
I would recommend looking into the CRM 2011 Instance Adapter
Unlike Scribe, it's free.
Microsoft blog post: http://blogs.msdn.com/b/crm/archive/2012/10/24/the-microsoft-dynamics-crm-2011-instance-adapter-has-released.aspx
PowerObjects wrote an article about it as well:
http://www.powerobjects.com/blog/2012/10/26/introduction-microsoft-dynamics-crm-2011-instance-adapter/
Peter
If you can, export to Excel and import from there.
Advantage: easy, fast, graspable
If you can't, design a console application that connects to the server, queries it, fetches data and shoves it into the other instance.
Advantage: full control, repeatability, configurability, coolness factor, and you get to type some code
This really depends on the scope of your data. Are you talking about millions of records with a huge list of entities or are you talking a couple entities with a thousand or so records?
If it's something small, you could always try exporting via excel and then importing into the new org.
SSIS, CozyRoc or Scribe will do the trick. I'd opt for Scribe and go entity by entity if it is a mission critical situation.
I have a web application that has a requirement to take data from an Excel workbook and load it to the database. I am using ADO.NET 2.0 and need to use DataSet/DataTable/DataAdapter/etc. to perform this task.
We also have a requirement of verifying the data before it is uploaded and informing the user if there are any PK/FK/Unique/Other constraint violations.
Currently, we are uploading the data from Excel into a DataTable and getting the data from the database into another DataTable, merging the two tables and (if everything merged successfully) then updating the database.
Can I use methods on DataTable or DataSet (or another ADO.Net data structure) to check for PK/FK/Unique/Other constraints? What would the recommended workflow be to upload the data to the database given my requirements?
Can I use methods on DataTable or DataSet (or another ADO.Net data structure) to check for PK/FK/Unique/Other constraints?
I don't think so.
Instead, you can start a transaction and catch the SqlExceptions.
Then you can check the Number property of the exception to see what kind of error it is (FK violation, unique constraint, etc.).
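For reference, the values that come back in SqlException.Number are the standard SQL Server error codes, and you can see the same numbers from a quick T-SQL sketch (throwaway tables; all names are made up):

    -- Throwaway parent/child tables just to trigger a constraint error
    CREATE TABLE dbo.Parent (Id INT PRIMARY KEY);
    CREATE TABLE dbo.Child  (Id INT PRIMARY KEY,
                             ParentId INT NOT NULL REFERENCES dbo.Parent(Id));

    BEGIN TRY
        INSERT INTO dbo.Child (Id, ParentId) VALUES (1, 999);  -- parent 999 does not exist
    END TRY
    BEGIN CATCH
        -- Typical values: 547 = FK/CHECK violation, 2627 = PK/UNIQUE constraint,
        -- 2601 = duplicate key in a unique index
        SELECT ERROR_NUMBER() AS ErrorNumber, ERROR_MESSAGE() AS ErrorMessage;
    END CATCH;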
What would the recommended workflow be to upload the data to the database given my requirements?
You may also consider using SQL Server Integration Services (SSIS). It allows you to do pretty much everything.
What you describe is a "standard" ETL job. Although you can do it with custom scripts, there are tools available that let you focus on the job itself rather than on custom development. Oracle Data Integrator and the Pentaho suite (open source) are both available for download and trial.