Need inputs on the requirement below: we need to migrate data from databases such as SQL Server, Teradata, and Oracle to AWS S3 first, to create a data lake.
Specifically looking for suggestions on:
--> Which AWS services can be used to achieve this, i.e. to migrate the data models along with the data.
--> Any tools that can help expedite the migration process.
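For concreteness, this is the kind of automation we are hoping to end up with, sketched with AWS DMS via boto3 (one candidate service we are evaluating, not a settled choice; every ARN and name below is a placeholder, and the sketch assumes a replication instance and source/target endpoints already exist):

# Sketch only: start a full-load AWS DMS task that lands source tables in S3.
# Assumes the replication instance and the source/S3 endpoints were already
# created (e.g. in the console); every ARN below is a placeholder.
import json
import boto3

dms = boto3.client("dms", region_name="us-east-1")

table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-all",
        "object-locator": {"schema-name": "%", "table-name": "%"},
        "rule-action": "include",
    }]
}

task = dms.create_replication_task(
    ReplicationTaskIdentifier="sqlserver-to-s3-full-load",
    SourceEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:SOURCE",
    TargetEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:S3TARGET",
    ReplicationInstanceArn="arn:aws:dms:us-east-1:123456789012:rep:INSTANCE",
    MigrationType="full-load",  # or "full-load-and-cdc" for ongoing changes
    TableMappings=json.dumps(table_mappings),
)

dms.start_replication_task(
    ReplicationTaskArn=task["ReplicationTask"]["ReplicationTaskArn"],
    StartReplicationTaskType="start-replication",
)

The open question for us is the data-model side (schema conversion), which a task like this does not cover.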
Regards,
Somen Swain
We have a requirement to migrate MySQL data running on AWS RDS to Snowflake. Any pointers/docs/references that can guide us would help.
The idea is to create a "data lake" in Snowflake.
The MySQL instance running on AWS needs to be migrated to the Snowflake data lake.
The data needs to be migrated as "semi-structured" data.
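For illustration, the shape of the load we have in mind, sketched with the Snowflake Python connector (assuming the MySQL rows were first exported as JSON files to an external stage; every name below is a placeholder):

# Sketch only: land MySQL rows exported as JSON into a single VARIANT column,
# Snowflake's semi-structured type. All names here are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="LOAD_WH", database="LAKE_DB", schema="RAW",
)
cur = conn.cursor()

# One VARIANT column holds the whole document for each source row.
cur.execute("CREATE TABLE IF NOT EXISTS mysql_raw (doc VARIANT)")

# Assumes an external stage pointing at the bucket with the JSON exports.
cur.execute("""
    COPY INTO mysql_raw
    FROM @mysql_export_stage
    FILE_FORMAT = (TYPE = 'JSON')
""")

# Semi-structured fields can then be queried with path notation.
cur.execute("SELECT doc:customer_id, doc:order_total FROM mysql_raw LIMIT 10")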
Regards,
Somen Swain
I am new to Azure and have no prior experience or knowledge of Azure data warehouse systems (now Azure Synapse Analytics).
I have access to a "read only" data warehouse (not in Azure) that looks like this:
I want to replicate this data warehouse as-is on the Azure cloud. Can anyone point me in the right direction (video tutorials or documentation) and outline the number of steps involved in this process? There are around 40 databases in this warehouse. And what if I wanted to replicate only specific ones?
You can't do that with only read-only permission. No matter which data warehouse, you need server admin or database owner permission to replicate a database.
You can confirm this in any of the documentation on database backup/migration/replication, for example: https://learn.microsoft.com/en-us/sql/t-sql/statements/backup-transact-sql?view=sql-server-ver15#permissions
If you have enough permissions, then you can do that. But for Azure SQL Data Warehouse, now called dedicated SQL pool (formerly SQL DW), you can't replicate an on-premises data warehouse to Azure directly.
The official documentation provides a way to import the data into an Azure SQL pool (formerly SQL DW):
Once your dedicated SQL pool is created, you can import big data with simple PolyBase T-SQL queries, and then use the power of the distributed query engine to run high-performance analytics.
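A minimal sketch of that PolyBase pattern, driven from Python with pyodbc just to make it runnable (the external data source, file format, and all table names are assumed to exist and are placeholders; the same T-SQL can be run from any client):

# Sketch only: PolyBase-style load into a dedicated SQL pool.
# The server, credentials, blob path, and table names are all placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myworkspace.sql.azuresynapse.net;DATABASE=mypool;"
    "UID=sqladmin;PWD=..."
)
conn.autocommit = True
cur = conn.cursor()

# External table over files exported from the on-prem warehouse.
cur.execute("""
CREATE EXTERNAL TABLE ext.SalesStaging (
    SaleId INT, Amount DECIMAL(18,2), SaleDate DATE
)
WITH (
    LOCATION = '/exports/sales/',
    DATA_SOURCE = MyBlobStorage,  -- assumed to exist already
    FILE_FORMAT = CsvFormat       -- assumed to exist already
)
""")

# CTAS pulls the data in through the distributed query engine.
cur.execute("""
CREATE TABLE dbo.Sales
WITH (DISTRIBUTION = HASH(SaleId))
AS SELECT * FROM ext.SalesStaging
""")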
You could also use another ETL tool to achieve the data migration from the on-premises data warehouse to Azure. For example, using Data Factory, combine these two tutorials:
Copy data to and from SQL Server by using Azure Data Factory
Copy and transform data in Azure Synapse Analytics by using Azure Data Factory
We are planning to implement a project in the Azure cloud where the data store will be Azure Data Lake for now; in the future HDP will be implemented, with ADLS as the extended datanode. From ADLS we want to expose data for dashboard creation using Tableau. The initial plan was to use Hive, with Tableau connecting to the data through Hive. But here comes the performance concern:
There will be multiple users (100+) who will have access to the data through Tableau.
We will also have to expose the data to a different portal via API calls.
This means multiple connections will be established at the same time, all hitting Hive. My questions are:
Can Hive serve this purpose with minimal latency?
How can I measure the performance?
I don't want my users to run a query in Tableau and then sit back and wait a long time to see the dashboard.
Would you please share your experience with this design issue? Should we use Hive, or some other tool that performs better with Tableau and HDFS storage? Someone suggested using Azure SQL Server and connecting Tableau to SQL Server, but that's the old-fashioned way again, and also a matter of cost, since the price is tied to the execution of each query.
If you have experience with a better solution, please share; it would be greatly appreciated.
Thanks in advance.
Hive LLAP could work, if you can get it installed.
Otherwise, at my work, we've had good experience with PrestoDB and Tableau on S3 data.
Some teams use Spark SQL, and you can set up a Spark Thrift Server, which should be compatible with the Hive JDBC/ODBC drivers.
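Since the Spark Thrift Server speaks the HiveServer2 protocol, clients such as PyHive can talk to it exactly as they would to Hive. A minimal sketch, with host, port, and table names as placeholders:

# Sketch only: query a Spark Thrift Server over the HiveServer2 protocol.
# Start the server first on the Spark side, e.g. sbin/start-thriftserver.sh.
from pyhive import hive

conn = hive.connect(host="thrift-server.example.com", port=10000,
                    username="tableau_svc")
cur = conn.cursor()

# The same kind of SQL a Tableau live connection would issue via ODBC.
cur.execute("SELECT region, SUM(sales) FROM dashboards.orders GROUP BY region")
for row in cur.fetchall():
    print(row)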
My current model looks like this:
Gather disparate data sources and import into SQL Server.
Process and transform data using SSIS packages.
The final step in the SSIS package uploads data to the data warehouse.
BI tools pull data from the data warehouse for end users.
Is this a logical workflow? I initially was going to use Data Factory and the Azure-SSIS integration runtime to process data. However, I didn't understand why those steps were needed, as it seems simpler in my situation to just build my SSIS packages on premises and upload the processed data to my data warehouse. What benefits would I gain from using Data Factory and the integration runtime? My main concern is that my current model will make automation difficult, but I'm not entirely sure. Any help is appreciated.
Your possible paths here would be SSIS on-prem, SSIS on a VM in the cloud, SSIS in ADF, or natively building the pipelines in ADF.
ADF is an Azure Cloud PaaS managed service for data movement and data integration orchestration. To reach back into on-prem data sources, you need to use an Integration Runtime gateway on the source side. So, if you are looking to move to a Cloud-first architecture or are migrating into Azure, ADF is a good solution (use V2).
If you are remaining all on-prem, SSIS on-prem is the best scenario.
If this is hybrid, where you will continue to have some data on-prem and load Azure Data Warehouse in the cloud, then you can still use SSIS on-prem with connectors into ADW as the target. Or, if you need to eliminate the local server concept, you can run that SSIS on a VM in Azure.
If you want to eliminate both the datacenter server and the need to patch, maintain, etc. the SSIS server, then use SSIS in ADF, which provides SSIS as a Service. In that case, you can still move data in a hybrid manner.
It really is going to depend on factors such as: are you more comfortable developing SSIS jobs in Visual Studio, or do you want to build the pipelines in JSON in ADF? Do you have a plan or a need to move to the cloud? Do you want to move to a cloud-managed service (i.e. ADF V2)?
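If the SSIS-in-ADF path appeals, here is roughly what kicking off a deployed package through the ADF Python SDK might look like (all resource names are placeholders; this assumes an Azure-SSIS integration runtime is already provisioned and the package is already deployed):

# Sketch only: run a deployed SSIS package through ADF's Execute SSIS Package
# activity. Resource group, factory, runtime, and package path are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    ExecuteSSISPackageActivity, IntegrationRuntimeReference,
    PipelineResource, SSISPackageLocation,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

run_package = ExecuteSSISPackageActivity(
    name="RunNightlyLoad",
    package_location=SSISPackageLocation(
        package_path="NightlyETL/NightlyLoad.dtsx"),
    connect_via=IntegrationRuntimeReference(
        reference_name="AzureSsisIntegrationRuntime"),
)

client.pipelines.create_or_update(
    "my-resource-group", "my-data-factory", "NightlyLoadPipeline",
    PipelineResource(activities=[run_package]),
)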
I hope that helps!!
I'm in the process of migrating from dedicated servers to Azure. In my existing SQL Server, I have a few jobs that move data from the live database to archives.
From what I have read so far, in Azure you cannot use cross-database scripts. The other options I have seen include Azure SQL Data Sync, Azure Data Factory, and maybe SSIS. I should note that there is some logic to what data is archived, and I need the ability to specify this in the query.
Has anyone some experience and what would you recommend?
Thanks
You can use the Copy activity inside of Data Factory to do this directly in Azure now.
Azure Data Factory
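The useful detail for your archiving requirement is that the Copy activity's source accepts an arbitrary query, so your archive logic can live there. A rough sketch via the ADF Python SDK, where the datasets, factory, and the Orders table are all assumed and hypothetical:

# Sketch only: an ADF Copy activity whose source query carries the archive
# logic. The datasets referenced here (live and archive tables) are assumed
# to be defined already; all names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureSqlSink, AzureSqlSource, CopyActivity, DatasetReference,
    PipelineResource,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

archive_copy = CopyActivity(
    name="ArchiveOldOrders",
    inputs=[DatasetReference(reference_name="LiveOrdersDataset")],
    outputs=[DatasetReference(reference_name="ArchiveOrdersDataset")],
    # The archiving rules live in this query, so anything expressible in
    # T-SQL can decide what gets moved.
    source=AzureSqlSource(
        sql_reader_query="SELECT * FROM dbo.Orders "
                         "WHERE OrderDate < DATEADD(year, -2, GETDATE())"),
    sink=AzureSqlSink(),
)

client.pipelines.create_or_update(
    "my-resource-group", "my-data-factory", "ArchivePipeline",
    PipelineResource(activities=[archive_copy]),
)

Note that a copy only moves the rows; deleting them from the live database afterwards would still need its own step.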