Replace SSIS with PowerShell? - sql-server

Some people don't like SSIS for the following reasons:
You need to find and click through options scattered in different places when designing even a slightly more complex package.
The Merge and Lookup components don't perform well. I've heard a lot of consultants simply recommend loading the data into SQL Server tables and doing the transformations in Transact-SQL.
I've used PowerShell in a small project to export data and create CSV files, and I like it. Is there a trend toward replacing some of the tasks traditionally done in SSIS with PowerShell, especially in export-only cases?
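For illustration, the kind of export I'm talking about is only a few lines of PowerShell. This is a minimal sketch assuming the SqlServer module is installed; the server, database and table names are made up:

# Minimal export-to-CSV sketch (hypothetical server/database/table names).
# Assumes the SqlServer module: Install-Module SqlServer
Import-Module SqlServer

Invoke-Sqlcmd -ServerInstance "MyServer" -Database "MyDb" `
              -Query "SELECT CustomerID, Name, CreatedDate FROM dbo.Customers" |
    Select-Object CustomerID, Name, CreatedDate |   # keep only the data columns before export
    Export-Csv -Path "C:\exports\customers.csv" -NoTypeInformation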

For very small projects and tasks, PowerShell is an OK tool.
For projects that need to be robust, maintainable, and modular, and to handle errors and auditing, SSIS is vastly superior.
The truth is, too many SSIS implementations are crafted by devs who don't understand the strengths of the product. They simply try to replicate their current T-SQL ETL process in SSIS with minimal effort and little use of its capabilities. Performance issues almost always go right along with this.
SSIS is not just a GUI way to get stored procedures and T-SQL to run automatically. If you really want to learn more on the subject, I suggest picking up a few books, and be careful about listening to narrowly focused experts; their skill sets can easily fade from relevance and hold others back with them.
A PowerShell trend away from SSIS? Not anywhere close, where it counts.

This is an old topic, but I find it well worth discussing, so I'd like to offer a few reasons why I think SSIS is a bad choice as an ETL tool 99% of the time.
At this time, the only thing I can think of where SSIS beats PowerShell is its performance when handling huge amounts of data with multiple sources/targets, mainly due to SSIS's internal parallelism and caching capabilities.
However, SSIS is notorious for its error messages and is almost impossible to debug once packages are deployed. From a source-control perspective, SSIS packages are XML files that are difficult to compare between versions, and they are very fragile once either the source or the target objects change even slightly (like a target column being widened by one character).
In my prod environment, there are many SSIS packages deployed and scheduled with SQL Agent jobs, so when a job fails there is no way for me to figure out the problem until I go to TFS, find the SSIS project, and open it in Visual Studio to work out the logic. It is a nightmare.
With PowerShell, the code you see is the code executed; you can always read the logic from the PS code and troubleshoot along the way.
With the many open-source PS modules available these days, PowerShell's capabilities keep growing; it is indeed time to consider PS as an alternative tool to SSIS.
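As an illustration of that transparency, a typical scheduled PS step is just code you can read, log and rerun. A minimal sketch, assuming the SqlServer module and made-up server, database and table names:

# A plain, readable ETL step: everything it does is visible in the script itself.
# Assumes the SqlServer module; server/database/table names are hypothetical.
Import-Module SqlServer

Start-Transcript -Path "C:\etl\logs\load_$(Get-Date -Format yyyyMMdd_HHmmss).log"
try {
    # Extract today's rows from the source
    $rows = Invoke-Sqlcmd -ServerInstance "SrcServer" -Database "SourceDb" `
                          -Query "SELECT * FROM dbo.DailySales WHERE SaleDate = CAST(GETDATE() AS date)"

    # Transform (trivial example: keep only completed sales)
    $clean = @($rows | Where-Object { $_.Status -eq 'Completed' })

    # Load into a staging table on the target server
    Write-SqlTableData -ServerInstance "TgtServer" -DatabaseName "WarehouseDb" `
                       -SchemaName "stg" -TableName "DailySales" -InputData $clean

    Write-Output "Loaded $($clean.Count) rows."
}
catch {
    Write-Error "ETL step failed: $_"
    throw   # surface the failure to SQL Agent / the scheduler
}
finally {
    Stop-Transcript
}

When this fails at 3 a.m., the transcript and the script itself are all you need to see what went wrong; there is no package to open in Visual Studio.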

Related

Migrate data from MariaDB to SQLServer

We are planning to migrate all the data from MariaDB to SQL Server. Can anyone suggest an approach to migrate the data so that no downtime is required and no data is lost?
In that context, I have gone through a few posts here, but did not get much of an idea.
You could look into SQL Server Integration Services functionality for migrating your data.
Or you could manually create a migration script using a linked server in your new SQL Server instance.
Or you could use BCP to perform bulk imports (which is quite fast, but requires intermediate steps to put the data in text files).
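For reference, the BCP step is a one-liner per table once the flat files exist; a rough sketch from PowerShell with made-up names (the text file would be produced on the MariaDB side first, e.g. with SELECT ... INTO OUTFILE or mysqldump --tab):

# Bulk-load a tab-delimited text file into a SQL Server staging table with bcp.
# Hypothetical server/database/table/file names; -T uses a trusted connection.
& bcp "MyDb.dbo.Customers_staging" in "C:\migration\customers.txt" `
      -S "MySqlServer" -T -c -t "`t" -r "`n" `
      -b 50000 -e "C:\migration\customers.err"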
What's more important is how you want to realize the "no downtime" requirement. I suppose the migration routines will have some functional requirements that might be difficult to implement with a general migration tool, like:
the possibility to perform the migration in multiple batches/runs (where already migrated data is skipped), and
the possibility to implement different phases of the migration in different solutions, like bulk imports (using text files and staging tables) for history data (which will not change anymore), but live queries over a live database connection for the latest updates in the MariaDB/MySQL database.
The migration strategy might also largely depend on the size of the data in MariaDB/MySQL, and the structure of the database(s) and its data. Perhaps you want to keep auto-generated primary key values, because the system requires them to remain unchanged. Perhaps you need to use different data types for some exotic table fields. Perhaps you need to re-implement some database logic (like stored procedures and functions). Etc. etc.
It is very difficult to give ad-hoc advice about these kinds of migration projects; as Tim Biegeleisen already commented, this can be quite a complex job, even for "small" databases. It practically always requires a lot of research, extensive preparation, test runs (in a testing environment using database backups), some more test runs, a final test run, etc. And - of course - some analytics, metrics, logging, and reporting for troubleshooting (and to know what to expect during the actual migration). If the migration will be long-running, you want to make sure it does not freeze the live production environment, and you might also want some form of progress indication during the migration.
And - last but not least - you surely want to have a "plan B" or a quick rollback strategy in case the actual migration fails (despite all those careful preparations).
Hope I did not forget something... ;-)

Extract & transform data from Sql Server to MongoDB periodically

I have a Sql Server database which is used to store data coming from a lot of different sources (writers).
I need to provide users with some aggregated data, but in SQL Server this data is stored in several different tables and querying it is too slow (a 5-table join with several million rows in each table, one-to-many).
I'm currently thinking that the best way is to extract the data, transform it, and store it in a separate database (let's say MongoDB, since it will be used only for reads).
I don't need the data to be live, just no older than 24 hours compared to the 'master' database.
But what's the best way to achieve this? Can you recommend any tools for it (preferably free) or is it better to write your own piece of software and schedule it to run periodically?
I recommend avoiding the NIH (not-invented-here) trap here; reading and transforming data is a well understood exercise. There are several free ETL tools available, with different approaches and focus. Pentaho (formerly Kettle) and Talend are UI-based examples, while frameworks like Rhino ETL merely hand you a set of tools to write your transformations in code. Which one you prefer depends on your knowledge and, unsurprisingly, preference. If you are not a developer, I suggest using one of the UI-based tools.
I have used Pentaho ETL in a number of smaller data warehousing scenarios; it can be scheduled using operating system tools (cron on Linux, Task Scheduler on Windows). More complex scenarios can make use of the Pentaho PDI repository server, which allows central storage and scheduling of your jobs and transformations. It has connectors for several database types, including MS SQL Server. I haven't used Talend myself, but I've heard good things about it and it should be on your list too.
The main advantage of sticking with a standard tool is that once your demands grow, you'll already have the tools to deal with them. You may be able to solve your current problem with a small script that executes a complex select and inserts the results into your target database (see the sketch below). But experience shows those demands seldom stay the same for long, and once you have to incorporate additional databases or maybe even some information in text files, your scripts become less and less maintainable, until you finally give in and redo your work in a standard toolset designed for the job.
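To be fair, the "small script" version really can be tiny. A rough PowerShell sketch with made-up names, assuming the SqlServer module and the mongoimport command-line tool are available:

# Nightly refresh: pull the aggregated result set from SQL Server,
# dump it to JSON, and reload a read-only MongoDB collection.
# Hypothetical server/database/collection/file names.
Import-Module SqlServer

$rows = Invoke-Sqlcmd -ServerInstance "MyServer" -Database "MyDb" `
                      -InputFile "C:\etl\aggregate_report.sql"   # the big 5-table join

$rows | Select-Object * -ExcludeProperty RowError, RowState, Table, ItemArray, HasErrors |
    ConvertTo-Json -Depth 3 |
    Set-Content -Path "C:\etl\report.json" -Encoding UTF8

# --drop replaces yesterday's data, so the collection is never more than ~24h old
& mongoimport --uri "mongodb://localhost:27017/reports" --collection "aggregates" `
              --file "C:\etl\report.json" --jsonArray --drop

Scheduled via cron or Task Scheduler, that meets the 24-hour freshness requirement - and it is exactly the kind of script that stops being maintainable once a second source or a second target shows up.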

In this circumstance, is it better to use an SSIS package, or just script out the job?

Forewarning: I wasn't entirely sure if this question should go here (SO) or on The Workplace, because it isn't so much about programming as it is about convincing my co-worker that their method is bad. But it's still programming related. So MODs, please feel free to relocate this question to The Workplace. Anyway...
At work we have large SSAS cubes that have been split into multiple partitions. The individual who set up these partitions scheduled every partition to be processed every day. But in hindsight, because the data in these partitions is historic, there is no need to process each partition every day. Only the current partition needs to be processed, after the latest data has been added to the cube's data source.
I gave my coworker a task to automate this process. I figured all they need to do is get the current date, and then process the partition corresponding to that date range. Easily scriptable.
My coworker creates an SSIS package for doing this...
Cons:
the ssis package is hard to source control
the ssis package will be hard to test
the ssis package is a pain in the ass to debug
the ssis package requires Visual Studio and Visual Studio Data Tools to even open
lastly, I feel SSIS packages lead to heavy technical debt
Pros:
it's easier for my coworker to do (maybe)
Correct me if I'm wrong on any of those, but just the first reason is enough for me to want to scrap all of their work.
Needless to say, I'm extremely biased against anything done in SSIS. But processing a cube can be scripted out in XMLA (ref: link). And then, using a SQL Server Agent job, you can schedule that script to run at specific times. The only tricky part would be swapping out the partition name that is processed within the script. Furthermore, the script/job can be kept in source control and then deployed to the MSSQL server whenever a change is made.
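For example, the entire job could boil down to something like this - a rough sketch assuming the SqlServer module's Invoke-ASCmd and a made-up cube with one partition per month named Sales_YYYYMM:

# Process only the current month's partition of an SSAS cube.
# Hypothetical database/cube/measure-group/partition names.
Import-Module SqlServer

$partitionId = "Sales_{0:yyyyMM}" -f (Get-Date)

$xmla = @"
<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>SalesCube</DatabaseID>
    <CubeID>Sales</CubeID>
    <MeasureGroupID>FactSales</MeasureGroupID>
    <PartitionID>$partitionId</PartitionID>
  </Object>
  <Type>ProcessFull</Type>
</Process>
"@

Invoke-ASCmd -Server "MySsasServer" -Query $xmla

That single script can be a SQL Server Agent job step (a PowerShell step, or the XMLA itself in an Analysis Services Command step), and it lives happily in source control as a plain text file.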
Am I being too critical here? I'm just trying to keep the next developers from ripping their hair out.
What you can do is to have two SQL Jobs:
1) Full processing + repartitioning
2) Incremental processing (processing of only last (current) partition).
You don't need SSIS for either (1) or (2).
For (2) the script is fixed: you just make a call to process one partition, plus incremental processing of dimensions (if required). The current partition must have a WHERE >= .... condition (not BETWEEN), so it covers future dates if a new partition has not been created yet.
For (1), you can write T-SQL code that creates a new partition for the new period and reprocesses the cube. It can be scheduled to run over the weekend when the server is idle, or once per month.
(1) does the following:
backup existing cube (Prod) via SSAS Command of SQL Agent
restore the backup as TempCube via SSAS Command of SQL Agent with AllowOverwrite (in case the temp cube was not deleted before)
delete all partitions in TempCube via TSQL + LinkedServer to SSAS
re-create partitions and process the cube (full) via TSQL + LinkedServer to SSAS
Backup TempCube
Delete TempCube
Restore backup of TempCube as Production Cube.
As you can see, the process is crash-safe and you don't need SSIS. Even if the job that creates a new partition wasn't run for some reason, the cube still has the new data. The data will be split when a new partition structure is created by (1).
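The backbone of (1) can also be driven from PowerShell instead of SQL Agent SSAS Command steps and a linked server, if that is preferred. A rough, partial sketch with made-up database and file names, assuming the SqlServer module's Invoke-ASCmd:

# Crash-safe full reprocess, roughly following the steps above.
# Hypothetical names; the partition DDL itself is omitted.
Import-Module SqlServer
$srv = "MySsasServer"

# 1. Back up the production cube
Invoke-ASCmd -Server $srv -Query @"
<Backup xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object><DatabaseID>SalesCube</DatabaseID></Object>
  <File>E:\SSASBackup\SalesCube.abf</File>
  <AllowOverwrite>true</AllowOverwrite>
</Backup>
"@

# 2. Restore it as TempCube (overwriting a leftover copy, if any)
Invoke-ASCmd -Server $srv -Query @"
<Restore xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <File>E:\SSASBackup\SalesCube.abf</File>
  <DatabaseName>TempCube</DatabaseName>
  <AllowOverwrite>true</AllowOverwrite>
</Restore>
"@

# 3.-4. Drop and re-create the partitions on TempCube and ProcessFull it
#       (generate the partition XMLA per period, as described above).
# 5.-7. Back up TempCube, delete it, and restore that backup over the
#       production cube with the same Backup/Restore/Delete commands.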
I think you are looking at this the wrong way. To be honest, your list of cons is pretty weak and is just a reflection of your opinion of SSIS. There is always more than one tool in the toolbox for any job. The proper tool to use will vary from shop to shop.
If the skill set of the party responsible for developing and maintaining this automated process is SSIS, then you should really have a better reason than personal preference to rewrite the process with a different tool. A couple of reasons I can think of are company standards and the skill set of the team.
If a company standard dictates the tool, then follow that. You should have staff who are capable of using the tools the company mandates. Otherwise, assess the skill set of your team. If you have a team of SSIS developers, don't force them to use something else because of your personal preference.
Your task of dynamic SSAS partition processing can be automated with or without SSIS. SSIS is just an environment in which to execute tasks and do data manipulation. On the pro side, it has built-in components that execute an XMLA script from a variable and capture error messages. In pure .NET you have to do that yourself, but it is not too complex.
Several sample approaches to your task:
Create the XMLA and execute it with SSIS.
Generate the XMLA from the AMO library and execute it in .NET. You need to look at chapter 4d) Process All Dims. The provided sample does more than that, and the steps are put into an SSIS package as well.
I personally used SSIS in a similar situation, probably because the other 99% of the ETL logic and data manipulation was in SSIS. As said before, SSIS offers no significant advantage here. The second example shows how to do it in pure .NET.
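For completeness, the AMO route does not even require a compiled .NET project; the same calls can be made from PowerShell. A minimal sketch for a multidimensional cube, with made-up object names, assuming the AMO assembly (Microsoft.AnalysisServices) is installed:

# Process one partition through AMO instead of hand-built XMLA.
# Hypothetical server/database/cube/measure-group/partition names.
Add-Type -AssemblyName "Microsoft.AnalysisServices"

$server = New-Object Microsoft.AnalysisServices.Server
$server.Connect("MySsasServer")
try {
    $partitionId = "Sales_{0:yyyyMM}" -f (Get-Date)

    $db        = $server.Databases.FindByName("SalesCube")
    $cube      = $db.Cubes.FindByName("Sales")
    $mg        = $cube.MeasureGroups.FindByName("FactSales")
    $partition = $mg.Partitions.FindByName($partitionId)

    # Full process of just the current partition
    $partition.Process([Microsoft.AnalysisServices.ProcessType]::ProcessFull)
}
finally {
    $server.Disconnect()
}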

What is the current trend for SQL Server Integration Services?

Could anybody tell me what the current trend for SQL Server Integration Services is? Is it better than other ETL tools available in market like Informatica, Cognos, etc?
I was introduced to SSIS a couple of weeks ago. Executive summary: I am unlikely to consider it for future projects.
I'm pretty sure flow charts (i.e. non-structured programming) were discredited as an effective programming paradigm a long time ago, except in a tiny minority of cases.
There's no point replacing a clean textual (source code) interface with a colourful connect-the-dots one if the user still needs to think like a programmer to know where to drag the arrows.
A program design that you can't access (e.g. fulltext search, navigate using alternative methods, effectively version control, ...) except by one prescribed method is a massive productivity killer. And a wonderful source of RSI.
It's possible there is a particular niche where it's just right, but I imagine most ETL tasks would outgrow it pretty quickly.
SSIS isn't great for production applications, in my experience, for the following reasons:
To call an SSIS package remotely, you have to call a stored procedure, which calls a job, which calls the SSIS package.
Using the above method, you can't pass in parameters.
Passing parameters means you have to call SSIS on the local server - meaning code running on a remote server has to call code running on the SQL Server box to execute the package.
I would always rather write specific code to handle ETL and use SSIS for one off transforms.
In my opinion it's quite a good platform, and I see good progress on it. Many of the drawbacks that the 2005 version had, and that the community complained about, have been corrected in 2008.
From my point of view, the best thing is that you can extend and complement it with SQL or .NET code, in an organized way, as much as you want.
For instance, you can decide whether your solution is 80% C# code and 20% ETL components, or 5% C# code and 95% ETL components.
Disclaimer: I work for Microsoft.
Now the answer:
SSIS, or SQL Server Integration Services, is a great tool for ETL operations, and there is a lot of uptake in the marketplace. There is no additional cost beyond licensing SQL Server, and you can also use .NET languages to write tasks.
http://www.microsoft.com/sqlserver/2008/en/us/integration.aspx
http://msdn.microsoft.com/en-us/library/ms141026.aspx
I would list as benefits:
you use SSIS for bigger projects, probably/preferably once or in one run, and then use the integration project for many months with minor changes; the tasks, packages and everything in general are easily readable (depending on your perspective, of course)
the tool itself handles the scheduled runs and sends you mails with the logs, and - as far as my experience goes - it communicates very well with all the other tools (such as SSAS, SQL Server Management Studio, Microsoft Office Excel, Access, etc., and other, non-Microsoft tools)
the manually, in-detail configured tasks take on the responsibility in all respects, leaving only a small chance for errors
as also mentioned above, many former problems have been corrected in the new versions
I would recommend it for ETL, especially if you will continue with analytical processing, since the SSIS, SSAS and SSRS tools blend together quite smoothly.
Drawback: debugging and looking for errors is a bit harder until you get used to it.

Consideration DECS vs SSIS?

I need a solution to pump data from Lotus Notes to SQL Server. Data will be transferred in two modes:
Archive data transfer
Current data transfer
Availability of the data in SQL Server is not critical; the data is used for reports. Reports could be created daily, weekly or monthly.
I am considering choosing one of these solutions: DECS or SSIS. Could you please give me some tips about the pros and cons of both technologies? If you suggest something else, it can also be taken into consideration.
DECS - Domino Enterprise Connection Services
SSIS - SQL Server Integration Services
I've personally used XML frequently to get data out of Lotus Notes in a way that can be read easily by other systems. I'd suggest you take a look and see if that fits your needs. You can create views that emit XML or use NotesAgents or Java Servlets, all of which can be accessed using HTTP.
SSIS is a terrific tool for complex ETL tasks. You can even write C# code if you need to. There are lots of pre-written data-cleaning components already out there for you to download if you want. It can pretty much do anything you need to do. It does, however, have a fairly steep learning curve. SSIS comes free with SQL Server, so that is a plus. A couple of things I really like about SSIS are the ability to log errors and the way it handles configuration, so that moving a package from the dev environment to QA and Prod is easy once you have set it up.
We have also set up a metadata database to record a lot of information about our imports, such as the start and stop time, when the file was received, the number of records processed, the types of errors, etc. This has really helped us in researching data issues, and it has helped us write processes that are automatically stopped when a file deviates from the normal parameters by a set amount. This is handy if you normally receive a file with 2 million records and the file comes in one day with 1,000 records. Much better than deleting 2,000,000 potential customer records because you got a bad file. We also now have the ability to report on files that were received but not processed, or files that were expected but not received. This has tremendously improved our importing processes (we have hundreds of imports and exports in our system). If you are designing from scratch, you might want to take some time and think about what metadata you want to capture and how it will help you over time.
Now, depending on your situation at work: if there is a possibility that data will also be sent to the SQL Server database from sources other than the Lotus Notes imports you are developing, I would suggest it might be worth your time to go ahead and start using SSIS, as that is how the other imports are likely to be done. As a database person, I would prefer to have all the imports I support using the same technology.
I can't say anything about DECS as I have never used it.
Just a thought - but as Lotus Notes tends to behave a bit "differently" from relational databases (or anything else), you might be safer going with a tool that comes out of the Notes world, rather than a tool from the SQL world.
(I have used DECS in the past (prior to Domino 8) and it has worked fine for pumping data out into a SQL Server database. I have not used SSIS).
