How to validate a large data set from PowerShell against a SQL Server table - sql-server

I have a couple hundred rows of data, about 100 bytes each, and need the field values validated against a fairly large (millions of rows) table in SQL Server. The query itself on SQL is very quick, but running a single query per row from the script is slow, probably because of the connection set-up and teardown. I can't add any stored procs to the server. Is there any way to pass a dataset/table to a query from the script and return the results, rather than iterating over the dataset row by row? Of course PowerShell code samples would be welcome ;)

There won't be a way via PowerShell (or any other method I am aware of), as SQL Server can't accept a table as an input parameter in a query. Any method you use for this will at some point be passing a row/specific values to SQL.
What you COULD do is create a temp table from your array and compare that temp table to your large table for validation. Determining whether this is faster will need some testing, though.
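Here is a rough T-SQL sketch of that temp-table comparison (the staging table, the dbo.BigTable name, and the KeyValue column are placeholders, not your real schema). The idea is to load the couple hundred rows once over a single open connection, then validate them all in one set-based query instead of one round trip per row:

-- Hypothetical staging table for the values to validate
CREATE TABLE #staging (KeyValue varchar(100) NOT NULL);

-- ...load the rows from the script in one shot here (bulk copy or a
-- multi-row INSERT), then validate everything with a single query:
SELECT s.KeyValue,
       CASE WHEN b.KeyValue IS NULL THEN 0 ELSE 1 END AS IsValid
FROM   #staging AS s
LEFT JOIN dbo.BigTable AS b
       ON b.KeyValue = s.KeyValue;

DROP TABLE #staging;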

Related

Monthly report automated with VBA + SQL Server Stored Procs?

I am trying to completely automate this process, and I'm wondering if it's viable or efficient to do in VBA.
The report process involves two files: one SQL file and one Excel file.
The SQL file has the algorithm, and the final step is a query whose result is then pasted into the Excel file.
The algorithm is simpler (than what the audience might be used to) but has two "into" commands and several "update" commands.
Of the two "into" commands, the first grabs a small portion (constrained to the first and last day of the previous month) of a 500M+ record table. The second joins the first table with an eligibility type table.
After the second table is created, there is a series of UPDATE commands that change existing data in existing columns.
Then a series of ALTER & UPDATE commands add new columns to the [second] table and UPDATE them with the desired data.
The final step is a query whose results are copy-pasted into Excel (as is, no formatting changes necessary).
I'm not too well-versed in VBA/VB.NET, nor in T-SQL stored procedures and dynamic SQL. If the SQL algorithm were a simple pull query with no table creation, I could build something to automate that. But the SQL has two table creations and about a dozen ALTER & UPDATE commands.
Am I stirring up the wrong nest? Should I run it manually as is?
You can definitely automate this. I created a report that ran two stored procedures and numerous queries with temp tables, including both UPDATE and ALTER commands, then used VBA to execute these and aggregate the data in the final summary sheet.
There is a ton of documentation out there. You can even pass your values to the stored procedure after the user inputs them.
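As a rough illustration of the "pass your values" part, the SQL side could be wrapped like this so the month boundaries come in as parameters from VBA/ADO instead of being edited in the SQL file. All object names below are placeholders, not the real report schema:

CREATE PROCEDURE dbo.usp_MonthlyReport
    @StartDate date,
    @EndDate   date
AS
BEGIN
    SET NOCOUNT ON;

    -- First "into": grab the month's slice of the 500M+ row table
    SELECT *
    INTO   #monthly
    FROM   dbo.BigSourceTable
    WHERE  EventDate >= @StartDate
      AND  EventDate <  DATEADD(day, 1, @EndDate);

    -- The second "into" plus the ALTER/UPDATE steps would follow here...

    -- Final step: the result set that VBA copies into Excel
    SELECT * FROM #monthly;
END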
I would add this as a comment but I do not have enough reputation to comment yet (need 50).

How to run SSIS packages dynamically?

We have a large production MSSQL database (the mdf is approx. 400 GB) and I have a test database. All the tables, indexes, views, etc. are the same in both. I need to make sure that the data in the tables of these two databases stays consistent, so I need to insert all the new rows and update all the changed rows into the test DB from production every night.
I came up with the idea of using SSIS packages to keep the data consistent by checking for updated and new rows in all the tables. My SSIS flow is:
I have a separate package in SSIS for each table. In order:
1. I get the timestamp value in the table in order to pull only the last day's rows instead of the whole table.
2. I get those rows from the production table.
3. Then I use the Lookup tool to compare this data with the test database table's data.
4. Then I use a Conditional Split to determine whether each row is new or updated.
5. If the row is new, I insert it into the destination.
5_2. If the row is updated, I update the data in the destination table.
The data flow is in the MTRule and STBranch packages in the picture.
The problem is, I am repeating this single flow for every table, and I have more than 300 tables like this. It takes hours and hours :(
What I'm asking is: is there any way in SSIS to do this dynamically?
PS: Every single table has its own columns and PK values, but my data flow schema is always the same (below).
You can look into BiMLScript, which lets you create packages dynamically based on metadata.
I believe the best way to achieve this is to use Expressions. They empower you to dynamically set the source and Destination.
One possible solution might be as follows (see the sketch below the list):
1. Create a table which stores all your table names and PK columns.
2. Define a package which loops through this table and builds a SQL statement for each entry.
3. Call your main package and pass the statement to it.
4. Use the statement as the data source for your Data Flow.
5. If applicable, pass the destination table as a parameter as well (another column in your config table).
This is how I processed several really huge tables: the data had to be fetched from 20 tables and moved to one single table.
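For what it's worth, here is a minimal sketch of what the config table and the statement-building step could look like. The names and the LastUpdated column are illustrative only (the real timestamp column can differ per table):

CREATE TABLE dbo.TableConfig
(
    TableName        sysname       NOT NULL,
    PKColumns        nvarchar(400) NOT NULL,   -- e.g. 'OrderID' or 'OrderID,LineNo'
    DestinationTable sysname       NOT NULL
);

-- The looping package can then build the source statement per row:
SELECT TableName,
       DestinationTable,
       'SELECT * FROM ' + QUOTENAME(TableName) +
       ' WHERE LastUpdated >= DATEADD(day, -1, GETDATE())' AS SourceStmt
FROM   dbo.TableConfig;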
Why do you need to use SSIS?
You are better off writing a stored procedure that takes the table name as a parameter and does your CRUD there. Then call the stored procedure from a FOR EACH component in SSIS.
In fact you might be able to do everything using a Stored Procedure and scheduling it in a SQL Agent Job.
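A rough sketch of that kind of procedure, using dynamic SQL. The linked server PROD, the Id key, and the LastUpdated timestamp column are assumptions here; in practice the key columns would come from metadata, since every table has its own PK:

CREATE PROCEDURE dbo.usp_SyncTable
    @TableName sysname
AS
BEGIN
    SET NOCOUNT ON;

    -- Insert rows changed in the last day that do not exist in the test copy yet
    DECLARE @sql nvarchar(max) =
        N'INSERT INTO dbo.' + QUOTENAME(@TableName) + N'
          SELECT src.*
          FROM   PROD.SourceDb.dbo.' + QUOTENAME(@TableName) + N' AS src
          WHERE  src.LastUpdated >= DATEADD(day, -1, GETDATE())
            AND  NOT EXISTS (SELECT 1
                             FROM   dbo.' + QUOTENAME(@TableName) + N' AS dst
                             WHERE  dst.Id = src.Id);';
    EXEC sys.sp_executesql @sql;

    -- A matching UPDATE for changed rows would be built the same way.
END

The SQL Agent job (or the SSIS loop) would then just call EXEC dbo.usp_SyncTable 'SomeTable' once per table name.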

Bulk Update with SQL Server based on a list of primary keys

I am processing data from a database with millions of rows. I am pulling a batch of 1000 items from the database and processing them without a problem. I am not loading the whole entity and I am just pulling down a few columns of data for the batch.
What I want to do is mark the 1000 rows as processed with a single SQL command.
Something like:
UPDATE dbo.Clients
SET HasProcessed = 1
WHERE ClientID IN (...)
The ... is just a list of integers.
Context:
Azure SQL Server 2012
Entity Framework
Database.ExecuteSqlCommand
Ideas:
I know I could build the command as a pure string, but this would mean not using any SqlParameters and not benefiting from the query plan optimization.
Also, I found some information about table-valued parameters, but this requires creating a table type and some overhead which I would like to avoid. This is just a list of integers after all.
Question:
Is there an easy (performant) way to do this that I am overlooking either with Entity Framework or ExecuteSqlCommand?
If not and using table-valued parameters is the best way, could you provide a complete example of how to convert an integer list into the simplest type and running that with the above query?
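Here is what the T-SQL side of the table-valued-parameter route can look like; the type name dbo.IntList is just a suggestion. From Database.ExecuteSqlCommand you can then pass a SqlParameter with SqlDbType.Structured, TypeName = "dbo.IntList", and a one-column DataTable of ints as its value:

-- One-time setup: the simplest possible table type for a list of integers
CREATE TYPE dbo.IntList AS TABLE (Id int NOT NULL PRIMARY KEY);
GO

-- The statement itself (a local variable is declared here so it can be
-- tested in SSMS; the client supplies @ids as a structured parameter)
DECLARE @ids dbo.IntList;
INSERT INTO @ids (Id) VALUES (1), (2), (3);   -- sample values

UPDATE c
SET    c.HasProcessed = 1
FROM   dbo.Clients AS c
JOIN   @ids        AS i
       ON i.Id = c.ClientID;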

Combining MSSQL results with Oracle into CFSPREADSHEET

I have a report that generates an excel file daily with data extracted from a MS-SQL database. I now have to add additional columns to my spreadsheet from an Oracle database where the ID matches the ID in the MS-SQL query results.
My problem is that I have about 1200-1400+ unique IDs generated on this report from the first query. When I plug them into an IN list in the Oracle query and do a CFDUMP to see if the results come out as they should, I receive a CF error saying that the query cannot list more than 1000 results from the Oracle query.
I basically set the values from the first query into a valueList for the ID column and then put that into the IN clause of the Oracle query. I then do a cfdump on the Oracle query, where I receive that error. I've also tried wrapping <cfloop query="firstquery"> around the Oracle query and just placing #firstquery.columnIDname#, but that does not work either.
So the two questions I have here are:
How do I handle Oracle's 1,000-item limit if I only have read-only access to the Oracle database from ColdFusion?
After #1 is figured out, how can I combine the results from the Oracle query with my MSSQL query; in other words, add the columns I'm pulling from the Oracle query to the spreadsheet for the matching ID?
Thanks.
For a quick, dirty, and sub-optimal approach, visit cflib.org and look for a function called ListSplit(). It converts a long list into an array of shorter lists.
You then loop through this array and run a query each time. Make sure the query name changes with each loop iteration.
After the loop, do a query-of-queries UNION. Then do whatever you have to do to combine that data with what you got from SQL Server.
Note that you will probably have to use array notation to access your dynamically named query objects.

Traversing multiple CSV in SQL

I have a SQL Server 2008 database. This database has a stored procedure that will update several records. The IDs of these records are stored in a parameter that is passed in as a comma-delimited string. The property values associated with each of these IDs are passed in via two other comma-delimited strings. It is assumed that the lengths (in terms of tokens) and the order of the values are correct. For instance, the three strings may look like this:
Param1='1,2,3,4,5'
Param2='Bill,Jill,Phil,Will,Jack'
Param3='Smith,Li,Wong,Jos,Dee'
My challenge is, I'm not sure what the best way to actually go about parsing these three CSVs and updating the corresponding records is. I have access to a procedure named ConvertCSVtoTable, which converts a CSV to a temp table of records. So Param1 would return
1
2
3
4
5
after the procedure was called. I thought about a cursor, but then it seems to get really messy.
Can someone tell me/show me what the best way to address this problem is?
I'd give some thought to reworking the inputs to your procedure. Since you're running SQL 2008, my first choice would be to use a table-valued parameter. My second choice would be to pass the parameters as XML. Your current method, as you already know, is a real headache and is more error prone.
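As a sketch of the table-valued-parameter rework (the type, procedure, and table names here are illustrative, not your actual schema), the three parallel CSVs collapse into one structured parameter and the update becomes a single join:

CREATE TYPE dbo.PersonUpdate AS TABLE
(
    Id        int          NOT NULL PRIMARY KEY,
    FirstName nvarchar(50) NOT NULL,
    LastName  nvarchar(50) NOT NULL
);
GO

CREATE PROCEDURE dbo.usp_UpdatePeople
    @Updates dbo.PersonUpdate READONLY
AS
BEGIN
    SET NOCOUNT ON;

    UPDATE p
    SET    p.FirstName = u.FirstName,
           p.LastName  = u.LastName
    FROM   dbo.People AS p
    JOIN   @Updates   AS u
           ON u.Id = p.Id;
END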
You can use a bulk load to insert the values into a temp table, then PIVOT them and insert them into the proper one.
