SQL Server Management Studio: split a query result into multiple files by group - sql-server

This may sound like a bizarre question, so let me clarify.
I am currently exporting a bunch of rows from an MS SQL database to a file. The total is approximately 5M records with 10 fields.
The resulting file is huge, and the target software struggles to handle it.
What I'd like to do is split this query so that I get multiple smaller files instead of one huge file, grouped by one of the 10 fields, let's say by region.
Is that something SQL Server Management Studio can do? Otherwise, is there any other solution to my problem?
I have never worked with SQL functions; maybe they could help as well?
Thanks in advance for your help & have a great day!
Vincent

You can handle this in SQL, but if you have already produced the file and only need to split it, you can do so with an external tool.
See this question for how to do it on Windows using the command line:
Batch file to split .csv file
If it is a CSV file, as the tags of this question suggest, you will have to copy the first line into every file except the first, because the first line is the CSV header and your application will presumably need it in every part file.
The other solution would be to write an SQL statement to filter the results.
Say you want to filter by the regions field; you can write:
SELECT * FROM YourTable WHERE regions = 'SomeRegion'
This, however, is very simplistic, and you might need to do more work to get the intended result.
The number of distinct regions values might not match the number of parts you want, so you will need to figure out how to split across many region values. You could also implement some SQL partitioning of the result set, but I would say the file-splitting solution should be easier for you to apply.
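If you do want to produce the files directly from SQL Server, here is a minimal sketch using bcp driven by xp_cmdshell, one file per region. All names here (dbo.Export, MyDb, MYSERVER, C:\export) are illustrative, xp_cmdshell must be enabled (it is off by default), and the region values must be safe to use in file names.

-- Minimal sketch: one CSV file per distinct region via bcp.
DECLARE @region varchar(100), @cmd varchar(4000);

DECLARE region_cursor CURSOR FOR
    SELECT DISTINCT regions FROM dbo.Export;

OPEN region_cursor;
FETCH NEXT FROM region_cursor INTO @region;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @cmd = 'bcp "SELECT * FROM MyDb.dbo.Export WHERE regions = '''
             + @region + '''" queryout "C:\export\' + @region
             + '.csv" -c -t, -T -S MYSERVER';
    EXEC master..xp_cmdshell @cmd;   -- run one export per region
    FETCH NEXT FROM region_cursor INTO @region;
END

CLOSE region_cursor;
DEALLOCATE region_cursor;

Note that bcp in -c mode does not write a header row, so if the target software needs one you would still have to prepend it, as with the file-splitting approach above.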

Related

SQL query to read a text file and display only selected contents from it

I am working on something which requires me to run an SQL query to read a text file from a path, but it has to display only a few lines based on my conditions/requirements. I have read about using OPENROWSET/BULK copy, but it copies the entire file, and I need only certain data from the file.
Ex:
Line 1 - Hello, Good Morning.
Line 2 - Have a great day ahead.
Line 3 - Phone Number : 1112223333 and so on.
So, if I read this file and give the condition as "1112223333", it should display only the lines consisting of "1112223333".
NOTE: It should display the entire line of the matched case/condition
Is it possible to achieve this using an SQL query? If so, please help me with this.
Thanks in advance.
Unfortunately, what you're trying to do doesn't work with OPENROWSET. There is no way to apply a filter at read time; you're stuck with reading in the entire file. You can, of course, read it into a temp table and then delete the rows you don't want. This gives you the desired end result, but you have to take the hit of reading in the entire file.
You may be able to generate a script to filter the file server-side and trigger it with xp_cmdshell, but you'd still need to take the performance hit somewhere. While this would put a lower load on the SQL Server, you'd just be pushing the processing elsewhere, and you'd still have to wait for the processing to finish before you could read the file. It may be worth doing if the file is on a separate server and network traffic is an issue; if the file is on the same server, unless SQL is completely bogged down, I can't see an advantage to this.
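As a concrete sketch of the temp-table approach described above (the path and table names are illustrative, and the file must be readable by the SQL Server service account):

-- Minimal sketch: load every line of the file into a one-column
-- temp table, then keep only the lines matching the condition.
-- FIELDTERMINATOR = '\0' (the null character) stops BULK INSERT
-- from splitting lines on tabs.
CREATE TABLE #FileLines (Line nvarchar(max));

BULK INSERT #FileLines
FROM 'C:\data\input.txt'
WITH (ROWTERMINATOR = '\n', FIELDTERMINATOR = '\0');

SELECT Line
FROM #FileLines
WHERE Line LIKE '%1112223333%';   -- returns the whole matching line

DROP TABLE #FileLines;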

How to read and change series numbers in columns in SSIS?

I'm trying to manipulate a column in SSIS which looks like the sample below, after I removed unwanted rows with a Derived Column and a Conditional Split in my Data Flow Task. The source for this is a flat file.
XXX008001161022061116030S1TVCO3057
XXX008002161022061146015S1PUAG1523
XXX009001161022063116030S1DVLD3002
XXX009002161022063146030S1TVCO3057
XXX009003161022063216015S1PUAG1523
XXX010001161022065059030S1MVMA3020
XXX010002161022065129030S1TVCO3057
XXX01000316102206515901551PPE01504
The first three numbers after "XXX" (starting with "008" in the first row) represent a series, and the next three ("001") represent a sequence number within the series. What I need is to renumber all of the series, starting from "001" and continuing to the end.
The desired result would thus look like:
XXX001001161022061116030S1TVCO3057
XXX001002161022061146015S1PUAG1523
XXX002001161022063116030S1DVLD3002
XXX002002161022063146030S1TVCO3057
XXX002003161022063216015S1PUAG1523
XXX003001161022065059030S1MVMA3020
XXX003002161022065129030S1TVCO3057
XXX00300316102206515901551PPE01504
...
My potential solution would be to load the file into a temporary database table and query it with SQL from there, but I am trying to avoid this.
The final destination is a flatfile.
Does anybody have any ideas on how to pull this off in SSIS? Other solutions are also appreciated.
Thanks in advance
I would definitely use the staging-table approach and window functions to accomplish this. I could only see a case for doing it in SSIS if SSIS ran on a different machine than the database engine and there was a need to offload the processing to the SSIS box.
In that case I would create a script transformation. You can process each row and make the necessary changes before passing the row to the output. You can use C# or VB.
There are many examples out there. Here is MSDN article - https://msdn.microsoft.com/en-us/library/ms136114.aspx
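For the staging-table route, here is a minimal T-SQL sketch, assuming the raw lines have been staged into a single-column table (the names dbo.StagedLines and RawLine are illustrative): DENSE_RANK over the original series digits renumbers each series from 001 upward.

-- Minimal sketch: characters 4-6 of each line hold the series number.
-- DENSE_RANK gives consecutive ranks (1, 2, 3, ...) to the distinct
-- series values, and STUFF writes the zero-padded rank back in place.
SELECT STUFF(RawLine, 4, 3,
             RIGHT('00' + CAST(DENSE_RANK() OVER (
                       ORDER BY SUBSTRING(RawLine, 4, 3)) AS varchar(3)), 3))
       AS RenumberedLine
FROM dbo.StagedLines
ORDER BY RawLine;

The result could then be written back out with bcp or an SSIS flat-file destination.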

Better way to store updatable scientific data?

I have a file consisting of published scientific data. I use this file with a program that reads in the first 5 space-delimited data fields; everything after that is considered a comment by the program.
2 example lines (of thousands):
FeII 1608.4511 0.521 55.36 -1300 M03 Journal of Physics
FeII 1611.23045 0.0321 55.36 1100 01J AJ
The program reads it as:
FeII 1608.4511 0.521 55.36 -1300
FeII 1611.23045 0.0321 55.36 1100
These numbers are each measurements, and most (don't get me started) have associated errors that are not listed in this file. I would like to store this information in a useful and updatable way. That is, say the first entry, FeII 1608.4511, has an error of plus/minus 0.002; now consider when a new measurement changes it to FeII 1608.45034 plus/minus 0.0005. I would like to update the value and the error, and record some information about the publication it came from.
The program that uses this file is legacy code and is both crucial and inflexible: and it needs the file to look like the above output when it's read in. I would really like for there to be a way to update the input file to include things like errors on the values and publication hyperlinks in comments. I would also like a kind of version control ability to return the state of this large file today; or in 5 months after 20 more lines are updated with new values.
Any suggestions on how best to accomplish this? Should I store everything in some kind of database?
Databases are deeply tied to identity. If a database can't identify a row by the data that's in it, a database isn't going to help you.
If I were you, I'd start by storing the base file in a version control system, not a database. At 20 changes per 5 months, I'd probably make those changes manually and commit each batch of changes. (I don't know what might constitute a batch for you. Could be a single change every time.)
Since the format of the existing file is both crucial and brittle, I'm not sure whether modifying it is a good idea. I think I'd feel better about storing error ranges and publication hyperlinks in a separate file, and using a script to put the pieces together for applications that can use error ranges and hyperlinks.
A database sounds sensible; SQL Server Express is free and widely used.
You can read in the text file including all comments and output the edited data in the same format. You can use a number of front ends including Access, for rapid development, or something you create yourself in VB.Net, or even Excel, at a pinch.
You will need to consider the structure of the table(s) but it should not be too difficult, and you can get help here.
To update the information in the file, introducing errors and links, you don't need any database; just open the file, iterate through the lines, and update each one.
If you want to be able to restore a line's state, you definitely need some kind of database. You can create a database in SQL Server or Firebird, for example, and store in it a row for each historical state of a line (with a creation date, of course); the file itself would remain the repository for current values, and you would be able to restore the file for a given date with some simple fetching of the database information.
If you can't use a database like Firebird or SQL Server, you can store the historical data in a simple text file; it's up to you. Just remember that, as @CatCall commented, you will necessarily need a way to identify each line in order to create a relation between the line in the file and the historical data stored in your repository.
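As a minimal sketch of such a history table (all names are illustrative, and LineId is the stable per-line identifier that, as noted above, you must find a way to assign):

-- Minimal sketch: one row per historical state of a line.
CREATE TABLE dbo.LineHistory (
    LineId      int           NOT NULL,   -- stable id you assign per line
    Species     varchar(20)   NOT NULL,   -- e.g. 'FeII'
    Wavelength  decimal(12,5) NOT NULL,   -- e.g. 1608.45110
    ErrorRange  decimal(12,5) NULL,       -- plus/minus on the value
    Publication varchar(200)  NULL,       -- reference or hyperlink
    RecordedAt  datetime      NOT NULL DEFAULT GETDATE()
);

-- Reconstruct the state as of a given date: the latest row per line
-- recorded on or before that date.
SELECT Species, Wavelength, ErrorRange, Publication
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY LineId
                              ORDER BY RecordedAt DESC) AS rn
    FROM dbo.LineHistory
    WHERE RecordedAt <= '20160101'
) AS h
WHERE rn = 1;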

How do I get “select for xml” to output to several files?

Our customer is complaining that our export file is too long; they would like us to split the export into many files with no more than “n” records per file. Is there a way of doing this with “select for xml”?
At present we are using SQL Server 2005 for this project.
(If this is too hard, I can always post process the single large file to split it up)
I don't think there's anything simple'n'easy you can do here.
My approach would probably be to limit the number of rows returned by each SELECT statement (by partitioning the data by some criterion, e.g. by date or location or something), and then put those smaller XML streams into files one by one. Doable, but not very elegant or sophisticated.
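A sketch of that approach on SQL Server 2005, where ROW_NUMBER (available since 2005) cuts the result set into fixed-size batches; the table, column, and element names are illustrative, and the calling application would run this once per batch number, writing each XML stream to its own file.

-- Minimal sketch: return batch number @BatchNo of the result set
-- as XML. The caller increments @BatchNo once per output file.
DECLARE @BatchSize int, @BatchNo int;
SET @BatchSize = 10000;   -- the customer's "n" records per file
SET @BatchNo = 1;

SELECT OrderId, CustomerId, Amount
FROM (
    SELECT OrderId, CustomerId, Amount,
           ROW_NUMBER() OVER (ORDER BY OrderId) AS rn
    FROM dbo.Orders
) AS numbered
WHERE rn >  (@BatchNo - 1) * @BatchSize
  AND rn <= @BatchNo * @BatchSize
FOR XML PATH('Order'), ROOT('Orders');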

SQL2008 Integration Services - Loading CSV files with varying file schema

I'm using SQL Server 2008 to load sensor data into a table with Integration Services. I have to deal with hundreds of files. The problem is that the CSV files all have slightly different schemas. Each file can have a maximum of 20 data fields. All the data files have some fields in common; some files have all the fields, others have only some of them. In addition, the order of the fields can vary.
Here's an example of what the file schemas look like:
Station Name,Station ID,LOCAL_DATE,T_1,TD_1,RH_1,CL_1,RS_1,RI_1,PR_1,RD_1,SH_1,CL_2
Station Name,Station ID,LOCAL_DATE,T_1,TD_1,RH_1,CL_1,RS_1,RI_1,PR_1,WS_1,WD_1,WSM_1,WDM_1,SH_1
Station Name,Station ID,LOCAL_DATE,T_1,TD_1,RH_1,RS_1,RI_1,PR_1,RD_1,WS_1,WD_1,WSM_1,WDM_1
Station Name,Station ID,LOCAL_DATE,T_1,RH_1,RS_1,PR_1,VI_1,PW_1,WS_1,WD_1,WSM_1
Station Name,Station ID,LOCAL_DATE,T_1,RH_1,RS_1,WS_1,WD_1,WSM_1
Station Name,Station ID,LOCAL_DATE,T_1,RH_1,RS_1,PR_1,VI_1,WS_1,WD_1,WSM_1
I'm using a Data Flow Script Task to process the data via CreateNewOutputRows() and MyOutputBuffer.AddRow(). I have a working package to load the data; however, it's not reliable or robust, because as I add more files the package fails whenever a file's schema has not been defined in CreateNewOutputRows().
I'm looking for a dynamic solution that can cope with the variation in the file schemas. Does anyone have any ideas?
Who controls the data model for the output of the sensors? If it's not you, do they know what they are doing? If they create new and inconsistent models every time they invent a new sensor, you are pretty much up the creek.
If you can influence or control the evolution of the schemas for CSV files, try to come up with a top level data architecture. In the bad old days before there were databases, files made up of records often had, as the first field of each record, a "record type". CSV files could be organized the same way. The first field of every record could indicate what type of record you are dealing with. When you get an unknown type, put it in the "bad input file" until you can maintain your software.
If that isn't dynamic enough for you, you may have to consider artificial intelligence, or looking for a different job.
Maybe the command line is a good option; from the command line, you can import a CSV file into SQL Server.
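For instance, a BULK INSERT statement run from sqlcmd or SSMS can load one schema variant into a matching staging table. This is only a minimal sketch: the table and path names are illustrative, and each schema variant would need its own staging table or a bcp format file, so it does not by itself solve the varying-schema problem.

-- Minimal sketch: load one CSV schema variant into a matching
-- staging table whose columns mirror that variant's header.
BULK INSERT dbo.SensorStaging_TypeA
FROM 'C:\data\sensors\typeA\station1.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2    -- skip the header row
);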
If the CSV files that share an identical format follow the same file-name convention, or if they can be separated out in some fashion, you can use a ForEach Loop Container for each file schema type.
One possible way to separate out the CSV files is to run a Script Task (in VB) in SSIS that reads the first row of each CSV file, checks for the differing types (if the column names are in the first row), and then moves the file to the appropriate folder for use in its ForEach Loop Container.
