SSIS and missing CR/LF in row - sql-server

I'm having an issue with SSIS. It's really throwing things off. Here is an example of what I am facing:
H~Column1~Column2~Column3~Column4
D~1~2~3~4<LF>
D~6-7-8-9<LF>
T~ More Stuff<LF>
The first line doesn't have an LF character, so when I set up a File Task in SSIS, the program reads everything as a single column containing one long string:
H~Column1~Column2~Column3~Column4D~1~2~3~4D~6-7-8-9T~ More Stuff
Any idea how to break this up so that SSIS can delimit it properly?

Create a Script Task that reads the whole file line by line and writes it out to a new file called {YourFilename}_Cleaned (or something like that). Below is a skeleton of the Main() method. Just replace the comment "// insert LF into LineData after column name list" with code that inserts the LF at the correct point in your first line.
/* include these
using System;
using System.IO;
using System.Text.RegularExpressions;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
*/
// put the below in your Main() method
string sourceFile = (string)Dts.Variables["FilePickupRootPath"].Value + "\\Process\\" + (string)Dts.Variables["FileName"].Value;
string cleanFile = (string)Dts.Variables["FilePickupRootPath"].Value + "\\Process\\" + (string)Dts.Variables["FileName"].Value + "_Cleaned";
string lineData;
bool isFirstLine = true;
try
{
    // using blocks guarantee the streams are closed even if an exception is thrown
    using (StreamReader reader = new StreamReader(sourceFile))
    using (StreamWriter writer = new StreamWriter(cleanFile, false))
    {
        lineData = reader.ReadLine();
        while (lineData != null)
        {
            if (isFirstLine)
            {
                // insert LF into LineData after column name list
                isFirstLine = false;
            }
            writer.WriteLine(lineData);
            lineData = reader.ReadLine();
        }
    }
}
catch (Exception e)
{
    MessageBox.Show(e.Message, "Error!");
    Console.WriteLine("Exception: " + e.ToString());
    Dts.TaskResult = (int)ScriptResults.Failure;
    return; // without this, the line below would overwrite the Failure result
}
Dts.TaskResult = (int)ScriptResults.Success;
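For the "insert LF" step, a minimal sketch, assuming every data record starts with the literal prefix "D~" as in the sample above (that prefix is taken from the example, not something the post guarantees):
if (isFirstLine)
{
    // Assumption: data records start with "D~", per the sample file.
    // Split the merged first line into the header and the rest.
    int firstDataRecord = lineData.IndexOf("D~");
    if (firstDataRecord > 0)
    {
        writer.WriteLine(lineData.Substring(0, firstDataRecord)); // header row only
        lineData = lineData.Substring(firstDataRecord);           // remaining record(s)
    }
    isFirstLine = false;
}
If several records are merged into one physical line (as in the single-string example above), run the same split in a loop until no further "D~" or "T~" marker is found mid-line.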

Related

Exporting all tables in SQL Server to .txt using C#

I have an application that tries to extract all the data from the different tables in one database. First, I stored all the queries in a .txt file to retrieve the table names, and loaded them into a List.
[Here's my .txt file]
string script = File.ReadAllText(@"D:\Schooldb\School.txt");
List<string> strings = new List<string>();
strings.Add(script);
using (SqlConnection connection = new SqlConnection(constring))
{
    foreach (string x in strings)
    {
        using (SqlCommand cmd = new SqlCommand(x, connection))
        {
            using (SqlDataAdapter adapter = new SqlDataAdapter())
            {
                cmd.Connection = connection;
                adapter.SelectCommand = cmd;
                using (DataTable dt = new DataTable())
                {
                    adapter.Fill(dt);
                    string txt = string.Empty;
                    foreach (DataColumn column in dt.Columns)
                    {
                        //Add the Header row for Text file.
                        txt += column.ColumnName + "\t\t";
                    }
                    //Add new line after Column Name.
                    txt += "\r\n";
                    foreach (DataRow row in dt.Rows)
                    {
                        foreach (DataColumn column in dt.Columns)
                        {
                            //Add the Data rows.
                            txt += row[column.ColumnName].ToString() + "***";
                        }
                        //Add new line.
                        txt += "\r\n";
                    }
                    int y = 0;
                    StreamWriter file = new StreamWriter($@"D:\SchoolOutput\{x}_{DateTime.Now.ToString("yyyyMMdd")}.txt");
                    file.WriteLine(txt.ToString());
                    file.Close();
                    y++;
                }
            }
        }
    }
}
Expected:
teachers_datetoday
students_datetoday
subjects_datetoday
But in reality my output is just
datetoday txt
Can someone tell me which part I got wrong?
Thanks in advance!
There are other approaches for extracting data directly using SSMS.
In this case, your code reads the entire text file as a single string, so the foreach loop runs only once.
Instead of reading the entire file as one string, treat each line as one command and read the commands like the following:
foreach (string line in System.IO.File.ReadLines(@"D:\Schooldb\School.txt"))
{
    //Each line contains one command
    //Write your logic here
}
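If it helps, here is a fuller sketch of that loop, assuming each line of School.txt is a single query of the form "SELECT * FROM teachers"; deriving the file name from the query's last token is an assumption for illustration, not something the original post specifies. It reuses the connection from the code above and needs using System.Linq; and using System.Text; at the top of the file.
foreach (string query in System.IO.File.ReadLines(@"D:\Schooldb\School.txt"))
{
    // Hypothetical: take the table name from the last token of the query.
    string tableName = query.Trim().Split(' ').Last();
    using (SqlCommand cmd = new SqlCommand(query, connection))
    using (SqlDataAdapter adapter = new SqlDataAdapter(cmd))
    using (DataTable dt = new DataTable())
    {
        adapter.Fill(dt); // Fill opens and closes the connection as needed
        StringBuilder txt = new StringBuilder();
        txt.AppendLine(string.Join("\t\t", dt.Columns.Cast<DataColumn>().Select(c => c.ColumnName)));
        foreach (DataRow row in dt.Rows)
        {
            txt.AppendLine(string.Join("***", row.ItemArray));
        }
        File.WriteAllText($@"D:\SchoolOutput\{tableName}_{DateTime.Now:yyyyMMdd}.txt", txt.ToString());
    }
}
This writes one file per line of the script file, which matches the expected teachers/students/subjects output.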

Load CSV files with varying numbers of columns and column names into a SQL table using a Script Task in SSIS

I want to load CSV files into a SQL table using a Script Task in SSIS. The files have similar, but not identical, column names, and the number of columns also varies.
I am currently using the script below, from this useful blog, which checks whether the exact column name exists and, if it does, loads it into the table; however, it fails if the column name doesn't exist. Is there a way to use the LIKE operator to search for the CSV column name in the SQL table? If it finds the column, load the data into the table; if it doesn't, ignore the column.
Script:
public void Main()
{
    string delimiter = Dts.Variables["$Package::Delimiter"].Value.ToString();
    string TableName = Dts.Variables["$Package::TableName"].Value.ToString();
    SqlConnection myADONETConnection = new SqlConnection();
    myADONETConnection = (SqlConnection)
        (Dts.Connections["ADOConn"].AcquireConnection(Dts.Transaction) as SqlConnection);

    //Reading file names one by one
    string SourceDirectory = Dts.Variables["$Package::SourceFolder"].Value.ToString();
    string[] fileEntries = Directory.GetFiles(SourceDirectory);
    foreach (string fileName in fileEntries)
    {
        // MessageBox.Show(fileName);
        string columname = "";

        //Reading first line of each file and assign to variable
        System.IO.StreamReader file2 = new System.IO.StreamReader(fileName);

        //Writing Data of File Into Table
        int counter = 0;
        string line;
        System.IO.StreamReader SourceFile = new System.IO.StreamReader(fileName);
        while ((line = SourceFile.ReadLine()) != null)
        {
            if (counter == 0)
            {
                columname = line.ToString();
                columname = "" + columname.Replace(delimiter, ",");
                //MessageBox.Show(columname);
            }
            else
            {
                // MessageBox.Show("Inside ELSE");
                string query = "Insert into " + TableName +
                    "(" + columname + ") VALUES('" + line.Replace(delimiter, "','") + "')";
                //MessageBox.Show(query.ToString());
                SqlCommand myCommand1 = new SqlCommand(query, myADONETConnection);
                myCommand1.ExecuteNonQuery();
            }
            counter++;
        }
        SourceFile.Close();
    }
    Dts.TaskResult = (int)ScriptResults.Success;
}
#region ScriptResults declaration
/// <summary>
/// This enum provides a convenient shorthand within the scope of this class for setting the
/// result of the script.
///
/// This code was generated automatically.
/// </summary>
enum ScriptResults
{
    Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
    Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
};
#endregion
Thanks,
J
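One way to get the "load it if it exists, ignore it if it doesn't" behaviour is to read the table's real column list once and filter each row against it. This is a sketch only: it uses exact-name matching via INFORMATION_SCHEMA rather than LIKE, and it reuses myADONETConnection, TableName, delimiter, columname, and line from the script above.
// Fetch the destination table's actual column names once, before the file loop.
var tableColumns = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
using (SqlCommand colCmd = new SqlCommand(
    "SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = @t", myADONETConnection))
{
    colCmd.Parameters.AddWithValue("@t", TableName);
    using (SqlDataReader rdr = colCmd.ExecuteReader())
    {
        while (rdr.Read())
        {
            tableColumns.Add(rdr.GetString(0));
        }
    }
}

// Inside the while loop, keep only the CSV columns the table actually has.
string[] headers = columname.Split(',');
string[] values = line.Split(new[] { delimiter }, StringSplitOptions.None);
var keptNames = new List<string>();
var keptValues = new List<string>();
for (int i = 0; i < headers.Length && i < values.Length; i++)
{
    if (tableColumns.Contains(headers[i].Trim()))
    {
        keptNames.Add(headers[i].Trim());
        keptValues.Add(values[i]);
    }
}
string query = "Insert into " + TableName + " (" + string.Join(",", keptNames) +
    ") VALUES ('" + string.Join("','", keptValues) + "')";
As in the original script, this builds the INSERT by string concatenation, so it inherits the same quoting and SQL injection caveats.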

Using Microsoft.SqlServer.TransactSql.ScriptDom to parse a query with errors

I use the following code to get a list of statements in the query:
using System;
using System.Collections.Generic;
using System.IO;
using System.Windows.Forms;
using Microsoft.SqlServer.TransactSql.ScriptDom;

namespace SqlTokenazer
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            Tokenaze();
        }

        private void Tokenaze()
        {
            rtbLog.Clear();
            string script = "select * from dbo.Mytable where columnName = 0 delete from dbo.Mytable where columnName = 0";
            var sqlScript = ParseScript(script);
            PrintStatements(sqlScript);
        }

        public TSqlScript ParseScript(string script)
        {
            IList<ParseError> parseErrors;
            TSql100Parser tsqlParser = new TSql100Parser(true);
            TSqlFragment fragment;
            using (StringReader stringReader = new StringReader(script))
            {
                fragment = (TSqlFragment)tsqlParser.Parse(stringReader, out parseErrors);
            }
            if (parseErrors.Count > 0)
            {
                var retMessage = string.Empty;
                foreach (var error in parseErrors)
                {
                    retMessage += error.Number + " - " + error.Message + " - position: " + error.Offset + ";\r\n";
                }
                rtbLog.Text += retMessage;
            }
            return (TSqlScript)fragment;
        }

        public void PrintStatements(TSqlScript tsqlScript)
        {
            if (tsqlScript != null)
            {
                foreach (TSqlBatch batch in tsqlScript.Batches)
                {
                    if (batch.Statements.Count == 0) continue;
                    foreach (TSqlStatement statement in batch.Statements)
                    {
                        rtbLog.Text += string.Format("{0}\r\n", statement.GetType().ToString());
                    }
                }
            }
        }
    }
}
Results:
Microsoft.SqlServer.TransactSql.ScriptDom.SelectStatement
Microsoft.SqlServer.TransactSql.ScriptDom.DeleteStatement
But when I make a mistake in the query, the list of statements is empty :(
string script = "select * from dbo.Mytable where ...
delete from dbo.Mytable where columnName = 0";
How can I get a list of statements if the query is wrong?
Thanks!
I know this is an old question, but I came across it while Googling, so I figured I'd answer it.
If your question is how to get a list of statements when the SQL can't be parsed, the short answer is that you can't: the parser has no idea what the list of statements would be. You'd have to look at the errors and figure it out.
If your question is what's wrong with the input code, it's that the select and delete statements are all on the same line. If you separate them with a semicolon or break them into two lines, it'll work and you'll get your two statements.
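If you still want to salvage whatever does parse, one option is to split the script into candidate chunks yourself and keep the chunks that parse cleanly. A sketch only; splitting on semicolons is a naive assumption that breaks on semicolons inside string literals:
public List<TSqlStatement> ParseWhatWeCan(string script)
{
    var statements = new List<TSqlStatement>();
    foreach (string chunk in script.Split(';'))
    {
        IList<ParseError> errors;
        TSqlFragment fragment;
        var parser = new TSql100Parser(true);
        using (var reader = new StringReader(chunk))
        {
            fragment = parser.Parse(reader, out errors);
        }
        var parsed = fragment as TSqlScript;
        if (errors.Count == 0 && parsed != null)
        {
            // Keep every statement from chunks that parsed without errors.
            foreach (TSqlBatch batch in parsed.Batches)
            {
                statements.AddRange(batch.Statements);
            }
        }
    }
    return statements;
}
Broken chunks are simply skipped; you could also collect their ParseError entries (Number, Message, Line, Column) for logging.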

SQL Server: converting XML to varbinary and parsing it in .NET (C#)

Consider the following code:
[Test]
public void StackOverflowQuestionTest()
{
    const string connectionString = "enter your connection string if you wanna test this code";
    byte[] result = null;
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();
        using (var sqlCommand = new SqlCommand("declare @xml as xml = '<xml/>' SELECT convert(varbinary(max), @xml) as value"))
        //using (var sqlCommand = new SqlCommand("SELECT convert(varbinary(max), N'<xml/>') as value"))
        {
            sqlCommand.Connection = connection;
            using (SqlDataReader reader = sqlCommand.ExecuteReader())
            {
                while (reader.Read())
                {
                    result = (byte[])reader["value"];
                }
                reader.Close();
            }
        }
    }
    string decodedString = new UnicodeEncoding(false, true).GetString(result);
    var document = XElement.Parse(decodedString);
}
If I run this test I get an XmlException with the message: "Data at the root level is invalid. Line 1, position 1." As it turns out, the problem is the "0xFFFE" preamble, which is treated as an invalid character.
Note that if I use the commented-out string instead, everything works just fine, which seems strange to me. It looks like SQL Server stores XML strings in UCS-2 with a BOM, while it stores nvarchar values without one.
The main question is: how can I decode this byte array to a string that does not contain this preamble (BOM)?
In case anyone needs this in the future, the following code works:
using (var ms = new MemoryStream(result))
{
    // detectEncodingFromByteOrderMarks: true makes the reader consume the BOM
    using (var sr = new StreamReader(ms, Encoding.Unicode, true))
    {
        decodedString = sr.ReadToEnd();
    }
}
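If you'd rather not spin up a stream, an equivalent one-liner is to skip the two BOM bytes explicitly (this assumes the preamble is actually present, as it was here):
string decodedString = Encoding.Unicode.GetString(result, 2, result.Length - 2);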

How to Bulk Insert csv with double quotes around all values?

I am trying to insert a .csv file into SQL Server 2008 R2.
The .csv is 300+ MB, from http://ipinfodb.com/ip_database.php - Complete (City), 4.0M records.
Here are the top 5 lines, with the 1st line = column headers:
"ip_start";"country_code";"country_name";"region_code";"region_name";"city";"zipcode";"latitude";"longitude";"metrocode"
"0";"RD";"Reserved";;;;;"0";"0";
"16777216";"AU";"Australia";;;;;"-27";"133";
"17367040";"MY";"Malaysia";;;;;"2.5";"112.5";
"17435136";"AU";"Australia";;;;;"-27";"133";
I tried Import and Export Data, and BULK INSERT, but haven't been able to import them correctly yet.
Shall I resort to using bcp? Can it handle stripping the quotes? How?
Thank you very much.
Got it: I forgot to set the Text Qualifier to the double quote (").
Your data looks pretty inconsistent, since the NULL values aren't enclosed in quotes like the other values.
I believe you can create a format file in SQL Server, customized to your particular csv file and its particular terminators.
See more here:
http://lanestechblog.blogspot.com/2008/08/sql-server-bulk-insert-using-format.html
Is this a single import, or do you want to schedule a recurring import? If this is a one-time task, you should be able to use the Import and Export Wizard. The text qualifier will be the quotation mark ("); be sure to select column names in the first data row, and set the field delimiter to the semicolon (;).
I'm not certain the file is properly formatted - the trailing semicolon on each data row might be a problem. If you hit any errors, simply add a new column header to the file.
EDIT: I just did a quick test; the semicolons at the end are treated as part of the final value in that row. I would suggest adding ;"tempheader" at the end of your header (first) row - that makes SQL treat the final semicolon as a delimiter, and you can delete the extra column once the import is complete.
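For what it's worth, newer versions of SQL Server make this easier: BULK INSERT gained native CSV support (FORMAT = 'CSV' with FIELDQUOTE) in SQL Server 2017, so this does not apply to the 2008 R2 instance in the question. A sketch, with a hypothetical table name, file path, and connection string:
// Requires SQL Server 2017+; dbo.IpInfo and the path are placeholders.
string sql = @"BULK INSERT dbo.IpInfo
               FROM 'C:\data\ip_database.csv'
               WITH (FORMAT = 'CSV', FIELDQUOTE = '""', FIELDTERMINATOR = ';',
                     ROWTERMINATOR = '\n', FIRSTROW = 2);";
using (var con = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(sql, con))
{
    con.Open();
    cmd.ExecuteNonQuery();
}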
In C# you can use this code; it works for me:
public bool CSVFileRead(string fullPathWithFileName, string fileNameModified, string tableName)
{
    SqlConnection con = new SqlConnection(ConfigurationSettings.AppSettings["dbConnectionString"]);
    string filepath = fullPathWithFileName;
    StreamReader sr = new StreamReader(filepath);
    string line = sr.ReadLine();
    string[] value = line.Split(',');
    DataTable dt = new DataTable();
    DataRow row;
    foreach (string dc in value)
    {
        dt.Columns.Add(new DataColumn(dc));
    }
    while (!sr.EndOfStream)
    {
        //string[] stud = sr.ReadLine().Split(',');
        //for (int i = 0; i < stud.Length; i++)
        //{
        //    stud[i] = stud[i].Replace("\"", "");
        //}
        //value = stud;
        value = sr.ReadLine().Split(',');
        if (value.Length == dt.Columns.Count)
        {
            row = dt.NewRow();
            row.ItemArray = value;
            dt.Rows.Add(row);
        }
    }
    SqlBulkCopy bc = new SqlBulkCopy(con.ConnectionString, SqlBulkCopyOptions.TableLock);
    bc.DestinationTableName = tableName;
    bc.BatchSize = dt.Rows.Count;
    con.Open();
    bc.WriteToServer(dt);
    bc.Close();
    con.Close();
    return true;
}
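Alternatively, .NET's built-in TextFieldParser (in the Microsoft.VisualBasic.FileIO namespace; add a reference to Microsoft.VisualBasic) already understands text qualifiers, so it strips the quotes and handles the semicolon delimiter for you. A sketch with a hypothetical file path, producing the same kind of rows as the method above:
using Microsoft.VisualBasic.FileIO; // reference Microsoft.VisualBasic.dll

using (var parser = new TextFieldParser(@"C:\data\ip_database.csv"))
{
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(";");                // this file uses ; between fields
    parser.HasFieldsEnclosedInQuotes = true;  // strips the " qualifiers
    string[] headers = parser.ReadFields();   // first row = column names
    while (!parser.EndOfData)
    {
        string[] fields = parser.ReadFields(); // quotes already removed
        // add 'fields' as a DataTable row here, as in CSVFileRead above
    }
}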
