I'm testing the retrieval of blobs by a web application.
There are some difficulties uploading blobs programmatically from the JavaScript code, so I decided to prepopulate the database with some data instead. However, I'm running into some problems with that as well.
We have a database versioning process that expects all the schema + data for the database to be in scripts that can be run by sqlcmd.
This post seems to show how to insert blobs. However, that script requires you to specify an absolute path to a file on the server.
Is there another way? We are using source control and continuous integration and so wouldn't ever really want to refer to a file in a specific place outside a given copy of a repository on one machine.
If not, it seems like there are two options:
Take the hit and never change or delete anything from a random directory on the database server as well. The data will need to be split between several locations. Furthermore, we either ban blobs from production config deployment or just have to bear in mind that we have to do something crazy if we ever need them, since we won't be in control of the directory structure on a remote server. This probably won't be a massive problem, to be fair; I can't see us wanting to ship any config in blob form really.
or
write a program that does something crazy like remotely creating a temporary directory on the server, copying the file there at the correct version, and outputting a script with that filename in it.
It doesn't really seem like having things under source control and not wanting to hardcode paths is exactly an outlandish scenario, but the poor quality of database tools stopped surprising me a while ago!
Assuming you are referring to a field of type BINARY / VARBINARY / IMAGE, you should be able to just specify the hex bytes, such as:
0x0012FD...
For example:
INSERT INTO TableName (IDField, BlobField) VALUES (1, 0x0012FD);
You just need to get that string of hex digits from the file. If you already have such a value in the DB, then just select that row and field in SSMS and copy/paste the value from the cell (in "Results to Grid" mode) into your SQL script.
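If you need to produce that hex string programmatically, a rough C# sketch (untested; the file path and the table/column names are just placeholders) would be:

using System;
using System.IO;

class BlobHexEncoder
{
    static void Main()
    {
        // Read the blob from disk (the path is only an example).
        byte[] bytes = File.ReadAllBytes(@"C:\temp\sample.bin");

        // Build a continuous hex literal: 0x0012FD...
        string hex = "0x" + BitConverter.ToString(bytes).Replace("-", "");

        // Emit an INSERT you can paste into your sqlcmd script.
        Console.WriteLine("INSERT INTO TableName (IDField, BlobField) VALUES (1, " + hex + ");");
    }
}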
You can also wrap long lines using a backslash as follows:
INSERT INTO TableName (IDField, BlobField) VALUES (1, 0x0012FD\
12B36D98\
D523);
If wrapping via backslash, be sure to start each new line at the first position, as the entire thing is treated as one continuous string. Hence, indenting a line that comes immediately after a backslash would leave spaces between the hex digits, which is not valid. For example:
INSERT INTO TableName (IDField, BlobField) VALUES (1, 0x0012FD\
    12B36D98\
D523);
equates to:
0x0012FD    12B36D98D523
If you have access to C#, here's a function that I've used that will take a binary blob and spit out a SQL script that sets a varbinary(max) variable to the contents of the blob. It will format it nicely and take into account length restrictions on SQL statements (which can be an issue with very large blobs). So basically it will output something like:
select @varname = 0x4d5a90000300000004000000ffff0000b8000000000000 +
0x0040000000000000000000000000000000000000000000000000000000000000 +
0x0000000000800000000e1fba0e00b409cd21b8014ccd21546869732070726f67 +
...
0x007365745f4d6574686f64007365745f53656e644368756e6b65640053747265;
select @varname = @varname + 0x616d004765745265717565737453747265 +
0x616d0053797374656d2e5465787400456e636f64696e6700476574456e636f64 +
...
You just have to make sure to declare the variable at the front of the script it gives you. You could build a little utility that runs this function on a file (or wherever your blobs come from) to help in creating your scripts.
public static string EncodeBinary(string variable, byte[] binary)
{
    StringBuilder result;
    int column;
    int concats;
    bool newLine;

    if (binary.Length == 0)
    {
        return "select " + variable + " = null;";
    }

    result = new StringBuilder("select ");
    result.Append(variable);
    result.Append(" = 0x");
    column = 12 + variable.Length;
    concats = 0;

    for (int i = 0; i < binary.Length; i++)
    {
        newLine = false;
        if (column > 64)
        {
            concats++;
            newLine = true;
        }
        if (newLine)
        {
            if (concats == 64)
            {
                result.Append(";\r\nselect ");
                result.Append(variable);
                result.Append(" = ");
                result.Append(variable);
                result.Append(" + 0x");
                column = 15 + variable.Length * 2;
                concats = 1;
            }
            else
            {
                result.Append(" +\r\n0x");
                column = 2;
            }
        }
        result.Append(binary[i].ToString("x2"));
        column += 2;
    }

    result.Append(";\r\n");
    return result.ToString();
}
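As a rough usage sketch (untested; the paths, table name, and the @blob variable are placeholders, and EncodeBinary is assumed to be pasted into the same class), a small utility could generate a complete script like this:

using System;
using System.IO;
using System.Text;

static class BlobScriptBuilder
{
    static void Main()
    {
        // Read the blob to embed; the path is only an example.
        byte[] blob = File.ReadAllBytes(@"C:\temp\sample.bin");

        var script = new StringBuilder();
        script.AppendLine("declare @blob varbinary(max);");   // declare the variable first
        script.AppendLine(EncodeBinary("@blob", blob));       // the function shown above
        script.AppendLine("insert into TableName (IDField, BlobField) values (1, @blob);");

        // Write out a script that sqlcmd can run as part of the versioning process.
        File.WriteAllText(@"C:\temp\insert-blob.sql", script.ToString());
    }
}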
I've got some SSIS packages that take CSV files that come from the vendor and put them into our local database. The problem I'm having is that sometimes the vendor adds or removes columns and we don't have time to update our packages before our next run, which causes the SSIS packages to abend. I want to somehow prevent this from happening.
I've tried reading in the CSV files line by line, stripping out new columns, and then using an insert statement to put the altered line into the table, but that takes far longer than our current process (the CSV files can have thousands or hundreds of thousands of records).
I've started looking into using ADO connections, but my local machine has neither the ACE nor JET providers and I think the server the package gets deployed to also lacks those providers (and I doubt I can get them installed on the deployment server).
I'm at a loss as to how I can load the tables while ignoring newly added or removed columns (although if a CSV file is lacking a column the table has, that's not a big deal) in a way that's fast and reliable. Any ideas?
I went with a different approach, which seems to be working (after I worked out some kinks). What I did was take the CSV file rows and put them into a temporary DataTable. When that was done, I did a bulk copy from the DataTable to my database. In order to deal with missing or new columns, I determined what columns were common to both the CSV and the table and only processed those common columns (new columns were noted in the log file so they can be added later). Here's my BulkCopy module:
Private Sub BulkCopy(csvFile As String)
    Dim i As Integer
    Dim rowCount As Int32 = 0
    Dim colCount As Int32 = 0
    Dim writeThis As ArrayList = New ArrayList

    tempTable = New DataTable()

    Try
        '1) Set up the columns in the temporary data table, using commonColumns
        For i = 0 To commonColumns.Count - 1
            tempTable.Columns.Add(New DataColumn(commonColumns(i).ToString))
            tempTable.Columns(i).DataType = GetDataType(commonColumns(i).ToString)
        Next

        '2) Start adding data from the csv file to the temporary data table
        While Not csvReader.EndOfData
            currentRow = csvReader.ReadFields() 'Read the next row of the csv file
            rowCount += 1
            writeThis.Clear()

            For index = 0 To UBound(currentRow)
                If commonColumns.Contains(csvColumns(index)) Then
                    Dim location As Integer = tableColumns.IndexOf(csvColumns(index))
                    Dim columnType As String = tableColumnTypes(location).ToString

                    If currentRow(index).Length = 0 Then
                        writeThis.Add(DBNull.Value)
                    Else
                        writeThis.Add(currentRow(index))
                    End If
                    'End Select
                End If
            Next

            Dim row As DataRow = tempTable.NewRow()
            row.ItemArray = writeThis.ToArray
            tempTable.Rows.Add(row)
        End While

        csvReader.Close()

        '3) Bulk copy the temporary data table to the database table.
        Using copy As New SqlBulkCopy(dbConnection)
            '3.1) Set up the column mappings
            For i = 0 To commonColumns.Count - 1
                copy.ColumnMappings.Add(commonColumns(i).ToString, commonColumns(i).ToString)
            Next

            '3.2) Set the destination table name
            copy.DestinationTableName = tableName

            '3.3) Copy the temporary data table to the database table
            copy.WriteToServer(tempTable)
        End Using

    Catch ex As Exception
        message = "*****ERROR*****" + vbNewLine
        message += "BulkCopy: Encountered an exception of type " + ex.GetType.ToString()
        message += ": " + ex.Message + vbNewLine + "***************" + vbNewLine
        LogThis(message)
    End Try
End Sub
There may be something more elegant out there, but this so far seems to work.
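The module above assumes commonColumns, csvColumns, and tableColumns have already been populated; that code isn't shown here, but one way to derive the common columns (a rough sketch in C# rather than the original VB.NET; the connection string, file name, and table name are placeholders) is to intersect the CSV header with the destination table's schema:

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.IO;
using System.Linq;

static class ColumnMatcher
{
    // Returns the columns present in both the CSV header and the destination table.
    public static List<string> GetCommonColumns(string csvFile, string connectionString, string tableName)
    {
        // CSV columns: first line of the file, split on commas (assumes a simple header row).
        string[] csvColumns = File.ReadLines(csvFile).First().Split(',');

        // Table columns: read the schema only, no data.
        var tableColumns = new List<string>();
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("SELECT TOP 0 * FROM " + tableName, connection))
        {
            connection.Open();
            using (var reader = command.ExecuteReader(CommandBehavior.SchemaOnly))
            {
                for (int i = 0; i < reader.FieldCount; i++)
                    tableColumns.Add(reader.GetName(i));
            }
        }

        // Only the intersection gets mapped; anything else can be logged for later.
        return csvColumns
            .Select(c => c.Trim())
            .Where(c => tableColumns.Contains(c, StringComparer.OrdinalIgnoreCase))
            .ToList();
    }
}

Columns that appear in the CSV but not in the table can be logged at this point, which is where the "new columns were noted in the log file" step fits in.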
Look into BiML, which builds and executes your SSIS package dynamically based on the metadata at run time.
Based on this comment:
I've tried reading in the CSV files line by line, stripping out new columns, and then using an insert statement to put the altered line into the table, but that takes far longer than our current process (the CSV files can have thousands or hundreds of thousands of records).
And this:
I used a csvreader to read the file. The insert was via a sqlcommand object.
It would appear at first glance that the bottleneck is not the flat file source but the destination. An OLE DB Command executes in a row-by-row fashion, one statement per input row. Changing this to an OLE DB Destination converts the process to a bulk insert operation. To test this out, just use the flat file source and connect it to a derived column. Run that and check the speed. If it's faster, change to the OLE DB Destination and try again. It also helps to be inserting into a heap (no clustered or nonclustered indexes) and to use TABLOCK.
However, this does not solve your whole varied file problem. I don't know what the flat file source does if you are short a column or more from how you originally configured it at design time. It might fail, or it might import the rows in some jagged form where part of the next row is assigned to the last columns in the current row. That could be a big mess.
However, I do know what happens when a flat file source gets extra columns. I put in this Connect item for it, which was sadly rejected: https://connect.microsoft.com/SQLServer/feedback/details/963631/ssis-parses-flat-files-incorrectly-when-the-source-file-contains-unexpected-extra-columns
What happens is that the extra columns are concatenated into the last column. If you plan for it, you could make the last column large and then parse it in SQL from the staging table. Alternatively, you could just jam the whole row into SQL and parse each column from there. That's a bit clunky, though, because you'll have a lot of CHARINDEX() calls checking the position of values all over the place.
An easier option might be to parse it in .NET in a script task, using some combination of Split() to get all the values and a check on the count of values in the array to know how many columns you have. This would also allow you to direct the rows to different buffers based on what you find.
And lastly, you could ask the vendor to commit to a format: either a fixed number of columns, or a format that handles variation, like XML.
I've got a C# solution (I haven't checked it, but I think it works) for a source script component.
It will read the header into an array using Split().
Then, for each data row, it uses the same Split() call, checks the header value to find the column, and uses rowVals to set the output.
You will need to put all of the output columns into the output area of the script component.
Any columns that are not present in the file will be null on exit.
public override void CreateNewOutputRows()
{
    using (System.IO.StreamReader sr = new System.IO.StreamReader(@"[filepath and name]"))
    {
        while (!sr.EndOfStream)
        {
            string FullText = sr.ReadToEnd();
            string[] rows = FullText.Split('\n');

            //Get header values
            string[] header = rows[0].TrimEnd('\r').Split(',');

            for (int i = 1; i < rows.Length - 1; i++)
            {
                string[] rowVals = rows[i].TrimEnd('\r').Split(',');

                //One output row per data row
                Output0Buffer.AddRow();

                for (int j = 0; j < rowVals.Length; j++)
                {
                    //Deal with each known header name
                    switch (header[j])
                    {
                        case "Field 1 Name": //this is where you use known column names
                            Output0Buffer.FieldOneName = rowVals[j]; //Cast if not string
                            break;
                        case "Field 2 Name":
                            Output0Buffer.FieldTwoName = rowVals[j]; //Cast if not string
                            break;
                        //continue this pattern for all column names
                    }
                }
            }
        }
    }
}
I have an EF code-first database. To populate the initial tables, I am using SQL scripts (they are far easier to handle and update than the seed methods).
The problem is that the data is being inserted without its special characters.
The database collation is: SQL_Latin1_General_CP1_CI_AS
The seed is reading the script like this:
context.Database.ExecuteSqlCommand(File.ReadAllText(baseDir + @"..\..\Scripts\Oficina.sql"));
And the script looks like this:
INSERT [dbo].[Oficina] ([IdOficina], [Nombre], [SISTEMA], [ORDEN]) VALUES (20, N'Comisión Admisión', 1, 5)
The problem is that it's being saved in the database as:
Comisi�n Admisi�n
I have no clue what the problem could be.....any ideas?
I faced the same problem some time ago:
public static void ExecuteBatchFromFile(this DataContext dc, String fileName, String batchSeparator,
                                        Encoding enc = null)
{
    if (enc == null)
        enc = Encoding.UTF8;

    // Read the whole script as one string, using the encoding the file was saved with.
    String stSql = File.ReadAllText(fileName, enc);
    /* ... */
}
I solved it by adding the enc parameter to my function.
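In the same spirit, the simplest fix for the seed code in the question may be to read the script with the encoding it was actually saved in (the UTF-8 below is only an assumption; if the file was saved as ANSI, try Encoding.GetEncoding(1252) or Encoding.Default instead):

using System.IO;
using System.Text;

// Read the script with an explicit encoding, then execute it as before.
var script = File.ReadAllText(baseDir + @"..\..\Scripts\Oficina.sql", Encoding.UTF8);
context.Database.ExecuteSqlCommand(script);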
The problem is reading the source file with the correct encoding. The collation is not necessarily the storage encoding; it is the rule set used for comparison and sorting.
Check the table definition to see if the data type is varchar or nvarchar. The question has been asked on the site before. Here is a good explanation:
Which special characters are allowed?
In my project I need to create a file for each student, and I think I have the method created; here it is below:
public void addStudent(String fullName, int grn, String formClass, String formTeacher) throws IOException
{
    //Default values
    int creativity = 0;
    int action = 0;
    int service = 0;
    int total = 0;

    //Initialize the file and move to its end so new records are appended
    RandomAccessFile adding = new RandomAccessFile(new File(fullName + ".dat"), "rw");
    long fileSize = adding.length();
    adding.seek(fileSize);

    //Variables from the method parameters
    adding.writeUTF(fullName);
    adding.writeInt(grn);          //writeInt takes an int; no newline is needed in a binary file
    adding.writeUTF(formClass);
    adding.writeUTF(formTeacher);

    //Variables created in the method
    adding.writeInt(creativity);
    adding.writeInt(action);
    adding.writeInt(service);
    adding.writeInt(total);

    adding.close();
}
I just keep thinking that it's not right, and I would like some clarification about certain parts, such as this line:
RandomAccessFile adding = new RandomAccessFile(new File(fullName + ".dat"), "rw");
fullName is a variable that is passed into the method, and it is the name and surname of a student (e.g. John Lennon). What I want to do is have the file named "John Lennon.dat"; however, I keep thinking my approach here is wrong.
Another question is about the integer values. They will be updated from time to time, but by simple addition of current + new. How do I do that?
You have to be careful if you use possible user input (fullName) unfiltered for naming your files; this can lead to a security hole. You should check fullName for special characters that are not allowed in your file system or that would change the directory. Imagine someone entering ../importantfile as fullName: without checking, it would be possible to overwrite important files in other directories.
The safest way is to use some generic name schema for your files (like data1.dat, data2.dat) and to store the relation of fullName to file name in another place (maybe a file index.dat).
I assume you have a good reason to use a RandomAccessFile here. According to your code, it is possible to have more than one record in a file. If you do not store each record's starting position somewhere else, then you have to read one record after another. Once you have found the record to change, you have to read all the fields before your integer value so that the file position points to where the integer starts. Then you can read the integer, change the value, move the file position 4 bytes back (seek(getFilePointer() - 4), since seek takes an absolute position from the start of the file), and write the modified value.
Alternatives:
You can read the whole file in, modify the integer values, and then write the whole file out again, overwriting the old one. For short files this can be less complex without a significant performance penalty (but this depends; if you have a thousand files to change in a short time, this alternative is not recommended).
You can store the file positions of your integer values in another place and use these to directly access these values. This only works if your strings are immutable.
You can use an alternative file format like XML, JSON or serialized objects. But none of these supports in-place changes.
You can use an embedded database like SQLite or H2 and let the database care about file access and indexing.
I have a text file that has a large grouping of numbers (137mb text file) and am looking to use groovy to open the text file, read it line-by-line, modify the numbers, and then place them into a database (as strings). There are going to be 2 items per line that need to be written to separate database columns, which are related.
My text file looks as such:
A.12345
A.14553
A.26343
B.23524
C.43633
C.23525
So the flow would be:
Step 1. The file is opened.
Step 2. Line 1 is read.
Step 3. Line 1 is split into a letter/number pair [:].
Step 4. The number is divided by 10.
Step 5. The letter is written to the letter database (as a string).
Step 6. The number is written to the number database (as a string).
Step 7. The letter:number pair is also written to a separate comma-separated text file.
Step 8. Proceed to the next line (line 2).
Output text file should look like this:
A,1234.5
A,1455.3
A,2634.3
B,2352.4
C,4363.3
C,2352.5
Database for numbers should look like this:
1:1234.5
2:1455.3
3:2634.3
4:2352.4
5:4363.3
6:2352.5
*lead numbers are database index locations, for relational purposes
Database for letters should look like this:
1:A
2:A
3:A
4:B
5:C
6:C
*lead numbers are database index locations, for relational purposes
I have been able to do most of this; the issue I am running into is not being able to use the .eachLine( line -> ) function correctly, and I have NO clue how to output the values to the databases.
There is one more thing I am quite dense about, and that is the case where the script encounters an error. The text file has TONS of entries (around 9,000,000), so I am wondering if there is a way to make it so that if the script fails or anything else happens, I can restart the script from the last modified line.
Meaning, if the script has an error (my computer gets shut down somehow) and stops running at line 125122 of the text file (after completing the modification of line 125122), how do I make it so that when I start the script the second time it resumes at line 125123?
Here is my sample code so far:
//openfile
myFile = new File("C:\\file.txt")
//set fileline to target
printFileLine = { it }
//set target to argument
numArg = myFile.eachLine( printFileLine )
//set argument to array split at "."
numArray = numArg.split(".")
//set int array for numbers after the first char, which is a letter
def intArray = numArray[2] { it as int } as int
//set string array for numbers after the first char, which is a letter
def letArray = numArray[1] { it as string }
//No clue how to write to a database or file... or do the persistence thing.
Any help would be appreciated.
I would use a loop to cycle over every line within the text file, and I would also use Java methods for manipulating strings.
def file = new File('C:\\file.txt')
StringBuilder sb = new StringBuilder();

file.eachLine { line ->
    //reset the StringBuilder to the new line
    sb.setLength(0);
    sb.append(line);

    //format the string: "A.12345" -> "A,1234.5"
    sb.setCharAt(1, ',' as char);  //replace the '.' separator with a comma
    sb.insert(6, '.');             //divide the number by 10 by shifting the decimal point
}
You could then write each line to a new text file, example here. You could use a simple counter (e.g. counter = 0; and then counter++;) to store the latest line that has been read/written and use that if an error occurs. You could catch possible errors within a try/catch statement if you are regularly getting crashes also.
This guide should give you a good start with working with a database (presuming SQL).
Warning, all of this code is untested and should hopefully give you more direction. There are probably many other ways to solve this differently, so keep an open mind.
I have a very large CSV file that I imported into an SQLite table. There are over 50 columns and I'd like to find all rows where any of the columns are null. Is this even possible? I'm just trying to save myself the time of writing out all 50 columns in a WHERE clause. Thanks.
It's an interesting question, but it's probably quicker to write a small script that converts a copy/pasted header row from your CSV into the appropriate SQL.
For instance, this works in LINQPad (C#):
void Main()
{
    string input = "adasda|sadasd|adasd|";
    char delim = '|';

    StringBuilder sql = new StringBuilder();
    sql.AppendLine("SELECT * FROM table WHERE ");

    foreach (string s in input.Split(delim))
    {
        if (!String.IsNullOrEmpty(s))
            sql.Append(s).AppendLine(" IS NULL OR ");
    }

    sql.ToString().Trim('\r', '\n', 'O', 'R', ' ').Dump();
}
No. Not without reading the table metadata (in SQLite, PRAGMA table_info) from a script or using some intermediate technology.
Your best bet would be to DEFAULT NULL the columns and re-import the data. But depending on the CSV import and column types, you may still get empty values in the columns.
Sucks, but it is probably quicker to just copy and paste the SQL commands. The script would be reusable.
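If you do want the clause generated for you, a rough sketch (assuming the Microsoft.Data.Sqlite package; the database and table names are placeholders) can read the column list from PRAGMA table_info and build the query:

using System;
using System.Collections.Generic;
using Microsoft.Data.Sqlite;

class NullRowFinder
{
    static void Main()
    {
        // Connection string and table name are placeholders.
        using (var connection = new SqliteConnection("Data Source=mydata.db"))
        {
            connection.Open();

            // Read the column names from the table metadata.
            var columns = new List<string>();
            using (var pragma = connection.CreateCommand())
            {
                pragma.CommandText = "PRAGMA table_info(MyTable);";
                using (var reader = pragma.ExecuteReader())
                {
                    while (reader.Read())
                        columns.Add(reader.GetString(1)); // column 1 of table_info is the name
                }
            }

            // Build "SELECT * FROM MyTable WHERE col1 IS NULL OR col2 IS NULL OR ...".
            string where = string.Join(" OR ", columns.ConvertAll(c => "\"" + c + "\" IS NULL"));
            Console.WriteLine("SELECT * FROM MyTable WHERE " + where + ";");
        }
    }
}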