Pulling rows from .log file into SQL Server table - sql-server

I have a very flat, simple log file (6 rows, of which one row is blank) that I want to insert into a simple 5-column SQL Server table.
Please excuse my SQL ignorance; I am not well educated on this topic.
Below is the .log file content:
-----------Log File content start----------
07/30/2016 00:02:03 : BATCH CLOSE SUMMARY
MerchantID - 000022673665
TerminalID - 013
BatchItemCount - 650
NetBatchTotal - 5095.00
----------Log file content end-------------
Below is the simple SQL Server table layout:
CREATE TABLE dbo.CCClose
(
CloseTime NVARCHAR(50) NOT NULL,
MercID NVARCHAR(50) NOT NULL,
TermID NVARCHAR(50) NOT NULL,
BatchCount NVARCHAR(30) NOT NULL,
NetBatcTotal NVARCHAR(50) NOT NULL
);
I'm hoping to somehow have each row examined by SQL, for example:
if .log file like 'Batch close Summary' then insert into CloseTime else
if .log file like 'MerchantID' then insert into MercID else
if .log file like 'BatchItemCount' then insert into BatchCount else
if .log file like 'NetBatchTotal' then insert into NetBatchTotal
Of course it would be great if the proper formatting for each column were in place, but at this time I'm just looking at getting the .log file data populated from a directory of these logs.
I plan to use Crystal Reports to build on the SQL Server tables.

This is not going to be a simple process. You can probably do it with bulk insert. The idea is to read it into a staging table, using:
a record terminator of something like "----------Log file content end-------------" + newline
a field separator of a newline
a staging table with several columns of varchars
Then process the staging table to extract the values (and types) that you want. There are probably other options, if you set up a format file, but that adds another level of complexity.
I would read the file into a staging table, one line of the file per row of the table. Then, I would:
use window functions to assign a record number to rows, based on the "content start" lines
aggregate based on the record number
extract the values using aggregations, string functions, and conversions (see the sketch below)
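A minimal T-SQL sketch of that idea. The file path, the staging table, and the view are all assumptions of mine, and the running SUM that numbers records needs SQL Server 2012 or later:
CREATE TABLE dbo.CCCloseStaging
(
    LineID   INT IDENTITY(1,1) PRIMARY KEY,  -- assumed to follow file order for a plain, non-parallel load
    LineText NVARCHAR(400) NULL
);
GO
-- BULK INSERT maps file fields onto all table columns, so load through a
-- view that hides the IDENTITY column and exposes only LineText.
CREATE VIEW dbo.CCCloseStagingLoad AS
    SELECT LineText FROM dbo.CCCloseStaging;
GO
BULK INSERT dbo.CCCloseStagingLoad
FROM 'C:\Logs\batch.log'
WITH (ROWTERMINATOR = '\n');   -- one staging row per log line
GO
-- Number the records from the "BATCH CLOSE SUMMARY" lines, then pivot
-- each record's lines into the five columns of dbo.CCClose.
WITH numbered AS
(
    SELECT LineID, LineText,
           SUM(CASE WHEN LineText LIKE '%BATCH CLOSE SUMMARY%' THEN 1 ELSE 0 END)
               OVER (ORDER BY LineID) AS RecordNo
    FROM dbo.CCCloseStaging
)
INSERT INTO dbo.CCClose (CloseTime, MercID, TermID, BatchCount, NetBatcTotal)
SELECT
    MAX(CASE WHEN LineText LIKE '%BATCH CLOSE SUMMARY%' THEN LEFT(LineText, 19) END),
    MAX(CASE WHEN LineText LIKE 'MerchantID%'     THEN LTRIM(SUBSTRING(LineText, CHARINDEX('-', LineText) + 1, 100)) END),
    MAX(CASE WHEN LineText LIKE 'TerminalID%'     THEN LTRIM(SUBSTRING(LineText, CHARINDEX('-', LineText) + 1, 100)) END),
    MAX(CASE WHEN LineText LIKE 'BatchItemCount%' THEN LTRIM(SUBSTRING(LineText, CHARINDEX('-', LineText) + 1, 100)) END),
    MAX(CASE WHEN LineText LIKE 'NetBatchTotal%'  THEN LTRIM(SUBSTRING(LineText, CHARINDEX('-', LineText) + 1, 100)) END)
FROM numbered
WHERE RecordNo > 0
GROUP BY RecordNo;
Formatting (dates, numerics) can then be layered on top, but this keeps everything as NVARCHAR to match the table above.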

Related

Easy way to load a CSV file from the command line into a new table of an Oracle database without specifying the column details

I often want to quickly load a CSV into an Oracle database. The CSV (Unicode) is on a machine with Oracle Instant Client version 19.5; the Oracle database is version 18c.
I am looking for a command-line tool that uploads the rows without my having to specify a column structure.
I know I can use sqlldr with a .ctl file, but then I need to define column types, etc. I am interested in a tool that figures out the column attributes itself from the data in the CSV (or uses a generic default for all columns).
The CSVs I have to ingest always contain a header row, which the tool in question could use to determine appropriate columns for the table.
Starting with Oracle 12c, you can use sqlldr in express mode, so you don't need a control file.
In Oracle Database 12c onwards, SQL*Loader has a new feature called express mode that makes loading CSV files faster and easier. With express mode, there is no need to write a control file for most CSV files you load. Instead, you can load the CSV file with just a few parameters on the SQL*Loader command line.
An example
Imagine I have a table like this
CREATE TABLE EMP
(EMPNO number(4) not null,
ENAME varchar2(10),
HIREDATE date,
DEPTNO number(2));
Then a CSV file that looks like this:
7782,Clark,09-Jun-81,10
7839,King,17-Nov-81,12
I can use sqlldr in express mode:
sqlldr userid=xxx table=emp
You can read more about express mode in this white paper
Express Mode in SQLLDR
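As I understand it, express mode looks for a data file named after the table (emp.dat in this example) by default; if your file is named differently you can point sqlldr at it with the data parameter. A hedged example (the connect string and file name are placeholders):
sqlldr userid=scott/tiger@orcl table=emp data=emp.csv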
Forget about using sqlldr in a script file. Your best bet is to use an external table. This is a CREATE TABLE statement with SQL*Loader-style access parameters that reads a file from a directory and exposes it as a table. Super easy, really convenient.
Here is an example:
-- <createDirectoryWithYourPath> is a placeholder for an Oracle DIRECTORY object
-- that points at the folder holding filename.csv; "fieldname" in the LOAD WHEN
-- clause is a placeholder for one of the fields below, and SKIP 9 skips the
-- first 9 lines of the data file.
create table thisTable (
   "field1"    varchar2(10)
  ,"field2"    varchar2(100)
  ,"field3"    varchar2(100)
  ,"dateField" date
)
organization external (
  type oracle_loader
  default directory <createDirectoryWithYourPath>
  access parameters (
    records delimited by newline
    load when (fieldname != blanks)
    skip 9
    fields terminated by ',' optionally enclosed by '"' ltrim
    missing field values are null
    (
       "field1"
      ,"field2"
      ,"field3"
      ,"dateField" date 'mm/dd/yyyy'
    )
  )
  location ('filename.csv')
);
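Once the external table exists you can query it like any ordinary table, or materialize it; for instance (the target table name here is mine):
-- Copy the external table's rows into an ordinary heap table.
create table myRealTable as select * from thisTable;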

How do I save archive from SQL Server database

I have a database in SQL Server. Basically, the table consists of a number of XML documents that represent the same table's data at a given time (like a backup history). What is the best method to cut off all the old (3 months and older) backups, remove them from the DB, and save them archived?
There is no out-of-the-box export for this in SQL Server.
Assuming that:
your table can be pretty big, since it looks like you add an image of the table every minute
you want to do it all from inside SQL Server
then I'd suggest doing the cleanup in chunks.
The usual way in SQL Server to delete in chunks is to use DELETE in combination with the OUTPUT clause.
The easiest way to archive and remove is then to have the OUTPUT go to a table in another database created for that sole purpose.
So your steps would be:
Create a new database (ArchiveDatabase)
Create an archive table in ArchiveDatabase (ArchiveTable) with the same structure as the table you want to archive (see the sketch after this list)
In a while loop, perform the DELETE/OUTPUT
Back up the ArchiveDatabase (sketched after the loop below)
TRUNCATE the ArchiveTable table in ArchiveDatabase
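For step 2, a minimal sketch, assuming the source table is dbo.MyDataTable as in the loop below:
-- Create an empty archive copy with the same columns as the source table.
SELECT *
INTO ArchiveDatabase.dbo.ArchiveTable
FROM dbo.MyDataTable
WHERE 1 = 0;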
The DELETE/OUTPUT loop will look something like this:
declare @RowsToDelete int = 1000;
declare @DeletedRowsCnt int = 1000;

while @DeletedRowsCnt = @RowsToDelete
begin
    delete top (@RowsToDelete)
    from MyDataTable
    output deleted.* into ArchiveDatabase.dbo.ArchiveTable
    where dt < dateadd(month, -3, getdate());   -- only rows older than 3 months

    set @DeletedRowsCnt = @@ROWCOUNT;
end
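Steps 4 and 5 would then look something like this (the backup path is an assumption):
-- Back up the archive database, then empty the archive table for the next run.
BACKUP DATABASE ArchiveDatabase
TO DISK = N'D:\Backups\ArchiveDatabase.bak'
WITH INIT;

TRUNCATE TABLE ArchiveDatabase.dbo.ArchiveTable;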

hive "\n" value in records

I am processing a large 120 GB file using Hive. The data is first loaded from a SQL Server table into AWS S3 as a CSV file (tab-separated), and then a Hive external table is created on top of this file. I have encountered a problem while querying data from the Hive external table. I noticed that the CSV contains \n in many column fields (which was actually NULL in SQL Server). Now, when I create the Hive table, a \n appearing in any record pushes Hive to a new record and generates NULL for the rest of the columns in that record. I tried lines terminated by "001" but had no success; I get an error that Hive only supports "lines terminated by \n". My question is: if Hive supports only \n as the line separator, how would you handle columns that contain \n values?
Any suggestions?
This is how I am creating my external table:
DROP TABLE IF EXISTS IMPT_OMNITURE__Browser;
CREATE EXTERNAL TABLE IMPT_OMNITURE__Browser (
ID int, Region string, Description string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION 's3://abm-dw/data-import/omniture/Browser/';
You could alter the table with the command below, or add the property to the CREATE statement via the table properties:
ALTER TABLE IMPT_OMNITURE__Browser SET SERDEPROPERTIES ('serialization.null.format' = '');
This makes empty values in the file be read as NULL.
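For the CREATE-statement variant mentioned above, a sketch using the question's own DDL with the property added as a table property (exact behaviour can vary with the Hive version):
DROP TABLE IF EXISTS IMPT_OMNITURE__Browser;
CREATE EXTERNAL TABLE IMPT_OMNITURE__Browser (
ID int, Region string, Description string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION 's3://abm-dw/data-import/omniture/Browser/'
TBLPROPERTIES ('serialization.null.format' = '');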

How to insert xml file into xml field using bcp?

I have a table:
USE [testdb]
GO
CREATE TABLE [dbo].[a](
[n] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
[x] [xml] NULL)
GO
How to insert xml file into field x from client?
MSDN Example doesn't suit me.
INSERT INTO T(XmlCol)
SELECT * FROM OPENROWSET(
BULK 'c:\SampleFolder\SampleData3.txt',
SINGLE_BLOB) AS x;
I am not the administrator of this server, and I only have access to the database; I cannot put a file into a directory on the server. I can use bcp and other tools that access the database.
The XML file is very large (> 50 MB), so it is not feasible to paste the file's text as a constant into a query in SSMS.
Little known fact: the bcp utility supports arbitrary strings as column and row delimiters. Construct a file with delimiters not present in your data, and invoke bcp accordingly.
For example, your column delimiter could be -t \0Field\0 (bcp's -t option sets the field terminator). Just check the data first. :-)

How to improve data insert/update performance?

I need to improve the performance of data loading. The current algorithm makes a full select from a table:
select Field1, Field2,...,FieldN from Table1 order by FieldM
The new data is read from a text file (one text-file line per data-table row).
The table has a primary key consisting of two fields. For each line of the text file, the code locates the necessary row by these two fields (i.e. the primary key).
query.Locate('Field1;Field2',VarArrayOf([Value1,Value2]),[]);
If Locate returns True, it edits the row, otherwise it appends a new one.
Since the table consists of about 200,000 rows, each Locate operation takes a certain amount of time, so it manages to update only about 5-6 rows per second.
What things should I consider to improve it?
Should I perhaps replace locating through this big select with separate queries?
DON'T use Locate(). If you use Locate(), Delphi searches for the row on the client side by scanning the row set returned by your query, and that takes a LOT of time.
If you have access to MSSQL to create stored procedures, then create the following procedure and just run it for each line of your text file without any conditions (use TADOStoredProc.ExecProc in Delphi). In this case you don't need the initial select or the Locate call. The procedure updates the record if Field1 and Field2 are found, and inserts a new one if they are not.
CREATE PROCEDURE dbo.update_table1
    @Field1 int,  -- key 1
    @Field2 int,  -- key 2
    @Field3 int,  -- data fields
    @Field4 int
AS
SET NOCOUNT ON;

UPDATE table1
SET Field3 = @Field3, Field4 = @Field4
WHERE Field1 = @Field1 AND Field2 = @Field2;

IF (@@ROWCOUNT = 0)
BEGIN
    INSERT INTO table1 (Field1, Field2, Field3, Field4)
    VALUES (@Field1, @Field2, @Field3, @Field4);
END
GO
Here is Delphi code to invoke this stored procedure with ADO:
......
var
  ADOStoredP: TADOStoredProc;
......
begin
  ........
  ADOStoredP := TADOStoredProc.Create(nil);
  try
    ADOStoredP.Connection := DataMod.SQL_ADOConnection; // your ADO connection instance here
    ADOStoredP.ProcedureName := 'update_table1';
    ADOStoredP.Parameters.CreateParameter('@Field1', ftInteger, pdInput, 0, 0);
    ADOStoredP.Parameters.CreateParameter('@Field2', ftInteger, pdInput, 0, 0);
    ADOStoredP.Parameters.CreateParameter('@Field3', ftInteger, pdInput, 0, 0);
    ADOStoredP.Parameters.CreateParameter('@Field4', ftInteger, pdInput, 0, 0);
    while {your text-file loop condition here} do
    begin
      ADOStoredP.Parameters.ParamByName('@Field1').Value := {Field1 value from the text file};
      ADOStoredP.Parameters.ParamByName('@Field2').Value := {Field2 value from the text file};
      ADOStoredP.Parameters.ParamByName('@Field3').Value := {Field3 value from the text file};
      ADOStoredP.Parameters.ParamByName('@Field4').Value := {Field4 value from the text file};
      ADOStoredP.ExecProc;
    end;
  finally
    if Assigned(ADOStoredP) then
    begin
      ADOStoredP.Free;
    end;
  end;
  ........
end;
If it is possible, then you should send the text file to the server running SQL Server. Then use OPENROWSET(BULK) to open the text file (see "E. Using the OPENROWSET BULK provider with a format file to retrieve rows from a text file").
If you cannot send the text file to the server, then create a temporary or persistent DB table and use INSERT to insert all text file rows into the table.
If you are using SQL Server 2008 or later, then you should use the MERGE statement. On older SQL Server versions you can use two SQL commands, UPDATE and INSERT. As a data source, use (1) OPENROWSET or (2) the DB table.
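For illustration, a hedged sketch of the MERGE variant, reusing the table and column names from the stored procedure above and assuming the text-file rows have first been loaded into a temporary table called #Staging:
-- Upsert all staged rows into table1 in a single statement (SQL Server 2008+).
MERGE table1 AS t
USING #Staging AS s
    ON t.Field1 = s.Field1 AND t.Field2 = s.Field2
WHEN MATCHED THEN
    UPDATE SET Field3 = s.Field3, Field4 = s.Field4
WHEN NOT MATCHED THEN
    INSERT (Field1, Field2, Field3, Field4)
    VALUES (s.Field1, s.Field2, s.Field3, s.Field4);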
