Microsoft LogParser CSV output format - suppression of double quoting

I am having a problem with MS LogParser v2.2 and cannot seem to track down "the solution".
I am importing a UTF-16 TSV file with headers and am trying to export a subset of these fields to a CSV file, generally with no processing, but with one instance of concatenating two of the fields with an intervening space (combining a first and last name into a full name).
The problem is that not only are all the fields double quoted irrespective of the oDQuotes argument (which I can happily ignore), but the concatenated field contains the result of that double quoting. That is, given two fields Fred and Bloggs, the concatenated field always contains
"Fred" "Bloggs"
rather than the less harmful
"Fred Bloggs"
or even
Fred Bloggs
no matter what the value of the oDQuotes parameter (OFF, AUTO or ON). The presence of these double quotes cannot be ignored or easily discarded.
I have tried this both in a batch file and in Windows Script:
e.g. Batch File:
set lp=%ProgramFiles(x86)%\Log Parser 2.2\LogParser.exe
set fields=[Buyer E-mail Address]
set fields=%fields%, [Order ID]
set fields=%fields%, [Shipping Addr 1]
set fields=%fields%, [Shipping Addr 2]
set fields=%fields%, [Shipping City]
set fields=%fields%, [Shipping Postal Code]
set fields=%fields%, [Buyer First Name]
::set fields=%fields%, strcat([Buyer First Name], ' ', [Buyer Last Name]) --- does not work. :-(
set fields=%fields%, strcat([Buyer First Name], strcat(' ', [Buyer Last Name]))
set fields=%fields%, [Buyer Last Name]
set fields=%fields%, [Buyer Company]
set fields=%fields%, [Buyer Day Phone]
set sql=SELECT %fields% into chad_out.csv from %1
"%lp%" -q:ON -i:TSV -icodepage:-1 -nSep:1 -fixedSep:on -o:CSV -oDQuotes:OFF -fileMode:1 "%sql%"
or JScript:
function ProcessFile(filename) {
    DebugEcho(50, "D&D File name is <" + filename + ">");
    var lq = WScript.CreateObject("MSUtil.LogQuery");
    var lqif = WScript.CreateObject("MSUtil.LogQuery.TSVInputFormat");
    var lqof = WScript.CreateObject("MSUtil.LogQuery.CSVOutputFormat");
    // check that we actually have the objects in question
    if (lq && lqif && lqof) {
        DebugEcho(100, "Everything ok");
    } else {
        DebugEcho(0, "Something bad with LogQuery objects - exiting");
        WScript.Quit(1);
    }
    // see command line "> LogParser.exe -h -i:TSV" for details
    lqif.codepage = -1;    // this is for unicode
    lqif.fixedSep = true;  // seems to need this
    lqif.nSep = 1;         // seems to need this?
    // see command line "> LogParser.exe -h -o:CSV" for details
    lqof.oDQuotes = "OFF"; // OFF | AUTO | ON - doesn't make any difference!
    lqof.fileMode = 1;     // 0 - append, 1 - overwrite, 2 - ignore
    var fields = [
        "[Buyer E-mail Address]",
        "[Order ID]",
        "[Shipping Addr 1]",
        "[Shipping Addr 2]",
        "[Shipping City]",
        "[Shipping Postal Code]",
        "[Buyer First Name]",
        "strcat([Buyer First Name], strcat(' ', [Buyer Last Name]))",
        "[Buyer Last Name]",
        "[Buyer Company]",
        "[Buyer Day Phone]"
    ];
    var sql = [
        "SELECT",
        fields.join(", "),
        "INTO", "chad_out.csv",
        "FROM", filename
    ].join(" ");
    DebugEcho(20, "query string:", sql);
    lq.ExecuteBatch(sql, lqif, lqof);
}
I'm afraid I can't actually give any data as it's confidential, but I hope that the illustration I have given is sufficient.
I do have other alternatives (e.g. Python's csv module) but this requires at least packaging the script into an executable (I do not wish to install Python as generally available software).
Can anyone spot something that I have obviously missed to control the behaviour of quoting or is this a deficiency of an otherwise powerful tool? Googling oDQuotes does not seem to be very productive.

It sounds like the quotes are in the input TSV file, aren't they? If that's the case, then the quotes are imported with the field values and you'll need to strip them off with your query (using SUBSTR(MyField, 1, -1)).
The TSV input format does not expect quoted fields and thus does not remove them.
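If that is what is happening, a minimal sketch of the suggested fix applied to the concatenation from the question might look like this (field names and the output file come from the question, input.tsv is a placeholder, and the exact SUBSTR argument semantics are worth verifying against LogParser's built-in function help):
SELECT SUBSTR([Buyer First Name], 1, -1) AS FirstName,
       STRCAT(SUBSTR([Buyer First Name], 1, -1),
       STRCAT(' ', SUBSTR([Buyer Last Name], 1, -1))) AS FullName
INTO chad_out.csv
FROM input.tsv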

Related

parse csv file with string fields that contain double quotes and/or commas using Snowflake COPY INTO

MY QUESTION:
How do I construct my COPY INTO statement so that my file parses and loads properly? Thanks!
THE PROBLEM:
I have a csv file that I need to parse and copy to a table from a named stage in Snowflake.
The file looks similar to below:
ID, Name, Job Title,Company Name, Email Address, Phone Number
5244, Ted Jones, Manager, Quality Comms, tj@email.com,555-630-1277
5246, Talim Jones,""P-Boss"" of the world, Quality Comms, taj@email.com,555-630-127
5247, Jordy Jax,,"M & G Services.",jj@services.com, 616-268-1546
MY CODE:
COPY INTO DB.SCHEMA.TABLE_NAME
(
ID,
FULL_NAME,
JOB_TITLE,
EMAIL_ADDRESS
)
FROM
(
SELECT $1::NUMBER AS ID,
$2 AS FULL_NAME,
$3 AS JOB_TITLE,
$5 AS EMAIL_ADDRESS
FROM @STAGE_NAME)
--SNOWFLAKE DOES NOT SUPPORT UTF 16 OR 32 SO HAVING REPLACE INVALID UTF 8 CHARACTERS
FILE_FORMAT = (TYPE = 'CSV', RECORD_DELIMITER = '\n', FIELD_DELIMITER = ',', SKIP_HEADER = 1,FIELD_OPTIONALLY_ENCLOSED_BY = '"',TRIM_SPACE = TRUE,REPLACE_INVALID_CHARACTERS = TRUE)
ON_ERROR = CONTINUE
--COPY A FILE INTO A TABLE EVEN IF IT HAS ALREADY BEEN LOADED INTO THE TABLE
FORCE = TRUE
MY ERROR MESSAGE:
Found character 'P' instead of field delimiter ','
WHAT I HAVE TRIED:
I have tried many things, most notably:
I have tried to escape the double quotes in my select statement for the Job Title.
I have tried removing the FIELD_OPTIONALLY_ENCLOSED_BY = '"' parameter and just using ESCAPE = '"' with no luck.
Try removing the option FIELD_OPTIONALLY_ENCLOSED_BY = '"' and also include a replace function in your inner query.
Example:
SELECT
$1::NUMBER AS ID,
$2 AS FULL_NAME,
replace($3,'"','') AS JOB_TITLE,
$5 AS EMAIL_ADDRESS
FROM @STAGE_NAME
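Putting the pieces together, the full statement would then look roughly like this (a sketch combining the question's COPY INTO with the change above; names and file format options are taken from the question):
COPY INTO DB.SCHEMA.TABLE_NAME (ID, FULL_NAME, JOB_TITLE, EMAIL_ADDRESS)
FROM (
    SELECT $1::NUMBER AS ID,
           $2 AS FULL_NAME,
           REPLACE($3, '"', '') AS JOB_TITLE,
           $5 AS EMAIL_ADDRESS
    FROM @STAGE_NAME
)
FILE_FORMAT = (TYPE = 'CSV', RECORD_DELIMITER = '\n', FIELD_DELIMITER = ',', SKIP_HEADER = 1, TRIM_SPACE = TRUE, REPLACE_INVALID_CHARACTERS = TRUE)
ON_ERROR = CONTINUE
FORCE = TRUE;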

psycopg2 write list of strings (with text delimiter) to a postgres array

Objective:
I have a list containing strings, some of which have single quotes in them (as part of the string itself):
listOfStr = ['A sample string', "A second string with a ' single quote", 'a third string', ...]
Note that each entry does not necessarily use the same text delimiter: some are single quoted, others (the ones containing a single quote as part of the string) are double quoted.
I want to insert my list as a postgresql ARRAY using psycopg2:
import psycopg2
connString = (...) # my DB parameters here.
conn = psycopg2.connect(connString)
curs = conn.cursor()
update_qry = ("""UPDATE "mytable" SET arraycolumn = {listofStr}::varchar[],
timestamp = now() WHERE id = {ID}""".format(listofStr=listofStr,
ID=ID))
curs.execute(update_qry)
The problem:
But I get this error:
SyntaxError: syntax error at or near "["
LINE 1: UPDATE "mytable" SET arraycolumn = ['A sample string'...
If I specify the ARRAY data type in the SQL query by adding the word 'ARRAY' in front of my list:
update_qry = ("""UPDATE "mytable" SET arraycolumn = ARRAY {listofStr}::varchar[],
timestamp = now() WHERE id = {ID}""".format(listofStr=listofStr,
ID=ID))
I get this error:
UndefinedColumn: column "A second string with a ' single quote" does not exist
LINE 1: 'A sample string', "A second string with a '...
I don't know how to fix it.
Environment:
Ubuntu 18.04 64 bits 5.0.0-37-generic x86_64 GNU/Linux
Python 3.6.9 (default, Nov 7 2019, 10:44:02)
psycopg2 2.7.7
psycopg2-binary 2.8.4
"PostgreSQL 10.10 (Ubuntu 10.10-0ubuntu0.18.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, 64-bit"
Related threads:
Postgres/psycopg2 - Inserting array of strings
Doc:
http://initd.org/psycopg/docs/usage.html -> # list adaptation
Basically the question should have been closed as a duplicate. However, since you already know piro's answer, I think the problem is with interpreting it.
id = 1
list_of_str = ['A sample string', "A second string with a ' single quote", 'a third string']
update_qry = """
UPDATE mytable
SET arraycolumn = %s,
timestamp = now()
WHERE id = %s
"""
cur = conn.cursor()
cur.execute(update_qry, [list_of_str, id])
conn.commit()
I agree with @piro that you really want Bind Parameters, rather than attempting to do any crazy quoting. You already know how to accomplish that when inserting one simple VARCHAR row per list element.
I recommend you create a TEMP TABLE and send your data to the database in that way. Then consult https://www.postgresql.org/docs/current/sql-expressions.html#SQL-SYNTAX-ARRAY-CONSTRUCTORS and use this example to munge rows of the temp table into an array:
SELECT ARRAY(SELECT oid FROM pg_proc WHERE proname LIKE 'bytea%');
You will want an expression like
SELECT ARRAY(SELECT my_text FROM my_temp_table);
It is possible that your temp table will also need an integer column, to preserve element order.
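A minimal sketch of that approach, assuming a hypothetical temp table my_temp_table with an ord column to preserve element order (mytable, arraycolumn, timestamp and id come from the question; the rest is invented for illustration):
-- one row per list element; ord preserves the original list order
CREATE TEMP TABLE my_temp_table (ord integer, my_text varchar);
-- ... populate my_temp_table here (e.g. with executemany or COPY) ...
UPDATE mytable
SET arraycolumn = ARRAY(SELECT my_text FROM my_temp_table ORDER BY ord),
    timestamp = now()
WHERE id = 1;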

Case test if substring exist and replace with blank

I need to clean up a set of company names by replacing INC, LTD, LTD., INC., and others with an empty string when they are individual words (with one blank space before the word, e.g. Incoming INC), and not letters that are part of the company name, e.g. INComing Money.
The logic I tried:
case
when FINDSTRING([Trade Name]," INC",1) > 0 then REPLACE([Trade Name]," INC","")
when FINDSTRING([Trade Name]," LTD",1) > 0 then REPLACE([Trade Name]," LTD","")
ELSE [Trade Name]
I tried an SSIS expression in a derived column:
FINDSTRING( [Trade Name] ," INC",1) ? REPLACE([Trade Name]," INC","") :
FINDSTRING([Trade Name]," LTD",1) ? REPLACE([Trade Name]," LTD",""):
The error received:
Error at Data Flow Task [Derived Column [1]]: Attempt to find the
input column named "A" failed with error code 0xC0010009. The input
column specified was not found in the input column collection.
In a similar case it is easier to use a Script Component to clean this column: you can simply split the column on spaces, then re-concatenate the parts that are not equal to INC. You can use the following method to do that, or you can simply use the Regex.Replace() method to replace values based on regular expressions:
string value = "";
string[] parts = Row.TradeName.Split(' ');
foreach(string str in parts){
if(str != "INC"){
value += " " + str;
}
}
Row.outTradeName = value.TrimStart();

SQL Server 2014 takes off leading zeroes when making Excel file... but.

This sp_send_dbmail script works in one of our processes. It attaches an Excel file filled with whatever the query is. It knows to do this because of the extension on the file's name (.xls).
However, it changes a varchar(50) field into a number field, and removes the leading zeroes. This is a known annoyance dealt with in a million ways that won't work for my process.
EXEC msdb.dbo.sp_send_dbmail
    @profile_name = @profileName
    ,@recipients = @emailRecipientList
    ,@subject = @subject
    ,@importance = @importance
    ,@body = @emailMsg
    ,@body_format = 'html'
    ,@query = @QuerySQL
    ,@execute_query_database = @QueryDB
    ,@attach_query_result_as_file = 1
    ,@query_attachment_filename = @QueryExcelFileName
    ,@query_result_header = 1
    ,@query_result_width = @QueryWidth
    ,@query_result_separator = @QuerySep
    ,@query_result_no_padding = 1
Examples of problem below: this simple query changes the StringNumber column from varchar to number in Excel, and removes the zeroes.
SELECT [RowID],[Verbage], StringNumber FROM [dbo].[tblTestStringNumber]
In SQL Server (desired format):
After in Excel (leading zeroes missing):
Now, there might be a way. I only say this because in the SQL Server 2016 results pane, if you right-click in the upper left-hand corner, it gives the option "Open in Excel".
And... drum roll... the dataset opens in Excel and the leading zeroes are still there!
If you start a number with a single quote (') in Excel, it will interpret it as a string, so a common solution is to change the query to add one in:
SELECT [RowID]
,[Verbage]
, StringNumber = '''' + [StringNumber]
FROM [dbo].[tblTestStringNumber]
And Excel will usually not display the single quote because it knows that it's a way to cast to type string.
@JustJohn I think it will work fine:
SELECT [RowID]
,[Verbage]
, '="' + [StringNumber]+ '"' StringNumber
FROM [dbo].[tblTestStringNumber]
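For completeness, here is a sketch of how that expression might be embedded in the query variable passed to sp_send_dbmail above (variable, table, and column names are taken from the question; the doubled single quotes only escape the quote characters inside the string literal):
DECLARE @QuerySQL NVARCHAR(MAX);
SET @QuerySQL = N'SELECT [RowID], [Verbage], ''="'' + [StringNumber] + ''"'' AS StringNumber
                  FROM [dbo].[tblTestStringNumber]';
-- @QuerySQL is then supplied as the @query argument of the EXEC msdb.dbo.sp_send_dbmail call shown earlier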

mysql2sqlite.sh script is not working as required

I am using the mysql2sqlite.sh script from GitHub to convert my MySQL database to SQLite. But the problem I am getting is that in my table the data 'E-001' gets changed to 'E?001'.
I have no idea how to modify the script to get the required result. Please help me.
The script is:
#!/bin/sh
# Converts a mysqldump file into a Sqlite 3 compatible file. It also extracts the MySQL `KEY xxxxx` from the
# CREATE block and create them in separate commands _after_ all the INSERTs.
# Awk is chosen because it's fast and portable. You can use gawk, original awk or even the lightning fast mawk.
# The mysqldump file is traversed only once.
# Usage: $ ./mysql2sqlite mysqldump-opts db-name | sqlite3 database.sqlite
# Example: $ ./mysql2sqlite --no-data -u root -pMySecretPassWord myDbase | sqlite3 database.sqlite
# Thanks to @artemyk and @gkuenning for their nice tweaks.
mysqldump --compatible=ansi --skip-extended-insert --compact "$@" | \
awk '
BEGIN {
FS=",$"
print "PRAGMA synchronous = OFF;"
print "PRAGMA journal_mode = MEMORY;"
print "BEGIN TRANSACTION;"
}
# CREATE TRIGGER statements have funny commenting. Remember we are in trigger.
/^\/\*.*CREATE.*TRIGGER/ {
gsub( /^.*TRIGGER/, "CREATE TRIGGER" )
print
inTrigger = 1
next
}
# The end of CREATE TRIGGER has a stray comment terminator
/END \*\/;;/ { gsub( /\*\//, "" ); print; inTrigger = 0; next }
# The rest of triggers just get passed through
inTrigger != 0 { print; next }
# Skip other comments
/^\/\*/ { next }
# Print all `INSERT` lines. The single quotes are protected by another single quote.
/INSERT/ {
gsub( /\\\047/, "\047\047" )
gsub(/\\n/, "\n")
gsub(/\\r/, "\r")
gsub(/\\"/, "\"")
gsub(/\\\\/, "\\")
gsub(/\\\032/, "\032")
print
next
}
# Print the `CREATE` line as is and capture the table name.
/^CREATE/ {
print
if ( match( $0, /\"[^\"]+/ ) ) tableName = substr( $0, RSTART+1, RLENGTH-1 )
}
# Replace `FULLTEXT KEY` or any other `XXXXX KEY` except PRIMARY by `KEY`
/^ [^"]+KEY/ && !/^ PRIMARY KEY/ { gsub( /.+KEY/, " KEY" ) }
# Get rid of field lengths in KEY lines
/ KEY/ { gsub(/\([0-9]+\)/, "") }
# Print all fields definition lines except the `KEY` lines.
/^ / && !/^( KEY|\);)/ {
gsub( /AUTO_INCREMENT|auto_increment/, "" )
gsub( /(CHARACTER SET|character set) [^ ]+ /, "" )
gsub( /DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP|default current_timestamp on update current_timestamp/, "" )
gsub( /(COLLATE|collate) [^ ]+ /, "" )
gsub(/(ENUM|enum)[^)]+\)/, "text ")
gsub(/(SET|set)\([^)]+\)/, "text ")
gsub(/UNSIGNED|unsigned/, "")
if (prev) print prev ","
prev = $1
}
# `KEY` lines are extracted from the `CREATE` block and stored in array for later print
# in a separate `CREATE KEY` command. The index name is prefixed by the table name to
# avoid a sqlite error for duplicate index name.
/^( KEY|\);)/ {
if (prev) print prev
prev=""
if ($0 == ");"){
print
} else {
if ( match( $0, /\"[^"]+/ ) ) indexName = substr( $0, RSTART+1, RLENGTH-1 )
if ( match( $0, /\([^()]+/ ) ) indexKey = substr( $0, RSTART+1, RLENGTH-1 )
key[tableName]=key[tableName] "CREATE INDEX \"" tableName "_" indexName "\" ON \"" tableName "\" (" indexKey ");\n"
}
}
# Print all `KEY` creation lines.
END {
for (table in key) printf key[table]
print "END TRANSACTION;"
}
'
exit 0
I can't give a guaranteed solution, but here's a simple technique I've been using successfully to handle similar issues (See "Notes", below). I've been wrestling with this script the last few days, and figure this is worth sharing in case there are others who need to tweak it but are stymied by the awk learning curve.
The basic idea is to have the script output to a text file, edit the file, then import into sqlite (More detailed instructions below).
You might have to experiment a bit, but at least you won't have to learn awk (though I've been trying and it's pretty fun...).
HOW TO
Run the script, exporting to a file (instead of passing directly to sqlite3):
./mysql2sqlite -u root -pMySecretPassWord myDbase > sqliteimport.sql
Use your preferred text editing technique to clean up whatever mess you've run into. For example, search/replace in Sublime Text. (See the last note, below, for a tip.)
Import the cleaned-up script into sqlite:
sqlite3 database.sqlite < sqliteimport.sql
NOTES:
I suspect what you're dealing with is an encoding problem -- that '-' represents a character that isn't recognized by, or means something different to, either your shell, the script (awk), or your sqlite database. Depending on your situation, you may not be able to finesse the problem (see the next note).
Be forewarned that this is most likely only going to work if the offending characters are embedded in text data (not just as text, but actual text content stored in a text field). If they're in a machine name (foreign key field, entity id, e.g.), binary data stored as text, or text data stored in a binary field (blob, eg), be careful. You could try it, but don't get your hopes up, and even if it seems to work be sure to test the heck out of it.
If in fact that '-' represents some unusual character, you probably won't be able to just type a hyphen into the 'search' field of your search/replace tool. Copy it from the source data (e.g., open the file, highlight and copy to clipboard), then paste it into the tool.
Hope this helps!
To convert MySQL to SQLite3 you can use Navicat Premium.
