I have some SQL deadlock reports I am trying to capture mediaName from. The deadlock report is in XML, but the attribute I need is buried in XML, then SQL, then XML again. Here is an example.
The XPath for where the SQL starts is /deadlock/process-list/process/inputbuf, and the SQL is:
SET DEADLOCK_PRIORITY 8;
EXEC spM_Ext_InsertUpdateXML N'<mediaRecords><media
title="This Is the title" mediaType="0"
creationTime="2018-03-16T00:59:43" origSOM="01:00:00;00" notes="Air Date:
2018-03-18 
Air Window: 3 
" mediaName="This is what i need"
><mediaInstances><mediaInstance directory="here"
duration="00:28:40;11" version="1" position="00:00:00;00" mediaSetId="34"
creationStartTime="2018-03-16T00:59:43;25" creationEndTime="2018-03-
16T00:59:43;25"/></mediaInstances><properties><
classifications><classification category="HD" classification="Content
Resolution"/></classifications><markups><markup
name=""><Item duration="00:00:10;00" orderNo="1"
type="Dynamic" som="00:59:50;00" comment=""
name="Segment"/></markup><markup
name="Segment"><markupItem duration="00:08:41;10" orderNo="2"
type="Dynamic" som="01:00:00;00" comment="Main Title and Segment 1 |
ID:SEDC" name="Segment"/></markup><markup
name="Black"><markup
See how the XML isn't using < and > for the elements but the &lt; and &gt; entities, which adds complexity.
I am trying to extract only mediaName from this report but can't get past the above-mentioned XPath with PowerShell. Was hoping someone might have an idea. I was using:
$xml = [xml](Get-Content "C:\Users\user\desktop\test.xml")
$xml.SelectNodes('/deadlock/process-list/process/inputbuf') | select mediaName
I have also tried piping Select-Xml to Where-Object, but I don't think I am using the right $_.[input]
With the help of tomalak and the answer below, this is the fixed and working parsing script.
# Report file location, edited by user when needed
$DeadlockReport = "C:\Users\User\Desktop\xml_report1.xml"
# Create object to load the XML from the deadlock report and find the SQL within
$xml = New-Object xml
$xml.Load($DeadlockReport)
$inputbuf = $xml.SelectNodes('//deadlock/process-list/process/inputbuf')
$value = $inputbuf.'#text'
# Find the internal XML and strip bad values, SQL, and truncation with regex
$value = $value -replace "^[\s\S]*?N'","" -replace "';\s*$","" -replace "<markup.*$","</properties></media></mediaRecords>"
# Append root elements to $value
$fix = "<root>" + $value + "</root>"
# Create a second XML document and load the corrected XML
$payload = New-Object xml
$payload.LoadXml($fix)
# Find the mediaName attribute nodes in the inner XML
$mediaName = $payload.SelectNodes('//root/mediaRecords/media/@mediaName')
# Iterate through and output all media names
foreach($i in $mediaName)
{
    $i.Value
}
What you have is:
an XML file,
which contains a string value,
which is SQL,
which contains another string value,
which is XML again.
So let's peel the onion.
First off, please never load XML files like this:
# this is bad code, don't use
$xml = [xml](Get-Content "C:\Users\user\desktop\test.xml")
XML has sophisticated file encoding detection, and you are short-circuiting it by letting PowerShell load the file. This can lead to data breaking silently, because PowerShell's Get-Content has no idea what the actual encoding of the XML file is. (Sometimes the above works, sometimes it doesn't. "It works for me" doesn't mean you're doing it right, it means you're getting lucky.)
This is the correct way:
$xml = New-Object xml
$xml.Load("C:\Users\user\desktop\test.xml")
Here the XmlDocument object will take care of loading the file and transparently adapt to any encoding it might have. Nothing can break and you don't have to worry about file encodings.
Second, don't let the looks of the XML file in a text editor deceive you. As indicated, /deadlock/process-list/process/inputbuf contains a string as far as XML is concerned; the &lt; and &gt; and all the rest will be decoded for you when you look at the actual text value of the element.
$inputbuf = $xml.SelectSingleNode('/deadlock/process-list/process/inputbuf')
$value = $inputbuf.'#text'
Write-Host $value
Would print something like this, which is SQL:
SET DEADLOCK_PRIORITY 8;
EXEC spM_Ext_InsertUpdateXML N'<mediaRecords><media
title="This Is the title" mediaType="0"
creationTime="2018-03-16T00:59:43" origSOM="01:00:00;00" notes="Air Date:
2018-03-18
Air Window: 3
" mediaName="This is what i need"
><mediaInstances><mediaInstance directory="here"
duration="00:28:40;11" version="1" position="00:00:00;00" mediaSetId="34"
creationStartTime="2018-03-16T00:59:43;25" creationEndTime="2018-03-
16T00:59:43;25"/></mediaInstances><properties><
classifications><classification category="HD" classification="Content
Resolution"/></classifications><markups><markup
name=""><Item duration="00:00:10;00" orderNo="1"
type="Dynamic" som="00:59:50;00" comment=""
name="Segment"/></markup><markup
name="Segment"><markupItem duration="00:08:41;10" orderNo="2"
type="Dynamic" som="01:00:00;00" comment="Main Title and Segment 1 |
ID:SEDC" name="Segment"/></markup><markup
name="Black"><markup ...
</mediaRecords>';
And the XML you are interested in is actually a string inside this SQL. If the SQL follows this pattern...
SET DEADLOCK_PRIORITY 8;
EXEC spM_Ext_InsertUpdateXML N'<...>';
...we need to do three things in order to get to the XML payload:
Remove the enclosing SQL statements
Replace any '' with ' (because the '' is the escaped quote in SQL strings)
Pray that the part in between does not contain any other SQL expressions
So
$value = $value -replace "^[\s\S]*?N'","" -replace "';\s*$","" -replace "''","'"
would remove everything up to and including N' and the '; at the end, as well as replace all the duplicated single quotes (if any) with normal single quotes.
Adapt the regular expressions as needed. Replacing the SQL parts with regex isn't exactly clean, but if the expected input is very limited, like in this case, it'll do.
Write-Host $value
Now we should have a string that is actually XML. Let's parse it. This time it's already in memory, so there is no file encoding to pay attention to, and it's all right to cast it to XML directly:
$payload = [xml]$value
And now we can query it for the value you are interested in:
$mediaName = $payload.SelectSingleNode("/mediaRecords/media/@mediaName")
Write-Host $mediaName.Value
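If the payload ever contains more than one <media> element, the same idea works with SelectNodes; a minimal sketch, assuming the structure shown in the example above:
# select every mediaName attribute node and print its value
foreach ($attr in $payload.SelectNodes("/mediaRecords/media/@mediaName")) {
    Write-Host $attr.Value
}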
I want to generate a .CSV file based on data in a datatable. I know this question has been asked before, but I can't find any examples of how to specify what the separator should be.
For example if I have a table and a query like this, this is what I want the output to be:
MyTable:
Id - Int Key
NickName - NvarChar
REALName - NvarChar
Number - NvarChar
Updated - bit
Query:
SELECT *
FROM MyTable
WHERE Updated = 1
Output:
I want my output to use | as the field separator. So the output in the CSV file will look something like this:
Id|NickName|REALName|Number|Updated
1|NickNameHere|RealNameHere|0798548558|1
2|NickNameHere2|RealNameHere2|079948558|1
and so on.
The following query generates CSV data with a '|' separator:
select 'Id|NickName|REALName|Number|Updated'
union all
select
cast (Id as nvarchar) + '|'
+ NickName + '|'
+ REALName + '|'
+ Number + '|'
+ cast (Updated as nvarchar)
from MyTable
WHERE Updated = 1
Saving the output results to a text file:
Method 1: from within SSMS
From the SSMS menu: Query -> Results To -> Results to File
Method 2: using PowerShell Invoke-SqlCmd
Invoke-SqlCmd -Query "your query" | Export-Csv "path\to\csvfile"
Method 3: using SqlCmd command line tool:
sqlcmd -q "your query" -o "path\to\csvfile" -S server -P password -d database
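Since the question asks for | as the separator, both command-line routes above can also be told which delimiter to use; a sketch, keeping the placeholder query and paths from the examples:
# Method 2 variant: Export-Csv accepts -Delimiter (and -NoTypeInformation drops the type header line)
Invoke-SqlCmd -Query "your query" | Export-Csv "path\to\csvfile" -Delimiter '|' -NoTypeInformation
For Method 3, sqlcmd's -s option sets the column separator and -W trims trailing padding:
sqlcmd -q "your query" -o "path\to\csvfile" -S server -P password -d database -s "|" -W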
I have been twiddling with a fairly simple idea to export ReportServer reports from the underlying database and then find its dependent stored procedures and export them as well.
However, when testing initially I found that the XML data for the report itself is truncated in the standard way I export things to files, and I think I may be using an incorrect method.
The code is fairly simple at this point, and I am using a simplified report called "ChartReport":
Import-Module 'sqlps'
$saveloc = "$home\savedata\filename.txt"
$dataquery = @"
DECLARE @name NVARCHAR(MAX) = 'ChartReport',
        @path NVARCHAR(MAX) = '/ChartReport'
SELECT CAST(CAST(c.Content AS VARBINARY(MAX)) AS XML) [ReportData], c.Name, c.Path
FROM ReportServer.dbo.Catalog c
WHERE c.Name = @name
AND c.Path LIKE @path + '%'
"@
Invoke-SQLCMD -Query $dataquery | select ReportData | Out-File $saveloc
I have verified the query returns XML (the underlying XML file itself is over 25,000 characters, and I would be happy to provide a link to it if anyone is interested); however, when I save the file I get something like:
Column1
<Report xmlns:rd="http://schemas.microsoft.com/SQLServer/reporting/reportdesigner" xmlns:cl="http://schemas.microsof...
I have attempted to use some of the ideas already posted on SO, such as:
Redirecting with > $somefile (Powershell 2: Easy way to direct every bit of output to a file?)
Out-File and specifying -Width (Powershell Add-Content Truncating Output)
Using Format-Table with -AutoSize and -Wrap
Each of these fails at some point (though the Format-Table method gets pretty far before it truncates).
I would definitely consider some sort of XML-specific solution, but really I think I am just missing some information. As far as I am concerned, this is a file of "stuff" and I want to write said file to disk after it is loaded into the object.
Would iterating over some sort of line break and writing each line of the object to a file be the idiomatic answer?
Use the -MaxCharLength parameter of the Invoke-Sqlcmd command. By default it is 4000.
See Invoke-SqlCmd doesn't return long string?
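For example, a minimal sketch with the limit raised (10000000 here is just a generously large value, not a required number):
# raise the per-column character limit, expand the XML column, and write it to the file
Invoke-Sqlcmd -Query $dataquery -MaxCharLength 10000000 |
    Select-Object -ExpandProperty ReportData |
    Out-File $saveloc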
We would like to put the results of a Hive query into a CSV file. I thought the command should look like this:
insert overwrite directory '/home/output.csv' select books from table;
When I run it, it says it completed successfully, but I can never find the file. How do I find this file, or should I be extracting the data in a different way?
Although it is possible to use INSERT OVERWRITE to get data out of Hive, it might not be the best method for your particular case. First let me explain what INSERT OVERWRITE does, then I'll describe the method I use to get tsv files from Hive tables.
According to the manual, your query will store the data in a directory in HDFS. The format will not be csv.
Data written to the filesystem is serialized as text with columns separated by ^A and rows separated by newlines. If any of the columns are not of primitive type, then those columns are serialized to JSON format.
A slight modification (adding the LOCAL keyword) will store the data in a local directory.
INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp' select books from table;
When I run a similar query, here's what the output looks like.
[lvermeer@hadoop temp]$ ll
total 4
-rwxr-xr-x 1 lvermeer users 811 Aug 9 09:21 000000_0
[lvermeer@hadoop temp]$ head 000000_0
"row1""col1"1234"col3"1234FALSE
"row2""col1"5678"col3"5678TRUE
Personally, I usually run my query directly through Hive on the command line for this kind of thing, and pipe it into the local file like so:
hive -e 'select books from table' > /home/lvermeer/temp.tsv
That gives me a tab-separated file that I can use. Hope that is useful for you as well.
Based on this patch-3682, I suspect a better solution is available when using Hive 0.11, but I am unable to test this myself. The new syntax should allow the following.
INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
select books from table;
If you want a CSV file, you can modify Lukas' solution as follows (assuming you are on a Linux box):
hive -e 'select books from table' | sed 's/[[:space:]]\+/,/g' > /home/lvermeer/temp.csv
This is the most CSV-friendly way I have found to output the results of HiveQL.
You don't need any grep or sed commands to format the data; Hive supports it directly, you just need to add the extra outputformat flag:
hive --outputformat=csv2 -e 'select * from <table_name> limit 20' > /path/toStore/data/results.csv
You should use a CREATE TABLE AS SELECT (CTAS) statement to create a directory in HDFS with the files containing the results of the query. After that you will have to export those files from HDFS to your regular disk and merge them into a single file.
You also might have to do some trickery to convert the files from '\001'-delimited to CSV. You could use a custom CSV SerDe or post-process the extracted file.
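A rough sketch of that approach (the table name and paths here are only examples, and the warehouse location depends on your configuration); declaring the new table as comma-delimited text avoids the '\001' conversion step:
-- materialize the query result as a comma-delimited text table
CREATE TABLE books_csv
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
AS SELECT books FROM table;
Then merge the result files from HDFS onto local disk:
hadoop fs -getmerge /user/hive/warehouse/books_csv /tmp/books.csv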
You can use INSERT … DIRECTORY …, as in this example:
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/ca_employees'
SELECT name, salary, address
FROM employees
WHERE state = 'CA';
OVERWRITE and LOCAL have the same interpretations as before and paths are interpreted following the usual rules. One or more files will be written to /tmp/ca_employees, depending on the number of reducers invoked.
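If more than one file does get written, they can be stitched together afterwards; a sketch, assuming the local output directory above:
# concatenate the per-reducer output files into a single file
cat /tmp/ca_employees/* > /tmp/ca_employees_all.txt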
If you are using HUE this is fairly simple as well. Simply go to the Hive editor in HUE, execute your hive query, then save the result file locally as XLS or CSV, or you can save the result file to HDFS.
I was looking for a similar solution, but the ones mentioned here would not work. My data had all variations of whitespace (space, newline, tab) chars and commas.
To make the column data tsv safe, I replaced all \t chars in the column data with a space, and executed python code on the commandline to generate a csv file, as shown below:
hive -e 'tab_replaced_hql_query' | python -c 'exec("import sys;import csv;reader = csv.reader(sys.stdin, dialect=csv.excel_tab);writer = csv.writer(sys.stdout, dialect=csv.excel)\nfor row in reader: writer.writerow(row)")'
This created a perfectly valid csv. Hope this helps those who come looking for this solution.
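For readability, the same conversion can also live in a small standalone script instead of the exec one-liner (a sketch of the identical logic, with a hypothetical file name):
# tsv2csv.py: read tab-separated rows from stdin, write comma-separated rows to stdout
import sys
import csv

reader = csv.reader(sys.stdin, dialect=csv.excel_tab)
writer = csv.writer(sys.stdout, dialect=csv.excel)
for row in reader:
    writer.writerow(row)
and then:
hive -e 'tab_replaced_hql_query' | python tsv2csv.py > output.csv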
You can use the Hive string function CONCAT_WS(string delimiter, string str1, string str2 ... strn),
for example:
hive -e 'select CONCAT_WS(",", cola, colb, colc..., coln) from Mytable' > /home/user/Mycsv.csv
I had a similar issue and this is how I was able to address it.
Step 1 - Loaded the data from Hive table into another table as follows
DROP TABLE IF EXISTS TestHiveTableCSV;
CREATE TABLE TestHiveTableCSV
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n' AS
SELECT Column List FROM TestHiveTable;
Step 2 - Copied the blob from Hive warehouse to the new location with appropriate extension
Start-AzureStorageBlobCopy `
    -DestContext $destContext `
    -SrcContainer "Source Container" `
    -SrcBlob "hive/warehouse/TestHiveTableCSV/000000_0" `
    -DestContainer "Destination Container" `
    -DestBlob "CSV/TestHiveTable.csv"
hive --outputformat=csv2 -e "select * from yourtable" > my_file.csv
or
hive --outputformat=csv2 -e "select * from yourtable" > [your_path]/file_name.csv
For tsv, just change csv to tsv in the above queries and run your queries
The default separator is "^A" (in Python, "\x01").
When I want to change the delimiter, I use SQL like:
SELECT col1, delimiter, col2, delimiter, col3, ... FROM table
Then I treat delimiter + "^A" as the new delimiter.
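A concrete sketch of that trick (column and table names are just examples): emit a literal pipe between the real columns, so each output row contains "|^A" between fields, which you can then split on.
-- the string literal '|' becomes its own output column next to each real column
SELECT col1, '|', col2, '|', col3
FROM my_table;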
I tried various options, but this would be one of the simplest solutions for Python pandas:
hive -e 'select books from table' | grep "|" > temp.csv
Then, in Python:
import pandas as pd
df = pd.read_csv("temp.csv", sep='|')
You can also use tr "|" "," to convert "|" to ","
Similar to Ray's answer above, Hive View 2.0 in Hortonworks Data Platform also allows you to run a Hive query and then save the output as csv.
In case you are doing it from Windows you can use Python script hivehoney to extract table data to local CSV file.
It will:
Login to bastion host.
pbrun.
kinit.
beeline (with your query).
Save echo from beeline to a file on Windows.
Execute it like this:
set PROXY_HOST=your_bastion_host
set SERVICE_USER=you_func_user
set LINUX_USER=your_SOID
set LINUX_PWD=your_pwd
python hh.py --query_file=query.sql
Just to cover some of the steps that follow after kicking off the query:
INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
select books from table;
In my case, the data generated under the temp folder is in deflate format,
and it looks like this:
$ ls
000000_0.deflate
000001_0.deflate
000002_0.deflate
000003_0.deflate
000004_0.deflate
000005_0.deflate
000006_0.deflate
000007_0.deflate
Here's the command to unzip the deflate files and put everything into one csv file:
hadoop fs -text "file:///home/lvermeer/temp/*" > /home/lvermeer/result.csv
I may be late to this one, but this might help with the answer:
echo "COL_NAME1|COL_NAME2|COL_NAME3|COL_NAME4" > SAMPLE_Data.csv
hive -e '
select distinct concat(COL_1, "|",
COL_2, "|",
COL_3, "|",
COL_4)
from table_Name where clause if required;' >> SAMPLE_Data.csv
This shell command prints the output in CSV format to output.txt without the column headers:
$ hive --outputformat=csv2 -f 'hivedatascript.hql' --hiveconf hive.cli.print.header=false > output.txt
Use the command:
hive -e "use [database_name]; select * from [table_name] LIMIT 10;" > /path/to/file/my_file_name.csv
I had a huge dataset whose details I was trying to organize so I could determine the types of attacks and the number of each type. An example that I used in practice that worked (and had a few more details) goes something like this:
hive -e "use DataAnalysis;
select attack_cat,
case when attack_cat == 'Backdoor' then 'Backdoors'
when length(attack_cat) == 0 then 'Normal'
when attack_cat == 'Backdoors' then 'Backdoors'
when attack_cat == 'Fuzzers' then 'Fuzzers'
when attack_cat == 'Generic' then 'Generic'
when attack_cat == 'Reconnaissance' then 'Reconnaissance'
when attack_cat == 'Shellcode' then 'Shellcode'
when attack_cat == 'Worms' then 'Worms'
when attack_cat == 'Analysis' then 'Analysis'
when attack_cat == 'DoS' then 'DoS'
when attack_cat == 'Exploits' then 'Exploits'
when trim(attack_cat) == 'Fuzzers' then 'Fuzzers'
when trim(attack_cat) == 'Shellcode' then 'Shellcode'
when trim(attack_cat) == 'Reconnaissance' then 'Reconnaissance' end,
count(*) from actualattacks group by attack_cat;">/root/data/output/results2.csv
I am trying to count the columns from a sqlite db using the sqlite command line tool. To test it I created a sample db like this:
c:\>sqlite.exe mydb.sqlite "create table tbl1(one varchar(10), two smallint);"
Now let's say I don't know that the table tbl1 has 2 columns; how can I find that out using a query from the command-line tool?
Run:
pragma table_info(yourTableName)
See:
http://www.sqlite.org/pragma.html#pragma_table_info
for more details.
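To turn that into an actual column count from the command line: on newer SQLite versions (3.16.0+) the pragma is also available as a table-valued function, so a query like this should work (tbl1 being the sample table from the question):
-- count the columns of tbl1 directly
SELECT count(*) FROM pragma_table_info('tbl1');
On older versions you can count the lines the pragma prints instead:
sqlite3 mydb.sqlite "PRAGMA table_info(tbl1);" | wc -l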
Here is a way I found useful under Linux. Create a bash script file columns.sh, ensure it has execute permissions, and copy-paste the following code:
columns() { for table in $(echo ".tables" | sqlite3 $1); do echo "$table $(echo "PRAGMA table_info($table);" | sqlite3 $1 | wc -l)"; done ;}
Then type the following command in a terminal to return the results:
$ columns <database name>
<table1> <# of columns>
<table2> <# of columns>
Note: Ensure database is not corrupted or encrypted.
source: http://www.quora.com/SQLite/How-can-I-count-the-number-of-columns-in-a-table-from-the-shell-in-SQLite
UPDATE
Here is an interesting URL for a Python script solution:
http://pagehalffull.wordpress.com/2012/11/14/python-script-to-count-tables-columns-and-rows-in-sqlite-database/
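In the same spirit, a minimal standard-library sketch (not the script from the link above) that prints the column count for every table in a database:
# count_columns.py: list each table with its number of columns
import sqlite3

conn = sqlite3.connect("mydb.sqlite")
cur = conn.cursor()
tables = cur.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall()
for (table,) in tables:
    columns = cur.execute("PRAGMA table_info({})".format(table)).fetchall()
    print(table, len(columns))
conn.close()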