Perl Not Reading Full Values from Database - sql-server

I am currently using the DBI module to connect to my MSSQL Database to pull some data from a table.
The column I am trying to pull contains a lot of text (it is of type ntext and can hold up to 6 MB of text).
My query is a very simple one at the moment:
my $sql = "SELECT TOP 1 [reportRow] from UsageReport";
I also have LongTruncOk enabled in the database handle options.
After I execute the query, I want to display the rows.
while ( my @row = $sth->fetchrow_array ) {
    print "@row\n";
}
Unfortunately it displays the data in a very weird manner, with a space between every character, and it only retrieves the first 40 characters:
< r e p o r t > < r e p o r t h e a d e r > < m o n t h > O c t o b e r 2 0
If I use File::Slurp to write @row to a file, the output looks the same.
Is there a reason the data is being cut off and displayed this way?
Edit: How would I convert UTF-16 into a format that doesn't insert spaces between the characters?

You need to set LongReadLen in addition to LongTruncOk. You've told DBI it's OK to cut off long results from the DB; now you need to tell it how long a string you're willing to accept. LongReadLen typically defaults to 80 bytes, which is exactly 40 two-byte UTF-16 characters, hence the cutoff you're seeing.
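A minimal sketch of both settings, plus the UTF-16 decoding asked about in the edit ($dsn/$user/$pass are placeholders, and whether you need the decode step depends on your DBD driver, so treat that part as an assumption):
use DBI;
use Encode qw(decode);

my $dbh = DBI->connect($dsn, $user, $pass, { RaiseError => 1 });
$dbh->{LongReadLen} = 8 * 1024 * 1024;  # accept up to 8 MB per long column
$dbh->{LongTruncOk} = 1;                # as you already have: truncate at the limit

my $sth = $dbh->prepare("SELECT TOP 1 [reportRow] FROM UsageReport");
$sth->execute;
while ( my @row = $sth->fetchrow_array ) {
    # ntext likely arrives as raw UTF-16LE bytes, which is why every other
    # "character" printed as a space; decode it into a Perl string first
    my $text = decode("UTF-16LE", $row[0]);
    print "$text\n";
}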

Related

Google Sheets Query - Not like partial match

So I have this formula, and it's working as intended but I would like to further refine the data.
Formula:
=QUERY('Users'!A1:Q, "Select A,B,C,F,G,O,Q where Q >= 180 and Q < 44223")
I have tried:
=QUERY('Users'!A1:Q, "Select A,B,C,F,G,O,Q where Q >= 180 and Q < 44223 and F not like '*text*'")
But that caused an error. Ideally, I would like to omit any results that match partial text in columns C and F. Any ideas?
EDIT:
Included a link to an example sheet
Ideally, I want the query to match everything in column F except 'Archived' (but needs to be wildcarded) and everything in column C except Delta (again, needs to be wildcarded)
QUERY's like operator only understands the SQL wildcards % and _ (not *), and the F not like syntax isn't supported; negating a matches regular expression does the same job. Try:
=QUERY(Sheet1!A1:Q6,
"select A,B,C,F,G,O,Q
where Q >= 180
and Q < 44223
and not lower(F) matches '.*archived.*'
and not lower(C) matches '.*delta.*'")

Google Sheet creating an IF or IFS

I am trying to auto-populate one column based on what I input into another.
For example: column D has a list I choose from: CD, DD, CC, Closed. Based on what is input in column D in each row, I am looking to populate columns N through P in that row.
If D2 = Closed, N through P should state NA.
If D3 = CD, DD, or CC, N through P should be left blank.
When I try the IFS function, it will state the NA if D is Closed, but for CD, DD or CC it will spit out FALSE. How do I leave the N-P columns blank if D3 or other D cells are not equal to Closed?
The FALSE comes from IF having no value_if_false argument. Leave the third argument empty (note the trailing comma) so non-matching rows stay blank, and use a three-wide array literal to fill N through P in one go. Try:
=ARRAYFORMULA(IF(D2:D="Closed", {"NA","NA","NA"}, ))
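Put the formula in N2 (assuming the data starts in row 2) and it will spill across N through P for every row; clear anything already typed in N2:P first, or the array result will refuse to expand.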

How to check df rows that have a difference between 2 columns and then check another table to verify the information

I'm very new to Python and have been trying really hard these last few days to go through a df row by row and check each row that has a difference between columns dQ and dCQ. I just said != 0 since there could be a positive or negative value. If this is true, I would like to check in another table whether certain criteria are met. I'm used to working in R, where I could store the df in a variable and refer to a column by name; I can't seem to find a way to do that in Python. I've posted all of the code I've been playing with. I know this is messy, but any help would be appreciated. Thank you!
I've tried installing different packages that wouldn't work, and I tried making a for loop (I failed miserably); maybe a function? I'm not sure where to even look. I've never learned Python; I'm really doing my best watching videos online and reading on here.
import pyodbc
import pymysql  # note: the package's module name is lowercase
import pandas as pd
import numpy as np

conn = pyodbc.connect("Driver={ODBC Driver 17 for SQL Server};"
                      "Server=***-***-***.****.***.com;"
                      "Database=****;"
                      "Trusted_Connection=no;"
                      "UID=***;"
                      "PWD=***")

# First attempt with a raw cursor:
# cur = conn.cursor()
# cur.execute("SELECT TOP 1000 tr.dQ, po.dCQ, tr.dQ - po.dCQ as diff "
#             "FROM [IP].[dbo].[vT] tr (nolock) "
#             "JOIN [IP].[dbo].[vP] po ON tr.vchAN = po.vchCustAN "
#             "WHERE tr.dQ != po.dCQ")
# query = cur.fetchall()

query = ("SELECT TOP 100 tr.dQ, po.dCQ /*, tr.dQ - po.dCQ as diff */ "
         "FROM [IP].[dbo].[vT] tr (nolock) "
         "INNER JOIN [IP].[dbo].[vP] po ON tr.vchAN = po.vchCustAN "
         "WHERE tr.dQ != po.dCQ")
df = pd.read_sql(query, conn)
#print(df[2,])

# This fails: DictCursor is a pymysql class, but conn is a pyodbc connection,
# and pyodbc cursors don't take a cursor-class argument.
# cursor = conn.cursor(pymysql.cursors.DictCursor)
# cursor.execute(query)
# result_set = cursor.fetchall()
# for row in result_set:
#     print("%s, %s" % (row["name"], row["category"]))

# Other things I've been playing with:
# if df[3] != 0:
#     diff = df[1] - df[2]
#     print(diff)
# else:
#     exit
#
# cursor = conn.cursor()
# for row in cursor.fetchall():
#     print(row)
#
# for record in df:
#     if record[1] != record[2]:
#         print(record[3])
#     else:
#         record[3] = record[1]
#         print(record)
#
# df['diff'] = np.where(df['dQ'] != df["dCQ"])
I expect some sort of notification that there's a difference in row xx, and then it will check in table vP to verify we received this data's details. I believe I can get to that point if I can get the first part working. Any help is appreciated. I'm sorry if this question is not clear; I will do my best to answer any questions. Thank you!
One solution could be to make a new column that stores the result of the diff between the two quantity columns. One note first: it might be more precise to either name your columns when you make the df and reference them with df['name1'] and df['name2'], or use df.iloc[:,1] and df.iloc[:,2]. Also note that column numbers start at zero, so these refer to the second and third columns in the df. The reason to use iloc with the colon is to state explicitly that you want all rows of column numbers 1 and 2; with df[1] or df[2], if your df were transposed, that could actually refer to what you think of as the index. Now, on to a solution.
You could try
df['diff'] = df.iloc[:,1] - df.iloc[:,2]
df['diff_bool'] = np.where(df['diff'] == 0, False, True)
or you could combine these into one step:
df['diff_bool'] = np.where(df.iloc[:,1] - df.iloc[:,2] == 0, False, True)
This will create a column in your df that says whether there is a difference between the two quantity columns. You don't actually need to loop through row by row, because pandas operations are vectorized: df.iloc[:,1]-df.iloc[:,2] applies the subtraction to every row automatically.
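From there, the "difference in row xx" notification from the question is just a filter on that boolean column. A small sketch, assuming pd.read_sql kept the dQ and dCQ column names from the SELECT:
# only the rows where the quantities disagree
mismatches = df[df['diff_bool']]
for idx, row in mismatches.iterrows():
    print(f"Difference in row {idx}: dQ={row['dQ']}, dCQ={row['dCQ']}")
You can then use those rows to drive the verification lookup against the vP table.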

Collation for URL

Warning: I know very little about database collations so apologies in advance if any of this is obvious...
We've got a database column that contains urls. We'd like to place a unique constraint/index on this column.
It's come to my attention that under the default db collation Latin1_General_CI_AS, dupes exist in this column because (for instance) the urls http://1.2.3.4:5678/someResource and http://1.2.3.4:5678/SomeResource are considered equal. Frequently this is not the case: the kind of server these urls point at is case sensitive.
What would be the most appropriate collation for such a column? Obviously case-sensitivity is a must, but Latin1_General? Are urls Latin1_General? I'm not bothered about a lexicographical ordering, but equality for unique indexes/grouping is important.
You can alter the table to set a CS (case-sensitive) collation for this column:
ALTER TABLE dbo.MyTable
ALTER COLUMN URLColumn varchar(max) COLLATE Latin1_General_CS_AS
Also you can specify collation in the SQL statement:
SELECT * FROM dbo.MyTable
WHERE UrlColumn like '%AbC%' COLLATE Latin1_General_CS_AS
Here is a short article for reference.
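One caveat if the end goal is the unique index from the question: SQL Server can't use a varchar(max) column as an index key, so the column needs a bounded length (900 bytes is the classic index-key limit). A sketch with the same hypothetical names as above:
ALTER TABLE dbo.MyTable
ALTER COLUMN URLColumn varchar(900) COLLATE Latin1_General_CS_AS;

CREATE UNIQUE INDEX UX_MyTable_URL ON dbo.MyTable (URLColumn);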
The letters CI in the collation indicate case insensitivity.
For a URL, which will use a small subset of Latin characters and symbols, try Latin1_General_CS_AI.
Latin1_General uses code page 1252 (1), and a URL's allowed characters are included in that code page (2), so you can say that URLs are Latin1_General.
You just have to select the case-sensitive option: Latin1_General_CS_AS
rfc3986 says:
The ABNF notation defines its terminal values to be non-negative integers (codepoints) based on the US-ASCII coded character set [ASCII].
Wikipedia says the allowed chars are:
Unreserved
May be encoded but it is not necessary
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z
0 1 2 3 4 5 6 7 8 9 - _ . ~
Reserved
Have to be encoded sometimes
! * ' ( ) ; : @ & = + $ , / ? % # [ ]
It seems there are no conflicts between these characters in compare operations. You can also use the HASHBYTES function to do the comparison.
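For example, one way to use HASHBYTES for this is a persisted computed column holding the hash, with the unique index on the hash instead of the URL itself; since the hash is computed over the raw bytes, the comparison is case-sensitive regardless of collation (names are illustrative):
ALTER TABLE dbo.MyTable
ADD URLHash AS CAST(HASHBYTES('SHA2_256', URLColumn) AS binary(32)) PERSISTED;

CREATE UNIQUE INDEX UX_MyTable_URLHash ON dbo.MyTable (URLHash);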
But this kind of comparison is not the major problem. The major problem is that http://domain:80 and http://domain may be the same resource, and a URL can look different once its characters are percent-encoded.
In my opinion, RDBMSs will eventually incorporate these kinds of structures as new data types: url, phone number, email address, mac address, password, latitude, longitude, ... I think that collation can help, but it won't solve this issue.

Read/Write/Find/Replace huge csv file

I have a huge (4.5 GB) csv file. I need to perform basic cut-and-paste and replace operations on some columns. The data is pretty well organized; the only problem is that I cannot play with it in Excel because of the size (2,000 rows, 550,000 columns).
here is some part of the data:
ID,Affection,Sex,DRB1_1,DRB1_2,SENum,SEStatus,AntiCCP,RFUW,rs3094315,rs12562034,rs3934834,rs9442372,rs3737728
D0024949,0,F,0101,0401,SS,yes,?,?,A_A,A_A,G_G,G_G
D0024302,0,F,0101,7,SN,yes,?,?,A_A,G_G,A_G,?_?
D0023151,0,F,0101,11,SN,yes,?,?,A_A,G_G,G_G,G_G
I need to remove the 4th, 5th, 6th, 7th, 8th and 9th columns;
I need to find every _ character from column 10 onwards and replace it with a space ( ) character;
I need to replace every ? with zero (0);
I need to replace every comma with a tab;
I need to remove the first row (that has the column names);
I need to replace every 0 with 1, every 1 with 2 and every ? with 0 in the 2nd column;
I need to replace F with 2, M with 1 and ? with 0 in the 3rd column;
so that in the resulting file the output reads:
D0024949 1 2 A A A A G G G G
D0024302 1 2 A A G G A G 0 0
D0023151 1 2 A A G G G G G G
(both input and output should read one line per row, no extra blank rows)
Is there a memory-efficient way of doing this with Java (and I need code to do it), or a usable tool for working with this large data so that I can easily apply Excel-like functionality?
You need two things:
- Knowledge of Regular Expressions (aka Regex, Regexes)
- PowerGrep
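If you'd rather script it than use a GUI tool, every rule above works line by line, so a streaming program never holds more than one row in memory even at 4.5 GB. A minimal Java sketch (file names are placeholders, and it assumes no quoted commas inside fields):
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class CsvTransform {
    public static void main(String[] args) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(Paths.get("input.csv"));
             BufferedWriter out = Files.newBufferedWriter(Paths.get("output.txt"))) {
            in.readLine();                            // drop the header row
            String line;
            while ((line = in.readLine()) != null) {
                String[] f = line.split(",", -1);     // -1 keeps empty trailing fields
                StringBuilder sb = new StringBuilder(f[0]);               // ID unchanged
                sb.append('\t').append(recode(f[1], "0", "1", "1", "2")); // Affection
                sb.append('\t').append(recode(f[2], "F", "2", "M", "1")); // Sex
                for (int i = 9; i < f.length; i++) {  // skip columns 4-9 (indices 3-8)
                    sb.append('\t').append(f[i].replace('_', ' ').replace("?", "0"));
                }
                out.write(sb.toString());
                out.newLine();
            }
        }
    }

    // Replace a -> ra, b -> rb, "?" -> "0"; leave anything else as-is.
    private static String recode(String v, String a, String ra, String b, String rb) {
        if (v.equals(a)) return ra;
        if (v.equals(b)) return rb;
        if (v.equals("?")) return "0";
        return v;
    }
}
Each output row is the ID, the recoded Affection and Sex columns, then the remaining columns with _ turned into spaces and ? into 0, joined by tabs.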
