How to load a csv into a table in Q? - database

Very new to Q and I am having some issues loading my data into a table following the examples on the documentation.
I am running the following code:
table1: get `:pathname.csv
While it doesn't throw an error, when I run the following command nothing comes up:
select * from table1
Or when selecting a specific column:
select col1 from table1
If anyone could guide me in the right direction, that would be great!
Edit: This seems to work and retain all my columns:
table1: (9#"S";enlist csv) 0: `:data.CSV

You're going to need to use 0: https://code.kx.com/q/ref/filenumbers/#load-csv
The exact usage will depend on your csv, as you need to define the datatypes to load each column as.
As an example, here I have a CSV with a long, char & float column:
(kdb) chronos@localhost ~/Downloads $ more example.csv
abc,def,ghi
1,a,3.4
2,b,7.5
3,c,88
(kdb) chronos@localhost ~/Downloads $ q
KDB+ 3.6 2018.10.23 Copyright (C) 1993-2018 Kx Systems
l64/ 4()core 3894MB chronos localhost 127.0.0.1 EXPIRE 2019.06.15 jonathon.mcmurray@aquaq.co.uk KOD #5000078
q)("JCF";enlist",")0:`:example.csv
abc def ghi
-----------
1 a 3.4
2 b 7.5
3 c 88
q)meta ("JCF";enlist",")0:`:example.csv
c | t f a
---| -----
abc| j
def| c
ghi| f
q)
I use the chars "JCF" to define the datatypes long, character & float respectively.
I enlist the delimiter (",") to indicate that the first row of the CSV contains the headers for the columns. (Otherwise, these can be supplied in your code & the table constructed)
As a side note, q-sql does not need the * that standard SQL uses; select from table1 returns all columns.
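For readers who don't know q, here is a rough Python sketch of what the type string and the header handling are doing. It is only an analogy, not how 0: is implemented: load_csv and the CONVERTERS mapping are made-up names for illustration, and only the three type characters from the example are covered.

```python
import csv
import io

# Hypothetical mapping from q-style type characters to Python converters.
# Only the three characters used in the example above are covered.
CONVERTERS = {"J": int, "C": str, "F": float}

def load_csv(text, types, has_header=True, names=None):
    """Parse CSV text into a dict of columns, casting each column by its type char."""
    rows = list(csv.reader(io.StringIO(text)))
    if has_header:
        # Comparable to enlisting the delimiter in q: the first row names the columns.
        names, rows = rows[0], rows[1:]
    elif names is None:
        raise ValueError("names are required when the file has no header row")
    casts = [CONVERTERS[t] for t in types]
    columns = {name: [] for name in names}
    for row in rows:
        for name, cast, cell in zip(names, casts, row):
            columns[name].append(cast(cell))
    return columns

EXAMPLE = "abc,def,ghi\n1,a,3.4\n2,b,7.5\n3,c,88\n"
table = load_csv(EXAMPLE, "JCF")
print(table)
```

The point is the same as in q: one type character per column, in column order, plus a decision about whether the first row holds the column names.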

Related

T-SQL: How to break a column with concatenated string into multiple rows?

I'm working with a dataset where most columns are normal, but one has one or more concatenated values jammed into a single string, using a '|' as a delimiter between values. I need to reshape it so that there's one row per existing row, per concatenated value. There are 60 potential values--that I know of-- in the concatenated string, and most rows have between 0 and 10 values smashed into the string. It's also going to be necessary to repeat this process over the next few months, and it's possible the list will change/ add new members.
I'm going to have to do this on an unknown number of future tables (at least 4 more), so an approach I can easily repurpose would be MUCH better. Also, I'm using T-SQL, but I could probably bring in R or something if that would help. Any ideas?
If you have a table containing the 60 possible values, you could join to it with tsql something like this:
select table1.id, potentialvalues.value
from table1
inner join potentialvalues
on charindex('|'+potentialvalues.value+'|', '|'+table1.concatField+'|')>0
Note: Added the pipes to beginning and end of the concatfield so that it can match the first and last values in the field. So, if your concatfield is something like '1|2|10' on a record it would be able to match '|1|', '|2|' and '|10|'.
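If you want to try the padded-delimiter join without a SQL Server instance, here is a small sketch using Python's built-in sqlite3 module. It assumes SQLite as a stand-in: instr() plays the role of charindex() (with the arguments in the opposite order) and || replaces + for string concatenation. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, concatField TEXT);
    CREATE TABLE potentialvalues (value TEXT);
    INSERT INTO table1 VALUES (1, '1|2|10'), (2, '10'), (3, '');
    INSERT INTO potentialvalues VALUES ('1'), ('2'), ('10');
""")

# instr(haystack, needle) is SQLite's counterpart to T-SQL charindex(needle, haystack).
# Padding both sides with '|' stops the value '1' from matching inside '10'.
rows = conn.execute("""
    SELECT t.id, p.value
    FROM table1 AS t
    INNER JOIN potentialvalues AS p
        ON instr('|' || t.concatField || '|', '|' || p.value || '|') > 0
    ORDER BY t.id, CAST(p.value AS INTEGER)
""").fetchall()
print(rows)
```

Row 2 only holds '10', and it comes back with the single value '10' rather than also matching '1', which is what the extra pipes are for.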
In R, you could use dplyr and tidyr functions to expand your rows by separating each combined string at the pipe symbol. This has the advantage that it can be applied to your table without knowing what the piped combinations are in advance.
library(dplyr)
library(tidyr)
separate_rows(df, string, sep = "[|]") %>%
mutate(string = trimws(string))
The trimws function from base R is used to remove any extra whitespace that may be between your piped string components. Toy test data and results shown below.
Test data
df = data.frame(key = c("A", "B", "C", "D"),
string = c("Simple", "Piped 1 | Piped 2", "Simple 2", "Piped A1 | Piped A2 | Piped A3"), stringsAsFactors = FALSE)
> df
key string
1 A Simple
2 B Piped 1 | Piped 2
3 C Simple 2
4 D Piped A1 | Piped A2 | Piped A3
Result
key string
1 A Simple
2 B Piped 1
3 B Piped 2
4 C Simple 2
5 D Piped A1
6 D Piped A2
7 D Piped A3
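If neither T-SQL nor R is at hand, the same split-and-trim step is easy to sketch in plain Python. This is only an illustration of the idea on the toy data above; the separate_rows function here is a hand-written helper named after the tidyr function, not a library call.

```python
def separate_rows(rows, column, sep="|"):
    """Return one output row per separated value in `column`, trimming whitespace."""
    out = []
    for row in rows:
        for part in row[column].split(sep):
            # Copy the other fields and replace only the split column.
            out.append({**row, column: part.strip()})
    return out

df = [
    {"key": "A", "string": "Simple"},
    {"key": "B", "string": "Piped 1 | Piped 2"},
    {"key": "C", "string": "Simple 2"},
    {"key": "D", "string": "Piped A1 | Piped A2 | Piped A3"},
]

result = separate_rows(df, "string")
for row in result:
    print(row["key"], row["string"])
```

Like the R version, it does not need to know the possible values in advance.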

LTRIM and RTRIM Truncating Floating Point Number

I am experiencing what I would describe as entirely unexpected behaviour when I pass a float value through either LTRIM or RTRIM:
CREATE TABLE MyTable
(MyCol float null)
INSERT MyTable
values (11.7333335876465)
SELECT MyCol,
RTRIM(LTRIM(MyCol)) lr,
LTRIM(MyCol) l,
RTRIM(MyCol) r
FROM MyTable
Which gives the following results:
MyCol | lr | l | r
--------------------------------------------
11.7333335876465 | 11.7333 | 11.7333 | 11.7333
I have observed the same behaviour on SQL Server 2014 and 2016.
Now, my understanding is that LTRIM and RTRIM should just strip off white space from a value - not cast it/truncate it.
Does anyone have an idea what is going on here?
Just to explain the background to this. I am generating SQL queries using the properties of a set of C# POCOs (the results will be used to generate an MD5 hash that will then be compared to an equivalent value from an Oracle table) and for convenience was wrapping every column with LTRIM/RTRIM.
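LTRIM and RTRIM expect a character argument, so the float is implicitly converted to a string first, and SQL Server's default float-to-string conversion keeps at most six digits. Perhaps you can use format() instead (available from SQL Server 2012):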
Declare @F float = 11.7333335876465
Select format(@F,'#.##############')
Returns
11.7333335876465
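The difference between a default float-to-string conversion and an explicit format is not specific to SQL Server. As a loose analogy only (this is Python, not what SQL Server runs internally), %g also defaults to six significant digits, while an explicit precision keeps the full value:

```python
value = 11.7333335876465

# Default-style conversion: %g keeps six significant digits.
short = "%g" % value

# Explicit precision: 15 significant digits is enough for this value.
full = format(value, ".15g")

print(short)  # 11.7333
print(full)   # 11.7333335876465
```

So if the string is going to feed a hash comparison, it is safer to state the format explicitly on both sides than to rely on an implicit conversion.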

Can I set rules for string comparison in SQL? (or do I need to hardcode using CASE WHEN)

I need to compare ratings at two points in time and indicate whether the change was upwards, downwards or stayed the same.
For example:
This would be a table with four columns:
ID T0 T0+1 Status
1 AAA AA Lower
2 BB A Higher
3 C C Same
However, this does not work when applying regular string comparison, because in SQL
A<B
B<BBB
I need
A>B
B<BBB
So my order(highest to lowest): AAA,AA,A,BBB,BB,B
SQL order(highest to lowest): BBB,BB,B,AAA,AA,A
Now I have 2 options in mind, but I wonder if someone know a better one:
1) Use CASE WHEN statements for all the possibilities of ratings going up and down (I have more values than indicated above):
CASE WHEN T0=T0+1 then 'Same'
WHEN T0='AAA' and T0+1<>'AAA' then 'Lower'
....address all other options for rating going down
ELSE 'Higher'
However, this generates a very large number of CASE WHEN statements.
2) My other option requires generating 2 tables. In table 1 I use case when statements to assign values/rank to the ratings.
For example:
CASE WHEN T0='AAA' then 6
WHEN T0='AA' then 5
WHEN T0='A' then 4
WHEN T0='BBB' then 3
WHEN T0='BB' then 2
WHEN T0='B' then 1
END
The same for T0+1.
Then in table 2 I use a regular comparison between column T0 and column T0+1 on the numeric values.
However, I am looking for a solution where I can do it in one table (with as few lines as possible), and ideally never show the ranking column.
I think a nested statement would be the best option, but it did not work for me.
Does anybody have suggestions?
I use SQL Server 2008.
If you are working with credit ratings, it is very likely that this is not just about AAA > AA or BBB > BB.
Depending on the agency, it could also be AA+ or Aa1 for long term, F1+ for short term, or something else in different contexts or with other agencies.
It is also often required to convert ratings from one agency's scale to another's.
Therefore it is better to use a mapping table such as:
Id | Rating
0 | AAA
1 | AA+
2 | AA
3 | AA-
4 | A+
5 | A
6 | A-
7 | BBB+
Using this table, you only have to join the rating in your data table with the rating in the mapping table:
SELECT d.Rating_T0, d.Rating_T1,
CASE WHEN d.Rating_T0 = d.Rating_T1 THEN '='
WHEN m0.id < m1.id THEN '<'
WHEN m0.id > m1.id THEN '>'
END
FROM yourData d
INNER JOIN RatingMapping m0
ON m0.Rating= d.Rating_T0
INNER JOIN RatingMapping m1
ON m1.Rating= d.Rating_T1
If you only store the Rating id in your data table, you will not only save space (1 byte for tinyint versus up to 4 chars) but will also be able to compare without the JOIN to the mapping table.
SELECT d.Rating_Id0, d.Rating_Id1,
CASE WHEN d.Rating_Id0 = d.Rating_Id1 THEN '='
WHEN d.Rating_Id0 < d.Rating_Id1 THEN '<'
WHEN d.Rating_Id0 > d.Rating_Id1 THEN '>'
END
FROM yourData d
The JOIN would only be required when you want to display the actual Rating value such as AAA for Rating_ID = 0.
You could also add an agency_Id to the Mapping table. This way, you can easily choose which Notation agency you want to display and easily convert between Agency 1 and Agency 2 or Agency 3 (ie. Id 1 => S&P and Id 2 => Fitch, Id 3 => ...)
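To see the mapping-table approach end to end, here is a sketch that runs against SQLite through Python's sqlite3 module instead of SQL Server 2008. The scale is a simplified one I made up for the example (best rating first, no +/- notches), the rows are the three from the question, and the Status labels follow the question's wording (Lower, Higher, Same).

```python
import sqlite3

# Simplified scale, best rating first; a real agency scale has more notches (AA+, AA-, ...).
SCALE = ["AAA", "AA", "A", "BBB", "BB", "B", "CCC", "CC", "C"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE RatingMapping (Id INTEGER PRIMARY KEY, Rating TEXT UNIQUE)")
conn.executemany("INSERT INTO RatingMapping VALUES (?, ?)", list(enumerate(SCALE)))
conn.execute("CREATE TABLE yourData (ID INTEGER, Rating_T0 TEXT, Rating_T1 TEXT)")
conn.executemany(
    "INSERT INTO yourData VALUES (?, ?, ?)",
    [(1, "AAA", "AA"), (2, "BB", "A"), (3, "C", "C")],
)

rows = conn.execute("""
    SELECT d.ID, d.Rating_T0, d.Rating_T1,
           CASE WHEN m0.Id = m1.Id THEN 'Same'
                WHEN m0.Id < m1.Id THEN 'Lower'   -- smaller Id means a better rating at T0
                ELSE 'Higher'
           END AS Status
    FROM yourData AS d
    INNER JOIN RatingMapping AS m0 ON m0.Rating = d.Rating_T0
    INNER JOIN RatingMapping AS m1 ON m1.Rating = d.Rating_T1
    ORDER BY d.ID
""").fetchall()

for row in rows:
    print(row)
```

The ranking column never appears in the output; it only lives in the mapping table, which is what the question asked for.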

Find valid combinations based on matrix

I have the following matrix in Calc: the first row (1) contains employee numbers and the first column (A) contains product codes.
Wherever there is an X, that product was sold by the employee in the column header above.
| 0302 | 0303 | 0304 | 0402 |
1625 | X | | X | X |
1643 | | X | X | |
...
We see that product 1643 was sold by employees 0303 and 0304
What I would like to see is a list of what product was sold by which employees but formatted like this:
1625 | 0302, 0304, 0402 |
1643 | 0303, 0304 |
The reason for this is that we need this matrix ultimately imported into an SQL SERVER table. We have no access to the origins of this matrix. It contains about 50 employees and 9000+ products.
Thanks for thinking along with us!
try something like this
;with data as
(
SELECT *
FROM ( VALUES (1625,'X',NULL,'X','X'),
(1643,NULL,'X','X',NULL))
cs (col1, [0302], [0303], [0304], [0402])
),cte
AS (SELECT col1,
col
FROM data
CROSS apply (VALUES ('0302',[0302]),
('0303',[0303]),
('0304',[0304]),
('0402',[0402])) cs (col, val)
WHERE val IS NOT NULL)
SELECT col1,
LEFT(cs.col, Len(cs.col) - 1) AS col
FROM cte a
CROSS APPLY (SELECT col + ','
FROM cte B
WHERE a.col1 = b.col1
FOR XML PATH('')) cs (col)
GROUP BY col1,
LEFT(cs.col, Len(cs.col) - 1)
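The query above does two things: it unpivots the matrix into one row per X mark, then glues the employee numbers back together per product. If it helps to see those two steps separately, here is a plain Python sketch on the two sample rows. The function names unpivot and concatenate are made up for the example; this is not a replacement for doing it in SQL Server.

```python
def unpivot(header, rows):
    """Turn an X-marked matrix into (product, employee) pairs."""
    pairs = []
    for product, *marks in rows:
        for employee, mark in zip(header, marks):
            if mark == "X":
                pairs.append((product, employee))
    return pairs

def concatenate(pairs):
    """Group the pairs by product and join the employee numbers with commas."""
    grouped = {}
    for product, employee in pairs:
        grouped.setdefault(product, []).append(employee)
    return {product: ", ".join(emps) for product, emps in grouped.items()}

header = ["0302", "0303", "0304", "0402"]
rows = [
    ["1625", "X", "", "X", "X"],
    ["1643", "", "X", "X", ""],
]

pairs = unpivot(header, rows)
summary = concatenate(pairs)
for product, employees in summary.items():
    print(product, "|", employees)
```

The intermediate pairs list is the long format (one product and one employee per row), which is often easier to query than the concatenated string.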
I think there are two problems to solve:
get the employee numbers for the X marks;
concatenate them into a single, comma-separated string.
I can't offer a solution for both issues in one step, but you can handle them separately.
1.
To replace the X marks with the respective employee numbers, you could use an array function to create a second table (matrix). To do so, create a new sheet, copy the first column / first row, and enter the following formula in cell B2:
=IF($B2:$E3="X";$B$1:$E$1;"")
You'll have to adapt the formula so it covers your complete input data (if your last data cell is Z9999, it would be =IF($B2:$Z9999="X";$B$1:$Z$1;"")). My example just covers two rows and four columns.
After modifying it, confirm with CTRL+SHIFT+ENTER to apply it as an array formula.
2.
Now, you'll have to concatenate the employee numbers. LO Calc lacks a feature to concatenate an array, but you could use a simple user-defined function. For such a string-join function, see this answer. Just create a new macro with the StarBasic code provided there and save it. Now, you have a STRJOIN() function at hand that accepts an array and concatenates its values, leaving empty values out.
You could add that function using a helper column on the second sheet and apply it by dragging it down. Finally, to get rid of the cells with the single employee numbers, copy the complete second sheet and paste special into a third sheet, pasting only the values. Now, you can remove all columns except the first one (product codes) and the last one (with the concatenated employee numbers).
I created a table in sql for holding the data:
CREATE TABLE [dbo].[mydata](
[prod_code] [nvarchar](8) NULL,
[0100] [nvarchar](10) NULL,
[0101] [nvarchar](10) NULL,
[and so on...]
I created the list of columns in Calc by copying and pasting them transposed. After that I used the concatenate function to create the columnlist + datatype for the create table statement
I cleaned up the worksheet and imported it into this table using SQL Server's import wizard. Cleaning meant removing unnecessary rows/columns. Since the columnnames were identical mapping was done correctly for 99%.
Now I had the data in SQL Server.
I adapted the code MM93 suggested a bit:
;with data as
(
SELECT *
FROM dbo.mydata -- here I simply referenced the whole table
),cte
and in the next part I used the same 'worksheet' trick to list and format all the column names and pasted them in.
),cte
AS (SELECT prod_code, -- had to replace col1 with 'prod_code'
col
FROM data
CROSS apply (VALUES ('0100',[0100]),
('0101', [0101] ),
(and so on... ),
The result of this query was inserted into a new table and my colleagues and I are querying our hearts out :)
PS: removing the 'FOR XML' clause resulted in a table with two columns :
prodcode | employee
which contains all the unique combinations of prodcode + employee number, which is a lot faster and much more practical to query.

sphinx - Column count doesn't match

I have the following in my sphinx
mysql> desc rec;
+-----------+---------+
| Field | Type |
+-----------+---------+
| id | integer |
| desc | field |
| tid | uint |
| gid | uint |
| no | uint |
+-----------+---------+
And I ran the following successfully in sphinx sql
replace into rec VALUES ('24','test test',1,1, 1 );
But when I run in the C mysql API I get this error
Column count doesn't match value count at row 1
The C code is this:
if (mysql_query(con, "replace into rec VALUES ('24','test test',1,1, 1 )") )
{
fprintf(stderr, "%s\n", mysql_error(con));
mysql_close(con);
exit(1);
}
Please note that the C program is connecting to the sphinx sql with no issues
One problem may be that you are quoting the integer for the id column. I would try taking out the single quotes around the 24. The column named desc is also concerning, since that is a reserved word in MySQL.
A good best practice is to always specify the column names, even if you are inserting into all columns. The reason is that you may want to alter the table later to add a column and you don't necessarily want to go back and change all your code to match the new structure. It also makes your code clearer since you don't have to reference the table structure to know what the values mean and it helps in case a tool like Sphinx is using a different order for the columns than you expect. Try changing your code to this, which specifies the columns and quotes them (mysql uses backticks for quotes) and also removes the quotes around the value for the id column:
if (mysql_query(con, "replace into rec (`id`, `desc`, `tid`, `gid`, `no`) VALUES (24, 'test test', 1, 1, 1)") )
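To show why naming the columns matters, here is a small sketch using Python's sqlite3 module. SQLite stands in for Sphinx/MySQL here, so the exact error text and quoting rules differ (double quotes instead of backticks), and the "extra" column is a hypothetical later schema change. The positional statement breaks as soon as the table gains a column, while the explicit one keeps working.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE rec ("id" INTEGER, "desc" TEXT, "tid" INTEGER, "gid" INTEGER, "no" INTEGER)'
)

positional = "REPLACE INTO rec VALUES (24, 'test test', 1, 1, 1)"
explicit = (
    'REPLACE INTO rec ("id", "desc", "tid", "gid", "no") '
    "VALUES (24, 'test test', 1, 1, 1)"
)

conn.execute(positional)  # works while the table has exactly five columns
conn.execute('ALTER TABLE rec ADD COLUMN "extra" INTEGER')  # hypothetical schema change

try:
    conn.execute(positional)
    positional_error = None
except sqlite3.OperationalError as exc:
    positional_error = str(exc)  # column count no longer matches the value count

conn.execute(explicit)  # still valid: the named columns are unaffected
row_count = conn.execute("SELECT COUNT(*) FROM rec").fetchone()[0]

print("positional insert failed:", positional_error is not None)
print("rows stored:", row_count)
```

This does not prove what Sphinx is doing in your case, but it shows the general failure mode the explicit column list protects against.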
