This question already has an answer here:
Virtually blank column in array?
(1 answer)
Closed 5 months ago.
I have this file with an input table in Google Sheets.
Keys | Tags | V1 | V2 |
kEp  | tag1 | 30 | 12 |
PgZ  | tag2 | 8  | 2  |
pac  | tag3 | 15 | 21 |
This is what I did: I added REGEXREPLACE(QUERY({A1:D},"Select Col1"),".+"," ") to get the empty column:
=ArrayFormula({
QUERY({A1:D}," Select Col1,Col2,Col3 ",1),
REGEXREPLACE(QUERY({A1:D},"Select Col1"),".+"," "),
QUERY({A1:D}," Select Col1,Col2,Col4 ",1)})
The ask
Is there a simple way, using the same range reference (in this case A1:D), to add an empty column to the array {}, something like &""&?
If 'empty' doesn't really have to be that empty, this is pretty simple...
=QUERY({A1:D4,A1:B4},"select Col1,Col2,Col3,' ',Col5,Col6,Col4 label ' '''")
You can try:
={QUERY({A1:D}," Select Col1,Col2,Col3 where Col1 is not null",1),
FLATTEN(SPLIT((REPT(" |",COUNTA(A:A))),"|")),
QUERY({A1:D}," Select Col1,Col2,Col4 where Col1 is not null",1)}
And a simplified formula:
={QUERY(A:D,"select A,B,C where A is not null",1),
FLATTEN(SPLIT((REPT(" |",COUNTA(A:A))),"|")),
QUERY(A:D,"Select A,B,D where A is not null",1)}
Get different cuts of the range through OFFSET and join them along with empty arrays crafted with MAKEARRAY:
=LAMBDA(rg,where,how_many,
{
OFFSET(rg,0,0,,where),
MAKEARRAY(ROWS(rg),how_many,LAMBDA(r,c,)),
OFFSET(rg,0,where,,COLUMNS(rg)-where)
}
)(A1:INDEX(D:D,COUNTA(D:D)),1,2)
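For readers who want to check the logic outside of Sheets, the same slice-and-join idea can be sketched in Python. This is only an illustration: the function name `insert_blank_columns` and its parameters are made up for this example and mirror the `where` and `how_many` arguments of the LAMBDA above.

```python
def insert_blank_columns(rows, where, how_many, blank=""):
    """Return a copy of rows with `how_many` blank cells inserted
    after the first `where` columns of every row."""
    return [row[:where] + [blank] * how_many + row[where:] for row in rows]


# Sample data copied from the question's table.
table = [
    ["Keys", "Tags", "V1", "V2"],
    ["kEp", "tag1", 30, 12],
    ["PgZ", "tag2", 8, 2],
    ["pac", "tag3", 15, 21],
]

# Same arguments as the formula: keep 1 column, add 2 blank ones, then the rest.
result = insert_blank_columns(table, 1, 2)
```

Each row is cut at the same position, so the output stays rectangular, which is what the {} array literal in Sheets also requires.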
This question already has answers here:
Find maximum row per group in Spark DataFrame
(2 answers)
Closed 1 year ago.
I would like to select the item that has the greatest value. For example, in this table I would like to select MAC09:
Identifiant | Val |
MAC26       | 36  |
MAC10       | 9   |
MAC02       | 2   |
MAC32       | 11  |
MAC09       | 37  |
MAC28       | 10  |
There are several ways of doing it. Here is a solution using a rank:
from pyspark.sql import functions as F, Window
df.withColumn("rnk", F.rank().over(Window.orderBy(F.col("Val").desc()))).where(
"rnk = 1"
).drop("rnk").show()
+-----------+---+
|Identifiant|Val|
+-----------+---+
|      MAC09| 37|
+-----------+---+
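One property of the rank approach worth knowing: rank() gives tied rows the same rank, so if several rows share the maximum value, all of them are returned. The plain-Python sketch below mimics that selection on the question's data without needing a Spark session; `top_ranked` and the extra row `MAC99` are invented for the illustration.

```python
rows = [
    ("MAC26", 36), ("MAC10", 9), ("MAC02", 2),
    ("MAC32", 11), ("MAC09", 37), ("MAC28", 10),
]


def top_ranked(rows):
    """Keep every row whose value equals the maximum,
    which is the set of rows that a descending rank of 1 selects."""
    if not rows:
        return []
    best = max(val for _, val in rows)
    return [(ident, val) for ident, val in rows if val == best]


winners = top_ranked(rows)

# MAC99 is a hypothetical extra row that ties with MAC09.
tied = top_ranked(rows + [("MAC99", 37)])
```

If exactly one row is required even when there are ties, ordering by the value and taking a single row is the usual alternative.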
This question already has answers here:
LISTAGG function: "result of string concatenation is too long"
(14 answers)
Closed 6 years ago.
Is there a function that behaves like SYS.STRAGG in Oracle but, instead of returning a VARCHAR (and being limited to the VARCHAR size), returns a CLOB, and thus allows a (virtually) unlimited number of concatenated strings?
For example, I have a query select x from y where z that returns 2.5M records, and I want to return all these records concatenated together in one shot.
XML functions can be used for such aggregation but for 2.5M records it will be very slow.
Example:
SELECT
rtrim(
dbms_xmlgen.convert(
extract(
xmlroot(
xmlelement(
"x",
xmlagg(sys_xmlgen(object_name || ', '))
),
version '1.0'),
'/x/ROW/text()').getclobval(),
1),
', ') aggregated_data
FROM
all_objects
You might consider using LISTAGG to pre-aggregate small groups of rows into VARCHARs smaller than 4000/32767 bytes, and then using the XML aggregation for the final result.
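The two-stage idea can be sketched in Python to show how the grouping works: values are first packed into chunks that stay under a size limit, and the chunks are then joined in a second pass. This is only a sketch of the principle, not Oracle code. The helper `chunked_concat` is hypothetical, it counts characters rather than bytes, and it assumes no single value is longer than the limit.

```python
def chunked_concat(values, sep=", ", limit=4000):
    """Stage 1: pack values into joined strings no longer than `limit` characters.
    Assumes each individual value fits within the limit."""
    chunks, current, size = [], [], 0
    for v in values:
        # Length added by v, including a separator if the chunk is not empty.
        extra = len(v) + (len(sep) if current else 0)
        if current and size + extra > limit:
            chunks.append(sep.join(current))
            current, size = [v], len(v)
        else:
            current.append(v)
            size += extra
    if current:
        chunks.append(sep.join(current))
    return chunks


values = [f"obj{i}" for i in range(5000)]  # made-up sample data
chunks = chunked_concat(values)

# Stage 2: the final aggregation only has to combine a few large pieces.
final = ", ".join(chunks)
```

The second stage handles far fewer items than the first, which is the reason for pre-aggregating before the slower XML step.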
I have the following matrix in Calc: the first row (1) contains employee numbers, and the first column (A) contains product codes.
Wherever there is an X, that product was sold by the corresponding employee above.
| 0302 | 0303 | 0304 | 0402 |
1625 | X | | X | X |
1643 | | X | X | |
...
We see that product 1643 was sold by employees 0303 and 0304
What I would like to see is a list of what product was sold by which employees but formatted like this:
1625 | 0302, 0304, 0402 |
1643 | 0303, 0304 |
The reason for this is that we ultimately need this matrix imported into a SQL Server table. We have no access to the origins of this matrix. It contains about 50 employees and 9000+ products.
Thanks for thinking with us!
Try something like this:
;with data as
(
SELECT *
FROM ( VALUES (1625,'X',NULL,'X','X'),
(1643,NULL,'X','X',NULL))
cs (col1, [0302], [0303], [0304], [0402])
),cte
AS (SELECT col1,
col
FROM data
CROSS apply (VALUES ('0302',[0302]),
('0303',[0303]),
('0304',[0304]),
('0402',[0402])) cs (col, val)
WHERE val IS NOT NULL)
SELECT col1,
LEFT(cs.col, Len(cs.col) - 1) AS col
FROM cte a
CROSS APPLY (SELECT col + ','
FROM cte B
WHERE a.col1 = b.col1
FOR XML PATH('')) cs (col)
GROUP BY col1,
LEFT(cs.col, Len(cs.col) - 1)
I think there are two problems to solve:
get the employee numbers for the X marks;
concatenate them into a single, comma-separated string.
I can't offer a solution for both issues in one step, but you can handle them separately.
1. To replace the X marks with the respective employee numbers, you could use an array function to create a second table (matrix). To do so, create a new sheet, copy the first column and first row, and enter the following formula in cell B2:
=IF($B2:$E3="X";$B$1:$E$1;"")
You'll have to adapt the formula, so it covers your complete input data (If your last data cell is Z9999, it would be =IF($B2:$Z9999="X";$B$1:$Z$1;"")). My example just covers two rows and four columns.
After modifying it, confirm with CTRL+SHIFT+ENTER to apply it as array formula.
2. Now, you'll have to concatenate the employee numbers. LO Calc lacks a feature to concatenate an array, but you could use a simple user-defined function. For such a string-join function, see this answer. Just create a new macro with the StarBasic code provided there and save it. Now you have a STRJOIN() function at hand that accepts an array and concatenates its values, leaving empty values out.
You could add that function in a helper column on the second sheet and apply it by dragging it down. Finally, to get rid of the cells with the single employee numbers, copy the complete second sheet and paste special into a third sheet, pasting only the values. Now you can remove all columns except the first one (product codes) and the last one (with the concatenated employee numbers).
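If the sheet can be exported, both steps can also be done in one pass outside of Calc. The Python sketch below works on the two sample rows from the question; the names `header`, `matrix`, and `sellers_by_product` are made up for the illustration, and reading the real export (for example from CSV) is left out.

```python
# Employee numbers from the first row of the sheet.
header = ["0302", "0303", "0304", "0402"]

# Product code -> marks, one mark per employee column.
matrix = {
    "1625": ["X", "", "X", "X"],
    "1643": ["", "X", "X", ""],
}


def sellers_by_product(header, matrix):
    """For each product, join the employee numbers whose cell holds an X."""
    return {
        product: ", ".join(
            emp for emp, mark in zip(header, marks)
            if mark.strip().upper() == "X"
        )
        for product, marks in matrix.items()
    }


result = sellers_by_product(header, matrix)
for product, employees in result.items():
    print(f"{product} | {employees} |")
```

The inner generator does the "replace X with the column header" step and the join does the concatenation, so no helper sheet is needed.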
I created a table in sql for holding the data:
CREATE TABLE [dbo].[mydata](
[prod_code] [nvarchar](8) NULL,
[0100] [nvarchar](10) NULL,
[0101] [nvarchar](10) NULL,
[and so on...]
I created the list of columns in Calc by copying them and pasting them transposed. After that I used the concatenate function to create the column list plus data type for the CREATE TABLE statement.
I cleaned up the worksheet and imported it into this table using SQL Server's import wizard. Cleaning meant removing unnecessary rows and columns. Since the column names were identical, the mapping was done correctly for 99%.
Now I had the data in SQL Server.
I adapted the code MM93 suggested a bit:
;with data as
(
SELECT *
FROM dbo.mydata -- here I simply referenced the whole table
),cte
and in the next part I used the same 'worksheet' trick to list and format all the column names, and pasted them in.
),cte
AS (SELECT prod_code, -- had to replace col1 with 'prod_code'
col
FROM data
CROSS apply (VALUES ('0100',[0100]),
('0101', [0101] ),
(and so on... ),
The result of this query was inserted into a new table, and my colleagues and I are querying our hearts out :)
PS: removing the 'FOR XML' clause resulted in a table with two columns:
prodcode | employee
which contains all the unique combinations of prodcode + employee number, which is a lot faster and much more practical to query.
I am importing data into my SQL database from an Excel spreadsheet.
The imp table is the imported data, the app table is the existing database table.
app.ReceiptId is formatted as "A" followed by some numbers. Formerly it was 4 digits, but now it may be 4 or 5 digits.
Examples:
A1234
A9876
A10001
imp.ref is a free-text reference field from Excel. It consists of some arbitrary length description, then the ReceiptId, followed by an irrelevant reference number in the format " - BZ-0987654321" (which is sometimes cropped short, or even missing entirely).
Examples:
SHORT DESC A1234 - BZ-0987654321
LONGER DESCRIPTION A9876 - BZ-123
REALLY LONG DESCRIPTION A2345 - B
REALLY REALLY LONG DESCRIPTION A23456
The code below works for a 4-digit ReceiptId, but will not correctly capture a 5-digit one.
UPDATE app
SET
[...]
FROM imp
INNER JOIN app
ON app.ReceiptId = right(right(rtrim(replace(replace(imp.ref,'-',''),'B','')),5)
+ rtrim(left(imp.ref,charindex(' - BZ-',imp.ref))),5)
How can I change the code so it captures either 4 (A1234) or 5 (A12345) digits?
As ughai rightly wrote in his comment, it's not recommended to use anything other than columns in the ON clause of a join.
The reason is that applying functions to the columns prevents SQL Server from using any indexes on them that it could otherwise use.
Therefore, I would suggest adding another column to the imp table that will hold the actual ReceiptId and be calculated during the import process itself.
I think the best way of extracting the ReceiptId from the ref column is using SUBSTRING with PATINDEX, as demonstrated in this fiddle:
SELECT ref,
RTRIM(SUBSTRING(ref, PATINDEX('%A[0-9][0-9][0-9][0-9]%', ref), 6)) As ReceiptId
FROM imp
Update
After the conversation with t-clausen-dk in the comments, I came up with this:
SELECT ref,
CASE WHEN PATINDEX('%[ ]A[0-9][0-9][0-9][0-9][0-9 ]%', ref) > 0
OR PATINDEX('A[0-9][0-9][0-9][0-9][0-9 ]%', ref) = 1 THEN
SUBSTRING(ref, PATINDEX('%A[0-9][0-9][0-9][0-9][0-9 ]%', ref), 6)
ELSE
NULL
END As ReceiptId
FROM imp
fiddle here
This will return NULL if there is no match.
A match is a substring that consists of A followed by 4 or 5 digits, is separated by spaces from the rest of the string, and can be found at the start, middle, or end of the string.
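If the ReceiptId is computed during the import process itself, as suggested above, and that step happens to run outside the database, the same rule is short to express with a regular expression. The following Python sketch is an assumption about how such a pre-processing step could look, not part of the T-SQL solution; `RECEIPT_RE` and `extract_receipt_id` are names invented here.

```python
import re

# "A" followed by 4 or 5 digits, bounded by whitespace or the string edges.
RECEIPT_RE = re.compile(r"(?<!\S)A\d{4,5}(?!\S)")


def extract_receipt_id(ref):
    """Return the first ReceiptId found in ref, or None if there is none."""
    match = RECEIPT_RE.search(ref)
    return match.group(0) if match else None


# Sample values copied from the question.
samples = [
    "SHORT DESC A1234 - BZ-0987654321",
    "LONGER DESCRIPTION A9876 - BZ-123",
    "REALLY LONG DESCRIPTION A2345 - B",
    "REALLY REALLY LONG DESCRIPTION A23456",
]
extracted = [extract_receipt_id(s) for s in samples]
```

The lookarounds play the role of the space checks in the PATINDEX version, and they also cover an ID that sits at the very end of the string.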
Try this, it will remove all characters before the A[number][number][number][number] and take the first 6 characters after that:
UPDATE app
SET
[...]
FROM imp
INNER JOIN app
ON app.ReceiptId in
(
left(stuff(ref,1, patindex('%A[0-9][0-9][0-9][0-9][ ]%', imp.ref + ' ') - 1, ''), 5),
left(stuff(ref,1, patindex('%A[0-9][0-9][0-9][0-9][0-9][ ]%', imp.ref + ' ') - 1, ''), 6)
)
When comparing with the equals operator, trailing spaces are not taken into account.
I have a table of over 4 million rows, and one column accidentally contains more data than needed.
For example, instead of ABC there is ABC DEFG.
How can I remove those N characters using T-SQL? Please note that I want to delete these characters from the database, NOT just select a substring. Thank you.
UPDATE mytable SET column=LEFT(column, LEN(column)-5)
Removes the last 5 characters from the column (every row in mytable)
I found the answer to my own question, and this is it:
select reverse(stuff(reverse('a,b,c,d,'), 1, N, ''))
Where N is the number of characters to remove. This avoids writing the complex column/string expression twice.
You could do it using SUBSTRING() function:
UPDATE table SET column = SUBSTRING(column, 0, LEN(column) + 1 - N)
Removes the last N characters from every row in the column
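To make clear that the approaches above produce the same string, here is a small Python sketch that mirrors two of them: the direct "keep the first LEN - N characters" form and the reverse, delete prefix, reverse trick. The function names are made up for this illustration. Unlike T-SQL's LEFT, which raises an error when given a negative length, the sketch returns an empty string when N exceeds the length.

```python
def drop_last(s, n):
    """Keep the first len(s) - n characters (empty string if n >= len(s))."""
    return s[: max(len(s) - n, 0)]


def drop_last_via_reverse(s, n):
    """Reverse, cut the first n characters, and reverse back."""
    return s[::-1][n:][::-1]


a = drop_last("ABC DEFG", 5)
b = drop_last_via_reverse("a,b,c,d,", 1)
```

Both forms agree for any non-negative N; the reverse variant is mainly useful when the string expression is long and should appear only once.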
This should do it, removing one character (or however many are needed) from the end:
LEFT(columnX,LEN(columnX) - 1) AS NewColumnName
You can use the RIGHT function [https://www.w3schools.com/sql/func_sqlserver_right.asp]
RIGHT( "string" , number_of_chars_from_right_to_left)
That should look like this:
Query: SELECT RIGHT('SQL Tutorial', 3) AS ExtractString;
Result: "ial"