SSIS: How to split excel cell value into SQL columns - sql-server

I have an excel file with data like this:
ID | FieldA | FieldB
1 ABC A, B
2 FGH W, Z
3 KLÑ G, K
What I want to do is to use SSIS and import this data into a SQL Table. The only problem is that this table has an structure like this:
ID | FieldA | FieldB1 | FieldB2
So, what I need to do is to split the "FieldB" Column in Excel and put it into FieldB1 and FieldB2 in SQL.
The result would be something like this:
ID | FieldA | FieldB1 | FieldB2
1 | ABC | A | B
2 | FGH | W | Z
3 | KLÑ | G | K
Any ideas on how to achieve this?

Unless I'm missing something, I'd just skip the header row and have it import the subsequent data correctly. Take a minute or so to assign column names and voilà, done.

Try selecting the relevant range, then running this:
Sub SplitColumn()
Dim strArr() as String
Dim cell as Range
For Each cell In Selection
cell.offset(0, 1).resize(1,2).value = split(cell.value,", ")
Next cell
End Sub
Now copy and paste your data wherever required.
Non-VBA alternative:
Enter the following formula in cell D2:
=LEFT(C2,FIND(",",C2)-1)
And in E2:
=RIGHT(C2,LEN(C2)-FIND(", ",C2)-1)
And autocomplete the rest of the column.

As I see here is a detailed explanation of your example.
On the other side you can use another approach - split one excel column on two columns in excel using excel formulas and import document with 4 columns.

you can use derived column and add as two new columns .First Column expression should be like this :
SUBSTRING([FieldB],1,FINDSTRING([FieldB],",",1) - 1)
and the second one like this :
SUBSTRING([FieldB],FINDSTRING([FieldB],",",1) + 1,LEN([FieldB])- FINDSTRING([FieldB],"_",1) )

Related

Use excel to summarise data from a column by identifier

I have a spreadsheet with a column called MRN (the identifier) and the drugs administered next to them. There are duplicates of the MRN in column A that correspond to different courses of drugs. What I'm hoping to do is to summarise all the drugs administered associated with one MRN in one line, removing all duplicates. It looks something like this.
| | A | B |
| 1 | MRN Item
| 2 | 1 cefoTAXime
| 3 | 1 ampicillin
| 4 | 1 cefoTAXime
| 5 | 1 vancomycin
| 6 | 1 cefTRIaxone
| 7 | 2 ampicillin
| 8 | 2 vancomycin
| 9 | 2 vancomycin
I have 3 different formulas. The first is to produce a list of MRNs that are all unique. The second is to pull all drugs by MRN and list them in one line. The third is to remove duplicates from this list. They are below (in order).
{=IFERROR(INDEX($A$2:$A$2885, MATCH(0,COUNTIF(D$1:$D1, $A$2:$A$2885),0 )),"")}
{=INDEX($A$2:$B$2885,SMALL(IF($A$2:$A$2885=$D2,ROW($A$2:$A$2885)),COLUMN(D:D))-4,2)}
{=IFERROR(INDEX($E$2:$AE$2, MATCH(0,COUNTIF(D$3:$D3, $E$2:$AE$2),0 )),"")}
*I know that I can edit the second one by adding IF(ISERROR ...) to remove NA and print blanks if drug not found, but want to keep the formulas as simple as possible at this time.
My problem is that second formula isn't pulling all the drugs by MRN, and in an ideal world I would be able to combine the second and third formula into one, but I am not sure how to. Here is a link to a test file that shows my issue and the formulas in action.
https://1drv.ms/x/s!ApoCMYBhswHzhooXnumW2iV7yx-JaA
I appreciate that there may be a better way to do this using python/R, and if that's possible then I'm more than happy to try, but I couldn't make any headway. Thanks for your help and suggestions.
If you could deal with a count of the number of courses per drug per MRN, you can do this with Power Query (aka Get & Transform in Excel 2016)
Starting with the data you provided on your worksheet, the results would look like:
M-Code
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"MRN", Int64.Type}, {"Item", type text}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"MRN"}, {{"Count", each _, type table}}),
#"Expanded Count" = Table.ExpandTableColumn(#"Grouped Rows", "Count", {"MRN", "Item"}, {"Count.MRN", "Count.Item"}),
#"Pivoted Column" = Table.Pivot(#"Expanded Count", List.Distinct(#"Expanded Count"[Count.Item]), "Count.Item", "Count.MRN", List.NonNullCount)
in
#"Pivoted Column"

Pivot Table with merged date fields

I have a source data sheet, each data item having two date fields, startDate and endDate. What I would like to to in excel is generate a pivot table with row headers for each date from either of these columns, and two summary columns, one for Count Started, the other Count Ended.
For example, the following source data:
ItemId | startDate | endDate
1 | 6/1/16 | 6/2/16
2 | 6/2/16 | 6/3/16
3 | 6/1/16 | 6/3/16
Would produce a pivot table like this:
Date | Started | Ended
6/1/16 | 2 | 0
6/2/16 | 1 | 1
6/3/16 | 0 | 2
I doubt I would choose a PivotTable solution for this (that's unlike me!) but I think possible with a PT:
1) Create a PT from multiple consolidation ranges (example here) with ranges A:B and A:C (assuming ItemID is in A1).
2) After 7. select ColumnsA:C (in the new sheet) and apply Remove Duplicates (with all Columns checked).
3) Create a new PT from what remains (Column for COLUMNS, Value for ROWS, Count of Row for VALUES)
4) Right-click on startDate, Move, and click on first option.
5) In PivotTable Options..., Totals & Filters uncheck both Grand Totals and in Layout & Format, Format, check For empty cells show and enter 0.
6) Adjust labels to suit.

Excel Lookup IP addresses in multiple ranges

I am trying to find a formula for column A that will check an IP address in column B and find if it falls into a range (or between) 2 addresses in two other columns C and D.
E.G.
A B C D
+---------+-------------+-------------+------------+
| valid? | address | start | end |
+---------+-------------+-------------+------------+
| yes | 10.1.1.5 | 10.1.1.0 | 10.1.1.31 |
| Yes | 10.1.3.13 | 10.1.2.16 | 10.1.2.31 |
| no | 10.1.2.7 | 10.1.1.128 | 10.1.1.223 |
| no | 10.1.1.62 | 10.1.3.0 | 10.1.3.127 |
| yes | 10.1.1.9 | 10.1.4.0 | 10.1.4.255 |
| no | 10.1.1.50 | … | … |
| yes | 10.1.1.200 | | |
+---------+-------------+-------------+------------+
This is supposed to represent an Excel table with 4 columns a heading and 7 rows as an example.
I can do a lateral check with
=IF(AND((B3>C3),(B3 < D3)),"yes","no")
which only checks 1 address against the range next to it.
I need something that will check the 1 IP address against all of the ranges. i.e. rows 1 to 100.
This is checking access list rules against routes to see if I can eliminate redundant rules... but has other uses if I can get it going.
To make it extra special I can not use VBA macros to get it done.
I'm thinking some kind of index match to look it up in an array but not sure how to apply it. I don't know if it can even be done. Good luck.
Ok, so I've been tracking this problem since my initial comment, but have not taken the time to answer because just like Lana B:
I like a good puzzle, but it's not a good use of time if i have to keep guessing
+1 to Lana for her patience and effort on this question.
However, IP addressing is something I deal with regularly, so I decided to tackle this one for my own benefit. Also, no offense, but getting the MIN of the start and the MAX of the end is wrong. This will not account for gaps in the IP white-list. As I mentioned, this required 15 helper columns and my result is simply 1 or 0 corresponding to In or Out respectively. Here is a screenshot (with formulas shown below each column):
The formulas in F2:J2 are:
=NUMBERVALUE(MID(B2,1,FIND(".",B2)-1))
=NUMBERVALUE(MID(B2,FIND(".",B2)+1,FIND(".",B2,FIND(".",B2)+1)-1-FIND(".",B2)))
=NUMBERVALUE(MID(B2,FIND(".",B2,FIND(".",B2)+1)+1,FIND(".",B2,FIND(".",B2,FIND(".",B2)+1)+1)-1-FIND(".",B2,FIND(".",B2)+1)))
=NUMBERVALUE(MID(B2,FIND(".",B2,FIND(".",B2,FIND(".",B2)+1)+1)+1,LEN(B2)))
=F2*256^3+G2*256^2+H2*256+I2
Yes, I used formulas instead of "Text to Columns" to automate the process of adding more information to a "living" worksheet.
The formulas in L2:P2 are the same, but replace B2 with C2.
The formulas in R2:V2 are also the same, but replace B2 with D2.
The formula for X2 is
=SUMPRODUCT(--($P$2:$P$8<=J2)*--($V$2:$V$8>=J2))
I also copied your original "valid" set in column A, which you'll see matches my result.
You will need helper columns.
Organise your data as outlined in the picture.
Split address, start and end into columns by comma (ribbon menu Data=>Text To Columns).
Above the start/end parts, calculate MIN FOR START, and MAX FOR END for all split text parts (i.e. MIN(K5:K1000) .
FORMULAS:
VALIDITY formula - copy into cell D5, and drag down:
=IF(AND(B6>$I$1,B6<$O$1),"In",
IF(OR(B6<$I$1,B6>$O$1),"Out",
IF(B6=$I$1,
IF(C6<$J$1, "Out",
IF( C6>$J$1, "In",
IF( D6<$K$1, "Out",
IF( D6>$K$1, "In",
IF(E6>=$L$1, "In", "Out"))))),
IF(B6=$O$1,
IF(C6>$P$1, "Out",
IF( C6<$P$1, "In",
IF( D6>$Q$1, "Out",
IF( D6<$Q$1, "In",
IF(E6<=$R$1, "In", "Out") )))) )
)))

Find valid combinations based on matrix

I have a in CALC the following matrix: the first row (1) contains employee numbers, the first column (A) contains productcodes.
Everywhere there is an X that productitem was sold by the corresponding employee above
| 0302 | 0303 | 0304 | 0402 |
1625 | X | | X | X |
1643 | | X | X | |
...
We see that product 1643 was sold by employees 0303 and 0304
What I would like to see is a list of what product was sold by which employees but formatted like this:
1625 | 0302, 0304, 0402 |
1643 | 0303, 0304 |
The reason for this is that we need this matrix ultimately imported into an SQL SERVER table. We have no access to the origins of this matrix. It contains about 50 employees and 9000+ products.
Thanx for thinking with us!
try something like this
;with data as
(
SELECT *
FROM ( VALUES (1625,'X',NULL,'X','X'),
(1643,NULL,'X','X',NULL))
cs (col1, [0302], [0303], [0304], [0402])
),cte
AS (SELECT col1,
col
FROM data
CROSS apply (VALUES ('0302',[0302]),
('0303',[0303]),
('0304',[0304]),
('0402',[0402])) cs (col, val)
WHERE val IS NOT NULL)
SELECT col1,
LEFT(cs.col, Len(cs.col) - 1) AS col
FROM cte a
CROSS APPLY (SELECT col + ','
FROM cte B
WHERE a.col1 = b.col1
FOR XML PATH('')) cs (col)
GROUP BY col1,
LEFT(cs.col, Len(cs.col) - 1)
I think there are two problems to solve:
get the product codes for the X marks;
concatenate them into a single, comma-separated string.
I can't offer a solution for both issues in one step, but you may handle both issues separately.
1.
To replace the X marks by the respective product codes, you could use an array function to create a second table (matrix). To do so, create a new sheet, copy the first column / first row, and enter the following formula in cell B2:
=IF($B2:$E3="X";$B$1:$E$1;"")
You'll have to adapt the formula, so it covers your complete input data (If your last data cell is Z9999, it would be =IF($B2:$Z9999="X";$B$1:$Z$1;"")). My example just covers two rows and four columns.
After modifying it, confirm with CTRL+SHIFT+ENTER to apply it as array formula.
2.
Now, you'll have to concatenate the product codes. LO Calc lacks a feature to concatenate an array, but you could use a simple user-defined function. For such a string-join function, see this answer. Just create a new macro with the StarBasic code provided there and save it. Now, you have a STRJOIN() function at hand that accepts an array and concatenates its values, leaving empty values out.
You could add that function using a helper column on the second sheet and apply it by dragging it down. Finally, to get rid of the cells with the single product IDs, copy the complete second sheet, paste special into a third sheet, pasting only the values. Now, you can remove all columns except the first one (employee IDs) and the last one (with the concatenated product ids).
I created a table in sql for holding the data:
CREATE TABLE [dbo].[mydata](
[prod_code] [nvarchar](8) NULL,
[0100] [nvarchar](10) NULL,
[0101] [nvarchar](10) NULL,
[and so on...]
I created the list of columns in Calc by copying and pasting them transposed. After that I used the concatenate function to create the columnlist + datatype for the create table statement
I cleaned up the worksheet and imported it into this table using SQL Server's import wizard. Cleaning meant removing unnecessary rows/columns. Since the columnnames were identical mapping was done correctly for 99%.
Now I had the data in SQL Server.
I adapted the code MM93 suggested a bit:
;with data as
(
SELECT *
FROM dbo.mydata <-- here i simply referenced the whole table
),cte
and in the next part I uses the same 'worksheet' trick to list and format all the column names and pasted them in.
),cte
AS (SELECT prod_code, <-- had to replace col1 with 'prod_code'
col
FROM data
CROSS apply (VALUES ('0100',[0100]),
('0101', [0101] ),
(and so on... ),
The result of this query was inserted into a new table and my colleagues and I are querying our harts out :)
PS: removing the 'FOR XML' clause resulted in a table with two columns :
prodcode | employee
which containes al the unique combinations of prodcode + employeenumber which is a lot faster and much more practical to query.

sort 2d Array re-order the first column

For example:
Array
ID | Primary | Data2
------------------
1 | N | Something 1
2 | N | Something 2
3 | Y | Something 3
I'm trying to sort it based on the primary column and I want the "Y" to show first. It should bring all the other column at the top.
The end result would be:
Sorted Array
ID | Primary | Data2
------------------
3 | Y | Something 3
1 | N | Something 1
2 | N | Something 2
Is there a pre-made function for that. If not, how do we do this?
It is declared like this:
Dim Array(,) As String
regards,
I like using LINQ's OrderBy and ThenBy to order collections of objects. You just pass in a selector function to use to order the collections. For example:
orderedObjs = objs.OrderByDescending(function(x) x.isPrimary).ThenBy(function(x) x.id).ToList()
This code orders a collection first by the .isPrimary boolean, then by the id. Finally, it immediately evaluates the query into a List and assigns it to some variable.
Demo
There's a similar C# question whose solution applies just as well to VB. In short, you can use an overload of Array.Sort if you first split your 2D array into separate (1D) arrays:
Dim Primary() As String
Dim Data2() As String
// ...
Array.Sort(Primary,Data2)
This would reorder Data2 according to the Y/N sort of Primary, after which point you could then recombine them into a 2D array.

Resources