Can someone tell me how to write a SQL Server query that checks whether a varchar text column contains a number within a given range?
For example, I'm looking for rows where the column contains any number between 100000 and 999999. The column may have a value like
this field contains a number `567391`
so I want to select that one, but not if it had
this field contains a number `5391`
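For your given example, you can check the digits. Pad the value with a space on each side so that a number at the very start or end of the text is still surrounded by non-digit characters:
where ' ' + col + ' ' like '%[^0-9][1-9][0-9][0-9][0-9][0-9][0-9][^0-9]%'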
This is not a generic solution, but it works for your example. In general, parsing strings in SQL Server is difficult. It is better to extract the values you are interested in when loading the data, so the relevant values are correctly in their own columns.
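If the values can be extracted while loading the data, the same check is easier to express outside the database. The following is a minimal sketch in Python, not SQL Server code; the function name is hypothetical. It assumes a "number" means a run of exactly six digits whose first digit is non-zero and that is not adjacent to any other digit.

```python
import re

# Six digits, first one non-zero, with no digit directly before or after.
SIX_DIGIT = re.compile(r"(?<!\d)[1-9]\d{5}(?!\d)")

def contains_six_digit_number(text: str) -> bool:
    """Return True if text contains a standalone number in 100000..999999."""
    return SIX_DIGIT.search(text) is not None

print(contains_six_digit_number("this field contains a number 567391"))  # True
print(contains_six_digit_number("this field contains a number 5391"))    # False
print(contains_six_digit_number("order 1234567 shipped"))                # False
```

The lookarounds play the role of the [^0-9] guards in the LIKE pattern, but they also succeed at the start and end of the string, so no padding is needed.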
I have my data in Excel and uploaded it to Google Sheets so I can use Google Data Studio. Everything in Data Studio has worked well so far, but I am having trouble with one case.
A few of the fields in my data source hold numbers from 1 to 1000+, and in Excel I formatted those cells as Number with the 1000 separator (,).
In the Data Studio data source I set the same field's type to "Number", but when I create a simple Scorecard in Data Studio, it seems to SUM only the values lower than 1000, meaning any value of 1000 or above is skipped.
I suspect this is because of the separator (,), and I know I can work around it with:
CAST(REGEXP_REPLACE)
but I want to know why it causes trouble even after choosing the correct cell format.
Sample Data Link: https://datastudio.google.com/reporting/abafc2cb-9033-4851-9f72-02896a91384c
The field ITEM SUPPLIED is a text field and contains numbers formatted as text, including a 1000 separator (,).
Therefore, every , in the field has to be removed. This can be done with the REPLACE function; CAST then converts the text to a number.
CAST(REPLACE(ITEM SUPPLIED, ",", "") AS number )
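The same two steps (strip the separator, then convert) can be illustrated outside Data Studio. This is a minimal Python sketch; the sample values and the function name are made up for illustration and are not taken from the linked report.

```python
def parse_number(text: str) -> int:
    """Remove the thousands separator, then convert the text to a number."""
    return int(text.replace(",", ""))

# Hypothetical ITEM SUPPLIED values stored as text.
values = ["250", "999", "1,200", "3,450"]

total = sum(parse_number(v) for v in values)
print(total)  # 5899
```

Without the replace step, "1,200" and "3,450" are not valid numbers, which is consistent with only the values below 1000 being summed.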
Link to example file:
https://docs.google.com/spreadsheets/d/1dCQSHWjndejkyyw-chJkBjfHgzEGYoRdXmPTNKu7ykg/edit?usp=sharing
The tab "Source data" contains the data to be used in the query on the tab "Query output". The tab "Desired result" shows what I would like the end result to look like.
The goal I'm trying to achieve is to have the formula in cell A2 on the tab "Query output" to populate the data in all four of the columns, so that it looks exactly like the "Desired result" tab. I know I can get the same result simply by entering additional formulas in C2 and D2, but this is not the objective, I need the results to come specifically from the single formula in A2.
The information in the "Additional data 1" column should simply repeat the word "Test" for every row that contains data in the first two columns. The information in the "Additional data 2" column should simply repeat the data from cell 'Source data'!A1 for every row that contains data in the first two columns.
Please feel free to edit the example file as it only contains dummy data. If you like, you can copy the tab "Query output" to create your own working formula for illustrative purposes.
EDIT:
I'm thinking along the lines of creating an array that consists of the required data for the columns "Additional data 1" and "Additional data 2" and then combining that array with the array of the query result which provides the first two columns. I've been experimenting with this in various ways, but so far the only result I have achieved is an error on the first cell of the query results. I also have no idea yet how I could make sure that the second array contains an equal amount of rows to the query result.
You can add static data into query:
=QUERY('Source data'!A3:B,"SELECT A,B, 'Test', '" & 'Source data'!A1 &"' WHERE A IS NOT NULL LABEL A '', B '', 'Test' '', '" & 'Source data'!A1 &"' ''")
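As a rough illustration of what this query produces, here is a minimal Python sketch, not Sheets code. The function name and sample rows are hypothetical; it assumes each source row is a pair (A, B) and that rows with an empty A are skipped, mirroring WHERE A IS NOT NULL.

```python
def query_with_constants(rows, label, header_value):
    """Keep rows whose first column is filled and append two constant columns."""
    return [
        (a, b, label, header_value)
        for a, b in rows
        if a not in (None, "")
    ]

# Hypothetical source rows; the last one is empty and is dropped.
source = [("x", 1), ("y", 2), ("", None)]
print(query_with_constants(source, "Test", "Header from A1"))
```

Because the constants are attached per filtered row, the extra columns always have exactly as many rows as the query result.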
Many thanks to #basic for the provided assistance! The insights were a great help to solving my issue. That said, I have muddled along a bit, and I've come up with a slightly different solution which I find better suited as it gives true blank values instead of a column filled with spaces.
First of all, instead of querying directly on the source data, I built an array and queried on that. I used the two existing columns (A and B) from the source data and added a third column to the array which does not exist in the source data. In order to make sure that the third column would consist of blank values, I used the IFERROR formula.
=IFERROR(0/0)
The formula above returns a blank because dividing by zero forces an error, and the IFERROR function returns a blank unless an alternative return value is specified.
To use this formula in an array, however, it had to be tweaked slightly, because as written it returns only a single blank cell instead of a column of blank values. To do this, I divided an existing column from the source data by zero and wrapped the expression in an ARRAYFORMULA.
=ARRAYFORMULA(IFERROR('Source data'!A3:A/0))
Using this, the resulting array has the following formula.
=ARRAYFORMULA({'Source data'!A3:A,'Source data'!B3:B,IFERROR('Source data'!A3:A/0)})
This creates an array consisting of the two original columns A and B from the source data, plus an additional third column filled with blank values. This array can now be queried upon, and using the tricks previously provided by #basic the desired result as specified in the original question can be achieved.
Because the query now runs on a user-defined array, the columns in the SELECT statement have to be referred to as Col1, Col2, Col3 instead of A, B, C. The final formula looks like this.
=QUERY(ARRAYFORMULA({'Source data'!A3:A,'Source data'!B3:B,IFERROR('Source data'!A3:A/0)}),"SELECT Col1,Col2,'Test',Col3,'"&'Source data'!A1&"' WHERE Col1 IS NOT NULL LABEL 'Test' '','"&'Source data'!A1&"' ''")
I hope this information may prove of use to someone else as well.
I have a transformation as
where the text file is in the following format:
For each t_cmp in the text file (the number of t_cmp values is not known in advance), I want to execute Read Company
But it is giving an error as
Can anyone please tell me where I am going wrong?
You need to pass 3 rows, each with 1 field, instead of a single row with 3 fields.
The number of fields must match the number of parameters of your query.
So, in short, transpose your data. Either:
read line as a single field then use Split field to rows
or read as now and use Row normalizer
Both approaches should work.
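The transposition itself can be sketched as follows. This is a minimal Python illustration, not a Pentaho step; it assumes the line is comma-delimited (the actual file format is not shown here), and the function name and sample values are hypothetical.

```python
def split_field_to_rows(line: str, delimiter: str = ",") -> list[tuple[str]]:
    """Turn one delimited line into one single-field row per value."""
    return [
        (value.strip(),)
        for value in line.split(delimiter)
        if value.strip()
    ]

# One row with three fields becomes three rows with one field each.
print(split_field_to_rows("cmp1, cmp2, cmp3"))
# [('cmp1',), ('cmp2',), ('cmp3',)]
```

Each resulting row carries exactly one field, so it lines up with a query that takes a single parameter, and the number of rows follows the number of values in the file.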
I am processing a flat file in SSIS and one of the requirements is that if a given row contains an incorrect number of delimiters, fail the row but continue processing the file.
My plan is to load the rows into a single column in SQL server, but during the load, I’d like to test each row during the data flow to see if it has the right number of delimiters, and add a derived column value to store the result of that comparison.
I’m thinking I could do that with a script task component, but I’m wondering if anyone has done that before and what would be the best method? If a script task component would be the way to go, how do I access the raw row with its delimiters inside the script task?
SOLUTION:
I ended up going with a modified version of Holder's answer as I found that TOKENCOUNT() will not count null values per this SO answer. When two delimiters are not separated by a value, it will result in an incorrect count (at least for my purposes).
I used the following expression instead:
LEN(EntireRow) - LEN(REPLACE(EntireRow, "|", ""))
This results in the correct count of delimiters in the row, regardless of whether there's a value in a given field or not.
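The difference between the two counting approaches can be sketched in Python. This is an illustration only, not SSIS expression code; the function names are hypothetical, and the second function merely imitates token counting that skips empty values.

```python
def delimiter_count(row: str, delimiter: str = "|") -> int:
    """Count single-character delimiters via the length difference after removal."""
    return len(row) - len(row.replace(delimiter, ""))

def non_empty_token_count(row: str, delimiter: str = "|") -> int:
    """Count only the non-empty values between delimiters."""
    return len([token for token in row.split(delimiter) if token])

print(delimiter_count("a|b|c"), non_empty_token_count("a|b|c"))  # 2 3
print(delimiter_count("a||c"), non_empty_token_count("a||c"))    # 2 2
```

Both rows contain two delimiters, but the token count drops from 3 to 2 when the middle field is empty, which is why the length-difference expression is the safer test. The length difference equals the delimiter count only for a one-character delimiter.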
My suggestion is to use a Derived Column to do your test
And then add a Conditional Split to decide if you want to insert the rows or not.
Something like this:
Use the TOKENCOUNT function in the Derived Column box to get the number of columns, like this: TOKENCOUNT(EntireRow,"|")
I have an Excel sheet which inserts data into SQL Server, but I noticed that for a particular field the data is being inserted with e. This field is of type varchar with size 20.
Why is e being inserted when the actual data for these fields is 54607677038, 77200818179 and 9920996?
Help me out
Thanks in anticipation.
You may think of '2007038971' as being just a string of numbers (some kind of article code, I guess). Excel just sees numbers and treats it as a numerical value. It probably is right aligned (default for numbers) and not left-aligned (default for strings).
When asked to store it as a string, it 'helpfully' formats that number into a string, thereby introducing that "e" notation (the value 2007038971 is about 2.00704 * 10^9).
You need to convince Excel that that code really is a string, maybe by adding a quote in front of it.
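The effect can be reproduced outside Excel. This is a minimal Python sketch that assumes the number is rendered in scientific notation with five decimal places, similar to the 2.00704 * 10^9 example above; the exact format Excel applies in a given sheet may differ.

```python
code = 54607677038

# Rendering the number in scientific notation keeps only six significant digits.
as_scientific = f"{code:.5E}"
print(as_scientific)  # 5.46077E+10

# Parsing that text back does not recover the original code.
round_trip = int(float(as_scientific))
print(round_trip)  # 54607700000

# Treating the code as text from the start preserves every digit.
print(str(code))  # 54607677038
```

Once the value has been stored in "e" notation, the trailing digits are gone, so the conversion to text has to happen before the number is formatted.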
How about this: when you read the value from Excel, convert it with ToString() and then insert it into the DB. You may need to change the data type to match the data in your Excel sheet.
double doub = 2.00704e+009;
string val = doub.ToString();