Teradata BTEQ Whitespace at end of every column - export

I am exporting to a text file from BTEQ, and every column in the output file is padded with whitespace to its maximum length. For example, I just want the customer_name and post_code columns to look like:
Mr Always Teste,AB10 1AB,
but in my file they look like:
Mr Always Teste ,AB10 1AB ,
I only want the data itself, without the trailing whitespace, because I need to import the data cleanly after exporting.
My export script contains:
.SET TITLEDASHES OFF
.SET SEPARATOR ','
.SET FORMAT OFF (ON MAKES IT ALL WEIRD)
.SET NULL ''
.SET WIDTH 1000
Forgive me, I can't paste any data as it's on another PC and it's all confidential anyway.
Example column definitions are (they are all like this with varying lengths):
Name: customer_name Type: CV Format: X(208) Max Length: 208
As I said, this and all the other columns are padded out to their full length with whitespace in the output file. Is there anything I can do about it?

BTEQ's REPORT format is fixed width, so setting the SEPARATOR will not remove the padding spaces. However, you can return a single column instead, using the CSV table function to build a delimited string:
with cte as
(
select * from tab
)
select *
from table(CSV(new variant_type(cte.col1 -- list all columns here
,cte.col2
,cte.col3)
,',' -- separator
,'"') -- string delimiter
returns (s varchar(10000))) as t;
This is much easier and performs better than concatenating all columns with CONCAT and COALESCE.
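If changing the query is not an option, the padding can also be stripped after the export. The following is only a minimal Python sketch of that idea, not a BTEQ or Teradata feature. It assumes the separator never occurs inside a field value, and the function name is made up for illustration.

```python
def strip_field_padding(line: str, sep: str = ",") -> str:
    """Remove trailing whitespace from every separator-delimited field of one line."""
    fields = line.rstrip("\n").split(sep)
    return sep.join(field.rstrip() for field in fields)


# Example line shaped like the padded export from the question.
padded = "Mr Always Teste      ,AB10 1AB   ,"
cleaned = strip_field_padding(padded)
print(cleaned)
```

Applied line by line to the exported file, this leaves the separators in place and only drops the spaces in front of each one.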


Anychart tables: How to include thousand separators?

How can I put the text "100.000" in a table in Anychart? When I try to get the string "100.000" in, it is modified to "100".
For a working example see https://jsfiddle.net/Republiq/xcemvm9L/
table = anychart.standalones.table(2,2);
table.getCell(0,0).content("100.000");
table.container("container").draw();
If you want to use this number formatting for the whole table, you can define the numberLocale at the beginning. If the actual number is 100, '.' is the decimal separator, and you want to show three zeros as decimals, put the following lines before creating the table:
anychart.format.locales.default.numberLocale.decimalsCount = 3;
anychart.format.locales.default.numberLocale.zeroFillDecimals = true;
And then put in the number as:
table.getCell(0,0).content(100);
If '.' is the group separator and the actual number is 100000, put the following line:
anychart.format.locales.default.numberLocale.groupsSeparator = '.';
And then put in the number as:
table.getCell(0,0).content(100000);
If you want a special format for only a single cell, we recommend using the number formatter, which lets you configure all of these options for a single number. For example, it may look like:
table = anychart.standalones.table(5,5);
table.getCell(0,0).content(anychart.format.number(100000, 3, ".", ","));
table.container("container").draw();
Also, you may learn more about this useful method and find examples in this article

Hive: create table with arrays of struct from csv file where everything is comma delimited

I have a CSV file containing an array of structs where everything is delimited by ','. After the ID field, the data contains arrays of triplets of X, Y and Z coordinates.
ID, X1,Y1,Z1,X2,Y2,Z2,X3,Y3,Z3,...
1,1,2,3,4,5,6,7,8,9
2,4,5,6,7,8,9
3,10,11,12
4,15,16,17,18,19,20,25,26,27
I tried the following code to create the Hive table. It would have worked if my fields, collection items and map keys were delimited with different characters, but since everything is delimited with a comma, it failed. Is there an alternative solution for this situation?
CREATE TABLE IF NOT EXISTS Hivetable (
ID INT,
XYZ array<STRUCT<X:DOUBLE, Y:DOUBLE, Z:DOUBLE>>
)
row format delimited
fields terminated by ','
collection items terminated by ','
map keys terminated by ','
stored as textfile
;
LOAD DATA local INPATH 'Path/datafile.csv' OVERWRITE INTO TABLE Hivetable;
The CSV file input should be:
1,1;2;3#4;5;6#7;8;9
2,4;5;6#7;8;9
Table creation:
CREATE TABLE IF NOT EXISTS Hivetable (
ID INT,
XYZ array<STRUCT<X:DOUBLE, Y:DOUBLE, Z:DOUBLE>>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '#'
MAP KEYS TERMINATED BY ';'
LINES TERMINATED BY '\n'
STORED AS
INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
Output:
select * from Hivetable
1 [{"X":1,"Y":2,"Z":3},{"X":4,"Y":5,"Z":6},{"X":7,"Y":8,"Z":9}]
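The answer above assumes the input file has already been rewritten with distinct delimiters. One way to do that rewrite is sketched below in Python. This is a hedged illustration, not part of Hive: the function name is hypothetical, and it assumes every row holds an ID followed by complete X, Y, Z triplets.

```python
def to_hive_collection_format(line: str, group_size: int = 3) -> str:
    """Rewrite 'id,x1,y1,z1,x2,...' as 'id,x1;y1;z1#x2;y2;z2...'."""
    row_id, *values = [v.strip() for v in line.strip().split(",")]
    if len(values) % group_size != 0:
        raise ValueError(f"row {row_id}: expected a multiple of {group_size} coordinates")
    # ';' separates the struct fields, '#' separates the array items.
    triplets = [
        ";".join(values[i:i + group_size])
        for i in range(0, len(values), group_size)
    ]
    return f"{row_id}," + "#".join(triplets)


for raw in ["1,1,2,3,4,5,6,7,8,9", "2,4,5,6,7,8,9", "3,10,11,12"]:
    print(to_hive_collection_format(raw))
```

The header line of the original file would need to be skipped before converting, since it does not follow the ID-plus-triplets shape.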

SQL: Fix for CSV import mistake

I have a database table with multiple columns populated with various numeric fields. While populating it from a CSV, I must have mucked up assigning the delimited fields. The end result is a column that contains its correct information but also the next column's data, separated by a comma.
So instead of Column UPC1 containing "958634", it contains "958634,95877456". The "95877456" is supposed to be in the UPC2 column, instead UPC2 is NULL.
Is there a way to split on the comma and move the data after it to UPC2, while keeping the UPC1 data before the comma intact?
Thanks.
You can do this with string functions. To query the values and verify the logic, try this:
SELECT
LEFT(UPC1, CHARINDEX(',', UPC1) - 1),
SUBSTRING(UPC1, CHARINDEX(',', UPC1) + 1, 1000)
FROM myTable;
If the result is what you want, turn it into an update:
UPDATE myTable SET
UPC1 = LEFT(UPC1, CHARINDEX(',', UPC1) - 1),
UPC2 = SUBSTRING(UPC1, CHARINDEX(',', UPC1) + 1, 1000);
The expression for UPC1 takes the left side of UPC1 up to one character before the comma.
The expression for UPC2 takes the remainder of the UPC1 string starting one character after the comma.
The third argument to SUBSTRING needs some explaining. It is the number of characters to return, counted from the starting position (which in this case is one character after the comma). If you specify a value longer than the remaining string, SUBSTRING simply returns everything up to the end of the string. Using 1000 here is a lot easier than calculating the exact number of characters needed to reach the end.
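The index arithmetic can be checked outside the database. The Python sketch below mirrors the two expressions with a 1-based comma position, as CHARINDEX returns it. The function name is hypothetical, and the handling of values without a comma is an addition for illustration; in T-SQL a filter such as WHERE UPC1 LIKE '%,%' would serve the same purpose.

```python
from typing import Optional, Tuple


def split_upc(value: str) -> Tuple[str, Optional[str]]:
    """Split 'first,second' the same way the LEFT/SUBSTRING expressions do."""
    comma = value.find(",") + 1  # 1-based position like CHARINDEX; 0 means "not found"
    if comma == 0:
        return value, None  # nothing to split, leave the value alone
    left = value[:comma - 1]          # LEFT(UPC1, CHARINDEX(',', UPC1) - 1)
    rest = value[comma:comma + 1000]  # SUBSTRING(UPC1, CHARINDEX(',', UPC1) + 1, 1000)
    return left, rest


print(split_upc("958634,95877456"))
```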

How do I match a substring of variable length?

I am importing data into my SQL database from an Excel spreadsheet.
The imp table is the imported data, the app table is the existing database table.
app.ReceiptId is formatted as "A" followed by some numbers. Formerly it was 4 digits, but now it may be 4 or 5 digits.
Examples:
A1234
A9876
A10001
imp.ref is a free-text reference field from Excel. It consists of some arbitrary length description, then the ReceiptId, followed by an irrelevant reference number in the format " - BZ-0987654321" (which is sometimes cropped short, or even missing entirely).
Examples:
SHORT DESC A1234 - BZ-0987654321
LONGER DESCRIPTION A9876 - BZ-123
REALLY LONG DESCRIPTION A2345 - B
REALLY REALLY LONG DESCRIPTION A23456
The code below works for a 4-digit ReceiptId, but will not correctly capture a 5-digit one.
UPDATE app
SET
[...]
FROM imp
INNER JOIN app
ON app.ReceiptId = right(right(rtrim(replace(replace(imp.ref,'-',''),'B','')),5)
+ rtrim(left(imp.ref,charindex(' - BZ-',imp.ref))),5)
How can I change the code so it captures either 4 (A1234) or 5 (A12345) digits?
As ughai rightly wrote in his comment, it's not recommended to use anything other than columns in the ON clause of a join.
The reason is that wrapping the columns in functions prevents SQL Server from using any indexes it could otherwise use on those columns.
Therefore, I would suggest adding another column to the imp table that holds the actual ReceiptId and is calculated during the import process itself.
I think the best way of extracting the ReceiptId from the ref column is to use SUBSTRING with PATINDEX:
SELECT ref,
RTRIM(SUBSTRING(ref, PATINDEX('%A[0-9][0-9][0-9][0-9]%', ref), 6)) As ReceiptId
FROM imp
Update
After the conversation with t-clausen-dk in the comments, I came up with this:
SELECT ref,
CASE WHEN PATINDEX('%[ ]A[0-9][0-9][0-9][0-9][0-9 ]%', ref) > 0
OR PATINDEX('A[0-9][0-9][0-9][0-9][0-9 ]%', ref) = 1 THEN
SUBSTRING(ref, PATINDEX('%A[0-9][0-9][0-9][0-9][0-9 ]%', ref), 6)
ELSE
NULL
END As ReceiptId
FROM imp
This returns NULL if there is no match.
A match is a substring consisting of A followed by 4 or 5 digits, separated by spaces from the rest of the string; it can be found at the start, middle or end of the string.
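For comparison, the same matching rule (an A followed by 4 or 5 digits, delimited by whitespace or the string boundaries) can be written as a regular expression if the extraction is done during the import, outside SQL Server. This is a Python sketch only; the function name is hypothetical and the sample values are the ones from the question.

```python
import re
from typing import Optional

# "A" plus 4 or 5 digits, not preceded or followed by a non-space character.
RECEIPT_ID = re.compile(r"(?<!\S)A\d{4,5}(?!\S)")


def extract_receipt_id(ref: str) -> Optional[str]:
    """Return the first receipt id found in a free-text reference, or None."""
    match = RECEIPT_ID.search(ref)
    return match.group(0) if match else None


for ref in [
    "SHORT DESC A1234 - BZ-0987654321",
    "LONGER DESCRIPTION A9876 - BZ-123",
    "REALLY LONG DESCRIPTION A2345 - B",
    "REALLY REALLY LONG DESCRIPTION A23456",
]:
    print(extract_receipt_id(ref))
```

Storing the extracted value in its own column of imp keeps the join condition a plain column comparison.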
Try this. It removes all characters before the A[number][number][number][number] and takes the first 5 or 6 characters from there:
UPDATE app
SET
[...]
FROM imp
INNER JOIN app
ON app.ReceiptId in
(
left(stuff(ref,1, patindex('%A[0-9][0-9][0-9][0-9][ ]%', imp.ref + ' ') - 1, ''), 5),
left(stuff(ref,1, patindex('%A[0-9][0-9][0-9][0-9][0-9][ ]%', imp.ref + ' ') - 1, ''), 6)
)
When comparing with equals, trailing spaces are not evaluated.

Right pad a string with variable number of spaces

I have a customer table that I want to use to populate a parameter box in SSRS 2008. The cust_num is the value and the concatenation of the cust_name and cust_addr will be the label. The required fields from the table are:
cust_num int PK
cust_name char(50) not null
cust_addr char(50)
The SQL is:
select cust_num, cust_name + isnull(cust_addr, '') address
from customers
Which gives me this in the parameter list:
FIRST OUTPUT - ACTUAL
1 cust1 addr1
2 customer2 addr2
Which is what I expected but I want:
SECOND OUTPUT - DESIRED
1 cust1 addr1
2 customer2 addr2
What I have tried:
select cust_num, rtrim(cust_name) + space(60 - len(cust_name)) +
rtrim(cust_addr) + space(60 - len(cust_addr)) customer
from customers
Which gives me the first output.
select cust_num, rtrim(cust_name) + replicate(char(32), 60 - len(cust_name)) +
rtrim(cust_addr) + replicate(char(32), 60 - len(cust_addr)) customer
Which also gives me the first output.
I have also tried replacing space() with char(32) and vice versa
I have tried variations of substring, left, right all to no avail.
I have also used ltrim and rtrim in various spots.
The reason for the 60 is that I have checked the max length in both fields and it is 50 and I want some whitespace between the fields even if the field is maxed. I am not really concerned about truncated data since the city, state, and zip are in different fields so if the end of the street address is chopped off it is ok, I guess.
This is not a show stopper, the SSRS report is currently deployed with the first output but I would like to make it cleaner if I can.
Whammo blammo (for leading spaces):
SELECT
RIGHT(space(60) + cust_name, 60),
RIGHT(space(60) + cust_addr, 60)
FROM customers
OR (for trailing spaces)
SELECT
LEFT(cust_name + space(60), 60),
LEFT(cust_addr + space(60), 60)
FROM customers
The easiest way to right pad a string with spaces (without them being trimmed) is to simply cast the string as CHAR(length). MSSQL will sometimes trim whitespace from VARCHAR (because it is a VARiable-length data type). Since CHAR is a fixed length datatype, SQL Server will never trim the trailing spaces, and will automatically pad strings that are shorter than its length with spaces. Try the following code snippet for example.
SELECT CAST('Test' AS CHAR(20))
This returns 'Test' followed by 16 trailing spaces (20 characters in total).
This is based on Jim's answer,
SELECT
@field_text + SPACE(@pad_length - LEN(@field_text)) AS RightPad
,SPACE(@pad_length - LEN(@field_text)) + @field_text AS LeftPad
Advantages
More Straight Forward
Slightly Cleaner (IMO)
Faster (Maybe?)
Easily Modified to either double pad for displaying in non-fixed width fonts or split padding left and right to center
Disadvantages
Doesn't handle LEN(@field_text) > @pad_length
This builds on KMier's answer and addresses the comment that the method is awkward when the value to be padded is not a column but the result of a (possibly complicated) expression, because the entire expression has to be repeated.
It also allows padding a value to the maximum length of the column's contents.
WITH
cte AS (
SELECT 'foo' AS value_to_be_padded
UNION SELECT 'foobar'
),
cte_max AS (
SELECT MAX(LEN(value_to_be_padded)) AS max_len
FROM cte
)
SELECT
CONCAT(SPACE(max_len - LEN(value_to_be_padded)), value_to_be_padded) AS left_padded,
CONCAT(value_to_be_padded, SPACE(max_len - LEN(value_to_be_padded))) AS right_padded
FROM cte
CROSS JOIN cte_max;
declare @t table(f1 varchar(50),f2 varchar(50),f3 varchar(50))
insert into @t values
('foooo','fooooooo','foo')
,('foo','fooooooo','fooo')
,('foooooooo','fooooooo','foooooo')
select
concat(f1
,space(max(len(f1)) over () - len(f1))
,space(3)
,f2
,space(max(len(f2)) over () - len(f2))
,space(3)
,f3
)
from @t
result
foooo       fooooooo   foo
foo         fooooooo   fooo
foooooooo   fooooooo   foooooo
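The pad-to-the-longest-value idea used by the last two answers is easy to verify outside the database. Below is a minimal Python sketch that reuses the sample rows above; the function name and the gap parameter are illustrative only, and the last column is deliberately left unpadded, as in the query.

```python
rows = [
    ("foooo", "fooooooo", "foo"),
    ("foo", "fooooooo", "fooo"),
    ("foooooooo", "fooooooo", "foooooo"),
]


def pad_columns(rows, gap=3):
    """Right-pad every column except the last to its longest value, then join with a gap."""
    widths = [max(len(value) for value in column) for column in zip(*rows)]
    lines = []
    for row in rows:
        # zip stops after the shorter sequence, so the last width is simply unused.
        padded = [value.ljust(width) for value, width in zip(row[:-1], widths)]
        lines.append((" " * gap).join(padded + [row[-1]]))
    return lines


for line in pad_columns(rows):
    print(line)
```

The alignment only shows when the result is rendered in a fixed-width font, which is the same caveat that applies to the SQL versions.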
