varchar field length reported incorrectly - sql-server

I have a varchar(800) column that is a primary key in one table and a FK to another.
The problem is that if I do len(field), it says 186. If I copy/paste the text and check it in Notepad or something, I have 198 characters.
The content is this :
http://news.google.com/news/url?sa=t&fd=R&usg=AFQjCNGGTo8JmCWDydNA19MrL4aON-02pA&url=http://creativity-online.com/news/chrysler-nokia-target-among-winners-of-teds-first-ad-contest/149189
Any ideas on why the length difference?
EDIT
You are right. I was using a web based sql manager and that tricked me.
Thank you.

Are you HTML encoding the URL after you have read it from the database?
moriartyn suggested that the SQL Server len function would count &amp; as a single character, but that is not the case. However, if the actual content in the field is not HTML encoded, and it is HTML encoded when inserted in the page, that would change each & character into &amp;, which would account for the extra length.

My guess is that because there are three &amp; in your text, the SQL Server len function is counting each of those as just & (one character), while Notepad counts them as five characters each. That would give you twelve extra characters in that count.

186:
http://news.google.com/news/url?sa=t&fd=R&usg=AFQjCNGGTo8JmCWDydNA19MrL4aON-02pA&url=http://creativity-online.com/news/chrysler-nokia-target-among-winners-of-teds-first-ad-contest/149189
198:
http://news.google.com/news/url?sa=t&amp;fd=R&amp;usg=AFQjCNGGTo8JmCWDydNA19MrL4aON-02pA&amp;url=http://creativity-online.com/news/chrysler-nokia-target-among-winners-of-teds-first-ad-contest/149189
Note the & and &amp;: there are 3 of them, and &amp; is 4 characters longer, so 3 * 4 = 12.
You are neither comparing equally nor comparing the same strings.
In SQL:
SELECT
LEN('http://news.google.com/news/url?sa=t&fd=R&usg=AFQjCNGGTo8JmCWDydNA19MrL4aON-02pA&url=http://creativity-online.com/news/chrysler-nokia-target-among-winners-of-teds-first-ad-contest/149189'),
LEN('http://news.google.com/news/url?sa=t&amp;fd=R&amp;usg=AFQjCNGGTo8JmCWDydNA19MrL4aON-02pA&amp;url=http://creativity-online.com/news/chrysler-nokia-target-among-winners-of-teds-first-ad-contest/149189')
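The 12-character gap can also be reproduced outside the database. This is a minimal Python sketch, assuming the web-based tool HTML-escaped the value before displaying it; `html.escape` only stands in for whatever that tool actually does.

```python
import html

# The value as stored in the column (plain ampersands).
url = ("http://news.google.com/news/url?sa=t&fd=R"
       "&usg=AFQjCNGGTo8JmCWDydNA19MrL4aON-02pA"
       "&url=http://creativity-online.com/news/"
       "chrysler-nokia-target-among-winners-of-teds-first-ad-contest/149189")

# What a page shows in its source after HTML encoding: each & becomes &amp;
escaped = html.escape(url)

raw_length = len(url)
escaped_length = len(escaped)
ampersands = url.count("&")

# Each escaped ampersand adds 4 characters ("amp;").
print(raw_length, escaped_length, ampersands)
```

Comparing the two lengths shows that the difference is exactly four characters per ampersand.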

Related

Oracle data is returned with spaces between characters

I am trying to retrieve data (select * ...) from a SQL Server database into an Oracle database using dblinks. In my SQL Server database, I have columns AddressLine1 and AddressLine2 of type nvarchar.
I am running the below script in SQL Developer (v 4.1.3.20). The results appear having spaces between characters. I used Benthic and SQL Plus and the results are same, spaces between characters.
SELECT
c.CandidateID,
pa."AddressLine1", pa."AddressLine2"
FROM
CANDIDATES c --Oracle table
INNER JOIN
PostalAddress@HIM pa ON pa."EntityID" = c.CandidateID -- SQL Server table
--@HIM --dblink name
This screenshot shows the results (when copying blank spaces are copied):
I also tried to cast the results to varchar and the results are same. I tried to trim the spaces and also tried to replace the whitespaces with NULL but the results remain the same.
Any suggestions would be greatly appreciated. Thank you.
Your problem does in fact appear to have something to do with the encoding. Specifically, your text seems to be getting decoded using a character set where the width is two bytes, yet your ASCII data is only taking up one byte.
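The effect can be illustrated in a few lines of Python. This is only a sketch of the suspected mechanism: it assumes the nvarchar data arrives as UTF-16 bytes and that the client reads each byte as a separate single-byte character. The sample address is taken from the query further down; nothing here reflects how the dblink gateway is actually configured.

```python
text = "6211 Wrightsville Ave"

# nvarchar data is UTF-16: two bytes per character for this ASCII text.
raw = text.encode("utf-16-le")

# Assumption: the client treats every byte as its own character.
misread = raw.decode("latin-1")

# Each real character is now followed by a NUL, which many tools draw as a blank.
displayed = misread.replace("\x00", " ")

# Decoding with the matching encoding restores the original text.
restored = raw.decode("utf-16-le")

print(repr(misread[:8]))
print(displayed)
print(restored)
```

If this is what is happening, trimming or replacing ordinary spaces would have no effect, because the extra characters are not spaces at all.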
As a temporary fix, consider the following query:
SELECT REGEXP_REPLACE('6 2 1 1  W r i g h t s v i l l e  A v e', ' ([^ ])', '\1')
FROM dual;
This outputs 6211 Wrightsville Ave, which is what you want. Note that I assume that every character has an extra ghost space, the result of which is that words which were originally separated by one space would now be separated by two spaces.
This isn't the best solution for so many reasons. From a regex point of view, a much tighter answer could be given using lookarounds, but REGEXP_REPLACE does not appear to support them.
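For comparison, here is the same replacement in Python, whose `re` module does support lookarounds. It is a sketch under the same assumption as the query above: words in the garbled input are separated by two spaces, and every other space is a ghost.

```python
import re

garbled = "6 2 1 1  W r i g h t s v i l l e  A v e"

# Same idea as the REGEXP_REPLACE call: drop a space that is followed by a
# non-space character, keeping that character via the capture group.
with_group = re.sub(r" ([^ ])", r"\1", garbled)

# Lookahead variant: match only the space itself, so no group is needed.
with_lookahead = re.sub(r" (?=[^ ])", "", garbled)

print(with_group)
print(with_lookahead)
```

Both forms remove the same spaces; the lookahead version just avoids consuming and re-emitting the following character.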

trim characters from string of one field and fill a new field for each record in SQL table

I added a new field "StrikePrice" to my existing table "OS1115".
I need to trim some characters from an existing field "symbol" and enter that into the new field.
I know I can do this:
UPDATE OS1115 SET StrikePrice = RIGHT(symbol, 4)
But the problem is that the number of characters I need varies, although they are always preceded by either a P or a C.
Here are a couple examples:
QQQ_112015C112.5
PCLN_112015P1287.5
NFLX_112015P107
I need to trim the numbers at the end of the string that come after the P or C and enter that in the new field.
so in this case that would result in:
112.5
1287.5
107
How can I do this?
I use MS SQL Express 2012
This would work:
RIGHT(symbol, PATINDEX('%[cp]%', REVERSE(symbol))-1)
though I'm sure there are a variety of ways to do it.
You got one answer while I was testing mine. Here's another way:
substring(symbol,charindex('C',symbol,6)+charindex('P',symbol,6)+1,99)
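The extraction rule itself (take whatever follows the last P or C) can be checked quickly outside the database. This Python sketch uses a hypothetical helper named `strike_price` and assumes the trailing part contains only digits and a decimal point, as in the three sample symbols.

```python
import re

def strike_price(symbol):
    """Return the digits (and decimal point) that follow the last P or C."""
    match = re.search(r"[PC]([0-9.]+)$", symbol)
    return match.group(1) if match else None

symbols = ["QQQ_112015C112.5", "PCLN_112015P1287.5", "NFLX_112015P107"]
prices = [strike_price(s) for s in symbols]
print(prices)
```

Note that PCLN starts with both a P and a C, which is why the pattern is anchored to the end of the string.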

Sql Server - Encoding issue, replace strange characters

After importing some data into a Sql 2014 database, I realized that there are some fields in which the data replaced German characters such as (ü, ß, ä, ö, etc) with some weird characters. Ex.
MÃ¼nchen should be München
ChiemgaustraÃe should be Chiemgaustraße
KÃ¶nigstr should be Königstr
I would like to replace these characters with the right German letter. Ex.
Ã¼ -> ü
Ã -> ß
Ã¶ -> ö
However, when I run queries like the following to try to identify which rows have these characters, the queries return 0 rows.
select address
from Directory
where street like N'%ChiemgaustraÃe 50%'
select address
from Directory
where street like N'%Ã¼%'
Is there a query I can run to identify and replace these characters?
I must clarify that most of the data was imported correctly, in fact I believe the strange characters were already part of the original data.
Also, I think I can export the data to a text file, replace the characters and re-import, but I was wondering if there is a way to do it directly in sql.
Thanks in advance for the help.
I couldn't get it fixed using only SQL.
FutbolFa's suggestion worked for the most part, but there were a couple of symbols, in particular "Ã", that weren't picked up by any query I tried. I ended up exporting the data to a text file and replacing the symbols there. Then I just re-imported the data.
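For the export-and-replace route, the substitution does not have to be done symbol by symbol. The sketch below assumes the damage is UTF-8 text that was decoded as Latin-1 at some point (which is what pairs like "Ã¼" usually indicate); `repair` is a hypothetical helper name. Under that assumption, the "ß" case leaves an invisible control character (U+009F) after the "Ã", which may be why searching for "Ã" plus a visible character found nothing.

```python
def repair(text):
    """Undo UTF-8 bytes that were decoded as Latin-1; leave other text alone."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of damage, so return the value unchanged.
        return text

damaged = [
    "M\u00c3\u00bcnchen",          # displayed as "MÃ¼nchen"
    "Chiemgaustra\u00c3\u009fe",   # "Ã" followed by an invisible control character
    "K\u00c3\u00b6nigstr",         # displayed as "KÃ¶nigstr"
]
repaired = [repair(value) for value in damaged]
print(repaired)
```

Values that are already correct, such as "München", fail the round trip and are returned unchanged.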

Format fields during bulk insert SQL 2008

I am currently working on a project that requires data from a report generated by third party software to be inserted into a local SQL database. So far I have the data stored as a tab delimited .txt file and the following bulk insert SQL statement:
BULK INSERT ExampleTable
FROM 'c:\temp\Example.txt'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = '\t',
ROWTERMINATOR = '\n'
)
GO
The two problems I am encountering are quotation marks around any value that includes its own comma, and dollar signs in every field that holds a dollar amount.
For instance one of the columns of the table is a description field and some of the values come out looking like:
"this is an example description, some more information, I don't know why the author would use commas in the first place here"
I don't care about the description field nearly as much as other fields that include dollar amounts. Each of these fields is already prefixed with a $ sign, so I have to set them as an nvarchar instead of a decimal or a float, which would be A LOT more useful for reporting. Furthermore, when the dollar amount is greater than 1000, the field will also contain a comma, and thus quotation marks, e.g. "$1,084.59"
I am familiar with SSMS, but I have never made a format or bcp file (the solutions I have found online).
Any help would be greatly appreciated.
You can use a format file, but only if your metadata remains constant, which it does not appear to be in your case. You state that the dollar amounts are enclosed in quotes only when they exceed 999 and the comma is inserted. A format file would allow you to define per column delimiters such as [,] or [","]. But if that delimiter is shifting throughout your file, you will have to pre-process the file. Text qualifiers themselves are not supported.
For reference:
CSV import in SQL Server 2008
http://jessesql.blogspot.com/2010/05/bulk-insert-csv-with-text-qualifiers.html
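One way to do that pre-processing is a short script that reads the tab-delimited report, lets a CSV parser deal with the optional quotes, and strips the currency formatting before the file goes to BULK INSERT. This is a sketch only: the column layout, the sample rows, and the helper names `parse_money` and `clean_rows` are made up for illustration.

```python
import csv
import io
from decimal import Decimal

def parse_money(text):
    """Turn a value such as '$1,084.59' into Decimal('1084.59')."""
    return Decimal(text.replace("$", "").replace(",", ""))

def clean_rows(lines, money_columns):
    """Yield the header unchanged, then rows with money columns normalised."""
    reader = csv.reader(lines, delimiter="\t", quotechar='"')
    yield next(reader)
    for row in reader:
        for index in money_columns:
            row[index] = str(parse_money(row[index]))
        yield row

# Hypothetical two-column sample standing in for the exported report.
sample = io.StringIO(
    'Description\tAmount\n'
    '"an example, with commas"\t"$1,084.59"\n'
    'plain text\t$12.00\n'
)
cleaned = list(clean_rows(sample, money_columns=[1]))
print(cleaned)
```

Writing `cleaned` back out with `csv.writer` and the same tab delimiter would give a file whose amount column can be loaded into a decimal column.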
I don't see why, but ThiefMaster deleted my answer :-(
It was probably a mistake and he did not check the link. Since this link is the full answer to your question, I will try one last time here...
Tip: if your CSV file doesn't have a consistent format, for example if IN THE SAME COLUMN some of the values are double-quoted and some are not, then this blog will help you handle it in an easy way (using OPENROWSET in the last step makes it one simple query): http://ariely.info/Blog/tabid/83/EntryId/122/Using-Bulk-Insert-to-import-inconsistent-data-format-using-pure-T-SQL.aspx
There is a new wiki at http://social.technet.microsoft.com/wiki based on this blog, if you prefer to read it on a Microsoft site.

Migrating from sql server to Oracle varchar length issues

I'm facing a strange issue trying to move from SQL Server to Oracle.
In one of my tables I have a column defined as NVARCHAR(255).
After reading a bit, I understood that SQL Server counts characters whereas Oracle counts bytes.
So I defined my column in Oracle as VARCHAR(510), since 255 * 2 = 510.
But when using sqlldr to load the data from a tab-delimited text file, I get an error indicating that some entries exceeded the length of this column.
After checking in SQL Server using:
SELECT MAX(DATALENGTH(column))
FROM table
I get that the max data length is 510.
I do use the Hebrew_CI_AS collation, even though I don't think it changes anything.
I also checked in SQL Server whether any of the entries contains a TAB, but none do, so I guess it's not corrupted data.
Does anyone have an idea?
EDIT
After further checkup I've noticed that the issue is due to the data file (in addition to the issue solved by @Justin Cave's post).
I have changed the row delimiter to '^', since none of my data contains this character, and the column delimiter to '|^|'.
I created a control file as follows:
load data
infile data.txt "str '^'"
badfile "data_BAD.txt"
discardfile "data_DSC.txt"
into table table
FIELDS TERMINATED BY '|^|' TRAILING NULLCOLS
(
col1,
col2,
col3,
col4,
col5,
col6
)
The problem is that my data contains <CR>, and sqlldr, expecting a stream file, therefore fails on the <CR>. I do not want to change the data since it is textual data (error messages, for example).
What is your database character set?
SELECT parameter, value
FROM v$nls_parameters
WHERE parameter LIKE '%CHARACTERSET'
Assuming that your database character set is AL32UTF8, each character could require up to 4 bytes of storage (though almost every useful character can be represented with at most 3 bytes of storage). So you could declare your column as VARCHAR2(1020) to ensure that you have enough space.
You could also simply use character length semantics. If you declare your column VARCHAR2(255 CHAR), you'll allocate space for 255 characters regardless of the amount of space that requires. If you change the NLS_LENGTH_SEMANTICS initialization parameter from the default BYTE to CHAR, you'll change the default so that VARCHAR2(255) is interpreted as VARCHAR2(255 CHAR) rather than VARCHAR2(255 BYTE). Note that the 4000-byte limit on a VARCHAR2 remains even if you are using character length semantics.
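The reason doubling the character count is not always enough can be seen by comparing encodings directly. The sketch below assumes the SQL Server side stores NVARCHAR as UTF-16 and the Oracle database character set is AL32UTF8 (UTF-8); the sample characters are chosen for illustration and are not taken from the question's data.

```python
samples = {
    "Hebrew letter": "\u05e9",   # shin
    "Euro sign": "\u20ac",
    "Emoji": "\U0001F600",
}

def byte_lengths(text):
    """Return (UTF-16 bytes, UTF-8 bytes) for the given text."""
    return len(text.encode("utf-16-le")), len(text.encode("utf-8"))

sizes = {name: byte_lengths(char) for name, char in samples.items()}
for name, (utf16_bytes, utf8_bytes) in sizes.items():
    print(f"{name}: UTF-16 = {utf16_bytes} bytes, UTF-8 = {utf8_bytes} bytes")
```

Hebrew letters take two bytes in both encodings, but a character such as the euro sign takes two bytes in UTF-16 and three in UTF-8, so a 255-character value made of such characters would need 765 bytes even though DATALENGTH reports 510 on the SQL Server side.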
If your data contains line breaks, do you need the TRAILING NULLCOLS parameter? That implies that sometimes columns may be omitted from the end of a logical row. If you combine columns that may be omitted with columns that contain line breaks and data that is not enclosed by at least an optional enclosure character, it's not obvious to me how you would begin to identify where a logical row ended and where it began. If you don't actually need the TRAILING NULLCOLS parameter, you should be able to use the CONTINUEIF parameter to combine multiple physical rows into a single logical row. If you can change the data file format, I'd strongly suggest adding an optional enclosure character.
The number of bytes used by an NVARCHAR field is equal to two times the number of characters plus two (see http://msdn.microsoft.com/en-us/library/ms186939.aspx), so if you make your VARCHAR field 512 you may be OK. There's also some indication that some character sets use 4 bytes per character, but I've found no indication that Hebrew is one of these character sets.

Resources