I'm reading data from a PostgreSQL database (I switched my engine over from MySQL), and there is something weird about the string data returned:
if the field length in the db is 50 chars, the returned value is 50 chars long, so even if the value was inserted with fewer characters, the returned value's size is still 50.
In MySQL the returned value was the same size as what was inserted into the db.
Example:
create table "values" (data char(50));
insert into "values" values ('info');
When I pick up this data with PQgetvalue:
res = PQexec(conn, "select * from \"values\"");
printf("the value has a length of %zu chars", strlen(PQgetvalue(res, 0, 0)));
it will display
the value has a length of 50 chars
I'm kind of surprised by this, because now I need to store or calculate the real size of the value somehow, not the max field size.
Am I doing something wrong?
(sorry for the typos)
The field is declared as char, not varchar. A char field's value always has the declared length: the value is padded on the right with spaces up to that length. From your description, I'd consider changing it to varchar.
See http://www.postgresql.org/docs/8.2/static/datatype-character.html
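If you stay with char(50), you can also strip the blank padding in the query itself. A minimal sketch against the example table above (either change on its own is enough):

-- change the column so values keep their inserted length
alter table "values" alter column data type varchar(50);
-- or trim the padding when selecting from the char column
select rtrim(data) from "values";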
After creating the super table and the sub-tables, I call taos_load_table_info to load the table information. Then I initialize the statement by calling taos_stmt_init and taos_stmt_set_tbname to set the table name.
I create the TAOS_BIND object with the following attributes:
buffer_type = TSDB_DATA_TYPE_NCHAR
buffer_length = sizeof(str)
buffer = &str
length = sizeof(str)
Then I call taos_stmt_bind_param and taos_stmt_add_batch, and finally execute with taos_stmt_execute.
The problem is that the insertion fails: when I check in the shell with select *, the column only shows up empty.
I strongly recommend that you first try to insert a simple nchar value directly, to check whether the problem is in the taos_stmt API. If that insertion succeeds, then also check whether the inserted nchar string has the same length as your str variable. buffer_length should be greater than or equal to length, but length must be the actual size of the data: if the actual size of your nchar data is less than the length value in TAOS_BIND, tdengine will still parse the bound value including the extra empty bytes, and the insert will fail.
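For example, a quick check from the taos shell could look like this (the super table, table, and column names below are made-up placeholders):

create stable if not exists st (ts timestamp, val nchar(32)) tags (grp int);
create table if not exists t1 using st tags (1);
insert into t1 values (now, 'hello');
select * from t1;

If the plain SQL insert shows 'hello' rather than an empty column, the database side is fine and the problem is most likely the length value in your TAOS_BIND.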
I've deleted a row of data that was inserted recently.
Rather than restore and roll forward a second copy of this huge DB to retrieve the inserted data, I'm trying to use the fn_dblog() "undocumented" system function to retrieve it.
Using a description of the contents of the [Log Content 0] column that fn_dblog() returns (found here: https://sqlfascination.com/2010/02/03/how-do-you-decode-a-simple-entry-in-the-transaction-log-part-1/),
I am successfully retrieving my inserted (and later deleted) data from the log file. In the section of this binary data reserved for fixed-width column data, I found that the SQL DateTime column values take 8 bytes. I'm processing the binary data in a .NET program, using BitConverter.ToInt64 or BitConverter.ToInt32 as appropriate for the Int or BigInt values.
I've managed to retrieve all the inserted column values I need except for the datetime columns...
I'm unclear how to interpret the 8 bytes of a SQL DateTime column as a C# DateTime object. If it helps, below is an example hex and Int64 version of the datetime 8 bytes retrieved from the transaction log data for a particular datetime.
DateTime (around 7/31/2020) in binary: 0xF030660009AC0000 (Endian reversed: 0x0000AC09006630F0)
as an Int64: 189154661380804
Any suggestions? This is SQL Server's internal representation of a date, and I'm not sure where to find documentation on it...
I finally did discover the answer: a SQL DateTime stored as VARBINARY (similar to the bytes I'm reading from the transaction log) contains two integers. The first is the date part: the number of days since 1/1/1900 (negative for earlier dates).
The second integer is the time part: the number of 1/300-second ticks since midnight, i.e. the number of milliseconds since midnight divided by 3.33333333.
Because the bytes are stored as a long and in reverse, the first 4 bytes of the 8 bytes in the buffer are the time part, and the second 4 are the date part.
So here is a code snippet I used to get the date. I'm running through the fixed-length fields one at a time, keeping track of the current offset in the byte array.
The variable ba is the byte array of the bytes in the [Log Content 0] column.
int TimeInt;
int DateInt;
DateTime tmpDt;
//initialize the starting point for datetime - 1/1/1900
tmpDt = new DateTime(1900, 1, 1);
// get the time portion of the SQL DateTime
TimeInt = BitConverter.ToInt32(ba, currOffset);
currOffset += 4;
// get the date portion of the SQL DateTime
DateInt = BitConverter.ToInt32(ba, currOffset);
currOffset += 4;
// Add the number of days since 1/1/1900
tmpDt = tmpDt.AddDays(DateInt);
// Add the time portion: TimeInt is in 1/300-second ticks, so convert to milliseconds
tmpDt = tmpDt.AddMilliseconds(TimeInt * 3.3333333);
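As a sanity check on that layout, you can cast a datetime to varbinary directly in SQL Server (the timestamp below is just an example value near the date in the question). In the varbinary form the date integer comes first, followed by the 1/300-second tick count, matching the byte-reversed value shown above:

select cast(cast('2020-07-31T14:30:00' as datetime) as varbinary(8));
-- first 4 bytes: 0x0000AC09 = 44041 days after 1/1/1900, i.e. 7/31/2020
-- last 4 bytes: ticks since midnight, 1/300 of a second each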
Does it occupy a fixed N*2 bytes, or may it use less storage if the actual value stored is smaller than N*2 bytes?
I have a huge table with many nvarchar fields of fixed declared sizes. Some are nvarchar(100) and some are nvarchar(400), etc.
The data in a column is never exactly the declared size; it varies from 0 to N characters, and most of it is shorter than N/2.
For example, a field called RecipientName is of type nvarchar(400) and there are 9026424 rows.
At the full declared size, RecipientName alone would take 800 * 9026424 bytes = 6.72 GB,
but the actual storage size of the entire table is only 2.02 GB. Is some compression applied, or is some size smaller than N (a power of 2, perhaps) chosen?
NCHAR data type:
It is a fixed-length data type.
It occupies 2 bytes of space for EACH CHARACTER.
It is used to store Unicode characters (e.g. other languages like Spanish, French, Arabic, German, etc.)
For Example:
Declare @Name NChar(20);
Set @Name = N'Sachin'
Select @Name As Name, DATALENGTH(@Name) As [Datalength In Bytes], LEN(@Name) As [Length];
Name Datalength Length
Sachin 40 6
Even though the declared size is 20, the Datalength column shows 40 bytes because NCHAR reserves 2 bytes for each of the 20 declared characters.
These 40 bytes are used irrespective of the actual length of the data stored.
NVARCHAR data type:
It is a variable-length data type.
It occupies 2 bytes of space for EACH CHARACTER.
It is used to store Unicode characters (e.g. other languages like Spanish, French, Arabic, German, etc.)
For Example:
Declare @Name NVarchar(20);
Set @Name = N'Sachin'
Select @Name As Name, DATALENGTH(@Name) As [Datalength], LEN(@Name) As [Length];
Name Datalength Length
Sachin 12 6
Even though the declared size is 20, the Datalength column shows 12 bytes because NVARCHAR uses 2 bytes for each character actually stored (6 characters here).
These 12 bytes depend only on the actual data, irrespective of the length in the declaration.
Hope this is helpful :)
Yes, it may use less storage if the actual value to be stored is smaller than N*2 bytes.
n just specifies the maximum number of characters that can be stored in the field; the number of characters actually stored is however many you pass in.
And here is the documentation: nchar and nvarchar (Transact-SQL)
For non-MAX, non-XML string types, the length that they are declared as (i.e. the value within the parenthesis) is the maximum number of smallest (in terms of bytes) characters that will be allowed. But, the actual limit isn't calculated in terms of characters but in terms of bytes. CHAR and VARCHAR characters can be 1 or 2 bytes, so the smallest is 1 and hence a [VAR]CHAR(100) has a limit of 100 bytes. That 100 bytes can be filled up by 100 single-byte characters, or 50 double-byte characters, or any combination that does not exceed 100 bytes. NCHAR and NVARCHAR (stored as UTF-16 Little Endian) characters can be either 2 or 4 bytes, so the smallest is 2 and hence a N[VAR]CHAR(100) has a limit of 200 bytes. That 200 bytes can be filled up by 100 two-byte characters or 50 four-byte characters, or any combination that does not exceed 200 bytes.
If you enable ROW or DATA Compression (this is a per-Index setting), then the actual space used will usually be less. NCHAR and NVARCHAR use the Unicode Compression Algorithm which is somewhat complex so not easy to calculate what it would be. And I believe that the MAX types don't allow for compression.
Outside of those technicalities, the difference between the VAR and non-VAR types is simply that the VAR types take up only the space of each individual value inserted or updated, while the non-VAR types are blank-padded and always take up the declared amount of space (which is why one almost always uses the VAR types). The MAX types are only variable (i.e. there is no CHAR(MAX) or NCHAR(MAX)).
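A quick illustration of both points (the variable names and values are just examples): the declared n sets a byte ceiling, but only the actual data is counted:

declare @s nvarchar(400) = N'Sample';
select datalength(@s);                      -- 12 bytes: 6 characters * 2, not 800
declare @full nvarchar(100) = replicate(N'a', 100);
select datalength(@full);                   -- 200 bytes: the byte limit of nvarchar(100)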
Can someone please explain the behavior below?
KAP.ADMIN(ADMIN)=> create table char1 ( a char(64000),b char(1516));
CREATE TABLE
KAP.ADMIN(ADMIN)=> create table char2 ( a char(64000),b char(1517));
ERROR: 65536 : Record size limit exceeded
KAP.ADMIN(ADMIN)=> insert into char1 select * from char1;
ERROR: 65540 : Record size limit exceeded
Why does insert throw this error when create table did not throw any error for the same table, as shown above?
KAP.ADMIN(ADMIN)=> \d char1
Table "CHAR1"
Attribute | Type | Modifier | Default Value
-----------+------------------+----------+---------------
A | CHARACTER(64000) | |
B | CHARACTER(1516) | |
Distributed on hash: "A"
./nz_ddl_table KAP char1
Creating table: "CHAR1"
CREATE TABLE CHAR1
(
A character(64000),
B character(1516)
)
DISTRIBUTE ON (A)
;
/*
Number of columns 2
(Variable) Data Size 4 - 65520
Row Overhead 28
====================== =============
Total Row Size (bytes) 32 - 65548
*/
I would like to know how the row size is calculated in the above case.
I checked the Netezza database user guide, but I'm not able to understand the calculation for the above example.
I think this link does a good job of explaining the overhead of Netezza / PDA data types:
For every row of every table, there is a 24-byte fixed overhead of the rowid, createxid, and deletexid. If you have any nullable columns, a null vector is required and it is N/8 bytes where N is the number of columns in the record. The system rounds up the size of this header to a multiple of 4 bytes.
In addition, the system adds a record header of 4 bytes if any of the following is true:
Column of type VARCHAR
Column of type CHAR where the length is greater than 16 (stored internally as VARCHAR)
Column of type NCHAR
Column of type NVARCHAR
Using UTF-8 encoding, each Unicode code point can require 1 - 4 bytes of storage. A 10-character string requires 10 bytes of storage if it is ASCII and up to 20 bytes if it is Latin, or as many as 40 bytes if it is Kanji.
The only time a record does not contain a header is if all the columns are defined as NOT NULL, there are no character data types larger than 16 bytes, and no variable character data types.
https://www.ibm.com/support/knowledgecenter/SSULQD_7.2.1/com.ibm.nz.dbu.doc/c_dbuser_data_types_calculate_row_size.html
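Applying that to the char1 table above (a rough reading of the numbers only): the nz_ddl_table output already shows 28 bytes of row overhead plus up to 65520 bytes of data, i.e. a maximum total row size of 28 + 65520 = 65548 bytes. That is over the 65,535-byte row limit, which would explain why the insert can be rejected even though the CREATE TABLE itself succeeded.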
First create a temp table based on one row of data.
create temp table tmptable as
select *
from Table
limit 1
Then check the used bytes of the temp table. That should be the size per row.
select used_bytes
from _v_sys_object_storage_size a
inner join _v_table b
  on a.tblid = b.objid
 and b.tablename = 'tmptable'
Netezza has some limitations:
1) Maximum number of characters in a char/varchar field: 64,000
2) Maximum row size: 65,535 bytes
A record length beyond roughly 65 KB is simply not possible in NZ.
Though an NZ box offers huge space, it is a much better idea to size columns based on accurate forecasting rather than picking sizes at random. For your requirement, do all the attributes really need char(64000), or can they be compacted based on analysis of the real data? If further compacting can be done, revisit the attribute lengths.
Also, with requirements like this, never go with insert into char1 select * ....... statements, because that lets the system choose its preferred data types, which can come out at the larger end of sizing and might not be necessary.
If varchar(max) is used as the data type and the inserted data is less than the full allocation, e.g. only 200 chars, will SQL Server always take the full space of varchar(max) or just the 200 chars' space?
Further, what are the other data types that will take the max space even if less data is inserted?
Are there any documents that specify this?
From MS DOCS on char and varchar (Transact-SQL):
char [ ( n ) ]
Fixed-length, non-Unicode string data. n defines the string length and must be a value from 1 through 8,000. The storage size is n bytes. The ISO synonym for char is character.
varchar [ ( n | max ) ]
Variable-length, non-Unicode string data. n defines the string length and can be a value from 1 through 8,000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). The storage size is the actual length of the data entered + 2 bytes. The ISO synonyms for varchar are char varying or character varying.
So for varchar, including varchar(max), the storage depends on the actual length of the data, while char always takes the fixed declared size even when the entire space is not used.
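A quick way to confirm this for the 200-character case in the question (hypothetical variable name):

declare @v varchar(max) = replicate('x', 200);
select datalength(@v);   -- 200: only the actual data is counted, plus 2 bytes of overhead when stored in a table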
Use CHAR only for strings
whose length you know to be fixed. For example, if you define a domain
whose values are restricted to 'T' and 'F', you should probably make
that CHAR[1]. If you're storing US social security numbers, make the
domain CHAR[9] (or CHAR[11] if you want punctuation).
Use VARCHAR for strings that can vary in length, like names, short
descriptions, etc. Use VARCHAR when you don't want to worry about
stripping trailing blanks. Use VARCHAR unless there's a good reason
not to.
The size of a varchar depends on the length of the data, so in your case it will just take the 200 chars' space.