Question about the description of "bit string type" in the openGauss official website document - database

I noticed that the openGauss official website documentation describes the bit string type as follows: "A bit string is a string of 1s and 0s." This type is not listed under either "character type" or "binary type"; it is an independent type. Since the description mentions both "0, 1" and "string", I am confused about this type and have the following three questions:
Does this type store binary data or character data?
If it stores binary data, then, according to an earlier answer on this forum (the bit string type has no upper storage limit), is the only difference between the bit string type and the binary type that the bit string type has no upper limit on storage space while the binary type does?
Can it be used to store larger (e.g. >2GB) raw binary data?

Bit string type: it is a string of 0s and 1s, but the database stores it internally at the bit level to save space. Without worrying too much about the underlying implementation, you can think of it as a special string that may only consist of the characters 0 and 1. It is convenient for storing things like bit masks.
Binary type: dedicated to storing binary data. Taking bytea as an example, whatever ASCII characters you enter in the SQL statement, the corresponding ASCII bytes are stored, and a query displays the hexadecimal code of those bytes. For example, if you insert 'a', the result of a select will be \x61. Other binary types behave similarly.
Taking the input character '0' as an example: the bit string type stores the bit 0, while bytea stores the ASCII code of the character '0'. When queried, the bit string type outputs the character '0', and bytea outputs \x30.
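A minimal sketch (hypothetical table names, using the PostgreSQL-style syntax that openGauss inherits) illustrating the difference:
CREATE TABLE t_bits  (v bit varying);  -- bit string type
CREATE TABLE t_bytes (v bytea);        -- binary type
INSERT INTO t_bits  VALUES (B'0');     -- stores the single bit 0
INSERT INTO t_bytes VALUES ('0');      -- stores the ASCII byte of the character '0'
SELECT v FROM t_bits;   -- displays the character '0'
SELECT v FROM t_bytes;  -- displays \x30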

Related

ADF Data Flow: how to convert a hex string to an integer

I am trying to convert a 32-character hex string generated using md5 to an integer so I can take mod 10000. Can someone help me with this?
The steps I followed are as below:
Using derived column in Azure data factory Dataflow and inside expression builder, I am adding the below text:
md5(id) where id can be any constant for example.
The above command produces a 32-character hex string. Now I would like to convert it to an integer and then take mod (%) 100000.

Storing hexadecimal values

I'd like to store this value efficiently in MSSQL 2016:
6d017ed2a48846f0ac025dd8603902c7
i.e., fixed-length, with digits ranging from 0 to f (hexadecimal), right?
Char(32) seems too expensive.
Any kind of help would be appreciated. Thank you!
In almost all cases you shouldn't store this as a string at all. SQL Server has binary and varbinary types.
This string represents a 16-byte binary value. If the expected size is fixed, it can be stored as a binary(16). If the size changes, it can be stored as a varbinary(N) where N is the maximum expected size.
Don't use varbinary(max), that's meant to store BLOBs and has special storage and indexing characteristics.
Storing the string itself would make sense in a few cases, e.g. if it's a hash string used in an API, or if it's meant to be shown to humans. In those situations the data will always arrive as a string and will always have to be converted back to a string to be used, so the constant conversions will probably cost more than the storage savings.
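A minimal sketch (hypothetical table and column names) of storing the value as binary(16) in SQL Server, using CONVERT style 2 for hex strings without a 0x prefix:
CREATE TABLE hashes (
    id       INT IDENTITY PRIMARY KEY,
    md5_hash BINARY(16) NOT NULL  -- 16 bytes instead of 32 characters
);
-- Convert the 32-character hex string to 16 bytes on insert:
INSERT INTO hashes (md5_hash)
VALUES (CONVERT(BINARY(16), '6d017ed2a48846f0ac025dd8603902c7', 2));
-- Convert back to a hex string for display:
SELECT CONVERT(CHAR(32), md5_hash, 2) AS md5_hex FROM hashes;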

SQL Server: Hex values padded with zeros and bytes out of order

I'm working on a database that has a VARBINARY(255) column that doesn't make sense to me. Depending on the length of the value, the value is either numbers or words.
Whenever a number is stored, it is a 4-byte hex string of the form 0x00000000, but it reads left to right while the bytes run right to left. So a number such as 255 is stored as 0xFF000000, and a number such as 745 as 0xE9020000. This is the part I do not understand: why is it stored that way instead of 0x02E9, 0x2E9, or 0x000002E9?
When it comes to words, each character is stored as a 4-byte hex string just like above. Something like a space is stored as 0x20000000, but a word like Sensor is stored as 0x53000000650000006E000000730000006F00000072000000 instead of just 0x53656E736F72.
Can anyone explain to me why the data is stored in this way? Is everything represented as 4-byte strings because the numbers stored can be the full 4-bytes while text is padded with zeros for consistency? Why are the zeros padded to the right of the value? Why are the values stored with the 4th byte first and 1st byte last?
If none of this makes sense from an SQL standpoint, I suppose it is possible that the data is being provided this way by the client application, whose source I do not have access to. Could that be the case?
Lastly, I would like to create a report that includes this column, but converted to the correct numbers or words. Is there a simpler and more performant method than using substrings, trims, and recursion?
With the help of Smor in the comments above, I can now answer my own questions.
The client application provides the 4-byte strings and the database simply takes them, since they fit within the column's VARBINARY(255) data type and length. Because the application provides the values in little-endian format, they are stored that way in the database, with the least significant byte first and the most significant byte last. Since most values are smaller than the fixed 4-byte length, they are padded with zeros to the right to fill the 4 bytes.
As for my question about the report, this is what I came up with:
CASE
    WHEN LEN(ByteValue) <= 4
        THEN CAST(CAST(CAST(REVERSE(ByteValue) AS VARBINARY(4)) AS INT) AS VARCHAR(12))
    ELSE CAST(CONVERT(VARBINARY(255), REPLACE(CONVERT(VARCHAR(255), ByteValue, 1), '000000', ''), 1) AS VARCHAR(100))
END AS PlainValue
In my particular case, only numbers are stored as values of 4 bytes or less, while words are stored as much longer values. This lets me decode the shorter values as numbers and the longer values as words.
Using CASE WHEN, I can specify that only data of 4 bytes or less needs the REVERSE() function, which is the easiest way to convert the little-endian format into the big-endian format SQL Server expects when converting from hex to integers. Because REVERSE() returns an NVARCHAR value, I then have to convert it back to VARBINARY, then to INT, then to VARCHAR to match the datatype of the ELSE branch.
Any value longer than 4 bytes, which in my data is always a word, falls into the ELSE branch, where I strip the extra zeros from the hex value so that I keep just the first byte of each 4-byte character (the only part that matters in my situation). By converting the hex value to VARCHAR, I can easily remove the six repeated zeros with the REPLACE() function. With the zeros gone, converting back to VARBINARY and then to VARCHAR is straightforward.
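A minimal sketch (the literal values are taken from the question; the variable names are only for illustration) that exercises the same expression on both kinds of values:
DECLARE @num  VARBINARY(255) = 0xE9020000;  -- little-endian 745
DECLARE @word VARBINARY(255) = 0x53000000650000006E000000730000006F00000072000000;  -- 'Sensor'
-- Numeric branch: reverse the byte order, then cast to INT
SELECT CAST(CAST(CAST(REVERSE(@num) AS VARBINARY(4)) AS INT) AS VARCHAR(12));  -- 745
-- Word branch: strip the '000000' padding, then cast back to characters
SELECT CAST(CONVERT(VARBINARY(255), REPLACE(CONVERT(VARCHAR(255), @word, 1), '000000', ''), 1) AS VARCHAR(100));  -- Sensor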

Amazon DynamoDB getting items as a string

I am encountering difficulty in retrieving data from my table. I am using Amazon DynamoDB and I have successfully populated my table. When I scan the table or use getItem, the information returned is of type AttributeValue. I have looked through the documentation and I can't find how to process an AttributeValue to turn it into an int or a string. The example scan code on the Amazon website returns the information in a Dictionary object, but it is a dictionary that maps strings to AttributeValues. Do you know of any way to query a DynamoDB table and store the result in something where strings are mapped to strings, or strings are mapped to integers?
Assuming you are using the AWS SDK for Java, objects of Class AttributeValue can be of type String, Number, StringSet, NumberSet and the class features respective getters/setters accordingly, e.g.:
public String getN() - Numbers are positive or negative exact-value decimals and integers. A number can have up to 38 digits of precision and can be between 10^-128 and 10^+126.
public String getS() - Strings are Unicode with UTF-8 binary encoding. The maximum size is limited by the size of the primary key (1024 bytes as a range part of a key or 2048 bytes as a single part hash key) or the item size (64k).
Please note that the return value of getN() is still a string and must be converted with your Java string-to-number conversion method of choice. This implicit weak typing, with DynamoDB data types retrieved and submitted as String parameters only, is a bit unfortunate and doesn't exactly ease development; see e.g. my answer to Error in batchGetItem API in java for such an issue.
Good luck!

PostgreSQL character varying length limit

I am using character varying data type in PostgreSQL.
I was not able to find this information in PostgreSQL manual.
What is max limit of characters in character varying data type?
Referring to the documentation, there is no explicit limit given for the varchar(n) type definition. But:
...
In any case, the longest possible character string that can be stored is about 1 GB. (The maximum value that will be allowed for n in the data type declaration is less than that. It wouldn't be very useful to change this because with multibyte character encodings the number of characters and bytes can be quite different anyway. If you desire to store long strings with no specific upper limit, use text or character varying without a length specifier, rather than making up an arbitrary length limit.)
Also note this:
Tip: There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(n) is usually the slowest of the three because of its additional storage costs. In most situations text or character varying should be used instead.
From documentation:
In any case, the longest possible character string that can be stored is about 1 GB.
Character types in PostgreSQL:
character varying(n), varchar(n) = variable-length with limit
character(n), char(n) = fixed-length, blank padded
text = variable unlimited length
Based on your problem, I suggest you use the text type, which does not require a character length.
In addition, PostgreSQL provides the text type, which stores strings of any length. Although the type text is not in the SQL standard, several other SQL database management systems have it as well.
source : https://www.postgresql.org/docs/9.6/static/datatype-character.html
The maximum string size is about 1 GB. Per the postgres docs:
Very long values are also stored in background tables so that they do not interfere with rapid access to shorter column values. In any case, the longest possible character string that can be stored is about 1 GB. (The maximum value that will be allowed for n in the data type declaration is less than that.)
Note that the max n you can specify for varchar is less than the max storage size. While this limit may vary, a quick check reveals that the limit on postgres 11.2 is 10 MB:
psql (11.2)
=> create table varchar_test (name varchar(1073741824));
ERROR: length for type varchar cannot exceed 10485760
Practically speaking, when you do not have a well-rationalized length limit, it's suggested that you simply use varchar without specifying one. Per the official docs:
If you desire to store long strings with no specific upper limit, use text or character varying without a length specifier, rather than making up an arbitrary length limit.
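A minimal sketch (hypothetical table and column names) contrasting the options:
CREATE TABLE notes (
    title    varchar(200),  -- enforce a limit only where one is genuinely needed
    body     text,          -- unlimited length (up to about 1 GB per value)
    body_alt varchar        -- character varying without a length specifier: also effectively unlimited
);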
