ADF Data Flow: how to convert a hex string to an integer - md5

I am trying to convert 32 character hex string generated using md5 to an integer to get mod 10000. Can someone help me with this?
The steps I followed are as below:
In a derived column in an Azure Data Factory data flow, I add the following text in the expression builder:
md5(id), where id can be any constant, for example.
The expression above returns a 32-character hex string. Now I would like to convert it to an integer and then take it mod (%) 100000.
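For reference, the arithmetic being asked for can be sketched outside ADF. The snippet below is Python, not data flow expression syntax, and md5_bucket is a hypothetical helper name. It shows one thing worth keeping in mind: a 32-character hex digest is 128 bits wide, so it does not fit in a 64-bit integer, and the conversion needs either arbitrary-precision arithmetic (as Python has) or some agreed-upon slice of the digest.

```python
import hashlib


def md5_bucket(value: str, modulus: int = 100000) -> int:
    """Hash `value` with MD5 and reduce the 128-bit digest modulo `modulus`."""
    # hexdigest() returns the 32-character hex string described in the question.
    hex_digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    # Python integers are arbitrary precision, so all 128 bits are kept.
    as_int = int(hex_digest, 16)
    return as_int % modulus


# "some-id" is a placeholder for the id column value.
print(md5_bucket("some-id"))
```

The modulus defaults to 100000 here only because that is the value used in the step above; pass a different one as needed.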

Related

Question about the description of "bit string type" in the openGauss official website document

I noticed that in the openGauss official website document, the bit string type is described as follows: "A bit string is a string of 1s and 0s". I also found that this type is not included under "character type" or "binary type"; it is an independent type. Since "0, 1" and "string" are both mentioned in the description, there is some confusion about this type, and I have the following three questions:
Does this type store binary data or character data?
If it stores binary data, then according to an answer in the previous forum (the bit string type has no storage upper limit), is the only difference between the bit string type and the binary type that the bit string has no upper limit on storage space while the binary type does?
Can it be used to store larger (e.g. >2GB) raw binary data?
Bit string type: It is a string of 0s and 1s, but the database stores it at the bit level to save space. Without paying too much attention to the underlying logic, it is a special string that can only consist of the characters 0 and 1. It is convenient for storing masks and similar values.
Binary type: Dedicated to storing binary data. Taking bytea as an example, any ASCII character entered in the SQL statement is stored as its ASCII byte, and a query displays the hexadecimal code for that byte. For example, if you insert 'a', the result of a select is \x61. Other binary types are similar.
Taking the input character '0' as an example, the bit string type stores the bit 0, and bytea stores the ASCII code of the character '0'. When queried, the bit string type outputs the character '0', and bytea outputs \x30.
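The contrast between the two can be imitated in a few lines of Python. This is only an illustration of the idea described above, not openGauss code; pack_bit_string and bytea_display are hypothetical names.

```python
def pack_bit_string(bits: str) -> int:
    """Treat a string of '0'/'1' characters as actual bits (bit-string style)."""
    if set(bits) - {"0", "1"}:
        raise ValueError("a bit string may only contain the characters 0 and 1")
    return int(bits, 2)


def bytea_display(text: str) -> str:
    """Store each character as its ASCII byte and show it as hex (bytea style)."""
    return "\\x" + text.encode("ascii").hex()


print(bytea_display("a"))       # \x61
print(bytea_display("0"))       # \x30
print(pack_bit_string("0"))     # 0
print(pack_bit_string("0101"))  # 5
```

The character '0' ends up as the byte 0x30 in the bytea-style path, but as a single zero bit in the bit-string-style path.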

SSIS Conditional Split Error - The data type DT_BYTES cannot be used with binary operator "=="

While configuring a conditional split component with the following expression:
[VersionStamp_Source] == (DT_I8)[VersionStamp_Destination]
I am getting the following error:
The data type DT_BYTES cannot be used with binary operator "==".
As shown in the error message, one of the columns used in the conditional split expression has a data type of DT_BYTES which cannot be compared using binary operators.
You need to cast this column to another data type. As mentioned in the official documentation, DT_BYTES can be converted to DT_I8 or to a string data type.
As @billinkc mentioned in the comments, casting DT_BYTES to a string data type is preferable, since some values cannot be converted to an 8-byte integer.
To solve your problem, try using the following expression:
(DT_WSTR,255)[VersionStamp_Source] == (DT_WSTR,255)[VersionStamp_Destination]
Also, make sure to use an accurate length for the string casting operator. You can increase the string length up to 4000.
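One way to picture why the string cast is the safer comparison is sketched below in Python (not SSIS expression syntax). The helper names, the sample values, and the big-endian byte order are all assumptions made for illustration; the sketch does not claim to reproduce how SSIS performs the casts internally.

```python
import struct


def to_int64(raw: bytes) -> int:
    """Interpret exactly 8 bytes as a signed 64-bit integer (byte order assumed)."""
    return struct.unpack(">q", raw)[0]


def to_hex_text(raw: bytes) -> str:
    """Render bytes of any length as text, which can always be compared."""
    return raw.hex().upper()


source = bytes.fromhex("00000000000007D1")
destination = bytes.fromhex("00000000000007D1")
print(to_hex_text(source) == to_hex_text(destination))  # True

# A value that is not exactly 8 bytes long has no 8-byte integer form.
too_long = bytes(10)
try:
    to_int64(too_long)
except struct.error as exc:
    print("cannot convert:", exc)
```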

ByteArray insertion in MarkLogic using "temporal.documentInsert" inserts but returns twice the count of ByteArray?

I inserted a document into MarkLogic using temporal.documentInsert, passing a ByteArray with a count of 5000. After insertion, when I retrieve the data using cts.doc, it returns a ByteArray count of 10000 (double the initial value).
Can someone explain why?
I can find nothing referencing 'ByteArray's in the docs.
What did you use to get the 'count' of the document?
My guess is that there is a byte -> char conversion, which will occur both on insert and on get, in the Java JVM. Java chars are 16 bits (2 bytes). The result depends on the encoding and on exactly which Java API you used to get the 'count' (count of what?), but an exact 2x difference is suspiciously consistent with a byte -> char conversion in Java.
If you convert your document to a String using an explicit charset for the conversion, what is the string length (in chars) of the input and output documents, as seen in Java using String.length?
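The suspected effect is easy to reproduce in isolation. The sketch below is Python rather than Java, and it makes no claim about what MarkLogic or its client API actually does; it only shows how 5000 bytes turn into 10000 once each byte becomes a 16-bit character. The latin-1 and UTF-16 charsets are assumptions chosen to make the doubling visible.

```python
# 5000 arbitrary bytes standing in for the original ByteArray.
payload = bytes(range(250)) * 20

# Decode one byte per character, so the text has 5000 characters.
text = payload.decode("latin-1")

# Stored as 16-bit code units, every character now occupies 2 bytes.
utf16_size = len(text.encode("utf-16-le"))

print(len(payload), len(text), utf16_size)  # 5000 5000 10000
```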

SQL Server: Hex values padded with zeros and bytes out of order

I'm working on a database that has a VARBINARY(255) column that doesn't make sense to me. Depending on the length of the value, the value is either numbers or words.
Whatever number is stored, it appears as a 4-byte hex string such as 0x00000000, but the byte order is reversed: the hex string reads left to right while the bytes run from least significant to most significant. So a number such as 255 is 0xFF000000, and a number such as 745 is 0xE9020000. This is the part that I do not understand: why is it stored that way instead of as 0x02E9, 0x2E9 or 0x000002E9?
When it comes to words, each character is stored as a 4-byte hex string just like above. Something like a space is stored as 0x20000000, but a word like Sensor it is 0x53000000650000006E000000730000006F00000072000000 instead of just 0x53656E736F72.
Can anyone explain to me why the data is stored in this way? Is everything represented as 4-byte strings because the numbers stored can be the full 4-bytes while text is padded with zeros for consistency? Why are the zeros padded to the right of the value? Why are the values stored with the 4th byte first and 1st byte last?
If none of this makes sense from an SQL standpoint, I suppose it is possible that the data is being provided this way by the client application, whose source I do not have access to. Could that be the case?
Lastly, I would like to create a report that includes this column, but converted to the correct numbers or words. Is there a simpler and more performant method than using substrings, trims, and recursion?
With the help of Smor in the comments above, I can now answer my own questions.
The client application provides the 4-byte strings, and the database simply accepts them because they fit within the column's VARBINARY(255) data type and length. Since the application provides the values in little-endian format, they are stored that way in the database, with the least significant byte first and the most significant byte last. Because most values are smaller than the fixed 4-byte length, they are padded with zeros on the right to fill the 4 bytes.
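The two numeric examples from the question can be checked quickly. This is a Python sketch rather than T-SQL, and decode_le_int is a hypothetical helper name.

```python
def decode_le_int(hex_text: str) -> int:
    """Decode a hex string such as 0xE9020000 as a little-endian integer."""
    digits = hex_text[2:] if hex_text.startswith("0x") else hex_text
    raw = bytes.fromhex(digits)
    # "little" means the first byte is the least significant one.
    return int.from_bytes(raw, "little")


print(decode_le_int("0xFF000000"))  # 255
print(decode_le_int("0xE9020000"))  # 745

# The big-endian spelling of the same number reads the way a human writes it.
print(int.from_bytes(bytes.fromhex("000002E9"), "big"))  # 745
```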
Now as to my question of the report, this is what I came up with:
CASE
    WHEN LEN(ByteValue) <= 4
        THEN CAST(CAST(CAST(REVERSE(ByteValue) AS VARBINARY(4)) AS INT) AS VARCHAR(12))
    ELSE CAST(CONVERT(VARBINARY(255),REPLACE(CONVERT(VARCHAR(255),ByteValue,1),'000000',''),1) AS VARCHAR(100))
END AS PlainValue
In my particular case, only numbers are stored as just 4-byte or less values while words are stored as much longer values. This allows me to break the smaller values into numbers while longer values are broken down into words.
Using CASE WHEN, I can specify that only data of 4 bytes or less needs the REVERSE() function, as it is the easiest way to convert the little-endian format to the big-endian format that SQL expects when converting from hex to integers. Because the REVERSE() function returns an NVARCHAR datatype, I then have to convert that back to VARBINARY, then to INT, then to VARCHAR to match the datatype of the second case.
Any string longer than 4-bytes, used specifically for words, falls under the ELSE part and allows me to strip the extra zeros from the hex value so I get just the first byte of each 4-byte long character (the only part that matters in my situation). By converting the hex string to VARCHAR, I can then easily remove the 6 repeating zeros using the REPLACE() function. With the zeros gone, converting the string back to VARBINARY allows converting to VARCHAR to be done with ease.
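For comparison, the same two-branch logic can be sketched outside the database. The snippet is Python, not T-SQL, and plain_value is a hypothetical name. It also swaps in a different technique for the text branch: instead of stripping zeros from the hex text, it decodes the bytes as UTF-32 little-endian, since one character per 4 bytes with the low byte first matches that layout for the samples shown. Whether the client application really emits UTF-32 is an assumption.

```python
def plain_value(raw: bytes) -> str:
    """Mirror the CASE expression: short values are numbers, long ones are text."""
    if len(raw) <= 4:
        # Little-endian integer, treated as unsigned here (an assumption).
        return str(int.from_bytes(raw, "little"))
    # One character per 4 bytes, low byte first; raises UnicodeDecodeError
    # if the data is not valid UTF-32LE.
    return raw.decode("utf-32-le")


number = bytes.fromhex("E9020000")
word = bytes.fromhex("53000000650000006E000000730000006F00000072000000")
print(plain_value(number))  # 745
print(plain_value(word))    # Sensor
```

As with the CASE expression, a single stored character such as 0x20000000 is 4 bytes long and would be reported as a number, so this only works when text values are always longer than 4 bytes.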

Amazon DynamoDB getting items as a string

I am having difficulty retrieving data from my table. I am using Amazon DynamoDB and I have successfully populated my table. When I scan the table or use getItem, the returned information is of type AttributeValue. I have looked through the documentation and I can't find how you should process an AttributeValue to turn it into an int or a string. The example scan code on the Amazon website returns the information in a Dictionary object, but it is a dictionary with strings mapped to AttributeValues. Do you know of any way to query a DynamoDB table and store the result in something where strings are mapped to strings or strings are mapped to integers?
Assuming you are using the AWS SDK for Java, objects of Class AttributeValue can be of type String, Number, StringSet, NumberSet and the class features respective getters/setters accordingly, e.g.:
public String getN() - Numbers are positive or negative exact-value decimals and integers. A number can have up to 38 digits precision and can be between 10^-128 to 10^+126.
public String getS() - Strings are Unicode with UTF-8 binary encoding. The maximum size is limited by the size of the primary key (1024 bytes as a range part of a key or 2048 bytes as a single part hash key) or the item size (64k).
Please note that the return value of getN() is still a string and must be converted by your Java string to number conversion method of choice accordingly. This implicit weak typing of the DynamoDB data types retrieval/submission based on String parameters only is a bit unfortunate and doesn't exactly ease developing, see e.g. my answer to Error in batchGetItem API in java for such an issue.
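The conversion step can be sketched as follows. This is Python rather than Java, and it does not use the AWS SDK: plain dicts in DynamoDB's typed JSON shape, such as {"N": "42"}, stand in for AttributeValue objects, and unwrap and unwrap_item are hypothetical helper names. Only the four types listed above are handled.

```python
from decimal import Decimal


def unwrap(attribute: dict):
    """Turn one typed value such as {"N": "42"} into a plain Python value."""
    (type_tag, value), = attribute.items()
    if type_tag == "S":
        return value
    if type_tag == "N":
        # Numbers arrive as strings; Decimal keeps their exact value.
        return Decimal(value)
    if type_tag == "SS":
        return set(value)
    if type_tag == "NS":
        return {Decimal(v) for v in value}
    raise ValueError(f"unsupported attribute type: {type_tag}")


def unwrap_item(item: dict) -> dict:
    """Map attribute names to plain values for a whole item."""
    return {name: unwrap(attribute) for name, attribute in item.items()}


# Sample item with made-up attribute names.
item = {"id": {"N": "42"}, "name": {"S": "sensor"}, "tags": {"SS": ["a", "b"]}}
print(unwrap_item(item))
```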
Good luck!
