Azure Logic Apps - JSON Parsing - Decimal type parsed as Integer - azure-logic-apps

I'm using Azure Logic Apps, and one of the steps is to parse an API JSON response. I'm uploading a payload to generate the schema.
One of my properties is a decimal type for Tax, specified in the JSON as a "Number" type.
The value in my source JSON comes through as this…
"TaxAmount": 999.00
However, when it's parsed, it is set as "Integer".
When I change the value to...
"TaxAmount": 999.01
It will correctly come through as a "Number" type
Is there a way I can define the value 999.00 so that it is parsed as a "Number" rather than an "Integer"?
Any help would be appreciated

One workaround is to directly (i.e., manually) edit the property's type in the generated schema while parsing, as sketched below.
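A rough sketch of that manual edit, assuming a schema generated from the sample payload (the property name is taken from the question): change
"TaxAmount": {
    "type": "integer"
}
to
"TaxAmount": {
    "type": "number"
}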

Unfortunately, no.
Some programming languages and parsers use different internal
representations for floating point numbers than they do for integers.
For consistency, integer JSON numbers SHOULD NOT be encoded with a
fractional part.
https://json-schema.org/draft/2020-12/json-schema-core.html#integers
Note that this is SHOULD NOT, so it MAY be allowable.
But consider that implementations may behave differently.
"SHOULD NOT" means, "you really should not do this unless you have a really good reason, and you better document it if you do".
If you need this, consider encoding the numbers as strings and using a regular expression to do the validation.
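A rough sketch of that string-based approach in the schema (the property name and pattern here are only illustrative):
"TaxAmount": {
    "type": "string",
    "pattern": "^[0-9]+\\.[0-9]{2}$"
}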

Related

How to deal with decimals in MongoDB v3.6

I am working with MongoDB v3.6.3.
I have seen a similar question that received a good answer. So why am I asking this question?
Because I am working with a different version of MongoDB
Because I have just stored a decimal number in my DB without registering any serializers, as instructed in the answer to the similar question, and no error was thrown.
My MongoDB schema looks like this:
rating: {
    type: Number,
    required: true
}
So my question is: is there anything wrong with the way I have implemented this, considering that I have already stored a decimal number in my DB? Is it okay to store decimal numbers with the current schema? Or is this a setup for errors in the future because I am missing a step?
Thank you.
The Number type is a floating point numeric representation that cannot accurately represent decimal values. This may be fine if your use case does not require precision for floating point numbers, but would not be suitable if accuracy matters (for example, for fractional currency values).
If you want to store and work with decimal values accurately, you should instead use the Decimal128 type in Mongoose. This maps to the Decimal128 (aka NumberDecimal) BSON data type available in MongoDB 3.4+, which can also be manipulated in server-side calculations using the Aggregation Framework.
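A minimal sketch of that change, assuming a reasonably recent Mongoose version (the field name comes from the question's schema):
const mongoose = require("mongoose");

const ratingSchema = new mongoose.Schema({
    rating: {
        type: mongoose.Schema.Types.Decimal128, // BSON Decimal128 instead of a double
        required: true
    }
});

// Note: values read back are Decimal128 objects, so convert them with .toString()
// before doing plain JavaScript arithmetic or display formatting.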
If your rating field doesn't require exact precision, you could continue to use the Number type. One reason to do so is that the Number type is native to JavaScript, while Decimal128 is not.
For more details, see A Node.js Perspective on MongoDB 3.4: Decimal Type.

Binary data different when viewed with CFDUMP

I have a SQL Server database that has a table that contains a field of type varbinary(256).
When I view this binary field via a query in SSMS, the value looks like this:
0x004BC878B0CB9A4F86D0F52C9DEB689401000000D4D68D98C8975425264979CFB92D146582C38D74597B495F87FEA09B68A8440A
When I view this same field (and same record) using CFDUMP, the value looks like this:
075-56120-80-53-10279-122-48-1144-99-21104-1081000-44-42-115-104-56-10584373873121-49-714520101-126-61-115116891237395-121-2-96-101104-886810
(For the example below, the original binary value will be #A, and the CFDUMP value above will be #B)
I have tried using CAST(#B as varbinary(256)) but didn't get the same value as #A.
What must I do to convert the value retrieved from CFDUMP into the correct binary representation?
Note: I no longer have the applicable records in the database. I need to convert #B into the correct value that can re-INSERT into a varbinary(256) field.
(Expanded from comments)
I do not mean this sarcastically, but what difference does it make how they display binary? It is simply a difference in how the data is presented. It does not mean the actual binary values differ.
It is similar to how dates are handled. Internally, they are big numbers. But since most people do not know which date 1234567890 represents, applications choose to display the number in a more human-friendly format. So SSMS might present the date as 2009-02-13 23:31:30.000, while CF might present it as {ts '2009-02-13 23:31:30'}. Even though the presentations differ, it is still the same value internally.
As far as binary goes, SSMS displays it as hexadecimal. If you use binaryEncode() on your query column, and convert the binary to hex, you can see it is the same value. Just without the leading 0x:
writeDump( binaryEncode(yourQuery.binaryColumn, "hex") )
If you are having some other issue with binary, could you please elaborate?
Update:
Unfortunately, I do not think you can easily convert the cfdump representation back into binary. Unlike Railo's implementation, Adobe's cfdump just concatenates the numeric representation of the individual bytes into one big string, with no delimiter. (The dashes are simply negative numbers). You can reproduce this by looping through the bytes of your sample string. The code below produces the same string of numbers you posted.
// Decode the hex string back into a byte array, then print each byte's
// signed numeric value with no delimiter; this reproduces the cfdump string.
bytes = binaryDecode("004BC878B0CB9A4F...", "hex");
for (i = 1; i <= arrayLen(bytes); i++) {
    WriteOutput( bytes[i] );
}
I suppose it is theoretically possible to convert that string into binary, but it would be very difficult. AFAIK, there is no way to accurately determine where one number (or byte) begins and the other ends. There are some clues, but ultimately it would come down to guesswork.
Railo's implementation displays the byte values separated by a dash "-". Two consecutive dashes indicate a negative number, i.e. "0", "75", "-56", ...
0-75--56-120--80--53--102-79--122--48--11-44--99--21-104--108-1-0-0-0--44--42--115--104--56--105-84-37-38-73-121--49--71-45-20-101--126--61--115-116-89-123-73-95--121--2--96--101-104--88-68-10
So you could probably parse that string back into an array of bytes. Then insert the binary into your database using <cfqueryparam cfsqltype="CF_SQL_BINARY" ..>. Unfortunately that does not help you, but the explanation might help the next guy.
At this point, I think your best bet is to just restore the data from a database backup.

CSV String vs Arrays: Is this too stringly typed?

I came across some existing code in our production environment given to us by our vendor. They use a string of comma-separated values to store filtered results from a DB. Keep in mind that this is for a proprietary scripting language called PowerOn that interfaces with a database residing on an AIX system, but it's a language that supports strings, integers, and arrays.
For example, we have:
Account
----------------
123
234
3456
28390
The pseudocode might look like:
Define accounts As String
For Each Account
    accounts = accounts + CharCast(Account) + ","
End
as opposed to something I would expect to see, like:
Define accounts As Integer Array(99)
Define index As Integer = 0
For Each Account
    accounts(index) = Account
    index = index + 1
End
By the time the loop is done, accounts will look like 123,234,3456,28390, (with a trailing comma). The string is later used to test whether a specific instance exists, like so:
If CharSearch("28390", accounts) > 0 Then Call DoSomething
In the example, the statement evaluates to true and DoSomething gets called. Given the option of arrays, why would one want to store integer values within a string of comma-separated values? In every language I've come across, it's almost always more expensive to perform string-based operations than integer-based operations.
Considering I haven't seen this technique before and my experience is somewhat limited, is there a name for this? Is this common practice, or is this just another example of being too stringly typed? To extend the existing code, should I continue using the string method? Did we get cruddy code from our vendor?
What I put in the comment still holds, but my real answer is: it's probably a design decision with respect to compatibility/portability. In your integer-array case (at a low enough level of the API) you'd typically find yourself asking questions like: what's a safe guess for the size of an integer on today's machines? What about endianness?
The most portable and most flexible of all data formats always has been, and always will be, printed representation. It may not be as fast to process, but that's where adapters/converters kick in. I wouldn't be surprised to find (human-readable) printed representations of things, especially in database APIs like the one you describe.
If you want something fast, just take whatever is given to you, convert it to a more efficient internal format, do your processing, and convert it back.
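As a rough JavaScript sketch of that "parse once, work on the efficient form, serialize back" idea (PowerOn itself would use its own string and array functions; the values are taken from the question):
// Parse the comma-separated account string into an array of integers once
const raw = "123,234,3456,28390,";
const accounts = raw.split(",").filter(s => s.length > 0).map(Number);

// Do the membership test on the parsed form instead of a substring search
const found = accounts.includes(28390); // true

// Serialize back to the original wire/storage format when needed
const serialized = accounts.join(",") + ",";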
There's nothing inherently wrong with using comma-separated strings instead of arrays. Sure, you can't readily access a random nth element of such a collection, but if such random access is not needed then there's no penalty for it, right?
As far as I know, Oracle stores NUMBER values as strings (and, if my memory is correct, DATEs as well) for very practical reasons.
In your specific example, using strings looks like overkill for passing data around without crossing process boundaries. But could it be that the choice of a string data type makes more sense when sending data over the wire or storing it on disk?

Case fold UTF-8 without knowing the language

I'm trying to evaluate different strategies for case insensitive UTF-8 string comparison.
I've read some material from the Unicode consortium, experimented with ICU and tried to come up with various quality-of-implementation alternatives.
On multiple occasions I've seen texts differ between Simple Case Mapping and Full Case Mapping, and I wanted to make sure I understand the difference entirely.
As I read it, Simple Case Mapping is "context-free", i.e. it doesn't need to know what language the payload is in. This will give approximate results, due to the Turkic "I/ı/İ/i" debacle.
Full Case Mapping, on the other hand, needs to know the language of the payload to be able to perform the mapping. With that extra information, it can take special measures to cover cases where "Kim" as a Turkic string should become "KİM" in upper-case, but "Kim" as an English string, should become "KIM" in upper-case.
Have I got that right?
Are there other examples of "multi-faceted" code points that fold differently for different languages?
Thanks!
UPDATE: One of the sources mentioning simple case mapping as language independent is ICU's documentation. I interpreted that as Unicode truth, but maybe it's just a statement of the implementation?
No, a "full case mapping" is a casing where one codepoint needs to be replaced by more than one new codepoint. A simple case mapping is a single-codepoint substitution.
If you want to implement this yourself then the Unicode CaseFolding.txt file is crucial to get this right. Note the status field code "T", specifically there to handle the Turkish I problem.
Well... The consonant combination "SS" would down-case to "ss" for most Western languages, but in German it might become the special letter "ß". That's just "might"; there are quite involved usage rules to consider.
I think this doesn't directly affect collation order (any Germans are of course welcome to correct me) though, so maybe it's a moot point.
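A small JavaScript illustration of both points, assuming a runtime with full ICU locale data (e.g. a modern browser or Node.js):
// Full case mapping: one code point can become two
"ß".toUpperCase();             // "SS"

// Locale-sensitive mapping for the Turkish dotted/dotless I
"Kim".toUpperCase();           // "KIM" (root/default locale)
"Kim".toLocaleUpperCase("tr"); // "KİM" (Turkish locale)
"I".toLocaleLowerCase("tr");   // "ı"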

Is it a good idea to use an integer column for storing US ZIP codes in a database?

From first glance, it would appear I have two basic choices for storing ZIP codes in a database table:
Text (probably most common), i.e. char(5) or varchar(9) to support +4 extension
Numeric, i.e. 32-bit integer
Both would satisfy the requirements of the data, if we assume that there are no international concerns. In the past we've generally just gone the text route, but I was wondering if anyone does the opposite? Just from brief comparison it looks like the integer method has two clear advantages:
It is, by its very nature, automatically limited to numeric characters only (whereas without validation the text style could store letters and such, which are not, to my knowledge, ever valid in a ZIP code). This doesn't mean we could/would/should forgo validating user input as normal, though!
It takes less space, being 4 bytes (which should be plenty even for 9-digit ZIP codes) instead of 5 or 9 bytes.
Also, it seems like it wouldn't hurt display output much. It is trivial to slap a ToString() on a numeric value, use simple string manipulation to insert a hyphen or space or whatever for the +4 extension, and use string formatting to restore leading zeroes.
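For example, a couple of lines restore the display format from a stored integer (shown here in JavaScript; the values are only illustrative):
const zip = 1235;    // stored integer for ZIP code "01235"
const plus4 = 6789;  // optional +4 extension, or 0 if absent

const display = String(zip).padStart(5, "0")
    + (plus4 ? "-" + String(plus4).padStart(4, "0") : ""); // "01235-6789"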
Is there anything that would discourage using int as a datatype for US-only ZIP codes?
A numeric ZIP code is -- in a small way -- misleading.
Numbers should mean something numeric. ZIP codes don't add or subtract or participate in any numeric operations. 12309 - 12345 does not compute the distance from downtown Schenectady to my neighborhood.
Granted, for ZIP codes, no one is confused. However, for other number-like fields, it can be confusing.
Since ZIP codes aren't numbers -- they just happen to be coded with a restricted alphabet -- I suggest avoiding a numeric field. The 1-byte saving isn't worth much, and I think that meaning is more important than the byte.
Edit.
"As for leading zeroes..." is my point. Numbers don't have leading zeros. The presence of meaningful leading zeros on ZIP codes is yet another proof that they're not numeric.
Are you going to ever store non-US postal codes? Canada is 6 characters with some letters. I usually just use a 10 character field. Disk space is cheap, having to rework your data model is not.
Use a string with validation. Zip codes can begin with 0, so numeric is not a suitable type. Also, this applies neatly to international postal codes (e.g. UK, which is up to 8 characters). In the unlikely case that postal codes are a bottleneck, you could limit it to 10 characters, but check out your target formats first.
Here are validation regexes for UK, US and Canada.
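For instance, a typical pattern for a US ZIP or ZIP+4 value (illustrative only) is:
^\d{5}(-\d{4})?$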
Yes, you can pad to get the leading zeroes back. However, you're theoretically throwing away information that might help in case of errors. If someone finds 1235 in the database, is that originally 01235, or has another digit been missed?
Best practice says you should say what you mean. A zip code is a code, not a number. Are you going to add/subtract/multiply/divide zip codes? And from a practical perspective, it's far more important that you're excluding extended zips.
Normally you would use a non-numeric datatype such as varchar, which allows for more ZIP code types. If you are dead set on only allowing 5-digit [XXXXX] or 9-digit [XXXXX-XXXX] ZIP codes, you could use char(5) or char(10), but I would not recommend it. Varchar is the safest and most sane choice.
Edit: It should also be noted that if you don't plan on doing numerical calculations on the field, you should not use a numeric data type. A ZIP code is not a number in the sense that you add or subtract with it. It is just a string that happens to be made up, typically, of numbers, so you should refrain from using numeric data types for it.
From a technical standpoint, some points raised here are fairly trivial. I work with address data cleansing on a daily basis, in particular cleansing address data from all over the world. It's not a trivial task by any stretch of the imagination. When it comes to ZIP codes, you could store them as an integer, although it may not be "semantically" correct. The fact is, the data is of a numeric form, whether or not it is, strictly speaking, considered numeric in value.
However, the very real drawback of storing them as numeric types is that you lose the ability to easily see whether the data was entered incorrectly (i.e. has missing values) or whether the system removed leading zeros, leading to costly operations to validate potentially invalid ZIP codes that were otherwise correct.
It's also very hard to force the user to input correct data if one of the repercussions is a delay of business. Users often don't have the patience to enter correct data if it's not immediately obvious. Using a regex is one way of guaranteeing correct data; however, if the user enters a value that doesn't conform and is shown an error, they may just omit the value altogether or enter something that conforms but is otherwise incorrect. One example (using Canadian postal codes) is that you often see A0A 0A0 entered, which isn't valid but conforms to the regex for Canadian postal codes. More often than not, this is entered by users who are forced to provide a postal code but either don't know what it is or don't have all of it correct.
One suggestion is to validate the whole of the entry as a unit validating that the zip code is correct when compared with the rest of the address. If it is incorrect, then offering alternate valid zip codes for the address will make it easier for them to input valid data. Likewise, if the zip code is correct for the street address, but the street number falls outside the domain of that zip code, then offer alternate street numbers for that zip code/street combination.
No, because:
You never do math functions on ZIP codes
They could contain dashes
They could start with 0
NULL values are sometimes interpreted as zero for scalar types like integer (e.g. when you export the data somehow)
A ZIP code, even if it's a number, is a designation of an area, meaning it is a name rather than a numeric quantity of anything
Unless you have a business requirement to perform mathematical calculations on ZIP code data, there's no point in using an INT. You're over-engineering.
ZIP Codes are traditionally digits, as well as a hyphen for Zip+4, but there is at least one Zip+4 with a hyphen and capital letters:
10022-SHOE
https://www.prnewswire.com/news-releases/saks-fifth-avenue-celebrates-the-10th-birthday-of-its-famed-10022-shoe-salon-300504519.html
Realistically, a lot of business applications will not need to support this edge case, even if it is valid.
Integer is nice, but it only works in the US, which is why most people don't do it. Usually I just use a varchar(20) or so. Probably overkill for any locale.
If you were to use an integer for US Zips, you would want to multiply the leading part by 10,000 and add the +4. The encoding in the database has nothing to do with input validation. You can always require the input to be valid or not, but the storage is matter of how much you think your requirements or the USPS will change. (Hint: your requirements will change.)
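A quick sketch of that packing arithmetic (values are only illustrative, and it still only covers US-format codes):
// Pack: 5-digit ZIP in the high part, +4 extension in the low part
const packed = 12345 * 10000 + 6789; // 123456789

// Unpack for display or comparison
const zip5  = Math.floor(packed / 10000); // 12345
const plus4 = packed % 10000;             // 6789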
I learned recently that in Ruby one reason you would want to avoid this is that some ZIP codes begin with leading zeroes, which, if written as integer literals, will automatically be interpreted as octal.
From the docs:
You can use a special prefix to write numbers in decimal, hexadecimal, octal or binary formats. For decimal numbers use a prefix of 0d, for hexadecimal numbers use a prefix of 0x, for octal numbers use a prefix of 0 or 0o…
I think storing the ZIP code as an int datatype can also affect an ML model: higher codes can create outliers in the data used for the calculation.

Resources