Long long, decimals and input validation in C

Currently I'm using TCC as it's the easiest thing to get set up on Windows. Simply unzip and you're ready to go. However I'm open to other compilers: GCC, whatever Microsoft has on offer, etc.
My problem is that I need to validate the input to a size 16 array of integers. Here are my rules:
if number is under 15 (including negative values) then input is valid
if number is under -2147483648 then -2147483648
if number is over 2147483647 then 15
else if number is over 15 then mod 16
if the number is a decimal, remove decimal point and validate again
Considering I'm using C, the last point scares me, and I'll come back to it later. For now I'm just trying to handle the first 4 conditions.
The problem I'm running into is that trying to test for the outer bounds results in integer overflows and screws up my checks. So I've made a temporary array of long longs to hold the input for validation purposes. The moment everything is successfully validated it should fit in an array of ints, so I will (somehow) copy the long longs from the temp array to the actual one and start the program as normal.
I've messed around with long longs and tried to do what I want to do, but my code is getting messy fast, and everything is so vague and machine-dependent in C that when something goes wrong I don't know whether it's my crappy coding or the fact that my machine is different from everyone else's that is causing the error. I am going to stick at it because I know this sort of thing can be investigated and worked out, but I don't want to waste too much time on it, so I'll ask SO and see if there's a shortcut.
The decimal validation part I've got various ideas on how to approach, but I'm not hopeful. What's your opinion?
Anyone who wants to know why I'm doing this: It doesn't matter, I can solve the higher level problem that requires this array quite easily and it will work for all valid inputs. However I'm just being pedantic right now, hence this question.

First, your conditions have a problem: a number under -2147483648 is also under 15, so with the rules in that order the clamping check never matches (and the same goes for the decimal rule when the number is under 15).
Second, you can check for overflow with strtol (check errno for ERANGE) and then compare against your limits (though if your long is 32 bits and two's complement, there's no need: the values strtol clamps to on overflow, LONG_MIN and LONG_MAX, are exactly your limits).
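A minimal sketch of that approach, using strtoll so it also works where long is only 32 bits; the function name and the reordering of the rules (clamping checked before the "under 15 is valid" rule, so it can actually fire) are my own choices:

#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Sketch only: parse one token and apply the rules. Assumes any '.' has
   already been stripped from the string beforehand (see the period-removal
   pass below). */
static int validate(const char *s)
{
    char *end;
    errno = 0;
    long long v = strtoll(s, &end, 10);
    /* On overflow strtoll returns LLONG_MIN or LLONG_MAX and sets errno to
       ERANGE; those extremes are caught by the comparisons below. */
    if (v < INT_MIN) return INT_MIN;        /* under -2147483648 -> -2147483648 */
    if (v > INT_MAX) return 15;             /* over 2147483647 -> 15 */
    if (v > 15)      return (int)(v % 16);  /* over 15 -> mod 16 */
    return (int)v;                          /* 15 or less is valid as-is */
}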
As for decimals, if you always want to remove the decimal point (not quite what you described, since you condition that on a series of other checks failing), you can set up a preprocessing step that removes periods from the string. It can easily be done in place with two pointers: a read pointer and a write pointer.
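For instance (the function name is mine, and this sketch strips every period, so "1.2.3" would become "123"):

/* Remove every '.' from the string in place with a read pointer and a
   write pointer. */
static void strip_periods(char *s)
{
    char *r = s, *w = s;
    for (; *r != '\0'; r++)
        if (*r != '.')
            *w++ = *r;
    *w = '\0';
}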

Related

Converting from Base 16-8 to Base 2 without Functions nor Bitwise op in C

I have an assignment to make a full adder; it was chosen for us to practice loops and conditionals in C.
So I did the easiest part: checking whether the number is in base 2 and printing C-out and Sum. But for base 16 and base 8 I couldn't figure out how to convert them to a smaller base.
No advanced techniques are allowed, rules as follows:
You are not allowed to use data structures such as arrays to store values for the conversion operation.
You are not allowed to use bitwise operators.
You are not allowed to define your own functions.
I hope you don't give me the full solution for this step; just help me with converting one base to another, and I will try to figure out the rest by myself.
Think of it this way: you must be familiar with base 10, or decimal numbers. You use them every day. So how do they work? First, the number of symbols used to represent them is the base number, 10. This is why, as you are counting upwards, whenever you get to a power of 10 you need to increase the number of symbols used to represent the number. What you are asked to do here is essentially the reverse of that process. If you had to write down the digits of a number in base 10 without being allowed to see the number, how would you do it? I will give you the first step: you can get the least significant digit by dividing the number by 10 and taking the remainder. This gives you the number of times you had to change the symbol used since the last time you had to increase the number of symbols used.
If you do num % 2 you will get the rightmost bit (the LSB); save this bit in whatever form you want to return the bit pattern (a string, etc.).
If you then divide by two you lose that rightmost bit. Keep doing this in a loop until the number becomes zero.
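Purely as an illustration of the remainder-and-divide idea, here is a sketch using base 10 (which you already know, so it doesn't give the assignment away); it stays within the stated constraints (no arrays, no bitwise operators, no user-defined functions):

#include <stdio.h>

int main(void)
{
    int n = 4075;                  /* example value */
    /* prints the base-10 digits from least to most significant */
    while (n > 0) {
        printf("%d\n", n % 10);    /* remainder = current digit */
        n = n / 10;                /* drop that digit */
    }
    return 0;
}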

lightweight (quasi-random) integer fingerprint of C string

I would like to generate a nicely-mixed-up integer fingerprint of an arbitrary C string (s). Most C strings will consist of ASCII text characters:
I want very different fingerprints for similar strings, especially for such similar strings as "ab" and "ba"
I want it to be difficult to invert back from the fingerprint to the string (well, my string is typically longer than 32 bits, which means that many strings would map into the same integer), which means again that I want similar strings to yield very different codes;
I want to use the 32 bits available to me efficiently in the integer result,
I want the function source to be small
I want the function to be fast.
One usage is security (but not encryption) related. I can ask a user for a text password, convert it into an integer for storage, and later test whether this integer is correct. (I know I could store strings, but I don't want to. Guessing a 32-bit integer correctly is effectively impossible if my program can slow down incorrect attempts to the point where brute force cannot work faster than password guessing. Another use of this function is as the start of a hash index function (mod array length) into an array.)
Alas, I am probably reinventing the wheel here. Such functions have probably been written a million times, and by people who are much more versed in cryptography. I don't need AES, of course, but something much more lightweight. The use is different.
My first thought was:
mod 64 each character to take advantage of the ASCII text aspect. Now I have 6 bits. Call this x.
I can place a 6-bit string into 5 locations in a 32-bit space, leaving 2 bits over.
take the current string index position (0, 1, 2...), mod 5 it to determine where I want to start placing my x into my running integer result code, and XOR my x into this running-result integer.
use the remaining 2 bits to increment a counter (mod 4 to prevent overflow) for each character processed.
Then I thought that bit operations may be fast for the computer but take more source code. I can think of other choices: take each index position i, multiply it by an ASCII representation of each character (or the x from above), and call this y[i]. Then do the following:
calculate the natural logarithm of the sum of the y (or this sum plus the running result), and just pretend that the first 32 bits of this result (which is really a double) are an integer representation, perhaps leaving off the first few bits. I can XOR each bitint(log(y[i])) into the running integer result.
or do it even cheaper: just add the y's, and then do the logarithm with the 32-bit pick-off just once at the end. Alternatively, run a sum of the y's through srand as a seed and grab a rand.
There are probably a few other ways to do it, too. In sum, the function should map strings into very different integers, be short to code, and be very fast.
Any pointers?
A common method of generating a non-reversible digest or hash of a string is to generate a Cyclic Redundancy Checksum (CRC).
Source for CRC is widely available; in this case you should use a common CRC-32 such as the one used by Ethernet. Different CRCs work on the same principle but use different polynomials. Do not be tempted to invent your own polynomial; the distribution is likely to be sub-optimal.
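For reference, a bit-at-a-time sketch of CRC-32 with the reflected Ethernet/zlib polynomial (the function name is mine; production code would normally use a 256-entry lookup table for speed):

#include <stddef.h>
#include <stdint.h>

uint32_t crc32_str(const char *s)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; s[i] != '\0'; i++) {
        crc ^= (uint8_t)s[i];
        for (int bit = 0; bit < 8; bit++)            /* process one bit at a time */
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;                        /* final inversion */
}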
What you're looking for is called a "hash". Two examples of hash functions I'm aware of that return short integers are MurmurHash and SipHash. MurmurHash, as I recall, is not designed to be a cryptographic hash, while SipHash is designed with security in mind, as stated on its homepage. MurmurHash has versions that return a 32-bit and a 64-bit output; SipHash returns a 64-bit output.
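Reproducing MurmurHash or SipHash here would be too long, but just to show the general shape of a tiny string hash that meets the "small source, fast" requirements (this is 32-bit FNV-1a, not one of the two functions above, and its mixing is weaker than theirs):

#include <stdint.h>

uint32_t fnv1a_32(const char *s)
{
    uint32_t h = 2166136261u;           /* FNV offset basis */
    while (*s) {
        h ^= (uint8_t)*s++;             /* XOR in the next byte */
        h *= 16777619u;                 /* multiply by the FNV prime */
    }
    return h;
}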

Why is infinity = 0x3f3f3f3f?

In some situations, one generally uses a large enough integer value to represent infinity. I usually use the largest representable positive/negative integer. That usually yields more code, since you need to check whether one of the operands is infinity before virtually all arithmetic operations in order to avoid overflows. Sometimes it would be desirable to have saturated integer arithmetic. For that reason, some people use smaller values for infinity that can be added or multiplied several times without overflow. What intrigues me is the fact that it's extremely common to see (especially in programming competitions):
const int INF = 0x3f3f3f3f;
Why is that number special? Its binary representation is:
00111111001111110011111100111111
I don't see any especially interesting property here. I see it's easy to type, but if that were the reason, almost anything would do (0x3e3e3e3e, 0x2f2f2f2f, etc.). It can be added once without overflow, which allows for:
a = min(INF, b + c);
But all the other constants would do, then. Googling only shows me a lot of code snippets that use that constant, but no explanations or comments.
Can anyone spot it?
I found some evidence about this here (original content in Chinese); the basic idea is that 0x7fffffff is problematic since it's already "the top" of the range of 4-byte signed ints; so, adding anything to it results in negative numbers; 0x3f3f3f3f, instead:
is still quite big (same order of magnitude of 0x7fffffff);
has a lot of headroom; if you say that the valid range of integers is limited to numbers below it, you can add any "valid positive number" to it and still get an infinite value (i.e. something >= INF). Even INF + INF doesn't overflow. This lets you keep it "under control":
a+=b;
if(a>INF)
a=INF;
is a repetition of equal bytes, which means you can easily memset stuff to INF;
also, as @Jörg W Mittag noticed above, it has a nice ASCII representation ("????"), which makes it easy both to spot it in memory dumps and to write it directly into memory.
I may or may not be one of the earliest discoverers of 0x3f3f3f3f. I published a Romanian article about it in 2004 (http://www.infoarena.ro/12-ponturi-pentru-programatorii-cc #9), but I've been using this value since 2002 at least for programming competitions.
There are a few reasons for it:
0x3f3f3f3f + 0x3f3f3f3f doesn't overflow int32. For this, some use 100000000 (one hundred million).
one can set an array of ints to infinity by doing memset(array, 0x3f, sizeof(array))
0x3f3f3f3f is the ASCII representation of the string ????.
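A small sketch showing both properties in action (the array name and size are arbitrary):

#include <stdio.h>
#include <string.h>

int main(void)
{
    const int INF = 0x3f3f3f3f;
    int dist[16];

    memset(dist, 0x3f, sizeof dist);   /* every byte 0x3f => every int 0x3f3f3f3f */
    printf("%d\n", dist[0]);           /* 1061109567 */
    printf("%d\n", INF + INF);         /* 2122219134, still fits in a 32-bit int */
    return 0;
}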
Krugle finds 48 instances of that constant in its entire database. 46 of those instances are in a Java project, where it is used as a bitmask for some graphics manipulation.
1 project is an operating system, where it is used to represent an unknown ACPI device.
1 project is again a bitmask for Java graphics.
So, in all of the projects indexed by Krugle, it is used 47 times because of its bitpattern, once because of its ASCII interpretation, and not a single time as a representation of infinity.

Normalising 18 bit input between 0-9999

I'm writing a program in which I need to normalise an 18-bit input to the range 0-9999. This is something I have never come across before.
I have searched the internet, and correct me if I am wrong here, but is this as simple as converting the 18-bit binary (000000000000000000) input into a natural number and then dividing it by 1000?
Is there a different and more efficient method?
Thank you
No, what you want to do is multiply your input by 0.03814697265.
The reasoning is pretty simple: you take your range of inputs (0..2^18) and split it into 10000 "slices". Each slice then covers just over 26 input values (2^18 / 10000 ≈ 26.2). If you divide your input by this slice size (equivalently, multiply it by 10000/2^18), you'll get your number in the 0..9999 range.
Edit: depending on your background, you may need to know that here I use ^ to mean exponentiation. That might be moot since this question is tagged C, which has no first-class concept of exponentiation, but it's definitely not XOR!
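If floating point is unwanted, the same scaling works in integer arithmetic, since 262143 * 10000 still fits in an unsigned 32-bit value (a sketch; the function name is mine and the input is assumed to be an already-assembled unsigned integer):

#include <stdint.h>

/* Map a raw 18-bit reading (0..262143) onto 0..9999. */
uint32_t scale_18bit(uint32_t raw)
{
    return (raw * 10000u) >> 18;       /* i.e. multiply by 10000 / 2^18 */
}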

casting doubles to integers in order to gain speed

In Redis (http://code.google.com/p/redis) there are scores associated with elements, used to keep those elements sorted. These scores are doubles, even though many users actually sort by integers (for instance Unix times).
When the database is saved we need to write these doubles to disk. This is what is currently used:
snprintf((char*)buf+1,sizeof(buf)-1,"%.17g",val);
Additionally, infinity and not-a-number conditions are checked so they can also be represented in the final database file.
Unfortunately, converting a double into its string representation is pretty slow, while we have a function in Redis that converts an integer into its string representation much faster. So my idea was to check whether a double can be cast to an integer without loss of data, and if so, use that function to turn the integer into a string.
For this to provide a good speedup, of course, the test for integer "equivalence" must be fast. So I used a trick that is probably undefined behavior but that has worked very well in practice. Something like this:
double x = ... some value ...
if (x == (double)((long long)x))
use_the_fast_integer_function((long long)x);
else
use_the_slow_snprintf(x);
In my reasoning, the double cast above converts the double into a long long, and then back into a double. If the value fits in the range and there is no fractional part, the number will survive the round trip and be exactly the same as the initial number.
As I wanted to make sure this will not break things in some system, I joined #c on freenode and I got a lot of insults ;) So I'm now trying here.
Is there a standard way to do what I'm trying to do without going outside ANSI C? Otherwise, is the above code supposed to work on all the POSIX systems that Redis currently targets? That is, archs where Linux / Mac OS X / *BSD / Solaris run nowadays?
What I can add in order to make the code saner is an explicit check for the range of the double before trying the cast at all.
Thank you for any help.
Perhaps some old-fashioned fixed-point math could help you out. If you converted your double to a fixed-point value, you would still get decimal precision, and converting to a string is almost as easy as with ints, with the addition of a single shift.
Another thought would be to roll your own snprintf() function. Doing the conversion from double to int is natively supported by many FPU units so that should be lightning fast. Converting that to a string is simple as well.
Just a few random ideas for you.
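If it helps, here is a sketch of that fixed-point route using a 16.16 format; the format, the four decimal digits, and the non-negative assumption are all arbitrary choices of mine:

#include <stdio.h>

/* Format a non-negative double as "integer.frac" via 16.16 fixed point. */
void format_fixed(double x, char *buf, size_t n)
{
    long long fx = (long long)(x * 65536.0);        /* to 16.16 fixed point */
    long long ip = fx >> 16;                        /* integer part */
    long long fp = ((fx & 0xFFFF) * 10000) >> 16;   /* 4 fractional digits */
    snprintf(buf, n, "%lld.%04lld", ip, fp);
}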
The problem with doing that is that the comparisons won't work out the way you'd expect. Just because one floating point value is less than another doesn't mean that its representation as an integer will be less than the other's. Also, I see you comparing one of the (former) double values for equality. Due to rounding and representation errors in the low-order bits, you almost never want to do that.
If you are just looking for some kind of key to do something like hashing on, it would probably work out fine. If you actually care about which values really have greater or lesser value, it's a bad idea.
I don't see a problem with the casts, as long as x is within the range of long long. Maybe you should check out the modf() function, which separates a double into its integral and fractional parts. You can then check the integral part against (double)LLONG_MIN and (double)LLONG_MAX to make sure, though there may be difficulties with the precision of double.
But before doing anything of this, have you made sure it actually is a bottleneck by measuring its performance? And is the percentage of integer values high enough that it would really make a difference?
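For what it's worth, a sketch of the modf-plus-range-check idea; the +/-2^62 cutoff is my own conservative choice, picked to stay well away from the inexact conversion of LLONG_MAX to double:

#include <math.h>

/* Return 1 and store the value in *out if x is an integral double that can
   safely be cast to long long; return 0 otherwise. */
static int try_as_longlong(double x, long long *out)
{
    double ipart;
    if (modf(x, &ipart) != 0.0)                     /* fractional part present */
        return 0;
    if (ipart < -4611686018427387904.0 ||           /* below -2^62 */
        ipart >  4611686018427387904.0)             /* above  2^62 */
        return 0;
    *out = (long long)ipart;
    return 1;
}

As a side effect it also rejects NaN and infinities, which you say are handled separately anyway.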
Your test is perfectly fine (assuming you have already separately handled infinities and NaNs by this point); it's probably one of the very few occasions when you really do want to compare floats for equality. It doesn't invoke undefined behaviour: even if x is outside the range of long long, you'll just get an "implementation-defined result", which is OK here.
The only fly in the ointment is that negative zero will end up as positive zero (because negative zero compares equal to positive zero).
