Objective-C: Why is strtol method used in this code I inherited?

I recently inherited some Objective-C code and am really confused as to what it's actually doing?
1) I don't understand why the char byte_chars[3] is being populated by a '0' first and then a 0 at the end?
2) I don't understand why they used unsigned char for wholeByte, but then put a long into it?
3) Part of me is a little confused as to what the strtol method does here, it takes 3 byte_chars bytes (which are hexadecimal), and then converts them to a long?
- (NSString *)starFromFlags:(NSString *)flags {
    unsigned char wholeByte;
    char byte_chars[3];
    NSString *star = @"NO";
    byte_chars[0] = '0';
    byte_chars[1] = [flags characterAtIndex:1];
    byte_chars[2] = 0;
    wholeByte = strtol(byte_chars, NULL, 16);
    if (wholeByte & 0x01) star = @"YES";
    return star;
}

To 1:
The 0 at the end is the same as '\0' (both have the value zero), which means "string ends here". You have to put it at the end to terminate the string.
The '0' (the digit zero) at the beginning would signal an octal value (not hexadecimal, that would be "0x") if strtol() were asked to detect the base itself. But as correctly mentioned by CRD (see comments), that is overridden here by the third argument passed to strtol(), which fixes the base at 16. So up to here you have a two-character string with '0' as the first character and the hexadecimal representation of the flags as the second character.
The reason for this is probably that flags contains a digit from '0' to 'f'₁₆/'F'₁₆ (0 to 15₁₀).
To 2:
Since the values that can result from the conversion are in the range [0…15], the long value will be one of these values. You can store it in a single byte and do not need a whole long. However, strtol() always returns a long.
To 3:
Yes, it is a conversion from a string containing a number into a number. I.e. the string "06" will be converted to the number 6.
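For anyone who wants to try this outside Objective-C, here is a minimal plain C sketch of the same steps. It assumes the flag character is ASCII, and the name low_bit_of_hex_flag is made up for illustration; it is not part of the inherited code.

```c
#include <stdlib.h>

/* Hypothetical helper mirroring the inherited method: builds the
 * two-character string "0<flag>" and parses it as hexadecimal.
 * Returns 1 if the least significant bit of the digit is set, else 0. */
int low_bit_of_hex_flag(char flag)
{
    char byte_chars[3];
    byte_chars[0] = '0';   /* leading zero, does not change the value */
    byte_chars[1] = flag;  /* the hex digit to decode */
    byte_chars[2] = '\0';  /* string terminator */

    /* Base 16 is given explicitly, so the leading '0' is just a digit.
     * If flag is not a hex digit, parsing stops after the '0' and yields 0. */
    unsigned char whole_byte = (unsigned char)strtol(byte_chars, NULL, 16);
    return whole_byte & 0x01;
}
```

With this sketch, low_bit_of_hex_flag('7') gives 1 and low_bit_of_hex_flag('6') gives 0, because 7 is odd and 6 is even.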

Amin Negm-Awad has explained what the code does (with a minor confusion over octal), but as to answering your question:
Why is strtol method used in this code I inherited?
we can only guess.
It appears that the second character in the string is a hexadecimal digit being used for (up to) 4 flag bits, and the method is testing whether the least significant of these is set. A simpler way to do this might be:
- (NSString *)starFromFlags:(NSString *)flags
{
    unichar flag = [flags characterAtIndex:1];
    if (flag > 127 || !isxdigit(flag)) // check flag is in ASCII range and hex digit
        return @"INVALID";
    else
        return digittoint(flag) & 0x1 ? @"YES" : @"NO"; // check if flag is odd hex
}
isxdigit() and digittoint() are C library functions (just like strtol()), though digittoint() is a BSD extension rather than standard C. Use man isxdigit in the Terminal for the documentation (unless you are using an older version of Xcode which has the documentation for these; Apple unhelpfully removed the docs in the latest versions). The first checks if a character is a hexadecimal digit, the second returns the integer equivalent. The > 127 check is minimal protection against non-"ASCII" characters in your string.
Note: An NSString presents itself as a sequence of UTF-16 code units, so characterAtIndex: returns a unichar (a 16-bit type), hence the type of flag. However, this code doesn't handle arbitrary Unicode strings correctly. If your strings are "ASCII" it will work.
The above function actually does more error checking than the original. If you are happy to reduce the error checking you can just use:
- (NSString *)starFromFlags:(NSString *)flags
{
    return digittoint([flags characterAtIndex:1]) & 0x1 ? @"YES" : @"NO";
}
This will return @"YES" if and only if the flag is a hex digit and its LSB is set; if it isn't a hex digit or the LSB isn't set it returns @"NO". This works because digittoint() returns 0 if its argument isn't a hex digit.
So why did the original programmer use strtol()? Maybe they didn't know about digittoint().
HTH

Related

Check if array is ASCII

How do I check in C if an array of uint8 contains only ASCII elements?
If possible please refer me to the condition that checks if an element is ASCII or not
Your array elements are uint8, so must be in the range 0-255
For standard ASCII character set, bytes 0-127 are used, so you can use a for loop to iterate through the array, checking if each element is <= 127.
If you're treating the array as a string, be aware of the 0 byte (null character), which marks the end of the string
From your example comment, this could be implemented like this:
int checkAscii (uint8 *array) {
for (int i=0; i<LEN; i++) {
if (array[i] > 127) return 0;
}
return 1;
}
It breaks out early at the first element greater than 127.
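A variant of the same loop that takes the length as a parameter instead of relying on a LEN macro could look like the sketch below. The name is_ascii_buffer and the use of uint8_t from stdint.h (rather than the question's uint8 typedef) are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if every byte in buf[0..len-1] is in the ASCII range 0..127,
 * otherwise 0. An empty buffer counts as ASCII. */
int is_ascii_buffer(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] > 127) {
            return 0;  /* stop at the first non-ASCII byte */
        }
    }
    return 1;
}
```

Passing the length explicitly also sidesteps the question of whether a 0 byte inside the array should be treated as a string terminator.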
All valid ASCII characters have value 0 to 127, so the test is simply a value check or 7-bit mask. For example given the inclusion of stdbool.h:
bool is_ascii = (ch & ~0x7f) == 0 ;
Possibly however you intended only printable ASCII characters (excluding control characters). In that case, given inclusion of ctype.h:
bool is_printable_ascii = (ch & ~0x7f) == 0 &&
                          (isprint(ch) || isspace(ch)) ;
Your intent may be slightly different in terms of what characters you intend to include in your set - in which case other functions in ctype.h may be applied or simply test the values for value or range to include/exclude.
Note also that the ASCII set is very restricted in international terms. The ANSI or "extended ASCII" set uses locale specific codepages to define the glyphs associated with codes 128 to 255. That is to say the set changes depending on language/locale settings to accommodate different language characters, accents and alphabets. In modern systems it is common instead to use a multi-byte Unicode encoding (of which there are several with either fixed or variable length codes). UTF-8 encoding is a variable width encoding where all single byte encodings are also ASCII codes. As such, while it is trivial to determine whether data is entirely within the ASCII set, it does not follow that the data is therefore text. If the test is intended to distinguish binary data from text, it will fail in a great many scenarios unless you can guarantee a priori that all text is restricted to the ASCII set - and that is application specific.
Strictly speaking, you cannot check if something is "ASCII" with standard C, because C does not specify which character set a compiler uses. Various other more or less exotic character sets exist or have existed.
UTF-8, for example, is a superset of ASCII. Older, dysfunctional 8-bit character sets have existed too, such as EBCDIC and "Extended ASCII". Telling whether something is, for example, ASCII or EBCDIC can't be done trivially; it takes a long line of value checks.
With standard C, you can only do the following:
You can check if a character is printable, with the function isprint() from ctype.h.
Or you can check that it has only the low 7 bits set, if((ch & 0x7F)==ch).
In C, a char variable holds the numeric code of a character rather than the character itself; on an ASCII system that code is an integer between 0 and 127.
The ASCII values of the lowercase letters run from 97 to 122, and those of the uppercase letters from 65 to 90.
Instead of giving the actual code, I am giving you an example.
You can assign an int to a char directly.
int a = 47;
char c = a;
printf("%c", c);
And this will also work.
printf("%c", a); // a is in valid range
Another approach.
An integer can be assigned directly to a character. A character differs mostly in how it is interpreted and used.
char c = atoi("47");
Try to implement the check once you understand this logic properly.

Writing one byte in struct variable

I'm trying to modify one byte of my structure with the following code :
struct example *dev;
PRINT_OPAQUE_STRUCT(dev);
sprintf((char*) dev + 24, "%x",1);
PRINT_OPAQUE_STRUCT(dev);
The PRINT_OPAQUE_STRUCT is just printing the content of the structure, and is defined in this other topic :
Print a struct in C
The output of this program is :
d046f64f20b3fb4f00000000e047f64f00000000ffffffff000000
d046f64f20b3fb4f00000000e047f64f00000000ffffffff310000
I don't know why I have the value "31" written and not the value "01" as wanted. I tried to replace the second argument of sprintf with "%01x" but it didn't change anything. Anyone knows why?
Thanks!
Well, you are formatting the value 1 as a string. That's what sprintf does. 0x31 is the character code for the character '1'. If you just want to write the byte value 0x01 into your struct, then you don't need sprintf. Just do this:
*((char*)dev + 24) = 1;
or (the same, but with slightly different syntax):
((char*)dev)[24] = 1;
Note also, like one comment below says, sprintf will not just write one single byte. Since it writes a string, and C strings are null-terminated, it will also write a null character ('\0', 0x00) right after the '1'.
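To see the difference on a small buffer, here is a sketch with two hypothetical helpers (write_as_text and write_as_value are names invented for this example, and the offset 1 stands in for the question's offset 24). It assumes the buffer has at least three bytes.

```c
#include <stdio.h>

/* Writes the text form of 1 at offset 1: stores the character '1'
 * (0x31 in ASCII) followed by the terminating '\0', so two bytes change. */
void write_as_text(unsigned char *buf)
{
    sprintf((char *)buf + 1, "%x", 1);
}

/* Writes the numeric value 1 at offset 1: exactly one byte changes. */
void write_as_value(unsigned char *buf)
{
    buf[1] = 1;
}
```

Starting from a buffer filled with 0xff, write_as_text leaves ff 31 00 ff on an ASCII system, while write_as_value leaves ff 01 ff ff.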
I don't know why I have the value "31" written and not the value "01" as wanted.
The reason you see 31 is that your chain of functions interprets the value 1 twice:
First, sprintf interprets it as a character representing a hex digit
Second, PRINT_OPAQUE_STRUCT interprets the value again, now as a hex number
Essentially, sprintf converts 1 to its character representation, which is '1'. On your system, its code is 0x31, hence the output that you get.
You need to remove one of these two interpretations to get your code to print 1. To remove the first interpretation, simply assign 1 to the pointer, like this:
((char*)dev)[24] = 1;
To remove the second interpretation, print with %c instead of %x in PRINT_OPAQUE_STRUCT (which may not be possible).

Differences between int/char arrays/strings

I'm still new to the forum so I apologize in advance for forum - etiquette issues.
I'm having trouble understanding the differences between int arrays and char arrays.
I recently wrote a program for a Project Euler problem that originally used a char array to store a string of numbers, and later called specific characters and tried to use int operations on them to find a product. When I used a char string I got a ridiculously large product, clearly incorrect. Even if I converted what I thought would be compiled as a character (str[n]) to an integer in-line ((int)str[n]) it did the exact same thing. Only when I actually used an integer array did it work.
Code is as follows
for the char string
char str[21] = "73167176531330624919";
This did not work. I got an answer of about 1.5 trillion for an answer that should have been about 40k.
for the int array
int str[] = {7,3,1,6,7,1,7,6,5,3,1,3,3,0,6,2,4,9,1,9};
This is what did work. I took off the in-line type casting too.
Any explanation as to why these things worked/did not work and anything that can lead to a better understanding of these ideas will be appreciated. Links to helpful stuff are as well. I have researched strings and arrays and pointers plenty on my own (I'm self taught as I'm in high school) but the concepts are still confusing.
Side question, are strings in C automatically stored as arrays or is it just possible to do so?
To elaborate on WhozCraig's answer, the trouble you are having does not have to do with strings, but with the individual characters.
Strings in C are stored by and large as arrays of characters (with the caveat that there exists a null terminator at the end).
The characters themselves are encoded in a system called ASCII, which assigns codes between 0 and 127 to characters used in the English language (only). Thus '7' is not stored as 7 but as the ASCII encoding of '7', which is 55.
I think now you can see why your product got so large.
One elegant way to fix would be to convert
int num = (int) str[n];
to
int num = str[n] - '0';
//thanks for fixing, ' ' is used for characters, " " is used for strings
This solution subtracts the ASCII code for '0' from the ASCII code for your character, say '7'. Since the digits are encoded consecutively, this will work (for single-digit numbers). For larger numbers, you should use atoi or strtol from stdlib.h.
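Applied to a whole string of digits, the subtraction could be used like this. It is only a sketch: the function name digit_product is invented here, and it assumes the product fits in a long long.

```c
#include <ctype.h>

/* Multiplies the decimal digits in s together, e.g. "1234" gives 24.
 * Returns -1 if s contains a non-digit character.
 * Assumes the product fits in a long long. */
long long digit_product(const char *s)
{
    long long product = 1;
    for (; *s != '\0'; s++) {
        if (!isdigit((unsigned char)*s)) {
            return -1;
        }
        product *= *s - '0';  /* character code minus the code of '0' */
    }
    return product;
}
```

Multiplying the raw character codes instead (55 for '7', 51 for '3', and so on) is what makes the result explode into the trillions.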
Strings are just character arrays with a null terminating byte.
There is no separate string data type in c.
When using a char as an integer, the numeric ascii value is used. For example, saying something like printf("%d\n", (int)'a'); will result in 97 (the ascii value of 'a') being printed.
You cannot use a string of numbers to do numeric calculations unless you convert it to an integer array. To convert a digit as a character into its integer form, you can do something like this:
char a = '2';
int a_num = a - '0';
//a_num now stores integer 2
This causes the ascii value of '0' (48) to be subtracted from ascii value '2' (50), finally leaving 2.
char str[21] = "73167176531330624919";
this code is equivalent to
char str[21] = {'7','3','1','6','7','1','7','6','5',
                '3','1','3','3','0','6','2','4','9','1','9','\0'};
so what is stored in str is not numbers but chars (whose ASCII codes differ from the digit values).
side question answer - yes and no: string literals are automatically stored as char arrays, but a string has an extra character ('\0') as its last element (whereas a plain char array need not have one).

Char to int conversion in C

If I want to convert a single numeric char to its numeric value, for example, if:
char c = '5';
and I want c to hold 5 instead of '5', is it 100% portable doing it like this?
c = c - '0';
I heard that all character sets store the numbers in consecutive order so I assume so, but I'd like to know if there is an organized library function to do this conversion, and how it is done conventionally. I'm a real beginner :)
Yes, this is a safe conversion. C requires it to work. This guarantee is in section 5.2.1 paragraph 2 of the latest ISO C standard, a recent draft of which is N1570:
Both the basic source and basic execution character sets shall have the following
members:
[...]
the 10 decimal digits
0 1 2 3 4 5 6 7 8 9
[...]
In both the source and execution basic character sets, the
value of each character after 0 in the above list of decimal digits shall be one greater than
the value of the previous.
Both ASCII and EBCDIC, and character sets derived from them, satisfy this requirement, which is why the C standard was able to impose it. Note that letters are not contiguous in EBCDIC, and C doesn't require them to be.
There is no library function to do it for a single char, you would need to build a string first:
int digit_to_int(char d)
{
char str[2];
str[0] = d;
str[1] = '\0';
return (int) strtol(str, NULL, 10);
}
You could also use the atoi() function to do the conversion, once you have a string, but strtol() is better and safer.
As commenters have pointed out though, it is extreme overkill to call a function to do this conversion; your initial approach to subtract '0' is the proper way of doing this. I just wanted to show how the recommended standard approach of converting a number as a string to a "true" number would be used, here.
Try this:
char c = '5';
int i = c - '0';
You should be aware that this doesn't perform any validation against the character - for example, if the character was 'a' then you would get 97 - 48 = 49. Especially if you are dealing with user or network input, you should probably perform validation to avoid bad behavior in your program. Just check the range:
if ('0' <= c && c <= '9') {
i = c - '0';
} else {
/* handle error */
}
Note that if you want your conversion to handle hex digits you can check the range and perform the appropriate calculation.
if ('0' <= c && c <= '9') {
i = c - '0';
} else if ('a' <= c && c <= 'f') {
i = 10 + c - 'a';
} else if ('A' <= c && c <= 'F') {
i = 10 + c - 'A';
} else {
/* handle error */
}
That will convert a single hex character, upper or lowercase independent, into an integer.
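Wrapped in a function, the range checks above might look like the following sketch. The name hex_digit_value and the -1 error return are choices made for this example.

```c
/* Returns the value 0..15 of a single hex digit, or -1 if c is not one.
 * The letter branches assume 'a'..'f' and 'A'..'F' are contiguous,
 * which holds in ASCII and EBCDIC but is not required by the C standard. */
int hex_digit_value(char c)
{
    if ('0' <= c && c <= '9') return c - '0';
    if ('a' <= c && c <= 'f') return 10 + (c - 'a');
    if ('A' <= c && c <= 'F') return 10 + (c - 'A');
    return -1;
}
```

Returning -1 lets the caller tell an invalid character apart from a genuine '0'.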
You can use atoi, which is part of the standard library.
Since you're only converting one character, the function atoi() is overkill. atoi() is useful if you are converting string representations of numbers. The other posts have given examples of this. If I read your post correctly, you are only converting one numeric character. So, you are only going to convert a character that is the range 0 to 9. In the case of only converting one numeric character, your suggestion to subtract '0' will give you the result you want. The reason why this works is because ASCII values are consecutive (like you said). So, subtracting the ASCII value of 0 (ASCII value 48 - see ASCII Table for values) from a numeric character will give the value of the number. So, your example of c = c - '0' where c = '5', what is really happening is 53 (the ASCII value of 5) - 48 (the ASCII value of 0) = 5.
When I first posted this answer, I didn't take into consideration your comment about being 100% portable between different character sets. I did some further looking around and it seems like your answer is still mostly correct. The problem is that you are using a char, which is an 8-bit data type and wouldn't work with all character sets. Read this article by Joel Spolsky on Unicode for a lot more information on Unicode. In this article, he says that he uses wchar_t for characters. This has worked well for him and he publishes his web site in 29 languages. So, you would need to change your char to a wchar_t. Other than that, he says that the characters with values 127 and below are basically the same. This would include characters that represent numbers. This means the basic math you proposed should work for what you were trying to achieve.
Yes. This is safe as long as you are using standard ASCII characters, like you are in this example.
Normally, if there's no guarantee that your input is in the '0'..'9' range, you'd have to perform a check like this:
if (c >= '0' && c <= '9') {
int v = c - '0';
// safely use v
}
An alternative is to use a lookup table. You get simple range checking and conversion with less (and possibly faster) code:
// one-time setup of an array of 256 integers;
// all slots set to -1 except for ones corresponding
// to the numeric characters
static const int CHAR_TO_NUMBER[] = {
-1, -1, -1, ...,
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, // '0'..'9'
-1, -1, -1, ...
};
// Now, all you need is:
int v = CHAR_TO_NUMBER[(unsigned char)c]; // cast so a negative char cannot index out of bounds
if (v != -1) {
// safely use v
}
P.S. I know that this is an overkill. I just wanted to present it as an alternative solution that may not be immediately evident.
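Since the table above is elided, here is one way a complete version could be set up. This is a sketch that fills the table at run time on first use rather than with a static initializer; the names char_to_number, init_table and digit_lookup are invented for this example.

```c
#include <limits.h>

/* One slot per possible unsigned char value. */
static int char_to_number[UCHAR_MAX + 1];
static int table_ready = 0;

/* Fills the table: -1 everywhere except the ten digit characters. */
static void init_table(void)
{
    for (int i = 0; i <= UCHAR_MAX; i++) {
        char_to_number[i] = -1;
    }
    for (int d = 0; d <= 9; d++) {
        char_to_number[(unsigned char)('0' + d)] = d;
    }
    table_ready = 1;
}

/* Returns 0..9 for a digit character, -1 for anything else. */
int digit_lookup(char c)
{
    if (!table_ready) {
        init_table();
    }
    /* The cast keeps a negative char from indexing before the array. */
    return char_to_number[(unsigned char)c];
}
```

The lazy initialization is not thread-safe; a program with several threads would want to call init_table once up front.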
As others have suggested, but wrapped in a function:
int char_to_digit(char c) {
return c - '0';
}
Now just use the function. If, down the line, you decide to use a different method (for performance, charset differences, whatever), you just need to change the implementation; you won't need to change the callers.
This version assumes that c contains a char which represents a digit. You can check that before calling the function, using ctype.h's isdigit function.
Since the ASCII codes for '0', '1', '2', ... run from 48 to 57, they are contiguous. The arithmetic operations convert the char operands to int, so what you are basically doing is:
53 - 48, and hence it stores the value 5, with which you can do any integer operations. Note that when converting back from int to char the compiler gives no error; for an unsigned 8-bit char the value is reduced modulo 256 to fit its range, while for a signed char the result of an out-of-range conversion is implementation-defined.
You can simply use the atol() function:
#include <stdio.h>
#include <stdlib.h>
int main()
{
    const char *c = "5";
    long d = atol(c);   // atol() returns a long
    printf("%ld\n", d);
    return 0;
}

Determine if a string is an integer or a float in ANSI C

Using only ANSI C, what is the best way to, with fair certainty, determine if a C style string is either an integer or a real number (i.e. float/double)?
Don't use atoi and atof as these functions return 0 on failure. Last time I checked 0 is a valid integer and float, therefore no use for determining type.
use the strto{l,ul,ull,ll,d} functions, as these set errno on failure, and also report where the converted data ended.
strtoul: http://www.opengroup.org/onlinepubs/007908799/xsh/strtoul.html
this example assumes that the string contains a single value to be converted.
#include <errno.h>
#include <stdlib.h>

const char* to_convert = "some string";
char* p;
errno = 0;
unsigned long val = strtoul(to_convert, &p, 10);
if (errno != 0) {
    // conversion failed (EINVAL, ERANGE)
} else if (to_convert == p) {
    // conversion failed (no characters consumed)
} else if (*p != 0) {
    // conversion failed (trailing data)
}
Thanks to Jonathan Leffler for pointing out that I forgot to set errno to 0 first.
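The three checks can be packaged into a small helper. The following is a sketch; parse_ulong and its return convention are assumptions made for this example, not something from the answer.

```c
#include <errno.h>
#include <stdlib.h>

/* Parses s as a base-10 unsigned long. Returns 1 and stores the value
 * in *out on success; returns 0 if the conversion failed in any of the
 * three ways checked above. */
int parse_ulong(const char *s, unsigned long *out)
{
    char *end;
    errno = 0;
    unsigned long val = strtoul(s, &end, 10);
    if (errno != 0) return 0;    /* out of range, or invalid on some systems */
    if (end == s) return 0;      /* no characters consumed */
    if (*end != '\0') return 0;  /* trailing data */
    *out = val;
    return 1;
}
```

One thing to keep in mind is that strtoul itself accepts a leading minus sign and negates the value in the unsigned type, so this sketch does not reject input such as "-1".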
Using sscanf, you can be certain if the string is a float or int or whatever without having to special case 0, as is the case with atoi and atof solution.
Here's some example code:
int i;
float f;
if(sscanf(str, "%d", &i) == 1) //It's an int.
...
if(sscanf(str, "%f", &f) == 1) //It's a float.
...
atoi and atof will convert or return a 0 if it can't.
I agree with Patrick_O that the strto{l,ul,ull,ll,d} functions are the best way to go. There are a couple of points to watch though.
Set errno to zero before calling the functions; no function does that for you.
The Open Group page linked to (which I went to before noticing that Patrick had linked to it too) points out that errno may not be set. It is set to ERANGE if the value is out of range; it may be set (but equally, may not be set) to EINVAL if the argument is invalid.
Depending on the job at hand, I'll sometimes arrange to skip over trailing white space from the end of conversion pointer returned, and then complain (reject) if the last character is not the terminating null '\0'. Or I can be sloppy and let garbage appear at the end, or I can accept optional multipliers like 'K', 'M', 'G', 'T' for kilobytes, megabytes, gigabytes, terabytes, ... or any other requirement based on the context.
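As an illustration of the trailing white space and multiplier handling described above, here is a sketch. The function name parse_size, the choice of 1024-based multipliers, and the use of strtoull (C99, so not strictly ANSI C89) are all assumptions made for this example.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Parses a size such as "10", "4K" or "2M " into a byte count.
 * Returns 1 and stores the result in *out, or 0 on error.
 * Does not check for overflow when applying the multiplier and,
 * like strtoull itself, does not reject a leading minus sign. */
int parse_size(const char *s, unsigned long long *out)
{
    char *end;
    errno = 0;
    unsigned long long val = strtoull(s, &end, 10);
    if (errno != 0 || end == s) {
        return 0;  /* out of range or no digits at all */
    }

    unsigned long long mult = 1;
    switch (*end) {
    case 'K': mult = 1024ULL; end++; break;
    case 'M': mult = 1024ULL * 1024; end++; break;
    case 'G': mult = 1024ULL * 1024 * 1024; end++; break;
    case 'T': mult = 1024ULL * 1024 * 1024 * 1024; end++; break;
    default: break;
    }

    while (isspace((unsigned char)*end)) {
        end++;     /* skip trailing white space */
    }
    if (*end != '\0') {
        return 0;  /* reject trailing garbage */
    }
    *out = val * mult;
    return 1;
}
```

With this sketch "4K" yields 4096, while "4Kx" and "K" are rejected.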
I suppose you could step through the string and check if there are any . characters in it. That's just the first thing that popped into my head though, so I'm sure there are other (better) ways to be more certain.
Use strtol/strtoll (not atoi) to check integers.
Use strtof/strtod (not atof) to check doubles.
atoi and atof convert the initial part of the string, but don't tell you whether or not they used all of the string. strtol/strtod tell you whether there was extra junk after the characters converted.
So in both cases, remember to pass in a non-null TAIL parameter, and check that it points to the end of the string (that is, **TAIL == 0). Also check the return value for underflow and overflow (see the man pages or ANSI standard for details).
Note also that strtod/strtol skip initial whitespace, so if you want to treat strings with initial whitespace as ill-formatted, you also need to check the first character.
It really depends on why you are asking in the first place.
If you just want to parse a number and don't know if it is a float or an integer, then just parse a float, it will correctly parse an integer as well.
If you actually want to know the type, maybe for triage, then you should really consider testing the types in the order that you consider the most relevant. Like try to parse an integer and if you can't, then try to parse a float. (The other way around will just produce a little more floats...)
atoi and atof will convert the number even if there are trailing non numerical characters. However, if you use strtol and strtod it will not only skip leading white space and an optional sign, but leave you with a pointer to the first character not in the number. Then you can check that the rest is whitespace.
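Putting the "integer first, then float" order together with the end-pointer check gives something like the sketch below. The enum, the function names, and the decision to allow trailing white space are all assumptions made for this example.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

enum num_kind { NOT_A_NUMBER, IS_INTEGER, IS_REAL };

/* Returns 1 if only white space remains from p to the end of the string. */
static int only_space_left(const char *p)
{
    while (isspace((unsigned char)*p)) {
        p++;
    }
    return *p == '\0';
}

/* Tries an integer parse first, then a floating-point parse. */
enum num_kind classify_number(const char *s)
{
    char *end;

    errno = 0;
    (void)strtol(s, &end, 10);
    if (end != s && errno == 0 && only_space_left(end)) {
        return IS_INTEGER;
    }

    errno = 0;
    (void)strtod(s, &end);
    if (end != s && errno == 0 && only_space_left(end)) {
        return IS_REAL;
    }

    return NOT_A_NUMBER;
}
```

Two consequences of this design are worth knowing: an integer too large for a long falls through to the floating-point branch and is reported as IS_REAL, and on C99 libraries strtod also accepts forms such as "inf", "nan" and hexadecimal floats.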
Well, if you don't feel like using a new function like strtoul, you could just add another strcmp statement to see if the string is 0.
i.e.
if(atof(token) != 0.0 || strcmp(token, "0") == 0)
