Is it possible to convert a char* to uppercase without traversing it character by character in a loop?
Assumption:
1. The char pointer points to a fixed-size string array.
2. The array pointed to contains only lowercase characters.
In the ASCII encoding, converting lowercase to uppercase amounts to clearing the bit of weight 32 (i.e. 20H, the code of the space character); setting that bit would do the opposite conversion.
With a bitwise operator,
Char &= ~0x20;
You can process several characters at a time by mapping longer data types on the array. For instance, to convert an array of 11 characters,
int ToUpper = ~0x20202020;   /* clears bit 5 in each of four bytes */
*(int*)  &Char[0] &= ToUpper;
*(int*)  &Char[4] &= ToUpper;
*(short*)&Char[8] &= ToUpper;
Char[10] &= ~0x20;
You can go to 64 bit ints and even larger (up to 512 bits = 64 characters at a time) with the SIMD intrinsics (SSE, AVX).
If your code allows it, it is better to extend the buffer length to the next larger data type so that all bytes can be updated in a single operation. But don't forget to restore the terminating null.
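For a buffer whose length is only known at run time, the same idea can be wrapped in a small helper. This is only a minimal sketch under the question's assumptions (ASCII, lowercase-only input); the helper name and the use of memcpy to sidestep alignment and strict-aliasing issues are my additions, not part of the answer above.
#include <stdint.h>
#include <string.h>

/* Clear bit 5 of every byte, eight characters at a time. */
static void to_upper_ascii(char *buf, size_t len)
{
    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, buf + i, 8);        /* load 8 chars as one word */
        w &= 0xDFDFDFDFDFDFDFDFull;    /* clear the 0x20 bit in each byte */
        memcpy(buf + i, &w, 8);        /* store them back */
    }
    for (; i < len; ++i)               /* leftover tail bytes */
        buf[i] &= 0xDF;
}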
I have a char array:
char message[];
And an 8-bit integer
uint8_t remainder;
I want to treat both just as arrays of bits and subtract them like:
message - remainder
and treat the result as a char array:
An example would be
char* message = "ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ";
// Equivalent to a 512-bit array of only 1s
uint8_t remainder = 1;
// Subtract here: message - remainder
printf("%s", message);
// Output: "ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿþ"
// As the 512-bit value would still be all 1s except for the lowest-order bit, which is now 0, so that final char would be 254 instead of 255
Is there a possible way to do it?
I thought about converting the char array to an int, but the problem is that it is usually a 64-byte array, so I cannot treat it as an int. I think the approach revolves around using bitwise operators, but I haven't figured out how to subtract them yet.
Any suggestions?
As requested:
1. Read from the byte array (chars) as a type-cast integer. Beware that endianness may cause this to work incorrectly on some systems and you may have to byte-swap. Also beware that if the data is not aligned to word boundaries, some systems may crash.
2. Compare your remainder against that integer. If integer >= remainder, there is no carry: just subtract the values and type-cast/store the integer back into the char array. The same caveats as above apply.
3. If the remainder is bigger, still do the subtraction and the store, but then place a 1 into remainder (the borrow to propagate into the next word).
4. Loop back to step 1, reading in the next word, until you exit because there is no carry left to propagate or you run out of words to read.
If the data is non-aligned, not a multiple of the word size, etc., you may need to do this per byte instead (see the sketch below), but you have stated this is not the case.
Enjoy.
(Note: Using a BigNum type library is highly recommended over doing it yourself. Someday, this code may need to be ported, and this method is highly likely to break when that occurs...)
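If you do roll it yourself, a minimal byte-wise sketch of the borrow propagation (the per-byte fallback mentioned in the last step) could look like the following; the function name and signature are illustrative only.
#include <stdint.h>
#include <stddef.h>

/* Treat `message` as one big big-endian number of `len` bytes and
   subtract the 8-bit `remainder`, propagating the borrow from the
   least significant (last) byte upward. */
static void subtract_u8(unsigned char *message, size_t len, uint8_t remainder)
{
    unsigned borrow = remainder;
    for (size_t i = len; i-- > 0 && borrow != 0; ) {
        unsigned cur = message[i];
        message[i] = (unsigned char)(cur - borrow);  /* wraps modulo 256 */
        borrow = (cur < borrow) ? 1u : 0u;           /* borrow into the next byte */
    }
}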
Firstly, it is generally a bad idea to do so :-( (it usually causes a buffer overflow somewhere).
Secondly, since your integer of interest is an 8-bit one, it is the same size as a single char. Therefore, if you do want to implement it, just do this:
size_t len = strlen(message);
if ((unsigned char)message[len-1] < remainder) {
    size_t i;
    for (i = len - 1; i > 0; i--) {
        if (message[i-1]) {                      /* found a nonzero byte to borrow from */
            message[i-1]--;
            for (size_t j = i; j < len - 1; j++)
                message[j] = (char)255;          /* bytes borrowed through wrap to 255 */
            message[len-1] = (char)((unsigned char)message[len-1] + 256 - remainder);
            break;
        }
    }
    if (i == 0) {
        /* ERROR - message is less than remainder */
    }
}
else {
    message[len-1] -= remainder;
}
and you are done.
Notice that the casts in (char)((unsigned char)message[len-1] + 256 - remainder) may not all be strictly necessary; they are there to make sure the addition and subtraction are performed on plain integer values.
Please tell me if I am wrong: if a number is stored as characters, it will take 1 byte per character of the number (not 4 bytes)?
For example, if I make an int variable holding the number 8 and a char variable holding '8', will the int variable have consumed more memory?
And if I create an int variable holding the number 12345 and a character array "12345", will the character array have consumed more memory?
And in text files, if numbers are stored, are they stored as integers or as characters?
Thank you.
Yes, your guesses are all correct.
An int will always take up sizeof(int) bytes, so the number 8 stored as an int takes 4 bytes (assuming a 32-bit int), whereas '8' stored as a char takes up one byte.
The way to think about your last question, IMO, is that data is stored as bytes. char and int are ways of interpreting bytes, so in text files you write bytes, but if you want to write a human-readable "8" into a text file, you must write it in some encoding, such as ASCII, where bytes correspond to human-readable characters. So, to write "8" you would need to write the byte 0x38 (the ASCII code for '8').
So, in files you have data, not ints or chars.
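A small illustration of these points, assuming a typical platform with a 4-byte int and ASCII encoding:
#include <stdio.h>
#include <string.h>

int main(void)
{
    int  n = 12345;
    char s[] = "12345";

    printf("sizeof(n) = %zu\n", sizeof n);             /* usually 4 */
    printf("sizeof(s) = %zu\n", sizeof s);             /* 6: five digits + '\0' */
    printf("strlen(s) = %zu\n", strlen(s));            /* 5 */
    printf("'8' is the byte 0x%02X\n", (unsigned)'8'); /* 0x38 in ASCII */
    return 0;
}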
When we consider the memory needed for an int or for a char, we think of it as a whole. Integers are commonly stored using a word of memory, which is 4 bytes or 32 bits, so (unsigned) integers from 0 up to 4,294,967,295 (2^32 - 1) can be stored in an int-sized variable. Since we need 32 bits in total (32/8 = 4), we need 4 bytes for an int variable.
But to store an ASCII character we need only 7 bits. The ASCII table has 128 characters, with values from 0 through 127; thus, 7 bits are sufficient to represent a character in ASCII. (However, most computers typically reserve one bit more, i.e. 8 bits, for an ASCII character.)
And about your question:
and if I create an int variable holding the number 12345 and a character array "12345", will the character array have consumed more memory?
Yes, from the above it is true. In the first case (the int value) it needs just 4 bytes, and in the second case it needs 5 bytes in total. The reason is that in the first case 12345 is a single integer value, while in the second case "12345" is 5 ASCII characters. In the second case you actually need one more byte to hold the '\0' character that marks the end of the string.
When an int is defined, the memory allocated depends on the compiler and platform (it can be 4 to 8 bytes). The number assigned to the int is stored as is.
e.g. int a = 86;
The number 86 would be stored in the memory allocated for a.
When a char is defined, there is a number assigned to each character. When the character needs to be printed, the character itself is printed, but in memory it is stored as that number. These numbers are the ASCII codes (and there are other encodings as well).
One byte is allocated for the storage, because with 1 byte you can represent 2^8 = 256 symbols.
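To make that concrete (assuming ASCII, where 'V' has the code 86):
#include <stdio.h>

int main(void)
{
    int  a = 86;   /* the value 86, stored in (typically) 4 bytes */
    char c = 'V';  /* the character 'V', stored as the single byte 86 in ASCII */

    printf("%d %d\n", a, c);  /* prints: 86 86 */
    printf("%c\n", c);        /* prints: V */
    return 0;
}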
if a number is stored as characters it will take 1 byte per character of the number (not 4 bytes)? for example if I make an int variable holding the number 8 and a char variable holding '8', will the int variable have consumed more memory?
Yes, since it is guaranteed that (assuming 8-bit bytes):
sizeof(char) == 1
sizeof(int) >= 2
if I create an int variable holding the number 12345 and a character array "12345", will the character array have consumed more memory?
Correct. See the difference between:
strlen("12345") == 5
sizeof(12345) >= 2
Of course, for small numbers like 7, it is not true:
strlen("7") == 1
sizeof(7) >= 2
in text files, if numbers are stored, are they stored as integers or as characters?
To read any data (be it in a file or on a clay tablet!) you need to know its encoding.
If it is a text file, then typically the numbers will be encoded using characters, possibly in their decimal representation.
If it is a binary file, then you may find them written as they are stored in memory for a particular computer.
In short, it depends.
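A small sketch of the difference, with hypothetical file names, assuming a 4-byte int and ASCII:
#include <stdio.h>

int main(void)
{
    int n = 8;

    FILE *bin = fopen("out.bin", "wb");
    if (!bin) return 1;
    fwrite(&n, sizeof n, 1, bin);  /* binary: the raw in-memory bytes of the int */
    fclose(bin);

    FILE *txt = fopen("out.txt", "w");
    if (!txt) return 1;
    fprintf(txt, "%d", n);         /* text: the single character '8', byte 0x38 in ASCII */
    fclose(txt);

    return 0;
}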
I have a uint64 value that I want to convert into a string because it has to be inserted as the payload of an HTTP POST request.
I've already tried many solutions (ltoa, this solution), but my problem still remains.
My function is the following:
void check2(char* fingerprint, guint64 s_id) {
    //stuff
    char poststr[400] = "action=CheckFingerprint&sessionid=";
    //convert s_id to, for example, char* myChar
    strcat(poststr, myChar);
}
I want to convert s_id to char*. I've tried:
1) char ses[8]; ltoa(s_id,ses,10) but I have a segmentation fault;
2) char *buf; sprintf(buf, "%" PRIu64, s_id);
I'm working with an API, and I have seen that when this guint64 variable is printed, it is done as follows:
JANUS_LOG(LOG_INFO, "Creating new session: %"SCNu64"\n", session_id);
sprintf is the right way to go, with an unsigned 64-bit format specifier.
You'll need to allocate enough space for 16 hex digits, the leading 0x, and the null byte, which comes to 19; here I've rounded it up to 20 for no good reason other than that it feels better than 19.
char foo[20];
sprintf(foo, "0x%016" PRIx64, (uint64_t)numberToConvert);
This will print the number in hex, with a leading 0x and zero-padding up to 16 digits. You do not need the cast if numberToConvert is already a uint64_t.
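Put together as a self-contained snippet (the value below is just an example):
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    uint64_t numberToConvert = 123456789;  /* example value */
    char foo[20];

    sprintf(foo, "0x%016" PRIx64, numberToConvert);
    printf("%s\n", foo);  /* prints: 0x00000000075bcd15 */
    return 0;
}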
I have a uint64 value that I want to convert into char* because it has to be inserted as the payload of an HTTP POST request.
What you have is a fundamental misunderstanding.
To insert a text representation of your value into a document, you need to convert it to a sequence of characters, which is quite a different thing from a pointer to a character (char *). One of your options, which seems to be what you're really after, is to convert the value to a sequence of characters in the form of a C string -- that is, a null-terminated array of characters. You would then have or be able to obtain a pointer to the first character in the sequence.
That explains what's wrong with this attempted solution:
char *buf;
sprintf(buf, "%" PRIu64, s_id);
You are trying to write the string representation of your number into the array pointed-to by buf, but it doesn't point to one. Not having been initialized or assigned, its value is indeterminate.
Even if your buf pointed to an array, it is essential that the array be long enough to accommodate all the digits of the value's decimal representation, plus a terminator. That's probably what's wrong with your other attempt:
char ses[8]; ltoa(s_id,ses,10)
An unsigned, 64-bit binary number may require up to 20 decimal digits, plus you need space for a terminator. The array you're providing is not nearly large enough, unless you can be confident that the actual values you're going to write will not exceed 9,999,999 (which is well within the range of a 32-bit integer).
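Applied to the original check2 function, a minimal sketch could look like this (assuming guint64 is GLib's 64-bit unsigned type; the 21-byte buffer covers the 20 decimal digits of the largest 64-bit value plus the terminator):
#include <inttypes.h>
#include <stdio.h>
#include <string.h>
#include <glib.h>   /* for guint64 */

void check2(char *fingerprint, guint64 s_id) {
    char poststr[400] = "action=CheckFingerprint&sessionid=";
    char myChar[21];  /* 20 digits for 2^64 - 1, plus '\0' */

    snprintf(myChar, sizeof myChar, "%" PRIu64, (uint64_t)s_id);
    strcat(poststr, myChar);
    /* ... use poststr as the POST payload ... */
    (void)fingerprint;
}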
Why is a char 1 byte long in C? Why is it not 2 bytes or 4 bytes long?
What is the basic logic behind it to keep it as 1 byte? I know in Java a char is 2 bytes long. Same question for it.
char is 1 byte in C because it is specified so in the standard.
The most probable logic is: the (binary) representation of a char (in the standard character set) can fit into 1 byte. At the time of the initial development of C, the most commonly available standards were ASCII and EBCDIC, which needed 7- and 8-bit encodings, respectively. So, 1 byte was sufficient to represent the whole character set.
OTOH, during the time Java came into the picture, the concepts of extended character sets and Unicode were present. So, to be future-proof and support extensibility, char was given 2 bytes, which is capable of handling the extended character set values.
Why would a char hold more than 1 byte? A char normally represents an ASCII character. Just have a look at an ASCII table: there are only 256 characters in the (extended) ASCII code. So you only need to represent numbers from 0 to 255, which comes down to 8 bits = 1 byte.
Have a look at an ASCII Table, e.g. here: http://www.asciitable.com/
That's for C. When Java was designed, they anticipated that in the future it would be enough for any character (also Unicode) to be held in 16 bits = 2 bytes.
It is because the C language is 37 years old and there was no need to have more bytes for one char, as only the 128 ASCII characters were used (http://en.wikipedia.org/wiki/ASCII).
When C was developed (the language first appeared around 1972), the two primary character encoding standards were ASCII and EBCDIC, which were 7- and 8-bit encodings for characters, respectively. And memory and disk space were both greater concerns at the time; C was popularized on machines with a 16-bit address space, and using more than one byte per character for strings would have been considered wasteful.
By the time Java came along (mid 1990s), some with vision were able to perceive that a language could make use of an international standard for character encoding, and so Unicode was chosen for its definition. Memory and disk space were less of a problem by then.
The C language standard defines an abstract machine where all objects occupy an integral number of abstract storage units, each made up of some fixed number of bits (specified by the CHAR_BIT macro in limits.h). Each storage unit must be uniquely addressable. A storage unit is defined as the amount of storage occupied by a single character from the basic character set [1]. Thus, by definition, the size of the char type is 1.
Eventually, these abstract storage units have to be mapped onto physical hardware. Most common architectures use individually addressable 8-bit bytes, so char objects usually map to a single 8-bit byte.
Usually.
Historically, native byte sizes have been anywhere from 6 to 9 bits wide. In C, the char type must be at least 8 bits wide in order to represent all the characters in the basic character set, so to support a machine with 6-bit bytes, a compiler may have to map a char object onto two native machine bytes, with CHAR_BIT being 12. sizeof (char) is still 1, so types with size N will map to 2 * N native bytes.
[1] The basic character set consists of all 26 English letters in both upper- and lowercase, the 10 digits, punctuation and other graphic characters, and control characters such as newlines, tabs, form feeds, etc., all of which fit comfortably into 8 bits.
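A quick check of these two quantities on any given implementation:
#include <stdio.h>
#include <limits.h>

int main(void)
{
    printf("sizeof(char) = %zu\n", sizeof(char));  /* always 1, by definition */
    printf("CHAR_BIT     = %d\n", CHAR_BIT);       /* 8 on most common platforms */
    return 0;
}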
You don't need more than a byte to represent the whole ASCII table (128 characters).
But there are other C types which have more room to contain data, like int (typically 4 bytes) or long double (often 12 or 16 bytes).
All of these contain numerical values (even chars! Even if they're represented as "letters", they're "numbers"; you can compare them, add them, ...).
These are just different standard sizes, like cm and m for length.
Why does int a = 'adf'; compile and run in C?
The literal 'adf' is a multi-byte character constant. Its value is platform dependent. Don't use it.
For example, on some platforms a 32-bit unsigned integer could take the value 0x00616466, on another it could be 0x66646100, and on yet another it could be 0x84860081...
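A tiny demonstration (the printed value is implementation-defined, and most compilers warn about multi-character constants; 0x616466 is what GCC and Clang typically produce for 'adf'):
#include <stdio.h>

int main(void)
{
    int a = 'adf';                 /* multi-character constant */
    printf("0x%X\n", (unsigned)a); /* implementation-defined; often 0x616466 */
    return 0;
}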
This, as Kerrek said, is a multi-byte character constant. It works because each character takes up 8 bits. 'adf' is 3 characters, which is 24 bits. An int is usually large enough to contain this.
But all of the above is platform dependent, and could be different from architecture to architecture. This kind of thing is still used in ancient Apple code; I can't quite remember where, although file creator codes ring a bell.
Note the difference in syntax between " and '.
char *x = "this is a string"; // the value assigned to x is a pointer to the string in memory
char y = '!';   // the value assigned to y is the numerical character value of '!'
char z = 'asd'; // the value of z is the numerical value of the multi-character constant, which can in theory be expressed as an int if it's short enough
It works just because 'adf' is 3 ASCII characters and thus 3 bytes long, and your platform's int is 24 bits or larger. It would fail on a 16-bit system, for instance.
It's also worth remembering that although sizeof(char) will always return 1, depending on the platform and compiler more than 1 byte of memory space may effectively be set aside for a char; hence for
struct st
{
    int a;
    char c;
};
when you take sizeof(struct st), a number of 32-bit systems will return 8. This is because the system pads out the single byte for char c to 4 bytes.
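A quick way to see the padding on a given system (the numbers shown are typical, not guaranteed):
#include <stdio.h>
#include <stddef.h>

struct st
{
    int a;
    char c;
};

int main(void)
{
    printf("sizeof(struct st)      = %zu\n", sizeof(struct st));      /* commonly 8 */
    printf("offsetof(struct st, c) = %zu\n", offsetof(struct st, c)); /* commonly 4 */
    return 0;
}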
ASCII. Every character has a numerical value. Halfway through this tutorial is a description, if you need more information: http://en.wikibooks.org/wiki/C_Programming/Variables
Edit:
char letter1 = 'a'; /* stores the character 'a' directly */
char letter2 = 97;  /* in ASCII, 97 = 'a' */
This is considered by some to be extremely bad practice if we are using it to store a character (rather than a small number), in that anyone reading the code is forced to look up which character corresponds to the number 97 in the encoding scheme. In the end, letter1 and letter2 both store the same thing – the letter "a" – but the first method is clearer, easier to debug, and much more straightforward.
One important thing to mention is that characters for numerals are represented differently from their corresponding number, i.e. '1' is not equal to 1.
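For example (assuming ASCII):
#include <stdio.h>

int main(void)
{
    printf("%d\n", '1');        /* 49: the character code of '1' in ASCII */
    printf("%d\n", '1' - '0');  /* 1: the usual digit-to-number conversion */
    printf("%d\n", '1' == 1);   /* 0: the character '1' is not the number 1 */
    return 0;
}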