convert jchararray to jstring in JNI - c

I am using the JNI code below to convert a jcharArray to a jstring, but I am getting only the first character on Linux.
char *carr =(char*)malloc(length+1);
(*env)->GetCharArrayRegion(env, ch, 0, length, carr);
return (*env)->NewStringUTF(env, carr);

GetCharArrayRegion copies Java chars, i.e. UTF-16 code units, into a buffer of jchars. They are not null-terminated, so you cannot pass them to NewStringUTF, which expects a null-terminated string of bytes in the modified UTF-8 encoding.
First, allocate the correct amount of memory
jchar *carr = malloc(length * sizeof(jchar));
Then execute the GetCharArrayRegion
(*env)->GetCharArrayRegion(env, ch, 0, length, carr);
Then notice that you've got an array of UTF-16 characters. If the first character falls into the ASCII range, and the architecture is little-endian, it is expected that you'd just "get the first character", because the MSB byte of the first jchar will be zero, and NewStringUTF would consider this the terminator. Use NewString instead:
return (*env)->NewString(env, carr, length);
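To see why only the first character comes back, here is a small sketch in plain C (no JNI headers needed). The helper names utf16le_bytes and visible_length are made up for illustration, and the little-endian byte order is written out explicitly rather than taken from the host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: lay out UTF-16 code units as little-endian bytes,
   the way a jchar array sits in memory on a little-endian machine.
   `out` must have room for 2 * n + 1 bytes. */
void utf16le_bytes(const uint16_t *units, size_t n, unsigned char *out)
{
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = (unsigned char)(units[i] & 0xFF); /* low byte first */
        out[2 * i + 1] = (unsigned char)(units[i] >> 8);   /* high byte second */
    }
    out[2 * n] = '\0'; /* sentinel so a byte-wise scan always stops */
}

/* Hypothetical helper: how many bytes an API that expects a
   null-terminated byte string would see in that memory. */
size_t visible_length(const uint16_t *units, size_t n, unsigned char *scratch)
{
    utf16le_bytes(units, n, scratch);
    return strlen((const char *)scratch);
}
```

For the two code units 'H' and 'i', the bytes are 48 00 69 00, so a byte-wise scan stops after the first byte.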

You should use the NewString() function, which takes a jchar array and its length. The NewStringUTF() function takes a null-terminated C string in modified UTF-8 as input.
See https://www3.ntu.edu.sg/home/ehchua/programming/java/JavaNativeInterface.html#zz-4.2 for more details.

Related

Why the strlen() function doesn't return the correct length for a hex string?

I have a hex string for example \xF5\x17\x30\x91\x00\xA1\xC9\x00\xDF\xFF, when trying to use strlen() function to get the length of that hex string it returns 4!
const char string_[] = { "\xF5\x17\x30\x91\x00\xA1\xC9\x00\xDF\xFF" };
unsigned int string_length = strlen(string_);
printf("%d", string_length); // the result: 4
Is the strlen() function dealing with that hex as a string, or is something unclear to me?
For string functions in the C standard library, a character with value zero, also called a null character, marks the end of a string. Your string contains \x00, which designates a null character, so the string ends there. There are four non-null characters before it, so strlen returns four.
C 2018 7.1.1 1 says:
A string is a contiguous sequence of characters terminated by and including the first null character… The length of a string is the number of bytes preceding the null character…
C 2018 7.24.6.3 2 says:
The strlen function computes the length of the string pointed to by s [its first argument].
You could compute the size of your array as sizeof string_ (because it is an array of char) or sizeof string_ / sizeof *string_ (to compute the number of elements regardless of type), but this will include a terminating null character because defining an array with [] and letting the length be computed from a string literal initializer includes the terminating null character of the string literal. You may need to hard-code the length of the array, possibly using #define to define a preprocessor macro, and use that length in the array definition and in other places where the length is needed.
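As a minimal sketch of the sizeof approach described above, the following compares the two lengths for the array from the question. The macro STRING_LEN and the two function names are illustrative, not part of the original code.

```c
#include <stddef.h>
#include <string.h>

/* The array from the question; [] makes room for the literal's final '\0'. */
static const char string_[] = "\xF5\x17\x30\x91\x00\xA1\xC9\x00\xDF\xFF";

/* Payload length: every byte of the array except the implicit terminator. */
#define STRING_LEN (sizeof string_ - 1)

/* strlen stops at the first \x00, so it only sees the first four bytes. */
size_t length_by_strlen(void)
{
    return strlen(string_);
}

/* sizeof is fixed at compile time and does not look at the contents. */
size_t length_by_sizeof(void)
{
    return STRING_LEN;
}
```

Here length_by_strlen() yields 4 and length_by_sizeof() yields 10, the number of bytes written between the quotes.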
It is because you have zero at index [4]
string_[0] == 0xF5
string_[1] == 0x17
string_[2] == 0x30
string_[3] == 0x91
string_[4] == 0
...
"\xf5" puts char having integer value 0xf5 at position [0]
To see it as a string you need to escape the \ character
const char string_[] = "\\xF5\\x17\\x30\\x91\\x00\\xA1\\xC9\\x00\\xDF\\xFF";
At compile time, your "string" appears as consecutive hex values expressed in C syntax inside a pair of quotation marks.
strlen() is a run time function that scans through a series of bytes, looking for the first instance of a zero-value byte.
It's good to understand the difference between "compile time" and "run time".

What if a null character is present in the middle of a string?

I understand that the end of a string is indicated by a null character, but I cannot understand the output of the following code.
#include <stdio.h>
#include <string.h>
int
main(void)
{
char s[] = "Hello\0Hi";
printf("%d %d", strlen(s), sizeof(s));
}
OUTPUT: 5 9
If strlen() detects the end of the string at the end of o, then why doesn't sizeof() do the same thing? Even if it doesn't do the same thing, isn't '\0' a null character (i.e., only one character), so shouldn't the answer be 8?
The sizeof operator does not give you the length of a string but instead the size of the type of its operand. Since in your code the operand is an array, sizeof is giving you the size of the array including both null characters.
If it were like this
const char *string = "This is a large text\0This is another string";
printf("%zu %zu\n", strlen(string), sizeof(string));
the result will be very different because string is a pointer and not an array.
Note: Use the "%zu" specifier for size_t which is what strlen() returns, and is the type of the value given by sizeof.
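The difference between the array and the pointer case can be sketched as follows. The function names are hypothetical, and the pointer size depends on the platform, so no fixed number is assumed for it.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* sizeof applied to an array: the whole object, both '\0' bytes included. */
size_t size_of_array(void)
{
    char s[] = "Hello\0Hi";
    return sizeof s;
}

/* sizeof applied to a pointer: the size of the pointer, not of the text. */
size_t size_of_pointer(void)
{
    const char *p = "Hello\0Hi";
    return sizeof p;
}

/* Print the string length and both sizes with %zu, the conversion for size_t. */
void report(void)
{
    const char *p = "Hello\0Hi";
    printf("%zu %zu %zu\n", strlen(p), size_of_array(), size_of_pointer());
}
```

size_of_array() is 9 everywhere, while size_of_pointer() is whatever sizeof(const char *) happens to be on the target.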
strlen() doesn't care about the actual size of the string. It looks for a null byte and stops when it sees the first null byte.
But the sizeof() operator knows the total size. It doesn't care what bytes are in the string literal. You might as well have all null bytes in the string and sizeof() would still give the correct size of the array (strlen() would return 0 in that case).
They are not comparable; they do different things.
If strlen() detects the end of the string at the end of o, then why doesn't sizeof() do the same thing?
strlen only works on strings (null-terminated character arrays), whereas sizeof works for every data type. sizeof gives the exact amount of memory occupied by any given object or type, whereas strlen gives the length of a string (NOT including the null terminator \0). So in normal cases, this is true for a typical character array s:
char s[] = "Hello";
strlen( s ) + 1 == sizeof( s ); // +1 for the \0
In your case it's different because you have a null terminator in the middle of the character array s:
char s[] = "Hello\0Hi";
Here, strlen detects the first \0 and gives the length as 5. sizeof, however, counts every byte needed to hold the character array, including both \0 characters, which is why it gives 9 as the second output.
strlen() computes the length of the string. This is done by returning the amount of characters before (and not including) the '\0' character. (See the manual page below.)
sizeof() returns the amount of bytes of the given variable (or data-type). Note that your example "Hello\0Hi" has 9 characters. But you don't seem to understand where character 9 comes from in your question. Let me explain the given string first. Your example string is:
"Hello\0Hi"
This can be written as the following array:
['H', 'e', 'l', 'l', 'o', '\0', 'H', 'i', '\0']
Note the last '\0' character. When you use string quotes, the compiler ends the string with a '\0' character. This means "" is also ['\0'] and thus has 1 element.
BEWARE that sizeof() does NOT return the number of elements in the array. It returns the number of bytes. char is 1 byte, and therefore sizeof() happens to return the number of elements here. But with any other data type, for example if you called sizeof() on an int array holding [1, 2, 3, 4], it would return 16 on a platform where int is 4 bytes, since the array has 4 elements.
BEWARE that passing an array as a parameter only passes a pointer. If you pass s to another function and call sizeof() there, it will return the size of the pointer, which is the same as sizeof(void *). This is a fixed size, independent of the array.
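Both warnings can be illustrated with a short sketch. The macro ARRAY_LEN and the function names are assumptions made for this example; the macro only works on a real array, not on a pointer.

```c
#include <stddef.h>

/* Element count of a real array: total bytes divided by bytes per element. */
#define ARRAY_LEN(a) (sizeof (a) / sizeof (a)[0])

/* Size in bytes: 4 * sizeof(int), which is 16 where int is 4 bytes. */
size_t int_array_bytes(void)
{
    int values[] = { 1, 2, 3, 4 };
    return sizeof values;
}

/* Number of elements: 4, regardless of how large int is. */
size_t int_array_count(void)
{
    int values[] = { 1, 2, 3, 4 };
    return ARRAY_LEN(values);
}

/* Inside a callee only a pointer arrives, so sizeof sees the pointer. */
size_t size_seen_by_callee(const char *s)
{
    return sizeof s;
}
```

Calling size_seen_by_callee(s) with the 9-byte array s from the question returns sizeof(char *), not 9, so the length has to be passed along as a separate argument when the callee needs it.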
STRLEN(3) BSD Library Functions Manual STRLEN(3)
NAME
strlen, strnlen -- find length of string
LIBRARY
Standard C Library (libc, -lc)
SYNOPSIS
#include <string.h>
size_t
strlen(const char *s);
size_t
strnlen(const char *s, size_t maxlen);
DESCRIPTION
The strlen() function computes the length of the string s. The strnlen()
function attempts to compute the length of s, but never scans beyond the
first maxlen bytes of s.
RETURN VALUES
The strlen() function returns the number of characters that precede the
terminating NUL character. The strnlen() function returns either the
same result as strlen() or maxlen, whichever is smaller.
SEE ALSO
string(3), wcslen(3), wcswidth(3)
STANDARDS
The strlen() function conforms to ISO/IEC 9899:1990 (``ISO C90'').
The strnlen() function conforms to IEEE Std 1003.1-2008 (``POSIX.1'').
BSD February 28, 2009 BSD
As the name itself implies, a string literal is a sequence of characters enclosed in double quotes. A terminating zero is implicitly appended to this sequence of characters.
So any character enclosed in the double quotes is a part of the string literal.
When a string literal is used to initialize a character array all its characters including the terminating zero serve as initializers of the corresponding elements of the character array.
Each string literal in turn has type of a character array.
For example this string literal "Hello\0Hi" in C has type char[9]: 8 characters enclosed in the quotes plus the implicit terminating zero.
So in memory this string literal is stored like
{ 'H', 'e', 'l', 'l', 'o', '\0', 'H', 'i', '\0' }
The sizeof operator returns the number of bytes occupied by an object. So for the string literal above, sizeof will return the value 9, which is the number of bytes occupied by the literal in memory.
If you wrote "Hello\0Hi", the compiler may not simply drop the Hi part from the literal. It has to store it in memory along with the other characters of the literal enclosed in quotes.
The sizeof operator returns the size in bytes of any object in C not only of character arrays.
In general, character arrays can store any raw data, for example binary data read from a binary file. In this case the data is not treated as strings by the user or by the program, and as a result it is processed differently from strings.
The standard C function strlen is written specifically to find the length of a string stored in a character array. It does not know what data is stored in the array or how it was written there. All it does is search for the first zero character in the array and return the number of characters that come before it.
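A minimal sketch of that scan, under the name my_strlen (a stand-in for illustration; real library implementations are usually far more optimized):

```c
#include <stddef.h>

/* Advance until the first zero byte and return how far we got.
   Bytes after that first zero are never looked at. */
size_t my_strlen(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

For "Hello\0Hi" this returns 5, because the loop stops at the first '\0' and never reaches Hi.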
You can store in one character array several strings sequentially. For example
char s[12];
strcpy( s, "Hello" );
strcpy( s + sizeof( "Hello" ), "World" );
puts( s ); // outputs "Hello"
puts( s + sizeof( "Hello" ) ); // outputs "World"
If you would define a two dimensional array like this
char t[2][6] = { "Hello", "World" };
then in memory it will be stored the same way as the one-dimensional array above. So you can write
char *s = ( char * )t;
puts( s ); // outputs "Hello"
puts( s + sizeof( "Hello" ) ); // outputs "World"
Another example: the standard C function strtok can split one string stored in a character array into several strings by replacing the user-specified delimiters with zero bytes. As a result, the character array will contain several strings.
For example
char s[] = "Hello World";
printf( "%zu\n", sizeof( s ) ); // outputs 12
strtok( s, " " );
puts( s ); // outputs "Hello"
puts( s + sizeof( "Hello" ) ); // outputs "World"
printf( "%zu\n", sizeof( s ) ); // outputs 12
The last printf statement will output the same value equal to 12 because the array occupies the same number of bytes. Simply one byte in the memory allocated for the array was changed from ' ' to '\0'.
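The same idea extends to any number of words. The sketch below collects a pointer to each piece; split_in_place is a hypothetical helper name, and it assumes a single space as the delimiter.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split `s` in place on spaces, storing up to `max`
   token pointers in `out`. Returns the number of tokens stored.
   strtok overwrites each delimiter that ends a token with '\0'. */
size_t split_in_place(char *s, char **out, size_t max)
{
    size_t n = 0;
    for (char *tok = strtok(s, " "); tok != NULL && n < max; tok = strtok(NULL, " "))
        out[n++] = tok;
    return n;
}
```

After splitting char s[] = "Hello World", the pointers refer to "Hello" and "World" inside the same 12-byte array; no memory is copied or allocated.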
Character arrays in C and pointers to character arrays are not the same thing, even though printing their addresses gives the same value.
An array in C is made up of the following things:
Size of array
Its address / pointer
Homogenous Type of elements
Where a pointer is made up of just:
Address
Type information
char s[] = "Hello\0Hi";
printf("%d %d", strlen(s), sizeof(s));
Here you are calculating the size of the array (the variable s) using sizeof(), which is 9.
But if you treat this character array as a string, then the array (now a string) loses its size information and becomes just a pointer to a character. The same thing happens when you print a character array using %s.
So strlen() and %s treat the character array as a string and use only its address information. As you can guess, strlen() keeps incrementing the pointer to calculate the length up to the first null character. When it encounters a null character, you get the length up to that point.
So strlen() gives you 5 and does not count the null character.
The sizeof() operator tells you only the size of its operand. If you give it an array variable, it uses the array's size information and reports the size regardless of where the null characters are.
But if you give sizeof() a pointer to an array of characters, it finds a pointer without the size information and reports the size of the pointer, which is usually 64 bits (8 bytes) on 64-bit systems or 32 bits (4 bytes) on 32-bit systems.
One more thing: if you initialize your character array using double quotes, like "Hello", then C adds a null character; it does not in the case of {'H','e','l','l','o'}.
Using the gcc compiler. I hope this helps you understand.

Does strlen() always correctly report the number of char's in a pointer initialized string?

As long as I use the char and not some wchar_t type to declare a string will strlen() correctly report the number of chars in the string or are there some very specific cases I need to be aware of? Here is an example:
char *something = "Report all my chars, please!";
strlen(something);
What strlen does is basically count all bytes until it hits a zero-byte, the so-called null-terminator, character '\0'.
So as long as the string contains a terminator within the bounds of the memory allocated for the string, strlen will correctly return the number of char in the string.
Note that strlen can't count the number of characters (code points) in a multi-byte encoded string (like UTF-8). It will correctly return the number of bytes in the string though.
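To make the byte versus code point distinction concrete, here is a rough sketch that counts code points in a UTF-8 string. The function name utf8_code_points is made up, and the code assumes the input is valid UTF-8; it does no validation.

```c
#include <stddef.h>
#include <string.h>

/* Count code points in a UTF-8 string by skipping continuation bytes,
   which always have the bit pattern 10xxxxxx. Assumes valid UTF-8. */
size_t utf8_code_points(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For the string "h\xC3\xA9llo" (the word héllo encoded as UTF-8), strlen reports 6 bytes while utf8_code_points reports 5.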

C get size of hex shellcode

How can I get the size of my char pointer
char *data = "\x30\x2e\x30\x2e\x30\x2e\x30\x3a\x30";
Using strlen(data) or sizeof(data) always returns 1
strlen counts the characters in a C string until it encounters a 0 byte. It may be tricky to get the length of shellcode using strlen, since shellcode may contain 0 bytes in between. There is also a notion of null-free shellcode, and in that case I believe you can use strlen. Otherwise you can try:
char data[] = "\x30\x2e\x30\x2e\x30\x2e\x30\x3a\x30"; // you were missing \ in the beginning
printf("%zu", sizeof(data));
gives you 10 bytes.
Define it like this:
char data[] = "x30\x2e\x30\x2e\x30\x2e\x30\x3a\x30";
Then use sizeof(data)
strlen( data ) gives 11. It is the number of characters in the string literal "x30\x2e\x30\x2e\x30\x2e\x30\x3a\x30" that precede the terminating zero.
sizeof( data ) returns the size of the pointer itself and is implementation-defined. Usually it is either 4 or 8 bytes, and this value does not depend on the size of the string literal the pointer points to.
You may ask why strlen( data ) returns 11. It is because you forgot the backslash before the first character, "x30...". Thus the string literal starts with the three characters 'x', '3', and '0'. All the other characters are specified as hexadecimal escape sequences.
I think you mean
"\x30\x2e\x30\x2e\x30\x2e\x30\x3a\x30"
If the system where the program runs uses the ASCII encoding, then this string literal is equivalent to
0.0.0.0:0
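The numbers above can be checked with a short sketch. The array names data and typo and the three function names are chosen for this example only, and the text comparison relies on an ASCII execution character set.

```c
#include <stddef.h>
#include <string.h>

/* With every backslash present, each \xHH escape is exactly one byte. */
static const char data[] = "\x30\x2e\x30\x2e\x30\x2e\x30\x3a\x30";

/* Without the first backslash, 'x', '3' and '0' are three ordinary characters. */
static const char typo[] = "x30\x2e\x30\x2e\x30\x2e\x30\x3a\x30";

/* Bytes in the corrected literal, not counting the terminator. */
size_t data_bytes(void)
{
    return sizeof data - 1;
}

/* Length of the literal that is missing its first backslash. */
size_t typo_length(void)
{
    return strlen(typo);
}

/* Nonzero when the escapes spell 0.0.0.0:0; this relies on '0' == 0x30,
   '.' == 0x2e and ':' == 0x3a, which holds for ASCII. */
int matches_ascii_text(void)
{
    return strcmp(data, "0.0.0.0:0") == 0;
}
```

data_bytes() is 9 and typo_length() is 11, which matches the explanation of the missing backslash.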

char Array problem in C

char label[8] = "abcdefgh";
char arr[7] = "abcdefg";
printf("%s\n",label);
printf("%s",arr);
====output==========
abcdefgh
abcdefgÅ
Why is Å appended at the end of the string arr?
I am running the C code in Turbo C++.
printf expects NUL-terminated strings. Increase the size of your char arrays by one to make space for the terminating NUL character (it is added automatically by the = "..." initializer).
If you don't NUL-terminate your strings, printf will keep reading until it finds a NUL character, so you will get a more or less random result.
Your variables label and arr are not strings. They are arrays of characters.
To be strings (and for you to be able to pass them to functions declared in <string.h>) they need a NUL terminator in the space reserved for them.
Definition of "string" from the Standard
7.1.1 Definitions of terms
1 A string is a contiguous sequence of characters terminated by and including
the first null character. The term multibyte string is sometimes used
instead to emphasize special processing given to multibyte characters
contained in the string or to avoid confusion with a wide string. A pointer
to a string is a pointer to its initial (lowest addressed) character. The
length of a string is the number of bytes preceding the null character and
the value of a string is the sequence of the values of the contained
characters, in order.
Your string is not null terminated, so printf is running into junk data. You need to use the '\0' at the end of the string.
Using GCC (on Linux), it prints more garbage:
abcdefgh°ÃÕÄÕ¿UTÞÄÕ¿UTÞ·
abcdefgabcdefgh°ÃÕÄÕ¿UTÞÄÕ¿UTÞ·
This is because you are printing two character arrays as strings (using %s).
This works fine:
char label[9] = "abcdefgh\0";
char arr[8] = "abcdefg\0";
printf("%s\n",label);
printf("%s",arr);
However, you need not mention the "\0" explicitly. Just make sure the array size is large enough, i.e. 1 more than the number of characters in your strings.
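Two ways to avoid the problem can be sketched as follows: let the compiler size the array, or, when the data really has no terminator, give printf-style functions an explicit precision. The names format_fixed and sized_by_compiler are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format exactly n bytes of data into out.
   The precision in "%.*s" caps how many bytes are read, so data
   does not have to be null-terminated. */
int format_fixed(char *out, size_t out_size, const char *data, size_t n)
{
    return snprintf(out, out_size, "%.*s", (int)n, data);
}

/* Letting the compiler size the array always leaves room for the '\0':
   7 characters plus 1 terminator. */
size_t sized_by_compiler(void)
{
    char arr[] = "abcdefg";
    return sizeof arr;
}
```

With a 7-byte array that has no terminator, format_fixed(out, sizeof out, arr, 7) copies just those seven characters and nothing beyond them.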
