Handling BSTRs on MacOSX in C

I've written some code in C for converting strings passed from VBA, when the C code is called from VBA from a MacOSX dylib. I got some good hints here, and since I only care about ASCII strings I've written the following functions to convert the BSTR to a simple char*:
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include "myheader.h"
size_t vbstrlen(BSTR *vbstr)
{
size_t len = 0U;
while(*(vbstr++)) ++len;
len = len*2;
return len;
}
void vbstochr(BSTR *vbstr, char** out)
{
int len = vbstrlen(vbstr);
char str[len+1];
int i;
for(i = 0; i < len; i++)
{
str[i] = (char) (((uint16_t*) vbstr)[i]);
}
str[i] = '\0';
asprintf(out, str);
}
int test(BSTR *arg1)
{
char* convarg;
vbstochr(arg1, &convarg);
return 1;
}
The myheader.h looks like this:
typedef uint16_t OLECHAR;
typedef OLECHAR * BSTR;
I used uint16_t because wchar_t is 4 bytes (not 2 bytes) in the MacOSX C compiler. I added a breakpoint after vbstochr is called to look at the content of convarg, and it seems to work when called from Excel.
So this works, but one thing I don't understand is why I have to multiply my len in the vbstrlen function by 2. I'm new to C, so I had to read up on pointers a little bit - and I thought since my BSTR contains 2 byte characters, I should get the right string length without having to multiply by two? It would be great if someone could explain this to me, or post a link to a tutorial?
Also, my functions with string arguments work when called in VBA, but only after the first call. So when I call a function with a BSTR* argument from a dylib for the first time (after I start the application, Excel in this case), the BSTR* pointer just points at some (random?) address, but not the string. When I call the function from VBA a second time, everything works just fine - any ideas why this is the case?!

A BSTR has an embedded length, you do not need to calculate the length manually.
As for the need to multiply the length by 2, that is because a BSTR uses 2-byte characters, but char is only 1 byte. You coded your vbstrlen() function to return the number of bytes in the BSTR, not the number of characters.
Since you are only interested in ASCII strings, you can simplify the code to the following:
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include "myheader.h"
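size_t vbstrlen(BSTR *vbstr)
{
    /* The 4 bytes in front of the character data hold the length in bytes,
       so divide by the character size to get the character count. */
    if (vbstr)
        return *(((uint32_t*)vbstr)-1) / sizeof(OLECHAR);
    return 0;
}
void vbstochr(BSTR *vbstr, char** out)
{
    size_t len = vbstrlen(vbstr);
    char str[len+1]; /* a variable-length array cannot take an initializer */
    for(size_t i = 0; i < len; ++i)
        str[i] = (char) ((uint16_t*)vbstr)[i]; /* low byte of each UTF-16 unit */
    str[len] = '\0';
    asprintf(out, "%s", str); /* never pass data as the format string */
}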
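The embedded length can be tried out without Excel by building a fake BSTR in memory. This is only a sketch: bstr_to_ascii is a hypothetical helper name, and it assumes the usual BSTR layout (a 4-byte length in bytes directly in front of the UTF-16 data), which may or may not match what VBA passes on the Mac.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: `data` points at the first UTF-16 code unit of a BSTR,
   i.e. just past the 4-byte length prefix (assumed to be a byte count). */
char *bstr_to_ascii(const uint16_t *data)
{
    if (data == NULL)
        return NULL;

    uint32_t nbytes;
    /* memcpy avoids alignment and aliasing problems when reading the prefix. */
    memcpy(&nbytes, (const unsigned char *)data - sizeof nbytes, sizeof nbytes);

    size_t nchars = nbytes / sizeof(uint16_t);
    char *out = malloc(nchars + 1);
    if (out == NULL)
        return NULL;

    for (size_t i = 0; i < nchars; ++i)
        out[i] = (data[i] < 0x80) ? (char)data[i] : '?'; /* non-ASCII becomes '?' */
    out[nchars] = '\0';
    return out; /* the caller frees the result */
}
```

Returning the allocated string directly (instead of going through asprintf) keeps the helper portable to platforms without asprintf.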

The chances are that the VB string is a UTF-16 string that uses 2 bytes per character (except for characters beyond the BMP, Basic Multilingual Plane, or U+0000..U+FFFF, which are encoded as surrogate pairs). So, for your 'ASCII' data, you will have alternating ASCII characters and zero bytes. The 'multiply by 2' is because UTF-16 uses two bytes to store each counted character.
This is almost definitive when we see:
typedef uint16_t OLECHAR;
typedef OLECHAR * BSTR;
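One further detail may contribute to the factor of two; this is an assumption about the asker's build, not something stated in the question. In the original loop the pointer has type BSTR *, which is a pointer to a pointer, so ++ advances by sizeof(BSTR) bytes rather than by one 2-byte character. In a 32-bit process that is 4 bytes, i.e. two characters per step, which would make the count come out at half the real length. The sketch below (function names are illustrative) just measures both strides:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t OLECHAR;
typedef OLECHAR *BSTR;

/* Bytes skipped by one increment of a pointer to 2-byte characters. */
size_t olechar_stride(void)
{
    OLECHAR buf[2] = {0};
    OLECHAR *p = buf;
    return (size_t)((char *)(p + 1) - (char *)p);
}

/* Bytes skipped by one increment of a BSTR*, a pointer to a pointer. */
size_t bstr_ptr_stride(void)
{
    BSTR slots[2] = {0};
    BSTR *p = slots;
    return (size_t)((char *)(p + 1) - (char *)p);
}
```

Declaring the parameter as OLECHAR * (or BSTR, without the extra *) makes the loop step one character at a time.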

Related

Handling cryptic output when printing bytes array in C

I've recently encountered a problem with a script I wrote.
#include "md5.h"
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
char* hashfunc (char* word1, char* word2){
MD5_CTX md5;
MD5_Init(&md5);
char * word = (char *) malloc(strlen(word1)+ strlen(word2) );
strcpy(word,word1);
strcat(word,word2);
MD5_Update(&md5,word,strlen(word));
unsigned char* digest = malloc(sizeof(char)* 16); //MD5 generates 128bit hashes, each char has 4 bit, 128/4 = 32
MD5_Final(digest,&md5);
return digest;
}
int main(int argc, char* argv[]){
char* a = argv[1];
char* b = argv[2];
unsigned char* ret = hashfunc(a,b);
printf("%s\n",ret);
printf("%i\n",sizeof(ret));
}
As the hash function returns an array of unsigned chars I thought I'd print that as is.
Unfortunately, the following is my output:
��.�#a��%Ćw�0��
which, according to sizeof() is 8 bytes long.
How do I convert that to a readable format?
Thanks in advance!
PS:
Output should look like this:
1c0143c6ac7505c8866d10270a480dec
Firstly, sizeof applied to a pointer gives the size of the pointer itself, which is the size of a word on your machine (I suppose it’s 64-bit, since sizeof returned 8). A pointer carries no information about the size of the memory it points to; you have to keep track of that separately.
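A minimal sketch of that point, with illustrative names (MD5_DIGEST_LEN is defined here for the example and is not taken from md5.h):

```c
#include <stddef.h>

#define MD5_DIGEST_LEN 16 /* an MD5 digest is 128 bits = 16 bytes */

/* Inside a function the digest is only a pointer, so sizeof gives the
   pointer size, not the number of bytes in the digest. */
size_t size_seen_through_pointer(const unsigned char *digest)
{
    return sizeof digest;
}

/* The real length has to come from a constant or an extra parameter. */
size_t digest_length(void)
{
    return MD5_DIGEST_LEN;
}
```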
Anyway, since you know that an MD5 digest has 16 bytes, you can just iterate over them and print each byte in a more readable format using printf. Something like this:
for (int i = 0; i < 16; i++)
    printf("%02x", (int)(unsigned char)digest[i]);
putchar('\n');
If you want to print it to a file, change printf to fprintf and putchar to fputc (the argument lists differ slightly).
To put it into a string, you’d have to sprintf each byte into the correct position of the string, something like this:
char* str = malloc(33 * sizeof(char));
for (int i = 0; i < 16; i++)
    sprintf(str+2*i, "%02x", (int)(unsigned char)digest[i]);
P.S.: don’t forget to free everything afterwards.
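The string conversion can also be wrapped in a small function that writes into a caller-supplied buffer, so nothing needs to be freed. This is a sketch; digest_to_hex and MD5_DIGEST_LEN are names made up for the example.

```c
#include <stdio.h>

#define MD5_DIGEST_LEN 16

/* Writes 32 hex digits plus a terminating '\0' into out (33 bytes). */
void digest_to_hex(const unsigned char digest[MD5_DIGEST_LEN],
                   char out[2 * MD5_DIGEST_LEN + 1])
{
    for (int i = 0; i < MD5_DIGEST_LEN; i++)
        snprintf(out + 2 * i, 3, "%02x", digest[i]); /* 2 digits + '\0' */
}
```

A caller would declare char hex[33]; call digest_to_hex(digest, hex); and then print hex with %s.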
There is no guarantee that your hashfunc is going to produce printable ASCII strings. In theory since they are really just binary data they could have embedded 0s which will screw up all the normal string handling functions anyway.
Best bet is to print each unsigned char as an unsigned char via a for loop.
void printhash(unsigned char* hash)
{
    for(int i = 0; i < 16; i++)
    {
        printf("%02x", hash[i]);
    }
    printf("\n");
}

Extracting time from timestamp

I'm trying to extract the time section from an ISO8601 timestamp.
E.g. from the timestamp "0001-01-01T17:45:33" I want to extract the part "17:45:33".
You have a few options.
Let's say you have it in a char array called string.
If you know that the time will always be at the end of the string, it's very easy; you can just do:
#define TIMEWIDTH 8
#include <stdio.h>
#include <string.h>
int main() {
    const char string[] = {"0001-01-01T17:45:33\0"};
    unsigned int strlength = strlen(string);
    char temp[TIMEWIDTH + 1]; // add one for null character
    printf("%s\n", string);
    strncpy(temp, string + strlength - TIMEWIDTH, TIMEWIDTH + 1); // another + 1 for the null char
    printf("%s\n", temp);
}
If it's more complex, you have to do some more analysis to find it, either manually or with tools such as sscanf(). Make sure to specify field widths in the sscanf() format.
http://www.tutorialspoint.com/c_standard_library/c_function_sscanf.htm
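As a sketch of the sscanf() route, assuming the timestamp always has the shape YYYY-MM-DDTHH:MM:SS (parse_time_fields is a hypothetical name):

```c
#include <stdio.h>

/* Returns 1 on success and stores the three time fields, 0 otherwise. */
int parse_time_fields(const char *stamp, int *hour, int *min, int *sec)
{
    /* The %*d conversions read and discard the date part; the field widths
       stop sscanf from consuming more digits than each field should have. */
    return sscanf(stamp, "%*4d-%*2d-%*2dT%2d:%2d:%2d", hour, min, sec) == 3;
}
```

Checking the return value of sscanf against the number of expected conversions is what catches malformed input.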
If the T indicates the start of the time, you can look for it with strchr:
#include <stdio.h>
#include <string.h>
int main() {
    const char string[] = {"0001-01-01T17:45:33\0"};
    char *temp;
    temp = strchr(string, 'T') + 1;
    printf("%s\n", temp);
}
It really depends on how variable the input is. If it is just a single example, you can use either, although the last one is more efficient.
You indicated that it is an ISO 8601 timestamp, so I would just use the second method.
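One caveat with the strchr approach: strchr returns NULL when there is no 'T', and adding 1 to NULL is not valid. A small wrapper (time_part is an illustrative name) could guard against that:

```c
#include <stddef.h>
#include <string.h>

/* Returns a pointer to the text after the first 'T', or NULL if the
   stamp contains no 'T'. The result points into stamp; nothing is copied. */
const char *time_part(const char *stamp)
{
    const char *t = strchr(stamp, 'T');
    return t ? t + 1 : NULL;
}
```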

Reading data from an Ubuntu terminal and storing it in different variables in C

I am writing a program that stores information of the iwscan Ubuntu command. I am actually reading properly the file that is created with the information; however, when trying to store data(ESSID in a string, Channel in an int and Quality in a double), I have several problems treating the strings to extract the data...
The code is as following:
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <stdio.h>
char *tratarEssid(char *cadena);
int tratarCanal(char *cadena);
double tratarCalidad(char *cadena);
int tratarCanal(char *cadena){
char resultado[2];
strncpy(resultado,cadena+9,2);
int canal;
canal=atoi(resultado);
return canal;
}
double tratarCalidad(char *cadena){
char resultado[6];
strncpy(resultado,cadena+8,6);
char num[2];
char den[2];
strncpy(num,resultado,2);
if(strlen(resultado)==5)
strncpy(den,resultado+3,2);
else
strncpy(den,resultado+2,2);
double numerador=atof(num);
double denominador=atof(den);
double calidad=numerador/denominador;
return calidad;
}
char *tratarEssid(char *cadena){
char *essid;
essid=(char *)malloc(sizeof(char)*10);
strncpy(essid,cadena+7,10);
return essid;
}
int main(){
int i;
const char *CHECKCANAL = "Channel:";
const char *CHECKQUALITY = "Quality=";
const char *CHECKESSID = "ESSID:";
double calidad;
int canal;
char *essid;
char cadena[20];
system("iwlist wlan0 scan | egrep \"(Channel|Signal level|ESSID)\">/home/wein/Escritorio/basic/bin/Debug/Lista.txt");
printf("Lista hecha\n");
lista=fopen("Lista.txt","r");
printf("Lista abierta\n");
while (!feof(lista)){
fgets(cadena,20,lista);
printf("%s",cadena);
if (strncmp(CHECKCANAL,cadena,strlen(CHECKCANAL))==0){
canal=tratarCanal(cadena);
printf("CANAL: %d\n",canal);
}
else if (strncmp(CHECKQUALITY,cadena,strlen(CHECKCANAL))==0){
calidad=tratarCalidad(cadena);
printf("CALIDAD: %f",calidad);
}
else if(strncmp(CHECKESSID,cadena,strlen(CHECKESSID))==0){
essid=tratarEssid(cadena);
printf("ESSID: %s\n",essid);
}
}
return 0;
}
So I know that my problem is in the conditionals that filter and process the useful strings; I just don't know why strncmp doesn't work properly (it should compare the beginning of the line with the content of the string, or that's the idea), and thus the functions don't work properly (maybe I messed up in the functions as well...). Is there any other way to process the strings I receive correctly?
The output of the printf of the char[] cadena is just like this
Channel:11
Frequency:2.462 GH
z (Channel 11)
Quality=57/70 Sig
nal level=-53 dBm
ESSID:"eduroam"
And I should be able to extract from there the ESSID, Quality and Channel.
Thanks for any idea/suggestion/help received.
Use strtok() on cadena[] to divide the string into tokens:
token[0]=strtok(cadena,"\n"); // in the output you gave, each item is on its own line
token[1]=strtok(NULL,"\n");
token[2]=strtok(NULL,"\n");
token[3]=strtok(NULL,"\n");
token[4]=strtok(NULL,"\n");
token[5]=strtok(NULL,"\n");
Then use sscanf() to get the required data from the tokens:
sscanf(token[5],"ESSID:%s",ESSID);
sscanf(token[3],"Quality=%d/%d",&q1,&q2); // quality=q1/q2
sscanf(token[0],"Channel:%d",&channel);
This will give the required values from the string cadena[].
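Since the program already reads the file line by line with fgets, the sscanf() step can also be applied to each line directly. The sketch below uses made-up names (struct scan_info, parse_scan_line) and assumes each field sits at the start of its line, possibly after leading spaces. It also assumes the read buffer is large enough to hold a whole line: the output in the question shows lines being split, which is what a 20-byte buffer would do.

```c
#include <stdio.h>

struct scan_info {
    int channel;
    int quality_num, quality_den;
    char essid[33];   /* an SSID is at most 32 bytes */
};

/* Returns 1 if the line matched one of the three known fields, 0 otherwise.
   The leading space in each format skips any indentation. */
int parse_scan_line(const char *line, struct scan_info *info)
{
    if (sscanf(line, " Channel:%d", &info->channel) == 1)
        return 1;
    if (sscanf(line, " Quality=%d/%d", &info->quality_num, &info->quality_den) == 2)
        return 1;
    if (sscanf(line, " ESSID:\"%32[^\"]\"", info->essid) == 1)
        return 1;
    return 0;
}
```

The quality as a fraction is then (double)info.quality_num / info.quality_den.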
Your strncpy calls don't add the terminator to the strings you copy to. Besides, the destination arrays are too small to hold the terminator.
That means that calling functions such as atoi on an unterminated string causes undefined behavior, as the function continues beyond the end of the array.
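One way to avoid the missing terminator is a small bounded copy that always terminates its result. copy_field is a hypothetical helper, not a standard function:

```c
#include <stddef.h>
#include <string.h>

/* Copies at most n characters of src into dst (capacity dstsize) and always
   terminates the result. Returns the number of characters copied. */
size_t copy_field(char *dst, size_t dstsize, const char *src, size_t n)
{
    if (dstsize == 0)
        return 0;
    size_t avail = strlen(src);
    if (n > avail)
        n = avail;              /* do not read past the end of src */
    if (n > dstsize - 1)
        n = dstsize - 1;        /* leave room for the terminator */
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

With this, a two-digit channel needs a destination of at least 3 bytes, e.g. char resultado[3]; copy_field(resultado, sizeof resultado, cadena + 8, 2); where the offset 8 is the length of "Channel:".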

Why does this C program crash?

I want to make a list of, for example, 10 sentences that are entered through the keyboard. To read a line I am using a function getline(). Can anybody explain why this program crashes upon entering the second line? Where is the mistake?
#define LISTMAX 100
#define LINEMAX 100
#include <stdio.h>
#include <string.h>
void getline(char *);
int main ()
{
char w[LINEMAX], *list[LISTMAX];
int i;
for(i = 0; i < 10; i++)
{
getline(w);
strcpy(list[i], w);
}
for(i = 0; i < 10; i++)
printf("%s\n", list[i]);
return 0;
}
void getline(char *word)
{
while((*word++ = getchar()) != '\n');
*word = '\0';
}
A string is a block of memory (an array), which contains chars, terminated by '\0'. A char * is not a string; it's just a pointer to the first char in a string.
strcpy does not create a new string. It just copies the data from one block of memory to another. So your problem is: you haven't allocated a block of memory to hold the string.
I'll show you two solutions. The first solution is: change the declaration of list so that the memory is already allocated. If you do it this way, you can avoid using strcpy, so your code is simpler:
// no need for w
char list[10][LINEMAX];
// ...
// get the line straight into list
// no need to copy strings
getline(list[i]);
But if you want to stretch yourself, the second solution is to allocate the block of memory when you know you'll need it. You need to do this a lot in C, so maybe now is a good time to learn this technique:
#include <stdlib.h> // include the malloc function
// ...
char w[LINEMAX], *list[LISTMAX];
// put this line between the getline and strcpy lines
list[i] = (char *) malloc((strlen(w) + 1) * sizeof(char));
This solution is more complicated, but you only allocate as much memory as you need for the string. If the string is 10 characters long, you only request enough memory to hold 11 characters (10 characters + '\0') from the system. This is important if, say, you want to read in a file, and you've no idea how big the file will be.
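The allocate-then-copy step can be packaged in one function. A minimal sketch, with copy_line as an illustrative name:

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of w, or NULL if malloc fails. The caller frees it. */
char *copy_line(const char *w)
{
    size_t n = strlen(w) + 1;      /* +1 for the terminating '\0' */
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, w, n);
    return copy;
}
```

In the loop this would replace the strcpy call with list[i] = copy_line(w); and each list[i] should be passed to free() once it is no longer needed.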
By the way, why do you have LINEMAX and LISTMAX as separate constants? Can you think of a reason why they might be different? And why haven't you made 10 a constant? Wouldn't this be better?
#define LINEMAX 100
#define NUMBER_OF_LINES 10
// ...
char list[NUMBER_OF_LINES][LINEMAX];
// ...
for (i = 0; i < NUMBER_OF_LINES; i++)

Segmentation Fault in Simple Offset Encryption

Alright guys, this is my first post here. The most recent assignment in my compsci class has us coding a couple of functions to encode and decode strings based on a simple offset. So far in my encryption function I am trying to convert uppercase alphas in a string to their ASCII equivalent(an int), add the offset(and adjust if the ASCII value goes past 'Z'), cast that int back to a char(the new encrypted char) and put it into a new string. What I have here compiles fine, but it gives a Segmentation Fault (core dumped) error when I run it and input simple uppercase strings. Where am I going wrong here? (NOTE: there are some commented out bits from an attempt at solving the situation that created some odd errors in main)
#include <stdio.h>
#include <string.h>
#include <ctype.h>
//#include <stdlib.h>
char *encrypt(char *str, int offset){
int counter;
char medianstr[strlen(str)];
char *returnstr;// = malloc(sizeof(char) * strlen(str));
for(counter = 0; counter < strlen(str); counter++){
if(isalpha(str[counter]) && isupper(str[counter])){//If the character at current index is an alpha and uppercase
int charASCII = (int)str[counter];//Get ASCII value of character
int newASCII;
if(charASCII+offset <= 90 ){//If the offset won't put it outside of the uppercase range
newASCII = charASCII + offset;//Just add the offset for the new value
medianstr[counter] = (char)newASCII;
}else{
newASCII = 64 + ((charASCII + offset) - 90);//If the offset will put it outside the uppercase range, add the remaining starting at 64(right before A)
medianstr[counter] = (char)newASCII;
}
}
}
strcpy(returnstr, medianstr);
return returnstr;
}
/*
char *decrypt(char *str, int offset){
}
*/
int main(){
char *inputstr;
printf("Please enter the string to be encrypted:");
scanf("%s", inputstr);
char *encryptedstr;
encryptedstr = encrypt(inputstr, 5);
printf("%s", encryptedstr);
//free(encryptedstr);
return 0;
}
You use a bunch of pointers, but never allocate any memory for them. That will lead to segmentation faults.
Actually the strange thing is it seems you know you need to do this as you have the code in place, but you commented it out:
char *returnstr;// = malloc(sizeof(char) * strlen(str));
When you use a pointer you need to "point" it to allocated memory, it can either point to dynamic memory that you request via malloc() or static memory (such as an array that you declared); when you're done with dynamic memory you need to free() it, but again you seem to know this as you commented out a call to free.
Just a malloc() to inputstr and one for returnstr will be enough to get this working.
Without going any further, the segmentation fault comes from your use of scanf().
The segmentation fault occurs at scanf() because it tries to write to the memory inputstr points at, and no memory has been allocated for it at this point.
When you call scanf(), the pointer you pass in must point to memory that has already been allocated.
So, to fix the segmentation fault, allocate memory for your char *inputstr.
To dynamically allocate 128 bytes (i.e., the pointer will point to the heap):
char *inputstr = (char *) malloc(128);
Or to use a 128-byte array with automatic storage (i.e., the memory will be on the stack):
char inputstr[128];
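Either way, a plain %s lets scanf() write past the end of the buffer when the input is longer than 127 characters. A sketch of a width-limited read follows; read_word and INPUT_MAX are names chosen for this example, and it takes a FILE * so it works with stdin or any other stream.

```c
#include <stdio.h>

#define INPUT_MAX 128

/* Reads one whitespace-delimited word into buf (INPUT_MAX bytes).
   Returns 1 on success, 0 on end of input or error. */
int read_word(FILE *in, char buf[INPUT_MAX])
{
    /* The width must be one less than the buffer size to leave room for '\0'. */
    return fscanf(in, "%127s", buf) == 1;
}
```

In main() this would be used as char inputstr[INPUT_MAX]; if (read_word(stdin, inputstr)) { ... }.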
There is a lot of complexity in the encrypt() function that isn't really necessary. Note that computing the length of the string on each iteration of the loop is a costly process in general. I noted in a comment:
What's with the 90 and 64? Why not use 'A' and 'Z'? And you've commented out the memory allocation for returnstr, so you're copying via an uninitialized pointer and then returning that? Not a recipe for happiness!
The other answers have also pointed out (accurately) that you've not initialized your pointer in main(), so you don't get a chance to dump core in encrypt() because you've already dumped core in main().
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
char *encrypt(char *str, int offset)
{
    int len = strlen(str) + 1;
    char *returnstr = malloc(len);
    if (returnstr == 0)
        return 0;
    for (int i = 0; i < len; i++)
    {
        char c = str[i];
        if (isupper((unsigned char)c))
        {
            c += offset;
            if (c > 'Z')
                c = 'A' + (c - 'Z') - 1;
        }
        returnstr[i] = c;
    }
    return returnstr;
}
Long variable names are not always helpful; they make the code harder to read. Note that any character for which isupper() is true also satisfies isalpha(). The cast on the argument to isupper() prevents problems when the char type is signed and you have data where the unsigned char value is in the range 0x80..0xFF (the high bit is set). With the cast, the code will work correctly; without, you can get into trouble.
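For the decrypt() half that is still a stub in the question, the wrap-around can be written with modular arithmetic so that the same routine handles negative offsets. This is a sketch under the assumption that 'A'..'Z' are contiguous (true for ASCII); shift_upper is a made-up name.

```c
#include <ctype.h>

/* Shifts an uppercase letter by offset positions, wrapping within 'A'..'Z'.
   Other characters are returned unchanged. Assumes contiguous letters (ASCII). */
char shift_upper(char c, int offset)
{
    if (!isupper((unsigned char)c))
        return c;
    int pos = (c - 'A' + offset) % 26;
    if (pos < 0)
        pos += 26;   /* C's % can yield a negative result for negative operands */
    return (char)('A' + pos);
}
```

Encrypting with offset 5 and decrypting with offset -5 then use the same function, applied to each character of the string.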
