Iterating over all files in OFN_ALLOWMULTISELECT with Unicode - c

What is the recommended way of iterating over all files selected in an OFN_ALLOWMULTISELECT open file dialog with Unicode enabled?
My first idea was something like this:
TCHAR *tmp = ofn.lpstrFile + ofn.nFileOffset;
while(*tmp) {
wprintf("Got file: %s\n", tmp);
tmp += wcslen(tmp) + 1;
}
But then it occurred to me that this won't work in case there are characters in the string buffer that can't be represented in 16 bits. So for a safe approach I'd first need to find out the byte length of the tmp TCHAR string, then cast the TCHAR pointer to char and add that byte length in each iteration. Something like this:
TCHAR *tmp = ofn.lpstrFile + ofn.nFileOffset;
while(*tmp) {
wprintf("Got file: %s\n", tmp);
tmp = (TCHAR *) (((char *) tmp) + get_byte_len_of_tstr(tmp));
}
Note that get_byte_len_of_tstr() is just a placeholder for a function that would've to be written for this purpose. Since this approach looks somewhat clumsy, I'd first like to ask for some feedback whether this is really the way to go or whether I've missed or misunderstood something here...

Your first example was on the right track, but has a couple of mistakes:
your variable should be declared WCHAR* instead of TCHAR*.
wprintf() does not accept a char* format string as input, it takes a wchar_t* instead.
WCHAR *tmp = ofn.lpstrFile + ofn.nFileOffset;
while (*tmp)
{
wprintf(L"Got file: %ls\n", tmp); // %ls always means a wide string; plain %s only does so in Microsoft's wprintf
tmp += (wcslen(tmp) + 1);
}
If you want to use TCHAR (and you really shouldn't, unless you need to support Win9x/ME), then it would look like this instead:
TCHAR *tmp = ofn.lpstrFile + ofn.nFileOffset;
while (*tmp)
{
_tprintf(_T("Got file: %s\n"), tmp);
tmp += (_tcslen(tmp) + 1);
}
That being said, your understanding of wcslen() is wrong (but your use of it is correct). In Windows, a Unicode string is encoded in UTF-16, where each WCHAR element is a UTF-16 code unit. wcslen() counts the number of WCHAR elements in the string, not the number of Unicode code points that they represent, as you are assuming. So, if a given code point requires a UTF-16 surrogate pair, it uses two WCHAR elements in the string, and wcslen() counts 2 for it. Otherwise, it uses 1 WCHAR and wcslen() counts 1 for it.
The same is true for strlen() and MBCS strings, where a given Unicode code point may be encoded using more than one code unit (char element) in the string.
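To make the code unit point concrete, here is a minimal, portable sketch. It does not use the Win32 API: the buffer is modeled as uint16_t code units, and utf16_len, count_strings and sample_list are hypothetical names introduced only for this illustration. The first name in the sample list starts with U+1F600, which UTF-16 stores as the surrogate pair D83D DE00.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts UTF-16 code units up to the terminator, the way wcslen() counts
   WCHAR elements on Windows. A surrogate pair counts as 2. */
static size_t utf16_len(const uint16_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Walks a double-null-terminated list by stepping over each string plus its
   terminator, and returns how many strings it contains. */
static size_t count_strings(const uint16_t *list)
{
    size_t count = 0;
    while (*list) {
        list += utf16_len(list) + 1;
        count++;
    }
    return count;
}

/* Hypothetical sample data: two names, the first one beginning with the
   surrogate pair for U+1F600. */
static const uint16_t sample_list[] = {
    0xD83D, 0xDE00, '.', 't', 0,
    'b', '.', 't', 0,
    0
};
```

With this data, utf16_len(sample_list) is 4 (two code units for the surrogate pair plus two more), and stepping by that length plus one lands exactly on the second name, so the walk never needs to know about code points.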

Related

Append extra null char to wide string

Some Win32 API structures require an extra null character to be appended to a string, as in the following example:
c:\temp1.txt'\0'c:\temp2.txt'\0''\0'
When it comes to wide strings, what is the easiest way to append a L'\0' to the end of an existing wide string?
Here's what works for me but seems too cumbersome:
wchar_t my_string[10] = L"abc";
size_t len = wcslen(my_string);
wchar_t nullchar[1] = {'\0'};
memcpy(my_string + len + 1, nullchar, sizeof(wchar_t));
In your example you can just assign the value just like any other array. There's nothing special about wchar_t here.
my_string already has a single null-termination, so if you want double null-termination, then just add another 0 after it.
wchar_t my_string[10] = L"abc";
size_t len = wcslen(my_string);
// todo: check out-of-bounds
my_string[len + 1] = 0;
Or even simpler, if it's really just a string literal,
wchar_t my_string[10] = L"abc\0";
This will be doubly-null-terminated.
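The bounds check left as a todo above could be sketched as follows. The helper name add_second_nul and its capacity parameter (measured in wchar_t elements) are assumptions made for this illustration, and the buffer is assumed to already hold a terminated string.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: writes a second terminator right after the existing
   one. Returns 1 on success, 0 if the buffer has no room for it. */
static int add_second_nul(wchar_t *buf, size_t capacity)
{
    size_t len = wcslen(buf);   /* index of the first terminator */
    if (len + 2 > capacity)     /* need slots len and len + 1 */
        return 0;
    buf[len + 1] = L'\0';
    return 1;
}
```

For wchar_t my_string[10] = L"abc", calling add_second_nul(my_string, 10) succeeds, while a buffer of exactly 4 elements holding L"abc" is rejected.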
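Assuming you have the various paths in a std::vector<std::wstring>, you can just build the required format in a loop:
std::vector<std::wstring> paths;
paths.emplace_back(L""); // This empty path adds the extra NUL; it must be the last element
std::wstring buf;
buf.reserve(1000); // reserve capacity only; buf(1000, 0) would start the buffer with 1000 NULs
for (const auto &p : paths) {
buf.append(p);
buf.append(1, L'\0'); // terminator after each path
}
const wchar_t *ptr = buf.c_str(); // c_str() returns a const pointer; now do stuff with it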
Assuming my_string is long enough:
my_string[wcslen(my_string)+1]='\0';
The '\0' character constant is converted to a wide char on assignment.
(Posted as a first comment to the question)
If you use std::wstring instead of wchar_t[], you can use its operator+= to append the extra null terminator, eg:
wstring my_string = L"abc";
...
my_string += L'\0';
// use my_string.c_str() as needed...

Concatenate char array and char

I am new to the C language. I need to concatenate a char array and a char. In Java we can use the '+' operator, but in C that is not allowed. strcat and strcpy are also not working for me. How can I achieve this? My code is as follows:
void myFunc(char prefix[], struct Tree *root) {
char tempPrefix[30];
strcpy(tempPrefix, prefix);
char label = root->label;
//I want to concat tempPrefix and label
My problem differs from "concatenate char array in C" because that question concatenates one char array with another, while mine concatenates a char array with a single char.
Rather simple really. The main concern is that tempPrefix must have enough space for the prefix plus the extra character. Since C strings must be null terminated, your function shouldn't copy more than 28 characters of the prefix: 30 (the size of the buffer) - 1 (the root label character) - 1 (the terminating null character). Fortunately the standard library has strncpy:
size_t const buffer_size = sizeof tempPrefix; // Only because tempPrefix is declared an array of characters in scope.
strncpy(tempPrefix, prefix, buffer_size - 2); // copy at most 28 characters (indices 0..27)
tempPrefix[buffer_size - 2] = root->label;
tempPrefix[buffer_size - 1] = '\0';
It's also worthwhile not to hard code the buffer size in the function calls, thus allowing you to increase its size with minimum changes.
If the prefix isn't an exact fit (it is shorter than 28 characters), some more legwork is needed, because the label must go right after the prefix rather than at a fixed index. The approach is pretty much the same as before, but a call to strchr is required to complete the picture.
size_t const buffer_size = sizeof tempPrefix; // Only because tempPrefix is declared an array of characters in scope.
strncpy(tempPrefix, prefix, buffer_size - 2); // copy at most 28 characters (indices 0..27)
tempPrefix[buffer_size - 2] = tempPrefix[buffer_size - 1] = '\0';
*strchr(tempPrefix, '\0') = root->label;
We again copy no more than 28 characters, but explicitly terminate the end of the buffer with NUL bytes. Since strncpy pads the destination with NUL bytes up to count when the string being copied is shorter, everything after the copied prefix is now \0. This is why I dereference the result of strchr right away: it is guaranteed to point at a valid character, the first free slot to be exact.
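As an alternative to the strncpy approach (a different technique, not the one used in the answer above), snprintf with a precision on %s can truncate the prefix and append the character in one call. This is only a sketch; prefix_plus_char is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes prefix followed by c into dst, truncating the
   prefix if needed so that c and the terminator always fit.
   Returns 0 on success, -1 if dst cannot hold even c plus a terminator. */
static int prefix_plus_char(char *dst, size_t capacity, const char *prefix, char c)
{
    if (capacity < 2)
        return -1;
    /* "%.*s" copies at most (capacity - 2) characters of prefix. */
    snprintf(dst, capacity, "%.*s%c", (int)(capacity - 2), prefix, c);
    return 0;
}
```

In the question's function this would be called as prefix_plus_char(tempPrefix, sizeof tempPrefix, prefix, root->label).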
strXXX() family of functions mostly operate on strings (except the searching related ones), so you will not be able to use the library functions directly.
You can find out the position of the existing null-terminator, replace that with the char value you want to concatenate and add a null-terminator after that. However, you need to make sure you have got enough room left for the source to hold the concatenated string.
Something like this (not tested)
#define SIZ 30
// inside the function
char tempPrefix[SIZ] = {0}; // initialize
strcpy(tempPrefix, prefix); // copy the string (assumes prefix has at most SIZ - 1 characters)
char label = root->label; // take the char value
size_t res = strlen(tempPrefix); // index of the current null terminator
if (res < (SIZ - 1)) // check: do we have room left?
{
tempPrefix[res] = label; // replace the terminator with the value
tempPrefix[res + 1] = '\0'; // add a null at the next index
}
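The same idea can be wrapped in a small reusable function. This is a minimal sketch; append_char and its capacity parameter are names assumed for the illustration, and buf is assumed to already contain a terminated string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends c to the string in buf.
   Returns 1 on success, 0 if there is no room for c plus the terminator. */
static int append_char(char *buf, size_t capacity, char c)
{
    size_t len = strlen(buf);   /* index of the current terminator */
    if (len + 1 >= capacity)    /* slots len and len + 1 must both exist */
        return 0;
    buf[len] = c;               /* overwrite the old terminator */
    buf[len + 1] = '\0';        /* write the new one */
    return 1;
}
```

Returning a status instead of silently skipping the append lets the caller decide what to do when the buffer is full.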

Evaluating type identifiers (%d) in strings, in C

I am working on Mac OSX and using bash as my shell. I am working in C and I am trying to create a file that will renumber files. The important part of my code is as follows:
int i;
for (i=0; i<numberOfFiles; i++) {
strcpy(fileName,""); //Set to Null
char append[formatLength]; //String being appended
sprintf(append,"%%0%dd", formatLength); //example output: %04d
strcat(fileName,filePrefix); //Attached Prefix
strcat(fileName,append); //Attaches appended part
//Missing code: Part which evaluates %04d as int i, such as 0023.
}
This gets me the correct string format I am looking for (say formatLength=4): filePrefix+%04d. However, now I need to evaluate the %04d in the string and evaluate it as i, so that the files look like: file0001, file0002, etc.
Would anyone have any ideas? Thanks for your help.
Use the string you created with snprintf() as the format string for the next call to snprintf().
int formatLength = /* some input */;
char filePrefix[FILEPREFIX_LEN]; // assigned by some input
const int FILENAME_LEN = strlen(filePrefix) + formatLength + 1; // +1 for terminating '\0'
char fileName[FILENAME_LEN];
int i;
for (i=0; i<numberOfFiles; i++) {
char temp[TEMPLATE_LEN]; // where TEMPLATE_LEN >= FILEPREFIX_LEN + 3 + number of characters in the decimal representation of formatLength
snprintf(temp, TEMPLATE_LEN, "%s%%0%dd", filePrefix, formatLength);
// error check snprintf here, in case the destination buffer was not large enough
snprintf(fileName, FILENAME_LEN, temp, i);
// error check snprintf here, in case the destination buffer was not large enough
// use fileName
}
So if your filePrefix = "file" and formatLength = 4, then you'd get fileName = "file0000", "file0001", "file0002", and so on (the loop starts at i = 0).
A lot of this work doesn't actually depend on i, though, so you could move it outside the loop, like this:
int formatLength = /* some input */;
char filePrefix[FILEPREFIX_LEN]; // assigned by some input
const int FILENAME_LEN = strlen(filePrefix) + formatLength + 1; // +1 for terminating '\0'
char fileName[FILENAME_LEN];
char temp[TEMPLATE_LEN]; // where TEMPLATE_LEN >= FILEPREFIX_LEN + 3 + number of characters in the decimal representation of formatLength
snprintf(temp, TEMPLATE_LEN, "%s%%0%dd", filePrefix, formatLength);
// error check snprintf here, in case the destination buffer was not large enough
int i;
for (i=0; i<numberOfFiles; i++) {
snprintf(fileName, FILENAME_LEN, temp, i);
// error check snprintf here, in case the destination buffer was not large enough
// use fileName
}
In these cases, your temp (short for "template", not "temporary") is going to be "prefix%04d" (e.g., for a formatLength of 4 and filePrefix of "prefix"). You need to take care, then, that your filePrefix does not contain any characters that have special meaning to the printf family of functions. If you know a priori that it won't, then you're good to go.
However, if it's possible it will, then you need to do one of two things. You can process the filePrefix before you use it by escaping all the special characters. Or you can change your snprintf() calls to something like these:
snprintf(temp, TEMPLATE_LEN, "%%s%%0%dd", formatLength);
// other stuff...
snprintf(fileName, FILENAME_LEN, temp, filePrefix, i);
Note the extra % at the beginning of the first snprintf(). This makes the template pattern "%s%04d" (e.g., for a formatLength of 4), and then you supply the filePrefix on the second call, together with i, so that its contents are not part of the pattern string.
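When the width is only known at run time, printf's `*` width specifier is another option that avoids building a format string at all. This is a different technique from the two-step template above, shown here only as a sketch; make_file_name is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats prefix followed by index, zero-padded to
   width digits. Returns 0 on success, -1 on error or truncation. */
static int make_file_name(char *out, size_t capacity,
                          const char *prefix, int width, int index)
{
    /* "%0*d" takes the field width from the argument list, so the prefix is
       never interpreted as part of a format string. */
    int n = snprintf(out, capacity, "%s%0*d", prefix, width, index);
    return (n >= 0 && (size_t)n < capacity) ? 0 : -1;
}
```

For example, make_file_name(name, sizeof name, "file", 4, 23) stores "file0023" in name. Because the prefix is passed as a %s argument, a prefix containing % characters needs no escaping.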
If I understand your question correctly, you should be able to say
char result[(sizeof filePrefix/sizeof (char)) + formatLength];
sprintf(result, fileName, i);
since fileName looks something like "filePrefix%04d". Your desired filename will then be stored in result. I would not recommend re-storing it in fileName by saying sprintf(fileName, fileName, i), because the source and destination overlap and fileName may be too small (for example, when formatLength = 9).
Note that (sizeof filePrefix/sizeof (char)) gives the size of filePrefix only if it is declared as an array; if it is a char*, use strlen(filePrefix) + 1 instead. You then add formatLength to account for the characters needed after the prefix.
You can build a format string, then use that as the format string for another formatter call. Note that the prefix and number format specifier can be built into a single string, so there is no need for strcat calls.
Given:
char format_specifier[256] ;
then the loop code in your example can be replaced with:
snprintf( format_specifier,
sizeof( format_specifier),
"%s%%0%dd",
filePrefix,
formatLength ) ; // Create format string "<filePrefix>%0<formatLength>d",
// eg. "file%04d"
snprintf( fileName, // Where the filename will be built
sizeof(fileName), // The length of the filename buffer
format_specifier, // The previously built format string
i ) ; // The file number.
I have assumed above that fileName is an array, if it is a pointer to an array, then sizeof(fileName) will be incorrect. Of course if you choose to use sprintf rather than snprintf it is academic.
sprintf(fileNameString, fileName, i); // I think you mean this, but use snprintf()
You are almost there
// This line could be done before the loop
sprintf(append,"%%0%dd", formatLength); //example output: %04d
// Location to store number
char NumBuffer[20];
// Form textual version of number
sprintf(NumBuffer, append, i);
strcat(fileName,filePrefix); //Attached Prefix
strcat(fileName,NumBuffer); //Attaches appended part

"Pattern matching" and extracting in C

I need to parse a lot of filenames (up to 250000 I guess), including the path, and extract some parts out of it.
Here is an example:
Original: /my/complete/path/to/80/01/a9/1d.pdf
Needed: 8001a91d
The "pattern" I am looking for will always begin with "/8". The parts I need to extract form an 8 hex-digits string.
My idea is the following (simplified for demonstration):
/* original argument */
char *path = "/my/complete/path/to/80/01/a9/1d.pdf";
/* pointer to substring */
char *begin = NULL;
/* final char array to be built */
char *hex = (char*)malloc(9);
/* find "pattern" */
begin = strstr(path, "/8");
if(begin == NULL)
return 1;
/* jump to first needed character */
begin++;
/* copy the needed characters to target char array */
strncpy(hex, begin, 2);
strncpy(hex+2, begin+3, 2);
strncpy(hex+4, begin+6, 2);
strncpy(hex+6, begin+9, 2);
strncpy(hex+8, "\0", 1);
/* print final char array */
printf("%s\n", hex);
This works. I just have the feeling it is not the most clever way. And that there might be some traps I don't see myself.
So, does someone have suggestions what could be dangerous with this pointer-shifting manner? What would be an improvement in your opinion?
Does C provide a functionality to do it like so s|/(8.)/(..)/(..)/(..)\.|\1\2\3\4| ? If I remember right some scripting languages have a feature like that; if you know what I mean.
C itself doesn't provide this, but you can use POSIX regex, a full-featured regular expression library. For a pattern as simple as yours, though, your manual approach is probably the best way.
BTW, prefer memcpy to strncpy. Very few people know what strncpy is good for. And I'm not one of them.
/* original argument */
char *path = "/my/complete/path/to/80/01/a9/1d.pdf";
char *begin;
char hex[9];
size_t len;
/* find "pattern" */
begin = strstr(path, "/8");
if (!begin) return 1;
// sanity check
len = strlen(begin);
if (len < 12) return 2;
// more sanity
if (begin[3] != '/' || begin[6] != '/' || begin[9] != '/' ) return 3;
memcpy(hex, begin+1, 2);
memcpy(hex+2, begin+4, 2);
memcpy(hex+4, begin+7, 2);
memcpy(hex+6, begin+10, 2);
hex[8] = 0;
// For additional sanity, you could check for valid hex characters here
/* print final char array */
printf("%s\n", hex);
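For comparison, the POSIX regex route mentioned above might look roughly like this. It is a sketch that assumes a POSIX system providing <regex.h>; extract_hex_regex is a hypothetical helper name, and the pattern is one possible translation of the substitution in the question.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: finds "/8x/xx/xx/xx." in path and stores the eight
   hex digits in out. Returns 1 on a match, 0 otherwise. */
static int extract_hex_regex(const char *path, char out[9])
{
    static const char *pattern =
        "/(8[0-9A-Fa-f])/([0-9A-Fa-f]{2})/([0-9A-Fa-f]{2})/([0-9A-Fa-f]{2})\\.";
    regex_t re;
    regmatch_t m[5];   /* whole match plus four groups */
    int ok = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;
    if (regexec(&re, path, 5, m, 0) == 0) {
        for (int g = 1; g <= 4; g++)   /* each group is exactly 2 characters */
            memcpy(out + 2 * (g - 1), path + m[g].rm_so, 2);
        out[8] = '\0';
        ok = 1;
    }
    regfree(&re);
    return ok;
}
```

When parsing a few hundred thousand paths, it would make sense to compile the pattern once and reuse the regex_t rather than recompiling on every call as this sketch does.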
In the simple case of just matching /8./../../.. I'd personally go for the strstr() solution (no external dependency required). If the rules become more complex, though, you could try a lexer (flex and friends); they support regular expressions.
In your case something like this:
h2 [0-9A-Fa-f]{2}
mymatch ("/"{h2}){4}
could work. You'd have to set buffers to the match by side effect though as lexers typically return token identifiers.
Anyway, you'd gain the power of regexps without the dependencies but at the expense of generated (read: unreadable) code.

How to truncate C char*?

As simple as that. I'm on C++ btw. I've read the cplusplus.com's cstdlib library functions, but I can't find a simple function for this.
I know the length of the char, I only need to erase last three characters from it. I can use C++ string, but this is for handling files, which uses char*, and I don't want to do conversions from string to C char.
If you don't need to copy the string somewhere else and can change it
/* make sure strlen(name) >= 3 */
namelen = strlen(name); /* possibly you've saved the length previously */
name[namelen - 3] = 0;
If you need to copy it (because it's a string literal or you want to keep the original around)
/* make sure strlen(name) >= 3 */
namelen = strlen(name); /* possibly you've saved the length previously */
strncpy(copy, name, namelen - 3);
/* add a final null terminator */
copy[namelen - 3] = 0;
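The copying variant with both checks spelled out could be sketched like this. The helper name copy_without_tail and its parameters are assumptions made for the illustration; memcpy is used because the number of characters to copy is already known.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst without its last n characters.
   Returns 0 on success, -1 if src is shorter than n or dst is too small. */
static int copy_without_tail(char *dst, size_t capacity, const char *src, size_t n)
{
    size_t len = strlen(src);
    if (len < n)
        return -1;
    size_t keep = len - n;      /* characters that survive */
    if (keep + 1 > capacity)    /* room for them plus the terminator */
        return -1;
    memcpy(dst, src, keep);
    dst[keep] = '\0';
    return 0;
}
```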
I think some of your post was lost in translation.
To truncate a string in C, you can simply insert a terminating null character in the desired position. All of the standard functions will then treat the string as having the new length.
#include <stdio.h>
#include <string.h>
int main(void)
{
char string[] = "one one two three five eight thirteen twenty-one";
printf("%s\n", string);
string[strlen(string) - 3] = '\0';
printf("%s\n", string);
return 0;
}
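The program above assumes the string has at least three characters. A guarded in-place version might look like the following sketch, where truncate_tail is a hypothetical helper name and s must point to writable memory (not a string literal).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: removes the last n characters of s in place.
   Returns 0 on success, -1 if s has fewer than n characters. */
static int truncate_tail(char *s, size_t n)
{
    size_t len = strlen(s);
    if (len < n)
        return -1;
    s[len - n] = '\0';   /* the string now ends n characters earlier */
    return 0;
}
```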
If you know the length of the string you can use pointer arithmetic to get a string with the last three characters:
const char* mystring = "abc123";
const int len = 6;
const char* substring = mystring + len - 3;
Please note that substring points to the same memory as mystring and is only valid as long as mystring is valid and left unchanged. The reason that this works is that a C string doesn't have any special markers at the beginning, only the NUL terminator at the end.
I interpreted your question as wanting the last three characters, getting rid of the start, as opposed to how David Heffernan read it. One of us is obviously wrong.
bool TakeOutLastThreeChars(char* src, int len) {
if (len < 3) return false;
memset(src + len - 3, 0, 3);
return true;
}
I assume mutating the string memory is safe since you did say erase the last three characters. I'm just overwriting the last three characters with "NULL" or 0.
It might help to understand how C char* "strings" work:
You start reading them from the char that the char* points to until you hit a \0 char (or simply 0).
So if I have
const char* str = "theFile.nam"; // string literals are const in C++
then str+3 represents the string File.nam.
But you want to remove the last three characters, so you want something like:
char str2[9];
strncpy (str2,str,8); // now str2 contains "theFile.#" where # is some character you don't know about
str2[8]='\0'; // now str2 contains "theFile.\0" and is a proper char* string.

Resources