Comparing paths with special characters on Mac UTF-8 - c

We have a kext that checks if a path is a subdir of another path and does some magic if it is.
This all works fine as long as we don't have special characters in our path (characters like ë).
We feed some working paths into the system by a helper application that can communicate with the kext.
I've isolated the problem to this code:
#include <stdio.h>
#include <string.h>
int main ()
{
char* path = "/Users/user/test/tëst/test"; //Sent by the system
char* wp = "/Users/user/test/tëst"; //Some path we claim to be ours
size_t wp_len = strlen(wp);
if (strncmp (wp,path,wp_len) == 0) //Check if path is a subpath
{
printf ("matched %s\n", path);
}else {
printf ("could not match\n");
}
return 0;
}
I've created a Gist so the encoding does not get lost in the browser: https://gist.github.com/fvandepitte/ec28f4321a48061808d0095853af7bd7
Does anyone know how I can check whether path is a subpath of wp without losing too much performance (this code runs in the kernel)?

I've copy/pasted the source straight from the browser into a file (test.c). It prints could not match for me.
If I dump the file using od this is what I see:
bash-3.2$ od -c test.c
0000000 # i n c l u d e < s t d i o .
0000020 h > \n # i n c l u d e < s t r
0000040 i n g . h > \n \n i n t m a i n
0000060 ( ) \n { \n c h a r * p a
0000100 t h = " / U s e r s / u s e
0000120 r / t e s t / t ë ** s t / t e s
0000140 t " ; / / S e n t b y t h
0000160 e s y s t e m \n c h a r *
0000200 w p = " / U s e r s /
0000220 u s e r / t e s t / t e ̈ ** s t
0000240 " ; / / S o m e p a t h w
Notice that the tëst of path comes out as t ë ** s t,
but the tëst of wp comes out as t e ̈ ** s t, which is different: so strncmp will fail when comparing ë and e.
If I copy the tëst from path and paste it into wp's assignment, then I get matched /Users/user/test/tëst/test, so strncmp itself works fine.
I don't know why these two strings differ like this. My best guess is that they hold different representations of the same character: one stores ë as a single precomposed code point, the other as a plain e followed by a combining diaeresis (Unicode calls these the composed and decomposed normalization forms). The strncmp function compares strings byte by byte, so ë and e ̈ are considered different. If you want to use strncmp, then unfortunately there's no easy solution other than ensuring that both strings use the same representation before they are compared.
FWIW - I'm running on macOS 10.12.1, with clang version Apple LLVM version 8.0.0 (clang-800.0.42.1)
EDIT: I've downloaded pathtest.cpp from your github link just to double-check things. I've run od -c pathtest.cpp and I see the same problem.
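To make the difference concrete, here is a minimal sketch of the two byte sequences. It assumes both strings are UTF-8 and that path holds the precomposed ë (U+00EB, bytes C3 AB) while wp holds e plus the combining diaeresis (U+0308, bytes 65 CC 88), which is my reading of the od output above. The names TEST_NFC, TEST_NFD and is_subpath are made up for illustration. is_subpath is the question's strncmp test plus a check that the match ends on a path separator; that extra check is my addition, not something the original code does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Precomposed form: U+00EB, encoded in UTF-8 as C3 AB. */
static const char TEST_NFC[] = "t\xc3\xab" "st";
/* Decomposed form: 'e' followed by U+0308, encoded as 65 CC 88. */
static const char TEST_NFD[] = "te\xcc\x88" "st";

/* Hypothetical helper: returns 1 if path equals wp or lies below it.
 * The comparison is purely byte-wise, so both strings must already
 * use the same Unicode representation. */
static int is_subpath(const char *wp, const char *path)
{
    assert(wp != NULL && path != NULL);
    size_t wp_len = strlen(wp);
    if (wp_len == 0 || strncmp(wp, path, wp_len) != 0)
        return 0;
    /* wp already ends in a separator, so any match is below it. */
    if (wp[wp_len - 1] == '/')
        return 1;
    /* Otherwise the match must stop at the end of a path component. */
    return path[wp_len] == '\0' || path[wp_len] == '/';
}
```

With these definitions strcmp(TEST_NFC, TEST_NFD) is non-zero even though both render as tëst. One option would be to bring both sides to the same form in user space (for example in the helper application) so that the kernel side can stay a plain byte comparison.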


mbedtls cannot parse valid x509 certificate

I have the following certificate:
-----BEGIN CERTIFICATE-----
MIIDWjCCAkKgAwIBAgIVAJ3wzBnLSnQvYi31rNVQRAXDUO/zMA0GCSqGSIb3DQEB
CwUAME0xSzBJBgNVBAsMQkFtYXpvbiBXZWIgU2VydmljZXMgTz1BbWF6b24uY29t
IEluYy4gTD1TZWF0dGxlIFNUPVdhc2hpbmd0b24gQz1VUzAeFw0yMDA3MjgxMTMz
MTJaFw00OTEyMzEyMzU5NTlaMB4xHDAaBgNVBAMME0FXUyBJb1QgQ2VydGlmaWNh
dGUwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQDHc2tmezGoekLjkQlb
+YOBKFyPswYR+GLq/JRVbFX2k4OrHF5js4GTfbHm1oQ733KbcnIugdejtQnRhtnr
1HRk3pqedVhRKGRo2DFDYyuX3K1UR6xna1poJF+6WNy6vXGxIQYKi7SNS5LtzkRT
1FCziOLBaxfcCRNgR1NBHjlcFsUWyL4evMok6h/wU7HA3/dfKEisyLdh3sMy7Yox
Im/ldvyX+9pH7Hj0TrGGTd5f8GtX8npNuSKdkntuag95r+vAaAPp6bQVyPWm8T/G
SUN8N7Nvc9DOcJ8ZhvB/Ubq+Fa/eoUnr3SgXInufLHhrfxJW7dyrBTlw/1kdXgYw
YiKnAgMBAAGjYDBeMB8GA1UdIwQYMBaAFP4UzdqnzQ4l89+D7UhXC5MKWnOJMB0G
A1UdDgQWBBSn95OHFqTn3DrE3anpNq5RoOsT+DAMBgNVHRMBAf8EAjAAMA4GA1Ud
DwEB/wQEAwIHgDANBgkqhkiG9w0BAQsFAAOCAQEA2Hvrxy2N0xt3I/w/7JIyoTH4
ixUKMaD1QXe+g6LrsQSCVVsaq0L468OpyydVzQLQONXvDDRv3rqIEel1hPAJNG0y
dp3g+WC1dPl7E44btM+59gBf1369lFwV6FbJMwCltVBUJ4hFAjt3QTkWRHq6DlFQ
wa896aSr5UUiVNAJjf/hLVjERlVG4wDjPN7YifQssRqlNcYDgok3UhVsBfKIGnct
WFbisX+0ONMyNnE1Qq6bX5g4sLN7VlwFhADiz1Xp2rUtLECR1NSPutYibWyvJJ8d
htYYV1a0FSkg7JKyvOIJ8IYKEPsKE+UYo1Z8DwkmHHcap+h0OMWAnKQgRXn6QQ==
-----END CERTIFICATE-----
I fed this into several certificate reading sites, and they were all able to parse and display its contents.
I tried using mbedtls to parse this certificate using the following code:
mbedtls_x509_crt certificate;
mbedtls_x509_crt_init(&certificate);
char certificate_string[] = "-----BEGIN CERTIFICATE--...";
int result_code = mbedtls_x509_crt_parse(&certificate, (unsigned char*)certificate_string, strlen(certificate_string));
if(result_code != 0) {
char err_str[256];
mbedtls_strerror(result_code, err_str, 256);
printf("Could not read the certificate. Error: %s\n", err_str);
return -1;
}
I then check the result_code for 0, and print the error message if it is not. I get the following error message every time I try to parse this certificate:
"Could not read the certificate. Error: X509 - The CRT/CRL/CSR format is invalid, e.g. different type expected"
I tried looking at the mbedtls_x509_crt_parse code to see what causes this message, and I then modified the code to use the following pieces of mbedtls_x509_crt_parse instead:
mbedtls_pem_context pem;
size_t use_len;
mbedtls_pem_init(&pem);
// If we get there, we know the string is null-terminated
int ret = mbedtls_pem_read_buffer(&pem,
"-----BEGIN CERTIFICATE-----",
"-----END CERTIFICATE-----",
(unsigned char *)certificate_string,
NULL,
0,
&use_len);
if(ret != 0) {
printf("we could not pem read the string\n");
return -1;
}
else {
printf("We pem read the certificate\n");
}
ret = mbedtls_x509_crt_parse_der(&certificate, pem.buf, pem.buflen);
if(ret != 0) {
printf("crt parse der has failed\n");
}
else {
printf("The issuer is: %s\n", certificate.issuer.val.p);
return 0;
}
When I run the program, I get the following output:
491231235959Z010�Uzon Web Services O=Amazon.com Inc. L=Seattle ST=Washington C=US0
*�H�� AWS IoT Certificate0�"0
I kept searching for answers as to what may be wrong, and I found a post saying that mbedtls is configured by default to use RSA 1024, so if your key is 2048 (and it is in mine) then mbedtls will have an error with parsing. I modified the configuration file to use 2048 and I rebuilt the library, but I still get errors.
Any ideas? I feel like I am really close, because mbedtls_x509_crt_parse executes almost the whole way through. I am pretty sure I am using the library correctly based on code samples I have seen.
Thanks!
Initially, the PEM format certificate string was parsed with the following code:
mbedtls_x509_crt certificate;
mbedtls_x509_crt_init(&certificate);
char certificate_string[] = "-----BEGIN CERTIFICATE--..."; // actually much longer
int result_code = mbedtls_x509_crt_parse(&certificate, (unsigned char*)certificate_string, strlen(certificate_string));
That resulted in a parsing error because, for PEM format input, the final argument of the call to mbedtls_x509_crt_parse should be the length of the input including the null terminator. Changing the final argument to 1 + strlen(certificate_string) fixes the issue.
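As a rough illustration of why the extra byte matters, the sketch below mimics the kind of check I believe the parser uses to decide whether a buffer is PEM: the last counted byte must be the null terminator and the PEM header must be present. Treat this as an assumption about the library's internals, not its documented behaviour; would_be_treated_as_pem is a made-up name and is not part of mbedtls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a PEM detection check (not an mbedtls function).
 * The buffer only counts as PEM if the terminating '\0' is included in
 * buflen and the certificate header appears in the text. */
static int would_be_treated_as_pem(const unsigned char *buf, size_t buflen)
{
    assert(buf != NULL);
    return buflen != 0
        && buf[buflen - 1] == '\0'
        && strstr((const char *)buf, "-----BEGIN CERTIFICATE-----") != NULL;
}
```

Under that assumption, passing strlen(certificate_string) fails the first test, so the input is not handled as PEM, whereas strlen(certificate_string) + 1 (or sizeof certificate_string when it is a char array) passes.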
After successfully parsing, the issuer string was printed using:
printf("The issuer is: %s\n", certificate.issuer.val.p);
That produced some junk output that looked as if the initial part of the issuer string had been overwritten, but it was actually due to the lack of a null terminator in the issuer string. The bytes of data after the issuer string included ASCII CR characters, causing the terminal cursor to move to the start of the line and print over the initial part of the output. (The CR characters can be seen by piping the output through | od -c, for example, where they are displayed as \r.)
Piping the output through | od -c produces:
0000000 T h e i s s u e r i s : A
0000020 m a z o n W e b S e r v i c
0000040 e s O = A m a z o n . c o m
0000060 I n c . L = S e a t t l e S
0000100 T = W a s h i n g t o n C = U
0000120 S 0 036 027 \r 2 0 0 7 2 8 1 1 3 3 1
0000140 2 Z 027 \r 4 9 1 2 3 1 2 3 5 9 5 9
0000160 Z 0 036 1 034 0 032 006 003 U 004 003 \f 023 A W
0000200 S I o T C e r t i f i c a t
0000220 e 0 202 001 " 0 \r 006 \t * 206 H 206 367 \r 001
0000240 001 001 005 \n
0000244
That shows unprintable bytes as 3-digit octal codes or as C backslash escape codes, depending on the byte value.
To print the issuer string without the junk, change the printf call to the following:
printf("The issuer is: %.*s\n", (int)certificate.issuer.val.len, certificate.issuer.val.p);
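The same precision trick works for any pointer-plus-length buffer. Below is a small self-contained sketch that uses a made-up struct counted_buf as a stand-in for the library's buffer type (the real one is not reproduced here) and formats it with %.*s so that bytes past len are never read.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a pointer-plus-length buffer such as issuer.val. */
struct counted_buf {
    const unsigned char *p;  /* data, not necessarily null-terminated */
    size_t len;              /* number of valid bytes at p */
};

/* Copies at most buf->len bytes of buf->p into out and null-terminates it.
 * Returns the value returned by snprintf. */
static int format_counted(char *out, size_t out_size, const struct counted_buf *buf)
{
    assert(out != NULL && buf != NULL && buf->p != NULL);
    /* The precision argument of %.*s must be an int, hence the cast. */
    return snprintf(out, out_size, "%.*s", (int)buf->len, (const char *)buf->p);
}
```

The cast to int is only safe while len fits in an int, which is a reasonable assumption for a certificate field but worth keeping in mind.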

missing entries in array of string values

I'm trying to create a const char array of strings used to identify some hardware channels. I then want to retrieve these entries by index to label outputs on a user console. Some of these channels are unassigned and labeled as such with a string "XX_UNASSIGNED_XX", so this value is repeated in the array.
When I try to sequentially display these values in my test code, I see that XX_UNASSIGNED_XX only appears once, and is subsequently skipped. I opened a memory trace in the embedded hardware, and sure enough, the memory only lists XX_UNASSIGNED_XX once, I'm assuming as a sort of optimization.
Is there a way to force the compiler to instead list out every entry in memory as is, duplicates and all? Or, is it possible that I don't need to do this, and the way I'm attempting to display the strings is incorrect or inefficient and could be improved?
I've played around with how I display the strings. Because it's ultimately a pointer array with each string a different length, I ended up recording the length of each string, walking the array with a pointer variable, and then using snprintf to copy the string into a temporary string, which I then display. Any attempt to print the values in the array directly kept resulting in anomalous behavior I couldn't seem to correct.
FYI, the Display_printf command is simply a printf to the UART terminal with the following syntax:
Display_printf(UART_handle, col_index, row_index, display_text)
#define ADC_COUNT (20)
const char* adcNamesArray[ADC_COUNT] = {
"KP_CUR_MON",
"A_CUR_MON",
"A_VOLT_MON",
"NEG_15_VOLT_MON",
"XX_UNASSIGNED_XX",
"FOCUS_CUR_MON",
"XX_UNASSIGNED_XX",
"XX_UNASSIGNED_XX",
"K_CUR_MON",
"XX_UNASSIGNED_XX",
"XX_UNASSIGNED_XX",
"XX_UNASSIGNED_XX",
"FOCUS_VOLT_MON",
"FARADAY_MON",
"MFC_MON",
"XX_UNASSIGNED_XX",
"POS_12_VOLT_MON",
"POS_24_VOLT_MON",
"POS_15_VOLT_MON",
"POS_5_VOLT_MON"
};
char str[20];
char* ptr = &adcNamesArray[0];
char* printPtr;
int nameLength;
for(int adc_index = 0; adc_index < ADC_COUNT; adc_index++) {
nameLength = 0;
while(*ptr == '\0') {
ptr += sizeof(char);
}
printPtr = ptr;
while(*ptr != '\0') {
ptr += sizeof(char);
nameLength++;
}
nameLength++;
char* str;
str = (char *)malloc((sizeof(char)*nameLength+1));
snprintf(str, nameLength, "%s", printPtr);
Display_printf(display,0,0,"ADC %d: %s", adc_index, str);
}
So, I expect all the XX_UNASSIGNED_XX entries to show up in order, but instead what I get is this:
ADC 0: KP_CUR_MON
ADC 1: A_CUR_MON
ADC 2: A_VOLT_MON
ADC 3: NEG_15_VOLT_MON
ADC 4: XX_UNASSIGNED_XX
ADC 5: FOCUS_CUR_MON
ADC 6: K_CUR_MON
ADC 7: FOCUS_VOLT_MON
ADC 8: FARADAY_MON
ADC 9: MFC_MON
ADC 10: POS_12_VOLT_MON
ADC 11: POS_24_VOLT_MON
ADC 12: POS_15_VOLT_MON
ADC 13: POS_5_VOLT_MON
ADC 14: ▒
ADC 15: #
ADC 16: ▒▒▒▒
ADC 17: #▒
ADC 18:
ADC 19:
A look at the memory dump gives this, which explains why XX_UNASSIGNED_XX doesn't show up multiple times.
0x0001C0D8 . . . . . . . 0 K P _ C U R _ M
0x0001C0E8 O N . . A _ C U R _ M O N . . .
0x0001C0F8 A _ V O L T _ M O N . . N E G _
0x0001C108 1 5 _ V O L T _ M O N . X X _ U
0x0001C118 N A S S I G N E D _ X X . . . .
0x0001C128 F O C U S _ C U R _ M O N . . .
0x0001C138 K _ C U R _ M O N . . . F O C U
0x0001C148 S _ V O L T _ M O N . . F A R A
0x0001C158 D A Y _ M O N . M F C _ M O N .
0x0001C168 P O S _ 1 2 _ V O L T _ M O N .
0x0001C178 P O S _ 2 4 _ V O L T _ M O N .
0x0001C188 P O S _ 1 5 _ V O L T _ M O N .
0x0001C198 P O S _ 5 _ V O L T _ M O N . .
0x0001C1A8 uartMSP432E4HWAttrs
0x0001C1A8 . . . # . . . . . . . . . . . .
0x0001C1B8 # . . . . . . . . . . . . . . .
Any help is appreciated.
You assume that the texts are contiguous in memory separated by one or more NUL characters. This assumption is wrong.
This declares an array of pointers to your texts:
const char* adcNamesArray[ADC_COUNT] = {
...
Just use that array and all of a sudden your code becomes much simpler and correct.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define ADC_COUNT (20)
int main(void)
{
const char* adcNamesArray[ADC_COUNT] = {
"KP_CUR_MON",
"A_CUR_MON",
"A_VOLT_MON",
"NEG_15_VOLT_MON",
"XX_UNASSIGNED_XX",
"FOCUS_CUR_MON",
"XX_UNASSIGNED_XX",
"XX_UNASSIGNED_XX",
"K_CUR_MON",
"XX_UNASSIGNED_XX",
"XX_UNASSIGNED_XX",
"XX_UNASSIGNED_XX",
"FOCUS_VOLT_MON",
"FARADAY_MON",
"MFC_MON",
"XX_UNASSIGNED_XX",
"POS_12_VOLT_MON",
"POS_24_VOLT_MON",
"POS_15_VOLT_MON",
"POS_5_VOLT_MON"
};
for (int adc_index = 0; adc_index < ADC_COUNT; adc_index++)
{
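char *str = malloc(strlen(adcNamesArray[adc_index]) + 1);
if (str == NULL)
return 1; // out of memory
strcpy(str, adcNamesArray[adc_index]);
printf("ADC %d: %s\n", adc_index, str);
free(str); // release the copy once it has been printed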
}
}
If you don't have strcpy or strlen on your platform for whatever reason, you can implement them yourself; they are one-liners.
Some explanations:
sizeof(char) is 1 by definition, so you can drop it
the (char*) cast is not necessary with malloc; it's not wrong to include one, but there is zero benefit in doing so.
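If the copy is not needed at all, the array can be indexed directly. The sketch below is one way to do that, using a shortened, made-up demoNames array and a hypothetical format_adc_label helper instead of the real ADC table and Display_printf. Each element is just a pointer, so it does not matter whether the compiler stores the repeated XX_UNASSIGNED_XX literal once or several times.

```c
#include <assert.h>
#include <stdio.h>

#define DEMO_COUNT 5  /* hypothetical, shortened stand-in for ADC_COUNT */

/* Array of pointers: repeated literals may legally share one address,
 * but every array slot still holds a valid pointer to the text. */
static const char *const demoNames[DEMO_COUNT] = {
    "KP_CUR_MON",
    "XX_UNASSIGNED_XX",
    "FOCUS_CUR_MON",
    "XX_UNASSIGNED_XX",
    "XX_UNASSIGNED_XX"
};

/* Formats "ADC <index>: <name>" into out.
 * Returns snprintf's result, or -1 if index is out of range. */
static int format_adc_label(char *out, size_t out_size, int index)
{
    assert(out != NULL && out_size > 0);
    if (index < 0 || index >= DEMO_COUNT)
        return -1;
    return snprintf(out, out_size, "ADC %d: %s", index, demoNames[index]);
}
```

The resulting buffer can then be handed to whatever output routine the platform provides, with no malloc and no manual scanning for terminators.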

WinDBG conditional breakpoint based on string arguments

I want to set a conditional breakpoint when the value of the 4th argument is equal to "abc".
void FunctionA(char* a, char* b, char* c, char* d)
{
//some code here
}
I use the following command but it doesn't work. Could you help?
bp app!FunctionA "as /mu ${/v:MyAlias} poi(d);.block{.if ($spat(\"${MyAlias}\", \"abc\") == 0) { } .else { gc } }"
Note: app.exe is my application name.
You cannot use /mu on a char *: /mu is for null-terminated Unicode strings, not ASCII strings. For an ASCII string, use /ma.
I assume you have descriptive argument names and not an argument literally named d,
which would obviously clash with the hex number 0xd (aka 0n13).
Is d a number, a string, or a symbol?
What would poi(d) resolve to in your case: poi(0xd), which is obviously a bad dereference,
or a local symbol illogically named d?
Also, the alias is not interpreted when you break.
When using an alias, you should always put the commands in a script file and execute
that script file on each break.
Here is an example of a script file:
as /ma ${/v:MyAlias} poi(k)
.block {
r $t0 = $spat("${MyAlias}" , "tiger")
.printf "%x\t${MyAlias}\n" , #$t0
.if(#$t0 != 1) {gc}
}
Here is the code this was run against, compiled in debug mode with optimizations turned off.
In release mode the compiler will be smart enough to inline the printf() call.
#include <stdio.h>
#include <stdlib.h> //msvc _countof
void func(char* h,char* i,char* j,char* k ) {
printf( "%s %s %s %s\n" ,h,i,j,k );
return;
}
int main(void) {
char* foo[] = {"goat","dog","sheep","cat","lion","tiger",0,"vampire"};
for(int x=0;x<_countof(foo);x++) {
func("this" , "is" , "a" , foo[x]);
}
return 0;
}
Usage:
windbg app.exe
Set the breakpoint and run.
Keep in mind that this, or any script that uses an alias, will fail when
evaluating the null entry before char * vampire.
If you want to break on "vampire", you may need to improvise without using an alias at all.
0:000> bl
0:000> bp strbp!func "$$>a< strbpcond.txt"
0:000> bl
0 e 00171260 0001 (0001) 0:**** strbp!func "$$>a< strbpcond.txt"
0:000> g
ModLoad: 72670000 72673000 C:\Windows\system32\api-ms-win-core-synch-l1-2-0.DLL
0 goat
0 dog
0 sheep
0 cat
0 lion
1 tiger
eax=00000005 ebx=7ffd7000 ecx=00000005 edx=001ac1e0 esi=001b6678 edi=001b667c
eip=00171260 esp=002bfa54 ebp=002bfa90 iopl=0 nv up ei ng nz ac po cy
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000293
strbp!func:
00171260 55 push ebp
0:000> dv
h = 0x001ac1f8 "this"
i = 0x001ac1f4 "is"
j = 0x001ac1f0 "a"
k = 0x001ac1e0 "tiger"

stat() does not work for .so files

I am facing an issue with stat(). stat() does not seem to work with .so files. It gives the error
No such file or directory
Why is this happening?
As requested I paste a portion of the code:
int main()
{
char str[300];
struct stat str_buf;
strcpy(str,"path/to/my/library/libfuncs.so");
if(stat(str,$str_buf)==-1)
perror("stat");
....
}
Thus the error comes as
stat No such file or directory
But the same code works fine for other files and directories. libfuncs.so is my generated shared library.
Many ".so" files are in fact symbolic links due to versioning issues. You might want to use lstat() in those cases, to stat the actual link.
The error you're getting ("No such file or directory") seems to imply that the symbolic link is pointing at something that doesn't exist. In these cases stat:ing the link itself helps, but of course that might not be what you want to do. Check the link's target. If the path in the link is relative, perhaps you're executing the code from a different directory?
Probable reason
I can only guess that "path/to/my/library/libfuncs.so" does not really exist. You could test that simply by typing ls "path/to/my/library/libfuncs.so".
I am pretty sure that the claim
stat() does not work
is not what is going on here. I guess this once again resolves a supposed "bug" in a very well-established library.
Theoretically possible reason
You use $ for a variable name. That is not permitted. The C99 Standard has this to say about this:
Both the basic source and basic execution character sets shall have the following
members: the 26 uppercase letters of the Latin alphabet
A B C D E F G H I J K L M
N O P Q R S T U V W X Y Z
the 26 lowercase letters of the Latin alphabet
a b c d e f g h i j k l m
n o p q r s t u v w x y z
the 10 decimal digits 0 1 2 3 4 5 6 7 8 9
the following 29 graphic characters
! " # % & ' ( ) * + , - . / :
; < = > ? [ \ ] ^ _ { | } ~
Further:
If ...
any
other characters are encountered in a source file (except in an identifier, a character
constant, a string literal, a header name, a comment, or a preprocessing token that is never converted to a token),
guess what? ** drumroll **
the behavior is undefined.
Yay, party. But I think it is the first reason.

Overwriting lines in file in C

I'm doing a project on filesystems for a university operating systems course. My C program should simulate a simple filesystem in a human-readable file, so the file is line-based: each line is a "sector". I've learned that lines must be the same length to be overwritten in place, so I'll pad them with ASCII zeroes to the end of the line and leave a certain number of lines of ASCII zeroes that can be filled later.
Now I'm making a test program to see if it works the way I want it to, but it doesn't. The critical part of my code:
file = fopen("irasproba_tesztfajl.txt", "r+"); //it is previously loaded with 10 copies of the line I'll print later in reverse order
/* this finds the 3rd line */
int count = 0; //how much have we gone yet?
char c;
while(count != 2) {
if((c = fgetc(file)) == '\n') count++;
}
fflush(file);
fprintf(file, "- . , M N B V C X Y Í Ű Á É L K J H G F D S A Ú Ő P O I U Z T R E W Q Ó Ü Ö 9 8 7 6 5 4 3 2 1 0\n");
fflush(file);
fclose(file);
Now it does nothing, the file stays the same. What could be the problem?
Thank you.
From here,
When a file is opened with a "+"
option, you may both read and write on
it. However, you may not perform an
output operation immediately after an
input operation; you must perform an
intervening "rewind" or "fseek".
Similarly, you may not perform an
input operation immediately after an
output operation; you must perform an
intervening "rewind" or "fseek".
fflush only covers the output-to-input direction. After reading, you need a positioning call such as fseek before you write, even if it seeks to the position you are already at. This is how I implemented it (it could probably be better):
/* this finds the 3rd line */
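int count = 0; //how much have we gone yet?
int c; // int, not char, so that EOF can be told apart from a valid character
long position_in_file; // ftell returns long
while(count != 2 && (c = fgetc(file)) != EOF) {
if(c == '\n') count++;
}
// Store the position
position_in_file = ftell(file);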
// Reposition it
fseek(file,position_in_file,SEEK_SET); // Or fseek(file,ftell(file),SEEK_SET);
fprintf(file, "- . , M N B V C X Y Í Ű Á É L K J H G F D S A Ú Ő P O I U Z T R E W Q Ó Ü Ö 9 8 7 6 5 4 3 2 1 0\n");
fclose(file);
Also, as has been commented, you should check if your file has been opened successfully, i.e. before reading/writing to file, check:
file = fopen("irasproba_tesztfajl.txt", "r+");
if(file == NULL)
{
printf("Unable to open file!");
exit(1);
}
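Putting the pieces together, here is a sketch of a reusable helper. The name overwrite_line and the rule that the replacement must match the old line's length exactly are my assumptions, chosen to fit the fixed-length "sector" layout from the question. It reads up to the requested line, measures it, seeks back, and only then writes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: overwrite line line_index (0-based) of a stream
 * opened in an update mode such as "r+". The replacement must be exactly
 * as long as the old line (newline excluded) so later lines stay intact.
 * Returns 0 on success, -1 on failure. */
static int overwrite_line(FILE *file, int line_index, const char *text)
{
    assert(file != NULL && text != NULL && line_index >= 0);
    int c;
    int seen = 0;

    rewind(file);
    /* Skip the lines in front of the one we want. */
    while (seen < line_index) {
        c = fgetc(file);
        if (c == EOF)
            return -1;
        if (c == '\n')
            seen++;
    }
    long start = ftell(file);
    if (start < 0)
        return -1;
    /* Measure the existing line. */
    size_t old_len = 0;
    while ((c = fgetc(file)) != EOF && c != '\n')
        old_len++;
    if (old_len != strlen(text))
        return -1;
    /* Switching from reading to writing needs a positioning call. */
    if (fseek(file, start, SEEK_SET) != 0)
        return -1;
    if (fputs(text, file) == EOF)
        return -1;
    return fflush(file) == 0 ? 0 : -1;
}
```

The final fflush pushes the data out and also makes it legal to read from the stream again afterwards.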
