How mbtowc uses locale? - c

I have hard time using mbtowc, which keeps returning wrong results. It also puzzles me why the function even uses locale? Multibyte unicode chars points are locale independent. I implemented custom conversion function that convert it well, see the code below.
I use GCC 4.8.1 on Windows (where sizeof wchar_t is 2), using Czech locale (cs_CZ). The OEM codepage is windows-1250, console by default uses CP852. These are my results so far:
#include <stdio.h>
#include <stdlib.h>
// my custom conversion function
int u8toint(const char* str) {
if(!(*str&128)) return *str;
unsigned char c = *str, bytes = 0;
while((c<<=1)&128) ++bytes;
int result = 0;
for(int i=bytes; i>0; --i) result|= (*(str+i)&127)<<(6*(bytes-i));
int mask = 1;
for(int i=bytes; i<6; ++i) mask<<= 1, mask|= 1;
result|= (*str&mask)<<(6*bytes);
return result;
}
// data inspecting type for the tests in main()
union data {
wchar_t w;
struct {
unsigned char b1, b2;
} bytes;
} a,b,c;
int main() {
// I tried setlocale here
mbtowc(NULL, 0, 0); // reset internal mb_state
mbtowc(&(a.w),"ř",6); // apply mbtowc
b.w = u8toint("ř"); // apply custom function
c.w = L'ř'; // compare to wchar
printf("\na = %hhx%hhx", a.bytes.b2, a.bytes.b1); // a = 0c5 wrong
printf("\nb = %hhx%hhx", b.bytes.b2, b.bytes.b1); // b = 159 right
printf("\nc = %hhx%hhx", c.bytes.b2, c.bytes.b1); // c = 159 right
getchar();
}
Here are setlocale settings and the results for a:
setlocale(LC_CTYPE,"Czech_Czech Republic.1250"); // a = 139 wrong
setlocale(LC_CTYPE,"Czech_Czech Republic.852"); // a = 253c wrong
Why mbtowc doesn't give 0x159 - the unicode number of ř?

Related

Can Ghidra re-compile and run a short function?

I've picked out a short and "self-contained" function from the Ghidra decompiler. Can Ghidra itself compile the function again so I can try to run it for a couple different values, or would I need to compile it myself with e.g. gcc?
Attaching the function for context:
undefined8 FUN_140041010(char *param_1,longlong param_2,uint param_3)
{
char *pcVar1;
uint uVar2;
ulonglong uVar3;
uVar3 = 0;
if (param_3 != 0) {
pcVar1 = param_1;
do {
if (pcVar1[param_2 - (longlong)param_1] == '\0') {
if ((uint)uVar3 < param_3) {
param_1[uVar3] = '\0';
return 0;
}
break;
}
*pcVar1 = pcVar1[param_2 - (longlong)param_1];
uVar2 = (uint)uVar3 + 1;
uVar3 = (ulonglong)uVar2;
pcVar1 = pcVar1 + 1;
} while (uVar2 < param_3);
}
param_1[param_3 - 1] = '\0';
return 0;
}
Can Ghidra itself compile the function again so I can try to run it for a couple different values
The P-Code emulator of Ghidra is intended for this kind of scenario.
If it is just a short function and doesn't use other libraries, syscalls, etc like your example then the emulator can easily handle this without further effort on your side to emulate library functions. Ghidra knows the semantics of each instruction and converts them to the standardized P-Code format for e.g. decompilation, but this can also be combined with a "P-Code virtual machine".
It will most likely still involve a bit of scripting, though there exist plugins like TheRomanXpl0it/ghidra-emu-fun to make this easier. There are also more general tutorials if you want to understand the basic idea and usage of the Emulator API (which is not exposed in the GUI in any way)
If you run into issues while scripting the emulator I would recommend asking specific questions about the emulator API at the dedicated Reverse Engineering Stack Exchange
You can, but you'll have to change some of the types to be standard C, or just add typedefs like so:
#include <stdint.h>
typedef uint8_t undefined8;
typedef long long int longlong;
typedef unsigned long long int ulonglong;
typedef unsigned int uint;
undefined8 FUN_140041010(char *param_1,longlong param_2,uint param_3)
{
char *pcVar1;
uint uVar2;
ulonglong uVar3;
uVar3 = 0;
if (param_3 != 0) {
pcVar1 = param_1;
do {
if (pcVar1[param_2 - (longlong)param_1] == '\0') {
if ((uint)uVar3 < param_3) {
param_1[uVar3] = '\0';
return 0;
}
break;
}
*pcVar1 = pcVar1[param_2 - (longlong)param_1];
uVar2 = (uint)uVar3 + 1;
uVar3 = (ulonglong)uVar2;
pcVar1 = pcVar1 + 1;
} while (uVar2 < param_3);
}
param_1[param_3 - 1] = '\0';
return 0;
}
Then you can call it like any function:
int main(int argc, char const* argv[])
{
char* mystr = "hello";
printf("%hhu\n", FUN_140041010(mystr, /* not sure about this arg */ 0, strlen(mystr));
return 0;
}

C: variables retain the value from previous operation instead of resetting

I am fairly new to C and have been trying my hand with some arduino projects on Proteus. I recently tried implementing a keypad and LCD interface with Peter Fleury's libraries, so far the characters I input are displayed fine, but I run into a problem when trying to print to the serial port. It's like the value of the keys keeps on being concatenated with every iteration so the ouput has extra characters like this:
The value before the comma is from the 'key' variable, the value after it the 'buf' variable:
151
(The 5 I input in the second iteration was added to the 1 from the first iteration and then put into the variable I print)
I figure it may be due to my lack/incorrect use of pointers, heres is my code:
#include <avr/io.h>
#include <util/delay.h>
#include <stdlib.h>
#include <stdio.h>
#include "lcd.h"
#include "mat_kbrd.h"
#include "funciones.h"
#include "menu.h"
char buf[256];
char* coma = ",";
int main(void)
{
pin_init();
serial_begin();
lcd_init(LCD_DISP_ON);
kbrd_init();
bienvenida();
while (1) {
int i = 0;
char key = 0;
//char *peso;
//int pesoSize = 1;
char peso[100];
//peso = calloc(pesoSize,sizeof(char));
int salida = 0;
lcd_clrscr();
desechos();
key = kbrd_read();
if (key != 0) {
lcd_gotoxy(0,3);
lcd_putc(key);
_delay_ms(2000);
lcd_clrscr();
cantidad();
while (salida != 1) {
char keypeso = 0;
keypeso = kbrd_read();
//pesoSize = i;
//peso = realloc(peso,pesoSize*sizeof(char));
if (keypeso != 0) {
if (keypeso == '+') {
salida = 1;
keypeso = *("");
lcd_clrscr();
calcularTotal(key,peso);
_delay_ms(2000);
} else {
lcd_gotoxy(i,1);
lcd_putc(keypeso);
snprintf(peso, sizeof peso, "%s%s",peso, &keypeso);
//strcat(peso,&keypeso);
i++;
_delay_ms(2000);
}
}
}
snprintf(buf, sizeof buf, "%s%s%s", &key,coma,peso);
serial_println_str(buf);
}
}
}
&key and &keypeso point to a single char, but you are using the %s format specifier, so trying to read a string into a single char. Use %c rather then %s for single characters, and pass the char not the address-of-char..

Initializing an array of structure in C

I'm using the LXLE 14.04 distribution of Linux. I want to write a C program to read commands, interpret and perform them. I'd like the program to be efficient, and I do not want to use a linked list. The commands are operations on sets. Each set can contain any of the values from 0 through 127 inclusive. I decided to represent a set as an array of characters, containing 128 bits. If bit at position pos is turned on then the number pos is in the set and if the bit at position pos is turned off then the number pos is not present in the set. For example, if the bit at position 4 is 1, then the number 4 is present in the set, if the bit at position 11 is 1 then the number 11 is present in the set.
The program should read commands and interpret them in a certain way. There are a few commands: read_set, print_set, union_set, intersect_set, sub_set and halt.
For example, the command read_set A,1,2,14,-1 in the terminal will cause the reading of values of the list into the specified set in the command. In this case the specified set in the command is A. The end of the list is represented by -1. So after writing this command, the set A will contain the elements 1,2,14.
This is what I have so far. Below is the file set.h
#include <stdio.h>
typedef struct
{
char array[16]; /*Takes 128 bits of storage*/
}set;
extern set A , B , C , D , E , F;
This is the file main.c
#include <stdio.h>
#include "set.h"
#include <string.h>
#include <stdlib.h>
set A , B , C , D , E , F; /*Variable definition*/
set sets[6];
/*Below I want to initialize sets so that set[0] = A set[1] = B etc*/
sets[0].array = A.array;
sets[1].array = B.array;
sets[2].array = C.array;
sets[3].array = D.array;
sets[4].array = E.array;
sets[5].array = F.array;
void read_set(set s,char all_command[])
{
int i, number = 0 , pos;
char* str_num = strtok(NULL,"A, ");
unsigned int flag = 1;
printf("I am in the function read_set right now\n");
while(str_num != NULL) /*without str_num != NULL get segmentation fault*/
{
number = atoi(str_num);
if(number == -1)
return;
printf("number%d ",number);
printf("str_num %c\n",*str_num);
i = number/8; /*Array index*/
pos = number%8; /*bit position*/
flag = flag << pos;
s.array[i] = s.array[i] | flag;
str_num = strtok(NULL, ", ");
if(s.array[i] & flag)
printf("Bit at position %d is turned on\n",pos);
else
printf("Bit at position %d is turned off\n",pos);
flag = 1;
}
}
typedef struct
{
char *command;
void (*func)(set,char*);
} entry;
entry chart[] = { {"read_set",&read_set} };
void (*getFunc(char *comm) ) (set,char*)
{
int i;
for(i=0; i<2; i++)
{
if( strcmp(chart[i].command,comm) == 0)
return chart[i].func;
}
return NULL;
}
int main()
{
#define PER_CMD 256
char all_comm[PER_CMD];
void (*ptr_one)(set,char*) = NULL;
char* comm; char* letter;
while( (strcmp(all_comm,"halt") != 0 ) & (all_comm != NULL))
{
printf("Please enter a command");
gets(all_comm);
comm = strtok(all_comm,", ");
ptr_one = getFunc(comm);
letter = strtok(NULL,",");
ptr_one(sets[*letter-'A'],all_comm);
all_comm[0] = '\0';
letter[0] = '\0';
}
return 0;
}
I defined a command structure called chart that has a command name and function pointer for each command. Then I have created an array of these
structures which can be matched within a loop.
In the main function, I've created a pointer called ptr_one. ptr_one holds the value of the proper function depending on the command entered by the user.
The problem is, that since user decides which set to use,I need to represent the sets as some variable, so that different sets can be sent to the function ptr_one. I thought about
creating an array in main.c like so
set sets[6];
sets[0] = A;
sets[1] = B;
sets[2] = C;
sets[3] = D;
sets[4] = E;
sets[5] = F;
And then call the function ptr_one in the main function like this ptr_one(sets[*letter-'A'] , all_command).
That way, I convert my character into a set.
The problem is that while writing the above code I got the following compile error:
error: expected ���=���, ���,���, ���;���, ���asm��� or ���attribute��� before ���.��� token
I also tried the following in the file main.c
sets[0].array = A.array;
sets[1].array = B.array;
sets[2].array = C.array;
sets[3].array = D.array;
sets[4].array = E.array;
sets[5].array = F.array;
But I got this compile error expected ���=���, ���,���, ���;���, ���asm��� or ���attribute��� before ���.��� token
I know similar questions have been asked, by they don't seem to help in my
specific case. I tired this set sets[6] = { {A.array},{B.array},{C.array},{D.array},{E.array},{F.array} } too but it did not compile.
What's my mistake and how can I initialize sets so that it holds the sets A though F?

unexpected out put from char **name, c

Am in my way to practice how to use the pcre regex library, to match regular expression/pattern against a given data/buffer.Then if there is match, i have to load the matched string to may array/list. but, when i print my list/array (using a for loop), the output is unexpected/wrong. pls see how the logic works:
1.first i have to load patterns/regex.....i have a function to do this and returns patterns in an array/list.
2.iterate on each pattern and search for match in a data/buffer....pcre library handles this business.
3.if match exists, push/fill to a list/array
4.print out all matches with in loop
my sample code is:
code.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pcre.h>
#define NUMBER 3
#define MAX_GROUPS 200
char *data = "Jun 12 05 12:24:48 100.101.102.103 end";
char **load_patern()
{
char **regex_array = malloc (sizeof (char *) * NUMBER);
regex_array[0] = "(\\w+\\s\\d+\\s\\d+\\s\\d+:\\d+:\\d+)"; //pattern to match date time
regex_array[1] = "(\\d+.\\d+.\\d+.\\d+)"; //pattern for ip_adress
regex_array[2] = "(end)";//pattern for "end"
return regex_array;
}
int main()
{
char **patern_list = load_patern();
int t,i,size,rc,p,re_err_offset,re_vce[MAX_GROUPS];
char sBuffer[512];
char *pLasts = NULL;
pcre *re_compiled;
pcre_extra *re_extra;
const char *re_err_str;
char *token,*logg,*next,*match_str;
char **struct_list = malloc (sizeof (char *) * NUMBER);
for(t=0;t<3;t++) //3 is number of patterns, patern_list size
{
snprintf(sBuffer, sizeof(sBuffer), "%s",data);
pLasts = sBuffer;
re_compiled = pcre_compile(patern_list[t], 0, &re_err_str,
&re_err_offset, NULL);
re_extra = pcre_study(re_compiled, 0, &re_err_str);
next = pLasts;
size = strlen(pLasts);
rc = pcre_exec(re_compiled, re_extra, next, size, 0, 0, re_vce, MAX_GROUPS);
if(rc>0) //if match exists
{
next[re_vce[3]] = '\0';
match_str = next + re_vce[2];
struct_list[t] = match_str;
printf("data at [%d]:%s\n",t,struct_list[t]);//this prints correctly
}
}
//but,here am trying to print each match stored in struct_list[], but it fails to display correctly.
for(p=0; p<3; p++)
{
printf("loop_one: [%d]----%s\n",p,struct_list[p]);
}
return 0;
}
The out put from a loop iterating on p,should be:
loop_one: 0----Jun 12 05 12:24:48
loop_one: 1----100.101.102.103
loop_one: 2----end
any thing i miss?

Fuzzy regex match using TRE

I'm trying to use the TRE library in my C program to perform a fuzzy regex search. I've managed to piece together this code from reading the docs:
regex_t rx;
regcomp(&rx, "(January|February)", REG_EXTENDED);
int result = regexec(&rx, "January", 0, 0, 0);
However, this will match only an exact regex (i.e. no spelling errors are allowed). I don't see any parameter which allows to set the fuzziness in those functions:
int regcomp(regex_t *preg, const char *regex, int cflags);
int regexec(const regex_t *preg, const char *string, size_t nmatch,
regmatch_t pmatch[], int eflags);
How can I set the level of fuzziness (i.e. maximum Levenshtein distance), and how do I get the Levenshtein distance of the match?
Edit: I forgot to mention I'm using the Windows binaries from GnuWin32, which are available only for version 0.7.5. Binaries for 0.8.0 are available only for Linux.
Thanks to #Wiktor Stribiżew, I found out which function I need to use, and I've successfully compiled a working example:
#include <stdio.h>
#include "regex.h"
int main() {
regex_t rx;
regcomp(&rx, "(January|February)", REG_EXTENDED);
regaparams_t params = { 0 };
params.cost_ins = 1;
params.cost_del = 1;
params.cost_subst = 1;
params.max_cost = 2;
params.max_del = 2;
params.max_ins = 2;
params.max_subst = 2;
params.max_err = 2;
regamatch_t match;
match.nmatch = 0;
match.pmatch = 0;
if (!regaexec(&rx, "Janvary", &match, params, 0)) {
printf("Levenshtein distance: %d\n", match.cost);
} else {
printf("Failed to match\n");
}
return 0;
}

Resources