I have some C code that I'm trying to understand which uses the function __isoc99_scanf(). I have never encountered that function before. I looked it up, and it turns out to be some kind of variation of scanf(). This is the code:
__isoc99_scanf(&DAT_00400d18,local_78,(undefined4 *)((long)puVar3 + 4));
&DAT_00400d18 is a C string containing the value "%s". local_78 is an array of unknown data type. puVar3 is a pointer that points to the last element of that array.
What really confuses me is why that function call has three parameters. I know that scanf() takes two parameters: the first one is the format string, and the second one is the memory address to save the data into. However, __isoc99_scanf() here is invoked with three parameters, and I cannot understand why the third parameter is there. The first parameter, &DAT_00400d18, is just "%s", which suggests that the second parameter should be the memory location where that string is saved. But why do you need the third parameter when it's not even specified in the format string?
This is not my code, I didn't write it. It is actually decompiled from the assembly of a particular application that I'm trying to debug. I've never seen __isoc99_scanf() before, because I've only ever used scanf() in my own code.
When you compile a call to scanf(), the compiler automatically translates it to the __isoc99_scanf() function in libc. If you compile this code:
#include <stdio.h>

int main() {
    char buf[33];   /* "%32s" stores up to 32 characters plus the terminating '\0' */
    scanf("%32s", buf);
    return 0;
}
and decompile it in GHIDRA, you get:
__isoc99_scanf(&DAT_001007b4,local_38);
where DAT_001007b4 is "%32s" and local_38 is the buffer. It behaves exactly the same as normal scanf. One important thing to keep in mind when using GHIDRA is that it doesn't know exactly how many arguments a variadic function like scanf expects, so it may show a call with too many arguments. If that happens, as in your case, you can just ignore the extra arguments, because the code will too: scanf only consumes as many arguments as the format string has conversions.
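You can reproduce this with plain scanf: per the C standard, excess arguments are evaluated but otherwise ignored, so a call with an extra argument is harmless. A minimal sketch:

#include <stdio.h>

int main() {
    char buf[33];
    int unused = 0;

    /* "%32s" has a single conversion, so scanf reads into buf only;
       the extra &unused argument is evaluated but otherwise ignored. */
    scanf("%32s", buf, &unused);
    printf("read: %s\n", buf);
    return 0;
}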
I got 2 error messages for the code. I tried to figure it out myself by searching online, but it didn't help. The messages look like this:
error: incompatible integer to pointer conversion passing 'char' to parameter of type 'const char *'; take the address with & [-Werror,-Wint-conversion]
strcpy(genHash,crypt( letters[i], "abc"));
the other one is the same message but for passW[0]. I just want to understand what is happening. I would appreciate any help. Also, can anyone recommend a good text about char arrays and char arrays using pointers? Thanks.
When you call a function and pass letters[i] or passW[0], you are passing a single character. But it sounds like these functions expect entire strings, that is, arrays of characters.
You might be able to get away with passing letters instead of letters[i], and passW instead of passW[0]. (But it's hard to be sure, because you haven't shown us your code.)
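If that's the case, a minimal sketch of the corrected call might look like this (the declarations and the salt are assumptions, since the full code wasn't shown):

#define _XOPEN_SOURCE 700   /* exposes crypt() in <unistd.h> on many systems */
#include <string.h>
#include <unistd.h>         /* with glibc you may need <crypt.h> and -lcrypt instead */

int main() {
    /* assumed declarations, matching the names in the question: */
    char letters[] = "hello";
    char genHash[64];

    /* pass the whole array (it decays to a char *), not a single char: */
    strcpy(genHash, crypt(letters, "ab"));
    return 0;
}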
If you want to learn about char arrays and char array pointers, then you should grab a copy of "The C Programming Language" by Kernighan and Ritchie. It has a whole chapter on them. It's also a spectacular reference for C in general.
Char variables are really small integers in C/C++, typically 8 bits (1 byte). Whether plain char is signed or unsigned is implementation-defined, so its range is either -128 to 127 or 0 to 255. When you use fputs() or other print functions in C/C++, these integer values are rendered as characters according to the character set in use.
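A one-line illustration of that integer/character duality:

#include <stdio.h>

int main() {
    char c = 'A';
    printf("%c has the value %d\n", c, c);   /* prints: A has the value 65 (in ASCII) */
    return 0;
}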
Have you checked the type that crypt() returns? If you're using a genuine GNU compiler, you can use the typeof extension to check it, or you could look for the declaration of crypt() in your header files. It's possible that it isn't the input to crypt() that's causing the problem; it could be the nesting of crypt() inside strcpy() without an appropriate cast.
I am asking this question in the context of the C language, though it applies really to any language supporting pointers or pass-by-reference functionality.
I come from a Java background, but have written enough low-level code (C and C++) to have observed this interesting phenomenon. Supposing we have some object X (not using "object" here in the strictest OOP sense of the word) that we want to fill with information by way of some other function, it seems there are two approaches to doing so:
Returning an instance of that object's type and assigning it, e.g. if X has type T, then we would have:
T func(){...}
X = func();
Passing in a pointer / reference to the object and modifying it inside the function, and returning either void or some other value (in C, for instance, a lot of functions return an int corresponding to the success/failure of the operation). An example of this here is:
int func(T* x){ ... *x = 1; ... }
func(&X);
My question is: in what situations is one method better than the other? Are they equivalent approaches to accomplishing the same outcome? What are the restrictions of each?
Thanks!
There is a reason that you should always consider using the second method, rather than the first. If you look at the return values for the entirety of the C standard library, you'll notice that there's almost always an element of error handling involved in them. For example, you have to check the return value of the following functions before you assume they've succeeded:
calloc, malloc and realloc
getchar
fopen
scanf and family
strtok
There are other non-standard functions that follow this pattern:
pthread_create, etc.
socket, connect, etc.
open, read, write, etc.
Generally speaking, a return value conveys a number of items successfully read/written/converted or a flat-out boolean success/fail value, and in practice you'll almost always need such a return value, unless you're going to exit(EXIT_FAILURE); at any errors (in which case I would rather not use your modules, because they give me no opportunity to clean up within my own code).
There are functions that don't use this pattern in the standard C library, because they use no resources (e.g. allocations or files) and so there's no chance of any error. If your function is a basic translation function (e.g. like toupper, tolower and friends which translate single character values), for example, then you don't need a return value for error handling because there are no errors. I think you'll find this scenario quite rare indeed, but if that is your scenario, by all means use the first option!
In summary, you should always strongly consider using option 2, reserving the return value for a similar use, for the sake of consistency with the rest of the world, and because you might later decide that you need the return value for communicating errors or the number of items processed.
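A minimal sketch of the pattern (all names here are hypothetical):

#include <stdio.h>

struct point { int x, y; };

/* fill a caller-provided struct; reserve the return value for
   success (0) or failure (-1) */
int parse_point(const char *text, struct point *out) {
    if (sscanf(text, "%d,%d", &out->x, &out->y) != 2)
        return -1;   /* input didn't match; *out may be partly written */
    return 0;
}

int main() {
    struct point p;
    if (parse_point("3,4", &p) == 0)
        printf("x=%d y=%d\n", p.x, p.y);
    else
        fprintf(stderr, "bad input\n");
    return 0;
}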
Method (1) returns the object by value, which requires that the object be copied: it's copied when the function returns, and possibly again when the result is assigned. Method (2) passes only a pointer. When you're dealing with a primitive, (1) is just fine, but when you're dealing with a large struct or an array, that's just wasted space and time.
In Java and many other languages, objects are always passed by reference. Behind the scenes, only a pointer is copied. This means that even though the syntax looks like (1), it actually works like (2).
I think I see what you're getting at.
These two approaches are very different.
The question you have to ask yourself whenever you're trying to decide which approach to take is:
Which class should have the responsibility?
If you pass in a reference to the object, you delegate its creation to the caller and make the functionality more reusable: you can create a util class in which all of the functions are stateless; they take an object, manipulate the input, and return it.
The other approach is more like an API: you are requesting an operation.
For example, if you have an array of bytes and you would like to convert it to a string, you would probably choose the first approach.
And if you would like to do some operation in a DB, you would choose the second one.
Whenever you have more than one function of the first kind covering the same area, you would encapsulate them into a util class; the same applies to the second kind, which you would encapsulate into an API.
In method 2, we call x an output parameter. This is actually a very common design used in a lot of places; think of the various standard C functions that populate a text buffer, like snprintf.
This has the benefit of being fairly space-efficient, since you won't be copying structs/arrays/data onto the stack and returning brand new instances.
A really, really convenient quality of method 2 is that you can essentially have any number of "return values." You "return" data through the output parameters, but you can also return a success/error indicator from the function.
A good example of method 2 being used effectively is the standard C function strtol. This function converts a string to a long (basically, it parses a number from a string). One of the parameters is a char **. When calling the function, you declare char *endptr locally and pass in &endptr.
The function will return either:
the converted value if it was successful,
0 if it failed, or
LONG_MIN or LONG_MAX if it was out of range
as well as set the endptr to point to the first non-digit it found.
This is great for error reporting if your program depends on user input, because you can check for failure in so many ways and report different errors for each.
If *endptr isn't the null terminator after the call to strtol, then you know precisely that the user entered something other than a pure integer, and you can print straight away the character that the conversion stopped on if you'd like.
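For instance, a minimal sketch of that check (the input string is made up for illustration):

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <limits.h>

int main() {
    const char *input = "123x";
    char *endptr;

    errno = 0;
    long value = strtol(input, &endptr, 10);

    if (endptr == input)
        printf("no digits found\n");
    else if (*endptr != '\0')
        printf("conversion stopped at '%c'\n", *endptr);
    else if (errno == ERANGE && (value == LONG_MIN || value == LONG_MAX))
        printf("value out of range\n");
    else
        printf("parsed %ld\n", value);
    return 0;
}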
Like Thom points out, Java makes implementing method 2 simpler by simulating pass-by-reference behavior, which is just pointers behind the scenes without the pointer syntax in the source code.
To answer your question: I think C lends itself well to the second method. Functions like realloc are there to give you more space when you need it. However, there isn't much stopping you from using the first method.
Maybe you're trying to implement some kind of immutable object. The first method will be the choice there. But in general, I opt for the second.
(Assuming we are talking about returning only one value from the function.)
In general, the first method is used when type T is relatively small. It is definitely preferable with scalar types. It can be used with larger types. What is considered "small enough" for these purposes depends on the platform and the expected performance impact. (The latter is caused by the fact that the returned object is copied.)
The second method is used when the object is relatively large, since this method does not perform any copying. And with non-copyable types, like arrays, you have no choice but to use the second method.
Of course, when performance is not an issue, the first method can be easily used to return large objects.
An interesting matter is the optimization opportunities available to the compiler. In the C++ language, compilers are allowed to perform Return Value Optimizations (RVO, NRVO), which effectively turn the first method into the second one "under the hood" in situations when the second method offers better performance. To facilitate such optimizations, C++ relaxes some address-identity requirements imposed on the involved objects. AFAIK, C does not offer such relaxations, thus preventing (or at least impeding) any attempts at RVO/NRVO.
Short answer: take 2 if you don't have a compelling reason to take 1.
Long answer: in the world of C++ and its derived languages (Java, C#), exceptions help a lot. In the C world, there is not much you can do. The following is a sample API taken from the CUDA library, which is a library I like and consider well designed:
cudaError_t cudaMalloc (void **devPtr, size_t size);
compare this API with malloc:
void *malloc(size_t size);
in old C interfaces, there are many such examples:
int open(const char *pathname, int flags);
FILE *fopen(const char *path, const char *mode);
I would argue to the end of the world that the interface CUDA provides is much more obvious and leads to correct results.
There is another set of interfaces where the space of valid return values actually overlaps with the error codes, so the designers of those interfaces scratched their heads and came up with ideas that are not brilliant at all, say:
ssize_t read(int fd, void *buf, size_t count);
An everyday operation like reading a file's content is restricted by the definition of ssize_t: since the return value has to encode an error code too, it has to allow for negative numbers. On a 32-bit system, the maximum of ssize_t is 2 GB, which severely limits the number of bytes you can read from your file in one call.
If your error designator is encoded inside the function's return value, I bet 10 out of 10 programmers won't try to check it, even though they really know they should; they just don't, or don't remember, because the form is not obvious.
And another reason is that human beings are very lazy and not good at dealing with ifs. The documentation of these functions will describe:
if return value is NULL then ... blah.
if return value is 0 then ... blah.
Yuck.
In the first form (the CUDA style), things change. How do you judge whether the value was returned successfully? There is no more NULL or 0; you have to compare against SUCCESS, FAILURE1, FAILURE2, or something similar. This interface forces users to code more safely and makes the code much more robust.
With these macros or enums, it's much easier for programmers to learn about the effects of the API and the causes of the different failures. And with all these advantages, there is actually no extra runtime overhead either.
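For illustration, a minimal sketch of that style in use, based on the CUDA runtime API shown above:

#include <stdio.h>
#include <cuda_runtime.h>

int main() {
    void *devPtr = NULL;
    cudaError_t err = cudaMalloc(&devPtr, 1024);

    if (err != cudaSuccess) {
        /* the error path is explicit and hard to forget */
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaFree(devPtr);
    return 0;
}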
I will try to explain :)
Let's say you have to load a giant rocket onto a semi.
Method 1)
The truck driver parks the truck in a parking lot and goes off to find a hooker; you are stuck with putting the load onto a forklift or some kind of trailer to bring it to the truck.
Method 2)
The truck driver forgets the hooker and backs the truck right up to the rocket; then you just need to push it in.
That is the difference between the two :). What it boils down to in programming is:
Method 1)
The caller function reserves an address for the called function to return its value into. How the called function gets that value there does not matter to the caller, and whether it has to reserve another address or not does not matter either: "I need something returned; it is your job to get it to me." So the called function reserves an address for its calculations, stores the value at that address, and then returns the value to the caller. The caller then says, "oh, thank you, let me just copy it to the address I reserved earlier."
Method 2)
The caller function says: "Hey, I will help you. I will give you the address that I have reserved; store whatever you calculate in it." This way you save not only memory but also time.
And I think second is better, and here is why:
So let's say you have a struct with 1000 ints inside it. With method 1 this would be pointless: the program has to reserve 2 × 1000 × 32 bits of memory, which is 64,000 bits, and on top of that it has to copy the data to the first location and then copy it again to the second one. So if each copy took 1 millisecond, you would need 2 seconds just to store and copy the variables, whereas with an address you only have to store each value once.
They are equivalent to me, but not in their implementation.
#include <stdio.h>
#include <stdlib.h>

int func(int a, int b){
    return a + b;
}

int funn(int *x){
    *x = 1;
    return 777;
}

int main(void){
    int sx, *dx;

    /* the "static" case */
    sx = func(4, 6);   /* looks legit */
    funn(&sx);         /* looks wrong in this case */

    /* the "dynamic" case */
    dx = malloc(sizeof(int));
    if(dx){
        *dx = func(4, 6);   /* looks wrong in this case */
        sx = funn(dx);      /* looks legit */
        free(dx);
    }
    return 0;
}
In the "static" approach it is more comfortable to me to use your first method, because I don't want to mess with the dynamic part (with legit pointers).
But in the "dynamic" approach I'll use your second method, because it is made for it.
So they are equivalent but not the same: the second approach is clearly made for pointers, and so for the dynamic part.
And this one is far clearer:
int main(void){
    int sx, *dx;

    sx = func(4, 6);

    dx = malloc(sizeof(int));
    if(dx){
        sx = funn(dx);
        free(dx);
    }
    return 0;
}
than this:
int main(void){
    int sx, *dx;

    funn(&sx);

    dx = malloc(sizeof(int));
    if(dx){
        *dx = func(4, 6);
        free(dx);
    }
    return 0;
}
Maybe I have the wrong idea, but it was my understanding that wide types (i.e. wchar_t etc.) were for UTF-16 Unicode. If this is correct, then I can't understand the flood of responses to similar issues, all involving some form of wchar_t or other "wide" conversion, when the problem is UTF-8.
I'm doing a C++/CLI project with MSVC, with a Unicode build, that uses an implementation of Luac to compile Lua code to bytecode. Now everything works nicely in that regard, but the trouble is that no special handling is done for UTF-8 files, except for "discarding" the BOM. So all the data within is treated as ANSI. Obviously, when it comes to special characters, it becomes a problem to display them correctly.
So, I need a way to convert between the two, preferably at the source (fopen); but as I've rerouted the output, I could also do so there. Unfortunately, the only promising solution I've found, using FILE* fh = fopen(fn, "r,css=UTF-8");, just ends up kicking back an exception for an invalid file mode. Which is puzzling, considering it's a Visual C++ project.
Unless of course, I need to change my include order/add an additional include?
/lauxlib.c
#include <ctype.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "lua.h"
#include "lauxlib.h"
/lauxlib.h
#include <stddef.h>
#include <stdio.h>
#include "lua.h"
Edit:
After taking a look at the file in a hex editor, I'm beginning to understand. UTF-8 isn't always just 1 byte per character; it's 1 byte only when it can be. The initial problem still remains, but at least I understand it a bit more.
Edit2/Update:
First off, I'm not sure if this part should be an answer, or if I should close the question - so please, feel free to educate me on that.
The application was originally written to be a console application - so when it needed to output, it just used either putchar or printf. However, this wasn't going to help for a WinForms application. So I basically just rerouted it, by making managed-friendly equivalents.
Luac is essentially a parser/compiler for Lua scripts. It has the option to output information based on the result of that parsing, listing things like functions, opcodes, constants, and local variables. When it prints out the constants for each function, it prints the actual values of said constants. And that's where the encoding problem comes in.
If the constant value is a string type, the function written to handle printing strings, does the following:
casts its parameter, a pointer to a union type, to a const char*
loops through the const char* via index, assigning the value of the char to an int
checks for any escape characters in the text via a switch/case (tab, newline,etc) and escapes them
if that falls through, the default case is checking if it's a printable char, using isprint
If it is, it uses putchar
If not, it uses printf. Casting it to an unsigned char, and using \\%03u for the format.
Now obviously, if the intention was to display it in a form control, and the format is UTF-8, printing out the unsigned value of the individual chars isn't going to help. So I finally decided to just keep Googling for some usage clarification on MultiByteToWideChar, and that worked, except for high-value characters (i.e. Asian-language characters). Since I found claims that said Windows function makes some mistakes, I eventually found another implementation that did the conversion "manually". Unfortunately, it still didn't handle those characters correctly.
So I took another look at the actual const char* that was being looped over, and discovered the reason it wasn't being converted - was because something else changed those chars to a value of 63 - the question mark. And this is about the time where tracking that particular "something else" is far beyond my abilities, and asking for help has a real good chance of ending up far too specific for this site's guidelines.
The parameter that this function takes is a pointer to a union typedef, which contains a typedef for string alignment and a struct, and which contains absolutely zero char arrays/pointers. Yet it gets cast to one; that is how the parameter is turned into a const char* in the function. Since specifically changing certain char values to 63 doesn't seem very beneficial, I'm thinking it's either the result of a C function or maybe an ill-advised (at least in this case) cast. Maybe if someone knows of a situation where that would be the result, and lets me know, I could probably find the offending code. But otherwise, it's way too specific for me to expect someone to be able to help in this case.
Use the Win32 API function MultiByteToWideChar to convert the text you read to "wide", which is UTF-16. I think the C++ stream classes and/or FILE streams have a conversion mode, which would be exactly what you need.
wchar_t is a 16-bit UTF-16 code unit on Windows. Other platforms often make wchar_t 32 bits and follow different conventions.
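A minimal sketch of the conversion (Win32 only; it assumes the bytes you read are valid UTF-8):

#include <windows.h>
#include <stdio.h>

int main() {
    const char *utf8 = "caf\xC3\xA9";   /* "café" encoded as UTF-8 */

    /* first call: ask how many UTF-16 code units are needed */
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8, -1, NULL, 0);
    if (len == 0)
        return 1;

    wchar_t wide[32];   /* big enough for this example */
    MultiByteToWideChar(CP_UTF8, 0, utf8, -1, wide, len);

    wprintf(L"%ls\n", wide);
    return 0;
}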
I am trying to save the data type given as a command line argument in my C program and use that type in the whole program without checking it. For example, I could run the program "./name -d int" or "./name -d float" and I want the data type to be saved for further use and to be seen in the entire program, not only in the main() function. A short example:
int main() {
    /* read the command line argument */
    /* I would like to be able to save the type in T and use it like this: */
    T a[20];
    /* rest of the program */
}
Could I do this?
Thank you.
As pointed out by an earlier answer, you can't do this in C unless you have something like switch statements in your code that handle the different cases, because data types are determined at compile time. If you are willing to settle for less than 64-bit precision for integers, and you have 64-bit doubles, you can use doubles for all of your numbers and then have switch statements, e.g. when you output, that convert the double to an integer or char etc. as necessary and print it in the desired format.
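A sketch of that idea (the enum tag and the flag parsing are hypothetical):

#include <stdio.h>
#include <string.h>

enum out_type { AS_INT, AS_FLOAT };

/* store everything as double; switch on the saved tag only at output time */
void print_value(double v, enum out_type t) {
    switch (t) {
    case AS_INT:   printf("%ld\n", (long)v); break;
    case AS_FLOAT: printf("%f\n", v);        break;
    }
}

int main(int argc, char **argv) {
    enum out_type t = AS_FLOAT;
    if (argc > 2 && strcmp(argv[1], "-d") == 0 && strcmp(argv[2], "int") == 0)
        t = AS_INT;

    double a[20] = { 3.7 };   /* stands in for the T a[20] in the question */
    print_value(a[0], t);
    return 0;
}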
No, you can't do that in C.
Types have to be determined at compile time; you can't choose a type at runtime.
More dynamic languages (like Objective-C for instance) will allow you to do such things.
Why are you trying to do that in the first place? Maybe we can provide some more guidance.