What is stdin in C language? - c

I want to build my own scanf function. The basic idea is to read data from one memory address and save it to another memory address.
What is stdin? Is it a memory-address like 000ffaa?
If it is a memory address, what is it, so that I can build my own scanf function? Thanks!

No, stdin is not "a memory address".
It's an I/O stream, basically an operating-system level abstraction that allows data to be read (or written, in the case of stdout).
You need to use the proper stream-oriented I/O functions to read from the stream.
Of course you can read from RAM too, so it's best to have your own function take a callback that reads a single character; you can then supply a callback that reads either from RAM or from stdin.
Something like:
int my_scanf(int (*getchar_callback)(void *state), void *state, const char *fmt, ...);
is usually reasonable. The state pointer is user-defined state required by the getchar_callback() function; my_scanf() passes it through unchanged on every call.
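A minimal sketch of that callback design, assuming a deliberately tiny format language (only %d, whitespace, and literal characters). The struct mem_state type and the mem_getchar() and stdin_getchar() callbacks are names made up for this illustration, and overflow checking is left out:

```c
#include <ctype.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical reader state for an in-memory, NUL-terminated source. */
struct mem_state {
    const char *buf;
    size_t pos;       /* index of the next unread character */
};

/* Callback that reads from RAM; returns EOF at the end of the string. */
int mem_getchar(void *state)
{
    struct mem_state *s = state;
    if (s->buf[s->pos] == '\0')
        return EOF;
    return (unsigned char)s->buf[s->pos++];
}

/* Callback that reads from stdin; the state pointer is unused. */
int stdin_getchar(void *state)
{
    (void)state;
    return getchar();
}

/* Tiny scanf-like function. Returns the number of successful conversions.
   It keeps one character of lookahead, which is dropped when it returns;
   a fuller version would need an ungetc-style callback as well. */
int my_scanf(int (*getchar_callback)(void *state), void *state,
             const char *fmt, ...)
{
    va_list ap;
    int converted = 0;
    int c = getchar_callback(state);

    va_start(ap, fmt);
    for (; *fmt != '\0'; fmt++) {
        if (isspace((unsigned char)*fmt)) {
            while (c != EOF && isspace(c))
                c = getchar_callback(state);
        } else if (fmt[0] == '%' && fmt[1] == 'd') {
            int sign = 1, value = 0;
            fmt++;                          /* skip the 'd' */
            while (c != EOF && isspace(c))
                c = getchar_callback(state);
            if (c == '-' || c == '+') {
                if (c == '-')
                    sign = -1;
                c = getchar_callback(state);
            }
            if (c == EOF || !isdigit(c))
                break;                      /* matching failure */
            while (c != EOF && isdigit(c)) {
                value = value * 10 + (c - '0');
                c = getchar_callback(state);
            }
            *va_arg(ap, int *) = sign * value;
            converted++;
        } else {
            if (c != (unsigned char)*fmt)
                break;                      /* literal mismatch */
            c = getchar_callback(state);
        }
    }
    va_end(ap);
    return converted;
}
```

Calling my_scanf(mem_getchar, &state, "%d %d", &a, &b) parses from memory, while my_scanf(stdin_getchar, NULL, "%d", &a) reads from stdin with the same parsing code.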

stdin is an "input stream", which is an abstract term for something that takes input from the user or from a file. It is an abstraction layer sitting on top of the actual file handling and I/O. The purpose of streams is mainly to make your code portable between different systems.
Reading/writing to memory is much more low-level and has nothing to do with streams as such. To get at what lies beneath a stream, you would have to know how a particular compiler and C library implement the stream internally, which may not be public information. In some cases, like in Windows, streams are defined by the OS itself and can be accessed through API calls.
If you are looking to build your own scanf function, you would have to look into specific API functions for a specific OS, then build your own abstraction layer on top of those.

On Unix everything is a file
https://en.wikipedia.org/wiki/Everything_is_a_file
Or, as they note there:
Everything is a file descriptor
On a Unix system you can find /dev/stdin, which is typically a symbolic link to /dev/fd/0, which in turn is a character special file when the input comes from a terminal.

Related

Can I adapt a function that writes to disk to write to memory

I have a third-party library with a function that does some computation on the specified data, and writes the results to a file specified by file name:
int manipulateAndWrite(const char *filename, const FOO_DATA *data);
I cannot change this function, or reimplement the computation in my own function, because I do not have the source.
To get the results, I currently need to read them from the file. I would prefer to avoid the write to and read from the file, and obtain the results into a memory buffer instead.
Can I pass a filepath that indicates writing to memory instead of a filesystem?
Yes, you have several options, although only the first suggestion below is supported by POSIX. The rest of them are OS-specific, and may not be portable across all POSIX systems, although I do believe they work on all POSIXy systems.
You can use a named pipe (FIFO), and have a helper thread read from it concurrently to the writer function.
Because there is no file per se, the overhead is just the syscalls (write and read): basically the overhead of interprocess communication, nothing to worry about. To conserve resources, do create the helper thread with a small stack (using pthread_attr_setstacksize()), as the default stack size tends to be huge, on the order of several megabytes; 2*PTHREAD_STACK_MIN should be plenty for helper threads.
You should ensure the named pipe is in a safe directory, accessible only to the user running the process, for example.
In many POSIXy systems, you can create a pipe or a socket pair, and access it via /dev/fd/N, where N is the descriptor number in decimal. (In Linux, /proc/self/fd/N also works.) This is not mandated by POSIX, so may not be available on all systems, but most do support it.
This way, there is no actual file per se, and the function writes to the pipe or socket. If the data written by the function is at most PIPE_BUF bytes, you can simply read the data from the pipe afterwards; otherwise, you do need to create a helper thread to read from the pipe or socket concurrently to the function, or the write will block.
In this case, too, the overhead is minimal.
On ELF-based POSIXy systems (basically all), you can interpose the open(), write(), and close() syscalls or C library functions.
(In Linux, there are two basic approaches, one using the linker --wrap, and one using dlsym(). Both work fine for this particular case. This ability to interpose functions is based on how ELF binaries are linked at run time, and is not directly related to POSIX.)
You first set up the interposing functions, so that open() detects if the filename matches your special "in-memory" file, and returns a dedicated descriptor number for it. (You may also need to interpose other functions, like ftruncate() or lseek(), depending on what the function actually does; in Linux, you can run a binary under strace to examine what syscalls it actually uses.)
When write() is called with the dedicated descriptor number, you simply memcpy() it to a memory buffer. You'll need to use global variables to describe the allocated size, size used, and the pointer to the memory buffer, and probably be prepared to resize/grow the buffer if necessary.
When close() is called with the dedicated descriptor number, you know the memory buffer is complete, and the contents ready for processing.
You can use a temporary file on a RAM filesystem. While the data is technically written to a file and read back from it, the operations involve RAM only.
You should arrange for a default path to one to be set at compile time, and for individual users to be able to override that for their personal needs, for example via an environment variable (YOURAPP_TMPDIR?).
There is no need for the application to try and look for a RAM-based filesystem: choices like this are, and should be, up to the user. The application should not even care what kind of filesystem the file is on, and should just use the specified directory.
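The second option above (a pipe accessed through /dev/fd/N) can be sketched roughly as follows. This assumes a system that provides /dev/fd and output that is at most PIPE_BUF bytes, so no helper thread is needed. write_result_to_file() is a hypothetical stand-in for the third-party function, and capture_via_pipe() is a name invented for this example:

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for the third-party function: it only knows how
   to write its result to a file identified by name. */
int write_result_to_file(const char *filename)
{
    FILE *f = fopen(filename, "w");
    if (f == NULL)
        return -1;
    fputs("result=42\n", f);
    return fclose(f) == 0 ? 0 : -1;
}

/* Runs the writer against /dev/fd/N, where N is the write end of a pipe,
   and copies what it wrote into buf. Returns the number of bytes
   captured, or -1 on error. */
long capture_via_pipe(char *buf, size_t size)
{
    int fds[2];
    char path[32];
    size_t used = 0;
    ssize_t n = 0;
    int status;

    if (pipe(fds) == -1)
        return -1;

    snprintf(path, sizeof path, "/dev/fd/%d", fds[1]);
    status = write_result_to_file(path);
    close(fds[1]);   /* close our write end so read() can see end-of-file */

    while (status == 0 && used < size &&
           (n = read(fds[0], buf + used, size - used)) > 0)
        used += (size_t)n;
    close(fds[0]);

    if (status != 0 || n < 0)
        return -1;
    return (long)used;
}
```

For larger outputs the read loop would have to run in a helper thread while the writer is still running, as described above.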
You could avoid that library function altogether. Take a look at this question on how to write to in-memory files:
Is it possible to create a C FILE object to read/write in memory

Equivalent of fgetc with Unix file descriptors

The fgetc(3) function takes a FILE * as its input stream. Must I reimplement character-at-a-time input with read(2), or is there a <unistd.h>-style equivalent taking an integer file descriptor instead?
No, there isn't such a thing, and please never do read(fd, &ch, sizeof(char)) (explanations below).
The function read(2) is usually implemented as a system call to the operating system kernel. Although the internal (and funky) details of such a thing shall not be discussed here, the overall idea is that system calls are (usually) not something cheap.
It would be inefficient for both the userspace application and the kernel to do a system call just to get a single character from a file descriptor.
For instance, fgetc(3) usually ends up doing some buffering inside the structure of the FILE object. This means that the internal read(2) from fgetc(3) wouldn't just read a single character, but rather it'll try to get more for the sake of efficiency.
Anyway, it's usually not a good idea to mess with such low-level stuff. You can get all the benefits of buffering (and of FILEs overall) by using fdopen(3) to create a FILE object from a file descriptor, as your question appears to imply that all you have at hand at the moment is a raw file descriptor.
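A short sketch of the fdopen(3) route, assuming a POSIX system. The function name count_bytes_from_fd() is invented for this example; it reads one character at a time with fgetc() while the FILE object takes care of reading from the descriptor in larger chunks:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Counts the bytes available on fd by reading them one at a time with
   fgetc(). Takes ownership of fd: fclose() also closes the descriptor.
   Returns -1 if the descriptor cannot be wrapped in a FILE. */
long count_bytes_from_fd(int fd)
{
    FILE *stream = fdopen(fd, "r");
    long count = 0;

    if (stream == NULL)
        return -1;
    while (fgetc(stream) != EOF)   /* usually served from the FILE's buffer */
        count++;
    fclose(stream);
    return count;
}
```

Note that once the descriptor is wrapped, mixing direct read() calls on it with stdio calls on the FILE will give confusing results, because the FILE may already have buffered data that read() will no longer see.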
If you want to, you can open a file using open():
int fh = open("abc.txt", O_RDONLY); // O_RDONLY is one of several access modes; a third (mode) argument is only needed together with O_CREAT
and then you can use fh in read() calls.

Is it possible to fake a file stream, such as stdin, in C?

I am working on an embedded system with no filesystem and I need to execute programs that take input data from files specified via command line arguments or directly from stdin.
I know it is possible to bake-in the file data with the binary using the method from this answer: C/C++ with GCC: Statically add resource files to executable/library but currently I would need to rewrite all the programs to access the data in a new way.
Is it possible to bake-in a text file, for example, and access it using a fake file pointer to stdin when running the program?
If your system is an OS-less bare-metal system, then your C library will have "retargeting" stubs or hooks that you need to implement to hook the library into the platform. This will typically include low-level I/O functions such as open(), read(), write(), seek() etc. You can implement these as you wish in order to provide the basic stdin, stdout, stderr streams (in POSIX and most other implementations they will have fixed file descriptors 0, 1 and 2 respectively, and do not need to be explicitly opened), file I/O and, in this case, access to an arbitrary memory block.
open(), for example, will be passed a file or device name (the string may be interpreted any way you wish), and will return a file descriptor. You might perhaps recognise "cfgdata:" as a device name to access your "memory file", and you would return a unique descriptor that is then passed into read(). You use the descriptor to reference data for managing the stream; probably little more than an index that is incremented by the number of characters read. The same index may be set directly by the seek() implementation.
Once you have implemented these functions, the higher level stdio functions or even C++ iostreams will work normally for the devices or filesystems you have supported in your low level implementation.
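As a rough illustration of the bookkeeping described above, here is a sketch of a "memory file" behind a device name. The names mem_open(), mem_read(), mem_seek(), and CFG_FD are hypothetical; a real retargeting layer must use the stub names and signatures documented by its C library, and the data contents here are made up:

```c
#include <stddef.h>
#include <string.h>

/* Baked-in data standing in for a file; the contents are illustrative. */
static const char cfg_data[] = "speed=9600\n";

/* The single descriptor handed out for the memory device, and the read
   position associated with it. */
enum { CFG_FD = 3 };
static size_t cfg_pos;

/* Recognises the "cfgdata:" device name and returns its descriptor. */
int mem_open(const char *name)
{
    if (strcmp(name, "cfgdata:") != 0)
        return -1;                       /* unknown device */
    cfg_pos = 0;
    return CFG_FD;
}

/* Copies up to len bytes and advances the index; returns 0 at the end. */
long mem_read(int fd, void *dst, size_t len)
{
    size_t size = sizeof cfg_data - 1;   /* exclude the trailing NUL */
    size_t left;

    if (fd != CFG_FD)
        return -1;
    left = size - cfg_pos;
    if (len > left)
        len = left;
    memcpy(dst, cfg_data + cfg_pos, len);
    cfg_pos += len;
    return (long)len;
}

/* Sets the index directly, measured from the start of the data. */
long mem_seek(int fd, size_t offset)
{
    if (fd != CFG_FD || offset > sizeof cfg_data - 1)
        return -1;
    cfg_pos = offset;
    return (long)offset;
}
```

With stubs along these lines wired into the library under its own names, a call such as fopen("cfgdata:", "r") would be served from the baked-in array.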
As commented, you could use the POSIX fmemopen function. You'll need a libc that provides it, e.g. musl libc or glibc. By the way, for benchmarking purposes you might install some tiny Linux-like OS on your hardware, e.g. uClinux.
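A small sketch of the fmemopen approach, assuming a libc that provides it. The function name sum_ints_in_memory() is invented for this example; the point is that ordinary stdio calls such as fscanf() work unchanged on a memory-backed FILE:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Sums the whitespace-separated integers in text by reading them through
   an ordinary FILE * backed by memory. Returns 0 on success and -1 if
   the stream cannot be created. */
int sum_ints_in_memory(const char *text, long *sum)
{
    /* fmemopen() takes a non-const pointer; mode "r" never writes to it. */
    FILE *stream = fmemopen((void *)text, strlen(text), "r");
    int value;

    if (stream == NULL)
        return -1;
    *sum = 0;
    while (fscanf(stream, "%d", &value) == 1)
        *sum += value;
    fclose(stream);
    return 0;
}
```

Code that insists on reading from stdin itself, rather than from a FILE * parameter, would still need one of the lower-level approaches, since fmemopen() returns a new stream rather than replacing an existing one.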

How does the stdin buffer work?

When using functions such as scanf you read bytes from a buffer where (usually) data coming from the keyboard is stored. How is this data stored? Is it stored inside a fixed size vector? Is there any way to access it directly from code?
The buffer used by the standard library's input routines is private to the implementation of the standard library. You cannot access it other than through the published interface to the standard library.
The setvbuf() function lets you reconfigure the type of buffering for a stdio stream and replace the buffer with one you have allocated. That doesn't mean you should access the buffer behind the C library's back, but it does give you control over the size and whether the stream is unbuffered, line-buffered, or fully buffered.
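For illustration, a minimal sketch of handing a stream a buffer of your own with setvbuf(). The names stream_buffer and use_own_buffer() are made up for this example, and the 4096-byte size is arbitrary:

```c
#include <stdio.h>

/* The buffer must stay alive for as long as the stream uses it, so it is
   given static storage duration here. */
static char stream_buffer[4096];

/* Switches stream to full buffering backed by stream_buffer. Must be
   called before any other operation on the stream. Returns 0 on
   success, nonzero on failure. */
int use_own_buffer(FILE *stream)
{
    return setvbuf(stream, stream_buffer, _IOFBF, sizeof stream_buffer);
}
```

Passing _IOLBF or _IONBF instead of _IOFBF selects line buffering or no buffering. Even with your own array installed, its contents are still managed by the library, so peeking into it is not a supported way to read input.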
You can't read the buffer directly. The best you can do is read keystrokes directly as they're typed, which effectively lets you write your own scanf(). To see the code for reading keystrokes, search for 'kbhit.c' on this page: http://pwilson.net/sample.html

How does scanf() work inside the OS?

I've been wondering how scanf()/printf() actually works in the hardware and OS levels. Where does the data flow and what exactly is the OS doing around these times? What calls does the OS make? And so on...
scanf() and printf() are functions in libc (the C standard library). On a Unix system they end up calling the read() and write() operating system syscalls, respectively, on the file descriptors underlying the stdin and stdout streams (descriptors 0 and 1). fscanf() and fprintf() let you specify the file stream you want to read from or write to.
Calls to read() and write() (and all syscalls) result in a 'context switch' out of your user-level application into kernel mode, which means it can perform privileged operations, such as talking directly to hardware. Depending on how you started the application, the 'stdin' and 'stdout' file descriptors are probably bound to a console device (such as tty0), or some sort of virtual console device (like that exposed by an xterm). read() and write() safely copy the data to/from a kernel buffer called a 'uio'.
The format-string conversion part of scanf and printf does not occur in kernel mode, but in ordinary user mode (inside libc). The general rule of thumb with syscalls is to switch to kernel mode as infrequently as possible, both to avoid the performance overhead of context switching and for security: you need to be very careful about anything that happens in kernel mode, and less code in kernel mode means fewer bugs and security holes in the operating system.
By the way, all of this was written from a Unix perspective; I don't know how MS Windows works.
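The split between kernel and user mode described above can be sketched like this, assuming a Unix-like system. The function name read_int_from_fd() is invented for the example, and a real scanf() buffers across calls rather than discarding the rest of the chunk:

```c
#include <stdio.h>
#include <unistd.h>

/* Fetches one chunk of raw bytes with a single read() system call, then
   converts the leading integer in user space. Returns 1 on success and
   0 on end of file, read error, or a chunk that holds no integer. */
int read_int_from_fd(int fd, int *out)
{
    char chunk[64];
    ssize_t n = read(fd, chunk, sizeof chunk - 1);  /* kernel copies bytes */

    if (n <= 0)
        return 0;
    chunk[n] = '\0';
    return sscanf(chunk, "%d", out) == 1;           /* parsing done in libc */
}
```

The kernel never sees the "%d": it only delivers bytes, and all interpretation of the format string happens after the syscall has returned.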
On the OS I am working with, scanf and printf are based on the functions getch() and putch().
I think the OS just provides two streams, one for input and the other for output; the streams abstract away how the output data gets presented or where the input data comes from.
So what scanf and printf do is just consume bytes from, or add bytes to, one of those streams.
scanf, printf, and similar functions are, for the most part, written in C: parsing the format string and converting values needs nothing else. The one step that cannot be expressed in portable C is the final request to the operating system or the hardware, which is typically a few lines of assembly, for example inside a block introduced by the keyword "asm", which the compiler passes through to the object file unchanged. So, in short, you can design your own input function in C and use the keyword "asm" only for that lowest-level step.

Resources