Writing a CFSTR to the terminal in Mac OS X - c

How best would I output the CFString in the following code?
#include <CoreFoundation/CoreFoundation.h> // Needed for CFSTR

int main(int argc, char *argv[])
{
    char *c_string = "Hello I am a C String. :-).";
    CFStringRef cf_string = CFStringCreateWithCString(0, c_string, kCFStringEncodingUTF8);
    // output cf_string
}

There's no API to write a CFString directly to any file (including stdout or stderr), because you can only write bytes to a file. Characters are a (somewhat) more abstract concept; they're too high-level to be written directly to a file. It's like saying “I want to write these pixels”: you must first decide what format to write them in (say, PNG), then encode them in that format, and then write that data.
So, too, with characters. You must encode them as bytes in some format, then write those bytes.
Encoding the characters as bytes/data
First, you must pick an encoding. For display on a Terminal, you probably want UTF-8, which is kCFStringEncodingUTF8. For writing to a file… you usually want UTF-8. In fact, unless you specifically need something else, you almost always want UTF-8.
Next, you must encode the characters as bytes. Creating a C string is one way; another is to create a CFData object; still another is to extract bytes (not null-terminated) directly.
To create a C string, use the CFStringGetCString function.
To extract bytes, use the CFStringGetBytes function.
You said you want to stick to CF, so we'll skip the C string option (which is less efficient anyway, since whatever calls write is going to have to call strlen)—it's easier, but slower, particularly when you use it on large strings and/or frequently. Instead, we'll create CFData.
Fortunately, CFString provides an API to create a CFData object from the CFString's contents. Unfortunately, this only works for creating an external representation. You probably do not want to write this to stdout; it's only appropriate for writing out as the entire contents of a regular file.
So, we need to drop down a level and get the bytes ourselves with CFStringGetBytes. That function takes a buffer (a region of memory) and the size of that buffer in bytes.
Do not use CFStringGetLength for the size of the buffer. That counts characters, not bytes, and the relationship between number of characters and number of bytes is not always linear. (For example, some characters can be encoded in UTF-8 in a single byte… but not all. Not nearly all. And for the others, the number of bytes required varies.)
The correct way is to call CFStringGetBytes twice: once with no buffer (NULL), whereupon it will simply tell you how many bytes it'll give you (without trying to write into the buffer you haven't given it); then, you create a buffer of that size, and then call it again with the buffer.
You could create a buffer using malloc, but you want to stick to CF stuff, so we'll do it this way instead: create a CFMutableData object whose capacity is the number of bytes you got from your first CFStringGetBytes call, increase its length to that same number of bytes, then get the data's mutable byte pointer. That pointer is the pointer to the buffer you need to write into; it's the pointer you pass to the second call to CFStringGetBytes.
To recap the steps so far:
Call CFStringGetBytes with no buffer to find out how big the buffer needs to be.
Create a CFMutableData object of that capacity and increase its length up to that size.
Get the CFMutableData object's mutable byte pointer, which is your buffer, and call CFStringGetBytes again, this time with the buffer, to encode the characters into bytes in the data object.
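A minimal sketch of those steps, reusing the cf_string from the question (error handling omitted; the variable names are mine):

CFIndex numChars = CFStringGetLength(cf_string);
CFRange range = CFRangeMake(0, numChars);
CFIndex numBytes = 0;

// First call: pass no buffer; CFStringGetBytes just reports how many bytes it needs.
CFStringGetBytes(cf_string, range, kCFStringEncodingUTF8,
                 /*lossByte*/ 0, /*isExternalRepresentation*/ false,
                 NULL, 0, &numBytes);

// Create a data object of that capacity and grow its length to match.
CFMutableDataRef data = CFDataCreateMutable(kCFAllocatorDefault, numBytes);
CFDataSetLength(data, numBytes);

// Second call: encode the characters into the data object's storage.
CFStringGetBytes(cf_string, range, kCFStringEncodingUTF8,
                 0, false,
                 CFDataGetMutableBytePtr(data), numBytes, &numBytes);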
Writing it out
To write bytes/data to a file in pure CF, you must use CFWriteStream.
Sadly, there's no CF equivalent to nice Cocoa APIs like [NSFileHandle fileHandleWithStandardOutput]. The only way to create a write stream to stdout is to create it using the path to stdout, wrapped in a URL.
You can create a URL easily enough from a path; the path to the standard output device is /dev/stdout, so creating the URL looks like this:
CFURLRef stdoutURL = CFURLCreateWithFileSystemPath(kCFAllocatorDefault, CFSTR("/dev/stdout"), kCFURLPOSIXPathStyle, /*isDirectory*/ false);
(Of course, like everything you Create, you need to Release that.)
Having a URL, you can then create a write stream for the file so referenced. Then, you must open the stream, whereupon you can write the data to it (you will need to get the data's byte pointer and its length), and finally close the stream.
Note that you may have missing/un-displayed text if what you're writing out doesn't end with a newline. NSLog adds a newline for you when it writes to stderr on your behalf; when you write to stderr yourself, you have to do it (or live with the consequences).
So:
Create a URL that refers to the file you want to write to.
Create a stream that can write to that file.
Open the stream.
Write bytes to the stream. (You can do this as many times as you want, or do it asynchronously.)
When you're all done, close the stream.
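Put together, the write-out steps look something like this, continuing with the data object built earlier (a sketch; stream error handling omitted):

CFURLRef stdoutURL = CFURLCreateWithFileSystemPath(kCFAllocatorDefault, CFSTR("/dev/stdout"), kCFURLPOSIXPathStyle, /*isDirectory*/ false);
CFWriteStreamRef stream = CFWriteStreamCreateWithFile(kCFAllocatorDefault, stdoutURL);
CFRelease(stdoutURL);

if (CFWriteStreamOpen(stream)) {
    CFWriteStreamWrite(stream, CFDataGetBytePtr(data), CFDataGetLength(data));
    CFWriteStreamClose(stream);
}
CFRelease(stream);
CFRelease(data);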

Related

Reading a file using pread

The aim of the problem is to use only pread to read a file containing integers.
I am trying to devise a generic solution where I can read integers of any length, but I think there must be a better solution than my current algorithm.
For the sake of explanation and to guide the algorithm, here is a sample input file. I have explicitly added \r\n to show that they exist in the file.
Input file:
23456\r\n
134\r\n
1\r\n
345678\r\n
Algorithm
1. Read a byte from the file.
2. Check if it is a digit, i.e. '0' <= byte <= '9'.
2.1 If yes, increment the offset and read the next byte.
2.2 If not, check whether it is \r.
2.2.1 If yes, read the next byte; it should be \n.
Here the line is finished and we can use strtol to convert the string to an int.
2.2.2 If not, it is an error condition.
I'm required to use this algorithm because I found out that pread reads the file as a string and just puts the requested number of bytes in the provided buffer.
Question:
Is there a better way of reading integers from the file using pread() instead of parsing each byte to determine the end of the string and then converting to an integer?
Is there a better way of reading integers from the file using pread() instead of parsing each byte to determine the end of the string and then converting to an integer?
Yes, read big chunks of data into memory and then do the parsing on the memory. Use a big buffer, sized depending on system memory. On a modern system where gigabytes of memory are available, you can go for a buffer in the megabyte range. I would probably start out with a 1 or 2 megabyte buffer and see how it performs.
This will be much more efficient than byte-by-byte reads.
Note: your code needs to handle situations where a chunk from the file stops in the middle of an integer. That adds a little complexity to the code, but it's not that difficult to handle.
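A minimal sketch of that approach, assuming the \r\n-terminated format from the question (the file name input.txt is made up; a single line longer than the buffer is not handled here):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BUF_SIZE (1 << 20) /* 1 MiB chunks */

int main(void)
{
    int fd = open("input.txt", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    static char buf[BUF_SIZE + 1];
    size_t leftover = 0;  /* bytes of a partial line carried from the last chunk */
    off_t offset = 0;     /* file offset of the next unread byte */
    ssize_t n;

    while ((n = pread(fd, buf + leftover, BUF_SIZE - leftover, offset)) > 0) {
        offset += n;
        size_t len = leftover + (size_t)n;
        buf[len] = '\0'; /* so strtol cannot run past the chunk */

        char *p = buf, *end;
        while ((end = memchr(p, '\n', len - (size_t)(p - buf))) != NULL) {
            long value = strtol(p, NULL, 10); /* stops at the '\r' */
            printf("%ld\n", value);
            p = end + 1;
        }
        /* Carry the partial line (if any) to the front of the buffer. */
        leftover = len - (size_t)(p - buf);
        memmove(buf, p, leftover);
    }
    close(fd);
    return 0;
}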
where I can read integers of any length
Well, if you actually mean integers greater than the largest integer type of your system, it's much more complicated. Standard functions like strtol can't be used. Further, you'll need to define your own way of storing these values. Alternatively, you can use an existing library that can handle such values.

How do fread and fwrite distinguish between different data (types) in C?

I am working with a C program (on Ubuntu, using bash) to manipulate binary data files. First of all, when I use fopen(filename, "w") it creates a file, but without any extension. However, when I open it with vim filename, it shows up in some binary form.
For this question: when I use fwrite(array, sizeof(some struct), # of structs, filePointer), it writes the structs into the file (though I am not sure how, in binary). When I use fread(anotherArray, sizeof(same struct), same # of structs, anotherFilePointer), it somehow magically knows how to read each struct in binary form and puts it into the array, just by knowing its size and how many to read. What happens if I pass a value less than the actual number of structs as the count parameter? How would fread know what to read correctly? How does it work in reading data just by looking at the sizes and not knowing what type of data it is?
fwrite writes the bytes of the memory where the object is stored to the output stream and fread reads bytes from the input stream into the memory whose address it gets as an argument. No assumption is made regarding the types and representations of the C objects stored in this memory.
Hence a number of problems can occur:
the representation of basic types can differ from one compiler to another, one machine to another, one OS to another, possibly even depending on compiler switches. Writing the bytes of the memory representation of basic types makes sense only if you know you will be reading the file back into byte-compatible structures.
the mode for accessing the input and output files matters: as you mention, files must be opened in binary mode to avoid any translation between memory representation and file contents, such as what happens for text files on legacy systems. For example, text mode on MS-Windows causes 0A bytes to be converted to 0D 0A sequences on output and 0D bytes to be stripped on input, resulting in different contents for isolated 0D bytes in the initial content.
if the C structure contains pointers, the bytes written to the output represent the value of these pointers, not what they point to. Reading these values back into memory is highly likely to create invalid pointers and very unlikely to make any sense.
if the C structure has a flexible array at the end, its contents are not included in the sizeof(T) bytes written by fwrite or read by fread.
the C structure may contain padding between members, causing the output file to contain nondeterministic bytes, which might be a problem in some circumstances.
if the C structure has arrays with only partial meaningful contents, such as char arrays containing C strings, beware that fwrite will write the bytes beyond the null terminator, which should not be meaningful, but might be sensitive information such as password fragments or other meaningful data. Carefully erasing such arrays may avoid this issue, but padding bytes cannot be erased reliably, so this solution is not perfect.
For all the above reasons and other ones, reading/writing binary data is to be reserved to very specific cases where the programmer knows exactly what is happening. For other purposes, saving as text files in human readable form is much preferred.
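As a small illustration of "just bytes in, bytes out", here is a sketch that round-trips one struct. The type and file name are made up, and it only works reliably when the same build of the program reads the file back, for the reasons listed above:

#include <stdio.h>

struct record {            /* a made-up example type */
    int id;
    double value;
    char name[16];         /* fixed array: all 16 bytes go to the file */
};

int main(void)
{
    struct record out = { 42, 3.14, "widget" }, in;

    FILE *fp = fopen("records.bin", "wb"); /* binary mode, as noted above */
    if (!fp) { perror("fopen"); return 1; }
    fwrite(&out, sizeof out, 1, fp);       /* raw bytes, padding included */
    fclose(fp);

    fp = fopen("records.bin", "rb");
    if (!fp) { perror("fopen"); return 1; }
    if (fread(&in, sizeof in, 1, fp) == 1) /* reads the same byte count back */
        printf("%d %.2f %s\n", in.id, in.value, in.name);
    fclose(fp);
    return 0;
}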
From the question comments, by David C. Rankin:
"Well, fread/fwrite read and write bytes (binary data - if you write out then read in the same number of bytes -- you get the same thing back). If you want to read and write text where you need to worry about line-breaks, etc.., fgets/fputs. or fprintf"
So I guess I can never know what I read in with fread unless I know what I wrote with fwrite?
"Right, look at the type for your buffer in fwrite(3) - Linux man page it is type void *. It's just a starting address for fwrite to use in writing however many bytes you told it to write. (obviously you know what it is writing) The same for fread -- it just reads bytes -- you have to know what you are reading (or at least the format of it). That's what binary I/O is about, it's all just bytes -- it's up to you, the Programmer, to know what you are writing and reading and how to unpack it. Otherwise, use formatted-I/O and lines, words, etc.."

fread in c reads more than instructed

I'm trying to write a program that will add effects to a .wav file.
The program should:
Read a .wav file
Parse the header
Read the data
Manipulate the data
Create a new .wav file -
Write the header
Write the new data
I'm stuck on some weird behavior of the fread() function:
when I try to read 4 bytes into the char array I've defined (of size 4 bytes), I get the word plus garbage.
If I try to read 2 or 3 bytes in the same manner, everything works fine.
I tried printing the contents of the array in both cases (reading 2/3 bytes vs. reading 4 bytes) with a while loop until '\n' instead of printf("%s") and got the same result (the right string in the first case, string plus garbage in the second case).
Also, when I write the header back and just COPY the data, the file that is created is NOT the same song!
It does open, so the header is fine, but the data is garbage.
I'd be very glad to hear some ideas about the possible reasons for this. I'm really stuck on it, please help me!
fread is not intended to read strings. It reads binary data. This means that the data will not be null terminated, nor have any other termination.
fread returns the number of bytes read. Beyond that count, the data is not initialized and must be ignored.
If you want to treat the data as a string, you must null terminate it yourself with arr[count]=0. Make sure that arr has at least count+1 capacity in order to avoid a buffer overflow.
Perhaps reserve 5 bytes for your fmt_chunk_marker. That will let you represent a 4-character string as a null-terminated C string. The byte after the last character read should be set to the null character ('\0').
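A minimal sketch of that suggestion; fmt_chunk_marker is the array from the question, and song.wav is a made-up file name:

#include <stdio.h>

int main(void)
{
    char fmt_chunk_marker[5]; /* 4 bytes of data + 1 byte for '\0' */
    FILE *fp = fopen("song.wav", "rb");
    if (!fp) { perror("fopen"); return 1; }

    size_t count = fread(fmt_chunk_marker, 1, 4, fp);
    fmt_chunk_marker[count] = '\0'; /* terminate before printing as a string */
    printf("%s\n", fmt_chunk_marker);

    fclose(fp);
    return 0;
}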

File Bytes Array Length Go

I have recently started to learn Go. To start with I decided that I would write some code to open a file and output its contents on the terminal window. So far I have been writing code like this:
file, err := os.Open("./blah.txt")
data := make([]byte, 100)
count, err := file.Read(data)
To obtain up to 100 bytes from a file. Is there any way to ascertain the byte count on a file, such that you could set the correct (or more sensible) byte array length just using the standard Go library?
I understand you could use a slice and something like append() once the end of the array has been reached, but I just wondered whether the file size/length/whatever could be accessed prior to instantiating an array, through file metadata or something similar.
While you could certainly get the file's size prior to reading
from it (see the other answer), doing this is usually futile
for a number of reasons:
A filesystem is an inherently racy medium: any number of processes
might update a given file simultaneously, and even remove it.
On a filesystem with POSIX semantics (most commodity OSes
excluding Windows) the only guarantee a successful opening of a file
gives you is that it's possible to read data from it,
and that's basically all. (Well, reading may fail due to an error
in the underlying media, but let's not digress further.)
What would you do if you did the equivalent of a fstat(2) call,
as suggested, and it told you the file contains 42 terabytes of data?
Would you try to allocate a sufficiently large array to hold its contents?
Would you implement some custom logic which classifies the file's
size into several ranges and performs custom processing based on that—like,
say, slurping files less than N megabytes in length and reading
bigger files piecemeal?
What if the file grew bigger (was appended to) after you obtained its size?
What if you later decide to be more Unix-way-ready and make it possible
to read the data from your program's standard input stream—like the cat
program on Unix (or its type Windows cousin) does?
You can't know how much data will be piped through that stream;
and potentially it might be of indefinite length (consider being piped
the contents of some busy log file on a continuously running system).
Sure, in some applications you assume the contents of files do not
change under your feet; one example is archivers like zip or tar, which
record the file's metadata, including its size, along with the file.
(By the way, tar detects a file might have changed while the program
was reading its contents and warns the user in that case).
But what I'm leading you to, is that for a task as simple as yours,
there's little point in doing it the way you've come up with.
Instead, just use a buffer of some "sensible" size and gateway the data
between its source and destination through that buffer.
That is, you allocate the buffer, enter a loop, and on each iteration of
it you try to read as much data as fits in the buffer, process whatever
the Read function indicated it was able to read, then handle an
end-of-file condition or an error, if it was indicated.
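As a minimal sketch, such a loop might look like this (using the blah.txt file from the question):

package main

import (
    "io"
    "log"
    "os"
)

func main() {
    f, err := os.Open("./blah.txt")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    buf := make([]byte, 32*1024) // a fixed, "sensible" buffer size
    for {
        n, err := f.Read(buf)
        if n > 0 {
            os.Stdout.Write(buf[:n]) // process whatever Read managed to read
        }
        if err == io.EOF {
            break // end-of-file condition
        }
        if err != nil {
            log.Fatal(err)
        }
    }
}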
To round up this small crash course, I'd hint that the standard library
already has io.Copy which, in your
case, may be called like
_, err := io.Copy(os.Stdout, f)
and will shovel all the contents of f to the standard output of your
program until EOF or an error is detected.
Last time I checked, this function used an internal buffer of 32 KiB in size,
but you may always check the source code of your Go installation.
I assume what you need is a way to get file size in bytes to create a slice of the same size:
fi, err := f.Stat() // use a new name: reusing f would shadow the *os.File
// handle error
// ...
size := fi.Size()
(see FileInfo for more)
You can then use this size to initialise a slice.
data := make([]byte, size)
You can also consider reading the whole file in one call using ioutil.ReadFile.
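For example, a quick sketch (this assumes the io/ioutil, log, and fmt imports, and slurps the entire file into memory, with the caveats discussed in the other answer):

data, err := ioutil.ReadFile("./blah.txt") // allocates a right-sized []byte for you
if err != nil {
    log.Fatal(err)
}
fmt.Println(len(data), "bytes read")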

Writing structure into a file in C

I am reading and writing a structure to a text file, and the result is not readable. I need to write readable data into the file from the structure object.
Here is little more detail of my code:
I have code which reads and writes a list of item names and codes to a file (file.txt). The code uses a linked list to read and write the data.
The data is stored in a structure object and then written into the file using fwrite.
The code works fine, but I need to write readable data into the text file.
Now the file.txt looks like this:
㵅㡸䍏䥔䥆㘸䘠㵅㩃䠀\䵏㵈䑜㵅㡸䍏䥔䥆㘸䘠\㵅㩃䠀䵏㵈䑜㵅㡸䍏䥔䥆㘸䘠㵅㩃䠀䵏㵈\䑜㵅㡸䍏䥔䥆㘸䘠㵅㩃䠀䵏㵈䑜㵅㡸䍏䥔\䥆㘸䘠㵅㩃䠀䵏㵈
I am expecting the file should be like this,
pencil aaaa
Table bbbb
pen cccc
notebook nnnn
Here is the snippet:
struct Item
{
    char itemname[255];
    char dspidc[255];
    struct Item *ptrnext;
};

// Writing into the file
printf("\nEnter Itemname: ");
gets(ptrthis->itemname);
printf("\nEnter Code: ");
gets(ptrthis->dspidc);
fwrite(ptrthis, sizeof(*ptrthis), 1, fp);

// Reading from the file
while (fread(ptrthis, sizeof(*ptrthis), 1, fp) == 1)
{
    printf("\n%s %s", ptrthis->itemname, ptrthis->dspidc);
    ptrthis = ptrthis->ptrnext;
}
Writing an array whose size is 255 bytes will write all 255 bytes to the file (regardless of what you have stuffed into that array). If you want only the 'textual' portion of that array, you need to use a facility that handles null terminators (i.e. printf, fprintf, ...).
Reading is then more complicated as you need to set up the idea of a sentinel value that represents the end of a string.
This speaks nothing of the fact that you are writing the value of a pointer (initialized or not) that will have no context or validity on the next read. Pointers (i.e. memory locations) have application only within the currently executing process. Trying to use one process' memory address in another is definitely a bad idea.
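As a sketch of that approach, here is one way to write and read the Item fields as text with fprintf/fscanf (this assumes names and codes contain no spaces, and deliberately does not write the pointer):

#include <stdio.h>

struct Item {                 /* as defined in the question */
    char itemname[255];
    char dspidc[255];
    struct Item *ptrnext;
};

// Write one item per line as text: "name code\n".
void write_item(FILE *fp, const struct Item *it)
{
    fprintf(fp, "%s %s\n", it->itemname, it->dspidc);
}

// Read one item back; returns 1 on success, 0 on EOF or parse failure.
// %254s keeps each field within the 255-byte arrays from the question.
int read_item(FILE *fp, struct Item *it)
{
    return fscanf(fp, "%254s %254s", it->itemname, it->dspidc) == 2;
}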
The code works fine
not really:
a) You are dumping the raw contents of the struct to a file, including the pointer to another instance of Item. You cannot expect to read a pointer back in from disk and use it, as you do with ptrthis = ptrthis->ptrnext (this works as you "use" it in the given snippet, but only because that snippet does nothing meaningful at all).
b) You are writing 2 * 255 bytes of potential garbage to the file. The reason you see those strange-looking "blocks" in your file is that you write all 255 bytes of itemname and all 255 bytes of dspidc to disk, including the terminating \0 (which shows up as those blocks, depending on your editor). The real "string" is something meaningful at the beginning of itemname or dspidc, followed by a \0, followed by whatever was in memory before.
The term you need to look up and read about is serialization; there are libraries out there that already solve the task of dumping data structures to disk (or the network, or anything else) and reading them back in, e.g. tpl.
First of all, I would only serialize the data, not the pointers.
Then, in my opinion, you have 2 choices:
write a parser for your syntax (with yacc for instance)
use a data-dumping format such as the RMI serialization mechanism.
Sorry I can't find online docs, but I know I have the grammar on paper.
Both of those solutions will be platform independent, whether the platform is big endian or little endian.
