Am I misunderstanding this? Why is COPY changing the output? (PostgreSQL)

I'm very close to solving a big problem I'm having at the moment, and this is the last bit. I just don't know why the output is different from what's in the database, when I need it EXACTLY the same.
Reading the data before entering the database:
[NULL][NULL][NULL][SO][etc.......]
Reading the data from COPY to text file:
\\000\\000\\000\\016[etc...] (it matches, basically)
Reading the data after COPY using binary format
PGCOPY
ÿ
[NULL][NULL][NULL][EOT][etc.......] (first line changes a fair bit)
(rest of the data stays exactly the same.)
˜ÿÿ
The PostgreSQL query being run for testing's sake:
COPY (SELECT byteacolumn FROM tablename WHERE id = 1) TO 'C:\path\file' WITH (FORMAT binary);
So using the binary format gives me almost what I need, but not quite. I could botch something together to ignore the added lines, but I wouldn't know what the first line of data should be.
TL;DR: COPY is adding lines and changing the first row of my data. How do I make it stop? :(

The binary COPY format is really only designed to be consumed by a COPY FROM command, so it contains a lot of metadata to allow Postgres to interpret it.
After the first two lines you see, the remaining bytes of the 11-byte signature and the next 8 bytes (a 4-byte flags field and a 4-byte header-extension length) are also part of the header. Each tuple then begins with 2 bytes giving the number of fields in the following record, and each field with 4 bytes giving the number of bytes in that field. Only then does the actual data begin.
The full details of the format can be found in the documentation, but be aware that they could change in a future release.
(However, assuming this is the same problem you were asking about here, I think this is the wrong way to go about it. You were on the right track with lo_import/lo_export.)

Related

How to write data to a binary file using C

I want to write CAN frames to a binary file, including the current timestamp and CAN ID followed by the 8-byte data frame. I have the following in my program:
fwrite("&tm->tm_mon+1", sizeof(tm->tm_mon+1),1,fptr);
fwrite("&tm->tm_mday", sizeof(tm->tm_mday+1),1,fptr);
fwrite("&tm->tm_hour", sizeof(tm->tm_hour+1),1,fptr);
fwrite("&tm->tm_sec", sizeof(tm->tm_sec+1),1,fptr);
fwrite("&frame_rd->can_id", sizeof(frame_rd->can_id),1,fptr);
fwrite("&frame_rd->data", sizeof(frame_rd->data),1,fptr);
Is this the right way to do it? Can anyone help me? Thanks in advance.
A few hints (not a complete solution):
The fact that tm->tm_mon is zero-based, and thus requires +1 to make sense as a month number (Jan...Dec), does not mean that you must write sizeof(tm->tm_mon+1).
In the same way, if you want to write its value, you take its address; that does not mean you have to add 1. Think about what you are doing: to what are you adding this 1? (Just as you should think about what you were taking the size of.)
And if you want to write this value, do not pass a quoted string the way you would a printf format string (the f meaning "print formatted"); fwrite (the f meaning File Write) takes no format string. You just provide the address of what needs to be written, and of course without any +1 (again: what would you be adding this 1 to?).
With these hints I hope you can find your answer. And remember: if nothing helps, read the manual!

Retrieving gobs written to file by appending several times

I am trying to use encoding/gob to store data to a file and load it later. I want to be able to append new data to the file and load all saved data later, e.g. after restarting my application. Storing to the file using Encode() causes no problems, but when reading, it seems I always get only the items that were stored first, not the subsequently stored items.
Here is a minimal example: https://play.golang.org/p/patGkKDLhM
As you can see, it works to write twice to an encoder and then read it back. But after closing the file and reopening it in append mode, writing seems to work, yet reading works only for the first two elements (the ones written previously). The two newly added structs cannot be retrieved; I get the error:
panic: extra data in buffer
I am aware of Append to golang gob in a file on disk and I also read https://groups.google.com/forum/#!topic/golang-nuts/bn6vjC5Abd8
Finally, I also found https://gist.github.com/kjk/8015952 which seems to demonstrate that what I am trying to do does not work. Why? What does this error mean?
I have not used the encoding/gob package yet (looks cool, I might have to find a project for it). But reading the godoc, it would seem to me that each encoding is a single record expected to be decoded from beginning to end. That is, once you Encode a stream, the resulting bytes are a complete set representing the entire stream from start to finish, not something that can be appended to later by encoding again.
The godoc states that an encoded gob is self-descriptive. At the beginning of the encoded stream, it describes the entire data set: struct, types, etc. that will follow, including the field names. What then follows in the byte stream is the size and byte representation of the values of those exported fields.
One could then assume that what is omitted from the docs is that, since the stream describes itself at the very beginning, including each field that is about to be passed, that is all the Decoder will care about. The Decoder will not know of any further bytes added after what was described, as it only sees what was declared at the beginning. Therefore, the error message panic: extra data in buffer is accurate.
In your Playground example, you are encoding twice to the same encoder instance and then closing the file. Since you are passing in exactly two records and encoding two records, that works: the single encoder instance sees the two Encode calls as one encoded stream. When you then close the file's io stream, the gob is complete, and the stream is treated as a single record (even though you sent in two values).
The same goes for the decoding function: you are reading X number of times from the same stream. But you wrote a single record when closing the file, and that one record contains two values. Hence it works when reading exactly 2, and fails when reading more than 2.
A solution, if you want to store this in a single file, is to create your own index of each complete "write", i.e. each encoder instance/session: some form of your own block method that wraps or delimits each entry written to disk with a "begin" and "end" marker. That way, when reading the file back, you know exactly what buffer to allocate because of the begin/end markers. Once you have a single record in a buffer, you use gob's Decoder to decode it. And close the file after each write.
The pattern I use for such markers is something like:
uint64:uint64
uint64:uint64
...
The first is the byte offset where the record begins, and the second entry, separated by a colon, is its length. I usually store this in a separate file, appropriately called indexes. That way it can be read quickly into memory, and then I can stream the large file knowing exactly where each record starts and ends in the byte stream.
Another option is just to store each gob in its own file, using the file system directory structure to organize as you see fit (or one could even use the directories to define types, for example). Then the existence of each file is a single record. This is how I use my rendered json from Event Sourcing techniques, storing millions of files organized in directories.
In summary, it would seem to me that a gob of data is a complete set from beginning to end: a single "record", if you will. If you want to store multiple encodings/multiple gobs, you will need to create your own index to track the start and size/end of each gob's bytes as you store them. Then you will want to Decode each entry separately.

Most efficient way to replace a line in a text document?

I am learning to code on Unix in C. So far I have written code to find the index of the first byte of the line I want to replace. The problem is that sometimes the number of bytes replacing the line is greater than the number of bytes already on the line. In this case, the code starts overwriting the next line. I came up with two standard solutions:
a) Rather than trying to edit the file in-place, I could copy the entire file into memory, edit it by shifting all the bytes if necessary and rewriting it back to file.
b) Only copy the line I want to end-of-file to memory and edit.
Neither suggestion scales well, and I don't want to impose any restrictions on line size (like every line must be 50 bytes or something). Is there an efficient way to do the line replacement? Any help would be appreciated.
Copy the first part of the file to a new file (no need to read it all into memory). Then, write the new version of the line. Finally, copy the final part of the file. Swap files and done.

How to take numbers and words out of a txt file and assign them to int and char*

I am making a text based game and want the user to be able to save. When they save all the variables will be saved to a text file.
I can't figure out how to take them out of the file and assigning them to specific variables and pointers.
The file will look somewhat like this:
jesse
hello
yes
rifle
0
1
3
20
Is there any way I can specify what line I want to read with fscanf? Or do I have to take a different approach?
There is no way to specify what line to read from, because the concept of a file stream in C does not explicitly distinguish lines; a newline is simply treated as another character. To read from a specific line, you would have to loop forward with fseek and fgetc until you find '\n', at which point you can update some variable that holds the current line number the stream points to.
One way around this would be to have information at a fixed offset. For example, say you are storing player information then if you make player information a fixed size X and have the constituent data at fixed indexes into each structure, you can just fseek to the right location straight away.
However, if you have structured data, it may be more suitable to use a format which is able to represent these structures inherently such as XML or JSON.
Although I can't tell exactly what you want, I'd offer a few suggestions:
Use an SQLite file instead of a text file. That way you can use SQL to get exactly what you want. Shortcut for you: http://www.sqlite.org/
If you still want to use a text file, make it comma-separated instead of space-separated. It's a more common practice.
I have created a simple settings reader for my C program; maybe it will be useful to you to see how to parse text files:
https://codereview.stackexchange.com/questions/8620/coding-style-in-c

Lots of questions about file I/O (reading/writing message strings)

For this university project I'm doing (for which I've made a couple of posts in the past), which is some sort of social network, users must be able to exchange messages.
At first, I designed my data structures to hold ALL messages in a linked list, limiting the message size to 256 chars. However, I think my instructors would prefer that I save the messages to disk and read them only when I need them. Of course, they won't say what they prefer; I need to make a choice and justify it as best I can.
One thing to keep in mind is that I only need to save the latest 20 messages from each user, no more.
Right now I have a hash table that will act as the inbox; this will live inside the user profile. The hash table will be indexed by name (the user that sent the message). The value of each element will be a data structure holding an array of size_t with 20 elements (20 messages, as I said above). The idea is to keep track of the disk file offsets and bytes written. Then, when I need to read a message, I just use fseek() and read the necessary bytes.
I think this could work nicely... I could use one single file to hold all messages from all users in the network. I'm saying one single file because a colleague asked an instructor about saving the messages of each user independently, and he replied that it might not be the best approach because the file system has its limits. That's why I'm thinking of going the single-file route.
However, this presents a problem... Since I only need to save the latest 20 messages, I need to discard the older ones when I reach this limit.
I have no idea how to do this... All I know is fread() and fwrite() to read/write bytes from/to files. How can I go to a file offset and say "hey, delete the following X bytes"? Even if I could do that, there's another problem: all offsets after that one would change, and I would have to process all users' mailboxes to fix them. Which would be a pain...
So, any suggestions to solve my problems? What do you suggest?
You can't arbitrarily delete bytes from the middle of a file; the only way that works is to rewrite the entire file without them. Disregarding the question of whether doing things this way is a good idea, if you have fixed length fields, one solution would be to just overwrite the oldest message with the newest one; that way, the size / position of the message on disk doesn't change, so none of the other offsets are affected.
Edit: If you're allowed to use external libraries, making a simple SQLite db could be a good solution.
You're complicating your life way more than you need to.
If your messages are 256 characters, then use an array of 256 characters to hold each message.
Write it to disk with fwrite, read with fread, delete it by changing the first character of the string to \0 (or whatever else strikes your fancy) and write that to disk.
Keep an index of the messages in a simple structure (username/recno) and bounce around in the file with fseek. You can either brute-force the next free record when writing a new one (start reading at the beginning of the file and stop when you hit your \0) or keep an index of free records in an array and grab one of them when writing a new one (or if your array is empty then fseek to the end of the file and write a complete new record.)
I want to suggest another solution for completeness' sake:
Strings end with a null-byte character ("hello world\0"), so you might read the raw binary data until reaching "\0".
Other data types have fixed sizes; beware of byte order (endianness).
Also, you could put a length prefix before each message, so you know its string length:
"11hello world;2hi;15my name is loco"
Thus making it possible to treat raw snippets like data fields.
