Segmentation fault trying to read until header is encountered in C - c

I am new to C so pardon the simplicity of the question. I am trying to write a function (as a lib, so it must be robust) that constantly reads one byte (EDIT: from a serial port) until a header start byte is encountered. If it finds it, it will read in the rest of the header and payload and store it in a struct. The start of my code looks something like this (some pseudocode will be included):
soh_read = 0;
bytes_read = 0;
bytes_left = 1;
do{
n = read(fd, buf + bytes_read, bytes_left);
if(n < 0){
if(errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR){
return -1;
}
}else{
bytes_read += n;
if(!soh_read){
if(buf[0] != SOH){
bytes_read = 0;
continue;
}
}
soh_read = 1;
//read header ...
//read payload ...
}while(timeout is not reached);
I assumed I could reset bytes_read to 0 if SOH is not encountered and try to read into the buf[0] position again, overwriting the garbage it previously read. But it seems like this is a case of a buffer overflow; is that why I am getting a segmentation fault? Why would that not work, though? If so, what is the best way to go about this? I wanted to start at buf[0] so it'd be easy to keep track of each of the message fields. Just trying to learn from the experts here, thanks.

You've left out some information crucial to diagnosing the problem with your code as it stands. The single most important thing is (probably) whether your SOH might occur later in the file than you've allowed room for in your buf.
That said, however, I think I'd do things rather differently: since you apparently don't need (or even care about) the data that precedes the SOH anyway, why not just read all that data into one character, overwriting the previous value at each iteration, and only save more than one byte of data after you encounter the SOH, when you actually have a use for it?
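do {
    n = read(fd, buf, 1); /* keep the result so the error check below can see it */
    if (n < 0 && errno != EWOULDBLOCK && /* ... */)
        return -1;
    /* only trust buf[0] when a byte was actually read */
} while ((n != 1 || buf[0] != SOH) && !timeout_reached);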
// read the header here
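As a rough sketch of the loop above in compilable form: the SOH value of 0x01 and the max_reads cap (standing in for the timeout, which the question does not define) are assumptions for illustration, as is the function name wait_for_soh.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

#define SOH 0x01  /* assumed start-of-header value; the question does not give one */

/* Discard bytes from fd until SOH is seen.
 * Returns 0 once SOH has been consumed, 1 if max_reads read() calls were
 * used up first, and -1 on a real read error.
 * max_reads is a hypothetical stand-in for the question's timeout. */
int wait_for_soh(int fd, int max_reads)
{
    unsigned char c;

    assert(max_reads >= 0);
    for (int i = 0; i < max_reads; i++) {
        ssize_t n = read(fd, &c, 1);
        if (n == 1) {
            if (c == SOH)
                return 0;
            /* any other byte is garbage before the header: drop it */
        } else if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR) {
            return -1;
        }
        /* n == 0 (nothing this time) or a transient error: try again */
    }
    return 1;
}
```

Because only a single local byte is ever written, nothing before the SOH can overflow buf; the header and payload can then be read into buf starting at index 0 or 1, whichever the message layout prefers.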

Related

Issues with socket buffer reader

I've been given this code for an assignment; there are supposed to be errors in it, but I can't actually figure out what this function is supposed to do, never mind figure out if there are any issues with it...
I am guessing that it's supposed to read the buffer line by line, but I've never seen it done this way before
The buffer that is sent to the function is empty.
int read_line(int sock, char *buffer) {
size_t length = 0;
while (1) {
char data;
int result = recv(sock, &data, 1, 0);
if ((result <= 0) || (data == EOF)){
perror("Connection closed");
exit(1);
}
buffer[length] = data;
length++;
if (length >= 2 && buffer[length-2] == '\r' && buffer[length-1] == '\n') {
buffer[length-2] = '\0';
return length;
}
}
}
Thanks in advance!
I'd say the purpose of this function is to read a line that ends with \r\n from socket stream and store it in a char array as a string, therefore the \0 (string termination character) placement.
Ok, so what's wrong with the code?
I'd start with the input parameter char *buffer - inside the function you do not know its size so you cannot check if it exceeds its size limit and it could lead to buffer overflow.
So it would be better to send buffer length as a parameter and check with every received byte if it can be stored.
EOF - it is defined as a negative int (typically -1), and in this case the check doesn't make any sense, because recv never stores EOF in your data variable; worse, where char is signed, a legitimate 0xFF byte would compare equal to it. The only thing you need to look out for is the end of the socket stream (see the recv documentation). And here is an example of EOF usage.
Feel free to remove (data == EOF) from condition.
Let's say you are receiving everything regularly and you receive your last input and connection closes, so you enter this case:
if ((result <= 0) || (data == EOF)){
perror("Connection closed");
exit(1);
}
The problem here is that you won't process your last line and the program will just end. Although, I might be wrong here since I don't know when the connection is getting regularly shut down.
And a minor note here: a result equal to 0 isn't considered an error, but a regular connection shutdown (or a 0-byte datagram was received).
I hope I haven't missed anything.
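Putting the points above together, a bounded variant might look like the sketch below. The name read_line_bounded, the size parameter and the -1/-2 return codes are my own choices for illustration, not part of the assignment.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Read one \r\n-terminated line from sock into buffer (capacity size).
 * Returns the line length without the terminator,
 * -1 on a recv error or if the peer closed before a full line arrived,
 * and -2 if the line does not fit in the buffer. */
ssize_t read_line_bounded(int sock, char *buffer, size_t size)
{
    size_t length = 0;

    assert(buffer != NULL && size >= 2);
    for (;;) {
        char data;
        ssize_t result = recv(sock, &data, 1, 0);
        if (result <= 0)
            return -1;            /* 0 means orderly shutdown, <0 an error */
        if (length == size)
            return -2;            /* no room left: refuse to overflow */
        buffer[length++] = data;
        if (length >= 2 && buffer[length - 2] == '\r' && buffer[length - 1] == '\n') {
            buffer[length - 2] = '\0';
            return (ssize_t)(length - 2);
        }
    }
}
```

Returning an error code instead of calling exit(1) leaves it to the caller to decide what a closed connection means, and there is no comparison against EOF at all.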

Getting characters past a certain point in a file in C

I want to take all characters past location 900 from a file called WWW, and put all of these in an array:
//Keep track of all characters past position 900 in WWW.
int Seek900InWWW = lseek(WWW, 900, 0); //goes to position 900 in WWW
printf("%d \n", Seek900InWWW);
if(Seek900InWWW < 0)
printf("Error seeking to position 900 in WWW.txt");
char EverythingPast900[appropriatesize];
int NextRead;
char NextChar[1];
int i = 0;
while((NextRead = read(WWW, NextChar, sizeof(NextChar))) > 0) {
EverythingPast900[i] = NextChar[0];
printf("%c \n", NextChar[0]);
i++;
}
I create a char array of length 1 because the read system call requires a pointer, so I cannot use a regular char. The above code does not work. In fact, it does not print any characters to the terminal as expected by the loop. I think my logic is correct, but perhaps a misunderstanding of what's going on behind the scenes is what is making this hard for me. Or maybe I missed something simple (hope not).
If you already know how many bytes to read (e.g. in appropriatesize) then just read in that many bytes at once, rather than reading in bytes one at a time.
char everythingPast900[appropriatesize];
ssize_t bytesRead = read(WWW, everythingPast900, sizeof everythingPast900);
if (bytesRead > 0 && bytesRead != appropriatesize)
{
// only everythingPast900[0] to everythingPast900[bytesRead - 1] is valid
}
I made a test version of your code and added bits you left out. Why did you leave them out?
I also made a file named www.txt that has a hundred lines of "This is a test line." in it.
And I found a potential problem, depending on how big your appropriatesize value is and how big the file is. If you write past the end of EverythingPast900 it is possible for you to kill your program and crash it before you ever produce any output to display. That might happen on Windows where stdout may not be line buffered depending on which libraries you used.
See the MSDN setvbuf page, in particular "For some systems, this provides line buffering. However, for Win32, the behavior is the same as _IOFBF - Full Buffering."
This seems to work:
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdio.h>
int main()
{
int WWW = open("www.txt", O_RDONLY);
if(WWW < 0)
printf("Error opening www.txt\n");
//Keep track of all characters past position 900 in WWW.
int Seek900InWWW = lseek(WWW, 900, 0); //goes to position 900 in WWW
printf("%d \n", Seek900InWWW);
if(Seek900InWWW < 0)
printf("Error seeking to position 900 in WWW.txt");
int appropriatesize = 1000;
char EverythingPast900[appropriatesize];
int NextRead;
char NextChar[1];
int i = 0;
while(i < appropriatesize && (NextRead = read(WWW, NextChar, sizeof(NextChar))) > 0) {
EverythingPast900[i] = NextChar[0];
printf("%c \n", NextChar[0]);
i++;
}
return 0;
}
As stated in another answer, read more than one byte. The theory behind "buffers" is to reduce the amount of read/write operations due to how slow disk I/O (or network I/O) is compared to memory speed and CPU speed. Look at it as if it is code and consider which is faster: adding 1 to the file size N times and writing N bytes individually, or adding N to the file size once and writing N bytes at once?
Another thing worth mentioning is the fact that read may read fewer than the number of bytes you requested, even if there is more to read. The answer written by @dreamlax illustrates this fact. If you want, you can use a loop to read as many bytes as possible, filling the buffer. Note that I used a function, but you can do the same thing in your main code:
#include <sys/types.h>
#include <unistd.h>
/* Read from a file descriptor, filling the buffer with the requested
 * number of bytes. If the end-of-file is encountered, the number of
 * bytes returned may be less than the requested number of bytes.
 * On error, -1 is returned. See read(2) or read(3) for possible
 * values of errno.
 * Otherwise, the number of bytes read is returned.
 */
ssize_t
read_fill (int fd, char *readbuf, ssize_t nrequested)
{
  ssize_t nread = 0, nsum = 0;
  while (nrequested > 0
         && (nread = read (fd, readbuf, nrequested)) > 0)
    {
      nsum += nread;
      nrequested -= nread;
      readbuf += nread;
    }
  /* Report a failed read as -1, as documented, rather than a partial count. */
  return nread < 0 ? -1 : nsum;
}
Note that the buffer is not null-terminated as not all data is necessarily text. You can pass buffer_size - 1 as the requested number of bytes and use the return value to add a null terminator where necessary. This is useful primarily when interacting with functions that will expect a null-terminated string:
char readbuf[4096];
ssize_t n;
int fd;
fd = open ("WWW", O_RDONLY);
if (fd == -1)
{
perror ("unable to open WWW");
exit (1);
}
n = lseek (fd, 900, SEEK_SET);
if (n == -1)
{
fprintf (stderr,
"warning: seek operation failed: %s\n"
" reading 900 bytes instead\n",
strerror (errno));
n = read_fill (fd, readbuf, 900);
if (n < 900)
{
fprintf (stderr, "error: fewer than 900 bytes in file\n");
close (fd);
exit (1);
}
}
/* Read a file, printing its contents to the screen.
*
* Caveat:
* Not safe for UTF-8 or other variable-width/multibyte
* encodings since required bytes may get cut off.
*/
while ((n = read_fill (fd, readbuf, (ssize_t) sizeof readbuf - 1)) > 0)
{
readbuf[n] = 0;
printf ("Read\n****\n%s\n****\n", readbuf);
}
if (n == -1)
{
close (fd);
perror ("error reading from WWW");
exit (1);
}
close (fd);
I could also have avoided the null termination operation and filled all 4096 bytes of the buffer, electing to use the precision part of the format specifiers of printf in this case, changing the format specification from %s to %.4096s. However, this may not be feasible with unusually large buffers (perhaps allocated by malloc to avoid stack overflow) because the buffer size may not be representable with the int type.
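For illustration, the precision can also be passed as an argument with %.*s, which avoids hard-coding the size in the format string. The helper name print_bytes below is hypothetical; the INT_MAX check reflects the int limitation just mentioned.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Write the first n bytes of buf to out without needing a terminator.
 * Returns the fprintf result, or -1 if n cannot be passed as an int. */
int print_bytes(FILE *out, const char *buf, size_t n)
{
    assert(out != NULL && buf != NULL);
    if (n > INT_MAX)
        return -1;                 /* the precision argument must be an int */
    return fprintf(out, "%.*s", (int)n, buf);
}
```

Output still stops early at an embedded null byte, so this is only suitable for text.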
Also, you can use a regular char just fine:
char c;
nread = read (fd, &c, 1);
Apparently you didn't know that the unary & operator gets the address of whatever variable is its operand, creating a value of type pointer-to-{typeof var}? Either way, it takes up the same amount of memory, but reading 1 byte at a time is something that normally isn't done as I've explained.
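Mixing declarations and code is a no-no in C89, although C99 and later allow it. Likewise, char EverythingPast900[appropriatesize] is only a valid declaration if appropriatesize is a constant or your compiler supports variable-length arrays (C99); otherwise the compiler will complain along the lines of the array being variably sized.
What you want is to dynamically allocate the memory for your char buffer. You'll have to use pointers.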
http://www.ontko.com/pub/rayo/cs35/pointers.html
Then read this one.
http://www.cprogramming.com/tutorial/c/lesson6.html
Then research a function called memcpy().
Enjoy.
Read through that guide, then you should be able to solve your problem in an entirely different way.
Pseudocode:
declare a buffer of char(pointer related)
allocate memory for said buffer(dynamic memory related)
Find location of where you want to start at
point to it(pointer related)
Figure out how much you want to store(technically a part of allocating memory^^^)
Use memcpy() to store what you want in the buffer
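The steps above could be sketched roughly as follows, assuming the file contents are already in memory (for example after a read call). The function name copy_tail and its parameters are made up for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy everything from offset start to the end of src (len bytes long)
 * into a freshly allocated, NUL-terminated buffer.
 * Returns NULL if start is past the end or malloc fails.
 * The caller owns the result and must free() it. */
char *copy_tail(const char *src, size_t len, size_t start, size_t *out_len)
{
    assert(src != NULL && out_len != NULL);
    if (start > len)
        return NULL;

    size_t n = len - start;            /* how much to store */
    char *buf = malloc(n + 1);         /* dynamic allocation, +1 for '\0' */
    if (buf == NULL)
        return NULL;

    memcpy(buf, src + start, n);       /* point at the start and copy */
    buf[n] = '\0';
    *out_len = n;
    return buf;
}
```

For the question's case, start would be 900 and len the number of bytes actually read from WWW.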

Parse This IP To Be The Right Length?

For something I am doing I would like to get the external IP of the PC running the program (written in C). So far I have found the best way is to connect to a site that simply displays the IP of the visitor, and then parse the webpage for the IP. The first part was easy, but when I display the buffer I read the page into (the page only visibly consisted of my IP), I get a few random extra symbols/characters after the IP. Here is the code I am using ATM (simplified to exclude other stuff):
HINTERNET OpenInternet = NULL;
HINTERNET GetIP = NULL;
DWORD BytesRead = 0;
char IPGrabbed[30];
OpenInternet = InternetOpen("Microsoft Internet Explorer", INTERNET_OPEN_TYPE_DIRECT, NULL, NULL, 0);
if (OpenInternet == NULL) {
return 1;
}
GetIP = InternetOpenUrl(OpenInternet, "http://api.externalip.net/ip/", NULL, 0, INTERNET_FLAG_RELOAD, 0);
if (GetIP == NULL)
return 1;
if (!InternetReadFile(GetIP, &IPGrabbed, sizeof(IPGrabbed), &BytesRead))
return 1;
printf("IP: %s", IPGrabbed);
getchar();
I also tried parsing through IPGrabbed stopping at any '\n' or '\r' (because it displays the weird characters on the line below the IP when I printf() it) and then copying everything up till there to another char array, but got the same result. Could anyone help me figure out what is going on here? Thank you.
Initialise the buffer to all 0s and then read one character less than the buffer provides.
This way the 0-terminator a C "string" relies on is provided implicitly.
char IPGrabbed[30] = ""; /* Initialise the buffer to all `0`s ... */
[...]
/* ... and then read one character less than the buffer provides. */
if (!InternetReadFile(GetIP, &IPGrabbed, sizeof(IPGrabbed) - 1, &BytesRead))
    return 1;
fprintf(stderr, "IP: %s", IPGrabbed); /* Print to stderr, as it's not buffered, so
everything appears on the console immediately. */
The result from InternetReadFile is not null-terminated; you need to add a null character to the end of the string in code after the read succeeds:
IPGrabbed[BytesRead] = 0;
Edit 1
As suggested in the comment by Jonathan Potter, the above code may be subject to a buffer overflow if the site being accessed returns anything longer than an IP string (maximum 16 characters).
I suggest changing the InternetReadFile call to read one byte less than the buffer length, instead of the full buffer length, to eliminate the above problem.
InternetReadFile(GetIP, &IPGrabbed, sizeof(IPGrabbed)-1, &BytesRead)
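The termination step, plus cutting the string at the first line break (which the question also tried), can be sketched portably as below. The helper name terminate_and_trim is hypothetical, and nread stands in for BytesRead; nothing here calls WinINet itself.

```c
#include <assert.h>
#include <string.h>

/* Turn nread raw bytes in buf (capacity size) into a C string and cut it
 * at the first '\r' or '\n'. nread plays the role of BytesRead.
 * Returns the resulting string length. */
size_t terminate_and_trim(char *buf, size_t size, size_t nread)
{
    assert(buf != NULL && size > 0);
    if (nread >= size)
        nread = size - 1;              /* never write past the buffer */
    buf[nread] = '\0';                 /* the API did not terminate it */
    buf[strcspn(buf, "\r\n")] = '\0';  /* drop a trailing line ending */
    return strlen(buf);
}
```

In the question's code this would be called as terminate_and_trim(IPGrabbed, sizeof IPGrabbed, BytesRead) once InternetReadFile has succeeded.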

c recv() read until newline occurs

I'm working on writing an IRC bot in C, and have run into a snag.
In my main function, I create my socket and connect, all that happy stuff. Then I have an (almost) infinite loop to read what's being sent back from the server. I then pass what's read off to a helper function, processLine(char *line). The problem is that the following code reads until my buffer is full; I want it to only read text until a newline (\n) or carriage return (\r) occurs (thus ending that line).
while (buffer[0] && buffer[1]) {
for (i=0;i<BUFSIZE;i++) buffer[i]='\0';
if (recv(sock, buffer, BUFSIZE, 0) == SOCKET_ERROR)
processError();
processLine(buffer);
}
What ends up happening is that many lines get jammed all together, and I can't process the lines properly when that happens.
If you're not familiar with IRC protocols, a brief summary would be that when a message is sent, it often looks like this: :YourNickName!YourIdent@YourHostName PRIVMSG #someChannel :The rest on from here is the message sent...
and a login notice, for instance, is something like this: :the.hostname.of.the.server ### bla some text bla with ### being a code(?) used for processing - i.e. 372 is an indicator that the following text is part of the Message Of The Day.
When it's all jammed together, I can't read what number is for what line because I can't find where a line begins or ends!
I'd appreciate help with this very much!
P.S.: This is being compiled/run on Linux, but I eventually want to port it to Windows, so I am making as much of it as I can multi-platform.
P.P.S.: Here's my processLine() code:
void processLine(const char *line) {
char *buffer, *words[MAX_WORDS], *aPtr;
char response[100];
int count = 0, i;
buffer = strdup(line);
printf("BLA %s", line);
while((aPtr = strsep(&buffer, " ")) && count < MAX_WORDS)
words[count++] = aPtr;
printf("DEBUG %s\n", words[1]);
if (strcmp(words[0], "PING") == 0) {
strcpy(response, "PONG ");
strcat(response, words[1]);
sendLine(NULL, response); /* This is a custom function, basically it's a send ALL function */
} else if (strcmp(words[1], "376") == 0) { /* We got logged in, send login responses (i.e. channel joins) */
sendLine(NULL, "JOIN #cbot");
}
}
The usual way to deal with this is to recv into a persistent buffer in your application, then pull a single line out and process it. Later you can process the remaining lines in the buffer before calling recv again. Keep in mind that the last line in the buffer may only be partially received; you have to deal with this case by re-entering recv to finish the line.
Here's an example (totally untested! also looks for a \n, not \r\n):
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#define BUFFER_SIZE 1024
char inbuf[BUFFER_SIZE];
size_t inbuf_used = 0;
/* Final \n is replaced with \0 before calling process_line */
void process_line(char *lineptr);
void input_pump(int fd) {
    size_t inbuf_remain = sizeof(inbuf) - inbuf_used;
    if (inbuf_remain == 0) {
        fprintf(stderr, "Line exceeded buffer length!\n");
        abort();
    }
    ssize_t rv = recv(fd, (void*)&inbuf[inbuf_used], inbuf_remain, MSG_DONTWAIT);
    if (rv == 0) {
        fprintf(stderr, "Connection closed.\n");
        abort();
    }
    if (rv < 0 && errno == EAGAIN) {
        /* no data for now, call back when the socket is readable */
        return;
    }
    if (rv < 0) {
        perror("Connection error");
        abort();
    }
    inbuf_used += rv;
    /* Scan for newlines in the line buffer; we're careful here to deal with embedded \0s
     * an evil server may send, as well as only processing lines that are complete.
     */
    char *line_start = inbuf;
    char *line_end;
    while ( (line_end = (char*)memchr((void*)line_start, '\n', inbuf_used - (line_start - inbuf))))
    {
        *line_end = 0;
        process_line(line_start);
        line_start = line_end + 1;
    }
    /* Shift buffer down so the unprocessed data is at the start */
    inbuf_used -= (line_start - inbuf);
    memmove(inbuf, line_start, inbuf_used);
}
TCP doesn't offer any message framing of that sort; it is a plain byte stream. As @bdonlan already said, you should implement something like:
Continuously recv from the socket into a buffer
On each recv, check if the bytes received contain an \n
If there is an \n, use everything up to that point from the buffer (and clear it)
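The "check for \n and use everything up to that point" step can be isolated from the socket code, which also makes it easy to try out on a plain array. This is only a sketch: drain_lines, remember_line and the two globals are names invented here, and stripping a preceding \r is an addition because IRC lines end in \r\n.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Example handler: remembers the most recent line and counts lines. */
char last_line[512];
int line_count;

void remember_line(const char *line)
{
    strncpy(last_line, line, sizeof last_line - 1);
    last_line[sizeof last_line - 1] = '\0';
    line_count++;
}

/* Call handle() for every complete line in buf[0..used) and move any
 * trailing partial line to the front. Lines end in '\n'; a preceding
 * '\r' is stripped. Returns the number of bytes left over, which is
 * the offset at which the next recv should append. */
size_t drain_lines(char *buf, size_t used, void (*handle)(const char *line))
{
    char *start = buf;
    char *end;

    assert(buf != NULL && handle != NULL);
    while ((end = memchr(start, '\n', used - (size_t)(start - buf))) != NULL) {
        *end = '\0';
        if (end > start && end[-1] == '\r')
            end[-1] = '\0';
        handle(start);
        start = end + 1;
    }
    used -= (size_t)(start - buf);
    memmove(buf, start, used);
    return used;
}
```

The receive loop would then call recv(sock, buf + used, size - used, 0), add the result to used, and pass both to drain_lines with processLine as the handler.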
I don't have a good feeling about this (I read somewhere that you shouldn't mix low-level I/O with stdio I/O) but you might be able to use fdopen.
All you would need to do is
use fdopen(3) to associate your socket with a FILE *
use setvbuf to tell stdio that you want it line-buffered (_IOLBF) as opposed to the default block-buffered.
At this point you should have effectively moved the work from your hands to stdio. Then you could go on using fgets and the like on the FILE *.

printf of char* gets Segmentation Fault

I'm trying to read from a socket and print to stdout using printf (a must).
However, I get a segmentation fault every time I read a specific file (an HTML page) from the same web site.
Please take a look at this code and tell me what's wrong.
int total_read = 0;
char* read_buff = malloc(BUF_SIZE);
char* response_data = NULL;
if (read_buff == NULL){
perror("malloc");
exit(1);
}
while((nbytes = read(fd, read_buff, BUF_SIZE)) > 0){
int former_total = total_read;
total_read += nbytes;
response_data = realloc(response_data, total_read);
memmove(response_data + former_total, read_buff, nbytes); //start writing at the end of spot before the increase.
}
if (nbytes < 0){
perror("read");
exit(1);
}
printf(response_data);
Thank You.
response_data is probably not NUL ('\0') terminated, so printf continues past the end of the string. Or possibly it contains a % directive but printf can't find further arguments.
Instead, tell printf how far to read, and not to interpret any % directives in the string.
printf("%.*s", total_read, response_data);
Note that if response_data contains an embedded NUL, printf will stop there even if total_read is longer.
What's likely to be in response_data? If it contains printf-formatting characters (i.e. % followed by one of the usual options), printf will try to access some parameters you've not passed, and a segmentation fault is quite likely. Try puts instead?
If you must use printf, do printf("%s", response_data) (and NUL-terminate it first)
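One way to guarantee the terminator is to reserve room for it on every realloc. The sketch below is one possible shape, not the asker's code: read_all and CHUNK are names chosen here, with CHUNK standing in for BUF_SIZE, whose value the question does not show.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHUNK 4096  /* assumed chunk size, standing in for BUF_SIZE */

/* Read fd until end-of-file into a growing heap buffer that is always
 * NUL-terminated. Stores the byte count in *len. Returns NULL on a read
 * or allocation error; the caller must free() the result. */
char *read_all(int fd, size_t *len)
{
    char chunk[CHUNK];
    char *data = malloc(1);
    size_t total = 0;
    ssize_t n;

    assert(len != NULL);
    if (data == NULL)
        return NULL;
    while ((n = read(fd, chunk, sizeof chunk)) > 0) {
        /* +1 keeps room for the terminator; check realloc before use. */
        char *grown = realloc(data, total + (size_t)n + 1);
        if (grown == NULL) {
            free(data);
            return NULL;
        }
        data = grown;
        memcpy(data + total, chunk, (size_t)n);
        total += (size_t)n;
    }
    if (n < 0) {
        free(data);
        return NULL;
    }
    data[total] = '\0';
    *len = total;
    return data;
}
```

With the result in body, printf("%s", body) prints the page and leaves any % characters in the HTML alone.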
My understanding from your post is that the response is the HTML data.
And since it is text you attempt to print it. Do not use printf the way you do.
Instead do the following:
for(int i = 0; i < total_read; i++)
putc(response_data[i],stdout);
