I am trying to extract information from an HTML page. I use curl to fetch the page, then parse it and store the information I need in a character array, but I do not know in advance how big the array should be. This is for an assignment, so I won't post too much code. I am supposed to allocate the memory dynamically, and since I do not know the final size, I keep growing the buffer with realloc. Everything is fine inside the function, but once it returns, there is nothing stored in the pointer. Here is the code. Also, if you know of a library that would do this for me, could you link me to it? It would make my life a whole lot easier. Thank you!
char * parse(int * input)
{
char * output = malloc(sizeof(char));
int start = 270;
int index = start;
while(input[index]!='<')
{
output = realloc(output, (index-start+1)*sizeof(char));
output[index-start]=input[index];
index++;
}
return output;
}
The strchr function finds the first occurrence of its second argument in its first argument.
So here you'd have to find a way to run strchr starting at input[start], passing it the character '<' as second argument, and compute the length from the pointer that strchr returns. This then gives you the length that you need to allocate for output.
Don't forget the '\0' character at the end.
Use a library function to copy the string from input to output.
Since this is an assignment, you'll probably find out the rest by yourself ...
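For readers who are not doing this as an assignment, the steps above can be sketched roughly as follows. This is only one possible shape: the helper name extract_until_tag is made up for illustration, and it assumes the input is an ordinary NUL-terminated char string (the question's code takes an int * instead) that is at least start characters long.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd copy of the text in `input`
   from offset `start` up to (not including) the next '<'.
   Returns NULL if no '<' is found or if allocation fails.
   Assumes `input` is NUL-terminated and at least `start` chars long. */
char *extract_until_tag(const char *input, size_t start)
{
    const char *begin = input + start;
    const char *end = strchr(begin, '<');   /* first '<' at or after begin */
    if (end == NULL)
        return NULL;

    size_t len = (size_t)(end - begin);
    char *output = malloc(len + 1);         /* + 1 for the '\0' */
    if (output == NULL)
        return NULL;

    memcpy(output, begin, len);             /* library copy, no manual loop */
    output[len] = '\0';
    return output;
}
```

The caller owns the returned string and has to free() it.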
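Here is how to read a line dynamically:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main(void){
    int mem=270;
    char *str=malloc(mem);
    if(!str||!fgets(str,mem,stdin)){//allocation failed or there was no input
        free(str);
        return 1;
    }
    while(str[strlen(str)-1]!='\n'){//no newline yet, so we ran out of space
        char *tmp=realloc(str,mem*2);//double the amount of space
        if(!tmp){
            free(str);
            return 1;
        }
        str=tmp;
        mem*=2;
        if(!fgets(str+mem/2-1,mem/2+1,stdin))//read the rest (hopefully) of the line over the old '\0'
            break;//end of input without a final newline
    }
    printf("%s",str);
    free(str);
    return 0;
}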
Your output needs to end with '\0'. A pointer is just a pointer to the beginning of the string, and has no length, so without a '\0' (NUL) as a sentinel, you don't know where the end is.
You generally don't want to call realloc for every individual new character. It would usually make more sense to malloc() output with strlen(input) + 1 bytes and then realloc() it once at the end to trim it to the size actually used.
Alternatively, you should double it in size each time you realloc it instead of just adding one byte. That requires you to keep track of the current allocated length in a separate variable though, so that you know when you need to realloc.
You might read up on the function strcspn; it can be faster than a hand-written while loop.
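The doubling approach mentioned above could look something like this. It is a sketch under assumptions rather than a drop-in fix: the function name copy_until_tag and the starting capacity of 16 are arbitrary choices, and the input is assumed to be a NUL-terminated char string.

```c
#include <stdlib.h>

/* Sketch: copy characters from input[start] up to the next '<' (or the end
   of the string), doubling the allocation whenever it fills up. */
char *copy_until_tag(const char *input, size_t start)
{
    size_t capacity = 16;                 /* arbitrary starting size */
    size_t length = 0;                    /* characters stored so far */
    char *output = malloc(capacity);
    if (output == NULL)
        return NULL;

    for (size_t i = start; input[i] != '\0' && input[i] != '<'; i++) {
        if (length + 1 == capacity) {     /* keep one byte free for '\0' */
            char *tmp = realloc(output, capacity * 2);
            if (tmp == NULL) {
                free(output);
                return NULL;
            }
            output = tmp;
            capacity *= 2;
        }
        output[length++] = input[i];
    }
    output[length] = '\0';                /* the sentinel the caller relies on */
    return output;
}
```

Keeping capacity and length in separate variables is what makes it possible to tell when a realloc is needed, and the temporary pointer avoids losing the old block if realloc fails.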
Let's say I want to modify a char array using a function.
I always see people using malloc, calloc, or pointers to modify int, char, or 2D arrays.
Am I right that a string can be returned from a function only if I use malloc, create the array through a pointer, and return that pointer? If so, why not get or alter the string by passing it to the function as a parameter?
Isn't my demonstration, which takes the char array as a parameter, easier than allocating and freeing? Is my concept wrong, or why do I never see people passing arrays to functions? I only see code that passes "char *array", not "char array[]", and uses malloc and so on, while this method of altering a char array looks easy to me. Am I missing something?
#include <stdio.h>
void change(char array[]){
array[0]='K';
}
int main(){
char array[]="HEY";
printf("%s\n", array);
change(array);
printf("%s\n",array );
return 0;
}
If you only need to change existing characters in the string, and the string will be in a variable, and you don't mind the side-effect of your original string being modified, then your solution may be acceptable and indeed easier. But:
What if you want to get a modified string, but also want to retain the original? To avoid destroying an arbitrary-sized original, you need to malloc space, make a copy, and modify that.
And what if you want to extend the string? If your change is to add " YOU" to the string, it can't modify the original because there's no space for it. It would cause a buffer overflow, since there are only 4 bytes allocated for "HEY" (three letters plus the null terminator). Again, the solution involves mallocing space to work with.
Functions that make changes using your technique typically need a size or length parameter to avoid overflowing the array and causing a crash and a potential security risk. But although that avoids the overflow, there's still the question of what happens if there's not enough space: Silently drop some data? Pass back a flag or special value to indicate there wasn't enough space, and expect the caller to handle it? In the long run, it ends up easier to write it right the first time, and malloc/calloc the space and deal with having to free it up later and all that.
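As a rough illustration of the malloc-based alternative for the " YOU" case, here is a minimal sketch. The helper name append_copy is invented for this example; it leaves the original string untouched and hands ownership of the new string to the caller.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a new malloc'd string holding `original`
   followed by `suffix`. The original is not modified.
   Returns NULL on allocation failure; the caller must free() the result. */
char *append_copy(const char *original, const char *suffix)
{
    size_t a = strlen(original);
    size_t b = strlen(suffix);
    char *result = malloc(a + b + 1);     /* + 1 for the '\0' */
    if (result == NULL)
        return NULL;

    memcpy(result, original, a);
    memcpy(result + a, suffix, b + 1);    /* b + 1 copies the '\0' as well */
    return result;
}
```

With char array[] = "HEY", a call such as append_copy(array, " YOU") yields a separate "HEY YOU" string while array still holds "HEY".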
I'm new to C so I'm having a little trouble handling everything Java already did for me in the background.
Basically what I would like to achieve is this:
Declare an array of char with no specified size
Ask the user to input a string (a single word or a phrase)
Set the size of that array to the length of the input string (dynamically)
Put the input string inside the char array
I've tried using scanf but it doesn't seem to handle string as an array of char(?) so I'm not able to work on the variable
I've also read about malloc() functions which dynamically allocates space for an array so I could use it to set the size of the array as the strlen of the string and then put '\0' at the end (just like .asciiz in some assembly language) but I can't figure out how to correlate malloc and input string.
Any help would be appreciated!
Thanks for your attention
You can use getline to read an entire line and not have to worry about managing memory during that read. The function was only standardized in POSIX.1-2008, so if you’re using glibc you’ll need to compile with -D_POSIX_C_SOURCE=200809L, for example.
To summarize the linked documentation: getline takes a pointer to a string, and will allocate memory entirely for you if the string is NULL and the size is 0. It returns −1 if it fails to allocate memory (e.g. there’s more input than free memory) or if the end of the input is reached immediately. You always have to free the memory allocated in this way, even if it fails.
#include <stdio.h>
#include <stdlib.h>
int main(void) {
size_t input_size = 0;
char* input_line = NULL;
if (getline(&input_line, &input_size, stdin) == -1) {
free(input_line);
perror("Failed to read input");
return EXIT_FAILURE;
}
printf("Got input: '%s'\n", input_line);
free(input_line);
}
C does not provide a way for you to create an array of unspecified size. Generally, to do this sort of thing, you must create an array of some size (e.g., using malloc) and start reading user input. If the user input continues too long, you increase the size of the array (using realloc) and continue reading.
Once you reach the end of whatever the user is inputting, then you can reduce the array to match the actual size (again using realloc) if desired.
A consequence of this is that you cannot read the user input in one go. You must write code that reads portions of specific size, either character-by-character or as many characters as fit in the array you have created so far.
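The grow-then-shrink procedure described above might be sketched like this, reading character by character. The function name read_line, the FILE * parameter, and the starting capacity of 32 are assumptions made for the example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Sketch: read one line from fp into a buffer that grows as needed.
   Returns a malloc'd string without the trailing newline, or NULL on
   allocation failure or if the input ends before any character is read. */
char *read_line(FILE *fp)
{
    size_t capacity = 32, length = 0;     /* 32 is an arbitrary start */
    char *line = malloc(capacity);
    if (line == NULL)
        return NULL;

    int c;
    while ((c = fgetc(fp)) != EOF && c != '\n') {
        if (length + 1 == capacity) {     /* full: double the array */
            char *tmp = realloc(line, capacity * 2);
            if (tmp == NULL) {
                free(line);
                return NULL;
            }
            line = tmp;
            capacity *= 2;
        }
        line[length++] = (char)c;
    }
    if (c == EOF && length == 0) {        /* nothing left to read */
        free(line);
        return NULL;
    }
    line[length] = '\0';

    /* Optional: shrink the array to the size actually used. */
    char *exact = realloc(line, length + 1);
    return exact != NULL ? exact : line;
}
```

Calling read_line(stdin) returns the next line typed by the user; the result has to be passed to free() when it is no longer needed.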
I have a string "abcdefg-this-is-a-test" and I want to delete the first 6 characters of the string. This is what I am trying:
char contentSave2[180] = "abcdefg-this-is-a-test";
strncpy(contentSave2, contentSave2+8, 4);
No luck so far, processor gets stuck and resets itself.
Any help will be appreciated.
Question: How can I trim down a string in C?
////EDIT////
I also tried this:
memcpy(contentSave2, &contentSave2[6], 10);
Doesn't work, same problem.
size_t i, len = strlen(contentSave2);
for (i = 6; i < len; i++)
    contentSave2[i - 6] = contentSave2[i]; /* shift each character 6 places left */
contentSave2[i - 6] = '\0';               /* terminate the shortened string */
This will delete the first 6 characters. You can adapt the code to your requirements. If you want to use a built-in function, try memmove.
The problem with your first code snippet is that it copies the middle four characters to the beginning of the string, and then stops.
Unfortunately, you cannot expand it to cover the entire string, because in that case the source and output buffers would overlap, causing UB:
If the strings overlap, the behavior is undefined.
Overlapping buffers is the problem with your second attempt: memcpy does not allow overlapping buffers, so the behavior is undefined.
If all you need is to remove characters at the beginning of the string, you do not need to copy it at all: simply take the address of the initial character, and use it as your new string:
char *strWithoutPrefix = &contentSave2[8];
For copying of strings from one buffer to another use memcpy:
char middle[5];
memcpy(middle, &contentSave2[8], 4);
middle[4] = '\0'; // "this"
For copying potentially overlapping buffers use memmove:
char contentSave2[180] = "abcdefg-this-is-a-test";
printf("%s\n", contentSave2);
memmove(contentSave2, contentSave2+8, strlen(contentSave2)-8+1);
printf("%s\n", contentSave2);
You can simply use a pointer, because the array name contentSave2 decays to a pointer to its first character; this is also quick and short.
char* ptr = contentSave2 + 6;
ptr[0] will be equal to contentSave2[6]
You can use the memmove function.
It is intended for cases where the source and destination memory regions overlap.
Small word of advice: try to avoid copying to and from overlapping source and destination. It is simply bug-prone.
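A small wrapper around memmove for this use case could look like the following. The name drop_prefix is hypothetical, and clamping n to the string length is a design choice of this sketch, not something the question requires.

```c
#include <string.h>

/* Hypothetical helper: removes the first n characters of s in place.
   If s is shorter than n, the result is the empty string. */
void drop_prefix(char *s, size_t n)
{
    size_t len = strlen(s);
    if (n > len)
        n = len;                          /* never read past the '\0' */
    memmove(s, s + n, len - n + 1);       /* + 1 moves the '\0' too */
}
```

For the string in the question, drop_prefix(contentSave2, 8) leaves "this-is-a-test" in the buffer.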
The following snippet should work fine:
#include <stdio.h>
#include <string.h>
int main() {
char contentSave2[180] = "abcdefg-this-is-a-test";
strncpy(contentSave2, contentSave2+8, 4);
printf("%s\n", contentSave2);
return 0;
}
I would suggest posting the rest of your code, because your issue is elsewhere. As others pointed out, watch out for overlap when you use strncpy, though in this specific case it should work: the four bytes copied from offset 8 do not overlap the four bytes written at offset 0.
I'm just having a bit of difficulty with a print. Basically, I have code and I'm assigning values to bestmatch[], which is defined as being of type line_t (see struct at bottom).
As you can see, I am storing values for bestmatch[].score (double), bestmatch[].index (int) and bestmatch[].buf (string). When I print them, as shown in the second code block below, bestmatch[i].index and bestmatch[i].score print correctly; however, bestmatch[i].buf does not print at all.
Just to confuse matters more (for myself at least), if I print bestmatch[i].buf at the end of scorecmp (first code block), it prints fine. I've got my call to scorecmp down the very bottom for reference.
Why is it that it is printing index and score fine, but not buf? Or even more, how can I fix this behaviour?
Thank you for your help! Please let me know if you need any additional information
The print, appearing in main, is as follows (for reference, TOP_SCORING_MAX is the number of elements in bestmatch[]):
int i;
for (i = 0; i<TOP_SCORING_MAX; i++) {
if (bestmatch[i].score != -1) {
printf("line\t%d, score = %6.3f and string is %s \n",
bestmatch[i].index,bestmatch[i].score, bestmatch[i].buf);
}
}
And in case you would like the struct:
typedef struct line_t {
char* buf;
int lineLength;
int wordCount;
int index;
double score;
} line_t;
This is my call to scorecmp:
scorecmp(linePtr, bestmatch);
You need to copy the content of the strings, not just the pointers, because they seem to be destroyed, freed, or mutilated before you print them:
bestmatch[j].buf = strdup(linePtr->buf);
Don't forget to free the copied string at the end.
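To make the idea concrete, here is a hedged sketch of storing a copy instead of the pointer. The struct is taken from the question; the helper store_match is invented for illustration, since the body of scorecmp was not shown. strdup is a POSIX function (also part of C23), hence the feature-test macro.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() before C23 */
#include <stdlib.h>
#include <string.h>

typedef struct line_t {
    char *buf;
    int lineLength;
    int wordCount;
    int index;
    double score;
} line_t;

/* Hypothetical helper: copy *src into *dst, duplicating the string so that
   dst->buf stays valid after src->buf is freed or overwritten.
   Returns 0 on success, -1 if the copy could not be allocated. */
int store_match(line_t *dst, const line_t *src)
{
    char *copy = strdup(src->buf);
    if (copy == NULL)
        return -1;
    *dst = *src;          /* copies score, index and the other fields */
    dst->buf = copy;      /* but keeps its own private string */
    return 0;
}
```

Each stored entry then owns its buf, so every element of bestmatch that was filled this way needs a matching free(bestmatch[i].buf) at the end.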
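The getline function (part of POSIX, not of standard C) is a convenient way to read lines of text from a stream.
The standard alternatives are harder to use safely: gets cannot limit how much it writes and was removed in C11, while fgets and scanf need a buffer whose size is fixed in advance.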
The getline function reads an entire line from a stream, up to and including the next newline character.
This function takes three parameters:
The address of a pointer to a block of memory allocated with malloc or calloc. This parameter is of type char**, and the pointer it refers to will point to the line read by getline when the function returns.
A pointer to a variable of type size_t. This parameter specifies the size in bytes of the block of memory pointed to by the first parameter.
The stream from which to read the line.
The first parameter - a pointer to the block of memory allocated with malloc or calloc - is merely a suggestion. Function getline will automatically enlarge the block of memory as needed via realloc, so there is never a shortage of space - one reason why this function is so safe. Not only that, but it will also tell you the new size of the block, by updating the value returned in the second parameter.
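That being said, every time you call getline for a line that needs its own buffer, you first need to do the following, or else set line.buf = NULL and maxSz = 0 so that getline allocates the block itself: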
Set maxSz to a reasonable size.
Set line.buf = malloc(maxSz).
Set the value of maxSz not too large, in order to reduce the amount of redundant memory used.
Set the value of maxSz not too small, in order to reduce the number of times getline calls realloc.
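When the lines do not need to outlive the loop that reads them, a single buffer can simply be reused across calls. The sketch below shows that pattern; the function longest_line is a made-up example task, and it starts from a NULL pointer and size 0 so that getline does the first allocation.

```c
#define _POSIX_C_SOURCE 200809L  /* for getline() */
#include <stdio.h>
#include <stdlib.h>

/* Sketch: returns the length of the longest line in fp (newline excluded),
   reusing one getline buffer for every line. */
size_t longest_line(FILE *fp)
{
    char *line = NULL;   /* NULL with size 0: getline allocates for us */
    size_t size = 0;     /* updated by getline whenever it reallocs */
    size_t longest = 0;
    ssize_t n;

    while ((n = getline(&line, &size, fp)) != -1) {
        size_t len = (size_t)n;
        if (len > 0 && line[len - 1] == '\n')
            len--;       /* do not count the newline */
        if (len > longest)
            longest = len;
    }
    free(line);          /* one free for the whole loop */
    return longest;
}
```

If each line has to be kept (as with the pointers stored in bestmatch), the buffer must not be reused like this; give every line its own allocation or copy it out first.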
This may be a very basic question for some. I was trying to understand how strcpy works actually behind the scenes. for example, in this code
#include <stdio.h>
#include <string.h>
int main ()
{
char s[6] = "Hello";
char a[20] = "world isnsadsdas";
strcpy(s,a);
printf("%s\n",s);
printf("%zu\n", sizeof(s));
return 0;
}
I am declaring s as a fixed-size array smaller than the source, so I thought it wouldn't print the whole string, but it did print world isnsadsdas. So I thought that strcpy might be allocating more space when the destination is smaller than the source. But when I check sizeof(s), it is still 6, even though more than that is printed. How is that actually working?
You've just caused undefined behaviour, so anything can happen. In your case, you're getting lucky and it's not crashing, but you shouldn't rely on that happening. Here's a simplified strcpy implementation (but it's not too far off from many real ones):
char *strcpy(char *d, const char *s)
{
char *saved = d;
while (*s)
{
*d++ = *s++;
}
*d = 0;
return saved;
}
sizeof is just returning you the size of your array from compile time. If you use strlen, I think you'll see what you expect. But as I mentioned above, relying on undefined behaviour is a bad idea.
http://natashenka.ca/wp-content/uploads/2014/01/strcpy8x11.png
strcpy is considered dangerous for reasons like the one you are demonstrating. The two buffers you created are local variables stored in the stack frame of the function. Here is roughly what the stack frame looks like:
http://upload.wikimedia.org/wikipedia/commons/thumb/d/d3/Call_stack_layout.svg/342px-Call_stack_layout.svg.png
FYI things are put on top of the stack meaning it grows backwards through memory (This does not mean the variables in memory are read backwards, just that newer ones are put 'behind' older ones). So that means if you write far enough into the locals section of your function's stack frame, you will write forward over every other stack variable after the variable you are copying to and break into other sections, and eventually overwrite the return pointer. The result is that if you are clever, you have full control of where the function returns. You could make it do anything really, but it isn't YOU that is the concern.
As you seem to know, since you made your first buffer 6 chars long for a 5-character string, C strings end in a null byte \x00. The strcpy function copies bytes until the source byte is 0, but it does not check that the destination is that long, which is why it can copy past the boundary of the array. This is also why your print reads the buffer past its size: it reads until \x00. Interestingly, the strcpy may have written into the data of a, depending on the order the compiler gave the two arrays on the stack, so a fun exercise could be to also print a and see if you get something like 'snsadsdas' (but I can't be sure what it would look like even if it is polluting a, because there are sometimes bytes in between the stack entries for various reasons).
If this buffer holds say, a password to check in code with a hashing function, and you copy it to a buffer in the stack from wherever you get it (a network packet if a server, or a text box, etc) you very well may copy more data from the source than the destination buffer can hold and give return control of your program to whatever user was able to send a packet to you or try a password. They just have to type the right number of characters, and then the correct characters that represent an address to somewhere in ram to jump to.
You can use strcpy if you check the bounds and maybe trim the source string, but it is considered bad practice. There are functions that take a maximum length, like strncpy (http://www.cplusplus.com/reference/cstring/strncpy/), though note that strncpy does not null-terminate the destination when the source is at least that long.
Oh, and lastly, this is all called a buffer overflow. Some compilers add a stack canary: a small value, chosen randomly when the program starts, placed between a function's local variables and its return address. Before the function returns, the canary is checked against the saved copy, and the program is terminated if they differ. This solves a lot of security problems, but it can still be possible to copy bytes far enough into the stack to overwrite the pointer to the function that handles what happens when those bytes have been changed, thus letting you do the same thing. It just becomes a lot harder to do right.
In C there is no bounds checking of arrays; it's a trade-off in order to have better performance at the risk of shooting yourself in the foot.
strcpy() doesn't care whether the target buffer is big enough so copying too many bytes will cause undefined behavior.
That is one of the reasons a bounds-checked variant of strcpy was introduced, strcpy_s() (an optional part of C11, Annex K), where you specify the target buffer size.
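Since strcpy_s is optional and missing from many C libraries, the same idea can be sketched by hand. The helper copy_checked below is hypothetical, and its convention of returning -1 and leaving an empty string on failure is just one possible design.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst only if it fits, '\0' included.
   Returns 0 on success, -1 (leaving dst as an empty string) otherwise. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);
    if (dst_size == 0)
        return -1;                /* no room even for the '\0' */
    if (len >= dst_size) {
        dst[0] = '\0';            /* refuse rather than overflow */
        return -1;
    }
    memcpy(dst, src, len + 1);    /* len + 1 includes the '\0' */
    return 0;
}
```

With the arrays from the question, copy_checked(s, sizeof s, a) reports failure instead of writing past the 6 bytes of s.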
Note that sizeof(s) is determined at compile time. Use strlen() to find the number of characters stored in s. When you perform strcpy(), the destination's contents are replaced by the source string, so your output won't be "Helloworld isnsadsdas".
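#include <stdio.h>
#include <string.h>
int main(void)
{
    char s[20] = "Hello"; /* large enough to hold a, so the copy is well defined */
    char a[20] = "world isnsadsdas";
    strcpy(s,a);
    printf("%s\n",s);
    printf("%zu\n", strlen(s)); /* strlen returns size_t, so use %zu */
    return 0;
}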
You are relying on undefined behaviour, in that the compiler has chosen to place the two arrays where your code happens to work. This may not work in the future.
As to the sizeof operator, this is figured out at compile time.
Once you use adequate array sizes you need to use strlen to fetch the length of the strings.
The best way to understand how strcpy works behind the scene is...reading its source code!
You can read the source for GLibC : http://fossies.org/dox/glibc-2.17/strcpy_8c_source.html . I hope it helps!
At the end of every string/character array there is a null terminator character '\0' which marks the end of the string/character array.
strcpy() performs its task until it sees the '\0' character.
printf() also performs its task until it sees the '\0' character.
sizeof() on the other hand is not interested in the content of the array, only its allocated size (how big it is supposed to be), thus not taking into consideration where the string/character array actually ends (how big it actually is).
As opposed to sizeof(), there is strlen() that is interested in how long the string actually is (not how long it was supposed to be) and thus counts the number of characters until it reaches the end ('\0' character) where it stops (it doesn't include the '\0' character).
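The difference between the two can be put to work in a small check. The helper room_left is hypothetical; it expects the caller to pass sizeof of the array as capacity, because inside the function only a pointer is available and sizeof would no longer give the array size.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: how many more characters fit into a fixed array
   that already holds a string. `capacity` is sizeof the array, as
   computed by the caller. */
size_t room_left(const char *buf, size_t capacity)
{
    size_t used = strlen(buf) + 1;        /* characters plus the '\0' */
    return capacity > used ? capacity - used : 0;
}
```

For char s[6] = "Hello", sizeof s is 6 and strlen(s) is 5, so room_left(s, sizeof s) is 0: the array is already full.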
Better Solution is
char *strcpy(char *p,char const *q)
{
char *saved=p;
while(*p++=*q++);
return saved;
}