Converting Upper Case to Lower Case using pointers in C - c

I've been trying to change upper case letters to lower case letters using pointers, but I keep getting segmentation faults. Here is my source code:
#include <stdlib.h>
#include <string.h>
char *changeL(char *s);
char *changeL(char *s)
{
char *upper = s;
for (int i = 0; upper[i] != '\0'; i++)
{
if (upper[i] >= 'A' && upper[i] <= 'Z')
{
upper[i] += 32;
}
}
printf("%s\n", upper);
return upper;
}
int main()
{
char *first;
char *second;
first = "HELLO My Name is LoL";
printf("%s\n", first);
second = changeL(first);
printf("There is no error here\n\n");
printf("%s\n", second);
return 0;
}
Using gdb I found the seg fault to be in "upper[i] += 32;". I don't understand why the seg fault is there.

"HELLO My Name is LoL" is a string literal stored in constant memory. You can't change it. However, you pass a pointer to this memory (first) to a function that tries to change it, so you get a segmentation fault. You should copy this string into a writable memory buffer, like
char buffer[] = "HELLO My Name is LoL";
and then pass buffer to changeL

A couple of notes in addition to what @Alex correctly points out in his answer. First
char *changeL(char *s);
char *changeL(char *s)
{
....
}
There is no need for a prototype before the function if the function is one line below. A prototype is used to inform code below it that the function described by the prototype exists and is defined elsewhere. If you define the function immediately below the prototype it makes the prototype irrelevant.
Second, as noted in Alex's answer, on an overwhelming majority of systems, a String Literal, e.g. the "Something Here" in char *s = "Something Here"; is immutable and resides in read-only memory, and any attempt to modify the string literal generally results in a SegFault.
Instead you need to create an array of characters which can be modified, e.g.
char first[] = "HELLO My Name is LoL";
or with C99+ you can use a Compound Literal to initialize first as a pointer to an array of char, e.g.
char *first = (char[]){ "HELLO My Name is LoL" };
In both cases above the characters pointed to by first will be modifiable.
Addition Per Comment
"can you also explain to him why is he getting segfault at upper[i] += 32;"
Yes. As mentioned above, on virtually every current system, initializing a pointer to a String Literal (e.g. "foo") creates the string in memory which cannot be modified. (Ancient systems had no distinction or protection for read-only memory -- all memory was writable.) For ELF executables, that memory is generally the .rodata section of the executable -- dissecting closer, ".ro...data" means "read-only data".
When any attempt is made to change data that cannot be modified, a Segmentation Fault generally results because you have attempted to write to an address within a segment that is read-only. (thus the Segmentation Fault -- or SegFault)
In the code above as originally written with
first = "HELLO My Name is LoL";
If you compile to assembly (on Linux, e.g. gcc -S -masm=intel -o mysaved.asm myfile.c) you will see that the string "HELLO My Name is LoL" is in fact created in the .rodata section. You do not have any ability to change that data -- you now know what happens when you try :)
The code as written in the Question also shows confusion about what the pointers first and second actually point to. By assigning the return of changeL to second, there is no new memory created for second. It is no different than simply assigning second = first; in main(). second is just a separate pointer that points to the same memory referenced by first. A more concise version of the code would be:
#include <stdio.h>
void changeL (char *s)
{
for (int i = 0; s[i]; i++)
if (s[i] >= 'A' && s[i] <= 'Z')
s[i] += 32;
}
int main (void)
{
char first[] = "HELLO My Name is LoL";
char *second = first;
printf("%s\n", first);
changeL(first);
printf("%s\n", second);
return 0;
}
(note: both header files in the original code are unnecessary, <stdio.h> is the only required header)
To illustrate second simply points to first:
Example Use/Output
$./bin/chars
HELLO My Name is LoL
hello my name is lol
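If the goal is for second to refer to its own memory rather than alias first, one option is to return a freshly allocated copy. What follows is only a minimal sketch: lower_copy is a hypothetical helper (it is not in the original code), and it uses tolower() from <ctype.h> instead of adding 32.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the original post): returns a newly
 * allocated lowercase copy of s, or NULL if allocation fails.
 * The caller owns the result and must free() it. */
char *lower_copy(const char *s)
{
    size_t len = strlen(s);
    char *copy = malloc(len + 1);      /* +1 for the terminating '\0' */

    if (copy == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++)
        copy[i] = (char)tolower((unsigned char)s[i]);
    copy[len] = '\0';
    return copy;
}
```

Because the parameter is const char * and the input is never written to, passing a string literal to this version is safe; only the copy is modified.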

This code outputs only the lowercase letters at the start of the input string
#include<stdio.h>
#include<string.h>
int main()
{
char b[50] = "";
printf("String=");
scanf("%49[a-z]",b);
printf("%s",b);
return 0;
}

Related

The following C code produces a segmentation fault, and I can't understand why

#include <stdio.h>
void append(char* s, char n);
void splitstr(char* string);
int main()
{
splitstr("COMPUTE 1-1");
printf("\n");
splitstr("COMPUTE 1+1");
printf("\n");
splitstr("COMPUTE 1*1");
return 0;
}
void append(char* s, char ch) {
while(*s != '\0'){
s = s + 1;
}
*s = ch;
s = s + 1;
*s = '\0';
}
void splitstr(char* string){
int count = 1;
char* expression = "";
while(*string != '\0'){
if(count > 8){
append(expression, *string);
string = string + 1;
count = count + 1;
}else{
string = string + 1;
count = count + 1;
}
}
printf("%s",expression);
}
Example Input and Output:
Input: COMPUTE 1+1
Output: 1+1
Input: COMPUTE 2-6
Output: 2-6
Originally, this code does not include stdio.h (I am doing this for testing on an online C compiler) because I am building an OS from scratch so I need to write all the functions by myself. I think the problem might be in the append function but I cannot find it.
instead of
char* expression = "";
do
char expression[MAX_expression_length+1] = "";
or use realloc in the append function
I think this line is the culprit:
append(expression, *string);
Notice how expression is declared:
char* expression = "";
In other words, expression consists of one byte, a single \0. Right away, we can see that append() won't work like you want it to--the while loop will never run, because *s is already \0.
But beyond that, the segfault likely happens at the bottom of append(). After the while loop, you unconditionally increment s and then write to the location it now points to. The problem is that this is a location that has never been allocated (since s is a reference to splitstr()'s expression, which is a single byte long). Furthermore, because expression is declared as a string constant, depending on your platform it may be placed in an area of memory marked read-only. Consequently, this is an attempt to write into memory that may not actually belong to the process and may also not be writable, raising the fault.
expression points to a string literal, and trying to modify a string literal leads to undefined behavior.
You need to define expression as an array of char large enough to store your final result:
char expression[strlen(string)+1]; // VLA
Since your result isn’t going to be any longer than the source string, this should be sufficient (provided your implementation supports VLAs).
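Putting the suggestions above together, here is one way the function might look with a caller-supplied writable buffer. This is a sketch only: split_after, its skip parameter, and the buffer size are made up for illustration, and no library calls are used because the question mentions a freestanding OS project.

```c
#include <stddef.h>

/* Hypothetical rewrite of splitstr(): copies everything after the first
 * `skip` characters of `in` into the caller-supplied buffer `out`
 * (capacity `outsz`), always leaving `out` null-terminated. */
void split_after(const char *in, size_t skip, char *out, size_t outsz)
{
    size_t n = 0;

    if (outsz == 0)
        return;                        /* no room even for the terminator */

    while (skip > 0 && *in != '\0') {  /* step over the prefix */
        in++;
        skip--;
    }
    while (*in != '\0' && n + 1 < outsz)
        out[n++] = *in++;              /* copy, keeping room for '\0' */
    out[n] = '\0';
}
```

A call such as split_after("COMPUTE 1+1", 8, expr, sizeof expr) with char expr[16]; writes only into expr, so the string literal is never modified.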

Segmentation fault with strcpy, even though pointers have pointee

void trim(char *line)
{
int i = 0;
char new_line[strlen(line)];
char *start_line = line;
while (*line != '\0')
{
if (*line != ' ' && *line != '\t')
{
new_line[i] = *line;
i++;
}
line++;
}
new_line[i] = '\0';
printf("%s\n", start_line);
printf("%s\n", new_line);
strcpy(start_line, new_line);
}
I really cannot find the problem here. My pointers are initialized, and I made a pointer to have the start of the string line. At the end I would like to copy the new line in the old one, so the caller has a changed value of his line.
But strcpy() makes a segmentation fault. What is wrong?
This is the code that calls trim():
char *str = "Irish People Try American Food";
printf("%s\n", str);
trim(str);
printf("%s\n", str);
You need to show the whole program; what calls "trim()"? Paul R's answer is right, you are one character short and it should be at least:
char new_line[strlen(line) + 1];
However, this will not always cause a segfault, and if it did it would probably not be at strcpy().
The likely reason strcpy(start_line, new_line) is faulting is that start_line points to the original value of line. It is likely that you are calling the function like:
int main() {
trim("blah blah\tblah");
return 0;
}
If so, line is a pointer to a constant char array that can't be modified. On many OSes this is stored in a read-only memory area, so it will cause an immediate segmentation fault if a write attempt is made. So strcpy() then faults when trying to write to this read-only location.
As a quick test try this:
int main() {
char test[100] = "blah blah\tblah";
trim(test);
return 0;
}
If it works, that's your specific issue with strcpy() faulting.
EDIT - the question was updated later to include the main() calling function, which confirmed that the trim function was called with a pointer to a string constant. The problem line is:
char *str = "Irish People Try American Food";
This creates a string literal, an array of 31 characters (including the null terminator) which cannot be modified. The pointer str is then initialized with the address of this constant array.
The correction is to allocate a regular array of characters and then initialize it with the known string. In this case the assignment and temporary constant string literal may or may not be optimized out, but the end result is always the same - a writable array of characters initialized with the desired text:
char str[100] = "Irish People Try American Food";
/* or */
char str2[] = "American People Like Irish Beer";
/* or */
char str3[37];
strcpy(str3, "And German Beer"); /* example only, need to check length */
These create normal writable char arrays of lengths 100, 32, and 37, respectively. Each is then initialized with the given strings.
The ANSI/ISO C standard defined the language such that a string literal is an array of char that cannot be modified. This has been the case ever since it was first standardized in C89. Prior to this, string literals had commonly been writable, such as in the pre-standard K&R C of very early UNIX code.
Identical string literals of either form need not be distinct. If
the program attempts to modify a string literal of either form, the
behavior is undefined. - ANSI X3.159-1989
Many C89 and newer compilers have since then placed this array into the .text or .rodata segments where it may even be physically unwritable (ROM, read-only MMU pages, etc.), as discovered here. Compilers may also coalesce duplicate string constants into a single one to conserve space - and you wouldn't want to write into "one" of those either!
The fact that these semantically unwritable strings were still left as type char *, and that they could be assigned to and passed as such was known to be a compromise, even as the C89 standard was being drafted. That they did not use the then-brand-new type qualifier const was described as a "not-completely-happy result". See Ritchie's (DMR's) explanation.
And apparently that result still boomerangs around and whacks people upside the head nearly 30 years later.
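In new code you can add the missing const yourself, so the compiler reports the mistake instead of the MMU. GCC also has the -Wwrite-strings option, which gives string literals a const-qualified type for exactly this purpose. A small sketch, where text and count_blanks are names made up for illustration:

```c
#include <stddef.h>

/* Const-qualified pointer: the compiler now rejects `text[0] = 'i';`
 * and diagnoses passing `text` to a function that takes plain char *. */
const char *text = "Irish People Try American Food";

/* Hypothetical helper: counts spaces and tabs without modifying the
 * input, so it can safely accept a string literal. */
size_t count_blanks(const char *line)
{
    size_t n = 0;

    for (; *line != '\0'; line++)
        if (*line == ' ' || *line == '\t')
            n++;
    return n;
}
```

Functions that only read a string can take const char *, while functions that modify it (like trim()) keep char * and therefore cannot be handed text without a diagnostic.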
Your new_line string is one char too small - it does not have room for the final '\0' terminator - change:
char new_line[strlen(line)];
to:
char new_line[strlen(line) + 1];
You should also be aware that string literals can not be modified, so if you try to call your function like this:
trim("Hello world!");
then this will result in undefined behaviour. (You should also get a compiler warning if you try to do this.)
As @PaulR stated, your new line's buffer is too small. But instead of using another buffer that takes up more space, you could use a single-character approach, like this:
void trim(char *s)
{
char *src = s, *dest = s;
while (*src)
{
if ((*src != ' ') && (*src != '\t'))
*dest++ = *src;
++src;
}
*dest = '\0';
}

Segmentation Fault in Simple Offset Encryption

Alright guys, this is my first post here. The most recent assignment in my compsci class has us coding a couple of functions to encode and decode strings based on a simple offset. So far in my encryption function I am trying to convert uppercase alphas in a string to their ASCII equivalent(an int), add the offset(and adjust if the ASCII value goes past 'Z'), cast that int back to a char(the new encrypted char) and put it into a new string. What I have here compiles fine, but it gives a Segmentation Fault (core dumped) error when I run it and input simple uppercase strings. Where am I going wrong here? (NOTE: there are some commented out bits from an attempt at solving the situation that created some odd errors in main)
#include <stdio.h>
#include <string.h>
#include <ctype.h>
//#include <stdlib.h>
char *encrypt(char *str, int offset){
int counter;
char medianstr[strlen(str)];
char *returnstr;// = malloc(sizeof(char) * strlen(str));
for(counter = 0; counter < strlen(str); counter++){
if(isalpha(str[counter]) && isupper(str[counter])){//If the character at current index is an alpha and uppercase
int charASCII = (int)str[counter];//Get ASCII value of character
int newASCII;
if(charASCII+offset <= 90 ){//If the offset won't put it outside of the uppercase range
newASCII = charASCII + offset;//Just add the offset for the new value
medianstr[counter] = (char)newASCII;
}else{
newASCII = 64 + ((charASCII + offset) - 90);//If the offset will put it outside the uppercase range, add the remaining starting at 64(right before A)
medianstr[counter] = (char)newASCII;
}
}
}
strcpy(returnstr, medianstr);
return returnstr;
}
/*
char *decrypt(char *str, int offset){
}
*/
int main(){
char *inputstr;
printf("Please enter the string to be encrypted:");
scanf("%s", inputstr);
char *encryptedstr;
encryptedstr = encrypt(inputstr, 5);
printf("%s", encryptedstr);
//free(encryptedstr);
return 0;
}
You use a bunch of pointers, but never allocate any memory to them. That will lead to segmentation faults.
Actually the strange thing is it seems you know you need to do this as you have the code in place, but you commented it out:
char *returnstr;// = malloc(sizeof(char) * strlen(str));
When you use a pointer you need to "point" it to allocated memory, it can either point to dynamic memory that you request via malloc() or static memory (such as an array that you declared); when you're done with dynamic memory you need to free() it, but again you seem to know this as you commented out a call to free.
Just a malloc() to inputstr and one for returnstr will be enough to get this working.
Without going any further the segmentation fault comes from your use of scanf().
The segmentation fault occurs at scanf() because it tries to write to the memory that inputstr points to, and inputstr has not been pointed at any allocated memory at that point.
To call scanf() with %s you need to pass a pointer to memory that has already been allocated.
Naturally, to fix the segmentation fault you need to allocate memory for your char *inputstr.
To dynamically allocate memory of 128 bytes(i.e., the pointer will point to heap):
char *inputstr = (char *) malloc(128);
Or to declare an array of 128 bytes with automatic storage (i.e., the buffer will typically live on the stack):
char inputstr[128];
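Whichever buffer you choose, the read itself should be bounded as well, since scanf("%s", ...) with no field width will write past the end of any buffer if the input is long enough. A minimal sketch using fgets(), where read_line is a hypothetical helper name; unlike %s it reads a whole line, including spaces:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from `in` into `buf` (capacity
 * `size`), strips the trailing newline if one was read, and returns
 * buf, or NULL on end-of-file or error. fgets() stores at most
 * size - 1 characters plus the terminating '\0'. */
char *read_line(FILE *in, char *buf, int size)
{
    if (fgets(buf, size, in) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';    /* drop the newline, if present */
    return buf;
}
```

With char inputstr[128]; the call would be read_line(stdin, inputstr, sizeof inputstr). If you prefer to stay with scanf(), a field width such as scanf("%127s", inputstr) gives the same protection for a 128-byte buffer.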
There is a lot of complexity in the encrypt() function that isn't really necessary. Note that computing the length of the string on each iteration of the loop is a costly process in general. I noted in a comment:
What's with the 90 and 64? Why not use 'A' and 'Z'? And you've commented out the memory allocation for returnstr, so you're copying via an uninitialized pointer and then returning that? Not a recipe for happiness!
The other answers have also pointed out (accurately) that you've not initialized your pointer in main(), so you don't get a chance to dump core in encrypt() because you've already dumped core in main().
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
char *encrypt(char *str, int offset)
{
int len = strlen(str) + 1;
char *returnstr = malloc(len);
if (returnstr == 0)
return 0;
for (int i = 0; i < len; i++)
{
char c = str[i];
if (isupper((unsigned char)c))
{
c += offset;
if (c > 'Z')
c = 'A' + (c - 'Z') - 1;
}
returnstr[i] = c;
}
return returnstr;
}
Long variable names are not always helpful; they make the code harder to read. Note that any character for which isupper() is true also satisfies isalpha(). The cast on the argument to isupper() prevents problems when the char type is signed and you have data where the unsigned char value is in the range 0x80..0xFF (the high bit is set). With the cast, the code will work correctly; without, you can get into trouble.
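The wrap-around can also be written with modular arithmetic, which handles offsets larger than 26 and negative offsets, so the same function can serve for decrypting. This is a sketch under the assumption of an ASCII-compatible character set where 'A'..'Z' are contiguous; shift_upper is a hypothetical name, not something from the question.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of str with every
 * 'A'..'Z' rotated by offset positions, wrapping around the alphabet.
 * Other characters are copied unchanged. The caller must free() the
 * result. Assumes 'A'..'Z' are contiguous (true for ASCII). */
char *shift_upper(const char *str, int offset)
{
    size_t len = strlen(str);
    char *out = malloc(len + 1);

    if (out == NULL)
        return NULL;

    int shift = ((offset % 26) + 26) % 26;   /* normalise to 0..25 */
    for (size_t i = 0; i < len; i++) {
        char c = str[i];
        if (c >= 'A' && c <= 'Z')
            c = (char)('A' + (c - 'A' + shift) % 26);
        out[i] = c;
    }
    out[len] = '\0';
    return out;
}
```

Encrypting with shift_upper(s, 5) and then calling shift_upper on the result with -5 gives back the original text.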

pointers on string in C

I have this program in C
#include<stdio.h>
int main()
{
printf("Hello New!\n");
char c = 'd';
char* s = "hello world";
char **t = &s;
*t[0] = c;
return 0;
}
The program compiles but doesn't run.
I have this output :
Hello New!
Bus error
I don't understand why
String constants are stored in readonly memory and you cannot modify them.
If you must, then use:
#include<stdio.h>
int main()
{
printf("Hello New!\n");
char c = 'd';
char s[] = "hello world";
char *p = s;
char **t = &p;
*t[0] = c;
return 0;
}
This allocates a local variable (not a constant) that is conveniently initialized and may be modified to your heart's content.
You may not modify the string that 's' points to, in any way. It is in a part of memory that you are not allowed to change.
String constants are unmodifiable, despite having the type char* rather than const char* for historical reasons. Try using the string constant to initialize an array, rather than a pointer:
#include <stdio.h>
int
main(void)
{
char s[] = "hello new!";
puts(s);
s[0] = 'c';
puts(s);
return 0;
}
A bus error usually means that you're accessing a pointer with an invalid value - e.g. an address that is out of the address space.
I would guess that in this case, it is because you are trying to write to memory that is read-only. The string "hello world" is in a memory segment that you are not allowed to write to. Depending on the operating system, these memory segments are either protected or you can write arbitrary garbage to it. Seems like yours doesn't allow it. As you can see in the other answers, you can work around this by copying/initializing the string constant into an array.

Programs executes correctly and then segfaults

I'm trying to learn C programming and spent some time practicing with pointers this morning, by writing a little function to replace the lowercase characters in a string to their uppercase counterparts. This is what I got:
#include <stdio.h>
#include <string.h>
char *to_upper(char *src);
int main(void) {
char *a = "hello world";
printf("String at %p is \"%s\"\n", a, a);
printf("Uppercase becomes \"%s\"\n", to_upper(a));
printf("Uppercase becomes \"%s\"\n", to_upper(a));
return 0;
}
char *to_upper(char *src) {
char *dest;
int i;
for (i=0;i<strlen(src);i++) {
if ( 71 < *(src + i) && 123 > *(src + i)){
*(dest+i) = *(src + i) ^ 32;
} else {
*(dest+i) = *(src + i);
}
}
return dest;
}
This runs fine and prints exactly what it should (including the repetition of the "HELLO WORLD" line), but afterwards ends in a Segmentation fault. What I can't understand is that the function is clearly compiling, executing and returning successfully, and the flow in main continues. So is the Segmentation fault happening at return 0?
dest is uninitialised in your to_upper() function. So, you're overwriting some random part of memory when you do that, and evidently that causes your program to crash as you try to return from main().
If you want to modify the value in place, initialise dest:
char *dest = src;
If you want to make a copy of the value, try:
char *dest = strdup(src);
If you do this, you will need to make sure somebody calls free() on the pointer returned by to_upper() (unless you don't care about memory leaks).
Like everyone else has pointed out, the problem is that dest hasn't been initialized and is pointing to a random location that contains something important. You have several choices of how to deal with this:
Allocate the dest buffer dynamically and return that pointer value, which the caller is responsible for freeing;
Assign dest to point to src and modify the value in place (in which case you'll have to change the declaration of a in main() from char *a = "hello world"; to char a[] = "hello world";, otherwise you'll be trying to modify the contents of a string literal, which is undefined);
Pass the destination buffer as a separate argument.
Option 1 -- allocate the target buffer dynamically:
char *to_upper(char *src)
{
char *dest = malloc(strlen(src) + 1);
...
}
Option 2 -- have dest point to src and modify the string in place:
int main(void)
{
char a[] = "hello world";
...
}
char *to_upper(char *src)
{
char *dest = src;
...
}
Option 3 -- have main() pass the target buffer as an argument:
int main(void)
{
char *a = "hello world";
char *b = malloc(strlen(a) + 1); // or char b[12];
...
printf("Uppercase becomes %s\n", to_upper(a,b));
...
free(b); // omit if b is statically allocated
return 0;
}
char *to_upper(char *src, char *dest)
{
...
return dest;
}
Of the three, I prefer the third option; you're not modifying the input (so it doesn't matter whether a is an array of char or a pointer to a string literal) and you're not splitting memory management responsibilities between functions (i.e., main() is solely responsible for allocating and freeing the destination buffer).
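For completeness, here is one way the body of the third option might be filled in. It is a sketch, not the only possible implementation; it uses toupper() from <ctype.h> and assumes the caller has made dest at least strlen(src) + 1 bytes long.

```c
#include <ctype.h>
#include <stddef.h>

/* Option 3 sketch: writes the uppercase form of src into dest, which
 * the caller must make at least strlen(src) + 1 bytes long.
 * Returns dest so the call can be used directly inside printf(). */
char *to_upper(const char *src, char *dest)
{
    size_t i;

    for (i = 0; src[i] != '\0'; i++)
        dest[i] = (char)toupper((unsigned char)src[i]);
    dest[i] = '\0';                    /* terminate the copy */
    return dest;
}
```

Since src is const char *, a may stay a pointer to a string literal; only b is ever written to.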
I realize you're trying to familiarize yourself with how pointers work and some other low-level details, but bear in mind that a[i] is easier to read and follow than *(a+i). Also, there are number of functions in the standard library such as islower() and toupper() that don't rely on specific encodings (such as ASCII):
#include <ctype.h>
...
if (islower(src[i]))
dest[i] = toupper(src[i]);
As others have said, your problem is not allocating enough space for dest. There is another, more subtle problem with your code.
To convert to uppercase, you are testing a given char to see if it lies between 71 and 123, and if it does, you xor the value with 32. This assumes ASCII encoding of characters. ASCII is the most widely used encoding, but it is not the only one.
It is better to write code that works for every type of encoding. If we were sure that 'a', 'b', ..., 'z', and 'A', 'B', ..., 'Z', are contiguous, then we could calculate the offset from the lowercase letters to the uppercase ones and use that to change case:
/* WARNING: WRONG CODE */
if (c >= 'a' && c <= 'z') c = c + 'A' - 'a';
But unfortunately, there is no such guarantee given by the C standard. In fact, EBCDIC is a counterexample: its letters are not contiguous.
So, to convert to uppercase, you can either do it the easy way:
#include <ctype.h>
int d = toupper(c);
or, roll your own:
/* Untested, modifies it in-place */
char *to_upper(char *src)
{
static const char *lower = "abcdefghijklmnopqrstuvwxyz";
static const char *upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
size_t i;
size_t m = strlen(src);
for (i=0; i < m; ++i) {
/* look up src[i] among the lowercase letters */
const char *tmp = strchr(lower, src[i]);
if (tmp != NULL) {
/* same index in upper gives the uppercase counterpart */
src[i] = upper[tmp-lower];
}
}
return src;
}
The advantage of toupper() is that it checks the current locale to convert characters to upper case. This may convert æ to Æ, for example, which is usually the correct thing to do. Note: I use only English and Hindi characters myself, so I could be wrong about my particular example!
As noted by others, your problem is that char *dest is uninitialized. You can modify src's memory in place, as Greg Hewgill suggests, or you can use malloc to reserve some:
char *dest = (char *)malloc(strlen(src) + 1);
Note that the use of strdup suggested by Greg performs this call to malloc under the covers. The '+ 1' is to reserve space for the null terminator, '\0', which you should also be copying from src to dest. (Your current example only goes up to strlen, which does not include the null terminator.) Can I suggest that you add a line like this after your loop?
*(dest + i) = 0;
This will correctly terminate the string. Note that this only applies if you choose to go the malloc route. Modifying the memory in place or using strdup will take care of this problem for you. I'm just pointing it out because you mentioned you were trying to learn.
Hope this helps.

Resources