I have a character array that contains a phone number of the form: "(xxx)xxx-xxxx xxxx" and need to convert it to something of the form: "xxx-xxx-xxxx" where I would just truncate the extension. My initial pass at the function looks like this:
static void formatPhoneNum( char *phoneNum ) {
unsigned int i;
int numNumbers = 0;
/* Change the closing parenthesis to a dash and truncate at 12 chars. */
for ( i = 0; i < strlen( phoneNum ); i++ ) {
if ( phoneNum[i] == ')' ) {
phoneNum[i] = '-';
}
else if ( i == 13 ) {
phoneNum[i] = '\0';
break;
}
else if ( isdigit( phoneNum[i] ) ) {
numNumbers++;
}
}
/* If the phone number is empty or not a full phone number,
* i.e. just parentheses and dashes, or not 10 numbers
* format it as an empty string. */
if ( numNumbers != 10 ) {
strcpy( phoneNum, "" );
}
else {
/* Remove the first parenthesis. */
strcpy( phoneNum, phoneNum + 1 );
}
}
It feels kinda hokey the way I'm removing the leading paren, but I can't just increment the pointer in the function as the calling version's pointer won't get updated. I also feel like I could be "more clever" in general all throughout the function.
Any ideas/pointers?
Since you stated that your input is guaranteed to be in the proper format, how about the following:
static void formatPhoneNum( char *phoneNum )
{
memmove(phoneNum, phoneNum + 1, 12);
phoneNum[3] = '-';
phoneNum[12] = 0;
}
memmove() is guaranteed to work with overlapping buffers.
As Pavel said, you can't strcpy a string onto itself. I'm declaring a new variable for clarity, although my approach doesn't use strcpy - with care, you could re-use the original variable. Anyway, if your input is always of the form (xxx) xxx-xxxx xxxx, and your output is always going to be xxx-xxx-xxxx why not just do:
char newPhone[14];
newPhone[0] = phoneNum[1];
newPhone[1] = phoneNum[2];
newPhone[2] = phoneNum[3];
newPhone[3] = '-';
newPhone[4] = phoneNum[6];
newPhone[5] = phoneNum[7];
newPhone[6] = phoneNum[8];
newPhone[7] = '-';
newPhone[8] = phoneNum[10];
newPhone[9] = phoneNum[11];
newPhone[10] = phoneNum[12];
newPhone[11] = phoneNum[13];
newPhone[12] = '\0';
Brute force? Sure, but - if your inputs and outputs are always going to be as you state - it should run efficiently.
Well I guess I'm just too slow. Nothing clever about this over memmove(), but it shows how you can have a loop and still take all those comparisons out of the inside:
char *formatPhoneNum(char *buffer) {
int index = 0;
for( index = 0; index < 12; ++index ) {
buffer[index] = buffer[index + 1];
}
buffer[3] = '-';
buffer[12] = '\0';
return buffer;
}
You may find it helpful to return the start of the string you modify instead of void, so you can chain calls more easily. E.g.,
printf("%s\n", formatPhoneNum(buffer));
For starters, this is wrong:
strcpy( phoneNum, phoneNum + 1 );
because ISO C standard says regarding strcpy:
If copying takes place between objects that overlap, the behavior is undefined.
"objects" here being source and destination char arrays. MSDN concurs with this, by the way, so this won't work properly on at least one popular real-world implementation.
It seems that a simpler approach would be to have the function return a new "proper" value of the pointer (into the same buffer), so it can adjust it by 1 to trim the '('.
Your validation, which simply counts digits, permits formatting such as "1-234567890" or "1234567890-" or even "12345foobar4567890" - this may or may not be a problem, depending on requirements.
When possible (and not performance degrading) I prefer to pass data to functions as consts. In your case I see no reason not to do it, so I'd declare your function as
static void formatPhoneNum(char *dst, const char *src);
or even, returning the length of the new number:
static int formatPhoneNum(char *dst, const char *src);
Then, just copy digits from src to dst, inserting dashes at the right places. The caller is responsible for providing space in dst and checking the return value: if it is 12 (dashes included), all is OK; otherwise there was an error.
You can return some negative number to indicate possible errors. For example: -1 would indicate that src is not long enough; -2 would indicate a bad format for src, etc...
Do document all the return values!
Oh! And do not forget to NUL terminate dst!
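A minimal sketch of the dst/src interface described above, assuming the input starts with "(xxx)xxx-xxxx" and that anything after it (such as the extension) can be ignored. The error-code names PHONE_ERR_SHORT and PHONE_ERR_FORMAT and the pattern-string check are only illustrative choices, not something from the question:

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical error codes; pick whatever values suit your application. */
enum { PHONE_ERR_SHORT = -1, PHONE_ERR_FORMAT = -2 };

/*
 * Copies the ten digits of src, assumed to begin with "(xxx)xxx-xxxx",
 * into dst as "xxx-xxx-xxxx". dst must have room for at least 13 bytes.
 * Returns 12 on success, or a negative error code (dst is then "").
 */
int formatPhoneNum(char *dst, const char *src)
{
    /* 'd' marks a digit; every other character must match literally. */
    static const char pattern[] = "(ddd)ddd-dddd";
    size_t i, out = 0;

    dst[0] = '\0';
    if (strlen(src) < sizeof pattern - 1)
        return PHONE_ERR_SHORT;

    for (i = 0; i < sizeof pattern - 1; i++) {
        if (pattern[i] == 'd') {
            if (!isdigit((unsigned char)src[i])) {
                dst[0] = '\0';
                return PHONE_ERR_FORMAT;
            }
            dst[out++] = src[i];
        } else if (src[i] != pattern[i]) {
            dst[0] = '\0';
            return PHONE_ERR_FORMAT;
        } else if (pattern[i] != '(') {
            dst[out++] = '-';   /* ')' and '-' both become dashes */
        }
    }
    dst[out] = '\0';            /* NUL terminate dst */
    return (int)out;
}
```

Because the check is positional, inputs such as "1234567890-" are rejected rather than silently accepted, which plain digit counting would not do.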
If you are allowed to change the API, you could either accept a char** or return a char *, and improve the time complexity:
static void formatPhoneNum(char **phoneNum) {
(*phoneNum)[4] = '-';
(*phoneNum)[13] = '\0';
(*phoneNum)++;
}
Alternatively:
static char *formatPhoneNum(char *phoneNum) {
phoneNum[4] = '-';
phoneNum[13] = '\0';
return phoneNum + 1;
}
The advantage is that this will take constant time.
So I've got this here:
#include <stdio.h>
char halloString[] = "Ha::ll::o";
char perfumeString[] = "47::11";
char veryLongString[] = "47::11::GHesd::dghsr::bfdr:hfgd46dG";
char *extract (char *input) {somethinghappenshere}
where extract needs to get all characters after the last double ":" of given input:
"o" for halloString
"11" for perfumeString
"bfdr:hfgd46dG" for veryLongString
In short, my issue is finding the length of the string *input points to. As far as I understand it that won't be happening without making something really sketchy.
Am I correct in assuming the length cannot be acquired in a good way?
And if so would it be a horrible idea to do, for example:
char stringToProcessTemp1[50];
char stringToProcessTemp2[50];
char stringToProcess[50];
for (int i = 0; i < 50; i++) {
stringToProcessTemp1[i] = input + i;
}
for (int i = 0; i < 50; i++) {
stringToProcessTemp2[i] = input + i;
}
for (int i = 0; i < 50; i++) {
if (stringToProcessTemp1[i] == stringToProcessTemp2[i]) {
stringToProcessTemp[i] = stringToProcessTemp1[i];
}
}
Afterwards I would check where the first empty index is and keep everything before it as the string to use. In my very limited experience with C, reading past the end of an array tends to give different values every time, so I'd consider the chance that both temp strings match for an extra element directly after the end of the original string low enough.
It's honestly the only idea I've got right now.
Finding the length of a string is no problem. strlen will do that for you. However, you don't even need that.
You can use the strstr function to find a substring within a string, in this case "::". When you find one, keep looking right after the last one you found until you don't find it anymore, then the last one you found is the one you want. Then you want the substring that starts right after it.
char *extract(char *input)
{
char *last = NULL, *start = input, *curr;
while ((curr = strstr(start, "::")) != NULL) {
last = curr; // keep track of the last "::" found
start = last + 1; // move the starting string to right after the last "::"
// move up 1 instead of 2 in case of ":::"
}
if (last != NULL) {
last +=2; // We found one; move just past the "::"
}
return last;
}
C strings, which are really only an array of characters, are by definition terminated by '\0'. So, for a well formed C string you can always get the length of the string by using strlen().
If, however, your string is not null-terminated, there is no way to determine its length, and it is by definition no longer a C string, just an array of characters.
I'm looking for a way to check if a specific string exists in a large array of strings. The array is multi-dimensional: all_strings[strings][chars];. So essentially, this array is an array of character arrays. Each character array ends in '\0'
Given another array of characters, I need to check whether those characters are already in all_strings, similar to Python's in keyword.
I'm not really sure how to go about this at all, I know that strcmp might help but I'm not sure how I could implement it.
As lurker suggested, the naive method is to simply loop on the array of strings calling strcmp. His string_in function is unfortunately broken due to a misunderstanding regarding sizeof(string_list), and should probably look like this:
#include <string.h>
int string_in(char *needle, char **haystack, size_t haystack_size) {
for (size_t x = 0; x < haystack_size; x++) {
if (strcmp(needle, haystack[x]) == 0) {
return 1;
}
}
return 0;
}
This is fairly inefficient, however. It'll do if you're only going to use it once in a while, particularly on a small collection of strings, but if you're looking for an efficient way to perform the search again and again, changing the search query for each search, the two options I would consider are:
If all_strings is relatively static, you could sort your array like so: qsort(all_strings, strings, chars, strcmp);. Then when you want to determine whether a word is present, you can use bsearch to execute a binary search like so: char *result = bsearch(search_query, all_strings, strings, chars, strcmp);. Strictly speaking, both functions expect a comparator of type int (*)(const void *, const void *), so strcmp should be cast or wrapped in a small function with that signature. Note that when all_strings changes, you'll need to sort it again.
If all_strings changes too often, you'll probably benefit from using some other data structure such as a trie or a hash table.
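As a rough sketch of the sort-then-binary-search option, assuming all_strings really is a two-dimensional char array whose rows are NUL-terminated. CHARS, compare_rows, sort_strings and string_in_sorted are names made up for this example:

```c
#include <stdlib.h>
#include <string.h>

#define CHARS 16  /* hypothetical row width, standing in for "chars" */

/* qsort and bsearch pass the comparator pointers to whole rows; each
 * row is a NUL-terminated char array, so strcmp can compare them. */
static int compare_rows(const void *a, const void *b)
{
    return strcmp((const char *)a, (const char *)b);
}

/* Sort the table in place so that it can be binary-searched. */
void sort_strings(char all_strings[][CHARS], size_t strings)
{
    qsort(all_strings, strings, CHARS, compare_rows);
}

/* Returns 1 if query is present in the already sorted table, else 0. */
int string_in_sorted(const char *query, char all_strings[][CHARS],
                     size_t strings)
{
    return bsearch(query, all_strings, strings, CHARS, compare_rows) != NULL;
}
```

The wrapper exists only to give qsort and bsearch a comparator of the type they are declared to take; the comparison itself is still plain strcmp.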
Use a for loop. C doesn't have a built-in like Python's in:
int i;
for ( i = 0; i < NUM_STRINGS; i++ )
if ( strcmp(all_strings[i], my_other_string) == 0 )
break;
// Here, i is the index of the matched string in all_strings.
// If i == NUM_STRINGS, then the string wasn't found
If you want it to act like Python's in, you could make it a function:
// Assumes C99
#include <string.h>
#include <stdbool.h>
bool string_in(char *my_str, char *string_list[], size_t num_strings)
{
for ( size_t i = 0; i < num_strings; i++ )
if (strcmp(my_str, string_list[i]) == 0 )
return true;
return false;
}
You could simply check if a string exists in an array of strings. A better solution might be to actually return the string:
/*
* haystack: The array of strings to search.
* needle: The string to find.
* max: The number of strings to search in "haystack".
*/
char *
string_find(char **haystack, const char *needle, size_t max)
{
char **end = haystack + max;
for (; haystack != end; ++haystack)
if (strcmp(*haystack, needle) == 0)
return *haystack;
return NULL;
}
If you're wanting the behavior of a set, where all strings in the array are unique, then you can use it that way:
typedef struct set_strings {
char **s_arr;
size_t count;
size_t max;
} StringSet;
.
.
.
int
StringSet_add(StringSet *set, const char *str)
{
// If string exists already, the add operation is "successful".
if (string_find(set->s_arr, str, set->count))
return 1;
// Add string to set and return success if possible.
/*
* Code to add string to StringSet would go here.
*/
return 1;
}
If you want to actually do something with the string, you can use it that way too:
/*
* Reverse the characters of a string.
*
* str: The string to reverse.
* n: The number of characters to reverse.
*/
void
reverse_str(char *str, size_t n)
{
char c;
char *end;
for (end = str + n; str < --end; ++str) {
c = *str;
*str = *end;
*end = c;
}
}
.
.
.
char *found = string_find(words, word, word_count);
if (found)
reverse_str(found, strlen(found));
As a general-purpose algorithm, this is reasonably useful and even can be applied to other data types as necessary (some re-working would be required of course). As pointed out by undefined behaviour's answer, it won't be fast on large amounts of strings, but it is good enough for something simple and small.
If you need something faster, the recommendations given in that answer are good. If you're coding something yourself, and you're able to keep things sorted, it's a great idea to do that. This allows you to use a much better search algorithm than a linear search. The standard bsearch is great, but if you want something suitable for fast insertion, you'd probably want a search routine that would provide you with the position to insert a new item to avoid searching for the position after bsearch returns NULL. In other words, why search twice when you can search once and accomplish the same thing?
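A search routine of the kind described, one that reports the insertion position in the same pass, could look roughly like this. It assumes the array of string pointers is already sorted with strcmp ordering; string_lower_bound and its found out-parameter are names invented for this sketch:

```c
#include <stddef.h>
#include <string.h>

/*
 * Binary search over a sorted array of string pointers.
 * Returns the index of the first element that is not less than needle.
 * That is where a match would sit, and also where needle should be
 * inserted to keep the array sorted. *found is set to 1 on an exact
 * match and to 0 otherwise.
 */
size_t string_lower_bound(char *const *sorted, size_t count,
                          const char *needle, int *found)
{
    size_t lo = 0, hi = count;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(sorted[mid], needle) < 0)
            lo = mid + 1;   /* needle belongs to the right of mid */
        else
            hi = mid;       /* needle belongs at mid or to its left */
    }
    *found = (lo < count && strcmp(sorted[lo], needle) == 0);
    return lo;
}
```

If found comes back 0, the returned index is where the new string goes, so a set-style add only needs to shift the tail of the array and store the pointer there.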
So I have a string which will have the following rough format: string # more than one string here.
What I want to do is strip everything after the #. How would I do that? Furthermore, what if I just want to keep the initial string, and take everything after the space and the #?
I have the following code, but obviously appending something to a NULL string does not work as expected:
char *inst_ptr holds the whole string.
char *lbl = NULL;
int len = 0;
size_t inst_len = strlen(inst_ptr);
for (int t = 0; t < inst_len; t++) {
if (inst_ptr[t] == '#')
break;
else {
printf("len %d\n", len);
lbl[len] = inst_ptr[t];
lbl[len+1] = '\0';
len = strlen(lbl);
}
}
EDIT:
Basically, assume I have the following string:
loop # hello world!
I just want to extract loop into another string. What I am doing above is starting with lbl as a NULL string and running a loop across the original string. As long as the character is not #, I just "append" the character with a null terminator as shown above.
Perhaps you should check section 7.21 in the standard for some inspiration.
The best approach depends on the rest of your application.
If you don't care about the original data, you could just use
strtok(inst_ptr, "#");
Or it might be better to allocate-and-copy just the data you need:
char * temp = strchr(inst_ptr, '#');
char * lbl;
if (temp)
lbl = strndup(inst_ptr, temp - inst_ptr);
else
lbl = strdup(inst_ptr);
Please note that the above are the minimal implementations I could think of off the top of my head, not necessarily the best ones.
First, you are doing
char *lbl = NULL;
Then you are doing
lbl[len]
You're trying to dereference a NULL pointer and causing undefined behaviour.
First, you need to allocate memory for that string (calloc allocates the memory and sets it all to 0, which will obviate the need to manually add the null terminator):
char *lbl = calloc(inst_len + 1, 1);
Then you need to fix your loop, some of the things are in the wrong place. It should be
for (int t = 0; t < inst_len; t++) {
if (inst_ptr[t] == '#') {
len = t; // t characters were copied, so strlen(lbl) is redundant
break;
} else
lbl[t] = inst_ptr[t];
}
Then when you are done, free the memory you allocated so as not to cause a memory leak:
free(lbl);
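Putting the allocate-and-copy steps together, a small helper might look like the following. This is only a sketch: copy_label is a made-up name, and trimming trailing spaces and tabs is an assumption based on the "loop # hello world!" example, where the wanted result is loop without the space:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Returns a newly allocated copy of the text before the first '#',
 * with trailing spaces and tabs removed, or NULL if allocation fails.
 * The caller owns the result and must free() it.
 */
char *copy_label(const char *inst_ptr)
{
    /* strcspn gives the length of the prefix containing no '#';
     * if there is no '#' at all, that is the whole string. */
    size_t len = strcspn(inst_ptr, "#");
    char *lbl;

    while (len > 0 &&
           (inst_ptr[len - 1] == ' ' || inst_ptr[len - 1] == '\t'))
        len--;

    lbl = malloc(len + 1);      /* +1 for the terminating '\0' */
    if (lbl == NULL)
        return NULL;
    memcpy(lbl, inst_ptr, len);
    lbl[len] = '\0';
    return lbl;
}
```

The original string is left untouched, which matters if inst_ptr points at data you still need afterwards.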
I have a function that is supposed to read a string (containing only digits) and return the longest sequence that doesn't contain repeated digits.
For example:
12345267890
It should return: 345267890
I've stepped through the code by hand and I believe it should work. But when I run it and it reaches the line i=(strchr(v+i, *(v+j)))-v;, instead of getting the distance between the pointers I get something like -1046583. Can I do this?
char* bigSeq(char *v){
int i, j;
char *aux, *bgst;
aux=(char*) malloc(10*sizeof(char));
bgst=(char*) malloc(10*sizeof(char));
for(i=0;i<strlen(v);i++){
for(j=0;j<strlen(v+i);j++){
if(strchr(v+i, *(v+j)) != (v+j)){
if(strlen(strncpy(aux, (v+i),j)) > strlen(bgst))
strncpy(bgst, (v+i),j);
i=(strchr(v+i, *(v+j)))-v;
break;
}
}
}
return bgst;
}
I think your trouble is with strchr() and what it returns when it doesn't find the character you are searching for. In that case, it returns a null pointer, not the start or the end of the string that is passed to it.
You should also review making the input a 'const char *', and look at why you need to call strlen() on every iteration of the two loops. You can probably do better than that.
Note that if the longest substring of non-repeating digits is all 10 digits, your allocated space is too small - you need to allocate a byte for the NUL '\0' at the end of the string too. Also, strncpy() does not guarantee to null-terminate your string; I'm not sure whether that is part of your problem here.
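To illustrate the two points about buffer size and termination, here is one way to copy a prefix that always leaves a terminated string. copy_prefix is a hypothetical helper, not a standard function, and it uses memcpy rather than strncpy:

```c
#include <stddef.h>
#include <string.h>

/*
 * Copies at most n characters of src into dst and always terminates.
 * dst_size is the full capacity of dst, including room for '\0'.
 * Returns the number of characters copied (excluding the '\0').
 */
size_t copy_prefix(char *dst, size_t dst_size, const char *src, size_t n)
{
    size_t len = strlen(src);

    if (dst_size == 0)
        return 0;               /* nowhere to put even the terminator */
    if (len > n)
        len = n;                /* honour the requested prefix length */
    if (len > dst_size - 1)
        len = dst_size - 1;     /* never write past the buffer */
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}
```

With a destination of 11 bytes, a full run of ten distinct digits plus the terminator fits.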
if(strlen(strncpy(aux, (v+i),j))
and
strncpy(bgst, (v+i),j);
If j is > 10 you're overwriting memory you don't own - which can lead to all sorts of funny issues.
I'm going to suggest an O(N) algorithm which is also simpler, easier to read, and requires no extra memory other than a few pointers and integers on the stack. These features help remove possibilities for error.
void longest_non_repeating( const char** output_begin, const char** output_end, const char* input )
{
const char* last_occurrence[10] = { input-1 };
const char* candidate_begin = input;
while( *input )
{
if( last_occurrence[*input-'0'] < candidate_begin )
{
const char* candidate_end = input+1;
if( candidate_end - candidate_begin > *output_end - *output_begin )
{
*output_begin = candidate_begin;
*output_end = candidate_end;
if( ( candidate_end - candidate_begin ) == 10 )
return;
}
}
else
{
input = candidate_begin = last_occurrence[*input-'0'] + 1;
for( int d = 0; d < 10; ++d ) last_occurrence[d] = input-1; /* forget digits seen before the new window */
}
last_occurrence[*input-'0'] = input;
++input;
}
}
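For comparison, the same sliding-window idea can be written in plain C with indices instead of pointers. This is a sketch under the assumption that the input contains only the characters '0' to '9'; longest_unique_run and next_allowed are names chosen for this example:

```c
#include <stddef.h>

/*
 * Finds the longest run of digits in input with no digit repeated.
 * Stores the start index of that run in *start and returns its length.
 * Assumes input contains only the characters '0'..'9'.
 */
size_t longest_unique_run(const char *input, size_t *start)
{
    /* next_allowed[d] is one past the index where digit d was last
     * seen, i.e. the earliest position where a window containing a
     * new occurrence of d may begin. */
    size_t next_allowed[10] = { 0 };
    size_t begin = 0, best_len = 0, i;

    *start = 0;
    for (i = 0; input[i] != '\0'; i++) {
        int d = input[i] - '0';

        /* A repeat inside the window pushes the window start past it. */
        if (next_allowed[d] > begin)
            begin = next_allowed[d];
        next_allowed[d] = i + 1;

        if (i + 1 - begin > best_len) {
            best_len = i + 1 - begin;
            *start = begin;
        }
    }
    return best_len;
}
```

For the question's input 12345267890 this reports a run of length 9 starting at index 2, i.e. 345267890. Each character is visited once and nothing is allocated.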
Whilst reading through K&R, I came across the integer to string function. I gave it a quick read, and decided to implement it myself, but instead of printing, it updates a character array.
Here is what I have
void inttostr(int number, char str[]) {
static int i;
if (number / 10) {
inttostr(number / 10, str);
}
str[i++] = number % 10 + '0';
}
It seemed to work for the few integers I gave it, but I have some questions.
I haven't explicitly included the nul byte \0 at the end, so why does the string work fine when printed with printf("%s\n", str);?
I don't think I'm very good at thinking recursively. When I try and step through the program in my mind, I lose track of what is still awaiting execution. Is there a better way of seeing what is happening internally, to help me learn?
Any other suggestions on the code?
I'm using Xcode.
This is not homework. I'm just learning.
Thanks!
You're correct that you're never writing NUL, which is a bug.
In general, you don't have to think through the entire solution. You just have to make sure every step is correct. So in this case, you say:
1. inttostr(number / 10, str); will take care of all but the last digit.
2. Then I will take care of the last one.
You can trace what's happening, though. For e.g. 54321 it looks like:
inttostr(54321, str); // str = ...;
inttostr(5432, str); // str = ...;
inttostr(543, str); // str = ...;
inttostr(54, str); // str = ...;
inttostr(5, str); // str = ...;
str[0] = '5'; // str = "5...";
str[1] = '4'; // str = "54...";
str[2] = '3'; // str = "543...";
str[3] = '2'; // str = "5432...";
str[4] = '1'; // str = "54321...";
Note that none of the calls returns until the first character has been written; the calls then return in the opposite order from the one in which they were made, each writing its own digit on the way back.
The ... signifies that you haven't NUL-terminated. Another issue is that you're using a static variable, so your code isn't reentrant; this means it breaks in certain scenarios, including multi-threading.
To address the reentrancy and NUL issue, you can do something like the code below. This creates a helper function, and passes the current index to write.
void inttostr_helper(int number, char str[], int *i)
{
if (number / 10) {
inttostr_helper(number / 10, str, i);
}
str[(*i)++] = number % 10 + '0';
str[*i] = '\0';
}
void inttostr(int number, char str[])
{
int i = 0;
inttostr_helper(number, str, &i);
}
EDIT: Fixed non-static solution.
I am impressed by the creativity of using recursion, even though it is not necessary. I think the code should drop the static variable i, because it persists across calls: the second time you call this function from your code, e.g. from main(), it will not be reset and will still hold the value left by the previous call. I would suggest using the return value as follows:
int inttostr(int number, char *str) {
int idx = 0;
if (number / 10) {
idx = inttostr(number / 10, str);
}
str[idx++] = number % 10 + '0';
return idx;
}
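Note that this version still leaves the terminator to the caller. A thin wrapper can take care of it; in this sketch write_digits is the function above under a different name, inttostr_terminated is a hypothetical wrapper, and number is assumed to be non-negative:

```c
/* Writes the decimal digits of a non-negative number into str and
 * returns how many were written. Does not NUL-terminate. */
int write_digits(int number, char *str)
{
    int idx = 0;
    if (number / 10) {
        idx = write_digits(number / 10, str);
    }
    str[idx++] = number % 10 + '0';
    return idx;
}

/* str must have room for all the digits plus the '\0'.
 * Returns the length of the resulting string. */
int inttostr_terminated(int number, char *str)
{
    int len = write_digits(number, str);
    str[len] = '\0';
    return len;
}
```

The returned index is exactly the position where the terminator belongs, so no static counter is needed.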
1, Your compiler (especially in debug mode) may have filled str with zeros, and on Unix, memory that comes fresh from the operating system is typically zero-filled as well. Don't rely on this: either set the first byte to \0 or memset everything to 0.
2, paper + pencil. Draw a table with each variable across the top and time down the side.
3, Write the simplest, longest version first, then get clever.
I haven't explicitly included the nul byte \0 at the end, so why does the string work fine when printed with printf("%s\n", str);?
How is the original char array declared when you call your function the first time? If it is a static char array, it will be filled with zeros, which would explain why it works; otherwise it's just "luck".
I don't think I'm very good at thinking recursively. When I try and step through the program in my mind, I lose track of what is still awaiting execution. Is there a better way of seeing what is happening internally, to help me learn?
Honestly, I am not sure anybody is good at thinking recursively :-)
Did you try drawing the execution step by step?