Check if strings contain the same number of the same characters - c

Here is the problem: we have to check whether two strings contain the same characters, regardless of order. For example, s1=akash and s2=ashka match.
My program prints NO for every pair of input strings.
s1 and s2 are the two input strings.
t is the number of test cases.
It would be really helpful if you could tell me where the error is. I am a beginner.
#include<stdio.h>
#include<string.h>
int main(){
int t,i,j;
scanf("%d",&t);
while(t>0){
char s1[100],s2[100];
scanf("%s ",s1);
scanf("%s",s2);
int count=0;
int found[100];
for(i=0;i<strlen(s1)-1;i++){
for(j=0;j<strlen(s1);j++){
if(s1[i]==s2[j]){
found[i]=1;
break;
}
}
}
for(i=0;i<strlen(s1);i++){
if(found[i]!=1){
count=1;
break;
}
}
if(count==1)
printf("NO");
else
printf("YES");
t--;
}
}

Some good answers above suggest sorting the strings first.
If you want to modify your program above to do this job, then you need to change it, as you realised. Below I give a suggestion (in words) for how to do this; after that there is modified code that works, and finally a couple of extra points.
I guess that the two strings aa and a would not be equal according to your definition, but your program would say that they are, because once you find a character you do not have any way of recording that it has been 'used up'.
I would suggest that you change your found[] array so that it records when a character in the second string is matched.
I suggest logic as follows.
Loop through all S1 characters
| Loop through S2 characters
| - if you get a match mark the S2 character as found
| - if you don't get a match by the end of the S2 loop then you are done - they are not equal
At the end of the S1 loop, if you have not finished early, then every character of S1 is matched, but you still need to go through the found[] array to check that every character in S2 was found.
Working code is below.
Notes:
you did not initialize found; it is initialized in the code below
the first loop needs to have < strlen(s1), not < strlen(s1)-1
the second loop should have been going to strlen(s2)
the logic is changed as described above so that found records characters found in s2, not s1
the logic is also changed so that if a character in s1 is not found the loop breaks early. After each loop, the values of i and j are compared with the string lengths to detect whether the loop ran to completion or broke early.
Edited code below (after the code there are some extra comments).
#include<stdio.h>
#include<string.h>
int main(){
int t,i,j;
scanf("%d",&t);
while(t>0){
char s1[100],s2[100];
scanf("%s ",s1);
scanf("%s",s2);
int count=0;
int found[100]={ 0 };
for(i=0;i<strlen(s1);i++){
for(j=0;j<strlen(s2);j++){
if(found[j]==1) continue; // character S2[j] already found
if(s1[i]==s2[j]){
found[j]=1;
break;
}
}
if (j==strlen(s2)) {
break; // we get here if we did not find a match for S1[i]
}
}
if (i!=strlen(s1)) {
printf("NO"); // we get here if we did not find a match for S1[i]
}
else {
// matched all of S1 now check S2 all matched
for(i=0;i<strlen(s2);i++){
if(found[i]!=1){
count=1;
break;
}
}
if(count==1) {
printf("NO");
}
else {
printf("YES");
}
}
t--;
}
return 0;
}
Two extra points to make your code more efficient.
First, as suggested by @chux, it will probably be faster not to have strlen(s2) in the loop condition. You could instead write for (j=0;s2[j];j++). This works because the final character of a string has the value 0, and in C a value of 0 means false; the for loop runs while the condition is true and stops when it is false. The speed-up comes from the fact that the compiler might evaluate strlen(s2) on every pass through the loop, and each evaluation counts l2 characters, where l2 is the length of s2. Since the two nested loops already run up to l1*l2 times, with strlen in the condition you potentially have l1*l2*l2 steps.
Second, you could speed up many tests by checking whether the lengths of the two strings differ before checking whether they contain the same number of each character.
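Both points can be sketched in a few lines. The helper names count_char and lengths_match are made up for this illustration and do not appear in the program above; this is only a minimal sketch of the two ideas, not a replacement for the edited code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts how often c occurs in s.
 * The loop condition s[j] is false at the terminating '\0',
 * so strlen() is never called inside the loop. */
size_t count_char(const char *s, char c)
{
    size_t n = 0;
    assert(s != NULL);
    for (size_t j = 0; s[j]; j++) {
        if (s[j] == c) {
            n++;
        }
    }
    return n;
}

/* Hypothetical helper: cheap early test. Strings of different
 * lengths can never contain the same number of the same characters. */
bool lengths_match(const char *s1, const char *s2)
{
    assert(s1 != NULL && s2 != NULL);
    return strlen(s1) == strlen(s2);
}
```

Calling lengths_match(s1, s2) once per test case, before the nested loops, lets the program print NO immediately for inputs such as aa and a.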

As suggested in my comment, and since it's now a bit more clear, an easy way to compare two multisets represented as strings is to:
Sort the two strings (easy using the qsort() standard function)
Compare the result (using the strcmp() standard function)
This will work since it will map both "akash" and "ashka" to "aahks", before comparing.
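A minimal sketch of these two steps, assuming both strings live in writable buffers (the sort happens in place). The names cmp_char and same_chars_sorted are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Comparison callback for qsort(): orders bytes by unsigned char value. */
static int cmp_char(const void *a, const void *b)
{
    const unsigned char x = *(const unsigned char *)a;
    const unsigned char y = *(const unsigned char *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: true if s1 and s2 contain the same characters
 * the same number of times. Both buffers are sorted in place, so the
 * caller must not pass string literals. */
bool same_chars_sorted(char *s1, char *s2)
{
    assert(s1 != NULL && s2 != NULL);
    qsort(s1, strlen(s1), sizeof s1[0], cmp_char);
    qsort(s2, strlen(s2), sizeof s2[0], cmp_char);
    return strcmp(s1, s2) == 0;
}
```

In the program from the question this would be called as same_chars_sorted(s1, s2) right after the two scanf calls, since s1 and s2 are ordinary char arrays there.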

Sort both strings using bubble sort or any other technique you know, then simply compare them using the strcmp() function.

for(i=0;i<strlen(s1)-1;i++){
for(j=0;j<strlen(s1);j++){
if(s1[i]==s2[j]){
found[i]=1;
break;
}
}
}
I do not understand why you are using j<strlen(s1) in the second loop.
I think a simple solution would be to sort the characters alphabetically and compare them one by one in a single loop.

First, note that found is never initialized. The values within it are unknown. It ought to be initialized by setting every element to zero before each test for equality. (Or, if not every element, every element up to strlen(s1)-1, as those are the ones that will be used.)
Once found is initialized, though, there is another problem.
The first loop on i uses for(i=0;i<strlen(s1)-1;i++). Within this, found[i] is set if a match is found to s1[i]. Note that i never reaches strlen(s1)-1 within the loop, since the loop terminates when it does.
The second loop on i uses for(i=0;i<strlen(s1);i++). Within this loop, found[i] is tested to see if it is set. Note that i does reach strlen(s1)-1, since the loop terminates only when i reaches strlen(s1). However, found[strlen(s1)-1] can never have been set by the first loop, since i never reaches strlen(s1)-1 in the first loop. Therefore, the second loop would always report failure.
Additionally, it is not clear whether two strings ought to be considered equal if and only if they are anagrams (the characters in one can be rearranged to form the other string, without adding or removing any characters) or if each character in one string is found at least once in the other (“aaabbc” would be equal to “abbccc”, because both strings contain a, b, and c).
As written, with the initialization and loop bugs fixed, your program tests whether each character in the first string appears in the second string. This is not an equivalence relation because it is not symmetric: it does not test whether each character in the second string appears in the first string. So, you need to think more about what property you want to test and how to test for it.
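If the second reading is the one you want (each character of one string occurs at least once in the other, in both directions), one possible sketch is to record which characters occur in each string and compare the two records. The names mark_present and same_char_set are made up for this illustration; the anagram reading needs counts instead of yes/no flags.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: sets present[v] for every byte value v in s. */
static void mark_present(const char *s, bool present[UCHAR_MAX + 1])
{
    memset(present, 0, (UCHAR_MAX + 1) * sizeof present[0]);
    for (; *s != '\0'; ++s) {
        present[(unsigned char)*s] = true;
    }
}

/* Hypothetical helper: true if s1 and s2 use exactly the same set of
 * characters, ignoring how often each one occurs. Comparing the two
 * tables checks both directions at once. */
bool same_char_set(const char *s1, const char *s2)
{
    bool in1[UCHAR_MAX + 1];
    bool in2[UCHAR_MAX + 1];
    assert(s1 != NULL && s2 != NULL);
    mark_present(s1, in1);
    mark_present(s2, in2);
    return memcmp(in1, in2, sizeof in1) == 0;
}
```

With this definition "aaabbc" and "abbccc" compare equal, while "abc" and "ab" do not, whichever string is passed first.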

Complicated solutions I did as a training exercise. Two implementations, selected with a macro, are below.
The first implementation loops through every character in the first string, counts its occurrences in the first and second strings, and compares the values.
The second implementation allocates and creates a map of characters with a count for each string and then compares these maps.
#include <stdio.h>
#include <string.h>
#include <stdbool.h>
#include <assert.h>
#include <stdlib.h>
#include <errno.h>
// configuration
#define STRCHARSETCNTCMP_METHOD_FOREACH 0
#define STRCHARSETCNTCMP_METHOD_MAP 1
// eof configuration
//#define dbgln(fmt, ...) fprintf(stderr, "%s:%d: " fmt "\n", __func__, __LINE__, ##__VA_ARGS__)
#define dbgln(...) ((void)0)
/**
* STRing CHARacter SET CouNT CoMPare
* compare the count of set of characters in strings
* @param s1 first string
* @param s2 the other string
* @return true if each character in s1 is used as many times in s2
*/
bool strcharsetcntcmp(const char s1[], const char s2[]);
// Count how many times the character is in the string
size_t strcharsetcntcmp_count(const char s[], char c)
{
assert(s != NULL);
size_t ret = 0;
while (*s != '\0') {
if (*s == c) {
++ret;
}
++s; // advance to the next character (the dereference in *s++ was unused)
}
return ret;
}
// foreach method implementation
bool strcharsetcntcmp_method_foreach(const char s1[], const char s2[])
{
const size_t s1len = strlen(s1);
const size_t s2len = strlen(s2);
if (s1len != s2len) {
return false;
}
for (size_t i = 0; i < s1len; ++i) {
const char c = s1[i];
const size_t cnt1 = strcharsetcntcmp_count(s1, c);
const size_t cnt2 = strcharsetcntcmp_count(s2, c);
// printf("'%s','%s' -> '%c' -> %zu %zu\n", s1, s2, c, cnt1, cnt2);
if (cnt1 != cnt2) {
return false;
}
}
return true;
}
// array of map elements
struct strcharsetcntcmp_map_s {
size_t cnt;
struct strcharsetcntcmp_map_cnt_s {
char c;
size_t cnt;
} *map;
};
// initialize empty map
void strcharsetcntcmp_map_init(struct strcharsetcntcmp_map_s *t)
{
assert(t != NULL);
dbgln("%p", t);
t->map = 0;
t->cnt = 0;
}
// free map memory
void strcharsetcntcmp_map_fini(struct strcharsetcntcmp_map_s *t)
{
assert(t != NULL);
dbgln("%p %p", t, t->map);
free(t->map);
t->map = 0;
t->cnt = 0;
}
// get the map element for character from map
struct strcharsetcntcmp_map_cnt_s *strcharsetcntcmp_map_get(const struct strcharsetcntcmp_map_s *t, char c)
{
assert(t != NULL);
for (size_t i = 0; i < t->cnt; ++i) {
if (t->map[i].c == c) {
return &t->map[i];
}
}
return NULL;
}
// check if the count for character c was already added into the map
bool strcharsetcntcmp_map_exists(const struct strcharsetcntcmp_map_s *t, char c)
{
return strcharsetcntcmp_map_get(t, c) != NULL;
}
// map element into map, without checking if it exists (only assertion)
int strcharsetcntcmp_map_add(struct strcharsetcntcmp_map_s *t, char c, size_t cnt)
{
assert(t != NULL);
assert(strcharsetcntcmp_map_exists(t, c) == false);
dbgln("%p %p %zu %c %zu", t, t->map, t->cnt, c, cnt);
void *pnt = realloc(t->map, sizeof(t->map[0]) * (t->cnt + 1));
if (pnt == NULL) {
return -errno;
}
t->map = pnt;
t->map[t->cnt].c = c;
t->map[t->cnt].cnt = cnt;
t->cnt++;
return 0;
}
// create map from string, map needs to be initialized by init and needs to be freed with fini
int strcharsetcntcmp_map_parsestring(struct strcharsetcntcmp_map_s *t, const char s[])
{
assert(t != NULL);
assert(s != NULL);
int ret = 0;
while (*s != '\0') {
const char c = *s;
if (!strcharsetcntcmp_map_exists(t, c)) {
const size_t cnt = strcharsetcntcmp_count(s, c);
ret = strcharsetcntcmp_map_add(t, c, cnt);
if (ret != 0) {
break;
}
}
++s;
}
return ret;
}
// compare two maps if they have same sets of characters and counts
bool strcharsetcntcmp_cmp(const struct strcharsetcntcmp_map_s *t, const struct strcharsetcntcmp_map_s *o)
{
assert(t != NULL);
assert(o != NULL);
if (t->cnt != o->cnt) {
return false;
}
for (size_t i = 0; i < t->cnt; ++i) {
const char c = t->map[i].c;
const size_t t_cnt = t->map[i].cnt;
struct strcharsetcntcmp_map_cnt_s *o_map_cnt = strcharsetcntcmp_map_get(o, c);
if (o_map_cnt == NULL) {
dbgln("%p(%zu) %p(%zu) %c not found", t, t->cnt, o, o->cnt, c);
return false;
}
const size_t o_cnt = o_map_cnt->cnt;
if (t_cnt != o_cnt) {
dbgln("%p(%zu) %p(%zu) %c %zu != %zu", t, t->cnt, o, o->cnt, c, t_cnt, o_cnt);
return false;
}
dbgln("%p(%zu) %p(%zu) %c %zu", t, t->cnt, o, o->cnt, c, t_cnt);
}
return true;
}
// map method implementation
bool strcharsetcntcmp_method_map(const char s1[], const char s2[])
{
struct strcharsetcntcmp_map_s map1;
strcharsetcntcmp_map_init(&map1);
if (strcharsetcntcmp_map_parsestring(&map1, s1) != 0) {
abort(); // <insert good error handler here>
}
struct strcharsetcntcmp_map_s map2;
strcharsetcntcmp_map_init(&map2);
if (strcharsetcntcmp_map_parsestring(&map2, s2) != 0) {
abort(); // <insert good error handler here>
}
const bool ret = strcharsetcntcmp_cmp(&map1, &map2);
strcharsetcntcmp_map_fini(&map1);
strcharsetcntcmp_map_fini(&map2);
return ret;
}
bool strcharsetcntcmp(const char s1[], const char s2[])
{
assert(s1 != NULL);
assert(s2 != NULL);
#if STRCHARSETCNTCMP_METHOD_FOREACH
return strcharsetcntcmp_method_foreach(s1, s2);
#elif STRCHARSETCNTCMP_METHOD_MAP
return strcharsetcntcmp_method_map(s1, s2);
#endif
}
// unittests. Should return 0
int strcharsetcntcmp_unittest(void)
{
struct {
const char *str1;
const char *str2;
bool eq;
} const tests[] = {
{ "", "", true, },
{ "a", "b", false, },
{ "abc", "bca", true, },
{ "aab", "abb", false, },
{ "aabbbc", "cbabab", true, },
{ "123456789012345678901234567890qwertyuiopqwertyuiopasdfghjklasdfghjklzxcvbnmzxcvbnm,./;", "123456789012345678901234567890qwertyuiopqwertyuiopasdfghjklasdfghjklzxcvbnmzxcvbnm,./;", true },
{ "123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890", "123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890", true },
{ "123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890", "1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678900", false },
};
int ret = 0;
for (size_t i = 0; i < sizeof(tests)/sizeof(tests[0]); ++i) {
const bool is = strcharsetcntcmp(tests[i].str1, tests[i].str2);
if (is != tests[i].eq) {
fprintf(stderr,
"Error: strings '%s' and '%s' returned %d should be %d\n",
tests[i].str1, tests[i].str2, is, tests[i].eq);
ret = -1;
}
}
return ret;
}
int main()
{
return strcharsetcntcmp_unittest();
}
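For comparison, a much shorter way to get the same yes/no answer is a fixed table of counts indexed by character value. This is a different technique from the two implementations above, sketched under the assumption that the inputs are ordinary NUL-terminated char strings; the name same_char_counts is made up for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if every character occurs equally often
 * in s1 and s2. One pass over each string, no allocation. */
bool same_char_counts(const char *s1, const char *s2)
{
    size_t n1[UCHAR_MAX + 1] = { 0 }; /* occurrences per byte value in s1 */
    size_t n2[UCHAR_MAX + 1] = { 0 }; /* occurrences per byte value in s2 */
    assert(s1 != NULL && s2 != NULL);
    for (; *s1 != '\0'; ++s1) {
        n1[(unsigned char)*s1]++;
    }
    for (; *s2 != '\0'; ++s2) {
        n2[(unsigned char)*s2]++;
    }
    return memcmp(n1, n2, sizeof n1) == 0;
}
```

It should agree with strcharsetcntcmp() on the short cases in the unit test table above, for example true for "abc" and "bca" and false for "aab" and "abb".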

Cannot access empty string from array of strings in C

I'm using an array of strings in C to hold arguments given to a custom shell. I initialize the array of buffers using:
char *args[MAX_CHAR];
Once I parse the arguments, I send them to the following function to determine the type of IO redirection if there are any (this is just the first of 3 functions to check for redirection and it only checks for STDIN redirection).
int parseInputFile(char **args, char *inputFilePath) {
char *inputSymbol = "<";
int isFound = 0;
for (int i = 0; i < MAX_ARG; i++) {
if (strlen(args[i]) == 0) {
isFound = 0;
break;
}
if ((strcmp(args[i], inputSymbol)) == 0) {
strcpy(inputFilePath, args[i+1]);
isFound = 1;
break;
}
}
return isFound;
}
Once I compile and run the shell, it crashes with a SIGSEGV. Using GDB I determined that the shell is crashing on the following line:
if (strlen(args[i]) == 0) {
This is because the address of args[i] (the first empty string after the parsed commands) is inaccessible. Here is the error from GDB and all relevant variables:
(gdb) next
359 if (strlen(args[i]) == 0) {
(gdb) p args[0]
$1 = 0x7fffffffe570 "echo"
(gdb) p args[1]
$2 = 0x7fffffffe575 "test"
(gdb) p args[2]
$3 = 0x0
(gdb) p i
$4 = 2
(gdb) next
Program received signal SIGSEGV, Segmentation fault.
parseInputFile (args=0x7fffffffd570, inputFilePath=0x7fffffffd240 "") at shell.c:359
359 if (strlen(args[i]) == 0) {
I believe that p args[2] returning $3 = 0x0 means that, because the index has yet to be written to, it is mapped to address 0x0, which is out of the bounds of execution. However, I can't figure out why this is, because it was declared as a buffer. Any suggestions on how to solve this problem?
EDIT: Per Kaylum's comment, here is a minimal reproducible example
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
#include<unistd.h>
#include<sys/types.h>
#include<sys/wait.h>
#include <sys/stat.h>
#include<readline/readline.h>
#include<readline/history.h>
#include <fcntl.h>
// Defined values
#define MAX_CHAR 256
#define MAX_ARG 64
#define clear() printf("\033[H\033[J") // Clear window
#define DEFAULT_PROMPT_SUFFIX "> "
char PROMPT[MAX_CHAR], SPATH[1024];
int parseInputFile(char **args, char *inputFilePath) {
char *inputSymbol = "<";
int isFound = 0;
for (int i = 0; i < MAX_ARG; i++) {
if (strlen(args[i]) == 0) {
isFound = 0;
break;
}
if ((strcmp(args[i], inputSymbol)) == 0) {
strcpy(inputFilePath, args[i+1]);
isFound = 1;
break;
}
}
return isFound;
}
int ioRedirectHandler(char **args) {
char inputFilePath[MAX_CHAR] = "";
// Check if any redirects exist
if (parseInputFile(args, inputFilePath)) {
return 1;
} else {
return 0;
}
}
void parseArgs(char *cmd, char **cmdArgs) {
int na;
// Separate each argument of a command to a separate string
for (na = 0; na < MAX_ARG; na++) {
cmdArgs[na] = strsep(&cmd, " ");
if (cmdArgs[na] == NULL) {
break;
}
if (strlen(cmdArgs[na]) == 0) {
na--;
}
}
}
int processInput(char* input, char **args, char **pipedArgs) {
// Parse the single command and args
parseArgs(input, args);
return 0;
}
int getInput(char *input) {
char *buf, loc_prompt[MAX_CHAR] = "\n";
strcat(loc_prompt, PROMPT);
buf = readline(loc_prompt);
if (strlen(buf) != 0) {
add_history(buf);
strcpy(input, buf);
return 0;
} else {
return 1;
}
}
void init() {
char *uname;
clear();
uname = getenv("USER");
printf("\n\n \t\tWelcome to Student Shell, %s! \n\n", uname);
// Initialize the prompt
snprintf(PROMPT, MAX_CHAR, "%s%s", uname, DEFAULT_PROMPT_SUFFIX);
}
int main() {
char input[MAX_CHAR];
char *args[MAX_CHAR], *pipedArgs[MAX_CHAR];
int isPiped = 0, isIORedir = 0;
init();
while(1) {
// Get the user input
if (getInput(input)) {
continue;
}
isPiped = processInput(input, args, pipedArgs);
isIORedir = ioRedirectHandler(args);
}
return 0;
}
Note: If I forgot to include any important information, please let me know and I can get it updated.
When you write
char *args[MAX_CHAR];
you allocate room for MAX_CHAR pointers to char. You do not initialise the array. If it were a global variable, all the pointers would be initialised to NULL, but you declare it in a function, so the elements in the array can point anywhere. You should not dereference them before you have set them to point at something you are allowed to access.
You do set them, though, in parseArgs(), with this line:
cmdArgs[na] = strsep(&cmd, " ");
There are two potential issues here, but let's deal with the one you hit first. When strsep() is through the tokens you are splitting, it returns NULL. You test for that to get out of parseArgs() so you already know this. However, where your program crashes you seem to have forgotten this again. You call strlen() on a NULL pointer, and that is a no-no.
There is a difference between NULL and the empty string. An empty string is a pointer to a buffer that has the zero char first; the string "" is a pointer to a location that holds the character '\0'. The NULL pointer is a special value for pointers, often address zero, that means that the pointer doesn't point anywhere. Obviously, the NULL pointer cannot point to an empty string. You need to check if an argument is NULL, not if it is the empty string.
If you want to check both for NULL and the empty string, you could do something like
if (!args[i] || strlen(args[i]) == 0) {
If args[i] is NULL then !args[i] is true, so you will enter the if body if you have NULL or if you have a pointer to an empty string.
(You could also check the empty string with !(*args[i]); *args[i] is the first character that args[i] points at. So *args[i] is zero if you have the empty string; zero is interpreted as false, so !(*args[i]) is true if and only if args[i] is the empty string. Not that this is more readable, but it shows again the difference between empty strings and NULL).
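The difference can be shown with two small helpers. Both names (is_null_or_empty and count_args) are made up for this illustration, and the limit parameter is an assumption about how the caller tracks the array size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if s is NULL or points at an empty string.
 * The NULL test comes first, so s is never dereferenced when it is NULL. */
bool is_null_or_empty(const char *s)
{
    return s == NULL || s[0] == '\0';
}

/* Hypothetical helper: counts the entries before the NULL sentinel,
 * looking at no more than limit slots of the array. */
size_t count_args(char *const *args, size_t limit)
{
    size_t n = 0;
    assert(args != NULL);
    while (n < limit && args[n] != NULL) {
        ++n;
    }
    return n;
}
```

For the GDB session in the question, count_args would stop at index 2, where args[2] is 0x0, without ever passing that pointer to strlen().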
I mentioned another issue with the parsed arguments. Whether it is a problem or not depends on the application. When you parse a string with strsep(), you get pointers into the parsed string. You have to be careful not to free that string (it is input in your main() function) or to modify it after you have parsed it. If you change the string, you change what all the parsed strings point at. You do not do this in your program, so it isn't a problem here, but it is worth keeping in mind. If you want your parsed arguments to survive after the next command is read, you need to copy them; as it is now, reading the next command overwrites them.
In main
char input[MAX_CHAR];
char *args[MAX_CHAR], *pipedArgs[MAX_CHAR];
are all uninitialized. They contain indeterminate values. This could be a potential source of bugs, but is not the reason here, as
getInput modifies the contents of input to be a valid string before any reads occur.
pipedArgs is unused, so raises no issues (yet).
args is modified by parseArgs to (possibly!) contain a NULL sentinel value, without any indeterminate pointers being read first.
Firstly, in parseArgs it is possible to completely fill args without setting the NULL sentinel value that other parts of the program should rely on.
Looking deeper, in parseInputFile the following
if (strlen(args[i]) == 0)
contradicts the limits imposed by parseArgs that disallows empty strings in the array. More importantly, args[i] may be the sentinel NULL value, and strlen expects a non-NULL pointer to a valid string.
This termination condition should simply check if args[i] is NULL.
With
strcpy(inputFilePath, args[i+1]);
args[i+1] might also be the NULL sentinel value, and strcpy also expects non-NULL pointers to valid strings. You can see this in action when inputSymbol is a match for the final token in the array.
args[i+1] may also evaluate as args[MAX_ARG], which is past the last element that parseArgs can have written.
Additionally, inputFilePath has a string length limit of MAX_CHAR - 1, and args[i+1] is (possibly!) a dynamically allocated string whose length might exceed this.
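Putting these points together, a more defensive version of the search might look like the sketch below. It is not the original function: the name parse_input_file and the path_size parameter are introduced here for illustration, and MAX_ARG is repeated with the value from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ARG 64 /* same value as in the question */

/* Sketch: looks for "<" in a NULL-terminated argument list.
 * Returns 1 and copies the following argument into path if it exists
 * and fits; returns 0 otherwise. path_size is the size of path in bytes. */
int parse_input_file(char *const *args, char *path, size_t path_size)
{
    assert(args != NULL && path != NULL && path_size > 0);
    /* Stop at the NULL sentinel or after MAX_ARG entries. */
    for (size_t i = 0; i < MAX_ARG && args[i] != NULL; i++) {
        if (strcmp(args[i], "<") != 0) {
            continue;
        }
        /* "<" must be followed by a file name inside the array. */
        if (i + 1 >= MAX_ARG || args[i + 1] == NULL) {
            return 0;
        }
        /* Refuse names that do not fit, including the terminator. */
        if (strlen(args[i + 1]) >= path_size) {
            return 0;
        }
        strcpy(path, args[i + 1]);
        return 1;
    }
    return 0;
}
```

Passing sizeof inputFilePath as the third argument from ioRedirectHandler would keep the size in one place instead of relying on MAX_CHAR in both functions.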
Some edge cases found in getInput:
Both arguments to
strcat(loc_prompt, PROMPT);
are of the size MAX_CHAR. loc_prompt starts with a length of 1, so if PROMPT has the length MAX_CHAR - 1, the resulting string will have the length MAX_CHAR. This would leave no room for the NUL terminating byte.
readline can return NULL in some situations, so
buf = readline(loc_prompt);
if (strlen(buf) != 0) {
can again pass the NULL pointer to strlen.
A similar issue as before, on success readline returns a string of dynamic length, and
strcpy(input, buf);
can cause a buffer overflow by attempting to copy a string greater in length than MAX_CHAR - 1.
buf is a pointer to data allocated by malloc. It's unclear what add_history does, but this pointer must eventually be passed to free.
Some considerations.
Firstly, it is a good habit to initialize your data, even if it might not matter.
Secondly, using constants (#define MAX_CHAR 256) might help to reduce magic numbers, but they can lead you to design your program too rigidly if used in the same way.
Consider building your functions to accept a limit as an argument, and return a length. This allows you to more strictly track the sizes of your data, and prevents you from always designing around the maximum potential case.
A slightly contrived example of designing like this. We can see that find does not have to concern itself with possibly checking MAX_ARGS elements, as it is told precisely how long the list of valid elements is.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_ARGS 100
char *get_input(char *dest, size_t sz, const char *display) {
char *res;
if (display)
printf("%s", display);
if ((res = fgets(dest, sz, stdin)))
dest[strcspn(dest, "\n")] = '\0';
return res;
}
size_t find(char **list, size_t length, const char *str) {
for (size_t i = 0; i < length; i++)
if (strcmp(list[i], str) == 0)
return i;
return length;
}
size_t split(char **list, size_t limit, char *source, const char *delim) {
size_t length = 0;
char *token;
while (length < limit && (token = strsep(&source, delim)))
if (*token)
list[length++] = token;
return length;
}
int main(void) {
char input[512] = { 0 };
char *args[MAX_ARGS] = { 0 };
puts("Welcome to the shell.");
while (1) {
if (get_input(input, sizeof input, "$ ")) {
size_t argl = split(args, MAX_ARGS, input, " ");
size_t redirection = find(args, argl, "<");
puts("Command parts:");
for (size_t i = 0; i < redirection; i++)
printf("%zu: %s\n", i, args[i]);
puts("Input files:");
if (redirection == argl)
puts("[[NONE]]");
else for (size_t i = redirection + 1; i < argl; i++)
printf("%zu: %s\n", i, args[i]);
}
}
}

Extending C Brute Force Algorithm

I need a brute force algorithm over alphanumeric characters.
The code I use just prints all the permutations to the standard output. I tried for hours but failed to rewrite the code in such a manner that I can just call a function brute_next() to get the next codeword when needed.
Can someone help me rewrite this code? The function brute_next() should return a char* or, alternatively, take a char* as a parameter. I'm using CLion with gcc on a Mac.
The code is (source):
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
static const char alphabet[] =
"abcdefghijklmnopqrstuvwxyz"
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"0123456789";
static const int alphabetSize = sizeof(alphabet) - 1;
void bruteImpl(char* str, int index, int maxDepth)
{
for (int i = 0; i < alphabetSize; ++i)
{
str[index] = alphabet[i];
if (index == maxDepth - 1) printf("%s\n", str);
else bruteImpl(str, index + 1, maxDepth);
}
}
void bruteSequential(int maxLen)
{
char* buf = malloc(maxLen + 1);
for (int i = 1; i <= maxLen; ++i)
{
memset(buf, 0, maxLen + 1);
bruteImpl(buf, 0, i);
}
free(buf);
}
int main(void)
{
bruteSequential(3);
return 0;
}
This is my non-working attempt to convert the recursion into a generator. Just can't figure out how the permutation algorithm works.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
static const char alphabet[] =
"abcdefghijklmnopqrstuvwxyz"
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"0123456789"
"$%&/()=.-_;!+*#";
static const int alphabetSize = sizeof(alphabet) - 1;
struct bruteconfig {
int index;
int i1;
int i2;
char* str;
int maxDepth;
};
static struct bruteconfig* config;
void brute_init(int maxLen){
free(config);
config = malloc(sizeof(struct bruteconfig*));
config->i1 = 1;
config->i2 = 0;
config->index = 0;
config->maxDepth = maxLen;
}
void bruteImpl()
{
if(config->i2 > alphabetSize) // how to transform for to iterative?
config->i2 = 0;
config->str[config->index] = alphabet[config->i2];
if (config->index == config->maxDepth - 1) {
//printf("%s\n", config->str);
return; // str filled with next perm
}
else {
config->index++;
//bruteImpl(config->str, config->maxDepth);
}
config->i2++;
}
char* bruteSequential()
{
config->str = malloc(config->maxDepth + 1);
if(config->i1 >= config->maxDepth)
return NULL;
memset(config->str, 0, config->maxDepth + 1); // clear buf
bruteImpl(config->str, config->i1); // fill with next perm
return config->str;
//free(buf); // needs to be done by the caller
}
You're trying to switch from recursion to using a generator: the key difference is that recursion stores working state implicitly in the call stack, while a generator needs to store all its state explicitly for the next call.
So, first you need to think about what state is being implicitly held for you in the recursive version:
each level of your recursive call has its own value for the parameter index
each level has its own value for the local variable i
... and that's it.
You have maxDepth levels, numbered 0..maxDepth-1, each with its own current position in the alphabet. Note that the index argument is also just the position in this collection, so you don't need to store it separately.
Now, you need to store some persistent state between calls, and it's going to be this array of maxDepth integer alphabet positions. Can you figure out how to write a function to convert that array into a string? Can you figure out how to advance the state one place in the same way your recursive code would?
Edit: your state should probably look something like
struct PermutationState {
/* stringLength == maxDepth */
int stringLength;
char *string;
/* better to avoid globals */
int alphaLength;
const char *alphabet;
/* this replaces i as the index into our alphabet */
int *alphaPos;
};
and I'd suggest writing an interface like
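/* forward declaration: start_permutation() calls this for cleanup */
void end_permutation(struct PermutationState *state);
struct PermutationState* start_permutation(int stringLength,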
int alphaLength,
const char *alphabet)
{
struct PermutationState *state = malloc(sizeof(*state));
if (!state) return NULL;
/* initialize scalar values first, for easier error-handling */
state->stringLength = stringLength;
state->string = NULL;
state->alphaLength = alphaLength;
state->alphabet = alphabet;
state->alphaPos = NULL;
/* now we can handle nested allocations */
state->string = malloc(stringLength + 1);
state->alphaPos = calloc(stringLength, sizeof(int));
if (state->string && state->alphaPos) {
/* both allocations succeeded, and alphaPos is already zeroed */
memset(state->string, alphabet[0], stringLength);
state->string[stringLength] = 0;
return state;
}
/* one or both of the nested allocations failed */
end_permutation(state);
return NULL;
}
void end_permutation(struct PermutationState *state)
{
free(state->string);
free(state->alphaPos);
free(state);
}
and finally you're looking to implement this function:
char *next_permutation(struct PermutationState *state)
{
/* TODO */
}
Since start_permutation has already set you up with state->alphaPos = [0, 0, ... 0] and state->string = "aaa...a", you probably want to advance alphaPos by one position and then return the current string.
NB. I assumed you don't need to copy the alphabet, which means the caller is responsible for guaranteeing its lifetime. You could easily copy that too, if necessary.
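One possible way to fill in the TODO is to treat alphaPos like an odometer. This is only a sketch under two assumptions that the text above does not fix: the caller reads state->string once for the first word before calling next_permutation, and the function returns NULL once every string of this length has been produced. The struct is repeated so the example stands on its own.

```c
#include <assert.h>
#include <stddef.h>

struct PermutationState {
    int stringLength;
    char *string;
    int alphaLength;
    const char *alphabet;
    int *alphaPos;
};

/* Sketch: advances the state by one step and returns the updated string,
 * or NULL when all positions have wrapped around (nothing left). */
char *next_permutation(struct PermutationState *state)
{
    assert(state != NULL && state->alphaLength > 0);
    /* Work from the rightmost position to the left, like an odometer. */
    for (int pos = state->stringLength - 1; pos >= 0; --pos) {
        if (++state->alphaPos[pos] < state->alphaLength) {
            state->string[pos] = state->alphabet[state->alphaPos[pos]];
            return state->string;
        }
        /* This position ran off the alphabet: reset it, carry left. */
        state->alphaPos[pos] = 0;
        state->string[pos] = state->alphabet[0];
    }
    return NULL;
}
```

To cover lengths 1 to maxLen as bruteSequential does, the caller would create one state per length and loop until next_permutation returns NULL.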
I just can't figure out how the permutation algorithm works
It's quite simple: To get from one word to the next, start at the rightmost position, change that character to the next in the alphabet; if there's no next character, reset the position to the first character in the alphabet and continue with changing the position to the left; if there's no position left, the codeword needs to be lengthened. Here's a sample implementation:
char *brute_next()
{
for (; ; )
{
static char *buf; // buffer for codeword
static int maxDepth; // length of codeword
int i, index = maxDepth-1; // alphabet and buffer index, resp.
while (0 <= index) // as long as current length suffices:
{ // next char at buf[index] is next in alphabet or first:
i = buf[index] ? strchr(alphabet, buf[index]) - alphabet + 1 : 0;
if ((buf[index] = alphabet[i])) return buf; // assignment is intentional; false at the end of the alphabet
buf[index--] = alphabet[0]; // reset to 'a', continue to the left
}
index = maxDepth++; // now need to lengthen the codeword
buf = realloc(buf, maxDepth+1); // string length + terminator
if (!buf) exit(1);
buf[index] = buf[maxDepth] = '\0';
}
}

Appending a char to a char* in C?

I'm trying to make a quick function that gets a word/argument in a string by its number:
char* arg(char* S, int Num) {
char* Return = "";
int Spaces = 0;
int i = 0;
for (i; i<strlen(S); i++) {
if (S[i] == ' ') {
Spaces++;
}
else if (Spaces == Num) {
//Want to append S[i] to Return here.
}
else if (Spaces > Num) {
return Return;
}
}
printf("%s-\n", Return);
return Return;
}
I can't find a way to put the characters into Return. I have found lots of posts that suggest strcat() or tricks with pointers, but every one segfaults. I've also seen people saying that malloc() should be used, but I'm not sure how I'd use it in a loop like this.
I will not claim to understand what it is that you're trying to do, but your code has two problems:
You're assigning a read-only string to Return; that string will be in your binary's data section, which is read-only, and if you try to modify it you will get a segfault.
Your for loop is O(n^2), because strlen() is O(n) and is called on every iteration.
There are several different ways of solving the "how to return a string" problem. You can, for example:
Use malloc() / calloc() to allocate a new string, as has been suggested
Use asprintf() (a GNU/BSD extension, not standard C), which is similar but gives you formatting if you need it
Pass an output string (and its maximum size) as a parameter to the function
The first two require the calling function to free() the returned value. The third allows the caller to decide how to allocate the string (stack or heap), but requires some sort of contract about the minimum size needed for the output string.
In your code, Return points at a string literal. The literal itself outlives the function, so returning the pointer is fine, but writing through it is undefined behavior. It might appear to work, but you should never rely on it.
Typically in C, you'd want to pass the "return" string as an argument instead, so that you don't have to free it all the time. Both require a local variable on the caller's side, but malloc'ing it will require an additional call to free the allocated memory and is also more expensive than simply passing a pointer to a local variable.
As for appending to the string, just use array notation (keep track of the current char/index) and don't forget to add a null character at the end.
Example:
int arg(char* ptr, char* S, int Num) {
int i, Spaces = 0, cur = 0;
for (i=0; i<strlen(S); i++) {
if (S[i] == ' ') {
Spaces++;
}
else if (Spaces == Num) {
ptr[cur++] = S[i]; // append char
}
else if (Spaces > Num) {
ptr[cur] = '\0'; // insert null char
return 0; // returns 0 on success
}
}
ptr[cur] = '\0'; // insert null char
return (cur > 0 ? 0 : -1); // returns 0 on success, -1 on error
}
Then invoke it like so:
char myArg[50];
if (arg(myArg, "this is an example", 3) == 0) {
printf("arg is %s\n", myArg);
} else {
// arg not found
}
Just make sure you don't overflow ptr (e.g.: by passing its size and adding a check in the function).
There are a number of ways you could improve your code, but let's just start by making it meet the standard. ;-)
P.S.: Don't malloc unless you need to, and in this case you don't.
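The overflow check mentioned above could look roughly like this. It is only a sketch: the name arg_bounded and the out_size parameter are my own additions, and it assumes words are separated by single spaces, as in the example. It also walks the string until the terminator instead of calling strlen() on every iteration.

```c
#include <stddef.h>

/* Copy word number num (0-based, words separated by single spaces) of s
   into out, which has room for out_size bytes including the terminator.
   Returns 0 on success, -1 if the word is missing or does not fit.
   Hypothetical variant of arg() with a size check added. */
int arg_bounded(char *out, size_t out_size, const char *s, int num)
{
    size_t cur = 0;
    int spaces = 0;

    if (out == NULL || out_size == 0 || s == NULL)
        return -1;

    /* stop at the terminator or once we are past the wanted word */
    for (; *s != '\0' && spaces <= num; s++) {
        if (*s == ' ') {
            spaces++;
        } else if (spaces == num) {
            if (cur + 1 >= out_size) {   /* no room for this char plus '\0' */
                out[0] = '\0';
                return -1;
            }
            out[cur++] = *s;             /* append char */
        }
    }
    out[cur] = '\0';                     /* insert null char */
    return cur > 0 ? 0 : -1;
}

/* Example use:
 *     char myArg[50];
 *     if (arg_bounded(myArg, sizeof myArg, "this is an example", 3) == 0)
 *         printf("arg is %s\n", myArg);
 */
```

Passing sizeof myArg at the call site keeps the buffer and its size in one place, so the function never has to guess how much room it has.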
char * Return; //by the way horrible name for a variable.
Return = malloc(<some size>);
......
......
*(Return + index) = *(S+i);
You can't assign anything to a string literal such as "".
You may want to use your loop to determine the offset of the start of the word you're looking for in your string. Then find its length by continuing through the string until you encounter the end or another space. Then you can malloc an array of chars with size equal to that length + 1 (for the null terminator). Finally, copy the substring into this new buffer and return it.
Also, as mentioned above, you may want to remove the strlen call from the loop condition. Some compilers can hoist it out of the loop when they can prove the string is not modified, but that is not guaranteed; otherwise it is a linear operation performed for every character in the array, making the loop O(n**2).
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *arg(const char *S, unsigned int Num) {
char *Return; // assigned by malloc below
const char *top, *p;
unsigned int Spaces = 0;
Return=(char*)malloc(sizeof(char));
*Return = '\0';
if(S == NULL || *S=='\0') return Return;
p=top=S;
while(Spaces != Num){
if(NULL!=(p=strchr(top, ' '))){
++Spaces;
top=++p;
} else {
break;
}
}
if(Spaces < Num) return Return;
if(NULL!=(p=strchr(top, ' '))){
int len = p - top;
Return=(char*)realloc(Return, sizeof(char)*(len+1));
strncpy(Return, top, len);
Return[len]='\0';
} else {
free(Return);
Return=strdup(top);
}
//printf("%s-\n", Return);
return Return;
}
int main(){
char *word;
word=arg("make a quick function", 2);//quick
printf("\"%s\"\n", word);
free(word);
return 0;
}

using functions in c (return value)

Learning C and having many doubts.
I have a function (let's say function 1) that calls another function (let's say function 2).
Function 2 computes an array of characters (a string).
How can I use this array in function 1?
Some code example:
int find_errors(char* word)
{
char error[100];
/*Given the word, It will find the duplicate chars and store it in the
error array. */
return 0;
}
int find_word(char* word)
{
find_errors (word);
printf("%s\n", error);
return 0;
}
There are at least three possible approaches:
Use a global variable
Pass a parameter between them
Return a pointer from the function
There are multiple ways to do this.
1) Create a dynamic array and return a pointer to the array. This will require you to manually free the memory for the array at a later time.
#define NUM_ELEMS 50
// In find_errors():
char* error = malloc(NUM_ELEMS * sizeof(char));
return error;
// In find_word():
char *error = find_errors();
// do stuff
free(error);
2) Pass a pointer to find_errors that it can use as the error array. This will not require you to manually free the memory.
// In find_word():
char error[NUM_ELEMS];
find_errors(error);
3) Use a global array. May make it more difficult for other people to understand your code. Has other potential problems as well.
// In global scope:
char error[NUM_ELEMS];
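Written out in full, option 2 might look something like the sketch below. The signature (the extra errors and errors_size parameters) is my assumption, since the question's find_errors only takes the word; the body simply collects each character that occurs more than once.

```c
#include <stddef.h>
#include <string.h>

/* Store each character that occurs more than once in word into errors
   (once per character), as a NUL-terminated string.
   errors must have room for errors_size bytes.
   Returns the number of duplicated characters stored.
   Assumed signature; the original find_errors takes only the word. */
size_t find_errors(const char *word, char *errors, size_t errors_size)
{
    size_t count = 0;

    if (errors_size == 0)
        return 0;

    for (size_t i = 0; word[i] != '\0'; i++) {
        /* was word[i] seen earlier in the word? */
        int seen_before = memchr(word, word[i], i) != NULL;
        /* has it already been stored in errors? */
        int reported = memchr(errors, word[i], count) != NULL;

        if (seen_before && !reported && count + 1 < errors_size)
            errors[count++] = word[i];
    }
    errors[count] = '\0';
    return count;
}

/* Example use inside find_word():
 *     char error[100];
 *     find_errors(word, error, sizeof error);
 *     printf("%s\n", error);
 */
```

The array lives in the caller (find_word), so it is still valid after find_errors returns and nothing has to be freed.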
Your question relates to "call-by-reference" and "call-by-value".
#include <stdlib.h> // malloc, free
char* getNewValsToSet(void)
{
char* new_vals = (char*) malloc(sizeof(char[5]));
new_vals[4] = '\0';
return new_vals;
}
void setValuesEven(char* vals_to_set)
{
vals_to_set[0] = 'A';
vals_to_set[2] = 'C';
}
void setValuesOdd(char* vals_to_set)
{
vals_to_set[1] = 'B';
vals_to_set[3] = 'D';
}
int main(void)
{
char* some_vals_to_set = getNewValsToSet();
setValuesEven(some_vals_to_set);
setValuesOdd(some_vals_to_set);
// ... now has vals "ABCD"
free(some_vals_to_set); //cleanup
return 0;
}
If you have "doubts" about learning C, IMHO it's one of the best things you can do (no matter the language in which you work) because it will explain exactly how things work "under-the-hood" (which all high-level languages try to hide to some degree).
You need to declare the error array globally and use it just like you did.
EDIT: using global variables isn't the best practice in most of the cases, like this one.
Here is an example of what you are looking for with an awesome console output. It dynamically allocates the array to hold any number of errors (duplicate characters in your case) that may occur.
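#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MIN(a, b) ((a) < (b) ? (a) : (b)) //MIN is not standard C, so define it
//Only free errors if result is > 0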
int find_errors(char* word, char** errors)
{
int num_errors = 0;
int word_length = strlen(word);
int ARRAY_SIZE = MIN(8, word_length);
char existing[word_length];
int existing_index = 0;
*errors = NULL;
for(int i = 0; i < word_length; i++)
{
char character = word[i];
//Search array
for (int n = 0; n < word_length; ++n ) {
if(n >= existing_index)
{
existing[n] = character;
existing_index++;
break;
}
if (existing[n] == character) {
num_errors++;
if(!*errors)
*errors = (char*)malloc(ARRAY_SIZE * sizeof(char));
//Check if we need to resize array
if(num_errors >= ARRAY_SIZE)
{
ARRAY_SIZE *= 2;
ARRAY_SIZE = MIN(ARRAY_SIZE, word_length);
char *tmp = (char*)malloc(ARRAY_SIZE * sizeof(char));
memcpy(tmp, *errors, (size_t)(num_errors - 1)); //copy only the errors stored so far, not the new (larger) size
free(*errors);
*errors = tmp;
}
//Set the error character
(*errors)[num_errors - 1] = character;
break;
}
}
}
return num_errors;
}
int find_word(char* word)
{
char* errors;
int errCount = find_errors (word, &errors);
if(errCount > 0)
{
printf("Invalid Characters: ");
for(int i =0; i < errCount; i++)
{
printf("%c ", errors[i]);
}
printf("\n");
free(errors);
}
return 0;
}
int main(int argc, char *argv[])
{
find_word("YWPEIT");
find_word("Hello World");
find_word("XxxxXXxXXoooooooOOOOOOOOOOOOOOOooooooooOOOOOOOOOOOOooooooOOO");
}

splitting a full filename into parts

I am creating a function that will split a full Unix filename (like /home/earlz/test.bin) into its individual parts. I have got a function, and it works perfectly for the first two parts, but after that it produces erroneous output...
strlcpy_char will copy a string using term as the terminator, as well as 0.
If it is terminated with term, then term will be the last character of the string, then null.
returns trg string length...
int strlcpy_char(char *trg,const char *src,int max,char term){
int i;
if(max==0){return 0;}
for(i=0;i<max-1;i++){
if(*src==0){
*trg=0;
return i;
}
if(*src==term){
*trg=term;
trg++;
*trg=0; //null terminate
return i+1;
}
*trg=*src;
src++;
trg++;
}
*trg=0;
return max;
}
int get_path_part(char *file,int n,char *buf){
int i;
int current_i=0;
//file is assumed to start with '/'so it skips the first character.
for(i=0;i<=n;i++){
current_i++;
current_i=strlcpy_char(buf,&file[current_i],MAX_PATH_PART_SIZE,'/');
if(current_i<=1){ //zero length string..
kputs("!"); //just a debug message. This never happens with the example
return -1; //not enough parts to the path
}
}
if(buf[current_i-1]=='/'){
return 1; //is not the last part
}else{
return 0; //is the last part(the file part)
}
}
I use this code to test it:
kputs("test path: ");
kgets(cmd);
kputs("\n");
char *tmp=malloc(256);
int i=0;
get_path_part(cmd,i,tmp);
kputs(tmp);
kputs("\n");
i=1;
get_path_part(cmd,i,tmp);
kputs(tmp);
kputs("\n");
i=2;
get_path_part(cmd,i,tmp);
kputs(tmp);
kputs("\n");
When I try something like "/home/test.bin" it works right outputting
/home
/test.bin
But when I try "/home/earlz/test.bin" I get
/home
/earlz
/arlz
Anyone see the problem in my code, as I've been looking but I just can't see any problem.
Also, before you say "but there is a library for that" I am doing this in an operating system kernel, so I barely have a standard library. I only have parts of string.h and really that's about it for standard.
You overwrite current_i instead of adding it up as you walk through the path.
So
current_i++;
current_i=strlcpy_char(buf,&file[current_i],MAX_PATH_PART_SIZE,'/');
should really be
current_i += strlcpy_char(buf,&file[current_i+1],MAX_PATH_PART_SIZE,'/');
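One thing to watch with the += change: the later checks in the question's code (current_i<=1 and buf[current_i-1]) expect the length of the part just copied, not the running offset, so the two need to be kept in separate variables. Here is a rough, self-contained sketch of that idea. It does not use strlcpy_char, it copies each part without the slashes, and the value of MAX_PATH_PART_SIZE is a guess since the question does not show it.

```c
#include <stddef.h>

#define MAX_PATH_PART_SIZE 256   /* assumed value; the question does not show it */

/* Copy component n (0-based) of an absolute path such as
   "/home/earlz/test.bin" into buf, without the slashes.
   Returns 1 if more components follow, 0 if this was the last one,
   -1 if there are not enough components or the component does not fit. */
int get_path_part(const char *file, int n, char *buf)
{
    size_t pos = 0;   /* running offset into file; only ever moves forward */
    size_t len = 0;   /* length of the component currently in buf */

    if (n < 0)
        return -1;

    for (int i = 0; i <= n; i++) {
        if (file[pos] != '/')
            return -1;               /* ran out of components */
        pos++;                       /* skip the separator */

        len = 0;
        while (file[pos + len] != '\0' && file[pos + len] != '/') {
            if (len + 1 >= MAX_PATH_PART_SIZE)
                return -1;           /* component too long for buf */
            buf[len] = file[pos + len];
            len++;
        }
        buf[len] = '\0';
        if (len == 0)
            return -1;               /* empty component, e.g. a trailing '/' */

        pos += len;                  /* accumulate instead of overwriting */
    }
    return file[pos] == '/' ? 1 : 0;
}
```

Because pos only accumulates, the third call no longer restarts from an offset that is relative to the previous part, which is what produced "/arlz" in the question.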
I think you need to track your current_i for i>1, since the value returned from strlcpy_char has no idea of where you are in the overall file string. Does that make sense?
current_i=strlcpy_char(buf,&file[current_i],MAX_PATH_PART_SIZE,'/');
Don't you need to do something like
current_i += strlcpy_char...
instead of
current_i = strlcpy_char...
Does your code have to be re-entrant?
If not, use strtok; it is declared in string.h
STRTOK(P)
NAME
strtok, strtok_r - split string into tokens
SYNOPSIS
#include <string.h>
char *strtok(char *restrict s1, const char *restrict s2);
char *strtok_r(char *restrict s, const char *restrict sep,
char **restrict lasts);
Sorry for not commenting on your code though :)
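For what it's worth, a strtok-based split could be sketched like this, assuming strtok is among the parts of string.h you have available in your kernel. The function name split_path and its parameters are made up for the example. Note that strtok modifies the string it is given and keeps hidden state between calls, so it needs a writable buffer and is not re-entrant.

```c
#include <stddef.h>
#include <string.h>

/* Split path in place on '/' and store pointers to the components in parts.
   Returns the number of components found (at most max_parts).
   path is modified: strtok() overwrites each separator with '\0'.
   Hypothetical helper for illustration. */
size_t split_path(char *path, char *parts[], size_t max_parts)
{
    size_t n = 0;
    char *tok = strtok(path, "/");       /* first call passes the string */
    while (tok != NULL && n < max_parts) {
        parts[n++] = tok;
        tok = strtok(NULL, "/");         /* later calls pass NULL to continue */
    }
    return n;
}

/* Example use:
 *     char path[] = "/home/earlz/test.bin";
 *     char *parts[8];
 *     size_t n = split_path(path, parts, 8);
 */
```

The pointers in parts point into the original buffer, so no extra allocation is needed, but the buffer must stay alive for as long as the parts are used.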
If you are using GLib, g_strsplit is very nice and easy to use.
This is how I'd do it
char ** split_into_parts(char *path) {
char ** parts = malloc(sizeof(char *) * 100);
int i = 0;
int j = 0;
if (*path == '/') {
path++;
}
parts[0] = 0;
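while (*path) {
if (*path == '/') {
if (parts[i] != 0) { // skip empty components such as "//"
parts[i][j] = 0; // terminate the finished part
i++;
parts[i] = 0;
j = 0;
}
} else {
if (parts[i] == 0) {
parts[i] = malloc(sizeof(char) * 100);
}
parts[i][j] = *path;
j++;
}
path++;
}
if (parts[i] != 0) {
parts[i][j] = 0; // terminate the last part
i++;
}
parts[i] = 0; // NULL marks the end of the list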
return parts;
}
Try something like the code I have below.
If you need implementations of standard C functions (like strchr()) try koders.com or just google for strchr.c.
#include <stdio.h>
#include <string.h>
const char *NextToken(const char *pStart, char chSep, char *pToken, size_t nTokMax)
{
const char *pEnd;
size_t nLength;
/* set output to empty */
*pToken=0;
/* make sure input is OK */
if (!pStart || *pStart!=chSep)
return NULL;
/* find end of token */
pEnd = strchr(pStart+1, chSep);
if (pEnd)
nLength = pEnd - pStart;
else
nLength = strlen(pStart);
if (nLength >= nTokMax) /* too big */
return NULL;
strncpy(pToken, pStart, nLength);
pToken[nLength] = 0;
return pEnd;
}
int main()
{
#define BUFFSIZE 256
char cmd[BUFFSIZE];
char tmp[BUFFSIZE];
const char *pStart=cmd;
int i=0;
puts("test path: ");
fgets(cmd, BUFFSIZE, stdin);
cmd[strcspn(cmd, "\n")] = 0; /* strip the trailing newline left by fgets */
puts("");
do {
pStart = NextToken(pStart, '/', tmp, BUFFSIZE);
if (tmp[0])
puts(tmp);
} while (pStart);
return 0;
}

Resources