Swift String.Index vs transforming the String to an Array

In the Swift docs, they say they use String.Index to index strings, because different characters can take a different amount of memory.
But I saw a lot of people transforming a String into an array (var a = Array(s)) so they can index by Int instead of String.Index (which is definitely easier).
So I wanted to test for myself whether it works the same for all Unicode characters:
let cafeA = "caf\u{E9}" // eAcute
let cafeB = "caf\u{65}\u{301}" // combinedEAcute
let arrayCafeA = Array(cafeA)
let arrayCafeB = Array(cafeB)
print("\(cafeA) is \(cafeA.count) character \(arrayCafeA.count)")
print("\(cafeB) is \(cafeB.count) character \(arrayCafeB.count)")
print(cafeA == cafeB)
print("- A scalar")
for scalar in cafeA.unicodeScalars {
    print(scalar.value)
}
print("- B scalar")
for scalar in cafeB.unicodeScalars {
    print(scalar.value)
}
And here is the output:
café is 4 character 4
café is 4 character 4
true
- A scalar
99
97
102
233
- B scalar
99
97
102
101
769
And sure enough, as mentioned in the docs, a string behaves like an array of Character, with the grapheme clustering done inside each Character. So why don't they index it by Int? What's the point of creating/using String.Index, actually?

In a String, the byte representation is packed: characters take a variable number of bytes, so there's no way to know where the character boundaries are without traversing the string from the start.
When converting to an array, this traversal is done once, and the result is an array of Characters that are equally spaced in memory, which is what allows constant-time subscripting by an Int index. Importantly, the array is kept, so many subscripting operations can be done on the same array, requiring only one traversal of the String's bytes, for the initial unpacking.
It is possible to extend String with a subscript that indexes it by an Int, and you often see that come up on SO, but it's ill-advised. The standard library programmers could have added it, but they purposely chose not to, because it obscures the fact that every such indexing operation requires a separate traversal of the String's bytes, which is O(string.count). All of a sudden, innocuous code like this (relying on such an Int subscript extension):
for i in 0..<string.count {
    print(string[i]) // Looks O(1), but is actually O(string.count)!
}
becomes quadratic.

Related

Number sequences length, element first and last indexes in array

I'm a beginner in programming. My question is: how do I count the sequences of ones in an input array? For example:
input array = [0,0,1,1,1,1,1,1,0,1,0,1,1,1]
output integer = 3 (count one-sequences)
And how do I calculate the first and last indexes of each sequence in the input array? For example:
input array = [0,0,1,1,1,1,1,1,0,1,0,1,1,1]
output array = [3-8,10-10,12-14] (the 1-based first and last position of each sequence of ones)
I tried to solve this problem in C with arrays. Thank you!
Your task is a good exercise to familiarize you with the 0-based array indexes used in C, iterating arrays, and adjusting the array indexes to 1-based when the output requires.
Taking the first two together (0-based arrays in C, and iterating over the elements), you must first determine how many elements are in your array. This is something that gives new C programmers trouble. The reason is that for general arrays (as opposed to null-terminated strings), you must either already know the number of elements in the array, or determine it within the scope where the array was declared.
What does that mean? It means the only place you can use the sizeof operator to determine the size of an array is inside the same scope (i.e. inside the same block of code {...}) where the array is declared. If the array is passed to a function, the parameter is converted (you may see this referred to as "decaying") to a pointer. When that happens, the sizeof operator simply returns the size of a pointer (generally 8 bytes on x86_64 and 4 bytes on x86), not the size of the array.
So now you know the first part of your task: (1) declare the array; and (2) save the number of elements in the array to use when iterating over them. The first you can do with int array[] = {0,0,1,1,1,1,1,1,0,1,0,1,1,1}; and the second with sizeof array / sizeof array[0]; (sizeof array alone gives the size in bytes, not the number of elements).
Your next job is to iterate over each element in the array, test whether it is 0 or 1, and respond appropriately. To iterate over each element in the array (as opposed to a string), you will typically use a for loop coupled with an index variable ('i' below) that lets you access each element of the array. You may have something similar to:
size_t i = 0;
size_t n = sizeof array / sizeof array[0];   /* number of elements */
...
for (i = 0; i < n; i++) {
    ... /* elements accessed as array[i] */
}
(note: you are free to use int as the type for 'i' as well, but whatever type you choose, ask yourself: can 'i' ever be negative here? If not, choosing a type that holds only non-negative numbers (such as size_t) will help the compiler warn you if you misuse the variable later in your code.)
To build the complete logic, you will need to test for every change from 0 to 1, and you may have to use nested if ... else ... statements. (You may have to handle array[0] specifically as part of your test logic, since it has no previous element.) You have 2 tasks here: (1) determine whether the previous element was 0 and the current element is 1, and if so increment your sequence_count (sequence_count++); and (2) test whether the current element is 1, and if so store the adjusted index in a second array and update the count or index for the second array so you can keep track of where to store the next adjusted index value. I will let you work on the test logic and will help if you get stuck.
Finally, you need only print out your final sequence_count and then iterate over your second array (where you stored the adjusted index values for each element of array that was 1).
This will get you started. Edit your question and add your current code when you get stuck and people can help further.

Storing one value for each item in 2-Dimensional array

Luckily I came up with a decent title, describing what I was curious about.
While this is really hard for me to explain, I am doing my best.
I tried storing values in a 3D array like this:
char arr[10][10][1];
To copy a string I have to do it into arr[y][x] (and sadly I can't do it with just arr[y]), but then, for a reason still unknown to me, I could overflow the buffer with arr[8][8][8]. Maybe it's because of the size of char**, but anyway.
I couldn't find a slot to store a character for each item (x and y).
I tried it the other way around:
char arr[1][10][10];
Assuming that I have 1 item * x and y.
To store a string, I have to do it in arr[0][y], which means the 3rd cell will be a character from the string.
So, in summary, I am trying to store one value for each character at x and y.
Do I really need a 4D array for this?
Additional clarification:
I am aware of what 1D and 2D arrays are for. It seems I can't understand the 3D array.
I thought that I could store an additional item for each character at y or x.
Example:
char arr[y][x][z];
Where y is the line, x is the column and z is the additional item that applies to all the characters.
A string is an array of characters. An array of strings is therefore an array of arrays of characters. Why you think you need the 3rd dimension, I have no idea.
When you allocate a multi-dimensional array statically, you must specify the maximum number of items that it can contain. In this case, you must specify how many bytes long the string is allowed to be, including one byte for null termination. This is the right-most [] in the expression, in your case 1 byte.
So you haven't actually allocated any memory at all to store a string: 1 byte is enough to store the null terminator and nothing else. This is why you get a crash/seg fault when you access [x][y][z] with z being any value other than 0: that index is out of bounds, which is undefined behavior. And you cannot store anything meaningful there either.
The size of char** has absolutely nothing to do with this whatsoever. Pointers are not arrays.
I'd strongly suggest that you study this C FAQ about pointers and arrays.
Now what you probably want to do is something like this:
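#include <stdio.h>
#include <string.h>

int main(void)
{
    char string_array[10][20+1]; // 10 strings, each holding up to 20 letters + null terminator
    strcpy(string_array[0], "hello");
    strcpy(string_array[1], "world");
    /* ... and so on for the other strings ... */
    printf("%s\n", string_array[0]);
    printf("%s\n", string_array[1]);
    return 0;
}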
No need for the 3rd dimension as such.
You can use, for example, a[x][y];
and you can access each string as a[x].
You can also see this with command-line arguments, where char *argv[] (an array of pointers to strings, indexed much like a 2D array) stores the strings passed on the command line. It illustrates well how arrays of strings are used.
For further reference you can have a look at this http://www.tutorialspoint.com/cprogramming/c_multi_dimensional_arrays.htm

Context-free grammar in C

I have an assignment to make a program in C that displays a number (n < 50) of valid strings generated by the following context-free grammar:
S -> AA|0
A -> SS|1
I had a few ideas of how to do it, but after analyzing them more closely, none of them were right.
For now, I'm planning to make an array and randomly replace [..., A, ...] with [..., S, S, ...] or [..., 1, ...] until there are only 0s and 1s, and then check whether the same word was already randomly generated.
I'm still not convinced that this is the right approach, and I still don't know exactly how to do that or where to keep the final words, because each word will be an array of chars of a different length. Also, in C, is a two-dimensional array of chars equal to an array of strings?
Does this make any sense, and is it a proper way to do it? Or am I missing something?
You can simply make a random decision every time you need to decide on something. For example:
function A():
    if (50% random chance)
        return "1"
    else
        return concat(S(), S())
function S():
    if (50% random chance)
        return "0"
    else
        return concat(A(), A())
Calling S() multiple times gave me these outputs:
"0"
"00110110100100101111010111111111001111101011100100011000000110101110000110101110
10001000110001111100011000101011000001101111000110110011101010111111111011010011
10000000101111100100011011010000000101000111110010001000101001100110100111111111
1001010011"
"11"
"10010010101111010111101"
These are all valid strings for your grammar. Note that you may need to tweak the random chances a little: with these probabilities, the generator is very likely to produce tiny strings like "11", and occasionally very long ones like the second sample.
Try to think of the context-free grammar as a set of rules that allow you to generate new strings in a language. For example, the first rule:
S -> AA | 0
How could you generate a word S in this language? One way is with a function that generates, at random, either the string "0" or two A words, concatenated.
Similarly, to implement the second rule:
A -> SS | 1
write a function that generates, at random, either "1" or two S words concatenated.
You asked several questions...
Regarding the question "BTW, in C, is a two-dimensional array of chars equal to an array of strings?":
Yes.
Here are ways to declare arrays of strings; each example offers a different degree of flexibility:
char **ArrayOfStrings; //most flexible declaration -
//pointer to pointer; you can use `calloc()` or `malloc()` to create memory for
//any number of strings, each of any length (the lengths can differ)
or
char *ArrayOfStrings[10]; //somewhat flexible -
//array of 10 pointers to char; again you can use `c(m)alloc()` to allocate memory for
//each string, so each can have any length (the lengths can differ)
or
char ArrayOfStrings[5][10]; //Not flexible - (but still very useful)
//2-dimensional array of 5 strings, each with space for up to 9 chars + '\0'
//Note: In C, by definition, strings are always terminated by a null character ('\0').
Note: Although each of these forms is valid, and very useful when used correctly, it is good to be aware that there are differences in the way each will behave in practice. (Read the link for a good discussion of that.)

An array of length 4-20?

I'd like my array to have a set length using a simple format. Please let me know how this is done.
What I already have:
arr[100]
Pseudocode: what I would like to have:
arr[4-20] or arr[$min_int THROUGH $max_int]
Additional detail edit: The int should be within the range array = (4, 20). The input may contain leading zeros. I'd like to keep the length of the array restricted (i.e., to 9 or 10 characters).
Arrays simply do not work this way in C. You will need to implement it yourself by only looping through valid indices (and wasting memory in the process) or by using a data structure better suited to the job, like a map (which you will have to find in a library or write yourself as it does not exist in the language).
#define ARRMINIDX 4
#define ARRMAXIDX 20
int arrmem[ARRMAXIDX+1-ARRMINIDX];      /* one slot per index ARRMINIDX..ARRMAXIDX */
#define arr(x) arrmem[(x)-ARRMINIDX]    /* shift logical index x to a 0-based slot */
int i;
// process elements of arr
for( i = ARRMINIDX; i <= ARRMAXIDX; i++ )
    dosomething(arr(i));
OTOH, this may not be what you want at all, given your comment:
I want an array with 0-1 elements: a limited int or limited "numeric int"--string mimicking an int.
which I can't make heads or tails of in this context. Are you saying that you want a string of 4-20 chars that represents an integer?

New to programming, don't get 2D/3D arrays

Hey everyone, I'm basically new to programming. I've decided to try and get started with C (not C++ or C#) and so far I've been doing pretty well. I managed to get as far as two-dimensional arrays before I started to falter. While I think I broadly understand 2D integer arrays, I certainly don't understand 3D string arrays.
I'm learning by taking the techniques and applying them in an actual program I've created, an exchange rate "calculator" that basically asks the user to select a base currency and then prints its value in USD. There's no maths involved: I simply googled stuff like EUR/USD and set the values manually in the array, which I discuss below.
But here's where I'm getting stuck. I figure the best way to learn multi-dimensional arrays is to practically apply the theory, so here's what I've typed so far (I've omitted the other functions of my program (including the code which calls this function) for brevity):
char currencies[5][3][4] = {
{'1','2','3','4','5'},
{'GBP','EUR','JPY','CAD','AUD'},
{'1.5','1.23','0.11','0.96','0.87'}
};
int point, symbol, value;
displayarraycontents()
{
    for(point=1;point<5;point++){
        for(symbol=1;symbol<5;symbol++){
            for(value=1;symbol<5;symbol++)
                printf("%s ", currencies[point][symbol][value]);
            printf("\n");
        }
    }
}
Because C doesn't feature a string data type, building string arrays completely messes with my head.
Why currencies[5][3][4]? Because I'm storing a total of 5 currencies, each marked by a 3-letter symbol (eg EUR, CAD), which have a value of up to 4 digits, including the decimal point.
I'm trying to display this list:
1 GBP 1.5
2 EUR 1.23
3 JPY 0.11
4 CAD 0.96
5 AUD 0.87
When I click build, the line where I specify the values in the array is highlighted with several instances of this warning:
warning: overflow in implicit constant conversion
...and the line where I print the contents of the array is highlighted with this warning:
warning: format '%s' expects type 'char *', but argument 2 has type 'int'
Upon running the code, the rest of the program works fine except this function, which produces a "segmentation error" or somesuch.
Could somebody give me a hand here? Any help would be greatly appreciated, as well as any links to simple C 2D/3D string array initialisation tutorials! (my two books, the K&R and Teach Yourself C only provide vague examples that aren't relevant)
Thanks in advance!
-Ryan
EDIT: updated code using struct:
struct currency {
char symbol[4];
float value[5];
};
void displayarraycontents(){
int index;
struct currency currencies[] {
{"GBP", 1.50},
{"EUR", 1.23},
{"JPY", 0.11},
{"CAD", 0.96},
{"AUD", 0.87},};
}
I get the following errors:
main.c:99: error: nested functions are disabled, use -fnested-functions to re-enable
main.c:99: error: expected '=', ',', ';', 'asm' or 'attribute' before '{' token
main.c:100: error: expected ';' before '}' token
main.c:100: error: expected expression before ',' token
In the actual code window itself, every symbol is flagged as an "unexpected token".
In this case, you don't actually want a 3D array. In fact, since you have a table of values, all you need is a 1D array.
The tricky part is that each element of the array needs to store two things: the currency symbol, and the associated exchange rate. C has a way of building a type that stores two things - it's the struct mechanism. We can define a struct to hold a single currency:
struct currency {
char symbol[4];
char value[5];
};
(Note that this does not create a variable; it creates a type. struct currency is analogous to char, except that we defined the meaning of the former ourselves.)
...and we can now create an array of 5 of these:
struct currency currencies[5] = {
{"GBP", "1.5" },
{"EUR", "1.23" },
{"JPY", "0.11" },
{"CAD", "0.96" },
{"AUD", "0.87" } };
To iterate over them and print them out, the code would look like:
void displayarraycontents(void)
{
    int point;
    for(point = 0; point < 5; point++)
    {
        printf("%d %s %s\n", point + 1, currencies[point].symbol, currencies[point].value);
    }
}
You need to correct your array dimensions, and you also need to declare your strings as strings, not as multi-character constants:
char currencies[3][5][5] = {
{"1","2","3","4","5"},
{"GBP","EUR","JPY","CAD","AUD"},
{"1.5","1.23","0.11","0.96","0.87"}
};
Your logic for the array dimensions is wrong: what you want is 3 columns, each with 5 entries, each of which is a 5-byte char array (up to 4 characters plus the terminating '\0').
Your for loop should index from 0, not from 1.
There is also a mistake in the for statements:
for(point=1;point<5;point++)
The first item in an array is at position 0, so the for statements should look like this:
for(point=0;point<5;point++)
It would make more sense to use structs here rather than a multi-dimensional array.
#include <stdio.h>

typedef struct Currency {
    const char* symbol;
    double value;
} Currency;

Currency CURRENCIES[] = {
    {"GBP", 1.5},
    {"EUR", 1.23},
    {"JPY", 0.11},
    {"CAD", 0.96},
    {"AUD", 0.87},
};
size_t NUM_CURRENCIES = sizeof(CURRENCIES) / sizeof(Currency);

int main(void)
{
    size_t index;
    for (index = 0; index < NUM_CURRENCIES; index++)
    {
        printf("%zu %s %.2f\n",
               index + 1, CURRENCIES[index].symbol, CURRENCIES[index].value);
    }
    return 0;
}
It should be
char currencies[3][5][5] = {
because it contains 3 lists containing 5 strings each.
Each string has a max of 4 characters, but you need the additional NUL character, so 5 at the end.
-- EDIT
You have the array access confused. Using your array definition (fixed as above) it would be currencies[data_type][index] to get a string.
data_type = 0 -> the index
data_type = 1 -> the symbol
data_type = 2 -> the value
The first line
{'1','2','3','4','5'},
is redundant: the number can be computed from the position in the array.
Fixed code:
#include <stdio.h>

char currencies[2][5][5] = {
    {"GBP","EUR","JPY","CAD","AUD"},
    {"1.5","1.23","0.11","0.96","0.87"}
};
void displayarraycontents(void)
{
    int index;
    for(index = 0;index < 5;index++) {
        /* index + 1 so the list is numbered 1..5, as in the desired output */
        printf("%i %s %s\n", index + 1, currencies[0][index], currencies[1][index]);
    }
}
In C/C++ you would normally read your array dimensions from right to left to get a good idea of how the compiler will see them. In this case, you need to store strings of up to 4 characters each, which requires storage for 5 chars (to include the trailing \0), therefore [5] will be the right-most size. Next, you are storing groups of 5 items, therefore the middle value will be [5], and finally, you are storing a total of 3 groups of these items, therefore [3]. The final result of all of this is char currencies[3][5][5] = . . .;
Of course, as replied elsewhere, you need to use the double quotes for string values.
If you want to solve this with multi-dimensional arrays, as #Forrest says, you need [3][5][5]. Look at it this way: in the initializer, find the outermost braces: inside that, on the top level, how many elements are there? 3. Now, each of these elements (one level in), how many elements? 5. Drilling further down, inside each of those, you have a string of 4 elements, plus one for the terminator, again 5.
Second error: you can only ever have one character in single quotes, like 'a'; that's char type, and equivalent to ASCII code (97 in this case). For strings, you have to use double quotes ("abc", which is equivalent to {97, 98, 99, 0}).
Third error: loops. You don't need three loops when you print a whole string at a time (printf effectively does the innermost loop over characters for you), so you should only have 2 loops (or, less efficiently, you can keep all three loops but print only one character at a time). Also, you need to be aware of the loop limits: you are going up to 5 in each case, but this will give you runtime garbage (in the best case) or a runtime crash (in the worst case) when you go past the end of your [3] dimension. Thus, something like this:
Also, your innermost loop is inconsistent in its variable usage: for(value=1;symbol<5;symbol++) tests and increments symbol instead of value (a copy-paste error).
However, you will almost never need to write code like this. You mainly use 2D arrays for matrix operations. Something like this should only need a one-dimensional array storing record elements.
struct currency {
    int id;
    char symbol[4];
    float value;
} currencies[5];
You don't need to store the indices (1-5), as you can access the array (0-4) and thus know the indices. You can encapsulate the other values in a struct or in two separate arrays, which gets your array(s) down to one dimension, as it should be. That way the items have proper types and you don't misuse two-dimensional arrays.
A 2D or 3D array shouldn't be filled with items that should be of different types; it is needed when you have items of the same type that have a logical 2D or 3D structure. The pixels on your screen are a good example of something that needs a 2D structure; coordinates in a 3D graph are a good example of something that needs a 3D structure.
