Selecting an element of an array without specifying an array in awk - arrays

I'd like to select a specific element of an array from a file with awk, where the file is not set up so that every entry is specified as part of an array. I plan to put this in a for loop or assign it to a variable to be used in arithmetic operations. However, I am finding that the way I select the element does not work when I assign it to a variable or use it in a for loop.
1 2 3 4
5 6 7 8
9 8 7 6
If these elements are not specified in awk as being part of an array, referencing them could be done with
FNR == 1 {print $3}
However, I cannot assign this as a variable to be used later, nor can I put this in a loop.
Is there another way to reference a single element of an array without having to restructure the input file?

You can read the file into an array, then access the array. When accessing the array, use split:
{ array[NR] = $0 }
After the input scanning is complete, array[42] gives you the contents of record #42, usually the 42nd line of the input. We can put in an END { ... } block where we process the array.
To get the third element of array[1], we can do this:
split(array[1], fields)
Now we have an array called fields. fields[3] holds the same datum that $3 held when the first record, which we assigned to array[1], was being processed.
In Awk we can also simulate two-dimensional arrays, by catenating multiple indices together with some unambiguous separator, like a space or dash.
{ for (i = 1; i <= NF; i++)
array[NR "-" i] = $i }
After this executes for every input record, we can access $3 from record 1 as array["1-3"]. The key 1-3 is a character string.
The expression NR "-" i in the loop body places several expressions next to each other with no operators in between. That denotes string catenation. When NR is 17 and i is 5, we get the string "17-5" and so on.
Since the number of fields per record is variable, we could have another array which gives the NF value for each element of array.
{ nf[NR] = NF;
  for (i = 1; i <= NF; i++)
    array[NR "-" i] = $i }
Now we know that if nf[17] is 5, the fields array["17-1"] through array["17-5"] are valid.
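For readers who find the key scheme easier to follow outside awk, here is a minimal Python sketch of the same bookkeeping. The function name `load_fields` and the dictionary names `cells` and `nf` are illustrative assumptions, not part of the awk answer; the point is only to show how a string key like "1-3" stands in for a two-dimensional index, and how a second table records the field count per record.

```python
def load_fields(lines):
    """Mirror the awk approach: one flat table keyed by "record-field"."""
    cells = {}  # plays the role of the awk `array`
    nf = {}     # plays the role of the awk `nf` array (fields per record)
    for nr, line in enumerate(lines, start=1):      # nr is like awk's NR
        fields = line.split()                       # whitespace split, like awk's default
        nf[nr] = len(fields)
        for i, field in enumerate(fields, start=1):  # awk fields are 1-based
            cells[f"{nr}-{i}"] = field
    return cells, nf


cells, nf = load_fields(["1 2 3 4", "5 6 7 8", "9 8 7 6"])
print(cells["1-3"])  # third field of the first record
```

The value stored under "1-3" can now be assigned to a variable or used inside a loop, which is what the question asked for.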

'highlighting' a string according to a given pattern

I have a bunch of strings such as
1245046126856123
5293812332348977
1552724141123171
7992612370048696
6912394320472896
I give the program a pattern which in this case is '123' and the expected output is
1245046126856[123]
52938[123]32348977
1552724141[123]171
79926[123]70048696
69[123]94320472896
To do this I record the indices where the pattern occurs in an array and then I make an empty array of chars and put '[' and ']' according to the indices. So far it works fine but when I have a string such as
12312312312312312123
I should get
[123][123][123][123][123]12[123]
However, I cannot find a way to record the indices in such a case. I use the Rabin-Karp algorithm for pattern matching, and the section where I calculate the indices at which to put the brackets is as follows:
if(j == M){
    index[k] = i; //beginning index
    index[k+1] = i+M+1; //finishing index
    if((k!=0)&&(index[k-1] == index[k])){
        index[k]++;
        index[k+1]++;
    }
    if((k!=0)&&(index[k] < index[k-1])){
        index[k] = index[k-2]+M+1;
        index[k+1] = i-index[k-1]+M+1;
    }
    k += 2;
}
i is the index where the pattern starts to occur,
j is the index where the algorithm terminates the pattern check (last character of the given pattern),
k is the index of the index array.
M is the length of the pattern
This results in a string (where only the brackets are placed) like this
[ ][ ][ ][ ][ ][ ]
but as you can see, there should be two empty spaces between the last two sets of brackets. How can I adjust the way I calculate the indexes so that I can place the brackets properly?
EDIT
I thought this was a Python question at first, so this is a Python answer, but it may help as pseudocode.
This piece of code should help you find all the indexes in the string at which the pattern occurs.
string = "12312312312312123"
ptrn = "123"
i = 0
indexes = []  # dynamic list (a fixed-size array of len(string) // len(ptrn) entries, or just len(string), would also do)
while True:
    i = string.find(ptrn, i)  # next occurrence of the pattern at or after position i
    if i == -1:  # no further occurrence in the rest of the string
        break
    indexes.append(i)  # record the index where this occurrence starts
    i += len(ptrn)  # resume after this match so that matches do not overlap
print(indexes)
If you would like to record every match, even one that overlaps a previous match, advance by one position instead of by the pattern length: replace i += len(ptrn) with i += 1. Simply removing the increment would make find return the same index forever. In C terms this is the same as checking every start position: for(int i=0; i<strlen(string); i++)
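Once the non-overlapping start indexes are known, the brackets can be placed while copying the string rather than by patching an index array afterwards. The following is a minimal sketch of that idea; the function name `highlight` is an assumption, and it uses Python's built-in `str.find` rather than the Rabin-Karp search from the question.

```python
def highlight(text, pattern):
    """Wrap each non-overlapping occurrence of pattern in [ ]."""
    if not pattern:
        return text  # an empty pattern would match everywhere; leave text as-is
    pieces = []
    pos = 0  # first character not yet copied to the output
    while True:
        hit = text.find(pattern, pos)
        if hit == -1:
            break
        pieces.append(text[pos:hit])          # unmatched text before the hit
        pieces.append("[" + pattern + "]")    # the bracketed match
        pos = hit + len(pattern)              # continue after this match
    pieces.append(text[pos:])                 # trailing unmatched text
    return "".join(pieces)


print(highlight("12312312312312312123", "123"))
# → [123][123][123][123][123]12[123]
```

Because the output is built left to right, the unmatched "12" between the last two matches is copied through unchanged, which is the part the index arithmetic in the question loses.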

Printing an empty 2d Array in C

I am writing a program that takes a 2D array and by use of a switch case, you can direct it to fill with random numbers and/or print the array among other things.
Before I fill the array with random numbers, I go to print the array just to see what it would do and I get this:
176185448 1 1 01430232144
32767180624652332767143023216832767
143023216832767 0 11430232192
32767176185344 1 0 14
143023220832767 0 0 0
0 0 0 0 0
This is my code for the print array function and I am passing plist from main:
void PrintArray2D(int plist[M][N])
{
    size_t row, column; //counter
    for (row = 0; row < M; ++row)
    {
        for (column = 0; column < N; ++column)
        {
            printf ("%5d" , plist[row][column]);
        }
        printf ("\n");
    }
}
My program is working fine otherwise. When I fill with random numbers and then print, my output is correct. The array size is 6 by 5. I'm just curious as to why anything is being printed at all when it is supposed to be an empty array, and even more curious about the specific output I am getting.
You are printing the values of an uninitialized array. In this case the behavior of the program is undefined. You may get anything, whether it is the result you expect or not.
The values you are getting may be whatever was previously stored at those locations (called garbage values). Your program may give erroneous results.
Are you initialising the array?
If not, you are more than likely getting the remains of whatever was in memory beforehand.
To initialise it to all zeros quickly, wherever it is defined have something like
int list[M][N] = {0};
Just a warning, the zero does not mean set all values to 0. It sets the first elements to the contents of the curly braces. So:
int values[M] = {1,2,3};
Would set the first three numbers to 1,2, and 3 and the remainder to zeros.
Local variables in C are not initialized for you. If you don't initialize a variable (such as an array), it contains whatever was stored at that memory location before, which can look like meaningless numbers or memory addresses.
That is what you are printing. Some languages, such as Java, do not allow this: reading an uninitialized local variable there is a compilation error.
I hope this helps

Comparison of two array elements and calculation

I have an issue with a section of code I wish to write. My problem is based around two arrays and the elements they encompass.
I have two arrays filled with numbers (relating to positions in a string). I wish to select the substrings between the positions. The elements in the first array are the start of the substrings and the elements in the second array are the ends of the substrings.
The code I have supplied reads in the file and makes it a string:
>demo_data
theoemijono
milotedjonoted
dademimamted
String:
theoemijonomilotedjonoteddademimamted
so what I want to happen is to extract the substring
emijonomiloted
emimamted
The code I have written takes each element of the first array and compares it with the corresponding element of the second array, to ensure that there is no crossover and that each substring starts with emi and ends with ted, as seen in the provided sequences.
for ($i = 0; $i <= 10; $i++)
{
    if ($rs1_array[$i] < $rs2_array[$i] && $rs1_array[$i+1] > $rs2_array[$i])
    {
        my $size = $rs2_array[$i] - $rs1_array[$i] + 3;
        my $substr = substr($seq, $rs1_array[$i], $size);
        print $substr."\n";
    }
}
Using this code works for the first substring, but the second substring is ignored as the first array has fewer elements and hence the comparison cannot be completed.
UPDATE
Array structures:
@rs1_array = (4, 28);
@rs2_array = (15, 22, 34);
Hi Borodin, you were absolutely correct. I have edited the code now. Thank you for spotting the length issue. The reason for the strange offset is that each value in @rs2_array is a start position: it does not account for the rest of the word "ted", which I need to complete the substring. The arrays are built correctly: the elements of @rs1_array are the start positions of each "emi", and the elements of @rs2_array are the start positions of each "ted". Since there are two emi's and three ted's in the string, the arrays are unbalanced.
my @starts = ( 4, 28 );
my @ends = map $_+3, ( 15, 22, 34 );
my $starts_idx = my $ends_idx = 0;
while ($starts_idx < @starts && $ends_idx < @ends) {
    if ($starts[$starts_idx] > $ends[$ends_idx]) {
        ++$ends_idx;   # this end lies before the current start, so skip it
        next;
    }
    my $length = $ends[$ends_idx] - $starts[$starts_idx];
    say substr($seq, $starts[$starts_idx], $length);
    ++$ends_idx;
    ++$starts_idx;
}
Which, of course, gives the same output as:
say for $seq =~ /(emi(?:(?!emi|ted).)*ted)/sxg;
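The two-index walk can also be sketched in Python, which may help as a way to check the index arithmetic by hand. The function name `extract_spans` and the parameter `end_len` (the length of the closing word, 3 for "ted") are assumptions made for this illustration.

```python
def extract_spans(seq, starts, end_starts, end_len=3):
    """Pair each start with the first end that follows it."""
    # end_starts holds where each closing word begins; add its length
    # so the slice includes the closing word itself.
    ends = [e + end_len for e in end_starts]
    spans = []
    si = ei = 0
    while si < len(starts) and ei < len(ends):
        if starts[si] > ends[ei]:
            ei += 1  # this end precedes the current start; skip it
            continue
        spans.append(seq[starts[si]:ends[ei]])
        si += 1
        ei += 1
    return spans


seq = "theoemijonomilotedjonoteddademimamted"
print(extract_spans(seq, [4, 28], [15, 22, 34]))
# → ['emijonomiloted', 'emimamted']
```

The extra "ted" at position 22 is skipped because it ends before the second "emi" begins, so the two arrays do not need to be the same length.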

C programming: function that keeps track of word lengths & counts?

I'm having trouble articulating what the function is supposed to do, so I think I will just show you guys an example. Say my program opens and scans a text file, which contains the following:
"The cat chased after the rooster to no avail."
Basically the function I'm trying to write is supposed to print out how many 1 letter words there are (if any), how many 2 letter words there are, how many 3 letter words, etc.
"
Length Count
2 2
3 3
5 2
6 1
7 1
"
Here's my attempt:
int word_length(FILE *fp, char file[80], int count)//count is how many total words there are; I already found this in main()
{
printf("Length\n");
int i = 0, j = 0;
while(j < count)
{
for(i = 0; i < count; i++)
{
if(strlen(file[i] = i)
printf("%d\n", i);
}//I intended for the for loop to print the lengths
++i;
printf("Count\n");
while()//How do you print the counts in this case?
}
}
I think the way I set up the loops causes words of the same length to be printed twice...so it'd look something like this, which is wrong. So how should I set up the loops?
"Length Count
2 1
2 2
"
This sounds like homework, so I will not write code for you, but will give you some clues.
1. To hold several values you will need an array. The element with index i will contain the counter for words of length i.
2. Find a way to identify the boundaries of words (space, period, beginning of line, etc.). Then count the number of characters between boundaries.
3. Increase the relevant counter (see tip 1). Repeat.
Some details. You actually want to map one thing to another: the length of a word to the number of such words. For mapping there is a special data type, usually called a hash (table) or dictionary. But in your case an array works perfectly well as a map, because your keys are uniform and contiguous (1, 2, ... up to some maximum word length).
You can't use a single int to count all of that. You need an array and then in it at position 0 you keep track of how many 1 letter words, at position 1 you accumulate 2 letter words and so on.
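To make the array-as-map idea concrete without giving a C solution, here is a minimal Python sketch. It indexes the counters directly by word length (so index 0 is unused), treats a word as a run of ASCII letters, and caps lengths at an assumed `max_len`; the function name `length_counts` and that cap are illustrative choices, not something stated in the answers.

```python
import re


def length_counts(text, max_len=32):
    """Return a list where counts[n] is the number of words of length n."""
    counts = [0] * (max_len + 1)  # index 0 stays unused
    # A "word" here is a run of ASCII letters; punctuation and spaces
    # act as boundaries.
    for word in re.findall(r"[A-Za-z]+", text):
        counts[min(len(word), max_len)] += 1  # overlong words share the last slot
    return counts


counts = length_counts("The cat chased after the rooster to no avail.")
print("Length Count")
for length, count in enumerate(counts):
    if count:  # print only lengths that actually occur
        print(length, count)
```

Each length is printed once because the loop walks the counters, not the words, which avoids the repeated "2 1 / 2 2" rows from the attempt in the question.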

mergesort algorithm

First let me say that this is hw so I am looking for more advice than an answer. I am to write a program to read an input sequence and then produce an array of links giving the values in ascending order.
The first line of the input file is the length of the sequence (n) and each of the remaining n lines is a non-negative integer. The first line of the output indicates the subscript of the smallest input value. Each of the remaining output lines is a triple consisting of a subscript along with the corresponding input sequence and link values.
(The link values are not initialized before the recursive sort begins. Each link will be initialized to -1 when its sequence value is placed in a single element list at bottom of recursion tree)
The output looks something like this:
0 3 5
1 5 2
2 6 3
3 7 -1
4 0 6
5 4 1
6 1 7
7 2 0
Where (I think) the first column is the subscripts, the center column is the unsorted array, and the last column is the link values. I already have the code for the mergeSort and understand how it works; I am only confused about how the links get put into place.
I used vector of structures to hold the three values of each line.
The major steps are:
initialize the indexes and read the values from the input
sort the vector by value
determine the links
sort (back) the vector by index
Here is a sketch of the code:
struct Element {
    int index;
    int value;
    int nextIndex; // link
};
Element V[N + 1];
int startIndex;
// for each i in 0..N:
V[i].index = i;
V[i].value = read_from_input;
sort(V); // by value
startIndex = V[0].index;
// for each i in 0..N-1:
V[i].nextIndex = V[i + 1].index;
V[N].nextIndex = -1;
sort(V); // by index
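The steps above can be sketched in a few lines of Python to check them against the sample output in the question. This is only an illustration of how the links follow from the sorted order: it uses Python's built-in `sorted` rather than a hand-written merge sort, and the function name `build_links` is an assumption.

```python
def build_links(values):
    """Return (start, links): links[i] is the subscript of the next larger value."""
    # Subscripts arranged in ascending order of their values.
    order = sorted(range(len(values)), key=lambda i: values[i])
    links = [-1] * len(values)  # the largest value keeps the -1 link
    for cur, nxt in zip(order, order[1:]):
        links[cur] = nxt        # each element points at its successor
    start = order[0] if order else -1
    return start, links


values = [3, 5, 6, 7, 0, 4, 1, 2]  # the middle column of the sample output
start, links = build_links(values)
print(start)
for i, (v, link) in enumerate(zip(values, links)):
    print(i, v, link)
```

Writing the links straight into a list indexed by subscript makes the final "sort back by index" step unnecessary in this sketch; in the recursive merge sort from the assignment, the same links would instead be set while merging the sublists.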
