Find a string value in an array of strings

I have an array of strings in a Fortran program. Not a single character string. I know that one of the values in the array is "foo". I want to know the index of the array that contains "foo". Is there a way to find the index other than a brute force loop? I obviously can't use the "minloc" routine since I'm not dealing with numerics here. Again, just to make sure: I am not searching for a substring in a string. I am searching for a string in an array of strings.

implicit none
integer i
character*8 a(100)
do i = 1,100
a(i)='foo'
enddo
a(42)='bar'
call f(a,len(a(1)),size(a)*len(a(1)),'bar ')
end
subroutine f(c,n,all,s)
implicit none
integer n,all
character*(*) s
character*(all) c
write(*,*)index(c,s)/n+1
end
a.out -> 42
Note that this code treats the entire array as one big string and searches it for a substring, so it will also find matches that are not aligned with the element boundaries. Also, if the string is absent, INDEX returns 0 and the expression reports index 1.
E.g. a false match occurs with entries such as:
a(2)='xxbar '
a(3)=' yyy'
Some additional work is required to ensure that the match position minus one is an integer multiple of n, i.e. that the match starts on an element boundary (of course, by the time you do that, a simple loop might look preferable).
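The alignment fix described above can be sketched in C by treating the array the way `f` does, as one flat buffer of fixed-width records. The name `find_aligned` and its argument order are illustrative assumptions. It accepts a substring hit only when the hit begins on a record boundary and otherwise keeps searching; like the Fortran code it returns a 1-based index, with 0 meaning "not found".

```c
#include <stddef.h>
#include <string.h>

/* Search `count` fixed-width records of `width` chars, stored back to back
   in `buf` (like a CHARACTER*8 array viewed as one long string), for `key`.
   Only hits that start on a record boundary count; unaligned hits are
   skipped and the scan resumes one character later.
   Returns the 1-based record index, or 0 if there is no aligned hit. */
size_t find_aligned(const char *buf, size_t width, size_t count,
                    const char *key, size_t keylen)
{
    size_t total = width * count;
    if (keylen == 0 || keylen > total)
        return 0;
    for (size_t off = 0; off + keylen <= total; off++) {
        if (memcmp(buf + off, key, keylen) != 0)
            continue;                 /* no substring match at this offset */
        if (off % width == 0)
            return off / width + 1;   /* aligned: a genuine element match */
        /* unaligned: a false match inside or across elements, keep going */
    }
    return 0;
}
```

As the answer predicts, once the alignment check is in place this is no simpler than looping over the elements directly.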

Well, after thinking about it, I came up with this. It works if "foo" is known to be either absent from the array, or located in one and only one place:
character(len=3) :: tags(100)
integer :: test(100)
integer :: str_location, i
! populate "tags" however needed. Then search for "foo":
test=(/(i,i=1,100)/)
where (tags.ne."foo") test=0
str_location = sum(test)
I am guessing this is actually slower than the brute-force loop, but it makes for compact code. I thought about filling "test" with ones and using maxloc, but that doesn't account for the possibility of "foo" being absent from the array. Opinions?
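To make the trade-off concrete, here is the same mask-and-sum idea written out in C; `masked_index_sum` is a made-up name for illustration. Spelled out, it is visibly a full pass over the array with no early exit, which supports the guess that it cannot beat a loop that stops at the first hit. It also shows the failure mode when "foo" occurs more than once: the positions simply add up.

```c
#include <string.h>

/* C rendering of the WHERE/SUM trick: give each element its 1-based index,
   zero out the non-matches, and add everything up. The result is the
   position of `key` if it occurs exactly once, 0 if it is absent, and a
   meaningless sum of positions if it occurs more than once. */
int masked_index_sum(const char *tags[], int n, const char *key)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        int test = i + 1;                 /* test = (/(i,i=1,100)/)       */
        if (strcmp(tags[i], key) != 0)    /* where (tags.ne."foo") test=0 */
            test = 0;
        sum += test;                      /* str_location = sum(test)     */
    }
    return sum;
}
```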


How to write data of a single dimensional array to the next column in a file using Fortran 90?

Basically, I have many one-dimensional arrays that I need to write to the same file, but I need each array in a separate column, one after the other in the order specified. I would use a "do" loop, but some of the arrays may have more values than the others. Is there a format that I can use in the write statement that will start at the next column instead of continuing after the previous one?
Basically, you want a 'write' statement that can tell whether the element it is writing lies beyond the upper bound of the array, and write a blank if it does. A format cannot perform such checks, and I doubt there is a shortcut for what you require. One way to do it is to convert the numbers into an array of character strings and, wherever you go beyond the end of a number array, just leave the entry blank. Below is an implementation of this idea.
module sub
contains
subroutine tovalue(num_array,dim,value_array)
implicit none
real*8,intent(in)::num_array(:)
integer,intent(in)::dim
character(len=*),intent(inout)::value_array(:,:)
integer::i,n
n = size(value_array,2)
do i=1,n
if (i<=size(num_array)) then   ! this column still has a value for row i
write(value_array(dim,i),"(F6.1)") num_array(i)
else
write(value_array(dim,i),"(6X)")
end if
end do
end subroutine
end module
This subroutine can be used in the following way:
program test
use sub
implicit none
real*8::A(10),B(12),C(15)
integer::n
character(len=6),allocatable::values(:,:)
A = (/2,1,4,1,2,4,5,7,4,9/)
B = (/5,7,4,1,5,5,4,9,6,2,1,5/)
C = (/2,4,7,5,9,6,3,2,1,4,5,8,7,4,5/)
n = MAX(size(A),size(B),size(C))
allocate(values(3,n))
call tovalue(A,1,values)
call tovalue(B,2,values)
call tovalue(C,3,values)
print"(3A6)", values(:,1:n)
end program
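For comparison, the same padding idea in C: rather than building a character array first, each output row is formatted cell by cell, and a column that has run out of values contributes blanks. The function names and the 6-character cell width are assumptions chosen to mirror the F6.1 format above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build the text of row `row` (0-based) for columns of different lengths.
   Each cell uses "%6.1f" (like F6.1); a column with no value for this row
   gets 6 blanks (like 6X). `outsize` must be at least 1; if the text would
   not fit, the line is cut off safely at the last complete cell. */
void format_row(char *out, size_t outsize, size_t row,
                const double *cols[], const size_t lens[], size_t ncols)
{
    size_t used = 0;
    out[0] = '\0';
    for (size_t c = 0; c < ncols; c++) {
        int w = (row < lens[c])
                    ? snprintf(out + used, outsize - used, "%6.1f", cols[c][row])
                    : snprintf(out + used, outsize - used, "%6s", "");
        if (w < 0 || (size_t)w >= outsize - used)
            return;                    /* no room left: stop here */
        used += (size_t)w;
    }
}

/* Print as many rows as the longest column has elements. */
void print_columns(FILE *fp, const double *cols[], const size_t lens[], size_t ncols)
{
    size_t nrows = 0;
    for (size_t c = 0; c < ncols; c++)
        if (lens[c] > nrows)
            nrows = lens[c];
    size_t linesize = 16 * ncols + 1;  /* slack for cells wider than 6 chars */
    char *line = malloc(linesize);
    if (line == NULL)
        return;
    for (size_t r = 0; r < nrows; r++) {
        format_row(line, linesize, r, cols, lens, ncols);
        fprintf(fp, "%s\n", line);
    }
    free(line);
}
```

Calling `print_columns(stdout, cols, lens, 3)` with the three arrays from the Fortran program should give the same layout as `print "(3A6)"`.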

I need someone to explain extracting these strings from an InputBox

Delphi 2010
I have a procedure in which the user enters their name and surname, and then I extract the surname and name into two different strings. Can someone please explain the significance of the +1, the 3 and the pos(' ') in the code, and when those values would need to be changed (e.g. why is it +1 and not +2)? Thank you.
procedure TForm1.GenerateOnceoffPassword1Click(Sender: TObject);
var
suser, ssurname, sname, spassword : string;
arrpassword : array[1..150] of string;
begin
inc(icounter);
suser := inputbox('Enter name and surname','lower case ONLY','');
ssurname := copy(suser,pos(' ',suser)+1, 3);
sname := copy(suser, 1, pos(' ',suser)-1);
I assume you've looked up the Copy and Pos functions in the online help or elsewhere. So, dealing with the points in your question and comment:
a. The "+1" in "copy(suser,pos(' ',suser)+1, 3)" means that the call to Copy should start at the first character after the first occurrence of a space character in suser returned by the call to Pos(). If Pos() finds no space in suser, it will return 0, so copying would then start at the first character of suser. See also point 2 below.
b. The "3" means that Copy should copy (at most) 3 characters from where it has been told to start copying by "pos() + 1". I say "at most" because that's how Copy() works and nothing in your code compels the user to enter a string having 3 or more characters after the first space. Seems a bit odd that a surname should be restricted to a maximum of 3 characters, btw.
c. Presumably referring to "1,=1" in your comment, you actually meant "1,=-1". Anyway, the "1" in the second call to Copy() means "start copying from the first character of suser", and the "pos() - 1" means copy at most X characters, where X is one less than the value returned by the call to Pos(); in other words, copy the characters of suser up to the one before the first space. If there is no space in suser, Pos() returns 0, the count becomes -1, and sname ends up empty.
Be aware that:
When using functions like Pos() and Copy() to split strings up, it's a good idea to get into the habit of using the Trim() function to remove any leading or trailing spaces from the substring(s). In point a. above, your code as written overlooks the possibility that the user might type two (or more) consecutive spaces.
Rather than prompt the user to use lower-case only, it would be better to get into the habit of writing code which works regardless of case. Obviously this isn't an issue with the specific code in your question, but anyway.
Traditionally, strings in Delphi have been 1-based, meaning that, if non-blank, inter alia the string can be accessed as if it were an array with a starting index of 1. Newer versions of the compiler (newer than D2010, that is) for mobile platforms like Android use 0-based strings, which cause the arithmetic of code like yours to be problematic if used unmodified.
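To see the +1/-1 arithmetic in action, here is a rough C transliteration of the two Copy calls. The helpers `delphi_pos` and `delphi_copy` are hypothetical stand-ins that imitate the 1-based behaviour described in points a to c; they are not part of Delphi or of any C library, and clamping an index below 1 to 1 is an assumption of the sketch.

```c
#include <string.h>

/* Hypothetical stand-ins for Delphi's Pos and Copy, written only to show
   the 1-based arithmetic; they are not the Delphi RTL. */

/* 1-based position of the first `ch` in `s`, or 0 if absent (like Pos). */
static int delphi_pos(char ch, const char *s)
{
    const char *p = strchr(s, ch);
    return p ? (int)(p - s) + 1 : 0;
}

/* Copy at most `count` chars of `s`, starting at 1-based `index`, into dst
   (like Copy). A count <= 0 or an index past the end yields "".
   Assumption: an index below 1 is treated as 1. dstsize must be >= 1. */
static void delphi_copy(char *dst, size_t dstsize, const char *s, int index, int count)
{
    int len = (int)strlen(s);
    if (index < 1)
        index = 1;
    int avail = len - index + 1;        /* chars from s[index] to the end */
    if (avail < 0)
        avail = 0;
    if (count < 0)
        count = 0;
    if (count > avail)
        count = avail;
    if ((size_t)count >= dstsize)
        count = (int)dstsize - 1;       /* keep room for the '\0' */
    if (count > 0)
        memcpy(dst, s + index - 1, (size_t)count);
    dst[count] = '\0';
}

/* The two lines from the question, translated one to one. */
void split_name(const char *suser, char *sname, size_t namesize,
                char *ssurname, size_t surnamesize)
{
    int p = delphi_pos(' ', suser);
    delphi_copy(ssurname, surnamesize, suser, p + 1, 3); /* copy(suser,pos(' ',suser)+1, 3) */
    delphi_copy(sname, namesize, suser, 1, p - 1);       /* copy(suser, 1, pos(' ',suser)-1) */
}
```

With "john smith" this yields "john" and "smi"; with "john" (no space) the surname becomes "joh" and the name is empty; and with "john  smith" (two spaces) the surname starts with a blank, " sm", which is exactly why Trim() is worth using.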

matching brackets program in C

I am fairly new to C programming and I have a question about a bracket-matching algorithm:
Basically, for a CS assignment, we have to do the following:
We need to prompt the user for a string of 1-20 characters. We then need to report whether or not any brackets match up. We need to account for the following types of brackets "{} [] ()".
Example:
Matching Brackets
-----------------
Enter a string (1-20 characters): (abc[d)ef]gh
The brackets do not match.
Another Example:
Enter a string (1-20 characters): ({[](){}[]})
The brackets match
One of the requirements is that we do NOT use any stack data structures, but only the techniques below:
Data types and basic operators
Branching and looping programming constructs
Basic input and output functions
Strings
Functions
Pointers
Arrays
Basic modularisation
Any ideas of the algorithmic steps I need to take ? I'm really stuck on this one. It isn't as simple as counting the brackets, because the case of ( { ) } wouldn't work; the bracket counts match, but obviously this is wrong.
Any help to put me in the right direction would be much appreciated.
You can use recursion (this essentially also simulates a stack, which is the general consensus for what needs to happen):
When you see an opening bracket, recurse down.
When you see a closing bracket:
If it's matched (i.e. the same type as the opening bracket in the current function), process it and continue with the next character (don't recurse)
If it's not matched, fail.
If you see any other character, just move on to the next character (don't recurse)
If we reach the end of the string while we still have an opening bracket without a match, fail; otherwise succeed.
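The recursive scheme above might look like this in C; the function names are illustrative. Each opening bracket starts a new call that scans until it finds its own closer, so the C call stack does the bookkeeping and no stack data structure appears in the code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Closing partner of an opening bracket, or 0 if `c` is not one. */
static char closer_for(char c)
{
    switch (c) {
    case '(': return ')';
    case '[': return ']';
    case '{': return '}';
    default:  return 0;
    }
}

static bool is_closer(char c) { return c == ')' || c == ']' || c == '}'; }

/* Scan from *pos until the closer `want` is found (want == 0 means top
   level: scan to the end of the string). */
static bool match_from(const char *s, size_t *pos, char want)
{
    while (s[*pos] != '\0') {
        char c = s[(*pos)++];
        char close = closer_for(c);
        if (close != 0) {
            if (!match_from(s, pos, close))  /* opening bracket: recurse down */
                return false;
        } else if (is_closer(c)) {
            return c == want;                /* right type, or wrong/stray closer */
        }
        /* any other character: just move on */
    }
    return want == 0;  /* end of string: fine only if nothing is still open */
}

bool brackets_match(const char *s)
{
    size_t pos = 0;
    return match_from(s, &pos, 0);
}
```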
You are describing a context-free language here, and you need to verify whether a given word is in the language or not.
This means that there is a context-free grammar that describes this language.
For this specific language, a deterministic pushdown (stack) automaton can verify whether a word is in the language (this is not true for every context-free language; some require a nondeterministic pushdown automaton).
Note that you can use recursion to imitate a stack, using the implicit call stack for it.
Another alternative (which works for all context-free languages) is the CYK algorithm, but it's overkill here.
So you're not allowed to use stacks..but you ARE allowed to use arrays! This is good.
This might be against the rules, but you can mimic a stack with an array. Keep an index to the "next open spot" in the array, and make sure you do all of your insertions / deletions from that index.
My suggestion? Parse each character in the string, and use the "stack" described above to determine when to add and remove brackets / parens / curlies.
Here is the easiest way to do it using no regex/complicated language stuff.
The only thing you need is a simple array of maximum length 10 to simulate a stack. You need this to keep track of the last bracket type opened. Every time you open a bracket, you will "push" the bracket type onto the end of the array. Every time you close a bracket, you will "pop" the bracket type off the end of the array if and only if the bracket types match.
Algorithm:
Iterate over each character in the string.
When you encounter an opening bracket of any type, append it to your array. If your array is full (i.e. you are already storing 10 open bracket types) and you can't append it, you already know that the brackets do not match, since a string of at most 20 characters cannot close more than 10 brackets, and you can end your program.
When you encounter a closing bracket of any type, if the array is empty or the closing-bracket type does not match the last element of your array, you already know that the brackets do not match and you can end the program, printing that they don't match. Otherwise, the closing-bracket type matches the last element of your array, so "pop" it off the end of your array.
Finally, if the array is empty at the end of your iteration, then you know that the brackets match.
EDIT: It has been pointed out to me in the comments that this is an explicit stack and that recursion may be a better method of using an implicit stack.
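A minimal C version of the array-based algorithm above; the name `brackets_match_array` is made up. `MAX_OPEN` is 10 on the assumption, taken from the assignment, that the input has at most 20 characters; longer inputs would need a larger array, otherwise a long but balanced string could be rejected.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_OPEN 10  /* input is at most 20 chars, so at most 10 can be closed */

bool brackets_match_array(const char *s)
{
    char open[MAX_OPEN];  /* plain array used as a stack */
    int top = 0;          /* index of the next free slot */
    for (size_t i = 0; s[i] != '\0'; i++) {
        char c = s[i];
        if (c == '(' || c == '[' || c == '{') {
            if (top == MAX_OPEN)
                return false;          /* too many openers to ever close */
            open[top++] = c;           /* "push" */
        } else if (c == ')' || c == ']' || c == '}') {
            char want = (c == ')') ? '(' : (c == ']') ? '[' : '{';
            if (top == 0 || open[top - 1] != want)
                return false;          /* nothing open, or wrong type */
            top--;                     /* "pop" */
        }
    }
    return top == 0;                   /* everything opened was closed */
}
```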
As amit answered, you definitely need some sort of stack. This can be mathematically proven. However, you can avoid using stack data structures in your code by using the compiler's stack mechanism. This requires you to use recursive function calls.

Is there a known O(nm)-time/O(1)-space algorithm for POSIX filename matching (fnmatch)?

Edit: WHOOPS! Big admission, I screwed up the definition of the ? in fnmatch pattern syntax and seem to have proposed (and possibly solved) a much harder problem where it behaves like .? in regular expressions. Of course it actually is supposed to behave like . in regular expressions (matching exactly one character, not zero or one). Which in turn means my initial problem-reduction work was sufficient to solve the (now rather boring) original problem. Solving the harder problem is rather interesting still though; I might write it up sometime.
On the plus side, this means there's a much greater chance that something like 2way/SMOA needle factorization might be applicable to these patterns, which in turn could yield the better-than-originally-desired O(n) or even O(n/m) performance.
In the question title, let m be the length of the pattern/needle and n be the length of the string being matched against it.
This question is of interest to me because all the algorithms I've seen/used either have pathologically bad performance and possible stack-overflow exploits due to backtracking, or require dynamic memory allocation (e.g. for a DFA approach or just to avoid doing backtracking on the call stack) and thus have failure cases that could also be dangerous if a program is using fnmatch to grant/deny access rights of some sort.
I'm willing to believe that no such algorithm exists for regular expression matching, but the filename pattern language is much simpler than regular expressions. I've already simplified the problem to the point where one can assume the pattern does not use the * character, and in this modified problem you're not matching the whole string but searching for an occurrence of the pattern in the string (like the substring match problem). If you further simplify the language and remove the ? character, the language is just composed of concatenations of fixed strings and bracket expressions, and this can easily be matched in O(mn) time and O(1) space, which perhaps can be improved to O(n) if the needle factorization techniques used in 2way and SMOA substring search can be extended to such bracket patterns. However, naively each ? requires trials with or without the ? consuming a character, bringing in a time factor of 2^q where q is the number of ? characters in the pattern.
Anyone know if this problem has already been solved, or have ideas for solving it?
Note: In defining O(1) space, I'm using the transdichotomous model.
Note 2: This site has details on the 2way and SMOA algorithms I referenced: http://www-igm.univ-mlv.fr/~lecroq/string/index.html
Have you looked into the re2 regular expression engine by Russ Cox (of Google)?
It's a regular expression matching engine based on deterministic finite automata, which is different than the usual implementations (Perl, PCRE) using backtracking to simulate a non-deterministic finite automaton. One of the specific design goals was to eliminate the catastrophic backtracking behaviour you mention.
It disallows some of the Perl extensions like backreferences in the search pattern, but you don't need that for glob matching.
I'm not sure if it guarantees O(mn) time and O(1) memory constraints specifically, but it was good enough to run the Google Code Search service while it existed.
At the very least it should be cool to look inside and see how it works. Russ Cox has written three articles about re2 - one, two, three - and the re2 code is open source.
Edit: WHOOPS! Big admission, I screwed up the definition of the ? in fnmatch pattern syntax and seem to have solved a much harder problem where it behaves like .? in regular expressions. Of course it actually is supposed to behave like . in regular expressions (matching exactly one character, not zero or one). Which in turn means my initial problem-reduction work was sufficient to solve the (now rather boring) original problem. Solving the harder problem is rather interesting still though; I might write it up sometime.
Possible solution to the harder problem follows below.
I have worked out what seems to be a solution in O(log q) space (where q is the number of question marks in the pattern, and thus q < m) and uncertain but seemingly better-than-exponential time.
First of all, a quick explanation of the problem reduction. First break the pattern at each *; it decomposes into a (possibly zero-length) initial and final component, and a number of internal components flanked on both sides by a *. This means that once we've determined whether the initial/final components match up, we can apply the following algorithm for internal matches: starting with the last component, search for the match in the string that starts at the latest offset. This leaves the most possible "haystack" characters free to match earlier components; if they're not all needed, it's no problem, because the fact that a * intervenes allows us to later throw away as many as needed, so it's not beneficial to try "using more ? marks" of the last component or finding an earlier occurrence of it. This procedure can then be repeated for every component. Note that here I'm strongly taking advantage of the fact that the only "repetition operator" in the fnmatch expression is the * that matches zero or more occurrences of any character. The same reduction would not work with regular expressions.
With that out of the way, I began looking for how to match a single component efficiently. I'm allowing a time factor of n, so that means it's okay to start trying at every possible position in the string, and give up and move to the next position if we fail. This is the general procedure we'll take (no Boyer-Moore-like tricks yet; perhaps they can be brought in later).
For a given component (which contains no *, only literal characters, brackets that match exactly one character from a given set, and ?), it has a minimum and maximum length string it could match. The minimum is the length if you omit all ? characters and count bracket expressions as one character, and the maximum is the length if you include ? characters. At each position, we will try each possible length the pattern component could match. This means we perform q+1 trials. For the following explanation, assume the length remains fixed (it's the outermost loop, outside the recursion that's about to be introduced). This also fixes a length (in characters) from the string that we will be comparing to the pattern at this point.
Now here's the fun part. I don't want to iterate over all possible combinations of which ? characters do/don't get used. The iterator is too big to store. So I cheat. I break the pattern component into two "halves", L and R, where each contains half of the ? characters. Then I simply iterate over all the possibilities of how many ? characters are used in L (from 0 to the total number that will be used based on the length that was fixed above) and then the number of ? characters used in R is determined as well. This also partitions the string we're trying to match into part that will be matched against pattern L and pattern R.
Now we've reduced the problem of checking if a pattern component with q ? characters matches a particular fixed-length string to two instances of checking if a pattern component with q/2 ? characters matches a particular smaller fixed-length string. Apply recursion. And since each step halves the number of ? characters involved, the number of levels of recursion is bounded by log q.
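A compact C sketch of this divide-and-conquer idea for the harder variant, where `?` matches zero or one arbitrary character. It assumes components contain only literals and `?` (no bracket expressions), and `match_fixed` is an illustrative name. One deliberate variation: it splits at the middle `?` itself and tries 0 or 1 characters for it, which keeps each half at no more than half of the question marks. Each frame holds only a few counters and the recursion depth is about log2(q), so the extra space is O(log q); no claim is made here about the running time.

```c
#include <stdbool.h>
#include <string.h>

static size_t count_q(const char *p, size_t plen)
{
    size_t q = 0;
    for (size_t i = 0; i < plen; i++)
        if (p[i] == '?')
            q++;
    return q;
}

/* Does pattern p[0..plen) (literals and '?', where '?' matches zero or one
   character) match exactly the string s[0..slen)? */
bool match_fixed(const char *p, size_t plen, const char *s, size_t slen)
{
    size_t q = count_q(p, plen);
    size_t minlen = plen - q;                 /* every '?' matches nothing */
    if (slen < minlen || slen > minlen + q)   /* impossible length */
        return false;
    if (q == 0)                               /* only literals left */
        return plen == 0 || memcmp(p, s, plen) == 0;

    /* Split at the middle '?': L keeps the '?'s before it, R those after,
       so each half has at most q/2 of them. */
    size_t target = (q + 1) / 2, seen = 0, mid = 0;
    for (size_t i = 0; i < plen; i++)
        if (p[i] == '?' && ++seen == target) {
            mid = i;
            break;
        }
    const char *lp = p, *rp = p + mid + 1;
    size_t llen = mid, rlen = plen - mid - 1;

    /* Try each length for L's share of s and 0 or 1 chars for the middle
       '?'; R must then match exactly what is left. */
    for (size_t ls = 0; ls <= slen; ls++)
        for (size_t use = 0; use <= 1 && ls + use <= slen; use++)
            if (match_fixed(lp, llen, s, ls) &&
                match_fixed(rp, rlen, s + ls + use, slen - ls - use))
                return true;
    return false;
}
```

Searching for a component inside a longer string would then try every start position and every length between the minimum and maximum, as described above.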
You can create a hash of both strings and then compare the hashes. The hash computation takes O(m), while the search takes O(m + n).
You can use something like this for calculating the hash of the string where s[i] is a character
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
As you said this is for file-name matching and you can't use this where you have wildcards in the strings. Good luck!
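A sketch of the hash above in C, plus the rolling update that makes the O(m + n) search possible; this is the Rabin-Karp technique, and the function names are illustrative. It relies on unsigned 64-bit arithmetic, whose wraparound is well defined in C, and it re-checks every hash hit with memcmp, since equal hashes do not prove equal strings.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* h = s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], via Horner's rule,
   computed modulo 2^64. */
uint64_t poly_hash(const char *s, size_t n)
{
    uint64_t h = 0;
    for (size_t i = 0; i < n; i++)
        h = h * 31 + (unsigned char)s[i];
    return h;
}

/* Slide a length-m window one char right: drop `out`, append `in`.
   pow_m1 must be 31^(m-1) mod 2^64. */
uint64_t roll(uint64_t h, unsigned char out, unsigned char in, uint64_t pow_m1)
{
    return (h - out * pow_m1) * 31 + in;
}

/* Index of the first occurrence of needle[0..m) in hay[0..n), or -1. */
long rk_find(const char *hay, size_t n, const char *needle, size_t m)
{
    if (m == 0)
        return 0;
    if (m > n)
        return -1;
    uint64_t pow_m1 = 1;
    for (size_t i = 1; i < m; i++)
        pow_m1 *= 31;
    uint64_t hn = poly_hash(needle, m);
    uint64_t hh = poly_hash(hay, m);           /* hash of hay[0..m) */
    for (size_t i = 0;; i++) {
        if (hh == hn && memcmp(hay + i, needle, m) == 0)
            return (long)i;                    /* confirmed, not just a hash hit */
        if (i + m >= n)
            return -1;                         /* window cannot move further */
        hh = roll(hh, (unsigned char)hay[i], (unsigned char)hay[i + m], pow_m1);
    }
}
```

As the answer notes, this only handles literal strings; `?` and `*` wildcards break the fixed hash.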
My feeling is that this is not possible.
Though I can't provide a bullet-proof argument, my intuition is that you will always be able to construct patterns containing q=Theta(m) ? characters where it will be necessary for the algorithm to, in some sense, account for all 2^q possibilities. This will then require O(q)=O(m) space to keep track of which of the possibilities you're currently looking at. For example, the NFA algorithm uses this space to keep track of the set of states it's currently in; the brute-force backtracking approach uses the space as stack (and to add insult to injury, it uses O(2^q) time in addition to the O(q) of space).
OK, here's how I solved the problem.
1. Attempt to match the initial part of the pattern, up to the first *, against the string. If this fails, bail out. If it succeeds, throw away this initial part of both the pattern and the string; we're done with them. (And if we hit the end of the pattern before hitting a *, we have a match iff we also reached the end of the string.)
2. Skip all the way to the end of the pattern (everything after the last *, which might be a zero-length pattern if the pattern ends with a *). Count the number of characters needed to match it, and examine that many characters from the end of the string. If they fail to match, we're done. If they match, throw away this component of the pattern and string.
3. Now we're left with a (possibly empty) sequence of subpatterns, all of which are flanked on both sides by *'s. We search for them sequentially in what remains of the string, taking the first match for each and discarding the beginning of the string up through the match. If we find a match for each component in this manner, we have a match for the whole pattern. If any component search fails, the whole pattern fails to match.
This algorithm has no recursion and only stores a finite number of offsets into the string/pattern, so in the transdichotomous model it's O(1) space. Step 1 was O(m) in time, step 2 was O(n+m) in time (or O(m) if we assume the input string length is already known, but I'm assuming a C string), and step 3 is (using a naive search algorithm) O(nm). Thus the algorithm overall is O(nm) in time. It may be possible to improve step 3 to be O(n) but I haven't yet tried.
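Here is a minimal C sketch of the three steps above, under simplifying assumptions: only literal characters, `?` (exactly one character) and `*` are supported, with no bracket expressions, escapes or fnmatch flags, and `glob_match` is a made-up name, not the `fnmatch` API. It keeps only a handful of offsets, and step 3 uses the naive O(nm) search.

```c
#include <stdbool.h>
#include <string.h>

/* Does component p[0..plen) (literals and '?', no '*') match at s? */
static bool comp_at(const char *p, size_t plen, const char *s)
{
    for (size_t i = 0; i < plen; i++)
        if (p[i] != '?' && p[i] != s[i])
            return false;
    return true;
}

bool glob_match(const char *p, const char *s)
{
    size_t plen = strlen(p), slen = strlen(s);

    /* Step 1: the part before the first '*' must match the start of s. */
    const char *star = memchr(p, '*', plen);
    if (star == NULL)                          /* no '*': whole-string match */
        return plen == slen && comp_at(p, plen, s);
    size_t head = (size_t)(star - p);
    if (head > slen || !comp_at(p, head, s))
        return false;
    p += head; plen -= head; s += head; slen -= head;  /* p starts with '*' */

    /* Step 2: the part after the last '*' must match the end of s. */
    size_t tail = 0;
    while (p[plen - 1 - tail] != '*')
        tail++;
    if (tail > slen || !comp_at(p + plen - tail, tail, s + slen - tail))
        return false;
    plen -= tail; slen -= tail;

    /* Step 3: p is now "*c1*c2*...*"; take the leftmost match of each ci. */
    size_t i = 0;
    while (i < plen) {
        while (i < plen && p[i] == '*')
            i++;
        size_t j = i;
        while (j < plen && p[j] != '*')
            j++;
        size_t clen = j - i;                   /* component is p[i..j) */
        if (clen == 0)
            break;
        bool found = false;
        for (size_t k = 0; k + clen <= slen; k++) {   /* naive search */
            if (comp_at(p + i, clen, s + k)) {
                s += k + clen;                 /* discard up through the match */
                slen -= k + clen;
                found = true;
                break;
            }
        }
        if (!found)
            return false;
        i = j;
    }
    return true;
}
```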
Finally, note that the original harder problem is perhaps still useful to solve. That's because I didn't account for multi-character collating elements, which most people implementing regex and such tend to ignore because they're ugly to get right and there's no standard API to interface with the system locale and obtain the necessary info to get them. But with that said, here's an example: Suppose ch is a multi-character collating element. Then [c[.ch.]] could consume either 1 or 2 characters. And we're back to needing the more advanced algorithm I described in my original answer, which I think needs O(log m) space and perhaps somewhat more than O(nm) time (I'm guessing O(n²m) at best). At the moment I have no interest in implementing multi-character collating element support, but it does leave a nice open problem...

Arrays and derived types

For my new project, I have to use an array instead of a scratch file to store information from users. To do this, I need to create derived types, too.
However, I haven't understood what an array is and what a derived type is, how to use them, what they can do, and some other basic ideas.
Can anyone give me some information about array and derived types?
I wrote code for them, but I don't know if it is written correctly.
If anyone can check this for me, I would appreciate it.
Here are my array and derived types:
! derived type
TYPE Bank
INTEGER :: acNumber, acChecks
REAL :: acBlance, acRate
CHARACTER :: acType*1, acLName*15, acFName*15
END TYPE
! array
INTEGER, PARAMETER :: MaxRow, MaxColum = 7
INTEGER, DIMENSION(MaxRow:MaxColum) :: AccountData
If you are a Fortran programmer, you have probably seen subroutines that accept 10 or 15 arguments. If you think about it, that's insane (there are too many, and you run the risk of swapping them), and you quickly realize that some arguments always travel together.
It makes sense to pack them into a single entity that carries everything around as a whole, not as independent entities. This reduces the number of arguments considerably, leaving you only the burden of finding the proper grouping. This single entity is the type.
In your code, you say that a Bank is an aggregate of that information. You can now declare a concrete variable of that type, which will represent and provide access to the individual components acNumber, acChecks and so on. To do so, you use the % symbol. So if your bank variable is called b, you can write, for example
b%acNumber = 5
You can imagine b as a closet containing different shelves. If you move the closet, all the shelves and their contents move together.
An array is a group of entities of the same type (say, integer, or Character(len=1024), or Bank) laid out one after another, so you can access each of them with a numeric index. Remember that, unless specified otherwise, array indexes in Fortran start at 1 (in many other major languages, such as C, the first index is zero instead).
As for your code, I suggest that you:
write
INTEGER, DIMENSION(MaxRow:MaxColum) :: AccountData
as
INTEGER :: AccountData(MaxRow,MaxColum)
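Putting the bounds after the name instead of in a DIMENSION attribute means the same thing, but you write less. More importantly, there is a difference between using : and ,. DIMENSION(MaxRow:MaxColum) declares a one-dimensional array whose index runs from MaxRow to MaxColum; if you want a matrix (your case), which is a two-dimensional array, you have to use a comma. What you wrote is wrong. Also note that every PARAMETER needs a value, and in your declaration only MaxColum is given one, so MaxRow must be initialized as well.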
for the strings, it's better if you write
CHARACTER :: acType*1, acLName*15, acFName*15
as
CHARACTER(LEN=1) :: acType
CHARACTER(LEN=15) :: acLName
CHARACTER(LEN=15) :: acFName
In this case you write more, but the CHARACTER*n form you used is obsolescent in modern Fortran.
Also, remember that it's better if you write one member variable per line in the types. It's a matter of taste, but I prefer to see the full size of a type by having one line per member variable.
For MaxRow and MaxColum, I would write MAX_ROWS and MAX_COLUMNS. By tradition, parameters and other constants are identified with all-capital, underscore-separated names in many major languages.
Edit: to answer your comment, here is an example of the use of an array
$ more foo.f90
program test
integer :: myarray(10)
myarray = 0 ! equivalent to zeroing the single elements one by one
myarray(2) = 5
myarray(7) = 10
print *, myarray
end program
$ g95 foo.f90 -o foo
$ ./foo
0 5 0 0 0 0 10 0 0 0
An array is just like multiple variables with the same name, identified by an index. It is very useful for expressing vectors or matrices.
You can of course do an array of an aggregated type you define, instead of a predefined type (eg. integer).
An array is an ordered list of variables, all of the same type, indexed by integers. See Array in Wikipedia. Note that in Fortran, array indexing is more flexible than in most other low-level languages: instead of a single index per dimension, you can use an index triplet consisting of lower bound, upper bound, and stride. In that case the expression refers to a subarray (an array section) rather than a single element of the array.
A derived type is a composite type defined by the user, made up of multiple components which can be of different types. In some other languages these are known as structs, structure types, or record types. See Record in Wikipedia.
You can also make an array of a derived type, or you can have a derived type where one or more components are themselves arrays, or for that matter, other derived types. It's up to you!
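If C is more familiar, the same ideas map roughly as follows (an analogy only; Fortran syntax differs): a struct plays the role of TYPE Bank, `.` and `->` play the role of `%`, and an array of structs is an array of a derived type. The field names come from the question; the array size `MAX_ACCOUNTS` and the helper `set_account` are made up for illustration.

```c
#include <string.h>

/* C counterpart of TYPE Bank: one record grouping fields of mixed types. */
struct Bank {
    int   acNumber, acChecks;
    float acBlance, acRate;          /* spelling kept from the question */
    char  acType;                    /* CHARACTER(LEN=1) */
    char  acLName[16], acFName[16];  /* 15 chars + terminating '\0' */
};

/* An array of derived-type values: each element is one whole account.
   Note that C indexes it from 0, where Fortran defaults to 1. */
#define MAX_ACCOUNTS 7
struct Bank accounts[MAX_ACCOUNTS];

/* Fill one element; b->acNumber corresponds to b%acNumber in Fortran. */
void set_account(struct Bank *b, int number, const char *last, const char *first)
{
    b->acNumber = number;
    strncpy(b->acLName, last, sizeof b->acLName - 1);
    b->acLName[sizeof b->acLName - 1] = '\0';
    strncpy(b->acFName, first, sizeof b->acFName - 1);
    b->acFName[sizeof b->acFName - 1] = '\0';
}
```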
The easiest way to check your code is to try to compile it. Making it past the compiler is of course no guarantee that the program works as expected, but it certainly is a required step.
