How can I break up a list in Prolog, given a pivot?

I'm working on a small project to learn Prolog. What I'm trying to do right now is, given a sentence, return a list of words. So, I'm taking in a character array, e.g. "highs and lows", and trying to split it up into "highs" "and" "lows". I'm using a character array because I want to play with the words themselves, and I don't think strings work for that.
Here's my code.
get_first_word([], _, []):-
!.
get_first_word(X, Pivot, Input):-
append(X, [Pivot|_], Input),
!.
split_at([],_).
split_at(Input, Pivot):-
get_first_word(X, Pivot, Input),
writef(X),
append(X, Y, Input),
split_at(Y, Pivot).
The problem I'm getting is that this turns into an infinite loop. Eventually it'll pass itself empty input, and my base case isn't well-written enough to handle this. How do I fix this?

I think that get_first_word misses an argument: it should 'return' both the word and the rest, accounting for the possibility that Pivot doesn't appear in input.
I've also moved arguments to follow the conventional 'input at begin, output at end'.
get_first_word(Input, Pivot, Word, Rest):-
append(Word, [Pivot|Rest], Input), !.
get_first_word(Input, _Pivot, Input, []).
split_at([], _).
split_at(Input, Pivot):-
get_first_word(Input, Pivot, W, Rest),
writef(W),nl,
split_at(Rest, Pivot).
test:
?- split_at("highs and lows", 0' ).
highs
and
lows
true .
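For readers more at home outside Prolog, the same "word plus rest" idea can be sketched in Python. This is only an illustration that mirrors the corrected predicates above; the function names are borrowed from the Prolog code, and returning a list instead of printing is an assumption made for easier testing.

```python
def get_first_word(chars, pivot):
    """Return (word, rest): items before the first pivot, and items after it.

    If the pivot does not occur, the whole input is the word and rest is empty,
    which corresponds to the second get_first_word clause in the Prolog answer.
    """
    if pivot in chars:
        i = chars.index(pivot)
        return chars[:i], chars[i + 1:]
    return chars, []


def split_at(chars, pivot):
    """Split a sequence of characters into words at every pivot."""
    words = []
    rest = list(chars)
    while rest:  # an empty rest plays the role of the split_at([], _) base case
        word, rest = get_first_word(rest, pivot)
        words.append("".join(word))
    return words


print(split_at("highs and lows", " "))
```

Because every step hands a strictly shorter rest to the next iteration, the loop always terminates, which is the property the original two-argument version lacked.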

If you use SWI-Prolog, it is worth considering using atoms to represent sentences, words, parts of words and so on. If your sentence is an atom, your problem becomes:
?- atomic_list_concat(Ws, ' ', 'highs and lows').
Ws = [highs, and, lows].
There are further useful predicates, for example atom_concat/3 (we can say it is append/3 for atoms), or sub_atom/5 which can be useful in multiple ways.
As a side note, SWI-Prolog has no artificial limit on the length of atoms and actually recommends using atoms instead of strings or character code lists.

When describing lists (in this case: lists of character codes, which is what your strings are), always also consider using DCGs. For example:
string_pivot_tokens(Cs, P, Ts) :- phrase(tokens(Cs, P, []), Ts).
tokens([], _, Ts) --> token(Ts).
tokens([C|Cs], P, Ts) -->
( { C == P } -> token(Ts), tokens(Cs, P, [])
; tokens(Cs, P, [C|Ts])
).
token([]) --> [].
token([T|Ts]) --> { reverse([T|Ts], Token) }, [Token].
Example:
?- string_pivot_tokens("highs and lows", 0' , Ts), maplist(atom_codes, As, Ts).
Ts = [[104, 105, 103, 104, 115], [97, 110, 100], [108, 111, 119, 115]],
As = [highs, and, lows] ;
false.

Related

Square brackets operator in Matlab without comma between its two values

I'm having a hard time figuring out what this code does, because googling square brackets doesn't yield appropriate results for the way the search engine works.
id2 is a 1x265 array (so basically a 1d vector with 265 values)
m is a 1x245 array (so basically a 1d vector with 245 values)
id2 = id2([m m(end)+1]);
From what I've seen so far, there is always a comma between the first and second value in the square brackets.
If it was
id2 = id2[m, m(end)+1]
With my limited Matlab experience I would have known its meaning, but this is not the case here; I have never seen this form before.
The square brackets are also enclosed in brackets ( ) after id2 so this makes me think that
id2 = id2([m m(end)+1]) and id2 = id2[m, m(end)+1] are two completely different things.
Can you please explain what that code does?
[1, 2] and [1 2] are equivalent. Either a comma or a space can denote element separation when building arrays using square brackets.
Indexing, using parentheses (), has to be done using commas: A(3,1), not A(3 1). The same holds for argument lists in functions: mean(A,[],1) needs commas to separate the various parameters.
id2 = id2([m m(end)+1]); should then be clear: you build an array [m m(end)+1], i.e. you take m and add one extra element, m(end)+1, to its end. These should be integers, presumably, since they look like they are indexing into id2. Given the above, id2 = id2([m, m(end)+1]); is exactly equivalent.
I can recommend reading this post on the various ways of indexing in MATLAB.

Best way to print array/list in prolog without commas, brackets, or spacing?

So I am just getting started and fooling around with prolog.
Suppose I have a list of numbers as such:
X = [0,1,0,1,1,1,1,0]
and want to print those numbers to the screen without commas, spacing, or newlines like this:
?- write(X).
01011110
So far I have tried using write(X). which just prints the array, and I have fooled around with print_term using the pprint module but haven't had any success.
Right now I have a method to create a grid of 0's and 1's as such:
grid(0,[]).
grid(X,Y) :-
X > 0,
X1 is X-1,
random(0, 2, U),
Y = [U|T],
grid(X1,T).
The above method works as intended, I'm just not getting the output I desire on printing. If it changes things, I do intend to turn this into a 2d grid eventually.
Assuming all elements in the input are indeed numbers (or atoms), you can use the built-in predicate atomic_list_concat/2:
?- X = [0,1,0,1,1,1,1,0], atomic_list_concat(X,Y), write(Y).
01011110
If it were a matrix represented as a list of lists you could do something like this:
print_matrix([]).
print_matrix([Row|Rows]):-
atomic_list_concat(Row, TRow),
writeln(TRow),
print_matrix(Rows).
sample test:
?- print_matrix([[1,0,0,1], [0,1,1,0]]).
1001
0110

Regex with no 2 consecutive a's and b's

I have been trying out some regular expressions lately. Now, I have 3 symbols a, b and c.
I first looked at a case where I don't want 2 consecutive a's. The regex would be something like:
((b|c) + a(b|c))*(a + epsilon)
Now I'm wondering if there's a way to generalize this problem to say something like:
A regular expression with no two consecutive a's and no two consecutive b's. I tried stuff like:
(a(b|c) + b(a|c) + c)* (a + b + epsilon)
But this accepts inputs such as "abba" or "baab", which have 2 consecutive b's (or a's), and that is not what I want. Can anyone suggest a way out?
If you can't do a negative match then perhaps you can use negative lookahead to exclude strings matching aa and bb? Something like the following (see Regex 101 for more information):
(?!.*(aa|bb).*)^.*$
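As a quick way to try the lookahead idea, here is a minimal Python sketch using the standard `re` module. The pattern is a slightly rearranged form of the one above (anchor first, then the lookahead); the helper name `no_double_a_or_b` is made up for this example, and the input is assumed to be a single line over the symbols a, b and c.

```python
import re

# Reject any string that contains "aa" or "bb" somewhere; accept everything else.
NO_DOUBLES = re.compile(r"^(?!.*(aa|bb)).*$")


def no_double_a_or_b(text):
    """True if text has no two consecutive a's and no two consecutive b's."""
    return NO_DOUBLES.match(text) is not None


for sample in ["abcab", "abba", "baab", "cc", ""]:
    print(repr(sample), no_double_a_or_b(sample))
```

Note that lookahead is an extension found in practical regex engines; it is not part of the textbook regular-expression notation used in the question.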
I (think I) solved this by hand-drawing a finite state machine, then, generating a regex using FSM2Regex. The state machine is written below (with the syntax from the site):
#states
s0
s1
s2
s3
#initial
s0
#accepting
s1
s2
s3
#alphabet
a
b
c
#transitions
s0:a>s1
s0:b>s2
s0:c>s3
s1:b>s2
s1:c>s3
s2:a>s1
s2:c>s3
s3:c>s3
s3:a>s1
s3:b>s2
If you look at the transitions, you'll notice it's fairly straightforward: I have states that correspond to a "sink" for each letter of the alphabet, and I only allow transitions out of that state for other letters (not the "sink" letter). For example, s1 is the "sink" for a. From all other states, you can get to s1 with an a. Once you're in s1, though, you can only get out of it with a b or a c, which have their own "sinks" s2 and s3 respectively. Because we can repeat c, s3 has a transition to itself on the character c. Paste the block text into the site, and it'll draw all this out for you, and generate the regex.
The regex it generated for me is:
c+cc*(c+$+b+a)+(b+cc*b)(cc*b)*(c+cc*(c+$+b+a)+$+a)+(a+cc*a+(b+cc*b)(cc*b)*(a+cc*a))(cc*a+(b+cc*b)(cc*b)*(a+cc*a))*(c+cc*(c+$+b+a)+(b+cc*b)(cc*b)*(c+cc*(c+$+b+a)+$+a)+b+$)+b+a
Which, I'm pretty sure, is not optimal :)
EDIT: The generated regex uses + as the choice operator (usually known to us coders as |), which means it's probably not suitable to pasting into code. However, I'm too scared to change it and risk ruining my regex :)
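If the generated regex is too unwieldy to paste into code, one alternative is to run the state machine directly. The following Python sketch transcribes the transition table listed above into a dictionary; the function name `accepts` is an assumption for this example, and a missing transition is treated as rejection.

```python
# Transition table copied from the state machine description above.
TRANSITIONS = {
    ("s0", "a"): "s1", ("s0", "b"): "s2", ("s0", "c"): "s3",
    ("s1", "b"): "s2", ("s1", "c"): "s3",
    ("s2", "a"): "s1", ("s2", "c"): "s3",
    ("s3", "a"): "s1", ("s3", "b"): "s2", ("s3", "c"): "s3",
}
INITIAL = "s0"
ACCEPTING = {"s1", "s2", "s3"}


def accepts(text):
    """Run the machine over text; reject as soon as no transition exists."""
    state = INITIAL
    for symbol in text:
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            return False
    return state in ACCEPTING


for sample in ["abcab", "abba", "baab", "cc", ""]:
    print(repr(sample), accepts(sample))
```

Since s0 is not an accepting state in the listing, this machine rejects the empty string; add s0 to `ACCEPTING` if the empty string should count as valid.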
You can use back references to match the prev char
string input = "acbbaacbba";
string pattern = @"([ab])\1";
var matchList = Regex.Matches(input, pattern);
This pattern will match: bb, aa and bb. If the pattern finds no match in your input, the input does not contain a repeated a or b.
Explanation:
([ab]): define a group, you can extend your symbols here
\1: back referencing the group, so for example, when 'a' is matched, \1 would be 'a'
check this page: http://www.regular-expressions.info/backref.html

Algorithm - check if any string in an array of strings is a prefix of any other string in the same array

I want to check if any string in an array of strings is a prefix of any other string in the same array. I'm thinking radix sort, then single pass through the array.
Anyone have a better idea?
I think radix sort can be modified to retrieve prefixes on the fly. All we have to do is sort the lines by their first letter, storing copies of them without the first letter in each cell. Then, if a cell contains an empty line, that line corresponds to a prefix. And if a cell contains only one entry, then of course it cannot contain any prefix lines.
Here, this might be clearer than my English:
lines = [
    "qwerty",
    "qwe",
    "asddsa",
    "zxcvb",
    "zxcvbn",
    "zxcvbnm"
]

# Pair each remaining suffix with the original line it came from.
line_lines = [(line, line) for line in lines]

def find_sub(line_lines):
    # One cell per lowercase letter a-z.
    cells = [[] for i in range(26)]
    for (ine, line) in line_lines:
        if ine == "":
            # Nothing left of this line: it is a prefix of the other lines in this group.
            print(line)
        else:
            index = ord(ine[0]) - ord('a')
            cells[index] += [(ine[1:], line)]
    for cell in cells:
        if len(cell) > 1:
            find_sub(cell)

find_sub(line_lines)
If you sort them, you only need to check each string if it is a prefix of the next.
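A minimal Python sketch of the sorting idea follows. It relies on the fact that in lexicographic order, every string lying between a prefix and a string that extends it must itself start with that prefix, so comparing neighbours is enough to detect whether any prefix pair exists. The function name `has_prefix_pair` is made up for this example, and exact duplicates are counted as prefixes of each other.

```python
def has_prefix_pair(strings):
    """True if some string in the list is a prefix of another one."""
    ordered = sorted(strings)
    # After sorting, it is enough to compare each string with its successor.
    return any(nxt.startswith(cur) for cur, nxt in zip(ordered, ordered[1:]))


print(has_prefix_pair(["qwerty", "qwe", "asddsa", "zxcvb", "zxcvbn", "zxcvbnm"]))
print(has_prefix_pair(["abc", "abd", "b"]))
```

The cost is dominated by the sort, so this runs in O(N log N) string comparisons.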
To achieve a time complexity close to O(N^2): compute hash values for each string.
Come up with a good hash function that looks something like:
A mapping from [a-z]->[1,26]
A modulo operation(use a large prime) to prevent overflow of integer
So something like "ab" gets computed as "12"=1*27+ 2=29
A point to note:
Be careful what base you compute the hash value on. For example, if you take a base less than 27 you can have two strings giving the same hash value, and we don't want that.
Steps:
Compute hash value for each string
Compare hash values of the current string with other strings: I'll let you figure out how you would do that comparison. Once two hashes match, you are still not sure if it is really a prefix (due to the modulo operation that we did), so do an extra check to see if they are prefixes.
Report answer
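One possible way to fill in the comparison step is sketched below in Python. It is an interpretation rather than the answer's exact method: the hash of every whole string is stored in a dictionary, and the running hash of each string's prefixes is looked up in it, followed by the extra verification the answer asks for. The names `BASE`, `MOD`, `char_value` and `find_prefix_pairs` are assumptions for this example, and the input is assumed to consist of non-empty lowercase a-z strings.

```python
BASE = 27              # larger than the alphabet size, as the answer advises
MOD = 1_000_000_007    # a large prime to keep the values bounded


def char_value(ch):
    """Map 'a'..'z' to 1..26."""
    return ord(ch) - ord("a") + 1


def find_prefix_pairs(strings):
    """Return (prefix, longer_string) pairs found via polynomial hashing."""
    # Step 1: hash of each whole string -> indexes of strings with that hash.
    full_hashes = {}
    for i, s in enumerate(strings):
        h = 0
        for ch in s:
            h = (h * BASE + char_value(ch)) % MOD
        full_hashes.setdefault(h, []).append(i)

    # Step 2: walk each string, looking up the hash of every prefix.
    pairs = []
    for j, s in enumerate(strings):
        h = 0
        for k, ch in enumerate(s):
            h = (h * BASE + char_value(ch)) % MOD
            for i in full_hashes.get(h, []):
                candidate = strings[i]
                # Extra check: equal hashes may still be a collision.
                if i != j and len(candidate) == k + 1 and s.startswith(candidate):
                    pairs.append((candidate, s))
    return pairs


print(find_prefix_pairs(["qwerty", "qwe", "asddsa", "zxcvb", "zxcvbn", "zxcvbnm"]))
```

With this layout, "ab" hashes to 1*27 + 2 = 29, matching the worked value in the answer.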

SWI Prolog Array Retrieve [Index and Element]

I'm trying to program an array retrieval in swi-prolog. With the current code printed below I can retrieve the element at the given index but I also want to be able to retrieve the index[es] of a given element.
aget([_|X],Y,Z) :- Y \= 0, Y2 is (Y-1), aget(X,Y2,Z).
aget([W|_],Y,Z) :- Y = 0, Z is W.
Example 1: aget([9,8,7,6,5],1,N) {Retrieve the element 8 at index 1}
output: N = 8. {Correct}
Example 2: aget([9,8,7,6,5],N,7) {retrieve the index 2 for Element 7}
output: false {incorrect}
The way I understood it was that swi-prolog would work in this way with little to no additional programming. So clearly I'm doing something wrong. If you could point me in the right direction or tell me what I'm doing wrong, I would greatly appreciate it.
Your code is too procedural, and the second clause is plainly wrong: because it uses is/2, it works only for numbers.
The functionality you're looking for is implemented by nth0/3. In SWI-Prolog you can see the optimized source with ?- edit(nth0). An alternative implementation has been discussed here on SO (here is my answer).
Note that Prolog doesn't have arrays, but lists. When an algorithm can be rephrased to avoid indexing, we should do so.
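To make the two directions of the lookup concrete for readers coming from other languages, here is a rough Python analogue. In Prolog a single relation such as nth0/3 covers both queries; in Python they need two separate functions. The names `element_at` and `indexes_of` are invented for this sketch.

```python
def element_at(items, index):
    """Element at a zero-based index, like nth0(Index, List, Elem) with Index bound."""
    return items[index]


def indexes_of(items, element):
    """All zero-based indexes holding element, like nth0 with Elem bound."""
    return [i for i, item in enumerate(items) if item == element]


print(element_at([9, 8, 7, 6, 5], 1))
print(indexes_of([9, 8, 7, 6, 5], 7))
```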
If you represent arrays as compounds, you can also use the ISO standard predicate arg/3 to access an array element. Here is an example run:
?- X = array(11,33,44,77), arg(2,X,Y).
X = array(11, 33, 44, 77),
Y = 33.
The advantage over lists is that compound access needs O(1) time, whereas list access needs O(n) time, where n is the length of the array.
