Neural Network Input - artificial-intelligence

Is this possible to give input to ANN in Non-numeric form, like, English text sentence "Eat an Apple daily" specially in Unicode text, and ANN return output in text form only. Answer Like "Yes" "No".
I don't know much about ANN but all the example codes I studied , like XOR , image input , all deal with Numeric input in ANN.
Can you please suggest some code, or links to webpage having examples related to this.
I need to train ANN for some text sentences which are in Unicode text , but don't know how to deal with text input to NN.

What we do to deal with Text input for ANN is to produce a Vector of word. Like you have a dictionary:
An
Apple
Eat
Steak
Zebra
And what you give to the ANN input is: [1, 1, 1, 0, 0] for Eat an apple
[0, 0, 1, 0, 1] for Eat Zebra
You train your network like this and for your output you can just say that a 0 output is "No" and 1 is "Yes".
Be careful, with this solution you will have a input layer very large.

Related

How do I search for most similar sequence in Excel?

I'm hoping to search an excel column for the sequence in it most similar to a sequence I enter.
For instance, in the following example, the sequence I provide is: 1, 2.5, 3.5, 2.5, 1. It's depicted on the following graph as black.
In the column I'm searching, there are a few sequences. The most similar one to mine is colored blue. It goes: 1, 2, 3, 2, 1.
Graph
Do any of you know an excel formula, or series of formulas and steps, that would allow Excel to determine this -- so that when I enter the black sequence, for instance, it will match it with the blue sequence as the most similar one?
Thanks tothis Stack overflow answer, I already know how to search a set of numbers for an exact sequence by using the following formula:
=MATCH([Criteria 1]&[Criteria 2],[Data 1st val]:[Data last val]&[Data 2nd val]:[Data last + 1 val],0)
For instance, if I have the following numbers: 1, 3, 5, 1, 4, and I am hoping to find the sequence, 1, 4, this formula will direct me towards it in that set of numbers.
I ALSO already know how to find the closest match to a number I enter, using this formula (which will make more sense if you look in the example image below): =INDEX($A$1:$A$10,MATCH(MIN(ABS(C1-B1:B10)),ABS(C1-$B$1:$B$10),0))
Example
When I press control+shift+enter, this formula will produce the number 4, indicating row 4, because the number I entered in C1, which was 39, is closest to the number 40, which is located in the 4th row.
So I have both the components -- finding exact sequences, and finding the closest number -- but now the question is, how do I combine these two formulas to show me the closest sequence of numbers, the one which would look most similar if drawn on a graph like in my first example with the blue and black line?
And bonus points if you can help find not only the closest sequence but the closest sequences in order of most similar to least similar.
And once again, I don't need this to be rolled into one formula; I am happy to go through a couple steps and different formulas manually to arrive at the answer.
And if you think this would be better solved in some other way, please let me know! But I do not have any coding experience so I figured Excel would be my best bet.
Thank you so much!!!
Not sure how you exactly have set this up, but if I visualize your graph in a table you could use the below (if one has Microsoft365):
Formula in H2:
=INDEX(SORTBY(B2:F4,MMULT(ABS(B2:F4-B1:F1),SEQUENCE(5,,,0))),1)
With all your data in a single column, below you can find an example for if you'd have sequences of 5.
Formula in C2:
=TRANSPOSE(INDEX(SORTBY(INDEX(A2:A16,SEQUENCE(11,5)-ROUNDDOWN(SEQUENCE(11,5,0,0.2),0)*4),MMULT(ABS(INDEX(A2:A16,SEQUENCE(11,5)-ROUNDDOWN(SEQUENCE(11,5,0,0.2),0)*4)-TRANSPOSE(B2:B6)),SEQUENCE(5,,,0))),1))
If you would want to make this applicable for your dataset from A1:A500 with sequence of 10 numbers:
=TRANSPOSE(INDEX(SORTBY(INDEX(A1:A500,SEQUENCE(COUNT(A1:A500)-9,10)-ROUNDDOWN(SEQUENCE(COUNT(A1:A500)-9,10,0,0.1),0)*9),MMULT(ABS(INDEX(A1:A500,SEQUENCE(COUNT(A1:A500)-9,10)-ROUNDDOWN(SEQUENCE(COUNT(A1:A500)-9,10,0,0.1),0)*9)-TRANSPOSE(B1:B10)),SEQUENCE(10,,,0))),1))
And if will be even better if you had acces to LET() and it will be a piece of cake to just change the range reference:
=LET(X,A2:A500,Y,INDEX(X,SEQUENCE(COUNT(X)-9,10)-ROUNDDOWN(SEQUENCE(COUNT(X)-9,10,0,0.1),0)*9),TRANSPOSE(INDEX(SORTBY(Y,MMULT(ABS(Y-TRANSPOSE(B2:B11)),SEQUENCE(10,,,0))),1)))
EDIT2:
To make it more dynamic you can use:
=LET(W,1,X,A2:A500,Y,11,Z,INDEX(X,SEQUENCE(COUNT(X)-(Y-1),Y)-ROUNDDOWN(SEQUENCE(COUNT(X)-(Y-1),Y,0,1/Y),0)*(Y-1)),TRANSPOSE(INDEX(SORTBY(Z,MMULT(ABS(Z-TRANSPOSE(B2:INDEX(B:B,Y+1))),SEQUENCE(Y,,,0))),W)))
Where "W" is the nth closest match and where "Y" is the length of the sequence, 11 in the example.
My approach would be to calculate a match-value between each color and the input values, like the sum of the differences for each point.
The formula for this is:
=SUM(IF([inputrange]<>"",ABS([inputrange]-[colorrange]),0))
Where [inputrange] is the range of your input (indicated red in the picture below, $C$6:$G$6) and [colorrange] is the range of that color (indicated blue, C2:G2).
The color with the lowest difference is the match:
=VLOOKUP(MIN([matchvalues],[rangeofmatchandcolors],2,0)
Where [matchvalues] is the range of match values (indicated blue in the picture below, Cells A2:A4) and [rangeofmatchandcolors] is both the match values as well as the colors (indicated red, A2:B4)

Input: Sequential right-most digits of format string

I'm trying to design an input form where users can scan a barcode or enter the trailing significant digits. When the users scan the barcode, the format is AAA111000000000000000000012345. The USB scanner acts just like a keyboard. When entering by hand, it's easier to just enter the trailing significant digits (12345) and autofill the rest.
I'm using angular/material design, and am trying to design an input that gracefully handles both. I'd like to preload the input field with AAA111000000000000000000000000, and as the user enters each number, they replace the right-most 0's. This would work with the scanner.
I've tried to search for this, but I don't know how to describe exactly what I'm trying to do succinctly enough.
What UI/UX would work best here?

Ceaser Cipher crack using C language

I am writing a program to decrypt text using ceaser cypher algorithm.
till now my code is working fine and gets all possible decrypted results but I have to show just the correct one, how can I do this?
below is the code to get all decrypted strings.
for my code answer should be "3 hello world".
void main(void)
{
char input[] = "gourz#roohk";
for(int key = 1;x<26;key++)
{
printf("%i",input[I]-x%26);
for(int i = strlen(input)-1;i>=0;i--)
{
printf("%c",input[I]-x%26);
}
}
}
Recall that a Caesar Cipher has only 25 possible shifts. Also, for text of non-trivial length, it's highly likely that only one shift will make the input make sense. One possible approach, then, is to see if the result of the shift makes sense; if it does, then it's probably the correct shift (e.g. compare words against a dictionary to see if they're "real" words; not sure if you've done web services yet, but there are free dictionary APIs available).
Consider the following text: 3 uryyb jbeyq. Some possible shifts of this:
3 gdkkn vnqkc (12)
3 xubbe mehbt (3)
3 hello world (13)
3 jgnnq yqtnf (15)
Etc.
As you can see, only the shift of 13 makes this text contain "real" words, so the correct shift is probably 13.
Another possible solution (albeit more complicated) is through frequency analysis (i.e. see if the resulting text has the same - or similar - statistical characteristics as English). For example, in English the most frequent letter is "e," so the correct shift will likely have "e" as the most frequent letter. By way of example, the first paragraph of this answer contains 48 instances of the letter "e", but if you shift it by 15 letters, it only has 8:
Gtrpaa iwpi p Rpthpg Rxewtg wph dcan 25 edhhxqat hwxuih. Pahd, udg
itmi du cdc-igxkxpa atcviw, xi'h wxvwan axztan iwpi dcan dct hwxui
lxaa bpzt iwt xceji bpzt htcht. Dct edhhxqat peegdprw, iwtc, xh id htt
xu iwt gthjai du iwt hwxui bpzth htcht; xu xi sdth, iwtc xi'h egdqpqan
iwt rdggtri hwxui (t.v. rdbepgt ldgsh pvpxchi p sxrixdcpgn id htt xu
iwtn'gt "gtpa" ldgsh; cdi hjgt xu ndj'kt sdct ltq htgkxrth nti, qji
iwtgt pgt ugtt sxrixdcpgn PEXh pkpxapqat).
The key word here is "likely" - it's not at all statistically certain (especially for shorter texts) and it's possible to write text that's resistant to that technique to some degree (e.g. through deliberate misspellings, lipograms, etc.). Note that I actually have an example of an exception above - "3 xubbe mehbt" has more instances of the letter "e" than "3 hello world" even though the second one is clearly the correct shift - so you probably want to apply several statistical tests to increase your confidence (especially for shorter texts).
Hello to make an attack on caesar cipher more speed way is the frequency analysis attack where you count the frequency of each letter in your text and how many times it appeared and compare this letter to the most appearing letters in English in this link
( https://www3.nd.edu/~busiforc/handouts/cryptography/letterfrequencies.html )
then by applying this table to the letters you can git the text or use this link its a code on get hub (https://github.com/tombusby/understanding-cryptography-exercises/blob/master/Chapter-01/ex1.2.py)
in python for letter frequency last resort answer is the brute force because its more complexity compared to the frequency analysis
brute force here is 26! which means by getting a letter the space of search of letters decrease by one
if you want to use your code you can make a file for the most popular strings in english and every time you decrypt you search in this file but this is high cost of time to do so letter frequency is more better

How can I detect what alphabet is being used in utf8?

I want a quick and dirty way of determining what language the user is writing in. I know that there is a Google API which will detect the difference between French and Spanish (even though they both use mostly the same alphabet), but I don't want the latency. Essentially, I know that the Latin alphabet has a lot of confusion as to what language it is using. Other alphabets, however, don't. For example, if there is a character using hiragana (part of the Japanese writing system) there is no confusion as to the language. Therefore, I don't need to ask Google.
Therefore, I would like to be able to do something simple like say that שלום uses the Hebrew alphabet and こんにちは uses Japanese characters. How do I get that alphabet string?
"Bonjour", "Hello", etc. should return "Latin" or "English" (Then I'll ask Google for the real language). "こんにちは" should return "Hiragana" or "Japanese". "שלום" should return "Hebrew".
I'd suggest looking at the Unicode "Script" property. The latest database can be found here.
For a quick and dirty implementation, I'd try scanning all of the characters in the target text and looking up the script name for each one. Pick whichever script has the most characters.
Use an N-gram model and then give a sufficiently large set of training data. A full example describing this technique is to be found at this page, among others:
http://phpir.com/language-detection-with-n-grams/
Although the article assumes you are implementing in PHP and by "language" you mean something like English, Italian, etc... the description may be implemented in C if you require this, and instead of using "language" as in English, etc. for the training, just use your notion of "alphabet" for the training. For example, look at all of your "Latin alphabet" strings together and consider their n-grams for n=2:
Bonjour: "Bo", "on", "nj", "jo", "ou", "ur"
Hello: "He", "el", "ll", "lo"
With enough training data, you will discover dominant combinations that are likely for all Latin text, for example, perhaps "Bo" and "el" are quite probable for text written in the "Latin alphabet". Likewise, these combinations are probably quite rare in text that is written in the "Hiragana alphabet". Similar discoveries will be made with any other alphabet classification for which you can provide sufficient training data.
This technique is also known as a Hidden Markov model or a Markov chain; searching for these keywords will give more ideas for implementation. For "quick and dirty" I would use n=2 and gather just enough training data such that the least common letter from each alphabet is encountered at least once... e.g. at least one 'z' and at least one 'ぅ' *little hiragana u.
EDIT:
For a simpler solution than N-Grams, use only basic statistical tests -- min, max and average -- to compare your Input (a string given by the user) with an Alphabet (a string of all characters in one of the alphabets you are interested).
Step 1. Place all the numerical values of the Alphabet (e.g. utf8 codes) in an array. For example, if the Alphabet to be tested against is "Basic Latin", make an array DEF := {32, 33, 34, ..., 122}.
Step 2. Place all the numerical values of the Input into an array, for example, make an array INP := {73, 102, 32, ...}.
Step 3. Calculate a score for the input based on INP and DEF. If INP really comes from the same alphabet as DEF, then I would expect the following statements to be true:
min(INP) >= min(DEF)
max(INP) <= max(DEF)
avg(INP) - avg(DEF) < EPS, where EPS is a suitable constant
If all statements are true, the score should be close to 1.0. If all are false, the score should close to 0.0. After this "Score" routine is defined, all that's left is to repeat it on each alphabet you are interested in and choose the one whiich gives the highest score for a given Input.

How do Markov Chain Chatbots work?

I was thinking of creating a chatbot using something like markov chains, but I'm not entirely sure how to get it to work. From what I understand, you create a table from data with a given word and then words which follow. Is it possible to attach any sort of probability or counter while training the bot? Is that even a good idea?
The second part of the problem is with keywords. Assuming I can already identify keywords from user input, how do I generate a sentence which uses that keyword? I don't always want to start the sentence with the keyword, so how do I seed the markov chain?
I made a Markov chain chatbot for IRC in Python a few years back and can shed some light how I did it. The text generated does not necessarily make any sense, but it can be really fun to read. Lets break it down in steps. Assuming you have a fixed input, a text file, (you can use input from chat text or lyrics or just use your imagination)
Loop through the text and make a Dictionary, meaning key - value container. And put all pair of words as keys and the word following as a value.
For example: If you have a text "a b c a b k" you start with "a b" as key and "c" as value, then "b c" and "a" as value... the value should be a list or any collection holding 0..many 'items' as you can have more than one value for a given pair of words. In the example above you will have "a b" two times followed fist by "c" then in the end by "k". So in the end you will have a dictionary/hash looking like this: {'a b': ['c','k'], 'b c': ['a'], 'c a': ['b']}
Now you have the needed structure for building your funky text. You can choose to start with a random key or a fixed place! So given the structure we have we can start by saving "a b" then randomly taking a following word from the value, c or k, so the first save in the loop, "a b k" (if "k" was the random value chosen) then you continue by moving one step to the right which in our case is "b k" and save a random value for that pair if you have, in our case no, so you break out of the loop (or you can decide other stuff like start over again). When to loop is done you print your saved text string.
The bigger the input, the more values you will have for you keys (pair of words) and will then have a "smarter bot" so you can "train" your bot by adding more text (perhaps chat input?). If you have a book as input, you can construct some nice random sentences. Please note that you don't have to take only one word that follows a pair as a value, you can take 2 or 10. The difference is that your text will appear more accurate if you use "longer" building blocks. Start with a pair as a key and the following word as a value.
So you see that you basically can have two steps, first make a structure where you randomly choose a key to start with then take that key and print a random value of that key and continue till you do not have a value or some other condition. If you want you can "seed" a pair of words from a chat input from your key-value structure to have a start. Its up to your imagination how to start your chain.
Example with real words:
"hi my name is Al and i live in a box that i like very much and i can live in there as long as i want"
"hi my" -> ["name"]
"my name" -> ["is"]
"name is" -> ["Al"]
"is Al" -> ["and"]
........
"and i" -> ["live", "can"]
........
"i can" -> ["live"]
......
Now construct a loop:
Pick a random key, say "hi my" and randomly choose a value, only one here so its "name"
(SAVING "hi my name").
Now move one step to the right taking "my name" as the next key and pick a random value... "is"
(SAVING "hi my name is").
Now move and take "name is" ... "Al"
(SAVING "hi my name is AL").
Now take "is Al" ... "and"
(SAVING "hi my name is Al and").
...
When you come to "and i" you will randomly choose a value, lets say "can", then the word "i can" is made etc... when you come to your stop condition or that you have no values print the constructed string in our case:
"hi my name is Al and i can live in there as long as i want"
If you have more values you can jump to any keys. The more values the more combinations you have and the more random and fun the text will be.
The bot chooses a random word from your input and generates a response by choosing another random word that has been seen to be a successor to its held word. It then repeats the process by finding a successor to that word in turn and carrying on iteratively until it thinks it’s said enough. It reaches that conclusion by stopping at a word that was prior to a punctuation mark in the training text. It then returns to input mode again to let you respond, and so on.
It isn’t very realistic but I hereby challenge anyone to do better in 71 lines of code !! This is a great challenge for any budding Pythonists, and I just wish I could open the challenge to a wider audience than the small number of visitors I get to this blog. To code a bot that is always guaranteed to be grammatical must surely be closer to several hundred lines, I simplified hugely by just trying to think of the simplest rule to give the computer a mere stab at having something to say.
Its responses are rather impressionistic to say the least ! Also you have to put what you say in single quotes.
I used War and Peace for my “corpus” which took a couple of hours for the training run, use a shorter file if you are impatient…
here is the trainer
#lukebot-trainer.py
import pickle
b=open('war&peace.txt')
text=[]
for line in b:
for word in line.split():
text.append (word)
b.close()
textset=list(set(text))
follow={}
for l in range(len(textset)):
working=[]
check=textset[l]
for w in range(len(text)-1):
if check==text[w] and text[w][-1] not in '(),.?!':
working.append(str(text[w+1]))
follow[check]=working
a=open('lexicon-luke','wb')
pickle.dump(follow,a,2)
a.close()
Here is the bot:
#lukebot.py
import pickle,random
a=open('lexicon-luke','rb')
successorlist=pickle.load(a)
a.close()
def nextword(a):
if a in successorlist:
return random.choice(successorlist[a])
else:
return 'the'
speech=''
while speech!='quit':
speech=raw_input('>')
s=random.choice(speech.split())
response=''
while True:
neword=nextword(s)
response+=' '+neword
s=neword
if neword[-1] in ',?!.':
break
print response
You tend to get an uncanny feeling when it says something that seems partially to make sense.
You could do like this:
Make a order 1 markov chain generator, using words and not letters.
Everytime someone post something, what he posted is added to bot database.
Also bot would save when he gone to chat and when a guy posted the first post (in multiples of 10 seconds), then he would save the amount of time this same guy waited to post again (in multiples of 10 seconds)...
This second part would be used to see when the guy will post, so he join the chat and after some amount of time based on a table with "after how many 10 seconds the a guy posted after joining the chat", then he would continue to post with the same table thinking "how was the amount of time used to write the the post that was posted after a post that he used X seconds to think about and write"

Resources