In Python 3, display True if a search term is in a CSV table (database)

I want a user to be able to search for a specific term, and if it is in the database, a message should be displayed saying so.
This is the code I have so far. It runs without errors, but when I enter a search term it always prints "We are sorry, but that does not seem to be in our system", and it always prints it 3 times. I am using PyCharm as my IDE, but I don't think that should make a difference.
I am fairly new to Python, so please be patient with me if I have missed something simple. :)
import csv

uIn = input("Please enter a search term: ")

with open('database.csv', 'r') as uFile:
    fileReader = csv.reader(uFile, delimiter=',')
    for row in fileReader:
        if uIn == row[1]:  # True
            print(uIn + " is in file")
        else:  # False
            print("We are sorry, but that does not seem to be in our system")
----------------------------------------------------------Edit----------------------------------------------------------------
Thank you, @Rahul Bhatnagar, for your answer; it showed me that I needed to replace == with in.
That answers part of the question, and I have figured out some of the rest: the else branch prints for every row that does not contain uIn. So if uIn is on the third row, the script prints the #False message twice and the #True message once. That is tolerable, but if uIn is on the 50th or 100th row, there would be 49 or 99 #False messages printed.
I would now say that about 50% of my problem is fixed. The part that is not fixed is to print only one #True or one #False message, depending on whether the term is in the file.

import csv

def search(uIn):
    found = False
    with open('database.csv', 'r') as uFile:
        fileReader = csv.reader(uFile, delimiter=',')
        for row in fileReader:
            if uIn in row:  # True
                found = True
    if found:
        print("Found the string in file!")
    else:
        print("We are sorry, but that does not seem to be in our system")

uIn = input("ST: ")  # raw_input() only exists in Python 2; in Python 3 use input()
search(uIn)
Where my database.csv file is:
search,terms,go,into,this,file,in,this,format
You're trying to find uIn at position 1 of row in every case, but it won't always be in that column.
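For completeness, here is a minimal sketch (not part of the original answer) that also stops scanning as soon as a match is found, assuming the same database.csv layout shown above:

import csv

def search(term, path='database.csv'):
    """Return True as soon as term appears in any field of any row."""
    with open(path, 'r', newline='') as f:
        for row in csv.reader(f):
            if term in row:
                return True  # stop at the first match
    return False

if search(input("Please enter a search term: ")):
    print("Found the string in file!")
else:
    print("We are sorry, but that does not seem to be in our system")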

Dealing with errors while parsing strings

I'm tasked with pulling relevant data out of a field which is essentially free text. I have been able to get the information I need 98% of the time by looking for keywords and using CASE statements to break the field down into 5 different fields.
My issue is that I can't get around the last 2%, because the errors don't follow any logical order - they are mostly misspellings.
I could bypass the field with a TRY CATCH, but I don't like giving up 4 good pieces of data when the routine is choking on one.
Is there any way to handle blanket errors within a CASE statement, or is there another option?
Current code; the 'b' with the commented-out section is where it's choking right now:
CASE WHEN @Location = 0 THEN
         CASE WHEN @Duration = 0 THEN
                  CASE WHEN @Timing = 0 THEN
                           SUBSTRING(@Comment, @Begin, @Context - @Begin)
                       ELSE
                           SUBSTRING(@Comment, @Begin, @Timing - @Begin)
                  END
              ELSE SUBSTRING(@Comment, @Begin, @Duration - @Begin)
         END
     ELSE SUBSTRING(@Comment, @Begin, @Location - @Begin)
END AS Complaint
,CASE WHEN @Location = 0 THEN ''
      ELSE
          CASE WHEN @Duration = 0 THEN
                   CASE WHEN @Timing = 0 THEN SUBSTRING(@Comment, @Location + 10, (@CntBegin - 11))
                        ELSE SUBSTRING(@Comment, @Location + 10, @Timing - (@Location + 10))
                   END
               ELSE SUBSTRING(@Comment, @Location + 10, @Duration - (@Location + 10))
          END
END AS Location
,CASE WHEN @Timing = 0 THEN ''
      ELSE
          CASE WHEN @CntBegin = 0 THEN
                   SUBSTRING(@Comment, @Timing + @TimingEnd, (@Location + @Context) - (@Timing + @TimingEnd))
               ELSE
                   'b' --SUBSTRING(@Comment, @Timing + @TimingEnd, (@Location + @CntBegin - 1) - (@Timing + @TimingEnd))
          END
END AS Timing
It chokes on this string, which has a comma in an odd spot. I usually have to reference the comma for @CntBegin, but in this case it makes my (@Location + @CntBegin - 1) shorter than (@Timing + @TimingEnd):
'Pt also presents with/for mild check MGP/MGD located in OU, since 12/2015 ? Stability.'
Please take into account that I'm not necessarily trying to fix this particular error; I'm looking for a way to handle any error that comes up, since who knows what someone is going to type. I'd like to just display 'ERR' in that particular field when the code runs into something it can't handle. I just don't want the routine to die.
I'm assuming your error is due to the length parameter in SUBSTRING being less than 0. I always alias my parameters using CROSS APPLY and then validate the input before calling SUBSTRING(). Something like this should work:
SELECT
    CASE WHEN CA.StringLen > 0 /*Ensure valid length*/
         THEN SUBSTRING(@Comment, @Timing + @TimingEnd, CA.StringLen)
         ELSE 'Error'
    END
FROM YourTable
CROSS APPLY (SELECT StringLen = (@Location + @CntBegin - 1) - (@Timing + @TimingEnd)) AS CA

How can I make a decision for exactly one data set using an ID3 decision tree

I'm implementing a program that asks users for their symptoms (whether they have a fever, a cough, or breathing issues) to check whether they need a COVID test.
I implemented my ID3 decision tree and used a dataset in a CSV file.
Now I want the program to prompt the user to enter their symptoms and tell them some info.
My code is attached below. When I ran it, this error message showed up; I think it is because I only have one row of data in my txt file:
pandas.errors.EmptyDataError: No columns to parse from file
May I ask how I can fix it, or is there a better way to make a decision for just one data point?
Thank you!
import pandas as pd

fever = input("Do you have a fever? (Yes or No) ")
cough = input("Do you cough? (Yes or No) ")
breathing_issue = input("Do you have short breathing or other breathing issues? (Yes or No) ")
infected = "Yes"
test_sample = fever + "," + cough + "," + breathing_issue + "," + infected

f = open("test.txt", "w")
f.write(test_sample)

# convert to .csv
test_df = pd.read_csv(r'/Users/xxxx/xxxx/xxxx/test.txt', header=None, delim_whitespace=True)
train_df.columns = ['fever', 'cough', 'breathing-issue', 'infected']
pd.set_option("display.max_columns", 500)  # Load all columns
The reason this occurs is that the pd.read_csv call reads an empty data frame. Here is a minimal reproducible example demonstrating the error:
import pandas as pd

with open("test.txt", "w") as _fh:
    _fh.write("yes,no,yes,no")

df = pd.read_csv("test.txt")
print(df)
Output:
Empty DataFrame
Columns: [yes, no, yes.1, no.1]
Index: []
To get a nonempty DataFrame, either the columns need names or pd.read_csv needs to be called with optional argument header=None. Here is a version where column names are written:
import pandas as pd

with open("test.txt", "w") as _fh:
    _fh.write("fever,cough,breathing_issues,infected\n")
    _fh.write("yes,no,yes,no")

df = pd.read_csv("test.txt")
print(df)
Output:
  fever cough breathing_issues infected
0   yes    no              yes       no
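For the other option mentioned above, here is a minimal sketch (assuming the same one-line test.txt) that passes header=None and supplies the column names at read time instead of writing them into the file:

import pandas as pd

with open("test.txt", "w") as _fh:
    _fh.write("yes,no,yes,no")

# header=None tells pandas the first line is data, not column names
df = pd.read_csv("test.txt", header=None,
                 names=['fever', 'cough', 'breathing_issues', 'infected'])
print(df)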

How to check df rows that have a difference between 2 columns and then send them to another table to verify information

I'm very new to Python and have spent the last few days trying to figure out how to go through a df row by row and check each row that has a difference between columns dQ and dCQ. I just used != 0 since there could be a positive or negative value. If this is true, I would like to check in another table whether certain criteria are met. I'm used to working in R, where I could store the df in a variable and call upon the column name; I can't seem to find a way to do that in Python. I posted all of the code I've been playing with. I know this is messy, but any help would be appreciated. Thank you!
I've tried installing different packages that wouldn't work, and I tried making a for loop (I failed miserably); maybe a function? I'm not sure where to even look. I've never learned Python; I'm really doing my best watching videos online and reading on here.
import pyodbc
import pymysql  # the module name is lowercase
import pandas as pd
import numpy as np

conn = pyodbc.connect("Driver={ODBC Driver 17 for SQL Server};"
                      "Server=***-***-***.****.***.com;"
                      "Database=****;"
                      "Trusted_Connection=no;"
                      "UID=***;"
                      "PWD=***")

# cur = conn.cursor()
# cur.execute("SELECT TOP 1000 tr.dQ, po.dCQ, tr.dQ - po.dCQ as diff "
#             "FROM [IP].[dbo].[vT] tr (nolock) "
#             "JOIN [IP].[dbo].[vP] po ON tr.vchAN = po.vchCustAN "
#             "WHERE tr.dQ != po.dCQ")
# query = cur.fetchall()

query = ("SELECT TOP 100 tr.dQ, po.dCQ /*, tr.dQ - po.dCQ as diff */ "
         "FROM [IP].[dbo].[vT] tr (nolock) "
         "INNER JOIN [IP].[dbo].[vP] po ON tr.vchAN = po.vchCustAN "
         "WHERE tr.dQ != po.dCQ")
df = pd.read_sql(query, conn)
#print(df[2,])

cursor = conn.cursor(pymysql.cursors.DictCursor)
cursor.execute("SELECT TOP 100 tr.dQ, po.dCQ /*, tr.dQ - po.dCQ as diff */ "
               "FROM [IP].[dbo].[vT] tr (nolock) "
               "INNER JOIN [IP].[dbo].[vP] po ON tr.vchAN = po.vchCustAN "
               "WHERE tr.dQ != po.dCQ")
result_set = cursor.fetchall()

for row in result_set:
    print("%s, %s" % (row["name"], row["category"]))

# if df[3] != 0:
#     diff = df[1] - df[2]
#     print(diff)
# else:
#     exit
# cursor = conn.cursor()
# for row in cursor.fetchall():
#     print(row)
#
# for record in df:
#     if record[1] != record[2]:
#         print(record[3])
#     else:
#         record[3] = record[1]
#         print(record)
# df['diff'] = np.where(df['dQ'] != df["dCQ"])
I expect some sort of notification that there's a difference in row xx, and then it will check in table vP to verify that we received this data's details. I believe I can get to that point if I can get the first part working. Any help is appreciated. I'm sorry if this question is not clear; I will do my best to answer any questions someone may have. Thank you!
One solution could be to make a new column where you store the result of the diff between df[1] and df[2]. One note first: it might be more precise to either name your columns when you make the df and then reference them with df['name1'] and df['name2'], or use df.iloc[:,1] and df.iloc[:,2]. Also note that column numbers start with zero, so these refer to the second and third columns in the df. The reason to use iloc and the colons is to state explicitly that you want all rows and column numbers 1 and 2; otherwise, with df[1] or df[2], if your df were transposed that may actually refer to what you think of as the index. Now, on to a solution.
You could try
df['diff'] = df.iloc[:,1] - df.iloc[:,2]
df['diff_bool'] = np.where(df['diff'] == 0, False, True)
or you could combine this into one step
df['diff_bool'] = np.where(df.iloc[:,1] - df.iloc[:,2] == 0, False, True)
This will create a column in your df that says whether there is a difference between columns one and two. You don't actually need to loop through row by row, because pandas operations are vectorized, so df.iloc[:,1] - df.iloc[:,2] applies the subtraction row by row automatically.
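As a quick illustration of that behavior, here is a toy example with hypothetical values in columns named dQ and dCQ (mirroring the question); the flag column marks the rows where the two columns differ:

import numpy as np
import pandas as pd

# Small stand-in for the real query result (made-up values)
df = pd.DataFrame({'dQ': [10, 5, 7], 'dCQ': [10, 3, 9]})

df['diff'] = df['dQ'] - df['dCQ']
df['diff_bool'] = np.where(df['diff'] == 0, False, True)

# Rows with a difference, e.g. to check against the other table afterwards
print(df[df['diff_bool']])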

Select groups of lines in file where value of one field is the same

I'm not sure how to word this question so I'll try my best to explain it:
Let's say I have a file:
100001,ABC,400
100001,EFG,500
100001,ABC,500
100002,DEF,400
100002,EFG,300
100002,XYZ,1000
100002,ABC,700
100003,DEF,400
100003,EFG,300
I want to grab each row and group them together where the first value in each row is the same. So all 100001's go together, all 100002's go together, etc.
I just need help figuring out the logic. Don't need a specific implementation in a language.
Pseudocode is fine.
I assume the lines are in order by COL1.
I assume "go together" means they are concatenated into one line.
The logic with pseudocode:
while not EOF
    read line
    if not same group
        if not first line
            print accumulated values
        start new group
    append values
print the last group
In awk you can test it with the following code:
awk '
BEGIN { FS = ","; x = ""; last = ""; }
{
    if ($1 != last) {
        if (x != "")
            print x;
        x = $1;
        last = $1;
    }
    x = x ";" $2 ";" $3;
}
END { print x; } '
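Since the question is language-agnostic, here is a minimal Python sketch of the same logic, assuming the comma-separated file above is saved as data.txt (a hypothetical name); like the awk version, it relies on the rows already being sorted by the first field:

import itertools

with open('data.txt') as f:
    rows = (line.strip().split(',') for line in f)
    # groupby only merges adjacent rows, so the file must be sorted by the key
    for key, group in itertools.groupby(rows, key=lambda row: row[0]):
        # concatenate the remaining fields of each row in the group into one line
        print(key + ''.join(';' + ';'.join(row[1:]) for row in group))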

VBA function to tell Excel not to display certain variables

I was wondering if there was a way to tell Excel not to display some variables if the VLookup function couldn't find anything.
Here's roughly what my code does: look up some numbers in another Excel workbook, compare each value with the previous year's and take the difference, and display that difference in another spreadsheet, all in one big merged cell.
Some excerpts from my code:
cashO = Val(Application.VLookup("cash" & "*", Workbooks("creditreport.csv").ActiveSheet.Range("A1:F199"), 4, False))
Then the difference: cash = Round(cashN - cashO, 0)
Then the display: MergedCell.Value = "Cash increased by " & cash
But I don't want to display cash if it couldn't find cash in the first place (in that case cash = 0, both when cash couldn't be found and when the change is null).
I was thinking of creating an array with all my variables (cash, ...) and then looping through it. But I couldn't find anything online on "if not found, don't display anything".
Best,
You can use an If...Else statement to check the value of cash: write the value if cash is nonzero, and some other message like "No Change" or "No previous value" otherwise.
If cash = 0 Then
    MergedCell.Value = "No Change"
Else
    MergedCell.Value = "Cash increased by " & cash
End If
Or you could just check if the function returns an error:
If IsError(Application.VLookup("cash" & "*", Workbooks("creditreport.csv").ActiveSheet.Range("A1:F199"), 4, False)) Then
    cash = 0
Else
    cashO = Val(Application.VLookup("cash" & "*", Workbooks("creditreport.csv").ActiveSheet.Range("A1:F199"), 4, False))
    '....other statements for whatever...
End If
I hope I understood you correctly: you mean that VLookup didn't find a match for "cash". If that's the case, you need error handling.
Try the code below:
Sub VLookup_Test()
    On Error Resume Next
    cashO = Val(Application.VLookup("cash" & "*", Workbooks("creditreport.csv").ActiveSheet.Range("A1:F199"), 4, False))
    If Err.Number <> 0 Then
        MsgBox "cashO not found"
        ' Do your actions if cashO not found with VLookup
    End If
    On Error GoTo 0
End Sub
