I'm having a problem inserting the contents of a .txt file into my MySQL DB recursively.
Right now I have a function that reads a .txt file and gets a specific line from the file. It succeeds at inserting a specific line into a list, which I then insert into the MySQL DB, but when I try to go from line 1 to n and handle each line separately, I am truly lost.
I've tried to use erlang:length() to get the length of the list and reduce it by one for each line I insert, but it won't work. I've tried something like this:
start() ->
    Lines = into_list("Filepath"),
    LineNR = erlang:length(Lines),
    getLine(LineNR, Lines).

getLine(NewLineNR, List) when NewLineNR >= 0 ->
    NewNewLineNR = NewLineNR - 1,
    task(NewNewLineNR, List).

task(NewLineNR, List) ->
    Line = line_nr(NewLineNR, List),
    [List] = string:tokens(Line, ","),
    Insert = [List],
    %% The insertion in the DB happens here
    insert_to_db(Insert),
    getLine(NewLineNR, List).
But it crashes and I don't know why. Tips are appreciated! Thanks!
So what happens when NewLineNR = -1? Erlang will freak out because it can't match any of the getLine clauses.
You can do this a little simpler though (assuming line_nr just gets line x out of the list). In Erlang you can use pattern matching to control the program flow, instead of relying on keeping track of which item in the list you are working with:
start() ->
    Lines = into_list("Filepath"),
    task(Lines).

task([Head | Tail]) ->
    TokensList = string:tokens(Head, ","),
    %% The insertion in the DB happens here
    insert_to_db(TokensList),
    task(Tail);
task([]) -> done.
This version uses list head/tail pattern matching to help recurse through the list. Assuming into_list just reads the file into a list of strings, the code does the following:
1. Read file into list.
2. Tokenize first entry in list.
3. Insert into DB.
4. Repeat 2-3 on remainder of list until list is empty.
5. Return done atom when list is empty.
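If it helps to see that control flow in another language, here is a rough Python sketch of the same head/tail recursion (insert_to_db stands in for the real MySQL insert from the question):

def insert_to_db(tokens):
    # Placeholder for the real MySQL insert from the question.
    print(tokens)

def insert_lines(lines):
    # Empty list: nothing left to do, mirroring task([]) -> done.
    if not lines:
        return 'done'
    # Tokenize the head, insert it, then recurse on the tail.
    insert_to_db(lines[0].split(','))
    return insert_lines(lines[1:])

insert_lines(['a,b,c', 'd,e,f'])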
First of all, I'm new to Power Query, so I'm taking my first steps. But I need to try to deliver something at work soon so I can gain some breathing time to learn.
I have the following table (example):
Orig_Item  Alt_Item
5.7        5.10
79.19      79.60
79.60      79.86
10.10
And I need to create a column that will loop the table and display the final Alt_Item. So the result would be the following:
Orig_Item  Alt_Item  Final_Item
5.7        5.10      5.10
79.19      79.60     79.86
79.60      79.86     79.86
10.10
Many thanks
Actually, this is far too complicated for a first Power Query experience.
If that's what you've got to do, then so be it, but you should be aware that you are starting with a quite difficult task.
Small detail: I would expect the last Final_Item to be 10.10. According to the example, the Final_Item will be null if Alt_Item is null. If that is not correct, well that would be a nice first step for you to adjust the code below accordingly.
You can create a new blank query, copy and paste this code in the Advanced Editor (replacing the default code) and adjust the Source to your table name.
let
    Source = Table.Buffer(Table1),
    AddedFinal_Item =
        Table.AddColumn(
            Source,
            "Final_Item",
            each if [Alt_Item] = null
                 then null
                 else List.Last(
                     List.Generate(
                         () => [Final_Item = [Alt_Item], Continue = true],
                         each [Continue],
                         each [Final_Item =
                                   Table.First(
                                       Table.SelectRows(
                                           Source,
                                           (x) => x[Orig_Item] = [Final_Item]),
                                       [Alt_Item = "not found"]
                                   )[Alt_Item],
                               Continue = Final_Item <> "not found"],
                         each [Final_Item])))
in
    AddedFinal_Item
This code uses function List.Generate to perform the looping.
For performance reasons, the table should always be buffered in memory (Table.Buffer), before invoking List.Generate.
List.Generate is one of the most complex Power Query functions.
It requires 4 arguments, each of which is a function in itself.
In this case the first argument starts with () and the other 3 with each (it should be clear from the outline above: they are aligned).
Argument 1 defines the initial values: a record with fields Final_Item and Continue.
Argument 2 is the condition to continue: if an item is found.
Argument 3 is the actual transformation in each iteration: the Source table is searched (with Table.SelectRows) for an Orig_Item equal to Alt_Item. This is wrapped in Table.First, which returns the first record (if any is found) and accepts a default value if nothing is found, in this case a record with field Alt_Item and value "not found". From this result the value of record field [Alt_Item] is returned, which is either the value from the first record or "not found" from the default value.
If the value is "not found", then Continue becomes false and the iterations will stop.
Argument 4 is the value that will be returned: Final_Item.
List.Generate returns a list of all values from each iteration. Only the last value is required, so List.Generate is wrapped in List.Last.
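If the M syntax is hard to follow, the same loop can be sketched in Python, with the table reduced to a lookup from Orig_Item to Alt_Item (names and data are illustrative only):

# The sample table as a lookup from Orig_Item to Alt_Item.
alt = {'5.7': '5.10', '79.19': '79.60', '79.60': '79.86'}

def final_item(alt_item):
    # Mirrors List.Generate: keep following the chain until "not found".
    if alt_item is None:
        return None
    current = alt_item
    while True:
        nxt = alt.get(current)  # Table.SelectRows + Table.First
        if nxt is None:         # nothing found: stop and return the last value
            return current
        current = nxt

print(final_item('79.60'))  # 79.86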
Final remark: actual looping is rarely required in Power Query and I think it should be avoided as much as possible. In this case, however, it is a feasible solution as you don't know in advance how many Alt_Items will be encountered.
An alternative to List.Generate is using a recursive function.
Also List.Accumulate is close to looping, but that has a fixed number of iterations.
This can be solved simply with a self-join; the open question is how many layers of indirection you'll be expected to support.
Assuming just one level of indirection, no duplicates on Orig_Item, the solution is:
let
    Source = #"Input Table",
    SelfJoin1 = Table.NestedJoin( Source, {"Alt_Item"}, Source, {"Orig_Item"}, "_tmp_" ),
    Expand1 = Table.ExpandTableColumn( SelfJoin1, "_tmp_", {"Alt_Item"}, {"_lkp_"} ),
    ChkJoin1 = Table.AddColumn( Expand1, "Final_Item", each (if [_lkp_] = null then [Alt_Item] else [_lkp_]), type number)
in
    ChkJoin1
This is doable with the regular UI, using Merge Queries, then Expand Column and adding a custom column.
If you want to support more than one level of indirection, turn it into a function to be called X times. For data-driven levels of indirection, you wrap the calls in a List.Generate that drops the intermediate tables in a structured column, though that's a much more advanced level of PQ.
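For comparison only, the same one-level self-join is easy to express in pandas; a sketch with the sample data from the question (not part of the M solution):

import pandas as pd

df = pd.DataFrame({'Orig_Item': ['5.7', '79.19', '79.60', '10.10'],
                   'Alt_Item':  ['5.10', '79.60', '79.86', None]})

# Self-join: look up each Alt_Item in the Orig_Item column.
lookup = df.rename(columns={'Orig_Item': 'Alt_Item', 'Alt_Item': '_lkp_'})
merged = df.merge(lookup, on='Alt_Item', how='left')

# One level of indirection: use the looked-up value, else keep Alt_Item.
merged['Final_Item'] = merged['_lkp_'].fillna(merged['Alt_Item'])
print(merged.drop(columns='_lkp_'))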
Hopefully an easy problem for an experienced SQL person. I have an application which uses SQL Server, and I cannot perform this query in the application, so I'm hoping to back-door it, but I need help.
I have a table with a large list of emails and all their metadata. I'm trying to find email that is only between parties of this one company and flag it.
What I did was search where companyName.com is in To and From and marked a TagField as 1 (I did this through my application's front end).
Now what I need to do is search where any other possible values, ignoring companyName.com, exist in To and From where I've already flagged them as 1 in TagField. From will usually just have one value, but To could have multiple, all formatted differently but all separated by a semicolon (I will probably have to apply this same search to the CC and BCC columns, too).
Any thoughts?
Replace the ';' with the empty string, then check whether the length changed. If there's only one email address, there shouldn't be a ';'. You could also use the same technique to replace the company name with the empty string; anything left would be the other companies.
select email_id, to_email
from yourtable
where TagField = 1 and len(to_email) <> len(replace(to_email,';',''))
This solution is based on the following thread: Number of times a particular character appears in a string
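The same counting trick is easy to sanity-check outside SQL; a quick Python sketch of both ideas (the addresses and domain here are made up):

to_email = "alice@companyName.com; bob@other.org; carol@companyName.com"

# Count separators: the length difference after removing every ';'.
semicolons = len(to_email) - len(to_email.replace(';', ''))
print(semicolons)  # 2, so there are multiple recipients

# Strip the company domain: whatever is left belongs to other companies.
leftover = [a for a in to_email.split(';') if 'companyName.com' not in a]
print(leftover)  # [' bob@other.org']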
So I went an entirely different route and exported my data to a CSV and used Python to get to where I needed. Here's the code I used in case anybody needs it. What this returned for me was a list of DocIDs (unique identifiers that were in the CSV) wherever there was an email address in the To field that wasn't from one specific domain. I went into the original CSV and made sure all instances of this domain name were in all lowercase, too.
import csv
import tkinter as tk
from tkinter import filedialog

# Ask for the CSV exported from the email application.
root = tk.Tk()
root.withdraw()
file_path = filedialog.askopenfilename()

sub = "domainname"

def find_multiple_to(reader):
    for row in reader:
        if row['To'].find(";") != -1:
            # Multiple recipients: keep only addresses outside the domain.
            to_array = row['To'].split(';')
            row['To'] = [s for s in to_array if sub not in s]
        else:
            row['To'] = 'empty'
        with open('location\\newCSV-BCCFieldSearch.csv', 'a') as f:
            if row['To'] != "empty" and row['To'] != []:
                print(row['DocID'], row['To'], file=f)

with open(file_path) as csvfile:
    reader = csv.DictReader(csvfile)
    find_multiple_to(reader)
I am following up on my previous question.
Have sorted out a loop to import CSVs, concatenate data and remove duplicates.
import glob
import pandas as pd

files = glob.glob('./A08_csv/A08_B1_T*.csv')
dfs = [pd.read_csv(fp, index_col=[0], parse_dates=[0], dayfirst=True) for fp in files]
df = pd.concat(dfs)
df_purged = df.drop_duplicates(inplace=True)
print(df_purged)
However df.drop_duplicates(inplace=True) does not work (surely I am missing something) and the print just gives None. How can I specify that duplicates should be checked by index? Adding the column name does not seem to work.
Also, how can I turn this loop into a function, so I can apply it to CSVs with different filenames (i.e. something that could work for A08_B1_T*.csv (bedroom) and for A08_KI_T*.csv (kitchen) etc.)?
Do you understand the inplace=True option?
If you do it in place, you modify df itself, so don't assign the result to df_purged.
You have two options here: either you want to keep the 'unpurged' dataframe and you do:
df_purged = df.drop_duplicates()
Or you don't care about keeping it and you do:
df.drop_duplicates(inplace = True)
With the first option your result dataframe will be df_purged; with the second it will be df, which is purged since you performed the operation in place.
That being said, if you want to purge on your index and you don't need to keep it, you can reset_index and then drop_duplicates like this:
df_purged = df.reset_index().drop_duplicates(['index']).drop('index', axis=1)
And if you need to keep the index (modulo the dropped lines):
df_purged = df.reset_index().drop_duplicates(['index']).set_index('index')
df_purged.index.name = None
(Note that once again removing the index name is only for aesthetics.)
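As a side note, pandas can also deduplicate on the index directly, without the reset_index round-trip, using Index.duplicated:

# Keep the first occurrence of each index value.
df_purged = df[~df.index.duplicated(keep='first')]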
Would this help?
df.drop_duplicates(['col_name'])
Here is a solution that adds the index as dataframe columns, drops duplicates on those, then sets them back as the index:
df = df.reset_index().drop_duplicates(subset=['Date', 'Time'], keep='last').set_index(['Date', 'Time'])
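For the second part of the question (reusing the loop for the other rooms), a minimal sketch that wraps the import in a function taking the glob pattern:

import glob
import pandas as pd

def load_room(pattern):
    # e.g. load_room('./A08_csv/A08_B1_T*.csv') for the bedroom,
    #      load_room('./A08_csv/A08_KI_T*.csv') for the kitchen.
    files = glob.glob(pattern)
    dfs = [pd.read_csv(fp, index_col=[0], parse_dates=[0], dayfirst=True)
           for fp in files]
    return pd.concat(dfs).drop_duplicates()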
I have a student who has gathered data in an online survey where each response was given its own variable, rather than one variable holding whichever response was chosen. We need a scoring algorithm which reads the responses and integrates them. I can do this with IF statements per item, e.g.,
if Q1_1=1 var1=1.
if Q1_2=1 var1=2.
if Q1_3=1 var1=3.
if Q1_4=1 var1=4.
Doing this for a 200 item survey (now more like 1000) will be a drag and subject to many typos unless automated. I have no experience of vectors and loops in SPSS, but some reading suggests this is the way to approach the problem.
I would like to run if statements as something like (pseudocode):
for items=1 to 30
for responses=1 to 4
if Q1_1=1 a=1.
if Q1_2=1 a=2.
if Q1_3=1 a=3.
if Q1_4=1 a=4.
compute newitem(items)=a.
next response.
next item.
Which I would hope would produce new variables (newitem1 to newitem30), each holding one of the 4 responses from its original corresponding 4 variables.
Never written serious SPSS code before: please advise!
This will do the job:
* creating some sample data.
data list free (",")/Item1_1 to Item1_4 Item2_1 to Item2_4 Item3_1 to Item3_4.
begin data
1,,,,,1,,,,,1,,
,1,,,1,,,,1,,,,
,,,1,,,1,,,,,1,
end data.
* now looping over the items and constructing the "NewItems".
do repeat Item1=Item1_1 to Item1_4
/Item2=Item2_1 to Item2_4
/Item3=Item3_1 to Item3_4
/Val=1 to 4.
if Item1=1 NewItem1=Val.
if Item2=1 NewItem2=Val.
if Item3=1 NewItem3=Val.
end repeat.
execute.
In this way you run all your loops simultaneously.
Note that "ItemX_1 to ItemX_4" will only work if these four variables are consecutive in the dataset. If they aren't, you have to name each of them separately - "ItemX_1 ItemX_2 ItemX_3 itemX_4".
Now if you have many such item sets, all named regularly as in the example, the following macro can shorten the process:
define !DoItems (ItemList=!cmdend)
!do !Item !in (!ItemList)
do repeat !Item=!concat(!Item,"_1") !concat(!Item,"_2") !concat(!Item,"_3") !concat(!Item,"_4")/Val=1 2 3 4.
if !item=1 !concat("New",!Item)=Val.
end repeat.
!doend
execute.
!enddefine.
* now you just have to call the macro and list all your Item names.
!DoItems ItemList=Item1 Item2 Item3.
The macro will work with any item name, as long as the variables are named ItemName_1, ItemName_2, etc.
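Alternatively, since the typing rather than the logic is the main hazard with 1000 items, you could generate the repetitive IF statements with a short script and paste the output into a syntax file. A sketch in Python, assuming the Q<item>_<response> naming from the question:

# Write one IF statement per item/response pair, mirroring the
# question's pattern (Q<item>_<response> -> var<item>).
with open('score_items.sps', 'w') as f:
    for item in range(1, 31):       # items 1..30
        for resp in range(1, 5):    # responses 1..4
            f.write(f"if Q{item}_{resp}=1 var{item}={resp}.\n")
    f.write("execute.\n")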
I have no idea what is going on here and it's a little bizarre.
I'm adapting a VBA macro into a VB.net project, and I'm experiencing what I would describe as some extremely unusual behavior in a method I'm using to pass data around in VB.net. Here's the setup...
I have, for indexing reasons, a collection that consists of all open orders:
Public allOpenOrders As New Collection
Within this collection, I store other collections, indexed by account number, that each contain information about each open order in an array that is three elements long. Here is how I'm populating it:
openOrderData(0) = some information
openOrderData(1) = some information
openOrderData(2) = some information

SyncLock allOpenOrders
    If allOpenOrders.Contains(accountNumber) Then
        'Already in the collection...
        accountOpenOrders = allOpenOrders(accountNumber)
        accountOpenOrders.Add(openOrderData)
    Else
        'Not already in collection
        accountOpenOrders = New Collection
        accountOpenOrders.Add(openOrderData)
        allOpenOrders.Add(accountOpenOrders, accountNumber)
    End If
End SyncLock
Here's the thing: if I place a stop after End SyncLock and check the collection, I can clearly see that the array with all data is there, plain as day. However, when I move on in my code (this is occurring in another thread after the preceding code has executed) to retrieve it and write it to a workbook...
If allOpenOrders.Contains(accountNumber) Then
    accountOpenOrders = allOpenOrders(accountNumber)
    For Each openOrderArray In accountOpenOrders
        OutputSheet.Cells(1, 1).value = accountNumber
        For counter = 0 To 2
            OutputSheet.Cells(1, counter + 2).value = openOrderArray(counter)
        Next counter
    Next openOrderArray
End If
I get the first element of the array in column B, but C and D are blank. Even more puzzling, if I put a stop right after the allOpenOrders.Contains line, I can look at the collection and the last two elements of the array are now blank. Most puzzling of all, they aren't just blank: they contain a number of spaces equal in length to the original value I recorded in that element of the array?!
Any ideas are appreciated. I can tell you I'm using the same type of method to load other data into this workbook with no problems. These are also the only places where the allOpenOrders collection is touched... I'm so confused by these results.