Getting the text after the last "," in an Excel column with Python

I want to extract the text that follows the last " , " in each cell of column D, reading from left to right, so that I get the final phrase of each sentence, as shown in this screenshot:
http://prntscr.com/fye9hi
Can someone help me, please?
This is my code, but it does not do what I want:
import xlrd
file_location = "C:/Users/admin/DataKH.xlsx"
wb = xlrd.open_workbook(file_location)
sheet = wb.sheet_by_index(0)
print(sheet.nrows)
print(sheet.ncols)
for rows in range(sheet.nrows):
    row_0 = sheet.cell_value(rows,0)
from xlwt import Workbook
import xlwt
from xlwt import Formula
workbook = xlrd.open_workbook(file_location)
sheet = workbook.sheet_by_index(0)
data = [sheet.cell_value(row,3) for row in range(sheet.nrows)]
data1 = [sheet.cell_value(row, 4) for row in range(sheet.nrows)]
workbook = xlwt.Workbook()
sheet = workbook.add_sheet('test')
for index, value in enumerate(data):
    sheet.write(index, 0, value)
for index, value in enumerate(data1):
    sheet.write(index, 1 , value)
workbook.save('output.xls')

How about using the split(",") method? It returns a list of the comma-separated phrases, so you can easily iterate through it or pick out the one you need.
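A minimal sketch of that idea in plain Python, assuming each cell value is an ordinary string. The helper name `last_phrase` and the sample value are made up for illustration and are not taken from the original workbook:

```python
def last_phrase(cell, sep=","):
    """Return the text after the last separator, without surrounding whitespace."""
    # rsplit with maxsplit=1 splits only once, starting from the right,
    # so [-1] is everything after the final separator (or the whole
    # string when the separator does not occur at all).
    return str(cell).rsplit(sep, 1)[-1].strip()


# Hypothetical sample value.
sample = "12 Nguyen Trai , Ward 5 , District 1"
print(last_phrase(sample))
```

The same function can be applied to every value read from column D, whichever library is used to read the sheet.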
@MinhTuấnNgô: I find xlrd's syntax confusing, so I switched to pandas instead.
import pandas as pd
df = pd.read_excel('SampleData.xlsx')
df['Extracted Address'] = pd.Series((cell.split(',')[-1] for cell in df['Address']), index = df.index)
I'm not sure exactly what you mean by 'getting the data after the comma', but this shows one way to manipulate the cell data.
Once you have finished formatting the data, you can export it back to Excel using df.to_excel(<filepath>).
For xlrd, you can iterate through a specific column using this syntax:
for row in ws.col(2)[1:]:
This skips the first (header) row, which pandas already takes care of, and iterates over all remaining rows.

Related

How to manipulate array in IronPython retrieved from excel

I am using IronPython to retrieve data from an excel worksheet as follows:
import clr
clr.AddReference("Microsoft.Office.Interop.Excel")
from Microsoft.Office.Interop import Excel
filename= RunDir+filename_cfg+".xlsx"
excelApp = Excel.ApplicationClass()
excelApp.Visible = False
workbook = excelApp.Workbooks.Open(filename)
worksheet = workbook.ActiveSheet
# Find rows without comment "#", find columns without "None"
dataRows = []
for irow in range(1,1000):
    if worksheet.Cells[irow,1].Value() == None:
        break
    elif "#" in worksheet.Cells[irow,1].Value():
        continue
    else:
        dataRows.append(irow)
for icol in range(1,1000):
    if worksheet.Cells[dataRows[0],icol].Value() == None:
        break
numCol = icol-1
cfgData = []
for irow in dataRows:
    cfgData.append(worksheet.Range(worksheet.Cells[irow,1],worksheet.Cells[irow,numCol]).Value())
excelApp.Quit() #or excel.Exit()
However, the data comes in as 2D array objects, and I don't know how to retrieve the data from them. When I tried this in the past in a different framework, the data came in as a list and I could organize it as I saw fit.
Is there any way to convert this 2D array object to a list, or to ensure that I retrieve a list from Excel?
Thanks in advance.

Google Sheet Query - Building Reference Array Dynamically

I have multiple tabs in a file and want to merge the same range from each of them into a master tab.
I wrote this query:
=QUERY({source1!AR5:AU;source1!AR5:AU}, "select Col1,Col2,Col3,Col4 where Col1 is not null order by Col1", 0)
But eventually I will have more and more tabs (around 30), and I don't want to change my query manually each time. How can I do that?
I saw somewhere that I can use macros to create a function; do you have an idea of what the code would look like?
I just want something simple, where all my tab names are listed in another tab.
Try:
function myFunction() {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var sheets = ss.getSheets();
  var allSheets = [];
  // Append more names if you want to exclude more
  // Since 'query' is master sheet, excluded it as well
  // Remove 'query' below if you want to include it in the formula
  var excludedTabs = ['other', 'random', 'query'];
  sheets.forEach(function (sheet){
    var sheetName = sheet.getSheetName();
    if (!excludedTabs.includes(sheetName)){
      allSheets.push(sheetName);
    }
  });
  // Set formula in A1 cell in "query" sheet
  // Modify A1 to change the cell
  var cell = ss.getSheetByName('query').getRange("A1");
  cell.setFormula("=QUERY({" + allSheets.join('!AR5:AU;') + "!AR5:AU}, \"select Col1,Col2,Col3,Col4 where Col1 is not null order by Col1\", 0)");
}
This sets the formula in cell A1 of the sheet named query; to put it somewhere else, change A1 and query in the script.
You can also exclude more sheets by appending their names to the excludedTabs array.
Sample output:
Sample sheets were added to check that newly added sheets are picked up.
The expected formula was added (excluding the query, random and other sheets).
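The string-building step of the script can be sketched outside Apps Script as well. The Python below is only an illustration of how the formula text is assembled; the function name `build_query_formula` and its parameters are hypothetical. Unlike the script above, it wraps each sheet name in single quotes so that names containing spaces stay valid, and it assumes that an apostrophe inside a sheet name is escaped by doubling it:

```python
def build_query_formula(sheet_names, cell_range="AR5:AU",
                        excluded=("other", "random", "query")):
    """Build a QUERY formula that stacks the same range from every non-excluded sheet."""
    refs = []
    for name in sheet_names:
        if name in excluded:
            continue
        # Quote the sheet name; doubling an embedded apostrophe is an
        # assumption about the spreadsheet's escaping rule.
        quoted = "'" + name.replace("'", "''") + "'"
        refs.append(quoted + "!" + cell_range)
    return (
        "=QUERY({" + ";".join(refs) + "}, "
        '"select Col1,Col2,Col3,Col4 where Col1 is not null order by Col1", 0)'
    )


print(build_query_formula(["source1", "source 2", "query"]))
```

The resulting string is what a script would pass to setFormula on the target cell.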

Google Sheets Query - build referenced array source dynamically

I have many sheets in my spreadsheet (sheet1, sheet2, sheet3, ...) and I want to add them all to an array, ideally driven by a range of cells. At the moment I add them manually, as below:
=query(
{
INDIRECT("sheet1!$A$3:$V");
INDIRECT("sheet2!$A$3:$V");
INDIRECT("sheet3!$A$3:$V") };
"SELECT Col2, Col3, Col4, ...[etc]")
I want to create a "Settings" sheet that lists all the sheets that should be in the array, like this:
=query(
{
get_all_sheets_names_from('settings!A1:A100'); // something like this
};
"SELECT Col2, Col3, Col4, ...[etc]")
Is it possible?
My attempts:
https://docs.google.com/spreadsheets/d/1ZMzu6FuVyAJiWfNIHW87OW1Vpg_mM_QdtGs9nq9UXCU/edit#gid=0
I would like the array of data sources to be taken from column G2:G.
The example in column C shows how this can be done manually. However, I am looking for a solution where the query itself never has to be edited, so that it picks up the array of data source names from G2:G.
1) What I think
I think it is not possible to use INDIRECT in the query parameters, because INDIRECT returns a cell reference and the parameters {(); ()} in a query are fixed objects.
An INDIRECT on a complete query is not possible either, for the same reason: a query does not return a reference to a cell.
2) Limited solution
The principle:
case 1: look at the 3rd line of column G (3rd source); if it is empty, test case 2, otherwise apply the formula with 3 sources.
case 2: if the 2nd source is empty, go to case 3, otherwise apply the formula with 2 sources.
case 3: if the 1st source is empty, display "no source(s)", otherwise apply the formula with 1 source.
Formula
Note 1: replace SI (fr) with IF and ESTVIDE (fr) with ISBLANK (en).
Note 2: you can test with G2="source1" and G3="source2",
but it also works with G2="source3" and G3="source1".
=SI(ESTVIDE($G$4); SI(ESTVIDE($G$3); SI(ESTVIDE($G$1); "no source(s)";query({((INDIRECT("'"&G2&"'!A1:A5")))};"SELECT Col1")) ;query({(INDIRECT("'"&G2&"'!A1:A5"));(INDIRECT("'"&G3&"'!A1:A5"))};"SELECT Col1")) ;query({(INDIRECT("'"&G2&"'!A1:A5"));(INDIRECT("'"&G3&"'!A1:A5"));(INDIRECT("'"&G4&"'!A1:A5"))};"SELECT Col1"))
Online sheet
https://docs.google.com/spreadsheets/d/1sCwwFjpYKKzzAvVwmbMUWcmHSc1wY52XnHlFdT00A3U/edit?usp=sharing
Limitations
Of course, this formula handles only 3 sources at most.
It would become very big and ugly with more sources.
Script
Is a macro the only solution?
Solution with a macro
Add this script:
it reads the source names from G2:G30 (if you need more, use G100, ...)
it builds the formula and puts it in H2
it reads at most 50 values from each source (see A1:A50 in the source code)
It is not hard to understand.
Note: managing macros in Google Sheets is a separate problem; if you need advice, please post a comment.
Link to live sheet:
https://docs.google.com/spreadsheets/d/14XaR-UsADUpCUCVWqeg0zCbfGy3CCvnwVxUhozjYocc/edit?usp=sharing
function formula6() {
  var spreadsheet = SpreadsheetApp.getActive();
  var values=spreadsheet.getRange('G2:G30').getValues();
  var acSources="{";
  for (var i = 0; (i < values.length) && (values[i]!=""); i++) {
    if (i>0) { acSources+=";" }
    acSources=acSources+'INDIRECT("'+values[i]+'!A1:A50")';
  }
  acSources=acSources+"}";
  var formula='query('+acSources+';"SELECT Col1")';
  spreadsheet.getRange('H2').activate();
  spreadsheet.getCurrentCell().setFormula('='+formula);
};
The people who copied the INDIRECT function into Google Sheets failed to see its potential: they made no effort to improve on it or to cover the obvious array cases, which are crucial in this age of arrays.
In other words, INDIRECT can't take more than one range:
=INDIRECT("Sheet1!A:B"; "Sheet2!A:B")
nor can it convert an array of strings into active references, which means that any attempt at concatenation is also futile:
=INDIRECT(MasterSheet!A1:A10)
————————————————————————————————————————————————————————————————————————————————————
=INDIRECT("{Sheet1!A:B; Sheet2!A:B}")
————————————————————————————————————————————————————————————————————————————————————
={INDIRECT("Sheet1!A:B"; "Sheet2!A:B")}
————————————————————————————————————————————————————————————————————————————————————
=INDIRECT("{INDIRECT("Sheet1!A:B"); INDIRECT("Sheet2!A:B")}")
The only possible way is to use INDIRECT for each and every range, like:
={INDIRECT("Sheet1!A:B"); INDIRECT("Sheet2!A:B")}
which means that the best you can do is to pre-program your array like this, so that it still works when only some of the sheets/tabs exist (take a scenario where only 2 of a total of 4 sheets have been created):
=QUERY(
{IFERROR(INDIRECT("Sheet1!A1:B5"), {"",""});
IFERROR(INDIRECT("Sheet2!A1:B5"), {"",""});
IFERROR(INDIRECT("Sheet3!A1:B5"), {"",""});
IFERROR(INDIRECT("Sheet4!A1:B5"), {"",""})},
"where Col1 is not null", 0)
So, even if the sheet names are predictable (which they are not always), pre-programming 100+ sheets like this would be painful (even if there are various sneaky ways to write such a formula in under 30 seconds).
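One such shortcut is to generate the pre-programmed formula text with a few lines of code and paste the result into the cell. This is only a sketch: the function name `build_stacked_query`, the Sheet1..Sheet4 names and the A1:B5 range are illustrative assumptions mirroring the example above.

```python
def build_stacked_query(sheet_names, cell_range="A1:B5", num_cols=2):
    """Return a QUERY formula with one IFERROR(INDIRECT(...)) row per sheet name."""
    # Fallback row of empty strings, as wide as the range, e.g. {"",""}.
    blank_row = "{" + ",".join(['""'] * num_cols) + "}"
    parts = [
        'IFERROR(INDIRECT("{0}!{1}"), {2})'.format(name, cell_range, blank_row)
        for name in sheet_names
    ]
    # Join the rows with ";" so they are stacked vertically in the array literal.
    return '=QUERY(\n {' + ";\n  ".join(parts) + '},\n "where Col1 is not null", 0)'


names = ["Sheet{0}".format(i) for i in range(1, 5)]
print(build_stacked_query(names))
```

Changing the range of the name list (for example to 100 sheets) produces the longer formula without any manual typing.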
An alternative is to use a script that takes a string and injects it as the formula.
A1 would hold a formula that builds a string which looks like a real formula:
=ARRAYFORMULA("=QUERY({"&TEXTJOIN("; "; 1;
FILTER(SNAME(1); SNAME(1)<>SNAME(0))&"!A1:A20")&"}; ""where Col1 is not null""; 0)")
Then this script takes the string from cell A1 and pastes it as a valid formula into cell A2:
function onEdit() {
  var sheet = SpreadsheetApp.getActive().getSheetByName('query'); // MASTER SHEET NAME
  var src = sheet.getRange("A1"); // COPY STRING FROM
  var str = src.getValue();
  var cell = sheet.getRange("A2"); // PASTE AS FORMULA INTO
  cell.setFormula(str);
}
function SNAME(option) {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var sheet = ss.getActiveSheet()
  var thisSheet = sheet.getName();
  if(option === 0){ // ACTIVE SHEET NAME =SNAME(0)
    return thisSheet;
  }else if(option === 1){ // ALL SHEET NAMES =SNAME(1)
    var sheetList = [];
    ss.getSheets().forEach(function(val){
      sheetList.push(val.getName())
    });
    return sheetList;
  }else if(option === 2){ // SPREADSHEET NAME =SNAME(2)
    return ss.getName();
  }else{
    return "#N/A"; // ERROR MESSAGE
  };
}
Of course, the script can be changed to use an onOpen trigger, or given a custom name and triggered from a custom menu or a button (however, it's not possible to use the custom function as a formula directly).
This means you never have to edit the formula by hand to add references when new sheets are added. The only drawback is that the sheet-name script is not recalculated automatically: to force a recalculation, break the A1 formula, for example by adding ' before the leading =, press Enter, and then remove the ' again to restore the formula.
spreadsheet demo

Keras input shape error - passing the whole array not each line

I am loading images from a csv file. The images are 300 x 300 pixels but flattened to 90000 values. I am getting an input shape error. I am using the TensorFlow backend. I have attached an image of my csv file as well as an image of the error. It looks like it's passing the whole list of arrays instead of passing each line.
"ValueError: Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 1 arrays but instead got the following list of 380 arrays:[array([ 43., 45., 46., ..., 161., 152., 146.]), array([ 211., 222., 224., ..., 212., 213., 213.]), array([ 201., 201., "
csv file
error
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout
import csv
import cv2
import re
loaded_images_train = []
loaded_labels_train = []
loaded_images_test = []
loaded_labels_test = []
with open('images_train.csv') as f:
    csvReader = csv.reader(f, lineterminator = '\n')
    for row in csvReader:
        row = np.asarray(row, dtype='float')
        loaded_images_train.append(row)
with open('labels_train.csv') as f:
    csvReader = csv.reader(f, lineterminator = '\n')
    for row in csvReader:
        row = str(row)
        row = row.strip(',')
        loaded_labels_train.append(row)
with open('images_test.csv') as f:
    csvReader = csv.reader(f, lineterminator = '\n')
    for row in csvReader:
        row = np.asarray(row, dtype='float')
        loaded_images_test.append(row)
with open('labels_test.csv') as f:
    csvReader = csv.reader(f, lineterminator = '\n')
    for row in csvReader:
        row = str(row)
        row = row.strip(',')
        loaded_labels_test.append(row)
# load data
x_train = loaded_images_train
y_train = loaded_labels_train
print("Loaded Training Data")
x_test = loaded_images_test
y_test = loaded_labels_test
print("Loaded Testing Data")
model = Sequential()
model.add(Dense(64, input_shape=(90000,), activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
model.fit(x_train, y_train,
          epochs=20,
          batch_size=128)
#score = model.evaluate(x_test, y_test, batch_size=128)
print(score)
Converting each line with asarray and then feeding Keras a list of arrays does not work.
I tested your code with a slightly different approach and it ran flawlessly for me with the csv you provided in the comments (after changing input_size to 400).
Read all lines from the file into loaded_images_train. It will be a list of lists:
input_size = 90000
with open('images_train.csv') as f:
    csvReader = csv.reader(f, lineterminator = '\n')
    for row in csvReader:
        assert len(row) == input_size
        loaded_images_train.append(row)
I've included the assertion following your feedback on my comment.
You can also assert len(row) == output_size for the labels.
On the other hand, if you are pretty sure about the sizes of the rows, you can replace the loop with a simple:
loaded_images_train = list(csvReader)
Whichever you choose, do the same to test images.
Then do the conversion to numpy.ndarray when declaring x_train:
x_train = np.asarray(loaded_images_train, dtype=float) # you don't really need the quotes here
Finally, printing the shape of the loaded data can help you know that everything is ok. For example:
print("Loaded Training Data", x_train.shape)
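The loading-and-checking step can be tried without Keras or the real files. The sketch below uses only the standard library; `load_rows` is a hypothetical helper, and the in-memory CSV (3 rows of 4 values) is a made-up stand-in for images_train.csv:

```python
import csv
import io


def load_rows(csv_file, input_size):
    """Read every CSV line into a list of floats, checking each row's length."""
    rows = []
    for line_no, row in enumerate(csv.reader(csv_file), start=1):
        if len(row) != input_size:
            raise ValueError(
                "line {0}: expected {1} values, got {2}".format(
                    line_no, input_size, len(row)))
        rows.append([float(value) for value in row])
    return rows


# Tiny made-up stand-in for images_train.csv: 3 "images" of 4 pixels each.
fake_csv = io.StringIO("43,45,46,161\n211,222,224,212\n201,201,199,213\n")
images = load_rows(fake_csv, input_size=4)

# (number of samples, values per sample): the 2-D layout that a single
# np.asarray(images, dtype=float) call would turn into one array.
print((len(images), len(images[0])))
```

Raising an error with the line number makes it easier to locate a malformed row than a bare assertion does.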
The reason you hit this problem is that your dataset is a Python list, whereas a Keras model expects a NumPy array; a list is interpreted as one array per model input, which is why the error mentions a list of 380 arrays.
You need to convert the list to a NumPy array with np.asarray(loaded_images_train) and make sure the shape of the data is (n, 90000).

spark scala typesafe config safe iterate over value of a specific column name

I have found similar posts on Stack Overflow. However, I could not solve my issue, which is why I am writing this post.
Aim
The aim is to perform a column projection (projection = selecting a subset of columns) while loading a SQL table (I use SQL Server).
According to the Scala cookbook, this is the way to filter columns [using an Array]:
sqlContext.read.jdbc(url,"person",Array("gender='M'"),prop)
However, I do not want to hardcode Array("col1", "col2", ...) inside my Scala code, which is why I am using a config file with Typesafe (see below).
Config file
dataset {
  type = sql
  sql{
    url = "jdbc://host:port:user:name:password"
    tablename = "ClientShampooBusinesLimited"
    driver = "driver"
    other = "i have a lot of other single string elements in the config file..."
    columnList = [
      {
        colname = "id"
        colAlias = "identifient"
      }
      {
        colname = "name"
        colAlias = "nom client"
      }
      {
        colname = "age"
        colAlias = "âge client"
      }
    ]
  }
}
Let's focus on 'columnList': the name of each SQL column corresponds exactly to 'colname'. 'colAlias' is a field that I will use later.
data.scala file
lazy val columnList = configFromFile.getList("dataset.sql.columnList")
// url, tablename and driver are plain strings in the config, so they are read with getString
lazy val dbUrl = configFromFile.getString("dataset.sql.url")
lazy val DbTableName= configFromFile.getString("dataset.sql.tablename")
lazy val DriverName= configFromFile.getString("dataset.sql.driver")
configFromFile is created in another custom class of mine, but that does not matter here. The type of columnList is ConfigList, which comes from Typesafe.
main file
def loadDataSQL(): DataFrame = {
  val url = datasetConfig.dbUrl
  val dbTablename = datasetConfig.DbTableName
  val dbDriver = datasetConfig.DriverName
  val columns = // I need help to solve this
  /* EDIT 2 march 2017
     This code should not be used. Have a look at the accepted answer.
  */
  sparkSession.read.format("jdbc").options(
    Map("url" -> url,
        "dbtable" -> dbTablename,
        "predicates" -> columns,
        "driver" -> dbDriver))
    .load()
}
So my whole issue is extracting the 'colname' values and putting them into a suitable array. Can someone help me write the right-hand side of 'val columns'?
Thanks
If you're looking for a way to read the list of colname values into a Scala Array - I think this does it:
import scala.collection.JavaConverters._
val columnList = configFromFile.getConfigList("dataset.sql.columnList")
val colNames: Array[String] = columnList.asScala.map(_.getString("colname")).toArray
With the supplied file this would result in Array(id, name, age)
EDIT:
As to your actual goal, I don't know of any option named predicates (nor can I find evidence of one in the sources, using Spark 2.0.2).
JDBC Data Source performs "projection pushdown" based on the actual columns selected in the query used. In other words - only selected columns would be read from DB, so you can use the colNames array in a select immediately following the DF creation, e.g.:
import org.apache.spark.sql.functions._
sparkSession.read
  .format("jdbc")
  .options(Map("url" -> url, "dbtable" -> dbTablename, "driver" -> dbDriver))
  .load()
  .select(colNames.map(col): _*) // selecting only desired columns
