Unpacking set in string format in python only returns first value - string-formatting

I have converted a DataFrame column into a set, and I am trying to format its values into a string by unpacking the set with *, as I would a list. However, only the first value appears in the output.
I am using python-docx to automatically create reports based on the data.
This code selects a column of a DataFrame, drops blank values and converts it into a set. The idea is to eliminate duplicates. The next step uses the format function to insert the set into a string for the report:
set_unique_statgroup = set(self.internal_df.StatGroup.dropna())
self.document.add_paragraph("{} categories have been found, and they are: {}".format(len(set_unique_statgroup), *set_unique_statgroup))
The code returns the following sentence:
"12 categories have been found, and they are: Temperature"
I was hoping it would display all of the items in the set:
"12 categories have been found, and they are: Temperature, Mood, Time of Day (...)"

I have found a workaround, possibly not the most pythonic:
Use a loop and the add_run function to add to the paragraph for each item in the set:
for item in set_unique_statgroup:
    p.add_run("{}".format(item))
p.add_run(".")
If anybody has a more compact/pythonic way of doing it, please feel free to post.
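A more compact option, as a minimal sketch: str.format fills each {} with exactly one positional argument and silently ignores any extras, so unpacking the set with * only ever puts its first element into the second placeholder. Joining the items into a single string first avoids that. The function name describe_categories is made up for illustration, and the sort is only there because sets have no guaranteed order.

```python
def describe_categories(values):
    """Build the report sentence from an iterable of category values."""
    # Deduplicate, then sort by string form so the output order is stable.
    unique = sorted(set(values), key=str)
    # Join all items into ONE string so a single {} placeholder receives them all.
    joined = ", ".join(str(item) for item in unique)
    return "{} categories have been found, and they are: {}".format(len(unique), joined)


if __name__ == "__main__":
    print(describe_categories(["Temperature", "Mood", "Mood", "Time of Day"]))
```

The returned string can then be passed to add_paragraph in a single call instead of looping over add_run.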

Related

How to add values in cells ONLY when other columns contain data from a query result

Link to example file:
https://docs.google.com/spreadsheets/d/1dCQSHWjndejkyyw-chJkBjfHgzEGYoRdXmPTNKu7ykg/edit?usp=sharing
The tab "Source data" contains the data to be used in the query on the tab "Query output". The tab "Desired result" shows what I would like the end result to look like.
The goal I'm trying to achieve is to have the formula in cell A2 on the tab "Query output" to populate the data in all four of the columns, so that it looks exactly like the "Desired result" tab. I know I can get the same result simply by entering additional formulas in C2 and D2, but this is not the objective, I need the results to come specifically from the single formula in A2.
The information in the "Additional data 1" column should simply repeat the word "Test" for every row that contains data in the first two columns. The information in the "Additional data 2" column should simply repeat the data from cell 'Source data'!A1 for every row that contains data in the first two columns.
Please feel free to edit the example file as it only contains dummy data. If you like, you can copy the tab "Query output" to create your own working formula for illustrative purposes.
EDIT:
I'm thinking along the lines of creating an array that consists of the required data for the columns "Additional data 1" and "Additional data 2" and then combining that array with the array of the query result which provides the first two columns. I've been experimenting with this in various ways, but so far the only result I have achieved is an error on the first cell of the query results. I also have no idea yet how I could make sure that the second array contains an equal amount of rows to the query result.
You can add static data into query:
=QUERY('Source data'!A3:B,"SELECT A,B, 'Test', '" & 'Source data'!A1 &"' WHERE A IS NOT NULL LABEL A '', B '', 'Test' '', '" & 'Source data'!A1 &"' ''")
Many thanks to @basic for the provided assistance! The insights were a great help to solving my issue. That said, I have muddled along a bit, and I've come up with a slightly different solution which I find better suited as it gives true blank values instead of a column filled with spaces.
First of all, instead of querying directly on the source data, I built an array and queried on that. I used the two existing columns (A and B) from the source data and added a third column to the array which does not exist in the source data. In order to make sure that the third column would consist of blank values, I used the IFERROR formula.
=IFERROR(0/0)
The formula above returns a blank because dividing by zero forces an error and the IFERROR method returns a blank unless an alternative return value is specified.
In order to be able to use this formula in an array however, it had to be tweaked slightly, because as it is it would only return a single blank cell value instead of a column of blank values. To do this, I used an already existing column from the source data, and then encapsulated it in an ARRAYFORMULA.
=ARRAYFORMULA(IFERROR('Source data'!A3:A/0))
Using this, the resulting array has the following formula.
=ARRAYFORMULA({'Source data'!A3:A,'Source data'!B3:B,IFERROR('Source data'!A3:A/0)})
This creates an array consisting of the two original columns A and B from the source data, plus an additional third column filled with blank values. This array can now be queried, and using the tricks previously provided by @basic the desired result as specified in the original question can be achieved.
Due to the query now being used upon a user-defined array, the columns in the SELECT statement now have to be referred to as Col1, Col2, Col3, instead of A, B, C. The final formula now looks like this.
=QUERY(ARRAYFORMULA({'Source data'!A3:A,'Source data'!B3:B,IFERROR('Source data'!A3:A/0)}),"SELECT Col1,Col2,'Test',Col3,'"&'Source data'!A1&"' WHERE Col1 IS NOT NULL LABEL 'Test' '','"&'Source data'!A1&"' ''")
I hope this information may prove of use to someone else as well.
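For readers who think more easily in code than in formulas, here is a rough Python sketch of the row-wise logic the formulas above implement: keep only the rows whose first column has data, and append the same constant values to each of them. It is not Sheets code. The function name append_constants and the sample values are made up for illustration.

```python
def append_constants(rows, *constants):
    """Keep rows whose first cell is non-empty and append constant cells to each."""
    return [
        tuple(row) + constants
        for row in rows
        if row[0] not in (None, "")  # mirrors WHERE ... IS NOT NULL
    ]


if __name__ == "__main__":
    source = [("a", 1), ("", 2), ("b", 3)]  # hypothetical source rows
    header_value = "Header from A1"         # stands in for 'Source data'!A1
    for row in append_constants(source, "Test", header_value):
        print(row)
```

Because the constants are attached inside the same pass that filters the rows, the extra columns always have exactly as many rows as the query result, which is the concern raised in the EDIT above.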

Excel INDEX MATCH retrieves information randomly

I have an issue where the Excel INDEX function (with a double MATCH) retrieves data seemingly at random. I have the following data set (Imgur).
What it needs to do: display the pax count on the schedule at the correct position if a match between the gate number (column A) and the time (row 1) is found in the data list.
I used: {=IFERROR(INDEX($I$54:$I$91,MATCH($A2&I$1,$H$54:$H$91&$G$54:$G$91,0)),"")}
The issue: some values are retrieved, some not (marked in gray) without any differences in the function/data.
Remarks:
All cell formats are the same.
All cells are set as an array {}.
Increasing the Index Array to all 3 columns with the data and adding [column_num] 3 does not help.
If I change values like the time in the list some move to the correct new position, some not.
Software version is (Excel) Professional Plus 2013.
Any help on what the cause of this problem could be, a solution or an alternative method would be greatly appreciated!

return array of #NA

I have a list of item numbers. Some of them don't have details associated with them. I would like a list of the item numbers that have no associated info. They can be identified by the #N/A error.
I'm running Excel 2007.
I am using this array formula to return the associated details, looking up the item number in column A:
=IF(ISERROR(VLOOKUP(J12,A:H,{2,3,4,5,6,7,8},FALSE)),"",VLOOKUP(J12,A:H,{2,3,4,5,6,7,8},FALSE))
If the lookup can't find the item number in column A, it returns blanks; otherwise it returns the associated data.
I need a list of the ones that error.
Is there a formula or a VBA macro to get this information?
Thanks for your time,
Ian
As XORLX said: why are you using {values} when it is picking items from column B?
Anyway, you can change your formula to
=IFERROR(VLOOKUP(J12,A:H,{2,3,4,5,6,7,8},FALSE),"N/A")
In case of an error it will give N/A, which you can filter on later.
But I think you want a result like the one in this pic
Sample File
Where
K6=IFERROR(INDEX(A:H,MATCH(J6,A:A,0),(IF(INDIRECT("B"&MATCH(J6,A:A,0))<>"",2,IF(INDIRECT("C"&MATCH(J6,A:A,0))<>"",3,IF(INDIRECT("D"&MATCH(J6,A:A,0))<>"",4,IF(INDIRECT("E"&MATCH(J6,A:A,0))<>"",5,IF(INDIRECT("F"&MATCH(J6,A:A,0))<>"",6,IF(INDIRECT("G"&MATCH(J6,A:A,0))<>"",7,IF(INDIRECT("H"&MATCH(J6,A:A,0))<>"",8))))))))),"N/A")
In my example I am looking for Joy, which is in row 2.
After finding row 2, the formula checks the 2nd column (B); if that is empty it moves on to the 3rd (C), and so on, and returns the data from the first non-empty column. If it finds nothing, it returns an error, which IFERROR turns into N/A.
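The same lookup logic can be sketched in Python, which may make the nested IFs easier to follow. This is only an illustration under assumptions: the table is a list of rows with the item number in the first position, and the names first_detail and items_without_details are hypothetical.

```python
def first_detail(table, key, missing="N/A"):
    """Return the first non-empty detail cell for key, or missing if none exists."""
    for row in table:
        if row[0] == key:
            for cell in row[1:]:
                if cell not in (None, ""):
                    return cell
            return missing  # key found, but every detail cell is empty
    return missing  # key not present in the first column


def items_without_details(table, keys, missing="N/A"):
    """List the keys for which the lookup yields no detail."""
    return [key for key in keys if first_detail(table, key, missing) == missing]


if __name__ == "__main__":
    table = [("Joy", "", "Sales", ""), ("Ann", "", "", "")]  # made-up sample data
    print(first_detail(table, "Joy"))
    print(items_without_details(table, ["Joy", "Ann", "Bob"]))
```

The second function produces the list the question asks for: the item numbers whose lookup fails.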

Remove Duplicate adjacent Sub-String from String in Microsoft SQL Server

I am using SQL Server 2008 and I have a column in a table, which has values like below. It basically shows departure and arrival information.
-->Heathrow/Dublin*Dublin/Heathrow
-->Gatwick/Liverpool*Liverpool/Carlisle *Carlisle/Gatwick
-->Heathrow/Dublin*Liverpool/Heathrow
(The 3rd example shown above is slightly different: the person did not depart from Dublin, but from Liverpool instead.)
This makes the column too lengthy, and I want to remove only the adjacent duplicates, so the information can be shown like below:
-->Heathrow/Dublin/Heathrow
-->Gatwick/Liverpool/Carlisle/Gatwick
-->Heathrow/Dublin***Liverpool/Heathrow
So, this would still show the correct travel route, but omit only the contiguous duplicates. Also, in the 3rd case, since the arrival and the next departure location are not the same, I would like to show the break as ***.
I found a post here that removes all duplicates (Find and Remove Repeated Substrings) but this is slightly different from the solution that I need.
Could someone share any thoughts please?
The first step is to adapt the process defined in the following link so that it splits based on /:
T-SQL split string
This returns a table which you would then loop through checking if the value contains an *. In that case you would get the text values before and after the * and compare them. Use CHARINDEX to get the position of the *, and SUBSTRING to get the values before and after. Once you have those check both values and append to your output string accordingly.
So you have a database column that contains this text string? Is your concern to display the data to the user in a new format, or to update the data in your database table with a new value?
Do you have access to the original data from which this text string was built? It would probably be easier to re-create the string in the format you desire than it would be to edit the existing string programmatically.
If you don't have access to this data, it would probably be a lot simpler to update your data (or reformat it for display) if you do the string manipulation in a high-level language such as c# or java.
If you're reformatting it for display, write the string manipulation code in whatever language is appropriate, right before displaying it. If you're updating your table, you could write a program to process the table, reading each record, building the replacement string, and updating the record before moving on to the next one.
The bottom line is that T-SQL is just not a good language for doing this sort of string examination and manipulation. If you can build a fresh string from the original data, or do your manipulation in a high-level language, you'll have an easier job of it and end up with more maintainable code.
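To illustrate how short this becomes outside T-SQL, here is a minimal Python sketch. It assumes that legs are separated by * and stops within a leg by /, as in the examples from the question; the function name collapse_route is made up for illustration.

```python
def collapse_route(route):
    """Merge adjacent legs that share a stop; mark a break in the route with ***."""
    # Split into legs on '*', then each leg into stops on '/', trimming stray spaces.
    legs = [
        [stop.strip() for stop in leg.split("/")]
        for leg in route.split("*")
        if leg.strip()
    ]
    if not legs:
        return ""
    result = "/".join(legs[0])
    last_stop = legs[0][-1]
    for stops in legs[1:]:
        if stops[0] == last_stop:
            # Same arrival and departure: drop the repeated stop.
            if len(stops) > 1:
                result += "/" + "/".join(stops[1:])
        else:
            # Departure differs from the previous arrival: keep both, joined by ***.
            result += "***" + "/".join(stops)
        last_stop = stops[-1]
    return result


if __name__ == "__main__":
    for text in (
        "Heathrow/Dublin*Dublin/Heathrow",
        "Gatwick/Liverpool*Liverpool/Carlisle *Carlisle/Gatwick",
        "Heathrow/Dublin*Liverpool/Heathrow",
    ):
        print(collapse_route(text))
```

Run on the three sample values, this prints the three desired results listed in the question.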
I wrote some code for the first example you gave. You still need to improve it for the rest ...
DECLARE @STR VARCHAR(50)='Heathrow/Dublin*Dublin/Heathrow'
-- Compare the stop before the * with the stop right after it
IF (SELECT SUBSTRING(@STR,CHARINDEX('/',@STR)+1,CHARINDEX('*',@STR)-CHARINDEX('/',@STR)-1)) =
(SELECT SUBSTRING(@STR,CHARINDEX('*',@STR)+1,LEN(SUBSTRING(@STR,CHARINDEX('/',@STR)+1,CHARINDEX('*',@STR)-CHARINDEX('/',@STR)-1))))
BEGIN
    -- Same stop: remove the * and the repeated stop name
    SELECT STUFF(@STR,CHARINDEX('*',@STR),LEN(SUBSTRING(@STR,CHARINDEX('/',@STR)+1,CHARINDEX('*',@STR)-CHARINDEX('/',@STR)-1))+1,'')
END
ELSE
BEGIN
    -- Different stops: replace only the single * with ***
    SELECT STUFF(@STR,CHARINDEX('*',@STR),1,'***')
END

Create array with vlookup

I want to conditionally grab lines from a database-style spreadsheet in Google Spreadsheets (a list with a name, location, description, price) after checking the value with a vlookup - I've used this a while ago and expected it to CONTINUE an array across for the other columns next to the one 'looked up', but it seems my memory fails me here and it just retrieves the 'searched out' value.
=vlookup("Yes",'All 2014-15'!A2:G,2)
This formula basically finds the first value of the desired rows and should create a 'Selected items from 2014-15' list, but I can't work out how to expand it to produce a list of all the rows I want. Is there a simple way to retrieve this? I've tried playing with arrayformula, but with no success.
I can change the index simply to get the other values across, but if this could be filled out through an array too that would be preferable...?
Can you try this...
=FILTER('All 2014-15'!A2:G,'All 2014-15'!A2:A="Yes")
Edit:
As suggested by Immx, I added apostrophes around the sheet name and changed the second range from A2:G to A2:A, assuming the yes/no data is in column A.
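For comparison, the difference between the two approaches can be sketched in Python: a lookup stops at the first match and returns one value, whereas a filter keeps every matching row in full. This only illustrates the idea and is not Sheets code; the function name filter_rows and the sample rows are made up.

```python
def filter_rows(rows, wanted="Yes"):
    """Return every complete row whose first column equals wanted."""
    return [row for row in rows if row[0] == wanted]


if __name__ == "__main__":
    rows = [
        ("Yes", "Lamp", "Hall", 20),  # made-up sample data
        ("No", "Desk", "Study", 150),
        ("Yes", "Rug", "Lounge", 80),
    ]
    for row in filter_rows(rows):
        print(row)
```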

Resources