I have been working with tables based on xmacros like this:
#define TABLE_MACRO(MAN_TYPE, WOMAN_TYPE) \
MAN_TYPE( John, Doe, "Addr1", arg_a, arg_b, arg_c) \
WOMAN_TYPE( Jane, Joe, "Addr2", arg_a, arg_b, arg_c) \
MAN_TYPE( Bill, Tom, "Addr3", arg_a, arg_b, arg_c)
My real tables have many more arguments than shown here, but in many cases I expand the table for only one or two of them. I use them to generate variables and enums. For example:
#define NAME_LIST(name,last,addr, arg1, arg2, arg3) name,
enum {
TABLE_MACRO(NAME_LIST,NAME_LIST)
}Name_List;
Is there a way to take TABLE_MACRO and redefine it, or change how it expands, so that it expands to just this?
#define TABLE_MACRO_NAMES_ONLY(MAN_TYPE, WOMAN_TYPE) \
MAN_TYPE( John, ) \
WOMAN_TYPE( Jane, ) \
MAN_TYPE( Bill, )
My objective is to have simplified tables, to be used like:
#define NEW_NAME(name) New_##name,
TABLE_MACRO_NAMES_ONLY(NEW_NAME, NEW_NAME)
Hopefully the wait paid off:
I came up with two solutions. The first lets you keep your original table as it is, but the usage is a bit more inconvenient.
Solution 1
// your original table as is
#define TABLE_MACRO(MAN_TYPE, WOMAN_TYPE) \
MAN_TYPE( John, Doe, "Addr1", arg_a, arg_b, arg_c) \
WOMAN_TYPE( Jane, Joe, "Addr2", arg_a, arg_b, arg_c) \
MAN_TYPE( Bill, Tom, "Addr3", arg_a, arg_b, arg_c)
// create Names only table
#define MTYPE(a,b,c,d,e,f) MAN(a)
#define WTYPE(a,b,c,d,e,f) WOMAN(a)
#define TABLE_NAMES_ONLY \
TABLE_MACRO(MTYPE,WTYPE)
// usage (here, the defines MUST be named "MAN" and "WOMAN",
// or at least match the names used in the definitions of MTYPE and WTYPE)
#define MAN(name) man_##name
#define WOMAN(name) woman_##name
TABLE_NAMES_ONLY
The other solution involves extending your original table by two arguments, but in the end, the usage is easier.
Solution 2
// your extended original table (see extra arguments)
#define TABLE_MACRO(MAN_TYPE, WOMAN_TYPE, m_extra, w_extra) \
MAN_TYPE( John, Doe, "Addr1", arg_a, arg_b, arg_c, m_extra, w_extra) \
WOMAN_TYPE( Jane, Joe, "Addr2", arg_a, arg_b, arg_c, m_extra, w_extra) \
MAN_TYPE( Bill, Tom, "Addr3", arg_a, arg_b, arg_c, m_extra, w_extra)
// create Names only table
#define MTYPE(a,b,c,d,e,f, m_extra, w_extra) m_extra(a)
#define WTYPE(a,b,c,d,e,f, m_extra, w_extra) w_extra(a)
#define TABLE_NAMES_ONLY(male_func, female_func) \
TABLE_MACRO(MTYPE, WTYPE, male_func, female_func)
// usage (here, the naming of the 2 following defines is arbitrary)
#define MAN(a) man_##a
#define WOMAN(a) woman_##a
TABLE_NAMES_ONLY(MAN, WOMAN)
// different usage:
#define NAME(a) name_##a
TABLE_NAMES_ONLY(NAME, NAME)
Both solutions have been tested with gcc 5.3.0 using gcc -E yourTestFile.c.
It's possible that there are even better solutions, but these are the ones that came to my mind.
I have these as strings: { column01 \ column02 \ column01 } (for other countries { column01 , column02 , column01 }). I want them evaluated as an array, as if copy-pasted.
The array string is created automatically, based on the user's selection. I created a dynamically personalized dataset based on a sheet named studeertijden. The user can easily select the wanted tables with checkboxes, which builds a Google Sheets array range (formula). I am trying to copy this content to another sheet to make the required data available for Google Data Studio.
The contents of the sheet studeertijden are NOT important. Let's say the cell 'legende - readme'!B39 returns a string with the required columns/data in a format like this:
{ studeertijden!A:A \ studeertijden!B:B}
If I put this in an empty sheet, by copy and paste, it works fine :
={ studeertijden!A:A \ studeertijden!B:B}
How can it be done automatically?
My first thought was to use INDIRECT.
What I've tried (does NOT work):
Cell 'legende - readme'!B39 contains:
{ studeertijden!A:A \ studeertijden!B:B}
=indirect('legende - readme'!B39)
returns :
#REF! - It is not a valid cell/range reference.
={ indirect('legende - readme'!B39) }
returns :
#REF! - It is not a valid cell/range reference.
={'legende - readme'!B39}
returns : { studeertijden!A:A \ studeertijden!B:B}
Note: European users should use '\' [backslash] as the column separator instead of ',' [comma].
Assuming I've understood the question, if string doesn't need to start and end with curly brackets, then is this the behaviour you are looking for?
=arrayformula(transpose(split(byrow(transpose(split(string,",")),lambda(row,join(",",indirect(row)))),",")))
N.B. In my case I'm assuming that string is of the format 'studeertijden!A:A,studeertijden!B:B' (i.e. comma separated). So SPLIT by the comma to generate a column vector of references, TRANSPOSE to a row vector, INDIRECT each row (with the JOIN to return a single cell per row), ARRAYFORMULA/SPLIT to get back to multiple cells per row, TRANSPOSE back into columns like the original data.
This would be a lot easier if BYROW/BYCOL could return a matrix rather than being limited to just a row/column vector - the outer SPLIT and the JOIN in the BYROW wouldn't be needed. Over in Excel world they can also use arrays of thunks rather than string manipulation to deal with this limitation (which Excel also has), but Google Sheets doesn't seem to allow them when I've tried - see https://www.flexyourdata.com/blog/what-is-a-thunk-in-an-excel-lambda-function/ for more details.
Thanks to Natalia Sharashova of AbleBits, who provided this working solution (for the complete sheet).
referenceString is a reference to the cell holding the string that lists all wanted columns as an array literal, e.g. { studeertijden!A:A \ studeertijden!B:B}
Note: European users should use ' \ ' [backslash] as the column separator instead of the default ' , ' [comma].
=REDUCE(
FALSE;
ArrayFormula(TRIM(SPLIT(REGEXREPLACE( referenceString ; "^=?{(.*?)}$"; "$1"); "\"; TRUE; TRUE)));
LAMBDA(accumulator; current_value;
IF(
accumulator = FALSE;
INDIRECT(current_value);
{ accumulator \ INDIRECT(current_value)}
)
)
)
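To make the string handling in that formula easier to follow, here is a rough Python sketch of what the REGEXREPLACE / SPLIT / TRIM part does before INDIRECT is applied to each piece. It is only an illustration, not part of the Sheets solution: parse_reference_string is a hypothetical helper name, and the backslash separator is assumed from the European locale described above.

```python
import re

def parse_reference_string(reference_string, separator="\\"):
    """Return the individual range references contained in an array-literal string.

    Mirrors the formula's steps: drop an optional leading '=' and the curly
    brackets, split on the column separator, trim each piece, skip empty pieces.
    """
    inner = re.sub(r"^=?\{(.*?)\}$", r"\1", reference_string.strip())
    return [part.strip() for part in inner.split(separator) if part.strip()]

# Hypothetical input in the same shape as the string in 'legende - readme'!B39
refs = parse_reference_string(r"{ studeertijden!A:A \ studeertijden!B:B}")
print(refs)  # ['studeertijden!A:A', 'studeertijden!B:B']
```

In the formula, REDUCE then walks over this list and glues the INDIRECT result of each reference onto the accumulator as a new column.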
={"1" , "2"}
={"1" \ "2"}
Both are valid; it all depends on your locale settings.
see: https://stackoverflow.com/a/73767720/5632629
with indirect it would be:
=INDIRECT("sheet1!A:B")
where you can build it dynamically for example:
=INDIRECT("sheet1!"& A1 &":"& B1)
where A1 contains a string like A or A1 (and same for B1)
Another way to construct a range is with ADDRESS, like:
=INDIRECT(ADDRESS(1; 2))
from another sheet it could be:
=INDIRECT(ADDRESS(1; 2;;; "sheet2"))
or like:
=INDIRECT("sheet2!"&ADDRESS(1; 2))
for a range we can do:
=INDIRECT("sheet2!"&ADDRESS(1; 2)&":"&ADDRESS(10; 3))
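If it helps to see which string INDIRECT ends up receiving in the last formula, the following Python sketch rebuilds it by hand. It assumes ADDRESS is used with its default absolute A1 output; the function names are made up for this illustration.

```python
def column_letters(column):
    """Convert a 1-based column number to spreadsheet letters (1 -> A, 27 -> AA)."""
    letters = ""
    while column > 0:
        column, remainder = divmod(column - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters

def address(row, column):
    """Absolute A1-style reference, like the default result of ADDRESS(row; column)."""
    return f"${column_letters(column)}${row}"

def range_reference(sheet, top_left, bottom_right):
    """Build 'sheet!start:end' from two (row, column) pairs."""
    return f"{sheet}!{address(*top_left)}:{address(*bottom_right)}"

print(address(1, 2))                                # $B$1
print(range_reference("sheet2", (1, 2), (10, 3)))   # sheet2!$B$1:$C$10
```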
I have a pandas dataframe with a column that looks like this:
'Column name'
NaN
[11am-2am]
NaN
[9am-10pm]
NaN
[10:30am-10:30pm]
I am trying to put all rows in the same format, such as [10:30am-10:00pm].
working_hours_daily=schedule['Daily']  # column name is 'Daily'
c=lambda x: str(x)
b=lambda x: str(x).replace('-',',').replace('am',':00am').replace('pm',':00pm').split(',')
times_daily.apply(c)
open_hours_daily=[]
for i in (range(0,len(times_daily))):
    if ":" not in times_daily:
        working_hours_daily=times_daily.apply(b)
        print (working_hours_daily)
        open_hours_daily.append(working_hours_daily)
The idea is to apply b only when ":" is not in the string,
and so I am using the not in operator.
But the code does not respect that condition and applies b to all rows.
So some rows turn out fine: [['11:00am, 2:00am']]
but others which already contain ':' turn out like this: [['10:30:00am, 10:30:00pm']]
Any help would be much appreciated.
Camille
This should work:
b = lambda x: str(x) if ':' in str(x) else str(x).replace('-',',').replace('am',':00am').replace('pm',':00pm').split(',')
times_daily.apply(b)
If you could please post a sample dataset, that would be great, so I can debug this code.
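A note on why the condition in the question never filters anything: ":" not in times_daily tests membership against the Series index labels, not against the text of each row, and the apply call inside the loop runs over the whole column anyway. The check has to happen per element. Below is a minimal sketch of a per-element approach using only the standard library. It assumes each cell is either a plain string such as "[11am-2am]" or a missing value; normalize_hours is a hypothetical helper name.

```python
import re

# hour, optional ":mm", then am/pm
TIME_PATTERN = re.compile(r"(\d{1,2})(:\d{2})?(am|pm)")

def normalize_hours(value):
    """Add ':00' to times that have no minutes; leave everything else untouched."""
    if not isinstance(value, str):
        return value  # NaN / None pass through unchanged

    def add_minutes(match):
        hour, minutes, suffix = match.groups()
        return f"{hour}{minutes or ':00'}{suffix}"

    return TIME_PATTERN.sub(add_minutes, value)

print(normalize_hours("[11am-2am]"))         # [11:00am-2:00am]
print(normalize_hours("[10:30am-10:30pm]"))  # [10:30am-10:30pm]
```

With pandas this could be applied as schedule['Daily'].apply(normalize_hours). Because the regular expression looks at each time separately, a mixed value such as "[10:30am-9pm]" is handled as well.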
I am trying to write a loop to generate and fill in a dummy variable for whether an individual was a member of a particular party in the year in question. My data are in long format, with each observation being a person-year. It looks like the following.
X1 X2 X3
AR, 1972-1981 PDC, 1982-1986 PFL, 1986-.
MD, 1966-1980 PMDB, 1980-1988 PSB, 1988-.
MD, 1966-1968 AR, 1968-1980 PDS, 1980-1985
Before the comma is the party and after are the years in which the person was a member of the party.
Any help would be greatly appreciated!
So far the code I have is:
rename X1 XA
rename X2 XB
rename X3 XC
foreach var of varlist XA XB XC{
split `var', parse (,)
}
tabulate XA1, gen(p)
Here's one way to do it. I had to make an assumption about what the missing year corresponds to in X3, so you will need to alter that.
/* Enter Data */
clear
input str20 X1 str20 X2 str20 X3
"AR, 1972-1981" "PDC, 1982-1986" "PFL, 1986-."
"MD, 1966-1980" "PMDB, 1980-1988" "PSB, 1988-."
"MD, 1966-1968" "AR, 1968-1980" "PDS, 1980-1985"
end
compress
/* Split X1,X2,X3 into party, start year and end year and create 3 ID variables that we need later */
forvalues v=1/3 {
split X`v', parse(", " "-")
gen id`v'=_n
}
/* Makes years numeric, and get rid of messy original data */
destring X12 X13 X22 X23 X32 X33, replace
replace X33 = 1990 if missing(X33) // enter your survey year here
drop X1 X2 X3
/* stack the spells on top of each other */
stack (id1 X11 X12 X13) (id2 X21 X22 X23) (id3 X31 X32 X33), into(id party year1 year2) clear
drop _stack
/* Put the data into long format and fill in the gaps */
reshape long year, i(id party) j(p)
drop p
/* need this b/c people can be in more than one party in a given year */
egen idparty = group(id party), label
xtset idparty year
tsfill
carryforward id party, replace
drop idparty
/* create party dummies */
tab party, gen(DD_)
/* rename the dummies to have party affiliation at the end instead of numbers */
foreach var of varlist DD_* {
levelsof party if `var'==1, local(party) clean
rename `var' ind_`party'
}
drop party
/* get back down to one person-year observation */
collapse (max) ind_*, by(id year)
list id year ind_*, sepby(id) noobs
Following Dimitriy's lead (and interpretation), here is a slightly different way of doing it. I make a different assumption about the missing endpoints, i.e., I truncate the series to the last known years.
clear
set more off
input ///
str15 (XA XB XC)
"AR, 1972-1981" "PDC, 1982-1986" "PFL, 1986-."
"MD, 1966-1980" "PMDB, 1980-1988" "PSB, 1988-."
"MD, 1966-1968" "AR, 1968-1980" "PDS, 1980-1985"
end
list
*----- what you want? -----
// main
stack X*, into(X) clear
bysort _stack: gen id = _n
order id, first
split X, parse (, -)
rename (X1 X2 X3) (party sdate edate)
destring ?date, replace
gen diff = edate - sdate + 1
expand diff
bysort id party: replace sdate = sdate[1] + _n - 1
drop _stack X edate diff
// create indicator variables
tabulate party, gen(y)
// fix years with two or more parties
levelsof party, local(lp) clean
collapse (sum) y*, by(id sdate)
// rename
unab ly: y*
rename (`ly') (`lp')
list, sepby(id)
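For readers who do not use Stata, the logic shared by both answers (split each spell into party, start and end year, expand it to one record per year, then mark party membership per person-year) can be sketched in Python. This is only an illustration of the idea under stated assumptions: the function names are invented, and open_end_year stands in for whatever year you decide a missing end point ("1986-.") should mean, set to 1990 here as in the first answer.

```python
def parse_spell(spell, open_end_year):
    """Split 'PARTY, start-end' into (party, start, end); '.' means an open spell."""
    party, years = [part.strip() for part in spell.split(",")]
    start, end = years.split("-")
    end_year = open_end_year if end.strip() in ("", ".") else int(end)
    return party, int(start), end_year

def party_membership(rows, open_end_year):
    """Return {(person_id, year): set of parties} covering every year of every spell."""
    membership = {}
    for person_id, spells in enumerate(rows, start=1):
        for spell in spells:
            party, start, end = parse_spell(spell, open_end_year)
            for year in range(start, end + 1):
                membership.setdefault((person_id, year), set()).add(party)
    return membership

rows = [
    ["AR, 1972-1981", "PDC, 1982-1986", "PFL, 1986-."],
    ["MD, 1966-1980", "PMDB, 1980-1988", "PSB, 1988-."],
    ["MD, 1966-1968", "AR, 1968-1980", "PDS, 1980-1985"],
]
membership = party_membership(rows, open_end_year=1990)
print(sorted(membership[(1, 1986)]))  # ['PDC', 'PFL']
```

A dummy for a given party is then simply whether that party is in the set for the person-year, which also covers years in which someone belonged to two parties.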
The data is set up with a bunch of information corresponding to an ID, which can show up more than once.
ID Data
1 X
1 Y
2 A
2 B
2 Z
3 X
I want a loop that signifies which instance of the ID I am looking at. Is it the first time, second time, etc? I want it as a string in the form _# so I have to go beyond the simple _n function in Stata, to my knowledge. If someone knows a way to do what I want without the loop let me know, but I would still like the answer.
I have the following loop in Stata
by ID: gen count_one = _n
gen count_two = ""
quietly forval j = 1/3 {
replace count_two = "_`j'" if count_one == `j'
}
The output now looks like this:
ID Data count_one count_two
1 X 1 _1
1 Y 2 _2
2 A 1 _1
2 B 2 _2
2 Z 3 _3
3 X 1 _1
The question is how can I replace the 16 above with to tell Stata to take the max of the count_one column because I need to run this weekly and that max will change and I want to reduce errors.
It's hard to understand why you want this, but it is one line whether you want numeric or string:
bysort ID : gen nummax = _N
bysort ID : gen strmax = "_" + string(_N)
Note that the sort order within ID is irrelevant to the number of observations for each.
Some parts of your question aren't clear ("...replace the 16 above with to tell Stata...") but:
Why don't you just use _n with tostring?
gsort +ID +data
bys ID: g count_one=_n
tostring count_one, gen(count_two)
replace count_two="_"+count_two
Then to generate the max (answering the partial question at the end there) -- although note this value will be repeated across instances of each ID value:
bys ID: egen maxcount1=max(count_one)
or more elegantly:
bys ID: g maxcount2=_N
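As a language-neutral illustration of what _n and _N give you within each ID, here is a short Python sketch. The helper name is hypothetical, and it assumes the rows are already in the order in which instances should be counted.

```python
from collections import Counter, defaultdict

def instance_labels(ids):
    """For each row return (id, '_<running count within id>', total rows for that id)."""
    totals = Counter(ids)      # plays the role of _N within each ID
    seen = defaultdict(int)    # plays the role of _n within each ID
    labelled = []
    for id_value in ids:
        seen[id_value] += 1
        labelled.append((id_value, f"_{seen[id_value]}", totals[id_value]))
    return labelled

for row in instance_labels([1, 1, 2, 2, 2, 3]):
    print(row)
# (1, '_1', 2)
# (1, '_2', 2)
# (2, '_1', 3)
# (2, '_2', 3)
# (2, '_3', 3)
# (3, '_1', 1)
```

No upper bound is hard-coded anywhere, which is the point of the Stata one-liners above: the label is derived from the running count itself rather than from a loop over a fixed range.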
I've got a column in Solr, of type string, with values like 'JOHN JACKSON', 'JAKE SMITH', 'JOHNATAN JAMESON'.
Is it possible to tell Solr, when I type J, to return first the records that contain J more times than the others?
You may use solr.EdgeNGramFilterFactory. You can set minGramSize to 1.
This FilterFactory is very useful in matching prefix substrings (or suffix substrings if side="back") of particular terms in the index during query time.
Reference: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory
So for your examples above,
for JOHN JACKSON, it will store:
J, JO, JOH, JOHN, J, JA, JAC, JACK, JACKS, JACKSO, JACKSON
and for JAKE SMITH:
J, JA, JAK, JAKE, S, SM, SMI, SMIT, SMITH
Now when someone searches for J, the first document (JOHN JACKSON) will get a higher score, because J appears twice in the index.
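To see where those stored terms come from, here is a small Python sketch that imitates front edge n-gram generation on whitespace-separated tokens. It is not Solr code: the function name and the default sizes are assumptions for the illustration (min_gram and max_gram play the role of minGramSize and maxGramSize), and real Solr scoring also depends on the configured similarity, so the raw count below is only the term-frequency part of the picture.

```python
def edge_ngrams(text, min_gram=1, max_gram=15):
    """Return the front edge n-grams of every whitespace-separated token."""
    grams = []
    for token in text.split():
        longest = min(len(token), max_gram)
        for size in range(min_gram, longest + 1):
            grams.append(token[:size])
    return grams

for name in ["JOHN JACKSON", "JAKE SMITH"]:
    grams = edge_ngrams(name)
    print(name, grams.count("J"))
# JOHN JACKSON 2
# JAKE SMITH 1
```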