Count function with PostGIS and counting 0

I have been using PostGIS for only a week and I'm already stuck. I'm working with the NYC gDB.
It's the one that is explained on this page: http://workshops.opengeo.org/postgis-intro/about_data.html
I am trying to list the neighborhoods that are next to the neighborhood 'Woodhaven-Richmond Hill', along with the number of subway stations located in each of those neighborhoods.
When I look at the map, I see there are 7. One of them, Glendale, has no subway station, and my query does not list it. I want it to be listed and to display '0' for the count.
select n2.name, n2.geom, count(u.geom)
into glendale
from nyc_neighborhoods n1, nyc_neighborhoods n2, nyc_subway_stations u
where n1.name='Woodhaven-Richmond Hill'
and (st_touches(n1.geom, n2.geom) or st_overlaps(n1.geom, n2.geom)) and ST_Contains(n1.geom, u.geom)
group by n2.name, n2.geom
I know the problem is this: and ST_Contains(n1.geom, u.geom)
because it is false for Glendale...
Thanks !
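A common way to make groups with no matches show up with a count of 0 is to start from the list of neighbors and LEFT JOIN the stations, counting a station column so that unmatched rows contribute nothing. The following is only a minimal sketch of that pattern, written in Python with the built-in sqlite3 module and toy tables. There is no geometry here: the `neighbors` and `stations` tables and their rows are hypothetical, and the plain equality in the ON clause merely stands in for a spatial test such as ST_Contains.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins: 'neighbors' plays the role of the adjacent
    -- neighborhoods, 'stations' the role of the subway stations.
    CREATE TABLE neighbors (name TEXT);
    CREATE TABLE stations (name TEXT, neighborhood TEXT);
    INSERT INTO neighbors VALUES ('Glendale'), ('Other neighborhood');
    INSERT INTO stations VALUES
        ('station_1', 'Other neighborhood'),
        ('station_2', 'Other neighborhood');
""")

# LEFT JOIN keeps every neighbor, even one without stations.
# COUNT(s.name) ignores the NULLs produced for unmatched neighbors,
# so such a neighbor gets 0 instead of disappearing.
rows = conn.execute("""
    SELECT n.name, COUNT(s.name) AS station_count
    FROM neighbors AS n
    LEFT JOIN stations AS s ON s.neighborhood = n.name
    GROUP BY n.name
    ORDER BY n.name
""").fetchall()

print(rows)
```

In PostGIS the same shape would presumably put the station test into the ON clause of a LEFT JOIN rather than into WHERE. That translation is an assumption and is not tested here.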

Google Sheets: Automatically entering data into a cell based on the values of two neighbouring cells

First of all, I am a complete novice at spreadsheets, so please explain things to me like I'm five years old. Secondly, I've already searched this throughout the internet but the only help seems to be related to entering data based on the value of one cell, not two.
I'm a language teacher and I'm using a tracker to keep a record of all the lessons that I teach. There are three important values:
The course (Beginner, Elementary, Intermediate, Upper-Intermediate or Advanced). I'm entering this information in column C.
The lesson number (1-20). Each course contains 20 lessons. So, for example, once students complete all 20 lessons in the Beginner course, they move on to the Elementary course. I'm entering this information in column D.
The lesson title. So for example, Beginner lesson 1 is called 'Introduction to English', and Intermediate lesson 15 is called 'Making phone calls'. I'm entering this information in column E.
When I'm booked for a new lesson, I currently have to manually enter all this information. But I want to speed up the process so that I just have to enter the course name in column C and the lesson number in column D, and the third cell will automatically update in column E with the lesson title.
At the moment this tracker is one sheet. A table of all the levels, lesson numbers and lesson titles is on another sheet.
If anyone could help me accomplish this, I'd very much appreciate it!
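You can use VLOOKUP: join C and D into a single key and look that key up in the other sheet that holds your table. Let's say the table is in Sheet1!A:C, where Sheet1!A holds the course (your column C), Sheet1!B the lesson number (your column D) and Sheet1!C the lesson title (your column E). Then you just use this in row 2: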
=INDEX(IFNA(VLOOKUP(C2:C&"×"&D2:D, {Sheet1!A:A&"×"&Sheet1!B:B, Sheet1!C:C}, 2, 0)))
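The idea behind joining C and D is a lookup on a composite key: the pair (course, lesson number) identifies exactly one title. As a rough illustration outside of Sheets, the Python sketch below does the same lookup with a dictionary keyed by that pair. The two titles come from the question; the function name and the sample tracker rows are hypothetical.

```python
# Lookup table keyed by (course, lesson number), like the joined C&D key.
lesson_titles = {
    ("Beginner", 1): "Introduction to English",
    ("Intermediate", 15): "Making phone calls",
}


def lookup_title(course, lesson_number):
    """Return the title for a (course, lesson number) pair, or "" if unknown.

    The empty string mirrors what IFNA does in the formula: a missing
    key yields a blank cell instead of an error.
    """
    return lesson_titles.get((course, lesson_number), "")


# Hypothetical tracker rows: (column C, column D)
tracker_rows = [("Beginner", 1), ("Intermediate", 15), ("Advanced", 3)]
titles = [lookup_title(course, number) for course, number in tracker_rows]
print(titles)
```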
update:
=INDEX(IFNA(VLOOKUP(E2:E, 'UPDATED Lessons'!A2:J,
MATCH(D2:D, 'UPDATED Lessons'!A1:J1, ), )))
@player0 helped me out with the sheet in the end with this very clever formula. Thanks so much!
=INDEX(IFNA(VLOOKUP(E2:E, 'UPDATED Lessons'!A2:J, MATCH(D2:D, 'UPDATED Lessons'!A1:J1, ), )))

Create floating maximum for Table

I'm sadly out of ideas. I'm currently learning Cognos Analytics and could use your help.
I have a crosstab that looks like this; it comes from a different system that uses the same source structure. I'm on a company account with plain user rights, so sadly I cannot write SQL or any scripts!
Year  MIS0  MIS1  MIS3  MIS6
2016  0,0   0,1   0,3   0,6
2017  0,0   0,1   0,4   0,7
2018  0,0   0,2   0,4   0,7
I replicated this in Cognos but cannot get one thing right (the real case is much more difficult than this, but I think this is the core).
Explanation:
MIS = months in service
years = year of product manufacture
values = (faults / products manufactured (that year) and sold) * 1000
A fault has a property MIS (the month in service in which it happened), and a product has a property along the lines of dateOfManufacture.
So, the problem: MIS6, for example, means faults that happened within 6 months of purchase. The complication is that an MIS3 fault logically counts as an MIS6 fault too.
So I need to create a data element, a filter or some other trick that would let me:
select the faults relevant for MIS from 0 to X, where X is the number in the column header (0, 1, 3, 6...), based of course on the year of manufacture. I'm limited by my user rights, so if your suggestion involves writing a script: thank you, you rock! :) But I won't be able to do it via script.
Excuse the lack of details, but variable names and any code are covered by the confidentiality I'm bound by. :(
Thank you for the time and have a nice weekend!
Fault
MIS: 2
ProductID: <121212>
Product
ProductID: <121212>
Date of assembly: 25.02.2020
(MIS: gets copied to the product's fault when the fault occurs)
The table is supposed to show faults that happened within specific months in service. That means that if a fault occurred at 2 months in service, as in the example above, it should be counted in columns MIS3 and MIS6 but not in the MIS1 and MIS0 statistics, since the fault didn't occur within 1 month but at 2.
Basically, the first row, second column says: find the products that were manufactured in 2016 and count how many faults they had in the first month in service. Divide this number by the number of products you found (first sentence) and multiply by 1000 (faults/1000).
As you can probably see by now, the problem occurs when you move to the next column on the same row: find the products that were manufactured in 2016, count how many faults they had within 3 months of service (= 1, 2, 3 included), then divide by the number of products made and multiply by 1000.
When I set up the crosstab I need to use an interval (MIS0 to MIS1, 3, 6) with a floating maximum, but I can't figure out how to build it.
Try with a list first. If this works, we can convert the list to a crosstab.
Let's start by isolating the metric in the context of time.
This would be your first column.
For one month, create a data item [Month 1 Faults] like this:
if ([Year] = 2016 and [Month] = 1) then ([Faults]) else (0)
The next column is for both months 1 and 2. We add IN (1, 2) to accomplish this.
Create a data item [Month 1 & 2 Faults] like this:
if ([Year] = 2016 and [Month] IN (1, 2)) then ([Faults]) else (0)
Repeat this logic for all of the other data items.
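To make the arithmetic behind such cumulative columns concrete, here is a small Python sketch of the calculation described in the question: for each year of manufacture and each column limit X, count the faults whose MIS is at most X, divide by the number of products made that year and multiply by 1000. It is not Cognos syntax; all numbers, names and the function itself are hypothetical sample material.

```python
# Hypothetical sample data.
products_made = {2016: 4000, 2017: 5000}  # year of manufacture -> products made and sold
faults = [  # (year of manufacture, MIS in which the fault occurred)
    (2016, 0), (2016, 2), (2016, 3), (2016, 5),
    (2017, 1), (2017, 6),
]
mis_columns = [0, 1, 3, 6]  # the "floating maximum" of each column


def cumulative_fault_rates(products_made, faults, mis_columns):
    """Faults per 1000 products, counting every fault with MIS <= column limit."""
    table = {}
    for year, made in products_made.items():
        table[year] = {}
        for limit in mis_columns:
            # A fault at MIS 2 is counted for limits 3 and 6, but not for 0 or 1.
            count = sum(1 for y, mis in faults if y == year and mis <= limit)
            table[year][limit] = count * 1000 / made
    return table


table = cumulative_fault_rates(products_made, faults, mis_columns)
for year, row in table.items():
    print(year, row)
```

Each column reuses the same rule with a different upper bound, which is the same pattern as repeating the conditional data item with a growing list of months.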

iterating over multiindex - a groupby.value_counts() object iterates only through values and not through original date index

I want to know the percentage of males in the ER (emergency room) during days that I defined as over-crowded.
I have a DataFrame named eda with rows representing each entry to the ER. One column states whether the entry occurred on an over-crowded day (1 means over-crowded) and another column states the gender of the person who entered.
So far I have managed to get a series with the over-crowded days as the index, and a sub-index representing gender with the number of entries for that gender.
I used this code:
eda[eda.over_crowd==1].groupby(eda[eda.over_crowd==1].index.date).gender.value_counts()
and got the following result:
My question is: what is the most 'pandas-ian' way to get the percentage of males/females in general? Or, how do I continue from the point where I stopped?
As can be seen at the bottom of the screenshot, when I iterate over the elements, the values alternate between male and female. I want to iterate over dates so I can write a cleaner loop that produces another column with the male percentage.
I found a pretty elegant solution. I'm sure there are more, but maybe it can help someone else.
I defined a multi-index series with all dates and the counts of females and males, then used .loc to operate on the counts for all dates to get the percentage of males on each day. Finally, I extract only the days where over_crowd==1.
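import numpy as np
# counts per date and gender code (multi-index series); the formula below treats 1 as male, 2 as female
temp = eda.groupby(eda.index.date).gender.value_counts()
crowding['male_percent'] = np.divide(100 * temp.loc[:, 1], temp.loc[:, 2] + temp.loc[:, 1])
# keep only the over-crowded days
crowding.male_percent[crowding.over_crowd == 1]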
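An alternative that avoids iterating altogether is to turn the gender column into a boolean "is male" flag and take its mean per date, since the mean of a boolean column is the share of True values. The sketch below is only an illustration under assumptions: the sample rows are invented, the column names follow the question, and gender code 1 is taken to mean male, as the formula above implies.

```python
import pandas as pd

# Hypothetical sample: one row per ER entry, indexed by entry time.
eda = pd.DataFrame(
    {
        "gender": [1, 1, 2, 2, 1, 1],      # assumption: 1 = male, 2 = female
        "over_crowd": [1, 1, 1, 1, 1, 0],  # 1 = entry on an over-crowded day
    },
    index=pd.to_datetime(
        [
            "2020-01-01 08:00", "2020-01-01 09:30", "2020-01-01 17:45",
            "2020-01-02 10:00", "2020-01-02 11:15",
            "2020-01-03 12:00",
        ]
    ),
)

# Keep only entries on over-crowded days.
crowded = eda[eda["over_crowd"] == 1]

# Mean of the boolean flag per calendar date = share of males that day.
male_percent = crowded["gender"].eq(1).groupby(crowded.index.date).mean().mul(100)

print(male_percent)
```

The result is a series indexed by date with one percentage per over-crowded day, so no loop over the (date, gender) pairs is needed.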

how to detect polygons within other (many) polygons in postgis

I have two datasets: 1. ZipCodes and 2. Neighborhoods (think of them as like counties).
I want to join each neighborhood with which zipcodes cover it. Most neighborhoods will only be within one zipcode, but in some cases neighborhoods will straddle two. So for example:
Neighborhood 1 is inside 20001
Neighborhood 2 is inside 20002
Neighborhood 3 is inside 20001,20002
Here is what I have so far:
SELECT name, zipcode
FROM
neighborhood_names nn, dc_zipcode_boundries dzb
WHERE ST_Intersects(nn.the_geom, dzb.the_geom);
Note: Updated to within based on comments, now getting an answer for each neighborhood but still not able to get the Array function to respond as expected.
I figured it out, thanks to the help from John. My statement needed a GROUP BY (which is what the error said; I just needed some time to digest it before it clicked).
For anyone following, the snippet below worked:
SELECT name, array_to_string(array_agg(zipcode), ',')
FROM
neighborhood_names nn, dc_zipcode_boundries dzb
WHERE ST_Intersects(nn.the_geom, dzb.the_geom)
group by name
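What the GROUP BY plus array_agg combination does can be pictured as collecting the matching zipcodes per neighborhood and then joining them with commas. The Python sketch below mimics that on the example pairs from the question. The variable names are hypothetical, and the pairs stand in for the rows the ST_Intersects join would return.

```python
from collections import defaultdict

# (neighborhood name, zipcode) pairs, as the join would produce them.
pairs = [
    ("Neighborhood 1", "20001"),
    ("Neighborhood 2", "20002"),
    ("Neighborhood 3", "20001"),
    ("Neighborhood 3", "20002"),
]

# Group step: collect all zipcodes per name (the role of GROUP BY + array_agg).
zipcodes_by_name = defaultdict(list)
for name, zipcode in pairs:
    zipcodes_by_name[name].append(zipcode)

# Join step: one comma-separated string per name (the role of array_to_string).
result = {name: ",".join(codes) for name, codes in zipcodes_by_name.items()}
print(result)
```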

SPSS: Loop through the values of a variable

I have a dataset of patient data, recorded by the site at which each patient visited our mobile clinic. I have now written up a series of commands such as freqs and crosstabs to produce the analyses I need; however, I would like this to be done for the patients at each site, rather than for the dataset as a whole.
If I had only one site, a mere filter command with the variable that specifies a patient's site would suffice, but alas I have 19 sites, so I would like to find a way to loop through my code to produce these outputs for each site. That is to say for i in 1 to 19:
1. Take the i th site
2. Compute a filter for this i th site
3. Run the tables using this filtered data of patients at ith site
Here is my first attempt using DO REPEAT. I also tried using LOOP earlier.
However, it does not work: I keep getting an error even though these are closed loops.
Is there a way to do this in SPSS syntax? Bear in mind I do not know Python well enough to do this using that plugin.
*LOOP #ind= 1 TO 19 BY 1.
DO REPEAT #ind= 1 TO 20.
****8888888888888888888888888888888888888888888888888888888 Select the Site here.
COMPUTE filter_site=(RCDSITE=#ind).
USE ALL.
FILTER BY filter_site.
**********************Step 3: Apply the necessary code for tables
*********Participation in the wellness screening, we actually do not care about those who did FP as we are not reporting it.
COUNT BIO= CheckB (1).
* COUNT FPS=CheckF(1).
* COUNT BnF= CheckB CheckF(1).
VAL LABEL BIO
1 ' Has the Wellness screening'
0 'Does not have the wellness screening'.
*VAL LABEL FPS
1 'Has the First patient survey'.
* VAL LABEL BnF
1 'Has either Wellness or FPS'
2 'Has both surveys done'.
FREQ BIO.
*************************Use simple math to calcuate those who only did the Wellness/First Patient survey FUB= F+B -FnB.
*******************************************************Executive Summary.
***********Blood Pressure.
FREQ BP.
*******************BMI.
FREQ BMI.
******************Waist Circumference.
FREQ OBESITY.
******************Glucose.
FREQ GLUCOSE.
*******************Cholesterol.
FREQ TC.
************************ Heamoglobin.
FREQ HAEMOGLOBIN.
*********************HIV.
FREQ HIV.
******************************************************************************I Lifestyle and General Health.
MISSING VALUES Gender GroupDep B8 to B13 ('').
******************Graphs 3.1
Is this just frequencies you are producing? Try the SPLIT FILE procedure by the variable RCDSITE. That should be enough.
SPLIT FILE allows you to partition your data by up to eight variables. Then each procedure will automatically iterate over each group.
If you need to group the results at a higher level than the procedure, that is, to run a bunch of procedures for each group before moving on to the next one so that all the output for a group will be together, you can use the SPSSINC SPLIT DATASET and SPSSINC PROCESS FILES extension commands to do this.
These commands require the Python Essentials. That and the commands can be downloaded from the SPSS Community website (www.ibm.com/developerworks/spssdevcentral) if you have at least version 18.
HTH,
Jon Peck
A simple but perhaps not very elegant way is to select from the menu: Data/Select Cases/If condition, there you enter the filter for site 1 and press Paste, not OK.
This will give the used filter as syntax code.
So with some copy/paste/replace/repeat you can get the freqs and all other results based on the different sites.
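The copy/paste/replace/repeat routine can also be done by generating the syntax text once and pasting the result into a syntax window. The sketch below is plain Python that only builds text; it does not use the SPSS Python plugin. The filter lines are the ones from the question, while the function name and the short list of procedures are hypothetical placeholders for the real table commands.

```python
def site_syntax(site, procedures):
    """Build the filter-plus-tables syntax block for one site number."""
    lines = [
        f"COMPUTE filter_site=(RCDSITE={site}).",
        "USE ALL.",
        "FILTER BY filter_site.",
        *procedures,
    ]
    return "\n".join(lines)


# Placeholder procedures; the real list would hold all FREQ/crosstab commands.
procedures = ["FREQ BIO.", "FREQ BP.", "FREQ BMI."]

# One block per site, for sites 1 to 19, separated by blank lines.
syntax = "\n\n".join(site_syntax(site, procedures) for site in range(1, 20))
print(syntax)
```

Whether this is preferable to SPLIT FILE depends on how the output should be grouped; the generated text simply repeats the same commands once per site.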
