R About Array replacement [closed] - arrays

ordered_use
2 UNIT CONVERTED DWELLING
28706 51
2 UNIT DWELLING 2 UNITS
99 44
3 UNIT DWELLING APARTMENT
31 4733
APARTMENT APARTMENT BLDG
38 37
APARTMENT BUILDING APARTMENT UNIT
2042 37
APPARTMENT BUILDING APT
54 357
APT BLDG APT BUILDING
78 49
APT. APT. BLDG
41 61
APT. BUILDING ARENA
35 67
BANK BOWLING ALLEY
302 267
BUNGALOW CAR DEALERSHIP
85 62
CHURCH CLUB
94 40
COLLEGE COMMERCIAL
196 410
COMMERCIAL/RESIDENTIAL COMMUNITY CENTRE
56 131
COMMUNITY HALL CONDO
31 223
CONDOMINIUM CONVERTED DWELLING
42 42
CONVERTED HOUSE CONVERTED HOUSE - 2 UNITS
149 124
CONVERTED HOUSE - 3 UNITS CONVERTED HOUSE (2 UNITS)
56 35
CONVERTED HOUSE 2 UNITS CONVERTED HOUSE, 2 UNITS
38 84
CONVERTED HOUSE, 3 UNITS DAYCARE
42 31
DENTAL OFFICE DETACHED - SFD
87 513
DETACHED - SINGLE FAMILY DWELLING DETACHED HOUSE
97 130
DETACHED SFD DUPLEX
190 145
ELEMENTARY SCHOOL FIRE HALL
859 41
FITNESS CENTRE FUNERAL HOME
48 36
GARAGE GAS STATION
63 130
GROCERY STORE GROUP HOME
51 45
HAIR SALON HOME FOR THE AGED
46 49
HOSPITAL HOTEL
971 215
HOUSE IND
1249 219
INDUSTRIAL INDUSTRIAL
1725 35
INDUSTRIAL BUILDING INDUSTRIAL MANUFACTURING
51 91
INDUSTRIAL WAREHOUSE INSTITUTIONAL
61 48
LAB LABORATORY
56 46
LIBRARY LONG TERM CARE FACILITY
91 74
LUMBER YARD MANUFACTURING
53 55
MEDICAL OFFICE MIXED USE
247 539
MIXED USE MIXED USE (COMMERCIAL)
40 34
MIXED USE (RETAIL) MIXED USE BUILDING
74 297
MIXED USE BUILDING/NON RESIDENTIAL MIXED USE NON RES
37 93
MIXED USE NON RES (RETAIL) MIXED USE RES & NON RES
52 59
MIXED-USE MULTI UNIT
202 54
MULTI UNIT BUILDING MULTI USE
70 381
MULTI USE, NON RES MULTI USE/NON RES
134 36
MULTI USE/NON RESIDENTIAL MULTIPLE UNIT
49 149
MULTIPLE UNIT BUILDING MUSEUM
40 40
N/A NONE
650 264
NOT KNOWN NURSING HOME
58 55
OFF OFFICE
181 9698
OFFICE OFFICE BLD
50 46
OFFICE BUILDING OFFICE SPACE
177 39
OFFICE/RETAIL OFFICE/WAREHOUSE
36 95
OFFICES OTHER
63 54
PARK PARKING GARAGE
137 149
PARKING LOT PERSONAL SERVICE SHOP
126 49
PLACE OF WORSHIP POLICE STATION
516 34
PROF. OFFICE RECREATIONAL
65 46
REPAIR GARAGE RES
91 242
RESIDENTIAL RESIDENTIAL - SFD
488 561
RESIDENTIAL CONDO REST
39 46
RESTAURANT RESTAURANT > 30 SEATS
1074 42
RESTAURANT GREATER THAN 30 SEATS RESTAURANT LESS THAN 30 SEATS
145 69
RESTAURANT UNDER 30 SEATS RESTAURANT, GREATER THAN 30 SEATS
47 42
RET RETAIL
81 4001
RETAIL RETAIL MALL
46 61
RETAIL PLAZA RETAIL STORE
96 796
RETAIL/OFFICE RETAIL/RESIDENTIAL
32 99
ROOMING HOUSE ROW HOUSE
89 38
SCHOOL SECONDARY SCHOOL
594 246
SEMI SEMI DETACHED
209 218
SEMI DETACHED - SFD SEMI DETACHED - SINGLE FAMILY DWELLING
212 50
SEMI DETACHED SFD SEMI-DETACHED
46 71
SEMI-DETACHED - SFD SEMI-DETACHED DWELLING
241 172
SEMI-DETACHED HOUSE SEMI-DETACHED SFD
56 155
SEMI-DETACHED SINGLE FAMILY DWELLING SFD
90 26479
SFD - DETACHED SFD - DETCAHED
3817 79
SFD - ROWHOUSE SFD - SEMI
76 206
SFD - SEMI DETACHED SFD - SEMI-DETACHED
353 209
SFD - SEMIDETACHED SFD - TOWNHOUSE
158 131
SFD DET SFD DETACEHD
495 39
SFD DETACHED SFD DETATCHED
755 231
SFD ROWHOUSE SFD SEMI
31 857
SFD SEMI DETACHED SFD SEMI-DETACHED
59 167
SFD TOWNHOUSE SFD-DETACHED
155 8148
SFD-DETACHED SFD-ROWHOUSE
37 56
SFD-SEMI SFD-SEMI DETACHED
1189 613
SFD-SEMI-DETACHED SFD-TOWNHOUSE
313 526
SINGLE SINGLE FAMILY
41 64
SINGLE FAMILY DETACHED SINGLE FAMILY DETACHED DWELLING
1615 222
SINGLE FAMILY DETACHED HOUSE SINGLE FAMILY DWELLING
58 2673
SINGLE FAMILY SEMI-DETACHED SINGLE-FAMILY DETACHED HOUSE
54 107
SINGLE-FAMILY SEMI-DETACHED HOUSE STADIUM
53 37
STUDENT RESIDENCE SUBWAY STATION
34 44
SURFACE PARKING LOT/EXISTING COMMERCIAL BUILDING TAKE OUT RESTAURANT
57 34
THEATRE TOWNHOUSE
38 198
TOWNHOUSE - SFD TOWNHOUSES
97 31
TRANSIT STATION TRIPLEX
70 54
UNION STATION UNIVERSITY
52 359
UNIVERSITY OF TORONTO VACANT
42 15010
VACANT VACANT (AFTER DEMO)
77 36
VACANT COMMERCIAL VACANT COMMERCIAL UNIT
37 63
VACANT INDUSTRIAL VACANT LAND
32 1107
VACANT LOT VACANT RETAIL
447 112
VACANT RETAIL UNIT VACANT SINGLE FAMILY DWELLING
46 82
VACANT SPACE VACANT UNIT
120 117
VACNT WAREHOUSE
42 526
WAREHOUSE/OFFICE WATER TREATMENT PLANT
54 46
Apartment <- (ordered_use[6]+ ordered_use[7]+ ordered_use[8] + ordered_use[9] + ordered_use[10] + ordered_use[11] + ordered_use[12] + ordered_use[13] + ordered_use[14] + ordered_use[15] + ordered_use[16] + ordered_use[17] + ordered_use[30] + ordered_use[31] + ordered_use[33] + ordered_use[34] + ordered_use[35] + ordered_use[36] + ordered_use[37] + ordered_use[38] + ordered_use[39] + ordered_use[84] + ordered_use[85] + ordered_use[90] + ordered_use[91])
I am trying to merge every category that looks like an apartment (building, condo, unit, etc.) into one, so I summed all the entries that look similar. My question is: how can I replace those entries with my combined Apartment total?

To get something to work with I pasted your text into the space between the quotes of:
ordered_use <- read.fwf(textConnection("___"), widths=c(50,50), stringsAsFactors=FALSE)
And then trimmed whitespace, took the names from the odd rows, and applied as.numeric to the even rows:
ordered_use[] <- lapply(ordered_use, trimws)
ord2 <- data.frame(
  nams = c( ordered_use[ c(TRUE,FALSE), "V1"], ordered_use[ c(TRUE,FALSE), "V2"]),
  nums = as.numeric(c( ordered_use[ c(FALSE,TRUE), "V1"], ordered_use[ c(FALSE,TRUE), "V2"]) ) )
> head(ord2)
nams nums
1 28706
2 2 UNIT DWELLING 99
3 3 UNIT DWELLING 31
4 APARTMENT 38
5 APARTMENT BUILDING 2042
6 APPARTMENT BUILDING 54
To extract items with "APT" or "CONDO" use grepl
> ord2[ grepl("APART|APPART|APT|CONDO", ord2$nams) , ]
nams nums
4 APARTMENT 38
5 APARTMENT BUILDING 2042
6 APPARTMENT BUILDING 54
7 APT BLDG 78
8 APT. 41
9 APT. BUILDING 35
16 CONDOMINIUM 42
60 RESIDENTIAL CONDO 39
110 APARTMENT 4733
111 APARTMENT BLDG 37
112 APARTMENT UNIT 37
113 APT 357
114 APT BUILDING 49
115 APT. BLDG 61
122 CONDO 223
I cannot tell whether your item numbers match up, since you probably have a table object and I have two columns that are not arranged the same way as yours.
> sum( ord2[ grepl("APART|APPART|APT|CONDO", ord2$nams) ,"nums" ])
[1] 7866
You should post the output of dput(head(ordered_use, 20)) if you want an answer tailored to the type of object you have.
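The pattern-matching idea above also covers the "replace" part of the question: instead of adding entries by position, match the labels and fold their counts into one new label. Here is a minimal sketch of that logic in Python, using only a handful of rows copied from the table; the function name `collapse` and the label "Apartment" are illustrative assumptions, not part of the original answer.

```python
import re
from collections import defaultdict

# A few (label, count) pairs copied from the question's table.
# BANK is included as a label that should NOT be merged.
counts = [
    ("APARTMENT", 38),
    ("APARTMENT BUILDING", 2042),
    ("APPARTMENT BUILDING", 54),
    ("APT BLDG", 78),
    ("CONDOMINIUM", 42),
    ("BANK", 302),
]

# Same alternation as the grepl() call in the answer.
pattern = re.compile(r"APART|APPART|APT|CONDO")


def collapse(pairs, pattern, new_label):
    """Sum the counts of all labels matching `pattern` under `new_label`.

    Labels that do not match are kept unchanged.
    """
    merged = defaultdict(int)
    for label, n in pairs:
        key = new_label if pattern.search(label) else label
        merged[key] += n
    return dict(merged)


result = collapse(counts, pattern, "Apartment")
print(result)
```

Matching on the label text rather than on positions such as ordered_use[6] keeps the merge valid if the table is reordered or gains new categories.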


Repeat certain pandas series values, so that it has an entry for all index values between 1 and 100

I have created a list of pandas series, with each series indexed by numbers between 1 and 100 eg
Index Value
1 62.99
4 64.39
37 75.225
65 88.12
74 89.89
79 93.30
88 94.30
92 95.83
100 100.00
What I want to do, either while it is a Series, or as an array after calling .to_numpy() on it, is to fill it out so that my series has 100 values (1 to 100), with any new entries having the previous existing value ie
Index Value
1 62.99
2 62.99
3 62.99
4 64.39
5 64.39
6 64.39
...
...
36 64.39
37 75.225
38 75.225
and so on.
I can do this programmatically the long-winded way by iterating through each series and checking for a change in value; my question is, is there a version of Series.repeat() which could do this in one hit, or a numpy function which can 'pad out' my array in this manner with my 100 values?
Thanks in advance for reading, and for any suggestions. This isn't homework; it's a genuine question so please don't attack me if my style of asking isn't as you expect.
What you need to do is forward-fill the values in the series.
This code
import pandas as pd
series = pd.Series([33.2, 36, 39, 55], index=[3, 6, 12, 14], name='series')
indices = range(100)
df = pd.DataFrame(indices)
series = df.join(series).ffill()['series']
produces
0 NaN
1 NaN
2 NaN
3 33.2
4 33.2
...
95 55.0
96 55.0
97 55.0
98 55.0
99 55.0
The first values are NaN because there are no earlier values in the series to fill them with.
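To make the forward-fill logic concrete, here is a dependency-free sketch of what the join plus ffill() step does, applied to the values from the question. The helper name `forward_fill` and its `default` parameter are assumptions for illustration; in practice the pandas call in the answer does this work for you.

```python
def forward_fill(sparse, start, stop, default=None):
    """Return {index: value} for every index in start..stop (inclusive).

    Indices missing from `sparse` take the most recent earlier value.
    Indices before the first known value take `default`.
    """
    filled = {}
    last = default
    for idx in range(start, stop + 1):
        if idx in sparse:
            last = sparse[idx]
        filled[idx] = last
    return filled


# Values copied from the question.
sparse = {1: 62.99, 4: 64.39, 37: 75.225, 65: 88.12, 74: 89.89,
          79: 93.30, 88: 94.30, 92: 95.83, 100: 100.00}

filled = forward_fill(sparse, 1, 100)
print(len(filled), filled[3], filled[4], filled[36], filled[37])
```

Passing `default=0` would mirror a trailing fillna(0) for any indices that precede the first known value.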
So here's the solution I went with - an ffill() with fillna(0), joining to range(1,101). I had to iterate through a larger dataset which needed grouping by ID first / taking the maximum 'Pct' per 'Bucket' :-
j=df[['ID','Bucket','Pct']].groupby(['ID','Bucket']).max()
for i in df['ID'].unique():
    index=pd.DataFrame(range(1,101))
    index.columns=['Bucket']
    k=pd.merge(index,j.loc[i],how='left',on='Bucket').ffill().fillna(0)
In:
Bucket Pct
3 0.03
3 0.1
3 0.26
3 0.42
3 0.45
3 0.59
3 0.69
3 0.83
3 0.86
3 0.91
3 0.94
3 0.98
4 1.1
... ...
91 98.89
93 99.08
94 99.17
94 99.26
94 99.43
94 99.48
94 99.63
100 100.0
Out:
Bucket Pct
1 0.00
2 0.00
3 0.98
4 1.83
5 22.83
... ...
91 98.89
92 98.89
93 99.08
94 99.63
95 99.63
96 99.63
97 99.63
98 99.63
99 99.63
100 100.00
Many, many thanks once again to you both!

Make uppercase in array

Use of order
Apartment Canada Toronto
38 37 2042 37
Appartment Building Apt
54 357
Can you help me convert the characters in my array to uppercase?
Try this:
names(ordered_use) <-toupper(names(ordered_use))
> ordered_use
APARTMENT APARTMENT BLDG APARTMENT BUILDING APARTMENT UNIT APPARTMENT BUILDING
38 37 2042 37 54
APT
357

Sum of multiple variables by group

I have a dataset with over 900 observations, each observation represents the population of a sub-geographical area for a given year by gender (male, female, all) and 20 different age groups.
I have dropped the variable for the sub-geographical area and I want to collapse into the greater geographical area (called Geo).
I am having a difficult time doing a SUM or PROC MEANS because I have so many age groups to sum up and I am trying to avoid writing them all out. I want to collapse across the group year, geo, sex so that I only have 3 observations per Geo (my raw data could have as many as 54 observations).
This is an example of what a tiny section of the raw data looks like:
Year Geo Sex Age0005 Age0610 Age1115 (etc)
2010 1 1 92 73 75
2010 1 2 57 81 69
2010 1 3 159 154 144
2010 1 1 41 38 43
2010 1 2 52 41 39
2010 1 3 93 79 82
2010 2 1 71 66 68
2010 2 2 63 64 70
2010 2 3 134 130 138
2010 2 1 32 35 34
2010 2 2 29 31 36
2010 2 3 61 66 70
This is how I want it to look:
Year Group Sex Age0005 Age0610 Age1115 (etc)
2010 1 1 133 111 118
2010 1 2 109 122 08
2010 1 3 252 233 226
2010 2 1 103 101 102
2010 2 2 92 95 106
2010 2 3 195 196 208
Any ideas? Please help!
You don't have to write out each variable name individually - there are ways of getting around that. E.g. if all of the age group variables that need to be summed up start with age then you can use a : wildcard to match them:
proc summary nway data = have;
var age:;
class year geo sex;
output out = want sum=;
run;
If your variables don't have a common prefix, but are all next to each other in one big horizontal group in your dataset, you can use a double dash list instead:
proc summary nway data = have;
var age0005--age1115; /*Includes all variables between these two*/
class year geo sex;
output out = want sum=;
run;
Note also the use of sum= - this means that each summarised variable is reproduced with its original name in the output dataset.
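As a cross-check on what the class/var/sum= combination computes, here is a small Python sketch of the same group-wise sum over the sample rows from the question. The function name `sum_by_group` and the tuple layout are assumptions for illustration only.

```python
# Rows copied from the question: (year, geo, sex, age0005, age0610, age1115)
rows = [
    (2010, 1, 1, 92, 73, 75),
    (2010, 1, 2, 57, 81, 69),
    (2010, 1, 3, 159, 154, 144),
    (2010, 1, 1, 41, 38, 43),
    (2010, 1, 2, 52, 41, 39),
    (2010, 1, 3, 93, 79, 82),
    (2010, 2, 1, 71, 66, 68),
    (2010, 2, 2, 63, 64, 70),
    (2010, 2, 3, 134, 130, 138),
    (2010, 2, 1, 32, 35, 34),
    (2010, 2, 2, 29, 31, 36),
    (2010, 2, 3, 61, 66, 70),
]


def sum_by_group(rows, n_keys=3):
    """Sum every value column within each group.

    The first `n_keys` fields of a row form the group key (year, geo, sex);
    all remaining fields are summed column by column.
    """
    totals = {}
    for row in rows:
        key, values = row[:n_keys], row[n_keys:]
        if key in totals:
            totals[key] = [a + b for a, b in zip(totals[key], values)]
        else:
            totals[key] = list(values)
    return totals


totals = sum_by_group(rows)
for key in sorted(totals):
    print(*key, *totals[key])
```

Like the age: wildcard, the function never names the age columns individually, so it scales to all 20 age groups unchanged.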
I personally like to use proc sql for this, since it makes it very clear what you're summing and grouping by.
data old ;
input Year Geo Sex Age0005 Age0610 Age1115 ;
datalines;
2010 1 1 92 73 75
2010 1 2 57 81 69
2010 1 3 159 154 144
2010 1 1 41 38 43
2010 1 2 52 41 39
2010 1 3 93 79 82
2010 2 1 71 66 68
2010 2 2 63 64 70
2010 2 3 134 130 138
2010 2 1 32 35 34
2010 2 2 29 31 36
2010 2 3 61 66 70
;
run;
proc sql ;
create table new as select
year
, geo label = 'Group'
, sex
, sum(age0005) as age0005
, sum(age0610) as age0610
, sum(age1115) as age1115
from old
group by geo, year, sex ;
quit;

How do I sum using for distinct items in a table

I have to show my table data in sort order by design_no
Here is my data
design_no fname meter rate s m l xl
---------------------------------------------------------------
3092 2111-1 432.00 235.00 32 33 21 21
3092 2111-1 498.75 235.00 38 37 24 24
3092 2111-1 460.50 235.00 31 35 23 24
3092 2111 501.75 245.00 37 38 25 24
I want to show it like this:
design_no fname meter rate pcs
---------------------------------------------------
3092 2111 501.75 245.00 124
3092 2111-1 1391.25 235.00 343
Kindly help me.
SELECT design_no, fname, SUM(meter) AS meter, rate, SUM(s)+SUM(m)+SUM(l)+SUM(xl) AS pcs
FROM tab
GROUP BY design_no, fname, rate
ORDER BY design_no, fname
What behaviour do you want if the rate is different for the same design_no and fname?
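To check the grouped query against the sample data, here is a sketch that runs it in an in-memory SQLite database from Python. The table name `tab` comes from the answer. The column types are assumptions, and the column aliases and ORDER BY clause are included to get the labelled, sorted output the question asks for. Other database engines may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumed; the question only shows the values.
conn.execute(
    "CREATE TABLE tab (design_no INTEGER, fname TEXT, meter REAL, "
    "rate REAL, s INTEGER, m INTEGER, l INTEGER, xl INTEGER)"
)
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (3092, "2111-1", 432.00, 235.00, 32, 33, 21, 21),
        (3092, "2111-1", 498.75, 235.00, 38, 37, 24, 24),
        (3092, "2111-1", 460.50, 235.00, 31, 35, 23, 24),
        (3092, "2111", 501.75, 245.00, 37, 38, 25, 24),
    ],
)

# One output row per (design_no, fname, rate) combination.
rows = conn.execute(
    """
    SELECT design_no, fname, SUM(meter) AS meter, rate,
           SUM(s) + SUM(m) + SUM(l) + SUM(xl) AS pcs
    FROM tab
    GROUP BY design_no, fname, rate
    ORDER BY design_no, fname
    """
).fetchall()

for row in rows:
    print(row)
conn.close()
```

Because rate is part of the GROUP BY, two rows with the same design_no and fname but different rates would come out as separate lines, which is the case the comment above asks about.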

CipherSaber bug

So I implemented ciphersaber-1. It almost works: I can decrypt cstest1.cs1, but I have trouble getting cstest2.cs1 to work.
The output is:
The Fourth Amendment to the Constitution of the Unite ▀Stat→s of America
"The right o☻ the people to be secure in their persons, houses, papers, and
effects, against unreasonab→e searches an╚A)┤Xx¹▼☻dcðþÈ_#­0Uc.?n~J¿|,lómsó£k░7╠▄
íuVRÊ ╣├xð"↕(Gû┤.>!{³♫╚Tƒ}Àõ+»~C;ÔÙ²÷g.qÏø←1ß█yÎßsÈ÷g┐ÅJÔÞ┘Îö║AÝf╔ìêâß╗È;okn│CÚê
õ&æÄ[5&Þ½╔s╦Nå1En♂☻♫ôzÓ9»Á╝ÐÅ├ðzÝÎòeØ%W¶]¤▲´Oá╗e_Ú)╣ó0↑ï^☻P>ù♂­¥¯▄‗♦£mUzMצվ~8å
ì½³░Ùã♠,H-tßJ!³*²RóÅ
So I must have a bug in initializing the state. The odd thing is that I can encrypt and decrypt long texts without problems, so the bug is symmetric.
I implemented the RC4 cipher as a reentrant single-byte algorithm, as you can see in rc4.c.
The state is stored in the rc4_state struct:
typedef unsigned char rc4_byte;
struct rc4_state_
{
    rc4_byte i;
    rc4_byte j;
    rc4_byte state[256];
};
typedef struct rc4_state_ rc4_state;
The state is initialized with rc4_init:
void rc4_init(rc4_state* state, rc4_byte* key, size_t keylen)
{
    rc4_byte i, j, n;
    i = 0;
    do
    {
        state->state[i] = i;
        i++;
    }
    while (i != 255);
    j = 0;
    i = 0;
    do
    {
        n = i % keylen;
        j += state->state[i] + key[n];
        swap(&state->state[i], &state->state[j]);
        i++;
    }
    while (i != 255);
    state->i = 0;
    state->j = 0;
}
The actual encryption / decryption is done in rc4:
rc4_byte rc4(rc4_state* state, rc4_byte in)
{
    rc4_byte n;
    state->i++;
    state->j += state->state[state->i];
    swap(&state->state[state->i], &state->state[state->j]);
    n = state->state[state->i] + state->state[state->j];
    return in ^ state->state[n];
}
For completeness, swap:
void swap(rc4_byte* a, rc4_byte* b)
{
    rc4_byte t = *a;
    *a = *b;
    *b = t;
}
I have been breaking my head on this for more than two days... The state, at least for the "asdfg" key is correct. Any help would be nice.
The whole thing can be found in my GitHub repository: https://github.com/rioki/ciphersaber/
I stumbled across your question while searching online, but since you haven't updated your code at GitHub yet, I figured you might still like to know what the problem was.
It's in this bit of code:
i = 0;
do
{
    state->state[i] = i;
    i++;
}
while (i != 255);
After this loop has iterated 255 times, i will have a value of 255 and the loop will terminate. As a result, the last byte of your state buffer is being left uninitialised.
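This is easily fixed. Just change while (i != 255); to while (i);. Because i is an unsigned char, incrementing it from 255 wraps it to 0, so the loop then runs exactly 256 times. The key-mixing loop that follows in rc4_init ends with the same while (i != 255); condition, so it skips the swap for index 255 and needs the same change.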
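The off-by-one is easy to see by simulating the do/while loop with an 8-bit counter. The sketch below is illustrative Python, not the author's C. The helper `indices_visited` is a hypothetical name, and the `& 0xFF` mask stands in for unsigned char wrap-around.

```python
def indices_visited(keep_going):
    """Simulate `do { body(i); i++; } while (cond(i));` with an 8-bit i.

    Returns the list of index values the loop body was executed for.
    """
    i = 0
    visited = []
    while True:
        visited.append(i)      # loop body, e.g. state[i] = i
        i = (i + 1) & 0xFF     # i++ on an unsigned char wraps 255 -> 0
        if not keep_going(i):
            break
    return visited


buggy = indices_visited(lambda i: i != 255)  # while (i != 255);
fixed = indices_visited(lambda i: i != 0)    # while (i);

print(len(buggy), 255 in buggy)
print(len(fixed), 255 in fixed)
```

With the original condition the body never runs for index 255. With while (i) it runs for all 256 indices and stops when the counter wraps back to 0.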
Sorry you haven't gotten feedback, I finally pulled this off in Python 3 today, but don't know enough about C to debug your code.
Some of the links on the main ciphersaber page are broken (pointing to ".com" instead of ".org"), so you might not have found the FAQ:
http://ciphersaber.gurus.org/faq.html
It includes the following debugging tips:
Make sure you are not reading or writing encrypted files as text files. You must use binary mode for file I/O.
If you are writing in the C language, be sure to store bytes as unsigned char.
Watch out for classic indexing problems. Do arrays in your chosen programming language start with 0 or 1?
Make sure you are writing out a random 10 byte IV when you encrypt and are reading the IV from the start of the file when you decrypt.
If your program still does not work, put in some statements to print out the S array after the key setup step. Then run your program to
decrypt the file cstest1.cs1 using asdfg as the key. Here is how the S
array should look:
file: cstest1.cs1
key: asdfg
176 32 49 160 15 112 58 8 186 19 50 161 60 17 82 153 37 141 131 127 59
2 165 103 98 53 9 57 41 150 174 64 36 62 191 154 44 136 149 158 226
113 230 227 247 155 221 34 125 20 163 95 128 219 1 181 201 146 88 204
213 80 143 164 145 234 134 248 100 77 188 235 76 217 194 35 75 99 126
92 243 177 52 180 83 140 198 42 151 18 91 33 16 192 101 48 97 220 114
110 124 72 139 218 142 118 81 84 31 29 195 68 209 172 200 214 93 240
61 22 206 123 152 7 203 10 119 171 79 250 109 137 199 167 11 104 211
129 208 216 178 207 242 162 30 120 65 115 87 170 47 69 244 212 45 85
73 222 225 185 63 0 179 210 108 245 202 46 96 148 51 173 24 182 89 116
3 67 205 94 231 23 21 13 169 215 190 241 228 132 252 4 233 56 105 26
12 135 223 166 238 229 246 138 239 54 5 130 159 236 66 175 189 147 193
237 43 40 117 157 86 249 74 27 156 14 133 251 196 187 197 102 106 39
232 255 121 122 253 111 90 38 55 70 184 78 224 25 6 107 168 254 144 28
183 71
I also found the "memorable test cases" helpful here:
http://www.cypherspace.org/adam/csvec/
Including:
key="Al"+ct="Al Dakota buys"(iv="Al Dakota "):
pt = "mead"
Even though the memorable test cases require cs2, upgrading from cs1 to cs2 is fairly trivial, so you may be able to convert your program to cs2 with confidence even without fully debugging the rest of it.
Also note that the FAQ says there used to be a file on the site that wouldn't decode. Make sure your target file doesn't begin with "0e e3 f9 b2 40 11 fc 3e ..."
(Though I think that was a smaller test file, not the certificate.)
Oh, and also know that the site's not really up to date on the latest research into RC4 and derivatives. Just reserve this as a toy program unless all else fails.
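The debugging tips above (bytes rather than text, a random 10-byte IV written at the start of the output, and the IV appended to the user key) can be sketched in a few lines of Python 3. This is a minimal illustration under those assumptions, not a reference implementation. The function names are hypothetical, and `rounds` is the number of times the key-setup loop is repeated (1 for CipherSaber-1, more for CipherSaber-2).

```python
import os


def rc4_xor(key: bytes, data: bytes, rounds: int = 1) -> bytes:
    """XOR `data` with the RC4 keystream derived from `key`."""
    S = list(range(256))                  # all 256 entries initialised
    j = 0
    for _ in range(rounds):               # j carries over between rounds
        for i in range(256):              # all 256 entries mixed
            j = (j + S[i] + key[i % len(key)]) & 0xFF
            S[i], S[j] = S[j], S[i]
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + S[i]) & 0xFF
        S[i], S[j] = S[j], S[i]
        out.append(byte ^ S[(S[i] + S[j]) & 0xFF])
    return bytes(out)


def cs_encrypt(user_key: bytes, plaintext: bytes, rounds: int = 1) -> bytes:
    """Prepend a random 10-byte IV; the RC4 key is user_key + IV."""
    iv = os.urandom(10)
    return iv + rc4_xor(user_key + iv, plaintext, rounds)


def cs_decrypt(user_key: bytes, blob: bytes, rounds: int = 1) -> bytes:
    """Read the IV from the first 10 bytes, then decrypt the rest."""
    iv, body = blob[:10], blob[10:]
    return rc4_xor(user_key + iv, body, rounds)


message = b"The right of the people to be secure"
blob = cs_encrypt(b"asdfg", message)
print(cs_decrypt(b"asdfg", blob) == message)
```

A round trip like this only shows that encryption and decryption are consistent with each other. A symmetric bug, such as the one in the question, passes that check, so it is worth also comparing against published test data such as the S array listed above.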
Python
Here's one I wrote in Python for a question that later got deleted. It processes the file as a stream so memory usage is modest.
Usage
python encrypt.py <key> <rounds> < <infile> > <outfile>
python decrypt.py <key> <rounds> < <infile> > <outfile>
rc4.py
#!/usr/bin/env python
# coding: utf-8
# Python 2 code; psyco is a Python 2-only JIT used here for speed.
import psyco
from sys import stdin,stdout,argv
def rc4(K):
    R=range(256)
    S=R[:]
    T=bytearray(K*256)[:256]
    j=0
    # Key setup, repeated argv[2] times.
    for i in R*int(argv[2]):
        j=j+S[i]+T[i]&255
        S[i],S[j]=S[j],S[i]
    i=j=0
    # Stream the input in 4096-byte chunks.
    while True:
        B=stdin.read(4096)
        if not B: break
        for c in B:
            i=i+1&255  # increment modulo 256
            j=j+S[i]&255
            S[i],S[j]=S[j],S[i]
            stdout.write(chr(ord(c)^S[S[i]+S[j]&255]))
psyco.bind(rc4)
encrypt.py
from rc4 import *
import os
V=os.urandom(10)
stdout.write(V)
rc4(argv[1]+V)
decrypt.py
from rc4 import *
V=stdin.read(10)
rc4(argv[1]+V)
