I'm trying an activity online and have come across a problem. I have data about molecules and am trying to build a molecular weight calculator. The data is formatted like this:
{1, 1.0079, "Hydrogen", "H"},
{2, 4.0026, "Helium", "He"},
{3, 6.941, "Lithium", "Li"}
where each entry holds the atomic number, the atomic weight, the name and the abbreviation.
I've used a struct:
struct element {
    int atomicNumber;
    double atomicWeight;
    char elementName[25];
    char abbriv[5];
};
and now I need to use a global array to actually store the information. I'm confused as to how I would do this, as I have three different data types for each element (int, double, char). I've tried doing some research but can't find a problem similar to this. Is it possible to put this information into an array?
Also, I've only copied in 3 element descriptions above. In reality there are 109 in total, so I'm having second thoughts about how to actually store large amounts of information.
The full list:
{1, 1.0079, "Hydrogen", "H"},
{2, 4.0026, "Helium", "He"},
{3, 6.941, "Lithium", "Li"},
{4, 9.0122, "Beryllium", "Be"},
{5, 10.811, "Boron", "B"},
{6, 12.0107, "Carbon", "C"},
{7, 14.0067, "Nitrogen", "N"},
{8, 15.9994, "Oxygen", "O"},
{9, 18.9984, "Fluorine", "F"},
{10, 20.1797, "Neon", "Ne"},
{11, 22.9897, "Sodium", "Na"},
{12, 24.305, "Magnesium", "Mg"},
{13, 26.9815, "Aluminum", "Al"},
{14, 28.0855, "Silicon", "Si"},
{15, 30.9738, "Phosphorus", "P"},
{16, 32.065, "Sulfur", "S"},
{17, 35.453, "Chlorine", "Cl"},
{18, 39.948, "Argon", "Ar"},
{19, 39.0983, "Potassium", "K"},
{20, 40.078, "Calcium", "Ca"},
{21, 44.9559, "Scandium", "Sc"},
{22, 47.867, "Titanium", "Ti"},
{23, 50.9415, "Vanadium", "V"},
{24, 51.9961, "Chromium", "Cr"},
{25, 54.938, "Manganese", "Mn"},
{26, 55.845, "Iron", "Fe"},
{27, 58.9332, "Cobalt", "Co"},
{28, 58.6934, "Nickel", "Ni"},
{29, 63.546, "Copper", "Cu"},
{30, 65.39, "Zinc", "Zn"},
{31, 69.723, "Gallium", "Ga"},
{32, 72.64, "Germanium", "Ge"},
{33, 74.9216, "Arsenic", "As"},
{34, 78.96, "Selenium", "Se"},
{35, 79.904, "Bromine", "Br"},
{36, 83.8, "Krypton", "Kr"},
{37, 85.4678, "Rubidium", "Rb"},
{38, 87.62, "Strontium", "Sr"},
{39, 88.9059, "Yttrium", "Y"},
{40, 91.224, "Zirconium", "Zr"},
{41, 92.9064, "Niobium", "Nb"},
{42, 95.94, "Molybdenum", "Mo"},
{43, 98, "Technetium", "Tc"},
{44, 101.07, "Ruthenium", "Ru"},
{45, 102.9055, "Rhodium", "Rh"},
{46, 106.42, "Palladium", "Pd"},
{47, 107.8682, "Silver", "Ag"},
{48, 112.411, "Cadmium", "Cd"},
{49, 114.818, "Indium", "In"},
{50, 118.71, "Tin", "Sn"},
{51, 121.76, "Antimony", "Sb"},
{52, 127.6, "Tellurium", "Te"},
{53, 126.9045, "Iodine", "I"},
{54, 131.293, "Xenon", "Xe"},
{55, 132.9055, "Cesium", "Cs"},
{56, 137.327, "Barium", "Ba"},
{57, 138.9055, "Lanthanum", "La"},
{58, 140.116, "Cerium", "Ce"},
{59, 140.9077, "Praseodymium", "Pr"},
{60, 144.24, "Neodymium", "Nd"},
{61, 145, "Promethium", "Pm"},
{62, 150.36, "Samarium", "Sm"},
{63, 151.964, "Europium", "Eu"},
{64, 157.25, "Gadolinium", "Gd"},
{65, 158.9253, "Terbium", "Tb"},
{66, 162.5, "Dysprosium", "Dy"},
{67, 164.9303, "Holmium", "Ho"},
{68, 167.259, "Erbium", "Er"},
{69, 168.9342, "Thulium", "Tm"},
{70, 173.04, "Ytterbium", "Yb"},
{71, 174.967, "Lutetium", "Lu"},
{72, 178.49, "Hafnium", "Hf"},
{73, 180.9479, "Tantalum", "Ta"},
{74, 183.84, "Tungsten", "W"},
{75, 186.207, "Rhenium", "Re"},
{76, 190.23, "Osmium", "Os"},
{77, 192.217, "Iridium", "Ir"},
{78, 195.078, "Platinum", "Pt"},
{79, 196.9665, "Gold", "Au"},
{80, 200.59, "Mercury", "Hg"},
{81, 204.3833, "Thallium", "Tl"},
{82, 207.2, "Lead", "Pb"},
{83, 208.9804, "Bismuth", "Bi"},
{84, 209, "Polonium", "Po"},
{85, 210, "Astatine", "At"},
{86, 222, "Radon", "Rn"},
{87, 223, "Francium", "Fr"},
{88, 226, "Radium", "Ra"},
{89, 227, "Actinium", "Ac"},
{90, 232.0381, "Thorium", "Th"},
{91, 231.0359, "Protactinium", "Pa"},
{92, 238.0289, "Uranium", "U"},
{93, 237, "Neptunium", "Np"},
{94, 244, "Plutonium", "Pu"},
{95, 243, "Americium", "Am"},
{96, 247, "Curium", "Cm"},
{97, 247, "Berkelium", "Bk"},
{98, 251, "Californium", "Cf"},
{99, 252, "Einsteinium", "Es"},
{100, 257, "Fermium", "Fm"},
{101, 258, "Mendelevium", "Md"},
{102, 259, "Nobelium", "No"},
{103, 262, "Lawrencium", "Lr"},
{104, 261, "Rutherfordium", "Rf"},
{105, 262, "Dubnium", "Db"},
{106, 266, "Seaborgium", "Sg"},
{107, 264, "Bohrium", "Bh"},
{108, 277, "Hassium", "Hs"},
{109, 268, "Meitnerium", "Mt"}
Yes, you are on the right track. You can create an array of struct variables, initialize all of the entries at once, and then access them in a loop.
Something like this:
#include<stdio.h>
#define NUM_OF_ELEMENTS 109
struct element {
    int atomicNumber;
    double atomicWeight;
    char elementName[25];
    char abbriv[5];
};
struct element elements[NUM_OF_ELEMENTS]={
{1, 1.0079, "Hydrogen", "H"},
{2, 4.0026, "Helium", "He"},
{3, 6.941, "Lithium", "Li"},
{4, 9.0122, "Beryllium", "Be"},
{5, 10.811, "Boron", "B"},
{6, 12.0107, "Carbon", "C"},
{7, 14.0067, "Nitrogen", "N"},
{8, 15.9994, "Oxygen", "O"},
{9, 18.9984, "Fluorine", "F"},
{10, 20.1797, "Neon", "Ne"},
{11, 22.9897, "Sodium", "Na"},
{12, 24.305, "Magnesium", "Mg"},
{13, 26.9815, "Aluminum", "Al"},
{14, 28.0855, "Silicon", "Si"},
{15, 30.9738, "Phosphorus", "P"},
{16, 32.065, "Sulfur", "S"},
{17, 35.453, "Chlorine", "Cl"},
{18, 39.948, "Argon", "Ar"},
{19, 39.0983, "Potassium", "K"},
{20, 40.078, "Calcium", "Ca"},
{21, 44.9559, "Scandium", "Sc"},
{22, 47.867, "Titanium", "Ti"},
{23, 50.9415, "Vanadium", "V"},
{24, 51.9961, "Chromium", "Cr"},
{25, 54.938, "Manganese", "Mn"},
{26, 55.845, "Iron", "Fe"},
{27, 58.9332, "Cobalt", "Co"},
{28, 58.6934, "Nickel", "Ni"},
{29, 63.546, "Copper", "Cu"},
{30, 65.39, "Zinc", "Zn"},
{31, 69.723, "Gallium", "Ga"},
{32, 72.64, "Germanium", "Ge"},
{33, 74.9216, "Arsenic", "As"},
{34, 78.96, "Selenium", "Se"},
{35, 79.904, "Bromine", "Br"},
{36, 83.8, "Krypton", "Kr"},
{37, 85.4678, "Rubidium", "Rb"},
{38, 87.62, "Strontium", "Sr"},
{39, 88.9059, "Yttrium", "Y"},
{40, 91.224, "Zirconium", "Zr"},
{41, 92.9064, "Niobium", "Nb"},
{42, 95.94, "Molybdenum", "Mo"},
{43, 98, "Technetium", "Tc"},
{44, 101.07, "Ruthenium", "Ru"},
{45, 102.9055, "Rhodium", "Rh"},
{46, 106.42, "Palladium", "Pd"},
{47, 107.8682, "Silver", "Ag"},
{48, 112.411, "Cadmium", "Cd"},
{49, 114.818, "Indium", "In"},
{50, 118.71, "Tin", "Sn"},
{51, 121.76, "Antimony", "Sb"},
{52, 127.6, "Tellurium", "Te"},
{53, 126.9045, "Iodine", "I"},
{54, 131.293, "Xenon", "Xe"},
{55, 132.9055, "Cesium", "Cs"},
{56, 137.327, "Barium", "Ba"},
{57, 138.9055, "Lanthanum", "La"},
{58, 140.116, "Cerium", "Ce"},
{59, 140.9077, "Praseodymium", "Pr"},
{60, 144.24, "Neodymium", "Nd"},
{61, 145, "Promethium", "Pm"},
{62, 150.36, "Samarium", "Sm"},
{63, 151.964, "Europium", "Eu"},
{64, 157.25, "Gadolinium", "Gd"},
{65, 158.9253, "Terbium", "Tb"},
{66, 162.5, "Dysprosium", "Dy"},
{67, 164.9303, "Holmium", "Ho"},
{68, 167.259, "Erbium", "Er"},
{69, 168.9342, "Thulium", "Tm"},
{70, 173.04, "Ytterbium", "Yb"},
{71, 174.967, "Lutetium", "Lu"},
{72, 178.49, "Hafnium", "Hf"},
{73, 180.9479, "Tantalum", "Ta"},
{74, 183.84, "Tungsten", "W"},
{75, 186.207, "Rhenium", "Re"},
{76, 190.23, "Osmium", "Os"},
{77, 192.217, "Iridium", "Ir"},
{78, 195.078, "Platinum", "Pt"},
{79, 196.9665, "Gold", "Au"},
{80, 200.59, "Mercury", "Hg"},
{81, 204.3833, "Thallium", "Tl"},
{82, 207.2, "Lead", "Pb"},
{83, 208.9804, "Bismuth", "Bi"},
{84, 209, "Polonium", "Po"},
{85, 210, "Astatine", "At"},
{86, 222, "Radon", "Rn"},
{87, 223, "Francium", "Fr"},
{88, 226, "Radium", "Ra"},
{89, 227, "Actinium", "Ac"},
{90, 232.0381, "Thorium", "Th"},
{91, 231.0359, "Protactinium", "Pa"},
{92, 238.0289, "Uranium", "U"},
{93, 237, "Neptunium", "Np"},
{94, 244, "Plutonium", "Pu"},
{95, 243, "Americium", "Am"},
{96, 247, "Curium", "Cm"},
{97, 247, "Berkelium", "Bk"},
{98, 251, "Californium", "Cf"},
{99, 252, "Einsteinium", "Es"},
{100, 257, "Fermium", "Fm"},
{101, 258, "Mendelevium", "Md"},
{102, 259, "Nobelium", "No"},
{103, 262, "Lawrencium", "Lr"},
{104, 261, "Rutherfordium", "Rf"},
{105, 262, "Dubnium", "Db"},
{106, 266, "Seaborgium", "Sg"},
{107, 264, "Bohrium", "Bh"},
{108, 277, "Hassium", "Hs"},
{109, 268, "Meitnerium", "Mt"}
};
int main(void)
{
    int i;

    printf("%13s\t%13s\t%25s\t%s\n", "Atomic Number", "Atomic Weight", "Element Name", "Abbrevation");
    for (i = 0; i < NUM_OF_ELEMENTS; i++)
        printf("%13d\t%13.2f\t%25s\t%5s\n", elements[i].atomicNumber,
               elements[i].atomicWeight,
               elements[i].elementName,
               elements[i].abbriv);
    return 0;
}
And here is the output:
$ gcc prgm.c
$ ./a.out
Atomic Number Atomic Weight Element Name Abbrevation
1 1.01 Hydrogen H
2 4.00 Helium He
3 6.94 Lithium Li
4 9.01 Beryllium Be
5 10.81 Boron B
6 12.01 Carbon C
7 14.01 Nitrogen N
8 16.00 Oxygen O
9 19.00 Fluorine F
10 20.18 Neon Ne
11 22.99 Sodium Na
12 24.30 Magnesium Mg
13 26.98 Aluminum Al
14 28.09 Silicon Si
15 30.97 Phosphorus P
16 32.06 Sulfur S
17 35.45 Chlorine Cl
18 39.95 Argon Ar
19 39.10 Potassium K
20 40.08 Calcium Ca
21 44.96 Scandium Sc
22 47.87 Titanium Ti
23 50.94 Vanadium V
24 52.00 Chromium Cr
25 54.94 Manganese Mn
26 55.84 Iron Fe
27 58.93 Cobalt Co
28 58.69 Nickel Ni
29 63.55 Copper Cu
30 65.39 Zinc Zn
31 69.72 Gallium Ga
32 72.64 Germanium Ge
33 74.92 Arsenic As
34 78.96 Selenium Se
35 79.90 Bromine Br
36 83.80 Krypton Kr
37 85.47 Rubidium Rb
38 87.62 Strontium Sr
39 88.91 Yttrium Y
40 91.22 Zirconium Zr
41 92.91 Niobium Nb
42 95.94 Molybdenum Mo
43 98.00 Technetium Tc
44 101.07 Ruthenium Ru
45 102.91 Rhodium Rh
46 106.42 Palladium Pd
47 107.87 Silver Ag
48 112.41 Cadmium Cd
49 114.82 Indium In
50 118.71 Tin Sn
51 121.76 Antimony Sb
52 127.60 Tellurium Te
53 126.90 Iodine I
54 131.29 Xenon Xe
55 132.91 Cesium Cs
56 137.33 Barium Ba
57 138.91 Lanthanum La
58 140.12 Cerium Ce
59 140.91 Praseodymium Pr
60 144.24 Neodymium Nd
61 145.00 Promethium Pm
62 150.36 Samarium Sm
63 151.96 Europium Eu
64 157.25 Gadolinium Gd
65 158.93 Terbium Tb
66 162.50 Dysprosium Dy
67 164.93 Holmium Ho
68 167.26 Erbium Er
69 168.93 Thulium Tm
70 173.04 Ytterbium Yb
71 174.97 Lutetium Lu
72 178.49 Hafnium Hf
73 180.95 Tantalum Ta
74 183.84 Tungsten W
75 186.21 Rhenium Re
76 190.23 Osmium Os
77 192.22 Iridium Ir
78 195.08 Platinum Pt
79 196.97 Gold Au
80 200.59 Mercury Hg
81 204.38 Thallium Tl
82 207.20 Lead Pb
83 208.98 Bismuth Bi
84 209.00 Polonium Po
85 210.00 Astatine At
86 222.00 Radon Rn
87 223.00 Francium Fr
88 226.00 Radium Ra
89 227.00 Actinium Ac
90 232.04 Thorium Th
91 231.04 Protactinium Pa
92 238.03 Uranium U
93 237.00 Neptunium Np
94 244.00 Plutonium Pu
95 243.00 Americium Am
96 247.00 Curium Cm
97 247.00 Berkelium Bk
98 251.00 Californium Cf
99 252.00 Einsteinium Es
100 257.00 Fermium Fm
101 258.00 Mendelevium Md
102 259.00 Nobelium No
103 262.00 Lawrencium Lr
104 261.00 Rutherfordium Rf
105 262.00 Dubnium Db
106 266.00 Seaborgium Sg
107 264.00 Bohrium Bh
108 277.00 Hassium Hs
109 268.00 Meitnerium Mt
$
What you're looking to do is create an array of structs.
E.g.
#include <string.h>
struct element myArray[109];
myArray[0].atomicNumber = 1;
myArray[0].atomicWeight = 1.0079;
strcpy(myArray[0].elementName, "Hydrogen");
strcpy(myArray[0].abbriv, "H");
The name and abbreviation are char arrays, which cannot be assigned with =, so they are copied in with strcpy. Obviously there are a lot of elements in your table, so you can automate this process with a loop.
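Once the table lives in an array of structs, the weight calculator mostly comes down to looking an element up by its symbol. The following is only a minimal sketch under some assumptions: the helper names find_element and weight_of are hypothetical (they are not from the question), the table is cut down to three rows to keep it short, and the lookup is a plain linear scan.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct element {
    int atomicNumber;
    double atomicWeight;
    char elementName[25];
    char abbriv[5];
};

/* Only three rows for brevity; the real table has 109 entries. */
static const struct element elements[] = {
    {1, 1.0079, "Hydrogen", "H"},
    {6, 12.0107, "Carbon", "C"},
    {8, 15.9994, "Oxygen", "O"},
};
#define NUM_OF_ELEMENTS (sizeof elements / sizeof elements[0])

/* Hypothetical helper: linear search by symbol, NULL if not found. */
const struct element *find_element(const char *symbol)
{
    assert(symbol != NULL);
    for (size_t i = 0; i < NUM_OF_ELEMENTS; i++) {
        if (strcmp(elements[i].abbriv, symbol) == 0)
            return &elements[i];
    }
    return NULL;
}

/* Hypothetical helper: weight of `count` atoms of `symbol`,
   or -1.0 if the symbol is not in the table. */
double weight_of(const char *symbol, int count)
{
    const struct element *e = find_element(symbol);
    return e ? e->atomicWeight * count : -1.0;
}
```

With these helpers, weight_of("H", 2) + weight_of("O", 1) gives roughly 18.015 for water. A linear scan is fine for 109 entries; parsing a formula string such as "H2O" into symbols and counts is left out of this sketch.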
I have the following code which, for a sorted pandas data frame, groups by one column and creates two new columns: one built from the previous 4 rows plus the current row in the group, and one built from the next row in the group.
import pandas as pd
data_test = {'nr':[1,1,1,1,1,6,6,6,6,6,6,6],
             'val':[11,12,13,14,15,61,62,63,64,65,66,67]}
df_test = pd.DataFrame(data_test, columns=['nr','val'])
print(df_test)
This gives the following frame:
nr val
0 1 11
1 1 12
2 1 13
3 1 14
4 1 15
5 6 61
6 6 62
7 6 63
8 6 64
9 6 65
10 6 66
11 6 67
Now I have the following code, which groups by 'nr' and builds one column containing, for each row, the previous 4 values of 'val' in the group plus the current value. Similarly, it builds one extra column containing, for each row, the next value of 'val' in the group.
df_test['past4'] = df_test.groupby(['nr'])['val'].transform(lambda x: x.shift(4).fillna(0))
df_test['past3'] = df_test.groupby(['nr'])['val'].transform(lambda x: x.shift(3).fillna(0))
df_test['past2'] = df_test.groupby(['nr'])['val'].transform(lambda x: x.shift(2).fillna(0))
df_test['past1'] = df_test.groupby(['nr'])['val'].transform(lambda x: x.shift(1).fillna(0))
df_test['future'] = df_test.groupby(['nr'])['val'].transform(lambda x: x.shift(-1).fillna(0))
df_test['amounts'] = df_test[['past4', 'past3','past2','past1','val']].values.tolist()
df_test.drop(columns = ['past4', 'past3', 'past2', 'past1'], inplace = True)
df_test
nr val future amounts
0 1 11 12 [0, 0, 0, 0, 11]
1 1 12 13 [0, 0, 0, 11, 12]
2 1 13 14 [0, 0, 11, 12, 13]
3 1 14 15 [0, 11, 12, 13, 14]
4 1 15 0 [11, 12, 13, 14, 15]
5 6 61 62 [0, 0, 0, 0, 61]
6 6 62 63 [0, 0, 0, 61, 62]
7 6 63 64 [0, 0, 61, 62, 63]
8 6 64 65 [0, 61, 62, 63, 64]
9 6 65 66 [61, 62, 63, 64, 65]
10 6 66 67 [62, 63, 64, 65, 66]
11 6 67 0 [63, 64, 65, 66, 67]
I'm sure there is an easier way to build the list column 'amounts', probably a one-liner. How can I do this?
Use a custom function to create the nested lists, like this:
import numpy as np
def f(x):
    # list comprehension with shifts by 4, 3, 2, 1, 0
    L = [x['val'].shift(i).fillna(0) for i in range(4, -1, -1)]
    # next value in the group goes to another column
    x['future'] = x['val'].shift(-1).fillna(0).astype(int)
    # column filled with lists
    x['amounts'] = pd.Series(np.array(L).astype(int).T.tolist(), index=x.index)
    return x
# group_keys=False keeps the original row index in the result
# (newer pandas versions otherwise add the group key to the index)
df_test = df_test.groupby(['nr'], group_keys=False).apply(f)
print(df_test)
nr val future amounts
0 1 11 12 [0, 0, 0, 0, 11]
1 1 12 13 [0, 0, 0, 11, 12]
2 1 13 14 [0, 0, 11, 12, 13]
3 1 14 15 [0, 11, 12, 13, 14]
4 1 15 0 [11, 12, 13, 14, 15]
5 6 61 62 [0, 0, 0, 0, 61]
6 6 62 63 [0, 0, 0, 61, 62]
7 6 63 64 [0, 0, 61, 62, 63]
8 6 64 65 [0, 61, 62, 63, 64]
9 6 65 66 [61, 62, 63, 64, 65]
10 6 66 67 [62, 63, 64, 65, 66]
11 6 67 0 [63, 64, 65, 66, 67]
Moving your block into a function makes the code more modular and lighter.
In this specific example we pass reversed(range(5)) as shift_values, which represents the list [4, 3, 2, 1, 0].
import pandas as pd
data_test = {'nr':[1,1,1,1,1,6,6,6,6,6,6,6],
'val':[11,12,13,14,15,61,62,63,64,65,66,67]}
df_test = pd.DataFrame(data_test, columns = ['nr','val'])
def generate_past(df, shift_values):
    serie = pd.DataFrame([df.groupby('nr')['val'].transform(lambda x: x.shift(shift_value).fillna(0)) for shift_value in shift_values])
    return serie.T.values.tolist()
df_test['future'] = df_test.groupby(['nr'])['val'].transform(lambda x: x.shift(-1).fillna(0))
df_test['amounts'] = generate_past(df_test, reversed(range(5)))
You can try it like this (same idea as jezrael's answer) but without using apply. It is not an ideal approach, as it builds a new dataframe.
# DataFrame.append was removed in pandas 2.0, so collect the groups and concatenate them
frames = []
for _, grp in df_test.groupby('nr'):
    grp = grp.reset_index(drop=True)
    grp['future'] = grp['val'].shift(-1).fillna(0).astype(int)
    grp['amounts'] = pd.Series([grp['val'].shift(i).fillna(0).values[-5:] for i in range(len(grp)-1, -1, -1)])
    frames.append(grp)
df_new = pd.concat(frames, ignore_index=True)
df_new:
nr val future amounts
0 1 11 12 [0.0, 0.0, 0.0, 0.0, 11.0]
1 1 12 13 [0.0, 0.0, 0.0, 11.0, 12.0]
2 1 13 14 [0.0, 0.0, 11.0, 12.0, 13.0]
3 1 14 15 [0.0, 11.0, 12.0, 13.0, 14.0]
4 1 15 0 [11, 12, 13, 14, 15]
5 6 61 62 [0.0, 0.0, 0.0, 0.0, 61.0]
6 6 62 63 [0.0, 0.0, 0.0, 61.0, 62.0]
7 6 63 64 [0.0, 0.0, 61.0, 62.0, 63.0]
8 6 64 65 [0.0, 61.0, 62.0, 63.0, 64.0]
9 6 65 66 [61.0, 62.0, 63.0, 64.0, 65.0]
10 6 66 67 [62.0, 63.0, 64.0, 65.0, 66.0]
11 6 67 0 [63, 64, 65, 66, 67]
I am implementing a Pearson hash in order to create a lightweight dictionary structure for a C project which requires a table of file names paired with file data - I want the nice constant-time search property of hash tables. I'm no math expert, so I looked up good text hashes and Pearson came up, with claims that it is effective and has a good distribution. I tested my implementation and found that no matter how I vary the table size or the maximum filename length, the hash is very inefficient, with for example 18/50 buckets being left empty. I trust Wikipedia to not be lying, and yes, I am aware I can just download a third-party hash table implementation, but I would dearly like to know why my version isn't working.
In the following code (a function to insert values into the table), "csString" is the filename (the string to be hashed), "cLen" is the length of the string, "pData" is a pointer to some data which is inserted into the table, and "pTable" is the table struct. The initial condition cHash = cLen - csString[0] is something I found experimentally to marginally improve uniformity. I should add that I am testing the table with entirely randomised strings (using rand() to generate ASCII values) with randomised lengths within a certain range - this is in order to easily generate and test large amounts of values.
typedef struct StaticStrTable {
unsigned int nRepeats;
unsigned char nBuckets;
unsigned char nMaxCollisions;
void** pBuckets;
} StaticStrTable;
static const char cPerm256[256] = {
227, 117, 238, 33, 25, 165, 107, 226, 132, 88, 84, 68, 217, 237, 228, 58, 52, 147, 46, 197, 191, 119, 211, 0, 218, 139, 196, 153, 170, 77, 175, 22, 193, 83, 66, 182, 151, 99, 11, 144, 104, 233, 166, 34, 177, 14, 194, 51, 30, 121, 102, 49,
222, 210, 199, 122, 235, 72, 13, 156, 38, 145, 137, 78, 65, 176, 94, 163, 95, 59, 92, 114, 243, 204, 224, 43, 185, 168, 244, 203, 28, 124, 248, 105, 10, 87, 115, 161, 138, 223, 108, 192, 6, 186, 101, 16, 39, 134, 123, 200, 190, 195, 178,
164, 9, 251, 245, 73, 162, 71, 7, 239, 62, 69, 209, 159, 3, 45, 247, 19, 174, 149, 61, 57, 146, 234, 189, 15, 202, 89, 111, 207, 31, 127, 215, 198, 231, 4, 181, 154, 64, 125, 24, 93, 152, 37, 116, 160, 113, 169, 255, 44, 36, 70, 225, 79,
250, 12, 229, 230, 76, 167, 118, 232, 142, 212, 98, 82, 252, 130, 23, 29, 236, 86, 240, 32, 90, 67, 126, 8, 133, 85, 20, 63, 47, 150, 135, 100, 103, 173, 184, 48, 143, 42, 54, 129, 242, 18, 187, 106, 254, 53, 120, 205, 155, 216, 219, 172,
21, 253, 5, 221, 40, 27, 2, 179, 74, 17, 55, 183, 56, 50, 110, 201, 109, 249, 128, 112, 75, 220, 214, 140, 246, 213, 136, 148, 97, 35, 241, 60, 188, 180, 206, 80, 91, 96, 157, 81, 171, 141, 131, 158, 1, 208, 26, 41
};
void InsertStaticStrTable(char* csString, unsigned char cLen, void* pData, StaticStrTable* pTable) {
    unsigned char cHash = cLen - csString[0];
    for (int i = 0; i < cLen; ++i) cHash ^= cPerm256[cHash ^ csString[i]];
    unsigned short cTableIndex = cHash % pTable->nBuckets;
    long long* pBucket = pTable->pBuckets[cTableIndex];
    // Inserts data and records how many collisions there are - it may look weird as the way in which I decided to pack the data into the table buffer is very compact and arbitrary
    // It won't affect the hash though, which is the key issue!
    for (int i = 0; i < pTable->nMaxCollisions; ++i) {
        if (i == 1) {
            pTable->nRepeats++;
        }
        long long* pSlotID = pBucket + (i << 1);
        if (pSlotID[0] == 0) {
            pSlotID[0] = csString;
            pSlotID[1] = pData;
            break;
        }
    }
}
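For comparison, the textbook form of Pearson hashing starts from h = 0 and replaces the hash with a table lookup at every byte (h = T[h ^ c]) rather than XOR-ing the lookup into it, and it indexes the table with unsigned bytes. The following is a minimal sketch of that form only, under stated assumptions: init_perm and pearson_hash are hypothetical names, and the permutation is generated with a small fixed-seed shuffle rather than being the table from the question. It is not claimed that the differences explain the distribution observed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint8_t perm[256];   /* a permutation of 0..255 */

/* Hypothetical helper: fill perm with 0..255 and shuffle it
   (Fisher-Yates) using a small LCG so the result is reproducible.
   Any permutation of 0..255 would do. */
void init_perm(uint32_t seed)
{
    for (int i = 0; i < 256; i++)
        perm[i] = (uint8_t)i;
    for (int i = 255; i > 0; i--) {
        seed = seed * 1664525u + 1013904223u;
        int j = (int)((seed >> 16) % (uint32_t)(i + 1));
        uint8_t t = perm[i];
        perm[i] = perm[j];
        perm[j] = t;
    }
}

/* Textbook Pearson hash: 8-bit result, one table lookup per byte.
   The bytes are unsigned, so the table index is always 0..255. */
uint8_t pearson_hash(const unsigned char *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint8_t h = 0;
    for (size_t i = 0; i < len; i++)
        h = perm[h ^ data[i]];
    return h;
}
```

A call such as pearson_hash((const unsigned char *)name, strlen(name)) % nBuckets then picks the bucket, after init_perm has been called once. Note that the result only has 256 possible values, so with a bucket count that does not divide 256 some buckets are slightly more likely than others.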
FYI (This is not an answer, I just need the formatting)
These are just single runs from a simulation, YMMV.
distributing 50 elements randomly over 50 bins:
kalender_size=50 nperson = 50
E/cell| Ncell | frac | Nelem | frac |h/cell| hops | Cumhops
----+---------+--------+----------+--------+------+--------+--------
0: 18 (0.360000) 0 (0.000000) 0 0 0
1: 18 (0.360000) 18 (0.360000) 1 18 18
2: 10 (0.200000) 20 (0.400000) 3 30 48
3: 4 (0.080000) 12 (0.240000) 6 24 72
----+---------+--------+----------+--------+------+--------+--------
4: 50 50 1.440000 72
Similarly: distribute 365 persons over a birthday-calendar (ignoring leap days ...):
kalender_size=356 nperson = 356
E/cell| Ncell | frac | Nelem | frac |h/cell| hops | Cumhops
----+---------+--------+----------+--------+------+--------+--------
0: 129 (0.362360) 0 (0.000000) 0 0 0
1: 132 (0.370787) 132 (0.370787) 1 132 132
2: 69 (0.193820) 138 (0.387640) 3 207 339
3: 19 (0.053371) 57 (0.160112) 6 114 453
4: 6 (0.016854) 24 (0.067416) 10 60 513
5: 1 (0.002809) 5 (0.014045) 15 15 528
----+---------+--------+----------+--------+------+--------+--------
6: 356 356 1.483146 528
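For N items distributed at random over N slots, the expected number of empty slots and the expected number of slots holding a single item are approximately equal for large N. The expected density is about 1/e (roughly 0.368) for both.
The final number (1.483146) is the average number of ->next pointer traversals per found element (when using a chained hash table). Even an ideal hash function, one that behaves like a uniform random assignment, approaches 1.5 here as N grows, so 18 empty buckets out of 50 is about what a good hash is expected to produce.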
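The 1/e figure can be checked directly: a given slot stays empty with probability (1 - 1/N)^N, which is about 0.364 for N = 50, i.e. roughly 18 empty slots. The sketch below computes that value and also runs a small balls-into-bins simulation. It is an illustration under assumptions, not the simulation used for the tables above: the function names are hypothetical, and the random numbers come from a small xorshift generator chosen only so that runs are reproducible.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Probability that one particular bin stays empty when n_items are
   thrown uniformly at random into n_bins: (1 - 1/n_bins)^n_items. */
double expected_empty_fraction(unsigned n_bins, unsigned n_items)
{
    assert(n_bins > 0);
    double p = 1.0;
    for (unsigned i = 0; i < n_items; i++)
        p *= 1.0 - 1.0 / n_bins;
    return p;
}

/* Small xorshift32 generator; the state must never be zero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Throw n_items into n_bins at random and count the bins left empty.
   Returns -1 if memory allocation fails. */
int count_empty_bins(unsigned n_bins, unsigned n_items, uint32_t seed)
{
    assert(n_bins > 0);
    unsigned char *used = calloc(n_bins, 1);
    if (used == NULL)
        return -1;
    uint32_t state = seed ? seed : 1u;   /* avoid the all-zero state */
    for (unsigned i = 0; i < n_items; i++)
        used[xorshift32(&state) % n_bins] = 1;
    int empty = 0;
    for (unsigned b = 0; b < n_bins; b++)
        if (!used[b])
            empty++;
    free(used);
    return empty;
}
```

Calling count_empty_bins(50, 50, seed) for a range of seeds and averaging the results should land close to 50 * expected_empty_fraction(50, 50), which is the baseline any hash function should be compared against.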
I have this array that comes from a previous a=array.unpack("C*") command.
a = [9, 32, 50, 53, 56, 53, 57, 9, 73, 78, 70, 79, 9, 73, 78, 70, 79, 53, 9,
32, 55, 52, 32, 50, 51, 32, 48, 51, 32, 57, 50, 32, 48, 48, 32, 48, 48, 32,
48, 48, 32, 69, 67, 32, 48, 50, 32, 49, 48, 32, 48, 48, 32, 69, 50, 32, 48,
48, 32, 55, 55, 9, 0, 0, 0, 0, 1, 12, 1, 0, 0, 0, 57, 254, 70, 6, 1, 6, 0, 3,
0, 3, 198, 0, 2, 198, 31, 147, 23, 0, 226, 7, 12, 17, 18, 56, 55, 3, 101, 1,
1, 0, 134, 7, 145, 5, 148, 37, 150, 133, 241, 135, 5, 22, 109, 145, 53, 38,
171, 4, 3, 2, 6, 192, 173, 22, 160, 20, 48, 18, 6, 9, 42, 134, 58, 0, 137, 97,
58, 1, 0, 164, 5, 48, 3, 129, 1, 7, 225, 16, 2, 1, 1, 4, 11, 9, 1, 10, 10, 6,
2, 19, 105, 145, 103, 116, 226, 35, 48, 3, 194, 1, 242, 48, 3, 194, 1, 241, 48,
3, 194, 1, 246, 48, 3, 194, 1, 245, 48, 3, 194, 1, 244, 48, 3, 194, 1, 243, 48,
3, 194, 1, 247, 177, 13, 10, 1, 1, 4, 8, 10, 6, 2, 19, 105, 145, 103, 116, 0, 0,
42, 3, 0, 0, 48, 48, 48, 48, 48, 48, 48, 50, 9, 82, 101, 99, 101, 105, 118, 101,
9, 50, 51, 9, 77, 111, 110, 32, 32]
when I convert to chr it looks like this:
irb(main):4392:0> a.map(&:chr).join
=> "\t 25859\tINFO\tINFO5\t 74 23 03 92 00 00 00 EC 02 10 00 E2 00 77\t\x00\x00\x00\x00
\x01\f\x01\x00\x00\x009\xFEF\x06\x01\x06\x00\x03\x00\x03\xC6\x00\x02\xC6\x1F\x93\x17\x00
\xE2\a\f\x11\x1287\x03e\x01\x01\x00\x86\a\x91\x05\x94%\x96\x85\xF1\x87\x05\x16m\x915&\xAB
\x04\x03\x02\x06\xC0\xAD\x16\xA0\x140\x12\x06\t*\x86:\x00\x89a:\x01\x00\xA4\x050\x03\x81
\x01\a\xE1\x10\x02\x01\x01\x04\v\t\x01\n\n\x06\x02\x13i\x91gt\xE2#0\x03\xC2\x01\xF20\x03
\xC2\x01\xF10\x03\xC2\x01\xF60\x03\xC2\x01\xF50\x03\xC2\x01\xF40\x03\xC2\x01\xF30\x03\xC2
\x01\xF7\xB1\r\n\x01\x01\x04\b\n\x06\x02\x13i\x91gt\x00\x00*\x03\x00\x000000..."
I would like to extract the hexadecimal values between INFO5\t and \t..., so the output would be
"74 23 03 92 00 00 00 EC 02 10 00 E2 00 77"
I'm doing it as shown below, but it only removes the first unwanted part and leaves \n\n\x06...000 at the end.
How can I fix this?
irb(main)>: a.map(&:chr).join.gsub(/(\t .*\t )|(\t.*)/,"")
=> "74 23 03 92 00 00 00 EC 02 10 00 E2 00 77\n\n\x06\x02\x13i\x91gt\xE2#0
\x03\xC2\x01\xF20\x03\xC2\x01\xF10\x03\xC2\x01\xF60\x03\xC2\x01\xF50\x03\xC2
\x01\xF40\x03\xC2\x01\xF30\x03\xC2\x01\xF7\xB1\r\n\x01\x01\x04\b\n\x06\x02\
x13i\x91gt\x00\x00*\x03\x00\x0000000002"
Thanks for the help in advance.
UPDATE
A sample binary file is attached below.
input.dat
Here are two approaches (a below is abbreviated from that given in the question).
a = [9, 32, 50, 53, 56, 53, 57, 9, 73, 78, 70, 79, 9, 73, 78, 70, 79, 53, 9,
32, 55, 52, 32, 50, 51, 32, 48, 51, 32, 57, 50, 32, 48, 48, 32, 48, 48,
32, 48, 48, 32, 69, 67, 32, 48, 50, 32, 49, 48, 32, 48, 48, 32, 69, 50,
32, 48, 48, 32, 55, 55, 9, 0, 0]
Extract from the string that had been unpacked to create a
str = a.pack("C*")
#=> "\t 25859\tINFO\tINFO5\t 74 23 03 92 00 00 00 EC 02 10 00 E2 00 77\t\x00\x00"
str[/(?<=INFO5\t).+?(?=\t)/].strip
#=> "74 23 03 92 00 00 00 EC 02 10 00 E2 00 77"
str is the string that was unpacked to produce a (a = str.unpack("C*")), so in practice it need not be recomputed.
(?<=INFO5\t) and (?=\t) are respectively a positive lookbehind and a positive lookahead. They must be matched but are not part of the match that is returned. The ("non-greedy") question mark in .+? ensures that the match terminates immediately before the first tab is encountered. By contrast,
"abc\td\tef"[/(?<=a).+(?=\t)/]
#=> "bc\td"
Extract from a and convert to a string
def idx_start(a, arr)
  arr_size = arr.size
  a.each_index.find { |i| a[i, arr_size] == arr }
end
pfix = "INFO5\t".unpack("C*")
#=> [73, 78, 70, 79, 53, 9]
pfix_size = pfix.size
#=> 6
sfix = [pfix.last]
#=> [9]
sfix_size = sfix.size
start = idx_start(a, pfix) + pfix_size
#=> 19
a[start..idx_start(a[start..-1], sfix) + start - 1].pack("C*").strip
#=> "74 23 03 92 00 00 00 EC 02 10 00 E2 00 77"
I assume you mean a=str.unpack("C*") - you can unpack a string but not an array.
To get the result you want, you don't need to use unpack at all [1] - just perform a regex:
str.match(/INFO5\t(.*?)\t/).to_a[1]
# => " 74 23 03 92 00 00 00 EC 02 10 00 E2 00 77"
Note that there's a leading space in the result, but you can adjust the regex according to your needs; I'm not going to try to guess the specification of this format.
Tips:
The ? in .*? is needed to make the * non-greedy.
The to_a avoids raising an error in case the match finds nothing.
EDIT
Your comment regarding "invalid byte sequence in UTF-8" indicates that your data is probably ASCII-8BIT (i.e. it's not compatible with UTF-8), but it's stored in a string whose encoding attribute is "UTF-8". It would help if you explain how you obtained that string, because the string's encoding appears to be wrong.
Solution 1 (this is ideal):
Read in the file as ASCII-8BIT:
str = File.read("input.dat", encoding: 'ASCII-8BIT')
Solution 2 (a workaround, if you can't control the input encoding):
# NOTE: this changes the encoding on `str`
str.force_encoding("ASCII-8BIT")
After you've done this, the .match should work.
Further Explanation
The reason your map(&:chr).join works is because .chr will produce either US-ASCII or ASCII-8BIT strings (the latter happens for bytes above 127), never UTF-8.
When you join those strings, your result is in ASCII-8BIT if any byte was above 127. So this is effectively the same as calling force_encoding("ASCII-8BIT"), except that map/join doesn't modify the original string's encoding like force_encoding does.
[1] unpack is unnecessary because a.map(&:chr).join is the same as a.pack('C*'), which gives you the original str. Even if you had to unpack the string for another purpose, I recommend using the original string instead of re-packing the array. Maybe you can encapsulate this into a data structure, e.g.:
i_data = InfoData.new(str)
i_data.bytes # array of bytes
i_data.hex_string # "74 23 03 ..."
Note that the above code won't work as-is - you need to write the InfoData class yourself.
I assume that you don't need the binary data that follows the text, so as a first step I trim the array at the first null byte using take_while.
Then I convert the ints to a string using map(&:chr).join.
Finally I match the string against the regex /INFO5\t ?([^\t]*)\t/, which assumes the interesting part is between INFO5\t and the next \t.
--
a=array.unpack("C*")
a.take_while{|e| e > 0}.map(&:chr).join.match(/INFO5\t ?([^\t]*)\t/)[1]
# => "74 23 03 92 00 00 00 EC 02 10 00 E2 00 77"
I want to do something like this, where df.index matches the rows of dim2_arr exactly:
df['newcol'] = dim2_arr[df.index][df.existingcol.values]
I can get at the values I want if I do this:
for i in range(len(df)):
    print(dim2_arr[i][df.iloc[i].existingcol])
Thanks in advance for assistance.
You are basically using the values from existingcol as column indices and going through each row of the 2D array to select one element per row off the 2D array. Thus, we can use NumPy's integer array indexing to achieve the desired new column -
col_idx = df.existingcol.values
df['newcol'] = dim2_arr[np.arange(len(dim2_arr)), col_idx]
Sample run -
1) Inputs :
In [311]: df
Out[311]:
existingcol
0 2
1 0
2 0
3 1
4 0
5 2
6 1
7 4
8 3
9 3
In [313]: dim2_arr
Out[313]:
array([[25, 75, 70, 45, 67],
[21, 85, 74, 68, 61],
[79, 33, 22, 77, 25],
[69, 31, 67, 11, 45],
[50, 12, 35, 55, 89],
[62, 59, 86, 55, 58],
[67, 41, 77, 88, 79],
[64, 30, 36, 25, 21],
[24, 73, 68, 84, 79],
[50, 53, 55, 71, 84]])
2) Use proposed codes :
In [314]: col_idx = df.existingcol.values
In [317]: df['newcol'] = dim2_arr[np.arange(len(dim2_arr)), col_idx]
In [318]: df
Out[318]:
existingcol newcol
0 2 70
1 0 21
2 0 79
3 1 31
4 0 50
5 2 86
6 1 41
7 4 21
8 3 84
9 3 71
I recently wrote code for a Project Euler problem, and by the time I had worked around every bug I ran into, my code was pretty convoluted and no longer pretty or efficient. I had to manually manipulate my data far too much for my liking. I cannot find a straightforward answer elsewhere and would like a more graceful solution.
I'm not even sure this is possible in C, so keep that in mind.
The problem requires analyzing a grid of data that is in plain text.
The grid is as follows...
08 02 22 97 38 15 00 40 00 75 04 05 07 78 52 12 50 77 91 08
49 49 99 40 17 81 18 57 60 87 17 40 98 43 69 48 04 56 62 00
81 49 31 73 55 79 14 29 93 71 40 67 53 88 30 03 49 13 36 65
52 70 95 23 04 60 11 42 69 24 68 56 01 32 56 71 37 02 36 91
22 31 16 71 51 67 63 89 41 92 36 54 22 40 40 28 66 33 13 80
24 47 32 60 99 03 45 02 44 75 33 53 78 36 84 20 35 17 12 50
32 98 81 28 64 23 67 10 26 38 40 67 59 54 70 66 18 38 64 70
67 26 20 68 02 62 12 20 95 63 94 39 63 08 40 91 66 49 94 21
24 55 58 05 66 73 99 26 97 17 78 78 96 83 14 88 34 89 63 72
21 36 23 09 75 00 76 44 20 45 35 14 00 61 33 97 34 31 33 95
78 17 53 28 22 75 31 67 15 94 03 80 04 62 16 14 09 53 56 92
16 39 05 42 96 35 31 47 55 58 88 24 00 17 54 24 36 29 85 57
86 56 00 48 35 71 89 07 05 44 44 37 44 60 21 58 51 54 17 58
19 80 81 68 05 94 47 69 28 73 92 13 86 52 17 77 04 89 55 40
04 52 08 83 97 35 99 16 07 97 57 32 16 26 26 79 33 27 98 66
88 36 68 87 57 62 20 72 03 46 33 67 46 55 12 32 63 93 53 69
04 42 16 73 38 25 39 11 24 94 72 18 08 46 29 32 40 62 76 36
20 69 36 41 72 30 23 88 34 62 99 69 82 67 59 85 74 04 36 16
20 73 35 29 78 31 90 01 74 31 49 71 48 86 81 16 23 57 05 54
01 70 54 71 83 51 54 69 16 92 33 48 61 43 52 01 89 19 67 48
The idea is to find the largest possible product of four adjacent numbers (vertical, horizontal, or diagonal).
In the end my solution involved manually inputting this into a two-dimensional int array and manually changing all 08's or 09's to 8's and 9's to avoid the octal number problem.
Like so...
int str[20][20] = {{ 8, 02, 22, 97, 38, 15, 00, 40, 00, 75, 04, 05, 07, 78, 52, 12, 50, 77, 91, 8},{49, 49, 99, 40, 17, 81, 18, 57, 60, 87, 17, 40, 98, 43, 69, 48, 04, 56, 62, 00},{81, 49, 31, 73, 55, 79, 14, 29, 93, 71, 40, 67, 53, 88, 30, 03, 49, 13, 36, 65},{52, 70, 95, 23, 04, 60, 11, 42, 69, 24, 68, 56, 01, 32, 56, 71, 37, 02, 36, 91},{22, 31, 16, 71, 51, 67, 63, 89, 41, 92, 36, 54, 22, 40, 40, 28, 66, 33, 13, 80},{24, 47, 32, 60, 99, 03, 45, 02, 44, 75, 33, 53, 78, 36, 84, 20, 35, 17, 12, 50},{32, 98, 81, 28, 64, 23, 67, 10, 26, 38, 40, 67, 59, 54, 70, 66, 18, 38, 64, 70},{67, 26, 20, 68, 02, 62, 12, 20, 95, 63, 94, 39, 63, 8, 40, 91, 66, 49, 94, 21},{24, 55, 58, 05, 66, 73, 99, 26, 97, 17, 78, 78, 96, 83, 14, 88, 34, 89, 63, 72},{21, 36, 23, 9, 75, 00, 76, 44, 20, 45, 35, 14, 00, 61, 33, 97, 34, 31, 33, 95},{78, 17, 53, 28, 22, 75, 31, 67, 15, 94, 03, 80, 04, 62, 16, 14, 9, 53, 56, 92},{16, 39, 05, 42, 96, 35, 31, 47, 55, 58, 88, 24, 00, 17, 54, 24, 36, 29, 85, 57},{86, 56, 00, 48, 35, 71, 89, 07, 05, 44, 44, 37, 44, 60, 21, 58, 51, 54, 17, 58},{19, 80, 81, 68, 05, 94, 47, 69, 28, 73, 92, 13, 86, 52, 17, 77, 04, 89, 55, 40},{04, 52, 8, 83, 97, 35, 99, 16, 07, 97, 57, 32, 16, 26, 26, 79, 33, 27, 98, 66},{88, 36, 68, 87, 57, 62, 20, 72, 03, 46, 33, 67, 46, 55, 12, 32, 63, 93, 53, 69},{04, 42, 16, 73, 38, 25, 39, 11, 24, 94, 72, 18, 8, 46, 29, 32, 40, 62, 76, 36},{20, 69, 36, 41, 72, 30, 23, 88, 34, 62, 99, 69, 82, 67, 59, 85, 74, 04, 36, 16},{20, 73, 35, 29, 78, 31, 90, 01, 74, 31, 49, 71, 48, 86, 81, 16, 23, 57, 05, 54},{01, 70, 54, 71, 83, 51, 54, 69, 16, 92, 33, 48, 61, 43, 52, 01, 89, 19, 67, 48}};
This is not only tedious, but it seems inefficient as well. Is there a way in C to take this data from the plain text grid, besides using a char string? And if not, what would be a more elegant way to take this data?
I am self taught so I apologize for any glaring holes in what I know.
Is there a way in C to take this data from the plain text grid, besides using a char string? And if not, what would be a more elegant way to take this data?
One approach is to save the data as a file (say input.txt), redirect it into your program, and read all of the entries through stdin. It would look like the following:
enum { ROWS = 20, COLS = 20 };
int arr[ ROWS ][ COLS ] = { { 0 } };
int crow = 0;
int ccol = 0;
int num;
// Iterates until scanf fails to read a number (EOF or bad input)
// or the grid is full.
while ( crow < ROWS && scanf( "%d", &num ) == 1 ) {
    // Store num at the current position. %d reads decimal, so
    // "08" and "09" are read as 8 and 9.
    arr[ crow ][ ccol ] = num;
    // Advance to the next column; once the row is full, reset the
    // current column to 0 and increase the current row by 1.
    ccol++;
    if ( ccol >= COLS ) {
        ccol = 0;
        crow++;
    }
}
This would go inside a function in your driver file (possibly main). It reads the numbers one at a time, populates the array, and stops when the input ends (EOF). See the documentation for scanf for further details.
You would then run your program as follows to pipe the input file to your program:
./program.out < input.txt
Remark:
I am not using dynamic allocation here; the array has a fixed size and lives on the stack if it is declared inside a function. If you plan to read an arbitrarily large file, I suggest allocating the array dynamically on the heap (for example with malloc), as the stack is rather small in comparison.
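If the grid is kept as one plain-text string instead of being typed in as an int array, the octal problem goes away as long as the numbers are parsed with an explicit base of 10. The following is a rough sketch under stated assumptions: parse_ints and max_adjacent_product are hypothetical helper names, the grid is stored row-major in a flat int array, and all entries are assumed to be non-negative (as they are in the 00 to 99 grid above).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Parse up to max_count whitespace-separated integers from text.
   Base 10 is passed explicitly, so "08" is read as 8, not as octal.
   Returns how many numbers were stored in out. */
size_t parse_ints(const char *text, int *out, size_t max_count)
{
    assert(text != NULL && out != NULL);
    size_t n = 0;
    while (n < max_count) {
        char *end;
        long v = strtol(text, &end, 10);
        if (end == text)
            break;              /* nothing left to parse */
        out[n++] = (int)v;
        text = end;
    }
    return n;
}

/* Largest product of `run` adjacent cells in a size x size grid
   (row-major), scanning right, down, down-right and down-left.
   Assumes non-negative entries, so 0 is a valid starting maximum. */
long max_adjacent_product(const int *grid, int size, int run)
{
    static const int dr[] = {0, 1, 1, 1};
    static const int dc[] = {1, 0, 1, -1};
    long best = 0;
    for (int r = 0; r < size; r++) {
        for (int c = 0; c < size; c++) {
            for (int d = 0; d < 4; d++) {
                int er = r + dr[d] * (run - 1);
                int ec = c + dc[d] * (run - 1);
                if (er < 0 || er >= size || ec < 0 || ec >= size)
                    continue;   /* the run would leave the grid */
                long p = 1;
                for (int k = 0; k < run; k++)
                    p *= grid[(r + dr[d] * k) * size + (c + dc[d] * k)];
                if (p > best)
                    best = p;
            }
        }
    }
    return best;
}
```

For the 20 x 20 grid, the text can be pasted into a string literal (or read from a file into a buffer), passed to parse_ints with an int[400] buffer, and the result handed to max_adjacent_product(grid, 20, 4). Only four directions are scanned because the opposite directions cover the same cells.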