How do I correctly add a chain ID to my PDB file?

I am trying to conduct some analysis with my single-chain PDB file (766 residues long), but the analysis requires a chain ID, and currently there isn't one.
Here is a snippet of the PDB file:
ATOM 1 N MET 1 -69.269 78.953 -91.441 1.00 0.00 N
ATOM 2 CA MET 1 -69.264 78.650 -92.891 1.00 0.00 C
ATOM 4 C MET 1 -69.371 79.939 -93.633 1.00 0.00 C
ATOM 5 O MET 1 -68.379 80.649 -93.799 1.00 0.00 O
ATOM 3 CB MET 1 -70.475 77.774 -93.251 1.00 0.00 C
ATOM 6 CG MET 1 -70.505 76.455 -92.477 1.00 0.00 C
ATOM 7 SD MET 1 -69.115 75.332 -92.806 1.00 0.00 S
ATOM 8 CE MET 1 -69.473 74.270 -91.377 1.00 0.00 C
ATOM 9 N ASP 2 -70.583 80.284 -94.111 1.00 0.00 N
ATOM 10 CA ASP 2 -70.688 81.539 -94.789 1.00 0.00 C
ATOM 12 C ASP 2 -70.661 82.602 -93.737 1.00 0.00 C
ATOM 13 O ASP 2 -71.088 82.377 -92.606 1.00 0.00 O
ATOM 11 CB ASP 2 -71.963 81.733 -95.626 1.00 0.00 C
ATOM 14 CG ASP 2 -71.691 82.908 -96.557 1.00 0.00 C
ATOM 15 OD1 ASP 2 -70.569 82.953 -97.130 1.00 0.00 O
ATOM 16 OD2 ASP 2 -72.598 83.768 -96.717 1.00 0.00 O1-
ATOM 17 N HIS 3 -70.129 83.791 -94.077 1.00 0.00 N
ATOM 18 CA HIS 3 -70.045 84.846 -93.110 1.00 0.00 C
ATOM 20 C HIS 3 -71.342 85.581 -93.094 1.00 0.00 C
ATOM 21 O HIS 3 -72.113 85.574 -94.052 1.00 0.00 O
ATOM 19 CB HIS 3 -68.925 85.865 -93.404 1.00 0.00 C
ATOM 23 CG HIS 3 -68.749 86.908 -92.336 1.00 0.00 C
ATOM 25 CD2 HIS 3 -67.998 86.879 -91.200 1.00 0.00 C
ATOM 22 ND1 HIS 3 -69.357 88.144 -92.351 1.00 0.00 N
ATOM 26 CE1 HIS 3 -68.947 88.797 -91.234 1.00 0.00 C
ATOM 24 NE2 HIS 3 -68.121 88.068 -90.504 1.00 0.00 N
What's the best way for me to label the chain as chain A?

Here's the answer.
We need to read the file line by line and insert a chain identifier into column 22 of each line that begins with ATOM; in the PDB format, the chain ID field sits in column 22, which is 17 characters after the ATOM record name. Assuming the file is called myfile.pdb, we want to replace the blank at that position with the letter A. This can be accomplished with a relatively simple sed command.
sed 's/^\(ATOM.\{17\}\) /\1A/' myfile.pdb > newfile.pdb
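If you prefer a script over sed, here is a minimal Python sketch of the same edit (it assumes the file is genuinely fixed-width PDB; myfile.pdb and newfile.pdb are just example names):
# Insert chain ID 'A' at column 22 (0-based index 21) of every ATOM record.
with open("myfile.pdb") as src, open("newfile.pdb", "w") as dst:
    for line in src:
        if line.startswith("ATOM") and len(line) > 21:
            line = line[:21] + "A" + line[22:]
        dst.write(line)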
Hope this is helpful!

Related

KNN Algorithm Is Giving Good Accuracy with Bad Confusion Matrix Results

I have multilabel classification data, and I used a KNN model to classify it. There are 15 labels; I got an accuracy result for each label and averaged those results to get the accuracy of the model, which is 93%.
The confusion matrix, however, is showing bad numbers.
Could you tell me what this means? Is it overfitting? How can I solve my problem?
Accuracy and mean absolute error (MAE) code
Input:
# Getting the accuracy of the model
y_pred1 = level_1_knn_model.predict(X_val1)
accuracy = (sum(y_val1==y_pred1)/y_val1.shape[0])*100
accuracy = sum(accuracy)/len(accuracy)
print("Accuracy: "+str(accuracy)+"%\n")
# Getting the mean absolute error
mae1 = mean_absolute_error(y_val1, y_pred1)
print("Mean Absolute Error: "+str(mae1))
Output:
Accuracy: [96.55462575 97.82146336 99.23207908 95.39247451 98.69340807 74.22793801
78.67975909 97.47825108 99.80189098 77.67264969 91.69399776 99.97084683
99.42621267 99.32682688 99.74159693]%
Accuracy: 93.71426804569977%
Mean Absolute Error: 9.703818402273944
Confusion Matrix and classification report code
Input:
# Calculate the confusion matrix
cMatrix1 = confusion_matrix(y_val1.argmax(axis=1), y_pred1.argmax(axis=1))
# Plot the confusion matrix
plt.figure(figsize=(11,10))
sns.heatmap(cMatrix1, annot=True, fmt='g')
# Calculate the classification report
classReport1 = classification_report(y_val1, y_pred1)
print("\nClassification Report:")
print(classReport1)
Output:
Classification Report:
precision recall f1-score support
0 0.08 0.00 0.01 5053
1 0.03 0.00 0.01 3017
2 0.00 0.00 0.00 1159
3 0.07 0.00 0.01 6644
4 0.00 0.00 0.00 1971
5 0.58 0.65 0.61 47222
6 0.39 0.33 0.36 27302
7 0.02 0.00 0.00 3767
8 0.00 0.00 0.00 299
9 0.58 0.61 0.60 40823
10 0.13 0.02 0.03 11354
11 0.00 0.00 0.00 44
12 0.00 0.00 0.00 866
13 0.00 0.00 0.00 1016
14 0.00 0.00 0.00 390
micro avg 0.54 0.43 0.48 150927
macro avg 0.13 0.11 0.11 150927
weighted avg 0.43 0.43 0.42 150927
samples avg 0.43 0.43 0.43 150927
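A note on why these two metrics can disagree: the element-wise accuracy computed above also rewards correctly predicted zeros, so on sparse multilabel data it can be high even when per-class precision and recall are poor. A minimal Python sketch of the effect, using synthetic 0/1 indicator matrices; the 15 labels match the question, but the ~5% positive rate is an assumption for illustration:
import numpy as np

# Synthetic sparse multilabel ground truth: 15 labels, ~5% positives each.
rng = np.random.default_rng(0)
y_true = (rng.random((1000, 15)) < 0.05).astype(int)

# A degenerate model that never predicts any label at all...
y_pred = np.zeros_like(y_true)

# ...still scores about 95% under the element-wise accuracy used above.
per_label_acc = (y_true == y_pred).mean(axis=0) * 100
print(per_label_acc.mean())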

How can I import this formatted txt file into a SQL Server table

I have txt files that are the output of one specific piece of software; the files always have the same format, shown below. The data starts at line 31 (in this example I included only a few lines of the TXT file). I cannot change this format, since it is what the software produces. I need to import this file into a table in SQL Server. How do I do it?
RECORDED_YEAR Col: 1 - 4 Decs: 0 Mult: 1.000000
RECORDED_DAY Col: 5 - 8 Decs: 0 Mult: 1.000000
RECORDED_HOUR Col: 9 - 10 Decs: 0 Mult: 1.000000
RECORDED_MINUTE Col: 11 - 12 Decs: 0 Mult: 1.000000
RECORDED_SECOND Col: 13 - 14 Decs: 0 Mult: 1.000000
SHOTLINE_NUMBER Col: 18 - 21 Decs: 0 Mult: 1.000000
SHOT_POINT_NO Col: 22 - 25 Decs: 0 Mult: 1.000000
RECEIVERLINE_NUMBER Col: 26 - 29 Decs: 0 Mult: 1.000000
FIELD_STATION_NUMBER Col: 30 - 33 Decs: 0 Mult: 1.000000
XREC Col: 35 - 45 Decs: 2 Mult: 1.000000
YREC Col: 46 - 56 Decs: 2 Mult: 1.000000
ELEV_REC Col: 57 - 62 Decs: 2 Mult: 1.000000
XSHOT Col: 63 - 73 Decs: 2 Mult: 1.000000
YSHOT Col: 74 - 84 Decs: 2 Mult: 1.000000
ELEV_SHOT Col: 85 - 90 Decs: 2 Mult: 1.000000
TRCHDR3_TILTERROR Col: 91 - 92 Decs: 0 Mult: 1.000000
TRCHDR3_RESISTERROR Col: 93 - 94 Decs: 0 Mult: 1.000000
TRCHDR5_LEAKAGEERROR Col: 95 - 96 Decs: 0 Mult: 1.000000
FIELD_RECORD_NO Col: 97 - 102 Decs: 0 Mult: 1.000000
EXTHDR_SWATHID Col: 103 - 106 Decs: 0 Mult: 1.000000
DATA_RMSAMPLITUDE Col: 109 - 119 Decs: 8 Mult: 10000.000000
VWUSER_1 Col: 121 - 125 Decs: 0 Mult: 1.000000
CHANNEL_NO Col: 127 - 137 Decs: 0 Mult: 1.000000
VWUSER_7 Col: 140 - 142 Decs: 0 Mult: 1.000000
VWUSER_8 Col: 144 - 146 Decs: 1 Mult: 1.000000
DATA_MAXFREQ Col: 148 - 154 Decs: 3 Mult: 1.000000
DATA_MAXABSAMPLITUDE Col: 156 - 166 Decs: 4 Mult: 10000.000000
VWUSER_22 Col: 168 - 175 Decs: 3 Mult: 1.000000
VWUSER_11 Col: 177 - 181 Decs: 0 Mult: 1.000000
VWUSER_12 Col: 183 - 187 Decs: 0 Mult: 1.000000
18 327113458 5090115210965074 0.00 0.00 0.00 0.00 0.00 0.00 0 0 0 12 1 57.74633959 1 1 0 1.0 13.645 3703.4148 0.008 1 1
18 327113458 5090115210965075 0.00 0.00 0.00 0.00 0.00 0.00 0 0 0 12 1 35.32746807 1 2 0 1.0 18.519 3493.8994 0.008 1 1
18 327113458 5090115210965076 0.00 0.00 0.00 0.00 0.00 0.00 0 0 0 12 1 86.58912033 1 3 0 1.0 22.904 4077.5797 0.008 1 1
18 327113458 5090115210965077 0.00 0.00 0.00 0.00 0.00 0.00 0 0 0 12 1 53.32520232 1 4 0 1.0 23.392 5024.1262 0.008 1 1
18 327113458 5090115210965078 0.00 0.00 0.00 0.00 0.00 0.00 0 0 0 12 1 87.56771684 1 5 0 1.0 22.417 6922.9585 0.008 1 1
It looks like fixed-length data. You can try the following BULK INSERT and then parse the results into a table using SUBSTRING().
CREATE TABLE #tempTable
(
RowVal VarChar(Max)
)
BULK INSERT #tempTable
FROM 'c:\Downloads\Fixedtxt.txt'
WITH
(
FIRSTROW = 31,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
SELECT * FROM #tempTable
Here's some parsing of what appears to be the fixed-width data portion of your text file.
CREATE TABLE #tempTable
(
RowVal VarChar(Max)
)
BULK INSERT #tempTable
FROM 'c:\Downloads\Fixedtxt.txt'
WITH
(
FIRSTROW = 31,
ROWTERMINATOR = '\n'
)
SELECT
SubString(RowVal,1,5) As f01,
SubString(RowVal,6,9) As f02,
SubString(RowVal,18,16) As f03,
SubString(RowVal,35,8) As f04,
SubString(RowVal,44,10) As f05,
SubString(RowVal,55,6) As f06,
SubString(RowVal,62,9) As f07,
SubString(RowVal,72,10) As f08,
SubString(RowVal,83,6) As f09,
SubString(RowVal,90,3) As f10,
SubString(RowVal,94,1) As f11,
SubString(RowVal,96,1) As f12,
SubString(RowVal,98,5) As f13,
SubString(RowVal,104,3) As f14,
SubString(RowVal,108,12) As f15,
SubString(RowVal,121,5) As f16,
SubString(RowVal,127,11) As f17,
SubString(RowVal,139,4) As f18,
SubString(RowVal,144,3) As f19,
SubString(RowVal,148,6) As f20,
SubString(RowVal,155,10) As f21,
SubString(RowVal,166,7) As f22,
SubString(RowVal,174,8) As f23,
SubString(RowVal,183,5) As f24
FROM
#tempTable
Results:
f01 f02 f03 f04 f05 f06 f07 f08 f09 f10 f11 f12 f13 f14 f15 f16 f17 f18 f19 f20 f21 f22 f23 f24
18 327113458 5090115210965074 0.00 0.00 0.00 0.00 0.00 0.00 0 0 0 12 1 57.74633959 1 1 0 1.0 13.645 3703.4148 0.008 1 1
18 327113458 5090115210965075 0.00 0.00 0.00 0.00 0.00 0.00 0 0 0 12 1 35.32746807 1 2 0 1.0 18.519 3493.8994 0.008 1 1
18 327113458 5090115210965076 0.00 0.00 0.00 0.00 0.00 0.00 0 0 0 12 1 86.58912033 1 3 0 1.0 22.904 4077.5797 0.008 1 1
18 327113458 5090115210965077 0.00 0.00 0.00 0.00 0.00 0.00 0 0 0 12 1 53.32520232 1 4 0 1.0 23.392 5024.1262 0.008 1 1
18 327113458 5090115210965078 0.00 0.00 0.00 0.00 0.00 0.00 0 0 0 12 1 87.56771684 1 5 0 1.0 22.417 6922.9585 0.008 1 1
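From here, you would presumably cast each slice to a typed column, guided by the Col/Decs/Mult header block. A minimal sketch for a few of the fields (the column names come from the header; the types, and the direction of the Mult scaling, are assumptions):
SELECT
SubString(RowVal, 1, 4) As RECORDED_YEAR_raw,
CAST(SubString(RowVal, 1, 4) AS INT) As RECORDED_YEAR,
CAST(SubString(RowVal, 5, 4) AS INT) As RECORDED_DAY,
CAST(SubString(RowVal, 35, 11) AS DECIMAL(11, 2)) As XREC,
-- DATA_RMSAMPLITUDE is listed with Mult 10000; if the stored value
-- was scaled up by that factor, divide it back out here.
CAST(SubString(RowVal, 109, 11) AS FLOAT) / 10000.0 As DATA_RMSAMPLITUDE
FROM
#tempTable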

How to split an array into separate arrays (R)? [duplicate]

This question already has answers here:
Split data.frame based on levels of a factor into new data.frames
(3 answers)
Closed 7 years ago.
I have the array:
>cent
b e r f
A19 60.46 0.77 -0.12 1
A15 16.50 0.53 0.08 2
A17 2.66 0.51 0.20 3
A11 36.66 0.40 -0.25 4
A12 38.96 0.91 0.23 1
A05 0.00 0.29 0.01 2
A09 3.40 0.35 0.03 3
A04 0.00 0.25 -0.03 4
Could someone please tell me how to split this array into 4 separate arrays, where the last column "f" is the flag? As a result I would like to see:
>cent1
b e r f
A19 60.46 0.77 -0.12 1
A12 38.96 0.91 0.23 1
>cent2
b e r f
A15 16.50 0.53 0.08 2
A05 0.00 0.29 0.01 2
….
Should I use a for-loop and check the flag "f", or does a built-in function exist? Thanks.
We can use split to create a list of data.frames.
lst <- split(cent, cent$f)
NOTE: Here I assumed that 'cent' is a data.frame. If it is a matrix:
lst <- split(as.data.frame(cent), cent[,"f"])
Usually, the list is enough for most of the analysis. But if we need to create multiple objects in the global environment, we can use list2env (not recommended). Note that list2env takes a named list, so we set the names first:
list2env(setNames(lst, paste0("cent", seq_along(lst))), envir = .GlobalEnv)
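For completeness, a small reproducible sketch of the split, rebuilding the example data by hand:
# Rebuild the example data, then split on the flag column 'f'.
cent <- data.frame(
  b = c(60.46, 16.50, 2.66, 36.66, 38.96, 0.00, 3.40, 0.00),
  e = c(0.77, 0.53, 0.51, 0.40, 0.91, 0.29, 0.35, 0.25),
  r = c(-0.12, 0.08, 0.20, -0.25, 0.23, 0.01, 0.03, -0.03),
  f = c(1, 2, 3, 4, 1, 2, 3, 4),
  row.names = c("A19", "A15", "A17", "A11", "A12", "A05", "A09", "A04")
)
lst <- split(cent, cent$f)
lst[["1"]]  # the rows where f == 1, i.e. the desired 'cent1'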

Profiling C code using gprof

For a seemingly large AES code base, when I profile the code using gprof with the following commands
cc file1.c file2.c -pg
./a.out
gprof a.out gmon.out > analysis.txt
cat analysis.txt
the output file shows the time as 0 for all function calls:
Flat profile:
Each sample counts as 0.01 seconds.
no time accumulated
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
0.00 0.00 0.00 576 0.00 0.00 galois_multiply
0.00 0.00 0.00 40 0.00 0.00 getSBoxValue
0.00 0.00 0.00 33 0.00 0.00 PrintArr
0.00 0.00 0.00 11 0.00 0.00 AddRoundKey
0.00 0.00 0.00 10 0.00 0.00 core
0.00 0.00 0.00 10 0.00 0.00 getRconValue
0.00 0.00 0.00 10 0.00 0.00 myrotate
0.00 0.00 0.00 10 0.00 0.00 shiftRow
0.00 0.00 0.00 10 0.00 0.00 subByte
0.00 0.00 0.00 9 0.00 0.00 mixColumn
0.00 0.00 0.00 1 0.00 0.00 ReadInput
0.00 0.00 0.00 1 0.00 0.00 expandKey
Am I missing something? Kindly advise.
I tried using Eclipse TPTP, but couldn't figure out a way to profile C code using Eclipse; any ideas in that direction would also be appreciated.
Is there any online tool to which I can upload my code and extract a detailed analysis report?
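One likely explanation, given the "no time accumulated" note in the flat profile above: the whole run finishes in far less than gprof's 0.01-second sampling interval, so no samples land in any function. Repeating the workload inside the program accumulates samples; simply re-running the binary does not, because each run overwrites gmon.out. As a sanity check of the gprof setup, a self-contained toy that burns enough CPU will show nonzero times under the same commands:
/* busy.c -- toy workload to sanity-check gprof.
   Build and profile: cc -pg busy.c -o busy && ./busy && gprof busy gmon.out */
#include <stdio.h>

/* Deliberately slow function so the 10 ms sampler has something to catch. */
static double burn(long n) {
    double acc = 0.0;
    for (long i = 1; i <= n; i++)
        acc += (double)i / (double)(i + 1);
    return acc;
}

int main(void) {
    double total = 0.0;
    for (int r = 0; r < 50; r++)
        total += burn(10000000L);
    printf("%f\n", total);
    return 0;
}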

SAS: Using a Loop for Creating Many Data Sets and renaming the variables in them

I have a dataset in long format, e.g.:
time subject var1 var2 var3
1 1 0.41 0.48 0.85
2 1 0.58 0.38 0.15
3 1 0.08 0.39 0.96
4 1 0.58 0.87 0.15
5 1 0.55 0.40 0.67
1 2 0.76 0.49 0.03
2 2 0.36 0.26 0.93
3 2 0.83 0.88 0.63
4 2 0.19 0.65 0.99
5 2 0.89 0.91 0.47
I would like to get a dataset in wide format such as:
time var1_sub1 var2_sub1 var3_sub1 var1_sub2 var2_sub2 var3_sub2
1 0.41 0.48 0.85 0.76 0.49 0.03
2 0.58 0.38 0.15 0.36 0.26 0.93
3 0.08 0.39 0.96 0.83 0.88 0.63
4 0.58 0.87 0.15 0.19 0.65 0.99
5 0.55 0.40 0.67 0.89 0.91 0.47
So far, I came up with an idea to do it in the following way:
data data_sub1;
set data;
if subject=1;
var1_sub1=var1;
var2_sub1=var2;
var3_sub1=var3;
run;
data data_sub2;
set data;
if subject=2;
var1_sub2=var1;
var2_sub2=var2;
var3_sub2=var3;
run;
proc sort data=data_sub1;
by time;
run;
proc sort data=data_sub2;
by time;
run;
data datamerged;
merge data_sub1 data_sub2;
by time;
run;
It works and everything is fine, but I would like to learn how one could code this in a more elegant way, since in practice I have many more subjects and variables.
This is a PROC TRANSPOSE problem. To solve most PROC TRANSPOSE problems, make the data set totally vertical (one value and one variable name per row) and then transpose using the ID statement.
data have;
input time subject var1 var2 var3;
datalines;
1 1 0.41 0.48 0.85
2 1 0.58 0.38 0.15
3 1 0.08 0.39 0.96
4 1 0.58 0.87 0.15
5 1 0.55 0.40 0.67
1 2 0.76 0.49 0.03
2 2 0.36 0.26 0.93
3 2 0.83 0.88 0.63
4 2 0.19 0.65 0.99
5 2 0.89 0.91 0.47
;;;;
run;
data have_vert;
set have;
array vars var:;
do _t = 1 to dim(vars);
id=cats(vname(vars[_t]),'_','sub',subject); *this is our future variable name;
value = vars[_t]; *this is our future variable value;
output;
end;
keep time id value subject;
run;
proc sort data=have_vert;
by time subject id;
run;
proc transpose data=have_vert out=want;
by time;
var value;
id id;
run;
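One small follow-up on the transpose step: PROC TRANSPOSE also writes its automatic _NAME_ column into the output (here it just holds the constant "value"), and it can be dropped with a dataset option:
proc transpose data=have_vert out=want(drop=_name_);
by time;
var value;
id id;
run;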
