Troubles with BlazingText jsonlines Batch Transform - amazon-sagemaker

I have a jsonlines file that looks like this:
{"id":123,"source":"this is a text string"}
{"id":456,"source":"this is another text string"}
{"id":789,"source":"yet another string"}
When I run a BlazingText Batch Transform job on a file that contains only the source field, it works. But when I try to join the inputs and outputs, I get: Customer Error: Unable to decode payload: Incorrect data format. (caused by AttributeError).
Any suggestions?
Code:
bt_transformer = bt_model.transformer(
    instance_count=1,
    instance_type="ml.m4.xlarge",
    assemble_with="Line",
    output_path=s3_batch_out_data,
    accept="application/jsonlines",
)
bt_transformer.transform(
    s3_batch_in_data,
    content_type="application/jsonlines",
    split_type="Line",
    input_filter="$.source",
    join_source="Input",
    output_filter="$['id', 'SageMakerOutput']",
)
bt_transformer.wait()

When applying "$.source" to {"id":123,"source":"this is a text string"}, the output is "this is a text string" rather than {"source":"this is a text string"}, which is probably why you got a format error. I'm also wondering why you need such filtering on a JSON input at all - doesn't the algorithm ignore unrecognized JSON fields automatically?
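To make the difference concrete, here is a small sketch - plain Python standing in for SageMaker's JSONPath filtering, using the record from the question - of the two payload shapes:

```python
import json

record = {"id": 123, "source": "this is a text string"}

# "$.source" extracts the bare string value...
value_only = record["source"]
# ...whereas each payload line is expected to remain a JSON object.
object_form = {"source": record["source"]}

print(json.dumps(value_only))   # "this is a text string"
print(json.dumps(object_form))  # {"source": "this is a text string"}
```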

Related

Parsing a pipe delimited json data in python

I am trying to parse an API response which is JSON. The JSON looks like this:
{
    'id': 112,
    'name': 'stalin-PC',
    'type': 'IP4Address',
    'properties': 'address=10.0.1.110|ipLong=277893412|state=DHCP Allocated|macAddress=41-1z-y4-23-dd-98|'
}
Its length is 1200, so after converting I should get 1200 rows. My goal is to parse this JSON like below:
id name type address iplong state macAddress
112 stalin-PC IP4Address 10.0.1.110 277893412 DHCP Allocated 41-1z-y4-23-dd-98
I am getting the first 3 elements, but I'm having an issue with the "properties" key, whose value is pipe-delimited. I have tried the code below:
for network in networks:  # here networks = response.json()
    network_id = network['id']
    network_name = network['name']
    network_type = network['type']
    print(network_id, network_name, network_type)
It works fine and gives me this result:
112 stalin-PC IP4Address
But when I tried to parse the properties key with the code below, it's not working.
for network in networks:
    network_id = network['id']
    network_name = network['name']
    network_type = network['type']
    for line in network['properties']:
        properties_value = line.split('|')
        network_address = properties_value[0]
        print(network_id, network_name, network_type, network_address)
How can I parse the pipe-delimited properties key? Any help would be appreciated.
Thank you
Using str methods
Ex:
network = {
    'id': 112,
    'name': 'stalin-PC',
    'type': 'IP4Address',
    'properties': 'address=10.0.1.110|ipLong=277893412|state=DHCP Allocated|macAddress=41-1z-y4-23-dd-98'
}
for n in network['properties'].split("|"):
    key, value = n.split("=")
    print(key, "-->", value)
Output:
address --> 10.0.1.110
ipLong --> 277893412
state --> DHCP Allocated
macAddress --> 41-1z-y4-23-dd-98
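Building on that, here is a sketch (the flatten helper is mine, not from the answer) that skips the empty piece left by the trailing '|' in the question's data and collects everything into one flat row per record - the shape the question's table asks for:

```python
def flatten(network):
    """Merge the fixed fields with the key=value pairs parsed from 'properties'."""
    row = {'id': network['id'], 'name': network['name'], 'type': network['type']}
    for pair in network['properties'].split('|'):
        if pair:  # skip the empty piece left by a trailing '|'
            key, value = pair.split('=', 1)
            row[key] = value
    return row

network = {
    'id': 112,
    'name': 'stalin-PC',
    'type': 'IP4Address',
    'properties': 'address=10.0.1.110|ipLong=277893412|state=DHCP Allocated|macAddress=41-1z-y4-23-dd-98|'
}
print(flatten(network))
```

Applying flatten to every element of response.json() gives the 1200 rows directly.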

Saving json array output to txt file and getting errors when attempting to parse

I am unable to parse a JSON array from a text file due to errors and my limited knowledge of JSON.
The file looks something like this:
[{"random":"fdjsf","random56":128,"name":"dsfjsd", "rid":1243,"rand":674,"name":"dsfjsd","random43":722, "rid":126},{"random":"fdfgfgjsf","random506":120,"name":"dsfjcvcsd", "rid":12403,"rando":670,"name":"dsfooojsd","random4003":720, "rid":120}]
It has more than one object ({}) in the array - I did not want to include all 600, but the layout shown above is basically how all of them look.
r = s.get(getAPI, headers=header, verify=False)
f = open('text.txt', 'w+')
f.write(r.text)
f.close
output_file = open('text.txt', 'r')
json_array = json.load(output_file)
json_list = []
for item in json_array:
    name = "name"
    rid = "rid"
    json_items = {name: None, rid: None}
    json_items = [name] = item[name]
    json_items = [rid] = item[rid]
    json_list.append(json_items)
print(json_list)
I would like to loop through an array and find any time it says "name":... eventually followed by "rid":... and store those in a dictionary as key value pairs.
Errors:
ValueError: too many values to unpack (expected 1)
There is an error in how you assign values to json_items (the extra = makes it a chained assignment, which tries to unpack the string); change it to:
json_items[name] = item[name]
json_items[rid] = item[rid]
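Putting the fix back into the full loop - a sketch with an inline two-record sample standing in for the 600-object text.txt:

```python
import json

# Two records standing in for the array read from text.txt.
json_array = json.loads('[{"name": "dsfjsd", "rid": 1243}, {"name": "dsfooojsd", "rid": 120}]')
json_list = []
for item in json_array:
    json_items = {"name": None, "rid": None}
    json_items["name"] = item["name"]
    json_items["rid"] = item["rid"]
    json_list.append(json_items)
print(json_list)
```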

Saving Data to a Txt File Doesn't Result in Columns

I am trying to save data from two arrays to a text file, with each array as a column. After running my code, the arrays are written one after another instead.
I have tried opening a file and writing the data to it, but this results in arrays with []s inside my text file. I want each array to be a separate column.
listkH2 = []
with open("Basecol_HCL_H2_para_space.txt", 'r') as data:
    for line in data:
        listkH2.append(map(float, line.strip().split('\t')[4:]))
listkHe = []
with open("Lamda_HeHCL.txt", "r") as data:
    for line in data:
        listkHe.append(map(float, line.strip().split('\t')[3:]))
listkH2 = np.array(listkH2)
listkHe = np.array(listkHe)
listkHe = zip(*listkHe)
listkH2 = zip(*listkH2)
g = len(listkH2[4])
h = len(listkH2)
base = 10
exponentkH2 = np.log(listkH2) / np.log(base)
exponentkHe = np.log(listkHe) / np.log(base)
exponentR = np.log(R) / np.log(base)
exponentkHe.resize(exponentkH2.shape)
sub_arr = np.subtract(exponentkH2, exponentkHe)
C = sub_arr / exponentR
# m, b = np.polyfit(Delta, C, 1)
with open('final_all_temperatures.txt', 'w') as f:
    for i in range(0, h):
        f.write(str(C[i]) + '\t')
This is the result of my text file.
[ 2.05096059 3.65564871 0.25845727 2.86561982 1.45278606 1.43600254
 2.56773896 2.45761421 1.20582346 0.60828049 3.26240446 4.27576383
 ...] [10.80376405 10.93396947 8.78331386 8.66009656 7.53236037 7.99974278
 ...] [ 4.9855747 5.03496604 3.42614786 ...]
(output truncated - eight arrays in total, each printed one after the other rather than as side-by-side columns)
The expected result is that each array ends up in its own column. Instead, the arrays are just written one underneath the other.
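One way to get columns (a sketch, not from the thread - it assumes the arrays have equal length, and uses made-up three-element data in place of C's rows) is to stack the arrays side by side with np.column_stack and let np.savetxt do the row-by-row formatting:

```python
import numpy as np

# Made-up stand-ins for the arrays; equal length is required for column stacking.
a = np.array([2.05096059, 3.65564871, 0.25845727])
b = np.array([10.80376405, 10.93396947, 8.78331386])

# Each input array becomes one column; '\t' separates the columns.
np.savetxt('final_all_temperatures.txt', np.column_stack([a, b]),
           fmt='%.8f', delimiter='\t')
```

This writes one row per index, with no brackets, and each array in its own column.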

ValueError: could not convert string to float: E-3 [duplicate]

This question already has answers here:
Issue querying from Access database: "could not convert string to float: E+6"
(3 answers)
Closed 7 years ago.
I am trying to access some of the columns in a Microsoft Access database table which contain numbers of type double, but I am getting the error mentioned in the title. The code used for querying the database is below; the error occurs on the line where the cur.execute(...) command is executed. Basically I am trying to filter out data captured in a particular time interval. If I exclude the columns CM3_Up, CG3_Up, CM3_Down, CG3_Down, which contain the double data type, from the cur.execute(...) command I don't get the error. The same logic was used to access double data from other tables and it worked fine, so I am not sure what is going wrong.
Code:
start = datetime.datetime(2015, 03, 28, 00, 00)
a = start
b = start + datetime.timedelta(0, 240)
r = 7
while a < (start + datetime.timedelta(1)):
    params = (a, b)
    sql = ("SELECT Date_Time, CM3_Up, CG3_Up, CM3_Down, CG3_Down "
           "FROM Lysimeter_Facility_Data_5 WHERE Date_Time >= ? AND Date_Time <= ?")
    for row in cur.execute(sql, params):
        if row is None:
            continue
        r = r + 1
        ws.cell(row=r, column=12).value = row.get('CM3_Up')
        ws.cell(row=r, column=13).value = row.get('CG3_Up')
        ws.cell(row=r, column=14).value = row.get('CM3_Down')
        ws.cell(row=r, column=15).value = row.get('CG3_Down')
    a = a + five_min
    b = b + five_min
wb.save('..\SE_SW_Lysimeters_Weather_Mass_Experiment-02_03_26_2015.xlsx')
Complete error report:
Traceback (most recent call last):
File "C:\DB_PY\access_mdb\db_to_xl.py", line 318, in <module>
for row in cur.execute(sql,params):
File "build\bdist.win32\egg\pypyodbc.py", line 1920, in next
row = self.fetchone()
File "build\bdist.win32\egg\pypyodbc.py", line 1871, in fetchone
value_list.append(buf_cvt_func(alloc_buffer.value))
ValueError: could not convert string to float: E-3
As in this discussion:
Python: trouble reading number format
the trouble could be that the E should be a D, like:
float(row.get('CM3_Up').replace('E', 'D'))
That sounds weird to me, though - but I know only a little Python.
It sounds like you are receiving strings like '2.34E-3', so try a conversion. I don't know Python, but in C# it would be something like:
ws.cell(row = r,column=12).value = Convert.ToDouble(row.get('CM3_Up'))
ws.cell(row = r,column=13).value = Convert.ToDouble(row.get('CG3_Up'))
ws.cell(row = r,column=14).value = Convert.ToDouble(row.get('CM3_Down'))
ws.cell(row = r,column=15).value = Convert.ToDouble(row.get('CG3_Down'))
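In Python the equivalent conversion is just float(), which already understands scientific notation. A defensive sketch (the helper name to_float is mine, not from the thread) that also copes with malformed fragments like the 'E-3' in the error message:

```python
def to_float(value):
    """Convert strings such as '2.34E-3' to float; return None for malformed input."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None

print(to_float('2.34E-3'))  # 0.00234
print(to_float('E-3'))      # None - the fragment from the error message
```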

Find string in log files and return extra characters

How can I get Python to loop through a directory and find a specific string in each file located within that directory, then output a summary of what it found?
I want to search the long files for the following string:
FIRMWARE_VERSION = "2.15"
Only, the firmware version can be different in each file. So I want the log file to report back with whatever version it finds.
import glob
import os

print("The following list contains the firmware version of each server.\n")
os.chdir("LOGS\\")
for file in glob.glob('*.log'):
    with open(file) as f:
        contents = f.read()
    if 'FIRMWARE_VERSION = "' in contents:
        print(file + " = ???")
I was thinking I could use something like the following to return the extra characters but it's not working.
file[:+5]
I want the output to look something like this:
server1.web.com = FIRMWARE_VERSION = "2.16"
server2.web.com = FIRMWARE_VERSION = "3.01"
server3.web.com = FIRMWARE_VERSION = "1.26"
server4.web.com = FIRMWARE_VERSION = "4.1"
server5.web.com = FIRMWARE_VERSION = "3.50"
Any suggestions on how I can do this?
You can use a regex to grab the text:
import glob
import re

for file in glob.glob('*.log'):
    with open(file) as f:
        contents = f.read()
    if 'FIRMWARE_VERSION = "' in contents:
        print(file + ' = ' + re.search(r'FIRMWARE_VERSION = "([\d.]+)"', contents).group(1))
Here re.search does the job, scanning the file content with the following pattern:
r'FIRMWARE_VERSION = "([\d.]+)"'
which captures a dotted number between two double quotes. Alternatively, you can use the following, which matches anything right after FIRMWARE_VERSION = between two double quotes:
r'FIRMWARE_VERSION = (".*")'
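A quick self-contained check of the pattern on a sample log line (the line contents are made up):

```python
import re

line = 'server1 boot log ... FIRMWARE_VERSION = "2.16" ...'
match = re.search(r'FIRMWARE_VERSION = "([\d.]+)"', line)
print(match.group(1))  # 2.16
```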
