Importing a CSV file into Neo4j

When I run this command:
LOAD CSV FROM "file:///artists.csv" AS line
CREATE (:Artist { name: line.Name, year: toInt(line.Year)})
I get this error:
Error: Type mismatch: expected Any, Map, Node or Relationship but was List<String> (line 2, column 25 (offset: 68))
"CREATE (:Artist { name: line.Name, year: toInt(line.Year)})"
My CSV file looks like this:
"1","ABBA","1992"
"2","Roxette","1986"
"3","Europe","1979"
"4","The Cardigans","1992"

Your CSV has no header row, so you can't access a column by its name, i.e. line.Name will not work.
You have to access it by column index instead: here line[0] is the id, line[1] is the name and line[2] is the year.
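Keeping the file header-less, the whole statement could then look like this (a sketch; recent Neo4j versions use toInteger in place of the older toInt):
LOAD CSV FROM "file:///artists.csv" AS line
CREATE (:Artist { name: line[1], year: toInteger(line[2]) })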
Cheers

Please try
LOAD CSV WITH HEADERS FROM "file:///artists.csv" AS line
CREATE (:Artist { name: line.Name, year: toInt(line.Year)})
Make sure you have Name and Year columns in the artists.csv file.
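With headers, the sample data above would then start something like this (the header names themselves are up to you, as long as they match line.Name and line.Year; the first column's name is not used by this query):
"Id","Name","Year"
"1","ABBA","1992"
"2","Roxette","1986"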

Related

Change ID in multiple FASTA files

I need to rename multiple sequences in multiple FASTA files, and I found this script that does it for a single ID:
from Bio import SeqIO

original_file = "./original.fasta"
corrected_file = "./corrected.fasta"

with open(original_file) as original, open(corrected_file, 'w') as corrected:
    records = SeqIO.parse(original_file, 'fasta')
    for record in records:
        print(record.id)
        if record.id == 'foo':
            record.id = 'bar'
            record.description = 'bar'  # <- Add this line
        print(record.id)
        SeqIO.write(record, corrected, 'fasta')
Each FASTA file corresponds to a single organism, but the organism is not specified in the IDs. I have the original FASTA files (the ones these were translated from) under the same filenames but in a different directory, and their IDs do include the name of each organism.
I want to figure out how to loop through all these FASTA files and rename each ID in each file with the corresponding organism name.
OK, here is my effort. I had to use my own input folders/files, since they were not specified in the question.
The /old folder contains these files:
MW628877.1.fasta :
>MW628877.1 Streptococcus agalactiae strain RYG82 DNA gyrase subunit A (gyrA) gene, complete cds
ATGCAAGATAAAAATTTAGTAGATGTTAATCTAACTAGTGAAATGAAAACGAGTTTTATCGATTACGCCA
TGAGTGTCATTGTTGCTCGTGCACTTCCAGATGTTAGAGATGGTTTAAAACCTGTTCATCGTCGTATTTT
>KY347969.1 Neisseria gonorrhoeae strain 1448 DNA gyrase subunit A (gyrA) gene, partial cds
CGGCGCGTACCGTACGCGATGCACGAGCTGAAAAATAACTGGAATGCCGCCTACAAAAAATCGGCGCGCA
TCGTCGGCGACGTCATCGGTAAATACCACCCCCACGGCGATTTCGCAGTTTACGGCACCATCGTCCGTAT
MG995190.1.fasta :
>MG995190.1 Mycobacterium tuberculosis strain UKR100 GyrA (gyrA) gene, complete cds
ATGACAGACACGACGTTGCCGCCTGACGACTCGCTCGACCGGATCGAACCGGTTGACATCCAGCAGGAGA
TGCAGCGCAGCTACATCGACTATGCGATGAGCGTGATCGTCGGCCGCGCGCTGCCGGAGGTGCGCGACGG
and an /empty folder.
The /new folder contains these files:
MW628877.1.fasta :
>MW628877.1
MQDKNLVDVNLTSEMKTSFIDYAMSVIVARALPDVRDGLKPVHRRI
>KY347969.1
RRVPYAMHELKNNWNAAYKKSARIVGDVIGKYHPHGDFAVYGTIVR
MG995190.1.fasta :
>MG995190.1
MTDTTLPPDDSLDRIEPVDIQQEMQRSYIDYAMSVIVGRALPEVRD
My code is:
from Bio import SeqIO
from os import scandir

old = './old'
new = './new'

# map each old record id to "Genus species" taken from its description in /old
old_ids_dict = {}
for filename in scandir(old):
    if filename.is_file():
        print(filename)
        for seq_record in SeqIO.parse(filename, "fasta"):
            old_ids_dict[seq_record.id] = ' '.join(seq_record.description.split(' ')[1:3])

print('_____________________')
print('old ids ---> ', old_ids_dict)
print('_____________________')

# rewrite each file in /new, appending the organism name to matching ids
for filename in scandir(new):
    if filename.is_file():
        sequences = []
        for seq_record in SeqIO.parse(filename, "fasta"):
            if seq_record.id in old_ids_dict.keys():
                print('### ', seq_record.id, ' ', old_ids_dict[seq_record.id])
                seq_record.id += '.' + old_ids_dict[seq_record.id]
                seq_record.description = ''
                print('-->', seq_record.id)
                print(seq_record)
            sequences.append(seq_record)
        SeqIO.write(sequences, filename, 'fasta')
Check how it works; it actually overwrites both files in the /new folder.
As pointed out by @Vovin in his comment, it needs to be adapted to your own from/to file naming template.
I am sure there is more than one way to do this, probably better and more Pythonic than my way; I am learning too. Let us know.
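If you would rather keep the files in /new untouched, a minimal variation of the same idea can write the renamed copies to a separate folder instead (a sketch; the ./renamed output folder is an invented name, everything else mirrors the code above):

from pathlib import Path
from Bio import SeqIO

old = Path('./old')
new = Path('./new')
renamed = Path('./renamed')   # hypothetical output folder, created if missing
renamed.mkdir(exist_ok=True)

# same id -> organism mapping as above, built from the descriptions in /old
old_ids = {rec.id: ' '.join(rec.description.split(' ')[1:3])
           for path in old.iterdir() if path.is_file()
           for rec in SeqIO.parse(path, "fasta")}

for fasta_path in (p for p in new.iterdir() if p.is_file()):
    records = []
    for rec in SeqIO.parse(fasta_path, "fasta"):
        if rec.id in old_ids:
            rec.id += '.' + old_ids[rec.id]
            rec.description = ''
        records.append(rec)
    # write to ./renamed/<same filename> instead of overwriting the input
    SeqIO.write(records, renamed / fasta_path.name, "fasta")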

Parsing pipe-delimited JSON data in Python

I am trying to parse an API response, which is JSON. The JSON looks like this:
{
'id': 112,
'name': 'stalin-PC',
'type': 'IP4Address',
'properties': 'address=10.0.1.110|ipLong=277893412|state=DHCP Allocated|macAddress=41-1z-y4-23-dd-98|'
}
Its length is 1200; if I convert it, I should get 1200 rows. My goal is to parse this JSON like below:
id name type address iplong state macAddress
112 stalin-PC IP4Address 10.0.1.110 277893412 DHCP Allocated 41-1z-y4-23-dd-98
I am getting the first 3 elements, but I'm having an issue with the "properties" key, whose value is pipe-delimited. I have tried the code below:
for network in networks:  # here networks = response.json()
    network_id = network['id']
    network_name = network['name']
    network_type = network['type']
    print(network_id, network_name, network_type)
It works fine and gives me this result:
112 stalin-PC IP4Address
But when I try to parse the properties key with the code below, it's not working.
for network in networks:
    network_id = network['id']
    network_name = network['name']
    network_type = network['type']
    for line in network['properties']:
        properties_value = line.split('|')
        network_address = properties_value[0]
    print(network_id, network_name, network_type, network_address)
How can I parse the pipe-delimited properties key? Would anyone help me, please?
Thank you
Using str methods
Ex:
network = {
    'id': 112,
    'name': 'stalin-PC',
    'type': 'IP4Address',
    'properties': 'address=10.0.1.110|ipLong=277893412|state=DHCP Allocated|macAddress=41-1z-y4-23-dd-98'
}

for n in network['properties'].split("|"):
    key, value = n.split("=")
    print(key, "-->", value)
Output:
address --> 10.0.1.110
ipLong --> 277893412
state --> DHCP Allocated
macAddress --> 41-1z-y4-23-dd-98
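From there, getting to the flat rows in the question is just a matter of merging the parsed pairs with the other keys. A minimal sketch (assuming networks = response.json() as in the question's loop, and skipping the empty piece produced by a trailing |):

rows = []
for network in networks:          # networks = response.json(), as in the question
    row = {
        'id': network['id'],
        'name': network['name'],
        'type': network['type'],
    }
    for pair in network['properties'].split('|'):
        if pair:                  # ignore the empty chunk left by a trailing '|'
            key, value = pair.split('=', 1)
            row[key] = value
    rows.append(row)

print(rows[0]['id'], rows[0]['name'], rows[0]['type'],
      rows[0]['address'], rows[0]['ipLong'], rows[0]['state'], rows[0]['macAddress'])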

Troubles with BlazingText jsonlines Batch Transform

I have a jsonlines file that looks like this:
{"id":123,"source":"this is a text string"}
{"id":456,"source":"this is another text string"}
{"id":789,"source":"yet another string"}
When I run a BlazingText Batch Transform job on a file that just contains the source, it works. When trying to join the inputs and outputs, I get Customer Error: Unable to decode payload: Incorrect data format. (caused by AttributeError).
Any suggestions?
Code:
bt_transformer = bt_model.transformer(
    instance_count = 1,
    instance_type = "ml.m4.xlarge",
    assemble_with = "Line",
    output_path = s3_batch_out_data,
    accept = "application/jsonlines"
)
bt_transformer.transform(
    s3_batch_in_data,
    content_type = "application/jsonlines",
    split_type = "Line",
    input_filter = "$.source",
    join_source = "Input",
    output_filter = "$['id', 'SageMakerOutput']"
)
bt_transformer.wait()
When applying "$.source" to {"id":123,"source":"this is a text string"}, the output is "this is a text string" rather than {"source":"this is a text string"}, which is probably why you got a format error. I'm also wondering why you need such filtering on a JSON input at all - doesn't the algorithm ignore unrecognized JSON fields automatically?

invalid sign in external "numeric" value error from Npgsql Binary copy

I get this error when using Npgsql binary copy (see my implementation at the bottom):
invalid sign in external "numeric" value
The RetailCost column on which it fails has the following PostgreSQL properties:
type: numeric(19,2)
nullable: not null
storage: main
default: 0.00
The PostgreSQL log looks like this:
2019-07-15 13:24:05.131 ACST [17856] ERROR: invalid sign in external "numeric" value
2019-07-15 13:24:05.131 ACST [17856] CONTEXT: COPY products_temp, line 1, column RetailCost
2019-07-15 13:24:05.131 ACST [17856] STATEMENT: copy products_temp (...) from stdin (format binary)
I don't think it should matter, but there are only zero or positive RetailCost values (no negative or null ones)
My implementation looks like this:
using (var importer = conn.BeginBinaryImport($"copy {tempTableName} ({dataColumns}) from stdin (format binary)"))
{
    foreach (var product in products)
    {
        importer.StartRow();
        importer.Write(product.ManufacturerNumber, NpgsqlDbType.Text);
        if (product.LastCostDateTime == null)
            importer.WriteNull();
        else
            importer.Write((DateTime)product.LastCostDateTime, NpgsqlDbType.Timestamp);
        importer.Write(product.LastCost, NpgsqlDbType.Numeric);
        importer.Write(product.AverageCost, NpgsqlDbType.Numeric);
        importer.Write(product.RetailCost, NpgsqlDbType.Numeric);
        if (product.TaxPercent == null)
            importer.WriteNull();
        else
            importer.Write((decimal)product.TaxPercent, NpgsqlDbType.Numeric);
        importer.Write(product.Active, NpgsqlDbType.Boolean);
        importer.Write(product.NumberInStock, NpgsqlDbType.Bigint);
    }
    importer.Complete();
}
Any suggestions would be welcome
I caused this problem when using MapBigInt, not realizing the column was accidentally numeric(18). Changing the column to bigint fixed my problem.
The cause of the problem was that the importer.Write statements were not in the same order as the dataColumns.

Query or Expression for excluding certain values from DAL selection

I'm trying to exclude posts which have a tag named meta from my selection, by:
meta_id = db(db.tags.name == "meta").select().first().id
not_meta = ~db.posts.tags.contains(meta_id)
posts=db(db.posts).select(not_meta)
But those posts still show up in my selection.
What is the right way to write that expression?
My tables look like:
db.define_table('tags',
    db.Field('name', 'string'),
    db.Field('desc', 'text', default="")
)
db.define_table('posts',
    db.Field('title', 'string'),
    db.Field('message', 'text'),
    db.Field('tags', 'list:reference tags'),
    db.Field('time', 'datetime', default=datetime.utcnow())
)
I'm using Web2Py 1.99.7 on GAE with High Replication DataStore on Python 2.7.2
UPDATE:
I just tried posts=db(not_meta).select() as suggested by @Anthony, but it gives me a Ticket with the following Traceback:
Traceback (most recent call last):
  File "E:\Programming\Python\web2py\gluon\restricted.py", line 205, in restricted
    exec ccode in environment
  File "E:/Programming/Python/web2py/applications/vote_up/controllers/default.py", line 391, in <module>
  File "E:\Programming\Python\web2py\gluon\globals.py", line 173, in <lambda>
    self._caller = lambda f: f()
  File "E:/Programming/Python/web2py/applications/vote_up/controllers/default.py", line 8, in index
    posts=db(not_meta).select()#orderby=settings.sel.posts, limitby=(0, settings.delta)
  File "E:\Programming\Python\web2py\gluon\dal.py", line 7578, in select
    return adapter.select(self.query,fields,attributes)
  File "E:\Programming\Python\web2py\gluon\dal.py", line 3752, in select
    (items, tablename, fields) = self.select_raw(query,fields,attributes)
  File "E:\Programming\Python\web2py\gluon\dal.py", line 3709, in select_raw
    filters = self.expand(query)
  File "E:\Programming\Python\web2py\gluon\dal.py", line 3589, in expand
    return expression.op(expression.first)
  File "E:\Programming\Python\web2py\gluon\dal.py", line 3678, in NOT
    raise SyntaxError, "Not suported %s" % first.op.__name__
SyntaxError: Not suported CONTAINS
UPDATE 2:
As ~ isn't currently working on GAE with Datastore, I'm using the following as a temporary work-around:
meta = db.posts.tags.contains(settings.meta_id)
all = db(db.posts).select()  #, limitby=(0, settings.delta)
meta = db(meta).select()
posts = []
i = 0
for post in all:
    if i == settings.delta: break
    if post in meta: continue
    else:
        posts.append(post)
        i += 1
# settings.delta is a long integer to be used with limitby
Try:
meta_id = db(db.tags.name == "meta").select().first().id
not_meta = ~db.posts.tags.contains(meta_id)
posts = db(not_meta).select()
First, your initial query returns a complete Row object, so you need to pull out just the "id" field. Second, not_meta is a Query object, so it goes inside db(not_meta) to create a Set object defining the set of records to select (the select() method takes a list of fields to return for each record, as well as a few other arguments, such as orderby, groupby, etc.).
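On back-ends where negating contains() is supported (per the updates above, GAE's Datastore currently is not one of them), the same Set can also take the usual select() arguments, so the limit from UPDATE 2 can move into the query itself. A sketch, reusing settings.delta from the question:

meta_id = db(db.tags.name == "meta").select().first().id
not_meta = ~db.posts.tags.contains(meta_id)
# newest posts first, at most settings.delta rows
posts = db(not_meta).select(orderby=~db.posts.time, limitby=(0, settings.delta))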
