I have one spider that crawls one website, but I want to store the results in two different tables in my PostgreSQL DB:
1 in "races"
2 in "participants"
If I fill in just one table, it works fine, but how do I get the Scrapy pipeline to fill in both tables in one go?
I tried to make two classes in my pipelines.py, but that did not work out. I guess I am just missing something here.
Here is my code:
import logging
import psycopg2
from scrapy.loader import ItemLoader
class RacesPipeline(object):

    def open_spider(self, spider):
        hostname = 'localhost'
        username = 'postgres'
        password = '****!'
        database = 'horseracing'
        port = "***"
        self.connection = psycopg2.connect(host=hostname, user=username, password=password,
                                           dbname=database, port=port)
        self.cur = self.connection.cursor()

    def close_spider(self, spider):
        self.cur.close()
        self.connection.close()

    def process_item(self, item, spider):
        self.cur.execute("insert into races(track, date, racename, racetype, distancefinal, minalter, maxalter, raceclass, classrating, going, finalhurdle, anzahlstarter, winningtimecombined, pricemoney1, pricemoney2, pricemoney3, pricemoney4, pricemoney5, pricemoney6, pricemoney7, pricemoney8) values(%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)",
                         (
                             item['track'][0],
                             item['date'],
                             item['racename'][0],
                             item['racetype'],
                             item['distancefinal'],
                             item['minalter'],
                             item['maxalter'],
                             item['raceclass'],
                             item['classrating'],
                             item['going'][0],
                             item['finalhurlde'],
                             item['anzahlstarter'],
                             item['winningtimecombined'],
                             item['pricemoney1'],
                             item['pricemoney2'],
                             item['pricemoney3'],
                             item['pricemoney4'],
                             item['pricemoney5'],
                             item['pricemoney6'],
                             item['pricemoney7'],
                             item['pricemoney8']
                         ))
        self.connection.commit()
        return item
class HorsesPipeline(object):

    def open_spider(self, spider):
        hostname = 'localhost'
        username = 'postgres'
        password = '********'
        database = 'horseracing'
        port = "****"
        self.connection = psycopg2.connect(host=hostname, user=username, password=password, dbname=database, port=port)
        self.cur = self.connection.cursor()

    def close_spider(self, spider):
        self.cur.close()
        self.connection.close()

    def process_item(self, item, spider):
        self.cur.execute("insert into participants(pos, draw, dwinner, dnext, startnumber, pferde, horsecountry, odd, jockey, trainer, weightkg, alter, headgear, officalrating, rp, ts, rprc) values(%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)",
                         (
                             item['pos'][0],
                             item['draw'],
                             item['dwinner'],
                             item['dnext'],
                             item['startnumber'],
                             item['pferde'],
                             item['horsecountry'],
                             item['odd'],
                             item['jockey'],
                             item['trainer'],
                             item['weightkg'],
                             item['alter'],
                             item['headgear'],
                             item['officalrating'],
                             item['rp'],
                             item['ts'],
                             item['rprc']
                         ))
        self.connection.commit()
        return item
And the pipeline settings:
ITEM_PIPELINES = {
    'results.pipelines.RacesPipeline': 100,
    'results.pipelines.HorsesPipeline': 200,
}
If I run the code, I get the error
line 33, in process_item
item['track'][0],
KeyError: 'track'
Both pipelines run just fine when I test them individually instead of stringing the two table inserts together. Also, the first table is filled in just fine, even though the error above suggests otherwise.
I know I am just missing something to combine them, but I can't figure it out.
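You appear to have two different item types. Every item passes through every enabled pipeline, so RacesPipeline also receives the participant items, which have no 'track' key; hence the KeyError. Check the item's type in process_item and run the matching insert:
from your_spider.items import RaceItem, ParticipantItem  # use your actual names here
if isinstance(item, RaceItem):
    ...  # insert into races
elif isinstance(item, ParticipantItem):
    ...  # insert into participants
A single process_item can fill both tables; there is no need for a second pipeline class.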
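A minimal sketch of that single-pipeline idea follows. It rests on several assumptions: `RaceItem` and `ParticipantItem` are plain dict stand-ins for the real `scrapy.Item` subclasses, the column lists are shortened, and `connection_factory` is a hypothetical hook (not part of Scrapy) that lets the example run without a database. In the real project, `open_spider` would call `psycopg2.connect(...)` directly, as in the question.

```python
class RaceItem(dict):
    """Stand-in for the scrapy.Item subclass that holds one race."""


class ParticipantItem(dict):
    """Stand-in for the scrapy.Item subclass that holds one participant."""


# Shortened column lists; the real statements would name every column.
RACE_SQL = "insert into races(track, date, racename) values(%s, %s, %s)"
PARTICIPANT_SQL = "insert into participants(pos, pferde, jockey) values(%s, %s, %s)"


class ResultsPipeline:
    def __init__(self, connection_factory):
        # Hypothetical hook: a callable that returns a DB-API connection.
        self.connection_factory = connection_factory

    def open_spider(self, spider):
        self.connection = self.connection_factory()
        self.cur = self.connection.cursor()

    def close_spider(self, spider):
        self.cur.close()
        self.connection.close()

    def process_item(self, item, spider):
        # Pick the insert statement that matches the item's type.
        if isinstance(item, RaceItem):
            self.cur.execute(
                RACE_SQL, (item["track"], item["date"], item["racename"]))
        elif isinstance(item, ParticipantItem):
            self.cur.execute(
                PARTICIPANT_SQL, (item["pos"], item["pferde"], item["jockey"]))
        else:
            # Unknown item type: pass it on untouched.
            return item
        self.connection.commit()
        return item
```

With this layout, ITEM_PIPELINES would list only the one pipeline class, so each item is inserted exactly once into the table that matches its type.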
I have a problem with \n when I try to write a date string and numeric values to a text file:
pattern = [ ...
'Date %s - First %d \n', ...
'Date %s - Second %d \n' ...
'%d, \n', ...
'*ENDDO\n\n'];
t = datetime('now');
[fid, msg] = fopen('date_and_values.txt', 'wt');
assert(fid ~= -1, 'Cannot open file %s: %s', 'date_and_values.txt', msg);
formatOut='dd.mm.yy';
dateString = datestr(t);
disp(dateString);
formatNumb = '\t%d';
res = [dateString num2str(1,formatNumb) num2str(2,formatNumb)];
for k = 1:17
    fprintf(fid, pattern, res);
    % % Perhaps this is faster:
    % % fwrite(fid, strrep(pattern, '%d', sprintf('%d', k)), 'char');
end
fclose(fid);
I want the data to look like this:
But instead, the data in the file looks like this:
What am I doing wrong?
Change pattern to
pattern = ['Date %1$s - First %2$d \n', ...
'Date %1$s - Second %3$d \n\n'];
and use
fprintf(fid, pattern, dateString, num2str(1,formatNumb), num2str(2,formatNumb));
instead, you will get the desired output.
Note the use of identifiers in the pattern above (search for "identifiers" in the fprintf documentation). Without identifiers, each formatting operator consumes the next input argument of fprintf(). With identifiers, each distinct identifier in the pattern corresponds to one input argument of fprintf().
(The pattern in OP also contains some superfluous trailing bits that are not found in the example output.)
I don't know if I understand what you are looking for, but, have you tried this?
res = [dateString num2str(1,formatNumb) num2str(2,formatNumb) '\n'];
I'm currently connecting to a local Linux TCP server on port 8888 and sending GPS coordinates and temperature readings from the Quectel BG96 modem on my Arduino shield.
I then send the received data to a MySQL database as the Temperature and Location columns respectively.
The problem is that my location data is mixed together (UTC time, longitude, and latitude are all in one string), and I want to separate them before sending them to the database (I'll add extra columns for this).
I would appreciate any help finding the most efficient way to separate/parse the strings.
My socket server Python script:
while True:
    data = connection.recv()
    print >>sys.stderr, 'received "%s"' % data
    print(len(data))
    if (len(data) < 15):
        temp = data
    else:
        Loc = data
    try:
        now=datetime.datetime.utcnow()
        print(now.strftime('%Y-%m-%d %H:%M:%S'))
        #Datain = """INSERT INTO irthermo(temperature, location, TIME) VALUES (%s, %s,%s)""",(temp,Loc,now.strftime('%Y-%m-%d %H:%M:%S'))
        Datain = "INSERT INTO irthermo(temperature, location, TIME) VALUES (%s,%s,%s)"
        values= (temp,Loc,now.strftime('%Y-%m-%d %H:%M:%S'))
        cursor.execute(Datain,values)
        con.commit()
        print(cursor.rowcount, "Hello")
    except Error as e:
        print("Error while connecting to Mysql", e)
    finally:
        if (con.is_connected()):
            con.close()
            print("connection closed")
            cursor.close()
# Clean up the connection
connection.close()
Image of phpMyAdmin showing the database values coming in.
For LAT and LONG, I managed to make something like this, which is working for me now:
print(len(data))
if (len(data) < 15) :
    if ">" in data.strip():
        temp = data
        temp1 = temp.split(">")
        print(temp1[1])
else:
    Loc = data
    try:
        loc1 = Loc.split(": ")
        loc2 = loc1[1]
        # split the payload after the ": " prefix into its comma-separated fields
        loc3 = loc2.split(",")
        # loc3[4] is read below, so at least 5 fields are needed
        if len(loc3) >= 5 :
            LAT=loc3[1]+loc3[2]
            LONG=loc3[3]+loc3[4]
            now=datetime.datetime.utcnow()
            print(now.strftime('%Y-%m-%d %H:%M:%S'))
            #Datain = """INSERT INTO irthermo(temperature, location, TIME) VALUES (%s, %s,%s)""",(temp,Loc,now.strftime('%Y-%m-%d %H:%M:%S'))
            Datain = "INSERT INTO irthermo(temperature, Lat, Lon, TIME) VALUES (%s,%s,%s,%s)"
            values= (temp1[1],LAT,LONG,now.strftime('%Y-%m-%d %H:%M:%S'))
            cursor.execute(Datain,values)
            con.commit()
        else:
            LAT=loc3[1]+loc3[2]
            LONG="Unknown"
            now=datetime.datetime.utcnow()
            print(now.strftime('%Y-%m-%d %H:%M:%S'))
            #Datain = """INSERT INTO irthermo(temperature, location, TIME) VALUES (%s, %s,%s)""",(temp,Loc,now.strftime('%Y-%m-%d %H:%M:%S'))
            Datain = "INSERT INTO irthermo(temperature, Lat, Lon, TIME) VALUES (%s,%s,%s,%s)"
            values= (temp1[1],LAT,LONG,now.strftime('%Y-%m-%d %H:%M:%S'))
            cursor.execute(Datain,values)
            con.commit()
    except Error as e:
        print("Error transfer Data", e)
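The splitting above can be wrapped in a small helper that is easier to test on its own. This is only a sketch: the message layout `<prefix>: <utc>,<lat>,<N|S>,<lon>,<E|W>` is an assumption inferred from the indices used in the code above, not taken from the modem documentation, and `parse_location` and the sample string are made up for illustration.

```python
def parse_location(message):
    """Split a location message into UTC time, latitude and longitude.

    Returns None when the message does not have the assumed layout.
    """
    # Everything after the first ": " is treated as the payload.
    _prefix, sep, payload = message.strip().partition(": ")
    if not sep:
        return None
    fields = [field.strip() for field in payload.split(",")]
    if len(fields) < 5:
        return None
    return {
        "utc": fields[0],
        "lat": fields[1] + fields[2],  # value followed by hemisphere letter
        "lon": fields[3] + fields[4],
    }


# Made-up sample in the assumed layout.
sample = "+QGPSLOC: 103647.0,5321.6802,N,00630.3372,W"
print(parse_location(sample))
```

Returning None for anything that does not match keeps the temperature messages and malformed strings out of the insert, so the caller can decide whether to skip the row or store "Unknown".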
I am using a family account (premium), and this code returns a 'Premium required' error. My code is as follows:
device_id = '0d1841b0976bae2a3a310dd74c0f3df354899bc8'

def playSpotify():
    client_credentials_manager = SpotifyClientCredentials(client_id='<REDACTED>', client_secret='<REDACTED>')
    sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)
    playlists = sp.user_playlists('gh8gflxedxmp4tv2he2gp92ev')
    #while playlists:
        #for i, playlist in enumerate(playlists['items']):
            #print("%4d %s %s" % (i + 1 + playlists['offset'], playlist['uri'], playlist['name']))
        #if playlists['next']:
            #playlists = sp.next(playlists)
        #else:
            #playlists = None
    #sp.shuffle(true, device_id=device_id)
    #sp.repeat(true, device_id=device_id)
    sp.start_playback(device_id=device_id, context_uri='spotify:playlist:4ndG2qFEFt1YYcHYt3krjv')
When you use SpotifyClientCredentials, the generated token belongs to the app rather than to any user, hence the error message.
What you need to do is use SpotifyOAuth instead. So to initialize spotipy, just do:
sp = spotipy.Spotify(auth_manager=spotipy.SpotifyOAuth())
This will open a browser tab and ask you to sign in to your account.
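A slightly fuller sketch of the same idea, assuming spotipy 2.x. Playback control needs the `user-modify-playback-state` scope, and `SpotifyOAuth` falls back to the `SPOTIPY_CLIENT_ID`, `SPOTIPY_CLIENT_SECRET`, and `SPOTIPY_REDIRECT_URI` environment variables when those values are not passed explicitly. The names `PLAYBACK_SCOPE` and `make_user_client` are hypothetical, and the import sits inside the function so the module still loads where spotipy is not installed.

```python
# Scope needed to start, pause, or otherwise control playback.
PLAYBACK_SCOPE = "user-modify-playback-state"


def make_user_client(scope=PLAYBACK_SCOPE):
    """Return a spotipy client authorised on behalf of a signed-in user."""
    import spotipy
    from spotipy.oauth2 import SpotifyOAuth

    # Credentials and redirect URI are read from the SPOTIPY_* environment
    # variables because none are passed here.
    return spotipy.Spotify(auth_manager=SpotifyOAuth(scope=scope))


# Usage (needs network access and a browser sign-in on first run):
#   sp = make_user_client()
#   sp.start_playback(device_id=device_id,
#                     context_uri='spotify:playlist:4ndG2qFEFt1YYcHYt3krjv')
```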
I am currently working on SoftBank's robot Pepper, and I am trying to run Watson's speech-to-text service on Pepper's audio buffers, streamed remotely over the WebSocket protocol.
I used the answer to an earlier question, "NAO robot remote audio problems", to find a way to access Pepper's audio buffers remotely, and the project https://github.com/ibm-dev/watson-streaming-stt to learn how to use the WebSocket protocol with Watson streaming STT.
However, after I open my WebSocket application and start sending buffers to Watson, I receive this error after a few sends: 'Unable to transcode from audio/l16;rate=48000;channel=1 to one of: audio/x-float-array; rate=16000; channels=1'
Each time I try to send Pepper's audio buffer to Watson, it is unable to understand it.
I compared the data I send with the data sent in the Watson streaming STT example (which uses pyaudio streaming from a microphone instead of Pepper's buffer streaming), and I don't see any difference. In both cases I'm pretty sure that I am sending a string containing raw chunks of bytes, which is what Watson asks for in its documentation.
I try to send chunks of 8192 bytes with a sample rate of 48 kHz, and I can easily convert Pepper's audio buffer to hex, so I don't understand why Watson can't transcode it.
Here is my code:
# -*- coding: utf-8 -*-
#!/usr/bin/env python
import argparse
import base64
import configparser
import json
import threading
import time
from optparse import OptionParser
import naoqi
import numpy as np
import sys
from threading import Thread
import ssl
import websocket
from websocket._abnf import ABNF
CHANNELS = 1
NAO_IP = "172.20.10.12"
class SoundReceiverModule(naoqi.ALModule):
    """
    Use this object to get call back from the ALMemory of the naoqi world.
    Your callback needs to be a method with two parameter (variable name, value).
    """

    def __init__( self, strModuleName, strNaoIp):
        try:
            naoqi.ALModule.__init__(self, strModuleName );
            self.BIND_PYTHON( self.getName(),"callback" );
            self.strNaoIp = strNaoIp;
            self.outfile = None;
            self.aOutfile = [None]*(4-1); # ASSUME max nbr channels = 4
            self.FINALS = []
            self.RECORD_SECONDS = 20
            self.ws_open = False
            self.ws_listening = ""
            # init data for websocket interfaces
            self.headers = {}
            self.userpass = "" #userpass and password
            self.headers["Authorization"] = "Basic " + base64.b64encode(
                self.userpass.encode()).decode()
            self.url = ("wss://stream.watsonplatform.net//speech-to-text/api/v1/recognize"
                        "?model=fr-FR_BroadbandModel")
        except BaseException, err:
            print( "ERR: abcdk.naoqitools.SoundReceiverModule: loading error: %s" % str(err) );
    # __init__ - end
    def __del__( self ):
        print( "INF: abcdk.SoundReceiverModule.__del__: cleaning everything" );
        self.stop();

    def start( self ):
        audio = naoqi.ALProxy( "ALAudioDevice", self.strNaoIp, 9559 );
        self.nNbrChannelFlag = 3; # ALL_Channels: 0, AL::LEFTCHANNEL: 1, AL::RIGHTCHANNEL: 2; AL::FRONTCHANNEL: 3 or AL::REARCHANNEL: 4.
        self.nDeinterleave = 0;
        self.nSampleRate = 48000;
        audio.setClientPreferences( self.getName(), self.nSampleRate, self.nNbrChannelFlag, self.nDeinterleave ); # setting same as default generate a bug !?!
        audio.subscribe( self.getName() );
        #openning websocket app
        self._ws = websocket.WebSocketApp(self.url,
                                          header=self.headers,
                                          on_open = self.on_open,
                                          on_message=self.on_message,
                                          on_error=self.on_error,
                                          on_close=self.on_close)
        sslopt={"cert_reqs": ssl.CERT_NONE}
        threading.Thread(target=self._ws.run_forever, kwargs = {'sslopt':sslopt}).start()
        print( "INF: SoundReceiver: started!" );

    def stop( self ):
        print( "INF: SoundReceiver: stopping..." );
        audio = naoqi.ALProxy( "ALAudioDevice", self.strNaoIp, 9559 );
        audio.unsubscribe( self.getName() );
        print( "INF: SoundReceiver: stopped!" );
        print "INF: WebSocket: closing..."
        data = {"action": "stop"}
        self._ws.send(json.dumps(data).encode('utf8'))
        # ... which we need to wait for before we shutdown the websocket
        time.sleep(1)
        self._ws.close()
        print "INF: WebSocket: closed"
        if( self.outfile != None ):
            self.outfile.close();
    def processRemote( self, nbOfChannels, nbrOfSamplesByChannel, aTimeStamp, buffer ):
        """
        This is THE method that receives all the sound buffers from the "ALAudioDevice" module"""
        print "receiving buffer"
        # self.data_to_send = self.data_to_send + buffer
        # print len(self.data_to_send)
        #self.data_to_send = ''.join( [ "%02X " % ord( x ) for x in buffer ] ).strip()
        self.data_to_send = buffer
        #print("buffer type :", type(data))
        #print("buffer :", buffer)
        #~ print( "process!" );
        print( "processRemote: %s, %s, %s, lendata: %s, data0: %s (0x%x), data1: %s (0x%x)" % (nbOfChannels, nbrOfSamplesByChannel, aTimeStamp, len(buffer), buffer[0],ord(buffer[0]),buffer[1],ord(buffer[1])) );
        if self.ws_open == True and self.ws_listening == True:
            print "sending data"
            self._ws.send(self.data_to_send, ABNF.OPCODE_BINARY)
            print "data sent"
            #print self.data_to_send
        aSoundDataInterlaced = np.fromstring( str(buffer), dtype=np.int16 );
        #
        aSoundData = np.reshape( aSoundDataInterlaced, (nbOfChannels, nbrOfSamplesByChannel), 'F' );
        # print "processRemote over"
    # processRemote - end
    def on_message(self, ws, msg):
        print("message")
        data = json.loads(msg)
        print data
        if "state" in data:
            if data["state"] == "listening":
                self.ws_listening = True
        if "results" in data:
            if data["results"][0]["final"]:
                self.FINALS.append(data)
            # This prints out the current fragment that we are working on
            print(data['results'][0]['alternatives'][0]['transcript'])

    def on_error(self, ws, error):
        """Print any errors."""
        print(error)

    def on_close(self, ws):
        """Upon close, print the complete and final transcript."""
        transcript = "".join([x['results'][0]['alternatives'][0]['transcript']
                              for x in self.FINALS])
        print("transcript :", transcript)
        self.ws_open = False

    def on_open(self, ws):
        """Triggered as soon a we have an active connection."""
        # args = self._ws.args
        print "INF: WebSocket: opening"
        data = {
            "action": "start",
            # this means we get to send it straight raw sampling
            "content-type": "audio/l16;rate=%d;channel=1" % self.nSampleRate,
            "continuous": True,
            "interim_results": True,
            # "inactivity_timeout": 5, # in order to use this effectively
            # you need other tests to handle what happens if the socket is
            # closed by the server.
            "word_confidence": True,
            "timestamps": True,
            "max_alternatives": 3
        }
        # Send the initial control message which sets expectations for the
        # binary stream that follows:
        self._ws.send(json.dumps(data).encode('utf8'))
        # Spin off a dedicated thread where we are going to read and
        # stream out audio.
        print "INF: WebSocket: opened"
        self.ws_open = True

    def version( self ):
        return "0.6";
def main():
    """initialisation
    """
    parser = OptionParser()
    parser.add_option("--pip",
                      help="Parent broker IP. The IP address of your robot",
                      dest="pip")
    parser.add_option("--pport",
                      help="Parent broker port. The port NAOqi is listening to",
                      dest="pport",
                      type="int")
    parser.set_defaults(
        pip=NAO_IP,
        pport=9559)
    (opts, args_) = parser.parse_args()
    pip = opts.pip
    pport = opts.pport
    # We need this broker to be able to construct
    # NAOqi modules and subscribe to other modules
    # The broker must stay alive until the program exits
    myBroker = naoqi.ALBroker("myBroker",
                              "0.0.0.0", # listen to anyone
                              0, # find a free port and use it
                              pip, # parent broker IP
                              pport) # parent broker port
    """end of initialisation
    """
    global SoundReceiver
    SoundReceiver = SoundReceiverModule("SoundReceiver", pip) #thread1
    SoundReceiver.start()
    try:
        while True:
            time.sleep(1)
            print "hello"
    except KeyboardInterrupt:
        print "Interrupted by user, shutting down"
        myBroker.shutdown()
        SoundReceiver.stop()
        sys.exit(0)

if __name__ == "__main__":
    main()
I would be thankful if anyone has an idea how to get around that error, or what to try in order to get useful information. I first believed that I was sending "wrong" data to Watson; however, after many attempts I have no clue how to fix the problem.
Thanks a lot,
Alex
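For reference, the numbers in the question work out as in the sketch below. It assumes 16-bit mono PCM, computes how much audio one 8192-byte chunk holds at 48 kHz, and builds the content-type string. It spells the parameter `channels`, as it appears in the target format of the error message, whereas the code above sends `channel`; whether that difference is what triggers the error is an assumption that would need testing. The helper names are hypothetical.

```python
BYTES_PER_SAMPLE = 2  # audio/l16 means 16-bit linear PCM


def chunk_duration_ms(chunk_bytes, sample_rate, channels=1):
    """Return how many milliseconds of audio one raw PCM chunk contains."""
    samples_per_channel = chunk_bytes // (BYTES_PER_SAMPLE * channels)
    return 1000.0 * samples_per_channel / sample_rate


def l16_content_type(sample_rate, channels=1):
    """Build a content-type string for raw 16-bit PCM audio."""
    return "audio/l16;rate=%d;channels=%d" % (sample_rate, channels)


# 8192 bytes / 2 bytes per sample = 4096 samples per chunk.
print(chunk_duration_ms(8192, 48000))
print(l16_content_type(48000))
```

Each chunk therefore carries 4096 samples, a little over 85 ms of audio at 48 kHz.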
I'm trying to build my first DroneKit-Python program and am testing some of the examples, but I can't connect to my UAV (Iris+). I plugged in the USB radio (3DR 915 MHz) and used vehicle = connect('/dev/ttyUSB0', wait_ready=True). I have no idea which connection string I should use. Thanks in advance, I need some help!
My code:
print "Start simulator (SITL)"
from dronekit_sitl import SITL
sitl = SITL()
sitl.download('copter', '3.3', verbose=True)
sitl_args = ['-I0', '--model', 'quad', '--home=-35.363261,149.165230,584,353']
sitl.launch(sitl_args, await_ready=True, restart=True)
# Import DroneKit-Python
from dronekit import connect, VehicleMode
import time
# Connect to the Vehicle.
print "Connecting to vehicle on: '/dev/ttyUSB0'"
vehicle = connect('/dev/ttyUSB0', wait_ready=True)
# Get some vehicle attributes (state)
print "Get some vehicle attribute values:"
print " GPS: %s" % vehicle.gps_0
print " Battery: %s" % vehicle.battery
print " Last Heartbeat: %s" % vehicle.last_heartbeat
print " Is Armable?: %s" % vehicle.is_armable
print " System status: %s" % vehicle.system_status.state
print " Mode: %s" % vehicle.mode.name # settable
# Close vehicle object before exiting script
vehicle.close()
# Shut down simulator
sitl.stop()
print("Completed")
The best place to get DroneKit support now is probably here: https://discuss.dronekit.io/c/python
As for the question: I have not tried this on Linux. I suspect the connection string is correct, but you may also have to set the baud rate using baud=57600.
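A minimal sketch of that suggestion, assuming the radio shows up as `/dev/ttyUSB0` and runs at 57600 baud (both may differ on your machine). `baud` is a keyword argument of dronekit's `connect`; the wrapper `connect_vehicle` and the two constants are hypothetical names, and the import sits inside the function so the module loads even where dronekit is not installed.

```python
# Assumed settings for a 3DR telemetry radio on Linux; adjust as needed.
CONNECTION_STRING = "/dev/ttyUSB0"
BAUD_RATE = 57600


def connect_vehicle(connection_string=CONNECTION_STRING, baud=BAUD_RATE):
    """Open a serial connection to the vehicle at the given baud rate."""
    from dronekit import connect

    return connect(connection_string, wait_ready=True, baud=baud)


# Usage (needs the radio plugged in and the vehicle powered):
#   vehicle = connect_vehicle()
#   print(vehicle.battery)
#   vehicle.close()
```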