Laravel 5.2: Handling database insertions using the Laravel Queues service - database

Good morning!
I have the following code in my Controller:
$num_rows = $this->drupalRequestRowCount();
$uris = $this->drupalRequestPrepareUri($num_rows, config('constants.IMPORT_DATA_URI'));
$requestHandler = new RequestHandler($uris[0]);
$content = $requestHandler->httpRequest();
// Dispatch each URI for the job that will handle
// the insertion of our Drupal data.
foreach($uris as $uri) {
    $job = (new DataJob($uri))->onQueue('data_export_insert');
    $this->dispatch($job);
}
return 'Jobs dispatched';
I have a Drupal database with data that I want to synchronize with my Laravel database every night. I make use of an API I developed myself in Drupal that returns a lot of data that I need for my Laravel database.
The API returns a maximum of 10 items per request (each of which contains a lot of data). The Drupal database has roughly 17,000 items that I need to import every night. I thought the Queue API might be a good solution, so the data gets imported in batches instead of all at once. The $uris array that I loop over in the foreach contains URLs with an offset appended, based on the $num_rows I get.
So if $num_rows returns 17000 items, the $uris array will contain 1700 items like so:
[0] => 'http://mywebsite.com/api/request_data.json?offset=0'
[1] => 'http://mywebsite.com/api/request_data.json?offset=10'
[2] => 'http://mywebsite.com/api/request_data.json?offset=20' etc.
This means that I want to have 1700 jobs dispatched that Laravel will execute for me once I run the php artisan queue:listen database command.
The following code is the Job I want to execute 1700 times.
public function __construct($uri)
{
    $this->uri = $uri;
}

public function handle()
{
    try {
        Log::info('Inserting data into database for uri: ' . $this->uri);

        $requestHandler = new RequestHandler($this->uri);
        $content = $requestHandler->httpRequest();

        foreach($content as $key => $value) {
            $id = $value->id;
            $name = $value->name;

            DB::insert('INSERT INTO table_1 (id, name) VALUES (?, ?)', [$id, $name]);
        }
    } catch(\Exception $e) {
        Log::error('Error in job: '. $e->getMessage());
    }
}
My problem is that the jobs don't get executed at all. They are being dispatched correctly, because I can see 1700 rows in the jobs table when I check it in the database. I tried logging from the handle function, but when I run php artisan queue:listen or php artisan queue:work nothing happens. The failed_jobs table doesn't have any records either.
Since the handle function is never executed, there is no way for me to log anything that happens inside it. I tried finding information on the internet with examples, but all I can find are mailing examples, which is not what I need.
Any help would be greatly appreciated!

Fixed the issue: make sure you declare protected $payload in your Job class, with $payload being the data you want to pass to the queue handler!
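For reference, here is a minimal sketch of what the full Job class could look like with the property declared (the namespace, base class and RequestHandler import follow Laravel 5.2 defaults and the question's code, and are assumptions rather than the exact original):
<?php

namespace App\Jobs;

use DB;
use Log;
use App\Jobs\Job;
use App\Handlers\RequestHandler; // assumed location of the RequestHandler class
use Illuminate\Queue\SerializesModels;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Contracts\Queue\ShouldQueue;

class DataJob extends Job implements ShouldQueue
{
    use InteractsWithQueue, SerializesModels;

    // The URI must live on a class property so it is stored in the
    // serialized job payload and is available when the worker runs handle().
    protected $uri;

    public function __construct($uri)
    {
        $this->uri = $uri;
    }

    public function handle()
    {
        // ... same body as the handle() method shown above ...
    }
}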

Related

How do I load data from Cloud Storage to Cloud Datastore from an AppEngine PHP application?

I have been searching various sources but it is not clear to this newbie. How do I load data (a CSV file) from Cloud Storage to Cloud Datastore from an App Engine PHP application? I do have an existing method which downloads the file and then loads each row as a transaction. It takes a few hours for a few million rows, so this does not seem to be the best method, and I have been searching for a more efficient one. I appreciate any guidance.
Editing this, as I have switched to trying to load the JSON data into Datastore from a remote URL on GAE. The code is not working, though I do not know why (yet):
<?php

require 'vendor/autoload.php';

use Google\Auth\ApplicationDefaultCredentials;
use Google\Cloud\Datastore\DatastoreClient;

/**
 * Create a new product with a given SKU.
 *
 * @param DatastoreClient $datastore
 * @param $sku
 * @param $product
 * @return Google\Cloud\Datastore\Entity
 */
function add_product(DatastoreClient $datastore, $sku, $product)
{
    $productKey = $datastore->key('SKU', $sku);
    $product = $datastore->entity(
        $productKey,
        [
            'created' => new DateTime(),
            'name' => strtolower($product)
        ]);
    $datastore->upsert($product);
    return $product;
}

/*
 * Load Cloud Datastore Kind from remote URL
 *
 * @param $projectId
 * @param $url
 */
function load_datastore($projectId, $url) {
    // Create Datastore client
    $datastore = new DatastoreClient(['projectId' => $projectId]);

    // Enable `allow_url_fopen` to allow reading the file from a URL
    ini_set("allow_url_fopen", 1);

    // Read the products listing and load it into Cloud Datastore.
    // Use batches of 20 for a transaction.
    $json = json_decode(file_get_contents($url), true);

    $count = 1;
    foreach($json as $sku_key => $product_val) {
        if ($count == 1) {
            $transaction = $datastore->transaction();
        }

        add_product($datastore, $sku_key, $product_val);

        if ($count == 20) {
            try {
                $transaction->commit();
            } catch (Exception $err) {
                echo 'Caught exception: ', $err->getMessage(), "\n";
                $transaction->rollback();
            }
            $count = 0;
        }
        $count++;
    }
}

try {
    $projectId = 'development';
    $url = 'https://raw.githubusercontent.com/BestBuyAPIs/open-data-set/master/products.json';
    load_datastore($projectId, $url);
} catch (Exception $err) {
    echo 'Caught exception: ', $err->getMessage(), "\n";
    $transaction->rollback();
}
?>
Google provides pre-written Dataflow templates. You can use the GCS Text to Datastore Dataflow template to read in the CSV, convert the CSV into Datastore entity JSON, and write the results to Datastore.
Let's assume you have a CSV of the following:
username, first, last, age, location.zip, location.city, location.state
samsmith, Sam, Smith, 33, 94040, Mountain View, California
johndoe, John, Doe, 50, 30075, Roswell, Georgia
dannyboy, Danny, Mac, 94040, Mountain View, California
You could have the following UDF to transform this CSV into a Datastore Entity of Kind People. This UDF assumes the following Schema:
username = Key & String Property
first = String Property
last = String Property
age = Integer Property
location = Record
location.zip = Integer Property
location.city = String Property
location.state = String Property
This UDF outputs a JSON-encoded entity, using the same JSON payload format as the Cloud Datastore REST API, so property values use the standard Datastore value types (stringValue, integerValue, entityValue, and so on).
function myTransform(csvString) {
    var row = csvString.split(",");
    // The sample schema above has 7 columns; skip rows that don't match.
    if (row.length != 7) { return; }

    return JSON.stringify({
        "key": {
            "partition_id": {
                // default namespace is an empty string
                "namespace_id": ""
            },
            "path": {
                "kind": "People",
                "name": row[0]
            }
        },
        "properties": {
            "username": { "stringValue": row[0] },
            "first": { "stringValue": row[1] },
            "last": { "stringValue": row[2] },
            "age": { "integerValue": row[3] },
            "location": {
                "entityValue": {
                    "properties": {
                        "zip": { "integerValue": row[4] },
                        "city": { "stringValue": row[5] },
                        "state": { "stringValue": row[6] }
                    }
                }
            }
        }
    });
}
To run the Dataflow template, first save that UDF to a GCS bucket using gsutil:
gsutil cp my_csv_udf.js gs://mybucket/my_csv_udf.js
Now head into the Google Cloud Platform Console, go to the Dataflow page, click Create Job From Template, and select "GCS Text to Datastore". You can also refer to this doc.
Your job parameters would look as follows:
textReadPattern = gs://path/to/data/*.csv
javascriptTextTransformGcsPath = gs://mybucket/my_csv_udf.js
javascriptTextTransformFunctionName = myTransform
datastoreWriteProjectId = my-project-id
errorWritePath = gs://path/to/data/errors
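If you prefer the command line over the Console, the same template can also be launched with gcloud; a sketch using the parameters above (the job name and the template path gs://dataflow-templates/latest/GCS_Text_to_Datastore are assumptions you should verify against the current template docs):
gcloud dataflow jobs run csv-to-datastore-import \
    --gcs-location gs://dataflow-templates/latest/GCS_Text_to_Datastore \
    --parameters textReadPattern=gs://path/to/data/*.csv,javascriptTextTransformGcsPath=gs://mybucket/my_csv_udf.js,javascriptTextTransformFunctionName=myTransform,datastoreWriteProjectId=my-project-id,errorWritePath=gs://path/to/data/errors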
Note: UDF transforms only support JavaScript ECMAScript 5.1, so only basic JavaScript is available; no arrow functions, promises, etc.
This question is similar to Import CSV into google cloud datastore and Google Cloud Datastore: Bulk Importing w Node.js .
The quick answer is that you can use Apache Beam or Cloud Dataflow to import CSV data into Cloud Datastore.
Sorry for not being more specific, but I'm a python standard env GAE user, rather unfamiliar with the PHP environment(s).
In general your current approach is serialized and synchronous - you're processing the rows one at a time (or, at best, in batches of 20, if all the upsert calls inside a transaction actually go to the datastore in a single batch), blocking on every datastore interaction and advancing to the next row only after that interaction completes.
I'm unsure if the PHP environment supports async datastore operations and/or true batch operations (the python ndb library can batch up to 500 writes into one datastore call) - those could help speed things up.
Another thing to consider, if your rows are entirely independent: do you actually need transactions for writing them? If PHP supports plain writes you could use those instead (transactions take longer to complete).
Even without the above-mentioned support, you can still speed things up considerably by decoupling the row reading from waiting for the datastore ops to complete:
in the current request handler you keep just the row reading, creating batches of 20 rows that are somehow passed for processing elsewhere (task queue, pub/sub, separate threads - whatever you can get in PHP)
in a separate request handler (or task queue or pub/sub handler, depending on how you choose to pass your batch data) you receive those batches and make the actual datastore calls. This way you can have multiple batches processed in parallel, and the amount of time they're blocked waiting for the datastore replies becomes irrelevant from the overall processing time perspective.
With such an approach your performance would be limited only by the speed at which you can read the rows and enqueue those batches. If you want to go even faster, you could also split the single CSV file into multiple smaller ones, so that multiple row readers can work in parallel, feeding those batch-processing workers.
Side note: you may want to retry the failed/rolled-back transactions or save those entities for a later retry; currently it appears you're losing them.
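As a rough illustration of the plain, batched writes suggested above, here is a minimal sketch using the PHP Datastore client's upsertBatch() method; the batch size of 20 and the entity layout are carried over from the question, and this is an assumption-laden sketch rather than a drop-in fix:
// Sketch: non-transactional, batched upserts (assumes the same
// DatastoreClient setup and products.json feed as in the question).
$datastore = new DatastoreClient(['projectId' => $projectId]);
$json = json_decode(file_get_contents($url), true);

$entities = [];
foreach ($json as $sku_key => $product_val) {
    $entities[] = $datastore->entity(
        $datastore->key('SKU', $sku_key),
        [
            'created' => new DateTime(),
            'name' => strtolower($product_val)
        ]
    );

    // Send every 20 entities to Datastore in one batched, non-transactional call.
    if (count($entities) == 20) {
        $datastore->upsertBatch($entities);
        $entities = [];
    }
}

// Flush any remaining entities.
if (!empty($entities)) {
    $datastore->upsertBatch($entities);
}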

Laravel - How to pass a single database instance at a time to the queue job for processing?

I have a list of websites in my MySQL database. I need to crawl these websites using the PHP file_get_html method (provided by Simple HTML DOM Parser). When I try to parse each website it takes a huge amount of time and the execution time limit is exceeded. I also need to keep crawling these websites every 30 minutes. To manage this, I am trying to implement a queue, which I guess is the right solution.
This is sample data table:
Id | Website Name | Website Source
1 | Website 1 | www.website1.com
2 | Website 2 | www.website2.com
3 | Website 3 | www.website3.com
4 | Website 4 | www.website4.com
But, I am facing issues when I try to send a database instance to the queue. This is my controller function:
public function queueAllLinks(){
    include('simplehtmldom_1_5/simple_html_dom.php');

    $sourceObj = SourceDetails::all();
    foreach($sourceObj as $sourceObj) {
        CrawlLinks::dispatch($sourceObj);
    }
}
I am dispatching to job CrawlLinks. This is my job class:
<?php

namespace App\Jobs;

use App\SourceDetails;
use Illuminate\Bus\Queueable;
use Illuminate\Queue\SerializesModels;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;

class CrawlLinks implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function __construct()
    {
    }

    public function handle(SourceDetails $sourceObj) {
        $source_url = $sourceObj->website_source;
        $source_name = $sourceObj->website_name;
        $source_id = $sourceObj->id;

        $html = file_get_html($source_url);
        //process and save data in code
    }
}
But this doesn't seem to work. And I am getting the following error:
local.ERROR: Type error: Argument 1 passed to App\Jobs\CrawlLinks::__construct() must be an instance of App\SourceDetails, instance of Illuminate\Database\Eloquent\Collection given, called in D:\Projects\marathinews\vendor\laravel\framework\src\Illuminate\Foundation\Bus\Dispatchable.php on line 14
I tried to find solutions, but all the sources use the 'email' example; none of them explain how to pass a database instance.
PS - I am new to Laravel Queues.
Your $sourceObject is being serialized when sent to the queue, and deserialized during the handle method. Why don't you try changing your parameter to be an ..Eloquent\Collection object instead of App\SourceDetails and see if anything happens?
The solution to the above problem was to dump the autoloader using the command composer dumpautoload. I have to run this command every time I make any changes to my queue jobs, and only then run the queue worker. Otherwise, I guess, the changes are not reflected in the queue.
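For what it's worth, the usual way to pass a single model to a job is to accept it in the constructor and store it on a property; the SerializesModels trait then keeps only the model's key in the payload and re-fetches the record when the job runs. A minimal sketch along those lines (the property name and handle() body are illustrative, not the exact code from the question):
class CrawlLinks implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    // One SourceDetails model per job.
    protected $source;

    public function __construct(SourceDetails $source)
    {
        $this->source = $source;
    }

    public function handle()
    {
        // The model is re-fetched from the database here thanks to SerializesModels.
        $html = file_get_html($this->source->website_source);
        // process and save data
    }
}
The controller can then keep dispatching one model at a time inside the foreach, exactly as in the question.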

Drupal: Running Feeds Importer Programmatically - Where to Put Code

When running a Feeds importer using cron, the importer times out, resulting in an incomplete import. I'm trying to use a script to execute the importer, and I've come across this code a few times:
<?php
function MODULE_NAME_cron() {
  $name = 'FEED_NAME';
  $source = feeds_source($name);
  $source->import();
}
?>
However, when executing this I get an error saying there's no feeds_source() function, which leads me to believe that I just don't know where to put this code (a separate php file just isn't working for me). Can anyone help me out here? Thanks in advance!
I think you need to call the $source->startImport(); method instead of $source->import();
I just posted an answer to a similar question over here: https://drupal.stackexchange.com/questions/90062/programmatically-execute-feed-import/153434#153434 , which might help. In short: call it from a form submit hook if you're using the Batch API, or skip the Batch API if you're running it outside the browser (install hook, install profile).
So in your case:
function load_data(){
  // Files to import in specific order.
  $files = array(
    'challenge_importer' => 'data/challenges.csv',
  );

  $file_location_base = drupal_get_path('module', 'challenge');
  foreach ($files as $feed => $file) {
    $feedSource = feeds_source($feed);

    // Set the file name to import.
    $filename = $file_location_base . '/' . $file;
    if (!file_destination($filename, FILE_EXISTS_ERROR)) {
      $config = $feedSource->getConfig();
      $config['FeedsFileFetcher']['source'] = $filename;
      $feedSource->setConfig($config);
      $feedSource->save();

      while (FEEDS_BATCH_COMPLETE != $feedSource->import());
    }
  }
}
This could be called from cron, or you could use the scheduled execution from the Feeds importer.
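For example, to trigger it from cron you could wrap it in a hook_cron() implementation; a minimal sketch, assuming the module is named challenge as in the drupal_get_path() call above:
/**
 * Implements hook_cron().
 */
function challenge_cron() {
  // Run the full CSV import on each cron run.
  load_data();
}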

Debugging SQL in controller

I am trying to debug my SQL but I am having a hard time. I know I can use this:
<?php echo $this->element('sql_dump'); ?>
to dump the SQL, but this doesn't work (or at least I don't know how to make it work) if I am doing an AJAX call. Because the page is not reloaded, the dump does not get refreshed. How can I run my command and debug the SQL? Here is the code I have in my controller:
public function saveNewPolicy(){
    $this->autoRender = false;
    $policyData = $this->request->data["policyData"];
    $numRows = 0;

    $data = array(
        'employee_id' => trim($policyData[0]["employeeId"]),
        'insurancetype_id' => $policyData[0]["insuranceTypeId"],
        'company' => $policyData[0]["companyName"],
        'policynumber' => $policyData[0]["policyNumber"],
        'companyphone' => $policyData[0]["companyPhone"],
        'startdate' => $policyData[0]["startDate"],
        'enddate' => $policyData[0]["endDate"],
        'note' => $policyData[0]["notes"]
    );

    try{
        $this->Policy->save($data);
        $numRows = $this->Policy->getAffectedRows();
        if($numRows > 0){
            $dataId = $this->Policy->getInsertID();
            $response = json_encode(array(
                'success' => array(
                    'msg' => "Successfully Added New Policy.",
                    'newId' => $dataId
                ),
            ));
            return $response;
        }else{
            throw new Exception("Unspecified Error. Data Not Save! ");
        }
    }catch (Exception $e){
        return $this->EncodeError($e);
    }
}
The problem is that if the company field in my array is empty, the insert fails without any error. I know it has failed, though, because of the $numRows variable I use. I know the field accepts nulls in the database. It seems like the only way for me to debug this is to look at what SQL is being sent to MySQL. Does anyone know how to debug it? I am using CakePHP 2.4.
I use this approach. I added this method in my AppModel class:
public function getLastQuery() {
    $dbo = $this->getDatasource();
    $logs = $dbo->getLog();
    $lastLog = end($logs['log']);

    return $lastLog['query'];
}
and then in any controller you can call it like this:
debug($this->{your model here}->getLastQuery());
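In the saveNewPolicy() action above you could then write the generated SQL to CakePHP's debug log right after the save attempt, so it is visible even for AJAX calls; a minimal sketch, assuming getLastQuery() has been added to AppModel as shown:
$this->Policy->save($data);
// Log the exact SQL statement CakePHP just sent to MySQL
// (written to app/tmp/logs/debug.log by default).
CakeLog::write('debug', $this->Policy->getLastQuery());
$numRows = $this->Policy->getAffectedRows();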
Rather than trying to hack around in CakePHP, perhaps it would be easier to just log the queries with MySQL and do a tail -f on the log file? Here's how you can turn that on:
In MySQL, run SHOW VARIABLES; and look for general_log and general_log_file entries
If general_log is OFF, run SET GLOBAL general_log = 'ON'; to turn it on
In a terminal, run a tail -f logfile.log (log file location is in the general_log_file entry) to get a streaming view of the log as it's written to
This is very helpful for these circumstances to see what's going on behind the scenes, or if you have debug off for some reason.

Grails database migration plugin - Java heap space

I am running Grails 1.3.7 and using the Grails database migration plugin, version database-migration-1.0.
The problem I have is a migration changeset that pulls blobs out of a table and writes them to disk. When running through this migration I run out of heap space. I was thinking I would need to flush and clear the session to free up some space, however I am having difficulty getting access to the session from within the migration. BTW, the reason it's in a migration is that we are moving away from storing files in Oracle and putting them on disk instead.
I have tried
SessionFactoryUtils.getSession(sessionFactory, true)
I have also tried
SecurityRequestHolder.request.getSession(false) // request is null -> not surprising
changeSet(author: "userone", id: "saveFilesToDisk-1") {
grailsChange{
change{
def fileIds = sql.rows("""SELECT id FROM erp_file""")
for (row in fileIds) {
def erpFile = ErpFile.get(row.id)
erpFile.writeToDisk()
session.flush()
session.clear()
propertyInstanceMap.get().clear()
}
ConfigurationHolder.config.erp.ErpFile.persistenceMode = previousMode
}
}
}
Any help would be greatly appreciated.
The application context will be automatically available in your migration as ctx. You can get the session like this:
def session = ctx.sessionFactory.currentSession
To access the session, you can use the withSession closure like this:
Book.withSession { session ->
session.clear()
}
But this may not be the reason why your app runs out of heap space. If the data volume is large, then
def fileIds = sql.rows("""SELECT id FROM erp_file""")
for (row in fileIds) {
..........
}
will use up your heap space. Try to process the data with pagination; don't load it all at once.
