Rollback Doctrine data:load when insert from fixtures fails

I have often noticed that when a database insert for a model fails, the data loaded before it stays in the database. So when you try to load the same fixture file again, it gives an error.
Is there any way the data:load process can be made atomic, i.e. go or no-go for all the data, so that data is never inserted halfway?

Hopefully this should work: write a task that does the same as data:load, but wrap the loading in a transaction:
$databaseManager = new sfDatabaseManager($this->configuration);
$conn = $databaseManager->getDatabase('doctrine')->getDoctrineConnection();

$conn->beginTransaction();
try {
    // ... load the fixtures here, the same way data:load does
    $conn->commit();
} catch (Exception $e) { // maybe you can be more specific about the exception caught
    echo $e->getMessage();
    $conn->rollback();
}
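Fleshing that out, a rough sketch of such a custom task might look like the following (the class name, task name, and the Doctrine_Core::loadData() call are assumptions based on symfony 1.4 with sfDoctrinePlugin, so adjust them to your setup):

class loadDataAtomicTask extends sfDoctrineBaseTask
{
  protected function configure()
  {
    $this->namespace = 'doctrine';
    $this->name      = 'data-load-atomic';
    $this->addOptions(array(
      new sfCommandOption('application', null, sfCommandOption::PARAMETER_REQUIRED, 'The application name', 'frontend'),
      new sfCommandOption('env', null, sfCommandOption::PARAMETER_REQUIRED, 'The environment', 'dev'),
    ));
  }

  protected function execute($arguments = array(), $options = array())
  {
    $databaseManager = new sfDatabaseManager($this->configuration);
    $conn = $databaseManager->getDatabase('doctrine')->getDoctrineConnection();

    $conn->beginTransaction();
    try {
      // load every YAML fixture under data/fixtures, like data:load does
      Doctrine_Core::loadData(sfConfig::get('sf_data_dir').'/fixtures');
      $conn->commit();
    } catch (Exception $e) {
      $this->logSection('doctrine', $e->getMessage(), null, 'ERROR');
      $conn->rollback();
    }
  }
}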

Fixtures are meant for loading initial data, which means you should be able to run build --all --and-load, or in other words clear all data and re-load the fixtures. It doesn't take any longer.
One option you have is to break your fixtures into multiple files and load them individually (see the example below). This is also what I'd do if you first need to load large amounts of data via a script or from a CSV (i.e. something bigger than just a few fixtures). That way you don't have to redo that work if you had a fixtures problem somewhere else.
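For instance, assuming the standard symfony 1.4 CLI and made-up fixture file names, you can point the data-load task at a single file or directory:

php symfony doctrine:data-load data/fixtures/010_users.yml
php symfony doctrine:data-load data/fixtures/products/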

Can't load bad data with Anzograph

I'm trying to load a filtered Wikidata dump into AnzoGraph using LOAD WITH 'global' <file:wdump-749.nt.gz> INTO GRAPH <WD_749>. The file exists, but AnzoGraph gives this error:
Error - At Turtle production subject=http://www.wikidata.org/entity/Q144> predicate=http://www.wikidata.org/prop/direct/P1319> file=wdump-749.nt.gz line=3229 details: -34000-01-01T00:00:00Z:Datum is not a datetime, use setting 'load_normalize_datetime' to patch bad data
I've set load_normalize_datetime=true in settings.conf and settings_anzograph.conf inside Anzograph's filesystem, restarted the server, but still can't load the dump. I get the exact same error.
load_normalize_datetime does not take a boolean; its value is the datetime that bad datetimes encountered during loads are changed to, e.g. 0001-01-01T00:00:00Z.
So instead try setting:
load_normalize_datetime=0001-01-01T00:00:00Z
in your settings.conf, which worked for me on that specific file using the command you listed.
WD_749 has 38,131,614 statements, which loaded in 372 seconds on my ThinkPad. It was relatively slow to load (102k triples per second) because it is a single file. If you break it up into smaller pieces (you can do this with the COPY command to dump the graph to a dir:/mydir/wdump-749.nt.gz), it will load in parallel (for me, 114 seconds at 335k triples per second); a sketch of that workflow is below.
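Roughly, the dump-and-reload could look like this (treat the exact COPY syntax as an assumption and check the AnzoGraph docs for your version; the directory path is made up):

# dump the loaded graph to a directory of gzipped part files
COPY <WD_749> TO <dir:/mydir/wdump-749-parts>
# reloading from that directory lets AnzoGraph read the parts in parallel
LOAD WITH 'global' <dir:/mydir/wdump-749-parts> INTO GRAPH <WD_749>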

Shrine clear cached images on persistence success

Background
I am using FileSystem storage with the Shrine::Attachment module in a model setting (my_model), with ActiveRecord (Rails). I am also using it in a direct upload scenario, therefore I need the response from the file upload (saved to cache).
my_model.rb
class MyModel < ApplicationRecord
  include ImageUploader::Attachment(:image) # adds an `image` virtual attribute
  # omitted relations & code...
end
my_controller.rb
def create
  @my_model = MyModel.new(my_model_params)
  # currently creating derivatives & persisting all in one go
  @my_model.image_derivatives! if @my_model.image
  if @my_model.save
    render json: { success: "MyModel created successfully!" }
  else
    @errors = @my_model.errors.messages
    render 'errors', status: :unprocessable_entity
  end
end
Goal
Ideally I want to clear only the cached file(s) I currently have hold of in my create controller (the derivatives and the original file) as soon as they have been persisted to permanent storage.
What is the best way to do this for scenario A (synchronous) and scenario B (asynchronous)?
What I have considered/tried
After reading through the docs I have noticed 3 possible ways of clearing cached images:
1. Run a rake task to clear cached images.
I really don't like this, as I believe the cached files should be cleaned up once the file has been persisted, not left to an admin task (cron job) that can't be tested with an image persistence spec.
# FileSystem storage
file_system = Shrine.storages[:cache]
file_system.clear! { |path| path.mtime < Time.now - 7*24*60*60 } # delete files older than 1 week
2. Run Shrine.storages[:cache] in an after block
Is this only for background jobs?
attacher.atomic_persist do |reloaded_attacher|
# run code after attachment change check but before persistence
end
3. Move the cache file to permanent storage
I don't think I can use this, as my direct upload occurs in two distinct parts: (1) immediately upload the attached file to the cache store, then (2) save it to the newly created record.
plugin :upload_options, cache: { move: true }, store: { move: true }
Are there better ways of clearing promoted images from cache for my needs?
Synchronous solution for the single image upload case:
def create
  @my_model = MyModel.new(my_model_params)
  image_attacher = @my_model.image_attacher
  image_attacher.create_derivatives              # create the different sized images
  image_cache_id = image_attacher.file.id        # save the cached file id, as it will be lost in the next step
  image_attacher.record.save(validate: true)     # promote the original file to permanent storage
  Shrine.storages[:cache].delete(image_cache_id) # only clear the cached image used to create the derivatives (other images may still be cached mid-processing and we don't want to blow them away)
end
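For scenario B (asynchronous), a sketch along the same lines, assuming Shrine 3's backgrounding plugin with Sidekiq (the PromoteJob structure follows the pattern from the Shrine docs; deleting the cached file at the end is my addition):

# e.g. in an initializer
Shrine.plugin :backgrounding
Shrine::Attacher.promote_block do
  PromoteJob.perform_async(self.class.name, record.class.name, record.id, name, file_data)
end

# app/jobs/promote_job.rb
class PromoteJob
  include Sidekiq::Job

  def perform(attacher_class, record_class, record_id, name, file_data)
    attacher_class = Object.const_get(attacher_class)
    record         = Object.const_get(record_class).find(record_id)

    attacher    = attacher_class.retrieve(model: record, name: name, file: file_data)
    cached_file = attacher.file                 # the cached upload about to be promoted

    attacher.create_derivatives
    attacher.atomic_promote                     # move the original + derivatives to :store

    cached_file.delete                          # clear only this record's cached copy
  rescue Shrine::AttachmentChanged, ActiveRecord::RecordNotFound
    # the attachment changed or the record was deleted, nothing to promote
  end
end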

Can Fiddler .SAZ capture files be combined?

Fiddler has an autosave feature which unfortunately clears the captured sessions each time it saves to an .SAZ. Rather than have a folder of Fiddler save sessions (.SAZ's), I'd prefer to have one master .SAZ, albeit saved at regular intervals.
Since there doesn't appear to be an option in Fiddler to do this, is there a way to combine or merge .SAZ files?
There are two possibilities:
Use the Fiddler UI: When you execute the menu command "Load Archive..." you can choose to append the data from the loaded SAZ file to the current session list. It is therefore possible to load multiple SAZ files and then save them into a new SAZ file.
Use FiddlerCore: With FiddlerCore you can develop your own .NET program that merges multiple SAZ files into one new SAZ file. The methods for loading and saving sessions are pretty simple:
using System.Linq;
using Fiddler;

Session[] oLoaded1 = Utilities.ReadSessionArchive("FileToLoad1.saz", false);
Session[] oLoaded2 = Utilities.ReadSessionArchive("FileToLoad2.saz", false);
Session[] oAllSessions = oLoaded1.Concat(oLoaded2).ToArray(); // merge the two arrays
Utilities.WriteSessionArchive("Combined.saz", oAllSessions, null, false);
Sources:
http://fiddler.wikidot.com/fiddlercore-demo
Utilities.ReadSessionArchive
Utilities.WriteSessionArchive

Safety/sanitization when storing images in DB with PHP

I'm looking to store images for an application in an MSSQL database. (I understand that there is some debate about whether this or file system storage is better; that's another thread though.) I'm looking at doing something similar to http://forum.codecall.net/topic/40286-tutorial-storing-images-in-mysql-with-php/ but in CodeIgniter, something along the lines of:
foreach ($_FILES as $upload_name => $info) {
    if ($info['name']) {
        // Temporary file name stored on the server
        $tmpName = $info['tmp_name'];

        // Read the file contents
        $fp = fopen($tmpName, 'r');
        $data = fread($fp, filesize($tmpName));
        fclose($fp);

        // model code consolidated here for ease of question-asking
        $this->load->database();
        $this->db->insert('my_table', array('image' => $data));
    }
}
My question is mostly about security. Basically, are there any particular concerns I should have about sanitizing binary image data for inserts versus other sorts of string data? I took out the addslashes() from the code on the site linked above because I know CI's Active Record does some escaping on its own, but I don't know whether it is better to keep it (or to do some other prep work altogether).
If I understand your question correctly, you should not have to worry about it as long as you store the file type (the file's MIME type) along with the binary data and force that MIME type whenever you serve the data. That way, even if someone uploads a script or a virus, it is only ever rendered as an image instead of being handed to your server or the browser to interpret.
Other than that, I do not think you need to pull the upload into memory and try to scrub it.
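To illustrate the "force the MIME type" part, a minimal CodeIgniter sketch (the table and column names here are made up):

// Controller action that serves a stored image strictly as the MIME type
// recorded at upload time, never as something the browser has to guess.
public function image($id)
{
    $row = $this->db->get_where('my_table', array('id' => (int) $id))->row();

    $this->output
         ->set_content_type($row->mime_type) // e.g. 'image/png', stored alongside the blob
         ->set_output($row->image);
}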

Redis: Wrong data in wrong tables

I am trying to solve a problem that has been blocking me for a month.
I am building the backend of an application using Node.js & Redis, and due to our structure we have to transfer data from one Redis table to another (by "table" I mean the databases you switch between with "select", e.g. "select 2").
We receive a lot of requests and push a lot of responses per second, and no matter how much I tried I could not stop the data from getting mixed. Assume we have a "teacherID" that has to be stored inside Redis table #2, and a "studentID" that has to be stored in Redis table #4. No matter what I tried (I've checked my code multiple times), I could not stop teacherID from ending up where studentID belongs. The last trick I tried was placing a callback on each select:
redisClient.select(4, function(err) {
  if (err) {
    console.log("You could not select the table. Function will be aborted");
  } else {
    // Proceed with the logic
  }
});
What could be the reason that I cannot stop this mess? One detail that drives me crazy is that it works really well locally and also online; however, whenever multiple requests reach the server at the same time, the data gets mixed. Any suggestions to prevent this error? (Even though I cannot share the code due to an NDA, I can assure you the logic has been coded correctly.)
I'm not sure about your statement about having to "transfer data from one redis table to another". Reading through your example, it seems like you could simply have two Redis clients that write to different databases (what you called "tables"). Keep in mind that SELECT changes the current database for the whole connection, so if concurrent requests share a single client and each issues its own select before writing, the commands can interleave and land in the wrong database, which is exactly the kind of mixing you describe.
It would look similar to this:
var redis = require("redis");
var client1 = redis.createClient(6379, '127.0.0.1');
var client2 = redis.createClient(6379, '127.0.0.1');
client1.select(2, function(err){
console.log('client 1 is using database 2');
});
client2.select(4, function(err){
console.log('client 2 is using database 4');
});
Then, wherever your read/write logic is, you just use the appropriate client:
client1.set("someKey", "teacherID", function(err){
// ...
});
client2.set("someKey", "studentID", function(err){
// ...
});
You can obviously encapsulate the above into functions with callbacks, nest the operations, use some async library, etc., to make it do whatever you need. If you need to transfer values from database 2 to database 4, you could do a simple client1.get() followed by a client2.set(), as sketched below.
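For example (the key name is just a placeholder):

client1.get("someKey", function(err, value) {
  if (err) return console.error(err);

  // write the value read from database 2 into database 4
  client2.set("someKey", value, function(err) {
    if (err) return console.error(err);
    console.log("copied someKey from database 2 to database 4");
  });
});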
