Titan graph database too slow with 100,000+ vertices with indices, how to optimize it?

Here is the code that creates the indices:
g = TitanFactory.build().set("storage.backend", "cassandra")
        .set("storage.hostname", "127.0.0.1").open();
TitanManagement mgmt = g.getManagementSystem();

PropertyKey db_local_name = mgmt.makePropertyKey("db_local_name")
        .dataType(String.class).make();
mgmt.buildIndex("byDb_local_name", Vertex.class).addKey(db_local_name)
        .buildCompositeIndex();

PropertyKey db_schema = mgmt.makePropertyKey("db_schema")
        .dataType(String.class).make();
mgmt.buildIndex("byDb_schema", Vertex.class).addKey(db_schema)
        .buildCompositeIndex();

PropertyKey db_column = mgmt.makePropertyKey("db_column")
        .dataType(String.class).make();
mgmt.buildIndex("byDb_column", Vertex.class).addKey(db_column)
        .buildCompositeIndex();

PropertyKey type = mgmt.makePropertyKey("type").dataType(String.class)
        .make();
mgmt.buildIndex("byType", Vertex.class).addKey(type)
        .buildCompositeIndex();

PropertyKey value = mgmt.makePropertyKey("value")
        .dataType(Object.class).make();
mgmt.buildIndex("byValue", Vertex.class).addKey(value)
        .buildCompositeIndex();

PropertyKey index = mgmt.makePropertyKey("index")
        .dataType(Integer.class).make();
mgmt.buildIndex("byIndex", Vertex.class).addKey(index)
        .buildCompositeIndex();

mgmt.commit();
Here is the code that searches for vertices and then adds a vertex with 3 edges, running on a 3 GHz PC with 2 GB RAM. It processes 830 vertices in 3 hours, and I have 100,000 rows of data, so it is far too slow. The code is below:
for (Object[] rowObj : list) {
    // TXN_ID
    Iterator<Vertex> iter = g.query()
            .has("db_local_name", "Report Name 1")
            .has("db_schema", "MPS").has("db_column", "txn_id")
            .has("value", rowObj[0]).vertices().iterator();
    if (iter.hasNext()) {
        vertex1 = iter.next();
        logger.debug("vertex1=" + vertex1.getId() + ","
                + vertex1.getProperty("db_local_name") + ","
                + vertex1.getProperty("db_schema") + ","
                + vertex1.getProperty("db_column") + ","
                + vertex1.getProperty("type") + ","
                + vertex1.getProperty("index") + ","
                + vertex1.getProperty("value"));
    }
    // TXN_TYPE
    iter = g.query().has("db_local_name", "Report Name 1")
            .has("db_schema", "MPS").has("db_column", "txn_type")
            .has("value", rowObj[1]).vertices().iterator();
    if (iter.hasNext()) {
        vertex2 = iter.next();
        logger.debug("vertex2=" + vertex2.getId() + ","
                + vertex2.getProperty("db_local_name") + ","
                + vertex2.getProperty("db_schema") + ","
                + vertex2.getProperty("db_column") + ","
                + vertex2.getProperty("type") + ","
                + vertex2.getProperty("index") + ","
                + vertex2.getProperty("value"));
    }
    // WALLET_ID
    iter = g.query().has("db_local_name", "Report Name 1")
            .has("db_schema", "MPS").has("db_column", "wallet_id")
            .has("value", rowObj[2]).vertices().iterator();
    if (iter.hasNext()) {
        vertex3 = iter.next();
        logger.debug("vertex3=" + vertex3.getId() + ","
                + vertex3.getProperty("db_local_name") + ","
                + vertex3.getProperty("db_schema") + ","
                + vertex3.getProperty("db_column") + ","
                + vertex3.getProperty("type") + ","
                + vertex3.getProperty("index") + ","
                + vertex3.getProperty("value"));
    }
    vertex4 = g.addVertex(null);
    vertex4.setProperty("db_local_name", "Report Name 1");
    vertex4.setProperty("db_schema", "MPS");
    vertex4.setProperty("db_column", "amount");
    vertex4.setProperty("type", "indivisual_0");
    vertex4.setProperty("value", rowObj[3].toString());
    vertex4.setProperty("index", i);
    vertex1.addEdge("data", vertex4);
    logger.debug("vertex1 added");
    vertex2.addEdge("data", vertex4);
    logger.debug("vertex2 added");
    vertex3.addEdge("data", vertex4);
    logger.debug("vertex3 added");
    i++;
    g.commit();
}
Is there any way to optimize this code?

For completeness, this question was answered in the Aurelius Graphs mailing list:
https://groups.google.com/forum/#!topic/aureliusgraphs/XKT6aokRfFI
Basically:
1. Build and use a real composite index over the combination of keys you query on:
   mgmt.buildIndex("by_local_name_schema_value", Vertex.class).addKey(db_local_name).addKey(db_schema).addKey(value).buildCompositeIndex();
2. Don't call g.commit() after each loop cycle; instead do something like this: if (++i % 10000 == 0) g.commit(); (see the sketch after this list).
3. Turn on storage.batch-loading if you are not already doing so.
4. If all you can throw at Cassandra is 2 GB of RAM, consider using BerkeleyDB. Cassandra prefers 4 GB of RAM minimum and would probably like "more".
5. I don't know the nature of your data, but can you pre-sort it and use BatchGraph as described in the Powers of Ten - Part I blog post and in the wiki? Using BatchGraph would prevent you from having to maintain the transaction described in number 2 above.
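As a rough sketch of how points 1-3 fit together (variable names such as g, list, rowObj, and i follow the question, the combined composite index from point 1 is assumed to already exist, and the vertex2/vertex3 lookups and their edges are elided for brevity):

// open the graph with batch loading enabled (point 3)
g = TitanFactory.build().set("storage.backend", "cassandra")
        .set("storage.hostname", "127.0.0.1")
        .set("storage.batch-loading", "true").open();

int i = 0;
for (Object[] rowObj : list) {
    // this lookup can now use the combined composite index (point 1)
    Iterator<Vertex> iter = g.query()
            .has("db_local_name", "Report Name 1")
            .has("db_schema", "MPS")
            .has("db_column", "txn_id")
            .has("value", rowObj[0]).vertices().iterator();
    Vertex vertex1 = iter.hasNext() ? iter.next() : null;
    // ... look up vertex2 (txn_type) and vertex3 (wallet_id) the same way ...

    Vertex vertex4 = g.addVertex(null);
    vertex4.setProperty("db_local_name", "Report Name 1");
    vertex4.setProperty("db_schema", "MPS");
    vertex4.setProperty("db_column", "amount");
    vertex4.setProperty("value", rowObj[3].toString());
    vertex4.setProperty("index", i);
    vertex1.addEdge("data", vertex4);
    // ... add the edges from vertex2 and vertex3 ...

    // commit once every 10,000 rows instead of once per row (point 2)
    if (++i % 10000 == 0) {
        g.commit();
    }
}
g.commit(); // flush the final partial batch

Batching the commits and answering each lookup from a single combined index should remove most of the per-row overhead behind the 830-vertices-in-3-hours figure.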

Related

creating a file in google cloud storage - IOError: Buffer is closed

I created a Python script to pull data from the Facebook Ads API and create a file in Google Cloud Storage, using Google App Engine.
I'm getting the following error while writing the data to Google Cloud Storage, although the data displays properly in the web browser:
IOError: Buffer is closed.
After some research I understood that this error can occur when the end of line ("\n") is not recognized, so the entire file is treated as a single line and the "Buffer is closed" error is raised.
So I added the following code, and the rows now display properly in the web browser, but I still get the error while writing to GCS.
data1=data.replace("\n", "<br />")
Code:
class get_region_insights(webapp.RequestHandler):
    _apptitle = None
    _projectid = None
    _projectnumber = None

    def get(self):
        #bucket_name = os.environ.get('BUCKET_NAME', app_identity.get_default_gcs_bucket_name())
        cfg = appsettings()
        for i in cfg._templates:
            id = int(i['_id'])
            if id == 7:
                ### Read variables from config file
                bucket_name = i['_gcsbucket']
                bucket = '/' + bucket_name
                filename = bucket + '/' + i['_filename'] + str(time.strftime("%d_%m_%Y")) + ".csv"
                ad_acct = i['_ad_acct']
                app_id = i['_app_id']
                app_secret = i['_app_secret']
                access_token = i['_access_token']
                needed_keys = ast.literal_eval(i['_needed_keys'])
                self.tmp_filenames_to_clean_up = []
                u = date.today()
                sub_days = 1
                s = u - timedelta(sub_days)
                until = str(u)
                since = str(s)
                params = {
                    'fields': [
                        FBAdsInsights.Field.account_id,
                        FBAdsInsights.Field.campaign_id,
                        FBAdsInsights.Field.campaign_name,
                        FBAdsInsights.Field.adset_id,
                        FBAdsInsights.Field.adset_name,
                        FBAdsInsights.Field.impressions,
                        FBAdsInsights.Field.spend,
                        FBAdsInsights.Field.actions,
                        FBAdsInsights.Field.action_values,
                    ],
                    'filtering': [
                        {
                            'field': 'action_type',
                            'operator': 'IN',
                            'value': ["link_click", "comment", "post_reaction", "post", "offsite_conversion.fb_pixel_purchase"]  #custom rule filter
                        },
                    ],
                    'level': 'adset',
                    'time_range': {
                        'since': since,  #user input field
                        'until': until   #specify dynamic date range between (today() - (days_entered)) and today()
                    },
                    'time_increment': 1,
                    'breakdowns': ['region'],
                    'action_breakdowns': ['action_type'],
                }
                ### Initializing Google cloud Storage Object
                write_retry_params = _gcs.RetryParams(backoff_factor=1.1)
                gcs_file = _gcs.open(filename, 'w', content_type='text/plain', retry_params=write_retry_params)
                ### Facebook Initialization
                session = FacebookSession(app_id, app_secret, access_token)
                api = FacebookAdsApi(session)
                FacebookAdsApi.set_default_api(api)
                ad_account = FBAdAccount(ad_acct)
                stats = ad_account.get_insights(params=params, async=True)
                stats.remote_read()
                while stats[AdReportRun.Field.async_percent_completion] < 100:
                    time.sleep(1)
                    stats.remote_read()
                time.sleep(1)
                stats1 = stats.get_result()
                x = [x for x in stats1]
                ### Printing the result and writing to Google Cloud Storage
                for i in x:
                    for k in i['actions']:
                        if k['action_type'] == "offsite_conversion.fb_pixel_purchase":
                            Purchase_Facebook_Pixel = k['value']
                        if k['action_type'] == "comment":
                            post_comment = k['value']
                        if k['action_type'] == "link_click":
                            link_click = k['value']
                        if k['action_type'] == "post":
                            post_share = k['value']
                        if k['action_type'] == "post_reaction":
                            post_reaction = k['value']
                    for m in i['action_values']:
                        if m['action_type'] == "offsite_conversion.fb_pixel_purchase":
                            Purchase_Conversion_Value_Facebook_Pixel = m['value']
                    data = (i['account_id'] + "," + i['adset_id'] + "," + i['campaign_id'] + "," + i['date_start'] + "," + i['date_stop'] + "," + i['impressions'] + "," + i['region'] + "," + i['spend'] + "," + link_click + "," + Purchase_Facebook_Pixel + "," + Purchase_Conversion_Value_Facebook_Pixel + "\n")
                    data1 = data.replace("\n", "<br />")
                    self.response.write(data.replace("\n", "<br />"))
                    #self.response.write("\n"+i['account_id'] + "," + i['adset_id'] + "," + i['adset_name'] + "," + i['age'] + "," + i['campaign_id'] + "," +i['campaign_name'] + "," + i['date_start'] + "," + i['date_stop'] + ","+i['gender'] + ","+ i['impressions']+","+ i['spend']+ "," + link_click + "," + post_comment + "," + post_share + "," + post_reaction + "," + Purchase_Facebook_Pixel + "," + Purchase_Conversion_Value_Facebook_Pixel+"\n")
                    gcs_file.write(data1.encode('utf-8'))
                    gcs_file.close()
                    self.tmp_filenames_to_clean_up.append(filename)
You are opening the cloud storage file outside your loop, but then you close it inside the loop.
### Initializing Google cloud Storage Object
write_retry_params = _gcs.RetryParams(backoff_factor=1.1)
gcs_file = _gcs.open(filename, 'w', content_type='text/plain', retry_params=write_retry_params)
### Facebook Initialization
...
### Printing the result and writing to Google Cloud Storage
for i in x:
    # do stuff with data
    ...
    gcs_file.write(data1.encode('utf-8'))
    gcs_file.close()  # <-- closing the file buffer
    self.tmp_filenames_to_clean_up.append(filename)
If you want to write one file for each loop iteration, open and close the file inside the loop.
If you want to write all the data to a single file, open and close the file outside the loop.

how to create pymol rename loop

I would like to create a loop for changing interaction names in PyMOL, but after one selection loop it crashes and doesn't work.
def get_dists(interactions): # interactions=([1,2], [3,4])
    for i in interactions:
        a = "////" + str(i[0]) + "/C2'"
        b = "////" + str(i[1]) + "/C2'"
        cmd.distance("(" + a + ")", "(" + b + ")")
        for j in range(1, 599):
            x = "dist" + "0" + str(j)
            y = str(i[0]) + " " + str(i[1])
            cmd.set_name(str(x), str(y))
In PyMOL the default names of interactions are dist01, dist02, dist03.
I want to change these to 1_3, 5_59, 4_8 (interactions between residues).
Your code is basically fine except for one thing: if PyMOL doesn't succeed with set_name, the whole script is aborted. When you change it to the following, it should work:
try:
    cmd.set_name(str(x), str(y))
except:
    print('failed to rename')
Some additional comments:
y = str(i[0]) + " " + str(i[1]) should be y = str(i[0]) + "_" + str(i[1])
the line x = "dist" + "0" + str(j) is probably meant to pad a zero. This is only needed when j is a single digit; otherwise the names of the distance objects are dist20 or dist123
cmd.set_name(str(x), str(y)) can be simplified to cmd.set_name(x, y) since x and y are already strings.

I have custom code that I need to bold for output. What google sheets script can I use?

I have created a customized work order submission form in Forms & Sheets that auto-emails a confirmation for each submission (job request form) to create a data trail of vendor activity. It is fairly integrated and totally cobbled together from a lot of reading in these forums coupled with a gazillion frustrating moments of trial & error. I'm a novice moving towards "capable", but I'm stuck on a piece of code for a triggered confirmation email with a random work order generator, email confirmations, and toggle-based management built in. The code below, which I actually need help with, is for that triggered confirmation email that sends a confirmation of service and the work order #, and also shows everything the requestor originally submitted. The problem is that the code I have provides the data exactly how I want it and the placement is great, but I need to create visual distinction between the column titles and the variable submission data. Can someone please help me add bolding to the column titles in line 16 to create that visual differentiation between the columnar "category" and the submission data?
// This constant is written in column C for rows for which an email
// has been sent successfully.
var EMAIL_SENT = "EMAIL_SENT";

function sendEmails2() {
  var sheet = SpreadsheetApp.getActiveSheet();
  var startRow = 2;   // First row of data to process
  var numRows = 1000; // Number of rows to process
  // Fetch the range of cells A2:B3
  var dataRange = sheet.getRange(startRow, 1, numRows, 27);
  // Fetch values for each row in the Range.
  var data = dataRange.getValues();
  for (var i = 0; i < data.length; ++i) {
    var row = data[i];
    var emailAddress = row[19];
    var message = row[16] + "\n\n" + "Submitted By: " + row[19] + "\n\n" + "Date Submitted: " + row[0] + "\n\n" + row[21] + "\n\n" + "IMPORTANT NOTES FROM CDS: " + row[20] + "\n\n" + "Full Show Services: " + row[3] + "\n\n" + "Event Start Date: " + row[4] + "\n\n" + "Event End Date: " + row[5] + "\n\n" + "Warehouse Locations: " + row[6] + "\n\n" + "Individual Services Requested: " + row[7] + "\n\n" + "Individual Services - Warehouse(s) & Date(s) Requested: " + row[8] + "\n\n" + "Partial Hourly Staffing Details Requested: " + row[9] + "\n\n" + "Requestors Instructions / Comments: " + row[10] + "\n\n" + "Files: " + row[11] + row[12] + "\n\n" + "Thank you for your request. We appreciate your business. CDS Special Events Team "; // Second column
    var emailSent = row[18];
    var subject = row[16]; // Third column
    var ss = SpreadsheetApp.getActiveSpreadsheet();
    if (emailSent != EMAIL_SENT) { // Prevents sending duplicates
      MailApp.sendEmail(emailAddress, subject, message);
      sheet.getRange(startRow + i, 19).setValue(EMAIL_SENT);
      // Make sure the cell is updated right away in case the script is interrupted
      SpreadsheetApp.flush();
    }
  }
}

creating a 2D array from String arrays

I have built a program which takes 13 comma-separated values of user input and appends them to a text file, each on a new line. Not too difficult, but I am new.
Now I am trying to bring these single-line arrays back from the file into a 2D array where I hope to work with the values. I think I am close, but there is an obvious mistake that my inexperience does not allow me to see. I also know that there are much better class strategies written to handle this task, but again... newbie.
What I think is happening is that the entire file is being written into one location in the 2D array (I can sysout the finalArray, and it looks right, but only in position [0][0]). So how do I get each "z" to fill the next open slot as it processes through? Hope you can help; here is my code:
String[][] finalArray = new String[100][13];
int i = -1;
try
{
    x = new BufferedReader(new FileReader(readFile));
} catch (FileNotFoundException e1)
{
    e1.printStackTrace();
}
try
{
    while ((line = x.readLine()) != null)
    {
        String[] y = line.split(separator);
        try
        {
            z = "[" + y[0] + "," + y[1] + "," + y[2] + "," + y[3]
                    + "," + y[4] + "," + y[5] + "," + y[6] + "," + y[7]
                    + "," + y[8] + "," + y[9] + "," + y[10] + y[11]
                    + "," + y[12] + "," + y[13] + "]";
            finalArray[i+1][0] = z;
z is not an array, it is a String, but you have an array of String arrays.
Therefore your loop must look like this:
i = 0; // array indices start at 0
while ((line = x.readLine()) != null)
{
    finalArray[i++] = line.split(separator);
}
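For completeness, here is a minimal self-contained sketch of that corrected reading loop; readFile and separator are placeholder values standing in for whatever the question's program actually uses:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class CsvTo2DArray {
    public static void main(String[] args) throws IOException {
        String readFile = "data.txt"; // placeholder path
        String separator = ",";       // placeholder separator
        String[][] finalArray = new String[100][];

        int i = 0; // next free row in finalArray
        try (BufferedReader x = new BufferedReader(new FileReader(readFile))) {
            String line;
            while ((line = x.readLine()) != null && i < finalArray.length) {
                // each line's 13 values become one row of the 2D array
                finalArray[i++] = line.split(separator);
            }
        }

        // finalArray[0] holds the values of the first line, finalArray[1] the
        // second line, and so on; rows beyond i remain null
        if (i > 0) {
            System.out.println(finalArray[0][0]);
        }
    }
}

This way each row of finalArray is a String[] of the split values, instead of one big concatenated String ending up in a single cell.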

Neo4J modifying nodes ignored

I'm using this code for opening a database:
GraphDatabaseAPI graphdb2 = (GraphDatabaseAPI) new GraphDatabaseFactory()
        .newEmbeddedDatabaseBuilder("D:\\test\\neo4j\\data")
        .setConfig(ShellSettings.remote_shell_enabled, "TRUE")
        .setConfig(GraphDatabaseSettings.node_keys_indexable,
                USERNAME_PROPERTY + "," + TITLE_PROPERTY + ","
                        + NAME_PROPERTY + "," + LABEL_PROPERTY + "," + TYPE_PROPERTY)
        .setConfig(GraphDatabaseSettings.relationship_keys_indexable,
                USERNAME_PROPERTY)
        .setConfig(GraphDatabaseSettings.node_auto_indexing, "true")
        .setConfig(GraphDatabaseSettings.relationship_auto_indexing, "true")
        .newGraphDatabase();

ServerConfigurator config;
config = new ServerConfigurator(graphdb2);
config.configuration().setProperty(
        Configurator.WEBSERVER_PORT_PROPERTY_KEY, 1234);

srv = new WrappingNeoServerBootstrapper(graphdb2, config);
srv.start();

graphDb = srv.getServer().getDatabase().getGraph();
registerShutdownHook(graphDb);
However, when my app stops running, all of the modifications are ignored.
Why is that?
