I'm working on a Rails 3 app, and we recently realized we have a duplicate index:
# from schema.rb
add_index "dogs", ["owner_id"], :name => "index_dogs_on_owner"
add_index "dogs", ["owner_id"], :name => "index_dogs_on_owner_id"
How can I check which index ActiveRecord is using for the relevant queries? Or do I even need to? If one of the indexes is removed, will ActiveRecord happily just use the other?
I can play around with it locally, but I'm not sure our production environment behaves exactly the same at the DB level.
The name of the index is arbitrary: the database engine selects indexes by the columns they cover, not by their human-readable names, and ActiveRecord never references index names when building queries. Removing either one is safe; I recommend dropping the less descriptive one, in this case index_dogs_on_owner, since the other makes clear it is on the owner_id column.
remove_index :dogs, :name => 'index_dogs_on_owner'
Cite: http://apidock.com/rails/ActiveRecord/ConnectionAdapters/SchemaStatements/remove_index
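You can see the same behavior with any engine's query planner; here is a minimal sketch using SQLite via Python's built-in sqlite3 module (table and index names mirror the schema above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dogs (id INTEGER PRIMARY KEY, owner_id INTEGER)")
# Two indexes on the same column, duplicating the situation in schema.rb:
conn.execute("CREATE INDEX index_dogs_on_owner ON dogs (owner_id)")
conn.execute("CREATE INDEX index_dogs_on_owner_id ON dogs (owner_id)")

def plan(conn):
    # EXPLAIN QUERY PLAN reports which access path the planner chose.
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM dogs WHERE owner_id = 1"
    ).fetchall()
    return " ".join(str(row) for row in rows)

before = plan(conn)            # the planner picks one of the two indexes
conn.execute("DROP INDEX index_dogs_on_owner")
after = plan(conn)             # still an index search, via the survivor
print("INDEX" in before, "INDEX" in after)
```

Because the planner matches indexes by column, whichever duplicate survives keeps the query indexed; MySQL and PostgreSQL behave the same way, which is why production should behave like your local database here.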
Related
I have a Person ActiveRecord model with some fields like :name, :age, etc.
Person has a 1:1 relationship with an Account model, where every person has an account.
I have some code that does :
Account.create!(person: current_person)
where current_person is a specified existing Person active record object.
Note: the Account table has a person_id field, and both models have has_one for each other.
Now I believe we could do something like below for bulk creation :
Account.create!([{ person: person3 }, { person: person2 }, ...])
I have an array of persons but am not sure of the best way to convert it to an array of hashes all having the same key.
Basically the reverse of Convert array of hashes to array is what I want to do.
Why not just loop over your array of objects?
[person1, person2].each{|person| Account.create!(person: person)}
But if any of the items you loop over fails Account.create!, you may be left in a bad state, so you may want to wrap the loop in an ActiveRecord transaction:
ActiveRecord::Base.transaction do
[person1, person2].each{|person| Account.create!(person: person)}
end
The create method actually persists each hash individually, as shown in the source code, so it is probably not what you are looking for. Either way, the following would do the job:
Account.create!(persons.map { |person| { person_id: person.id } })
If you need to create all records in a single database operation and are on Rails 6+, you can use the insert_all method (note that it skips validations and callbacks):
Account.insert_all(persons.map { |person| { person_id: person.id } })
For earlier versions of Rails, consider the activerecord-import gem.
# combination(1).to_a converts [1, 2, 3] to [[1], [2], [3]]
Account.import [:person_id], persons.pluck(:id).combination(1).to_a
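The mapping step the question asks about is plain Ruby, independent of ActiveRecord; a sketch with stand-in person objects (Structs here, hypothetical, not the real model):

```ruby
# Stand-ins for persisted Person records (only the id matters here).
Person = Struct.new(:id, :name)
persons = [Person.new(1, "Ann"), Person.new(2, "Bob")]

# An array of hashes that all share the same key, in the shape
# that create! and insert_all expect:
rows = persons.map { |person| { person_id: person.id } }
p rows
```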
I'm a newbie to Django as well as neo4j. I'm using Django 1.4.5, neo4j 1.9.2 and neo4django 0.1.8
I've created NodeModel for a person node and indexed it on 'owner' and 'name' properties. Here is my models.py:
from neo4django.db import models as models2
class person_conns(models2.NodeModel):
    owner = models2.StringProperty(max_length=30, indexed=True)
    name = models2.StringProperty(max_length=30, indexed=True)
    gender = models2.StringProperty(max_length=1)
    parent = models2.Relationship('self', rel_type='parent_of', related_name='parents')
    child = models2.Relationship('self', rel_type='child_of', related_name='children')

    def __unicode__(self):
        return self.name
Before I connected to the Neo4j server, I set auto-indexing to true and listed the indexable keys in the conf/neo4j.properties file as follows:
# Autoindexing
# Enable auto-indexing for nodes, default is false
node_auto_indexing=true
# The node property keys to be auto-indexed, if enabled
node_keys_indexable=owner,name
# Enable auto-indexing for relationships, default is false
relationship_auto_indexing=true
# The relationship property keys to be auto-indexed, if enabled
relationship_keys_indexable=child_of,parent_of
I followed "Neo4j: Step by Step to create an automatic index" to update the above file and manually create node_auto_index on the Neo4j server.
Below are the indexes created on neo4j server after executing syndb of django on neo4j database and manually creating auto indexes:
graph-person_conns lucene
{"to_lower_case":"true", "_blueprints:type":"MANUAL","type":"fulltext"}
node_auto_index lucene
{"_blueprints:type":"MANUAL", "type":"exact"}
As suggested in https://github.com/scholrly/neo4django/issues/123 I used connection.cypher(queries) to query the neo4j database
For Example:
listpar = connection.cypher("START no=node(*) RETURN no.owner?, no.name?",raw=True)
The above returns the owner and name of all nodes correctly. But when I query on the indexed properties instead of a node number or '*', as in:
listpar = connection.cypher("START no=node:node_auto_index(name='s2') RETURN no.owner?, no.name?",raw=True)
Above gives 0 rows.
listpar = connection.cypher("START no=node:graph-person_conns(name='s2') RETURN no.owner?, no.name?",raw=True)
Above gives
Exception Value:
Error [400]: Bad Request. Bad request syntax or unsupported method.
Invalid data sent: (' expected but-' found after graph
I tried other strings like name and person_conns instead of graph-person_conns, but each time it says that the particular index does not exist. Am I making a mistake while adding the indexes?
My project mainly depends on filtering the nodes based on properties, so this part is really essential. Any pointers or suggestions would be appreciated. Thank you.
This is my first post on stackoverflow. So in case of any missing information or confusing statements please be patient. Thank you.
UPDATE:
Thank you for the help. For the benefit of others, I would like to give an example of how to use Cypher queries to traverse/find the shortest path between two nodes.
from neo4django.db import connection
results = connection.cypher("START source=node:`graph-person_conns`(person_name='s2sp1'),dest=node:`graph-person_conns`(person_name='s2c1') MATCH p=ShortestPath(source-[*]->dest) RETURN extract(i in nodes(p) : i.person_name), extract(j in rels(p) : type(j))")
This finds the shortest path between the nodes named s2sp1 and s2c1 in the graph. Cypher queries are really cool and let you traverse nodes while limiting the number of hops, the types of relationships, and so on.
Can someone comment on the performance of this method? Also please suggest if there are any other efficient methods to access Neo4j from Django. Thank You :)
Hm, why are you using Cypher? neo4django QuerySets work just fine for the above if you set the properties to indexed=True (or not, it'll just be slower for those).
people = person_conns.objects.filter(name='n2')
The neo4django docs have some other querying examples, as do the Django docs. Neo4django executes those queries as Cypher on the backend; you really shouldn't need to drop down to writing Cypher yourself unless you have a very particular traversal pattern or a performance issue.
Anyway, to tackle your question more directly: the last example you used needs backticks to escape the index name, like
listpar = connection.cypher("START no=node:`graph-person_conns`(name='s2') RETURN no.owner?, no.name?",raw=True)
The first example should work. One thought: did you flip autoindexing on before or after saving the nodes you're searching for? If after, note that you'll have to reindex those nodes manually, either through the Java API or by re-setting properties on them, since they won't have been autoindexed.
HTH, and welcome to StackOverflow!
So here is the unformatted list (this one, an income statement, has over 50 row headers like these, so yes, automation is the way to go here).
["Revenue", "Other Revenue, Total", "Total Revenue", "Cost of Revenue, Total"...]
Here is the list after I ran each array entity (string) through my simple little sanitizer program, CleanZeeString.new.go(str).
["revenue", "other_revenue_total", "total_revenue", "cost_of_revenue_total"...]
So, I want to access Rails methods that will allow me to at least partially automate the database column creation process and migration, because this list has over 50 row headers, there are more lists, and I simply do not believe in doing things by hand anymore.
LATER (personal progress):
I'm starting to believe that a solution to this problem is going to involve getting outside of the rails "box" with regards to migrations. Yes, to solve this, I think we might have to think creatively about migrations...
I know how easy this is to do either by hand, or with the assistance of some sort of third party scripting solution, but I simply refuse. I should have been able to do this automatically last night after a couple of drinks if I wanted to. Given the array, and the fact that each column is the same type ("decimal" in rails), this should be doable in an automatic, rails-like way.
migration files are just normal ruby files. working on a solution based off that fact. time to get fancy. String#to_sym
Got it---
class CreateIncomeStatements < ActiveRecord::Migration
  def change
    f = File.open(File.join(Rails.root, 'lib', 'assets', 'is_list.json'))
    is_ary = JSON.parse(f.read)

    create_table :income_statements do |t|
      is_ary.each do |k|
        eval("t.decimal k.to_sym")
      end
      t.timestamps
    end
  end
end
I used the eval() method, and felt the ghost of my teacher slap me on the wrist, but, it worked. The key "ah hah" was re-considering the fact that migration files are just ruby files, and as such, I can just do whatever I want.
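As a side note, the eval isn't actually needed: t.decimal(k.to_sym) can be called directly inside the block, and public_send covers the case where the column type itself is dynamic. A sketch with a hypothetical stand-in for the table object (not ActiveRecord):

```ruby
# Hypothetical stand-in for the object yielded by create_table,
# recording each column definition it receives.
class FakeTable
  attr_reader :columns

  def initialize
    @columns = []
  end

  def decimal(name)
    @columns << name
  end
end

headers = ["revenue", "other_revenue_total", "total_revenue"]

t = FakeTable.new
headers.each { |k| t.decimal(k.to_sym) }                 # direct call, no eval

t2 = FakeTable.new
headers.each { |k| t2.public_send(:decimal, k.to_sym) }  # column type as data

p t.columns == t2.columns   # both approaches define the same columns
```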
I'm following instructions on haystack documentation.
I'm getting no results for SearchQuerySet().all().
I think the problem is here
$ ./manage.py rebuild_index
WARNING: This will irreparably remove EVERYTHING from your search index in connection 'default'.
Your choices after this are to restore from backups or rebuild via the `rebuild_index` command.
Are you sure you wish to continue? [y/N] y
Removing all documents from your index because you said so.
All documents removed.
Indexing 0 notes. // <-- here 0 notes!
mysite/note/search_indexes.py looks like
import datetime
import haystack
from haystack import indexes
from note.models import Note

class NoteIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    author = indexes.CharField(model_attr='user')
    pub_date = indexes.DateTimeField(model_attr='pub_date')

    def get_model(self):
        return Note

    def index_queryset(self):
        """Used when the entire index for model is updated."""
        return self.get_model().objects.filter(pub_date__lte=datetime.datetime.now())
and I have mysite/note/templates/search/indexes/note/Note_text.txt
{{ object.title }}
{{ object.user.get_full_name }}
{{ object.body }}
Debugging haystack document mentions
Do you have a search_sites.py that runs haystack.autodiscover?
Have you registered your models with the main haystack.site (usually
within your search_indexes.py)?
But none of search_sites.py, haystack.autodiscover, or haystack.site were mentioned in the first article.
I'm so confused. Are their docs dealing with different Haystack versions?
My setups are..
haystack version 2.0.0.beta
django 1.3.1
solr 3.6.0
sqlite 3
def index_queryset(self):
    """Used when the entire index for model is updated."""
    return self.get_model().objects.filter(pub_date__lte=datetime.datetime.now())
was the culprit.
I don't know why, but commenting it out fixes the problem.
I guess 'time' in my system is somehow messed up.
It should be...
def index_queryset(self, using=None):
I don't know if this will fix your issue or not, but that is the correct signature for the method.
Removing def index_queryset(self) makes sense once you see what it does: it builds a regular Django ORM QuerySet that decides which objects get put into the full-text index, and your sample limits those objects to timestamps before now.
So you really have a datetime handling problem. Check your SQL database's timezone and how it stores times.
UTC is about five hours ahead of New York, and even further ahead of the rest of the USA. SQLite caused the same problem for me by storing UTC timestamps that, compared against local time, appeared to lie in the future.
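The mismatch is easy to reproduce with plain datetimes (a sketch of the failure mode, not Haystack code): a naive timestamp written as UTC compares as being in the future against a naive local "now" taken in any zone behind UTC.

```python
from datetime import datetime, timedelta, timezone

# A pub_date stored as a naive UTC timestamp, e.g. by SQLite.
stored_pub_date = datetime(2012, 6, 1, 12, 0, 0)      # meant as 12:00 UTC

# Naive local "now" computed at the same instant in New York (UTC-5):
local_now = stored_pub_date - timedelta(hours=5)      # reads as 07:00

# pub_date__lte=datetime.datetime.now() compares the naive values directly,
# so the just-saved note looks five hours in the future and is skipped:
freshly_indexed = stored_pub_date <= local_now
print(freshly_indexed)   # False -> "Indexing 0 notes"

# Comparing the same instant as timezone-aware datetimes removes the skew:
aware_pub = stored_pub_date.replace(tzinfo=timezone.utc)
aware_now = local_now.replace(tzinfo=timezone(timedelta(hours=-5)))
print(aware_pub <= aware_now)   # True
```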
I've created a content type in Drupal 7 with 5 or 6 fields. Now I want to use a function to query them in a hook_view call back. I thought I would query the node table but all I get back are the nid and title. How do I get back the values for my created fields using the database abstraction API?
Drupal stores the fields in other tables and can automatically join them in. The storage varies depending on how the field is configured so the easiest way to access them is by using an EntityFieldQuery. It'll handle the complexity of joining all your fields in. There's some good examples of how to use it here: http://drupal.org/node/1343708
But if you're working in hook_view, you should already be able access the values, they're loaded into the $node object that's passed in as a parameter. Try running:
debug($node);
in your hook and you should see all the properties.
If you already know the IDs of the nodes (nid) you want to load, use node_load_multiple() to load them; this loads the complete nodes with all field values. To search for node IDs, EntityFieldQuery is the recommended way, but it has some limitations. You can also use the database API to query the node table for the nid (and revision ID, vid) of your nodes, then load them with node_load_multiple().
Loading complete nodes can have a performance impact, since it loads far more data than you need. If this proves to be an issue, you can try accessing the field storage tables directly (if your field values are stored in your SQL database). The schema of these tables is built dynamically depending on the field types, cardinality and other settings, so you will have to dig into your database schema to figure it out, and it will probably change as soon as you change anything about your fields.
Another solution is to build stub node entities and use field_attach_load() with a $options['field_id'] value to load only the value of a specific field. But this requires a good knowledge and understanding of the Field API.
See How to use EntityFieldQuery article in Drupal Community Documentation.
Creating A Query
Here is a basic query looking for all articles with a photo that are
tagged as a particular faculty member and published this year. In the
last 5 lines of the code below, the $result variable is populated with
an associative array with the first key being the entity type and the
second key being the entity id (e.g., $result['node'][12322] = partial
node data). Note the $result won't have the 'node' key when it's
empty, thus the check using isset, this is explained here.
Example:
<?php
$query = new EntityFieldQuery();
$query->entityCondition('entity_type', 'node')
  ->entityCondition('bundle', 'article')
  ->propertyCondition('status', 1)
  ->fieldCondition('field_news_types', 'value', 'spotlight', '=')
  ->fieldCondition('field_photo', 'fid', 'NULL', '!=')
  ->fieldCondition('field_faculty_tag', 'tid', $value)
  ->fieldCondition('field_news_publishdate', 'value', $year . '%', 'like')
  ->fieldOrderBy('field_photo', 'fid', 'DESC')
  ->range(0, 10)
  ->addMetaData('account', user_load(1)); // Run the query as user 1.

$result = $query->execute();
if (isset($result['node'])) {
  $news_items_nids = array_keys($result['node']);
  $news_items = entity_load('node', $news_items_nids);
}
?>
Other resources
EntityFieldQuery on api.drupal.org
Building Energy.gov without Views