Ruby Script unable to gather data

#!/usr/bin/ruby
# Fetches all Virginia Tech classes from the timetable and writes them out as a JSON object
# Takes an optional output file name as the first argument; defaults to classes.json
require 'rubygems'
require 'mechanize'
require 'nokogiri'
require 'json'

# Create the Mechanize browser and the class-data hash to load data into
agent = Mechanize.new
classData = Hash.new

# Get the subjects from the timetable page
page = agent.get("https://banweb.banner.vt.edu/ssb/prod/HZSKVTSC.P_ProcRequest")
subjects = page.forms.first.field_with(:name => 'subj_code').options

# Loop over the subjects
subjects.each do |subject|
  code = subject.value # the option's submitted value, not the Option object itself
  # Get the timetable request page and its form
  timetableSearch = agent.get("https://banweb.banner.vt.edu/ssb/prod/HZSKVTSC.P_ProcRequest")
  searchDetails = timetableSearch.forms.first # the freshly loaded page, not the first one
  # Fill in the form for this specific subject (field names are case-sensitive)
  searchDetails.set_fields({
    :subj_code => code,
    :TERMYEAR => '201401',
    :CAMPUS => 0
  })
  # Submit the form and parse the results into courseListings
  courseListings = Nokogiri::HTML(
    searchDetails.submit(searchDetails.buttons[0]).body
  )
  # Create an array in the hash to store all classes for this subject
  classData[code] = []
  # For every class (table row)
  courseListings.css('table.dataentrytable/tr').each do |course|
    # Skip additional time-session rows (2nd row), which have no CRN
    next unless course.at_xpath('td[1]/p/a/b/text()').to_s.strip.length > 2
    subjectClassesDetails = Hash.new
    # Map the table cells for each course to the appropriate keys
    [
      [:crn, 'td[1]/p/a/b/text()'],
      [:course, 'td[2]/font/text()'],
      [:title, 'td[3]/text()'],
      [:type, 'td[4]/p/text()'],
      [:hrs, 'td[5]/p/text()'],
      [:seats, 'td[6]/text()'],
      [:instructor, 'td[7]/text()'],
      [:days, 'td[8]/text()'],
      [:begin, 'td[9]/text()'],
      [:end, 'td[10]/text()'],
      [:location, 'td[11]/text()'],
      # [:exam, 'td[12]/text()']
    ].each do |name, xpath|
      subjectClassesDetails[name] = course.at_xpath(xpath).to_s.strip
    end
    # Add the class to the subject's array
    classData[code].push(subjectClassesDetails)
  end
end

# Write the data to the JSON file
File.open(ARGV[0] || "classes.json", 'w') do |file|
  file.print JSON.pretty_generate(classData)
end
The above code is supposed to retrieve data from https://banweb.banner.vt.edu/ssb/prod/HZSKVTSC.P_ProcRequest, but if I print subjects.length it prints 0, so it clearly isn't getting the correct data. The given term code "201401" is definitely the right one.
I've noticed that when I open the link manually in my browser, the subject field doesn't let you select an option until a term is selected; however, when I view the page source, the data is clearly already there. What can I do to retrieve this data?

I'm looking at that VT page, and I can see that you need to select a TERMYEAR before the subj_code dropdown is filled with options. Unfortunately, this happens in JavaScript, in the function dropdownlist(listindex). Mechanize doesn't run JavaScript, so this script is doomed to fail.
Your options are to run a browser automation tool like Watir or Selenium (discussed in "How do I use Mechanize to process JavaScript?").
Or to read the source of that page and parse out the values of these lines:
document.ttform.subj_code.options[0]=new Option("All Subjects","%",false, false);
document.ttform.subj_code.options[1]=new Option("AAEC - Agricultural and Applied Economics","AAEC",false, false);
document.ttform.subj_code.options[2]=new Option("ACIS - Accounting and Information Systems","ACIS",false, false);
to get the options. You can fetch the page source simply with open-uri:
require 'open-uri'
# URI.open is needed on Ruby 3.0+, where plain open() no longer opens URLs
page = URI.open("https://banweb.banner.vt.edu/ssb/prod/HZSKVTSC.P_ProcRequest")
page_source = page.read
Now you can use a regex to scan for all the options:
page_source.scan(/document\.ttform.+;/)
That gives you an array of all the lines containing the JavaScript that defines the options. Refine the regex a little and you can extract the option text from those lines. I'll see if I can come up with something for that and post back. Hopefully this gets you headed in the right direction.
I'm back. I was able to parse out all the subj_code options with this regex:
subjects = page_source.scan(/Option\("(.*?)"/).flatten.uniq # flatten the capture groups, remove duplicates
subjects.shift # get rid of the first option because it's just "All Subjects"
subjects.size # => 137
Hope that helps.
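For comparison, here is a minimal Python sketch of the same parsing step, assuming every option line has exactly the format of the three lines quoted above. The regex above captures only the first Option argument (the label, e.g. "AAEC - Agricultural and Applied Economics"); the form presumably submits the second argument (e.g. AAEC), so this sketch captures both. parse_subject_options is a hypothetical helper name.

```python
import re

# Matches lines like:
# document.ttform.subj_code.options[1]=new Option("AAEC - ...","AAEC",false, false);
# Group 1 is the visible label, group 2 is the value the form submits.
OPTION_RE = re.compile(r'subj_code\.options\[\d+\]=new Option\("(.*?)","(.*?)"')


def parse_subject_options(page_source):
    """Return (label, value) pairs for subj_code, in page order, without duplicates."""
    seen = set()
    options = []
    for label, value in OPTION_RE.findall(page_source):
        # "%" is the "All Subjects" wildcard, not a real subject
        if value == "%" or value in seen:
            continue
        seen.add(value)
        options.append((label, value))
    return options


sample = '''
document.ttform.subj_code.options[0]=new Option("All Subjects","%",false, false);
document.ttform.subj_code.options[1]=new Option("AAEC - Agricultural and Applied Economics","AAEC",false, false);
document.ttform.subj_code.options[2]=new Option("ACIS - Accounting and Information Systems","ACIS",false, false);
'''

print(parse_subject_options(sample))
```

Deduplicating by value mirrors the .uniq call above, in case the page repeats the option list.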

Related

Cakephp 3 - How to integrate external sources in table?

I'm working on an application that has its own database and gets user information from another service (LDAP in this case, through an API package).
Say I have a table called Articles, with a column user_id. There is no Users table; instead, a user or set of users is retrieved through the external API:
$user = LDAPConnector::getUser($user_id);
$users = LDAPConnector::getUsers([1, 2, 5, 6]);
Of course, I want retrieving data from inside a controller to be as simple as possible, ideally still with something like:
$articles = $this->Articles->find()->contain('Users');
foreach ($articles as $article) {
echo $article->user->getFullname();
}
I'm not sure how to approach this.
Where should I place the code in the table object to allow integration with the external API?
And as a bonus question: how do I minimise the number of LDAP queries when filling the entities?
It seems to be a lot faster to first retrieve the relevant users with a single ->getUsers() call and assign them afterwards, even though iterating over the articles and calling ->getUser() for each one might be simpler.

The most simple solution would be to use a result formatter to fetch and inject the external data.
The more sophisticated solution would be a custom association and a custom association loader, but given how database-centric associations are, you'd probably also have to come up with a table and possibly a query implementation that handles your LDAP datasource. While it would be rather simple to move this into a custom association, containing the association will look up a matching table, cause the schema to be inspected, etc.
So I'll stick with providing an example for the first option. A result formatter would be pretty simple, something like this:
$this->Articles
->find()
->formatResults(function (\Cake\Collection\CollectionInterface $results) {
$userIds = array_unique($results->extract('user_id')->toArray());
$users = LDAPConnector::getUsers($userIds);
$usersMap = collection($users)->indexBy('id')->toArray();
return $results
->map(function ($article) use ($usersMap) {
if (isset($usersMap[$article['user_id']])) {
$article['user'] = $usersMap[$article['user_id']];
}
return $article;
});
});
The example makes the assumption that the data returned from LDAPConnector::getUsers() is a collection of associative arrays, with an id key that matches the user id. You'd have to adapt this accordingly, depending on what exactly LDAPConnector::getUsers() returns.
That aside, the example should be rather self-explanatory: first obtain a unique list of user IDs found in the queried articles, then obtain the LDAP users using those IDs, and finally inject the users into the articles.
If you wanted to have entities in your results, then create entities from the user data, for example like this:
$userData = $usersMap[$article['user_id']];
$article['user'] = new \App\Model\Entity\User($userData);
For better reusability, put the formatter in a custom finder. In your ArticlesTable class:
public function findWithUsers(\Cake\ORM\Query $query, array $options)
{
return $query->formatResults(/* ... */);
}
Then you can just do $this->Articles->find('withUsers'), just as simple as containing.
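To make the batching in the formatter (and the bonus question about LDAP round trips) concrete, here is a rough Python sketch of the same steps. attach_users and fake_get_users are hypothetical stand-ins, the latter for LDAPConnector::getUsers(), assumed to return dicts with an id key as in the example above. The point is that the external service is called once, no matter how many articles there are.

```python
def attach_users(articles, get_users):
    """Attach a 'user' entry to each article using a single bulk lookup.

    articles: list of dicts with a 'user_id' key.
    get_users: hypothetical bulk lookup taking a list of ids and returning
    a list of dicts with an 'id' key (stand-in for LDAPConnector::getUsers).
    """
    # 1. Unique user ids, in first-seen order
    user_ids = list(dict.fromkeys(a["user_id"] for a in articles))
    # 2. One call to the external service for all of them
    users = get_users(user_ids)
    # 3. Index by id so each article lookup is a dict access
    users_by_id = {u["id"]: u for u in users}
    # 4. Inject; articles whose user was not found are left without 'user'
    for article in articles:
        user = users_by_id.get(article["user_id"])
        if user is not None:
            article["user"] = user
    return articles


calls = []


def fake_get_users(ids):
    # Hypothetical directory standing in for LDAP; records each call
    calls.append(list(ids))
    directory = {1: "Ann", 2: "Ben", 5: "Cai"}
    return [{"id": i, "name": directory[i]} for i in ids if i in directory]


articles = [
    {"id": 10, "user_id": 1},
    {"id": 11, "user_id": 2},
    {"id": 12, "user_id": 1},
    {"id": 13, "user_id": 9},
]
attach_users(articles, fake_get_users)
```

Four articles produce a single lookup for the three distinct ids, which is the same saving the formatter gets over calling ->getUser() per article.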
See also
Cookbook > Database Access & ORM > Query Builder > Adding Calculated Fields
Cookbook > Database Access & ORM > Retrieving Data & Results Sets > Custom Finder Methods

Embedded models storage in Odoo (Inherits)

I'm creating a custom module in Odoo and I'm struggling with an inheritance issue. Let's say I have the following implementation:
from odoo import fields, models


class SuperModel(models.Model):
    _name = "model_name"
    _inherits = {'model_name.one': 'model_name_one_id',
                 'model_name.two': 'model_name_two_id'}

    selection = fields.Selection(selection=[('m1', 'Model one'), ('m2', 'Model Two')])
    model_name_one_id = fields.Many2one(comodel_name="model_name.one", ondelete="cascade")
    model_name_two_id = fields.Many2one(comodel_name="model_name.two", ondelete="cascade")


class ModelOne(models.Model):
    _name = "model_name.one"
    value_one = fields.Char()


class ModelTwo(models.Model):
    _name = "model_name.two"
    value_two = fields.Char()
What I want to achieve is that, by selecting "Model one" or "Model Two" in the main model's view, only the corresponding fields are displayed and stored in the database.
But whenever I create a SuperModel record, records are created in both the ModelOne and ModelTwo tables.
For example, if I select "Model one" and fill in value_one, an empty record is created in the ModelTwo table when saving (model_name_two_id == False). How can I prevent that?
Thank you for helping :)

OK, using delegation (_inherits) is impossible in your case, because Odoo requires every delegated Many2one to have a value; otherwise the record won't be saved. So use related fields instead, like this:
from odoo import api, fields, models


class SuperModel(models.Model):
    _name = "model_name"

    selection = fields.Selection(selection=[('m1', 'Model one'), ('m2', 'Model Two')])
    # Plain Many2one fields, not delegated through _inherits, so neither is required
    model_name_one_id = fields.Many2one(comodel_name="model_name.one", ondelete="cascade")
    model_name_two_id = fields.Many2one(comodel_name="model_name.two", ondelete="cascade")
    # Related fields show the child values on the SuperModel form
    value_one = fields.Char(related="model_name_one_id.value_one")
    value_two = fields.Char(related="model_name_two_id.value_two")

    @api.model
    def create(self, vals):
        if vals.get('value_one'):
            # A model_name.one field was filled in: create that record ourselves
            # (passing only its own field) and add its id to vals
            child = self.env['model_name.one'].create({'value_one': vals.pop('value_one')})
            vals['model_name_one_id'] = child.id
        elif vals.get('value_two'):
            # Same thing for the second Many2one
            child = self.env['model_name.two'].create({'value_two': vals.pop('value_two')})
            vals['model_name_two_id'] = child.id
        # Otherwise show an error, or create a plain record as done here
        return super(SuperModel, self).create(vals)

    def write(self, vals):
        # Handles one record at a time; supporting multi-record writes takes more code
        self.ensure_one()
        obsolete = None
        if 'value_one' in vals and not self.model_name_one_id:
            # Switching to model one: create its record and detach the model-two record
            child = self.env['model_name.one'].create({'value_one': vals.pop('value_one')})
            vals.update(model_name_one_id=child.id, model_name_two_id=False)
            obsolete = self.model_name_two_id
        elif 'value_two' in vals and not self.model_name_two_id:
            # Same thing for model_name_two_id
            child = self.env['model_name.two'].create({'value_two': vals.pop('value_two')})
            vals.update(model_name_two_id=child.id, model_name_one_id=False)
            obsolete = self.model_name_one_id
        # If the Many2one already has a value, the related field performs the update
        res = super(SuperModel, self).write(vals)
        if obsolete:
            # Delete the old record only after detaching it: with ondelete="cascade",
            # deleting it while still linked would delete this record as well
            obsolete.unlink()
        return res
I tried this solution and it works fine. In create you simply create the record of the Many2one field yourself; it's as if you, not the framework, are doing the delegating. Editing is more complex, because you need to delete the old record and then save the new one.

Extracting Arrays as values in a JSON file using AngularJS

This is sort of a three part question. I have a JSON file. A few of the values in the JSON file are arrays. Keeping that in mind:
1) On any given page, I'd only want one set of values coming out of the JSON file. For example (as you'll see in code below) my JSON file is a list of attorneys. On any given bio page, I'd obviously only want one attorney's information. I'm currently, successfully, doing this by pulling back the entire JSON and then using ng-show. But this is causing some other issues that I'll explain in later points, so I'm wondering if there's something to put in the app.factory itself to only bring back the one set in the first place.
2) As mentioned, some of the values are arrays. This comes into play two ways in this situation. One of the ways is that there is an array of quotes about the attorney that I'll need to drop into a JS array so that my JS function can loop through them. Currently, I'm hardcoding the quotes for the one test attorney but I'm really trying to figure out how to make this dynamic. This is one reason I'm trying to figure out how to bring back only one attorney's information so I can then, somehow, say his quotes go into this array.
3) Another array value is a list of his specialty areas. I have another, hardcoded, JS object, associating the short terms with the display names. I realized though, that this has two issues.
a) The JS renders after the Angular, so I can't reference that JS in the Angular code
b) I have no way, anyway, to display the JS dynamically inside the Angular code.
My solution to that aspect was to create a second JSON file holding the area hash, but besides being a little cumbersome, I'm also not sure how to dynamically display just the ones I want. E.g., if my attorney only specializes in securities and litigation, how would I tell the code to display only {{areas.securities}} and {{areas.litigation}}? So I'm open to thoughts there as well.
Here is the current, relevant code. If you need more, just ask.
Thanks.
attorneys.json (irrelevant lines removed)
{"attorneys":
[
{
"id":1,
"name":"Bob Smith",
"quotes":
[
{
"id": 1,
"quote": "Wonderful guy!",
"person": "Dovie"
},
{
"id": 2,
"quote": "If ye be wanting a haggis like no other, Bob be yer man!",
"person": "Angus McLoed"
},
{
"id": 3,
"quote": "Wotta Hottie!",
"person": "Bob's wife"
}
],
"areas": ["altdispute", "litigation", "securities"]
}
]
}
...and the relevant current JS object that I'm not sure what to do with:
var practiceareas = {
altdispute: "Alternative Dispute Resolution",
businesscorp: "Businesses & Corporations",
estateplanning: "Estate Planning",
futures: "Futures & Derivatives",
litigation: "Litigation",
productliability: "Product Liability",
realestate: "Real Estate",
securities: "Securities"
}
script.js (relevant function)
var idno = 0;
/* This is what I want replaced by the Angular pull */
var quotelist = ["\"Wonderful guy!\"<br/>-Dovie", "\"If ye be wanting a haggis like no other, Bob be yer man!\"<br/>-Angus McLoed", "\"Hubba, Hubba! What a hottie!\"<br/>-Bob's wife"];
$("#bio_quotes").html(quotelist[0]);
function quoteflip(id, total){
var src1 = quotelist[idno];
$("#bio_quotes").fadeOut(500, function(){
$("#bio_quotes").html(src1).fadeIn(500);
});
idno = (id + 1) % total;
window.setTimeout(function(){quoteflip(idno, quotelist.length);}, 5000);
}
window.setTimeout(function(){quoteflip(idno, quotelist.length);}, 500);
By the way, as far as the quotes, I'm even happy to turn the JSON into a more condensed version by removing the id and consolidating the quote and author - making it an array of strings instead of mini-objects - if that makes it easier. In fact, it might be easier as far as the function anyway.

1) You can definitely filter things out in the service/factory using Array.filter. If you want to filter on the server side, you need server-side code that does that. I'm not sure what your backend store is, but it's definitely doable.
2) Again, you can do this pretty easily with Array.map, which lets you pull specific values into a new array. If you just want the name plus each quote's text and person, you can do this with Array .filter and .map and bind the new array to your view model/scope.
3) Hmm... again, I'd disagree; this looks like the same kind of JavaScript array manipulation. As part of the transformation in points 1 and 2, you can also convert each area key to its long practice-area name. The easiest way to show the relevant practice areas is to do that mapping in the service layer.
// get matching attorney from the store by id
var matches = data.attorneys.filter(function(a) {
  return a.id === id;
});
// If match found,
if (matches.length === 1) {
  var result = matches[0];
  // map the long name for each practice area:
  // use the matching attorney's area name as key
  // and overwrite the result areas with the map
  result.areas = result.areas.map(function(a) {
    return practiceareas[a];
  });
  return result;
}
See this solution: http://embed.plnkr.co/xBPju7/preview
As for the fade in and fade out, I'll let you figure it out...
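The same filter-and-map transformation, sketched in Python purely for illustration (attorney_view is a hypothetical name; the data is copied from attorneys.json above). It also covers point 2 by turning the quote objects into the "quote"<br/>-person strings that quotelist uses, and it builds a new object instead of overwriting result.areas. That matters if the factory caches the parsed JSON: overwriting in place means a second lookup would try to map the already-converted names again.

```python
PRACTICE_AREAS = {
    "altdispute": "Alternative Dispute Resolution",
    "businesscorp": "Businesses & Corporations",
    "estateplanning": "Estate Planning",
    "futures": "Futures & Derivatives",
    "litigation": "Litigation",
    "productliability": "Product Liability",
    "realestate": "Real Estate",
    "securities": "Securities",
}


def attorney_view(data, attorney_id):
    """Return a display-ready copy of one attorney, or None unless exactly one matches."""
    # Point 1: keep only the requested attorney
    matches = [a for a in data["attorneys"] if a["id"] == attorney_id]
    if len(matches) != 1:
        return None
    attorney = matches[0]
    return {
        "name": attorney["name"],
        # Point 2: one '"quote"<br/>-person' string per quote, like quotelist
        "quotes": ['"{}"<br/>-{}'.format(q["quote"], q["person"]) for q in attorney["quotes"]],
        # Point 3: area keys -> display names; unknown keys are skipped
        "areas": [PRACTICE_AREAS[k] for k in attorney["areas"] if k in PRACTICE_AREAS],
    }


data = {"attorneys": [{
    "id": 1,
    "name": "Bob Smith",
    "quotes": [
        {"id": 1, "quote": "Wonderful guy!", "person": "Dovie"},
        {"id": 2, "quote": "If ye be wanting a haggis like no other, Bob be yer man!", "person": "Angus McLoed"},
        {"id": 3, "quote": "Wotta Hottie!", "person": "Bob's wife"},
    ],
    "areas": ["altdispute", "litigation", "securities"],
}]}

view = attorney_view(data, 1)
```

In the Angular version the equivalent object would be bound to the scope, with ng-repeat over its areas list, so only the attorney's own practice areas are displayed.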

How to add items to an array one by one in groovy language

I'm developing a Grails app, and I already have a domain class "ExtendedUser" which has info about users like "name", "bio", "birthDate". Now I'm planning to do statistics about users' ages, so I have created another controller, "StatisticsController", and the idea is to store all the birthDates in a local array so I can run multiple calculations on it.
class StatisticsController {
    // @Secured(["ROLE_COMPANY"])
    def teststat() {
        def user = ExtendedUser.findAll() // A list with all of the users
        def emptyList = [] // An empty list to store all the birthdates
        def k = 0
        while (k <= user.size()) {
            emptyList.add(user[k].birthDate) // Add a new birthdate to emptyList (the error)
            k++
        }
        [age: user]
    }
}
When I test it, it shows me this error message: Cannot get property 'birthDate' on null object
So my question is: what is the best way to store all the birthdates in a single array or list, so I can make calculations with it? Thank you

I prefer to use .each() in Groovy as much as possible. Read about Groovy looping here.
For this, try something like:
user.each {
    emptyList.add(it.birthDate) // 'it' is the implicit parameter of the closure passed to .each()
}
I don't have a grails environment set up on this computer so that is right off the top of my head without being tested but give it a shot.

I would use this approach:
def birthDates = ExtendedUser.findAll().collect { it.birthDate }
The collect method transforms each element of the collection and returns the transformed collection. In this case, users are being transformed into their birth dates.
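As for the error itself: the loop runs while k <= user.size(), but valid indices only go up to user.size() - 1, so on the last pass user[k] is past the end of the list, which Groovy returns as null instead of throwing. collect (or .each) avoids that index bookkeeping entirely. Here is a small Python sketch of the collect-then-calculate idea; ExtendedUser is a hypothetical stand-in for the domain class, and age_on is just one example calculation, not something from the question.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ExtendedUser:
    # Hypothetical stand-in for the Grails domain class
    name: str
    birth_date: date


def age_on(birth_date, today):
    """Whole years between birth_date and today."""
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    return today.year - birth_date.year - (0 if had_birthday else 1)


users = [ExtendedUser("Ann", date(1990, 6, 15)), ExtendedUser("Ben", date(2000, 1, 1))]

# Like users.collect { it.birthDate }: one birth date per user, no index counter
birth_dates = [u.birth_date for u in users]

# Example calculation on the collected list (a fixed "today" keeps it reproducible)
ages = [age_on(d, date(2020, 6, 14)) for d in birth_dates]
```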

Can you try:
List dates = ExtendedUser.findAll().birthDate

How can I mimic 'select_related' using google-appengine and django-nonrel?

django nonrel's documentation states: "you have to manually write code for merging the results of multiple queries (JOINs, select_related(), etc.)".
Can someone point me to any snippets that manually add the related data? @nickjohnson has an excellent post showing how to do this with the straight AppEngine models, but I'm using django-nonrel.
For my particular use I'm trying to get the UserProfiles with their related User models. This should be just two simple queries, then match the data.
However, using django-nonrel, a new query gets fired off for each result in the queryset. How can I get access to the related items in a 'select_related' sort of way?
I've tried this, but it doesn't seem to work as I'd expect. Looking at the rpc stats, it still seems to be firing a query for each item displayed.
def get_matching_model(key, queryset):
    """Return the first model in queryset whose pk equals key, or None."""
    return next((model for model in queryset if model.pk == key), None)


all_profiles = UserProfile.objects.all()
user_pks = set()
for profile in all_profiles:
    user_pks.add(profile.user_id)  # a way to access the pk without triggering the query
users = User.objects.filter(pk__in=user_pks)
for profile in all_profiles:
    profile.user = get_matching_model(profile.user_id, users)
UPDATE:
Ick... I figured out what my issue was.
I was trying to improve the efficiency of the changelist_view in the Django admin. It seemed that the select_related logic above was still producing additional queries for each row in the result set when a foreign key was in my 'list_display'. However, I traced it down to something different. The above logic does not produce multiple queries (but if you more closely mimic Nick Johnson's way, it will look a lot prettier).
The issue is that in django.contrib.admin.views.main on line 117 inside the ChangeList method there is the following code: result_list = self.query_set._clone(). So, even though I was properly overriding the queryset in the admin and selecting the related stuff, this method was triggering a clone of the queryset which does NOT keep the attributes on the model that I had added for my 'select related', resulting in an even more inefficient page load than when I started.
Not sure what to do about it yet, but the code that selects related stuff is just fine.
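On efficiency: get_matching_model scans users from the start for every profile, so matching costs roughly profiles × users comparisons. Building a pk-to-user dict once, as the ref_models dict in the answer below does, makes each lookup a single dict access. A plain-Python sketch, with hypothetical stand-in classes instead of Django models:

```python
class User:
    # Hypothetical stand-in for a Django User instance
    def __init__(self, pk, username):
        self.pk = pk
        self.username = username


class UserProfile:
    # Hypothetical stand-in: user_id holds the foreign key's pk
    def __init__(self, user_id):
        self.user_id = user_id
        self.user = None


def attach_related(profiles, users):
    # Build the pk -> user map once instead of scanning users for every profile
    users_by_pk = {u.pk: u for u in users}
    for profile in profiles:
        # None when the user was not fetched, like get_matching_model
        profile.user = users_by_pk.get(profile.user_id)
    return profiles


users = [User(1, "ann"), User(2, "ben")]
profiles = attach_related([UserProfile(2), UserProfile(1), UserProfile(7)], users)
```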

I don't like answering my own question, but the answer might help others.
Here is my solution that will get related items on a queryset based entirely on Nick Johnson's solution linked above.
from collections import defaultdict


def get_with_related(queryset, *attrs):
    """
    Adds related attributes to a queryset in a more efficient way
    than simply triggering the new query on access at runtime.
    attrs must be valid either foreign keys or one to one fields on the queryset model
    """
    # Makes a list of the entity and related attribute to grab for all possibilities
    fields = [(model, attr) for model in queryset for attr in attrs]

    # we'll need to make one query for each related attribute because
    # I don't know how to get everything at once. So, we make a list
    # of the attribute to fetch and pks to fetch.
    ref_keys = defaultdict(list)
    for model, attr in fields:
        ref_keys[attr].append(get_value_for_datastore(model, attr))

    # now make the actual queries for each attribute and store the results
    # in a dict of {pk: model} for easy matching later
    ref_models = {}
    for attr, pk_vals in ref_keys.items():
        related_queryset = queryset.model._meta.get_field(attr).rel.to.objects.filter(pk__in=set(pk_vals))
        ref_models[attr] = dict((x.pk, x) for x in related_queryset)

    # Finally put related items on their models
    for model, attr in fields:
        setattr(model, attr, ref_models[attr].get(get_value_for_datastore(model, attr)))

    return queryset


def get_value_for_datastore(model, attr):
    """
    Django's foreign key fields all have attributes 'field_id' where
    you can access the pk of the related field without grabbing the
    actual value.
    """
    return getattr(model, attr + '_id')
To be able to modify the queryset on the admin to make use of the select related we have to jump through a couple hoops. Here is what I've done. The only thing changed on the 'get_results' method of the 'AppEngineRelatedChangeList' is that I removed the self.query_set._clone() and just used self.query_set instead.
from django.contrib import admin
from django.contrib.admin.options import IncorrectLookupParameters
from django.contrib.admin.views.main import ChangeList, MAX_SHOW_ALL_ALLOWED
from django.core.paginator import InvalidPage


class UserProfileAdmin(admin.ModelAdmin):
    list_display = ('username', 'user', 'paid')
    select_related_fields = ['user']

    def get_changelist(self, request, **kwargs):
        return AppEngineRelatedChangeList


class AppEngineRelatedChangeList(ChangeList):
    def get_query_set(self):
        qs = super(AppEngineRelatedChangeList, self).get_query_set()
        related_fields = getattr(self.model_admin, 'select_related_fields', [])
        return get_with_related(qs, *related_fields)

    def get_results(self, request):
        paginator = self.model_admin.get_paginator(request, self.query_set, self.list_per_page)
        # Get the number of objects, with admin filters applied.
        result_count = paginator.count

        # Get the total number of objects, with no admin filters applied.
        # Perform a slight optimization: Check to see whether any filters were
        # given. If not, use paginator.hits to calculate the number of objects,
        # because we've already done paginator.hits and the value is cached.
        if not self.query_set.query.where:
            full_result_count = result_count
        else:
            full_result_count = self.root_query_set.count()

        # Assumes the Django 1.3-era admin; later versions use self.list_max_show_all
        can_show_all = result_count <= MAX_SHOW_ALL_ALLOWED
        multi_page = result_count > self.list_per_page

        # Get the list of objects to display on this page.
        if (self.show_all and can_show_all) or not multi_page:
            result_list = self.query_set  # no _clone(), so the related attributes survive
        else:
            try:
                result_list = paginator.page(self.page_num + 1).object_list
            except InvalidPage:
                raise IncorrectLookupParameters

        self.result_count = result_count
        self.full_result_count = full_result_count
        self.result_list = result_list
        self.can_show_all = can_show_all
        self.multi_page = multi_page
        self.paginator = paginator
