PostGIS nearest neighbors query with Ecto

Could you help me write this kind of PostGIS query with the Ecto DSL?
SELECT streets.gid, streets.name
FROM
nyc_streets streets,
nyc_subway_stations subways
WHERE streets.geom && ST_Expand(subways.geom, 200)
ORDER BY ST_Distance(streets.geom, subways.geom) ASC;
I'm confused about how to select data from multiple tables without joining them on foreign keys.
Thanks

You may try to use raw queries, as in this question:
How to use raw sql with ecto Repo
It will tie you to one database, but you are using a very database-specific feature anyway. I don't think the Ecto DSL can support queries specific to PostGIS.

Related

Question on Sybase lag and over concept

I have a table like this in my Sybase database:
ID,Col1,Col2
1,100,300
2,300,400
3,400,500
4,900,1000
I want a result like the one below (cross-checking values between rows), in Sybase only:
1,100,500
2,900,1000
Since you did not specify which database you're using, I'm assuming you're using Sybase ASE (rather than Sybase IQ or Sybase SQL Anywhere, which do support lag/lead etc.).
Also, it's not quite clear what you want, since you have not defined how the relation between the various rows and columns should be interpreted. But I'm guessing you're essentially hinting at a dependency graph between Col2 -> Col1.
In ASE, you'll need to write this as a multi-step, loop-based algorithm whereby you determine the dependency graph. Since you don't know how many levels deep this will run, you need a loop rather than a self-join, and you need to keep track of the result in a temporary table.
Can't go further here... but that's the sort of approach you'll need; a rough sketch follows.
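A minimal sketch of that approach, assuming a source table named t with the columns above (generic T-SQL, untested against ASE, so treat it as a starting point rather than a drop-in solution):

-- Seed the result with chain heads: rows whose Col1 does not appear
-- as any other row's Col2.
SELECT Col1 AS chain_start, Col2 AS chain_end
INTO   #chains
FROM   t
WHERE  NOT EXISTS (SELECT 1 FROM t t2 WHERE t2.Col2 = t.Col1)

-- Repeatedly extend each chain by one link until nothing changes.
-- (Assumes simple chains with no cycles; cap the iterations in production.)
DECLARE @extended int
SELECT @extended = 1
WHILE @extended > 0
BEGIN
    UPDATE #chains
    SET    chain_end = t.Col2
    FROM   #chains, t
    WHERE  t.Col1 = #chains.chain_end
    SELECT @extended = @@rowcount
END

SELECT * FROM #chains

For the sample data this seeds (100,300) and (900,1000), then extends the first chain to 400 and finally 500, giving (100,500) and (900,1000).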

Querying a large amount of data processed by Hive

Say I have around 10-20 GB of data in HDFS as a Hive table. This has been obtained after several Map-Reduce jobs and a JOIN over two separate datasets. I need to make this queryable by the user. What options do I have?
Use Sqoop to transfer data from HDFS to an RDS instance like PostgreSQL. But I want to avoid spending so much time on data transfer. I just tested HDFS -> RDS in the same AWS region using Sqoop, and 800 MB of data takes 4-8 minutes. So you can imagine ~60 GB of data would be pretty unmanageable (at that rate, on the order of 5-10 hours). This would be my last resort.
Query Hive directly from my webserver per user request. I haven't ever heard of Hive being used like this, so I'm skeptical about it. This struck me because I just found out you can query Hive tables remotely after some port forwarding on the EMR cluster. But being new to big(ish) data, I'm not quite sure about the risks associated with this. Is it commonplace to do this?
Some other solution - how do people usually do this kind of thing? It seems like a pretty common task.
Just for completeness' sake, my data looks like this:
id    time        cat1  cat2  cat3  metrics[200]
A123  1234212133  12    ABC   24    4,55,231,34,556,123... (~200 values)
...
(time is epoch)
And my queries look like this:
select cat1, corr(metrics[2],metrics[3]),corr(metrics[2],metrics[4]),corr(metrics[2],metrics[5]),corr(metrics[2],metrics[6]) from tablename group by cat1;
I need the correlation function, which is why I've chosen PostgreSQL over MySQL.
Hive has a correlation function:
corr(col1, col2)
Returns the Pearson coefficient of correlation of a pair of numeric columns in the group.
You can simply connect to a HiveServer port via ODBC and execute queries.
Here is an example:
http://www.cloudera.com/content/cloudera/en/downloads/connectors/hive/odbc/hive-odbc-v2-5-10.html
Hue (Hadoop User Experience) has a Beeswax query editor designed specifically for exposing Hive to end users who are comfortable with SQL. This way they can run ad-hoc queries against the data residing in Hive without needing to move it elsewhere. You can see an example of the Beeswax Query Editor here: http://demo.gethue.com/beeswax/#query
Will that work for you?
What I understand from the question posted above is that you have some data (~20 GB) stored in HDFS and accessed through Hive, and you now want to query that data to perform statistical functions like correlation.
Hive has built-in functions that perform correlation.
Otherwise you can directly connect R to Hive using RHive, or even Excel to Hive using a data source.
The other solution is installing Hue, which comes with Hive editors where you can query Hive directly.

Databases: Insert from multiple servers

Real DB newbie question:
I am trying to insert user records into a DB. The id type could be an auto-incrementing serial or an INT.
How do I insert a record with a unique ID and get that ID back, while making sure that if requests are handled by multiple application servers, I don't generate duplicate IDs?
e.g.
Server 1 needs to insert: ( 'John', 'Smith', 25 )
Server 2 needs to insert: ( 'John', 'Rambo', 25 )
The app server wants the IDs of the generated records back. I can't do a select based on the attributes because:
they could be duplicated, and
it's expensive.
One solution is that each app server also inserts a (server id, server update number) combination and then selects on the basis of that.
I feel like this should be such a generic problem that there would be a much simpler solution.
I'm using PostgreSQL if it matters.
With Postgres you can use the RETURNING clause to return the value of a column, such as:
INSERT INTO table (col1,col2,col3) VALUES (1,2,3) RETURNING id;
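A fuller sketch (the users table here is invented for illustration; what matters is that the serial column draws its values from a sequence, which Postgres increments atomically, so concurrent inserts from different app servers can never receive the same id):

CREATE TABLE users (
    id         serial PRIMARY KEY,
    first_name text,
    last_name  text,
    age        int
);

-- Server 1:
INSERT INTO users (first_name, last_name, age)
VALUES ('John', 'Smith', 25)
RETURNING id;

-- Server 2 (can run concurrently; it is guaranteed a different id):
INSERT INTO users (first_name, last_name, age)
VALUES ('John', 'Rambo', 25)
RETURNING id;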
You didn't mention what language and tools you're using. That matters, as the standard doesn't really cover this, but many client platforms have their own abstractions.
In particular JDBC has Statement.getGeneratedKeys() and Statement.RETURN_GENERATED_KEYS.
I don't think there's any equivalent in the ODBC interface. I didn't find one with a quick search, though found that some vendors add it as an extension.
For other clients, it just depends on what you're using. Some ORM layers have their own handling, e.g. Hibernate (and other JPA implementations) handle key generation, as does ActiveRecord (blech), SQLAlchemy, etc.
Otherwise, as Lucas says, you can just use the PostgreSQL extension RETURNING the_key_column_names_here. (9.5 should hopefully add RETURNING PRIMARY KEY too).
(The SQL spec provides GENERATED ALWAYS but as far as I know, no standard way to return the values. Many databases don't implement GENERATED anyway.)
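For reference, a sketch of that standard identity syntax (DB2 implements it, for example; the table and column names are illustrative):

CREATE TABLE users (
    id         int GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    first_name varchar(50)
);
-- Standard SQL generates the id, but offers no portable way to
-- return it to the client; that part stays vendor-specific.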

How to map 2 tables to 1 entity class with Hibernate annotations?

I am new to Hibernate, and I have a problem. I have 2 tables (Timetable and Timetable_backup) with similar structures, because the Timetable_backup table is just a backup of the Timetable table, which contains the current data. Now I do not know how to get all data from the past to now. In Hibernate we cannot use UNION as we would in SQL. So I tried to map the 2 tables to 1 entity using inheritance and @MappedSuperclass, but it does not work for me. Please help me with this. If the context is not clear, please tell me.
Kind Regards
Nathan
Probably what you want is something like Envers, a plugin for Hibernate that takes care of versioning records in a table. You just use a couple of annotations in your classes, and it provides an interface to look up past records, among other things.
You cannot do it directly.
The typical workaround is to map the entity to the main table and use native SQL queries to access the backup table.
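For example, a native query along these lines (the column list is made up; Timetable and Timetable_backup are the tables from the question) can be run through Hibernate's native SQL API (session.createSQLQuery in Hibernate 3/4):

SELECT id, start_time, end_time FROM Timetable
UNION ALL
SELECT id, start_time, end_time FROM Timetable_backup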
By this time you might have found an answer or workaround to the problem you posted. If possible, can you please post it here so that it will help others?
Anyway, I found the following link, which explains how to map a single POJO to more than one table: Mapping same POJO to more than one table in Hibernate.
As Hibernate does not support UNION, I extracted results from 2 queries (the main table as well as the backup table) and used listTimeTable.addAll(listbackTimeTable); this gives the same result as a UNION ALL operation.
Once again, please post your implementation for the benefit of the community...
Thanks,
Shirish

Querying XML columns in SQL Server 2005

There is an XML-typed column in my company's "Contacts" table. The column holds misc data about a particular contact, e.g.
<contact>
<refno>123456</refno>
<special>a piece of custom data</special>
</contact>
The tags below contact can be different for each contact, and I must query these fragments
alongside the relational data columns in the same table.
I have used constructions like:
SELECT c.id AS ContactID, c.ContactName AS ForeName,
       c.xmlvaluesn.value('(contact/Ref)[1]', 'VARCHAR(40)') AS ref
FROM   Contacts c
INNER JOIN ParticipantContactMap pcm ON c.id = pcm.contactid
       AND pcm.participantid = 2140
WHERE  c.xmlvaluesn.exist('/contact[Ref = "118985"]') = 1
This method works OK, but it takes a while for the server to respond.
I have also investigated using the nodes() function to parse the XML nodes, and exist() to test whether a node holds the value I'm searching for.
Does anyone know a better way to query XML columns?
If you are doing one write and a lot of reads, take the parsing hit at write time, and get that data into some format that is more queryable. A first suggestion would be to parse them into a related but separate table, with name/value/contactID columns.
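A minimal sketch of that shredding step (the ContactProperties table and its columns are made up; Contacts and xmlvaluesn come from the question):

CREATE TABLE ContactProperties (
    ContactID int           NOT NULL,
    Name      nvarchar(100) NOT NULL,
    Value     nvarchar(400) NULL
);

-- Shred each child element of <contact> into one name/value row,
-- then index (Name, Value) for fast lookups.
INSERT INTO ContactProperties (ContactID, Name, Value)
SELECT c.id,
       n.value('local-name(.)', 'nvarchar(100)'),
       n.value('.', 'nvarchar(400)')
FROM   Contacts c
CROSS APPLY c.xmlvaluesn.nodes('/contact/*') AS x(n);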
I've found the MSDN XML best practices helpful for working with XML blob columns; they might provide some inspiration:
http://msdn.microsoft.com/en-us/library/ms345115.aspx#sql25xmlbp_topic4
In addition to the page mentioned by @pauljette, this page has good performance optimization advice:
http://msdn.microsoft.com/en-us/library/ms345118.aspx
There's a lot you can do to speed up the performance of XML queries, but it will never be as good as properly indexed relational data. If you are selecting one document and then querying inside just that one, you can do pretty well, but when your query needs to scan through a bunch of similar documents looking for something, it's sort of like a key lookup in a relational query plan (that is, slow).
If you have an XSD for your XML, you can import it into your database, and you can then build indexes for your XML data.
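For example (the index names are made up; XML indexes also require the table to have a clustered primary key):

CREATE PRIMARY XML INDEX IX_Contacts_Xml
    ON Contacts (xmlvaluesn);

-- Optional secondary index to speed up value-based predicates
-- like the exist() filter above.
CREATE XML INDEX IX_Contacts_Xml_Value
    ON Contacts (xmlvaluesn)
    USING XML INDEX IX_Contacts_Xml
    FOR VALUE;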
Try this:
SELECT * FROM conversionupdatelog
WHERE convert(XML, colName).value('(/leads/lead/@LeadID=''xyz@airproducts.com'')[1]', 'varchar(max)') = 'true'
