Table Not Found When Using %sql - apache-zeppelin

The following exception is thrown when executing the %sql statement for the code below:
Exception - org.apache.spark.sql.AnalysisException: Table not found: businessReviews; line 1 pos 14
Code:
val business_DF = sqlCtx.read.json("/Users/tom/Documents/Spring 2016/Java/Project/YELP/yelp/DS - YELP/yelp_academic_dataset_business.json").select("business_id", "categories", "state", "city", "name", "longitude", "latitude")
import sqlContext.implicits._
business_DF.registerTempTable("businessReviews")
%sql
select * from businessReviews
ZEPPELIN_SPARK_USEHIVECONTEXT is set to false in zeppelin-env.sh:
export ZEPPELIN_SPARK_USEHIVECONTEXT=false # Use HiveContext instead of SQLContext if set true. true by default.

Adding this line at the beginning worked for me:
spark = SparkSession.builder.master("yarn").enableHiveSupport().getOrCreate()
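For context, a minimal %pyspark sketch of where that line goes, assuming Spark 2.x on YARN and reusing the dataset path from the answer below (adjust the path to your environment):
%pyspark
from pyspark.sql import SparkSession

# Build (or reuse) a session with Hive support; getOrCreate() should return
# the session the %sql interpreter shares (assumption).
spark = SparkSession.builder.master("yarn").enableHiveSupport().getOrCreate()
business_DF = spark.read.json("/tmp/yelp_academic_dataset_business.json")
business_DF = business_DF.select("business_id", "categories", "state", "city", "name", "longitude", "latitude")
business_DF.registerTempTable("businessReviews")
After that paragraph runs, the %sql paragraph should be able to find businessReviews.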

The following works for me*:
%pyspark
business_DF = spark.read.json("/tmp/yelp_academic_dataset_business.json")
business_DF = business_DF.select("business_id", "categories", "state", "city", "name", "longitude", "latitude")
business_DF.registerTempTable("businessReviews")
%sql
select * from businessReviews
However, I wouldn't bother with the temp table for the purpose you've described. You can just use z.show(<dataframe>), e.g.:
%pyspark
business_DF = spark.read.json("/tmp/yelp_academic_dataset_business.json")
business_DF = business_DF.select("business_id", "categories", "state", "city", "name", "longitude", "latitude")
z.show(business_DF)
*Using Spark 2.0.0, Python 3.5.2 and a snapshot build of Zeppelin (#04da56403b543e661dca4485f3c5a33ac53d0ede)
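One note on the snippets above: registerTempTable is deprecated as of Spark 2.0; createOrReplaceTempView is the drop-in replacement:
business_DF.createOrReplaceTempView("businessReviews")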

Related

Migrate from an array of strings to an array of objects (JSONB)

I have an array of strings in a jsonb column in a postgres DB, which I'd like to migrate to an array of objects with 2 fields. So, turn this:
"Umbrella": [
"green|bubbles",
"blue|clouds"
],
into this:
"items": {
"umbrella": [
{
"color": "green",
"pattern": "bubbles"
},
{
"color": "blue",
"pattern": "clouds"
}
]
}
I managed to migrate the first value of the array, but I don't know how to implement a "foreach" to do this for all items.
What I tried (public.metadata is the table and metadata is the jsonb column):
update public.metadata set metadata = jsonb_set(metadata, '{items}', '{}');
update public.metadata set metadata = jsonb_set(metadata, '{items, umbrella}', '[]');
update public.metadata set metadata = jsonb_insert(metadata, '{items, umbrella, 0}', '{"color":"1", "pattern":"2"}');
update public.metadata
set metadata = jsonb_set(
    metadata,
    '{items, umbrella, 0, color}',
    to_jsonb(split_part(metadata -> 'Umbrella' ->> 0, '|', 1))
);
update public.metadata
set metadata = jsonb_set(
    metadata,
    '{items, umbrella, 0, pattern}',
    to_jsonb(split_part(metadata -> 'Umbrella' ->> 0, '|', 2))
);
I thought maybe this could lead me to the final solution, but I'm stuck.
I managed to solve it like this:
-- from array of strings to array of objects
update public.metadata set metadata = jsonb_set(metadata #- '{Umbrella}', '{items, umbrella}',
    (select jsonb_agg(
        jsonb_build_object(
            'color', (split_part(x::text, '|', 1)),
            'pattern', (split_part(x::text, '|', 2))
        )
    ) from public.metadata, jsonb_array_elements_text(metadata.metadata->'Umbrella') x)
) where metadata->'Umbrella' is not null and metadata -> 'Umbrella' != '[]'::jsonb;

Parameter validation failed:\nInvalid type for parameter Body in endpoint

Error:
{
"errorMessage": "Parameter validation failed:\nInvalid type for parameter Body, value: [[[[-0.4588235, -0.27058822, -0.44313723], [-0.4823529, -0.47450978, -0.6], [-0.7490196, -0.70980394, -0.75686276],
....
[0.18431377, 0.19215691, 0.15294123], [0.0196079, 0.02745104, -0.03529412], [-0.0745098, -0.05882353, -0.16862744], [-0.4823529, -0.5058824, -0.62352943], [-0.38039213, -0.372549, -0.4352941], [-0.4588235, -0.41960782, -0.47450978], [-0.56078434, -0.58431375, -0.62352943], [-0.4352941, -0.41960782, -0.4588235], [-0.40392154, -0.41176468, -0.45098037], [-0.30196077, -0.34117645, -0.372549], [-0.30196077, -0.29411763, -0.34117645], [-0.26274508, -0.2235294, -0.27843136], [0.00392163, -0.01176471, 0.09019613], [0.09019613, 0.09803927, -0.01176471], [0.06666672, 0.12941182, -0.05098039], [0.03529418, 0.09019613, -0.03529412], [0.09019613, 0.10588241, 0.00392163], [-0.01960784, -0.01176471, -0.05882353], [0.03529418, 0.04313731, -0.01960784], [0.13725495, 0.15294123, 0.06666672], [0.06666672, 0.07450986, 0.02745104], [0.16078436, 0.1686275, 0.20000005], [0.43529415, 0.52156866, 0.5686275], [0.5764706, 0.64705884, 0.7019608], [0.67058825, 0.7490196, 0.75686276], [0.5764706, 0.654902, 0.6627451], [0.5921569, 0.67058825, 0.6784314]]]], type: <class 'list'>, valid types: <class 'bytes'>, <class 'bytearray'>, file-like object",
"errorType": "ParamValidationError",
"stackTrace": [
[
"/var/task/lambda_function.py",
16528,
"lambda_handler",
"[ 0.5921569 , 0.67058825, 0.6784314 ]]]])"
],
[
"/var/runtime/botocore/client.py",
357,
"_api_call",
"return self._make_api_call(operation_name, kwargs)"
],
[
"/var/runtime/botocore/client.py",
649,
"_make_api_call",
"api_params, operation_model, context=request_context)"
],
[
"/var/runtime/botocore/client.py",
697,
"_convert_to_request_dict",
"api_params, operation_model)"
],
[
"/var/runtime/botocore/validate.py",
297,
"serialize_to_request",
"raise ParamValidationError(report=report.generate_report())"
]
]
}
Lambda code:
import os
import io
import boto3
import json

ENDPOINT_NAME = "tensorflow-training-2021-01-24-03-35-44-884"
runtime = boto3.client('runtime.sagemaker')

def lambda_handler(event, context):
    print("Received event: " + json.dumps(event, indent=2))
    data = json.loads(json.dumps(event))
    payload = data['data']
    print(payload)
    response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                       Body=payload)
    print(response)
    result = json.loads(response['Body'].read().decode())
    print(result)
    return result[0]
I trained the model in SageMaker using TensorFlow.
Estimator part:
pets_estimator = TensorFlow(
    entry_point='train.py',
    role=role,
    train_instance_type='ml.m5.large',
    train_instance_count=1,
    framework_version='2.1.0',
    py_version='py3',
    output_path='s3://imageclassificationtest202/data',
    sagemaker_session=sess
)
I don't know why, but I can't send the data as JSON like:
{
    "data" : [[[[90494]...]]
}
My model simply accepts a NumPy array of dimension (1, 128, 128, 3), and I'm sending that data in the JSON "data" field, but it says the format is invalid and that it needs bytes or a bytearray.
If you are following the reference from the AWS Blog, one way to solve the error would be:
data= json.dumps(event).encode('utf-8')
Then, in the invoke_endpoint call, we would set Body=data.
Therefore, in reference to your code, I would suggest you set:
payload = json.dumps(data).encode('utf-8')
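Putting that together, the handler might look roughly like the sketch below. It reuses the runtime client and ENDPOINT_NAME defined above; the ContentType value is an assumption (TensorFlow Serving endpoints generally accept application/json), and the endpoint still has to deserialize the payload on its side:
def lambda_handler(event, context):
    data = json.loads(json.dumps(event))
    # Body must be bytes, a bytearray, or a file-like object, so serialize
    # the nested list to JSON and encode it.
    payload = json.dumps(data['data']).encode('utf-8')
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType='application/json',  # assumption: the serving container accepts JSON
        Body=payload)
    result = json.loads(response['Body'].read().decode())
    return result[0]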
You have to set a serializer on your predictor. I'm not sure how to do it in your example, but when I deploy a model I set this on the returned predictor:
from sagemaker.serializers import CSVSerializer
xgb_predictor.serializer = CSVSerializer() # set the serializer type
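For a TensorFlow endpoint called through the SageMaker Python SDK (rather than boto3, as in your Lambda), the JSON equivalent would look something like this; tf_predictor is a hypothetical name for the predictor returned by deploy():
from sagemaker.serializers import JSONSerializer

# assumption: tf_predictor is the predictor returned by estimator.deploy(...)
tf_predictor.serializer = JSONSerializer()  # serialize Python lists/arrays as JSON before sending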

How do I specify a column on the foreign model in a where object using sequelize?

I am trying to create a join query in Sequelize (SQL Server & SQLite). Here is what I have so far:
db.Job.findAll({
    attributes: [
        'title',
        'id',
        [fn('SUM', literal('CASE WHEN JobResponses.result = "true" THEN 1 ELSE 0 END', 'T')), 'True'],
        [fn('SUM', literal('CASE WHEN JobResponses.result = "false" THEN 1 ELSE 0 END', 'F')), 'False'],
        [fn('SUM', literal('CASE WHEN JobResponses.result IS NULL OR JobResponses.result NOT IN ("true", "false") THEN 1 ELSE 0 END', 'Incomplete')), 'Incomplete']
    ],
    include: [
        { model: db.JobResponse, as: 'JobResponses' },
    ],
    where: where,
    group: ['Job.id', 'Job.title'],
})
I want the aggregations to only run on the JobResponses within a certain date range, so I am adding this to the where object:
where['JobResponses.createdAt'] = {
    $lt: end,
    $gt: start
}
However, that throws an error:
Error: SQLITE_ERROR: no such column: Job.JobResponses.createdAt
How do I specify a column on the foreign model in a where object using sequelize?
If you want to specify a where clause for the joined table, you can do it like this:
db.Job.findAll({
    include: [
        {
            model: db.JobResponse,
            as: 'JobResponses',
            where: {
                $and: [{ createdAt: { $lt: end } }, { createdAt: { $gt: start } }]
            }
        }
    ]
})

How to re-structure the below JSON data which is a result set of SQL query using cursors in python

def query_db(query, args=(), one=False):
    cur = connection.cursor()
    cur.execute(query, args)
    r = [dict((cur.description[i][0], value)
              for i, value in enumerate(row)) for row in cur.fetchall()]
    cur.connection.close()
    return (r[0] if r else None) if one else r

my_query = query_db("select top 1 email as email_address, status = 'subscribed', firstname, lastname from users")
json_output = json.dumps(my_query)
print json_output
Result is this:
[{
    "status": "subscribed",
    "lastname": "Engineer",
    "email": "theengineer#yahoo.com",
    "firstname": "The"
}]
What I want is this:
{
    "email_address": "yash#yahoo.com",
    "status": "subscribed",
    "merge_fields": {
        "firstname": "yash",
        "lastname": "chakka"
    }
}
I don't have a column called merge_fields in the database, but I want this merge_fields object for every email, with the first name and last name under it, so that I can post it to Mailchimp. What modification do I have to make to my cursor code to get the desired output? Any help will be appreciated. Thanks!
I'm adding this so that future users of Mailchimp API v3 can get an idea of how I achieved this.
Here is what I did: I added another function that takes the query, captures all the values it returns, and reshapes them as shown below:
def get_users_mc_format(query):
    users = query_db(query)
    new_list = []
    for user in users:
        new_list.append({
            "email_address": user["email"],
            "status": user["status"],
            "merge_fields": {
                "FNAME": user["firstname"],
                "LNAME": user["lastname"],
            },
            "interests": {
                "1b0896641e": bool(user["hardware"]),
            }
        })
    return new_list
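A quick usage sketch, assuming the same query_db helper and a hypothetical query whose column names match the keys read above (email, status, firstname, lastname, hardware):
import json

members = get_users_mc_format(
    "select email, status = 'subscribed', firstname, lastname, hardware from users")
# Each dict now has the member shape expected by the Mailchimp API v3 lists/members endpoint.
print(json.dumps(members, indent=2))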

How to specify schema for nested field in BigQuery?

The data I am working with has a nested field 'location' which looks like:
"location": {
"city": "Amherst",
"region": "NS",
"country": "CA"
},
How can I specify the schema for nested fields using the Java API?
Currently, my code looks like:
List<TableFieldSchema> fields = new ArrayList<TableFieldSchema>();
TableFieldSchema fieldLocation = new TableFieldSchema();
fieldLocation.setName("location");
fieldLocation.setType("record");
TableFieldSchema fieldLocationCity = new TableFieldSchema();
fieldLocationCity.setName("location.city");
fieldLocationCity.setType("string");
...
fields.add(fieldLocation);
fields.add(fieldLocationCity);
TableSchema schema = new TableSchema();
schema.setFields(fields);
This doesn't work as I am getting the following error:
CONFIG: {
  "error": {
    "errors": [
      {
        "domain": "global",
        "reason": "invalid",
        "message": "Record field location must have a schema."
      }
    ],
    "code": 400,
    "message": "Record field location must have a schema."
  }
}
I think you'd want to do something like the following:
List<TableFieldSchema> inner = new ArrayList<TableFieldSchema>();
List<TableFieldSchema> outer = new ArrayList<TableFieldSchema>();
TableFieldSchema fieldLocationCity = new TableFieldSchema();
fieldLocationCity.setName("city");
fieldLocationCity.setType("string");
// Add the inner fields to the list of fields in the record.
inner.add(fieldLocationCity);
...(add region & country, etc)...
TableFieldSchema fieldLocation = new TableFieldSchema();
fieldLocation.setName("location");
fieldLocation.setType("record");
// Add the inner fields to the location record.
fieldLocation.setFields(inner);
outer.add(fieldLocation);
TableSchema schema = new TableSchema();
schema.setFields(outer);
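For comparison only, here is a sketch of the same nested schema with the Python client (google-cloud-bigquery) rather than the Java API used above; the project, dataset, and table names are placeholders:
from google.cloud import bigquery

# A RECORD field takes its sub-fields through the `fields` argument.
location = bigquery.SchemaField(
    "location",
    "RECORD",
    fields=[
        bigquery.SchemaField("city", "STRING"),
        bigquery.SchemaField("region", "STRING"),
        bigquery.SchemaField("country", "STRING"),
    ],
)

client = bigquery.Client()
table = bigquery.Table("my-project.my_dataset.my_table", schema=[location])  # placeholder table reference
table = client.create_table(table)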
