Adding comments to database columns and retrieving them from AWS Glue

I'm trying to incorporate an AWS Glue Data Catalog into the data lake I'm building out. I'm using a few different databases and would like to add comments to columns in a few of these tables. These databases include Redshift and MySQL. I usually add a comment to a column by doing something along the lines of
COMMENT ON COLUMN table.column_name IS 'This is the comment';
Now I know that Glue has a comment field that shows in the GUI. Is there a way to sync the comment field in Glue with the comments I add to columns in a database?

In order to update metadata about a table that has been defined in the AWS Glue Data Catalog, you need to use a combination of the get_table() and update_table() methods, for example with boto3.
Here is the most naive approach to do that:
import boto3
from pprint import pprint

glue_client = boto3.client('glue')

database_name = "__SOME_DATABASE__"
table_name = "__SOME_TABLE__"

response = glue_client.get_table(
    DatabaseName=database_name,
    Name=table_name
)
original_table = response['Table']
Here original_table adheres to the response syntax defined by get_table(). However, we need to remove some fields from it so that it passes validation when we call update_table(). The list of allowed keys can be found by passing original_table directly to update_table() without any changes and reading the validation error:
allowed_keys = [
    "Name",
    "Description",
    "Owner",
    "LastAccessTime",
    "LastAnalyzedTime",
    "Retention",
    "StorageDescriptor",
    "PartitionKeys",
    "ViewOriginalText",
    "ViewExpandedText",
    "TableType",
    "Parameters"
]

updated_table = dict()
for key in allowed_keys:
    if key in original_table:
        updated_table[key] = original_table[key]
For simplicity's sake, we will change the comment of the very first column in the table:
new_comment = "Foo Bar"
updated_table['StorageDescriptor']['Columns'][0]['Comment'] = new_comment
response = glue_client.update_table(
DatabaseName=database_name,
TableInput=updated_table
)
pprint(response)
Obviously, if you want to add a comment to a specific column, you would need to extend this to:
new_comment = "Targeted Foo Bar"
target_column_name = "__SOME_COLUMN_NAME__"
for col in updated_table['StorageDescriptor']['Columns']:
if col['Name'] == target_column_name:
col['Comment'] = new_comment
response = glue_client.update_table(
DatabaseName=database_name,
TableInput=updated_table
)
pprint(response)
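If you want to mirror the COMMENT ON COLUMN values from Redshift or MySQL for many columns at once, the same steps can be wrapped in a small helper. This is only a rough sketch based on the boto3 calls above; the function name and the comments mapping are made up for illustration:

import boto3

# Fields that update_table() accepts in TableInput (same list as above)
ALLOWED_KEYS = [
    "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
    "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
    "ViewExpandedText", "TableType", "Parameters",
]

def sync_column_comments(database_name, table_name, comments):
    """comments is a dict like {'column_name': 'This is the comment'}."""
    glue_client = boto3.client('glue')

    # Fetch the current table definition from the Data Catalog
    original_table = glue_client.get_table(
        DatabaseName=database_name,
        Name=table_name
    )['Table']

    # Strip read-only fields so the payload validates as TableInput
    table_input = {k: v for k, v in original_table.items() if k in ALLOWED_KEYS}

    # Apply the comment for every column we have a value for
    for col in table_input['StorageDescriptor']['Columns']:
        if col['Name'] in comments:
            col['Comment'] = comments[col['Name']]

    glue_client.update_table(
        DatabaseName=database_name,
        TableInput=table_input
    )

# Example usage (database, table and column names are placeholders):
# sync_column_comments("__SOME_DATABASE__", "__SOME_TABLE__",
#                      {"__SOME_COLUMN_NAME__": "This is the comment"})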

Related

Is there a way to upload the byte type created with zlib to Google BigQuery?

I want to load string data, compressed with Python's zlib library, into BigQuery.
Here is example code that uses zlib to generate the data:
import zlib
import pandas as pd
string = 'abs'
df = pd.DataFrame()
data = zlib.compress(bytearray(string, encoding='utf-8'), -1)
df.append({'id' : 1, 'data' : data}, ignore_index=True)
I've also tried both methods provided by the BigQuery API, but both of them give me an error.
The schema is:
job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("id", bigquery.enums.SqlTypeNames.NUMERIC),
        bigquery.SchemaField("data", bigquery.enums.SqlTypeNames.BYTES),
    ],
    write_disposition="WRITE_APPEND"
)
Examples of methods I have tried are:
1. bigquery API
job = bigquery_client.load_table_from_dataframe(
    df, table, job_config=job_config
)
job.result()
2. pandas_gbq
df.to_gbq(destination_table, project_id, if_exists='append')
However, both give similar errors.
1. error
pyarrow.lib.ArrowInvalid: Got bytestring of length 8 (expected 16)
2. error
pandas_gbq.gbq.InvalidSchema: Please verify that the structure and data types in the DataFrame match the schema of the destination table.
Is there any way to solve this?
I want to load a Python bytestring as BigQuery BYTES data.
Thank you
The problem isn't coming from the insertion of your zlib-compressed data. The error occurs when inserting the value of your id key (the DataFrame value 1) into the NUMERIC data type in BigQuery.
The easiest solution is to change the data type of the id column in your BigQuery schema from NUMERIC to INTEGER.
However, if you really need the column to stay NUMERIC, you can convert the id values in your Python code to decimal.Decimal using the decimal library, as derived from this SO post, before loading the DataFrame into BigQuery.
You may refer to the sample code below.
from google.cloud import bigquery
import pandas
import zlib
import decimal

# Construct a BigQuery client object.
client = bigquery.Client()

# Set table_id to the ID of the table to create.
table_id = "my-project.my-dataset.my-table"

string = 'abs'
df = pandas.DataFrame()
data = zlib.compress(bytearray(string, encoding='utf-8'), -1)
record = df.append({'id' : 1, 'data' : data}, ignore_index=True)
df_2 = pandas.DataFrame(record)
df_2['id'] = df_2['id'].astype(str).map(decimal.Decimal)

dataframe = pandas.DataFrame(
    df_2,
    # In the loaded table, the column order reflects the order of the
    # columns in the DataFrame.
    columns=[
        "id",
        "data",
    ],
)

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("id", bigquery.enums.SqlTypeNames.NUMERIC),
        bigquery.SchemaField("data", bigquery.enums.SqlTypeNames.BYTES),
    ],
    write_disposition="WRITE_APPEND"
)

job = client.load_table_from_dataframe(
    dataframe, table_id, job_config=job_config
)  # Make an API request.
job.result()  # Wait for the job to complete.

table = client.get_table(table_id)  # Make an API request.
print(
    "Loaded {} rows and {} columns to {}".format(
        table.num_rows, len(table.schema), table_id
    )
)
OUTPUT: (screenshot of the successful load output omitted)
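As a quick sanity check after the load, you can read the rows back and confirm the zlib round trip. This is only a sketch building on the variables above; it assumes the data column comes back as Python bytes:

# Read the rows back and decompress the stored BYTES value
for row in client.list_rows(table_id):
    restored = zlib.decompress(row["data"]).decode("utf-8")
    print(row["id"], restored)  # should print the original string 'abs'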

Load unmapped tables in Symfony with Doctrine

I have tables in my database that are not managed by Symfony; there are no entities for these tables. They are tables from another application; I import them and use Symfony to generate statistics from the data in those tables.
How do I access these?
Can I use Doctrine and a regular repository for this?
I just want to read data, not update it.
Right now I'm using straight mysqli_connect and mysqli_query, but that just doesn't feel right in Symfony 5.
You should just be able to query with raw SQL. The following example comes straight from the docs:
// src/Repository/ProductRepository.php

// ...
class ProductRepository extends ServiceEntityRepository
{
    public function findAllGreaterThanPrice(int $price): array
    {
        $conn = $this->getEntityManager()->getConnection();

        $sql = '
            SELECT * FROM product p
            WHERE p.price > :price
            ORDER BY p.price ASC
        ';

        $stmt = $conn->prepare($sql);
        $resultSet = $stmt->executeQuery(['price' => $price]);

        // returns an array of arrays (i.e. a raw data set)
        return $resultSet->fetchAllAssociative();
    }
}
https://symfony.com/doc/current/doctrine.html#querying-with-sql

How to check if an element is in a list stored in a JSONField?

I use peewee with an existing table:
import peewee
from playhouse.postgres_ext import *

class Rules(peewee.Model):
    channels = JSONField(null=True)
    remark = peewee.CharField(max_length=500, null=True)

    class Meta:
        database = db
        db_table = 'biz_rule'
        schema = 'opr'
For example, in my table there is a record whose channels column contains:
["A012102","C012102","D012102","E012102"]
I want to check whether "A012102" is in that list. How do I write the code?
If you're using PostgreSQL 9.4+, you can use the jsonb data type via the corresponding postgres_ext.BinaryJSONField peewee field type. It has contains_any() and contains_all() methods that correspond to the PostgreSQL ?| and ?& operators (see the PostgreSQL JSON docs). So I think it'd be something like this:
from playhouse.postgres_ext import BinaryJSONField

class Rules(peewee.Model):
    channels = BinaryJSONField(null=True)
    ...

query = Rules.select().where(Rules.channels.contains_all('A012102'))
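If you only need a yes/no answer for a single channel, you can turn that query into a boolean by asking whether any matching row exists. This is only a small sketch on top of the Rules model and query from the answer above:

# True if at least one rule lists the channel "A012102"
has_channel = (Rules
               .select()
               .where(Rules.channels.contains_all('A012102'))
               .exists())

if has_channel:
    print('"A012102" is present in at least one rule')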

Getting a 'The data types nvarchar(max) and ntext are incompatible in the equal to operator' error

I am trying to populate a table with data using Django's get_or_create method. Whenever I do this it enters records into the database, but at a certain record it throws the above error. My queryset call is
r, created = Response.objects.get_or_create(
    auth_user=auth_user,
    name=surv_name,
    organization=org_id,
    category=category,
    question=question,
    present_order=present_order,
    reference=reference,
    quest_id=quest_id,
    survey_id=survey_id
)
My response table is
class Response(models.Model):
    auth_user = models.ForeignKey('AuthUser')
    survey = models.ForeignKey('Survey')
    name = models.CharField(max_length=50)
    organization = models.ForeignKey('Organization')
    tf_question_key = models.CharField(max_length=50)
    category = models.CharField(max_length=25, blank=True, null=True)
    question = models.CharField(max_length=2048)
    quest_id = models.CharField(max_length=25)
    present_order = models.IntegerField()
    reference = models.CharField(max_length=20)
    answer = models.CharField(max_length=2048)
    remediation = models.CharField(max_length=2048, blank=True, null=True)
    dt_started = models.DateTimeField(db_column='DT_Started',
                                      auto_now_add=True)  # Field name made lowercase.
    dt_completed = models.DateTimeField(db_column='DT_COMPLETED',
                                        auto_now_add=True)  # Field name made lowercase.

    class Meta:
        managed = False
        db_table = 'response'
and the local variables from the traceback at the point where the error occurs are
organization <Organization: Individual Offices>
r <Response: Response object>
user_id 2
question ('Does your written policy include the follow-up process for significant outstanding checks, including, but not limited to, checks to recording clerk, checks to tax collector, hazard insurance checks, underwriter checks or checks for mortgage payoffs and any other high risk items? ( 2.03 k )')
present_order 21
survey_id 1
reference '2.03 (k)'
quest_id 27
created True
category 'Pillar II'
surv_name 'Compliance Benchmark'
org_id 1
auth_user <AuthUser: AuthUser object>
I can add records to the table by using
r = Response(
    auth_user=auth_user,
    name=surv_name,
    organization=organization,
    category=category,
    question=question,
    present_order=present_order,
    reference=reference,
    quest_id=quest_id,
    survey_id=survey_id
)
r.save()
but I need to use get_or_create to avoid duplicating records. I am not sure why I can add records with the .save() method but not with get_or_create, and also why get_or_create adds records up to a certain one and then fails. The only things that change are question, quest_id, present_order, and reference.
I am using Python 3.4, Django 1.8.4 and SQL Server 2014.
Any insight would be greatly appreciated.
I ran into the same issue and turned on logging on SQL Server to see what was occurring. It looks like long text values are being sent as ntext, which is then compared to the nvarchar column, causing the error.
The error occurs during the SELECT inside get_or_create. Instead of using get_or_create, query for your model with startswith. Using startswith performs a LIKE check, which works. I also added a length check on the field to ensure the values match exactly rather than only sharing the same prefix.
from django.core.exceptions import ObjectDoesNotExist
from django.db.models.functions import Length

attrs = {
    'auth_user': auth_user,
    'name': surv_name,
    'organization': org_id,
    'category': category,
    'present_order': present_order,
    'reference': reference,
    'quest_id': quest_id,
    'survey_id': survey_id,
}

try:
    r = Response.objects.annotate(
        text_len=Length('question')
    ).get(
        text_len__exact=len(question),
        question__startswith=question,
        **attrs
    )
except ObjectDoesNotExist:
    r = Response.objects.create(
        question=question,
        **attrs
    )
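If you need this in several places, the lookup-then-create pattern can be wrapped in a small helper that mimics get_or_create's return value. This is only a sketch based on the snippet above; the function name is made up, and it assumes the same Response model and lookups:

def get_or_create_response(question, **attrs):
    """Rough stand-in for Response.objects.get_or_create() that avoids the
    ntext/nvarchar equality comparison by using a LIKE + length lookup."""
    try:
        obj = Response.objects.annotate(
            text_len=Length('question')
        ).get(
            text_len__exact=len(question),
            question__startswith=question,
            **attrs
        )
        return obj, False
    except ObjectDoesNotExist:
        return Response.objects.create(question=question, **attrs), True

# Usage mirrors get_or_create:
# r, created = get_or_create_response(question, auth_user=auth_user, name=surv_name, ...)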

Salesforce SOQL describe table

Is there a way to fetch a list of all fields in a table in Salesforce? DESCRIBE myTable doesn't work, and SELECT * FROM myTable doesn't work.
From within Apex, you can get this by running the following Apex code snippet. If your table/object is named MyObject__c, then this will give you a Set of the API names of all fields on that object that you have access to (this is important; even as a System Administrator, if certain fields on your table/object are not visible to you through Field Level Security, they will not show up here):
// Get a map of all fields available to you on the MyObject__c table/object,
// keyed by the API name of each field
Map<String, Schema.SObjectField> myObjectFields =
    MyObject__c.SObjectType.getDescribe().fields.getMap();

// Get a Set of the field names
Set<String> myObjectFieldAPINames = myObjectFields.keySet();

// Print out the names to the debug log
String allFields = 'ALL ACCESSIBLE FIELDS on MyObject__c:\n\n';
for (String s : myObjectFieldAPINames) {
    allFields += s + '\n';
}
System.debug(allFields);
To finish this off, and achieve SELECT * FROM MYTABLE functionality, you would need to construct a dynamic SOQL query using these fields:
List<String> fieldsList = new List<String>(myObjectFieldAPINames);
String query = 'SELECT ';

// Add in all but the last field, comma-separated
for (Integer i = 0; i < fieldsList.size() - 1; i++) {
    query += fieldsList[i] + ',';
}
// Add in the final field
query += fieldsList[fieldsList.size() - 1];

// Complete the query
query += ' FROM MyCustomObject__c';

// Perform the query (perform the SELECT *)
List<SObject> results = Database.query(query);
The describeSObject API call returns all the metadata about a given object/table, including its fields. It's available in the SOAP, REST and Apex APIs.
Try using Schema.FieldSet
Schema.DescribeSObjectResult d = Account.sObjectType.getDescribe();
Map<String, Schema.FieldSet> FsMap = d.fieldSets.getMap();
complete documentation
Have you tried DESC myTable?
For me it works fine; it's also shown in the editor's inline tips in italics. (The screenshot that was attached here is omitted.)
