I want to create Glue metadata based on an already existing table, but I want to read values from a different S3 bucket. So essentially the LOCATION parameter and the table name in my DDL script will be different.
Can someone help me figure out how to do this via the command line? I've been going through the AWS documentation but haven't found anything helpful yet.
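For reference, here is a minimal sketch of what I'm after, using boto3 (the CLI equivalents would be aws glue get-table and aws glue create-table); the database, table, and bucket names are placeholders:

import boto3

glue = boto3.client("glue")

# Fetch the existing table's definition ("my_db" and "source_table"
# are placeholders).
src = glue.get_table(DatabaseName="my_db", Name="source_table")["Table"]

# get_table returns read-only fields (CreateTime, CreatedBy, ...) that
# create_table rejects, so keep only the writable parts.
keep = ("Name", "Description", "Owner", "Retention", "StorageDescriptor",
        "PartitionKeys", "TableType", "Parameters")
table_input = {k: v for k, v in src.items() if k in keep}

# Swap in the new table name and the new S3 location.
table_input["Name"] = "new_table"
table_input["StorageDescriptor"]["Location"] = "s3://other-bucket/path/"

glue.create_table(DatabaseName="my_db", TableInput=table_input)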
In the Snowflake Web UI, you have the option to rename and/or save worksheet "code". Where is this code stored? Is it local to the machine, in a table in Snowflake, or out in the ether of the web?
Example: a tab named "DEV Acct Perf CE" contains a series of SQL statements. Where are those statements stored?
They are stored in S3, Azure Blob Storage, or Google Cloud Storage, depending on where you're running Snowflake. Worksheets are kept in Snowflake-managed storage, so the only place you can access them is through the web UI. The newer UI, currently in preview, allows sharing between users; the current UI is single-user, so you'd need to copy and paste any statements.
Edit: You can see where they're stored by running the following, though I think the body of the worksheet is encrypted:
ls @~/worksheet_data/;
I downloaded mine and tried gunzip on the body, but that didn't work. I also tried selecting it in Snowflake using the JSON file format, but that didn't work either. I think the body field may be encrypted in addition to being compressed.
Last I checked they were stored in your personal internal stage.
Try this:
list @~;
There should be a folder in there called worksheets, if I recall correctly. I never tried opening the files to see what they look like, but I did successfully move them from one user to another when I had to recreate one of my users (a rough sketch of that move follows the docs link below).
https://docs.snowflake.com/en/user-guide/data-load-local-file-system-create-stage.html#user-stages
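For what it's worth, a sketch of that move using the Python connector's GET and PUT support, assuming you can authenticate as both users (account, user names, and credentials are placeholders, and the local directory is assumed to exist):

import snowflake.connector

# Connect once as the old user and once as the new user.
src = snowflake.connector.connect(account="my_account", user="OLD_USER", password="...")
dst = snowflake.connector.connect(account="my_account", user="NEW_USER", password="...")

# Download the old user's worksheet files to a local directory...
src.cursor().execute("GET @~/worksheet_data/ file:///tmp/worksheets/")
# ...then upload them as-is into the new user's personal stage.
dst.cursor().execute("PUT file:///tmp/worksheets/* @~/worksheet_data/ AUTO_COMPRESS=FALSE")

src.close()
dst.close()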
I am relatively new to AWS Glue, but after creating my crawler and running it successfully, I can see that a new table has been created, but I can't see any columns in it. It is completely blank.
I am using a .csv file from an S3 bucket as my data source.
Is your file UTF-8 encoded? Glue has a problem if it's not.
Does your file have at least 2 records?
Does the file have more than one column?
There are various factors that can prevent the crawler from identifying a CSV file. Please refer to the documentation below, which covers the built-in classifier and what it needs to crawl a CSV file properly; a quick local sanity check is sketched after the link.
https://docs.aws.amazon.com/glue/latest/dg/add-classifier.html
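As a rough local check against those points (this mirrors the checklist above, not Glue's exact classifier logic; the file path is a placeholder):

import csv

def looks_crawlable(path):
    try:
        with open(path, encoding="utf-8") as f:  # raises on non-UTF-8 bytes
            rows = list(csv.reader(f))
    except UnicodeDecodeError:
        return False  # the built-in CSV classifier expects UTF-8
    # At least 2 records, and more than one column in every row.
    return len(rows) >= 2 and all(len(row) > 1 for row in rows)

print(looks_crawlable("my-data.csv"))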
To start with, I'm not sure if this is possible with the existing features of Snowpipe.
I have an S3 bucket with years of data, and occasionally some of those files get updated (the contents change, but the file name stays the same). I was hoping to use Snowpipe to import these files into Snowflake, as the "we won't re-import files that have been modified" aspect is appealing to me.
However, I discovered that ALTER PIPE ... REFRESH can only be used to import files staged no earlier than seven days ago, and the only other recommendation Snowflake's documentation makes for importing historical data is to use COPY INTO .... If I use that, though, then when those old files get modified they can be re-imported, since the load metadata that prevents COPY INTO ... from re-importing the S3 files is separate from Snowpipe's metadata, so I can end up with the same file imported twice.
Is there any approach, short of "modify all those files in S3 so they have a recent modified-at timestamp", that would let me use Snowpipe with this?
If you're not opposed to a scripting solution, one option would be to write a script that pulls the set of in-scope object names from AWS S3 and feeds them to the Snowpipe REST API. The code you'd use for this is very similar to what's required when an AWS Lambda calls the Snowpipe REST API after being triggered by an S3 event notification. You can either use the AWS SDK to get the set of objects from S3, or just use Snowflake's LIST command against the stage to pull them.
I've used this approach multiple times to backfill historical data from an AWS S3 location where we enabled Snowpipe ingestion after data had already been written there. Even when you don't have to worry about a file being updated in place, this can still be an advantage over falling back to a direct COPY INTO, because you don't have to worry about any overlap between when the PIPE was first enabled and the set of files you push to the Snowpipe REST API: the PIPE's load history takes care of that for you. A minimal sketch of the approach follows.
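This sketch uses boto3 to list the objects and the snowflake-ingest Python SDK to call the Snowpipe REST API; it assumes key-pair auth is set up for the ingest user, and the bucket, prefix, account, pipe, and key-file names are all placeholders:

import boto3
from snowflake.ingest import SimpleIngestManager, StagedFile

s3 = boto3.client("s3")
staged_files = []
for page in s3.get_paginator("list_objects_v2").paginate(
        Bucket="my-bucket", Prefix="historical/"):
    for obj in page.get("Contents", []):
        # Paths must be relative to the pipe's stage location; adjust as needed.
        staged_files.append(StagedFile(obj["Key"], obj["Size"]))

ingest = SimpleIngestManager(
    account="my_account",
    host="my_account.snowflakecomputing.com",
    user="INGEST_USER",
    pipe="MY_DB.MY_SCHEMA.MY_PIPE",
    private_key=open("rsa_key.p8").read(),  # key-pair auth is required
)

# The pipe's load history deduplicates files it has already loaded,
# so resubmitting an overlap is harmless.
response = ingest.ingest_files(staged_files)
print(response["responseCode"])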
I'm trying to migrate specific objects from one database to another using sqlpackage.exe /action:Extract and sqlpackage.exe /action:Script. Currently I'm creating the script and filtering out the unneeded objects manually; I would like to exclude them altogether and automate the process. So far I haven't found any option in the documentation that does this. Thanks.
There is no way to exclude individual objects with native functionality; natively you can exclude only specific object types.
You can write your own deployment contributor and skip whatever objects you need; there is an example here.
Check out Ed Elliott's ready-to-use contributor with a bunch of configuration options (I haven't used it in a while, so I don't know how it works with newer versions of SQL Server).
Additionally, you can find a lot of useful information on Ed Elliott's blog.
I'd like to know your approach/experiences when it's time to initially populate the Grails DB that will hold your app data. Assuming you have CSVs with data, is it "safer" to create a script (with whatever tool fits you) that:
1. Generates the Bootstrap commands with the domain classes, runs it in a test or dev environment, and then uses the native DB commands to export the data to prod?
2. Creates the DB insert script assuming GORM's version = 0, manually incrementing the soon-to-be auto-generated IDs?
My fear is that the second approach may lead to inconsistencies, since Hibernate is responsible for generating the IDs, and there may be something else I'm missing.
Thanks in advance.
Take a look at this link. It allows you to run Groovy scripts in the normal Grails context, giving you access to all Grails features, including GORM. I'm currently importing data from a legacy database, and I've found that writing a Groovy script that uses the Groovy SQL interface to pull out the data and then put it into domain objects is the easiest approach. Once you have the data imported, you just use the commands specific to your database system to move that data to the production database.
Update:
Apparently the updated entry referenced from the blog entry I linked to no longer exists. I was able to get this working using the code at the following link, which is also referenced in the comments.
http://pastie.org/180868
Finally, it seems the simplest solution is to take into account that GORM, as of the current release (1.2), uses a single sequence for all auto-generated IDs. Considering this when creating whatever scripts you need (in the language of your preference) should suffice. I understand it's planned for the 1.3 release that every table gets its own sequence.