I know that I can import .csv file into a pre-existing table in a sqlite database through:
.import filename.csv tablename
However, is there a method or library that can automatically create the table (and its schema), so that I don't have to manually define column1 = string, column2 = int, etc.?
Or maybe we can import everything as strings. To my limited understanding, sqlite3 seems to treat all fields as strings anyway?
Edit:
The names of the columns are not so important here (assume we can get them from the first row of the CSV file, or they could be arbitrary names). The key is to identify the value type of each column.
This seems to work just fine for me (in sqlite3 version 3.8.4):
$ echo '.mode csv
> .import data_with_header.csv some_table' | sqlite3 db
It creates the table some_table with field names taken from the first row of the data_with_header.csv file. All fields are of type TEXT.
You said yourself in the comment that it's a nontrivial problem to determine the types of columns. (Imagine a million rows that all look like numbers, but one of those rows has a Z in it. Now that column has to be typed "string".)
Though non-trivial, it's also pretty easy to get the 90% scenario working. I would just write a little Python script to do this. Python has a very nice library for parsing CSV files, and its sqlite3 interface is simple enough.
Just load the CSV, guess and check the column types, devise a CREATE TABLE statement that encapsulates this information, then emit your INSERT statements. I can't imagine this taking more than 20 lines of Python.
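For illustration, here is a minimal sketch of that approach (assuming the CSV has a header row; the file, database, and table names are just placeholders):

import csv
import sqlite3

def guess_type(values):
    # Guess a column type: INTEGER if every value parses as an int,
    # REAL if every value parses as a float, otherwise TEXT.
    for sql_type, cast in (("INTEGER", int), ("REAL", float)):
        try:
            for v in values:
                cast(v)
            return sql_type
        except ValueError:
            continue
    return "TEXT"

def csv_to_sqlite(csv_path, db_path, table):
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)   # column names come from the first row
        rows = list(reader)
    # One guessed type per column; a single non-numeric value (the "Z" case)
    # forces the whole column to TEXT.
    types = [guess_type([row[i] for row in rows]) for i in range(len(header))]
    columns = ", ".join(f'"{name}" {typ}' for name, typ in zip(header, types))
    placeholders = ", ".join("?" * len(header))
    con = sqlite3.connect(db_path)
    con.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    con.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    con.commit()
    con.close()

csv_to_sqlite("data_with_header.csv", "db", "some_table")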
This is a little off-topic but it might help to use a tool that gives you all the SQL functionality on an individual csv file without actually using SQLite directly.
Take a look at TextQL, a utility that allows querying CSV files directly using an in-memory SQLite engine:
https://github.com/dinedal/textql
textql -header -sql "select * from tbl" -source some_file.csv
Related
I'm testing out a trial version of Snowflake. I created a table and want to load a local CSV called "food" but I don't see any "load" data option as shown in tutorial videos.
What am I missing? Do I need to use a PUT command somewhere?
I don't think Snowsight has that option in the UI. It's available in the classic UI though. Go to the Databases tab and select a database, then go to the Tables tab and select a table; the option will be at the top.
If the classic UI is limiting you or you are already using Snowsight and don't want to switch back, then here is another way to upload a CSV file.
A prerequisite is that you have installed SnowSQL on your device (https://docs.snowflake.com/en/user-guide/snowsql-install-config.html).
Start SnowSQL and perform the following steps:
Use the database you want to upload the file to. You need various privileges for creating a stage, a file format, and a table. E.g. USE MY_TEST_DB;
Create the file format you want to use for uploading your CSV file. E.g.
CREATE FILE FORMAT "MY_TEST_DB"."PUBLIC".MY_FILE_FORMAT TYPE = 'CSV';
If you don't configure the RECORD_DELIMITER, the FIELD_DELIMITER, and other options, Snowflake uses defaults. I suggest you have a look at https://docs.snowflake.com/en/sql-reference/sql/create-file-format.html. Some of the auto-detection features can make your life hard, and sometimes it is better to disable them.
Create a stage using the previously created file format:
CREATE STAGE MY_STAGE file_format = "MY_TEST_DB"."PUBLIC".MY_FILE_FORMAT;
Now you can PUT your file to this stage:
PUT file://<file_path>/file.csv @MY_STAGE;
You can find documentation for configuring the stage at https://docs.snowflake.com/en/sql-reference/sql/create-stage.html
You can check the upload with
SELECT d.$1, ..., d.$N FROM @MY_STAGE/file.csv d;
Then, create your table.
CREATE TABLE MY_TABLE (col1 varchar, ..., colN varchar);
Personally, I prefer first creating a table with only varchar columns and then creating a view or a table with the final types. I love the try_to_* functions in Snowflake (e.g. https://docs.snowflake.com/en/sql-reference/functions/try_to_decimal.html).
Then, copy the content from your stage to your table. If you want to transform your data at this point, you have to use an inner SELECT. If not, the following command is enough.
COPY INTO MY_TABLE FROM @MY_STAGE/file.csv;
I suggest doing this without the inner SELECT because then the option ERROR_ON_COLUMN_COUNT_MISMATCH works.
Be aware that the schema of the table must match the format. As mentioned above, if you go with all columns as varchars first and then transform the columns of interest in a second step, you should be fine.
You can find documentation for copying the staged file into a table at https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html
You can check the dropped (rejected) lines as follows:
SELECT error, line, character, rejected_record FROM table(validate("MY_TEST_DB"."MY_SCHEMA"."MY_CSV_TABLE", job_id=>'xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'))
Details can be found at https://docs.snowflake.com/en/sql-reference/functions/validate.html.
If you want to add those lines to your success table, you can copy the dropped lines to a new table and transform the data until the schema matches the schema of the success table. Then, you can UNION both tables.
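For reference, the same flow can also be scripted end-to-end with the snowflake-connector-python package. This is only a rough sketch; the credentials, warehouse, file, stage, table, and column names below are placeholders, and note that PUT gzip-compresses the file by default, hence the .gz suffix in the COPY.

import snowflake.connector

con = snowflake.connector.connect(
    account="my_account",      # placeholder credentials
    user="my_user",
    password="my_password",
    warehouse="my_warehouse",
    database="MY_TEST_DB",
    schema="PUBLIC",
)
cur = con.cursor()
cur.execute("CREATE FILE FORMAT IF NOT EXISTS MY_FILE_FORMAT TYPE = 'CSV' SKIP_HEADER = 1")
cur.execute("CREATE STAGE IF NOT EXISTS MY_STAGE FILE_FORMAT = MY_FILE_FORMAT")
cur.execute("PUT file:///tmp/file.csv @MY_STAGE")   # uploaded as file.csv.gz
cur.execute("CREATE TABLE IF NOT EXISTS MY_TABLE (col1 varchar, col2 varchar)")
cur.execute("COPY INTO MY_TABLE FROM @MY_STAGE/file.csv.gz")
# Optional second step: a typed view on top of the all-varchar table
cur.execute("CREATE OR REPLACE VIEW MY_TYPED_VIEW AS SELECT TRY_TO_NUMBER(col1) AS col1, col2 FROM MY_TABLE")
con.close()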
You can see that loading even a simple CSV file into Snowflake takes quite a few steps. It becomes even more complicated when you take into account that every step can cause specific failures and that your file might contain erroneous lines. This is why my team and I at Datameer are working to make these kinds of tasks easier. We aim for a simple drag-and-drop solution that does most of the work for you. We would be happy if you would try it out here: https://www.datameer.com/upload-csv-to-snowflake/
So I need a way to import CSVs that vary in column names, column order, and number of columns. They will always be CSV and of course comma-delimited.
Is it possible to generate both an FMT file and a temp-table creation script from a CSV file?
From what I can gather, you need one to create the other. For example, you need the table to generate the FMT file using the bcp utility, and you need the FMT file to dynamically build a CREATE script for the table.
Using just SQL to dynamically load text files, there is no quick way to do this. I see one option:
1. Get the data into SQL Server as a single column (bcp it in, or use T-SQL and OPENROWSET to load it, SSIS, etc.). Be sure to include in this table a second column that is an identity (I'll call it "row_nbr"). You will need this to find the first row and get the column names from the header of the file.
2. Parse the first record (WHERE row_nbr = 1) to get the header record. You will need a string-parsing function (find one online, or create your own) to split out each column name.
3. Build a dynamic SQL statement to create a new table with the number of fields you just parsed out. You must calculate lengths and use a generic varchar data type, since you won't know how to type the data. Use the column names found above.
4. Once you have a table created with the correct number of adequately sized columns, you can create the format file.
I assumed in my answer that you are comfortable doing all these things and just shared the logical flow at a high level. I can add more detail if you need it.
I have one database with an image table that contains just over 37,000 records. Each record contains an image in the form of binary data. I need to get all of those 37,000 records into another database containing the same table and schema that has about 12,500 records. I need to insert these images into the database with an IF NOT EXISTS approach to make sure that there are no duplicates when I am done.
I tried exporting the data into Excel and formatting it into a script. (I have done this before with other tables.) The thing is, Excel does not support binary data.
I also tried the "generate scripts" wizard in SSMS which did not work because the .sql file was well over 18GB and my PC could not handle it.
Is there some other SQL tool to be able to do this? I have Googled for hours but to no avail. Thanks for your help!
I have used SQL Workbench/J for this.
You can either use WbExport and WbImport through text files (the binary data will be written as separate files and the text file contains the filename).
Or you can use WbCopy to copy the data directly without intermediate files.
To achieve your "if not exists" approach you could use the update/insert mode, although that would change existing rows.
I don't think there is an "insert only if it does not exist" mode, but you should be able to achieve this by defining a unique index and ignoring errors (that wouldn't be really fast, but it should be OK for that small number of rows).
If the "exists" check is more complicated, you could copy the data into a staging table in the target database, and then use SQL to merge that into the real table.
Why don't you try the 'Export data' feature? This should work.
Right click on the source database, select 'Tasks' and then 'Export data'. Then follow the instructions. You can also save the settings and execute the task on a regular basis.
Also, the bcp.exe utility could work to read data from one database and insert into another.
However, I would recommend using the first method.
Update: In order to avoid duplicates you have to be able to compare images. Unfortunately, you cannot compare images directly. But you could cast them to varbinary(max) for comparison.
So here's my advice:
1. Copy the table to the new database under the name tmp_images.
2. Insert only the new images, casting the image column (called image_column below) to varbinary(max) so the values can be compared (a MERGE statement would also work):
INSERT INTO DB1.dbo.table_name
SELECT * FROM DB2.dbo.table_name
WHERE CAST(image_column AS varbinary(max)) NOT IN
(
    SELECT CAST(image_column AS varbinary(max)) FROM DB1.dbo.table_name
)
Is it possible to create a Star Schema Database from data available in an XML file or a text file?
Your valuable inputs appreciated.
Thanks in advance.
Regards,
Sam.
It's a very vague question, but you can pretty much describe anything logical with an XML file. XML is "extensible", so you can make it describe nearly any set of data in any form as long as there is consistency.
I've done both. I wrote some JavaScript scripts that parse files and create records in an Oracle database. One script processed the Tycho-2 data from a formatted text file. I've only done two of the 20 text files (250,000 stars) because of the size, and I'm doing this on my laptop. A similar script created a table for the Hipparcos data (118,000 stars). The other script, which I just finished, processed the data from a file called "Stars - 2300+6000-2.xml." That file contained some stars with names, and I created a table with the named stars that matched the Hipparcos stars. Alas, only 92 of them showed up.
For XML, I used the Microsoft.XMLDOM interface, and to add the data to Oracle, I used ADODB.Connection. The XML script also used ADODB.Recordset because I had to run queries against the Hipparcos table to find the Hipparcos number (where there was a match) for a star in the XML file.
If that is what you're looking for, let me know and I can send the scripts.
Frank Perry, MSEE
For one of my projects I display a list of counties in a drop-down list (this list comes from a lookup table containing all counties). The client just requested that I limit it to a subset of their choice. The subset was given to me in an Excel spreadsheet containing only the county names.
I'm trying to figure out the quickest way possible for me to map each of these to its corresponding ID in the original lookup table. The client cannot give me these IDs. The names in the spreadsheet match the names in my table (except for casing).
This will most likely be a one time only thing.
Can anyone suggest a fast way to get these values into a query so I don't have to manually do it?
When I say fast I'm not talking about processing speed, just the fastest start to finish time that results in me getting the corresponding IDs using any tool available.
Note: I'm aware that I could have probably done this manually in the time it will take to get an answer, but I'd like to know for future reference.
You could do an External Data Query into another Excel sheet with
SELECT countyname, countyid FROM counties
then use a VLOOKUP to get the IDs into the client-provided sheet:
=VLOOKUP(A1,Sheet2!$A$1:$B$200,2,FALSE)
See http://excelusergroup.org/blogs/nickhodge/archive/2008/11/04/excel-2007-getting-external-data.aspx for creating External Data Table in Excel 2007. Skip down to the "Other Sources" part.
Put the list in a text file. Write a PowerShell script that gets the contents of that file and then queries your database to output the keys. Here is a rough example:
Get-Content c:\list.txt | ForEach-Object { Invoke-Sqlcmd -Query "SELECT ... WHERE county = '$_'" } | Format-Table
If you have access to SSIS, you could probably do a join between the Excel source and your table.
You could load the Excel sheet into a temp table to take advantage of all your SQL query knowledge.
I believe (and yes, it is true) that SQL Server can create a linked server out of a spreadsheet. Then you can join against it and you're done.