Snowflake - upload CSV - issue with only one accented character - snowflake-cloud-data-platform

I have an issue with an accented character when I upload a CSV file and then copy it into a table. The weird thing is that most accented letters are just fine, but one is being replaced by '�' when queried.
Another thing: when I use an INSERT statement, there is no issue whatsoever.
I use an internal stage. Here's the file format definition:
create or replace file format MY_FORMAT
type = csv
record_delimiter = '\n'
field_delimiter = ';'
field_optionally_enclosed_by = '"'
skip_header = 1
null_if = ('NULL', 'null')
empty_field_as_null = true
compression = gzip
validate_UTF8=false
skip_blank_lines = true;
The file was built in Excel and saved as CSV UTF-8. There are no other issues, no errors, all my rows are uploaded; it's just that one character that's supposed to be a "û" turns out to be "�".
Any ideas?
Thanks,
JFS.

It could be an issue with the terminal being used. Please try checking in a different terminal or in the web UI.
I simulated the scenario and I get the expected result. Please refer to the run below:
#### Data contents for so_testfile.csv
id;name
1;"û"
2;"û"
3;"û"
4;"û"
SNOWFLAKE1#COMPUTE_WH@TEST_DB.PUBLIC>create or replace stage so_my_stage file_format=SO_MY_FORMAT;
+----------------------------------------------+
| status |
|----------------------------------------------|
| Stage area SO_MY_STAGE successfully created. |
+----------------------------------------------+
1 Row(s) produced. Time Elapsed: 0.138s
SNOWFLAKE1#COMPUTE_WH@TEST_DB.PUBLIC>put file://c:\snowflake\so_testfile.csv @so_my_stage;
+-----------------+--------------------+-------------+-------------+--------------------+--------------------+----------+---------+
| source | target | source_size | target_size | source_compression | target_compression | status | message |
|-----------------+--------------------+-------------+-------------+--------------------+--------------------+----------+---------|
| so_testfile.csv | so_testfile.csv.gz | 39 | 64 | NONE | GZIP | UPLOADED | |
+-----------------+--------------------+-------------+-------------+--------------------+--------------------+----------+---------+
1 Row(s) produced. Time Elapsed: 1.100s
SNOWFLAKE1#COMPUTE_WH@TEST_DB.PUBLIC>select $1,$2 from @so_my_stage;
+----+----+
| $1 | $2 |
|----+----|
| 1 | û |
| 2 | û |
| 3 | û |
| 4 | û |
+----+----+
4 Row(s) produced. Time Elapsed: 0.308s
SNOWFLAKE1#COMPUTE_WH@TEST_DB.PUBLIC>select * from SO_TEST_TAB;
+----+------+
| ID | COL1 |
|----+------|
+----+------+
0 Row(s) produced. Time Elapsed: 0.388s
SNOWFLAKE1#COMPUTE_WH@TEST_DB.PUBLIC>copy into SO_TEST_TAB from @so_my_stage;
+--------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
| file | status | rows_parsed | rows_loaded | error_limit | errors_seen | first_error | first_error_line | first_error_character | first_error_column_name |
|--------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------|
| so_my_stage/so_testfile.csv.gz | LOADED | 4 | 4 | 1 | 0 | NULL | NULL | NULL | NULL |
+--------------------------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
1 Row(s) produced. Time Elapsed: 0.833s
SNOWFLAKE1#COMPUTE_WH@TEST_DB.PUBLIC>select * from so_test_tab;
+----+------+
| ID | COL1 |
|----+------|
| 1 | û |
| 2 | û |
| 3 | û |
| 4 | û |
+----+------+
4 Row(s) produced. Time Elapsed: 0.263s
SNOWFLAKE1#COMPUTE_WH@TEST_DB.PUBLIC>

It turns out that encoding the file properly as CSV UTF-8 from Excel worked. The "û" is now displayed correctly, just like all the other accented letters we have in French.
Thanks for your help.
JFS.
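If re-saving from Excel were not an option, another approach (just a sketch, and it assumes the file was actually written as Windows-1252 rather than UTF-8) would be to declare the file's real character set with Snowflake's ENCODING file format option, so the data is transcoded to UTF-8 on load:
create or replace file format MY_FORMAT
type = csv
record_delimiter = '\n'
field_delimiter = ';'
field_optionally_enclosed_by = '"'
skip_header = 1
null_if = ('NULL', 'null')
empty_field_as_null = true
compression = gzip
skip_blank_lines = true
encoding = 'WINDOWS1252'; -- assumed here; set this to the file's actual character set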

Related

How to list a stage in snowflake?

Look at this procedure:
greendatasvc#COMPUTE_WH@POS_DATA.BLAZE>CREATE STAGE IF NOT EXISTS NDJSON_STAGE FILE_FORMAT = NDJSON;
+---------------------------------------------------+
| status |
|---------------------------------------------------|
| NDJSON_STAGE already exists, statement succeeded. |
+---------------------------------------------------+
1 Row(s) produced. Time Elapsed: 0.182s
greendatasvc#COMPUTE_WH@POS_DATA.BLAZE>SHOW FILE FORMATS;
greendatasvc#COMPUTE_WH@POS_DATA.BLAZE>LIST @NDJSON_STAGE;
+------+------+-----+---------------+
| name | size | md5 | last_modified |
|------+------+-----+---------------|
+------+------+-----+---------------+
0 Row(s) produced. Time Elapsed: 0.192s
greendatasvc#COMPUTE_WH@POS_DATA.BLAZE>SHOW STAGES;
+-------------------------------+--------------+---------------+-------------+-----+-----------------+--------------------+----------+---------+--------+----------+-------+----------------------+---------------------+
| created_on | name | database_name | schema_name | url | has_credentials | has_encryption_key | owner | comment | region | type | cloud | notification_channel | storage_integration |
|-------------------------------+--------------+---------------+-------------+-----+-----------------+--------------------+----------+---------+--------+----------+-------+----------------------+---------------------|
| 2021-10-19 12:31:31.043 -0700 | NDJSON_STAGE | POS_DATA | BLAZE | | N | N | SYSADMIN | | NULL | INTERNAL | NULL | NULL | NULL |
+-------------------------------+--------------+---------------+-------------+-----+-----------------+--------------------+----------+---------+--------+----------+-------+----------------------+---------------------+
1 Row(s) produced. Time Elapsed: 0.159s
I believe I already have a stage named NDJSON_STAGE, based on the output when I try to create one. However, when I try to list it I get no results. Am I using the LIST command incorrectly?
Your stage exists; that's confirmed both by the 'already exists' response and by the fact that you didn't receive any error when trying to list files from your stage.
If you see nothing from the LIST @NDJSON_STAGE; command, that's probably because you don't have any files in this stage. Upload a file into the stage using a PUT command, and then you should be able to list your staged files.
Just to be clear, LIST @stagename returns a list of files that have been staged - on that stage.
In your case the stage is empty.
If you want to display the stages you have access to, then you can use SHOW STAGES, which lists all the stages for which you have access privileges.
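For example (a minimal sketch; the local file name and path are hypothetical), after a PUT the stage is no longer empty and LIST returns the uploaded file:
PUT file:///tmp/sample.ndjson @NDJSON_STAGE;
-- PUT compresses on upload by default, so the file lands as sample.ndjson.gz
LIST @NDJSON_STAGE;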

comma-separated value to rows and split other columns also /2 or /3

I have a table like this:
MeterSizeGroup | WrenchTime | DriveTime
1,2,3          | 7.843      | 5.099
I want to separate the comma-delimited string into three rows, like this:
MeterSizeGroup | WrenchTime | DriveTime
1              | 2.614      | 1.699
2              | 2.614      | 1.699
3              | 2.614      | 1.699
Please help me write a query for this kind of split. It has to split in such a way that WrenchTime and DriveTime are also divided by 3.
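One possible approach, assuming this is Snowflake (as the surrounding questions suggest) and a hypothetical source table called my_table, is to explode the list with SPLIT_TO_TABLE and divide the other columns by the number of list elements; a sketch:
select s.value as MeterSizeGroup,
       t.WrenchTime / (regexp_count(t.MeterSizeGroup, ',') + 1) as WrenchTime,
       t.DriveTime  / (regexp_count(t.MeterSizeGroup, ',') + 1) as DriveTime
from my_table t,
     lateral split_to_table(t.MeterSizeGroup, ',') s;
-- '1,2,3' with 7.843 and 5.099 yields three rows of roughly 2.614 and 1.699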

SQLite merge duplicates

I have a SQLite table in which some rows differ in just one column.
I want to merge the entries in this column with a separator (a line break in my case).
So, this:
| id | block | description|
------------------------------
| 1 | a | foo |
| 1 | a | bar |
| 3 | b | cat |
| 4 | c | mouse |
------------------------------
Should become this:
| id | block | description|
------------------------------
| 1 | a | foo \r\n bar|
| 3 | b | cat |
| 4 | c | mouse |
------------------------------
I don't even have an idea what to search for (other than "merge", but I couldn't find anything suitable for my application), so any input would be appreciated.
Jann
I think you are looking for group_concat():
select id, block, group_concat(description, ' \r\n ')
from t
group by id, block;
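A quick way to try this out (a sketch; the table name t and the sample data just mirror the question):
create table t (id integer, block text, description text);
insert into t values
(1, 'a', 'foo'),
(1, 'a', 'bar'),
(3, 'b', 'cat'),
(4, 'c', 'mouse');
select id, block, group_concat(description, ' \r\n ') as description
from t
group by id, block;
-- 1|a|foo \r\n bar
-- 3|b|cat
-- 4|c|mouse
Note that SQLite does not interpret \r\n inside a string literal, so the separator is stored literally, exactly as it appears in the desired output above; for a real line break you could pass char(13, 10) as the separator instead.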

How to Write Conditional Statement in SQL Server

I am having a logic issue in relation to querying an SQL database. I need to exclude 3 different categories and any item that is included in those categories; however, if an item under one of those categories meets the criteria for another category I need to keep said item.
This is an example output I will get after querying the database at its current version:
ExampleDB | item_num | pro_type | area | description
1 | 45KX-76Y | FLCM | Finished | coil8x
2 | 68WO-93H | FLCL | Similar | y45Kx
3 | 05RH-27N | FLDR | Finished | KH72n
4 | 84OH-95W | FLEP | Final | tar5x
5 | 81RS-67F | FLEP | Final | tar7x
6 | 48YU-40Q | FLCM | Final | bile6
7 | 19VB-89S | FLDR | Warranty | exp380
8 | 76CS-01U | FLCL | Gator | low5
9 | 28OC-08Z | FLCM | Redo | coil34Y
item_num and description are in a table together, and pro_type and area are in 2 separate tables--a total of 3 tables to pull data from.
I need to construct a query that will not pull back any item_num where area is equal to: Finished, Final, and Redo; but I also need to pull in any item_num that meets the type criteria: FLCM and FLEP. In the end my query should look like this:
ExampleDB | item_num | pro_type | area | description
1 | 45KX-76Y | FLCM | Finished | coil8x
2 | 68WO-93H | FLCL | Similar | y45Kx
3 | 84OH-95W | FLEP | Final | tar5x
4 | 81RS-67F | FLEP | Final | tar7x
5 | 19VB-89S | FLDR | Warranty | exp380
6 | 76CS-01U | FLCL | Gator | low5
7 | 28OC-08Z | FLCM | Redo | coil34Y
Try this:
select * from table
join...
where area not in('finished', 'final', 'redo') or type in('flcm', 'flep')
Are you looking for something like
SELECT *
FROM Table_1
JOIN Table_ProType ON Table_1.whatnot = Table_ProType.whatnot
JOIN Table_Area ON Table_1.whatnot = Table_Area.whatnot
WHERE Table_Area.area NOT IN ('Finished','Final','Redo') OR Table_ProType.pro_type IN ('FLCM','FLEP')
Giving the names of the three tables and the joining criteria will help me improve the answer.

MySQL Import into Innodb table severely spikes at a certain point

I'm trying to migrate a 30GB database from one server to another.
The short story is that at a certain point through the process, the amount of time it takes to import records severely increases as a spike. The following is from using the SOURCE command to import a chunk of 500k records (out of about ~25-30 million throughout the database) that was exported as an sql file that was ssh tunnelled over to the new server:
...
Query OK, 2871 rows affected (0.73 sec)
Records: 2871 Duplicates: 0 Warnings: 0
Query OK, 2870 rows affected (0.98 sec)
Records: 2870 Duplicates: 0 Warnings: 0
Query OK, 2865 rows affected (0.80 sec)
Records: 2865 Duplicates: 0 Warnings: 0
Query OK, 2871 rows affected (0.87 sec)
Records: 2871 Duplicates: 0 Warnings: 0
Query OK, 2864 rows affected (2.60 sec)
Records: 2864 Duplicates: 0 Warnings: 0
Query OK, 2866 rows affected (7.53 sec)
Records: 2866 Duplicates: 0 Warnings: 0
Query OK, 2879 rows affected (8.70 sec)
Records: 2879 Duplicates: 0 Warnings: 0
Query OK, 2864 rows affected (7.53 sec)
Records: 2864 Duplicates: 0 Warnings: 0
Query OK, 2873 rows affected (10.06 sec)
Records: 2873 Duplicates: 0 Warnings: 0
...
The spikes eventually average out to 16-18 seconds per ~2,800 rows affected. Granted, I don't usually use SOURCE for a large import, but for the sake of showing legitimate output I used it to understand when the spikes happen. Using the mysql command or mysqlimport yields the same results. Even piping the results directly into the new database instead of going through an SQL file produces these spikes.
As far as I can tell, this happens after a certain number of records have been inserted into a table. The first time I boot up a server and import a chunk that size, it goes through just fine, give or take the estimated amount it handles before these spikes occur. I can't pin that down because I haven't replicated the issue consistently enough to conclude anything definite. There are ~20 tables with under 500,000 records each that all imported just fine when those 20 tables were loaded through a single command. This seems to only happen to tables that have an excessive amount of data.
Granted, the solutions I've come across so far only seem to address the natural slowdown that occurs as an import progresses over time (the expected outcome in my case was that, by the end of importing 500k records, it would take 2-3 seconds per ~2,800 rows, whereas those questions were about why it shouldn't take that long at the end). This comes from a single SugarCRM table called 'campaign_log', which has ~9 million records. I was able to import it in chunks of 500k back onto the old server I'm migrating off of without these spikes occurring, so I assume this has to do with my new server configuration.
Another thing: whenever these spikes occur, the table being imported into seems to report its record count in an odd way. I know InnoDB gives count estimates, but the number isn't preceded by the ~ that indicates an estimate. It is usually accurate, and each time you refresh the table it doesn't change the amount it displays (this is based on what it reports through phpMyAdmin).
Here's the following commands/InnoDB system variables I have on the new server:
INNODB System Vars:
+---------------------------------+------------------------+
| Variable_name | Value |
+---------------------------------+------------------------+
| have_innodb | YES |
| ignore_builtin_innodb | OFF |
| innodb_adaptive_flushing | ON |
| innodb_adaptive_hash_index | ON |
| innodb_additional_mem_pool_size | 8388608 |
| innodb_autoextend_increment | 8 |
| innodb_autoinc_lock_mode | 1 |
| innodb_buffer_pool_instances | 1 |
| innodb_buffer_pool_size | 8589934592 |
| innodb_change_buffering | all |
| innodb_checksums | ON |
| innodb_commit_concurrency | 0 |
| innodb_concurrency_tickets | 500 |
| innodb_data_file_path | ibdata1:10M:autoextend |
| innodb_data_home_dir | |
| innodb_doublewrite | ON |
| innodb_fast_shutdown | 1 |
| innodb_file_format | Antelope |
| innodb_file_format_check | ON |
| innodb_file_format_max | Antelope |
| innodb_file_per_table | OFF |
| innodb_flush_log_at_trx_commit | 1 |
| innodb_flush_method | fsync |
| innodb_force_load_corrupted | OFF |
| innodb_force_recovery | 0 |
| innodb_io_capacity | 200 |
| innodb_large_prefix | OFF |
| innodb_lock_wait_timeout | 50 |
| innodb_locks_unsafe_for_binlog | OFF |
| innodb_log_buffer_size | 8388608 |
| innodb_log_file_size | 5242880 |
| innodb_log_files_in_group | 2 |
| innodb_log_group_home_dir | ./ |
| innodb_max_dirty_pages_pct | 75 |
| innodb_max_purge_lag | 0 |
| innodb_mirrored_log_groups | 1 |
| innodb_old_blocks_pct | 37 |
| innodb_old_blocks_time | 0 |
| innodb_open_files | 300 |
| innodb_print_all_deadlocks | OFF |
| innodb_purge_batch_size | 20 |
| innodb_purge_threads | 1 |
| innodb_random_read_ahead | OFF |
| innodb_read_ahead_threshold | 56 |
| innodb_read_io_threads | 8 |
| innodb_replication_delay | 0 |
| innodb_rollback_on_timeout | OFF |
| innodb_rollback_segments | 128 |
| innodb_spin_wait_delay | 6 |
| innodb_stats_method | nulls_equal |
| innodb_stats_on_metadata | ON |
| innodb_stats_sample_pages | 8 |
| innodb_strict_mode | OFF |
| innodb_support_xa | ON |
| innodb_sync_spin_loops | 30 |
| innodb_table_locks | ON |
| innodb_thread_concurrency | 0 |
| innodb_thread_sleep_delay | 10000 |
| innodb_use_native_aio | ON |
| innodb_use_sys_malloc | ON |
| innodb_version | 5.5.39 |
| innodb_write_io_threads | 8 |
+---------------------------------+------------------------+
System Specs:
Intel Xeon E5-2680 v2 (Ivy Bridge) 8 Processors
15GB Ram
2x80 SSDs
CMD to Export:
mysqldump -u <olduser> -p<oldpw> <olddb> <table> --verbose --disable-keys --opt | ssh -i <privatekey> <newserver> "cat > <nameoffile>"
Thank you for any assistance. Let me know if there's any other information I can provide.
I figured it out. I increased innodb_log_file_size from 5MB to 1024MB. Not only did it significantly increase the rate at which records were imported (it never went above 1 second per ~3,000 rows), it also fixed the spikes. There were only two spikes in all the records I imported, and after they happened, the timings immediately went back to under 1 second.
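For reference, a sketch of how that change looks from the SQL side (the 1024M figure is the one mentioned above; innodb_log_file_size is not dynamic in MySQL 5.5, so it has to be set in my.cnf and the server restarted after a clean shutdown, with the old ib_logfile0/ib_logfile1 moved out of the way):
-- before the change: the 5 MB value from the variable listing above
show variables like 'innodb_log_file_size'; -- 5242880
-- set innodb_log_file_size = 1024M in my.cnf, shut down cleanly,
-- move ib_logfile0 / ib_logfile1 aside, then start mysqld again
show variables like 'innodb_log_file_size'; -- 1073741824 after the restart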
