Presto query error: Error reading tail from - database

I'm trying to query data over a Presto connection. The data (Delta format) is in an S3 bucket, and the query fails with this error:
SQL Error [16777232]: Query failed (#20211005_122441_00037_s2r9w): Error reading tail from s3://*/*/*/table/*/part-00015-bc2cc6d2-706d-4859-ab57-5f87d93d81f5-c000.snappy.parquet with length 16384
When I look in the bucket, the file doesn't exist.

Looks like your data has been changed, but the metadata (I assume you're using AWS Glue as a metastore) hasn't.
You can try CALL system.sync_partition_metadata('<YOUR_SCHEMA>', '<YOUR_TABLE>', 'full'); to get it updated.
Also make sure you've got a consistent schema between your partitions if you're using them.
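If you're scripting this from Python, here is a minimal sketch of issuing that repair call with the presto-python-client package; the host, user, catalog, and schema values are placeholders, not taken from the question:

import prestodb

# Connect to the Presto coordinator (all connection values are hypothetical).
conn = prestodb.dbapi.connect(
    host='presto.example.com',
    port=8080,
    user='etl_user',
    catalog='hive',    # the catalog backed by the Glue metastore
    schema='default',
)
cur = conn.cursor()
# Re-sync partition metadata so the metastore stops referencing deleted files.
cur.execute("CALL system.sync_partition_metadata('my_schema', 'my_table', 'full')")
cur.fetchall()  # drain the result to make sure the call has completed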

Related

error while loading csv file through snowsql copy into command

I was trying to load CSV files from an AWS S3 bucket with the COPY INTO command, and one of the CSV files throws an error like:
End of record reached while expected to parse column
'"RAW_PRODUCTS"["PACK_COUNT_UNITS":25]
With VALIDATION_MODE = RETURN_ALL_ERRORS it also gives me the 2 rows that have errors, but I am not sure what the errors are.
My concern is: can we get the specific error so that we can fix it in the file?
You might try using the VALIDATE table function. https://docs.snowflake.com/en/sql-reference/functions/validate.html
Thanks Eda, I already reviewed the link above, but that did not work with a SQL query doing COPY INTO the table from the S3 bucket. So I created a stage, placed the CSV file on the stage, and then ran that VALIDATE command, which gave me the same error rows.
There is another way to identify errors while executing the COPY INTO command: you can add VALIDATION_MODE = RETURN_ALL_ERRORS and you will get the same result.
By the way, I resolved the error: it was due to the sequence /,, in the data. I removed the / and the file loaded successfully. / or /, works, as it did in other rows, but /,, did not.
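For anyone doing this programmatically, a rough sketch of the dry-run approach with the Snowflake Python connector; the credentials, table, stage, and file format names are all placeholders:

import snowflake.connector

# All connection values and object names below are hypothetical.
conn = snowflake.connector.connect(
    user='me', password='...', account='my_account',
    warehouse='my_wh', database='my_db', schema='public',
)
cur = conn.cursor()
# VALIDATION_MODE makes this a dry run: nothing is loaded, and each
# returned row describes one record that would have been rejected.
cur.execute("""
    COPY INTO my_table
    FROM @my_s3_stage
    FILE_FORMAT = (FORMAT_NAME = my_csv_format)
    VALIDATION_MODE = RETURN_ALL_ERRORS
""")
for error_row in cur.fetchall():
    print(error_row)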

Track errors in snowflake - Error during validate function in snowflake

Executed a COPY command using a stage, and also a direct S3 location, in Snowflake with ON_ERROR = CONTINUE;
and then used select * from table(validate(test, job_id=>'_last')); to track the error records. However, I received the following error after executing the VALIDATE command: "Failure using stage area. Cause: [The AWS Access Key Id you provided is not valid.]"
Could someone please help with this error?
Thanks
Retried the same with a new setup and it's working fine.
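For reference, I believe '_last' resolves to the most recent COPY in the current session, so the load and the VALIDATE have to run over the same connection. A sketch of that sequence with the Snowflake Python connector (all names and credentials are placeholders):

import snowflake.connector

conn = snowflake.connector.connect(user='me', password='...', account='my_account')
cur = conn.cursor()
# Load with ON_ERROR = CONTINUE so bad rows are skipped instead of failing the load.
cur.execute("COPY INTO test FROM @my_stage ON_ERROR = CONTINUE")
# '_last' points at the most recent COPY in this session, which is why this
# statement must run on the same connection as the COPY above.
cur.execute("SELECT * FROM TABLE(VALIDATE(test, JOB_ID => '_last'))")
for rejected in cur.fetchall():
    print(rejected)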

Is there a way to find out details of a data type error in Snowflake?

I am pretty new to the Snowflake Cloud offering and was just trying to load a simple .csv file from an AWS S3 staging area to a table in Snowflake using the COPY command.
Here is what I used as the command:
copy into "database name"."schema"."table name"
from @S3_ACCESS
file_format = (format_name = '<format name>');
When I run the above code, I get the following error: Numeric value '63' is not recognized
Please see the attached image. I'm not sure what this error is, and I'm not able to find any lead in the Snowflake UI itself as to what could be wrong with the value.
Thanks in Advance!
The error says it was expecting a numeric value, but it got '63', and that value cannot be converted to a numeric value.
From the image you shared, I can see that there are some weird characters around the 6 and 3. There could be an issue with file encoding, or the data is corrupted.
Please check encoding option for file format:
https://docs.snowflake.com/en/sql-reference/sql/create-file-format.html#format-type-options-formattypeoptions
By the way, I recommend you always use utf-8.
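If you want to confirm the encoding theory before reloading, a quick Python sketch that checks a local copy of the file for bytes that are not valid UTF-8 ('data.csv' is a placeholder name):

# Scan a local copy of the CSV for the first byte sequence that
# fails to decode as UTF-8 and show where it sits in the file.
with open('data.csv', 'rb') as f:
    raw = f.read()
try:
    raw.decode('utf-8')
    print('File decodes cleanly as UTF-8.')
except UnicodeDecodeError as err:
    print(f'Invalid byte {raw[err.start]:#04x} at offset {err.start}')
    print('Context:', raw[max(0, err.start - 20):err.start + 20])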

Importing a TSV file into DynamoDB

I have a dataset obtained from the Registry of Open Data on AWS.
This is the link to my data set. I want to import this data set into a DynamoDB table, but I don't know how to do it.
I tried to use Data Pipeline, from the S3 bucket to DynamoDB, but it didn't work:
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: com.google.gson.stream.MalformedJsonException: Expected ':' at line 1 column 20
at com.google.gson.stream.JsonReader.syntaxError(JsonReader.java:1505)
at com.google.gson.stream.JsonReader.doPeek(JsonReader.java:519)
at com.google.gson.stream.JsonReader.peek(JsonReader.java:414)
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:157)
at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.read(TypeAdapterRuntimeTypeWrapper.java:40)
at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.read(MapTypeAdapterFactory.java:187)
at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.read(MapTypeAdapterFactory.java:145)
at com.google.gson.Gson.fromJson(Gson.java:803)
... 15 more
Exception in thread "main" java.io.
I have this error and I don't know how to fix it.
Then I downloaded the file locally, but I can't import it into my table in DynamoDB.
There is no code for the moment; all I did was configuration.
I'm expecting to have the data set in my table, but unfortunately I couldn't reach my goal.
I finally converted the TSV file to a CSV file, and then to a JSON file, using a Python script.
With a JSON file it's much easier.
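The conversion described above fits in a few lines of Python. The sketch below goes one step further and, instead of going through Data Pipeline at all, writes the rows straight into DynamoDB with boto3; the file name, table name, and the assumption that the TSV header supplies the key attribute are all hypothetical:

import csv
import boto3

# Table and file names are placeholders.
table = boto3.resource('dynamodb').Table('my_table')

with open('dataset.tsv', newline='') as f, table.batch_writer() as batch:
    for row in csv.DictReader(f, delimiter='\t'):
        # Drop empty strings: DynamoDB rejects them in key attributes.
        # Every value loads as a string; convert types as needed.
        item = {k: v for k, v in row.items() if v != ''}
        batch.put_item(Item=item)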

Access database in Physionet's ptbdb by Matlab

I set up the system first by
[old_path] = which('rdsamp');
if (~isempty(old_path))
    rmpath(old_path(1:end-8));
end
wfdb_url = 'http://physionet.org/physiotools/matlab/wfdb-app-matlab/wfdb-app-toolbox-0-9-3.zip';
[filestr, status] = urlwrite(wfdb_url, 'wfdb-app-toolbox-0-9-3.zip');
unzip('wfdb-app-toolbox-0-9-3.zip');
cd mcode
addpath(pwd); savepath
I am trying to read databases from Physionet.
I have successfully reached one database mitdb by
[tm,sig]=rdsamp('mitdb/100',1)
but when I try to reach the database ptbdb by
[tm,sig]=rdsamp('ptbdb/100',1)
I get this error:
Warning: Could not get signal information. Attempting to read signal without buffering.
> In rdsamp at 107
Error: Cannot convert to double:
init: can't open header for record ptbdb/100
Error using rdsamp (line 145)
Java exception occurred:
java.lang.NumberFormatException: Cannot convert
at org.physionet.wfdb.Wfdbexec.execToDoubleArray(Unknown Source)
The first error message refers to these lines in rdsamp.m:
if (isempty(N))
    [siginfo, ~] = wfdbdesc(recordName);
    if (~isempty(siginfo))
        N = siginfo(1).LengthSamples;
    else
        warning('Could not get signal information. Attempting to read signal without buffering.')
    end
end
The line if(~isempty(siginfo)) being false means that siginfo is empty, i.e. there is no signal information. Why? No access to the database, I think.
I think the other errors follow from it.
So the error must come from this line:
[siginfo,~]=wfdbdesc(recordName);
What does the tilde (~) mean here in the brackets?
How can you get data from ptbdb with Matlab?
So: does this error mean that the connection cannot be established to the database, or that the data does not exist in the database?
It would be very nice to know how you can check whether you have a connection to the database, as you can in Postgres. It would make debugging much easier.
If you run physionetdb('ptbdb',1) it will download the files to your computer. You will then be able to see the available records in <current-dir>/ptbdb/.
Source: the physionetdb function documentation. You are interested in the DoBatchDownload parameter.
After downloading, I believe every command from the toolbox will check whether you have the files locally before fetching from the server (as long as you give the function the correct path to the local files).
The problem is that the record '100' does not exist in the database ptbdb.
I finally ran it successfully, after waiting 35 minutes on 100 Mb cable broadband:
db_list = physionetdb('ptbdb')
but in the end got incomplete data, only up to patient 54; there should be 294 patients:
'ptbdb/patient001/s0014lre' 'ptbdb/patient001/s0014lre' ... cut ...
The main developer Ikaro's answer helped me to wait that long:
The WFDB Toolbox connects to PhysioNet's file server. The databases accessible through the WFDB Toolbox are not SQL databases; they consist of flat files. The error message that you are getting regarding ptbdb/100 is because you are attempting to get a record that does not exist in the database.
For more information on a particular database or record in PhysioNet, please type:
help physionetdb
and
physionetdb('ptbdb')
This flat file system is really a bottleneck in the system.
It would be a good time to change to SQL.
