We are trying to upload files into Google Cloud Storage before moving them into BigQuery, but we often hit '500 Internal Server Error' or '410 Gone' (raw messages below) during some uploads.
We are using the official SDK and have added retries with exponential backoff, but the errors keep occurring. Do you have any advice, please?
Here is how we upload (Scala):
val credential = new GoogleCredential().setAccessToken(accessToken)
val requestInitializer = new HttpRequestInitializer() {
  def initialize(request: HttpRequest): Unit = {
    credential.initialize(request)
    // to avoid read timed out exceptions
    request.setConnectTimeout(200000)
    request.setReadTimeout(200000)
    request.setIOExceptionHandler(
      new HttpBackOffIOExceptionHandler(new ExponentialBackOff()))
    request.setUnsuccessfulResponseHandler(
      new HttpBackOffUnsuccessfulResponseHandler(new ExponentialBackOff()))
  }
}
val storage = new Storage.Builder(
  new NetHttpTransport,
  JacksonFactory.getDefaultInstance,
  requestInitializer
).setApplicationName("MyAppHere").build

val objectMetadata = new StorageObject()
  .setBucket(bucketName)
  .setName(distantFileName)
val isc = new InputStreamContent("binary/octet-stream", fis)
val length = isc.getLength
val insertObject = storage.objects().insert(bucketName, objectMetadata, isc)
// For small files, you may wish to call setDirectUploadEnabled(true) to
// reduce the number of HTTP requests made to the server.
if (length > 0 && length <= 2 * 1000 * 1000 /* 2MB */) {
  insertObject.getMediaHttpUploader.setDirectUploadEnabled(true)
}
insertObject.execute()
Our Scala dependencies:
"com.google.api-client" % "google-api-client" % "1.18.0-rc",
"com.google.api-client" % "google-api-client-jackson2" % "1.18.0-rc",
"com.google.apis" % "google-api-services-bigquery" % "v2-rev142-1.18.0-rc",
"com.google.apis" % "google-api-services-storage" % "v1-rev1-1.18.0-rc",
"com.google.http-client" % "google-http-client" % "1.18.0-rc",
"com.google.oauth-client" % "google-oauth-client" % "1.18.0-rc"
Raw SDK error responses:
500 Internal Server Error
{
"code" : 500,
"errors" : [ {
"domain" : "global",
"message" : "Backend Error",
"reason" : "backendError"
} ],
"message" : "Backend Error"
}
410 Gone
{
"code" : 500,
"errors" : [ {
"domain" : "global",
"message" : "Backend Error",
"reason" : "backendError"
} ],
"message" : "Backend Error"
}
Every backend error should be handled with an exponential retry, as there may be transient service problems.
If the error still persists after, say, 10 hours, you should contact support so they can give you 1:1 help with your problem.
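To illustrate, here is a minimal sketch of that retry loop, written in Python to keep it language-agnostic; TransientUploadError and upload_once are hypothetical stand-ins for the SDK's 5xx/410 failures and for your upload call. One caveat worth knowing: on a resumable upload, a 410 usually means the upload session itself is gone, so the retry has to restart the upload from the beginning rather than resume it.

import random
import time

class TransientUploadError(Exception):
    """Hypothetical stand-in for the SDK's 5xx / 410 upload failures."""

def retry_with_backoff(upload_once, max_elapsed_s=600,
                       base_s=1.0, factor=2.0, max_s=64.0):
    """Call upload_once() until it succeeds, sleeping an exponentially
    growing, jittered delay between attempts, up to max_elapsed_s total."""
    deadline = time.monotonic() + max_elapsed_s
    delay = base_s
    while True:
        try:
            return upload_once()
        except TransientUploadError:
            if time.monotonic() >= deadline:
                raise  # give up once the overall time budget is spent
            time.sleep(delay + random.uniform(0, delay))  # full jitter
            delay = min(delay * factor, max_s)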
Related
I'm trying to find out what protocol the Snowflake JDBC library uses to communicate with Snowflake. I see hints here and there that it seems to be using HTTPS. Is this true?
To my knowledge, JDBC drivers for other databases, for example Oracle or PostgreSQL, use a lower-level TCP protocol to communicate with their database servers, not the application-level HTTP(S) protocol, so I'm confused.
My organization only supports securely routing HTTP(S)-based communication. Can I use the Snowflake JDBC library, then?
I have browsed all the documentation I could find but wasn't able to answer this question.
My issue on GitHub didn't get an answer either.
Edit: Yes, I've seen this question, but I don't feel it answers mine. SSL/TLS provides encryption, but it doesn't specify the application-level data format.
It looks like the JDBC driver uses an HTTP client (HttpUtil.initHttpClient(httpClientSettingsKey, null);), as you can see here.
The HTTP utility class is available here.
Here is an excerpt of the session open method, in case the link goes dead.
/**
 * Open a new database session
 *
 * @throws SFException this is a runtime exception
 * @throws SnowflakeSQLException exception raised from Snowflake components
 */
public synchronized void open() throws SFException, SnowflakeSQLException {
performSanityCheckOnProperties();
Map<SFSessionProperty, Object> connectionPropertiesMap = getConnectionPropertiesMap();
logger.debug(
"input: server={}, account={}, user={}, password={}, role={}, database={}, schema={},"
+ " warehouse={}, validate_default_parameters={}, authenticator={}, ocsp_mode={},"
+ " passcode_in_password={}, passcode={}, private_key={}, disable_socks_proxy={},"
+ " application={}, app_id={}, app_version={}, login_timeout={}, network_timeout={},"
+ " query_timeout={}, tracing={}, private_key_file={}, private_key_file_pwd={}."
+ " session_parameters: client_store_temporary_credential={}",
connectionPropertiesMap.get(SFSessionProperty.SERVER_URL),
connectionPropertiesMap.get(SFSessionProperty.ACCOUNT),
connectionPropertiesMap.get(SFSessionProperty.USER),
!Strings.isNullOrEmpty((String) connectionPropertiesMap.get(SFSessionProperty.PASSWORD))
? "***"
: "(empty)",
connectionPropertiesMap.get(SFSessionProperty.ROLE),
connectionPropertiesMap.get(SFSessionProperty.DATABASE),
connectionPropertiesMap.get(SFSessionProperty.SCHEMA),
connectionPropertiesMap.get(SFSessionProperty.WAREHOUSE),
connectionPropertiesMap.get(SFSessionProperty.VALIDATE_DEFAULT_PARAMETERS),
connectionPropertiesMap.get(SFSessionProperty.AUTHENTICATOR),
getOCSPMode().name(),
connectionPropertiesMap.get(SFSessionProperty.PASSCODE_IN_PASSWORD),
!Strings.isNullOrEmpty((String) connectionPropertiesMap.get(SFSessionProperty.PASSCODE))
? "***"
: "(empty)",
connectionPropertiesMap.get(SFSessionProperty.PRIVATE_KEY) != null
? "(not null)"
: "(null)",
connectionPropertiesMap.get(SFSessionProperty.DISABLE_SOCKS_PROXY),
connectionPropertiesMap.get(SFSessionProperty.APPLICATION),
connectionPropertiesMap.get(SFSessionProperty.APP_ID),
connectionPropertiesMap.get(SFSessionProperty.APP_VERSION),
connectionPropertiesMap.get(SFSessionProperty.LOGIN_TIMEOUT),
connectionPropertiesMap.get(SFSessionProperty.NETWORK_TIMEOUT),
connectionPropertiesMap.get(SFSessionProperty.QUERY_TIMEOUT),
connectionPropertiesMap.get(SFSessionProperty.TRACING),
connectionPropertiesMap.get(SFSessionProperty.PRIVATE_KEY_FILE),
!Strings.isNullOrEmpty(
(String) connectionPropertiesMap.get(SFSessionProperty.PRIVATE_KEY_FILE_PWD))
? "***"
: "(empty)",
sessionParametersMap.get(CLIENT_STORE_TEMPORARY_CREDENTIAL));
HttpClientSettingsKey httpClientSettingsKey = getHttpClientKey();
logger.debug(
"connection proxy parameters: use_proxy={}, proxy_host={}, proxy_port={}, proxy_user={},"
+ " proxy_password={}, non_proxy_hosts={}, proxy_protocol={}",
httpClientSettingsKey.usesProxy(),
httpClientSettingsKey.getProxyHost(),
httpClientSettingsKey.getProxyPort(),
httpClientSettingsKey.getProxyUser(),
!Strings.isNullOrEmpty(httpClientSettingsKey.getProxyPassword()) ? "***" : "(empty)",
httpClientSettingsKey.getNonProxyHosts(),
httpClientSettingsKey.getProxyProtocol());
// TODO: temporarily hardcode sessionParameter debug info. will be changed in the future
SFLoginInput loginInput = new SFLoginInput();
loginInput
.setServerUrl((String) connectionPropertiesMap.get(SFSessionProperty.SERVER_URL))
.setDatabaseName((String) connectionPropertiesMap.get(SFSessionProperty.DATABASE))
.setSchemaName((String) connectionPropertiesMap.get(SFSessionProperty.SCHEMA))
.setWarehouse((String) connectionPropertiesMap.get(SFSessionProperty.WAREHOUSE))
.setRole((String) connectionPropertiesMap.get(SFSessionProperty.ROLE))
.setValidateDefaultParameters(
connectionPropertiesMap.get(SFSessionProperty.VALIDATE_DEFAULT_PARAMETERS))
.setAuthenticator((String) connectionPropertiesMap.get(SFSessionProperty.AUTHENTICATOR))
.setOKTAUserName((String) connectionPropertiesMap.get(SFSessionProperty.OKTA_USERNAME))
.setAccountName((String) connectionPropertiesMap.get(SFSessionProperty.ACCOUNT))
.setLoginTimeout(loginTimeout)
.setAuthTimeout(authTimeout)
.setUserName((String) connectionPropertiesMap.get(SFSessionProperty.USER))
.setPassword((String) connectionPropertiesMap.get(SFSessionProperty.PASSWORD))
.setToken((String) connectionPropertiesMap.get(SFSessionProperty.TOKEN))
.setPasscodeInPassword(passcodeInPassword)
.setPasscode((String) connectionPropertiesMap.get(SFSessionProperty.PASSCODE))
.setConnectionTimeout(httpClientConnectionTimeout)
.setSocketTimeout(httpClientSocketTimeout)
.setAppId((String) connectionPropertiesMap.get(SFSessionProperty.APP_ID))
.setAppVersion((String) connectionPropertiesMap.get(SFSessionProperty.APP_VERSION))
.setSessionParameters(sessionParametersMap)
.setPrivateKey((PrivateKey) connectionPropertiesMap.get(SFSessionProperty.PRIVATE_KEY))
.setPrivateKeyFile((String) connectionPropertiesMap.get(SFSessionProperty.PRIVATE_KEY_FILE))
.setPrivateKeyFilePwd(
(String) connectionPropertiesMap.get(SFSessionProperty.PRIVATE_KEY_FILE_PWD))
.setApplication((String) connectionPropertiesMap.get(SFSessionProperty.APPLICATION))
.setServiceName(getServiceName())
.setOCSPMode(getOCSPMode())
.setHttpClientSettingsKey(httpClientSettingsKey);
// propagate OCSP mode to SFTrustManager. Note OCSP setting is global on JVM.
HttpUtil.initHttpClient(httpClientSettingsKey, null);
SFLoginOutput loginOutput =
SessionUtil.openSession(loginInput, connectionPropertiesMap, tracingLevel.toString());
isClosed = false;
authTimeout = loginInput.getAuthTimeout();
sessionToken = loginOutput.getSessionToken();
masterToken = loginOutput.getMasterToken();
idToken = loginOutput.getIdToken();
mfaToken = loginOutput.getMfaToken();
setDatabaseVersion(loginOutput.getDatabaseVersion());
setDatabaseMajorVersion(loginOutput.getDatabaseMajorVersion());
setDatabaseMinorVersion(loginOutput.getDatabaseMinorVersion());
httpClientSocketTimeout = loginOutput.getHttpClientSocketTimeout();
masterTokenValidityInSeconds = loginOutput.getMasterTokenValidityInSeconds();
setDatabase(loginOutput.getSessionDatabase());
setSchema(loginOutput.getSessionSchema());
setRole(loginOutput.getSessionRole());
setWarehouse(loginOutput.getSessionWarehouse());
setSessionId(loginOutput.getSessionId());
setAutoCommit(loginOutput.getAutoCommit());
// Update common parameter values for this session
SessionUtil.updateSfDriverParamValues(loginOutput.getCommonParams(), this);
String loginDatabaseName = (String) connectionPropertiesMap.get(SFSessionProperty.DATABASE);
String loginSchemaName = (String) connectionPropertiesMap.get(SFSessionProperty.SCHEMA);
String loginRole = (String) connectionPropertiesMap.get(SFSessionProperty.ROLE);
String loginWarehouse = (String) connectionPropertiesMap.get(SFSessionProperty.WAREHOUSE);
if (loginDatabaseName != null && !loginDatabaseName.equalsIgnoreCase(getDatabase())) {
sqlWarnings.add(
new SFException(
ErrorCode.CONNECTION_ESTABLISHED_WITH_DIFFERENT_PROP,
"Database",
loginDatabaseName,
getDatabase()));
}
if (loginSchemaName != null && !loginSchemaName.equalsIgnoreCase(getSchema())) {
sqlWarnings.add(
new SFException(
ErrorCode.CONNECTION_ESTABLISHED_WITH_DIFFERENT_PROP,
"Schema",
loginSchemaName,
getSchema()));
}
if (loginRole != null && !loginRole.equalsIgnoreCase(getRole())) {
sqlWarnings.add(
new SFException(
ErrorCode.CONNECTION_ESTABLISHED_WITH_DIFFERENT_PROP, "Role", loginRole, getRole()));
}
if (loginWarehouse != null && !loginWarehouse.equalsIgnoreCase(getWarehouse())) {
sqlWarnings.add(
new SFException(
ErrorCode.CONNECTION_ESTABLISHED_WITH_DIFFERENT_PROP,
"Warehouse",
loginWarehouse,
getWarehouse()));
}
// start heartbeat for this session so that the master token will not expire
startHeartbeatForThisSession();
}
I have a TTL index on the collection fct_in_ussd, created as follows:
db.fct_in_ussd.createIndex(
{"xdr_date":1},
{ "background": true, "expireAfterSeconds": 259200}
)
The resulting index:
{
"v" : 2,
"key" : {
"xdr_date" : 1
},
"name" : "xdr_date_1",
"ns" : "appdb.fct_in_ussd",
"background" : true,
"expireAfterSeconds" : 259200
}
with an expiry of 3 days (259200 seconds). A sample document from the collection:
{
"_id" : ObjectId("5f4808c9b32ewa2f8escb16b"),
"edr_seq_num" : "2043019_10405",
"served_imsi" : "",
"ussd_action_code" : "1",
"event_start_time" : ISODate("2020-08-27T19:06:51Z"),
"event_start_time_slot_key" : ISODate("2020-08-27T18:30:00Z"),
"basic_service_key" : "TopSim",
"rate_event_type" : "",
"event_type_key" : "22",
"event_dir_key" : "-99",
"srv_type_key" : "2",
"population_time" : ISODate("2020-08-27T19:26:00Z"),
"xdr_date" : ISODate("2020-08-27T19:06:51Z"),
"event_date" : "20200827"
}
Problem statement: documents are not being removed from the collection; it still contains 15-day-old documents.
MongoDB server version: 4.2.3
Block compression strategy is zstd
storage.wiredTiger.collectionConfig.blockCompressor: zstd
The field xdr_date is also part of another compound index.
Observations as of Sep 24:
I have 5 collections with TTL indexes.
It turns out that data is being removed from one of the collections, while the rest remain unaffected.
The daily insertion rate is ~500M records (across the 5 collections).
This observation left me confused.
The TTL expiration monitor runs on a single thread. Is this too much data for it to expire?
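For diagnosis, this is the check I can run against the server's cumulative TTL counters; a minimal sketch in Python (pymongo), where the connection string is a placeholder:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder URI

# ttlMonitorEnabled is a server parameter; it defaults to true.
param = client.admin.command("getParameter", 1, ttlMonitorEnabled=1)
print("ttlMonitorEnabled:", param.get("ttlMonitorEnabled"))

# serverStatus exposes how many passes the single TTL thread has made
# and how many documents it has deleted in total.
ttl = client.admin.command("serverStatus")["metrics"]["ttl"]
print("passes:", ttl["passes"], "deletedDocuments:", ttl["deletedDocuments"])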
Just for some context: I'm currently using AppSync + React + Apollo, and I'm trying to send 38-40 items to update via an AppSync Apache VTL resolver.
I knew that DynamoDB limits batch writes to 25 items per request, but I thought AppSync didn't have that limitation. I guess I was wrong, because my request fails whenever I send more than 25 items.
Here's my VTL script:
#set($isTenantValid = false)
#foreach($tenant in $context.identity.claims["https://app.schon.io/tenants"])
  #if($tenant == $ctx.args.tenantId)
    #set($isTenantValid = true)
  #end
#end
## Needs to verify if the employee has permission to insert students.
#if(!$isTenantValid)
  $utils.unauthorized()
#end
#set($itemsToPut = [])
#set($pk = "tenant:${ctx.args.tenantId}")
#set($userSK = "tenant:${ctx.args.tenantId}:school-year:${ctx.args.schoolYear}")
#foreach($student in $ctx.args.students)
  #set($studentId = "${util.autoId()}")
  #set($sk = "school-year:${ctx.args.schoolYear}:student:${studentId}")
  #set($userPK = "student:${studentId}")
  #set($item = {
    "pk": $pk,
    "sk": $sk,
    "id": $studentId,
    "userPK": $userPK,
    "userSK": $userSK,
    "name": {
      "firstName": $student.name.firstName,
      "lastName": $student.name.lastName,
      "fullName": "${student.name.firstName} ${student.name.lastName}"
    },
    "schoolYear": $ctx.args.schoolYear,
    "createdAt": ${util.time.nowEpochSeconds()},
    "updatedAt": ${util.time.nowEpochSeconds()},
    "gender": $student.gender,
    "retired": $student.retired
  })
  #if("${student.diseases}" != "")
    $util.qr($item.add("diseases", $student.diseases))
  #end
  #if("${student.email}" != "")
    $util.qr($item.add("email", $student.email))
  #end
  #if("${student.birthdate}" != "")
    $util.qr($item.add("birthdate", $student.birthdate))
  #end
  $util.qr($itemsToPut.add($util.dynamodb.toMapValues($item)))
#end
{
  "version": "2018-05-29",
  "operation": "BatchPutItem",
  "tables": {
    "SchonDB": $utils.toJson($itemsToPut)
  }
}
In this case, will I need to offload this to a Lambda and do the send-and-retry logic in batches of 25 items?
Could you try contacting the DynamoDB team to increase the service limit for you?
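If you do end up offloading to a Lambda, the chunk-and-retry logic per 25 items is short. A minimal sketch in Python (boto3); the table name comes from the resolver above, and it assumes the items are already DynamoDB-typed attribute maps:

import time
import boto3

dynamodb = boto3.client("dynamodb")
TABLE = "SchonDB"  # table name taken from the resolver above

def batch_put(items, max_retries=5):
    """items: DynamoDB-typed attribute maps, e.g. {"pk": {"S": "tenant:..."}}."""
    for start in range(0, len(items), 25):  # 25 is the BatchWriteItem hard limit
        requests = [{"PutRequest": {"Item": it}} for it in items[start:start + 25]]
        for attempt in range(max_retries):
            resp = dynamodb.batch_write_item(RequestItems={TABLE: requests})
            requests = resp.get("UnprocessedItems", {}).get(TABLE, [])
            if not requests:
                break  # this chunk is fully written
            time.sleep(0.1 * (2 ** attempt))  # exponential backoff before retrying
        else:
            raise RuntimeError("items still unprocessed after %d retries" % max_retries)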
I am currently working on a Google Cloud project in free-trial mode. I have a cron job that fetches data from a data vendor and stores it in the Datastore. I wrote the fetch code a couple of weeks ago and it was all working fine, but all of a sudden, for the last two days, I have been receiving the error "DeadlineExceededError: The overall deadline for responding to the HTTP request was exceeded". I believe a cron job is supposed to time out only after 60 minutes; any idea why I am getting this error?
cron task
def run():
    try:
        config = cron.config
        actual_data_source = config['xxx']['xxxx']
        original_data_source = actual_data_source
        company_list = cron.rest_client.load(config, "companies", '')
        if not company_list:
            logging.info("Company list is empty")
            return "Ok"
        for row in company_list:
            company_repository.save(row, original_data_source, actual_data_source)
        return "OK"
    except Exception:
        # the original snippet's try block had no except clause; added here
        logging.exception("run() failed")
        raise
Repository code
def save(dto, org_ds, act_dp):
    try:
        key = 'FIN/%s' % (dto['ticker'])
        company = CompanyInfo(id=key)
        company.stock_code = key
        company.ticker = dto['ticker']
        company.name = dto['name']
        company.original_data_source = org_ds
        company.actual_data_provider = act_dp
        company.put()
        return company
    except Exception:
        logging.exception("company_repository: error occurred saving the company record")
        raise
RestClient
def load(config, resource, filter):
    try:
        username = config['xxxx']['xxxx']
        password = config['xxxx']['xxxx']
        headers = {"Authorization": "Basic %s" % base64.b64encode(username + ":" + password)}
        if filter:
            from_date = filter['from']
            to_date = filter['to']
            ticker = filter['ticker']
            start_date = datetime.strptime(from_date, '%Y%m%d').strftime("%Y-%m-%d")
            end_date = datetime.strptime(to_date, '%Y%m%d').strftime("%Y-%m-%d")
        current_page = 1
        data = []
        while True:
            if filter:
                url = config['xxxx']["endpoints"][resource] % (ticker, current_page, start_date, end_date)
            else:
                url = config['xxxx']["endpoints"][resource] % (current_page)
            response = urlfetch.fetch(
                url=url,
                deadline=60,
                method=urlfetch.GET,
                headers=headers,
                follow_redirects=False,
            )
            if response.status_code != 200:
                logging.error("xxxx GET received status code %d!" % (response.status_code))
                logging.error("error happened for url: %s with headers %s", url, headers)
                return 'Sorry, xxxx API request failed', 500
            db = json.loads(response.content)
            if not db['data']:
                break
            data.extend(db['data'])
            if db['total_pages'] == current_page:
                break
            current_page += 1
        return data
    except Exception:
        logging.exception("Error occurred with xxxx API request")
        raise
I'm guessing this is the same question as this, but now with more code:
DeadlineExceededError: The overall deadline for responding to the HTTP request was exceeded
I modified your code to write to the database after each urlfetch. If there are more pages, then it relaunches itself in a deferred task, which should be well before the 10 minute timeout.
Uncaught exceptions in a deferred task cause it to retry, so be mindful of that.
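As an aside on that retry behavior, a minimal sketch, assuming the standard App Engine deferred library: raising deferred.PermanentTaskFailure marks a failure as permanent so the task is not retried (the page bound here is a hypothetical example).

from google.appengine.ext import deferred

def safe_run(current_page=0):
    if current_page > 1000:  # hypothetical sanity bound on the page chain
        # PermanentTaskFailure is logged, but the task is NOT retried.
        raise deferred.PermanentTaskFailure("too many pages; giving up")
    run(current_page)  # run() as defined in the cron task below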
It was unclear to me how actual_data_source and original_data_source worked, but I think you should be able to modify that part yourself.
cron task
from google.appengine.ext import deferred  # needed for deferred.defer below

def run(current_page=0):
    try:
        config = cron.config
        actual_data_source = config['xxx']['xxxx']
        original_data_source = actual_data_source
        data, more = cron.rest_client.load(config, "companies", '', current_page)
        for row in data:
            company_repository.save(row, original_data_source, actual_data_source)
        # fetch the rest
        if more:
            deferred.defer(run, current_page + 1)
    except Exception as e:
        logging.exception("run() experienced an error: %s" % e)
RestClient
def load(config, resource, filter, current_page):
    try:
        username = config['xxxx']['xxxx']
        password = config['xxxx']['xxxx']
        headers = {"Authorization": "Basic %s" % base64.b64encode(username + ":" + password)}
        if filter:
            from_date = filter['from']
            to_date = filter['to']
            ticker = filter['ticker']
            start_date = datetime.strptime(from_date, '%Y%m%d').strftime("%Y-%m-%d")
            end_date = datetime.strptime(to_date, '%Y%m%d').strftime("%Y-%m-%d")
            url = config['xxxx']["endpoints"][resource] % (ticker, current_page, start_date, end_date)
        else:
            url = config['xxxx']["endpoints"][resource] % (current_page)
        response = urlfetch.fetch(
            url=url,
            deadline=60,
            method=urlfetch.GET,
            headers=headers,
            follow_redirects=False,
        )
        if response.status_code != 200:
            logging.error("xxxx GET received status code %d!" % (response.status_code))
            logging.error("error happened for url: %s with headers %s", url, headers)
            return [], False
        db = json.loads(response.content)
        return db['data'], (db['total_pages'] != current_page)
    except Exception as e:
        logging.exception("Error occurred with xxxx API request: %s" % e)
        return [], False
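For completeness, a sketch of how the cron endpoint could kick off the first page; this assumes webapp2 (standard on first-generation App Engine), a hypothetical URL, and that run() above is importable in this module:

import webapp2
from google.appengine.ext import deferred

class CompaniesCronHandler(webapp2.RequestHandler):
    def get(self):
        # The cron request only enqueues page 0; the deferred chain above
        # fetches the remaining pages one task at a time.
        deferred.defer(run, 0)
        self.response.write("queued")

app = webapp2.WSGIApplication([("/cron/companies", CompaniesCronHandler)])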
I would prefer to write this as a comment, but I need more reputation to do that.
What happens when you run the actual data fetch directly instead of through the cron job?
Have you tried measuring a time delta from the start to the end of the job?
Has the number of companies being retrieved increased dramatically?
You appear to be doing some form of stock-quote aggregation - is it possible that the provider has started blocking you?
I'd like to know if a record is present in a database, using a field different from the key field.
I've tried the following code:
function start()
{
  jlog("start db query")
  myType d1 = {A:"rabbit", B:"poney"};
  /myDataBase/data[A == d1.A] = d1
  jlog("db write done")
  option opt = ?/myDataBase/data[B == "rabit"]
  jlog("db query done")
  match(opt)
  {
    case {none} : <>Nothing in db</>
    case {some:data} : <>{data} in database</>
  }
}

Server.start(
  {port:8092, netmask:0.0.0.0, encryption: {no_encryption}, name:"test"},
  [
    {page: start, title: "test" }
  ]
)
But the server hangs and never gets to the line jlog("db query done"). I misspelled "rabit" deliberately. What should I have done?
Thanks
Indeed, it fails with your example; I get a "Match failure 8859742" exception - don't you see it?
But do not use ?/myDataBase/data[B == "rabit"] (which should have been rejected at compile time - bug report sent); use /myDataBase/data[B == "rabit"] instead, which is a DbSet. The reason is that when you don't query by the primary key, you can get more than one value back, i.e. a set of values.
You can convert a DbSet to an iterator with DbSet.iterator, then manipulate it with Iter: http://doc.opalang.org/module/stdlib.core.iter/Iter