Problems downloading a page updated every day - google-app-engine

I'm developing an application on GAE that fetches a web page and searches it for a link.
This page gets updated every morning, so each morning a cron job runs every 15 minutes for a couple of hours to obtain the current day's page.
Here's the problem: if, at the first execution of the cron job, the application finds the old page (yesterday's), it keeps fetching that one, although a new page is available at the same URL.
It seems that a cache is used somewhere, but I can't disable it.
The code that the application uses for downloading the page is simply Java I/O:
InputStream input = null;
ByteArrayOutputStream output = null;
HttpURLConnection conn = null;
URL url = new URL("http://www.page.url.net");
try {
    conn = (HttpURLConnection) url.openConnection();
    conn.setReadTimeout(0); // no read timeout
    conn.setUseCaches(false);
    int httpResponseCode = conn.getResponseCode();
    if (httpResponseCode == HttpURLConnection.HTTP_OK) {
        input = conn.getInputStream();
        output = writeByteArrayOutputStreamFromInputStream(input);
    } else {
        throw new IOException("response code " + httpResponseCode);
    }
} finally {
    if (input != null) {
        input.close();
    }
    if (conn != null) {
        conn.disconnect();
    }
}
What's wrong?

To avoid caching, I suggest a simple trick: add a "fake" query parameter to the end of the query string. For example, if the page you are fetching is
http://www.page.url.net
add a parameter named dummy so the URL becomes:
http://www.page.url.net?dummy=2013-05-25
Just be sure the "dummy" parameter is not actually interpreted by the remote server.
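For instance, here is a minimal sketch in Java of building such a URL (the parameter name dummy and the date-based value are arbitrary illustrations, not anything the remote server expects):
import java.text.SimpleDateFormat;
import java.util.Date;

// Append a date-based cache-busting parameter so each day's request has a unique URL.
String today = new SimpleDateFormat("yyyy-MM-dd").format(new Date());
URL url = new URL("http://www.page.url.net?dummy=" + today);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setUseCaches(false); // still worth setting, but the unique URL is what defeats upstream caches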
Hope this helps.

Related

leveraging Apache solr streaming capability to send millions of records as part of REST API

My problem statement goes like this:
"I want to leverage Apache Solr 8.6.1 streaming capability to send millions of records as part of a Spring Boot REST API call. I cannot call Solr endpoints directly due to security restrictions, and there is also some business logic in place. So I have written code through which I am able to read the data as a stream and push it to the Spring Boot output stream."
Every time I make the API call, it goes through the following code:
StreamFactory factory = new StreamFactory().withCollectionZkHost(COLLECTION_NAME, ZK_HOST);
SolrClientCache solrClientCache = new SolrClientCache(httpClient);
StreamContext streamContext = new StreamContext();
streamContext.setSolrClientCache(solrClientCache);
String expressionStr = String.format(SEARCH_EXPRESSION, COLLECTION_NAME);
StreamExpression expression = StreamExpressionParser.parse(expressionStr);
TupleStream stream;
try {
    stream = new CloudSolrStream(expression, factory);
    stream.setStreamContext(streamContext);
    stream.open();
    Tuple tuple = stream.read();
    int count = 0;
    while (!tuple.EOF) {
        String jsonStr = ++count + " " + tuple.jsonStr() + "\r\n";
        outputStream.write(jsonStr.getBytes());
        outputStream.flush();
        tuple = stream.read();
    }
    stream.close();
} catch (IOException e) {
    e.printStackTrace();
}
It tries to connect to ZooKeeper at stream.open(), and that is taking some time.
Is it possible to optimize this code so that it doesn't have to connect to ZooKeeper every time, and we can keep the connection ready beforehand? Because it is a stream, we have to open and close the stream with every call.
Also, how will it behave in a multi-user scenario?
Can anyone throw some light on this and how we can optimize it further?
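One possible direction (a sketch under assumptions, not a verified fix): SolrClientCache caches the underlying CloudSolrClient instances, so if you keep a single cache alive for the lifetime of the application instead of creating one per request, the ZooKeeper connection is established once and reused. Assuming a Spring Boot setup (COLLECTION_NAME and ZK_HOST being the same constants as in your code), the shared pieces could be application-scoped beans:
import org.apache.solr.client.solrj.io.SolrClientCache;
import org.apache.solr.client.solrj.io.stream.expr.StreamFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class SolrStreamingConfig {

    // Created once at startup and closed on shutdown; the cache holds the
    // CloudSolrClient (and its ZooKeeper connection) across requests.
    @Bean(destroyMethod = "close")
    public SolrClientCache solrClientCache() {
        return new SolrClientCache();
    }

    // StreamFactory is just configuration, so it can be shared as well.
    @Bean
    public StreamFactory streamFactory() {
        return new StreamFactory().withCollectionZkHost(COLLECTION_NAME, ZK_HOST);
    }
}
As for the multi-user question: each request should still build its own StreamContext and CloudSolrStream (a TupleStream is stateful and must not be shared between users), but call streamContext.setSolrClientCache(...) with the shared cache, so concurrent users reuse the same client rather than opening new ZooKeeper connections.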

How to handle "Accept Cookies" message when an automation script runs

I am writing a Selenium test script which navigates to a URL, say https://www.flipkart.com/ (this is just an example website).
When you navigate to the home page for the first time, a message regarding cookies is displayed, along with an "Accept Cookies" button.
Whenever my Selenium script runs and navigates to the home page, it gets the cookie message described above every time. My question is: what needs to be done so that the script does not encounter the cookie consent message?
I have managed to store the cookies in a file. It looks like this:
_gut_UB-97923818-1;1;.mycompany.com;/;Fri Mar 29 18:12:07 EET 2019;false
I have also tried to set its expiry with below code
public void retrieveCookie()
{
    try {
        File file = new File("Cookie.data");
        FileReader fileReader = new FileReader(file);
        BufferedReader Buffreader = new BufferedReader(fileReader);
        String strline;
        while ((strline = Buffreader.readLine()) != null) {
            StringTokenizer token = new StringTokenizer(strline, ";");
            while (token.hasMoreTokens()) {
                String name = token.nextToken();
                String value = token.nextToken();
                String domain = token.nextToken();
                String path = token.nextToken();
                Date expiry = null;
                String val;
                if (!(val = token.nextToken()).equals("null")) { // Thu Mar 28 23:26:39 EET 2019
                    expiry = new Date(val);
                }
                Boolean isSecure = new Boolean(token.nextToken()).booleanValue();
                Cookie ck = new Cookie(name, value, domain, path, expiry, isSecure);
                BaseDriver.getDriver().manage().addCookie(ck); // add the stored cookie to the current session
            }
        }
    } catch (Exception ex) {
        ex.printStackTrace();
    }
    BaseDriver.getDriver().get("https://www.flipkart.com/");
}
However, I get a java.lang.IllegalArgumentException at the line
expiry = new Date(val);
because the date string cannot be parsed.
Can someone share code so that the date can be parsed?
My only intention is whenever the test script runs, it should not encounter the cookie consent message. If there is any other way to achieve this, please suggest.
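A possible fix, as a sketch: the deprecated Date(String) constructor chokes on this format, so parse the string explicitly with SimpleDateFormat instead. This assumes the stored value always matches the default Date.toString() layout, e.g. "Fri Mar 29 18:12:07 EET 2019":
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Pattern for Date.toString() output such as "Fri Mar 29 18:12:07 EET 2019".
// Locale.ENGLISH keeps "Fri"/"Mar" parseable regardless of the default locale.
SimpleDateFormat format = new SimpleDateFormat("EEE MMM dd HH:mm:ss zzz yyyy", Locale.ENGLISH);
Date expiry = null;
if (!val.equals("null")) {
    try {
        expiry = format.parse(val);
    } catch (ParseException e) {
        e.printStackTrace(); // leave expiry null if the value can't be parsed
    }
}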

How can you run a report from the ReportServer database without building subscriptions?

I'd like to build a back-end system that allows me to run each report every night and then query the execution log to see if anything failed. I know you can build out subscriptions for these reports and define parameters etc., but is there a way to execute each report from the ReportServer database using T-SQL without building out each subscription?
I understand that your overall goal is to automate this and not have to write a subscription for every report. You say you want to do it in T-SQL, but is that required to meet your overall goal?
If you can accept, say, .NET, then you can use the System.Data.SqlClient.SqlConnection and related classes to query your report server catalog and fetch a listing of all your reports.
Then you can use System.Net.WebClient or a similar tool to attempt to download a PDF of each report. From there you can either read your execution log or catch the error in the .NET code.
EDIT
Well, since you accepted the answer, and it seems you may go this route, I'll mention that if you're not familiar with .NET, it may be a long path for you. Here are a few things to get you started.
Below is a C# function utilizing .NET that will query the report catalog. If safeImmediate is set to true, it will only capture reports that can be run immediately, meaning there are no parameters or the defaults cover the parameters.
IEnumerable<string> GetReportPaths(
    string conStr,
    bool safeImmediate // as in, you can execute the report right away without parameters
) {
    using (var con = new SqlConnection(conStr))
    using (var cmd = new SqlCommand()) {
        cmd.Connection = con;
        cmd.CommandText = @"select path from catalog where type = 2";
        con.Open();
        if (safeImmediate)
            cmd.CommandText = @"
                select path
                from catalog
                cross apply (select
                    params = convert(xml, Parameter).value('count(Parameters/Parameter)', 'int'),
                    defaults = convert(xml, Parameter).value('count(Parameters/Parameter/DefaultValues/Value)', 'int')
                ) counts
                where type = 2
                and params = defaults
                and path not like '%subreport%' -- this is not standard. Just works for my conventions
            ";
        using (var rdr = cmd.ExecuteReader())
            while (rdr.Read())
                yield return rdr["path"].ToString();
    }
}
The next function will download a report, given a proper path passed to it:
byte[] DownloadReport(
    WebClient wc,
    string coreUrl,
    string fullReportPath,
    string parameters = "" // you won't use this, but it may come in handy for other uses
) {
    var pathToViewer = "ReportServer/Pages/ReportViewer.aspx"; // for typical SSRS installs
    var renderOptions = "&rs:Format=pdf&rs:Command=Render"; // return as pdf
    var url = $@"{coreUrl}/{pathToViewer}?{fullReportPath}{parameters}{renderOptions}";
    url = Uri.EscapeUriString(url); // URLs don't like certain characters, fix that
    return wc.DownloadData(url);
}
And this utilizes the functions above to find what's succeeding and what's not:
var sqlCon = "Server=yourReportServer; Database=ReportServer; Integrated Security=yes"; // or whatever
var ssrsSite = "http://www.yourSite.org";
using (var wc = new WebClient()) {
    wc.UseDefaultCredentials = true; // or whatever
    int loops = 3; // get rid of this when you're ready for prime-time
    foreach (var path in GetReportPaths(sqlCon, true)) {
        try {
            DownloadReport(wc, ssrsSite, path);
            Debug.WriteLine($"Success with: {path}");
        }
        catch (Exception ex) { // you might want to get more specific
            Debug.WriteLine($"Failed with: {path}");
        }
        if (loops-- == 0)
            break;
    }
}
Lots to learn, but it can be very beneficial. Good luck.

Selenium login time - Am I calculating it right?

I have searched quite a bit on how to measure login time but could not find a definitive answer. I do not want to introduce any timers in my scripts. My aim is to find out exactly how much time it takes to log in and log out in my Selenium script.
Here is what I have so far: I am capturing the start and finish times and computing the login time as follows:
public void testLogin() {
    String csvFile = "C:\\Users\\users.csv"; // backslashes must be escaped in Java string literals
    BufferedReader br = null;
    String line = "";
    String cvsSplitBy = ",";
    try {
        br = new BufferedReader(new FileReader(csvFile));
        while ((line = br.readLine()) != null) {
            // use comma as separator
            String[] value = line.split(cvsSplitBy);
            WebDriver driver = new HtmlUnitDriver();
            //WebDriver driver = new FirefoxDriver();
            driver.get("www.test.com");
            long start = System.currentTimeMillis();
            driver.findElement(By.id("txt-username")).sendKeys(value[0]);
            driver.findElement(By.id("pwd-password")).sendKeys(value[1]);
            driver.findElement(By.id("login-widget-submit")).click();
            long finish = System.currentTimeMillis();
            long overallTime = finish - start;
            System.out.println("Total time for login - " + overallTime);
            driver.close();
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
If you want to only measure the time taken to log in (not to load the login page and then log in), you will want to add a WebDriverWait after driver.get() and wait for a specific element to load to ensure that the page is fully loaded. You will want to add another wait after clicking Submit to ensure that the page after login has loaded completely. That is a better test of login time. What you have now is you start the timer potentially before the login page is loaded and then stop the timer when you click the Submit button... but the user isn't actually logged in yet.
I personally use the StopWatch class that's a part of apache.commons to do timings.
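A rough sketch of that approach (the logout-link locator is hypothetical, standing in for any element that appears only after login; WebDriverWait here uses the pre-Selenium-4 constructor that takes the timeout in seconds):
import org.apache.commons.lang3.time.StopWatch;
import org.openqa.selenium.By;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

WebDriverWait wait = new WebDriverWait(driver, 30);

// Make sure the login page is fully rendered before starting the clock.
wait.until(ExpectedConditions.visibilityOfElementLocated(By.id("txt-username")));

StopWatch stopWatch = new StopWatch();
stopWatch.start();
driver.findElement(By.id("txt-username")).sendKeys(value[0]);
driver.findElement(By.id("pwd-password")).sendKeys(value[1]);
driver.findElement(By.id("login-widget-submit")).click();

// Stop only once an element unique to the post-login page is visible, so the
// measurement includes the post-login page load.
wait.until(ExpectedConditions.visibilityOfElementLocated(By.id("logout-link")));
stopWatch.stop();
System.out.println("Total time for login - " + stopWatch.getTime() + " ms");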

Should I Be Using Async Calls?

I have a C# application which reads a table of roughly 1500 site URLs of clients who have been with the company since we started. Basically, I am running whois queries on these URLs and seeing whether they are still clients or not. The application works, but it takes roughly an hour to complete. Would I be better off using async whois queries, and roughly how much time could I save?
Here is a sample whois query block of code that I am using.
Also, if anyone has any tips on how to improve this code or run async commands, could ye please help me out, as I'm only an intern. Thanks.
string whoisServer = "whois.markmonitor.com";
string data;
try
{
    TcpClient objTCPC = new TcpClient(whoisServer, 43);
    string strDomain = domainName + "\r\n";
    byte[] arrDomain = Encoding.ASCII.GetBytes(strDomain);
    Stream objStream = objTCPC.GetStream();
    objStream.Write(arrDomain, 0, strDomain.Length);
    StreamReader objSR = new StreamReader(objTCPC.GetStream(),
        Encoding.ASCII);
    //return objSR.ReadLine();
    //return (Regex.Replace(objSR.ReadToEnd(),"\n","<br>")).ToString();
    using (StreamReader reader = new StreamReader(objTCPC.GetStream(), Encoding.ASCII))
    {
        data = (reader.ReadToEnd());
    }
    //test.Add(objSR.ReadLine());
    objTCPC.Close();
}
catch
{
    data = "Not Found";
}
return data;
Well, the short answer is certainly yes.
Since you are making multiple, completely independent lookups, you have everything to gain by running them in parallel, asynchronously.
There are several ways to do this; which options are available depends on the version of .NET you're on.
As you would guess, there are many examples.
Check these out right here on SO.
Avaliable parallel technologies in .Net
Multi threaded file processing with .NET
When to use a Parallel.ForEach loop instead of a regular foreach?
