Is there any way I can make Gatling follow a link after page load? On click, this link redirects to a new page that shows a login page.
Gatling doesn't automate web browsers.
You have to parse the network traffic in order to find where the link comes from.
Such parsing really depends on how your application works.
If the HTML content is generated server side, there's an example based on CSS selectors in the tutorial.
If it's generated in JavaScript, you probably have to parse some JSON payload, e.g. with a JMESPath check.
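For instance, here is a rough sketch of the CSS-selector approach using Gatling's JavaScript SDK. The import paths, base URL and `a.login-link` selector are my assumptions, so check the DSL names against the docs for your Gatling version:

```javascript
// Sketch only: capture a link's href from the landing page, then request it,
// which is what "following the link" amounts to at the HTTP level.
import { simulation, scenario, atOnceUsers, css } from "@gatling.io/core";
import { http } from "@gatling.io/http";

export default simulation((setUp) => {
  const httpProtocol = http.baseUrl("https://example.com"); // assumed base URL

  const scn = scenario("Follow link to login").exec(
    http("Landing page")
      .get("/")
      // Hypothetical selector: save the link's href into the session
      .check(css("a.login-link", "href").saveAs("loginUrl")),
    // Request the saved URL, as a browser would after the click
    http("Login page").get("#{loginUrl}")
  );

  setUp(scn.injectOpen(atOnceUsers(1))).protocols(httpProtocol);
});
```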
I use this tool to test my structured data:
https://search.google.com/structured-data/testing-tool
This is my page:
https://www.offersprive.eu/it/prod/Black%20Latte/56
If I try to check it, the response is empty
...But if I copy and paste my HTML content, the tool reads it correctly.
What can I do to get the tool to read the page content? Is that a problem with React content loading?
Thanks.
I'm having this same issue.
Basically I have a static website (job board) built with React and want the job to show in the Google Job Network.
To do this the web page needs to contain structured data for Google to crawl.
I've tried some npm packages like react-structured-data, which does get the data to appear in the header, but the data gets injected AFTER Google runs its scan, so the data does not yet exist for Google, and the tool therefore returns zero results.
I have the same issue when I try using react-helmet.
I have the same issue when I try to append a script with the data to either the header or body in componentDidMount or componentWillMount.
It's weird that the data shows in the header when I inspect elements but doesn't show when I view the page source.
Maybe one solution is server-side rendering, but there must be another way.
Possible answer
According to this answer, Google might actually see the data; it's just the testing tool that doesn't see it, which is quite a pain in the butt.
https://webmasters.stackexchange.com/questions/91064/structured-data-tool-doesnt-see-javascript-rendered-content
Also, this page:
https://developers.google.com/search/docs/guides/intro-structured-data#structured-data-format
says: "Google can read JSON-LD data when it is dynamically injected into the page's contents, such as by JavaScript code or embedded widgets in your content management system."
Another potential solution, but less plausible because it still loads after the fact
Instead of using JSON-LD, use microdata attached to your elements. For example, if you go here:
https://schema.org/JobPosting
and click example 4, microdata tab
Then perhaps it will know to wait for your elements to load before scanning.
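A rough JSX sketch of that idea, with property names taken from schema.org/JobPosting and placeholder markup and fields of my own:

```javascript
// Microdata version: the attributes live on the rendered elements themselves
function JobPostingCard({ job }) {
  return (
    <div itemScope itemType="https://schema.org/JobPosting">
      <h2 itemProp="title">{job.title}</h2>
      <span itemProp="employmentType">{job.employmentType}</span>
      <div itemProp="jobLocation" itemScope itemType="https://schema.org/Place">
        <span itemProp="address">{job.address}</span>
      </div>
    </div>
  );
}
```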
Testing these solutions now. I will update probably tomorrow as I am logging off soon.
UPDATE: I FOUND THE ANSWER
I have tried the above, and it appears that the data is valid and Google does see it; it's just that Google's Structured Data tool (and some structured-data Chrome extensions) don't see the data, because those tools scan the page before the data is loaded in. Other tools wait until the data is loaded before scanning, and on those tools it works.
For example: if you inspect your web page, click on the html element, choose "Edit as HTML", copy the entire HTML of your page, and paste that HTML as code into the Google Structured Data tool, you should see that it now finds your data. Hopefully they fix that in the future, but for now you can at least do that to make sure your data is valid.
Another thing: if you go to Google Search Console, request that the URL in question be indexed by Google, wait a day or so for it to process, and then check back, you will see that Search Console DID find your data. So Google IS seeing your data; it's just the broken Structured Data tool from Google that is not. Hopefully it is fixed soon.
For the record, the way I got this to work in my React app is by putting my data inside componentDidMount. E.g.:
```javascript
componentDidMount() {
  // Create a JSON-LD script tag and fill it with the JobPosting data
  const googleJobNetworkScript = document.createElement("script");
  googleJobNetworkScript.type = "application/ld+json";
  googleJobNetworkScript.innerHTML = JSON.stringify({
    "@context": "http://schema.org",
    "@type": "JobPosting",
    "baseSalary": "100000",
    "jobBenefits": "Medical, Life, Dental",
    "datePosted": "2011-10-31",
    "description": "Description: ABC Company Inc. seeks a full-time mid-level software engineer to develop in-house tools.",
    "educationRequirements": "Bachelor's Degree in Computer Science, Information Systems or related fields of study.",
    "employmentType": "Full-time",
    "experienceRequirements": "Minimum 3 years experience as a software engineer",
    "incentiveCompensation": "Performance-based annual bonus plan, project-completion bonuses",
    "industry": "Computer Software",
    "jobLocation": {
      "@type": "Place",
      "address": {
        "@type": "PostalAddress",
        "addressLocality": "Kirkland",
        "addressRegion": "WA"
      }
    },
    "occupationalCategory": "15-1132.00 Software Developers, Application",
    "qualifications": "Ability to work in a team environment with members of varying skill levels. Highly motivated. Learns quickly.",
    "responsibilities": "Design and write specifications for tools for in-house customers Build tools according to specifications",
    "salaryCurrency": "USD",
    "skills": "Web application development using Java/J2EE Web application development using Python or familiarity with dynamic programming languages",
    "specialCommitments": "VeteranCommit",
    "title": "Software Engineer",
    "workHours": "40 hours per week"
  });
  // Inject the tag into <head> so crawlers that execute JS can pick it up
  document.head.appendChild(googleJobNetworkScript);
}
```
You can also append the child to document.body instead of document.head. Either should work. Your choice.
You could also use react-helmet or react-structured-data from npm, as some other people do, but I didn't see the need, since the above seems to work fine.
You can find the other structured data types at schema.org.
Remember to either submit a new sitemap to Google or submit your site to the Google Indexing API each time you have a new webpage, or a webpage with updated content, that you would like Google to scan.
This post is long but I hope it covered all the bases and I hope it helps.
Having had a brief look at how your website loads, I believe you are using React Helmet. The issue with this tool (and with vanilla React in general) is that the page must be loaded and the JavaScript run before your headers are set and your content rendered.
Most tools that crawl webpages don't run JavaScript. Google now does on its main crawler, I believe, but it doesn't seem to have updated all of its various tools, and support from Facebook, Twitter, Bing etc. is patchy at best.
The answer is probably either Gatsby or Next.js: both provide ways of rendering your React code on the server or at build time, so that all the headers and content are sent when your page is first requested. You can write your own server-side rendering, but these solutions do all that legwork for you.
This removes the need for a crawler to run JavaScript, so you get indexed properly! For the sake of interest, when I ran into this issue I went with Gatsby.
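For example, a minimal Next.js sketch (the page name, fields and values here are made up) that ships the JSON-LD in the server-rendered HTML:

```javascript
// pages/job.js: the <script> tag is present in the initial HTML response,
// so crawlers see it without running any JavaScript
import Head from "next/head";

export default function JobPage() {
  const jsonLd = {
    "@context": "http://schema.org",
    "@type": "JobPosting",
    "title": "Software Engineer", // placeholder values
    "employmentType": "Full-time",
  };
  return (
    <>
      <Head>
        <script
          type="application/ld+json"
          dangerouslySetInnerHTML={{ __html: JSON.stringify(jsonLd) }}
        />
      </Head>
      <main>Job details here</main>
    </>
  );
}
```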
A quick workaround is to do what you do with your other links / meta tags: write them into the base index.html file. However, that obviously can't be updated per page, etc.
Hope that shines some light on it :)
Google Structured Data Tool doesn't read my React site content
I think it is reading it correctly, but not executing the JS. The Google Structured Data tool uses a crawler to fetch the page, which means it gets the source code of your page. To see what content the tool is working on, open your page and go to "View page source": the tool works on that source, not on what React generates.
Is that a problem with React content loading?
This is because React components are rendered after the page load, so your content is not visible to the crawler, since web crawlers do not execute JavaScript.
I hope this clears up your doubt.
I would suggest you have a look at the React Helmet package, which can help you manage your <head> and structured data.
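For what it's worth, a minimal react-helmet sketch (the product name and fields are placeholders based on the page in the question); note the caveat from the other answers still applies, since Helmet injects the tag client-side unless you also server-render:

```javascript
import React from "react";
import { Helmet } from "react-helmet";

// Helmet moves the script tag into <head> when the component renders
export default function ProductPage() {
  const jsonLd = {
    "@context": "http://schema.org",
    "@type": "Product",
    "name": "Black Latte", // placeholder value
  };
  return (
    <div>
      <Helmet>
        <script type="application/ld+json">{JSON.stringify(jsonLd)}</script>
      </Helmet>
      {/* page content */}
    </div>
  );
}
```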
I am trying to scrape data from a public site PINNACLE page.
The page contains a div which, when I fetch the page from Apps Script, shows up empty. I understand that this is because the data is loaded over AJAX after the page is loaded.
On checking the page, I found that they are using AngularJS. I looked with Chrome Developer Tools but could not find the AJAX URL.
Could anyone help me with this? I need to fetch the data shown in the table below using Google Apps Script.
Thank you all for looking into it.
I figured out the URL. It is https://www.pinnacle.com/webapi/1.17/api/v1/GuestLines/Deadball/4/487?callback=angular.callbacks._0
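In case it helps anyone else, here is a sketch of how that endpoint can be consumed from Apps Script. The response is JSONP (wrapped in angular.callbacks._0(...)), so the wrapper has to be stripped before parsing; the function name is mine:

```javascript
// Google Apps Script: fetch the JSONP endpoint and parse the payload
function fetchGuestLines() {
  var url = "https://www.pinnacle.com/webapi/1.17/api/v1/GuestLines/Deadball/4/487" +
      "?callback=angular.callbacks._0";
  var raw = UrlFetchApp.fetch(url).getContentText();
  // Body looks like: angular.callbacks._0({...}); cut the wrapper to get plain JSON
  var json = raw.substring(raw.indexOf("(") + 1, raw.lastIndexOf(")"));
  var data = JSON.parse(json);
  Logger.log(data);
  return data;
}
```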
I have a personal project which has consumed my free time and effort for about a year without significant profit. I have problems with its appearance in Google and would really appreciate some help here.
This project (http://yuppi.com.ua, similar to Craigslist in the US) is a web-based AngularJS 1.2 application that uses a PHP REST API, hosted on GoDaddy. For the application to become popular, it has to be very visible on the internet and very searchable in Google, and users have to be able to share pages via social networks or Skype.
According to Google's specification, Google's crawlers don't run JavaScript to get the content of a web page before indexing, so I've added an _escaped_fragment_ page that displays the content of the web page without JavaScript. For example:
Page: http://yuppi.com.ua/#!/items/sub/18/_
Dirty: yuppi.com.ua/?_escaped_fragment_=/items/sub/18/_
This dirty page gets redirected here, where Google will see the content:
http://yuppi.com.ua/server/crawler_proxy/routee.php?path=/items/sub/18/
So basically I have two versions of the HTML file for that page. One version is available to users, with styles, many more HTML tags, etc. The second is the version for the Google crawler: very lightweight, without any styles. And I expect to see a clean link to my site in Google, not a dirty one.
So if you search Google for all links to the site, you will see that one of the links displays its "dirty" state.
Another problem is sharing links in Skype.
When I send a link to someone, I expect the link to be turned into a thumbnail image, but that does not happen. Instead I see an ugly link to my web site.
Please help me understand how to make everyone happy: users, the Google crawler, GoDaddy and me.
I was encountering the same problems last year with a big project, and we ended up using https://prerender.io/.
It's a prerendering system that works with a PhantomJS browser to detect bot requests and render a full HTML template. It also provides a cache service so that a template that hasn't changed isn't rendered again.
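If your app is served by Node/Express, the wiring is just a middleware; a minimal sketch (the token is a placeholder):

```javascript
const express = require("express");
const prerender = require("prerender-node");

const app = express();

// Detects known crawler user agents and serves prerendered HTML from prerender.io
app.use(prerender.set("prerenderToken", "YOUR_PRERENDER_TOKEN"));

app.use(express.static("public")); // the normal AngularJS app for real users
app.listen(3000);
```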
Hope it helps.
I used this solution (https://stackoverflow.com/a/20537386/744040) to define the content of the Facebook sharing window.
I can't find a similar solution for Google+.
I am working on this website http://sportnews.codeskeleton.com/ and if you go to an article page and try to share it on G+, it takes the wrong image as the thumbnail, and for the title it takes the AngularJS code: Sportnew7-24 - <% Page.title() %> (I use <% instead of {{ to avoid collision with Blade).
I tried the solution with schema.org, but the Google popup seems to ignore the tags.
Thank you
Google+ share details are populated by Google making an HTTP request to the specified URL and parsing the HTML. Since yours is a single-page app, Google just parses the default template without executing your JavaScript. If you want Google+ sharing details to work, you will have to serve HTML with the title and image already rendered into it.
Google does have a method for telling crawlers about AJAX pages and how to crawl them, but I can't speak to whether the Google+ bot supports that standard.
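As a sketch of what "rendering the HTML with the title/image in it" can look like under Express (the route, lookup function and fields are all made up), using the itemprop attributes the Google+ snippet reads:

```javascript
const express = require("express");
const app = express();

app.get("/article/:slug", (req, res) => {
  const article = findArticle(req.params.slug); // hypothetical data lookup
  // The share metadata is in the initial response, so no JS execution is needed
  res.send(`<!DOCTYPE html>
<html itemscope itemtype="http://schema.org/Article">
<head>
  <title>${article.title}</title>
  <meta itemprop="name" content="${article.title}">
  <meta itemprop="description" content="${article.summary}">
  <meta itemprop="image" content="${article.image}">
</head>
<body><div ng-app="sportnews"></div></body>
</html>`);
});

app.listen(3000);
```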