I use this tool to test my structured data:
https://search.google.com/structured-data/testing-tool
This is my page:
https://www.offersprive.eu/it/prod/Black%20Latte/56
If I try to check it, the response is empty
...but if I copy and paste my HTML content, the tool reads it correctly.
What can I do to make the tool read the content from the link? Is this a problem with how React loads content?
Thanks.
I'm having this same issue.
Basically I have a static website (job board) built with React and want the job to show in the Google Job Network.
To do this the web page needs to contain structured data for Google to crawl.
I've tried some npm packages like react-structured-data, which does get the data to appear in the <head>, but the data gets injected AFTER Google runs its scan, so the data does not yet exist as far as Google is concerned, and the tool returns zero results.
I have the same issue when I try using react-helmet.
I have the same issue when I try to append a script with the data to either the head or body in componentDidMount or componentWillMount.
It's weird that the data shows in the head when I inspect elements but not when I view the page source (inspect shows the rendered DOM; view source shows the raw HTML response).
Maybe one solution is server-side rendering, but there must be another way.
Possible answer
According to this answer, Google might actually see the data; it's just the testing tool that doesn't, which is quite a pain in the butt:
https://webmasters.stackexchange.com/questions/91064/structured-data-tool-doesnt-see-javascript-rendered-content
Also, this page:
https://developers.google.com/search/docs/guides/intro-structured-data#structured-data-format
says: "Google can read JSON-LD data when it is dynamically injected into the page's contents, such as by JavaScript code or embedded widgets in your content management system."
Another potential solution, but less plausible because it still loads after the fact
Instead of using JSON-LD, use microdata attached to your elements, like if you go here:
https://schema.org/JobPosting
and click example 4, microdata tab
Then perhaps it will know to wait for your elements to load before scanning.
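For reference, here's a rough sketch of what that microdata might look like inside a React component (the values are invented; React does support the itemScope/itemType/itemProp spellings of the HTML microdata attributes in JSX):

```js
// Hypothetical React component carrying schema.org microdata.
// itemScope / itemType / itemProp are the JSX spellings of the
// HTML microdata attributes.
function JobPostingCard() {
  return (
    <div itemScope itemType="https://schema.org/JobPosting">
      <h1 itemProp="title">Software Engineer</h1>
      <meta itemProp="datePosted" content="2011-10-31" />
      <p itemProp="description">
        ABC Company Inc. seeks a full-time mid-level software engineer.
      </p>
    </div>
  );
}
```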
Testing these solutions now. I will update probably tomorrow as I am logging off soon.
UPDATE: I FOUND THE ANSWER
I have tried the above and it appears that the data is valid and Google does see it; it's just that Google's Structured Data tool (and also some structured-data Chrome extensions) don't see the data. This is because such tools scan the page before the data is loaded in. Other tools wait until the data is loaded before scanning, and on those tools it works.
For example: if you inspect your web page, click on the html element, choose "Edit as HTML", and copy the entire HTML of your page, then paste that HTML as code into the Google Structured Data tool, you should see that it now finds your data. Hopefully they fix that in the future, but for now you can at least use this to make sure your data is valid.
Another thing: if you go to the Google Search Console and request the URL in question to be indexed by Google, then wait a day or so for it to process and check back, you will see that the Search Console DID find your data. So Google IS seeing your data; it's just the broken Structured Data tool from Google that is not. Hopefully it is fixed soon.
For the record, here is how I got this to work in my React app: by putting my data inside componentDidMount. E.g.:
```js
componentDidMount() {
  // Create a JSON-LD script tag and inject it once the component mounts.
  // Note: JSON-LD keywords use "@" ("@context", "@type"), not "#".
  const googleJobNetworkScript = document.createElement("script");
  googleJobNetworkScript.type = "application/ld+json";
  googleJobNetworkScript.innerHTML = JSON.stringify({
    "@context": "http://schema.org",
    "@type": "JobPosting",
    "baseSalary": "100000",
    "jobBenefits": "Medical, Life, Dental",
    "datePosted": "2011-10-31",
    "description": "Description: ABC Company Inc. seeks a full-time mid-level software engineer to develop in-house tools.",
    "educationRequirements": "Bachelor's Degree in Computer Science, Information Systems or related fields of study.",
    "employmentType": "Full-time",
    "experienceRequirements": "Minimum 3 years experience as a software engineer",
    "incentiveCompensation": "Performance-based annual bonus plan, project-completion bonuses",
    "industry": "Computer Software",
    "jobLocation": {
      "@type": "Place",
      "address": {
        "@type": "PostalAddress",
        "addressLocality": "Kirkland",
        "addressRegion": "WA"
      }
    },
    "occupationalCategory": "15-1132.00 Software Developers, Application",
    "qualifications": "Ability to work in a team environment with members of varying skill levels. Highly motivated. Learns quickly.",
    "responsibilities": "Design and write specifications for tools for in-house customers Build tools according to specifications",
    "salaryCurrency": "USD",
    "skills": "Web application development using Java/J2EE Web application development using Python or familiarity with dynamic programming languages",
    "specialCommitments": "VeteranCommit",
    "title": "Software Engineer",
    "workHours": "40 hours per week"
  });
  document.head.appendChild(googleJobNetworkScript);
}
```
You can also append the child to document.body instead of document.head. Either should work. Your choice.
You could also use react-helmet, or react-structured-data from NPM, which some other people do, but I didn't see the need, since the above seems to work fine.
You can find the other structured data types at schema.org.
Remember to either submit a new sitemap to Google or submit your site to the Google Indexing API each time you add a new page or update content that you would like Google to scan.
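For the Indexing API route, here's a rough sketch of the publish call (this assumes you have already obtained an OAuth 2.0 access token with the https://www.googleapis.com/auth/indexing scope; the helper name is mine):

```js
// Sketch: notify Google's Indexing API that a URL was added or updated.
// Obtaining the OAuth token (service account + indexing scope) is assumed.
async function notifyGoogle(url, accessToken) {
  const res = await fetch(
    "https://indexing.googleapis.com/v3/urlNotifications:publish",
    {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${accessToken}`,
      },
      body: JSON.stringify({ url, type: "URL_UPDATED" }),
    }
  );
  return res.json();
}
```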
This post is long but I hope it covered all the bases and I hope it helps.
Having had a brief look at how your website loads, I believe you are using React Helmet. The issue with this tool (and vanilla React in general) is that the page must be loaded and the JavaScript run before your headers are set and your content is loaded.
Most tools that crawl webpages don't run JavaScript. I believe Google's main crawler now does, but they don't seem to have updated all their various tools, and for Facebook, Twitter, Bing, etc. support is patchy at best.
The answer is probably either Gatsby or Next.js; both provide ways of rendering your React code on the server or at build time, so that all the headers and content are sent when your page is first requested. You can write your own server-side rendering methods, but these solutions do all that legwork for you.
This removes the need for a crawler to run JavaScript, so you get indexed properly! For the sake of interest, when I ran into this issue I went for Gatsby.
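For illustration, a minimal sketch of what the JSON-LD could look like in a Next.js page (the page and its data are invented; the point is that the script tag is part of the server-rendered HTML):

```js
// Hypothetical Next.js page. The JSON-LD is rendered on the server,
// so crawlers that don't execute JavaScript still see it in the response.
import Head from "next/head";

export default function JobPage() {
  const jsonLd = {
    "@context": "https://schema.org",
    "@type": "JobPosting",
    title: "Software Engineer",
    datePosted: "2011-10-31",
  };
  return (
    <Head>
      <script
        type="application/ld+json"
        dangerouslySetInnerHTML={{ __html: JSON.stringify(jsonLd) }}
      />
    </Head>
  );
}
```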
A quick workaround is to do what you do with your other links/meta tags: write them into the base index.html file. However, this obviously can't be updated per page, etc.
Hope that shines some light on it :)
Google Structured Data Tool doesn't read my React site content
I think it's reading correctly, but not executing the JS. The Google tool uses a crawler to fetch the page, which means it sees the source code of your page. To see what content the tool is working on, open your page and go to "view page source": the tool works on this source, not on what React generates.
Is that a problem with React content loading?
This is because React components are rendered after the page loads, and your content is not visible to the crawler, as web crawlers do not execute JavaScript.
I hope this clears your doubt.
I would suggest you have a look at the React Helmet package, which can help you manage your <head> and structured data.
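A minimal sketch of structured data with react-helmet (the values are invented, and the caveat from the other answers still applies: without server-side rendering the script only appears in the live DOM, not in the initial page source):

```js
// Hypothetical component: react-helmet moves the script into <head>.
import React from "react";
import { Helmet } from "react-helmet";

function JobPage() {
  const jsonLd = {
    "@context": "https://schema.org",
    "@type": "JobPosting",
    title: "Software Engineer",
    datePosted: "2011-10-31",
  };
  return (
    <Helmet>
      <script type="application/ld+json">{JSON.stringify(jsonLd)}</script>
    </Helmet>
  );
}
```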
I'm looking for a way to create a search box in WordPress where visitors can search for a number from the database. Is this possible? I have several package numbers in my database. I want to give my visitors the ability to search for their package number and request the information that comes with it.
What you want to do can be done.
I suggest a different approach than using wp-exec. (I just looked at the wp-exec website, and that plugin was created for WordPress 1.5, which means it hasn't been updated in about 5 years.)
The content you want to display exists entirely outside of WordPress. I suggest you use a custom page template - see
http://codex.wordpress.org/Pages#Creating_Your_Own_Page_Templates
In this case you would not use WordPress posts or pages or custom post types. On the custom page template you would write (or have written, if you don't have the know-how to do it yourself) PHP code to extract the info from the database and display it on a page.
For pages like that you would be using WordPress only as a container within which to display the results: the custom page would appear in the site nav, and the page of results would use the site's theme, so it looks like the rest of the site.
But the code to display from the database would not use the WordPress loop. It would be PHP / MySQL data retrieval and display code.
I really doubt you will find a plugin that lets you display results from an external database formatted the way you want. The reason is that every external database is different, with different tables and table structures, and no two sites will want the external data displayed the same way. So there is little generalization to encapsulate in a plugin, as everyone wants it different.
I've created pages on some sites along the lines of what you want to do thus I know it can be done. But it requires writing custom code.
I'm trying to implement the PageRank algorithm on a set of web pages. For that I need a sample dataset of web pages and the web graph corresponding to them; this web graph represents the links between the pages that the dataset contains.
I need the web graph so I can get the transition matrix and do the calculation needed. Example:
URL1 -> URL2
URL3390 -> URL5
URLxxxx is an id, somehow mapped to the corresponding web page
My question is: how/where can I get this resource? (I've tried many links on the internet but nothing really helps.) I would also like it not to be of a very large size (internet connection limitation). If I can't have it exactly like this, could you give me some advice on what I should do?
Update: for people who may consider this off topic, and they may be right: sites like Software Recommendations or Computer Science don't even have corresponding tags and don't really fit this kind of question. I appreciate your help.
Maybe Site Visualizer is the tool you're looking for. The app has a feature to generate a visual sitemap.
Download and install the app (Standard or Pro version), click the Create new project tool button, type the URL of the website you need to crawl, and then click the Start button.
After the crawling is finished, click the Draw button on the Visual Sitemap tab. A graph of the website will be drawn as a set of pages (rectangles) and links (lines with arrows). Click on a box to select a particular page and highlight its outbound links.
You can get a dataset of all links of the website by using the All Links report (on the Reports tab). The 'From URL' and 'To URL' columns are what you need.
Besides that, you can produce a dataset of pages or links of the crawled website using your own SQL query. For instance, go to the Database tab, type the following query, and click the Execute tool button:
SELECT * FROM links WHERE link_type='A'
The resultset will contain only A-tag links, excluding images, CSS files, JS, etc.
The program has a full-featured 30-day trial period, so you can carry out your tasks for free.
You might try searching for datasets used in supplementary information for PageRank papers. Here's an example:
This paper: http://langvillea.people.cofc.edu/ReorderingPageRank.pdf
uses this dataset:
http://www.cs.cornell.edu/Courses/cs685/2002fa/data/gr0.California
which supposedly contains 9,664 nodes and 16,773 links. The links are at the end of the file and appear to be in a connection format similar to what you're looking for.
It comes from this page (which also has other datasets):
http://www.cs.cornell.edu/Courses/cs685/2002fa/
Here are a few other pages that aggregate network datasets:
http://snap.stanford.edu/data/, see particularly
http://snap.stanford.edu/data/web-Stanford.html
http://www.datawrangling.com/some-datasets-available-on-the-web
http://networkdata.ics.uci.edu/resources.php
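Once you have an edge list in the URL1 -> URL2 format above, turning it into PageRank scores is short. A rough sketch in JavaScript (the arrow-separated line format, damping factor, and iteration count are my assumptions):

```js
// Sketch: PageRank by power iteration over a "from -> to" edge list.
function pageRank(edgeLines, damping = 0.85, iterations = 50) {
  const ids = new Map(); // node label -> dense index
  const edges = [];
  for (const line of edgeLines) {
    const [from, to] = line.split("->").map((s) => s.trim());
    for (const node of [from, to]) {
      if (!ids.has(node)) ids.set(node, ids.size);
    }
    edges.push([ids.get(from), ids.get(to)]);
  }
  const n = ids.size;
  const outDegree = new Array(n).fill(0);
  for (const [from] of edges) outDegree[from]++;

  let rank = new Array(n).fill(1 / n);
  for (let it = 0; it < iterations; it++) {
    const next = new Array(n).fill((1 - damping) / n);
    // Dangling nodes (no outgoing links) spread their rank uniformly.
    let dangling = 0;
    for (let i = 0; i < n; i++) if (outDegree[i] === 0) dangling += rank[i];
    for (let i = 0; i < n; i++) next[i] += (damping * dangling) / n;
    for (const [from, to] of edges) {
      next[to] += (damping * rank[from]) / outDegree[from];
    }
    rank = next;
  }
  return { ids, rank };
}

// Example: pageRank(["URL1 -> URL2", "URL2 -> URL1", "URL1 -> URL3"]).rank
```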
Good luck!
Is it possible to store data harvested from a website (Nestoria) via their API using PHP?
I am able to extract the data using PHP and display the result in a web browser, but I need to save it into my PostGIS database. (I am using XAMPP and PostGIS on Windows 7.)
Most companies wouldn't have a problem with you doing that; for instance, eBay's API allows it. However, as Mapperz pointed out, Nestoria's terms require you not to compete with them for originality of the content. So any data that comes from their API and is stored in your database should not be indexable by search engines.
This isn't as difficult to comply with as you might think. You could load the content through an iframe whose page uses the "noindex, nofollow" robots meta tag, or pull the content from the database into your page's DOM using AJAX/JavaScript after the page has loaded.
I personally would go with the second option (AJAX).
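A rough sketch of the AJAX approach (the /api/listings endpoint, backed by your PostGIS table, is hypothetical):

```js
// Sketch: fetch the stored Nestoria data after the page has loaded, so the
// listings live in the DOM but not in the crawlable HTML response.
document.addEventListener("DOMContentLoaded", async () => {
  const res = await fetch("/api/listings"); // hypothetical endpoint
  const listings = await res.json();
  const container = document.getElementById("listings");
  for (const item of listings) {
    const div = document.createElement("div");
    div.textContent = `${item.title}: ${item.price}`;
    container.appendChild(div);
  }
});
```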
I'm looking for a solution to map one or all of the following (Flickr, Twitter, Vimeo), by EXIF, keywords, or whatever, to Google Maps. I'm trying to show a map of my location from updating social sites.
If each of the services you want supports GeoRSS, then you can actually build such a map with zero coding whatsoever! This is because Google Maps supports mapping a GeoRSS feed directly: all you have to do is type the URL of the RSS feed containing the GeoRSS data into the search box on Google Maps. Here's an example of the feed from my What's The Harm? website mapped in Google Maps.
Now, you mention several services, each of which would presumably have its own GeoRSS feed. What you would need to do is merge the feeds together before handing the resulting feed to Google Maps. There are a variety of ways to do this; one quick point-and-click way is via Yahoo Pipes. Search for "merge feed" or "merge RSS" there and you will find many examples that you can copy and modify.
Yahoo Pipes also has functionality to add GeoRSS coding to feeds that don't already have it. You could use that to bring in data like blog posts and so on that might not be GeoRSS. Look under "Operators" for the "Location Extractor" widget.
As for the websites you mentioned:
Flickr: yes. Make sure you map your photos, and use the feed marked "geoFeed".
Twitter: there's a service called GeoTwitter that can add this for you.
Vimeo: it doesn't appear they support it out of the box.
There's a nice greasemonkey script to ease geotagging on Flickr.