Schema.org - Something's afoot...

There's speculation that Schema.org may have been compromised in some manner. A number of people (myself included) have noticed very spammy links showing up in Webmaster Tools as itemtypes under Structured Data.

Rather than displaying (for example) http://schema.org/SiteNavigationElement, the report shows an itemtype pointing to various URLs on domains including www.yalwa.com, locanto.fr and askalo.fr. The only thing the affected sites have in common is their use of Schema.org.

Curiously, you can also reproduce the issue by entering a small HTML snippet into the Structured Data Testing Tool. The issue only seems to affect those in Europe though, with US users only able to reproduce it by using an EU-based proxy.

It appears to happen on roughly one request in every five, and you'll need to modify the snippet slightly to be able to resubmit the form (I simply clicked after the closing span and inserted a space each time).

Try inserting the following:

<html>
<div itemscope itemtype="http://schema.org/SiteNavigationElement">
<span itemprop="name">Badgers</span>
</div>
</html>

Submit, and then re-submit. If you're in Europe, within 5 requests you should see the itemtype change from

http://schema.org/SiteNavigationElement

to something like

http://denhaag.yalwa.nl/id_104436871/jl-pedicure.htm#__sid=12
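If you want to rule out your own markup as the source, a quick check is to pull every itemtype out of the page and flag any that don't point at schema.org. The following is only a minimal sketch using Python's standard library; the names ItemtypeCollector and unexpected_itemtypes are my own, and the assumption that schema.org is the only legitimate host is just that, an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ItemtypeCollector(HTMLParser):
    """Collects the value of every itemtype attribute in a document."""

    def __init__(self):
        super().__init__()
        self.itemtypes = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "itemtype" and value:
                # An itemtype attribute may hold several space-separated URLs.
                self.itemtypes.extend(value.split())


def unexpected_itemtypes(html, allowed_hosts=("schema.org",)):
    """Return the itemtype URLs whose host is not in allowed_hosts.

    allowed_hosts is an assumption: adjust it if your markup
    legitimately uses other vocabularies.
    """
    collector = ItemtypeCollector()
    collector.feed(html)
    collector.close()
    return [url for url in collector.itemtypes
            if urlparse(url).hostname not in allowed_hosts]
```

Run against the snippet above, this should return an empty list, which at least tells you the odd URLs aren't coming from your own HTML. You could equally paste in markup copied back out of the testing tool to see what it changed.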

On the Google thread, a poster claiming to be from Yalwa says it's nothing to do with them. I'm reasonably inclined to believe that, given that it's not always a Yalwa URL being generated (though they do seem to form the vast majority).

More concerningly, some of the URLs returned link to adult material. If Google is taking the content of the linked 'schema' into account, there's a risk that a site may be wrongly classified.

So what's happened? Has someone found a way to exploit the ability to 'extend' schema.org schemas? Has someone compromised schema.org or is something else going on here?

Aside from using 'Fetch as Google' and the Rich Snippets Testing Tool, it appears to be impossible to reproduce the issue. Using Google's User-Agent isn't enough (nor is that of the Rich Snippets Testing Tool), so if it is a compromise, there's some IP filtering going on as well.
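For anyone wanting to repeat the User-Agent test, here's a rough sketch of sending a request with Googlebot's published User-Agent string from Python's standard library. The helper names build_request and fetch are hypothetical, and which URL to request is up to you; as noted above, spoofing the header alone wasn't enough in my testing, so expect the normal response.

```python
import urllib.request

# Googlebot's published desktop User-Agent string.
GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; "
    "+http://www.google.com/bot.html)"
)


def build_request(url, user_agent=GOOGLEBOT_UA):
    """Build (but do not send) a request carrying the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=GOOGLEBOT_UA, timeout=10):
    """Send the request and return the response body as text."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Comparing the output of fetch() with and without the spoofed header is a simple way to see whether the response differs by User-Agent alone.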

You can follow the thread here.

 

Update

It looks like the issue is now resolved. Apparently it only affected the display of the data and hadn't made its way into the search snippets etc.