Obfuscate profile links

Changing the way in which profile links are output is part of a broader approach to 'solving' spam profiles on wordpress.org. See https://www.jonoalderson.com/wordpress/wordpress-org-toxic-profile-spam/ for long-form research and rationale.

This part of the solution requires us to:

#1 @Ipstenu
21 months ago

Would it be feasible to only disallow profiles that are blocked or flagged? We have the ability to mark an account as blocked, and logically that's the most common spammy profile we'll see.

Your example of profiles.wordpress.org/phathaiophunhuan/ is a banned account, so logically that should show ... nothing?

#3 @jonoaldersonwp
21 months ago

No, crawl prevention can only be handled via robots.txt, and it'd be impractical (and technically challenging) to adapt that dynamically based on bad profiles. I'm also not sure if/how this would be advantageous for us.

RE: banned accounts; as per my post, I expected some of the examples I highlighted to have been 'fixed' since I discovered them. See also, #4632

#4 @dd32
21 months ago

I personally disagree with using something like /out/ as a redirect; however, I acknowledge that it's an easy way to deal with the years of accumulated spam links that we're otherwise unable to detect.

As part of the other profiles changes, I've hidden the URL for banned accounts which helps a little bit.

Would using https://profiles.wordpress.org/out-redirect/$user suffice here though? AFAICT it doesn't need to be signed or otherwise secret, as long as it's within a directory that's not-a-user (such as the user 'out') and that can be blocked in the robots.txt?

#5 @jonoaldersonwp
21 months ago

Yep! :)

That example is fine - we're good as long as we're:

  • Obfuscating the URL so that automated spam tools can't (as easily) detect that their link is on the page.
  • Using a structure which we can disallow search engines from following in the robots.txt file

We could even simplify to something like /redirect_profile_link.php?user=$user, if that's preferable.
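
To make the two requirements above concrete, here is a minimal Python sketch of the idea, not the actual wordpress.org implementation. It assumes the /out-redirect/ path suggested in comment #4; the PROFILE_WEBSITES lookup, the function names, and the 302 status code are all hypothetical stand-ins chosen for illustration.

```python
from urllib.parse import quote, unquote

BASE = "https://profiles.wordpress.org"
REDIRECT_PREFIX = "/out-redirect/"  # path suggested in comment #4

# Hypothetical stand-in for wherever profile websites are actually stored.
PROFILE_WEBSITES = {"example-user": "https://example.com/"}


def redirect_href(user: str) -> str:
    """Build the href shown on the profile page; it carries only the username."""
    return BASE + REDIRECT_PREFIX + quote(user, safe="")


def resolve(path: str):
    """Map a request path to (status, location); the 302 status is an assumption."""
    if not path.startswith(REDIRECT_PREFIX):
        return 404, None
    user = unquote(path[len(REDIRECT_PREFIX):].strip("/"))
    target = PROFILE_WEBSITES.get(user)
    if target is None:
        return 404, None
    return 302, target
```

Because the href is derived from the username alone, the target URL never appears in it, and every such link lives under one prefix that a single robots.txt Disallow rule can cover.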

#6 @dd32
21 months ago

Fixed in r15376-dotorg.

#7 @dd32
21 months ago

$ curl -s https://profiles.wordpress.org/dd32/ | grep 'dd32.id.au'
Website: <strong><a href="https://profiles.wordpress.org/website-redirect/dd32" title="https://dd32.id.au/" rel="nofollow">dd32.id.au</a></strong>

$ curl -Is --referer https://profiles.wordpress.org/dd32/ https://profiles.wordpress.org/website-redirect/dd32 | grep location
location: https://dd32.id.au/

$ curl https://profiles.wordpress.org/robots.txt
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /website-redirect/
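
For anyone wanting to check the robots.txt behaviour offline, the rules above can be fed to Python's standard-library parser. This is only a sketch: the rules are copied from the output above, while the may_crawl helper and the "ExampleBot" user agent are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the robots.txt output in this comment.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /website-redirect/
"""


def may_crawl(url: str, agent: str = "ExampleBot") -> bool:
    """Return True if the rules above permit `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


# The redirect endpoint is blocked; the profile page itself is not.
print(may_crawl("https://profiles.wordpress.org/website-redirect/dd32"))
print(may_crawl("https://profiles.wordpress.org/dd32/"))
```

Note that urllib.robotparser applies rules in file order, whereas some crawlers use longest-match precedence, so this sketch only checks the /website-redirect/ rule and says nothing reliable about the admin-ajax.php exception.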

#8 @jonoaldersonwp
21 months ago

Could you remove the title attribute from the link, too, please?
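
The title attribute in the markup from comment #7 still carries the full target URL, which presumably is why it needs to go. A rough Python sketch of the resulting markup follows; render_website_link is a hypothetical helper, not the function used on profiles.wordpress.org, and it assumes the visible label is just the host name, as in comment #7.

```python
from html import escape
from urllib.parse import quote, urlsplit


def render_website_link(user: str, website: str) -> str:
    """Render the profile 'Website' link with no title attribute."""
    href = "https://profiles.wordpress.org/website-redirect/" + quote(user, safe="")
    label = urlsplit(website).netloc  # visible text is only the host name
    return '<a href="{}" rel="nofollow">{}</a>'.format(
        escape(href, quote=True), escape(label)
    )


print(render_website_link("dd32", "https://dd32.id.au/"))
```

The host name remains visible as link text, so this reduces rather than eliminates what an automated tool could match on.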

#9 @dd32
21 months ago

