Making WordPress.org

Opened 5 years ago

Closed 5 years ago

Last modified 4 years ago

#4696 closed defect (bug) (fixed)

Hide profiles.wordpress.org from non-logged in users

Reported by: jdembowski Owned by: dd32
Milestone: Priority: normal
Component: Profiles Keywords:
Cc:

Description

When a user is not logged into WordPress.ORG, I want the profile page on profiles.wordpress.org to only display the user's display name, maybe their Contribution History and nothing else. No bio, no links, nada.

This ask is similar to how https://wordpress.org/support/users/[userid]/ is handled. If you are not logged into WordPress.ORG then the forum profile does not show a description, any links or anything that a spammer can leverage for SERP. That's good as a volunteer community forum should not be about promotion.

For example https://wordpress.org/support/users/jdembowski/ only displays useful information if you are already logged in.

This is not true for profiles.wordpress.org such as my profile page.

https://profiles.wordpress.org/jdembowski/

This makes the profiles page an attractive space for spammers and a good reason for spammers to sign up for a forum account. Use your favorite search engine and do a quick search for site:profiles.wordpress.org "packers" and you will see some examples.

Unless one of those accounts posts on the forums, Trac or a WordPress.ORG site, the account is not banned and the profile page remains in SERP. Until they post, and until someone notices, the page remains. Once they are banned, the page becomes hidden. I want that to be the default view for users who are not logged in.

I understand that people who contribute to WordPress want that search engine magic™. But I think there are many more spammy pages on profiles.wordpress.org from temp email providers just to get that page. Those bad actors will never post in the forums, on trac or one of the make blogs. They do not need to as they already have what they want.

Change History (56)

#1 @jonoaldersonwp
5 years ago

I really dislike this, and, I'd be keen to reverse the behaviour in the /support/users/ section.
I'd rather we find better ways to prevent spam submissions, and to encourage useful content creation. Hiding the content from Google harms us.

#2 in reply to: ↑ description ; follow-up: @SergeyBiryukov
5 years ago

Replying to jdembowski:

I understand that people who contribute to WordPress want that search engine magic™. But I think there are many more spammy pages on profiles.wordpress.org from temp email providers just to get that page. Those bad actors will never post in the forums, on trac or one of the make blogs. They do not need to as they already have what they want.

So maybe only hide those who don't have any contribution activity?

#3 @jonoaldersonwp
5 years ago

Yeah, happy to noindex 'empty' profile pages.

#4 in reply to: ↑ 2 ; follow-up: @jdembowski
5 years ago

Replying to SergeyBiryukov:

So maybe only hide those who don't have any contribution activity?

That would work too but I don't just want noindex on those pages, I want to make it clear to spammers that their profiles are not visible to non-logged in users.

It is a business driver for people to get an account just to abuse profiles.wordpress.org. Let's take that away.

The profiles.wordpress.org pages are not FB, Twitter, etc. They are a resource for other users to contact other users. They should not be anyone's shingle for work or anything else. People can and do use their own site for that. If a user has to log into wordpress.org to see that then mission accomplished. User A can contact User B.

How about a compromise? If you are not banned and you have activity in the forums, Trac, make, etc., then your profile page is visible, and only then.

If you do not have any activity then it's just your display name. No bio, no links, nothing.

#5 @fierevere
5 years ago

So maybe only hide those who don't have any contribution activity?

This should be fine for most contributors (of any kind) i guess...

#6 follow-up: @tobifjellner
5 years ago

One thing to keep in mind: Every now and then, I need to find the org profile that's connected to a certain Slack identity. The easiest way for me to achieve that, is to use the world's biggest search engine. There may be other legitimate use-cases that we need to cater for before we can hide these pages from search engines.

#7 in reply to: ↑ 6 @jdembowski
5 years ago

Replying to tobifjellner:

One thing to keep in mind: Every now and then, I need to find the org profile that's connected to a certain Slack identity.

I do the same thing. It's easy to tie in a .org profile -> Slack but the reverse is so not true. However, the forum profile does show up in Google.

site:wordpress.org support/users/jdembowski
site:wordpress.org "@jan_dembowski on Slack"

That should cover that use case and for added redundancy the Display name on profiles.wordpress.org can still include things like "@jdembowski on WordPress.org, @jan_dembowski on Slack"

#8 in reply to: ↑ 4 ; follow-up: @jonoaldersonwp
5 years ago

Okay, let's do this.

If a user hasn't made at least one contribution, then we should:

1) Prevent all of their profile content fields except for username, Slack identity, and basic/arbitrary details from being output.

2) Noindex their profile page(s).

Additionally, if the user in question is viewing their own profile page, we should provide messaging which explains that their profile won't be publicly visible until it's "active".
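The rule proposed in #8 can be sketched as a small decision function. This is a minimal Python sketch under stated assumptions, not the actual profiles.wordpress.org code: the field keys, the `Profile` type, the `contribution_count` value and the `render_profile` helper are all hypothetical, and what counts as a "contribution" is left open, as it is in the ticket.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical field keys; the real profile data model is not described in this ticket.
ALWAYS_VISIBLE = {"username", "display_name", "slack_username"}

@dataclass
class Profile:
    username: str
    contribution_count: int  # what counts as a "contribution" is still TBD in the ticket
    fields: Dict[str, str] = field(default_factory=dict)

def render_profile(profile: Profile, viewer: Optional[str]) -> dict:
    """Decide which fields to output, whether to noindex, and whether to show the owner notice."""
    active = profile.contribution_count > 0
    if active:
        visible = dict(profile.fields)
    else:
        # Inactive profiles keep only the basic identity fields, for every viewer.
        visible = {k: v for k, v in profile.fields.items() if k in ALWAYS_VISIBLE}
    return {
        "fields": visible,
        "noindex": not active,  # would be emitted as <meta name="robots" content="noindex">
        "show_inactive_notice": (viewer == profile.username) and not active,
    }
```

Whether owners should still see their own hidden fields was not settled in the thread; this sketch hides them for everyone and only adds the explanatory notice for the owner.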

#9 follow-up: @tobifjellner
5 years ago

If we NOINDEX profile pages, then anything we might need to search for (like links to Slack-ID) needs to be made available/searchable somewhere else.

#10 in reply to: ↑ 9 @jonoaldersonwp
5 years ago

Replying to tobifjellner:

If we NOINDEX profile pages, then anything we might need to search for (like links to Slack-ID) needs to be made available/searchable somewhere else.

Ok, then that needs to be a separate/internal system. We can't have all of those thin/empty profile pages indexable.

#11 in reply to: ↑ 8 @jdembowski
5 years ago

I'm doing this on my phone so I apologise for the gross mistakes I'm going to miss.

Replying to jonoaldersonwp:

Additionally, if the user in question is viewing their own profile page, we should provide messaging which explains that their profile won't be publicly visible until it's "active".

Why?

*Look for coffee, finds none*

The profiles.WordPress.org pages are not anyone's shingle for services or a site. If anyone is using them for SERP, then they're making a mistake. Those pages are for informational purposes, so that forum or site users can find out about another forum or site user.

The creation of those spam profiles is performed by scripts. The bots won't care about any such notification notices.

I, hands down, do not want people posting inane replies in the forums just to activate their profile. I guarantee that will be added to the script, and that is why I want those pages to be viewable only by logged-in users.

We're supposed to be promoting the WordPress opensource project. It's not to be used that way because that is abuse.

If anyone wants that, then please propose a vetted WordPress directory. Perhaps on jobs.wordpress.net or somewhere else. Right now the profiles.wordpress.org site is inundated with much more spam than anything else.

#12 follow-up: @jonoaldersonwp
5 years ago

Let's separate out these concerns.

Firstly, on principle, we shouldn't be making decisions which negatively impact the experience and quality of life of our users, based on frustrations we have with spam and abuse. As a user, if I've completed my profile, I expect to be able to see my profile. Given that new users won't be able to, that's a potentially confusing experience. We should signpost that and explain the situation, or risk alienating new users.

Secondly, the profiles pages are for whatever users decide they're for. If they want to use them as their personal homepage, that's fine. If they want to use it to advertise their consultancy services, fine. With the exception of overtly 'spammy' scenarios, if they want to try and sell their lemonade from it, that's fine, too. We should encourage flexibility and ownership of use cases. That's good for our users, and good for wordpress.org.

Thirdly, if our issue is the creation of spam accounts, we should take steps to resolve that, rather than working around it at the expense of our users. Why don't we use honeypots, or, e.g., Google's invisible reCAPTCHA system (with a ham/spam feedback mechanism, and a score threshold for instigating additional requirements such as an image captcha)? There are myriad options here to bag and tag these registration attempts before they ever get anywhere near having a profile.

#13 follow-ups: @zodiac1978
5 years ago

As an alternative, we could follow the path taken in the support forums and display the website URL only if the user is logged in.

Assuming this link is what the spammers are here for.

We could display: [log in to see the link] like in the forums.

This would keep the page useful for everyone. You just have to be logged in to see the link.

It wouldn't prevent "links" in the bio (Is HTML allowed?), but maybe this could be a start and a compromise for everyone?
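The "[log in to see the link]" approach can be sketched as a view-time transformation. This is a minimal Python sketch assuming the website URL is a plain string and the bio is stored as HTML; `public_links` and the regex are hypothetical, and a regex is only an illustration (real code should use an HTML parser). Bare URLs typed as plain text in the bio are not caught.

```python
import re
from typing import Tuple

PLACEHOLDER = "[log in to see the link]"
# Simplistic anchor matcher for illustration; production code should use an HTML parser.
ANCHOR_RE = re.compile(r"<a\b[^>]*>.*?</a>", re.IGNORECASE | re.DOTALL)

def public_links(website: str, bio_html: str, logged_in: bool) -> Tuple[str, str]:
    """Return the website URL and bio as they would be shown to this viewer."""
    if logged_in:
        return website, bio_html
    hidden_site = PLACEHOLDER if website else ""
    # Replace each <a>...</a> element in the bio with the placeholder text.
    return hidden_site, ANCHOR_RE.sub(PLACEHOLDER, bio_html)
```

For example, a logged-out viewer of a bio containing `Hi <a href="https://spam.example">cheap</a> there` would see `Hi [log in to see the link] there`.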

#14 in reply to: ↑ 12 @jdembowski
5 years ago

Replying to jonoaldersonwp:

Let's separate out these concerns.

Firstly, on principle, we shouldn't be making decisions which negatively impact the experience and quality of life of our users, based on frustrations we have with spam and abuse. As a user, if I've completed my profile, I expect to be able to see my profile. Given that new users won't be able to, that's a potentially confusing experience. We should signpost that and explain the situation, or risk alienating new users.

Did I mention we're not FB, Twitter or anyone's shingle? I did right? ;)

I agree about setting expectations, and what I am proposing will not in any way impact anyone viewing your profile, whether you are a new or experienced user in the community. Being logged in is not a bar, it's a requirement.

If you want to add a notice to the profile page editor then I am good with that.

Secondly, the profiles pages are for whatever users decide they're for. If they want to use them as their personal homepage, that's fine. If they want to use it to advertise their consultancy services, fine. With the exception of overtly 'spammy' scenarios, if they want to try and sell their lemonade from it, that's fine, too. We should encourage flexibility and ownership of use cases. That's good for our users, and good for wordpress.org.

Nope, I totally disagree with that. See my reply to the first concern, and if a brief notice is needed to reinforce that, then cool.

I am honestly not trying to be repetitive, but this is the real point: the profiles.wordpress.org pages are not anyone's home page, they are not social media, and they are not a shingle for promotion of any kind. They are not a SERP quid pro quo for participation in the community and they are not a hosting service for light web pages.

That is the reason that profiles.wordpress.org is such a quagmire of spam pages. The number of fake, temp email logins created just for that purpose outweighs the number of real users signing up for real support or participation. I believe the signups on a daily basis is ~2,000 and I am confident saying that only a quarter of those are legit.

Thirdly, if our issue is the creation of spam accounts, we should take steps to resolve that, rather than working around it at the expense of our users. Why don't we use honeypots, or, e.g., Google's invisible reCAPTCHA system (with a ham/spam feedback mechanism, and a score threshold for instigating additional requirements such as an image captcha)? There are myriad options here to bag and tag these registration attempts before they ever get anywhere near having a profile.

Again, I disagree. That just moves the solution from something that can be achieved to something that will not accomplish anything.

Making the profiles.wordpress.org pages like the support/users in that you need to be logged in will work. I can prove it: it works on those other support/users pages successfully. The spam target and business driver is profiles.wordpress.org. Let's fix that by making it hide the fields save the name and Slack handle unless you are logged in.

Last edited 5 years ago by jdembowski

#15 in reply to: ↑ 13 @jdembowski
5 years ago

Replying to zodiac1978:

I agree and think that would work. As you know, that is why the topic link field in the forums is not viewable unless you are logged in. It has cut down on the number of spammy posts in the forums a great deal.

#16 in reply to: ↑ 13 ; follow-up: @zodiac1978
5 years ago

Replying to zodiac1978:

It wouldn't prevent "links" in the bio (Is HTML allowed?), but maybe this could be a start and a compromise for everyone?

I've done some tests. You can use real links in the bio and in the origin story.

Maybe this needs some fixing, too. Like comments? Hold if there are more than x links? Or should we stop allowing links here altogether?

#17 in reply to: ↑ 16 @jdembowski
5 years ago

Replying to zodiac1978:

I've done some tests. You can use real links in the bio and in the origin story.

I know. Separate trac ticket when I have some time. I talked about submitting this one like 6 months ago. :D

Edit: I want to propose escaping that field so links and other things are un-HTML'ized. This too will go well, I am sure.

Last edited 5 years ago by jdembowski

#18 follow-up: @jonoaldersonwp
5 years ago

This is a really bad approach. It shifts our 'business' burden and problems onto the user, introduces friction, and enforces unnatural behaviours. That's terrible practice.

We shouldn't be hiding or distorting content because we have spam problems. If we have spam problems, we should fix our spam problems behind the scenes. There are processes, systems and technologies which we could employ to resolve this easily. A million other websites have, and have had, problems like this and solved it. We're not special, and we shouldn't be reinventing the wheel.

I also disagree strongly that the profiles pages aren't "profiles", both in the conventional sense, and in a strategic sense for wordpress.org. If there are 'rules' about what this is, or isn't, I'd be keen to see and challenge them.

The proposed approach will be harmful to wp.org's SEO, to the usability of the site, and to the on-boarding of new users - so for the record, I object.

Last edited 5 years ago by jonoaldersonwp

#19 in reply to: ↑ 18 @jdembowski
5 years ago

Replying to jonoaldersonwp:

It really is all about what those profiles.wordpress.org pages are for.

If they are for permitting other users to find out more about another user, to possibly see all of those links that Gravatar adds such as Twitter, FB, LinkedIn, a site, etc. and inform other users what that person is about then my solution does that.

Adding a requirement to be logged in does not impact those legitimate users and use case.

If the purpose is anything else then the user can set up their own site. In no way does my proposed solution harm any users. The wordpress.org site is not anyone's free blog or HTML pages.

Spam problems are always an uphill and adaptive problem. We're providing free spam hosting and that is a valid concern. That needs to stop. Once that free spam platform is removed then the bad actors will move onto something else. Great GNU, they do talk to each other and do keep lists of targets.

Leaving free spam hosting in place does not help the community. There are no SERP expectations, and closing that avenue for spam pages does help.

If you want to propose a vetted (that's important and I know that is subjective) directory, then let's see if that can happen. The showcase site on wordpress.org is kind of vetted, so this really is not a new idea.

#20 follow-up: @jonoaldersonwp
5 years ago

I don't want to suggest a vetted membership section; I want to allow people to create, manage and consume profiles without introducing friction. Hiding them from Google harms our SEO; I'm not comfortable with that.

Separately, we should solve our spam problems, behind the scenes, by utilizing the myriad of largely off-the-shelf solutions available to us.

The only compromise I'd be comfortable with from an SEO and usability perspective is hiding (removing fields, noindexing) 'empty' profiles, as discussed above - providing we go to lengths to explain to people why we're imposing non-standard experiences and user flows to access/view/manage their (and others') legitimate-but-initially-empty profiles.

Last edited 5 years ago by jonoaldersonwp

#21 in reply to: ↑ 20 @jdembowski
5 years ago

Replying to jonoaldersonwp:

Separately, we should solve our spam problems, behind the scenes, by utilizing the myriad of largely off-the-shelf solutions available to us.

Remember how someone proposes something and ultimately says "This should not be a lot of coding and is easy"? It never is that simple, but my proposal is. It's already been done for support/users profiles and it works.

Fighting spam really is a separate topic and a lengthy conversation. Akismet is probably the best at it, but this isn't comment spam and, back to my point, it is not what the profiles should be for.

This is not a rules thing for challenging. The profiles are currently a reward system for spammers. With my proposed change, users can log into the site to see the whole profile. There is no quid pro quo for SERP. That's why I think a vetted system could be a solution. Or not; who would vet and approve? Who would monitor changes after a profile has been approved?

The only compromise I'd be comfortable with from an SEO and usability perspective is hiding 'empty' profiles, as discussed above - providing we go to lengths to explain to people why we're imposing non-standard experiences and user flows to access/view/manage their (and others') legitimate-but-initially-empty profiles.

I am sorry, but that will not work. Spammers will catch on, pick a 5-month-old topic (topics close at 6 months), reply harmlessly with "Thanks, this helps me" and POOF! we're back to hosting spam pages again.

#22 follow-ups: @jonoaldersonwp
5 years ago

The proposal as it stands, as I've stated, is harmful to the site's SEO, and is a usability anti-pattern.

Either we alter the approach to resolve this, or, we should focus on solving the underlying spam problem.

Regardless, we should consider:

  • Adding honeypot fields to the registration form, and using Google's invisible reCAPTCHA to catch bots attempting to fill out the form.
  • Reducing the appeal of manual/human spam, by neutering the SEO/business value of profiles themselves by ensuring that all non-WP links are nofollow'd.
  • Using Akismet on profile content creation/updates to catch generic spam content.
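The first two bullets can be illustrated with a minimal Python sketch. The honeypot field name, `looks_like_bot` and `nofollow_external` are hypothetical; treating only wordpress.org hosts (and relative links) as "WP links" is an assumption; and the regex-based tag rewriting is a simplification of what an HTML parser would do. reCAPTCHA and Akismet are external services and are deliberately not modeled here.

```python
import re
from urllib.parse import urlparse

# Hypothetical honeypot field: present in the form but hidden from humans with CSS.
HONEYPOT_FIELD = "confirm_website"

def looks_like_bot(form: dict) -> bool:
    """Humans never see the honeypot, so any value in it suggests an automated submission."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())

# Simplistic regexes for illustration; real code should use an HTML parser.
ANCHOR_TAG_RE = re.compile(r'<a\b[^>]*\bhref="([^"]*)"[^>]*>', re.IGNORECASE)
REL_RE = re.compile(r'\brel="([^"]*)"', re.IGNORECASE)

def is_wordpress_link(url: str) -> bool:
    host = (urlparse(url).hostname or "").lower()
    # Relative links (no host) stay on the site, so treat them as internal.
    return host == "" or host == "wordpress.org" or host.endswith(".wordpress.org")

def nofollow_external(html: str) -> str:
    """Add rel="nofollow" to every anchor that points outside wordpress.org."""
    def fix(match: re.Match) -> str:
        tag, href = match.group(0), match.group(1)
        if is_wordpress_link(href):
            return tag
        rel = REL_RE.search(tag)
        if rel is None:
            return tag[:-1] + ' rel="nofollow">'
        values = rel.group(1).split()
        if "nofollow" in (v.lower() for v in values):
            return tag
        # Keep existing rel values (e.g. noopener) and append nofollow.
        return tag[:rel.start(1)] + " ".join(values + ["nofollow"]) + tag[rel.end(1):]
    return ANCHOR_TAG_RE.sub(fix, html)
```

The honeypot check costs legitimate users nothing, which is its appeal; whether hidden fields cause problems for assistive technology is exactly the accessibility question raised in #23.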

#23 in reply to: ↑ 22 @jdembowski
5 years ago

Replying to jonoaldersonwp:
I'll ask someone to weigh in about anti-spam fields for accessibility. Let's not solve one problem by possibly excluding a whole segment of users.

The proposal as it stands, as I've stated, is harmful to the site's SEO, and is a usability anti-pattern.

How is it harmful, and to whom is it harmful? Honest question. Give me the context of that statement.

WordPress.org's SEO should be about WordPress and the community. It should not be about rewarding people who just happen to have a profile here. See my replies above for why I state that.

The forum posts and replies, the make blogs, the plugins, the themes, all of these are already indexed appropriately in search engines. Nothing here will change that, so whose SEO are you referring to?

The profiles.wordpress.org pages are a magnet for spam the way they are now. That's a business driver for bad actors. My proposed solution closes that effectively and does not impact users. This isn't a guess on my part, it's already been implemented on support/user profiles without a single issue.

#24 in reply to: ↑ 22 ; follow-up: @zodiac1978
5 years ago

Replying to jonoaldersonwp:

The proposal as it stands, as I've stated, is harmful to the site's SEO, and is a usability anti-pattern.

What is your opinion on just hiding the website URL as I suggested above?

  • Adding honeypot fields to the registration form, and using Google's invisible reCAPTCHA to catch bots attempting to fill out the form.

This should be done - if we are hiding those profiles/links or not.

  • Reducing the appeal of manual/human spam, by neutering the SEO/business value of profiles themselves by ensuring that all non-WP links are nofollow'd.

The website URL already has rel="nofollow".

Replying to jdembowski:

Edit: I want to propose escaping that field so links and other things are un-HTML'ized. This too will go well, I am sure.

I have seen this in forums where every link is replaced by [log in to see the link]. This could be another solution to this problem.

This ticket was mentioned in Slack in #accessibility by jan_dembowski.


5 years ago

#26 @jonoaldersonwp
5 years ago

Requiring users to log in to click on the links (or to see them, or to turn a "[log in to view]"-type message into a link) is a UX anti-pattern: it takes our spam problem and turns it into our users' UX problem. That's bad practice.

That said, it'd be a lot less-bad than other approaches.

#27 in reply to: ↑ 24 @jdembowski
5 years ago

Replying to zodiac1978:

Replying to jonoaldersonwp:

The proposal as it stands, as I've stated, is harmful to the site's SEO, and is a usability anti-pattern.

What is your opinion on just hiding the website URL as I suggested above?

I believe that would work and accomplish the goal. ;) Just as it is in other places on the site, that would prevent scraping by search engines and remove the incentive for spammers.

#28 follow-ups: @jonoaldersonwp
5 years ago

If the objective is to disincentivise spammers due to the SEO value of links from wordpress.org, I don't understand how hiding/transforming the link is any different to adding a nofollow attribute, which we already do.

#29 in reply to: ↑ 28 ; follow-up: @jdembowski
5 years ago

Replying to jonoaldersonwp:
And yet, those pages do show up in Google. Just using nofollow hasn't been effective. I personally think that any solution based on the honor system is not an effective control.

NOTE: I do not want to challenge an SEO expert and I sure am not nor do I mean to. ;) But adding a log in requirement isn't obtrusive and has been demonstrated to work elsewhere.

#30 in reply to: ↑ 28 @tobifjellner
5 years ago

Replying to jonoaldersonwp:

If the objective is to disincentivise spammers due to the SEO value of links from wordpress.org, I don't understand how hiding/transforming the link is any different to adding a nofollow attribute, which we already do.

Spamming is not ONLY about stealing SEO-value. If you can place your keyword-rich message in a place that in itself ranks well on Google, then you won't bother too much that the site you're linking to won't get "google juice". If you get at least some traffic from visitors who are interested in your miraculous gadgets, then that's good enough...

#31 in reply to: ↑ 29 ; follow-up: @jonoaldersonwp
5 years ago

And yet, those pages do show up in Google. Just using nofollow hasn't been effective. I personally think that any solution based on the honor system is not an effective control.

The nofollow attribute on external links isn't meant to prevent the profile page from being indexed. It's meant to prevent search engines from passing value through the link; therefore disincentivising spam.

Spamming is not ONLY about stealing SEO-value. If you can place your keyword-rich message in a place that in itself ranks well on Google, then you won't bother too much that the site you're linking to won't get "google juice".

Agreed, which is why I'm happy to noindex profile pages until they reach a (TBD) activity/quality threshold.

#32 in reply to: ↑ 31 @zodiac1978
5 years ago

Replying to jonoaldersonwp:

Agreed, which is why I'm happy to noindex profile pages until they reach a (TBD) activity/quality threshold.

Which can easily be tricked ... or is harmful to real users. Talking about UX anti-patterns ;)

#33 follow-up: @jonoaldersonwp
5 years ago

It's always going to be possible to create content on (various bits of) wordpress.org with malicious intent.

Let's make it harder to do it lazily/stupidly on profile creation, and tackle the 99% of cases, rather than introducing anti-patterns for everybody in order to solve for extreme edge cases.

#34 in reply to: ↑ 33 @jdembowski
5 years ago

Replying to jonoaldersonwp:

It's always going to be possible to create content on (various bits of) wordpress.org with malicious intent.

That's not helpful. I'm not proposing we solve all problems, please focus on this one. My proposal directly and effectively solves this problem.

Let's make it harder to do it lazily/stupidly on profile creation, and tackle the 99% of cases, rather than introducing anti-patterns for everybody in order to solve for extreme edge cases.

I'm still waiting to hear how this is harmful to SEO, and to whose SEO.

I get that you do not like what I am proposing and that is fine and valid. But I am trying to address a tangible and real problem directly. This is directly related to dealing with forum spam and that is how I became aware of this years ago.

The spam problem on profiles.wordpress.org is not and has never been an edge case. It is the majority of profiles, and I want to stop that while letting users see real people's information in an appropriate way.

#35 follow-up: @jonoaldersonwp
5 years ago

We're going round in circles. Let's summarise the moving parts.

Having complete, accessible, rich profiles of real WordPress users/contributors on wordpress.org is a key part of the broader SEO 'strategy' for the site (or, it would be, if we had such a thing). The profiles should be the definitive 'version' of these authors/users/contributors, in the context of WordPress.

Therefore, failing to preserve the crawlability, indexability and UX of current, valid profiles isn't an option.

Any scenario whereby we penalize legitimate users, consumers, or Google, will harm the visibility and performance of the wordpress.org site, and by extension, WordPress' reach.

So, any solution to the problems of:
1) Automated spam registrations, and;
2) Automated or manual spam profile content, and;
3) More sophisticated, manual spam detection/classification avoidance;

...Must be dealt with in a way which doesn't:

1) Impact the UX of profile pages
2) Require users to log in (Google)
3) Prevent legitimate users from having/using/consuming profile content

We can make some minor compromises on the difference between logged-in and logged-out experiences, but, transforming links to be 'unclickable' severely compromises the UX.

I maintain that the cleanest first-wave solution to reduce the sheer volume of spam is to:

1) Implement modern, invisible spam prevention techniques on the registration form.
2) Noindex empty profiles, until they're "not empty" (details TBD).
3) Run profile content submissions/updates through Akismet.
4) Continue to ensure that external links on profiles have a nofollow attribute.

Following this, we may wish to review further, more nuanced processes for more aggressive/sophisticated/nefarious abuse which these processes don't catch.

Last edited 5 years ago by jonoaldersonwp

#36 in reply to: ↑ 35 ; follow-up: @jdembowski
5 years ago

Replying to jonoaldersonwp:

We're going round in circles. Let's summarise the moving parts.

I do not like circular arguments either and I'm hoping we can stop. I get that we're both intractable in our positions but you are not addressing two points.

  1. The profiles.wordpress.org site isn't FB, Twitter or free HTML hosting. It's not but it is being used that way in amazing quantity by bad actors. That has to stop.
  2. The solution I am proposing doesn't negatively impact users in the community. That's the target audience for anything related to WordPress. It's as obtrusive as requiring a .org account to comment on a make blog, which is to say not obtrusive at all.

Having complete, accessible, rich profiles of real WordPress users/contributors on wordpress.org is a key part of the broader SEO 'strategy' for the site (or, it would be, if we had such a thing). The profiles should be the definitive 'version' of these authors/users/contributors, in the context of WordPress.

I completely disagree. See point 1 above.

Therefore, failing to preserve the crawlability, indexability and UX of current, valid profiles isn't an option.

As it has been implemented in other places on wordpress.org already, it is a valid option and just as before directly addresses the problem as it did on the forum profiles.

Any scenario whereby we penalize legitimate users, consumers, or Google, will harm the visibility and performance of the wordpress.org site, and by extension, WordPress' reach.

I'm not being obtuse, but this does not penalize any legitimate users. Item 1 above.

So, any solution to the problems of:
1) Automated spam registrations, and;
2) Automated or manual spam profile content, and;
3) More sophisticated, manual spam detection/classification avoidance;

...Must be dealt with in a way which doesn't:

This next part is not mean, it is not sarcasm and I genuinely hope that no one takes it as that. I participate in WordPress.ORG with respect and there is nothing here from anyone that I think is bad, rude or anything like that.

I am asking you and anyone replying to this ticket to please not expand this very specific issue regarding profiles.wordpress.org to a wider scope. I'm not trying to drain the ocean but I do want this spam venue closed without forbidding users from viewing and enjoying the profiles of legitimate users and people who have contributed.

Please stop trying to make this issue about something else. It's only about profiles.wordpress.org and the spammy pages there.

I maintain that the cleanest first-wave solution to reduce the sheer volume of spam is to:

1) Implement modern, invisible spam prevention techniques on the registration form.
2) Noindex empty profiles, until they're "not empty" (details TBD).
3) Run profile content submissions/updates through Akismet.
4) Continue to ensure that external links on profiles have a nofollow attribute.

Following this, we may wish to review further, more nuanced processes for more aggressive/sophisticated/nefarious abuse which these processes don't catch.

Please see above about changing the scope of this problem.

What you are proposing does not address that though they are not bad ideas in a general sense.

#37 in reply to: ↑ 36 @DeFries
5 years ago

Replying to jdembowski:

First off, I am fully aware of the spam accounts, as an admin/moderator of the Dutch WP forum for a good 11 years now. I hear ya, I know where you're coming from.

Jan, even though @jonoaldersonwp does not address the problem you describe, he does describe how to solve the bigger problem. Your suggested approach sort of nukes the profile pages, and even though you don't see the value of those pages being indexed, those of us who vet people based on their activity, for instance when choosing volunteers for WordCamps and Meetups, do.

This is just one of those examples that wouldn't work anymore if nothing was indexed. And having these profile pages indexed doesn't negate your statement about them not being social media profiles. You are correct about them not being that.

The bigger issue here is spam. It's the spam, and the possibility of spam even being let onto our systems, that should be targeted, rather than changing the behaviour of those profile pages into a negative UX for those who mean no harm. I believe Jono to be 100% accurate on that.

So, even though you're saying the scope is being changed, it's actually not. The scope is the same: getting rid of spam accounts. The ways of going about it, that's where you differ in opinion. And frankly, I think you're approaching this with a sledgehammer mentality where a well-aimed hammer would suffice.

I strongly suggest we take a step back and see how we can find solutions to the core of the problem instead of creating new issues for other people.

Side note: I wish we had the bozo functionality back like we used to on the old bbPress forums 😕.

Last edited 5 years ago by DeFries

#38 @johnjamesjacoby
5 years ago

Technically, BuddyPress does have rudimentary privacy & visibility settings in profile fields that could be enabled; with a little bit of planning, what @jdembowski is suggesting is possible to do.

Hiding some parts of a profile exponentially increases the number of cache variants for each splice, introducing code complexity, contributor burden, and ongoing maintenance. For instance, we do this currently when you’re viewing your own profile, or have moderation abilities for badges. It’s not a huge deal, but it isn’t exactly a free lunch either.
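@johnjamesjacoby's point about cache variants can be made concrete with a worst-case count. Assuming each hideable profile section can be shown or hidden independently for each viewer context (the context and section names below are hypothetical), every extra section doubles the number of rendered variants that each need their own cache key:

```python
from itertools import product

# Hypothetical viewer contexts that already change what a profile renders.
VIEWER_CONTEXTS = ("guest", "logged_in", "self", "moderator")


def cache_keys(sections: list[str], contexts=VIEWER_CONTEXTS) -> set[str]:
    """Worst case: every section can be shown or hidden independently per context."""
    keys = set()
    for context in contexts:
        # One on/off flag per section gives 2**len(sections) combinations.
        for shown in product((False, True), repeat=len(sections)):
            visible = [name for name, is_shown in zip(sections, shown) if is_shown]
            keys.add(f"{context}:{'+'.join(visible) or 'none'}")
    return keys


print(len(cache_keys(["bio", "links", "interests"])))  # 32, i.e. 4 * 2**3
```

In practice only the combinations a real visibility policy produces would be cached, which is usually far fewer, but each new independent toggle still multiplies the worst case.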

An observation: hiding profiles from unauthenticated guests would make them behave more like Facebook does today. It would act more as a walled garden around otherwise generally harmless information. As for the few users who attempt to exploit profiles to cause harm, there is a lot we can learn from them to help them achieve their goals in less harmful ways.

I have always believed Profiles has a lot of untapped potential. There is so much the community should be doing to make them better. It would be a real shame to see them hidden away before they’ve been given any real attention.

#39 @jdembowski
5 years ago

@DeFries I do think there is a bigger spam/registration problem, but I want to address what can be addressed quickly for this problem. Selectively hiding portions is doable, though, as @johnjamesjacoby points out, it isn't a free lunch regarding caching, processing, etc.

Preventing search engines from indexing the profiles' hidden sections becomes effective the moment it is turned on. There's no way for a spammy profile to get around it. I am not convinced the other solutions would be effective.

How about this:

  1. By default, hide every part of the profile except the display name and Slack ID from visitors who are not logged in. If you are logged in, enable everything and make it all viewable.
  2. If there is a condition for a profile being viewable, make that condition having one or more badges. Badges signify a level of participation and work automatically for some things. I don't want activity to be what determines it, as scripted profiles will just leave a comment on the forums and activate their profile.
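A minimal sketch of the two rules above, assuming a simplified, hypothetical `Profile` record (the real profile data model is not described in this ticket): logged-out visitors get only the display name and Slack ID unless the profile holds at least one badge. Forum activity is deliberately not part of the check, since a script can post a single comment to qualify.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    # Hypothetical, simplified profile record.
    display_name: str
    slack_id: str
    bio: str = ""
    links: list[str] = field(default_factory=list)
    badges: set[str] = field(default_factory=set)


def visible_fields(profile: Profile, viewer_logged_in: bool) -> dict:
    """Decide which profile fields to render for the current viewer."""
    if viewer_logged_in or profile.badges:
        # Logged-in viewers, or any profile with at least one badge: show everything.
        return {
            "display_name": profile.display_name,
            "slack_id": profile.slack_id,
            "bio": profile.bio,
            "links": list(profile.links),
            "badges": sorted(profile.badges),
        }
    # Logged-out viewer and no badges: display name and Slack ID only.
    return {"display_name": profile.display_name, "slack_id": profile.slack_id}
```

A profile like "Scam Coin Base" never earns a badge, so logged-out visitors (and crawlers) would only ever see its name.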

Using badges as a qualifier is attractive because if you are a plugin or theme author, or a member of a team on WordPress.ORG, then your participation goes beyond anything that can be scripted. This will set a bar that I am comfortable with.

A profile for "Oddball Shipping Company" or "Scam Coin Base" will never have a badge, and I would be satisfied with that solution. It would eliminate the free HTML hosting that spammers count on and have used for many years.

Edit: We don't have Bozo (what a name for that flag!) but we do have the modwatched flag in the forums. Life was easier then. :P

Last edited 5 years ago by jdembowski

#40 follow-ups: @joostdevalk
5 years ago

I'm sorry but hiding data that is meant to be public (which is what it is), because "spam", is against everything we stand for. I'm very willing to discuss all sorts of spam prevention on the signup page, and further down the line, but the profile pages will be extended and have more things on them if all the plans I have for WordPress.org go somewhere.

The example you gave of the support profile ( https://wordpress.org/support/users/jdembowski/ ), is one of the problems in my eyes, it should be merged with and redirect to your profiles.wordpress.org profile in my opinion.

So, with that, I would like to suggest we close this ticket, open a new one and use all the brilliant minds in this thread to come up with good spam prevention measures.

#41 in reply to: ↑ 40 @SergeyBiryukov
5 years ago

Replying to joostdevalk:

The example you gave of the support profile ( https://wordpress.org/support/users/jdembowski/ ), is one of the problems in my eyes, it should be merged with and redirect to your profiles.wordpress.org profile in my opinion.

Related: #518

#42 in reply to: ↑ 40 ; follow-up: @jdembowski
5 years ago

This remains pertinent and none of the replies have addressed this.

  1. What is the purpose of profiles.wordpress.org?
  2. Is there an AUP for that free HTML hosting?
  3. What's the proposed process for finding and dealing with new and existing spam profile pages?

There has been reference to what some think the profiles should be about but that's not documented anywhere. If anyone wants to define that then I think they should also consider taking ownership and managing that as well. Someone would have to.

OK, onto the latest reply.

Replying to joostdevalk:

I'm sorry but hiding data that is meant to be public (which is what it is), because "spam", is against everything we stand for.

That sounds quite dramatic and really re-frames this problem to make it about something it's not. Again. The proposed solutions have been to show only some information, to make it [ Log in to see the link ], to only index based on TBD criteria, etc.

That's not the same as being against everything we stand for.

I'm very willing to discuss all sorts of spam prevention on the signup page, and further down the line, but the profile pages will be extended and have more things on them if all the plans I have for WordPress.org go somewhere.

If anyone reading this wants to create their own ticket for Miracle Spam Registration Web Page Logarithmic Magic™ then please do.

https://meta.trac.wordpress.org/newticket

I've not looked, but for all I know there may be one already. I'll even chime in on that ticket, as that would need a new workflow and new owners for tasks that do not exist at the moment for profiles.

But again, please stop trying to make this a boil-the-ocean problem. That's what spam fighting is (and for goodness' sake, do not write that I'm against solving the spam problem, as that would be amazingly dishonest). I'm talking about profiles and implementing a control that was already done on the forum profile pages.

The example you gave of the support profile ( https://wordpress.org/support/users/jdembowski/ ), is one of the problems in my eyes, it should be merged with and redirect to your profiles.wordpress.org profile in my opinion.

I agree. Having two profiles isn't optimal and the two should be consolidated.

See how easy that was? But as I stated earlier:

The forum posts and replies, the make blogs, the plugins, the themes: all of these are already indexed appropriately in search engines. Nothing here will change that, so whose SEO are you referring to?

I forgot to mention the document pages, Codex, etc. but you get the idea.

At the moment profiles.wordpress.org is free, unmanaged web page hosting for WordPress users. It is being abused every day, and I am proposing that this gets addressed by removing the business driver for spammers.

There are 1.7+ million registered accounts here and, conservatively (I am sure it is much higher), a quarter of those users have a spam page on that site. Saying this is a spam problem doesn't do anything for the existing spam pages. Making registration somehow foolproof does not address those existing pages.

Creating an account on wordpress.org and then a spammy profile page is a business driver for spammers. That's the situation right now and has been for years.

If anyone wants to propose different solutions to this problem then I am all for it and want to discuss that and implement something to address this problem of spammy profiles.

So, with that, I would like to suggest we close this ticket, open a new one and use all the brilliant minds in this thread to come up with good spam prevention measures.

Please do not. You or anyone else can create a new ticket to address that spam and spam registrations.

If someone in authority (Matt? Otto? Gary? Someone.) comes along and explains why this ticket isn't being implemented then let's just make it "Won't fix". But if they do then I would appreciate if that person would scroll up to the top of this comment and address the three points at the top.

*Drinks coffee, so good*

Look, this is not a hill I'm willing to die on.

*More coffee*

OK, that's not the best metaphor. Let me try again.

profiles.wordpress.org is a free hosting service for spammers. It is. That's not a debatable point. Not directly addressing this won't make my life better or worse. I'm not an SEO expert by any means, but I am confident that any site being a spam page host isn't good either. Dealing with this will make WordPress better, if only from a web reputation point of view.

If I could crawl all 1.7+ million profiles (and don't worry, I won't divide the task up and do that), count the number of links in each profile, and run those links and the profile text through Akismet as a comment, then I could give you actual numbers to better illustrate the problem. I would if that were achievable, but I don't think it would matter to anyone who is against the idea of a control to fix this.

Whatever benefit you believe the profiles provide legitimate users is far outweighed by that spam because there is no vetting and no controls on those pages.

Focus on the problem here. What can be done to take away that business driver for spammers for the profiles?

I've proposed making logging in a condition to see those pages; others have helpfully suggested other options, such as [ Log in to see those links ], as well as not indexing the profiles that do not pass a condition TBD.

What change can be implemented on profiles.wordpress.org to address this?

#43 in reply to: ↑ 42 ; follow-up: @zodiac1978
5 years ago

Replying to jdembowski:

  1. What is the purpose of profiles.wordpress.org?

I can only answer this for myself. For me it is a place that I can point colleagues, potential clients/employers, and every visitor of my website to, to show what I have done in the WordPress community.
Plugins, Themes, Answers in the Forums, Contributions to Trac: easy to see with all the badges. And further information below/above.

If this would stop spammers from registering, I'm fine with deleting the website URL and removing all HTML from Bio, Interests and Origin story, because it is not relevant for *my* use case.

Just my 2ct.

EDIT: To prove my case, here is a spammer. Look at what information is provided and what is not: https://SPAM-profiles.wordpress.org/aswinnerz/ (just remove the "SPAM-" part)

Last edited 5 years ago by zodiac1978

#44 in reply to: ↑ 43 @jdembowski
5 years ago

Replying to zodiac1978:

EDIT: To prove my case, here is a spammer. Look at what information is provided and what is not: https://SPAM-profiles.wordpress.org/aswinnerz/ (just remove the "SPAM-" part)

THAT. THAT. Everyday in large quantity, that.

The profiles.wordpress.org site has been the marketplace for spammers. They do not need to post in the forums, they do not need to reply to a make blog. Everyday they achieve their goal which is to get those profile pages.

This is the thing I want to take away from bad actors.

#45 follow-up: @johnjamesjacoby
5 years ago

We could run the profile field values through the Discussion settings’ moderation & blocked words lists.

bbPress does this for topics & replies, but neither bbPress nor BuddyPress do this for profile fields.

It still requires someone to maintain a bad-list of disallowed words, and the matching is fuzzy, so care needs to be taken to avoid false positives (trailingslashit() comes to mind.)

That’s a relatively simple task that would cut a lot of this out.
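The false-positive risk mentioned above is the classic substring problem: a naive, case-insensitive substring check flags `trailingslashit()` because "slashit" contains a blocked word. The sketch below contrasts that with word-boundary matching when checking profile fields. The blocklist entries are hypothetical, and this is an illustration rather than WordPress core's own matching code.

```python
import re


def naive_hits(text: str, blocked: list[str]) -> list[str]:
    """Case-insensitive substring matching: simple, but prone to false positives."""
    lowered = text.lower()
    return [word for word in blocked if word.lower() in lowered]


def word_hits(text: str, blocked: list[str]) -> list[str]:
    """Match blocked words only as whole words, using regex word boundaries."""
    return [
        word for word in blocked
        if re.search(r"\b" + re.escape(word) + r"\b", text, re.IGNORECASE)
    ]


def check_profile(fields: dict[str, str], blocked: list[str]) -> dict[str, list[str]]:
    """Return the blocked words found in each profile field (clean fields omitted)."""
    report = {}
    for name, value in fields.items():
        hits = word_hits(value, blocked)
        if hits:
            report[name] = hits
    return report


BLOCKED = ["putlocker", "shit"]  # hypothetical blocklist

print(naive_hits("Use trailingslashit() on the URL", BLOCKED))  # ['shit']
print(word_hits("Use trailingslashit() on the URL", BLOCKED))   # []
```

Word-boundary matching avoids the `trailingslashit()` case but misses spam words glued onto other text, so either approach still needs the maintained list and careful review that @johnjamesjacoby describes.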

Last edited 5 years ago by johnjamesjacoby

#46 in reply to: ↑ 45 @jdembowski
5 years ago

Replying to johnjamesjacoby:

That’s a relatively simple task that would cut a lot of this out.

Reduction (cutting a lot of this out) would be a great improvement. :thumbsup-not-slack-here:

This ticket was mentioned in Slack in #forums by clorith.

5 years ago

#48 follow-up: @JarretC
5 years ago

Is there a spammer profile page that has been indexed in Google for the spam links on the profile? Using the profile mentioned above, nothing shows up in Google for me with...

site:profiles.wordpress.org jets vs raiders live

I'd have to agree with @jonoaldersonwp on this one: hiding profile data isn't the way to go, and requiring a user to log in to see that info could be harmful. Say I reply on the forums and help somebody out, and a user searching Google for the same issue comes across the thread, sees my reply, and it resolves their issue. If they then visit my profile (unlikely, but possible), having my site URL linked there doesn't hurt anybody.

I'm not sure what specifically @joostdevalk has in mind in terms of what he would like to see the profiles turn into but I believe they should be used as a profile page as they are on any other social media site and contain whatever content that user feels is necessary.

#49 in reply to: ↑ 48 @jdembowski
5 years ago

Replying to JarretC:

Is there a spammer profile page that has been indexed in Google for the spam links on the profile? Using the profile mentioned above, nothing shows up in Google for me with...

site:profiles.wordpress.org jets vs raiders live

I suspect that Google is doing something to limit the exposure of that, which is why I want to implement something here. Relying on something external is not a control; it's luck, like the honor system.

This search produces some hits when I tried it.

site:profiles.wordpress.org "Movies"

This is the first hit.

https://profiles.wordpress[.]org/putlocker38/

That 2016 account has (quick look while on almost one cup of coffee) six links and this as the text. Please don't @ me for spreading spam. ;)

Bio

Watch Movies Online Free. Watch your most loved films online free on Putlocker. Find a huge number of most recent Movies on the web.

Interests

Putlocker Movies, Putlocker TV Shows, Popular TV Shows, Popular Movies, Free Online Movies, New Released Movies, Latest Movies, Latest Released Hollywood Movies, Top Super Hit Movies, High TRP Shows

Which remains my point. Now look at this search in Google.

inurl:https://wordpress.org/support/users/ "Movies"

While this is a similar problem for the whole site, the first hit for me is this one. It's a 2017 account.

https://wordpress[.]org/support/users/kazmpire/

When not logged in, this is only text.

kazMPIRE MOVIES LIFESTYLE

And that's it.

Look at the profiles.wordpress.org link while not logged in.

https://profiles.wordpress[.]org/kazmpire/

(That's a spammer too despite having legit topics and I'll bash both accounts into oblivion next week).

This ticket was mentioned in Slack in #meta by yui.

5 years ago

#51 follow-up: @dd32
5 years ago

In 9170:

Support Forums: Add a robots header for banned users and those who don't have any profile content.

See #4754, #4632, #4714, #4696.
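The commit above adds a robots header for banned users and for profiles without content. A rough Python sketch of that decision follows. `X-Robots-Tag: noindex` is the standard way to send this directive over HTTP, but whether changeset 9170 uses that header or a robots meta tag, and exactly what counts as "profile content", are assumptions here.

```python
def robots_headers(is_banned: bool, profile_fields: dict[str, str]) -> dict[str, str]:
    """Return extra HTTP headers for a profile page response."""
    # Assumption: a profile is "empty" when every field is blank or whitespace.
    has_content = any(value.strip() for value in profile_fields.values())
    if is_banned or not has_content:
        # Ask search engines not to index this page.
        return {"X-Robots-Tag": "noindex"}
    return {}
```

Because the check runs on every response, it applies to existing banned or empty profiles as well as new ones, once search engines recrawl them.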

#52 @dd32
5 years ago

  • Owner set to dd32
  • Status changed from new to reviewing

#53 in reply to: ↑ 51 @jdembowski
5 years ago

Replying to dd32:

In 9170:

Support Forums: Add a robots header for banned users and those who don't have any profile content.

See #4754, #4632, #4714, #4696.

This cheers me.

If there are no plugin or theme commits, and no replies or posts in the forums, then the profile should not be scraped.

I also have no idea if the robots header works, but I defer to those who know. ;)

This ticket was mentioned in Slack in #forums by jan_dembowski.

5 years ago

#55 @dd32
5 years ago

  • Resolution set to fixed
  • Status changed from reviewing to closed

Going to mark this as fixed; it looks like the changes I made a few months back (both what's on this ticket and what's in our private repo) have worked to remove a bunch of the spammy/uninteresting profiles.

This ticket was mentioned in Slack in #forums by carike.

4 years ago
