4500 Words on Everything We Don’t Know About Penguin and What Not to Do About It
Cynical title, ain’t it? Frankly, I’ve been putting off writing this “Penguin post” for 2 weeks.
Because I have a problem with talking about things I don’t actually know about.
The unfortunate state of affairs right now is that nobody really knows much about what’s going on. Sure, there are plenty of gurus writing blog posts on “how to recover after Penguin,” but the irony is that there are almost no blog posts called “How I Actually Recovered After Penguin” (hint: link bait opportunity, nerds. Get on it.). In other words, the vast majority of what the gurus and professional SEO journalists are saying is theoretical fluff, much of it contradictory, with very few real case studies behind it.
With our shared sentiments on the table, let’s roll up our sleeves and get to work. My goal is to give you a glimpse into what we’re seeing, what others are saying, what kind of data is out there, what it means, and what to do about it.
Edit #1: Despite the passive aggressive headline, it appears that the discerning folks on Twitter have deemed this article helpful. Yay.
4500 Words on Everything We Don’t Know About Penguin and What Not to Do About It truth.thehoth.com/seo/penguin/ <- A++ would read again. — Matthew Brown (@MatthewJBrown) May 18, 2012
Edit #2: This post was intended to be shared with our clients only. To our surprise, it went pretty huge on Twitter (see conversation) and inbound.org. As a result, some unintended readers have brought up a concern that this analysis is biased, somehow favors our own link building product, etc.
To be clear, this is not the case. This article was written for people who already bought our product. It does offer some perspective on what the update means for them, the culmination of which is “there is no noticeable correlation” and “there are no silver bullets,” not “buy buy buy!”
In short, this article is not intended to sell anybody anything. It is intended to help our community, and based on the above feedback from Twitter, it appears to have done just that. If you are not a user of our product and have concerns about it, first make sure you watch our homepage video so you actually understand it. Next, if you still have product-related questions, please contact us.
Neither I nor any member of my team will be fielding sales questions on this blog post.
Step 1: What Are We Dealing With Here? (In Theory, Anyway)
Penguin is the long-awaited “over-optimization penalty” and rumblings of its coming were apparent months ahead. If you watched our Panda 3.3 webinar, we were already advising our customers to diversify their link profiles heavily with natural anchor texts (e.g. click here, more info, learn more) and naked anchor texts (e.g. <a href="http://mysite.com">http://mysite.com</a>).
Almost unanimously, we’ve heard this to be the case. If your anchor text profile is heavily over-optimized for your “money keywords,” you may have gotten slapped, or the anchor text relevancy factor may have merely been devalued. It’s tough to determine which of the two it is, but since we know that negative SEO works now, we can bet it’s a penalty, which is good because penalties are reversible.
It seems that Google is less interested in particular keywords, and more in their meanings. Applied, this means much better treatment of synonyms (this official statement uses the word synonym 20 times alone). Ideally this would mean that when I search for something like “emergency room chicago,” the search engine would be smart enough to look at available results for terms like “hospital chicago,” “emergency care chicago,” “ambulance chicago” and deliver me a SERP consisting of the most qualified sites from all of them. For us as SEOs, this is a signal, more than ever, that we should be diversifying anchor texts heavily, not just on 2–5 terms, but upwards of 10–30 terms in the long run. With all of this said, whether this synonym recognition is actually in effect is still questionable and its quality even more so (more on this below).
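To make that concrete, here’s a minimal sketch of what a diversified anchor plan could look like. The buckets, terms, and weights are all hypothetical, not a formula; the point is just that once synonyms, branded, natural, and naked anchors enter the mix, any single money term becomes a small minority of the profile.

```python
import random

# Hypothetical anchor buckets -- swap in your own terms.
anchors = {
    "money":   ["emergency room chicago"],
    "synonym": ["hospital chicago", "emergency care chicago", "urgent care chicago"],
    "branded": ["Acme Medical", "acmemedical.com"],
    "natural": ["click here", "more info", "learn more"],
    "naked":   ["http://acmemedical.com", "www.acmemedical.com"],
}

# Rough illustrative weights: money terms deliberately kept to a minority.
weights = {"money": 0.15, "synonym": 0.25, "branded": 0.20, "natural": 0.25, "naked": 0.15}

def pick_anchor():
    """Draw one anchor text according to the bucket weights above."""
    bucket = random.choices(list(weights), weights=list(weights.values()))[0]
    return random.choice(anchors[bucket])

# Simulate a 100-link campaign and check the money-term share.
plan = [pick_anchor() for _ in range(100)]
money_share = sum(a in anchors["money"] for a in plan) / len(plan)
print(f"money-term share: {money_share:.0%}")  # hovers around 15% with these weights
```

Run it a few times; the exact draw varies, but the money term never dominates the profile, which is the whole idea.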
That’s half the story anyway. According to the wonderful wizard of Google, aka Matt Cutts (see official statement), Penguin is also intended to devalue/penalize sites for links from irrelevant pages. To be clear, the severity of irrelevance that Matt suggests they’re looking for is absurd. The outright stupidity of someone who’d build links this way is beyond my comprehension.
For your convenience, here is Matt’s example of building irrelevant links in 2012.
I’m not sure if this is a joke or not. I’ve interacted with over 1000 SEOs in my career and I’ve literally never met anyone who knowingly built links like this.
No worries, Matt’s announcement is a full-on retro party, featuring throwbacks such as “we don’t like keyword stuffing,” screenshots and all.
As I’ve previously described, this is what I call the world of appearances. It’s the song and dance Google puts on and expects us to believe. There are 3 main groups of people who believe this stuff — the media, investors and sheep. Don’t get taken to slaughter. There’s a lot more here than meets the eye.
The World of Reality
Right now the SERPs are awful.
I mean, truly abysmal. I don’t mean this from an SEO’s perspective, I mean it from a search engine user’s perspective. The crap that I see popping up for search terms is worse now than it was 5 years ago.
Chris Rempel says it fantastically in his very thorough Penguin analysis:
Anyone with half a brain can plainly see that Google’s current SERPs, spanning almost every vertical, are some of the lowest quality results that we’ve seen in years. Decades, even. There are literally forum profiles and empty blogspot/web 2.0 pages ranking for some of the web’s most competitive keywords in every major commercial market…
I’ve spent the past week poring over hundreds of SERPs, and I’m consistently seeing low-quality, and in many cases outright nonsense, ranking on Page 1 for basically any high-comp keyword. Of course, there’s the nearly-guaranteed presence of Wikipedia/Squidoo/eHow/YouTube/BlogSpot (and equivalents) across the gamut – irrespective of quality or even relevance. Clearly, the domain-authority filter has been jacked up, way too much.
Maybe that’s the “3%” that Matt Cutts had mentioned was affected by Penguin. Perhaps the other 97% of Google’s results comprise searches like “Why do hippies smell?”, “Who would win in a fight Chuck Norris or Moby?” and other completely unprofitable keywords that simply don’t matter, to anyone.
Although I can give examples for days, here is a great one from Jennifer Ledbetter, a former Google fangirl who really used to believe all of Google’s hoopla about sites ranking just because they have great content and structure (which definitely does help, but obviously isn’t enough). Below is a screenshot from Jennifer’s analysis of how bad the SERPs are right now. This is the SERP for “make money online,” an absurdly competitive term with literally thousands of active competitors.
Amongst some questionable results on the front page, the dodgiest by far is a no-name Blogspot blog. But maybe we should give Google the benefit of the doubt? Maybe this blogger is a rising star being awarded for his incredible content and meteoric rise to popularity? Or not…
Yes, it is a blank Blogspot without so much as a single post on it that ranks for “make money online,” one of the most competitive terms out there. Note: if you perform this search right now, you won’t find this ranking anymore, but based on the fact that folks like Chris Rempel and I continue to see this crap over hundreds of SERPs, I’m pretty sure this one was a manual fix from the folks at Google.
This is the reality of the SERPs right now, and it’s not you who should be scared.
Here’s the thing, this is far more dangerous for Google than it is for SEOs.
To understand why, you have to understand the core of Google’s business model.
Google’s key asset is the quality of its search results. The more people search, the more search result pages are shown, the more opportunities Google has to sell those eyeballs to advertisers. Searches are Google’s inventory, just as any website that sells ads treats its page views as inventory. Decreased quality of search results means users (sooner or later) begin to lose faith in the search engine and begin to look elsewhere. As searchers, we’re already doing it with sites like Yelp for local information and this scares the crap out of the folks at Google.
The One Equation Google Management Understands Better Than Anything Else
Lost Faith = Lost Eyeballs = Lost Inventory = Lost Revenues = Lost Stock Valuations = Lost Management Seats = #$%^&*
AdWords (aka PPC advertising) is Google’s primary revenue engine and always has been. To date, they have yet to create another industry-leading, monetized product. Yelp is beating them in local. Groupon and LivingSocial are beating Google Offers in daily deals. And Facebook is absolutely obliterating the joke known as Google+. YouTube is certainly not the king money maker in the building (extremely rough estimates suggest 6%). Search is where it’s at for them (and the content network as well, although that has much lower margins).
Google management is aware that search is their golden goose, and the vitality of their search product is virtually synonymous with the vitality of the company as a whole.
Right now, search is in trouble.
While Google puts on its pretty face for the media, claiming that this update is a huge improvement in SERP quality, this is the first time we’ve seen them set up a “holy s#*%! What’d you do to my site?!” form (whether you should fill out this form is discussed in the “Action Items” below).
With absolutely abysmal SERPs for so many key terms, something’s gotta give. There has to be another update coming.
Our Opinion: Don’t Panic. Lots of Great Sites Got Hit. We Have No Clue What’s Coming. But Something Has to be Coming.
I think investing a ton of time, resources and money into changing course right now is kind of insane. The SERPs aren’t good. That’s not a sustainable situation for Google. Who knows where you’ll rank after the next update or what kind of different strategic changes you’ll have to make then.
Also, with updates coming out every couple of weeks, it’s probably safe to expect another change sooner rather than later. Yippee.
But What About Over-optimization, Sites Are Being Penalized For That, Right?
Yeah, but honestly, the data is a lot muddier than you’d think. There are a TON of “over-optimized” sites that show perfectly fine rankings and no penalization.
The folks at MicrositeMasters, a rank tracking company, put together before-and-after Penguin data for the thousands of sites whose rankings they track to make sense of this. To date, it is the most insightful dataset we’ve seen come out about the Penguin update. With that said, it has some serious flaws, which I’ll explain below.
First of all, we see that there is a decent distribution of sites of all levels of anchor text optimization, from 0% to 100% “money terms,” amongst those that did not get penalized.
In order to conclude that sites with less anchor optimization were safer, we’d expect to see a downward sloping trend, with the highest level of survivors at 0% and the lowest level at 100%. We don’t see that here at all. Likewise, why the 0% and 5% columns are empty is unclear. In general, there are no conclusions to be drawn from this graph.
This data is a bit more insightful. Amongst the sites that did get penalized, we see a threshold of about 65% anchor text optimization before penalties begin to happen. Even so, it seems to be a fairly small percentage of sites. It’s only at 90%-100% that we see pretty consistent penalties. Again, due to issues with this data that I’ll explain below, I’m reluctant to state either of these casual observations as conclusive claims. With that said, it’s probably smart to keep your anchor text optimization at 60% or below (shameless plug: HOTH users can easily do this by submitting 2 or more natural/naked anchors for any URL targeted).
This chart confirms what we told people back in our Panda 3.3 webinar: use branded, natural and naked anchors to stay on Google’s good side. It makes your anchor text profile look natural. And Google likes natural.
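If you want to sanity-check your own profile against that rough 60% line from above, here’s a minimal sketch. It assumes you’ve dumped your backlink anchor texts to a plain text file, one per line; the file name and money-term list are hypothetical, so substitute your own.

```python
# Minimal sketch: estimate anchor text over-optimization from an exported list.
# Assumes anchors.txt holds one anchor text per line (hypothetical export format).

money_terms = {"make money online", "make money online fast"}  # your money keywords

with open("anchors.txt", encoding="utf-8") as f:
    anchor_list = [line.strip().lower() for line in f if line.strip()]

optimized = sum(anchor in money_terms for anchor in anchor_list)
ratio = optimized / len(anchor_list)

print(f"{optimized}/{len(anchor_list)} anchors exact-match money terms ({ratio:.0%})")
if ratio > 0.60:
    print("Above the rough 60% line suggested by the Microsite Masters data.")
```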
So yeah, clearly more sites are getting penalized for over-optimizing than not, but there’s still a huge percentage of over-optimized sites sitting pretty. Likewise, countless content leaders in their category have had their rankings absolutely smashed. There are clearly a ton of confounding factors here, based on how many 100% anchor-text-optimized survivors there are alone.
In Chris Rempel’s words:
many salt-of-the-earth publishers (like AskTheBuilder.com, DaniWeb, and countless others) were severely affected by Penguin. Sites that are in some cases over a decade old, comprised of thousands of pages of quality, unique content, and plenty of social/brand signals – and they’re tanking, hard. These are sites that provide an awesome user experience.
Google claims that they are rewarding high quality sites. Their SERPs make it clear that they are rewarding scrapers, irrelevant, outdated web 2.0 pages, generic “slightly relevant” domains, and YouTube.
The Other Supposed Piece of the Puzzle — Irrelevant Links
Based on their dataset, Microsite Masters concluded that having links from irrelevant sites is actually a bigger predictor of whether a site would be penalized by Penguin or not.
Unfortunately, I view this as a total misread of the data.
Like the previous graph that showed a huge number of 100% anchor text optimized sites ranking just fine after Penguin, this one shows that plenty of sites with 0% links from relevant sites are ranking fine as well. The distribution above and below 50% link relevance looks virtually identical. Based on this data, we cannot conclude that more links from relevant sites means additional safety from Penguin.
If you’re like me, upon looking at this graph, you think “look, pages with 0% links from relevant sites obviously get penalized, and there’s a heightened probability of getting penalized until you reach 30%!” Unfortunately, you cannot arrive at this conclusion from this data.
Warning: $#*% is about to get mathy.
In order to make a comparison like “sites that have 20% of links coming from relevant sites are 3x more likely to get penalized than sites with 30% of links coming from relevant sites,” we’d have to know what percentage of sites at the 20% level got penalized and what percentage at the 30% level got penalized. THIS DATA SET DOES NOT TELL US THIS. Instead it tells us what percentage of all penalized sites in this sample had 20% relevant links and what percentage had 30% relevant links. This does not allow us to make any comparative claims regarding likelihood of getting penalized.
Ok. I probably just confused the crap out of you. Here’s a simple metaphor to clarify.
Imagine I have 50 very short basketball players and 5 very tall basketball players. I have each player attempt a slam dunk. Of the 50 short players, 25 can do it. Of the 5 tall players, all 5 can do it. We can make a comparative claim here (sample sizes aside) that since short players can dunk 50% of the time and tall players can dunk 100% of the time, tall players are twice as likely to dunk as short players. If we treated this data the way Microsite Masters treated theirs, we’d report that 83% of successful dunkers were short, while 17% were tall. While this is factually accurate, it tells us nothing about the likelihood of a tall player dunking compared to a short one. So when you see that huge bar at 0%, that tells you nothing about its actual likelihood of getting a site penalized in comparison to higher levels. Sucks, huh?
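For anyone who’d rather see it than read it, here’s the same arithmetic as a tiny script, using the made-up basketball numbers above. The distinction is between P(dunk | tall), which is what we want, and P(tall | dunker), which is all the chart gives us:

```python
# The made-up basketball numbers from the metaphor above.
short_players, tall_players = 50, 5
short_dunkers, tall_dunkers = 25, 5

# What we want to know: probability of dunking GIVEN height.
p_dunk_given_short = short_dunkers / short_players  # 0.50
p_dunk_given_tall = tall_dunkers / tall_players     # 1.00
print(f"tall players are {p_dunk_given_tall / p_dunk_given_short:.0f}x as likely to dunk")

# What the chart reports: height distribution GIVEN that you dunked.
total_dunkers = short_dunkers + tall_dunkers
p_short_given_dunk = short_dunkers / total_dunkers  # ~0.83
p_tall_given_dunk = tall_dunkers / total_dunkers    # ~0.17
print(f"share of dunkers who are short: {p_short_given_dunk:.0%}")

# Without the base rates (50 short players vs 5 tall), the second pair of
# numbers cannot be converted back into the first. Same problem with the
# Penguin chart: it shows the link profiles of penalized sites, not the
# penalty rate at each level of link relevance.
```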
I know what you’re thinking right now.
“Ugh! Science is so annoying! I’m obviously looking at a compelling graph. Can’t I make some conclusion?”
For the most part, no. But let’s humor ourselves and pretend like we are looking at data that can speak somewhat probabilistically (note: the only reason it would is that we’ve invented this convenient reasoning). The chart for penalized sites shows us that as long as you have 10% relevant links, your likelihood of getting penalized is way less than if you have 0% (which is weird because, again, 0% is the category with the highest number of survivors as well). Anything more than 10% relevant links only produces a marginally better chance of not getting penalized. Likewise, we see no shortage of sites at 50–90% relevant links, which most people would call “squeaky clean,” still getting slammed.
Ultra Unscientific Conclusion: Get At Least 10% Of Your Links from Relevant Sites.
Why I Think The Relevant Site Filter Can’t Be Right
As one commenter on the MicroSite Masters article pointed out, an “irrelevant sites” filter would effectively value a New York Times link less than a blogroll link from an industry friend. Logically, this is kind of nuts and I don’t think Google would want to do this. Often, non-subject-matter-specific sites (aka “irrelevant sites,” or at least “not particularly relevant sites”) are the most influential and independent. If links from irrelevant sites really do produce a ranking drop after Penguin (which, again, based on the data, we have little reason to believe), I’m fairly confident it will either be rolled back or heavily toned down in the long run.
The other sacred cow that Google would be slaying if it did introduce an irrelevant sites filter/penalty is that of virality. Matt Cutts has said “just create great content and people will link to it!” so many times it makes my ears bleed when I hear it. But that’s just the thing. Pages, as they become more popular, get more and more irrelevant links. Likewise, if you’ve ever gotten a link from the Huffington Post, TechCrunch, the Wall Street Journal or any other huge publication, you know that it’s going to be syndicated by hundreds of sites. These sites are likely “irrelevant” as well. Are we supposed to believe that there is a Wall Street Journal penalty? Personally, I think that sounds totally nuts.
What Does This All Mean For The HOTH & Our Users?
(Disclaimer: I’m going to speak about The HOTH product in this section. As such, it may be biased, but my goal is to be as honest as possible and address our customers’ concerns.)
To date, we’ve worked on over 4000 campaigns and roughly 8000 websites. If I told you that not a single site we’ve worked on was affected by Penguin, I’d be lying through my teeth. Likewise, for you to have that expectation would be a bit unreasonable as well. With that said, it’s with some relief that I can say we have not heard a unanimous cry from our customers that their sites have been penalized. Some certainly have, but nothing above industry average, which is the best any of us can ask for when even the squeakiest of clean sites are getting hit.
In short, we don’t see any correlation between HOTH link building and penalization.
As far as diversifying your anchors, The HOTH is kind of the perfect tool to do that with. We give you 5 keyword slots per URL and you can submit naturalized anchors (e.g. click here, more info, learn more) or naked anchors (e.g. http://brandname.com, brandname.com, brand-name) in them. This means you can use The HOTH to naturalize your anchor text distribution.
We’ve heard different theories, but there seems to be a consensus that you should have 50%+ non-optimized anchors.
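As a purely illustrative example (the keywords and brand are made up, and this isn’t a prescription for your niche), a 5-slot submission mixing money, natural, and naked anchors might look like this:

```python
# Hypothetical 5-slot submission for one URL -- all terms are made up.
slots = [
    "emergency room chicago",   # money term
    "hospital chicago",         # secondary/synonym term
    "click here",               # natural anchor
    "more info",                # natural anchor
    "http://acmemedical.com",   # naked anchor
]

money_terms = {"emergency room chicago", "hospital chicago"}
optimized = sum(s in money_terms for s in slots) / len(slots)
print(f"optimized share of this submission: {optimized:.0%}")  # 40%
```

That keeps non-optimized anchors at 60% of the submission, comfortably past the 50% consensus mentioned above.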
Likewise, from the perspective of relevant pages, all of the pages that link to you from The HOTH are relevant. Literally, every last one has custom written content that is relevant to your subject matter. The top-level properties are specifically on subdomains optimized for your subject matter, meaning they have site-wide relevance. Meanwhile, the rest of the links, following from our viral link structure, mimic the kinds of links you’d get as a result of a Huffington Post article being syndicated. Again, the virality concept is one of the most sacred cows for the search algorithm. To penalize sites that have gotten links as a result of some sort of viral distribution or syndication would effectively create a Wall Street Journal penalty. But clearly, that’s not happening.
We’re keeping our footprint squeaky clean while we move towards better verticalization of our links. Likewise, with HOTH Plus+, you can now have content written by 100% US-educated, talented writers who produce SEO copy to the highest standards of readability. All of this means that you can continue using The HOTH confidently.
As always, we advise that you use it as a tool in your toolbox and not as a silver bullet. In SEO, there are no silver bullets, and with time, as the algorithm becomes increasingly more complex, there will be fewer still. Use us as a fantastic tool, but if the Microsite Masters data tells us anything, those 10% of links from relevant/authority sites that are the biggest difference between penalized sites and safe sites are still on you.
Action Items
- Keep calm. Don’t panic. Change will come again, probably soon.
- Rough rules of thumb (based on Microsite Masters’ data analysis above): keep at least 10% of your link profile from relevant sites and at least 35% of your anchor texts natural/not optimized (a minimal sketch of this check follows this list).
- Analyze your actual SERPs. Use Open Site Explorer, Ahrefs, and Majestic SEO to analyze who is ranking in your niche. Do you see any trends for sites that are ranking? What is their anchor distribution like? See what you can learn about what Google is actually favoring in your SERPs by analyzing the winners and adjust your SEO strategy accordingly.
- Use a well-rounded link building strategy, such as the one we taught in our Link Building Pyramid webinar earlier this year (full webinar unfortunately not available due to a corrupt recording, but here is a condensed version via PubCon). Don’t put all your eggs in 1 basket. As with any investment, concentrating everything in a single kind of link is not sound.
- Maximize the value of every visitor that comes to your site. STOP SQUANDERING YOUR TRAFFIC. If you don’t already live by the age-old advertising adage “the money is in the list,” now is the time to start. This means you should be retargeting your visitors with services like AdRoll, split testing the living crap out of your site, and doing everything you can to get visitors on your mailing list (whether they buy or not). So many marketers squander their traffic by only giving themselves 1 shot with visitors. Find ways to turn a visit into a relationship that may yield many touch points and many possible sales down the road, not just 1. I know for a fact we’ve gotten some of our biggest customers because they visited us once, saw us all over the internet for a while thanks to retargeting and finally became curious enough to give us a shot. If the only part of the funnel you’re interested in is the mouth (i.e. traffic generation), you will find yourself extremely frustrated over time and ultra susceptible to quirks that come up with different traffic sources. In my opinion, understanding, building and optimizing sales funnels versus merely getting traffic is the definitive difference between a bona fide internet marketer and an SEO technician. There is nothing wrong with either, but the former has a career even if Google gets nuked tomorrow.
- Look for guest posting and press opportunities. There are a billion articles written about the art of guest posting so I won’t waste your time beating a dead horse. Just do it. This will get you really solid authoritative links and, if your guest posts are at least decent, you’ll get great qualified traffic back to your site. In many cases, the market value of a guest post can easily be hundreds or thousands of dollars, while the return may be higher than any banner ad. The only thing it costs is brainpower and time. Likewise, press is easier than ever with services like HARO that will send you emails every day packed with requests from journalists for qualified sources. Pitch the ones that you feel you’re qualified for and you may get some press. Again, this is super smart because it drives qualified traffic, builds reputational assets, and depending on the quality of the sites you get published on, will likely result in a ton more syndicated links from elsewhere. Don’t have time to reply to HARO? No problem. There’s an app for that.
- Watch our Panda 3.3 webinar. We basically called all the warning signs of Penguin over a month before its release and adjusted our product accordingly. Many of those ideas are echoed in this article, but you can get more depth on what’s been going on with Google over the past few months, what we’re doing about it and what you can do too.
- A huge portion of SERPs now contain results linked to a specific author. Whether this is causal or correlative is hard to tell, but it’s pretty safe to say that Google is going to give more credit to content associated with trusted author accounts over time. Now seems like a good time to set yours up. Here are the official Google instructions on how to set up your author profile.
- Stay curious. In the words of Chris Rempel, “the fat lady has not sung.” Pay attention to the SERPs. Analyze your competitors. Notice changes. And give yourself the benefit of the doubt that it’s not necessarily you who is screwing up, but Google (at least sometimes). This will keep you sane, but more importantly, it will give you the perspective not to ditch everything you know every time there’s a bump in the road and over-invest in appeasing Google’s latest temper tantrum. This update has literally been out for 3 weeks now. Our standard advice to anyone doing any kind of SEO work is to watch for changes in the SERPs for 3–6 weeks from the time of the initial change — even for something as simple as link building. For something as major as this, I’d definitely recommend the same.
- Evolve. As you look at the SERPs, you may find many new opportunities to experiment with. Right now, large web 2.0 sites (i.e. Tumblr, Blogspot, WordPress, YouTube) seem to be ranking abnormally high. If you have the bandwidth, move quickly and exploit these opportunities. If you don’t, don’t fret. This probably isn’t something that’ll make or break you in the future.
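As promised in the rules-of-thumb item above, here’s a minimal sketch of that kind of self-audit. It assumes a CSV export of your backlinks with anchor and domain columns; the column names, topic words, and file name are all hypothetical, and real Open Site Explorer/Ahrefs/Majestic exports will differ:

```python
import csv

# Hypothetical export: one row per backlink with "anchor" and "domain" columns.
money_terms = {"make money online"}           # your money keywords
topic_words = {"money", "income", "finance"}  # crude relevance test for linking domains

with open("backlinks.csv", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

optimized = sum(r["anchor"].strip().lower() in money_terms for r in rows)
relevant = sum(any(w in r["domain"].lower() for w in topic_words) for r in rows)

natural_pct = 1 - optimized / len(rows)
relevant_pct = relevant / len(rows)

print(f"natural/non-optimized anchors: {natural_pct:.0%} (rule of thumb: 35%+)")
print(f"links from relevant domains:   {relevant_pct:.0%} (rule of thumb: 10%+)")
```

Treat the output as a smoke test, not a verdict; as the Microsite Masters discussion above shows, these thresholds are rough reads of noisy data.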
What NOT To Do
- DO NOT file a re-inclusion request unless you’ve been de-indexed. This seems to be a popular move amongst panickers. If you can type your site’s URL into the Google search bar and it finds your site, you haven’t been de-indexed. In this case, filling out a re-inclusion request makes about as much sense as filling out your death certificate when you have the flu.
- DO NOT assist Google on its crusade. Google has created a page for you to assist them in screwing over sites. The reasons why you shouldn’t do this are endless. First there is the karmic factor. You can bet it’s only a matter of time before someone does the same to you. Then there is the fact that just because Google slaps one of your competitors doesn’t mean you will necessarily rank any higher. Generally, this is a huge waste of time and only invites greater scrutiny into your particular SERPs, which is something that almost never benefits anyone.
- This one is not a strict “do not,” but I personally wouldn’t fill out Google’s Penguin Feedback Form (aka “AHHH! WHAT’D YOU DO TO MY SITE?!?!?!?!” form). First of all, as a rule of thumb, any time you ask for a manual review, you are usually asking for trouble. Reviewers are not interested in helping you. Just because you requested it doesn’t mean they’re magically on your side. And more importantly, I just haven’t heard of it helping anyone yet, so it seems like a waste of time. Proceed at your own risk.