Friday, 16 December 2016

List of Bookmarking Sites Powered by Pligg

Although you will find a filtered and manually edited list of Pligg-based social bookmarking sites at the end of this post, the post is really about a few tricks that can help you quickly evaluate the quality and freshness of any directory list you find on the internet.

Finding a good list of Pligg sites

I have previously tried to find a comprehensive and well-organized list of social bookmarking sites, but I couldn't find a collection as sophisticated as a good directory list, such as the one eat-me.s-cars.com.ua has. This time I decided to look for a list based on the type of social bookmarking CMS. Although there are other specialized solutions for building a social bookmarking site (like PHPdugg or Scuttle), the most popular of them is Pligg CMS, so I decided to look for a list of Pligg-powered sites.
I came across some sources that listed a few hundred Pligg sites, and others that wanted to sell me such lists for a few bucks. The one that seemed most promising at first sight was published in a forum thread and referenced from many other forums: that post claimed to list 9000 Pligg sites.

Evaluating the available lists

As I clicked randomly through this list, I quickly realized that many of the listed sites were dead or parked domains. Nevertheless, I copied the list and found that instead of nine thousand, it contained only about 1735 links. I removed the unnecessary endings (/register.php, for example) and stripped the www. subdomains to get a list of bare domains.
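The clean-up described above (dropping paths like /register.php and the www. prefix) can also be scripted instead of done by hand. Here is a minimal sketch, assuming each list entry is one URL per line, with or without an http:// prefix; the function name to_domain is hypothetical.

```python
from urllib.parse import urlsplit


def to_domain(raw: str) -> str:
    """Reduce a listed URL such as 'http://www.example.com/register.php'
    to its bare host name ('example.com')."""
    raw = raw.strip()
    # urlsplit only fills in the host part when a scheme is present
    if "://" not in raw:
        raw = "http://" + raw
    # .hostname is lower-cased and has any :port removed
    host = urlsplit(raw).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return host


if __name__ == "__main__":
    listed = ["http://www.Example.com/register.php", "www.foo.org/", "bar.net"]
    print([to_domain(u) for u in listed])
```

Working on the parsed host name rather than on raw string slicing means ports, trailing slashes, and upper-case letters do not produce spurious "different" domains.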

Removing duplicate entries

When I sorted this list in OpenOffice Calc, I found many duplicate entries. After pruning the duplicate rows, the list shrank to 1548 sites.
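The spreadsheet trick works because sorting puts identical rows next to each other, so a row is a duplicate exactly when it equals the row above it. A minimal sketch of the same idea in Python (the function name dedupe_sorted is hypothetical):

```python
def dedupe_sorted(domains: list[str]) -> list[str]:
    """Sort the domains and keep only the first row of each run of
    identical rows, mirroring the sort-then-prune spreadsheet approach."""
    unique: list[str] = []
    for row in sorted(domains):
        # after sorting, duplicates sit directly below each other
        if not unique or row != unique[-1]:
            unique.append(row)
    return unique


if __name__ == "__main__":
    print(dedupe_sorted(["b.com", "a.com", "b.com", "c.org", "a.com"]))
```

In plain Python, sorted(set(domains)) gives the same result; the loop is spelled out only to show what the spreadsheet comparison does.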

Retrieving PageRank values

Then I fed this list to Alex Polski's very handy Mass PageRank Checker (run on my own notebook with MAMP) to check the PageRank values of the listed sites. The idea was that if a site could not obtain a PageRank value in almost a year (or had lost it since the original list was published), dealing with it would be a waste of time. So the next step was to sort the remaining list by PageRank and eliminate the sites with a PageRank of zero. After removing these sites that Google apparently did not value, I ended up with a list of only 453 sites.
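The sort-and-eliminate step can be expressed as a small filter over the checker's results. The post does not describe the checker's output format, so this sketch assumes a hypothetical two-column CSV export (domain,pagerank) where a missing value such as "n/a" counts as zero; keep_ranked is a hypothetical name.

```python
import csv
import io


def keep_ranked(csv_text: str) -> list[tuple[str, int]]:
    """Keep rows whose PageRank is above zero, highest PageRank first.
    Assumes a hypothetical 'domain,pagerank' CSV export."""
    ranked = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        domain, pr = row[0].strip(), row[1].strip()
        # blank or non-numeric values ("n/a") are treated as PageRank 0
        value = int(pr) if pr.isdigit() else 0
        if value > 0:
            ranked.append((domain, value))
    ranked.sort(key=lambda r: r[1], reverse=True)
    return ranked


if __name__ == "__main__":
    sample = "a.com,0\nb.com,3\nc.com,n/a\n\nd.com,6\n"
    print(keep_ranked(sample))
```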

Checking Pligg specific URLs

The third step was to check whether the remaining sites were still powered by Pligg. To validate this, I first created an HTML page from the list in which every link pointed to the /upcoming.php URI, which I appended to each listed site. Since (hopefully; I am not a Pligg expert) every Pligg site has this subpage, this trick let me identify the sites powered by Pligg. I then opened the HTML file in Firefox with the Firefox Link Checker add-on installed and activated. This add-on goes through all the links on a web page and, based on the result of the link check (Valid, Invalid, or Forwarded), gives each link a different background color and a title attribute. You have to wait a while until Link Checker finishes, but when it does, you can download the web page as annotated by the add-on. Note: this step could also have been done before the mass PageRank check.
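Generating the test page is easy to automate. This sketch builds an HTML file with one link per domain pointing at /upcoming.php, ready to be opened in a link-checking browser add-on. It assumes plain http:// URLs; build_check_page and the output file name pligg_check.html are hypothetical.

```python
import html


def build_check_page(domains: list[str], path: str = "/upcoming.php") -> str:
    """Return an HTML page with one link per domain, each pointing at the
    Pligg-specific subpage (assumes plain http:// URLs)."""
    items = []
    for d in domains:
        href = html.escape("http://" + d + path, quote=True)
        items.append(f'<li><a href="{href}">{html.escape(d)}</a></li>')
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        "<title>Pligg check</title></head><body><ul>\n"
        + "\n".join(items)
        + "\n</ul></body></html>\n"
    )


if __name__ == "__main__":
    page = build_check_page(["example.com", "example.org"])
    with open("pligg_check.html", "w", encoding="utf-8") as f:
        f.write(page)
```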
The final step was to copy the annotated HTML file into the spreadsheet next to the remaining list, sort by the column copied from the HTML file, and remove every entry whose status was not “Valid link”. At the end of this process the result was a list of 271 Pligg sites, which you can download from here.
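If you would rather skip the browser add-on and the copy-paste into the spreadsheet, a similar Valid/Invalid/Forwarded classification can be approximated with Python's standard library. This is a substitute technique, not what was used for the list above, and the labels only imitate the add-on's wording; classify and check are hypothetical names.

```python
import urllib.error
import urllib.request


def classify(requested: str, final: str, status: int) -> str:
    """Map an HTTP result to a Link-Checker-style label."""
    if status >= 400:
        return "Invalid link"
    # urlopen follows redirects, so a different final URL means "forwarded"
    if final.rstrip("/") != requested.rstrip("/"):
        return "Forwarded link"
    return "Valid link"


def check(url: str, timeout: float = 10.0) -> str:
    """Request the URL once and label the outcome (needs network access)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify(url, resp.geturl(), resp.status)
    except urllib.error.HTTPError as err:  # 4xx/5xx answers
        return classify(url, url, err.code)
    except (urllib.error.URLError, OSError, ValueError):  # DNS, timeout, bad URL
        return "Invalid link"


if __name__ == "__main__":
    domains = ["example.com"]
    valid = [d for d in domains
             if check(f"http://{d}/upcoming.php") == "Valid link"]
    print(valid)
```

Keeping classify separate from the network call makes the decision rule easy to test and to adjust, for example if you decide that an http to https redirect should still count as valid.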

Manual check of the listed sites

After clicking on some random links in the result, I had to admit that the outcome of the previous step with the Link Checker add-on was far from satisfactory. Some domain parking systems, for instance, never return a 404 Not Found error, no matter what URL you request; so if a parked domain still keeps its PageRank value, it cannot be filtered out with the methods described above. Therefore I decided to click through all the listed sites and check each of them manually. At the end of this painful last step, the real list of Pligg sites extracted from the original list had only 154 sites (see below).
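One way to flag such catch-all servers before the manual pass is to request a random path that cannot exist next to /upcoming.php: a real Pligg site should serve the first and reject the second, while a parked domain typically answers both with 200. This heuristic was not part of the process described above and will not catch every parked domain (nor spare every real site with a lenient error page), so it only narrows down which sites deserve a closer look; all names are hypothetical.

```python
import secrets
import urllib.error
import urllib.request
from typing import Optional


def status_of(url: str, timeout: float = 10.0) -> Optional[int]:
    """Final HTTP status after redirects, or None if the request failed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
    except (urllib.error.URLError, OSError, ValueError):
        return None


def looks_like_catch_all(real_status: Optional[int],
                         bogus_status: Optional[int]) -> bool:
    """Suspicious when both the Pligg page and a made-up page return 200."""
    return real_status == 200 and bogus_status == 200


def probe(domain: str) -> bool:
    """Needs network access; True means 'check this one by hand'."""
    bogus_path = "/" + secrets.token_hex(8) + ".php"  # random, almost surely absent
    real = status_of(f"http://{domain}/upcoming.php")
    fake = status_of(f"http://{domain}{bogus_path}")
    return looks_like_catch_all(real, fake)
```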

What was not validated?

  • Topic: Many Pligg-powered sites collect links only on a certain niche topic, or only links written in a foreign language. To decide whether a site is useful for general link building, that is, whether it accepts submissions on all topics (and in all languages), you have to visit these sites one by one.
  • Dofollow/nofollow: Unfortunately, I don't know a good automated method to check whether a Pligg site is dofollow or nofollow.
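A partial automated check is possible if you save the HTML of a story page from a Pligg site: count how many outgoing links carry rel="nofollow". This is only a sketch under stated assumptions. It ignores nofollow set through a robots meta tag or added by JavaScript, and LinkRelCollector and nofollow_share are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urlsplit


class LinkRelCollector(HTMLParser):
    """Collect external <a> links and whether each has rel="nofollow"."""

    def __init__(self, own_host: str):
        super().__init__()
        self.own_host = own_host
        self.external: list[tuple[str, bool]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        a = dict(attrs)
        href = a.get("href") or ""
        host = urlsplit(href).hostname
        # skip relative links and links back to the Pligg site itself
        if host is None or host == self.own_host or host.endswith("." + self.own_host):
            return
        rel = (a.get("rel") or "").lower().split()
        self.external.append((href, "nofollow" in rel))


def nofollow_share(page_html: str, own_host: str) -> Optional[float]:
    """Fraction of external links marked nofollow, or None if there are none."""
    parser = LinkRelCollector(own_host)
    parser.feed(page_html)
    parser.close()
    if not parser.external:
        return None
    return sum(nf for _, nf in parser.external) / len(parser.external)
```

A share close to 1.0 suggests a nofollow site, and close to 0.0 a dofollow one; a mixed value is another case for the manual check.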

Conclusion

  • If you see a list that claims to contain many thousands of sites, do not take it for granted (see 1735 vs. 9000).
  • Things can change quite fast: a list published ten months ago can become rather outdated.
  • A long list is not automatically a quality list: in this case many sites were already broken at the time of publishing (see the comments in the original thread).
  • No automated test can match the good old manual check.
  • Pligg sites with good PageRank values most likely concentrate on a niche topic or a foreign language. For instance, one of the listed PR6 Pligg sites is a non-English site with Joomla-related news, while the other is a Turkish site (both useless for those of us who try to avoid spamming these social bookmarking sites).
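The counts reported in this post make the first conclusion concrete. The 154 sites that survived the manual check are about 8.9% of the 1735 links actually present in the list, and about 1.7% of the 9000 sites the forum post claimed. This short script, with the stage labels paraphrased from the steps above, prints the share kept at each step.

```python
# Site counts at each stage, as reported in this post
STAGES = [
    ("claimed in the forum thread", 9000),
    ("links actually in the list", 1735),
    ("after removing duplicates", 1548),
    ("PageRank above zero", 453),
    ("/upcoming.php reported valid", 271),
    ("passed the manual check", 154),
]


def funnel(stages):
    """Return (label, count, share of previous stage) for every stage."""
    rows = []
    prev = None
    for label, count in stages:
        share = None if prev is None else count / prev
        rows.append((label, count, share))
        prev = count
    return rows


if __name__ == "__main__":
    for label, count, share in funnel(STAGES):
        kept = "" if share is None else f"{share:6.1%} of previous"
        print(f"{count:>5}  {label:<30} {kept}")
    print(f"overall: {154 / 1735:.1%} of real links, {154 / 9000:.1%} of the claim")
```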

The manually checked list of Pligg Sites