Archiving sites

When we rebuilt the Openly Local map we knew that there were hundreds of hyperlocal sites, Facebook pages and Twitter accounts out there that weren’t on the map. We also knew that the map database contained hundreds of sites that had ceased publishing since 2009, natural wastage in a fast-moving sector where people are testing a myriad of business models. Based on our recent work we judge that there are between 1,500 and 2,500 active hyperlocal sites in the UK, of which we have 511 mapped.

Through some desk-based exercises in the last couple of months (Finding Scottish Hyperlocals, Devon & Somerset – putting about 100 hyperlocal sites on the map, and the Wakefield experiment) we have added 209 sites to the map. These were not ‘new’ sites, just sites that had been around for ages but hadn’t got on the map. Our exercises covered barely 10% of the population, so we are cautiously confident that there are at least 1,000 other sites out there waiting to be added.
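
As a rough check on how these figures hang together, the arithmetic can be sketched as below. This is only one possible reading: it treats "barely 10%" as exactly 10% and assumes sites are spread evenly across the population, neither of which the exercises establish. The variable names are illustrative.

```python
# Figures quoted in the post.
sites_found = 209        # sites added by the desk-based exercises
coverage_percent = 10    # ASSUMPTION: "barely 10%" taken as exactly 10
active_on_map = 511      # active sites currently mapped

# Naive scaling: if 10% of the population yields 209 sites,
# the whole population would yield ten times as many.
naive_total = sites_found * 100 // coverage_percent
naive_remaining = naive_total - sites_found

# The post's deliberately cautious figure for sites still to be added.
cautious_remaining = 1000

print("Naive estimate of unmapped sites:", naive_remaining)
print("Cautious total:", active_on_map + cautious_remaining)
print("Naive total:", active_on_map + naive_remaining)
```

Under these assumptions the cautious and naive totals come out at 1,511 and 2,392, which sits inside the 1,500 to 2,500 range quoted above.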

We have known for a long time that the OL map had lots of dead wood in it: sites that started and then shut up shop for some reason. For instance, the Daily Mail’s ‘Local People’ sites ‘restructured’ in 2013, removing 122 sites from the database. So I’ve spent the past week reviewing all the sites on Local Web List and archiving those that are no longer active.

When we took the map over from Openly Local we imported 716 sites. We added 209 more in our desk-based exercises. This gave a grand total of 925 sites.

Every one of the 925 sites has been checked to see if it still exists and is at least occasionally updated, i.e. is active in some way. After this process the current state of the database is:

  • 925 sites at start of exercise
  • 372 sites no longer active (Archived but findable through search)
  • 42 sites that seem to have become spam or malware factories as their domains have been taken over – we have unpublished these to protect the unwary
  • 511 active hyperlocals displayed on map
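
The tally above can be checked with a few lines of arithmetic. This is just a sanity check on the numbers quoted in this post; the variable names are illustrative.

```python
# Counts quoted in the post.
imported = 716      # sites imported from Openly Local
added = 209         # sites added by the desk-based exercises
archived = 372      # no longer active, archived but searchable
unpublished = 42    # domains taken over by spam or malware

total = imported + added
active = total - archived - unpublished

print("Sites at start of exercise:", total)
print("Active sites on the map:", active)

# Share of the starting database in each state, to one decimal place.
for label, count in [("archived", archived),
                     ("unpublished", unpublished),
                     ("active", active)]:
    print(label, round(100 * count / total, 1), "%")
```

The totals come out at 925 and 511, matching the list, and roughly 40% of the sites we started with have been archived.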

This leaves 511 active hyperlocal sites on the map in the UK and ROI (that we know of). These changes reflect five years of natural wastage that we were overdue in catching up with.

Will and I have decided not to delete any sites in the list that are no longer active, but to move them into Archive status. This means that while they don’t appear on the map or in lists of sites, they are still searchable. If they become active again in the future, we can just change the status of the site back to live.

The criteria I used when reviewing the sites were, very roughly, these:

[Image: site checking process]

Dead sites – Does the site exist, yes or no? If the site returns a 404 or the server doesn’t respond, I archive it.

Changed sites – If the site exists, I check whether it is still the original site. If it isn’t, I unpublish it.

Spam/Link Bait – I have found that some domains have expired and been hoovered up, and these sites are now something completely different. Some are legitimate sites for new things, but the vast majority are spam or link bait sites. Will and I need to discuss the best way forward with these. I’d hate to just delete them, so maybe we just need to remove the URL so people can’t click through. If you have a view on what we should do, let us know.

Active sites – If it is the original site, I check whether it has been updated in roughly the last 3 months. I take a varying view on this depending on the type of site:

News-y sites – If the site looks news-led (i.e. latest stories on the front page), or is one that I know used to publish every day or week, I look at when the last post was published. If it has been updated recently I leave it; if it hasn’t, I move it to archive status.

Community information – If the site is more a community information site than a news site and hasn’t been updated, I leave it.
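
For anyone who wants to see the rules in one place, here is a minimal sketch of the decision process as a small function. The review itself was done by hand, so this is an illustration rather than the tool we used. The names `SiteCheck` and `review` are hypothetical, and the 90-day cut-off is an assumption standing in for "roughly the last 3 months".

```python
from dataclasses import dataclass

# ASSUMPTION: "roughly the last 3 months" taken as 90 days.
STALE_AFTER_DAYS = 90


@dataclass
class SiteCheck:
    """What a reviewer observed when visiting one listed site (hypothetical type)."""
    responds: bool           # False for a 404 or a server that doesn't respond
    is_original: bool        # False if the domain now hosts something else
    news_led: bool           # True for news-y sites, False for community information
    days_since_update: int   # days since the last visible post or update


def review(site: SiteCheck) -> str:
    """Return 'archive', 'unpublish' or 'live' following the criteria above."""
    if not site.responds:
        return "archive"      # dead site
    if not site.is_original:
        return "unpublish"    # changed site, including spam and link bait
    if site.news_led and site.days_since_update > STALE_AFTER_DAYS:
        return "archive"      # news site that has stopped publishing
    return "live"             # recently updated news site, or community information


print(review(SiteCheck(responds=True, is_original=True,
                       news_led=True, days_since_update=200)))
```

The order of the checks matters: a site that doesn't respond is archived before anything else is considered, and a community information site stays live however long it has gone without an update.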

One thing that emerged during this exercise is that custom-built sites with their own domains seemed to stop publishing and die off more often than sites that used free platforms like Blogger & WordPress. Maybe the hassle of having to keep the site updated technically as well as with content makes it less fun? More analysis would be needed to draw any real conclusion from this.

None of this is foolproof, so if I have got your site wrong, or you think we should be doing it differently, get in touch with us.

Updated Search

We’ve updated the search on the site so that it includes metadata. WordPress doesn’t search this data (tags & categories) out of the box, so we’ve added it in.

So if you want to search for sites built on a certain platform, type the platform into the search box and hit enter. The same goes for networks.
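
To illustrate the idea (not the actual WordPress change, which isn’t shown here), the sketch below matches a query against a listing’s title and, optionally, its tags and categories. Everything in it is hypothetical: the `Listing` type, the `search` function, the sample listings, and the choice to show platforms as tags and networks as categories.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Listing:
    """A hypothetical site entry; not the real Local Web List data model."""
    title: str
    tags: List[str] = field(default_factory=list)        # ASSUMPTION: e.g. platform
    categories: List[str] = field(default_factory=list)  # ASSUMPTION: e.g. network


def search(listings: List[Listing], query: str,
           include_metadata: bool = True) -> List[Listing]:
    """Case-insensitive substring search over titles and, optionally, metadata."""
    needle = query.strip().lower()
    if not needle:
        return []
    results = []
    for item in listings:
        haystack = [item.title]
        if include_metadata:
            haystack += item.tags + item.categories
        if any(needle in text.lower() for text in haystack):
            results.append(item)
    return results


# Made-up sample data, purely for illustration.
sample = [
    Listing("Example Village News", tags=["WordPress"]),
    Listing("Example Town Forum", tags=["Blogger"]),
]

print([s.title for s in search(sample, "wordpress", include_metadata=False)])
print([s.title for s in search(sample, "wordpress")])
```

With metadata excluded the platform query finds nothing, because the platform name isn’t in either title. With metadata included it finds the first listing.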

We have removed the links to Networks & Platforms from the sidebar as a result of this update. If you have any problems with the search, please let us know.