Data journalism, robots … and predictions on where we go from here



At Google’s latest Digital News Initiative conference, held in Amsterdam last week, there were plenty of ideas being discussed around what the future of news looked like.

The DNI involves Google investing millions of pounds in projects put forward by media organisations large and small from across Europe, which could help shape the future of the media and support the development of journalism in the years to come.

Britain’s Press Association was one of the biggest winners this time, securing over 700,000 Euros to fund a new news service which will generate 30,000 local news stories a month sourced from data … and written by ‘robots’.

A team of five journalists will spot stories in data sets and then use artificial intelligence to create potentially hundreds of versions for different locations – hence the notion of robots.
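To make the "hundreds of versions" idea concrete, here is a minimal sketch of how one story shape could be filled in for many locations. It is not how the Press Association's service works, just the simplest template approach I can think of; the template wording, the `localise` function and the figures are all invented for illustration.

```python
from string import Template

# Hypothetical template: one story shape, reused for every location.
STORY_TEMPLATE = Template(
    "$place recorded $count $measure in $period, "
    "$direction $change_pct% on the previous year."
)


def localise(rows, measure, period):
    """Return one sentence per location from a list of data rows.

    Each row needs a place name plus the previous and current figures.
    Assumes the previous figure is never zero.
    """
    stories = {}
    for row in rows:
        previous, current = row["previous"], row["current"]
        change = (current - previous) / previous * 100
        direction = "up" if change >= 0 else "down"
        stories[row["place"]] = STORY_TEMPLATE.substitute(
            place=row["place"],
            count=current,
            measure=measure,
            period=period,
            direction=direction,
            change_pct=round(abs(change), 1),
        )
    return stories


# Made-up figures, for illustration only.
SAMPLE_ROWS = [
    {"place": "Town A", "previous": 200, "current": 250},
    {"place": "Town B", "previous": 400, "current": 380},
]

stories = localise(SAMPLE_ROWS, "parking fines", "2016")
for text in stories.values():
    print(text)
```

The journalist's job in this picture is the bit the code can't do: spotting that the data set holds a story in the first place, and deciding what the template should say.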

AI-powered (artificial intelligence) journalism has been bubbling away for a number of years. In America, Chicago Tribune publisher Tronc plans to use AI to auto-generate up to 2,000 videos a day to support stories. News agency AP has increased the volume of earnings reports it produces from business announcements tenfold using AI (with the firm saying there are fewer errors than when humans did them), and the same company is now using AI to write minor league baseball reports.

At the Washington Post, ‘robots’ are deployed to write results of some elections, and also for sport. At the LA Times, a bot automatically sends out alerts whenever an earthquake is recorded – and inadvertently alarmed people about an earthquake prediction for 2025 after a bug entered the system which powers the data the LA Times relies on.

So are we at a tipping point where technology now replaces reporters in newsrooms up and down the UK? I don’t think so – especially if we embrace their potential. Here’s why:



Removing all trace of appearing in a vox pop … or why using the ‘right to be forgotten’ is an own goal

You can hide in Google … but you can still be found

What sort of person contacts Google to make the most of the ‘right to be forgotten’ ruling, which entitles people to demand the search engine remove any results about themselves which they think are ‘outdated’ or ‘irrelevant’?

This week, publishers began to find out who was making the most of the opportunity served up by a European court ruling. According to the Guardian, 70,000 such requests have been received so far, and whether the articles concerned are true, accurate or fair doesn’t enter into the equation thanks to the Google ruling.

Google has notified websites of links it will no longer be able to show ‘for certain searches’ on its European search pages, and the first bunch of links I’ve seen this week cover a wide range of issues. Where more than one person is involved in a story, we don’t know who has triggered the complaint.

Not surprisingly, there were a whole bunch of links to stories of people who were either on the wrong side of the law, or being exposed as being such by the newspaper.

The most random one, though, was the story of ‘parking rage’ being a regular occurrence in a Buckinghamshire village. No-one’s court appearance was reported, no-one’s embarrassing exploits shared with the world. Just the concerns of people who didn’t like the way people were falling out over parking. It appears to be a

There has been a lot of concern about this ruling, and I saw one comment which likened it to ‘burning the books in the library.’ That’s not quite the case, because there is nothing in the ruling compelling publishers to remove stories people want to disappear from the internet.

Many of the newsrooms I work with have had calls from people demanding content be removed from online archives ‘because I now have a right to be forgotten.’ That’s wishful thinking on their part … they have a right to be removed from Google in Europe, that’s all.


Seven useful search engines for journalists

One of the first posts I wrote when I began this blog looked at alternative search engines to Google for journalists. It wasn’t a knocking post about Google, but a post which aimed to explore if there were alternatives to Google for journalists seeking information beyond Google’s first page.

Earlier this year, it became the most read post on my blog by a country mile after being linked to from an American forum, and with that link came a list of suggested other useful search engines to explore.

Last week I finally got around to looking at some of the suggestions, as well as some new ones which I’d heard about elsewhere. What follows is a list of seven search engines which I think have potential for journalists who are digging for information related to stories and projects. It’s not intended to be definitive, just useful (I hope):


What can the news industry learn from the Meerkats?


Rupert Murdoch tells the world that Google should pay for the content it accesses from his websites. Google responds by saying it isn’t in the business of producing content, it’s just there to help people find it. And then the debate continues about who needs whom more: do newspaper websites need Google to get an audience more than Google needs newspaper websites to satisfy the desire of its users to find content which is of interest to them?

Google News has always struck me as the online version of the world’s best newspaper A-board.

Rather than a busy sub-editor rushing through a dozen or so bills on the stories s/he thinks will grab the interest of the passer by, with no guarantee any will make it on to the board, stick a decent story online and you’re pretty much guaranteed a showing in Google News results for searches which share words with those in your article.

Of course, the downside is that whereas your average newspaper probably only has to compete with two or three rival A boards outside the newsagent, on t’web you’re up against potentially thousands of rivals. Which is where the murky world of SEO comes into play.

One of the arguments critics of Murdoch throw back at him when he has a pop at Google is “If you hate it so much, it’s very easy to get Google to stop crawling your site, why don’t you do that?” The answer, at first, seems obvious: you don’t want to lose the traffic.
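For anyone wondering just how easy "very easy" is, here is a rough sketch. Crawlers that play by the rules check a site's robots.txt file before fetching pages, and a two-line rule is enough to turn Googlebot away. The robots.txt content and the example.com address below are made up for illustration; the check uses Python's standard `urllib.robotparser` module rather than anything Google itself runs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: turn Google's crawler away, allow everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Made-up story URL on a placeholder domain.
url = "https://www.example.com/news/story.html"

google_allowed = parser.can_fetch("Googlebot", url)
other_allowed = parser.can_fetch("SomeOtherBot", url)

print("Googlebot allowed:", google_allowed)
print("Other crawlers allowed:", other_allowed)
```

So the technical barrier really is tiny. The commercial one, giving up the traffic, is the hard part.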

But what if we compared the news industry to the insurance market – in other words, what if this post was inspired by meerkats?

On one hand, we have websites such as Compare the Market (you do your own impressions of the meerkats here, ok?) and MoneySupermarket, which promise to find us the best deal on car/house/travel insurance and so on. Several quick clicks and you have the best deal, right?


So, about picking the X Factor winner from search…

Ok, so I’m not sure where I’m going with this. A week ago, I decided to test out Bill Tancer’s theory that by monitoring search trends, you can determine who will win a talent contest (a televised one, of course) several weeks in advance.

Of course, Bill has the benefit of all the Hitwise data at his disposal. I don’t. So I thought I’d try it using Google Insights for Search on the X Factor. Based on Google’s indicative popularity, it was clear Kandy Rain were the most popular act.

But given they were a group of ex-strippers, perhaps it wasn’t so surprising that they were so hotly searched, while the fact their singing was, well, disappointing perhaps meant it wasn’t such a surprise that they were the least voted for act in the live finals.

So where does that leave the notion that you can predict a winner via search popularity? Well, I’m going to give it another go, but I think the caution to be taken from last week’s experiment is the value of the “story” on X Factor.

We all know the X Factor contestants often have a back story which makes the headlines, or an event takes place on the show which grabs the headlines. That clearly had a bearing on Kandy Rain’s search results last week.


The X-Factor search experiment: Kandy Rain

On Thursday, in my first post on this blog, I explored the idea that by studying search trends you can pick an early winner of a talent competition.

This was based on the book “Click” I have been reading, by Bill Tancer of Hitwise. He demonstrated how it was obvious Mark Ramprakash was going to win Strictly Come Dancing in 2006 several weeks before the final result based on search volumes.

I then applied this to the final 12 in X Factor, using Google Insights for Search. The result was that Kandy Rain were the most searched act over the last week, and I concluded by saying:

So, following current search trends as listed by Google in the UK, the winner will be Kandy Rain, a group.

The same Kandy Rain who were the first to be booted off on tonight’s show. So while I’ve proved here that search engine popularity probably won’t ever become part of the process for someone selecting a winner at the bookies, has the theory that search engine popularity = votes on a show been proved wrong?

I’m tempted to argue that no, it hasn’t. At the end of the day, out of all of the acts, Kandy Rain had the back story most likely to be researched online – ex pole dancers and all that. And while there’s no doubt they were popular on Google, it’s certainly no substitute for the ability to sing, as Kandy Rain proved tonight.

So, so far I’m tempted to say all I’ve proved is that it’s too early to rely on search popularity to pick a winner – which is why, later this week, I’m going to do it all over again and see what jumps out.

That might sound a bit indulgent, but I will be writing other stuff which might be of more interest, so please stick with me!

Five search engines (other than Google) for journalists

Warning: This isn’t a knocking post about Google. Google is great for the vast majority of the searches we do, but it’s always dangerous as a journalist to fall into the trap of only ever using one search engine.

If Google does have a problem, it’s the fact that with so many different organisations competing to be on the first page of results, it’s quite possible that the search results for a given term won’t change from one month to the next.

There is an abundance of other search engines around – some good, some bad, some just a little different – but there are a number which I’ve found useful for journalistic purposes over the past few months.

Here are five – and how they could be used.

1. Addictomatic: Best for one glance at your beat


Addictomatic is ideal if you have a set brief in your job – be it as a district reporter or a specialist. Enter a search term – in the screen grab above I’ve used “Sutton Coldfield” – and it returns the most recent results from a variety of sources including YouTube, Bing, Google Blog Search and Flickr. A bit like Google Reader in a sense – it’s a one-stop place to keep up to date on an issue – but without the hassle of setting it up.

2. Wolfram Alpha: Best for one fact answers quickly

[Screen grab: Wolfram Alpha results for “february 1 1972”]

Launched in a blaze of glory not so long ago, it aims to make all systematic knowledge immediately computable by anyone.

While it’s still a way off from achieving that goal, it’s a much handier way of working things out than going through Google (although Google is launching more and more stat-type search tools all the time). It’s great for stocks and shares information, for working out ages, for getting statistics on places and subjects you might be searching for. Oh yes, and it’s great at doing sums for you too!

3. Keotag: Best for searching across blogs

[Screen grab: Keotag, a tag search across multiple engines]

Keotag is the best search engine I’ve found when searching for blogs. It searches around 17 search facilities and then lets you search through the results, search engine by search engine. I’ve always found searching blogs infuriating, and often Twingly, Icerocket and Google Blogs get dominated by non-blogs or by spam. I think Keotag gets around this by just looking at tags in blog posts. As a result, the search results tend to be much better. The fact you can clearly search multiple search engines in one place is a big boost.

4. Cuil: Best for research at the start of a project

[Screen grab: Cuil results for “preston north end”]

Cuil has made a big play of the fact it doesn’t return results based on page rank; it also digs into each page and then finds content which is related to it. It then serves it up in a way which makes it great for getting to grips with a subject. is also good in this respect. As starting points for quick research on a new subject go, both beat Google.

5. Omgili: Best for finding the conversations people are having in forums

[Screen grab: Omgili forum and discussion results for “English Defence League”]

Omgili is the sort of search engine I’ve been looking for for ages – one which makes it easy to find places where people are talking about what you might be writing about. Boardreader does this, but not as well, while others have come and gone. If you’re after just monitoring what’s being discussed on Twitter then is the most effective option. But, as a journalist looking for communities who might be able to get involved/might be interested in what you’re writing about, Omgili is superb.

This list isn’t intended to be a definitive list of the search engines journalists should seek to use, so if you’ve got any secret search engines, please tell me about them!