Today, we’re happy to launch a new Data Channel with Dow Jones, publisher of The Wall Street Journal. The channel shares data from Factiva, a business information and research tool for news content owned by Dow Jones. Find data about media coverage of a variety of topics, explore visualizations built with or related to that data, and see other noteworthy visualizations selected by Dow Jones.
The competition for the world’s tallest building feels ever-present, with countries around the world vying for the prestigious title. Dow Jones kicks off its Data Channel with a unique data set of media mentions related to the official openings of the world’s tallest buildings, including London’s recently opened skyscraper, The Shard. Explore how people in a given country talk about a building. How does media coverage compare in countries that already have their own horse in the race? Mash up the data with another source: is there any economic relationship to having a record-breaking building in your country? This data set is just the start of exploring how media coverage data can reveal insights about the world. To get an idea of what else lies ahead, we talked with Barak Ronen from Factiva.
Visualizing: Factiva collects data on the world’s most important news. Describe the data — where does it come from and how is it processed?
Barak Ronen: Factiva’s data about the news is a precious derivative of actually collecting, storing, categorizing and therefore understanding the news. Factiva licenses content from many thousands of sources, in twenty-eight languages, from almost two hundred countries. More than half of our key sources are not freely available on the internet; they sit behind passwords and paywalls. Factiva harmonizes this huge volume of news based on the unique attributes of each individual article. These attributes include simple elements such as area of origin and type of source, but also more complex elements that have to do with the ‘about-ness’ of the article. What does it actually talk about? Which industries is it relevant to: finance or the auto industry? What kind of event does it describe: a new product launch or a corporate acquisition? Where is the article relevant, which is not necessarily where it was written? Think of a Wall Street Journal article about property prices in Mumbai. The answers to all of these questions are generated by our elaborate taxonomies and unique algorithms, and are attached to the articles as metadata tags. Our data is derived by querying our news repository and then quantifying these ‘about-ness’ metadata tags, which gives a unique insight into the essence of the news around a specific theme.
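Ronen doesn’t describe Factiva’s internal systems, but the “query the repository, then quantify the tags” step can be illustrated with a minimal Python sketch. The `Article` record, its fields and the `count_coverage` helper are hypothetical, not Factiva’s schema or API, and the sample articles are made up. The point is only to show how counting by where a story is relevant can differ from counting by where it was written, as in the Mumbai example.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Article:
    # Hypothetical record; the interview does not describe Factiva's actual schema.
    source_region: str                              # where the article was written
    source_type: str                                # e.g. "newspaper", "newswire"
    subject_tags: set = field(default_factory=set)  # what the article is about
    region_tags: set = field(default_factory=set)   # where the article is relevant


def count_coverage(articles, subject, by="relevance"):
    """Select articles tagged with `subject`, then count them per region.

    by="relevance" counts the region tags (where the story matters);
    by="origin" counts the region each article was written in.
    """
    counts = Counter()
    for article in articles:
        if subject not in article.subject_tags:  # the "query" step
            continue
        if by == "relevance":                    # the "quantify the tags" step
            counts.update(article.region_tags)
        elif by == "origin":
            counts[article.source_region] += 1
        else:
            raise ValueError(f"unknown grouping: {by!r}")
    return counts


# Made-up sample articles, for illustration only.
articles = [
    Article("USA", "newspaper", {"property prices"}, {"India"}),  # a US paper writing about Mumbai
    Article("India", "newspaper", {"property prices"}, {"India"}),
    Article("UK", "newswire", {"skyscrapers"}, {"UK"}),
]

print(count_coverage(articles, "property prices"))               # Counter({'India': 2})
print(count_coverage(articles, "property prices", by="origin"))  # Counter({'USA': 1, 'India': 1})
```

Grouping by relevance puts both property-price stories under India, while grouping by origin splits them between the US and India: the same query tells two different stories depending on which tag you quantify.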
V: What in this data should designers and coders get excited about?
BR: With Factiva data, designers and coders can tell the story of the news: the story of how subjects and issues in the world around us are perceived, processed and interpreted. What are we talking about? Who is talking about what? How are we framing these discussions? What relative weight does the media give to comparable, or very different, events in the real world? How do these more subjective elements compare with more objective data sets? And of course coders and designers will rightly be tempted to bring their own judgments and views into their creations: which is ‘better’ or ‘more accurate’, the subjective media reflection of a subject or an ‘objective’ set of facts?
Factiva’s data is unique in that it comes from an astonishingly high number of sources not available on the free internet. It allows designers and coders to get to the real story of the news. Instead of showing and interpreting just the tip of the iceberg, you get to work with the actual iceberg. And that’s really exciting.
V: What sets Factiva’s data apart from other news aggregation sites?
BR: Factiva’s data is uniquely broad and deep. The data is deep because the metadata taxonomies applied to the news articles are poly-hierarchical: they have multiple levels, so an article belongs to a very specific tag but also to a wider data category. The number of distinct tags is very high, with hundreds of thousands of region, industry and subject tags, and millions of company tags. The breadth of the data comes from the amount and variety of the content sources, as well as the length of the archive. With millions of well-tagged articles drawn globally from high-quality and rare sources, you can generate genuinely meaningful insight.
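A poly-hierarchical taxonomy is one where a tag can sit under more than one parent, so an article tagged at the most specific level also counts toward every wider category above it. The minimal Python sketch below illustrates that roll-up. The tag names and the `PARENTS` map are invented for illustration and are not Factiva’s taxonomy; `expand` and `rollup_counts` are hypothetical helpers.

```python
from collections import Counter

# Hypothetical taxonomy fragment: each tag maps to its parent tags.
# "office towers" has two parents, which is what makes the hierarchy poly-hierarchical.
PARENTS = {
    "office towers": ["commercial real estate", "construction"],
    "commercial real estate": ["real estate"],
    "construction": ["industrial goods"],
    "real estate": [],
    "industrial goods": [],
}


def expand(tags, parents):
    """Return the given tags plus every ancestor reachable through `parents`."""
    seen = set()
    stack = list(tags)
    while stack:
        tag = stack.pop()
        if tag in seen:  # a tag reachable by two paths is still counted once
            continue
        seen.add(tag)
        stack.extend(parents.get(tag, []))
    return seen


def rollup_counts(tagged_articles, parents):
    """Count each article under its specific tags and under all wider categories."""
    counts = Counter()
    for tags in tagged_articles:
        counts.update(expand(tags, parents))
    return counts


# Two made-up articles: one tagged at the most specific level, one higher up.
counts = rollup_counts([{"office towers"}, {"construction"}], PARENTS)
print(counts["industrial goods"])  # 2
print(counts["real estate"])       # 1
```

Because `expand` returns a set, an article is counted at most once per category even when two paths lead to the same ancestor, which keeps the wider-category totals from being inflated.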
Factiva’s accuracy is industry-leading. We achieve extremely high precision and recall scores by constantly monitoring our tags in all languages and systematically comparing our algorithms’ accuracy against human tagging. We bring the highest-quality editorial know-how into the algorithm.
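Precision asks how many of the tags an algorithm assigned are correct; recall asks how many of the correct tags it found. The interview doesn’t say how Factiva computes these scores, so the Python sketch below is only an assumption-laden illustration: it treats human tags as ground truth and micro-averages over all articles. The function name and sample tags are hypothetical.

```python
def precision_recall(predicted, gold):
    """Micro-averaged precision and recall of algorithmic tags against human tags.

    `predicted` and `gold` are parallel lists with one set of tags per article.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same articles")
    tp = fp = fn = 0
    for algo_tags, human_tags in zip(predicted, gold):
        tp += len(algo_tags & human_tags)  # tags both agree on
        fp += len(algo_tags - human_tags)  # tags the algorithm added wrongly
        fn += len(human_tags - algo_tags)  # tags the algorithm missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up example: two articles, each tagged by an algorithm and by an editor.
predicted = [{"construction", "real estate"}, {"autos"}]
gold = [{"construction"}, {"autos", "finance"}]
p, r = precision_recall(predicted, gold)
print(round(p, 3), round(r, 3))  # 0.667 0.667
```

Micro-averaging pools tag decisions across all articles; a macro average (scoring each article, then averaging) would weight sparsely tagged articles more heavily, and either choice is plausible for this kind of monitoring.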
V: What are some useful tips for designers working with this data?
BR: Know the story you are after. Factiva’s data, especially the deeper and more complex data sets we will be sharing, can be very rich, so don’t get lost in the numbers. Also, keep your eyes open for a more interesting story that might be hidden within the data: maybe comparing the coverage of a given story by type of source is much more interesting than comparing it by geography? Contrast the media analysis data from Factiva with other data sets such as economic indicators and statistics; subjective vs. objective is always an interesting match-up. And lastly, brace yourself for a lively conversation. There will be people out there with a completely different slant on any subject who will challenge your observations. Everyone will see patterns, just not necessarily the same ones.
To check out the Dow Jones Data Channel, click here.