Wikipedia or Britannica? The case for crowd sourcing in Enterprise IT

Britannica recently announced the end of its 244-year-old print edition, and my first reaction was that this clearly marks the start of the end game between Wikipedia and Britannica. However, I was surprised to read a message from Jorge Cauz (President of Britannica) in the same article. He exuded confidence and stated that he would bet a lot of money that most people would rather use Britannica than Wikipedia in the future. This made me curious. I’ve never felt the need for the Britannica encyclopedia and have always relied on Wikipedia to learn anything new. When almost everything is available on Wikipedia for free, why would one want to go elsewhere and pay for it? What exactly is the case for Britannica? The other, bigger question that has been bothering me recently is the case for crowd sourcing (through companies like InnoCentive, TopCoder, crowdANALYTIX etc.) in enterprise IT, and I can imagine no better crowd sourcing success story than Wikipedia. In this blog I’ve tried to answer my own question by imagining Cauz’s reasoning.

Who will own the master data on the web?

We are in the age of information explosion. Computing will become pervasive and the volume of digital data will be overwhelming. Information will be available through many channels, and all of this is going to make fact finding a challenge. Facts are going to be diluted by imagination in the form of bias, opinions and individual ideologies. Truth will continue to matter to curious minds, and the search for it is sure to get increasingly difficult. Where does one find the truth then? I’ll try to answer that through an even more fundamental question: how does an individual experience truth in a subject? Every individual is likely to have a trusted network (friends, respected authors, professionals like lawyers and doctors) that gives him access to facts in their respective fields, such as medicine, law or politics, and he is most likely to accept what they say as truth. But what about subjects or topics that are completely alien to his network too? Or when he wishes to experience the truth himself? Don’t we deserve a single source of truth somewhere? For centuries the Britannica encyclopedia was this source of truth, until it was joined by Wikipedia in the recent past. I believe both of them have a genuine cause, albeit different philosophies. But which of the two will emerge as the fact repository on the web? Let us look into their models in detail and also identify a clear premise for comparing them. In this article, I will consider Wikipedia and Britannica in their purest forms, free of any adjustments (including best practices borrowed from each other) either of them has made to its model in the past. This premise places them at the two ends of a spectrum.

I see Wikipedia as built around the concept that knowledge should be made accessible to everyone in this world. There should be no obstacle to a curious mind. The repository is unguarded (well, at least compared to Britannica) and anyone can contribute to this knowledge base, either by creating new content or by editing existing content. The expectation is a responsible user community. The model is certainly hard to digest, but I perceive that it works really well. At the other end, I see Britannica as built around the concept that knowledge is precious and has to be created and closely guarded by subject matter experts. Otherwise there is a possibility of it being diluted by opinions, bias etc. This model requires the seeker to pay a small price, thereby making one earn the knowledge. I understand this too to be a valid paradigm. Until one earns knowledge, he/she is unlikely to appreciate its value.

Apart from the content being free, I believe Wikipedia will always stay ahead of Britannica in terms of content coverage. This is primarily due to two reasons that are inherent to the model: (1) the higher number of authors and (2) the ability to let the truth evolve and settle. For Britannica to always provide accurate facts, it has to be absolutely certain before baselining a fact, and this cannot be achieved without a time delay. This sure sounds like a very important advantage for Wikipedia, but what about its accuracy? Does an open framework mean inaccurate data? Is there a way to compare the two for accuracy? In 2005, Nature (a scientific journal) ran a study that chose 42 articles from each site across multiple topics and had them all reviewed by relevant field experts. According to the study, Wikipedia had 162 errors, while Britannica had 123. That averages out to 2.93 mistakes per article for Britannica and 3.86 for Wikipedia. For Wikipedia, that does not sound bad at all.
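As a quick check on that arithmetic, here is a minimal Python sketch. It assumes only the three figures quoted above (42 articles, 162 and 123 errors); the variable and function names are my own, purely for illustration.

```python
# Figures as quoted above from the 2005 Nature comparison.
ARTICLES_REVIEWED = 42
ERRORS_FOUND = {"Wikipedia": 162, "Britannica": 123}


def errors_per_article(total_errors: int, articles: int) -> float:
    """Average number of errors per reviewed article, rounded to 2 decimals."""
    return round(total_errors / articles, 2)


# Per-article average for each encyclopedia.
averages = {
    name: errors_per_article(count, ARTICLES_REVIEWED)
    for name, count in ERRORS_FOUND.items()
}

# How many Wikipedia errors there were for every Britannica error.
ratio = round(ERRORS_FOUND["Wikipedia"] / ERRORS_FOUND["Britannica"], 2)

for name, avg in averages.items():
    print(f"{name}: {avg} errors per article")
print(f"Wikipedia to Britannica error ratio: {ratio}")
```

Run as written, this gives about 3.86 errors per article for Wikipedia and 2.93 for Britannica, so Wikipedia carried roughly 1.3 times as many errors per article in that sample.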

If Wikipedia can provide a wide range of content free of cost, with accuracy close to that of Britannica, why would anyone still prefer the Britannica encyclopedia? Assuming Britannica maintains its accuracy in the future, I can think of two kinds of users who would subscribe to it. The first kind are locked-in customers: those who are used to Britannica and who research baselined facts in history, science etc. will most likely find it difficult to switch. Students who use the encyclopedia from childhood, especially in their early stages of development, will evolve into these users. Understandably, 85% of Britannica’s revenue is from educational products; the remaining 15% comes from annual subscriptions.

To understand the second kind of user, let me explain another important influencing factor and how I understand it: ‘accountability’. I understand accountability as betting on, and taking responsibility for, a particular event (in this case, fact accuracy) at any particular point in time. In a pure Wikipedia model, which is 100% open, time has no relevance, because any entry can change at any moment. Any attempt to make Wikipedia accountable for the accuracy of its data cannot happen without compromising the freedom that the model provides. I think Wikipedia will continue to be paranoid about how quickly content can be corrected in the event of vandalism, and Britannica will continue to be paranoid about avoiding vandalism in the first place. Of the two, I believe Britannica is better positioned to stay accountable. I call this second kind of user ‘serious seekers’: people who are passionate about facts. These seekers will most likely prefer Britannica, and I believe Britannica could evolve into an authorized source of facts for the topics it carries. Casual seekers, who are simply curious and interested in current affairs, trends or learning new things, will obviously reach for Google and Wikipedia. I think most of us, most of the time, will play the role of a casual seeker.

In summary, the clear advantage of Wikipedia, as I see it, is its accessibility and content coverage, both driven by its open model. For Britannica it is going to be ‘accountability’. This way, I anticipate Wikipedia emerging as the ‘popular source’ and Britannica as the ‘authorized source’. So, coming back to my original question, how do we translate this to crowd sourcing in enterprise IT?

One interesting difference between an encyclopedia and software is that while writing software is all about unleashing one’s imagination (creativity), preparing content for an encyclopedia is all about suspending one’s imagination (discipline). To state a fact, one has to strive to be 100% neutral and as unimaginative as possible. This should not, however, discourage us from applying what we have just learned. How do we map ‘accessibility’ and ‘content coverage’ to crowd sourcing in software? Accessibility in software can be interpreted as access to available talent. Clearly, crowd sourcing in software opens up a ‘never imagined’ talent pool to the solution seeker. Content coverage can be mapped to innovation in software. At the heart of the crowd sourcing model is complete freedom of choice and expression. A problem solver has full freedom to choose the problem that he wishes to solve and how he wishes to solve it. For this model to be sustainable, I believe two things need to hold: (1) the solver should be skilled and passionate and (2) the problem should demand innovation. Compared to traditional outsourcing, I certainly see a better possibility to innovate in crowd sourcing. In software, I would like to interpret ‘accountability’ as ‘timing the solution delivery’ or simply ‘delivery certainty’. As with Wikipedia, time has no relevance in pure crowd sourcing. Any attempt to bring in this certainty cannot happen without compromising the freedom the model offers. As with Britannica, I believe this delivery certainty will continue to be the strength of mainstream outsourcing.

I anticipate that crowd sourcing and traditional outsourcing will serve different use cases in the future. Crowd sourcing in its purest form can be disruptive and can be an appropriate model for fostering innovative solutions for an enterprise, but it will be a challenge to time the outcome. Traditional outsourcing will continue to do well in providing this certainty, but the model will find it difficult to innovate.


