The Golden Hammer in Software Technology

The law of the hammer goes like this – if all you have is a hammer, everything looks like a nail. The concept is used to symbolize over-reliance on a familiar tool; the hammer here is a metaphor for one's habit of using a single tool for all purposes. As a computer science engineer, I have come across one specific pain point during my interactions with many professionals in my field: the resistance many of us have to moving beyond the SQL paradigm when looking for a solution. In my opinion, SQL is the perceived golden hammer of software technology. Almost any technology that has attempted to offer something beyond what SQL offers has needed the blessing of these SQL programmers (explained in the next paragraph) to gain adoption. A SQL flavor has become inevitable. One such technology facing this challenge is Hadoop, which has also tried its bit to lure the SQL community. This blog is my attempt to reason through this resistance.

When RDBMS rose to stardom in the 1980s, SQL offered a much-needed convenience to the programmers of the day through its declarative style, abstracting the underlying implementation and allowing the developer to focus on the 'what' aspect. With the advent of SQL, developers no longer needed to write long programs detailing the 'how' part (imperative style) to retrieve data elements. Note that SQL in this article refers only to its non-procedural aspect, which is the widely used part, unless explicitly stated otherwise. SQL did cover a lot of use cases through its limited set of functions, and the world (especially the enterprises) was satisfied for the most part. Over a period of time, this convenience has resulted in a fairly large community of pure-play SQL programmers who no longer write algorithms, i.e. do imperative-style programming, but expect to solve everything through SQL. In the Hadoop paradigm, Hive was an explicit attempt to lure such SQL programmers. Aster Data introduced SQL-MR (SQL-MapReduce) to penetrate this market. I recently stumbled upon a white paper on Oracle's in-database Hadoop capabilities. All of these exploit the procedural capabilities of SQL to position Hadoop, which again is imperative in style.
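To make the contrast concrete, here is a minimal sketch (the table, column and variable names are invented purely for illustration) that computes the same aggregate both ways – declaratively, by telling SQL what we want, and imperatively, by spelling out how to compute it:

```python
import sqlite3

# A toy table of orders: (customer, amount). All names here are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("alice", 30.0), ("bob", 20.0), ("alice", 50.0)])

# Declarative style: state *what* is wanted; the engine decides *how* to get it.
declarative = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

# Imperative style: spell out *how* to compute the same result, step by step.
totals = {}
for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
    totals[customer] = totals.get(customer, 0.0) + amount
imperative = sorted(totals.items())

print(declarative)  # [('alice', 80.0), ('bob', 20.0)]
print(imperative)   # [('alice', 80.0), ('bob', 20.0)]
```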

Fundamentally, I see Hadoop/MapReduce as bringing us back to basics, wherein we use a procedural style to implement the intelligence that we wish to infer from the underlying data. Hadoop comes with a pre-built distributed infrastructure, where our algorithm can be applied to large volumes of data. With analytics being the primary use case for Hadoop, it is intuitive that if we were to apply our own intelligence, we need to be able to define the algorithm, i.e. the implementation, as well. The moment that 'intelligence' becomes available as a pre-built SQL function, we have already commoditized it, and one is forced to write a different, more innovative algorithm all over again. So 'imperative-style' programming is fundamental to analytics. Some technology consultants tend to project Hadoop as a complex paradigm compared to SQL and propose that Hadoop converge with SQL to gain adoption. The proposal is flawed for the fundamental reason that SQL is 'declarative' and Hadoop is 'imperative'. Declarative style is only a high-level abstraction and cannot exist without an imperative-style implementation underneath. We will face this resistance until we embrace 'imperative-style' programming completely.
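To illustrate what 'imperative' means here, below is a minimal, single-process sketch of the MapReduce programming model (real Hadoop jobs are typically written in Java against the Hadoop API and run over a distributed file system; the function and variable names here are mine, purely for illustration). The point is that the programmer supplies the map and reduce logic as explicit procedures rather than picking from a fixed menu of built-in functions:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(document):
    """The 'map' the programmer writes: emit (word, 1) for every word."""
    for word in document.split():
        yield (word.lower(), 1)

def reduce_phase(word, counts):
    """The 'reduce' the programmer writes: aggregate the values for one key."""
    return (word, sum(counts))

documents = ["SQL is declarative", "MapReduce is imperative", "Hadoop runs MapReduce"]

# Shuffle/sort step: group intermediate pairs by key (the framework does this in Hadoop).
pairs = sorted(kv for doc in documents for kv in map_phase(doc))
result = [reduce_phase(word, [count for _, count in group])
          for word, group in groupby(pairs, key=itemgetter(0))]

print(result)
# [('declarative', 1), ('hadoop', 1), ('imperative', 1), ('is', 2),
#  ('mapreduce', 2), ('runs', 1), ('sql', 1)]
```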

If Money: Swiss Bank then Data: ?

While my last blog on Internet privacy was a satisfying effort, the groundwork exposed me to some disturbing details around data collection and the contracts that I have signed up for. With Google's unified privacy policy, nowadays I access either Google search or Gmail from a given device, not both. I have this feeling that whatever I type and click (be it an MS Word document or a PowerPoint) might appear in my friend's news feed, and I think Amazon knows what coffee I'm going to buy tomorrow. If 'information is wealth', and data as mundane as my computer's crash reports can be of interest to someone, maybe 'data is money' too. In my experience, every online vendor has wanted some part of my identity, i.e. some data, to do business with. In the last few weeks, I was looking for a vendor on the Internet who simply does business and doesn't give a damn about one's identity. Someone who can allow me to experience privacy in its literal sense, maybe the way a 'Swiss bank' treats it? Much to my surprise, I did find one such vendor, and it is understandably controversial too.

Switzerland has inspired me through its armed neutrality policy. By staying neutral during both the world wars, Switzerland has in a way earned the world's trust. Switzerland is also famed for its banking secrecy (Singapore is a close contender these days). Switzerland's banking laws are built on the premise that privacy is one's fundamental right, and it is a criminal offence to reveal one's private data. Historically, and especially in the recent past, Switzerland has been challenged over its banking secrecy and has yielded to international pressure on certain occasions too. Still, most of its banking secrecy laws persist. By weathering many such storms, Swiss banks have definitely earned their customers' trust in keeping their information private. Maybe that is a reason why a third of the world's offshore funds are estimated to be held in Swiss banks.

If I imagine the Internet as a medium of expression, I would map the 'Swiss bank' experience to the ability to stay totally anonymous on the web. To allow such levels of anonymity, like privacy in Switzerland, 'freedom of speech' should be recognized and honored as a fundamental right. Sweden strongly believes in this. Sweden was a pioneer in officially abolishing censorship, and more importantly, Sweden's constitution grants the right to total anonymity in expressing one's views unless it is perceived as hate speech. While it is understandably difficult to draw the line, Sweden continues to strive to draw (or not draw) it. Interestingly, Sweden has many parallels with Switzerland. Like Switzerland, Sweden is famed for its armed neutrality; it too stayed neutral in both the world wars. Sweden also offers a high quality of life and was ranked the second most competitive nation (next to Switzerland) by the World Economic Forum in 2010.

Sweden's belief in 'freedom of speech' makes it less difficult for organizations like Wikileaks to host their websites there. I think freedom of speech, along with Sweden's location (close to the Arctic), makes it an attractive choice for green data centers of the future (and present) as well. One specific vendor caught my attention, and that is PRQ. PRQ is a web hosting company that claims to have a spotless track record in hosting some of the most controversial websites in the world. It looks like PRQ needs no details about the customer; all they need is a valid email address to receive an invoice. The disclaimer on their website is simply mind-blowing.

I think if Money: Swiss Bank then Data: Swedish Data Center.

Internet Privacy: Your anonymous signature on the web

Webster's dictionary defines privacy as the quality or state of being apart from company or observation. 'Online privacy' has been a topic of concern and debate for a while. Richard Stallman warned us long ago that Internet users need to swim against the tide if they are to protect their privacy. Recently, Steve Wozniak predicted that cloud computing could cause horrible problems for us in the next five years. Above all, privacy expert Steve Rambam has declared that privacy is actually dead and it's time we got over it. While none of this is good news for the consumer, technology giants (like Apple, Amazon, Google, Facebook, Microsoft) on the other end claim to take privacy pretty seriously. With all of them showing keenness to crack the human mind, I'm convinced it is fundamentally not possible for any of them to allow the consumer to experience 'privacy' as defined by Webster's. So, what's the privacy that they are talking about? As I looked into their privacy policies, it became evident that, at best, these policies explain clearly how one's privacy (going strictly by Webster's definition) can be compromised when their products are used. My attempt in this blog is to understand their policies and identify the one that intrudes the least into the user's privacy.

Google

Google has made its intentions clear through its privacy policy: for its products to provide a compelling user experience that is personalized and relevant, it needs to collect every possible piece of information about customer behavior that is out there. The information includes details about the consumer's device (hardware, OS, crash events etc.), the customer's footsteps on its sites (click stream, IP, cookies), and the customer's location details (GPS, nearby Wi-Fi access points, cell towers). Google apparently strives to get the most out of cookies, anonymous identifiers and pixel tags to know the customer's mind. While Google's ambition may be to decipher the human mind, it is feared for the enormity of the data it holds about the customer. Maybe it has enough data to identify the customer through his/her mind.

Some of the controversies in the past clearly reflect this sentiment. The incident in which Google inadvertently collected Wi-Fi payload data from public Wi-Fi spots drew intense criticism. Its Street View cameras are perceived as too invasive of our homes. Google's recent move to unify its privacy policies across products has been received with skepticism.

Amidst all these controversies, one thing that I see as positive about Google is its data liberation initiative, which is about allowing users to liberate themselves, along with their data, from Google when they decide to stop using its services.

Apple

Apple collects as much information as Google does when the consumer is on its terrain (the Apple website, iCloud, the App Store etc.). In fact, in limited cases, Apple also collects the consumer's SSN. While Google's message to consumers sounds like 'We would like to know everything about you to delight you', Apple's message is more like 'Look, we need all this data about you to delight you. You had better be aware of this.' Apple has always maintained that it takes privacy seriously. Apple's Safari browser blocks third-party cookies by default, and its opt-out channels are relatively straightforward and accessible.

Still, Apple is not free of controversies. Notable instances include last year's location-logging controversy, wherein iOS was found to be logging one's location details and backing them up; Apple acknowledged this to be a bug. A more recent one is about exposing the unique device identifier (UDID) of an iPhone/iPad through one of its APIs; Apple has responded by deprecating this API as well. Apple was also criticized for allowing mobile apps (like Path) to access one's personal contacts without explicit consent, contacts that could eventually land on external servers. Apple responded by requiring explicit consent. Above all this, I imagine the biggest source of personal data for Apple is its flagship app Siri. While Apple has made it clear that Siri's data will stay only with Siri, that sure is a lot of personal data, including our voice.

Amazon

With its very minimal retail (store) presence, Amazon relies heavily on data for its marketing needs. I believe Amazon crunches enormous amounts of data to understand customer behavior and target customers with personalized product recommendations. Amazon had been pretty effective in handling data, with no major controversies until last year. The only controversy (on privacy) that I can recall about Amazon is with its 'Silk' browser, which is packaged as part of its tablets. A default option in Silk routes every HTTP request through Amazon's cloud infrastructure. While Amazon promises a better, faster experience, this also allows it to capture the user's entire web history. This does not apply to SSL connections, and the user can turn this feature off as well.

Facebook

While the companies mentioned above are feared for what they know about you, Facebook is primarily feared for what it can reveal about you. While its mission is to make the world open and connected, its users have certainly found it difficult to digest that and keep pace with the change. There have been several controversies in the past where Facebook has reimagined privacy. The Beacon feature is an acknowledged misstep. News Feed was another controversial product in the beginning but proved successful over time. Facebook is currently pushing the envelope through its 'frictionless sharing' applications, where nothing is private after one signs up. As usual, it is both loved and hated at the same time.

Microsoft

While there have been numerous controversies around Microsoft's control over the platform, when it comes to privacy, Microsoft comes out pretty clean. I find its privacy policy to be the friendliest of the lot. While Microsoft is equally interested in data about consumer behavior, its privacy policy (for Bing) explicitly calls out that personally identifiable information will not be correlated with behavioral data. Microsoft has demonstrated its 'privacy by design' philosophy on multiple occasions. While Siri transmits the voice to a remote server, Microsoft's Kinect keeps the biometric data locally. While everyone else assumes opt-in by default (on many occasions), Microsoft keeps 'opt-out' as the default. Microsoft's recent heroics in privacy involve insisting on 'Do Not Track' as the default setting in IE10. In my opinion, Microsoft emerges as a clear winner when it comes to privacy.

Free Software: When man ignores a walled garden for a forest

Please read him as him/her. Man here refers to mankind/humanity in general and includes everyone on this planet.

In 2008, when most of the world was excited about cloud computing, he called it worse than stupidity. When almost the entire world mourned the death of Steve Jobs in 2011, he said he was glad that he was gone. Although a computer programmer and a techie himself, he advocates paper voting over machine voting. He does not recommend mobile phones. He is against software patents and Digital Rights Management, prompting Microsoft CEO Steve Ballmer to call the license he authored 'a cancer that attaches itself in an intellectual property sense to everything it touches'. He is Richard M. Stallman (rms going forward), founder of the Free Software Foundation and author of the GNU General Public License. I have perceived rms to be a stubborn personality, one whose views (when implemented) could cause a lot of inconvenience to the majority. This blog is my curious attempt to get into his mind and imagine his point of view.

‘I prefer the tumult of liberty to the quiet of servitude’ – Thomas Jefferson

Man likes to indulge and, more importantly, exert his capabilities. Products and services emerge to enable him to achieve both of these with minimal effort. Man doesn't really care about the 'how' part of these as long as they don't infringe on his very ability to exert. When these products and services become prevalent and rigid, and when man begins to feel constrained by his lack of choice, a force emerges to remind mankind of his fundamental rights and basic freedom. Free software is one such force in the field of computing. The motivation for free software arose in a similar context during the seventies, when there was a considerable rise in proprietary software, with companies releasing software in binary/executable formats, making it impossible for the end user to change or modify it on their computers. In 1980, copyright was extended to software programs. In 1983, rms announced his frustration and the GNU project was born. The goal of the GNU project was to develop a sufficient body of free software to get along without any software that is not free.

At the heart of free software is the GNU General Public License (GNU GPL) authored by rms, and we will focus on that. In the most fundamental sense, free software was and is never about money. As rms explains, it is about liberty and not price. The word 'free' should be thought of as in 'free speech', not 'free beer'. A software program that qualifies under this paradigm offers four essential freedoms to its users.

  • The freedom to run the program, for any purpose (freedom 0).
  • The freedom to study how the program works, and change it so it does your computing as you wish (freedom 1). Access to the source code is a precondition for this.
  • The freedom to redistribute copies so you can help your neighbor (freedom 2).
  • The freedom to distribute copies of your modified versions to others (freedom 3). By doing this you can give the whole community a chance to benefit from your changes. Access to the source code is a precondition for this.

I consider a river to be a good metaphor for understanding the philosophy of free software. The flow of a river symbolizes freedom. The source of the river is analogous to software written from scratch by a programmer who believes in the above four freedoms and opts to distribute it (free of cost or for a fee) under the GNU GPL. Now anyone who is interested in this water can consume it at his own risk, i.e. under the GNU GPL, he can receive a copy of the software in full and do anything with it. This includes using the software for a commercial purpose, either as such or by modifying or tinkering with it to suit his purpose. The risk is analogous to the lack of warranty: the GNU GPL offers no warranty on the received software. The consumer is not obliged to share the software with anyone, even if he has received it free of cost. It is only when the consumer decides to become part of the river, i.e. when he opts to distribute the software further (as such or in modified form), that the license expects him to distribute it again in full. This is called 'copyleft', a very powerful licensing strategy that ensures the freedom does not get diluted downstream. No further restrictions can be added to this software that are in violation of the GNU GPL. The license is built around this very important paradigm, and this is where some of its challenges lie. If there is a conflict of any sort, the license expects the consumer not to distribute the software at all. It is either all or none, with no half-truths whatsoever.

So how does free software differ from open source? I'm sure many of us (including me) think they are synonymous, whereas they are not. A well-thought-out article on this topic, written by rms himself, can be accessed here. As might be evident from that article, the philosophy of the GPL and free software is very fundamental. It is not just about keeping the source code of a piece of software open. While that sure has practical benefits, the philosophy is much deeper. In a way, it is about asserting what is important for humanity (which is freedom) as well. Any movement that asserts something righteous and fundamental is most likely to be one that inconveniences the most. There is a possibility that the very people who were the intended beneficiaries could reject such a movement. I imagine open source to be an initiative (splintered off from free software in 1998) that intended to address this problem. Open source attempts to get the practical benefits out to the user first, before educating him on what is right for him in the first place. While free software is a social movement aimed at respecting (and even reminding the user of) user freedom, open source is a development methodology that asserts that the only way software can be made better is by keeping it open and accessible. Interestingly, rms calls them 'free software activists' and 'open source enthusiasts' in his article.

Now I will cite some very specific contexts that have challenged the GPL in the past and show how the GPL has evolved. I believe that in all these contexts, rms has stood firm without compromising on the core philosophy. The first context – what if one has a restriction imposed that prevents him from distributing GPL-covered software in a way that respects other users' freedom (for example, if a legal ruling states that he or she can only distribute the software in binary form)? In such a setting, rms asserts that the software not be distributed at all. This was the most important change in version 2 of the GPL in 1991. Rms calls it 'liberty or death'.

What if the software is open but the underlying hardware is restrictive? This was addressed as part of GPLv3 (tivoization), and the GPL does not permit GPL-covered software to be installed on such restrictive hardware. Again, free software considers laws around DRM (Digital Rights Management, or Digital Restrictions Management) restrictive. Here again rms has stood firm, while some prominent open source proponents (like Linus Torvalds) have a differing point of view. Linus's view is that software licenses should control only software.

What about patents? The GPL addressed this in version 3 as well. The free software philosophy does not encourage patents, especially for software on general-purpose computers. If someone has a patent claim on software covered under the GPL which he wishes to distribute, the GPL prohibits him from imposing a license fee, royalty, or other charge (specific to the patent claim) for the exercise of the rights granted under the license.

While rms's views (on software) may not align with those of the vast majority and could also be inconvenient, when listened to, the philosophy cannot be proved incorrect. The problem with pure indulgence in general is that it momentarily clouds one's judgment. Like any other thought, freedom, along with one's basic rights, needs to be remembered and exercised, else it might just go away. In line with the famous quote – 'the price of freedom is eternal vigilance' – I do see vigilance in every aspect of GPL revision. I think free software is a very important force and rms deserves to be more than listened to.

Patent Wars: Guarding the Open secret

Please read him as him/her.

Not long ago, I stumbled upon this URL that showed who is suing whom in the technology industry. To see some of those companies, whose products we all love, involved in such legal brawls was sure not an inspiring thing for me. Last month, in his interview with All Things Digital, Apple CEO Tim Cook summed up the whole scenario as a pain in the ass. In this blog I have attempted to get a handle on the current 'patent situation' in the tech industry (mostly around the mobile space) and imagine why it is a difficult problem to solve.

Let us start with the basics. Why a patent? The intuition behind a patent is to provide a commercial incentive to a creator who is willing to disclose his idea in complete detail. By making his idea public, the inventor creates the possibility of new ideas being spawned (from the original), and the commercial incentive he gets in return is a legal monopoly to monetize his idea in the market for a limited period of time. How does this work? For an idea to be patented, it first needs to be patentable (novel and non-obvious, to start with); inventors apply at a patent office (the USPTO, for instance), which subsequently verifies and approves the application. Broadly, a patent application has two components – the claims and the specification. The 'claims' part defines the scope of the invention, and the 'specification' part describes the technique involved in meeting that scope. Each claim should strive to be specific; it can be broad but not generic. When a particular 'claim' is implemented in full in another product, using any technique, without the consent of the patent holder, it can be called an infringement. The catch here is that for an infringement to occur, at least one claim in the patent needs to be implemented in full. This is precisely where it gets difficult to prove an infringement, and this will become clear in the next few paragraphs.

Any new invention (product or service) is most likely to be a unique combination of some existing features (features that are already present in this world) and new features. Each feature is equivalent to a 'claim' in some patent. For this invention to be free of any infringement, for each existing feature there should be consensus in some form with the original patent holder for including that feature as part of the current invention. With this knowledge, let us look at the various scenarios being alleged out there.

Here is a classic infringement allegation – a novel patented feature found in another product without the consent of the creator. The novel feature that I will quote here is the 'data-tapping' feature (FOSS Patents) that we find in smartphones (the iPhone). This is an invention that marks up addresses or phone numbers in an unstructured document like an email, to help users bring up relevant applications, such as maps or dialer apps, which can process such data. Apple filed this complaint against HTC (Android), and the ITC (International Trade Commission) ruled in Apple's favor in December 2011. HTC had to either drop this popular feature or find a workaround, and HTC chose the latter.
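Just to make the feature tangible (this is emphatically not the patented technique, merely a naive sketch of the concept, with names of my own choosing): a 'data detector' scans free-form text for actionable patterns such as phone numbers so that the UI can offer a relevant application.

```python
import re

# Naive illustration of the 'data-tapping' idea: find phone-number-like strings in
# unstructured text so a UI could offer to dial them. Not Apple's implementation.
PHONE_PATTERN = re.compile(r"\+?\d[\d\-\s]{7,}\d")

def detect_actionable_data(text):
    """Return (matched text, suggested action) pairs found in free-form text."""
    return [(m.group(), "open dialer") for m in PHONE_PATTERN.finditer(text)]

email_body = "Call me at +1 415-555-0175 when the shipment arrives."
print(detect_actionable_data(email_body))
# [('+1 415-555-0175', 'open dialer')]
```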

In the US, the legal monopoly offered for a design patent is 14 years, and for a utility patent it is 20 years. What if a feature/claim becomes so common and quickly evolves into a standard, acquiring enough mindshare that it becomes part of the infrastructure? These features become 'standards essential', and once this state is reached, the creators pledge these inventions under FRAND (Fair, Reasonable And Non-Discriminatory) terms with a license fee. Again, the catch here is 'fair', which is very subjective and plays a crucial role when one's direct competitors own these patents. Kodak has alleged patent infringement against Apple and HTC relating to its patents on digital imaging technology. Another instance is Nokia alleging that Apple infringed its patents on 3G and GSM, which Apple finally settled.

An interesting combination of the above two occurs when two inventions allegedly overlap. A classic example would be some of the allegations between Samsung and Apple. Apple alleged that certain Samsung smartphones have the same 'look and feel' (called trade dress in the IP world) as its own products. In another context, Samsung alleged that Apple infringed its W-CDMA (standards-essential) patents. Cross-licensing each other's patents is one way these conflicts could be resolved.

A defensive patent strategy is where companies go out and acquire patents to protect themselves from possible lawsuits by other companies. A famous example would be when Google announced its plan to buy Motorola Mobility in 2011 to protect its Android platform from its competitors (mainly Apple). An aggressive variation of this is the patent troll, which goes out and buys patents with the intention of suing others.

'Patent pending' is another clever strategy, wherein the creator releases his product after filing a patent that has not yet been issued. He observes the competition's response and refines the patent to trap the competitor into infringement.

How does one defend?

The alleged infringer would typically defend his stance by trying to prove the patent invalid. One way to prove invalidity is to claim that the invention was indeed obvious at that point in time (when the patent was issued); the other is to cite a prior publication or prior art invalidating the novelty of the issued patent.

So, coming back to my original question – why is this a difficult problem to solve? I'll try answering this by splitting it into two sub-questions – (1) why is infringement inevitable? and (2) why is it difficult to spot an infringement? The answer to the first question lies in a fundamental difficulty the human mind has in dispossessing itself of a good idea after experiencing it. In this information age, innovating and differentiating has become mandatory for any business to survive. Companies are going to find it challenging to ignore the innovation happening around them and may end up indulging in creative workarounds and alternates. The answer to the second question lies in another fundamental difficulty: anyone other than the alleged creator will never know for sure if the creation was inspired or copied. Inspiration is important for the very advancement of science, while copying may not be. This necessitates that any patent framework we attempt to build be designed to assume 'inspiration' when in doubt. Doing it the other way would defeat the whole purpose of the framework (remember, the intent of a patent framework is also to make progress).

In a competitive environment (the tech industry, for example) where the stakes are high, patenting one's invention (detailing out the idea along with the 'how' part) is actually a risk. Every idea needs some 'engineering magic' to materialize into a unique user experience, and if that 'engineering magic' is hard to guess, most companies would retain the 'how' part as a trade secret without patenting it (Coca-Cola is a classic example). It is only when the 'engineering magic' is guessable that companies opt for a patent to prevent someone from copying it, and I think 'visual' user experiences are relatively guessable. While the intent of the patent framework was to promote the progress of science and useful arts, its current use case, in my opinion, is completely different. I anticipate the patent framework will continue to be used as a defensive weapon at best, used to intimidate and distract the alleged infringer and delay his progress.


Wikipedia or Britannica? The case for crowd sourcing in Enterprise IT

Please read him as him/her.

Britannica recently announced the closure of its 244-year-old print version, and my first response was that this is a clear indication of the start of the endgame between Wikipedia and Britannica. However, I was surprised to read a message from Jorge Cauz (President of Britannica) in the same article. He exuded confidence and stated that he would bet a lot of money that most people would rather use Britannica than Wikipedia in the future. This got me curious. I've never felt the need for the Britannica encyclopedia and have always relied on Wikipedia to learn anything new. When almost everything is available on Wikipedia for free, why would one want to go elsewhere and pay for it? What exactly is the case for Britannica? The other, bigger question that has kept me disturbed in the recent past is the case for crowdsourcing (companies like InnoCentive, TopCoder, crowdANALYTIX etc.) in enterprise IT. And I can imagine no better success story than Wikipedia when it comes to crowdsourcing. In this blog I've tried to answer my own question by attempting to imagine Cauz's reasoning.

Who will own the master data on the web?

We are in the age of information explosion. Computing will get pervasive and digital data is going to get overwhelming. Information will be available through various channels, and all this is going to make fact-finding a challenge. Facts are going to be diluted by one's imagination in the form of bias, opinions and individual ideologies. Truth will continue to matter to curious minds, and the search for it is sure to get increasingly difficult. Where does one find the truth then? I'll try to answer that through an even more fundamental question. How does an individual experience truth in a subject? Every individual is likely to have his trusted network (friends, respected authors, professionals like lawyers and doctors) to get access to facts in the respective fields like medicine, law or politics, and he is most likely to believe what they say as truth. But what about subjects or topics that are completely alien to his network too? Or when he wishes to experience truth himself? Don't we deserve a single source of truth somewhere? For years and centuries the Britannica encyclopedia was this source of truth, until it was joined by Wikipedia in the recent past. I believe both of them have a genuine cause, albeit different philosophies. But of these two, which will emerge as the fact repository of the web? Let us look into their models in detail and also identify a clear premise to compare them. In this article, I will consider Wikipedia and Britannica in their purest forms, free of any adjustments (including incorporating best practices from each other) either of them may have made to their models in the past. This premise places them at two ends of a spectrum.

I see Wikipedia as built around the concept that knowledge should be made accessible to everyone in this world. There should be no obstacle to a curious mind. The repository is largely unguarded (well, at least compared to Britannica), and anyone can contribute to this knowledge base either by creating new content or by editing existing content. The expectation is a responsible user community. The model is certainly hard to digest, but I perceive that it works really well. On the other end, I see Britannica as built around the concept that knowledge is precious and has to be created and closely guarded by subject matter experts; otherwise, there is a possibility of it getting diluted into opinions, bias etc. This model necessitates that a small price be paid by the seeker, thereby making one earn the knowledge. I understand this too to be a valid paradigm. Until one earns knowledge, one is most likely not to appreciate its value.

Apart from the content being free, I believe Wikipedia will always stay ahead of Britannica in terms of content coverage. This is primarily due to two reasons inherent to the model – (1) the higher number of authors and (2) the ability to allow the truth to evolve and settle. For Britannica to always provide us accurate facts, it has to be absolutely certain before baselining a fact, and this cannot happen without a time delay. This sure sounds like a very important advantage for Wikipedia, but what about its accuracy? Does an open framework mean inaccurate data? Is there a way to compare the two for accuracy? In 2005, Nature (a scientific journal) did a study by choosing 42 articles from both sites across multiple topics and having them all reviewed by relevant field experts. As per the article, Wikipedia had 162 errors in total, while Britannica had 123. That averages out to 2.92 mistakes per article for Britannica and 3.86 for Wikipedia. For Wikipedia, that does not sound bad at all.
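Just to make the arithmetic explicit: the study sampled 42 articles from each encyclopedia, so the per-article averages are simply the error counts divided by 42 (123/42 actually rounds to 2.93, which the cited article reports as 2.92).

```python
# Per-article error averages derived from the raw counts quoted above.
articles = 42
britannica_errors, wikipedia_errors = 123, 162
print(round(britannica_errors / articles, 2))  # 2.93 (reported as 2.92 in the article)
print(round(wikipedia_errors / articles, 2))   # 3.86
```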

If Wikipedia can provide a wide range of content free of cost, with accuracy close to that of Britannica, why would anyone still prefer the Britannica encyclopedia? Assuming Britannica maintains its precision even in the future, I can think of two kinds of users who would subscribe to Britannica. The first are locked-in customers (ones used to Britannica) who research baselined facts like history, science etc.; they will most likely find it difficult to switch. Students, especially those in their early stages of development who use the encyclopedia from childhood, will evolve into these users. Understandably, 85% of Britannica's revenue is from educational products; Britannica earns the remaining 15% of its revenue from annual subscriptions. To understand the second set of users, let me explain another important influencing factor and how I understand it – 'accountability'. I understand accountability as betting on and taking responsibility for a particular event (in this case, fact accuracy) at any particular point in time. In a true Wikipedia model, which is 100% open, time has no relevance. Any attempt to bring accountability for the accuracy of Wikipedia's data cannot happen without compromising on the freedom the model provides. I think Wikipedia will continue to be paranoid about how quickly content can be corrected in the event of vandalism, and Britannica will continue to be paranoid about avoiding vandalism in the first place. Of the two, I believe Britannica is better positioned to stay accountable. I call this second kind of user the 'serious seeker', one who is passionate about facts. These seekers will most likely prefer Britannica, and I believe Britannica could evolve into an authorized source of facts for the topics it carries. Casual seekers, those who are simply curious and interested in current affairs, trends, or simply learning new things, will obviously reach out for Google and Wikipedia. I think most of us, most of the time, play the role of a casual seeker.

In summary, the clear advantage of Wikipedia, as I see it, is its accessibility and content coverage, both driven by its open model. For Britannica it is going to be 'accountability'. This way, I anticipate Wikipedia emerging as the 'popular source' and Britannica as the 'authorized source'. So, coming back to my original question, how do we translate this to crowdsourcing in enterprise IT?

One interesting difference between an encyclopedia and software is that while software writing is all about unleashing one's imagination (creativity), content preparation for an encyclopedia is all about suspending one's imagination (discipline). To state a fact, one has to strive to be 100% neutral and be as unimaginative as possible. This should, however, not discourage us from leveraging what we just learned. How do we map 'accessibility' and 'content coverage' to crowdsourcing in software? Accessibility in software can be interpreted as access to available talent; clearly, crowdsourcing in software opens up a 'never imagined' talent pool to the solution seeker. Content coverage can be mapped to innovation in software. At the heart of the crowdsourcing model is complete freedom of choice and expression: a problem solver has full freedom to choose the problem he wishes to solve and how he wishes to solve it. For this model to be sustainable, I believe two things need to hold good: (1) the solver should be skilled and passionate and (2) the problem should demand innovation. Compared to traditional outsourcing, I certainly see a better possibility to innovate in crowdsourcing. In software, I would like to interpret 'accountability' as 'timing the solution delivery', or simply 'delivery certainty'. As with Wikipedia, time has no relevance in pure crowdsourcing; any attempt to bring this certainty cannot happen without compromising on the freedom it offers. Like Britannica, I believe this (delivery certainty) will continue to be the strength of mainstream outsourcing.

I anticipate crowdsourcing and traditional outsourcing serving different use cases in the future. Crowdsourcing in its purest form can be disruptive and is an appropriate model to foster innovative solutions for an enterprise, but it will be a challenge to time the outcome. Traditional outsourcing will continue to do well in providing this certainty, but the model will find it difficult to innovate.

References:

1 – http://money.cnn.com/2012/03/13/technology/encyclopedia-britannica-books/index.htm

2 – http://news.cnet.com/2100-1038_3-5997332.html