
Data vs.

Last December, I spent a day at The Delphi Group's Corporate Portal Conference at the Claremont Hotel in Berkeley, Calif. While the air outside smelled of eucalyptus, the air inside smelled of early adopters. The small but focused set of attendees was clearly interested in the vendors' stories and demonstrations. And, as is typical when technologies are at an early-adopter stage, there were no clear segment boundaries. Although some vendors, such as Visual Mining Inc., focused on creating portals through structured or numeric information, most vendors focused on adding value in one form or another to so-called unstructured information or text. The most common applications I saw were text engines, collaborative environments, publish-and-subscribe capabilities, and visual interfaces. The term vendors and attendees most frequently used to describe this process of adding value to unstructured information was "knowledge management." (The journal Knowledge Management was also one of the conference's cosponsors, as was Intelligent Enterprise.)

The questions I usually ask vendors include whether they store their data, and if so, how they maintain consistency with data sources; whether they support XML, and if so, whether they can assign their own metatags; how they manage security; how they define personal profiles; and what functionality is accessible through APIs. In addition to these questions, I also tried to discover how the vendors saw themselves fitting in with the so-called data analysis products such as relational databases, multidimensional databases, data mining, data visualization, spreadsheets, and query and reporting tools. The implicit assumption that surfaced was that knowledge management software is geared more toward working with textual documents, and that business intelligence software is more focused on working with numbers; that knowledge is somehow more textually oriented while data is more numerically oriented. Can that be true?
Is the relationship between data and knowledge roughly equivalent to the relationship between numbers and text? If the answer is yes, then I suppose everything is fine just the way it is. But if, as I aim to show, the answer is a resounding no, then a lot of knowledge resides in nontextual form. Therefore, anyone who wants to provide genuine knowledge management will need to find ways to access and manage this larger corpus.

For the remainder of this article, I use many examples to point incrementally toward a working (as opposed to theoretic) definition of knowledge that encompasses substantially more ground than so-called knowledge management software currently covers. It's worth noting that this more complete notion of knowledge needs to interact with our voluminous corporate data streams to support high-quality decision-making within a continuously learning organization, but the details must wait for a future article.

Medium Isn't in the Definition

Consider the following example pairs, intended to show that the distinction between data and knowledge has nothing to do with whether the information is numeric or textual, or even visual. Each example pair shows a representative sample of what would typically be called data in the first (~Data) section, and what would typically be called knowledge in the second (~Knowledge) section. The three example pairs represent numeric, textual, and visual media. (A tilde appears before the terms data and knowledge because I do not pretend to define these terms by way of example. Rather, I want to refute the notion that these terms are proxies for the terms numeric and textual.)

Numeric Media

Numeric ~Data

9/9/99 Sales Cambridge $10,000

This is a typical example of numeric data as it might appear in any query or reporting tool or spreadsheet. The $10,000 is the so-called numeric data, measure, fact, or variable; it represents the amount of sales in Cambridge.
Other examples include just about any fact table in a data warehouse: dollars spent as found in an expense report, budget dollars entered into spreadsheets, entries in the general ledger, and just about all filled cells in multidimensional databases.

Numeric ~Knowledge

Sales(t) = x1 × Sales(t-1) + x2 × Δ(interest rates) + x3 × Δ(prices)

Another, somewhat wordy but still essentially numeric, example would be: Stores with fewer than 10,000 products have top-quintile earnings per product in areas where population density is less than 200 persons per square mile.

Most relationships discovered through data mining or statistics represent numeric knowledge. This knowledge includes everything from sales forecasts, unemployment drivers, and price elasticity
of demand, to market-basket association coefficients, options pricing models, and customer segmentation analyses. Vast amounts of corporate knowledge are in numeric form.

Textual Media

Textual ~Data

Customer Bob Jones had a bad encounter with service representative Charley Johnson at 3:00 p.m. on 9/9/99.

The weather in Cambridge on 9/9/99 was sunny.

It's easy to see how each of these examples could have been portrayed as dimensional grids with a qualitative or text variable in the cells, namely type of encounter in the first example and type of weather in the second. In fact, most qualitative attributes for the dimensions in a data warehouse would qualify as textual data. This includes, for example, package type, product color, name of store manager, customer address, sex, and family status.

Textual ~Knowledge

All of our competitors that are growing faster than us have poorer customer service.

All plants absorb CO2 and emit O2.

Snippets of wisdom, rules of thumb, business procedures, best practices, and customs typically find their expression in textual form. A lot of textual-form corporate knowledge goes up and down the elevators every day. Collecting and sharing it is more of a cultural than a technical challenge.

Visual Media

Visual ~Data

Although the information in Figure 1 is visual, and we typically think of visual information as portraying knowledge and insight, in this case it shows no more than two simple data points.

FIGURE 1 This information is visual, but not knowledge.

Visual ~Knowledge

In Figure 2, the visual information shows relationships between product prices, earnings, and location. This graph is a classic example of visually represented knowledge.

FIGURE 2 A classic example of visually represented knowledge.

These examples show that data and knowledge can come in any medium or form. The term data does not mean numeric, and the term knowledge does not mean textual. So, medium plays no role in distinguishing data from knowledge, but what does?

The concept of generality, abstraction, or inference support seems to play an important role in distinguishing knowledge from data. Single facts or assertions such as "The Paris store sold 100 pairs of shoes yesterday" have very low inference support, and thus their usefulness for decision-making is very limited. Given this one sales figure, for example, about the only thing you could determine is how many shoes were sold on that day in Paris. Regardless of the medium of expression, single facts or assertions with low inference support seem to function in the context of decision-making as what we typically call data. In contrast, statements of the type "Women's shoes are selling much faster than expected in Europe" are far more general, applying to many situations. They could thus enable all sorts of stocking and drill-down decisions for those stores where women's shoes are not selling well.

General facts or assertions with high inference support seem to function in the context of decision-making as what we typically call knowledge. Certainly, all of the preceding knowledge examples are more general in their scope than the data examples. So degree of generality does appear to be a factor in distinguishing data from knowledge.

Is that true? Is degree of generality a defining factor for, or just a typical attribute of, knowledge? Consider the following two examples.

Imagine you are the CFO of a large corporation and were the sole witness as your corporation's president was abducted by alien beings, who then transformed one of their own into the president's image. You now know that the president is an evil alien! This piece of information is very specific; it has low inferencing capability, just like data. But it is very important. This one simple fact, no more general than "Bob Jones is a loyal customer," would, if believed, generate an entire sequence of actions: secret meetings with the executive staff and certain government agencies, possibly an assassination attempt, and certainly a search for the real president. By any normal use of the term knowledge, the fact that the president is an evil alien would be considered vital, strategic knowledge and would deserve a place in your knowledge management system (albeit with very tight access privileges). So a statement doesn't have to have high inferencing capability to be knowledge.

Now imagine you are VP of marketing for an automobile company and you learn that "Insects don't buy your products." This statement has extremely high inferencing capability. For every insect you might encounter, you could infer that that insect does not buy your cars. By sheer numbers, this statement applies to a far greater percentage of the world than any statement applicable to potential human customers. But it is fairly irrelevant because it would not apply to any decisions you are likely to make.
I doubt you would store and manage this piece of information in your knowledge-management application. In practical terms, you would not want to expend energy collecting, storing, verifying, or managing it for decision-making purposes.

These two examples show that generality is neither a necessary nor a sufficient condition for considering a piece of information to be knowledge. I'm not trying to suggest that we load our knowledge bases with masses of specific facts or that high inference-supporting assertions aren't extremely useful. Rather, pragmatically, knowledge comprises that body of assertions, both specific and general, that we believe to be true and that support the decisions we need to make. So the difference between data and knowledge is not a matter of media or even abstraction. The difference is functional. Consider the following examples of knowledge used for decision-making.

A set of rules for calculating the likelihood that a potential borrower defaults on a loan, which may have been generated with a data-mining package, is relevant knowledge for deciding whether or not to loan someone money. In practice, data about a person would be submitted and, through use of loan knowledge, a decision would be reached regarding whether or not to loan the money.

A set of formulas for calculating the earnings a product generates, which may have been created in an OLAP environment, is relevant knowledge for making decisions about whether to increase or decrease production of that product.

A set of procedures for displaying packages in stores, which may have been created in a marketing document, is relevant knowledge for making decisions about where to display products or even which stores to work with.
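The loan example above can be made concrete with a short sketch. It is only an illustration of the functional split between data (facts about one applicant), knowledge (rules believed to be true and relevant), and the decision they support. The field names, thresholds, and function names are all invented for this sketch and do not come from any real data-mining package or lending policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Applicant:
    """Data: specific facts about one potential borrower (hypothetical fields)."""
    annual_income: float
    existing_debt: float
    missed_payments: int


def default_risk(applicant: Applicant) -> str:
    """Knowledge: illustrative rules of the kind a data-mining package might produce.

    The thresholds below are assumptions made up for this example.
    """
    if applicant.annual_income <= 0:
        return "high"  # no income to service a loan; also avoids dividing by zero
    debt_ratio = applicant.existing_debt / applicant.annual_income
    if applicant.missed_payments >= 3 or debt_ratio > 0.6:
        return "high"
    if applicant.missed_payments >= 1 or debt_ratio > 0.3:
        return "medium"
    return "low"


def approve_loan(applicant: Applicant) -> bool:
    """Decision: the result of applying the knowledge to the data."""
    return default_risk(applicant) != "high"
```

Notice that neither piece is distinguished by its medium: the applicant record and the rules are both partly numeric. What makes the rules knowledge here is the role they play in reaching the decision.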
Knowledge is embedded in many different places within today's corporate information systems: formulas in OLAP systems, triggers in relational databases, visual patterns in GIS systems, predictive models in data mining packages, classification rules and personal profiles in text engine-based applications, business procedures in rules-automation systems, and drivers in decision-analysis tools. Real knowledge management can occur only when all this knowledge can be entered, accessed, verified, edited, and used through a single interface.