Journal of Knowledge Management Practice, December 2001

Content Management Makes Sense - Part 1

Delivering Increased Business Value Through Semantic Content Management

Michaël Auffret, Profium

ABSTRACT:

Organizations across the world are facing a new challenge: to manage efficiently the growing volume of static and dynamic information accumulated within the enterprise. If managed successfully, large volumes of data need not be a burden. Indeed, significant benefits can be drawn from a new generation of content management solutions that leverage semantics (the study of meaning) to improve the way information is used.

By using such technology, organizations working with large volumes of data will soon realize dramatic cost reductions, revenue improvements and opportunities for gaining competitive advantage. In this regard, the author contends that Semantic Content Management offers enormous business benefits, both in terms of cost reductions and increased revenue/competitive advantage.

In Part 1 the author first defines the meaning of ‘content’ and ‘management’. He then considers the business-value of content and the necessity to add meaning to content objects. Technological aspects of semantic content management are then reviewed together with the advantages associated with the ‘next generation’ solution. Next the value of having open standards is discussed, followed by an overview of business benefits generated by semantic content management. Finally the use of semantics to increase business value is explored.

Introduction

With an explosion in the amount of information made available to us as individuals, our world is often characterized by increasing complexity. Most of the time this wealth of information is considered key to the welfare of both individuals and enterprises. However, handling massive information streams is not a trivial task; on the contrary, it requires a sophisticated IT environment that employs the correct tools and well-chosen standards to offer the freedom and ability to face the content management challenges of tomorrow. The task is further complicated by the fact that content is delivered in many forms and via many channels, including print, Internet, intranet, extranet, email, SMS, WAP, 3G, PDAs and digital TV, to name but a few.

There are already several well-reported examples of businesses successfully leveraging their information resources. In February 2000, CIO Magazine reported that Pfizer Inc., by reworking existing information relating to pharmaceutical development, was able to reduce the time to market for new drugs and generate additional revenues of $142 million over a four-year period. There is also an increasing number of companies making money directly from the sale of their content. These include the Wall Street Journal’s WSJ.com, which has generated over $40 million, and the Finnish financial news portal Kauppalehti Online, which generates almost two thirds of its revenue from the sale of information-based content.

In the gold rush to service the needs of organizations, software vendors have begun offering different solutions, based on different standards. This has led to further confusion as to the exact definition of content management:

· What do we mean by content? Content can be as diverse as film, audio, SMS, email and news streams

· What is ‘management’? Is it storage, editing, web site structure and/or workflow authorization procedures? Or is it something else entirely?

Given the challenges described above, it is clear that organizations able to successfully manage content possess a valuable asset.

The Value of Content From A Business Perspective

Naturally, efficient content management is of particular importance to industries where information is sold as a commodity (media, telco, web etc.). In such industries the following factors must be considered:

· Many sources of content in many formats

· Many delivery channels

· A need to avoid information bottlenecks

· Easy content reuse (e.g. in new products or services)

· Volume and speed, often in real time

Financial companies, and large to midsize industrial enterprises (such as pharmaceuticals), face very similar problems, even when they do not directly sell information as a commodity. Efficient, highly targeted communication is essential for successful Customer Relationship Management and Supply Chain Management systems.

Customer service organizations also face complex content delivery scenarios, as do companies selling primarily to ‘invisible’ customers; for example real estate agents or supermarkets. Finally, the public sector and some service companies depend on good content management to be effective in their business.

In any of these environments there are three common stages to content management:

1. Content sourcing: Includes tasks such as authoring, collecting and editing the information from external and internal sources and systems. Of extreme importance here is the issue of classification. Information objects not accurately described are difficult to manage and publish. In other words, the less effort spent during content input, the higher the overall costs – or the lower the success of the following two stages.

2. Administration: Where possible, this function should be automated. Without automation, bottlenecks may occur, bringing the entire cycle to a halt. This function is not trivial; the sheer volume of information passing through a content management system is simply too much to be dealt with by humans in a reasonable timeframe. Other issues of importance are ease of assigning users’ information needs and access rights.

3. Publication: Static publication of content takes many forms: Web pages on inter-/intra-/extranets, print, TV etc. Although effective, this form of publication is not targeted at the customer or even a customer type. Dynamic publication may use web technologies or new ‘push-media’ such as email, SMS, WAP and 3G. Effective reuse of (reformatted versions of) content is essential, but is highly dependent on the accuracy of the object’s classification.

One issue has an overriding influence on the cost/benefit of a content management system: the accurate classification of content and its ease of reuse. Any means of streamlining and improving this central process will have a knock-on effect on the subsequent two stages, delivering reduced labor costs and improved content reuse across distribution channels (‘Create once, publish anywhere’).
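The ‘create once, publish anywhere’ idea can be illustrated with a minimal sketch in which one classified content object is rendered for several channels. The field names (headline, summary, body), the channel names and the 160-character SMS limit are assumptions made for this illustration and are not taken from any particular product.

```python
from html import escape

# A single content object, authored and classified once (illustrative data).
ARTICLE = {
    "headline": "Quarterly results published",
    "summary": "Revenue grew and costs fell during the quarter.",
    "body": "The full report covers revenue, costs and the outlook for next year.",
}

def to_web(item):
    """Render the object as an HTML fragment for a web page."""
    return (f"<article><h1>{escape(item['headline'])}</h1>"
            f"<p>{escape(item['body'])}</p></article>")

def to_sms(item, limit=160):
    """Render a short text message, truncated to the assumed length limit."""
    text = f"{item['headline']}: {item['summary']}"
    return text if len(text) <= limit else text[:limit - 3] + "..."

def to_email(item):
    """Render a plain-text email with the headline as the subject line."""
    return f"Subject: {item['headline']}\n\n{item['body']}"

# One renderer per delivery channel; adding a channel means adding one entry.
RENDERERS = {"web": to_web, "sms": to_sms, "email": to_email}

def publish(item, channels):
    """Produce one rendition of the same object per requested channel."""
    return {channel: RENDERERS[channel](item) for channel in channels}

renditions = publish(ARTICLE, ["web", "sms", "email"])
for channel, text in renditions.items():
    print(channel, "->", text[:60])
```

Because every renderer reads the same described fields, the quality of each rendition depends directly on how accurately the object was classified at the sourcing stage.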

Surprisingly, only a small percentage of a typical content management product’s functionality is ever used. Many content management products originate from a web site administration or a document management background. Such products offer only basic keyword searches (like traditional Internet search engines). Although these searches draw on techniques such as word frequency counts and lexical analysis, the keyword-based approach cannot go beyond basic information searches.

Introducing Meaning

Content objects by themselves are just that, objects. To make an object more useful one needs to introduce meaning.

A normal web page or document-based search engine typically returns far too many results to be of use; often a large percentage of results are not relevant to the original request. Users must be able to interrogate content based on its exact meaning. Take, for example, a basic search for Woody Allen. The results do not tell us whether they relate to Woody Allen as a director, actor or both. Such methods are one-dimensional. The software is simply searching against a list of keywords, unknown and unrelated to one another.

It is this ability to exclude or limit searches precisely that is lacking from today’s web and content management solutions.
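The difference between the two approaches can be shown with a small sketch. The catalog records, film titles and property names below are invented for illustration; the point is only that a keyword match cannot tell roles apart, whereas a query over described properties can both limit and exclude.

```python
# Hypothetical catalog: free text plus role properties (illustrative data only).
CATALOG = [
    {"title": "Film A", "text": "A comedy directed by Woody Allen.",
     "director": ["Woody Allen"], "actor": ["Some Other Actor"]},
    {"title": "Film B", "text": "Woody Allen stars in this caper.",
     "director": ["Some Other Director"], "actor": ["Woody Allen"]},
    {"title": "Film C", "text": "Woody Allen writes, directs and stars.",
     "director": ["Woody Allen"], "actor": ["Woody Allen"]},
]

def keyword_search(records, phrase):
    """Return titles whose free text mentions the phrase, whatever the role."""
    needle = phrase.lower()
    return [r["title"] for r in records if needle in r["text"].lower()]

def semantic_search(records, where=None, where_not=None):
    """Return titles that have every `where` value and none of the `where_not` values."""
    where = where or {}
    where_not = where_not or {}
    hits = []
    for r in records:
        has_all = all(v in r.get(p, []) for p, v in where.items())
        has_none = not any(v in r.get(p, []) for p, v in where_not.items())
        if has_all and has_none:
            hits.append(r["title"])
    return hits

# Keyword search returns every record that mentions the name.
print(keyword_search(CATALOG, "Woody Allen"))
# Property search limits results to one role ...
print(semantic_search(CATALOG, where={"director": "Woody Allen"}))
# ... and can exclude another role at the same time.
print(semantic_search(CATALOG, where={"director": "Woody Allen"},
                      where_not={"actor": "Woody Allen"}))
```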

Only by introducing meaning can content be optimized. This can be achieved through semantics, a highly complex discipline defined by Webster’s Encyclopedic Unabridged Dictionary of the English Language (1994 edition) as, ‘the study of meaning’.

Although semantics is a term derived from the world of linguistics it can be compared to the IT term ‘metadata’, a word to signify data that describes data. Metadata can, for example, be used to describe the meaning of content stored within a data warehouse or an online catalog. Such technology can, therefore, be used as the mechanism for building large, structured descriptions of content within enterprise systems.

As the volume of information continues to explode, the need for efficient and targeted automation of data capture / delivery is becoming more and more apparent. Without such systems in place, users will ultimately spend more time searching for information than actually consuming it. There is only one way of enabling computers to deal with information in a meaningful way and that is to describe it in a precise, machine-readable format. This can be achieved using metadata.

Using metadata also elegantly solves traditional problems of scalability. In many instances a user only needs to know that a piece of information exists. In such cases it is not necessary to consume bandwidth by sending the user the entire content object. Instead, the associated piece of metadata would be sufficient.
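As a rough sketch of this bandwidth argument, the example below keeps a small metadata record next to a large payload and answers an existence query with the metadata alone. The class names, field names and the example URN are hypothetical, and the payload is simply one million zero bytes standing in for a real media file.

```python
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class Metadata:
    """Compact description of a content object (field names are assumptions)."""
    identifier: str
    title: str
    media_type: str
    size_bytes: int

@dataclass(frozen=True)
class ContentObject:
    metadata: Metadata
    payload: bytes  # the actual content, potentially very large

def metadata_response(obj):
    """Serialize only the metadata record; the payload is never touched."""
    return json.dumps(asdict(obj.metadata)).encode("utf-8")

# Stand-in for a large media file.
clip = ContentObject(
    Metadata("urn:example:clip-1", "Quarterly briefing", "video/mpeg", 1_000_000),
    payload=bytes(1_000_000),
)

reply = metadata_response(clip)
print(len(reply), "bytes of metadata describe", len(clip.payload), "bytes of content")
```

A user who only needs to know that the clip exists, or what it is about, receives the short reply; the payload is transferred only when it is actually requested.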

Semantic Content Management

As we have learnt, semantic content management is about managing content objects based on their properties. The objects can be of any type and the meaning of their properties can be recorded within metadata descriptions. These metadata descriptions are like library index cards meant for machine readers, not human readers. The metadata expresses the semantics according to the business environment: for example, customer codes for tenders, personal identities for digital images or artist names for audio files.

The technologies for managing this are based on XML and RDF:

· XML (Extensible Mark-up Language): An open market, non-proprietary standard for defining, validating and storing structured data objects by expressing these objects as tagged text. XML is a subset of an earlier mark-up language, SGML.

· RDF (Resource Description Framework): A declarative language that provides a standard way of using XML to represent metadata in the form of statements about properties and relationships of items. Such items, known as resources, can be almost any type of object. On top of this you find RDF Schemas, which describe metadata vocabulary sets. A schema defines the meaning, characteristics and relationships of a set of properties, and this may include constraints on potential values and the inheritance of properties from other schemas. Within a schema, the meanings of terms are spelled out in detail, enabling independent communities to share vocabularies.
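The statement model described above can be sketched without any RDF library by treating each statement as a (subject, predicate, object) triple. Everything in the example is an assumption made for illustration: the urn:example identifiers, the property names, and the small mapping that imitates a schema by declaring ‘artist’ and ‘photographer’ to be specializations of a more general ‘creator’ property.

```python
from typing import NamedTuple

class Statement(NamedTuple):
    subject: str    # the resource being described
    predicate: str  # the property or relationship
    obj: str        # a value or another resource

# Illustrative statements about three content objects and one person.
STATEMENTS = [
    Statement("urn:example:image-42", "photographer", "urn:example:person-7"),
    Statement("urn:example:image-42", "format", "image/jpeg"),
    Statement("urn:example:track-3", "artist", "urn:example:person-7"),
    Statement("urn:example:image-43", "depicts", "urn:example:person-7"),
    Statement("urn:example:person-7", "name", "Jane Example"),
]

# Toy schema: each property on the left inherits from the one on the right.
SUB_PROPERTY_OF = {"artist": "creator", "photographer": "creator"}

def match(statements, subject=None, predicate=None, obj=None):
    """Return statements that fit the pattern; None acts as a wildcard."""
    return [s for s in statements
            if (subject is None or s.subject == subject)
            and (predicate is None or s.predicate == predicate)
            and (obj is None or s.obj == obj)]

def specializations(predicate):
    """Return the predicate plus every property declared beneath it."""
    found = {predicate}
    changed = True
    while changed:
        changed = False
        for child, parent in SUB_PROPERTY_OF.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found

def match_with_schema(statements, predicate, obj):
    """Match a general property by also accepting its specializations."""
    wanted = specializations(predicate)
    return [s for s in statements if s.predicate in wanted and s.obj == obj]

# Everything created by person-7, whether as photographer or as artist.
for s in match_with_schema(STATEMENTS, "creator", "urn:example:person-7"):
    print(s.subject, s.predicate)
```

The query for ‘creator’ finds both the photograph and the audio track even though neither statement uses that word, which is the kind of vocabulary sharing a schema is meant to enable.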

Only systems that are designed from the outset to support these standards can take full advantage of semantics. In a semantic content management system the input is described by metadata, whilst Active Query Agents continuously traverse the semantic network and try to match the information needs of users / customers with the information patterns available in the system. Information is pushed to users / customers based on their profiles and on their preferred delivery devices (Web, email, mobile handsets etc.).
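The matching step can be sketched as follows. This is a single pass over a handful of items rather than a continuously running agent, and the profile structure, property names and channel names are assumptions made for the example; it is not a description of how any vendor’s Active Query Agents are implemented.

```python
from dataclasses import dataclass

@dataclass
class Profile:
    user: str
    interests: dict  # property name -> set of accepted values
    channel: str     # preferred delivery device

@dataclass
class Item:
    identifier: str
    metadata: dict   # property name -> value

def matches(profile, item):
    """True when every property the profile names has an accepted value."""
    return all(item.metadata.get(prop) in accepted
               for prop, accepted in profile.interests.items())

def route(items, profiles):
    """Return (channel, user, item identifier) for every profile match."""
    return [(p.channel, p.user, i.identifier)
            for i in items for p in profiles if matches(p, i)]

ITEMS = [
    Item("news-1", {"topic": "pharma", "language": "en"}),
    Item("news-2", {"topic": "telecom", "language": "fi"}),
]
PROFILES = [
    Profile("analyst", {"topic": {"pharma"}}, "email"),
    Profile("trader", {"topic": {"pharma", "telecom"}, "language": {"en"}}, "sms"),
]

for delivery in route(ITEMS, PROFILES):
    print(delivery)
```

Here the second item reaches nobody: the analyst does not follow telecom, and the trader accepts only English-language content. The decision is made entirely from metadata, without opening the content objects themselves.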

There are several major advantages to this next generation of content management solution:

· Automation: Only metadata enables computers to deal with information in a meaningful manner, actively enabling a new, higher level of automation.

· Flexible Reuse: Enabling easy development of new services. Content can be easily reused in many different publishing contexts and using new media as it becomes available.

· Quality: Semantic queries return only meaningful results (no ‘spamming’ or ‘information overload’). By its nature, a collection of structured metadata only gets better as it grows, in contrast to the confusion created by a large collection of objects having little, or no, meaningful semantic structure (such as the Web today).

· Ease of use/ implementation: Complex queries are much easier to express, leading to easier development of more advanced applications.

· Interoperability: Content can be exchanged between different parties because its meaning is expressed using technology based on open standards. Peer-to-Peer content networks can be established.

· Location and Storage Independence: Since the meaning of the content object is described using metadata, there is no need to move content objects into specialized storage with specialized search facilities. This has a dramatic effect on the cost structure of content-providing systems.

· Format / Type Independent: For the same reason as above, there is no limit to the kind of content that can be managed – any content object can be described by metadata, including non-textual objects such as pictures.

· Best of Breed Approach: A semantic content management solution may eventually be composed of products from several software vendors, as long as the products are using the same, open standards. The days of proprietary architectures and vendor lock-in are numbered.

· Scalability: Since all the system needs to manage the content is the metadata, and since metadata is compact, a semantic content management system can handle very large numbers of content objects without scalability problems.

· Open Standards: Basing strategic solutions on open standards is the best investment protection possible. At the same time, open standards make it a lot easier to acquire qualified staff, additional software tools, training classes etc. These all result in significant cost reductions.

The Value Of Open Standards

“The great fortunes of the information age lie in the hands of companies that have successfully established proprietary architectures that are used by a large installed base of locked-in customers. And many of the biggest headaches of the information age are visited upon companies that are locked into information systems that are inferior, orphaned, or monopolistically supplied” (Shapiro & Varian, 1999).

Allowing information to flow freely and interact both on the input and output side is critical to the success of any environment that hosts a content management system. In order to establish such a highly efficient, communicative content management system we must look to the use of open standards.

On its grandest scale, the use of open standards is key to the success of the World Wide Web. An organization called The World Wide Web Consortium (W3C) is dedicated to promoting the evolution and interoperability of the Web by developing common protocols, so that the next generation of web technologies is capable of communicating in a meaningful way. This initiative is called ‘The Semantic Web’ (Scientific American, 2001). (W3C is a non-profit consortium founded in 1994 by Tim Berners-Lee; its current host institutions are the Massachusetts Institute of Technology, the Institut National de Recherche en Informatique et en Automatique, and Keio University of Japan.)

The two main standards to consider within the semantic web are XML and RDF, both of which were described previously:

· XML (Extensible Markup Language): A W3C standard for syntax, which is very useful for assigning metadata descriptions to objects

· RDF (Resource Description Framework): A W3C standard for defining metadata and its structure. Where XML is the ‘syntax’, RDF is the ‘grammar’

XML is in widespread use today. RDF is rapidly gaining ground, and is utilized in many different contexts and products. In the publishing industry, for instance, two industry-standard sets of metadata (Dublin Core and PRISM) are defined and described in RDF. Other examples include RSS (Rich Site Summary), a scheme for Web site categorization, and CC/PP, a forthcoming standard that, amongst other uses, will describe the capabilities of next generation telephone devices.

RDF is a solid foundation for adding metadata to content objects in the authoring phase. For example, Internet browsers Mozilla and Netscape 6 use RDF. Another very good example is Adobe Acrobat 5, which uses RDF as its metadata language. This is part of Adobe’s XMP (Extensible Metadata Platform) framework, in turn the backbone of Adobe’s approach to network publishing. In Adobe’s own words: “XMP provides Adobe applications and partners with a common metadata framework that standardizes the creation, processing and interchange of document metadata across publishing workflows. XMP will be incorporated into all Adobe products eventually”.
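To make the relationship between the two standards concrete, the sketch below writes a small RDF description in XML syntax using Dublin Core element names (title, creator, date). It relies only on Python’s standard xml.etree.ElementTree module. The resource URL and the property values are invented for the example, and the helper function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for RDF and the Dublin Core element set.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Ask the serializer to use the conventional prefixes.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)

def to_rdf_xml(resource_uri, properties):
    """Describe one resource as RDF/XML with Dublin Core properties."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": resource_uri}
    )
    for name, value in properties.items():
        ET.SubElement(description, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")

# Illustrative resource and values (not taken from a real catalog).
xml_text = to_rdf_xml(
    "http://example.org/reports/2001-q4",
    {"title": "Fourth quarter report", "creator": "Example Author", "date": "2001-12-01"},
)
print(xml_text)
```

XML supplies the tagged-text syntax, while RDF fixes what the tags mean: a description of a resource made up of property statements drawn from a shared vocabulary.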

An Overview of the Business Benefits

Some benefits arise from cost reduction, others from increased revenue. Significant competitive advantage may also be gained. A summary of possible business benefits is shown in the table below:

	Reduced costs	Increased revenue	Competitive advantage
Automation	Less labor intensive	Can handle more users / customers	Faster to the market
Flexible reuse	Little or no manual intervention	New products and services as the market evolves	Deliver new products and services sooner
Quality	Less time spent on administration; increased reliability	Satisfied users / customers	Better prices and reputation
Ease of use and implementation	Short implementation time and low costs	Deliver quality products and services with good prices	React quicker to market changes
Interoperability	Little or no manual intervention in incoming or outgoing content streams	Content syndication and repackaging of external sources for more comprehensive solutions	A lot easier to interoperate with partners’ and customers’ systems
Location and Storage independence	Reduce machine, software and network bandwidth costs	Lower the bar for providing new solutions that are closer to real needs	React quickly to changes without having to wait for database conversions.
Format and type independence	Reduce implementation costs	Take advantage of new formats and specific customer needs	Can support the formats that customers need, quickly
Open Standards	Investment protection, easier to get specialist staff, software tools and training for standards-based products	Flexibility in choice of client interfaces, support any customer using standard products	Beat competitors, who are locked to proprietary architectures
Best of breed approach	Reduce software costs	Tailor your solution to market needs	Quickly integrate new infrastructure products
Scalability	Reduce IT investment	Grow with the business	Achieve large market shares, quickly

Using Semantics To Increase Business Value

The author believes that semantic content management is particularly well suited to environments with one or more of the following characteristics:

· Business value of content is high

· A need for complex searches

· Rich content objects with many properties

· Dynamic content

· A need for real-time publication

· A need for multi-channel delivery (and/or capture)

· Many content provider sources

· High volume

· External content feed and/or delivery

Perhaps the most important aspect from a business perspective is the ability to ‘create once, publish anywhere’. For example, a provider of financial information can deliver chargeable subscriber services allowing professional clients to receive real-time information that they can further distribute to several intranet and Internet sites, as well as to clients’ email and mobile devices. The same content could be delivered directly to end-users, and also indirectly in partnership with large companies, for use on an intranet. Delivery of such content can:

· Reduce production costs

· Reduce deployment time of new service(s)

· Increase loyalty of end users

· Increase usage of a cellular network, and with it revenue

· Offer high quality services, which attract new professional customers

Summary

Semantic content management offers enormous business benefits, both in terms of cost reductions and increased revenue/competitive advantage.

Only by acting accordingly and taking an interest in semantic technology will organizations be in a position to reap the financial rewards and benefit, indirectly, from reduced labor costs, reduced IT investments, shorter time to market for new products/services, increased quality of products and services and competitive pricing.

References

Scientific American, 2001, http://www.scientificamerican.com/2001/0501issue/0501berners-lee.html

Shapiro, C. and Varian, H.R., Information Rules, Harvard Business School Press, Cambridge Mass., 1999

Michaël Auffret is employed at Profium (http://www.profium.com/) and can be reached at michael.auffret@profium.com