VLDB 2002
28th International Conference on
Very Large Data Bases
Kowloon Shangri-La Hotel
August 20-23, 2002
Hong Kong, China

PANEL 1: TUESDAY, 20 AUGUST 2002, 16:30-18:00


Data Management Challenges in Very Large Enterprises


Adam Bosworth, VP, Engineering, BEA Systems, Inc., U.S.A.
James Hamilton, Architect, Microsoft SQL Server, Microsoft Corp., U.S.A. (PDF Presentation Slides - 324K; PowerPoint Presentation Slides - 364K)
Pat Selinger, IBM Fellow and VP, Data Management Architecture and Technology, IBM, U.S.A. (PDF Presentation Slides - 124K)
Hans-Peter Steiert, Research & Technology, DaimlerChrysler AG, Germany (PDF Presentation Slides - 624K)

Panel Chair
Michael L. Brodie
Verizon Communications, U.S.A.
(PDF Presentation Slides - 208K)


Panel Outline

Very large enterprises have approximately a petabyte of operational data stored in over 1,000 data repositories supporting over 5,000 applications. Data storage volumes grow in excess of 50% annually. Repositories for decision support systems, which often contain replicated data, grow two to three times as fast as databases used for online transaction processing (OLTP). OLTP workloads are growing at over 60% per year. This growth is expected to continue for some time due to new Web-based systems, increased access to existing systems, and the introduction of new sources of data, new workloads, and new (e.g., XML-based) access requirements. While dealing with massive growth, large enterprises must also address the unpredictable or elastic access demand of constantly evolving Web-based systems, increased storage complexity, new storage technologies (e.g., network data storage over IP, storage utilities), and the more conventional but increasingly complex and costly challenges of data and storage management (e.g., backup and recovery).

While data management and data storage technologies continue to make impressive advances, there is only so much they can do in the face of the predicted growth rates. Very large enterprises are attempting to identify and address the drivers of data growth. A leading candidate is integration. Recent analyst studies conclude that over 40% of IT budgets are devoted to the integration of new and existing systems and databases. Technology advances often manifest in new systems and databases rather than in improvements and enhancements to existing systems. Consequently, very large enterprises operate their businesses with thousands of systems and databases ranging in age from 6 weeks to 30 years. Operational efficiencies require that these systems be integrated. The Web's potential for universal access adds urgency to these challenges. As a result, very large enterprises deal not only with massive data and workload growth and the attendant management activities, but also with massive integration challenges and costs.

Solution providers continue to offer significant advances to deal with specific data storage and data management problems (e.g., availability, robustness, performance) and are beginning to turn their attention to the integration challenge.

Current solutions tend not to map directly to the problems of very large enterprises. Solutions are seldom comprehensive and are product or vendor specific. Three approaches to address the problem of integrating component solutions into an enterprise solution are standards, consultants, and integrated product suites. Standards, such as those for Web Services, are intended to provide common specifications for all products in a domain so that different vendor products can be readily integrated. Consultants are intended to be vendor neutral while bringing a wealth of experience and knowledge to multi-vendor problems. A third approach is for vendors to integrate their products into tightly integrated product suites. Each approach has severe limitations. Very large enterprises are looking more than ever to solution providers to assist with their massive data management challenges.

The panel will identify the dominant data management challenges facing very large enterprises from the perspective of problem owners and will explore the solutions being offered by leading solution owners. It will discuss specific VLDB challenges and how the solutions address the challenges.

PANEL 2: WEDNESDAY, 21 AUGUST 2002, 16:30-18:00

The Future Home of Data


Michael Franklin, University of California, Berkeley, U.S.A. (PDF Presentation Slides - 424K)
Paul Larson, Microsoft Research, U.S.A. (PDF Presentation Slides - 20K)
Guy Lohman, IBM Almaden Research, U.S.A. (PDF Presentation Slides - 44K)
Guido Moerkotte, Universität Mannheim, Germany (PDF Presentation Slides - 60K)

Panel Chair
J. Christoph Freytag
Humboldt University of Berlin, Germany
(PDF Presentation Slides - 100K)


Panel Outline

Over the last year, the question of how and where to store data and how to access it has become a pressing issue. Especially with the growing importance of E-Commerce and widespread Internet access, it is no longer clear whether one approach satisfies all the needs of the different communities accessing and processing data.

XML currently seems to be the most favored form for presenting, exchanging and possibly storing data. New technology "waves" such as Web Services, an "interaction model" between businesses and customers (B2C) or among businesses themselves (B2B), rely heavily on XML data because semantic information can be included with the data itself. Many companies in the E-Commerce space assume XML to be the "universal data format" for the future when storing and accessing data.

Database management systems (DBMSs) have been the "modern" technology for the last 40 years (with different data models and different levels of functionality and sophistication) to store, to access and to manipulate large amounts of data. This technology comes with many properties, embedded in subsystems of DBMSs, that are essential for reliable, efficient information processing in the business world, such as transaction management and query processing. Over the years, DBMSs have changed their assumptions about where data might reside for access and processing. The initial assumption that all data is stored centrally changed to a "distributed model" that assumes data is spread over several locations. The latter model quickly extended into a federated one, which assumes that "data sources" might be autonomous and not always under the control of one (database) system, while acknowledging that data might come in different formats.

Despite the "new technology waves" one has to acknowledge that much data is still stored as "flat files" without structural information or other relevant "meta data" that might be important for efficient access and correct processing.

Besides the form of the data, its location, i.e., where to expect data that might be relevant and important for a user to perform a particular task, has become an important issue. Devices such as mobile phones, notebooks, and PDAs often need data from other sources while also storing new or additional data that others might need to perform their tasks successfully. Some devices, such as mobile phones, exist in large numbers and perform tasks different from those of a computing device. Still, their capabilities as storage and processing devices, though limited, make them an important source of generated data and an important storage device that must be included in current and future processing environments.

Panel members from industry and academia have been asked to address the following issues in their statements:

  • What are the different alternatives for storing and accessing data for the future?
  • What are the characteristics for storage now and in the future?
  • Is XML the universal answer to all future needs? Will it take over the "world of data"?
  • Should existing DBMSs be thrown away?
  • Should existing DBMSs be changed to adapt to new requirements and challenges?
  • Are XML-DBMSs the answer to all needs?
  • Is the role/functionality of DBMSs changing?
  • How do we deal with the many (mobile) devices as devices for data storage and data processing?
  • What are the "right" assumptions about distribution and heterogeneity of data?
  • What is the industrial and/or academic approach to deal with this challenge?
  • What are the trade-offs between different approaches?

PANEL 3: FRIDAY, 23 AUGUST 2002, 11:00-13:00

Biodiversity and Ecosystem Informatics - Research, Technology Transfer, or Application Development?

Panel Summary - 156K PDF file


Kathleen Bergen, University of Michigan, U.S.A. (PDF Presentation Slides - 708K)
Yannis E. Ioannidis, University of Athens, Greece (PDF Presentation Slides - 32K)
Jessie Kennedy, Napier University, U.K. (PDF Presentation Slides - 768K)
Renée J. Miller, University of Toronto, Canada (PDF Presentation Slides - 580K)

Panel Chair
Judy Cushing
The Evergreen State College, U.S.A.
(PDF Presentation Slides - 48K)


Panel Outline

At VLDB 2000, a keynote speaker and a panel session urged database researchers to help solve critical problems in biodiversity informatics. A subsequent Spring 2001 report of a National Science Foundation Panel on Biodiversity and Ecosystem Informatics (BDEI) suggested that the next-generation CS/IT applications needed to understand complex, ecosystem-scale processes would require significant, ground-breaking CS/IT research. Considerable interest in this application domain has materialized, but as Maria Zemankova, NSF Program Officer, reported to the BDEI Workshop organizing committee after a talk at the National Library of Medicine: definitions of key terms and the list of research issues were not "jumping out". Some have suggested that the BDEI report is on the "nice" side rather than "hard-core". In short, before researchers or funding agencies get involved, they want to know more detail about how current research would transfer to this domain, and whether there are genuine research issues, or "just" complex application development. I think that, at the time the BDEI report was written, we knew only that there were domain problems, not what original research might be required. We could identify research areas, but not articulate research topics. In fact, one might ask whether these (admittedly critical) domain problems need:

  • original research
  • off-the-shelf applications or a different (general) DBMS
  • organizational infrastructure for supporting data re-use
  • training for ecologists to use technology that already exists
  • more ecologists to do the domain research

It is, however, not possible to simply look at these problems and decide whether research is required to solve them -- one must actually try. Thus, in the summer of 2001, the NSF awarded $1.25 million in grants to fifteen research groups to initiate planning projects and research initiation efforts in BDEI. By the summer of 2002, these projects should be in a position to determine whether the project they have outlined in the proposal requires original research and, if so, what the nature of that research is.

This panel will report back to the VLDB on the nature of research issues in this domain. Using one or more concrete examples of BDEI (funded) research, panelists will present their view of where the work lies with respect to database issues (e.g., conceptual modeling, spatial and temporal databases, metadata, data mining, data integration) and whether the work constitutes 1) work in the domain by ecologists, e.g., training in existing technology, infrastructure for community databases or to support reuse, or more ecologists doing field work, etc., 2) applying existing DBMS technology, 3) applying existing DBMS research to create new technology, or 4) original DBMS (or other CS) research.

