Ask an Expert: Laura Sebastian-Coleman
We asked author Laura Sebastian-Coleman about data quality, data literacy, and her latest book, Meeting the Challenges of Data Quality Management.
- How long have you been working in the Data Quality field?
I started working in data quality management in 2003, when I took a position as a data quality manager for the data warehouse at UnitedHealth Group. However, even before that, I had had positions that made me very aware that the quality of data influenced operational efficiency and the ability of people to do their work. For example, during college, I was a bank teller. The seemingly simple act of balancing at the end of the day was really an exercise in checking the quality of each transaction. I also worked for a manufacturer/distributor. There, on the manufacturing side, the quality of physical products was essential, but the data for managing inventory was equally important to running the business efficiently and effectively.
- What got you interested in this field?
The opportunity itself provoked my interest—managing the quality of data is both important and complex. At the time I started, the profession of data quality management was only starting to emerge. Tom Redman, Larry English, and David Loshin had written foundational books (drawing deeply from the existing literature on product manufacturing), but organizations were just getting started on how to implement the ideas. I was very fortunate, early in my career at UnitedHealth, to attend seminars and symposiums at MIT, where leading thinkers were working to define what it meant to manage the quality of information. I wanted to participate in this discussion. In my role as a practitioner, I saw both the benefits of high-quality data and the numerous costs of poor-quality data, as well as the ways that both organizational culture and technology influence the quality of data. I stay interested because data itself is evolving. The importance of data to organizations has only increased in the last two decades as they have found new ways to use data. With changes in the technologies used to create and manage data and with the sheer volume of data, it is even more challenging to manage it for quality.
- Can you explain why building data literacy is so important for data quality management?
Data literacy is the ability to read, understand, interpret, and learn from data in different contexts, and to communicate with other people about data. Data is a means of representing real-world objects and events by encoding information about them. It is a kind of language. Using data always involves interpreting it (i.e., every use of data is, at some level, an attempt to answer the questions: What does this data mean? What do we learn from it?). To answer questions with data, people need the knowledge and skills to interpret data. Knowledge of data includes general knowledge about how data is created and organized, as well as specific knowledge about how data works in a particular function or industry and how a specific organization’s data works.
The quality of data is judged largely by the ability of people to use it (“fitness for purpose”). Data literacy influences what both Tom Redman and Danette McGilvray have identified as the most important points in the data life cycle: data creation and data use. The human element in data quality management requires that the people creating data (or developing the systems that create data) understand that other people will want to use that data. If the people creating data do so in a “data literate” way (if they design their data systems to be understood by others), then they will produce higher-quality data. If the people using data are “data literate” and have reliable knowledge about the data they are using and the skills to interpret it, then they will be able to articulate their requirements for quality so that the data life cycle overall can account for these requirements. In a highly data literate organization, there would actually be less need for formal data quality management because people working in such an organization would already have high standards for quality.
- What key tips can you give to data management professionals who want to make sure they have an effective data quality management program?
Of course, the whole book is trying to answer this question, but several ideas jump immediately to mind.
- First, focus on the most important data. Not all data is created equal. Some is critical to the operation of the organization. Some is not. Never do data quality management for the sake of data quality management. The point of managing the quality of data is to help the organization succeed by getting knowledge and, hence, value from its data. To that end, the data quality practitioner should help ensure that the most important data is in the condition it needs to be for the organization to be successful.
- Second, listen to people who use data about their expectations for quality (how they think the data should look) and the things that get in the way of their use of data (their pain points). Don’t make assumptions. Don’t try to change their language. Hear what they say, then engage them in helping you help them.
- Finally, while the book is trying to help people understand the connections between data and organizational processes and technology – which can seem big and complicated – never underestimate the value of small changes. As more people become more aware of how data works within their organization, they see that their work enables other people to do their work. This awareness itself increases the ability of the organization to improve the quality of its data.
- You have written a book entitled Meeting the Challenges of Data Quality Management. What was your objective in developing this book?
When I started working on the book, my intention was to provide a kind of update on the ideas I had presented in my previous book, Measuring Data Quality for Ongoing Improvement. I had learned a lot through applying the ideas from that book in three different organizations. That book was focused largely on understanding the concept of measurement, which is integral to understanding data itself, as well as data quality. In addition, as production editor of DAMA’s Data Management Body of Knowledge (DMBOK2), I had thought a lot about data quality in the context of data management. As I started to research and write, I found myself thinking that the ways in which data management in general and data governance in particular have emerged in the past decade and a half have had some detrimental effects on the ability of organizations to manage the quality of data. So, I felt I needed to engage with wider questions involving the relationship between people, process, technology, data, and culture: What is the impact of culture on the quality of data? How does technology management end up impacting the quality of data? How does metadata management (or failure to manage metadata) affect the quality of data? What knowledge, skills, and experience do people need to understand and interpret data? What does it mean to “be accountable” for “data as an asset?” I think it is time to reassess how we think about data management – from the creation of data through processes and applications, to its use in operations, analytics, and reporting, to its disposal – and to really focus data management on ensuring that data is reliable and of high quality throughout its life cycle. Data consumers within organizations have every right to expect data to be of high quality. More importantly, the customers of organizations expect information about them to be correct and protected from misuse.
- What approach does your book take and how will it benefit data management professionals across a wide range of industries, as well as academic and government organizations?
The first part of the book sets the context, describing why data quality management is important to reducing risks and improving efficiency. The second section is organized around what I have defined as the five challenges:
- The data challenge: Understanding what data is and how it works; particularly, understanding how data works as a representation of real-world objects and events.
- The process challenge: Ensuring organizational processes are designed to create reliable data, so that data is a product of those processes, rather than a by-product of them.
- The technical challenge: Ensuring there is the right balance between data and technology and that business data requirements drive technology decisions, rather than technology defining what data the organization has and what its quality is.
- The people challenge: Ensuring that people within the organization have the knowledge, skill, and experience they need to use data well (getting value from it) and responsibly (using it ethically).
- The culture challenge: Ensuring that leadership understands how data works and acts accountably toward data.
Most of my professional experience is in the healthcare industry, and many of my examples come from healthcare, but these challenges exist in all organizations. They provide a set of lenses through which people in different industries can understand their own organizations.
The third section details the core capabilities in data quality management: defining data quality standards; assessing data against standards; monitoring and reporting on data quality; and managing data quality issues. The goal of these activities is first to make data quality known and then to influence the processes that create and use data so that the quality of data actually improves. This section defines data quality dimensions in depth and connects them to the five challenges. It also directly connects data quality management to other organizational processes, such as supply chain management, value management, and systems development. The models presented in section three can be applied widely across industries. All organizations want to produce value. All of them have a data supply chain. And most do some kind of system development.
- How might your book also be used as a key reference for graduate students in computer and data science programs?
I have learned a great deal from the books I’ve read and the ideas I have been exposed to at conferences and through conversations with others in the field. In writing this book, I have tried to faithfully represent the ideas that influenced me and to synthesize those ideas with my experience (not to mention citing them in the bibliography). My hope is that graduate students will find the book itself valuable and will also use it as a way to dig into the history of quality, changing concepts of data, and the evolution of organizational ideas like data governance, data stewardship, and the Chief Data Officer.
- How do you see the field of data quality evolving over the next 5-10 years?
My first hope is that data quality management will return to its roots and rediscover the value in the ideas of quality pioneers like W.E. Deming and Joseph Juran. Both men saw the importance of an organization adopting a systematic approach to quality. If we are able to return to these roots, then I think other changes will take place. For example, right now, in many organizations, data quality management is seen as a part of data governance. I actually think this relationship should be reversed. Quality data is the point. Data governance (controls, policies, and procedures) should be focused on ensuring that data is of high quality.
Ready to read this book?
Meeting the Challenges of Data Quality Management is available in the Elsevier Store.