Ask an Expert: Jules J. Berman
We asked author Dr. Jules J. Berman about big data, classifications, and his latest book, Classification Made Relevant: How Scientists Build and Use Classifications and Ontologies.
- How long have you been researching or thinking about classifications and ontologies?
- What got you interested in this field?
In the early 1990s I was heavily involved in biomedical informatics, and our team's data mining projects led me to examine the medical classifications that were supposed to support them. To my dismay, all of the biomedical classifications and ontologies available at the time were poorly designed and inadequate for our purposes; some were no better than lists of medical entities. I soon realized that professionals in any field of science need to devote serious thought to the design of their classifications and ontologies if they hope to achieve useful and valid results.
- Can you explain why a strong understanding of classifications and ontologies is so important to doing research across the fields of mathematics, physics, chemistry, biology, and medicine?
Science is not primarily concerned with finding new facts. Science is all about using facts to draw general inferences about the nature of things. Generalizations apply to specific classes of related things. Therefore, when we draw our inferences, we need to have everything organized into well-defined classes, with strict rules for class membership, and with established relationships linking the different classes and their members. This is all done with two closely related data structures: classifications and ontologies. Every modern science depends upon classifications (e.g., the classification of living organisms in biology, the Periodic Table of the Elements in chemistry, and the classification of symmetry groups in physics).
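To make the idea of drawing inferences from class membership concrete, here is a minimal Python sketch. It assumes a classification stored as a simple child-to-parent map in which each class has exactly one parent, with properties attached to classes rather than to individual members. The taxa, the recorded properties, and the function names `lineage` and `inferred_properties` are illustrative assumptions, not material from the book.

```python
# A minimal sketch: a classification stored as a child -> parent map.
# Each class has exactly one parent (the root has None). The taxa and
# properties are illustrative assumptions, not a complete taxonomy.
PARENT = {
    "Animalia": None,
    "Chordata": "Animalia",
    "Mammalia": "Chordata",
    "Carnivora": "Mammalia",
    "Felidae": "Carnivora",
}

# Properties that hold for every member of a class and of its subclasses.
CLASS_PROPERTIES = {
    "Animalia": {"multicellular", "heterotrophic"},
    "Chordata": {"notochord at some stage of development"},
    "Mammalia": {"mammary glands"},
}


def lineage(cls):
    """Return the class followed by all of its ancestors, up to the root."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = PARENT[cls]
    return chain


def inferred_properties(cls):
    """Collect every property a member of `cls` inherits from its lineage."""
    props = set()
    for ancestor in lineage(cls):
        props |= CLASS_PROPERTIES.get(ancestor, set())
    return props


# Nothing was recorded for Felidae itself, yet its members inherit
# everything recorded for Mammalia, Chordata, and Animalia.
print(lineage("Felidae"))
print(sorted(inferred_properties("Felidae")))
```

The generalizations live on the classes, so assigning a new member to Felidae is enough to infer everything recorded for its ancestors; no fact has to be stored twice.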
- What suggestions can you give to data and computer scientists who want to communicate more effectively with STEM researchers about data organization and analysis?
Computer scientists should not expect STEM researchers to know, at the outset of a project, which type of data structure best suits their needs. Early in the project, the computer scientist should discuss all of the available options with their STEM partners. In some cases, a project can be vastly simplified by linking newly acquired data to an appropriate standard framework. If existing classifications and ancillary data resources are available, the whole team should evaluate them.
Regarding data analysis, scientists should recognize that a deep understanding of the questions being asked is always more important than the choice of analytic technique. Data scientists and STEM researchers must share a clear understanding of the questions being pursued, and together they must determine whether those questions have been formulated in a way that allows successful analysis.
- You have written a book entitled Classification Made Relevant. What was your objective in developing this book?
We all assume that we know how to classify things. Experience indicates that humans are adept at sorting information into categories, but we seem to lack any innate ability to relate classes of information to other classes, or to produce a self-consistent class hierarchy. The science of classification is a discipline, and like any other discipline, it has principles and techniques that must be mastered. Today, many professionals are involved in building, curating, and using immense data resources. In my previous book, Data Simplification, I reviewed how various types of data structures are created and employed. In Classification Made Relevant, I focus on classifications and ontologies, which are, by far, the most difficult data structures to build. I draw examples from the best classifications in the natural sciences, explaining how classifications are used to advance and unify all scientific disciplines.
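Because intuition alone rarely produces a self-consistent hierarchy, it can help to check one mechanically. The following is a hypothetical Python sketch: `check_classification` takes (child, parent) pairs and reports violations of a few rules assumed here purely for illustration (exactly one root, exactly one parent per class, no undeclared parents, no cycles). Real classifications may impose additional or different rules.

```python
from collections import defaultdict


def check_classification(edges):
    """Report violations of a few assumed consistency rules for a classification.

    `edges` is a list of (child, parent) pairs; the root appears as (root, None).
    Assumed rules: one root, exactly one parent per class, every parent is a
    declared class, and no class is its own ancestor.
    """
    parents = defaultdict(list)
    for child, parent in edges:
        parents[child].append(parent)

    problems = []
    roots = [c for c, ps in parents.items() if ps == [None]]
    if len(roots) != 1:
        problems.append(f"expected one root, found {len(roots)}: {sorted(roots)}")

    for child, ps in parents.items():
        if len(ps) > 1:
            problems.append(f"{child} has multiple parents: {ps}")
        for p in ps:
            if p is not None and p not in parents:
                problems.append(f"{child} has undeclared parent {p}")

    # Walk upward from each class via its first-listed parent;
    # revisiting a class on the same walk means the hierarchy has a cycle.
    for start in parents:
        seen = set()
        cls = start
        while cls is not None and cls in parents:
            if cls in seen:
                problems.append(f"cycle detected starting from {start}")
                break
            seen.add(cls)
            cls = parents[cls][0]
    return problems


consistent = [("Animalia", None), ("Chordata", "Animalia"), ("Mammalia", "Chordata")]
print(check_classification(consistent))

# A typical intuitive error: filing bats under both mammals and birds.
bats_filed_twice = [
    ("Animalia", None),
    ("Mammalia", "Animalia"),
    ("Aves", "Animalia"),
    ("Chiroptera", "Mammalia"),
    ("Chiroptera", "Aves"),
]
print(check_classification(bats_filed_twice))
```

The edge list deliberately allows a class to be listed with several parents so that the check can detect that case. A plain child-to-parent dictionary would silently keep only the last parent and hide the error.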
- For those readers who are new to ontologies and classifications and want a good understanding, what suggestions do you have on how they should approach reading your book?
Most of us have come to think of classifications as data structures that help us search through large collections of information, for the purpose of retrieving particular items of interest. Actually, fast search and retrieval is a job for indexes and catalogs, not classifications. Classifications and ontologies are all about relationships: relationships of objects to other objects and classes to other classes. As we read the text, reviewing hundreds of examples, we should continually ask ourselves how the provided class definitions and relationships help us draw new inferences about classes and their members. Readers who are looking for a general perspective on the subject will find that classifications capture the intrinsic meaning of knowledge domains. For example, the classification of living organisms encapsulates all of the biological sciences, and helps readers make sense of evolution, embryology, anatomy, biochemistry, and medicine, through the agency of a simple data structure.
- How might your book also be used as a key reference for graduate students in data and computer science programs?
To the best of my knowledge, there is simply no other book that shows how classifications are designed, and how classifications can be linked to other data structures. For data scientists, there is a chapter explaining how classifications can be modeled using object-oriented programming languages, and how these modeled classifications can be usefully linked to triples, the fundamental units of meaningful data represented within modern semantic languages.
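The answer only summarizes that chapter, so what follows is a hypothetical Python sketch, not the book's code. It assumes a single-inheritance class hierarchy in which each taxon is a Python class and each data object is an instance carrying a unique identifier. `class_triples`, the identifier `specimen:0001`, and the `weight_kg` attribute are made-up names. The predicate strings borrow the RDF names `rdf:type` and `rdfs:subClassOf`, but no RDF library is used; the triples are plain Python tuples.

```python
class Animalia:
    """Root of the classification; every instance is a data object with a unique identifier."""

    def __init__(self, identifier, **attributes):
        self.identifier = identifier
        self.attributes = attributes

    def triples(self):
        """Describe this object as (subject, predicate, object) tuples."""
        # The class-membership triple links the data object to the classification.
        result = [(self.identifier, "rdf:type", type(self).__name__)]
        # One further triple per recorded attribute.
        for key, value in self.attributes.items():
            result.append((self.identifier, key, value))
        return result


class Chordata(Animalia):
    pass


class Mammalia(Chordata):
    pass


class Felidae(Mammalia):
    pass


def class_triples(root):
    """Emit one subclass triple per class below `root` (single inheritance assumed)."""
    triples = []
    stack = [root]
    while stack:
        cls = stack.pop()
        for sub in cls.__subclasses__():
            triples.append((sub.__name__, "rdfs:subClassOf", cls.__name__))
            stack.append(sub)
    return triples


cat = Felidae("specimen:0001", weight_kg=4.2)

for triple in class_triples(Animalia) + cat.triples():
    print(triple)

# Membership in every ancestor class follows from Python's own inheritance rules.
print(isinstance(cat, Chordata))
```

Modeling the taxa as language-level classes means that membership in every ancestor class comes from the language's inheritance machinery. The triples express the same hierarchy and the same data object in a form that semantic tools can exchange.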
- How do you see the field of data science evolving over the next 5-10 years?
As we collect more and more data, we will need to have classifications that can link to databases, indexes, and ontologies. Our classifications will be composed in any of several semantic languages and ported into a class hierarchy modeled by object-oriented programming languages. To avoid drowning in our data, we will need professionals who understand how classifications can be built and used. Many of our existing classifications will need to be thoroughly redesigned. If all goes well, over the next 5-10 years, professionals will return to the serious study of classifications, a time-honored subject that has been ignored by the bulk of data scientists.
Ready to read this book?
Classification Made Relevant (9780323917865) is available now on ScienceDirect. Or purchase your own copy from the Elsevier bookstore and save 30% + get free shipping with promo code STC30.
Other books from Dr. Jules J. Berman include:
Principles and Practice of Big Data