In this blog post we interview the author of Computer Vision about the fifth edition of his book and ask whether he has any tips for teaching and learning computer vision.
Mark Nixon mentioned in his Foreword for the fifth edition that your book has a ‘unique and pragmatic blend of theory, implementation and algorithms’. What influenced you to take this approach?
My experience of image processing algorithm development showed me, above all, that nice pieces of theory alone are insufficient in guiding one to produce useful algorithms: what matters is making them work in real environments, with all the concomitant problems like noise, ‘clutter’ and distortion.
You are still taking this approach in this edition. Can you give some examples of how the present edition takes this approach?
This is very obvious in the Hough transform chapters, Chs. 10 and 11. These seek to find objects using robust algorithms – not by adding robustness afterwards (which is bound to be too late!) but by making it an integral part of their design. The approach is still there in all the previous material (four 4th edition chapters were compressed pretty well to fit into Chs. 10 and 11 with no real loss!). More appears in the new practical examples of Section 14.4, though these rely on the strongly machine-learning-based EM algorithm. Also, see the simple approach to face detection in Section 21.2 (though this is illustrative rather than an algorithm for serious use).
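To give a flavour of the built-in robustness these chapters discuss, here is a minimal illustrative sketch (not taken from the book) of straight-line detection by Hough voting: each edge point votes for every (θ, ρ) line it could lie on, so collinear points reinforce a single accumulator cell while clutter points scatter their votes harmlessly.

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Hough transform for straight lines, using the normal parameterisation
// rho = x*cos(theta) + y*sin(theta). Each point votes in every theta bin;
// the accumulator peak gives the dominant line despite clutter and noise.
std::pair<int, int> houghPeak(const std::vector<std::pair<int, int>>& pts,
                              int rhoMax) {
    const double PI = 3.14159265358979323846;
    // accumulator: 180 theta bins (degrees) x rho bins offset by rhoMax
    std::vector<std::vector<int>> acc(180, std::vector<int>(2 * rhoMax + 1, 0));
    for (const auto& p : pts)
        for (int t = 0; t < 180; ++t) {
            double th = t * PI / 180.0;
            int rho = (int)std::lround(p.first * std::cos(th) +
                                       p.second * std::sin(th));
            if (rho >= -rhoMax && rho <= rhoMax) ++acc[t][rho + rhoMax];
        }
    int bestT = 0, bestR = 0, bestV = -1;          // scan for the peak
    for (int t = 0; t < 180; ++t)
        for (int r = 0; r <= 2 * rhoMax; ++r)
            if (acc[t][r] > bestV) { bestV = acc[t][r]; bestT = t; bestR = r - rhoMax; }
    return {bestT, bestR};  // (theta in degrees, rho in pixels)
}
```

Feeding it ten points on the vertical line x = 5 plus three clutter points still yields the peak (θ = 0°, ρ = 5): the clutter never accumulates enough votes to compete, which is exactly the robustness-by-design the chapters describe.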
In the many years you have been teaching computer vision, what typical problems do students have in mastering the subject?
- Students find the design of thinning algorithms quite tricky, with the absolute necessity to maintain connectedness while thinning object limbs. Here there are mathematical subtleties to understand. These are also present in boundary tracking. Such trickiness is compounded when sequential algorithms are used (especially inadvertently!) instead of more robust parallel algorithms.
- Other problems arise as humans are disposed to thinking vision must be simpler than it is: a case in point is that of occlusion – easily coped with by humans but needing exactly the right algorithms for robust computer vision.
- In machine learning, students struggle to fully understand the principles of training, including the requirement that training data be sufficient and of the right type.
- Nowadays, students’ facility with standard modules in standard packages gives them over-confidence that they understand the subject, even though they have never had to consider the detailed design of any of those modules.
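The connectedness subtlety in thinning mentioned above can be illustrated with the classic ‘crossing number’ test used by parallel thinning algorithms such as Zhang–Suen; this sketch (illustrative, not the book's code) shows why a pixel in the middle of a one-pixel-wide limb must never be deleted.

```cpp
#include <array>
#include <cassert>

// Crossing-number connectivity test used in parallel thinning: the 8
// neighbours of a pixel are scanned in circular order (N, NE, E, SE, S,
// SW, W, NW) and the 0->1 transitions are counted. A boundary pixel may
// only be deleted when this count is exactly 1 -- any larger value means
// deleting it would split the object locally into separate parts.
int crossingNumber(const std::array<int, 8>& nbr) {
    int transitions = 0;
    for (int i = 0; i < 8; ++i)
        if (nbr[i] == 0 && nbr[(i + 1) % 8] == 1) ++transitions;
    return transitions;
}

bool safeToDelete(const std::array<int, 8>& nbr) {
    return crossingNumber(nbr) == 1;  // deletion preserves connectedness
}
```

A pixel in the middle of a horizontal one-pixel limb has only its E and W neighbours set, giving a crossing number of 2, so it is correctly protected; a pixel on the edge of a solid region gives 1 and may be thinned away. Full algorithms add further conditions (e.g. bounds on the neighbour count) to preserve limb endpoints.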
What advice do you have for instructors who are new to teaching an introductory computer vision course, so that they can help students overcome some of these problems?
By all means let the students progress as far and as fast as possible using standard packages with standard modules, but have them spend some time looking at how to design or redesign relevant modules, and how to design totally new ones.
You have made many changes to this new edition, not least including chapters on Machine Learning and Deep Learning Networks. Can you say how you are covering these?
- Machine learning is tricky because of the amount of probability theory and associated maths involved (which many students are going to be weak on). Furthermore, each part of the subject progresses in its own way, using its own optimisation criteria plus a whole variety of optimisation methods. Often there is no best solution, and across the whole subject area completely different solutions have to be developed. I have chosen, for example, to show the whole of one area of solution (relating to the EM algorithm) and to hide little, so as not to pretend to students that things are trivial, but rather to show that there are important things to bear in mind in reaching a proper solution. See particularly the theory of Section 14.12, Boosting with Multiple Classes. I have tried my best to do this throughout Ch. 14.
- Deep learning networks have grown up incredibly quickly, and I have aimed to give a reasoned chronological development of the methods used in these remarkable networks. In the end, they are limited by the amount of data they are supplied with: but also there are rules about the interrelatedness of the huge sets of supplied data. So the name of the game in vision has changed from being algorithm centred or even training centred to being data management centred – i.e., the vision engineer now suddenly has to look in these totally different directions to get a full view of what is going on. I have tried to reveal what I can in one chapter (Ch. 15) plus a face detection/recognition chapter (Ch. 21) that makes strong use of deep learning. I have also tried to indicate that changing the orientation of the subject brings its own dangers and that the conventional orientation (i.e., algorithmic rather than purely learning based) must not be lost sight of.
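For readers curious about the EM algorithm referred to above, the following is a minimal illustrative sketch (again, not the book's code) of EM fitting a two-component 1-D Gaussian mixture: the E-step computes each sample's responsibility under the current model, and the M-step re-estimates the weights, means and variances from those responsibilities.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Two-component 1-D Gaussian mixture fitted by Expectation-Maximisation.
struct Gmm2 { double w[2], mu[2], var[2]; };  // weights, means, variances

static double gauss(double x, double mu, double var) {
    const double TWO_PI = 6.283185307179586;
    return std::exp(-(x - mu) * (x - mu) / (2.0 * var)) / std::sqrt(TWO_PI * var);
}

Gmm2 fitEM(const std::vector<double>& x, Gmm2 g, int iters) {
    std::vector<double> r(x.size());  // responsibility of component 0
    for (int it = 0; it < iters; ++it) {
        // E-step: posterior probability that each sample came from comp. 0
        for (size_t i = 0; i < x.size(); ++i) {
            double p0 = g.w[0] * gauss(x[i], g.mu[0], g.var[0]);
            double p1 = g.w[1] * gauss(x[i], g.mu[1], g.var[1]);
            r[i] = p0 / (p0 + p1);
        }
        // M-step: re-estimate each component from its weighted samples
        for (int k = 0; k < 2; ++k) {
            double n = 0, m = 0, v = 0;
            for (size_t i = 0; i < x.size(); ++i) {
                double rik = (k == 0) ? r[i] : 1.0 - r[i];
                n += rik; m += rik * x[i];
            }
            m /= n;
            for (size_t i = 0; i < x.size(); ++i) {
                double rik = (k == 0) ? r[i] : 1.0 - r[i];
                v += rik * (x[i] - m) * (x[i] - m);
            }
            g.w[k] = n / x.size();
            g.mu[k] = m;
            g.var[k] = std::max(v / n, 1e-6);  // floor stops variance collapse
        }
    }
    return g;
}
```

On six samples clustered near 1.0 and 5.0, a few iterations from a rough initialisation recover the two cluster means and equal mixing weights; note how each part of the model has its own re-estimation formula, which is exactly the ‘own optimisation criteria’ point made above.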
New to this edition is the inclusion of MATLAB and C++ code. Can you say how students and instructors can benefit from these?
Ordinary C++ code was included in the book itself from the 4th edition onwards. It was there to give concreteness to the coding of algorithms. And it avoided the separate problems of parallelisation (evident in MATLAB), which introduce a layer of trickiness that is one difficulty too many for some – especially when concentrating on the underlying concepts of the subject.
For this edition, the MATLAB code is being added to the book website, as MATLAB is now a common coding currency and will be wanted by many. However, students’ capabilities, knowledge and experience in this direction will vary widely, both for individuals and between one body of students and another. Here, there is no substitute for the local instructor providing individual and group guidance. Nevertheless, the commonalities between the needs may in some cases be quite close: to this end I am providing as much help as I can in the way of overviews, guidance, general observations and examples.
At the Elsevier booth at the CVPR conference many researchers have said they learnt computer vision from your textbook. Can you give some advice on how someone could use your book to teach themselves computer vision?
For prospective authors the best advice is never to start with a blank sheet of paper – it’s too intimidating! The problem is similar when trying to learn from a book: reading, then doing more reading, and then even more, can leave one thinking one knows what one has read, but then it will gradually fade away. Even with revision things will still fade away. However, in vision one cannot get a full feel for the subject, nor retain it or grasp its significance, without processing pictures and looking at the results. Fortunately, in vision the data is often immediately understandable to the student – e.g., a picture in which the pedestrians or other objects have been highlighted in boxes. And seeing the result of an algorithm in which the boxes are misplaced, poorly orientated, wrongly sized, etc., makes any interpretation faults all too obvious; importantly, what to do about them also becomes much clearer. In this way, developing algorithms becomes part of the self-learning loop, and in general it is unwise to read too far without verifying by such practical demonstrations that the ideas one has been obtaining from reading are indeed correct.
Key features of the fifth edition include:
- Examples and applications that give the ‘ins and outs’ of developing real-world vision systems, showing the realities of practical implementation.
- Tailored programming examples—code, methods, illustrations, tasks, hints and solutions (mainly involving MATLAB and C++)—will be available.
- Three new chapters on Machine Learning emphasise the way the subject has been developing:
– Two chapters cover Basic Classification Concepts and Probabilistic Models.
– The third covers the principles of Deep Learning Networks and shows their impact on computer vision, reflected in a new chapter, Face Detection and Recognition.
Want to read more? Computer Vision: 5th Edition is available to pre-order now on Elsevier.com; save up to 30% when you order by entering code STC317 at the checkout. The book will also be available on ScienceDirect when it is published.