What Are Probabilistic Graphical Models?

  • Background

    • Emerging from the convergence of electrical engineering (e.g., communications and statistical signal processing) and computer science (e.g., machine learning, data mining, and artificial intelligence), the subject of probabilistic graphical models is a state-of-the-art methodology for statistical inference.
    • Historically, the use of graphs to represent relationships between random variables dates back to the 1920s in statistical physics [1] and genetics [2]. In the 1980s, this graphical language was introduced to the electrical engineering community [4], and around the same time, computer scientists independently formalized the framework of Bayesian networks as a representational and computational tool for statistical inference [3]. Since then, the subject of graphical models has attracted increasing interest in the machine learning and statistical inference research communities. The astonishing power of this methodology was, however, only demonstrated in the mid-1990s, when it revolutionized the field of digital communications: its application to communication over noisy channels enabled reliable communication at rates arbitrarily close to the theoretical limit predicted by Shannon in 1948, thereby solving the long-standing “Shannon’s challenge”. This success has led to modern interest in applying graphical models to inference problems involving a large number of variables; such problems, under the “curse of dimensionality”, are often intractable with conventional statistical techniques. In the past few years, this methodology has been applied to many areas of engineering and applied science and has demonstrated tremendous success (see, e.g., [8–18]). To date, the subject remains an active research area in machine learning and statistics.
  • Introduction

    • Statistics and statistical methods play a central role in many research areas of electrical engineering and computer science, where a research problem of interest often contains components of statistical inference or learning. Typical such areas include signal processing, image processing, communications, networking, data mining, machine learning, and artificial intelligence.
    • A generic setup for inference/learning/statistics may be stated as follows. Given a set of observations, find a “mechanism” that, according to some metric of goodness, best explains the observations. For example, in speech recognition, one is interested in finding which word, phrase, or sentence gave rise to a given speech signal; in image segmentation, one is interested in determining which partitioning of an image into objects best explains the image, which may be corrupted by noise; in communications, one is interested in recovering the transmitted data from a received signal that has been corrupted and distorted by a noisy channel; in networking, a node may be interested in inferring the link-loss rates on some far-away links based on packet-loss statistics measured at the node; in various data-mining applications, one may be interested in finding relationships between features or attributes in a given large database; in molecular genetics, one may be interested in finding gene and protein interaction principles from a set of biochemical or biophysical measurements.
    • There are two key components in solving such problems: first, establishing a model (a family of hypotheses) for the underlying mechanism, and second, performing inference. Consequently, a good model must on the one hand closely reflect reality (over which we have no control in a “science” problem, or partial control in an “engineering” problem, where we may shape it to be simpler or to have certain desired properties), and on the other hand be convenient for computation.
    • Probabilistic graphical models, including Bayesian networks, Markov random fields, and factor graphs, form such a modelling framework. The power of this modelling language is three-fold.
      • It can easily be combined with other standard modelling techniques.
      • It transforms qualitative knowledge (such as cause-and-effect relationships, constraint or interaction relationships, and dependence or conditional-independence relationships) into a quantitative representation.
      • Powerful algorithmic methodologies have been developed within this modelling framework, providing efficient and near-optimal solutions to problems that are otherwise intractable. In this framework, many previously known important algorithms are unified, generalized, and extended. For example, the Hidden Markov Model is a special case of this framework; the Viterbi algorithm and BCJR algorithm in digital communications, Kalman filtering in estimation, and Belief Propagation and Belief Revision in statistical inference are all instances of a generic class of message-passing algorithms within this framework; the Expectation-Maximization algorithm and variational methods are also generalized within this framework (a small message-passing sketch follows this list).
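
As a concrete illustration of the message-passing idea behind belief propagation, the sketch below runs the sum-product algorithm on a three-variable chain of binary variables and checks the resulting marginals against brute-force enumeration. The potentials and variable names are invented for this illustration only; they are not taken from the text, and the example assumes numpy is available.

```python
# Minimal sketch: sum-product (belief propagation) on a chain x1 -- x2 -- x3.
# The joint is p(x1,x2,x3) proportional to
#   phi_1(x1) phi_2(x2) phi_3(x3) psi_12(x1,x2) psi_23(x2,x3).
import numpy as np

# Made-up unary and pairwise potentials (illustrative values only).
phi = [np.array([0.7, 0.3]),        # phi_1(x1)
       np.array([0.5, 0.5]),        # phi_2(x2)
       np.array([0.2, 0.8])]        # phi_3(x3)
psi12 = np.array([[0.9, 0.1],
                  [0.2, 0.8]])      # psi_12(x1, x2)
psi23 = np.array([[0.6, 0.4],
                  [0.3, 0.7]])      # psi_23(x2, x3)

# Forward messages (left to right along the chain).
m12 = psi12.T @ phi[0]              # m_{1->2}(x2) = sum_{x1} phi_1 psi_12
m23 = psi23.T @ (phi[1] * m12)      # m_{2->3}(x3)

# Backward messages (right to left).
m32 = psi23 @ phi[2]                # m_{3->2}(x2) = sum_{x3} psi_23 phi_3
m21 = psi12 @ (phi[1] * m32)        # m_{2->1}(x1)

# Node marginal: local potential times all incoming messages, normalized.
def normalize(v):
    return v / v.sum()

marg = [normalize(phi[0] * m21),
        normalize(phi[1] * m12 * m32),
        normalize(phi[2] * m23)]

# Brute-force check: enumerate the full joint and marginalize directly.
joint = np.einsum('i,j,k,ij,jk->ijk', phi[0], phi[1], phi[2], psi12, psi23)
joint /= joint.sum()
assert np.allclose(marg[0], joint.sum(axis=(1, 2)))
assert np.allclose(marg[1], joint.sum(axis=(0, 2)))
assert np.allclose(marg[2], joint.sum(axis=(0, 1)))
print(marg)
```

On a chain (or any tree-structured graph) these messages yield exact marginals after one forward and one backward pass; the same update rule, applied iteratively on graphs with cycles, gives the approximate "loopy" belief propagation used in decoding and many of the applications mentioned above.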