École Doctorale d'Informatique,
Télécommunications et Électronique de Paris


École Nationale Supérieure des Télécommunications




Thesis

presented to obtain the degree of Doctor
of the École Nationale Supérieure des Télécommunications

Specialty: Electronics and Communications


Amira ALLOUM


Subject:


CONSTRUCTION AND ANALYSIS OF
NON-SYSTEMATIC CODES ON GRAPH FOR
REDUNDANT DATA
Contents

Contents

List of Figures

List of Tables

Acronyms

Introduction

1 Non Systematic Constructions of Codes on Graph for Redundant Data
  1.1 Introduction
      1.1.1 Motivation and Related Work
      1.1.2 Contributions
  1.2 System Model and Notations
  1.3 Information Theoretical Limits for Non Uniform Sources
      1.3.1 AWGN Channel
      1.3.2 Binary Erasure Channel
      1.3.3 Binary Symmetric Channel
  1.4 Design Principles for Non Systematic Codes on Graph
      1.4.1 Preliminaries on LDPC Codes
      1.4.2 Mackay-Neal Codes
      1.4.3 Non Systematic LDPC Framework
            1.4.3.1 Scramble-LDPC Construction
            1.4.3.2 Split-LDPC Construction
      1.4.4 Information Theoretical Comparison of Splitting and Scrambling
  1.5 Source Controlled Sum-Product Decoding
  1.6 Multi-Edge Classification of the Non Systematic LDPC Codes
  1.7 Simulation and Results
  1.8 Conclusions

2 Density Evolution Analysis for Split-LDPC Codes
  2.1 Introduction
      2.1.1 Motivation and Related Work
      2.1.2 Contributions
  2.2 Preliminaries on Density Evolution
      2.2.1 Concentration and the Local Tree Assumption
      2.2.2 Symmetry Assumptions and the All-One Codeword Restriction
      2.2.3 General Statement of Density Evolution
  2.3 Statement of Split-LDPC Density Evolution
  2.4 Analytic Properties of Split-LDPC Density Evolution
      2.4.1 The Consistency Condition
      2.4.2 Monotonicity and Convergence to Fixed Points
      2.4.3 Thresholds and Density Evolution Fixed Points
  2.5 The Stability Analysis of Split-LDPC Density Evolution
      2.5.1 The Stability Condition for Systematic LDPC Codes
      2.5.2 The Stability Condition for the Non Systematic Split-LDPC Code
  2.6 Simulations and Results
  2.7 Conclusions

3 EXIT Chart Analysis and Design of Irregular Split-LDPC Codes
  3.1 Introduction
      3.1.1 Motivation and Related Work
      3.1.2 Contributions
  3.2 Preliminaries on the EXIT Chart Analysis of LDPC Codes
      3.2.1 The EXIT Charts Principle
      3.2.2 The Semi-Gaussian Approximation and the Related Metrics
      3.2.3 Analysis of Regular Structures
      3.2.4 Analysis of Irregular Structures
  3.3 The Two-Dimensional EXIT Chart Analysis of Split-LDPC Codes
      3.3.1 Analysis of Regular Structures
      3.3.2 Analysis of Irregular Structures
  3.4 Design of Irregular Split-LDPC Codes
  3.5 Conclusions

4 Enhancing Iterative Decoding via EM Source-Channel Estimation
  4.1 Introduction
      4.1.1 Motivation and Related Work
      4.1.2 Contributions
  4.2 System Model and Notations
      4.2.1 Source State Information
      4.2.2 Channel State Information
  4.3 General Statement of the EM Algorithm
  4.4 EM Application to the BECBSC
      4.4.1 Expectation Step on the BECBSC
      4.4.2 Maximization Step on the BECBSC
      4.4.3 Two Simple Special Cases: BEC and BSC
  4.5 EM Application to the AWGN Channel
      4.5.1 Expectation Step on the AWGN Channel
      4.5.2 Maximization Step on the AWGN Channel
  4.6 Simulations and Results
  4.7 Conclusions

5 Conclusions and Perspectives

Bibliography
List of Figures

 1.1    Block Diagram for a Source Controlled Channel Coding System
 1.2    Minimum achievable Ebr/N0 vs. source entropy Hs, coding rate R = 0.5
 1.3    Minimum achievable SNR versus coding rate in AWGN, Hs = 0.5
 1.4    Capacity limit versus source entropy for BEC and BSC
 1.5    Factor Graph of an LDPC Code
 1.6    Mackay-Neal Codes: Block and Tanner Graph descriptions
 1.7    General Non-Systematic LDPC Encoding Framework
 1.8    Scramble-LDPC Tanner Graph and Block Diagram Description
 1.9    Split-LDPC Tanner Graph
 1.10   General Structure of a Split-LDPC Parity-Check Matrix
 1.11   Mutual information vs. Ebr/N0 for Hs = 0.5 and coding rates R = 0.5, 0.8
 1.12   Minimum achievable Ebr/N0 vs. source entropy Hs, coding rate R = 0.8
 1.13   The local message node to checknode update rule
 1.14   The local checknode to bitnode update rule
 1.15   Multi-edge graphical interpretation of Split- and Scramble-LDPC
 1.16   Performance of systematic and non-systematic LDPC with R = 0.5, µ = 0.1
 1.17   Performance of systematic and non-systematic LDPC with R = 0.5, µ = 0.2
 1.18   Performance of systematic and non-systematic LDPC with R = 0.9, µ = 0.1
 1.19   The effect of splitter degree variation over Split-LDPC codes with R = 0.5

 2.1    The tree describing a regular (db, dc) LDPC
 2.2    Tree representation for type-1 messages for rate R = 1/2 split-LDPC
 2.3    Tree representation for type-2 messages for rate R = 1/2 split-LDPC
 2.4    Tree representations for type-3 messages for rate R = 1/2 split-LDPC
 2.5    Consistency of the splitter output distribution
 2.6    The equivalent channel seen by the core LDPC
 2.7    Threshold versus source entropy variation for the Split-LDPC

 3.1    EXIT chart analysis: general principle
 3.2    The evolution of densities at the bitnode output (left) and checknode
        output (right) for a regular (3, 6) LDPC at 1.18 dB
 3.3    Elementary GA charts with single Gaussian pdf input for different variable
        degrees for a regular LDPC with dc = 6 on an AWGN channel at SNR = -2.0 dB
 3.4    Elementary charts with Gaussian mixture input for the different variable
        node degrees for an irregular LDPC with ρ6 = 0.7855, ρ7 = 0.2145 and
        λ(x) = 0.3266x + 0.1196x^2 + 0.1839x^3 + 0.3698x^4 on an AWGN channel at
        SNR = -2.0 dB
 3.5    EXIT chart trajectory with Gaussian mixture input for an irregular LDPC
        with ρ6 = 0.7855, ρ7 = 0.2145 and λ(x) = 0.3266x + 0.1196x^2 + 0.1839x^3 +
        0.3698x^4 on an AWGN channel at SNR = -2.0 dB (the tunnel is open)
 3.6    Tree representation for type-1 messages for rate R = 1/2 split-LDPC
 3.7    Tree representation for type-2 messages for rate R = 1/2 split-LDPC
 3.8    Tree representations for type-3 messages for rate R = 1/2 split-LDPC
 3.9    Transfer chart F(x, y) without mixtures. Illustration for ds = db = 3,
        dc = 6, Es/N0 = -5.00 dB. Input distributions are single Gaussian.
        Entropy Hs = 0.5
 3.10   Trajectory of error probability near the code threshold over the surface
        associated to the transfer chart F(x, y) without mixtures. Illustration for
        ds = db = 3, dc = 6, right of the threshold: Es/N0 = -5.00 dB. Input
        distributions are single Gaussian. Entropy Hs = 0.5
 3.11   Open tunnel obtained from the EXIT chart of the ds = db = 3, dc = 6
        regular split-LDPC at Es/N0 = -5 dB, entropy Hs = 0.5. The tunnel
        is made by plotting the trajectory of error probability and its z = x plane
        reflection
 3.12   Transfer chart F(x, y) with mixtures. Illustration for a (ds = 3, db = 3,
        dc = 6) regular LDPC code at Es/N0 = -5.00 dB. Input distributions are
        Gaussian mixtures. Channel initial point at top of the sail. Fixed point of
        zero error rate at bottom of the sail. Entropy Hs = 0.5
 3.13   Transfer chart F(x, y) with and without mixtures and associated error
        probability trajectories. Illustration for a regular LDPC code at
        Es/N0 = -5.00 dB. Channel initial point at top of the sail. Fixed point of
        zero error rate at bottom of the sail. Entropy Hs = 0.5
 3.14   Trajectory of error probability near the code threshold. Illustration for an
        irregular split-LDPC code with λ(x) = 0.3266x + 0.1196x^2 + 0.18393x^3 +
        0.36988x^4, ρ(x) = 0.78555x^5 + 0.21445x^6. Right of the threshold:
        Es/N0 = -5.58 dB, threshold = -5.68 dB. Final fixed point is 0.
        Entropy Hs = 0.5
 3.15   Trajectory of error probability near the code threshold. Illustration for an
        irregular split-LDPC code with λ(x) = 0.3266x + 0.1196x^2 + 0.18393x^3 +
        0.36988x^4, ρ(x) = 0.78555x^5 + 0.21445x^6. Left of the threshold:
        Es/N0 = -5.78 dB, threshold = -5.68 dB. Final fixed point is non-zero.
        Entropy Hs = 0.5
 3.16   Stability of fixed points beyond the code threshold. Illustration for an
        irregular split-LDPC code with λ(x) = 0.3266x + 0.1196x^2 + 0.18393x^3 +
        0.36988x^4, ρ(x) = 0.78555x^5 + 0.21445x^6. Left of the threshold:
        Es/N0 = -5.78 dB, threshold = -5.68 dB. Stable and unstable fixed points.
        Entropy Hs = 0.5
 3.17   Open tunnel obtained from the EXIT chart of an irregular split-LDPC with
        λ(x) = 0.3266x + 0.1196x^2 + 0.18393x^3 + 0.36988x^4, ρ(x) = 0.78555x^5 +
        0.21445x^6, at Es/N0 = -5.58 dB, entropy Hs = 0.5. The tunnel is made by
        plotting the trajectory of error probability and its z = x plane reflection

 4.1    General model of a coded communication system with EM estimation
 4.2    General model for the EM source-channel estimation
 4.3    Estimated source distribution µ versus EM iterations
 4.4    EM performance for the (3,6) LDPC over BEC
 4.5    EM performance for systematic and non-systematic LDPC over AWGN
List of Tables

 1.1   Multinomial description of Split (top) and Scramble (bottom) LDPC

 2.1   Threshold table for regular Split-LDPC with R = Hs = 1/2 and ds ∈ {3, 5}
 2.2   Threshold tables for regular Split-LDPC with ds = 3, R = 1/2 and Hs ∈ {1, 0.3}
 2.3   Threshold table for irregular Split-LDPC with R = Hs = 1/2 and ds = 3

 3.1   Error in approximation of threshold (dB) using EXIT chart analysis for
       various regular split-LDPC codes of rate one-half with ds = 3 and Hs = 0.5.
       ∆ is the log-ratio quantization step
 3.2   Threshold table for irregular split-LDPC with Hs = 1/2 and ds = 3
ACRONYMS


    Here we list the main acronyms used in this document. The meaning of each acronym
is usually indicated once, the first time it appears in the text.


JSCC      Joint Source Channel Coding
SCCD      Source Controlled Channel Decoding
LDPC      Low Density Parity Check Code
MN        Mackay-Neal
AWGN      Additive White Gaussian Noise
BIAWGN    Binary Input Additive White Gaussian Noise
BEC       Binary Erasure Channel
BSC       Binary Symmetric Channel
QLI       Quick Look In
pdf       Probability density function
pmf       Probability mass function
i.i.d.    independent and identically distributed
BPSK      Binary Phase Shift Keying
LDGM      Low Density Generator Matrix
LLR       Log Likelihood Ratio
SNR       Signal to Noise Ratio
DE        Density Evolution
EM        Expectation Maximization
CSI       Channel State Information
SSI       Source State Information
BECBSC    Binary Symmetric Channel with Erasures
APP       A Posteriori Probability
BWT       Burrows-Wheeler Transform
EXIT      Extrinsic Information Transfer
MIMO      Multi Input Multi Output
GA        Gaussian Approximation
Introduction

    In the past decade, the field of modern coding theory has attracted intense interest
and effort, directed at the design of capacity-achieving codes defined on graphs and the
analysis of the associated iterative decoding algorithms [52], [72].
    This excitement about codes on graphs and iterative techniques was ignited in the
mid-1990s by the excellent performance exhibited by the turbo codes of Berrou and Glavieux
[17] and by the rediscovered Gallager codes [33].
    Since then, iterative methods have influenced a wide range of applications within
and beyond communications, enabling the development of sophisticated communication
systems.
    In most of those communication systems, the channel coding procedures are designed
independently of the source statistics, assuming uniformly distributed sources. This ap-
proach is justified by Shannon's famous separation theorem [83], which states that separate
source and channel coding incurs no loss of optimality as long as the entropy of the source
is less than the capacity of the channel, provided that the blocklength goes to infinity.
    Because of various considerations (such as legacy and the difficulty of adapting uni-
versal data compressors to short-packet transmission systems), certain existing applica-
tions do not compress the redundant data prior to channel encoding, including third-
generation wireless data transmission systems [60]. In other applications, such as the
Global System for Mobile Communications (GSM) second-generation cellular wireless
system, the vocoder leaves some residual redundancy prior to channel encoding [111],
[64], [108], [60]. In those cases, the signal sent through the channel incorporates redun-
dancy due both to its channel encoder and to the data itself.
    When Shannon established the famous separation theorem, he already intuited that
any redundancy in the source will usually help if it is utilized at the receiving point [83].
In other words, when the residual source redundancy is ignored by the receiver, losses
are induced and the system performs below its potential.
    The first practical embodiment of this idea was the Source Controlled Channel De-
coding (SCCD) technique proposed by Hagenauer in [39], which consists in exploiting the
redundancy as a priori information during the iterative decoding process.
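    The SCCD principle can be sketched in a line of LLR arithmetic (a standard relation; the function names and the LLR sign convention here are our own illustration, not notation from the thesis): a source with P(b = 1) = µ contributes an a priori LLR ln((1 - µ)/µ) that is simply added to the channel LLR at each data bit node.

```python
import math

def source_llr(mu):
    """A priori LLR of a biased bit with P(b = 1) = mu (convention ln P0/P1)."""
    return math.log((1 - mu) / mu)

def posterior_llr(channel_llr, mu):
    """SCCD-style combination: the source prior simply adds to the channel LLR."""
    return channel_llr + source_llr(mu)

# A redundant source (mu = 0.1) biases the decision toward 0: a weakly
# negative channel observation is overruled by the prior.
print(round(source_llr(0.1), 3))      # ln(9), about 2.197
print(posterior_llr(-1.0, 0.1) > 0)   # True: decide 0 despite the channel
```

A strongly negative channel LLR still wins, so the prior refines rather than replaces the channel evidence.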
    Besides, Shamai and Verdú have shown in [78] that the use of non-systematic coding
schemes is more appropriate in the presence of redundant sources, because those con-
structions supply the capacity-achieving distribution to the channel.
    In the light of these two pioneering contributions, several research directions have
been drawn, in which the most important concepts encountered during the study of codes
on graphs under the uniform-source assumption are to be carried over to the non-uniform
case.
    Some of the major concerns of those directions are:



    • Investigating the new information-theoretic limits in the presence of redundant sources.

    • Designing capacity-achieving non-systematic codes on graphs.

    • Defining the associated source controlled iterative decoding algorithms.

    • Building analysis tools in order to investigate the asymptotic performance of non-
      systematic codes in the presence of redundancy, and to design the best constructions.
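    As a numerical illustration of the first of these directions (our own sketch, not code from the thesis): by the separation theorem, reliable transmission of a rate-R code carrying a source of entropy Hs requires R·Hs ≤ C, so over the BEC the limiting erasure rate is 1 - R·Hs, and over the BSC the limiting crossover probability solves h2(p) = 1 - R·Hs.

```python
import math

def h2(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bec_limit(R, Hs):
    """Largest BEC erasure rate eps such that R * Hs <= 1 - eps."""
    return 1.0 - R * Hs

def bsc_limit(R, Hs):
    """Largest BSC crossover p such that R * Hs <= 1 - h2(p), by bisection."""
    need = R * Hs                  # required capacity, bits per channel use
    lo, hi = 0.0, 0.5              # h2 is increasing on [0, 0.5]
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if 1.0 - h2(mid) > need:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# A rate-1/2 code with a uniform source (Hs = 1) tolerates eps up to 0.5;
# with a redundant source of entropy Hs = 0.5 the limit relaxes to 0.75.
print(bec_limit(0.5, 1.0), bec_limit(0.5, 0.5))
print(round(bsc_limit(0.5, 0.5), 3))   # about 0.214
```

The relaxed limits quantify how much a receiver exploiting the residual redundancy can gain over one that ignores it.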

    These guidelines meet the challenges of the present thesis, whose purpose is to
introduce the design principles and the analysis tools for a novel class of non-systematic
low-density parity-check codes dedicated to non-uniform memoryless sources. In partic-
ular, we study the best-performing scheme of the class, namely the split-LDPC code,
which consists of the concatenation of a precoder with an LDPC code [5], [80].
    In 1997, Mackay and Neal pioneered the field with a class of non-systematic codes
called 'MN codes', obtained by scrambling a low-density generator matrix [53], [51]. One
of the main drawbacks of MN codes is that their decoder is specifically designed for the
binary symmetric channel. Another major proposal was made by Alajaji for non-systematic
parallel turbo codes in [116].
    Afterwards, in 2005, G. Shamir and J. J. Boutros introduced a novel general ap-
proach to the design of a special class of non-systematic low-density parity-check codes
dedicated to non-uniform memoryless sources. This code family includes MN codes as a
particular case.
    First, the authors proposed several design configurations based on the concatenation
of a precoder or a postcoder with an LDPC code. We went on with further investigations
and found that the best-performing scheme consists of the concatenation of a precoder
with an LDPC code, namely the split-LDPC [5], [80].
    In this thesis we focus on the design of split-LDPC codes in the presence of redundant
memoryless data, and on the asymptotic analysis of the associated source-controlled sum-
product decoding.
    First, we investigate the new theoretical limits in the presence of non-uniform
sources and show theoretically the advantage of non-systematic constructions over sys-
tematic ones. Then we introduce the novel class of non-systematic LDPC codes and
establish the superiority, in terms of performance, of splitting structures over all other
non-systematic codes-on-graph constructions, including MN codes.
    Second, we perform the asymptotic analysis of split-LDPC codes through new tools
obtained by adapting the classical general theory of density evolution and EXIT chart
analysis to the split-LDPC case and the non-uniform source assumption. Using these
tools, we address the problem of designing good irregular split-LDPC constructions.
    Finally, we extend our communication setting to practical scenarios where the source
and channel parameters are unknown. To do so, we investigate joint decoding and esti-
mation of the source and channel parameters, using an iterative estimation technique
based on the Expectation Maximization (EM) algorithm.
    The material covered in this thesis has the special appeal that it unifies many themes
of information theory, coding, and communication.
    The outline of this thesis is as follows:



  • In Chapter 1, we demonstrate, using information-theoretic tools, why non-systematic
    constructions exhibit the best theoretical limits in the presence of redundancy. In
    order to attain these limits, we introduce a universal family of non-systematic LDPC
    codes extending Mackay's construction and well suited to SCCD sum-product de-
    coding. The proposed configurations are based on the concatenation of a precoder
    or a postcoder with an LDPC code. The precoding module consists of a sparse
    matrix ('scrambler') or the inverse of a sparse matrix ('splitter') dedicated to trans-
    forming the redundant data bits into uniformly distributed coded bits. We show
    how to perform source controlled iterative decoding (SCCD) on the factor graph
    associated to the non-systematic LDPC codes. We establish the superiority, in terms
    of performance, of splitting structures over prior non-systematic LDPC construc-
    tions as well as over the other realizations of our codes. The material of this chapter
    is reported in part in [5] and [80].
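  To make the scrambling idea concrete, the following sketch (our own illustration, with hypothetical helper names, not code from the thesis) multiplies a heavily biased bit vector by a random sparse GF(2) matrix and checks that the output bits move toward a uniform distribution:

```python
import random

def sparse_gf2_matrix(n, row_weight, rng):
    """Each row selects `row_weight` random positions: a sparse GF(2) matrix."""
    return [rng.sample(range(n), row_weight) for _ in range(n)]

def scramble(bits, rows):
    """GF(2) matrix-vector product: each output bit is a parity of a few inputs."""
    return [sum(bits[j] for j in row) % 2 for row in rows]

rng = random.Random(0)
n = 10000
mu = 0.1                                  # P(bit = 1): a very redundant source
bits = [1 if rng.random() < mu else 0 for _ in range(n)]
rows = sparse_gf2_matrix(n, row_weight=3, rng=rng)
out = scramble(bits, rows)

# XOR-ing d i.i.d. biased bits gives P(1) = (1 - (1 - 2*mu)**d) / 2,
# which tends to 1/2 as the row weight d grows.
print(sum(bits) / n)   # about 0.10
print(sum(out) / n)    # about (1 - 0.8**3) / 2 = 0.244
```

  With row weight 3 the output is not yet uniform (about 0.244); larger row weights push it arbitrarily close to 1/2, at the cost of a denser graph.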

  • Chapter 2 aims to determine the asymptotic performance of split-LDPC codes and
    to evaluate how close they are to the theoretical limits. To this end, we derive a
    message-oriented density evolution analysis of the split-LDPC source controlled
    sum-product decoding. The basic concepts, features, assumptions and results of
    the general theory are adapted to the split-LDPC construction and the presence of
    redundant data. The analysis reveals that the split-LDPC construction exhibits good
    asymptotic performance. In order to make our analysis complete, we perform a
    stability analysis of the dynamical system associated to the split-LDPC density
    evolution algorithm. We derive a general stability condition for the split-LDPC code
    involving the source statistics as well as the splitter structure. We investigate whether
    the stability of the whole system is related to the stability of the constituent LDPC
    code for a given channel condition. The material of this chapter is reported in part
    in [5] and [4].
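  In its simplest classical setting — a regular LDPC over the BEC with a uniform source, not the split-LDPC recursion derived in this chapter — density evolution reduces to a scalar recursion on erasure probabilities, and the decoding threshold can be found by bisection (our own sketch):

```python
def converges(eps, db, dc, iters=5000, tol=1e-7):
    """BEC density evolution for a regular (db, dc) LDPC:
    x_{l+1} = eps * (1 - (1 - x_l)^(dc-1))^(db-1)."""
    x = eps
    for _ in range(iters):
        x = eps * (1.0 - (1.0 - x) ** (dc - 1)) ** (db - 1)
        if x < tol:
            return True        # erasure probability driven to zero
    return False

def threshold(db, dc):
    """Largest erasure probability with vanishing error, by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        if converges(mid, db, dc):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(round(threshold(3, 6), 4))   # classical (3,6) value, about 0.4294
```

  The split-LDPC analysis of this chapter generalizes this one-dimensional recursion to several message types and to densities conditioned on the source statistics.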

  • In Chapter 3, we propose a message-oriented two-dimensional EXIT chart analysis
    for split-LDPC codes. Our approach is based on dropping the Gaussian assumption
    at the output of the checknodes, and on using mutual information as a measure of
    approximation. By doing so, we compute split-LDPC code thresholds to within a
    few thousandths of a decibel, at a lower complexity than density evolution ap-
    proaches. Through the formulation of the two-dimensional EXIT chart analysis of
    split-LDPC codes, the source controlled iterative decoder is depicted as a two-
    dimensional dynamical system. Hence, the graphical representation of the charts
    brings more insight into the process of convergence to fixed points, as well as into
    the stability issues of the associated iterative decoding system. Finally, within the
    framework of our proposal, we formulate the problem of designing irregular split-
    LDPC codes as a linear program. We propose practical algorithms to solve this
    problem and design capacity-achieving split-LDPC codes. Hence, we obtain irreg-
    ular split-LDPC constructions with a significantly reduced gap to the Shannon
    limit. The material of this chapter is reported in part in [4].
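  For intuition about EXIT tunnels, consider the BEC, where the EXIT functions of a regular LDPC are exact and one-dimensional (unlike the two-dimensional split-LDPC charts of this chapter); whether the tunnel between the variable node and check node curves is open can then be checked in a few lines — our own illustrative sketch:

```python
def exit_tunnel_open(eps, dv, dc, steps=1000):
    """BEC EXIT curves for a regular (dv, dc) LDPC:
      variable nodes: I_EV(I_A) = 1 - eps * (1 - I_A)^(dv-1)
      check nodes:    I_EC(I_A) = I_A^(dc-1)
    Iterating the two curves drives the mutual information to 1
    if and only if the tunnel stays open."""
    a = 0.0
    for _ in range(steps):
        v = 1.0 - eps * (1.0 - a) ** (dv - 1)   # variable node update
        a = v ** (dc - 1)                        # check node update
    return a > 0.999

# The (3,6) BEC threshold is about 0.4294: open just below, closed above.
print(exit_tunnel_open(0.42, 3, 6), exit_tunnel_open(0.44, 3, 6))
```

  Above the threshold the iteration stalls at a fixed point of the chart, the graphical counterpart of the closed tunnel discussed in this chapter.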

  • Chapter 4 is devoted to the practical scenarios where the source and channel pa-
    rameters are unknown and an estimation module is required. We therefore inves-
    tigate joint decoding and estimation of the source and channel parameters, using
    an iterative estimation technique based on the Expectation Maximization (EM) al-
    gorithm. We describe how the estimation technique can be integrated into the non-
    systematic decoding process, and we propose a scheduling for the interaction be-
    tween the sum-product decoder and the EM estimation module. Our approach
    covers both systematic and non-systematic LDPC codes, and can be extended to
    any other block code [58] decoded by the sum-product algorithm. Our study is
    applied to the discrete binary symmetric channel with erasures (BECBSC) and to
    the continuous complex additive white Gaussian noise channel (AWGN). The sim-
    ulation results confirm that the EM estimation technique incurs no loss in error-
    rate performance with respect to the perfect-knowledge case, at the expense of a
    negligible complexity increase. Hence, we show that the decoder operating blindly
    performs as well as with perfect knowledge. The material of this chapter is reported
    in part in [19].
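  As a toy sketch of the EM idea (our own stand-alone illustration with hypothetical names; the thesis interleaves the estimation with sum-product decoding), the source bias µ of a Bernoulli source observed through a known BSC can be re-estimated from posterior bit probabilities:

```python
import random

def em_source_bias(y, p, mu0=0.5, iters=50):
    """EM estimate of the source bias mu = P(b = 1) from BSC(p) observations y.
    E-step: posterior q_i = P(b_i = 1 | y_i; mu).  M-step: mu = mean(q_i)."""
    mu = mu0
    for _ in range(iters):
        total = 0.0
        for yi in y:
            if yi == 1:
                q = mu * (1 - p) / (mu * (1 - p) + (1 - mu) * p)
            else:
                q = mu * p / (mu * p + (1 - mu) * (1 - p))
            total += q
        mu = total / len(y)       # M-step
    return mu

rng = random.Random(1)
p, mu_true, n = 0.1, 0.2, 20000
bits = [1 if rng.random() < mu_true else 0 for _ in range(n)]
y = [b ^ (1 if rng.random() < p else 0) for b in bits]
print(round(em_source_bias(y, p), 3))   # close to the true mu = 0.2
```

  In the decoder of Chapter 4, the posteriors of the E-step come from the sum-product APPs instead of a single-observation Bayes rule, which is what couples the estimation to the decoding schedule.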

  • Finally, conclusions, future work perspectives and open problems are given in
    Chapter 5.




Chapter 1

Non Systematic Constructions of
Codes on Graph for Redundant Data

    Claude Elwood Shannon in his gospel mandates :
    "The redundancy must be introduced in the proper way to combat the particular noise struc-
ture involved. However any redundancy in the source will usually help if it is utilized at the
receiving point. In particular if the source already has a certain redundancy and no attempt is
made to eliminate it in matching to the channel, this redundancy will help to combat noise".
In a tandem view, when channel and source coding are treated independently, the first cited redundancy is the one introduced by the forward error correcting code, while the second is related to the source and can be natural (no source coding applied) or residual (compression is applied).

    Two directions have been taken to demonstrate Shannon's intuition. In 1995, Hagenauer showed how iterative probabilistic decoding is able to exploit the source statistics via the Source Controlled Channel Decoding (SCCD) strategy. At the end of the nineties, MacKay and later Alajaji proposed to use non systematic constructions to encode redundant data, because their theoretical limits are better than those of systematic codes in the presence of redundancy. Both authors focused on capacity achieving code families, in order to approach as closely as possible the challenging theoretical limits obtained with non systematic constructions. Hence, a well designed non systematic construction combined with SCCD decoding constitutes the key to the success of a coding system in the presence of redundant data.

   In our setting, we investigate this idea and demonstrate, using information theory tools, why non systematic constructions exhibit the best theoretical limits in the presence of redundancy. In order to attain these limits, we describe a particular design of non systematic LDPC codes that extends MacKay's construction and is well suited to SCCD belief propagation decoding. We reveal that one specific realization of this code family outperforms prior LDPC constructions as well as the other realizations of the family, and we explain how the introduced design criteria lead to such performance.






1.1 Introduction
1.1.1 Motivation and Related Work
In most tandem communication schemes, the channel coding procedures are designed and analyzed for uniformly distributed sources.
    To a large extent, this approach is justified by Shannon's famous separation theorem [83], stating that separate source and channel coding incurs no loss of optimality as long as the entropy of the source is less than the capacity of the channel, provided that the blocklength goes to infinity. In practice, the delay and complexity restrictions associated with extremely long blocklengths make Joint Source Channel Coding (JSCC) approaches attractive: they are expected to offer improvements for the combination of a source with significant redundancy and a channel with significant noise [110]. Moreover, separation is no longer valid in multi-user scenarios [72].
    Enhancing the performance of tandem schemes by making the channel code exploit the source redundancy, whether natural or residual, is in essence a joint source-channel coding problem. This problem was first intuited by Shannon [83], then formalized through Hagenauer's Source Controlled Channel Decoding (SCCD) proposal [39], enabled by iterative probabilistic approaches.
    Natural redundancy is considered in situations where compression is not worthwhile, for instance when the channel conditions are bad or medium; it is then preferable to exploit this redundancy in the decoding step instead of compressing the source [20]. Conversely, when sources are highly redundant, a source encoder is used. An "ideal" source encoder would produce an independent, identically distributed (i.i.d.) sequence of equiprobable bits at its output and eliminate all the source redundancy. However, lossless variable-length or entropy codes (e.g., Huffman codes), although asymptotically optimal, are avoided since they imply error-propagation problems in the presence of channel noise. Therefore, over noisy channels we commonly use fixed-length encoders, which are suboptimal and leave some residual redundancy at their output [37].
    In the presence of non uniform sources, the study of information theoretical limits has shown that non systematic codes exhibit better asymptotic limits than systematic constructions. Such codes supply the capacity achieving distribution to the considered channel, as highlighted by Shamai and Verdu in [78]. Consequently, when a redundant source is encoded with a non systematic code, then transmitted over a noisy channel and decoded via an SCCD strategy, the Shannon capacity limits move to better regions.
    The best candidates in the race towards these challenging information theoretical limits are capacity achieving codes on graphs, namely Turbo codes, introduced by Berrou and Glavieux in 1993 [17], and Gallager's Low Density Parity Check (LDPC) codes [33], proposed in the sixties and resurrected after the turbo code invention [53], [51], [70]. This class of codes defined on graphs is renowned for approaching the Shannon capacity bound very closely with a reasonable decoding complexity. Both families are obtained by connecting simple component codes through an interleaver, and are decoded via iterative algorithms that apply soft, decentralized decoding algorithms of these simple codes (e.g., BCJR and message passing) [10], [33]. These algorithms are realizations of "Belief Propagation" [63], a more general algorithm that is well suited to integrating the SCCD strategy.
    Until the end of the twentieth century, most developments in the coding area considered the design and optimization of codes defined on graphs for uniformly distributed sources. David MacKay and Radford Neal [53] made the exception in 1995 when they proposed MacKay-Neal (MN) codes, a class of non systematic codes obtained by scrambling a low-density generator matrix [51]. The decoder of an MN code was specifically designed for the binary symmetric channel. Hence, when it is applied to the additive white Gaussian noise (AWGN) channel, this is done by hard-decision quantization of the a priori values, which involves a degradation in decoding performance.
    Other work on SCCD with LDPC codes, using systematic LDPC codes, was proposed by Shamir in [81]. The author showed a gain over standard decoding owing to the utilization of the source statistics; however, this performance remains below the one expected with non systematic constructions.
    The next main proposals were made by Alajaji et al. and were mainly related to non systematic Turbo codes for AWGN channels [3], [114], [116], [115], [113], and more recently for wireless fading channels [112]. In [1], Adrat et al. proposed an improved iterative source-channel decoding for systematic Turbo codes using EXIT charts.
    Another straightforward way to obtain a non systematic code is to generate a lower rate systematic LDPC code, puncture the systematic bits and transmit only the parity bits. However, puncturing has proved unsuccessful: the gain attained is not sufficient to offset the loss of performance due to puncturing [79].
    At the IEEE 2005 International Symposium on Information Theory, G. Shamir and J. J. Boutros introduced a novel general approach to design a special class of non systematic low density parity check codes suited to non uniform memoryless sources.
    First, the authors proposed several design configurations based on the concatenation of a pre-coder or a post-coder with an LDPC code. Subsequently, we carried out further investigations to reveal the best-performing scheme, namely the split-LDPC [5], [80].
    The purpose of the present chapter is to introduce this universal class of non systematic low density parity check codes through the underlying concepts and techniques involved in the design of this code family, as well as the algorithmic strategies undertaken to decode it. Moreover, we prove, using information theoretical tools, the superiority of the split-LDPC construction.


1.1.2 Contributions
The main contributions of the author in this chapter are based on the results reported in two papers, presented at the Allerton Conference on Communication, Control, and Computing [5] and at the inaugural Information Theory and Applications Workshop [80].
    The innovative contribution of this chapter is threefold.
    First, we demonstrate, in terms of information theoretical limits, the advantage of non systematic code constructions over systematic ones in the presence of non-equiprobable memoryless sources. We focus on binary memoryless channels, namely the BEC, the BSC and the AWGN channel, and evaluate analytically the losses incurred by systematic coding in terms of mutual information and energy.
    Then, we introduce and study the general method proposed by Shamir and Boutros to construct a universal family of non systematic LDPC encoders and decoders, comprising two kinds of constructions called scramble-LDPC and split-LDPC. We describe how to perform source controlled channel decoding (SCCD) to exploit the redundancy of the source during the decoding step.



    Finally, we place the split and scramble LDPC codes in the more general framework of non systematic codes, and show that the split-LDPC is the best-performing design among the other non systematic constructions, namely the scramble-LDPC and the MN codes. We show that MN codes and our class of non systematic LDPC codes are both members of a more general family called Multi-Edge LDPC [69].
    The most important results are the following:
    • We demonstrate that in the presence of source redundancy there may be a signifi-
      cant advantage to the use of a well designed non-systematic channel encoding over
      a systematic one.
    • We describe general methods for designing non-systematic LDPC codes by scram-
      bling or splitting redundant data bits into coded bits. These methods consist mainly
      of cascading a sparse matrix or the inverse of a sparse matrix with an LDPC code.
    • Splitting based LDPC codes achieve better gains in the presence of redundancy than
      other known codes, including MacKay-Neal (MN) codes, without significant loss
      in performance even if the data contains no redundancy.


1.2 System Model and Notations


[Figure 1.1, diagram omitted: transmission chain Source (with P(s = 1) = µ) → Channel Encoder → Channel → Channel Decoder → Sink; the source parameter µ is supplied to the decoder.]

            Figure 1.1: Block Diagram for a Source Controlled Channel Coding System

   Let us describe the considered system model following the block diagram given in Figure 1.1.
   Our setting assumes a non-uniform binary independent identically distributed (i.i.d.) source which generates a binary sequence of length K denoted by s ≜ (s1 , s2 , ..., sK )T, where si ∈ {0, 1}. The source follows a Bernoulli probability distribution, characterized by the parameter µ = P (si = 1), the probability that a source symbol equals 1, with 0 < µ ≤ 1/2. The source entropy is given by Hs = H2 (µ), where 0 < Hs ≤ 1 and H2 (x) = −x log(x) − (1 − x) log(1 − x) is the binary entropy function. The logarithm is taken, here and elsewhere in this report, to base 2.
   Our model is not restrictive, because it is still possible to convert a general finite-memory source into a piecewise i.i.d. non-uniform source as proposed in [28]. The interested reader may find the developments related to finite-memory sources in [36].
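   As a small numerical companion to this source model, the following sketch (pure Python; the function names are ours, chosen for illustration) draws an i.i.d. Bernoulli(µ) sequence and evaluates the binary entropy function H2:

```python
import math
import random

def H2(x):
    """Binary entropy function (base-2 logarithm), with H2(0) = H2(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def bernoulli_source(K, mu, seed=0):
    """i.i.d. non-uniform binary source: P(s_i = 1) = mu, 0 < mu <= 1/2."""
    rng = random.Random(seed)
    return [1 if rng.random() < mu else 0 for _ in range(K)]
```

   For instance, H2(0.5) = 1 while H2(0.11) ≈ 0.5, so a source with µ = 0.11 carries only about half a bit of information per symbol, which is precisely the redundancy the decoder will exploit.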



   We assume that the source sequence is directly fed to a linear block channel encoder C(N, K) without any data compression. Our study is restricted to channel encoding via systematic and non-systematic binary linear codes of rate R = K/N . The codeword has length N ≥ K and is denoted by c ≜ (c1 , ..., cN )T.
   Let x ≜ (x1 , ..., xN )T denote the BPSK-modulated codeword, which is transmitted over a symmetric memoryless noisy channel. Three kinds of binary input symmetric output channels are considered in our setting: the binary erasure channel (BEC), the binary symmetric channel (BSC) and the binary input additive white Gaussian noise channel (BIAWGN). In all the following, AWGN denotes the binary input AWGN channel; the continuous input AWGN channel will be indicated explicitly when necessary.
   The BEC has a binary input and a ternary output and is characterized by an erasure probability denoted ε. The BSC is characterized by its cross-over probability denoted λ. The AWGN channel is characterized by its one-sided power spectral density N0 .
   The received noisy vector, denoted y ≜ (y1 , ..., yN )T, is fed to the decoder, which is assumed to know the distribution of the source as well as its parameter µ.


1.3 Information Theoretical Limits for Non Uniform Sources
In this section, we illustrate the advantage of the best possible non-systematic codes over systematic codes when the source is redundant. We show how the mutual information between channel input and output can be made closer to the maximal achievable one using well designed non systematic codes, while systematic codes constrain the mutual information to be smaller. We start by introducing some useful information theory facts.
    Let I(X; Y) ≜ H(Y) − H(Y|X) denote the mutual information between the channel input vector X and the channel output vector Y. Capital letters denote random variables and capital boldface letters denote random vectors. The channel capacity is given by:

                                         C = max_{p(x)} I(X; Y)                          (1.1)

where the maximum is taken over all possible input distributions p(x) [22]. In practice, we use the following capacity expression in the continuous channel case; when the channel is discrete, the integrals are replaced by sums:

              C = max_{P(x)} ∫∫ P(x) P(y|x) log [ P(y|x) / Σ_{x′} P(x′) P(y|x′) ] dx dy    (1.2)

For a given channel and channel input distribution, the theoretically achievable channel code rate R satisfies:

                              N × Hs × R = I(X; Y)                                   (1.3)
    Let us recall that the discrete uniform input distribution achieves the capacity of binary-input discrete memoryless channels, just as the Gaussian distribution does for the continuous input AWGN channel. Moreover, the discrete uniform input distribution attains the maximal achievable mutual information over the binary input AWGN channel (BIAWGN), which is lower than the exact capacity of the channel, being limited by the size of the modulation alphabet.



    The weakness of systematic codes arises from the fact that they produce channel sequences that are not uniformly distributed for most redundant sources. Hence, the empirical distribution of the transmitted sequences is far from the capacity achieving distribution (see [78] for a discussion of the empirical distribution of good codes).
    In the following, we evaluate the mutual information I(X; Y ) for both systematic and non systematic code designs over the BEC(ε), the BSC(λ) and the BIAWGN(N0 ); hence equation (1.3) becomes:
                                      Hs R = I(X; Y )                                 (1.4)
   For a systematic code, when the source is non uniform, the modulated sequence x includes a non uniformly distributed information sequence xs and a uniformly distributed parity sequence. The mutual information of a systematic code is the average taken over these two populations:

               Isys (X; Y ) = R Iinf ormation (Xs ; Ys ) + (1 − R) Iparity (X; Y )          (1.5)

    Besides, a well designed, theoretically optimal non systematic code can generate an empirically uniform distribution, achieving the maximal mutual information associated with the channel and the considered modulation, i.e. BPSK:

                               Inonsys (X; Y ) = Iparity (X; Y )                            (1.6)

   Accordingly, the systematic construction suffers a loss in mutual information, as we demonstrate in the following.

1.3.1 AWGN Channel
Let us begin with the continuous case and consider an AWGN channel with a real Gaussian codebook of rate R = K/N for channel encoding. The information rate transmitted by the encoder is the product Hs × R (bits per real dimension), and the Gaussian distribution is the capacity achieving input distribution for the Gaussian channel. For a given source distribution and a fixed coding rate, in order to find the minimal achievable signal-to-noise ratio per bit, Ebr /N0 , where Ebr denotes the energy per redundant source symbol, following (1.3) and (1.4) we equate the information rate with the channel capacity and obtain:

                               Hs R = (1/2) log (1 + 2R Ebr /N0 )                           (1.7)

Then we find the well known expression of the theoretical minimum achievable SNR [36]:

                               Ebr /N0 = (2^{2 Hs R} − 1) / (2R).                           (1.8)

We recover the Shannon limit for the uniform source case by setting Hs = 1, which upper bounds (1.8). Consequently, if one ignores the statistics of the source and assumes it uniform, only suboptimal limits are achieved and a potential gain is lost.
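Equation (1.8) is easy to evaluate numerically; the short sketch below (the function name is ours) returns the minimum achievable Ebr /N0 in dB:

```python
import math

def min_ebr_n0_db(Hs, R):
    """Minimum achievable Ebr/N0 in dB for a real Gaussian codebook,
    following eq. (1.8): Ebr/N0 = (2^(2*Hs*R) - 1) / (2*R)."""
    return 10.0 * math.log10((2.0 ** (2.0 * Hs * R) - 1.0) / (2.0 * R))
```

For Hs = 1 and R = 1/2 this gives the usual 0 dB Shannon limit, while a redundant source with Hs = 0.5 lowers the limit to about −3.8 dB, illustrating the potential gain.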
    For an AWGN channel with BPSK input and a real continuous output, the maximal achievable mutual information, denoted CBPSK, is written as:

              CBPSK = 1 − E [log2 (1 + exp(−2X/N0 ))] ,           X is N (1, N0 )          (1.9)

10
1.3. I NFORMATION T HEORETICAL L IMITS      FOR   N ON U NIFORM S OURCES


where the mathematical expectation is denoted by E[·], and N (1, N0 ) denotes a Gaussian random variable with mean 1 and variance N0 . The BPSK amplitude A = √(2Rc Ebr ) is normalized to 1.
    As argued in equation (1.5), we obtain the best achievable average mutual information of a systematic code with BPSK modulation by averaging the systematic bits' mutual information, denoted I(Xs ; Ys ), and the parity bits' best mutual information CBPSK, as written in the following:

               Isys (X, Y)/N = R I(Xs ; Ys ) + (1 − R) CBP SK ≤ CBP SK .           (1.10)

where:

               I(Xs ; Ys ) = −(1 − µ) E [log2 (1 − µ (1 − exp(2Y /N0 )))]           (1.11)
                             − µ E [log2 (1 − (1 − µ) (1 − exp(−2X/N0 )))] ,

where X is N (1, N0 ) and Y is N (−1, N0 ).
A theoretically optimal non-systematic code can generate uniform distributions for all components of X and achieve CBPSK. Accordingly, we can deduce that:

              Isys = Inonsys − R × E [ log2 ( f (X)^µ f (−X)^{1−µ} / f1/2 (X) ) ]   (1.12)

    We define f (X) = µ e^{X/N0} + (1 − µ) e^{−X/N0}, where X is N (1, N0 ), and f1/2 denotes f evaluated at µ = 1/2, i.e. f1/2 (X) = cosh(X/N0 ); we keep the same notation and assumptions as in (1.9). Equations (1.10) and (1.12) demonstrate the loss in mutual information induced when systematic codes encode non uniform data.
    The minimum achievable Ebr /N0 for the AWGN channel, corresponding to the threshold, is found by numerically equating the information rate Hs R with (1.9) for the non systematic case, and with (1.12) for the systematic case. The threshold variation versus the source entropy is illustrated in Figure 1.2, where we can observe that the gain of non systematic codes is more significant for highly biased sources.
    Furthermore, Figure 1.3 shows that the loss of systematic codes increases significantly with the channel code rate, pointing out that for high rate codes there is a significant benefit in using non-systematic codes. This is expected, since high rate systematic codes contain more non-uniformly distributed bits, supplying to the channel a distribution far from the capacity achieving one.


1.3.2 Binary Erasure Channel
The maximal achievable mutual information attained with a non systematic code over a binary erasure channel with erasure probability ε is:

                                 Inonsys (X; Y ) = 1 − ε                           (1.13)

With a systematic construction, using (1.5), we find:

                   Isys (X; Y ) = Inonsys (X; Y ) − R (1 − ε) (1 − Hs )            (1.14)



[Figure 1.2, plot omitted: minimum achievable Eb/N0 (dB) versus source entropy (bits); curves for a systematic code with BPSK input, scrambled codes with ds = 3 and ds = 5, a non-systematic code with BPSK input, and Gaussian input.]

     Figure 1.2: Minimum achievable Ebr /N0 vs. source entropy Hs , coding rate R = 0.5.


The upper limit on ε, denoted εth , characterizes the threshold of the code and should satisfy (1.4); solving Hs R = 1 − ε in the non systematic case and Hs R = (1 − ε)(1 − R(1 − Hs )) in the systematic case, we obtain:

                               εth_nonsys = 1 − R Hs                                (1.15)

                               εth_sys = (1 − R) / (1 − R(1 − Hs ))                 (1.16)

The loss of mutual information is equivalent to a decrease of the threshold, as we can observe in Figure 1.4.
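These closed-form BEC thresholds are straightforward to evaluate; the sketch below (the function name is ours) computes them directly from (1.13)-(1.14) via Hs R = 1 − ε for the non systematic case and Hs R = (1 − ε)(1 − R(1 − Hs )) for the systematic case:

```python
def bec_thresholds(R, Hs):
    """Largest erasure probability at which reliable transmission of a
    rate-R code for an entropy-Hs source remains possible on the BEC."""
    eps_nonsys = 1.0 - R * Hs                        # Hs*R = 1 - eps
    eps_sys = (1.0 - R) / (1.0 - R * (1.0 - Hs))     # Hs*R = (1-eps)(1-R(1-Hs))
    return eps_sys, eps_nonsys
```

For R = 0.5 and Hs = 0.5 this gives εsys ≈ 0.667 against εnonsys = 0.75, while for Hs = 1 (uniform source) the two thresholds coincide at 1 − R.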

1.3.3 Binary Symmetric Channel
The maximal achievable mutual information attained with a non systematic code over a binary symmetric channel with cross-over probability λ is:

                                                                 Inonsys (X; Y ) = 1 − H2 (λ)                                     (1.17)
   With a systematic construction, let us define γ = µ(1 − λ) + (1 − µ)λ; then, using (1.5), we find:
                      Isys (X; Y ) = Inonsys (X; Y ) − R [1 − H2 (γ)]             (1.18)
    In order to evaluate the threshold in the BSC case, we need to solve (1.4) numerically. As in the BEC case, the loss of mutual information is equivalent to a decrease of the threshold, as we can observe in Figure 1.4.
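    Since 1 − H2 (λ) decreases monotonically on (0, 1/2), the BSC threshold can be found by bisection. The sketch below (pure Python; the names are ours) solves (1.4) with (1.17) and, in the systematic case, (1.18); we assume that the systematic mutual information is also monotone on (0, 1/2), which holds in the cases we checked:

```python
import math

def H2(x):
    """Binary entropy function (base 2)."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def bsc_threshold(R, mu, systematic, tol=1e-9):
    """Largest cross-over probability lambda satisfying Hs*R <= I(lambda),
    found by bisection on (0, 1/2)."""
    Hs = H2(mu)

    def info(lam):
        i = 1.0 - H2(lam)                          # non-systematic, (1.17)
        if systematic:
            gamma = mu * (1.0 - lam) + (1.0 - mu) * lam
            i -= R * (1.0 - H2(gamma))             # systematic loss, (1.18)
        return i

    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if info(mid) > Hs * R:
            lo = mid                               # still decodable: push lambda up
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

    For µ = 1/2 both thresholds coincide, since γ = 1/2 makes the loss term of (1.18) vanish, while for a biased source the systematic threshold is strictly smaller.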
    In the presence of redundant sources, non systematic codes are more advantageous because, when some constraints are satisfied, they generate an asymptotically uniform output. Systematic constructions, in contrast, involve a mismatch between the biased distribution



[Figure 1.3, plot omitted: minimum achievable Eb/N0 (dB) versus coding rate; curves for systematic codes with BPSK input, scrambled codes with ds = 3 and ds = 5, a non-systematic code with BPSK input, and Gaussian input.]

    Figure 1.3: Minimum Achievable SNR versus Coding Rate in AWGN, Hs = 0.5.

of the systematic symbols and the uniform input distribution needed to achieve channel capacity. This interpretation is in line with the work of Shamai and Verdu in [78], proving that the empirical distribution of any good code should approach the capacity achieving input distribution.
    To attain the challenging theoretical limits computed in the present section, well designed non systematic constructions are required. As a matter of fact, capacity achieving codes are the best candidates to attain these limits. Accordingly, we set out in the following section the main issues related to the design principles of non systematic codes. Then, we describe the prior and novel non systematic capacity achieving code constructions suited to non uniform sources.


1.4 Design Principles for Non Systematic Codes on Graph
Non systematic constructions are attractive because of their better use of the coding space and their convenience for redundant data. Puncturing the systematic bits of a lower rate systematic code is a straightforward method to build a non systematic code; passing the a-priori probabilities to the punctured systematic bits during the decoding process then helps improve the performance of the punctured code. Nevertheless, the gain attained with SCCD is not sufficient to offset the loss due to puncturing. Building good non systematic codes for redundant sources requires setting up design criteria along two main lines:

   • Encoding should ensure the transformation of a non uniformly distributed source
     into a uniformly distributed one.
   • Decoding should follow an SCCD strategy exploiting the statistics of the source.



[Figure 1.4, plot omitted: channel transition probability versus source entropy; BEC curves (systematic and non systematic) in the upper part of the plot, BSC curves (systematic and non systematic) in the lower part.]

    Figure 1.4: Capacity limit versus source entropy for BEC and BSC.


    The first proposals for non systematic codes were made for turbo codes, first by Massey et al. in [55], then in [11], where the proposed asymmetric non-systematic turbo codes (NSTC) outperform Berrou's (37, 21) code [16] by about 0.2 dB at the 10−5 BER level. The motivation for using NSTC is their larger code space; however, in the presence of highly biased sources these codes do not approach the Shannon capacity as closely as in the uniform case. In order to fill this gap, Alajaji in [116], then Shamir in [82], came up with non systematic turbo code constructions well suited to non uniform sources.
    In most of these proposals, constituent codes with the Quick-Look-In (QLI) property are used. The QLI property requires that the two parity sequences added together (modulo 2) equal the information sequence (or a delayed version of it). Actually, the QLI property makes the non-systematic constituent code close to a systematic one, so that the initial extrinsic estimates for the information bits are enhanced and good enough to start up the iterative convergence process of the second constituent. Consequently, the performance in the waterfall region is improved for uniform source distributions. Similarly, in the case of more redundant data, Shamir et al. have shown in [82] that QLI codes exhibit a good adaptation.
    Recently, Zhu et al. [116] derived necessary and sufficient conditions for recursive convolutional encoders to supply asymptotically uniform marginal output distributions, regardless of the degree of source non-uniformity. These properties offer pertinent design criteria for constructing good turbo codes dedicated to heavily biased non uniform i.i.d. sources.
    The most significant proposal for non systematic LDPC-like codes was MacKay's construction in 1999 [51]. The MacKay-Neal (MN) codes are based on a low density generator matrix (LDGM) code. This construction was followed by Shamir and Boutros in [79] with a novel general approach called "Non Systematic LDPC", which includes the MN encoder construction as a particular case, while the general decoding procedure enhances that of MN codes.
The two approaches dedicated to redundant data follow the same design principle, i.e. the use of a pre-coder or a post-coder emulating the effect obtained by the quick-look-in property in the non systematic turbo code proposal of [82].
    Besides, we should also cite another class of non systematic LDGM-like codes, called "rateless codes" or "fountain codes", dedicated to network communication. This code family includes LT codes [49] and Raptor codes [87]. Some common design guidelines may be found between Raptor and scramble-LDPC codes; however, to the best of our knowledge, so far no study has applied rateless channel coding to non uniform data.
    In the following, after some preliminaries on LDPC codes, we give a detailed
description of the design and decoding of two code families: MN codes, then the
universal class of non-systematic LDPC codes.

1.4.1 Preliminaries on LDPC codes


[Figure 1.5: Factor Graph of an LDPC Code. N bitnodes c1, ..., cN (degree db) on the
left are connected through a random graph to ℓ subcode nodes PCE1, ..., PCEℓ (degree dc)
on the right.]

    Low Density Parity Check (LDPC) codes, invented by Robert Gallager in 1963 [34], are
linear block codes which are described by a low density parity check matrix and repre-
sented through a sparse bipartite factor graph. The graphical and matricial descriptions
are related by the fact that the parity check matrix of the code is the adjacency matrix of
the bipartite graph. Similar descriptions are used with Low Density Generator Matrix
(LDGM) codes, by considering the generator matrix instead of the parity check one in all
descriptions.
    The sparse nature of both the parity check matrix and the graph structure is the key
property behind the algorithmic efficiency of LDPC codes. Hence, in the matricial description
the code is defined through the number of nonzero entries of the matrix H per row or
per column. The code is regular if this number is the same for every column and for
every row; otherwise, the code is irregular. The irregular structure, being more general,
is the one described in the following.
    LDPC codes are usually described and studied in terms of their graphical repre-
sentation. Graphical descriptions of codes started with Tanner graphs for linear
codes [89]; the graphical understanding of "codes on graphs" then took shape when
the more general concept of "factor graph" was introduced [46]. Genealogically,
factor graphs are a straightforward generalization of the "Tanner graphs" of Wiberg et
al. [102], [31]. In Tanner's original formulation, all variables are codeword symbols and
hence are "visible"; Wiberg et al. introduced "hidden" state variables and also suggested
applications beyond coding. Factor graphs take these graph-theoretic models one step
further by applying them to functions, in order to represent the factorization of a mul-
tivariate function into simpler functions. Therefore, a factor graph is a graphical model
which is naturally more convenient for problems that have a probabilistic side.
    Following the factor graph trend, turbo codes and LDPC codes have been unified as
"codes on graphs", and their decoding algorithms have been revealed in [45], [56] as special
instances of the belief propagation algorithm on general Bayesian networks [63]. Not
only has this understanding helped coding theorists analyze these codes and design
decoding algorithms for them, it has also taught them how to design their codes to get
the best out of a given decoding algorithm.
    A factor graph for an LDPC code is a bipartite graph related to a parity check
matrix H of size ℓ × N, partitioned into N left nodes called variable nodes and ℓ right
function nodes, usually called checknodes, with E edges linking the two parts.
Figure 1.5 shows an example of such a bipartite graph. Notice that in this figure,
the variable nodes are shown as circles and the checknodes as squares, as is the case for
most factor graphs.
    A variable node is a binary variable from the alphabet {0, 1} representing one of the N
symbols of the codeword c. A checknode PCEj is an even parity constraint on its neigh-
boring variable nodes. Each checknode represents a row of the parity check matrix H, as
each variable node corresponds to a column. Each nonzero entry at position (i, j) in the
matrix corresponds to an edge between the variable node cj and the checknode PCEi. The
represented linear code is of dimension k ≥ (N − ℓ), with equality if and only if all the
parity constraints are linearly independent.
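The correspondence between the parity check matrix and the bipartite graph is mechanical, and can be sketched in a few lines; the 4 × 6 matrix below is a toy example for illustration only, not one of the codes studied in this chapter:

```python
import numpy as np

# Toy parity-check matrix H (l x N = 4 x 6): each row is a checknode PCE_i,
# each column a variable node c_j; a 1 at (i, j) is an edge c_j -- PCE_i.
H = np.array([[1, 1, 0, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 0, 0, 0, 1, 1],
              [0, 0, 1, 1, 0, 1]])

# Edge list of the bipartite factor graph (adjacency-matrix reading of H).
edges = [(f"c{j+1}", f"PCE{i+1}") for i, j in zip(*np.nonzero(H))]

# Column weights = variable-node degrees db; row weights = checknode degrees dc.
var_deg = H.sum(axis=0)
chk_deg = H.sum(axis=1)
print(len(edges), var_deg.tolist(), chk_deg.tolist())
```

This toy code happens to be regular, with db = 2 and dc = 3: every column has weight 2 and every row weight 3.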
    A graph ensemble is specified through two polynomials:
     • the polynomial λ(x) associated to variable nodes, denoted as:

                    λ(x) = Σ_{i=2}^{dbmax} λ_i x^{i−1}                    (1.19)

       where λ = {λ2, λ3, ..., λdbmax} denotes the variable edge degree distribution:
       λi is the fraction of edges incident on variable nodes of degree i, with λi ∈ [0, 1].
     • the polynomial ρ(x) associated to checknodes, denoted as:

                    ρ(x) = Σ_{j=2}^{dcmax} ρ_j x^{j−1}                    (1.20)

       where ρ = {ρ2, ρ3, ..., ρdcmax} denotes the check edge degree distribution:
       ρj is the fraction of edges incident on checknodes of degree j, with ρj ∈ [0, 1].
    Notice that the graph is characterized in terms of the fraction of edges of each degree,
not the fraction of nodes of each degree. For a given length N and a given degree
distribution (λ, ρ), we define an ensemble of codes by choosing the edges randomly. It has
been shown in [73] that the behavior of almost all instances of an ensemble of irregular
codes is concentrated around its expected behavior when the code is large enough.
Additionally, the expected behavior converges to the cycle-free case.
    Given the degree distribution (λ, ρ) of an LDPC code and its number of edges E, it is
easy to see that the number of variable nodes N is

                    N = E Σ_i (λ_i / i) = E ∫_0^1 λ(x) dx                    (1.21)

and the number of checknodes ℓ is

                    ℓ = E Σ_i (ρ_i / i) = E ∫_0^1 ρ(x) dx                    (1.22)

Therefore the design rate of the code is

                    R = 1 − (∫_0^1 ρ(x) dx) / (∫_0^1 λ(x) dx)                    (1.23)

    Finding an asymptotically good family of irregular codes is equivalent to finding a
good degree distribution. Finding a degree distribution that results in a code family
with some required properties is not a trivial task, and will be one of the focuses of this
thesis.
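Equations (1.21)-(1.23) translate directly into a short sketch; the degree distributions passed in below are only illustrative examples:

```python
def design_rate(lam, rho):
    """Design rate R = 1 - (integral of rho)/(integral of lam), following (1.23),
    with the integral of lam(x) over [0,1] equal to sum_i lam_i / i as in (1.21).

    lam, rho: dicts mapping an edge degree to its edge fraction (lambda_i, rho_j).
    """
    int_lam = sum(frac / deg for deg, frac in lam.items())
    int_rho = sum(frac / deg for deg, frac in rho.items())
    return 1.0 - int_rho / int_lam

# Regular (db = 3, dc = 6) ensemble: every edge has degree 3 on the left, 6 on the right.
print(design_rate({3: 1.0}, {6: 1.0}))  # -> 0.5
```

For an irregular ensemble one simply passes several degrees with their edge fractions, e.g. `design_rate({2: 0.5, 4: 0.5}, {8: 1.0})`.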

1.4.2 Mackay-Neal Codes
Mackay-Neal codes [51], also called 'MN' codes, are non-systematic LDGM codes. The
key idea behind MN codes is that the generator matrix is constructed in terms of an
invertible matrix, in such a way that the sparse source and the sparse noise can be treated
symmetrically during the decoding process.
    The coding principle consists of encoding a source vector s with a Low Density Gener-
ator Matrix (LDGM) code defined by the matrix C1, followed by scrambling the LDGM
codewords by the inverse of a sparse matrix denoted C2.
    The block diagram description of the MN encoding process is illustrated in Figure 1.6
and is written as follows:

                    c = C2^{−1} C1 s                    (1.24)

    The dimensions of C1 and C2 are respectively N × K and N × N. Both matrices
are sparse and can be found by applying a Gaussian elimination to an N × (K + N)
LDPC matrix H of column weight db and row weight dc (see the construction proposed
in [51]). After applying column permutations to guarantee the full rank, H is written as
H = [C1|C2].
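Equation (1.24) can be exercised on a toy example. The sketch below uses small hand-made matrices C1 and C2 (not matrices derived from an actual LDPC matrix H as in [51]), and `gf2_solve` is our own helper that applies c = C2^{−1} C1 s by Gaussian elimination over GF(2):

```python
import numpy as np

def gf2_solve(A, b):
    """Solve A x = b over GF(2) by Gauss-Jordan elimination (A square, invertible)."""
    A = A.copy() % 2
    b = b.copy() % 2
    n = A.shape[0]
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r, col])   # find a pivot row
        A[[col, pivot]] = A[[pivot, col]]                     # swap it into place
        b[[col, pivot]] = b[[pivot, col]]
        for r in range(n):
            if r != col and A[r, col]:                        # XOR-eliminate column
                A[r] ^= A[col]
                b[r] ^= b[col]
    return b                                                  # A reduced to I, b = x

# Toy sparse matrices (K = 3, N = 4), chosen only for illustration.
C1 = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]])            # N x K
C2 = np.array([[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1], [0, 0, 0, 1]])  # N x N

s = np.array([1, 0, 1])
c = gf2_solve(C2, C1 @ s % 2)   # c = C2^{-1} C1 s, i.e. solve C2 c = C1 s over GF(2)
assert np.array_equal(C2 @ c % 2, C1 @ s % 2)
print(c)
```

In practice the inverse is never formed explicitly; decoding works on the sparse graph of C1 and C2, as described next.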
    The decoding problem is considered for a BSC, where the received vector is

                                        y=c+n                                        (1.25)




[Figure 1.6: Mackay-Neal Codes: Block and Tanner Graph descriptions. Block diagram:
s → C1 [N × K] (LDGM) → C2^{−1} [N × N] (splitter) → c. Tanner graph: bitnodes y and
n connect to the parity checks of C2, and bitnodes s to the parity checks of C1.]


where n is the binary noise vector of length N, assumed to be sparse with independent
and identically distributed bits.
    The decoding procedure should solve:

                    C2 y = C1 s + C2 n                    (1.26)

For this purpose, message passing decoding is performed on the Tanner graph corre-
sponding to equation (1.26), as illustrated in Figure 1.6.
In the above description, MN codes are regular, since the column weight of C1 and C2 is
equal to db. Improved irregular MN codes have been proposed by Kanter and Saad in
[43]. The stability of irregular MN codes has been improved in [69].
    One of the main drawbacks of MN codes is that their decoder is specifically designed
for the binary symmetric channel. The decoding principle can be extended to real-output
channels by hard-decision quantization of the a-priori values passed to the noise
bitnodes [51]; however, the quantization implies a degradation of the decoding perfor-
mance.

1.4.3 Non Systematic LDPC Framework
In this section, we propose a novel general approach to build an LDPC-based non-sys-
tematic code. We describe several different ways of building the encoder. All the config-
urations can successfully utilize the redundancy of the coded sequence during the
decoding process.
    The general idea is to concatenate a pre-coding (or post-coding) block that uses an
invertible square matrix with an LDPC or an LDGM encoder. The decoder then operates
over one single irregular bipartite factor graph combining the code and the pre-coder or
the post-coder. The systematic bits are not sent over the channel, but are included in the
decoding graph as a subset of the set of bitnodes, to which a-priori information can be
provided.



[Figure 1.7: General Non-Systematic LDPC Encoding Framework. Pre-coding: s → Cs or
Cs^{−1} [K × K] → LDPC or LDGM → c. Post-coding: s → LDPC or LDGM → Cs or
Cs^{−1} [N × N] → c.]

The set of bitnodes is completed with the set of transmitted codeword bits and, in some
of the configurations but not all, another subset of untransmitted bits can be added. The
parity checknodes in the decoder graph are those of both the pre-coder or post-coder and
of the LDPC or LDGM code.
    The pre-coder (or post-coder) is a square matrix which is either of low density or has
a sparse inverse. The main idea is that the pre-coder (or post-coder) emulates the effect
obtained by the Quick-Look-In property in turbo codes, by converting the systematic
non-uniform message into an almost uniformly distributed one. Hence, there are eight
different possible configurations, as illustrated in Figure 1.7. Each one depends on the
combination of choices among:
   1. LDPC or LDGM as a core code.
   2. Pre-coder or Post-coder.
   3. A low density square matrix, denoted scrambler, or a matrix whose inverse is low
      density, denoted splitter.
One specific configuration of our codes is the encoder of an MN code, which can be
viewed as the concatenation of a non-systematic LDGM code with a post-coding splitter.
However, our proposed general decoder procedure is channel independent and differs
from the one proposed in the original MN proposal [51]. Thus, the proposed general
decoder procedure gives an enhanced decoding alternative for MN codes.
    In the remainder of this section, we focus on pre-coding with either a scrambler or
a splitter combined with an LDPC code. We refer to a pre-coding scrambler system as
a scramble-LDPC code, and to the pre-coding splitter as a split-LDPC code. Post-coding
scrambler and splitter LDPC codes will be referred to as LDPC-scramble and LDPC-split
codes, respectively. The methods described herein may be easily adapted to implement
all the other configurations.
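The eight configurations are simply the Cartesian product of the three binary choices above, which a short sketch can enumerate:

```python
from itertools import product

# The three binary design choices listed above.
cores = ("LDPC", "LDGM")
coder_position = ("pre-coder", "post-coder")
square_matrix = ("scrambler", "splitter")

configs = list(product(cores, coder_position, square_matrix))
print(len(configs))                                   # 8 configurations
print(("LDGM", "post-coder", "splitter") in configs)  # True: the MN configuration
```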

1.4.3.1   Scramble-LDPC Construction
Definition 1.1. A scramble-LDPC code is built by the concatenation of a pre-coding module,
denoted "scrambler", with an LDPC code. The scrambler is described by a sparse matrix denoted
Cs.




[Figure 1.8: Scramble-LDPC Tanner Graph and block diagram description. Block diagram:
s → Cs (scrambler) → LDPC → c. Tanner graph: the R·N source bitnodes s (degree ds) and
the R·N scrambled bitnodes u connect to the scrambler checks α; u and the (1−R)·N parity
bitnodes ϑ connect to the LDPC checks β (bit degree db, check degree dc).]


    The block diagram and the factor graph of the scramble-LDPC code are illustrated in
Figure 1.8. A scramble-LDPC encoder first encodes (scrambles) the source vector s into
u by:

                    u = Cs s                    (1.27)

where Cs is a sparse matrix of dimensions K × K. For a regular scrambler, Cs has row and
column weight ds. As in the LDPC case, we can also consider building scramblers with
an irregular structure. Afterwards, standard LDPC systematic encoding is performed on
the scrambled vector u to generate the codeword:

                    c = [u^T | ϑ^T]^T = G u                    (1.28)

where G is a systematic generator matrix for the low-density parity check matrix H,
and the superscript T denotes the transpose operator. Hence, the codeword c consists of
the K scrambled bits of u and the N − K parities over the scrambled bits, denoted ϑ. For
a regular LDPC code, the parity-check matrix H has column weight db and row weight
dc. The decoding graph for the scramble-LDPC code combines the parity checks of the
LDPC code, denoted by β, with those obtained from the scrambler, denoted by α, as
shown in Figure 1.8. Decoding is performed by running source-controlled belief
propagation over this whole graph, as described in the next section.
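A minimal sketch of this encoding chain, equations (1.27)-(1.28), using a toy K = 4 regular scrambler with ds = 2 and a toy systematic parity part P chosen only for illustration:

```python
import numpy as np

# Toy regular scrambler Cs (K = 4, ds = 2): row and column weight 2.
Cs = np.array([[1, 1, 0, 0],
               [0, 1, 1, 0],
               [0, 0, 1, 1],
               [1, 0, 0, 1]])

s = np.array([1, 1, 0, 1])   # non-uniform source block
u = Cs @ s % 2               # scrambling step (1.27)

# Systematic LDPC step (1.28): with H = [P | I], the parities are v = P u mod 2
# and the codeword is c = [u | v].
P = np.array([[1, 0, 1, 1],
              [0, 1, 1, 0]])
v = P @ u % 2
c = np.concatenate([u, v])
H = np.hstack([P, np.eye(2, dtype=int)])
assert not np.any(H @ c % 2)   # every parity check is satisfied
print(u.tolist(), c.tolist())
```

Real constructions use much larger, randomly generated sparse matrices; the point here is only the two-stage structure: scramble, then systematic LDPC encoding.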

1.4.3.2    Split-LDPC Construction
Definition 1.2. A split-LDPC code is built by the concatenation of a pre-coding module, denoted
"splitter", with an LDPC code. The splitter is described by the inverse of a sparse matrix denoted
Cs, i.e. by Cs^{−1}.



[Figure 1.9: Split-LDPC Tanner Graph. Block diagram: s → Cs^{−1} (splitter) → LDPC → c.
Tanner graph: the R·N source bitnodes s connect through the splitter checks α (degree ds)
to the R·N split bitnodes u; u and the (1−R)·N parity bitnodes ϑ connect to the LDPC
checks β (bit degree db, check degree dc).]


    The block diagram and the factor graph of the split-LDPC code are illustrated in Fig-
ure 1.9. The split-LDPC encoder is very similar to the scramble-LDPC one described
above, except that the scrambling operation is replaced by a splitting performed by

                    u = Cs^{−1} s                    (1.29)

where Cs is a sparse matrix of dimensions K × K that, in the regular case, has row and
column weight ds. As in the scrambler case, we can also consider building splitters with
an irregular structure.
    Afterwards, standard LDPC systematic encoding is performed on the split vector u
to generate the codeword:

                    c = [u^T | ϑ^T]^T = G u                    (1.30)

where G is a systematic generator matrix for the low-density parity check matrix H,
and the superscript T denotes the transpose operator. Hence, the codeword c consists of
the K split bits of u and the N − K parities over the split bits, denoted ϑ. For a regular
LDPC code, the parity-check matrix H has column weight db and row weight dc.
    The matrix Cs^{−1} describing the splitter is a dense matrix, since it is the inverse
of a sparse one, according to [12]. The latter fact gives reason to believe that, with
redundant sequences, a split-LDPC code has an advantage over a scramble-LDPC code
in generating a channel distribution closer to the uniform, capacity-achieving one. This is
also because splitting results in an even split between 1 and 0 bits, while scrambling only
brings the output distribution closer to uniform than the original non-uniform one: the
scrambled distribution can still be shown to be non-uniform.
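Both claims can be checked numerically on a toy splitter. In the sketch below (the matrix choice, sizes and source bias are illustrative assumptions), Cs is a sparse bidiagonal matrix whose GF(2) inverse is the dense lower-triangular all-ones matrix, and splitting a heavily biased source (µ = 0.1) yields bits whose empirical frequency of ones approaches 1/2:

```python
import numpy as np

n = 32
# Sparse bidiagonal Cs: 1s on the diagonal and first subdiagonal (row weight <= 2).
Cs = np.eye(n, dtype=int) + np.eye(n, k=-1, dtype=int)

# Over GF(2), Cs = I + S with S nilpotent, so Cs^{-1} = I + S + S^2 + ... is the
# full lower-triangular all-ones matrix: the inverse of this sparse matrix is dense.
Cs_inv = np.tril(np.ones((n, n), dtype=int))
assert np.array_equal(Cs @ Cs_inv % 2, np.eye(n, dtype=int))
print(Cs.sum() / n**2, Cs_inv.sum() / n**2)   # density ~0.06 vs ~0.52

# Splitting u = Cs^{-1} s turns a heavily biased source into near-uniform bits.
rng = np.random.default_rng(0)
mu = 0.1                                       # P(s_i = 1)
s = (rng.random((4000, n)) < mu).astype(int)
u = s @ Cs_inv.T % 2                           # u_i = XOR of s_0 ... s_i
frac = u[:, n // 2:].mean()                    # later coordinates mix many source bits
print(round(frac, 3))                          # close to 1/2, while s has mean ~0.1
assert abs(frac - 0.5) < 0.04
```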



                               ϑ bits             u bits                s bits

         Systematic        011011001100000000000000000000
         LDPC              001010000010000011010000000000
                           000001101001011000000000000000
                           100000010010000001110000000000
                           000101110100010000000000000000
                           110000000001010001100000000000
                           011100000001001100000000000000
                           000010111010100000000000000000
                           000100000000100110110000000000
                           100000000100101110000000000000
                           000000000000000001110000100000
                           000000000000110001000000000001
                           000000000000100010100000001000
                           000000000000000010110000000010
                           000000000011100000000010000000
                           000000000001001010000001000000
                           000000000010001100001000000000
                           000000000000010100010000010000
                           000000000000010101000100000000
                           000000000011001000000000000100

          Square Low−Density Scrambler                            Permutation


           Figure 1.10: General Structure of a Split-LDPC Parity Check Matrix


    Besides, the splitter can be viewed as splitting an incoming source bit into several
coded bits; this is why splitting resembles the Quick-Look-In (QLI) property of turbo
codes [11], in which two (or more) parity bits sum up to the original source bit. Recall
that MN codes are also obtained as an LDGM code followed by a splitter.
    The decoding graph for the split-LDPC code combines the parity checks of the LDPC
code, denoted by β, with those obtained from the splitter, denoted by α, as shown in Fig-
ure 1.9. By the same token, we illustrate in Figure 1.10 the parity check matrix related
to this graph. Encoding is realized by performing Gaussian elimination over this parity
check matrix. The systematic form of this matrix reveals a highly dense scrambling
applied to the source bits during the encoding.
    The decoding process is realized by performing, over this whole graph, a source-
controlled belief propagation, as described in the next section.


1.4.4   Information Theoretical Comparison of Splitting and Scrambling
In the present section, we analyze the mutual information of the best regular scrambling-
based code and show that it is smaller than that of the best splitting-based code. A split-
LDPC code multiplies the non-uniform sequence s by a dense matrix, generating a split
vector whose distribution is very close to uniform [12]. Hence, the best possible splitting-
based code may be close to achieving the BPSK capacity.
    For a regular scrambling-based code as described in Section (1.4.3.1), ds systematic
non-uniform bits are scrambled into each code bit. Assuming that the parity bits
generated by the LDPC code part are uniformly distributed, the best achievable mutual
information can be computed using equations (1.10) and (1.11), where µ in (1.11) is
replaced by γ. The probability γ denotes the probability of 1 in the scrambled sequence


[Figure 1.11: Mutual information vs. Ebr/N0 for Hs = 0.5 and coding rates R = 0.5 (left)
and R = 0.8 (right). Curves: Gaussian input, non-systematic code with BPSK input,
scrambled ds = 5, scrambled ds = 3, systematic codes with BPSK input.]

u. For a regular scrambler of degree ds, it can be obtained using Gallager's lemma [34]:

                    γ = (1/2) [1 − (1 − 2µ)^{ds}]                    (1.31)

For an irregular scrambler, a similar computation can be done by considering all the
different scrambling degrees.
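Equation (1.31) is easy to verify by simulation; the following sketch compares the empirical probability of 1 of the XOR of ds = 3 biased bits with the closed form (the values of µ, ds and the trial count are illustrative):

```python
import numpy as np

# Empirical check of (1.31): the XOR of ds i.i.d. bits with P(1) = mu has
# P(1) = gamma = (1 - (1 - 2*mu)**ds) / 2 (Gallager's lemma).
rng = np.random.default_rng(1)
mu, ds, trials = 0.1, 3, 200_000
bits = (rng.random((trials, ds)) < mu).astype(int)
emp = (bits.sum(axis=1) % 2).mean()            # empirical P(XOR = 1)
gamma = (1 - (1 - 2 * mu) ** ds) / 2           # closed form: 0.244 for mu=0.1, ds=3
print(round(gamma, 4), round(emp, 4))
assert abs(emp - gamma) < 5e-3
```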
    Figure 1.11 shows curves of the best possible mutual information versus Ebr/N0
for the different methods discussed above, for a non-uniform source with µ = 0.1 and
coding rates R = 0.5 and R = 0.8. The BPSK input curve bounds the mutual informa-
tion of the splitting-based code. As both graphs show, a splitter-based code is better than
the scrambler-based codes, and both classes have gains over systematic codes. We ob-
serve that as the scrambler degree increases, the scrambler approaches the behavior of
the splitter. This is expected, since the scrambler converges to the splitter behavior in
equation (1.31), where γ → 1/2 as ds increases.
    Figures (1.2) and (1.12) show the theoretical minimum Ebr/N0 as a function of the
source entropy for channel code rates R = 0.5 and R = 0.8, respectively. The curves in
both cases support the advantage of non-systematic codes and, more particularly, the
superiority of splitting-based codes. Moreover, the loss incurred by systematic codes is
shown to increase significantly as the non-uniformity of the source increases.
    In addition to pointing out that for high-rate codes there is a significant benefit in
using non-systematic codes, Figure (1.3) also reveals the improvement of scramblers
with the increase of ds.
    Conversely, in most finite-length cases, we find that a scrambler/splitter with an
improper degree for a given distribution can degrade the performance below that of
systematic codes, even if the considered degree value is high enough. This fact indicates
that the degree of the scrambler/splitter should be optimized for each value of µ.
    These results, despite being inconsistent with the theoretical limits described above,
might be explained by the increasing cycle density in the lower subgraph related to the
scrambler/splitter. Actually, the increase of ds for a constant code length,

[Figure 1.12: Minimum achievable Ebr/N0 vs. source entropy Hs, coding rate R = 0.8.
Curves: systematic code with BPSK input, scrambled ds = 3, scrambled ds = 5,
non-systematic code with BPSK input, Gaussian input.]
induces a local increase of edges in this lower subgraph for a constant number of nodes
(R × N), doubtlessly involving more cycles.
    In this section, split-LDPC codes have been shown to be theoretically the best non-
systematic LDPC structure. Moreover, the simulation results presented in Section (1.6)
show that gains of about 1 dB are achieved when the scrambler is replaced by a splitter.
Accordingly, we expect MN codes to outperform scramble-LDPC codes, since MN codes
belong to the split-LDPC general structure.


1.5 Source Controlled Sum-Product Decoding
The sum-product algorithm is a generic message-passing algorithm that operates over
a factor graph in order to compute various marginal functions associated with a global
function [46]. A wide variety of algorithms developed in the artificial intelligence, signal
processing, and digital communication communities may be seen as specific instances of
the sum-product algorithm operating on an appropriately chosen factor graph.
    One important subclass of the sum-product algorithm is Pearl's powerful belief
propagation algorithm, which operates by message-passing over a Bayesian network, in-
cluding as instances the iterative decoding algorithms related to LDPC and turbo codes,
as was shown in [56], [86].
    In his original work, Gallager had already proposed a simplified version of the sum-
product algorithm to decode LDPC codes [33], consisting of the propagation of beliefs
or probabilities between variable nodes and checknodes. This message exchange is the
reason why this kind of algorithm is also referred to as a message-passing algorithm.
    Without going into much detail, we define the nature of the messages and the sim-
plified update rules on them, for our class of non-systematic LDPC codes in the presence
of non-uniform sources.



    The most common message type used in the literature is the log-likelihood ratio (LLR),
namely LLR = log [p(x = 0|y)/p(x = 1|y)]. It is well suited to computer implementation,
since probability values that are very close to zero or one can be represented without
causing precision errors; moreover, its update rules are quite simple.
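As a small illustration of this numerical robustness (our own sketch; the function names are not taken from the thesis), the following Python snippet converts between probabilities and LLRs. A probability arbitrarily close to one maps to a moderate-sized LLR, which is easy to represent and to add:

```python
import math

def prob_to_llr(p0):
    """LLR = log( p(x=0|y) / p(x=1|y) ) for a binary variable."""
    return math.log(p0 / (1.0 - p0))

def llr_to_prob(llr):
    """Inverse map: recover p(x=0|y) from an LLR."""
    return 1.0 / (1.0 + math.exp(-llr))

# An almost-certain observation becomes a moderate LLR (~13.8),
# instead of a probability that risks rounding to exactly 1.0.
strong = prob_to_llr(0.999999)
```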
    The message passing algorithm is an iterative algorithm, where in each round mes-
sages are sent from message nodes to checknodes, then from checknodes back to message
nodes.
    In order to describe the algorithm, let us consider the following notation:
   • The bipartite graph describing our non systematic LDPC codes has three types of
     message nodes ci ∈ {u, ϑ, s} and two types of checknodes P CEi ∈ {α, β} (see
     Figure (1.15)).
   • Sci is the set of checknodes connected to the message node ci .
   • SP CEi is the set of bitnodes connected to the checknode P CEi .
   • LLR0 denotes the channel observation LLR.
   • LLRs = log((1 − µ)/µ) denotes the LLR related to the source statistics.
   • LLRci →P CEi denotes the LLR sent from a message node ci towards a checknode
     P CEi ; respectively, LLRP CEi →ci denotes the LLR sent from a checknode P CEi
     towards a message node ci .
   The bitnode to checknode step




   [Figure: a variable node (ϑ or s) linked to checknodes P CEj ; its outgoing message
   combines the extrinsic information from the other checknodes with the observation
   LLR0 (for u, ϑ) or the a priori LLRs (for s).]

              Figure 1.13: The local message node to checknode update rule.

   The variable node to checknode update rule is written as:

               LLRci →P CEi = LLRtype +        ∑         LLRP CEj →ci                  (1.32)
                                        P CEj ∈Sci \{P CEi }

   where:
                              LLR0    if the bitnode type ∈ {u, ϑ}
                  LLRtype =                                                            (1.33)
                              LLRs    if the bitnode type ∈ {s}
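As a minimal sketch of rules (1.32)-(1.33) (our own illustration, not code from the thesis; the edge bookkeeping is simplified to plain lists), the message sent on each edge is the intrinsic LLR plus the sum of all incoming check-to-bit LLRs except the one received on that edge:

```python
import math

def source_llr(mu):
    """A priori LLR of a non-uniform source: LLR_s = log((1 - mu) / mu)."""
    return math.log((1.0 - mu) / mu)

def bitnode_to_checknode(llr_type, incoming):
    """Eq. (1.32): for each edge, emit the intrinsic LLR (LLR_0 for u/v
    bitnodes, LLR_s for s bitnodes, per Eq. (1.33)) plus all incoming
    check-to-bit LLRs except the one carried by that edge."""
    total = llr_type + sum(incoming)
    return [total - m for m in incoming]

# A source bit with mu = 0.1 carries a strong a priori pull toward 0
# (LLR_s = log 9 ~ +2.2), which each outgoing message includes:
msgs = bitnode_to_checknode(source_llr(0.1), [0.5, -0.3, 1.2])
```

Note that for a uniform source (µ = 0.5) the a priori term vanishes and the rule reduces to the classical LDPC variable-node update.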


Phd dissertation

  • 1. Ecole Doctorale d’Informatique Télécommunications et Electronique de Paris Ecole Nationale Supèrieure des Télécommunications Thèse Présentée pour obtenir le grade de docteur de l’École Nationale Supérieure des Télécommunications Spécialité : Electronique et Communications Amira ALLOUM Sujet : C ONSTRUCTION AND ANALYSIS OF N ON -S YSTEMATIC C ODES ON G RAPH FOR R EDUNDANT D ATA .
  • 2.
  • 3. Contents Contents i List of figures iii List of tables v Acronyms vii Introduction 1 1 Non Systematic Constructions of Codes on Graph for Redundant data 5 1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.1.1 Motivation and Related Work . . . . . . . . . . . . . . . . . . . . . . 6 1.1.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.2 System Model and Notations . . . . . . . . . . . . . . . . . . . . . . . . . . 8 1.3 Information Theoretical Limits for Non Uniform Sources . . . . . . . . . . 9 1.3.1 AWGN Channel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 1.3.2 Binary Erasure Channel . . . . . . . . . . . . . . . . . . . . . . . . . 11 1.3.3 Binary Symmetric Channel . . . . . . . . . . . . . . . . . . . . . . . 12 1.4 Design Principles for Non Systematic Codes on Graph . . . . . . . . . . . 13 1.4.1 Preliminaries on LDPC codes . . . . . . . . . . . . . . . . . . . . . . 15 1.4.2 Mackay-Neal Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 1.4.3 Non Systematic LDPC Framework . . . . . . . . . . . . . . . . . . . 18 1.4.3.1 Scramble-LDPC Construction . . . . . . . . . . . . . . . . 19 1.4.3.2 Split-LDPC Construction . . . . . . . . . . . . . . . . . . . 20 1.4.4 Information Theoretical Comparison of Splitting and Scrambling 22 1.5 Source Controlled Sum-Product Decoding . . . . . . . . . . . . . . . . . . 24 1.6 Multi Edge classification to the Non Systematic LDPC Codes . . . . . . . . 27 1.7 Simulation and Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 1.8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 2 Density Evolution Analysis for Split-LDPC Codes 35 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 2.1.1 Motivation and Related Work . . . . . . . . . . . . . . . . . . . . . . 36 2.1.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . 37 2.2 Preliminaries on Density Evolution . . . . . . . . . . . . . . . . . . . . . . . 38 2.2.1 Concentration and The local tree assumption . . . . . . . . . . . . . 38 2.2.2 Symmetry Assumptions and the All one Codeword Restriction . . 39 i
  • 4. CONTENTS 2.2.3 General Statement of Density Evolution . . . . . . . . . . . . . . . 40 2.3 Statement of Split-LDPC Density Evolution . . . . . . . . . . . . . . . . . . 43 2.4 Analytic Properties of Split-LDPC Density Evolution . . . . . . . . . . . . 47 2.4.1 The Consistency Condition . . . . . . . . . . . . . . . . . . . . . . . 47 2.4.2 Monotonicity and Convergence to Fixed Points . . . . . . . . . . . 49 2.4.3 Thresholds and Density Evolution Fixed Points . . . . . . . . . . . 50 2.5 The Stability Analysis of Split-LDPC Density Evolution . . . . . . . . . . . 51 2.5.1 The Stability Condition for Systematic LDPC Codes : . . . . . . . . 52 2.5.2 The Stability Condition for the Non Systematic Split-LDPC Code . 54 2.6 Simulations and Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 2.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 3 Exit Chart Analysis and Design of Irregular Split-LDPC Codes 65 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 3.1.1 Motivation and Related Work . . . . . . . . . . . . . . . . . . . . . . 66 3.1.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 3.2 Preliminaries on the EXIT Chart Analysis of LDPC Codes . . . . . . . . . . 68 3.2.1 The EXIT Charts Principle . . . . . . . . . . . . . . . . . . . . . . . . 69 3.2.2 The Semi-Gaussian Approximation and the Related Metrics . . . . 70 3.2.3 Analysis of Regular Structures . . . . . . . . . . . . . . . . . . . . . 72 3.2.4 Analysis of Irregular Structures . . . . . . . . . . . . . . . . . . . . . 73 3.3 The Two Dimensional EXIT Charts Analysis of Split-LDPC Codes . . . . . 75 3.3.1 Analysis of Regular Structures . . . . . . . . . . . . . . . . . . . . . 79 3.3.2 Analysis of Irregular Structures . . . . . . . . . . . . . . . . . . . . . 81 3.4 Design of Irregular Split LDPC codes . . . . . . . . . . . . . . . . . . . . . . 86 3.5 Conclusions . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 4 Enhancing Iterative Decoding via EM Source-Channel Estimation 91 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 4.1.1 Motivation and Related Work . . . . . . . . . . . . . . . . . . . . . . 92 4.1.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 4.2 System Model and Notations . . . . . . . . . . . . . . . . . . . . . . . . . . 94 4.2.1 Source State Information . . . . . . . . . . . . . . . . . . . . . . . . . 95 4.2.2 Channel State Information . . . . . . . . . . . . . . . . . . . . . . . . 95 4.3 General Statement of the EM Algorithm . . . . . . . . . . . . . . . . . . . . 96 4.4 EM application to BECBSC . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 4.4.1 Expectation step on BECBSC . . . . . . . . . . . . . . . . . . . . . . 97 4.4.2 Maximization step on BECBSC . . . . . . . . . . . . . . . . . . . . . 98 4.4.3 Two simple special cases, BEC and BSC . . . . . . . . . . . . . . . . 99 4.5 EM application to AWGN . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 4.5.1 Expectation step on AWGN . . . . . . . . . . . . . . . . . . . . . . . 99 4.5.2 Maximization step on AWGN . . . . . . . . . . . . . . . . . . . . . . 100 4.6 Simulations and Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 4.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 5 Conclusions and Perspectives 103 Bibliography 107 ii
  • 5. List of Figures 1.1 Block Diagram for a Source Controlled Channel Coding System . . . . . . 8 1.2 Minimum achievable Ebr /N0 vs. source entropy Hs Coding rateR = 0.5. . 12 1.3 Minimum Achievable SNR versus Coding Rate in AWGN, Hs = 0.5. . . . 13 1.4 Capacity limit versus source entropy for BEC and BSC. . . . . . . . . . . . 14 1.5 Factor Graph of an LDPC Code . . . . . . . . . . . . . . . . . . . . . . . . . 15 1.6 Mackay-Neal Codes: Block and Tanner Graph descriptions . . . . . . . . . 18 1.7 General Non-Systematic LDPC Encoding Framework . . . . . . . . . . . . 19 1.8 Scramble LDPC Tanner Graph and block Diagram Description . . . . . . 20 1.9 Split-LDPC Tanner Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 1.10 General Structure of a Split-LDPC Parity Check Matrix . . . . . . . . . . . 22 1.11 Mutual information vs. Ebr /N0 for Hs = 0.5 and coding rates R = 0.5, 0.8 23 1.12 Minimum achievable Ebr /N0 vs. source entropy Hs Coding rate R = 0.8. . 24 1.13 The local message node to checknode update rule. . . . . . . . . . . . . . . 25 1.14 The local checknode to bitnode update rule. . . . . . . . . . . . . . . . . . . 26 1.15 Multi-Edge graphical interpretation of Split and Scramble-LDPC. . . . . . 28 1.16 Performance of systematic and non systematic LDPC with R = 0.5, µ = 0.1 30 1.17 Performance of systematic and non systematic LDPC with R = 0.5, µ = 0.2 31 1.18 Performance of systematic and non systematic LDPC with R = 0.9, µ = 0.1 31 1.19 The effect of splitter degree variation over Split-LDPC codes with R = 0.5 32 2.1 The tree describing a regular (db , dc ) LDPC. . . . . . . . . . . . . . . . . . . 40 2.2 Tree representation for type-1 messages for rate R = 1/2 split-LDPC. . . . 43 2.3 Tree representation for type-2 messages for rate R = 1/2 split-LDPC. . . . 44 2.4 Tree representations for type-3 messages for rate R = 1/2 split-LDPC. . . . 44 2.5 Consistency of the Splitter Output distribution . . . . . . . . . . . . . . . . 
49 2.6 The equivalent channel that is seen by the core-LDPC . . . . . . . . . . . . 56 2.7 Threshold versus source entropy variation for the Split-LDPC . . . . . . . 61 3.1 EXIT charts Analysis General Principle. . . . . . . . . . . . . . . . . . . . . 69 3.2 The evolution of densities at the bitnode output (left) and checknode out- put(right) for a regular (3, 6) LDPC at 1.18dB. . . . . . . . . . . . . . . . . . 70 3.3 Elementary GA charts with single Gaussian pdf input for different variable degrees for a regular LDPC with dc = 6 on an AWGN at SN R = −2.0dB . 72 3.4 Elementary charts with Gaussian mixture input for the different variable nodes degrees for an irregular LDPC with ρ6 = 0.7855 ρ7 = 0.2145 and λ(x) = .3266x + .1196x2 + .1839x3 + .3698x4 on an AWGN channel at SNR=-2.0dB. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 iii
  • 6. LIST OF FIGURES 3.5 EXIT charts trajectory with Gaussian mixture input for an irregular LDPC with ρ6 = 0.7855 ρ7 = 0.2145 and λ(x) = .3266x + .1196x2 + .1839x3 + .3698x4 on an AWGN channel at SNR=-2.0dB. (the tunnel is open). . . . . 74 3.6 Tree representation for type-1 messages for rate R = 1/2 split-LDPC. . . . 75 3.7 Tree representation for type-2 messages for rate R = 1/2 split-LDPC. . . . 76 3.8 Tree representations for type-3 messages for rate R = 1/2 split-LDPC. . . . 76 3.9 Transfer chart F (x, y) without mixtures. Illustration for ds = db = 3, dc = E 6, Ns = −5.00dB. Input distributions are single Gaussian.Entropy Hs = 0.5 0 79 3.10 Trajectory of error probability near the code threshold over the surface associated to the transfer chart F (x, y) without mixtures. Illustration for E ds = db = 3, dc = 6, right to the threshold : Ns = −5.00dB. Input distribu- 0 tions are single Gaussian. Entropy Hs = 0.5 . . . . . . . . . . . . . . . . . . 80 3.11 Open tunnel obtained from the EXIT chart of the ds = db = 3, dc = 6 E regular split-LDPC with at Ns = −5dB , Entropy Hs = 0.5. The tunnel 0 is made by plotting the trajectory of error probability and its z = x plane reflection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 3.12 Transfer chart F (x, y) with mixtures. Illustration for a (ds = 3, db = 3, dc = E 6) regular LDPC code at Ns = −5.00dB. Input distributions are Gaussian 0 mixtures. Channel initial point at top of the sail. Fixed point of zero error rate at bottom of the sail.Entropy Hs = 0.5 . . . . . . . . . . . . . . . . . . . 82 3.13 Transfer chart F (x, y) with and without mixtures and associated error prob- E ability trajectories. Illustration for a regular LDPC code at Ns = −5.00dB. 0 Channel initial point at top of the sail. Fixed point of zero error rate at bottom of the sail.Entropy Hs = 0.5 . . . . . . . . . . . . . . . . . . . . . . . 82 3.14 Trajectory of error probability near the code threshold. 
Illustration for an irregular split LDPC code with λ(x) = 0.3266x + 0.1196x2 + 0.18393x3 + E 0.36988x4 , ρ(x) = 0.78555x5 + 0.21445x6 . Right to the threshold: Ns = 0 −5.58dB, Threshold=−5.68dB. Final fixed point is 0.Entropy Hs = 0.5 . . 83 3.15 Trajectory of error probability near the code threshold. Illustration for an irregular split LDPC code with λ(x) = 0.3266x + 0.1196x2 + 0.18393x3 + E 0.36988x4 , ρ(x) = 0.78555x5 + 0.21445x6 . Left to the threshold: Ns = 0 −5.78dB, Threshold=−5.68dB. Final fixed point is non-zero.Entropy Hs = 0.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 3.16 Stability of fixed points beyond the code threshold. Illustration for an ir- regular split LDPC code with λ(x) = 0.3266x + 0.1196x2 + 0.18393x3 + E 0.36988x4 , ρ(x) = 0.78555x5 + 0.21445x6 . Left to the threshold: Ns = 0 −5.78dB, Threshold=−5.68dB. Stable and unstable fixed points.Entropy Hs = 0.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 3.17 Open tunnel obtained from the EXIT chart of an irregular split-LDPC with λ(x) = 0.3266x + 0.1196x2 + 0.18393x3 + 0.36988x4 , ρ(x) = 0.78555x5 + E 0.21445x6 , at Ns = −5.58dB , Entropy Hs = 0.5. The tunnel is made by 0 plotting the trajectory of error probability and its z = x plane reflection . . 85 4.1 General Model of a coded communication system with EM estimation . . 94 4.2 General model for the EM source-channel estimation . . . . . . . . . . . . 96 4.3 Estimated source distribution µ versus EM iterations. . . . . . . . . . . . . 100 4.4 EM Performance for the (3,6) LDPC over BEC . . . . . . . . . . . . . . . . . 101 4.5 EM Performance for systematic and non-systematic LDPC over AWGN . 102 iv
  • 7. List of Tables 1.1 Multinomial description of Split (top) and Scramble (bottom) LDPC . . . 29 2.1 Threshold table for regular Split LDPC with R = Hs = 1 and ds ∈ {3, 5} . 62 2 2.2 Threshold tables for regular Split-LDPC with ds = 3, R = 1 and Hs ∈ {1, .3} 62 2 2.3 Threshold table for irregular Split-LDPC with R = Hs = 1 and ds = 3 . . 63 2 3.1 Error in approximation of threshold (dB) using EXIT chart analysis for var- ious regular split-LDPC codes of Rate one-half with ds = 3 and Hs = 0.5. ∆ is the log-ratio quantization step. . . . . . . . . . . . . . . . . . . . . . . . 86 1 3.2 Threshold table for irregular split LDPC with Hs = 2 and ds = 3 . . . . . . 88 v
  • 9. A CRONYMS ACRONYMS Here we list the main acronyms used in this document. The meaning of the acronym is usually indicated once, the first time it appears in the text. JSCC Joint Source Channel Coding SCCD Source Controlled Channel Decoding LDPC Low Density Parity Check Code MN Mackay-Neal AWGN Additif White Gaussian Noise BIAWGN Binary Input Additif White Gaussian Noise BEC Binary Erasure Channel BSC Binary Symmetric Channel QLI Quick Look In pdf Probability density function pmf Probability mass function i.i.d independant and idenitcally distributed BPSK Binary Phase Shift Keying LDGM Low Density Generator Matrix LLR Log Likelihood Ratio SNR Signal to Noise Ratio DE Density Evolution EM Expectation Maximization CSI Channel State Information SSI Source State Information BECBSC Binary Symmetric Channel with Erasures APP A Posteriori Probability BWT Burrows Wheeler transform EXIT Extrinsic Information Transfer MIMO Multi Input Multi Output GA Gaussian Approximation vii
  • 11. Introduction In the past decade, the field of Modern Coding Theory has known a focused interest and an intensity of efforts for the design of capacity achieving codes defined on graphs and the analysis of the associated iterative decoding algorithms [52],[72]. This excitement for codes on graphs and the iterative techniques was ignited in the mid-1990s by the excellent performance exhibited by the turbo codes of Berrou and Glavieux [17], and the resuscitated Gallager codes [33]. Afterwards, iterative methods have been influencing a wide range of applications within and beyond communications, allowing the development of sophisticated com- munication systems. In most of those communication systems, the channel coding procedures are designed independently of the source statistics assuming uniformly distributed sources. This ap- proach is justified by the Shannon’s famous separation theorem [83] stating that separate source channel coding incurs no loss of optimality as long as the entropy of the source is less than the capacity of the channel providing that blocklength goes to infinity. Because of various considerations (such as legacy and the difficulty in adapting uni- versal data compressors to short-packet transmission systems), certain existing applica- tions do not compress the redundant data prior to channel encoding (including third- generation wireless data transmission systems)[60]. In other applications, such as the Global System for Mobile Communications (GSM) second-generation cellular wireless system, the vocoder leaves some residual redundancy prior to channel encoding [111], [64], [108], [60] . In those cases, the signal sent through the channel incorporates redun- dancy due to its channel encoder and the data itself. While Shannon has been establishing the famous separation theorem, he intuited that any redundancy in the source will usually help if it is utilized at the receiving point [83]. 
In other words, when the residual source redundancy is ignored by the receiver, some losses are induced and the system performs worse than its potentiality. The first practical embodiment of this idea was the proposal of the Source Controlled Channel Decoding technique (SCCD) made by Hagenauer in [39] and consisting in ex- ploiting the redundancy as an a priori information during the iterative decoding process. Besides, Shamai and Verdu have shown in [78] that the use of non systematic coding schemes is more appropriate in presence of redundant sources. This is because those constructions supply the capacity achieving distribution to the channel. In the light of these two pioneering contributions several research directions have been drawn, where most important concepts encountered during the study of codes on graphs within the uniform sources assumption are to be carried over to the non uniform case. Some of the major concerns of those directions are: 1
• Investigating the new information-theoretical limits in the presence of redundant sources.

• Designing capacity-achieving non-systematic codes on graph.

• Defining the associated source-controlled iterative decoding algorithms.

• Building analysis tools in order to investigate the asymptotic performance of non-systematic codes in the presence of redundancy, and to design the best constructions.

These guidelines meet the challenges of the present thesis, whose purpose is to introduce the design principles and the analysis tools for a novel class of non-systematic low-density parity-check codes dedicated to non-uniform memoryless sources. In particular, we study the best-performing scheme of the class, namely the split-LDPC code, which consists of the concatenation of a pre-coder with an LDPC code [5],[80].

In 1997, MacKay and Neal pioneered the field with a class of non-systematic codes called 'MN codes', obtained by scrambling a low-density generator matrix [53],[51]. One of the main drawbacks of MN codes is that their decoder is specifically designed for the binary symmetric channel. Another major proposal was made by Alajaji for non-systematic parallel turbo codes in [116].

Afterwards, in 2005, G. Shamir and J.J. Boutros introduced a novel general approach to design a special class of non-systematic low-density parity-check codes dedicated to non-uniform memoryless sources. This code family includes MN codes as a particular case. First, the authors proposed several design configurations based on the concatenation of a pre-coder or a post-coder with an LDPC code. We went on with further investigations and found that the best-performing scheme consists of the concatenation of a pre-coder with an LDPC code, namely the split-LDPC [5],[80].
In this thesis we focus on the design of split-LDPC codes in the presence of redundant memoryless data, and on the asymptotic analysis of the associated source-controlled sum-product decoding.

Firstly, we investigate the new theoretical limits in the presence of non-uniform sources and show theoretically the advantage of non-systematic constructions over systematic ones. Then we introduce the novel class of non-systematic LDPC codes and prove the superiority, in terms of performance, of splitting structures over all other non-systematic codes-on-graph constructions, including MN codes.

Secondly, we perform the asymptotic analysis of split-LDPC codes through new tools obtained by adapting the classical general theory of Density Evolution and EXIT chart analysis to the split-LDPC case and to the non-uniform source assumption. Using the tools introduced for the analysis, we address the problem of designing good irregular split-LDPC constructions.

Finally, we extend our communication setting to practical scenarios where knowledge of the source and channel parameters is unavailable. To do so, we investigate jointly the decoding and the estimation of the source and channel parameters, using an iterative estimation technique based on the Expectation-Maximization (EM) algorithm.

The material covered in this thesis has the special appeal that it unifies many themes of information theory, coding, and communication. The outline of this thesis is as follows:
• In Chapter 1, we demonstrate using information-theoretic tools why non-systematic constructions exhibit the best theoretical limits in the presence of redundancy. In order to attain these limits, we introduce a universal family of non-systematic LDPC codes extending MacKay's construction and well suited to SCCD sum-product decoding. The proposed configurations are based on the concatenation of a pre-coder or a post-coder with an LDPC code. The pre-coding module consists of a sparse matrix ('scrambler') or the inverse of a sparse matrix ('splitter') designed to transform the redundant data bits into uniformly distributed coded bits. We show how to perform source-controlled iterative decoding (SCCD) on the factor graph associated with the non-systematic LDPC codes. We prove the superiority, in terms of performance, of splitting structures over prior non-systematic LDPC constructions as well as over the other realizations of our codes. The material of this chapter is reported in part in [5] and [80].

• Chapter 2 aims to determine the asymptotic performance of the split-LDPC codes, and to evaluate how close they come to the theoretical limits. To this end, we derive a message-oriented density evolution analysis of the split-LDPC source-controlled sum-product decoding. The basic concepts, features, assumptions and results of the general theory are adapted to the split-LDPC construction and to the presence of redundant data. The analysis reveals that the split-LDPC construction shows good asymptotic performance. In order to make our analysis complete, we perform a stability analysis of the dynamical system associated with the split-LDPC density evolution algorithm. We derive a general stability condition for the split-LDPC code involving the source statistics as well as the splitter structure. We investigate whether the stability of the whole system is related to the stability of the constituent LDPC code for a given channel condition.
The material of this chapter is reported in part in [5] and [4].

• In Chapter 3, we propose a message-oriented two-dimensional EXIT chart analysis for split-LDPC codes. Our approach is based on eliminating the Gaussian assumption at the output of the check nodes, and on using mutual information as a measure of approximation. By doing so, we compute the split-LDPC code thresholds to within a few thousandths of a decibel, at a lower complexity than density evolution approaches. Through the formulation of the two-dimensional EXIT chart analysis of split-LDPC codes, the source-controlled iterative decoder is illustrated as a two-dimensional dynamical system. Hence, the graphical representation of the charts brings more insight into the process of convergence to fixed points, as well as into the stability issues of the associated iterative decoding system. Finally, within the framework of our proposal, we formulate the problem of designing irregular split-LDPC codes as a linear program. We propose practical algorithms to solve this problem and design capacity-achieving split-LDPC codes. Hence, we obtain irregular split-LDPC constructions with a significantly reduced gap to the Shannon limit. The material of this chapter is reported in part in [4].

• Chapter 4 is devoted to practical scenarios where knowledge of the source and channel parameters is unavailable and an estimation module is required. Therefore, we investigate jointly the decoding and the estimation of the source and channel parameters, using an iterative estimation technique based on the Expectation-Maximization (EM) algorithm. We describe how the estimation technique can be
integrated into the non-systematic decoding process. We propose a scheduling scheme for the interaction between the sum-product decoder and the EM estimation module. Our approach covers both systematic and non-systematic LDPC codes, and can be extended to any other block code [58] decoded by the sum-product algorithm. Our study is applied to the discrete binary symmetric channel with erasures (BECBSC) and to the continuous complex additive white Gaussian noise (AWGN) channel. The simulation results confirm that with the EM estimation technique no loss in error-rate performance is observed with respect to the perfect-knowledge case, at the expense of a negligible complexity increase. Hence, we show that the decoder, operating blindly, performs as well as in the perfect-knowledge situation. The material of this chapter is reported in part in [19].

• Finally, conclusions, future work perspectives and open problems are given in Chapter 5.
Chapter 1

Non Systematic Constructions of Codes on Graph for Redundant data

Claude Elwood Shannon mandates in his gospel: "The redundancy must be introduced in the proper way to combat the particular noise structure involved. However any redundancy in the source will usually help if it is utilized at the receiving point. In particular if the source already has a certain redundancy and no attempt is made to eliminate it in matching to the channel, this redundancy will help to combat noise."

In a tandem view, where channel and source coding are treated independently, the first redundancy cited above is the one introduced by the forward error-correcting code, while the second is related to the source and can be natural (no source coding applied) or residual (compression is applied). Two directions have been taken to demonstrate Shannon's intuition. In 1995, Hagenauer showed how iterative probabilistic decoding is able to exploit the source statistics via the Source Controlled Channel Decoding (SCCD) strategy. At the end of the nineties, MacKay and later Alajaji proposed to use non-systematic constructions to encode redundant data, because their theoretical limits are better than those of systematic ones in the presence of redundancy. Both authors focused on capacity-achieving code families, in order to approach as closely as possible the challenging theoretical limits attainable with non-systematic constructions. Hence, a well-designed non-systematic construction combined with SCCD decoding constitutes the key to the success of a coding system in the presence of redundant data.

In our setting, we investigate this idea and demonstrate using information-theoretic tools why non-systematic constructions exhibit the best theoretical limits in the presence of redundancy. In order to attain these limits, we describe a particular design of non-systematic LDPC codes extending MacKay's construction and well suited to SCCD belief propagation decoding.
We reveal that one specific realization of this code family outperforms the prior LDPC constructions as well as the other realizations of the family. We explain how the introduced design criteria contribute to this performance.
1.1 Introduction

1.1.1 Motivation and Related Work

In most tandem communication schemes, the channel coding procedures are designed and analyzed for uniformly distributed sources. To a large extent, this approach is justified by Shannon's famous separation theorem [83], which states that separate source and channel coding incurs no loss of optimality as long as the entropy of the source is less than the capacity of the channel, provided that the blocklength goes to infinity. In practice, the delay and complexity restrictions associated with extremely long blocklengths mean that Joint Source Channel Coding (JSCC) approaches are expected to offer improvements for the combination of a source with significant redundancy and a channel with significant noise [110]. Moreover, the separation theorem is no longer valid in multi-user scenarios [72].

Enhancing the performance of tandem schemes by making the channel codes exploit the source redundancy, which can be natural or residual, is in essence a joint source-channel coding problem. This problem was first intuited by Shannon [83], then formalized by Hagenauer's proposal of Source Controlled Channel Decoding (SCCD) [39] based on iterative probabilistic approaches.

Natural redundancy is considered in situations where compression is not worthwhile, for instance when the channel conditions are bad or medium; it is then preferable to exploit this redundancy in the decoding step instead of compressing the source [20]. Conversely, when sources are highly redundant, a source encoder is used. An "ideal" source encoder would produce an independent, identically distributed (i.i.d.) sequence of equiprobable bits at its output and eliminate all the source redundancy. However, lossless variable-length or entropy codes (e.g., Huffman codes), which can be asymptotically optimal, are avoided since they imply error-propagation problems in the presence of channel noise.
Therefore, over noisy channels we commonly use fixed-length encoders, which are suboptimal and leave some residual redundancy at their output [37]. In the presence of non-uniform sources, the study of information-theoretical limits has shown that non-systematic codes exhibit better asymptotic limits than systematic constructions. Those codes supply the capacity-achieving distribution to the considered channel, as highlighted by Shamai and Verdu in [78]. Consequently, when a redundant source is coded with a non-systematic code, transmitted over a noisy channel and decoded via an SCCD strategy, the Shannon capacity limits move to better regions.

The best candidates in the race toward these challenging information-theoretical limits are capacity-achieving codes on graphs, namely the Turbo Codes introduced by Berrou and Glavieux in 1993 [17] and Gallager's Low Density Parity Check (LDPC) codes [33], proposed in the sixties and resurrected after the turbo code invention [53], [51] and [70]. This class of codes defined on graphs is renowned for its ability to approach the Shannon capacity bound very closely with reasonable decoding complexity. Both families are obtained by connecting simple component codes through an interleaver, and are decoded via iterative algorithms consisting of the application of soft, decentralized decoding algorithms for these simple codes (e.g., BCJR and message passing) [10],[33]. Those algorithms are realizations of "Belief Propagation" [63], a more general algorithm that is well suited to integrating the SCCD strategy.

Until the end of the twentieth century, most developments in the coding area considered the design and optimization of codes defined on graphs for uniformly distributed
sources, until David MacKay and Radford Neal [53] made the exception in 1995 when they proposed MacKay-Neal (MN) codes, a class of non-systematic codes obtained by scrambling a low-density generator matrix [51]. The decoder of an MN code was specifically designed for the binary symmetric channel. Hence, when it is applied to the additive white Gaussian noise (AWGN) channel, this is done by hard-decision quantization of the a priori values, which involves a degradation in decoding performance. Other work on SCCD with systematic LDPC codes was proposed by Shamir in [81]. The author showed a gain compared to standard decoding owing to the utilization of source statistics; however, this performance is still below the one expected with non-systematic constructions. The main subsequent proposals were made by Alajaji et al. and were mainly related to non-systematic Turbo codes for AWGN channels [3],[114],[116], [115], [113], and more recently for wireless fading channels [112]. In [1], Adrat et al. proposed an improved iterative source-channel decoding for systematic Turbo codes using EXIT charts.

Another straightforward solution to obtain a non-systematic code is to generate a lower-rate systematic LDPC code, puncture the systematic bits and transmit the parity bits. However, puncturing has been shown to be unsuccessful: the gain attained is not sufficient to offset the loss of performance due to puncturing [79].

At the IEEE 2005 International Symposium on Information Theory, G. Shamir and J.J. Boutros introduced a novel general approach to design a special class of non-systematic low-density parity-check codes suited for non-uniform memoryless sources. First, the authors proposed several design configurations based on the concatenation of a pre-coder or a post-coder with an LDPC code.
Subsequently, we carried out further investigations revealing the best-performing scheme, namely the split-LDPC [5],[80]. The purpose of the present chapter is to introduce this universal class of non-systematic low-density parity-check codes through the underlying concepts and techniques involved in the design of this code family, as well as the algorithmic strategies undertaken to decode it. Moreover, we prove, using information-theoretic tools, the superiority of the split-LDPC construction.

1.1.2 Contributions

The main contributions of the author in this chapter are based on the results reported in two papers, presented at the Allerton Conference on Communication, Control, and Computing [5] and at the inaugural Information Theory and Applications Workshop [80]. The innovative contribution of this chapter is threefold. First, we demonstrate, in terms of information-theoretical limits, the advantage of non-systematic code constructions over systematic ones in the presence of non-equiprobable memoryless sources. We focus on binary memoryless channels, namely the BEC, the BSC and the AWGN channel, and evaluate analytically the losses incurred by systematic coding in terms of mutual information and energy. Then, we introduce and study the general method proposed by Shamir and Boutros to construct a universal family of non-systematic LDPC encoders and decoders comprising two kinds of constructions, called scramble-LDPC and split-LDPC. We describe how to perform source-controlled decoding (SCCD) to exploit the redundancy of the source during the decoding step.
Finally, we place the split- and scramble-LDPC codes in the more general framework of non-systematic codes, and establish the split-LDPC as the best-performing code design among the other non-systematic constructions, namely the scramble-LDPC and the MN codes. We show that MN codes and our class of non-systematic LDPC codes are both members of a more general family called Multi-Edge LDPC [69]. The most important results are the following:

• We demonstrate that in the presence of source redundancy there may be a significant advantage in using a well-designed non-systematic channel encoding over a systematic one.

• We describe general methods for designing non-systematic LDPC codes by scrambling or splitting redundant data bits into coded bits. These methods consist mainly of cascading a sparse matrix or the inverse of a sparse matrix with an LDPC code.

• Splitting-based LDPC codes achieve better gains in the presence of redundancy than other known codes, including MacKay-Neal (MN) codes, without significant loss in performance even if the data contains no redundancy.

1.2 System Model and Notations

[Figure 1.1: Block Diagram for a Source Controlled Channel Coding System — a non-uniform source with p(s = 1) = µ feeds a channel encoder, the channel, and a channel decoder (which knows µ), followed by the sink.]

Let us describe the considered system model following the block diagram given in Figure (1.1). Our setting assumes a non-uniform binary independent identically distributed (i.i.d.) source which generates a binary sequence of length K denoted by s = (s1, s2, ..., sK)^T, where si ∈ {0, 1}. The source follows a Bernoulli probability distribution, characterized by the parameter µ = P(si = 1). This parameter is the probability that a source symbol equals 1; without loss of generality, 0 < µ ≤ 1/2. The source entropy is given by Hs = H2(µ), where 0 < Hs ≤ 1 and H2(x) = −x log(x) − (1 − x) log(1 − x) is the binary entropy function.
The logarithm function, here and elsewhere in this report, is taken to base 2. Our model is not restrictive, because it is still possible to convert a general finite-memory source into a piecewise i.i.d. non-uniform source, as proposed in [28]. The interested reader may find the developments related to finite-memory sources in [36].
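As a small numerical illustration of the source model (the sample values of µ below are hypothetical, chosen only for this sketch), the binary entropy H2(µ) can be computed directly:

```python
import math

def h2(mu: float) -> float:
    """Binary entropy H2(mu) in bits (base-2 logs, as in the text)."""
    if mu in (0.0, 1.0):
        return 0.0
    return -mu * math.log2(mu) - (1 - mu) * math.log2(1 - mu)

# A uniform source carries 1 bit per symbol; a biased source with
# mu = P(s_i = 1) = 0.1 carries far less, leaving exploitable redundancy:
print(h2(0.5))  # 1.0
print(h2(0.1))  # ~0.469
```

Note how quickly the entropy drops as µ moves away from 1/2: this gap 1 − H2(µ) is precisely the redundancy that a source-controlled decoder can exploit.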
We assume that the source sequence is directly fed to a linear block channel encoder C(N, K) without any data compression. Our study is restricted to channel encoding via systematic and non-systematic binary linear codes of rate R = K/N. The codeword has length N ≥ K and is denoted by c = (c1, ..., cN)^T. We denote by x = (x1, ..., xN)^T the BPSK-modulated codeword, which is transmitted over a symmetric memoryless noisy channel. Three kinds of binary-input symmetric-output channels are considered in our setting: the binary erasure channel (BEC), the binary symmetric channel (BSC) and the binary-input additive white Gaussian noise channel (BIAWGN). In all the following we denote by AWGN the binary-input AWGN channel; a continuous-input AWGN channel will be indicated where necessary. The BEC has a binary input and a ternary output and is characterized by an erasure probability denoted ε. The BSC is characterized by its crossover probability denoted λ. The AWGN channel is characterized by its one-sided power spectral density N0. The received noisy vector, denoted by y = (y1, ..., yN)^T, is fed to the decoder, which is assumed to know the distribution of the source as well as its parameter µ.

1.3 Information Theoretical Limits for Non Uniform Sources

In this section, we illustrate the advantage of the best possible non-systematic codes over systematic codes when the source is redundant. We show how the mutual information between the channel input and output can be made closer to the maximal achievable one using well-designed non-systematic codes, while systematic codes constrain the mutual information to be smaller. We start by recalling some useful information-theoretic facts. Let I(X;Y) = H(Y) − H(Y|X) denote the mutual information between the channel input vector X and the channel output vector Y.
Capital letters are used to denote random variables and capital boldface letters are used to denote random vectors. The channel capacity is given by:

C = \max_{p(x)} I(X;Y)    (1.1)

where the maximum is taken over all possible input distributions p(x) [22]. In practice, we use the following capacity expression in the continuous-channel case; when the channel is discrete, the integration is replaced by a summation:

C = \max_{P(x)} \int_x \int_y P(x) P(y|x) \log \frac{P(y|x)}{\sum_{x'} P(x') P(y|x')} \, dx \, dy    (1.2)

For a given channel and channel input distribution, the theoretically achievable channel code rate R satisfies:

N × Hs × R = I(X;Y)    (1.3)

Let us recall that the discrete uniform input distribution achieves the capacity of binary discrete-input memoryless channels, just as the Gaussian distribution does for the continuous-input AWGN channel. Moreover, the discrete uniform input distribution attains the maximal achievable mutual information over the binary-input AWGN channel (BIAWGN), which is indeed lower than the exact capacity of the channel, being limited by the size of the modulation.
The weakness of systematic codes arises from the fact that they result in channel sequences that are not uniformly distributed for most redundant sources. Hence, the empirical distribution of the transmitted sequences is far from the capacity-achieving distribution (see [78] for a discussion of the empirical distribution of good codes). In the following, let us evaluate the mutual information I(X;Y) for both systematic and non-systematic code designs on the BEC(ε), the BSC(λ) and the BIAWGN(N0); equation (1.3) then becomes:

Hs R = I(X;Y)    (1.4)

For a systematic code, when the source is non-uniform, the modulated sequence x includes a non-uniformly distributed information sequence xs and a uniformly distributed parity sequence. The mutual information for a systematic code is the average taken over these two populations:

I_sys(X;Y) = R I_information(Xs;Ys) + (1 − R) I_parity(X;Y)    (1.5)

Besides, a well-designed, theoretically optimal non-systematic code can generate an empirically uniform distribution, achieving the maximal achievable mutual information associated with the channel as well as with the considered modulation, i.e. BPSK:

I_nonsys(X;Y) = I_parity(X;Y)    (1.6)

Accordingly, the systematic construction incurs a loss in mutual information, as we demonstrate in the following.

1.3.1 AWGN Channel

Let us begin with the continuous case and consider an AWGN channel with a real Gaussian codebook of rate R = K/N for channel encoding. The information rate transmitted by the encoder is the product Hs × R (bits per real dimension) and the Gaussian distribution is the capacity-achieving input distribution for the Gaussian channel.
For a given source distribution and a fixed coding rate, in order to find the minimal achievable signal-to-noise ratio per bit, Ebr/N0, where Ebr denotes the energy per redundant source symbol, following (1.3) and (1.4) we equate the information rate with the channel capacity:

Hs R = \frac{1}{2} \log(1 + 2 R E_br/N_0)    (1.7)

We then find the well-known expression of the theoretical minimum achievable SNR [36]:

E_br/N_0 = \frac{2^{2 Hs R} − 1}{2R}    (1.8)

We recover the Shannon limit for the uniform-source case by setting Hs = 1, which upper-bounds (1.8). Consequently, if one does not consider the statistics of the source and assumes it uniform, only suboptimal limits are achieved and a potential gain is lost. For an AWGN channel with BPSK input and a real continuous output, the maximal achievable mutual information, denoted C_BPSK, is written as:

C_BPSK = 1 − E[\log_2(1 + \exp(−2X/N_0))],  X is N(1, N_0)    (1.9)
where the mathematical expectation is denoted by E[·], and N(1, N0) denotes a Gaussian random variable with mean 1 and variance N0. The BPSK amplitude A = \sqrt{2 R E_br} is normalized to 1. As argued in equation (1.5), we obtain the best achievable average mutual information for a systematic code with BPSK modulation by averaging the systematic bits' mutual information, denoted I(Xs;Ys), and the parity bits' best mutual information C_BPSK, as follows:

I_sys(X,Y)/N = R I(Xs;Ys) + (1 − R) C_BPSK ≤ C_BPSK    (1.10)

where:

I(Xs;Ys) = (1 − µ) E[\log_2(1 − µ (1 − \exp(2Y/N_0)))] + µ E[\log_2(1 − (1 − µ)(1 − \exp(−2X/N_0)))]    (1.11)

where X is N(1, N0) and Y is N(−1, N0). A theoretically optimal non-systematic code can generate uniform distributions for all components of X and achieve C_BPSK. Accordingly, we can deduce that:

I_sys = I_nonsys − R × E\left[\log_2 \frac{f(X)^{µ} f(−X)^{1−µ}}{f_{1/2}(X)}\right]    (1.12)

We define f(X) = µ e^{X/N_0} + (1 − µ) e^{−X/N_0}, with f_{1/2} denoting f evaluated at µ = 1/2, and keep the same notations and assumptions as in (1.9). Equations (1.10) and (1.12) quantify the loss in mutual information induced when systematic codes encode non-uniform data. The minimum achievable Ebr/N0 for the AWGN channel, corresponding to the threshold, is found by numerically equating the information rate with (1.9) for the non-systematic case on the one hand, and with (1.12) for the systematic case on the other. The threshold variation versus the source entropy is illustrated in Figure (1.2), where we can observe that the gain of non-systematic codes is more significant for highly biased sources. Furthermore, Figure (1.3) shows that the loss of systematic codes increases significantly with the channel code rate, pointing out that for high-rate codes there is a significant benefit in using non-systematic codes.
This is expected, since high-rate systematic codes contain more non-uniformly distributed bits, supplying to the channel a distribution far from the capacity-achieving one.

1.3.2 Binary Erasure Channel

The maximal achievable mutual information attained with a non-systematic code over a binary erasure channel with erasure probability ε is:

I_nonsys(X;Y) = 1 − ε    (1.13)

With a systematic construction we find, using (1.5):

I_sys(X;Y) = I_nonsys(X;Y) − R (1 − ε)(1 − Hs)    (1.14)
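The limits derived so far lend themselves to a quick numerical check. The sketch below is my own illustration, not code from the thesis; the function names and the sample values µ = 0.11, R = 1/2 are hypothetical. It evaluates the Gaussian-input limit (1.8), a Monte-Carlo estimate of C_BPSK from (1.9), and the BEC erasure thresholds obtained by solving the rate condition Hs·R = I(X;Y) of (1.4) with (1.13) and (1.14):

```python
import math
import random

def h2(p: float) -> float:
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def gaussian_input_limit(hs: float, r: float) -> float:
    # Eq. (1.8): minimum achievable Ebr/N0 (linear scale) for a real
    # Gaussian codebook of rate r and source entropy hs.
    return (2.0 ** (2.0 * hs * r) - 1.0) / (2.0 * r)

def c_bpsk(n0: float, trials: int = 100_000, seed: int = 7) -> float:
    # Eq. (1.9): Monte-Carlo estimate of the BPSK-input AWGN mutual
    # information, with X ~ N(1, n0) (variance n0, as in the text).
    rng = random.Random(seed)
    acc = sum(
        math.log2(1.0 + math.exp(-2.0 * (1.0 + rng.gauss(0.0, math.sqrt(n0))) / n0))
        for _ in range(trials)
    )
    return 1.0 - acc / trials

def bec_threshold(hs: float, r: float, systematic: bool) -> float:
    # Solve Hs*R = I for the erasure probability, using (1.13) for the
    # non-systematic case and (1.13)-(1.14) for the systematic one.
    if systematic:
        return 1.0 - hs * r / (1.0 - r * (1.0 - hs))
    return 1.0 - hs * r

hs, r = h2(0.11), 0.5                   # biased source with Hs ~ 0.5
print(gaussian_input_limit(1.0, r))     # 1.0 (0 dB): uniform-source Shannon limit
print(gaussian_input_limit(hs, r))      # ~0.414 (~-3.8 dB): the redundancy gain
print(c_bpsk(0.5))                      # BPSK-input limit at N0 = 0.5
print(bec_threshold(hs, r, False))      # ~0.75 erasures tolerated
print(bec_threshold(hs, r, True))       # ~0.67: the systematic loss
```

For a uniform source (Hs = 1) the two BEC thresholds coincide at 1 − R, while the gap between them grows as the source becomes more biased, matching the trend discussed around Figures (1.2) and (1.4).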
[Figure 1.2: Minimum achievable Ebr/N0 (dB) vs. source entropy Hs, coding rate R = 0.5; curves for a systematic code with BPSK input, scrambled codes with ds = 3 and ds = 5, a non-systematic code with BPSK input, and the Gaussian-input limit.]

The upper limit on ε, noted ε^th, characterizing the threshold of the code, should satisfy (1.4); we thus obtain:

ε^th_nonsys = 1 − R Hs    (1.15)

ε^th_sys = \frac{1 − R}{1 − R(1 − Hs)}    (1.16)

The loss of mutual information translates into a reduced erasure threshold for the systematic construction, as we can observe in Figure (1.4).

1.3.3 Binary Symmetric Channel

The maximal achievable mutual information attained with a non-systematic code over a binary symmetric channel with transition probability λ is:

I_nonsys(X;Y) = 1 − H2(λ)    (1.17)

With a systematic construction, defining γ = µ(1 − λ) + (1 − µ)λ, we find using (1.5):

I_sys(X;Y) = I_nonsys(X;Y) − R [1 − H2(γ)]    (1.18)

In order to evaluate the threshold in the BSC case, we need to solve (1.4) numerically. As in the BEC case, the loss of mutual information translates into a reduced threshold, as we can observe in Figure (1.4).

In the presence of redundant sources, non-systematic codes are more advantageous because, when some constraints are satisfied, they generate an asymptotically uniform output. In contrast, systematic constructions involve a mismatch between the biased distribution
of the systematic symbols and the uniform input distribution needed to achieve channel capacity. This interpretation comes in light of the work of Shamai and Verdu in [78], proving that the empirical distribution of any good code should approach the capacity-achieving input distribution.

[Figure 1.3: Minimum Achievable SNR (dB) versus Coding Rate in AWGN, Hs = 0.5; curves for systematic codes with BPSK input, scrambled codes with ds = 3 and ds = 5, a non-systematic code with BPSK input, and the Gaussian-input limit.]

To attain the challenging theoretical limits computed in the present section, well-designed non-systematic constructions are required. As a matter of fact, capacity-achieving codes are the best candidates to attain these limits. Accordingly, in the following section we set out the main issues related to the design principles of non-systematic codes. Then, we describe the prior and the novel non-systematic capacity-achieving code constructions suited to non-uniform sources.

1.4 Design Principles for Non Systematic Codes on Graph

Non-systematic constructions are attractive because of their better use of the coding space and their convenience for redundant data. Puncturing the systematic bits of a lower-rate systematic code is a straightforward method to build a non-systematic code; passing the a priori probabilities to the punctured systematic bits during the decoding process then helps improve the performance of the punctured code. Nevertheless, the gain attained with SCCD is not sufficient to offset the loss due to puncturing. Building good non-systematic codes for redundant sources requires setting up design criteria addressing two main issues:

• Encoding should ensure the transformation of a non-uniformly distributed source into a uniformly distributed one.
• Decoding should follow an SCCD strategy exploiting the statistics of the source.
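The first criterion has a simple quantitative face. The XOR of d independent Bernoulli(µ) bits has P(1) = (1 − (1 − 2µ)^d)/2 by the piling-up lemma, which tends to 1/2 as d grows — so even a sparse modulo-2 combination of biased bits is already close to uniform. The toy sketch below illustrates this effect only; it is not the scrambler or splitter construction itself:

```python
def xor_bias(mu: float, d: int) -> float:
    # Piling-up lemma: P(b1 ^ ... ^ bd = 1) for d i.i.d. Bernoulli(mu)
    # bits equals (1 - (1 - 2*mu)**d) / 2.
    return (1.0 - (1.0 - 2.0 * mu) ** d) / 2.0

# A heavily biased source (mu = 0.1) drifts toward uniformity after
# XOR-ing only a few bits, as one row of a degree-d sparse combiner would do:
for d in (1, 3, 5, 15):
    print(d, xor_bias(0.1, d))
```

The degrees d = 3 and d = 5 are the same orders of magnitude as the scrambled-code curves labeled ds = 3 and ds = 5 in Figures (1.2) and (1.3): low degrees suffice to push the coded-bit distribution close to the uniform one that the channel requires.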
[Figure 1.4: Capacity limit (channel transition probability) versus source entropy for the BEC and BSC, with systematic and non-systematic curves for each channel.]

The first proposals for non-systematic codes were made for turbo codes, first by Massey et al. in [55], then in [11], where the proposed asymmetric non-systematic turbo codes (NSTC) outperform Berrou's (37, 21) code [16] by about 0.2 dB at the 10^-5 BER level. The motivation for using these NSTC is their larger code space, although in the presence of highly biased sources these codes do not approach the Shannon capacity as closely as in the uniform case. In order to fill this gap, Alajaji in [116], and then Shamir in [82], proposed non-systematic turbo code constructions well suited for non-uniform sources. In most of those proposals, constituent codes with the Quick-Look-In (QLI) property are used. The QLI property is defined such that the two parity sequences, added together (modulo 2), equal the information sequence (or a delayed version of it). Actually, the QLI property makes the non-systematic constituent code close to a systematic one; the initial extrinsic estimates for the information bits are then enhanced and good enough to start up the iterative convergence process for the second constituent. Consequently, the performance in the waterfall region is improved for uniform source distributions. Similarly, in the case of more redundant data, Shamir et al. have shown in [82] that QLI codes exhibit good adaptation. Recently, Zhu et al. in [116] have found necessary and sufficient conditions for recursive convolutional encoders to supply asymptotically uniform marginal output distributions, regardless of the degree of source non-uniformity.
These properties offer pertinent design criteria for constructing good turbo codes dedicated to heavily biased non-uniform i.i.d. sources. The most significant proposal for non-systematic LDPC-like codes was Mackay's construction in 1999 [51]. The Mackay-Neal (MN) codes are based on a low-density generator matrix (LDGM) code. This construction was followed by Shamir and Boutros in [79] with a novel general approach called "Non Systematic LDPC", which includes the MN encoder construction as a particular case, while the general decoding procedure enhances MN's. The two approaches dedicated to redundant data follow the same design principle, i.e. the use of a pre-coder or post-coder emulating the effect obtained by the Quick-Look-In property in the non-systematic turbo code proposal of [82]. Besides, we should also cite another class of non-systematic LDGM-like codes, called "rateless codes" or "fountain codes", dedicated to network communication. This code family includes LT codes [49] and Raptor codes [87]. Some common design guidelines may be found between Raptor and scramble-LDPC codes; however, to the best of our knowledge, there is so far no study applying rateless channel encoding to non-uniform data. In the following, after some preliminaries on LDPC codes, we give a detailed description of the design and decoding of the two code families: MN codes, then the universal class of non-systematic LDPC codes.

1.4.1 Preliminaries on LDPC codes

Figure 1.5: Factor Graph of an LDPC Code (N bitnodes c1, ..., cN on the left; ℓ subcode nodes PCE1, ..., PCEℓ on the right, linked through a random graph).

Low-Density Parity-Check (LDPC) codes, invented by Robert Gallager in 1963 [34], are linear block codes described by a low-density parity check matrix and represented through a sparse bipartite factor graph. The graphical and matricial descriptions are related by the fact that the parity check matrix of the code is the adjacency matrix of the bipartite graph. Similar descriptions are used for Low-Density Generator Matrix (LDGM) codes, by considering the generator matrix instead of the parity check matrix in all descriptions. The sparse nature of both the parity check matrix and the graph structure is the key property behind the algorithmic efficiency of LDPC codes. Hence, in the matricial description
the code is defined through the pattern of nonzero entries of the matrix H per row or column. The code is regular if the number of nonzero entries does not vary from column to column or from row to row; otherwise, the code is irregular. The irregular structure, being more general, is the one described in the following. LDPC codes are usually described and studied in terms of their graphical representation. The graphical description of codes started with Tanner graphs for linear codes [89]; the graphical understanding of "codes on graphs" then took shape when the more general concept of the "factor graph" was introduced [46]. Genealogically, factor graphs are a straightforward generalization of the "Tanner graphs" of Wiberg et al. [102], [31]. In Tanner's original formulation, all variables are codeword symbols and hence "visible"; Wiberg et al. introduced "hidden" state variables and also suggested applications beyond coding. Factor graphs take these graph-theoretic models one step further by applying them to functions, in order to represent the factorization of a multivariate function into simpler functions. Therefore, a factor graph is a graphical model that is naturally more convenient for problems with a probabilistic side. Following the factor graph trend, turbo codes and LDPC codes have been unified as "codes on graphs", and their decoding algorithms have been revealed in ([45], [56]) as special instances of the belief propagation algorithm on general Bayesian networks [63]. Not only has this understanding helped coding theorists analyze these codes and design decoding algorithms for them, it has also taught them how to design codes to get the best out of a given decoding algorithm.
A factor graph for an LDPC code is a bipartite graph associated with a parity check matrix H of size ℓ × N, partitioned into N left nodes called variable nodes and ℓ right function nodes usually called checknodes, with E edges linking the two parts. Figure 1.5 shows an example of such a bipartite graph. Notice that in this figure the variable nodes are drawn as circles and the checknodes as squares, as is the case for most factor graphs. A variable node is a binary variable over the alphabet {0, 1} representing one of the N symbols of the codeword c. A checknode PCE_j is an even parity constraint on its neighboring variable nodes. Each checknode represents a row of the parity check matrix H, as each variable node corresponds to a column. Each nonzero entry at position (i, j) in the matrix corresponds to an edge between the variable node c_j and the checknode PCE_i. The represented linear code is of dimension k ≥ (N − ℓ), with equality if and only if all the parity constraints are linearly independent. A graph ensemble is specified through two polynomials:

• the polynomial λ(x) associated with the variable nodes:

λ(x) = Σ_{i=2}^{dbmax} λ_i x^{i−1}     (1.19)

where λ = {λ_2, λ_3, ..., λ_dbmax} denotes the variable edge degree distribution, λ_i denotes the fraction of edges incident on variable nodes of degree i, and λ_i ∈ [0, 1].

• the polynomial ρ(x) associated with the checknodes:

ρ(x) = Σ_{j=2}^{dcmax} ρ_j x^{j−1}     (1.20)
where ρ = {ρ_2, ρ_3, ..., ρ_dcmax} denotes the check edge degree distribution, ρ_j denotes the fraction of edges incident on checknodes of degree j, and ρ_j ∈ [0, 1].

Notice that the graph is characterized in terms of the fraction of edges of each degree, not the fraction of nodes of each degree. For a given length N and a given degree distribution (λ, ρ), we define an ensemble of codes by choosing the edges randomly. It has been shown in [73] that the behavior of almost all instances of an ensemble of irregular codes is concentrated around its expected behavior when the code is large enough. Additionally, the expected behavior converges to that of the cycle-free case. Given the degree distribution (λ, ρ) of an LDPC code and its number of edges E, it is easy to see that the number of variable nodes N is

N = E × Σ_i λ_i / i = E ∫₀¹ λ(x) dx     (1.21)

and the number of checknodes ℓ is

ℓ = E × Σ_i ρ_i / i = E ∫₀¹ ρ(x) dx     (1.22)

Therefore the design rate of the code is

R = 1 − (∫₀¹ ρ(x) dx) / (∫₀¹ λ(x) dx)     (1.23)

Finding a good asymptotic family of irregular codes is equivalent to finding a good degree distribution. Finding a degree distribution that yields a code family with the required properties is not trivial, and will be one of the focuses of this thesis.

1.4.2 Mackay-Neal Codes

Mackay-Neal codes [51], also called MN codes, are non-systematic LDGM codes. The key idea behind MN codes is that the generator matrix is constructed in terms of an invertible matrix, in such a way that the sparse source and the sparse noise can be treated symmetrically during the decoding process. The coding principle consists in encoding a source vector s with a Low-Density Generator Matrix (LDGM) code defined by the matrix C1, followed by scrambling the LDGM codewords by the inverse of a sparse matrix denoted C2.
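As a side illustration of the ensemble parameters (1.21)-(1.23) above, the following short sketch (hypothetical code, not from the thesis) computes N, ℓ and the design rate R from an edge-perspective degree distribution; the integrals reduce to the sums Σ_i λ_i/i and Σ_j ρ_j/j.

```python
# Illustrative sketch (not from the thesis): ensemble parameters of an
# LDPC code from its edge-perspective degree distributions, eqs (1.21)-(1.23).

def ensemble_parameters(lam, rho, E):
    """lam[i] / rho[j] are the fractions of edges incident on variable
    nodes of degree i / checknodes of degree j; E is the edge count."""
    # integral of lambda(x) over [0, 1] equals sum_i lam_i / i
    int_lam = sum(l_i / i for i, l_i in lam.items())
    int_rho = sum(r_j / j for j, r_j in rho.items())
    N = E * int_lam            # number of variable nodes, eq. (1.21)
    ell = E * int_rho          # number of checknodes, eq. (1.22)
    R = 1 - int_rho / int_lam  # design rate, eq. (1.23)
    return N, ell, R

# Example: a regular (db = 3, dc = 6) ensemble, i.e. all edges of one degree.
N, ell, R = ensemble_parameters({3: 1.0}, {6: 1.0}, E=30000)
print(round(N), round(ell), round(R, 3))  # -> 10000 5000 0.5
```

For a regular ensemble the formula collapses to the familiar R = 1 − db/dc, which the example reproduces.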
The block diagram of the MN encoding process is illustrated in Figure 1.6 and is written as

c = C2^{−1} C1 s     (1.24)

The dimensions of C1 and C2 are N × K and N × N, respectively. Both matrices are sparse and can be found by applying Gaussian elimination to an N × (K + N) LDPC matrix H of column weight db and row weight dc (see the construction proposed in [51]). After applying column permutations to guarantee full rank, H is written as H = [C1 | C2]. The decoding problem is considered for a BSC, where the received vector is

y = c + n     (1.25)
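As a toy illustration (hypothetical code and made-up toy sizes, not the thesis construction) of the encoding rule (1.24) and the BSC model (1.25), the following sketch builds a tiny example over GF(2), encodes by solving C2 c = C1 s rather than forming the dense inverse, and checks the scrambled relation C2 y = C1 s + C2 n that the decoder exploits.

```python
# Hypothetical toy sketch: MN encoding c = C2^{-1} C1 s over GF(2),
# and a check of the decoder-side relation C2 y = C1 s + C2 n.

def mat_vec(M, v):
    """Matrix-vector product over GF(2)."""
    return [sum(m & x for m, x in zip(row, v)) % 2 for row in M]

def solve_gf2(A, b):
    """Solve A x = b over GF(2) by Gauss-Jordan elimination (A invertible)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col])
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col]:
                M[r] = [a ^ p for a, p in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

# Tiny example: K = 2 source bits, N = 3 coded bits.
C1 = [[1, 0], [1, 1], [0, 1]]             # N x K, sparse LDGM part
C2 = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]    # N x N, sparse and invertible
s = [1, 0]                                # source vector
c = solve_gf2(C2, mat_vec(C1, s))         # c = C2^{-1} C1 s   (eq. 1.24)
n = [0, 1, 0]                             # sparse BSC noise
y = [ci ^ ni for ci, ni in zip(c, n)]     # received word     (eq. 1.25)
# Multiplying y by C2 re-sparsifies the problem: C2 y = C1 s + C2 n.
lhs = mat_vec(C2, y)
rhs = [a ^ b for a, b in zip(mat_vec(C1, s), mat_vec(C2, n))]
assert lhs == rhs
```

Solving the sparse system C2 c = C1 s is exactly why the inverse never needs to be stored; the identity holds simply because C2 y = C2 c + C2 n over GF(2).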
Figure 1.6: Mackay-Neal Codes: Block and Tanner Graph descriptions.

Here n is the binary noise vector of length N, assumed to be sparse with independent and identically distributed bits. The decoding procedure should solve

C2 y = C1 s + C2 n     (1.26)

For this purpose, message passing decoding is performed on the Tanner graph corresponding to equation (1.26), as illustrated in Figure 1.6. In the above description, MN codes are regular, since the column weight of C1 and C2 is equal to db. Improved irregular MN codes have been proposed by Kanter and Saad in [43]. The stability of irregular MN codes has been improved in [69]. One of the main drawbacks of MN codes is that their decoder is specifically designed for the binary symmetric channel. The decoding principle can be extended to real-output channels by hard-decision quantization of the a-priori values passed to the noise bitnodes [51]; however, the quantization implies a degradation of the decoding performance.

1.4.3 Non Systematic LDPC Framework

In this section, we propose a novel general approach to build an LDPC-based non-systematic code. We describe several different ways of building the encoder. All the configurations can successfully exploit the redundancy of the coded sequence during the decoding process. The general idea is to concatenate a pre-coding (or post-coding) block that uses an invertible square matrix with an LDPC or an LDGM encoder. The decoder then operates over one single irregular bipartite factor graph combining the code and the pre-coder or post-coder. The systematic bits are not sent over the channel, but are included in the decoding graph as a subset of the set of bitnodes, to which a-priori information can be provided.
Figure 1.7: General Non-Systematic LDPC Encoding Framework.

The set of bitnodes is completed with the set of transmitted codeword bits, and in some of the configurations, but not all, another subset of un-transmitted bits can be added. The parity checknodes in the decoder graph are those of both the pre-coder or post-coder and the LDPC or LDGM code. The pre-coder (or post-coder) is a square matrix which is either of low density or has a sparse inverse. The main idea is that the pre-coder (or post-coder) emulates the effect obtained by the Quick-Look-In property in turbo codes, by converting the systematic non-uniform message into an almost uniformly distributed one. Hence, there are eight different possible configurations, as illustrated in Figure 1.7. Each one depends on the combination of choices among:

1. LDPC or LDGM as a core code.
2. Pre-coder or post-coder.
3. A low-density square matrix, denoted a scrambler, or a matrix whose inverse is low density, denoted a splitter.

One specific configuration of our codes is the encoder of an MN code, which can be viewed as the concatenation of a non-systematic LDGM code with a post-coding splitter. However, our proposed general decoder procedure is channel independent and differs from the one proposed in the original MN proposal [51]. Thus, the proposed general decoder procedure gives an enhanced decoding alternative for MN codes. In the remainder of this section, we focus on pre-coding with either a scrambler or a splitter combined with an LDPC code. We refer to a pre-coding scrambler system as a scramble-LDPC code, and to the pre-coding splitter system as a split-LDPC code. Post-coding scrambler and splitter LDPC codes will be referred to as LDPC-scramble and LDPC-split codes, respectively.
The methods described herein may easily be adapted to implement all the other configurations.

1.4.3.1 Scramble-LDPC Construction

Definition 1.1. A scramble-LDPC code is built by the concatenation of a pre-coding module denoted "Scrambler" with an LDPC code. The scrambler is described by a sparse matrix denoted Cs.
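As a quick illustrative experiment (hypothetical parameters, not from the thesis), the effect of a regular degree-ds scrambler on a biased i.i.d. source can be simulated by XOR-ing ds source bits per output bit, mimicking one row of a sparse Cs:

```python
# Illustrative sketch (assumed toy parameters): scrambling reduces,
# but does not remove, the bias of a non-uniform i.i.d. source.
import random

random.seed(0)
K, ds, mu = 2000, 3, 0.1          # block length, scrambler degree, P(s_i = 1)
s = [1 if random.random() < mu else 0 for _ in range(K)]
# Each scrambled bit is the mod-2 sum of ds source bits,
# i.e. one row of a regular sparse scrambler Cs of row weight ds.
u = [sum(random.sample(s, ds)) % 2 for _ in range(K)]
print(sum(s) / K, sum(u) / K)     # bias moves from about 0.1 toward 0.5
```

The empirical fraction of ones in u sits near 0.244 here, the value predicted by Gallager's lemma for ds = 3 and µ = 0.1: closer to uniform than the source, but still visibly biased.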
Figure 1.8: Scramble-LDPC Tanner Graph and Block Diagram Description.

The block diagram and the factor graph of the scramble-LDPC code are illustrated in Figure 1.8. A scramble-LDPC encoder first encodes (scrambles) the source vector s into u by

u = Cs s     (1.27)

Let Cs be a sparse matrix of dimensions K × K. For a regular scrambler, Cs has row and column weight ds. Similarly to the LDPC case, we can consider building scramblers with an irregular structure. Afterwards, standard systematic LDPC encoding is performed on the scrambled vector u to generate the codeword

c = [u^T | ϑ^T]^T = G u     (1.28)

where G is a systematic generator matrix for the low-density parity check matrix H, and the superscript T denotes the transpose operator. Hence, the codeword c consists of the K scrambled bits of u and the N − K parities over the scrambled bits, denoted ϑ. For a regular LDPC code, the parity check matrix H has column weight db and row weight dc. The decoding graph for the scramble-LDPC code combines the parity checks of the LDPC code, denoted by β, with those obtained from the scrambler, denoted by α, as shown in Figure 1.8. The decoding process is realized by performing, over this whole graph, source controlled belief propagation as described in the next section.

1.4.3.2 Split-LDPC Construction

Definition 1.2. A split-LDPC code is built by the concatenation of a pre-coding module denoted "Splitter" with an LDPC code. The splitter is described by the inverse Cs^{−1} of a sparse matrix Cs.
Figure 1.9: Split-LDPC Tanner Graph.

The block diagram and the factor graph of the split-LDPC code are illustrated in Figure 1.9. The split-LDPC encoder is very similar to the scramble-LDPC one described above, except that the scrambling operation is replaced by splitting, performed by

u = Cs^{−1} s     (1.29)

where Cs is a sparse matrix of dimensions K × K which, in the regular case, has row and column weight ds. Similarly to the scrambler case, we can consider building splitters with an irregular structure. Afterwards, standard systematic LDPC encoding is performed on the split vector u to generate the codeword

c = [u^T | ϑ^T]^T = G u     (1.30)

where G is a systematic generator matrix for the low-density parity check matrix H, and the superscript T denotes the transpose operator. Hence, the codeword c consists of the K split bits of u and the N − K parities over the split bits, denoted ϑ. For a regular LDPC code, the parity check matrix H has column weight db and row weight dc. The matrix Cs^{−1} describing the splitter is a dense matrix, given that it is the inverse of a sparse one, according to [12]. This fact gives a reason to believe that, with redundant sequences, a split-LDPC code has an advantage over a scramble-LDPC code in generating a channel distribution closer to the uniform, capacity achieving one. This is also because splitting results in an even split between 1 and 0 bits, while scrambling leads to an output distribution closer to uniform than the original non-uniform one, but one that can still be shown to be non-uniform.
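The density claim can be checked on a small deterministic example (an illustrative construction, not from the thesis): a bidiagonal GF(2) matrix has at most two ones per row, yet its inverse is lower triangular with all ones, so about half of its entries are nonzero.

```python
# Exploratory sketch: the inverse of a sparse invertible GF(2) matrix
# is typically dense. Here A is bidiagonal (ones on the diagonal and
# first subdiagonal), while A^{-1} is the all-ones lower triangle.

def inverse_gf2(A):
    """Invert a square matrix over GF(2) by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col])
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col]:
                M[r] = [a ^ b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

n = 60
A = [[int(j == i or j == i - 1) for j in range(n)] for i in range(n)]
Ainv = inverse_gf2(A)
dens_A = sum(map(sum, A)) / n**2
dens_inv = sum(map(sum, Ainv)) / n**2
print(round(dens_A, 3), round(dens_inv, 3))  # -> 0.033 0.508
```

So a splitter defined by such a Cs effectively sums roughly half of the source bits into each split bit, which is what pushes the output distribution close to uniform.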
Figure 1.10: General Structure of a Split-LDPC Parity Check Matrix (columns grouped into ϑ bits, u bits and s bits; blocks: systematic part, square low-density scrambler, permutation).

Besides, the splitter can be viewed as splitting an incoming source bit into several coded bits; this is why splitting resembles the Quick-Look-In (QLI) property in turbo codes [11], in which two (or more) parity bits sum up to the original source bit. Recall that MN codes are also obtained by an LDGM code followed by a splitter. The decoding graph for the split-LDPC code combines the parity checks of the LDPC code, denoted by β, with those obtained from the splitter, denoted by α, as shown in Figure 1.9. By the same token, we illustrate in Figure 1.10 the parity check matrix related to this graph. Encoding is realized by performing Gaussian elimination over this parity check matrix. The systematic form of this matrix reveals a highly dense scrambling applied to the source bits during encoding. The decoding process is realized by performing, over this whole graph, source controlled belief propagation as described in the next section.
1.4.4 Information Theoretical Comparison of Splitting and Scrambling

In the present section, we analyze the mutual information of the best regular scrambling-based code and show that it is smaller than that of the best splitting-based code. A split-LDPC code multiplies the non-uniform sequence s by a dense matrix, generating a split vector whose distribution is very close to uniform [12]. Hence, the best possible splitting-based code may be close to achieving the BPSK capacity. For a regular scrambling-based code as described in Section 1.4.3.1, ds systematic non-uniform bits are scrambled into one code bit. Assuming that the parity bits generated by the LDPC code part are uniformly distributed, the best achievable mutual information can be computed using equations (1.10) and (1.11), where µ in (1.11) is replaced by γ. The probability γ denotes the probability of 1 in the scrambled sequence
u. For a regular scrambler of degree ds, γ can be obtained using Gallager's lemma [34]:

γ = 0.5 [1 − (1 − 2µ)^ds]     (1.31)

For an irregular scrambler, a similar computation can be done, considering all the different scrambling degrees.

Figure 1.11: Mutual information vs. Ebr/N0 for Hs = 0.5 and coding rates R = 0.5 (left) and R = 0.8 (right); curves for Gaussian input, non-systematic BPSK input, scrambled ds = 5, scrambled ds = 3, and systematic BPSK input.

Figure 1.11 shows curves of the best possible mutual information versus Ebr/N0 for the different methods discussed above, for a non-uniform source with µ = 0.1 and coding rates R = 0.5 and R = 0.8. The BPSK input curve bounds the mutual information of the splitting-based code. As both graphs show, a splitter-based code is better than the scrambler-based codes, and both classes show gains over systematic codes. We observe that as the scrambler degree increases, the scrambler approaches the behavior of the splitter. This is expected, since the scrambler converges to the splitter behavior in equation (1.31), where γ → 1/2 as ds increases. Figures (1.2) and (1.12) show the theoretical minimum Ebr/N0 as a function of the source entropy for channel code rates R = 0.5 and R = 0.8, respectively. The curves in both cases support the advantage of non-systematic codes and, more particularly, the supremacy of splitting-based codes. Moreover, the loss incurred by systematic codes is shown to increase significantly as the non-uniformity of the source increases.
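A two-line check of (1.31), with the same bias µ = 0.1 used in Figure 1.11 (illustrative values only):

```python
# Sketch of Gallager's lemma (1.31): probability of a 1 after XOR-ing
# ds i.i.d. bits that each equal 1 with probability mu.
def gamma(mu, ds):
    return 0.5 * (1 - (1 - 2 * mu) ** ds)

mu = 0.1
for ds in (1, 3, 5, 15):
    print(ds, gamma(mu, ds))
# gamma tends to 1/2 as ds grows, so a high-degree scrambler
# approaches the (uniform) output behavior of a splitter.
```

For µ = 0.1 the sequence of values is roughly 0.1, 0.244, 0.336, 0.482, matching the convergence of the scrambled curves toward the non-systematic BPSK bound in Figure 1.11.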
In addition to showing that there is a significant benefit in using non-systematic codes at high rates, Figure 1.3 also reveals the improvement of scramblers as ds increases. Conversely, in most finite-length cases, we find that a scrambler/splitter whose degree is ill-suited to the source distribution can degrade the performance below that of systematic codes, even when the considered degree value is high. This indicates that the degree of the scrambler/splitter should be optimized for each value of µ. These results, despite appearing inconsistent with the theoretical limits described above, may be explained by the increased cycle density in the lower subgraph related to the scrambler/splitter. Indeed, increasing ds for a constant code length
induces a local increase of edges in this lower subgraph for a constant number of nodes (R × N), doubtless involving more cycles.

Figure 1.12: Minimum achievable Ebr/N0 vs. source entropy Hs, coding rate R = 0.8; curves for systematic BPSK input, scrambled ds = 3, scrambled ds = 5, non-systematic BPSK input, and Gaussian input.

In this section, split-LDPC codes have been revealed as theoretically the best non-systematic LDPC structure. Moreover, the simulation results presented in Section (1.6) show that gains of about 1 dB are achieved when the scrambler is replaced by a splitter. Accordingly, we expect MN codes to outperform scramble-LDPC codes, since MN codes belong to the split-LDPC general structure.

1.5 Source Controlled Sum-Product Decoding

The sum-product algorithm is a generic message-passing algorithm that operates over a factor graph in order to compute various marginal functions associated with a global function [46]. A wide variety of algorithms developed in the artificial intelligence, signal processing, and digital communication communities may be seen as specific instances of the sum-product algorithm operating on an appropriately chosen factor graph. One important subclass of the sum-product algorithm is Pearl's powerful belief propagation algorithm, which operates by message passing over a Bayesian network, including as instances the iterative decoding algorithms of LDPC and turbo codes, as shown in [56], [86]. In his original work, Gallager had already proposed a simplified version of the sum-product algorithm to decode LDPC codes [33], consisting of the propagation of beliefs or probabilities between variable nodes and checknodes.
This message exchange is the reason why such algorithms are also referred to as message passing algorithms. Without going into much detail, we define the nature of the messages and the simplified update rules on them for our class of non-systematic LDPC codes in the presence of non-uniform sources.
The most common message type used in the literature is the log-likelihood ratio (LLR), namely log [p(x = 0|y) / p(x = 1|y)]. This is because it is well suited to computer implementation, since probability values very close to zero or one can be represented without precision errors; moreover, its update rules are quite simple. The message passing algorithm is an iterative algorithm in which, in each round, messages are sent from message nodes to checknodes, then from checknodes back to message nodes. In order to describe the algorithm, let us consider the following notations:

• The bipartite graph describing our non-systematic LDPC codes has three types of message nodes ci ∈ {u, ϑ, s} and two types of checknodes PCEi ∈ {α, β} (see Figure 1.15).
• Sci is the set of checknodes involving the message node ci.
• SPCEi is the set of bitnodes involving the checknode PCEi.
• LLR0 denotes the channel observation LLR.
• LLRs = log((1 − µ)/µ) denotes the LLR related to the source statistics.
• LLRci→PCEi denotes the LLR sent from a message node ci toward a checknode PCEi; respectively, LLRPCEi→ci denotes the LLR sent from a checknode PCEi toward a message node ci.

The bitnode to checknode step

Figure 1.13: The local message node to checknode update rule (extrinsic information from the other checks, channel observation LLR0, a-priori LLRs).

The variable node to checknode update rule is written as

LLRci→PCEi = LLRtype + Σ_{PCEj ∈ Sci − {PCEi}} LLRPCEj→ci     (1.32)

where

LLRtype = LLR0 if the bitnode type ∈ {u, ϑ};  LLRtype = LLRs if the bitnode type ∈ {s}     (1.33)
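The update (1.32)-(1.33) can be sketched as follows (a minimal hypothetical sketch, not the thesis implementation; the string "v" stands in for the parity type ϑ, and the incoming-message list indexing is an assumed representation of the neighborhood Sci):

```python
# Hedged sketch of the source-controlled variable-node update (1.32)-(1.33):
# the outgoing LLR toward checknode i sums the type-dependent intrinsic
# term with the extrinsic LLRs from all the other neighboring checks.
import math

def llr_source(mu):
    """A-priori LLR of a source bit with P(s = 1) = mu, i.e. log((1-mu)/mu)."""
    return math.log((1 - mu) / mu)

def bitnode_to_checknode(node_type, llr0, mu, incoming, i):
    """incoming[j] = LLR from neighboring checknode j; returns the
    message toward checknode i, excluding its own contribution."""
    llr_type = llr0 if node_type in ("u", "v") else llr_source(mu)  # eq. (1.33)
    return llr_type + sum(m for j, m in enumerate(incoming) if j != i)  # eq. (1.32)

# Example: a systematic bitnode s (not transmitted, so no channel LLR)
# with three neighboring checks, sending toward check 0.
msg = bitnode_to_checknode("s", 0.0, 0.1, [0.7, -0.2, 1.1], 0)
print(round(msg, 3))  # -> 3.097
```

Note how the source statistics enter the decoder only through the a-priori term log((1 − µ)/µ) attached to the un-transmitted systematic bitnodes, which is precisely what makes the decoding "source controlled".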