Traffic Characterization for Multicasting in NoC
V. Laxmi, Roopesh Chuggani, M. S. Gaur, Pankaj Khandelwal, Prateek Bansal
Department of Computer Engineering
National Institute of Technology
Jaipur
{vlaxmi|gaurms}@mnit.ac.in, {roopesh.chuggani2|pankaj1394|prateekbansal.895}@gmail.com
Abstract—NoC (Network on Chip) is an emerging paradigm for design of VLSI/ULSI circuits to overcome the communication bottleneck of traditional bus based systems. The NoC communication framework consists of regularly placed routers, which are connected to processing cores. NoC performance is determined by latency and throughput for communication requirements. NoC communication traffic modelling plays an important role in the design of NoC simulators and/or prototypes. This paper presents a framework for modelling source traffic for multipoint communication from one source to different destinations, as is required for multicasting. Such a traffic model captures real-world scenarios such as multicasting and execution of multiple concurrent tasks on a single core (each task requiring communication with different destinations). The model proposes how concurrent traffic streams from a single core to different destinations can be mathematically characterized as a single stream at the source end. The model is derived from the statistical behaviour of probabilistic demultiplexing of a single traffic stream. In its nascent stage, the method is proposed for the scenario of one source concurrently communicating with two destinations, as shall be required for mapping two concurrent tasks to the same core or for simultaneous broadcast to two destinations.

Index Terms—Network on Chip, Multicasting, Bursty Traffic, Probabilistic Demultiplexing, Exponential Distribution

I. INTRODUCTION

VLSI designs are increasingly becoming more complex with the increase in scale of integration, resulting in more components being fabricated on the same chip. With the resultant increase in the number of processing cores (CPU, DSP, memory, etc.), the increased inter-core communication requirement cannot be satisfied by the traditional bus based communication architecture [1], [2]. Network on Chip (NoC) has been proposed as an alternative [3]. NoC provides a communication layer of regularly placed, interconnected routers. Inter-core communication takes place through these routers. Decoupling of communication and computation simplifies the IC design process. Regularity in NoC structure results in better scalability and fault tolerance [2], [4]. Because of its modular structure, many components can be reused from previous designs, resulting in reduced time to market for new NoC designs.

NoC design parameters include topology selection, router design and choice of routing function. A NoC simulator can assist the designer in evaluation of different NoC designs. One important aspect of simulator design is characterization of inter-core traffic. Traffic modelling of the cores is an important step in NoC design [5], [6]. Traffic models are mathematical characterizations of the statistical properties of data flowing from one core to another. Traffic modelling has been proposed as an open area of research in recent papers [7]. Most evaluations and analyses of NoC design parameters are still based on basic synthetic traffic patterns such as CBR (Constant Bit Rate), bursty, bit-complement, transpose, etc. These traffic patterns do not capture real-world scenarios, as each of them comprises only point-to-point communications, i.e. for each source there is only one destination. Traffic modelling of multicast communication for NoC is still in its infancy.

In multimedia applications, such as NoC designs for modules of an MPEG encoder/decoder, point-to-multipoint communication patterns are also needed, as experienced by the authors while extending the capability of an NoC simulator. This requires generation of multiple traffic streams originating from the same source but destined for different cores. A similar traffic pattern is observed when a core is running concurrent tasks, each task requiring communication with a different destination.

In this paper, we propose how multicast communication, i.e. multiple traffic streams originating at the source, can be viewed as a single traffic stream without any adverse impact on the statistical characteristics of the destination traffic streams. The model is derived from observations of the statistical behaviour of received streams at the destinations in a single source, multiple destinations scenario. Till now, to the best of our knowledge, no traffic model has been proposed to accurately characterize this scenario. In this initial work, we present a model for two destinations. This can be used as a basis for n (n > 2) destinations.

The model is based on the observation that probabilistic division of a bursty traffic stream into two separate streams results in both streams being bursty. The burst parameter of each stream is related to that of the original stream. The proposed traffic model has been implemented and tested on an open source NoC simulator, NIRGAM [8].

This paper is organized as follows. In Section II, we present the background survey in this field. In Section III, we present objectives of the presented work and motivation for the proposed traffic model. In Section IV, we derive how statistical characteristics of traffic streams received at destinations are related to those of the source traffic; these relationships are derived from observations of experiments conducted. Section V briefly describes the NoC simulator NIRGAM, on which the proposed model is implemented. In Section VI, implementation of the proposed model on NIRGAM is described. Experimental results are presented in Section VII, followed by conclusions and pointers for further extension in Section VIII.

978-1-4244-8971-8/10/$26.00 © 2010 IEEE
II. RELATED WORK

Applications need to be mapped to the underlying NoC architecture by dividing the functionality of the application into smaller tasks. Each task is mapped onto one NoC core. Many algorithms for mapping these tasks onto IP cores have been proposed [9]–[11]. In each of these previous works, a single task is mapped onto one IP core. Most of the past work has been done to map a single application onto the underlying network. In [9], the tasks of a process control platform are mapped onto NoC cores in a one-to-one manner. In [11], Hu et al. propose an energy constrained mapping of a communication task graph to a NoC. This work considers a single task per core.

NoC evaluation is based on the assumption of mapping a single task per core and on traditional point-to-point traffic patterns like bit-complement and transpose [3]. This type of communication is limited to only a few applications, because rarely does a node communicate with just a single node or with all the other nodes in the network. For modelling a multicast (point-to-multipoint) scenario, uniform random traffic is used by selecting a random destination for each packet; the probability of each destination being selected is the same. In [12], a new traffic pattern is proposed to create the scenario where tasks with higher inter-task communication are mapped to cores in adjacent regions. In this traffic pattern, communication is point to point, but traffic is distributed to multiple destinations.

These traffic patterns cannot model the point-to-multipoint traffic generated by multiple tasks executing on a single core. This is because when we map multiple tasks onto a single core, the traffic of the core is composed of the individual traffic generated by each task. Each individual traffic stream can have different statistical properties and a different destination pattern. But traditional traffic generators do not provide functionality for such communication.

III. MOTIVATION

In this paper, we try to model a point-to-multipoint source traffic pattern given the statistical behaviour of traffic received at the destinations. This will result in multiple traffic streams emerging from the same core. Each traffic stream may have a different destination and is likely to have different statistical properties.

Figure 1 shows one such scenario in an NoC of size 4 × 4 wherein cores are numbered 0 to 15. Core 0 is multicasting to cores 9 and 10. Core 7 is multicasting to cores 10 and 12. There is one unicast communication from core 15 to core 13.

Fig. 1: NoC Architecture with Multiple Tasks per Core (a 4 × 4 grid of IP cores numbered 0 to 15, showing tasks and their data flows).

IV. PROPOSED MODEL

The main objective of the work presented here is to determine how a point-to-multipoint traffic pattern can be modelled at the source end. We need to derive the statistical characteristics of the traffic at the source given the traffic characteristics at the destinations. For such a derivation, we first consider the inverse of the objective: given the source traffic characteristics, what are the statistical characteristics of the traffic received at the destinations? Following are the assumptions for our model.
1) There is one source and two destinations. This can happen when at most two traffic streams are emanating from a single core.
2) Each stream (task) generates bursty traffic; the average OFF time of this traffic is modelled using an exponential distribution.
3) The traffic model is independent of burst size (number of packets in a particular burst). Experimental results suggest that the traffic statistics appear to be independent of burst size. Details are discussed in Section VII.

We define the following parameters for our traffic model:
1) mc: average (mean) OFF time of the traffic generated by the core node.
2) p1: probability that a packet is destined for the first destination.
3) p2: probability that a packet is destined for the second destination.
4) mt1: average (mean) OFF time of the traffic received by the first destination.
5) mt2: average (mean) OFF time of the traffic received by the second destination.

Our model is based on the observation that when bursty traffic generated using an exponential distribution with average OFF time mc is demultiplexed probabilistically into two traffic streams, the demultiplexed traffic streams still follow exponential distributions, with average OFF times mt1 and mt2 respectively. Probabilistic demultiplexing means that each packet is assigned to one of the streams/tasks as per the probabilities (p1, p2): a random number is generated, and if it is less than p1, the current burst of packets belongs to the first stream, otherwise to the second one.

We investigate the dependence of mt1 and mt2 on mc, p1 and p2.

A. Bursty Traffic Model

Bursty traffic is modelled using an exponential distribution [8]. Both the inter-packet interval and the packet size follow an exponential distribution. We are concerned only with inter-packet intervals.
The exponential distribution is parametrized by the average value of the distribution, denoted by m. The probability density function (PDF) of an exponential distribution is

f(x; m) = (1/m) e^(−x/m) for x ≥ 0, and f(x; m) = 0 for x < 0.    (1)

m is also known as the expected value of the distribution.

B. Observation of Demultiplexed Trace

We generated a traffic trace with a random average OFF time mc. This traffic trace was divided into two different traces using probabilities (p1, p2). The PDF of the original trace was exponential, as expected. The PDF of each demultiplexed trace was observed to follow a similar exponential distribution. This observation was significant because it meant that we can generate two different exponential distributions from a single distribution by probabilistic demultiplexing.

To verify this observation, we generated and demultiplexed traffic for multiple values of mc. One such instance is shown in Figure 2. Figure 2(a) shows the probability distribution of the original trace with m = 30, while Figure 2(b) shows the PDF of one of the demultiplexed traces, obtained with probability 0.6. As can be seen, both approximate an exponential distribution.

Fig. 2: (a) PDF for original trace, (b) PDF for a demultiplexed trace (probability = 0.6); both panels plot frequency against the value of the inter-packet time.

C. Deriving the Relation

To seek the relationship between mc, mt1 and mc, mt2, we generated and demultiplexed traces for various values of mc and calculated the values of mt1 and mt2. It was found that the average OFF time of the traffic generated by each stream is directly proportional to the average core OFF time:

mt1 ∝ mc    (2a)
mt2 ∝ mc    (2b)

Fig. 3: mt1 vs mc and mt2 vs mc (OFF time of task 1, mt1, with probability 0.4; OFF time of task 2, mt2, with probability 0.6).

Figure 3 shows the plot of the average OFF time of the core and of the demultiplexed traffic streams. On the X axis is the average OFF time of the core (mc), while on the Y axis is the OFF time of both streams. As can be seen, the curves come out to be approximately linear, showing direct proportionality.

Next, we deduce the relationship between mt1, mt2 and p1, p2. To achieve this, we kept mc constant and varied the probability of generation from 0.1 to 0.95 (p1 + p2 = 1). It was found that the average OFF time of the traffic generated by each stream is inversely proportional to the respective probability:

mt1 ∝ 1/p1    (3a)
mt2 ∝ 1/p2    (3b)

Figure 4 shows the plot of mt1 versus the probability p1 for mc = 50. Probability is on the X axis and average OFF time is on the Y axis. As can be seen from the plot, the curve shows the inverse relationship.
Fig. 4: Variation of mt1 w.r.t. p1 (for mc = 50).

As the probability approaches unity, the case reduces from a point-to-multipoint scenario to a point-to-point scenario and mt1 approaches mc, while for the other destination the OFF time attains a very high value. Using Equations (2a), (2b), (3a) and (3b) with curve fitting of both the curves, the empirical relationship for the average OFF time of each stream was derived as:

mt1 = (mc + p2 + c1)/p1 + c2    (4)

mt2 = (mc + p1 + c3)/p2 + c4    (5)

c1, c2, c3, c4 are constants. In our case, when curve fitting was applied, the following values were obtained: c1 = c3 = 6 and c2 = c4 = −6.

Verification of Equations (4) and (5) is performed in two steps. We calculate the average OFF time of the traffic generated by each stream in two ways:
1) The values of mt1 and mt2 are calculated from the demultiplexed traces obtained with different values of p1, p2 and mc. These values are referred to as 'calculated' or actual OFF times from the trace.
2) For all the corresponding values of p1, p2 and mc, the values of mt1 and mt2 are calculated using Equations (4) and (5). These values are referred to as 'analytical' OFF times.

Analytical and actual values are plotted on the same figure to verify the derived Equations (4) and (5). Figures 5 and 6 show the result of this verification. The results are shown for different values of mc to verify our model over a range of core OFF time values. On the X axis is the probability of traffic generation and transmission for each stream, and on the Y axis is the OFF time of the traffic generated for that stream. As can be seen from Figures 5 and 6, the values from the analytical formulae very accurately estimate the actual OFF times calculated from the demultiplexed traces.

Fig. 5: Analytical vs actual OFF time of Task 1 for different values of mc (source OFF times 15, 25 and 35).

Fig. 6: Analytical vs actual OFF time of Task 2 for different values of mc (source OFF times 15, 25 and 35).

V. NIRGAM

Network-on-chip Interconnect Routing and Application Modelling (NIRGAM) [8] is a discrete event, cycle accurate simulator targeted at Network on Chip (NoC) research. NIRGAM is written in SystemC, a dynamic library for hardware modelling built on top of C++. NIRGAM allows users to change various options of the NoC simulation at every stage, such as routing algorithm, topology, virtual channels, buffers, etc. The simulation framework allows analysing results in terms of various performance metrics such as latency, throughput, etc. Orion [13] has been integrated into NIRGAM and allows users to create and analyse power estimation graphs. NIRGAM provides support for fault tolerance [14] and QoS [15].

NIRGAM supports 2D mesh and 2D torus topologies. Routing in NIRGAM is done using flits. These are the units that
flow between routers. NIRGAM supports the wormhole switching mechanism. Presently it supports a number of routing algorithms such as XY, OE, DyAD, source, Q-routing, MaXY and PROM. A large number of options are available when it comes to traffic modelling in NIRGAM, as it supports various types of traffic patterns, such as Hotspot NED [12], as well as traffic injection models.

Other user configurable parameters in NIRGAM are virtual channels (i.e. the number of virtual channels per physical channel), the buffer size of an input channel, and the clock frequency. All these parameters can be specified in the configuration file of NIRGAM before starting the simulation.

VI. IMPLEMENTATION OF PROPOSED MODEL

As discussed in Section IV, given the values of mc, p1, p2, we can calculate mt1 and mt2 using Equations (4) and (5). However, for implementing the proposed traffic model as a traffic generator in any simulator, it is desirable that mt1 and mt2 be the input parameters. Different values of these average OFF times will represent different classes of streams.

To derive the values of mc, p1, p2 for given values of mt1 and mt2, we use Equations (4) and (5) and the fact that p1 + p2 = 1, along with the derived values of c1, c2, c3 and c4. A generalized version of the equation needed to solve for p1 is shown below in Equation (6):

p1³(mt1 + mt2 + 12) − p1²(mt1 + 2·mt2 + 18) + p1(mt2 + 8) − 1 = 0    (6)

Equation (6) has three possible roots; the one between 0 and 1 is selected, as probability values lie in the range [0, 1]. The computed root is assigned to p1 and p2 is computed as 1 − p1. mc can then be calculated using Equation (4).

When implementing the traffic model in NIRGAM, the values of mt1 and mt2 are read from a configuration file. Using these values, Equation (6) is solved for p1 using the bisection method [16]. Once mc, p1, p2 are known, mc is used to generate bursty traffic. Each time a new burst starts, a random number is generated in the range [0, 1]. If the generated number is less than p1, the first stream is allowed to transmit, i.e. the destination is chosen according to the first stream for the current burst; otherwise, the destination is chosen according to the second stream.

VII. EXPERIMENTAL RESULTS

We ran the NIRGAM simulator for different values of mt1 and mt2 on a 4 × 4 mesh topology. The traffic model was attached to core 0, and the two destinations were cores 7 and 10. Traffic was generated for 5000 clock cycles and the simulation was run for 8000 clock cycles. The number of virtual channels was eight.

To verify the traffic model, the input values of mt1 and mt2 (values read from the configuration file as specified by the user) are compared with the values calculated from the demultiplexed traces. These values, along with the calculated values of mc, p1 and p2, are shown in Table I. Columns 1 and 2 show the input values of mt1 and mt2, while the last two columns show the values calculated from traces generated by our traffic model. It can be observed that the calculated values and input values are nearly equal.

TABLE I: Calculated vs input mean OFF time

Input OFF time     Calculated probability   Calculated   Calculated OFF time
Task1   Task2      p1       p2              mc           Task1   Task2
16      25         0.60     0.40            4            15.4    22.2
20      40         0.66     0.34            8            21.3    43.0
16      16         0.50     0.50            3            17.1    18.0
15      20         0.56     0.44            3            15.3    20.6
10      20         0.65     0.35            1            12.8    22.7
30      10         0.26     0.74            1            32.7    10.9

We ran the simulation for different values of the flit interval. Simulation was done for three values of flit interval: 2, 4 and 8 clock cycles. Results are shown in Table II. It is observed that the mean OFF time calculated from the generated trace is independent of the flit interval. Hence, the proposed traffic model can be used with different flit intervals.

TABLE II: Calculated vs input mean OFF time for different flit intervals

Input OFF time     Flit interval = 2    Flit interval = 4    Flit interval = 8
Task1   Task2      Task1   Task2        Task1   Task2        Task1   Task2
15      20         15.8    20.0         16.2    19.0         15.6    20.2
11      30         11.0    31.4         11.2    29.7         11.4    29.1
8       11         8.7     11.4         8.7     11.5         8.6     11.6
18      18         17.8    18.5         18.5    18.4         18.0    18.2

The simulation was also run with different values of the burst size. We have used three values of burst size: 4, 8 and 12 packets. The results obtained are shown in Table III. The mean OFF time calculated from the trace is independent of the burst size of the traffic. This observation allows use of different burst sizes for modelling different streams/tasks.

TABLE III: Calculated vs input mean OFF time for different burst lengths

Input OFF time     Burst size = 4       Burst size = 8       Burst size = 12
Task1   Task2      Task1   Task2        Task1   Task2        Task1   Task2
15      20         14.8    20.2         14.6    19.1         14.5    18.9
11      30         11.4    31.6         11.3    28.6         11.0    30.6
8       11         8.6     11.6         8.3     11.9         8.0     12.0
18      18         18.3    17.2         17.2    18.4         18.4    18.4

VIII. CONCLUSION

This paper presented a traffic model for multicast communication in NoC. It also models the traffic scenario of concurrent tasks mapped to the same core, each task requiring communication with a different destination. Mapping multiple tasks on a single NoC core will reduce the size and cost of the NoC chip and shall provide more optimal use of network resources. To further analyse this concept of multicasting/multitasking, we provide a traffic model under the assumption that each task generates bursty traffic. For point-to-multipoint communication, the core can be viewed as generating a single stream with a fixed average OFF time. This stream is probabilistically demultiplexed into two streams. The probabilities for demultiplexing are calculated based on the specified average OFF times of the traffic generated by each communication stream. The traffic model is implemented and verified on the open source NoC simulator NIRGAM. The multicast traffic model is independent of inter-flit interval and burst size. In this paper, we have presented a novel model for simultaneous broadcast to two destinations, but the model can be extended to n (n > 2) destinations. In the latter case, the solution will require numerical methods. Further analysis of the performance of various routing algorithms and topologies under other traffic distributions shall be part of our future work.
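As a concrete illustration of the parameter-recovery procedure of Section VI, the Python sketch below (our own illustration, not the NIRGAM code) scans Equation (6) for sign changes on a grid and refines each bracket by bisection. For some inputs the cubic has more than one root in (0, 1); for the Table I input (mt1, mt2) = (15, 20), the middle root, approximately 0.564, is the one consistent with the reported p1 = 0.56. The final step recovers mc from Equation (4) with c1 = 6 and c2 = −6; treat that grouping as an assumption.

```python
def eq6(p, mt1, mt2):
    """Left-hand side of Equation (6); a root in (0, 1) is a candidate p1.
    The constants c1 = c3 = 6, c2 = c4 = -6 are already folded in."""
    return (p**3 * (mt1 + mt2 + 12)
            - p**2 * (mt1 + 2 * mt2 + 18)
            + p * (mt2 + 8)
            - 1)

def bisect(g, lo, hi, tol=1e-9):
    """Standard bisection on a bracket [lo, hi] with g(lo) * g(hi) < 0."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def candidate_p1(mt1, mt2, steps=1000):
    """Scan (0, 1) for sign changes of Equation (6) and bisect each bracket.
    All candidate roots are returned, in increasing order."""
    g = lambda p: eq6(p, mt1, mt2)
    roots = []
    for i in range(1, steps):
        lo, hi = i / steps, (i + 1) / steps
        if g(lo) * g(hi) < 0.0:
            roots.append(bisect(g, lo, hi))
    return roots

# Table I input (mt1, mt2) = (15, 20): the middle root matches p1 = 0.56.
roots = candidate_p1(15.0, 20.0)
p1 = roots[1]               # approximately 0.564
p2 = 1.0 - p1               # approximately 0.436; Table I reports 0.44
# mc from Equation (4), assuming mt1 = (mc + p2 + c1)/p1 + c2:
mc = p1 * (15.0 + 6) - p2 - 6
```

A sweep over the grid costs O(steps) evaluations of the cubic, and each bisection converges linearly, so the whole recovery is cheap enough to run once per stream configuration at simulation start-up.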
REFERENCES

[1] L. Carloni, P. Pande, and Y. Xie, "Networks-on-chip in emerging interconnect paradigms: Advantages and challenges," in Networks-on-Chip, 2009. NoCS 2009, May 2009, pp. 93–102.
[2] L. Benini and G. D. Micheli, "Networks on chips: A new SoC paradigm," Computer, vol. 35, pp. 70–78, 2002.
[3] W. J. Dally and B. Towles, "Route packets, not wires: On-chip interconnection networks," in DAC '01: Proceedings of the 38th Annual Design Automation Conference, 2001, pp. 684–689.
[4] J. Duato, S. Yalamanchili, and N. Lionel, Interconnection Networks: An Engineering Approach. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2002.
[5] M. Ali, M. Welzl, and S. Hellebrand, "A dynamic routing mechanism for network on chip," in 23rd NORCHIP Conference, 2005, pp. 70–73.
[6] L. Tedesco, A. Mello, L. Giacomet, N. Calazans, and F. Moraes, "Application driven traffic modeling for NoCs," in SBCCI '06: Proceedings of the 19th Annual Symposium on Integrated Circuits and Systems Design, 2006, pp. 62–67.
[7] R. Marculescu and P. Bogdan, "The chip is the network: Toward a science of network-on-chip design," Foundations and Trends in Electronic Design Automation, vol. 2, no. 4, pp. 371–461, 2009.
[8] "NIRGAM," 2009. [Online]. Available: http://cse-trac.mnit.ac.in
[9] T. Ahonen, D. A. Sigüenza-Tortosa, H. Bin, and J. Nurmi, "Topology optimization for application-specific networks-on-chip," in SLIP '04: Proceedings of the 2004 International Workshop on System Level Interconnect Prediction. New York, NY, USA: ACM, 2004, pp. 53–60.
[10] W. H. Ho and T. M. Pinkston, "A methodology for designing efficient on-chip interconnects on well-behaved communication patterns," in HPCA '03: Proceedings of the 9th International Symposium on High-Performance Computer Architecture. Washington, DC, USA: IEEE Computer Society, 2003, p. 377.
[11] J. Hu and R. Marculescu, "Energy-aware mapping for tile-based NoC architectures under performance constraints," in ASP-DAC '03: Proceedings of the 2003 Asia and South Pacific Design Automation Conference. New York, NY, USA: ACM, 2003, pp. 233–239.
[12] A.-M. Rahmani, I. Kamali, P. Lotfi-Kamran, A. Afzali-Kusha, and S. Safari, "Negative exponential distribution traffic pattern for power/performance analysis of network on chips," in VLSI Design, 2009. 22nd International Conference on, 2009, pp. 157–162.
[13] A. B. Kahng, B. Li, L.-S. Peh, and K. Samadi, "Orion 2.0: A fast and accurate NoC power and area model for early-stage design space exploration," in DATE '09, 2009, pp. 423–428.
[14] C. Grecu, L. Anghel, P. P. Pande, A. Ivanov, and R. Saleh, "Essential fault-tolerance metrics for NoC infrastructures," in IOLTS '07: Proceedings of the 13th IEEE International On-Line Testing Symposium. Washington, DC, USA: IEEE Computer Society, 2007, pp. 37–42.
[15] K. K. Paliwal, J. S. George, N. Rameshan, V. Laxmi, M. S. Gaur, V. Janyani, and R. Narasimhan, "Implementation of QoS aware Q-routing algorithm for network-on-chip," in Communications in Computer and Information Science, 2009.
[16] A. Eiger, K. Sikorski, and F. Stenger, "A bisection method for systems of nonlinear equations," ACM Trans. Math. Softw., vol. 10, no. 4, pp. 367–377, December 1984.