IBM uses InfluxDB to store metrics collected by nmon, and Grafana to visualize those metrics. This helps IBM monitor large production servers and benchmark centers. Some key points:
- nmon was originally created 25 years ago by Nigel Griffiths to monitor OS performance, but its data format and lack of central storage were limiting. Its successors njmon and nimon output JSON and InfluxDB line protocol respectively.
- Grafana provides various visualizations of the metrics stored in InfluxDB like donut graphs, line graphs, heat maps, and single stat/graph panels. This helps identify issues like busy periods and system bottlenecks.
- Ideas were discussed for better visualizing periodic trends like busy Fridays or batch over
Discover How IBM Uses InfluxDB and Grafana to Help Clients Monitor Large Production Servers and IBM Benchmark Centers
1. Discover How IBM Uses InfluxDB and Grafana to Help Clients Monitor Large Production Servers and IBM Benchmark Centers
Nigel Griffiths, Advanced Technology Specialist, IBM Power Systems
Ronald McCollam, Solutions Engineer, Grafana Labs
Russ Savage, Director of Product Management, InfluxData
3. The Grafana Philosophy
Observability is owned by an entire organization
No one tool can do all things for all people. Each tool has specific features and its own niche where it is best of breed.
Grafana Labs strives for an open and composable solution to unite data across the great technologies you selected and deployed.
By unifying your existing data, wherever it lives, we help deliver unprecedented insights, while maintaining choice and flexibility.
4. The analytics platform for all your metrics
Grafana allows you to query, visualize, alert on and understand your metrics no matter where they are stored. Create, explore, and share dashboards with your team and foster a data-driven culture.
Trusted and loved by the community.
5. Grafana: Center of an Open and Composable Observability Platform
Our products have begun to evolve to unify into a single offering: the world’s first composable open-source observability platform for Metrics, Logs and Traces. Centered around Grafana.
This allows our customers to get insights from their existing vendors, use our curated stack, or both.
This level of interoperability and choice is an industry first. More than just a combination of telemetry data, the platform unifies all aspects of observability into a seamless and contextual experience that feels magical.
2014: Grafana Labs was created to accelerate the adoption of the open source Grafana software and to build a sustainable business around it
2016: Grafana Enterprise, which offers features needed by enterprise-level organizations, is created
2017: Grafana Cloud, a fully managed metrics platform supporting Graphite, is created
2018: Grafana Loki, a Prometheus-inspired log aggregation system, is launched at KubeCon
2020: Open and composable observability platform with Grafana at the center
7. Flux Support
Now available!
In addition to the default InfluxQL support, Flux support is now available (released in Grafana 7.1).
This allows the full power of Flux queries to be executed against InfluxDB ≥ 1.8.
8. Now Available
USAGE ANALYTICS
Helps large companies get better insight into the behavior and utilization of their users, dashboards, and data sources
UNIFIED DATA MODEL
Viz can come from data/context, not manual. Smart viz based on data (min/max/mean graphs, etc). Basic BI.
TRACING
Add trace UI to show traces from tracing data sources and Jaeger datasource within Loki to reduce mean time to resolution
CUSTOM TIME FORMATS AND TIME ZONE SELECTION
Better UX and viz options. Better selection of viz with live previews.
NEW VIS AND PANEL EDITING
Enhanced UX and visualization options for better consistency and usability including a new table panel, a new grid layout engine and an improved experience for editing panels
NEW TABLE AND GRAPH PANELS
Move to React enables more reusability of components, scaling of multiple stats. New single stat and bar graph already available
PLUGINS PLATFORM
Advanced platform so users can easily create new Plugins faster and
AWS CLOUDWATCH LOGS
Added support for AWS CloudWatch Logs
TRANSFORMATIONS
The new Transformations capabilities allow users to go beyond data visualization and transform all types of data
9. Unified Data Model
A new unified data model makes Grafana more consistent and easier to use because it provides users with a consistent way to define data sources, conventions, user defaults, and override rules
Previous Versions of Grafana: Each visualization had slightly different ways to define options
Grafana 7.0: Consistent UI for specifying override rules, extensible for custom panel-specific options
[Figure labels: Singlestat Options, Table Override Rules, Graph Thresholds]
This new option architecture and UI will make all panels have a consistent set of options and behaviors for attributes like unit, min, max, thresholds, links,
11. Plugins Platform
The new Plugins Platform makes it easier for all Grafana users to build high-quality plugins exponentially faster.
In the new Plugins Platform users will find:
● A new React component library which provides a consistent framework that makes it easier and faster for users to create plugins
● New tools for building plugins via @grafana/toolkit, which delivers a simple CLI that helps plugin authors quickly scaffold, develop, and test their plugins without worrying about configuration details
● New data formats based on a more generic structure, so plugins can return different types of data, such as JSON or static resources, that enable users to create panels and dashboards from non-time-series data
The new @grafana/ui components library is documented with Storybook (visual documentation) and is available on NPM.
12. Tracing
Grafana 7.0 now includes full native support for trace data, so users can understand how a single trace has traveled through a distributed system and troubleshoot issues faster.
Users can use tracing in Explore either directly, to search for a particular trace, or by configuring Loki to detect trace IDs in the log lines and link directly to a trace timeline.
With tracing, Grafana now has a full observability solution, allowing users to achieve a seamless and unified experience that connects and visualizes metrics, logs and traces.
We are starting with integrated tracing for two new built-in data sources: Jaeger and Zipkin.
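Linking a log line to a trace, as described for Loki above, comes down to recognizing a trace ID inside the log text. The Python sketch below illustrates only that idea. It assumes log lines carry a `traceID=<hex>` token, and it uses a made-up URL template for the trace viewer; neither is taken from the slides or from Grafana's configuration.

```python
import re
from typing import Optional

# Assumed log format: a "traceID=<hex>" token somewhere in the line.
TRACE_ID_PATTERN = re.compile(r"traceID=([0-9a-fA-F]+)")

# Hypothetical URL template for a trace viewer; not a real Grafana endpoint.
TRACE_URL_TEMPLATE = "http://tracing.example.com/trace/{trace_id}"


def trace_link(log_line: str) -> Optional[str]:
    """Return a link to the trace named in log_line, or None if there is none."""
    match = TRACE_ID_PATTERN.search(log_line)
    if match is None:
        return None
    return TRACE_URL_TEMPLATE.format(trace_id=match.group(1))
```

Calling `trace_link` on a line without the token returns `None`, so a caller can leave such lines unlinked.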
13. Transformations
● Users can now transform non-time series data into tables (e.g., JSON files or even simple lookup tables) in seconds without any customization or additional overhead
● Combine non-time series data with any other data in Grafana, be it data from an external database or a panel that already exists in one of your current dashboards
● By chaining a simple set of point and click transformations, users will be able to join, pivot, filter, rename, and calculate all kinds of data to quickly customize their panels
For example: define labels in a database, apply a transformation, and the labels appear in the table as fields.
15. Grafana Q2 2020 H2
Prometheus & Loki Query Inspection: expose query metrics via the “inspect” panel to help troubleshoot slow queries
Ease of use improvements: more streamlined process for getting data into Grafana
Loki metrics-from-logs: manipulate metric data in LogQL and extract metrics from logs
Alerting improvements: alert from more data sources, more options for alert management
22. InfluxDB Platform
InfluxDB (Open Source)
InfluxDB Cloud (AWS, GCS, Azure)
InfluxDB Enterprise (On-premise/Own compute)
Free Forever
Everything you need in a single binary
Pay Per Use
Node Based
Cloud Native
Common API
Telegraf
Client Libraries & SDKs
Custom Apps
3rd Party Integrations
28. • On Slack - influxdata.com/slack
• On GitHub - github.com/influxdata
• Community Office Hours
• Virtual Meetups & Summits, InfluxDays
We want your feedback! Come join us!
30. Discover How IBM Uses InfluxDB and Grafana to Help Clients Monitor Large Production Servers and in IBM Benchmark Centers
Nigel Griffiths Advanced Technology Support, EMEA
IBM email: nag@uk.ibm.com
Open Source: nigelargriffiths@hotmail.com
@mr_nmon twitter
http://tinyurl.com/njmon - njmon sourceforge project
http://tinyurl.com/AIXpert - My 135 Blog
https://www.youtube.com/user/nigelargriffiths - 205
Grafana Labs, InfluxData
31. 350,000 people are IBMers
Benchmark Centres, Demonstrations, Services people, Cloud Offerings
Very roughly
• 1/3rd Software
• 1/3rd Services (technical + business)
• 1/3rd Hardware (Systems) (servers + storage)
One chart on
32. 1/3rd Hardware (Systems) (servers + storage)
• POWER (IBM chip POWER9)
  • OS: Linux, AIX (UNIX), IBM i
  • 192 CPU cores, 1536 HW threads
  • 64 TB memory, 64 adapters
• Z (mainframe, IBM chip z15)
  • OS: z/OS, LinuxONE for Linux
• Storage . . .
Second chart on
34. My claim to fame?
Started 25 years ago
nmon Nigel’s Monitor
OS performance data
On screen or CSV file
Various graphing tools
For AIX and Linux (any HW)
nmon for AIX now part of AIX
nmon for Linux open source
960,000+ downloads
Things have changed since starting nmon
- CPUs x 200,000 faster
- RAM x 1 million larger
- Network x 10,000 rate
- Disks, SSD & NVMe: x 500,000 larger, x 10,000 faster
- nmon file format = quirky & !standard
38. Every possible statistic DONE
Standard format: JSON + LP
Central database: InfluxDB
Live graphs: Grafana
JSON: Elastic & Splunk
LP: Telegraf, Prometheus
In 2018:
What would I do differently?
39. In 2020:
njmon = JSON output to njmond.py central daemon
nimon = InfluxDB Line Protocol direct to InfluxDB
Want to know more?
http://nmon.sourceforge.net/njmon
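To make "InfluxDB Line Protocol" concrete: each sample is one text line of the form `measurement,tag=value field=value timestamp`. The Python sketch below formats such a line. It illustrates the format only and is not how nimon is implemented. The function name and the `host`/`os` tags in the usage note are hypothetical, and NaN or infinite floats are not handled.

```python
from typing import Dict, Optional, Union

FieldValue = Union[bool, int, float, str]


def _escape(text: str, chars: str) -> str:
    """Backslash-escape every character of `chars` that occurs in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value: FieldValue) -> str:
    # bool is tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    # Strings are double-quoted, with backslashes and quotes escaped.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(
    measurement: str,
    tags: Dict[str, str],
    fields: Dict[str, FieldValue],
    timestamp_ns: Optional[int] = None,
) -> str:
    """Format one sample as a single line of InfluxDB line protocol."""
    if not fields:
        raise ValueError("line protocol needs at least one field")
    head = _escape(measurement, ", ")
    for key in sorted(tags):
        head += "," + _escape(key, ",= ") + "=" + _escape(tags[key], ",= ")
    body = ",".join(
        _escape(name, ",= ") + "=" + _format_field(value)
        for name, value in fields.items()
    )
    line = f"{head} {body}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line
```

For example, `to_line_protocol("cpu", {"host": "server 1", "os": "AIX"}, {"user": 12.5, "procs": 42}, 1600000000000000000)` escapes the space in the tag value and marks `procs` as an integer.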
40. Let's talk about Grafana!
Wow!! Every release is like Xmas: we get new toys (graphs)
- Even a webpage with samples
41. Let's talk about Grafana!
1. My logo = cool
2. Donut graph, yum
3. Dark mode: Helps you sleep at the desk!
4. LED graphic equaliser: draws attention to red stats
5. Button single stat and graph: high density
6. Blue Ridge mountain range graph
7. Carpet graph – see later
45. Anyone heard of the Dolly Parton curve?
Three crunch points
[Chart: CPU busy (up to 100%) against time of day (AM to PM); labels: Morning, Lunch, Afternoon, Batch]
Problems:
Averaging the day hides the three crunch points
Periodic over a day and over a week (typically busier on Friday)
Periodic over a month (end of month extra reporting) and end of year!
Batch overrun times
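The first problem is easy to show with numbers. The Python sketch below uses made-up hourly CPU busy values shaped like the curve (morning peak, lunch dip, afternoon peak, overnight batch peak) and compares the daily average with the hours at or above a threshold. The data and the 90% threshold are illustrative assumptions, not measurements from the talk.

```python
from statistics import mean

# Made-up hourly CPU busy percentages for one day (index = hour 0..23).
# Shape: morning and afternoon peaks, a lunch dip, an overnight batch peak.
cpu_busy = [20, 20, 20, 20, 20, 20, 30, 50, 80, 95, 90, 70,
            50, 70, 95, 90, 70, 50, 30, 30, 60, 95, 90, 40]


def crunch_hours(samples, threshold=90):
    """Return the hours whose CPU busy value is at or above threshold."""
    return [hour for hour, busy in enumerate(samples) if busy >= threshold]


daily_average = mean(cpu_busy)
peaks = crunch_hours(cpu_busy)

print(f"daily average: {daily_average:.1f}%")
print(f"hours at or above 90%: {peaks}")
```

With this data the daily average comes out a little over 54%, which looks comfortable, while six separate hours sit at 90% or more.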
47. Heat map for whole days using the Grafana Carpet Plugin
This is an excellent way to determine the busy day + busy hours = first step for trend forecasting
Heat Map Warning: There are always red parts!
[Heat map: three successive weeks]
Interesting peaks 8 to 10 am & 2 pm, Tuesday to Friday
Busy day is Thursday
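Finding the busy day and busy hours amounts to bucketing samples by weekday and hour of day. The Python sketch below shows that bucketing under stated assumptions: samples are `(datetime, cpu_busy)` pairs and each cell holds the mean of its samples. The function names are hypothetical, and this is not how the Carpet plugin itself works.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def busy_grid(samples):
    """Average (timestamp, cpu_busy) samples into (weekday, hour) cells.

    weekday follows datetime.weekday(): 0 is Monday, 6 is Sunday.
    """
    cells = defaultdict(list)
    for when, busy in samples:
        cells[(when.weekday(), when.hour)].append(busy)
    return {cell: mean(values) for cell, values in cells.items()}


def busiest_cells(grid, top=3):
    """Return the `top` (weekday, hour) cells with the highest average."""
    return sorted(grid, key=grid.get, reverse=True)[:top]
```

Feeding several weeks of samples through `busy_grid` and then `busiest_cells` gives a short list of weekday and hour combinations to check against the heat map.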
48. My to do list:
Work out how to graph CPU on successive Fridays 8 am to 10 pm
Batch overrun can be handled with alerts but still needs trending
Ideas to nag@uk.ibm.com
Could be done in “Flux” or Grafana
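The talk leaves the Fridays graph as an open item, possibly for Flux or Grafana. Purely to illustrate the data selection involved, the Python sketch below splits `(datetime, cpu_busy)` samples into one series per Friday, keeping only 8 am to 10 pm and keying each point by time of day so the Fridays can be overlaid on one axis. The function name and sample layout are assumptions, and this is not the speaker's solution.

```python
from collections import defaultdict
from datetime import date, datetime, time
from typing import Dict, Iterable, List, Tuple

FRIDAY = 4  # datetime.weekday(): Monday is 0


def friday_overlay(
    samples: Iterable[Tuple[datetime, float]],
    start: time = time(8, 0),
    end: time = time(22, 0),
) -> Dict[date, List[Tuple[time, float]]]:
    """Group samples into one series per Friday, limited to start..end.

    Each series holds (time_of_day, cpu_busy) points in time order, so
    successive Fridays share the same x axis when plotted together.
    """
    series = defaultdict(list)
    for when, busy in sorted(samples, key=lambda sample: sample[0]):
        if when.weekday() == FRIDAY and start <= when.time() <= end:
            series[when.date()].append((when.time(), busy))
    return dict(series)
```

Each key of the result is one Friday; plotting every value list against the same time-of-day axis gives the overlay.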
49. Some ideas
Fri Fri Fri Fri Friday
(1) Remove the weeds
(2) One graph with overlay of selected time periods
(3)
50. Two recent ideas:
1. Not easy to document measures & statistics names! [Tried to find out how many stats from Linux statd?]
2. Capturing ad-hoc stats on Big Production Servers
Answers: AIXpert Blog
52. Measure for AIX and Linux
[Diagram: Grafana and InfluxDB, with CPU, Memory, Disks, Network, Kernel, Processes]
Saving other statistics to the same njmon database.
If you can get the data via a script, you can send it on with the same njmon tags in 1/100th of a second.
Then graph OS stats & your stats at the same time.
Measure Statistics
RDBMS script:
measure* -g rdbms -G commits=986.34,rollbacks=23.1,hitratio=99.3
Sales script:
measure* -g sales -G itemsold=32984,avgcost=79.99,profit=-0.003
Users script:
measure* -g user -G online=65389,online_mins=184,click_pm=18.2
IT-tasks times script:
measure* -g tasks -G dataload=47_min,backupmin=124,batch_min=84
* Also need InfluxDB: hostname + port & Influx-DB-name
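Sending a group of statistics to InfluxDB needs what the slide lists: the statistics themselves, plus hostname, port and database name. The Python sketch below turns a `name=value,...` list into a write request for the InfluxDB 1.x HTTP `/write` endpoint. It is not the measure tool's implementation. The function names, the `host` tag, the `njmon` database name and the example hostname are all assumptions, and no escaping of special characters is attempted.

```python
from typing import Dict
from urllib.parse import urlencode
from urllib.request import Request


def parse_stats(text: str) -> Dict[str, float]:
    """Parse 'name=value,name=value' into a dict of floats."""
    stats = {}
    for pair in text.split(","):
        name, _, value = pair.partition("=")
        stats[name.strip()] = float(value)
    return stats


def build_write_request(
    host: str,
    port: int,
    database: str,
    group: str,
    stats: Dict[str, float],
    tags: Dict[str, str],
) -> Request:
    """Build (but do not send) a POST to the InfluxDB 1.x /write endpoint.

    The statistics group is used as the measurement name; tags are
    assumed to be simple words that need no escaping.
    """
    tag_text = "".join(f",{key}={value}" for key, value in sorted(tags.items()))
    field_text = ",".join(f"{name}={value}" for name, value in stats.items())
    line = f"{group}{tag_text} {field_text}"
    url = f"http://{host}:{port}/write?" + urlencode({"db": database})
    return Request(url, data=line.encode("utf-8"), method="POST")


# Example values; hostname, database and tag are placeholders.
stats = parse_stats("commits=986.34,rollbacks=23.1,hitratio=99.3")
request = build_write_request(
    "influx.example.com", 8086, "njmon", "rdbms", stats, {"host": "server1"}
)
```

Passing `request` to `urllib.request.urlopen` would perform the write; the sketch stops short of that so it can be run without a database.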
53. Raspberry Pi 3, MicroSD card, with five temperature probes
Graph annotations:
- Pi returning temp of zero
- Pi fell off network
- Effect of outside air temperature rising to 32C
54. End of Message
- Thank you for your time
Feedback + ideas welcome:
nag@uk.ibm.com
or twitter @mr_nmon
or LinkedIn:
https://www.linkedin.com/in/nigel-griffiths
55. We look forward to bringing together our community of developers to learn, interact and share tips and use cases.
October 27 – 28, 2020: Hands-On Flux Training
November 10 – 11, 2020: Virtual Experience
www.influxdays.com/virtual-experience-2020/