How could I automate log gathering in a distributed system
1. How could I automate log gathering in a distributed system using Perl?
A system programmer’s survival(!) story
- While developing Ethernet/IP modules for the EPC core system
3. A long time ago…
There was an S/W developer
( image : http://www.blogsolute.com/9-things-that-shows-you-are-still-a-rookie-in-blogosphere/4037/ )
4. Happy Life
He worked at the ‘S’ company
The ‘S’ company actually has heavy workloads
Coincidentally, his team had a lot of free time
Image http://photo2.si.edu/150now/150visitors.htm
5. Then one day…
He was transferred to the EPC core system development team
Image http://thetechtiger.blogspot.kr/2010/12/newbie-spirit.html
6. EPC?
Evolved Packet Core
It is the core system for LTE service
http://www.iphase.com/products/lte_about.cfm
7. Want to know about EPC/LTE?
But that is out of scope for this seminar
8. Moreover…
I didn’t know EPC/LTE technology in detail
I still don’t know EPC/LTE technology
http://jarielsmith.blogspot.kr/2012/05/lost-my-wallet.html
9. My Duty
Develop network device drivers
◦ Ex) Switch
Modify the network layer in the Linux kernel
Handle L2/L3 protocols
11. First Challenge
The EPC system was the company’s first attempt at such a system
◦ If you are a developer, you will fully understand what that means
http://emenshealth.design.co.kr/in_magazine/sub.html?at=view&p_no=&info_id=45762&c_id=00010006
12. The human resources problem
Hundreds of engineers were involved in this project.
Overall, that does not seem bad
13. But…
The OS/DD team consisted of 3 senior engineers and one newbie
◦ In particular, for device drivers & L2/L3 protocols
Only One Guy
http://wkstudio.bigcartel.com/product/really-onesie
15. Lack of useful tools
In the early development stage, useful tools were not ready
http://www.drillspot.com/products/106902/brady_worldwide_inc_65290_lockout_tool_box_no_lockout_devices_included
16. Basically
A network core system is huge, complex and difficult
Take one more look!!
http://www.iphase.com/products/lte_about.cfm
17. And (just my feeling)
It was horrible & heavy work
18. Anyway
I solved many difficult problems
◦ In the end, I survived
20. First
We need a basic understanding of the system architecture
http://depositphotos.com/5735004/stock-illustration-School-chalkboard.-Hand-Drawn-Design-Element.html
22. What matters in the EPC system
Services are distributed
It must provide fail-over & non-stop service
◦ HA (High Availability)
23. The basic composition of system
Management boards
◦ Master & secondary master board
If the master board fails, the secondary master quickly takes over the management role (HA)
Service boards
◦ Various call/protocol services
Other service connections
◦ Ex) AAA
All boards are connected by gigabit Ethernet.
“I can’t give detailed or exact contents for security reasons”
24. Shape of physical system
* This is just a reference image of such a system
http://www.compelgroup.net/english/10_06_advanced_tca_chassis.htm
25. Let’s Imagine!!
In this architecture,
http://www.cinema4d.co.kr/freeboard/901145
26. If some problem occurs,
how do we debug it?
http://www.wpclipart.com/computer/humour/debugging.png.html
28. Various causes - configuration
Mistyping
◦ Ex) Illegal number
Simple mistakes
◦ Someone changed the physical configuration without notice while some batch work was being processed
Application problems
◦ Shell, reporter, statistics apps
Misconfiguration
◦ Tester’s misunderstanding of the network/service
33. Show me the LOG!!
Various status information, error/warning messages, some dumps and so on…
◦ These are stored on the system as log files
34. At the finished stage…
Many utilities and shell commands were provided
http://berxblog.blogspot.kr/
35. But in the early days,
we collected various logs from each board manually
http://blog.naver.com/PostList.nhn?blogId=alwkcjstk
36. More limitations
Per chassis, only the management boards have public IP addresses and are connected to the external network
The other boards have only private IP addresses and are reachable from the M.G board only
37. Limitations (cont.)
Users could log in to the service boards only from the M.G board
http://www.doyletics.com/mrules.htm
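The two-hop login described above (outside network to the M.G board, then M.G board to a service board) could be scripted roughly as follows. This is only a sketch under assumptions, not the original code: it assumes ssh with key-based login on both hops (the real system may have used telnet or passwords), and the sub names, the prompt pattern, and every host or user value are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical prompt pattern: a line ending in $, # or >.
my $PROMPT = '[\$#>]\s*$';

# Command typed on the M.G board to reach a service board (second hop).
sub hop_command {
    my ($user, $board_ip) = @_;
    return "ssh $user\@$board_ip\n";
}

# Log in to the M.G board, hop to a service board, run one command there,
# and return the text received before the next prompt (this usually
# includes the echoed command itself).
sub run_on_service_board {
    my ($mgmt_host, $board_ip, $user, $command, $timeout) = @_;
    require Expect;                          # CPAN module, loaded on demand
    my $exp = Expect->spawn('ssh', "$user\@$mgmt_host")
        or die "Cannot spawn ssh: $!\n";
    $exp->log_stdout(0);                     # do not echo the session
    defined $exp->expect($timeout, '-re', $PROMPT)
        or die "No prompt from $mgmt_host\n";
    $exp->send(hop_command($user, $board_ip));
    defined $exp->expect($timeout, '-re', $PROMPT)
        or die "No prompt from $board_ip\n";
    $exp->send("$command\n");
    defined $exp->expect($timeout, '-re', $PROMPT)
        or die "Command timed out on $board_ip\n";
    my $output = $exp->before();
    $exp->send("exit\n");                    # leave the service board
    $exp->send("exit\n");                    # leave the M.G board
    $exp->soft_close();
    return $output;
}
```

A call might look like `run_on_service_board('mg.example.net', '10.0.0.2', 'admin', 'uptime', 10)`, where all four values are placeholders.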
38. Sometimes
I had to run a debugging tool directly on each board
to get specific register values
◦ Ex) PHY, Switch, etc.
For the Switch ASIC,
◦ the register set is huge and complex
39. That job..
It was very troublesome
It took a lot of time
http://www.nemopan.com/2650088
40. An even sadder story
If some hang-up or service failure occurred,
http://www.bazaardesigns.com/8035-glossy-burning-fire-flame/
41. The OS/DD team had to investigate it first
Yes, I was on this team
Yes, only 3+1 humans
42. How to do this automatically?
Log in to each board
Find & check files
Transfer log files
Check for changes in the system
Execute an external command or application, then get its result
Extract some data from log files
Etc.
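As an illustration of the "extract some data from log files" step, here is a minimal sketch in plain Perl. The log line format and the sample lines are invented for the example; the real logs on the system are not shown in this talk.

```perl
use strict;
use warnings;

# Assumed (hypothetical) line format: "<date> <time> <LEVEL> <message>".
# Returns one hash per ERROR or WARNING line, in input order.
sub extract_problems {
    my (@lines) = @_;
    my @found;
    for my $line (@lines) {
        next unless $line =~ /^(\S+ \S+)\s+(ERROR|WARNING)\s+(.*)$/;
        push @found, { time => $1, level => $2, message => $3 };
    }
    return @found;
}

# Made-up sample lines, only to exercise the sub.
my @sample = (
    '2012-05-01 02:10:11 INFO link up on port 3',
    '2012-05-01 02:10:15 ERROR PHY register read failed',
    '2012-05-01 02:11:02 WARNING ARP table nearly full',
);

printf "%s [%s] %s\n", @{$_}{qw(time level message)}
    for extract_problems(@sample);
```

Keeping the extraction in a sub that takes lines, rather than file names, means the same code can be fed from a local file or from text captured over a remote session.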
46. CPAN Is
http://www.pixmac.kr/picture/%EB%B3%B4%EB%AC%BC+%EC%83%81%EC%9E%90/000039689131
47. There are many useful modules in CPAN
Net::Telnet
Net::SSH
Net::Ping
Net::FTP
Net::SFTP
Blabla::Bla
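As one example of what these modules offer, the sketch below uses Net::Ping to split a list of boards into reachable and unreachable ones before any login is attempted. It is an illustration under assumptions, not the original tool: the sub names and the choice of TCP port 22 are hypothetical, and TCP is used here simply because it does not need root privileges the way ICMP does.

```perl
use strict;
use warnings;
use Net::Ping;

# Split hosts into reachable and unreachable ones. $pinger can be any
# object with a ping($host) method, such as a Net::Ping instance.
sub partition_by_reachability {
    my ($pinger, @hosts) = @_;
    my (@up, @down);
    for my $host (@hosts) {
        if ($pinger->ping($host)) { push @up, $host }
        else                      { push @down, $host }
    }
    return (\@up, \@down);
}

# Build a TCP pinger aimed at the SSH port (port choice is an assumption).
sub make_tcp_pinger {
    my ($timeout) = @_;
    my $p = Net::Ping->new('tcp', $timeout);
    $p->port_number(22);
    return $p;
}
```

Usage might be `my ($up, $down) = partition_by_reachability(make_tcp_pinger(2), '10.0.0.2', '10.0.0.3');` with placeholder addresses.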
48. But I wanted to…
Integrate all of these
Execute external commands/tools interactively
Fix some small issues in the CPAN modules
◦ Some modules had bugs or weaknesses
Ex) The Ping module had an ICMP bug
◦ Some features were not implemented
49. Yes! I found
Expect
◦ http://search.cpan.org/~rgiersig/Expect-1.21/
50. Expect!!
Expect is a Tcl-based application
◦ I didn’t want to learn the Tcl language
The Expect module offers the same approach in Perl
51. Simple Usage
Load module
Run external application
Control timeout
Detect prompt/result with pattern
Execute command
52. Simple Usage(cont.)
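use strict;
use warnings;
use Expect;
# ==========================
# prepare something (all values here are placeholders)
# ==========================
my ( $externalApp, @params ) = ( 'sh', '-i' );   # any interactive program
my $timeout             = 10;                    # seconds
my $some_pattern        = '[\$#] ?$';            # regex for the first prompt
my $some_pattern4prompt = '[\$#] ?$';            # regex for later prompts
my ( $some_command, $exit_command ) = ( 'uname -a', 'exit' );
my $Agent = Expect->new( $externalApp, @params )
    or die "Cannot start $externalApp\n";
# '-re' marks a regular expression; without it Expect matches literally
$Agent->expect( $timeout, '-re', $some_pattern );
$Agent->send("$some_command\n");                 # newline submits the command
# ========================
# do something more
# =========================
$Agent->expect( $timeout, '-re', $some_pattern4prompt );
$Agent->send("$exit_command\n");
$Agent->soft_close();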
53. Sorry!!
I no longer have the actual code,
so I can’t show it
http://best-messages.blogspot.kr/2010/12/best-sorry-sms-how-to-say-sorry-with.html
55. [Diagram: Chassis 0 and Chassis 1, each with Slot #0 … Slot #n; labels: IP table, Log aaa, Log bbb, Arp, Device info A, Device info B, System start time]
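One way to organize what such a script gathers is a nested hash keyed by chassis, slot and item. The sketch below is an assumption about a reasonable layout, not the original design: the sub names are hypothetical, and treating the diagram labels as the set of items expected from every board is a guess.

```perl
use strict;
use warnings;

# Items assumed to be expected from every board (labels from the diagram).
my @ITEMS = ('IP table', 'Log aaa', 'Log bbb', 'Arp',
             'Device info A', 'Device info B', 'System start time');

# Record one collected item: $store->{chassis}{slot}{item} = text.
sub store_item {
    my ($store, $chassis, $slot, $item, $text) = @_;
    $store->{$chassis}{$slot}{$item} = $text;
    return;
}

# List the expected items that have not been collected for a board yet.
sub missing_items {
    my ($store, $chassis, $slot) = @_;
    my $board = exists $store->{$chassis} ? $store->{$chassis}{$slot} : undef;
    return grep { !($board && exists $board->{$_}) } @ITEMS;
}

my %store;
store_item(\%store, 0, 0, 'Arp', '(arp output would go here)');
print "Chassis 0, slot 0 still missing: ",
      join(', ', missing_items(\%store, 0, 0)), "\n";
```

A per-board list of missing items makes it easy to see which board failed to answer during a collection run.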
56. Cost
All modules are free
I spent just 2 hours writing the code
◦ including handling of all exceptional cases
◦ including finding patterns for login prompts and for the results of external apps
◦ including testing & debugging time
57. Benefit
Manually, I needed 15~20 min to get all logs from all boards
With the script, just a few seconds in the regular case
This was a frequent task
58. Benefit
Execute batch process every night
◦ We tested new services or release S/W every night
◦ My solution was used for a few days
◦ Before long, another reporting tool was ready
59. Thanks Perl
Perl saved me from a lot of dirty & annoying work
http://www.e-cute.net/super-happy-baby-with-a-super-happy-camel/