7. Java ThreadMXBean to the rescue
● ThreadMXBean is the management interface for the thread system of the JVM.
● A Java virtual machine implementation may support measuring the CPU time for the current thread or for any thread.
● getCurrentThreadCpuTime() returns the total CPU time for the current thread in nanoseconds.
● The returned value does not include thread sleep/idle time.
8. Sample Code Usage
if (meteringEnabled) { // checked via the system property set in carbon.xml
    if (!tenant.equals("carbon.super") && (context.equals("services") || context.equals("webapps"))) {
        startCpuTime = threadMXBean.getCurrentThreadCpuTime(); // thread CPU time in nanoseconds
    }
}
// Executable code
if (meteringEnabled) {
    if (!tenant.equals("carbon.super") && (context.equals("services") || context.equals("webapps"))) {
        endCpuTime = threadMXBean.getCurrentThreadCpuTime();
        threadCpuTime = (endCpuTime - startCpuTime) / 1000000; // nanoseconds to milliseconds
        if (threadCpuTime > 0) {
            // add a cpuStatisticsEntry to the queue
        }
    }
}
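The start/end pattern above can be tried in isolation with a small standalone program. This is only a minimal sketch: the class name CpuTimeMeter and the method measureCpuMillis are hypothetical, and the tenant and context checks from the slide are left out. It assumes the task runs entirely on the calling thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuTimeMeter {
    private static final ThreadMXBean THREAD_MX_BEAN = ManagementFactory.getThreadMXBean();

    /**
     * Runs the task on the calling thread and returns the CPU time it consumed
     * in milliseconds, or -1 if the JVM cannot measure current-thread CPU time.
     * (Hypothetical helper, not part of the original component.)
     */
    public static long measureCpuMillis(Runnable task) {
        if (!THREAD_MX_BEAN.isCurrentThreadCpuTimeSupported()
                || !THREAD_MX_BEAN.isThreadCpuTimeEnabled()) {
            return -1L;
        }
        long startCpuTime = THREAD_MX_BEAN.getCurrentThreadCpuTime(); // nanoseconds
        task.run();                                                     // the "executable code"
        long endCpuTime = THREAD_MX_BEAN.getCurrentThreadCpuTime();   // nanoseconds
        return (endCpuTime - startCpuTime) / 1_000_000L;               // to milliseconds
    }

    public static void main(String[] args) {
        long cpuMillis = measureCpuMillis(() -> {
            double sum = 0;
            for (int i = 1; i < 5_000_000; i++) {
                sum += Math.sqrt(i);
            }
            if (sum < 0) {
                System.out.println(sum); // never true; keeps the loop from being optimized away
            }
        });
        System.out.println("CPU time (ms): " + cpuMillis);
    }
}
```

Because the value is truncated to whole milliseconds, very short tasks can report 0, which is presumably why the slide only queues entries with threadCpuTime > 0.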
9. CPU time of requests passing through Tomcat to ODE
[Diagram. Labels: Request, Response, Thread Pool, Tomcat Valves, Tomcat Servlet Transport, Axis 2, BPEL Component, Invoke, Apache ODE, Internal Thread Pool, Simple Scheduler (Job)]
10. ESB CPU Time …
[Diagram. Labels: Service Consumer, ServerWorker, Synapse, Proxy Service, In Sequence, Class Mediator, Endpoint, Service Provider, Out Sequence, Class Mediator, ClientWorker]
11. How it works
[Diagram. Labels: Thread Execution Component, Thread CPU Time per Request, CpuUsageStatisticsContainer, CpuUsageStatisticsEntries Queue, CpuUsageStatistics retrieval Component, Retrieve, Send, Publish, Existing Stratos Usage Agent Component (to be used for Billing and Throttling)]
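The container and queue named on this slide can be pictured with the following sketch. The class names CpuUsageStatisticsContainer and CpuUsageStatisticsEntry come from the slide, but the fields, methods, and the choice of ConcurrentLinkedQueue are assumptions made for illustration, not the actual component code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class CpuUsageQueueSketch {

    /** One per-request measurement. Field names are hypothetical. */
    public static final class CpuUsageStatisticsEntry {
        public final String tenantDomain;
        public final long cpuTimeMillis;

        public CpuUsageStatisticsEntry(String tenantDomain, long cpuTimeMillis) {
            this.tenantDomain = tenantDomain;
            this.cpuTimeMillis = cpuTimeMillis;
        }
    }

    /** Holds entries until the retrieval component collects them. */
    public static final class CpuUsageStatisticsContainer {
        // Thread-safe queue: request threads add, a retrieval thread drains.
        private final Queue<CpuUsageStatisticsEntry> entries = new ConcurrentLinkedQueue<>();

        public void addEntry(CpuUsageStatisticsEntry entry) {
            entries.add(entry);
        }

        /** Removes and returns everything queued so far. */
        public List<CpuUsageStatisticsEntry> drain() {
            List<CpuUsageStatisticsEntry> drained = new ArrayList<>();
            CpuUsageStatisticsEntry entry;
            while ((entry = entries.poll()) != null) {
                drained.add(entry);
            }
            return drained;
        }
    }

    public static void main(String[] args) {
        CpuUsageStatisticsContainer container = new CpuUsageStatisticsContainer();
        container.addEntry(new CpuUsageStatisticsEntry("example.com", 12));
        container.addEntry(new CpuUsageStatisticsEntry("example.org", 7));
        for (CpuUsageStatisticsEntry e : container.drain()) {
            System.out.println(e.tenantDomain + " used " + e.cpuTimeMillis + " ms of CPU");
        }
    }
}
```

In this picture the thread execution component would call addEntry after each request, and the retrieval component would call drain periodically and pass the result on to the usage agent.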
14. BpsCpuUsageStatisticsContainer
[Diagram. Containers hold <CpuUsageStatisticsEntry> elements. Labels: BPS Usage Agent Component, Data Retriever, Data Persister, ode SimpleScheduler, PublisherUtils, EsbCpuUsageStatisticsContainer, ESB Usage Agent Component, synapse-nhttp-transport, ServerWorker, ClientWorker, Stratos Usage Agent, New Agent Component, CpuUsageStatisticsContainer, TransportStatisticsContainer, Extensible Cpu Time Capturing Component, Tomcat Ext, CarbonStuckThreadDetection Valve, ThreadMXBean. Arrows: <<send>>, <<retrieve>>]
15. Why different …?
✔ Not every request goes through the Tomcat
servlet transport (e.g. ESB uses nhttp requests)
✔ Some products use their own internal thread pools
and thread execution mechanisms (e.g. BPS
uses Apache ODE and ESB uses Apache Synapse)
✔ BAM script execution is handled by a separate
JVM
16. Solution
✔ Capture the CPU time specifically for the products
that have the above constraints.
✔ Use a separate component to retrieve product-specific
CPU usage statistics and send them to the Stratos
Usage Agent component.
✔ Add the CPU statistics to the same Usage
Agent instance, once it is registered as an OSGi
service.
17. How to use tenant CPU usage Statistics
Metered CPU Statistics will be summarized in BAM.
Data will be used for billing and throttling.
Tenants will be throttled and billed at the end of the
month according to their CPU usage.
Summarized Data in BAM using a Hive Script
18. How they all fit in …?
[Diagram. Labels: Client, Request to Server, Throttling Agent, Mediator Agent (Optional), Metering Data Store, Server Usage Agent]
22. Problems
[1] Products are different
Thread handling is done differently in some
products. I had to remotely debug the Apache code
each product depends on
(ODE/Synapse/Hive/Hadoop), find the thread
execution part, and capture the CPU time of each
request there.
Usually, tenant information is not associated with
each request/response in the Apache code. In certain
cases I had to pass the tenant domain/id as a
parameter of the invoke method from the particular
component, or set it as a property, so that I could
tell which request came from which tenant.
23. Problems Continued ..
[2] Retrieving data from different dependencies
Direct dependencies on ODE/Synapse cannot be added
to the Stratos usage agent component, since they are
not used in every WSO2 product. For each product
where CPU time had to be captured (except for
Tomcat Ext), I had to write a new component to do
the data retrieval/persistence tasks.
I had to register UsageDataPersistenceManager in the
usage agent as an OSGi service, so that the
ESB/BPS components can add their CPU usage
data to the same instance that backs the
org.wso2.carbon.usage.agent component's
persistence queue.
24. Problems Continued ..
[3] Accurate CPU usage data?
The live (elapsed) time of a request and its CPU time
are very close, but the CPU time is less than the live
time.
Thread sleep time is not captured as CPU time.
ThreadMXBean reports thread CPU time as a cumulative
total, so the CPU time of a particular request always had
to be taken as the difference between two readings.
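The gap between live time and CPU time is easy to see when a thread sleeps. The sketch below is illustrative only: the class name SleepVsCpuTime and its method are hypothetical, and it assumes the JVM supports CPU time measurement for the current thread (otherwise getCurrentThreadCpuTime throws UnsupportedOperationException).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class SleepVsCpuTime {

    /**
     * Sleeps on the calling thread and returns {wallMillis, cpuMillis}.
     * Both values are differences between two readings, because both
     * clocks report running totals rather than per-request figures.
     */
    public static long[] measureSleep(long sleepMillis) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long wallStart = System.nanoTime();
        long cpuStart = bean.getCurrentThreadCpuTime();
        try {
            Thread.sleep(sleepMillis); // the thread is idle here, not on the CPU
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
        long cpuEnd = bean.getCurrentThreadCpuTime();
        long wallEnd = System.nanoTime();
        return new long[] {
            (wallEnd - wallStart) / 1_000_000L,
            (cpuEnd - cpuStart) / 1_000_000L
        };
    }

    public static void main(String[] args) {
        long[] result = measureSleep(200);
        System.out.println("Live time (ms): " + result[0]);
        System.out.println("CPU time (ms):  " + result[1]);
    }
}
```

The live time should come out at roughly the sleep duration, while the CPU time stays close to zero, since a sleeping thread does not accumulate CPU time.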
25. Problems Continued ..
[4] Performance hit?
EnableMetering is set to 'false' by default in
carbon.xml. The CPU time measuring code is executed
only if metering is enabled.
Tested for Tomcat Ext with metering enabled:
no noticeable change in SoapUI for a burst of web
service calls.
Tested for several types of ESB proxy services
with and without the code using Apache JMeter:
there is no sign of a change in TPS.
26. Performance Comparison with Apache JMeter
ESB Echo Proxy Service – 1000 Samples
No. of Threads: 100   Ramp-up period: 5s   Loop Count: 10

Without Code
Average  Median  90% Line  Min  Max  Error  Throughput
3        3       6         2    26   0.0%   199.7/sec
3        3       6         2    18   0.0%   199.4/sec
3        2       6         2    34   0.0%   199.6/sec
2        2       6         2    32   0.0%   199.3/sec
3        2       6         2    22   0.0%   199.7/sec

With Code
Average  Median  90% Line  Min  Max  Error  Throughput
4        3       7         2    38   0.0%   199.1/sec
3        3       6         2    21   0.0%   199.3/sec
3        3       6         2    38   0.0%   199.6/sec
3        3       6         2    21   0.0%   199.5/sec
2        3       6         2    25   0.0%   199.0/sec
27. Performance Comparison with Apache JMeter
ESB Echo Proxy Service – 1000 Samples
No. of Threads: 50   Ramp-up period: 5s   Loop Count: 20

Without Code
Average  Median  90% Line  Min  Max  Error  Throughput
3        3       4         2    19   0.0%   199.0/sec
3        3       4         2    13   0.0%   198.8/sec
3        3       4         2    21   0.0%   197.5/sec
2        3       4         2    12   0.0%   199.0/sec
2        3       3         2    19   0.0%   199.3/sec

With Code
Average  Median  90% Line  Min  Max  Error  Throughput
4        3       7         2    29   0.0%   196.9/sec
4        3       7         2    22   0.0%   199.3/sec
4        3       7         2    21   0.0%   199.5/sec
3        3       6         2    22   0.0%   198.1/sec
2        3       3         2    17   0.0%   199.2/sec
28. Performance Comparison with Apache JMeter
ESB Proxy Service (Class mediator) – 1000 Samples
No. of Threads: 100   Ramp-up period: 1s   Loop Count: 10

Throughput
Without Code  With Code
760.5/sec     766.8/sec
762.2/sec     749.1/sec
754.1/sec     746.8/sec
745.2/sec     746.3/sec
751.9/sec     751.3/sec
763.4/sec     748.5/sec
764.5/sec     757.6/sec
753.6/sec     749.6/sec
– Also checked several types of proxy services running at the same
time; the total throughput seems to be quite even.
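To put a number on "quite even", the eight throughput readings in each column of the class mediator table can be averaged. The sketch below does only that arithmetic; the class and method names are hypothetical, and it assumes each row is an independent run of the same test.

```java
import java.util.Arrays;

public class ThroughputSummary {
    // Throughput readings (requests/sec) copied from the table on this slide.
    public static final double[] WITHOUT_CODE =
            {760.5, 762.2, 754.1, 745.2, 751.9, 763.4, 764.5, 753.6};
    public static final double[] WITH_CODE =
            {766.8, 749.1, 746.8, 746.3, 751.3, 748.5, 757.6, 749.6};

    /** Arithmetic mean of the readings. */
    public static double mean(double[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    /** Relative drop from 'before' to 'after', as a percentage of 'before'. */
    public static double relativeDropPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        double without = mean(WITHOUT_CODE);
        double with = mean(WITH_CODE);
        System.out.printf("Mean without code: %.1f/sec%n", without);
        System.out.printf("Mean with code:    %.1f/sec%n", with);
        System.out.printf("Relative drop:     %.2f%%%n", relativeDropPercent(without, with));
    }
}
```

On these figures the means work out to about 756.9/sec without the code and 752.0/sec with it, a difference of well under 1%, which is smaller than the run-to-run spread within either column.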
29. Problems Continued ..
[5] Product/version problems
Different products used different versions of the
same component.
While the project was in progress, the dependent
components changed several times.
31. Automation Hackathon
Worked with the GREG team for almost 2 months.
Wrote a lot of test cases and ported old tests to the
Clarity framework.
Learnt about GREG LCs, RXTs, APIs, URIs, Handlers,
Permissions etc.
Learnt to write Axis2 clients to test CRUD operation
support and Discovery Proxy for GREG.
Automated several support patches.
33. Above them all ...
Obviously learnt a load of technical things.
How to take important architectural decisions, and the
flexibility of the Carbon architecture.
How to communicate ideas with others and get the
necessary help.
Was able to get the help of a lot of people and work on
several products: Carbon, AS, ESB, BPS, GREG,
BAM, DSS, Stratos etc.
Learnt best practices in software engineering and
coding conventions.
34. Above them all ...
How to test software, automate the functionality and
how QA functions.
How to use mailing lists effectively.
How to manage time and meet deadlines.
How a company functions and how it
prepares for a release.
Got to know a bunch of good friends/people.
Enjoyed every minute of it.