This document discusses highload systems and strategies for scaling Node.js applications to handle increased traffic. It recommends using multiple servers for redundancy and handling spikes in load. Key metrics for monitoring include status codes, backend latency, CPU and memory utilization, and event loop lag. Batching operations to third parties and sampling logs are suggested to reduce load. Offloading heavy tasks to workers can also help optimize performance. The document emphasizes monitoring systems closely and using as few servers as possible through optimization.
5. Why 2+?
Redundancy
One service can shut down or break
0-downtime updates
You can update one service while the second handles requests
Two is the minimum number of servers, even for non-highload projects
6. When do you need 2+ servers?
- Customers are complaining about performance
- Your metrics show performance degradation
7. Maybe optimize your app?
- Yes
- Any code optimization has its limits
- At some point you will reach your CPU capacity with more users
8. So adding more servers is the right approach to handle more requests?
10. Status codes
The more 2xx, the better
The fewer 5xx, the better
Backend latency
Preferably respond in under 200 ms
To satisfy business needs
Be cost effective
The less we spend, the more money the business keeps.
How to achieve this?
11. CPU
~40-60% avg utilization
Memory
<50% max utilization
Traffic pattern
This can affect our auto scaling parameters
Active handles
Spikes of active handles can block requests from being processed
Active requests
Spikes of active requests can block requests from being processed
Event loop lag
Can be the reason why we can't handle requests in time
Monitoring & auto scaling
13. Case 2: traffic and/or CPU usage increases and decreases sporadically
(chart: sporadic traffic/CPU curve; "$$" marks potential money saving)
Such systems are hard to auto scale when some requests are heavy.
Possible solution - offload CPU-heavy tasks to offline jobs (workers, separate deployments)
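A minimal sketch of that solution with Node's worker_threads (the task, iteration count, and single-file layout are made-up for the example; a real setup might use a job queue or a separate deployment instead):

// offload.js - single-file sketch: the main thread stays free to serve requests
// while the CPU-heavy loop runs in a worker thread.
const { Worker, isMainThread, parentPort, workerData } = require('node:worker_threads');

if (isMainThread) {
  function runHeavyTask(iterations) {
    return new Promise((resolve, reject) => {
      const worker = new Worker(__filename, { workerData: { iterations } });
      worker.once('message', resolve);
      worker.once('error', reject);
    });
  }
  runHeavyTask(1e9).then((result) => console.log('worker result:', result));
} else {
  // stand-in for real CPU-heavy work (image resizing, parsing, crypto, ...)
  let sum = 0;
  for (let i = 0; i < workerData.iterations; i++) sum += i;
  parentPort.postMessage(sum);
}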
14. Node.js metrics: event loop lag
Hundreds of such blocking iterations can cause high event loop lag and
lead to app unresponsiveness.
Mitigation: add setImmediate() to your loops
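A rough illustration of that mitigation (the chunk size and function names are invented for the example): a long loop periodically yields back to the event loop so pending I/O and incoming requests can run in between.

// Processing a huge array in one synchronous loop blocks the event loop.
// Yielding with setImmediate() between chunks keeps the app responsive;
// the chunk size of 1000 is arbitrary.
async function processAll(items, handleItem) {
  for (let i = 0; i < items.length; i++) {
    handleItem(items[i]);
    if (i % 1000 === 0) {
      await new Promise((resolve) => setImmediate(resolve));
    }
  }
}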
15. Event loop lag in sync methods
I hope you are not using the sync methods of fs.
Use the async variants of methods everywhere.
Do not use the sync version - use the async one.
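A small sketch of the contrast, assuming the fs example the slide refers to is a file read:

const fs = require('node:fs');

// Do not use it: fs.readFileSync blocks the event loop for the whole read,
// so no other request is processed in the meantime.
function readConfigSync(path) {
  return fs.readFileSync(path, 'utf8');
}

// Use it: the async variant runs the read in libuv's thread pool
// and keeps the event loop free for other requests.
async function readConfig(path) {
  return fs.promises.readFile(path, 'utf8');
}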
16. How to capture these?
Default metrics can be collected in the register of prom-client and later
exposed by your HTTP server, so Prometheus can scrape them and Grafana can
display them.
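A minimal sketch of that wiring (port and route are arbitrary choices for the example):

const http = require('node:http');
const client = require('prom-client');

// Registers event loop lag, heap, GC, active handles/requests, etc.
client.collectDefaultMetrics();

// Expose them so Prometheus can scrape /metrics and Grafana can chart them.
http.createServer(async (req, res) => {
  if (req.url === '/metrics') {
    res.setHeader('Content-Type', client.register.contentType);
    res.end(await client.register.metrics());
    return;
  }
  res.end('ok');
}).listen(3000);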
17. Exploring event loop lag
Avg event loop lag > 100 ms is a case for investigation
18. Other default metrics that are collected with "collectDefaultMetrics":
https://github.com/siimon/prom-client/tree/master/lib/metrics
19. Debug a specific pod and check the types of handles
Incoming HTTP requests from the load balancer
Outgoing connections to 3rd parties
'Number of active libuv handles grouped by handle type. Every handle type is C++ class
name.'
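Besides the Grafana view, a quick way to see the same grouping from inside a single pod (process.getActiveResourcesInfo() exists since Node.js 17.3; the printed output is only illustrative):

const counts = {};
for (const type of process.getActiveResourcesInfo()) {
  counts[type] = (counts[type] ?? 0) + 1;
}
console.log(counts); // e.g. { TCPSocketWrap: 42, Timeout: 3 }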
23. Logs, what can go wrong?
In highload, logging can become 100,000 logs/sec * 3,600 s/h = 0.36B logs per hour.
- How much would you pay DataDog for this?
- What network load will this create?
- What CPU load will this create?
- How would you navigate through 0.36B logs per hour?
27. Now combine these methods
Error messages should be persistent
You will know the exact number of events that happened
You can still find details about the error and where it happened
You should tune the log rate to your load; it can be any number from 0.00001% to 100%
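A rough sketch of the combination (the metric name, sample rate, and logger are assumptions for the example):

const client = require('prom-client');

const errorCounter = new client.Counter({
  name: 'payment_errors_total', // hypothetical metric name
  help: 'Total number of payment errors',
});

const SAMPLE_RATE = 0.001; // 0.1% of errors carry full details; tune to your load

function reportError(err, context) {
  errorCounter.inc(); // the exact event count is always persisted
  if (Math.random() < SAMPLE_RATE) {
    console.error('payment error', { message: err.message, context }); // sampled detail log
  }
}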
28. Conclusion
Horizontal scaling is the most effective way to handle more requests
Use as few servers as possible
Use batch operations when possible
Log only the needed amount of logs
Offload heavy jobs to "offline workers"
Eliminate long blocking operations
Monitor everything
29. THANK YOU!
Time for questions!
Andrii Shumada
More talks:
https://eagleeye.github.io