2. Pain points of Systems Software
Change in computing landscape
Scale of development
Developer productivity
Dependency management
5. Enter Go
A new language: a concurrent, garbage-collected language with
fast compilation.
6. Syntax and Semantics
Similar to C
• Compiled
• Statically typed
• Procedural with pointers
Small changes to C semantics
• No Pointer arithmetic
• No implicit numeric conversions
• Array bounds are always checked
Big changes
• Linguistic support for Concurrency
• Garbage collection
• Interface, reflection, type switches, etc.
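Two of the small changes listed above can be sketched in a few lines. This is a minimal illustration, not taken from the slides; the helper names average and safeIndex are made up for the example.

```go
package main

import "fmt"

// average converts both operands explicitly. Assigning total / count
// to a float64 without the conversions would not compile, because Go
// performs no implicit numeric conversions.
func average(total, count int) float64 {
	return float64(total) / float64(count)
}

// safeIndex (a hypothetical helper) turns the run-time panic raised by
// an out-of-range index into an ordinary error value.
func safeIndex(xs []int, i int) (v int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("index %d rejected: %v", i, r)
		}
	}()
	return xs[i], nil
}

func main() {
	fmt.Println(average(7, 2))

	xs := []int{10, 20, 30}
	v, err := safeIndex(xs, 1)
	fmt.Println(v, err)

	_, err = safeIndex(xs, 5) // bounds check fires here
	fmt.Println(err != nil)
}
```

Recovering from the panic is only done here to make the bounds check visible; ordinary code would check the index against len(xs) first.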
7. Packages
• Modularity and reusability
• Componentize software
• Import clause
– import "fmt"
• Package name is used to qualify items
– Name vs. pkg.Name
• Remote packages
– import "github.com/garyburd/redigo"
8. Defining visibility using Naming
• Name of the identifier itself carries the visibility
– Upper-case initial letter: public (exported), visible to
other packages
– Lower-case initial letter: private to the package
9. Concurrency
• Do not communicate by sharing memory;
instead, share memory by communicating.
10. Goroutine
• Function executing concurrently with others in
same address space
• Lightweight
• Multiplexed onto multiple OS threads
• Started with the go keyword
11. Channels
• A way for two goroutines to communicate with
one another and synchronize their execution
• By default, sends and receives block until the
other side is ready.
• Combine communication and synchronization
• Buffered channels
17. Cost of scheduling OS threads
• POSIX Threads: Signal mask, CPU affinity,
cgroups
• Store all the CPU registers
• Cost of context switch Vs amount of work
18. Goroutine scheduling
• Switches happen only at well-defined points
– If channel operations are blocking
– Go statement
– Blocking syscalls like file and network IO
– Garbage collection
• Compiler knows which registers are used
19. Stack management
POSIX threads
• Large amount of
memory pre-reserved
• Amount of available
memory reduces with
increase in threads
Goroutine
• Starts with a 2 KB stack
• A check before a
function call
• Shrinks with GC
24. A Go interface is like a class with no fields and only virtual methods
When you call a method on an interface variable or parameter, the concrete method is
called via dynamic dispatch through a jump table.
Interface
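The dispatch described on this slide can be illustrated with a small sketch. The Shape, Rect and Square types are assumptions made up for this example; they do not come from the talk.

```go
package main

import "fmt"

// Shape lists behaviour only; an interface has no fields.
type Shape interface {
	Area() float64
}

type Rect struct{ W, H float64 }
type Square struct{ Side float64 }

// Rect and Square satisfy Shape implicitly, simply by having an
// Area method with the right signature.
func (r Rect) Area() float64   { return r.W * r.H }
func (s Square) Area() float64 { return s.Side * s.Side }

// totalArea calls Area through the interface; which concrete method
// runs is decided at run time from the value stored in each element.
func totalArea(shapes []Shape) float64 {
	sum := 0.0
	for _, s := range shapes {
		sum += s.Area()
	}
	return sum
}

func main() {
	fmt.Println(totalArea([]Shape{Rect{W: 2, H: 3}, Square{Side: 2}}))
}
```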
25. Standard library
• Exp package for experimental new packages
• Archive and compression: read .tar and .zip
• Bytes and String
• Collections: heap, lists
• File, OS
• Maths
• Networking: UNIX domain and network
sockets, TCP/IP, and UDP
27. Tools
• gofmt: a single style for readability and
scalability
• golint: checks for style violations
• go vet: finds common mistakes
• Built-in performance profiler
29. Golang @ PubMatic
• Low latency high throughput
• Large number of network IO to external
partners
• Billions of requests per day
• Multiple go services live
• Third party components such as redis,
Aerospike, MySQL, GeoIP
31. GOPHERCON
• The Go Conference in India
• 19 - 20 February 2016
• Vivanta by Taj, MG Road, Bengaluru
• http://www.gophercon.in/
• Ticket is Rs 3999 (last 60 tickets remaining)
• Discount code D1000 for Rs 1000 off!
• https://www.townscript.com/e/gci16
Editor's notes
Most of the popular languages used for systems development, such as C, C++ and Java, were created for an environment that has changed a lot in the last few years.
The problems introduced by multicore processors, networked systems, massive computation clusters, and the web programming model were being worked around rather than addressed head-on. Fundamental concepts such as garbage collection and parallel computation were not supported by popular languages.
The scale has changed: today's server programs comprise tens of millions of lines of code, are worked on by hundreds or even thousands of programmers, and are updated literally every day. To make matters worse, build times, even on large compilation clusters, have stretched to many minutes, even hours.
A lot of developers are early in their careers and are used to procedural languages. It takes a lot of time and effort to become productive in a language. People are used to languages such as C, C++, Java and JavaScript. Each programmer uses a subset of the language. Program understanding is poor (code is hard to read and poorly documented).
Duplication of effort: a lack of standard library support for commonly used data structures, networking and other routinely used algorithms leads to duplicated effort in implementing them.
Dependency management matters for clean dependency analysis and for fast compilation. The "header files" of languages in the C tradition work against both.
#ifndef guards: the idea is that a header file is bracketed with a conditional compilation clause, so that the file can be included multiple times without error.
The source of the Unix ps command ends up including stat.h 37 times. Even if the contents are discarded, the file is opened, read and scanned 37 times.
One interesting data point from Google: the source code of one Google binary, concatenated together, was around 4.2 MB. By the time the #include directives had been expanded, it had blown up to 8 GB as input to the compiler, a massive 2000x blow-up.
Another data point: Google moved its build system from a single makefile to a per-directory design with clearer dependencies. The binary shrank by 40% just from having clearer dependencies.
The consequence of uncontrolled dependencies is that Google has to maintain a distributed compilation system costing huge engineering effort.
Requirements for a language to succeed in the systems development environment:
It must work at scale, for large programs with large numbers of dependencies, with large teams of programmers working on them.
It must be familiar, roughly C-like. The need to get programmers productive quickly in a new language means that the language cannot be too radical.
It must be modern. There are features of the modern world that are better met by newer approaches, such as built-in concurrency.
Regarding the points above:
It is possible to compile a large Go program in a few seconds on a single computer.
Go makes dependency analysis easy and avoids much of the overhead of C-style include files and libraries.
Type system has no hierarchy
Go is fully garbage-collected and provides fundamental support for concurrent execution and communication.
Go supports the construction of system software on multicore machines.
It is common to have thousands of goroutines on a moderately sized server
Syntax determines the readability and hence the clarity of the language. Syntax is also critical to tooling: if the language is hard to parse, automated tools are hard to write.
Go was therefore designed with clarity and tooling in mind, and has a clean syntax. The grammar is regular and therefore easy to parse.
Go provides modularity and code reusability through its package ecosystem.
Write small software components as packages, and compose your applications from these small packages.
The package path can refer to a remote repository by identifying the URL of the site serving the repository.
It is always clear when looking at an identifier whether it is part of the public API. After using Go for a while, it feels burdensome to go back to other languages that require looking up the declaration to discover this information.
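The naming rule can be observed from inside a single file by handing a value to code in another package. In this minimal sketch the Account type and the encode helper are hypothetical; encoding/json is used only because it lives in a different package and therefore sees exported fields only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Account is a hypothetical type. Owner is exported (upper-case
// initial); pin is unexported (lower-case initial).
type Account struct {
	Owner string
	pin   string
}

// encode marshals the account to JSON. encoding/json is a different
// package, so the unexported pin field is invisible to it.
func encode(a Account) string {
	b, err := json.Marshal(a)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	fmt.Println(encode(Account{Owner: "gopher", pin: "1234"})) // prints {"Owner":"gopher"}
}
```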
Concurrent programming in many environments is made difficult by the subtleties required to implement correct access to shared variables.
Go encourages a different approach in which shared values are passed around on channels.
Only one goroutine has access to the value at any given time. Data races cannot occur, by design.
Goroutines are multiplexed onto multiple OS threads so if one should block, such as while waiting for I/O, others continue to run. Their design hides many of the complexities of thread creation and management.
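As a rough sketch of the ideas above, the following starts one goroutine per input and passes each result over a channel instead of writing to a shared variable. The function name sumOfSquares is an assumption for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// sumOfSquares starts one goroutine per input value. Each goroutine
// sends its result on a channel, so no variable is shared between
// the workers.
func sumOfSquares(nums []int) int {
	results := make(chan int)
	var wg sync.WaitGroup

	for _, n := range nums {
		wg.Add(1)
		go func(n int) { // the go keyword starts a goroutine
			defer wg.Done()
			results <- n * n
		}(n)
	}

	// Close the channel once every worker has sent its value, so the
	// range loop below terminates.
	go func() {
		wg.Wait()
		close(results)
	}()

	sum := 0
	for r := range results {
		sum += r
	}
	return sum
}

func main() {
	fmt.Println(sumOfSquares([]int{1, 2, 3, 4})) // prints 30
}
```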
By default, sends and receives block until the other side is ready. This allows goroutines to synchronize without explicit locks or condition variables.
If the channel has a buffer, the sender blocks only until the value has been copied to the buffer; if the buffer is full, this means blocking until some receiver has retrieved a value.
A buffered channel can be used like a semaphore, for instance to limit throughput.
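The semaphore use can be sketched as follows. The function runLimited and its peak-tracking counters are assumptions added so that the limit is observable; only the buffered channel is essential to the technique.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runLimited starts jobs goroutines but lets at most limit of them be
// inside the guarded section at once. It returns the highest number
// observed in that section.
func runLimited(jobs, limit int) int32 {
	sem := make(chan struct{}, limit) // buffer size = number of permits
	var wg sync.WaitGroup
	var inFlight, peak int32

	for i := 0; i < jobs; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}        // acquire: blocks while the buffer is full
			defer func() { <-sem }() // release: frees one slot

			cur := atomic.AddInt32(&inFlight, 1)
			for { // record the peak with a compare-and-swap loop
				p := atomic.LoadInt32(&peak)
				if cur <= p || atomic.CompareAndSwapInt32(&peak, p, cur) {
					break
				}
			}
			atomic.AddInt32(&inFlight, -1)
		}()
	}

	wg.Wait()
	return atomic.LoadInt32(&peak)
}

func main() {
	fmt.Println(runLimited(50, 3) <= 3) // prints true
}
```

The exact peak varies from run to run, but it never exceeds the buffer size.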
The throughput of a server follows a bell curve, as shown in the figure
After the peak output, the server starts spending more time scheduling requests than serving them, resulting in an increasing number of timeouts and errors.
Thread per request:
The bottleneck of low latency high scale systems is the number of threads handled by the OS (not RAM, CPU, Bandwidth).
The code is simple to develop and debug.
The alternatives are an event-based model or lightweight threads.
Event-based programming has been highly touted in recent years as the best way to write highly concurrent applications.
Event-based systems are promising, but they have a steep learning curve and bring complexities of their own.
Threads have their own signal mask, can be assigned CPU affinity, can be put into cgroups and can be queried for which resources they use. All these controls add overhead, which quickly adds up when you have 100,000 threads in your program.
The kernel needs to store the contents of all the CPU registers for that process, then restore the values for another process. Because a process switch can occur at any point in a process's execution, the operating system needs to store the contents of all of these registers because it does not know which are currently in use.
The kernel needs to flush the CPU’s virtual address to physical address mappings (TLB cache)
These costs are relatively fixed by the hardware and are amortised only by the work done between context switches; rapid context switching tends to overwhelm that work.
Instead of using guard pages, the Go compiler inserts a check as part of every function call to test if there is sufficient stack for the function to run. If there is sufficient stack space, the function runs as normal.
If there is insufficient space, the runtime will allocate a larger stack segment on the heap, copy the contents of the current stack to the new segment, free the old segment, and the function call is restarted.
Because of this check, a goroutine's initial stack can be made much smaller, which in turn permits Go programmers to treat goroutines as cheap resources. Goroutine stacks can also shrink if a sufficient portion remains unused. This is handled during garbage collection.
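A small experiment makes the growable stack visible. In this sketch (the function names are made up) a goroutine that starts with a small stack recurses 100,000 levels deep, which only works because the runtime grows the stack on demand.

```go
package main

import "fmt"

// depth recurses n levels; every level adds a frame to the
// goroutine's stack, so the stack must grow well beyond its
// initial size.
func depth(n int) int {
	if n == 0 {
		return 0
	}
	return 1 + depth(n-1)
}

// deepInGoroutine runs the recursion on a freshly started goroutine
// and returns the depth it reached.
func deepInGoroutine(n int) int {
	done := make(chan int)
	go func() { done <- depth(n) }()
	return <-done
}

func main() {
	fmt.Println(deepInGoroutine(100000)) // prints 100000
}
```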
By embedding a struct into another you get a mechanism similar to multiple inheritance.
After embedding, the base struct's fields and methods are directly available in the derived struct.
If, for example, the struct NamedObj has a method show() and the struct Rectangle, which embeds NamedObj, also defines show(), then Rectangle's show() shadows NamedObj's show().
Go solves the diamond problem by not allowing diamonds.
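The embedding and shadowing behaviour can be sketched with the NamedObj and Rectangle names used in these notes. The field names and the describe method are assumptions added for the example.

```go
package main

import "fmt"

type NamedObj struct{ Name string }

func (n NamedObj) show() string     { return "NamedObj " + n.Name }
func (n NamedObj) describe() string { return "object named " + n.Name }

// Rectangle embeds NamedObj, so NamedObj's fields and methods are
// promoted to Rectangle.
type Rectangle struct {
	NamedObj
	Width, Height float64
}

// show shadows the promoted NamedObj.show.
func (r Rectangle) show() string { return "Rectangle " + r.Name }

func main() {
	r := Rectangle{NamedObj: NamedObj{Name: "box"}, Width: 2, Height: 3}

	fmt.Println(r.Name)            // promoted field
	fmt.Println(r.describe())      // promoted method
	fmt.Println(r.show())          // Rectangle's own method wins
	fmt.Println(r.NamedObj.show()) // the shadowed method is still reachable
}
```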
The exp (experimental) package is where packages that might potentially be added to the standard library begin life, so these packages should not be used unless you want to participate in their development (by testing, commenting, or submitting patches).
Go's syntax, package system, naming conventions, and other features were designed to make tools easy to write, and the library includes a lexer, parser, and type checker for the language.
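As a minimal sketch of what the library's parser makes possible, the following uses the standard go/parser, go/ast and go/token packages to list the functions declared in a piece of source text. The funcNames helper and the sample source are made up for illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// funcNames parses Go source text and returns the names of its
// top-level function and method declarations, in source order.
func funcNames(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return nil, err
	}

	var names []string
	for _, decl := range file.Decls {
		if fn, ok := decl.(*ast.FuncDecl); ok {
			names = append(names, fn.Name.Name)
		}
	}
	return names, nil
}

func main() {
	src := "package demo\n\nfunc Hello() {}\n\nfunc helper() {}\n"
	names, err := funcNames(src)
	fmt.Println(names, err)
}
```

A tool built this way could, for instance, report which of the listed names are exported by checking the case of their first letter.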