C++ job openings and popularity remain high, as the media has reported recently, and there is a reason for that: of the many languages and platforms available to developers today, C++ offers unmatched power and performance, enabling innovation outside the box (think of action games, natural user interfaces or augmented reality, to mention a few). In this talk you'll see the new features and technologies coming with Visual C++ vNext, which help you build compelling applications with a renewed developer experience. Don't miss it!
1. What’s new in Visual C++ 11
Jim Hogg
Program Manager
Visual C++
Microsoft
2. Agenda
• Why C++?
• Performance : CPUs and GPUs
• Baseline : Single-CPU / Multi-CPU Demo
• Vector CPU Demo
• GPU : C++ AMP Demo
• ISO C++ 11
• ALM (Application Lifecycle Management)
3. Why C++? : Power & Performance
• power: driver at all scales – on-die, mobile, desktop, datacenter
• size: limits on processor resources – desktop, mobile
• experiences: bigger experiences on smaller hardware; pushing envelope means every cycle matters
“The going word at Facebook is that ‘reasonably written C++ code just runs fast,’ which underscores the enormous effort spent at optimizing PHP and Java code. Paradoxically, C++ code is more difficult to write than in other languages, but efficient code is a lot easier.” – Andrei Alexandrescu
5. CPU vs. GPU today
CPU
• Low memory bandwidth
• Higher power consumption
• Medium level of parallelism
• Deep execution pipelines
• Random accesses
• Supports general code
• Mainstream programming
GPU
• High memory bandwidth
• Lower power consumption
• High level of parallelism
• Shallow execution pipelines
• Sequential accesses
• Supports data-parallel code
• Niche programming
images source: AMD
10. Compiler Enhancements
• Auto-vectorizer
– Automatically vectorizes loops
– Uses SIMD instructions
– ON by default
    for (i = 0; i < 1024; i++)
        a[i] = b[i] * c[i];
  becomes:
    for (i = 0; i < 1024; i += 4)
        a[i:i+3] = b[i:i+3] * c[i:i+3];
• Auto-parallelization
– Reorganizes the loop to run on multiple threads
– /Qpar
– Optional #pragma loop
    #pragma loop(hint_parallel(N))
    for (i = 0; i < 1024; i++)
        a[i] = b[i] * c[i];
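As a rough picture of what the auto-vectorizer does with the loop on this slide, the sketch below puts the scalar loop next to a hand-written version that handles four elements per step. The function names `multiply_scalar` and `multiply_by_fours` are made up for illustration, and the second function only imitates the transformation in plain C++; the real compiler emits SIMD instructions instead. The `#pragma loop(hint_parallel(4))` hint is specific to Visual C++ and only has an effect with /Qpar, so it is guarded here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar form: one multiply per iteration, no dependence between iterations.
// This is the loop shape the slide shows being vectorized or parallelized.
void multiply_scalar(const float* b, const float* c, float* a, std::size_t n) {
#ifdef _MSC_VER
    // Visual C++ only: hint for the auto-parallelizer (needs /Qpar to matter).
    #pragma loop(hint_parallel(4))
#endif
    for (std::size_t i = 0; i < n; ++i)
        a[i] = b[i] * c[i];
}

// Hand-written imitation of the vectorized form: four elements per step,
// followed by a scalar tail for lengths that are not a multiple of four.
void multiply_by_fours(const float* b, const float* c, float* a, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        // A SIMD multiply would compute these four lanes in one instruction.
        a[i]     = b[i]     * c[i];
        a[i + 1] = b[i + 1] * c[i + 1];
        a[i + 2] = b[i + 2] * c[i + 2];
        a[i + 3] = b[i + 3] * c[i + 3];
    }
    for (; i < n; ++i)
        a[i] = b[i] * c[i];
}
```

Both functions produce the same results; the point is only that the iterations are independent, which is what lets the compiler regroup them.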
17. The Power of Heterogeneous Computing
• 146X: Interactive visualization of volumetric white matter connectivity
• 36X: Ionic placement for molecular dynamics simulation on GPU
• 19X: Transcoding HD video stream to H.264
• 17X: Simulation in Matlab using .mex file CUDA function
• 100X: Astrophysics N-body simulation
• 149X: Financial simulation of LIBOR model with swaptions
• 47X: GLAME@lab: An M-script API for linear Algebra operations on GPU
• 20X: Ultrasound medical imaging for cancer diagnostics
• 24X: Highly optimized object oriented molecular dynamics
• 30X: Cmatch exact string matching to find similar proteins and gene sequences
18. C++ AMP
• Part of Visual C++
• Visual Studio integration
• STL-like library for multidimensional data
• Builds on Direct3D
performance
productivity
portability
19. Hello World: Array Addition
Serial version:
void AddArrays(int n, int* pA, int* pB, int* pC)
{
    for (int i = 0; i < n; i++)
    {
        pC[i] = pA[i] + pB[i];
    }
}
C++ AMP version:
#include <amp.h>
using namespace concurrency;
// Developer Preview syntax; the released C++ AMP spells these va.extent and restrict(amp).
void AddArrays(int* a, int* b, int* c, int N)
{
    array_view<int,1> va(N, a);
    array_view<int,1> vb(N, b);
    array_view<int,1> vc(N, c);
    parallel_for_each(
        va.grid,
        [=](index<1> i) restrict(direct3d)
        {
            va[i] = vb[i] + vc[i];
        }
    );
}
20. Basic Elements of C++ AMP coding
void AddArrays(int* a, int* b, int* c, int N)
{
    array_view<int,1> va(N, a);
    array_view<int,1> vb(N, b);
    array_view<int,1> vc(N, c);
    parallel_for_each(
        va.grid,
        [=](index<1> i) restrict(direct3d)
        {
            va[i] = vb[i] + vc[i];
        }
    );
}
• array_view: wraps the data to operate on the accelerator
• parallel_for_each: execute the lambda on the accelerator once per thread
• grid: the number and shape of threads to execute the lambda
• index: the thread ID that is running the lambda, used to index into data
• restrict(direct3d): tells the compiler to check that this code can execute on Direct3D hardware (aka accelerator)
• array_view variables captured and associated data copied to accelerator (on demand)
22. C++ AMP at a Glance
• restrict(direct3d, cpu)
• parallel_for_each
• class array<T,N>
• class array_view<T,N>
• class index<N>
• class extent<N>, grid<N>
• class accelerator
• class accelerator_view
• tile_static storage class
• class tiled_grid< , , >
• class tiled_index< , , >
• class tile_barrier
24. C++ AMP Parallel Debugger
• Well known Visual Studio debugging features
• Launch, Attach, Break, Stepping, Breakpoints, DataTips
• Toolwindows
• Processes, Debug Output, Modules, Disassembly, Call Stack, Memory,
Registers, Locals, Watch, Quick Watch
• New features (for both CPU and GPU)
• Parallel Stacks window, Parallel Watch window, Barrier
• New GPU-specific
• Emulator, GPU Threads window, race detection
25. Summary
• Democratization of parallel hardware programmability
• Performance for the mainstream
• High-level abstractions in C++ (not C)
• State-of-the-art Visual Studio IDE
• Hardware abstraction platform
• C++ AMP now published as open specification
• http://download.microsoft.com/download/4/0/E/40EA02D8-23A7-4BD2-AD3A-0BFFFB640F28/CppAMPLanguageAndProgrammingModel.pdf
27. Modern C++: Clean, Safe and Fast
Then:
circle* p = new circle( 42 );
vector<shape*> v = load_shapes();
for( vector<shape*>::iterator i = v.begin(); i != v.end(); ++i ) {
    if( *i && **i == *p )
        cout << **i << " is a match\n";
}
for( vector<shape*>::iterator i = v.begin(); i != v.end(); ++i ) {
    delete *i;
}
delete p;
• T*, new
• for/while/do
• not exception-safe: missing try/catch, __try/__finally
Now:
auto p = make_shared<circle>( 42 );
vector<shared_ptr<shape>> vw = load_shapes();
for_each( begin(vw), end(vw), [&]( shared_ptr<shape>& s ) {
    if( s && *s == *p )
        cout << *s << " is a match\n";
} );
• auto type deduction
• shared_ptr<T>, make_shared
• std:: algorithms, [&] lambda functions
• no need for "delete": automatic lifetime management
• exception-safe
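The "Now" column can be tried out with a small self-contained sketch. The `circle` type, its operators and the `load_shapes` helper below are hypothetical stand-ins (the slide does not define them), and the vector holds `shared_ptr<circle>` directly to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <iostream>
#include <iterator>
#include <memory>
#include <vector>

// Hypothetical stand-in for the slide's shape/circle types.
struct circle {
    int radius;
    explicit circle(int r) : radius(r) {}
};

bool operator==(const circle& a, const circle& b) { return a.radius == b.radius; }

std::ostream& operator<<(std::ostream& os, const circle& c) {
    return os << "circle(" << c.radius << ")";
}

// Hypothetical loader: returns owned objects, including one empty pointer.
std::vector<std::shared_ptr<circle>> load_shapes() {
    std::vector<std::shared_ptr<circle>> v;
    v.push_back(std::make_shared<circle>(7));
    v.push_back(std::make_shared<circle>(42));
    v.push_back(nullptr);
    v.push_back(std::make_shared<circle>(42));
    return v;
}

// Counts (and prints) the loaded circles equal to a circle of the given radius.
int count_matches(int radius) {
    auto p = std::make_shared<circle>(radius);   // no raw new
    auto shapes = load_shapes();
    int matches = 0;
    std::for_each(std::begin(shapes), std::end(shapes),
                  [&](const std::shared_ptr<circle>& s) {
                      if (s && *s == *p) {
                          std::cout << *s << " is a match\n";
                          ++matches;
                      }
                  });
    return matches;   // no delete: every circle is released automatically
}
```

Because ownership lives in `shared_ptr`, an exception thrown inside the loop would still release every object, which is the exception-safety point the slide makes.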
28. C++ 11 Language Features in Visual Studio
C++11 Core Language Features VC10 VC11
rvalue references v2.0 v2.1*
auto v1.0 v1.0
decltype v1.0 v1.1**
static_assert Yes Yes
trailing return types Yes Yes
lambdas v1.0 v1.1
nullptr Yes Yes
strongly typed enums Partial Yes
forward declared enums No Yes
standard-layout and trivial types No Yes
atomics No Yes
strong compare and exchange No Yes
bidirectional fences No Yes
data-dependency ordering No Yes
29. rvalue refs
struct Car {
    string make; // eg "Volvo"
    int when; // last-serviced – eg 201103 => March 2011
};
void workOnClone(Car c); // work on a clone of my car – not returned
void inspect(const Car& c); // inspect, but don’t alter, my car
void fix(Car& c); // fix and return my car
void replace(Car&& c); // take my car and cannibalize it – I won’t be using it again
// note that && is not a ref-to-ref (unlike **)
// enables “move semantics” and “perfect forwarding”
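A minimal sketch of how these parameter kinds behave, assuming the `Car` struct from the slide. The functions `which` and `cannibalize` are hypothetical names added for illustration; `cannibalize` plays the role of the slide's `replace(Car&&)`.

```cpp
#include <cassert>
#include <string>
#include <utility>

struct Car {
    std::string make;  // eg "Volvo"
    int when;          // last serviced, eg 201103 => March 2011
};

// Overload resolution picks the Car&& version only for rvalues.
const char* which(const Car&) { return "lvalue"; }
const char* which(Car&&)      { return "rvalue"; }

// Takes a car the caller no longer needs and steals its contents.
Car cannibalize(Car&& old) {
    Car salvaged;
    salvaged.make = std::move(old.make);  // moves the string's buffer, no copy
    salvaged.when = old.when;
    return salvaged;
}
```

A named variable is an lvalue, so the caller has to write `std::move(myCar)` to opt in to giving it away; after that, `myCar` is still valid but its contents are unspecified.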
30. auto
int n = 42;
double pi = 3.14159;
auto x = n * pi; // will infer type of x is double
for (std::map<string, vector<double>>::const_iterator iter = m.cbegin(); iter != m.cend(); ++iter)
for (auto iter = m.cbegin(); iter != m.cend(); ++iter)
const auto * p = new MyClass; // “add back” qualifiers to auto’s inferred type
const auto & r = s; // “add back” qualifiers to auto’s inferred type
auto a1 = new auto(42); // infers int*
auto * a2 = new auto(42); // beware: also infers int*
Notes: static type inference!
like C# “var”
may break old code: old auto specifies allocation within current stack frame
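The deductions on this slide can be checked at compile time. The sketch below is one way to do that; the function names `sum_all` and `check_deductions` are invented for the example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>
#include <vector>

// auto keeps the loop header short; iter is still a strongly typed const_iterator.
double sum_all(const std::map<std::string, std::vector<double>>& m) {
    double total = 0.0;
    for (auto iter = m.cbegin(); iter != m.cend(); ++iter)
        for (auto value : iter->second)
            total += value;
    return total;
}

// Each static_assert confirms a type the compiler inferred statically.
void check_deductions() {
    int n = 42;
    double pi = 3.14159;
    auto x = n * pi;                       // int * double
    static_assert(std::is_same<decltype(x), double>::value, "x is double");

    const auto& r = x;                     // qualifiers are added back by hand
    static_assert(std::is_same<decltype(r), const double&>::value, "r is const double&");

    auto  a1 = new auto(42);               // infers int*
    auto* a2 = new auto(42);               // also int*, not int**
    static_assert(std::is_same<decltype(a1), int*>::value, "a1 is int*");
    static_assert(std::is_same<decltype(a2), int*>::value, "a2 is int*");
    assert(*a1 == *a2);
    delete a1;
    delete a2;
    (void)r;
}
```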
31. decltype
decltype(new C) c = new C; // c is a C*
// Note: first “new C” is not executed
std::vector<int>::const_iterator iter1; // a long type name
decltype(iter1) iter2; // iter2 has same type as iter1
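A short sketch of both uses of decltype shown above, with the claims turned into compile-time checks. The struct `C` and the function `count_evens` are placeholders made up for this example.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

struct C { int id; };  // hypothetical class standing in for the slide's C

// The operand of decltype is never evaluated, so no C is allocated here.
static_assert(std::is_same<decltype(new C), C*>::value, "new C has type C*");

int count_evens(const std::vector<int>& v) {
    std::vector<int>::const_iterator iter1 = v.begin();  // a long type name
    decltype(iter1) iter2 = v.end();                      // same type, spelled once
    static_assert(std::is_same<decltype(iter1), decltype(iter2)>::value,
                  "iter2 has the same type as iter1");
    int evens = 0;
    for (; iter1 != iter2; ++iter1)
        if (*iter1 % 2 == 0)
            ++evens;
    return evens;
}
```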
32. static_assert
pre-processor-time:
#if VERSION < 8
#error "Need version 8 or higher"
#endif
run-time:
bool done(float g1, float g2, float tol) {
    assert (tol < 1.0e-3);
    ...
}
compile-time:
static_assert (FeetPerMile > 5200 && FeetPerMile < 6100, "FeetPerMile is wrong");
template<class T> struct S {
    static_assert(sizeof(T) < sizeof(int), "T is too big");
    static_assert(std::is_unsigned<T>::value, "S needs an unsigned type");
};
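The compile-time checks can be exercised with a complete translation unit. In this sketch the value 5280 for `FeetPerMile`, the `value` member and the `widen` helper are assumptions added so the example has something to compile and run.

```cpp
#include <cassert>
#include <type_traits>

const int FeetPerMile = 5280;  // assumed value; the slide only shows the check
static_assert(FeetPerMile > 5200 && FeetPerMile < 6100, "FeetPerMile is wrong");

template <class T>
struct S {
    // Both conditions are checked when S<T> is instantiated, not at run time.
    static_assert(sizeof(T) < sizeof(int), "T is too big");
    static_assert(std::is_unsigned<T>::value, "S needs an unsigned type");
    T value;
};

// Instantiating S<unsigned char> passes both checks.
unsigned widen(S<unsigned char> s) { return s.value; }

// S<int> bad;    // would not compile: "T is too big"
// S<char> bad2;  // would not compile where char is signed
```

The failing instantiations are left as comments because a failed static_assert stops the build, which is exactly its purpose.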
33. Trailing-Return-Type
template<class A, class B> ??? adder(A &a, B &b) { return a + b; } // no!
template<class A, class B> decltype(a + b) adder(A &a, B &b) { return a + b; } // no!
template<class A, class B> auto adder(A &a, B &b) -> decltype(a + b) { return a + b; } // yes!
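The working form can be tried as below. One deliberate change from the slide, made only so that temporaries and literals can be passed: the parameters are `const` references.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// The return type is written after the parameter list, where a and b are in scope.
template <class A, class B>
auto adder(const A& a, const B& b) -> decltype(a + b) {
    return a + b;
}

// The deduced return type follows the usual arithmetic conversions.
static_assert(std::is_same<decltype(adder(1, 2.5)), double>::value,
              "int + double yields double");
```

The second form on the slide fails because `a` and `b` are not yet declared at the point where the leading return type is parsed.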
34. lambdas – functions with no name
[ ] ( ) -> int { return 42; } ; // no arguments
[ ] (int n) -> int { return n * n; } ; // one argument
[ ] (int a, int b) -> int { return a + b; } ; // two arguments
for_each(v.begin(), v.end(), [ ] (int n) { cout << n << " "; }); // one-liner
float f1 = integrate ( golden, 0.0, 1.0 );
float f2 = integrate ( [ ] (float x ) { return x * x + x - 1; }, 0.0, 1.0 );
[ ] { cout << "hi"; } // can omit ( ) if no parameters
// can omit -> return-type if inferable
[ capture-clause] ( parameter-list ) -> return-type { body }// grammar
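The slide calls an `integrate` function without defining it. The sketch below supplies one possible definition (a midpoint rule, which is an assumption, as are the `steps` parameter and the `make_adder` helper) to show a lambda being passed as an argument and a lambda with a capture clause being returned.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Hypothetical integrate: midpoint rule over [lo, hi] with a fixed step count.
// F can be a function pointer, a functor or a lambda.
template <class F>
double integrate(F f, double lo, double hi, int steps = 1000) {
    const double h = (hi - lo) / steps;
    double sum = 0.0;
    for (int i = 0; i < steps; ++i)
        sum += f(lo + (i + 0.5) * h);
    return sum * h;
}

// Capture clause [k]: the returned lambda carries its own copy of k.
std::function<int(int)> make_adder(int k) {
    return [k](int n) -> int { return n + k; };
}
```

Called as `integrate([](double x) { return x * x + x - 1; }, 0.0, 1.0)`, this approximates the exact value of the integral, -1/6.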
35. Strongly-Typed Enums
Illegal – members must be globally unique
enum Heights {SHORT, TALL}; // ok
enum Widths {BYTE, SHORT, INT, LONG}; // clash
enum members are just integers
enum Colors {RED, GREEN, BLUE};
if (GREEN == 1) cout << "GREEN == 1"; // yes!
enum Parts {ENGINE, BRAKE, CLUTCH};
if (GREEN == BRAKE) cout << "GREEN == BRAKE"; // yes!
Use enum class
enum class Heights {SHORT, TALL};
enum class Widths {BYTE, SHORT, INT, LONG}; // eg: Widths::SHORT
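A compact sketch of what `enum class` changes in practice. The `bytes_for` function and the byte counts it returns are illustrative assumptions, not part of the slide.

```cpp
#include <cassert>

enum class Heights { SHORT, TALL };
enum class Widths  { BYTE, SHORT, INT, LONG };  // no clash: names are scoped

// Illustrative mapping only; the sizes are assumptions for the example.
int bytes_for(Widths w) {
    switch (w) {
        case Widths::BYTE:  return 1;
        case Widths::SHORT: return 2;
        case Widths::INT:   return 4;
        case Widths::LONG:  return 8;
    }
    return 0;
}

// if (Heights::SHORT == Widths::SHORT) ...  // would not compile: different types
// int n = Widths::SHORT;                    // would not compile: no implicit conversion
```

Getting at the underlying integer now takes an explicit `static_cast<int>(Heights::TALL)`.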
36. Forward-Declared Enum Classes
enum class Colors : unsigned char; // forward declaration (underlying type must match the definition)
void fun(Colors c); // use
. . .
enum class Colors : unsigned char {RED = 3, GREEN, BLUE = 7};
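A complete version of this pattern is sketched below. Note that the underlying type named in the forward declaration has to match the one in the definition, so both spell out `unsigned char`. The function `is_red` is a made-up example of code that only needs the declaration.

```cpp
#include <cassert>

enum class Colors : unsigned char;   // forward declaration: size is already known

bool is_red(Colors c);               // can be declared before the enumerators exist

enum class Colors : unsigned char { RED = 3, GREEN, BLUE = 7 };

bool is_red(Colors c) { return c == Colors::RED; }

static_assert(sizeof(Colors) == 1, "Colors is stored in one byte");
```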
37. nullptr
// the NULL hack:
int* p1 = 0; // value of 0 is ‘special’
int* p2 = 42; // illegal
void f (int n) { cout << n; };
f(0); // works
void f (int* p) { cout << p; };
f(0); // works
void f (int n) { cout << n; }
void f (int* p) { cout << p; };
f(0); // which one?
f(nullptr); // calls f(int*)
decltype(nullptr) == nullptr_t
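The overload question on this slide can be settled by running it. In this sketch the two overloads of `f` return a label instead of printing, purely so the result is easy to check.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

std::string f(int)  { return "f(int)"; }
std::string f(int*) { return "f(int*)"; }

// nullptr has its own type, distinct from every integer and pointer type.
static_assert(std::is_same<decltype(nullptr), std::nullptr_t>::value,
              "decltype(nullptr) is nullptr_t");
```

With both overloads visible, `f(0)` selects `f(int)` because the literal 0 is an int, while `f(nullptr)` selects `f(int*)`.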
41. Each proc has FIFO store buffer
• Reads read from local SB (read bypassing)
• MFENCE flushes SB
• LOCK’d instruction (eg: XCHG) acquires Lock
• Write to SB may reach memory at any time Lock is not held
Diagram: two Procs, each with its own Store buffer, sharing one Lock and Memory
http://www.cl.cam.ac.uk/~pes20/weakmemory/x86tso-paper.tphols.pdf
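The store-buffer behaviour described here is what the classic "store buffering" litmus test probes: each thread writes one flag and then reads the other. The sketch below, using C++11 `<atomic>` and `<thread>`, is an illustration under assumptions (the function names and the run count are made up). With the default sequentially consistent ordering, the outcome where both threads read 0 is forbidden; on x86 compilers typically achieve this by emitting a LOCK'd XCHG or an MFENCE for the store.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// One round of the litmus test. Returns true if both threads read 0,
// the outcome that store buffering would allow without fences.
bool both_read_zero_once() {
    std::atomic<int> x(0), y(0);
    int r1 = -1, r2 = -1;
    std::thread t1([&] { x.store(1); r1 = y.load(); });  // seq_cst by default
    std::thread t2([&] { y.store(1); r2 = x.load(); });
    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}

// Repeats the round and counts how often the forbidden outcome shows up.
int count_forbidden_outcomes(int runs) {
    int count = 0;
    for (int i = 0; i < runs; ++i)
        if (both_read_zero_once())
            ++count;
    return count;
}
```

Switching both stores and loads to `std::memory_order_relaxed` removes the guarantee, and the both-zero outcome then becomes possible on real hardware.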
42. C++ Libraries (VS)
• STL
• C++ 11 conformant
• Support for new headers in VS vNext
• <atomic>, <filesystem>, <thread> (others)
• PPL
• Parallel Algorithms
• Task-based programming model
• Agents and Messaging - express dataflow pipelines
• Concurrency-safe containers
45. ALM (Application Lifecycle Management)
• New ALM features in vNext
– Lightweight Requirements
– Agile Planning Tools
– Stakeholder Feedback
– Context Switching
– Code Review
– Exploratory Testing
• Additional new C++ features
– 2010 features updated
– Architecture Tools: Dependency Diagrams, Architecture Explorer
– Unit Testing: Native Unit Test Framework; manage and run tests in VS and Test Manager
49. Microsoft C++ 2012
Participate in Microsoft Developer Division C++ development: user research and design research
Sign up online at http://bit.ly/cppdeveloper
50. Going further
Every week, the DevCamps: ALM, Azure, Windows Phone, HTML5, OpenData
http://msdn.microsoft.com/fr-fr/devcamp
Downloads, resources and toolkits: see you on MSDN
http://msdn.microsoft.com/fr-fr/
Offers worth knowing about
• 90-day free trial of Windows Azure: www.windowsazure.fr
• Up to 35% off Visual Studio Pro with the MSDN subscription: www.visualstudio.fr
Upcoming Dev Camps sessions (all via Live Meeting)
• 10 February 2012: Open Data - Building rich applications with the Open Data protocol
• 16 February 2012: Azure series - Building social applications on the Windows Azure platform
• 17 February 2012: Understanding the canvas with Galactic and the three.js library
• 21 February 2012: Automated code production with CodeFluent Entities
• 2 March 2012: Understanding and implementing the Azure toolkit for Windows Phone 7, iOS and Android
• 6 March 2012: Nuget and ALM
• 9 March 2012: Kinect - Managing the life of your sensor properly
• 13 March 2012: Sharepoint series - Test automation
• 14 March 2012: TFS Health Check - checking the health of your development platform
• 15 March 2012: Azure series - Developing for phones, tablets and the cloud with Visual Studio 2010
• 16 March 2012: METRO design applications - Systematic teardown of a METRO javascript template
• 20 March 2012: LightSwitch lessons learned, data access optimization, Silverlight integration
• 23 March 2012: OAuth - the key to using social networks in your application
Editor's Notes
It used to be that, back in the 1990s, you did not care about performance. Things have changed drastically and performance is king again: there used to be a free lunch, but not anymore.
If you look around the market, there is a wide spectrum of devices available to consumers. At one end of the spectrum you have mobile devices, then the traditional desktops, and then everything in the cloud: data centers. In each of these scenarios power and performance are key, and the language that helps you get more performance from your hardware while keeping the simplicity of a modern language is C++.
The C language is close to the hardware and portable. C++ provides strong abstraction, strong type safety, type-safe generic code, great modeling power, and full control of code in memory. C++ is optimized for control and efficiency; its value proposition is efficient abstraction:
• Strong abstraction: type-safe OO and generic code for modeling power, without sacrificing control and efficiency.
• Full control over code and memory: you can always express what you want to do, and you can always control memory and data layout exactly.
• Pay-as-you-go efficiency: no mandatory overheads; you don't pay for what you don't use.
Intel Sandy Bridge (32 nm "Tock", released Jan 2011): AVX (256-bit); up to 8 cores, 16 threads with HT; successor to the Nehalem micro-architecture. Intel Ivy Bridge (22 nm "Tick").
The past decade has seen a huge increase in digital content, and with it a new class of applications that have to deal with huge amounts of data. The distinguishing feature of such applications is data-level parallelism: the data can be processed in any order.
Two major computing platforms are available, CPU and GPU:
• CPU: handles a wide variety of applications; performance improvements come at the cost of power.
• GPU: handles parallelism.
A vector processor, or array processor, is a central processing unit (CPU) that implements an instruction set containing instructions that operate on one-dimensional arrays of data called vectors. This is in contrast to a scalar processor, whose instructions operate on single data items. The vast majority of CPUs now include vector units.
Today, most commodity CPUs implement architectures that feature instructions for some vector processing on multiple (vectorized) data sets, typically known as SIMD (Single Instruction, Multiple Data). Vectorization, in parallel computing, is a special case of parallelization, in which software programs that, by default, perform one operation at a time on a single thread are modified to perform multiple operations simultaneously, on that same thread.
Automatic vectorization is a major research topic in computer science: it seeks methods that would allow a compiler to convert scalar programs into vectorized programs without human assistance.
Traditional application not using the vector processor.
SSE2 is now the default; it used to be x87 (floating-point stack).
4% performance gain on SPEC benchmarks.
Can cause slight differences in results in the least significant digit.
To go back to the old behavior, use /arch:IA32.
"No more free lunch" is the term used to indicate that people can no longer rely on next-generation hardware to improve software speed through higher clock speeds and more cache.
Heterogeneous computing has been around for years but usage has been restricted to fairly small niches. I’m predicting that we’re going to see abrupt and steep growth over the next couple of years. The combination of delivering results for many workloads cheaper, faster, and more power efficiently, coupled with improved programming tools, is going to vault GPGPU programming into being a much more common technique available to everyone.
C++11, formerly known as C++0x (pronounced "see plus plus oh ex"), is the name of the most recent iteration of the C++ programming language, replacing C++03, approved by the ISO on 12 August 2011. The name is derived from the tradition of naming language versions by the year of the specification's publication.
ALM, or Application Lifecycle Management, describes the whole process involved in building, shipping and maintaining software.
So, as a Developer, you might start the day by checking out an existing source file from the version-control system. You then edit the file, write some unit tests, and get it working with the help of the debugger to identify and fix problems. You then submit the change to an automated build system that also runs a batch of regression tests, to make sure that this fix did not break any existing functionality.
In the afternoon, you may take part in a bug triage: examining all of the bugs reported by extensive tests, by customers running already-shipped versions of the software, or by customers running an early beta of the new software. You review the bugs, mark their relative priorities, and assign them to testers for further diagnosis.
Project managers run off custom reports to monitor the status of tasks (that is, work that has to be done as part of developing the product) and reported bugs. They worry about how the project is progressing against where it should be according to plan. They worry about the rates at which bugs are coming in, being resolved, and being closed. They look at graphs of bug "burn-down" rates, slippages, and surprises, such as new tasks added to the project.
Back when software development was more informal, a small project team did all of these activities. But plans might be written on paper and thrown onto a shelf. Bugs might be recorded in a list, in an ASCII text file. New features might be kept in an Excel spreadsheet, updated manually as new bugs were found. A weekly report meant gathering all of the information together from these different sources and manually constructing a custom document, which the management of the project pored over around a big table.
The whole affair was relaxed, but it worked, because the team was small and, frankly, the volume of code produced was small.
But software development today is a much more serious (and stressful) undertaking. Teams are large: it's common to have projects with 100 or more Developers working on the same set of source files, half a million lines of C++, for example. And there's pressure to deliver high-quality software fast, to meet deadlines required in a competitive market, in order that the company survives and grows.
So the old tools are inefficient. ALM seeks to solve these problems with a solution that centers around TFS, "Team Foundation Server". This provides a large set of tools that all work smoothly together and solve the problems listed in the orange arc:
• Requirements management: managing the requirements that the project must meet.
• Project management: the day-to-day, or hour-to-hour, monitoring and control of the project's progress.
• Version control: the system that provides a secure store for the project source files and documents. We update it using database transactions (all-or-nothing updates). It is backed up every night, or according to whatever policy you choose. We can see how any source file evolved, should we ever need to track down a rogue checkin and back it out. We can create new "branches" to allow development along parallel lines. It discovers, and in most cases resolves, conflicts where two or more Devs change the same file over the course of the same few days.
• Test case management: the 100s or 1000s of tests that the software must pass to be considered "good". These tests cover unit, functional, performance and stress testing.
• Build automation: performing full builds of the entire project with different standard options: incremental or full; debug or optimized (what Visual Studio calls "release").
• Reporting: a vast range of reports, both routine and "exception". Examples would be progress by Devs (they update how many days, or hours, they have worked on each task and whether the time left is still the same as they estimated); bug rates for incoming, resolved and fixed; and "burn-down" rates that predict the date the project will be ready to ship. Exception reports include cases such as performance test number 123 running more than 5% slower than it did on the last check, with a bug automatically assigned to the owner of that component. The list goes on and on. And these are just the standard reports.
What I've described so far is all rather abstract. To give a concrete example, let me describe how it works for us, the 3,500 folks working in Microsoft's "Developer Division". We all use TFS every day: developers, testers, program managers, documentation teams. In particular, in my group we design and build the C++ compiler, which, in turn, is used to build some enormous products inside Microsoft (Windows, SQL Server, Office and so on), as well as millions of applications around the world, written by IT departments within big corporations and by ISVs building their own products and services for sale.