1/ The new Amazon EC2 P3dn instance
2/ With four times the networking bandwidth and twice the GPU memory of the largest P3 instance, P3dn is ideal for large-scale distributed training. No one else has anything close.
3/ P3dn.24xlarge instances offer 96 vCPUs of Intel Skylake processors to reduce the time needed to preprocess data for machine learning training.
4/ The enhanced networking of the P3dn instance allows GPUs to be used more efficiently in multi-node configurations, so training jobs complete faster.
5/ Finally, the extra GPU memory allows developers to easily handle more advanced machine learning models, such as holding and processing multiple batches of 4K images for image classification and object detection systems.
1/ In order to take advantage of all their data to train more accurate, more sophisticated models, or to train their existing models more quickly, customers usually need to scale training across multiple GPUs, not just within a single instance, but across multiple instances.
2/ This can often involve hundreds of GPUs.
3/ Unfortunately, the inner workings of TensorFlow make scaling across GPUs on different instances very inefficient.
4/ For example, at 256 GPUs, TensorFlow is only able to use 65% of the total capacity; that’s incredibly wasteful and expensive. [NOTE: a neural network is made up of hundreds of thousands of weighted connections; these weights are adjusted over and over again, potentially billions of times, during training. When training is distributed across GPUs, you also need to share and update these weights efficiently with all of the GPUs. That's relatively easy on one instance, where you can just store them in memory, but super hard when you get to tens or hundreds of GPUs. Hard problem.]
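[NOTE: a minimal sketch of the weight-sharing problem, using plain NumPy as a stand-in for data-parallel training; this is an illustration only, not TensorFlow's actual internals, and the worker count, gradient function, and learning rate are made up.]

import numpy as np

# Illustration only: data-parallel SGD where each "worker" stands in for one GPU
# holding a full copy of the model weights.
num_workers = 4
num_weights = 10
lr = 0.1

weights = np.zeros(num_weights)
rng = np.random.default_rng(0)

def local_gradient(w, batch):
    # Stand-in for a backward pass on one worker's own mini-batch.
    return 2 * (w - batch.mean(axis=0))

for step in range(100):
    batches = rng.normal(size=(num_workers, 32, num_weights))
    grads = [local_gradient(weights, b) for b in batches]

    # The expensive part at scale: every worker must end up with the same
    # averaged gradient. On one instance this is a memory copy; across
    # hundreds of GPUs it becomes a network-bound all-reduce.
    avg_grad = np.mean(grads, axis=0)
    weights -= lr * avg_grad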
TRANSITION: Customers told us this was becoming a major problem when working with TF, so we…
1/ Today I’m pleased to announce that we have been able to overcome this limitation, significantly improving TensorFlow's scaling efficiency to 90% across 256 GPUs.
2/ That's close to linear scalability across hundreds of GPUs.
3/ We did this by improving the way TensorFlow shares model parameters across multiple instances, making that sharing faster and more efficient.
4/ We got a further improvement from the 100 Gbps networking available on the new P3dn instances, but the majority of the benefit came from the internal architectural changes we made to the framework itself (nine tenths of the benefit came from the TF improvements).
5/ This means that customers can use more GPUs to train on more data, in less time.
TRANSITION: to give you an idea of the impact of this……
1/ Let’s take a look at a common computer vision model for image classification, a deep neural network called ResNet-50 <rez net fifty>, trained on hundreds of thousands of images.
2/ The fastest time to train this model, by a team in Mountain View, was 30 minutes, using a specially built training algorithm which was optimized just for this single neural network, and for specialized hardware which is only available in beta (and not available to most developers). These improvements are locked away from most models, and out of reach of the vast majority of developers.
3/ With the improvements we made in TF, we reduced training time by over 50%, to just 14 minutes. This is the fastest time for training ResNet using TensorFlow, anywhere.
4/ But even more importantly, our optimizations can be applied to multiple different models, including convolutional neural networks (images) and recurrent neural networks (language, recommendation).
5/ It also runs on P3 instances, which are globally available to all developers in 14 regions.
6/ Available in Amazon SageMaker and the AWS Deep Learning AMIs.
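[NOTE: for reference, a sketch of what kicking off a multi-instance training job might look like with the SageMaker Python SDK; the script name, role ARN, S3 path, and version strings are placeholders, not part of the announcement.]

from sagemaker.tensorflow import TensorFlow

# Hypothetical example: scale a TensorFlow training script across several P3dn instances.
estimator = TensorFlow(
    entry_point="train.py",                                # your training script
    role="arn:aws:iam::123456789012:role/SageMakerRole",   # placeholder role
    instance_count=8,                                      # scale out across instances
    instance_type="ml.p3dn.24xlarge",
    framework_version="2.11",
    py_version="py39",
)
estimator.fit({"training": "s3://my-bucket/imagenet/train"})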
1/ Successful models are built on high-quality training data, and collecting and labeling the training dataset at the start of this workflow still involves a lot of time and effort.
2/ For example, building a computer vision system that is reliable enough to identify objects - such as traffic lights, stop signs, or pedestrians - requires thousands of hours of video recordings, consisting of hundreds of millions of video frames.
1/ And each one of these frames must be labeled to build a dataset that can be used for training.
2/ This means human labelers first need to evaluate each frame and label objects, such as traffic signals, pedestrians, other vehicles, and even the road, so that the model can learn to identify these objects on its own.
3/ Today, customers distribute the labeling tasks across as many as thousands of human labelers, adding significant overhead and cost because of the sheer scale and complexity of managing so many people, and even then the process takes months.
4/ Further, if the labelers incorrectly label objects, the system will learn from the bad information and make inaccurate predictions, leading to real-world consequences, such as a car failing to detect a stop sign.
5/ Customers try to filter out errors with audits and redundant reviews, but this increases the time and cost required.
TRANSITION: Increasingly, the time, expense, and complexity required for accurate labeling of large datasets have become so prohibitive that customers abandon their efforts to train new types of models that solve sophisticated problems.
1/ It all starts with data - raw data, which, as yet, does not have the labels needed; it’s just raw text, or raw images, or speech, without details of what is inside the text, what items are inside the images, or what words and context the speech contains.
1/ To get started, Ground Truth selects a small, diverse sample of the data and sends it to humans for annotation;
2/ It can do this through Mechanical Turk, or your own workforce, or a crowdsourcing company.
3/ Ground Truth collects the human-labeled data and builds a special, custom machine learning model. It then starts to run the rest of the raw data through that model.
1/ Where the model has high confidence in the results, based on what it has learned so far, it will apply the annotations to the training data automatically.
1/ Where the model is less confident in the results, it will pass the new data to human annotators to provide labels.
1/ These annotations are validated across multiple human annotators, and then contribute to the training dataset.
2/ Additionally, those new labels are passed back to the custom model to improve automated labels.
3/ This means that over time, Ground Truth can label more data automatically - only a small percentage of new data will need to be sent to human annotators; the rest can be done automatically.
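[NOTE: a conceptual sketch of that loop in Python; this is an illustration of the idea, not Ground Truth's actual algorithm, and the model, confidence threshold, and the "ask_humans" step are all stand-ins.]

import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic unlabeled pool plus a pretend human-labeling oracle, for illustration.
rng = np.random.default_rng(0)
pool = rng.normal(size=(5000, 20))
true_labels = (pool[:, 0] + pool[:, 1] > 0).astype(int)    # hidden ground truth

def ask_humans(indices):
    return true_labels[indices]                            # pretend humans label these items

seed = rng.choice(len(pool), size=100, replace=False)      # small, diverse seed set
labels = {int(i): int(true_labels[i]) for i in seed}       # human-labeled first
CONFIDENCE = 0.95

while len(labels) < len(pool):
    X = pool[list(labels)]
    y = np.array(list(labels.values()))
    model = LogisticRegression(max_iter=1000).fit(X, y)    # the custom model

    unlabeled = [i for i in range(len(pool)) if i not in labels]
    probs = model.predict_proba(pool[unlabeled]).max(axis=1)

    confident = [i for i, p in zip(unlabeled, probs) if p >= CONFIDENCE]
    uncertain = [i for i, p in zip(unlabeled, probs) if p < CONFIDENCE][:200]

    for i in confident:                                    # auto-label high-confidence items
        labels[i] = int(model.predict(pool[[i]])[0])
    for i, y_h in zip(uncertain, ask_humans(uncertain)):   # humans label the rest
        labels[i] = int(y_h)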
1/ Amazon Mechanical Turk, to access a crowdsourced workforce of over 500,000 workers;
2/ A private workforce of your own employees, for data which requires confidentiality, service guarantees, or the special skills of pre-authorized workers;
3/ or specific third-party vendors (such as iMerit, Vivitec, Cogito Tech, CapeStart and iVision), which offer a range of prices and geographic availability.
This lowers the overall cost of creating new datasets and of keeping datasets up to date with newly generated data, and provides a better return on the cost of human annotation, since those annotations automatically contribute to the overall efficiency of the system.
We think Ground Truth will significantly change the economics of generating training data so that more data becomes available for machine learning. Really exciting.
1/ Once deployed in production, SM manages the compute infrastructure on your behalf
2/ Not only does it handle auto scaling, it performs health checks, handles node failures under the covers, applies security patches, and performs other routine maintenance (see the auto scaling sketch after this list)
3/ ALL with CloudWatch monitoring and logging
4/ FINALLY, one other really cool thing with SM is that it’s built in a MODULAR way (you can build and train here and deploy elsewhere, like the edge, OR host models that were trained elsewhere)… Your choice
5/ We couldn’t be more excited about SM and think it’s a huge playing field leveler for everyday developers and scientists
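[NOTE: to make the auto scaling point above concrete, a minimal sketch using boto3 and the Application Auto Scaling API; the endpoint and variant names and the target value are hypothetical.]

import boto3

# Hypothetical endpoint/variant: SageMaker endpoints register with Application Auto Scaling.
autoscaling = boto3.client("application-autoscaling")
resource_id = "endpoint/my-endpoint/variant/AllTraffic"

autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

autoscaling.put_scaling_policy(
    PolicyName="invocations-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 1000.0,   # target invocations per instance
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
    },
)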
1/ From today, developers can access over a hundred algorithms and models, covering a remarkable breadth of capabilities, from our Marketplace sellers. This is just a subset, but it gives you a sense of the capabilities which are now available; from speaker identification and speech recognition, to video classification and handwriting recognition. As a developer, these are all just a few clicks away.
2/ You can just browse or search the AWS Marketplace as normal, select the algorithm or model you want to use,
3/ And subscribe in a single click.
4/ The new algorithm or model is then available in the Amazon SageMaker console, and you can train models using the algorithm, or start running predictions on the pre-trained models immediately.
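[NOTE: a sketch of what that flow might look like from the SageMaker Python SDK once you have subscribed; the ARNs, role, and data paths are placeholders.]

from sagemaker import ModelPackage
from sagemaker.algorithm import AlgorithmEstimator

# Hypothetical ARNs: train with an algorithm subscribed to in AWS Marketplace...
algo = AlgorithmEstimator(
    algorithm_arn="arn:aws:sagemaker:us-east-1:123456789012:algorithm/example-algo",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    instance_count=1,
    instance_type="ml.c5.xlarge",
)
algo.fit({"training": "s3://my-bucket/training-data"})

# ...or deploy a pre-trained Marketplace model package and start predicting right away.
model = ModelPackage(
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    model_package_arn="arn:aws:sagemaker:us-east-1:123456789012:model-package/example-model",
)
predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")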
These performance and accuracy trade-offs are felt most acutely at the edge.
1/ IoT applications are usually running on devices, out there in the real world. This means that the accuracy of models is felt quickly, and immediately. Consumer IoT applications have a high expectation of accuracy - such as Alexa detecting the wake word reliably - the accuracy of that model really matters to the overall experience. In industrial IoT, devices are often responsible for monitoring and maintaining core manufacturing processes, or for safety. The accuracy of a model here is critical.
2/ Applications running on IoT devices at the edge are commonly very sensitive to latency; it’s part of the reason why customers are running the workload there in the first place, because they can’t afford the round trip to the cloud and back. So any increase in that latency can have a meaningful impact on the success of the device itself.
3/ IoT applications are often incredibly resource constrained, in a way which is much more acute than in the cloud. The devices are smaller, and have less memory and processing power, which is a real problem for machine learning models.
4/ In many cases, IoT applications need to run on very diverse hardware platforms, with a dizzying myriad of processor architectures. To get any sort of performance, developers have to optimize their models for each specific platform by hand.
5/ Finally, one of the key benefits of machine learning can get lost; the ability to continually improve the model. IoT applications are great data generators, and once that data is “ground-truthed”, it can be used to build more sophisticated models. However, if the effort to optimize those improved models for the constraints and diverse hardware at the edge is high, then it’s less likely to happen, and developers are leaving money on the table. A real missed opportunity.
6/ We don’t think that customers should have to choose between accuracy and performance. It’s a false choice, with a high cost.
So I’m excited to announce a new feature of SageMaker…
1/ There are a lot of demands placed on organizations when dealing with documents. What they typically want to be able to do sounds straightforward…
2/ They want to be able to identify documents in any format;
3/ and then extract text from those documents, accurately.
4/ But there are a whole ton of challenges which make this difficult, such as the variety of forms and formats, and the varying quality of the documents themselves.
5/ The way customers try to overcome this complexity today is either by manual review (which is accurate, but time consuming and expensive), or
6/ with simple OCR and/or..
7/ template-based data extraction (which is fast, but tends not to be accurate enough, so they end up sending the documents to manual review or verification anyway).
TRANSITION: we think there is a better way, and that instead of manual reviews, simplistic OCR, and templates, we can replace that heavy lifting with smart, cheap, powerful machine learning…