Presenting at the Microsoft Devs HK Meetup on 13 June, 2018
Code for presentation: https://github.com/sadukie/IntroToPyForCSharpDevs
Azure Notebook for presentation:
https://notebooks.azure.com/cletechconsulting/libraries/introtopyforcsharpdevs
The Polyglot Data Scientist - Exploring R, Python, and SQL Server - Sarah Dutkiewicz
This document provides an overview of a presentation on being a polyglot data scientist using multiple languages and tools. It discusses using SQL, R, and Python together in data science work. The presentation covers the challenges of being a polyglot, how SQL Server with R or Python can help solve problems more easily, and examples of analyzing sensor data with these tools. It also discusses resources for learning more about R, Python, and machine learning services in SQL Server.
This document provides an introduction to testing and test-driven development. It discusses what testing is, different types of tests like unit tests and integration tests, test-driven development principles like red-green-refactor, and tools that can be used for test-driven development. Resources for learning more about testing, behavior-driven development, and coding katas are also presented.
I work in a Data Innovation Lab with a horde of Data Scientists. Data Scientists gather data, clean data, apply Machine Learning algorithms and produce results, all of that with specialized tools (Dataiku, Scikit-Learn, R...). These processes run on a single machine, on data that is fixed in time, and they have no constraint on execution speed.
With my fellow Developers, our goal is to bring these processes to production. Our constraints are very different: we want the code to be versioned, to be tested, to be deployed automatically and to produce logs. We also need it to run in production on distributed architectures (Spark, Hadoop), with fixed versions of languages and frameworks (Scala...), and with data that changes every day.
In this talk, I will explain how we, Developers, work hand-in-hand with Data Scientists to shorten the path to running data workflows in production.
Productionizing Spark and the REST Job Server - Evan Chan - Spark Summit
The document discusses productionizing Apache Spark and using the Spark REST Job Server. It provides an overview of Spark deployment options like YARN, Mesos, and Spark Standalone mode. It also covers Spark configuration topics like jars management, classpath configuration, and tuning garbage collection. The document then discusses running Spark applications in a cluster using tools like spark-submit and the Spark Job Server. It highlights features of the Spark Job Server like enabling low-latency Spark queries and sharing cached RDDs across jobs. Finally, it provides examples of using the Spark Job Server in production environments.
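To make the configuration topics concrete, here is a minimal PySpark sketch of the kind of settings discussed; the jar path and GC flag are illustrative assumptions, not values from the talk:

```python
from pyspark.sql import SparkSession

# Illustrative only: extra jars on the classpath and G1 garbage collection.
spark = (
    SparkSession.builder
    .appName("job-server-demo")
    .config("spark.jars", "/opt/jars/my-udfs.jar")              # jar management
    .config("spark.executor.extraJavaOptions", "-XX:+UseG1GC")  # GC tuning
    .getOrCreate()
)
print(spark.sparkContext.getConf().get("spark.executor.extraJavaOptions"))
```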
Given at Data Day Texas 2016.
Apache Spark has been hailed as a trail-blazing new tool for doing distributed data science. However, since it's so new, it can be difficult to set up and hard to use. In this talk, I'll discuss the journey I've had using Spark for data science at Bitly over the past year. I'll talk about the benefits of using Spark, the challenges I've had to overcome, the caveats for using a cutting-edge technology such as this, and my hopes for the Spark project as a whole.
From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets... - Databricks
The landscape of security threats an enterprise faces is vast. It is imperative for an organization to know when one of the machines within the network has been compromised. One layer of detection can take advantage of the DNS requests made by machines within the network. A request to a Command & Control (CNC) domain can act as an indication of compromise. It is thus advisable to find these domains before they come into play. The team at Akamai aims to do just that.
In this session, Aminov will share Akamai’s experience in porting their PoC detection algorithms, written in Python, to a reliable production-level implementation using Scala and Apache Spark. He will specifically cover their experience regarding an algorithm they developed to detect botnet domains based on passive DNS data. The session will also include some useful insights Akamai has learned while handing out solutions from research to development, including the transition from small-scale to large-scale data consumption, model export/import using PMML and sampling techniques. This information is valuable for researchers and developers alike.
Distributed End-to-End Drug Similarity Analytics and Visualization Workflow w... - Databricks
The majority of a data scientist’s time is spent cleaning and organizing data before insights can be derived. Frequently, with large datasets, a lack of integration with visualization tools makes it hard to know what’s most interesting in the data and also creates challenges for validating numerical insights from models. Given the vast number of tools available in the ecosystem, it is hard to experiment with different tools to pick the most suitable one, especially given the complexity involved in integrating them with one’s solution.
The speakers will present an easy to use workflow that solves this integration challenge by combining various open source libraries, databases (e.g. Hive, Postgres, MySQL, HBase etc.) and visualization with distributed analytics. Intel developed a highly scalable library built over Apache Spark with novel graph, statistical and machine learning algorithms that also enhances the user experience of Apache Spark via easier to use APIs.
This session will showcase how to address the above mentioned issues for a drug similarity use case. We’ll go from ETL operations on raw drug data to deriving relevant features from the drug’s chemical structure using statistical and graph algorithms, using techniques to identify best model and parameters for this data to derive insights, and then demonstrating the ease of connectivity to different databases and visualization tools.
Apache Spark MLlib's Past Trajectory and New Directions with Joseph Bradley - Databricks
- MLlib has rapidly developed over the past 5 years, growing from a few algorithms to over 50 algorithms and featurizers for classification, regression, clustering, recommendation, and more.
- This growth has shifted from just adding algorithms to improving algorithms, infrastructure, and integrating ML workflows with Spark's broader capabilities like SQL, DataFrames, and streaming.
- Going forward, areas of focus include continued scalability improvements, enhancing core algorithms, extensible APIs, and making MLlib a more comprehensive standard library.
Debugging Apache Spark - Scala & Python super happy fun times 2017 - Holden Karau
Apache Spark is one of the most popular big data projects, offering greatly improved performance over traditional MapReduce models. Much of Apache Spark’s power comes from lazy evaluation along with intelligent pipelining, which can make debugging more challenging. Holden Karau and Joey Echeverria explore how to debug Apache Spark applications, the different options for logging in Spark’s variety of supported languages, and some common errors and how to detect them.
Spark’s own internal logging can often be quite verbose. Holden and Joey demonstrate how to effectively search logs from Apache Spark to spot common problems and discuss options for logging from within your program itself. Spark’s accumulators have gotten a bad rap because of how they interact in the event of cache misses or partial recomputes, but Holden and Joey look at how to effectively use Spark’s current accumulators for debugging before gazing into the future to see the data property type accumulators that may be coming to Spark in future versions. And in addition to reading logs and instrumenting your program with accumulators, Spark’s UI can be of great help for quickly detecting certain types of problems. Holden and Joey cover how to quickly use the UI to figure out if certain types of issues are occurring in our job.
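As a hedged sketch of the accumulator-for-debugging pattern described here (not code from the talk), counting malformed records in PySpark might look like this; note the caveat above that recomputation can inflate the count:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("debug-accumulators").getOrCreate()
sc = spark.sparkContext

bad_records = sc.accumulator(0)  # incremented on executors, read on the driver

def parse(line):
    try:
        return [int(line)]
    except ValueError:
        bad_records.add(1)  # flag a malformed record without failing the job
        return []

good = sc.parallelize(["1", "2", "oops", "4"]).flatMap(parse).count()
print(good, bad_records.value)  # 3 good records, 1 bad (absent recomputation)
```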
SPARQL is proposed as a SQL-like query language for NoSQL databases that is schema-free, accommodates different data models, uses a familiar SQL-syntax, and is an accepted standard. SPARQL represents data as RDF graphs and can run unmodified on different NoSQL systems that support it, providing interoperability and lowering the barrier to querying diverse data sources. It aims to address how organizations can process, manage and analyze growing volumes of data with fewer resources by offering a common language for querying multiple NoSQL databases.
The document describes a product called Scrazzl that analyzes scientific articles to extract key information and entities. It highlights important parts of articles and provides supplementary information from its own repository. The repository collects extracted data from articles that is cross-referenced and linked. The product also includes analytics on brands, phrases, products and locations. It distributes the exposed data through feeds to gain traffic. The technical architecture uses tools like Apache Solr for indexing and analyzing documents, MongoDB for analytics data, and distributed systems for scaling.
This document provides an introduction and agenda for a presentation on Spark. It discusses how Spark is a fast engine for large-scale data processing and how it improves on MapReduce. Spark stores data in memory across clusters to allow for faster iterative computations versus writing to disk with MapReduce. The presentation will demonstrate Spark concepts through word count and log analysis examples and provide an overview of Spark's Resilient Distributed Datasets (RDDs) and directed acyclic graph (DAG) execution model.
Big Data Processing with Apache Spark 2014 - mahchiev
This document provides an overview of Apache Spark, a framework for large-scale data processing. It discusses what big data is, the history and advantages of Spark, and Spark's execution model. Key concepts explained include Resilient Distributed Datasets (RDDs), transformations, actions, and MapReduce algorithms like word count. Examples are provided to illustrate Spark's use of RDDs and how it can improve on Hadoop MapReduce.
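For readers new to these concepts, a canonical PySpark word count shows the transformation/action split the summary refers to (a generic illustration, not code from the deck):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("word-count").getOrCreate()
sc = spark.sparkContext

lines = sc.parallelize(["to be or not to be"])
counts = (
    lines.flatMap(lambda line: line.split())  # transformation: lazy
         .map(lambda word: (word, 1))         # transformation: lazy
         .reduceByKey(lambda a, b: a + b)     # transformation: lazy
)
print(counts.collect())  # action: triggers execution; output order may vary
```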
Scala and Spark are Ideal for Big Data - Data Science Pop-up Seattle - Domino Data Lab
Scala and Spark are each great tools for data processing and they work well together. They can process data via small simple interactive queries as well as in very large highly-available and scalable production systems. They provide an integrated framework for an ever growing wide range of data processing capabilities. We examine the reasons for this and also look a couple of simple data processing examples written in Scala. Presented by John Nestor, Sr Architect at 47 Degrees.
Big data analysing genomics and the bdg project - sree navya
This document discusses a project analyzing genomics using big data techniques. It introduces Spark, a framework for large-scale data analysis, and ADAM, a Spark-based framework for genomic data. Key points covered include an overview of Spark and Hadoop, data types and formats in genomics, and using ADAM and Spark to perform genomic analysis and queries on large datasets like the 1000 Genomes Project.
AnalyticsConf2016 - Advanced Analytics on the Azure HDInsight Platform - Łukasz Grala
A session on Microsoft's Big Data Analytics solution: Hortonworks (Hadoop, HBase, Storm, Spark) together with the high-performance R Server, plus advanced analytics using RevoScaleR.
Whirlpools in the Stream with Jayesh Lalwani - Databricks
This document summarizes some challenges and solutions related to structured streaming in Spark. It discusses issues with joining streaming and batch data due to lack of pushdown predicates. It also covers problems with caching batch dataframes, lack of a JDBC sink in streaming mode initially, issues with checkpoints being inconsistent, and limitations on aggregating aggregated dataframes. Solutions proposed include caching data outside Spark, looking up batch data in map/flatmap, direct database writes, using NFS for checkpoints, and custom aggregations without Spark SQL.
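One of the workarounds listed, joining a stream against a batch table cached outside the streaming query, might look like this minimal PySpark sketch (the path and the toy rate source are assumptions):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("stream-static-join").getOrCreate()

# Static batch side, read and cached once; predicates are not pushed down
# from the stream into this table, which is the limitation described above.
dims = spark.read.parquet("/data/dims").cache()  # hypothetical path

events = (
    spark.readStream.format("rate").option("rowsPerSecond", 10).load()
    .withColumn("dim_id", F.col("value") % 100)  # toy key for the join
)

enriched = events.join(dims, on="dim_id", how="left")
query = enriched.writeStream.format("console").outputMode("append").start()
```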
Fulfilling Apache Arrow's Promises: Pandas on JVM memory without a copy - Uwe Korn
This document discusses how Apache Arrow enables sharing data between Python and Java without copying. It summarizes Arrow's capabilities for efficient in-memory columnar data and its ability to exchange data between different programming languages. The document then outlines how Arrow, through its Java and Python libraries, allows querying data in Java from Python without copying, by passing memory addresses between the two environments. This enables faster data science workflows that involve both Python and Java/Scala.
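The zero-copy idea on the Python side can be seen with pyarrow alone; this small sketch (an illustration, not the talk's Java/Python bridge) asks the pandas conversion to fail if any copy would be required:

```python
import pyarrow as pa

arr = pa.array([1.0, 2.0, 3.0])
# zero_copy_only=True raises if a copy would be needed; for this
# null-free float array the resulting pandas Series shares the buffer.
series = arr.to_pandas(zero_copy_only=True)
print(series)
```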
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue - Databricks
This presentation introduces Tune and Fugue, frameworks for intuitive and scalable hyperparameter optimization (HPO). Tune supports both non-iterative and iterative HPO problems. For non-iterative problems, Tune supports grid search, random search, and Bayesian optimization. For iterative problems, Tune generalizes algorithms like Hyperband and Asynchronous Successive Halving. Tune allows tuning models both locally and in a distributed manner without code changes. The presentation demonstrates Tune's capabilities through examples tuning Scikit-Learn and Keras models. The goal of Tune and Fugue is to make HPO development easy, testable, and scalable.
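The Tune/Fugue APIs are not reproduced in this summary, so as a plain stand-in, here is what the non-iterative random-search style of HPO looks like with scikit-learn's RandomizedSearchCV; this is a generic illustration, not the Tune interface:

```python
from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

X, y = load_digits(return_X_y=True)

search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions={"C": loguniform(1e-3, 1e2)},  # search space
    n_iter=20,      # number of sampled configurations
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```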
Apache Arrow: Cross-language Development Platform for In-memory Data - Wes McKinney
Apache Arrow is an open standard for in-memory columnar data and an analytical data processing platform. It aims to simplify system architectures, improve interoperability between systems, and enable data and algorithms to be reused across different programming languages. Arrow provides a portable in-memory data format and computational libraries to build analytical data processing systems. It is language-independent and supports data sharing and algorithm reuse between libraries and processes via shared memory with near-zero overhead.
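A small pyarrow sketch of the language-independent format: data written to the Arrow IPC stream format can be mapped by any Arrow implementation (Java, C++, Rust, ...) without copying; the round trip below stays in one process purely for illustration:

```python
import pyarrow as pa

table = pa.table({"x": [1, 2, 3], "y": ["a", "b", "c"]})

sink = pa.BufferOutputStream()
with pa.ipc.new_stream(sink, table.schema) as writer:
    writer.write_table(table)      # columnar bytes in the IPC stream format
buf = sink.getvalue()

roundtrip = pa.ipc.open_stream(buf).read_all()
print(roundtrip.equals(table))     # True
```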
Streaming Trend Discovery: Real-Time Discovery in a Sea of Events with Scott ... - Databricks
Time is the one thing we can never get in front of. It is rooted in everything, and “timeliness” is now more important than ever especially as we see businesses automate more and more of their processes. This presentation will scratch the surface of streaming discovery with a deeper dive into the telecommunications space where it is normal to receive billions of events a day from globally distributed sub-systems and where key decisions “must” be automated.
We’ll start out with a quick primer on telecommunications, an overview of the key components of our architecture, and make a case for the importance of “ringing”. We will then walk through a simplified solution for doing windowed histogram analysis and labeling of data in flight using Spark Structured Streaming and mapGroupsWithState. I will walk through some suggestions for scaling up to billions of events, managing memory when using the spark StateStore as well as how to avoid pitfalls with the serialized data stored there.
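mapGroupsWithState is a Scala/Java API, so as a loose PySpark stand-in for the windowed bucketing idea (assumptions: the toy rate source and the one-minute window), a windowed group count looks like this:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("windowed-buckets").getOrCreate()

events = (
    spark.readStream.format("rate").option("rowsPerSecond", 100).load()
    .withColumn("bucket", F.col("value") % 10)  # stand-in histogram bucket
)

histogram = events.groupBy(
    F.window("timestamp", "1 minute"), "bucket"
).count()

query = (
    histogram.writeStream
    .outputMode("complete").format("console").start()
)
```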
What you’ll learn:
1. How to use the new features of Spark 2.2.0 (mapGroupsWithState / StateStore)
2. How to bucket and analyze data in the streaming world
3. How to avoid common serialization mistakes (e.g. how to upgrade application code and retain stored state)
4. More about the telecommunications space than you’ll probably want to know!
5. Learn a new approach to building applications for enterprise and production.
Assumptions:
1. You know Scala – or want to know more about it.
2. You have deployed Spark to production at your company, or want to.
3. You want to learn some neat tricks that may save you tons of time!
Take Aways:
1. A fully functioning Spark app – with unit tests!
Skutil - H2O meets Sklearn - Taylor Smith - Sri Ambati
Skutil brings the best of both worlds to H2O and sklearn, delivering an easy transition into the world of distributed computing that H2O offers, while providing the same, familiar interface that sklearn users have come to know and love.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
This document provides an introduction and overview of the Python programming language. It covers what Python is, why it is used, how to use it, basic data types and containers, control flow, scientific Python packages like NumPy and Pandas, and examples of data visualization and machine learning. The document is intended to give attendees an overview of the Python ecosystem and how it can be applied to scientific and data analysis tasks.
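A few lines cover most of the ground that overview describes: containers, control flow, and the NumPy/Pandas entry points (a generic illustration):

```python
import numpy as np
import pandas as pd

temps = [21.5, 23.0, 19.8, 25.1]        # list container
warm = [t for t in temps if t > 22]     # control flow via a comprehension

arr = np.array(temps)                   # NumPy array
s = pd.Series(temps, name="celsius")    # Pandas Series
print(warm, arr.mean(), s.max(), sep="\n")
```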
This document provides an introduction to Jupyter Notebook and Azure Machine Learning Studio. It discusses popular programming languages like Python, R, and Julia that can be used with these tools. It also summarizes key features of Jupyter Notebook like code cells, kernels, and cloud deployment. Demo code examples are shown for integrating Python and R with Azure ML to fetch and transform data.
This document provides an introduction to Jupyter Notebook and Azure Machine Learning Studio. It discusses popular programming languages like Python, R, and Julia that can be used with these tools. It also summarizes key features of Jupyter Notebook like code cells, kernels, and cloud deployment. Examples are given of using Python and R with Azure ML to fetch and transform data in Jupyter notebooks.
This document discusses the concept of orthogonality in software design. It defines orthogonality as making features that minimally depend on each other, such as code, methods, classes, libraries, and more. The benefits of orthogonality include easier maintenance, reading, and reuse as changes to one part do not affect others. Techniques for achieving orthogonality include designing components as reusable "Lego bricks", minimizing state, favoring immutability, and separating concerns through clear APIs. The document provides examples of applying these principles in case studies of image processing, analytics and OCR libraries.
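A tiny Python sketch of the "Lego bricks" idea: each function below is stateless, depends only on its inputs, and can be tested or swapped independently (my illustration, not the document's case studies):

```python
def parse(line: str) -> dict:
    """Parsing knows nothing about merging or rendering."""
    key, value = line.split("=", 1)
    return {key.strip(): value.strip()}

def merge(records: list) -> dict:
    """Merging takes plain dicts; it never sees raw lines."""
    out = {}
    for record in records:
        out.update(record)
    return out

def render(config: dict) -> str:
    """Rendering depends only on the merged dict."""
    return "\n".join(f"{k}: {v}" for k, v in sorted(config.items()))

print(render(merge([parse("b = 2"), parse("a = 1")])))
```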
This document summarizes Peter Wang's keynote speech at PyData Texas 2015. It begins by looking back at the history and growth of PyData conferences over the past 3 years. It then discusses some of the main data science challenges companies currently face. The rest of the speech focuses on the role of Python in data science, how the technology landscape has evolved, and PyData's mission to empower scientists to explore, analyze, and share their data.
This document provides information about a Python for Data Science course held from July 15-19, 2019. The course aims to teach participants how to access and combine diverse datasets, conduct data exploration and visualization, and utilize Python libraries for geospatial data and machine learning. By the end of the course, participants should be able to self-teach next steps without any prior coding skills. The document also discusses key elements of programming like syntax, operations, and functions. It provides examples of Python libraries commonly used in data science workflows for tasks like web scraping, data cleaning, modeling, analysis, and presentation.
A short introduction to more advanced Python and to programming in general. Intended for users who have already learned basic coding skills but want a rapid tour of the more in-depth capabilities Python offers, along with some general programming background.
Exercises are available at: https://github.com/chiffa/Intermediate_Python_programming
This document provides an introduction and overview of the Python programming language. It discusses who uses Python and why, gives an overview of Python's features including dynamic typing, modules, exceptions, generators and list comprehensions. It also covers Python paradigms like structural, object-oriented, functional and imperative programming. The document describes Python data types and sequences, provides examples of classes and decorators, and discusses additional Python concepts like special keywords, ranges, generators and resources for learning more.
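The features named there fit in a short example: a list comprehension, a generator, and a decorator (a generic illustration):

```python
squares = [n * n for n in range(5)]    # list comprehension

def countdown(n):                      # generator: a lazy sequence
    while n > 0:
        yield n
        n -= 1

def logged(fn):                        # decorator: wraps a function
    def wrapper(*args, **kwargs):
        print(f"calling {fn.__name__}")
        return fn(*args, **kwargs)
    return wrapper

@logged
def add(a, b):
    return a + b

print(squares, list(countdown(3)), add(2, 3))
```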
Best Python Online Training with Live Project by Expert - QA Training Hub
QA Training Hub is the best Python programming online training center in India. Python online training is provided by Mr. Dinesh, a real-time working professional, data scientist, and RPA expert with 18+ years of industry experience teaching Python. Visit: http://www.qatraininghub.com/python-online-training.php Contact: Mr. Dinesh Raju: India: +91-8977262627, USA: +1-845-493-5018, Mail: info@qatraininghub.com
Python Tricks That You Can't Live Without - Audrey Roy
Audrey Roy gave a presentation on Python tricks for code readability and reuse at PyCon Philippines 2012. She discussed writing clean, understandable code by following PEP8 style guidelines and using linters. She also explained how to find and install reusable Python libraries from the standard library and PyPI, and how to write packages and modules to create reusable code.
This 4-week course on "Python for Data Science" taught the basics of Python programming and libraries for data science. It covered topics like data types, sequence data, Pandas dataframes, data visualization with Matplotlib and Seaborn. Technologies taught included Spyder IDE, NumPy, Jupyter Notebook, Pandas and visualization libraries. The course aimed to equip participants with Python skills for solving data science problems. It examined applications of data science in domains like e-commerce, machine learning, medical diagnosis and more.
Python intro and competitive programming - Suraj Shah
This document provides an introduction and overview of the Python programming language. It covers Python's background, syntax, types, operators, control flow, functions, classes, tools, and provides examples of code. The document discusses Python's multi-purpose capabilities, object oriented nature, dynamic and strong typing, readability, batteries included philosophy, and cross-platform support. It also lists some major organizations that use Python.
This document provides an overview of better tools and mindsets for personal and professional development. It discusses the importance of self-reflection and understanding one's goals. A variety of useful tools are then introduced, including VPN software, language learning resources, code hosting platforms, IDEs like IntelliJ and development tools like Git, Bitbucket, Youtrack, and TeamCity. The document also discusses mindsets like neuroplasticity and approaches like Agile. It focuses on establishing an effective development environment and learning resources.
The document discusses Python and its suitability for data science. It describes Python's Zen-like approach of focusing on simplicity and empowering users. It promotes Python's data science stack, including NumPy, Pandas, scikit-learn and others, and how they allow for rapid data analysis and model building. It also describes the Anaconda distribution and conda package manager for easily managing Python environments and packages.
Welcome to the Brixton Library Technology Initiative - Basil Bibi
This document introduces a Python coding initiative at the Brixton Library for adults. It provides information about meeting times and contacts, as well as a detailed overview of the Python programming language, its history and uses. Participants are encouraged to register for an associated free online Coursera course and attend Saturday sessions at the library for assistance and collaboration.
Object Oriented Programming in Swift Ch0 - Encapsulation - Chihyang Li
This document introduces object oriented programming concepts in Swift. It discusses key OOP principles like encapsulation, inheritance and polymorphism. It also covers object oriented analysis, design and programming levels. Specific concepts explained include data abstraction, access control, class invariants, pre/postconditions and design by contract. Common programming paradigms like procedural, object oriented and spaghetti code are compared. Modularization benefits like reusability, maintainability and debugging are highlighted.
Austin Python Learners Meetup - Everything you need to know about programming... - Danny Mulligan
This document provides an overview of the key topics and tools needed for programming without prior experience, summarized in 3 sentences:
It discusses editors/IDEs, revision control, testing, debugging, common errors, performance, libraries, documentation, getting help, and practicing. Popular editors mentioned include TextEdit, Notepad, Emacs, and vim, while revision control tools include Git and Mercurial. The document emphasizes using libraries, writing tests, avoiding common errors, and getting help from documentation and online communities like StackOverflow.
Working with credentials for Azure resources, you want to avoid storing your credentials in repositories when possible. In this session, we will talk about some of the options for working with credentials in Azure development without checking them into repositories - including managed identities, DefaultAzureCredential, and ChainedTokenCredential.
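A minimal sketch of the chained approach using the azure-identity package (the storage account URL is a placeholder; adjust the credential order to your environment):

```python
from azure.identity import (
    AzureCliCredential,
    ChainedTokenCredential,
    ManagedIdentityCredential,
)
from azure.storage.blob import BlobServiceClient

# Try managed identity first (when running in Azure), then fall back to the
# local CLI login; no secret is ever checked into the repository.
credential = ChainedTokenCredential(
    ManagedIdentityCredential(),
    AzureCliCredential(),
)

client = BlobServiceClient(
    account_url="https://<account>.blob.core.windows.net",  # placeholder
    credential=credential,
)
```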
This document summarizes a presentation on using Azure Databricks to predict flight delays. It introduces Databricks, which has environments for SQL, data science/engineering, and machine learning. For the flight prediction scenario, historical flight data is loaded into Databricks and a decision tree model is trained to predict delays. The model is then used to score new flight data and results are analyzed in Power BI.
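In PySpark ML terms, the training step might look like the following sketch; the column names and path are hypothetical, since the summary does not list the actual schema:

```python
from pyspark.ml import Pipeline
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("flight-delays").getOrCreate()

flights = spark.read.parquet("/mnt/flights/history")  # hypothetical path

assembler = VectorAssembler(
    inputCols=["dep_hour", "distance", "carrier_index"],  # hypothetical
    outputCol="features",
)
tree = DecisionTreeClassifier(labelCol="delayed", featuresCol="features")

model = Pipeline(stages=[assembler, tree]).fit(flights)
predictions = model.transform(flights)  # scored data for Power BI
```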
This document provides an overview of Azure DevOps and how it can benefit developers. It discusses key features such as source control, work item tracking, continuous integration and delivery pipelines, and how SQL Server Data Tools can be used. The presenter has over 20 years of experience in technology and is a Microsoft MVP. They provide a demonstration of using Azure DevOps and SSDT for a database project. Resources for learning more are also included.
The document provides an overview of Azure DevOps and why JavaScript developers should use it. It discusses features like source control, boards for tracking work items, pipelines for continuous integration and delivery, and testing. It also includes a demo of setting up a sample Create React App project in Azure DevOps, including configuring a pipeline to build and deploy the app to an Azure App Service. Resources for learning more about Azure DevOps, using it with JavaScript projects, and understanding Git are also provided.
This document discusses using Azure DevOps for database development. It provides an overview of Azure DevOps features like source control, work tracking, code reviews, builds and releases. SQL Server Data Tools can be used to create database projects in Azure DevOps. An example is provided of adding a new feature to an AdventureWorks database project, committing the changes to source control, and linking work items to track the task. Data professionals are encouraged to use these tools to version database code and automate deployments.
Noodling Data with Jupyter Notebook - presented at various user groups in 2020 both in this format and for Azure Notebooks; also available as a Jupyter Notebook to be presented with the RISE slideshow extension
Pair programming involves two programmers working together, with one typing and the other reviewing the work. It allows for knowledge sharing and immediate feedback. When used selectively, it can produce higher quality code and help onboard new programmers. Mob programming takes this further by having the entire team work together on one task using one screen and keyboard, rotating who physically types. It aims to improve shared understanding and code quality through extensive collaboration, but may reduce delivery speed and be challenging for those who prefer individual work. Effective use of these techniques requires open communication, shared goals, and avoiding forced participation.
Becoming a Servant Leader, Leading from the Trenches - Sarah Dutkiewicz
This document discusses how to become a servant leader. It provides tips for leading from the trenches such as putting others first, checking your ego at the door, growing leaders at all levels, and listening. It emphasizes the importance of credibility, integrity, core values, empathy and being transparent. Additional advice includes gaining experience with pain points firsthand, building trust while working alongside others, keeping your ear to the ground, and staying technical while contributing to the community. The overall message is that leadership is about serving others through honest communication and commitment to positive change.
The document discusses the importance of mentorship, especially for junior developers. It provides guidance on how to be a great mentor, including setting expectations, goals, encouraging questions, giving feedback, and pushing mentees out of their comfort zone. Mentorship is a two-way street that can benefit both parties. Finding mentors can come from within one's company, community events, conferences, or formal programs.
This document discusses how to become a servant leader. It provides tips for leading from the trenches such as putting others first, checking your ego at the door, growing leaders at all levels, and building trust and respect. It emphasizes listening, empathy, transparency, and gaining experience with pain points firsthand. Staying technical, getting involved in the community, and contributing to open source are recommended for servant leaders. The overall message is that leadership is about serving others through commitment, honesty, and positive change.
What is UX and why should we care as developers? This talk explores these concepts from a developer's perspective. Presented at Kansas City Developer Conference 2017 on August 4, 2017
The document discusses many influential women in the history of technology, including those who programmed the first digital computer (ENIAC), invented programming languages like COBOL, broke German ciphers during World War 2, created influential programming languages like CLU and Argus, invented technologies used in phones today, popularized the use of icons in computing, and more. It highlights women who made contributions across programming, engineering, standards development, and more throughout the development of computing. The document aims to showcase the many trailblazing women whose contributions are often overlooked.
This document outlines Sarah Dutkiewicz's goals and journey in her Unstoppable Course from January to March 2016. Her purpose is to make the world a better place by using her talents to serve humanity. Her short-term goals include making Easter cheese and learning about Akron with her family. Her long-term goal is to provide the best learning experiences for her boys to become well-rounded global citizens. She overcame struggles by focusing on gratitude, affirmations, and blogging about her experiences.
Without users & their problems, we have no reason to write software. However, sometimes, it is frustrating dealing with the source of our problems. Thankfully, there are tools to help us become better at communicating with our end users, in hopes of achieving the end goal with as little strife as possible. Empathy, patience, and clear communication go a long way in development, as this talk will show. “Even More Tools for the Developer’s UX Toolbelt” will give developers even more tools to make their lives a little easier when dealing with end users.
The document provides a history of women in technology, profiling several pioneering women in the field. It summarizes the contributions of Ada Lovelace, the original programmers of the ENIAC computer, Grace Hopper, Hedy Lamarr, Barbara Liskov, Frances Allen, and Mary Lou Jepsen. Some of the lessons highlighted include choosing mentors interested in your learning, having an open mind in one's career, making technology more accessible, and challenging conventional ways of thinking.
This document discusses various UX tools and methods that can help software developers integrate user experience best practices into their development process. It outlines the typical software development phases of analyze, design, develop/implement, and test, and provides examples of UX tools that can be used at each phase, such as mind maps and personas for analysis, wireframes and user flows for design, feature files for development, and heatmaps and analytics for testing. The overall goal is to help developers better understand users and build software that meets users' needs.
World Usability Day 2014 - UX Toolbelt for Developers - Sarah Dutkiewicz
The document discusses user experience (UX) tools and methods that are useful for software developers. It covers techniques for analyzing user needs like mind maps and personas, designing interfaces with wireframes and user flows, implementing features tracked in code via behavior-driven development, testing with analytics and heat maps, and iterating based on user research. The goal is to incorporate UX best practices into each phase of development to build intuitive, user-centered products.
The document discusses user experience (UX) design and how it relates to the software development process. It describes UX tools that can be used during each development phase, including mind maps, site maps, personas, user flows, wireframes, heatmaps and analytics. The goal is to involve UX design principles at every step to develop software that meets users' needs and provides a positive experience.
Tips & tricks *for developers, by a developer* on how to work with end users and the business, making software development a bit easier.
This was delivered at Link State 2014 at Case Western Reserve University in Cleveland, OH on September 20, 2014.
Monitoring and Managing Anomaly Detection on OpenShift.pdf - Tosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
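As a flavor of what item 12's notebooks might contain, here is a self-contained anomaly detection sketch with scikit-learn's IsolationForest; the simulated data is my assumption, not the tutorial's dataset:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, size=(500, 2))     # simulated sensor readings
outliers = rng.uniform(-6, 6, size=(10, 2))  # injected anomalies
data = np.vstack([normal, outliers])

model = IsolationForest(contamination=0.02, random_state=0).fit(data)
labels = model.predict(data)                 # -1 = anomaly, 1 = normal
print((labels == -1).sum(), "points flagged as anomalous")
```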
How to Get CNIC Information System with Paksim Ga.pptx - danishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx - SitimaJohn
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
Threats to mobile devices are more prevalent and are increasing in scope and complexity. Users of mobile devices want to take full advantage of their devices' features, but many of those features trade security for convenience and capability. This best-practices guide outlines steps users can take to better protect personal devices and information.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
HCL Notes and Domino License Cost Reduction in the World of DLAU - panagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
5th LF Energy Power Grid Model Meet-up SlidesDanBrown980551
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Mircosoft Teams session or in person at TU/e located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid -Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
Skybuffer SAM4U tool for SAP license adoptionTatiana Kojar
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
2. A C# Dev’s Guide to Python
Presented by:
Sarah Dutkiewicz
Microsoft MVP, Visual Studio and
Development Technologies
Microsoft Developers HK
13 June, 2018
3. About the Presenter
• 9-time Microsoft Most Valuable
Professional – 2 years in Visual C#, 7 years
in Visual Studio and Development Tools
• Bachelor of Science in Computer Science &
Engineering Technology
• Published author of a PowerShell book
• Live coding stream guest on Fritz and
Friends and DevChatter
• Why Hong Kong? #ancestraltrip!
4. The Python Community
Python’s community is vast;
diverse & aims to grow;
Python is Open.
https://www.python.org/community/
5. Diversity in Python
The Python Software Foundation and the global Python
community welcome and encourage participation by
everyone. Our community is based on mutual respect,
tolerance, and encouragement, and we are working to help
each other live up to these principles. We want our
community to be more diverse: whoever you are, and
whatever your background, we welcome you.
https://www.python.org/community/diversity/
6. The Zen of Python,
by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
https://www.python.org/dev/peps/pep-0020
7. Areas Using Python
• Analysis
• Computation
• Math
• Science
• Statistics
• Engineering
• Deep Learning
• Artificial Intelligence
• Machine Learning
• Data Science
8. Weakness of Python
• The Great Python Schism
• Hard version split between 2.x and 3.x
• Some people are stuck on 2.x due to dependencies following the 2.x line
• Strongly opinionated
There should be one-- and preferably only one --obvious way to do it.
9. Why Python 3?
• Python 2 struggles with text and binary data
• ‘abcd’ is both a string consisting of letters (textual) and a string consisting of
bytes (binary)
• Goes against the “preferably one way” part of the Zen of Python
• Doesn’t do well with Unicode
• Python was out before Unicode was a standard
• Not all projects in Python 2 support Unicode equally
• Python 3
• unicode/str/bytes types
• Backwards-incompatible – but very much necessary, as Python is a language
of the world
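Not from the original deck: a minimal sketch of the text/binary split these bullets describe, using Python 3's distinct str and bytes types (variable names are invented for illustration):

```python
# Python 3 keeps text (str) and binary data (bytes) as distinct types
text = 'abcd'                  # str: Unicode text
data = b'abcd'                 # bytes: raw binary
print(type(text), type(data))  # <class 'str'> <class 'bytes'>
print(text == data)            # False - different types never compare equal
print(text.encode('utf-8'))    # b'abcd' - explicit conversion to bytes
print(data.decode('utf-8'))    # 'abcd'  - explicit conversion to text
```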
10. Python Enhancement Proposals (PEPs)
• https://www.python.org/dev/peps/
• Purpose and Guidelines for PEPs
• Guidelines for Language Evolution
• Deprecation of Standard Modules
• Bug Fixes
• Style Guides
• Docstring Conventions
• API for crypto
• API for Python database
• Python release schedules
• … and more!
11. Other Terms…
• Benevolent Dictator for Life (BDFL) – Guido van
Rossum, father of Python
• Pythonista – Python developer
• Pythonic – code follows common guidelines, written
in idiomatic Python
• Pythoneer – pioneers of Python, leaders who create
change
• A Pythoneer can be a Pythonista, but not all
Pythonistas are Pythoneers.
12. Some Tools to Know
• Visual Studio with Python Tools
• Visual Studio Code
• Azure Notebooks
• Repl.it
• Jupyter Notebooks
• PyCharm
13. Package Management
• Think NuGet only for Python
• Pip (Pip Installs Packages)
• Python’s official package manager
• Virtualenv
• Install pip packages in an isolated manner
• Conda – conda.io
• Not Python-specific – a cross-platform option similar
to apt and yum
• Part of Miniconda
• Just conda and its dependencies
• Also part of Anaconda
• Conda, its dependencies, and many packages helpful in
data science applications
• More than one? Isn’t this anti-Zen? Yes,
but…
http://jakevdp.github.io/blog/2016/08/25/conda-myths-and-misconceptions/
14. Presentation Breakdown
• Simple Python demos in a Jupyter Notebook – to be shared in
an Azure Notebook
• Variables
• Conditional Structures
• Loops
• Functions
• Exception Handling
• Azure Notebook Library:
https://notebooks.azure.com/cletechconsulting/libraries/introtopyforcsharpdevs
• More complex code using Visual Studio Community Edition
with the Python Tools and/or Visual Studio Code
• GitHub repo:
https://github.com/sadukie/IntroToPyForCSharpDevs
15. What version of Python am I running?
• Command-line: python -V
• Within a Python environment: import sys; sys.version
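For example, a quick check from inside a running interpreter (sketch, not part of the original deck):

```python
# Inspect the running interpreter's version from within Python
import sys

print(sys.version)       # full version string, e.g. '3.6.5 ...'
print(sys.version_info)  # structured tuple: (major, minor, micro, ...)
```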
16. Key Points from Python Style Guide (PEP 8)
• Indentation – 4 spaces
• Optional for continuation lines
• Make it readable and clearly identifiable
• If tabs are already in use, continue with tabs
• Do not mix tabs and spaces!
• Maximum line length should be 79 characters
• Easy for side-by-side files
• Works well for code review situations
• Docstrings and comments should be limited to 72 characters
• Imports on separate lines, always at the top
• Be consistent with quoting – single-quoted and double-quoted strings are the same.
• Read more at https://www.python.org/dev/peps/pep-0008/#imports
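A short sketch, not from the deck, applying several of these PEP 8 points; the function and its behavior are invented for illustration:

```python
# Imports on separate lines, always at the top
import os
import sys


def describe(path):
    """Return a short description of a path.

    Docstring lines stay within 72 characters.
    """
    # 4-space indentation, consistent single quotes
    if os.path.isdir(path):
        return 'directory'
    return 'file or missing'


print(describe(sys.argv[0]))
```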
17. DEMO: Basics of Python
If Internet is present: Azure Notebooks
If Internet is not present: Jupyter Notebooks
19. Object Orientation
• Object oriented from the beginning
• Classes with:
• Data members (class variables and instance variables)
• Methods
• Class Variables vs Instance Variables
• Class variables are shared across all instances of a class
• Defined within the class body, outside of any method
• Less common than instance variables
• Instance variables are managed by the instance
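A minimal sketch of the distinction above (class and attribute names invented for illustration):

```python
class Sensor:
    # Class variable: defined in the class body, shared by all instances
    unit = 'celsius'

    def __init__(self, name):
        # Instance variables: owned by each individual instance
        self.name = name
        self.readings = []


kitchen = Sensor('kitchen')
garage = Sensor('garage')
print(kitchen.unit, garage.unit)   # celsius celsius (shared)
kitchen.readings.append(21.5)
print(len(kitchen.readings), len(garage.readings))  # 1 0 (separate state)
```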
20. Inheritance
• Can inherit from multiple classes
• Can check relationships with isinstance() and issubclass()
• Parent class is accessed via super() method call
• Typical to call parent’s __init__() from within child’s __init__() before
moving on in the child’s initialization method
• Child knows about its parents through its __bases__ attribute
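A minimal multiple-inheritance sketch (class names invented) showing super(), isinstance()/issubclass(), and __bases__:

```python
class Logger:
    def __init__(self, **kwargs):
        super().__init__(**kwargs)   # cooperative: pass along the MRO
        self.messages = []


class Device:
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.online = False


class SmartSensor(Logger, Device):
    def __init__(self):
        # Call the parents' __init__ before the child's own setup
        super().__init__()
        self.online = True


s = SmartSensor()
print(isinstance(s, Device))            # True
print(issubclass(SmartSensor, Logger))  # True
print(SmartSensor.__bases__)  # (<class '__main__.Logger'>, <class '__main__.Device'>)
```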
21. Interfaces
• Not necessary in Python
• No interface keyword in Python
• Try to invoke a method we expect
• Exception handling
• hasattr checking
• Duck typing
If it talks and walks like a duck, then it is a duck
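A small sketch (names invented) of both styles mentioned above, hasattr checking and exception handling:

```python
class Duck:
    def quack(self):
        return 'Quack!'


class Person:
    def quack(self):
        return 'I can quack too'


def make_it_quack(thing):
    # Look-before-you-leap: check for the attribute before calling it
    if hasattr(thing, 'quack'):
        return thing.quack()
    raise TypeError('object cannot quack')


def make_it_quack_eafp(thing):
    # Easier-to-ask-forgiveness: try the method we expect, handle failure
    try:
        return thing.quack()
    except AttributeError:
        raise TypeError('object cannot quack')


print(make_it_quack(Duck()))         # Quack!
print(make_it_quack_eafp(Person()))  # I can quack too
```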
22. Metaclasses
• Things typically defined in the language specification in other
languages
• Classes’ classes
• Class factories!
• Can be stored in a __metaclass__ attribute (Python 2)
• Can also be declared with metaclass= in the class declaration, after any
base classes (Python 3)
• Most classes have the metaclass of type
• Traverse the __class__ tree enough, and you’ll end at type
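A minimal Python 3 metaclass sketch (names invented), with the metaclass= keyword in the class declaration:

```python
class Registered(type):
    """A metaclass acting as a class factory that records its classes."""
    registry = []

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        Registered.registry.append(cls)
        return cls


class Plugin(metaclass=Registered):
    pass


print(Registered.registry)   # [<class '__main__.Plugin'>]
print(type(Plugin))          # <class '__main__.Registered'>
print(type(type(Plugin)))    # <class 'type'> - the __class__ tree ends at type
```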
23. Abstract Base Classes
• ABCs!
• abc module
• ABCMeta metaclass
• Use the pass keyword to leave the method’s body empty
• Must also use the @abc.abstractmethod decorator
• Can register classes as virtual subclasses of ABCs
• Only useful for categorization
• Does not know anything about its parent – nothing in __bases__
• Can throw errors if methods aren’t implemented
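A minimal sketch of the abc module (class names invented), including a virtual subclass to show the categorization-only behavior:

```python
import abc


class Shape(abc.ABC):  # equivalently: class Shape(metaclass=abc.ABCMeta)
    @abc.abstractmethod
    def area(self):
        pass  # no body here; concrete subclasses must implement area()


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


print(Square(3).area())  # 9
# Shape() would raise TypeError: can't instantiate abstract class


class Circle:
    def __init__(self, radius):
        self.radius = radius


Shape.register(Circle)            # virtual subclass: categorization only
print(issubclass(Circle, Shape))  # True
print(Circle.__bases__)           # (<class 'object'>,) - no Shape here
```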
26. Magic Methods
• Key concept to understand for OO Python
• Method names are surrounded by double underscores (“dunders”)
• Sometimes called dunder methods
• Object’s lifespan in magic methods
• __new__ - redefined rarely; used to create new instances; phase 1 of the
constructor
• __init__ - initializer for the class; passed the instance; most commonly used in
Python class definitions
• __del__ - the destructor; no guarantee that __del__ will be executed
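A minimal lifespan sketch (class name invented) walking through the three dunders above:

```python
class Resource:
    def __new__(cls, *args, **kwargs):
        # Phase 1 of construction: create the bare instance (rarely overridden)
        print('__new__: creating instance')
        return super().__new__(cls)

    def __init__(self, name):
        # Phase 2: initialize the instance that __new__ returned
        print('__init__: initializing', name)
        self.name = name

    def __del__(self):
        # Destructor: when (or whether) this runs is not guaranteed
        print('__del__:', self.name)


r = Resource('db-connection')
del r  # CPython usually runs __del__ here, but the language does not promise it
```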
30. Desktop Application Development
• Tkinter (“Tk interface”) – de facto standard for GUI creation in Python;
for writing desktop apps based on Tcl/Tk
• PyQt – Python package for writing desktop apps based on Qt
• If you prefer GTK:
• PyGObject
• pygtk
http://www.pygame.org
https://kivy.org
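As a taste of the Tkinter option listed above, a minimal window (sketch, not from the deck; the label text is invented):

```python
import tkinter as tk  # the Python 3 module name; Python 2 used Tkinter

root = tk.Tk()
root.title('Hello from Tkinter')
tk.Label(root, text='Hello, Python!').pack(padx=20, pady=20)
root.mainloop()
```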
35. SQL Server 2017 & Machine Learning
• Run Python in the server
• Brings computation to the data
• revoscalepy: https://docs.microsoft.com/en-us/machine-learning-server/python-reference/revoscalepy/revoscalepy-package
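A hedged sketch of what in-database work with revoscalepy can look like; the connection string, table, and column below are hypothetical placeholders, so treat this as an outline under those assumptions rather than a verified recipe:

```python
# Sketch only: assumes SQL Server Machine Learning Services and the
# revoscalepy package are installed; the database names are placeholders.
from revoscalepy import RxSqlServerData, rx_summary

conn_str = ('Driver=SQL Server;Server=.;Database=SensorDb;'
            'Trusted_Connection=True;')
readings = RxSqlServerData(
    sql_query='SELECT temperature FROM dbo.Readings',
    connection_string=conn_str)

# Summary statistics computed over the SQL Server data source,
# rather than first exporting every row to the client
print(rx_summary(formula='~ temperature', data=readings))
```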
36. Learn More!
• Seminar of Machine Learning in Python – Open Source Hong Kong –
led by Delon Yau, Software Engineer, Microsoft -
https://www.meetup.com/opensourcehk/events/251121245/
• Getting Started with Python in Visual Studio Code:
https://code.visualstudio.com/docs/python/python-tutorial
• Python Tools for Visual Studio:
https://www.visualstudio.com/vs/features/python/
• Python at Microsoft blog:
https://blogs.msdn.microsoft.com/pythonengineering/
Abstract: As technology continues to evolve, our toolset as developers evolves as well. While we can use C# for many things, other languages are growing in popularity in other areas - such as Python being used in AI, ML, and other aspects of data science. In this session, we will see how we do things in Python compared to what we do in C#. Some of the tools we will look at include Anaconda with Visual Studio Code and Visual Studio's Python tooling.