Practical Data Analysis in Python

These are the slides from my presentation to the NYC Python Meetup on July 28, 2009. The presentation was an overview of data analysis techniques and various Python tools and libraries, along with a practical example (with code and algorithms) of a Twitter spam filter implemented with NLTK.

Practical Data Analysis in Python
Hilary Mason
@hmason
www.hilarymason.com
hilary@path101.com

Data is ubiquitous.
The ability and tools to use it are not.

(Focused) Data == Intelligence

Data Analysis on the Web
Data items change rapidly.
Data items are not independent.
There’s a lot of semi-structured data around.
There’s a LOT of data around.
==
Too many problems, few tools, and few experts.

Entity Disambiguation
This is important.
Company disambiguation is a very common problem: are “Microsoft”, “Microsoft Corporation”, and “MS” the same company?
This is a hard problem.
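Why “MS” is the hard case: simple string normalization (lowercasing, stripping legal suffixes such as “Corporation”) is enough to merge the first two names, but an abbreviation needs outside knowledge. The sketch below is a minimal illustration, not the method from the talk; the suffix list and the ALIASES table are hypothetical and hand-made.

```python
import re

# Hypothetical list of legal-form suffixes; a real system would need many more.
LEGAL_SUFFIXES = {"corporation", "corp", "inc", "incorporated", "company", "co", "ltd", "llc"}

# Hypothetical hand-curated aliases: "MS" -> "microsoft" cannot be derived
# from the string itself, and "MS" is ambiguous in general.
ALIASES = {"ms": "microsoft"}


def canonical_company(name: str) -> str:
    """Map a raw company mention to a crude canonical key."""
    # Lowercase and keep only alphanumeric runs ("Apple Inc." -> ["apple", "inc"]).
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    # Strip trailing legal suffixes such as "Corporation" or "Inc".
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    key = " ".join(tokens)
    # Fall back to the alias table for abbreviations.
    return ALIASES.get(key, key)


for mention in ["Microsoft", "Microsoft Corporation", "MS", "Apple Inc."]:
    print(f"{mention!r} -> {canonical_company(mention)!r}")
```

A static alias table cannot tell whether “MS” means Microsoft or Morgan Stanley in a given sentence; resolving that requires context, which is what makes disambiguation hard.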

Classification
Document classification.
Image recognition.
Topic recognition.

Recommendation Systems
Product recommendations.
Disease predictions.
Behavior analysis.

IEEE Tag Clustering
[Figure: example tag clusters, including immunity, ultrasound, medical imaging, medical devices, thermoelectric devices, fault-tolerant circuits, low power devices]

Python for Data Analysis
import why_python_is_awesome
Python is readable.
Easy to transition from Matlab or R.
Numerical computing support.
Growing set of machine learning libraries.

Libraries
NLTK (Natural Language Toolkit) – www.nltk.org
mlpy (Machine Learning PY) – mlpy.fbk.eu
numpy & scipy – scipy.org

MachetEC2
An EC2 AMI provisioned with all of the toys you need: http://blog.infochimps.org/2009/02/06/start-hacking-machetec2-released/

Supervised Classification
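[Diagram: in training, labeled training data passes through a feature extractor to produce a trained classifier; new text then passes through the same feature extractor, and the trained classifier labels it Spam or Not Spam.]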

Data: Tweets
Hand-classified. For example, some spam:
"don't disrespect me. I just wanted yall to get a head start so don't feel bad when I have more followers in two days. http://xyyx.eu/a1ha"
"oh yay more new followers..hiii...if u want go to http://xyyx.eu/a1hb"
"My friend made this new tool to get more twitter followers, http://xyyx.eu/a1ht"
"Yes, Twitter is doing some Follower/Following count corrections. Get it back at: http://xyyx.eu/a1h8"
"man if i see one more person cry about losing followers!!! http://xyyx.eu/a1h4"

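Features
Break tweets into lists of relevant words.

    def document_features(self, document):
        # document is a list of tokens; a set makes each membership test O(1)
        document_words = set(document)
        features = {}
        # one boolean feature per vocabulary word, e.g. 'contains(followers)': True
        for word in self.word_features:
            features['contains(%s)' % word] = (word in document_words)
        return features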
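The method above depends on self.word_features, a vocabulary the slide does not define. A standalone sketch, assuming the vocabulary is the n most frequent tokens in the training tweets (a common choice) and a simple regex tokenizer that keeps punctuation as separate tokens, which would yield features such as contains(!) and contains(') like those shown later. The names tokenize and build_word_features and the n=2000 default are illustrative, not the talk's code.

```python
import re
from collections import Counter


def tokenize(tweet: str) -> list[str]:
    """Lowercase a tweet; split into word runs and single punctuation characters."""
    return re.findall(r"\w+|[^\w\s]", tweet.lower())


def build_word_features(tokenized_docs, n=2000):
    """Use the n most frequent tokens across the training tweets as the vocabulary."""
    counts = Counter(tok for doc in tokenized_docs for tok in doc)
    return [word for word, _ in counts.most_common(n)]


def document_features(document, word_features):
    """Standalone version of the slide's method: one boolean per vocabulary word."""
    document_words = set(document)  # O(1) membership tests
    return {f"contains({word})": (word in document_words) for word in word_features}


tweets = [
    "oh yay more new followers..hiii",
    "man if i see one more person cry about losing followers!!!",
]
docs = [tokenize(t) for t in tweets]
vocab = build_word_features(docs, n=20)
feats = document_features(docs[0], vocab)
print(feats["contains(followers)"], feats["contains(!)"])  # True False
```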

Naïve Bayesian Classifier
P(A|B) = the conditional probability of A given B
http://yudkowsky.net/rational/bayes
http://blog.oscarbonilla.com/2009/05/visualizing-bayes-theorem/
classifier = nltk.NaiveBayesClassifier.train(train_set)
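Bayes' theorem gives P(label | features) = P(features | label) P(label) / P(features). The "naïve" assumption treats features as independent given the label, so the classifier picks the label that maximizes P(label) times the product of P(feature_i | label). The sketch below computes that from scratch to show what training involves; it is not NLTK's implementation, and its add-one smoothing is an assumption (NLTK's estimator may differ). Like NLTK's train, it takes a list of (featureset, label) pairs and returns an object with a classify(featureset) method. TinyNaiveBayes is a hypothetical name.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Minimal naive Bayes over feature dicts (illustrative, not NLTK's code).

    Assumes every training featureset has the same keys with hashable values.
    """

    def __init__(self, label_counts, feature_counts, feature_values):
        self.label_counts = label_counts      # label -> number of training examples
        self.feature_counts = feature_counts  # (label, fname) -> Counter of values
        self.feature_values = feature_values  # fname -> set of values seen in training
        self.total = sum(label_counts.values())

    @classmethod
    def train(cls, labeled_featuresets):
        label_counts = Counter()
        feature_counts = defaultdict(Counter)
        feature_values = defaultdict(set)
        for features, label in labeled_featuresets:
            label_counts[label] += 1
            for fname, value in features.items():
                feature_counts[label, fname][value] += 1
                feature_values[fname].add(value)
        return cls(label_counts, feature_counts, feature_values)

    def log_prob(self, label, fname, value):
        """Add-one smoothed log P(fname = value | label)."""
        counts = self.feature_counts[label, fname]
        n_values = len(self.feature_values[fname])
        return math.log((counts[value] + 1) / (self.label_counts[label] + n_values))

    def classify(self, features):
        def score(label):
            # log P(label) + sum of log P(feature = value | label)
            s = math.log(self.label_counts[label] / self.total)
            for fname, value in features.items():
                if fname in self.feature_values:  # skip features unseen in training
                    s += self.log_prob(label, fname, value)
            return s

        return max(self.label_counts, key=score)


train_set = [
    ({"contains(followers)": True, "contains(!)": False}, "spam"),
    ({"contains(followers)": True, "contains(!)": False}, "spam"),
    ({"contains(followers)": False, "contains(!)": True}, "not_s"),
    ({"contains(followers)": False, "contains(!)": False}, "not_s"),
]
classifier = TinyNaiveBayes.train(train_set)
print(classifier.classify({"contains(followers)": True, "contains(!)": False}))  # spam
```

Summing logs instead of multiplying probabilities avoids floating-point underflow when there are thousands of word features.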

Classifier Accuracy
Use a hand-classified test set to see the accuracy
of the classifier:
nltk.classify.accuracy(classifier, test_set)
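nltk.classify.accuracy reports the fraction of test examples whose predicted label matches the hand-assigned one. A sketch of the same calculation, plus a holdout split so that test tweets are never seen during training (otherwise the score is optimistic). Unlike NLTK's function, this accuracy takes a plain classify function rather than a classifier object; the 80/20 split, the fixed seed, and link_rule are assumptions for illustration.

```python
import random


def split_train_test(labeled, test_fraction=0.2, seed=0):
    """Shuffle labeled examples and hold out test_fraction of them for testing."""
    items = list(labeled)
    random.Random(seed).shuffle(items)  # fixed seed so the split is reproducible
    cut = int(len(items) * (1 - test_fraction))
    return items[:cut], items[cut:]


def accuracy(classify, test_set):
    """Fraction of (features, label) pairs for which classify(features) == label."""
    if not test_set:
        raise ValueError("test_set is empty")
    correct = sum(1 for features, label in test_set if classify(features) == label)
    return correct / len(test_set)


# Hypothetical stand-in classifier: call any tweet with a link spam.
def link_rule(features):
    return "spam" if features.get("contains(http)") else "not_s"


test_set = [
    ({"contains(http)": True}, "spam"),
    ({"contains(http)": False}, "not_s"),
    ({"contains(http)": True}, "not_s"),  # misclassified by link_rule
    ({"contains(http)": False}, "not_s"),
]
print(accuracy(link_rule, test_set))  # 0.75
```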

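Feature Relevance
Each row gives a feature value and the ratio of its likelihood under the first label versus the second (not_s = not spam); for example, tweets containing an apostrophe are far more likely to be not spam.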
contains(') = True not_s : spam = 53.6 : 1.4
contains(") = True not_s : spam = 32.2 : 1.1
contains(#) = True not_s : spam = 22.0 : 1.0
contains(!) = True not_s : spam = 10.8 : 1.0
contains(*) = True spam : not_s = 7.4 : 1.0
contains(=) = True not_s : spam = 5.5 : 1.0
contains(i) = False spam : not_s = 5.2 : 1.0
contains(?) = True not_s : spam = 2.4 : 1.0
contains(:) = True spam : not_s = 2.3 : 1.0
contains(&) = True not_s : spam = 1.8 : 1.0
contains(;) = True not_s : spam = 1.6 : 1.0
contains($) = True spam : not_s = 1.5 : 1.0
contains(u) = True spam : not_s = 1.5 : 1.0
contains(2.0) = False not_s : spam = 1.4 : 1.0
contains(saw) = False not_s : spam = 1.4 : 1.0
contains(noble) = False not_s : spam = 1.4 : 1.0
contains(sound) = False not_s : spam = 1.3 : 1.0
contains(approach) = False not_s : spam = 1.3 : 1.0
contains(finally) = False not_s : spam = 1.3 : 1.0
contains(more) = False spam : not_s = 1.3 : 1.0
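In NLTK, a table in this format is printed by classifier.show_most_informative_features(n). A from-scratch sketch of the underlying idea: for each (feature, value) pair, estimate P(feature = value | label) for every label and report the ratio between the most and least likely label. The add-one smoothing and the most_informative helper are assumptions, so the numbers will not match NLTK's exactly.

```python
from collections import Counter, defaultdict


def most_informative(labeled_featuresets, n=10):
    """Rank (feature, value) pairs by how lopsided their per-label likelihoods are."""
    label_counts = Counter()
    value_counts = defaultdict(Counter)  # (fname, value) -> Counter of labels
    values = defaultdict(set)            # fname -> values seen in training
    for features, label in labeled_featuresets:
        label_counts[label] += 1
        for fname, value in features.items():
            value_counts[fname, value][label] += 1
            values[fname].add(value)

    rows = []
    for (fname, value), by_label in value_counts.items():
        # Add-one smoothed P(fname = value | label) for every label.
        probs = {
            label: (by_label[label] + 1) / (label_counts[label] + len(values[fname]))
            for label in label_counts
        }
        hi = max(probs, key=probs.get)
        lo = min(probs, key=probs.get)
        rows.append((probs[hi] / probs[lo], fname, value, hi, lo))

    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:n]


train_set = [
    ({"contains(followers)": True, "contains(!)": False}, "spam"),
    ({"contains(followers)": True, "contains(!)": False}, "spam"),
    ({"contains(followers)": False, "contains(!)": True}, "not_s"),
    ({"contains(followers)": False, "contains(!)": False}, "not_s"),
]
for ratio, fname, value, hi, lo in most_informative(train_set, n=2):
    print(f"{fname} = {value}  {hi} : {lo} = {ratio:.1f} : 1.0")
# contains(followers) = True  spam : not_s = 3.0 : 1.0
# contains(followers) = False  not_s : spam = 3.0 : 1.0
```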

Kitchen Sink
wash, rinse, repeat

Results
90% accuracy on spam tweets – not bad!
Other possibilities:
categorization – what do you tweet about?
human vs bot?
which celebrity tweeter are you?

<3 Data
Thank you!

Speaker notes

1) Access to the data, and 2) CPU power/algorithms that are robust enough to analyze it
NLTK – in development since 2001