Pppeople 2020


An attempt to create an "if you like this person, you may want to know about these people" interface. http://pppeople.collabtools.org.uk/


The Cunning “Plan”

Data Scrying & Data Cleansing


1. PPPeople PPPowered

2. If you like this person you may also like

4. Crawlers: AKA spiders, bots, scrapers, data mining. 80legs can crawl over 5,000,000 web pages in 1 hour. Yahoo BOSS. http://www.ibm.com/developerworks/linux/library/l-spider/?ca=dgr-lnxw01WebSpiderLinux ScraperWiki. But Yahoo already has!!! Python crawlers: • Mechanize • Harvestman • Scrapy • Spynner. 99 on Google Code! Extractiv.

5. Database http://neo4j.org/ The largest production cluster has over 100 TB of data in over 150 machines.

6. Semantic Engine http://media.jesselegg.com/djangocalais/

7. Social Media Account Details

8. Visualisation http://www.twitt3d.com http://www.neuroproductions.be/twitter_friends_network_browser/ Neo4j + Gephi http://thejit.org/

9. Social Media Integration

10. The Result http://pppeople.collabtools.org.uk

11. Lessons Learned: You’re on your own

12. “In theory”: Neo4j, Gephi, Treebeard, Freebase, Wikipedia, Twitter, Delicious, Betsy, Harvestman. Bug. Missing API.

13.

14. Not working with people slows you down. Working with people slows you down. “It’s just one big matrix”

15. Bad Semantics. Jargon Buster: SIPIG, WSG, DPS, V/C/011, Zero Point Energy, Codex Alimentarius. Dept. Buster

16. Browse vs Search

17. No data creation. Cheap tricks: Pictures and Google

18. What Brings People Back?

19. The “jiggle” is everything

20.

Editor's notes

JISC-funded project – the call was about COLLABORATION. Small project, £16,000? Worked on it when there were “quiet times” on the Collab Tools project.
The idea was to create an Amazon-like tool – “If you like this person, you may want to know these people” – and drop that information into a social network (something people used regularly).
All the bits are “out there” already – it should just be a matter of assembling them. A CRAWLER is some code that goes and gets web pages. You give a SEMANTIC ENGINE messy data, like web pages, and it gives you CONCEPTS and meaning. The VISUALISATIONS are rife; let’s find an appropriate one and use it. Make the data EDITABLE by the people.
In a way there are lots of CRAWLING TECHNOLOGIES to choose from. 80legs is a service (as is YAHOO BOSS). You say: start with this URL and these regular expressions, call me when you have a spreadsheet. Yahoo have already crawled your website, used XPATH to fish data out. Well-proven tools like HT DIG and CURL (quite hard to use, not quite what I wanted). There are open source crawlers (most of them are RUBBISH!). Used Harvestman. Indian developer. I crawled the YORK, LEEDS and SHEFFIELD web sites (AND the White Rose consortium repository); took a few days each.
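The "start with this URL and these regular expressions" idea can be sketched in a few lines of standard-library Python. This is only an illustration, not the Harvestman setup used in the project: the names `LinkParser`, `fetch` and `crawl`, the page limit and the `fetch_page` parameter are all assumptions made for the example.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page and return its text (needs network access)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, allow_pattern, max_pages=50, fetch_page=fetch):
    """Breadth-first crawl that only follows links matching allow_pattern."""
    allowed = re.compile(allow_pattern)
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except OSError:
            continue  # skip pages that fail to download
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            link = urljoin(url, href).split("#")[0]
            if link not in seen and allowed.search(link):
                seen.add(link)
                queue.append(link)
    return pages
```

Passing a different `fetch_page` function lets the crawl logic be tried against a dictionary of canned pages before it is pointed at a real site.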
You have to store the data somewhere. MySQL is an obvious choice – but all the cool kids are using NOSQL databases. HOW YOU STORE DATA IS HOW YOU THINK (ABOUT DATA). Schema-less. Graph databases (to me) seemed even cooler because you can do queries where you discover things, e.g. all the people who know 5 people or more who have been to the same country on an event linked to Biology.
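To make the "discovery" query concrete, here is a rough sketch using plain Python dictionaries in place of a graph database. It is not Neo4j code. The data layout (`knows`, `attended`, `events`) and the function name `well_connected` are assumptions for illustration, and the query is read as "people who know at least N people who attended a Biology-linked event in the same country".

```python
from collections import defaultdict


def well_connected(knows, attended, events, subject, min_contacts=5):
    """Find people who know >= min_contacts people that went to an event
    on `subject` in the same country.

    knows:    {person: set of people they know}
    attended: {person: list of event ids}
    events:   {event id: {"country": str, "subjects": set of str}}
    Returns   {person: {country: sorted list of matching contacts}}
    """
    result = {}
    for person, contacts in knows.items():
        per_country = defaultdict(set)
        for contact in contacts:
            for event_id in attended.get(contact, ()):
                event = events[event_id]
                if subject in event["subjects"]:
                    per_country[event["country"]].add(contact)
        hits = {
            country: sorted(people)
            for country, people in per_country.items()
            if len(people) >= min_contacts
        }
        if hits:
            result[person] = hits
    return result
```

In a graph database the same question is a traversal along "knows" and "attended" relationships; the nested loops here are the hand-written equivalent.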
Once you have your data, you then want to find out about it. I looked at NODEBOX, which is a collection of Python libraries that let you SUMMARIZE data, get EMOTIONAL rankings and SYNONYMS, and it does VISUALISATION (see background). Too complicated… too much data. There are services like Textwise, OpenAmplify and Calais. You say, here’s a web page; they say CABBAGES, FRANCE and TOMMY COOPER. OpenCalais – Thomson Reuters. Django module. GOOD ENOUGH.
The next step was to ask everyone at York about their social media usage. Twitter accounts, blogs, followers, links. The survey tool was unworkable. I got nervous about asking for this data since I was already getting some people being a bit sniffy about using data from the web site. ALREADY HAD TOO MUCH DATA. VISUALISATIONS THAT LOOK PRETTY but don’t tell you anything.
The next step was to show that information somehow, in a way that people could interact with and explore. JavaScript libraries like THE JIT, software tools like Gephi, or draw it all yourself with something like PROCESSING.
And the intention was for that data to end up in a YORK SOCIAL NETWORK PROFILE – automatically generated… Cyn.in PROFILES are a bit lame. BUDDYPRESS on the way … STATUSNET perhaps…
This is the end result… A Network of CONCEPTS and PEOPLE… linked to CONCEPTS and PEOPLE. This was a triumph of simply GETTING SOMETHING DONE.
JISC had lots of sites – conference and met other bid winners, project blog, Google Docs, wikis – Frederique left. Going to be “found out” with lots of bits of code that didn’t work. COULDN’T WORK… Holy Grail.
Harvestman crawler – an old project… Delicious and Twitter changed their APIs – making them much less hacky. NEO4J: unfinished corners of the API – so I could either write it myself or wait a few months. Experimenting with other technologies and datasets. You don’t know until it’s too late.
The hope was that you grab lots of data then SIFT out meaningful information. HAD TOO MUCH STUFF. Get rid of the crap: PLURALS, TELEPHONE NUMBERS, OBVIOUS PLACES.
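A cleansing pass along those lines might look like the sketch below. It is an assumption-laden illustration, not the project's code: the phone-number pattern, the `OBVIOUS_PLACES` stop-list and the very naive plural folding are all made up for the example.

```python
import re

# Anything made only of digits, spaces and phone punctuation (7+ characters).
PHONE = re.compile(r"^\+?[\d\s().-]{7,}$")

# Assumed stop-list: places every page on these sites mentions anyway.
OBVIOUS_PLACES = {"york", "leeds", "sheffield", "uk", "united kingdom"}


def singular(term):
    """Very naive plural folding: 'cabbages' -> 'cabbage'."""
    if len(term) > 3 and term.endswith("s") and not term.endswith("ss"):
        return term[:-1]
    return term


def clean_concepts(raw_terms):
    """Lowercase, drop phone numbers and obvious places, merge plurals."""
    cleaned = []
    seen = set()
    for term in raw_terms:
        key = " ".join(term.lower().split())
        if not key or PHONE.match(key) or key in OBVIOUS_PLACES:
            continue
        key = singular(key)
        if key not in seen:
            seen.add(key)
            cleaned.append(key)
    return cleaned
```

The plural rule is deliberately crude (it would mangle a word like "analysis"), which is in keeping with the "not bad is good enough" approach; a stemming library would be the more careful option.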
Tried to get the YCSSA team to help with the Neo4J graph database. They helped me to understand more about graphs and networks, and how there are some things even clever people (or computers) can’t understand. I started to try and create ONE BIG MATRIX – scary maths. DO THE SIMPLEST THING FIRST… even if it seems boring. Because then people will have something to look at and help you with.
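One way to picture the "one big matrix" is a people-by-concepts table of 0s and 1s; multiplying it by its own transpose counts the concepts each pair of people share, which is enough for a crude "if you like this person" list. The sketch below uses plain lists rather than a maths library, and the function names and ranking rule are assumptions, not what the project actually ran.

```python
def build_matrix(person_concepts):
    """Turn {person: set of concepts} into a 0/1 people-by-concepts matrix."""
    people = sorted(person_concepts)
    concepts = sorted({c for cs in person_concepts.values() for c in cs})
    matrix = [
        [1 if concept in person_concepts[person] else 0 for concept in concepts]
        for person in people
    ]
    return people, concepts, matrix


def shared_counts(matrix):
    """Matrix times its transpose: entry [i][j] counts shared concepts."""
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) for row_j in matrix]
        for row_i in matrix
    ]


def may_also_like(person, person_concepts, top=3):
    """Rank other people by how many concepts they share with `person`."""
    people, _, matrix = build_matrix(person_concepts)
    counts = shared_counts(matrix)
    i = people.index(person)
    ranked = sorted(
        (
            (counts[i][j], other)
            for j, other in enumerate(people)
            if j != i and counts[i][j] > 0
        ),
        key=lambda pair: (-pair[0], pair[1]),  # most shared first, then by name
    )
    return [other for _, other in ranked[:top]]
```

For a few hundred people this brute-force version is fine; the maths only gets scary when the matrix is large and sparse.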
NOT BAD IS GOOD ENOUGH… it’s whether a concept connects two things. EDITABLE is a must. THERE IS NO SILVER BULLET.
Rather than let people search then find nothing, show people what’s available and let them choose. A type-ahead search box only lets you search for what’s there. Linguistics. Eye of the Beholder: people happily ignore the non-relevant stuff.
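A type-ahead box of this kind boils down to prefix matching against the list of names that actually exist. A minimal server-side sketch, assuming a hypothetical `TypeAhead` class and an in-memory sorted list (the real site's implementation is not described in these notes):

```python
from bisect import bisect_left


class TypeAhead:
    """Suggest only names that exist, so a search can never come back empty-handed."""

    def __init__(self, names):
        # Sort by lowercased name so matching is case-insensitive.
        self._entries = sorted((name.lower(), name) for name in set(names))
        self._keys = [key for key, _ in self._entries]

    def suggest(self, prefix, limit=10):
        """Return up to `limit` known names starting with `prefix`."""
        prefix = prefix.lower()
        start = bisect_left(self._keys, prefix)  # first key >= prefix
        matches = []
        for key, name in self._entries[start:]:
            if not key.startswith(prefix) or len(matches) == limit:
                break
            matches.append(name)
        return matches
```

Because the keys are sorted, all names sharing a prefix sit next to each other, so one binary search finds where they start.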
Didn’t require anyone to enter data. Didn’t have to ask anyone. Cheap trick: biggest, squarest image. Maybe related (via Google). Like magic…
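The "biggest, squarest image" trick can be read as: pick the image that fits the largest square inside it, i.e. the one with the longest short side. That reading, the `pick_image` name and the `(url, width, height)` tuples are assumptions for this sketch; the notes do not say how the original scoring worked.

```python
def pick_image(images):
    """Pick the image with the longest short side.

    images: list of (url, width, height) tuples.
    Wide banners and tiny icons both lose to a large, roughly square picture,
    which on a staff page is usually the portrait.
    """
    usable = [img for img in images if img[1] > 0 and img[2] > 0]
    if not usable:
        return None
    return max(usable, key=lambda img: min(img[1], img[2]))[0]
```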
Wanted to change the direction of the project mid-way… because I stumbled upon a KILLER APP. A NEWS SITE: but every news article is linked to known data about the University of York… and Leeds and Sheffield…
The animation in the visualisation added a dimension of time to the information. Waiting. It actually saved a lot of coding BY ACCIDENT… by pulling well-connected concepts spatially nearer to each other… FUN. People VANITY SURF… then move wider. I have seen people using it, and use the words “I didn’t know they were doing that at Leeds”.
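The "jiggle" is what a force-directed layout does on every animation frame: linked nodes pull towards each other while all nodes push apart a little, so well-connected concepts drift together without any clustering code. A toy sketch of one such frame follows; the constants and function names are assumptions, and the visualisation library used on the site will have its own, more refined version.

```python
import math


def distance(positions, a, b):
    """Straight-line distance between two nodes."""
    return math.hypot(
        positions[a][0] - positions[b][0], positions[a][1] - positions[b][1]
    )


def layout_step(positions, edges, attraction=0.1, repulsion=0.01):
    """One animation frame of a toy force-directed layout.

    positions: {node: (x, y)}, edges: list of (node, node) pairs.
    Returns a new positions dict; the input is left unchanged.
    """
    moves = {node: [0.0, 0.0] for node in positions}
    nodes = list(positions)
    # Every pair of nodes pushes apart, more strongly when close together.
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            dx = positions[b][0] - positions[a][0]
            dy = positions[b][1] - positions[a][1]
            dist = math.hypot(dx, dy) or 1e-9
            push = repulsion / (dist * dist)
            moves[a][0] -= push * dx / dist
            moves[a][1] -= push * dy / dist
            moves[b][0] += push * dx / dist
            moves[b][1] += push * dy / dist
    # Linked nodes pull towards each other like a spring.
    for a, b in edges:
        dx = positions[b][0] - positions[a][0]
        dy = positions[b][1] - positions[a][1]
        moves[a][0] += attraction * dx
        moves[a][1] += attraction * dy
        moves[b][0] -= attraction * dx
        moves[b][1] -= attraction * dy
    return {
        node: (positions[node][0] + moves[node][0], positions[node][1] + moves[node][1])
        for node in positions
    }
```

Calling `layout_step` repeatedly and redrawing after each call gives the animation; the linked nodes settle close together while unlinked ones stay apart.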
Or “on something”. Waiting for our “social host”… would need better programmers, or more of a dev team workshop. Did JISC conference attendees’ blogs (from their Twitter accounts)… a way of “meeting people before a conference” perhaps. The Lots of Big Ideas Proposal at The Hub. Bring the web pages onto the walls. Proposed this to Liz Waller for Harry Fairhurst. Could have done with some JISC help (but also was scared by JISC advice). I probably wouldn’t do it again…
