SlideShare una empresa de Scribd logo
1 de 36
Descargar para leer sin conexión
+ => 1 million SPDX 
Large-scale license transparency using open data, open standards and F/OSS 
http://triplecheck.net http://searchcode.com
Speaker 
Slide #2 
Nuno Brito 
 Free/open source contributor since 2005 
 Last 12 months wrote 100k F/OSS lines of code 
 SPDX contributor, co-founder of TripleCheck 
Around the web 
http://nunobrito.eu
Transparency 
Slide #3 
Take some source code as example 
Who developed the code? 
Which licenses are applicable? 
Was the code copied from somewhere else?
Size 
Slide #4 
A problem of scale 
Open licenses? > 300 types to choose 
> 5 million F/OSS projects 
> 100 million source code files
Practice 
Slide #5 
Applying licenses 
 Burden on developer (do correctly, do enough) 
 Expressed differently (difficult to understand) 
 Scaling obstacles (scarce automation) 
Transparency?
What do? 
Slide #6 
Ideally, we'd have tooling that is.. 
a) Reachable 
b) Cooperative 
c) Free 
Choose two. (sad reality)
Choose three 
Slide #7 
Choose building blocks based on: 
a) Open standards 
b) Open data 
c) Reachable tools 
Learn, write, improve. 
Share.
Standards 
Slide #8 
SPDX: Open standard for software licensing 
 Standardizes license description 
 Defines Id for license terms 
 http://spdx.org 
Pro: Good docs, straightforward, getting better 
Cons: Slow adoption, scarce tooling
Open data 
Slide #9 
GitHub: Targeting open data repositories 
 API suited for intensive access 
 Social coding 
 Largest open source code collection 
Pro: Reachable, diverse 
Cons: Repositories processed one-by-one
Tooling 
Slide #10 
Custom-built tools for software licenses 
 Large-scale repository data-mining 
 Find applicable licenses inside content 
 Share millions of SPDX documents 
Pro: Learn by doing, modularized, single language 
Cons: Built from scratch, needs consolidation
Step 1 
Slide #11 
Desktop tool/engine to discover licenses 
 SPDX format as storage medium 
 Identify copyright and 18 license types 
 Java, released in Feb 2014. EUPL 
http://spdx.org/tools/community/triplecheck-reporter
Desktop 
Slide #12
File detail 
Slide #13
SPDX file 
Slide #14
Customize 
Slide #15
Details 
Slide #16 
Underneath the hood 
 147 file extensions, 18 license types 
 LOC, hashes (SHA1, MD5, SHA256, SSDEEP) 
 Command line supported (Jenkins, cron) 
 Fast, 40k files/minute (Pentium IV)
Step 2 
Discovering repositories with gitFinder 
Create a list of projects online to use as components. 
Get basic licensing information from each project. 
 Write text file with each github user (~7 million) 
 For each user, find repositories not forked (~10M) 
 Split each repository according to language (197) 
 For each list of language/reps, download code 
Slide #17
Performance 
Slide #18 
~70k repositories/day 
 Single machine (i7, 8Gb RAM, CentOS) 
 9 parallel threads 
 Resume/recover supported 
 Released in Jun. 2014 
https://github.com/triplecheck/gitfinder
Output 
Slide #19
Storage? 
https://what-if.xkcd.com/29/ (CC BY-NC 2.5) Slide #20
Storage 
BigZip, +100 million files on a single download 
Slide #21 
 Flat-file, zip compression (per entry) 
 Fast, simple, portable. Indexed search 
https://github.com/triplecheck/big
How it looks 
Slide #22
Step 3 
Slide #23 
SPDX search engine 
 One-click SPDX creation from open data 
 Visualize license and copyright data 
 Visit at http://searchcode.com/spdx
Example 
Slide #24 
Using the original URL.. 
 https://github.com/iuly/europa_kernel/ 
=> 
 https://spdxhub.com/iuly/europa_kernel/
Example 
Slide #25
SPDX-1M 
“Do It Yourself” kit. Generate 1 million SPDX 
Slide #26 
 https://github.com/triplecheck/diy 
 1.2 million open source projects 
 “Arduino” for s/w licenses detection 
9Gb worth of SPDX? Grab: 
http://triplecheck.net/public/storage/spdx.big
Screenshots 
Slide #27
Next step? 
Slide #28 
F2F – pinpointing non-original code 
 Decompose code into blocks 
 Tokenize/anonymize data 
 Find code matches across knowledge base 
ETA in Dec. 2014 
https://github.com/triplecheck/f2f
Preview 
Slide #29
Conclusion 
Slide #30 
What is now available for everyone 
 Desktop tooling / detection engine 
 Extraction of open data in scale 
 Search engine for SPDX
Questions? 
Slide #31 
http://spdx.org 
http://searchcode.com/spdx 
http://github.com/triplecheck 
Interesting stuff? 
Let us know: @nn81 @boyte #linuxcon 
http://xkcd.com/1118/
Backup slides 
Slide #32
Engine 
Slide #33
License DB 
Slide #34
Components 
Slide #35
Exporting 
Slide #36

Más contenido relacionado

La actualidad más candente

Open Source Software Concepts
Open Source Software ConceptsOpen Source Software Concepts
Open Source Software ConceptsJITENDRA LENKA
 
The Ring programming language version 1.5.1 book - Part 14 of 180
The Ring programming language version 1.5.1 book - Part 14 of 180The Ring programming language version 1.5.1 book - Part 14 of 180
The Ring programming language version 1.5.1 book - Part 14 of 180Mahmoud Samir Fayed
 
Philosophy of Open Source - SFO17-TR01
Philosophy of Open Source - SFO17-TR01Philosophy of Open Source - SFO17-TR01
Philosophy of Open Source - SFO17-TR01Linaro
 
For the Love of Tux: Linux on RISC-V
For the Love of Tux: Linux on RISC-VFor the Love of Tux: Linux on RISC-V
For the Love of Tux: Linux on RISC-VDrew Fustini
 
Introduction to FOSS, SRM University
Introduction to FOSS, SRM UniversityIntroduction to FOSS, SRM University
Introduction to FOSS, SRM UniversityAtul Jha
 
Benefits of Opensource Products
Benefits of Opensource ProductsBenefits of Opensource Products
Benefits of Opensource ProductsAnju Merin
 
Dynamic hacking with Guile (FOSDEM 2011)
Dynamic hacking with Guile (FOSDEM 2011)Dynamic hacking with Guile (FOSDEM 2011)
Dynamic hacking with Guile (FOSDEM 2011)Igalia
 
The open source philosophy
The open source philosophyThe open source philosophy
The open source philosophyGautam Krishnan
 
Free and open source software
Free and open source softwareFree and open source software
Free and open source softwareFrederik Questier
 
GNU GPL, LGPL, Apache licence Types and Differences
GNU GPL, LGPL, Apache licence Types and DifferencesGNU GPL, LGPL, Apache licence Types and Differences
GNU GPL, LGPL, Apache licence Types and DifferencesIresha Rubasinghe
 
Fundamentals of Free and Open Source Software
Fundamentals of Free and Open Source SoftwareFundamentals of Free and Open Source Software
Fundamentals of Free and Open Source SoftwareRoss Gardler
 
Open Source Presentation
Open Source PresentationOpen Source Presentation
Open Source PresentationAdhoura Academy
 
Avoiding the tragedy of the commons: some lessons from the Software Heritage ...
Avoiding the tragedy of the commons: some lessons from the Software Heritage ...Avoiding the tragedy of the commons: some lessons from the Software Heritage ...
Avoiding the tragedy of the commons: some lessons from the Software Heritage ...OW2
 
Free and Open Source Software
Free and Open Source SoftwareFree and Open Source Software
Free and Open Source Softwareiwilldo4u
 
Using oss and hacker culture at an internet company at osc/tokyo 2014/03/01
Using oss and hacker culture at an internet company at osc/tokyo 2014/03/01Using oss and hacker culture at an internet company at osc/tokyo 2014/03/01
Using oss and hacker culture at an internet company at osc/tokyo 2014/03/01Hiro Yoshioka
 

La actualidad más candente (20)

Open Source Software Concepts
Open Source Software ConceptsOpen Source Software Concepts
Open Source Software Concepts
 
The Ring programming language version 1.5.1 book - Part 14 of 180
The Ring programming language version 1.5.1 book - Part 14 of 180The Ring programming language version 1.5.1 book - Part 14 of 180
The Ring programming language version 1.5.1 book - Part 14 of 180
 
Philosophy of Open Source - SFO17-TR01
Philosophy of Open Source - SFO17-TR01Philosophy of Open Source - SFO17-TR01
Philosophy of Open Source - SFO17-TR01
 
For the Love of Tux: Linux on RISC-V
For the Love of Tux: Linux on RISC-VFor the Love of Tux: Linux on RISC-V
For the Love of Tux: Linux on RISC-V
 
Open Source and Free Software
Open Source and Free SoftwareOpen Source and Free Software
Open Source and Free Software
 
Introduction to FOSS, SRM University
Introduction to FOSS, SRM UniversityIntroduction to FOSS, SRM University
Introduction to FOSS, SRM University
 
Benefits of Opensource Products
Benefits of Opensource ProductsBenefits of Opensource Products
Benefits of Opensource Products
 
Python at a glance
Python at a glancePython at a glance
Python at a glance
 
Dynamic hacking with Guile (FOSDEM 2011)
Dynamic hacking with Guile (FOSDEM 2011)Dynamic hacking with Guile (FOSDEM 2011)
Dynamic hacking with Guile (FOSDEM 2011)
 
The open source philosophy
The open source philosophyThe open source philosophy
The open source philosophy
 
MSR09.ppt
MSR09.pptMSR09.ppt
MSR09.ppt
 
Free and open source software
Free and open source softwareFree and open source software
Free and open source software
 
GNU GPL, LGPL, Apache licence Types and Differences
GNU GPL, LGPL, Apache licence Types and DifferencesGNU GPL, LGPL, Apache licence Types and Differences
GNU GPL, LGPL, Apache licence Types and Differences
 
Fundamentals of Free and Open Source Software
Fundamentals of Free and Open Source SoftwareFundamentals of Free and Open Source Software
Fundamentals of Free and Open Source Software
 
Kivy report
Kivy reportKivy report
Kivy report
 
Open Source Presentation
Open Source PresentationOpen Source Presentation
Open Source Presentation
 
Avoiding the tragedy of the commons: some lessons from the Software Heritage ...
Avoiding the tragedy of the commons: some lessons from the Software Heritage ...Avoiding the tragedy of the commons: some lessons from the Software Heritage ...
Avoiding the tragedy of the commons: some lessons from the Software Heritage ...
 
Free and Open Source Software
Free and Open Source SoftwareFree and Open Source Software
Free and Open Source Software
 
Foss Presentation
Foss PresentationFoss Presentation
Foss Presentation
 
Using oss and hacker culture at an internet company at osc/tokyo 2014/03/01
Using oss and hacker culture at an internet company at osc/tokyo 2014/03/01Using oss and hacker culture at an internet company at osc/tokyo 2014/03/01
Using oss and hacker culture at an internet company at osc/tokyo 2014/03/01
 

Similar a 2014 10-14: GitHub plus FOSS == 1 million SPDX

Android Developer Meetup
Android Developer MeetupAndroid Developer Meetup
Android Developer MeetupMedialets
 
Automate your iOS deployment a bit
Automate your iOS deployment a bitAutomate your iOS deployment a bit
Automate your iOS deployment a bitMichał Łukasiewicz
 
Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...
Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...
Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...sparkfabrik
 
Ubucon 2013, licensing and packaging OSS
Ubucon 2013, licensing and packaging OSSUbucon 2013, licensing and packaging OSS
Ubucon 2013, licensing and packaging OSSNuno Brito
 
Open frameworks 101_fitc
Open frameworks 101_fitcOpen frameworks 101_fitc
Open frameworks 101_fitcbenDesigning
 
Hacking the Kinect with GAFFTA Day 1
Hacking the Kinect with GAFFTA Day 1Hacking the Kinect with GAFFTA Day 1
Hacking the Kinect with GAFFTA Day 1benDesigning
 
Module 18 (linux hacking)
Module 18 (linux hacking)Module 18 (linux hacking)
Module 18 (linux hacking)Wail Hassan
 
Become Rick and famous, thanks to Open Source
Become Rick and famous, thanks to Open SourceBecome Rick and famous, thanks to Open Source
Become Rick and famous, thanks to Open SourceGeeks Anonymes
 
2nd ARM Developer Day - mbed Workshop - ARM
2nd ARM Developer Day - mbed Workshop - ARM2nd ARM Developer Day - mbed Workshop - ARM
2nd ARM Developer Day - mbed Workshop - ARMAntonio Mondragon
 
Introduction to License Compliance and My research (D. German)
Introduction to License Compliance and My research (D. German)Introduction to License Compliance and My research (D. German)
Introduction to License Compliance and My research (D. German)dmgerman
 
Scanning Docker Images with ScanCode.io
Scanning Docker Images with ScanCode.ioScanning Docker Images with ScanCode.io
Scanning Docker Images with ScanCode.ioMichael Herzog
 
Software Heritage, a revolutionary infrastructure for software source code, O...
Software Heritage, a revolutionary infrastructure for software source code, O...Software Heritage, a revolutionary infrastructure for software source code, O...
Software Heritage, a revolutionary infrastructure for software source code, O...OW2
 
OpenNTF Webinar 05/07/13: OpenNTF - The IBM Collaboration Solutions App Dev C...
OpenNTF Webinar 05/07/13: OpenNTF - The IBM Collaboration Solutions App Dev C...OpenNTF Webinar 05/07/13: OpenNTF - The IBM Collaboration Solutions App Dev C...
OpenNTF Webinar 05/07/13: OpenNTF - The IBM Collaboration Solutions App Dev C...Niklas Heidloff
 
Lab Handson: Power your Creations with Intel Edison!
Lab Handson: Power your Creations with Intel Edison!Lab Handson: Power your Creations with Intel Edison!
Lab Handson: Power your Creations with Intel Edison!Codemotion
 
Microsoft Embracing Open Source Technologies
Microsoft Embracing Open Source TechnologiesMicrosoft Embracing Open Source Technologies
Microsoft Embracing Open Source TechnologiesRicardo Peres
 
Software Heritage: Archiving the Free Software Commons for Fun & Profit
Software Heritage: Archiving the Free Software Commons for Fun & ProfitSoftware Heritage: Archiving the Free Software Commons for Fun & Profit
Software Heritage: Archiving the Free Software Commons for Fun & ProfitSpeck&Tech
 
DT2014-15 S01: Digital Toolbox
DT2014-15 S01: Digital ToolboxDT2014-15 S01: Digital Toolbox
DT2014-15 S01: Digital ToolboxCarlos Cámara
 
Open source freeopensource & linux
Open source freeopensource & linuxOpen source freeopensource & linux
Open source freeopensource & linuxManura Perera
 
Tech Talk - Blockchain presentation
Tech Talk - Blockchain presentationTech Talk - Blockchain presentation
Tech Talk - Blockchain presentationLaura Steggles
 

Similar a 2014 10-14: GitHub plus FOSS == 1 million SPDX (20)

Android Developer Meetup
Android Developer MeetupAndroid Developer Meetup
Android Developer Meetup
 
Automate your iOS deployment a bit
Automate your iOS deployment a bitAutomate your iOS deployment a bit
Automate your iOS deployment a bit
 
Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...
Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...
Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...
 
Ubucon 2013, licensing and packaging OSS
Ubucon 2013, licensing and packaging OSSUbucon 2013, licensing and packaging OSS
Ubucon 2013, licensing and packaging OSS
 
Open frameworks 101_fitc
Open frameworks 101_fitcOpen frameworks 101_fitc
Open frameworks 101_fitc
 
Hacking the Kinect with GAFFTA Day 1
Hacking the Kinect with GAFFTA Day 1Hacking the Kinect with GAFFTA Day 1
Hacking the Kinect with GAFFTA Day 1
 
Module 18 (linux hacking)
Module 18 (linux hacking)Module 18 (linux hacking)
Module 18 (linux hacking)
 
Become Rick and famous, thanks to Open Source
Become Rick and famous, thanks to Open SourceBecome Rick and famous, thanks to Open Source
Become Rick and famous, thanks to Open Source
 
2nd ARM Developer Day - mbed Workshop - ARM
2nd ARM Developer Day - mbed Workshop - ARM2nd ARM Developer Day - mbed Workshop - ARM
2nd ARM Developer Day - mbed Workshop - ARM
 
Introduction to License Compliance and My research (D. German)
Introduction to License Compliance and My research (D. German)Introduction to License Compliance and My research (D. German)
Introduction to License Compliance and My research (D. German)
 
Scanning Docker Images with ScanCode.io
Scanning Docker Images with ScanCode.ioScanning Docker Images with ScanCode.io
Scanning Docker Images with ScanCode.io
 
Software Heritage, a revolutionary infrastructure for software source code, O...
Software Heritage, a revolutionary infrastructure for software source code, O...Software Heritage, a revolutionary infrastructure for software source code, O...
Software Heritage, a revolutionary infrastructure for software source code, O...
 
OpenNTF Webinar 05/07/13: OpenNTF - The IBM Collaboration Solutions App Dev C...
OpenNTF Webinar 05/07/13: OpenNTF - The IBM Collaboration Solutions App Dev C...OpenNTF Webinar 05/07/13: OpenNTF - The IBM Collaboration Solutions App Dev C...
OpenNTF Webinar 05/07/13: OpenNTF - The IBM Collaboration Solutions App Dev C...
 
Lab Handson: Power your Creations with Intel Edison!
Lab Handson: Power your Creations with Intel Edison!Lab Handson: Power your Creations with Intel Edison!
Lab Handson: Power your Creations with Intel Edison!
 
Microsoft Embracing Open Source Technologies
Microsoft Embracing Open Source TechnologiesMicrosoft Embracing Open Source Technologies
Microsoft Embracing Open Source Technologies
 
Software Heritage: Archiving the Free Software Commons for Fun & Profit
Software Heritage: Archiving the Free Software Commons for Fun & ProfitSoftware Heritage: Archiving the Free Software Commons for Fun & Profit
Software Heritage: Archiving the Free Software Commons for Fun & Profit
 
DT2014-15 S01: Digital Toolbox
DT2014-15 S01: Digital ToolboxDT2014-15 S01: Digital Toolbox
DT2014-15 S01: Digital Toolbox
 
UnDeveloper Studio
UnDeveloper StudioUnDeveloper Studio
UnDeveloper Studio
 
Open source freeopensource & linux
Open source freeopensource & linuxOpen source freeopensource & linux
Open source freeopensource & linux
 
Tech Talk - Blockchain presentation
Tech Talk - Blockchain presentationTech Talk - Blockchain presentation
Tech Talk - Blockchain presentation
 

Más de Nuno Brito

Triplechecheck induction-presentation-sample
Triplechecheck induction-presentation-sampleTriplechecheck induction-presentation-sample
Triplechecheck induction-presentation-sampleNuno Brito
 
Stop look and listen before you talk
Stop look and listen before you talkStop look and listen before you talk
Stop look and listen before you talkNuno Brito
 
Lifes Good In Portugal
Lifes Good In PortugalLifes Good In Portugal
Lifes Good In PortugalNuno Brito
 
Managing business relationships
Managing business relationshipsManaging business relationships
Managing business relationshipsNuno Brito
 
Explaining the WinBuilder framework
Explaining the WinBuilder frameworkExplaining the WinBuilder framework
Explaining the WinBuilder frameworkNuno Brito
 
White paper - Adhoc 2.0
White paper - Adhoc 2.0White paper - Adhoc 2.0
White paper - Adhoc 2.0Nuno Brito
 

Más de Nuno Brito (6)

Triplechecheck induction-presentation-sample
Triplechecheck induction-presentation-sampleTriplechecheck induction-presentation-sample
Triplechecheck induction-presentation-sample
 
Stop look and listen before you talk
Stop look and listen before you talkStop look and listen before you talk
Stop look and listen before you talk
 
Lifes Good In Portugal
Lifes Good In PortugalLifes Good In Portugal
Lifes Good In Portugal
 
Managing business relationships
Managing business relationshipsManaging business relationships
Managing business relationships
 
Explaining the WinBuilder framework
Explaining the WinBuilder frameworkExplaining the WinBuilder framework
Explaining the WinBuilder framework
 
White paper - Adhoc 2.0
White paper - Adhoc 2.0White paper - Adhoc 2.0
White paper - Adhoc 2.0
 

Último

Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 

Último (20)

Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 

2014 10-14: GitHub plus FOSS == 1 million SPDX

  • 1. + => 1 million SPDX Large-scale license transparency using open data, open standards and F/OSS http://triplecheck.net http://searchcode.com
  • 2. Speaker Slide #2 Nuno Brito  Free/open source contributor since 2005  Last 12 months wrote 100k F/OSS lines of code  SPDX contributor, co-founder of TripleCheck Around the web http://nunobrito.eu
  • 3. Transparency Slide #3 Take some source code as example Who developed the code? Which licenses are applicable? Was the code copied from somewhere else?
  • 4. Size Slide #4 A problem of scale Open licenses? > 300 types to choose > 5 million F/OSS projects > 100 million source code files
  • 5. Practice Slide #5 Applying licenses  Burden on developer (do correctly, do enough)  Expressed differently (difficult to understand)  Scaling obstacles (scarce automation) Transparency?
  • 6. What do? Slide #6 Ideally, we'd have tooling that is.. a) Reachable b) Cooperative c) Free Choose two. (sad reality)
  • 7. Choose three Slide #7 Choose building blocks based on: a) Open standards b) Open data c) Reachable tools Learn, write, improve. Share.
  • 8. Standards Slide #8 SPDX: Open standard for software licensing  Standardizes license description  Defines Id for license terms  http://spdx.org Pro: Good docs, straightforward, getting better Cons: Slow adoption, scarce tooling
  • 9. Open data Slide #9 GitHub: Targeting open data repositories  API suited for intensive access  Social coding  Largest open source code collection Pro: Reachable, diverse Cons: Repositories processed one-by-one
  • 10. Tooling Slide #10 Custom-built tools for software licenses  Large-scale repository data-mining  Find applicable licenses inside content  Share millions of SPDX documents Pro: Learn by doing, modularized, single language Cons: Built from scratch, needs consolidation
  • 11. Step 1 Slide #11 Desktop tool/engine to discover licenses  SPDX format as storage medium  Identify copyright and 18 license types  Java, released in Feb 2014. EUPL http://spdx.org/tools/community/triplecheck-reporter
  • 16. Details Slide #16 Underneath the hood  147 file extensions, 18 license types  LOC, hashes (SHA1, MD5, SHA256, SSDEEP)  Command line supported (Jenkins, cron)  Fast, 40k files/minute (Pentium IV)
  • 17. Step 2 Discovering repositories with gitFinder Create a list of projects online to use as components. Get basic licensing information from each project.  Write text file with each github user (~7 million)  For each user, find repositories not forked (~10M)  Split each repository according to language (197)  For each list of language/reps, download code Slide #17
  • 18. Performance Slide #18 ~70k repositories/day  Single machine (i7, 8Gb RAM, CentOS)  9 parallel threads  Resume/recover supported  Released in Jun. 2014 https://github.com/triplecheck/gitfinder
  • 21. Storage BigZip, +100 million files on a single download Slide #21  Flat-file, zip compression (per entry)  Fast, simple, portable. Indexed search https://github.com/triplecheck/big
  • 22. How it looks Slide #22
  • 23. Step 3 Slide #23 SPDX search engine  One-click SPDX creation from open data  Visualize license and copyright data  Visit at http://searchcode.com/spdx
  • 24. Example Slide #24 Using the original URL..  https://github.com/iuly/europa_kernel/ =>  https://spdxhub.com/iuly/europa_kernel/
  • 26. SPDX-1M “Do It Yourself” kit. Generate 1 million SPDX Slide #26  https://github.com/triplecheck/diy  1.2 million open source projects  “Arduino” for s/w licenses detection 9Gb worth of SPDX? Grab: http://triplecheck.net/public/storage/spdx.big
  • 28. Next step? Slide #28 F2F – pinpointing non-original code  Decompose code into blocks  Tokenize/anonymize data  Find code matches across knowledge base ETA in Dec. 2014 https://github.com/triplecheck/f2f
  • 30. Conclusion Slide #30 What is now available for everyone  Desktop tooling / detection engine  Extraction of open data in scale  Search engine for SPDX
  • 31. Questions? Slide #31 http://spdx.org http://searchcode.com/spdx http://github.com/triplecheck Interesting stuff? Let us know: @nn81 @boyte #linuxcon http://xkcd.com/1118/