STRUCTURAL PROFILING OF WEB SITES IN THE WILD
LABORATOIRE D’INFORMATIQUE FORMELLE, UNIVERSITÉ DU QUÉBEC À CHICOUTIMI
XAVIER CHAMBERLAND-THIBEAULT AND SYLVAIN HALLÉ
ICWE, JUNE 9, 2020
THE REASONING BEHIND THIS PAPER

DEBUGGING AND FIXING WEB APPLICATIONS
• A growing number of tools are being created to help analyze, debug, detect errors in, or even process the output of web applications.
• Most of these tools focus on analyzing the Document Object Model (DOM) and the Cascading Style Sheets (CSS) of a page.
• These tools serve varied purposes:
  ◦ Fixing cross-browser issues;
  ◦ Interpreting the DOM;
  ◦ Detecting responsive web design bugs;
  ◦ Etc.
WHAT DOES A WEB PAGE LOOK LIKE?
• The scalability of most of these tools, and sometimes even their success, depends on size-related features of the page.
• What is the average size of a web page?
• Walsh et al. (2015) ran experiments on pages of up to 196 DOM nodes, whereas Choudhary et al. (2013) chose pages of up to 39,146 DOM nodes.
• This paper addresses the question with a large-scale analysis of 708 websites, measuring an array of parameters related to the size and structure of web pages.
METHODOLOGY
• Website collection
• DOM harvesting
• Data processing
WEBSITE COLLECTION
• To obtain a pool of websites representative of what users actually visit, it was necessary to include the sites visited by the most users.
• To that end, the Moz list of the top 500 most frequented websites was used. However, it contained many duplicates in the form of country-specific versions of the same web application.
• Of those 500 sites, only 300 non-duplicates remained.
• Yet the sites visited by the most users do not tell the whole story, since this notion is orthogonal to the sites a given individual visits most often.
• Therefore, we informally asked people around us to provide the list of websites they use daily.
DOM HARVESTING
• To collect data on the DOM of each of these sites, a JavaScript program was designed to run once a page has finished loading.
• The script starts at the body node of a page and performs a pre-order traversal of the entire DOM tree, recording and computing various features:
  ◦ Tag names;
  ◦ CSS classes;
  ◦ Visibility status;
  ◦ Structural information.
• The script then generates two files: a JSON file containing all the data, and a DOT file accepted by the Graphviz library, so that both statistics and a visual representation of a web page could be obtained.
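The traversal described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `harvest`, the record fields, and the `sample` tree are assumptions made for the example. Visibility status is left out because it depends on browser-computed styles. The function accepts any object exposing `tagName`, `classList` and `children`, which includes a browser's `document.body`.

```javascript
// Sketch: pre-order walk over an element tree, producing one record per node.
// Works on any object exposing tagName, classList and children.
function harvest(root) {
  const records = [];
  // Explicit stack instead of recursion, so very deep trees cannot
  // overflow the call stack.
  const stack = [{ node: root, depth: 0, parent: -1 }];
  while (stack.length > 0) {
    const { node, depth, parent } = stack.pop();
    const id = records.length; // index in visiting order
    const children = Array.from(node.children || []);
    records.push({
      id,
      parent, // id of the parent record, -1 for the root
      depth,
      tag: String(node.tagName).toLowerCase(),
      classes: Array.from(node.classList || []),
      degree: children.length, // number of child elements
    });
    // Push children in reverse so the leftmost child is visited first.
    for (let i = children.length - 1; i >= 0; i--) {
      stack.push({ node: children[i], depth: depth + 1, parent: id });
    }
  }
  return records;
}

// Tiny stand-in tree so the sketch also runs outside a browser.
const sample = {
  tagName: "BODY", classList: [], children: [
    { tagName: "DIV", classList: ["nav", "dark"], children: [
      { tagName: "A", classList: ["link"], children: [] },
    ] },
    { tagName: "P", classList: [], children: [] },
  ],
};

// The JSON output mentioned on the slide could then be a plain serialization.
console.log(JSON.stringify(harvest(sample), null, 2));
```

In a browser, `harvest(document.body)` would walk the rendered page in the same way.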
DOM HARVESTING – RUNNING ON EVERY PAGE
• To run the script on every page, the TamperMonkey extension was used.
• This extension, available for multiple browsers, lets the user inject and run custom JavaScript code every time a new page is loaded in the browser.
• Note that harvesting was performed on the browser-rendered DOM and its properties.
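A userscript of the kind described might look like the sketch below. The metadata keys (`@match`, `@run-at`, `@grant`) are standard TamperMonkey headers. The script name, the helper names `countElements` and `installHarvester`, and the choice to log only the URL and node count are assumptions for illustration; the authors' actual script is not shown in the slides.

```javascript
// ==UserScript==
// @name         DOM harvest sketch
// @match        *://*/*
// @run-at       document-idle
// @grant        none
// ==/UserScript==

// Count element nodes under root (root included).
function countElements(root) {
  let n = 1;
  for (const child of Array.from(root.children || [])) {
    n += countElements(child);
  }
  return n;
}

// Call `report` once the page has finished loading.
function installHarvester(win, report) {
  const run = () => report(win.location.href, countElements(win.document.body));
  if (win.document.readyState === "complete") {
    run(); // page already loaded
  } else {
    win.addEventListener("load", run, { once: true });
  }
}

// Start automatically only inside a browser; elsewhere the functions
// are merely defined.
if (typeof window !== "undefined" && typeof document !== "undefined") {
  installHarvester(window, (url, size) => console.log(url, size));
}
```

Passing the window object as a parameter keeps the load-handling logic testable outside a browser.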
DATA PROCESSING
• LabPal was used to process the 62 MB of raw data:
  ◦ Each website was turned into an experiment that processes the associated JSON file;
  ◦ It was then possible to aggregate all the recovered data and even perform deeper statistical analysis.
• Note that some files were not used, since the automated loading also retrieved many pop-ups.
• Manually inspecting each recovered file to detect pop-ups would have been tedious. A more generic filter was therefore used, which removes most of these pages by discarding every file with fewer than 5 DOM nodes or whose URL belongs to a list of known advertisement pages.
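The pop-up filter described above could be expressed as below. Only the threshold of 5 nodes comes from the slides. The record shape (`url`, `nodeCount`), the host names in `AD_HOSTS`, and matching by host name rather than by full URL are assumptions for illustration. The real processing was done in LabPal, which this sketch does not use.

```javascript
// Hypothetical record shape: { url: string, nodeCount: number }.
const MIN_NODES = 5; // threshold stated on the slide

// Placeholder hosts; the actual advertisement list is not given in the slides.
const AD_HOSTS = new Set(["ads.example.com", "popups.example.net"]);

// Return true when a harvested file should be kept for analysis.
function isKept(record, adHosts = AD_HOSTS) {
  if (record.nodeCount < MIN_NODES) return false; // too small: likely a pop-up
  let host;
  try {
    host = new URL(record.url).hostname;
  } catch {
    return false; // unparseable URL: treated as unusable (an assumption)
  }
  return !adHosts.has(host);
}

const harvested = [
  { url: "https://example.org/", nodeCount: 812 },
  { url: "https://ads.example.com/banner", nodeCount: 40 },
  { url: "https://example.org/popup", nodeCount: 3 },
];
const kept = harvested.filter((r) => isKept(r));
console.log(kept.map((r) => r.url));
```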
RESULTS

GRAPHICAL REPRESENTATION OF A WEBSITE
• Each color represents a different HTML tag name.
• The root of the tree, the body tag, is represented by the black square.
• This is the representation of Zippyshare.com.
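A DOT file with this kind of coloring could be generated along the following lines. This is a sketch under stated assumptions: the function name `toDot`, the palette, and the node naming (`n0`, `n1`, ...) are illustrative, and the colors of the original figure are not known. The attributes `shape`, `style`, `fillcolor` and `label` are standard Graphviz node attributes.

```javascript
// Arbitrary palette; colors repeat once there are more tag names than entries.
const PALETTE = ["red", "blue", "green", "orange", "purple", "brown"];

// Build a DOT description of an element tree: one filled node per element,
// colored by tag name, with the root drawn as a black square.
function toDot(root) {
  const colorOf = new Map(); // tag name -> color, assigned on first sight
  const lines = [
    "digraph dom {",
    '  node [shape=circle, style=filled, label=""];',
  ];
  let nextId = 0;
  function visit(node, parentId) {
    const id = nextId++;
    const tag = String(node.tagName).toLowerCase();
    if (parentId === null) {
      lines.push(`  n${id} [shape=square, fillcolor=black];`);
    } else {
      if (!colorOf.has(tag)) {
        colorOf.set(tag, PALETTE[colorOf.size % PALETTE.length]);
      }
      lines.push(`  n${id} [fillcolor=${colorOf.get(tag)}];`);
      lines.push(`  n${parentId} -> n${id};`);
    }
    for (const child of Array.from(node.children || [])) visit(child, id);
  }
  visit(root, null);
  lines.push("}");
  return lines.join("\n");
}

// Stand-in tree: a body with two div children and one p child.
const page = {
  tagName: "BODY", children: [
    { tagName: "DIV", children: [] },
    { tagName: "DIV", children: [] },
    { tagName: "P", children: [] },
  ],
};
const dot = toDot(page);
console.log(dot);
```

The resulting text can be saved to a file and rendered with a Graphviz layout program such as `dot`.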
RESULTS
[Figure: Cumulative distribution of websites based on the size of the DOM tree]
[Figure: Distribution of websites based on DOM tree depth]
RESULTS
[Figure: Cumulative distribution of websites based on maximum node degree]
[Figure: Distribution of websites based on maximum node degree]
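The structural quantities plotted in these figures (tree size, depth, maximum node degree) can be computed in a single pass over a tree, as sketched below. The names `treeMetrics` and `cumulativeFraction` are hypothetical. Two assumptions are made: "degree" means the number of child elements of a node, and the root alone counts as depth 1. The slides do not state either convention.

```javascript
// Compute size, depth and maximum node degree of an element tree.
function treeMetrics(root) {
  let size = 0;
  let depth = 0;
  let maxDegree = 0;
  const stack = [[root, 1]]; // assumption: the root alone has depth 1
  while (stack.length > 0) {
    const [node, d] = stack.pop();
    const children = Array.from(node.children || []);
    size += 1;
    depth = Math.max(depth, d);
    maxDegree = Math.max(maxDegree, children.length);
    for (const child of children) stack.push([child, d + 1]);
  }
  return { size, depth, maxDegree };
}

// One point of a cumulative distribution: the fraction of sites
// whose value is at most x.
function cumulativeFraction(values, x) {
  return values.filter((v) => v <= x).length / values.length;
}

// Stand-in tree: body -> [div -> [a, a, a], p].
const tree = {
  children: [
    { children: [{ children: [] }, { children: [] }, { children: [] }] },
    { children: [] },
  ],
};
console.log(treeMetrics(tree));
console.log(cumulativeFraction([10, 20, 30, 40], 20));
```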
RESULTS
[Figure: Total number of elements per visibility status]
[Figure: Distribution of websites according to the fraction of all DOM nodes that are invisible]
RESULTS
[Figure: Size of the DOM tree vs. number of CSS classes]
[Figure: Cumulative distribution of websites based on the average size of a CSS class]
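The class statistics behind these plots might be derived as in the sketch below. It assumes that the "size" of a CSS class means the number of DOM nodes carrying that class, which the slides do not define. The function name `classStats` and the input shape (one array of class names per node) are likewise illustrative.

```javascript
// classLists: one array of class names per DOM node.
function classStats(classLists) {
  const counts = new Map(); // class name -> number of nodes using it
  for (const classes of classLists) {
    // A Set guards against the same class being listed twice on one node.
    for (const name of new Set(classes)) {
      counts.set(name, (counts.get(name) || 0) + 1);
    }
  }
  let total = 0;
  for (const n of counts.values()) total += n;
  const numClasses = counts.size;
  return {
    numClasses,
    averageSize: numClasses === 0 ? 0 : total / numClasses,
  };
}

// Four nodes: nav is used once, dark twice, link twice.
const stats = classStats([["nav", "dark"], ["link"], [], ["link", "dark"]]);
console.log(stats);
```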
THREATS TO VALIDITY
• Website sample
• Variance due to browser
• Homepage analysis
REFERENCES
• Walsh, T.A., McMinn, P., Kapfhammer, G.M.: Automatic detection of potential layout faults following changes to responsive web pages (N). In: Cohen, M.B., Grunske, L., Whalen, M. (eds.) Proc. ASE 2015. pp. 709–714. IEEE Computer Society (2015)
• Choudhary, S.R., Prasad, M.R., Orso, A.: X-PERT: accurate identification of cross-browser issues in web applications. In: Notkin, D., Cheng, B.H.C., Pohl, K. (eds.) Proc. ICSE 2013. pp. 702–711. IEEE Computer Society (2013)
• The Moz top 500 websites, https://moz.com/top500, accessed October 20th, 2019
• All pictures used are licence free
