Mapping Spatial Patterns of WHAIFinder Usage to Measure Community Outreach Effectiveness, by Howard Veregin
1. MAPPING SPATIAL PATTERNS
OF WHAIFINDER USAGE TO
MEASURE COMMUNITY
OUTREACH EFFECTIVENESS
Howard Veregin
Wisconsin State Cartographer
AJ Wortley
Sr. Outreach Specialist
9. Research Questions
How effectively is WHAIFinder
reaching its target community?
What is the relationship between
WHAIFinder usage and population?
Is the relationship affected by
Broadband access?
10. Marketing Focus
Web analytics has e-commerce
focus: marketing goals
Methods adaptable to
nonprofits: similar
underlying indicators
of effectiveness
13. Relevance for PGIS
Continuum of engagement levels:
Information delivery to empowerment
Common challenge: Assessment of
program effectiveness
14. What is Web Analytics?
Web Analytics Association: “The
measurement, collection, analysis and
reporting of Internet data for the purposes of
understanding and optimizing Web usage”
15. Web Analytics Approaches
Method 1: Usability testing
Formal assessments based on user testing in a controlled environment.
Purpose: Enhance Web site design, navigation, functionality, and topology to improve user experience.
Method 2: User feedback
Collection and analysis of written or verbal feedback from users.
Purpose: Assess user profile, demographics, satisfaction, and use.
Method 3: Usage data
Collection and analysis of quantitative data about Web site traffic.
Purpose: Evaluate pages viewed, number of visits, unique visitors, searches run, etc.
Method 4: Technical performance data
Measurement of Web site and Internet performance.
Purpose: Quantify latency, availability, data transfer rates, etc.
21. WHAIFinder Usage
For the 1-year period from 6/15/11 to 6/14/12
21,109 pageviews
16,497 visits
9,725 unique visitors
~ 0.17% of Wisconsin’s pop.
~ 27 visitors per day
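The two derived figures above can be reproduced with simple arithmetic. The sketch below is illustrative only: the Wisconsin population value (the 2010 Census count of 5,686,986) is an assumption, since the slide does not say which population figure was used.

```python
from datetime import date

unique_visitors = 9725            # from the slide
wi_population_2010 = 5_686_986    # assumption: 2010 Census count for Wisconsin

# Inclusive length of the reporting period (6/15/11 through 6/14/12).
days = (date(2012, 6, 14) - date(2011, 6, 15)).days + 1

share_of_population = 100 * unique_visitors / wi_population_2010
visitors_per_day = unique_visitors / days

print(f"{days} days, {share_of_population:.2f}% of population, "
      f"{visitors_per_day:.0f} visitors per day")
```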
25. Broadband Access
Coefficient   Estimate   t-value   P > |t|
Intercept     -6.164     -6.092    < 0.001
Slope          0.934      9.874    < 0.001
R^2 = 0.589, n = 70, F = 97.500, p < 0.001
Visitors = 0.00210 × Access^0.934
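The power-law form follows from back-transforming the double-log regression. The sketch below assumes natural logarithms were used, which is consistent with exp(-6.164) being approximately 0.00210. The function name `predicted_visitors` is hypothetical, and the F check uses the standard identity for a regression with one predictor.

```python
import math

intercept = -6.164   # estimate from the table (log scale)
slope = 0.934
r_squared = 0.589
n = 70

# ln(Visitors) = intercept + slope * ln(Access)
# => Visitors = exp(intercept) * Access ** slope
multiplier = math.exp(intercept)

# For a one-predictor regression, F = (R^2 / (1 - R^2)) * (n - 2).
f_statistic = (r_squared / (1 - r_squared)) * (n - 2)

def predicted_visitors(access):
    """Predicted visitors for a county where `access` people have broadband."""
    return multiplier * access ** slope

print(f"multiplier = {multiplier:.5f}")
```

Because the slope is below 1, doubling `access` less than doubles the predicted visitor count.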
26. Other Results
Better results obtained with curve-fitting
model (vs. log transformations)
Dane county identified as unique outlier
based on regression metrics
Model without Dane County:
Visitors = 0.0584 × Access^0.656 (R^2 = 0.646)
28. Summary of Results
Relatively uniform usage across state.
Some areas of under-representation.
Some pockets of over-representation.
Broadband access has limited effect.
Some variation still unexplained.
29. Conclusions
Useful assessment of information delivery
effectiveness.
Identification of successes and areas for
improvement.
Quantification, analysis and mapping, not
just anecdotal evaluation.
Set and assess outreach goals, evaluate ROI.
Web analytics to measure usage patterns for an online mapping app. WHAIFinder (Wisconsin Historical Aerial Image Finder) used as case study. Analyze statistics and maps to quantify relationship between WHAIFinder usage and local population. Evaluate whether or not WHAIFinder is reaching its intended audience equitably (our definition of “effectiveness” in this case). Eventually, assess whether SCO’s technology-based outreach program is a cost-effective way to disseminate information.
An increasingly common theme in higher ed, due to questions of accountability and value (what value does the university provide?). As exemplified by Cooperative Extension, library access and assistance to the general public, faculty service activities, specific outreach events (GIS Day), programs like SCO, and open education initiatives. Historically, the idea has been a one-way flow of knowledge outward; now this is being reformulated in terms of an interactive model.
Outreach has long been important at UW-Madison. Wisconsin Idea: (1) The university should influence citizens’ lives beyond the classroom. (2) The knowledge and skills of the university should be used to enhance the lives of citizens throughout the state. (3) Increased emphasis on engagement and two-way information flow rather than just “top-down”. (4) It is a social contract.
Many of our duties pertain to outreach.
SCO relies heavily on technology-based outreach methods, a way to connect and disseminate information via the Web. Especially true for our online mapping apps. These automate manual processes; they extend the capabilities of SCO staff to respond to users’ requests. Methods have low marginal cost; once an application is developed and deployed, it can be accessed by anyone at any time of day without the need for manual assistance. For users, they provide expanded access to geospatial data and information, particularly for users in remote parts of the state. Without these applications online, many individuals would be unable to access the data they need in a timely or economical manner. The question is: Are these benefits really being realized?
Focus on WHAIFinder. Online access to > 38,000 scanned aerial photographs of Wisconsin from the 1930s and 1940s. Grant from the Baldwin Wisconsin Idea Endowment awarded to SCO, Robinson Map Library and UW Digital Collections Center. Goal: (a) increase public access to the collection and (b) increase ease of access by individuals far from the paper collection at UW-Madison. WHAIFinder audience is defined in the widest possible terms to include not only geospatial professionals and university researchers, but also the general public. WHAIFinder launched in February 2011. One of the SCO’s most popular online applications, averaging over 1,700 pageviews per month. Broad user community, with 80% of traffic originating in Wisconsin, 19% from other parts of the US, and 1% from other countries.
Is there a correlation between WHAIFinder usage and population? This would be expected if the application were reaching all areas of the state equally. Are there areas that are being under-served? Is usage concentrated in certain regions of the state? Does access to broadband Internet affect the relationship between visitors and population? This might be expected given the large download file size for WHAIFinder images.
Web analytics has a strong e-commerce focus.
- Many Web analytics studies are concerned with improving Web site performance relative to specific marketing objectives.
- The best-known Web analytics blogs also have this commercial orientation.
However, Web analytics methods can be adapted for use in other areas as well.
- Nonprofit Web sites still exhibit underlying indicators of success and patron satisfaction that can be very effective for measuring performance and effectiveness.
- Hence Web analytics methods derived from e-commerce can be quite valuable for nonprofits, by translating the theories of profit-based consumer usage into the specific goals of the nonprofit organization.
Libraries
- Libraries are at the forefront of Web analytics for such assessments.
- In part because of the rise of digital libraries with no physical collection.
- Increasingly need to rationalize and justify costs.
Various studies look at overall usage, market penetration (saturation) rates, referred traffic, visit duration, and other KPIs (Key Performance Indicators) indicative of success and effectiveness.
Open Education Initiatives
- e.g., National Science Digital Library, Open.Michigan initiative.
- National Science Digital Library (NSDL), launched in 2000 as an online resource for science, technology, engineering, and mathematics (STEM) education.
- NSDL is a distributed digital library providing a range of resources and materials for teachers and students.
- Open.Michigan has published a variety of Open Educational Resources (OERs) on its Web site since launch in 2008 to enable faculty, staff, students, and others to share their educational resources and learn from others.
- The goal of the Web analytics effort is to strengthen accountability within Open.Michigan’s home institution and across the open education movement itself. Broad geographic focus means an interest in where users are located, something we are interested in as well.
Science Communication
Often large PR campaigns, e.g., European Space Agency, Centers for Disease Control. Evaluate how well these programs are reaching target audiences, and ultimately whether outreach goals are being realized. May be multifaceted: Web, podcasts, social media. Complex analysis, but with many of the same basic questions.
Different approach taken within cartography when it comes to evaluating Web sites. Outside the scope of our study. Web map effectiveness is often equated with usability testing, i.e., examination of the ways in which users interact with maps and how the success of these interactions can be measured. Web map usability studies often focus on specific maps to assess usability issues and identify improvements for future design efforts. These research efforts focus on design and communication quality with the goal of providing guidelines for map design and development. HCI (Human Computer Interaction) is the intersection between computer science and the traditional perceptual sciences. It focuses on understanding the human-computer interface for interactive computer-based systems. UCD (User-Centered Design) is a design model aligned to the user-centered focus of HCI.
Study is aligned with work in Participatory GIS (PGIS) on community-based online mapping. PGIS refers to a continuum of levels of citizen engagement, from basic information delivery on one end to collaboration and empowerment on the other. Common characteristic of all PGIS initiatives is that they often face the same set of challenges, including technical hurdles (e.g., lack of access to broadband Internet), conceptual limitations (e.g., lack of understanding of geospatial technology and concepts), and biases inherent in the technology itself (e.g., inability to use the technology to record diverse ways of understanding space). Big question is how effective these programs are, especially in the face of resource constraints and limitations.
The Web Analytics Association definition of Web analytics is “the measurement, collection, analysis and reporting of Internet data for the purposes of understanding and optimizing Web usage”. Actively focuses on discovering business opportunities by studying user habits and behaviors. Implications of Web analytics may transcend the Web site itself to include things as diverse as offline advertising or modifications to office workflow processes. In our case, we define Web analytics as the measurement and analysis of Web site traffic to understand use patterns and user behaviors, in order to identify opportunities for improving performance on specific program goals. These goals refer to the effectiveness of outreach and information delivery to citizens of the state through our Web site and online mapping applications.
In this typology, we are in column 3. Usability testing (method 1) might reveal possible improvements to the interface, but is not a top priority at this time. Our focus is on service effectiveness as opposed to patron satisfaction; hence the user feedback method (method 2) is not directly applicable to the issues we are addressing. Performance data (method 4) does have some relevance: users may experience problems dealing with large download file sizes; we operationalize this through a measure of broadband Internet access. Other technical performance issues are not addressed in our study. Since our primary concern is assessing WHAIFinder effectiveness by examining usage statistics, our focus is the usage data method (method 3). An added dimension of our study is the use of location data to visualize and analyze geographic patterns of usage across Wisconsin.
Free service. Snippet of JavaScript code; stats loaded to account page. Usage stats include visits, visitors, and pageviews. Stats tabulated by location, traffic source, Web platform used, etc. Can vary the time frame of analysis. Can export data as CSV files for analysis in Excel or other software.
Pageview: Calculated every time a page on the Web site loads. If a user visits a page, and then reloads the page, two pageviews will be counted. This is one of the cruder usage indicators, and is conceptually quite similar to the idea of a “hit”. The main problem with the pageview statistic is that it can significantly overestimate actual usage.
Visit: A period of interaction between a Web browser and a Web site. When the browser is closed or is inactive for a certain period of time, the visit has ended.
Visitor: Essentially synonymous with a single user. Location and hardware dependent.
This study focuses on visitors, since it correlates usage against population. Ran a customized report from June 15, 2011 to June 14, 2012.
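The difference between the three statistics can be sketched with a toy hit log. Everything here is hypothetical: the log records are made up, and the 30-minute inactivity timeout is an assumption (the notes only say "a certain period of time").

```python
# Toy hit log: (visitor_id, minutes_since_start, page). All values are made up.
hits = [
    ("a", 0, "/index"),
    ("a", 1, "/index"),    # reload: counts as a second pageview
    ("a", 95, "/viewer"),  # after a long gap: starts a new visit
    ("b", 10, "/index"),
]

SESSION_TIMEOUT_MIN = 30  # assumption: inactivity gap that ends a visit

pageviews = len(hits)
visitors = len({visitor for visitor, _, _ in hits})

visits = 0
last_seen = {}
for visitor, minute, _ in sorted(hits, key=lambda h: h[1]):
    previous = last_seen.get(visitor)
    # A visit starts on first contact or after the inactivity timeout.
    if previous is None or minute - previous > SESSION_TIMEOUT_MIN:
        visits += 1
    last_seen[visitor] = minute

print(pageviews, visits, visitors)
```

In this toy log the pageview count exceeds the visitor count, which illustrates why the study correlates visitors, not pageviews, against population.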
Statistics are available down to the “city” level. Visitors’ IP addresses are mapped to geographic locations. For WHAIFinder, almost 2,500 cities appeared for the selected year-long date range. Approximately 65% were outside Wisconsin and had few visitors; these were excluded. The list includes cities, villages, and unincorporated places, as well as towns (or townships), which have legal status in Wisconsin. Observed that for some counties only one or two of the larger cities were listed. This is due to the way that IP addresses are assigned: accuracy is 78 percent at the city level in the US, within a 40 kilometer radius (larger than the average Wisconsin county). Visitors for all cities, villages, townships, and unincorporated places within each county were summed.
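The county roll-up can be sketched as a simple lookup and sum. The visitor counts and the `place_to_county` table below are hypothetical stand-ins for the exported report and a real place-to-county crosswalk.

```python
from collections import defaultdict

# Hypothetical city-level export rows: (place name, visitors).
city_visitors = [
    ("Madison", 120),
    ("Middleton", 15),
    ("Milwaukee", 60),
    ("Eau Claire", 25),
    ("Chicago", 8),       # outside Wisconsin
]

# Hypothetical lookup from Wisconsin place name to county.
place_to_county = {
    "Madison": "Dane",
    "Middleton": "Dane",
    "Milwaukee": "Milwaukee",
    "Eau Claire": "Eau Claire",
}

county_visitors = defaultdict(int)
for place, count in city_visitors:
    county = place_to_county.get(place)
    if county is None:
        continue  # place not in the Wisconsin lookup: excluded
    county_visitors[county] += count

print(dict(county_visitors))
```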
Broadband data from the National Broadband Map (NBM) initiative (www.broadbandmap.gov). NBM created by the National Telecommunications and Information Administration (NTIA) in collaboration with the Federal Communications Commission (FCC) and US states and territories. Part of NTIA’s State Broadband Initiative (SBI), which provides funding to state entities or non-profits to map statewide broadband data. In Wisconsin, SBI is known as LinkWISCONSIN and is administered through the Public Service Commission of Wisconsin. For the NBM, broadband is defined as a high-speed, “always-on” connection to the Internet providing two-way data transmission (i.e., upload and download) with advertised speeds of at least 768 Kbps for download and 200 Kbps for upload.
For us, the issue is the time it takes to download a scanned WHAIFinder image. Highest resolution images (600 dpi) are approximately 30 MB. At 768 Kbps an image would take > 5 minutes to download. For this study, used download speeds of >= 3 Mbps. 768 Kbps is the lower limit of broadband, but 3 Mbps is more in line with the FCC’s National Broadband Plan as a desired base level of access. Percentage of county population with access to max download speeds of >= 3 Mbps obtained using the NBM analysis tool (www.broadbandmap.gov/rank). Percentages then converted to total population counts by county.
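The download-time figure and the percentage-to-count conversion can be checked with a short calculation. This sketch assumes 1 MB = 8,000 kilobits and ignores protocol overhead; the county population and broadband percentage are made-up placeholders, not values from the study.

```python
def download_minutes(size_megabytes, speed_kbps):
    """Transfer time in minutes, treating 1 MB as 8,000 kilobits."""
    kilobits = size_megabytes * 8 * 1000
    return kilobits / speed_kbps / 60

image_mb = 30  # approximate size of a 600 dpi scan, from the notes

slow = download_minutes(image_mb, 768)    # lower limit of broadband
fast = download_minutes(image_mb, 3000)   # 3 Mbps threshold used in the study

# Converting a county's broadband percentage to a population count.
# Both numbers below are made-up placeholders.
county_population = 50_000
pct_with_3mbps = 80.0
access_population = county_population * pct_with_3mbps / 100

print(f"{slow:.1f} min at 768 Kbps, {fast:.1f} min at 3 Mbps")
```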
Important areas: 1 = Dane County (UW-Madison, state agencies); 2 = Milwaukee (largest city in state); 3 = Green Bay; 4 = Eau Claire (large UW campus); 5 = La Crosse (large UW campus).
Highest densities: 1 = Dane County; 2 = La Crosse – Eau Claire area; 3 = Northern counties. Low densities: 4 = Milwaukee and eastern edge of state. Moran’s I = 0.00045 (county adjacency), not significant at the 95% level.
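For readers unfamiliar with the statistic, Moran's I with binary adjacency weights can be sketched as follows. This is an illustration on toy data, not the study's computation: the four units and their values are made up, and the significance test is omitted.

```python
def morans_i(values, neighbors):
    """Moran's I with binary adjacency weights.

    values: dict mapping unit id -> observed value
    neighbors: dict mapping unit id -> set of adjacent unit ids (symmetric)
    """
    n = len(values)
    mean = sum(values.values()) / n
    dev = {k: v - mean for k, v in values.items()}

    # Cross-products summed over every ordered pair of adjacent units.
    cross = sum(dev[i] * dev[j] for i in neighbors for j in neighbors[i])
    total_weight = sum(len(adj) for adj in neighbors.values())
    variance_sum = sum(d * d for d in dev.values())

    return (n / total_weight) * (cross / variance_sum)

# Toy example: four units in a row (A-B-C-D) with a steady gradient.
values = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}
neighbors = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
toy_i = morans_i(values, neighbors)
print(toy_i)
```

The toy gradient gives a clearly positive value because neighboring units have similar values. A value near zero, as reported in the notes, indicates no spatial clustering.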
Strong, significant relationship between visitors and population. Suggests the application is being accessed uniformly throughout the state. Best relationship is double-log; can be written in non-linear form. Number of visitors increases with population, but at a declining rate. Similar to allometric relationships observed between population and other aspects of urban areas.
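A double-log fit of this kind can be sketched with ordinary least squares on the logged variables. The data below are synthetic, generated from a known curve so that the fit can be checked; the function name `fit_power_law` and all numbers are illustrative assumptions, not the study's data.

```python
import math

def fit_power_law(x, y):
    """Fit y = a * x**b by ordinary least squares on ln(x), ln(y)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                 # slope on the log scale = exponent
    a = math.exp(my - b * mx)     # back-transformed intercept = multiplier
    return a, b

# Synthetic data generated from a known curve, so the fit should recover it.
population = [5_000, 20_000, 80_000, 320_000]
visitors = [0.01 * p ** 0.8 for p in population]
a, b = fit_power_law(population, visitors)
print(a, b)
```

An exponent `b` below 1 corresponds to visitors increasing with population at a declining rate.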
Regression residuals examined for spatial patterns. Under-represented counties (fewer visitors than expected) and over-represented counties (more visitors than expected) are distributed throughout the state fairly randomly. Test of spatial autocorrelation in residuals: Moran’s I statistic is -0.000329 (not significant at the 95% level). Some patterns are visible nevertheless.
1. Dane County has the highest over-representation of visitors. Many state agency offices and UW-Madison campus.
2. Eau Claire and La Crosse. Also have large UW campuses. Greater awareness of the application due to classroom use, links on county Web sites (referred traffic).
3. Over-represented counties in north. Far from physical data collection. Reflects benefits of using site over driving to Madison.
4. Under-representation (fewer visitors than expected) along eastern edge of state, especially in Milwaukee County. Has been suggested that southeastern Wisconsin is somewhat independent of the rest of the state in terms of geospatial activity. Several counties in this region have county-wide mosaics available on their Web sites, which may be a more attractive data source.
Population weighted by percentage with access to broadband Internet with speeds of at least 3 Mbps. Access to broadband is associated with higher visitor rates, as would be expected. Results are very similar to, but marginally better than, those for population unweighted by broadband access. Test of spatial autocorrelation in residuals: Moran’s I statistic is -0.000299 (not significant at the 95% level).
Possible to fit a curve to a dataset without applying a log transformation. Instead, optimization methods can be used to derive model coefficients that maximize explanatory power. The Generalized Reduced Gradient (GRG) algorithm is an example. To assess the influence of Dane County, used Cook’s distance (Di), which is based on predicted values for the dependent variable derived from the full regression model and from a partial model in which a particular data point has been removed. Removing Dane County results in a Di value of 5.63, which is much greater than the cutoff of 0.7. The model without Dane County has a higher R2 value than any of the models that include this county.
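Cook's distance for a simple linear regression can be sketched with the standard closed form based on residuals and leverage, which is algebraically equivalent to comparing predictions from the full model and the leave-one-out model. The data below are made up to include one influential point; only the 0.7 cutoff comes from the notes.

```python
def cooks_distances(x, y):
    """Cook's distance for each point of a simple linear regression."""
    n = len(x)
    p = 2  # fitted parameters: intercept and slope
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in residuals) / (n - p)
    leverage = [1 / n + (xi - mx) ** 2 / sxx for xi in x]
    return [
        (e * e) / (p * mse) * h / (1 - h) ** 2
        for e, h in zip(residuals, leverage)
    ]

# Made-up data: the last point sits far from the others.
x = [1, 2, 3, 4, 10]
y = [1, 2, 3, 4, 20]
distances = cooks_distances(x, y)

CUTOFF = 0.7  # cutoff quoted in the notes
flagged = [i for i, d in enumerate(distances) if d > CUTOFF]
print(flagged)
```

Only the last point exceeds the cutoff here, which mirrors how a single county can be identified as a unique outlier.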
Strong positive relationship between WHAIFinder visitors and county population. Relatively uniform usage across state. A good thing! Some under-representation along eastern edge of state. Cause uncertain. Some areas of over-representation, possibly due to greater awareness of the app. Little evidence that broadband access exerts an effect on this relationship. Other sources of unexplained variation remain under consideration.
Analysis of Web usage is a cost-effective way to assess effectiveness of information delivery. Identify where information outreach is successful and where it can be improved. Quantification, analysis and mapping, not just anecdotal evaluation. Set outreach and information dissemination goals, help assess whether goals are being met. Evaluate ROI.