This document summarizes a study that mined public web data to approximate university rankings. It analyzed various existing university ranking lists and then developed new ranking metrics based on data from university websites, Twitter accounts, endowments and other sources. It found moderate correlations between the new web-based rankings and traditional expert rankings. The study also identified challenges around inconsistent data on the web and limitations of only considering a single official Twitter account for each university.
JCDL 2018 - Mining the Web to Approximate University Rankings
1. Mining the Web to Approximate
University Rankings
Corren G. McCoy, Michael L. Nelson, Michele C. Weigle
@CorrenMcCoy @WebSciDL
JCDL-KDDL 2018, UNT Health Science Center, Fort Worth, TX
June 6, 2018
2. University Ranking Lists
Center for World University Rankings
Top Party Schools
America’s Top Colleges
QS World University Rankings
Best Athletic Facilities
Most Beautiful Campus
Reuters Top 100: The World's Most Innovative Universities
CWTS Leiden Ranking
Kiplinger's Best College Value
Best Campus Food
3. Academic Ranking Lists
Data Collected Between June and August 2016
• Disparate Ranking Criteria Between Lists
• Alumni Giving
• Peer Evaluation Surveys
• Value and Affordability
• Faculty Citations
• Graduation Rates
• Library Resources
• Faculty-Student Ratios
4. Variations Between Expert Ranking Lists
US News Top-10 US Universities (2016)
______________________________________
1. Harvard University
2. Stanford University
3. UC Berkeley
4. UCLA
5. Columbia University
6. University of Washington
7. Princeton University
8. University of Pennsylvania
9. Yale University
10. University of Michigan
Money Magazine Top-10 US Universities (2016)
____________________________________________
1. Princeton University
2. University of Michigan
3. Harvard University
4. Rice University
5. UC Berkeley
6. Brigham Young University
7. University of Virginia
8. Stanford University
9. Yale University
10. Texas A&M
5. Mining the Web
Data-driven, alternative approach using only web-based metrics
University Twitter Engagement (UTE) Score
Traditional and “undiscovered” Twitter followers
Popular metric for measuring influence and social reputation
Enrollment, Endowment, and Athletic Expenditures (EEE) Score
Measure of institutional investment in promoting school’s brand
Adjusted Reputation Rank (ARR)
Mean ranking across all expert lists
Similar to Olympic Scoring
Excludes Money Magazine’s ranking position due to its variability
6. Building the Web Collection
Data collected between June and August 2016
University Home Pages
Twitter Friends & Followers
Google Custom Search Engine
NCSE Endowment Market Values
Athletic Expenditures
Undergraduate Enrollment
NCAA Conference Membership
8. Incomplete Social Media Directories
Source: https://web.archive.org/web/20150905094512/https://socialmedia.duke.edu/ (2015)
9. Issues with Data Consistency
Inconsistent references to university names
US News: Binghamton University—SUNY
THE: SUNY—Binghamton
Missing Twitter account on home page
University of Louisville
Ball State
Some endowments aggregated at the system or foundation level
The University of Texas System vs. University of Texas-Austin
University of Minnesota & Foundations vs. University of Minnesota - Twin Cities
10. Web Data in PDF Format
Source: https://web.archive.org/web/20180206194037/http://www.nacubo.org/Documents/EndowmentFiles/2015_NCSE_Endowment_Market_Values.pdf
11. Automated Submission to Web Forms
Source: https://nces.ed.gov/ccd/schoolsearch/
13. Building the University Dataset
Who’s In:
• All Ivy League Schools
• Baylor University
• Citadel
• Villanova University
Who’s Out:
• Baylor College of Medicine
• Univ. of Texas Health Science Ctr
• MIT
14. US Schools Depend on Fundraising
15. Standardizing Rank Positions
Unranked schools added to end of list
16. Adjusted Reputation Rank (ARR)
Top-10 Ranked by ARR
17. Enrollment, Endowment, Expenditures (EEE) Rank
18. University Twitter Engagement (UTE) Score
(Bi-directional linkage required)
19. Bi-directional Linkage Points to Official Accounts
duke.edu (Yes), goDuke.com (No), annualfund.duke.edu (Yes)
20. Rank Comparisons
Top-10 by ARR
1. Harvard University
2. Stanford University
3. UC-Berkeley
4. Princeton University
5. Columbia University
6. UCLA
7. Yale University
8. University of Pennsylvania
9. University of Washington
10. University of Michigan
Top-10 by EEE
1. Ohio State University
2. University of Texas
3. Penn State University
4. University of Michigan
5. Univ. of Wisconsin
6. University of Florida
7. Michigan State University
8. University of Washington
9. UCLA
10. Indiana University
Top-10 by UTE
1. Harvard University
2. Stanford University
3. Cornell University
4. Yale University
5. University of Pennsylvania
6. Arizona State
7. Columbia University
8. Texas A&M University
9. Wake Forest University
10. University of Texas
21. Kendall’s Tau-b Correlation Between Rankings
22. Correlation of ARR to EEE for Complete Dataset
[Scatter plot. Legend, EEE Rank bins: 1 to 50, 51 to 100, 101 to 150, 151 to 200, 201 to 264. Callouts: Large Enrollment, Big Endowments, Hefty Athletic Spending; Money Magazine Only Schools Given ARR of 142.]
23. Correlation of ARR to UTE for Complete Dataset
[Scatter plot. Legend, EEE Rank bins: 1 to 50, 51 to 100, 101 to 150, 151 to 200, 201 to 264. Callouts: Georgia Tech, Univ. of Pittsburgh, Cornell, Wake Forest.]
24. Impact of a Celebrity Professor
Wake Forest University Loses 410K Followers From UTE Rank
URI: ajccenter.wfu.edu
Internet Archive: http://web.archive.org/web/20151210173754/https:/twitter.com/MHarrisPerry (2015)
URI: freedomontap.com
Live Web: https://twitter.com/MHarrisPerry (2018)
25. Correlation of EEE to UTE for Complete Dataset
[Scatter plot. Legend, EEE Rank bins: 1 to 50, 51 to 100, 101 to 150, 151 to 200, 201 to 264. Callouts: Large Enrollment, Big Endowments, Hefty Athletic Spending; Pitt, Georgia Tech, Dartmouth.]
26. NCAA Power Five
• Richest, Most Powerful Schools in NCAA Division I
• 55 of 65 in Top-100 of ARR
• 65 of 65 in Top-100 of EEE
• 61 of 65 in Top-100 of UTE
27. Correlation of ARR to UTE: NCAA Power Five
[Scatter plot. Legend: NCAA Power 5, Other Schools. Callouts: Louisville (https://GoCards.com); Georgia Tech (Missing Web Links); Oregon State (Missing Web Links); Ivy League Schools (Not in Power Five).]
28. UTE Constrained by Domain Rules for Official Accounts
Full table available in Appendix A of our tech report:
Corren G. McCoy, Michael L. Nelson, and Michele C. Weigle. 2017. University Twitter Engagement: Using Twitter Followers to Rank Universities. arXiv preprint arXiv:1708.05790.
29. Future Work
Address bots and spam accounts
Potentially overinflates the UTE score
The Follower Factory – The New York Times (January 2018)
https://www.nytimes.com/interactive/2018/01/27/technology/social-media-bots.html
Address artificial manipulation of Twitter followers
Campbell’s Law
Temporal analysis to identify non-linear spikes
Known Twitter API limitations related to historical follower counts
Miranda Smith. 2018. Twitter Follower Count History via the Internet Archive. Blog. http://ws-dl.blogspot.com/2018/03/2018-03-14-twitter-follower-count.html
Expand list of official university domains
UTE would include more secondary accounts
30. Conclusions
Mined data from Twitter and other public, web data sources
Raw data and 1M Twitter profiles available on GitHub
https://github.com/oduwsdl/University-Twitter-Engagement
Incomplete directories of social media accounts by universities
Necessitated discovery of secondary accounts
Methodology constrains universities to single official domain
Potentially deflates the UTE score
Comparable approximation to national ranking lists (ARR)
Correlation with UTE (τ = 0.6018)
Correlation with EEE (τ = 0.5969)
31. Links to Data Sources
Tech report and blog post
Corren G. McCoy, Michael L. Nelson, and Michele C. Weigle. 2017. University Twitter Engagement: Using Twitter Followers to Rank Universities. arXiv preprint arXiv:1708.05790.
Corren McCoy. 2017. University Twitter Engagement: Using Twitter Followers to Rank Universities. Blog. http://ws-dl.blogspot.com/2017/08/2017-08-25-university-twitter.html
Endowment: 2015 NCSE Endowment Market Values, http://www.nacubo.org/Documents/EndowmentFiles/2015_NCSE_Endowment_Market_Values.pdf
Athletic Expenditures: Equity in Athletics Data Analysis (EADA), https://ope.ed.gov/athletics/
Undergraduate Enrollment: Integrated Postsecondary Education Data System (IPEDS), https://nces.ed.gov/ipeds/
Google Custom Search: https://cse.google.com/cse/
Twitter Developer Platform: https://developer.twitter.com/content/developer-twitter/en.html
NCAA Conference Membership: http://www.ncaa.org/about?division=d1
Editor's Notes
As indicated here, there are ranking lists for all kinds of metrics related to academia. Top party schools. Most beautiful campus. Our human nature inclines us to rank things.
Academic rankings can play an important role in assessing reputation. Those rankings, like the set shown here, use disparate criteria and methodologies.
Three of the four ranking systems determine best colleges based on academic excellence, while the fourth, Money Magazine, is focused solely on perceived value and affordability.
This data can be time-consuming and expensive to gather and doesn’t provide a mechanism for point-in-time comparison because the lists are published annually.
Because the criteria are different, we see noticeable differences in the top rankings between lists.
For our data collected in 2016, the US News top 10 is anchored by the schools we’re accustomed to seeing.
On the other hand, the Money Magazine list includes a few of those schools along with several others we might not expect to see in a top-10 list.
In our work, we mine a collection of data that is publicly available on the web to compute three different metrics to approximate typical academic rankings.
Specifically, we mine Twitter data to compute a ranking based on the University Twitter Engagement (UTE) score (followers).
Then, we gather information from other public sources posted by the Department of Education related to student enrollment, endowment, and athletics expenditures (EEE).
Finally, we compute an adjusted reputation rank, which is the mean ranking across all of the lists in our dataset.
Since the Money Magazine ranking was an outlier, we exclude that ranking position when computing the mean. However, we keep its universities to expand our dataset.
Here are the data sources needed to feed our metrics.
The ranking lists usually provide a link to the university’s home page. From there we can begin to search and collect Twitter account information.
If no Twitter profile is on the home page, we use search engine results from Google to identify the primary Twitter account.
We consume numerous sources from the Department of Education to provide the data we need to compute our EEE score.
And the National Collegiate Athletic Association (NCAA) web site provided the conference information we’ll discuss later.
We primarily used web-based data sources, but still encountered challenges which are not conducive to automation
Our process requires an initial Twitter account from the university home page (the “follow me on Twitter” icon).
Some universities don’t have this icon or a complete social media directory.
Here, we see for Duke that the programs listed don’t necessarily have an associated Twitter account listed. In our methodology, these accounts have to be discovered using a technique I’ll describe in a moment.
Each of the data sources we used potentially has a different nomenclature when referring to a university. The most common cases involved transposition of names or references to branch and main campuses.
We identified 10 universities in our dataset where the Twitter account was not prominently displayed on the home page
The granularity of endowment reporting was mixed: some values were reported for an individual university, others for a university system or foundation.
Again, we used Google or DBpedia to tease apart values to determine the appropriate allocation to individual universities.
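The name transpositions mentioned above can often be caught with an order-insensitive comparison. The following is only a minimal sketch of that idea, not the matching procedure we used; the function name `name_key` and the stopword list are hypothetical choices for illustration.

```python
import re

def name_key(name: str) -> frozenset:
    """Order-insensitive key for a university name (illustrative heuristic)."""
    # Treat em-dashes, en-dashes, and hyphens as plain word separators.
    text = re.sub(r"[\u2014\u2013-]+", " ", name.lower())
    tokens = re.findall(r"[a-z&]+", text)
    # Hypothetical filler words that tend to vary between ranking lists.
    stopwords = {"the", "of", "at", "university"}
    return frozenset(t for t in tokens if t not in stopwords)

us_news = "Binghamton University\u2014SUNY"   # as written by US News
the_list = "SUNY\u2014Binghamton"             # as written by THE
print(name_key(us_news) == name_key(the_list))
```

A heuristic like this still needs manual review: dropping "university" would, for example, conflate "Washington University" with "University of Washington", while system-versus-campus names such as "The University of Texas System" and "University of Texas-Austin" stay distinct.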
The endowment values were available on the web but in PDF format.
We used online tools to convert the PDF to Excel. The quality of the conversion varied so we manually inspected the converted text for accuracy.
https://www.pdftoexcel.com/
https://pdftables.com/convert-pdf-to-excel
The education databases are available online, but must be queried individually via a web-form.
We wrote scripts to submit the desired parameters and page navigation, then scraped the generated output to get enrollment.
Now let’s talk about how we build the dataset, compute the web metrics, and perform some rank-order correlation.
We started with the 351 US universities currently classified as Division I by the National Collegiate Athletic Association (NCAA).
Division I is not a ranking, but represents the highest level of intercollegiate athletics.
Division I schools have larger budgets, more advanced facilities, and more athletic scholarships
Next, we retrieved the US universities from the Academic Ranking of World Universities (ARWU) 2016, the Times Higher Education (THE) World University Rankings 2015-2016, Money’s Best Colleges (MONEY) 2016-2017, and U.S. News & World Report (USNEWS) Best Global Universities 2015 and 2016.
The overlap, or any university appearing on at least two lists, resulted in 264 schools in our dataset. 115 schools appeared only on Money.
Who’s in: the Ivies and other schools we recognize. Who’s out: graduate schools, medical schools, maritime academies, and non-Division I schools. Note that MIT is NCAA Division III.
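As a rough sketch of the selection step, the dataset can be thought of as a set intersection. This assumes our reading that a school must be NCAA Division I and appear on at least one ranking list; the function name `build_dataset` and the toy memberships below are made up for illustration and are not the real lists.

```python
def build_dataset(division_one: set[str], ranking_lists: dict[str, set[str]]) -> set[str]:
    """Keep Division I schools that appear on at least one ranking list (assumed rule)."""
    ranked_anywhere = set().union(*ranking_lists.values())
    return division_one & ranked_anywhere

# Toy memberships only; "list_a" and "list_b" stand in for the expert lists.
division_one = {"Baylor University", "Citadel", "Unranked State"}
ranking_lists = {
    "list_a": {"Baylor University", "MIT"},
    "list_b": {"Citadel", "MIT"},
}
print(sorted(build_dataset(division_one, ranking_lists)))
```

In this toy run MIT is dropped even though it appears on both ranking lists, because it is not in the Division I set, which mirrors the "who's out" note.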
The ranking lists that we use include both US and international universities.
We focus on the universities in the US because the data is more readily available on the web.
In particular, the US schools are also more driven to connect with alumni and friends to drive fundraising efforts. This outreach is essential for attracting Twitter followers.
One of the problems when comparing two ranked lists is that the length may not be identical; items in list A do not necessarily appear in list B.
Using Spearman’s footrule essentially places any universities which are not ranked at the end of the respective list.
We also need to order the rankings to compute the ARR score.
In this example, for Times Higher Education, we place the universities in order by their original ranking position, then number them sequentially to standardize the position.
Schools in the dataset which were not included in the THE list are placed at the end using Spearman’s footrule.
Once we sequence each of the expert lists using Spearman’s footrule, we calculate a mean reputation score.
As I noted earlier, we don’t include the Money Magazine ranking in the mean even though the universities remain in the dataset.
The ordering of the mean reputation score becomes our adjusted reputation rank (ARR).
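The standardization and averaging steps just described can be sketched as follows. This is an illustration under stated assumptions rather than our exact code: the helper names are hypothetical, unranked schools are assumed to share the first position past the end of a list, and schools with equal mean scores are assumed to share a rank.

```python
from statistics import mean

def standardize(original: dict[str, int], dataset: list[str]) -> dict[str, int]:
    """Renumber ranked schools 1..k in original order; unranked schools get k + 1."""
    ranked = sorted((s for s in dataset if s in original), key=lambda s: original[s])
    positions = {s: i for i, s in enumerate(ranked, start=1)}
    end = len(ranked) + 1  # assumption: every unranked school shares this position
    return {s: positions.get(s, end) for s in dataset}

def adjusted_reputation_rank(expert_lists: dict[str, dict[str, int]],
                             dataset: list[str]) -> dict[str, int]:
    """Order schools by mean standardized position; equal means share a rank."""
    standardized = [standardize(lst, dataset) for lst in expert_lists.values()]
    score = {s: mean(pos[s] for pos in standardized) for s in dataset}
    ordered = sorted(dataset, key=lambda s: score[s])
    arr: dict[str, int] = {}
    for i, school in enumerate(ordered, start=1):
        prev = ordered[i - 2] if i > 1 else None
        tied = prev is not None and score[prev] == score[school]
        arr[school] = arr[prev] if tied else i
    return arr

# Toy data: original positions are sparse because international schools were removed.
dataset = ["A", "B", "C", "D"]
expert_lists = {"X": {"A": 1, "B": 5, "C": 9}, "Y": {"B": 2, "A": 7}}
print(adjusted_reputation_rank(expert_lists, dataset))
```

To mirror the exclusion of Money Magazine, that list is simply left out of `expert_lists` while its schools stay in `dataset`.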
To calculate our EEE rank, we apply the same ordering principle.
We ordered each university separately within the individual categories, then calculated the mean to determine EEE.
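For EEE the inputs are raw values rather than published positions, so each category has to be ranked first. A minimal sketch, assuming that larger values earn better ranks and that ties are broken by input order; the function names and all figures are invented for illustration.

```python
from statistics import mean

def rank_descending(values: dict[str, float]) -> dict[str, int]:
    """Rank 1 goes to the largest value (assumed direction for all three categories)."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {school: i for i, school in enumerate(ordered, start=1)}

def eee_rank(enrollment: dict[str, float], endowment: dict[str, float],
             athletics: dict[str, float]) -> dict[str, int]:
    """Average the three per-category ranks, then order schools by that mean."""
    per_category = [rank_descending(c) for c in (enrollment, endowment, athletics)]
    score = {s: mean(r[s] for r in per_category) for s in enrollment}
    ordered = sorted(score, key=score.get)
    return {school: i for i, school in enumerate(ordered, start=1)}

# Invented figures for three hypothetical schools.
enrollment = {"A": 50_000, "B": 20_000, "C": 5_000}
endowment = {"A": 3e9, "B": 4e9, "C": 1e9}
athletics = {"A": 150e6, "B": 90e6, "C": 20e6}
print(eee_rank(enrollment, endowment, athletics))
```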
Our Twitter follower score needs a bootstrap using the URI for the university homepage (www.stanford.edu).
By looking at the page source, we scrape the HTML to locate links to Twitter handles. We used regular expressions to eliminate false positives like Twitter queries or photos.
If we couldn’t find a Twitter account, we used an X-ray search on Google to find an account. As a last resort, we searched Twitter manually.
Once we scrape the primary accounts, we use the Twitter API to get profile information for friends (following).
The friends need to have a bi-directional linkage with the university homepage to be considered as a secondary account.
For the primary and secondary accounts, we aggregate the followers to calculate UTE.
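The scraping step might look something like the sketch below. The regular expression and the list of reserved path names are our own illustrative guesses, not the patterns from the study; they only show how links to searches, share buttons, and photos can be screened out.

```python
import re

# A link counts only if the path is a bare handle followed by the closing quote.
HANDLE_LINK = re.compile(
    r'href=["\']https?://(?:www\.)?twitter\.com/(?:#!/)?@?([A-Za-z0-9_]{1,15})/?["\']',
    re.IGNORECASE,
)
# Hypothetical list of Twitter paths that look like handles but are not accounts.
NOT_HANDLES = {"search", "share", "intent", "hashtag", "home", "i"}

def find_twitter_handles(html: str) -> list[str]:
    """Return distinct handles linked from the page, in order of first appearance."""
    found: list[str] = []
    for handle in HANDLE_LINK.findall(html):
        if handle.lower() in NOT_HANDLES:
            continue
        if handle.lower() not in (h.lower() for h in found):
            found.append(handle)
    return found

sample_html = """
<a href="https://twitter.com/Stanford">Follow us</a>
<a href="https://twitter.com/search?q=%23stanford">Search</a>
<a href="https://twitter.com/Stanford/status/1/photo/1">Photo</a>
<a href='http://www.twitter.com/share'>Share</a>
<a href="https://twitter.com/stanford/">Follow us again</a>
"""
print(find_twitter_handles(sample_html))
```

Query and photo links fail the pattern because something other than a closing quote follows the first path segment; the share link matches but is filtered by the reserved-name set.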
Here’s another example of the bi-directional linkage needed to identify official accounts.
These Twitter accounts were scraped from Duke’s home page.
Only two are considered official because the URI in the web link resolves to the same domain as the homepage.
If we visit goDuke.com, we can see that it’s the men’s basketball team, but it fails our linking rule.
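The back-link half of the linking rule, and the follower aggregation that depends on it, can be sketched like this. It is a simplification: it compares hostnames as written and does not follow redirects or expand shortened links, and the account records and follower counts are invented for illustration.

```python
from urllib.parse import urlparse

def registered_host(url: str) -> str:
    """Lower-cased hostname of a profile link, tolerating a missing scheme."""
    if "//" not in url:
        url = "//" + url
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host

def is_official(profile_url: str, university_domain: str) -> bool:
    """True when the profile link sits on the university domain or a subdomain of it."""
    host = registered_host(profile_url)
    domain = university_domain.lower()
    return host == domain or host.endswith("." + domain)

def ute_score(accounts: list[dict], university_domain: str) -> int:
    """Sum followers over accounts whose profile links back to the university."""
    return sum(a["followers"] for a in accounts if is_official(a["url"], university_domain))

# Invented records: handles and follower counts are placeholders, not real data.
accounts = [
    {"handle": "primary", "url": "duke.edu", "followers": 100},
    {"handle": "athletics", "url": "http://goDuke.com", "followers": 50},
    {"handle": "annual_fund", "url": "https://annualfund.duke.edu/", "followers": 10},
    {"handle": "no_link", "url": "", "followers": 5},
]
print(ute_score(accounts, "duke.edu"))
```

An account with an empty web link is excluded, just like an account that points at a separate athletics domain.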
Let’s take a look at how universities are ranked using each of our three metrics.
In the Top 10 by ARR, we see the Ivies and schools we expect.
In the Top 10 by EEE, we see schools with large enrollment and big sports programs.
In the Top 10 by UTE, we see some familiar names and a few that might be unexpected (Arizona State, Wake Forest).
Once we have our three metrics, we evaluated the correlation between the 264 universities using Kendall’s Tau-b. We used Tau-b because we have ties within the rankings we computed.
In the top 50, UTE is most strongly correlated with ARR, with a τ value of 0.6691. We noticed the majority of the schools in the top 50 are usually Ivy League or large schools like those in the Power Five (e.g., Ohio State, Penn State).
The correlation between UTE and ARR decreases slightly for the top 100, but overall we see a strong correlation, τ = 0.6018, when we examine the full dataset.
Likewise, we see a strong correlation, τ = 0.6461, between EEE and UTE for the full dataset.
The goal of our research was to maximize the use of web-based metrics, so in that scenario we would choose UTE over EEE since the rankings would be similar.
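For readers who want to see how ties enter the calculation, here is a small, self-contained sketch of Kendall's Tau-b. It is a direct pairwise implementation for illustration, not the tool we used; as far as we know, `scipy.stats.kendalltau` computes the same Tau-b variant by default and is the better choice for real data.

```python
from itertools import combinations
from math import sqrt

def kendall_tau_b(x: list[float], y: list[float]) -> float:
    """Kendall's Tau-b: (C - D) / sqrt((C + D + Tx) * (C + D + Ty))."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue            # tied in both rankings: counted nowhere
        elif dx == 0:
            ties_x += 1         # tied only in x
        elif dy == 0:
            ties_y += 1         # tied only in y
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = sqrt((concordant + discordant + ties_x) * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")

# Toy rankings: the second list has one tie (two schools share rank 2).
rank_a = [1, 2, 3, 4, 5]
rank_b = [1, 2, 2, 4, 5]
print(round(kendall_tau_b(rank_a, rank_b), 4))
```

The tie lowers the coefficient below 1 even though no pair is ordered in opposite directions, which is the adjustment that makes Tau-b suitable for rankings with shared positions.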
To further investigate the correlation of ARR, UTE, and EEE, let’s look at some scatter plots.
The colors represent bins of the EEE rank as shown in the legend.
The 115 schools that appeared exclusively on the Money Magazine list were binned and all assigned a rank of 142 for the ARR.
Note that all of the universities in the first bin of EEE (black dots) have an ARR position within the top 150.
This suggests that universities with high enrollments, endowments, and/or athletic budgets tend to also have high academic rank.
In this comparison of ARR to UTE, we see several universities with large Twitter followings which we can’t explain based on just their academic rank (black dots).
Essentially, the UTE rank is higher than the ARR rank.
Most of these schools fall into the first bin of EEE (larger schools), which could explain the increase in Twitter followers.
In the case of Wake Forest, we conclude that Twitter provides an inexpensive way for smaller schools to reach a large audience, potentially enhancing their reputations.
We also see there are several unranked, smaller schools in the last EEE bin (cyan dots) that have larger Twitter followings than their academic rank or EEE would explain.
These schools may be trying to enhance their profile and could appear in standard rankings some time in the future.
Let’s talk about Wake Forest for a moment.
At the time of our data collection, the ARR (top 100) for Wake Forest doesn’t correlate with the UTE ranking.
This is because the school’s Twitter follower count was boosted by having a celebrity professor, Melissa Harris-Perry. In 2016, she was a host on MSNBC.
She accounts for 80% of Wake Forest’s UTE because of bi-directional linkage as shown on her profile from the Internet Archive.
If we were to re-analyze our data today, Wake Forest would lose 410K followers because MHP’s current profile points to another web link she is choosing to promote.
The school’s UTE ranking would fall as a result.
Now let’s look at the correlation between EEE and UTE.
By looking at the shaded area, we see, as expected, that universities with more financial resources (EEE) tend to have larger Twitter followings.
There are still some universities in the lower EEE bins that have significant Twitter followings (e.g., Dartmouth).
Within the NCAA Division I, we noticed a strong similarity between the schools in Power Five (almost as if they were a fraternity)
The Power Five conferences (ACC, Big 10, Big 12, Pac-12, and SEC) represent the richest and most powerful schools in the division.
Think about the schools that are usually playing a bowl game on New Year’s Day (Alabama, USC, Iowa, Wisconsin, Penn State).
Within the complete dataset, we noticed that 55 out of the 65 Power Five schools were ranked within the top 100 of ARR.
We found that all 65 schools were ranked within the top 100 positions based on the EEE rank.
And 61 of 65 were in the top 100 of UTE.
This observation is consistent with the strong correlation we noted between EEE and UTE (τ = 0.6461) and with our intuition that large schools attract more Twitter followers.
Schools not in Top-100 of ARR
University of Arkansas--Fayetteville
Auburn University
University of Louisville
West Virginia University
Clemson University
Texas Christian University
Mississippi State University
Let’s look at a scatter plot where we highlight the Power 5 using the same metrics used for the entire dataset.
The Power 5 schools are the cluster of blue dots and the other schools are the black dots.
The black dots we see with both high ARR and UTE are Ivy League schools, which aren’t in the Power 5.
We need to point out some schools that fall within the bottom 50% of UTE.
In particular, the University of Louisville could get a boost in UTE ranking (≈ 107,000 followers) if the athletic department (@GoCards) would reference the primary URI of the university rather than its own domain (http://gocards.com).
We discovered 284 primary and secondary accounts for Georgia Tech, but only a portion could be considered official because the other profiles didn’t have a web link in the bio.
We have the same scenario for Oregon State, where 271 of the 341 accounts we discovered didn’t have a web link.
Again, the bi-directional linkage is very important for determining UTE.
In this table, we have a sample of the unique university domains referenced in the profiles of the Twitter accounts we found for the Power 5.
You can see the table for all 65 universities in the appendix of our tech report.
We visually inspected the web content of each domain and determined a relationship with the university in some capacity (e.g., sports teams, clubs).
These domains don’t conform to our association rule so they’re not included in UTE.
Look at Purdue: 296 secondary Twitter accounts could add 426,586 followers to their UTE.
As this example shows, the omission of followers is significant in our calculation of the UTE score.
For the universities that are underperforming in terms of Twitter followers, these additional domains would elevate the UTE rank and likely yield a stronger Kendall’s Tau-b (τ) correlation than we observed.
There are several opportunities for future work with our research.
We didn’t address bots or spam accounts and as reported by the NY Times, we know that it’s possible to buy followers.
We didn’t address artificial manipulation of Twitter followers which Campbell’s law suggests is a possibility when the underlying metric is well known.
For both of these scenarios, a temporal analysis could help to identify non-linear spikes in follower counts.
The Twitter API is limited in the ability to retrieve historical follower counts.
A colleague in my research lab has written a blog post which suggests an alternative approach using the Internet Archive’s Wayback Machine to look at the digital snapshots.
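One simple way to frame the temporal analysis is to flag any gain between consecutive snapshots that dwarfs the typical gain. The sketch below is only a starting point under assumptions of our own: the function name, the threshold factor of 5, and the follower counts are all made up, and the snapshots are assumed to be evenly spaced (for example, taken from archived profile pages).

```python
from statistics import median

def find_spikes(counts: list[int], factor: float = 5.0) -> list[int]:
    """Return indexes i where the gain from counts[i-1] to counts[i] is unusually large."""
    gains = [later - earlier for earlier, later in zip(counts, counts[1:])]
    positive = [g for g in gains if g > 0]
    if not positive:
        return []
    typical = median(positive)  # robust to the spike itself, unlike the mean
    return [i + 1 for i, g in enumerate(gains) if g > factor * typical]

# Invented follower counts from seven evenly spaced snapshots.
snapshots = [1000, 1010, 1022, 1030, 1500, 1512, 1520]
print(find_spikes(snapshots))
```

A flagged snapshot is only a candidate for manual review; a legitimate event, such as a celebrity professor joining the faculty, can produce the same shape as purchased followers.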
We didn’t try to identify all of the possible domains for a university because this effort would be manually intensive. However, it does appear that domains that include the school’s mascot are good candidates (sun devils, tigers, badgers).
In summary, we examined and ranked a set of 264 U.S. universities drawn from the NCAA Division I membership and several well-known ranking lists, and we compared those expert rankings to our EEE and UTE scores.
To compute the EEE and UTE rankings, we mined available data from Twitter and other publicly available collections.
When compared to the ARR rank, we noted a strong correlation with UTE (τ = 0.6018) and a similar correlation with EEE (τ = 0.5969).
We conclude that our rankings are comparable approximations of the well-known lists, that they can be calculated on demand, and that the collection can be built using only web-based artifacts rather than annual publications.
Figure 4 highlights the relationships between the Power Five and the various metrics by repeating the same metrics with the Power 5 shown in blue.
We noted several similarities among the ten schools (15.4%) that were ranked outside of the top tier for ARR.
Notably, Texas Christian and Mississippi State are the only schools which were not ranked on two or more of the ranking lists. Both schools also fall significantly below the mean values for the Power Five in terms of undergraduate enrollment (≈ 21,000), endowment value (≈ $2.3B), and athletic expenditures (≈ $90M), placing them at the bottom of the EEE ranking.
On the other hand, Wake Forest is the smallest institution in the Power Five, but the school garners an academic reputation (ARR=45) that cannot be readily explained by its comparatively low EEE ranking (EEE=97).
At the time of our data collection (June to August 2016), these figures represent the Twitter followers for two universities at different ends of the ranking spectrum.
If we only consider alumni growth over time, we might expect Twitter followers to track at a similar rate.
The large disparity between Harvard and VMI presents a first indication that some correlation may exist between rank position and Twitter followers.