1. Dr. Lev Manovich
Professor of Computer Science, The Graduate Center, City
University of New York / Director, Cultural Analytics Lab
lab.culturalanalytics.info
email: manovich.lev@gmail.com
What Does Data Want?
(answer: help humans to think without categories)
2. - How to use big cultural data without aggregation and
summarization?
- How to think without (traditional) categories?
- How to learn from computers to understand the world
differently?
- How to work with big data without numbers?
3. 1960 - Born in Moscow
1981 - came to NYC
1982-1985: BA (NYU Film School)
1986-1988: MA in Vision Science
1989-1993: PhD in Visual Culture
1992-2012: Professor of Digital Art
2013 - Professor of Computer Science
4. 1973 - started art lessons
1975 - started learning computer programming
1978 - started asking: how are images structured? How do they communicate?
1984 - started working in CGI
1986 - took classes in computer vision
2005 - the idea of cultural analytics
2007 - established Cultural Analytics Lab (UCSD)
2020 - Cultural Analytics book is published
5. Using data science to analyze contemporary culture:
Companies - marketing research, consumer preferences,
new product development, analysis of online and physical
behaviors
Non-profits - museums, universities, etc.
AI, Network science, many areas of Computer Science
Computational social science
Communication studies
Political science, Psychology, Sociology
Urban planning, Urban Studies
Data visualization, data design, data art
Digital Humanities (sometimes)
6. Research examples:
"Cultural diffusion & trends in Facebook photographs" (2017)
"StreetStyle: Exploring world-wide clothing styles from millions of
photos" (2017)
"Why the songs of the summer sound the same" (2018)
"Neuroaesthetics in Fashion: Modeling the Perception of
Fashionability" (2015)
Every Noise at Once (2013)
"Quantifying reputation and success in art" (2018)
11. What is “Culture”? Humanities vs Cultural Anthropology
Humanities: culture is material artifacts, texts, and media
objects created by a small number of authors
Cultural Anthropology: culture is behaviors, symbols, rituals,
values, beliefs; looking at society as a whole
For contemporary culture, it seems easy to combine both
perspectives using data - e.g. social media (SM) is both artifacts & behaviors.
But how informative are these behaviors?
12. Analyzing Culture: Digital Humanities vs Computer
Science
DH (mostly):
- analyzes historical artifacts created by professional creatives
Computer Science (mostly):
- analyzes contemporary artifacts & behaviors of ordinary
people (e.g. SM posts, images, video, online and physical
behaviors by billions of “normal” users)
18. Cultural analytics: using data methods to see
contemporary global culture (2005-):
inspiration: scientometrics, evolutionary biology, GIS
Research goals:
1) What are the themes, styles, behaviors and their
patterns in contemporary global culture?
2) Where are they active? (spatial distributions)
3) When do they emerge, and how do they diffuse and change over time?
We now have enough data to map some of this at
relatively high resolution - but -
19. The main challenge in studying contemporary culture with
data science (as I see it):
- Shall we aggregate big cultural data and reduce it to a small set
of patterns - the ideas, themes, styles, and behaviors that occur
frequently in the data? (Statistical paradigm -
standard today in data science and data-driven research.)
- This paradigm focuses on what is common to a number
of objects; it does not include what occurs infrequently.
20. - Or shall we refuse this dominant paradigm,
focusing instead on diversity, variability & differences
(including tiny ones) - i.e. work on big cultural data without
aggregation?
- include everything
- pay attention to the infrequent (but not outliers)
- identify small cultural islands (that usually disappear when
researchers use the dominant paradigm)
- question similarity (categories, clusters, dimension reduction,
etc.)
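A minimal sketch of the contrast drawn on the last two slides. The tags, their counts, and the `min_count` threshold are invented for illustration and do not come from any of the cited studies. Pruning infrequent values keeps the common pattern but erases the small "islands"; keeping everything preserves them.

```python
from collections import Counter

# Hypothetical tags attached to 12 photos (illustrative data only).
tags = ["selfie"] * 6 + ["food"] * 3 + ["lace-making", "ice-swimming", "kite-repair"]

counts = Counter(tags)


def prune_infrequent(counts, min_count):
    """Aggregation-style step: keep only values seen at least min_count times."""
    return {tag: n for tag, n in counts.items() if n >= min_count}


def keep_everything(counts):
    """No pruning: every value stays, ordered from rarest to most common."""
    return sorted(counts.items(), key=lambda item: item[1])


common = prune_infrequent(counts, min_count=3)
islands = [tag for tag in counts if tag not in common]

print("kept after pruning:", common)
print("lost after pruning:", islands)
print("full long tail:", keep_everything(counts))
```

In this toy data a quarter of the photos (3 of 12) carry tags that vanish once the threshold is applied, even though none of them is an error or an outlier in any meaningful sense.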
21. High-resolution data allows us to think outside of the
dominant intellectual paradigm of the modern period:
aggregation, reduction, categorization
Individualization paradigm in the media/data industry:
- Ad platforms making predictive models for each user (even
for different times of the day) & custom recommendations
- Search engines indexing every webpage they can find
- Spotify and other companies analyzing every music track
(40M+)
22. Examples of the dominant paradigm in data-driven
culture analysis:
Cultural Diffusion and Trends in Facebook Photographs (2017):
“We are interested in recognizing many different types of
cultural lifestyles or activities in photographs…we select
the most common concepts”
“…asked annotators to describe the main visible concepts
of images using a few keywords”
“After pruning infrequent keywords..”
23.
“Faces Engage Us: Photos with Faces Attract More Likes and
Comments on Instagram” (2014):
“Our dataset consists of 23 million Instagram photos and
over 3 million Instagram users…we randomly selected 1
million photos from this data set.”
“the existence of a face in a photo significantly affects its
social engagement. This effect is substantial, increasing the
chances of receiving likes by 38% and comments by 32%.”
24. “Exploring world-wide clothing styles from millions of photos” (2017)
Paper goals:
“- Identify common, visually correlated combinations of these
basic attributes (e.g., blue sweater with jacket and wool hat).
- Identify styles that appear more frequently in one city versus
another or more frequently during particular periods of time.
- Identify finer-grained, visually coherent versions of these
elements (e.g., sports jerseys in a particular style).”
26. “GeoStyle: Discovering Fashion Trends and Events” (2019)
“Most attribute combinations are uninteresting because of
their rarity: e.g., pink, short-sleeved, suits.
We want to focus on the limited set of attribute combinations
that are actually prevalent in the data.”
27. Supervised vs unsupervised machine learning for seeing
culture:
- supervised machine learning used for classification:
start with existing categories (defined by experts, or by
“common sense”) and then classify the rest of the data /
new data using these categories. Using neural nets only
makes this problem bigger.
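A small sketch of why starting from fixed categories can hide novelty. A nearest-centroid rule stands in here for any supervised classifier; it is an illustrative choice, not a method named in these slides, and the category names and feature values are hypothetical.

```python
import math

# Hypothetical 2-D feature vectors summarizing two expert-defined categories.
centroids = {
    "portrait": (0.9, 0.1),
    "landscape": (0.1, 0.9),
}


def classify(features):
    """Return the predefined category whose centroid is nearest."""
    return min(centroids, key=lambda name: math.dist(features, centroids[name]))


typical = (0.85, 0.15)  # close to the "portrait" centroid
novel = (5.0, 5.2)      # far from both categories

print(classify(typical))
print(classify(novel))
```

The classifier has no way to answer "neither": the `novel` item still receives one of the two existing labels, so whatever made it different disappears from the results.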
28. Cultural analytics - how my vision changed over time
- We want to challenge existing categories; ask if rigid
categories make sense for a particular cultural field; discover
its structure (2007)
- Unsupervised machine learning is well suited for these
goals; but success depends on how we represent a
phenomenon as data, what features we use (2010)
- But unsupervised machine learning also requires
aggregation, as does classical statistics - how to avoid this? (2016)
29. - Data paradigm offers a new language for describing and
thinking about culture
- Numerical (continuous) scales instead of verbal categories
- Representing continuous change over time
- Representing differences between cultural artifacts and actors
as numerical distances in feature space
- Detecting clusters instead of starting with already existing
categories
- In a cluster, any object has a particular distance to the center
(in traditional categories, it's either/or membership)
- But there are still key challenges -
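The points on this slide about feature-space distances and graded cluster membership can be sketched as follows. The photo names, the two features, and the `RADIUS` cutoff are assumptions made up for the example.

```python
import math

# Hypothetical artifacts described by two numerical features
# (for instance brightness and saturation, both scaled to 0..1).
cluster = {
    "photo_a": (0.20, 0.30),
    "photo_b": (0.25, 0.35),
    "photo_c": (0.60, 0.70),
}


def centroid(points):
    """Mean of each feature across all points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


center = centroid(list(cluster.values()))

# Difference between two artifacts as a distance in feature space.
a_to_c = math.dist(cluster["photo_a"], cluster["photo_c"])

# Graded description: each object has its own distance to the cluster center.
distances = {name: math.dist(p, center) for name, p in cluster.items()}

# Either/or description: inside or outside, given a hypothetical cutoff radius.
RADIUS = 0.25
membership = {name: d <= RADIUS for name, d in distances.items()}

print("distance a-c:", round(a_to_c, 3))
print("distances to center:", {k: round(v, 3) for k, v in distances.items()})
print("either/or membership:", membership)
```

The either/or view treats `photo_a` and `photo_b` as the same ("in"), while the distances show that `photo_b` sits closer to the center than `photo_a`.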
30. The problems with representing cultural artifacts using
numerical features:
- how do we know we have the right features?
- we don’t know how the brain combines visual features
- gestalt theory - the whole is not a mechanical sum of the
parts
- many images may be identical from a statistical point of view,
and yet they have crucial differences for a human observer -
tiny differences that make a difference
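A toy illustration of the last point, assuming two invented 4x4 grayscale "images". Simple summary statistics (mean, standard deviation, and even the full set of pixel values) are identical, yet the spatial arrangement that a viewer sees is completely different.

```python
from statistics import mean, pstdev

# Two hypothetical 4x4 grayscale images (0 = black, 255 = white).
left_right = [[0, 0, 255, 255] for _ in range(4)]           # dark left, bright right
checkerboard = [[0, 255, 0, 255], [255, 0, 255, 0]] * 2     # alternating squares


def summary(image):
    """Order-free statistics: mean, standard deviation, sorted pixel values."""
    pixels = sorted(v for row in image for v in row)
    return mean(pixels), pstdev(pixels), pixels


same_statistics = summary(left_right) == summary(checkerboard)
same_image = left_right == checkerboard

print("same statistics:", same_statistics)
print("same image:", same_image)
```

Any feature that ignores where pixels sit would place these two images at distance zero from each other, which is the kind of gap between numerical features and human perception the slide describes.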
31. - Next slides: examples of Instagram photography
(2005-2016)
- Can data science and AI today capture all the differences
between these artifacts - between authors’ visions and the
differences among all individual photographs? (in content, visual
language, mood, emotions - and all of this together for each
image - because this is how many people see)
- Every person may see each artifact differently depending on
her background, education, knowledge of codes, what she has
seen before, etc. Can a data approach capture such variability?
(recommendation engines research?)
37. - Cultural Analytics vision (2007-2008)
- Examples of projects from our lab (2009-2015)
59. Elsewhere project (2018-)
- Instead of using social networks data (posts by
individuals), we use information about cultural events
shared by organizations on different platforms.
- During the last 20 years, the numbers of these places and
events have become so large that we can now treat
them as “big data.”
- Examples of data sources: TEDx events, e-flux archive,
Meetup, Behance. Our dataset: 4.5 million events
60. Elsewhere project (2018-) - using locations, categories, dates
and text descriptions of millions of cultural events
1) What is the presence of some of contemporary culture
(CC) - as represented by our data sources - on a world map
today? What is the density and depth of this presence in
different places? Are there still big white spots?
2) What is the temporal growth and diffusion of CC (1990-)?
3) What are the topics, concepts and themes in CC? What
occurs everywhere, what is only somewhere, what is elsewhere
(outside of top cities), what is unique?
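One way question 3 could be approached, sketched under strong assumptions: the (city, topic) pairs below are invented, and the three labels are a simplification that does not model "elsewhere" (outside of top cities), which would need a separate list of top cities.

```python
from collections import defaultdict

# Hypothetical (city, topic) pairs extracted from event descriptions.
events = [
    ("Berlin", "design"), ("Lagos", "design"), ("Lima", "design"),
    ("Berlin", "bio-art"), ("Lima", "bio-art"),
    ("Lagos", "afrofuturism"),
]


def topic_spread(events):
    """Label each topic by how many of the observed cities it appears in."""
    all_cities = {city for city, _ in events}
    where = defaultdict(set)
    for city, topic in events:
        where[topic].add(city)

    spread = {}
    for topic, found_in in where.items():
        if found_in == all_cities:
            spread[topic] = "everywhere"
        elif len(found_in) == 1:
            spread[topic] = "unique"
        else:
            spread[topic] = "somewhere"
    return spread


spread = topic_spread(events)
print(spread)
```

Note that nothing is pruned here: a topic that appears in a single city is kept and labeled rather than dropped as too rare.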