2. Business
• Midterms graded
• Office hours
– Tomorrow 11:00AM—3:00PM
– Friday 1:00PM—3:00PM
• Project homework
– Mark-up text for paragraphs and quotes
– Quotes are SPAN elements with CLASS attributes
of either ‘quote’ or ‘extract’
– Make sure files and directories are named properly
3. Review
• Web 2.0
– Post-Google era of the web
– Massive participation in social media
– Social production of knowledge
– New models of how knowledge is
produced, maintained, organized
• Tags
– One example of this shift
– A new kind of knowledge “product”
6. Eric Fischer creates maps that merge geographic
locations with geotagged photos from Flickr and tweets
from Twitter. Red dots pinpoint the locations of Flickr
pictures, blue dots show tweets, white dots mark places
that have been posted to both. This map of
Washington, D.C., shows messages concentrating around
the national landmarks and power corridors of the
city’s federal zone.
http://anthonyflo.tumblr.com/post/7590868323/photographer-and-self-described-geek-of-maps
7. An algorithm generates a virtual Rome in 3D from
150,000 Flickr Users' Photos
http://www.popsci.com/gear-amp-gadgets/article/2009-09/building-virtual-cities-automatically-150000-flickr-photos
8. Flickr Photos Yield Tourist Trails. An algorithm
uses images from millions of tourists to suggest
ways for visitors to spend their time.
http://www.technologyreview.com/computing/25549/page1/
18. Francis Bacon in 1620 described a
new kind of knowledge based on
observation and induction
(empiricism). This view can be
partly traced to the successes of
exploration and instruments in
learning about the world.
19.
20. Anderson argues that a similar shift is
happening now
With the era of the “cloud” and massive
data
the Petabyte Age
comes a new kind of knowledge
21. The database is not just a symbolic
form
It is the pervasive and standard form in
which our knowledge is organized
22. Anderson
• The end of theory
– Positivism (see definition)
– It’s algorithms all the way down
• No need for models and causality
– Correlation is enough
• More is different
– The “Petabyte Age”
– The sheer amount of data makes it valuable
– Quality does not matter
23. Some Definitions
• Petabyte (PB) = 2^50 bytes
– 1,125,899,906,842,624 bytes
– 1,024 terabytes
• Positivism (my definition)
– A theory of knowledge that views physical laws and
models as more or less stable patterns
– Regards statistics and pattern recognition as more
authentic forms of knowledge than laws
– Radical empiricism (nothing “behind” the observed)
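The petabyte figures in the definition above can be checked with a few lines of arithmetic. This is only a sanity check, assuming the binary (power-of-two) convention the slide uses; the constant names are illustrative.

```python
# Binary (power-of-two) units, matching the slide's definition.
TERABYTE = 2 ** 40
PETABYTE = 2 ** 50

# Byte count of one petabyte, with thousands separators.
print(f"{PETABYTE:,} bytes")

# Number of terabytes in one petabyte.
print(PETABYTE // TERABYTE, "terabytes")
```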
24. The PageRank
algorithm
visualized
Google does not care about what is on a page, it
just cares about this
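The point of this slide, that the ranking depends on link structure rather than on page content, can be illustrated with a simplified sketch of the PageRank idea. This is not Google's production code: the toy link graph, the function name pagerank, the damping value of 0.85, and the fixed iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank.

    links maps each page to the list of pages it links to.
    Page content is never examined, only the link structure.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: D links out but nothing links to D.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph the page with the most incoming links ends up ranked highest, and the page nobody links to ends up lowest, regardless of what any page says.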
26. “AdWords analyzes every Google search to
determine which advertisers get each of up
to 11 ‘sponsored links’ on every results page.
It’s the world’s biggest, fastest auction, a
never-ending, automated, self-service
version of Tokyo’s boisterous Tsukiji fish
market, and it takes place, Varian
says, ‘every time you search.’ ”
Steven Levy, “Secret of Googlenomics: Data-Fueled Recipe Brews
Profitability,” WIRED 17.06.
http://www.wired.com/culture/culturereviews/magazine/17-06/nep_googlenomics
27. It’s all about the algorithm
There is no real theory behind the
formula
It just happens to work
29. Manovich’s experiments
explore this concept
(examples from Mapping Time Exhibit)
http://www.flickr.com/photos/culturevis/sets/72157624959121129/detail/
32. Anna Karenina
This visualization of Anna
Karenina is inspired by a common
reading practice of underlining
important lines and passages in a
text using magic markers. To
create this visualization we
designed a program that reads the
text from a file and renders it in a
series of columns running from top
to bottom and from left to right as
a single image. It also checks
whether text lines contain
particular words (this version
checks for the word Anna) and
highlights the found matches.
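The procedure described on this slide (read lines of text, lay them out in columns, flag lines containing a chosen word) can be sketched in a few lines. This is a text-only approximation, not the authors' program: it prints a marker instead of rendering an image, and the function names, the column height, and the sample lines (which are made up, not taken from the novel) are all assumptions.

```python
def flag_matches(lines, keyword="Anna"):
    """Pair each line with True if it contains the keyword."""
    return [(line, keyword in line) for line in lines]


def to_columns(items, height):
    """Split items into columns of at most `height` entries, filled
    top to bottom, with columns ordered left to right."""
    return [items[i:i + height] for i in range(0, len(items), height)]


# Made-up sample lines standing in for the text file.
sample_lines = [
    "Anna looked out of the window.",
    "The train was late again.",
    "Nobody spoke for a while.",
    "Then Anna stood up.",
]

flagged = flag_matches(sample_lines)
columns = to_columns(flagged, height=2)

for number, column in enumerate(columns, start=1):
    print(f"column {number}")
    for line, is_match in column:
        # ">>" stands in for the highlighter stroke in the image.
        print(("  >> " if is_match else "     ") + line)
```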
Original: http://www.flickr.com/photos/culturevis/4038907270/sizes/o/in/set-72157624959121129/
Mapping Time
Jeremy Douglass and Lev Manovich, 2009.
Data: The covers of every issue of Time magazine published from 1923 to summer 2009. Total number of covers: 4535. A large percentage of the covers included red borders. We cropped these borders and scaled all images to the same size to allow a user to see more clearly the temporal patterns across all covers.
Timescale: 1923-2009.
Mapping: Time covers appear in order of publication (i.e., from 1923 to 2009), arranged in a grid layout (left to right and top to bottom).
Mapping 4535 Time covers into a grid organized by publication date reveals a number of historical patterns. Here are some of them:
• Medium: In the 1920s and 1930s Time covers use mostly photography. After 1941, the magazine switches to paintings. In later decades photography gradually comes to dominate again. In the 1990s we see the emergence of the contemporary software-based visual language which combines manipulated photography, graphic and typographic elements.
• Color vs. black and white: The shift from early black and white to full color covers happens gradually, with both types coexisting for many years.
• Hue: Distinct “color periods” appear in bands: green, yellow/brown, red/blue, yellow/brown again, yellow, and a lighter yellow/blue in the 2000s.
• Brightness: The changes in brightness (the mean of all pixels’ grayscale values for each cover) follow a similar cyclical pattern.
• Contrast and Saturation: Both gradually increase throughout the 20th century. However, since the end of the 1990s, this trend is reversed: recent covers have less contrast and less saturation.
• Content: Initially most covers are portraits of individuals set against neutral backgrounds. Over time, portrait backgrounds change to feature compositions representing concepts. Later, these two different strategies come to co-exist: portraits return to neutral backgrounds, while concepts are now represented by compositions which may include both objects and people – but not particular individuals.
The visualization also reveals an important “metapattern”: almost all changes are gradual. Each of the new communication strategies emerges slowly over a number of months, years or even decades.
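Two pieces of the Mapping Time description are easy to make concrete: placing covers in a grid in publication order, and measuring brightness as the mean of a cover's grayscale pixel values. The sketch below assumes each cover is already available as rows of grayscale values from 0 to 255; the function names, the column count, and the tiny 2x2 "cover" are illustrative and not taken from the original project.

```python
def grid_position(index, columns):
    """Return (row, column) for the cover at `index` when covers are
    laid out left to right, then top to bottom."""
    return divmod(index, columns)


def brightness(gray_rows):
    """Mean of all grayscale pixel values (0-255) in one image."""
    values = [value for row in gray_rows for value in row]
    return sum(values) / len(values)


# Hypothetical 2x2 "cover": two black pixels and two white pixels.
tiny_cover = [[0, 255],
              [255, 0]]

print(grid_position(23, columns=10))
print(brightness(tiny_cover))
```

Computing this one number per cover and plotting it in publication order is enough to look for the cyclical pattern the notes describe.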
Source: http://www.flickr.com/photos/culturevis/5107682969/sizes/l/in/set-72157624959121129/
Source: http://www.flickr.com/photos/culturevis/5109394222/in/set-72157624959121129
Manga Style Space
Lev Manovich and Jeremy Douglass, 2010.
Data: 883 Manga series from the scanlation site OneManga.com. Total number of pages: 1,074,790. In the Fall of 2009, we downloaded 883 Manga series containing 1,074,790 unique pages from this site. We then used our custom software system installed on a supercomputer at the National Department of Energy Research Center (NERSC) to analyze visual features of these pages.
Timescale: The longest running Manga series has been published continuously since 1976. The most popular series on OneManga.com are Naruto (1999-; 8835 pages) and One Piece (1997-; 10562 pages). Along with such long Manga series, our data set also contains shorter series that appeared in the 2000s and only ran for 1-3 years.
Mapping: X axis: standard deviation of pixels’ grayscale values in a page. Y axis: entropy measured over all pixels’ grayscale values in a page.
The visualization shows 1,074,790 unique pages from 883 distinct manga series from Japan, Korea and China. The series include both very popular long-running titles such as Naruto and One Piece and also many short-lived titles. The visualization maps the pages according to some of their visual characteristics that were measured automatically on supercomputers at the U.S. National Department of Energy Research Center using custom software developed by Software Studies Initiative. (X-axis: standard deviation. Y-axis: entropy.)
The pages in the bottom part of the visualization are the most graphic and have the least amount of detail. The pages in the upper right have lots of detail and texture. The pages with the highest contrast are on the right, while pages with the least contrast are on the left. In between these four extremes, we find every possible stylistic variation. This suggests that our basic concept of “style” may not be appropriate when we consider large cultural data sets. The concept assumes that we can partition a set of cultural artifacts into a small number of discrete categories. In the case of our one million pages set, we find practically infinite graphical variations. If we try to divide this space into discrete stylistic categories, any such attempt will be arbitrary.
The visualization also shows which graphical choices are more commonly used by manga artists (the central part of the “cloud” of pages) and which appear much more rarely (bottom and left parts).
Note: some of the pages - such as all covers - are in color. However, in order to be able to fit all images into a single large image (the original is 44,000x44,000 pixels - scaled to 10,000x10,000 for posting to Flickr), we rendered everything in greyscale. Because pages are rendered on top of each other, you don't actually see 1 million distinct pages - the visualization shows a distribution of all pages with typical examples appearing on the top.
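The two axes of the Manga Style Space plot, standard deviation and entropy of a page's grayscale values, can be computed with the standard library alone. The notes do not say exactly how entropy was measured, so this sketch assumes Shannon entropy (in bits) over the distribution of grayscale values; the function names and the tiny sample "pages" are illustrative.

```python
import math
from collections import Counter


def grayscale_std(pixels):
    """Population standard deviation of grayscale values (X axis)."""
    mean = sum(pixels) / len(pixels)
    return math.sqrt(sum((v - mean) ** 2 for v in pixels) / len(pixels))


def grayscale_entropy(pixels):
    """Shannon entropy in bits of the grayscale value distribution
    (assumed definition for the Y axis)."""
    total = len(pixels)
    return sum((count / total) * math.log2(total / count)
               for count in Counter(pixels).values())


# Hypothetical pages given as flat lists of grayscale values (0-255).
flat_page = [128, 128, 128, 128]      # one uniform tone
stark_page = [0, 255, 0, 255]         # pure black and white
textured_page = [0, 85, 170, 255]     # several distinct tones

for name, page in [("flat", flat_page), ("stark", stark_page),
                   ("textured", textured_page)]:
    print(name, grayscale_std(page), grayscale_entropy(page))
```

Under these assumptions, a stark black-and-white page scores high on standard deviation but low on entropy, while a page with many distinct tones scores higher on entropy, which matches the notes' reading of the plot's right side as high contrast and its upper part as detailed and textured.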
Mondrian vs. Rothko
Lev Manovich, 2010.
Images preparation: Xiaoda Wang
Data: 128 paintings by Piet Mondrian (1905 - 1917). 151 paintings by Mark Rothko (1944 - 1957).
Mapping: X-axis: brightness mean. Y-axis: saturation mean. The two image plots are placed side by side so they share the Y-axis.
This visualization demonstrates how image plots can be used to compare multiple data sets. In this case, the goal is to compare a similar number of paintings by Piet Mondrian and Mark Rothko (produced over comparable time periods of 13 years) along particular visual dimensions. We have selected particular periods in the career of each artist which are structurally similar. In the beginning of a period each artist was imitating his predecessors and contemporaries. By the end of the period each developed his mature style for which he became famous. In between, each gradually moved from figurative representation to pure abstraction.
The left image plot shows 128 paintings by Mondrian; the right shows 151 paintings by Rothko. The paintings are organized according to their brightness mean (X-axis) and saturation mean (Y-axis). These measurements were obtained with digital image processing software.
Projecting sets of paintings of these two artists into the same coordinate space reveals their comparative "footprints" - the parts of the space of visual possibilities they explored. We can see the relative distributions of their works - the more dense and the more sparse areas, the presence or absence of clusters, the outliers, etc.
The visualizations also show how Mark Rothko - the abstract artist of the generation which followed Mondrian’s - was exploring the parts of brightness/saturation space which Mondrian did not reach (highly saturated and bright paintings in the upper right corner, and desaturated dark paintings in the left part).
Another interesting pattern revealed by the visualization is that all paintings of one artist are sufficiently different from each other – no two occupy the same point in brightness/saturation space. This makes sense given the ideology of modern art on unique original works – if we are to map works from earlier centuries, when it was common for artists to make copies of successful works which were considered to be equally valuable, we may expect to see a different pattern. However, what could not be predicted is that the distances between any two paintings which are next to each other are similar to each other – i.e., while each image occupies its own unique position, it's not very far from its neighbours.
To see how each artist moved through brightness/saturation space during the 13 year periods we are comparing, we can visualize the paintings as color circles. The colors indicate the position of each painting within the time period, running from blue to red. To make the patterns even easier to see, we also vary the size of the circles, from smallest to largest.
www.flickr.com/photos/culturevis/4728910768/in/set-721576...
This visualization reveals another interesting pattern. Rothko starts his explorations in the late 1930s-1940s in the same part of brightness/saturation space where Mondrian arrives by 1917 - the high brightness/low saturation area (the right bottom corner of the plot). But as he develops, he is able to move beyond the areas already “marked” by his European predecessors such as Mondrian.
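The plot coordinates described in these notes, brightness mean on the X-axis and saturation mean on the Y-axis, could be computed roughly as follows. The notes only say the measurements came from digital image processing software, so the choice of the HSV color model (value as brightness, S as saturation) is an assumption, as are the function name and the two-pixel sample "paintings".

```python
import colorsys


def brightness_saturation_means(rgb_pixels):
    """Return (mean brightness, mean saturation), each in 0.0-1.0,
    for a list of (r, g, b) pixels with channels in 0-255.
    Assumes HSV: V is used as brightness, S as saturation."""
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
           for r, g, b in rgb_pixels]
    mean_brightness = sum(v for _, _, v in hsv) / len(hsv)
    mean_saturation = sum(s for _, s, _ in hsv) / len(hsv)
    return mean_brightness, mean_saturation


# Hypothetical two-pixel "paintings".
vivid = [(255, 0, 0), (255, 0, 0)]        # saturated red
pale = [(255, 255, 255), (255, 0, 0)]     # half white, half red

print(brightness_saturation_means(vivid))
print(brightness_saturation_means(pale))
```

Each painting then becomes a single (x, y) point, which is what allows the two artists' "footprints" to be compared in one coordinate space.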
Original: http://www.flickr.com/photos/culturevis/4048646419/sizes/o/in/set-72157622608431194/
Film: The Eleventh Year
Data: every shot of the film is represented by 1 frame. The frames are arranged by brightness kurtosis (X) and number of shapes (Y). (Note that frames overlap, so not all of them are visible.)