MarkLogic User Group - Best of MLW and Search + Semantics
2. Slide 2 Copyright © 2013 MarkLogic® Corporation. All rights reserved.
<MLUGL>
<intro/>
<talk>
<bit>Mission Impossible</bit>
<story>Wiley</story>
<story>Springer</story>
<story>Mitchell1</story>
<bit>Search and Semantics</bit>
<demo>Old Skool</demo>
</talk>
</MLUGL>
3. Slide 3
Mission(s) Impossible
4. Slide 4
<story>http://www.marklogic.com/resources/slides-gearing-up-for-the-content-factory-to-quickly-create-innovate-and-monetize/</story>
5. Slide 5
Why is it Mission Impossible?
Start Revenue Earning January 2013
• Publish new content from 1 Jan 2013
• Accepted Articles : 20/day; 100/week; 400/month
• Early View Articles: 20/day; 100/week; 400/month
• Issues : 19/month; 77/quarter; 230/year
Give AGU customers access to all licensed content by 1 January 2013
• 21 journals (160,000 articles)
• 33 personal choice products (aka virtual journals) based on AGU index terms
• 743 special sections
• Migrate customers, users, products, licenses, alerts data
Vendors, systems & business processes in Editorial & Production ready to publish 2013 content
• Integration with new editorial system
• Changes to workflow
And… it needs to work the way the AGU site works, with over 60 enhancements
6. Slide 6
Key Challenges
• Content with no issue number and no pagination
• Journal with 7 parts, 3 of which have sub-parts!
• Many moving parts within Wiley - 17 systems to check
• Content completeness and quality (and external vendor)
• Unknown unknowns - coping with changing and emerging requirements throughout the development phase
Challenges to overcome
• 4 months left!
7. Slide 7
Content-Driven Functionality – Special Section Search
Examples:
“Coastal Ocean Observatories”
“The 11 March 2011 Tohoku-Oki Earthquake and Tsunami”
8. Slide 8
How MarkLogic Helped - S/W Development
Search Service
• As a search engine, MarkLogic needs no manual or additional re-indexing after new content is loaded. Everything is done on the fly, which saves time and effort
• Enabled reuse: only a few enhancements to the search service were needed for AGU
Save Searches
• The search service processes requests as XML, so it is easy to save a whole search and reuse it, either for alerts or to reload the saved search
Index Terms
• Reused the vocabulary service to handle the hierarchy of index terms. This was especially valuable for faceting on index terms. Any sub-structure of index terms can be fetched easily
Faceting
• MarkLogic supports faceting, so nothing special was needed beyond adding the proper configuration according to the AGU specification
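The "Save Searches" point above can be illustrated with a minimal sketch. It only shows the general idea of keeping a whole search request as XML so it can be stored and replayed later; the element names (`search`, `term`, `facet`) and both function names are assumptions made for this example, not the actual Wiley or MarkLogic request format.

```python
import xml.etree.ElementTree as ET


def search_to_xml(terms, facets):
    """Serialize a search request (terms plus facet selections) to an XML string.

    The element names used here are hypothetical, chosen only for illustration.
    """
    root = ET.Element("search")
    for term in terms:
        ET.SubElement(root, "term").text = term
    # Sort facets so the same search always serializes to the same string.
    for name, value in sorted(facets.items()):
        ET.SubElement(root, "facet", name=name).text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_search(xml_text):
    """Rebuild the (terms, facets) pair from a saved XML search."""
    root = ET.fromstring(xml_text)
    terms = [t.text for t in root.findall("term")]
    facets = {f.get("name"): f.text for f in root.findall("facet")}
    return terms, facets


# Save a search once, then reload it later (for an alert or a "saved search" link).
saved = search_to_xml(["tsunami"], {"index-term": "Oceanography"})
terms, facets = xml_to_search(saved)
```

Because the saved form is just the request itself, the same string could feed either an alert job or a reload of the search, which is the reuse the slide describes.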
9. Slide 9
What Variations/Non-Standard Practices Were Introduced
• New licensing model (e.g. multi choice product for personal subscribers)
• Create Special Sections as another slice of content view
• New workflow for handling daily society data updates via feeds
• Changing content workflow for legacy vs current content
• Improvements to content (not just conversion)
• Start development before requirements were clear
• Complete testing before we had all the content
• Cannot complete certain types of testing
• Break some rules
Recipe for Disaster?
10. Slide 10
Conclusion
• Mission Impossible? Choose not to accept
• Mission Impossible? Deal with it: that’s life, but you may not succeed
• Mission Impossible? New organizational capability
• Embrace challenge, but put your best people with experience on it
• Be brave to break the rules when required
• People over Process
• Enabling technologies like MarkLogic
Develop as a new capability to handle the unexpected and the unknowns
11. Slide 11
<story>http://www.marklogic.com/resources/betting-the-company-how-springer-successfully-insourced-its-flagship-content-platform/</story>
12. Slide 12
Betting the Company | 4/6/2013 | 18
Growth in electronic sales
[Chart: Total Online vs. Total Print, 2007 to 2012 Bud; y-axis 0.0% to 100.0%; data labels 66 and 33]
13. Slide 13
So... Springer decided to build its own platform
14. Slide 14
36 man-years of effort to reproduce
36 man-years: how long an independent software auditor estimated it would take to reproduce the existing code base
15. Slide 15
A risky move?
MetaPress code base
16. Slide 16
Oh, and have it ready in 11 months
17. Slide 17
Where we were in April 2011
• People
• 1 Executive Champion
• 1 Product Owner
• 1 Dir. of Dev
• 1 Tech Lead
• 2 Developers
• 1 BA
• 0 QA
• 0 DevOps
• 0 UX/design/front-end
• 0 architect
• Hardware/Software/Data
• 0 databases
• 0 servers
• 0 documents
7 staff*
*3 managers, who don’t count
Jan-Erik de Boer, EVP of IT
Brian Bishop, Product Owner
Georg Nold, Dir. of Development
19. Slide 19
Where we are today
• 1 Executive champion
• 1 Product Owner
• 1 Dir. of Dev
• 2 Tech Leads
• 16 Developers
• 2 Dev Ops
• 4 BAs
• 6 QAs
• 2 UX
• 2 Design/Front-end
• 1 Architect
• 16 servers
• 2 live environments
• 1 database
• 12 pairing stations
• 2 Build Agents
• 2 dashboard machines
• 5.7 million documents
• 60 million PNGs
• 11TB of data
31 staff
20. Slide 20
New platform release schedule
[Timeline chart: 49 releases marked across Jan to Dec]
21. Slide 21
22. Slide 22
[Diagram labels: MarkLogic cluster; RESTful APIs; realtime.springer.com; citations.springer.com; iPhone apps]
23. Slide 23
Goals are prioritized (top to bottom) and stories are prioritized (left to right)
Velocity is measured every week, allowing us to accurately forecast when a certain level of work can be completed
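The forecasting step on this slide comes down to simple arithmetic, sketched below. The function name, the story-point unit, and the sample numbers are all assumptions for illustration; the slide does not say how Springer computed its forecasts.

```python
import math


def forecast_weeks(remaining_points, weekly_velocities):
    """Estimate how many whole weeks are needed to finish the remaining work.

    Uses the plain average of the measured weekly velocities; this is one
    common approach, not necessarily the one used by the team in the talk.
    """
    if not weekly_velocities:
        raise ValueError("need at least one measured week")
    average = sum(weekly_velocities) / len(weekly_velocities)
    if average <= 0:
        raise ValueError("average velocity must be positive")
    # Round up: a partly used week still has to pass before the work is done.
    return math.ceil(remaining_points / average)


# Hypothetical numbers: 120 points left, three measured weeks averaging 20 points.
weeks = forecast_weeks(120, [18, 22, 20])
```

Measuring velocity every week keeps the average current, so the forecast is corrected as soon as the team speeds up or slows down.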
24. Slide 24
MarkLogic IS agile
25. Slide 25
MarkLogic agility
• Schema-less means we can use our complex XML content as-is
• E.g. different attributes for books, journals, chapters, articles, protocols, etc.
• You can decide later if you need to add indexes at very little cost
• You don’t have to know everything up front
• Ingestion is relatively pain-free
• You are free to come up with features without worrying about back-end
• Modifying content via Record Loader makes it easy to manipulate data
• Handles various types of native content
• You don’t even have to use XQuery!
26. Slide 26
What if you could subscribe to a search query?
27. Slide 27
Content Entitlements
Storing entitlements as queries means any new content loaded automatically becomes available to authorized users
2TB
<content>
  Journal_ID: 0001
  ContentType: Article
  DatePublished: 4/4/2012
  Subject: Mathematics
  Author: John Smith
  Language: English
  Keywords: “k theory”
<material_ID=“001”>
  Subject: Engineering
<material_ID=“002”>
  Journal_ID: 0001-0099
<material_ID=“003”>
  Subject: Engineering
  SearchTerm: “carbon nanotube”
  DatePublished: 2000-2012
Customers
<customer=“001”>
  material_ID: 001
These are stored as serialized queries
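The entitlement idea on this slide can be sketched as follows. This is only an illustration of "an entitlement is a stored query": the dictionary layout, the `matches` and `is_entitled` helpers, and the `Year` field are assumptions for the example, and the free-text SearchTerm condition of material 003 is left out for brevity. MarkLogic itself would store and evaluate real serialized queries rather than Python dicts.

```python
# Each material is a stored query: field -> exact value, or (low, high) range.
# Field names follow the slide; the structure itself is hypothetical.
MATERIALS = {
    "001": {"Subject": "Engineering"},
    "002": {"Journal_ID": (1, 99)},
    "003": {"Subject": "Engineering", "Year": (2000, 2012)},
}

# Customers hold references to materials, not lists of documents.
CUSTOMERS = {"001": ["001"]}


def matches(query, doc):
    """Return True when the document satisfies every condition of the query."""
    for field, wanted in query.items():
        value = doc.get(field)
        if value is None:
            return False
        if isinstance(wanted, tuple):
            low, high = wanted
            if not (low <= value <= high):
                return False
        elif value != wanted:
            return False
    return True


def is_entitled(customer_id, doc):
    """A customer may read a document if any of their materials matches it."""
    material_ids = CUSTOMERS.get(customer_id, [])
    return any(matches(MATERIALS[m], doc) for m in material_ids)


# A document loaded after the entitlement was created needs no extra bookkeeping.
new_doc = {"Journal_ID": 1, "Subject": "Engineering", "Year": 2013}
math_doc = {"Journal_ID": 1, "Subject": "Mathematics", "Year": 2012}
```

Here `is_entitled("001", new_doc)` holds without any per-document update, while `math_doc` is not covered by customer 001's single material, which is the behaviour the slide attributes to storing entitlements as queries.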
28. Slide 28
How did it go?
29. Slide 29
[Bar chart: Average Page Load Time (sec), Old vs. New; y-axis 0 to 12]
30. Slide 30
Weekly visits to SpringerLink (millions, Aug 4, 2012 – Mar 2, 2013)
Source: Google Analytics
[Chart: series link.springer.com, SpringerLink.com, and Total; y-axis 0 to 5,000,000]
31. Slide 31
<story>http://www.marklogic.com/resources/the-journey-from-print-to-online/</story>
33. Slide 33
Mitchell1: Data
65 OEM Auto and Part Manufacturers
Data on every modern car sold in US
Repair
Diagnostics
Maintenance
Technical Service Bulletins (TSBs)
Wiring
Estimator
34. Slide 34
What’s in the data store today?
• Articles – 408,892
– 209,987 Narratives
– 103,416 Technical Service Bulletins and Recalls
– 15,179 Maintenance Schedules
• Images – 6,193,647
– 5,924,959 Narrative
– 268,688 Technical Service Bulletins and Recalls
• When it’s all broken down, it becomes roughly 16,000,000 MarkLogic Documents
35. Slide 35
And how do we describe it?
• Preferred Terms
– Tends to be the ASE term
– Used to describe Components (12,261), Diagnostic Trouble Codes (65,525), and Information Types (98)
• Non-Preferred Terms
– Tends to be OEM specific terminology
– Alternate terms for Components (22,733) and Information Types (757)
– Codes do not have Non-Preferred Terms
• Spatial References
– Because “Replace the window motor” just isn’t precise enough
36. Slide 36
Mitchell1: Data Then, Data Now
37. Slide 37
Mitchell1: Data Then, Data Now
38. Slide 38
Mitchell1: Data Then, Data Now
39. Slide 39
Mitchell1: Market Reaction
https://www.youtube.com/watch?v=IfM8v-8NY_4&list=UUIOYnh6LBFooV_YxlPVPLvA&index=36
40. Slide 40
Search . . . and Semantics
41. Slide 41
One Question . . .
42. Slide 42
Who’s Smarter?
VS
43. Slide 43
Do domestic dogs interpret pointing as a command?
Animal Cognition (2012): 1-12, November 09, 2012
By Scheider, Linda; Kaminski, Juliane; Call, Josep; Tomasello, Michael
46. Slide 46
The Basic Idea
Get some triples . . . if you haven’t already
• Grabbed DBPedia
• Dumped in Linked Data Consortium
• Loaded Lehigh
• and NYT’s open data
You are behind!
But what if you could add in documents?
47. Slide 47
Rich MarkLogic Applications .. Made Richer
48. Slide 48
Rich MarkLogic Applications .. Made Richer
Name: John Smith
Affiliation: IBM
Timezone: PST
Committer: Hadoop
49. Slide 49
Semantics Architecture
TRIPLE
XQY XSLT SQL SPARQL
GRAPH
SPARQL
50. Slide 50
Triple Index
• 3 triple orders
• Cached for performance
• Works seamlessly with other indexes
• Security
• 350 bytes per triple on disk
• 1 billion+ triples per host
TRIPLE
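The "3 triple orders" bullet can be illustrated with a small in-memory sketch. It assumes the three orders are the rotations subject-predicate-object, predicate-object-subject, and object-subject-predicate; the slide does not say which orders MarkLogic keeps, and the class and method names here are hypothetical. The point is only that with three sorted copies, any pattern with bound terms can be answered by a prefix range scan.

```python
from bisect import bisect_left


class TripleIndex:
    """Toy triple index holding three sorted permutations of every triple."""

    # Assumed orders (not confirmed by the slide): positions into (s, p, o).
    ORDERS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}

    def __init__(self, triples):
        self.sorted = {
            name: sorted(tuple(t[i] for i in perm) for t in triples)
            for name, perm in self.ORDERS.items()
        }

    def match(self, s=None, p=None, o=None):
        """Return all triples matching the pattern; None acts as a wildcard."""
        pattern = (s, p, o)
        bound = sum(term is not None for term in pattern)
        # Pick the order whose leading columns cover every bound term.
        for name, perm in self.ORDERS.items():
            prefix = []
            for i in perm:
                if pattern[i] is None:
                    break
                prefix.append(pattern[i])
            if len(prefix) == bound:
                break
        prefix = tuple(prefix)
        rows = self.sorted[name]
        results = []
        # Binary search to the first candidate row, then scan while the prefix holds.
        for row in rows[bisect_left(rows, prefix):]:
            if row[:len(prefix)] != prefix:
                break
            triple = [None, None, None]
            for position, i in enumerate(perm):
                triple[i] = row[position]  # undo the permutation
            results.append(tuple(triple))
        return results


index = TripleIndex([
    ("john", "birth-place", "london"),
    ("john", "first-name", "John"),
    ("mary", "birth-place", "paris"),
])
born = index.match(p="birth-place")
```

Each lookup touches only one sorted copy, so the cost of keeping three orders is paid in storage rather than at query time.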
51. Slide 51
SPARQL
• Executed using the triple index
• SPARQL 1.0
• Cost-based optimization
• Join ordering and algorithms
• More in the lightning talks
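# The slide does not define the default prefix; the IRI below is a placeholder assumption
PREFIX : <http://example.org/>
select * where {
  ?person :birth-place ?place ;
          :first-name "John"
}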
SPARQL
54. Slide 54
Old Skool
- Quickie Framework
- Circa 2006ish
- HTML tables -> 1997 style
- ‘action’ controller
- <query/> state -> from the query string
- No sessions
- No CSS
- No JavaScript
- No Adaptive Design
- No Facets?
58. Slide 58
Just Semantics?
Editor's notes
<< JBG: Data Now slide needs to be replaced. A slide at the end of this presentation contains an appropriate image. >>
Run it past Michaline and Dave Gorbet
Include fulltext index in exposition. Not all index has to be in memory
Roles and permissions
Check sizing
See a SPARQL query
Spend a bit more time on this slide