1
About me
マーク・バーンズ (Mark Burns)
about.me/mark.burns
日本語ができる Ruby developer (a Ruby developer who speaks Japanese)
On holiday from England
I love Ruby and startups
2
Introduction
Jim Breen’s (Monash University)
Japanese-English online dictionary
wwwjdic.com
Data freely available
Accepts user contributions
3
wwwjdic
(rewrite)
https://github.com/markburns/wwwjdic
4
Current interaction
GET http://wwwjdic.com
301 -> http://www.edrdg.org/cgi-bin/wwwjdic/wwjdic?1C
POST http://www.csse.monash.edu.au/~jwb/cgi-bin/wwwjdic.cgi?1E
BODY: dsrchkey=%CD%F1&dicsel=1
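A minimal Ruby sketch of how the POST body above could be built. It assumes, from the sample body, that the search key is sent as percent-encoded EUC-JP bytes (`%CD%F1` is 卵, "egg", in EUC-JP) and that `dicsel=1` selects the dictionary. `wwwjdic_search_body` is a hypothetical helper, not part of WWWJDIC or of the rewrite.

```ruby
require "uri"

# Hypothetical helper: builds the form body the existing search page appears
# to POST. Assumption: the CGI expects the key as EUC-JP bytes, percent-encoded.
def wwwjdic_search_body(key, dicsel: 1)
  euc_key = key.encode(Encoding::EUC_JP)
  # With no target encoding given, encode_www_form percent-encodes the raw
  # EUC-JP bytes of each component.
  URI.encode_www_form([["dsrchkey", euc_key], ["dicsel", dicsel]])
end

puts wwwjdic_search_body("卵") # => dsrchkey=%CD%F1&dicsel=1
```

Having to know the legacy encoding and the magic query parameters is exactly the friction a plain JSON API would remove.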
5
Response
6
Aims
JSON API
Cleaner UI
Nice features: e.g. autocomplete
Easily extensible open source codebase
7
JSON API
GET http://localhost:4000/卵.json (卵 = egg)
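A sketch of the proposed URL scheme in Ruby. The helpers `lookup_path` and `parse_lookup_path` are hypothetical, and the assumptions are that the word travels as UTF-8, percent-encoded in the path, with a `.json` suffix selecting JSON and no suffix falling back to HTML.

```ruby
require "uri"

# Hypothetical helpers for the proposed scheme: /<word> for HTML and
# /<word>.json for JSON. Names and routing rules are assumptions.
def lookup_path(word, format: nil)
  path = "/" + URI.encode_www_form_component(word) # UTF-8 percent-encoding
  format ? "#{path}.#{format}" : path
end

# Splits a request path back into the looked-up word and the response format.
def parse_lookup_path(path)
  match = %r{\A/([^/.]+)(?:\.(json|html))?\z}.match(path)
  return nil unless match
  word = URI.decode_www_form_component(match[1]) # decodes to a UTF-8 String
  [word, (match[2] || "html").to_sym]
end

puts lookup_path("卵", format: :json) # => /%E5%8D%B5.json
word, format = parse_lookup_path("/%E5%8D%B5.json")
puts format                           # => json
```

Keeping the word itself as the whole path is what makes the URL easy to remember and to type by hand.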
8
Simpler UI
(Example)
GET http://localhost:4000/卵
9
Autocomplete
10
Trie index
http://oldblog.antirez.com/post/autocomplete-with-redis.html
Autocomplete
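Following the approach in the linked post, each word contributes every one of its prefixes plus a copy of the full word marked with a trailing `*`; a lookup finds the rank of the typed prefix and scans forward until entries stop sharing that prefix. The sketch below is in-memory Ruby, not the project's code: a sorted Array stands in for the Redis sorted set (all scores 0, so members are ordered byte-wise), `Array#bsearch_index` plays the role of `ZRANK`, and `PrefixIndex` is a hypothetical name.

```ruby
# In-memory sketch of prefix-index autocomplete in the style of Antirez's post.
# Assumption: a sorted Array replaces the Redis sorted set.
class PrefixIndex
  def initialize(words)
    entries = []
    words.each do |word|
      # Every prefix of the word goes into the index...
      (1..word.length).each { |len| entries << word[0, len] }
      # ...plus the complete word, marked with a trailing "*".
      entries << "#{word}*"
    end
    @entries = entries.uniq.sort # String#<=> compares bytes, as Redis does
  end

  # Returns up to +limit+ complete words that start with +prefix+.
  def complete(prefix, limit = 10)
    # O(log N): index of the first entry that sorts at or after the prefix.
    start = @entries.bsearch_index { |entry| entry >= prefix }
    return [] if start.nil?

    results = []
    @entries[start..-1].each do |entry|
      break unless entry.start_with?(prefix) # left the prefix's range: stop
      results << entry.chomp("*") if entry.end_with?("*")
      break if results.size >= limit
    end
    results
  end
end

index = PrefixIndex.new(%w[egalitarian egg walrus walruses waltz])
p index.complete("eg")   # => ["egalitarian", "egg"]
p index.complete("walr") # => ["walrus", "walruses"]
```

The post fetches entries from Redis in fixed-size batches with `ZRANGE`; scanning the array in one pass is a simplification of that.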
11
Trie index
Time: O(log(N)), N ≈ 150,000
Space: N*(Ma+1), Ma = average word length (≈ 5.6)
≈ 51MB
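A back-of-the-envelope check of these figures, using Ma ≈ 5.6 from the speaker notes. Reading N(Ma + 1) as the number of sorted-set members (each word adds one entry per prefix plus its starred form; shared prefixes are stored only once, so this is an upper bound), the 51MB figure works out to roughly 50 bytes per member. That per-member figure is inferred here, not measured, and assumes 1MB = 10^6 bytes.

```latex
\begin{aligned}
\text{members} &\le N\,(M_a + 1) \approx 150{,}000 \times (5.6 + 1) = 990{,}000 \\
\frac{51 \times 10^{6}\ \text{bytes}}{990{,}000\ \text{members}} &\approx 51.5\ \text{bytes per member}
\end{aligned}
```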
12
TRIE
13
https://github.com/markburns/wwwjdic/blob/master/app/data_access/auto_
14
https://github.com/markburns/wwwjdic/blob/master/app/data_access/auto_
["eg", "ega", "egal", "egali", "egalit",
"egalita", "egalitar", "egalitari", "egalitaria",
"egalitarian", "egalitarian*", "egg", "egg ",
"egg (", "egg (e"]
15
https://github.com/markburns/wwwjdic/blob/master/app/data_access/auto_
["egg dish", "egg dishe", "egg dishes",
"egg dishes*", "egg l", "egg la", "egg lai",
"egg laid", "egg laid ", "egg laid i", "egg laid in",
"egg laid in ", "egg laid in w",
"egg laid in wi", "egg laid in win"]
16
["egg laid in wint", "egg laid in winte", "egg laid in winter",
"egg laid in winter*", "egg m",
"egg me", "egg mem", "egg memb", "egg membr",
"egg membra", "egg membran",
"egg membrane", "egg membrane*", "egg s",
"egg sa"]
https://github.com/markburns/wwwjdic/blob/master/app/data_access/auto_
17
"walr" != "walt"
"walrus"
["walr", "walru", "walrus", "walrus*",
"walruse", "walruses", "walruses*",
"walt", "waltz", "waltz ", "waltz (",
"waltz (c", "waltz (co", "waltz (com",
"waltz (comp"]
https://github.com/markburns/wwwjdic/blob/master/app/data_access/auto_
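The batch on this slide can be replayed through the break clause described in the notes. This Ruby sketch assumes the array is one batch already fetched from the sorted set starting at the rank of "walr", and, as in Antirez's post, compares only the overlapping characters of each entry and the prefix. `scan_batch` is a hypothetical helper.

```ruby
# Replays one fetched batch through an Antirez-style scan.
# Returns [completed_words, entry_that_triggered_the_break_or_nil].
def scan_batch(prefix, batch)
  results = []
  batch.each do |entry|
    # Compare only the overlapping part: "walt"[0, 4] vs "walr"[0, 4]
    min_len = [entry.length, prefix.length].min
    if entry[0, min_len] != prefix[0, min_len]
      return [results, entry] # break clause hit
    end
    results << entry.chomp("*") if entry.end_with?("*") # complete words only
  end
  [results, nil]
end

batch = ["walr", "walru", "walrus", "walrus*",
         "walruse", "walruses", "walruses*",
         "walt", "waltz", "waltz ", "waltz (",
         "waltz (c", "waltz (co", "waltz (com",
         "waltz (comp"]

matches, stopped_at = scan_batch("walr", batch)
p matches    # => ["walrus", "walruses"]
p stopped_at # => "walt"
```

Because the set is sorted, the first mismatch means no later entry can match, so the scan can stop without reading the rest of the batch.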
18
shutl.com & graphs
19
Isomorphism?
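A toy illustration of the graph idea from the notes, in plain Ruby rather than a graph database: the word すごい links to two sense nodes, and each sense links to English glosses. `SenseGraph`, the sense node names and their groupings are illustrative assumptions, not WWWJDIC data.

```ruby
require "set"

# Toy undirected graph: Japanese word <-> senses <-> English glosses.
class SenseGraph
  def initialize
    @edges = Hash.new { |hash, node| hash[node] = Set.new }
  end

  def link(a, b)
    @edges[a] << b
    @edges[b] << a
  end

  # fetch avoids the default block, so unknown nodes are not inserted.
  def neighbours(node)
    @edges.fetch(node, Set.new)
  end

  # English glosses reachable from a word through any of its senses.
  def glosses(word)
    neighbours(word).flat_map { |sense| neighbours(sense).to_a - [word] }.to_set
  end
end

graph = SenseGraph.new
graph.link("すごい", "すごい#positive")
graph.link("すごい", "すごい#negative")
%w[great wonderful fantastic].each { |en| graph.link("すごい#positive", en) }
%w[great terrible dreadful].each { |en| graph.link("すごい#negative", en) }

puts graph.glosses("すごい").sort.join(", ") # => dreadful, fantastic, great, terrible, wonderful
puts graph.neighbours("great").size          # => 2
```

"great" hangs off both senses, so a one-to-one word lookup cannot tell the two readings apart, while the sense nodes can.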
20
N-grams
安心 リフォーム へ の 近道 [TAB]29
(Anshin reform he no chikamichi: "a shortcut to worry-free home renovation")
安心 + リフォーム + へ + の + 近道
安心 [TAB]41,322,178
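A sketch of parsing n-gram count lines like the ones above, assuming the format shown on the slide: space-separated tokens, a TAB, then the occurrence count. Commas are stripped in case counts are formatted as on the slide. `NGram` and `parse_ngram_line` are hypothetical names.

```ruby
# One n-gram: its tokens and how often the sequence occurred.
NGram = Struct.new(:tokens, :count) do
  def order
    tokens.length # 5 for a 5-gram
  end
end

def parse_ngram_line(line)
  text, count = line.chomp.split("\t", 2)
  raise ArgumentError, "missing TAB in #{line.inspect}" if count.nil?
  # split(" ") splits on runs of whitespace and drops the trailing space
  NGram.new(text.split(" "), Integer(count.delete(",")))
end

gram = parse_ngram_line("安心 リフォーム へ の 近道\t29")
puts gram.order                                    # => 5
puts gram.count                                    # => 29
puts parse_ngram_line("安心 \t41,322,178").count   # => 41322178
```

Counts like these could then weight search results, for example ranking a common collocation above a rare one.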
21
Present/State of Play
Data import to redis
Indexed word lookup
Autocomplete
Begun work on text glossing
22
Noticeably Missing
Not yet released to production
No test/staging server
However, it should be easy enough to run locally
23
Future
WordNet plus graph DB => mapping of languages
Analysis of kanji
User experience/Design/Polish
N-grams
Other ideas/collaboration?
24
https://github.com/markburns/wwwjdic
http://www.slideshare.net/_mark_burns/slides-24568551
about.me/mark.burns
Questions?

Introduction to wwwjdic project

Editor's Notes

  1. My name is Mark Burns. I'm a Ruby developer, I speak Japanese, and I'm on holiday from England.
  2. I'm here today to talk about Jim Breen's Japanese dictionary, wwwjdic, and in particular about an open-source rewrite of this online dictionary. As you may have guessed, it was originally written, and is still mostly maintained, by Jim Breen, a retired professor (and current PhD student) at Monash University in Melbourne, Australia. It's freely available. Actually, I'm not 100% sure about the license, and I'm no internet or international lawyer, but it's a flexible license that allows free and commercial use, with a "please do the right thing and donate some money if it benefits you" kind of deal.
  3. The start of the rewrite is available here: [URL]. I'll also show the SlideShare URL at the end of the talk, so you can make a note of it and see all the various links. In the past I've spoken to Jim about making improvements to the web interface of the dictionary. I feel it could be better presented and more user-friendly and intuitive.
  4. For example, a typical lookup involves this kind of interaction: visit wwwjdic.com; get redirected to a long URL with a particular query parameter for the word-search page; fill in a form, which does a POST request to a URL with a specific query-string parameter and a specifically encoded body. The results are currently returned as HTML that looks like this:
  5. So it's great if you like information and know where to look. You have links to everything you might need to do, and more. And it's this "and more" that I think is the issue with a lot of information presentation. To be honest, it's not great for beginners, because little thought has gone into the hierarchy of importance of the information (which I'll come back to). Now, there's nothing wrong with this at all; it's just that it suits its specific audience, and by that I mean technically minded learners of Japanese. I can only guess, but I also imagine it is more commonly known amongst native English speakers than amongst native Japanese speakers.
  6. I thought it would be nice to make it more accessible in general. So my aims for this project are: a JSON API; a cleaner UI/UX; autocomplete and other nice UI touches; and maintainability.
  7. I propose an API where you can GET a simply defined, easy-to-remember URL: GET http://wwwjdic.com/egg.json
  8. And some nicer design for the HTML output. Now, I'm not a front-end designer by any means, but I can appreciate the philosophy of clean design.
  9. A first attempt was made using the Rails flavour of the ActiveRecord pattern against an SQL backend (easy to get up and running, but it squeezes the concepts of domain model and persistence together). But a dictionary is much more read-heavy than write-heavy, and the model of languages doesn't fit as well in a relational database. The existing data is a few flat text files, so I wanted a decent compromise for maintainability, and it would be nice not to completely throw away the performance of the existing system's custom C code reading from flat text files.
  10. Autocomplete was done with a trie index. The code and concept were pretty much taken from a blog post by Antirez (the author of Redis): http://oldblog.antirez.com/post/autocomplete-with-redis.html. It scales quite nicely: there are on the order of 150,000 entries. Time: O(log(N)). Space: N*(Ma+1), where Ma is the average length of a word (5.6), ≈ 51MB.
  11. OK, some details: not too specific, but hopefully detailed enough to keep everyone happy. This is the result of doing a lookup on an index generated for autocompletion. For example, the user searched for "egg", and the list shows all the following matches in the autocomplete list.
  12. Here’s the lookup
  13. After entering "eg", this is the value of `matches`. We iterate over each entry, and if it doesn't match the prefix, we break out; otherwise we append it to our list of matches.
  14. Here we have an example where the user has entered “walr” and the break clause is hit, as the value “walt” does not match “walr”
  15. In my work for Shutl, a UK startup aimed at solving the online delivery problem, we use graph databases to help us match up carrier/vehicle availability and pricing with customer requirements and retail store opening hours. I think it could be interesting to start structuring the dictionary data in a graph format. At the very least, words can be linked to the entries listed in their definitions, but a more semantically rich level of relationships could be represented too.
  16. I think that mapping words to a graph is a more natural way to express the relationship between two languages. Firstly, you don't always have isomorphic (one-to-one) relationships between words in the two languages. すごい (sugoi) can mean either great or terrible in English: it can mean something like wonderful or fantastic, as well as dreadful. I often struggle with words that are their own antonyms. This was particularly relevant to me on the day of the large Tōhoku earthquake, when I was on a shinkansen heading into Tokyo. After being on the train for six hours, I needed to get a beer and find some people to chat to, to find out what had happened. I'd understood that there had been an earthquake, but it was my first experience of one, and I hadn't yet grasped its magnitude, in both the literal and metaphorical senses of the word. So I found a guy who wanted to practise his English, and he explained to me that "This is a great day for Japan. Very great." Understanding it as something along the lines of wonderful or fantastic, I had to ask him, "Why? Is it a national holiday? Maybe the emperor's birthday?" Of course, when I translated his sentence into Japanese in my head, choosing すごい for "great", it occurred to me that he must have meant the terrible/dreadful sense of the word. So clearly there is a need for a richer, more expressive data model that can capture these nuances and senses, not just a one-to-one lookup service.
  17. Due to Jim's relationship with Monash University, he has access to Google's data set of Japanese n-grams. An n-gram sample: 安心リフォームへの近道 appears as 安心 リフォーム へ の 近道 [TAB]29 (a 5-gram), i.e. 安心 + リフォーム + へ + の + 近道; for comparison, 安心 [TAB]41322178 and 安 心 [TAB]3274. So this sequence of words occurred 29 times during the data collection. By using this data we can look at making search results more relevant. One of the problems with the existing flat-file structure is that there is no metadata to help with understanding how recent or relevant a particular result is. Some of the terms may be legal or scientific terms, or may date from before 1945. The data can be useful for spotting common collocations too.