Arxiv.org: Research And Development Directions


This presentation describes the arXiv.org collection and its users, development work on authentication and access control, and research projects in text classification and time series analysis. It is a 16-slide Microsoft PowerPoint presentation given at a November 2003 Information Science Open House.


ArXiv.org
250,000 documents
47,000 registered users
1 million+ downloads per year

Cost Per Paper
Commercial journal: $10,000
Non-profit journal: $1,000
arXiv: $10

Goal: Process increasing number of submissions at
constant or declining cost

arXiv has an active core of users: 10% of users are responsible for about 1/3 of all submissions, and 50% of all users have logged in (to submit or update a paper) in the past 1.5 years.

Authentication and Access Control
arXiv recently moved from a system based on HTTP authentication and a Berkeley database to one based on cookies and a relational database.
Currently, all registered users who have not been suspended can submit to all subject classes in all archives; the original submitter, or anybody with the paper password, can update a paper.
Whether people may register depends on their e-mail address: abc@university.edu can register, but xyz@company.com cannot unless the company is on an approved list (ibm, lucent, …). This scheme has three problems: the list is hard to maintain (we have to block popular ISPs in every country); exceptions are handled manually at great cost (each case takes detective work); and many people with .edu addresses (alumni, non-research staff) should not be able to submit.
Because registration and submission are linked, the user database can't be used to offer other services such as e-mail notification and personalization.
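A minimal sketch of the kind of address-based rule described above may make its weaknesses easier to see. The function name, the domain lists, and the example ISP are all hypothetical; this is not arXiv's actual code or configuration.

```python
# Hypothetical allow-list policy modeled on the rule described in the slide.
# Every domain below is illustrative, not arXiv's real configuration.
ALLOWED_SUFFIXES = (".edu",)                    # academic addresses pass by default
ALLOWED_COMPANIES = {"ibm.com", "lucent.com"}   # hand-maintained exceptions
BLOCKED_DOMAINS = {"example-isp.net"}           # popular ISPs, blocked one by one


def can_register(email: str) -> bool:
    """Return True if the address passes the domain-based registration rule."""
    local, sep, domain = email.strip().lower().rpartition("@")
    if not sep or not local or not domain:
        return False  # not a usable address
    if domain in BLOCKED_DOMAINS:
        return False
    if domain in ALLOWED_COMPANIES:
        return True
    return domain.endswith(ALLOWED_SUFFIXES)


print(can_register("abc@university.edu"))  # True
print(can_register("xyz@company.com"))     # False
```

Note that an alumni address ending in .edu passes this check, and every new company or ISP needs a manual edit to one of the sets, which are the maintenance problems the slide points out.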

Endorsements and Trust Management
[Diagram labels: Administrators, Grandfathered Users]

In the new system, everyone will be able to register. Users who registered under the old system will still be able to upload to any archive or subject class, but new users will need to be endorsed by an author with a publication history in that category. The burden shifts from one senior staff person to 47,000 registered users. The user database can then be used to offer other services.
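The endorsement rule can be sketched as a small data model. Everything here is an assumption made for illustration: the `User` fields, the `MIN_PAPERS` threshold (the slides give no number), and the function names do not come from the arXiv codebase.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    grandfathered: bool = False                      # registered under the old system
    suspended: bool = False
    papers: dict = field(default_factory=dict)       # category -> number of papers
    endorsements: set = field(default_factory=set)   # categories endorsed for


MIN_PAPERS = 1  # assumed threshold for "a publication history"


def can_endorse(user: User, category: str) -> bool:
    """An endorser needs a publication history in the category."""
    return not user.suspended and user.papers.get(category, 0) >= MIN_PAPERS


def endorse(endorser: User, endorsee: User, category: str) -> bool:
    """Record an endorsement if the endorser is qualified; report success."""
    if not can_endorse(endorser, category):
        return False
    endorsee.endorsements.add(category)
    return True


def can_submit(user: User, category: str) -> bool:
    """Grandfathered users submit anywhere; new users need an endorsement."""
    if user.suspended:
        return False
    return user.grandfathered or category in user.endorsements
```

Because registration no longer implies the right to submit, a `User` record can exist for someone who only wants other services, which is the decoupling the slide describes.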

[Diagram: Endorser and Endorsee]

Web-based interface for administrators:
• View user history and publications
• Monitor endorsement process
• Manage authority records
• Disable ability to submit or endorse
• Keep “institutional memory”

Future Directions
• Flexible Submission Queue (currently submissions are published the following evening; we can't easily delay a submission)
• Validating Metadata Form (force users to clean up entry errors, so administrators don't have to)
• Automatic Protection (suspicious submissions and endorsements will be automatically delayed)
• New Search Engine based on Lucene
• Retrofit e-mail notification (current awareness) to use the new user database
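As a rough illustration of the first and third items, a queue in which each submission carries its own release date lets flagged items be held back automatically. The class, the field names, and the three-day hold are assumptions for this sketch; the slides do not say how the queue would be built or how suspicious items would be detected.

```python
from dataclasses import dataclass
from datetime import date, timedelta

HOLD_DAYS = 3  # assumed extra delay for flagged items; not a figure from the slides


@dataclass
class Submission:
    paper_id: str
    received: date
    suspicious: bool = False  # set by some upstream check, not modeled here

    @property
    def release_date(self) -> date:
        # Normal items appear the following day; flagged items wait longer.
        delay = 1 + (HOLD_DAYS if self.suspicious else 0)
        return self.received + timedelta(days=delay)


def due_for_announcement(queue, today: date):
    """Return ids of submissions whose release date has arrived, oldest first."""
    ready = [s for s in queue if s.release_date <= today]
    return [s.paper_id for s in sorted(ready, key=lambda s: s.release_date)]


queue = [
    Submission("a", date(2003, 11, 3)),
    Submission("b", date(2003, 11, 3), suspicious=True),
]
print(due_for_announcement(queue, date(2003, 11, 4)))  # ['a']
```

Storing a per-item release date, rather than publishing everything received by a fixed cutoff, is what makes a delay easy to apply to a single submission.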

Classifying Articles with the Support Vector Machine
Paul Ginsparg, Paul Houle, Thorsten Joachims, Jae-Hoon Sul

Goal: identify papers in existing archives that are relevant to a new subject archive, q-bio (Quantitative Biology)

Active Training of SVM
[Figure legend: Training: q-bio; Training: not q-bio; Other far from margin; Other close to margin]

The SVM finds the maximum-margin hyperplane. We do a first training run on one year of data, then identify other papers that lie close to the dividing line. We iteratively classify these by hand to refine the classifier.
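One round of this loop can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the trainer is a simple sub-gradient method on the regularised hinge loss (a basic way to fit a linear SVM), the two-dimensional vectors are made-up stand-ins for document features, and all function names are hypothetical.

```python
def decision(w, b, x):
    """Signed score of x; its sign is the predicted class."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=50):
    """Sub-gradient descent on the L2-regularised hinge loss (labels +1/-1)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * decision(w, b, xi)
            w = [(1 - eta * lam) * wj for wj in w]   # shrink: regulariser step
            if margin < 1:                            # inside margin: hinge step
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b


def closest_to_margin(pool, w, b, k):
    """Keys of the k unlabeled items with the smallest |score|.

    Dividing by the norm of w would give true distances, but the ranking
    is the same, so it is skipped here.
    """
    return sorted(pool, key=lambda key: abs(decision(w, b, pool[key])))[:k]


# Toy data: +1 stands for "q-bio", -1 for "not q-bio".
X = [(2, 0), (3, 1), (-2, 0), (-3, -1)]
y = [1, 1, -1, -1]
pool = {"p1": (4, 1), "p2": (0.2, -0.5), "p3": (-5, 0), "p4": (-0.3, 0.4)}

w, b = train_linear_svm(X, y)
to_review = closest_to_margin(pool, w, b, k=2)
print(to_review)  # ['p4', 'p2']: the two items nearest the dividing line
```

In a full loop, the items in `to_review` would be labeled by hand, appended to `X` and `y`, and the classifier retrained; items far from the margin are left alone because their labels are unlikely to move the hyperplane.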

Classifier performance improves as the size of a category increases.

Time Series Analysis of Content and Usage Information
Paul Ginsparg, Jon Kleinberg

Kleinberg's algorithm uses a hidden Markov model to detect bursts of word usage in arXiv titles, revealing intellectual trends in the last decade of high-energy physics theory.
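The idea can be sketched with a two-state model in the spirit of Kleinberg's burst detection: a baseline state and a burst state with a higher word rate, a cost for entering the burst state, and a Viterbi pass to find the cheapest state sequence. The parameters `s` and `gamma`, the function name, and the counts are assumptions for illustration; the code assumes the overall rate lies strictly between 0 and 1.

```python
import math


def detect_bursts(r, d, s=2.0, gamma=1.0):
    """Return a 0/1 state per batch (1 = burst) with minimum total cost.

    r[t] = titles containing the word in batch t, d[t] = all titles in batch t.
    """
    n = len(r)
    p0 = sum(r) / sum(d)            # baseline rate of the word
    p1 = min(s * p0, 0.9999)        # elevated rate in the burst state
    up_cost = gamma * math.log(n)   # price of entering the burst state

    def fit(p, rt, dt):
        # Negative binomial log-likelihood; the C(dt, rt) term is the same
        # for both states, so it is dropped.
        return -(rt * math.log(p) + (dt - rt) * math.log(1 - p))

    cost = [0.0, math.inf]          # start in the baseline state
    back = []
    for rt, dt in zip(r, d):
        new_cost, choice = [], []
        for state, p in enumerate((p0, p1)):
            via = [cost[0] + (up_cost if state == 1 else 0.0), cost[1]]
            prev = 0 if via[0] <= via[1] else 1
            new_cost.append(via[prev] + fit(p, rt, dt))
            choice.append(prev)
        cost = new_cost
        back.append(choice)

    # Trace the cheapest path backwards.
    state = 0 if cost[0] <= cost[1] else 1
    states = []
    for choice in reversed(back):
        states.append(state)
        state = choice[state]
    return states[::-1]


# Made-up yearly counts for one title word: 100 titles per year.
totals = [100] * 8
hits = [2, 2, 2, 20, 22, 21, 2, 2]
print(detect_bursts(hits, totals))  # [0, 0, 0, 1, 1, 1, 0, 0]
```

The entry cost keeps a single noisy batch from registering as a burst: the model switches state only when the better fit over several batches outweighs the price of the transition.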

[Figure labels: Announcement; Cited by other papers; Web Link Added]

Review papers have a distinctive pattern of use: an initial spike after
announcement, followed by a long nearly-constant tail.

