Requirement Analysis THE STAT PROJECT Milestone 1 Report
QUESTIONS: To design a framework, how many variations do we need to protect against? How much functionality do we need to provide to support all these variations?
Variations for importing datasets (file sources)
Variations for importing datasets (file formats)
Variations for importing datasets (schemas): Even if we only consider datasets in XML, each dataset may have its own schema.
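One way to picture the schema variation is a reader that takes a per-dataset mapping from common field names to that dataset's own XML tags. The sketch below is only an illustration under assumptions: the class name SchemaMappingSketch, the method read, and the tag names used in the usage note are invented here and are not part of STAT.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: reads one XML document into a map of common field names.
class SchemaMappingSketch {
    // mapping: common field name -> tag name used by this particular dataset's schema
    static Map<String, String> read(String xml, Map<String, String> mapping) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, String> fields = new HashMap<>();
            for (Map.Entry<String, String> entry : mapping.entrySet()) {
                // Take the first element with the schema-specific tag, if present.
                NodeList nodes = dom.getElementsByTagName(entry.getValue());
                if (nodes.getLength() > 0) {
                    fields.put(entry.getKey(), nodes.item(0).getTextContent());
                }
            }
            return fields;
        } catch (Exception ex) {
            // Parser configuration, I/O and malformed-XML errors are all wrapped here.
            throw new IllegalStateException("could not parse XML", ex);
        }
    }
}
```

With this shape, a dataset that stores its headline in a `headline` tag and another that uses a `TITLE` tag both end up under the same key `title`, so only the mapping changes per schema.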
Reuters dataset example
Simplified approach. Observation: for the sake of comparison, researchers usually work with a few well-known datasets (e.g., Reuters, RCV-1).
Able to persist and read back in-memory objects
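A minimal sketch of this requirement, assuming standard Java serialization is an acceptable mechanism (the slides do not say which one STAT uses). The names SavedCorpus and PersistenceSketch are hypothetical, and a byte array stands in for a file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory object; the name and field are illustrative only.
class SavedCorpus implements Serializable {
    private static final long serialVersionUID = 1L;
    final List<String> documents = new ArrayList<>();
}

class PersistenceSketch {
    // Serialize an object graph to bytes (a FileOutputStream would work the same way).
    static byte[] persist(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Read the object graph back from the bytes produced by persist.
    static Object readBack(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("class of persisted object not found", e);
        }
    }
}
```

The restored object is a separate copy with the same contents, which is what lets an intermediate result be saved in one run and reused in another.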
Able to visualize in-memory objects
STAT (brief) Domain Model. Note: Connector labels are omitted for brevity. Some connections are not drawn because of space limitations.
STAT framework sample code (conceptual)
 
Domain Concept: RawCorpus
A collection of RawDocument objects, supporting collection operations:
- Add a new RawDocument element
- Remove an existing RawDocument element
- Access elements in the collection
- …
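Domain Concept: RawCorpus
import java.util.ArrayList;
import java.util.List;

abstract class RawCorpus {
    // Backing list of unprocessed documents.
    protected final List<RawDocument> rawDocuments = new ArrayList<>();

    // Returns the document at the given position.
    RawDocument getDocument(int index) { return rawDocuments.get(index); }

    // Replaces the document at the given position.
    void setDocument(int index, RawDocument doc) { rawDocuments.set(index, doc); }

    // Removes the document at the given position.
    void removeDocument(int index) { rawDocuments.remove(index); }
}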
Domain Concept: RawDocument
An object with one or more string fields, serving as an unprocessed, in-memory representation of a document unit:
- Like a JavaBean, with getters and setters
- All fields must be of type String, even for numbers
Domain Concept: RawDocument
class MyRawDocument extends RawDocument {
    String title;
    String author;
    String body;
    String date;
    String numOfClicks;  // numeric values are stored as strings too
    String topicType;
    // …
}

abstract class RawDocument {
    public RawDocument() {}
}
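The bean-style access and the "strings only" rule might look as follows in use. This is a sketch under assumptions: NewsRawDocument, RawDocumentSketch and clicksAsInt are hypothetical names, and deferring number parsing to a later stage is one possible reading of the rule, not something the slides state.

```java
// Mirrors the abstract base class shown on the slide.
abstract class RawDocument {
    public RawDocument() {}
}

// Hypothetical concrete document with bean-style getters and setters.
class NewsRawDocument extends RawDocument {
    private String title;
    private String numOfClicks; // a number, but still held as a String

    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }
    public String getNumOfClicks() { return numOfClicks; }
    public void setNumOfClicks(String numOfClicks) { this.numOfClicks = numOfClicks; }
}

class RawDocumentSketch {
    // Interpreting the string as a number is left to whichever later stage needs it.
    static int clicksAsInt(NewsRawDocument doc) {
        return Integer.parseInt(doc.getNumOfClicks());
    }
}
```

Keeping every field as a String means a reader never has to guess types while importing; the cost is that a malformed number only surfaces when a later stage parses it.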
Domain Concept: Processor
An object that processes a RawCorpus and produces a Corpus.
- Linguistic: Tokenizer, Stemmer, StopRemover, PosTagger, …
- Machine learning: feature-specific, document-specific
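The linguistic processors listed above can be thought of as steps applied one after another. The sketch below simplifies heavily: it works on plain lists of strings rather than RawCorpus and Corpus, and the interface TokenProcessor, the class ProcessorChain, and the tiny stop-word list are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical processing step: a list of strings in, a list of strings out.
interface TokenProcessor {
    List<String> process(List<String> input);
}

// Splits each input text on whitespace and lower-cases the pieces.
class WhitespaceTokenizer implements TokenProcessor {
    public List<String> process(List<String> texts) {
        List<String> tokens = new ArrayList<>();
        for (String text : texts) {
            for (String piece : text.trim().split("\\s+")) {
                if (!piece.isEmpty()) {
                    tokens.add(piece.toLowerCase(Locale.ROOT));
                }
            }
        }
        return tokens;
    }
}

// Drops tokens that appear in a small, illustrative stop-word list.
class StopRemover implements TokenProcessor {
    private final Set<String> stopWords = new HashSet<>(Arrays.asList("the", "a", "of"));

    public List<String> process(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}

class ProcessorChain {
    // Feeds the output of each step into the next one, in order.
    static List<String> run(List<String> input, List<TokenProcessor> steps) {
        List<String> current = input;
        for (TokenProcessor step : steps) {
            current = step.process(current);
        }
        return current;
    }
}
```

Because every step shares one interface, adding a stemmer or a tagger would mean writing one more class and appending it to the list of steps.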
Domain Concept: Corpus
An object representing a collection of Document objects for use by the machine learning side of the framework. It provides a notion of splits, which is commonly used (e.g., train, test).
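The notion of splits could be modelled as documents grouped under a split name. The sketch below is an assumption about shape only: the fields of Document and the methods add, getSplit and size are hypothetical and are not taken from the STAT code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical processed document: tokens plus a target label.
class Document {
    final List<String> tokens;
    final String label;

    Document(List<String> tokens, String label) {
        this.tokens = tokens;
        this.label = label;
    }
}

// Hypothetical Corpus: documents grouped by named split (e.g., "train", "test").
class Corpus {
    private final Map<String, List<Document>> splits = new LinkedHashMap<>();

    // Adds a document to the named split, creating the split on first use.
    void add(String split, Document doc) {
        splits.computeIfAbsent(split, name -> new ArrayList<>()).add(doc);
    }

    // Returns a read-only view of one split; empty if the split does not exist.
    List<Document> getSplit(String split) {
        List<Document> docs = splits.get(split);
        return docs == null
                ? Collections.<Document>emptyList()
                : Collections.unmodifiableList(docs);
    }

    // Total number of documents across all splits.
    int size() {
        int total = 0;
        for (List<Document> docs : splits.values()) {
            total += docs.size();
        }
        return total;
    }
}
```

A trainer would then ask for `getSplit("train")` and an evaluator for `getSplit("test")`, without either needing to know how the split was made.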
Domain Concept: Trainer
A representation of a machine learning algorithm, which can learn from a Corpus and produce a Model.
Domain Concept: Model
The object that a machine learning algorithm (i.e., a Trainer) creates to store the parameters that are "learned" from the data (i.e., a Corpus).
Domain Concept: Classifier
An object that maps Documents to target values (label, number, probability). It takes a Corpus and a Model as inputs, and produces a Prediction associated with the Corpus according to the Model.
Domain Concept: Prediction
A collection of target values (label, number, probability) associated with a Corpus, i.e., with a collection of Document objects.
Domain Concept: Evaluator
An object used for comparing a Prediction against its associated Corpus and generating an Evaluation.
Domain Concept: Evaluation
A summarized representation of the evaluation result produced by an Evaluator.
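Taken together, the Trainer, Model, Classifier, Prediction, Evaluator and Evaluation concepts above form a small pipeline. The sketch below walks through it with the simplest learner available, a majority-label baseline. Every class name here is hypothetical, a list of labels stands in for Prediction, and a single accuracy value stands in for Evaluation; none of this is taken from the STAT code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a labeled Document.
class LabeledDoc {
    final String text;
    final String label;

    LabeledDoc(String text, String label) {
        this.text = text;
        this.label = label;
    }
}

// Model: stores what was learned; here only the most frequent training label.
class MajorityModel {
    final String majorityLabel;

    MajorityModel(String majorityLabel) {
        this.majorityLabel = majorityLabel;
    }
}

// Trainer: learns a Model from the training documents.
class MajorityTrainer {
    MajorityModel train(List<LabeledDoc> trainSplit) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (LabeledDoc doc : trainSplit) {
            int count = counts.merge(doc.label, 1, Integer::sum);
            // On ties, the label that reached the count first is kept.
            if (count > bestCount) {
                best = doc.label;
                bestCount = count;
            }
        }
        return new MajorityModel(best);
    }
}

// Classifier: maps each document to a target value using the Model.
class MajorityClassifier {
    List<String> classify(List<LabeledDoc> docs, MajorityModel model) {
        List<String> prediction = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            prediction.add(model.majorityLabel);
        }
        return prediction;
    }
}

// Evaluator: compares the prediction with the gold labels and summarizes it as accuracy.
class AccuracyEvaluator {
    double evaluate(List<String> prediction, List<LabeledDoc> docs) {
        if (docs.isEmpty()) {
            return 0.0;
        }
        int correct = 0;
        for (int i = 0; i < docs.size(); i++) {
            if (docs.get(i).label.equals(prediction.get(i))) {
                correct++;
            }
        }
        return (double) correct / docs.size();
    }
}
```

Swapping in a real learning algorithm would change only the Trainer, Model and Classifier; the Evaluator depends on nothing but the prediction and the gold labels.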
THE STAT PROJECT Thanks
STAT (brief) Domain Model. Note: Connector labels are omitted for brevity. Some connections are not drawn because of space limitations.
[Diagram entities: Corpus, Reader, Processor, RawCorpus, Trainer, Model, Classifier, Prediction, Evaluator, Evaluation, Writer, Vocabulary]
STAT Domain Model. Note: Labels above lines are omitted for brevity.
[Diagram entities: Corpus, Reader, Processor, RawCorpus, Trainer, Model, Classifier, Prediction, Evaluator, Evaluation, Writer]
STAT Domain Model. Note: Labels above lines are omitted for brevity.
[Diagram entities: Corpus, Reader, Processor, RawCorpus, Trainer, Model, Classifier, Prediction, Evaluator, Evaluation, Document, RawDocument]
