SlideShare una empresa de Scribd logo
1 de 27
Scala collections
Sagie Davidovich
@mesagie
singularityworld.com
linkedin.com/in/sagied
Warm up example:
Fibonacci sequence
val fibs: Stream[Int] = 0 #:: fibs.scanLeft(1)(_ + _)
Key concepts:
• Recursive values
• Streams
• Scan
• Binary place-holder notation
Immutable collections
You’ll know about
• Avoid memory allocation for empty collections
• Optimize for small collections
• Equal-hashCode contract
• Asymptotic behavior of common operations
Nil
List.empty and Nil are singletons.
No new memory is allocated
Option[A]
Immutable Sets – emptySet
emptySet is a singleton too
Immutable Sets – Set1
Optimized for sets of size 1
Immutable Sets – Set2
Optimized for sets of size 2
Immutable Sets – Set4
A HashSet is (finally) instantiated
Immutable Collections
Mutable Collections
One liners
Computing a derivative
def derivative(nums: Iterable[Double]) =
nums.sliding(2)
.map (pair => pair._2 - pair._1)
What can be improved in this solution?
Bonus question: change a few characters to find the
max slope
Counting occurrences (histogram)
"encyclopedia" groupBy identity mapValues (_.size)
Map (
e -> 2, n -> 1, y -> 1, a -> 1, i -> 1,
l -> 1, p -> 1, c -> 2, o -> 1, d -> 1
)
Word n-grams
val range = 1 to 3
val text = "hello sweet world"
val tokenize = (s: String) => s.split(" ")
range flatMap (size => tokenize(text) sliding size)
Result:
Vector(Array(hello), Array(sweet), Array(world), Array(hello,
sweet), Array(sweet, world), Array(hello, sweet, world))
Are all members of a greater than
corresponding members of b
val a = List(2,3,4)
val b = List(1,2,3)
// O(n^2) and not very elegant.
(0 until a.size) forall (i => a(i) > b(i))
// O(n) but creates tuples and a temporary list. Yet, more elegant.
a zip b forall (x=> x._1 > x._2)
// same as above but doesn't create a temporary list (lazy)
a.view zip b forall (x=> x._1 > x._2)
// O(n), without tuple or temporary list creation, and even more elegant.
(a corresponds b)(_ > _)
Strings are collections. How come?
“abc”.max
@inline implicit def augmentString(x: String) =
new StringOps(x)
String <% StringOps <: StringLike <: IndexedSeqOptimized …
Complexity of collection operations
• Linear:
– Unary: O(n):
• Mappers: map, collect
• Reducers: reduce, foldLeft, foldRight
• Others: foreach, filter, indexOf, reverse, find, mkString
– Binary: O(n+ m):
• union, diff, and intersect
Immutable Collections time complexity
head tail apply update prepend append
List C C L L C L
Stream C C L L C L
Vector eC eC eC eC eC eC
Stack C C L L C L
Queue aC aC L L L C
Range C C C - - -
String C L C L L L
Mutable Collections time complexity
head tail apply update prepend append insert
ArrayBuffer C L C C L aC L
ListBuffer C L L L C C L
StringBuilde
r C L C C L aC L
MutableList C L L L C C L
Queue C L L L C C L
ArraySeq C L C C - - -
Stack C L L L C L L
ArrayStack C L C C aC L L
Array C L C C - - -
Bonus question
What’s the complexity
of Range.sum?
Range
Equals-hashCode contract
(a equals b)  (a.hashCode == b.hashCode)
All Scala collection implement the contract
Bad idea: Set[Array[Int]]
Good idea: Set[Vector[Int]]
Bad Idea: Set[ArrayBuffer[Int]]
Bad Idea: Set[collection.mutable._]
Good Idea: Set[collection.immutable._]
More on collections equality
val (a, b) = (1 to 3, List(1, 2, 3))
a == b // true
Q: Wait, how efficient is Range.hashCode?
A: O(n)
override def hashCode = util.hashing.MurmurHash3.seqHash(seq)
Challenge yourself:
Is there a closed (o(1)) formula for a range hashCode?
Java interoperability
Implicit (less boilerplate):
import collection.javaConversions._
javaCollection.filter(…)
Explicit (better control):
Import collection.javaConverters._
javaCollection.asScala.filter(…)
scalaCollection.asJava
The power of type-level programming
graph path-finding in compile time
import scala.language.implicitConversions
// Vertices
case class A(l: List[Char])
case class B(l: List[Char])
case class C(l: List[Char])
case class D(l: List[Char])
case class E(l: List[Char])
// Edges
implicit def ad[A1 <% A](x: A1) = D(x.l :+ 'A')
implicit def bc[B1 <% B](x: B1) = C(x.l :+ 'B')
implicit def ce[C1 <% C](x: C1) = E(x.l :+ 'C')
implicit def ea[E1 <% E](x: E1) = A(x.l :+ 'E')
def pathFrom(end:D) = end
pathFrom(B(Nil)) // res0: D = D(List(B, C, E, A))
Want to go Pro?
• Shapeless (Miles Sabin)
– Polytypic programming & Heterogenous lists
– github.com/milessabin/shapeless
• Scalaxy (Olivier Chafik)
– Macros for boosting performance of collections
– github.com/ochafik/Scalaxy

Más contenido relacionado

La actualidad más candente

La actualidad más candente (19)

Arrays basics
Arrays basicsArrays basics
Arrays basics
 
Abstracting over the Monad yielded by a for comprehension and its generators
Abstracting over the Monad yielded by a for comprehension and its generatorsAbstracting over the Monad yielded by a for comprehension and its generators
Abstracting over the Monad yielded by a for comprehension and its generators
 
Scientific Computing with Python - NumPy | WeiYuan
Scientific Computing with Python - NumPy | WeiYuanScientific Computing with Python - NumPy | WeiYuan
Scientific Computing with Python - NumPy | WeiYuan
 
Mutable data types in python
Mutable data types in pythonMutable data types in python
Mutable data types in python
 
Quicksort - a whistle-stop tour of the algorithm in five languages and four p...
Quicksort - a whistle-stop tour of the algorithm in five languages and four p...Quicksort - a whistle-stop tour of the algorithm in five languages and four p...
Quicksort - a whistle-stop tour of the algorithm in five languages and four p...
 
Faster persistent data structures through hashing
Faster persistent data structures through hashingFaster persistent data structures through hashing
Faster persistent data structures through hashing
 
Numpy
NumpyNumpy
Numpy
 
Python - Numpy/Pandas/Matplot Machine Learning Libraries
Python - Numpy/Pandas/Matplot Machine Learning LibrariesPython - Numpy/Pandas/Matplot Machine Learning Libraries
Python - Numpy/Pandas/Matplot Machine Learning Libraries
 
Strings
StringsStrings
Strings
 
Python Scipy Numpy
Python Scipy NumpyPython Scipy Numpy
Python Scipy Numpy
 
1.Array and linklst definition
1.Array and linklst definition1.Array and linklst definition
1.Array and linklst definition
 
2-D array
2-D array2-D array
2-D array
 
Addendum to ‘Monads do not Compose’
Addendum to ‘Monads do not Compose’ Addendum to ‘Monads do not Compose’
Addendum to ‘Monads do not Compose’
 
Chapter7
Chapter7Chapter7
Chapter7
 
2- Dimensional Arrays
2- Dimensional Arrays2- Dimensional Arrays
2- Dimensional Arrays
 
Functional ruby
Functional rubyFunctional ruby
Functional ruby
 
Introduction to numpy Session 1
Introduction to numpy Session 1Introduction to numpy Session 1
Introduction to numpy Session 1
 
Statistics - ArgMax Equation
Statistics - ArgMax EquationStatistics - ArgMax Equation
Statistics - ArgMax Equation
 
Intoduction to numpy
Intoduction to numpyIntoduction to numpy
Intoduction to numpy
 

Destacado

Exposiçao: Kandinsky - Tudo começa num ponto
Exposiçao: Kandinsky - Tudo começa num pontoExposiçao: Kandinsky - Tudo começa num ponto
Exposiçao: Kandinsky - Tudo começa num ponto
Fabiana Motroni
 

Destacado (10)

Carolina castro segundo parcial_tarea2
Carolina castro segundo parcial_tarea2Carolina castro segundo parcial_tarea2
Carolina castro segundo parcial_tarea2
 
Principios diseño de interacción
Principios diseño de interacciónPrincipios diseño de interacción
Principios diseño de interacción
 
Exposiçao: Kandinsky - Tudo começa num ponto
Exposiçao: Kandinsky - Tudo começa num pontoExposiçao: Kandinsky - Tudo começa num ponto
Exposiçao: Kandinsky - Tudo começa num ponto
 
Interplay Project. National Indigenous Research and Knowledges Network (NIRAK...
Interplay Project. National Indigenous Research and Knowledges Network (NIRAK...Interplay Project. National Indigenous Research and Knowledges Network (NIRAK...
Interplay Project. National Indigenous Research and Knowledges Network (NIRAK...
 
Caso La Noria
Caso La NoriaCaso La Noria
Caso La Noria
 
Algorithms in nature
Algorithms in natureAlgorithms in nature
Algorithms in nature
 
Effective code reviews
Effective code reviewsEffective code reviews
Effective code reviews
 
7th Etherum Meetup Vienna: Crypto Token Economy - Price of ETH, Bitcoin 1.0 a...
7th Etherum Meetup Vienna: Crypto Token Economy - Price of ETH, Bitcoin 1.0 a...7th Etherum Meetup Vienna: Crypto Token Economy - Price of ETH, Bitcoin 1.0 a...
7th Etherum Meetup Vienna: Crypto Token Economy - Price of ETH, Bitcoin 1.0 a...
 
ZAYANN : Parcours pédagogique entre nature et culture
ZAYANN : Parcours pédagogique entre nature et cultureZAYANN : Parcours pédagogique entre nature et culture
ZAYANN : Parcours pédagogique entre nature et culture
 
Scala Parallel Collections
Scala Parallel CollectionsScala Parallel Collections
Scala Parallel Collections
 

Similar a Scala collections wizardry - Scalapeño

Similar a Scala collections wizardry - Scalapeño (20)

Functional Programming With Scala
Functional Programming With ScalaFunctional Programming With Scala
Functional Programming With Scala
 
Mining Functional Patterns
Mining Functional PatternsMining Functional Patterns
Mining Functional Patterns
 
Scala Refactoring for Fun and Profit
Scala Refactoring for Fun and ProfitScala Refactoring for Fun and Profit
Scala Refactoring for Fun and Profit
 
Hashing In Data Structure Download PPT i
Hashing In Data Structure Download PPT iHashing In Data Structure Download PPT i
Hashing In Data Structure Download PPT i
 
BASE Meetup: "Analysing Scala Puzzlers: Essential and Accidental Complexity i...
BASE Meetup: "Analysing Scala Puzzlers: Essential and Accidental Complexity i...BASE Meetup: "Analysing Scala Puzzlers: Essential and Accidental Complexity i...
BASE Meetup: "Analysing Scala Puzzlers: Essential and Accidental Complexity i...
 
Scala Up North: "Analysing Scala Puzzlers: Essential and Accidental Complexit...
Scala Up North: "Analysing Scala Puzzlers: Essential and Accidental Complexit...Scala Up North: "Analysing Scala Puzzlers: Essential and Accidental Complexit...
Scala Up North: "Analysing Scala Puzzlers: Essential and Accidental Complexit...
 
Scala Collections
Scala CollectionsScala Collections
Scala Collections
 
Scala collections
Scala collectionsScala collections
Scala collections
 
Lecture 3
Lecture 3Lecture 3
Lecture 3
 
Functional programming with Scala
Functional programming with ScalaFunctional programming with Scala
Functional programming with Scala
 
Speaking Scala: Refactoring for Fun and Profit (Workshop)
Speaking Scala: Refactoring for Fun and Profit (Workshop)Speaking Scala: Refactoring for Fun and Profit (Workshop)
Speaking Scala: Refactoring for Fun and Profit (Workshop)
 
Scala collection
Scala collectionScala collection
Scala collection
 
Scala parallel-collections
Scala parallel-collectionsScala parallel-collections
Scala parallel-collections
 
Scala parallel-collections
Scala parallel-collectionsScala parallel-collections
Scala parallel-collections
 
Getting Started With Scala
Getting Started With ScalaGetting Started With Scala
Getting Started With Scala
 
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
 
Engineering fast indexes
Engineering fast indexesEngineering fast indexes
Engineering fast indexes
 
Introduction to MATLAB
Introduction to MATLABIntroduction to MATLAB
Introduction to MATLAB
 
Power of functions in a typed world
Power of functions in a typed worldPower of functions in a typed world
Power of functions in a typed world
 
ScalaDays 2013 Keynote Speech by Martin Odersky
ScalaDays 2013 Keynote Speech by Martin OderskyScalaDays 2013 Keynote Speech by Martin Odersky
ScalaDays 2013 Keynote Speech by Martin Odersky
 

Último

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Último (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 

Scala collections wizardry - Scalapeño

  • 2. Warm up example: Fibonacci sequence val fibs: Stream[Int] = 0 #:: fibs.scanLeft(1)(_ + _) Key concepts: • Recursive values • Streams • Scan • Binary place-holder notation
  • 3. Immutable collections You’ll know about • Avoid memory allocation for empty collections • Optimize for small collections • Equal-hashCode contract • Asymptotic behavior of common operations
  • 4. Nil List.empty and Nil are singletons. No new memory is allocated
  • 6. Immutable Sets – emptySet emptySet is a singleton too
  • 7. Immutable Sets – Set1 Optimized for sets of size 1
  • 8. Immutable Sets – Set2 Optimized for sets of size 2
  • 9. Immutable Sets – Set4 A HashSet is (finally) instantiated
  • 13. Computing a derivative def derivative(nums: Iterable[Double]) = nums.sliding(2) .map (pair => pair._2 - pair._1) What can be improved in this solution? Bonus question: change a few characters to find the max slope
  • 14. Counting occurrences (histogram) "encyclopedia" groupBy identity mapValues (_.size) Map ( e -> 2, n -> 1, y -> 1, a -> 1, i -> 1, l -> 1, p -> 1, c -> 2, o -> 1, d -> 1 )
  • 15. Word n-grams val range = 1 to 3 val text = "hello sweet world" val tokenize = (s: String) => s.split(" ") range flatMap (size => tokenize(text) sliding size) Result: Vector(Array(hello), Array(sweet), Array(world), Array(hello, sweet), Array(sweet, world), Array(hello, sweet, world))
  • 16. Are all members of a greater than corresponding members of b val a = List(2,3,4) val b = List(1,2,3) // O(n^2) and not very elegant. (0 until a.size) forall (i => a(i) > b(i)) // O(n) but creates tuples and a temporary list. Yet, more elegant. a zip b forall (x=> x._1 > x._2) // same as above but doesn't create a temporary list (lazy) a.view zip b forall (x=> x._1 > x._2) // O(n), without tuple or temporary list creation, and even more elegant. (a corresponds b)(_ > _)
  • 17. Strings are collections. How come? “abc”.max @inline implicit def augmentString(x: String) = new StringOps(x) String <% StringOps <: StringLike <: IndexedSeqOptimized …
  • 18. Complexity of collection operations • Linear: – Unary: O(n): • Mappers: map, collect • Reducers: reduce, foldLeft, foldRight • Others: foreach, filter, indexOf, reverse, find, mkString – Binary: O(n+ m): • union, diff, and intersect
  • 19. Immutable Collections time complexity head tail apply update prepend append List C C L L C L Stream C C L L C L Vector eC eC eC eC eC eC Stack C C L L C L Queue aC aC L L L C Range C C C - - - String C L C L L L
  • 20. Mutable Collections time complexity head tail apply update prepend append insert ArrayBuffer C L C C L aC L ListBuffer C L L L C C L StringBuilde r C L C C L aC L MutableList C L L L C C L Queue C L L L C C L ArraySeq C L C C - - - Stack C L L L C L L ArrayStack C L C C aC L L Array C L C C - - -
  • 21. Bonus question What’s the complexity of Range.sum?
  • 22. Range
  • 23. Equals-hashCode contract (a equals b)  (a.hashCode == b.hashCode) All Scala collection implement the contract Bad idea: Set[Array[Int]] Good idea: Set[Vector[Int]] Bad Idea: Set[ArrayBuffer[Int]] Bad Idea: Set[collection.mutable._] Good Idea: Set[collection.immutable._]
  • 24. More on collections equality val (a, b) = (1 to 3, List(1, 2, 3)) a == b // true Q: Wait, how efficient is Range.hashCode? A: O(n) override def hashCode = util.hashing.MurmurHash3.seqHash(seq) Challenge yourself: Is there a closed (o(1)) formula for a range hashCode?
  • 25. Java interoperability Implicit (less boilerplate): import collection.javaConversions._ javaCollection.filter(…) Explicit (better control): Import collection.javaConverters._ javaCollection.asScala.filter(…) scalaCollection.asJava
  • 26. The power of type-level programming graph path-finding in compile time import scala.language.implicitConversions // Vertices case class A(l: List[Char]) case class B(l: List[Char]) case class C(l: List[Char]) case class D(l: List[Char]) case class E(l: List[Char]) // Edges implicit def ad[A1 <% A](x: A1) = D(x.l :+ 'A') implicit def bc[B1 <% B](x: B1) = C(x.l :+ 'B') implicit def ce[C1 <% C](x: C1) = E(x.l :+ 'C') implicit def ea[E1 <% E](x: E1) = A(x.l :+ 'E') def pathFrom(end:D) = end pathFrom(B(Nil)) // res0: D = D(List(B, C, E, A))
  • 27. Want to go Pro? • Shapeless (Miles Sabin) – Polytypic programming & Heterogenous lists – github.com/milessabin/shapeless • Scalaxy (Olivier Chafik) – Macros for boosting performance of collections – github.com/ochafik/Scalaxy