You like to use R, and you need to work with big data. dplyr, one of the most popular packages for R, makes it easy to query large data sets in scalable processing engines like Apache Spark and Apache Impala. But there are pitfalls: dplyr works differently with different data sources, and those differences can bite you if you don’t know what you’re doing.

Ian Cook is a data scientist, an R contributor, and a curriculum developer at Cloudera University. In this webinar, Ian will show you exactly what you need to know about the packages sparklyr (from RStudio) and implyr (from Cloudera), and how to write dplyr code that works across these different interfaces. He will also solve some mysteries:

Do I need to know SQL to use dplyr?
When is a “tbl” not a “tibble”?
Why is 1 not always equal to 1?
When should you collect(), collapse(), and compute()?
How can you use dplyr to combine data stored in different systems?