
Mastering Scala machine learning

Advance your skills in efficient data analysis and data processing using the powerful tools of Scala, Spark, and Hadoop.

About This Book
* This is a primer on functional-programming-style techniques to help you efficiently process and analyze all of your data
* Get acquainted with the best and newest tool...


Bibliographic Details
Classification: Electronic book
Main author: Kozlov, Alexander (Author)
Format: Electronic eBook
Language: English
Published: Birmingham: Packt Publishing, 2016.
Table of Contents:
  • Cover; Copyright; Credits; About the Author; Acknowledgement; www.PacktPub.com; Table of Contents; Preface
  • Chapter 1: Exploratory Data Analysis; Getting started with Scala; Distinct values of a categorical field; Summarization of a numeric field; Grepping across multiple fields; Basic, stratified, and consistent sampling; Working with Scala and Spark Notebooks; Basic correlations; Summary
  • Chapter 2: Data Pipelines and Modeling; Influence diagrams; Sequential trials and dealing with risk; Exploration and exploitation; Unknown unknowns; Basic components of a data-driven system; Data ingest; Data transformation layer; Data analytics and machine learning; UI component; Actions engine; Correlation engine; Monitoring; Optimization and interactivity; Feedback loops; Summary
  • Chapter 3: Working with Spark and MLlib; Setting up Spark; Understanding Spark architecture; Task scheduling; Spark components; MQTT, ZeroMQ, Flume, and Kafka; HDFS, Cassandra, S3, and Tachyon; Mesos, YARN, and Standalone; Applications; Word count; Streaming word count; Spark SQL and DataFrame; ML libraries; SparkR; Graph algorithms: GraphX and GraphFrames; Spark performance tuning; Running Hadoop HDFS; Summary
  • Chapter 4: Supervised and Unsupervised Learning; Records and supervised learning; Iris dataset; Labeled point; SVMWithSGD; Logistic regression; Decision tree; Bagging and boosting: ensemble learning methods; Unsupervised learning; Problem dimensionality; Summary
  • Chapter 5: Regression and Classification; What regression stands for?; Continuous space and metrics; Linear regression; Logistic regression; Regularization; Multivariate regression; Heteroscedasticity; Regression trees; Classification metrics; Multiclass problems; Perceptron; Generalization error and overfitting; Summary
  • Chapter 6: Working with Unstructured Data; Nested data; Other serialization formats; Hive and Impala; Sessionization; Working with traits; Working with pattern matching; Other uses of unstructured data; Probabilistic structures; Projections; Summary
  • Chapter 7: Working with Graph Algorithms; A quick introduction to graphs; SBT; Graph for Scala; Adding nodes and edges; Graph constraints; JSON; GraphX; Who is getting e-mails?; Connected components; Triangle counting; Strongly connected components; PageRank; SVD++; Summary
  • Chapter 8: Integrating Scala with R and Python; Integrating with R; Setting up R and SparkR; Linux; Mac OS; Windows; Running SparkR via scripts; Running Spark via R's command line; DataFrames; Linear models; Generalized linear model; Reading JSON files in SparkR; Writing Parquet files in SparkR; Invoking Scala from R; Using Rserve; Integrating with Python; Setting up Python; PySpark; Calling Python from Java/Scala; Using sys.process._; Spark pipe; Jython and JSR 223; Summary
  • Chapter 9: NLP in Scala; Text analysis pipeline; Simple text analysis; MLlib algorithms in Spark; TF-IDF; LDA; Segmentation, annotation, and chunking; POS tagging.