Machine Learning with Apache Spark Quick Start Guide : Uncover Patterns, Derive Actionable Insights, and Learn from Big Data Using MLlib.
Chapter 3: Artificial Intelligence and Machine Learning; Artificial intelligence; Machine learning; Supervised learning; Unsupervised learning; Reinforced learning; Deep learning; Natural neuron; Artificial neuron; Weights; Activation function; Heaviside step function; Sigmoid function; Hyperbolic t...
Classification: | Electronic Book |
---|---|
Main author: | |
Format: | Electronic eBook |
Language: | English |
Published: | Birmingham : Packt Publishing Ltd, 2018. |
Subjects: | |
Online access: | Full text |
Table of Contents:
- Cover; Title Page; Copyright and Credits; Dedication; About Packt; Contributors; Table of Contents; Preface; Chapter 1: The Big Data Ecosystem; A brief history of data; Vertical scaling; Master/slave architecture; Sharding; Data processing and analysis; Data becomes big; Big data ecosystem; Horizontal scaling; Distributed systems; Distributed data stores; Distributed filesystems; Distributed databases; NoSQL databases; Document databases; Columnar databases; Key-value databases; Graph databases; CAP theorem; Distributed search engines; Distributed processing; MapReduce; Apache Spark
- RDDs, DataFrames, and datasets; RDDs; DataFrames; Datasets; Jobs, stages, and tasks; Job; Stage; Tasks; Distributed messaging; Distributed streaming; Distributed ledgers; Artificial intelligence and machine learning; Cloud computing platforms; Data insights platform; Reference logical architecture; Data sources layer; Ingestion layer; Persistent data storage layer; Data processing layer; Serving data storage layer; Data intelligence layer; Unified access layer; Data insights and reporting layer; Platform governance, management, and administration; Open source implementation; Summary
- Chapter 2: Setting Up a Local Development Environment; CentOS Linux 7 virtual machine; Java SE Development Kit 8; Scala 2.11; Anaconda 5 with Python 3; Basic conda commands; Additional Python packages; Jupyter Notebook; Starting Jupyter Notebook; Troubleshooting Jupyter Notebook; Apache Spark 2.3; Spark binaries; Local working directories; Spark configuration; Spark properties; Environmental variables; Standalone master server; Spark worker node; PySpark and Jupyter Notebook; Apache Kafka 2.0; Kafka binaries; Local working directories; Kafka configuration; Start the Kafka server; Testing Kafka
- Univariate linear regression; Residuals; Root mean square error; R-squared; Univariate linear regression in Apache Spark; Multivariate linear regression; Correlation; Multivariate linear regression in Apache Spark; Logistic regression; Threshold value; Confusion matrix; Receiver operator characteristic curve; Area under the ROC curve; Case study - predicting breast cancer; Classification and Regression Trees; Case study - predicting political affiliation; Random forests; K-Fold cross validation; Summary; Chapter 5: Unsupervised Learning Using Apache Spark; Clustering; Euclidean distance