Discovering knowledge in data : an introduction to data mining
"This is a new edition of a highly praised, successful reference on data mining, now more important than ever due to the growth of the field and wide range of applications. This edition features new chapters on multivariate statistical analysis, covering analysis of variance and chi-square proc...
Classification: | Electronic Book |
---|---|
Main Author: | Larose, Daniel T. |
Other Authors: | Larose, Chantal D. |
Format: | Electronic eBook |
Language: | English |
Published: | Hoboken : Wiley, 2014. |
Edition: | Second edition. |
Series: | Wiley series on methods and applications in data mining. |
Subjects: | Data mining. |
Online Access: | Full text (ProQuest Ebook Central); Full text (O'Reilly Online Learning) |
MARC
LEADER | 00000cam a2200000 i 4500 | ||
---|---|---|---|
001 | EBOOKCENTRAL_ocn869460667 | ||
003 | OCoLC | ||
005 | 20240329122006.0 | ||
006 | m o d | ||
007 | cr ||||||||||| | ||
008 | 140128s2014 nju ob 001 0 eng | ||
010 | |a 2014003777 | ||
040 | |a DLC |b eng |e rda |e pn |c DLC |d YDX |d N$T |d EBLCP |d NAM |d TEFOD |d YDXCP |d DG1 |d CHVBK |d OCLCF |d B24X7 |d COO |d OCLCQ |d TEFOD |d DEBSZ |d DEBBG |d DG1 |d LIP |d MERUC |d BUF |d OCLCO |d OCLCQ |d UUM |d DEHBZ |d AUD |d COF |d AU@ |d OCLCQ |d U3W |d OCLCQ |d UAB |d OCLCQ |d U@J |d CNCEN |d DCT |d ERF |d ITD |d OCLCQ |d S4S |d UMI |d CEF |d STF |d OTZ |d OCLCQ |d UKBTH |d OCLCQ |d BRF |d EYM |d OCLCO |d INARC |d OCLCQ |d IEEEE |d OCLCO |d OCLCL | ||
066 | |c (S | ||
019 | |a 900145241 |a 922007435 |a 1060197865 |a 1105712195 |a 1105772098 |a 1112602732 |a 1132915630 |a 1159619025 |a 1194805506 | ||
020 | |a 9781118873571 |q (ePub) | ||
020 | |a 1118873572 |q (ePub) | ||
020 | |a 9781118873588 |q (Adobe PDF) | ||
020 | |a 1118873580 |q (Adobe PDF) | ||
020 | |a 9781118874059 | ||
020 | |a 1118874056 | ||
020 | |z 9780470908747 |q (hardback) | ||
020 | |z 0470908742 | ||
024 | 8 | |a 9781118873571 | |
024 | 7 | |a 10.1002/9781118874059 |2 doi | |
029 | 1 | |a AU@ |b 000052497094 | |
029 | 1 | |a CHBIS |b 010442153 | |
029 | 1 | |a CHDSB |b 006246430 | |
029 | 1 | |a CHNEW |b 000887791 | |
029 | 1 | |a CHNEW |b 000942767 | |
029 | 1 | |a CHVBK |b 480230617 | |
029 | 1 | |a DEBBG |b BV043396508 | |
029 | 1 | |a DEBBG |b BV043610018 | |
029 | 1 | |a DEBSZ |b 449430731 | |
029 | 1 | |a DEBSZ |b 468874178 | |
029 | 1 | |a NZ1 |b 15753483 | |
029 | 1 | |a NZ1 |b 15907000 | |
029 | 1 | |a DEBBG |b BV043020352 | |
029 | 1 | |a DEBSZ |b 455699240 | |
029 | 1 | |a GBVCP |b 882744615 | |
029 | 1 | |a DKDLA |b 820120-katalog:999927013805765 | |
035 | |a (OCoLC)869460667 |z (OCoLC)900145241 |z (OCoLC)922007435 |z (OCoLC)1060197865 |z (OCoLC)1105712195 |z (OCoLC)1105772098 |z (OCoLC)1112602732 |z (OCoLC)1132915630 |z (OCoLC)1159619025 |z (OCoLC)1194805506 | ||
037 | |a 48FDFBF4-359A-49D0-9000-912DC8EF78C9 |b OverDrive, Inc. |n http://www.overdrive.com | ||
037 | |a 10066951 |b IEEE | ||
042 | |a pcc | ||
050 | 0 | 0 | |a QA76.9.D343 |
072 | 7 | |a COM |x 000000 |2 bisacsh | |
082 | 0 | 0 | |a 006.3/12 |2 23 |
084 | |a COM021040 |a COM021030 |2 bisacsh | ||
049 | |a UAMI | ||
100 | 1 | |a Larose, Daniel T. | |
245 | 1 | 0 | |a Discovering knowledge in data : |b an introduction to data mining / |c Daniel T. Larose and Chantal D. Larose. |
250 | |a Second edition. | ||
264 | 1 | |a Hoboken : |b Wiley, |c 2014. | |
300 | |a 1 online resource (xviii, 316 pages) | ||
336 | |a text |b txt |2 rdacontent | ||
337 | |a computer |b c |2 rdamedia | ||
338 | |a online resource |b cr |2 rdacarrier | ||
347 | |a text file | ||
490 | 1 | |a Wiley series on methods and applications in data mining | |
500 | |a Includes index. | ||
520 | |a "This is a new edition of a highly praised, successful reference on data mining, now more important than ever due to the growth of the field and wide range of applications. This edition features new chapters on multivariate statistical analysis, covering analysis of variance and chi-square procedures; cost-benefit analyses; and time-series data analysis. There is also extensive coverage of the R statistical programming language. Graduate and advanced undergraduate students of computer science and statistics, managers/CEOs/CFOs, marketing executives, market researchers and analysts, sales analysts, and medical professionals will want this comprehensive reference"-- |c Provided by publisher. | ||
588 | 0 | |a Print version record and CIP data provided by publisher. | |
504 | |a Includes bibliographical references and index. | ||
505 | 0 | |a DISCOVERING KNOWLEDGE IN DATA -- Contents -- Preface -- 1 An Introduction to Data Mining -- 1.1 What is Data Mining? -- 1.2 Wanted: Data Miners -- 1.3 The Need for Human Direction of Data Mining -- 1.4 The Cross-Industry Standard Practice for Data Mining -- 1.4.1 Crisp-DM: The Six Phases -- 1.5 Fallacies of Data Mining -- 1.6 What Tasks Can Data Mining Accomplish? -- 1.6.1 Description -- 1.6.2 Estimation -- 1.6.3 Prediction -- 1.6.4 Classification -- 1.6.5 Clustering -- 1.6.6 Association -- References -- Exercises -- 2 Data Preprocessing -- 2.1 Why do We Need to Preprocess the Data? -- 2.2 Data Cleaning -- 2.3 Handling Missing Data -- 2.4 Identifying Misclassifications -- 2.5 Graphical Methods for Identifying Outliers -- 2.6 Measures of Center and Spread -- 2.7 Data Transformation -- 2.8 Min-Max Normalization -- 2.9 Z-Score Standardization -- 2.10 Decimal Scaling -- 2.11 Transformations to Achieve Normality -- 2.12 Numerical Methods for Identifying Outliers -- 2.13 Flag Variables -- 2.14 Transforming Categorical Variables into Numerical Variables -- 2.15 Binning Numerical Variables -- 2.16 Reclassifying Categorical Variables -- 2.17 Adding an Index Field -- 2.18 Removing Variables that are Not Useful -- 2.19 Variables that Should Probably Not Be Removed -- 2.20 Removal of Duplicate Records -- 2.21 A Word About Id Fields -- THE R ZONE -- References -- Exercises -- Hands-On Analysis -- 3 Exploratory Data Analysis -- 3.1 Hypothesis Testing Versus Exploratory Data Analysis -- 3.2 Getting to Know the Data Set -- 3.3 Exploring Categorical Variables -- 3.4 Exploring Numeric Variables -- 3.5 Exploring Multivariate Relationships -- 3.6 Selecting Interesting Subsets of the Data for Further Investigation -- 3.7 Using EDA to Uncover Anomalous Fields -- 3.8 Binning Based on Predictive Value -- 3.9 Deriving New Variables: Flag Variables. | |
505 | 8 | |a 3.10 Deriving New Variables: Numerical Variables -- 3.11 Using EDA to Investigate Correlated Predictor Variables -- 3.12 Summary -- THE R ZONE -- Reference -- Exercises -- Hands-On Analysis -- 4 Univariate Statistical Analysis -- 4.1 Data Mining Tasks in Discovering Knowledge in Data -- 4.2 Statistical Approaches to Estimation and Prediction -- 4.3 Statistical Inference -- 4.4 How Confident are We in Our Estimates? -- 4.5 Confidence Interval Estimation of the Mean -- 4.6 How to Reduce the Margin of Error -- 4.7 Confidence Interval Estimation of the Proportion -- 4.8 Hypothesis Testing for the Mean -- 4.9 Assessing the Strength of Evidence Against the Null Hypothesis -- 4.10 Using Confidence Intervals to Perform Hypothesis Tests -- 4.11 Hypothesis Testing for the Proportion -- THE R ZONE -- Reference -- Exercises -- 5 Multivariate Statistics -- 5.1 Two-Sample t-Test for Difference in Means -- 5.2 Two-Sample Z-Test for Difference in Proportions -- 5.3 Test for Homogeneity of Proportions -- 5.4 Chi-Square Test for Goodness of Fit of Multinomial Data -- 5.5 Analysis of Variance -- 5.6 Regression Analysis -- 5.7 Hypothesis Testing in Regression -- 5.8 Measuring the Quality of a Regression Model -- 5.9 Dangers of Extrapolation -- 5.10 Confidence Intervals for the Mean Value of y Given x -- 5.11 Prediction Intervals for a Randomly Chosen Value of y Given x -- 5.12 Multiple Regression -- 5.13 Verifying Model Assumptions -- THE R ZONE -- Reference -- Exercises -- Hands-On Analysis -- 6 Preparing to Model the Data -- 6.1 Supervised Versus Unsupervised Methods -- 6.2 Statistical Methodology and Data Mining Methodology -- 6.3 Cross-Validation -- 6.4 Overfitting -- 6.5 Bias-Variance Trade-Off -- 6.6 Balancing the Training Data Set -- 6.7 Establishing Baseline Performance -- THE R ZONE -- Reference -- Exercises -- 7 k-Nearest Neighbor Algorithm. | |
505 | 8 | |a 11.2 Kohonen Networks -- 11.2.1 Kohonen Networks Algorithm -- 11.3 Example of a Kohonen Network Study -- 11.4 Cluster Validity -- 11.5 Application of Clustering Using Kohonen Networks -- 11.6 Interpreting the Clusters -- 11.6.1 Cluster Profiles -- 11.7 Using Cluster Membership as Input to Downstream Data Mining Models -- THE R ZONE -- References -- Exercises -- Hands-On Analysis -- 12 Association Rules -- 12.1 Affinity Analysis and Market Basket Analysis -- 12.1.1 Data Representation for Market Basket Analysis -- 12.2 Support, Confidence, Frequent Itemsets, and the a Priori Property -- 12.3 How Does the a Priori Algorithm Work? -- 12.3.1 Generating Frequent Itemsets -- 12.3.2 Generating Association Rules -- 12.4 Extension from Flag Data to General Categorical Data -- 12.5 Information-Theoretic Approach: Generalized Rule Induction Method -- 12.5.1 J-Measure -- 12.6 Association Rules are Easy to do Badly -- 12.7 How can we Measure the Usefulness of Association Rules? -- 12.8 Do Association Rules Represent Supervised or Unsupervised Learning? -- 12.9 Local Patterns Versus Global Models -- THE R ZONE -- References -- Exercises -- Hands-On Analysis -- 13 Imputation of Missing Data -- 13.1 Need for Imputation of Missing Data -- 13.2 Imputation of Missing Data: Continuous Variables -- 13.3 Standard Error of the Imputation -- 13.4 Imputation of Missing Data: Categorical Variables -- 13.5 Handling Patterns in Missingness -- THE R ZONE -- Reference -- Exercises -- Hands-On Analysis -- 14 Model Evaluation Techniques -- 14.1 Model Evaluation Techniques for the Description Task -- 14.2 Model Evaluation Techniques for the Estimation and Prediction Tasks -- 14.3 Model Evaluation Techniques for the Classification Task -- 14.4 Error Rate, False Positives, and False Negatives -- 14.5 Sensitivity and Specificity. | |
505 | 8 | |a 14.6 Misclassification Cost Adjustment to Reflect Real-World Concerns -- 14.7 Decision Cost/Benefit Analysis -- 14.8 Lift Charts and Gains Charts -- 14.9 Interweaving Model Evaluation with Model Building -- 14.10 Confluence of Results: Applying a Suite of Models -- THE R ZONE -- Reference -- Exercises -- Hands-On Analysis -- Appendix Data Summarization and Visualization -- Part 1 Summarization 1: Building Blocks of Data Analysis -- Part 2 Visualization: Graphs and Tables for Summarizing and Organizing Data -- 2.1 Categorical Variables -- 2.2 Quantitative Variables -- Part 3 Summarization 2: Measures of Center, Variability, and Position -- Part 4 Summarization and Visualization of Bivariate Relationships -- Index. | |
542 | |f Copyright © John Wiley & Sons |g 2014 | ||
590 | |a ProQuest Ebook Central |b Ebook Central Academic Complete | ||
590 | |a O'Reilly |b O'Reilly Online Learning: Academic/Public Library Edition | ||
650 | 0 | |a Data mining. | |
650 | 2 | |a Data Mining | |
650 | 6 | |a Exploration de données (Informatique) | |
650 | 7 | |a COMPUTERS |x Database Management |x Data Warehousing. |2 bisacsh | |
650 | 7 | |a COMPUTERS |x Database Management |x Data Mining. |2 bisacsh | |
650 | 7 | |a Data mining |2 fast | |
650 | 7 | |a Data Mining |2 gnd | |
700 | 1 | |a Larose, Chantal D. | |
758 | |i has work: |a Discovering Knowledge in Data (Text) |1 https://id.oclc.org/worldcat/entity/E39PCFFDMwMKVrQBPyDHHYkp4C |4 https://id.oclc.org/worldcat/ontology/hasWork | ||
776 | 0 | 8 | |i Print version: |a Larose, Daniel T. |t Discovering knowledge in data. |b Second edition. |d Hoboken : Wiley, [2014] |z 9780470908747 |w (DLC) 2013046021 |w (OCoLC)869458428 |
830 | 0 | |a Wiley series on methods and applications in data mining. | |
856 | 4 | 0 | |u https://ebookcentral.uam.elogim.com/lib/uam-ebooks/detail.action?docID=1699137 |z Full text |
856 | 4 | 0 | |u https://learning.oreilly.com/library/view/~/9781118873571/?ar |z Full text |
880 | 8 | |6 505-00/(S |a 7.1 Classification Task -- 7.2 κ-Nearest Neighbor Algorithm -- 7.3 Distance Function -- 7.4 Combination Function -- 7.4.1 Simple Unweighted Voting -- 7.4.2 Weighted Voting -- 7.5 Quantifying Attribute Relevance: Stretching the Axes -- 7.6 Database Considerations -- 7.7 κ-Nearest Neighbor Algorithm for Estimation and Prediction -- 7.8 Choosing κ -- 7.9 Application of κ-Nearest Neighbor Algorithm Using IBM/SPSS Modeler -- THE R ZONE -- Exercises -- Hands-On Analysis -- 8 Decision Trees -- 8.1 What is a Decision Tree? -- 8.2 Requirements for Using Decision Trees -- 8.3 Classification and Regression Trees -- 8.4 C4.5 Algorithm -- 8.5 Decision Rules -- 8.6 Comparison of the C5.0 and CART Algorithms Applied to Real Data -- THE R ZONE -- References -- Exercises -- Hands-On Analysis -- 9 Neural Networks -- 9.1 Input and Output Encoding -- 9.2 Neural Networks for Estimation and Prediction -- 9.3 Simple Example of a Neural Network -- 9.4 Sigmoid Activation Function -- 9.5 Back-Propagation -- 9.5.1 Gradient Descent Method -- 9.5.2 Back-Propagation Rules -- 9.5.3 Example of Back-Propagation -- 9.6 Termination Criteria -- 9.7 Learning Rate -- 9.8 Momentum Term -- 9.9 Sensitivity Analysis -- 9.10 Application of Neural Network Modeling -- THE R ZONE -- References -- Exercises -- Hands-On Analysis -- 10 Hierarchical and k-Means Clustering -- 10.1 The Clustering Task -- 10.2 Hierarchical Clustering Methods -- 10.3 Single-Linkage Clustering -- 10.4 Complete-Linkage Clustering -- 10.5 κ-Means Clustering -- 10.6 Example of κ-Means Clustering at Work -- 10.7 Behavior of MSB, MSE, and PSEUDO-F as the κ-Means Algorithm Proceeds -- 10.8 Application of κ-Means Clustering Using SAS Enterprise Miner -- 10.9 Using Cluster Membership to Predict Churn -- THE R ZONE -- References -- Exercises -- Hands-On Analysis -- 11 Kohonen Networks -- 11.1 Self-Organizing Maps. | |
938 | |a Internet Archive |b INAR |n discoveringknowl0000laro_d4x2 | ||
938 | |a Books 24x7 |b B247 |n bks00063521 | ||
938 | |a EBL - Ebook Library |b EBLB |n EBL1699137 | ||
938 | |a EBSCOhost |b EBSC |n 786218 | ||
938 | |a YBP Library Services |b YANK |n 11337188 | ||
938 | |a YBP Library Services |b YANK |n 11841305 | ||
938 | |a YBP Library Services |b YANK |n 12677609 | ||
994 | |a 92 |b IZTAP |