Unstructured data analysis: entity resolution and regular expressions in SAS
Unstructured data is the most voluminous form of data in the world, and several elements are critical for any advanced analytics practitioner leveraging SAS software to effectively address the challenge of deriving value from that data. This book covers the five critical elements of entity extractio...
Classification: | Electronic Book |
---|---|
Main author: | |
Format: | Electronic eBook |
Language: | English |
Published: | Cary, NC: SAS Institute, [2018] |
Subjects: | |
Online access: | Full text (requires prior registration with an institutional email) |
Table of Contents:
- Intro; Contents; About This Book; Software Used to Develop the Book's Content; Example Code and Data; SAS University Edition; Acknowledgments; Chapter 1: Getting Started with Regular Expressions; 1.1.1 Defining Regular Expressions; 1.1.2 Motivational Examples; 1.1.3 RegEx Essentials; 1.1.4 RegEx Test Code; 1.3.1 Wildcard; 1.3.2 Word; 1.3.3 Non-word; 1.3.4 Tab; 1.3.5 Whitespace; 1.3.6 Non-whitespace; 1.3.7 Digit; 1.3.8 Non-digit; 1.3.9 Newline; 1.3.10 Bell; 1.3.11 Control Character; 1.3.12 Octal; 1.3.13 Hexadecimal; 1.4.1 List; 1.4.2 Not List; 1.4.3 Range; 1.5.1 Case Modifiers
- 1.5.2 Repetition Modifiers; 1.6.1 Ignore Case; 1.6.2 Single Line; 1.6.3 Multiline; 1.6.4 Compile Once; 1.6.5 Substitution Operator; 1.7.1 Start of Line; 1.7.2 End of Line; 1.7.3 Word Boundary; 1.7.4 Non-word Boundary; 1.7.5 String Start; Chapter 2: Using Regular Expressions in SAS; 2.1.1 Capture Buffer; 2.2.1 PRXPARSE; 2.2.2 PRXMATCH; 2.2.3 PRXCHANGE; 2.2.4 PRXPOSN; 2.2.5 PRXPAREN; 2.3.1 CALL PRXCHANGE; 2.3.2 CALL PRXPOSN; 2.3.3 CALL PRXSUBSTR; 2.3.4 CALL PRXNEXT; 2.3.5 CALL PRXDEBUG; 2.3.6 CALL PRXFREE; 2.4.1 Data Cleansing and Standardization; 2.4.2 Information Extraction
- 2.4.3 Search and Replacement; Chapter 3: Entity Resolution Analytics; 3.3.1 Entity Extraction; 3.3.2 Extract, Transform, and Load; 3.3.3 Entity Resolution; 3.3.4 Entity Network Mapping and Analysis; 3.3.5 Entity Management; 3.4.1 Establish Clear Goals; 3.4.2 Verify Proper Data Inventory; 3.4.3 Create SMART Objectives; Chapter 4: Entity Extraction; 4.3.1 Webpage; 4.3.2 File System; 4.4.1 Social Security Number; 4.4.2 Phone Number; 4.4.3 Address; 4.4.4 Website; 4.4.5 Corporation Name; Chapter 5: Extract, Transform, Load; 5.2.1 PROC CONTENTS; 5.2.2 PROC FREQ; 5.2.3 PROC MEANS
- 5.4.1 Hexadecimal to Decimal; 5.4.2 Working with Dates; 5.6.1 Quantile Binning; 5.6.2 Bucket Binning; Chapter 6: Entity Resolution; 6.1.1 Exact Matching; 6.1.2 Fuzzy Matching; 6.1.3 Error Handling; 6.2.1 INDEX=; 6.3.1 COMPGED and COMPLEV; 6.3.2 SOUNDEX; 6.3.3 Putting Things Together; Chapter 7: Entity Network Mapping and Analysis; 7.2.1 Shared Entity Attributes; 7.2.2 Entity Interactions; 7.3.1 Articulation Points and Biconnected Components; 7.3.2 Minimum Spanning Trees; 7.3.3 Clique Detection; 7.3.4 Minimum Cut; 7.3.5 Shortest Paths; Chapter 8: Entity Management
- Appendix A: Additional Resources; A.2.1 Non-Printing Characters; A.2.2 Printing Characters; A.4.1 Random PII Generator; A.4.2 Output