A novel solution for a data augmentation and bias problem in NLP using TensorFlow

The TensorFlow ecosystem contains many valuable assets, one of which is the highly acclaimed TensorFlow high-level API. It's critical for a fast and lightweight approach to reducing lead time in deep learning model development and hypothesis testing. It's now possible to quickly and easily...

Bibliographic Details
Main author: Tung, KC (Author)
Corporate author: Safari, an O'Reilly Media Company
Format: Electronic Video
Language: English
Published: O'Reilly Media, Inc., 2020.
Edition: 1st edition.
Online access: Full text (requires prior registration with an institutional email address)

MARC

LEADER 00000cgm a22000007a 4500
001 OR_on1143018724
003 OCoLC
005 20231017213018.0
006 m o c
007 cr cnu||||||||
007 vz czazuu
008 200220s2020 xx 041 vleng
040 |a AU@  |b eng  |c AU@  |d STF  |d NZCPL  |d OCLCF  |d OCLCO  |d OCLCQ 
019 |a 1193323282  |a 1232110637  |a 1305895560 
020 |z 0636920373759 
024 8 |a 0636920373773 
029 0 |a AU@  |b 000066786001 
035 |a (OCoLC)1143018724  |z (OCoLC)1193323282  |z (OCoLC)1232110637  |z (OCoLC)1305895560 
049 |a UAMI 
100 1 |a Tung, KC,  |e author. 
245 1 2 |a A novel solution for a data augmentation and bias problem in NLP using TensorFlow  |h [electronic resource] /  |c Tung, KC. 
250 |a 1st edition. 
264 1 |b O'Reilly Media, Inc.,  |c 2020. 
300 |a 1 online resource (1 video file, approximately 41 min.) 
336 |a two-dimensional moving image  |b tdi  |2 rdacontent 
337 |a computer  |b c  |2 rdamedia 
338 |a online resource  |b cr  |2 rdacarrier 
347 |a video file 
520 |a The TensorFlow ecosystem contains many valuable assets, one of which is the highly acclaimed TensorFlow high-level API. It's critical for a fast and lightweight approach to reducing lead time in deep learning model development and hypothesis testing. It's now possible to quickly and easily develop a novel deep learning solution to meet an important need in practice: data bias and augmentation in NLP. Solving this problem would have a far-reaching impact on model bias, offensive-language detection, language personalization, and classification. KC Tung (Microsoft) details his work to satisfy a need of an enterprise customer (one of the largest airlines in the world) for a model that can accurately review, classify, and store texts from aircraft maintenance logs to comply with FAA regulations on aviation safety. The customer's data is imbalanced and biased toward certain categories. Training machine learning models with imbalanced data inevitably leads to model bias, and text generation is a novel and important approach for data augmentation. In NLP, many current approaches to augmenting minority data are unsupervised and are limited to synonym swap, insertion, deletion, or oversampling. These generalized approaches often lead to a trade-off between precision and recall. They also don't work well in practice, as enterprise data is almost always domain specific. There needs to be a better framework to generate a new corpus by learning from any domain-specific underrepresented text. KC presents a novel deep learning framework built with TensorFlow to quickly achieve this goal. A benchmark model is trained on the balanced dataset. From this dataset, one class is undersampled to serve as the underrepresented minority-class text. Then a gated recurrent unit (GRU) model learns to generate more underrepresented text, which helps train a long short-term memory (LSTM) model that classifies text. 
The result on holdout data shows that the model trained with generated text is surprisingly effective. Classification accuracy, precision, and recall for each class are all on par with the benchmark model without compromising precision or recall. In short, this demonstrates the success of TensorFlow adoption for the enterprise customer in quickly leveraging and applying the TensorFlow high-level API in building a novel production-grade solution for deployment, demonstrating the effectiveness of a novel data-augmentation framework, identifying a "killer app" or a new core val ... 
538 |a Mode of access: World Wide Web. 
542 |f Copyright © O'Reilly Media, Inc. 
550 |a Made available through: Safari, an O'Reilly Media Company. 
588 |a Online resource; Title from title screen (viewed February 28, 2020) 
533 |a Electronic reproduction.  |b Boston, MA :  |c Safari.  |n Available via World Wide Web. 
590 |a O'Reilly  |b O'Reilly Online Learning: Academic/Public Library Edition 
710 2 |a Safari, an O'Reilly Media Company. 
856 4 0 |u https://learning.oreilly.com/videos/~/0636920373773/?ar  |z Full text (requires prior registration with an institutional email address) 
936 |a BATCHLOAD 
994 |a 92  |b IZTAP
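
Illustrative note on the summary (field 520): the experimental setup it describes starts from a balanced dataset and undersamples one class to create an artificial minority class. The sketch below shows only that first step, under stated assumptions. The function name, the `keep_fraction` parameter, the class labels, and the toy log entries are all hypothetical and are not taken from the video; the GRU text generator and LSTM classifier mentioned in the summary are not reproduced here.

```python
import random
from collections import Counter


def undersample_class(examples, minority_label, keep_fraction, seed=0):
    """Return a new list in which only `minority_label` is thinned out.

    `examples` is a list of (text, label) pairs. Every name here is a
    hypothetical choice for illustration, not the presenter's code.
    """
    rng = random.Random(seed)
    minority = [ex for ex in examples if ex[1] == minority_label]
    others = [ex for ex in examples if ex[1] != minority_label]
    # Keep at least one example so the class does not vanish entirely.
    n_keep = max(1, int(len(minority) * keep_fraction))
    return others + rng.sample(minority, n_keep)


# Toy balanced dataset: 3 made-up classes with 10 synthetic entries each.
balanced = [
    (f"log entry {i} about {label}", label)
    for label in ("engine", "hydraulics", "avionics")
    for i in range(10)
]

imbalanced = undersample_class(balanced, "avionics", keep_fraction=0.2)
counts = Counter(label for _, label in imbalanced)

# Number of generated texts a generator would need to supply
# to bring the minority class back to parity with the largest class.
needed = max(counts.values()) - counts["avionics"]
print(counts, needed)
```

In a setup like the one summarized, `needed` would be the number of minority-class texts a generator model has to produce before the classifier is retrained and compared against the benchmark trained on the balanced data.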