Building Features from Text Data

Janani Ravi

Total duration: 2:35:59

1. Course Overview

1. Course Overview (01:49)

2. Representing Text as Features for Machine Learning

01. Version Check (00:16)
02. Module Overview (01:16)
03. Prerequisites and Course Outline (01:17)
04. One-hot Encoding (04:24)
05. Count Vectors (03:00)
06. Tf-Idf Vectors (03:00)
07. Co-occurrence Vectors (05:05)
08. Word Embeddings (05:11)
09. Installing Packages and Setting Up the Environment (03:08)
10. Sentence and Word Tokenization (05:28)
11. Plotting Word Frequency Distributions (04:01)
12. Module Summary (01:12)

3. Building Feature Vector Representations of Text

1. Module Overview (01:17)
2. Bag-of-words and Bag-of-n-grams (03:02)
3. Bag-of-words Using the Count Vectorizer (06:53)
4. Inverse Transform Using the Count Vectorizer (01:49)
5. Bag-of-n-grams Using the Count Vectorizer (05:30)
6. Generating N-grams Using NLTK (03:25)
7. Bag-of-words Using the Tf-Idf Vectorizer (04:23)
8. Module Summary (01:18)

4. Simplifying Text Processing Using Natural Language Processing

1. Module Overview (01:15)
2. Natural Language Processing Operations (05:44)
3. Stopword Removal Using NLTK and scikit-learn (06:43)
4. Frequency Filtering Using scikit-learn (02:58)
5. Stemming (05:56)
6. Lemmatization (03:31)
7. Parts-of-speech Tagging (06:21)
8. Module Summary (01:22)

5. Reducing Dimensions in Text Using Hashing

1. Module Overview (01:11)
2. Feature Hashing (02:26)
3. Reducing Dimensions Using the Feature Hasher (03:43)
4. Reducing Dimensions at Scale Using the Hashing Vectorizer (06:24)
5. Locality-sensitive Hashing (05:29)
6. Similar Documents Using Jaccard Index and Locality-sensitive Hashing (07:01)
7. Module Summary (01:23)

6. Applying Text Feature Extraction Techniques to Machine Learning

01. Module Overview (01:05)
02. Naive Bayes for Classification (02:44)
03. Classification Using the Hashing Vectorizer (07:55)
04. Pre-process Text Using a Stemmer, Build Features Using the Hashing Vectorizer (02:58)
05. Building Features Using the Count Vectorizer (02:13)
06. Pre-processing with Stopword Removal, Building Features Using Count Vectorizer (01:49)
07. Pre-processing with Stopword Removal, Frequency Filtering, Building Features Using Count Vectorizer (03:28)
08. Building Features Using the Tf-Idf Vectorizer (01:49)
09. Building Features Using Bag-of-n-grams Model (02:13)
10. Summary and Further Study (01:34)

Description

This course covers extracting information from text documents and constructing classification models, using techniques such as feature vectorization, locality-sensitive hashing, stopword removal, lemmatization, and other methods from natural language processing.

What You'll Learn

From chatbots to machine-generated literature, some of the hottest applications of ML and AI today involve data in textual form.

In this course, Building Features from Text Data, you will gain the ability to structure textual data in a manner ideal for use in ML models.

First, you will learn how to represent documents as feature vectors using one-hot encoding, frequency-based techniques such as count and tf-idf vectors, and prediction-based techniques such as word embeddings. You will also see how to improve these representations based on the meaning, or semantics, of the document.
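To make the frequency-based representations concrete, here is a minimal plain-Python sketch of count vectors and tf-idf vectors for a made-up three-document corpus. It is not the course's code. It assumes lowercase whitespace tokenization and borrows the smoothed idf, ln((1 + n) / (1 + df)) + 1, and the per-row L2 normalization that scikit-learn's TfidfVectorizer applies by default (smooth_idf=True, norm='l2'); scikit-learn tokenizes differently, so its outputs can differ.

```python
import math
from collections import Counter


def tokenize(doc):
    # Assumption: lowercase + whitespace split; real tokenizers (e.g. NLTK) do more.
    return doc.lower().split()


def count_vectors(docs):
    """Return (vocab, rows) where rows[i][j] counts term vocab[j] in document i."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def tfidf_vectors(docs):
    """Weight raw counts by smoothed idf, then L2-normalize each document row."""
    vocab, rows = count_vectors(docs)
    n = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = [sum(1 for row in rows if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n) / (1 + d)) + 1 for d in df]
    weighted = []
    for row in rows:
        vec = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        weighted.append([v / norm for v in vec])
    return vocab, weighted


docs = ["the cat sat", "the dog sat", "the cat ran"]
vocab, counts = count_vectors(docs)
print(vocab)   # ['cat', 'dog', 'ran', 'sat', 'the']
print(counts)  # [[1, 0, 0, 1, 1], [0, 1, 0, 1, 1], [1, 0, 1, 0, 1]]
_, tfidf = tfidf_vectors(docs)
```

Because "the" occurs in every document it gets the smallest idf, so its tf-idf weight falls below every other non-zero term in each row, which is the down-weighting of common words that tf-idf is meant to provide.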

Next, you will discover how to apply natural language processing operations such as stopword removal, frequency filtering, stemming and lemmatization, and part-of-speech tagging.
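Stopword removal and frequency filtering can be sketched in the same plain-Python style. The stopword set below is a tiny hypothetical list (NLTK and scikit-learn ship far larger ones), and frequency_filter with its min_df and max_df_ratio arguments is an illustrative helper that loosely mirrors the idea behind CountVectorizer's min_df and max_df parameters rather than reproducing them.

```python
from collections import Counter

# Assumption: a tiny illustrative stopword list, not NLTK's or scikit-learn's.
STOPWORDS = {"the", "a", "an", "is", "on", "and", "of"}


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop every token that appears in the stopword set."""
    return [tok for tok in tokens if tok not in stopwords]


def frequency_filter(tokenized_docs, min_df=2, max_df_ratio=0.9):
    """Hypothetical helper: keep terms that occur in at least min_df documents
    and in no more than max_df_ratio of all documents (document frequency)."""
    n = len(tokenized_docs)
    # Count each term once per document by iterating over the document's set.
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    keep = {t for t, d in df.items() if d >= min_df and d / n <= max_df_ratio}
    return [[tok for tok in doc if tok in keep] for doc in tokenized_docs]


docs = ["The cat sat on the mat", "A dog sat on the log", "The cat chased a dog"]
tokenized = [remove_stopwords(doc.lower().split()) for doc in docs]
print(tokenized)
# [['cat', 'sat', 'mat'], ['dog', 'sat', 'log'], ['cat', 'chased', 'dog']]
filtered = frequency_filter(tokenized)
print(filtered)
# [['cat', 'sat'], ['dog', 'sat'], ['cat', 'dog']]
```

Terms seen in only one document ("mat", "log", "chased") are dropped by min_df, while max_df_ratio would remove terms so common that they behave like stopwords.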

Finally, you will see how hashing techniques, including feature hashing and locality-sensitive hashing, can reduce the dimensionality of documents, with locality-sensitive hashing also keeping similar documents close together.
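For the locality-sensitive hashing part, one widely used scheme for documents is MinHash combined with banding, which approximates the Jaccard index between word sets. The sketch below is an illustration under stated assumptions, not the course's implementation: documents are treated as sets of whitespace-separated words, a family of hash functions is simulated by salting hashlib.md5 with a seed, and num_hashes=100 and bands=25 are arbitrary choices.

```python
import hashlib
from itertools import combinations


def jaccard(a, b):
    """Exact Jaccard index |A & B| / |A | B| (two empty sets count as identical)."""
    return len(a & b) / len(a | b) if a | b else 1.0


def _hash(token, seed):
    # Deterministic 64-bit hash; Python's built-in hash() of str is randomized per run.
    digest = hashlib.md5(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(tokens, num_hashes=100):
    """For each seeded hash function, keep the minimum value over the (non-empty) set."""
    return [min(_hash(tok, seed) for tok in tokens) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions where two documents agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def lsh_candidate_pairs(signatures, bands=25):
    """Split each signature into bands; documents sharing an identical band
    land in the same bucket and become candidate pairs."""
    rows = len(next(iter(signatures.values()))) // bands
    buckets = {}
    for doc_id, sig in signatures.items():
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets.setdefault(key, []).append(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


docs = {
    "d1": "the quick brown fox jumps over the lazy dog",
    "d2": "the quick brown fox jumps over the lazy cat",
    "d3": "an entirely different sentence about text features",
}
sets = {k: set(v.split()) for k, v in docs.items()}
sigs = {k: minhash_signature(s) for k, s in sets.items()}
print(round(jaccard(sets["d1"], sets["d2"]), 3))  # 0.778
print(estimated_jaccard(sigs["d1"], sigs["d2"]))
print(sorted(lsh_candidate_pairs(sigs)))
```

Each 100-value signature stands in for a much larger word set, and documents whose signatures agree on every row of at least one band share a bucket, so near-duplicates such as d1 and d2 are very likely to become a candidate pair while d3, which shares no words with them, cannot. Using more bands with fewer rows each makes the scheme more permissive.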

You will round out the course by building a Naive Bayes classification model for text documents using many of these feature extraction techniques.

When you’re finished with this course, you will have the skills and knowledge to work with documents and textual data in conceptually and practically sound ways, and to represent such data for use in machine learning models.

Topics: Natural Language Processing, Data Mining

Janani Ravi

Janani has a Master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework. After spending years working in tech in the Bay Area, New York, and Singapore at companies such as Microsoft, Google, and Flipkart, Janani decided to combine her love for technology with her passion for teaching. She is now the co-founder of Loonycorn, a content studio focused on providing high-quality content for technical skill development. Loonycorn is working on developing an engine (patent filed) to automate animations for presentations and educational content.

Pluralsight

Pluralsight, LLC is an American privately held online education company that offers a variety of video training courses for software developers, IT administrators, and creative professionals through its website. Founded in 2004 by Aaron Skonnard, Keith Brown, Fritz Onion, and Bill Williams, the company has its headquarters in Farmington, Utah. As of July 2018, it uses more than 1,400 subject-matter experts as authors, and offers more than 7,000 courses in its catalog. Since first moving its courses online in 2007, the company has expanded, developing a full enterprise platform, and adding skills assessment modules.