Magda Mihalache's Key Ideas from Speech and Language Processing
by Daniel Jurafsky, James H. Martin

Name: Speech and Language Processing
Author: Magda Mihalache


Natural Language Processing: Regular Expressions

/word/ - matches any string containing the substring "word". Note: it's case sensitive.
/[ab]/ - disjunction of characters: "a" or "b"
/cat|dog/ - disjunction of the strings "cat" and "dog". To be read as: string containing "cat" or "dog"
/pupp(y|ies)/ - To be read as: string containing "puppy" or "puppies"
/[a-z]/ - shows a range: any lowercase letter from "a" to "z"
/[^a]/ - "^" shows negation when it's the first character inside brackets. To be read as: any single character that is not "a"
/^The/ - Outside brackets, "^" anchors the match to the start of the line. E.g. it matches "The dog barks." but not "See The dog."
/s.$/ - "$" matches the end of the line. To be read as: a line ending in "s" followed by any one character. To match a line ending in a literal "s.", escape the period: /s\.$/
/words?/ - "?" implies zero or one instances of the previous character. E.g. word or words
/b.y/ - "." matches any single character. To be read as: "b", then any one character, then "y". E.g. it matches "buy" and "boy", but not "busy", which has two characters between "b" and "y"
/a*/ - "*" matches zero or more occurrences of the previous character. To be read as: a string of zero or more "a"s
/(ab)+/ - "+" matches one or more occurrences of the previous character or group. To be read as: one or more repetitions of "ab". Note: inside brackets, as in /[ab+]/, "+" is just a literal character
/\bthe\b/ - "\b" matches a word boundary. To be read as: the word "the" standing on its own. E.g. it matches "in the club" but not "other"
\d - matches any digit
\D - matches any non-digit
{n} - exactly n occurrences of the previous char or expression
{,n} - up to n occurrences of the previous char or expression
{n,m} - from n to m occurrences of the previous char or expression
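
The patterns above can be tried out with a short script. This is a minimal sketch that assumes Python's built-in re module, whose syntax agrees with the notation above for these cases; the sample strings are made up for illustration. A literal period is escaped as \. and the repeated "ab" is grouped with parentheses.

```python
import re

# Each entry: (pattern, a string it should match, a string it should not match).
# The sample strings are illustrative assumptions, not taken from the book.
EXAMPLES = [
    (r"word", "password", "Word"),           # substring match, case sensitive
    (r"[ab]", "cab", "xyz"),                 # "a" or "b"
    (r"cat|dog", "hotdog", "bird"),          # "cat" or "dog"
    (r"pupp(y|ies)", "puppies", "pups"),     # "puppy" or "puppies"
    (r"[^a]", "abc", "aaa"),                 # any character that is not "a"
    (r"^The", "The dog barks.", "See The dog"),   # anchored at the start
    (r"s\.$", "The dog barks.", "The dog barks"), # ends in a literal "s."
    (r"words?", "word", "wor"),              # optional final "s"
    (r"b.y", "buy", "busy"),                 # exactly one character between
    (r"(ab)+", "abab", "ba"),                # one or more "ab"
    (r"\bthe\b", "in the club", "other"),    # "the" as a whole word
    (r"\d{2,4}", "year 2024", "7 days"),     # two to four digits in a row
]


def contains_match(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None


for pattern, hit, miss in EXAMPLES:
    print(pattern, contains_match(pattern, hit), contains_match(pattern, miss))
```

Each printed line should show True for the matching string and False for the non-matching one.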


Natural Language Processing: Word Formation

Morphology is the study of word structure.

Words consist of one or more morphemes (cats = cat + s). A morpheme is the smallest meaning-bearing unit of language.

Morphemes can be of several types:

stem (can stand on its own)
affix (can't stand on its own)

Word formation can develop through:

inflection: forms of the same word. E.g. word, words; work, worked
derivation: forms a new word with a changed meaning, and it does not apply to every word in a class. E.g. act -> actor
compounding: stardust = star + dust
cliticization: word + clitic. E.g. we're, you're

For counting words:

tokens: all occurrences of words in the running text (every repetition counts)
types: distinct words
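
The difference between the two counts can be shown with a few lines of Python. This is a minimal sketch under stated assumptions: words are lowercased and split on anything that is not a letter or an apostrophe, which is only one of several reasonable tokenization choices, and the sample sentence is made up.

```python
import re


def count_tokens_and_types(text):
    """Return (number of tokens, number of types) for a piece of text.

    Assumption: case is ignored, so "The" and "the" count as one type.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return len(words), len(set(words))


sentence = "The cat sat on the mat because the mat was warm"
tokens, types = count_tokens_and_types(sentence)
print(tokens, types)  # 11 tokens, 8 types
```

Here "the" appears three times and "mat" twice, so the 11 tokens collapse to 8 types.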

Lemmatization is a vocabulary reduction process of mapping words to their lemma (dictionary form). E.g. sang, sung, sings to sing

Stemming is a cruder process of reducing words to stems by stripping affixes. E.g. information to inform, retrieval to retriev

Types of stemming errors:

omission: related words are not reduced to the same stem. E.g. European and Europe
commission: unrelated words are reduced to the same stem. E.g. policy and police are both reduced to polic
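
Both kinds of error can be reproduced with a toy suffix stripper. This is only an illustrative sketch: the suffix list and the minimum stem length are assumptions chosen to mirror the examples above, and it is not the Porter stemmer or any other published algorithm.

```python
# Hypothetical suffix list, tried in order; chosen only to mirror the examples.
SUFFIXES = ("ation", "al", "y", "e", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


print(toy_stem("information"), toy_stem("retrieval"))  # inform retriev
print(toy_stem("policy"), toy_stem("police"))          # polic polic (commission)
print(toy_stem("European"), toy_stem("Europe"))        # european europ (omission)
```

Rules that are too aggressive merge unrelated words, while missing rules leave related words apart.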

Note: look at the minimum edit distance algorithm to determine the distance between words
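
A minimal sketch of minimum edit distance, assuming the standard dynamic-programming formulation with insertions and deletions costing 1. The function name and the sub_cost parameter are illustrative choices. The substitution cost is left configurable because both 1 and 2 are in common use.

```python
def min_edit_distance(source, target, sub_cost=1):
    """Minimum cost of turning source into target.

    Insertions and deletions cost 1; substitutions cost sub_cost.
    """
    n, m = len(source), len(target)
    # dist[i][j] = cost of turning source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete every character of source[:i]
    for j in range(1, m + 1):
        dist[0][j] = j  # insert every character of target[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                            # deletion
                dist[i][j - 1] + 1,                            # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # substitution or copy
            )
    return dist[n][m]


print(min_edit_distance("intention", "execution"))              # 5
print(min_edit_distance("intention", "execution", sub_cost=2))  # 8
```

The table is filled row by row, so the running time is proportional to the product of the two word lengths.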


IDEAS CURATED BY

Magda Mihalache

@magdamihalache

User Researcher, passionate about behaviours and building the right products. I 'stash' about research, self-development and education.
