Natural Language Processing: Regular Expressions

  • /word/ - matches any string containing the substring "word". Note: it's case sensitive.
  • /[ab]/ - disjunction of characters: "a" or "b"
  • /cat|dog/ - disjunction of strings. To be read as: string containing "cat" or "dog"
  • /pupp(y|ies)/ - To be read as: string containing "puppy" or "puppies"
  • /[a-z]/ - a range: any lowercase letter from "a" to "z"
  • /[^a]/ - "^" shows negation when it's the first character inside brackets. To be read as: any single character other than "a"
  • /^The/ - outside brackets, "^" anchors the match to the start of the line. E.g. matches "The dog barks." but not "See The dog."
  • /s\.$/ - "$" matches the end of the line. To be read as: line ending with "s.". The dot is escaped because an unescaped "." matches any character
  • /words?/ - "?" implies zero or one instances of the previous character. E.g. word or words
  • /b.y/ - "." matches any single character. To be read as: "b", any one character, then "y". E.g. matches "buy" and "boy", but finds nothing in "busy life" because two characters separate "b" and "y"
  • /a*/ - "*" means zero or more occurrences of the previous character. To be read as: a run of zero or more "a"s, so it also matches the empty string
  • /[ab]+/ - "+" means one or more occurrences of the previous character or expression. To be read as: a run of one or more characters, each of them "a" or "b"
  • /\bthe\b/ - "\b" matches a word boundary. To be read as: "the" as a whole word. E.g. matches "in the club" but not "other"
  • \d - matches any digit
  • \D - matches any non-digit
  • {n} - exactly n occurrences of the previous char or expression
  • {,n} - up to n occurrences of the previous char or expression
  • {n,m} - from n to m occurrences of the previous char or expression
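
The patterns above can be tried out with Python's built-in re module. This is only a sketch: the helper name found and the sample strings are made up for illustration, and the surrounding slashes are dropped because Python takes the pattern as a plain string.

```python
import re

def found(pattern, text):
    """Return the first substring of text matched by pattern, or None."""
    match = re.search(pattern, text)
    return match.group(0) if match else None

print(found(r"word", "Keyword list"))       # substring match
print(found(r"word", "WORD"))               # None: matching is case sensitive
print(found(r"pupp(y|ies)", "two puppies")) # grouping with disjunction
print(found(r"[^a]", "aab"))                # first character that is not "a"
print(found(r"^The", "The dog barks."))     # anchored at the start
print(found(r"s\.$", "The dog barks."))     # "s." at the end of the line
print(found(r"b.y", "buy now"))             # exactly one character between b and y
print(found(r"b.y", "busy life"))           # None: two characters in between
print(found(r"[ab]+", "xabbay"))            # run of a's and b's
print(found(r"\bthe\b", "in the club"))     # "the" as a whole word
print(found(r"\bthe\b", "other"))           # None: "the" is inside a longer word
print(found(r"\d{2,3}", "room 1024"))       # two to three digits, greedy
```

re.search looks for a match anywhere in the string, which mirrors the "string containing ..." reading used in the list.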

Speech and Language Processing

by Daniel Jurafsky, James H. Martin

Morphology is the study of word structure.

Words consist of one or more morphemes (cats = cat + s). A morpheme is the smallest meaning-bearing unit of a language.

Morphemes can be of several types:

  • stem (can stand on its own)
  • affix (can't stand on its own)

Word formation can develop through:

  • inflection: forms of the same word. E.g. word, words; work, worked
  • derivation: produces a new word with a changed meaning, and is not applicable to all words in a class. E.g. act -> actor
  • compounding: stardust = star + dust
  • cliticization: word + clitic. E.g. we're, you're

For counting words:

  • tokens: individual occurrences of word strings in running text (repeats are counted)
  • types: distinct words
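
A minimal sketch of the difference between the two counts, assuming a made-up sentence and a deliberately crude tokenizer (lowercase the text, then take runs of letters):

```python
import re

# Hypothetical sample text, chosen so that some words repeat.
text = "The cat sat on the mat and the dog sat too"

# Crude tokenization: lowercase, then extract runs of letters.
tokens = re.findall(r"[a-z]+", text.lower())

# Types are the distinct word strings among the tokens.
types = set(tokens)

print(len(tokens), "tokens")
print(len(types), "types")
```

Here "the" contributes three tokens but only one type, so the token count is larger than the type count.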

Lemmatization is a vocabulary reduction process that maps word forms to their lemma (dictionary form). E.g. sang, sung, sings to sing

Stemming is the cruder process of reducing words to stems, typically by stripping affixes; the result need not be a real word. E.g. information to inform, retrieval to retriev

Types of stemming errors:

  • omission: related words are not reduced to the same stem. E.g. European and Europe
  • commission: unrelated words are reduced to the same stem. E.g. policy and police are both reduced to polic
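
Both kinds of error can be reproduced with a toy suffix-stripping stemmer. The suffix list and the function toy_stem below are invented for illustration only; this is not the Porter stemmer or any other published algorithm.

```python
# Hypothetical suffix list, tried in order; at most one suffix is removed.
SUFFIXES = ["ation", "al", "an", "e", "y", "s"]

def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for w in ["information", "retrieval", "policy", "police", "European", "Europe"]:
    print(w, "->", toy_stem(w))
```

With these rules, policy and police collapse to the same stem (an error of commission), while European and Europe end up with different stems (an error of omission).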

Note: look at the minimum edit distance algorithm to measure how far apart two words are
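
As a rough sketch, minimum edit distance can be computed with dynamic programming over prefixes of the two words. The version below assumes a cost of 1 for each insertion, deletion and substitution (plain Levenshtein distance); other cost schemes, such as charging 2 for a substitution, give different numbers. The function name is chosen here for illustration.

```python
def min_edit_distance(source, target):
    """Levenshtein distance with unit cost for insert, delete and substitute."""
    n, m = len(source), len(target)
    # dist[i][j] = distance between source[:i] and target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i          # delete all i characters
    for j in range(1, m + 1):
        dist[0][j] = j          # insert all j characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # substitution or match
            )
    return dist[n][m]

print(min_edit_distance("policy", "police"))
print(min_edit_distance("intention", "execution"))
```

A small distance, as between policy and police, says the strings look alike, not that the words are related in meaning.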
