Parsing - Deepstash
Parsing

The parser turns a list of tokens into a tree of nodes. A tree used for storing this type of data is known as an Abstract Syntax Tree, or AST. 

At least in Pinecone, the AST does not have any info about types or which identifiers are which. It is simply structured tokens. 
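As a rough illustration of "structured tokens", the sketch below stores each node as a token plus its children and nothing else. The `Node` class and `show` helper are hypothetical names chosen for this example; they are not taken from Pinecone.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One AST node: a token and its child nodes, with no type information."""
    token: str
    children: list["Node"] = field(default_factory=list)


def show(node: Node) -> str:
    """Render the tree as a parenthesized string so its shape is easy to inspect."""
    if not node.children:
        return node.token
    inner = " ".join(show(child) for child in node.children)
    return f"({node.token} {inner})"


# Hand-built tree for the token list: a = b + 2
tree = Node("=", [Node("a"), Node("+", [Node("b"), Node("2")])])
rendered = show(tree)
print(rendered)
```

Note that nothing in the tree records what type `b` has or whether another `b` elsewhere refers to the same variable.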

Lexing

The first step in most programming languages is lexing, or tokenizing. ‘Lex’ is short for lexical analysis, a very fancy term for splitting a bunch of text into tokens. 

The word ‘tokenizer’ makes a lot more sense, but ‘lexer’ is so much fun to say that I use it anyway. 
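A minimal sketch of a tokenizer, assuming a toy language with only integers, identifiers, and a handful of single-character operators. The token kinds and the `tokenize` function are assumptions made for this example, not Pinecone's actual lexer.

```python
import re

# One named group per token kind; whitespace is matched so it can be skipped.
TOKEN_PATTERN = re.compile(
    r"(?P<number>\d+)"
    r"|(?P<identifier>[A-Za-z_]\w*)"
    r"|(?P<operator>[-+*/=()])"
    r"|(?P<space>\s+)"
)


def tokenize(text):
    """Split source text into a list of (kind, text) tokens."""
    tokens = []
    position = 0
    while position < len(text):
        match = TOKEN_PATTERN.match(text, position)
        if match is None:
            raise SyntaxError(
                f"unexpected character {text[position]!r} at {position}"
            )
        if match.lastgroup != "space":
            tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens


print(tokenize("total = price * 2"))
```

The output is a flat, ordered list; giving it structure is left to the parser.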

Action Tree vs AST

Put simply, the action tree is the AST with context. That context is info such as what type a function returns, or that two places in which a variable is used are in fact using the same variable. 

Because it needs to figure out and remember all this context, the code that g...

Running the Action Tree

Once we have the action tree, running the code is easy. Each action node has an ‘execute’ function, which takes some input, does whatever the action should do (possibly including calling sub-actions), and returns the action’s output. This is the interpreter in action. 
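The idea can be sketched with a few node classes that each expose an `execute` method. The class names (`Constant`, `Input`, `Add`, `Multiply`) and the use of a dictionary as the input are assumptions for illustration only; Pinecone's real action classes are written in C++ and will differ.

```python
class Constant:
    """Action that ignores its input and returns a fixed value."""

    def __init__(self, value):
        self.value = value

    def execute(self, inputs):
        return self.value


class Input:
    """Action that looks up a named value in the input."""

    def __init__(self, name):
        self.name = name

    def execute(self, inputs):
        return inputs[self.name]


class Add:
    """Action that executes two sub-actions and adds their outputs."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def execute(self, inputs):
        return self.left.execute(inputs) + self.right.execute(inputs)


class Multiply:
    """Action that executes two sub-actions and multiplies their outputs."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def execute(self, inputs):
        return self.left.execute(inputs) * self.right.execute(inputs)


# Action tree for: x + 2 * 3
program = Add(Input("x"), Multiply(Constant(2), Constant(3)))
result = program.execute({"x": 4})
print(result)
```

Running the program is a single call on the root node; every other node is reached through its parent's `execute`.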

Parser Duties

The parser adds structure to the ordered list of tokens the lexer produces. To avoid ambiguities, the parser must take into account parentheses and the order of operations. 

Simply parsing operators isn’t terribly difficult, but as more language constructs get added, pars...
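To show how parentheses and the order of operations can be handled, here is a small sketch using precedence climbing, a common technique that is not necessarily the one Pinecone uses. The `PRECEDENCE` table, the function names, and the nested-tuple output format are all assumptions for this example.

```python
# Higher numbers bind tighter; all four operators are left-associative.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def parse(tokens):
    """Parse a list of string tokens into a nested-tuple tree."""
    tree, position = parse_expression(tokens, 0, 1)
    if position != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[position]!r}")
    return tree


def parse_expression(tokens, position, min_precedence):
    """Parse operands joined by operators of at least min_precedence."""
    left, position = parse_operand(tokens, position)
    while (
        position < len(tokens)
        and PRECEDENCE.get(tokens[position], 0) >= min_precedence
    ):
        operator = tokens[position]
        # The right side may only contain operators that bind tighter,
        # which makes equal-precedence operators group to the left.
        right, position = parse_expression(
            tokens, position + 1, PRECEDENCE[operator] + 1
        )
        left = (operator, left, right)
    return left, position


def parse_operand(tokens, position):
    """Parse a number, a name, or a parenthesized sub-expression."""
    if position >= len(tokens):
        raise SyntaxError("unexpected end of input")
    token = tokens[position]
    if token == "(":
        tree, position = parse_expression(tokens, position + 1, 1)
        if position >= len(tokens) or tokens[position] != ")":
            raise SyntaxError("expected ')'")
        return tree, position + 1
    if token in PRECEDENCE or token == ")":
        raise SyntaxError(f"unexpected token {token!r}")
    return token, position + 1


print(parse(["1", "+", "2", "*", "3"]))
print(parse(["(", "1", "+", "2", ")", "*", "3"]))
```

The same five tokens produce different trees depending on where the parentheses sit, which is exactly the ambiguity the parser has to resolve.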

Bison

The predominant parsing library is Bison. Bison works a lot like Flex: you write a file in a custom format that stores the grammar information, then Bison uses that to generate a C program that will do your parsing. I chose not to use Bison. 

Tokens

A token is a small unit of a language.

A token might be a variable or function name (AKA an identifier), an operator or a number. 

Why Custom Is Better

  • Minimize context switching in the workflow: context switching between C++ and Pinecone is bad enough without throwing in Bison’s grammar grammar.
  • Every time the grammar changes, Bison has to be run before the build.
  • A custom parser is completely doable.

Choosing a Language

If you are writing an interpreted language, it makes a lot of sense to write it in a compiled one (like C, C++ or Swift).

If you plan to compile, a slower language (like Python or JavaScript) is more acceptable.

High Level Design

A programming language implementation is generally structured as a pipeline. That is, it has several stages. 

Each stage has data formatted in a specific, well-defined way. It also has functions to transform data from each stage to the next. 

Compiled vs Interpreted

There are two major types of languages, compiled and interpreted:

  • A compiler figures out everything a program will do, turns it into “machine code”, then saves that to be executed later. 
  • An interpreter steps through the source code line by line, figuring out what it’s doing a...
