Current code pre-training systems rely heavily on either an encoder-only model similar to BERT or a decoder-only model like GPT. Neither choice is ideal: encoder-only models are suboptimal for generation tasks, while decoder-only models are suboptimal for understanding tasks.
CodeBERT, for example, needs an additional decoder when used for generation tasks like code summarization, as sketched below.
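The following is a minimal sketch, not the paper's actual setup, of what "adding a decoder" to an encoder-only model looks like in practice. It assumes the Hugging Face `transformers` library and the public `microsoft/codebert-base` checkpoint; the `EncoderDecoderModel` wrapper is my choice of illustration.

```python
# Sketch: pairing the pre-trained CodeBERT encoder with a decoder so it can
# generate text (e.g. a code summary). The cross-attention layers added here
# are new and untrained, which is exactly the extra component the paragraph
# above refers to.
from transformers import AutoTokenizer, EncoderDecoderModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")

model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "microsoft/codebert-base",   # pre-trained encoder
    "microsoft/codebert-base",   # reused as decoder; cross-attention is freshly initialised
)
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

code = "def add(a, b):\n    return a + b"
inputs = tokenizer(code, return_tensors="pt")

# Without fine-tuning the new decoder the output is meaningless; the point is
# only that generation requires this extra, not-pre-trained component.
summary_ids = model.generate(**inputs, max_length=20)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```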
Apart from this architectural issue, most current methods apply conventional NLP pre-training techniques to source code, treating it as a sequence of tokens, as with natural language. This largely ignores the rich structural information present in programming languages, which is vital to fully comprehending code semantics.
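As a small illustration of my own (not from the paper), the snippet below contrasts the flat token view of a Python function with the structural view its abstract syntax tree provides, using only the standard `ast` module.

```python
# Structure that a plain token sequence hides: the AST exposes node types and
# developer-assigned identifiers explicitly.
import ast

code = "def moving_average(values, window):\n    return sum(values[-window:]) / window"

# Token-style view: roughly what a conventional NLP pipeline sees.
print(code.split())

# Structural view: function definitions, arguments, and identifiers.
tree = ast.parse(code)
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        print("function:", node.name, "args:", [a.arg for a in node.args.args])
    elif isinstance(node, ast.Name):
        print("identifier:", node.id)
```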