Everything you need to know about PolyCoder, the open-source code-generation model

TechGig
Mar 23, 2022 · 2 min read


PolyCoder is an open-source code-generation model released by researchers at Carnegie Mellon University. It is an automated code generator trained on multiple programming languages, and it is particularly good at writing code in C.

The researchers hope that this open-source model will help democratise research in the field of AI code generation.

The researchers note that large language models (LMs) of code have recently demonstrated great potential at completing and synthesising code from natural-language descriptions. The current state-of-the-art code LMs, however, are not publicly available, which raises questions about their model and data design decisions.

They also added that OpenAI’s Codex, announced in August 2021, is accessible through Microsoft-owned GitHub’s Copilot tool, but it offers only non-free access to the model’s output via black-box API calls, with the model’s weights and training data unavailable.

The appeal of automatic code generation is that it can save developers time, provided the output is accurate and free of security issues.
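
To make the idea concrete, here is a minimal sketch of that prompt-to-completion workflow using the Hugging Face transformers library. The checkpoint id is an assumption (the released PolyCoder weights are published by the authors); the article itself only says the model is open source.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Assumed checkpoint id for the released 2.7B-parameter PolyCoder weights;
# not something stated in the article.
MODEL_ID = "NinedayWang/PolyCoder-2.7B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Prompt with a natural-language comment plus a C function header,
# since C is the language the model reportedly handles best.
prompt = "/* Return the greatest common divisor of a and b. */\nint gcd(int a, int b) {"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If the generation is accurate, the model continues the prompt with a working function body; the developer only had to write the comment and the signature.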

The researchers also noted that, despite the huge success of large language models of code, the most powerful models are not open to the public. This limits research in the field for low-resource firms and prevents these models from being used outside well-resourced companies.

To address this, the researchers developed “PolyCoder,” a model that was trained on code from a variety of programming languages.

The model was trained on data from various GitHub repositories covering 12 popular programming languages: C, C#, C++, Go, Java, JavaScript, PHP, Python, Ruby, Rust, Scala, and TypeScript. The raw dataset comprised 631 GB of data across 38.9 million files. Due to funding constraints, the researchers based PolyCoder on the GPT-2 architecture.
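
As an illustration of what assembling such a multi-language corpus involves, below is a hypothetical sketch that buckets the files of cloned repositories by extension into the 12 languages named above; it is not the authors’ actual data pipeline.

```python
from collections import defaultdict
from pathlib import Path

# Extension-to-language map for the 12 languages in the PolyCoder dataset.
LANGS = {
    ".c": "C", ".cs": "C#", ".cpp": "C++", ".go": "Go",
    ".java": "Java", ".js": "JavaScript", ".php": "PHP",
    ".py": "Python", ".rb": "Ruby", ".rs": "Rust",
    ".scala": "Scala", ".ts": "TypeScript",
}

def bucket_by_language(repo_root: str) -> dict:
    """Group the source files under repo_root by programming language."""
    buckets = defaultdict(list)
    for path in Path(repo_root).rglob("*"):
        lang = LANGS.get(path.suffix)
        if lang and path.is_file():
            buckets[lang].append(path)
    return buckets

if __name__ == "__main__":
    # "repos" is a hypothetical directory of cloned GitHub repositories.
    for lang, files in sorted(bucket_by_language("repos").items()):
        print(f"{lang}: {len(files)} files")
```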

The researchers claimed success in some areas, particularly in C. In other languages, however, Codex still came out ahead.

For more such content, visit: https://bit.ly/2XkTP0P
