There are two major components of any machine learning modeling project:
- the data, and
- the algorithms (and other code) that learn something from this data
This is, by definition, also true of natural language processing modeling projects.
To help with a lack of the first component, the data, we recently shared the Big Bad NLP Database from the folks at Quantum Stat. It is a fantastic collection of nearly 300 well-organized, sortable, and searchable natural language processing datasets. The BBNLPDB contains datasets ready to go for common NLP tasks and needs, such as document classification, question answering, automated image captioning, dialog, clustering, intent classification, language modeling, machine translation, text corpora, and more.
This is all well and good if you already have a handle on how to approach NLP task implementations and are looking for data to play with. But what if you lack the skills to implement these NLP concepts yourself?
Enter the Super Duper NLP Repo (SDNLPR), another fantastic resource put together by Quantum Stat. SDNLPR is a collection of Colab notebooks covering a wide array of NLP task implementations, each of which can be launched in Google Colab with a single click.
Notebook entries in the repo include a general description, the notebook's creator, the task being considered (text classification, text generation, question answering), and the type of model being employed (BERT, GPT-2, convolutional neural network, etc.). These notebooks come from across the web, including from expert individuals and organizations such as Elvis Saravia, Chris McCormick, the TensorFlow team, HuggingFace, Tae Hwan Jung, and many more. There is no lack of tasks, models, or expertise here.
One-click launching of the notebooks in Google Colab makes this about as simple an exercise as it could possibly be.
A few examples of the interesting notebooks listed herein include:
- Twitter Pulse Checker – This is a quick and dirty way to get a sense of what’s trending on Twitter related to a particular topic
- Text-to-Speech with Tacotron+WaveRNN – This is an English-language, female-voice TTS demo built on an open-source project
- AI Dungeon 2 – AI Dungeon 2 is a completely AI-generated text adventure built with OpenAI's largest GPT-2 model. It's a first-of-its-kind game that lets you enter any action you can imagine and reacts to it.
By pairing the Big Bad NLP Database and the Super Duper NLP Repo, there’s really no excuse to not learn modern natural language processing if you are so inclined. Check them out and start implementing today.