We will use IMDB Movie Reviews Dataset

What is DistilBERT

BERT is designed to pretrain deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be finetuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial taskspecific architecture modifications.

DistilBERT is a small, fast, cheap and light Transformer model trained by distilling Bert base. It has 40% less parameters than bert-base-uncased, runs 60% faster while preserving over 95% of Bert’s performances as measured on the GLUE language understanding benchmark.

alt text

Why DistilBERT

  • Accurate as much as Original BERT Model
  • 60% faster
  • 40% less parameters
  • It can run on CPU

Additional Reading

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

https://arxiv.org/abs/1910.01108

Video Lecture: BERT NLP Tutorial 1- Introduction | BERT Machine Learning | KGP Talkie

https://www.youtube.com/watch?v=h_U27jBNYI4

Ref BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

https://arxiv.org/abs/1810.04805

Understanding searches better than ever before:

https://www.blog.google/products/search/search-language-understanding-bert/

Good Resource to Read More About the BERT:

http://jalammar.github.io/illustrated-bert/


What is ktrain

ktrain is a library to help build, train, debug, and deploy neural networks in the deep learning software framework, Keras.

ktrain uses tf.keras in TensorFlow instead of standalone Keras.) Inspired by the fastai library, with only a few lines of code, ktrain allows you to easily:

  • estimate an optimal learning rate for your model given your data using a learning rate finder
  • employ learning rate schedules such as the triangular learning rate policy, 1cycle policy, and SGDR to more effectively train your model
  • employ fast and easy-to-use pre-canned models for both text classification (e.g., NBSVM, fastText, GRU with pretrained word embeddings) and image classification (e.g., ResNet, Wide Residual Networks, Inception)
  • load and preprocess text and image data from a variety of formats

  • inspect data points that were misclassified to help improve your model

  • leverage a simple prediction API for saving and deploying both models and data-preprocessing steps to make predictions on new raw data

Notebook Setup

!pip install ktrain
Collecting ktrain
  Downloading https://files.pythonhosted.org/packages/12/11/49bdde1b08a210365c04367e84d3fb1489db6f4b262358a10719962891a2/ktrain-0.16.3.tar.gz (25.2MB)
     |████████████████████████████████| 25.2MB 117kB/s 
Collecting tensorflow==2.1.0
  Downloading https://files.pythonhosted.org/packages/85/d4/c0cd1057b331bc38b65478302114194bd8e1b9c2bbc06e300935c0e93d90/tensorflow-2.1.0-cp36-cp36m-manylinux2010_x86_64.whl (421.8MB)
     |████████████████████████████████| 421.8MB 26kB/s 
Collecting scikit-learn==0.21.3
  Downloading https://files.pythonhosted.org/packages/a0/c5/d2238762d780dde84a20b8c761f563fe882b88c5a5fb03c056547c442a19/scikit_learn-0.21.3-cp36-cp36m-manylinux1_x86_64.whl (6.7MB)
     |████████████████████████████████| 6.7MB 30.3MB/s 
Requirement already satisfied: matplotlib>=3.0.0 in /usr/local/lib/python3.6/dist-packages (from ktrain) (3.2.1)
Requirement already satisfied: pandas>=1.0.1 in /usr/local/lib/python3.6/dist-packages (from ktrain) (1.0.4)
Requirement already satisfied: fastprogress>=0.1.21 in /usr/local/lib/python3.6/dist-packages (from ktrain) (0.2.3)
Collecting keras_bert>=0.81.0
  Downloading https://files.pythonhosted.org/packages/ec/08/bffa03eb899b20bfb60553e4503f8bac00b83d415bc6ead08f6b447e8aaa/keras-bert-0.84.0.tar.gz
Requirement already satisfied: requests in /usr/local/lib/python3.6/dist-packages (from ktrain) (2.23.0)
Requirement already satisfied: joblib in /usr/local/lib/python3.6/dist-packages (from ktrain) (0.15.1)
Collecting langdetect
  Downloading https://files.pythonhosted.org/packages/56/a3/8407c1e62d5980188b4acc45ef3d94b933d14a2ebc9ef3505f22cf772570/langdetect-1.0.8.tar.gz (981kB)
     |████████████████████████████████| 983kB 33.8MB/s 
Requirement already satisfied: jieba in /usr/local/lib/python3.6/dist-packages (from ktrain) (0.42.1)
Collecting cchardet==2.1.5
  Downloading https://files.pythonhosted.org/packages/fa/4e/847feebfc3e71c773b23ee06c74687b8c50a5a6d6aaff452a0a4f4eb9a32/cchardet-2.1.5-cp36-cp36m-manylinux1_x86_64.whl (241kB)
     |████████████████████████████████| 245kB 37.8MB/s 
Requirement already satisfied: networkx>=2.3 in /usr/local/lib/python3.6/dist-packages (from ktrain) (2.4)
Requirement already satisfied: bokeh in /usr/local/lib/python3.6/dist-packages (from ktrain) (1.4.0)
Collecting seqeval
  Downloading https://files.pythonhosted.org/packages/34/91/068aca8d60ce56dd9ba4506850e876aba5e66a6f2f29aa223224b50df0de/seqeval-0.0.12.tar.gz
Requirement already satisfied: packaging in /usr/local/lib/python3.6/dist-packages (from ktrain) (20.4)
Requirement already satisfied: tensorflow_datasets in /usr/local/lib/python3.6/dist-packages (from ktrain) (2.1.0)
Collecting transformers>=2.11.0
  Downloading https://files.pythonhosted.org/packages/48/35/ad2c5b1b8f99feaaf9d7cdadaeef261f098c6e1a6a2935d4d07662a6b780/transformers-2.11.0-py3-none-any.whl (674kB)
     |████████████████████████████████| 675kB 36.0MB/s 
Requirement already satisfied: ipython in /usr/local/lib/python3.6/dist-packages (from ktrain) (5.5.0)
Collecting syntok
  Downloading https://files.pythonhosted.org/packages/8c/76/a49e73a04b3e3a14ce232e8e28a1587f8108baa665644fe8c40e307e792e/syntok-1.3.1.tar.gz
Collecting whoosh
  Downloading https://files.pythonhosted.org/packages/ba/19/24d0f1f454a2c1eb689ca28d2f178db81e5024f42d82729a4ff6771155cf/Whoosh-2.7.4-py2.py3-none-any.whl (468kB)
     |████████████████████████████████| 471kB 36.7MB/s 
Requirement already satisfied: six>=1.12.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow==2.1.0->ktrain) (1.12.0)
Requirement already satisfied: numpy<2.0,>=1.16.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow==2.1.0->ktrain) (1.18.5)
Collecting gast==0.2.2
  Downloading https://files.pythonhosted.org/packages/4e/35/11749bf99b2d4e3cceb4d55ca22590b0d7c2c62b9de38ac4a4a7f4687421/gast-0.2.2.tar.gz
Requirement already satisfied: keras-preprocessing>=1.1.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow==2.1.0->ktrain) (1.1.2)
Requirement already satisfied: protobuf>=3.8.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow==2.1.0->ktrain) (3.10.0)
Requirement already satisfied: wrapt>=1.11.1 in /usr/local/lib/python3.6/dist-packages (from tensorflow==2.1.0->ktrain) (1.12.1)
Requirement already satisfied: wheel>=0.26; python_version >= "3" in /usr/local/lib/python3.6/dist-packages (from tensorflow==2.1.0->ktrain) (0.34.2)
Collecting tensorboard<2.2.0,>=2.1.0
  Downloading https://files.pythonhosted.org/packages/d9/41/bbf49b61370e4f4d245d4c6051dfb6db80cec672605c91b1652ac8cc3d38/tensorboard-2.1.1-py3-none-any.whl (3.8MB)
     |████████████████████████████████| 3.9MB 36.4MB/s 
Requirement already satisfied: termcolor>=1.1.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow==2.1.0->ktrain) (1.1.0)
Requirement already satisfied: astor>=0.6.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow==2.1.0->ktrain) (0.8.1)
Requirement already satisfied: scipy==1.4.1; python_version >= "3" in /usr/local/lib/python3.6/dist-packages (from tensorflow==2.1.0->ktrain) (1.4.1)
Collecting tensorflow-estimator<2.2.0,>=2.1.0rc0
  Downloading https://files.pythonhosted.org/packages/18/90/b77c328a1304437ab1310b463e533fa7689f4bfc41549593056d812fab8e/tensorflow_estimator-2.1.0-py2.py3-none-any.whl (448kB)
     |████████████████████████████████| 450kB 37.8MB/s 
Requirement already satisfied: grpcio>=1.8.6 in /usr/local/lib/python3.6/dist-packages (from tensorflow==2.1.0->ktrain) (1.29.0)
Requirement already satisfied: google-pasta>=0.1.6 in /usr/local/lib/python3.6/dist-packages (from tensorflow==2.1.0->ktrain) (0.2.0)
Requirement already satisfied: opt-einsum>=2.3.2 in /usr/local/lib/python3.6/dist-packages (from tensorflow==2.1.0->ktrain) (3.2.1)
Requirement already satisfied: absl-py>=0.7.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow==2.1.0->ktrain) (0.9.0)
Requirement already satisfied: keras-applications>=1.0.8 in /usr/local/lib/python3.6/dist-packages (from tensorflow==2.1.0->ktrain) (1.0.8)
Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib>=3.0.0->ktrain) (2.8.1)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.6/dist-packages (from matplotlib>=3.0.0->ktrain) (0.10.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib>=3.0.0->ktrain) (2.4.7)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib>=3.0.0->ktrain) (1.2.0)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.6/dist-packages (from pandas>=1.0.1->ktrain) (2018.9)
Requirement already satisfied: Keras in /usr/local/lib/python3.6/dist-packages (from keras_bert>=0.81.0->ktrain) (2.3.1)
Collecting keras-transformer>=0.37.0
  Downloading https://files.pythonhosted.org/packages/8a/2b/c465241bd3f37a3699246827ff4ad7974c6edeaa69cf9cdcff2fd1d3ba46/keras-transformer-0.37.0.tar.gz
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests->ktrain) (1.24.3)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests->ktrain) (2.9)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from requests->ktrain) (2020.4.5.2)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests->ktrain) (3.0.4)
Requirement already satisfied: decorator>=4.3.0 in /usr/local/lib/python3.6/dist-packages (from networkx>=2.3->ktrain) (4.4.2)
Requirement already satisfied: Jinja2>=2.7 in /usr/local/lib/python3.6/dist-packages (from bokeh->ktrain) (2.11.2)
Requirement already satisfied: PyYAML>=3.10 in /usr/local/lib/python3.6/dist-packages (from bokeh->ktrain) (3.13)
Requirement already satisfied: pillow>=4.0 in /usr/local/lib/python3.6/dist-packages (from bokeh->ktrain) (7.0.0)
Requirement already satisfied: tornado>=4.3 in /usr/local/lib/python3.6/dist-packages (from bokeh->ktrain) (4.5.3)
Requirement already satisfied: dill in /usr/local/lib/python3.6/dist-packages (from tensorflow_datasets->ktrain) (0.3.1.1)
Requirement already satisfied: future in /usr/local/lib/python3.6/dist-packages (from tensorflow_datasets->ktrain) (0.16.0)
Requirement already satisfied: attrs>=18.1.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow_datasets->ktrain) (19.3.0)
Requirement already satisfied: tqdm in /usr/local/lib/python3.6/dist-packages (from tensorflow_datasets->ktrain) (4.41.1)
Requirement already satisfied: promise in /usr/local/lib/python3.6/dist-packages (from tensorflow_datasets->ktrain) (2.3)
Requirement already satisfied: tensorflow-metadata in /usr/local/lib/python3.6/dist-packages (from tensorflow_datasets->ktrain) (0.22.2)
Collecting tokenizers==0.7.0
  Downloading https://files.pythonhosted.org/packages/14/e5/a26eb4716523808bb0a799fcfdceb6ebf77a18169d9591b2f46a9adb87d9/tokenizers-0.7.0-cp36-cp36m-manylinux1_x86_64.whl (3.8MB)
     |████████████████████████████████| 3.8MB 41.0MB/s 
Collecting sentencepiece
  Downloading https://files.pythonhosted.org/packages/d4/a4/d0a884c4300004a78cca907a6ff9a5e9fe4f090f5d95ab341c53d28cbc58/sentencepiece-0.1.91-cp36-cp36m-manylinux1_x86_64.whl (1.1MB)
     |████████████████████████████████| 1.1MB 33.4MB/s 
Requirement already satisfied: dataclasses; python_version < "3.7" in /usr/local/lib/python3.6/dist-packages (from transformers>=2.11.0->ktrain) (0.7)
Requirement already satisfied: filelock in /usr/local/lib/python3.6/dist-packages (from transformers>=2.11.0->ktrain) (3.0.12)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.6/dist-packages (from transformers>=2.11.0->ktrain) (2019.12.20)
Collecting sacremoses
  Downloading https://files.pythonhosted.org/packages/7d/34/09d19aff26edcc8eb2a01bed8e98f13a1537005d31e95233fd48216eed10/sacremoses-0.0.43.tar.gz (883kB)
     |████████████████████████████████| 890kB 35.7MB/s 
Requirement already satisfied: pexpect; sys_platform != "win32" in /usr/local/lib/python3.6/dist-packages (from ipython->ktrain) (4.8.0)
Requirement already satisfied: setuptools>=18.5 in /usr/local/lib/python3.6/dist-packages (from ipython->ktrain) (47.1.1)
Requirement already satisfied: simplegeneric>0.8 in /usr/local/lib/python3.6/dist-packages (from ipython->ktrain) (0.8.1)
Requirement already satisfied: prompt-toolkit<2.0.0,>=1.0.4 in /usr/local/lib/python3.6/dist-packages (from ipython->ktrain) (1.0.18)
Requirement already satisfied: pickleshare in /usr/local/lib/python3.6/dist-packages (from ipython->ktrain) (0.7.5)
Requirement already satisfied: traitlets>=4.2 in /usr/local/lib/python3.6/dist-packages (from ipython->ktrain) (4.3.3)
Requirement already satisfied: pygments in /usr/local/lib/python3.6/dist-packages (from ipython->ktrain) (2.1.3)
Requirement already satisfied: google-auth<2,>=1.6.3 in /usr/local/lib/python3.6/dist-packages (from tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0->ktrain) (1.17.2)
Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.6/dist-packages (from tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0->ktrain) (3.2.2)
Requirement already satisfied: werkzeug>=0.11.15 in /usr/local/lib/python3.6/dist-packages (from tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0->ktrain) (1.0.1)
Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /usr/local/lib/python3.6/dist-packages (from tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0->ktrain) (0.4.1)
Requirement already satisfied: h5py in /usr/local/lib/python3.6/dist-packages (from keras-applications>=1.0.8->tensorflow==2.1.0->ktrain) (2.10.0)
Collecting keras-pos-embd>=0.11.0
  Downloading https://files.pythonhosted.org/packages/09/70/b63ed8fc660da2bb6ae29b9895401c628da5740c048c190b5d7107cadd02/keras-pos-embd-0.11.0.tar.gz
Collecting keras-multi-head>=0.27.0
  Downloading https://files.pythonhosted.org/packages/e6/32/45adf2549450aca7867deccfa04af80a0ab1ca139af44b16bc669e0e09cd/keras-multi-head-0.27.0.tar.gz
Collecting keras-layer-normalization>=0.14.0
  Downloading https://files.pythonhosted.org/packages/a4/0e/d1078df0494bac9ce1a67954e5380b6e7569668f0f3b50a9531c62c1fc4a/keras-layer-normalization-0.14.0.tar.gz
Collecting keras-position-wise-feed-forward>=0.6.0
  Downloading https://files.pythonhosted.org/packages/e3/59/f0faa1037c033059e7e9e7758e6c23b4d1c0772cd48de14c4b6fd4033ad5/keras-position-wise-feed-forward-0.6.0.tar.gz
Collecting keras-embed-sim>=0.7.0
  Downloading https://files.pythonhosted.org/packages/bc/20/735fd53f6896e2af63af47e212601c1b8a7a80d00b6126c388c9d1233892/keras-embed-sim-0.7.0.tar.gz
Requirement already satisfied: MarkupSafe>=0.23 in /usr/local/lib/python3.6/dist-packages (from Jinja2>=2.7->bokeh->ktrain) (1.1.1)
Requirement already satisfied: googleapis-common-protos in /usr/local/lib/python3.6/dist-packages (from tensorflow-metadata->tensorflow_datasets->ktrain) (1.52.0)
Requirement already satisfied: click in /usr/local/lib/python3.6/dist-packages (from sacremoses->transformers>=2.11.0->ktrain) (7.1.2)
Requirement already satisfied: ptyprocess>=0.5 in /usr/local/lib/python3.6/dist-packages (from pexpect; sys_platform != "win32"->ipython->ktrain) (0.6.0)
Requirement already satisfied: wcwidth in /usr/local/lib/python3.6/dist-packages (from prompt-toolkit<2.0.0,>=1.0.4->ipython->ktrain) (0.2.4)
Requirement already satisfied: ipython-genutils in /usr/local/lib/python3.6/dist-packages (from traitlets>=4.2->ipython->ktrain) (0.2.0)
Requirement already satisfied: cachetools<5.0,>=2.0.0 in /usr/local/lib/python3.6/dist-packages (from google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0->ktrain) (4.1.0)
Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.6/dist-packages (from google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0->ktrain) (0.2.8)
Requirement already satisfied: rsa<5,>=3.1.4; python_version >= "3" in /usr/local/lib/python3.6/dist-packages (from google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0->ktrain) (4.6)
Requirement already satisfied: importlib-metadata; python_version < "3.8" in /usr/local/lib/python3.6/dist-packages (from markdown>=2.6.8->tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0->ktrain) (1.6.1)
Requirement already satisfied: requests-oauthlib>=0.7.0 in /usr/local/lib/python3.6/dist-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0->ktrain) (1.3.0)
Collecting keras-self-attention==0.46.0
  Downloading https://files.pythonhosted.org/packages/15/6b/c804924a056955fa1f3ff767945187103cfc851ba9bd0fc5a6c6bc18e2eb/keras-self-attention-0.46.0.tar.gz
Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /usr/local/lib/python3.6/dist-packages (from pyasn1-modules>=0.2.1->google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0->ktrain) (0.4.8)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.6/dist-packages (from importlib-metadata; python_version < "3.8"->markdown>=2.6.8->tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0->ktrain) (3.1.0)
Requirement already satisfied: oauthlib>=3.0.0 in /usr/local/lib/python3.6/dist-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.2.0,>=2.1.0->tensorflow==2.1.0->ktrain) (3.1.0)
Building wheels for collected packages: ktrain, keras-bert, langdetect, seqeval, syntok, gast, keras-transformer, sacremoses, keras-pos-embd, keras-multi-head, keras-layer-normalization, keras-position-wise-feed-forward, keras-embed-sim, keras-self-attention
  Building wheel for ktrain (setup.py) ... done
  Created wheel for ktrain: filename=ktrain-0.16.3-cp36-none-any.whl size=25246180 sha256=87647334c3fae83c22b2d9ca6302341ac072868f7e98c95bfee703a1ed816de1
  Stored in directory: /root/.cache/pip/wheels/2b/24/21/a52cc4dcb0b438cf570e9d44c017761d41335d9af4c189ebad
  Building wheel for keras-bert (setup.py) ... done
  Created wheel for keras-bert: filename=keras_bert-0.84.0-cp36-none-any.whl size=36139 sha256=486e8a0911c77558ef0a2948d846bd0a7c8caa512e1737890438ff1428fea64f
  Stored in directory: /root/.cache/pip/wheels/1f/59/04/12e95257aebfd27f7edaaf65ab7dd57b5f6cadfb183f1a4ccd
  Building wheel for langdetect (setup.py) ... done
  Created wheel for langdetect: filename=langdetect-1.0.8-cp36-none-any.whl size=993193 sha256=b1438f4300112c372f0b1f25fba841d7bb691cc0fe4ffccad34fabbaa094ac4e
  Stored in directory: /root/.cache/pip/wheels/8d/b3/aa/6d99de9f3841d7d3d40a60ea06e6d669e8e5012e6c8b947a57
  Building wheel for seqeval (setup.py) ... done
  Created wheel for seqeval: filename=seqeval-0.0.12-cp36-none-any.whl size=7424 sha256=0011a22a9cb81570958e02e8c75bb7921a1245744cd9f7c9466c0066028af235
  Stored in directory: /root/.cache/pip/wheels/4f/32/0a/df3b340a82583566975377d65e724895b3fad101a3fb729f68
  Building wheel for syntok (setup.py) ... done
  Created wheel for syntok: filename=syntok-1.3.1-cp36-none-any.whl size=20918 sha256=c9b802faee34c0958abcab53266c8d3447880710ff8629ba2edc78451f44e8a1
  Stored in directory: /root/.cache/pip/wheels/51/c6/a4/be1920586c49469846bcd2888200bdecfe109ec421dab9be2d
  Building wheel for gast (setup.py) ... done
  Created wheel for gast: filename=gast-0.2.2-cp36-none-any.whl size=7540 sha256=b359676afad7b05ddd3d256f0f3a3c7fa980125834e7ed1ab3970483812faff5
  Stored in directory: /root/.cache/pip/wheels/5c/2e/7e/a1d4d4fcebe6c381f378ce7743a3ced3699feb89bcfbdadadd
  Building wheel for keras-transformer (setup.py) ... done
  Created wheel for keras-transformer: filename=keras_transformer-0.37.0-cp36-none-any.whl size=12942 sha256=00147eca000057fd9c95bc770178666049d34bc10877dcc7327d53cb3eeeee41
  Stored in directory: /root/.cache/pip/wheels/2a/f9/31/2a3289e948852ce0dd3fcd94c34bbc7eb9628842cb7110a87b
  Building wheel for sacremoses (setup.py) ... done
  Created wheel for sacremoses: filename=sacremoses-0.0.43-cp36-none-any.whl size=893260 sha256=bef554a9fb8f31e28dfec5bb59361130a0a372696c745ca1d7054262e3632928
  Stored in directory: /root/.cache/pip/wheels/29/3c/fd/7ce5c3f0666dab31a50123635e6fb5e19ceb42ce38d4e58f45
  Building wheel for keras-pos-embd (setup.py) ... done
  Created wheel for keras-pos-embd: filename=keras_pos_embd-0.11.0-cp36-none-any.whl size=7554 sha256=68b30e41c554662b4e009a8d6636699595dc5ee1a0e1a97f779f8d8b836c8324
  Stored in directory: /root/.cache/pip/wheels/5b/a1/a0/ce6b1d49ba1a9a76f592e70cf297b05c96bc9f418146761032
  Building wheel for keras-multi-head (setup.py) ... done
  Created wheel for keras-multi-head: filename=keras_multi_head-0.27.0-cp36-none-any.whl size=15611 sha256=96c09d9d783c2ec3076c95603b9fcd62201f3ed6c063070391e81427abf60160
  Stored in directory: /root/.cache/pip/wheels/b5/b4/49/0a0c27dcb93c13af02fea254ff51d1a43a924dd4e5b7a7164d
  Building wheel for keras-layer-normalization (setup.py) ... done
  Created wheel for keras-layer-normalization: filename=keras_layer_normalization-0.14.0-cp36-none-any.whl size=5268 sha256=6c441851437f477ede7fefdc1b3d77f581c5ac12d67c2de8eb90a6b36dddd32c
  Stored in directory: /root/.cache/pip/wheels/54/80/22/a638a7d406fd155e507aa33d703e3fa2612b9eb7bb4f4fe667
  Building wheel for keras-position-wise-feed-forward (setup.py) ... done
  Created wheel for keras-position-wise-feed-forward: filename=keras_position_wise_feed_forward-0.6.0-cp36-none-any.whl size=5623 sha256=cf0f5d03abd28d31d000da0934c97be89fcfdca2543ad026cb0d64cd2356bd9b
  Stored in directory: /root/.cache/pip/wheels/39/e2/e2/3514fef126a00574b13bc0b9e23891800158df3a3c19c96e3b
  Building wheel for keras-embed-sim (setup.py) ... done
  Created wheel for keras-embed-sim: filename=keras_embed_sim-0.7.0-cp36-none-any.whl size=4676 sha256=249ebdeb0525e75c1a25adc7a39167cff0981d527c8226601e86512f1d0f2c9b
  Stored in directory: /root/.cache/pip/wheels/d1/bc/b1/b0c45cee4ca2e6c86586b0218ffafe7f0703c6d07fdf049866
  Building wheel for keras-self-attention (setup.py) ... done
  Created wheel for keras-self-attention: filename=keras_self_attention-0.46.0-cp36-none-any.whl size=17278 sha256=058cd5b376a4c74dc0b02c1d98a9014e2f90a3d7ce65fa8c6523b32466a4d4b7
  Stored in directory: /root/.cache/pip/wheels/d2/2e/80/fec4c05eb23c8e13b790e26d207d6e0ffe8013fad8c6bdd4d2
Successfully built ktrain keras-bert langdetect seqeval syntok gast keras-transformer sacremoses keras-pos-embd keras-multi-head keras-layer-normalization keras-position-wise-feed-forward keras-embed-sim keras-self-attention
ERROR: tensorflow-probability 0.10.0 has requirement gast>=0.3.2, but you'll have gast 0.2.2 which is incompatible.
Installing collected packages: gast, tensorboard, tensorflow-estimator, tensorflow, scikit-learn, keras-pos-embd, keras-self-attention, keras-multi-head, keras-layer-normalization, keras-position-wise-feed-forward, keras-embed-sim, keras-transformer, keras-bert, langdetect, cchardet, seqeval, tokenizers, sentencepiece, sacremoses, transformers, syntok, whoosh, ktrain
  Found existing installation: gast 0.3.3
    Uninstalling gast-0.3.3:
      Successfully uninstalled gast-0.3.3
  Found existing installation: tensorboard 2.2.2
    Uninstalling tensorboard-2.2.2:
      Successfully uninstalled tensorboard-2.2.2
  Found existing installation: tensorflow-estimator 2.2.0
    Uninstalling tensorflow-estimator-2.2.0:
      Successfully uninstalled tensorflow-estimator-2.2.0
  Found existing installation: tensorflow 2.2.0
    Uninstalling tensorflow-2.2.0:
      Successfully uninstalled tensorflow-2.2.0
  Found existing installation: scikit-learn 0.22.2.post1
    Uninstalling scikit-learn-0.22.2.post1:
      Successfully uninstalled scikit-learn-0.22.2.post1
Successfully installed cchardet-2.1.5 gast-0.2.2 keras-bert-0.84.0 keras-embed-sim-0.7.0 keras-layer-normalization-0.14.0 keras-multi-head-0.27.0 keras-pos-embd-0.11.0 keras-position-wise-feed-forward-0.6.0 keras-self-attention-0.46.0 keras-transformer-0.37.0 ktrain-0.16.3 langdetect-1.0.8 sacremoses-0.0.43 scikit-learn-0.21.3 sentencepiece-0.1.91 seqeval-0.0.12 syntok-1.3.1 tensorboard-2.1.1 tensorflow-2.1.0 tensorflow-estimator-2.1.0 tokenizers-0.7.0 transformers-2.11.0 whoosh-2.7.4
!git clone https://github.com/laxmimerit/IMDB-Movie-Reviews-Large-Dataset-50k.git
Cloning into 'IMDB-Movie-Reviews-Large-Dataset-50k'...
remote: Enumerating objects: 10, done.
remote: Counting objects: 100% (10/10), done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 10 (delta 1), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (10/10), done.
 
import pandas as pd
import numpy as np
import ktrain
from ktrain import text
import tensorflow as tf
data_test = pd.read_excel('/content/IMDB-Movie-Reviews-Large-Dataset-50k/test.xlsx', dtype= str)
data_train = pd.read_excel('/content/IMDB-Movie-Reviews-Large-Dataset-50k/train.xlsx', dtype = str)
data_train.sample(5)
Reviews Sentiment
16715 The sequel to the ever popular Cinderella stor... pos
11207 Excellent pirate entertainment! It has all the... pos
12609 The Underground Comedy movie is perhaps one of... neg
10685 My cable TV has what's called the Arts channel... pos
1633 This movie was terrible. Throughout the whole ... neg
text.print_text_classifiers()
fasttext: a fastText-like model [http://arxiv.org/pdf/1607.01759.pdf]
logreg: logistic regression using a trainable Embedding layer
nbsvm: NBSVM model [http://www.aclweb.org/anthology/P12-2018]
bigru: Bidirectional GRU with pretrained fasttext word vectors [https://fasttext.cc/docs/en/crawl-vectors.html]
standard_gru: simple 2-layer GRU with randomly initialized embeddings
bert: Bidirectional Encoder Representations from Transformers (BERT) [https://arxiv.org/abs/1810.04805]
distilbert: distilled, smaller, and faster BERT from Hugging Face [https://arxiv.org/abs/1910.01108]
(train, val, preproc) = text.texts_from_df(train_df=data_train, text_column='Reviews', label_columns='Sentiment',
                   val_df = data_test,
                   maxlen = 400,
                   preprocess_mode = 'distilbert')
preprocessing train...
language: en
train sequence lengths:
	mean : 234
	95percentile : 598
	99percentile : 913
Is Multi-Label? False
preprocessing test...
language: en
test sequence lengths:
	mean : 234
	95percentile : 598
	99percentile : 913
model = text.text_classifier(name = 'distilbert', train_data = train, preproc=preproc)
Is Multi-Label? False
maxlen is 400
done.
learner = ktrain.get_learner(model = model,
                             train_data = train,
                             val_data = val,
                             batch_size = 6)
learner.fit_onecycle(lr = 2e-5, epochs=2)

begin training using onecycle policy with max lr of 2e-05...
Train for 4167 steps, validate for 782 steps
Epoch 1/2
4167/4167 [==============================] - 3154s 757ms/step - loss: 0.2932 - accuracy: 0.8717 - val_loss: 0.1613 - val_accuracy: 0.9406
Epoch 2/2
4167/4167 [==============================] - 3131s 751ms/step - loss: 0.1552 - accuracy: 0.9440 - val_loss: 0.0623 - val_accuracy: 0.9836
<tensorflow.python.keras.callbacks.History at 0x7fe2fc067048>
predictor = ktrain.get_predictor(learner.model, preproc)
from google.colab import drive
drive.mount('/content/drive')
Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly

Enter your authorization code:
··········
Mounted at /content/drive
predictor.save('/content/drive/My Drive/distilbert')
data = ['this movie was really bad. acting was also bad. I will not watch again',
        'the movie was really great. I will see it again', 'another great movie. must watch to everyone']
predictor.predict(data)
['neg', 'pos', 'pos']
predictor.get_classes()
['neg', 'pos']
predictor.predict(data, return_proba=True)
array([[0.9944576 , 0.00554235],
       [0.00516187, 0.99483806],
       [0.00479033, 0.99520963]], dtype=float32)