This notebook shows how to reuse an nlp.networks.BertEncoder from the TensorFlow Model Garden to power three tasks: (1) pretraining with nlp.models.BertPretrainer (masked LM + next-sentence prediction), (2) span labeling with nlp.models.BertSpanLabeler (start/end logits for SQuAD-style QA), and (3) classification with nlp.models.BertClassifier ([CLS] head). You install tf-models-official (or tf-models-nightly for the latest changes), import tensorflow_models.nlp, build small dummy examples, run a forward pass for each model, and compute the losses (weighted sparse categorical cross-entropy for MLM/NSP; cross-entropy for span start/end; cross-entropy for classification). The result is a clear pattern for wrapping one encoder with multiple BERT task heads using concise, production-friendly APIs.

TensorFlow Models NLP Library for Beginners

2025/09/08 17:40

Content Overview

  • Learning objectives

  • Install and import

  • Install the TensorFlow Model Garden pip package

  • Import TensorFlow and other libraries

  • BERT pretraining model

  • Build a BertPretrainer model wrapping BertEncoder

  • Compute loss

  • Span labeling model

  • Build a BertSpanLabeler wrapping BertEncoder

  • Compute loss

  • Classification model

  • Build a BertClassifier model wrapping BertEncoder

  • Compute loss


Learning objectives

In this Colab notebook, you will learn how to build transformer-based models for common NLP tasks, including pretraining, span labeling, and classification, using the building blocks from the NLP modeling library.

Install and import

Install the TensorFlow Model Garden pip package

  • tf-models-official is the stable Model Garden package. Note that it may not include the latest changes from the tensorflow_models GitHub repository. To include the latest changes, you may install tf-models-nightly, the nightly Model Garden package, which is built automatically every day.
  • pip will install all models and dependencies automatically.


pip install tf-models-official 
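If you want the latest changes from the tensorflow_models GitHub repository instead, you can install the nightly package mentioned above (only one of the two packages should be installed in the same environment):

pip install tf-models-nightly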

Import TensorFlow and other libraries

import numpy as np
import tensorflow as tf

from tensorflow_models import nlp


BERT pretraining model

BERT (Pre-training of Deep Bidirectional Transformers for Language Understanding) introduced the method of pre-training language representations on a large text corpus and then using that model for downstream NLP tasks.

In this section, we will learn how to build a model to pretrain BERT on the masked language modeling task and next sentence prediction task. For simplicity, we only show the minimum example and use dummy data.

Build a BertPretrainer model wrapping BertEncoder

The nlp.networks.BertEncoder class implements the Transformer-based encoder as described in the BERT paper. It includes the embedding lookups and transformer layers (nlp.layers.TransformerEncoderBlock), but not the masked language model or classification task networks.

The nlp.models.BertPretrainer class allows a user to pass in a transformer stack, and instantiates the masked language model and classification networks that are used to create the training objectives.


# Build a small transformer network.
vocab_size = 100
network = nlp.networks.BertEncoder(
    vocab_size=vocab_size,
    # The number of TransformerEncoderBlock layers
    num_layers=3)


Inspecting the encoder, we see that it contains a few embedding layers and a stack of nlp.layers.TransformerEncoderBlock layers, connected to three input layers: input_word_ids, input_type_ids, and input_mask.


tf.keras.utils.plot_model(network, show_shapes=True, expand_nested=True, dpi=48) 


# Create a BERT pretrainer with the created network.
num_token_predictions = 8
bert_pretrainer = nlp.models.BertPretrainer(
    network, num_classes=2, num_token_predictions=num_token_predictions,
    output='predictions')


WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.9/site-packages/official/nlp/modeling/models/bert_pretrainer.py:112: Classification.__init__ (from official.nlp.modeling.networks.classification) is deprecated and will be removed in a future version. Instructions for updating: Classification as a network is deprecated. Please use the layers.ClassificationHead instead. 

Inspecting the bert_pretrainer, we see it wraps the encoder with additional MaskedLM and nlp.layers.ClassificationHead heads.


tf.keras.utils.plot_model(bert_pretrainer, show_shapes=True, expand_nested=True, dpi=48) 


# We can feed some dummy data to get masked language model and sentence output.
sequence_length = 16
batch_size = 2

word_id_data = np.random.randint(vocab_size, size=(batch_size, sequence_length))
mask_data = np.random.randint(2, size=(batch_size, sequence_length))
type_id_data = np.random.randint(2, size=(batch_size, sequence_length))
masked_lm_positions_data = np.random.randint(2, size=(batch_size, num_token_predictions))

outputs = bert_pretrainer(
    [word_id_data, mask_data, type_id_data, masked_lm_positions_data])
lm_output = outputs["masked_lm"]
sentence_output = outputs["classification"]
print(f'lm_output: shape={lm_output.shape}, dtype={lm_output.dtype!r}')
print(f'sentence_output: shape={sentence_output.shape}, dtype={sentence_output.dtype!r}')


lm_output: shape=(2, 8, 100), dtype=tf.float32
sentence_output: shape=(2, 2), dtype=tf.float32

Compute loss

Next, we can use lm_output and sentence_output to compute loss.


masked_lm_ids_data = np.random.randint(vocab_size, size=(batch_size, num_token_predictions))
masked_lm_weights_data = np.random.randint(2, size=(batch_size, num_token_predictions))
next_sentence_labels_data = np.random.randint(2, size=(batch_size))

mlm_loss = nlp.losses.weighted_sparse_categorical_crossentropy_loss(
    labels=masked_lm_ids_data,
    predictions=lm_output,
    weights=masked_lm_weights_data)
sentence_loss = nlp.losses.weighted_sparse_categorical_crossentropy_loss(
    labels=next_sentence_labels_data,
    predictions=sentence_output)
loss = mlm_loss + sentence_loss

print(loss)


tf.Tensor(5.2983174, shape=(), dtype=float32) 

With the loss, you can optimize the model. After training, we can save the weights of TransformerEncoder for the downstream fine-tuning tasks. Please see run_pretraining.py for the full example.
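The saving step itself is not shown in this notebook; a minimal sketch of one way to do it with tf.train.Checkpoint is below (the 'ckpt/encoder' prefix is an arbitrary choice, not part of the original example):

# Save only the encoder weights so a downstream task model can reuse them.
checkpoint = tf.train.Checkpoint(encoder=network)
saved_path = checkpoint.save('ckpt/encoder')
print(saved_path)  # a checkpoint prefix such as 'ckpt/encoder-1'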

Span labeling model

Span labeling is the task of assigning labels to a span of text, for example, labeling a span of text as the answer to a given question.

In this section, we will learn how to build a span labeling model. Again, we use dummy data for simplicity.

Build a BertSpanLabeler wrapping BertEncoder

The nlp.models.BertSpanLabeler class implements a simple single-span start-end predictor (that is, a model that predicts two values: a start token index and an end token index), suitable for SQuAD-style tasks.

Note that nlp.models.BertSpanLabeler wraps a nlp.networks.BertEncoder, the weights of which can be restored from the above pretraining model.


network = nlp.networks.BertEncoder(
    vocab_size=vocab_size, num_layers=2)

# Create a BERT trainer with the created network.
bert_span_labeler = nlp.models.BertSpanLabeler(network)
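If the pretrained encoder's weights were saved with tf.train.Checkpoint as sketched earlier, they could be restored into this new encoder, assuming it is built with the same configuration as the pretrained one (the 'ckpt' directory below is hypothetical):

# Restore previously saved encoder weights into the freshly built encoder.
checkpoint = tf.train.Checkpoint(encoder=network)
checkpoint.restore(tf.train.latest_checkpoint('ckpt')).expect_partial()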

Inspecting the bert_span_labeler, we see it wraps the encoder with an additional SpanLabeling head that outputs start_position and end_position.


tf.keras.utils.plot_model(bert_span_labeler, show_shapes=True, expand_nested=True, dpi=48) 


# Create a set of 2-dimensional data tensors to feed into the model.
word_id_data = np.random.randint(vocab_size, size=(batch_size, sequence_length))
mask_data = np.random.randint(2, size=(batch_size, sequence_length))
type_id_data = np.random.randint(2, size=(batch_size, sequence_length))

# Feed the data to the model.
start_logits, end_logits = bert_span_labeler([word_id_data, mask_data, type_id_data])

print(f'start_logits: shape={start_logits.shape}, dtype={start_logits.dtype!r}')
print(f'end_logits: shape={end_logits.shape}, dtype={end_logits.dtype!r}')


start_logits: shape=(2, 16), dtype=tf.float32
end_logits: shape=(2, 16), dtype=tf.float32

Compute loss

With start_logits and end_logits, we can compute loss:


start_positions = np.random.randint(sequence_length, size=(batch_size))
end_positions = np.random.randint(sequence_length, size=(batch_size))

start_loss = tf.keras.losses.sparse_categorical_crossentropy(
    start_positions, start_logits, from_logits=True)
end_loss = tf.keras.losses.sparse_categorical_crossentropy(
    end_positions, end_logits, from_logits=True)

total_loss = (tf.reduce_mean(start_loss) + tf.reduce_mean(end_loss)) / 2
print(total_loss)


tf.Tensor(5.3621416, shape=(), dtype=float32) 

With the loss, you can optimize the model. Please see run_squad.py for the full example.
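At inference time, the predicted span is typically recovered from the two logit vectors; a minimal sketch using tf.argmax (an illustration, not part of the original notebook, which ignores the constraint that the end index should not precede the start index):

# Most likely start and end token indices for each example, each of shape (batch_size,).
predicted_starts = tf.argmax(start_logits, axis=-1)
predicted_ends = tf.argmax(end_logits, axis=-1)
print(predicted_starts, predicted_ends)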

Classification model

In the last section, we show how to build a text classification model.

Build a BertClassifier model wrapping BertEncoder

nlp.models.BertClassifier implements a [CLS] token classification model containing a single classification head.


network = nlp.networks.BertEncoder(
    vocab_size=vocab_size, num_layers=2)

# Create a BERT trainer with the created network.
num_classes = 2
bert_classifier = nlp.models.BertClassifier(
    network, num_classes=num_classes)

Inspecting the bert_classifier, we see it wraps the encoder with an additional Classification head.


tf.keras.utils.plot_model(bert_classifier, show_shapes=True, expand_nested=True, dpi=48) 


# Create a set of 2-dimensional data tensors to feed into the model.
word_id_data = np.random.randint(vocab_size, size=(batch_size, sequence_length))
mask_data = np.random.randint(2, size=(batch_size, sequence_length))
type_id_data = np.random.randint(2, size=(batch_size, sequence_length))

# Feed the data to the model.
logits = bert_classifier([word_id_data, mask_data, type_id_data])
print(f'logits: shape={logits.shape}, dtype={logits.dtype!r}')


logits: shape=(2, 2), dtype=tf.float32 
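Before computing the loss, you could also turn the raw logits into class probabilities or hard predictions; a small sketch using standard TensorFlow ops (not part of the original notebook):

# Class probabilities, shape (batch_size, num_classes), and predicted class ids, shape (batch_size,).
probabilities = tf.nn.softmax(logits, axis=-1)
predicted_classes = tf.argmax(logits, axis=-1)
print(predicted_classes)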

Compute loss

With logits, we can compute loss:


labels = np.random.randint(num_classes, size=(batch_size))

loss = tf.keras.losses.sparse_categorical_crossentropy(
    labels, logits, from_logits=True)
print(loss)


tf.Tensor([0.7332015 1.3447659], shape=(2,), dtype=float32) 

With the loss, you can optimize the model. Please see the Fine-tune BERT notebook or the model training documentation for the full example.
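As a rough illustration of what optimizing the model could look like, here is a minimal single training step with a Keras optimizer and tf.GradientTape, reusing the dummy data above (a simplified sketch, not the training loop from the referenced notebook; the learning rate is an arbitrary choice):

# One gradient update on the dummy batch.
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5)

with tf.GradientTape() as tape:
    logits = bert_classifier([word_id_data, mask_data, type_id_data], training=True)
    per_example_loss = tf.keras.losses.sparse_categorical_crossentropy(
        labels, logits, from_logits=True)
    batch_loss = tf.reduce_mean(per_example_loss)

# Update all trainable variables of the classifier, including the wrapped encoder.
grads = tape.gradient(batch_loss, bert_classifier.trainable_variables)
optimizer.apply_gradients(zip(grads, bert_classifier.trainable_variables))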


:::info Originally published on the TensorFlow website, this article appears here under a new headline and is licensed under CC BY 4.0. Code samples shared under the Apache 2.0 License.

:::


