Keras preprocessing layers make it easier to build end-to-end machine learning pipelines that handle raw text, numbers, categories, and images directly inside your model. This article walks through the available preprocessing layers, the adapt() method, and different strategies for placing preprocessing either inside the model or in the tf.data pipeline. You’ll also learn how to combine preprocessing with multi-worker training and export portable inference models, ensuring consistency, scalability, and better performance across environments.

Beginner’s Guide to Keras Preprocessing Layers

Content Overview

  • Keras preprocessing
  • Available preprocessing
  • Text preprocessing
  • Numerical features preprocessing
  • Categorical features preprocessing
  • Image preprocessing
  • Image data augmentation
  • The adapt() method
  • Preprocessing data before the model or inside the model
  • Benefits of doing preprocessing inside the model at inference time
  • Preprocessing during multi-worker training

Keras preprocessing

The Keras preprocessing layers API allows developers to build Keras-native input processing pipelines. These input processing pipelines can be used as independent preprocessing code in non-Keras workflows, combined directly with Keras models, and exported as part of a Keras SavedModel.

With Keras preprocessing layers, you can build and export models that are truly end-to-end: models that accept raw images or raw structured data as input; models that handle feature normalization or feature value indexing on their own.

Available preprocessing

Text preprocessing

  • tf.keras.layers.TextVectorization: turns raw strings into an encoded representation that can be read by an Embedding layer or Dense layer.

Numerical features preprocessing

  • tf.keras.layers.Normalization: performs feature-wise normalization of input features.
  • tf.keras.layers.Discretization: turns continuous numerical features into integer categorical features.
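
For instance, the Discretization layer buckets continuous values into integer bin indices. A minimal sketch, with bucket boundaries chosen purely for illustration:

import numpy as np
import tensorflow as tf

# Illustrative bucket boundaries: bins are (-inf, 0), [0, 1), [1, 2), [2, inf).
layer = tf.keras.layers.Discretization(bin_boundaries=[0.0, 1.0, 2.0])
data = np.array([[-0.5, 0.3], [1.4, 2.6]])
print(layer(data))  # integer bin indices: [[0 1] [2 3]]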

Categorical features preprocessing

  • tf.keras.layers.CategoryEncoding: turns integer categorical features into one-hot, multi-hot, or count dense representations.
  • tf.keras.layers.Hashing: performs categorical feature hashing, also known as the "hashing trick".
  • tf.keras.layers.StringLookup: turns string categorical values into an encoded representation that can be read by an Embedding layer or Dense layer.
  • tf.keras.layers.IntegerLookup: turns integer categorical values into an encoded representation that can be read by an Embedding layer or Dense layer.
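
A common pattern is to chain a lookup layer with CategoryEncoding so that raw string categories become one-hot vectors. A minimal sketch with an illustrative vocabulary (by default, out-of-vocabulary values map to index 0):

import tensorflow as tf

lookup = tf.keras.layers.StringLookup(vocabulary=["cat", "dog", "bird"])
onehot = tf.keras.layers.CategoryEncoding(
    num_tokens=lookup.vocabulary_size(), output_mode="one_hot"
)

data = tf.constant(["cat", "bird", "fish"])  # "fish" is out of vocabulary
print(onehot(lookup(data)))  # shape (3, 4): one OOV slot plus three vocabulary entries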

Image preprocessing

These layers are for standardizing the inputs of an image model.

  • tf.keras.layers.Resizing: resizes a batch of images to a target size.
  • tf.keras.layers.Rescaling: rescales and offsets the values of a batch of images (e.g., going from inputs in the [0, 255] range to inputs in the [0, 1] range).
  • tf.keras.layers.CenterCrop: returns a center crop of a batch of images.
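
A minimal sketch chaining these layers on a random batch of images (the sizes below are illustrative):

import tensorflow as tf

images = tf.random.uniform((8, 256, 256, 3), maxval=256)  # raw images in [0, 255]

resize = tf.keras.layers.Resizing(180, 180)
rescale = tf.keras.layers.Rescaling(1.0 / 255)  # map [0, 255] to [0, 1]
crop = tf.keras.layers.CenterCrop(150, 150)

out = crop(rescale(resize(images)))
print(out.shape)  # (8, 150, 150, 3)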

Image data augmentation

These layers apply random augmentation transforms to a batch of images. They are only active during training.

  • tf.keras.layers.RandomCrop
  • tf.keras.layers.RandomFlip
  • tf.keras.layers.RandomTranslation
  • tf.keras.layers.RandomRotation
  • tf.keras.layers.RandomZoom
  • tf.keras.layers.RandomContrast
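
Because these layers are only active during training, they transform inputs when called with training=True (or inside fit()) and pass them through unchanged at inference time. A minimal sketch, with illustrative augmentation factors:

import tensorflow as tf

augment = tf.keras.Sequential(
    [
        tf.keras.layers.RandomFlip("horizontal"),
        tf.keras.layers.RandomRotation(0.1),
        tf.keras.layers.RandomZoom(0.2),
    ]
)

images = tf.random.uniform((4, 64, 64, 3))     # assumed batch of images in [0, 1]
augmented = augment(images, training=True)     # random transforms applied
unchanged = augment(images, training=False)    # inputs returned as-is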

The adapt() method

Some preprocessing layers have an internal state that can be computed based on a sample of the training data. The list of stateful preprocessing layers is:

  • TextVectorization: holds a mapping between string tokens and integer indices
  • StringLookup and IntegerLookup: hold a mapping between input values and integer indices.
  • Normalization: holds the mean and standard deviation of the features.
  • Discretization: holds information about value bucket boundaries.

Crucially, these layers are non-trainable. Their state is not set during training; it must be set before training, either by initializing them from a precomputed constant, or by "adapting" them on data.

You set the state of a preprocessing layer by exposing it to training data, via the adapt() method:

import numpy as np
import tensorflow as tf
from tensorflow import keras
from keras import layers

data = np.array(
    [
        [0.1, 0.2, 0.3],
        [0.8, 0.9, 1.0],
        [1.5, 1.6, 1.7],
    ]
)
layer = layers.Normalization()
layer.adapt(data)
normalized_data = layer(data)

print("Features mean: %.2f" % (normalized_data.numpy().mean()))
print("Features std: %.2f" % (normalized_data.numpy().std()))

Features mean: -0.00
Features std: 1.00

The adapt() method takes either a NumPy array or a tf.data.Dataset object. In the case of StringLookup and TextVectorization, you can also pass a list of strings:

data = [
    "ξεῖν᾽, ἦ τοι μὲν ὄνειροι ἀμήχανοι ἀκριτόμυθοι",
    "γίγνοντ᾽, οὐδέ τι πάντα τελείεται ἀνθρώποισι.",
    "δοιαὶ γάρ τε πύλαι ἀμενηνῶν εἰσὶν ὀνείρων:",
    "αἱ μὲν γὰρ κεράεσσι τετεύχαται, αἱ δ᾽ ἐλέφαντι:",
    "τῶν οἳ μέν κ᾽ ἔλθωσι διὰ πριστοῦ ἐλέφαντος,",
    "οἵ ῥ᾽ ἐλεφαίρονται, ἔπε᾽ ἀκράαντα φέροντες:",
    "οἱ δὲ διὰ ξεστῶν κεράων ἔλθωσι θύραζε,",
    "οἵ ῥ᾽ ἔτυμα κραίνουσι, βροτῶν ὅτε κέν τις ἴδηται.",
]
layer = layers.TextVectorization()
layer.adapt(data)
vectorized_text = layer(data)
print(vectorized_text)

tf.Tensor(
[[37 12 25  5  9 20 21  0  0]
 [51 34 27 33 29 18  0  0  0]
 [49 52 30 31 19 46 10  0  0]
 [ 7  5 50 43 28  7 47 17  0]
 [24 35 39 40  3  6 32 16  0]
 [ 4  2 15 14 22 23  0  0  0]
 [36 48  6 38 42  3 45  0  0]
 [ 4  2 13 41 53  8 44 26 11]], shape=(8, 9), dtype=int64)

In addition, adaptable layers always expose an option to directly set state via constructor arguments or weight assignment. If the intended state values are known at layer construction time, or are calculated outside of the adapt() call, they can be set without relying on the layer's internal computation. For instance, if external vocabulary files for the TextVectorization, StringLookup, or IntegerLookup layers already exist, those can be loaded directly into the lookup tables by passing a path to the vocabulary file in the layer's constructor arguments.

Here's an example where you instantiate a StringLookup layer with precomputed vocabulary:

vocab = ["a", "b", "c", "d"]
data = tf.constant([["a", "c", "d"], ["d", "z", "b"]])
layer = layers.StringLookup(vocabulary=vocab)
vectorized_data = layer(data)
print(vectorized_data)

tf.Tensor(
[[1 3 4]
 [4 0 2]], shape=(2, 3), dtype=int64)

Preprocessing data before the model or inside the model

There are two ways you could be using preprocessing layers:

Option 1: Make them part of the model, like this:

inputs = keras.Input(shape=input_shape)
x = preprocessing_layer(inputs)
outputs = rest_of_the_model(x)
model = keras.Model(inputs, outputs)

With this option, preprocessing will happen on device, synchronously with the rest of the model execution, meaning that it will benefit from GPU acceleration. If you're training on a GPU, this is the best option for the Normalization layer, and for all image preprocessing and data augmentation layers.
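
For example, an image model can place Rescaling and augmentation layers right after its Input so that raw [0, 255] images are preprocessed on the GPU as part of the forward pass. A sketch with assumed input shape and layer sizes:

inputs = keras.Input(shape=(180, 180, 3))
x = layers.Rescaling(1.0 / 255)(inputs)
x = layers.RandomFlip("horizontal")(x)  # active only during training
x = layers.RandomRotation(0.1)(x)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(1)(x)
model = keras.Model(inputs, outputs)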

Option 2: Apply it to your tf.data.Dataset, so as to obtain a dataset that yields batches of preprocessed data, like this:

dataset = dataset.map(lambda x, y: (preprocessing_layer(x), y)) 

With this option, your preprocessing will happen on a CPU, asynchronously, and will be buffered before going into the model. In addition, if you call dataset.prefetch(tf.data.AUTOTUNE) on your dataset, the preprocessing will happen efficiently in parallel with training:

dataset = dataset.map(lambda x, y: (preprocessing_layer(x), y))
dataset = dataset.prefetch(tf.data.AUTOTUNE)
model.fit(dataset, ...)

This is the best option for TextVectorization, and all structured data preprocessing layers. It can also be a good option if you're training on a CPU and you use image preprocessing layers.

Note that the TextVectorization layer can only be executed on a CPU, as it is mostly a dictionary lookup operation. Therefore, if you are training your model on a GPU or a TPU, you should put the TextVectorization layer in the tf.data pipeline to get the best performance.
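
For example, when training a text model on a GPU, the TextVectorization step can live in the tf.data pipeline while the model itself consumes the resulting token ids. A minimal sketch, with a toy corpus and illustrative settings:

text_vectorizer = layers.TextVectorization(output_mode="int")
text_vectorizer.adapt(["a toy corpus", "used only to build the vocabulary"])

dataset = tf.data.Dataset.from_tensor_slices((["a toy corpus entry"], [1]))
dataset = dataset.batch(32)
dataset = dataset.map(lambda x, y: (text_vectorizer(x), y))
dataset = dataset.prefetch(tf.data.AUTOTUNE)
# The model then takes integer token ids as input, e.g. via an Embedding layer.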

When running on a TPU, you should always place preprocessing layers in the tf.data pipeline (with the exception of Normalization and Rescaling, which run fine on a TPU and are commonly used as the first layer in an image model).

Benefits of doing preprocessing inside the model at inference time

Even if you go with option 2, you may later want to export an inference-only end-to-end model that will include the preprocessing layers. The key benefit to doing this is that it makes your model portable and it helps reduce the training/serving skew.

When all data preprocessing is part of the model, other people can load and use your model without having to be aware of how each feature is expected to be encoded & normalized. Your inference model will be able to process raw images or raw structured data, and will not require users of the model to be aware of the details of e.g. the tokenization scheme used for text, the indexing scheme used for categorical features, whether image pixel values are normalized to [-1, +1] or to [0, 1], etc. This is especially powerful if you're exporting your model to another runtime, such as TensorFlow.js: you won't have to reimplement your preprocessing pipeline in JavaScript.

If you initially put your preprocessing layers in your tf.data pipeline, you can export an inference model that packages the preprocessing. Simply instantiate a new model that chains your preprocessing layers and your training model:

inputs = keras.Input(shape=input_shape)
x = preprocessing_layer(inputs)
outputs = training_model(x)
inference_model = keras.Model(inputs, outputs)
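
A more concrete sketch, assuming a text model that was trained on integer token ids produced by a TextVectorization step in the tf.data pipeline; the exported inference model accepts raw strings directly (the vocabulary, sequence length, and architecture below are illustrative):

text_vectorizer = layers.TextVectorization(output_sequence_length=10)
text_vectorizer.adapt(["the quick brown fox", "jumped over the lazy dog"])

# Training model operating on token ids.
training_model = keras.Sequential(
    [
        keras.Input(shape=(10,), dtype="int64"),
        layers.Embedding(text_vectorizer.vocabulary_size(), 16),
        layers.GlobalAveragePooling1D(),
        layers.Dense(1),
    ]
)

# Inference model that packages the preprocessing and accepts raw strings.
inputs = keras.Input(shape=(1,), dtype="string")
outputs = training_model(text_vectorizer(inputs))
inference_model = keras.Model(inputs, outputs)
print(inference_model(tf.constant([["the quick dog"]])))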

Preprocessing during multi-worker training

Preprocessing layers are compatible with the tf.distribute API for running training across multiple machines.

In general, preprocessing layers should be placed inside a tf.distribute.Strategy.scope() and called either inside or before the model as discussed above.

with strategy.scope():
    inputs = keras.Input(shape=input_shape)
    preprocessing_layer = tf.keras.layers.Hashing(10)
    dense_layer = tf.keras.layers.Dense(16)
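
A minimal self-contained sketch of this pattern, shown with MirroredStrategy for brevity (MultiWorkerMirroredStrategy follows the same structure); the feature values, shapes, and layer sizes below are illustrative:

import tensorflow as tf
from tensorflow import keras

strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    preprocessing_layer = tf.keras.layers.Hashing(10)
    inputs = keras.Input(shape=(1,), dtype="int64")  # receives already-hashed ids
    x = tf.keras.layers.CategoryEncoding(num_tokens=10)(inputs)
    outputs = tf.keras.layers.Dense(16)(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")

# The preprocessing layer runs in the tf.data pipeline, before the model.
features = tf.constant([["a"], ["b"], ["c"], ["d"]])
labels = tf.random.uniform((4, 16))
dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)
dataset = dataset.map(lambda x, y: (preprocessing_layer(x), y))
model.fit(dataset, epochs=1)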

For more details, refer to the Data preprocessing section of the Distributed input tutorial.

Originally published on the TensorFlow website, this article appears here under a new headline and is licensed under CC BY 4.0. Code samples shared under the Apache 2.0 License.
