Switching from eager to graph execution with tf.function can make TensorFlow faster and more portable. This guide shows how AutoGraph converts Python control flow, how to avoid retracing, and when Grappler/XLA deliver real speedups.

Try This if Your TensorFlow Code Is Slow

2025/10/15 02:59

Content Overview

  • Overview
  • Setup
  • Taking advantage of graphs
  • Using tf.function
  • Seeing the speed-up
  • When is a tf.function tracing?
  • Next steps


Overview

This guide goes beneath the surface of TensorFlow and Keras to demonstrate how TensorFlow works. If you instead want to immediately get started with Keras, check out the collection of Keras guides.

In this guide, you'll learn how TensorFlow allows you to make simple changes to your code to get graphs, how graphs are stored and represented, and how you can use them to accelerate your models.


:::tip Note: For those of you who are only familiar with TensorFlow 1.x, this guide demonstrates a very different view of graphs.

:::

This is a big-picture overview that covers how tf.function allows you to switch from eager execution to graph execution. For a more complete specification of tf.function, go to the Better performance with tf.function guide.

What are graphs?

In the previous three guides, you ran TensorFlow eagerly. This means TensorFlow operations are executed by Python, operation by operation, and return results back to Python.

While eager execution has several unique advantages, graph execution enables portability outside Python and tends to offer better performance. Graph execution means that tensor computations are executed as a TensorFlow graph, sometimes referred to as a tf.Graph or simply a "graph."

Graphs are data structures that contain a set of tf.Operation objects, which represent units of computation; and tf.Tensor objects, which represent the units of data that flow between operations. They are defined in a tf.Graph context. Since these graphs are data structures, they can be saved, run, and restored all without the original Python code.

This is what a TensorFlow graph representing a two-layer neural network looks like when visualized in TensorBoard:

(Image: TensorBoard graph of a simple two-layer neural network.)

The benefits of graphs

With a graph, you have a great deal of flexibility. You can use your TensorFlow graph in environments that don't have a Python interpreter, like mobile applications, embedded devices, and backend servers. TensorFlow uses graphs as the format for saved models when it exports them from Python.
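
As a minimal, hedged sketch of that portability (the tf.Module subclass and the /tmp/adder path below are illustrative, not from the original guide), a traced function can be exported as a SavedModel and loaded back without its defining Python code:

import tensorflow as tf

class Adder(tf.Module):
  # `input_signature` pins the accepted input type so a graph is traced on save.
  @tf.function(input_signature=[tf.TensorSpec(shape=None, dtype=tf.float32)])
  def add_one(self, x):
    return x + 1.0

# Export the traced graph; the target directory is just an example.
tf.saved_model.save(Adder(), "/tmp/adder")

# The loaded object runs the stored graph even if the original class is gone.
restored = tf.saved_model.load("/tmp/adder")
print(restored.add_one(tf.constant([1.0, 2.0])))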

Graphs are also easily optimized, allowing the compiler to do transformations like:

  • Statically infer the value of tensors by folding constant nodes in your computation ("constant folding").
  • Separate sub-parts of a computation that are independent and split them between threads or devices.
  • Simplify arithmetic operations by eliminating common subexpressions.

There is an entire optimization system, Grappler, to perform this and other speedups.
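
If you want to see or steer these rewrites, TensorFlow exposes Grappler toggles through tf.config.optimizer. A small sketch (the specific option names shown are assumptions based on that API; an empty dict simply means the defaults are in effect):

import tensorflow as tf

# Options you have explicitly overridden (empty means Grappler defaults apply).
print(tf.config.optimizer.get_experimental_options())

# Explicitly enable a couple of the rewrites described above.
tf.config.optimizer.set_experimental_options(
    {"constant_folding": True, "arithmetic_optimization": True})
print(tf.config.optimizer.get_experimental_options())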

In short, graphs are extremely useful and let your TensorFlow run fast, run in parallel, and run efficiently on multiple devices.

However, you still want to define your machine learning models (or other computations) in Python for convenience, and then automatically construct graphs when you need them.

Setup

Import some necessary libraries:

import tensorflow as tf
import timeit
from datetime import datetime


Taking advantage of graphs

You create and run a graph in TensorFlow by using tf.function, either as a direct call or as a decorator. tf.function takes a regular function as input and returns a tf.types.experimental.PolymorphicFunction. A PolymorphicFunction is a Python callable that builds TensorFlow graphs from the Python function. You use a tf.function in the same way as its Python equivalent.

# Define a Python function.
def a_regular_function(x, y, b):
  x = tf.matmul(x, y)
  x = x + b
  return x

# The Python type of `a_function_that_uses_a_graph` will now be a
# `PolymorphicFunction`.
a_function_that_uses_a_graph = tf.function(a_regular_function)

# Make some tensors.
x1 = tf.constant([[1.0, 2.0]])
y1 = tf.constant([[2.0], [3.0]])
b1 = tf.constant(4.0)

orig_value = a_regular_function(x1, y1, b1).numpy()
# Call a `tf.function` like a Python function.
tf_function_value = a_function_that_uses_a_graph(x1, y1, b1).numpy()
assert(orig_value == tf_function_value)


On the outside, a tf.function looks like a regular function you write using TensorFlow operations. Underneath, however, it is very different. The underlying PolymorphicFunction encapsulates several tf.Graphs behind one API (learn more in the Polymorphism section). That is how a tf.function is able to give you the benefits of graph execution, like speed and deployability (refer to The benefits of graphs above).

tf.function applies to a function and all other functions it calls:

def inner_function(x, y, b):
  x = tf.matmul(x, y)
  x = x + b
  return x

# Using the `tf.function` decorator makes `outer_function` into a
# `PolymorphicFunction`.
@tf.function
def outer_function(x):
  y = tf.constant([[2.0], [3.0]])
  b = tf.constant(4.0)

  return inner_function(x, y, b)

# Note that the callable will create a graph that
# includes `inner_function` as well as `outer_function`.
outer_function(tf.constant([[1.0, 2.0]])).numpy()


array([[12.]], dtype=float32) 

If you have used TensorFlow 1.x, you will notice that at no time did you need to define a Placeholder or tf.Session.

Converting Python functions to graphs

Any function you write with TensorFlow will contain a mixture of built-in TF operations and Python logic, such as if-then clauses, loops, break, return, continue, and more. While TensorFlow operations are easily captured by a tf.Graph, Python-specific logic needs to undergo an extra step in order to become part of the graph. tf.function uses a library called AutoGraph (tf.autograph) to convert Python code into graph-generating code.

def simple_relu(x):
  if tf.greater(x, 0):
    return x
  else:
    return 0

# Using `tf.function` makes `tf_simple_relu` a `PolymorphicFunction` that wraps
# `simple_relu`.
tf_simple_relu = tf.function(simple_relu)

print("First branch, with graph:", tf_simple_relu(tf.constant(1)).numpy())
print("Second branch, with graph:", tf_simple_relu(tf.constant(-1)).numpy())

First branch, with graph: 1
Second branch, with graph: 0

Though it is unlikely that you will need to view graphs directly, you can inspect the outputs to check the exact results. These are not easy to read, so no need to look too carefully!

# This is the graph-generating output of AutoGraph.
print(tf.autograph.to_code(simple_relu))

def tf__simple_relu(x):
    with ag__.FunctionScope('simple_relu', 'fscope', ag__.ConversionOptions(recursive=True, user_requested=True, optional_features=(), internal_convert_user_code=True)) as fscope:
        do_return = False
        retval_ = ag__.UndefinedReturnValue()

        def get_state():
            return (do_return, retval_)

        def set_state(vars_):
            nonlocal do_return, retval_
            (do_return, retval_) = vars_

        def if_body():
            nonlocal do_return, retval_
            try:
                do_return = True
                retval_ = ag__.ld(x)
            except:
                do_return = False
                raise

        def else_body():
            nonlocal do_return, retval_
            try:
                do_return = True
                retval_ = 0
            except:
                do_return = False
                raise
        ag__.if_stmt(ag__.converted_call(ag__.ld(tf).greater, (ag__.ld(x), 0), None, fscope), if_body, else_body, get_state, set_state, ('do_return', 'retval_'), 2)
        return fscope.ret(retval_, do_return)

# This is the graph itself.
print(tf_simple_relu.get_concrete_function(tf.constant(1)).graph.as_graph_def())


node {   name: "x"   op: "Placeholder"   attr {     key: "_user_specified_name"     value {       s: "x"     }   }   attr {     key: "dtype"     value {       type: DT_INT32     }   }   attr {     key: "shape"     value {       shape {       }     }   } } node {   name: "Greater/y"   op: "Const"   attr {     key: "dtype"     value {       type: DT_INT32     }   }   attr {     key: "value"     value {       tensor {         dtype: DT_INT32         tensor_shape {         }         int_val: 0       }     }   } } node {   name: "Greater"   op: "Greater"   input: "x"   input: "Greater/y"   attr {     key: "T"     value {       type: DT_INT32     }   } } node {   name: "cond"   op: "StatelessIf"   input: "Greater"   input: "x"   attr {     key: "Tcond"     value {       type: DT_BOOL     }   }   attr {     key: "Tin"     value {       list {         type: DT_INT32       }     }   }   attr {     key: "Tout"     value {       list {         type: DT_BOOL         type: DT_INT32       }     }   }   attr {     key: "_lower_using_switch_merge"     value {       b: true     }   }   attr {     key: "_read_only_resource_inputs"     value {       list {       }     }   }   attr {     key: "else_branch"     value {       func {         name: "cond_false_31"       }     }   }   attr {     key: "output_shapes"     value {       list {         shape {         }         shape {         }       }     }   }   attr {     key: "then_branch"     value {       func {         name: "cond_true_30"       }     }   } } node {   name: "cond/Identity"   op: "Identity"   input: "cond"   attr {     key: "T"     value {       type: DT_BOOL     }   } } node {   name: "cond/Identity_1"   op: "Identity"   input: "cond:1"   attr {     key: "T"     value {       type: DT_INT32     }   } } node {   name: "Identity"   op: "Identity"   input: "cond/Identity_1"   attr {     key: "T"     value {       type: DT_INT32     }   } } library {   function {     signature {       name: "cond_false_31"       input_arg {         name: "cond_placeholder"         type: DT_INT32       }       output_arg {         name: "cond_identity"         type: DT_BOOL       }       output_arg {         name: "cond_identity_1"         type: DT_INT32       }     }     node_def {       name: "cond/Const"       op: "Const"       attr {         key: "dtype"         value {           type: DT_BOOL         }       }       attr {         key: "value"         value {           tensor {             dtype: DT_BOOL             tensor_shape {             }             bool_val: true           }         }       }     }     node_def {       name: "cond/Const_1"       op: "Const"       attr {         key: "dtype"         value {           type: DT_BOOL         }       }       attr {         key: "value"         value {           tensor {             dtype: DT_BOOL             tensor_shape {             }             bool_val: true           }         }       }     }     node_def {       name: "cond/Const_2"       op: "Const"       attr {         key: "dtype"         value {           type: DT_INT32         }       }       attr {         key: "value"         value {           tensor {             dtype: DT_INT32             tensor_shape {             }             int_val: 0           }         }       }     }     node_def {       name: "cond/Const_3"       op: "Const"       attr {         key: "dtype"         value {           type: DT_BOOL         }       }       attr {         key: "value"         value {           tensor {             dtype: DT_BOOL             
tensor_shape {             }             bool_val: true           }         }       }     }     node_def {       name: "cond/Identity"       op: "Identity"       input: "cond/Const_3:output:0"       attr {         key: "T"         value {           type: DT_BOOL         }       }     }     node_def {       name: "cond/Const_4"       op: "Const"       attr {         key: "dtype"         value {           type: DT_INT32         }       }       attr {         key: "value"         value {           tensor {             dtype: DT_INT32             tensor_shape {             }             int_val: 0           }         }       }     }     node_def {       name: "cond/Identity_1"       op: "Identity"       input: "cond/Const_4:output:0"       attr {         key: "T"         value {           type: DT_INT32         }       }     }     ret {       key: "cond_identity"       value: "cond/Identity:output:0"     }     ret {       key: "cond_identity_1"       value: "cond/Identity_1:output:0"     }     attr {       key: "_construction_context"       value {         s: "kEagerRuntime"       }     }     arg_attr {       key: 0       value {         attr {           key: "_output_shapes"           value {             list {               shape {               }             }           }         }       }     }   }   function {     signature {       name: "cond_true_30"       input_arg {         name: "cond_identity_1_x"         type: DT_INT32       }       output_arg {         name: "cond_identity"         type: DT_BOOL       }       output_arg {         name: "cond_identity_1"         type: DT_INT32       }     }     node_def {       name: "cond/Const"       op: "Const"       attr {         key: "dtype"         value {           type: DT_BOOL         }       }       attr {         key: "value"         value {           tensor {             dtype: DT_BOOL             tensor_shape {             }             bool_val: true           }         }       }     }     node_def {       name: "cond/Identity"       op: "Identity"       input: "cond/Const:output:0"       attr {         key: "T"         value {           type: DT_BOOL         }       }     }     node_def {       name: "cond/Identity_1"       op: "Identity"       input: "cond_identity_1_x"       attr {         key: "T"         value {           type: DT_INT32         }       }     }     ret {       key: "cond_identity"       value: "cond/Identity:output:0"     }     ret {       key: "cond_identity_1"       value: "cond/Identity_1:output:0"     }     attr {       key: "_construction_context"       value {         s: "kEagerRuntime"       }     }     arg_attr {       key: 0       value {         attr {           key: "_output_shapes"           value {             list {               shape {               }             }           }         }         attr {           key: "_user_specified_name"           value {             s: "x"           }         }       }     }   } } versions {   producer: 1882   min_consumer: 12 } 

Most of the time, tf.function will work without special considerations. However, there are some caveats, and the tf.function guide can help here, as well as the complete AutoGraph reference.

Polymorphism: one tf.function, many graphs

A tf.Graph is specialized to a specific type of inputs (for example, tensors with a specific dtype or objects with the same id()).

Each time you invoke a tf.function with a set of arguments that can't be handled by any of its existing graphs (such as arguments with new dtypes or incompatible shapes), it creates a new tf.Graph specialized to those new arguments. The type specification of a tf.Graph's inputs is represented by tf.types.experimental.FunctionType, also referred to as the signature. For more information regarding when a new tf.Graph is generated, how that can be controlled, and how FunctionType can be useful, go to the Rules of tracing section of the Better performance with tf.function guide.

The tf.function stores the tf.Graph corresponding to that signature in a ConcreteFunction. A ConcreteFunction can be thought of as a wrapper around a tf.Graph.

@tf.function
def my_relu(x):
  return tf.maximum(0., x)

# `my_relu` creates new graphs as it observes different input types.
print(my_relu(tf.constant(5.5)))
print(my_relu([1, -1]))
print(my_relu(tf.constant([3., -3.])))

tf.Tensor(5.5, shape=(), dtype=float32)
tf.Tensor([1. 0.], shape=(2,), dtype=float32)
tf.Tensor([3. 0.], shape=(2,), dtype=float32)

If the tf.function has already been called with the same input types, it does not create a new tf.Graph.

# These two calls do *not* create new graphs.
print(my_relu(tf.constant(-2.5))) # Input type matches `tf.constant(5.5)`.
print(my_relu(tf.constant([-1., 1.]))) # Input type matches `tf.constant([3., -3.])`.

tf.Tensor(0.0, shape=(), dtype=float32)
tf.Tensor([0. 1.], shape=(2,), dtype=float32)

Because it's backed by multiple graphs, a tf.function is (as the name "PolymorphicFunction" suggests) polymorphic. That enables it to support more input types than a single tf.Graph could represent, and to optimize each tf.Graph for better performance.
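
If you need a handle on one particular graph, you can ask the PolymorphicFunction for the matching ConcreteFunction. A minimal sketch (the TensorSpec below is an illustrative choice, not part of the original example):

# Fetch the ConcreteFunction that handles rank-1 float32 inputs of shape (2,).
concrete_relu = my_relu.get_concrete_function(
    tf.TensorSpec(shape=(2,), dtype=tf.float32))

# A ConcreteFunction is callable and wraps exactly one tf.Graph.
print(concrete_relu(tf.constant([-2.0, 2.0])))
print(len(concrete_relu.graph.as_graph_def().node), "nodes in this graph")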

# There are three `ConcreteFunction`s (one for each graph) in `my_relu`.
# The `ConcreteFunction` also knows the return type and shape!
print(my_relu.pretty_printed_concrete_signatures())

Input Parameters:
  x (POSITIONAL_OR_KEYWORD): TensorSpec(shape=(), dtype=tf.float32, name=None)
Output Type:
  TensorSpec(shape=(), dtype=tf.float32, name=None)
Captures:
  None

Input Parameters:
  x (POSITIONAL_OR_KEYWORD): List[Literal[1], Literal[-1]]
Output Type:
  TensorSpec(shape=(2,), dtype=tf.float32, name=None)
Captures:
  None

Input Parameters:
  x (POSITIONAL_OR_KEYWORD): TensorSpec(shape=(2,), dtype=tf.float32, name=None)
Output Type:
  TensorSpec(shape=(2,), dtype=tf.float32, name=None)
Captures:
  None

Using tf.function

So far, you've learned how to convert a Python function into a graph simply by using tf.function as a decorator or wrapper. But in practice, getting tf.function to work correctly can be tricky! In the following sections, you'll learn how you can make your code work as expected with tf.function.

Graph execution vs. eager execution

The code in a tf.function can be executed both eagerly and as a graph. By default, tf.function executes its code as a graph:

@tf.function
def get_MSE(y_true, y_pred):
  sq_diff = tf.pow(y_true - y_pred, 2)
  return tf.reduce_mean(sq_diff)

y_true = tf.random.uniform([5], maxval=10, dtype=tf.int32)
y_pred = tf.random.uniform([5], maxval=10, dtype=tf.int32)
print(y_true)
print(y_pred)

tf.Tensor([2 0 7 2 3], shape=(5,), dtype=int32)
tf.Tensor([9 9 1 1 5], shape=(5,), dtype=int32)


get_MSE(y_true, y_pred) 


<tf.Tensor: shape=(), dtype=int32, numpy=34> 

To verify that your tf.function's graph is doing the same computation as its equivalent Python function, you can make it execute eagerly with tf.config.run_functions_eagerly(True). This switch turns off tf.function's ability to create and run graphs and instead executes the code normally.


tf.config.run_functions_eagerly(True) 


get_MSE(y_true, y_pred) 


<tf.Tensor: shape=(), dtype=int32, numpy=34> 

# Don't forget to set it back when you are done.
tf.config.run_functions_eagerly(False)

However, tf.function can behave differently under graph and eager execution. The Python print function is one example of how these two modes differ. Let's check out what happens when you insert a print statement into your function and call it repeatedly.

@tf.function
def get_MSE(y_true, y_pred):
  print("Calculating MSE!")
  sq_diff = tf.pow(y_true - y_pred, 2)
  return tf.reduce_mean(sq_diff)

Observe what is printed:

error = get_MSE(y_true, y_pred)
error = get_MSE(y_true, y_pred)
error = get_MSE(y_true, y_pred)


Calculating MSE! 

Is the output surprising? get_MSE only printed once even though it was called three times.

To explain, the print statement is executed when tf.function runs the original code in order to create the graph, in a process known as "tracing" (refer to the Tracing section of the tf.function guide). Tracing captures the TensorFlow operations into a graph, and print is not captured in the graph. That graph is then executed for all three calls without ever running the Python code again.

As a sanity check, let's turn off graph execution to compare:

# Now, globally set everything to run eagerly to force eager execution.
tf.config.run_functions_eagerly(True)

# Observe what is printed below.
error = get_MSE(y_true, y_pred)
error = get_MSE(y_true, y_pred)
error = get_MSE(y_true, y_pred)

Calculating MSE!
Calculating MSE!
Calculating MSE!


tf.config.run_functions_eagerly(False) 

print is a Python side effect, and there are other differences that you should be aware of when converting a function into a tf.function. Learn more in the Limitations section of the Better performance with tf.function guide.


:::tip Note: If you would like to print values in both eager and graph execution, use tf.print instead.

:::
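
As a quick, hedged illustration of that note (the function below is a made-up example, not from the original guide), tf.print is captured in the graph while Python's print only fires during tracing:

@tf.function
def show_both(x):
  print("Python print: runs only while tracing")    # Python side effect.
  tf.print("tf.print: runs on every call, x =", x)  # Captured in the graph.
  return x + 1

show_both(tf.constant(1))  # Both lines appear (this call traces).
show_both(tf.constant(2))  # Only the tf.print line appears.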

Non-strict execution

Graph execution only executes the operations necessary to produce the observable effects, which include:

  • The return value of the function
  • Documented well-known side-effects such as:
    • Input/output operations, like tf.print
    • Debugging operations, such as the assert functions in tf.debugging
    • Mutations of tf.Variable

This behavior is usually known as "Non-strict execution", and differs from eager execution, which steps through all of the program operations, needed or not.

In particular, runtime error checking does not count as an observable effect. If an operation is skipped because it is unnecessary, it cannot raise any runtime errors.

In the following example, the "unnecessary" operation tf.gather is skipped during graph execution, so the runtime error InvalidArgumentError is not raised as it would be in eager execution. Do not rely on an error being raised while executing a graph.

def unused_return_eager(x):
  # Get index 1 will fail when `len(x) == 1`
  tf.gather(x, [1]) # unused
  return x

try:
  print(unused_return_eager(tf.constant([0.0])))
except tf.errors.InvalidArgumentError as e:
  # All operations are run during eager execution so an error is raised.
  print(f'{type(e).__name__}: {e}')

InvalidArgumentError: indices[0] = 1 is not in [0, 1) [Op:GatherV2]

@tf.function
def unused_return_graph(x):
  tf.gather(x, [1]) # unused
  return x

# Only needed operations are run during graph execution. The error is not raised.
print(unused_return_graph(tf.constant([0.0])))


tf.Tensor([0.], shape=(1,), dtype=float32) 

tf.function best practices

It may take some time to get used to the behavior of tf.function. To get started quickly, first-time users should play around with decorating toy functions with @tf.function to get experience with going from eager to graph execution.

Designing for tf.function may be your best bet for writing graph-compatible TensorFlow programs. Here are some tips:

  • Toggle between eager and graph execution early and often with tf.config.run_functions_eagerly to pinpoint if/when the two modes diverge.
  • Create tf.Variables outside the Python function and modify them on the inside. The same goes for objects that use tf.Variable, like tf.keras.layers, tf.keras.Models, and tf.keras.optimizers.
  • Avoid writing functions that depend on outer Python variables, excluding tf.Variables and Keras objects. Learn more in Depending on Python global and free variables of the tf.function guide.
  • Prefer to write functions which take tensors and other TensorFlow types as input. You can pass in other object types but be careful! Learn more in Depending on Python objects of the tf.function guide.
  • Include as much computation as possible under a tf.function to maximize the performance gain. For example, decorate a whole training step or the entire training loop (see the training-step sketch after this list).
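
Following that last tip, here is a minimal sketch of a whole training step wrapped in tf.function (the toy model, loss, and optimizer choices below are illustrative assumptions, not from the original guide):

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.MeanSquaredError()

@tf.function  # The whole step, including the gradient update, becomes one graph.
def train_step(features, labels):
  with tf.GradientTape() as tape:
    loss = loss_fn(labels, model(features, training=True))
  grads = tape.gradient(loss, model.trainable_variables)
  optimizer.apply_gradients(zip(grads, model.trainable_variables))
  return loss

features = tf.random.normal([8, 4])
labels = tf.random.normal([8, 1])
print(train_step(features, labels))  # Traces once, then reuses the graph on later calls.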

Seeing the speed-up

tf.function usually improves the performance of your code, but the amount of speed-up depends on the kind of computation you run. Small computations can be dominated by the overhead of calling a graph. You can measure the difference in performance like so:

x = tf.random.uniform(shape=[10, 10], minval=-1, maxval=2, dtype=tf.dtypes.int32)

def power(x, y):
  result = tf.eye(10, dtype=tf.dtypes.int32)
  for _ in range(y):
    result = tf.matmul(x, result)
  return result

print("Eager execution:", timeit.timeit(lambda: power(x, 100), number=1000), "seconds")


Eager execution: 4.1027931490000356 seconds 

power_as_graph = tf.function(power)
print("Graph execution:", timeit.timeit(lambda: power_as_graph(x, 100), number=1000), "seconds")


Graph execution: 0.7951284349999241 seconds 

tf.function is commonly used to speed up training loops, and you can learn more about it in the Speeding-up your training step with tf.function section of the Writing a training loop from scratch with Keras guide.


:::tip Note: You can also try tf.function(jit_compile=True) for a more significant performance boost, especially if your code is heavy on TensorFlow control flow and uses many small tensors. Learn more in the Explicit compilation with tf.function(jit_compile=True) section of the XLA overview.

:::
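
A hedged sketch of that option, reusing the power function defined above (whether XLA actually helps depends on your workload and hardware, so treat this as an experiment rather than a guaranteed win):

# Compile `power` with XLA in addition to building a graph.
power_as_xla = tf.function(power, jit_compile=True)
print("XLA-compiled execution:",
      timeit.timeit(lambda: power_as_xla(x, 100), number=1000), "seconds")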

Performance and trade-offs

Graphs can speed up your code, but the process of creating them has some overhead. For some functions, the creation of the graph takes more time than the execution of the graph. This investment is usually quickly paid back with the performance boost of subsequent executions, but it's important to be aware that the first few steps of any large model training can be slower due to tracing.

No matter how large your model, you want to avoid tracing frequently. In the Controlling retracing section, the tf.function guide discusses how to set input specifications and use tensor arguments to avoid retracing. If you find you are getting unusually poor performance, it's a good idea to check if you are retracing accidentally.
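
As a hedged sketch of those two levers (the shapes and function names here are illustrative, and reduce_retracing requires a reasonably recent TensorFlow release), you can pin an input_signature or ask tf.function to relax shapes on its own:

# Pinning an input signature: one graph handles any 1-D float32 tensor,
# and incompatible inputs raise an error instead of silently retracing.
@tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.float32)])
def scale(x):
  return 2.0 * x

print(scale(tf.constant([1.0, 2.0])))
print(scale(tf.constant([1.0, 2.0, 3.0])))  # Reuses the same graph.

# Alternatively, `reduce_retracing=True` lets TensorFlow generalize shapes
# automatically when it notices repeated retracing.
@tf.function(reduce_retracing=True)
def double(x):
  return x * 2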

When is a tf.function tracing?

To figure out when your tf.function is tracing, add a print statement to its code. As a rule of thumb, tf.function will execute the print statement every time it traces.

@tf.function
def a_function_with_python_side_effect(x):
  print("Tracing!") # An eager-only side effect.
  return x * x + tf.constant(2)

# This is traced the first time.
print(a_function_with_python_side_effect(tf.constant(2)))
# The second time through, you won't see the side effect.
print(a_function_with_python_side_effect(tf.constant(3)))

Tracing!
tf.Tensor(6, shape=(), dtype=int32)
tf.Tensor(11, shape=(), dtype=int32)

# This retraces each time the Python argument changes,
# as a Python argument could be an epoch count or other
# hyperparameter.
print(a_function_with_python_side_effect(2))
print(a_function_with_python_side_effect(3))

Tracing!
tf.Tensor(6, shape=(), dtype=int32)
Tracing!
tf.Tensor(11, shape=(), dtype=int32)

New Python arguments always trigger the creation of a new graph, hence the extra tracing.

Next steps

You can learn more about tf.function on the API reference page and by following the Better performance with tf.function guide.

:::info Originally published on the TensorFlow website, this article appears here under a new headline and is licensed under CC BY 4.0. Code samples shared under the Apache 2.0 License.

:::

